Pass a function input as column name to data.frame function - r

I have a function taking a character input. Within the function, I want to use the data.frame() function. Within the data.frame() function, one column name should be the function's character input.
I tried it like this and it didn't work:
frame_create <- function(data, character_input){
...
some_vector <- c(1:50)
temp_frame <- data.frame(character_input = some_vector, ...)
return(temp_frame)
}

Either assign the name afterwards with names<-, or build the column with setNames(), because = does not evaluate the expression on its left-hand side. In tidyverse functions such as tibble() or lst(), the column can instead be created with := and !!.
frame_create <- function(data, character_input){
some_vector <- 1:50
temp_frame <- data.frame(some_vector)
names(temp_frame) <- character_input
return(temp_frame)
}
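The setNames() route mentioned above can be sketched as follows. This is a minimal base R illustration, not the asker's full function: frame_create2 is a hypothetical name, and the data argument and the elided extra columns are left out.

```r
# Hypothetical variant of frame_create(); 'data' and the other columns are omitted.
frame_create2 <- function(character_input) {
  some_vector <- 1:50
  # Name the list element first; data.frame() then uses that name for the column.
  data.frame(setNames(list(some_vector), character_input))
}

res <- frame_create2("my_column")
names(res)
```

Note that data.frame() uses check.names = TRUE by default, so a name that is not syntactically valid would be altered unless check.names = FALSE is passed.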

Can you explain your requirement for using a function to create a new dataframe column? If you have a dataframe df and you want to make a copy with a new column appended then the trivial solution is:
df2 <- df
df2$new_col <- 1:50
Example of combining multiple dataframes in R by stacking their rows:
cars1 <- mtcars
cars2 <- cars1
cars3 <- cars2
list1 <- list(cars1, cars2, cars3)
all_cars <- Reduce(rbind, list1)

Related

How can lapply work with addressing columns as unknown variables?

So, I have a list of strings named control_for. I have a data frame sampleTable with some of the columns named as strings from control_for list. And I have a third object dge_obj (DGElist object) where I want to append those columns. What I wanted to do - use lapply to loop through control_for list, and for each string, find a column in sampleTable with the same name, and then add that column (as a factor) to a DGElist object. For example, for doing it manually with just one string, it looks like this, and it works:
group <- as.factor(sampleTable[,3])
dge_obj$samples$group <- group
And I tried something like this:
lapply(control_for, function(x) {
x <- as.factor(sampleTable[, x])
dge_obj$samples$x <- x
})
Which doesn't work. I guess the problem is that R can't recognize addressing columns like this. Can someone help?
Here are two base R ways of doing it. The data set is the example from help("DGEList"), together with a mock-up data.frame sampleTable.
Define a vector common_vars holding the names in control_for that are also column names of sampleTable. Then create the new columns.
library(edgeR)
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))
1. for loop
for(x in common_vars){
y <- sampleTable[[x]]
dge_obj$samples[[x]] <- factor(y)
}
2. *apply loop.
# lapply keeps each column as a factor; sapply would simplify the result to a character matrix
tmp <- lapply(sampleTable[common_vars], factor)
dge_obj$samples <- cbind(dge_obj$samples, tmp)
This code can be rewritten as a one-liner.
Data
set.seed(2021)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
dge_obj <- DGEList(counts=y, group=rep(1:2,each=2))
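The one-liner mentioned above might look like the sketch below. To keep it runnable without edgeR, dge_mock is a hypothetical stand-in (a plain list holding a samples data frame), so whether the same assignment behaves identically on a real DGEList is an assumption.

```r
# Stand-in for the DGEList object: a plain list with a 'samples' data frame (assumption).
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
dge_mock <- list(samples = data.frame(group = factor(c(1, 1, 2, 2))))

common_vars <- intersect(control_for, names(sampleTable))

# One assignment: every selected column is converted to a factor and added by name.
# Indexing with [common_vars] uses the strings stored in the variable,
# whereas $x would always refer to a column literally called "x".
dge_mock$samples[common_vars] <- lapply(sampleTable[common_vars], factor)
str(dge_mock$samples)
```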

purrr::pmap not passing values from a list

Calling a function from pmap throws an error, because the arguments are not passed to it one at a time.
I tried creating lists of the parameters, but this also resulted in an error.
library(tidyverse)
library(dplyr)
periods <- c(10,11,12)
redemption <- rep(100,3)
firstcallDate <- c("2014-01-01","2015-01-01","2016-01-01")
testdf <- data.frame(redemption, periods, firstcallDate)
testdf$firstcallDate <- as.Date(testdf$firstcallDate)
testdf$CallSch <- NA
CallScheduleGen <- function (redemption,periods,firstcallDate, ...) {
Price <- rep(as.double(redemption),periods)
Date <- seq(firstcallDate, by = "1 month", length = periods)
callSch <- data.frame(Price, Date)
return(callSch)
}
testdf$CallSch <- pmap_dfr(testdf,CallScheduleGen)
I am expecting a dataframe to be created in each of the cells of the CallSch column of testdf. pmap appears to pass the input arguments to the function as a list, rather than element-wise.
Can anyone suggest an approach? I need this to be vectorized rather than written as a loop.
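One possible way to get a data frame in each cell, sketched in base R rather than purrr: build a list-column by calling the function once per row. This is only an illustration under the assumption that a list-column is what is wanted; it does not diagnose the pmap error itself.

```r
redemption <- rep(100, 3)
periods <- c(10, 11, 12)
firstcallDate <- as.Date(c("2014-01-01", "2015-01-01", "2016-01-01"))
testdf <- data.frame(redemption, periods, firstcallDate)

CallScheduleGen <- function(redemption, periods, firstcallDate, ...) {
  Price <- rep(as.double(redemption), periods)
  Date <- seq(firstcallDate, by = "1 month", length.out = periods)
  data.frame(Price, Date)
}

# Index row by row so that `[` keeps the Date class of firstcallDate.
# The result is a list of data frames, stored as a list-column.
testdf$CallSch <- lapply(seq_len(nrow(testdf)), function(i) {
  CallScheduleGen(testdf$redemption[i], testdf$periods[i], testdf$firstcallDate[i])
})

sapply(testdf$CallSch, nrow)
```

Each element testdf$CallSch[[i]] is then the schedule for row i. A row-binding helper such as pmap_dfr would instead stack all schedules into one long data frame, which cannot be stored in a three-row column.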

Building forvalues loops in R

[Working with R 3.2.2]
I have three data frames with the same variables. I need to modify the value of some variables and change the name of the variables (rename the columns). Instead of doing this data frame by data frame, I would like to use a loop.
This is the code I want to run:
#Change the values of the variables
vlist <- c("var1", "var2", "var3")
dataframe0[,vlist] <- dataframe0[,vlist]/10
dataframe1[,vlist] <- dataframe1[,vlist]/10
dataframe2[,vlist] <- dataframe2[,vlist]/10
#Change the name of the variables
colnames(dataframe0)[colnames(dataframe0)=="var1"] <- "temp_min"
colnames(dataframe0)[colnames(dataframe0)=="var2"] <- "temp_max"
colnames(dataframe0)[colnames(dataframe0)=="var3"] <- "prep"
colnames(dataframe1)[colnames(dataframe1)=="var1"] <- "temp_min"
colnames(dataframe1)[colnames(dataframe1)=="var2"] <- "temp_max"
colnames(dataframe1)[colnames(dataframe1)=="var3"] <- "prep"
colnames(dataframe2)[colnames(dataframe2)=="var1"] <- "temp_min"
colnames(dataframe2)[colnames(dataframe2)=="var2"] <- "temp_max"
colnames(dataframe2)[colnames(dataframe2)=="var3"] <- "prep"
I know the logic to do it with programs like Stata, with a forvalues loop:
#Change the values of the variables
forvalues i=0/2 {
dataframe`i'[,vlist] <- dataframe`i'[,vlist]/10
#Change the name of the variables
colnames(dataframe`i')[colnames(dataframe`i')=="var1"] <- "temp_min"
colnames(dataframe`i')[colnames(dataframe`i')=="var2"] <- "temp_max"
colnames(dataframe`i')[colnames(dataframe`i')=="var3"] <- "prep"
}
But, I am not able to reproduce it in R. How should I proceed? Thanks in advance!
I would work with a list of dataframes; you can still 'split' it afterwards if really needed:
df1 <- data.frame("id"=1:10,"var1"=11:20,"var2"=11:20,"var3"=11:20,"test"=1:10)
df2 <- df1
df3 <- df1
dflist <- list(df1,df2,df3)
for (i in seq_along(dflist)) {
# divide the values by 10 (shown here for the 'test' column)
dflist[[i]]['test'] <- dflist[[i]]['test']/10
# rename var1 to var3 in one pass (relies on their column order)
colnames( dflist[[i]] )[ colnames(dflist[[i]]) %in% c('var1','var2','var3') ] <- c('temp_min','temp_max','prep')
# eventually reassign df1-3 to their list value:
# assign(paste0("df",i),dflist[[i]])
}
The advantage of using a list is that you can access the dataframes more easily in a programmatic way.
I changed your renaming code from three calls to one: colnames returns a vector, so you can subset it and replace all three names in one pass. This assumes that var1 to var3 always appear in the same order.
Addendum: if you want a single dataset at the end, you can use do.call(rbind,dflist) or, with the data.table package, rbindlist(dflist).
More details on working with list of data.frames in Gregor's answer here
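The loop above can also be written with lapply. The sketch below is one possible variant, not the answerer's code: it divides the columns in vlist (as in the question) and renames through a lookup vector, new_names, which is a name introduced here for illustration so that the renaming does not depend on column order.

```r
df1 <- data.frame(id = 1:10, var1 = 11:20, var2 = 11:20, var3 = 11:20)
dflist <- list(df1, df1, df1)

vlist <- c("var1", "var2", "var3")
# Lookup from old to new column names (hypothetical helper vector).
new_names <- c(var1 = "temp_min", var2 = "temp_max", var3 = "prep")

dflist <- lapply(dflist, function(d) {
  d[vlist] <- d[vlist] / 10                      # rescale the selected columns
  hit <- names(d) %in% names(new_names)          # columns that need renaming
  names(d)[hit] <- unname(new_names[names(d)[hit]])
  d                                              # return the modified data frame
})

names(dflist[[1]])
```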

How can I make a tibble/tbl_df/data_frame from a vector or vectors

I have a name and a vector
my.name <- 'data.values'
my.vec <- 1:5
and I'd like to make a tibble/tbl_df/data_frame with one column that has my.name as the name of that column and my.vec as the values. What I have is
df <- data_frame(placeholder = rep(NA, length(my.vec)))
df[[my.name]] <- my.vec
df[['placeholder']] <- NULL
Which just feels silly. Is there an easier way to do this?
I am also interested in the case where I have multiple vectors and multiple names, e.g.
my.name1 <- 'data.values.day1'
my.name2 <- 'data.values.day2'
my.vec1 <- 1:5
my.vec2 <- 2:6
...
I think the best answer came in a comment.
DirtySockSniffer recommended:
as_data_frame(setNames(list(my.vec), my.name))
which generalizes nicely to the multiple column situation
as_data_frame(setNames(list(my.vec1, my.vec2),
c(my.name1, my.name2)))
You can create a data_frame first and then set its column names:
my.data <- data_frame(my.vec.1, my.vec.2, ...)
names(my.data) <- c(my.name.1, my.name.2, ...) # Order is important here

R lapply on list of dataframes resetting rownames

You can reset the rownames in a data frame by running
rownames(df) <- NULL
I have a list of dataframes and want to reset the rownames of every dataframe in the list. I tried
newlist <- llply(mylist, function(df) { rownames(df) <- NULL })
But it doesn't work: it returns a list of NULLs and the original remains unchanged.
This is a job for the base function lapply; you don't need to load plyr. You also need to make sure that your anonymous function returns something.
df1 <- data.frame(a=1:10)
rownames(df1) <- letters[1:10]
df2 <- data.frame(b=1:10)
rownames(df2) <- LETTERS[1:10]
mylist <- list(df1,df2)
mylist <- lapply(mylist,function(DF) {rownames(DF) <- NULL; DF})
Use rownames<- :
newlist <- lapply(mylist, "rownames<-", NULL)
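The reason the original attempt returned NULLs can be seen in a small base R sketch (drop_wrong and drop_right are hypothetical names used only here): the value of a function is its last expression, and the value of an assignment is the assigned value, in this case NULL.

```r
df1 <- data.frame(a = 1:3, row.names = c("x", "y", "z"))

# The last expression is the assignment, whose value is NULL.
drop_wrong <- function(df) { rownames(df) <- NULL }
# Returning df explicitly hands back the modified copy.
drop_right <- function(df) { rownames(df) <- NULL; df }

is.null(drop_wrong(df1))
rownames(drop_right(df1))
rownames(df1)  # the original is untouched, since functions work on a copy
```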
