Saving data frames to values in a list - r

I have a list of titles that I would like to iterate over and create/save data frames to. I have tried using the paste() function (as seen below), but that does not work for me. Any advice would be greatly appreciated.
samples <- list("A","B","C")
for (i in samples){
paste(i,sumT,sep="_") <- data.frame(col1=NA,col1=NA)
}
My desired output is three empty data frames named: A_sumT, B_sumT and C_sumT

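Here's an answer with purrr. It returns a named list of empty data frames rather than three separate objects.
library(magrittr) # provides the %>% pipe
samples <- list("A", "B", "C")
samples %>%
  purrr::map(~ data.frame()) %>%
  purrr::set_names(paste(samples, "sumT", sep="_"))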

Consider creating a list of data frames instead of many separate objects flooding the global environment, since this example could extend to hundreds of data frames, not just three. With this approach, you also maintain one container capable of running bulk operations across all data frames.
Using sapply with simplify=FALSE on a character vector, as below, creates a named list:
samples <- c("A","B","C") # OR unlist(list("A","B","C"))
df_list <- sapply(samples, function(x) data.frame(col1=NA,col2=NA), simplify=FALSE)
# RUN ANY DATAFRAME OPERATION
head(df_list$A)
tail(df_list$B)
summary(df_list$C)
# BULK OPERATIONS
stacked_df <- do.call(rbind, df_list) # stack rows
wide_df <- do.call(cbind, df_list) # bind columns side by side
merged_df <- Reduce(function(x,y) merge(x,y,by="col1"), df_list)
Or, if you need to rename the list elements:
# RENAME LIST
df_list <- setNames(df_list, paste0(samples, "_sumT"))
# RUN ANY DATAFRAME OPERATION
head(df_list$A_sumT)
tail(df_list$B_sumT)
summary(df_list$C_sumT)
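If separate top-level objects really are required, as the question originally asked, one option is base R's list2env(), which copies each named element of a list into an environment. The following is a minimal sketch under that assumption; demo_env is a hypothetical scratch environment used here only to keep the example free of side effects.

```r
samples <- c("A", "B", "C")

# Named list of placeholder data frames, one per sample
df_list <- sapply(samples, function(x) data.frame(col1 = NA, col2 = NA),
                  simplify = FALSE)
df_list <- setNames(df_list, paste0(samples, "_sumT"))

# Copy each list element into an environment under its list name.
# demo_env is a hypothetical scratch environment; passing
# envir = .GlobalEnv instead would create A_sumT, B_sumT and C_sumT
# directly in the workspace.
demo_env <- new.env()
invisible(list2env(df_list, envir = demo_env))

ls(demo_env)
```

Keeping the list as the primary container and exporting only when needed preserves the bulk-operation benefits described above.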

Related

How to create multiple dataframes with lapply()?

I want to do the same thing to create several different data frames. Can I use lapply to achieve this?
I tried it but did not succeed:
xx<-c("a1","b1")
lapply(xx, function(x){
x<-data.frame(c(1,2,3,4),"1")
})
I hope to get two data frames, like:
a1<-data.frame(c(1,2,3,4),"1")
b1<-data.frame(c(1,2,3,4),"1")
An option that assigns to .GlobalEnv. As pointed out, this is less efficient, but it is provided to answer the OP's question as is:
lapply(xx, function(x) assign(x,data.frame(A=c(1,2,3,4),
B="1"),
envir=.GlobalEnv))
You can then refer to each data frame by its name: a1, b1.
You could instead use sapply over the xx vector of names and capture its return value as a named list of data frames (simplify=FALSE keeps the result a list):
xx <- c("a1", "b1")
lst <- sapply(xx, function(x) {
  data.frame(c(1,2,3,4), "1")
}, simplify=FALSE)
Then, you may access each data frame using the list, e.g. lst$a1.
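One point worth noting with function-based approaches like this: an assignment to lst made inside the anonymous function only changes a local copy, so the list has to be built from the function's return values. A small sketch of the difference, reusing the xx names from above (the column names A and B are illustrative):

```r
xx <- c("a1", "b1")
lst <- list()

# Assigning to lst inside the function modifies a local copy only
invisible(lapply(xx, function(x) {
  lst[[x]] <- data.frame(A = c(1, 2, 3, 4), B = "1")
}))
n_after_inner_assign <- length(lst)  # the outer list is still empty

# Returning the data frame and capturing the result builds the list
lst <- setNames(
  lapply(xx, function(x) data.frame(A = c(1, 2, 3, 4), B = "1")),
  xx
)
names(lst)
```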

R: transforming multiple sets to dataframes at once

I have 31 datasets corresponding to data about 31 teachers. I need to perform multiple transformations on all these datasets. One of them is transforming all of them into dataframes
class(alexandre)
[1] "tbl_df" "tbl" "data.frame"
As I said, I have 31 similar datasets, and I need to transform all into dataframes. My code to do so has been
alexandre <- as.data.frame(alexandre)
adrian <- as.data.frame(adrian)
akemi <- as.data.frame(akemi)
arcanjo <- as.data.frame(arcanjo)
ana_barbara <- as.data.frame(ana_barbara)
brigida <- as.data.frame(brigida)
cleiton <- as.data.frame(cleiton)
daniela <- as.data.frame(daniela)
davi <- as.data.frame(davi)
eliezer <- as.data.frame(eliezer)
eduardo <- as.data.frame(eduardo)
eustaquio <- as.data.frame(eustaquio)
gilberto <- as.data.frame(gilberto)
gilmar <- as.data.frame(gilmar)
jorge <- as.data.frame(jorge)
juarez <- as.data.frame(juarez)
junior <- as.data.frame(junior)
... and so on for the remaining datasets (31 lines in total). Obviously all these lines of code take too much space, and there must be a faster (and more elegant) way to accomplish this. In fact, I tried this:
teachers <- c(alexandre, akemi, adrian, brigida, davi, ...)
cnames <- function(x){
colnames(x) <- c(1:18)
}
mapply(cnames, teachers)
Then I would do all the work with a few lines of code. And this method (form a vector containing all datasets, then use mapply on the vector) would make my work much easier because, as I said, I have to perform multiple transformations on all these datasets.
This code does not work, however. I get the following error:
Error in `colnames<-`(`*tmp*`, value = c(1:18)) :
attempt to set 'colnames' on an object with less than two dimensions
I find this error message very unenlightening. I have no idea what to do to make the code work, which is obviously why I'm here. Any other methods to accomplish what I'm trying to do are welcome. Thanks.
As commented and often discussed in the R tag of SO, simply use a list to maintain all your individual, similarly structured data frames. Doing so allows you the following benefits:
Easily run operations consistently across all items using loops or apply family calls without separate naming assignments.
Organizes your environment and workspace with maintenance of one object with easy reference by number or name instead of 31 objects flooding your global environment.
Facilitates data frame migrations and handling with rbind, cbind, split, by, or other operations.
To create a list of all current data frames in global environment use eapply or mget filtering on data frame objects. Each returns a named list of data frames.
teachers_df_list <- Filter(is.data.frame, eapply(.GlobalEnv, identity))
teachers_df_list <- Filter(is.data.frame, mget(x=ls()))
Alternatively, read your data frames directly from their files into a list, for example by iterating over the result of list.files:
teachers_df_list <- lapply(list.files(...), function(f) read.csv(f, ...))
You lose no functionality of data frame if stored inside a list.
head(teachers_df_list$alexandre)
tail(teachers_df_list$adrian)
summary(teachers_df_list$akemi)
...
Then run your needed operations with lapply like renaming columns with right-hand side function, setNames. Run other needed operations: aggregate or lm.
new_teachers_df_list <- lapply(teachers_df_list,
function(df) setNames(df, paste0("col_", c(1:18))))
new_teachers_agg_list <- lapply(teachers_df_list,
function(df) aggregate(col1 ~ col2, df, sum))
new_teachers_model_list <- lapply(teachers_df_list,
function(df) summary(lm(col1 ~ col2, df)))
Even compile all data frames into one master version using do.call + rbind:
# ADD A TEACHER INDICATOR COLUMN
new_teachers_df_list <- Map(function(df, n) transform(df, teacher=n),
new_teachers_df_list, names(new_teachers_df_list))
# BUILD SINGLE DF
teachers_df <- do.call(rbind, new_teachers_df_list)
Even split master version back into individual groupings if needed later on:
# SPLIT BACK TO LIST OF DFs
teachers_df_list <- split(teachers_df, teachers_df$teacher)
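The combine-and-split round trip can be sketched on toy data as follows. The two small data frames and their score and grade columns are made up for illustration; only the Map, do.call(rbind, ...) and split pattern comes from the answer above.

```r
# Toy stand-ins for two of the teacher data frames (values are invented)
toy_list <- list(
  alexandre = data.frame(score = c(1, 2), grade = c("a", "b")),
  adrian    = data.frame(score = c(3, 4), grade = c("a", "b"))
)

# Tag every row with the name of the list element it came from
tagged <- Map(function(df, n) transform(df, teacher = n),
              toy_list, names(toy_list))

# Stack into one master data frame, dropping the composite row names
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL

# Split back into one data frame per teacher
by_teacher <- split(combined, combined$teacher)
names(by_teacher)
```

Note that split orders its result by the sorted values of the grouping column, so the order of by_teacher may differ from the order of the input list.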
Maybe you could use a list to store all your data frames. It seems to work, but afterwards you need a way to extract each data frame from the list (for example by index, such as l[[1]]).
df_1 <- data.frame(c(0, 1, 0), c(3, 4, 5))
df_2 <- data.frame(c(0, 1, 0), c(3, 4, 5))
l <- list(df_1, df_2)
l <- lapply(l, function(x){
  colnames(x) <- 1:2
  return(x) # return the modified data frame, not just the new names
})
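The use of list() rather than c() matters here, and it likely explains the error in the question: c() applied to data frames flattens them into one plain list of column vectors, so a function mapped over the result receives a vector and cannot set colnames on it. A short sketch with two toy data frames (the column names x and y are illustrative):

```r
df_1 <- data.frame(x = c(0, 1, 0), y = c(3, 4, 5))
df_2 <- data.frame(x = c(0, 1, 0), y = c(3, 4, 5))

# c() flattens: the result is a list of 4 column vectors, not 2 data frames
flat <- c(df_1, df_2)
length(flat)
is.data.frame(flat[[1]])

# list() keeps each data frame intact as one element
l <- list(df_1, df_2)
length(l)

# The mapped function must return the modified data frame itself
renamed <- lapply(l, function(df) {
  colnames(df) <- c("1", "2")
  df
})
colnames(renamed[[2]])
```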

R: Address objects deep inside lists with filter commands inside functions/loops (ExtremeBounds package)

I am using the ExtremeBounds package, which returns a multi-level list with (amongst others) data frames at the lowest level. I run this package over several specifications and I would like to collect some columns of selected data frames in these results. These should be collected by specification (spec1 and spec2 in the example below) and arranged in a list of data frames. This list of data frames can then be used for all kinds of things, for example to export the results of different specifications into different Excel sheets.
Here is some code which creates the problematic object (just run this code blindly, my problem only concerns how to deal with the kind of list it creates: eba_results):
library("ExtremeBounds")
Data <- data.frame(var1=rbinom(30,1,0.2),var2=rbinom(30,2,0.2),
var3=rnorm(30),var4=rnorm(30),var5=rnorm(30))
spec1 <- list(y=c("var1"),
freevars=c("var2"),
doubtvars=c("var3","var4"))
spec2 <- list(y=c("var1"),
freevars=c("var2"),
doubtvars=c("var3","var4","var5"))
indicators <- c("spec1","spec2")
ebaFun <- function(x){
eba <- eba(data=Data, y=x$y,
free=x$freevars,
doubtful=x$doubtvars,
reg.fun=glm, k=1, vif=7, draws=50, weights = "lri", family = binomial(logit))}
eba_results <- lapply(mget(indicators),ebaFun) #eba_results is the object in question
Manually I know how to access each element, for example:
eba_results$spec1$bounds$type #look at str(eba_results) to see the different levels
So "bounds" is a dataframe with identical column names for both spec1 and spec2. I would like to collect the following 5 columns from "bounds":
type, cdf.mu.normal, cdf.above.mu.normal, cdf.mu.generic, cdf.above.mu.generic
into one dataframe per spec. Manually this is simple but ugly:
collectedManually <-list(
manual_spec1 = data.frame(
type=eba_results$spec1$bounds$type,
cdf.mu.normal=eba_results$spec1$bounds$cdf.mu.normal,
cdf.above.mu.normal=eba_results$spec1$bounds$cdf.above.mu.normal,
cdf.mu.generic=eba_results$spec1$bounds$cdf.mu.generic,
cdf.above.mu.generic=eba_results$spec1$bounds$cdf.above.mu.generic),
manual_spec2= data.frame(
type=eba_results$spec2$bounds$type,
cdf.mu.normal=eba_results$spec2$bounds$cdf.mu.normal,
cdf.above.mu.normal=eba_results$spec2$bounds$cdf.above.mu.normal,
cdf.mu.generic=eba_results$spec2$bounds$cdf.mu.generic,
cdf.above.mu.generic=eba_results$spec2$bounds$cdf.above.mu.generic))
But I have more than 2 specifications and I think this should be possible with lapply functions in a prettier way. Any help would be appreciated!
p.s.: A generic example to which hrbrmstr's answer applies but which turned out to be too simplistic:
exampleList = list(a=list(aa=data.frame(A=rnorm(10),B=rnorm(10)),bb=data.frame(A=rnorm(10),B=rnorm(10))),
b=list(aa=data.frame(A=rnorm(10),B=rnorm(10)),bb=data.frame(A=rnorm(10),B=rnorm(10))))
and I want to have an object which collects, for example, all the A and B vectors into two data frames (each with its respective A and B) which are then a list of data frames. Manually this would look like:
dfa <- data.frame(A=exampleList$a$aa$A,B=exampleList$a$aa$B)
dfb <- data.frame(A=exampleList$b$aa$A,B=exampleList$b$aa$B)
collectedResults <- list(a=dfa, b=dfb)
There's probably a less brute-force way to do this.
If you want lists of individual columns this is one way:
get_col <- function(my_list, col_name) {
unlist(lapply(my_list, function(x) {
lapply(x, function(y) { y[, col_name] })
}), recursive=FALSE)
}
get_col(exampleList, "A")
get_col(exampleList, "B")
If you want a consolidated data.frame of indicator columns this is one way:
collect_indicators <- function(my_list, indicators) {
lapply(my_list, function(x) {
do.call(rbind, c(lapply(x, function(y) { y[, indicators] }), make.row.names=FALSE))
})[[1]]
}
collect_indicators(exampleList, c("A", "B"))
If you just want to bring the individual data.frames up a level to make it easier to iterate over to write to a file:
unlist(exampleList, recursive=FALSE)
Much assumption about the true output format is being made (the question was a bit vague).
There is a brute force way which works but is dependent on several named objects:
collectEBA <- function(x){
df <- paste0("eba_results$",x,"$bounds")
df <- eval(parse(text=df))[,c("type",
"cdf.mu.normal","cdf.above.mu.normal",
"cdf.mu.generic","cdf.above.mu.generic")]
df[is.na(df)] <- "NA"
df
}
eba_export <- lapply(indicators,collectEBA)
names(eba_export) <- indicators
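The same extraction can probably be done without eval(parse()) by iterating over the results list directly and subsetting each bounds data frame. The sketch below uses a mock object, mock_results, with the same nesting as eba_results; its values and the reduced column set are invented so that the example runs without the ExtremeBounds package.

```r
# Mock object mirroring the nesting of eba_results (values are made up)
mock_results <- list(
  spec1 = list(bounds = data.frame(type = c("free", "doubtful"),
                                   cdf.mu.normal = c(0.9, 0.4),
                                   beta = c(1.2, -0.3))),
  spec2 = list(bounds = data.frame(type = c("free", "doubtful", "doubtful"),
                                   cdf.mu.normal = c(0.8, 0.5, 0.6),
                                   beta = c(1.1, -0.2, 0.7)))
)

# Columns to keep from each bounds data frame
keep <- c("type", "cdf.mu.normal")

# lapply preserves the list names, so the result is named by specification
collected <- lapply(mock_results, function(res) {
  res$bounds[, keep, drop = FALSE]
})
names(collected)
```

With the real object, the same call would be applied to eba_results with the five column names listed in the question.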

Using for loops to match pairs of data frames in R

Using a particular function, I wish to merge pairs of data frames, for multiple pairings in a directory. I am trying to write a 'for loop' that will do this job for me, and while related questions such as "Merge several data.frames into one data.frame with a loop" are helpful, I am struggling to adapt example loops for this particular use.
My data frame files end with either "_df1.csv" or "_df2.csv". Each pair that I wish to merge into an output data frame has an identical number at the beginning of the file name (i.e. 543_df1.csv and 543_df2.csv).
I have created a character vector for each of the two types of file in my directory using the list.files command as below:
df1files <- list.files(path="~/Desktop/combined files", pattern="_df1\\.csv$", full.names=TRUE, recursive=FALSE)
df2files <- list.files(path="~/Desktop/combined files", pattern="_df2\\.csv$", full.names=TRUE, recursive=FALSE)
The function and commands that I want to apply in order to merge each pair of data frames are as follows:
findRow <- function(dt, df) { min(which(df$datetime > dt )) }
rows <- sapply(df2$datetime, findRow, df=df1)
merged <- cbind(df2, df1[rows,])
I am now trying to incorporate these commands into a for loop starting with something along the following lines, to prevent me from having to manually merge the pairs:
for(i in 1:length(df2files)){ ……
I am not yet a strong R programmer, and have hit a wall, so any help would be greatly appreciated.
My intuition (which I haven't had a chance to check) is that you should be able to do something like the following:
# read in the data as two lists of dataframes:
dfs1 <- lapply(df1files, read.csv)
dfs2 <- lapply(df2files, read.csv)
# define your merge commands as a function
merge2 <- function(df1, df2){
  findRow <- function(dt, df) { min(which(df$datetime > dt )) }
  rows <- sapply(df2$datetime, findRow, df=df1)
  cbind(df2, df1[rows,]) # return the merged data frame
}
# apply that merge command to the two lists in parallel
mergeddfs <- mapply(merge2, dfs1, dfs2, SIMPLIFY=FALSE)
# write results to files
outfilenames <- gsub("df1","merged",df1files)
invisible(mapply(function(x,y) write.csv(x,y), mergeddfs, outfilenames))
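To see what the row-matching step does, here is a small in-memory sketch of the same logic. The two toy data frames, their columns a and b, and the numeric datetime values are assumptions made for illustration; real timestamps read with read.csv would first need converting to a comparable type.

```r
# Toy stand-ins for one file pair (datetime is numeric for simplicity)
df1 <- data.frame(datetime = c(10, 20, 30), a = c("x", "y", "z"))
df2 <- data.frame(datetime = c(5, 15, 25),  b = c(1, 2, 3))

# For one df2 timestamp, find the first df1 row with a later timestamp
find_row <- function(dt, df) min(which(df$datetime > dt))

# One matching df1 row index per df2 row
rows <- sapply(df2$datetime, find_row, df = df1)

# Bind each df2 row to its matched df1 row
merged <- cbind(df2, df1[rows, ])
merged
```

If a df2 timestamp has no later row in df1, which() returns an empty vector and min() yields Inf with a warning, so real data may need a guard for that case.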

Create a new variable within several data frames in R

I have several data frames df1, df2, ..., df10. Columns (variables) are the same in all of them.
I want to create a new variable within each of them. I can easily do it "manually" as follows:
df1$newvariable <- ifelse(df1$oldvariable == 999, NA, df1$oldvariable)
or, alternatively
df1 <- transform(df1, newvariable = ifelse(oldvariable == 999, NA, oldvariable))
Unfortunately I'm not able to do this in a loop. If I write
for (i in names) { #names is the list of dataframes
i$newvariable <- ifelse(i$oldvariable == 999, NA, i$oldvariable)
}
I get the following output
Error in i$oldvariable : $ operator is invalid for atomic vectors
What I'd do is put all the data frames into a list and then use lapply as follows:
df1 <- as.data.frame(matrix(runif(2*10), ncol=2))
df2 <- as.data.frame(matrix(runif(2*10), ncol=2))
df3 <- as.data.frame(matrix(runif(2*10), ncol=2))
df4 <- as.data.frame(matrix(runif(2*10), ncol=2))
# create a list and use lapply
df.list <- list(df1, df2, df3, df4)
out <- lapply(df.list, function(x) {
x$id <- 1:nrow(x)
x
})
Now all the data.frames have a new column id appended, and out is a list of data.frames. You can access each of them with out[[1]], out[[2]], etc.
This has been asked many times. The $<- is not capable of translating that "i" index into either the first or second arguments. The [[<- is capable of doing so for the second argument but not the first. You should be learning to use lapply and you will probably need to do it with two nested lapply's, one for the list of "names" and the other for each column in the dataframes. The question is incomplete since it lacks specific examples. Make up a set of three dataframes, set some of the values to "999" and provide a list of names.
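A made-up example along those lines might look like the following. In the question's loop, i is a character string (the name of a data frame), which is why i$oldvariable fails; fetching the objects by name with mget() and mapping over the resulting list avoids that. The three toy data frames and the df_names vector are invented for illustration.

```r
# Three toy data frames using 999 as a missing-value code
df1 <- data.frame(oldvariable = c(1, 999, 3))
df2 <- data.frame(oldvariable = c(999, 5, 6))
df3 <- data.frame(oldvariable = c(7, 8, 999))

# Character vector of object names, like `names` in the question
df_names <- c("df1", "df2", "df3")

# Fetch the objects by name into a named list
df_list <- mget(df_names)

# Add the recoded column to every data frame and return each one
df_list <- lapply(df_list, function(df) {
  df$newvariable <- ifelse(df$oldvariable == 999, NA, df$oldvariable)
  df
})

df_list$df2
```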

Resources