List of lists with data frames [duplicate] - r

I know this topic has appeared on SO a few times, but the examples were often more complicated, and I would like to have an answer (or a set of possible solutions) for this simple situation. I am still wrapping my head around R and programming in general. Here I want to use the lapply function or a simple loop on data, which is a list of three lists of vectors.
data1 <- list(rnorm(100),rnorm(100),rnorm(100))
data2 <- list(rnorm(100),rnorm(100),rnorm(100))
data3 <- list(rnorm(100),rnorm(100),rnorm(100))
data <- list(data1,data2,data3)
Now, I want to obtain the list of means for each vector. The result would be a list of three elements (lists).
I only know how to obtain a list of outcomes for a single list of vectors, either with a loop:
means <- vector('list', length(data1))
for (i in seq_along(data1)){
  means[[i]] <- mean(data1[[i]])
}
or by:
lapply(data1,mean)
and I know how to get all the means using rapply:
rapply(data,mean)
The problem is that rapply does not maintain the list structure.
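For example, with data as defined above, the default how = "unlist" collapses everything into one flat vector:
str(rapply(data, mean))
# num [1:9] ... -- a plain numeric vector of nine means; the nesting is gone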
Help and possibly some tips/explanations would be much appreciated.

We can loop through the list of lists with a nested lapply/sapply
lapply(data, sapply, mean)
It is otherwise written as
lapply(data, function(x) sapply(x, mean))
Or if you need the output with the list structure, a nested lapply can be used
lapply(data, lapply, mean)
Or with rapply, we can use the how argument to specify what kind of output we want.
rapply(data, mean, how='list')
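As a quick check with the data from the question, how = 'list' reproduces exactly the same nested structure as the nested lapply:
identical(rapply(data, mean, how = 'list'), lapply(data, lapply, mean))
# [1] TRUE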
If we are using a for loop, we may need to create an object to store the results.
res <- vector('list', length(data))
for(i in seq_along(data)){
  # pre-allocate each inner list so the nested list structure is kept
  res[[i]] <- vector('list', length(data[[i]]))
  for(j in seq_along(data[[i]])){
    res[[i]][[j]] <- mean(data[[i]][[j]])
  }
}
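A quick str() on the result (assuming the pre-allocation above) shows the nesting is preserved:
str(res, max.level = 1)
# List of 3
#  $ :List of 3
#  $ :List of 3
#  $ :List of 3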

Related

Call multiple dataframes stored in list for a For Loop

I don't know if this is actually feasible but I need to find a workaround for this problem.
I have several dataframes stored in a list that were generated by something like this:
SSE <- list()
for (i in cms){
SSE[[paste0("SE",i)]] <- subset(DF, DF$X == i)
}
where cms is a vector that stores the DF$X values I need. So I end up with a list SSE that has many dataframes, which I can access with SSE[["SE1"]], for example.
Now my problem is that I want to use all the dataframes in SSE in another for loop, and I don't know how to call them. This is a simplified example of what I want to do:
for (i in cms){
SSE[["SE[[i]]"]] <- arrange(SE[["SE[[i]]"]], y)
SSE[["SE[[i]]"]][105,4] <- tail(na.omit(SSE[["SE[[i]]"]]$Nump),1)
}
The actual operations I need to make are far more numerous and way more complex than this, so if this isn't actually doable, it would be easier for me to re-create each dataframe individually instead of creating them inside a list.
If anyone can tell me how to call these listed dataframes on the second for loop or how to modify the first for loop to create these dataframes individually (as I think I should be able to call those on the second loop) I would greatly appreciate it.
Thanks to anyone reading this!
First, without seeing a sample of your data it is difficult to provide specific advice. What are SE, cms and DF?
You could use split() to avoid the loop that splits the initial data frame, and then either use lapply() to loop through the list or use names(SSE) to obtain a vector of list element names.
#using fake data; arrange() below comes from dplyr
library(dplyr)
DF <- mtcars
cms <- unique(DF$cyl)
SSE <- list()
for (i in cms){
SSE[[paste0("SE",i)]] <- subset(DF, DF$cyl == i)
}
#calling by names
for (i in names(SSE)){
SSE[[i]] <- arrange(SSE[[i]], mpg)
print(SSE[[i]])
}
Option 2
#using split function
SSE2 <- split(DF, DF$cyl)
#using lapply
SSE2 <- lapply(SSE2, function(x){
x <- arrange(x, mpg)
print(x)
})
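One small difference from the loop version: split() names the list elements by the grouping values ("4", "6", "8" here) rather than "SE4", "SE6", "SE8". If the prefixed names matter downstream, they can be restored afterwards, for example:
names(SSE2) <- paste0("SE", names(SSE2))
names(SSE2)
# [1] "SE4" "SE6" "SE8"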

How to exclude columns with apply

I want to exclude/copy rows/columns of multiple dataframes within a list in a list.
The code doesn't work yet. Maybe somebody here knows what to do.
Zelllysate_extr <- list()
#defining the list
Zelllysate_extr$X0809P3_extr <- X0809P3_extr
#defining the list within the list
X0809P3_extr = lapply(Zelllysate_colr[["X0809P3"]], function(x) {
as.data.frame(x) <- Zelllysate_colr[["X0809P3_colr"]][2:1500, 1 & 3:4]
return(x)
})
#defining the list for the dataframes to place in; 2:1500, 1 & 3:4 are the rows and columns to copy
thanks
Instead of trying to iterate over the list itself, iterate over its indices.
X0809P3_extr = lapply(1:length(Zelllysate_colr[["X0809P3"]]), function(x) {
  Zelllysate_colr[["X0809P3_colr"]][[x]][2:1500, c(1, 3:4)]
})
You don't need a return() or an explicit assignment inside lapply.
I'm assuming that Zelllysate_colr[["X0809P3"]] is a list within the list Zelllysate_colr.
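Alternatively, assuming Zelllysate_colr[["X0809P3_colr"]] is itself a plain list of data frames, you could iterate over the data frames directly instead of over an index (a sketch, not tested against your data):
X0809P3_extr <- lapply(Zelllysate_colr[["X0809P3_colr"]], function(df) {
  # keep rows 2 to 1500 and columns 1, 3 and 4 of each data frame
  df[2:1500, c(1, 3:4)]
})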
If this doesn't work, you'll have to share some of your data. Most of the time the output from dput(head(dataObject)) is enough, but I think you're working with lists of lists, so that might not be enough to see the structure. You can read about how to ask great questions to get great answers quickly.

Assign names to sublists in a list

I have a list that contains three sublists, each of those sublists containing two objects.
Now I am looking for an efficient way to assign names to those objects in the sublists. In this case, the single objects of each sublist are supposed to have the same names (Matrix1 and Matrix2).
Here is an easy reproducible example:
# create random matrices
matrix1 <- matrix(rnorm(36),nrow=6)
matrix2 <- matrix(rnorm(36),nrow=6)
# combine the matrices to three lists
sublist1 <- list(matrix1, matrix2)
sublist2 <- list(matrix1, matrix2)
sublist3 <- list(matrix1, matrix2)
# combine the lists to one top list
Toplist <- list(sublist1, sublist2, sublist3)
I can do this by using a for loop:
# assign names via for loop
for (i in 1:length(Toplist)) {
names(Toplist[[i]]) <- c("Matrix1", "Matrix2")
}
I am sure there must be a more elegant way using a nested lapply command. But I struggled to implement the names() command inside it.
Anybody with a hint?
Try lapply(Toplist, setNames, c("Matrix1", "Matrix2")).
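For example, applied to the Toplist from the question (and remembering to assign the result back, since lapply returns a new list):
Toplist <- lapply(Toplist, setNames, c("Matrix1", "Matrix2"))
names(Toplist[[2]])
# [1] "Matrix1" "Matrix2"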

Combine lapply, seq_along and ddply

I've been searching around this forum and trying to apply what was said in previous answers to similar questions, but something in my code is still missing.
I use lapply() with a function inside that runs ddply. This works nicely. However, I would like to identify each result by the name of the data frame it came from, rather than by [[1]], [[2]], and so on.
For this reason, I am trying to bring in seq_along, but unsuccessfully. Here is what I have:
I created a list to group 16 different data frames (with the same structure) in one object, called melt_noNA_noDC_regression:
melt_noNA_noDC_regression <-
list(I1U_melt_noNA_noDC_regression, I1L_melt_noNA_noDC_regression,
I1U_melt_noNA_noDC_regression, I1L_melt_noNA_noDC_regression,
CU_melt_noNA_noDC_regression, CL_melt_noNA_noDC_regression,
P3U_melt_noNA_noDC_regression, P3L_melt_noNA_noDC_regression,
P4U_melt_noNA_noDC_regression, P4L_melt_noNA_noDC_regression,
M1U_melt_noNA_noDC_regression, M1L_melt_noNA_noDC_regression,
M2U_melt_noNA_noDC_regression, M2L_melt_noNA_noDC_regression,
M3U_melt_noNA_noDC_regression, M3L_melt_noNA_noDC_regression)
Later, I run this lapply() line successfully.
lapply(melt_noNA_noDC_regression, function(x) ddply(x, .(Species), model_regression))
As I have 16 different data frames, I would like to identify them in the results of the lapply function. I have tried several combinations to include seq_along within the lapply code, as in this case:
lapply(melt_noNA_noDC_regression, function(x) {
ddply(x, .(Species), model_regression)
seq_along(x), function(i) paste(names(x)[[i]], x[[i]])
})
However, I've been getting errors constantly, and it is a bit frustrating. It is probably easy to solve, but I am stuck.
Any idea how to solve this?
Consider using eapply (lapply's lesser-known sibling) or mget to retrieve a named list of your dataframes. Then run that list through lapply for the ddply call, which returns the same named list of dataframes with the new values.
df_list <- eapply(.GlobalEnv, function(d) d)[c("I1U_melt_noNA_noDC_regression",
"I1L_melt_noNA_noDC_regression",
"I1U_melt_noNA_noDC_regression",
...)]
df_list <- mget(c("I1U_melt_noNA_noDC_regression",
"I1L_melt_noNA_noDC_regression",
"I1U_melt_noNA_noDC_regression",
...))
# GENERALIZED FOR ANY DF IN GLOBAL ENV
df_list <- Filter(function(i) class(i)=="data.frame", eapply(.GlobalEnv, function(d) d))
new_list <- lapply(df_list, function(x) ddply(x, .(Species), model_regression))
And because eapply (environment apply) is part of the apply family and iterates over the objects in an environment, you can bypass lapply altogether. But you must then account for non-dataframe objects and filter the result down to the df names you need; hence the tryCatch and the [...] indexing below:
new_list2 <- eapply(.GlobalEnv, function(x)
tryCatch(ddply(x, .(Species), model_regression),
warning = function(w) return(NA),
error = function(e) return(NA)
)
)[c("I1U_melt_noNA_noDC_regression",
"I1L_melt_noNA_noDC_regression",
"I1U_melt_noNA_noDC_regression",
...)]
all.equal(new_list, new_list2)
# [1] TRUE
With all that said, ideally your data processing would use a named dataframe list from the start rather than creating 16 separate, similarly structured objects that flood your global environment. Therefore, consider adjusting the source of your regression objects: replace the following:
I1U_melt_noNA_noDC_regression <- ...
with this:
df_list <- list()
df_list[["I1U_melt_noNA_noDC_regression"]] <- ...

Calculate Means and Covariances for large list of dataframes, replacing loops with lapply

I previously posted a question of how to create all possible combinations of a set of dataframes or the "power set" of possible data frames in this link:
Creating Dataframes of all Possible Combinations without Repetition of Columns with cbind
I was able to create the list of possible dataframes by first creating all possible combinations of the names of the dataframes and storing them in Ccols, a list in which each element is a character matrix whose columns hold the data frame names of one combination.
Using Reduce and lapply, I then fetched each dataframe by its name, stashed them in lists, and then stashed all those lists in a list of lists to calculate the means and covariances:
ll_cov <- list()
ll_ER <- list()
for (ii in 2:length(Ccols)){
  l_cov <- list()
  l_ER <- list()
  for (index in 1:ncol(Ccols[[ii]])){
    ls <- list()
    for (i in 1:length(Ccols[[ii]][,index])){
      KK <- get(Ccols[[ii]][i,index])
      ls[[i]] <- KK
    }
    DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names=rn, rn=NULL)
    l_cov[[index]] <- cov(DAT)
    l_ER[[index]] <- colMeans(DAT)
  }
  ll_cov[[ii]] <- l_cov
  ll_ER[[ii]] <- l_ER
}
However, the loop has become too time-consuming because of the large number of dataframes being processed and the cov and colMeans calculations. I searched and came across this example (Looping over a list of data frames and calculate the correlation coefficient), which mentions putting the data frames in a list and then applying cov as a function, but it is still running far too slowly. I tried removing one of the loops by introducing a single lapply in place of the outermost loop:
Power_f <- function(X){
  l_D <- list()
  for (index in 2:ncol(X)){
    ls <- list()
    for (i in 1:length(X[,index])){
      KK <- get(X[i,index])
      ls[[i]] <- KK
    }
    DAT <- transform(Reduce(merge, lapply(ls, function(x) data.frame(x, rn = row.names(x)))), row.names=rn, rn=NULL)
    l_D[[index]] <- DAT
  }
  return(l_D)
}
lapply(seq(from=2,to=(length(Ccols))), function(i) Power_f(Ccols[[i]]))
But it is still taking too long to run (I have not been able to get results). Is there a way to replace all of the for loops with lapply and make this computationally efficient?
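As a rough sketch (assuming, as in the loops above, that each element of Ccols is a character matrix of data-frame names), the two inner loops can be folded into one helper that is mapped over the matrix columns with lapply. Note this only tidies the structure; the cov() and colMeans() calls still dominate the run time:
# hypothetical helper: merge the data frames named in one column of a Ccols matrix,
# then return the covariance matrix and column means of the merged data together
stats_for_column <- function(nms) {
  dfs <- lapply(nms, function(nm) {
    d <- get(nm)
    data.frame(d, rn = row.names(d))
  })
  DAT <- transform(Reduce(merge, dfs), row.names = rn, rn = NULL)
  list(cov = cov(DAT), ER = colMeans(DAT))
}
res <- lapply(Ccols[2:length(Ccols)], function(M) {
  lapply(seq_len(ncol(M)), function(index) stats_for_column(M[, index]))
})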
