Transpose a List of Lists - r

I have a list which contains list entries, and I need to transpose the structure.
The original structure is rectangular, but the names in the sub-lists do not match.
Here is an example:
ax <- data.frame(a=1,x=2)
ay <- data.frame(a=3,y=4)
bw <- data.frame(b=5,w=6)
bz <- data.frame(b=7,z=8)
before <- list( a=list(x=ax, y=ay), b=list(w=bw, z=bz))
What I want:
after <- list(w.x=list(a=ax, b=bw), y.z=list(a=ay, b=bz))
I do not care about the names of the resultant list (at any level).
Clearly this can be done explicitly:
after <- list(x.w=list(a=before$a$x, b=before$b$w), y.z=list(a=before$a$y, b=before$b$z))
but this is ugly and only works for a 2x2 structure. What's the idiomatic way of doing this?

The following piece of code creates a list holding the i-th element of every list in before:
lapply(before, "[[", i)
Now you just have to do
n <- length(before[[1]]) # assuming all lists in before have the same length
lapply(seq_len(n), function(i) lapply(before, "[[", i))
and it should give you what you want. It's not very efficient, because it traverses every sub-list once per position, so please decide whether this is good enough for you.
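As a quick check, here is a minimal sketch that runs the nested lapply() approach on the data from the question and compares the result with the explicitly built lists. The result is unnamed at the top level, which the question says is acceptable.

```r
# Example data from the question
ax <- data.frame(a = 1, x = 2)
ay <- data.frame(a = 3, y = 4)
bw <- data.frame(b = 5, w = 6)
bz <- data.frame(b = 7, z = 8)
before <- list(a = list(x = ax, y = ay), b = list(w = bw, z = bz))

# Assumes every sub-list in `before` has the same length
n <- length(before[[1]])
after <- lapply(seq_len(n), function(i) lapply(before, "[[", i))

# Each element of `after` collects one position across all sub-lists
identical(after[[1]], list(a = ax, b = bw))
identical(after[[2]], list(a = ay, b = bz))
```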

The purrr package now makes this process really easy:
library(purrr)
before %>% transpose()
## $x
## $x$a
## a x
## 1 1 2
##
## $x$b
## b w
## 1 5 6
##
##
## $y
## $y$a
## a y
## 1 3 4
##
## $y$b
## b z
## 1 7 8
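If purrr is not available, a base R sketch along the same lines is to hand the sub-lists to Map() with list as the function. This is an alternative idiom rather than what purrr does internally; like the purrr output above, it takes the outer names from the first sub-list (x and y here) and assumes all sub-lists have the same length.

```r
# Example data from the question
ax <- data.frame(a = 1, x = 2)
ay <- data.frame(a = 3, y = 4)
bw <- data.frame(b = 5, w = 6)
bz <- data.frame(b = 7, z = 8)
before <- list(a = list(x = ax, y = ay), b = list(w = bw, z = bz))

# Equivalent to Map(list, a = before$a, b = before$b):
# the i-th call is list(a = before$a[[i]], b = before$b[[i]])
transposed <- do.call(Map, c(list(f = list), before))

names(transposed)       # outer names come from the first sub-list
names(transposed[[1]])  # inner names come from `before`
```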

Here's a different idea: use the fact that a data.table can store data.frames (in fact, given your question, maybe you don't even need lists of lists and could just work with data.tables):
library(data.table)
dt = as.data.table(before)
after = as.list(data.table(t(dt)))

While this is an old question, I found it while searching for the same problem, and the second hit on Google had a much more elegant solution in my opinion:
list_of_lists <- list(a=list(x="ax", y="ay"), b=list(w="bw", z="bz"))
new <- do.call(rbind, list_of_lists)
new is now a rectangular structure, a strange object: a list with a dimension attribute. It works with as many elements as you wish, as long as every sublist has the same length. To change it into a more common R object, one could for example create a matrix like this:
new.dims <- dim(new)
matrix(new,nrow = new.dims[1])
new.dims needs to be saved first, because the matrix() function drops the list's dim attribute. Another way:
new <- do.call(c, new)
dim(new) <- new.dims
You can now, for example, convert it into a data.frame with as.data.frame() and split it into columns or do column-wise operations. Before you do that, you could also change the dim attribute of the matrix, if that fits your needs better.
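To connect this back to the original question, here is a small sketch showing that the columns of the list-matrix are the elements of the transposed structure. The variable names m and transposed are only illustrative.

```r
list_of_lists <- list(a = list(x = "ax", y = "ay"), b = list(w = "bw", z = "bz"))

# 2 x 2 list-matrix: one row per original sub-list
m <- do.call(rbind, list_of_lists)
dim(m)

# Taking the matrix column by column yields the transposed list of lists
transposed <- lapply(seq_len(ncol(m)), function(j) m[, j])
str(transposed)
```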

I found myself with this problem but I needed a solution that kept the names of each element. The solution I came up with should also work when the sub lists are not all the same length.
invertList <- function(l) {
  # Collect every name that appears in any sub-list
  elemnames <- unique(unlist(lapply(l, names)))
  res <- list()
  for (nm in elemnames) {
    res[[nm]] <- list()
    for (j in names(l)) {
      # Copy the element only if this sub-list actually has it
      if (nm %in% names(l[[j]])) {
        res[[nm]][[j]] <- l[[j]][[nm]]
      }
    }
  }
  res
}

Related

Creating Dataframes through R For Loop

Fairly new to R, so any guidance is appreciated.
GOAL: I'm trying to create hundreds of dataframes in a short script. They follow a pattern, so I thought a For Loop would suffice, but the data.frame function seems to ignore the variable nature of the variable, reading it as it appears. Here's an example:
# Defining some dummy variables for the sake of this example
dfTitles <- c("C2000.AMY", "C2000.ACC", "C2001.AMY", "C2001.ACC")
Copes <- c("Cope1", "Cope2", "Cope3", "Cope4")
Voxels <- c(1:338)
# (Theoretically) creating a separate dataframe for each of the terms in 'dfTitles'
for (i in dfTitles){
i <- data.frame(matrix(0, nrow = 4, ncol = 338, dimnames = list(Copes, Voxels)))
}
# Trying an alternative method
for (i in 1:length(dfTitles))
{dfTitles[i] <- data.frame(matrix(0, nrow = 4, ncol = 338, dimnames = list(Copes, Voxels)))}
This results in the creation of one dataframe named 'i', in the former, or a list of 4, in the case of the latter. Any ideas? Thank you!
PROBABLY UNNECESSARY BACKGROUND INFORMATION: We're using fMRI data to run an analysis which will run correlations across stimuli, brain voxels, brain regions, and participants. We're correlating whole matrices, so separating the values (aka COPEs) into separate dataframes by both Participant ID and Brain Region is going to make the next step much much easier. I already had tried the next step after having loaded and sorted the data into one large dataframe and it was a big pain in the butt.
rm(list = ls()) # clear the workspace; note the parentheses after ls
dfTitles <- c("C2000.AMY", "C2000.ACC", "C2001.AMY", "C2001.ACC")
Copes <- c("Cope1", "Cope2", "Cope3", "Cope4")
Voxels <- c(1:3)
# (Theoretically) creating a separate dataframe for each of the terms in 'dfTitles'
nr <- length(Voxels)
nc <- length(Copes)
N <- length(dfTitles) # Number of data frames, same as length of dfTitles
DF <- vector(N, mode="list")
for (i in 1:N){
DF[[i]] <- data.frame(matrix(rnorm(nr*nc), nrow = nr))
dimnames(DF[[i]]) <- list(Voxels, Copes)
}
names(DF) <- dfTitles
DF[1:2]
$C2000.AMY
Cope1 Cope2 Cope3 Cope4
1 -0.8293164 -1.813807 -0.3290645 -0.7730110
2 -1.1965588 1.022871 -0.7764960 -0.3056280
3 0.2536782 -0.365232 2.3949076 0.5672671
$C2000.ACC
Cope1 Cope2 Cope3 Cope4
1 -0.7505513 1.023325 -0.3110537 -1.4298174
2 1.2807725 1.216997 1.0644983 1.6374749
3 1.0047408 1.385460 0.1527678 0.1576037
When you create objects in a for loop, each one needs to be saved somewhere before the next iteration, or it gets overwritten.
One way to handle that is to create an empty list (or an empty vector with c()) before the loop and store the output of each iteration in it.
Another way is to assign() the object into your environment before moving on to the next iteration.
# Defining some dummy variables for the sake of this example
dfTitles <- c("C2000.AMY", "C2000.ACC", "C2001.AMY", "C2001.ACC")
Copes <- c("Cope1", "Cope2", "Cope3", "Cope4")
Voxels <- c(1:338)
# initialize a list to store the data.frame output
df_list <- list()
for (d in dfTitles) {
# create data.frame with the dfTitle, and 1 row per Copes observation
df <- data.frame(dfTitle = d,
Copes = Copes)
# append columns for Voxels
# setting to NA, can be reassigned later as needed
for (v in Voxels) {
df[[paste0("Voxel", v)]] <- NA
}
# store df in the list as the 'd'th element
df_list[[d]] <- df
# or, assign the object to your environment
# assign(d, df)
}
# data.frames can be referenced by name
names(df_list)
head(df_list$C2000.AMY)
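The list-based approach can also be written without an explicit loop. The following is a minimal sketch, assuming each data frame should start as a 4 x 338 block of zeros as in the question; the "Voxel" column prefix is an illustrative choice, not something the question specifies.

```r
dfTitles <- c("C2000.AMY", "C2000.ACC", "C2001.AMY", "C2001.ACC")
Copes <- c("Cope1", "Cope2", "Cope3", "Cope4")
Voxels <- 1:338

# Build one zero-filled data frame per title and name the list elements
df_list <- setNames(
  lapply(dfTitles, function(title) {
    data.frame(matrix(0,
                      nrow = length(Copes),
                      ncol = length(Voxels),
                      dimnames = list(Copes, paste0("Voxel", Voxels))))
  }),
  dfTitles
)

names(df_list)
dim(df_list[["C2001.ACC"]])
```

Each data frame is then available as df_list[["C2000.AMY"]] and so on, and the same function can later be applied to all of them with lapply().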

Conditionally add named elements to a list

I have a function to perform actions on a variable list of dataframes depending on user selections. The function mostly performs generic actions but there are a few actions that are dataframe specific.
My code runs fine if all dataframes are selected but I am unable to get it to work if not all dataframes are selected.
The following provides a minimal reproducible example:
# User switches.
df1Switch <- TRUE
df2Switch <- TRUE
df3Switch <- TRUE
# DF creation.
set.seed(1)
df <- data.frame(X=sample(1:10), Y=sample(11:20))
if (df1Switch) df1 <- df
if (df2Switch) df2 <- df
if (df3Switch) df3 <- df
# Function to do something.
fn_something <- function(file_list, file_names) {
df <- file_list
# Do lots of generic things.
df$Z <- df$X + df$Y
# Do a few specific things.
if (file_names == "Name1") df$X <- df$X + 1
else if (file_names == "Name2") df$X <- df$Z - 1
else if (file_names == "Name3") df$Y <- df$X + df$Y
return(df)
}
# Call function to do something.
file_list <- list(Name1=df1, Name2=df2, Name3=df3)
file_names <- names(file_list)
all_df <- do.call(rbind,mapply(fn_something, file_list, file_names,
SIMPLIFY=FALSE))
In this case the code runs fine as the user has selected to create all three dataframes. I use a named list so that the specific actions can be performed against the correct dataframes.
The output looks something like this (the actual numbers aren't important):
X Y Z
Name1.1 4 13 16
Name1.2 5 12 16
Name1.3 6 16 21
: : : :
Name2.1 15 13 16
: : : :
The problem arises if the user selects not to create some dataframes, e.g.:
# User switches.
df1Switch <- TRUE
df2Switch <- FALSE
df3Switch <- TRUE
Not surprisingly, in this case an object not found error results:
> # Call function to do something.
> file_list <- list(Name1=df1, Name2=df2, Name3=df3)
Error: object 'df2' not found
What I would like to do is conditionally specify the contents of file_list along the lines of this pseudo code:
file_list <- list(if (df1Switch) {Name1=df1}, if (df2Switch) {Name2=df2}, if (df3Switch) {Name3=df3})
I have come across list.foldLeft and "Conditionally merge list elements", but I don't know if either is suitable.
(I'll re-hash my comment:)
In general, I would encourage you to consider use of a list-of-dataframes instead of individual frames. My rationale for this:
assuming that each frame is structured (nearly) identically; and
assuming that what you do to one frame you will (or at least can) do to all frames; then
it is easier to list_of_frames <- lapply(list_of_frames, some_func) than it is to do something like:
for (nm in c("df1", "df2", "df3")) {
d <- get(nm)
d <- some_func(d)
assign(nm, d)
}
especially when dealing with non-global environments (i.e., doing this within a function).
To be clear, "easier" is subjective: though it does win code-golf, I find it much easier to read and understand that "I am running some_func on each element of list_of_frames and saving the result". (You can even save it to a new list-of-frames, thereby keeping the original frames untouched.)
You may also do things conditionally, as in
needs_work <- sapply(list_of_frames, some_checker_func) # returns logical
# or
needs_work <- c("df1", "df2") # names of elements of list_of_frames
list_of_frames[needs_work] <- lapply(list_of_frames[needs_work], some_func)
Having said that ... the direct answer to your one liner:
c(if (df1Switch) list(Name1=df1), if (df2Switch) list(Name2=df2), if (df3Switch) list(Name3=df3))
This capitalizes on the fact that unstated else results in a NULL, and the NULL-compressing (dropping) characteristic of c(). You can see it in action with:
c(if (T) list(a=1), if (T) list(b=2), if (T) list(d=4))
# $a
# [1] 1
# $b
# [1] 2
# $d
# [1] 4
c(if (T) list(a=1), if (FALSE) list(b=2), if (T) list(d=4))
# $a
# [1] 1
# $d
# [1] 4
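A variation on the same idea, sketched here with made-up data: build the list with a plain if for each element and then drop the NULL entries with Filter(). Because if (FALSE) never evaluates its body, a data frame that was never created (df2 below) is not looked up.

```r
# User switches
df1Switch <- TRUE
df2Switch <- FALSE
df3Switch <- TRUE

# Small illustrative data frame (not the one from the question)
df <- data.frame(X = 1:3, Y = 4:6)
if (df1Switch) df1 <- df
if (df2Switch) df2 <- df
if (df3Switch) df3 <- df

# Switched-off entries become NULL and are then filtered out
file_list <- Filter(Negate(is.null), list(
  Name1 = if (df1Switch) df1,
  Name2 = if (df2Switch) df2,
  Name3 = if (df3Switch) df3
))

names(file_list)
```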

How do I subset a list in R by every 3 indices?

I have a list which has 500 elements, but I want to get every 3rd element and save it to a variable. So I'd want list[1], list[4], list[7], list[10] and so on saved to one variable.
I tried sub.list <- list[1:500, by = 3] but this doesn't work.
L <- as.list(1:500) # create a list
L[seq(1, length(L), 3)]
# or, use recycling
L[c(TRUE, FALSE, FALSE)]
try this:
sub.list<-myList[seq_along(myList)%%3==1]
You can also use Filter:
L <- as.list(1:500) # create a list
# Filter() on seq_along(L) returns the matching indices, which are then used to subset L
L[Filter(function(i) i %% 3 == 1, seq_along(L))]
But won't work if you have NA's.
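As a sanity check, the index-based approaches from the answers above can be compared directly. This sketch uses the same 500-element example list.

```r
L <- as.list(1:500)

by_seq     <- L[seq(1, length(L), by = 3)]   # explicit index sequence
by_recycle <- L[c(TRUE, FALSE, FALSE)]       # logical index, recycled
by_modulo  <- L[seq_along(L) %% 3 == 1]      # positions 1, 4, 7, ...

identical(by_seq, by_recycle)
identical(by_seq, by_modulo)
length(by_seq)  # 167 elements: positions 1, 4, ..., 499
```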

building nested lists in R

I have written a function whose output is a list. I want to put my function into a loop and store the output of each iteration (which is of course a list) in a bigger list. In other words, each element of this BIG list is also a list. c() does not do what I want. Is there any way to do that?
To understand better what I'm asking, consider the example below:
iter1 <- list(item1 = 1, item2 = "a")
iter2 <- list(item1 = 1, item2 = "b")
All <- list(iter1 = iter1, iter2 = iter2)
I want to be able to do something similar to the code above but in a loop. How can I do that?
Thanks for your help,
There's another way to assign to a list, using my_list[[name or number]] <-. If you really want to do that in a loop, just loop over the object names iter1, iter2, ... and fetch each one with get():
A <- list()
n_iter <- 2
for (i in 1:n_iter){
iname <- paste("iter",i,sep="")
A[[iname]] <- get(iname)
}
As @mnel pointed out, dynamically growing a list is inefficient. The alternative is, I think, to use lapply:
n_iter <- 2
inames <- paste("iter",1:n_iter,sep="")
names(inames) <- inames
A <- lapply(inames,get)
This can also be done with a data frame, which would be a better format if your sublists always have two elements, each having a consistent class (item1 being numeric and item2 being character).
n_iter <- 2
DF <- data.frame(item1=rep(0,n_iter),item2=rep("",n_iter),stringsAsFactors=FALSE)
for (i in 1:n_iter){
iname <- paste("iter",i,sep="")
DF[i,] <- get(iname)
rownames(DF)[i] <- iname
}
# item1 item2
# iter1 1 a
# iter2 1 b
However, that's a pretty ugly way of doing things; things get messy pretty quickly when using get. With your data structure, maybe you want to create iter1 and iter2 in a loop and immediately embed them into the parent list or data frame?
n_iter <- 10
DF <- data.frame(item1 = rep(0,n_iter), item2 = rep("",n_iter), stringsAsFactors = FALSE)
for (i in 1:n_iter){
  # ... do stuff to make anum and achar ...
  DF[i,"item1"] <- anum
  DF[i,"item2"] <- achar
}
Where anum and achar are the values of item1 and item2 you want to store from that iteration. Elsewhere on SO, they say that there is an alternative using the data.table package that is almost 10x as fast/efficient as this sort of data-frame assignment.
Oh, one last idea: if you want to put them in a list first, you can easily convert to a data frame later with
DF <- do.call(rbind.data.frame,A)
This gets you the equivalent of your All
c(iter1=list(iter1), iter2=list(iter2))
> identical(c(iter1=list(iter1), iter2=list(iter2)), All)
[1] TRUE
Let's say you'd like to add a third list to All:
c(All, list(iter3=iter3))
If you don't care for the list names, it looks a little cleaner
c(list(iter1), list(iter2))
I think @Frank's answer is the correct one here, but the first example he gave seemed a little strange. I think you want to do this...
bigLL <- list()
for( i in 1:3 ){
ll <- list( item1 = i , item2 = letters[i] )
bigLL[[i]] <- ll
}
bigLL
#[[1]]
#[[1]]$item1
#[1] 1
#[[1]]$item2
#[1] "a"
#[[2]]
#[[2]]$item1
#[1] 2
#[[2]]$item2
#[1] "b"
#[[3]]
#[[3]]$item1
#[1] 3
#[[3]]$item2
#[1] "c"
but you should consider the alternatives by Frank if possible.
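Since an earlier answer notes that growing a list inside a loop is inefficient, the loop above can also be written with a preallocated list. This is a minimal sketch; the "iter" names are only there to mirror the All list from the question.

```r
n_iter <- 3

# Preallocate the outer list instead of growing it one element at a time
bigLL <- vector("list", n_iter)
for (i in seq_len(n_iter)) {
  bigLL[[i]] <- list(item1 = i, item2 = letters[i])
}

# Optional: name the elements iter1, iter2, ...
names(bigLL) <- paste0("iter", seq_len(n_iter))
str(bigLL)
```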
This worked for me very well, hope it helps.
data = list()
for(i in 1:3)
{
tmp = c(1,2,3)
data = rbind(data, tmp)
}

How to apply different functions to different rows of each matrix in a list?

I have a list holding a couple of matrices with the same number of rows (4). Now I want to apply a function like log2(row/something) to, say, rows 1 and 4 and a function like log2(row/something else) to rows 2 and 3.
In code:
# Create list with 2 matrices with 4 rows
l<-list(a=matrix(1:16,nrow=4),b=matrix(17:32,nrow=4))
# Now I thought it might be possible to
nl <- lapply(l, function(x){
log2(x[c(1,4),]/14)
log2(x[2:3,]/23)
})
But the result is that only the last function in the lapply is executed. Also I thought it might be possible to:
nl <- l
lapply(nl, function(x) x[c(1,4),]) <- lapply(l, function(x) log2(x[c(1,4),]/14))
lapply(nl, function(x) x[2:3,]) <- lapply(l, function(x) log2(x[2:3,]/23))
But R really doesn't like that creative solution.
Your first attempt is close, but a function returns only the value of its last expression, so the result of the first log2() call is discarded. Assign each result back into x and return x:
l<-list(a=matrix(1:16,nrow=4),b=matrix(17:32,nrow=4))
nl <- lapply(l, function(x){
x[c(1,4),] <- log2(x[c(1,4),]/14)
x[2:3,] <- log2(x[2:3,]/23)
return(x)
})
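When the only difference between rows is the divisor, a shorter variant is to divide by a vector with one divisor per row. R stores matrices column by column, so a vector whose length equals the number of rows is recycled down each column and lines up with the rows. The sketch below compares this with the row-assignment version; the names divisor, nl_vec and nl_ref are illustrative.

```r
l <- list(a = matrix(1:16, nrow = 4), b = matrix(17:32, nrow = 4))

# One divisor per row: rows 1 and 4 use 14, rows 2 and 3 use 23
divisor <- c(14, 23, 23, 14)
nl_vec <- lapply(l, function(x) log2(x / divisor))

# Reference: the row-assignment approach from the answer above
nl_ref <- lapply(l, function(x) {
  x[c(1, 4), ] <- log2(x[c(1, 4), ] / 14)
  x[2:3, ] <- log2(x[2:3, ] / 23)
  x
})

all.equal(nl_vec, nl_ref)
```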
