building nested lists in R - r

I have written a function that it's output is a list. I want to put my function into a loop and I'd like to store output of each iteration (which is of course a list) into a bigger list. In other words, each element of this BIG list is also a list. c() does not do what I want. Is there any way to do that?
To understand better what I'm asking, consider the example below:
iter1 <- list(item1 = 1, item2 = "a")
iter2 <- list(item1 = 1, item2 = "b")
All <- list(iter1 = iter1, iter2 = iter2)
I want to be able to do something similar to the code above but in a loop. How can I do that?
Thanks for your help,

There's another way to assign to a list, using my_list[[name or number]] <-. If you really want to do that in a loop, just looping over things with names like iter1, iter2, ...
A <- list()
n_iter <- 2
for (i in 1:n_iter){
iname <- paste("iter",i,sep="")
A[[iname]] <- get(iname)
}
As #mnel pointed out, dynamically growing a list is inefficient. The alternative is, I think, to use lapply:
n_iter <- 2
inames <- paste("iter",1:n_iter,sep="")
names(inames) <- inames
A <- lapply(inames,get)
This can also be done with a data frame, which would be a better format if your sublists always have two elements, each having a consistent class (item1 being numeric and item 2 being character).
n_iter <- 2
DF <- data.frame(item1=rep(0,n_iter),item2=rep("",n_iter),stringsAsFactors=FALSE)
for (i in 1:n_iter){
iname <- paste("iter",i,sep="")
DF[i,] <- get(iname)
rownames(DF)[i] <- iname
}
# item1 item2
# iter1 1 a
# iter2 1 b
However, that's a pretty ugly way of doing things; things get messy pretty quickly when using get. With your data structure, maybe you want to create iter1 and iter2 in a loop and immediately embed them into the parent list or data frame?
n_iter = 10
DF <- data.frame(item1 = rep(0,n_iter), item2 = rep("",n_iter))
for (i in 1:n_iter){
... do stuff to make anum and achar ...
DF[i,"item1"] <- anum
DF[i,"item2"] <- achar
}
Where anum and achar are the values of item1 and item2 you want to store from that iteration. Elsewhere on SO, they say that there is an alternative using the data.table package that is almost 10x as fast/efficient as this sort of data-frame assignment.
Oh, one last idea: if you want to put them in a list first, you can easily convert to a data frame later with
DF <- do.call(rbind.data.frame,A)

This gets you the equivalent of your All
c(iter1=list(iter1), iter2=list(iter2))
> identical(c(iter1=list(iter1), iter2=list(iter2)), All)
[1] TRUE
Let's say you'd like to add a third list to All:
c(All, list(iter3=iter3))
If you don't care for the list names, it looks a little cleaner
c(list(iter1), list(iter2))

I think #Frank's answer is the correct one here, but the first example he gave seemed a little strange. I think you want to do this...
bigLL <- list()
for( i in 1:3 ){
ll <- list( item1 = i , item2 = letters[i] )
bigLL[[i]] <- ll
}
bigLL
#[[1]]
#[[1]]$item1
#[1] 1
#[[1]]$item2
#[1] "a"
#[[2]]
#[[2]]$item1
#[1] 2
#[[2]]$item2
#[1] "b"
#[[3]]
#[[3]]$item1
#[1] 3
#[[3]]$item2
#[1] "c"
but you should consider the alternatives by Frank if possible.

This worked for me very well, hope it helps.
data = list()
for(i in 1:3)
{
tmp = c(1,2,3)
data = rbind(data, tmp)
}

Related

Add dataframe to a nested list in a nested for-loop, as in: listname[[xx]][yy] <- value

I have a nested loop during which I want to add dataframes into a list, as in:
listname <- list()
for(xx in 1:50) {
listname[[xx]] <- list()
for(yy in 1:25) {
my_df <- data.frame(aa = c(1,2,".."), bb = c(1,2,".."))
listname[[xx]][yy] <- my_df
}
}
However, I get the warning message:
"number of items to replace is not a multiple of replacement length"
... and as a result, only my_df$aa is in listname[[xx]][yy], but not my_df$bb.
What would be a more appropriate way to accomplish a nested list of dataframes with this approach?
Thank you!
EDIT: Found the solution! It turns out that the only mistake was the lack of a further bracket. It should have been listname[[xx]][[yy]] <- my_df. Thanks to Ben in the comments; it works now.
You will want to use double brackets [[ for yy in storing your data frames as follows:
listname[[xx]][[yy]] <- my_df
The use of [[ is important when working with lists, since subsetting a list with [ (single bracket alone) always returns a smaller list.
To see the difference, take a look at listname[[1]][1]:
[[1]]
aa bb
1 1 1
2 2 2
3 .. ..
This is a list returned.
Then we can compare with listname[[1]][[1]]:
aa bb
1 1 1
2 2 2
3 .. ..
Which is a data.frame.
In this case, when you are want to replace a single value in a list with a data.frame, you will want to use [[.
You can try with replicate to get such nested list of dataframes.
my_df <- data.frame(aa = c(1,2,".."), bb = c(1,2,".."))
listname <- replicate(50, replicate(25,
my_df, simplify = FALSE), simplify = FALSE)

Conditionally add named elements to a list

I have a function to perform actions on a variable list of dataframes depending on user selections. The function mostly performs generic actions but there are a few actions that are dataframe specific.
My code runs fine if all dataframes are selected but I am unable to get it to work if not all dataframes are selected.
The following provides a minimal reproducible example:
# User switches.
df1Switch <- TRUE
df2Switch <- TRUE
df3Switch <- TRUE
# DF creation.
set.seed(1)
df <- data.frame(X=sample(1:10), Y=sample(11:20))
if (df1Switch) df1 <- df
if (df2Switch) df2 <- df
if (df3Switch) df3 <- df
# Function to do something.
fn_something <- function(file_list, file_names) {
df <- file_list
# Do lots of generic things.
df$Z <- df$X + df$Y
# Do a few specific things.
if (file_names == "Name1") df$X <- df$X + 1
else if (file_names == "Name2") df$X <- df$Z - 1
else if (file_names == "Name3") df$Y <- df$X + df$Y
return(df)
}
# Call function to do something.
file_list <- list(Name1=df1, Name2=df2, Name3=df3)
file_names <- names(file_list)
all_df <- do.call(rbind,mapply(fn_something, file_list, file_names,
SIMPLIFY=FALSE))
In this case the code runs fine as the user has selected to create all three dataframes. I use a named list so that the specific actions can be performed against the correct dataframes.
The output looks something like this (the actual numbers aren't important):
X Y Z
Name1.1 4 13 16
Name1.2 5 12 16
Name1.3 6 16 21
: : : :
Name2.1 15 13 16
: : : :
The problem arises if the user selects not to create some dataframes, e.g.:
# User switches.
df1Switch <- TRUE
df2Switch <- FALSE
df3Switch <- TRUE
Not surprisingly, in this case an object not found error results:
> # Call function to do something.
> file_list <- list(Name1=df1, Name2=df2, Name3=df3)
Error: object 'df2' not found
What I would like to do is conditionally specify the contents of file_list along the lines of this pseudo code:
file_list <- list(if (df1Switch) {Name1=df1}, if (df2Switch) {Name2=df2}, if (df3Switch) {Name3=df3})
I have come across list.foldLeft
Conditionally merge list elements but I don't know if this is suitable.
(I'll re-hash my comment:)
In general, I would encourage you to consider use of a list-of-dataframes instead of individual frames. My rationale for this:
assuming that each frame is structured (nearly) identically; and
assuming that what you do to one frame you will (or at least can) do to all frames; then
it is easier to list_of_frames <- lapply(list_of_frames, some_func) than it is to do something like:
for (nm in c("df1", "df2", "df3")) {
d <- get(nm)
d <- some_func(d)
assign(nm, d)
}
especially when dealing with non-global environments (i.e., doing this within a function).
To be clear, "easier" is subjective: though it does win code-golf, I find it much easier to read and understand that "I am running some_func on each element of list_of_frames and saving the result". (You can even save it to a new list-of-frames, thereby keeping the original frames untouched.)
You may also do things conditionally, as in
needs_work <- sapply(list_of_frames, some_checker_func) # returns logical
# or
needs_work <- c("df1", "df2") # names of elements of list_of_frames
list_of_frames[needs_work] <- lapply(list_of_frames[needs_work], some_func)
Having said that ... the direct answer to your one liner:
c(if (df1Switch) list(Name1=df1), if (df2Switch) list(Name2=df2), if (df3Switch) list(Name3=df3))
This capitalizes on the fact that unstated else results in a NULL, and the NULL-compressing (dropping) characteristic of c(). You can see it in action with:
c(if (T) list(a=1), if (T) list(b=2), if (T) list(d=4))
# $a
# [1] 1
# $b
# [1] 2
# $d
# [1] 4
c(if (T) list(a=1), if (FALSE) list(b=2), if (T) list(d=4))
# $a
# [1] 1
# $d
# [1] 4

How to get a value from vector and assign it as a name of new dataframe

IN R
I have a vector of NAME:
[1] "ALKR50SV" "AMKR71SV" "AOKR71SV" "AZKR52SV" "BFKR70SV" "BJKR61SV" "BUKR6HSV"
"CDKR61SV" "CFKR31SV"
I want to use them as a name for each new dataframe
Like dataframe of ALKR50SV, dataframe of ALKR50SV ......
for loop like:
NAME[i] <- data1
will cause problem.
What should I do? Thank you.
As #joran and #neilfws said, best to work with a list of data.frames.
For example, consider the following list of three data.frames
lst <- lapply(1:3, function(x) as.data.frame(matrix(sample(20), ncol = 4)));
You can name list elements
names(lst) <- c("ALKR50SV", "AMKR71SV", "AOKR71SV");
and operate on list elements using lapply, e.g.
lapply(lst, dim);
#$ALKR50SV
#[1] 5 4
#
#$AMKR71SV
#[1] 5 4
#
#$AOKR71SV
#[1] 5 4
You can use assign:
numbers <- c('one', 'two', 'three')
for (i in 1:3) {
assign(nms[i], i)
}
one # 1
two # 2
three # 3
But as others have commented, it is most likely better to put your dataframes into a named list.

rbind all data frames with common names based on list using lapply

I have several data frames named as such:
orange_ABC
orange_BCD
apple_ABC
apple_BCD
grape_ABC
grape_BCD
I need to rbind those that have the first part of their name in common (orange, apple, grape), and name the new data frames as such. I'm accessing the names from a list of data frames names(fruitlist) (from which I made the aforementioned data frames) and have tried using lapply with function(x) with no luck. I'm somewhat new to R, so think I'm making a simple mistake when it comes to dynamically naming the new data frame...
lapply(names(fruitlist),
function(x){
frame_nm <- toString((names(fruitlist[x])))
frame_nm <- do.call(rbind, mget(ls(pattern=paste0((names(splitlist[x])),"*"))))
})
I've tried the standalone line on one type of "fruit" and it seems to work:
test_DF <- do.call(rbind, mget(ls(pattern="apple*")))
EDIT: I realize I forgot to mention that the example list of 6 data frames were created dynamically, so I can't simply generate a list of them. However, I do have a list of the "fruits", and all possible the ends of the new data frame names are known ("_ABC" and "_BCD").
As suspected, the proposed way of assigning values to objects does not work. Moreover, care has to be taken when using ls() and mget() for listing and accessing named objects within a function, because they do not automatically ascend to parent environments and only "see" variables in the local scope unless told otherwise. This applies to R version 3.4, older versions may behave differently.
Creating named objects.
In order to create new objects in the global environment, use assign() (already suggested in Luke C's answer):
> assign("foo", "some text")
> foo
[1] "some text"
Placing code inside a function induces a local scope. Explicitly specifying the global environment allows setting global variables:
> set_foo <- function (x) { assign("foo", x, envir=globalenv()) }
> set_foo("other text")
> foo
[1] "other text"
Note that omitting the envir argument would leave the global environment unaffected.
Use of ls()/mget() within a local function.
By default, this only lists names from the current (local) environment of the that function, which only sees the argument x in the example code given in the question. Similar to above, a quick fix is to specify the global environment explicitly by adding the argument envir=globalenv(). The same applies for mget().
Since no MWE was provided, I am taking the liberty of adapting the "fake data" example code provided in Luke C's answer.
# Populate environment
namelist <- paste(fruit = rep(c("orange", "apple", "grape"), 2),
nums = rep(c("_ABC", "_BCD"), each = 3), sep = "")
for(x in namelist)
assign(x, data.frame(a = 1:4, b = 11:14))
# The following re-generates the list of fruits used above
grouplist <- unique(unlist(lapply(strsplit(namelist, "_"), function (x) { x[[1]] })))
# Group and rbind by prefix, suppressing output
invisible(lapply(grouplist,
function(x) {
grouped <- do.call(rbind,
mget(ls(pattern=paste0(x,"_*"), envir=globalenv()),
envir=globalenv()))
assign(x, grouped, envir=globalenv())
}))
If your fruitlist is a named list of data frames, maybe this will suit.
First, get the like names into their own list:
fruit.groups <- split(names(fruitlist),
sapply(strsplit(names(fruitlist), split = "_"), "[[", 1))
> fruit.groups
$apple
[1] "apple_ABC" "apple_BCD"
$grape
[1] "grape_ABC" "grape_BCD"
$orange
[1] "orange_ABC" "orange_BCD"
Then, use lapply to rbind by group:
fdf <- lapply(fruit.groups, function(x){
out <- do.call(rbind, fruitlist[x])
out$from <- gsub("(\\..*)", "", rownames(out))
rownames(out) <- NULL
return(out)
})
> fdf$apple
a b from
1 1 11 apple_ABC
2 2 12 apple_ABC
3 3 13 apple_ABC
4 4 14 apple_ABC
5 1 11 apple_BCD
6 2 12 apple_BCD
7 3 13 apple_BCD
8 4 14 apple_BCD
Fake data:
namelist <- paste(fruit = rep(c("orange", "apple", "grape"), 2),
nums = rep(c("_ABC", "_BCD"), each = 3), sep = "")
fruitlist <- llply(namelist, function(x){
assign(as.character(x), data.frame(a = 1:4, b = 11:14))
})
EDIT:
From the edits to your question above:
If you have the fruits and suffixes, use expand.grid to get all possible combinations (assuming that all combinations will refer to the dynamically generated data frames).
fruits <- c("orange", "apple", "grape")
suffixes <- c("_ABC", "_BCD")
fullnames <- apply(expand.grid(fruits, suffixes), 1, paste, collapse = "")
Using that list of names, use mget to generate a list of the present dataframes.
new_fruit_df_list <- mget(fullnames)
Then, the code from above should work, modified here to reflect the name changes:
fruit.groups <- split(names(new_fruit_df_list),
sapply(strsplit(names(new_fruit_df_list), split = "_"), "[[", 1))
fdf <- lapply(fruit.groups, function(x){
out <- do.call(rbind, new_fruit_df_list[x])
out$from <- gsub("(\\..*)", "", rownames(out))
rownames(out) <- NULL
return(out)
})
Have a look at the head of each, with the added column (remove if you don't want it) showing the name of that row's original data frame.
> lapply(fdf, head, 2)
$apple
a b from
1 1 11 apple_ABC
2 2 12 apple_ABC
$grape
a b from
1 1 11 grape_ABC
2 2 12 grape_ABC
$orange
a b from
1 1 11 orange_ABC
2 2 12 orange_ABC
Give this a try:
file_groups <- ls()[grep(".*_.*", ls())]
file_groups <- gsub("(.*)_.*", "\\1", file_groups)
df_list <- lapply(file_groups,
function(x){ do.call(rbind, mget(ls(pattern = paste0(x, "*"))))})

Transpose a List of Lists

I have a list which contains list entries, and I need to transpose the structure.
The original structure is rectangular, but the names in the sub-lists do not match.
Here is an example:
ax <- data.frame(a=1,x=2)
ay <- data.frame(a=3,y=4)
bw <- data.frame(b=5,w=6)
bz <- data.frame(b=7,z=8)
before <- list( a=list(x=ax, y=ay), b=list(w=bw, z=bz))
What I want:
after <- list(w.x=list(a=ax, b=bw), y.z=list(a=ay, b=bz))
I do not care about the names of the resultant list (at any level).
Clearly this can be done explicitly:
after <- list(x.w=list(a=before$a$x, b=before$b$w), y.z=list(a=before$a$y, b=before$b$z))
but this is ugly and only works for a 2x2 structure. What's the idiomatic way of doing this?
The following piece of code will create a list with i-th element of every list in before:
lapply(before, "[[", i)
Now you just have to do
n <- length(before[[1]]) # assuming all lists in before have the same length
lapply(1:n, function(i) lapply(before, "[[", i))
and it should give you what you want. It's not very efficient (travels every list many times), and you can probably make it more efficient by keeping pointers to current list elements, so please decide whether this is good enough for you.
The purrr package now makes this process really easy:
library(purrr)
before %>% transpose()
## $x
## $x$a
## a x
## 1 1 2
##
## $x$b
## b w
## 1 5 6
##
##
## $y
## $y$a
## a y
## 1 3 4
##
## $y$b
## b z
## 1 7 8
Here's a different idea - use the fact that data.table can store data.frame's (in fact, given your question, maybe you don't even need to work with lists of lists and could just work with data.table's):
library(data.table)
dt = as.data.table(before)
after = as.list(data.table(t(dt)))
While this is an old question, i found it while searching for the same problem, and the second hit on google had a much more elegant solution in my opinion:
list_of_lists <- list(a=list(x="ax", y="ay"), b=list(w="bw", z="bz"))
new <- do.call(rbind, list_of_lists)
new is now a rectangular structure, a strange object: A list with a dimension attribute. It works with as many elements as you wish, as long as every sublist has the same length. To change it into a more common R-Object, one could for example create a matrix like this:
new.dims <- dim(new)
matrix(new,nrow = new.dims[1])
new.dims needed to be saved, as the matrix() function deletes the attribute of the list. Another way:
new <- do.call(c, new)
dim(new) <- new.dims
You can now for example convert it into a data.frame with as.data.frame() and split it into columns or do column wise operations. Before you do that, you could also change the dim attribute of the matrix, if it fits your needs better.
I found myself with this problem but I needed a solution that kept the names of each element. The solution I came up with should also work when the sub lists are not all the same length.
invertList = function(l){
elemnames = NULL
for (i in seq_along(l)){
elemnames = c(elemnames, names(l[[i]]))
}
elemnames = unique(elemnames)
res = list()
for (i in seq_along(elemnames)){
res[[elemnames[i]]] = list()
for (j in seq_along(l)){
if(exists(elemnames[i], l[[j]], inherits = F)){
res[[i]][[names(l)[j]]] = l[[names(l)[j]]][[elemnames[i]]]
}
}
}
res
}

Resources