Add dataframes to each list element - r

I am reading a series of files into a list of dataframes. After that, I want to attach some additional information to each dataframe, so I want to add some extra elements to each element of my dataframe list.
My attempt was to actually build the list of "extra stuff" and then try to merge it with the list of dataframes.
Example code:
set.seed(42)
#Building my list of data.frames. In my specific case this is coming from files
A <- data.frame(x=rnorm(10), y=rnorm(10))
B <- data.frame(x=rnorm(10), y=rnorm(10))
ListD <- list(A, B)
names(ListD)<- c("A", "B") #some names to know what is what
#now my attributes. Each data.frame has some properties that I want to keep track of.
newList <- list(A=c("Color"=123, "Date"=321), B=c("Color"=111, "Date"=111))
#My desired output is a list where each element of the list has
#"Color", "Date" and a dataframe
#I tried something like:
lapply(ListD, append, values=newList)

As far as I can tell, all you have to do is change your initialization of ListD to:
ListD <- list(list(A), list(B))
The data structure you want is a list of lists, with each inner list holding a data.frame and two further attributes. I can't guarantee this is exactly the result you want, but essentially this is where your problem is located.

OK, I thought this would be straightforward with mapply, but I can't get the lists to play together well... maybe someone else can. So here's a for-loop solution:
#preallocate list
updatedList <- vector(mode = "list", length = length(ListD))
names(updatedList) <- names(ListD)
for(i in seq_along(updatedList)) {
  # ListD[i] keeps the data frame as a single named list element
  updatedList[[i]] <- c(ListD[i], newList[[i]])
}
updatedList$A
# $A
# x y
# 1 -0.51690823 0.4521443
# 2 0.97544933 -0.7212561
# 3 0.98909668 -0.2258737
# 4 -1.72753947 -0.7643175
# 5 -1.31050478 -3.2526437
# 6 -0.63845053 1.1263407
# 7 -0.09010858 -0.9386608
# 8 -0.53933869 -0.6882866
# 9 0.54668290 1.7227261
# 10 -0.87948586 -0.2413344
#
# $Color
# [1] 123
#
# $Date
# [1] 321
Alternatively, if you take @Яaffael's suggestion, mapply works, but that will depend on how you're building the list from the files in the first place:
ListD <- list(list(A), list(B))
updatedList <- mapply(c, ListD, newList, SIMPLIFY = FALSE)
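If changing how ListD is built is not an option, the wrapping can also be done at merge time. This is only a sketch under the question's example data; the element name Dataframe and the variable combined are my own choices, not something from the original post. Map pairs each data frame with its attribute vector, and list(df) keeps the data frame as one element instead of splicing its columns into the result.

```r
set.seed(42)
A <- data.frame(x = rnorm(10), y = rnorm(10))
B <- data.frame(x = rnorm(10), y = rnorm(10))
ListD <- list(A = A, B = B)
newList <- list(A = c(Color = 123, Date = 321), B = c(Color = 111, Date = 111))

# Index newList by names(ListD) so the pairing follows names, not position.
# as.list() turns the named attribute vector into separate list elements.
combined <- Map(function(df, extra) c(list(Dataframe = df), as.list(extra)),
                ListD, newList[names(ListD)])
str(combined, max.level = 2)
```

Each element of combined then holds Dataframe, Color and Date, and the outer names A and B are carried over from ListD.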

I used @Яaffael's suggestion of a list of lists.
Since changing my list of dataframes is not very easy, because of the way I'm reading them from files, I made the list of lists with the extra data and then joined the dataframes to it like this:
newList <- list(A=list("Color"=123, "Date"=321), B=list("Color"=111, "Date"=111))
for(n in names(newList)){
newList[[n]]$Dataframe <- ListD[[n]]
}
The structure of my output:
> str(newList)
List of 2
$ A:List of 3
..$ Color : num 123
..$ Date : num 321
..$ Dataframe:'data.frame': 10 obs. of 2 variables:
.. ..$ x: num [1:10] 1.371 -0.565 0.363 0.633 0.404 ...
.. ..$ y: num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
$ B:List of 3
..$ Color : num 111
..$ Date : num 111
..$ Dataframe:'data.frame': 10 obs. of 2 variables:
.. ..$ x: num [1:10] -0.307 -1.781 -0.172 1.215 1.895 ...
.. ..$ y: num [1:10] 0.455 0.705 1.035 -0.609 0.505 ...

Related

Concatenate a series of lists with incrementing numeric suffixes in R

In R I have a series of lists with incrementing numeric suffixes eg mylist1 , mylist2 , mylist3.
I want to concatenate these , like c(mylist1, mylist2, mylist3)
Is there a shorthand way to manage this?
I think you are trying to create a list of lists.
You can do it simply by calling:
list(list1, list2, list3)
If you have many lists with a similar name pattern, you can use mget to get all objects whose names match that pattern (found with ls(pattern = x)).
Data
mylist8 <- list(1, 2)
mylist9 <- list(3, 4)
mylist10 <- list(5, 6)
# Lists with suffixes 8:10 are used to show why ordering by `parse_number()` matters. Without that step, the objects would be sorted by name, which puts mylist10 before mylist8.
Answer
library(readr) # provides parse_number()
list_of_lists <- mget(ls(pattern = 'list\\d+')[order(parse_number(ls(pattern = 'list\\d+')))])
> str(list_of_lists)
List of 3
$ mylist8:List of 2
..$ : num 1
..$ : num 2
$ mylist9:List of 2
..$ : num 3
..$ : num 4
$ mylist10:List of 2
..$ : num 5
..$ : num 6
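The question literally asks for c(mylist1, mylist2, mylist3), which is a flat concatenation rather than a list of lists. A possible base-R sketch, assuming the objects are named mylist followed by digits and live in the current environment (the names nms and flat are mine):

```r
mylist8 <- list(1, 2)
mylist9 <- list(3, 4)
mylist10 <- list(5, 6)

# Collect the matching object names, then order them by numeric suffix
nms <- ls(pattern = "^mylist\\d+$")
nms <- nms[order(as.integer(sub("^mylist", "", nms)))]

# mget() returns a list of lists; unname() avoids names like "mylist81",
# and do.call(c, ...) splices the inner lists into one flat list
flat <- do.call(c, unname(mget(nms)))
str(flat)
```

Anchoring the pattern with ^ and $ is a deliberate choice here, so that unrelated objects such as list_of_lists are not picked up.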

Convert a vector in R to a nested list

I have a vector of length n and I want to convert it to a nested list. The outer list should be of length n / 2 and each item in this list should have two sub-lists. The first sub-list in each list will hold an odd numbered element from the vector and the second sub-list will hold an even numbered element (see my example code if this isn't making sense... I'm having a hard time describing it in general terms).
Question: Is there a way to convert this vector into a nested list without using for loops?
I'm doing this in the context of a simulation, so I want to make sure it is as fast as possible. The vector will vary in length between iterations of the simulation, so I'm trying to find an answer I can generalize to a vector of length n. The length of the vector will always be even, though.
Example vector and list:
ex_vector <- 1:6
ex_list <- list(
list(1, 2),
list(3, 4),
list(5, 6)
)
Edit: fixed an error in my example code
I'm not sure I understand the general principle, but the code below works for the example:
> x <- apply(matrix(ex_vector, ncol=2, byrow=TRUE), 1, as.list)
> str(x)
List of 3
$ :List of 2
..$ : int 1
..$ : int 2
$ :List of 2
..$ : int 3
..$ : int 4
$ :List of 2
..$ : int 5
..$ : int 6
n <- length(ex_vector)
lapply(split(ex_vector, rep(1:(n/2), each = 2)), split, 1:2)
We create a grouping variable with gl, split the vector into a list, and convert each piece to a list with as.list:
n <- 2
out <- lapply(split(ex_vector, as.integer(gl(length(ex_vector), n,
length(ex_vector)))), as.list)
str(out)
#List of 3
# $ 1:List of 2
# ..$ : int 1
# ..$ : int 2
# $ 2:List of 2
# ..$ : int 3
# ..$ : int 4
# $ 3:List of 2
# ..$ : int 5
# ..$ : int 6
Or use %/% to split
lapply(split(ex_vector, (seq_along(ex_vector)-1) %/% n + 1), as.list)
Or compactly
split(as.list(ex_vector), cumsum(seq_along(ex_vector) %%2))
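For use inside a simulation loop, the %/% variant can be wrapped in a small helper. This is a sketch only; pair_up is a hypothetical name, and the even-length check reflects the assumption stated in the question.

```r
# Hypothetical helper: turn a vector of even length into a list of pairs
pair_up <- function(v) {
  stopifnot(length(v) %% 2 == 0)        # the question guarantees even length
  groups <- (seq_along(v) - 1) %/% 2    # 0, 0, 1, 1, 2, 2, ...
  # split() groups consecutive pairs; as.list() makes each pair a list;
  # unname() drops the "0", "1", ... group labels
  unname(lapply(split(v, groups), as.list))
}

ex_vector <- 1:6
res <- pair_up(ex_vector)
str(res)
```

Whether this is faster than the other variants for a given n would need to be measured, for example with system.time.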

List assignment for list with greater than three nesting

I have not been able to find a fix for this error. I have implemented work-arounds before, but I wonder if anyone here knows why it occurs.
The following returns no error, as expected:
q <- list()
q[["a"]][["b"]] <- 3
q[["a"]][["c"]] <- 4
However, when I add another level of nesting I get:
q <- list()
q[["a"]][["b"]][["c"]]<- 3
q[["a"]][["b"]][["d"]] <- 4
Error in q[["a"]][["b"]][["d"]] <- 4 : more elements supplied than there are to replace
To make this even more confusing if I add a fourth nested list I get:
q <- list()
q[["a"]][["b"]][["c"]][["d"]] <- 3
q[["a"]][["b"]][["c"]][["e"]] <- 4
Error in *tmp*[["c"]] : subscript out of bounds
I would have expected R to return the same error message for the triple nested list as for the quadruple nested list.
I first came across this a few months ago. I am running R 3.4.3.
If we check str(q) after the first pair of assignments, q is a list with a single element 'a'. The assignments create a named vector inside it rather than a list.
q <- list()
q[["a"]][["b"]] <- 3
q[["a"]][["c"]] <- 4
str(q)
#List of 1
# $ a: Named num [1:2] 3 4
# ..- attr(*, "names")= chr [1:2] "b" "c"
is.vector(q$a)
#[1] TRUE
If we try to assign one level deeper, the intermediate element (e.g. 'b') is an atomic value rather than a list, so the second assignment ends up trying to store more than one value in a single vector slot. One option is to create a list element by wrapping the value in list():
q <- list()
q[["a"]][["b"]][["c"]]<- list(3)
q[["a"]][["b"]][["d"]] <- list(4)
This returns a structure where 'q' is a list of 1 element ('a'), which is itself a list of length 1 ('b'). Since we assign two values, 3 and 4, to 'c' and 'd', 'b' is a list of 2 elements:
str(q)
#List of 1
# $ a:List of 1
# ..$ b:List of 2
# .. ..$ c:List of 1
# .. .. ..$ : num 3
# .. ..$ d:List of 1
# .. .. ..$ : num 4
This way, we can nest any number of lists:
q <- list()
q[["a"]][["b"]][["c"]][["d"]] <- list(3)
q[["a"]][["b"]][["c"]][["e"]] <- list(4)
Note: the expected output structure is not clear from the question.
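If the leaves should be plain numbers instead of length-1 lists, another option is to create the intermediate levels as lists up front, so that no level has to be filled in implicitly. A minimal sketch, assuming the desired structure is q$a$b holding c and d directly:

```r
# Create the intermediate levels explicitly as (empty) lists
q <- list(a = list(b = list()))

# Both assignments now target an existing list, so each adds one element
q[["a"]][["b"]][["c"]] <- 3
q[["a"]][["b"]][["d"]] <- 4
str(q)
```

This does not rely on how `[[<-` builds missing levels, which is the behaviour that produced the named vector above.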

Applying "dim" function over elements of a list in R

I have n lists, let's say list1, list2, ..., listn. Each list has 10 elements and I need to calculate the "mean" of the "dim" of the ten elements of each list. So the output should be a vector of length n.
For example the first element of the output vector should be:
n1 = mean(dim(list1[[1]]), dim(list1[[2]]), dim(list1[[3]]), ..., dim(list1[[10]]))
I know how to obtain it using for-loops but I am sure it is not the best solution.
The lists have structure derived from one of "Bioconductor" R packages called "edgeR".
So each element of the list has this structure:
$ :Formal class 'TopTags' [package "edgeR"] with 1 slots
.. ..@ .Data:List of 4
.. .. ..$ :'data.frame': 2608 obs. of 4 variables:
.. .. .. ..$ logFC : num [1:2608] 6.37 -6.48 -5.72 -5.6 -4.01 ...
.. .. .. ..$ logCPM: num [1:2608] 5.1 2.55 2.08 1.57 3.08 ...
.. .. .. ..$ PValue: num [1:2608] 3.16e-292 1.57e-187 2.15e-152 5.58e-141 1.27e-135 ...
.. .. .. ..$ FDR : num [1:2608] 7.37e-288 1.83e-183 1.67e-148 3.25e-137 5.92e-132 ...
.. .. ..$ : chr "BH"
.. .. ..$ : chr [1:2] "healthy" "cancerous"
.. .. ..$ : chr "exact"
And since each list has 10 elements, I have 10 repeats of above structure when running:
str(list1)
Original question
lapply (or sapply) is your friend:
mean(sapply(mylist,dim))
If you have many lists with a uniform meaning and structure, you should use instead a list of lists (i.e., mylist[[3]] instead of mylist3).
Edited question
sapply(mylist, function(x) mean(sapply(x,dim)))
will return a vector of means of inner lists.
Question in a comment
If your list contains matrices instead of vectors and you want to average one of the dimensions (dim(.)[1] or dim(.)[2]), you can use ncol and nrow for that instead of dim.
Alternatively, you can pass any function there, e.g.,
sapply(mylist, function(x) mean(sapply(x, function(y) sum(dim(y)))))
to average the sums of dimensions.
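A small worked example may make the difference between dim and nrow concrete. The data here are hypothetical stand-ins (plain data frames of known size built by make_df), not edgeR objects, so whether dim applies to the real elements is an assumption to check.

```r
# Hypothetical stand-ins for the real objects: data frames of known size
make_df <- function(n) data.frame(v = seq_len(n), w = seq_len(n))
mylist <- list(
  list1 = list(make_df(2), make_df(4)),
  list2 = list(make_df(10), make_df(20), make_df(30))
)

# dim() returns c(rows, cols), so this averages row and column counts together
mixed_means <- sapply(mylist, function(x) mean(sapply(x, dim)))

# nrow() returns one number per object, so this is the mean row count per list
row_means <- sapply(mylist, function(x) mean(sapply(x, nrow)))

mixed_means
row_means
```

With two columns in every data frame, mixed_means is pulled towards 2, while row_means reflects only the number of rows.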
If all your objects are called "list*" and you have no other objects with the names list in them, you can easily stick all the lists into a single list object which will make it easier to operate on them...
ll <- mget( ls( pattern = "list" ) )
sapply( ll , function(x) mean( sapply( x , dim ) ) )
Here is a solution using the Map function, where mylist is your list:
Map(function(x) mean(x[[1]]:x[[10]]), mylist)
Example:
a<-list(1,2,3,4)
b<-list(2,3,5,6)
mylist<-list(a,b)
k<- Map(function(x) mean(x[[1]]:x[[4]]), mylist)
>k
[[1]]
[1] 2.5
[[2]]
[1] 4
To convert to vector:
> do.call(rbind,k)
[,1]
[1,] 2.5
[2,] 4.0
OR,
library(plyr)
ldply(k)
V1
1 2.5
2 4.0
If the elements of each list are matrices:
Map(function(x) mean(dim(x[[1]])[1]:dim(x[[10]])[1]), mylist)

combining and operating on matrices twice nested in a list

If xmpl is a list where each element has an integer age and a list data, and data contains three matrices of equal size (a to c), what is the best way to do the following?
cor( xmpl[[:]]$data[[:]][c('a','b','c')], xmpl[[:]]$age)
The result would be a 3 x length(a) array or list that holds the correlation of age with each element of a (row 1), b (row 2), and c (row 3) across xmpl.
I am reading in matrices that represent the output of different pipelines. There are 3 of these per subject and a whole lot of subjects. Currently, I've built a list of subjects that has among other things a list of pipeline matrices.
The structure looks like:
str(xmpl)
$ :List of 4
..$ id : int 5
..$ age : num 10
..$ data :List of 3
.. ..$ a: num [1:10, 1:10] 0.782 1.113 3.988 0.253 4.118 ...
.. ..$ b: num [1:10, 1:10] 5.25 5.31 5.28 5.43 5.13 ...
.. ..$ c: num [1:10, 1:10] 1.19e-05 5.64e-03 7.65e-01 1.65e-03 4.50e-01 ...
..$ otherdata: chr "ignorefornow"
#[...]
I want to correlate every element of a across all subjects with the age of subjects. Then do the same for b and c and put the results into a list.
I think I am approaching this in a way that is awkward for R. I'm interested in what the "R way" of storing and retrieving this data would be.
Data Structure and desired output http://dl.dropbox.com/u/56019781/linked/struct-2012-12-19.svg
library(plyr)
## example structure
xmpl.mat <- function(){ matrix(runif(100),nrow=10) }
xmpl.list <- function(x){ list( id=x, age=2*x, data=list( a=x*xmpl.mat(), b=x+xmpl.mat(), c=xmpl.mat()^x ), otherdata='ignorefornow' ) }
xmpl <- lapply( 1:5, xmpl.list )
## extract
ages <- laply(xmpl,'[[','age')
data <- llply(xmpl,'[[','data')
# to get the cor for one set of matrices is easy enough
# though it would be nice to do: a <- xmpl[[:]]$data$a
x.a <- sapply(data,'[[','a')
x.a.corr <- apply(x.a,1,cor,ages)
# ...
#xmpl.corr <- list(x.a.corr,x.b.corr,x.c.corr)
# and by loop, not R like?
xmpl.corr<-list()
for (i in seq_along(data[[1]])) {
  # one column per subject, one row per matrix cell
  x <- sapply(data,'[[',i)
  xmpl.corr[[i]] <- apply(x,1,cor,ages)
}
names(xmpl.corr) <- names(data[[1]])
Final output:
str(xmpl.corr)
List of 3
$ a: num [1:100] 0.712 -0.296 0.739 0.8 0.77 ...
$ b: num [1:100] 0.98 0.997 0.974 0.983 0.992 ...
$ c: num [1:100] -0.914 -0.399 -0.844 -0.339 -0.571 ...
Here's a solution. It should be short enough.
ages <- sapply(xmpl, "[[", "age") # extract ages
data <- sapply(xmpl, function(x) unlist(x[["data"]])) # combine all matrices
corr <- apply(data, 1, cor, ages) # calculate correlations
xmpl.corr <- split(corr, substr(names(corr), 1, 1)) # split the vector
Instead of x.a, x.b, x.c you would probably want to have all of these in one list.
# First, get a list of the items in data
abc <- names(xmpl[[1]]$data) # in case variables change in future
names(abc) <- abc # these are the same names that will be used for the final list. You can use whichever names make sense
## use lapply to keep as list, use sapply to "simplify" the list
x.data.list <- lapply(abc, function(z)
sapply(xmpl, function(xm) c(xm$data[[z]])) )
ages <- sapply(xmpl, `[[`, "age")
# Then compute the correlations. Note that on each element of x.data.list we are apply'ing per row
correlations <- lapply(x.data.list, apply, 1, cor, ages)
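On the question of a more natural storage layout: one option is to stack the per-subject matrices into a 3-d array, so the correlations keep the 10 x 10 shape of the original matrices. This is a sketch using the question's example generator (set.seed is added here only for reproducibility; arr_a and cor_a are my own names), shown for a only.

```r
set.seed(1)
xmpl.mat <- function() matrix(runif(100), nrow = 10)
xmpl.list <- function(x) list(id = x, age = 2 * x,
  data = list(a = x * xmpl.mat(), b = x + xmpl.mat(), c = xmpl.mat()^x))
xmpl <- lapply(1:5, xmpl.list)
ages <- sapply(xmpl, `[[`, "age")

# Stack one matrix per subject into a 10 x 10 x 5 array
arr_a <- sapply(xmpl, function(s) s$data$a, simplify = "array")

# For each cell, correlate its values across subjects with age (10 x 10 result)
cor_a <- apply(arr_a, c(1, 2), cor, ages)
dim(cor_a)
```

Flattening cor_a with c() gives the values in the same order as the row-wise apply over the flattened matrices used in the answers above.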

Resources