R - Refactor list of lists [duplicate] - r

I have a list which contains list entries, and I need to transpose the structure.
The original structure is rectangular, but the names in the sub-lists do not match.
Here is an example:
ax <- data.frame(a=1,x=2)
ay <- data.frame(a=3,y=4)
bw <- data.frame(b=5,w=6)
bz <- data.frame(b=7,z=8)
before <- list( a=list(x=ax, y=ay), b=list(w=bw, z=bz))
What I want:
after <- list(w.x=list(a=ax, b=bw), y.z=list(a=ay, b=bz))
I do not care about the names of the resultant list (at any level).
Clearly this can be done explicitly:
after <- list(x.w=list(a=before$a$x, b=before$b$w), y.z=list(a=before$a$y, b=before$b$z))
but this is ugly and only works for a 2x2 structure. What's the idiomatic way of doing this?

The following piece of code will create a list with i-th element of every list in before:
lapply(before, "[[", i)
Now you just have to do
n <- length(before[[1]]) # assuming all lists in before have the same length
lapply(1:n, function(i) lapply(before, "[[", i))
and it should give you what you want. It's not very efficient (travels every list many times), and you can probably make it more efficient by keeping pointers to current list elements, so please decide whether this is good enough for you.

The purrr package now makes this process really easy:
library(purrr)
before %>% transpose()
## $x
## $x$a
## a x
## 1 1 2
##
## $x$b
## b w
## 1 5 6
##
##
## $y
## $y$a
## a y
## 1 3 4
##
## $y$b
## b z
## 1 7 8

Here's a different idea - use the fact that data.table can store data.frame's (in fact, given your question, maybe you don't even need to work with lists of lists and could just work with data.table's):
library(data.table)
dt = as.data.table(before)
after = as.list(data.table(t(dt)))

While this is an old question, i found it while searching for the same problem, and the second hit on google had a much more elegant solution in my opinion:
list_of_lists <- list(a=list(x="ax", y="ay"), b=list(w="bw", z="bz"))
new <- do.call(rbind, list_of_lists)
new is now a rectangular structure, a strange object: A list with a dimension attribute. It works with as many elements as you wish, as long as every sublist has the same length. To change it into a more common R-Object, one could for example create a matrix like this:
new.dims <- dim(new)
matrix(new,nrow = new.dims[1])
new.dims needed to be saved, as the matrix() function deletes the attribute of the list. Another way:
new <- do.call(c, new)
dim(new) <- new.dims
You can now for example convert it into a data.frame with as.data.frame() and split it into columns or do column wise operations. Before you do that, you could also change the dim attribute of the matrix, if it fits your needs better.

I found myself with this problem but I needed a solution that kept the names of each element. The solution I came up with should also work when the sub lists are not all the same length.
invertList = function(l){
elemnames = NULL
for (i in seq_along(l)){
elemnames = c(elemnames, names(l[[i]]))
}
elemnames = unique(elemnames)
res = list()
for (i in seq_along(elemnames)){
res[[elemnames[i]]] = list()
for (j in seq_along(l)){
if(exists(elemnames[i], l[[j]], inherits = F)){
res[[i]][[names(l)[j]]] = l[[names(l)[j]]][[elemnames[i]]]
}
}
}
res
}

Related

Loop over elements in vector, and elements are matrices

I started using R today, so I apologize if this is too basic.
First I construct 2 matrices, and construct a vector, whose entries are these matrices. Then, I try to loop over the elements of the vector, i.e. the matrices. However, when I do, I get a "argument of length zero" error.
cam <- 1:12
ped <- 13:24
dim(cam) <- c(3,4)
dim(ped) <- c(4,3)
mats <- c('cam','ped')
for (i in 1:2) {
rownames(mats[i]) <- LETTERS[1:dim(mats[i])[1]]
colnames(mats[i]) <- LETTERS[1:dim(mats[i])[2]]
}
The error text is as follows:
Error in 1:dim(mats[i])[1] : argument of length 0
The question: how to loop over elements of a vector, these elements being matrices? (I'm guessing I'm not calling the elements correctly). Thank you for patience.
The go-to option in R is to use lists:
cam <- 1:12 ; dim(cam) <- c(3,4)
# same as matrix(1:12, nrow = 3, ncol = 4)
ped <- 13:24 ; dim(ped) <- c(4,3)
# save the list ( '=' sign for naming purposes only here)
mats <- list(cam = cam, ped = ped)
# notice the double brackets '[[' which is used for picking the list
for (i in 1:length(mats) {
rownames(mats[[i]]) <- LETTERS[1:dim(mats[[i]])[1]]
colnames(mats[[i]]) <- LETTERS[1:dim(mats[[i]])[2]]
}
# finally you can call the whole list at once as follows:
mats
# or seperately using $ or [[
mats$cam # mats[['cam']]
mats$ped # mats[['ped']]
ALTERNATIVELY
If you really want to get crazy you can take advantage of the get() and assign() functions. get() calls an object by character, and assign() can create one.
mats <- c('cam','ped')
mats.new <- NULL # initialize a matrix placeholder
for (i in 1:length(mats)) {
mats.new <- get(mats[i]) # save as a new matrix each loop
# use dimnames with a list input to do both the row and column at once
dimnames(mats.new) <- list(LETTERS[1:dim(mats.new)[1]],
LETTERS[1:dim(mats.new)[2]])
assign(mats[i],mats.new) # create (re-write) the matrix
}
If the datasets are placed in a list we can use lapply
lst <- lapply(mget(mats), function(x) {
dimnames(x) <- list(LETTERS[seq_len(nrow(x))], LETTERS[seq_len(ncol(x))])
x})
It is better to keep it in a list. In case the original objects needs to be changed
list2env(lst, envir = .GlobalEnv)

how to add value to existing variable from inside a loop?

I want to add a computed value to an existing vector from within a loop in which the wanted vector is called from within the loop . that is im looking for some function that is similar to assign() function but that will enable me to add values to an existing variables and not creating new variables.
example:
say I have 3 variabels :
sp=3
for(i in 1:sp){
name<-paste("sp",i,sep="")
assign(name,rnorm(5))
}
and now I want to access the last value in each of the variabels, double it and add the resault to the vector:
for(i in 1:sp){
name<-paste("sp",i,sep="")
name[6]<-name[5]*2
}
the problem here is that "name" is a string, how can R identify it as a veriable name and access it?
What you are asking for is something like this:
get(name)
In your code it would like this:
v <- 1:10
var <- "v"
for (i in v){
tmp <- get(var)
tmp[6] <- tmp[5]*2
assign(var, tmp)
}
# [1] 1 2 3 4 5 10 7 8 9 10
Does that help you in any way?
However, I agree with the other answer, that lists and the lapply/sapply-functions are better suited!
This is how you can do this with a list:
sp=3
mylist <- vector(mode = "list", length = sp) #initialize a list
names(mylist) <- paste0("sp",seq_len(sp)) #set the names
for(i in 1:sp){
mylist[[i]] <- rnorm(5)
}
for(i in 1:sp){
mylist[[i]] <- c(mylist[[i]], mylist[[i]][5] * 2)
}
mylist
#$sp1
#[1] 0.6974563 0.7714190 1.1980534 0.6011610 -1.5884306 -3.1768611
#
#$sp2
#[1] -0.2276942 0.2982770 0.5504381 -0.2096708 -1.9199551 -3.8399102
#
#$sp3
#[1] 0.235280995 0.276813498 0.002567075 -0.774551774 0.766898045 1.533796089
You can then access the list elements as described in help("["), i.e., mylist$sp1, mylist[["sp1"]], etc.
Of course, this is still very inefficient code and it could be improved a lot. E.g., since all three variables are of same type and length, they really should be combined into a matrix, which could be filled with one call to rnorm and which would also allow doing the second operation with vectorized operations.
#Roland is absolutely right and you absolutely should use a list for this type of problem. It's cleaner and easier to work with. Here's another way of working with what you have (It can be easily generalised):
sp <- replicate(3, rnorm(5), simplify=FALSE)
names(sp) <- paste0("sp", 1:3)
sp
#$sp1
#[1] -0.3723205 1.2199743 0.1226524 0.7287469 -0.8670466
#
#$sp2
#[1] -0.5458811 -0.3276503 -1.3031100 1.3064743 -0.7533023
#
#$sp3
#[1] 1.2683564 0.9419726 -0.5925012 -1.2034788 -0.6613149
newsp <- lapply(sp, function(x){x[6] <- x[5]*2; x})
newsp
#$sp1
#[1] -0.3723205 1.2199743 0.1226524 0.7287469 -0.8670466 -1.7340933
#
#$sp2
#[1] -0.5458811 -0.3276503 -1.3031100 1.3064743 -0.7533023 -1.5066046
#
#$sp3
#[1] 1.2683564 0.9419726 -0.5925012 -1.2034788 -0.6613149 -1.3226297
EDIT: If you are truly, sincerely dedicated to doing this despite being recommended otherwise, you can do it this way:
for(i in 1:sp){
name<-paste("sp",i,sep="")
assign(name, `[<-`(get(name), 6, `[`(get(name), 5) * 2))
}

Adding data frames as list elements (using for loop)

I have in my environment a series of data frames called EOG. There is one for each year between 2006 and 2012. Like, EOG2006, EOG2007...EOG2012. I would like to add them as elements of a list.
First, I am trying to know if this is possible. I read the official R guide and a couple of R programming manuals but I didn't find explicit examples about that.
Second, I would like to do this using a for loop. Unfortunately, the code I used to do the job is wrong and I am going crazy to fix it.
for (j in 2006:2012){
z<-j
sEOG<-paste("EOG", z, sep="")
dEOG<-get(paste("EOG", z, sep=""))
lsEOG<-list()
lsEOG[[sEOG]]<-dEOG
}
This returns a list with one single element. Where is the mistake?
You keep reinitializing the list inside the loop. You need to move lsEOG<-list() outside the for loop.
lsEOG<-list()
for (j in 2006:2012){
z <- j
sEOG <- paste("EOG", z, sep="")
dEOG <- get(paste("EOG", z, sep=""))
lsEOG[[sEOG]] <-dEOG
}
Also, you can use j directly in the paste functions:
sEOG <- paste("EOG", j, sep="")
I had the same question, but felt that the OP's initial code was a bit opaque for R beginners. So, here is perhaps a bit clearer example of how to create data frames in a loop and add them to a list which I just now figured out by playing around in the R shell:
> dfList <- list() ## create empty list
>
> for ( i in 1:5 ) {
+ x <- rnorm( 4 )
+ y <- sin( x )
+ dfList[[i]] <- data.frame( x, y ) ## create and add new data frame
+ }
>
> length( dfList ) ## 5 data frames in list
[1] 5
>
> dfList[[1]] ## print 1st data frame
x y
1 -0.3782376 -0.3692832
2 -1.3581489 -0.9774756
3 1.2175467 0.9382535
4 -0.7544750 -0.6849062
>
> dfList[[2]] ## print 2nd data frame
x y
1 -0.1211670 -0.1208707
2 -1.5318212 -0.9992406
3 0.8790863 0.7701564
4 1.4014124 0.9856888
>
> dfList[[2]][4,2] ## in 2nd data frame, print element in row 4 column 2
[1] 0.9856888
>
For R beginners like me, note that double brackets are required to access the ith data frame. Basically, double brackets are used for lists while single brackets are used for vectors.
If the data frames are saved as an object you can find them by apropos("EOG", ignore.case=FALSE) and them with a loop store them in the list:
list.EOG<- apropos("EOG", ignore.case=FALSE) #Find the objects with case sensitive
lsEOG<-NULL #Creates the object to full fill in the list
for (j in 1:length(list.EOG)){
lsEOG[i]<-get(list.EOG[i]) #Add the data.frame to each element of the list
}
to add the name of each one to the list you can use:
names(lsEOG, "names")<-list.EOG

Vector-version / Vectorizing a for which equals loop in R

I have a vector of values, call it X, and a data frame, call it dat.fram. I want to run something like "grep" or "which" to find all the indices of dat.fram[,3] which match each of the elements of X.
This is the very inefficient for loop I have below. Notice that there are many observations in X and each member of "match.ind" can have zero or more matches. Also, dat.fram has over 1 million observations. Is there any way to use a vector function in R to make this process more efficient?
Ultimately, I need a list since I will pass the list to another function that will retrieve the appropriate values from dat.fram .
Code:
match.ind=list()
for(i in 1:150000){
match.ind[[i]]=which(dat.fram[,3]==X[i])
}
UPDATE:
Ok, wow, I just found an awesome way of doing this... it's really slick. Wondering if it's useful in other contexts...?!
### define v as a sample column of data - you should define v to be
### the column in the data frame you mentioned (data.fram[,3])
v = sample(1:150000, 1500000, rep=TRUE)
### now here's the trick: concatenate the indices for each possible value of v,
### to form mybiglist - the rownames of mybiglist give you the possible values
### of v, and the values in mybiglist give you the index points
mybiglist = tapply(seq_along(v),v,c)
### now you just want the parts of this that intersect with X... again I'll
### generate a random X but use whatever X you need to
X = sample(1:200000, 150000)
mylist = mybiglist[which(names(mybiglist)%in%X)]
And that's it! As a check, let's look at the first 3 rows of mylist:
> mylist[1:3]
$`1`
[1] 401143 494448 703954 757808 1364904 1485811
$`2`
[1] 230769 332970 389601 582724 804046 997184 1080412 1169588 1310105
$`4`
[1] 149021 282361 289661 456147 774672 944760 969734 1043875 1226377
There's a gap at 3, as 3 doesn't appear in X (even though it occurs in v). And the
numbers listed against 4 are the index points in v where 4 appears:
> which(X==3)
integer(0)
> which(v==3)
[1] 102194 424873 468660 593570 713547 769309 786156 828021 870796
883932 1036943 1246745 1381907 1437148
> which(v==4)
[1] 149021 282361 289661 456147 774672 944760 969734 1043875 1226377
Finally, it's worth noting that values that appear in X but not in v won't have an entry in the list, but this is presumably what you want anyway as they're NULL!
Extra note: You can use the code below to create an NA entry for each member of X not in v...
blanks = sort(setdiff(X,names(mylist)))
mylist_extras = rep(list(NA),length(blanks))
names(mylist_extras) = blanks
mylist_all = c(mylist,mylist_extras)
mylist_all = mylist_all[order(as.numeric(names(mylist_all)))]
Fairly self-explanatory: mylist_extras is a list with all the additional list stuff you need (the names are the values of X not featuring in names(mylist), and the actual entries in the list are simply NA). The final two lines firstly merge mylist and mylist_extras, and then perform a reordering so that the names in mylist_all are in numeric order. These names should then match exactly the (unique) values in the vector X.
Cheers! :)
ORIGINAL POST BELOW... superseded by the above, obviously!
Here's a toy example with tapply that might well run significantly quicker... I made X and d relatively small so you could see what's going on:
X = 3:7
n = 100
d = data.frame(a = sample(1:10,n,rep=TRUE), b = sample(1:10,n,rep=TRUE),
c = sample(1:10,n,rep=TRUE), stringsAsFactors = FALSE)
tapply(X,X,function(x) {which(d[,3]==x)})

Accessing same named list elements of the list of lists in R

Frequently I encounter situations where I need to create a lot of similar models for different variables. Usually I dump them into the list. Here is the example of dummy code:
modlist <- lapply(1:10,function(l) {
data <- data.frame(Y=rnorm(10),X=rnorm(10))
lm(Y~.,data=data)
})
Now getting the fit for example is very easy:
lapply(modlist,predict)
What I want to do sometimes is to extract one element from the list. The obvious way is
sapply(modlist,function(l)l$rank)
This does what I want, but I wonder if there is a shorter way to get the same result?
probably these are a little bit simple:
> z <- list(list(a=1, b=2), list(a=3, b=4))
> sapply(z, `[[`, "b")
[1] 2 4
> sapply(z, get, x="b")
[1] 2 4
and you can define a function like:
> `%c%` <- function(x, n)sapply(x, `[[`, n)
> z %c% "b"
[1] 2 4
and also this looks like an extension of $:
> `%$%` <- function(x, n) sapply(x, `[[`, as.character(as.list(match.call())$n))
> z%$%b
[1] 2 4
I usually use kohske way, but here is another trick:
sapply(modlist, with, rank)
It is more useful when you need more elements, e.g.:
sapply(modlist, with, c(rank, df.residual))
As I remember I stole it from hadley (from plyr documentation I think).
Main difference between [[ and with solutions is in case missing elements. [[ returns NULL when element is missing. with throw an error unless there exist an object in global workspace having same name as searched element. So e.g.:
dah <- 1
lapply(modlist, with, dah)
returns list of ones when modlist don't have any dah element.
With Hadley's new lowliner package you can supply map() with a numeric index or an element name to elegantly pluck components out of a list. map() is the equivalent of lapply() with some extra tricks.
library("lowliner")
l <- list(
list(a = 1, b = 2),
list(a = 3, b = 4)
)
map(l, "b")
map(l, 2)
There is also a version that simplifies the result to a vector
map_v(l, "a")
map_v(l, 1)

Resources