I have a list of lists and I want to convert it into a matrix such that each column = one sublist.
Mock example
list1 <- list(1, 2)
list2 <- list(1, 2, 3)
list3 <- list(1, 2, 3, 4)
list_lists <- list (list1, list2, list3)
I first equalize the lengths of all the sublists (padding with NULLs where needed) so that every sublist is as long as the longest one. This is to stop R from recycling data to fill the rows of the final matrix (feel free to tell me if I can skip this step somehow).
max_length <- max(unlist(lapply (list_lists, FUN = length)))
list_lists <- lapply (list_lists, function (x) {length (x) <- max_length; return (x)})
My best attempt so far
mat <- lapply (list_lists, cbind)
mat does look superficially like what I want but it is actually not. It is not a matrix (and attempts to convert it into one using as.matrix are unsuccessful) and I cannot refer to columns/rows like I would do with a matrix.
I am expecting
     [,1] [,2] [,3]
[1,] 1    1    1
[2,] 2    2    2
[3,] NULL 3    3
[4,] NULL NULL 4
What is weird to me is that
mat <- cbind (list_lists[[1]], list_lists[[2]], list_lists[[3]])
seems to work. I would bet these two lines are the same, how can they be different?
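They are different: lapply applies cbind to each sublist separately and returns a list of three one-column matrices, whereas you want a single call to cbind with all three sublists as its arguments. See the excerpts from the documentation below.
Use do.call instead of mat <- lapply (list_lists, cbind), as follows: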
mat <- do.call("cbind",list_lists)
do.call("cbind", list_lists) is the same as cbind (list_lists[[1]], list_lists[[2]], list_lists[[3]]): it passes each element of list_lists to cbind as a separate argument, so each sublist becomes one column.
> do.call("cbind",list_lists)
     [,1] [,2] [,3]
[1,] 1    1    1
[2,] 2    2    2
[3,] NULL 3    3
[4,] NULL NULL 4
>
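To make the difference concrete, here is a minimal sketch using the mock data from the question. The names padded, per_element and combined are only illustrative and do not appear in the original post.

```r
# Mock data from the question, padded to a common length
list_lists <- list(list(1, 2), list(1, 2, 3), list(1, 2, 3, 4))
max_length <- max(lengths(list_lists))
padded <- lapply(list_lists, function(x) { length(x) <- max_length; x })

# lapply calls cbind once per sublist: the result is a list holding
# three separate one-column matrices
per_element <- lapply(padded, cbind)

# do.call calls cbind once, with the three sublists as its arguments
combined <- do.call(cbind, padded)

is.matrix(per_element)  # FALSE
is.matrix(combined)     # TRUE
dim(combined)           # 4 3
```

Note that combined is a matrix whose cells are list elements, so a single value is extracted with combined[[2, 3]] rather than combined[2, 3].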
Understanding do.call:
From documentation:
do.call constructs and executes a function call from a name or a
function and a list of arguments to be passed to it.
lapply returns a list of the same length as X, each element of which
is the result of applying FUN to the corresponding element of X.
In the R console, run ?do.call and ?lapply for the full documentation.
You can also read: do.call and lapply
Use sapply instead of lapply, like this:
list_lists <- sapply (list_lists, function (x) {length (x) <- max_length; return (x)})
This should give you the matrix that you wanted. It seems that sapply applies the function just as lapply does and then tries to simplify the result: because every padded sublist has the same length, the outputs are wrapped into a matrix, which makes the cbind line that you specified above unnecessary.
The stri_list2matrix function should be able to handle this:
library(stringi)
stri_list2matrix(list_lists)
##      [,1] [,2] [,3]
## [1,] "1"  "1"  "1"
## [2,] "2"  "2"  "2"
## [3,] NA   "3"  "3"
## [4,] NA   NA   "4"
Another option is to use your "max_length" to create the matrix:
ml <- max(lengths(list_lists))
do.call(cbind, lapply(list_lists, function(x) `length<-`(unlist(x), ml)))
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]   NA    3    3
## [4,]   NA   NA    4
A third option is to use melt and dcast from "reshape2" (note that this returns a data frame rather than a matrix):
library(reshape2)
dcast(melt(list_lists), L2 ~ L1)
##   L2  1  2 3
## 1  1  1  1 1
## 2  2  2  2 2
## 3  3 NA  3 3
## 4  4 NA NA 4
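Staying in base R, the padding step can also be folded into the extraction itself. The following is only a sketch of that idea, reusing the mock data from the question; it relies on the fact that indexing a vector past its end returns NA.

```r
list_lists <- list(list(1, 2), list(1, 2, 3), list(1, 2, 3, 4))
ml <- max(lengths(list_lists))

# Indexing past the end of a vector yields NA, so each column is padded
# to ml rows without a separate length<- step; sapply then binds the
# equal-length numeric vectors into a matrix
mat <- sapply(list_lists, function(x) unlist(x)[seq_len(ml)])

dim(mat)    # 4 3
mat[3, 1]   # NA
mat[4, 3]   # 4
```

Unlike the cbind-on-lists result, this is an ordinary numeric matrix, so mat[i, j] works as usual.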
Related
I have a matrix:
mat <- matrix(c(3,9,5,1,-2,8), nrow = 2)
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8
I have a list:
lst <- as.list(data.frame(matrix(c(3,9,5,1,-2,8), nrow = 2)))
$X1
[1] 3 9
$X2
[1] 5 1
$X3
[1] -2 8
I can access my matrix with mat[i,j].
I can access my list with lst[[c(i,j)]].
But with the matrix, mat[1,2] gives 5, whereas the same indices on the list, lst[[c(1,2)]], give 9.
Is there a way to get the same numbers when I access the list, perhaps by manipulating the list in some way? When I use lst[[c(1,2)]] I want to get 5 instead of 9, i.e. the same numbers I get when using mat[i,j].
You can try
> list2DF(lst)[1, 2]
[1] 5
You can use transpose() from purrr to transpose a list.
lst2 <- purrr::transpose(lst)
lst2[[c(1,2)]]
# [1] 5
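The reason for the mismatch is that lst[[c(i,j)]] indexes recursively: it means lst[[i]][[j]], i.e. element j of column i, whereas mat[i,j] is row i of column j. A minimal sketch of a third option that simply swaps the indices, where get_ij is a hypothetical helper name and not an existing function:

```r
mat <- matrix(c(3, 9, 5, 1, -2, 8), nrow = 2)
lst <- as.list(data.frame(mat))

# Recursive indexing: lst[[c(1, 2)]] is lst[[1]][[2]]
lst[[c(1, 2)]]   # 9

# Hypothetical helper: swap the indices to mimic mat[i, j]
get_ij <- function(l, i, j) l[[c(j, i)]]
get_ij(lst, 1, 2)   # 5, same as mat[1, 2]
```

This leaves the list untouched, which may be preferable when it is large or used elsewhere in its original orientation.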
Say I have two vectors:
a <- 1:4
b <- 1:2
and a bivariate function:
f <- function(x,y) x**y
I would like to get a simple and efficient way (a one-liner?) to get (for this specific example):
     [,1] [,2]
[1,]    1    1
[2,]    2    4
[3,]    3    9
[4,]    4   16
I can do:
res <- matrix(nrow=length(a), ncol=length(b))
for (i in 1:length(b)){
  res[,i] <- mapply(f, a , b[i])
}
but I want to avoid loops.
Just use lapply over one of the vectors, while setting the other as constant. Then cbind() the list with do.call():
test <- do.call(cbind, lapply(b, function(x) a**x))
> test
     [,1] [,2]
[1,]    1    1
[2,]    2    4
[3,]    3    9
[4,]    4   16
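Another one-liner worth considering is base R's outer, which evaluates a vectorized bivariate function over every combination of the two vectors. This sketch assumes f is vectorized, as it is in the question:

```r
a <- 1:4
b <- 1:2
f <- function(x, y) x^y

# outer() calls f once on the expanded grid of (a, b) pairs and shapes
# the result into a length(a) x length(b) matrix
res <- outer(a, b, f)
dim(res)   # 4 2
```

If f were not vectorized, it would have to be wrapped with Vectorize(f) first, since outer passes whole vectors to the function.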
I need to average the elements of the nested sublists in the following way. In the example below I have a list lll. For lll[[1]] I would like to compute the element-wise average across its sublists: (1+3+5+7)/4 = 4 and (2+4+6+8)/4 = 5. Similarly, for lll[[2]] the averages are (2+4+6+8)/4 = 5 and (1+3+5+7)/4 = 4. I could do this with a for loop, but the result is not in the form I want: I would like a list or data frame that is horizontal, for example list(c(4,5),c(5,4)). Also, with a list of 5000 elements a for loop is not efficient. I would really appreciate a smarter way to do this.
l1<-as.matrix(c(1,2))
l2<-as.matrix(c(3,4))
l3<-as.matrix(c(5,6))
l4<-as.matrix(c(7,8))
l5<-as.matrix(c(2,1))
l6<-as.matrix(c(4,3))
l7<-as.matrix(c(6,5))
l8<-as.matrix(c(8,7))
ll1<-list(l1,l2,l3,l4)
ll2<-list(l5,l6,l7,l8)
lll<-list(ll1,ll2)
### using for loop
sum_k_a_<-list()
sum_k_b_<-list()
for (l in 1:2){
  sum_k_a<-0
  sum_k_b<-0
  for (k in 1:4){
    sum_k_a=lll[[l]][[k]][1]+sum_k_a
    sum_k_b=lll[[l]][[k]][2]+sum_k_b
  }
  sum_k_a_[[l]]<-sum_k_a/4
  sum_k_b_[[l]]<-sum_k_b/4
}
A couple of options:
lapply(lll, function(x) Reduce(`+`, x)/length(x) )
#[[1]]
# [,1]
#[1,] 4
#[2,] 5
#
#[[2]]
# [,1]
#[1,] 5
#[2,] 4
lapply(lll, function(x) rowMeans(do.call(cbind, x)))
#[[1]]
#[1] 4 5
#
#[[2]]
#[1] 5 4
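If a single "horizontal" matrix is preferred over a list, the second option can be wrapped in vapply and transposed. This is only a sketch; it assumes every sublist holds 2 x 1 matrices as in the example, which is why numeric(2) is used as the template.

```r
lll <- list(
  list(as.matrix(c(1, 2)), as.matrix(c(3, 4)), as.matrix(c(5, 6)), as.matrix(c(7, 8))),
  list(as.matrix(c(2, 1)), as.matrix(c(4, 3)), as.matrix(c(6, 5)), as.matrix(c(8, 7)))
)

# For each top-level element: bind its 2 x 1 matrices into a 2 x 4 matrix
# and take row means. vapply returns one column per element, so t() puts
# each element's averages on its own row.
avg <- t(vapply(lll, function(x) rowMeans(do.call(cbind, x)), numeric(2)))
avg[1, ]   # 4 5
avg[2, ]   # 5 4
```

vapply also checks that each result has the declared length, which can catch malformed sublists early in a list of several thousand elements.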
You could do it using lapply and sapply:
lapply(lll,function(x) rowSums(sapply(x,function(y) c(y[1],y[2]))/4))
This returns a list of 2 elements:
[[1]]
[1] 4 5
[[2]]
[1] 5 4
We can also use tidyverse syntax
library(tidyverse)
lll %>%
map(~Reduce(`+`, .)/length(.))
#[[1]]
# [,1]
#[1,] 4
#[2,] 5
#[[2]]
# [,1]
#[1,] 5
#[2,] 4
Would really be much more simply done with an implicit sapply loop that applies mean to the unlist-ed values that are "deeper" in the list structures:
L_means <- sapply( lll, FUN=function(items) {mean( unlist(items))})
L_means
[1] 4.5 4.5
I guess I misunderstood the question, so this is what was desired:
(L_means <- sapply( lll, FUN=function(top){ apply( as.data.frame(top), 1, mean)}) )
     [,1] [,2]
[1,]    4    5
[2,]    5    4
I have several matrices; let's keep it simple and say I have 3. I want to create a list of them and then use rbind to stack them one on top of the other.
If I do it by hand, using the following code, it works:
list<-list(matrix1,matrix2,matrix3)
test<-do.call("rbind",list)
and I get a matrix of 97947 rows by 4 columns which is what I want.
but if I do a loop, it does not work:
list2<-list()
for (i in 1:3)
{
  y<-paste0("matrix",i)
  list2[[i]] <- y
}
test2<-do.call("rbind",list2)
And I get a 3x1 character matrix ???
Can someone please point me to the error?
Any comments would be greatly appreciated.
Thank you!!!!
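Your loop stores the names of the matrices (character strings) in the list, not the matrices themselves, which is why rbind returns a character matrix. Consider using a function like mget to get all of your matrix objects from the global environment (the default environment) and put them in a list. You can then use your do.call method and avoid the loop. Here is a toy example: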
# Some data
m1 <- matrix( 1:4 , 2 , byrow = TRUE )
m2 <- matrix( 1:4 , 2 , byrow = TRUE )
m3 <- matrix( 1:4 , 2 , byrow = TRUE )
# Use mget to put them in a list. mget searches the calling environment (the global environment here) for the object names in its first argument
list <- mget( paste0( "m" , 1:3 ) )
list
#$m1
# [,1] [,2]
#[1,] 1 2
#[2,] 3 4
#$m2
# [,1] [,2]
#[1,] 1 2
#[2,] 3 4
#$m3
# [,1] [,2]
#[1,] 1 2
#[2,] 3 4
# rbind them
do.call( rbind , list )
# [,1] [,2]
#[1,] 1 2
#[2,] 3 4
#[3,] 1 2
#[4,] 3 4
#[5,] 1 2
#[6,] 3 4
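If you would rather keep the loop, the fix is to look each object up by name with get instead of storing the name itself. A minimal sketch, with small stand-in matrices in place of your real data:

```r
# Stand-in data (assumed; your real matrices are much larger)
matrix1 <- matrix(1:4, 2, byrow = TRUE)
matrix2 <- matrix(5:8, 2, byrow = TRUE)
matrix3 <- matrix(9:12, 2, byrow = TRUE)

list2 <- vector("list", 3)
for (i in 1:3) {
  # get() returns the object called "matrix1", "matrix2", ...;
  # paste0() alone only builds the character string
  list2[[i]] <- get(paste0("matrix", i))
}

test2 <- do.call(rbind, list2)
dim(test2)   # 6 2
```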
Do the following function pairs generate exactly the same results?
Pair 1) names() & colnames()
Pair 2) rownames() & row.names()
As Oscar Wilde said
Consistency is the last refuge of the unimaginative.
R is more of an evolved rather than designed language, so these things happen. names() and colnames() work on a data.frame but names() does not work on a matrix:
R> DF <- data.frame(foo=1:3, bar=LETTERS[1:3])
R> names(DF)
[1] "foo" "bar"
R> colnames(DF)
[1] "foo" "bar"
R> M <- matrix(1:9, ncol=3, dimnames=list(1:3, c("alpha","beta","gamma")))
R> names(M)
NULL
R> colnames(M)
[1] "alpha" "beta" "gamma"
R>
Just to expand a little on Dirk's example:
It helps to think of a data frame as a list with equal length vectors. That's probably why names works with a data frame but not a matrix.
The other useful function is dimnames which returns the names for every dimension. You will notice that the rownames function actually just returns the first element from dimnames.
Regarding rownames and row.names: I can't tell the difference in practice, although rownames is built on dimnames while row.names is a generic function with its own method for data frames. They both also seem to work with higher-dimensional arrays:
> a <- array(1:5, 1:4)
> a[1,,,]
> rownames(a) <- "a"
> row.names(a)
[1] "a"
> a
, , 1, 1
[,1] [,2]
a 1 2
> dimnames(a)
[[1]]
[1] "a"
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
I think that using colnames and rownames makes the most sense; here's why.
Using names has several disadvantages. You have to remember that it means "column names", and it only works with data frames, so you'll need to call colnames whenever you use matrices. By calling colnames, you only have to remember one function. Finally, if you look at the code for colnames, you will see that it calls names in the case of a data frame anyway, so the output is identical.
rownames and row.names return the same values for data frames and matrices; the only difference that I have spotted is that where there aren't any names, rownames will print "NULL" (as does colnames), but row.names returns it invisibly. Since there isn't much to choose between the two functions, rownames wins on the grounds of aesthetics, since it pairs more prettily with colnames. (Also, for the lazy programmer, you save a character of typing.)
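These claims are easy to check directly. A short sketch, reusing the DF and M objects from the first answer:

```r
DF <- data.frame(foo = 1:3, bar = LETTERS[1:3])
M <- matrix(1:9, ncol = 3, dimnames = list(1:3, c("alpha", "beta", "gamma")))

# Pair 1: same result on a data frame, but names() is NULL on a matrix
identical(names(DF), colnames(DF))       # TRUE
is.null(names(M))                        # TRUE

# Pair 2: same result on both a data frame and a matrix
identical(rownames(DF), row.names(DF))   # TRUE
identical(rownames(M), row.names(M))     # TRUE
```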
And another expansion:
# create dummy matrix
set.seed(10)
m <- matrix(round(runif(25, 1, 5)), 5)
d <- as.data.frame(m)
If you want to assign new column names, you can do the following on a data.frame:
# an identical effect can be achieved with colnames()
names(d) <- LETTERS[1:5]
> d
  A B C D E
1 3 2 4 3 4
2 2 2 3 1 3
3 3 2 1 2 4
4 4 3 3 3 2
5 1 3 2 4 3
If, however, you run the previous command on a matrix, you'll mess things up:
names(m) <- LETTERS[1:5]
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    2    4    3    4
[2,]    2    2    3    1    3
[3,]    3    2    1    2    4
[4,]    4    3    3    3    2
[5,]    1    3    2    4    3
attr(,"names")
[1] "A" "B" "C" "D" "E" NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[20] NA NA NA NA NA NA
Since a matrix can be regarded as a two-dimensional vector, names() assigns names only to the first five values (you don't want to do that, do you?). In this case, you should stick with colnames().
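For completeness, here is a sketch of the colnames() route on the same dummy matrix:

```r
set.seed(10)
m <- matrix(round(runif(25, 1, 5)), 5)

# colnames() labels the five columns and leaves the 25 cells unnamed
colnames(m) <- LETTERS[1:5]
colnames(m)         # "A" "B" "C" "D" "E"
is.null(names(m))   # TRUE: no stray per-element names attribute

# Columns can now be selected by name
m[, "C"]
```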
So there...