Sum rows in a nested list in R

I have a nested list coming out of a program; its length is 100. I need to sum all elements of the first row and all elements of the second row. Here is a small reproducible example. What I need is the sum 1+3+5+7 = 16 and the sum 2+4+6+8 = 20, as a vector or matrix.
l1<-as.matrix(c(1,2))
l2<-as.matrix(c(3,4))
l3<-as.matrix(c(5,6))
l4<-as.matrix(c(7,8))
ll1<-list(l1,l2)
ll2<-list(l3,l4)
lll<-list(ll1,ll2)
lll
[[1]]
[[1]][[1]]
[,1]
[1,] 1
[2,] 2
[[1]][[2]]
[,1]
[1,] 3
[2,] 4
[[2]]
[[2]][[1]]
[,1]
[1,] 5
[2,] 6
[[2]][[2]]
[,1]
[1,] 7
[2,] 8

I found the purrr package helpful for the function flatten() because it only removes one level of the hierarchy of the lists:
library(magrittr) #for pipes
library(purrr) #for flatten
lll %>% flatten %>% as.data.frame %>% rowSums
Based on akrun's answer it is similar to do.call(c, lll).

We can do this in base R by flattening the nested list into a single list with do.call(c, lll), then cbind the elements of that list and take the rowSums
rowSums(do.call(cbind, do.call(c, lll)))
#[1] 16 20
Or otherwise we can unlist, create a matrix with 2 columns, and get the colSums
colSums(matrix(unlist(lll), ncol=2, byrow=TRUE))
#[1] 16 20
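To see why the byrow=TRUE version works, it can help to look at the intermediate values. This is only a sketch that rebuilds the example list from the question; the names flat, m and row_totals are illustrative and not part of the original answer.

```r
# Rebuild the example list from the question
lll <- list(
  list(as.matrix(c(1, 2)), as.matrix(c(3, 4))),
  list(as.matrix(c(5, 6)), as.matrix(c(7, 8)))
)

# unlist() walks the nested list in order, so the values come out as
# 1 2 3 4 5 6 7 8 (first-row value, second-row value, first-row value, ...)
flat <- unlist(lll)

# Filling a 2-column matrix row by row puts every "first row" value
# in column 1 and every "second row" value in column 2
m <- matrix(flat, ncol = 2, byrow = TRUE)

# Summing the columns of m therefore sums the rows of the original matrices
row_totals <- colSums(m)
row_totals
#[1] 16 20
```

Note that this relies on every innermost matrix having exactly two rows and one column, as in the example.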

Reduce in base R:
Reduce("+", lapply(Reduce(c, lll), rowSums))
#[1] 16 20

Related

Access an element of a list in the same manner how you access an element of a matrix

I have a matrix:
mat <- matrix(c(3,9,5,1,-2,8), nrow = 2)
[,1] [,2] [,3]
[1,] 3 5 -2
[2,] 9 1 8
I have a list:
lst <- as.list(data.frame(matrix(c(3,9,5,1,-2,8), nrow = 2)))
$X1
[1] 3 9
$X2
[1] 5 1
$X3
[1] -2 8
I can access my matrix by mat[i,j]
I can access my list lst[[c(i,j)]]
But if I do mat[1,2] on the matrix I get 5, while the same indices on the list, lst[[c(1,2)]], give 9.
Is there a way to get the same numbers when I access the list? Maybe by manipulating the list in a certain manner? When I use lst[[c(1,2)]] I want to get 5 instead of 9. I want to get the same numbers I get when using mat[i,j].
You can try
> list2DF(lst)[1, 2]
[1] 5
You can use transpose() from purrr to transpose a list.
lst2 <- purrr::transpose(lst)
lst2[[c(1,2)]]
# [1] 5
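Another way to look at it: lst[[c(i,j)]] is recursive indexing, i.e. lst[[i]][[j]], so the first index picks the list element (a column of the original matrix) and the second picks the position inside it (the row). A minimal sketch, assuming you are free to wrap the access in a small helper; list_at() is a hypothetical name, not a function from any package:

```r
mat <- matrix(c(3, 9, 5, 1, -2, 8), nrow = 2)
lst <- as.list(data.frame(mat))

# Hypothetical helper: index a column-wise list with matrix-style
# (row, column) indices by swapping the order of the two indices
list_at <- function(x, i, j) x[[c(j, i)]]

list_at(lst, 1, 2)
#[1] 5
mat[1, 2]
#[1] 5
```

Unlike transposing, this leaves lst itself unchanged.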

Return a dataframe of averages from a list of dataframes

I have a list of 22 dataframes each is 49 columns and 497 rows.
I need to produce an average/mean dataframe from these 22.
Already tried these, myfiles2 is the list of dataframes
ans1 = aaply(laply(myfiles2, as.matrix), c(2, 3), mean)
ans2 <- do.call("mean", myfiles2)
ans3 <- lapply(myfiles2, function (x) lapply(x, mean, na.rm=TRUE))
ans4 <- Reduce("+", myfiles2)/length(myfiles2)
ans5 <- lapply(myfiles2, mean)
The list of dataframes was created using
myfiles2 = lapply(filesToProcess, read.csv, skip=2, colClasses=colClasses)
Taking the first value in each dataframe manually and calculating the mean with mean() works.
Trying to use mean or calculating it as shown above across the list of dataframes gives an incorrect result.
The result I'm looking for is a [49X497] dataframe with each location containing the mean calculated from the same location in the 22 dataframes.
All values are 10 significant figures with 4 decimal places.
You may use simplify2array() in base R.
Example
list1
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 1 9 8 3
# [2,] 5 2 6 11
# [3,] 12 4 10 7
#
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 4 12 3 6
# [2,] 9 2 1 7
# [3,] 5 8 10 11
#
# [[3]]
# [,1] [,2] [,3] [,4]
# [1,] 5 8 1 12
# [2,] 4 3 7 6
# [3,] 2 10 11 9
t(apply(simplify2array(list1), 1:2, mean))
# [,1] [,2] [,3]
# [1,] 3.333333 6.000000 6.333333
# [2,] 9.666667 2.333333 7.333333
# [3,] 4.000000 4.666667 10.333333
# [4,] 7.000000 8.000000 9.000000
Data
set.seed(42)
list1 <- replicate(3, matrix(sample(1:12), 3, 4), simplify=FALSE)
Use the abind package to create a 3D array from your list of data.frames:
library(abind)
myfiles2 <- abind(myfiles2, along = 3)
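Or in base R (converting each data frame to a matrix first, so that the result is a 3D array):
myfiles2 <- simplify2array(lapply(myfiles2, as.matrix))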
Then, use apply() to take the mean for each cell across all 22 data.frames:
apply(myfiles2, 1:2, mean)
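As a small self-contained illustration of the array approach, here is a sketch on three made-up 2x2 data frames (dfs, arr and cell_means are illustrative names, and the values are invented). It converts each data frame with as.matrix() before stacking, and also shows rowMeans() with dims = 2 as an alternative to apply().

```r
# Toy stand-in for the list of data frames (made-up values)
dfs <- list(
  data.frame(a = c(1, 2), b = c(3, 4)),
  data.frame(a = c(3, 4), b = c(5, 6)),
  data.frame(a = c(5, 9), b = c(7, 8))
)

# Stack the data frames into a rows x columns x frames array
arr <- simplify2array(lapply(dfs, as.matrix))
dim(arr)
#[1] 2 2 3

# Mean of each cell across the third dimension
cell_means <- apply(arr, 1:2, mean)

# rowMeans() with dims = 2 averages over the third dimension as well,
# and na.rm = TRUE skips missing values
cell_means2 <- rowMeans(arr, dims = 2, na.rm = TRUE)

as.data.frame(cell_means)
```

Both cell_means and cell_means2 hold the per-cell averages, here 3 and 5 in column a, and 5 and 6 in column b.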
Following the hint from @tom above, the final solution was to combine the list of data frames into a single data frame holding all the data and to process it with the tidyverse.
A few small tidy-ups were needed:
An errant character column from the origin of the data
A column with data in both upper and lower case
Avoiding the character columns in the mean calculation
Then putting the character columns and the mean data frame back together to get it back in the correct order.
So...
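Change the format to a single data frame and fix the non-numeric column:
library(dplyr)   # bind_rows(), group_by(), summarise_at(), left_join(), %>%
library(stringr) # str_to_upper()
myfiles3 <- myfiles2 %>%
bind_rows() %>%
transform(EdgeStepL2 = as.numeric(EdgeStepL2))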
Ensure the section names are in uppercase to be consistent:
myfiles3$Section <- str_to_upper(myfiles3$Section)
Calculate the mean of each cell, grouped by common values:
myfiles4 <- myfiles3 %>% group_by(Section,Chainage) %>%
summarise_at(vars("East":"Surf.Det"),funs(mean(., na.rm = TRUE)))
myfiles5 <- data.frame(myfiles2[[1]][1:2])
myfiles6 <- left_join(myfiles5, myfiles4)
This is not the simple solution I had hoped for, but here is some advice for the next person who tries this:
Look for the NA's (everywhere in the data).
Make sure that all the columns you are running the mean (or other function) on are those you can calculate with.
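Both points can be checked quickly before averaging. The sketch below uses a made-up data frame (the column names Section and East are borrowed from the code above, the values are invented) to show how a single NA affects mean() and how to find the columns that are safe to average.

```r
# Made-up data frame with one character column and one missing value
df <- data.frame(
  Section = c("a", "B", "a"),
  East    = c(1.5, NA, 2.5),
  stringsAsFactors = FALSE
)

# A single NA makes the plain mean NA
mean(df$East)
#[1] NA

# na.rm = TRUE drops missing values before averaging
mean(df$East, na.rm = TRUE)
#[1] 2

# Count NAs per column and identify which columns are numeric
na_counts  <- colSums(is.na(df))
is_num_col <- vapply(df, is.numeric, logical(1))

# Keep only the columns a mean can be calculated on
numeric_only <- df[, is_num_col, drop = FALSE]
```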

average elements of sublists in a nested list in R

I need to average the elements of the nested sublists in the following way. In the example below I have a list lll. For lll[[1]] I would like to compute the averages across its sublists: (1+3+5+7)/4=4 and (2+4+6+8)/4=5. Similarly, for lll[[2]] the averages are (2+4+6+8)/4=5 and (1+3+5+7)/4=4. I could do this using a for loop, but the result is not in the desired form: I would like a list or data frame that is horizontal, for example list(c(4,5),c(5,4)). Also, with a list of 5000 elements a for loop is not efficient. I would really appreciate a smarter way to do this.
l1<-as.matrix(c(1,2))
l2<-as.matrix(c(3,4))
l3<-as.matrix(c(5,6))
l4<-as.matrix(c(7,8))
l5<-as.matrix(c(2,1))
l6<-as.matrix(c(4,3))
l7<-as.matrix(c(6,5))
l8<-as.matrix(c(8,7))
ll1<-list(l1,l2,l3,l4)
ll2<-list(l5,l6,l7,l8)
lll<-list(ll1,ll2)
### using for loop
sum_k_a_<-list()
sum_k_b_<-list()
for (l in 1:2){
sum_k_a<-0
sum_k_b<-0
for (k in 1:4){
sum_k_a=lll[[l]][[k]][1]+sum_k_a
sum_k_b=lll[[l]][[k]][2]+sum_k_b
}
sum_k_a_[[l]]<-sum_k_a/4
sum_k_b_[[l]]<-sum_k_b/4
}
A couple of options:
lapply(lll, function(x) Reduce(`+`, x)/length(x) )
#[[1]]
# [,1]
#[1,] 4
#[2,] 5
#
#[[2]]
# [,1]
#[1,] 5
#[2,] 4
lapply(lll, function(x) rowMeans(do.call(cbind, x)))
#[[1]]
#[1] 4 5
#
#[[2]]
#[1] 5 4
You could do it using lapply and sapply:
lapply(lll,function(x) rowSums(sapply(x,function(y) c(y[1],y[2]))/4))
This returns a list of 2 elements:
[[1]]
[1] 4 5
[[2]]
[1] 5 4
We can also use tidyverse syntax
library(tidyverse)
lll %>%
map(~Reduce(`+`, .)/length(.))
#[[1]]
# [,1]
#[1,] 4
#[2,] 5
#[[2]]
# [,1]
#[1,] 5
#[2,] 4
This would be much simpler with an implicit sapply loop that applies mean to the unlist-ed values that are "deeper" in the list structures:
L_means <- sapply( lll, FUN=function(items) {mean( unlist(items))})
L_means
[1] 4.5 4.5
I guess I misunderstood the question, so this is what was desired:
(L_means <- sapply( lll, FUN=function(top){ apply( as.data.frame(top), 1, mean)}) )
[,1] [,2]
[1,] 4 5
[2,] 5 4
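If a "horizontal" result with one row per top-level element is wanted, a vapply variant is another possibility. This is a sketch rather than one of the original answers: it assumes every innermost matrix has two rows, and means is an illustrative name.

```r
# Rebuild the example list from the question
lll <- list(
  list(as.matrix(c(1, 2)), as.matrix(c(3, 4)),
       as.matrix(c(5, 6)), as.matrix(c(7, 8))),
  list(as.matrix(c(2, 1)), as.matrix(c(4, 3)),
       as.matrix(c(6, 5)), as.matrix(c(8, 7)))
)

# For each top-level element, bind its matrices side by side and take
# the row means; numeric(2) declares that every result must be a
# numeric vector of length 2, so a malformed element raises an error
means <- t(vapply(lll, function(x) rowMeans(do.call(cbind, x)), numeric(2)))
means
#     [,1] [,2]
#[1,]    4    5
#[2,]    5    4
```

Row i of means holds the averages for lll[[i]], so the layout matches list(c(4,5),c(5,4)).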

List of lists to matrix

I have a list of lists and I want to convert it into a matrix such that each column = one sublist.
Mock example
list1 <- list(1, 2)
list2 <- list(1, 2, 3)
list3 <- list(1, 2, 3, 4)
list_lists <- list (list1, list2, list3)
I'm first equalizing the lengths of all the sublists (padding with NULLs if needed) so that all sublists have the length of the longest one. This is to avoid R recycling data to fill in the rows of the final matrix (feel free to tell me if I can skip this step somehow).
max_length <- max(unlist(lapply (list_lists, FUN = length)))
list_lists <- lapply (list_lists, function (x) {length (x) <- max_length; return (x)})
My best attempt so far
mat <- lapply (list_lists, cbind)
mat does look superficially like what I want but it is actually not. It is not a matrix (and attempts to convert it into one using as.matrix are unsuccessful) and I cannot refer to columns/rows like I would do with a matrix.
I am expecting
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] NULL 3 3
[4,] NULL NULL 4
What is weird to me is that
mat <- cbind (list_lists[[1]], list_lists[[2]], list_lists[[3]])
seems to work. I would bet these two lines are the same, how can they be different?
They are different: lapply returns a list (see the excerpt from the documentation below).
Use do.call instead of mat <- lapply (list_lists, cbind), as follows:
mat <- do.call("cbind",list_lists)
do.call("cbind", list_lists) is the same as cbind (list_lists[[1]], list_lists[[2]], list_lists[[3]]): every sublist is passed as a separate argument to a single cbind call, so each one becomes a column of the result. lapply (list_lists, cbind) instead calls cbind once per sublist and returns a list of three one-column matrices.
> do.call("cbind",list_lists)
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] NULL 3 3
[4,] NULL NULL 4
Understanding do.call:
From documentation:
do.call constructs and executes a function call from a name or a
function and a list of arguments to be passed to it.
lapply returns a list of the same length as X, each element of which
is the result of applying FUN to the corresponding element of X.
In the R console, run ?do.call and ?lapply for the full documentation.
You can also read: do.call and lapply
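The difference can be checked directly by inspecting what each call returns. The sketch below repeats the padding step from the question; padded, per_element and one_call are illustrative names.

```r
list_lists <- list(list(1, 2), list(1, 2, 3), list(1, 2, 3, 4))

# Pad every sublist with NULLs to the length of the longest one
max_length <- max(lengths(list_lists))
padded <- lapply(list_lists, function(x) { length(x) <- max_length; x })

# lapply calls cbind once per sublist: the result is a plain list
# of three separate 4 x 1 matrices
per_element <- lapply(padded, cbind)
class(per_element)
#[1] "list"
dim(per_element[[1]])
#[1] 4 1

# do.call passes all sublists to ONE cbind call: the result is a
# single 4 x 3 matrix whose cells are list elements
one_call <- do.call(cbind, padded)
dim(one_call)
#[1] 4 3
one_call[[2, 3]]
#[1] 2
```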
Use sapply instead of lapply, like this:
list_lists <- sapply (list_lists, function (x) {length (x) <- max_length; return (x)})
This should give you the matrix that you wanted. sapply calls lapply and then tries to simplify the result: because every padded sublist now has the same length, the results are combined column by column into a matrix, so the separate cbind line that you specified above is no longer needed.
The stri_list2matrix function should be able to handle this:
library(stringi)
stri_list2matrix(list_lists)
## [,1] [,2] [,3]
## [1,] "1" "1" "1"
## [2,] "2" "2" "2"
## [3,] NA "3" "3"
## [4,] NA NA "4"
Another option is to use your "max_length" to create the matrix:
ml <- max(lengths(list_lists))
do.call(cbind, lapply(list_lists, function(x) `length<-`(unlist(x), ml)))
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 2 2 2
## [3,] NA 3 3
## [4,] NA NA 4
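A similar numeric matrix can be built with sapply alone, relying on the fact that indexing a vector past its end yields NA. This is a sketch of a variation on the option above, not a different algorithm; m is an illustrative name.

```r
list_lists <- list(list(1, 2), list(1, 2, 3), list(1, 2, 3, 4))
ml <- max(lengths(list_lists))

# unlist() turns each sublist into a numeric vector; indexing with
# seq_len(ml) pads the short ones with NA. Because every result has
# length ml, sapply simplifies them into an ml x 3 matrix
m <- sapply(list_lists, function(x) unlist(x)[seq_len(ml)])
m
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    2    2    2
## [3,]   NA    3    3
## [4,]   NA   NA    4
```

Unlike the NULL-padded version, m is an ordinary numeric matrix, so functions such as colSums(m, na.rm = TRUE) work on it directly.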
A third option is to use melt from "reshape2":
library(reshape2)
dcast(melt(list_lists), L2 ~ L1)
## L2 1 2 3
## 1 1 1 1 1
## 2 2 2 2 2
## 3 3 NA 3 3
## 4 4 NA NA 4

Processing list object

I have a list object in R that contains further lists, of three vectors each. What is the quickest way to generate three matrices, the first of which has all of the first vectors as rows, the second has all of the second vectors as rows, and the third has all of the third? For example, given:
metalist <- list(list(c(1,1),c(11,11),c("a","a")),
list(c(2,2),c(22,22),c("b","b")),
list(c(3,3),c(33,33),c("c","c")))
I would like to get to three matrices (or data.frames), the first consisting of:
1 1
2 2
3 3
The second consisting of
11 11
22 22
33 33
And the third consisting of
a a
b b
c c
Given that in reality the metalist has 50,000 list objects, a for loop that extracts the vector elements and progressively assembles the matrices takes forever, so I would be looking for something quicker. I'm guessing there may be some clever use of unlist() but I can't figure it out.
The pattern do.call(Map,c(f=___,...)) is a useful one to have in your toolbox. Using list in the blank "transposes" the structure, using rbind will produce your desired matrices:
do.call(Map,c(f=rbind,metalist))
[[1]]
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[[2]]
[,1] [,2]
[1,] 11 11
[2,] 22 22
[3,] 33 33
[[3]]
[,1] [,2]
[1,] "a" "a"
[2,] "b" "b"
[3,] "c" "c"
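To make the pattern less opaque, it may help to write out the call that do.call builds. The sketch below compares it with the hand-written equivalent for the three-element example; with 50,000 elements only the do.call form is practical. The names via_do_call, explicit and transposed are illustrative.

```r
metalist <- list(list(c(1, 1), c(11, 11), c("a", "a")),
                 list(c(2, 2), c(22, 22), c("b", "b")),
                 list(c(3, 3), c(33, 33), c("c", "c")))

# do.call turns the elements of metalist into separate arguments of Map,
# with rbind supplied as the function f
via_do_call <- do.call(Map, c(f = rbind, metalist))

# The same call written out by hand: Map feeds rbind the first vectors
# of all three elements, then the second vectors, then the third
explicit <- Map(rbind, metalist[[1]], metalist[[2]], metalist[[3]])

isTRUE(all.equal(via_do_call, explicit))
#[1] TRUE

# Using list instead of rbind only regroups ("transposes") the structure:
# transposed[[k]] holds the k-th vector of every element
transposed <- do.call(Map, c(f = list, metalist))
transposed[[2]][[3]]
#[1] 33 33
```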
The following will create a list of three matrices:
my_outcome <- list()
for (i in 1:3)
{
my_outcome[[i]] <- t(as.data.frame(lapply(metalist, `[[`, i)))
}
It does use a loop, but only over the number of matrices, so it should work in your case.
If you would like to avoid for loops entirely, the following also gets the job done:
lapply(1:3, function(y) {
do.call(rbind,lapply(metalist, function(x) x[[y]]))
})
