Applying "dim" function over elements of a list in R

I have n lists, say list1, list2, ..., listn. Each list has 10 elements, and I need to calculate the "mean" of the "dim" of the ten elements of each list, so the output should be a vector of length n.
For example the first element of the output vector should be:
n1 = mean(c(dim(list1[[1]]), dim(list1[[2]]), dim(list1[[3]]), ..., dim(list1[[10]])))
I know how to obtain it using for loops, but I am sure that is not the best solution.
The lists have a structure derived from one of the "Bioconductor" R packages, called "edgeR".
So each element of the list has this structure:
$ :Formal class 'TopTags' [package "edgeR"] with 1 slots
.. ..@ .Data:List of 4
.. .. ..$ :'data.frame': 2608 obs. of 4 variables:
.. .. .. ..$ logFC : num [1:2608] 6.37 -6.48 -5.72 -5.6 -4.01 ...
.. .. .. ..$ logCPM: num [1:2608] 5.1 2.55 2.08 1.57 3.08 ...
.. .. .. ..$ PValue: num [1:2608] 3.16e-292 1.57e-187 2.15e-152 5.58e-141 1.27e-135 ...
.. .. .. ..$ FDR : num [1:2608] 7.37e-288 1.83e-183 1.67e-148 3.25e-137 5.92e-132 ...
.. .. ..$ : chr "BH"
.. .. ..$ : chr [1:2] "healthy" "cancerous"
.. .. ..$ : chr "exact"
And since each list has 10 elements, I get 10 repeats of the above structure when running:
str(list1)

Original question
lapply (or sapply) is your friend:
mean(sapply(mylist,dim))
If you have many lists with a uniform meaning and structure, you should instead use a list of lists (i.e., mylist[[3]] instead of mylist3).
Edited question
sapply(mylist, function(x) mean(sapply(x,dim)))
will return a vector of means of inner lists.
Question in a comment
If your list contains matrices instead of vectors and you want to average one of the dimensions (dim(.)[1] or dim(.)[2]), you can use nrow or ncol, respectively, instead of dim.
Alternatively, you can pass any function there, e.g.,
sapply(mylist, function(x) mean(sapply(x, function(y) sum(dim(y)))))
to average the sums of dimensions.
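As a rough illustration of the nested sapply idea, here is a minimal sketch that averages the number of rows per outer list. The data are made up: plain data frames stand in for the edgeR TopTags objects, and make_df, mylist and mean_rows are names chosen for this example only. Note that mean(sapply(x, dim)) averages row and column counts together, so nrow is used here to average the row counts alone.

```r
# Mock data: two outer lists, each holding data frames of different sizes.
# (Assumption: data frames stand in for the real TopTags objects.)
make_df <- function(n) data.frame(logFC = numeric(n), PValue = numeric(n))
mylist <- list(
  list1 = list(make_df(10), make_df(20), make_df(30)),
  list2 = list(make_df(5), make_df(15))
)

# Mean number of rows per outer list.
# vapply checks that every inner result is a single number.
mean_rows <- vapply(
  mylist,
  function(inner) mean(vapply(inner, nrow, integer(1))),
  numeric(1)
)
mean_rows
```

vapply is used instead of sapply only so that an unexpected element fails loudly rather than silently changing the shape of the result.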

If all your objects are called "list*" and no other objects have "list" in their names, you can easily collect all the lists into a single list object, which makes them easier to operate on:
ll <- mget( ls( pattern = "list" ) )
sapply( ll , function(x) mean( sapply( x , dim ) ) )
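If other objects might have "list" in their names, a stricter pattern may be safer. The following is a sketch under assumptions: the objects live in a throwaway environment (env) created for the example, the matrices are made up, and "checklist" is a hypothetical unrelated object that pattern = "list" would also have matched.

```r
# A separate environment keeps the example away from the global workspace.
env <- new.env()
assign("list1", list(matrix(0, 2, 3), matrix(0, 4, 3)), envir = env)
assign("list2", list(matrix(0, 6, 3), matrix(0, 8, 3)), envir = env)
assign("checklist", "not one of the lists", envir = env)

# "^list[0-9]+$" matches only names made of "list" followed by digits.
list_names <- ls(envir = env, pattern = "^list[0-9]+$")
ll <- mget(list_names, envir = env)

# Mean number of rows of the matrices in each list.
res_rows <- sapply(ll, function(x) mean(sapply(x, nrow)))
res_rows
```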

Here is a solution using the Map function, where mylist is your list of lists:
Map(function(x) mean(unlist(x)), mylist)
Example:
a<-list(1,2,3,4)
b<-list(2,3,5,6)
mylist<-list(a,b)
k<- Map(function(x) mean(unlist(x)), mylist)
>k
[[1]]
[1] 2.5
[[2]]
[1] 4
To convert to vector:
> do.call(rbind,k)
[,1]
[1,] 2.5
[2,] 4.0
OR,
library(plyr)
ldply(k)
V1
1 2.5
2 4.0
If the elements of each list are matrices:
Map(function(x) mean(sapply(x, nrow)), mylist)
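rbind gives a one-column matrix and ldply gives a data frame. If a plain numeric vector is what is wanted, unlist may be the simplest route. A small sketch using the same toy lists as in the example (v is a name introduced here):

```r
a <- list(1, 2, 3, 4)
b <- list(2, 3, 5, 6)
mylist <- list(a, b)

# One mean per inner list, returned as a list of single numbers.
k <- Map(function(x) mean(unlist(x)), mylist)

# unlist() flattens that list into a numeric vector.
v <- unlist(k)
v
```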

Related

Concatenate a series of lists with incrementing numeric suffixes in R

In R I have a series of lists with incrementing numeric suffixes, e.g. mylist1, mylist2, mylist3.
I want to concatenate these, like c(mylist1, mylist2, mylist3).
Is there a shorthand way to manage this?
I think you are trying to create a list of lists.
You can do it simply by calling:
list(list1, list2, list3)
If you have many lists with a similar name pattern, you can use mget to get all objects whose names match a specific pattern (ls(pattern = x)).
data
mylist8<-list(1,2)
mylist9<-list(3,4)
mylist10<-list(5,6)
#Lists with indexes 8:10 are used to show why ordering by `parse_number(ls(...))` matters. Without the `parse_number` step, the lists would be sorted by name, which puts mylist10 before mylist8
Answer
library(readr) #provides parse_number
list_of_lists<-mget(ls(pattern = 'list\\d+')[order(parse_number(ls(pattern = 'list\\d+')))])
> str(list_of_lists)
List of 3
$ mylist8:List of 2
..$ : num 1
..$ : num 2
$ mylist9:List of 2
..$ : num 3
..$ : num 4
$ mylist10:List of 2
..$ : num 5
..$ : num 6
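The result above is a list of lists. To get the flat concatenation the question asks for, the inner lists can be joined in a second step. Below is a base-R sketch under assumptions: the lists live in a throwaway environment (env), the names follow the mylist<number> pattern, and the numeric suffix is extracted with sub instead of parse_number.

```r
env <- new.env()
assign("mylist8", list(1, 2), envir = env)
assign("mylist9", list(3, 4), envir = env)
assign("mylist10", list(5, 6), envir = env)

nms <- ls(envir = env, pattern = "^mylist[0-9]+$")
# Order by the numeric suffix so that mylist10 comes after mylist9.
suffix <- as.integer(sub("^mylist", "", nms))
list_of_lists <- mget(nms[order(suffix)], envir = env)

# Join the inner lists, as c(mylist8, mylist9, mylist10) would.
combined <- unlist(unname(list_of_lists), recursive = FALSE)
length(combined)
```

unname is applied first so that the outer names are not pasted onto the elements of the combined list.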

How can I extract row names after PCA implementation?

I am reducing the dimensionality of a test data frame (30 rows and 750 columns) with PCA (using the FactoMineR library) as follows:
pca_base <- PCA(test, ncp=5, graph=T)
I used the dimdesc() function [in FactoMineR] for dimension description, to identify the variables most significantly associated with a given principal component, as follows:
pca_dim<-dimdesc(pca_base)
pca_dim is a list of length 3.
My question is: how can I extract the row names of pca_dim from list[1] and list[2]?
I tried this code:
#to select dim 1,2 use axes
pca_dim<-dimdesc(pca_base,axes = c(1,2))
rownames(pca_dim[[1]])
But the result was NULL.
For instance, I'll use the demo data set decathlon2 from the factoextra package.
It contains 27 individuals (athletes) described by 13 variables.
library(FactoMineR)
library(factoextra)
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
res.pca <- PCA(decathlon2.active,scale.unit = TRUE, graph = FALSE)
res.desc <- dimdesc(res.pca, axes = c(1,2))
Thanks!
When you have this kind of issue accessing information in an R object, the best way to solve it is to start by examining the output of the function str.
str(pca_dim)
#List of 2
# $ Dim.1:List of 1
# ..$ quanti: num [1:8, 1:2] 0.794 0.743 0.734 0.61 0.428 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:8] "Long.jump" "Discus" "Shot.put" "High.jump" ...
# .. .. ..$ : chr [1:2] "correlation" "p.value"
# $ Dim.2:List of 1
# ..$ quanti: num [1:3, 1:2] 8.07e-01 7.84e-01 -4.65e-01 3.21e-06 9.38e-06 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:3] "Pole.vault" "X1500m" "High.jump"
# .. .. ..$ : chr [1:2] "correlation" "p.value"
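So the structure of the object is simple: it is a list of two lists. Each of these sublists has just one member, a matrix with the dimnames attribute set. That is why rownames(pca_dim[[1]]) returns NULL: pca_dim[[1]] is the sublist, not the matrix inside it.
So you can use the standard accessor functions on the matrices to get those attributes.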
rownames(pca_dim$Dim.1$quanti)
#[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline"
#[6] "X400m" "X110m.hurdle" "X100m"
rownames(pca_dim$Dim.2$quanti)
#[1] "Pole.vault" "X1500m" "High.jump"
Alternatively, you can convert each element of the dimdesc result to a data.frame, like this:
> rownames(data.frame(res.desc[1]))
[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline" "X400m" "X110m.hurdle"
[8] "X100m"
> rownames(data.frame(res.desc[2]))
[1] "Pole.vault" "X1500m" "High.jump"
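To collect the row names for every dimension in one call, lapply can walk over the sublists. This is only a sketch: mock_desc is a hand-built stand-in with the same shape as the str output shown above, and its numbers are made up rather than real dimdesc output.

```r
# Hand-built stand-in for a dimdesc() result (values are invented).
mock_desc <- list(
  Dim.1 = list(quanti = matrix(
    c(0.8, 0.7, 0.01, 0.02), nrow = 2,
    dimnames = list(c("Long.jump", "Discus"), c("correlation", "p.value"))
  )),
  Dim.2 = list(quanti = matrix(
    c(0.8, -0.5, 0.01, 0.03), nrow = 2,
    dimnames = list(c("Pole.vault", "High.jump"), c("correlation", "p.value"))
  ))
)

# Row names of the quanti matrix for each dimension.
vars_by_dim <- lapply(mock_desc, function(d) rownames(d$quanti))
vars_by_dim
```

Any element of the result that has no quanti matrix would simply come back as NULL with this approach.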

List assignment for list with greater than three nesting

I have not been able to find a fix for this error. I have implemented workarounds before, but I wonder if anyone here knows why it occurs.
The following returns no error, as expected:
q <- list()
q[["a"]][["b"]] <- 3
q[["a"]][["c"]] <- 4
However, when I add another level of nesting I get:
q <- list()
q[["a"]][["b"]][["c"]]<- 3
q[["a"]][["b"]][["d"]] <- 4
Error in q[["a"]][["b"]][["d"]] <- 4 : more elements supplied than there are to replace
To make this even more confusing, if I add a fourth level of nesting I get:
q <- list()
q[["a"]][["b"]][["c"]][["d"]] <- 3
q[["a"]][["b"]][["c"]][["e"]] <- 4
Error in *tmp*[["c"]] : subscript out of bounds
I would have expected R to return the same error message for the triple nested list as for the quadruple nested list.
I first came across this a few months ago. I am running R 3.4.3.
If we check str(q) after the first pair of assignments, q is a list with a single element 'a'. The assignments have created 'a' as a named vector rather than a list.
q <- list()
q[["a"]][["b"]] <- 3
q[["a"]][["c"]] <- 4
str(q)
#List of 1
# $ a: Named num [1:2] 3 4
# ..- attr(*, "names")= chr [1:2] "b" "c"
is.vector(q$a)
#[1] TRUE
If we assign one level deeper, the intermediate elements are again created as atomic vectors rather than lists, so the second assignment tries to store more than one value in the single slot 'b', which fails. One option is to create a list element by wrapping the value in list():
q <- list()
q[["a"]][["b"]][["c"]]<- list(3)
q[["a"]][["b"]][["d"]] <- list(4)
It returns a structure in which 'q' is a list of 1 element ('a'), which is in turn a list of length 1 ('b'); because we assign two values, 3 and 4, to 'c' and 'd', 'b' is a list of 2 elements:
str(q)
#List of 1
# $ a:List of 1
# ..$ b:List of 2
# .. ..$ c:List of 1
# .. .. ..$ : num 3
# .. ..$ d:List of 1
# .. .. ..$ : num 4
In this way, we can nest any number of lists:
q <- list()
q[["a"]][["b"]][["c"]][["d"]] <- list(3)
q[["a"]][["b"]][["c"]][["e"]] <- list(4)
Note: the expected output structure is not clear from the question.
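If the goal is to store the bare values 3 and 4 rather than one-element lists, another possible approach is to create every intermediate level as a list before assigning into it. This is a sketch of that idea, not something taken from the question:

```r
# Create each intermediate level explicitly as a list.
q <- list(a = list(b = list()))

# Assigning into an existing list adds named list elements.
q[["a"]][["b"]][["c"]] <- 3
q[["a"]][["b"]][["d"]] <- 4
str(q)
```

Because q[["a"]][["b"]] already exists as a list, each assignment adds a new element to it instead of building an atomic vector.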

R get row mean of every nth element in a list

I have a nested list and I want to get the mean of one particular variable inside the list.
When my list was not nested, it was simple to do, but I am not sure how to change my code now that there are multiple elements of different sizes.
str of list:
> str(means2)
List of 1
$ :List of 2
..$ :'data.frame': 12 obs. of 2 variables:
.. ..$ means : num [1:12] 465063 355968 76570 542873 854570 ...
.. ..$ variablenames: chr [1:12] "NumberOfPassengers" "FareClass" "TripType" "JourneyTravelTime" ...
..$ :'data.frame': 12 obs. of 2 variables:
.. ..$ means : num [1:12] 449490 359997 67899 602895 967327 ...
.. ..$ variablenames: chr [1:12] "NumberOfPassengers" "FareClass" "TripType" "JourneyTravelTime" ...
I was using this code
testdf=as.data.frame(rowMeans(simplify2array(sapply(means2,"[[",1))))
I am just not sure how to change this code to account for the fact that the means I am obtaining are from the 2nd element and not the first (only) element.
Thanks for any help
example:
edited: example had an error
Based on the str of the nested list 'means2', this should work:
unlist(lapply(means2, function(x) rowMeans(do.call(cbind, sapply(x, "[", 1)))))
As there is only a single outer list, we can extract it using [[, loop over the list elements to get the first column of each, get the elementwise sum with Reduce, and divide by the length of the list (2 in the example).
Reduce(`+`, lapply(means2[[1]], `[`, 1))/2
Or, after extracting the list elements, cbind them and take the rowMeans:
rowMeans(do.call(cbind,lapply(means2[[1]], `[`, 1)))
data
means2 <- list(list(data.frame(means = 1:5, variablenames = letters[1:5]),
data.frame(means = 11:15, variablenames = letters[6:10])) )
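For reference, here is a sketch of the same two approaches that avoids hard-coding the divisor and pulls the column out by name. It uses the example data above; inner, means_matrix, row_avg and row_avg2 are names introduced for illustration.

```r
means2 <- list(list(
  data.frame(means = 1:5, variablenames = letters[1:5]),
  data.frame(means = 11:15, variablenames = letters[6:10])
))

inner <- means2[[1]]

# `[[` with "means" returns the column as a plain vector,
# so cbind builds a numeric matrix with one column per data frame.
means_matrix <- do.call(cbind, lapply(inner, `[[`, "means"))
row_avg <- rowMeans(means_matrix)

# Same result via Reduce, dividing by the number of data frames.
row_avg2 <- Reduce(`+`, lapply(inner, `[[`, "means")) / length(inner)
row_avg
```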

Add dataframes to each list element

I am reading a series of files that end up in a list of data frames. After that, I want to attach some additional information to each data frame, so I want to add some extra elements to each element of my list of data frames.
My attempt was to actually build the list of "extra stuff" and then try to merge it with the list of dataframes.
Example code:
set.seed(42)
#Building my list of data.frames. In my specific case this is coming from files
A <- data.frame(x=rnorm(10), y=rnorm(10))
B <- data.frame(x=rnorm(10), y=rnorm(10))
ListD <- list(A, B)
names(ListD)<- c("A", "B") #some names to know what is what
#now my attributes. Each data.frame has some properties that I want to keep track of.
newList <- list(A=c("Color"=123, "Date"=321), B=c("Color"=111, "Date"=111))
#My desired output is a list where each element of the list has
#"Color", "Date" and a dataframe
#I tried something like:
lapply(ListD, append, values=newList)
As far as I can tell, all you have to do is change your initialization of ListD to:
ListD <- list(list(A), list(B))
Because the data structure you want is a list of lists, with the inner lists holding a data.frame and two further attributes. I can't guarantee this is exactly the result you desire, but essentially this is where your problem is located.
OK, I thought this would be straightforward with mapply, but I can't get the lists to play together well... maybe someone else can. So here's a for-loop solution:
#preallocate list
updatedList <- vector(mode = "list", length = length(ListD))
names(updatedList) <- names(ListD)
for(i in seq_along(updatedList)) {
updatedList[[i]] <- c(ListD[i], newList[[i]])
}
updatedList$A
# $A
# x y
# 1 -0.51690823 0.4521443
# 2 0.97544933 -0.7212561
# 3 0.98909668 -0.2258737
# 4 -1.72753947 -0.7643175
# 5 -1.31050478 -3.2526437
# 6 -0.63845053 1.1263407
# 7 -0.09010858 -0.9386608
# 8 -0.53933869 -0.6882866
# 9 0.54668290 1.7227261
# 10 -0.87948586 -0.2413344
#
# $Color
# [1] 123
#
# $Date
# [1] 321
Alternatively, if you take @Яaffael's suggestion, mapply works, but that will depend on how you're building the list from the files in the first place:
ListD <- list(list(A), list(B))
updatedList <- mapply(c, ListD, newList, SIMPLIFY = FALSE)
I used @Яaffael's suggestion of a list of lists.
Since changing my list of data frames is not very easy, because of the way I'm reading them from files, I made the list of lists with the extra data and then joined the data frames like this:
newList <- list(A=list("Color"=123, "Date"=321), B=list("Color"=111, "Date"=111))
for(n in names(newList)){
newList[[n]]$Dataframe <- ListD[[n]]
}
The structure of my output:
> str(newList)
List of 2
$ A:List of 3
..$ Color : num 123
..$ Date : num 321
..$ Dataframe:'data.frame': 10 obs. of 2 variables:
.. ..$ x: num [1:10] 1.371 -0.565 0.363 0.633 0.404 ...
.. ..$ y: num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
$ B:List of 3
..$ Color : num 111
..$ Date : num 111
..$ Dataframe:'data.frame': 10 obs. of 2 variables:
.. ..$ x: num [1:10] -0.307 -1.781 -0.172 1.215 1.895 ...
.. ..$ y: num [1:10] 0.455 0.705 1.035 -0.609 0.505 ...
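The same structure can probably also be built without a loop. The following is a sketch using Map, assuming ListD and newList are both named as in the question; the element name Dataframe follows the output above, and matching is done by name rather than by position.

```r
set.seed(42)
A <- data.frame(x = rnorm(10), y = rnorm(10))
B <- data.frame(x = rnorm(10), y = rnorm(10))
ListD <- list(A = A, B = B)
newList <- list(A = c(Color = 123, Date = 321), B = c(Color = 111, Date = 111))

# Pair each data frame with its attributes, matched by name.
updatedList <- Map(
  function(df, extra) c(as.list(extra), list(Dataframe = df)),
  ListD,
  newList[names(ListD)]
)
str(updatedList, max.level = 2)
```

Wrapping the data frame in list(Dataframe = df) keeps it as one element; passing it to c() directly would splice its columns into the result.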
