Nested lists and applying a function to some elements - r

I have a nested list, data.
Calling str(data) gives the following output:
List of 2
$ group_info :List of 2
..$ lat : num [1:22] 50.5 55 ...
..$ names : chr [1:22] "A" "B"
$ param : num [1:60, 1:56] 0.0923 0.0952 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:60] "RESULTS" "RESULTS1" "RESULTS2" "RESULTS3" ...
.. ..$ : chr [1:56] "exp" "pops" ...
I would like to check the values in the "pops" column and, if they are < 0.125, set them to 0.125.
How would it be possible to do?
Thanks!

The easiest option is to extract the column, take pmax of it and 0.125, and assign the result back: any element less than 0.125 becomes 0.125, larger values are kept, and with na.rm = TRUE any NA also becomes 0.125.
data$param[, "pops"] <- pmax(data$param[, "pops"], 0.125, na.rm = TRUE)
Or, if we want to use < explicitly, either ifelse or replace works:
data$param[, "pops"] <- replace(data$param[, "pops"],
data$param[, "pops"] < 0.125, 0.125)
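A minimal, self-contained sketch of the pmax approach, assuming a toy object that mimics the str() output above. The numbers and the third column name "other" are made up for illustration; only the list/matrix shape follows the question.

```r
# Hypothetical toy object shaped like the str() output: a list whose
# $param element is a numeric matrix with row and column names
param <- matrix(c(0.0923, 0.0952, 0.2, 0.1,   # "exp"
                  0.05, 0.3, 0.125, 0.11,     # "pops"
                  1, 2, 3, 4),                # "other" (made up)
                nrow = 4,
                dimnames = list(paste0("RESULTS", 1:4),
                                c("exp", "pops", "other")))
data <- list(group_info = list(lat = c(50.5, 55), names = c("A", "B")),
             param = param)

# Raise every "pops" value below 0.125 to 0.125; other columns are untouched
data$param[, "pops"] <- pmax(data$param[, "pops"], 0.125)
data$param[, "pops"]   # values: 0.125 0.300 0.125 0.125
```

Because the assignment targets only the "pops" column of the matrix, the rest of the nested list is left as it was.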

Related

How can I extract row names after running PCA?

I am reducing the dimensionality of a test data frame (30 rows and 750 columns) with PCA, using the FactoMineR library, as follows:
library(FactoMineR)
pca_base <- PCA(test, ncp = 5, graph = TRUE)
I used the dimdesc() function from FactoMineR (dimension description) to identify the variables most significantly associated with a given principal component:
pca_dim <- dimdesc(pca_base)
pca_dim is a list of length 3.
My question is: how can I extract the row names from pca_dim[[1]] and pca_dim[[2]]?
I tried this code:
# use axes to select dimensions 1 and 2
pca_dim<-dimdesc(pca_base,axes = c(1,2))
rownames(pca_dim[[1]])
But the result was NULL.
For instance, I'll use the demo data set decathlon2 from the factoextra package.
It contains 27 individuals (athletes) described by 13 variables.
library(FactoMineR)
library(factoextra)
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
res.pca <- PCA(decathlon2.active, scale.unit = TRUE, graph = FALSE)
res.desc <- dimdesc(res.pca, axes = c(1, 2))
Thanks!
When you have this kind of issue accessing information in an R object, the best way to start is to examine the output of str():
str(pca_dim)
#List of 2
# $ Dim.1:List of 1
# ..$ quanti: num [1:8, 1:2] 0.794 0.743 0.734 0.61 0.428 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:8] "Long.jump" "Discus" "Shot.put" "High.jump" ...
# .. .. ..$ : chr [1:2] "correlation" "p.value"
# $ Dim.2:List of 1
# ..$ quanti: num [1:3, 1:2] 8.07e-01 7.84e-01 -4.65e-01 3.21e-06 9.38e-06 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:3] "Pole.vault" "X1500m" "High.jump"
# .. .. ..$ : chr [1:2] "correlation" "p.value"
So the structure of the object is simple: it is a list of two lists. Each of these sublists has just one member, a matrix with the dimnames attribute set.
So you can use standard accessor functions to get those attributes.
rownames(pca_dim$Dim.1$quanti)
#[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline"
#[6] "X400m" "X110m.hurdle" "X100m"
rownames(pca_dim$Dim.2$quanti)
#[1] "Pole.vault" "X1500m" "High.jump"
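The same accessor can be applied to every dimension at once with lapply. A sketch, using a hypothetical mock object that mirrors the str() output above so it runs without FactoMineR (only three Dim.1 rows are kept, and the p-values not visible in that output are placeholders). The grepl() filter on names starting with "Dim." is an assumption, meant to skip any non-dimension elements the real object might contain.

```r
# Hypothetical mock shaped like str(pca_dim) above; p-values not shown
# in the question's output are placeholders
pca_dim <- list(
  Dim.1 = list(quanti = matrix(c(0.794, 0.743, 0.734, 1e-5, 1e-4, 1e-3),
                               ncol = 2,
                               dimnames = list(c("Long.jump", "Discus", "Shot.put"),
                                               c("correlation", "p.value")))),
  Dim.2 = list(quanti = matrix(c(0.807, 0.784, -0.465, 3.21e-06, 9.38e-06, 0.02),
                               ncol = 2,
                               dimnames = list(c("Pole.vault", "X1500m", "High.jump"),
                                               c("correlation", "p.value"))))
)

# Keep only the Dim.* entries, then pull the row names of each quanti matrix
dims <- pca_dim[grepl("^Dim\\.", names(pca_dim))]
var_names <- lapply(dims, function(d) rownames(d$quanti))
var_names$Dim.2
```

lapply is used rather than sapply because the dimensions can have different numbers of significant variables, so the results are kept as a named list of character vectors.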
Alternatively, you can convert each element of the dimdesc result to a data frame and take its row names, like this:
rownames(data.frame(res.desc[1]))
#[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline" "X400m" "X110m.hurdle"
#[8] "X100m"
rownames(data.frame(res.desc[2]))
#[1] "Pole.vault" "X1500m" "High.jump"

R: parsing unequal-length elements of a list-type variable within a dataframe into separate (flattened) variables

I have a dataframe object that contains 825812 obs and 3 variables, the first of which (AF) is a list. The length of the unlisted list variable is 839390 because some obs have more than one AF value assigned.
length(TRAIN_vcfAF$AF) #825812
length(unlist(TRAIN_vcfAF$AF)) #839390
I would like to do three things:
1. pull the first element of the list for each obs into a new variable, 'AF1'; pull the second element (if present) for each obs into new variable, 'AF2', and so on (if any obs has >2 AF values)
2. create 'AF0', the minimum of AF1, AF2, ..., AFn
3. create a column that provides the number (count) of unique AF values in the dataframe for each obs.
NOTE: The dataframe has informative rownames (chromosomal positions), that I would like to preserve in the output df.
Below is the str() call on the dataframe and a partial print of the dataframe.
str(TRAIN_vcfAF)
'data.frame': 825812 obs. of 3 variables:
$ AF :List of 825812
..$ : num 8.04e-05
..$ : num 8.04e-05
..$ : num 0.00113
..$ : num 0.000161
..$ : num 0.000321
..$ : num 8.04e-05
..$ : num 8.04e-05
...
.. [list output truncated]
..- attr(*, "class")= chr "AsIs"
$ drop: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ X : num 8.04e-05 8.04e-05 1.13e-03 1.61e-04 3.21e-04 ...
# partial print:
TRAIN_vcfAF[1:6,]
AF drop AF0
chr1.100111836.56777 8.035e-05 FALSE 8.035e-05
chr1.100111850.56778 8.035e-05 FALSE 8.035e-05
chr1.100127842.56781 0.001126 FALSE 1.126e-03
chr1.100133162.56783 0.0001607 FALSE 1.607e-04
chr1.100133187.56785 0.0003214 FALSE 3.214e-04
chr1.100133328.56788 8.035e-05 FALSE 8.035e-05
The last two are very simple:
TRAIN_vcfAF$AF0 = sapply(TRAIN_vcfAF$AF, min)
TRAIN_vcfAF$unique = sapply(TRAIN_vcfAF$AF, function(o) length(unique(o)))
For the first one there may be dedicated tools, but with basic manipulation you can do something like this:
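# number of columns needed: the length of the longest AF element
n = seq_len(max(sapply(TRAIN_vcfAF$AF, length)))
# o[n] pads shorter elements with NA; t() gives one row per obs
AF = t(sapply(TRAIN_vcfAF$AF, function(o) o[n]))
colnames(AF) = paste0("AF", n)
# cbind with a data frame keeps the original row names
TRAIN_vcfAF = cbind(TRAIN_vcfAF, AF)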
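A runnable sketch of all three steps on hypothetical toy data (three observations with made-up row names and AF values) shaped like the question's data frame, including the AsIs list column. lengths() is used here as a shorthand for sapply(..., length).

```r
# Hypothetical toy data: 3 observations, informative row names, and an
# AsIs list column AF where some obs carry more than one value
df <- data.frame(drop = c(FALSE, FALSE, FALSE),
                 row.names = c("chr1.pos1", "chr1.pos2", "chr1.pos3"))
df$AF <- I(list(8.035e-05, c(0.001126, 0.0001607), c(0.0003214, 0.0003214)))

# Steps 2 and 3: minimum and number of unique values per obs
df$AF0 <- sapply(df$AF, min)
df$unique <- sapply(df$AF, function(o) length(unique(o)))

# Step 1: pad each element with NA up to the longest one, one column per position
n <- seq_len(max(lengths(df$AF)))
AF <- t(sapply(df$AF, function(o) o[n]))
colnames(AF) <- paste0("AF", n)
df <- cbind(df, AF)   # row names of df are kept
df
```

The first observation has a single value, so its AF2 is NA, and the third has two identical values, so its unique count is 1.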

Extract items in nested lists

I have a problem:
I work with a data frame of development time (dependent variable) of five species as a function of temperature (independent variable).
With the by() function I fitted an lm for each of the five species:
by(dados, dados$Especie, function(dados) lm(dados$Tempo ~ dados$Temp, data = dados))
and as a result I got lists nested in other lists, as you can see here:
List of 5
$ C.albiceps :List of 12
..$ coefficients : Named num [1:2] 262.78 -1.76
.. ..- attr(*, "names")= chr [1:2] "(Intercept)" "dados$Temp"
..$ residuals : Named num [1:41] -4.157 -2.394 -0.631 1.131 2.894 ...
.. ..- attr(*, "names")= chr [1:41] "1" "2" "3" "4" ...
..$ effects : Named num [1:41] -1344.031 -133.548 0.235 1.977 3.72 ...
.. ..- attr(*, "names")= chr [1:41] "(Intercept)" "dados$Temp" "" "" ...
..$ rank : int 2
It's a list of 5 elements (one for each species), and each species is a list of 12 elements (from the lm function). So, 5 lists of 12 within a list of 5.
Now, my question:
I want to extract a value from each model's coefficients and sum them up. That is, from List$speciesName$coefficients[2] I want the second coefficient for each species, and I also want to save these values in a vector (in order to calculate indices with them).
Any helpful hints on that?
You can loop over the models with an apply function and extract the coefficients with coef.
## Example
mods <- by(mtcars, mtcars$cyl, function(x) lm(mpg ~ disp, data=x))
## Sum up all the second coefficients
sum(sapply(mods, function(x) coef(x)[[2]]))
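Since the question also asks to keep the values in a vector, here is a sketch that stores the slopes first and sums them afterwards, reusing the built-in mtcars example from this answer. For the question's data, the by() result would take the place of mods.

```r
# mtcars ships with R; one lm per cylinder group, as in the answer above
mods <- by(mtcars, mtcars$cyl, function(x) lm(mpg ~ disp, data = x))

# Keep the second coefficient (the slope on disp) of every model in a vector
slopes <- sapply(mods, function(x) coef(x)[[2]])
slopes

# The vector can then be summed or used in further calculations
sum(slopes)
```

Using [[2]] rather than [2] drops the coefficient's own name, so the result is a plain numeric vector with one entry per group.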

merge data.frame with multidimensional list

I have a data frame "QARef" with 25 variables. There are only 5 unique jobs (3rd column) but many rows per job:
str(QARef)
'data.frame': 648 obs. of 25 variables:
I'm using tapply to generate mean values across all 5 jobs for certain rows:
RefMean <- tapply(QARef$MTN,
list(QARef$Target_CD, QARef$Feature_Type, QARef$Orientation, QARef$Contrast, QARef$Prox),
FUN=mean, trim=0, na.rm=TRUE)
and I get something I'm hoping is called a multidimensional list (str() shows a 5-dimensional numeric array with dimnames):
str(RefMean)
num [1:17, 1:2, 1:2, 1:2, 1:2] 34.1 34.2 25.2 28.9 29.2 ...
- attr(*, "dimnames")=List of 5
..$ : chr [1:17] "55" "60" "70" "80" ...
..$ : chr [1:2] "LINE" "SQUARE"
..$ : chr [1:2] "X" "Y"
..$ : chr [1:2] "CLEAR" "DARK"
..$ : chr [1:2] "1:1" "Iso"
What I want to do is add a column to QARef which contains the correct RefMean value for each row depending on a match between values in columns of QARef and dimnames of RefMean. E.g. QARef column Feature_Type=="LINE" should match the dimname "LINE" etc.
Any hint how to do this or where to find the answer would be highly appreciated.
I think I found a solution. It's probably not elegant, but it works:
RefMean <- data.frame(tapply(QARef$MTN,
                             paste(QARef$Target_CD, QARef$Feature_Type, QARef$Orientation,
                                   QARef$Contrast, QARef$Prox, QARef$Measurement_Type),
                             FUN = mean, trim = 0, na.rm = TRUE))
colnames(RefMean) <- c("MTN_Ref")
Ident <- do.call(rbind, strsplit(rownames(RefMean), " "))
RefMean["Target_CD"] <- Ident[,1]
RefMean["Feature_Type"] <- Ident[,2]
RefMean["Orientation"] <- Ident[,3]
RefMean["Contrast"] <- Ident[,4]
RefMean["Prox"] <- Ident[,5]
RefMean["Measurement_Type"] <- Ident[,6]
QA4 <- merge(QARef,RefMean,by=c("Target_CD","Feature_Type","Orientation","Contrast","Prox","Measurement_Type"),all.x=TRUE,sort=FALSE)
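As an alternative to the paste/strsplit/merge route, base R can index an array directly with a character matrix (one column per dimension), matching each row against the array's dimnames. Here is a sketch with hypothetical toy data and only two of the five grouping columns. With the real data, idx would get one column per grouping variable, in the same order as the list passed to tapply.

```r
# Hypothetical toy data using only two of the grouping columns
QARef <- data.frame(
  Target_CD    = c(55, 55, 60, 60, 55),
  Feature_Type = c("LINE", "LINE", "SQUARE", "SQUARE", "SQUARE"),
  MTN          = c(34, 36, 25, 27, 30)
)

# Group means as an array whose dimnames are the group labels
RefMean <- tapply(QARef$MTN, list(QARef$Target_CD, QARef$Feature_Type),
                  FUN = mean, na.rm = TRUE)

# One row per observation, one column per array dimension (same order as
# in the tapply list); each row is matched against the dimnames
idx <- cbind(as.character(QARef$Target_CD), as.character(QARef$Feature_Type))
QARef$MTN_Ref <- RefMean[idx]
QARef
```

The as.character() calls matter: cbind() on a factor would otherwise contribute integer codes instead of the labels that appear in the dimnames.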

Applying "dim" function over elements of a list in R

I have n lists, let's say list1, list2, ..., listn. Each list has 10 elements, and I need to calculate the mean of the dim of the ten elements of each list, so the output should be a vector of length n.
For example the first element of the output vector should be:
n1 = mean(c(dim(list1[[1]]), dim(list1[[2]]), dim(list1[[3]]), ..., dim(list1[[10]])))
I know how to obtain it using for-loops but I am sure it is not the best solution.
The lists have a structure derived from edgeR, one of the Bioconductor R packages.
So each element of the list has this structure:
$ :Formal class 'TopTags' [package "edgeR"] with 1 slots
.. ..@ .Data:List of 4
.. .. ..$ :'data.frame': 2608 obs. of 4 variables:
.. .. .. ..$ logFC : num [1:2608] 6.37 -6.48 -5.72 -5.6 -4.01 ...
.. .. .. ..$ logCPM: num [1:2608] 5.1 2.55 2.08 1.57 3.08 ...
.. .. .. ..$ PValue: num [1:2608] 3.16e-292 1.57e-187 2.15e-152 5.58e-141 1.27e-135 ...
.. .. .. ..$ FDR : num [1:2608] 7.37e-288 1.83e-183 1.67e-148 3.25e-137 5.92e-132 ...
.. .. ..$ : chr "BH"
.. .. ..$ : chr [1:2] "healthy" "cancerous"
.. .. ..$ : chr "exact"
Since each list has 10 elements, I see 10 repeats of the above structure when running:
str(list1)
Original question
lapply (or sapply) is your friend:
mean(sapply(mylist,dim))
If you have many lists with a uniform meaning and structure, you should use instead a list of lists (i.e., mylist[[3]] instead of mylist3).
Edited question
sapply(mylist, function(x) mean(sapply(x,dim)))
will return a vector of means of inner lists.
Question in a comment
If your list contains matrices instead of vectors and you want to average one of the dimensions (dim(.)[1] or dim(.)[2]), you can use ncol and nrow for that instead of dim.
Alternatively, you can pass any function there, e.g.,
sapply(mylist, function(x) mean(sapply(x, function(y) sum(dim(y)))))
to average the sums of dimensions.
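Here is a small sketch with hypothetical stand-in lists of matrices, showing both the pooled mean of dim() and the mean number of rows per list. vapply() is used for the outer loop instead of sapply() so that each list is guaranteed to return exactly one number. With the real TopTags objects, this relies on dim() and nrow() working for them, as the answers above assume.

```r
# Hypothetical stand-ins for list1, list2: each element is a matrix
list1 <- list(matrix(0, nrow = 2, ncol = 4), matrix(0, nrow = 4, ncol = 4))
list2 <- list(matrix(0, nrow = 3, ncol = 1), matrix(0, nrow = 5, ncol = 1),
              matrix(0, nrow = 7, ncol = 1))
mylist <- list(list1 = list1, list2 = list2)

# Mean of all dimensions (rows and columns pooled) for each inner list
mean_dim <- vapply(mylist, function(x) mean(sapply(x, dim)), numeric(1))

# Mean number of rows for each inner list
mean_rows <- vapply(mylist, function(x) mean(vapply(x, nrow, integer(1))), numeric(1))

mean_dim   # list1 = 3.5, list2 = 3
mean_rows  # list1 = 3,   list2 = 5
```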
If all your objects are named "list*" and no other objects have "list" in their names, you can easily collect all the lists into a single list object, which makes them easier to operate on:
ll <- mget(ls(pattern = "list"))
sapply(ll, function(x) mean(sapply(x, dim)))
Here is a solution using the Map function, where mylist is your list of lists:
Map(function(x) mean(unlist(x)), mylist)
Example:
a <- list(1, 2, 3, 4)
b <- list(2, 3, 5, 6)
mylist <- list(a, b)
k <- Map(function(x) mean(unlist(x)), mylist)
> k
[[1]]
[1] 2.5
[[2]]
[1] 4
To convert to a vector, unlist(k) works; do.call(rbind, k) gives a one-column matrix instead:
> do.call(rbind,k)
[,1]
[1,] 2.5
[2,] 4.0
OR,
library(plyr)
ldply(k)
V1
1 2.5
2 4.0
If the elements of each list are matrices (here averaging their number of rows):
Map(function(x) mean(sapply(x, nrow)), mylist)
