Extract items in nested lists - r

I have a problem:
I am working with a data frame containing the development time (dependent variable) of five species as a function of temperature (independent variable).
Using the "by" function, I fitted an lm for each of the five species:
by(dados, dados$Especie, function(dados) lm(dados$Tempo ~ dados$Temp, data = dados))
and as a result I got lists nested inside another list, as you can see here:
List of 5
$ C.albiceps :List of 12
..$ coefficients : Named num [1:2] 262.78 -1.76
.. ..- attr(*, "names")= chr [1:2] "(Intercept)" "dados$Temp"
..$ residuals : Named num [1:41] -4.157 -2.394 -0.631 1.131 2.894 ...
.. ..- attr(*, "names")= chr [1:41] "1" "2" "3" "4" ...
..$ effects : Named num [1:41] -1344.031 -133.548 0.235 1.977 3.72 ...
.. ..- attr(*, "names")= chr [1:41] "(Intercept)" "dados$Temp" "" "" ...
..$ rank : int 2
It's a list of 5 elements (one for each species), and each element is itself a list of 12 elements (from the lm function). So, 5 lists of 12 within a list of 5.
Now, my question:
I want to extract the values from my coefficients and sum them up. So I have
List$speciesName$coefficients[2], and I want to extract that value (the second item of $coefficients) for each species and save the values in a vector (in order to calculate indices with it).
Any helpful hints on that?

You can loop over the models with an apply-family function such as sapply and extract the coefficients with coef.
## Example
mods <- by(mtcars, mtcars$cyl, function(x) lm(mpg ~ disp, data=x))
## Sum up all the second coefficients
sum(sapply(mods, function(x) coef(x)[[2]]))
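Since the question also asks for the values in a vector, here is a minimal sketch along the same lines. It reuses the mtcars models from the example above as a stand-in for the asker's data, and the name slopes is only an illustrative choice.

```r
# Fit one model per cylinder group (stand-in for the per-species models)
mods <- by(mtcars, mtcars$cyl, function(x) lm(mpg ~ disp, data = x))

# Pull the second coefficient (the slope) out of each model;
# vapply checks that every result is a single number
slopes <- vapply(mods, function(m) coef(m)[[2]], numeric(1))

slopes        # one slope per group
sum(slopes)   # same total as the sapply version
```

Keeping the vector around means the sum, or any other index, can be computed later without looping over the models again.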

Related

Nested lists and function to some elements

I have a nested list data
With str(data) I have the following output
List of 2
$ group_info :List of 2
..$ lat : num [1:22] 50.5 55 ...
..$ names : chr [1:22] "A" "B"
$ param : num [1:60, 1:56] 0.0923 0.0952 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:60] "RESULTS" "RESULTS1" "RESULTS2" "RESULTS3" ...
.. ..$ : chr [1:56] "exp" "pops" ...
I would like to check the values in the "pops" column and, if they are < 0.125, set them to 0.125.
How can I do this?
Thanks!
The easiest option is to extract the column from the list element and assign back the pmax of that column and 0.125, so that any element less than 0.125 is replaced by 0.125:
data$param[, "pops"] <- pmax(data$param[, "pops"], 0.125, na.rm = TRUE)
Or, if we want to use <, either ifelse or replace would work:
data$param[, "pops"] <- replace(data$param[, "pops"],
data$param[, "pops"] < 0.125, 0.125)
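To see the pmax approach in action, here is a small self-contained sketch. The list below is a made-up miniature of the structure shown by str(data); its values are invented for illustration only.

```r
# Toy stand-in for the nested list in the question (values are invented)
data <- list(
  group_info = list(lat = c(50.5, 55), names = c("A", "B")),
  param = matrix(c(0.09, 0.2, 0.5, 0.1), nrow = 2,
                 dimnames = list(c("RESULTS", "RESULTS1"), c("exp", "pops")))
)

# Raise every "pops" value below 0.125 up to 0.125; other columns are untouched
data$param[, "pops"] <- pmax(data$param[, "pops"], 0.125, na.rm = TRUE)

data$param
```

Only the "pops" column is reassigned, so the rest of the matrix and the group_info element keep their original values.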

How can I extract row names after running PCA?

I am reducing the dimensionality of a test data frame (30 rows and 750 columns) with PCA, using the FactoMineR library, as follows:
pca_base <- PCA(test, ncp=5, graph=T)
I used the function dimdesc() [in FactoMineR] for dimension description, to identify the variables most significantly associated with a given principal component, as follows:
pca_dim<-dimdesc(pca_base)
pca_dim is a list of length 3.
My question is: how can I extract the row names of pca_dim from list[1] and list[2]?
I tried this code:
#to select dim 1,2 use axes
pca_dim<-dimdesc(pca_base,axes = c(1,2))
rownames(pca_dim[[1]])
But the result was NULL.
For instance, I'll use the demo data set decathlon2 from the factoextra package.
It contains 27 individuals (athletes) described by 13 variables.
library(FactoMineR)
library(factoextra)
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
res.pca <- PCA(decathlon2.active,scale.unit = TRUE, graph = FALSE)
res.desc <- dimdesc(res.pca, axes = c(1,2))
Thanks!
When you have this kind of issue accessing information in an R object, the best way to solve it is to start by examining the output of the function str.
str(pca_dim)
#List of 2
# $ Dim.1:List of 1
# ..$ quanti: num [1:8, 1:2] 0.794 0.743 0.734 0.61 0.428 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:8] "Long.jump" "Discus" "Shot.put" "High.jump" ...
# .. .. ..$ : chr [1:2] "correlation" "p.value"
# $ Dim.2:List of 1
# ..$ quanti: num [1:3, 1:2] 8.07e-01 7.84e-01 -4.65e-01 3.21e-06 9.38e-06 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:3] "Pole.vault" "X1500m" "High.jump"
# .. .. ..$ : chr [1:2] "correlation" "p.value"
So the structure of the object is simple: it is a list of two lists. Each of these sublists has just one member, a matrix with the dimnames attribute set.
So you can use the standard accessor functions to get those attributes.
rownames(pca_dim$Dim.1$quanti)
#[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline"
#[6] "X400m" "X110m.hurdle" "X100m"
rownames(pca_dim$Dim.2$quanti)
#[1] "Pole.vault" "X1500m" "High.jump"
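If there are many dimensions, the same accessor can be applied to all of them at once. The sketch below uses a hand-built list that mimics the str output above (the numbers are invented and no FactoMineR call is made), and it assumes every element has a quanti matrix.

```r
# Mock object with the same shape as the dimdesc result shown by str()
pca_dim <- list(
  Dim.1 = list(quanti = matrix(c(0.79, 0.74, 0.001, 0.002), ncol = 2,
    dimnames = list(c("Long.jump", "Discus"), c("correlation", "p.value")))),
  Dim.2 = list(quanti = matrix(c(0.81, 0.78, 0.003, 0.004), ncol = 2,
    dimnames = list(c("Pole.vault", "X1500m"), c("correlation", "p.value"))))
)

# Row names of the quanti matrix for every dimension, as a named list
var_names <- lapply(pca_dim, function(d) rownames(d$quanti))
var_names
```

With a real dimdesc result, any element that lacks a quanti matrix would simply yield NULL here, so it may be worth checking names(pca_dim) first.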
You can also convert each element of the dimdesc result to a data.frame, like this:
rownames(data.frame(res.desc[1]))
#[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline" "X400m" "X110m.hurdle"
#[8] "X100m"
rownames(data.frame(res.desc[2]))
#[1] "Pole.vault" "X1500m" "High.jump"

merge data.frame with multidimensional list

I have a data frame 'QARef' with 25 variables. There are only 5 unique jobs (3rd column) but lots of rows per job:
str(QARef)
'data.frame': 648 obs. of 25 variables:
I'm using tapply to generate mean values across all 5 jobs for certain rows:
RefMean <- tapply(QARef$MTN,
list(QARef$Target_CD, QARef$Feature_Type, QARef$Orientation, QARef$Contrast, QARef$Prox),
FUN=mean, trim=0, na.rm=TRUE)
and I get something I'm hoping is referred to as a multidimensional list:
str(RefMean)
num [1:17, 1:2, 1:2, 1:2, 1:2] 34.1 34.2 25.2 28.9 29.2 ...
- attr(*, "dimnames")=List of 5
..$ : chr [1:17] "55" "60" "70" "80" ...
..$ : chr [1:2] "LINE" "SQUARE"
..$ : chr [1:2] "X" "Y"
..$ : chr [1:2] "CLEAR" "DARK"
..$ : chr [1:2] "1:1" "Iso"
What I want to do is add a column to QARef which contains the correct RefMean value for each row depending on a match between values in columns of QARef and dimnames of RefMean. E.g. QARef column Feature_Type=="LINE" should match the dimname "LINE" etc.
Any hint how to do this or where to find the answer would be highly appreciated.
I think I found a solution. It is probably not elegant, but it works:
RefMean <- data.frame(tapply(QARef$MTN,paste(QARef$Target_CD,QARef$Feature_Type,QARef$Orientation,QARef$Contrast,QARef$Prox,QARef$Measurement_Type),FUN=mean,trim=0,na.rm=TRUE))
colnames(RefMean) <- c("MTN_Ref")
Ident <- do.call(rbind, strsplit(rownames(RefMean), " "))
RefMean["Target_CD"] <- Ident[,1]
RefMean["Feature_Type"] <- Ident[,2]
RefMean["Orientation"] <- Ident[,3]
RefMean["Contrast"] <- Ident[,4]
RefMean["Prox"] <- Ident[,5]
RefMean["Measurement_Type"] <- Ident[,6]
QA4 <- merge(QARef,RefMean,by=c("Target_CD","Feature_Type","Orientation","Contrast","Prox","Measurement_Type"),all.x=TRUE,sort=FALSE)
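As a possible alternative to pasting the keys together, the array returned by tapply can be indexed directly with a character matrix that has one column per dimension. The sketch below uses a tiny invented data frame with only two grouping columns; the column names follow the question, but the values are made up.

```r
# Toy stand-in for QARef (invented values, two grouping columns only)
QARef <- data.frame(
  Target_CD    = c("55", "55", "60", "60"),
  Feature_Type = c("LINE", "LINE", "SQUARE", "SQUARE"),
  MTN          = c(34, 36, 25, 27),
  stringsAsFactors = FALSE
)

# Group means as an array whose dimnames are the grouping values
RefMean <- tapply(QARef$MTN,
                  list(QARef$Target_CD, QARef$Feature_Type),
                  FUN = mean, na.rm = TRUE)

# One row per observation, one column per array dimension;
# each row selects the cell whose dimnames match that observation
idx <- cbind(as.character(QARef$Target_CD), as.character(QARef$Feature_Type))
QARef$MTN_Ref <- RefMean[idx]

QARef
```

The same idea should extend to five grouping columns by adding more columns to idx, in the same order as the list passed to tapply.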

Import contingency table (.csv-format) as "table" rather than "data.frame" in R

I am working with the (I think) very cool Titanic data that is publicly available.
There are two main ways to import it into R:
(1) You can either use the built-in dataset Titanic (library(datasets)) or
(2) you can download it as .csv-file, e.g. here.
Now, the data is aggregated frequency data. I would like to convert the multi-dimensional contingency table into an individual-level data frame.
PROBLEM: If I use the built-in dataset, this is no problem; if I use the imported .csv file, however, it doesn't work. This is the error message I get:
Error in rep(1:nrow(tablevars), counts) : invalid 'times' argument
In addition: Warning message:
In expand.table(Titanic.table) : NAs introduced by coercion
Why? What am I doing wrong? Many thanks.
R CODE
#required packages
library(datasets)
library(epitools)
#(1) Expansion of built-in data set
data(Titanic)
Titanic.raw <- Titanic
class(Titanic.raw) # data is stored as "table"
Titanic.expand <- expand.table(Titanic.raw)
#(2) Expansion of imported data set
Titanic.raw <- read.table("Titanic.csv", header=TRUE, sep=",", row.names=1)
class(Titanic.raw) #data is stored as "data.frame"
Titanic.table <- as.table(as.matrix(Titanic.raw))
class(Titanic.table) #data is stored as "table"
Titanic.expand <- expand.table(Titanic.table)
I think you probably want xtabs. Watch out: the factor coding differs between the Titanic and the Titanic.new objects. By default, factor levels are in lexicographic order, while two of the Titanic factors are not:
str(Titanic)
table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Male" "Female"
..$ Age : chr [1:2] "Child" "Adult"
..$ Survived: chr [1:2] "No" "Yes"
Titanic.raw <- read.table("~/Downloads/Titanic.csv", header=TRUE, sep=",", row.names=1)
str( Titanic.new <-
xtabs( Freq ~ Class + Sex + Age +Survived, data=Titanic.raw))
xtabs [1:4, 1:2, 1:2, 1:2] 4 13 89 3 118 154 387 670 0 0 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Female" "Male"
..$ Age : chr [1:2] "Adult" "Child"
..$ Survived: chr [1:2] "No" "Yes"
- attr(*, "class")= chr [1:2] "xtabs" "table"
- attr(*, "call")= language xtabs(formula = Freq ~ Class + Sex + Age + Survived, data = Titanic.raw)
An 'xtabs'-object inherits from 'table'-class so you can use that expand.table function.
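For readers without the .csv file, the round trip can be sketched with the built-in dataset alone. Two assumptions here: as.data.frame(Titanic) stands in for the imported long-format file (one row per cell, with a Freq column), and a base-R rep() on the row index replaces epitools' expand.table.

```r
# Long format: one row per combination, counts in Freq
# (stand-in for the data frame read from the CSV)
long <- as.data.frame(Titanic)

# Rebuild the four-way contingency table from the long format
tab <- xtabs(Freq ~ Class + Sex + Age + Survived, data = long)
class(tab)   # includes "table"

# Expand to one row per individual by repeating each row Freq times
individuals <- long[rep(seq_len(nrow(long)), long$Freq),
                    c("Class", "Sex", "Age", "Survived")]
rownames(individuals) <- NULL

nrow(individuals)
```

Because the long data frame here comes from the built-in table, its factor levels keep the original order; a frame read from a file would get the lexicographic ordering described above unless the levels are set explicitly.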

R question - How to extract attribute values from a bystats object and place them in variables

I'm using the bystats function from the Hmisc package in R. How can I extract attribute values and place them into variables? For example, I want to calculate the mean and SD for variable aaf and put them in a data frame or matrix.
t <- with(d.aaf,bystats(y=aaf,plot_bid,fun=function(x) {
c(Mean = round(mean(x),digits=2),SD = round(sd(x),digits=2))
}))
> str(t)
bystats [1:121, 1:3] 5 5 5 5 5 4 5 5 3 4 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:121] "P00000000006001288020278" "P00000000006001288085814"
"P00000000006001288151350" "P00000000006001288216886" ...
..$ : chr [1:3] "N" "Mean" "SD"
- attr(*, "heading")= chr "function(x) { c(Mean = round(mean(x),digits=2),
SD = round(sd(x),digits=2)) }
of aaf by plot_bid"
- attr(*, "byvarnames")= chr "plot_bid"
The way I'm doing it is by first converting "t" into a dataframe, which I do not think is very efficient.
Thanks for your suggestions.
You could use ddply from the plyr package which outputs directly to a data frame.
library(plyr)
t<-ddply(d.aaf, "plot_bid", summarise, mean=round(mean(aaf),2), SD=round(sd(aaf),2))
SD<-t$SD
mean<-t$mean
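The str output in the question suggests that the bystats result is a numeric matrix with dimnames, so its columns can probably be pulled out by name. The sketch below does not call Hmisc; it builds a mock matrix with the same column names and invented values, and the unclass() step is an assumption meant to avoid depending on any bystats-specific method.

```r
# Mock of the bystats result: a matrix with N, Mean and SD columns
# (row names and values are invented for illustration)
stats <- matrix(c(5, 5, 1.2, 3.4, 0.5, 0.7), nrow = 2,
                dimnames = list(c("P1", "P2"), c("N", "Mean", "SD")))
class(stats) <- "bystats"

# Drop the class so plain matrix indexing applies, then take columns by name
m <- unclass(stats)
means <- m[, "Mean"]
sds   <- m[, "SD"]

# Or keep both statistics together in a data frame
res <- data.frame(plot_bid = rownames(m), Mean = means, SD = sds,
                  row.names = NULL)
res
```

Using names such as means and sds also avoids reusing mean and t, which are already base R functions.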
