How can I extract row names after PCA implementation? - r

I am reducing the dimensionality of a test data frame (30 rows and 750 columns) with a PCA model (using the FactoMineR library) as follows:
pca_base <- PCA(test, ncp=5, graph=T)
I used the function dimdesc() [in FactoMineR] for dimension description, to identify the variables most significantly associated with a given principal component, as follows:
pca_dim<-dimdesc(pca_base)
pca_dim is a list of length 3.
My question is: how can I extract the row names of pca_dim from list[1] and list[2]?
I tried this code:
#to select dim 1,2 use axes
pca_dim<-dimdesc(pca_base,axes = c(1,2))
rownames(pca_dim[[1]])
But the result was NULL.
For instance, I'll use the demo data set decathlon2 from the factoextra package:
It contains 27 individuals (athletes) described by 13 variables.
library(factoextra)
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
res.pca <- PCA(decathlon2.active, scale.unit = TRUE, graph = FALSE)
res.desc <- dimdesc(res.pca, axes = c(1,2))
Thanks!

When you have that kind of issue accessing information in an R object, the best way to solve it is to start by examining the output of the function str.
str(pca_dim)
#List of 2
# $ Dim.1:List of 1
# ..$ quanti: num [1:8, 1:2] 0.794 0.743 0.734 0.61 0.428 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:8] "Long.jump" "Discus" "Shot.put" "High.jump" ...
# .. .. ..$ : chr [1:2] "correlation" "p.value"
# $ Dim.2:List of 1
# ..$ quanti: num [1:3, 1:2] 8.07e-01 7.84e-01 -4.65e-01 3.21e-06 9.38e-06 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:3] "Pole.vault" "X1500m" "High.jump"
# .. .. ..$ : chr [1:2] "correlation" "p.value"
So the structure of the object is simple: it is a list of two lists. Each of these sublists has just one member, a matrix with the dimnames attribute set.
So you can use standard accessor functions to get those attributes.
rownames(pca_dim$Dim.1$quanti)
#[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline"
#[6] "X400m" "X110m.hurdle" "X100m"
rownames(pca_dim$Dim.2$quanti)
#[1] "Pole.vault" "X1500m" "High.jump"

You can convert the result of dimdesc to a data.frame for each element and take the row names, like this:
rownames(data.frame(res.desc[1]))
[1] "Long.jump" "Discus" "Shot.put" "High.jump" "Javeline" "X400m" "X110m.hurdle"
[8] "X100m"
rownames(data.frame(res.desc[2]))
[1] "Pole.vault" "X1500m" "High.jump"

Related

Nested lists and applying a function to some elements

I have a nested list data.
With str(data) I get the following output:
List of 2
$ group_info :List of 2
..$ lat : num [1:22] 50.5 55 ...
..$ names : chr [1:22] "A" "B"
$ param : num [1:60, 1:56] 0.0923 0.0952 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:60] "RESULTS" "RESULTS1" "RESULTS2" "RESULTS3" ...
.. ..$ : chr [1:56] "exp" "pops" ...
I would like to check the values in the "pops" column and, if they are < 0.125, replace them with the value 0.125.
How could I do that?
Thanks!
The easiest option is to extract the list element and assign back with pmax of the column and 0.125; any element less than 0.125 is then returned as 0.125.
data$param[, "pops"] <- pmax(data$param[, "pops"], 0.125, na.rm = TRUE)
Or, if we want to use <, either ifelse or replace would work:
data$param[, "pops"] <- replace(data$param[, "pops"],
data$param[, "pops"] < 0.125, 0.125)

Extract items in nested lists

I have a problem:
I work with a data frame containing the development time (dependent variable) of five species as a function of temperature (independent variable).
With the by function I calculated lm fits for all five species:
by(dados, dados$Especie, function(dados) lm(dados$Tempo ~ dados$Temp, data = dados))
and as a result I got lists nested in other lists, as you can see here:
List of 5
$ C.albiceps :List of 12
..$ coefficients : Named num [1:2] 262.78 -1.76
.. ..- attr(*, "names")= chr [1:2] "(Intercept)" "dados$Temp"
..$ residuals : Named num [1:41] -4.157 -2.394 -0.631 1.131 2.894 ...
.. ..- attr(*, "names")= chr [1:41] "1" "2" "3" "4" ...
..$ effects : Named num [1:41] -1344.031 -133.548 0.235 1.977 3.72 ...
.. ..- attr(*, "names")= chr [1:41] "(Intercept)" "dados$Temp" "" "" ...
..$ rank : int 2
It's a list of 5 elements (one for each species), and each species is a list of 12 elements (from the lm function). So, 5 lists of 12, within a list of 5.
Now, my question:
I want to extract the values from my coefficients and sum them up. So I got
List$speciesName$coefficients[2] and I want to extract each value (the second item of $coefficients, for each species), and I also want to save it in a vector (in order to calculate indices with it).
Any helpful hints on that?
You can loop over the models with an apply function and extract the coefficients with coef.
## Example
mods <- by(mtcars, mtcars$cyl, function(x) lm(mpg ~ disp, data=x))
## Sum up all the second coefficients
sum(sapply(mods, function(x) coef(x)[[2]]))
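If you also want to keep the individual values in a vector (to calculate indices with them later, as the question asks), store the sapply result before summing. A small sketch with the mtcars example above; slopes is just a helper name:
## Named vector of second coefficients (slopes), one per group
slopes <- sapply(mods, function(x) coef(x)[[2]])
slopes
sum(slopes)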

merge data.frame with multidimensional list

I have a data frame 'QARef' with 25 variables. There are only 5 unique jobs (3rd column) but lots of rows per job:
str(QARef)
'data.frame': 648 obs. of 25 variables:
I'm using tapply to generate mean values across all 5 jobs for certain rows:
RefMean <- tapply(QARef$MTN,
list(QARef$Target_CD, QARef$Feature_Type, QARef$Orientation, QARef$Contrast, QARef$Prox),
FUN=mean, trim=0, na.rm=TRUE)
and I get something that I believe is referred to as a multidimensional array:
str(RefMean)
num [1:17, 1:2, 1:2, 1:2, 1:2] 34.1 34.2 25.2 28.9 29.2 ...
- attr(*, "dimnames")=List of 5
..$ : chr [1:17] "55" "60" "70" "80" ...
..$ : chr [1:2] "LINE" "SQUARE"
..$ : chr [1:2] "X" "Y"
..$ : chr [1:2] "CLEAR" "DARK"
..$ : chr [1:2] "1:1" "Iso"
What I want to do is add a column to QARef which contains the correct RefMean value for each row, depending on a match between values in columns of QARef and the dimnames of RefMean. E.g., the QARef column Feature_Type=="LINE" should match the dimname "LINE", etc.
Any hint how to do this or where to find the answer would be highly appreciated.
I think I found a solution. Probably not elegant, but it works:
RefMean <- data.frame(tapply(QARef$MTN,
                             paste(QARef$Target_CD, QARef$Feature_Type, QARef$Orientation,
                                   QARef$Contrast, QARef$Prox, QARef$Measurement_Type),
                             FUN = mean, trim = 0, na.rm = TRUE))
colnames(RefMean) <- c("MTN_Ref")
Ident <- do.call(rbind, strsplit(rownames(RefMean), " "))
RefMean["Target_CD"] <- Ident[,1]
RefMean["Feature_Type"] <- Ident[,2]
RefMean["Orientation"] <- Ident[,3]
RefMean["Contrast"] <- Ident[,4]
RefMean["Prox"] <- Ident[,5]
RefMean["Measurement_Type"] <- Ident[,6]
QA4 <- merge(QARef, RefMean,
             by = c("Target_CD", "Feature_Type", "Orientation",
                    "Contrast", "Prox", "Measurement_Type"),
             all.x = TRUE, sort = FALSE)
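As an alternative sketch (assuming the grouping columns in QARef are exactly the ones used above), aggregate() plus merge() avoids the paste()/strsplit() round trip, which can break if any of the grouping values themselves contain spaces:
## group_cols is just a helper vector of the grouping column names
group_cols <- c("Target_CD", "Feature_Type", "Orientation",
                "Contrast", "Prox", "Measurement_Type")
## mean MTN per combination of the grouping columns
RefMean2 <- aggregate(MTN ~ Target_CD + Feature_Type + Orientation +
                        Contrast + Prox + Measurement_Type,
                      data = QARef, FUN = mean, na.rm = TRUE)
names(RefMean2)[names(RefMean2) == "MTN"] <- "MTN_Ref"
QA4 <- merge(QARef, RefMean2, by = group_cols, all.x = TRUE, sort = FALSE)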

Import contingency table (.csv-format) as "table" rather than "data.frame" in R

I am working with the (I think) very cool titanic data that is publicly available.
There are two principal ways to import it into R:
(1) You can either use the built-in dataset Titanic (library(datasets)) or
(2) you can download it as .csv-file, e.g. here.
Now, the data is aggregated frequency data. I would like to convert the multi-dimensional contingency table into an individual-level data frame.
PROBLEM: If I use the built-in dataset, this is no problem; if I use the imported .csv-file, however, it doesn't work. This is the error message I get:
Error in rep(1:nrow(tablevars), counts) : invalid 'times' argument
In addition: Warning message:
In expand.table(Titanic.table) : NAs introduced by coercion
Why? And what am I doing wrong? Many thanks.
R CODE
#required packages
library(datasets)
library(epitools)
#(1) Expansion of built-in data set
data(Titanic)
Titanic.raw <- Titanic
class(Titanic.raw) # data is stored as "table"
Titanic.expand <- expand.table(Titanic.raw)
#(2) Expansion of imported data set
Titanic.raw <- read.table("Titanic.csv", header=TRUE, sep=",", row.names=1)
class(Titanic.raw) #data is stored as "data.frame"
Titanic.table <- as.table(as.matrix(Titanic.raw))
class(Titanic.table) #data is stored as "table"
Titanic.expand <- expand.table(Titanic.table)
I think you probably want xtabs. Watch out that the factor coding is different for the factors in the Titanic and Titanic.new objects. By default factor levels are in lexicographic order, while two of the Titanic factors are not:
str(Titanic)
table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Male" "Female"
..$ Age : chr [1:2] "Child" "Adult"
..$ Survived: chr [1:2] "No" "Yes"
Titanic.raw <- read.table("~/Downloads/Titanic.csv", header=TRUE, sep=",", row.names=1)
str( Titanic.new <-
xtabs( Freq ~ Class + Sex + Age +Survived, data=Titanic.raw))
xtabs [1:4, 1:2, 1:2, 1:2] 4 13 89 3 118 154 387 670 0 0 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Female" "Male"
..$ Age : chr [1:2] "Adult" "Child"
..$ Survived: chr [1:2] "No" "Yes"
- attr(*, "class")= chr [1:2] "xtabs" "table"
- attr(*, "call")= language xtabs(formula = Freq ~ Class + Sex + Age + Survived, data = Titanic.raw)
An 'xtabs' object inherits from the 'table' class, so you can use the expand.table function on it.
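Putting it together (a sketch that reuses the Titanic.new object built above):
library(epitools)
## expand the xtabs/table object into one row per individual
Titanic.expand <- expand.table(Titanic.new)
str(Titanic.expand)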

Applying "dim" function over elements of a list in R

I have n lists, let's say list1, list2, ..., listn. Each list has 10 elements and I need to calculate the "mean" of the "dim" of the ten elements of each list. So the output should be a vector of length n.
For example the first element of the output vector should be:
n1 = mean(dim(list1[[1]]), dim(list1[[2]]), dim(list1[[3]]), ..., dim(list1[[10]]))
I know how to obtain it using for-loops but I am sure it is not the best solution.
The lists have a structure derived from one of the Bioconductor R packages, edgeR.
So each element of the list has this structure:
$ :Formal class 'TopTags' [package "edgeR"] with 1 slots
.. ..# .Data:List of 4
.. .. ..$ :'data.frame': 2608 obs. of 4 variables:
.. .. .. ..$ logFC : num [1:2608] 6.37 -6.48 -5.72 -5.6 -4.01 ...
.. .. .. ..$ logCPM: num [1:2608] 5.1 2.55 2.08 1.57 3.08 ...
.. .. .. ..$ PValue: num [1:2608] 3.16e-292 1.57e-187 2.15e-152 5.58e-141 1.27e-135 ...
.. .. .. ..$ FDR : num [1:2608] 7.37e-288 1.83e-183 1.67e-148 3.25e-137 5.92e-132 ...
.. .. ..$ : chr "BH"
.. .. ..$ : chr [1:2] "healthy" "cancerous"
.. .. ..$ : chr "exact"
And since each list has 10 elements, I have 10 repeats of the above structure when running:
str(list1)
Original question
lapply (or sapply) is your friend:
mean(sapply(mylist,dim))
If you have many lists with a uniform meaning and structure, you should instead use a list of lists (i.e., mylist[[3]] instead of mylist3).
Edited question
sapply(mylist, function(x) mean(sapply(x,dim)))
will return a vector of means of inner lists.
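To see what that does, here is a tiny self-contained example (hypothetical toy data, two lists of matrices):
list1 <- list(matrix(1:6, 2, 3), matrix(1:8, 4, 2))   # dims 2x3 and 4x2
list2 <- list(matrix(1:4, 2, 2), matrix(1:9, 3, 3))   # dims 2x2 and 3x3
mylist <- list(list1 = list1, list2 = list2)
sapply(mylist, function(x) mean(sapply(x, dim)))
# list1 list2
#  2.75  2.50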
Question in a comment
If your list contains matrices instead of vectors and you want to average only one of the dimensions (dim(.)[1] or dim(.)[2]), you can use nrow or ncol for that instead of dim.
Alternatively, you can pass any function there, e.g.,
sapply(mylist, function(x) mean(sapply(x, function(y) sum(dim(y)))))
to average the sums of dimensions.
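For instance, to average only the number of rows (the first dimension) of the elements in each inner list, a sketch along those lines would be (assuming, as above, that nrow() applies to your elements):
## average row count per inner list
sapply(mylist, function(x) mean(sapply(x, nrow)))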
If all your objects are called "list*" and you have no other objects with "list" in their names, you can easily stick all the lists into a single list object, which will make it easier to operate on them...
ll <- mget(ls(pattern = "list"))
sapply(ll, function(x) mean(sapply(x, dim)))
Here is a solution using the Map function, where mylist is your list:
Map(function(x) mean(x[[1]]:x[[10]]), mylist)
Example:
a<-list(1,2,3,4)
b<-list(2,3,5,6)
mylist<-list(a,b)
k<- Map(function(x) mean(x[[1]]:x[[4]]), mylist)
> k
[[1]]
[1] 2.5
[[2]]
[1] 4
To convert to vector:
> do.call(rbind,k)
[,1]
[1,] 2.5
[2,] 4.0
Or:
library(plyr)
ldply(k)
V1
1 2.5
2 4.0
If the elements of each list are matrices:
Map(function(x) mean(dim(x[[1]])[1]:dim(x[[10]])[1]), mylist)
