How to access principal components rownames in prcomp? - r

I'm probably not explaining myself very well here. How can I access the column of names in prcomp following its use as shown below? I would like to use this as a list for subsequent plots.
prcomp(USArrests)
Standard deviations:
[1] 83.732400 14.212402 6.489426 2.482790
Rotation:
PC1 PC2 PC3 PC4
Murder 0.04170432 -0.04482166 0.07989066 -0.99492173
Assault 0.99522128 -0.05876003 -0.06756974 0.03893830
UrbanPop 0.04633575 0.97685748 -0.20054629 -0.05816914
Rape 0.07515550 0.20071807 0.97408059 0.07232502
I would like to access the extract the list "Murder, Assault, UrbanPop, Rape".

It is always helpful to use str:
res <- prcomp(USArrests)
str(res)
# List of 5
# $ sdev : num [1:4] 83.73 14.21 6.49 2.48
# $ rotation: num [1:4, 1:4] 0.0417 0.9952 0.0463 0.0752 -0.0448 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
# .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
# $ center : Named num [1:4] 7.79 170.76 65.54 21.23
# ..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
# $ scale : logi FALSE
# $ x : num [1:50, 1:4] 64.8 92.8 124.1 18.3 107.4 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
# .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
# - attr(*, "class")= chr "prcomp"
Then we can do:
rownames(res$rotation)
#[1] "Murder" "Assault" "UrbanPop" "Rape"

Related

How to plot PCA using hellinger transformation in ggplot?

I'm trying to ggplot using Hellinger Transformation on my dataset. It works fine for a regular prcomp function but not Hellingers. How can I plot the data from Hellinger transformed data using ggplot?
library(ggfortify)
library(vegan)
df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)
autoplot(pca_res, data = iris, colour =
'Species',
loadings = TRUE, loadings.colour = 'blue',
loadings.label = TRUE, loadings.label.size = 3)
##Hellinger Transformation
df.hell <- decostand(df, method = "hellinger")
df.hell <- rda(df.hell)
ggplot2::autoplot(df.hell)
autoplot(df.hell, data = iris, colour =
'Species',
loadings = TRUE, loadings.colour = 'blue',
loadings.label = TRUE, loadings.label.size = 3)
Error: Objects of type rda/cca not supported by autoplot.
Error: Objects of type rda/cca not supported by autoplot.
Edit 1: Even if the first plot can be manually computed in ggplot2, what about the rest of the plots like loading, or ellipses etc? base plot allows for overlay when using Hellingers but doesn't seem like ggplot2 would directly allow for it.
prcomp returns an object of class prcomp, which can be plotted with autoplot. As the error message says, rda function returns an object of class "rda" "cca", which cannot be plotted using autoplot. Therefore, you must extract the bits you need manually:
data.frame(PC = df.hell$CA$u, species = iris$Species) %>%
ggplot(aes(x=PC.PC1, y=PC.PC2)) +
geom_point(aes(colour=species))
You can find the relevant parts of the object by doing str(df.hell):
List of 10
$ colsum : Named num [1:4] 0.037 0.0746 0.086 0.0854
..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
$ tot.chi : num 0.0216
$ Ybar : num [1:150, 1:4] 0.0042 0.00511 0.0042 0.00359 0.00363 ...
..- attr(*, "scaled:center")= Named num [1:4] 0.656 0.479 0.498 0.267
.. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
..- attr(*, "METHOD")= chr "PCA"
$ method : chr "rda"
$ call : language rda(X = df.hell)
$ pCCA : NULL
$ CCA : NULL
$ CA :List of 7
..$ eig : Named num [1:4] 0.0208691 0.0005348 0.0001951 0.0000205
.. ..- attr(*, "names")= chr [1:4] "PC1" "PC2" "PC3" "PC4"
..$ poseig : NULL
..$ u : num [1:150, 1:4] -0.122 -0.11 -0.119 -0.106 -0.123 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
..$ v : num [1:4, 1:4] -0.241 -0.508 0.589 0.58 0.375 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
..$ rank : int 4
..$ tot.chi: num 0.0216
..$ Xbar : num [1:150, 1:4] 0.0042 0.00511 0.0042 0.00359 0.00363 ...
.. ..- attr(*, "scaled:center")= Named num [1:4] 0.656 0.479 0.498 0.267
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "METHOD")= chr "PCA"
$ inertia : chr "variance"
$ regularization: chr "this is a vegan::rda result object"
- attr(*, "class")= chr [1:2] "rda" "cca"

How do I extract summary of PCA as a dataframe in R using Prcomp?

res.pca = prcomp(y, scale = TRUE)
summ=summary(res.pca)
summ
Gives me the output Desired Output
I want to change this Summary in to a Data Frame,
I've Tried to use the do.call(cbind, lapply(res.pca, summary)) but it gives me the summary of Min/Max but not the one I desire.
Please See That I dont want to extract values from column names, I seek a general solution That I can use.
What you are looking for is in the "element" importance of summary(res.pca):
Example taken from Principal Components Analysis - how to get the contribution (%) of each parameter to a Prin.Comp.?:
a <- rnorm(10, 50, 20)
b <- seq(10, 100, 10)
c <- seq(88, 10, -8)
d <- rep(seq(3, 16, 3), 2)
e <- rnorm(10, 61, 27)
my_table <- data.frame(a, b, c, d, e)
res.pca <- prcomp(my_table, scale = TRUE)
summary(res.pca)$importance
# PC1 PC2 PC3 PC4 PC5
#Standard deviation 1.7882 0.9038 0.8417 0.52622 9.037e-17
#Proportion of Variance 0.6395 0.1634 0.1417 0.05538 0.000e+00
#Cumulative Proportion 0.6395 0.8029 0.9446 1.00000 1.000e+00
class(summary(res.pca)$importance)
#[1] "matrix"
N.B.:
When you want to "study" an object, it can be convenient to use str on it. Here, you can do str(summary(pca) to see where the information are and hence where you can get what you want:
str(summary(res.pca))
List of 6
$ sdev : num [1:5] 1.79 9.04e-01 8.42e-01 5.26e-01 9.04e-17
$ rotation : num [1:5, 1:5] 0.278 0.512 -0.512 0.414 -0.476 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:5] "a" "b" "c" "d" ...
.. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
$ center : Named num [1:5] 34.9 55 52 9 77.8
..- attr(*, "names")= chr [1:5] "a" "b" "c" "d" ...
$ scale : Named num [1:5] 22.4 30.28 24.22 4.47 26.11
..- attr(*, "names")= chr [1:5] "a" "b" "c" "d" ...
$ x : num [1:10, 1:5] -2.962 -1.403 -1.653 -0.537 1.186 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
$ importance: num [1:3, 1:5] 1.788 0.64 0.64 0.904 0.163 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "Standard deviation" "Proportion of Variance" "Cumulative Proportion"
.. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
- attr(*, "class")= chr "summary.prcomp"

R kohonen - Is the input data scaled and centred automatically?

I have been following an online example for R Kohonen self-organising maps (SOM) which suggested that the data should be centred and scaled before computing the SOM.
However, I've noticed the object created seems to have attributes for centre and scale, in which case am I really applying a redundant step by centring and scaling first? Example script below
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
# Prepare SOM
set.seed(590507)
som1 <- som(dt,
somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
str(som1)
The output from the last line of the script is:
List of 13
$ data :List of 1
..$ : num [1:150, 1:4] -0.898 -1.139 -1.381 -1.501 -1.018 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
.. ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
$ unit.classif : num [1:150] 3 5 5 5 4 2 4 4 6 5 ...
$ distances : num [1:150] 0.0426 0.0663 0.0768 0.0744 0.1346 ...
$ grid :List of 6
..$ pts : num [1:36, 1:2] 1.5 2.5 3.5 4.5 5.5 6.5 1 2 3 4 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..$ xdim : num 6
..$ ydim : num 6
..$ topo : chr "hexagonal"
..$ neighbourhood.fct: Factor w/ 2 levels "bubble","gaussian": 1
..$ toroidal : logi FALSE
..- attr(*, "class")= chr "somgrid"
$ codes :List of 1
..$ : num [1:36, 1:4] -0.376 -0.683 -0.734 -1.158 -1.231 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:36] "V1" "V2" "V3" "V4" ...
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
$ changes : num [1:500, 1] 0.0445 0.0413 0.0347 0.0373 0.0337 ...
$ alpha : num [1:2] 0.05 0.01
$ radius : Named num [1:2] 3.61 0
..- attr(*, "names")= chr [1:2] "66.66667%" ""
$ user.weights : num 1
$ distance.weights: num 1
$ whatmap : int 1
$ maxNA.fraction : int 0
$ dist.fcts : chr "sumofsquares"
- attr(*, "class")= chr "kohonen"
Note notice that in lines 7 and 10 of the output there are references to centre and scale. I would appreciate an explanation as to the process here.
Your step with scaling is not redundant because in source code there are no scaling, and attributes, that you see in 7 and 10 are attributes from train dataset.
To check this, just run and compare results of this chunk of code:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
#compare train datasets
str(dt)
str(as.matrix(iris[, 1:4]))
# Prepare SOM
set.seed(590507)
som1 <- kohonen::som(dt,
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#without scaling
som2 <- kohonen::som(as.matrix(iris[, 1:4]),
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#compare results of som function
str(som1)
str(som2)

5-D Kernel density estimation in R using “kde” function

I want to perform Kernel density estimate for a 5-dimensional data (x,y,z,time,size) by using "kde" function in "ks" library of R. In it's manual it says it can do Kernel density estimate for 1- to 6-dimensional data (Page 24 of manual: http://cran.r-project.org/web/packages/ks/ks.pdf).
My problem is that it says for more than 3 dimensions I need to specify eval.points. I don't know how can I specify the evaluation points because there is no example for more than 3 dimensions. For example if I want to Generate regular 3D sequences data in the space of the problem and use them as the eval-point, what should I do?
Here is my data:
422.697323 164.19886 2.457419 8.083796636 0.83367586
423.008236 163.32434 0.5551326 37.58477455 0.893893903
204.733908 218.36365 1.9397874 37.88324312 0.912809449
203.963056 218.4808 0.3723791 43.21775903 0.926406005
100.727581 46.60876 1.4022341 49.41510519 0.782807523
453.335182 244.25521 1.6292517 51.73779175 0.903910803
134.909462 210.96333 2.2389119 53.13433521 0.896529401
135.300562 212.02055 0.6739541 67.55073745 0.748783521
258.237117 134.29735 2.1205291 76.34032587 0.735699304
341.305271 149.26953 3.718958 94.33975483 0.849509216
307.138925 59.60571 0.6311074 106.9636715 0.987923188
307.76875 58.91453 2.6496741 113.8515307 0.802115718
415.025535 217.17398 1.7155688 115.7464603 0.875580325
414.977687 216.73327 1.7107369 115.9776948 0.767143582
311.006135 173.24378 2.7819572 120.8079566 0.925380118
310.116929 174.28122 4.3318722 129.2648401 0.776528535
347.260911 37.34946 3.5155427 136.7851291 0.851787115
351.317624 33.65703 0.5806926 138.7349284 0.909723017
4.471892 59.42068 1.4062959 139.0543783 0.967270976
5.480223 59.72857 2.7326106 139.2114277 0.987787428
199.513023 21.53302 2.5163259 143.5895625 0.864164659
198.718031 23.50163 0.4801849 147.2280466 0.741587333
26.650517 35.2019 0.8246514 150.4876506 0.744788202
25.089379 90.47825 0.8700944 152.1944046 0.777252476
26.307439 88.41552 2.4422487 155.9090026 0.952215177
234.282901 236.11422 1.8115261 155.9658144 0.776284654
235.052948 236.77437 1.9644963 156.6900297 0.944285448
23.048202 98.6261 3.4573048 159.7700912 0.773057491
21.516695 98.05431 2.5029284 160.8202997 0.978779087
213.936324 151.87013 3.1042192 161.0612489 0.80499513
277.887935 197.25753 1.3659279 163.673142 0.758978575
277.239746 197.54001 2.2109361 166.2629868 0.775325157
And this is the code that I am using:
library(ks)
library(rgl)
kern <- read.table(file.choose(), sep=",")
hat <- kde(kern)
It works for upto 3 dimensions but for 4 and 5 dimensions it says: need to specify eval.points for more than 3 dimensions.
Also, I'd like to know how can I plot these kernels? For example use z as the conditioning variable and plot x,y,time in a 3D scatterplot and also use different colors for different ranges of size
Like you I wasn't initially able to find a worked example and the documentation doesn't really describe what sort of object is expected. For your 5d set of data I tried setting up a 5d-grid of points that were constructed from the 10, 25th, 50th, 75th and 90th percentiles for each of the dimensions. My dataset was named "dat":
evpts <- do.call(expand.grid, lapply(dat, quantile, prob=c(0.1,.25,.5,.75,.9)) )
I then passed that to the kde function and seemed to satisfy the algorithm. Whether this is "correct" does need checking. No guarantees.
> hat <- kde(dat, eval.points= evpts)
> str(hat)
List of 8
$ x : num [1:31, 1:5] 423 423 205 204 101 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "V1" "V2" "V3" "V4" ...
$ eval.points:'data.frame': 3125 obs. of 5 variables:
..$ V1: Named num [1:3125] 23 118 234 326 415 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "25%" "50%" "75%" ...
..$ V2: Named num [1:3125] 35.2 35.2 35.2 35.2 35.2 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ V3: Named num [1:3125] 0.581 0.581 0.581 0.581 0.581 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ V4: Named num [1:3125] 43.2 43.2 43.2 43.2 43.2 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ V5: Named num [1:3125] 0.749 0.749 0.749 0.749 0.749 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..- attr(*, "out.attrs")=List of 2
.. ..$ dim : Named int [1:5] 5 5 5 5 5
.. .. ..- attr(*, "names")= chr [1:5] "V1" "V2" "V3" "V4" ...
.. ..$ dimnames:List of 5
.. .. ..$ V1: chr [1:5] "V1= 23.0482" "V1=117.8185" "V1=234.2829" "V1=326.1557" ...
.. .. ..$ V2: chr [1:5] "V2= 35.20190" "V2= 59.51319" "V2=149.26953" "V2=211.49194" ...
.. .. ..$ V3: chr [1:5] "V3=0.5806926" "V3=1.1180112" "V3=1.9397874" "V3=2.5830000" ...
.. .. ..$ V4: chr [1:5] "V4= 43.21776" "V4= 71.94553" "V4=129.26484" "V4=151.34103" ...
.. .. ..$ V5: chr [1:5] "V5=0.7487835" "V5=0.7764066" "V5=0.8517871" "V5=0.9190948" ...
$ estimate : Named num [1:3125] 3.23e-08 5.70e-08 1.01e-08 4.07e-10 6.20e-12 ...
..- attr(*, "names")= chr [1:3125] "1" "2" "3" "4" ...
$ H : num [1:5, 1:5] 5073.879 1010.815 1.211 -651.089 -0.223 ...
$ gridded : logi FALSE
$ binned : logi FALSE
$ names : chr [1:5] "V1" "V2" "V3" "V4" ...
$ w : num [1:31] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "class")= chr "kde"
I did find an earlier version of the package documentaion that offered this as a worked example of a 4d execution, sot I think my effort is essentially the same, modulo different dimensions:
data(iris)
ir <- iris[,1:4][iris[,5]=="setosa",]
H.scv <- Hscv(ir)
fhat <- kde(ir, H.scv, eval.points=ir)

extracting values from an aovp object in Lmperm

I have the following object M, from which I need to extract the fstatistic. It is a model generated by the function summaryC of a model generated by aovp, both functions from package lmPerm. I have tried hints for extracting values from normal linear models and from the functions in attr, extract and getElement, but without success.
Anybody could give me a hint?
> str(M)
List of 2
$ Error: vegetation: NULL
$ Error: Within :List of 11
..$ NA : NULL
..$ terms :Classes 'terms', 'formula' length 3 Temp ~ depth
.. .. ..- attr(*, "variables")= language list(Temp, depth)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "Temp" "depth"
.. .. .. .. ..$ : chr "depth"
.. .. ..- attr(*, "term.labels")= chr "depth"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..$ residuals : Named num [1:498] -46.9 -43.9 -46.9 -38.9 -41.9 ...
.. ..- attr(*, "names")= chr [1:498] "3" "4" "5" "6" ...
..$ coefficients : num [1:4, 1:4] -2.00 -1.00 -1.35e-14 1.00 2.59 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
.. .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
..$ aliased : Named logi [1:4] FALSE FALSE FALSE FALSE
.. ..- attr(*, "names")= chr [1:4] "depth1" "depth2" "depth3" "depth4"
..$ sigma : num 29
..$ df : int [1:3] 4 494 4
..$ r.squared : num 0.00239
..$ adj.r.squared: num -0.00367
..$ **fstatistic** : Named num [1:3] 0.395 3 494
.. ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
..$ cov.unscaled : num [1:4, 1:4] 0.008 -0.002 -0.002 -0.002 -0.002 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
..- attr(*, "class")= chr "summary.lmp"
- attr(*, "class")= chr "listof"
there it goes a reproducible example to play with:
Temp=1:100
depth<- rep( c("1","2","3","4","5"), 100)
vegetation=rep( c("1","2"), 50)
df=data.frame(Temp,depth,vegetation)
M=summaryC(aovp(Temp~depth+Error(vegetation),df, perm=""))
as the str output from your example shows, M is a list of two lists, the second one contains what you want. Hence list extraction via [[ does the trick:
> M[[2]][["fstatistic"]]
value numdf dendf
0.3946 3.0000 494.0000
If this is not what you want, please comment.

Resources