How to plot PCA using hellinger transformation in ggplot? - r

I'm trying to ggplot using Hellinger Transformation on my dataset. It works fine for a regular prcomp function but not Hellingers. How can I plot the data from Hellinger transformed data using ggplot?
library(ggfortify)
library(vegan)
df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)
autoplot(pca_res, data = iris, colour =
'Species',
loadings = TRUE, loadings.colour = 'blue',
loadings.label = TRUE, loadings.label.size = 3)
##Hellinger Transformation
df.hell <- decostand(df, method = "hellinger")
df.hell <- rda(df.hell)
ggplot2::autoplot(df.hell)
autoplot(df.hell, data = iris, colour =
'Species',
loadings = TRUE, loadings.colour = 'blue',
loadings.label = TRUE, loadings.label.size = 3)
Error: Objects of type rda/cca not supported by autoplot.
Error: Objects of type rda/cca not supported by autoplot.
Edit 1: Even if the first plot can be manually computed in ggplot2, what about the rest of the plots like loading, or ellipses etc? base plot allows for overlay when using Hellingers but doesn't seem like ggplot2 would directly allow for it.

prcomp returns an object of class prcomp, which can be plotted with autoplot. As the error message says, rda function returns an object of class "rda" "cca", which cannot be plotted using autoplot. Therefore, you must extract the bits you need manually:
data.frame(PC = df.hell$CA$u, species = iris$Species) %>%
ggplot(aes(x=PC.PC1, y=PC.PC2)) +
geom_point(aes(colour=species))
You can find the relevant parts of the object by doing str(df.hell):
List of 10
$ colsum : Named num [1:4] 0.037 0.0746 0.086 0.0854
..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
$ tot.chi : num 0.0216
$ Ybar : num [1:150, 1:4] 0.0042 0.00511 0.0042 0.00359 0.00363 ...
..- attr(*, "scaled:center")= Named num [1:4] 0.656 0.479 0.498 0.267
.. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
..- attr(*, "METHOD")= chr "PCA"
$ method : chr "rda"
$ call : language rda(X = df.hell)
$ pCCA : NULL
$ CCA : NULL
$ CA :List of 7
..$ eig : Named num [1:4] 0.0208691 0.0005348 0.0001951 0.0000205
.. ..- attr(*, "names")= chr [1:4] "PC1" "PC2" "PC3" "PC4"
..$ poseig : NULL
..$ u : num [1:150, 1:4] -0.122 -0.11 -0.119 -0.106 -0.123 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
..$ v : num [1:4, 1:4] -0.241 -0.508 0.589 0.58 0.375 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
..$ rank : int 4
..$ tot.chi: num 0.0216
..$ Xbar : num [1:150, 1:4] 0.0042 0.00511 0.0042 0.00359 0.00363 ...
.. ..- attr(*, "scaled:center")= Named num [1:4] 0.656 0.479 0.498 0.267
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
.. ..- attr(*, "METHOD")= chr "PCA"
$ inertia : chr "variance"
$ regularization: chr "this is a vegan::rda result object"
- attr(*, "class")= chr [1:2] "rda" "cca"

Related

Refer to variable by part of the variable name

It seems that, in R, I can refer to a variable with part of a variable name. But I am confused about why I can do that.
Use the following code as an example:
library(car)
scatterplot(housing ~ total)
house.lm <- lm(housing ~ total)
summary(house.lm)
str(summary(house.lm))
summary(house.lm)$coefficients[2,2]
summary(house.lm)$coe[2,2]
When I print the structure of summary(house.lm), I got the following output:
> str(summary(house.lm))
List of 11
$ call : language lm(formula = housing ~ total)
$ terms :Classes 'terms', 'formula' language housing ~ total
.. ..- attr(*, "variables")= language list(housing, total)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "housing" "total"
.. .. .. ..$ : chr "total"
.. ..- attr(*, "term.labels")= chr "total"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(housing, total)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "housing" "total"
$ residuals : Named num [1:162] -8.96 -11.43 3.08 8.45 2.2 ...
..- attr(*, "names")= chr [1:162] "1" "2" "3" "4" ...
$ coefficients : num [1:2, 1:4] 28.4523 0.0488 10.2117 0.0103 2.7862 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "(Intercept)" "total"
.. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
$ aliased : Named logi [1:2] FALSE FALSE
..- attr(*, "names")= chr [1:2] "(Intercept)" "total"
$ sigma : num 53.8
$ df : int [1:3] 2 160 2
$ r.squared : num 0.123
$ adj.r.squared: num 0.118
$ fstatistic : Named num [1:3] 22.5 1 160
..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
$ cov.unscaled : num [1:2, 1:2] 3.61e-02 -3.31e-05 -3.31e-05 3.67e-08
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "(Intercept)" "total"
.. ..$ : chr [1:2] "(Intercept)" "total"
- attr(*, "class")= chr "summary.lm"
However, it seems that I can refer to the variable coefficients with all of the following commands:
summary(house.lm)$coe[2,2]
summary(house.lm)$coef[2,2]
summary(house.lm)$coeff[2,2]
summary(house.lm)$coeffi[2,2]
summary(house.lm)$coeffic[2,2]
summary(house.lm)$coeffici[2,2]
summary(house.lm)$coefficie[2,2]
summary(house.lm)$coefficien[2,2]
summary(house.lm)$coefficient[2,2]
summary(house.lm)$coefficients[2,2]
They all give the same results: 0.01029709
Therefore, I was wondering when I can refer to a variable with only part of its name in R?
You can do it when rest of name is unambiguous. For example
df <- data.frame(abcd = c(1,2,3), xyz = c(4,5,6), abc = c(5,6,7))
> df$xy
[1] 4 5 6
> df$ab
NULL
> df$x
[1] 4 5 6
df$xy and even df$x gives right data, but df$ab results in NULL because it can refer to both df$abc and df$abcd. It's like when you type df$xy in RStudio and press Ctrl + Space you will get rigtht variable name, so you could refer to part of variable name.
http://adv-r.had.co.nz/Functions.html#lexical-scoping
When calling a function you can specify arguments by position, by
complete name, or by partial name. Arguments are matched first by
exact name (perfect matching), then by prefix matching, and finally by
position.
When you are doing quick coding to analyse some data, using partial names is not a problem, but I tend to agree, it's not good when writing code. In a package you can't do that, R-CMD check will find every occurence.

R kohonen - Is the input data scaled and centred automatically?

I have been following an online example for R Kohonen self-organising maps (SOM) which suggested that the data should be centred and scaled before computing the SOM.
However, I've noticed the object created seems to have attributes for centre and scale, in which case am I really applying a redundant step by centring and scaling first? Example script below
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
# Prepare SOM
set.seed(590507)
som1 <- som(dt,
somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
str(som1)
The output from the last line of the script is:
List of 13
$ data :List of 1
..$ : num [1:150, 1:4] -0.898 -1.139 -1.381 -1.501 -1.018 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
.. ..- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
.. ..- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
.. .. ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width"
"Petal.Length" "Petal.Width"
$ unit.classif : num [1:150] 3 5 5 5 4 2 4 4 6 5 ...
$ distances : num [1:150] 0.0426 0.0663 0.0768 0.0744 0.1346 ...
$ grid :List of 6
..$ pts : num [1:36, 1:2] 1.5 2.5 3.5 4.5 5.5 6.5 1 2 3 4 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "x" "y"
..$ xdim : num 6
..$ ydim : num 6
..$ topo : chr "hexagonal"
..$ neighbourhood.fct: Factor w/ 2 levels "bubble","gaussian": 1
..$ toroidal : logi FALSE
..- attr(*, "class")= chr "somgrid"
$ codes :List of 1
..$ : num [1:36, 1:4] -0.376 -0.683 -0.734 -1.158 -1.231 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:36] "V1" "V2" "V3" "V4" ...
.. .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width"
$ changes : num [1:500, 1] 0.0445 0.0413 0.0347 0.0373 0.0337 ...
$ alpha : num [1:2] 0.05 0.01
$ radius : Named num [1:2] 3.61 0
..- attr(*, "names")= chr [1:2] "66.66667%" ""
$ user.weights : num 1
$ distance.weights: num 1
$ whatmap : int 1
$ maxNA.fraction : int 0
$ dist.fcts : chr "sumofsquares"
- attr(*, "class")= chr "kohonen"
Note notice that in lines 7 and 10 of the output there are references to centre and scale. I would appreciate an explanation as to the process here.
Your step with scaling is not redundant because in source code there are no scaling, and attributes, that you see in 7 and 10 are attributes from train dataset.
To check this, just run and compare results of this chunk of code:
# Load package
require(kohonen)
# Set data
data(iris)
# Scale and centre
dt <- scale(iris[, 1:4],center=TRUE)
#compare train datasets
str(dt)
str(as.matrix(iris[, 1:4]))
# Prepare SOM
set.seed(590507)
som1 <- kohonen::som(dt,
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#without scaling
som2 <- kohonen::som(as.matrix(iris[, 1:4]),
kohonen::somgrid(6,6, "hexagonal"),
rlen=500,
keep.data=TRUE)
#compare results of som function
str(som1)
str(som2)

replace values in a nested list with values from another matrix

I have a list of 28 elements like the following which are combinations of 8 elements two by two (SG1, SG2, ... SG8):
str(combinations)
List of 28
$ : chr [1:2] "SG1" "SG2"
$ : chr [1:2] "SG1" "SG3"
$ : chr [1:2] "SG1" "SG4"
...
I have stored the results of a t.test in an object:
results <- lapply(seq_along(combinations), function (n) {
mydatatemp <- mydata[with(mydata, Subgroup %in% unlist(combinations2[n]) & Group %in% c("G1", "G3")),]
result <- t.test(mydatatemp[
mydatatemp$Subgroup == sapply(combinations[n], "[",1),4],
mydatatemp[mydatatemp$Subgroup == sapply(combinations[n], "[", 2),4],
alternative="two.sided", var.equal=TRUE)
return(result)})
and my str(result) is like this, of course 28 list elements one after another like this one:
List of 28
$ :List of 9
..$ statistic : Named num -6.9
.. ..- attr(*, "names")= chr "t"
..$ parameter : Named num 38
.. ..- attr(*, "names")= chr "df"
..$ p.value : num 3.33e-08
..$ conf.int : atomic [1:2] -0.301 -0.164
.. ..- attr(*, "conf.level")= num 0.95
..$ estimate : Named num [1:2] 0.196 0.429
.. ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
..$ null.value : Named num 0
.. ..- attr(*, "names")= chr "difference in means"
..$ alternative: chr "two.sided"
..$ method : chr " Two Sample t-test"
..$ data.name : chr [1:2] "mydatatemp[mydatatemp$Subgroup == sapply(combinations[n], \"[\", 1), and mydatatemp[mydatatemp$Subgroup == sapply(combinations[n], \"[\""| __truncated__ " 4] and 4]"
..- attr(*, "class")= chr "htest"
how can I rename all of the $data.name elements chr [1:2] with the following paste code?
paste(matrix(unlist(combinations), ncol = 2, byrow = TRUE)[,1], matrix(unlist(combinations), ncol = 2, byrow = TRUE)[,2], sep = " vs. ")
the reason I am doing this is to replace those ugly looking characters in the $data.name element.

How to access principal components rownames in prcomp?

I'm probably not explaining myself very well here. How can I access the column of names in prcomp following its use as shown below? I would like to use this as a list for subsequent plots.
prcomp(USArrests)
Standard deviations:
[1] 83.732400 14.212402 6.489426 2.482790
Rotation:
PC1 PC2 PC3 PC4
Murder 0.04170432 -0.04482166 0.07989066 -0.99492173
Assault 0.99522128 -0.05876003 -0.06756974 0.03893830
UrbanPop 0.04633575 0.97685748 -0.20054629 -0.05816914
Rape 0.07515550 0.20071807 0.97408059 0.07232502
I would like to access the extract the list "Murder, Assault, UrbanPop, Rape".
It is always helpful to use str:
res <- prcomp(USArrests)
str(res)
# List of 5
# $ sdev : num [1:4] 83.73 14.21 6.49 2.48
# $ rotation: num [1:4, 1:4] 0.0417 0.9952 0.0463 0.0752 -0.0448 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
# .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
# $ center : Named num [1:4] 7.79 170.76 65.54 21.23
# ..- attr(*, "names")= chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
# $ scale : logi FALSE
# $ x : num [1:50, 1:4] 64.8 92.8 124.1 18.3 107.4 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
# .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
# - attr(*, "class")= chr "prcomp"
Then we can do:
rownames(res$rotation)
#[1] "Murder" "Assault" "UrbanPop" "Rape"

extracting values from an aovp object in Lmperm

I have the following object M, from which I need to extract the fstatistic. It is a model generated by the function summaryC of a model generated by aovp, both functions from package lmPerm. I have tried hints for extracting values from normal linear models and from the functions in attr, extract and getElement, but without success.
Anybody could give me a hint?
> str(M)
List of 2
$ Error: vegetation: NULL
$ Error: Within :List of 11
..$ NA : NULL
..$ terms :Classes 'terms', 'formula' length 3 Temp ~ depth
.. .. ..- attr(*, "variables")= language list(Temp, depth)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "Temp" "depth"
.. .. .. .. ..$ : chr "depth"
.. .. ..- attr(*, "term.labels")= chr "depth"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..$ residuals : Named num [1:498] -46.9 -43.9 -46.9 -38.9 -41.9 ...
.. ..- attr(*, "names")= chr [1:498] "3" "4" "5" "6" ...
..$ coefficients : num [1:4, 1:4] -2.00 -1.00 -1.35e-14 1.00 2.59 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
.. .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
..$ aliased : Named logi [1:4] FALSE FALSE FALSE FALSE
.. ..- attr(*, "names")= chr [1:4] "depth1" "depth2" "depth3" "depth4"
..$ sigma : num 29
..$ df : int [1:3] 4 494 4
..$ r.squared : num 0.00239
..$ adj.r.squared: num -0.00367
..$ **fstatistic** : Named num [1:3] 0.395 3 494
.. ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
..$ cov.unscaled : num [1:4, 1:4] 0.008 -0.002 -0.002 -0.002 -0.002 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
.. .. ..$ : chr [1:4] "depth1" "depth2" "depth3" "depth4"
..- attr(*, "class")= chr "summary.lmp"
- attr(*, "class")= chr "listof"
there it goes a reproducible example to play with:
Temp=1:100
depth<- rep( c("1","2","3","4","5"), 100)
vegetation=rep( c("1","2"), 50)
df=data.frame(Temp,depth,vegetation)
M=summaryC(aovp(Temp~depth+Error(vegetation),df, perm=""))
as the str output from your example shows, M is a list of two lists, the second one contains what you want. Hence list extraction via [[ does the trick:
> M[[2]][["fstatistic"]]
value numdf dendf
0.3946 3.0000 494.0000
If this is not what you want, please comment.

Resources