Issue with RDA plot in R - r

I am a novice in using R for multivariate analysis. I am trying to produce an RDA plot showing the relationship between my species abundance and environmental data. I have 6 environmental variables, but the plot shows only two vectors, representing just two of the variables. The commands I used are below.
data <- read.csv("all_data.csv",h=T);
library(vegan)
sp1 <- data[,c("Sample","Acidobacteria","Actinobacteria","Aquificae","Bacteroidetes")];
env1 <- data[,c("Nitrogen","TOC","Phosphate","Sand","Silt","Clay")];
myrda <- rda(sp1,env1)
plot(myrda,scaling=2)
Someone please help me out with this. I wish to see all the 6 environmental parameters in my RDA plot.

Here is an example using vegan's example data varespec and varechem. The plot of the rda model automatically displays all 14 environmental variables:
library(vegan)
data(varespec)
data(varechem)
myrda <- rda(varespec, varechem)
myrda
colnames(varechem) # 14 variables
plot(myrda,scaling=2) # 14 vectors shown
Maybe double-check that your data frames contain the correct variable names, so that the plot knows where to get its labels. I would also make sure that your data splitting is working correctly; I don't think that your method will always work. Here is a possible alternative that should:
sp.incl <- match(c("Sample","Acidobacteria","Actinobacteria","Aquificae","Bacteroidetes"), colnames(data))
sp1 <- data[,sp.incl]
env.incl <- match(c("Nitrogen","TOC","Phosphate","Sand","Silt","Clay"), colnames(data))
env1 <- data[,env.incl]
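To make those two checks concrete, here is a minimal base-R sketch. The data frame is a made-up stand-in for all_data.csv (the values are invented and it has fewer columns than the original). It tests that every requested column exists and that the environmental columns are numeric. As an additional assumption that is not raised above, it also checks whether some columns are linear combinations of others (Sand, Silt and Clay summing to 100, for example), which seems worth ruling out when vectors go missing.

```r
# Hypothetical stand-in for all_data.csv; all values are made up for illustration.
data <- data.frame(
  Sample         = paste0("S", 1:6),
  Acidobacteria  = c(12, 15, 9, 20, 18, 11),
  Actinobacteria = c(30, 25, 28, 22, 35, 27),
  Nitrogen       = c(0.12, 0.15, 0.10, 0.20, 0.18, 0.11),
  TOC            = c(1.2, 1.6, 0.9, 2.1, 1.1, 1.4),
  Sand           = c(60, 55, 70, 40, 50, 65),
  Silt           = c(30, 25, 20, 35, 30, 25),
  Clay           = c(10, 20, 10, 25, 20, 10)   # Sand + Silt + Clay = 100 in every row
)
env.names <- c("Nitrogen", "TOC", "Sand", "Silt", "Clay")

# 1. Every requested column should exist: match() returns NA for a missing name.
env.incl <- match(env.names, colnames(data))
missing.cols <- env.names[is.na(env.incl)]

env1 <- data[, env.incl[!is.na(env.incl)], drop = FALSE]

# 2. Every environmental column should be numeric (not character or factor).
non.numeric <- names(env1)[!sapply(env1, is.numeric)]

# 3. Assumption-driven check: the rank of the centred matrix is lower than the
#    number of columns when some variables are redundant.
env.rank <- qr(scale(as.matrix(env1), center = TRUE, scale = FALSE))$rank

print(missing.cols)
print(non.numeric)
cat("rank:", env.rank, "of", ncol(env1), "columns\n")
```

If the rank is lower than the number of columns, dropping one of the redundant variables (one of the three texture fractions in this made-up example) before fitting is one option to try.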

Related

How to project a new data row onto PCA space using dudi.mix in R?

I have a mixed dataset (comprising continuous, ordinal and nominal variables) that is high-dimensional (with more variables than rows). I want to perform a mixed data PCA using the dudi.mix() function in the R package ade4. After PCA, I want to project a new supplementary row onto the PCA space, i.e. find its coordinates in the PCA coordinate system. I tried the suprow() function in ade4 but it gives me the following error message: “Not yet implemented for 'dudi.mix'. Please use 'dudi.hillsmith'.” I don’t want to use the dudi.hillsmith() function because I think it only allows for mixed continuous and nominal variables, but my dataset comprises continuous, nominal and ordinal variables, and my understanding is that dudi.mix() is the correct function to use in this case.
Is there an alternative way to project a new row onto the PCA space generated by dudi.mix()?
Below is an example:
# load ade4 package
library(ade4)
# a high-dimensional mixed dataset with 11 rows and 13 variables
dat <- data.frame(
a=as.numeric(c(2.5,0.5,2.2,1.9,3.1,2.3,2.0,1.0,1.5,1.1,3.4)),
b=as.numeric(c(2.4,0.7,2.9,2.2,3.0,2.7,1.6,1.1,1.6,0.9,3.1)),
c=as.numeric(c(1.3,1.1,2.4,3.1,2.2,1.3,1.5,1.8,1.1,0.5,3.8)),
d=as.numeric(c(1.9,0.9,2.1,2.3,2.8,1.9,1.9,1.3,2.9,0.8,2.9)),
e=as.numeric(c(2.2,1.2,2.5,2.9,1.9,3.1,2.1,0.9,1.8,0.9,2.8)),
f=as.factor(c(0,0,0,0,1,0,1,1,1,0,1)),
g=as.factor(c(0,1,0,0,1,0,1,0,1,0,0)),
h=as.factor(c(1,1,1,1,0,1,0,0,0,1,1)),
i=as.factor(c(1,0,0,0,0,1,0,1,0,0,0)),
j=as.ordered(c(0,1,0,2,3,4,0,1,2,4,2)),
k=as.ordered(c(1,2,1,3,4,4,1,2,2,3,3)),
l=as.ordered(c(0,1,1,2,3,2,0,1,1,3,1)),
m=as.ordered(c(0,0,1,2,1,2,2,1,0,2,1)))
# first 10 rows are used for PCA
dat.1 <- dat[1:10,]
# the 11th row should be projected onto the PCA space
dat.2 <- dat[11,]
# pca on dat.1 with 9 kept axes (i.e. number of rows - 1)
pca.res <- dudi.mix(df=dat.1, scann=FALSE, nf = 9)
# my attempt to project dat.2 onto pca.res fails
suprow(x=pca.res, Xsup=dat.2)

HCPC in FactoMineR: How to count individuals in Clusters?

The title says it all. I performed a multiple correspondence analysis (MCA) in FactoMineR with Factoshiny and ran an HCPC afterwards. I now have 3 clusters on my 2 dimensions. While the Factoshiny interface really helps to visualize and navigate the analysis, I can't find a way to count the individuals in my clusters. Additionally, I would love to assign the cluster labels to the individuals in my dataset. Those operations are easily performed with hclust, but its algorithms don't work on categorical data.
##dummy dataset
x <- as.factor(c(1,1,2,1,3,4,3,2,1))
y <- as.factor(c(2,3,1,4,4,2,1,1,1))
z <- as.factor(c(1,2,1,1,3,4,2,1,1))
data <- data.frame(x,y,z)
# used packages
library(FactoMineR)
library(Factoshiny)
# the function used to open factoshiny in your browser
res.MCA <- Factoshiny(data)
# factoshiny code:
# res.MCA<-MCA(data,graph=FALSE)
# hcpc code in factoshiny
res.MCA<-MCA(data,ncp=8,graph=FALSE)
res.HCPC<-HCPC(res.MCA,nb.clust=3,consol=FALSE,graph=FALSE)
plot.HCPC(res.HCPC,choice='tree',title='Hierarchical tree')
plot.HCPC(res.HCPC,choice='map',draw.tree=FALSE,title='Factor map')
plot.HCPC(res.HCPC,choice='3D.map',ind.names=FALSE,centers.plot=FALSE,angle=60,title='Hierarchical tree on the factor map')
I now want a variable data$cluster with 3 levels so that I can count the individuals in the clusters.
To anyone encountering a similar problem, this helped:
res.HCPC$data.clust # returns all values and cluster membership for every individual
res.HCPC$data.clust[1,]$clust # for the first individual
table(res.HCPC$data.clust$clust) # gives table of frequencies per cluster
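For the data$cluster variable asked for above, a minimal sketch follows. So that it runs without FactoMineR, data.clust here is a hypothetical stand-in for res.HCPC$data.clust with made-up cluster labels; with a real result you would use res.HCPC$data.clust in its place. Matching on row names is a defensive assumption, so the assignment stays correct even if the rows come back in a different order.

```r
## dummy dataset from the question
x <- as.factor(c(1,1,2,1,3,4,3,2,1))
y <- as.factor(c(2,3,1,4,4,2,1,1,1))
z <- as.factor(c(1,2,1,1,3,4,2,1,1))
data <- data.frame(x, y, z)

# Hypothetical stand-in for res.HCPC$data.clust: the original columns plus a
# 'clust' factor. The cluster labels are made up for illustration.
data.clust <- cbind(data, clust = factor(c(1,1,2,1,3,3,2,2,1)))
# Shuffle the rows on purpose to show that matching by row name is order-proof.
data.clust <- data.clust[order(data.clust$clust), ]

# Copy the cluster membership back onto the original data, matched by row name.
data$cluster <- data.clust[rownames(data), "clust"]

# Count the individuals per cluster.
cluster.sizes <- table(data$cluster)
print(cluster.sizes)
```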

lmList diagnostic plots - is it possible to subset data during a procedure or do data frames have to be subset and then passed in?

I am new to R and am trying to produce a vast number of diagnostic plots for linear models for a huge data set.
I discovered the lmList function from the nlme package.
This works a treat but what I now need is a means of passing in a fraction of this data into the plot function so that the resulting plots are not minute and unreadable.
In the example below 27 plots are nicely displayed. I want to produce diagnostics for much more data.
Is it necessary to subset the data first? (presumably with loops) or is it possible to subset within the plotting function (presumably with some kind of loop) rather than create 270 data frames and pass them all in separately?
I'm sorry to say that my R is so basic that I do not even know how to pass variables into names and values together in for loops (I tried using the paste function but it failed).
The data and function for the example are below. I would be picking values of Subject by their row numbers within the data frame. I grant that the 27 plots here display nicely, but for the sake of example it would be nice to split them into, say, 3 sets of 9.
fm1 <- lmList(distance ~ age | Subject, Orthodont)
# observed versus fitted values by Subject
plot(fm1, distance ~ fitted(.) | Subject, abline = c(0,1))
Examples from:
https://stat.ethz.ch/R-manual/R-devel/library/nlme/html/plot.lmList.html
I would be most grateful for help and hope that my question isn't insulting to anyone's intelligence or otherwise annoying.
I can't see how to pass a subset to the plot.lmList function. But here is a way to do it using a standard split-apply-combine strategy. Here, the Subjects are just split into three arbitrary groups of 9, and lmList is applied to each group.
## Make 3 lmLists
library(nlme)
# rep(1:3, each=3) is recycled over the 27 subjects, giving three groups of 9
fits <- lapply(split(unique(Orthodont$Subject), rep(1:3, each=3)), function(x) {
  eval(substitute(
    lmList(distance ~ age | Subject,                   # fit the model to the subset
           data=Orthodont[Orthodont$Subject %in% x,]), # use the subset
    list(x=x)))  # substitute the actual x-values so the proper call gets stored
})
## Make plots
for (i in seq_along(fits)) {
  dev.new()
  print(plot(fits[[i]], distance ~ fitted(.) | Subject, abline = c(0,1)))
}
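If the number of subjects is much larger, the grouping step can be written so that it does not depend on knowing the group count in advance. This is only a sketch of that one step: chunk.size is a hypothetical parameter, and the resulting chunks would be passed to lapply in place of the split() call above.

```r
library(nlme)  # provides the Orthodont data set

subjects <- unique(Orthodont$Subject)

# Hypothetical parameter: how many subjects should share one page of plots.
chunk.size <- 9

# Subjects 1-9 go to chunk 1, 10-18 to chunk 2, and so on; the last chunk
# is simply shorter when the total is not a multiple of chunk.size.
chunks <- split(subjects, ceiling(seq_along(subjects) / chunk.size))

chunk.sizes <- lengths(chunks)
print(chunk.sizes)
```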

Unable to plot PCA data in R. Are scores defined by a given object/name to plot them specifically?

I have completed a simple PCA function using code that was passed down through the institution. It outputs scores, loadings, eigenvalues, % eigenvalues, number of principal components, column means, standard deviations, and lastly the starting data. In the output file the scores are labeled with [[1]] before the scores are displayed. I am attempting to plot these scores, but I am unsure how to access that data from this point. I assumed it was assigned to this [[1]], or that something in the code defined these scores. The relevant code is presented below:
# perform pca on x
x.svd <- svd(x);
x.R <- x.svd$u %*% diag(x.svd$d);
x.C <- t(x.svd$v);
x.EV <- x.svd$d * x.svd$d
x.EVpct <- x.EV/sum(x.EV);
x.EV <- x.EV[1:sm];
x.EVpct <- x.EVpct[1:sm];
x.CumEVpct <- x.EVpct;
x.R is the part of the code that computes the scores, but that too will not work with the plot function. Hopefully someone understands what I am struggling to ask. Any help is very appreciated. Thank you for your time.
The easiest thing to do would be:
pc <- prcomp(x)
plot(pc$x[, 1:2])
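For what it's worth, the x.R matrix from the question holds the same scores, so it can also be plotted directly. The sketch below uses the built-in iris measurements as a stand-in for x and assumes that x is column-centred before svd() is called, as prcomp() does by default; if the original function does not centre the data, the two sets of scores will differ.

```r
# Stand-in data: the four numeric columns of the built-in iris data set.
x <- as.matrix(iris[, 1:4])

# Assumption: centre the columns first, as prcomp() does by default.
xc <- scale(x, center = TRUE, scale = FALSE)

# Scores computed the same way as in the question's code.
x.svd <- svd(xc)
x.R <- x.svd$u %*% diag(x.svd$d)

# Scores from prcomp() for comparison; compare absolute values, since the sign
# of a principal component is arbitrary.
pc <- prcomp(x)
same.scores <- isTRUE(all.equal(abs(unname(pc$x)), abs(x.R)))
print(same.scores)

# x.R is an ordinary numeric matrix, so its first two columns plot directly.
plot(x.R[, 1:2], xlab = "PC1", ylab = "PC2")
```

If the scores sit inside a list (which is what the [[1]] label in the output suggests), extracting that element with [[1]] from whatever object the function returns should give a matrix that can be plotted the same way.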

Variable Clustering (varclus) Summary Tables

I am using varclus from the Hmisc package in R. Are there ways to produce summary tables from varclus in R like those in SAS (e.g. Output 100.1.2 and Output 100.1.3)? Basically, I would like the information that is contained in the plot in tabular or matrix form. For example: which variables are in which clusters (cluster structure in SAS), the proportion of variance they explain, etc.
# varclus example in R using mtcars data
library(Hmisc)
mtc <- mtcars[,2:8]
mtcn <- data.matrix(mtc)
clust <- varclus(mtcn)
clust
plot(clust)
#cut_tree <- cutree(varclus(mtcn)$hclust, k=5) # This would show group membership, but only after I choose a cut point, which is not what I am after
