Variable Clustering (varclus) Summary Tables - r

I am using varclus from the Hmisc package in R. Is there a way to produce summary tables from varclus in R like those in SAS (e.g. Output 100.1.2 and Output 100.1.3)? Basically, I would like the information contained in the plot in tabular or matrix form: for example, which variables are in which clusters (the cluster structure in SAS), the proportion of variance they explain, etc.
# varclus example in R using the mtcars data
library(Hmisc)
mtc <- mtcars[,2:8]
mtcn <- data.matrix(mtc)
clust <- varclus(mtcn)
clust
plot(clust)
# cut_tree <- cutree(varclus(mtcn)$hclust, k=5) # This would show group membership, but only after I choose a cut point, which is not what I am after
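One possible way to get the plot's information as tables, offered as a sketch rather than a definitive answer: the object returned by varclus stores the similarity matrix in $sim and the underlying hclust fit in $hclust, so membership can be tabulated for a whole range of cut points at once instead of committing to one. The range k = 2:5 and the names membership and merges are illustrative choices, not part of Hmisc. Because this is hierarchical clustering on a similarity matrix, it does not reproduce the variance-explained columns that SAS PROC VARCLUS reports.

```r
library(Hmisc)

mtcn  <- data.matrix(mtcars[, 2:8])
clust <- varclus(mtcn)

# Similarity matrix behind the dendrogram
# (squared Spearman correlations with the default settings).
round(clust$sim, 2)

# Cluster membership for several cut points at once:
# one row per variable, one column per number of clusters.
membership <- cutree(clust$hclust, k = 2:5)
membership

# Merge history: the height at which each pair of clusters is joined.
# hclust was run on 1 - similarity, so 1 - height is back on the similarity scale.
h <- clust$hclust$height
merges <- data.frame(step = seq_along(h), height = h, similarity = 1 - h)
merges
```

Reading the membership matrix column by column shows how the clusters split as k grows, which is the same information the dendrogram conveys graphically.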

Related

HCPC in FactoMineR: How to count individuals in clusters?

The title says it all. I performed a multiple correspondence analysis (MCA) in FactoMineR with Factoshiny and ran an HCPC afterwards. I now have 3 clusters on my 2 dimensions. While the Factoshiny interface makes it easy to visualize and navigate the analysis, I can't find a way to count the individuals in my clusters. Additionally, I would like to assign the cluster variable to the individuals in my dataset. Those operations are easy with hclust, but its algorithms don't work on categorical data.
##dummy dataset
x <- as.factor(c(1,1,2,1,3,4,3,2,1))
y <- as.factor(c(2,3,1,4,4,2,1,1,1))
z <- as.factor(c(1,2,1,1,3,4,2,1,1))
data <- data.frame(x,y,z)
# used packages
library(FactoMineR)
library(Factoshiny)
# the function used to open factoshiny in your browser
res.MCA <- Factoshiny(data)
# factoshiny code:
# res.MCA<-MCA(data,graph=FALSE)
# hcpc code in factoshiny
res.MCA<-MCA(data,ncp=8,graph=FALSE)
res.HCPC<-HCPC(res.MCA,nb.clust=3,consol=FALSE,graph=FALSE)
plot.HCPC(res.HCPC,choice='tree',title='Hierarchical tree')
plot.HCPC(res.HCPC,choice='map',draw.tree=FALSE,title='Factor map')
plot.HCPC(res.HCPC,choice='3D.map',ind.names=FALSE,centers.plot=FALSE,angle=60,title='Hierarchical tree on the factor map')
I now want a variable data$cluster with 3 levels so that I can count the individuals in the clusters.
To anyone encountering a similar problem, this helped:
res.HCPC$data.clust # returns all values and cluster membership for every individual
res.HCPC$data.clust[1,]$clust # for the first individual
table(res.HCPC$data.clust$clust) # gives table of frequencies per cluster

lmList diagnostic plots - is it possible to subset data during a procedure or do data frames have to be subset and then passed in?

I am new to R and am trying to produce a vast number of diagnostic plots for linear models for a huge data set.
I discovered the lmList function from the nlme package.
This works a treat but what I now need is a means of passing in a fraction of this data into the plot function so that the resulting plots are not minute and unreadable.
In the example below 27 plots are nicely displayed. I want to produce diagnostics for much more data.
Is it necessary to subset the data first? (presumably with loops) or is it possible to subset within the plotting function (presumably with some kind of loop) rather than create 270 data frames and pass them all in separately?
I'm sorry to say that my R is so basic that I do not even know how to pass variables into names and values together in for loops (I tried using the paste function but it failed).
The data and function for the example are below. I would be picking values of Subject by their row numbers within the data frame. I grant that the 27 plots here display nicely, but for the sake of example it would be nice to split them into, say, 3 sets of 9.
library(nlme)
fm1 <- lmList(distance ~ age | Subject, Orthodont)
# observed versus fitted values by Subject
plot(fm1, distance ~ fitted(.) | Subject, abline = c(0,1))
Examples from:
https://stat.ethz.ch/R-manual/R-devel/library/nlme/html/plot.lmList.html
I would be most grateful for help and hope that my question isn't insulting to anyone's intelligence or otherwise annoying.
I can't see how to pass a subset to the plot.lmList function. But here is a way to do it using a standard split-apply-combine strategy. The Subjects are simply split into three arbitrary groups of 9, and lmList is applied to each group.
## Make 3 lmLists
fits <- lapply(split(unique(Orthodont$Subject), rep(1:3, each=9)), function(x) {
  eval(substitute(
    lmList(distance ~ age | Subject, # fit the model to the subset
           data=Orthodont[Orthodont$Subject %in% x,]), # use the subset
    list(x=x))) # substitute the actual x-values so the proper call gets stored
})
## Make plots
for (i in seq_along(fits)) {
dev.new()
print(plot(fits[[i]], distance ~ fitted(.) | Subject, abline = c(0,1)))
}
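An alternative that may avoid refitting, sketched under the assumption that plot.lmList forwards extra arguments to the underlying lattice call (its help page describes the ... argument as passed to the Trellis plot function): the lattice layout argument paginates the panels, so a single fit can be spread over several pages. The output file name below is a hypothetical choice for illustration.

```r
library(nlme)

fm1 <- lmList(distance ~ age | Subject, Orthodont)

# layout = c(3, 3) asks lattice for 3 columns x 3 rows per page,
# so the 27 subjects are spread over 3 pages of 9 panels each.
p <- plot(fm1, distance ~ fitted(.) | Subject, abline = c(0, 1),
          layout = c(3, 3))

# Write all pages to one multi-page PDF (file name is an illustrative assumption).
out_file <- file.path(tempdir(), "lmList_pages.pdf")
pdf(out_file)
print(p)
dev.off()
```

On an interactive device each page overwrites the previous one, which is why the sketch sends the pages to a PDF instead.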

Issue with RDA plot in R

I am a novice at using R for multivariate analysis. I am trying to get an RDA plot depicting the relationship between my species abundance and environmental data. I have 6 environmental variables, but when I obtain the plot I can see only two vectors, representing just two of the variables. The commands I have used are below.
data <- read.csv("all_data.csv",h=T);
library(vegan)
sp1 <- data[,c("Sample","Acidobacteria","Actinobacteria","Aquificae","Bacteroidetes")];
env1 <- data[,c("Nitrogen","TOC","Phosphate","Sand","Silt","Clay")];
myrda <- rda(sp1,env1)
plot(myrda,scaling=2)
Could someone please help me out with this? I wish to see all 6 environmental parameters in my RDA plot.
Here is an example using vegan's example data varespec and varechem. The plot of the rda model automatically displays all 14 environmental variables:
library(vegan)
data(varespec)
data(varechem)
myrda <- rda(varespec, varechem)
myrda
colnames(varechem) # 14 variables
plot(myrda,scaling=2) # 14 vectors shown
Maybe double-check that your data frames correctly contain variable names, so that the plot knows where to grab the labels. I would also make sure that your data splitting is working correctly; I don't think that your method will always work. Here is a possible alternative that should:
sp.incl <- match(c("Sample","Acidobacteria","Actinobacteria","Aquificae","Bacteroidetes"), colnames(data))
sp1 <- data[,sp.incl]
env.incl <- match(c("Nitrogen","TOC","Phosphate","Sand","Silt","Clay"), colnames(data))
env1 <- data[,env.incl]
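If fewer arrows appear than expected, one thing worth checking is which constraints actually received biplot scores. This is an assumption about a possible cause, not a diagnosis of the asker's data: linearly dependent variables can be dropped from the model (for example, Sand, Silt and Clay would be dependent if they always sum to 100%). The sketch below uses vegan's example data again, and the names bp and missing_vars are illustrative.

```r
library(vegan)
data(varespec)
data(varechem)

myrda <- rda(varespec, varechem)

# Biplot scores: one row per environmental variable that gets an arrow.
bp <- scores(myrda, display = "bp", scaling = 2)
nrow(bp)

# Variables supplied as constraints that have no arrow in the plot.
missing_vars <- setdiff(colnames(varechem), rownames(bp))
missing_vars
```

With your own data, replace varespec and varechem by sp1 and env1; any names listed in missing_vars are the ones to investigate.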

Creating a dendrogram with the results from the multipatt function in the indicspecies package

I am getting familiar with the multipatt function in the indicspecies package. Thus far I have only seen summary being used to give a breakdown of the results. However, I would like a dendrogram, ideally with the names of the species that are more 'indicative' of my given community location.
example from package file:
library(indicspecies)
library(stats)
data(wetland) ## Loads species data
wetkm = kmeans(wetland, centers=3) ## Creates three clusters using kmeans
## Runs the combination analysis using IndVal.g as statistic
wetpt = multipatt(wetland, wetkm$cluster, control = how(nperm=999))
## Lists those species with significant association to one combination
summary(wetpt)
wetpt gives the raw results but I am not sure how to proceed to get a cluster plot out of this result. Can anyone offer any pointers?

how to cut the dendrogram with VARCLUS in R (package Hmisc)

I want to perform variable clustering using the varclus() function from the Hmisc package.
However, I do not know how to put the clusters of variables into a table if I cut the dendrogram into 10 clusters of variables.
I used to use
groups <- cutree(hclust(d), k=10)
to cut dendrograms of individuals but it doesn't work for variables.
Expanding on @Anatoliy's comment, you can indeed use the same cutree() function
as before, because the clustering in varclus() is actually done by the hclust() function.
When you use varclus() you create an object of class varclus that contains an hclust object, which can be referenced using $hclust.
Example:
x <- varclus(d)
x_hclust <- x$hclust ## retrieve hclust object
groups <- cutree(x_hclust, 10)
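To turn the cutree() result into a table, a minimal sketch follows. So that it runs with base R only, it builds a stand-in hclust object on the mtcars variables using 1 minus the squared Spearman correlation as the distance, which is an assumption meant to mimic the varclus() defaults; with a real varclus fit you would start from x$hclust instead. The choice of k = 4 and the name cluster_table are illustrative.

```r
# Stand-in for x$hclust: cluster the variables (columns) of mtcars.
sim <- cor(mtcars, method = "spearman")^2
hc  <- hclust(as.dist(1 - sim), method = "complete")

# Named integer vector: variable name -> cluster number.
groups <- cutree(hc, k = 4)

# One row per variable, sorted by cluster.
cluster_table <- data.frame(variable = names(groups), cluster = unname(groups))
cluster_table <- cluster_table[order(cluster_table$cluster), ]
cluster_table

# The same information as a list of variable names per cluster,
# and the number of variables in each cluster.
split(names(groups), groups)
table(groups)
```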
