How to cut the dendrogram with varclus in R (package Hmisc)

I want to perform variable clustering using the varclus() function from Hmisc package.
However, I do not know how to put the clusters of variables into a table if I cut the dendrogram into 10 clusters of variables.
I used to use
groups <- cutree(hclust(d), k=10)
to cut dendrograms of individuals but it doesn't work for variables.

Expanding on @Anatoliy's comment: you do use the same cutree() function as before, because the clustering in varclus() is actually done by hclust().
varclus() returns an object of class varclus that contains an hclust object, which you can access with $hclust.
Example:
x <- varclus(d)
x_hclust <- x$hclust ## retrieve hclust object
groups <- cutree(x_hclust, 10)
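To get from the cutree() result to a table, something like the sketch below may help. It uses base R only: the hclust object is built from squared Spearman correlations as a stand-in for varclus(d)$hclust (this mirrors what I understand the varclus() defaults to be, so treat it as an assumption), and clusters_to_table() is a hypothetical helper name, not part of Hmisc. With Hmisc loaded you would pass x$hclust instead.

```r
# Hypothetical helper: turn a cut dendrogram into a variable/cluster table.
clusters_to_table <- function(hc, k) {
  groups <- cutree(hc, k = k)  # named integer vector: variable -> cluster id
  tab <- data.frame(variable = names(groups),
                    cluster  = unname(groups),
                    stringsAsFactors = FALSE)
  tab[order(tab$cluster, tab$variable), ]
}

# Stand-in for varclus(d)$hclust (assumption: squared Spearman similarity,
# complete linkage), so that the example runs without Hmisc installed.
d   <- data.matrix(mtcars[, 2:8])
sim <- cor(d, method = "spearman")^2
hc  <- hclust(as.dist(1 - sim), method = "complete")

tab <- clusters_to_table(hc, k = 3)
print(tab)

# The same information as a list: one character vector of variables per cluster.
split(tab$variable, tab$cluster)
```

The key point is that cutree() returns a named vector, so names(groups) gives the variables and the values give the cluster ids.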

Related

k-means analysis: how to convert data into numeric?

I want to perform a k-means analysis in R. For that I need numeric data. I tried the following
unlist(pca)
as.numeric(pca)
lapply(pca,as.numeric(pca))
pca is just "normal" Principal Component Analysis data, shown in a plot (with the fviz_pca_ind() function).
By the way, when I try to run the k-means analysis, it gives me "list object cannot be coerced to type double". That is why I thought to turn everything into numeric.
How to convert the pca-data into numeric?
Thank you ;)
You're almost correct:
lapply(pca, as.numeric)
as.numeric is a function and therefore an object. You need to pass it to lapply() as an object, that is, by name and without calling it (no parentheses or arguments).
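A small illustration of the difference, using a made-up list (pca_like is hypothetical data, not the poster's object):

```r
# Hypothetical list standing in for the poster's object.
pca_like <- list(a = c("1", "2", "3"), b = c("4.5", "5.5", "6.5"))

# Pass the function itself; lapply() calls it once per list element.
as_num <- lapply(pca_like, as.numeric)

# Calling as.numeric() on the whole list is what triggers the coercion error.
err <- tryCatch(as.numeric(pca_like), error = function(e) conditionMessage(e))
print(err)

# kmeans() wants a numeric matrix, so bind the converted columns together.
m <- do.call(cbind, as_num)
str(m)
```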
Most PCA functions return a list, so you should say which package or function you used to perform the PCA, so that we can see what is in the list.
For example, prcomp returns a list that includes the eigenvectors / loadings ($rotation) and the principal components ($x). I suppose you are trying to run k-means on the principal components, which you can do like this:
# perform pca
pca = prcomp(USArrests, scale. = TRUE)
# the PC scores are stored as a numeric matrix in pca$x
# run k-means on them (set a seed, because the starting centers are random)
set.seed(1)
kmeans_clus = kmeans(pca$x, 3)
## plot
# define colors
COLS = c("#65587f","#f18867","#e85f99")
plot(pca$x[,1:2],col=COLS[kmeans_clus$cluster],pch=20)
legend("topright",fill=COLS,legend=1:3,horiz=TRUE)

ANOSIM with cutree groupings

What I would like to do is an ANOSIM of defined groupings in some assemblage data, to see whether the groupings are significantly different from one another, in a similar fashion to this example code:
library(vegan)
data(dune)
data(dune.env)
dune.dist <- vegdist(dune)
dune.ano <- anosim(dune.dist, dune.env$Management)
summary(dune.ano)
However, in my own data I have species abundances in a Bray-Curtis dissimilarity matrix. After building hclust() dendrograms, I define my own groupings visually by looking at the dendrogram and setting a cut height. I can then extract these groupings with cutree() and superimpose them on MDS plots etc., but I would like to test the groupings I have created: are they significantly different from one another, or just arbitrary?
e.g.
data("dune")
dune.dist <- vegdist(dune)
clua <- hclust(dune.dist, "average")
plot(clua)
rect.hclust(clua, h =0.65)
c1 <- cutree(clua, h=0.65)
I then want to use the c1 defined category as the groupings, which in the example code given was the management factor, and test their similarities to see whether they are actually different via anosim().
I am pretty sure this is just a matter of my inept coding.... any advice would be appreciated.
cutree returns the groups as integers; you must convert them to a factor if you want to use them in anosim. Try anosim(vegdist(dune), factor(c1)). You had better consult a local statistician about using anosim to analyse dissimilarities with clusters that were created from those very same dissimilarities.
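Put together, the whole thing might look like the sketch below. It assumes the vegan package is installed. I cut into k = 3 groups here just to have a fixed number of groups for the illustration; cutree(clua, h = 0.65) from the question is used the same way. The seed value is arbitrary.

```r
library(vegan)

data(dune)
dune.dist <- vegdist(dune)                      # Bray-Curtis by default
clua <- hclust(dune.dist, method = "average")

# Assumption for this sketch: k = 3 groups instead of the height cut.
c1  <- cutree(clua, k = 3)
grp <- factor(c1)                               # integer labels -> factor

set.seed(42)                                    # anosim uses random permutations
dune.ano <- anosim(dune.dist, grp, permutations = 999)
summary(dune.ano)
```

Keep the caveat above in mind: the groups were derived from dune.dist itself, so a small p-value here is largely expected and is not independent evidence that the groups are real.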

Silhouette plot in R

I have a set of data containing:
item, associated cluster, silhouette coefficient. I can further augment this data set with more information if necessary.
I would like to generate a silhouette plot in R. I am having trouble with this because examples I came across use the built-in kmeans (or related) clustering function and plot the result. I want to bypass this step and produce the plot for my own clustering algorithm but I'm ending up short on providing the correct arguments to the plot function.
Thank you.
EDIT
Data set example https://pastebin.mozilla.org/8853427
What I've tried is loading the dataset and passing it to the plot function using various arguments based on https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/silhouette.html
Function silhouette in package cluster can do the plots for you. It just needs a vector of cluster membership (produced from whatever algorithm you choose) and a dissimilarity matrix (probably best to use the same one used in producing the clusters). For example:
library(cluster)
library(vegan)
data(varespec)
dis = vegdist(varespec)
res = pam(dis, 3) # or whatever your choice of clustering algorithm is
sil = silhouette(res$clustering, dis) # or use your cluster vector
dev.new() # open a separate device; RStudio sometimes does not display silhouette plots correctly
plot(sil)
EDIT: For k-means (which uses squared Euclidean distance):
library(vegan)
library(cluster)
data(varespec)
dis = dist(varespec)^2
res = kmeans(varespec, 3)
sil = silhouette(res$cluster, dis)
dev.new()
plot(sil)

How does the heatmap function in R cluster the data, and how can we get the number of groups?

I have a distance matrix that I plot with the heatmap function. The heatmap function clusters the data into groups, and I want to partition my data into those same groups.
The call is:
heatmap(distanceMatrix, symm = TRUE)
The groups are evident along the diagonal of the matrix.
In fact, I am looking for the number of groups. After that I can use hclust and cutree in R to partition the data.
Have you looked at the help file of the function (?heatmap)? See the arguments below.
distfun
function used to compute the distance (dissimilarity) between both rows and columns. Defaults to dist.
hclustfun
function used to compute the hierarchical clustering when Rowv or Colv are not dendrograms. Defaults to hclust. Should take as argument a result of distfun and return an object to which as.dendrogram can be applied.
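In practice those two defaults mean that heatmap() runs dist() on the rows of whatever you give it, and then hclust() on the result. If the input is already a distance matrix, that is a distance of distances. A rough sketch of the difference, using simulated points (pts and the two-group setup are made up for illustration):

```r
set.seed(1)
# Hypothetical data: two well-separated groups of 10 points each.
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 8), ncol = 2))
rownames(pts) <- paste0("obs", seq_len(nrow(pts)))
distanceMatrix <- as.matrix(dist(pts))

# What heatmap() does by default: dist() on the rows, then hclust().
hc_default <- hclust(dist(distanceMatrix))

# To cluster on the supplied distances themselves, pass as.dist as distfun.
hc_direct <- hclust(as.dist(distanceMatrix))
heatmap(distanceMatrix, symm = TRUE, distfun = as.dist)

# Once the number of groups is chosen, cutree() gives the partition.
groups <- cutree(hc_direct, k = 2)
table(groups)
```

Choosing k is a separate problem; the sketch just fixes k = 2 because the simulated data has two groups by construction.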
The NbClust package (library(NbClust)) solved the problem:
"NbClust: An examination of indices for determining the number of clusters : NbClust Package"

Variable Clustering (varclus) Summary Tables

I am using varclus from the Hmisc package in R. Are there ways to produce summary tables from varclus in R like those in SAS (e.g. Output 100.1.2 and Output 100.1.3)? Basically, I would like to have the information that is contained in the plot in tabular or matrix form. For example: which variables are in which clusters (the cluster structure in SAS), the proportion of variance they explain, etc.
# varclus example in R using mtcars data
library(Hmisc)
mtc <- mtcars[,2:8]
mtcn <- data.matrix(mtc)
clust <- varclus(mtcn)
clust
plot(clust)
#cut_tree <- cutree(varclus(mtcn)$hclust, k=5) # This would show group membership, but only after I choose a cut point, which is not what I am after
