How to get a good dendrogram using R - r

I am using R to do a hierarchical cluster analysis using the Ward's squared euclidean distance. I have a matrix of x columns(stations) and y rows(numbers in float), the first row contain the header(stations' names). I want to have a good dendrogram where the name of the station appear at the bottom of the tree as i am not able to interprete my result. My aim is to find those stations which are similar. However using the following codes i am having numbers (100,101,102,...) for the lower branches.
Yu<-read.table("yu_s.txt",header = T, dec=",")
library(cluster)
agn1 <- agnes(Yu, metric = "euclidean", method="ward", stand = TRUE)
hcd<-as.dendrogram(agn1)
par(mfrow=c(3,1))
plot(hcd, main="Main")
plot(cut(hcd, h=25)$upper,
main="Upper tree of cut at h=25")
plot(cut(hcd, h=25)$lower[[2]],
main="Second branch of lower tree with cut at h=25")

A nice collection of examples are present here (http://gastonsanchez.com/blog/how-to/2012/10/03/Dendrograms.html)
Two methods:
with hclust from base R
hc<-hclust(dist(mtcars),method="ward")
plot(hc)
Default plot
ggplot
with ggplot and ggdendro
library(ggplot2)
library(ggdendro)
# basic option
ggdendrogram(hc, rotate = TRUE, size = 4, theme_dendro = FALSE)

Related

Visualizing PCA with large number of variables in R using ggbiplot

I am trying to visualize a PCA that includes 87 variables.
prc <-prcomp(df[,1:87], center = TRUE, scale. = TRUE)
ggbiplot(prc, labels = rownames(df[,1:87]), var.axes = TRUE)
When I create the biplot, many of the vectors overlap with each other, making it impossible to read the labels. I was wondering if there is any way to only show some of the labels at a time. For example, I think it'd be useful if I could create a few separate biplots with each one showing only a subset of the labels on the vectors.
This question seems closely related, but I don't know if it translates to the latest version of ggbiplot. I'm also not sure how to modify the original functions.
A potential solution is to use the factoextra package to visualize your PCA results. The fviz_pca_biplot() function includes a repel argument. When repel = TRUE the plot labels are spread out to minimize overlap. There are also select.var options mentioned in the documentation, such as select.var = list(contrib=5) to display only the 5 most influential vectors. Also a select.var = list(name) option that seems to allow for the specification of a specific subset of variables that you want shown.
# read data
df <- mtcars[, c(1:7,10:11)]
# perform PCA
library("FactoMineR")
res.pca <- PCA(df, graph = FALSE)
# visualize
library(factoextra)
fviz_pca_biplot(res.pca, repel = TRUE, select.var = list(contrib = 5))

Plotting R2 of each/certain PCA component per wavelength with R

I have some experience in using PCA, but this is the first time I am attempting to use PCA for spectral data...
I have a large data with spectra where I used prcomp command to calculated PCA for the whole dataset. My results show that 3 components explain 99% of the variance.
I would like to plot the contribution of each of the three PCA components at every wavelength (in steps of 4, 200-1000 nm) like the example of a plot 2 I found on this site:
https://learnche.org/pid/latent-variable-modelling/principal-component-analysis/pca-example-analysis-of-spectral-data
Does anyone have a code how I could do this in R?
Thank you
I believe the matrix of variable loadings is found in model.pca$rotation, see prcomp documentation.
So something like this should do (using the example on your linked website):
file <- 'http://openmv.net/file/tablet-spectra.csv'
spectra <- read.csv(file, header = FALSE)
n.comp <- 4
model.pca <- prcomp(spectra[,2:651],
center = TRUE,
scale =TRUE,
rank. = n.comp)
summary(model.pca)
par(mfrow=c(n.comp,1))
sapply(1:n.comp, function(comp){
plot(2:651, model.pca$rotation[,comp], type='l', lwd=2,
main=paste("Comp.", comp), xlab="Wavelength INDEX")
})
I don't have the wavelength values, so I used the indices of the array here ; output below.

Inconsistent clustering with ComplexHeatmap?

So I'm trying to generate a heatmap for my data using Bioconductor's ComplexHeatmap package, but I get slightly different results depending on whether I make the dendrogram myself, or tell Heatmap to make it.
Packages:
require(ComplexHeatmap)
require(dendextend)
Data:
a=rnorm(400,1)
b=as.matrix(a)
dim(b)=c(80,5)
If I make the dendrogram myself:
d=dist(b,method="euclidean")
d=as.dist(d)
h=hclust(d,method="ward.D")
dend=as.dendrogram(h)
Heatmap(b,
cluster_columns=FALSE,
cluster_rows = dend)
Versus having Heatmap do the clustering:
Heatmap(b,
cluster_columns=FALSE,
clustering_distance_rows = "euclidean",
clustering_method_rows = "ward.D")
They tend to look very similar, but they'll be very slightly different.
And this matters a lot for my data. Heatmap's clustering ends up organizing my data way, way better, however, I also want to extract the list of clustered items via like cutree(), but I don't think I can extract it from Heatmap's clustering.
Does anyone know what's going on?
the dendrograms are the same. The only thing that changes is the ordering. You can verify this using:
hmap1 <- Heatmap(b,
cluster_columns=FALSE,
cluster_rows = dend)
hmap2 <- Heatmap(b,
cluster_columns=FALSE,
clustering_distance_rows = "euclidean",
clustering_method_rows = "ward.D")
#Reorder both row dendrograms using the same weights:
rowdend1 <- reorder(row_dend(hmap1)[[1]], 1:80)
rowdend2 <- reorder(row_dend(hmap2)[[1]], 1:80)
#check that they are identical:
identical( rowdend1, rowdend2)
## [1] TRUE
The ComplexHeatmap::Heatmap function has an argument row_dend_reorder with default value TRUE that you should check.

How to cut a dendrogram in r

Okay so I'm sure this has been asked before but I can't find a nice answer anywhere after many hours of searching.
I have some data, I run a classification then I make a dendrogram.
The problem has to do with aesthetics, specifically; (1) how to cut according to the number of groups (in this example I want 3), (2) make the group labels aligned with the branches of the trees, (2) Re-scale so that there aren't any huge gaps between the groups
More on (3). I have dataset which is very species rich and there would be ~1000 groups without cutting. If I cut at say 3, the tree has some branches on the right and one 'miles' off to the right which I would want to re-scale so that its closer. All of this is possible via external programs but I want to do it all in r!
Bonus points if you can put an average silhouette width plot nested into the top right of this plot
Here is example using iris data
library(ggplot2)
data(iris)
df = data.frame(iris)
df$Species = NULL
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
plot(cut(hcd_ward10, h = 10)$upper, main = "Upper tree of cut at h=75")
I suspect what you would want to look at is the dendextend R package (it also has a paper in bioinformatics).
I am not fully sure about your question on (3), since I am not sure I understand what rescaling means. What I can tell you is that you can do quite a lot of dendextend. Here is a quick example for coloring the branches and labels for 3 groups.
library(ggplot2)
library(vegan)
data(iris)
df = data.frame(iris)
df$Species = NULL
library(vegan)
ED10 = vegdist(df,method="euclidean")
EucWard_10 = hclust(ED10,method="ward.D2")
hcd_ward10 = as.dendrogram(EucWard_10)
plot(hcd_ward10)
install.packages("dendextend")
library(dendextend)
dend <- hcd_ward10
dend <- color_branches(dend, k = 3)
dend <- color_labels(dend, k = 3)
plot(dend)
You can also get an interactive dendrogram by using plotly (ggplot method is available through dendextend):
library(plotly)
library(ggplot2)
p <- ggplot(dend)
ggplotly(p)

plotting hypervolumes from hypervolume() in 3d

I am using the hypervolume package to construct hypervolumes in n-dimensional space.
I can't figure out how to tweak 3d plots when plot() is operated over an object of class 'Hypervolume' or 'HypervolumeList'. The arguments are listed in the package documentation for function 'plot.HypervolumeList' (pg 21).
Example using data included with package:
# Call data and run routine that will create a HypervolumeList
require(hypervolume)
data(finch)
doHypervolumeFinchDemo = T
demo(finch)
# Create 3d plot of HypervolumeList
plot(hv_finches_list, pairplot = F)
Specific questions follow. Included code that I tried--but doesn't work.
How do I change the size of the data points?
plot(hv_finches_list, pairplot = F, cex.data = 5)
How do I choose whether to plot the uniformly random points?
plot(hv_fiches_list, pairplot = F, showrandom = T)
Is it possible to add centroids or contours to the 3d plot?
plot(hv_finches_list, pairplot = F, showcentroid = T)

Resources