Dendrogram modification using dendextend in R

I am trying to modify and tweak a cluster dendrogram with dendextend, using the code below:
library(dendextend)
# prepare hierarchical cluster
hc = hclust(dist(mtcars))
dend <- as.dendrogram(hc)
dend %>% set("branches_lty", 3) %>% plot()
How can I set branches_lty for a specific cluster when the tree is cut into k clusters?
Also, I want to modify the leaf labels and align them to a given length and indentation, as shown in the picture.
I have attached an example picture; I can't achieve this with the dendextend package.
NB:
I can plot it using A2Rplot, but I can't modify it. Is it possible to use both?
# load code of A2R function
source("http://addictedtor.free.fr/packages/A2R/lastVersion/R/code.R")
# colored dendrogram
op = par(bg = "#EFEFEF")
A2Rplot(hc, k = 3, boxes = FALSE, col.up = "gray50", col.down = c("#FF6B6B", "#4ECDC4", "#556270"))

You can solve this using set("branches_k_lty", k= 3), for example:
library(dendextend)
hc = hclust(dist(mtcars))
dend <- as.dendrogram(hc)
dend %>% set("branches_k_lty", k= 3) %>% plot()
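If instead you want a different line type for only one of the k clusters, a minimal base-R sketch (an alternative to dendextend, not part of it) is to cut the hclust object with cutree and then walk the dendrogram with stats::dendrapply, setting the edgePar attribute on every node whose leaves all fall in the chosen cluster. The helper set_cluster_lty and the choice of cluster 2 are hypothetical, for illustration only:

```r
# Base R only: hclust, cutree, as.dendrogram and dendrapply are in stats.
hc   <- hclust(dist(mtcars))
dend <- as.dendrogram(hc)

# Hypothetical helper (not a dendextend function): give every branch that lies
# entirely inside cluster `which_cluster` of a k-cluster cut the line type `lty`.
set_cluster_lty <- function(dend, hc, k, which_cluster, lty = 3) {
  # cutree() on an hclust object returns cluster ids in data (row) order
  members <- which(cutree(hc, k = k) == which_cluster)
  dendrapply(dend, function(node) {
    # leaves store their original row index, so unlist() gives the rows below this node
    rows <- as.vector(unlist(node))
    if (all(rows %in% members)) {
      ep <- attr(node, "edgePar")
      if (is.null(ep)) ep <- list()
      ep$lty <- lty
      # the edge leading into a node is drawn with that node's edgePar
      attr(node, "edgePar") <- ep
    }
    node
  })
}

dend2 <- set_cluster_lty(dend, hc, k = 3, which_cluster = 2, lty = 3)
plot(dend2)
```

Because each leaf of a dendrogram built by as.dendrogram stores its original row index, the helper compares those indices with the cluster members returned by cutree, so it does not depend on the label names.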

Related

ggdendrogram : adding colored rectangles for each cluster

I am not able to add colored rectangles around the chosen clusters.
library(lattice)
library(permute)
library(vegan)
library("ggplot2")
library("ggdendro")
library("dendextend")
data(dune)
d <- vegdist(dune)
csin <- hclust(d, method = "aver")
ggdendrogram(csin)
rect.dendrogram(csin, 3, border = 1:3)
I get this error:
"Error in rect.dendrogram(csin, 3, border = 1:3) :
x is not a dendrogram object."
I thought csin was a dendrogram object. Does anyone have a clue?
As I wrote in the comments:
csin is an hclust object, not a dendrogram (use as.dendrogram to convert it).
rect.dendrogram works with base R plots, not with ggplot2.
Here is a simple example of getting rect.dendrogram to work:
library("dendextend")
d <- dist(iris[,-5])
csin <- as.dendrogram(hclust(d, method = "aver"))
plot(csin)
rect.dendrogram(csin, 3, border = 1:3)
The output:
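If you would rather keep csin as an hclust object, base R's stats::rect.hclust draws the same kind of boxes on a base-graphics plot of the hclust object itself, so no conversion is needed. A minimal sketch, using iris instead of vegan's dune data (an assumption, so that it runs without extra packages):

```r
# Base R only: hclust, plot.hclust and rect.hclust are in stats.
d  <- dist(iris[, -5])
hc <- hclust(d, method = "average")   # "aver" in the question partially matches "average"
plot(hc, labels = FALSE)

# rect.hclust works on the hclust object directly and invisibly returns
# a list holding the observation indices of each boxed cluster
boxes <- rect.hclust(hc, k = 3, border = 1:3)
```

Like rect.dendrogram, this only works with base graphics, not with a ggplot2/ggdendrogram plot.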

How to rotate the plot in r base package graphics?

I know this may be a bit much, but I am plotting a dendrogram in R, and here is my code:
library(dendextend)
library(colorspace)  # for rainbow_hcl
dd <- dist(scale(full[,c(1,2,3,4)]),method="euclidean")
hc = hclust(dd,method="ward.D2")
dend <- color_branches(as.dendrogram(hc),6)
labels_colors(dend) <-
  rainbow_hcl(6)[sort_levels_values(
    as.numeric(classified[, 9])[order.dendrogram(dend)]
  )]
plot(dend,horiz=TRUE)
and I got this plot:
Is there any way to mirror it so that it looks like this (please ignore the difference in colour)?
plot_horiz.dendrogram(dend, side = TRUE)
should do the trick. See https://rdrr.io/cran/dendextend/f/vignettes/FAQ.Rmd

FactoMineR/factoextra visualize all the clusters in the dendrogram

I performed a hierarchical clustering on a data frame using the HCPC function of the FactoMineR package. The problem is that when I draw the dendrogram with factoextra, it does not show the number of clusters I asked for.
Here is a reproducible example of my problem:
library(FactoMineR)
library(factoextra)
model <- HCPC(iris[,1:4], nb.clust = 5)
There are indeed 5 clusters above.
fviz_dend(model, k = 5,
  cex = 0.7,
  palette = "default",
  rect = TRUE, rect_fill = TRUE
)
But only 3 are shown in the dendrogram.
I ran into the same problem: the fviz_dend function always returned what it considers to be the optimal number of clusters, even when I tried to override this in either the HCPC or the fviz_dend function.
One way to fix this while sticking to FactoMineR and factoextra is to change the default number of clusters calculated by the HCPC function:
model$call$t$nb.clust = 5
Then run the fviz_dend function again. This should return the result you were expecting.
You can just use the dendextend R package with the color_branches function:
library(dendextend)
dend <- USArrests %>% dist %>% hclust(method = "ave") %>% as.dendrogram
dd <- color_branches(dend,5)
plot(dd)

R getting subtrees from dendrogram based on cutree labels

I have clustered a large dataset and found 6 clusters I am interested in analyzing more in depth.
I found the clusters using hclust with "ward.D" method, and I would like to know whether there is a way to get "sub-trees" from hclust/dendrogram objects.
For example:
library(gplots)
library(dendextend)
data <- iris[,1:4]
distance <- dist(data, method = "euclidean", diag = FALSE, upper = FALSE)
hc <- hclust(distance, method = 'ward.D')
dnd <- as.dendrogram(hc)
plot(dnd) # to decide the number of clusters
clusters <- cutree(dnd, k = 6)
I used cutree to get the labels for each of the rows in my dataset.
I know I can get the data for each corresponding cluster (cluster 1 for example) with:
c1_data = data[clusters == 1,]
Is there any easy way to get the subtrees for each corresponding label as returned by dendextend::cutree? For example, say I am interested in getting the
I know I can access the branches of the dendrogram doing something like
subtree <- dnd[[1]][[2]]
but how can I get exactly the subtree corresponding to cluster 1?
I have tried
dnd[clusters == 1]
but this of course doesn't work. So how can I get the subtree based on the labels returned by cutree?
================= UPDATED answer
This can now be solved using the get_subdendrograms function from dendextend.
# needed packages:
# install.packages("gplots")
# install.packages("viridis")
# install.packages("devtools")
# devtools::install_github('talgalili/dendextend') # dendextend from github
library(dendextend)
# define dendrogram object to play with:
dend <- iris[,-5] %>% dist %>% hclust %>% as.dendrogram %>% set("labels_to_character") %>% color_branches(k=5)
dend_list <- get_subdendrograms(dend, 5)
# Plotting the result
par(mfrow = c(2,3))
plot(dend, main = "Original dendrogram")
sapply(dend_list, plot)
This can also be used within a heatmap:
# plot a heatmap of only one of the sub dendrograms
par(mfrow = c(1,1))
library(gplots)
sub_dend <- dend_list[[1]] # get the sub dendrogram
# check the size of the sub-dendrogram
nleaves(sub_dend)
length(order.dendrogram(sub_dend))
# get the subset of the data
subset_iris <- as.matrix(iris[order.dendrogram(sub_dend),-5])
# update the dendrogram's internal order so to not cause an error in heatmap.2
order.dendrogram(sub_dend) <- rank(order.dendrogram(sub_dend))
heatmap.2(subset_iris, Rowv = sub_dend, trace = "none", col = viridis::viridis(100))
================= OLDER answer
I think these two functions can help you.
The first one iterates through all clusters and extracts the corresponding substructure. It requires:
the dendrogram object from which we want to get the subdendrograms
the clusters labels (e.g. returned by cutree)
Returns a list of subdendrograms.
extractDendrograms <- function(dendr, clusters){
  lapply(unique(clusters), function(clust.id){
    getSubDendrogram(dendr, which(clusters==clust.id))
  })
}
The second one performs a depth-first search to determine which subtree contains the cluster, and returns that subtree once it matches the full cluster. Here we assume that all elements of a cluster are in one subtree. It requires:
the dendrogram object
positions of the elements in cluster
Returns the subdendrogram corresponding to the cluster of the given elements.
getSubDendrogram <- function(dendr, my.clust){
  if(all(unlist(dendr) %in% my.clust))
    return(dendr)
  if(any(unlist(dendr[[1]]) %in% my.clust))
    return(getSubDendrogram(dendr[[1]], my.clust))
  else
    return(getSubDendrogram(dendr[[2]], my.clust))
}
Using these two functions with the variables you provided in the question, we get the following output. (I think the line clusters <- cutree(dnd, k = 6) should be clusters <- cutree(hc, k = 6).)
my.sub.dendrograms <- extractDendrograms(dnd, clusters)
Plotting all six elements of the list shows all the subdendrograms.
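For comparison, base R can also split a dendrogram into k subtrees without extra packages: cut() has a method for dendrograms that returns the branches below a given height in its $lower component. The sketch below is an alternative to the functions above, not the author's method; it picks a height halfway between the (k-1)-th and k-th largest merge heights so that exactly k branches remain, and it assumes a monotone linkage such as ward.D and that those two heights are not tied.

```r
# Base R only: hclust, cutree, as.dendrogram and cut() for dendrograms are in stats.
hc   <- hclust(dist(iris[, 1:4]), method = "ward.D")
dend <- as.dendrogram(hc)
k    <- 6

# hc$height holds the n - 1 merge heights; cutting strictly between the
# (k-1)-th and k-th largest of them leaves exactly k branches below the cut
# (assuming those two heights are not tied).
hs <- sort(hc$height, decreasing = TRUE)
h  <- mean(hs[c(k - 1, k)])

# cut() returns list(upper = ..., lower = ...); lower holds the subtrees
sub_dends <- cut(dend, h = h)$lower
length(sub_dends)

# leaves keep their original row indices, so each subtree maps back to the data
sapply(sub_dends, function(s) length(order.dendrogram(s)))

par(mfrow = c(2, 3))
invisible(lapply(sub_dends, plot))
```

The row indices from order.dendrogram(sub_dends[[i]]) can be used directly to subset the data, e.g. iris[order.dendrogram(sub_dends[[1]]), ].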
EDIT
As suggested in the comments, here is a function that takes a dendrogram dend and the number of subtrees k as input; it still uses the recursive function getSubDendrogram defined above:
prune_cutree_to_dendlist <- function(dend, k) {
  # clusters must be ordered as the data, so that which() returns the
  # observation indices that are stored in the dendrogram's leaves
  clusters <- cutree(dend, k = k, order_clusters_as_data = TRUE)
  lapply(unique(clusters), function(clust.id){
    getSubDendrogram(dend, which(clusters==clust.id))
  })
}
A test case for 5 substructures:
library(dendextend)
dend <- iris[,-5] %>% dist %>% hclust %>% as.dendrogram %>% set("labels_to_character") %>% color_branches(k=5)
subdend.list <- prune_cutree_to_dendlist(dend, 5)
#plotting
par(mfrow = c(2,3))
plot(dend, main = "original dend")
sapply(subdend.list, plot)
I ran a benchmark with rbenchmark, comparing it to the function suggested by Tal Galili (here named prune_cutree_to_dendlist2), and the results strongly favor the DFS approach above:
library(rbenchmark)
benchmark(prune_cutree_to_dendlist(dend, 5),
prune_cutree_to_dendlist2(dend, 5), replications=5)
                                test replications elapsed relative user.self
1  prune_cutree_to_dendlist(dend, 5)            5    0.02        1     0.020
2 prune_cutree_to_dendlist2(dend, 5)            5   60.82     3041    60.643
I have now written a function, prune_cutree_to_dendlist, that does what you asked for. I should add it to dendextend at some point.
In the meantime, here is an example of the code and its output. (The function is a bit slow; making it faster depends on making prune faster, which I won't get to in the near future.)
# install.packages("dendextend")
library(dendextend)
dend <- iris[,-5] %>% dist %>% hclust %>% as.dendrogram %>%
set("labels_to_character")
dend <- dend %>% color_branches(k=5)
# plot(dend)
prune_cutree_to_dendlist <- function(dend, k) {
  clusters <- cutree(dend,k, order_clusters_as_data = FALSE)
  # unique_clusters <- unique(clusters) # could also be 1:k but it would be less robust
  # k <- length(unique_clusters)
  # for(i in unique_clusters) {
  dends <- vector("list", k)
  for(i in 1:k) {
    leaves_to_prune <- labels(dend)[clusters != i]
    dends[[i]] <- prune(dend, leaves_to_prune)
  }
  class(dends) <- "dendlist"
  dends
}
prunned_dends <- prune_cutree_to_dendlist(dend, 5)
sapply(prunned_dends, nleaves)
par(mfrow = c(2,3))
plot(dend, main = "original dend")
sapply(prunned_dends, plot)
How did you get 6 clusters using hclust? You can cut the tree at any point, so you can just ask cutree to give you more clusters:
clusters = cutree(hclusters, number_of_clusters)
If you have a lot of data, though, this may not be very handy. In that case, I manually pick the clusters that I want to study further and then run hclust only on the data in those clusters. I don't know of any functionality in hclust that does this automatically, but it's quite easy:
good_clusters = c(which(clusters==1),
                  which(clusters==2)) # or whichever clusters you want
new_df = df[good_clusters,]
new_hclusters = hclust(dist(new_df))  # hclust needs a dist object, not a data frame
new_clusters = cutree(new_hclusters, new_number_of_clusters)

Labelling circular dendextend dendrogram

I'm trying to plot a circular dendrogram of compositional data. Using the following code:
library(dendextend)
library(circlize)
library(compositions)
data("Hydrochem")
hydro<-Hydrochem
d <- dist(hydro[7:19], method="euclidean")
hc <- hclust(d, method = "average")
dend <- as.dendrogram(hc)
hydro$River <- as.character(hydro$River)
labels(dend) <- hydro$River[order.dendrogram(dend)]
plot(dend)
This gives me a normal dendrogram with the labels in the correct order.
But when I run circlize_dendrogram(dend), I get this:
What's vexing me is the dendrogram in the middle: when I don't use the dendrogram's order for the labels (i.e. just typing labels(dend) <- hydro$River), the inner dendrogram is fine and everything looks great.
I've tried altering the labels_track_height and dend_track_height settings to no avail, and when I run the same process on smaller toy datasets this issue doesn't arise.
Any ideas?
So you actually have two problems surfacing in your code:
1. The labels are not unique.
2. The plot does not leave enough room for the labels after you've updated them in the dendrogram object.
The first problem can be solved by adding numbers to the non-unique labels you supply, thus making them unique. The second problem can be solved by adjusting the labels_track_height argument of the circlize_dendrogram function. Here is the updated code (note the paste0 line and the last line, where the differences are):
library(dendextend)
library(circlize)
library(compositions)
data("Hydrochem")
hydro<-Hydrochem
d <- dist(hydro[7:19], method="euclidean")
hc <- hclust(d, method = "average")
dend <- as.dendrogram(hc)
tmp <- as.character(hydro$River)[order.dendrogram(dend)]
labels(dend) <- paste0(seq_along(tmp), "_", tmp)
plot(dend)
circlize_dendrogram(dend, labels_track_height = 0.4)
The output you get is this:
(This is now done automatically in dendextend 1.6.0, currently available on GitHub, and later also on CRAN.)
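As another way to make non-unique labels unique, base R's make.unique appends .1, .2, ... to repeated values. A minimal sketch using stats::dendrapply, with USArrests and a hypothetical two-group label vector standing in for the Hydrochem River column (both are assumptions, so that the example runs without extra packages):

```r
# Base R only: hclust, as.dendrogram, dendrapply and labels() for dendrograms are in stats.
hc   <- hclust(dist(USArrests[1:10, ]), method = "average")
dend <- as.dendrogram(hc)

# Hypothetical non-unique labels, one per row of the data (data order)
group      <- rep(c("North", "South"), each = 5)
new_labels <- make.unique(group)   # "North", "North.1", ..., "South.4"

# Each leaf stores its original row index as its value, so the new label
# can be looked up directly without relying on the traversal order
dend <- dendrapply(dend, function(node) {
  if (is.leaf(node)) attr(node, "label") <- new_labels[as.vector(node)]
  node
})

labels(dend)
```

The resulting dend can then be passed to circlize_dendrogram as in the code above.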
So, the solution to this problem (if anyone can provide more details, please do, because I don't really understand why this matters at all) is to add a second dend <- as.dendrogram(hc) call after defining the labels. The code then looks like this:
d <- dist(hydro[7:19], method="euclidean")
hc <- hclust(d, method = "average")
dend <- as.dendrogram(hc)
hydro$River <- as.character(hydro$River)
labels(dend) <- hydro$River[order.dendrogram(dend)]
dend <- as.dendrogram(hc)
circlize_dendrogram(dend)
NOTE by another user: this does not solve the question.
