I am trying to create a dendrogram of only the communities of a network. The example code below gives me a dendrogram of all the nodes, but since I work with a relatively large dataset, I would like a smaller dendrogram that shows only the communities. Is this possible?
library(igraph)
set.seed(1)
g001 <- erdos.renyi.game(100, 1/10, directed = FALSE)
fc01 <- fastgreedy.community(g001)
colors <- rainbow(max(membership(fc01)))
plot(g001, vertex.size=2, vertex.label=NA, vertex.color=colors[membership(fc01)] )
dendPlot(fc01, mode="phylo", cex=1)
Thank you.
The dendrogram class has a cut function you can use to split up a dendrogram at a certain height. The community algorithm seems to use heights based on how many objects there are. Therefore, given your fc01 object above, you can split it into subgroups with
ss01 <- cut(as.dendrogram(fc01), h=length(membership(fc01))-length(fc01))$lower
That created 5 groups for me. We can plot the entire set and the 5 subsets with
layout(matrix(1:6, nrow=2))
dendPlot(fc01, mode="hclust")
lapply(ss01, plot, cex=1)
So each subtree is in ss01[[1]], ss01[[2]], etc.
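If the goal is a single small tree with one leaf per community, the upper part of the same cut may be what you want. This is a minimal sketch, assuming the merge heights run from 1 to n-1 as described above. sample_gnp and cluster_fast_greedy are the current igraph names for erdos.renyi.game and fastgreedy.community, and top01 is a name introduced here for illustration:

```r
library(igraph)

set.seed(1)
g001 <- sample_gnp(100, 1/10, directed = FALSE)
fc01 <- cluster_fast_greedy(g001)

# Same cut height as in the answer: number of vertices minus number of communities
h_cut <- length(membership(fc01)) - length(fc01)

# $upper keeps the top of the tree; every subtree below the cut becomes one leaf
top01 <- cut(as.dendrogram(fc01), h = h_cut)$upper
plot(top01)
```

The leaves of top01 are labelled "Branch 1", "Branch 2", and so on, and they correspond in order to the subtrees stored in $lower.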
I am interested in visualizing the results of a hierarchical cluster analysis. Is it possible to use a dendrogram to display the names or labels of clusters (and subclusters) without displaying the original cases that went into the cluster analysis?
For example, this code applies a hierarchical cluster analysis to the mtcars dataset.
library(factoextra) # provides get_dist()
data("mtcars")
clust <- hclust(get_dist(mtcars, method = "pearson"), method = "complete")
plot(clust)
Let's say I cut the tree at 4 clusters and rename the clusters "sedan", "truck", "sportscar", and "van" (totally arbitrary labels).
clust1 <- cutree(clust,4)
clust1 <- dplyr::recode(clust1,
'1'='sedan',
'2'='truck',
'3'='sportscar',
'4'='van')
Is it possible to display a dendrogram that shows these four labels as the nodes at the bottom of the tree, suppressing the original car names?
I am also interested in displaying subclusters within clusters in a similar way, but that may be outside the scope of this question. Bonus points if you can also give a suggestion for how to display subclusters within clusters in a dendrogram while suppressing the names of the original cases! :)
Thank you in advance!
Yes, you can do this. I do not understand your get_dist so I will illustrate using the ordinary distance dist.
data("mtcars")
clust <- hclust(dist(mtcars), method = "complete")
To cut off and display just the top of the tree, convert it to a dendrogram and use the upper part of the cut. But you need to know what height to cut it at. That is stored in the structure clust.
tail(clust$height)
[1] 113.3023 134.8119 141.7044 214.9367 261.8499 425.3447
Since you want four branches, you can cut at any height between the third and fourth heights (from the end). I will use 213.
MTC_Dend = as.dendrogram(clust)
TreeTop = cut(MTC_Dend, h = 213)$upper
You can get the basic plot now with plot(TreeTop), but it won't have the labels that you want. To change the labels, use the package dendextend which offers a tool specifically to change the labels.
library("dendextend")
labels(TreeTop) = c('sedan','truck', 'sportscar', 'van')
plot(TreeTop)
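One caveat: the branches of the cut tree are ordered left to right, while cutree numbers clusters by first appearance in the data, so the two orders need not agree. A minimal sketch of looking up which cutree id each branch holds, using base R only; cluster_names, branch_ids and branch_labels are names introduced here for illustration:

```r
clust <- hclust(dist(mtcars), method = "complete")
groups <- cutree(clust, k = 4)                            # cluster id per car
cluster_names <- c("sedan", "truck", "sportscar", "van")  # arbitrary names for ids 1..4

parts <- cut(as.dendrogram(clust), h = 213)

# For each subtree below the cut, take the cluster id of its first car.
# parts$lower[[i]] is the subtree behind the leaf "Branch i" of parts$upper.
branch_ids <- vapply(parts$lower, function(b) groups[[labels(b)[1]]], integer(1))
branch_labels <- cluster_names[branch_ids]

# With dendextend loaded, these could replace the positional labels:
# labels(parts$upper) <- branch_labels
```

For subclusters, the same lookup should work after cutting at a lower height and calling cutree with the matching k.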
I want to color the branches of a dendrogram based on the value in a column of a dataframe used in the hclust function.
Before you mark this question as a duplicate, as was done in this question (which links to this question), note that this was never fully addressed in the answer. It is easy to color branches based on the topology of the dendrogram, but I cannot figure out how to color branches based on a column in the dataframe that was used in the hclust function.
I've tried using the dendextend package in two very similar ways:
library(dendextend)
par(mar = c(2,1,0,8)) #make sure the whole plot is on the page
hc <- hclust(dist(mtcars)) #cluster dataframe based on distance
dend <- as.dendrogram(hc) #convert the hclust result to a dendrogram
dend2 <- color_branches(dend, col = mtcars$cyl) #attempt but fail at coloring branches
plot (dend2, horiz = TRUE) #plot dendrogram
and
dend3 <- assign_values_to_leaves_edgePar(dend, value = mtcars$cyl, edgePar = "col") #attempt but fail at coloring branches
plot (dend3, horiz = TRUE) #plot dendrogram
Replacing mtcars$cyl with factor(mtcars$cyl) doesn't solve the problem either.
Both of these solutions produce a dendrogram that is not properly colored.
It appears that it is ordering the colors from the bottom to the top of the dendrogram based on the order of the values in the cyl column, but since the branches are no longer in that order, the coloring doesn't make any sense. I would prefer not to sort the dataframe as a way around this problem.
Thanks.
You need to put the colors in the order of the leaves of the dendrogram. You can use labels() to extract the names used on the leaves:
dend2 <- color_branches(dend, col=mtcars[labels(dend),"cyl"])
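To see why the reordering matters, here is a small base R sketch that checks the leaf order and builds one color per leaf. The palette and the names leaf_order, cyl_by_leaf, cyl_palette and leaf_cols are assumptions made for illustration:

```r
hc <- hclust(dist(mtcars))
dend <- as.dendrogram(hc)

# Leaves are listed left to right, which is not the row order of mtcars
leaf_order <- labels(dend)
identical(leaf_order, rownames(mtcars)[hc$order])

# Look up cyl in leaf order, then map each value to a named color
cyl_by_leaf <- mtcars[leaf_order, "cyl"]
cyl_palette <- c("4" = "steelblue", "6" = "darkorange", "8" = "firebrick")
leaf_cols <- unname(cyl_palette[as.character(cyl_by_leaf)])

# With dendextend loaded, this vector can be passed on:
# dend2 <- color_branches(dend, col = leaf_cols)
```

Using explicit color names avoids relying on the raw cyl values (4, 6, 8) being interpreted as palette indices.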
I have an hclust tree with nearly 2000 samples. I have cut it to an appropriate number of clusters and would like to plot the dendrogram but ending at the height that I cut the clusters rather than all the way to every individual leaf. Every plotting guide is about coloring all the leaves by cluster or drawing a box, but nothing seems to just leave the leaves below the cut line out completely.
My full dendrogram looks like the following:
I would like to plot it as if it stops where I've drawn the abline here (for example):
This should get you started. I suggest reading the help page for "dendrogram" (?dendrogram).
Here is the example from the help page:
hc <- hclust(dist(USArrests))
dend1 <- as.dendrogram(hc)
plot(dend1)
dend2 <- cut(dend1, h = 100)
plot(dend2$upper)
plot(dend2$upper, nodePar = list(pch = c(1,7), col = 2:1))
By performing the cut on the dendrogram object (not the hclust object), you can then plot the upper part of the dendrogram. It will take some work to replace the Branch 1, 2, 3, and 4 labels, depending on your analysis.
Good luck.
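If you know the number of clusters rather than the height, the cut height can be derived from the merge heights. A minimal sketch, assuming there are no ties among the relevant heights and that k is at least 2; k, h_cut and branch_sizes are names introduced here:

```r
hc <- hclust(dist(USArrests))
k <- 4  # desired number of branches in the truncated tree

# Cutting anywhere between the (k-1)th and kth largest merge heights
# leaves exactly k branches; take the midpoint
hs <- sort(hc$height, decreasing = TRUE)
h_cut <- mean(hs[c(k - 1, k)])

parts <- cut(as.dendrogram(hc), h = h_cut)
plot(parts$upper)

# Number of original observations hidden behind each branch
branch_sizes <- vapply(parts$lower, function(b) length(labels(b)), integer(1))
```

The sizes in branch_sizes could be pasted into the branch labels so the truncated plot still shows how many samples each branch contains.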
I would like to compare two plots of graphs (an observed graph and a simulated one) that have the exact same nodes.
I would like to keep the node positions fixed so I can compare the differences in the distribution of edges.
I have tried set.seed, but it just keeps each plot identical every time I run it.
Is there a way to take the layout of one graph and use it for the other?
Thanks,
Fwiw, I guess you can use the layout argument of plot:
library(igraph)
set.seed(1)
g1 <- ba.game(20, directed = FALSE)
g2 <- ba.game(20, directed = FALSE)
par(mfrow = c(1, 2))
coords <- layout.fruchterman.reingold(g1)
plot(g1, layout = coords)
plot(g2, layout = coords)
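Reusing coords directly assumes both graphs store their vertices in the same order. If the vertices are named and the order may differ, the layout rows can be matched by name. A minimal sketch with made-up graphs; g_obs, g_sim and coords_sim are hypothetical names and the simulated edges are arbitrary:

```r
library(igraph)

set.seed(1)
g_obs <- sample_gnp(10, 0.3)
V(g_obs)$name <- letters[1:10]

# Hypothetical simulated graph: same vertex names, stored in reverse order
g_sim <- graph_from_data_frame(
  data.frame(from = c("j", "i", "h"), to = c("a", "b", "c")),
  directed = FALSE,
  vertices = data.frame(name = rev(letters[1:10]))
)

# Compute the layout once, label its rows, then reorder the rows for g_sim
coords <- layout_with_fr(g_obs)
rownames(coords) <- V(g_obs)$name
coords_sim <- coords[V(g_sim)$name, ]

par(mfrow = c(1, 2))
plot(g_obs, layout = coords)
plot(g_sim, layout = coords_sim)
```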
I am trying to display a hierarchical cluster as a venn diagram or any other useful display BESIDES a dendrogram. I want to be able to display my data in many different view types.
Currently doing this will plot a dendrogram:
x <- hclust(dist(mtcars))
plot(x)
What can I do to display a cluster diagram that LOOKS like this:
https://www.projectrhea.org/rhea/images/3/3b/Lecture23VennClusters_OldKiwi.jpg
or this
http://bl.ocks.org/mbostock/7607535
or anything else that makes sense for displaying cluster data in this example.
Preferably I want to be able to do this in Shiny, but a simple R example will suffice. Thank you in advance.
The plots you showed are cluster plots. There are different ways to make these plots; here's one approach. You can vary the symbols or turn them off, and likewise for the fill, as desired. There are also other options for dendrogram plotting.
library(cluster)
head(mtcars)
fit <- kmeans(mtcars, 3) # 3 clusters
aggregate(mtcars, by=list(fit$cluster), mean)
newmtcars <- data.frame(mtcars, fit$cluster)
head(newmtcars)
# plot cluster solution
library(cluster)
clusplot(mtcars, fit$cluster,
color=TRUE, shade=TRUE, lines=0)
refs: http://www.statmethods.net/advstats/cluster.html
https://stats.stackexchange.com/questions/31083/how-to-produce-a-pretty-plot-of-the-results-of-k-means-cluster-analysis
I'm not sure how a Venn diagram would differ from the above plot. Maybe the groups need to overlap. That depends on the data and the clustering command. You could try varying the clustering command; in this case, kmeans shows a small overlap when a low number of iterations is used.
fit <- kmeans(mtcars, 3, iter.max = 2) # 3 clusters, low number of iterations
clusplot(mtcars, fit$cluster,
color=TRUE, shade=FALSE, lines=0)
One approach to do this with hierarchical clustering is to extract the groups from the tree, and then use clusplot on the resulting groups.
fit <- hclust(dist(mtcars))
groups <- cutree(fit, k=3)
clusplot(mtcars, groups[rownames(mtcars)],
color=TRUE, shade=FALSE, lines=0)
To see how the data segments with more cuts in a tree, including a hierarchical tree, one approach is to use cutree followed by clusplot:
heir_tree_fit <- hclust(dist(mtcars))
for (ncut in seq(1,10)) {
group <- cutree(heir_tree_fit, k=ncut)
clusplot(mtcars, group[rownames(mtcars)],
color=TRUE, shade=FALSE, lines=0, main=paste(ncut,"cuts"))
}
Here are the figures for 2, 6, and 10 cuts
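As a side note, cutree also accepts a vector for k, so all the memberships used in the loop can be computed in one call. A short sketch; memberships and cluster_sizes are names introduced here:

```r
fit <- hclust(dist(mtcars))

# One column per requested k: a 32 x 10 matrix of cluster ids
memberships <- cutree(fit, k = 1:10)

# Cluster sizes for each number of cuts
cluster_sizes <- lapply(1:10, function(k) table(memberships[, k]))
cluster_sizes[[3]]
```

Each column can then be passed to clusplot in place of the per-iteration cutree call.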
You can make one plot with all the cuts
par(new=FALSE)
for (ncut in seq(1,10)) {
group <- cutree(heir_tree_fit, k=ncut)
clusplot(mtcars, group[rownames(mtcars)],
color=TRUE, shade=FALSE, lines=0, xlim=c(-5,5),ylim=c(-5,5))
par(new=TRUE)
}
par(new=FALSE)
Another approach to making a Venn diagram of hierarchical clustering is to extract the groups from the tree, and then use vennDiagram on the resulting groups.
# To make a Venn diagram (limma is a Bioconductor package)
# One-time install: install.packages("BiocManager"); BiocManager::install("limma")
library(limma)
inGrp1 <- groups==1
inGrp2 <- groups==2
inGrp3 <- groups==3
vennData <- cbind(inGrp1, inGrp2, inGrp3)
aVenn <- vennCounts(vennData)
vennDiagram(aVenn)
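Keep in mind that cutree assigns every row to exactly one group, so the circles in this Venn diagram will have empty intersections. A quick base R check of that, which builds the same indicator matrix without limma (group_counts is a name introduced here):

```r
fit <- hclust(dist(mtcars))
groups <- cutree(fit, k = 3)

# Logical membership matrix: one column per group
vennData <- sapply(1:3, function(g) groups == g)
colnames(vennData) <- paste0("inGrp", 1:3)

# Every car belongs to exactly one group, so no overlaps are possible
all(rowSums(vennData) == 1)

# Group sizes, i.e. the counts that end up inside each circle
group_counts <- colSums(vennData)
```

Overlapping regions would only appear with a method that allows shared membership, for example by comparing the groups from two different clusterings.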