I am trying to create a circular phylogenetic tree. I have this part of the code:
library(ape)  # provides as.phylo() and the 'fan' plot type
fit <- hclust(dist(Data[,-4]), method = "complete", members = NULL)
nclus= 3
color=c('red','blue','green')
color_list=rep(color,nclus/length(color))
clus=cutree(fit,nclus)
plot(as.phylo(fit),type='fan',tip.color=color_list[clus],label.offset=0.2,no.margin=TRUE, cex=0.70, show.node.label = TRUE)
And this is the result:
I am also trying to show a label for each node and to color the branches. Any suggestions on how to do that?
Thanks!
When you say "color branches" I assume you mean color the edges. This seems to work, but I have to think there's a better way.
Using the built-in mtcars dataset here, since you did not provide your data.
library(ape)  # as.phylo() and plot.phylo()
plot.fan <- function(hc, nclus=3) {
palette <- c('red','blue','green','orange','black')[1:nclus]
clus <-cutree(hc,nclus)
X <- as.phylo(hc)
edge.clus <- sapply(1:nclus,function(i)max(which(X$edge[,2] %in% which(clus==i))))
order <- order(edge.clus)
edge.clus <- c(min(edge.clus),diff(sort(edge.clus)))
edge.clus <- rep(order,edge.clus)
plot(X,type='fan',
tip.color=palette[clus],edge.color=palette[edge.clus],
label.offset=0.2,no.margin=TRUE, cex=0.70)
}
fit <- hclust(dist(mtcars[,c("mpg","hp","wt","disp")]))
plot.fan(fit,3); plot.fan(fit,5)
Regarding "label the nodes": if you mean label the tips, it looks like you've already done that. If you want different labels, unfortunately, unlike plot.hclust(...), the labels=... argument is rejected. You could experiment with the tiplabels(....) function, but it does not seem to work very well with type="fan". The labels come from the row names of Data, so your best bet IMO is to change the row names prior to clustering.
If you actually mean label the nodes (the connection points between the edges), have a look at nodelabels(...). I don't provide a working example because I can't imagine what labels you would put there.
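For anyone who wants a starting point with nodelabels(...), here is a minimal sketch. It assumes the same mtcars clustering as above and uses the internal node numbers as placeholder labels, since the labels you actually want are unknown. The variable names n_tip and nodes are mine, not part of ape.

```r
library(ape)

hc <- hclust(dist(mtcars[, c("mpg", "hp", "wt", "disp")]))
X  <- as.phylo(hc)

# In a "phylo" object the internal nodes are numbered after the tips:
# Ntip + 1, ..., Ntip + Nnode.
n_tip <- length(X$tip.label)
nodes <- n_tip + seq_len(X$Nnode)

plot(X, type = "fan", label.offset = 0.2, no.margin = TRUE, cex = 0.7)
# Placeholder labels: each internal node is labelled with its own number.
nodelabels(text = nodes, node = nodes, frame = "none", cex = 0.5, col = "blue")
```

Replace the text= vector with whatever you want to display, keeping it in the same order as nodes.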
This function is part of the FactoMineR package:
https://github.com/cran/FactoMineR/blob/master/R/plot.HCPC.R
As an example, this is the code I ran:
library(FactoMineR)
res.pca <- PCA(iris[, -5], scale.unit = TRUE)
hc <- HCPC(res.pca, nb.clust = -1)
plot.HCPC(hc, choice="3D.map", angle=60)
hc$call$X$clust <- factor(hc$call$X$clust, levels = unique(hc$call$X$clust))
plot(hc, choice="map")
The difference is this: when I run hc$call$X$clust <- factor(hc$call$X$clust, levels = unique(hc$call$X$clust)) before plot.HCPC, the annotation in the figure does not change, but when I run the same line before plot(hc, choice="map"), it is reflected in the final output.
Looking at the plot.HCPC function, these are the lines that embed the cluster info into the figure:
for(i in 1:nb.clust) leg=c(leg, paste("cluster",levs[i]," ", sep=" "))
legend("topleft", leg, text.col=as.numeric(levels(X$clust)),cex=0.8)
My question: I have worked with small functions, where I understand which part goes where and what it does when I edit it. This function is complicated, at least to me, so I'm not sure how to modify that part to get what I would like to see.
In my 3D dendrogram, I would like each cluster to be labelled with its group, the way ComplexHeatmap lets us annotate rows or columns with a color code, so that the clusters can still be identified regardless of their order in the data frame. (It's just a visual thing, I know, but I would like to learn how to modify these.)
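As a small illustration of what the quoted legend line depends on: it takes its text colors from as.numeric(levels(X$clust)), so the order of the factor levels decides which color index goes with which legend entry. The sketch below uses a made-up clust vector, not the real HCPC output, and only shows the effect of the releveling step.

```r
# Hypothetical cluster assignments standing in for hc$call$X$clust.
clust <- factor(c(3, 1, 2, 1, 3))

# Default levels are sorted, so the color indices come out as 1, 2, 3.
as.numeric(levels(clust))
# 1 2 3

# Releveling by order of first appearance changes the color indices.
clust2 <- factor(clust, levels = unique(clust))
as.numeric(levels(clust2))
# 3 1 2
```

So any change to the legend colors would have to go through the level order of X$clust as seen inside the function, or through a modified copy of that legend(...) call.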
I have a dataset called data. The data is not that important, but every interaction has a name. I want to create a graph in igraph with the following code:
library(dplyr)   # count()
library(igraph)  # graph_from_incidence_matrix(), layout.bipartite
tab <- count(data, B, S, K)
factors <- table(interaction(tab$B, tab$K),interaction(tab$S,tab$K))
graph1 <- graph_from_incidence_matrix(factors)
plot(graph1, vertex.size = 40, layout = layout.bipartite)
However, I get the following:
All the names of the interactions are completely mixed together. I can make it a little more readable by lowering vertex.size, but I want to find a proper solution to my problem.
I want to create more space between the vertices, but I cannot seem to find the right way.
I have tried laying out the graph manually with tkplot, but it is annoying to have to arrange the vertices by hand each time.
Best regards
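One possible direction, as a rough sketch rather than a definitive fix: turn the bipartite layout on its side so the two groups form columns and the long labels stack vertically instead of running into each other. The matrix inc below is a hypothetical stand-in for factors, and recent igraph releases may print a deprecation notice for graph_from_incidence_matrix.

```r
library(igraph)

# Hypothetical incidence matrix with long names, standing in for `factors`.
inc <- matrix(c(1, 0, 1,
                0, 1, 1,
                1, 1, 0,
                0, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("first_group_interaction_", 1:4),
                              paste0("second_group_interaction_", 1:3)))
graph1 <- graph_from_incidence_matrix(inc)

# layout_as_bipartite() places the two vertex types in two rows.
# Swapping the coordinate columns turns those rows into columns.
lay <- layout_as_bipartite(graph1)[, 2:1]

# asp = 0 lets the plot fill the device instead of forcing a square.
plot(graph1, layout = lay, asp = 0,
     vertex.size = 6, vertex.label.cex = 0.7, vertex.label.dist = 1)
```

Combined with a taller graphics device, this gives each label its own line; the vertex and label sizes are guesses to tune for your data.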
I asked a number of different experts to sort 92 objects based on their similarity. Based on their answers, I constructed a 92 x 92 dissimilarity matrix. In R, I examined this matrix using the following commands:
cluster1 <- hclust(as.dist(DISS_MATRIX), method = "average")
plot(cluster1, cex=.55)
To highlight the clusters, I wanted to draw rectangles around them:
rect.hclust(cluster1, k = 3, border = "red")
The result is as follows:
However, when the objects have longer names ("AAAAAAAAAAAAAAAA43" instead of "A43"), the formatting is off:
rownames(DISS_MATRIX) <- paste0(rep("AAAAAAAAAAAAAAAAAAAAAAAAAAAA",92),1:92)
colnames(DISS_MATRIX) <- paste0(rep("AAAAAAAAAAAAAAAAAAAAAAAAAAAA",92),1:92)
cluster1 <- hclust(as.dist(DISS_MATRIX), method = "average")
plot(cluster1, cex=.55)
rect.hclust(cluster1, k = 3, border = "red")
This can be seen in the resulting dendrogram.
The rectangles seem to have moved up to the end of the dendrogram. Not nice. I assume this glitch is due to the long names of the 92 objects in the dissimilarity matrix. It may not seem very relevant: just make sure your objects have names that are short enough.
However, for various reasons I want my objects to keep their original (i.e. admittedly long) names. This graph is for a presentation, so I do not want to work with codes. I also do not want to use any other package, since I generally find hclust quite easy to use. However, I cannot find any way to position the rectangles within the rect.hclust command. Hence, what can I do to position the rectangles in the dendrogram even if the object names are long? Thanks.
You wrote that "I also do not want to use any other package since I generally find hclust quite easy to use."
While hclust is great for creating the hierarchical clustering object, it does not offer much in terms of plotting. Once you have the hclust output, it is better to convert it to a dendrogram (using as.dendrogram) for visualization, since that class is better suited for it. There is no way to do what you want without fairly sophisticated code, and since that code is already bundled in a package, using the package is the best route (IMHO) for you to move forward. (I know because I wrote rect.dendrogram, and it took a lot of work to get it to behave the way you want.)
The dendextend R package offers many functions for manipulating and visualizing dendrograms (see the vignette here).
Specifically, the rect.dendrogram function can handle cases like the one you asked about (long labels). For example (I've added color_branches and color_labels for the fun of it):
library(dendextend)
hc <- mtcars[, c("mpg", "disp")] %>% dist %>% hclust(method = "average")
dend <- hc %>% as.dendrogram %>% hang.dendrogram
# let's make the text longer
labels(dend)[1] <- "AAAAAAAAAAAAAAAAAAAAA"
par(mar = c(15,2,1,1))
dend %>% color_branches(k=3) %>% color_labels(k=3) %>% plot
dend %>% rect.dendrogram(k=3)
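Closer to the setup in the question, here is a pipe-free sketch that gives every object a long name before plotting. It assumes mtcars in place of the 92 x 92 dissimilarity matrix, and the label prefix and margin size are arbitrary choices.

```r
library(dendextend)

# mtcars stands in for the original dissimilarity matrix.
hc <- hclust(dist(mtcars[, c("mpg", "disp")]), method = "average")

# Give every object a long label, as in the question.
hc$labels <- paste0("AAAAAAAAAAAAAAAA", seq_along(hc$labels))
dend <- as.dendrogram(hc)

# Leave enough room below the plot for the long labels.
par(mar = c(12, 4, 1, 1))
plot(dend)
rect.dendrogram(dend, k = 3, border = "red")
```

If the labels are still cut off, increase the first value of mar.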
I'm looking to write some simple code that will select for certain clusters below a threshold height and highlight them (either with a box or by colour).
So far I have used cutree, which selects the clusters I am after, but it also selects all the clusters of size 1.
I've managed to use which to select the clusters I actually want, but this is only a very small section of my data, and I don't want to go through it manually to choose them. Is there a way to cut the tree but only select clusters bigger than one?
This is the code I'm using at the moment:
plot(hClust,hang = -1,cex=0.5)
abline(h= 0.0018,col = 'blue')
ct <- cutree(hClust, h = 0.0018)
clust <- rect.hclust(hClust, h=0.0018, which = c(1,2,4,8,23))
You do not provide your data so I will illustrate with the built-in mtcars data. Of course, the heights are different than yours. Same set-up as your problem:
hClust =hclust(dist(mtcars))
plot(hClust,hang = -1, cex=0.8)
abline(h= 28,col = 'blue')
Now we can call rect.hclust without drawing anything visible (border=0) to get the clusters numbered as rect.hclust sees them. Then we can select the clusters with more than one point and put the boxes around those.
clust <- rect.hclust(hClust, h=28, border=0)
NumMemb = sapply(clust, length)
clust <- rect.hclust(hClust, h=28, which=which(NumMemb>1))
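If you also need the membership of those clusters as data rather than just the boxes, a sketch along the same lines with cutree follows. Note that cutree numbers clusters by order of first appearance in the data, which is generally not the left-to-right order rect.hclust uses, so these ids should not be passed to which=. The names sizes, keep and members are mine.

```r
hClust <- hclust(dist(mtcars))

# Cluster id for every observation at the same cut height as above.
ct <- cutree(hClust, h = 28)

# Cluster sizes, then the ids of clusters with more than one member.
sizes <- table(ct)
keep  <- as.integer(names(sizes)[sizes > 1])

# Member names of the non-singleton clusters, as a named list.
members <- split(names(ct), ct)[as.character(keep)]
```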
I have a tree with a lot of branches. Here is my code to plot the tree. The problem is that the labels overlap each other, especially towards the bottom of the tree. Is there any way to plot the tree so that the labels don't overlap?
par(mfrow=c(1,1))
plot(prunedTree, type=c("uniform"))
text(prunedTree)
Note: I used type=c("uniform") because it improved the readability of the lower branches. Also, prunedTree is of class "tree" from the tree package.
Here's a sample of what is being produced currently.
EDIT: Code to fully reproduce the issue.
library(tree)  # provides tree() and its plot/text methods
load(url("https://spark-public.s3.amazonaws.com/dataanalysis/samsungData.rda"))
samsungData$subject <- factor(samsungData$subject)
samsungData$activity <- factor(samsungData$activity)
samsungData <- samsungData[, !c(duplicated(names(samsungData)))]
names(samsungData) <- gsub("[.]", "", names(samsungData))
samsungData <- data.frame(samsungData)
trainDF <- samsungData[samsungData$subject %in% c(1,3,5,6),]
tree1 <- tree(activity ~ ., data=trainDF)
plot(tree1)
text(tree1)
You have several general options:
Use a wider graphics device (e.g. png(..., width = 1200, height = ...))
Shrink the text using cex = 0.5 (or smaller)
Use more concise column (i.e. variable) names
Some combination of the previous three.
I thought I could get text.tree to use fewer significant digits in labeling the splits, but I can't seem to do that. rpart appears to use only 4 digits by default, so that would save you some space as well.
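A minimal sketch of the first two options together, assuming the built-in iris data in place of the Samsung data and arbitrary device dimensions. The output path out is a temporary file chosen for illustration.

```r
library(tree)

# iris stands in for the much larger Samsung training data.
fit <- tree(Species ~ ., data = iris)

# Draw on a wide PNG device so neighbouring labels have room.
out <- file.path(tempdir(), "tree_wide.png")
png(out, width = 1600, height = 900)
plot(fit, type = "uniform")
text(fit, cex = 0.6)  # smaller text reduces overlap further
dev.off()
```

For a tree with many more leaves, the width and cex values would need tuning by eye.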
In addition to joran's suggestions above, you can play with these parameters:
srt to rotate the text
col to give the text different colors
For example:
plot(tree1)
text(tree1,col=rainbow(5)[1:25],srt=85,cex=0.8)