How do I label the terminal nodes of a cut dendrogram? - r

I used the following code to cut the dendrogram at a particular height.The problem I'm having is that when I cut a dendrogram, I can't figure out how to add labels to the nodes.How can I cut a dendrogram with labels using R program?
library(Heatplus)
cc=as.dendrogram(hclust(as.dist(mat),method="single"))
cutplot.dendrogram(cc,h=20)

After a fair amount of digging into the help documentation for ?dendrogram, I stumbled on the dendrapply function that contains an example to do something very similar. Here is your solution, based on a modification of the example in ?dendrapply:
Create dendrogram and cut at height h=20:
dhc <- as.dendrogram(hc <- hclust(dist(USArrests), "ave"))
chc <- cut(dhc, h=20)$upper
Define a vector with the newLabels, and a function newLab that modifies an individual node label. Then pass this to dendrapply:
newLabels <- paste("Custom", 1:22, sep="_")
local({
newLab <<- function(n) {
if(is.leaf(n)) {
a <- attributes(n)
i <<- i+1
attr(n, "label") <- newLabels[i]
}
n
}
i <- 0
})
nhc <- dendrapply(chc, newLab)
labels(nhc)
[1] "Custom_1" "Custom_2" "Custom_3" "Custom_4" "Custom_5" "Custom_6"
[7] "Custom_7" "Custom_8" "Custom_9" "Custom_10" "Custom_11" "Custom_12"
[13] "Custom_13" "Custom_14" "Custom_15" "Custom_16" "Custom_17" "Custom_18"
[19] "Custom_19" "Custom_20" "Custom_21" "Custom_22"
plot(nhc)

Here is a modified solution for what Andrie wrote, but using a new package called "dendextend", built exactly for this sort of thing.
You can see many examples in the presentations and vignettes of the package, in the "usage" section in the following URL: https://github.com/talgalili/dendextend
Here is the solution for this question:
# define dendrogram object to play with:
dhc <- as.dendrogram(hc <- hclust(dist(USArrests), "ave"))
chc <- cut(dhc, h=20)$upper
# loading the package
require(dendextend)# let's add some color:
# change labels with a simple assignment:
labels(chc) <- paste("Custom", 1:22, sep="_")
plot(chc)
For installing the package (since I have yet to upload it to CRAN), use:
####################
## installing dendextend for the first time:
if (!require('installr')) install.packages('installr'); require('installr')
## install.Rtools() # run this if you are using Windows and don't have Rtools
require2(devtools)
install_github('dendextend', 'talgalili')
require2(Rcpp)
install_github('dendextendRcpp', 'talgalili')
Best,
Tal

cc$labels
This is a vector of all the elements in the dendogram.
cc$labels <- myVector
You can add in your own vector to change the labels

Related

Return graph from function without plotting it

I want to write function which returns a graph but it should not plot the graph. It should only plot the graph when I ask it to.
Here is a MWE.
graph_functions <- function(x) {
plot(1:length(x), x)
points(1:length(x), x^2)
t <- recordPlot()
return(t)
}
answer <- graph_functions(1:10)
library(cowplot)
plot_grid(answer, answer)
In the above code I do not want it to plot the graph when I first compute the answer by calling graph_functions(1:10). I only want it to plot the graph when I use plot_grid().
graph_functions<- function(x) {
plot(1:length(x),x)
points(1:length(x),x^2)
t<- recordPlot()
return(t)
}
answer <- c(1:10)
library(cowplot)
plot_grid(graph_functions(answer),graph_functions(answer))
You could put the function inside the plot_grid() function and just store the parameters in the answer variable.
You can open a null device and render to it. Note that if you're using cowplot with base-R graphics, you should upgrade to the development version, with devtools::install_github("wilkelab/cowplot"). It provides much improved handling of base-R graphics.
graph_functions <- function(x) {
cur_dev <- grDevices::dev.cur() # store current device
pdf(NULL, width = 6, height = 6) # open null device
grDevices::dev.control("enable") # turn on recording for the null device
null_dev <- grDevices::dev.cur() # store null device
# make sure we always clean up properly, even if something causes an error
on.exit({
grDevices::dev.off(null_dev)
if (cur_dev > 1) grDevices::dev.set(cur_dev) # only set cur device if not null device
})
# plot
plot(1:length(x), x)
points(1:length(x), x^2)
recordPlot()
}
answer1 <- graph_functions(1:10)
answer2 <- graph_functions(1:20)
cowplot::plot_grid(answer1, answer2)
Created on 2018-12-04 by the reprex package (v0.2.1)

Labelling circular dendextend dendrogram

I'm trying to plot a circular dendrogram of compositional data. Using the following code:
library(dendextend)
library(circlize)
library(compositions)
data("Hydrochem")
hydro<-Hydrochem
d <- dist(hydro[7:19], method="euclidean")
hc <- hclust(d, method = "average")
dend <- as.dendrogram(hc)
hydro$River <- as.character(hydro$River)
labels(dend) <- hydro$River[order.dendrogram(dend)]
plot(dend)
I can get a normal dendrogram of what I want with the correct label orders.
But when I run circlize_dendrogram(dend), I get this:
What's vexing me is the dendrogram in the middle - when I don't use the order of the dendrogram for the labels (i.e. just typing labels(dend) <- hydro$River), the inner dendrogram is fine and everything looks great.
I've tried altering the labels_track_height and dend_track_height settings to no avail, and when I run the same process on smaller toy datasets this issue doesn't arise.
Any ideas?
So you actually have two problems surfacing in your code:
1. The labels are not unique.
2. The plot does not give enough room for the labels, after you've updated them in the dendrogram object
The first problem can be solved by adding numbers to the non-unique labels you supply, thus making them unique. The solution for the second problem is to play with the labels_track_height argument in the circlize_dendrogram function. Here is the updated code (notice the last line, where the difference is):
library(dendextend)
library(circlize)
library(compositions)
data("Hydrochem")
hydro<-Hydrochem
d <- dist(hydro[7:19], method="euclidean")
hc <- hclust(d, method = "average")
dend <- as.dendrogram(hc)
tmp <- as.character(hydro$River)[order.dendrogram(dend)]
labels(dend) <- paste0(seq_along(tmp), "_", tmp)
plot(dend)
circlize_dendrogram(dend, labels_track_height = 0.4)
The output you get is this:
(This is now done automatically in dendextend 1.6.0, currently available on github - and later on also on CRAN)
So, the solution to this problem (if anyone can provide more details please do, because I don't really understand why this matters at all) is to add a second dend <- as.dendrogram(hc) call after defining the labels. So, the code looks like this:
d <- dist(hydro[7:19], method="euclidean")
hc <- hclust(d, method = "average")
dend <- as.dendrogram(hc)
hydro$River <- as.character(hydro$River)
labels(dend) <- hydro$River[order.dendrogram(dend)]
dend <- as.dendrogram(hc)
circlize_dendrogram(dend)
NOTE by another user: this does not solve the question.

Getting a dendrogram's branch lengths in a breadth-first-search order

Is there any R function to retrieve the branch lengths of a dendrogram:
set.seed(1)
mat <- matrix(rnorm(100*10),nrow=100,ncol=10)
dend <- as.dendrogram(hclust(dist(t(mat))))
in a breadth-first-search order?
For dend I'd like to get this result:
c(16.38688,15.41441,15.99504,14.68365,13.52949,14.39275,12.96921,13.91157,13.15395)
which is node depths (excluding leaves) ordered by bps.
Thanks
You can easily code one like this:
dendro_depth <- function(dendro){
if(!is.null(attributes(dendro)$leaf))
0
else
max(dendro_depth(dendro[[1]]),dendro_depth(dendro[[2]])) +1
}
See get_branches_heights from dendextend.
set.seed(1)
mat <- matrix(rnorm(100*10),nrow=100,ncol=10)
dend <- as.dendrogram(hclust(dist(t(mat))))
library(dendextend)
get_branches_heights(dend, sort = F)
It does not seem to be exactly in the order youu want, but see if this is still useful:
> get_branches_heights(dend, sort = F)
[1] 16.38688 15.41441 14.68365 15.99504 13.52949
[6] 12.96921 14.39275 13.91157 13.15395
BTW, the recent github version of dendextend also comes with the highlight_branches function for coloring branches based on branch height (in case this is somehow related to your motivation):
plot(highlight_branches(dend))
The data:
set.seed(1)
mat <- matrix(rnorm(100*10),nrow=100,ncol=10)
dend <- as.dendrogram(hclust(dist(t(mat))))
Using the data.tree package allows traversing trees in various orders. level will give what the question specifies:
require(data.tree)
dend.dt <- as.Node(dend)
sapply(Traverse(dend.dt,traversal = "level", pruneFun = isNotLeaf),function(x) x$plotHeight)
[1] 16.38688 15.41441 15.99504 14.68365 13.52949 14.39275 12.96921 13.91157 13.15395

Coloring dendrogram’s end branches (or leaves) based on column number of data frame in R

From a dataframe data.main, I am able to generate a hclust dendrogram as,
aa1<- c(2,4,6,8)
bb1<- c(1,3,7,11)
aa2<-c(3,6,9,12)
bb2<-c(3,5,7,9)
data.main<- data.frame(aa1,bb1,aa2,bb2)
d1<-dist(t(data.main))
hcl1<- hclust(d1)
plot(hcl1)
Further, I know there are ways to use a tree cutoff to color the branches or leaves. However, is it possible to color them based on partial column names or column number (e.g. I want that branch corresponding to aa1, aa2 be red and bb1 and bb2 be blue)?
I have checked the R package dendextend but am still not able to find a direct/easy way to get the desired result.
It's easier to change colors for a dendrogram than an hclust object, but it's pretty straightforward to convert. You can do
drg1 <- dendrapply(as.dendrogram(hcl1, hang=.1), function(n){
if(is.leaf(n)){
labelCol <- c(a="red", b="blue")[substr(attr(n,"label"),1,1)];
attr(n, "nodePar") <- list(pch = NA, lab.col = labelCol);
attr(n, "edgePar") <- list(col = labelCol); # to color branch as well
}
n;
});
plot(drg1)
which will draw
UPDATE
I'm only leaving my answer because it is valid and someone might find OOMPA useful. However, after seeing the solution of using dendrapply as suggested by MrFlick, I recommend it instead. You might find other features of the OOMPA package useful, but I wouldn't install it just for functionality that already exists in core R.
Original Answer
Install OOMPA (Object-Oriented Microarray and Proteomics Analysis package):
source("http://silicovore.com/OOMPA/oompaLite.R")
oompaLite()
Then use the plotColoredClusters function from the library ClassDiscovery:
library(ClassDiscovery)
aa1<- c(2,4,6,8)
bb1<- c(1,3,7,11)
aa2<-c(3,6,9,12)
bb2<-c(3,5,7,9)
data.main<- data.frame(aa1,bb1,aa2,bb2)
d1<-dist(t(data.main))
hcl1<- hclust(d1)
#identify the labels
labels=hcl1[4]$labels
# Choose which ones are in the "aa" group
aa_present <- grepl("aa", labels)
colors <- ifelse(aa_present, "red", "blue")
plotColoredClusters(hcl1,labs=labels,cols=colors)
Result:
ice, the dendextend package allows to do this using the assign_values_to_leaves_edgePar function.
Here is how to use it:
aa1 <- c(2,4,6,8)
bb1 <- c(1,3,7,11)
aa2 <- c(3,6,9,12)
bb2 <- c(3,5,7,9)
data.main <- data.frame(aa1,bb1,aa2,bb2)
d1 <- dist(t(data.main))
hcl1 <- hclust(d1)
# plot(hcl1)
dend <- as.dendrogram(hcl1)
col_aa_red <- ifelse(grepl("aa", labels(dend)), "red", "blue")
dend2 <- assign_values_to_leaves_edgePar(dend=dend, value = col_aa_red, edgePar = "col")
plot(dend2)
Result:

Plotting subsets of an AffyRNAdeg {affy} object with plot AffyRNAdeg?

library(affy)
microarrays <- ReadAffy() # 98 CEL files are read into the same object
RNAdeg <- AffyRNAdeg(microarrays)
Now I want to plot subsets of RNAdeg
plotAffyRNAdeg(RNAdeg[.......?]) # What can I do?
I've tried various 'for' loops without success.
But if plot line colors are specified then plotAffyRNAdeg plots a subset of 1:(number of colors specified), but I haven't thought of a way to use that effectively. For example, below plots the first through the sixth AffyRNAdeg'd set of microarray data (first through sixth .CEL file read in by ReadAffy() )
plotAffyRNAdeg(RNAdeg,col=c(2,2,2,3,3,3))
OK, one way was found by running AffyRNAdeg() on subsets of the object the CEL files are in and putting the resulting data in a list of lists organized by experiment, then plotting the list elements. Maybe there is an easier way, but this worked (I'm quite new to R).
library(affy)
library(RColorBrewer)
> sampleNames(ARTHwoundMA[,11:14])
[1] "GSE18960_05_GSM469416_trt_rep2.CEL" "GSE18960_06_GSM469418_trt_rep3.CEL"
[3] "GSE5525_GSM128715_ctrl12h.CEL" "GSE5525_GSM128716_ctrl24h.CEL
# RNA DEG
# Indices to subset by experiment
cel_names <- substr(sampleNames(ARTHwoundMA),1,7)
unique_exp <- unique(substr(sampleNames(ARTHwoundMA),1,7))
exp_ind <- list()
for (i in 1:length(unique_exp))
{
tempvec <- vector()
for (j in 1:length(cel_names))
{
if (cel_names[j]==unique_exp[i])
{
tempvec <- append(tempvec,j)
}
}
exp_ind[[(length(exp_ind)+1)]] <- tempvec
}
# Calculating
RNAdeg_exp <- list()
for(i in 1:length(exp_ind))
{
RNAdeg_exp[[i]] <- AffyRNAdeg(ARTHwoundMA[,exp_ind[[i]]])
}
# Plotting
colors <- colorRampPalette(rev(brewer.pal(9, "Reds")))(length(exp_ind[[i]])
pdf(file="C:\\R working directory\\TEST\\RNAdeg_plots.pdf")
for(i in 1:length(exp_ind))
{
par(bg="gray")
colors <- colorRampPalette(rev(brewer.pal(9, "Reds")))(length(exp_ind[[i]]))
plotAffyRNAdeg(RNAdeg_exp[[i]], col=colors)
plot.new()
legend("topleft", lty=1, lwd=2,col=colors,
legend=paste(sampleNames(ARTHwoundMA[,exp_ind[[i]]])))
}
dev.off()

Resources