How to get individually made dendrograms from dendextend into a consensus tree analysis? - r

I am pretty new to R, so I am struggling with the following:
I have a dataset in which I cluster the expression values of several organs per patient. In this way I build 10 individual dendrograms. Now I would like to perform a consensus analysis and see how well these individually built trees agree.
For the individual trees I have been using the dendextend package. I wanted to use the consensus function from ape, but I don't know how to transform my dendextend output so that consensus accepts it.
This is how I build and plot each of the 10 individual trees:
dend <- RawData[, 4:ncol(RawData)] %>%
  dist %>%
  hclust(method = "average") %>%
  as.dendrogram
labels(dend) <- TMA_TT
dev.new()
dend %>% plot(main="dend")
I'd like to use this:
consensus(..., p = 1, check.labels = TRUE)
Here ... is either (i) a single object of class "phylo", (ii) a series of such objects separated by commas, or (iii) a list containing such objects.
But I am not sure how to get my dendextend results to this format.
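One possible route, sketched under assumptions rather than taken from the question: convert each dendrogram back to an hclust object with as.hclust, turn that into a "phylo" object with ape's as.phylo, and pass the resulting list to consensus. The data below (10 rows of USArrests, clustered on three different column subsets) is only a stand-in for RawData, and all trees must share the same tip labels.

```r
library(ape)  # provides as.phylo() and consensus()

# Stand-in data (an assumption): 10 rows of USArrests instead of RawData
x <- scale(USArrests[1:10, ])

# Build several dendrograms over the same labels, here from different column subsets
col_sets <- list(1:4, 1:3, 2:4)
dends <- lapply(col_sets, function(cols) {
  as.dendrogram(hclust(dist(x[, cols]), method = "average"))
})

# Convert each tree: dendrogram -> hclust -> phylo
trees <- lapply(dends, function(d) as.phylo(as.hclust(d)))

# Strict consensus (p = 1) over the list of phylo objects
cons <- consensus(trees, p = 1, check.labels = TRUE)
plot(cons)
```

With p = 1 only the groupings present in every tree are kept; a value such as p = 0.5 would give a majority-rule consensus instead.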

Related

Set common y axis limits from a list of ggplots

I am running a function that returns a custom ggplot from an input data (it is in fact a plot with several layers on it). I run the function over several different input data and obtain a list of ggplots.
I want to create a grid with these plots to compare them but they all have different y axes.
I guess what I have to do is extract the maximum and minimum y axes limits from the ggplot list and apply those to each plot in the list.
How can I do that? I guess it's through the use of ggplot_build. Something like this:
test = ggplot_build(plot_list[[1]])
> test$layout$panel_scales_x
[[1]]
<ScaleContinuousPosition>
Range:
Limits: 0 -- 1
I am not familiar with the structure of a ggplot_build and maybe this one in particular is not a standard one as it comes from a "custom" ggplot.
For reference, these plots are created with the gseaplot2 function from the enrichplot package.
I don't know how to "upload" an R object, but if that would help, let me know how to do it.
Thanks!
edit after comments (thanks for your suggestions!)
Here is an example of a gseaplot2 plot. GSEA stands for Gene Set Enrichment Analysis, a technique used in genomic studies. The gseaplot2 function calculates a running average, plots it, and adds a bar plot below it.
and here is the grid I create to compare the plots generated from different data:
I would like to have a common scale for the "Running Enrichment Score" part.
I guess I could try to recreate the gseaplot2 function and input all of the datasets and then create the grid by facet_wrap, but I was wondering if there was an easy way of extracting parameters from a plot list.
As a reproducible example (from the enrichplot package):
library(clusterProfiler)
library(enrichplot)  # provides gseaplot2()
library(magrittr)    # provides %>%
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
wpgmtfile <- system.file("extdata/wikipathways-20180810-gmt-Homo_sapiens.gmt", package="clusterProfiler")
wp2gene <- read.gmt(wpgmtfile)
wp2gene <- wp2gene %>% tidyr::separate(term, c("name","version","wpid","org"), "%")
wpid2gene <- wp2gene %>% dplyr::select(wpid, gene) #TERM2GENE
wpid2name <- wp2gene %>% dplyr::select(wpid, name) #TERM2NAME
ewp2 <- GSEA(geneList, TERM2GENE = wpid2gene, TERM2NAME = wpid2name, verbose=FALSE)
gseaplot2(ewp2, geneSetID=1, subplots=1:2)
And this is how I generate the plot list (probably there is a much more elegant way):
plot_list = list()
for (i in 1:3) {
  fig_i = gseaplot2(ewp2,
                    geneSetID = i,
                    subplots = 1:2)
  plot_list[[i]] = fig_i
}
ggarrange(plotlist = plot_list)  # ggarrange() comes from the ggpubr package
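A minimal sketch of the "extract the limits, then apply them" idea, assuming each element of the list is a plain ggplot object. The plots below are simulated stand-ins, not gseaplot2 output; gseaplot2 returns a composite of several panels, so the same steps would presumably have to be applied to the individual "Running Enrichment Score" panel of each figure.

```r
library(ggplot2)

# Stand-in plots (an assumption): three simple ggplots with different y ranges
set.seed(1)
plot_list <- lapply(c(1, 5, 10), function(s) {
  df <- data.frame(x = 1:50, y = cumsum(rnorm(50, sd = s)))
  ggplot(df, aes(x, y)) + geom_line()
})

# Trained y range of each plot
y_ranges <- lapply(plot_list, function(p) layer_scales(p)$y$range$range)

# Common limits across the whole list
common_ylim <- range(unlist(y_ranges))

# Apply the same limits to every plot; coord_cartesian() zooms without dropping data
plot_list_common <- lapply(plot_list, function(p) {
  p + coord_cartesian(ylim = common_ylim)
})
```

The adjusted list can then be passed to ggarrange(plotlist = plot_list_common) as before.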

recluster.cons function and colorful dendrogram

I created a dendrogram using the 'recluster.cons' function of the recluster package. I would like to know how to color the branches of the dendrogram by group resulting from this function.
tree <- recluster.cons(sp2, p=1)$cons # sp2 is a presence-absence matrix
plot(tree, direction="downwards")
Here is the current dendrogram:
You need to decide how many clusters you want from the clustering (as with cutree); after that, dendextend seems like the easier option. First I simulate a dataset that might look like yours:
library(recluster)
set.seed(222)
testdata = lapply(1:3, function(i) {
  truep = runif(200)
  replicate(7, rbinom(200, size = 1, prob = truep))
})
testdata = t(do.call(cbind,testdata))
rownames(testdata) = paste0(rep(letters[1:3],each=7),rep(1:7,3))
We plot it; there are 3 clusters of sites because the data were simulated that way:
tree <- recluster.cons(testdata, p=1)$cons # testdata is the simulated presence-absence matrix
plot(tree,direction="downwards")
Then colour it:
library(dendextend)
dend <- color_branches(as.dendrogram(tree),k=3)
plot(dend)

R rect.hclust: rectangles too high in dendrogram

I asked a number of different experts to sort 92 objects based on their similarity. Based on their answers, I constructed a 92 x 92 dissimilarity matrix. In R, I examined this matrix using the following commands:
cluster1 <- hclust(as.dist(DISS_MATRIX), method = "average")
plot(cluster1, cex=.55)
To highlight the clusters, I wanted to draw rectangles around them:
rect.hclust(cluster1, k = 3, border = "red")
The result is as follows:
However, when the objects have longer names ("AAAAAAAAAAAAAAAA43" instead of "A43"), the formatting is off:
rownames(DISS_MATRIX) <- paste0(rep("AAAAAAAAAAAAAAAAAAAAAAAAAAAA",92),1:92)
colnames(DISS_MATRIX) <- paste0(rep("AAAAAAAAAAAAAAAAAAAAAAAAAAAA",92),1:92)
cluster1 <- hclust(as.dist(DISS_MATRIX), method = "average")
plot(cluster1, cex=.55)
rect.hclust(cluster1, k = 3, border = "red")
The resulting dendrogram shows the problem.
The rectangles seem to have moved up towards the top of the dendrogram, which does not look nice. I assume this glitch is due to the long names of the 92 objects in the dissimilarity matrix. One could argue that it is not very relevant and that the objects should simply have shorter names.
However, for various reasons I want my objects to keep their original (admittedly long) names. This graph is for a presentation, so I do not want to work with codes. I also do not want to use any other package, since I generally find hclust quite easy to use. However, I cannot find any way to position the rectangles within the rect.hclust command. So what can I do to position the rectangles correctly in the dendrogram even if the object names are long? Thanks.
You wrote that "I also do not want to use any other package since I generally find hclust quite easy to use."
While hclust is great for creating the hierarchical clustering object, it does not offer much in terms of plotting. Once you have the hclust output, it is better to convert it to a dendrogram (using as.dendrogram) for visualization, since that class is better suited for it. There is no way to do what you want without fairly sophisticated code, and since that code is already bundled in a package, using it is (IMHO) the best route forward. (I know because I wrote rect.dendrogram, and it took a lot of work to get it to behave the way you want.)
The dendextend R package provides many functions for manipulating and visualizing dendrograms (see the vignette here).
Specifically, the rect.dendrogram function can handle the case you asked about (long labels). For example (I've added color_branches and color_labels for the fun of it):
library(dendextend)
hc <- mtcars[, c("mpg", "disp")] %>% dist %>% hclust(method = "average")
dend <- hc %>% as.dendrogram %>% hang.dendrogram
# let's make the text longer
labels(dend)[1] <- "AAAAAAAAAAAAAAAAAAAAA"
par(mar = c(15,2,1,1))
dend %>% color_branches(k=3) %>% color_labels(k=3) %>% plot
dend %>% rect.dendrogram(k=3)

Plot LOESS (STL) decomposition using ggvis

I want to plot the three different elements of the Seasonal Trend Decomposition using Loess (STL) with ggvis.
However, I receive this error:
Error: data_frames can only contain 1d atomic vectors and lists
I am using the nottem data set.
# The Seasonal Trend Decomposition using Loess (STL) with Ggvis
# Load nottem data set
library(datasets)
nottem <- nottem
# Decompose using stl()
nottem.stl = stl(nottem, s.window="periodic")
# Plot decomposition
plot(nottem.stl)
Now, this is the information I am interested in. In order to make this into a plot that I can play around with I transform this into a data frame using the xts-package. So far so good.
# Transform nottem.stl to a data.frame
library(xts)
df.nottem.stl <- as.data.frame(as.xts(nottem.stl$time.series))
# Add date to data.frame
df.nottem.stl$date <- data.frame(time = seq(as.Date("1920-01-01"), by = ("months"), length =240))
# Glimpse data
glimpse(df.nottem.stl)
# Plot simple line of trend
plot(df.nottem.stl$date, df.nottem.stl$trend, type = "o")
This is pretty much the plot I want. However, I want to be able to use it with Shiny, and therefore ggvis is preferable.
# Plot ggvis
df.nottem.stl%>%
ggvis(~date, ~trend)%>%
layer_lines()
This is where I get my error.
Any hints on what might go wrong?
First of all, the date column of your df.nottem.stl data.frame is itself a data.frame (holding the dates in a time column), so you should be using date$time. Then using layer_paths instead of layer_lines will make it work. I always find that layer_paths works better than layer_lines.
So this will work:
library(ggvis)
df.nottem.stl%>%
ggvis(~date$time, ~trend)%>%
#for points
layer_points() %>%
#for lines
layer_paths()
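An alternative sketch, assuming you are free to change how the data frame is built: store the dates as a plain Date vector instead of a nested data.frame, so that every column is a 1d atomic vector and ~date can be used directly. The name df below is illustrative, and the ggvis call is guarded so the snippet also runs where ggvis is not installed.

```r
# Decompose the built-in nottem series
nottem.stl <- stl(nottem, s.window = "periodic")

# One column per component: seasonal, trend, remainder
df <- as.data.frame(nottem.stl$time.series)

# Store the dates as a plain Date vector, not as a data.frame
df$date <- seq(as.Date("1920-01-01"), by = "month", length.out = nrow(df))

# Every column is now a 1d atomic vector
str(df)

# Build the ggvis plot only if the package is available
if (requireNamespace("ggvis", quietly = TRUE)) {
  library(ggvis)
  p <- df %>% ggvis(~date, ~trend) %>% layer_paths()
}
```

Printing p would then render the trend line; the seasonal and remainder columns can be plotted the same way.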

How to make R output text details about a dendrogram object?

Please see my previous question for details relating to test data and commands used to create a dendrogram: Using R to cluster based on euclidean distance and a complete linkage metric, too many vectors?
Here is a quick summary of my commands to make the dendrogram:
un_exprs <- as.matrix(read.table("sample.txt", header=TRUE, sep = "\t", row.names = 1, as.is=TRUE))
exprs <- t(un_exprs)
eucl_dist=dist(exprs,method = 'euclidean')
hie_clust=hclust(eucl_dist, method = 'complete')
dend <- as.dendrogram(hie_clust)
plot(dend)
This makes a very nice dendrogram plot. However, let's say this dendrogram has 2 clusters... I want to get a text list of the elements belonging to each of the 2 clusters. I'm assuming this is trivial, but I don't have enough experience with R for this to be intuitive. Thanks!
You can compute this from the hclust result with stats::cutree:
cutree(hie_clust,k=2)
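cutree returns a named integer vector of cluster ids, so splitting the names by those ids gives one text list per cluster. A small sketch on stand-in data (8 rows of USArrests, an assumption, since sample.txt is not available here):

```r
# Stand-in data (an assumption): 8 rows of USArrests instead of sample.txt
hc <- hclust(dist(USArrests[1:8, ], method = "euclidean"), method = "complete")

# Named integer vector: names are the labels, values are the cluster ids
cl <- cutree(hc, k = 2)

# One character vector of member labels per cluster
members <- split(names(cl), cl)
print(members)
```

Each element could also be written to a file with writeLines(members[[1]], "cluster1.txt"), where the file name is just an example.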
