So I'm trying to generate a heatmap for my data using Bioconductor's ComplexHeatmap package, but I get slightly different results depending on whether I make the dendrogram myself, or tell Heatmap to make it.
Packages:
require(ComplexHeatmap)
require(dendextend)
Data:
a=rnorm(400,1)
b=as.matrix(a)
dim(b)=c(80,5)
If I make the dendrogram myself:
d=dist(b,method="euclidean")
d=as.dist(d)
h=hclust(d,method="ward.D")
dend=as.dendrogram(h)
Heatmap(b,
cluster_columns=FALSE,
cluster_rows = dend)
Versus having Heatmap do the clustering:
Heatmap(b,
cluster_columns=FALSE,
clustering_distance_rows = "euclidean",
clustering_method_rows = "ward.D")
They tend to look very similar, but they'll be very slightly different.
And this matters a lot for my data. Heatmap's clustering ends up organizing my data way, way better, however, I also want to extract the list of clustered items via like cutree(), but I don't think I can extract it from Heatmap's clustering.
Does anyone know what's going on?
the dendrograms are the same. The only thing that changes is the ordering. You can verify this using:
hmap1 <- Heatmap(b,
cluster_columns=FALSE,
cluster_rows = dend)
hmap2 <- Heatmap(b,
cluster_columns=FALSE,
clustering_distance_rows = "euclidean",
clustering_method_rows = "ward.D")
#Reorder both row dendrograms using the same weights:
rowdend1 <- reorder(row_dend(hmap1)[[1]], 1:80)
rowdend2 <- reorder(row_dend(hmap2)[[1]], 1:80)
#check that they are identical:
identical( rowdend1, rowdend2)
## [1] TRUE
The ComplexHeatmap::Heatmap function has an argument row_dend_reorder with default value TRUE that you should check.
Related
I am trying to visualize a PCA that includes 87 variables.
prc <-prcomp(df[,1:87], center = TRUE, scale. = TRUE)
ggbiplot(prc, labels = rownames(df[,1:87]), var.axes = TRUE)
When I create the biplot, many of the vectors overlap with each other, making it impossible to read the labels. I was wondering if there is any way to only show some of the labels at a time. For example, I think it'd be useful if I could create a few separate biplots with each one showing only a subset of the labels on the vectors.
This question seems closely related, but I don't know if it translates to the latest version of ggbiplot. I'm also not sure how to modify the original functions.
A potential solution is to use the factoextra package to visualize your PCA results. The fviz_pca_biplot() function includes a repel argument. When repel = TRUE the plot labels are spread out to minimize overlap. There are also select.var options mentioned in the documentation, such as select.var = list(contrib=5) to display only the 5 most influential vectors. Also a select.var = list(name) option that seems to allow for the specification of a specific subset of variables that you want shown.
# read data
df <- mtcars[, c(1:7,10:11)]
# perform PCA
library("FactoMineR")
res.pca <- PCA(df, graph = FALSE)
# visualize
library(factoextra)
fviz_pca_biplot(res.pca, repel = TRUE, select.var = list(contrib = 5))
I am plotting a set of 15 samples clustered in three groups A, B, C, and the heatmap orders them such as C, A, B. (I have read this is due to that it plots on the right the cluster with the strongest similarity). I would like to order the clusters so the leaves of the cluster are seen as A, B, C (therefore reorganising the order of the cluster branches. Is there a function that can help me do this?
The code I have used:
library(pheatmap)
pheatmap(mat, annotation_col = anno,
color = colorRampPalette(c("blue", "white", "red"))(50), show_rownames = F)
(cluster_cols=FALSE would not cluster the samples at all, but that is not what I want)
I have also found on another forum this, but I am unsure how to change the function code and if it would work for me:
clustering_callback callback function to modify the clustering. Is
called with two parameters: original hclust object and the matrix used
for clustering. Must return a hclust object.
Hi I am not sure if that is of any help for you but when you check?pheatmap and scroll down to examples the last snippet of code actually does give that example.
# Modify ordering of the clusters using clustering callback option
callback = function(hc, mat){
sv = svd(t(mat))$v[,1]
dend = reorder(as.dendrogram(hc), wts = sv)
as.hclust(dend)
}
pheatmap(test, clustering_callback = callback)
I tried it on my heatmap and the previously defined function actually sorted the clusters exactly the way I needed them. Although I have to admit (as I am new to R) I don't fully understand what the defined callback function does.
Maybe you can also write a function with the dendsortpackage as I know you can reorder the branches of a dendrogram with it.
In this case, luckily clustering of the columns coincides with sample number order, (which is similar to dendrogram) so I added cluster_cols = FALSE and solved the issue of re-clustering the columns (and avoided writing the callback function.
pheatmap(mat,
annotation_col = anno,
fontsize_row = 2,
show_rownames = T,
cutree_rows = 3,
cluster_cols = FALSE)
# install.packages("dendsort")
library(dendsort)
sort_hclust <- function(...) as.hclust(dendsort(as.dendrogram(...)))
cluster_cols=sort_hclust(hclust(dist(mat)))
By default, R's heatmap will cluster rows and columns:
mtscaled = as.matrix(scale(mtcars))
heatmap(mtscaled, scale='none')
I can disable the clustering:
heatmap(mtscaled, Colv=NA, Rowv=NA, scale='none')
And then the dendrogram goes away:
But now the data is not clustered anymore.
I don't want the dendrograms to be shown, but I still want the rows and/or columns to be clustered. How can I do this?
Example of what I want:
You can do this with pheatmap:
mtscaled <- as.matrix(scale(mtcars))
pheatmap::pheatmap(mtscaled, treeheight_row = 0, treeheight_col = 0)
See pheatmap output here:
library(gplots)
heatmap.2(mtscaled,dendrogram='none', Rowv=TRUE, Colv=TRUE,trace='none')
Rowv -is TRUE, which implies dendrogram is computed and reordered based on row means.
Colv - columns should be treated identically to the rows.
I had similar issue with pheatmap, which has better visualisation and heatmap or heatmap.2. Though heatmap.2 is a choice for your solution, Here is the solution with pheatmap, by extracting the order of clustered data.
library(pheatmap)
mtscaled = as.matrix(scale(mtcars))
H = pheatmap(mtscaled)
Here is the output of pheatmap
pheatmap(mtscaled[H$tree_row$order,H$tree_col$order],cluster_rows = F,cluster_cols = F)
Here is the output of pheatmap after extracting the order of clusters
For ComplexHeatmap, there are function parameters to remove the dendrograms:
library(ComplexHeatmap)
Heatmap(as.matrix(iris[,1:4]), name = "mat", show_column_dend = FALSE, show_row_dend = FALSE)
You can rely on base R structures and consider following approach based on building the hclust trees by yourself.
mtscaled = as.matrix(scale(mtcars))
row_order = hclust(dist(mtscaled))$order
column_order = hclust(dist(t(mtscaled)))$order
heatmap(mtscaled[row_order,column_order], Colv=NA, Rowv=NA, scale="none")
No need to install additional junk.
Do the dendrogram twice using the basic R heatmap function. Take the output of the first run, which clusters but has mandatory drawing of the dendrogram and feed it into the heatmap function again. This time, without clustering, and without drawing the dendrogram.
#generate a random symmetrical matrix with a little bit of structure, and make a heatmap
M100s<-matrix(runif(10000),nrow=100)
M100s[2,]<-runif(100,min=0.1,max=0.2)
M100s[4,]<-runif(100,min=0.1,max=0.2)
M100s[6,]<-runif(100,min=0.1,max=0.2)
M100s[99,]<-runif(100,min=0.1,max=0.2)
M100s[37,]<-runif(100,min=0.1,max=0.2)
M100s[lower.tri(M100s)] <- t(M100s)[lower.tri(M100s)]
heatmap(M100s)
#save the output
OutputH <- heatmap(M100s)
#run it again without clustering or the dendrogram
M100c <- M100s
M100c1 <- M100c[,OutputH$rowInd]
M100c2 <- M100c1[OutputH$colInd,]
heatmap(M100c2,Rowv = NA, Colv = NA, labRow = NA, labCol = NA)
I am using R to do a hierarchical cluster analysis using the Ward's squared euclidean distance. I have a matrix of x columns(stations) and y rows(numbers in float), the first row contain the header(stations' names). I want to have a good dendrogram where the name of the station appear at the bottom of the tree as i am not able to interprete my result. My aim is to find those stations which are similar. However using the following codes i am having numbers (100,101,102,...) for the lower branches.
Yu<-read.table("yu_s.txt",header = T, dec=",")
library(cluster)
agn1 <- agnes(Yu, metric = "euclidean", method="ward", stand = TRUE)
hcd<-as.dendrogram(agn1)
par(mfrow=c(3,1))
plot(hcd, main="Main")
plot(cut(hcd, h=25)$upper,
main="Upper tree of cut at h=25")
plot(cut(hcd, h=25)$lower[[2]],
main="Second branch of lower tree with cut at h=25")
A nice collection of examples are present here (http://gastonsanchez.com/blog/how-to/2012/10/03/Dendrograms.html)
Two methods:
with hclust from base R
hc<-hclust(dist(mtcars),method="ward")
plot(hc)
Default plot
ggplot
with ggplot and ggdendro
library(ggplot2)
library(ggdendro)
# basic option
ggdendrogram(hc, rotate = TRUE, size = 4, theme_dendro = FALSE)
I am trying to cluster a protein dna interaction dataset, and draw a heatmap using heatmap.2 from the R package gplots. My matrix is symmetrical.
Here is a copy of the data-set I am using after it is run through pearson:DataSet
Here is the complete process that I am following to generate these graphs: Generate a distance matrix using some correlation in my case pearson, then take that matrix and pass it to R and run the following code on it:
library(RColorBrewer);
library(gplots);
library(MASS);
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
# location <- args[2];
# setwd(args[2]);
pdf("result.pdf", pointsize = 15, width = 18, height = 18)
mycol <- c("blue","white","red")
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
#colors <- colorpanel(75,"midnightblue","mediumseagreen","yellow")
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
dev.off()
The issue I am having is once I use breaks to help me control the color separation the heatmap no longer looks symmetrical.
Here is the heatmap before I use breaks, as you can see the heatmap looks symmetrical:
Here is the heatmap when breaks are used:
I have played with the cutoff's for the sequences to make sure for instance one sequence does not end exactly where the other begins, but I am not able to solve this problem. I would like to use the breaks to help bring out the clusters more.
Here is an example of what it should look like, this image was made using cluster maker:
I don't expect it to look identical to that, but I would like it if my heatmap is more symmetrical and I had better definition in terms of the clusters. The image was created using the same data.
After some investigating I noticed was that after running my matrix through heatmap, or heatmap.2 the values were changing, for example the interaction taken from the provided data set of
Pacdh-2
and
pegg-2
gave a value of 0.0250313 before the matrix was sent to heatmap.
After that I looked at the matrix values using result$carpet and the values were then
-0.224333135
-1.09805379
for the two interactions
So then I decided to reorder the original matrix based on the dendrogram from the clustered matrix so that I was sure that the values would be the same. I used the following stack overflow question for help:
Order of rows in heatmap?
Here is the code used for that:
rowInd <- rev(order.dendrogram(result$rowDendrogram))
colInd <- rowInd
data_ordered <- matrix_a[rowInd, colInd]
I then used another program "matrix2png" to draw the heatmap:
I still have to play around with the colors but at least now the heatmap is symmetrical and clustered.
Looking into it even more the issue seems to be that I was running scale(matrix_a) when I change my code to just be mtscaled <- as.matrix(matrix_a) the result now looks symmetrical.
I'm certainly not the person to attempt reproducing and testing this from that strange data object without code that would read it properly, but here's an idea:
..., col=bluered(20)[4:20], ...
Here's another though which should return the full rand of red which tha above strategy would not:
shift.BR<- colorRamp(c("blue","white", "red"), bias=0.5 )((1:16)/16)
heatmap.2( ...., col=rgb(shift.BR, maxColorValue=255), .... )
Or you can use this vector:
> rgb(shift.BR, maxColorValue=255)
[1] "#1616FF" "#2D2DFF" "#4343FF" "#5A5AFF" "#7070FF" "#8787FF" "#9D9DFF" "#B4B4FF" "#CACAFF" "#E1E1FF" "#F7F7FF"
[12] "#FFD9D9" "#FFA3A3" "#FF6C6C" "#FF3636" "#FF0000"
There was a somewhat similar question (also today) that was asking for a blue to red solution for a set of values from -1 to 3 with white at the center. This it the code and output for that question:
test <- seq(-1,3, len=20)
shift.BR <- colorRamp(c("blue","white", "red"), bias=2)((1:20)/20)
tpal <- rgb(shift.BR, maxColorValue=255)
barplot(test,col = tpal)
(But that would seem to be the wrong direction for the bias in your situation.)