heatmap.2 defaults to dist for calculating the distance matrix and hclust for clustering.
Does anyone know how I can set dist to use the euclidean method and hclust to use the centroid method?
I have provided a runnable code sample below.
I tried: distfun = dist(method = "euclidean"),
but that doesn't work. Any ideas?
library("gplots")
library("RColorBrewer")
test <- matrix(c(79,38.6,30.2,10.8,22,
81,37.7,28.4,9.7,19.9,
82,36.2,26.8,9.8,20.9,
74,29.9,17.2,6.1,13.9,
81,37.4,20.5,6.7,14.6),ncol=5,byrow=TRUE)
colnames(test) <- c("18:0","18:1","18:2","18:3","20:0")
rownames(test) <- c("Sample 1","Sample 2","Sample 3", "Sample 4","Sample 5")
test <- as.table(test)
mat=data.matrix(test)
heatmap.2(mat,
dendrogram="row",
Rowv=TRUE,
Colv=NULL,
distfun = dist,
hclustfun = hclust,
xlab = "Lipid Species",
ylab = NULL,
colsep=c(1),
sepcolor="black",
key=TRUE,
keysize=1,
trace="none",
density.info=c("none"),
margins=c(8, 12),
col=bluered
)
Glancing at the code for heatmap.2, I'm fairly sure that the default is to use dist, whose default is in turn to use euclidean distances.
The reason your attempt at passing distfun = dist(method = 'euclidean') didn't work is that distfun (and hclustfun) are supposed to be functions, not the result of calling one. So if you want to alter the defaults and pass arguments, you need to write a wrapper function like this:
heatmap.2(...,hclustfun = function(x) hclust(x,method = 'centroid'),...)
As I mentioned, I'm fairly certain that heatmap.2 is using euclidean distances by default, but a similar solution can be used to alter the distance function used:
heatmap.2(...,distfun = function(x) dist(x,method = 'euclidean'),...)
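Putting the two wrappers together, a minimal sketch might look like the following. The names dist_euclid and hclust_centroid are made up for illustration, the matrix repeats the values from the question, and the heatmap.2 call only runs if gplots is installed.

```r
# Wrapper functions: each takes the single argument heatmap.2 supplies
# and forwards it with the desired method filled in.
# (dist_euclid and hclust_centroid are illustrative names.)
dist_euclid     <- function(x) dist(x, method = "euclidean")
hclust_centroid <- function(d) hclust(d, method = "centroid")

# Same values as the matrix in the question
mat <- matrix(c(79, 38.6, 30.2, 10.8, 22,
                81, 37.7, 28.4,  9.7, 19.9,
                82, 36.2, 26.8,  9.8, 20.9,
                74, 29.9, 17.2,  6.1, 13.9,
                81, 37.4, 20.5,  6.7, 14.6),
              ncol = 5, byrow = TRUE,
              dimnames = list(paste("Sample", 1:5),
                              c("18:0", "18:1", "18:2", "18:3", "20:0")))

# The wrappers can be checked on their own before any plotting
hc <- hclust_centroid(dist_euclid(mat))
print(hc$method)

# Pass the functions themselves (no parentheses) to heatmap.2
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat, dendrogram = "row", Colv = FALSE,
                    distfun = dist_euclid, hclustfun = hclust_centroid,
                    trace = "none")
}
```

Naming the wrappers rather than writing them inline makes it easy to test them separately and to reuse them for several plots.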
Related
I have created a heatmap with a corresponding dendrogram based on hierarchical clustering, using the pheatmap package. Now, I want to change the order of the leaves in the dendrogram, preferably using the optimal leaf ordering method. I have searched around but have not found any solution for how to achieve this.
I would appreciate suggestions on how to change the order of the leaves using the optimal leaf ordering method.
Here's my example code with random data:
mat <- matrix(rgamma(1000, shape = 1) * 5, ncol = 50)
p <- pheatmap(mat,
clustering_distance_cols = "manhattan",
cluster_cols=TRUE,
cluster_rows=FALSE
)
For "optimal leaf ordering" you can use the reorder method from the seriation library. pheatmap accepts a clustering_callback argument. According to the docs:
clustering_callback callback function to modify the clustering. Is called with two parameters: original hclust object and the matrix used
for clustering. Must return a hclust object.
So you need to construct a callback function which accepts an hclust object and the initial matrix and returns an optimized hclust object.
Here is the code:
library(pheatmap)
library(seriation)
cl_cb <- function(hcl, mat){
# Recalculate manhattan distances for reorder method
dists <- dist(mat, method = "manhattan")
# Perform reordering according to OLO method
hclust_olo <- reorder(hcl, dists)
return(hclust_olo)
}
mat <- matrix(rgamma(1000, shape = 1) * 5, ncol = 50)
p <- pheatmap(mat,
clustering_distance_cols = "manhattan",
cluster_cols=TRUE,
cluster_rows=FALSE,
clustering_callback = cl_cb
)
I have a similarity matrix as follows:
xx <- cor(matrix(rnorm(650), ncol =25))
I want to cluster this similarity matrix and image in a heatmap. Is the following correct?
yy <- heatmap(1-xx, Rowv=T, scale='none',symm = T,keep.dendro=F,
Here, I am taking 1-xx which is a dissimilarity matrix. Is this the right thing to do, or should it be input in some other way?
I have figured it out upon reading one of the examples in R. Here is what one has to do using the similarity matrix.
hU <- heatmap(xx, Rowv = FALSE, symm = TRUE,
distfun = function(c) as.dist(1 - c),
hclustfun = function(d) hclust(d, method = "single"),
keep.dendro = FALSE)
I hope that this helps someone!
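To see why the two calls differ, here is a small base R sketch (the variable names are illustrative). heatmap's default distfun is dist, so passing 1-xx as the data makes it compute Euclidean distances between the rows of that matrix, whereas as.dist(1 - xx) uses the entries of 1-xx as the distances themselves.

```r
set.seed(1)
xx <- cor(matrix(rnorm(650), ncol = 25))  # 25 x 25 similarity matrix

# Use 1 - xx directly as the dissimilarities (what the answer does)
d_direct <- as.dist(1 - xx)

# What the default distfun = dist does when 1 - xx is passed as the data:
# Euclidean distances between the rows of 1 - xx
d_rows <- dist(1 - xx)

# Cluster both ways with the same linkage
hc_direct <- hclust(d_direct, method = "single")
hc_rows   <- hclust(d_rows,   method = "single")

# The two distance objects are not the same
print(isTRUE(all.equal(as.vector(d_direct), as.vector(d_rows))))

# The leaf orders can be compared side by side
print(rbind(direct = hc_direct$order, rows = hc_rows$order))
```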
Hi, I am trying to plot the NMDS of my assemblage data, which is in a Bray-Curtis dissimilarity matrix, in R. I have been able to apply ordiellipse() and ordihull(), and even change the colours based on group factors created by cutree() of an hclust().
E.g. using the dune data from the vegan package:
library(vegan)
data(dune)
Dune.dis <- vegdist(dune, method = "bray")
Dune.mds <- metaMDS(dune, distance = "bray", k = 2)
#hierarchical cluster
clua <- hclust(Dune.dis, "average")
plot(clua, hang = -1)
# set groupings
rect.hclust(clua, 4)
grp <- cutree(clua, 4)
#plot mds
plot(Dune.mds, display = "sites", type = "text", cex = 1.5)
#show groupings
ordiellipse(Dune.mds, groups = grp, border = 1, col = "red", lwd = 3)
Or even colour the points by the cutree groups:
colvec <- c("red2", "cyan", "deeppink3", "green3")
colvec[grp]
plot(Dune.mds, display = "sites", type = "text", cex = 1.5) #or use type = "points"
points(Dune.mds, col = colvec[grp], bg = colvec[grp], pch = 21)
However, what I really want to do is use the simprof function from the package "clustsig" and then colour the points based on the significant groupings. This is more of a technical coding question: I could create a vector of factors by hand, but I am sure there is a more efficient way to do it.
Here's my code so far for that:
simp <- simprof(Dune.dis, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "braycurtis", alpha = 0.05, sample.orientation = "row")
#plot dendrogram
simprof.plot(simp, plot = TRUE)
Now I am just not sure how to do the next step, plotting the NMDS using the groupings defined by SIMPROF. How do I turn the SIMPROF results into a grouping factor without literally typing it out myself?
Thanks in advance.
You wrote that you know how to get colours from an hclust object with cutree. Then read the documentation of clustsig::simprof. It says that simprof returns an hclust object within its result object. It also returns numgroups, which is the suggested number of clusters. Now you have all the information you need to use cutree on the hclust, as you already know how to do. If your simprof result is called simp, use cutree(simp$hclust, simp$numgroups) to extract the integer vector corresponding to the clustsig::simprof result, and use this to assign colours.
I have never used simprof or clustsig, but I gathered all this information from its documentation.
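As a rough sketch of that last step, the code below uses a stand-in list with the two fields mentioned above (hclust and numgroups) instead of a real simprof result, so it runs with base R only. The data are random, Euclidean distances replace Bray-Curtis, and cmdscale (classical MDS) replaces metaMDS; all of these are assumptions made purely for illustration.

```r
# Stand-in for a simprof result: a list carrying the two fields the
# answer relies on. With clustsig you would use your own `simp` object.
set.seed(42)
x <- matrix(runif(20 * 6), nrow = 20,
            dimnames = list(paste0("site", 1:20), NULL))
d <- dist(x)  # Euclidean here; the question uses Bray-Curtis
simp_like <- list(hclust = hclust(d, method = "average"), numgroups = 4)

# Integer group membership, one value per site
grp <- cutree(simp_like$hclust, k = simp_like$numgroups)

# One colour per group, then one colour per site by indexing with grp
colvec <- rainbow(simp_like$numgroups)
site_cols <- colvec[grp]

# Classical MDS stands in for metaMDS in this sketch
ord <- cmdscale(d, k = 2)
plot(ord, col = site_cols, bg = site_cols, pch = 21,
     xlab = "Axis 1", ylab = "Axis 2")
```

Generating the palette from numgroups avoids running out of colours when the number of groups is not known in advance.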
By default, R's heatmap will cluster rows and columns:
mtscaled = as.matrix(scale(mtcars))
heatmap(mtscaled, scale='none')
I can disable the clustering:
heatmap(mtscaled, Colv=NA, Rowv=NA, scale='none')
And then the dendrogram goes away:
But now the data is not clustered anymore.
I don't want the dendrograms to be shown, but I still want the rows and/or columns to be clustered. How can I do this?
Example of what I want:
You can do this with pheatmap:
mtscaled <- as.matrix(scale(mtcars))
pheatmap::pheatmap(mtscaled, treeheight_row = 0, treeheight_col = 0)
See pheatmap output here:
library(gplots)
heatmap.2(mtscaled,dendrogram='none', Rowv=TRUE, Colv=TRUE,trace='none')
Rowv is TRUE, which means the row dendrogram is computed and reordered based on row means.
Colv is TRUE, so the column dendrogram is computed and reordered in the same way, based on column means.
I had a similar issue with pheatmap, which has better visualisation than heatmap or heatmap.2. Though heatmap.2 is one choice for your problem, here is a solution with pheatmap that extracts the order of the clustered data.
library(pheatmap)
mtscaled = as.matrix(scale(mtcars))
H = pheatmap(mtscaled)
Here is the output of pheatmap
pheatmap(mtscaled[H$tree_row$order,H$tree_col$order],cluster_rows = F,cluster_cols = F)
Here is the output of pheatmap after extracting the order of clusters
For ComplexHeatmap, there are function parameters to remove the dendrograms:
library(ComplexHeatmap)
Heatmap(as.matrix(iris[,1:4]), name = "mat", show_column_dend = FALSE, show_row_dend = FALSE)
You can rely on base R structures and consider the following approach, which builds the hclust trees yourself.
mtscaled = as.matrix(scale(mtcars))
row_order = hclust(dist(mtscaled))$order
column_order = hclust(dist(t(mtscaled)))$order
heatmap(mtscaled[row_order,column_order], Colv=NA, Rowv=NA, scale="none")
No need to install additional junk.
Run the basic R heatmap function twice. Take the output of the first run, which clusters but has mandatory drawing of the dendrogram, and feed it into the heatmap function again, this time without clustering and without drawing the dendrogram.
#generate a random symmetrical matrix with a little bit of structure, and make a heatmap
M100s<-matrix(runif(10000),nrow=100)
M100s[2,]<-runif(100,min=0.1,max=0.2)
M100s[4,]<-runif(100,min=0.1,max=0.2)
M100s[6,]<-runif(100,min=0.1,max=0.2)
M100s[99,]<-runif(100,min=0.1,max=0.2)
M100s[37,]<-runif(100,min=0.1,max=0.2)
M100s[lower.tri(M100s)] <- t(M100s)[lower.tri(M100s)]
heatmap(M100s)
#save the output
OutputH <- heatmap(M100s)
#run it again without clustering or the dendrogram
#reorder rows by rowInd and columns by colInd
M100c <- M100s
M100c2 <- M100c[OutputH$rowInd, OutputH$colInd]
heatmap(M100c2,Rowv = NA, Colv = NA, labRow = NA, labCol = NA)
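The same two-pass idea can be sketched on the mtscaled matrix from the question, which is not symmetric, so the row and column permutations differ. This assumes only base R; the variable names h and reordered are illustrative.

```r
mtscaled <- as.matrix(scale(mtcars))

# First pass: clusters the data and draws the dendrograms
h <- heatmap(mtscaled, scale = "none")

# rowInd and colInd hold the row and column permutations used in the plot
reordered <- mtscaled[h$rowInd, h$colInd]

# Second pass: plot the already ordered matrix with no clustering,
# so no dendrograms are drawn
heatmap(reordered, Rowv = NA, Colv = NA, scale = "none")
```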
My dendrograms are horribly ugly, on the verge of unreadable, and usually look like this:
library(TraMineR)
library(cluster)
data(biofam)
lab <- c("P","L","M","LM","C","LC","LMC","D")
biofam.seq <- seqdef(biofam[1:500,10:25], states=lab)
ccost <- seqsubm(biofam.seq, method = "CONSTANT", cval = 2, with.missing=TRUE)
sequences.OM <- seqdist(biofam.seq, method = "OM", norm= TRUE, sm = ccost,
with.missing=TRUE)
clusterward <- agnes(sequences.OM, diss = TRUE, method = "ward")
plot(clusterward, which.plots = 2)
What I would like to create is something like the following, meaning a round dendrogram, where the size of the labels can be carefully controlled so that they are actually visible:
How can I accomplish this in R?
The following solution may not be optimal but is worth a try:
library(ape)
CL1 <- as.hclust(clusterward)
CL2 <- as.phylo(CL1)
plot(CL2, type="fan", cex=0.5)
The main issue is obviously that there are still too many objects, hence too many labels. To turn the labels off, use the argument show.tip.label=FALSE. You can also get rid of the margins to occupy the complete device with no.margin=TRUE:
plot(CL2, type="fan", show.tip.label=FALSE, no.margin=TRUE)