I found this topic and would like to get a similar plot. I am trying to plot a heatmap with the pheatmap library in R and need to increase the size of the left (row) dendrogram: I would like to shrink the colored area and lengthen the dendrogram branches.
I found another topic that does part of what I need, but there the dendrogram is split from the heatmap.
Here is a sample code:
library(pheatmap)
test = cbind(matrix(rnorm(200), 100, 2), matrix(rnorm(200)+10, 100, 2))
pheatmap(test)
pheatmap(test, cluster_cols = FALSE)
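One possible way to get that effect is sketched below. It relies on pheatmap's `treeheight_row` argument (the height of the row dendrogram, in points) and on `cellwidth` (the width of each cell, in points); the values 200 and 30 are arbitrary guesses to tune by eye, not recommended settings.

```r
set.seed(1)
test <- cbind(matrix(rnorm(200), 100, 2), matrix(rnorm(200) + 10, 100, 2))

# Guarded so the sketch still runs where pheatmap is not installed
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(
    test,
    cluster_cols   = FALSE,
    treeheight_row = 200,  # longer row (left) dendrogram; arbitrary value
    cellwidth      = 30    # narrower colored area; arbitrary value
  )
}
```

Larger `treeheight_row` values stretch the branches, while smaller `cellwidth` values shrink the colored part.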
I have a gene expression data set and want to show a heatmap of some of the genes. First, I want to run hierarchical clustering on all genes and create a dendrogram, and then draw a heatmap for a subset of those genes. Specifically, the heatmap should have the same columns as the dendrogram already created, but show fewer rows. I have tried the code below, but pheatmap seems to re-order the clusters based on the reduced matrix.
# Random data
full_mat <- matrix(rgamma(1000, shape = 1) * 5, ncol = 50)
reduced_mat <- full_mat[1:5,]
# Function to calculate distances on full-matrix and make dendrogram
cl_cb <- function(hcl, mat){
  # Recalculate manhattan distances for reorder method
  dists <- dist(full_mat, method = "manhattan")
  # Perform reordering according to OLO or GW method
  hclust_olo <- reorder(hcl, dists, method="GW")
  return(hclust_olo)
}
# Only display the reduced matrix (same columns but fewer rows)
p <- pheatmap(reduced_mat,
              show_rownames = TRUE,
              show_colnames = TRUE,
              cluster_cols = TRUE,
              cluster_rows = FALSE,
              scale = "none",
              clustering_callback = cl_cb
)
I have tried setting cluster_cols = FALSE, but then no dendrogram is drawn and no re-ordering is done at all.
Try the heatmap.2 function from the gplots package instead. Install the package if you don't already have it.
After this, run the following:
heatmap.2(reduced_mat, dendrogram = "both", labRow=row.names(reduced_mat),
labCol=colnames(reduced_mat), Colv = FALSE, Rowv = FALSE)
# If you want to show only the row or column dendrogram, change dendrogram = "both" to dendrogram = "column" (or "row")
It will still produce a dendrogram based on the subsetted data set; however, it should not change the order of the matrix used. If I understand correctly, this is what you want.
If you provide a reproducible example, using dput(), I could try it out myself.
If you are set on doing this, you could create the heatmap while keeping the ordering of the rows and columns, skip the second dendrogram, and save the heatmap as an image. This can be done with the following:
dev.copy(jpeg, filename="plot.jpg")
dev.off()
Do the same with your original heatmap, crop out the part of the dendrogram you are interested in, and paste it onto the second heatmap image in Photoshop or Paint.
However, as mentioned in my comment, this is not a "true" dendrogram of the subsetted dataset, rather, a "snippet" of the original heatmap.
Let me know if it works!
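An alternative that avoids image editing is sketched below. It assumes, per the pheatmap documentation, that `cluster_cols` accepts an `hclust` object in place of TRUE/FALSE, so the column tree can be computed once on the full matrix and reused for the subset. The column names and the choice of manhattan distance are illustrative assumptions.

```r
set.seed(1)
full_mat <- matrix(rgamma(1000, shape = 1) * 5, ncol = 50)
colnames(full_mat) <- paste0("s", seq_len(ncol(full_mat)))  # hypothetical sample names
reduced_mat <- full_mat[1:5, ]

# Cluster the columns using ALL rows; dist() works on rows, hence t()
col_hc <- hclust(dist(t(full_mat), method = "manhattan"))

# Guarded so the sketch still runs where pheatmap is not installed
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(reduced_mat,
                     cluster_rows = FALSE,
                     cluster_cols = col_hc,  # reuse the full-matrix tree
                     scale = "none")
}
```

The heatmap then shows only the five selected rows, while the column order and dendrogram come from the clustering of the full matrix.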
With the plotweb() function in the bipartite package in R, I've created a network, but some of the labels are too long for the plot area. As a result, they are being cut off at the top and bottom (I've included a picture).
I'm trying to make it fit in the plot, or if not possible, to be able to export it as an image without the edges being cut off.
I've tried setting par(mar = c(...)), but that doesn't seem to do anything. The ybig argument lets the top half fit, but it doesn't change the bottom section.
See photo: labels being cut off from the web
It seems that the argument y.lim could be adjusted in such cases. From help(plotweb), about y.lim:
[...] Useful if labels are plotted outside the plotting region and for multitrophic plots [...]
Here is an example:
library(bipartite)
data(Safariland)
# Forge some long names/labels
cn <- colnames(Safariland)
rn <- rownames(Safariland)
colnames(Safariland) <- paste(cn, cn)
rownames(Safariland) <- paste(rn, rn)
# plotweb with trying to fit the labels - tweak y.lim until you get it right
plotweb(Safariland, text.rot = 90, y.lim = c(-1.5, 4))
I am trying to do k-means clustering using R, and this is what I have done so far:
tmp <- kmeans(ds, centers = 4, iter.max = 1000)
plot(ds[tmp$cluster==1,c(1,5)], col = "red",
     xlim = c(min(ds[,1]), max(ds[,1])), ylim = c(min(ds[,5]), max(ds[,5])))
points(ds[tmp$cluster==2,c(1,5)], col = "blue")
points(ds[tmp$cluster==3,c(1,5)], col = "seagreen")
points(ds[tmp$cluster==4,c(1,5)], col = "orange")
points(tmp$centers[,c(1,5)], col = "black")
and I get the following graph:
I am quite new to this, so I may be way off, but this graph does not look quite right to me. The data is basically divided in zones and to be honest, I was expecting to see something along the lines of this:
The circles in this picture are just to showcase where I was expecting the clusters to be. Can anyone explain why the data is clustered like that? I did the clustering multiple times and I always end up with this result.
The dataset I am using can be found here.
Notice that Age runs from about 18 to 60, so the maximum distance between two ages is about 40. The incomes, however, range from 0 to 20000, so the distance between points is heavily dominated by income. If you want both variables to contribute to the clustering, scale the data before clustering. Try
tmp<-kmeans(scale(ds), centers = 4, iter.max = 1000)
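To see the effect without the original data, here is a small sketch on made-up values. The matrix `toy` is a hypothetical stand-in for `ds`, with uniform ages and incomes on roughly the ranges mentioned above, and `nstart = 10` is an arbitrary choice.

```r
set.seed(42)
# Hypothetical stand-in for ds: two variables on very different scales
toy <- cbind(age = runif(200, 18, 60), income = runif(200, 0, 20000))

# Standard deviations before and after scaling
sd_raw    <- apply(toy, 2, sd)
sd_scaled <- apply(scale(toy), 2, sd)  # both are 1 after scaling

km_raw    <- kmeans(toy, centers = 4, nstart = 10)
km_scaled <- kmeans(scale(toy), centers = 4, nstart = 10)

# Plot the original values, coloured by each clustering
op <- par(mfrow = c(1, 2))
plot(toy, col = km_raw$cluster, main = "raw")
plot(toy, col = km_scaled$cluster, main = "scaled")
par(op)
```

On data like this, the unscaled clustering tends to split the points into income bands, whereas the scaled version can also separate them by age.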
This is how the k-means clustering algorithm works. Search for "k-means clustering" and look at the image results, and you will see different variations: circular clusters as well as the banded type you got. If you set the number of clusters k to a different value, you will get different clusters. The goal of the algorithm is to partition a data set into a desired number k of non-overlapping clusters so that the total within-cluster variation is minimized, and that is the result you see in your plot.
I have an hclust tree with nearly 2000 samples. I have cut it to an appropriate number of clusters and would like to plot the dendrogram but ending at the height that I cut the clusters rather than all the way to every individual leaf. Every plotting guide is about coloring all the leaves by cluster or drawing a box, but nothing seems to just leave the leaves below the cut line out completely.
My full dendrogram looks like the following:
I would like to plot it as if it stops where I've drawn the abline here (for example):
This should get you started. I suggest reading the help page for "dendrogram"
Here is the example from the help page:
hc <- hclust(dist(USArrests))
dend1 <- as.dendrogram(hc)
plot(dend1)
dend2 <- cut(dend1, h = 100)
plot(dend2$upper)
plot(dend2$upper, nodePar = list(pch = c(1,7), col = 2:1))
By performing the cut on the dendrogram object (not the hclust object), you can then plot the upper part of the dendrogram. It will take some work to replace the "Branch 1", "Branch 2", "Branch 3", and "Branch 4" labels, depending on your analysis.
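One way to do that relabelling is sketched below. It assumes the leaves of the upper tree are labelled "Branch k", with k indexing the list in `dend2$lower`. The "Cluster k (n = ...)" label format is only an illustration.

```r
hc <- hclust(dist(USArrests))
dend2 <- cut(as.dendrogram(hc), h = 100)

# Number of original observations in each cut-off branch
sizes <- sapply(dend2$lower, function(d) attr(d, "members"))

# Replace each "Branch k" leaf label with a label that includes the size
upper <- dendrapply(dend2$upper, function(n) {
  if (is.leaf(n)) {
    k <- as.integer(sub("Branch ", "", attr(n, "label")))
    attr(n, "label") <- sprintf("Cluster %d (n = %d)", k, sizes[k])
  }
  n
})

plot(upper)
```

The same pattern should work for any other per-cluster summary you want to show in place of the size.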
Good luck.
I am trying to cluster a protein-DNA interaction dataset and draw a heatmap using heatmap.2 from the R package gplots. Here is the complete process that I am following to generate these graphs:
Generate a distance matrix using some correlation, in my case Pearson.
library(RColorBrewer);
library(gplots);
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
pdf("result.pdf", pointsize = 15, width = 18, height = 18)
result <- heatmap.2(mtscaled, Colv=T,Rowv=T, scale='none',symm = T, col = brewer.pal(9,"Reds"))
dev.off()
I am able to accomplish this with the normal heatmap function by doing the following:
result <- heatmap(mtscaled, Colv=T,Rowv=T, scale='none',symm = T)
However, when I use the same settings for heatmap.2, the clusters don't line up as well on the diagonal. I have attached two images: the first uses heatmap and the second uses heatmap.2. I have used the Reds palette from the RColorBrewer package to better show what I am talking about. I would normally just use the default heatmap function, but I need the color variation that heatmap.2 provides.
Here is a link to the dataset used to generate the heatmaps, after it has been turned into a distance matrix:
DataSet
It's as if two of the arguments are conflicting. Colv=T says to order the columns by cluster, and symm=T says to order the columns the same as the rows. Of course, both constraints could be satisfied since the data is symmetrical, but instead Colv=T wins and you get two independent cluster orderings that happen to be different.
If you give up on having a redundant copy of the dendrogram, the following at least gives the heatmap you want:
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col = brewer.pal(9,"Reds"))
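If you do want both dendrograms, one option is to build the dendrogram once and pass the same object as both `Rowv` and `Colv`, which heatmap.2 accepts. The sketch below uses a made-up symmetric matrix `toy` because the original dataset is not reproduced here. It also uses default `dist()` and `hclust()` settings, which is an assumption about the clustering you want.

```r
set.seed(1)
# Hypothetical symmetric matrix standing in for mtscaled
toy <- cor(t(matrix(rnorm(200), nrow = 20)))

# One dendrogram, used for both rows and columns
dd <- as.dendrogram(hclust(dist(toy)))

# Guarded so the sketch still runs where the packages are not installed
if (requireNamespace("gplots", quietly = TRUE) &&
    requireNamespace("RColorBrewer", quietly = TRUE)) {
  gplots::heatmap.2(toy, Rowv = dd, Colv = dd, symm = TRUE,
                    scale = "none", dendrogram = "both", trace = "none",
                    col = RColorBrewer::brewer.pal(9, "Reds"))
}
```

Because rows and columns share one ordering, the two dendrograms cannot disagree.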