Reordering rows for heatmap.2 in R

I have RNA-Seq data that I want to visualize in a heatmap with heatmap.2 in R, but I do not understand how to force the heatmap to look the way I want it to.
I have searched online a lot and I feel like I am very close, but I cannot overcome the last hurdle. I am using the following code, where I have a matrix with 6 samples (triplicates of 2 conditions) and 209 specific genes I want to look at. The 209 genes I'm interested in fit into 3 categories, and I'm trying to show that using the RowSideColors= argument.
Here is my code:
colors <- colorpanel(75,"yellow","black","dodgerblue2")
heatmap.2(as.matrix(counts),
col=colors,
RowSideColors=SideCol,
scale="row",
key=T,
keysize=1,
density.info="none",
trace="none",
cexCol=0.9,
cexRow=0.5)
I know from searching online that I can use the Rowv argument to reorder the dendrogram and order the rows the way I want, but I don't understand how to use it. When I set Rowv=F, no dendrogram is drawn and my genes are ordered the same as they are in the matrix. I want them to be grouped by RowSideColors category and then arranged so that they follow the key (i.e. all blue rows together, fading into black and then yellow).
I thought I could get around this obstacle by calculating the row z-scores myself and arranging my matrix by category and row z-score, but my z-scores were very different from what heatmap.2 computes, and the row colour scale follows no pattern. I calculated the z-score as (x-mean)/sd.
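For reference, here is a minimal sketch of the "sort by category, then by z-score" idea described above. The toy matrix, the category labels, the side colours and the choice of sort key (mean z-score of the second condition minus the first) are all assumptions for illustration, not part of the original data. The row z-score is computed with t(scale(t(x))), which is (x - row mean) / row sd.

```r
# Hypothetical toy data standing in for the 209 x 6 count matrix.
set.seed(1)
counts <- matrix(rpois(60, lambda = 50), nrow = 10,
                 dimnames = list(paste0("gene", 1:10),
                                 c(paste0("ctrl", 1:3), paste0("trt", 1:3))))

# Hypothetical category labels and side colours, one per gene.
category <- rep(c("A", "B", "C"), length.out = nrow(counts))
side_col <- unname(c(A = "tomato", B = "seagreen", C = "purple")[category])

# Row z-scores: (x - row mean) / row sd.
z <- t(scale(t(counts)))

# Assumed sort key: mean z-score of condition 2 minus condition 1.
key <- rowMeans(z[, 4:6]) - rowMeans(z[, 1:3])

# Order rows by category first, then by the key within each category.
ord <- order(category, key)
counts_sorted <- counts[ord, ]

# Plot with row reordering switched off so the matrix order is kept.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(counts_sorted,
                    Rowv = FALSE, dendrogram = "column",
                    col = gplots::colorpanel(75, "yellow", "black", "dodgerblue2"),
                    RowSideColors = side_col[ord],
                    scale = "row", trace = "none")
}
```

Because Rowv = FALSE keeps the rows in matrix order, the side colours have to be reordered with the same index (side_col[ord]) as the matrix.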
How can I arrange the rows the way I want them to be?
Thanks in advance for any help, I greatly appreciate it!
EDIT:
This is a crude representation of what I'd like the HeatMap to look like:

Related

How do I add a legend to my heatmap in R?

I have a large dataset of the expression of genes.
The rows are the genes.
The columns are SPECIFIC tissues- so it is the gene expression in that tissue
I'm using the following code to make a heatmap:
heatmap(expression_all_tissues_matrix, scale= "column",col=brewer.pal(9,"Blues"))
I do not know how to make a legend.
I've tried to make the legend/key separately, but I cannot figure out how to use "Blues" in brewer.pal.
Thanks!
The pheatmap package, with its function of the same name, gives you what you are looking for. The following code draws the legend on the same graph.
require(pheatmap)
require(RColorBrewer)
pheatmap(as.matrix(expression_all_tissues_matrix),color=brewer.pal(9,"Blues"))
You can also adjust several arguments to group rows and columns by clustering, but if you don't want to cluster them, just use the arguments cluster_rows = FALSE and cluster_cols = FALSE. Don't forget to normalize the data; it can give you a nicer rendering. Use ?pheatmap for more information.
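As a rough sketch of those options together, the example below scales a small made-up matrix by column and turns clustering off. The matrix expr is hypothetical, and the base-R palette is only a fallback that approximates "Blues" when RColorBrewer is not installed.

```r
# Hypothetical expression matrix: 8 genes x 5 tissues.
set.seed(7)
expr <- matrix(rexp(40, rate = 0.05), nrow = 8,
               dimnames = list(paste0("gene", 1:8), paste0("tissue", 1:5)))

# Normalize each column to mean 0 and sd 1.
expr_scaled <- scale(expr)

# Nine-step blue palette; fallback interpolates between the lightest and
# darkest "Blues" colours, so it only approximates the Brewer palette.
blues <- colorRampPalette(c("#F7FBFF", "#08306B"))(9)
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  blues <- RColorBrewer::brewer.pal(9, "Blues")
}

# Heatmap with a legend and no row/column clustering.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(expr_scaled, color = blues,
                     cluster_rows = FALSE, cluster_cols = FALSE,
                     legend = TRUE)
}
```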

Plotting heatmap with R and clustering

Hello everyone. I am trying to plot a heatmap with clustering, but the plot does not look good and I would also like to change the colours. I am a newbie; can anyone tell me how to plot a heatmap in which rows showing a similar pattern are clustered together?
My data: data_link
What I tried: I simply log-normalized the data and plotted the graph.
library(ggplot2)
library(reshape2)
mydata=read.table("Test_data", sep="\t", header=TRUE)
melted_cormat <- melt(mydata)
head(melted_cormat)
melted_cormat$new=log2(1+melted_cormat$value)
ggplot(data = melted_cormat, aes(x=variable, y=ID, fill=new)) +
geom_tile()
Is it possible to increase the size of each cell, like below?
image
Please advise.
Thank you
You can make a heatmap from this data, but I don't think it will be a very good way to visualize this much data. You have 287 rows in mydata, which means you will have 287 rows in your plot. This will make the individual rows difficult to make out, and it will make labelling of the y axis impossible.
The other issue is that approximately 99% of your values are under 1000, yet your highest value is almost 6000. That means that the scaling of your fill is going to be extremely uneven. It will be difficult to see much detail in the lower ranges.
If you want to see clustering you could use pheatmap instead of ggplot2, and I would probably do a log transform on the fill scale to reveal the details better. However, the problem with simply having too much data on a single plot persists.
mymatrix <- log(as.matrix(mydata[,-1]))
mymatrix[mymatrix < 0] <- 0
pheatmap::pheatmap(mymatrix)
EDIT
If you plot only the first 10 rows of data, the result looks much more clearly like a heatmap:
pheatmap(as.matrix(mydata[1:10,-1]))
Or the first 30 rows:
pheatmap(as.matrix(mydata[1:30,-1]))
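A small sketch of the log transform, using the log2(1 + x) form from the question rather than log(x): with the added 1, zero counts map to 0 instead of -Inf, so no clamping step is needed. The data frame here is made up, with an ID column first as in the question.

```r
# Hypothetical data in the same shape as mydata: ID column, then samples.
mydata <- data.frame(ID = paste0("row", 1:6),
                     s1 = c(0, 3, 15, 120, 900, 5800),
                     s2 = c(1, 0, 31, 255, 1023, 7))

# Drop the ID column, add a pseudocount of 1, and take log2.
mymatrix <- log2(as.matrix(mydata[, -1]) + 1)
rownames(mymatrix) <- mydata$ID

# Clustered heatmap of the transformed values.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mymatrix)
}
```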

Is there a way to preserve the clustering in a heatmap but reduce the number of observations?

I have a data set with 90 observations (rows) across 20 columns. I have generated a pretty neat heatmap with the package pheatmap, which clusters my data into two groups. Although it is not entirely clean, the two clusters of the dendrogram largely separate my samples into 2 distinct groups according to my conditions. Now I want to reduce this set of 90 to a stricter set of around 20-30 observations while preserving the same clustering order as shown by pheatmap. Is there a way to do that? Or is there any other package that reduces my observations to a minimum set which still preserves my clustering order as seen now? The code for pheatmap is
pheatmap(mydata[rownames(df.90),],scale="row",clustering_distance_cols = "correlation",show_rownames= T,show_colnames=T,color=col,annotation=batch.annotation,cluster_cols=T,fontsize_row = 8,fontsize_col = 8,clustering_method = "ward.D2",border_color = NA)
Is there a package in R that I am missing that can handle this, or something in pheatmap that I can use to reduce the variables and run a kind of permutation test to find the minimum set of observations that still retains my clustering?
The data is genes in rows and expression in columns across patients.
I would like to answer my own question and get feedback. I used kmeans_k=30 in pheatmap and obtained 29 clusters that still preserve the clustering of the 90 observations that I made previously. From there I obtained the genes in their respective clusters. I selected the top 5 clusters from that heatmap on either side of the observations that can still produce my required heatmap, since they are the ones with high SD. Since I had scale="row" and both the row and column dendrograms switched on throughout, I did not want to change them now. When I plot these 31 genes (observations), they in fact improve my row clustering even more and partition the samples into 2 groups in a cleaner way, as I wanted. Code for kmeans and the new heatmap:
With kmeans_k = 30:
obj<-pheatmap(df.90,scale="row",clustering_distance_cols = "correlation",show_rownames= T,show_colnames=T,color=col,annotation=batch.annotation,cluster_cols=T,fontsize_row = 6,fontsize_col = 7,clustering_method = "ward.D2",border_color = NA,cellwidth = NA,cellheight = NA,kmeans_k = 30)
Retrieve the clusters and extract the observations/genes:
obj$kmeans$cluster
Select the top clusters and plot their genes in a heatmap:
pheatmap(mydata[rownames(df.31),],scale="row",clustering_distance_cols = "correlation",show_rownames= T,show_colnames=T,color=col,annotation=batch.annotation,cluster_cols=T,fontsize_row = 8,fontsize_col = 8,clustering_method = "ward.D2",border_color = NA)
What do you think of this approach? It is not the one I originally intended, but I don't think it is wrong either. I would like feedback if someone can suggest a better method or approach, or if they think this one is not correct. Thanks
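To make the selection step easier to discuss, here is a sketch of the same idea using base R's kmeans() directly instead of pheatmap's kmeans_k argument. This is an assumed reimplementation on random data, not the code that produced the heatmaps above; the number of clusters (6) and the rule "keep the 2 clusters whose centres vary most across patients" are arbitrary illustrative choices.

```r
# Hypothetical stand-in for df.90: 90 genes x 20 patients.
set.seed(123)
expr <- matrix(rnorm(90 * 20), nrow = 90,
               dimnames = list(paste0("gene", 1:90), paste0("patient", 1:20)))

# Row-scale, as scale = "row" would.
z <- t(scale(t(expr)))

# k-means on the scaled rows (genes).
km <- kmeans(z, centers = 6, nstart = 10)

# Gene names grouped by cluster.
genes_by_cluster <- split(rownames(z), km$cluster)

# Rank clusters by the SD of their centre profile across patients.
center_sd <- apply(km$centers, 1, sd)
top_clusters <- names(sort(center_sd, decreasing = TRUE))[1:2]

# Genes belonging to the selected clusters.
selected_genes <- unlist(genes_by_cluster[top_clusters], use.names = FALSE)
```

The reduced matrix expr[selected_genes, ] could then be passed to pheatmap with the same arguments as before to check whether the column clustering is preserved.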

creating comparable heatmaps in R

I am trying to create 2 heatmaps with variable values in R. I would like the colors and values to be scaled so that the values of the two heatmaps will be comparable. Right now I am using heatmap.2 from the gplots package.
MyHeatMap <- heatmap.2(MyData, trace="none", col=greenred)
My data is in the form of a numeric matrix. I have two of these matrices where the numeric ranges of the values are slightly different, and I would like to create quality heatmaps for both (it does not necessarily have to be with the same package).
I've encountered this issue a number of times in my own analyses and here is how I would suggest handling it.
Firstly, set your greenred color variable to have 256 colors with greenred(256).
Then, create a break variable that contains the range of numbers that you would like to split these 256 colors on for both heatmaps (the length will be one more than the length of the color vector). So, for instance, if you wanted the spread to be from -1 to 1 from green to red, respectively, you would do
pairs.breaks = seq(from=-1,to=1,length.out=257)
Then, when calling your heatmaps, use
MyHeatMap1 <- heatmap.2(MyData1, trace="none", col=greenred(256), breaks=pairs.breaks)
MyHeatMap2 <- heatmap.2(MyData2, trace="none", col=greenred(256), breaks=pairs.breaks)
This should produce two heat maps with different data sets that use identical color scales.
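If you would rather derive the range from the data than hard-code -1 to 1, one possible sketch is to take the largest absolute value across both matrices and build symmetric breaks from it. The two matrices below are random placeholders for MyData1 and MyData2.

```r
# Hypothetical matrices with slightly different value ranges.
set.seed(2)
MyData1 <- matrix(rnorm(50, sd = 0.4), nrow = 10)
MyData2 <- matrix(rnorm(50, sd = 0.6), nrow = 10)

# Shared symmetric limit covering both data sets.
lim <- max(abs(c(MyData1, MyData2)))

# One more break than colors.
n_colors <- 256
pairs.breaks <- seq(from = -lim, to = lim, length.out = n_colors + 1)

# Both heatmaps use the same colors and the same breaks.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(MyData1, trace = "none",
                    col = gplots::greenred(n_colors), breaks = pairs.breaks)
  gplots::heatmap.2(MyData2, trace = "none",
                    col = gplots::greenred(n_colors), breaks = pairs.breaks)
}
```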
Hope this helps!
Ron

Clustering and heatmap in R

I am a newbie to R and I am trying to do some clustering on a data table where rows represent individual objects and columns represent the features that have been measured for these objects. I've worked through some clustering tutorials and I do get some output, however, the heatmap that I get after clustering does not correspond at all to the heatmap produced from the same data table with another programme. While the heatmap of that programme does indicate clear differences in marker expression between the objects, my heatmap doesn't show much differences and I cannot recognize any clustering (i.e., colour) pattern on the heatmap, it just seems to be a randomly jumbled set of colours that are close to each other (no big contrast). Here is an example of the code I am using, maybe someone has an idea on what I might be doing wrong.
mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
datalog <- log(datamat)
I am using log values for the clustering because I know that the other programme does so, too
library(gplots)
hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
col=colorpanel(40, "black","yellow","green"),
scale="column", RowSideColors=mycol)
Again, I plot the original values but use the clusters computed on the log values, because I know that this is what the other programme does.
I tried to play around with the methods, but I don't get anything that would at least somehow look like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I have somehow to scale or normalize the data by column). I also tried to cluster with k-means, but again, this didn't help. My idea was that the colour scale might not be used completely because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, this still did not reveal proper clusters.
Is there anything else I could play around with?
And is it possible to change the colour scale with heatmap so that outliers are found in the last bin that has a range of "everything greater than a particular value"? I tried to do this with heatmap.2 (argument "breaks"), but I didn't quite succeed, and I also didn't manage to add the row side colours that I use with the heatmap function.
If you are okay with using heatmap.2 from the gplots package, it will allow you to add breaks that assign colors to ranges of values in your heatmap.
For example if you had 3 colors blue, white, and red with the values going from low to high you could do something like this:
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
In this case you have 3 sets of values that correspond to the 3 colors, the values will differ of course depending on what values you have with your data.
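For the "everything greater than a particular value" bin asked about in the question, one possible sketch is to use evenly spaced breaks up to a cap and then add a single final break at the data maximum. The values and the cap of 3 below are made up, and cut() is used only to show which bin each value lands in.

```r
# Hypothetical scaled values; the last two are outliers.
x <- c(-1.2, 0.3, 0.8, 2.5, 14, 40)

# Regular bins from -3 up to the cap.
cap <- 3
inner <- seq(-3, cap, length.out = 7)

# One extra break at the data maximum: the last bin holds everything above
# the cap (this assumes max(x) is larger than cap).
my.breaks <- c(inner, max(x))

# The color vector needs one entry fewer than there are breaks.
n_colors <- length(my.breaks) - 1

# Which bin each value falls in.
bin <- cut(x, breaks = my.breaks, include.lowest = TRUE, labels = FALSE)
```

Passing breaks = my.breaks together with a color vector of length n_colors to heatmap.2 would then draw both outliers in the final color.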
One thing you are doing in your program is calling hclust on your data and then calling heatmap on it. However, if you look at the heatmap manual page, the description of the hclustfun argument states:
Defaults to hclust.
So I don't think you need to do that. You might want to take a look at some similar questions that I asked, which might help point you in the right direction:
Heatmap Question 1
Heatmap Question 2
If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.
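As a sketch of how clustering can be left to heatmap itself while still using the correlation distance from the question, the distfun and hclustfun arguments can be supplied instead of a precomputed dendrogram. The matrix below is random placeholder data. Note that heatmap applies distfun to the matrix being plotted, so this version plots the log values, whereas clustering on the log values and plotting the raw ones still requires the precomputed dendrogram.

```r
# Hypothetical data: 20 objects x 4 markers, strictly positive.
set.seed(42)
datamat <- matrix(rexp(80, rate = 0.1), nrow = 20,
                  dimnames = list(paste0("obj", 1:20), paste0("marker", 1:4)))
datalog <- log(datamat)

# Correlation-based distance between rows, as in the question.
cor_dist <- function(m) as.dist(1 - cor(t(m), method = "pearson"))

# heatmap() does the row clustering itself with the supplied functions.
hm <- heatmap(datalog, Colv = NA, scale = "column",
              distfun = cor_dist,
              hclustfun = function(d) hclust(d, method = "complete"))

# Row order used in the plot.
row_order <- rownames(datalog)[hm$rowInd]
```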
