I have a large dataset of gene expression values.
The rows are the genes.
The columns are specific tissues, so each value is that gene's expression in that tissue.
I'm using the following code to make a heatmap:
library(RColorBrewer)  # brewer.pal()
heatmap(expression_all_tissues_matrix, scale = "column", col = brewer.pal(9, "Blues"))
I do not know how to make a legend.
I've tried to make the legend/key separately, but I cannot figure out how to use "Blues" from brewer.pal.
Thanks!
The pheatmap package, with its eponymous function, gives you what you are looking for. The following code puts the legend on the same graph.
require(pheatmap)
require(RColorBrewer)
pheatmap(as.matrix(expression_all_tissues_matrix),color=brewer.pal(9,"Blues"))
You can also play with several arguments to cluster the rows and columns; if you don't want them clustered, just set cluster_rows = FALSE and cluster_cols = FALSE (see the sketch below). Don't forget to normalize the data; it can give you a nicer rendering. Use ?pheatmap for more information.
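For example, a minimal sketch (assuming expression_all_tissues_matrix is your numeric matrix) that turns clustering off and scales each column, mirroring scale = "column" from the original heatmap call:
library(pheatmap)
library(RColorBrewer)

# scale = "column" standardizes each tissue (column) before plotting;
# cluster_rows / cluster_cols = FALSE keep the original row and column order
pheatmap(as.matrix(expression_all_tissues_matrix),
         color        = brewer.pal(9, "Blues"),
         scale        = "column",
         cluster_rows = FALSE,
         cluster_cols = FALSE)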
I have RNA-Seq data that I want to visualize in a heatmap with heatmap.2 in R, but I do not understand how to force the heatmap to look the way I want it to.
I have searched online a lot and I feel like I am very close, but I cannot overcome the last hurdle. I am using the following code, where I have a matrix with 6 samples (triplicates of 2 conditions) and 209 specific genes I want to look at. The 209 genes I'm interested in fit into 3 categories, and I'm trying to show that using the RowSideColors= argument.
Here is my code:
library(gplots)  # colorpanel() and heatmap.2()

# yellow -> black -> dodgerblue2 palette with 75 steps
colors <- colorpanel(75, "yellow", "black", "dodgerblue2")

heatmap.2(as.matrix(counts),
          col = colors,
          RowSideColors = SideCol,  # one category colour per gene
          scale = "row",
          key = TRUE,
          keysize = 1,
          density.info = "none",
          trace = "none",
          cexCol = 0.9,
          cexRow = 0.5)
I know from searching online that I can use the Rowv command to reorder the dendrogram and order the rows the way I want, but I don't understand how to use the command. When I set Rowv=F it does not make a dendrogram and my genes are ordered the same as they are in the matrix. I want them to be grouped by RowSideColor category and then arranged such that they follow the key (i.e. all blue rows together and fade into black then yellow).
I thought I could get around this obstacle by determining the row z-scores myself and arranging my matrix by category and row z-score, but my z-score calculations were very different from what heatmap.2 produces, and the row color scale follows no pattern. I computed the z-score as (x - mean) / sd.
How can I arrange the rows the way I want them to be?
Thanks in advance for any help, I greatly appreciate it!
EDIT:
This is a crude representation of what I'd like the heatmap to look like (image not reproduced here).
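A rough sketch of one possible approach, assuming counts and SideCol are as above and geneCategory is a hypothetical factor giving each gene's category: heatmap.2 with scale = "row" plots (x - rowMean) / rowSD, so you can compute the same z-scores yourself, order the matrix (and the side colours) by category and then by z-score, and pass Rowv = FALSE so the row order is kept.
library(gplots)

# row z-scores, matching what scale = "row" does inside heatmap.2
z <- t(scale(t(as.matrix(counts))))

# order by category first, then by mean z-score within each category,
# and keep the side colours in the same order
ord        <- order(geneCategory, rowMeans(z))   # geneCategory: assumed factor, one level per gene
countsOrd  <- as.matrix(counts)[ord, ]
sideColOrd <- SideCol[ord]

# Rowv = FALSE with dendrogram = "column" keeps the rows exactly in this order
heatmap.2(countsOrd,
          col = colorpanel(75, "yellow", "black", "dodgerblue2"),
          RowSideColors = sideColOrd,
          Rowv = FALSE, dendrogram = "column",
          scale = "row", trace = "none", density.info = "none")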
I am using the R package adegenet to plot the neighbor-joining tree.
In my file I have 20,000 columns and 500 rows. Rows correspond to individuals. My first column is Population ID and the second column is Individual ID; the remaining columns contain the values 0, 1 and 2. I am able to plot a tree in one color, but I want every cluster to be a different color depending on the population.
This is what I did. If "dat" is my data file, then:
library(ape)  # nj() and plot.phylo()

D   <- dist(as.matrix(dat))
tre <- nj(D)
plot(tre, type = "unr", show.tip.lab = TRUE, cex = 0.3, font = 1, edge.col = "Blue")
If I try edge.col=c("red","green","blue"), I run into the following error:
Error in if (use.edge.length) unrooted.xy(Ntip, Nnode, z$edge, z$edge.length, :
argument is not interpretable as logical
I'd appreciate any help!
Your example should be reproducible, so that it is easier to help you and to reproduce your problem; see this post for more details. I tried with iris and it works like a charm. By the way, I don't think adegenet is needed here: the plot is actually plot.phylo from the ape package, and all the other functions are either built-in or from ape.
Documentation (?plot.phylo) says:
edge.col a vector of mode character giving the colours used to draw the branches of the plotted phylogeny. These are taken to be in the same order than the component edge of phy. If fewer colours are given than the length of edge, then the colours are recycled.
ape preserves the order of the rows, and you can use a factor to index your vector of colors, so a reproducible example using iris could be:
library(ape)
D <-dist(as.matrix(iris[, 1:4]))
tree <- nj(D)
plot(tree, type = "unr", show.tip.lab = TRUE, cex=0.3, font=1,
edge.col=c("red","green","blue")[iris$Species])
Is that what you want?
I am creating a heatmap for a given matrix. I also have several factors that I want to show alongside the heatmap. Right now I can create one RowSideColors strip for one factor, but is there a way to create RowSideColors for multiple factors with the gplots heatmap.2 function?
In other words, many RowSideColors strips next to the heatmap. Any tips?
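One possible workaround, sketched here with the pheatmap package mentioned earlier rather than heatmap.2 (the matrix and the two factors below are made up for illustration): pheatmap's annotation_row argument accepts a data frame with one column per factor and draws one colour strip per column.
library(pheatmap)

# toy matrix; replace with your own data
mat <- matrix(rnorm(200), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("sample", 1:10)))

# one column per factor; row names must match the matrix row names
ann <- data.frame(
  category  = factor(rep(c("A", "B"), each = 10)),
  treatment = factor(rep(c("ctrl", "case"), 10)),
  row.names = rownames(mat)
)

pheatmap(mat, annotation_row = ann)  # one annotation strip per factor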
Based on what you've posted, I've attempted to include a reproducible example below in case anyone else has a similar question:
require(gplots)
data(mtcars)
df <- as.matrix(mtcars[,8:11])
df = df[order(rownames(df)),] # sorts the rows in alphabetical order
# specifying a column dendrogram
heatmap.2(df, Rowv=FALSE, dendrogram=c("column"))
The resulting heatmap is as follows (image not reproduced here).
After a bit of digging, I found the solution myself: if you specify
tmpSorted = tmp[order(rownames(tmp)),] # sorts alphabetical order
heatmap.2(tmpSorted, Rowv=F .... )
the option Rowv=F works!
I am a newbie to R and I am trying to do some clustering on a data table where rows represent individual objects and columns represent the features that have been measured for these objects. I've worked through some clustering tutorials and I do get some output; however, the heatmap that I get after clustering does not correspond at all to the heatmap produced from the same data table with another programme. While the heatmap of that programme indicates clear differences in marker expression between the objects, my heatmap doesn't show much difference and I cannot recognize any clustering (i.e., colour) pattern; it just seems to be a randomly jumbled set of colours that are close to each other (no big contrast). Here is an example of the code I am using; maybe someone has an idea of what I might be doing wrong.
mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
datalog <- log(datamat)
I am using log values for the clustering because I know that the other programme does so, too
library(gplots)  # colorpanel()

# hierarchical clustering of rows on 1 - Pearson correlation of the log values
hr <- hclust(as.dist(1 - cor(t(datalog), method = "pearson")), method = "complete")
mycl <- cutree(hr, k = 7)

# one colour per cluster for the row side bar
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]

heatmap(datamat, Rowv = as.dendrogram(hr), Colv = NA,
        col = colorpanel(40, "black", "yellow", "green"),
        scale = "column", RowSideColors = mycol)
Again, I plot the original values but use the clusters from the log values, because I know that this is what the other programme does.
I tried to play around with the methods, but I don't get anything that would even somewhat look like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I somehow have to scale or normalize the data by column). I also tried to cluster with k-means, but again, this didn't help. My idea was that the colour scale might not be used completely because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, this still did not reveal proper clusters.
Is there anything else I could play around with?
And is it possible to change the colour scale with heatmap so that outliers fall into a last bin covering "everything greater than a particular value"? I tried to do this with heatmap.2 (the "breaks" argument), but I didn't quite succeed, and I also didn't manage to add the row side colours that I use with the heatmap function.
If you are okay with using heatmap.2 from the gplots package, it will allow you to add breaks that assign colors to the value ranges represented in your heatmap.
For example, if you had 3 colors (blue, white, and red) with the values going from low to high, you could do something like this:
# 17 break points define 16 bins, matching col = bluered(16); mtscaled is the (already scaled) matrix
my.breaks <- c(seq(-5, -.6, length.out=6), seq(-.5999999, .1, length.out=4), seq(.100009, 5, length.out=7))
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm=T, col=bluered(16), breaks=my.breaks)
In this case you have 3 ranges of values that correspond to the 3 colors; the break points will of course differ depending on the values in your data.
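To get the open-ended "everything greater than a particular value" bin asked about above, one option (a sketch; the cap value is made up) is to clamp the matrix before plotting so that anything beyond the cap falls into the outermost colour bins:
library(gplots)

cap    <- 2                                  # assumed threshold; pick what suits your data
capped <- pmin(pmax(mtscaled, -cap), cap)    # clamp values to [-cap, cap]

# evenly spaced breaks now cover the whole (capped) range, and every original
# value beyond +/- cap lands in the first or last colour bin
heatmap.2(capped, col = bluered(16), breaks = seq(-cap, cap, length.out = 17),
          trace = "none", scale = "none")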
One thing you are doing in your program is calling hclust on your data and then calling heatmap on it; however, if you look at the heatmap manual page, it states:
Defaults to hclust.
So I don't think you need to do that. You might want to take a look at some similar questions that I had asked that might help to point you in the right direction:
Heatmap Question 1
Heatmap Question 2
If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.
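To illustrate that point, a sketch assuming datamat and datalog are defined as in the question: the correlation-based distance and the clustering method can be handed to heatmap directly via distfun and hclustfun instead of being precomputed. Note that heatmap then clusters the matrix it is given, so this only matches the log-based clustering if you pass the log matrix.
# heatmap calls hclust itself; distfun and hclustfun override the distance and linkage
heatmap(datamat,
        distfun   = function(x) as.dist(1 - cor(t(x), method = "pearson")),
        hclustfun = function(d) hclust(d, method = "complete"),
        Colv = NA, scale = "column")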
I've generated a set of levels from my dataset, and now I want to find a way to sum the rest of the data columns in order to plot it while plotting my first column. Something like:
levelSet <- cut(frame$x1, "cutting")   # "cutting" is a placeholder for the breaks I want
boxplot(frame$x1 ~ levelSet)
for (l in levels(levelSet))            # loop over the levels, not the individual values
{
  x2Sum <- sum(frame$x2[levelSet == l])
}
or maybe the inside of the loop should look like:
lines(sum(frame$x2[levelSet==l]))
Any thoughts? I am new to R, and I can't seem to get the hang of the indexing and ~ notation thus far.
I know R doesn't work this way, but I'd like functionality that 'looks' like:
hist(frame$x2~levelSet)
## Or
hist(frame$x2, breaks = levelSet)
To plot a histogram, boxplot, etc. over a level set:
Try the lattice package:
library(lattice)
histogram(~x2|equal.count(x1),data=frame)
Substitute shingle for equal.count to set your own break points.
ggplot2 would also work nicely for this.
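As a rough sketch of the ggplot2 route (assuming frame has numeric columns x1 and x2; the three-bin cut is just for illustration):
library(ggplot2)

# one histogram panel of x2 per interval of x1
ggplot(frame, aes(x = x2)) +
  geom_histogram(bins = 30) +
  facet_wrap(vars(cut(x1, breaks = 3)))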
To put a histogram over a boxplot:
par(mfrow = c(2, 1))   # stack the two plots vertically
hist(frame$x2)
boxplot(frame$x2)
You can also use the layout() command to fine-tune the arrangement.
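For the summing part of the original question, a minimal sketch (again assuming frame has columns x1 and x2) that avoids the explicit loop:
# sum x2 within each level of the cut on x1
levelSet <- cut(frame$x1, breaks = 4)        # 4 bins, just for illustration
x2Sums   <- tapply(frame$x2, levelSet, sum)  # named vector, one sum per level

barplot(x2Sums, las = 2)                     # quick look at the per-level totals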