I am trying to create 2 heatmaps with variable values in R. I would like the colors and values to be scaled so that the values of the two heatmaps will be comparable. Right now I am using heatmap.2 from the gplots package.
MyHeatMap <- heatmap.2(MyData, trace="none", col=greenred)
My data is in the form of a numeric matrix. I have two of these matrices where the numeric ranges of the values are slightly different, and I would like to create quality heatmaps for both (it does not necessarily have to use the same package).
I've encountered this issue a number of times in my own analyses and here is how I would suggest handling it.
Firstly, set your greenred color variable to have 256 colors with greenred(256).
Then, create a break variable that contains the range of numbers that you would like to split these 256 colors on for both heatmaps (the length will be one more than the length of the color vector). So, for instance, if you wanted the spread to be from -1 to 1 from green to red, respectively, you would do
pairs.breaks = seq(from=-1,to=1,length.out=257)
Then, when calling your heatmaps, use
MyHeatMap1 <- heatmap.2(MyData1, trace="none", col=greenred(256), breaks=pairs.breaks)
MyHeatMap2 <- heatmap.2(MyData2, trace="none", col=greenred(256), breaks=pairs.breaks)
This should produce two heat maps with different data sets that use identical color scales.
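If you would rather derive the range from the data than pick -1 to 1 by hand, one option is to compute the breaks from both matrices together. This is only a minimal sketch: MyData1 and MyData2 below are made-up stand-ins, and I am assuming you want a scale that is symmetric around zero.

```r
# Stand-in data for illustration only; replace with your own matrices
set.seed(1)
MyData1 <- matrix(rnorm(40, sd = 0.5), nrow = 8)
MyData2 <- matrix(rnorm(40, sd = 0.8), nrow = 8)

# Largest absolute value across BOTH matrices, so the scale is symmetric
lim <- max(abs(range(MyData1, MyData2)))

# One more break than colors, shared by both heatmaps
n.colors <- 256
pairs.breaks <- seq(from = -lim, to = lim, length.out = n.colors + 1)

# Draw only in an interactive session with gplots installed
if (interactive() && requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(MyData1, trace = "none",
                    col = gplots::greenred(n.colors), breaks = pairs.breaks)
  gplots::heatmap.2(MyData2, trace = "none",
                    col = gplots::greenred(n.colors), breaks = pairs.breaks)
}
```

Because the breaks cover the combined range, no value in either matrix falls outside the color scale.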
Hope this helps!
Ron
I am plotting a heatmap in R using the base R heatmap() function. Is there a way to define more colours so that the heatmap has a greater variation in the colours used? Currently it is using about 10, and the "hottest" area is quite large and dark purple. I want more colours so that this large area itself is broken down into more colours to better differentiate.
Try experimenting with the color palettes of the grDevices package.
library(grDevices)
heatmap(x, col = topo.colors(n))
where n is the number of colors.
Or, alternatively
col = rainbow(n)
col = terrain.colors(n)
col = cm.colors(n)
However, the problem with differentiation often does not depend on the number of colors but on the variability of the data: many values may be clustered in a small range. In such a case you could try to differentiate them by choosing a subrange or by transforming the data, for example by plotting their logarithm.
Examples:
50 colors from cm.colors palette:
heatmap(Ca, col=cm.colors(50), Rowv=NA, Colv=NA)
matrix of log values, with 50 colors from cm.colors palette:
heatmap(log(Ca), col=cm.colors(50), Rowv=NA, Colv=NA)
in which subtler differences can be seen.
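To see why the log transform helps, it can be useful to count how many of the color bins the data actually occupy. A rough sketch with made-up skewed values (Ca.demo is a hypothetical stand-in, not the Ca matrix above), assuming the palette is spread over equal-width bins of the data range:

```r
# Hypothetical skewed data spanning several orders of magnitude
Ca.demo <- matrix(exp(seq(0, 10, length.out = 200)), nrow = 20)

n.col <- 50

# cut() with a single number splits the data range into equal-width bins,
# roughly how a 50-color palette is spread over the values
bins.raw <- cut(as.vector(Ca.demo), breaks = n.col, labels = FALSE)
bins.log <- cut(as.vector(log(Ca.demo)), breaks = n.col, labels = FALSE)

# Number of distinct colors that would actually appear
used.raw <- length(unique(bins.raw))
used.log <- length(unique(bins.log))

# Share of cells that land in the single lowest color
share.lowest.raw <- mean(bins.raw == 1)
share.lowest.log <- mean(bins.log == 1)
```

If most cells share the lowest bin on the raw scale, adding colors will not help much, while a transform (or a subrange) will.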
Heatmap with high expression values on the bottom
I'm quite new to RStudio and I'm trying to make a heatmap using the heatmaply function in R. In some heatmaps (with different data) the high expression values (in red) show on top, while with another dataset the high expression values show up at the bottom, with low expression values on top, as in the image.
I use the same code for the different datasets
heatmaply(Heatmap_DEXFORM, dendrogram = "row", scale_fill_gradient_fun = scale_fill_gradient2(low="blue", high="red", midpoint=0, limits=c(-4,6)))
Is this a result of the way my data is shaped? Is there a command where I can make the heatmap flip so the high expression values show on top, as in my other heatmaps?
Thanks in advance!
Heatmaps are typically ordered by hierarchical clustering rather than by the magnitude of the values. To order by magnitude (high at the top or vice versa) you would need to supply a dendrogram (as Tal suggested) or manually re-order your data, for example by the row sums or row means (or the column sums/means).
See the toy example below.
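library(heatmaply)
mat <- scale(mtcars)
# original row order, no clustering
heatmaply(mat, dendrogram = "none")
# rows re-ordered by their row sums
heatmaply(mat[order(rowSums(mat)), ], dendrogram = "none")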
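The same re-ordering works with row means, and the direction is just a matter of decreasing = TRUE or FALSE. A small sketch; which end gets drawn at the top depends on the plotting function, so I am assuming you may need to flip the result:

```r
mat <- scale(mtcars)

# Order rows from highest to lowest mean value
ord <- order(rowMeans(mat), decreasing = TRUE)
mat.sorted <- mat[ord, ]

# Reverse the rows if the high values end up at the wrong end of the plot
mat.flipped <- mat.sorted[nrow(mat.sorted):1, ]

# Draw only in an interactive session with heatmaply installed
if (interactive() && requireNamespace("heatmaply", quietly = TRUE)) {
  print(heatmaply::heatmaply(mat.sorted, dendrogram = "none"))
}
```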
In R I have a cloud of data around zero and some data around 1. I want to "rescale" my heat colors to distinguish the lower numbers. This has to be done as a continuous rainbow; I don't want "discrete colors". I tried using breaks in image.plot, but it doesn't work.
image.plot(X,Y,as.matrix(mymatrix),col=heat.colors(800),asp=1,scale="none")
I tried :
lowerbreak=seq(min(values),quantile2,len=80)
highbreak=seq(quantile2+0.0000000001,max(values),len=20)
brks=c(lowerbreak,highbreak) # "break" is a reserved word in R and cannot be used as a variable name
ii <- cut(values, breaks = brks, include.lowest = TRUE)
colors <- colorRampPalette(c("lightblue", "blue"))(99)[ii]
Here's an approach using the "squash" library. With makecmap(), you specify your colour values and breaks, and you can also specify that it should be log stretched using the base parameter. It's a bit complex, but gives you granular control. I use it to colorize skewed data, where I need more definition in the "low end".
To achieve the rainbow palette, I used the built-in "jet" colour function, but you can use any colour set - I give an example for creating a greyscale ramp with "colorRampPalette".
Whatever ramp you use, it will take some playing with the base value to optimize for your data.
install.packages("squash")
library("squash")
#choose your colour thresholds - outliers will be RED
minval=0 #lowest value to get a colour
maxval=2.0 #highest value to get a colour
n.cols=100 #how many colours do you want in your palette?
col.int=1/n.cols
#create your palette
colramp=makecmap(x=seq(minval,maxval,col.int),
                 n=n.cols,
                 breaks=prettyLog,
                 symm=FALSE,
                 base=10, #to give ramp a log(base) stretch
                 colFn=jet,
                 col.na="red",
                 right=FALSE,
                 include.lowest=TRUE)
# If you don't like the colFn options in "makecmap", define your own!
# Here's an example in greyscale; pass this to "colFn" above
user.colfn=colorRampPalette(c("black","white"))
Example for using colramp in a plot (assuming you've already created colramp as above somewhere in your program):
varx=1:100
vary=1:100
plot(varx,vary,col=colramp$colors) #colors is the 2nd vector in the colramp list
To select specific colours, subset from the list via, e.g., colors[1:20] (if you try this with the example above, the first colors will repeat 5 times - not really useful but you get the logic and can play around).
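If what you want is one colour per data value rather than the palette recycled along the points, the values have to be looked up against the breaks. Here is a base-R sketch of that idea with log-spaced breaks; it is not the squash API, and the names vals, brks and pal are made up for the example:

```r
# Made-up skewed values to colour
vals <- c(0.01, 0.02, 0.05, 0.1, 0.5, 1, 2)

# 11 log-spaced break points give 10 bins, narrow at the low end
brks <- 10^seq(-2, log10(2), length.out = 11)

# One colour per bin
pal <- colorRampPalette(c("blue", "cyan", "yellow", "red"))(10)

# Bin index of each value; all.inside keeps the extremes in bins 1 and 10
idx <- findInterval(vals, brks, all.inside = TRUE)
point.cols <- pal[idx]

# Draw only in an interactive session
if (interactive()) {
  plot(seq_along(vals), vals, col = point.cols, pch = 19, log = "y")
}
```

Each point now gets the colour of the bin its value falls in, so the low values are spread over several colours.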
In my case, I had a grid of values that I wanted to turn into a coloured raster image (i.e. colour mapping some continuous data). Here's example code for that, using a made up matrix:
#create a "dummy matrix"
matx=matrix(data=c(rep(2,50),rep(0,500),rep(0.5,500),rep(1,500),rep(1.5,500)),nrow=50,ncol=41,byrow=F)
#transpose the matrix
# the output of "savemat" is rotated 90 degrees to the left
# so savemat(maty) will be a colorized version of (matx)
maty=t(matx)
#savemat creates an image using colramp
savemat(x=maty,
        filename="/Users/KeeganSmith/Desktop/matx.png",
        map=colramp,
        outlier="red",
        dev="png",
        do.dev.off=TRUE)
When using colorRampPalette, you can set the bias argument to emphasise low (or high) values.
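Something like colorRampPalette(heat.colors(100), bias=3)(100) will concentrate the ramp on the lower values, helping them to be more visually distinguishable. Note that colorRampPalette returns a function, which you then call with the number of colours you want.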
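For the data described in the question, bias can also be combined with unequal breaks in base image(). This is just a sketch under my own assumptions: the values are made up, the split point is the 90% quantile, and mymatrix.demo is a stand-in for the real matrix.

```r
# Made-up data: a cloud near zero plus a few values near one
values <- c(seq(0, 0.1, length.out = 90), seq(0.9, 1.1, length.out = 10))
mymatrix.demo <- matrix(values, nrow = 10)

# Split point between the low cloud and the rest
q <- unname(quantile(values, 0.9))

# 80 narrow bins below q and 19 wide bins above it (100 break points)
brks <- c(seq(min(values), q, length.out = 81),
          seq(q, max(values), length.out = 20)[-1])

# colorRampPalette() returns a function; call it with the number of bins
pal.fun <- colorRampPalette(c("lightblue", "blue"), bias = 3)
cols <- pal.fun(length(brks) - 1)

# Draw only in an interactive session
if (interactive()) {
  image(mymatrix.demo, col = cols, breaks = brks)
}
```

Either the bias or the unequal breaks alone may already be enough, so it is worth trying them separately first.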
I am curious if there's a way to improve upon the answers mentioned in 1
For example,
1) Can the x and y columns of the data frame be colored differently, rather than all red or with a color gradient? As specified in the ggplot2 documentation, I don't want to color the columns according to a factor.
2) Furthermore, can the shape of the points be set separately for each column of the data frame (e.g. triangles for x values and circles for y values)?
To achieve the same, afaik, I tried to plot each column separately by tweaking the code mentioned in 1
All I got was the same plot with every point in red, and the shape did not change when I used the aes() function for each column separately.
Thanks and Regards,
Yogesh
I am a newbie to R and I am trying to do some clustering on a data table where rows represent individual objects and columns represent the features that have been measured for these objects. I've worked through some clustering tutorials and I do get some output. However, the heatmap that I get after clustering does not correspond at all to the heatmap produced from the same data table with another programme. While the heatmap of that programme indicates clear differences in marker expression between the objects, my heatmap doesn't show many differences and I cannot recognize any clustering (i.e., colour) pattern on it; it just seems to be a randomly jumbled set of colours that are close to each other (no big contrast). Here is an example of the code I am using; maybe someone has an idea of what I might be doing wrong.
mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
datalog <- log(datamat)
I am using log values for the clustering because I know that the other programme does so, too
library(gplots)
hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
col=colorpanel(40, "black","yellow","green"),
scale="column", RowSideColors=mycol)
Again, I plot the original colours but use the log-clusters because I know that this is what the other programme does.
I tried to play around with the methods, but I don't get anything that would at least somehow look like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I have somehow to scale or normalize the data by column). I also tried to cluster with k-means, but again, this didn't help. My idea was that the colour scale might not be used completely because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, this still did not reveal proper clusters.
Is there anything else I could play around with?
And is it possible to change the colour scale with heatmap so that outliers are found in the last bin that has a range of "everything greater than a particular value"? I tried to do this with heatmap.2 (argument "breaks"), but I didn't quite succeed and also I didn't manage to put the row side colours that I use with the heatmap function.
If you are okay with using heatmap.2 from the gplots package, it lets you supply breaks that assign colors to the value ranges in your heatmap.
For example if you had 3 colors blue, white, and red with the values going from low to high you could do something like this:
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
In this case you have 3 sets of values that correspond to the 3 colors. The values will of course differ depending on your data.
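For the "everything greater than a particular value" bin, one simple option is to cap the values yourself before plotting, so that outliers land in the outermost colors. A sketch under my own assumptions (mtcars as example data, a cap of 2 standard deviations, 16 colors):

```r
# Column-scaled example data
z <- scale(as.matrix(mtcars))

# Cap values at +/- 2 so outliers fall into the first and last bins
cap <- 2
z.capped <- pmin(pmax(z, -cap), cap)

# 17 break points for 16 colors, covering exactly the capped range
my.breaks <- seq(-cap, cap, length.out = 17)
my.cols <- colorRampPalette(c("blue", "white", "red"))(16)

# Draw only in an interactive session with gplots installed
if (interactive() && requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(z.capped, scale = "none", trace = "none",
                    col = my.cols, breaks = my.breaks)
}
```

The cap only affects the colors; if the clustering should use the uncapped values, compute the dendrogram beforehand and pass it in via Rowv.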
One thing you are doing in your program is calling hclust on your data and then calling heatmap on it. However, if you look at the heatmap manual page, the description of the hclustfun argument states:
Defaults to hclust.
So I don't think you need to do that. You might want to take a look at some similar questions that I had asked that might help to point you in the right direction:
Heatmap Question 1
Heatmap Question 2
If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.
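In the meantime, here is a small self-contained sketch of the setup from the question (cluster on log values, plot the original values, one side color per cluster) that you can compare against your own output. The data and k = 4 are my own choices for illustration; a precomputed dendrogram is one way to go when the clustering should use a different matrix than the one being plotted.

```r
# Example data: strictly positive columns of mtcars, so log() is safe
dat <- as.matrix(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])
datlog <- log(dat)

# Cluster rows on the log values using correlation distance
hr <- hclust(as.dist(1 - cor(t(datlog), method = "pearson")),
             method = "complete")

# One color per cluster instead of a random draw from a large palette
k <- 4
cl <- cutree(hr, k = k)
side.cols <- rainbow(k)[cl]

# Plot the original values, ordered by the log-based dendrogram
if (interactive()) {
  heatmap(dat, Rowv = as.dendrogram(hr), Colv = NA,
          scale = "column", RowSideColors = side.cols)
}
```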