Rescaling a color palette in R

In R I have a cloud of data around zero and some data around 1. I want to "rescale" my heat colors to distinguish the lower numbers. This has to be done in a continuous, rainbow-like way; I don't want "discrete colors". I tried using breaks in image.plot, but it doesn't work.
image.plot(X,Y,as.matrix(mymatrix),col=heat.colors(800),asp=1,scale="none")
I tried :
lowerbreak=seq(min(values),quantile2,len=80)
highbreak=seq(quantile2+0.0000000001,max(values),len=20)
# "break" is a reserved word in R, so the vector needs another name
brks=c(lowerbreak,highbreak)
ii <- cut(values, breaks = brks,
include.lowest = TRUE)
colors <- colorRampPalette(c("lightblue", "blue"))(99)[ii]

Here's an approach using the "squash" library. With makecmap(), you specify your colour values and breaks, and you can also specify that it should be log stretched using the base parameter. It's a bit complex, but gives you granular control. I use it to colorize skewed data, where I need more definition in the "low end".
To achieve the rainbow palette, I used the built-in "jet" colour function, but you can use any colour set - I give an example for creating a greyscale ramp with "colorRampPalette".
Whatever ramp you use, it will take some playing with the base value to optimize for your data.
install.packages("squash")
library("squash")
#choose your colour thresholds - outliers will be RED
minval=0 #lowest value to get a colour
maxval=2.0 #highest value to get a colour
n.cols=100 #how many colours do you want in your palette?
col.int=1/n.cols
#create your palette
colramp=makecmap(x=seq(minval,maxval,col.int),
n=n.cols,
breaks=prettyLog,
symm=F,
base=10,#to give ramp a log(base) stretch
colFn=jet,
col.na="red",
right=F,
include.lowest=T)
# If you don't like the colFn options in "makecmap", define your own!
# Here's an example in greyscale; pass this to "colFn" above
user.colfn=colorRampPalette(c("black","white"))
Example for using colramp in a plot (assuming you've already created colramp as above somewhere in your program):
varx=1:100
vary=1:100
plot(varx,vary,col=colramp$colors) #colors is the 2nd vector in the colramp list
To select specific colours, subset from the list via, e.g., colors[1:20] (if you try this with the example above, the first colors will repeat 5 times - not really useful but you get the logic and can play around).
In my case, I had a grid of values that I wanted to turn into a coloured raster image (i.e. colour mapping some continuous data). Here's example code for that, using a made up matrix:
#create a "dummy matrix"
matx=matrix(data=c(rep(2,50),rep(0,500),rep(0.5,500),rep(1,500),rep(1.5,500)),nrow=50,ncol=41,byrow=F)
#transpose the matrix
# the output of "savemat" is rotated 90 degrees to the left
# so savemat(maty) will be a colorized version of (matx)
maty=t(matx)
#savemat creates an image using colramp
savemat(x=maty,
filename="/Users/KeeganSmith/Desktop/matx.png",
map=colramp,
outlier="red",
dev="png",
do.dev.off=T)

When using colorRampPalette, you can set the bias argument to emphasise low (or high) values.
Something like colorRampPalette(heat.colors(100),bias=3) focuses the 'ramp' on the lower values, making them easier to distinguish visually.
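As a rough illustration of the bias argument (a sketch, not part of the original answer): colorRampPalette returns a function, so it still has to be called with the number of colours wanted. The names base_cols, pal_linear and pal_biased and the example matrix z are made up for this example. Note that bias only has an effect when the ramp is built from more than two colours, because it only moves the interior colours.

```r
# Base ramp with many interior colours, so that bias has something to move
base_cols <- heat.colors(100)

pal_linear <- colorRampPalette(base_cols)(100)            # evenly spaced colours
pal_biased <- colorRampPalette(base_cols, bias = 3)(100)  # colours change fastest at the low end

# Both ramps start and end on the same colours; only the spacing in between differs
pal_linear[c(1, 100)]
pal_biased[c(1, 100)]

# Compare the two ramps on made-up data that sits mostly near zero
set.seed(1)
z <- matrix(c(runif(380, 0, 0.1), runif(20, 0.9, 1)), nrow = 20)
par(mfrow = c(1, 2))
image(z, col = pal_linear, main = "bias = 1")
image(z, col = pal_biased, main = "bias = 3")
```

The bias value itself usually needs some trial and error for a given data set.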

Related

How do I edit the dendrogram whilst using heatmap.2 function in r

I am trying to create a heatmap to represent the change of gene expression over a period of time. The code I have used is this:
coul <- colorRampPalette(brewer.pal(8, "Reds"))(25)
heatmap.2(dm, dendrogram=c("row"),Colv=NA, xlab="Time points", ylab="Genes of interest", scale="row", col=coul, tracecol = NA)
As one can't really see the branches of the dendrogram that well, I was wondering whether you can somehow stretch it out to make it more visible?
I was also wondering how to remove the "colour key and histogram" label.
Many thanks!
If you don't mind the color guide being wide, just use the lwid option: it takes a vector that sets the width ratio of the dendrogram to the heatmap. Below I use c(3,3), which means 1:1.
set.seed(100)
x = matrix(rnorm(1000),100,10)
heatmap.2(x,trace="none",Colv=NA,dendrogram=c("row"),tracecol = NA)
heatmap.2(x,trace="none",Colv=NA,
dendrogram=c("row"),lwid=c(3,3),tracecol = NA,keysize=0.75)
One way to narrow the margins of the color guide is to use key.par, where you set the margins on the right to be larger (I use 10 in example below).
heatmap.2(x,trace="none",Colv=NA,dendrogram=c("row"),
lwid=c(3,3),tracecol = NA,keysize=0.75,key.par=list(mar=c(3,3,3,10)))

Manually creating an object that looks like a heatmap color key

I'm trying to create a key for a heatmap, but as far as I know, I cannot use the existing tools for adding a legend since I've generated the colors myself (I manually turn a scaled variable into rgb values for a short rainbow, [255,0,0] to [0,0,255]).
Basically, all I want to do is use the rightmost 10th of the screen to create a rectangle with these 10 colors: "#0000FF", "#0072FF", "#00E3FF", "#00FFAA", "#00FF38", "#39FF00", "#AAFF00", "#FFE200", "#FF7100", "#FF0000"
with three numerical labels - at 0, max/2, and max
In essence, I want to manually produce an object that looks like a rudimentary heatmap color key.
As far as I know, split.screen can only split the screen in half, which isn't what I'm looking for. I want the graphic I already know how to produce to take up the leftmost 90% of the screen, and I want this colored rectangle to take up the other 10%.
Thanks.
EDIT: I greatly appreciate the advice about the best way to do the plot. That said, I would still like to know the best way to do the task originally asked: creating the legend by hand. I am already able to produce the exact heatmap graphic that I'm looking for; the false coloring wasn't the only problem I was having with ggplot, it was just the final factor that convinced me to switch. I need a non-ggplot solution.
EDIT #2: This is close to the solution I am looking for, except this only goes up to 10 instead of accepting a maximum value as a parameter (I will be running this code on multiple data-sets, all with different maximum values - I want the legend to reflect this). Additionally, if I change the size of the graph, the key falls apart into disconnected squares.
Take a look at the layout function. I think you want something like this:
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(9,1))
## plot heatmap
## plot legend
I would also recommend the ggplot2 package and the geom_tile function which will take care of all of this for you.
Assuming your data is in a data frame with the x and y coordinates and heatmap value (e.g. gdat <- data.frame(x_coord=c(1,2,...), y_coord=c(1,1,...), val=c(6,2,...))) Then you should be able to produce your desired heat map plot with the following ggplot command:
ggplot(gdat) + geom_tile(aes(x=x_coord, y=y_coord, fill=val)) +
scale_fill_gradient(low="#0000FF", high="#FF0000")
To get your data into this format you may want to look into the very useful reshape2 package.
Given the strict no-ggplot restriction on this question, here is how one could produce the plot with just base R.
colors <- c("#0000FF", "#0072FF", "#00E3FF", "#00FFAA", "#00FF38",
"#39FF00", "#AAFF00", "#FFE200", "#FF7100", "#FF0000")
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(9,1))
plot(rnorm(20), rnorm(20), col=sample(colors, 20, replace=TRUE))
par(mar=c(0,0,0,0))
plot(x=rep(1,10), y=1:10, col=colors, pch=15, cex=7.1)
You may have to adjust the cex for your device.
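For the original request (a key whose labels follow a maximum value passed in as a parameter, and which stays connected when the window is resized), one possible base R sketch is to draw the key with image() instead of point symbols. The helper name draw_color_key and the example value of max_val are assumptions made for this illustration, not part of the answer above.

```r
colors <- c("#0000FF", "#0072FF", "#00E3FF", "#00FFAA", "#00FF38",
            "#39FF00", "#AAFF00", "#FFE200", "#FF7100", "#FF0000")

# Hypothetical helper: draws a vertical colour key labelled at 0, max/2 and max
draw_color_key <- function(colors, max_val) {
  n <- length(colors)
  edges <- seq(0, max_val, length.out = n + 1)   # n colour cells need n + 1 edges
  image(x = c(0, 1), y = edges, z = matrix(seq_len(n), nrow = 1),
        col = colors, axes = FALSE, xlab = "", ylab = "")
  ticks <- c(0, max_val / 2, max_val)
  axis(4, at = ticks, labels = format(ticks), las = 1)  # labels on the right-hand side
  box()
  invisible(ticks)
}

max_val <- 42  # made-up maximum; use the maximum of your own data
layout(matrix(c(1, 2), 1, 2), widths = c(9, 1))
plot(rnorm(20), rnorm(20), pch = 19, col = sample(colors, 20, replace = TRUE))
# Narrow margins for the key panel; widen the key column in layout()
# if R reports "figure margins too large" on a small device
par(mar = c(5, 0.2, 4, 2.2))
draw_color_key(colors, max_val)
```

Because the key is a single image rather than ten separate symbols, the cells stay adjacent at any device size, and only max_val changes between data sets.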

Scaling heat map colours for multiple heat maps

So I have a bunch of matrices that I am trying to plot as heatmaps. I am using the heatmap.2() function in the ggplot2 package.
I have been trying for quite some time with it, and I am sure there is a very simple fix, but my issue is this:
How do I keep the colours consistent between heatmaps? For example, to make the values that provide the colours absolute as opposed to relative.
I have tried doing something similar to this question:
R/ggplot: Reuse color key for multiple heat maps
But I was unable to figure out the ggplot function; I kept receiving an error message stating that there were "no layers in plot".
After reading the comments on the above question, I tried using scales::rescale() and discrete_scale() but the former does not remove the problem, while the latter did not work.
I am fully aware that I might be doing something very simple wrong, and just being a bit of an idiot, but for the life of me I can't figure out where I am going wrong.
As for the data itself, I am trying to plot 10 matrices/heatmaps, each 10x10 cells (showing change over time) and the values in the cells range from 1.0 to 1.2.
As an example, this is the code I am using (once I have my 10x10 matrix).
Matrix1<-matrix(data=(runif(100,1.0,1.2)),nrow=10,ncol=10)
heatmap.2(Matrix1, Colv=NA, Rowv=NA, dendrogram="none",
trace="none", key=F, cellnote=round(Matrix1,digits=2),
notecex=1.6, notecol="black",
labRow=seq(10,100,10), labCol=seq(10,100,10),
main="Title1", xlab="Xlab1", ylab="Ylab1"
)
So any help with either figuring out how to create the scaled values for the heatmap.2() function, or how I can use the ggplot() function would be greatly appreciated!
It's important to note that heatmap.2 is not a ggplot2 function. The ggplot2 package is not necessarily compatible with all plotting types. If you look at the ?heatmap.2 help page, the upper left corner shows you where the function is from. heatmap.2 {gplots} means that the function comes from the gplots package. These are different packages, so they follow different rules.
To get the same colors across different heatmaps, you want to explicitly set the breaks= parameter. By default it splits the observed range of the data into equal chunks. But since each data set may have a different min and max, these chunks may have different start and end points. By specifying breaks, you can make them all consistent. Since your data ranges from 1 to 1.2, you can set
mybreaks <- seq(1.0, 1.2, length.out=7)
and then in your call add
heatmap.2(Matrix1, Colv=NA, Rowv=NA, dendrogram="none",
...
breaks=mybreaks,
...
)
That should make them all match up.
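To see why fixed breaks make the colours comparable, here is a small sketch in base R (no gplots needed). The helper value_to_col, the colour vector mycols and the two test matrices are made up for this illustration; as far as I know, heatmap.2 hands its breaks on to image(), which bins the values in a similar way.

```r
# One set of breaks and colours, shared by every heatmap
mybreaks <- seq(1.0, 1.2, length.out = 7)      # 7 breaks define 6 bins
mycols   <- heat.colors(length(mybreaks) - 1)  # one colour per bin

# Hypothetical helper: look up the colour of each cell from the shared breaks
# (all.inside = TRUE puts out-of-range values into the first or last bin)
value_to_col <- function(m, breaks, cols) {
  bin <- findInterval(m, breaks, rightmost.closed = TRUE, all.inside = TRUE)
  matrix(cols[bin], nrow = nrow(m))
}

set.seed(42)
m1 <- matrix(runif(100, 1.00, 1.10), nrow = 10)  # low-valued data set
m2 <- matrix(runif(100, 1.10, 1.20), nrow = 10)  # high-valued data set
m1[1, 1] <- 1.15
m2[1, 1] <- 1.15                                 # same value in both matrices

# The same value gets the same colour, whatever the range of the other cells
value_to_col(m1, mybreaks, mycols)[1, 1] == value_to_col(m2, mybreaks, mycols)[1, 1]
```

In the heatmap.2 calls themselves, the equivalent is to pass the same breaks=mybreaks (and, if you choose your own colours, the same col=mycols) to every plot.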
Maybe this will help you. With the following code multiple heatmaps are stored in a list and displayed in a grid later on. This will allow you to control the colours of each heatmap since each heatmap is created separately. So in this case I chose to use green and red for the number range in each chart.
data(mtcars)
require(ggplot2)
require(gridExtra)
myplotslist2 <- list()
var = c("mpg", "wt", "drat")
new = cbind(mtcars, "variable")
new = cbind(car = rownames(mtcars), new)
for (i in 1:length(var)){
t= paste("new[[\"variable\"]] = \"", var[[i]],"\"; a = ggplot(new, aes(variable, car)) + geom_tile(aes(fill = ", var[[i]], "),colour = \"white\") + scale_fill_gradient(low = \"red\", high = \"green\") + theme(axis.title.y=element_blank(), axis.text.y=element_blank(),legend.position=\"none\"); myplotslist2[[i]] = a")
eval(parse(text=t))
}
grid.arrange(grobs=myplotslist2, ncol=length(var))
I hope this helps.
I explain more in my blogpost. https://dwh-businessintelligence.blogspot.nl/2016/05/pca-3d-and-k-means.html

How to change heatmap.2 color range in R?

I'm using gplots to produce a heatmap showing log2-fold changes of treatment groups versus paired controls, with the following code:
heatmap.2(as.matrix(SeqCountTable), col=redgreen(75),
density.info="none", trace="none", dendrogram=c("row"),
symm=F,symkey=T,symbreaks=T, scale="none")
I output a heat map with real fold change values (i.e., non Row-Z score) which is what I'm after, in the Red-Black-Green color scheme that is every biologist's favorite!
The actual range of log2-fold change is -3/+7, with many values in the -2/-1 and +1/+2 range, which appear as dark red/green (respectively). This makes the whole heatmap quite dark and so difficult to interpret.
Is there a way of skewing the color gradient to make it less linear? That is, so that the gradient from black to quite bright occurs over a smaller range?
And / or change the color range to be asymmetric, i.e., to run from -3/+7, as the data does, rather than -7/+7 as the scale currently does, with black still centered on zero?
I got the color range to be asymmetric simply by changing the symkey argument to FALSE
symm=F,symkey=F,symbreaks=T, scale="none"
Solved the color issue with colorRampPalette with the breaks argument to specify the range of each color, e.g.
# the segment start points are nudged so that no break value is repeated
colors = c(seq(-3,-2,length=100),seq(-1.99,0.5,length=100),seq(0.51,6,length=100))
my_palette <- colorRampPalette(c("red", "black", "green"))(n = 299)
Altogether
heatmap.2(as.matrix(SeqCountTable), col=my_palette,
breaks=colors, density.info="none", trace="none",
dendrogram=c("row"), symm=F,symkey=F,symbreaks=T, scale="none")
You could try to create your own color palette with colorRampPalette (part of base R's grDevices package)
my_palette <- colorRampPalette(c("green", "black", "red"))(n = 1000)
and see how this looks. But I assume in your case only scaling would help if you really want to keep the black in "the middle". You can simply use my_palette instead of redgreen().
I also recommend that you check out the RColorBrewer package, which has pretty nice built-in palettes, and the interactive ColorBrewer website.
I think you need to set symbreaks = FALSE
That should allow for asymmetrical color scales.
Here's another option for those not using heatmap.2 (aheatmap is good!)
Make a sequential vector of 100 values from the min to the max of your input matrix, find the value closest to 0 in it, make two vectors of colours to and from the desired midpoint, then combine and use them:
breaks <- seq(from=min(range(inputMatrix)), to=max(range(inputMatrix)), length.out=100)
midpoint <- which.min(abs(breaks - 0))
rampCol1 <- colorRampPalette(c("forestgreen", "darkgreen", "black"))(midpoint)
rampCol2 <- colorRampPalette(c("black", "darkred", "red"))(100-(midpoint+1))
rampCols <- c(rampCol1,rampCol2)
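A self-contained version of the snippet above, run on a made-up matrix (inputMatrix here is random data spanning roughly -3 to 7, purely for illustration). With 100 breaks there are 99 bins, so the two ramps together must supply 99 colours, which is what the lengths midpoint and 100-(midpoint+1) add up to.

```r
set.seed(1)
inputMatrix <- matrix(runif(200, min = -3, max = 7), nrow = 20)  # made-up data

breaks   <- seq(from = min(inputMatrix), to = max(inputMatrix), length.out = 100)
midpoint <- which.min(abs(breaks - 0))   # index of the break closest to zero

rampCol1 <- colorRampPalette(c("forestgreen", "darkgreen", "black"))(midpoint)
rampCol2 <- colorRampPalette(c("black", "darkred", "red"))(100 - (midpoint + 1))
rampCols <- c(rampCol1, rampCol2)

length(rampCols) == length(breaks) - 1   # one colour per bin
rampCols[midpoint]                       # the bin starting at the midpoint break is black

# Quick base R check; the same breaks and colours can be handed to a heatmap function
image(inputMatrix, breaks = breaks, col = rampCols)
```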

Clustering and heatmap in R

I am a newbie to R and I am trying to do some clustering on a data table where rows represent individual objects and columns represent the features that have been measured for these objects. I've worked through some clustering tutorials and I do get some output, however, the heatmap that I get after clustering does not correspond at all to the heatmap produced from the same data table with another programme. While the heatmap of that programme does indicate clear differences in marker expression between the objects, my heatmap doesn't show much differences and I cannot recognize any clustering (i.e., colour) pattern on the heatmap, it just seems to be a randomly jumbled set of colours that are close to each other (no big contrast). Here is an example of the code I am using, maybe someone has an idea on what I might be doing wrong.
mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
datalog <- log(datamat)
I am using log values for the clustering because I know that the other programme does so, too
library(gplots)
hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
col=colorpanel(40, "black","yellow","green"),
scale="column", RowSideColors=mycol)
Again, I plot the original colours but use the log-clusters because I know that this is what the other programme does.
I tried to play around with the methods, but I don't get anything that would at least somehow look like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I have somehow to scale or normalize the data by column). I also tried to cluster with k-means, but again, this didn't help. My idea was that the colour scale might not be used completely because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, this still did not reveal proper clusters.
Is there anything else I could play around with?
And is it possible to change the colour scale with heatmap so that outliers are found in the last bin that has a range of "everything greater than a particular value"? I tried to do this with heatmap.2 (argument "breaks"), but I didn't quite succeed and also I didn't manage to put the row side colours that I use with the heatmap function.
If you are okay with using heatmap.2 from the gplots package, it will allow you to add breaks that assign colors to ranges of values in your heatmap.
For example if you had 3 colors blue, white, and red with the values going from low to high you could do something like this:
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
In this case you have 3 sets of values that correspond to the 3 colors, the values will differ of course depending on what values you have with your data.
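For the specific request of a last bin meaning "everything greater than a particular value", one possible sketch is to space the breaks evenly up to a cutoff and then add one final break at the data maximum. The cutoff, the example data and the variable names below are assumptions made for illustration only.

```r
set.seed(7)
x <- matrix(c(rnorm(98), 15, 20), nrow = 10)  # made-up data with two large outliers

cutoff <- 3                                   # hypothetical threshold for "outlier"
regular_breaks <- seq(min(x), cutoff, length.out = 16)  # 15 evenly spaced bins
all_breaks <- c(regular_breaks, max(x))       # plus one wide bin above the cutoff

# One colour per bin: 17 breaks give 16 colours
cols <- colorRampPalette(c("blue", "white", "red"))(length(all_breaks) - 1)

# Both outliers land in the last bin, so they no longer stretch the colour scale
findInterval(c(15, 20), all_breaks, rightmost.closed = TRUE)

image(x, breaks = all_breaks, col = cols)
```

The same all_breaks and cols vectors could be passed as breaks= and col= to heatmap.2.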
One thing you are doing in your program is calling hclust on your data and then calling heatmap on it. However, the heatmap manual page says of its hclustfun argument:
Defaults to hclust.
So I don't think you need to do that. You might want to take a look at some similar questions that I had asked that might help to point you in the right direction:
Heatmap Question 1
Heatmap Question 2
If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.
