R heatmap with diverging colour palette - r

I am trying to create a simple heatmap in R, using a diverging colour palette. I want to use a gradient so that all numbers below a threshold N are designated a color (say purple), and all numbers above the threshold are designated another color (say orange). The further away the number is from the threshold, the darker the color should be.
Here is a sample dataset:
Division,COL1,COL2,COL3,COL4,COL5,COL6,COL7
Division 1,31.9221884012222,75.8181694429368,97.0480443444103,96.295954938978,70.5677134916186,63.0451830103993,93.0396212730557
Division 2,85.7012346852571,29.0621076244861,16.9130333233625,94.6443660184741,19.9103083927184,61.9562198873609,72.3791105207056
Division 3,47.1665125340223,99.4153356179595,8.51091076619923,79.1276383213699,41.915355855599,7.45079894550145,24.6946100145578
Division 4,66.0743870772421,24.6163331903517,78.694460215047,42.04714265652,50.2694897353649,73.0409651994705,87.3745442833751
Division 5,29.6664374880493,35.4036891367286,19.2967326845974,5.48460693098605,32.4517334811389,15.5926876701415,76.0523204226047
Division 6,95.4969164915383,8.63230894319713,61.7535551078618,24.5590241160244,25.5453423131257,56.397921172902,44.4693325087428
Division 7,87.5015622004867,28.7770316936076,56.5095080062747,34.6680747810751,28.1923673115671,65.0204187724739,13.795713102445
Division 8,70.1077231671661,72.4712177179754,38.4903231170028,36.1821102909744,97.0875509083271,17.184783378616,78.2292529474944
Division 9,47.3570406902581,90.2257485780865,65.6037972308695,77.0234781783074,25.6294377148151,84.900529962033,82.5080851092935
Division 10,58.0811711959541,0.493217632174492,58.5604055318981,53.5780876874924,9.12552657537162,20.313960686326,78.1371118500829
Division 11,34.6708688884974,76.711881859228,22.6064443588257,22.1724311355501,5.48891355283558,79.1159523651004,56.8405059166253
Division 12,33.6812808644027,44.1363711375743,70.6362190190703,3.78900407813489,16.6075889021158,9.12654218263924,39.9711143691093
Here is a simple snippet to produce a heatmap from the above data
data <- read.csv("dataset.csv", sep=",")
row.names(data) <- data$Division
data <- data[,2:8] # columns 2 to 8 hold COL1..COL7
data_matrix <- data.matrix(data)
heatmap(data_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c(5,10))
How can I modify the above code to produce:
a color gradient (orange) for all numbers ABOVE 50 (darker the further the number is from 50)
a color gradient (purple) for all numbers BELOW 50 (darker the further the number is from 50)
Nice to have (but optional) write the number value in the grid cell
Nice to have (but optional), use a different color for grid cell that is EXACTLY the threshold number (50 in this case)
[[Edit]]
I have just seen this question on SO, which seems to be very similar. The answer uses ggplot (which I have no experience of), and so far I have been unable to adapt the ggplot solution to my slightly more complicated data.

This should get you most of the way. (Note that you'll need to set scale="none" if you want the plotted colors to correspond to the actual (rather than the rescaled) values of the cells).
ncol <- 100
## Make a vector with ncol colors
cols <- RColorBrewer::brewer.pal(11,"PuOr") # OR c("purple","white","orange")
rampcols <- colorRampPalette(colors = cols, space="Lab")(ncol)
## Use a distinct color (green) for the bin just above the midpoint
rampcols[(ncol/2) + 1] <- rgb(t(col2rgb("green")), maxColorValue=255)
## Make a vector with ncol+1 breaks
rampbreaks <- seq(0, 100, length.out = ncol+1)
## Try it out
heatmap(data_matrix, Rowv = NA, Colv = NA, scale="none",
col = rampcols, breaks = rampbreaks)
EDIT
For finer control over the placement of the threshold, I'd suggest creating two separate palettes -- one for values less than the threshold and one for values above the threshold -- and then "suturing" them together. Try something like this, playing around with different values for Min, Max, Thresh, etc.:
nHalf <- 50
Min <- 0
Max <- 100
Thresh <- 50
## Make vector of colors for values below threshold
rc1 <- colorRampPalette(colors = c("purple", "white"), space="Lab")(nHalf)
## Make vector of colors for values above threshold
rc2 <- colorRampPalette(colors = c("white", "orange"), space="Lab")(nHalf)
rampcols <- c(rc1, rc2)
## In your example, this line sets the color for values between 49 and 51.
rampcols[c(nHalf, nHalf+1)] <- rgb(t(col2rgb("green")), maxColorValue=255)
rb1 <- seq(Min, Thresh, length.out=nHalf+1)
rb2 <- seq(Thresh, Max, length.out=nHalf+1)[-1]
rampbreaks <- c(rb1, rb2)
heatmap(data_matrix, Rowv = NA, Colv = NA, scale="none",
col = rampcols, breaks = rampbreaks)
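The optional request to write each value in its grid cell is not covered above. A minimal base-R sketch, assuming image() and text() are acceptable in place of heatmap(): the palette and breaks are built as in the answer, while the small matrix m and the variable bin are made up for illustration.

```r
# Sutured palette and breaks, using the same names as the answer above.
nHalf <- 50; Min <- 0; Max <- 100; Thresh <- 50
rampcols <- c(colorRampPalette(c("purple", "white"))(nHalf),
              colorRampPalette(c("white", "orange"))(nHalf))
rampbreaks <- c(seq(Min, Thresh, length.out = nHalf + 1),
                seq(Thresh, Max, length.out = nHalf + 1)[-1])

# Small made-up matrix standing in for data_matrix.
m <- matrix(c(5, 25, 49.5, 50.5, 75, 95), nrow = 2, byrow = TRUE,
            dimnames = list(c("Division 1", "Division 2"),
                            paste0("COL", 1:3)))

# Bin index of each value: bin k covers (rampbreaks[k], rampbreaks[k + 1]].
bin <- cut(m, breaks = rampbreaks, include.lowest = TRUE, labels = FALSE)

# Draw the cells with row 1 at the top, then write the values on top.
nr <- nrow(m); nc <- ncol(m)
image(x = 1:nc, y = 1:nr, z = t(m[nr:1, , drop = FALSE]),
      col = rampcols, breaks = rampbreaks,
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:nc, labels = colnames(m), tick = FALSE)
axis(2, at = 1:nr, labels = rev(rownames(m)), tick = FALSE, las = 1)
text(x = as.vector(col(m)), y = as.vector(nr + 1 - row(m)),
     labels = round(as.vector(m), 1))
```

Because image() takes the same col and breaks pair as heatmap(), the colours should match the heatmap(..., scale="none") call; only the row and column label handling differs.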

I found this thread very useful and also pulled some ideas from here, but for my purposes I needed to generalize some things and wanted to use the RColorBrewer package. While I was working on it, Dr. Brewer (of Color Brewer fame) stopped in my office and told me I needed to interpolate within the smaller color breaks rather than just pick the end points. I thought others might find this useful, so I am posting my function here for posterity.
The function takes in your data vector, the name of a diverging ColorBrewer palette, and the center point for your color scheme (default is 0). It outputs a list containing 2 objects: a classIntervals object and a vector of colors. The function is set to interpolate a total of 100 colors, but that can be modified with some care.
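library(RColorBrewer)  # brewer.pal()
library(classInt)      # classIntervals()

diverge.color <- function(data, pal_choice = "RdGy", centeredOn = 0) {
  nHalf <- 50
  Min <- min(data, na.rm = TRUE)
  Max <- max(data, na.rm = TRUE)
  Thresh <- centeredOn
  pal <- brewer.pal(n = 11, pal_choice)
  ## Interpolate 10 colors between each adjacent pair of the 11 palette colors (100 in total)
  rc1 <- colorRampPalette(colors = c(pal[1], pal[2]), space = "Lab")(10)
  for (i in 2:10) {
    tmp <- colorRampPalette(colors = c(pal[i], pal[i + 1]), space = "Lab")(10)
    rc1 <- c(rc1, tmp)
  }
  ## 101 breaks: nHalf intervals on each side of the center point
  rb1 <- seq(Min, Thresh, length.out = nHalf + 1)
  rb2 <- seq(Thresh, Max, length.out = nHalf + 1)[-1]
  rampbreaks <- c(rb1, rb2)
  cuts <- classIntervals(data, style = "fixed", fixedBreaks = rampbreaks)
  return(list(cuts, rc1))
}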
In my work I am using this scheme to plot a raster layer (rs) using spplot like so:
brks<-diverge.color(values(rs))
spplot(rs,col.regions=brks[[2]],at=brks[[1]]$brks,colorkey=TRUE)
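The break construction in diverge.color implicitly assumes that the center point lies inside the range of the data; otherwise one of the two seq() calls runs backwards and the breaks are no longer increasing. A small base-R sketch of just that step with an explicit guard (diverge_breaks, centered_on and n_half are hypothetical names, not part of the function above):

```r
# Build n_half intervals on each side of the center point.
# Hypothetical helper: only the break logic, without RColorBrewer or classInt.
diverge_breaks <- function(x, centered_on = 0, n_half = 50) {
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  if (centered_on <= lo || centered_on >= hi) {
    stop("the center point must lie strictly inside the data range")
  }
  c(seq(lo, centered_on, length.out = n_half + 1),
    seq(centered_on, hi, length.out = n_half + 1)[-1])
}

x <- c(-2, -0.5, 0.3, 1, 6)   # made-up data straddling 0
brks <- diverge_breaks(x)      # 101 increasing breaks, 0 in the middle
```

Note that the intervals on the two sides generally have different widths (here 2/50 below zero and 6/50 above), which is what keeps the center color pinned to the chosen value.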

Related

Plotting terra raster with white color set for 0 values

I have a raster in terra whose values range around 0. I want my negative values to be blue and my positive values to be red. This question has already been asked here and here; however, the answers are for the raster package and do not apply to the terra package, as there is no breakpoints argument.
Here is what I tried so far:
# preparing the session
library(terra)
library(magrittr)
# preparing the raster, I'm making sure my data go through 0 but are not centered at 0
set.seed(1234)
rr <- rast(matrix(rnorm(400, 1.5, 1), nrow=20, ncol=20))
Even though my values are not centered at 0, I want the color intensity to be centered at 0. To do so, I find the range that I need:
the_range <- range(values(rr)) %>% abs %>% max %>% multiply_by(c(-1,1))
Then I create my color palette:
the_palette_fc <- leaflet::colorNumeric(palette = "RdBu", domain = the_range, reverse = TRUE)
If I set, for example, 31 color classes to my map and plot it normally with terra I get:
plot(rr, col=the_palette_fc(seq(the_range[1], the_range[2], length.out=31)))
You can see that the white color is not fixed to the zero value. That is probably because terra::plot calculates its own range, which is different from mine.
Reading around online, I found a possible option which is to play with the coltab argument. I tried it 2 ways:
coltab(rr) <- data.frame(
val=seq(the_range[1], the_range[2], length.out=31),
col=the_palette_fc(seq(the_range[1], the_range[2], length.out=31)))
plot(rr)
and:
coltab(rr) <- data.frame(
val=values(rr),
col=the_palette_fc(values(rr)))
plot(rr)
But neither attempt seems to give the wanted result. Also, no legend is added automatically, and I need one. Maybe the coltab argument should only be used for categorical values?
So is there a way in terra to fine-tune the color palette so that white is fixed at 0?
To set 0 as the midpoint, you can set the domain of your palette to c(-max(rr[]), max(rr[])):
the_palette_fc <- leaflet::colorNumeric(palette = "RdBu",
domain = c(-max(rr[]), max(rr[])),
reverse = TRUE)
the_colors <- the_palette_fc(seq(min(rr[]), max(rr[]), length.out = 50))
plot(rr, col = the_colors)
If you want the scale to go from darkest blue to darkest red with a transition of white at 0, it is a bit more involved, but you could do:
the_palette_fc <- leaflet::colorNumeric(palette = "RdBu",
domain = c(-max(rr[]), max(rr[])),
reverse = TRUE)
breakpoints <- seq(min(rr[]), max(rr[]), length.out = 50)
the_colors <- the_palette_fc(c(seq(-max(rr[]), 0, length = sum(breakpoints < 0)),
seq(0, max(rr[]), length = sum(breakpoints > 0))))
plot(rr, col = the_colors)
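The symmetric-domain idea from the start of this answer can be checked without terra or leaflet. The sketch below is a base-R illustration only: it swaps in colorRamp() with a plain blue-white-red ramp instead of the "RdBu" palette, and centered_hex is a hypothetical helper name.

```r
# Blue-white-red ramp on [0, 1]; 0.5 is exactly white.
ramp <- colorRamp(c("blue", "white", "red"))

# Map values to hex colors using a domain that is symmetric around 0,
# so that v = 0 always lands on the middle of the ramp.
centered_hex <- function(v, lim) {
  pos <- (v + lim) / (2 * lim)        # -lim -> 0, 0 -> 0.5, lim -> 1
  rgb(ramp(pos), maxColorValue = 255)
}

set.seed(1234)
vals <- rnorm(400, 1.5, 1)            # same distribution as the raster values
lim  <- max(abs(vals))                # symmetric limit, like the palette domain
at   <- seq(min(vals), max(vals), length.out = 50)
cols <- centered_hex(at, lim)         # colors for the actual data range only
```

Because the colors are sampled over min(vals) to max(vals) but positioned within -lim to lim, the short negative side only reaches a pale blue, which is the behaviour of the first snippet in the answer.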
terra::plot actually does have a "breaks" argument; but it is only documented in the examples so it was easy to overlook. You can do
library(terra)
set.seed(1234)
rr <- rast(matrix(rnorm(400, 1.5, 1), nrow=20, ncol=20))
ceil <- values(rr) |> abs() |> max() |> ceiling()
pal <- leaflet::colorNumeric(palette = "RdBu", domain=c(-ceil, ceil), reverse = TRUE)
b <- seq(-ceil, ceil, 1)
plot(rr, type="interval", breaks=b, col=pal(b))
In response to your comment:
For your purposes, you can also use the range argument, I think.
plot(rr, range=c(-ceil, ceil), col=pal(seq(-ceil,ceil,.1)))
And if you use "breaks", with terra 1.6-14 (currently the development version) you can now overwrite the default legend type.
plot(rr, type="cont", breaks=b, col=pal(b))

How to create a heatmap illustrating mesh differences while controlling the position of the center color of a diverging color palette?

I have two 3D meshes of human faces and I wish to use a heatmap to illustrate their differences. I want to use a red-blue diverging color scale.
My data can be found here. In my data, "vb1.xlsx" and "vb2.xlsx" contain the 3D coordinates of the two meshes. "it.xlsx" is the face information. "dat_col.xlsx" contains the pointwise distances between the two meshes, from which the heatmap can be produced. I used the following code to generate the two meshes from the vertex and face information. I then used the meshDist function in the Morpho package to calculate the distances between each pair of vertices on the two meshes.
library(Morpho)
library(xlsx)
library(rgl)
library(RColorBrewer)
library(tidyverse)
mshape1 <- read.xlsx("...\\vb1.xlsx", sheetIndex = 1, header = F)
mshape2 <- read.xlsx("...\\vb2.xlsx", sheetIndex = 1, header = F)
it <- read.xlsx("...\\it.xlsx", sheetIndex = 1, header = F)
# Preparation for use in tmesh3d
vb_mat_mshape1 <- t(mshape1)
vb_mat_mshape1 <- rbind(vb_mat_mshape1, 1)
rownames(vb_mat_mshape1) <- c("xpts", "ypts", "zpts", "")
vb_mat_mshape2 <- t(mshape2)
vb_mat_mshape2 <- rbind(vb_mat_mshape2, 1)
rownames(vb_mat_mshape2) <- c("xpts", "ypts", "zpts", "")
it_mat <- t(as.matrix(it))
rownames(it_mat) <- NULL
vertices1 <- c(vb_mat_mshape1)
vertices2 <- c(vb_mat_mshape2)
indices <- c(it_mat)
mesh1 <- tmesh3d(vertices = vertices1, indices = indices, homogeneous = TRUE,
material = NULL, normals = NULL, texcoords = NULL)
mesh2 <- tmesh3d(vertices = vertices2, indices = indices, homogeneous = TRUE,
material = NULL, normals = NULL, texcoords = NULL)
mesh1smooth <- addNormals(mesh1)
mesh2smooth <- addNormals(mesh2)
# Calculate mesh distance using meshDist function in Morpho package
mD <- meshDist(mesh1smooth, mesh2smooth)
pd <- mD$dists
The pd, containing information on pointwise distances between the two meshes, can be found in the first column of the "dat_col.xlsx" file.
A heatmap is generated from the meshDist function as follows:
I wish to have better control of the heatmap by using a red-blue diverging color scale. More specifically, I want positive/negative values to be colored blue/red using 100 colors from the RdBu color palette in the RColorBrewer package. To do so, I first cut the range of pd values into 99 intervals of equal length. I then determined which of the 99 intervals each pd value lies in. The code is below:
nlevel <- 99
breaks <- NULL
for (i in 1:(nlevel - 1)) {
breaks[i] <- min(pd) + ((max(pd) - min(pd))/99) * i
}
breaks <- c(min(pd), breaks, max(pd))
pd_cut <- cut(pd, breaks = breaks, include.lowest = TRUE)
dat_col <- data.frame(pd = pd, pd_cut = pd_cut, group = as.numeric(pd_cut))
pd_cut is the interval corresponding to each pd, and group is the interval membership of each pd. A color is then assigned to each pd according to the value in group with the following code:
dat_col <- dat_col %>%
mutate(color = colorRampPalette(
brewer.pal(n = 9, name = "RdBu"))(99)[dat_col$group])
The final heatmap is as follows:
open3d()
shade3d(mesh1smooth, col=dat_col$color, specular = "#202020", polygon_offset = 1)
Since I have 99 intervals, the middle interval is the 50th, (-3.53e-05,-1.34e-05]. However, it is the 51st interval, (-1.34e-05,8.47e-06], that contains the 0 point.
Following my way of color assignment (colorRampPalette(brewer.pal(n = 9, name = "RdBu"))(99)[dat_col$group]), the center color (the 50th color imputed from colorRampPalette) is given to pds belonging to the 50th interval. However, I want pds that belong to the 51st interval, the interval that harbors 0, to be assigned the center color.
I understand that in my case the issue won't affect the appearance of the heatmap much. But I believe this is not a trivial issue, and it can significantly affect the heatmap when the interval that contains 0 is far from the middle interval. This could happen when the two meshes under comparison are very different. It makes more sense to me to assign the center color to the interval that contains 0 rather than to the one(s) in the middle of all intervals.
Of course I can manually replace hex code of the 50th imputed color to the desired center color as follows:
color <- colorRampPalette(brewer.pal(n = 9, name = "RdBu"))(99)
color2 <- color
color2[50] <- "#ffffff" #assume white is the intended center color
But the above approach breaks the smoothness of the color gradient, since a color that was originally imputed by some smooth function is replaced by an arbitrary one. How can I assign the center color to the pds that lie in the interval containing 0 without affecting the smoothness of the imputed colors?
There are a couple of things to fix to get what you want.
First, the colours. You base the colours on this code:
color <- colorRampPalette(brewer.pal(n = 9, name = "RdBu"))(99)
You can look at the result of that calculation, and you'll see that there is no white in it. The middle color is color[50] which evaluates to "#F7F6F6", i.e. a slightly reddish light gray colour. If you look at the original RdBu palette, the middle colour was "#F7F7F7", so this change was done by colorRampPalette(). To me it looks like a minor bug in that function: it truncates the colour values instead of rounding them, so the values
[50,] 247.00000 247.00000 247.00000
convert to "#F7F6F6", i.e. red 247, green 246, blue 246. You can avoid this by choosing some other number of colours in your palette. I see "F7F7F7" as the middle colour with both 97 and 101 colours. But being off by one probably doesn't matter much, so I wouldn't worry about this.
The second problem is your discretization of the range of the pd values. You want zero in the middle bin. If you want the bins all to be of equal size, then it needs to be symmetric: so instead of running from min(pd) to max(pd), you could use this calculation:
limit <- max(abs(pd))
breaks <- -limit + (0:nlevel)*2*limit/nlevel
This will put zero exactly in the middle of the middle bin, but some of the bins at one end or the other might not be used. If you don't care if the bins are of equal size, you could get just as many negatives as positives by dividing them up separately. I like the above solution better.
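A quick check of the symmetric breaks on made-up distances (the simulated pd below is an assumption standing in for mD$dists). The sketch uses seq(-limit, limit, length.out = nlevel + 1), which matches the formula above up to floating-point rounding and guarantees that the outermost breaks are exactly -limit and limit, so no value falls outside the bins:

```r
set.seed(1)
pd <- rnorm(1000, mean = 0.2, sd = 0.5)   # made-up signed distances, not centered at 0

nlevel <- 99
limit  <- max(abs(pd))
breaks <- seq(-limit, limit, length.out = nlevel + 1)   # symmetric around 0

# Bin membership of every distance, and of zero itself.
group    <- cut(pd, breaks = breaks, include.lowest = TRUE, labels = FALSE)
zero_bin <- cut(0,  breaks = breaks, include.lowest = TRUE, labels = FALSE)

# With 99 bins the middle one is bin 50, and it is the bin that contains 0.
# The three-color ramp here is a stand-in for the RdBu palette.
color <- colorRampPalette(c("red", "white", "blue"))(nlevel)[group]
```

Since the data are shifted towards positive values, some of the lowest bins stay empty, which is the trade-off mentioned above.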
Edited to add: For the first problem, a better solution is to use
color <- hcl.colors(99, "RdBu")
with the new function in R 3.6.0. This does give a light gray as the middle color.

R corrplot - color relying on value

I have a binary data.frame (53115 rows; 520 columns) and I want to plot a correlation plot. I want to colour it based on the values, correlation values >=0.95 (red), otherwise, blue.
correl <- abs(round(cor(bin_mat), 2))
pdf("corrplot.pdf", width = 200, height = 200)
a <- corrplot(correl, order = "hclust", addCoef.col = "black", number.cex=0.8, cl.lim = c(0,1), col=c(rep("deepskyblue",19) ,"red"))
dev.off()
I get the correlation plot but in many cases I get a wrong coloring (see below on 0.91).
data: file
How can I manage to have a right coloring?
In general corrplot library is quite weird when it comes to cl.lim and colors. For some reason it doesn't seem to matter if you set cl.lim or not - the colors will still be distributed from -1 to 1.
So in your case just use 39 blue colors instead of 19 (to cover the range from -1 to 1):
cors <- cor(iris[,-5])
cors[cbind(c(1,2), c(2,1))] <- 0.912
corrplot(cors, col=c(rep("blue", 39), "red"), cl.lim=c(-1,1), addCoef.col="black")
And the result:
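The arithmetic behind 39 versus 19 blue colors can be checked without corrplot. The sketch assumes, as described above, that the colors are spread over equal-width bins from -1 to 1; bin_colour is a hypothetical helper and does not reproduce corrplot's internal code.

```r
# Assign each correlation to one of length(cols) equal-width bins on [-1, 1].
bin_colour <- function(r, cols) {
  breaks <- seq(-1, 1, length.out = length(cols) + 1)
  cols[cut(r, breaks = breaks, include.lowest = TRUE, labels = FALSE)]
}

cols20 <- c(rep("deepskyblue", 19), "red")   # bins of width 0.10: red covers (0.90, 1]
cols40 <- c(rep("deepskyblue", 39), "red")   # bins of width 0.05: red covers (0.95, 1]

r <- c(0.50, 0.91, 0.96)
bin_colour(r, cols20)   # "deepskyblue" "red"         "red"
bin_colour(r, cols40)   # "deepskyblue" "deepskyblue" "red"
```

With 20 colors the last bin starts at 0.90, which is why a value of 0.91 was drawn in red in the question.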

Parliamentary seats graph -> colors and labels?

I am using the following code to create a parliamentary seats graph in R for the German election results of 2013.
I would like to change the colors for each party (CDU/CSU -> red, SPD -> blue, Linke -> yellow and Gruene -> green). When I try to do this, the colors seem to appear randomly, destroying the sequence of the parties in the graph.
I also want to remove the black contour of the graph so that only the seats are visible.
VoteGermany2013 <- data.frame(Party=c( "CDU/CSU", "SPD", "LINKE","GRUENE"),
Result=c(311,193,64,63))
seats <- function(N,M, r0=2.5){
radii <- seq(r0, 1, len=M)
counts <- numeric(M)
pts = do.call(rbind,
lapply(1:M, function(i){
counts[i] <<- round(N*radii[i]/sum(radii[i:M]))
theta <- seq(0, pi, len = counts[i])
N <<- N - counts[i]
data.frame(x=radii[i]*cos(theta), y=radii[i]*sin(theta), r=i,
theta=theta)
} )
)
pts = pts[order(-pts$theta,-pts$r),]
pts
}
election <- function(seats, counts){
stopifnot(sum(counts)==nrow(seats))
seats$party = rep(1:length(counts),counts)
seats
}
layout = seats(631,16)
result = election(layout, VoteGermany2013$Result) # no overall majority!!!
plot(result$x, result$y, col=result$party,pch=19, asp=1)
Nice example. I'm guessing that you want to suppress the axes. The code below uses the numeric result$party values as an index into the color vector you specified, which yields a 'col' vector as long as the 'x' and 'y' arguments. That is what plot() needs whenever the grouping has no regularity in sync with the length of 'col'. (If the groups repeat in step with a multiple of the length of the 'col' vector, there is no problem: recycling takes care of everything.) With no such regularity, a short 'col' vector gets recycled across the points, and chaos ensues.
plot(result$x, result$y,
col=c( "red", "blue", "yellow","green")[result$party], #numeric index
pch=19, asp=1,
frame.plot=FALSE, # gets rid of the surrounding rectangle
axes=FALSE) # gets rid of the numbers and ticks
You can suppress the 'xlab' and 'ylab' labels by setting both to "".
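The difference between indexing and recycling can be seen without plotting anything. A short sketch using the seat counts from the question (indexed and recycled are illustrative names):

```r
# One party id per seat, in seat order, as election() produces.
party <- rep(1:4, c(311, 193, 64, 63))
party_cols <- c("red", "blue", "yellow", "green")

# Indexing: one color per seat, following the party of that seat.
indexed <- party_cols[party]

# What plot() effectively does with a length-4 'col': recycle it seat by seat.
recycled <- rep_len(party_cols, length(party))

table(indexed)       # 311 red, 193 blue, 64 yellow, 63 green
head(recycled, 8)    # red, blue, yellow, green, red, blue, yellow, green
```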
There is a package for this as well: ggparliament.

colorRamp returns 0

I'm trying to plot lines and color the lines based on the probability of that connection. Given a vector of probabilities, I use:
colfunc <- colorRamp(c("white", "red"))
colors <- colfunc(probs)
colors is then an nx3 matrix of rgb values. However, colfunc quite often returns a 0 value, so when I attempt to plot using these colors, R complains
Error in col2rgb(colors) : numerical color values must be positive
Is there an error in the way I am defining my color function?
Your function works fine, I think, but it doesn't return colors you can use with plot, because plot wants a color, not RGB values in a matrix.
There's probably a better way, but you can simply convert the matrix:
probs <- runif(10)
colors <- colfunc(probs)
my_col = apply(colors, MARGIN = 1, function(x) rgb(x[1]/255, x[2]/255, x[3]/255))
plot(1:10, 1:10, col = my_col) # should work fine
or you could just wrap your function
better_colfunc <- function(x, ramp = colorRamp(c("white", "red"))) {
colors <- ramp(x)
colors = apply(colors, MARGIN = 1, function(x) rgb(x[1]/255, x[2]/255, x[3]/255))
return(colors)
}
plot(1:10, 1:10, col = better_colfunc(probs, ramp = colfunc))
As for "colfunc quite often returns a 0 value", and other issues, you'll need to share both some data (what do your probs look like?) as well as perhaps the actual plotting code. See here for tips on making reproducible questions.
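As a shorter alternative to the apply() calls above, rgb() also accepts a three-column matrix directly, so the whole conversion can be done in one vectorised call. A minimal sketch with made-up probabilities (line_cols is an illustrative name):

```r
colfunc <- colorRamp(c("white", "red"))
probs <- c(0, 0.25, 0.5, 1)            # made-up probabilities in [0, 1]

# colfunc() returns an n x 3 matrix on the 0-255 scale;
# rgb() turns each row into one hex color string.
line_cols <- rgb(colfunc(probs), maxColorValue = 255)

plot(seq_along(probs), seq_along(probs), col = line_cols, pch = 19)
```

A 0 in the matrix is then just a channel value (pure red is 255, 0, 0) rather than something plot() tries to interpret as a color number.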
I am a bit confused what you are trying to do...the col2rgb function returns rgb values, so if you already have those then what do you want?
Or if you want rgb, why not use:
col2rgb(c("white", "red"))

Resources