Problems with pheatmap usage - r

I'm trying to play around with pheatmap and getting stuck at the very beginning.
Creating a toy example:
library(pheatmap)
set.seed(1)
my.mat <- matrix(rnorm(90), nrow = 30, ncol = 30)
rownames(my.mat) <- 1:30
colnames(my.mat) <- 1:30
col.scale = colorRampPalette(c("red", "blue"), space = "rgb")(10)
breaks.size = 11
pheatmap(my.mat, color = col.scale, breaks = breaks.size, border_color = NA, cellwidth = 10, cellheight = 10)
Throws this error message:
Error in unit(y, default.units) : 'x' and 'units' must have length > 0
And the plot it produces doesn't seem right:
For example, I can't understand why the top right cells are white. I also thought that setting cellwidth = 10 and cellheight = 10 would give square cells, not rectangular ones. And finally, if anyone knows whether it's possible to have the row names and column names appear on the same side of the heat map as the dendrograms (i.e., at the tips of the dendrogram), that would be great.

Well, the reason you are getting that error is that you are using the breaks= parameter incorrectly. From the ?pheatmap help page:
breaks: a sequence of numbers that covers the range of values in mat and is one element longer than color vector. Used for mapping values to colors. Useful, if needed to map certain values to certain colors, to certain values. If value is NA then the breaks are calculated automatically.
You can't just pass a single value like you might with other functions.
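To make the help text concrete, here is a minimal sketch of a breaks vector that meets both requirements: it spans the range of the matrix and is one element longer than the colour vector. It assumes the matrix is filled with 900 random values, and breaks.vec is a name introduced here purely for illustration. The pheatmap() call is skipped if the package is not installed.

```r
set.seed(1)
my.mat <- matrix(rnorm(900), nrow = 30, ncol = 30)
rownames(my.mat) <- 1:30
colnames(my.mat) <- 1:30

col.scale <- colorRampPalette(c("red", "blue"), space = "rgb")(10)

# One more break than colours, covering the full range of the data.
# (breaks.vec is an illustrative name, not part of the original code.)
breaks.vec <- seq(min(my.mat), max(my.mat), length.out = length(col.scale) + 1)

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(my.mat, color = col.scale, breaks = breaks.vec,
                     border_color = NA, cellwidth = 10, cellheight = 10)
}
```

With 10 colours this gives 11 breaks, which is probably what breaks.size = 11 was meant to express.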
Also, I'm not sure what you mean about the cells not being square. You are plotting a 30x30 square shape (at least it is for me). Because you are clustering, you're only getting one color per cluster.
I'm guessing part of the problem may be that you're only generating 90 random values for a 900-cell matrix, so those values are recycled (your data is very structured). Perhaps you meant
my.mat <- matrix(rnorm(900), nrow = 30, ncol = 30)
doing so gives you the following plot

Related

R: automatically assigning all colors

I am working with the R programming language. I have this data:
letters = replicate(52, paste(sample(LETTERS, 10, replace=TRUE), collapse=""))
values = rnorm(52, 100, 100)
my_data = data.frame(letters, values)
I am trying to plot this data:
library(ggplot2)
library(waffle)
waffle(my_data, size = 0.6, rows = 10)
But this gives me the error:
! Insufficient values in manual scale. 51 needed but only 8 provided.
Run `rlang::last_error()` to see where the error occurred.
Normally, I would have manually provided the colors - but 51 colors are a lot to insert manually. Is there some automatic way that can recognize how many colors are required and then fill them all in?
Thanks!
You can use a vector of 53 colors using a palette function such as scales::hue_pal()(53) (note I have had to alter the way the input data is used, since your unmodified example data and code simply returns an error)
waffle(setNames(abs(round(my_data$values/10)),
my_data$letters), size = 0.6, rows = 10,
colors = scales::hue_pal()(53)) +
theme(legend.position = "bottom")
The obvious caveat is that 53 discrete colors is far too many to have in a waffle plot. It is simply unintelligible from a data visualisation point of view. Whatever you are trying to demonstrate, there will certainly be a better way to do it than a waffle chart with 53 discrete colors.
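If you would rather not hard-code the number, a rough sketch is to count the categories first and ask a palette function for that many colours. This version uses base R's hcl.colors() (available from R 3.6.0) instead of scales::hue_pal(); counts, n_colors and palette_auto are names made up for this example. The plotting function may expect a slightly different count than the number of categories (the call above passes 53), so treat n_colors as a starting point.

```r
set.seed(42)
letters_id <- replicate(52, paste(sample(LETTERS, 10, replace = TRUE), collapse = ""))
values <- rnorm(52, 100, 100)

# Named vector of non-negative counts, built the same way as in the call above
counts <- setNames(abs(round(values / 10)), letters_id)

# Derive the number of colours from the data instead of typing it in
n_colors <- length(counts)
palette_auto <- grDevices::hcl.colors(n_colors, palette = "Dark 3")
```

palette_auto can then be passed wherever a colour vector is expected, for example as the colors argument in the waffle() call above.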

How to create a heatmap illustrating mesh differences while controlling the position of the center color of a diverging color palette?

I have two 3D meshes of human faces and I wish to use a heatmap to illustrate the differences. I want to use a red-blue divergent color scale.
My data can be found here. In my data, "vb1.xlsx" and "vb2.xlsx" contain 3D coordinates of the two meshes. "it.xlsx" is the face information. The "dat_col.xlsx" contains pointwise distances between the two meshes, based on which a heatmap could be produced. I used the following code to generate the two meshes from the vertex and face information. I then used the meshDist function in the Morpho package to calculate distances between each pair of vertices on the two meshes.
library(Morpho)
library(xlsx)
library(rgl)
library(RColorBrewer)
library(tidyverse)
mshape1 <- read.xlsx("...\\vb1.xlsx", sheetIndex = 1, header = F)
mshape2 <- read.xlsx("...\\vb2.xlsx", sheetIndex = 1, header = F)
it <- read.xlsx("...\\it.xlsx", sheetIndex = 1, header = F)
# Preparation for use in tmesh3d
vb_mat_mshape1 <- t(mshape1)
vb_mat_mshape1 <- rbind(vb_mat_mshape1, 1)
rownames(vb_mat_mshape1) <- c("xpts", "ypts", "zpts", "")
vb_mat_mshape2 <- t(mshape2)
vb_mat_mshape2 <- rbind(vb_mat_mshape2, 1)
rownames(vb_mat_mshape2) <- c("xpts", "ypts", "zpts", "")
it_mat <- t(as.matrix(it))
rownames(it_mat) <- NULL
vertices1 <- c(vb_mat_mshape1)
vertices2 <- c(vb_mat_mshape2)
indices <- c(it_mat)
mesh1 <- tmesh3d(vertices = vertices1, indices = indices, homogeneous = TRUE,
material = NULL, normals = NULL, texcoords = NULL)
mesh2 <- tmesh3d(vertices = vertices2, indices = indices, homogeneous = TRUE,
material = NULL, normals = NULL, texcoords = NULL)
mesh1smooth <- addNormals(mesh1)
mesh2smooth <- addNormals(mesh2)
# Calculate mesh distance using meshDist function in Morpho package
mD <- meshDist(mesh1smooth, mesh2smooth)
pd <- mD$dists
The pd, containing information on pointwise distances between the two meshes, can be found in the first column of the "dat_col.xlsx" file.
A heatmap is generated from the meshDist function as follows:
I wish to have better control of the heatmap by using a red-blue divergent color scale. More specifically, I want positive/negative values to be colored blue/red using 100 colors from the RdBu color palette in the RColorBrewer package. To do so, I first cut the range of pd values into 99 intervals of equal length. I then determined which of the 99 intervals each pd value lies in. The code is as below:
nlevel <- 99
breaks <- NULL
for (i in 1:(nlevel - 1)) {
breaks[i] <- min(pd) + ((max(pd) - min(pd))/99) * i
}
breaks <- c(min(pd), breaks, max(pd))
pd_cut <- cut(pd, breaks = breaks, include.lowest = TRUE)
dat_col <- data.frame(pd = pd, pd_cut = pd_cut, group = as.numeric(pd_cut))
The pd_cut is the interval corresponding to each pd and group is the interval membership of each pd. Color is then assigned to each pd according to the value in group with the following code:
dat_col <- dat_col %>%
mutate(color = colorRampPalette(
brewer.pal(n = 9, name = "RdBu"))(99)[dat_col$group])
The final heatmap is as follows:
open3d()
shade3d(mesh1smooth, col=dat_col$color, specular = "#202020", polygon_offset = 1)
Since I have 99 intervals, the middle interval is the 50th, (-3.53e-05,-1.34e-05]. However, it is the 51st interval, (-1.34e-05,8.47e-06], that contains the 0 point.
Following my way of color assignment (colorRampPalette(brewer.pal(n = 9, name = "RdBu"))(99)[dat_col$group]), the center color (the 50th color imputed from colorRampPalette) is given to pds belonging to the 50th interval. However, I want pds that belong to the 51st interval, the interval that contains 0, to be assigned the center color.
I understand that in my case this issue won't affect the appearance of the heatmap much. But I believe it is not a trivial issue and can significantly affect the heatmap when the interval that contains 0 is far from the middle interval. This could happen when the two meshes under comparison are very different. It makes more sense to me to assign the center color to the interval that contains 0 rather than the one(s) in the middle of all intervals.
Of course I can manually replace hex code of the 50th imputed color to the desired center color as follows:
color <- colorRampPalette(brewer.pal(n = 9, name = "RdBu"))(99)
color2 <- color
color2[50] <- "#ffffff" #assume white is the intended center color
But the above approach affected the smoothness of color gradient since the color that was originally imputed by some smooth function is replaced by some arbitrary color. But how could I assign center color to pds that lie in the interval that transgresses 0 while at the same time not affecting the smoothness of the imputed color?
There are a couple of things to fix to get what you want.
First, the colours. You base the colours on this code:
color <- colorRampPalette(brewer.pal(n = 9, name = "RdBu"))(99)
You can look at the result of that calculation, and you'll see that there is no white in it. The middle color is color[50] which evaluates to "#F7F6F6", i.e. a slightly reddish light gray colour. If you look at the original RdBu palette, the middle colour was "#F7F7F7", so this change was done by colorRampPalette(). To me it looks like a minor bug in that function: it truncates the colour values instead of rounding them, so the values
[50,] 247.00000 247.00000 247.00000
convert to "#F7F6F6", i.e. red 247, green 246, blue 246. You can avoid this by choosing some other number of colours in your palette. I see "F7F7F7" as the middle colour with both 97 and 101 colours. But being off by one probably doesn't matter much, so I wouldn't worry about this.
The second problem is your discretization of the range of the pd values. You want zero in the middle bin. If you want the bins all to be of equal size, then it needs to be symmetric: so instead of running from min(pd) to max(pd), you could use this calculation:
limit <- max(abs(pd))
breaks <- -limit + (0:nlevel)*2*limit/nlevel
This will put zero exactly in the middle of the middle bin, but some of the bins at one end or the other might not be used. If you don't care if the bins are of equal size, you could get just as many negatives as positives by dividing them up separately. I like the above solution better.
Edited to add: For the first problem, a better solution is to use
color <- hcl.colors(99, "RdBu")
with hcl.colors(), a function that is new in R 3.6.0. This does give a light gray as the middle color.
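Putting the two fixes together, a minimal sketch might look like the following. The pd values here are simulated stand-ins (the real ones come from meshDist()), and group, zero_bin and vertex_col are illustrative names. seq() is used for the breaks; it is the same symmetric grid as the formula above, but it ends exactly at limit, so the most extreme value cannot fall outside the last bin through floating-point rounding.

```r
set.seed(1)
# Hypothetical stand-in for the pointwise distances returned by meshDist()
pd <- rnorm(1000, mean = 2e-4, sd = 3e-4)

nlevel <- 99                      # odd, so there is a single middle bin
limit  <- max(abs(pd))

# Symmetric, equal-width breaks from -limit to +limit
breaks <- seq(-limit, limit, length.out = nlevel + 1)

# Bin membership of each distance (integer codes 1..nlevel)
group <- cut(pd, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Index of the bin that contains zero
zero_bin <- findInterval(0, breaks)

# One colour per vertex; the middle palette entry goes to the zero bin
color      <- hcl.colors(nlevel, "RdBu")
vertex_col <- color[group]
```

With nlevel = 99 the bin containing zero is the 50th, which is also the middle entry of the palette. vertex_col plays the role of dat_col$color in the shade3d() call.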

avoiding over-crowding of labels in r graphs

I am working on avoiding overcrowding of the labels in the following plot:
set.seed(123)
position <- c(rep (0,5), rnorm (5,1,0.1), rnorm (10, 3,0.1), rnorm (3, 4, 0.2), 5, rep(7,5), rnorm (3, 8,2), rnorm (10,9,0.5),
rep (0,5), rnorm (5,1,0.1), rnorm (10, 3,0.1), rnorm (3, 4, 0.2), 5, rep(7,5), rnorm (3, 8,2), rnorm (10,9,0.5))
group <- c(rep (1, length (position)/2),rep (2, length (position)/2) )
mylab <- paste ("MR", 1:length (group), sep = "")
barheight <- 0.5
y.start <- c(group-barheight/2)
y.end <- c(group+barheight/2)
mydf <- data.frame (position, group, barheight, y.start, y.end, mylab)
plot(0,type="n",ylim=c(0,3),xlim=c(0,10),axes=F,ylab="",xlab="")
#Create two horizontal lines
require(fields)
yline(1,lwd=4)
yline(2,lwd=4)
#Create text for the lines
text(10,1.1,"Group 1",cex=0.7)
text(10,2.1,"Group 2",cex=0.7)
#Draw vertical bars
lng = length(position)/2
lg1 = lng+1
lg2 = lng*2
segments(mydf$position[1:lng],mydf$y.start[1:lng],y1=mydf$y.end[1:lng])
segments(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2],y1=mydf$y.end[lg1:lg2])
text(mydf$position[1:lng],mydf$y.start[1:lng]+0.65, mydf$mylab[1:lng], srt = 90)
text(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2]+0.65, mydf$mylab[lg1:lg2], srt = 90)
You can see that some areas are crowded with labels where the x values are the same or similar. I want to display only one label when there are multiple labels at the same point. For example,
mydf$position[1:5] are all 0,
but corresponding labels mydf$mylab[1:5] -
MR1 MR2 MR3 MR4 MR5
I just want to display the first one "MR1".
Similarly, when neighbouring points are too close (say, within a difference of 0.35), they should be treated as a single cluster and only the first label should be displayed. In this way I would be able to get rid of the overcrowding of labels. How can I achieve this?
If you space the labels out and add some extra lines you can label every marker.
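clpl <- function(xdata, names, y = 1, dy = 0.25, add = FALSE) {
  # sort the markers (and their labels) by position
  o <- order(xdata)
  xdata <- xdata[o]
  names <- names[o]
  if (!add) plot(0, type = "n", ylim = c(y - 1, y + 2), xlim = range(xdata), axes = FALSE, ylab = "", xlab = "")
  abline(h = y, lwd = 4)                  # main line at height y
  segments(xdata, y - dy, xdata, y + dy)  # marker ticks
  # spread the label positions evenly over the x range
  tpos <- seq(min(xdata), max(xdata), length.out = length(xdata))
  text(tpos, y + 2 * dy, names, srt = 90, adj = 0)
  segments(xdata, y + dy, tpos, y + 2 * dy)  # leader lines from ticks to labels
}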
Then using your data:
clpl(mydf$position[lg1:lg2],mydf$mylab[lg1:lg2])
gives:
You could then think about labelling clusters underneath the main line.
I've not given much thought to doing multiple lines in a plot, but I think with a bit of mucking with my code and the add parameter it should be possible. You could also use colour to show clusters. I'm fairly sure these techniques are present in some of the clustering packages for R...
Obviously with a lot of markers even this is going to get smushed, but with a lot of clusters the same thing is going to happen. Maybe you end up labelling clusters with this technique?
In general, I agree with @Joran that cluster labelling can't be automated, but you've said that labelling a group of lines with the first label in the cluster would be OK, so it is possible to automate some of the process.
Putting the following code after the line lg2 = lng*2 gives the result shown in the image below:
clust <- cutree(hclust(dist(mydf$position[1:lng])),h=0.75)
u <- rep(T,length(unique(clust)))
clust.labels <- sapply(c(1:lng),function (i)
{
if (u[clust[i]])
{
u[clust[i]] <<- F
as.character(mydf$mylab)[i]
}
else
{
""
}
})
segments(mydf$position[1:lng],mydf$y.start[1:lng],y1=mydf$y.end[1:lng])
segments(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2],y1=mydf$y.end[lg1:lg2])
text(mydf$position[1:lng],mydf$y.start[1:lng]+0.65, clust.labels, srt = 90)
text(mydf$position[lg1:lg2],mydf$y.start[lg1:lg2]+0.65, mydf$mylab[lg1:lg2], srt = 90)
(I've only labelled the clusters on the lower line -- the same principle could be applied to the upper line too). The parameter h of cutree() might have to be adjusted case-by-case to give the resolution of labels that you want, but this approach is at least easier than labelling every cluster by hand.
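The "keep only the first label in each cluster" step can also be written without the <<- bookkeeping, using duplicated(). This is just a sketch on a small made-up position vector; position, labels and shown are illustrative names, and h = 0.75 is the same cut height as above.

```r
# Small made-up example: several markers at (almost) the same x position
position <- c(0, 0, 0, 1.02, 0.95, 3.1, 3.0, 7, 7, 9.2)
labels   <- paste0("MR", seq_along(position))

# Group markers that lie close together (complete linkage, cut at h = 0.75)
clust <- cutree(hclust(dist(position)), h = 0.75)

# Blank out every label except the first one in each cluster
shown <- ifelse(duplicated(clust), "", labels)
shown
# "MR1" "" "" "MR4" "" "MR6" "" "MR8" "" "MR10"
```

shown can then be passed to text() in place of the full label vector.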

Heat map- adjusting color range

library(gplots)
shades= c(seq(-1,0.8,length=64),seq(0.8,1.2,length=64),seq(1.2,3,length=64))
heatmap.2(cor_mat, dendrogram='none', Rowv=FALSE, Colv=FALSE, col=redblue(64),
breaks=shades, key=TRUE, cexCol=0.7, cexRow=1, keysize=1)
There is a problem with the breaks, and I would appreciate some help with it.
After running the code I get this error message
Error in image.default(1:nc, 1:nr, x, xlim = 0.5 + c(0, nc), ylim = 0.5 + : must have one more break than colour
Thank you for your time and consideration.
Well, we don't have cor_mat so we can't try this ourselves, but the problem seems to be what it says on the tin, isn't it? The way heatmap (and generally all functions based on image) works with breaks and a vector of colours is that the breaks define the points where a change in the value of your data matrix means a change in colour. In short, if breaks = c(1,2,3) and col = c("red", "blue"):
values < 1 will be transparent
values >= 1, <= 2 will be plotted as red
values > 2, <= 3 will be plotted as blue
values > 3 will be transparent
What's going on in your code is that with 'shades' you've supplied a vector of length 3*64 to breaks, while redblue(64) only gives you 64 colours. Try replacing redblue(64) with, say, redblue(3*64-1).
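To avoid counting by hand, the number of colours can be derived from the breaks vector. The sketch below uses base R's colorRampPalette() as a stand-in for gplots::redblue(), and it also drops the duplicated break points at 0.8 and 1.2 where the three sequences meet; that extra step is my own assumption about what is wanted, not something the error message requires.

```r
# Three segments with different resolution; unique() removes the
# repeated values 0.8 and 1.2 where consecutive segments meet
shades <- unique(c(seq(-1, 0.8, length.out = 64),
                   seq(0.8, 1.2, length.out = 64),
                   seq(1.2, 3, length.out = 64)))

# Always one colour fewer than there are breaks
n_col <- length(shades) - 1
cols  <- colorRampPalette(c("red", "white", "blue"))(n_col)
```

cols and shades can then be passed as col = cols and breaks = shades.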

spplot() - make color.key look nice

I'm afraid I have a spplot() question again.
I want the colors in my spplot() to represent absolute values, not automatic values as spplot does it by default.
I achieve this by making a factor out of the variable I want to draw (using the command cut()). This works fine, but the color key doesn't look good at all.
See for yourself:
library(sp)
data(meuse.grid)
gridded(meuse.grid) = ~x+y
meuse.grid$random <- rnorm(nrow(meuse.grid), 7, 2)
meuse.grid$random[meuse.grid$random < 0] <- 0
meuse.grid$random[meuse.grid$random > 10] <- 10
# making a factor out of meuse.grid$ random to have absolute values plotted
meuse.grid$random <- cut(meuse.grid$random, seq(0, 10, 0.1))
spplot(meuse.grid, c("random"), col.regions = rainbow(100, start = 4/6, end = 1))
How can I have the color.key on the right look good - I'd like to have fewer ticks and fewer labels (maybe just one label on each extreme of the color.key)
Thank you in advance!
[edit]
To make clear what I mean with absolute values: Imagine a map where I want to display the sea height. Seaheight = 0 (which is the min-value) should always be displayed blue. Seaheight = 10 (which, just for the sake of the example, is the max-value) should always be displayed red. Even if there is no sea on the regions displayed on the map, this shouldn't change.
I achieve this with the cut() command in my example. So this part works fine.
THIS IS WHAT MY QUESTION IS ABOUT
What I don't like is the color description on the right side. There are 100 ticks and each tick has a label. I want fewer ticks and fewer labels.
The way to go is to use the colorkey argument. For example:
## labels
labelat = c(1, 2, 3, 4, 5)
labeltext = c("one", "two", "three", "four", "five")
## plot
spplot(meuse.grid,
c("random"),
col.regions = rainbow(100, start = 4/6, end = 1),
colorkey = list(
labels=list(
at = labelat,
labels = labeltext
)
)
)
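For the "one label at each extreme" case, the positions can be derived from the factor itself. This is only a sketch, and it assumes (as the example above does) that the key positions for a factor correspond to the level indices 1, 2, ..., nlevels; f, label_at, label_text and colorkey_spec are illustrative names.

```r
# Same breaks as in the question: 0 to 10 in steps of 0.1
brks <- seq(0, 10, 0.1)
f <- cut(c(0.05, 5, 9.95), brks)  # tiny stand-in for meuse.grid$random

n_levels <- nlevels(f)            # one level per interval

# Label only the two ends of the key
label_at   <- c(1, n_levels)
label_text <- c("0", "10")

colorkey_spec <- list(labels = list(at = label_at, labels = label_text))
```

colorkey_spec would then be passed as colorkey = colorkey_spec in the spplot() call.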
First, it's not at all clear what you want here. There are many ways to make the color key look "nice", and the first step is to understand the data being passed to spplot() and what is being asked of it. cut() produces fully formatted interval labels like (2.3,5.34], which need to be handled differently: increasing the margins of the plot, applying specific formatting and spacing to the labels, and so on. This may not be what you ultimately want.
Perhaps you just want integer values, rounded from the input values?
library(sp)
data(meuse.grid)
gridded(meuse.grid) = ~x+y
meuse.grid$random <- rnorm(nrow(meuse.grid), 7, 2)
Round the values (or trunc(), ceiling(), floor() them . . .)
meuse.grid$rclass <- round(meuse.grid$random)
spplot(meuse.grid, c("rclass"), col.regions = rainbow(100, start = 4/6, end = 1))

Resources