I was wondering if anyone knows of a package that allows partial row labeling of heatmaps. I am currently using pheatmap() to construct my heatmaps, but I can use any package that has this functionality.
I have plots with many rows of differentially expressed genes and I would like to label a subset of them. There are two main things to consider (that I can think of):
The placement of the text annotation depends on the height of the row. If the rows are too narrow, then the text label will be ambiguous without some sort of pointer.
If multiple adjacent rows are significant (i.e. will be labelled), then these will need to be offset, and again, a pointer will be needed.
Below is an example of a partial solution that really only gets maybe halfway there, but I hope illustrates what I'd like to be able to do.
set.seed(1)
require(pheatmap)
require(RColorBrewer)
require(grid)
### Data to plot
data_mat <- matrix(sample(1:10000, 300), nrow = 50, ncol = 6)
rownames(data_mat) <- paste0("Gene", 1:50)
colnames(data_mat) <- c(paste0("A", 1:3), paste0("B", 1:3))
### Set how many genes to annotate
### TRUE - make enough labels that some overlap
### FALSE - no overlap
tooMany <- T
### Select a few genes to annotate
if (tooMany) {
sigGenes_v <- paste0("Gene", c(5,20,26,42,47,16,28))
newMain_v <- "Too Many Labels"
} else {
sigGenes_v <- paste0("Gene", c(5,20,26,42))
newMain_v <- "OK Labels"
}
### Make color list
colors_v <- brewer.pal(8, "Dark2")
colors_v <- colors_v[c(1:length(sigGenes_v), 8)]
names(colors_v) <- c(sigGenes_v, "No")
annColors_lsv <- list("Sig" = colors_v)
### Column Metadata
colMeta_df <- data.frame(Treatment = c(rep("A", 3), rep("B", 3)),
Replicate = c(rep(1:3, 2)),
stringsAsFactors = F,
row.names = colnames(data_mat))
### Row metadata
rowMeta_df <- data.frame(Sig = rep("No", 50),
stringsAsFactors = F,
row.names = rownames(data_mat))
for (gene_v in sigGenes_v) rowMeta_df[rownames(rowMeta_df) == gene_v, "Sig"] <- gene_v
### Heatmap
heat <- pheatmap(data_mat,
annotation_row = rowMeta_df,
annotation_col = colMeta_df,
annotation_colors = annColors_lsv,
cellwidth = 10,
main = "Original Heat")
### Get order of genes after clustering
genesInHeatOrder_v <- heat$tree_row$labels[heat$tree_row$order]
whichSigInHeatOrder_v <- which(genesInHeatOrder_v %in% sigGenes_v)
whichSigInHeatOrderLabels_v <- genesInHeatOrder_v[whichSigInHeatOrder_v]
sigY <- 1 - (0.02 * whichSigInHeatOrder_v)
### Change title
whichMainGrob_v <- which(heat$gtable$layout$name == "main")
heat$gtable$grobs[[whichMainGrob_v]] <- textGrob(label = newMain_v,
gp = gpar(fontsize = 16))
### Remove rows
whichRowGrob_v <- which(heat$gtable$layout$name == "row_names")
heat$gtable$grobs[[whichRowGrob_v]] <- textGrob(label = whichSigInHeatOrderLabels_v,
y = sigY,
vjust = 1)
grid.newpage()
grid.draw(heat)
Here are a few outputs (images omitted): the original heatmap; OK labels; OK labels, with flags; too many labels; too many labels, with flags.
The "with flags" outputs are the desired final results.
I just saved these as images from the RStudio plot viewer. I recognize that I could save them as PDFs with a larger page size to get rid of the label overlap, but then the individual cells would be larger than I want.
Based on your code, you seem fairly comfortable with gtables & grobs. A (relatively) straightforward way to achieve the look you want is to zoom in on the row label grob, & make some changes there:
replace unwanted labels with "";
evenly spread out labels within the available space;
add line segments joining the old and new label positions.
I wrote a wrapper function for this, which works as follows:
# heat refers to the original heatmap produced from the pheatmap() function
# kept.labels should be a vector of labels you wish to show
# repel.degree is a number in the range [0, 1], controlling how much the
# labels are spread out from one another
add.flag(heat,
kept.labels = sigGenes_v,
repel.degree = 0)
add.flag(heat,
kept.labels = sigGenes_v,
repel.degree = 0.5)
add.flag(heat,
kept.labels = sigGenes_v,
repel.degree = 1)
Function (explanations in annotations):
add.flag <- function(pheatmap,
kept.labels,
repel.degree) {
# repel.degree = number within [0, 1], which controls how much
# space to allocate for repelling labels.
## repel.degree = 0: spread out labels over existing range of kept labels
## repel.degree = 1: spread out labels over the full y-axis
heatmap <- pheatmap$gtable
new.label <- heatmap$grobs[[which(heatmap$layout$name == "row_names")]]
# keep only labels in kept.labels, replace the rest with ""
new.label$label <- ifelse(new.label$label %in% kept.labels,
new.label$label, "")
# calculate evenly spaced out y-axis positions
repelled.y <- function(d, d.select, k = repel.degree){
# d = vector of distances for labels
# d.select = vector of T/F for which labels are significant
# recursive function to get current label positions
# (note the unit is "npc" for all components of each distance)
strip.npc <- function(dd){
if(!"unit.arithmetic" %in% class(dd)) {
return(as.numeric(dd))
}
d1 <- strip.npc(dd$arg1)
d2 <- strip.npc(dd$arg2)
fn <- dd$fname
return(lazyeval::lazy_eval(paste(d1, fn, d2)))
}
full.range <- sapply(seq_along(d), function(i) strip.npc(d[i]))
selected.range <- sapply(seq_along(d[d.select]), function(i) strip.npc(d[d.select][i]))
return(unit(seq(from = max(selected.range) + k*(max(full.range) - max(selected.range)),
to = min(selected.range) - k*(min(selected.range) - min(full.range)),
length.out = sum(d.select)),
"npc"))
}
new.y.positions <- repelled.y(new.label$y,
d.select = new.label$label != "")
new.flag <- segmentsGrob(x0 = new.label$x,
x1 = new.label$x + unit(0.15, "npc"),
y0 = new.label$y[new.label$label != ""],
y1 = new.y.positions)
# shift position for selected labels
new.label$x <- new.label$x + unit(0.2, "npc")
new.label$y[new.label$label != ""] <- new.y.positions
# add flag to heatmap
heatmap <- gtable::gtable_add_grob(x = heatmap,
grobs = new.flag,
t = 4,
l = 4
)
# replace label positions in heatmap
heatmap$grobs[[which(heatmap$layout$name == "row_names")]] <- new.label
# plot result
grid.newpage()
grid.draw(heatmap)
# return a copy of the heatmap invisibly
invisible(heatmap)
}
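The spacing arithmetic inside repelled.y can be checked on plain numbers, without any grid units. The following is a minimal sketch; spread_positions is a hypothetical helper name, and the y values are made-up npc positions for 50 rows, not values taken from a real heatmap:

```r
# Hypothetical helper mirroring the arithmetic in repelled.y, on plain numerics
spread_positions <- function(y_all, selected, k = 0) {
  stopifnot(is.logical(selected), length(selected) == length(y_all),
            any(selected), k >= 0, k <= 1)
  y_sel <- y_all[selected]
  # k = 0 keeps the labels within the range of the selected rows;
  # k = 1 stretches them over the full range of all rows
  top    <- max(y_sel) + k * (max(y_all) - max(y_sel))
  bottom <- min(y_sel) - k * (min(y_sel) - min(y_all))
  seq(from = top, to = bottom, length.out = sum(selected))
}

# Made-up row positions (top row first) and four selected rows
y_all    <- seq(0.99, 0.01, length.out = 50)
selected <- seq_along(y_all) %in% c(5, 20, 26, 42)

pos_k0 <- spread_positions(y_all, selected, k = 0)
pos_k1 <- spread_positions(y_all, selected, k = 1)
```

With k = 0 the first and last labels stay at the positions of the outermost selected rows; with k = 1 they move to the top and bottom of the whole label column, and the remaining labels are spaced evenly in between.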
I want to identify 3D cylinders in an rgl plot to obtain one attribute of the nearest / selected cylinder. I tried using labels to simply spell out the attribute, but I work on data with more than 10,000 cylinders, so the plot gets so crowded that the labels are unreadable and it takes ages to render.
I tried to understand the documentation of rgl and I guess the solution to my issue is selecting the cylinder in the plot manually. I believe the function selectpoints3d() is probably the way to go. I believe it returns all vertices within the drawn rectangle, but I don't know how to go back to the cylinder data? I could calculate which cylinder is closest to the mean of the selected vertices, but this seems like a "quick & dirty" way to do the job.
Is there a better way to go? I noticed the argument value=FALSE to get the indices only, but I don't know how to go back to the cylinders.
Here is some dummy data and my code:
library(rgl)
# dummy data
cylinder <- data.frame(
start_X = rep(1:3, 2)*2,
start_Y = rep(1:2, each = 3)*2,
start_Z = 0,
end_X = rep(1:3, 2)*2 + round(runif(6, -1, 1), 2),
end_Y = rep(1:2, each = 3)*2 + round(runif(6, -1, 1), 2),
end_Z = 0.5,
radius = 0.25,
attribute = sample(letters[1:6], 6)
)
# calculate centers
cylinder$center_X <- rowMeans(cylinder[,c("start_X", "end_X")])
cylinder$center_Y <- rowMeans(cylinder[,c("start_Y", "end_Y")])
cylinder$center_Z <- rowMeans(cylinder[,c("start_Z", "end_Z")])
# create cylinders
cylinder_list <- list()
for (i in 1:nrow(cylinder)) {
cylinder_list[[i]] <- cylinder3d(
center = cbind(
c(cylinder$start_X[i], cylinder$end_X[i]),
c(cylinder$start_Y[i], cylinder$end_Y[i]),
c(cylinder$start_Z[i], cylinder$end_Z[i])),
radius = cylinder$radius[i],
closed = -2)
}
# plot cylinders
open3d()
par3d()
shade3d(shapelist3d(cylinder_list, plot = FALSE), col = "blue")
text3d(cylinder$center_X+0.5, cylinder$center_Y+0.5, cylinder$center_Z+0.5, cylinder$attribute, color="red")
# get attribute
nearby <- selectpoints3d(value=TRUE, button = "right")
nearby <- colMeans(nearby)
cylinder$dist <- sqrt(
(nearby["x"]-cylinder$center_X)**2 +
(nearby["y"]-cylinder$center_Y)**2 +
(nearby["z"]-cylinder$center_Z)**2)
cylinder$attribute[which.min(cylinder$dist)]
If you call selectpoints3d(value = FALSE), you get two columns. The first column is the id of the object that was found. Your cylinders get two ids each. One way to mark the cylinders is to use "tags". For example, this modification of your code:
library(rgl)
# dummy data
cylinder <- data.frame(
start_X = rep(1:3, 2)*2,
start_Y = rep(1:2, each = 3)*2,
start_Z = 0,
end_X = rep(1:3, 2)*2 + round(runif(6, -1, 1), 2),
end_Y = rep(1:2, each = 3)*2 + round(runif(6, -1, 1), 2),
end_Z = 0.5,
radius = 0.25,
attribute = sample(letters[1:6], 6)
)
# calculate centers
cylinder$center_X <- rowMeans(cylinder[,c("start_X", "end_X")])
cylinder$center_Y <- rowMeans(cylinder[,c("start_Y", "end_Y")])
cylinder$center_Z <- rowMeans(cylinder[,c("start_Z", "end_Z")])
# create cylinders
cylinder_list <- list()
for (i in 1:nrow(cylinder)) {
cylinder_list[[i]] <- cylinder3d(
center = cbind(
c(cylinder$start_X[i], cylinder$end_X[i]),
c(cylinder$start_Y[i], cylinder$end_Y[i]),
c(cylinder$start_Z[i], cylinder$end_Z[i])),
radius = cylinder$radius[i],
closed = -2)
# Add tag here:
cylinder_list[[i]]$material$tag <- cylinder$attribute[i]
}
# plot cylinders
open3d()
par3d()
shade3d(shapelist3d(cylinder_list, plot = FALSE), col = "blue")
text3d(cylinder$center_X+0.5, cylinder$center_Y+0.5, cylinder$center_Z+0.5, cylinder$attribute, color="red")
# Don't get values, get the ids
nearby <- selectpoints3d(value=FALSE, button = "right", closest = FALSE)
ids <- nearby[, "id"]
# Convert them to tags. If you select one of the labels, you'll get
# a blank in the list of tags, because we didn't tag the text.
unique(tagged3d(id = ids))
When I was trying this, I found that using closest = TRUE in selectpoints3d seemed to get too many ids; there may be a bug there.
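Once the tags are back, they can be matched against the data frame to recover the full rows. This is a minimal sketch in base R; the tags vector below is a hypothetical stand-in for the result of unique(tagged3d(id = ids)), and it assumes the attribute values are unique per cylinder:

```r
# Small stand-in for the cylinder data frame from the question
cylinder <- data.frame(
  attribute = c("a", "b", "c", "d", "e", "f"),
  radius    = 0.25,
  stringsAsFactors = FALSE
)

# Hypothetical selection result; "" comes from an untagged text label
tags <- c("b", "", "e")

# Drop blanks, then look up the matching rows
tags     <- tags[nzchar(tags)]
selected <- cylinder[cylinder$attribute %in% tags, ]
selected
```

If several cylinders can share the same attribute, one option is to tag each cylinder with its row number converted to character instead, and index the data frame with as.integer(tags).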
I have vertex and index data for a human face here. A year ago I posted a question about plotting a 3D facial surface mesh based on these data. Now I want to plot only the right-half and mid-facial vertices while ignoring the left-side vertices. Based on my earlier plot, I tried the following code:
library(tidyverse)
library(readxl)
library(rgl)
vb <- read_excel("...\\vb.xlsx", sheet = "Sheet1", col_names = F)
it <- read_excel("...\\it.xlsx", sheet = "Sheet1", col_names = F)
# Extract vertices for the right side
lm_right_ind <- which(vb[,1] < 0)
vb_mat_right <- t(vb[lm_right_ind, ])
vb_mat_right <- rbind(vb_mat_right, 1)
rownames(vb_mat_right) <- c("xpts", "ypts", "zpts", "")
vertices1_right <- c(vb_mat_right)
# Extract `it` whose rows do not contain vertices on the left side
# Left-side vertices have vb[,1] greater than 0
lm_left_ind <- which(vb[,1] > 0)
leftContain <- NULL
for (i in 1: dim(it)[1]) {
if (T %in% (it[i,] %in% lm_left_ind)) {
leftContain[i] <- i
} else {leftContain[i] <- NA}
}
leftContain <- leftContain[!is.na(leftContain)]
# Remove indices that involve left-side vertices
it_rightMid <- it[-leftContain,]
it_mat_right <- t(as.matrix(it_rightMid))
rownames(it_mat_right) <- NULL
indices_right <- c(it_mat_right)
# Plot
try1_right <- tmesh3d(vertices = vertices1_right, indices = indices_right, homogeneous = TRUE,
material = NULL, normals = NULL, texcoords = NULL)
# Use addNormals to smooth the plot. See my Stackoverflow question:
# https://stackoverflow.com/questions/53918849/smooth-3d-trangular-mesh-in-r
try12_right <- addNormals(try1_right)
shade3d(try12_right, col="#add9ec", specular = "#202020", alpha = 0.8)
I got an error when trying to obtain try12_right:
Error in v[, it[3, i]] : subscript out of bounds.
I did exactly what I did for my earlier plot, so why did something go wrong here? Thank you.
Here's an example of using a clipping plane to leave off the left hand side of a mesh object:
library(rgl)
open3d()
root <- currentSubscene3d()
newSubscene3d("inherit", "inherit", "inherit", parent = root) # Clipping limited to this subscene
shade3d(addNormals(subdivision3d(icosahedron3d(), 2)), col = "pink")
clipplanes3d(a = 1, b = 0, c = 0, d = 0)
useSubscene3d(root)
decorate3d()
The fiddling with subscenes limits the clipping to just the shaded sphere, not everything else in the picture.
This produces this output:
If there's nothing else there, it's simpler:
library(rgl)
open3d()
shade3d(addNormals(subdivision3d(icosahedron3d(), 2)), col = "pink")
clipplanes3d(a = 1, b = 0, c = 0, d = 0)
which produces
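Separately from clipping, the "subscript out of bounds" error in the question most likely arises because the triangle indices still refer to the original vertex numbering after some vertices were dropped. If you do want to subset the mesh itself, the indices need to be renumbered. A minimal sketch on a toy mesh (the coordinates and triangles are made up for illustration; as in the question, x <= 0 is taken to mean the right side plus the midline):

```r
# Toy mesh: one vertex per row (x, y, z), one triangle per row of vertex indices
vb <- rbind(c( 1, 0, 0),   # vertex 1: left side
            c(-1, 0, 0),   # vertex 2: right side
            c( 0, 1, 0),   # vertex 3: midline
            c(-1, 1, 0),   # vertex 4: right side
            c( 1, 1, 0))   # vertex 5: left side
it <- rbind(c(2, 3, 4),
            c(1, 2, 3),
            c(3, 4, 5))

# Keep right-side and midline vertices
keep <- which(vb[, 1] <= 0)

# Keep only triangles whose three vertices all survive
tri_ok  <- apply(it, 1, function(tri) all(tri %in% keep))
it_kept <- it[tri_ok, , drop = FALSE]

# Renumber: the old index keep[j] becomes the new index j
it_new  <- matrix(match(it_kept, keep), ncol = 3)
vb_kept <- vb[keep, , drop = FALSE]
```

Here the surviving triangle (2, 3, 4) becomes (1, 2, 3), so every index is within the range of the reduced vertex matrix. The renumbered matrices can then be transposed and passed to tmesh3d() as in the question.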
I'm attempting to use Plotly to create an R Shiny app that dynamically populates a set of subplots sharing an x axis. Each subplot corresponds to a unique value in the data's UserID column. The code in question:
valid.directories <- get.valid.directories()
num.ids <- length(valid.directories)
# for subplot domain scaling
fraction <- 100 / num.ids
output[["plot1"]] <- renderPlotly({
# populate first subplot
p <- plot_ly(selected.data(), x = Timestamp[which(as.integer(valid.directories[1]) == as.integer(UserID))], y = Name[which(as.integer(valid.directories[1]) == as.integer(UserID))], xaxis = "x1", yaxis = "y1", mode = "markers")
# accumulate arguments for layout call
args.names <- c("showlegend", "xaxis", "yaxis")
for (i in 2:num.ids) {
args.names <- c(args.names, paste0("yaxis", i))
}
args <- vector("list", length(args.names))
names(args) <- args.names
args[["showlegend"]] <- FALSE
args[["xaxis"]] <- list(title = "")
args[["yaxis"]] <- list(anchor = 'x', domain = c(0, (fraction - (fraction / 10)) / 100), title = valid.directories[1])
for (j in 2:num.ids) {
p <- add_trace(p, x = Timestamp[which(as.integer(valid.directories[j]) == as.integer(UserID))], y = Name[which(as.integer(valid.directories[j]) == as.integer(UserID))], xaxis = "x1", yaxis = paste0("y", j), mode = "markers")
if (j == num.ids) {
last.upper.bound <- 1
} else {
last.upper.bound <- (fraction * j - (fraction / 10)) / 100
}
args[[paste0("yaxis", j)]] <- list(anchor = 'x', domain = c((fraction * (j - 1) + (fraction / 10)) / 100, last.upper.bound), title = valid.directories[j])
}
# populate additional subplots
p <- do.call(layout, c(p, args))
p
})
It almost works, but only the first and last subplots are populated correctly. Interior subplots appear, but do not display any data and their y axis domains are 0-4 instead of the correct values, AC, EOD and RT. I've checked that the underlying data is not the problem. Here's a screenshot of the incorrect plot:
And this one shows the same data plotted correctly on entirely separate plots:
I've been unable to figure out why the interior subplots are not shown correctly. The last subplot (9999) is populated in the same loop as the interior ones, but it is correct. I've verified that last.upper.bound, which is different for the last subplot, is not the issue.
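For checking the domain arithmetic independently of plotly, the calculation in the loop above can be isolated into a small base R helper. This is only a sketch of the same formulas; subplot_domains and its gap argument are hypothetical names, not part of plotly:

```r
# Hypothetical helper reproducing the y-axis domain arithmetic from the loop
subplot_domains <- function(n, gap = 0.1) {
  stopifnot(n >= 1)
  width <- 1 / n   # share of the plot height given to each subplot
  lapply(seq_len(n), function(j) {
    # first subplot starts at 0, last one ends at 1; others leave a gap
    lower <- if (j == 1) 0 else (j - 1) * width + gap * width
    upper <- if (j == n) 1 else j * width - gap * width
    c(lower, upper)
  })
}

domains <- subplot_domains(4)
domains
```

Printing the result for the actual number of IDs makes it easy to confirm that the interior domains do not overlap and lie inside [0, 1].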
I have made a loop for making multiple plots; however, I have no way of saving them. My code looks like this:
#----------------------------------------------------------------------------------------#
# RING data: Mikkel
#----------------------------------------------------------------------------------------#
# Set working directory
setwd()
#### Read data & Converting factors ####
dat <- read.table("Complete RING.txt", header =TRUE)
str(dat)
dat$Vial <- as.factor(dat$Vial)
dat$Line <- as.factor(dat$Line)
dat$Fly <- as.factor(dat$Fly)
dat$Temp <- as.factor(dat$Temp)
str(dat)
datSUM <- summaryBy(X0.5_sec+X1_sec+X1.5_sec+X2_sec+X2.5_sec+X3_sec~Vial_nr+Concentration+Sex+Line+Vial+Temp,data=dat, FUN=sum)
fl<-levels(datSUM$Line)
colors = c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")
meltet <- melt(datSUM, id=c("Concentration","Sex","Line","Vial", "Temp", "Vial_nr"))
levels(meltet$variable) <- c('0,5 sec', '1 sec', '1,5 sec', '2 sec', '2,5 sec', '3 sec')
meltet20 <- subset(meltet, Line=="20")
meltet20$variable <- as.factor(meltet20$variable)
AllConcentrations <- levels(meltet20$Concentration)
for (i in AllConcentrations) {
meltet.i <- meltet20[meltet20$Concentration ==i,]
quartz()
print(dotplot(value~variable|Temp, group=Sex, data = meltet.i ,xlab="Time", ylab="Total height pr vial [mm above buttom]", main=paste('Line 20 concentration ', meltet.i$Concentration[1]),
key = list(points = list(col = colors[1:2], pch = c(1, 2)),
text = list(c("Female", "Male")),
space = "top"), col = colors, pch =c(1, 2))) }
I have tried the quartz.save function, but that just overwrites the files. I'm using a Mac, if that makes any difference.
When I want to save multiple plots in a loop I tend to do something like...
for(i in AllConcentrations){
meltet.i <- meltet20[meltet20$Concentration ==i,]
pdf(paste("my_filename", i, ".pdf", sep = ""))
# lattice plots must be print()ed explicitly inside a loop, or nothing is drawn
print(dotplot(value~variable|Temp, group=Sex, data = meltet.i ,xlab="Time", ylab="Total height pr vial [mm above buttom]", main=paste('Line 20 concentration ', meltet.i$Concentration[1]),
key = list(points = list(col = colors[1:2], pch = c(1, 2)),
text = list(c("Female", "Male")),
space = "top"), col = colors, pch =c(1, 2)))
dev.off()
}
This creates one PDF file for every level in AllConcentrations and saves it in your working directory. Each file name is built by pasting together my_filename, the current level i, and .pdf, so every file is unique. Of course, you will want to adjust height and width in the pdf function.
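The same pattern can be tried on built-in data. The sketch below is an illustration under assumptions: it uses the iris data set and a temporary directory instead of the RING data, and the file name prefix is made up. Note that lattice functions such as dotplot() return a trellis object, so inside a loop the plot has to be wrapped in print() to be drawn on the open device:

```r
library(lattice)

out_dir   <- tempdir()       # assumed output location for this sketch
out_files <- character(0)

for (sp in levels(iris$Species)) {
  dat_sp   <- iris[iris$Species == sp, ]
  # One unique file name per level
  out_file <- file.path(out_dir, paste0("dotplot_", sp, ".pdf"))

  pdf(out_file, width = 7, height = 5)
  # Explicit print() is required for lattice plots inside a loop
  print(dotplot(Sepal.Length ~ factor(round(Sepal.Width)),
                data = dat_sp,
                xlab = "Sepal width (rounded)",
                main = paste("Species:", sp)))
  dev.off()

  out_files <- c(out_files, out_file)
}
```

Using file.path() keeps the directory and file name separate, which makes it easier to redirect the output later.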
I have the following scripts:
library("gplots")
mydata <- mtcars
mydata.nr <- nrow(mydata)
mydata.newval <- data.frame(row.names=rownames(mydata),new.val=-log(runif(mydata.nr)))
# Functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="euclidean")
# Set colors
hmcols <- rev(redgreen(256));
# Plot the scaled data
heatmap.2(as.matrix(mydata),dendrogram="row",scale="row",col=hmcols,trace="none", margin=c(8,9), hclust=hclustfunc,distfun=distfunc);
which generates the following heatmap:
Now, given a new data.frame which contains a new value for each car:
mydata.nr <- nrow(mydata)
mydata.newval <- data.frame(row.names=rownames(mydata),new.val=-log(runif(mydata.nr)))
I want to create a single-column heatmap with a gray gradient, positioned next to the row names.
How can I achieve that with heatmap.2 in R?
Does this do what you want? You can use the RowSideColors option to add a column to the side of the heatmap.
new.vals = mydata.newval[,1]
mydata.newval$scaled = ( new.vals - min(new.vals) ) /
( max(new.vals) - min(new.vals) )
mydata.newval$gray = gray( mydata.newval$scaled )
heatmap.2( as.matrix(mydata),
dendrogram = "row", scale = "row",
col = hmcols, trace = "none",
margins = c(8,9),
hclustfun = hclustfunc, distfun = distfunc,
RowSideColors=mydata.newval$gray )
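The min-max scaling that feeds gray() can be wrapped in a small helper, which also makes it easy to flip the direction of the gradient. A minimal sketch in base R; rescale01, to_gray, and the darker_is_higher flag are hypothetical names, not part of gplots:

```r
# Rescale a numeric vector to the range [0, 1]
rescale01 <- function(v) {
  rng <- range(v, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0.5, length(v)))  # constant input
  (v - rng[1]) / (rng[2] - rng[1])
}

# Map values to gray levels; gray(0) is black and gray(1) is white
to_gray <- function(v, darker_is_higher = FALSE) {
  s <- rescale01(v)
  if (darker_is_higher) s <- 1 - s
  gray(s)
}

light_high <- to_gray(c(2, 5, 11))
dark_high  <- to_gray(c(2, 5, 11), darker_is_higher = TRUE)
```

With the default, the smallest value is black and the largest is white, which matches the scaling above; setting darker_is_higher = TRUE reverses this so that larger values appear darker.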
If you want the gray column in between the heatmap and the labels, there isn't a simple
way to do that with heatmap.2; I don't think it was designed for
such purposes. One way to hack it together would be to make the gray values
go from 10 to 11 (or something out of the range of the rest of the data). Then
you would change the colors mapped to the breaks (see here). However, this
would make your key look pretty funky.
# heatmap.2 does the clustering BEFORE the scaling.
# Clustering after scaling might give different results
# heatmap.2 also reorders the dendrogram according to rowMeans.
# (Code copied directly from the heatmap.2 function)
x = as.matrix(mydata)
Rowv = rowMeans(x, na.rm = TRUE)
hcr = hclustfunc(distfunc(x))
ddr = as.dendrogram(hcr)
ddr = reorder(ddr, Rowv) # the row dendrogram
# Scale the data as heatmap.2 does
rm = rowMeans(x, na.rm = TRUE)
x = sweep(x, 1, rm)
sx = apply(x, 1, sd, na.rm = TRUE)
x = sweep(x, 1, sx, "/")
# add the new data as a column
new.vals = mydata.newval[,1]
new.vals.scaled = ( new.vals - min(new.vals) ) /
( max(new.vals) - min(new.vals) ) # scaled from 0 to 1
x = cbind( x, gray = max(x) + new.vals.scaled + 0.1 )
# make the custom breaks and colors
edge = max(abs(x-1.1))
breaks = seq(-edge,edge+1.1,length.out=1000)
gradient1 = greenred( sum( breaks[-length(breaks)] <= edge ) )
gradient2 = colorpanel( sum( breaks[-length(breaks)] > edge ), "white", "black" )
hm.colors = c(gradient1,gradient2)
hm = heatmap.2( x, col=hm.colors, breaks=breaks,
scale="none",
dendrogram="row", Rowv=ddr,
trace="none", margins=c(8,9) )
Although this hack works, I would look for a more robust solution using more flexible packages that play with different viewports using the grid package.