Changing the colour palette based on quantile values in pheatmap - r

I am very new to R and I am trying to make a pheatmap out of my data. I copied some existing code from a tutorial and, after some tweaking, it fits my data quite nicely. I also included some quantile code that should set the colour breaks based on my data, because most of the values are between 0 and 100 but a couple of them are in the thousands.
I would like the small values to get more colour variation and keep the light yellow only for the most extreme values. From the legend it seems to be the other way around right now...
Can somebody help me to tweak the breaks?
Thanks!
Here is my code and the output heatmap (values changed because the original contained sensitive data).
library(pheatmap)
library(viridis)   # provides inferno()

x <- read.table("proteins_cpm_commas.tsv", header = TRUE, row.names = 1)
x <- as.matrix(x)
x

# Equally spaced breaks, for comparison
mat_breaks <- seq(min(x), max(x), length.out = 20)
mat_breaks

# Quantile-based breaks
quantile_breaks <- function(xs, n = 20) {
  breaks <- quantile(xs, probs = seq(0, 1, length.out = n))
  breaks[!duplicated(breaks)]
}
mat_breaks <- quantile_breaks(x, n = 20)
mat_breaks

pheatmap(x,
         drop_levels = TRUE,
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         treeheight_col = 0,
         treeheight_row = 0,
         fontsize = 8,
         color = inferno(length(mat_breaks) - 1),
         breaks = mat_breaks,
         main = "Title")
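To see why the legend looks the way it does, it helps to compare the two kinds of breaks on skewed mock data (a sketch; the values in the thousands are invented for illustration, and quantile_breaks() is the function defined above):
# Mock skewed data: most values between 0 and 100, a few in the thousands
set.seed(1)
demo <- c(runif(95, 0, 100), runif(5, 1000, 5000))

# Equally spaced breaks: most of them land in the sparse 100-5000 range,
# so almost every cell falls into the first one or two colour bins
round(seq(min(demo), max(demo), length.out = 20))

# Quantile breaks: most of them fall below 100, so the bulk of the palette
# is spent where the data actually are
round(quantile_breaks(demo, n = 20))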

While this answer does not use the pheatmap package (the pheatmap solutions I came up with were all rather hacky), I would recommend a solution using the ComplexHeatmap package instead. Example using mock data:
suppressPackageStartupMessages(
  lapply(c("ComplexHeatmap", "circlize", "viridisLite"),
         require, character.only = TRUE))

set.seed(23)
x <- matrix(rexp(2000, rate = .001), ncol = 20)

quantile_breaks <- function(xs, n = 10) {
  breaks <- quantile(xs, probs = seq(0, 1, length.out = n))
  breaks[!duplicated(breaks)]
}

mat_breaks <- quantile_breaks(x, n = 11)
col_fun_prop <- colorRamp2(quantile_breaks(x, n = 11), viridisLite::inferno(11))

Heatmap(x, col = col_fun_prop,
        heatmap_legend_param = list(
          labels = round(mat_breaks), at = mat_breaks, col_fun = col_fun_prop,
          title = "Prop", break_dist = 1))
Created on 2022-03-31 by the reprex package (v2.0.1)
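If you would rather stay with pheatmap, the same quantile idea carries over: the breaks just have to be computed before the pheatmap() call, and the colour vector needs one fewer entry than the breaks. A minimal sketch, assuming inferno() comes from the viridis package and that x and quantile_breaks() are defined as in the question:
library(pheatmap)
library(viridis)   # assumed source of inferno()

mat_breaks <- quantile_breaks(x, n = 20)          # most breaks end up below 100
pheatmap(x,
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         color = inferno(length(mat_breaks) - 1), # one colour per interval between breaks
         breaks = mat_breaks)
# If the light-yellow end of the palette should mark the low values instead,
# reverse it: color = rev(inferno(length(mat_breaks) - 1))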

Related

Pixelwise regression on rasters in R

I am performing a pixelwise regression on 4 different raster files, each representing a different time stamp of a coastal dune. My goal is to calculate the slope of elevation change, to see which areas show the fastest growth in elevation.
They have been resampled to have the same resolution, and were then stacked.
I then wanted to perform a regression analysis on them (which I found here: pixel level regression with large raster dataset). However, I got an error, and I think it is something silly (I am quite new to raster analyses), but I cannot figure it out.
This is my code:
library(raster)

dem18 <- raster("20220912_0216419_DUDE Oostende_T18_DEM.tif")
dem1  <- raster("20210223_DUDE_Oostende_T1_DEM.tif")
dem3  <- raster("20210430_0216404_DUDE Oostende_T3_DEM.tif")
dem15 <- raster("20220419_0216416_DUDE Oostende_T15_DEM.tif")

dem1  <- resample(dem1, dem18)
dem3  <- resample(dem3, dem18)
dem15 <- resample(dem15, dem18)
dem_stacked <- stack(dem1, dem3, dem15, dem18)

# x_lim and y_lim are defined elsewhere in the script
plot(dem_stacked, xlim = x_lim, ylim = y_lim, col = terrain.colors(100))
This is the plot:
This is my regression:
func <- function(val) {
  summary(lm(y ~ x, data = data.frame(x = val, y = 1:4)))$coefficients[2, 1]
}
slope <- calc(dem_stacked, func)
plot(slope)

Error in .calcTest(x[1:5], fun, na.rm, forcefun, forceapply) :
  cannot use this function
Thanks in advance!
I think your issue is your indexing in the regression. Try something like this:
library(raster)

# test dataset
set.seed(32)
dem1 <- raster(matrix(runif(9, 1, 10), 3, 3))
dem2 <- raster(matrix(runif(9, 1, 20), 3, 3))
dem3 <- raster(matrix(runif(9, 1, 100), 3, 3))
dem4 <- raster(matrix(runif(9, 1, 300), 3, 3))
dem_stacked <- stack(dem1, dem2, dem3, dem4)
plot(dem_stacked)

func <- function(val) {
  summary(lm(y ~ x, data = data.frame(x = val, y = 1:4)))$coefficients[2, 1]
}

slopes <- calc(dem_stacked, func)
plot(slopes)

# test regression on one pixel stack
func(as.vector(dem_stacked[1, 1]))
#> [1] 0.01707964
Created on 2022-09-27 by the reprex package (v2.0.1)
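One thing worth checking on the real DEMs: calc() first tests the function on a handful of cells, and lm() fails if any of the four layers is NA at a cell (common along a coastline), which can also trigger the "cannot use this function" error. A guarded version of the function is a possible fix; this is only a sketch under that assumption:
func_na <- function(val) {
  # return NA for cells where the regression cannot be fitted,
  # e.g. because one of the DEM layers has no data there
  if (any(is.na(val))) return(NA_real_)
  summary(lm(y ~ x, data = data.frame(x = val, y = 1:4)))$coefficients[2, 1]
}
slopes <- calc(dem_stacked, func_na)
plot(slopes)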

How to rotate nodes of a time-calibrated phylogenetic tree to match a particular order in R?

I have a time-calibrated phylogenetic tree from BEAST and I would like to make a figure in which its nodes are rotated to match an arbitrary ordering. The following code works perfectly to plot the tree with the nodes in the order they are in the input file.
library("phytools")
library("phyloch")
library("strap")
library("coda")
t <- read.beast("mcctree.tre") # I couldn't upload the file here
t$root.time <- t$height[1]
num_taxa <- length(t$tip.label)
display_all_node_bars <- TRUE
names_list <-vector()
for (name in t$tip){
v <- strsplit(name, "_")[[1]]
if(display_all_node_bars){
names_list = c(names_list, name)
}
else if(v[length(v)]=="0"){
names_list = c(names_list, name)
}
}
nids <- vector()
pos <- 1
len_nl <- length(names_list)
for(n in names_list){
for(nn in names_list[pos:len_nl]){
if(n != nn){
m <- getMRCA(t,c(n, nn))
if(m %in% nids == FALSE){
nids <- c(nids, m)
}
}
}
pos <- pos+1
}
pdf("tree.pdf", width = 20, height = 20)
geoscalePhylo(tree = t,
x.lim = c(-2,21),
units = c("Epoch"),
tick.scale = "myr",
boxes = FALSE,
width = 1,
cex.tip = 2,
cex.age = 3,
cex.ts = 2,
erotate = 0,
label.offset = 0.1)
lastPP <- get("last_plot.phylo", envir = .PlotPhyloEnv)
for(nv in nids){
bar_xx_a <- c(lastPP$xx[nv]+t$height[nv-num_taxa]-t$"height_95%_HPD_MIN"[nv-num_taxa],
lastPP$xx[nv]-(t$"height_95%_HPD_MAX"[nv-num_taxa]-t$height[nv-num_taxa]))
lines(bar_xx_a, c(lastPP$yy[nv], lastPP$yy[nv]), col = rgb(0, 0, 1, alpha = 0.3), lwd = 12)
}
t$node.label <- t$posterior
p <- character(length(t$node.label))
p[t$node.label >= 0.95] <- "black"
p[t$node.label < 0.95 & t$node.label >= 0.75] <- "gray"
p[t$node.label < 0.75] <- "white"
nodelabels(pch = 21, cex = 1.5, bg = p)
dev.off()
The following code is my attempt to rotate the nodes the way I want (following this tutorial: http://blog.phytools.org/2015/04/finding-closest-set-of-node-rotations.html), and the rotation itself works. However, the blue bars indicating the confidence intervals of the divergence-time estimates end up in the wrong places, and this is what I would like help correcting. This will be used on much larger files with hundreds of branches; the example here is simplified.
new.order <- c("Sp8","Sp9","Sp10","Sp7","Sp6","Sp5","Sp4","Sp2","Sp3","Ou1","Ou2","Sp1")
t2 <- setNames(1:Ntip(t), new.order)
new.order.tree <- minRotate(t, t2)

new.order.tree$root.time <- t$root.time
new.order.tree$height <- t$height
new.order.tree$"height_95%_HPD_MIN" <- t$"height_95%_HPD_MIN"
new.order.tree$"height_95%_HPD_MAX" <- t$"height_95%_HPD_MAX"

pdf("reordered_tree.pdf", width = 20, height = 20)

geoscalePhylo(tree = new.order.tree,
              x.lim = c(-2, 21),
              units = c("Epoch"),
              tick.scale = "myr",
              boxes = FALSE,
              width = 1,
              cex.tip = 2,
              cex.age = 3,
              cex.ts = 2,
              erotate = 0,
              label.offset = 0.1)

lastPP <- get("last_plot.phylo", envir = .PlotPhyloEnv)

for (nv in nids) {
  bar_xx_a <- c(lastPP$xx[nv] + new.order.tree$height[nv - num_taxa] - new.order.tree$"height_95%_HPD_MIN"[nv - num_taxa],
                lastPP$xx[nv] - (new.order.tree$"height_95%_HPD_MAX"[nv - num_taxa] - new.order.tree$height[nv - num_taxa]))
  lines(bar_xx_a, c(lastPP$yy[nv], lastPP$yy[nv]), col = rgb(0, 0, 1, alpha = 0.3), lwd = 12)
}

new.order.tree$node.label <- t$posterior
p <- character(length(new.order.tree$node.label))
p[new.order.tree$node.label >= 0.95] <- "black"
p[new.order.tree$node.label < 0.95 & new.order.tree$node.label >= 0.75] <- "gray"
p[new.order.tree$node.label < 0.75] <- "white"
nodelabels(pch = 21, cex = 1.5, bg = p)

dev.off()
I've found several similar questions here and in other forums, but none dealing specifically with time-calibrated trees - which is the core of the problem described above.
The short answer is that phytools::minRotate() doesn't recognize the confidence intervals as associated with nodes. If you contact the phytools maintainers, they may well be able to add this functionality quite easily.
Meanwhile, you can correct this yourself.
I don't know how read.beast() saves confidence intervals – let's say they're saved in t$conf.int. (Type unclass(t) at the R command line to see the full structure; you should be able to identify the appropriate property.)
If the tree's node labels are unique, then you can infer the new sequence of nodes using match():
library("phytools")
new.order <- c("Sp8","Sp9","Sp10","Sp7","Sp6","Sp5","Sp4","Sp2","Sp3","Ou1","Ou2","Sp1")
# Set up a fake initial tree -- you would load the tree from a file
tree <- rtree(length(new.order))
tree$tip.label <- sort(new.order)
tree$node.label <- seq_len(tree$Nnode)
tree$conf.int <- seq_len(tree$Nnode) * 10
# Plot tree
par(mfrow = c(1, 2), mar = rep(0, 4), cex = 0.9) # Create space
plot(tree, show.node.label = TRUE)
nodelabels(tree$conf.int, adj = 1) # Annotate "correct" intervals
# Re-order nodes with minRotate
noTree <- minRotate(tree, setNames(seq_along(new.order), new.order))
plot(noTree, show.node.label = TRUE)
# Move confidence intervals to correct node
tree$conf.int <- tree$conf.int[match(noTree$node.label, tree$node.label)]
nodelabels(tree$conf.int, adj = 1)
If you can't guarantee that the node labels are unique, you can always overwrite them in a temporary object:
# Find the node order
treeCopy <- tree
treeCopy$node.label <- seq_len(tree$Nnode)
nodeOrder <- match(minRotate(treeCopy, setNames(seq_along(new.order), new.order))$node.label,
                   treeCopy$node.label)

# Apply the node order
tree$conf.int <- tree$conf.int[nodeOrder]
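Applied to the BEAST tree from the question, the same index can then be used to shuffle every per-node vector before the rotated tree is plotted, so the bars land on the right nodes. A rough sketch, assuming t has unique node labels (e.g. the posterior values) and that height and the HPD vectors are ordered by node number:
# hypothetical: field names taken from the question's tree object, with
# nodeOrder computed for t exactly as above
new.order.tree$height               <- t$height[nodeOrder]
new.order.tree$"height_95%_HPD_MIN" <- t$"height_95%_HPD_MIN"[nodeOrder]
new.order.tree$"height_95%_HPD_MAX" <- t$"height_95%_HPD_MAX"[nodeOrder]
# nids (the MRCA node ids) should also be recomputed on new.order.tree
# before the bar-drawing loop is run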

Adjust nomogram ticks with (splines) transformation, rms package [R]

I'm fitting a Cox regression model in which my variable enters through a spline transformation. Everything works nicely until the subsequent nomogram: as expected, the scale of my variable is also transformed, but I'd like to add some custom ticks in the region between the values 0 and 2 (I guess this is the transformed part). Any ideas?
Here's my code...
library(rms)

data <- source("https://pastebin.com/raw/rGtUSTLz")$value

ddist <- datadist(data)
options(datadist = "ddist")

fit <- cph(Surv(time, event) ~ rcs(var, 3), data = data, surv = TRUE, x = TRUE, y = TRUE)
surv <- Survival(fit)

plot(nomogram(fit,
              fun = list(function(x) surv(times = 10, lp = x),
                         function(x) surv(times = 30, lp = x),
                         function(x) surv(times = 60, lp = x)),
              funlabel = paste("c", 1:3), lp = TRUE))
... and these are the real and the desired outputs.
Thanks in advance for your help!
I have had this issue too. My answer is a workaround using another package, regplot. Alternatively, if you already know the point values at the tick marks you want plotted, you can supply those instead of using the output from regplot. Basically, you need to modify the tick marks and the corresponding points that nomogram() outputs and that are supplied to the nomogram plot.
This method also provides a way to remove points / tick marks by editing the nomogram output (a short sketch of that follows the code below).
library(rms)
library(regplot)

data <- source("https://pastebin.com/raw/rGtUSTLz")$value

ddist <- datadist(data)
options(datadist = "ddist")

fit <- cph(Surv(time, event) ~ rcs(var, 3), data = data, surv = TRUE, x = TRUE, y = TRUE)
surv <- Survival(fit)

nom1 <- nomogram(fit, fun = list(function(x) surv(times = 10, lp = x),
                                 function(x) surv(times = 30, lp = x),
                                 function(x) surv(times = 60, lp = x)),
                 funlabel = paste("c", 1:3), lp = TRUE)

# Call regplot() with points = TRUE to see the point values it assigns
regplot(fit, fun = list(function(x) surv(times = 10, lp = x),
                        function(x) surv(times = 30, lp = x),
                        function(x) surv(times = 60, lp = x)),
        funlabel = paste("c", 1:3), points = TRUE)

# Look at the points supplied through regplot and take those.
nom1_edit <- nom1

# Now edit the ticks supplied for var and their corresponding point values
nom1_edit[[1]][1] <- list(c(0, 0.06, 0.15, 0.3, 2, 4, 6, 8, 10, 12, 14, 16))
nom1_edit[[1]][2] <- list(c(0, 10, 21, 32, 42.41191, 50.63878, 58.86565,
                            67.09252, 75.31939, 83.54626, 91.77313, 100.00000))
nom1_edit$var$points <- c(0, 10, 21, 32, 42.41191, 50.63878, 58.86565,
                          67.09252, 75.31939, 83.54626, 91.77313, 100.00000)

# Plot the edited nomogram with the new points
plot(nom1_edit)
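The removal case mentioned above works the same way: drop the unwanted entries from both the tick values and their point values before plotting. A sketch using the same nom1_edit object, with the ticks to drop chosen arbitrarily:
# say we want to remove the ticks at var = 4 and var = 6
keep <- !(nom1_edit[[1]][[1]] %in% c(4, 6))
nom1_edit[[1]][[1]] <- nom1_edit[[1]][[1]][keep]
nom1_edit[[1]][[2]] <- nom1_edit[[1]][[2]][keep]
nom1_edit$var$points <- nom1_edit$var$points[keep]
plot(nom1_edit)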

Graphical output of density for the function gammamixEM (package mixtools)

I'm using the function gammamixEM from the package mixtools. How can I get the same graphical output of the density as the function normalmixEM provides (i.e., the second plot from plot(..., which = 2))?
Update:
Here is a reproducible example for the function gammamixEM:
library(mixtools)

x <- c(rgamma(200, shape = 0.2, scale = 14),
       rgamma(200, shape = 32, scale = 10),
       rgamma(200, shape = 5, scale = 6))
out <- gammamixEM(x, lambda = c(1, 1, 1)/3, verb = TRUE)
Here is a reproducible example for the function normalmixEM:
data(faithful)
attach(faithful)
out <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)
plot(out, which=2)
I would like to obtain this graphical output of density from the function gammamixEM.
Here you go.
out <- normalmixEM(waiting, arbvar = FALSE, epsilon = 1e-03)
x <- out

whichplots <- 2
density <- 2 %in% whichplots
loglik  <- 1 %in% whichplots

def.par <- par(ask = (loglik + density > 1), "mar") # only ask and mar are changed

mix.object <- x
k <- ncol(mix.object$posterior)
x <- sort(mix.object$x)
a <- hist(x, plot = FALSE)
maxy <- max(max(a$density), .3989 * mix.object$lambda / mix.object$sigma)
I just had to dig into the source code of plot.mixEM. So, now to do the same with gammamixEM:
x <- c(rgamma(200, shape = 0.2, scale = 14),
       rgamma(200, shape = 32, scale = 10),
       rgamma(200, shape = 5, scale = 6))
gammamixEM.out <- gammamixEM(x, lambda = c(1, 1, 1)/3, verb = TRUE)

mix.object <- gammamixEM.out
k <- ncol(mix.object$posterior)
x <- sort(mix.object$x)
a <- hist(x, plot = FALSE)
maxy <- max(a$density)   # gammamixEM has no $sigma component, so take the histogram maximum

main2 <- "Density Curves"
xlab2 <- "Data"
col2 <- 2:(k + 1)

hist(x, prob = TRUE, main = main2, xlab = xlab2, ylim = c(0, maxy))
for (i in 1:k) {
  # gamma.pars holds the fitted shape ("alpha") and scale ("beta") of each component
  lines(x, mix.object$lambda[i] *
             dgamma(x, shape = mix.object$gamma.pars["alpha", i],
                    scale = mix.object$gamma.pars["beta", i]),
        col = col2[i])
}
I believe it should be pretty straightforward to continue this example a bit if you want to add the labels, smooth lines, etc. Here's the source of the plot.mixEM function.
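To get smoother curves and per-component labels (the part left as an exercise above), one option is to evaluate the fitted densities on a fine grid. This is only a sketch, again assuming gamma.pars stores the fitted shape ("alpha") and scale ("beta") of each component:
xx <- seq(min(x), max(x), length.out = 500)   # fine grid for smooth curves
for (i in 1:k) {
  lines(xx, mix.object$lambda[i] *
              dgamma(xx, shape = mix.object$gamma.pars["alpha", i],
                     scale = mix.object$gamma.pars["beta", i]),
        col = col2[i], lwd = 2)
}
legend("topright", legend = paste("Component", 1:k), col = col2, lwd = 2)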

How can I set the bin centre values of a histogram myself?

Let's say I have a data frame like the one below:
mat <- data.frame(matrix(data = rexp(200, rate = 10), nrow = 100, ncol = 10))
I can then calculate a histogram for each of its columns using
matAllCols <- apply(mat, 2, hist)
Now if you look at the breaks in matAllCols, you can see that the number of bins varies: sometimes 11, sometimes 12, etc.
What I want is to fix this: for example, there should always be 12 bins, and the distance between the bin centres (stored in the mids component) should be, say, 0.01.
Doing it for one column at a time seems simple, but when I tried to do it for all columns it did not work. Also, this only sets the breaks; how to set the mids is not straightforward either.
matAllCols <- apply(mat, 2, function(x) hist(x, breaks = 12))
Is there any way to do this?
You can solve the problem by giving all the breakpoints between the histogram cells as breaks. (This is documented at stat.ethz.ch/R-manual/R-devel/library/graphics/html/hist.html, as @Colonel Beauvel said.)
set.seed(1)
mat <- data.frame(matrix(data = rexp(200, rate = 10), nrow = 100, ncol = 10))

# You need to check the data range to decide on the breakpoints.
range(mat) # [1] 0.002025041 0.483281274

# You can set the breakpoints manually.
matAllCols <- apply(mat, 2, function(x) hist(x, breaks = seq(0, 0.52, 0.04)))
You are looking for
set.seed(1)
mat <- data.frame(matrix(data = rexp(200, rate = 10), nrow = 100, ncol = 10))
matAllCols <- apply(mat, 2, function(x) hist(x, breaks = seq(0, 0.5, 0.05)))
or simply
x <- rexp(200, rate = 10)
hist(x[x >= 0 & x <= 0.5], breaks = seq(0, 0.5, 0.05))
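If what you really want to fix are the bin centres (the mids) rather than the breaks, you can derive the breaks from the desired centres and apply the same breaks to every column. A sketch with centres 0.05 apart (a spacing of 0.01 works the same way, it just gives about 50 bins for this data range):
# 12 equally spaced bin centres covering the data range
centres <- seq(0.025, 0.575, by = 0.05)
width <- centres[2] - centres[1]
brks <- c(centres - width / 2, centres[length(centres)] + width / 2)

# lapply keeps one histogram object per column, so the components stay accessible
matAllCols <- lapply(mat, function(x) hist(x, breaks = brks, plot = FALSE))
matAllCols[[1]]$mids   # the same centres for every column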
