R: ggplot2 and loop with data.table

I have to make 4 plots which differ only in y and ylab.
I start from a data.table, dt:
library(data.table)
library(ggplot2)
set.seed(123)
dt <- data.table(a = rnorm(20),
b = rnorm(20),
c = rnorm(20),
d = rnorm(20),
e = rnorm(20))
Each plot should be a scatterplot of the y values against the row numbers (x). Additionally, I want to draw horizontal lines (geom_hline) at median(y) + h*mad(y), where h = c(0, -2, 2, -3, 3).
This plot should be repeated for columns a, c, d and e of dt.
I came up with this code:
# Defining y labels #
ylabels <- c(bquote(phantom(.)^100*A~"/"*phantom(.)^200*A),
bquote(phantom(.)^101*C~"/"*phantom(.)^201*B),
bquote(phantom(.)^102*D~"/"*phantom(.)^202*D),
bquote(phantom(.)^103*E~"/"*phantom(.)^202*E))
# Selecting columns of dt
ydata <- names(dt)[c(1, 3, 4, 5)]
h <- c(0, -2, 2, -3, 3)
hcol <- c("#009E73", "#E69F00", "#E69F00", "red", "red")
# The for cycle should create the 4 plots and assign them to a list
plots <- list()
for (i in seq_along(ydata)) {
p1 <- ggplot(dt, aes_string(x = seq(1, dt[, .N]), y = ydata[i])) +
geom_point() +
geom_hline(aes_string(yintercept = median(ydata[i]) +
h * mad(ydata[i])), color = hcol) +
xlab("Replicate") +
ylab(ylabels[i]) +
scale_x_continuous(breaks = seq(1, dt[,.N]))
plots[[i]] <- p1 # add each plot into plot list
}
Then plots will be fed to the multiplot function from the Cookbook for R.
However, my loop doesn't work: it fails to calculate the median and mad values.
Do you have any suggestions to make the code work?

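Compute the median +/- h * mad values for every selected column up front with data.table, then pass them to geom_hline() through its own data argument:
# data.table with the median +/- h * mad values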
hline.values <- dt[, lapply(.SD, function(x) median(x) + h * mad(x)),
.SDcols = ydata]
# new empty list
plots <- list()
for (i in seq_along(ydata)) {
p1 <- ggplot(dt, aes_string(x = seq(1, dt[, .N]), y = ydata[i])) +
geom_point() +
geom_hline(data = hline.values,
aes_string(yintercept = ydata[i])) +
# Axis labels and theme
xlab("Replicate") +
ylab(ylabels[[i]]) +
scale_x_continuous(breaks = seq(1, dt[, .N]))
plots[[i]] <- p1
}
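Since aes_string() is deprecated in current ggplot2, here is a sketch of the same answer using the .data pronoun and lapply(), assuming ggplot2 >= 3.0.0. It also keeps the hcol colours from the question, which the loop above drops, by storing each line's colour as a column and using scale_colour_identity(). The replicate column is an addition of this sketch, not part of the original code. The ylabels from the question can be added with ylab(ylabels[[i]]) if you iterate over seq_along(ydata) instead.

```r
library(data.table)
library(ggplot2)

set.seed(123)
dt <- data.table(a = rnorm(20), b = rnorm(20), c = rnorm(20),
                 d = rnorm(20), e = rnorm(20))
ydata <- names(dt)[c(1, 3, 4, 5)]
h <- c(0, -2, 2, -3, 3)
hcol <- c("#009E73", "#E69F00", "#E69F00", "red", "red")

# Row index as an explicit column (added to dt by reference)
dt[, replicate := .I]

plots <- lapply(ydata, function(col) {
  y <- dt[[col]]
  # One row per reference line, with its colour stored as data
  lines <- data.frame(yint = median(y) + h * mad(y), colour = hcol)
  ggplot(dt, aes(x = replicate, y = .data[[col]])) +
    geom_point() +
    # geom_hline() does not inherit the plot's x/y mapping
    geom_hline(data = lines, aes(yintercept = yint, colour = colour)) +
    scale_colour_identity() +
    xlab("Replicate") +
    scale_x_continuous(breaks = dt$replicate)
})
```

Because lapply() runs the function once per column name, each plot captures its own col, y and lines.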

Related

List of plots generated in ggplot2 using scale_color_gradientn have wrong coloring

I'm attempting to use library(scales) and scale_color_gradientn() to create a custom mapping of colors to a continuous variable, in an attempt to limit the effect of outliers on the coloring of my plot. This works for a single plot, but does not work when I use a loop to generate several plots and store them in a list.
Here is a minimal working example:
library(ggplot2)
library(scales)
data1 <- as.data.frame(cbind(x = rnorm(100),
y = rnorm(100),
v1 = rnorm(100, mean = 2, sd = 1),
v2 = rnorm(100, mean = -2, sd = 1)))
#add outliers
data1[1,"v1"] <- 200
data1[2,"v1"] <- -200
data1[1,"v2"] <- 50
data1[2,"v2"] <- -50
#define color palette
cols <- colorRampPalette(c("#3540FF","black","#FF3535"))(n = 100)
#simple color scale
col2 <- scale_color_gradient2(low = "#3540FF",
mid = "black",
high = "#FF3535"
)
#outlier-adjusted color scale
{
aa <- min(data1$v1)
bb <- quantile(data1$v1, 0.05)
cc <- quantile(data1$v1, 0.95)
dd <- max(data1$v1)
coln <- scale_color_gradientn(colors = cols[c(1,5,95,100)],
values = rescale(c(aa,bb,cc,dd),
limits = c(aa,dd))
)
}
Plots:
1. Plot with simple scales - outliers cause scales to stretch out.
ggplot(data1, aes(x = x, y = y, color = v1))+
geom_point()+
col2
2. Plot with outlier-adjusted scales - correct color scaling.
ggplot(data1, aes(x = x, y = y, color = v1))+
geom_point()+
coln
3. The scales for v1 do not work for v2 as the data is different.
ggplot(data1, aes(x = x, y = y, color = v2))+
geom_point()+
coln
#loop to produce list of plots each with own scale
{
plots <- list()
k <- 1
for (i in c("v1","v2")){
aa <- min(data1[,i])
bb <- quantile(data1[,i],0.05)
cc <- quantile(data1[,i], 0.95)
dd <- max(data1[,i])
colm <- scale_color_gradientn(colors = cols[c(1,5,95,100)],
values = rescale(c(aa,bb,cc,dd),
limits = c(aa,dd)))
plots[[k]] <- ggplot(data1, aes_string(x = "x",
y = "y",
color = i
))+
geom_point()+
colm
k <- k + 1
}
}
4. First plot has the wrong scales.
plots[[1]]
5. Second plot has the correct scales.
plots[[2]]
So I'm guessing this has something to do with the scale_color_gradientn() function being called when the plotting takes place, rather than within the loop.
If anyone can help with this, it'd be much appreciated. In base R I would bin the continuous data and assign hex colors to a vector used for the fill color, but I'm unsure how to apply this within ggplot.
You need to use a closure (a function with its own associated environment), so that aa, bb, cc and dd are local to each call instead of being shared variables that the next iteration overwrites:
{
plots <- list()
k <- 1
for (i in c("v1", "v2")){
colm <- function() {
aa <- min(data1[, i])
bb <- quantile(data1[, i], 0.05)
cc <- quantile(data1[, i], 0.95)
dd <- max(data1[, i])
scale_color_gradientn(colors = cols[c(1, 5, 95, 100)],
values = rescale(c(aa, bb, cc, dd),
limits = c(aa, dd)))
}
plots[[k]] <- ggplot(data1, aes_string(x = "x",
y = "y",
color = i)) +
geom_point() +
colm()
k <- k + 1
}
}
plots[[1]]
plots[[2]]
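lapply() gives the same effect as the closure, because every call of the anonymous function runs in a fresh environment. The sketch below recreates the question's data1 and cols so it runs on its own, and introduces a hypothetical helper, outlier_scale(). It passes the range through rescale()'s from argument: scales::rescale() has no limits argument, so limits = c(aa, dd) in the question is absorbed by ... (harmless there, since the default from is already the range of the values).

```r
library(ggplot2)
library(scales)

set.seed(1)
data1 <- data.frame(x = rnorm(100), y = rnorm(100),
                    v1 = rnorm(100, mean = 2, sd = 1),
                    v2 = rnorm(100, mean = -2, sd = 1))
# Add outliers, as in the question
data1[1:2, "v1"] <- c(200, -200)
data1[1:2, "v2"] <- c(50, -50)
cols <- colorRampPalette(c("#3540FF", "black", "#FF3535"))(n = 100)

# Hypothetical helper: each call has its own environment, so the
# knots it computes cannot be overwritten by a later call
outlier_scale <- function(v) {
  knots <- c(min(v), quantile(v, c(0.05, 0.95)), max(v))
  scale_color_gradientn(colors = cols[c(1, 5, 95, 100)],
                        values = rescale(knots, from = range(v)))
}

plots <- lapply(c("v1", "v2"), function(var) {
  ggplot(data1, aes(x = x, y = y, color = .data[[var]])) +
    geom_point() +
    outlier_scale(data1[[var]])
})
```

plots[[1]] and plots[[2]] should each use the 5% and 95% quantiles of their own variable.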

Manually assigning colors with scale_fill_manual only works for certain hexagon sizes

I am trying to create a scatterplot that is summarized by hexagon bins of counts. I would like the user to be able to define the count breaks for the color scale. I have this working, using scale_fill_manual(). Oddly, however, it only works sometimes. In the MWE below, using the given seed value, if xbins=10, there are issues resulting in a plot as follows:
However, if xbins=20 or 40, for example, the plot doesn't seem to have problems:
My MWE is as follows:
library(ggplot2)
library(hexbin)
library(RColorBrewer)
set.seed(1)
xbins <- 20
x <- abs(rnorm(10000))
y <- abs(rnorm(10000))
minVal <- min(x, y)
maxVal <- max(x, y)
maxRange <- c(minVal, maxVal)
buffer <- (maxRange[2] - maxRange[1]) / (xbins / 2)
h <- hexbin(x = x, y = y, xbins = xbins, shape = 1, IDs = TRUE,
xbnds = maxRange, ybnds = maxRange)
hexdf <- data.frame(hcell2xy(h), hexID = h@cell, counts = h@count)
my_breaks <- c(2, 4, 6, 8, 20, 1000)
clrs <- brewer.pal(length(my_breaks) + 3, "Blues")
clrs <- clrs[3:length(clrs)]
hexdf$countColor <- cut(hexdf$counts, breaks = c(0, my_breaks, Inf),
labels = rev(clrs))
ggplot(hexdf, aes(x = x, y = y, hexID = hexID, fill = countColor)) +
scale_fill_manual(values = levels(hexdf$countColor)) +
geom_hex(stat = "identity") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
coord_fixed(xlim = c(-0.5, (maxRange[2] + buffer)),
ylim = c(-0.5, (maxRange[2] + buffer))) +
theme(aspect.ratio=1)
My goal is to tweak this code so that the plot does not have problems (where suddenly certain hexagons are different sizes and shapes than the rest) regardless of the value assigned to xbins. However, I am puzzled what may be causing this problem for certain xbins values. Any advice would be greatly appreciated.
EDIT:
I am updating the example code after taking into account comments by @bdemarest and @Axeman. I followed the most popular answer in the link @Axeman recommends, and believe it is more useful when you are working with scale_fill_continuous() on an integer vector. Here, I am working with scale_fill_manual() on a factor vector, so I am still unable to get this to work. Thank you.
library(ggplot2)
library(hexbin)
library(RColorBrewer)
library(reshape2)  # for melt()
set.seed(1)
xbins <- 10
x <- abs(rnorm(10000))
y <- abs(rnorm(10000))
minVal <- min(x, y)
maxVal <- max(x, y)
maxRange <- c(minVal, maxVal)
buffer <- (maxRange[2] - maxRange[1]) / (xbins / 2)
bindata = data.frame(x=x,y=y,factor=as.factor(1))
h <- hexbin(bindata, xbins = xbins, IDs = TRUE, xbnds = maxRange, ybnds = maxRange)
counts <- hexTapply (h, bindata$factor, table)
counts <- t (simplify2array (counts))
counts <- melt (counts)
colnames (counts) <- c ("factor", "ID", "counts")
counts$factor =as.factor(counts$factor)
hexdf <- data.frame(hcell2xy(h), ID = h@cell)
hexdf <- merge (counts, hexdf)
my_breaks <- c(2, 4, 6, 8, 20, 1000)
clrs <- brewer.pal(length(my_breaks) + 3, "Blues")
clrs <- clrs[3:length(clrs)]
hexdf$countColor <- cut(hexdf$counts, breaks = c(0, my_breaks, Inf), labels = rev(clrs))
ggplot(hexdf, aes(x = x, y = y, fill = countColor)) +
scale_fill_manual(values = levels(hexdf$countColor)) +
geom_hex(stat = "identity") +
geom_abline(intercept = 0, color = "red", size = 0.25) +
coord_cartesian(xlim = c(-0.5, maxRange[2]+buffer), ylim = c(-0.5, maxRange[2]+ buffer)) + theme(aspect.ratio=1)
You can set the fill colours directly in the geom instead of through a fill scale:
ggplot(hexdf, aes(x = x, y = y)) +
geom_hex(stat = "identity", fill = as.character(hexdf$countColor))
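One plausible cause, consistent with the answer above: geom_hex() draws each group separately and estimates the hexagon size from the spacing (resolution()) of that group's x and y values. Mapping fill to a factor splits the hexes into one group per colour, so a colour with only a few scattered cells can get the wrong size. The sketch below keeps the fill mapping but forces a single group with group = 1, and stores the colours as character so scale_fill_identity() uses them directly; this also avoids scale_fill_manual() matching unnamed values to levels by position, which can shift colours when some levels are empty. It reuses the question's data generation, simplified.

```r
library(ggplot2)
library(hexbin)
library(RColorBrewer)

set.seed(1)
xbins <- 10
x <- abs(rnorm(10000))
y <- abs(rnorm(10000))
maxRange <- range(c(x, y))

h <- hexbin(x = x, y = y, xbins = xbins, shape = 1,
            xbnds = maxRange, ybnds = maxRange)
hexdf <- data.frame(hcell2xy(h), counts = h@count)

my_breaks <- c(2, 4, 6, 8, 20, 1000)
clrs <- brewer.pal(length(my_breaks) + 3, "Blues")[3:(length(my_breaks) + 3)]
# Store the colour itself (as character) for each hexagon
hexdf$countColor <- as.character(cut(hexdf$counts,
                                     breaks = c(0, my_breaks, Inf),
                                     labels = rev(clrs)))

# group = 1 puts all hexagons in one group, so their size is
# estimated from the full set of cell centres
p <- ggplot(hexdf, aes(x = x, y = y, fill = countColor, group = 1)) +
  geom_hex(stat = "identity") +
  scale_fill_identity() +
  geom_abline(intercept = 0, color = "red") +
  coord_fixed()
```

scale_fill_identity() shows no legend by default; pass guide = "legend" with suitable breaks and labels if you need one.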

Fill negative value area below geom_line [duplicate]

I frequently use kernel density plots to illustrate distributions. These are easy and fast to create in R like so:
set.seed(1)
draws <- rnorm(100)^2
dens <- density(draws)
plot(dens)
#or in one line like this: plot(density(rnorm(100)^2))
Which gives me this nice little PDF:
I'd like to shade the area under the PDF from the 75th to 95th percentiles. It's easy to calculate the points using the quantile function:
q75 <- quantile(draws, .75)
q95 <- quantile(draws, .95)
But how do I shade the area between q75 and q95?
Use the polygon() function; see its help page. I believe similar questions have been asked here before.
You need to find the index of the quantile values to get the actual (x,y) pairs.
Edit: Here you go:
x1 <- min(which(dens$x >= q75))
x2 <- max(which(dens$x < q95))
with(dens, polygon(x=c(x[c(x1,x1:x2,x2)]), y= c(0, y[x1:x2], 0), col="gray"))
Output (added by JDL)
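The polygon above starts and ends at the nearest points of the density grid, so the shaded band can be up to one grid step narrower than [q75, q95] on each side. A small refinement, assuming the same draws, dens, q75 and q95: interpolate the density exactly at the two quantiles with approx() and use those values as the polygon's vertical edges.

```r
set.seed(1)
draws <- rnorm(100)^2
dens <- density(draws)
q75 <- quantile(draws, 0.75)
q95 <- quantile(draws, 0.95)

# Density grid points strictly inside the band
inside <- dens$x > q75 & dens$x < q95
# Density interpolated exactly at the two quantiles
y_ends <- approx(dens$x, dens$y, xout = c(q75, q95))$y

# Polygon: down to 0 at q75, along the curve, down to 0 at q95
poly_x <- c(q75, q75, dens$x[inside], q95, q95)
poly_y <- c(0, y_ends[1], dens$y[inside], y_ends[2], 0)

plot(dens)
polygon(poly_x, poly_y, col = "gray", border = NA)
```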
Another solution:
dd <- with(dens,data.frame(x,y))
library(ggplot2)
qplot(x,y,data=dd,geom="line")+
geom_ribbon(data=subset(dd,x>q75 & x<q95),aes(ymax=y),ymin=0,
fill="red",colour=NA,alpha=0.5)
Result:
An expanded solution:
If you wanted to shade both tails (copy & paste of Dirk's code) and use known x values:
set.seed(1)
draws <- rnorm(100)^2
dens <- density(draws)
plot(dens)
q2 <- 2
q65 <- 6.5
qn08 <- -0.8
qn02 <- -0.2
x1 <- min(which(dens$x >= q2))
x2 <- max(which(dens$x < q65))
x3 <- min(which(dens$x >= qn08))
x4 <- max(which(dens$x < qn02))
with(dens, polygon(x=c(x[c(x1,x1:x2,x2)]), y= c(0, y[x1:x2], 0), col="gray"))
with(dens, polygon(x=c(x[c(x3,x3:x4,x4)]), y= c(0, y[x3:x4], 0), col="gray"))
Result:
This question needs a lattice answer. Here's a very basic one, simply adapting the method employed by Dirk and others:
library(lattice)
#Set up the data
set.seed(1)
draws <- rnorm(100)^2
dens <- density(draws)
#Put in a simple data frame
d <- data.frame(x = dens$x, y = dens$y)
#Define a custom panel function;
# Options like color don't need to be hard coded
shadePanel <- function(x,y,shadeLims){
panel.lines(x,y)
m1 <- min(which(x >= shadeLims[1]))
m2 <- max(which(x <= shadeLims[2]))
tmp <- data.frame(x1 = x[c(m1,m1:m2,m2)], y1 = c(0,y[m1:m2],0))
panel.polygon(tmp$x1,tmp$y1,col = "blue")
}
#Plot
xyplot(y~x,data = d, panel = shadePanel, shadeLims = c(1,3))
Here's another ggplot2 variant based on a function that approximates the kernel density at the original data values:
approxdens <- function(x) {
dens <- density(x)
f <- with(dens, approxfun(x, y))
f(x)
}
Using the original data (rather than producing a new data frame with the density estimate's x and y values) has the benefit of also working in faceted plots where the quantile values depend on the variable by which the data is being grouped:
Code used
library(tidyverse)
library(RColorBrewer)
# dummy data
set.seed(1)
n <- 1e2
dt <- tibble(value = rnorm(n)^2)
# function that approximates the density at the provided values
approxdens <- function(x) {
dens <- density(x)
f <- with(dens, approxfun(x, y))
f(x)
}
probs <- c(0.75, 0.95)
dt <- dt %>%
mutate(dy = approxdens(value), # calculate density
p = percent_rank(value), # percentile rank
pcat = as.factor(cut(p, breaks = probs, # percentile category based on probs
include.lowest = TRUE)))
ggplot(dt, aes(value, dy)) +
geom_ribbon(aes(ymin = 0, ymax = dy, fill = pcat)) +
geom_line() +
scale_fill_brewer(guide = "none") +
theme_bw()
# dummy data with 2 groups
dt2 <- tibble(category = c(rep("A", n), rep("B", n)),
value = c(rnorm(n)^2, rnorm(n, mean = 2)))
dt2 <- dt2 %>%
group_by(category) %>%
mutate(dy = approxdens(value),
p = percent_rank(value),
pcat = as.factor(cut(p, breaks = probs,
include.lowest = TRUE)))
# faceted plot
ggplot(dt2, aes(value, dy)) +
geom_ribbon(aes(ymin = 0, ymax = dy, fill = pcat)) +
geom_line() +
facet_wrap(~ category, nrow = 2, scales = "fixed") +
scale_fill_brewer(guide = "none") +
theme_bw()
Created on 2018-07-13 by the reprex package (v0.2.0).

How can I add significant brackets between facets of a faceted R plot using ggplot?

I’m using the following code to plot longitudinal data with the facet_grid() option. I would like to indicate significant group differences between the facet grids using brackets and asterisks. However, so far I could only add text/lines within the individual grids, and not between them.
for(i in seq_along(varlist)){
p <- ggplot(data = Plot, aes(x = Timepoint, y = .data[[varlist[i]]],
group = Sub_ID, colour = Subgroup)) + geom_point() +
geom_line(linetype = "dashed")
r <- p + stat_smooth(aes(group = 1), method = "lm") + stat_summary(aes(group = 1),
geom = "point", fun.y = mean, shape = 17, size = 5) + facet_grid(. ~ Subgroup)
ggsave(filename = paste0(varlist[i], "_by_subgroup.jpg"), plot = r, width = 10, height = 7.5)
}
Load the libraries
require(data.table)
require(ggplot2)
require(gtable)
require(grid)
Make toy data
data0 <- data.table(iris)[, list(Mean.Sepal.Length = mean(Sepal.Length), Mean.Petal.Length = mean(Petal.Length)), by = Species]
data1 <- melt(data0,id.vars="Species")
Draw the bars
p <- ggplot(data=data1,aes(x=variable,y=value,fill=variable)) +
geom_bar(stat="identity") +
facet_grid(~Species) +
scale_x_discrete(breaks=NULL)
p
Draw the brackets and asterisks
## make function to rescale the coordinates to npc
scale_to_npc <- function(x, range) scales::rescale(c(range, x), c(0,1))[-c(1,2)]
scale_x <- function(x,facet,ranges){scale_to_npc(x,ranges[[facet]][["x.range"]])}
scale_y <- function(y,facet,ranges){scale_to_npc(y,ranges[[facet]][["y.range"]])}
## build grobs and get the ranges
gb <- ggplot_build(p)
g <- ggplot_gtable(gb)
## gtable_show_layout(g)
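## ggplot2 < 2.2.0; newer versions keep these in gb$layout$panel_ranges (2.2.x) or gb$layout$panel_params (>= 3.0.0)
ranges <- gb$panel$ranges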
## get and rescale the coordinates
y1 <- data1[variable=="Mean.Petal.Length",min(value)]
y3 <- data1[,max(value)]
y4 <- data1[variable=="Mean.Petal.Length",max(value)]
data2 <- data.frame(x.=c(2,2,2,2,1.5),y.=c(y1,y3*1.01,y3*1.01,y4,y3*1.01),facet=c(1,1,3,3,2))
data2b <- data.frame(
x=mapply(scale_x,data2[,1],data2[,3],MoreArgs=list(ranges =ranges)),
y=mapply(scale_y,data2[,2],data2[,3],MoreArgs=list(ranges=ranges))
)
## draw the brackets and asterisks
g <- gtable_add_grob(g, moveToGrob(data2b[1,1],data2b[1,2]),t=4,l=4,b=4,r=4)
g <- gtable_add_grob(g, lineToGrob(data2b[2,1],data2b[2,2]),t=4.5,l=4,b=4,r=4)
g <- gtable_add_grob(g, moveToGrob(data2b[2,1],data2b[2,2]),t=4.5,l=4,b=4,r=4)
g <- gtable_add_grob(g, lineToGrob(data2b[3,1],data2b[3,2]),t=4,l=8,b=4,r=8)
g <- gtable_add_grob(g, moveToGrob(data2b[3,1],data2b[3,2]),t=4,l=8,b=4,r=8)
g <- gtable_add_grob(g, lineToGrob(data2b[4,1],data2b[4,2]),t=4.5,l=8,b=4,r=8)
g <- gtable_add_grob(g, textGrob("***",data2b[5,1],data2b[5,2]),t=4,l=4,b=4,r=8)
## turn clip off to allow the line across panels
g$layout$clip <- "off"
grid.newpage()
grid.draw(g)
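The scale_to_npc() helper above converts data coordinates to a panel's 0 to 1 npc range by prepending the panel range to x, rescaling everything, and dropping the first two values. A short check of what it returns, using c(0.4, 2.6) as an example x-range (the default expanded range of a discrete axis with two levels); scale_to_npc2() is a hypothetical variant that uses rescale()'s from argument instead. The two agree for values inside the range, but differ outside it, because the original lets an out-of-range x widen the range it rescales against.

```r
library(scales)

# Helper from the answer above
scale_to_npc <- function(x, range) scales::rescale(c(range, x), c(0, 1))[-c(1, 2)]

# Hypothetical variant: rescale x against a fixed source range
scale_to_npc2 <- function(x, range) scales::rescale(x, to = c(0, 1), from = range)

xr <- c(0.4, 2.6)              # example panel x-range
scale_to_npc(c(1.5, 2), xr)    # 0.5 and about 0.727
scale_to_npc2(c(1.5, 2), xr)   # same values
scale_to_npc(3, xr)            # 1, because the range was widened to include 3
scale_to_npc2(3, xr)           # about 1.18, i.e. outside the panel
```

In the answer the bracket coordinates lie inside the panel ranges, so both versions give the same positions there.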

ggplot2 boxplot: horizontal bar at median?

I would like to make a ggplot2 boxplot more meaningful by adding a thick bar at the median (so that if the median is equal to either the lower or the upper quartile, it can be detected which one it equals). I came across a recent post by Kohske:
Can I get boxplot notches in ggplot2?
but I didn't know how to give the "crossbar" a "height". Then I tried to use a rectangle, but that didn't work either. Here is a minimal example:
require(ggplot2)
require(reshape2)
require(plyr)
set.seed(1)
## parameters
p1 <- c(5, 20, 100)
p2 <- c("f1", "f2", "f3", "f4", "f5")
p3 <- c("g1","g2","g3","g4","g5")
N <- 1000
## lengths
l1 <- length(p1)
l2 <- length(p2)
l3 <- length(p3)
## build result array containing the measurements
arr <- array(rep(NA, l1*l2*l3*N), dim=c(l1, l2, l3, N),
dimnames=list(
p1=p1,
p2=p2,
p3=p3,
N=1:N))
for(i in 1:l1){
for(j in 1:l2){
for(k in 1:l3){
arr[i,j,k,] <- i+j+k+runif(N, min=-4, max=4)
}
}
}
arr <- arr + rexp(3*5*5*N)
## create molten data
mdf <- melt(arr, formula = . ~ p1 + p2 + p3 + N) # create molten data frame
## confidence interval calculated by `boxplot.stats`
f <- function(x){
ans <- boxplot.stats(x)
data.frame(x=x, y=ans$stats[3], ymin=ans$conf[1], ymax=ans$conf[2])
}
## (my poor) trial
ggplot(mdf, aes(x=p3, y=value)) + geom_boxplot(outlier.shape=1) +
stat_summary(fun.data=f, geom="rectangle", colour=NA, fill="black",
xmin=x-0.36, xmax=x+0.36, ymin=max(y-0.2, ymin), ymax=min(y+0.2,
ymax)) + facet_grid(p2 ~ p1, scales = "free_y")
**SOLUTION** (after the discussion with Kohske below):
f <- function(x, height){
ans <- median(x)
data.frame(y=ans, ymin=ans-height/2, ymax=ans+height/2)
}
p <- ggplot(mdf, aes(x=p3, y=value)) + geom_boxplot(outlier.shape=1) +
stat_summary(fun.data=f, fun.args=list(height=0.5), geom="crossbar", colour=NA,
fill="black", width=0.78) +
facet_grid(p2 ~ p1, scales = "free_y")
pdf()
print(p)
dev.off()
**UPDATE** Hmmm... it's not that trivial. The following example shows that the "height" of the crossbar should be adapted to the y-axis scale; otherwise it might be overlooked.
require(ggplot2)
require(reshape2)
require(plyr)
set.seed(1)
## parameters
p1 <- c(5, 20, 100)
p2 <- c("f1", "f2", "f3", "f4", "f5")
p3 <- c("g1","g2","g3","g4","g5")
N <- 1000
## lengths
l1 <- length(p1)
l2 <- length(p2)
l3 <- length(p3)
## build result array containing the measurements
arr <- array(rep(NA, l1*l2*l3*N), dim=c(l1, l2, l3, N),
dimnames=list(
p1=p1,
p2=p2,
p3=p3,
N=1:N))
for(i in 1:l1){
for(j in 1:l2){
for(k in 1:l3){
arr[i,j,k,] <- i+j^4+k+runif(N, min=-4, max=4)
}
}
}
arr <- arr + rexp(3*5*5*N)
arr[1,2,5,] <- arr[1,2,5,]+30
arr[1,5,3,] <- arr[1,5,3,]+100
## create molten data
mdf <- melt(arr, formula = . ~ p1 + p2 + p3 + N) # create molten data frame
f <- function(x, height){
ans <- median(x)
data.frame(y=ans, ymin=ans-height/2, ymax=ans+height/2)
}
## plot
p <- ggplot(mdf, aes(x=p3, y=value)) + geom_boxplot(outlier.shape=1) +
stat_summary(fun.data=f, fun.args=list(height=0.7), geom="crossbar", colour=NA,
fill="black", width=0.78) +
facet_grid(p2 ~ p1, scales = "free_y")
pdf()
print(p)
dev.off()
Here is an example:
f <- function(x, height) {
ans <- median(x)
data.frame(ymin = ans-height/2, ymax = ans+height/2, y = ans)
}
df <- data.frame(x=gl(2,6), y=c(1,1,1,1,3,3, 1,1,3,3,3,3))
ggplot(df, aes(x, y)) + geom_boxplot() +
stat_summary(fun.data = f, fun.args = list(height = 0.1), geom = "crossbar",
colour = NA, fill = "skyblue", width = 0.8, alpha = 0.5)
If you just want to change the appearance, here is a quick hack, though I don't recommend it:
library(grid)
library(plyr)
df <- data.frame(x=gl(2,6), y=c(c(1,1,1,1,3,3), c(1,1,3,3,3,3)*10))
print(ggplot(df, aes(x, y)) + geom_boxplot() + facet_grid(x~.))
gs <- grid.gget("geom_boxplot", grep = TRUE)
if (inherits(gs, "grob")) gs <- list(gs)
gss <- llply(gs, function(g) g$children[[length(g$children)]])
l_ply(gss, function(g) grid.edit(g$name, grep = TRUE, just = c("left", "center"), height = unit(0.05, "native"), gp = gpar(fill = "skyblue", alpha = 0.5, col = NA)))
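Following the UPDATE in the question, one way to adapt the bar to the y-axis scale is to make its height a fraction of each group's own data range instead of a fixed value. A sketch, assuming that a group's range is a reasonable stand-in for its panel's y-range (roughly true with scales = "free_y" and one box per panel); median_bar() and frac are hypothetical names, and frac is passed through stat_summary()'s fun.args.

```r
library(ggplot2)

# Hypothetical helper: bar height is a fraction of the group's y-range
median_bar <- function(x, frac = 0.02) {
  m <- median(x)
  half <- frac * diff(range(x)) / 2
  data.frame(y = m, ymin = m - half, ymax = m + half)
}

df <- data.frame(x = gl(2, 6),
                 y = c(c(1, 1, 1, 1, 3, 3), c(1, 1, 3, 3, 3, 3) * 10))

p <- ggplot(df, aes(x, y)) +
  geom_boxplot() +
  stat_summary(fun.data = median_bar, fun.args = list(frac = 0.03),
               geom = "crossbar", colour = NA, fill = "skyblue",
               width = 0.8, alpha = 0.5) +
  facet_grid(x ~ ., scales = "free_y")
```

If all values in a group are equal, their range is zero and so is the bar, so a minimum height may be worth adding in that case.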
