How can you limit the range of stat_function plots with ggplot2?

I am trying to create a figure to show different saturation levels and their effect on sampling dynamics for a talk using the following code:
max <- 2
decay <- function(x, k, C) {
C * (1 - exp(-k*x))
}
require("ggplot2")
ggplot(NULL, aes(x=x, colour = C)) +
stat_function(data = data.frame(x = 0:max, C = factor(1)), fun = function(x) { decay(x, k=10, C=1e1) }) +
stat_function(data = data.frame(x = 0:max, C = factor(2)), fun = function(x) { decay(x, k=10, C=1e2) }) +
stat_function(data = data.frame(x = 0:max, C = factor(3)), fun = function(x) { decay(x, k=10, C=1e3) }) +
stat_function(data = data.frame(x = 0:max, C = factor(4)), fun = function(x) { decay(x, k=10, C=1e4) }) +
stat_function(data = data.frame(x = 0:max, C = factor(5)), fun = function(x) { decay(x, k=10, C=1e5) }) +
stat_function(data = data.frame(x = 0:max, C = factor(6)), fun = function(x) { decay(x, k=10, C=1e6) }) +
scale_colour_manual(values = c("red", "orange", "yellow", "green", "blue", "violet"), labels = c(1, 2, 3, 4, 5, 6)) + scale_colour_discrete(name=expression(paste(C, " value"))) +
ylab(label="count") + ylim(0, 100)
The intention is to show that for the high C value cases the curve will appear linear. However, ylim prevents any curve from being shown wherever its value exceeds the upper limit, whereas I expected it merely to truncate the curve at that limit.
How do I get the desired behaviour?

You have noticed the difference between limiting the scale (using scale_y_continuous(limits = ...))
and limiting the coordinate space (using coord_cartesian(ylim = ...)).
When you call ylim, it uses the equivalent of scale_y_continuous and drops observations that are not in the range.
The help for ylim and xlim describes this (and points to coord_cartesian as an alternative).
# here is your example rewritten
ggplot(data = NULL, aes(x = x, colour = C)) +
  lapply(1:6, function(y) {
    stat_function(data = data.frame(x = 0:max, C = factor(y)),
                  fun = function(x) decay(x, k = 10, C = 10^y))
  }) +
  # a second colour scale would replace this one, so the legend title is set here
  scale_colour_manual(name = expression(paste(C, " value")),
                      values = c("red", "orange", "yellow", "green", "blue", "violet"),
                      labels = c(1, 2, 3, 4, 5, 6)) +
  ylab(label = "count") +
  # zoom the view instead of dropping out-of-range values
  coord_cartesian(ylim = c(0, 100))
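To see the difference concretely, here is a minimal sketch (an illustration, not part of the original answer) that precomputes a single curve and counts how many y-values survive in the built plot data under each approach. The names d, p_scale, p_coord, n_na_scale and n_na_coord are made up for this example.

```r
library(ggplot2)

decay <- function(x, k, C) C * (1 - exp(-k * x))

# One curve that quickly exceeds the intended y-range of 0..100
d <- data.frame(x = seq(0, 2, length.out = 101))
d$y <- decay(d$x, k = 10, C = 1e3)

p <- ggplot(d, aes(x, y)) + geom_line()

# Scale limits: out-of-range values are replaced by NA before drawing
p_scale <- p + ylim(0, 100)
# Coordinate limits: all values are kept, only the visible window changes
p_coord <- p + coord_cartesian(ylim = c(0, 100))

# Count the missing y-values in the data that each plot would draw
n_na_scale <- sum(is.na(ggplot_build(p_scale)$data[[1]]$y))
n_na_coord <- sum(is.na(ggplot_build(p_coord)$data[[1]]$y))
```

With the scale limits almost every point of the curve is turned into NA, which is why the line disappears; with coord_cartesian none are.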

Related

Shift geometric object along horizontal axis with ggplot

I want to use ggplot to plot three curves, each made with stat_function and with its own parameters.
This is done with the code below:
library(ggplot2)
ggplot(data.frame(x = c(0, 25)), aes(x)) +
stat_function(fun = function(x) plogis(x, location = 5, scale = 2), colour = "red") +
stat_function(fun = function(x) plogis(x, location = 9, scale = 3), colour = "blue") +
stat_function(fun = function(x) plogis(x, location = 9, scale = 4), colour = "green")
which gives the figure below:
What I want to achieve is to shift the blue and green curves, exactly as they are, to the right along the horizontal axis (each by an arbitrary amount).
I don't know of an explicit way to do it in ggplot, so I tried to specify a different frame for the second and third geometric objects, as below:
ggplot(data.frame(x = c(0, 25)), aes(x)) +
stat_function(fun = function(x) plogis(x, location = 5, scale = 2), colour = "red") +
stat_function(data = data.frame(x = c(3, 28)), fun = function(x) plogis(x, location = 9, scale = 3), colour = "blue") +
stat_function(data = data.frame(x = c(5, 30)), fun = function(x) plogis(x, location = 9, scale = 4), colour = "green")
But the resulting image is the same as the one above.
Your solution is almost correct, but you need to subtract the same constant within the function itself, so that the y-values still correspond.
c1 <- 4
c2 <- 4
p2 <- ggplot(data.frame(x = c(0, 25)), aes(x)) +
stat_function(fun = function(x) plogis(x, location = 5, scale = 2), colour = "red") +
stat_function(data = data.frame(x = c(0+c1, 25+c1)),
fun = function(x) plogis(x - c1, location = 9, scale = 3), colour = "blue") +
stat_function(data = data.frame(x = c(0+c2, 25+c2)),
fun = function(x) plogis(x - c2, location = 9, scale = 4), colour = "green")
p2
PS: In the answer, I have also added the constants to the data.frame itself, so that the shift is shown (you can remove them from the data.frame if you want only the original x-range shown).
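The same idea can be wrapped in a small helper so that the shift does not have to be repeated inside every anonymous function. This is only a sketch: shift_right, blue and green are hypothetical names introduced here, and the shift amounts 4 and 6 are arbitrary.

```r
library(ggplot2)

# Hypothetical helper: returns a copy of `f` moved `by` units to the right
shift_right <- function(f, by) {
  force(f)
  force(by)
  function(x) f(x - by)
}

blue  <- function(x) plogis(x, location = 9, scale = 3)
green <- function(x) plogis(x, location = 9, scale = 4)

# The x-range is widened to 30 so that the shifted curves stay visible
p <- ggplot(data.frame(x = c(0, 30)), aes(x)) +
  stat_function(fun = function(x) plogis(x, location = 5, scale = 2), colour = "red") +
  stat_function(fun = shift_right(blue, 4), colour = "blue") +
  stat_function(fun = shift_right(green, 6), colour = "green")
```

Evaluating shift_right(blue, 4) at x + 4 gives the same value as blue at x, which is exactly the "shift the curve as it is" behaviour asked for.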

Automatically writing scatterplots in ggplot2 to a folder

I have a large number of variables and would like to create scatterplots comparing all variables to a single variable. I have been able to do this in base R using lapply, but I cannot complete the same task in ggplot2 using lapply.
Below is an example dataset.
df <- data.frame("ID" = 1:16)
df$A <- c(1,2,3,4,5,6,7,8,9,10,11,12,12,14,15,16)
df$B <- c(5,6,7,8,9,10,13,15,14,15,16,17,18,18,19,20)
df$C <- c(11,12,14,16,10,12,14,16,10,12,14,16,10,12,14,16)
I define the variables I would like to generate scatterplots with, using the code below:
df_col_names <- df %>% select(A:C) %>% colnames(.)
Below is how I have been able to successfully complete the task of plotting all variables against variable A, using lapply in base R:
lapply(df_col_names, function(x) {
tiff(filename=sprintf("C:\\Documents\\%s.tiff", x),
width = 1000, height = 1000, res=200)
plot(df$A, df[[x]],
pch=19,
cex = 1.5,
ylab = x,
ylim = c(0, 20),
xlim = c(0, 20))
dev.off()
})
Below is my attempt at completing the task in ggplot2 without any success. It generates the tiff images, although they are empty.
lapply(df_col_names, function(x) {
tiff(filename=sprintf("C:\\Documents\\%s.tiff", x),
width = 1000, height = 1000, res=200)
ggplot(df) +
geom_point(data = df,
aes(x = A, y = df_col_names[[x]], size = 3)) +
geom_smooth(aes(x = A, y = df_col_names[[x]], size = 0), method = "lm", size=0.5) +
coord_fixed(ratio = 1, xlim = c(0, 20), ylim = c(0, 20)) +
guides(size = FALSE, color = FALSE) +
theme_bw(base_size = 14)
dev.off()
})
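It works for me with ggsave. Your tiff files are empty because a ggplot object is only drawn when it is printed, and automatic printing does not happen inside a function such as the one passed to lapply; ggsave (or an explicit print() before dev.off()) takes care of that. Also note that you are passing column names as strings, so aes(y = df_col_names[[x]]) does not refer to a column; use .data[[x]] to refer to the actual column values.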
library(ggplot2)
lapply(df_col_names, function(x) {
  # constant sizes are set outside aes(), so no legend has to be suppressed
  plt <- ggplot(df, aes(x = A, y = .data[[x]])) +
    geom_point(size = 3) +
    geom_smooth(method = "lm", size = 0.5) +
    coord_fixed(ratio = 1, xlim = c(0, 20), ylim = c(0, 20)) +
    labs(y = x) +
    theme_bw(base_size = 14)
  # ggsave renders the plot itself; the file type is taken from the extension
  ggsave(sprintf("%s.tiff", x), plt)
})
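If you prefer to keep the explicit device calls from the question, a sketch along the following lines should also work. It assumes a reduced version of the example data, writes to tempdir(), and uses pdf() instead of tiff() only because the PDF device is available in every R build; plot_one and files are names made up for this illustration.

```r
library(ggplot2)

df <- data.frame(A = 1:16,
                 B = c(5, 6, 7, 8, 9, 10, 13, 15, 14, 15, 16, 17, 18, 18, 19, 20),
                 C = rep(c(11, 12, 14, 16), 4))

# Build one scatterplot; `col` is a column name given as a string
plot_one <- function(col) {
  ggplot(df, aes(x = A, y = .data[[col]])) +
    geom_point() +
    labs(y = col)
}

out_dir <- tempdir()  # assumed output location
files <- vapply(c("B", "C"), function(col) {
  path <- file.path(out_dir, paste0(col, ".pdf"))
  pdf(path, width = 5, height = 5)
  print(plot_one(col))  # explicit print(): auto-printing does not happen inside a function
  dev.off()
  path
}, character(1))
```

The only change that matters compared with the failing attempt is the print() call between opening and closing the device.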

Removing points from plot generated with stat_summary

I've been asked to remove points from a plot that I've made with ggplot2. I'm attaching a MWE:
require(ggplot2)
require(Hmisc)
x = 5
k = 50
kx = k*5
data.A.1 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 0)
data.B.1 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 1)
data.C.1 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 2)
data.A.2 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 3)
data.B.2 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 4)
data.C.2 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 5)
multiple.plot.6x3.interval <- function(D, L) {
data = data.frame()
# join all the data in D into 'data'
e = 0
NN = ""
for (i in seq(1, length(D))) {
lidx = i%%3
if (lidx == 0) { lidx = 3 }
if (lidx == 1) {
e = e + 1
NN = paste0("10^", e)
}
n.obs = length(D[[i]]$n)
D[[i]]$lang.name = rep(L[lidx], n.obs)
D[[i]]$N = rep(NN, n.obs)
data = rbind(data, D[[i]])
}
# make the plot
g <- ggplot(data, aes(x=n, y=v)) +
stat_summary( # plot confidence interval
fun.data = mean_cl_boot, fun.args = list(conf.int = 0.99),
geom = "ribbon", fill = "darkgrey"
) +
stat_summary( # plot maximum and minimum bars
fun = mean, fun.min = min, fun.max = max,
colour = "red", size = 0.15
) +
stat_summary(
fun = mean,
geom = "line", linetype = "solid", size = 0.4, color = "black"
) +
coord_cartesian(xlim=c(1, 100)) +
scale_x_continuous(breaks=seq(1, 101, 10)-1) +
facet_grid(
N ~ lang.name, labeller = "label_parsed"
) +
labs(
x=bquote("X"),
y=bquote("Y")
) +
theme(text = element_text(size = 20))
return (g)
}
g <- multiple.plot.6x3.interval(
list(
data.A.1, data.B.1, data.C.1,
data.A.2, data.B.2, data.C.2
),
c("A", "B", "C")
)
plot(g)
The result of this code is the one I want, but with the exception that I've been asked to remove the points that this
stat_summary( # plot maximum and minimum bars
fun = mean, fun.min = min, fun.max = max,
colour = "red", size = 0.15
) +
generates while keeping the bars.
This is what I get, and I would like to remove the red points (not the red bars).
Using size = 0 will make the bars completely invisible. I haven't been able to do this myself. I wonder: can this be done? If so, how? Any help will be appreciated.
Thank you all.
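stat_summary draws with geom = "pointrange" by default, which is where the points come from. Try geom = "errorbar" as an argument to stat_summary (or geom = "linerange" if you do not want the end caps):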
require(ggplot2)
require(Hmisc)
x = 5
k = 50
kx = k*5
data.A.1 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 0)
data.B.1 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 1)
data.C.1 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 2)
data.A.2 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 3)
data.B.2 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 4)
data.C.2 = data.frame(n = rep(sample(1:100, k, replace=FALSE), x), v = rnorm(kx, 1, 2) + 5)
multiple.plot.6x3.interval <- function(D, L) {
data = data.frame()
# join all the data in D into 'data'
e = 0
NN = ""
for (i in seq(1, length(D))) {
lidx = i%%3
if (lidx == 0) { lidx = 3 }
if (lidx == 1) {
e = e + 1
NN = paste0("10^", e)
}
n.obs = length(D[[i]]$n)
D[[i]]$lang.name = rep(L[lidx], n.obs)
D[[i]]$N = rep(NN, n.obs)
data = rbind(data, D[[i]])
}
# make the plot
g <- ggplot(data, aes(x=n, y=v)) +
stat_summary( # plot confidence interval
fun.data = mean_cl_boot, fun.args = list(conf.int = 0.99),
geom = "ribbon", fill = "darkgrey"
) +
stat_summary( # plot maximum and minimum bars
fun = mean, fun.min = min, fun.max = max,
geom = "errorbar", ### HERE
colour = "red", size = 0.15
) +
stat_summary(
fun = mean,
geom = "line", linetype = "solid", size = 0.4, color = "black"
) +
coord_cartesian(xlim=c(1, 100)) +
scale_x_continuous(breaks=seq(1, 101, 10)-1) +
facet_grid(
N ~ lang.name, labeller = "label_parsed"
) +
labs(
x=bquote("X"),
y=bquote("Y")
) +
theme(text = element_text(size = 20))
return (g)
}
g <- multiple.plot.6x3.interval(
list(
data.A.1, data.B.1, data.C.1,
data.A.2, data.B.2, data.C.2
),
c("A", "B", "C")
)
plot(g)
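Stripped down to the essential part, the change looks like the sketch below. It uses a small made-up data set (d) instead of the full example and geom = "linerange", which draws only the vertical min-to-max segment without a point and without end caps.

```r
library(ggplot2)

set.seed(1)
# Toy data: 5 observations for each of 10 x positions
d <- data.frame(n = rep(1:10, each = 5), v = rnorm(50))

p <- ggplot(d, aes(n, v)) +
  stat_summary(fun = mean, fun.min = min, fun.max = max,
               geom = "linerange", colour = "red")

# One summary row (y, ymin, ymax) per x position
built <- ggplot_build(p)$data[[1]]
```

Swapping "linerange" for "errorbar" in this sketch gives the capped bars used in the full example.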

Grid as bars in ggplot

A common layout in many sites is to draw the grid as shaded bars:
I'm doing this with this function:
grid_bars <- function(data, y, n = 5, fill = "gray90") {
breaks <- pretty(data[[y]], n)
len <- length(breaks)-1
all_bars <- data.frame(
b.id = rep(1:len, 4),
b.x = c(rep(-Inf, len), rep(Inf, len*2), rep(-Inf, len)),
b.y = c(rep(breaks[-length(breaks)], 2), rep(breaks[-1], 2))
)
bars <- all_bars[all_bars$b.id %in% (1:len)[c(FALSE, TRUE)], ]
grid <- list(
geom_polygon(data = bars, aes(b.x, b.y, group = b.id),
fill = fill, colour = fill),
scale_y_continuous(breaks = breaks),
theme(panel.grid = element_blank())
)
return(grid)
}
#-------------------------------------------------
dat <- data.frame(year = 1875:1972,
level = as.vector(LakeHuron))
ggplot(dat, aes(year, level)) +
grid_bars(dat, "level", 10) +
geom_line(colour = "steelblue", size = 1.2) +
theme_classic()
But this requires specifying data and y again. How can I take those directly from the ggplot call?
After having a look at the options for extending ggplot2 in Hadley Wickham's book on ggplot2, I think you probably have to set up your own Geom or Stat layer to achieve the desired result. This way you can access the data and aesthetics specified in ggplot(), or even pass different data and aesthetics to your function. I am still a newbie at writing extensions for ggplot2, but a first approach may look like this:
library(ggplot2)
# Make bars dataframe
make_bars_df <- function(y, n) {
breaks <- pretty(y, n)
len <- length(breaks) - 1
all_bars <- data.frame(
group = rep(1:len, 4),
x = c(rep(-Inf, len), rep(Inf, len * 2), rep(-Inf, len)),
y = c(rep(breaks[-length(breaks)], 2), rep(breaks[-1], 2))
)
all_bars[all_bars$group %in% (1:len)[c(FALSE, TRUE)], ]
}
# Setup Geom
geom_grid_bars_y <- function(mapping = NULL, data = NULL, stat = "identity",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, n = 5, ...) {
layer(
geom = GeomGridBarsY, mapping = mapping, data = data, stat = stat,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(n = n, ...)
)
}
GeomGridBarsY <- ggproto("GeomGridBarsY", Geom,
required_aes = c("y"),
default_aes = aes(alpha = NA, colour = NA, fill = "gray90", group = NA,
linetype = "solid", size = 0.5, subgroup = NA),
non_missing_aes = aes("n"),
setup_data = function(data, params) {
transform(data)
},
draw_group = function(data, panel_scales, coord, n = 5) {
bars <- make_bars_df(data[["y"]], n)
# setup data for GeomPolygon
## If you want this to work with facets you have to take care of the PANEL
bars$PANEL <- factor(1)
# Drop x, y, group from data
d <- data[ , setdiff(names(data), c("x", "y", "group"))]
d <- d[!duplicated(d), ]
# Merge information in data to bars
bars <- merge(bars, d, by = "PANEL")
# Set color = fill
bars[["colour"]] <- bars[["fill"]]
# Draw
grid::gList(
ggplot2::GeomPolygon$draw_panel(bars, panel_scales, coord)
)
},
draw_key = draw_key_rect
)
grid_bars <- function(n = 5, fill = "gray90") {
list(
geom_grid_bars_y(n = n, fill = fill),
scale_y_continuous(breaks = scales::pretty_breaks(n = n)),
theme(panel.grid = element_blank())
)
}
dat <- data.frame(year = 1875:1972,
level = as.vector(LakeHuron))
ggplot(dat, aes(year, level)) +
grid_bars(n = 10, fill = "gray95") +
geom_line(colour = "steelblue", size = 1.2) +
theme_classic()
Just for reference:
A first and simple approach to get grid bars is to adjust the size of the grid lines via theme(), like so:
# Simple approach via theme
ggplot(dat, aes(year, level)) +
geom_line(colour = "steelblue", size = 1.2) +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
theme_classic() +
theme(panel.grid.major.y = element_line(size = 8))
Created on 2020-06-14 by the reprex package (v0.3.0)
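A lighter-weight alternative to a custom Geom, sketched here under stated assumptions, is a function that receives the base plot and reads the data and the y mapping from it. It relies on the plot object's data and mapping fields, which is an assumption about ggplot2 internals rather than a documented extension interface, and add_grid_bars is a hypothetical name. It does not handle facets or data supplied only at the layer level.

```r
library(ggplot2)

# Hypothetical helper: takes a base plot created with ggplot(data, aes(x, y))
# and adds shaded bands, reading the data and the y mapping from the plot.
add_grid_bars <- function(p, n = 5, fill = "gray90") {
  y <- rlang::eval_tidy(p$mapping$y, data = p$data)
  breaks <- pretty(y, n)
  len <- length(breaks) - 1
  idx <- which(seq_len(len) %% 2 == 0)  # shade every second band
  bars <- data.frame(ymin = breaks[idx], ymax = breaks[idx + 1])
  p +
    geom_rect(data = bars,
              aes(xmin = -Inf, xmax = Inf, ymin = ymin, ymax = ymax),
              fill = fill, inherit.aes = FALSE) +
    scale_y_continuous(breaks = breaks) +
    theme(panel.grid = element_blank())
}

dat <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))

# The bars are added first so that the line is drawn on top of them
p <- add_grid_bars(ggplot(dat, aes(year, level)), n = 10) +
  geom_line(colour = "steelblue") +
  theme_classic()

built <- ggplot_build(p)
```

Because the function wraps the ggplot() call instead of being added with +, data and y are given only once, at the cost of a slightly unusual call order.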

Making a specific quantile plot in R

I am very intrigued by the following visualization (Decile term),
and I wonder how it would be possible to do it in R.
There are of course histograms and density plots, but they do not make such a nice visualization. In particular, I would like to know if it is possible to do it with ggplot/tidyverse.
edit in response to the comment
library(dplyr)
library(ggplot2)
someData <- data_frame(x = rnorm(1000))
ggplot(someData, aes(x = x)) +
geom_histogram()
this produces a histogram (see http://www.r-fiddle.org/#/fiddle?id=LQXazwMY&version=1)
But how can I get the colorful bars? How can I implement the small rectangles? (The arrows are less relevant.)
You have to define a number of breaks, and use approximate deciles that match those histogram breaks. Otherwise, two deciles will end up in one bar.
d <- data.frame(x = rnorm(1000))
breaks <- seq(min(d$x), max(d$x), length.out = 50)
quantiles <- quantile(d$x, seq(0, 1, 0.1))
quantiles2 <- sapply(quantiles, function(x) breaks[which.min(abs(x - breaks))])
d$bar <- as.numeric(as.character(cut(d$x, breaks, na.omit((breaks + dplyr::lag(breaks)) / 2))))
d$fill <- cut(d$x, quantiles2, na.omit((quantiles2 + dplyr::lag(quantiles2)) / 2))
ggplot(d, aes(bar, y = 1, fill = fill)) +
geom_col(position = 'stack', col = 1, show.legend = FALSE, width = diff(breaks)[1])
Or with more distinct colors:
ggplot(d, aes(bar, y = 1, fill = fill)) +
geom_col(position = 'stack', col = 1, show.legend = FALSE, width = diff(breaks)[1]) +
scale_fill_brewer(type = 'qual', palette = 3) # The only qual palette with enough colors
Add some styling and increase the breaks to 100:
ggplot(d, aes(bar, y = 1, fill = fill)) +
geom_col(position = 'stack', col = 1, show.legend = FALSE, width = diff(breaks)[1], size = 0.3) +
scale_fill_brewer(type = 'qual', palette = 3) +
theme_classic() +
coord_fixed(diff(breaks)[1], expand = FALSE) + # makes square blocks
labs(x = 'x', y = 'count')
And here is a function to make that last one:
decile_histogram <- function(data, var, n_breaks = 100) {
breaks <- seq(min(data[[var]]), max(data[[var]]), length.out = n_breaks)
quantiles <- quantile(data[[var]], seq(0, 1, 0.1))
quantiles2 <- sapply(quantiles, function(x) breaks[which.min(abs(x - breaks))])
data$bar <- as.numeric(as.character(
cut(data[[var]], breaks, na.omit((breaks + dplyr::lag(breaks)) / 2)))
)
data$fill <- cut(data[[var]], quantiles2, na.omit((quantiles2 + dplyr::lag(quantiles2)) / 2))
ggplot2::ggplot(data, ggplot2::aes(bar, y = 1, fill = fill)) +
ggplot2::geom_col(position = 'stack', col = 1, show.legend = FALSE, width = diff(breaks)[1], size = 0.3) +
ggplot2::scale_fill_brewer(type = 'qual', palette = 3) +
ggplot2::theme_classic() +
ggplot2::coord_fixed(diff(breaks)[1], expand = FALSE) +
ggplot2::labs(x = 'x', y = 'count')
}
Use as:
d <- data.frame(x = rnorm(1000))
decile_histogram(d, 'x')
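For comparison, here is a simpler variant that is only a sketch and not equivalent to the function above: it assigns each observation to its exact decile and lets geom_histogram stack the colours. Bars that contain a decile boundary are then split into two colours instead of being snapped to one, and the individual squares are not drawn. The column name decile and the palette choice "Paired" are assumptions made for this illustration.

```r
library(ggplot2)

set.seed(42)
d <- data.frame(x = rnorm(1000))

# Assign each observation to its decile (1..10)
decile_breaks <- quantile(d$x, probs = seq(0, 1, 0.1))
d$decile <- cut(d$x, breaks = decile_breaks, labels = 1:10,
                include.lowest = TRUE)

# Stacked histogram, one fill colour per decile
p <- ggplot(d, aes(x, fill = decile)) +
  geom_histogram(bins = 50, colour = "black") +
  scale_fill_brewer(palette = "Paired")
```

With 1000 distinct values every decile group holds exactly 100 observations, so equal coloured areas in the histogram correspond to equal shares of the data.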
