Extending ggplot2: How to build a geom and stat? - r

I am in the early stages of learning how to extend ggplot2. I would like to create a custom geom and associated stat. My starting point was the vignette. In addition, I have benefited from this and this. I'm trying to put together a template to teach myself and hopefully others.
Main question:
Inside my function calculate_shadows() the needed parameter params$anchor is NULL. How can I access it?
The goal described below is intended solely for learning how to create custom stat and geom functions, it's not a real goal: as you can see from the screenshots, I do know how to leverage the power of ggplot2 to make the graphs.
The geom will read the data and for the supplied variables ("x", "y") will plot (for want of a better word) shadows: a horizontal line min(x)--max(x) at the default y=0 and a vertical line min(y)--max(y) at the default x=0. If an option is supplied, these "anchors" could be changed, e.g. if the user supplies x = 35, y = 1, the horizontal line would be drawn at the intercept y = 1 while the vertical line would be drawn at the intercept x = 35. Usage:
library(ggplot2)
ggplot(data = mtcars, aes(x = mpg, y = wt)) +
geom_point() +
geom_shadows(x = 35, y = 1)
The stat will read the data and for the supplied variables ("x", "y") will compute shadows according to the value of stat. For instance, by passing stat = "identity", the shadows would be computed for the min and max of the data (as done by geom_shadows). But by passing stat = "quartile", the shadows would be computed for first and third quartile. More generally, one could pass a function like stats::quantile with arguments args = list(probs = c(0.10, 0.90), type = 6), to compute shadows using the 10th and 90th percentiles and the quantile method of type 6. Usage:
ggplot(data = mtcars, aes(x = mpg, y = wt)) +
geom_point() +
stat_shadows(stat = "quartile")
Unfortunately, my lack of familiarity with extending ggplot2 stopped me well short of my objective. These plots were "faked" with geom_segment. Based on the tutorial and discussions cited above and inspecting existing code like stat-qq or stat-smooth, I have put together a basic architecture for this goal. It must contain many mistakes, I would be grateful for guidance. Also, note that either of these approaches would be fine: geom_shadows(anchor = c(35, 1)) or geom_shadows(x = 35, y = 1).
Now here are my efforts. First, geom-shadows.r to define geom_shadows(). Second, stat-shadows.r to define stat_shadows(). The code doesn't work as is. But if I execute its content, it does produce the desired statistics. For clarity, I have removed most of the calculations in stat_shadows(), such as quartiles, to focus on essentials. Any obvious mistake in the layout?
geom-shadows.r
#' documentation ought to be here
geom_shadows <- function(
mapping = NULL,
data = NULL,
stat = "shadows",
position = "identity",
...,
anchor = list(x = 0, y = 0),
shadows = list("x", "y"),
type = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomShadows,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
anchor = anchor,
shadows = shadows,
type = type,
na.rm = na.rm,
...
)
)
}
GeomShadows <- ggproto("GeomShadows", Geom,
# set up the data, e.g. remove missing data
setup_data = function(data, params) {
data
},
# set up the parameters, e.g. supply warnings for incorrect input
setup_params = function(data, params) {
params
},
draw_group = function(data, panel_params, coord, anchor, shadows, type) {
# draw_group uses stats returned by compute_group
# set common aesthetics
geom_aes <- list(
alpha = data$alpha,
colour = data$color,
size = data$size,
linetype = data$linetype,
fill = alpha(data$fill, data$alpha),
group = data$group
)
# merge aesthetics with data calculated in setup_data
geom_stats <- new_data_frame(c(list(
x = c(data$x.xmin, data$y.xmin),
xend = c(data$x.xmax, data$y.xmax),
y = c(data$x.ymin, data$y.ymin),
yend = c(data$x.ymax, data$y.ymax),
alpha = c(data$alpha, data$alpha)
), geom_aes
), n = 2)
# turn the stats data into a GeomPath
geom_grob <- GeomSegment$draw_panel(unique(geom_stats),
panel_params, coord)
# pass the GeomPath to grobTree
ggname("geom_shadows", grobTree(geom_grob))
},
# set legend box styles
draw_key = draw_key_path,
# set default aesthetics
default_aes = aes(
colour = "blue",
fill = "red",
size = 1,
linetype = 1,
alpha = 1
)
)
stat-shadows.r
#' documentation ought to be here
stat_shadows <-
function(mapping = NULL,
data = NULL,
geom = "shadows",
position = "identity",
...,
# do I need to add the geom_shadows arguments here?
anchor = list(x = 0, y = 0),
shadows = list("x", "y"),
type = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
stat = StatShadows,
data = data,
mapping = mapping,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
# geom_shadows argument repeated here?
anchor = anchor,
shadows = shadows,
type = type,
na.rm = na.rm,
...
)
)
}
StatShadows <-
ggproto("StatShadows", Stat,
# do I need to repeat required_aes?
required_aes = c("x", "y"),
# set up the data, e.g. remove missing data
setup_data = function(data, params) {
data
},
# set up parameters, e.g. unpack from list
setup_params = function(data, params) {
params
},
# calculate shadows: returns data_frame with colnames: xmin, xmax, ymin, ymax
compute_group = function(data, scales, anchor = list(x = 0, y = 0), shadows = list("x", "y"), type = NULL, na.rm = TRUE) {
.compute_shadows(data = data, anchor = anchor, shadows = shadows, type = type)
}
)
# Calculate the shadows for each type / shadows / anchor
.compute_shadows <- function(data, anchor, shadows, type) {
# Deleted all type-checking, etc. for MWE
# Only 'type = c(double, double)' accepted, e.g. type = c(0, 1)
qs <- type
# compute shadows along the x-axis
if (any(shadows == "x")) {
shadows.x <- c(
xmin = as.numeric(stats::quantile(data[, "x"], qs[[1]])),
xmax = as.numeric(stats::quantile(data[, "x"], qs[[2]])),
ymin = anchor[["y"]],
ymax = anchor[["y"]])
}
# compute shadows along the y-axis
if (any(shadows == "y")) {
shadows.y <- c(
xmin = anchor[["x"]],
xmax = anchor[["x"]],
ymin = as.numeric(stats::quantile(data[, "y"], qs[[1]])),
ymax = as.numeric(stats::quantile(data[, "y"], qs[[2]])))
}
# store shadows in one data_frame
stats <- new_data_frame(c(x = shadows.x, y = shadows.y))
# return the statistics
stats
}
.

Until a more thorough answer comes along: You are missing
extra_params = c("na.rm", "shadows", "anchor", "type"),
inside GeomShadows <- ggproto("GeomShadows", Geom,
and possibly also inside StatShadows <- ggproto("StatShadows", Stat,.
Inside geom-.r and stat-.r there are many very useful comments that clarify how geoms and stats work. In particular (hat tips Claus Wilke over at github issues):
# Most parameters for the geom are taken automatically from draw_panel() or
# draw_groups(). However, some additional parameters may be needed
# for setup_data() or handle_na(). These can not be imputed automatically,
# so the slightly hacky "extra_params" field is used instead. By
# default it contains `na.rm`
extra_params = c("na.rm"),

Related

ggplot2: How to get a Geom Class to affect ScaleContinuous Class

I have begun to extend ggplot2 and I'm still getting a feel for how the package calls all of its internal functions.
I have a new ggproto class that extends one of the current Geom environments. The current model of the class will plot something along the discrete x axis, ideally touching the x axis ticks. This model works well when the y axis is already on a discrete scale, because the default expansion values only adds .6 padding. However on a continuous y scale, the default padding can make these new plotted objects seem far.
In summary, how can I make my Geom class override the default expansion without just adding either scale_y_continuous(expand = c(0,0,.05,0) or scale_y_discrete(expand = c(0, 0, .6,0) to my layer function...
Consider the following reproducible example
library(dplyr)
library(tidyr)
library(ggplot2)
library(stringr)
mtcars0 <- as_tibble(mtcars, rownames = "CarNames") %>%
gather(key = "qualities", value = "value", -CarNames) %>%
group_by(qualities) %>%
mutate(scaledValue = scale(value)) %>%
ungroup() %>%
mutate(carCase = case_when(str_detect(CarNames, "^[A-M]") ~ "A-M",
TRUE ~ "N-Z"))
"%||%" <- function(a, b) {
if (!is.null(a)) a else b
}
MyText <- ggproto("GeomMyText",
GeomText,
extra_params = c("na.rm","padDist"),
setup_data = function(data, params){
#find bottom of plot with sufficent space
minpadding <- params$padDist %||% diff(range(data$y))*.05
data$y <- min(data$y) - minpadding
data
})
geom_mytext <- function (mapping = NULL, data = NULL, stat = "identity", position = "identity",
..., parse = FALSE, nudge_x = 0, nudge_y = 0, check_overlap = FALSE,
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, padDist = NULL)
{
if (!missing(nudge_x) || !missing(nudge_y)) {
if (!missing(position)) {
abort("You must specify either `position` or `nudge_x`/`nudge_y`.")
}
position <- position_nudge(nudge_x, nudge_y)
}
layer(data = data, mapping = mapping, stat = stat, geom = MyText,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(parse = parse, check_overlap = check_overlap,
na.rm = na.rm, padDist = padDist, ...))
}
result <- ggplot(mtcars0, aes(x = CarNames, value)) +
geom_point() +
geom_mytext(aes(label = carCase)) +
theme(axis.text.x = element_text(angle=90))
#Default
result
#Desired Result without having to call scale_y_continuous
result + scale_y_continuous(expand = c(0,0,0.05,0))
I'm assuming I need to extend the ScaleContinuous environment but I have no idea how to connect the MyText environment with it.
Any suggestions?
---- EDIT ----
Thanks for the quick replies!
A few things -
I am aware there is over plotting and clipping of labels, this isn't my actual Geom environment, just something I could put together that demonstrates my question.
As I was afraid, it seems that everyone so far is providing the solution that was raised for this question. While supplying my own scale is less than ideal - because I have to put logic in to discern if they associated y axis is discrete/continuous, when ggplot2 already knows this, I figured that there might be a trick I was missing. For now I will continue development with the suggestions given. Thanks!
---- EDIT 2 ----
I took another look at the solution given here. The exact parameters I need to modify is
panel_params$y$continuous_range[1] <- panel_params$y$limits[1]
And I need to do this somewhere in draw_panel. It seems like the associated scales are contained there and the coord$transform(data, panel_params) is responsible for including the padding on the rescaled axis depending on what is set for panel_params$y$limits and panel_params$y$continuous_range.
Thanks again to everyone who contributed!
Nice question - thanks for posting.
This is easier than you'd think. You simply bundle the desired scale object with the layer object that is returned from your geom_mytext function by concatenating them with c. In this example I have also bundled a coord_cartesian object so that I could turn clipping off to show the text properly. I have also changed the default check.overlap to TRUE because your labels are being overplotted.
Note I haven't changed your ggplot call at all
geom_mytext <- function (mapping = NULL, data = NULL, stat = "identity", position = "identity",
..., parse = FALSE, nudge_x = 0, nudge_y = 0, check_overlap = FALSE,
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, padDist = NULL)
{
if (!missing(nudge_x) || !missing(nudge_y)) {
if (!missing(position)) {
abort("You must specify either `position` or `nudge_x`/`nudge_y`.")
}
position <- position_nudge(nudge_x, nudge_y)
}
c(layer(data = data, mapping = mapping, stat = stat, geom = MyText,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(parse = parse, check_overlap = check_overlap,
na.rm = na.rm, padDist = padDist, ...)),
scale_y_continuous(expand = c(0,0,0.05,0)),
coord_cartesian(clip = "off"))
}
result <- ggplot(mtcars0, aes(x = CarNames, value)) +
geom_point() +
geom_mytext(label = "test") +
theme(axis.text.x = element_text(angle=90))
result
Now come the caveats. Because you are supplying your own scale_y_continuous object, users won't like that ggplot complains when they try to add their own y scale. You will also need some logic to choose between adding a continuous or discrete y scale. I don't think these are insurmountable problems though.
I wouldn't generally recommend the route you are taking to achieve text at the bottom. The reason is that within the grammar of graphics paradigm, all different graphical elements (themes, coords, stats, facets, geoms) should all operate independently from oneanother. When I add a geom to a plot I expect the data to affect the scales but not the geom affecting the scales.
That said, here is the easiest way to automatically set the y-expansion with a geom, and that is to simply return a list of the geom and the scale from the constructor. This is similar to how geom_sf() automatically sets the coord_sf(). I know in your question you mention this is not how you like it, but there is just no natural infrastructure for geoms layers to communicate to the scales other than with data.
geom_mytext <- function (mapping = NULL, data = NULL, stat = "identity", position = "identity",
..., parse = FALSE, nudge_x = 0, nudge_y = 0, check_overlap = FALSE,
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, padDist = NULL)
{
if (!missing(nudge_x) || !missing(nudge_y)) {
if (!missing(position)) {
abort("You must specify either `position` or `nudge_x`/`nudge_y`.")
}
position <- position_nudge(nudge_x, nudge_y)
}
layer <- layer(data = data, mapping = mapping, stat = stat, geom = MyText,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(parse = parse, check_overlap = check_overlap,
na.rm = na.rm, padDist = padDist, ...))
list(layer, scale_y_continuous(expand = c(0,0,0.05,0)))
}
What I'm recommending instead is to simply set data$y <- -Inf, which will be ignored by the scale training, leaving the default expansion factor intact but placing your data at the x-axis anyway.
MyText <- ggproto("GeomMyText",
GeomText,
extra_params = c("na.rm","padDist"),
setup_data = function(data, params){
data$y <- -Inf
data
})
Which gives me this plot:
For comparison this is what your reprex plots for me:
As an aside, there seem to be a lot of duplicated labels, which you might want to adress in your final geom.

Calculation error creating ggplot2 stat extension

I'm trying to create a ggplot2 extension. I'd like to create a stat (or geom) that allows me to plot well logs in a fashion somewhat similar to this well log image. The calculation encapsulated in the ggproto object eventually fails though.
Let's look at the pet example! Suppose we have some well log information giving us the lower boundary of a geological unit, its name and a position indicator:
library(tidyverse)
testwell <- tibble::tribble(
~depth, ~type, ~pos,
3, "loam", 1,
7, "sand", 1,
7.5, "clay", 1,
11, "murl", 1,
12.2, "gravel", 1
)
In order to transform this into plotable polygons I created a ggproto object and a stat.
StatGeostack <- ggproto("StatGeostack", Stat,
required_aes = c("x", "y", "group"),
compute_group = function(data, scales, params, glvl = 0){
bound_low <- data$y
strata <- data$group
position <- data$x
# x-position doesn't really matter for now so let's just jitter it a bit
# and pretend we have a drillcore diameter of "1" so we can draw polygons
xmin <- position[1]-0.5
xmax <- position[1]+0.5
#The lower boundary of the first strata is the upper boundary of the second and so on
bound_up <- c(glvl, bound_low)
length(bound_up) <- length(bound_low)
# This tibble contains all the information alas not in the right format
stackframe <- tibble::tibble(
strata = strata,
bound_up = bound_up,
bound_low = bound_low
)
# adapt input format for ggplot2 polygons
purrr::pmap_dfr(stackframe,
function(strata, bound_up, bound_low, xmin, xmax){
tibble::tibble(y = c(rep(bound_up, times = 2),
rep(bound_low, times = 2)),
x = c(xmin, xmax, xmax, xmin),
group = rep(strata, times = 4))
},
xmin = xmin, xmax = xmax)
}
)
#creating the stat corresponding to the ggproto object
stat_geostack <- function(mapping = NULL, data = NULL, geom = "polygon",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, glvl = 0, ...){
layer(
stat = StatGeostack, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, glvl = 0, ...)
)
}
So this is what I came up with. To see why I'm not happy with the result, let's use the new stat in a plot
ggplot(data = testwell,
aes(x = pos, y = depth*-1, group = type)) +
stat_geostack(aes(fill = type)) +
theme_bw()
The result looks like this. On first glance this doesn't look too bad, but then again half of our stratigraphy is missing. It's in the legend but not drawn in the panel.
I tried to figure out what is going on by changing from "fill" to "color"
ggplot(data = testwell,
aes(x = pos, y = depth*-1, group = type)) +
stat_geostack(fill = NA, aes(color = type), size = 2) +
theme_bw()
So what apparently happens is that the missing stratigraphic units aren't gone but hidden behind each other. The reason for this seems to be, that each polygon has its upper boundary set to zero, while the lower boundary is calculated as intended. The calculation-mechanism I used inside the ggproto's "calculate_group" function works just fine in the global environment. I can calculate the polygon vertices outside the ggplot2 extension mechanism and than make a regular ggplot2+geom_polygon plot with no problem, but that's not what I want to do.
Thank you for staying with me in this somewhat exhausting post.

Creating geom / stat from scratch

I just started working with R not long ago, and I am currently trying to strengthen my visualization skills. What I want to do is to create boxplots with mean diamonds as a layer on top (see picture in the link below). I did not find any functions that does this already, so I guess I have to create it myself.
What I was hoping to do was to create a geom or a stat that would allow something like this to work:
ggplot(data, aes(...))) +
geom_boxplot(...) +
geom_meanDiamonds(...)
I have no idea where to start in order to build this new function. I know which values are needed for the mean diamonds (mean and confidence interval), but I do not know how to build the geom / stat that takes the data from ggplot(), calculates the mean and CI for each group, and plots a mean diamond on top of each boxplot.
I have searched for detailed descriptions on how to build these type of functions from scratch, however, I have not found anything that really starts from the bottom. I would really appreciate it, if anyone could point me towards some useful guides.
Thank you!
I'm currently learning to write geoms myself, so this is going to be a rather long & rambling post as I go through my thought processes, untangling the Geom aspects (creating polygons & line segments) from the Stats aspects (calculating where these polygons & segments should be) of a geom.
Disclaimer: I'm not familiar with this kind of plot, and Google didn't throw up many authoritative guides. My understanding of how the confidence interval is calculated / used here may be off.
Step 0. Understand the relationship between a geom / stat and a layer function.
geom_boxplot and stat_boxplot are examples of layer functions. If you enter them into the R console, you'll see that they are (relatively) short, and does not contain actual code for calculating the box / whiskers of the boxplot. Instead, geom_boxplot contains a line that says geom = GeomBoxplot, while stat_boxplot contains a line that says stat = StatBoxplot (reproduced below).
> stat_boxplot
function (mapping = NULL, data = NULL, geom = "boxplot", position = "dodge2",
..., coef = 1.5, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
{
layer(data = data, mapping = mapping, stat = StatBoxplot,
geom = geom, position = position, show.legend = show.legend,
inherit.aes = inherit.aes, params = list(na.rm = na.rm,
coef = coef, ...))
}
GeomBoxplot and StatBoxplot are ggproto objects. They are where the magic happens.
Step 1. Recognise that ggproto()'s _inherit parameter is your friend.
Don't reinvent the wheel. Since we want to create something that overlaps nicely with a boxplot, we can take reference from the Geom / Stat used for that, and only change what's necessary.
StatMeanDiamonds <- ggproto(
`_class` = "StatMeanDiamonds",
`_inherit` = StatBoxplot,
... # add functions here to override those defined in StatBoxplot
)
GeomMeanDiamonds <- ggproto(
`_class` = "GeomMeanDiamonds",
`_inherit` = GeomBoxplot,
... # as above
)
Step 2. Modify the Stat.
There are 3 functions defined within StatBoxplot: setup_data, setup_params, and compute_group. You can refer to the code on Github (link above) for the details, or view them by entering for example StatBoxplot$compute_group.
The compute_group function calculates the ymin / lower / middle / upper / ymax values for all the y values associated with each group (i.e. each unique x value), which are used to plot the box plot. We can override it with one that calculates the confidence interval & mean values instead:
# ci is added as a parameter, to allow the user to specify different confidence intervals
compute_group_new <- function(data, scales, width = NULL,
ci = 0.95, na.rm = FALSE){
a <- mean(data$y)
s <- sd(data$y)
n <- sum(!is.na(data$y))
error <- qt(ci + (1-ci)/2, df = n-1) * s / sqrt(n)
stats <- c("lower" = a - error, "mean" = a, "upper" = a + error)
if(length(unique(data$x)) > 1) width <- diff(range(data$x)) * 0.9
df <- as.data.frame(as.list(stats))
df$x <- if(is.factor(data$x)) data$x[1] else mean(range(data$x))
df$width <- width
df
}
(Optional) StatBoxplot has provision for the user to include weight as an aesthetic mapping. We can allow for that as well, by replacing:
a <- mean(data$y)
s <- sd(data$y)
n <- sum(!is.na(data$y))
with:
if(!is.null(data$weight)) {
a <- Hmisc::wtd.mean(data$y, weights = data$weight)
s <- sqrt(Hmisc::wtd.var(data$y, weights = data$weight))
n <- sum(data$weight[!is.na(data$y) & !is.na(data$weight)])
} else {
a <- mean(data$y)
s <- sd(data$y)
n <- sum(!is.na(data$y))
}
There's no need to change the other functions in StatBoxplot. So we can define StatMeanDiamonds as follows:
StatMeanDiamonds <- ggproto(
`_class` = "StatMeanDiamonds",
`_inherit` = StatBoxplot,
compute_group = compute_group_new
)
Step 3. Modify the Geom.
GeomBoxplot has 3 functions: setup_data, draw_group, and draw_key. It also includes definitions for default_aes() and required_aes().
Since we've changed the upstream data source (the data produced by StatMeanDiamonds contain the calculated columns "lower" / "mean" / "upper", while the data produced by StatBoxplot would have contained the calculated columns "ymin" / "lower" / "middle" / "upper" / "ymax"), do check whether the downstream setup_data function is affected as well. (In this case, GeomBoxplot$setup_data makes no reference to the affected columns, so no changes required here.)
The draw_group function takes the data produced by StatMeanDiamonds and set up by setup_data, and produces multiple data frames. "common" contains the aesthetic mappings common to all geoms. "diamond.df" for the mappings that contribute towards the diamond polygon, and "segment.df" for the mappings that contribute towards the horizontal line segment at the mean. The data frames are then passed to the draw_panel functions of GeomPolygon and GeomSegment respectively, to produce the actual polygons / line segments.
draw_group_new = function(data, panel_params, coord,
varwidth = FALSE){
common <- data.frame(colour = data$colour,
size = data$size,
linetype = data$linetype,
fill = alpha(data$fill, data$alpha),
group = data$group,
stringsAsFactors = FALSE)
diamond.df <- data.frame(x = c(data$x, data$xmax, data$x, data$xmin),
y = c(data$upper, data$mean, data$lower, data$mean),
alpha = data$alpha,
common,
stringsAsFactors = FALSE)
segment.df <- data.frame(x = data$xmin, xend = data$xmax,
y = data$mean, yend = data$mean,
alpha = NA,
common,
stringsAsFactors = FALSE)
ggplot2:::ggname("geom_meanDiamonds",
grid::grobTree(
GeomPolygon$draw_panel(diamond.df, panel_params, coord),
GeomSegment$draw_panel(segment.df, panel_params, coord)
))
}
The draw_key function is used to create the legend for this layer, should the need arise. Since GeomMeanDiamonds inherits from GeomBoxplot, the default is draw_key = draw_key_boxplot, and we don't have to change it. Leaving it unchanged will not break the code. However, I think a simpler legend such as draw_key_polygon offers a less cluttered look.
GeomBoxplot's default_aes specifications look fine. But we need to change the required_aes since the data we expect to get from StatMeanDiamonds is different ("lower" / "mean" / "upper" instead of "ymin" / "lower" / "middle" / "upper" / "ymax").
We are now ready to define GeomMeanDiamonds:
GeomMeanDiamonds <- ggproto(
"GeomMeanDiamonds",
GeomBoxplot,
draw_group = draw_group_new,
draw_key = draw_key_polygon,
required_aes = c("x", "lower", "upper", "mean")
)
Step 4. Define the layer functions.
This is the boring part. I copied from geom_boxplot / stat_boxplot directly, removing all references to outliers in geom_meanDiamonds, changing to geom = GeomMeanDiamonds / stat = StatMeanDiamonds, and adding ci = 0.95 to stat_meanDiamonds.
geom_meanDiamonds <- function(mapping = NULL, data = NULL,
stat = "meanDiamonds", position = "dodge2",
..., varwidth = FALSE, na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE){
if (is.character(position)) {
if (varwidth == TRUE) position <- position_dodge2(preserve = "single")
} else {
if (identical(position$preserve, "total") & varwidth == TRUE) {
warning("Can't preserve total widths when varwidth = TRUE.", call. = FALSE)
position$preserve <- "single"
}
}
layer(data = data, mapping = mapping, stat = stat,
geom = GeomMeanDiamonds, position = position,
show.legend = show.legend, inherit.aes = inherit.aes,
params = list(varwidth = varwidth, na.rm = na.rm, ...))
}
stat_meanDiamonds <- function(mapping = NULL, data = NULL,
geom = "meanDiamonds", position = "dodge2",
..., ci = 0.95,
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
layer(data = data, mapping = mapping, stat = StatMeanDiamonds,
geom = geom, position = position, show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ci = ci, ...))
}
Step 5. Check output.
# basic
ggplot(iris,
aes(Species, Sepal.Length)) +
geom_boxplot() +
geom_meanDiamonds()
# with additional parameters, to see if they break anything
ggplot(iris,
aes(Species, Sepal.Length)) +
geom_boxplot(width = 0.8) +
geom_meanDiamonds(aes(fill = Species),
color = "red", alpha = 0.5, size = 1,
ci = 0.99, width = 0.3)

Q-Q plot with ggplot2::stat_qq, colours, single group

I'm looking for a more convenient way to get a Q-Q plot in ggplot2 where the quantiles are computed for the data set as a whole. but I can use mappings (colour/shapes) for groups in the data.
library(dplyr)
library(ggplot2)
library(broom) ## for augment()
Make up some data:
set.seed(1001)
N <- 1000
G <- 10
dd <- data_frame(x=runif(N),
f=factor(sample(1:G,size=N,replace=TRUE)),
y=rnorm(N)+2*x+as.numeric(f))
m1 <- lm(y~x,data=dd)
dda <- cbind(augment(m1),f=dd$f)
Basic plot:
ggplot(dda)+stat_qq(aes(sample=.resid))
if I try to add colour, the groups get separated for the quantile computation (which I don't want):
ggplot(dda)+stat_qq(aes(sample=y,colour=f))
If I use stat_qq(aes(sample=y,colour=f,group=1)) ggplot ignores the colour specification and I get the first plot back.
I want a plot where the points are positioned as in the first case, but coloured as in the second case. I have a qqnorm-based manual solution that I can post but am looking for something nicer ...
You could calculate the quantiles yourself and then plot using geom_point:
dda = cbind(dda, setNames(qqnorm(dda$.resid, plot.it=FALSE), c("Theoretical", "Sample")))
ggplot(dda) +
geom_point(aes(x=Theoretical, y=Sample, colour=f))
Ah, I guess I should have read to the end of your question. This is the manual solution you were referring to, right? Although you could just package it as a function:
my_stat_qq = function(data, colour.var) {
data=cbind(data, setNames(qqnorm(data$.resid, plot.it=FALSE), c("Theoretical", "Sample")))
ggplot(data) +
geom_point(aes_string(x="Theoretical", y="Sample", colour=colour.var))
}
my_stat_qq(dda, "f")
Here's a ggproto-based approach that attempts to change StatQq, since the underlying issue here (colour specification gets ignored when group is specified explicitly) is due to how its compute_group function is coded.
Define alternate version of StatQq with modified compute_group (last few lines of code):
StatQq2 <- ggproto("StatQq", Stat,
default_aes = aes(y = after_stat(sample), x = after_stat(theoretical)),
required_aes = c("sample"),
compute_group = function(data, scales, quantiles = NULL,
distribution = stats::qnorm, dparams = list(),
na.rm = FALSE) {
sample <- sort(data$sample)
n <- length(sample)
# Compute theoretical quantiles
if (is.null(quantiles)) {
quantiles <- stats::ppoints(n)
} else if (length(quantiles) != n) {
abort("length of quantiles must match length of data")
}
theoretical <- do.call(distribution, c(list(p = quote(quantiles)), dparams))
res <- ggplot2:::new_data_frame(list(sample = sample,
theoretical = theoretical))
# NEW: append remaining columns from original data
# (e.g. if there were other aesthetic variables),
# instead of returning res directly
data.new <- subset(data[rank(data$sample), ],
select = -c(sample, PANEL, group))
if(ncol(data.new) > 0) res <- cbind(res, data.new)
res
}
)
Define geom_qq2 / stat_qq2 to use modified StatQq2 instead of StatQq for their stat:
geom_qq2 <- function (mapping = NULL, data = NULL, geom = "point",
position = "identity", ..., distribution = stats::qnorm,
dparams = list(), na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE) {
layer(data = data, mapping = mapping, stat = StatQq2, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(distribution = distribution, dparams = dparams,
na.rm = na.rm, ...))
}
stat_qq2 <- function (mapping = NULL, data = NULL, geom = "point",
position = "identity", ..., distribution = stats::qnorm,
dparams = list(), na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE) {
layer(data = data, mapping = mapping, stat = StatQq2, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(distribution = distribution, dparams = dparams,
na.rm = na.rm, ...))
}
Usage:
cowplot::plot_grid(
ggplot(dda) + stat_qq(aes(sample = .resid)), # original
ggplot(dda) + stat_qq2(aes(sample = .resid, # new
color = f, group = 1))
)

ggplot2 2.0 new stat_ function: setting default scale for given aesthetics

I try to use the new functionality of ggplot2 in R that allows creating our own stat_ functions. I'm creating a simple one to compute and plot an interpolated surface between points arranged on a 2d array.
I would like to create a stat_topo() requiring x, y, and val aesthetics, plotting a simple geom_raster of interpolated val mapped to fill.
library(ggplot2)
library(dplyr)
library(akima)
cpt_grp <- function(data, scales) {
#interpolate data in 2D
itrp <- akima::interp(data$x,data$y,data$val,linear=F,extrap=T)
out <- expand.grid(x=itrp$x, y=itrp$y,KEEP.OUT.ATTRS = F)%>%
mutate(fill=as.vector(itrp$z))
# str(out)
return(out)
}
StatTopo <- ggproto("StatTopo", Stat,
compute_group = cpt_grp,
required_aes = c("x","y","val")
)
stat_topo <- function(mapping = NULL, data = NULL, geom = "raster",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
layer(
stat = StatTopo, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
set.seed(1)
nchan <- 30
d <- data.frame(val = rnorm(nchan), # some random values to be mapped to fill color
x = 1:nchan*cos(1:nchan), # the x and y position of the points to interpolate
y = 1:nchan*sin(1:nchan))
plot(d$x,d$y)
ggplot(d,aes(x=x,y=y,val=val)) +
stat_topo() +
geom_point()
When I run this, I get the following error:
Error: numerical color values must be >= 0, found -1
I understand that this is because somehow the scale of the fill aesthetic is set to discrete.
If I enter this:
ggplot(d,aes(x=x,y=y,val=val)) +
stat_topo() +
scale_fill_continuous() +
geom_point()
I get what I wanted: the expected raster with a continuous color scale, which I want the stat_ to do by default...
So I guess the question is:
How can I prevent ggplot from setting a discrete scale here, and ideally set a default scale within the call to my new stat_ function.
Apparently, when creating a new variable inside a stat_ function, one needs to explicitly associate it to the aesthetic it will be mapped to with the parameter default_aes = aes(fill = ..fill..) within the ggproto definition.
This is telling ggplot that it is a calculated aesthetic and it will pick a scale based on the data type.
So here we need to define the stat_ as follows:
cpt_grp <- function(data, scales) {
# interpolate data in 2D
itrp <- akima::interp(data$x,data$y,data$val,linear=F,extrap=T)
out <- expand.grid(x=itrp$x, y=itrp$y,KEEP.OUT.ATTRS = F)%>%
mutate(fill=as.vector(itrp$z))
# str(out)
return(out)
}
StatTopo <- ggproto("StatTopo", Stat,
compute_group = cpt_grp,
required_aes = c("x","y","val"),
default_aes = aes(fill = ..fill..)
)
stat_topo <- function(mapping = NULL, data = NULL, geom = "raster",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
layer(
stat = StatTopo, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
Then the following code:
set.seed(1)
nchan <- 30
d <- data.frame(val = rnorm(nchan),
x = 1:nchan*cos(1:nchan),
y = 1:nchan*sin(1:nchan))
ggplot(d,aes(x=x,y=y,val=val)) +
stat_topo() +
geom_point()
Produces as expected:
Without the need to specify a scale_ manually, but leaving the possibility to adapt the scale easily as usual with e.g. scale_fill_gradient2(low = 'blue',mid='white',high='red')
I got this answer here: https://github.com/hadley/ggplot2/issues/1481
Okay, slept on it, and had an idea, and I think this might do what you want. In your stat_topo layer function instead of the ggproto I returned a list with it as the first element and then added to that list another ggproto with a call to scale_fill_continuous().
library(ggplot2)
library(dplyr)
library(akima)
cpt_grp <- function(data, scales) {
#interpolate data in 2D
itrp <- akima::interp(data$x,data$y,data$val,linear=F,extrap=T)
out <- expand.grid(x=itrp$x, y=itrp$y,KEEP.OUT.ATTRS = F)%>%
mutate(fill=as.vector(itrp$z))
return(out)
}
StatTopo <- ggproto("StatTopo", Stat,
compute_group = cpt_grp,
required_aes = c("x","y","val")
)
stat_topo <- function(mapping = NULL, data = NULL, geom = "raster",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
list(
layer(
stat = StatTopo, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm )
),
scale_fill_continuous()
)
}
set.seed(1)
nchan <- 30
d <- data.frame(val = rnorm(nchan), # some random values to be mapped to fill color
x = 1:nchan*cos(1:nchan), # the x and y position of interp points
y = 1:nchan*sin(1:nchan))
ggplot(d,aes(x=x,y=y,val=val)) +
stat_topo() +
geom_point()
yielding the same picture as above.

Resources