Related
I am trying to create a custom function that extends ggplot2. The goal of the function is to superimpose a mean with horizontal and vertical standard errors. The code below does the entire thing.
library(plyr)
library(tidyverse)
summ <- ddply(mtcars,.(),summarise,
dratSE = sqrt(var(drat))/length(drat),
mpgSE = sqrt(var(mpg))/length(mpg),
drat = mean(drat),
mpg = mean(mpg))
ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
geom_point(shape = 21, fill = 'black', color = 'white', size = 3) +
geom_errorbarh(data = summ, aes(xmin = drat - dratSE, xmax = drat + dratSE)) +
geom_errorbar(data = summ, aes(ymin = mpg - mpgSE, ymax = mpg+mpgSE), width = .1) +
geom_point(data = summ, color='red',size=4)
Ideally, it would only take a function such as geom_scattermeans() to do this whole thing. But I am not sure how the aesthetics get transferred into subsequent geom functions from ggplot().
Also I've had difficulties in making a function that receives column names as argument and making it work with ddply().
I think plyr is pretty defunct at this point. I would recommend the dplyr package. When programming with dplyr you can use {{ (curly-curly, or embracing as the documentation says) to properly quote expressions.
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
geom_point_error <- function(data, x, y, color = 'red', size = 4) {
data <- dplyr::summarise(
data,
x_se = sqrt(var({{x}}))/length({{x}}),
y_se = sqrt(var({{y}}))/length({{y}}),
x = mean({{x}}),
y = mean({{y}})
)
list(
geom_errorbarh(data = data,
mapping = aes(y = y,
xmin = x - x_se, xmax = x + x_se), inherit.aes = F),
geom_errorbar(data = data,
mapping = aes(x = x,
ymin = y - y_se, ymax = y + y_se), width = .1,inherit.aes = F),
geom_point(data = data,
mapping = aes(x = x, y = y),
color = color, size = size)
)
}
ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
geom_point(shape = 21, fill = 'black', color = 'white', size = 3) +
geom_point_error(mtcars, x = drat, y = mpg)
Created on 2021-05-17 by the reprex package (v1.0.0)
The second option would be to build your own ggproto Geom to handle these calculations inside ggplot2, but that is a bit much for right now.
Since my first answer is still the easier solution, I decieded to keep it. This answer should get OP closer to their goal.
Building a ggproto object can be cumbersome depending on what you are trying to do. In your case you are combining 3 ggproto Geoms classes together with the possibility of a new Stat.
The three Geoms are:
GeomErrorbar
GeomErrorbarh
GeomPoint
To get started, sometimes you just need to inherit from one of the classes and overwrite the method, but to pool the three together you will need to do more work.
Lets first consider how each of these Geoms draw their grid objects. Depending on the Geom it is in one of these functions draw_layer(), draw_panel(), and draw_group(). Fortunately, each of the geoms we want to use only use draw_panel() which mean a bit less work for us - we will just call these methods directly and build a new grobTree object. We will just need to be careful that all the correct parameters are making it to our new Geom's draw_panel() method.
Before we get to writing our own draw_panel, we have to first consider setup_params() and setup_data() functions. Occasionally, these will modify the data right out the gate. These steps are usually helpful to have automatic processing here and are often used to standardize/transform the data. A good example is GeomTile and GeomRect, they are essentially the same Geoms but their setup_data() functions differ because they are parameterized differently.
Lets assume you only want to assign an x and a y aesthetic, and leave the calculations of xmin, ymin, xmax, and ymax to the geoms/stats.
Fortunately, GeomPoint just returns the data with no modifications, so we will need to incorporate GeomErrorbar and GeomErrorbarh's setup_data() first. To skip some steps, I am just going to make a new Stat that will take care of transforming those values for us within a compute_group() method.
A note here, GeomErrorbar and GeomErrorbarh allow for another parameter to be included - width and height respectively, which controls how wide the flat portions of the error bars are.
also, within these functions, each will make their own xmin, xmax, ymin, ymax - so we will need to distinguish these parameters.
First load required information into the namespace
library(ggplot2)
library(grid)
"%||%" <- ggplot2:::`%||%`
Start with new Stat, I've decided to call it a PointError
StatPointError <- ggproto(
"StatPointError",
Stat,
#having `width` and `height` as named parameters here insure
#that they will be available to the `Stat` ggproto object.
compute_group = function(data, scales, width = NULL, height = NULL){
data$width <- data$width %||% width %||% (resolution(data$x, FALSE)*0.9)
data$height <- data$height %||% height %||% (resolution(data$y, FALSE)*0.9)
data <- transform(
data,
x = mean(x),
y = mean(y),
# positions for flat parts of vertical error bars
xmin = mean(x) - width /2,
xmax = mean(x) + width / 2,
width = NULL,
# y positions of vertical error bars
ymin = mean(y) - sqrt(var(y))/length(y),
ymax = mean(y) + sqrt(var(y))/length(y),
#positions for flat parts of horizontal error bars
ymin_h = mean(y) - height /2,
ymax_h = mean(y) + height /2,
height = NULL,
# x positions of horizontal error bars
xmin_h = mean(x) - sqrt(var(x))/length(x),
xmax_h = mean(x) + sqrt(var(x))/length(x)
)
unique(data)
}
)
Now for the fun part, the Geom, again I'm going for PointError as a consistent name.
GeomPointError <- ggproto(
"GeomPointError",
GeomPoint,
#include some additional defaults
default_aes = aes(
shape = 19,
colour = "black",
size = 1.5, # error bars have defaults of 0.5 - you may want to add another parameter?
fill = NA,
alpha = NA,
linetype = 1,
stroke = 0.5, # for GeomPoint
width = 0.5, # for GeomErrorbar
height = 0.5, # for GeomErrorbarh
),
draw_panel = function(data, panel_params, coord, width = NULL, height = NULL, na.rm = FALSE) {
#make errorbar grobs
data_errbar <- data
data_errbar[["size"]] <- 0.5
errorbar_grob <- GeomErrorbar$draw_panel(data = data_errbar,
panel_params = panel_params, coord = coord,
width = width, flipped_aes = FALSE)
#re-parameterize errbarh data
data_errbarh <- transform(data,
xmin = xmin_h, xmax = xmax_h, ymin = ymin_h, ymax = ymax_h,
xmin_h = NULL, xmax_h = NULL, ymin_h = NULL, ymax_h = NULL,
size = 0.5)
#make errorbarh grobs
errorbarh_grob <- GeomErrorbarh$draw_panel(data = data_errbarh,
panel_params = panel_params, coord = coord,
height = height)
point_grob <- GeomPoint$draw_panel(data = data, panel_params = panel_params,
coord = coord, na.rm = na.rm)
gt <- grobTree(
errorbar_grob,
errorbarh_grob,
point_grob, name = 'geom_point_error')
gt
}
)
Last, we need a function for the user to call that will make a Layer object.
geom_point_error <- function(mapping = NULL, data = NULL,
position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = StatPointError,
geom = GeomPointError,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
...
)
)
}
Now we can test if this is working properly
ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
geom_point(shape = 21, fill = 'black', color = 'white', size = 3) +
geom_point_error(color = "red", width = .1, height = .3)
ggplot(data = mtcars, mapping = aes(x = drat, y = mpg)) +
geom_point(shape = 21, fill = 'black', color = 'white', size = 3) +
geom_point_error(aes(color = hp>100))
Created on 2021-05-18 by the reprex package (v1.0.0)
There is obviously so much more you could do with this, from including additional default aesthetics such that you could control the color and size of the lines/points separately (may want to override GeomPointError$setup_data() to insure everything maps correctly).
Finially, this geom is pretty naive in that it assumes the x and y data mappings are continuous. It still works with mixing continuous and discrete, but looks a bit funky
ggplot(mpg, aes(cty, model)) +
geom_point() +
geom_point_error(color = 'red')
I am learning to code in ggplot2. I wish to create a new geom which makes something complex – not a simple polygon. Say, a rectangle with a point in the middle.
When I am defining my new geom object, say GeomFafik, I need to specify the function draw_panel (or draw_group, or whichever) which returns a grob. Here is how it looks now (based on the extending ggplot2 vignette:
GeomFafik <- ggproto("GeomFafik",
Geom,
required_aes=c("xmin", "ymin", "xmax", "ymax"),
default_aes=aes(shape=19, colour="black"),
draw_key=draw_key_blank(),
draw_panel=function(data, panel_params, coord) {
coords <- coord$transform(data, panel_params)
# how can I combine grob1 with grob2 and
# return the result?
grob1 <- grid::rectGrob(coords$xmin, coords$ymin,
width=coords$xmax - coords$xmin,
height=coords$ymax - coords$ymin,
gp=gpar(col=coord$colour))
grob2 <- grid::pointsGrob(x=coords$xmin + (coords$xmax - coords$xmin)/2,
y=coords$ymin + (coords$ymax - coords$ymax)/2,
gp=gpar(col=coord$colour))
})
Am I right that I should build a grob using grob() or gTree() from grid? Or is there a ggplot2 way of doing that (similar to zeroGrob)? And if grid, then which of the two should I use? And where can I find examples? The manual page doesn't really say much.
OK, so that was easier than I thought. The solution is to use gTree as follows (thanks to this example):
GeomFafik <- ggproto("GeomFafik",
Geom,
required_aes=c("xmin", "ymin", "xmax", "ymax"),
default_aes=aes(shape=19, colour="black"),
draw_key=draw_key_blank(),
draw_panel=function(data, panel_params, coord) {
coords <- coord$transform(data, panel_params)
w <- coords$xmax - coords$xmin
h <- coords$ymax - coords$ymin
x <- coords$xmin + w/2
y <- coords$ymin + h/2
grob1 <- grid::rectGrob(x, y, width=w, height=h,
gp=gpar(col=coord$colour))
grob2 <- grid::pointsGrob(x=x, y=y,
gp=gpar(col=coord$colour))
grid::gTree("fafik_grob", children=grid::gList(grob1, grob2))
})
geom_fafik <- function(mapping = NULL, data = NULL, stat = "identity",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
layer(
geom = GeomFafik, mapping = mapping, data = data, stat = stat,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
Result, which is what I wanted:
I am in the early stages of learning how to extend ggplot2. I would like to create a custom geom and associated stat. My starting point was the vignette. In addition, I have benefited from this and this. I'm trying to put together a template to teach myself and hopefully others.
Main question:
Inside my function calculate_shadows() the needed parameter params$anchor is NULL. How can I access it?
The goal described below is intended solely for learning how to create custom stat and geom functions, it's not a real goal: as you can see from the screenshots, I do know how to leverage the power of ggplot2 to make the graphs.
The geom will read the data and for the supplied variables ("x", "y") will plot (for want of a better word) shadows: a horizontal line min(x)--max(x) at the default y=0 and a vertical line min(y)--max(y) at the default x=0. If an option is supplied, these "anchors" could be changed, e.g. if the user supplies x = 35, y = 1, the horizontal line would be drawn at the intercept y = 1 while the vertical line would be drawn at the intercept x = 35. Usage:
library(ggplot2)
ggplot(data = mtcars, aes(x = mpg, y = wt)) +
geom_point() +
geom_shadows(x = 35, y = 1)
The stat will read the data and for the supplied variables ("x", "y") will compute shadows according to the value of stat. For instance, by passing stat = "identity", the shadows would be computed for the min and max of the data (as done by geom_shadows). But by passing stat = "quartile", the shadows would be computed for first and third quartile. More generally, one could pass a function like stats::quantile with arguments args = list(probs = c(0.10, 0.90), type = 6), to compute shadows using the 10th and 90th percentiles and the quantile method of type 6. Usage:
ggplot(data = mtcars, aes(x = mpg, y = wt)) +
geom_point() +
stat_shadows(stat = "quartile")
Unfortunately, my lack of familiarity with extending ggplot2 stopped me well short of my objective. These plots were "faked" with geom_segment. Based on the tutorial and discussions cited above and inspecting existing code like stat-qq or stat-smooth, I have put together a basic architecture for this goal. It must contain many mistakes, I would be grateful for guidance. Also, note that either of these approaches would be fine: geom_shadows(anchor = c(35, 1)) or geom_shadows(x = 35, y = 1).
Now here are my efforts. First, geom-shadows.r to define geom_shadows(). Second, stat-shadows.r to define stat_shadows(). The code doesn't work as is. But if I execute its content, it does produce the desired statistics. For clarity, I have removed most of the calculations in stat_shadows(), such as quartiles, to focus on essentials. Any obvious mistake in the layout?
geom-shadows.r
#' documentation ought to be here
geom_shadows <- function(
mapping = NULL,
data = NULL,
stat = "shadows",
position = "identity",
...,
anchor = list(x = 0, y = 0),
shadows = list("x", "y"),
type = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomShadows,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
anchor = anchor,
shadows = shadows,
type = type,
na.rm = na.rm,
...
)
)
}
GeomShadows <- ggproto("GeomShadows", Geom,
# set up the data, e.g. remove missing data
setup_data = function(data, params) {
data
},
# set up the parameters, e.g. supply warnings for incorrect input
setup_params = function(data, params) {
params
},
draw_group = function(data, panel_params, coord, anchor, shadows, type) {
# draw_group uses stats returned by compute_group
# set common aesthetics
geom_aes <- list(
alpha = data$alpha,
colour = data$color,
size = data$size,
linetype = data$linetype,
fill = alpha(data$fill, data$alpha),
group = data$group
)
# merge aesthetics with data calculated in setup_data
geom_stats <- new_data_frame(c(list(
x = c(data$x.xmin, data$y.xmin),
xend = c(data$x.xmax, data$y.xmax),
y = c(data$x.ymin, data$y.ymin),
yend = c(data$x.ymax, data$y.ymax),
alpha = c(data$alpha, data$alpha)
), geom_aes
), n = 2)
# turn the stats data into a GeomPath
geom_grob <- GeomSegment$draw_panel(unique(geom_stats),
panel_params, coord)
# pass the GeomPath to grobTree
ggname("geom_shadows", grobTree(geom_grob))
},
# set legend box styles
draw_key = draw_key_path,
# set default aesthetics
default_aes = aes(
colour = "blue",
fill = "red",
size = 1,
linetype = 1,
alpha = 1
)
)
stat-shadows.r
#' documentation ought to be here
stat_shadows <-
function(mapping = NULL,
data = NULL,
geom = "shadows",
position = "identity",
...,
# do I need to add the geom_shadows arguments here?
anchor = list(x = 0, y = 0),
shadows = list("x", "y"),
type = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(
stat = StatShadows,
data = data,
mapping = mapping,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
# geom_shadows argument repeated here?
anchor = anchor,
shadows = shadows,
type = type,
na.rm = na.rm,
...
)
)
}
StatShadows <-
ggproto("StatShadows", Stat,
# do I need to repeat required_aes?
required_aes = c("x", "y"),
# set up the data, e.g. remove missing data
setup_data = function(data, params) {
data
},
# set up parameters, e.g. unpack from list
setup_params = function(data, params) {
params
},
# calculate shadows: returns data_frame with colnames: xmin, xmax, ymin, ymax
compute_group = function(data, scales, anchor = list(x = 0, y = 0), shadows = list("x", "y"), type = NULL, na.rm = TRUE) {
.compute_shadows(data = data, anchor = anchor, shadows = shadows, type = type)
}
)
# Calculate the shadows for each type / shadows / anchor
.compute_shadows <- function(data, anchor, shadows, type) {
# Deleted all type-checking, etc. for MWE
# Only 'type = c(double, double)' accepted, e.g. type = c(0, 1)
qs <- type
# compute shadows along the x-axis
if (any(shadows == "x")) {
shadows.x <- c(
xmin = as.numeric(stats::quantile(data[, "x"], qs[[1]])),
xmax = as.numeric(stats::quantile(data[, "x"], qs[[2]])),
ymin = anchor[["y"]],
ymax = anchor[["y"]])
}
# compute shadows along the y-axis
if (any(shadows == "y")) {
shadows.y <- c(
xmin = anchor[["x"]],
xmax = anchor[["x"]],
ymin = as.numeric(stats::quantile(data[, "y"], qs[[1]])),
ymax = as.numeric(stats::quantile(data[, "y"], qs[[2]])))
}
# store shadows in one data_frame
stats <- new_data_frame(c(x = shadows.x, y = shadows.y))
# return the statistics
stats
}
.
Until a more thorough answer comes along: You are missing
extra_params = c("na.rm", "shadows", "anchor", "type"),
inside GeomShadows <- ggproto("GeomShadows", Geom,
and possibly also inside StatShadows <- ggproto("StatShadows", Stat,.
Inside geom-.r and stat-.r there are many very useful comments that clarify how geoms and stats work. In particular (hat tips Claus Wilke over at github issues):
# Most parameters for the geom are taken automatically from draw_panel() or
# draw_groups(). However, some additional parameters may be needed
# for setup_data() or handle_na(). These can not be imputed automatically,
# so the slightly hacky "extra_params" field is used instead. By
# default it contains `na.rm`
extra_params = c("na.rm"),
I would like to draw a hollow histogram that has no vertical bars drawn inside of it, but just an outline. I couldn't find any way to do it with geom_histogram. The geom_step+stat_bin combination seemed like it could do the job. However, the bins of geom_step+stat_bin are shifted by a half bin either to the right or to the left, depending on the step's direction= parameter value. It seems like it is doing its "steps" WRT bin centers. Is there any way to change this behavior so it would do the "steps" at bin edges?
Here's an illustration:
d <- data.frame(x=rnorm(1000))
qplot(x, data=d, geom="histogram",
breaks=seq(-4,4,by=.5), color=I("red"), fill = I("transparent")) +
geom_step(stat="bin", breaks=seq(-4,4,by=.5), color="black", direction="vh")
I propose making a new Geom like so:
library(ggplot2)
library(proto)
geom_stephist <- function(mapping = NULL, data = NULL, stat="bin", position="identity", ...) {
GeomStepHist$new(mapping=mapping, data=data, stat=stat, position=position, ...)
}
GeomStepHist <- proto(ggplot2:::Geom, {
objname <- "stephist"
default_stat <- function(.) StatBin
default_aes <- function(.) aes(colour="black", size=0.5, linetype=1, alpha = NA)
reparameterise <- function(., df, params) {
transform(df,
ymin = pmin(y, 0), ymax = pmax(y, 0),
xmin = x - width / 2, xmax = x + width / 2, width = NULL
)
}
draw <- function(., data, scales, coordinates, ...) {
data <- as.data.frame(data)[order(data$x), ]
n <- nrow(data)
i <- rep(1:n, each=2)
newdata <- rbind(
transform(data[1, ], x=xmin, y=0),
transform(data[i, ], x=c(rbind(data$xmin, data$xmax))),
transform(data[n, ], x=xmax, y=0)
)
rownames(newdata) <- NULL
GeomPath$draw(newdata, scales, coordinates, ...)
}
guide_geom <- function(.) "path"
})
This also works for non-uniform breaks. To illustrate the usage:
d <- data.frame(x=runif(1000, -5, 5))
ggplot(d, aes(x)) +
geom_histogram(breaks=seq(-4,4,by=.5), color="red", fill=NA) +
geom_stephist(breaks=seq(-4,4,by=.5), color="black")
This isn't ideal, but it's the best I can come up with:
h <- hist(d$x,breaks=seq(-4,4,by=.5))
d1 <- data.frame(x = h$breaks,y = c(h$counts,NA))
ggplot() +
geom_histogram(data = d,aes(x = x),breaks = seq(-4,4,by=.5),
color = "red",fill = "transparent") +
geom_step(data = d1,aes(x = x,y = y),stat = "identity")
Yet another one. Use ggplot_build to build a plot object of the histogram for rendering. From this object x and y values are extracted, to be used for geom_step. Use by to offset x values.
by <- 0.5
p1 <- ggplot(data = d, aes(x = x)) +
geom_histogram(breaks = seq(from = -4, to = 4, by = by),
color = "red", fill = "transparent")
df <- ggplot_build(p1)$data[[1]][ , c("x", "y")]
p1 +
geom_step(data = df, aes(x = x - by/2, y = y))
Edit following comment from #Vadim Khotilovich (Thanks!)
The xmin from the plot object can be used instead (-> no need for offset adjustment)
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y")]
p1 +
geom_step(data = df, aes(x = xmin, y = y))
An alternative, also less than ideal:
qplot(x, data=d, geom="histogram", breaks=seq(-4,4,by=.5), color=I("red"), fill = I("transparent")) +
stat_summary(aes(x=round(x * 2 - .5) / 2, y=1), fun.y=length, geom="step")
Missing some bins that you can probably add back if you mess around a bit. Only (somewhat meaningless) advantage is it is more in ggplot than #Joran's answer, though even that is debatable.
I answer my own comment earlier today: here is a modified version of #RosenMatev's answer updated for the v2 (ggplot2_2.0.0) using ggproto:
GeomStepHist <- ggproto("GeomStepHist", GeomPath,
required_aes = c("x"),
draw_panel = function(data, panel_scales, coord, direction) {
data <- as.data.frame(data)[order(data$x), ]
n <- nrow(data)
i <- rep(1:n, each=2)
newdata <- rbind(
transform(data[1, ], x=x - width/2, y=0),
transform(data[i, ], x=c(rbind(data$x-data$width/2, data$x+data$width/2))),
transform(data[n, ], x=x + width/2, y=0)
)
rownames(newdata) <- NULL
GeomPath$draw_panel(newdata, panel_scales, coord)
}
)
geom_step_hist <- function(mapping = NULL, data = NULL, stat = "bin",
direction = "hv", position = "stack", na.rm = FALSE,
show.legend = NA, inherit.aes = TRUE, ...) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomStepHist,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
direction = direction,
na.rm = na.rm,
...
)
)
}
TLDR: use geom_step(..., direction = "mid")
This has become much easier since Daniel Mastropietro and Dewey Dunnington implemented the "mid" as an additional option for the direction argument of geom_step for ggplot2 v3.3.0:
library(ggplot2)
set.seed(1)
d <- data.frame(x = rnorm(1000))
ggplot(d, aes(x)) +
geom_histogram(breaks = seq(-4, 4, by=.5), color="red", fill = "transparent") +
geom_step(stat="bin", breaks=seq(-4, 4, by=.5), color = "black", direction = "mid")
Below, for reference, the code from the question formatted like above answer:
ggplot(d, aes(x)) +
geom_histogram(breaks = seq(-4, 4, by=.5), color = "red", fill = "transparent") +
geom_step(stat="bin", breaks = seq(-4, 4, by=.5), color = "black", direction = "vh")
Created on 2020-09-02 by the reprex package (v0.3.0)
a simple way to do something similar to #Rosen Matev (that does not work with ggplot2_2.0.0 as mentioned by #julou), I would just
1) calculate manually the value of the bins (using a small function as shown below)
2) use geom_step()
Hope this helps !
geom_step_hist<- function(d,binw){
dd=NULL
bin=min(d$y) # this enables having a first value that is = 0 (to have the left vertical bar of the plot when using geom_step)
max=max(d$y)+binw*2 # this enables having a last value that is = 0 (to have the right vertical bar of the plot when using geom_step)
xx=NULL
yy=NULL
while(bin<=max){
n=length(temp$y[which(temp$y<bin & temp$y>=(bin-binw))])
yy=c(yy,n)
xx=c(xx,bin-binw)
bin=bin+binw
rm(n)
}
dd=data.frame(xx,yy)
return(dd)
}
hist=ggplot(dd,aes(x=xx,y=yy))+
geom_step()
I would like to draw a hollow histogram that has no vertical bars drawn inside of it, but just an outline. I couldn't find any way to do it with geom_histogram. The geom_step+stat_bin combination seemed like it could do the job. However, the bins of geom_step+stat_bin are shifted by a half bin either to the right or to the left, depending on the step's direction= parameter value. It seems like it is doing its "steps" WRT bin centers. Is there any way to change this behavior so it would do the "steps" at bin edges?
Here's an illustration:
d <- data.frame(x=rnorm(1000))
qplot(x, data=d, geom="histogram",
breaks=seq(-4,4,by=.5), color=I("red"), fill = I("transparent")) +
geom_step(stat="bin", breaks=seq(-4,4,by=.5), color="black", direction="vh")
I propose making a new Geom like so:
library(ggplot2)
library(proto)
geom_stephist <- function(mapping = NULL, data = NULL, stat="bin", position="identity", ...) {
GeomStepHist$new(mapping=mapping, data=data, stat=stat, position=position, ...)
}
GeomStepHist <- proto(ggplot2:::Geom, {
objname <- "stephist"
default_stat <- function(.) StatBin
default_aes <- function(.) aes(colour="black", size=0.5, linetype=1, alpha = NA)
reparameterise <- function(., df, params) {
transform(df,
ymin = pmin(y, 0), ymax = pmax(y, 0),
xmin = x - width / 2, xmax = x + width / 2, width = NULL
)
}
draw <- function(., data, scales, coordinates, ...) {
data <- as.data.frame(data)[order(data$x), ]
n <- nrow(data)
i <- rep(1:n, each=2)
newdata <- rbind(
transform(data[1, ], x=xmin, y=0),
transform(data[i, ], x=c(rbind(data$xmin, data$xmax))),
transform(data[n, ], x=xmax, y=0)
)
rownames(newdata) <- NULL
GeomPath$draw(newdata, scales, coordinates, ...)
}
guide_geom <- function(.) "path"
})
This also works for non-uniform breaks. To illustrate the usage:
d <- data.frame(x=runif(1000, -5, 5))
ggplot(d, aes(x)) +
geom_histogram(breaks=seq(-4,4,by=.5), color="red", fill=NA) +
geom_stephist(breaks=seq(-4,4,by=.5), color="black")
This isn't ideal, but it's the best I can come up with:
h <- hist(d$x,breaks=seq(-4,4,by=.5))
d1 <- data.frame(x = h$breaks,y = c(h$counts,NA))
ggplot() +
geom_histogram(data = d,aes(x = x),breaks = seq(-4,4,by=.5),
color = "red",fill = "transparent") +
geom_step(data = d1,aes(x = x,y = y),stat = "identity")
Yet another one. Use ggplot_build to build a plot object of the histogram for rendering. From this object x and y values are extracted, to be used for geom_step. Use by to offset x values.
by <- 0.5
p1 <- ggplot(data = d, aes(x = x)) +
geom_histogram(breaks = seq(from = -4, to = 4, by = by),
color = "red", fill = "transparent")
df <- ggplot_build(p1)$data[[1]][ , c("x", "y")]
p1 +
geom_step(data = df, aes(x = x - by/2, y = y))
Edit following comment from #Vadim Khotilovich (Thanks!)
The xmin from the plot object can be used instead (-> no need for offset adjustment)
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y")]
p1 +
geom_step(data = df, aes(x = xmin, y = y))
An alternative, also less than ideal:
qplot(x, data=d, geom="histogram", breaks=seq(-4,4,by=.5), color=I("red"), fill = I("transparent")) +
stat_summary(aes(x=round(x * 2 - .5) / 2, y=1), fun.y=length, geom="step")
Missing some bins that you can probably add back if you mess around a bit. Only (somewhat meaningless) advantage is it is more in ggplot than #Joran's answer, though even that is debatable.
I answer my own comment earlier today: here is a modified version of #RosenMatev's answer updated for the v2 (ggplot2_2.0.0) using ggproto:
GeomStepHist <- ggproto("GeomStepHist", GeomPath,
required_aes = c("x"),
draw_panel = function(data, panel_scales, coord, direction) {
data <- as.data.frame(data)[order(data$x), ]
n <- nrow(data)
i <- rep(1:n, each=2)
newdata <- rbind(
transform(data[1, ], x=x - width/2, y=0),
transform(data[i, ], x=c(rbind(data$x-data$width/2, data$x+data$width/2))),
transform(data[n, ], x=x + width/2, y=0)
)
rownames(newdata) <- NULL
GeomPath$draw_panel(newdata, panel_scales, coord)
}
)
geom_step_hist <- function(mapping = NULL, data = NULL, stat = "bin",
direction = "hv", position = "stack", na.rm = FALSE,
show.legend = NA, inherit.aes = TRUE, ...) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomStepHist,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
direction = direction,
na.rm = na.rm,
...
)
)
}
TLDR: use geom_step(..., direction = "mid")
This has become much easier since Daniel Mastropietro and Dewey Dunnington implemented the "mid" as an additional option for the direction argument of geom_step for ggplot2 v3.3.0:
library(ggplot2)
set.seed(1)
d <- data.frame(x = rnorm(1000))
ggplot(d, aes(x)) +
geom_histogram(breaks = seq(-4, 4, by=.5), color="red", fill = "transparent") +
geom_step(stat="bin", breaks=seq(-4, 4, by=.5), color = "black", direction = "mid")
Below, for reference, the code from the question formatted like above answer:
ggplot(d, aes(x)) +
geom_histogram(breaks = seq(-4, 4, by=.5), color = "red", fill = "transparent") +
geom_step(stat="bin", breaks = seq(-4, 4, by=.5), color = "black", direction = "vh")
Created on 2020-09-02 by the reprex package (v0.3.0)
a simple way to do something similar to #Rosen Matev (that does not work with ggplot2_2.0.0 as mentioned by #julou), I would just
1) calculate manually the value of the bins (using a small function as shown below)
2) use geom_step()
Hope this helps !
geom_step_hist<- function(d,binw){
dd=NULL
bin=min(d$y) # this enables having a first value that is = 0 (to have the left vertical bar of the plot when using geom_step)
max=max(d$y)+binw*2 # this enables having a last value that is = 0 (to have the right vertical bar of the plot when using geom_step)
xx=NULL
yy=NULL
while(bin<=max){
n=length(temp$y[which(temp$y<bin & temp$y>=(bin-binw))])
yy=c(yy,n)
xx=c(xx,bin-binw)
bin=bin+binw
rm(n)
}
dd=data.frame(xx,yy)
return(dd)
}
hist=ggplot(dd,aes(x=xx,y=yy))+
geom_step()