stacking geom_ribbon - r

I am trying to use geom_ribbon to mimic the behavior of geom_area, but without success. Do you have any hint as to why the following does not work?
I relied on Hadley's statement from the ggplot2 geom_area web page:
"An area plot is a special case of geom_ribbon, where the minimum of the range is fixed to 0, and the position adjustment defaults to position_stacked."
library(ggplot2)
test <- expand.grid(Param = LETTERS[1:3], x = 1:5)
test$y <- test$x
# Ok
p <- ggplot(test)
p <- p + geom_area(aes(x = x, y = y, group = Param, fill = Param), alpha = 0.3)
p
# not ok - initial idea
p <- ggplot(test)
p <- p + geom_ribbon(aes(x = x, ymin = 0, ymax = y, group = Param, fill = Param), alpha = 0.3, position = position_stack())
p
Further, how can I look at the source code of functions written the way the geom_XXX functions are?
My usual approach gives the following, which is not very useful:
> geom_ribbon
function (mapping = NULL, data = NULL, stat = "identity", position = "identity",
    na.rm = FALSE, ...)
GeomRibbon$new(mapping = mapping, data = data, stat = stat, position = position,
    na.rm = na.rm, ...)
Thanks for your help
Regards
Pascal

You just didn't map a variable to y in your geom_ribbon call. Adding y = y makes it work for me. In general, geom_ribbon doesn't require a y aesthetic, but I believe it does when stacking. I presume there's a well-thought-out reason for that, but you never know...
Also, all the source code for ggplot2 is on GitHub.
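To make the quoted statement about stacking concrete, here is a minimal sketch that computes the stacked ribbon bounds by hand in base R, so that geom_ribbon can be drawn with its default position = "identity". The column names ymin_stacked and ymax_stacked are made up for this illustration, and the stacking order (A at the bottom) is an assumption, not necessarily the order geom_area uses.

```r
# Same toy data as in the question.
test <- expand.grid(Param = LETTERS[1:3], x = 1:5)
test$y <- test$x

# Within each x, the upper bound of a band is the running total of y,
# and the lower bound is the upper bound of the band beneath it.
test$ymax_stacked <- ave(test$y, test$x, FUN = cumsum)
test$ymin_stacked <- test$ymax_stacked - test$y

# Plot only if ggplot2 is available (illustrative use of the columns above).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(test) +
    geom_ribbon(aes(x = x, ymin = ymin_stacked, ymax = ymax_stacked,
                    group = Param, fill = Param), alpha = 0.3)
  print(p)
}
```

Because the bounds are precomputed, no position adjustment is involved, which sidesteps the question of which aesthetics the stacking code needs.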

Related

Shade parts of a ggplot based on a (changing) dummy variable

I want to shade areas in a ggplot but I don't want to manually tell geom_rect() where to stop and where to start. My data changes and I always want to shade several areas based on a condition.
Here for example with the condition "negative":
library("ggplot2")
set.seed(3)
plotdata <- data.frame(somevalue = rnorm(10), indicator = 0 , counter = 1:10)
plotdata[plotdata$somevalue < 0,]$indicator <- 1
plotdata
I can do that manually:
plotranges <- data.frame(from = c(1,4,9), to = c(2,4,9))
ggplot() +
  geom_line(data = plotdata, aes(x = counter, y = somevalue)) +
  geom_rect(data = plotranges, aes(xmin = from - 1, xmax = to, ymin = -Inf, ymax = Inf), alpha = 0.4)
But my problem is that, so to speak, the set.seed() argument changes and I want to still automatically generate the plot without specifying min and max values of the shading manually. Is there a way (maybe without geom_rect() but instead geom_bar()?) to plot shading based directly on my indicator variable?
Edit: here is my own best attempt; as you can see, it is not great:
ggplot(data = plotdata, aes(x = counter, y = somevalue)) + geom_line() +
  geom_bar(aes(y = indicator * max(somevalue)), stat = "identity")
You can use stat_summary() to calculate the extremes of runs of your indicator. In the code below data.table::rleid() groups the data by runs of indicators. In the summary layer, y doesn't really do anything, so we use it to store the resolution of your datapoints, which we then later use to offset the xmin/xmax parameters. The after_stat() function is used to access computed variables after the ranges have been computed.
library("ggplot2")
plotdata <- data.frame(somevalue = rnorm(10), counter = 1:10)
plotdata$indicator <- as.numeric(plotdata$somevalue < 0)
ggplot(plotdata, aes(counter, somevalue)) +
  stat_summary(
    geom = "rect",
    aes(group = data.table::rleid(indicator),
        xmin = after_stat(xmin - y * 0.5),
        xmax = after_stat(xmax + y * 0.5),
        alpha = I((indicator) * 0.4),
        y = resolution(counter)),
    fun.min = min, fun.max = max,
    orientation = "y", ymin = -Inf, ymax = Inf
  ) +
  geom_line()
Created on 2021-09-14 by the reprex package (v2.0.1)
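If you would rather keep the geom_rect() approach from the question, the ranges can also be derived from the indicator with base R's rle(). This is a minimal sketch, not part of the answer above; the helper name indicator_runs is hypothetical, and the toy indicator is a fixed vector chosen for illustration rather than the random data.

```r
# Return one row per run of indicator == 1, with the first and last x of the run.
indicator_runs <- function(x, indicator) {
  runs   <- rle(indicator)
  ends   <- cumsum(runs$lengths)          # last position of each run
  starts <- ends - runs$lengths + 1       # first position of each run
  keep   <- runs$values == 1              # only runs where the condition holds
  data.frame(from = x[starts[keep]], to = x[ends[keep]])
}

# Fixed toy indicator, so the result does not depend on the random seed.
indicator  <- c(1, 1, 0, 1, 0, 0, 0, 0, 1, 0)
plotranges <- indicator_runs(seq_along(indicator), indicator)
plotranges
```

The resulting data frame has the same from/to layout as the hand-written plotranges in the question, so it can be passed to geom_rect() unchanged, e.g. via indicator_runs(plotdata$counter, plotdata$indicator).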

geom_text / geom_label with the bquote function

My data:
dat <- data_frame(x = c(1,2,3,4,5,6), y = c(2,2,2,6,2,2))
I wish to display this expression beside the point (x=4, y=6):
expression <- bquote(paste(frac(a[z], b[z]), " = ", .(dat[which.max(dat$y),"y"] %>% as.numeric())))
But when I use this expression with ggplot:
ggplot() +
  geom_point(data = dat, aes(x = x, y = y)) +
  geom_label(data = dat[which.max(dat$y),], aes(x = x, y = y, label = expression))
I get this error message:
Error: Aesthetics must be either length 1 or the same as the data (1): label
Not related to your question, but it is generally better to define aesthetics in the ggplot() call so that subsequent layers reuse them. If needed, you can override them in a layer, as done below in geom_label.
You could use the following code (keeping your definitions of the data and the expression):
ggplot(data = dat, aes(x = x, y = y)) +
  geom_point() +
  geom_label(data = dat[4,], label = deparse(expression), parse = TRUE,
             hjust = 0, nudge_x = .1)
hjust and nudge_x are used to position the label relative to the point. One could argue for using nudge_y as well, to get the whole label inside the plot area.
Please let me know whether this is what you want.
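For readers wondering why deparse() together with parse = TRUE helps: a label has to be a plain vector such as a character string, and parse = TRUE asks geom_label() to turn that string back into a plotmath expression. The round trip can be sketched in base R. The names lab, lab_text and lab_back are mine (lab also avoids masking base::expression), and the value 6 is hard-coded as a stand-in for the maximum of dat$y.

```r
# Build the plotmath call; .(6) stands in for the maximum of dat$y.
lab <- bquote(paste(frac(a[z], b[z]), " = ", .(6)))

# deparse() may split long calls over several strings, so collapse them into one.
lab_text <- paste(deparse(lab), collapse = "")

# A length-one character vector is a valid label. parse() below does by hand
# what parse = TRUE asks the text layer to do: rebuild the call from the string.
lab_back <- parse(text = lab_text)[[1]]
```

Collapsing the deparse() output is a precaution: for longer expressions deparse() can return several strings, which would bring back the length mismatch from the original error.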

Adding tables to ggplot2 with facet_wrap in r

I want to add a table with some information that differs in each panel of the facet.
I'm using ggplot2 and facet_grid.
Say I want to add some descriptive statistics to each panel, and they are not necessarily the same.
These statistics are stored in a data frame I made for that purpose.
I found a few ways to add such tables to the graphs, but:
As far as I can tell, annotate() will give me the same table in every panel of the facet.
I would really like to use facet_wrap for its simplicity, rather than gridExtra.
library(ggplot2)  # provides the mpg data set used below
ggplot(data = mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  facet_wrap(~ cyl, scales = "free_y")
The position of the table is not that important to me, but I don't want it to overlap the graph.
My objective is a mixture of the two answers in this thread:
Adding table to ggplot with facets
The first answer (with annotate) won't work for me, since I want the table in each plot to be unique.
The second answer is better, but I don't want the table to overlap or hide details of the graph, and the lines/points are located in a different place in each panel, so I can't use it as is. I would like the table to be attached to the panel, as with annotate.
Try this:
library(ggplot2)
library(tibble)
library(gridExtra)
library(grid)
GeomCustom <- ggproto(
  "GeomCustom",
  Geom,
  setup_data = function(self, data, params) {
    data <- ggproto_parent(Geom, self)$setup_data(data, params)
    data
  },
  draw_group = function(data, panel_scales, coord) {
    vp <- grid::viewport(x = data$x, y = data$y)
    g <- grid::editGrob(data$grob[[1]], vp = vp)
    ggplot2:::ggname("geom_custom", g)
  },
  required_aes = c("grob", "x", "y")
)
geom_custom <- function(mapping = NULL,
                        data = NULL,
                        stat = "identity",
                        position = "identity",
                        na.rm = FALSE,
                        show.legend = NA,
                        inherit.aes = FALSE,
                        ...) {
  layer(
    geom = GeomCustom,
    mapping = mapping,
    data = data,
    stat = stat,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}
gl <- list(tableGrob(iris[1:2, 1:3]),
           tableGrob(iris[1:4, 1:3]),
           tableGrob(iris[1:3, 1:3]),
           tableGrob(iris[1:2, 1:2]))
dummy <- tibble(f = letters[1:4], grob = gl)
d <- tibble(x = rep(1:3, 4), f = rep(letters[1:4], each = 3))
ggplot(d, aes(x, x)) +
  facet_wrap(~f) +
  theme_bw() +
  geom_custom(data = dummy, aes(grob = grob), x = 0.5, y = 0.5)
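The list gl above holds arbitrary slices of iris. To show descriptive statistics per panel instead, one table per facet level can be built with split() and lapply(). The following is only a sketch under assumptions: it uses the built-in mtcars data with cyl as the facetting variable, and the statistics (count and mean mpg) and the names stats_by_cyl and gl_stats are chosen purely for illustration.

```r
# One small data frame of summary statistics per level of cyl.
stats_by_cyl <- lapply(split(mtcars, mtcars$cyl), function(d) {
  data.frame(n = nrow(d), mean_mpg = round(mean(d$mpg), 1))
})

# If gridExtra is installed, each data frame can be turned into a table grob,
# giving a list shaped like gl in the answer above.
if (requireNamespace("gridExtra", quietly = TRUE)) {
  gl_stats <- lapply(stats_by_cyl, gridExtra::tableGrob)
}
```

A tibble like dummy would then pair names(stats_by_cyl) with gl_stats, with the first column named after the facetting variable so that each grob lands in its own panel.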

Object not being found when using ggplot2 in R

While creating a shot chart in R, I've been using some open source stuff from Todd W. Schneider's BallR court design (https://github.com/toddwschneider/ballr/blob/master/plot_court.R)
along with another Stack Overflow post on how to create percentages within hexbins (How to replicate a scatterplot with a hexbin plot in R?).
Both sources have been really helpful for me.
When I run the following lines of code, I get a solid hexbin plot of percent made for shots for the different locations on the court:
ggplot(shots_df, aes(x = location_y - 25, y = location_x, z = made_flag)) +
  stat_summary_hex(fun = mean, alpha = 0.8, bins = 30) +
  scale_fill_gradientn(colors = my_colors(7), labels = percent_format(),
                       name = "Percent Made")
However, when I include the BallR court design code snippet, which is shown below:
ggplot(shots_df, aes(x = location_y - 25, y = location_x, z = made_flag)) +
  stat_summary_hex(fun = mean, alpha = 0.8, bins = 30) +
  scale_fill_gradientn(colors = my_colors(7), labels = percent_format(),
                       name = "Percent Made") +
  geom_path(data = court_points,
            aes(x = x, y = y, group = desc, linetype = dash),
            color = "#000004") +
  scale_linetype_manual(values = c("solid", "longdash"), guide = "none") +
  coord_fixed(ylim = c(0, 35), xlim = c(-25, 25)) +
  theme_court(base_size = 22)
I get the error: Error in eval(expr, envir, enclos) : object 'made_flag' not found, even though made_flag is definitely in the data frame shots_df and worked in the first version. I am lost on how to fix this problem.
I believe your problem lies in the geom_path() layer. Try this tweak:
geom_path(data = court_points, aes(x = x, y = y, z = NULL, group = desc, linetype = dash))
Because you set the z aesthetic in the top-level ggplot() call, geom_path() still inherits it even though that layer uses a different data source, and court_points has no made_flag column. You have to override it manually with z = NULL.
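An alternative to z = NULL, assuming the path layer needs none of the top-level aesthetics, is to switch inheritance off for that layer with inherit.aes = FALSE. The sketch below uses small made-up data frames (shots and court are hypothetical stand-ins for shots_df and court_points) and stat_summary_2d() in place of stat_summary_hex() so that it does not depend on the hexbin package.

```r
library(ggplot2)

# Hypothetical stand-ins for shots_df and court_points.
shots <- data.frame(x = c(1, 2, 3, 4), y = c(1, 3, 2, 4), made = c(0, 1, 1, 0))
court <- data.frame(cx = c(0, 5, 5, 0, 0), cy = c(0, 0, 5, 5, 0))

p <- ggplot(shots, aes(x = x, y = y, z = made)) +
  stat_summary_2d(fun = mean, bins = 2) +
  # inherit.aes = FALSE: this layer uses only the aesthetics given here,
  # so it never looks for `made` in `court`.
  geom_path(data = court, aes(x = cx, y = cy), inherit.aes = FALSE)
```

With inheritance off, every aesthetic the layer needs has to be listed in its own aes() call, which is why x and y are mapped again here.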

In R ggplot2, include stat_ecdf() endpoints (0,0) and (1,1)

I'm trying to use stat_ecdf() to plot cumulative successes as a function of a rank score created by a predictive model.
# libraries
require(ggplot2)
require(scales)
# fake data for reproducibility
set.seed(123)
n <- 200
df <- data.frame(model_score = rexp(n = n, rate = 1:n),
                 obs_set = sample(c("training", "validation"), n, replace = TRUE))
df$model_rank <- rank(df$model_score) / n
df$target_outcome <- rbinom(n, 1, 1 - df$model_rank)
# Plot Gain Chart using stat_ecdf()
ggplot(subset(df, target_outcome == 1), aes(x = model_rank)) +
  stat_ecdf(aes(colour = obs_set), size = 1) +
  scale_x_continuous(limits = c(0, 1), labels = percent, breaks = seq(0, 1, .1)) +
  xlab("Model Percentile") + ylab("Percent of Target Outcome") +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  geom_segment(aes(x = 0, y = 0, xend = 1, yend = 1),
               colour = "gray", linetype = "longdash", size = 1) +
  ggtitle("Gain Chart")
All I want to do is force the ECDF to start at (0,0) and end at (1,1) so that there are no gaps at the beginning or end of the curve. If possible, I'd like to do it within the syntax of ggplot2, but I'd settle for a clever workaround.
@Henrik this is NOT a duplicate of this question, because I have already defined my limits with scale_x_continuous() and scale_y_continuous(), and adding expand_limits() doesn't do anything. It is not the origin of the PLOT but the endpoints of the stat_ecdf() curve that need to be fixed.
Unfortunately, the definition of stat_ecdf gives no wiggle room here; it determines the endpoints internally.
There is a somewhat advanced solution. With the latest version of ggplot2 (devtools::install_github("hadley/ggplot2")), the extensibility is improved, to the point where it is possible to override this behavior, but not without some boilerplate.
stat_ecdf2 <- function(mapping = NULL, data = NULL, geom = "step",
                       position = "identity", n = NULL, show.legend = NA,
                       inherit.aes = TRUE, minval = NULL, maxval = NULL, ...) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatEcdf2,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    stat_params = list(n = n, minval = minval, maxval = maxval),
    params = list(...)
  )
}
StatEcdf2 <- ggproto("StatEcdf2", StatEcdf,
  calculate = function(data, scales, n = NULL, minval = NULL, maxval = NULL, ...) {
    df <- StatEcdf$calculate(data, scales, n, ...)
    if (!is.null(minval)) { df$x[1] <- minval }
    if (!is.null(maxval)) { df$x[length(df$x)] <- maxval }
    df
  }
)
Now stat_ecdf2 behaves the same as stat_ecdf, but with optional minval and maxval parameters. So this will do the trick:
ggplot(subset(df, target_outcome == 1), aes(x = model_rank)) +
  stat_ecdf2(aes(colour = obs_set), size = 1, minval = 0, maxval = 1) +
  scale_x_continuous(limits = c(0, 1), labels = percent, breaks = seq(0, 1, .1)) +
  xlab("Model Percentile") + ylab("Percent of Target Outcome") +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  geom_segment(aes(x = 0, y = 0, xend = 1, yend = 1),
               colour = "gray", linetype = "longdash", size = 1) +
  ggtitle("Gain Chart")
The big caveat here is that I don't know if the current extensibility model will be supported in the future; it has changed several times in the past, and the change to use "ggproto" is recent -- like July 15th 2015 recent.
As a plus, this gave me a chance to really dig into ggplot's internals, which is something that I've been meaning to do for a while.
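If depending on the extension mechanism feels too fragile, the "clever workaround" the question mentions could also be done outside ggplot2: compute the ECDF with stats::ecdf(), add the two endpoints by hand, and draw the result as a step line. This is a sketch only; the function name ecdf_with_endpoints is hypothetical, and it assumes all values lie within [minval, maxval].

```r
# Evaluate the ECDF at the sorted unique values of x and pad it with
# the points (minval, 0) and (maxval, 1).
ecdf_with_endpoints <- function(x, minval = 0, maxval = 1) {
  xs <- sort(unique(x))
  data.frame(
    x = c(minval, xs, maxval),
    y = c(0, stats::ecdf(x)(xs), 1)
  )
}

# Small fixed example instead of the simulated data.
scores <- c(0.2, 0.4, 0.4, 0.9)
ecdf_with_endpoints(scores)
```

Applied separately to each obs_set group, the resulting data frames could be bound together and passed to geom_step(aes(x = x, y = y, colour = obs_set)).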
