Using ... function argument as input to another function - r

I would like to use ... to pass arguments into ggplot in a different function. For example:
dat <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3))
f <- function(dat) {
ylimits = c(min(dat$x, dat$y), max(dat$x, dat$y))
g(dat, ylim = ylimits)
}
g <- function(dat, ...) {
args <- eval(substitute(alist(...)))
ggplot(dat, aes(x = x, y = y)) + geom_point() + coord_cartesian(ylim = args[['ylim']])
}
f(dat)
I tried using eval(args[['ylim']]), various combinations of quote/deparse/substitute but I haven't been able to get it to evaluate properly.

The environment of the calling function isn't passed along with the captured expression. So if you save the unevaluated expression and then try to evaluate it in g, it won't be able to find ylimits, which exists only in f's environment.
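The difference is easy to see in plain base R. In this minimal sketch, f and g are stripped-down stand-ins for the functions above, with no ggplot2 involved. alist() keeps only the bare symbol ylimits. list(...) instead forces each promise in the environment it came from and returns the value.

```r
# f creates ylimits locally and passes it through ... to g
f <- function() {
  ylimits <- c(1, 3)
  g(ylim = ylimits)
}

g <- function(...) {
  # Unevaluated: just the symbol `ylimits`, with no record of f's environment
  captured <- eval(substitute(alist(...)))
  # Evaluated: each ... promise is forced in the environment it came from (f's)
  forced <- list(...)
  list(captured = captured, forced = forced)
}

res <- f()
res$captured$ylim   # the symbol ylimits
res$forced$ylim     # 1 3
# eval(res$captured$ylim) at top level fails unless a global ylimits happens to exist
```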
One option is the lazyeval package, but it is being deprecated in favor of rlang. rlang's dots_list() evaluates the dots in the caller's environment and will do the trick nicely:
library(ggplot2)
dat <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3))
f <- function(dat) {
ylimits = c(min(dat$x, dat$y), max(dat$x, dat$y))
g(dat, ylim = ylimits)
}
g <- function(dat, ...) {
args <- rlang::dots_list(...)
# dots_list() has already evaluated ylim in f's environment, so no eval() is needed
ggplot(dat, aes(x = x, y = y)) + geom_point() + coord_cartesian(ylim = args[['ylim']])
}
f(dat)
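Because the dots are evaluated lazily in the caller's environment anyway, it may be simpler to forward ... unchanged, e.g. coord_cartesian(...) inside g. The base-R sketch below shows that forwarding pattern. h() is a hypothetical stand-in for coord_cartesian().

```r
# Hypothetical stand-in for coord_cartesian(): just returns its ylim argument
h <- function(ylim = NULL) ylim

# g forwards whatever arrived in ... straight to h
g <- function(dat, ...) h(...)

f <- function(dat) {
  ylimits <- range(dat$x, dat$y)   # same as c(min(...), max(...))
  g(dat, ylim = ylimits)
}

dat <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3))
f(dat)   # 1 3
```

The trade-off is that every argument in ... must be one the receiving function accepts. An extra argument would fail with an "unused argument" error, which is when picking out args[['ylim']] explicitly is the better fit.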


Plot a discontinuous function in R without connecting a "jump"

I'd like to plot a discontinuous function without connecting a jump. For example, in the following plot, I'd like to delete the line connecting (0.5, 0.5) and (0.5, 1.5).
f <- function(x){
(x < .5) * (x) + (x >= .5) * (x + 1)
}
ggplot()+
geom_function(fun = f)
Edit: I'm looking for a solution that works even if the discontinuity is not at a round number, say pi/10.
You could write a little wrapper function which finds discontinuities in the given function and plots them as separate groups:
plot_fun <- function(fun, from = 0, to = 1, by = 0.001) {
x <- seq(from, to, by)
groups <- cut(x, c(-Inf, x[which(abs(diff(fun(x))) > 0.1)], Inf))
df <- data.frame(x, groups, y = fun(x))
ggplot(df, aes(x, y, group = groups)) +
geom_line()
}
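The grouping step can be checked without plotting. The sketch below uses find_groups(), a hypothetical helper that repeats plot_fun's x/y/cut logic and returns the data frame. It is applied to a jump at pi/10, as asked in the edit.

```r
# Hypothetical helper: the grouping step of plot_fun, without the plotting
find_groups <- function(fun, from = 0, to = 1, by = 0.001, tol = 0.1) {
  x <- seq(from, to, by)
  y <- fun(x)
  # Indices i where y changes by more than tol between x[i] and x[i + 1]
  jumps <- which(abs(diff(y)) > tol)
  # Intervals are right-closed, so each group ends at the last point before a jump
  groups <- cut(x, c(-Inf, x[jumps], Inf))
  data.frame(x, y, groups)
}

# Discontinuity at an irrational point
f_pi <- function(x) (x < pi / 10) * x + (x >= pi / 10) * (x + 1)
df <- find_groups(f_pi)
nlevels(df$groups)   # 2
```

One design caveat: tol has to exceed the largest change between neighbouring samples on the continuous pieces (roughly slope × by). Otherwise a steep but continuous stretch, e.g. 200 * x with by = 0.001, would be split at every point.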
This allows:
plot_fun(f)
plot_fun(floor, 0, 10)
This answer is based on Allan Cameron's answer, but depicts the jump using open and closed circles. Whether the function is right or left continuous is controlled by an argument.
library("ggplot2")
plot_fun <- function(fun, from = 0, to = 1, by = 0.001, right_continuous = TRUE) {
x <- seq(from, to, by)
tol_vertical <- 0.1
y <- fun(x)
idx_break <- which(abs(diff(y)) > tol_vertical)
x_break <- x[idx_break]
y_break_l <- y[idx_break]
y_break_r <- y[idx_break + 1]
groups <- cut(x, c(-Inf, x_break, Inf))
df <- data.frame(x, groups, y = y)  # reuse y computed above
plot_ <- ggplot(df, aes(x, y, group = groups)) +
geom_line()
# add open and closed points showing jump
dataf_l <- data.frame(x = x_break, y = y_break_l)
dataf_r <- data.frame(x = x_break, y = y_break_r)
shape_open_circle <- 1
# this is the default of shape, but might as well specify.
shape_closed_circle <- 19
shape_size <- 4
if (right_continuous) {
shape_l <- shape_open_circle
shape_r <- shape_closed_circle
} else {
shape_l <- shape_closed_circle
shape_r <- shape_open_circle
}
plot_ <- plot_ +
geom_point(data = dataf_l, aes(x = x, y = y), group = NA, shape = shape_l, size = shape_size) +
geom_point(data = dataf_r, aes(x = x, y = y), group = NA, shape = shape_r, size = shape_size)
return(plot_)
}
Here's the OP's original example:
f <- function(x){
(x < .5) * (x) + (x >= .5) * (x + 1)
}
plot_fun(f)
Here's Allan's additional example using floor, which shows multiple discontinuities:
plot_fun(floor, from = 0, to = 10)
And here's an example showing that the function does not need to be piecewise linear:
f_curved <- function(x) ifelse(x > 0, yes = 0.5*(2-exp(-x)), no = 0)
plot_fun(f_curved, from = -1, to = 5)
You can wrap the function body in ifelse() so that it returns NA exactly at the jump:
f <- function(x){
ifelse(x==0.5,
NA,
(x < .5) * (x) + (x >= .5) * (x + 1))
}
ggplot()+
geom_function(fun = f)
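The ifelse() trick only removes the connecting segment if the function is evaluated exactly at the jump point. The sketch below assumes geom_function() samples 101 evenly spaced points over [0, 1]; n = 101 is its default, and the [0, 1] range for a plot with no data is an assumption. Under that assumption the grid contains 0.5 but not the pi/10 point from the question's edit:

```r
# Assumed evaluation grid: 101 evenly spaced points on [0, 1]
x <- seq(0, 1, length.out = 101)

any(x == 0.5)       # TRUE: the jump at 0.5 is hit exactly
any(x == pi / 10)   # FALSE: ifelse(x == pi/10, NA, ...) would never fire
```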

Labelling R2 and p value in ggplot?

I am trying to add the R2 and p-values of two parallel lm fits onto the same ggplot plot. Here is a reproducible example:
library(ggplot2)
set.seed(100)
dat <- data.frame(
x <- rnorm(100, 1),
y <- rnorm(100, 10),
lev <- gl(n = 2, k = 50, labels = letters[1:2])
)
mod1 <- lm(y~x, dat = dat[lev %in% "a", ])
r1 <- paste("R^2==", round(summary(mod1)[[9]], 3))
p1<- paste("p==", round(summary(mod1)[[4]][2, 4], 3), sep= "")
lab1 <- paste(r1, p1, sep =",")
mod2 <- lm(y~x, dat = dat[lev %in% "b", ])
r2 <- paste("R^2==", round(summary(mod2)[[9]], 3))
p2 <- paste("p==", round(summary(mod2)[[4]][2, 4], 3), sep= "")
lab2 <- paste(r2, p2, sep =",")
ggplot(dat, aes(x = x, y = y, col = lev)) + geom_jitter() + geom_smooth(method = "lm") + annotate("text", x = 2, y = 12, label = lab1, parse = T) + annotate("text", x = 10, y = 8, label = lab2, parse = T)
Here is the error it shows:
Error in parse(text = text[[i]]) : <text>:1:12: unexpected ','
1: R^2== 0.008,
Now the problem is that I can label either R2 or the p-value separately, but not both together. How can I put the two results on a single line in the figure?
BTW, is there a more efficient way of doing the same thing as my code? I have nine subplots that I want to put into one full plot, and I don't want to add them one by one.
++++++++++++++++++++++++++ Some update ++++++++++++++++++++++++++++++++++
Following G. Grothendieck's kind suggestion, I tried to wrap the most repetitive part of the code into a function so I could produce all the plots with a few lines. Now the problem is that no matter how I change the input variables, the output plots are basically the same, except for the axis labels. Can anyone explain why? The following is the code I used:
library(ggplot2)
library(ggpubr)
set.seed(100)
dat <- data.frame(
x = rnorm(100, 1),
y = rnorm(100, 10),
z = rnorm(100, 25),
lev = gl(n = 2, k = 50, labels = letters[1:2])
)
test <- function(dat, x, y){
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])
mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])
colors <- 1:2
p <- ggplot(dat, aes(x = x, y = y, col = lev)) +
geom_jitter() +
geom_smooth(method = "lm") +
annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2),
parse = TRUE, hjust = 0, color = colors) +
scale_color_manual(values = colors)
return(p)
}
ggarrange(test(dat, x, z), test(dat, y, z))
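The plots look alike because, inside test(), both lm(y ~ x, dat) and aes(x = x, y = y) look up columns literally named x and y in dat. The function arguments x and y are never used. One way around this, sketched below and not the only option, is to pass column names as strings and build the formula with reformulate(). On the ggplot2 side, the mapping would then be aes(x = .data[[xvar]], y = .data[[yvar]]). fit_label() is a hypothetical helper name. Looping over levels(dat$lev) with vapply() also avoids copying the block once per group, which helps with the nine-subplot case.

```r
set.seed(100)
dat <- data.frame(
  x = rnorm(100, 1),
  y = rnorm(100, 10),
  z = rnorm(100, 25),
  lev = gl(n = 2, k = 50, labels = letters[1:2])
)

# Hypothetical helper: plotmath label for one level, with columns given as strings
fit_label <- function(dat, xvar, yvar, level) {
  form <- reformulate(xvar, response = yvar)   # e.g. z ~ x
  sub <- dat[dat$lev == level, ]               # rows for this level only
  sm <- summary(lm(form, data = sub))
  sprintf("%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}",
          level, sm$adj.r.squared, coef(sm)[2, 4])
}

# One label per level, however many levels there are
labs_xz <- vapply(levels(dat$lev), function(l) fit_label(dat, "x", "z", l), character(1))
labs_yz <- vapply(levels(dat$lev), function(l) fit_label(dat, "y", "z", l), character(1))
```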
There are several problems here:
x, y and lev are arguments to data.frame so they must be specified using = rather than <-
make use of the subset= argument in lm
use sprintf instead of paste to simplify the specification of labels
label the text strings a and b and make them the same color as the corresponding lines to identify which is which
the plotmath syntax of the labels needs to be corrected. See fmt below.
it would be clearer to use component names and accessor functions of the summary objects where available
use TRUE rather than T because the latter can be overridden if there is a variable called T but TRUE can never be overridden.
use hjust=0 and adjust the x= and y= in annotate to align the two text strings
combine the annotate statements
place the individual terms of the ggplot statement on separate lines for improved readability
This gives:
library(ggplot2)
set.seed(100)
dat <- data.frame(
x = rnorm(100, 1),
y = rnorm(100, 10),
lev = gl(n = 2, k = 50, labels = letters[1:2])
)
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])
mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])
colors <- 1:2
ggplot(dat, aes(x = x, y = y, col = lev)) +
geom_jitter() +
geom_smooth(method = "lm") +
annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2),
parse = TRUE, hjust = 0, color = colors) +
scale_color_manual(values = colors)
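The parse error in the question can be reproduced in base R, since annotate(..., parse = TRUE) has to turn each label into an R expression. In the question's label, two plotmath equations are glued together with a bare comma. The fmt label joins them with * and ~ and quotes the comma as a string:

```r
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"

# The question's label: two equations glued with a bare comma
bad <- paste("R^2== 0.008", "p==0.123", sep = ",")
# The answer's label: the comma is a quoted string
good <- sprintf(fmt, "a", 0.008, 0.123)

bad_result <- tryCatch(parse(text = bad), error = function(e) e)
inherits(bad_result, "error")       # TRUE: unexpected ','
good                                # "a: Adj ~ R^2 == 0.008 * ',' ~ {p == 0.123}"
is.expression(parse(text = good))   # TRUE
```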
Unless I'm misunderstanding your question, the problem is the parse = T argument in your annotate calls. I don't think your strings need to be parsed. Try parse = FALSE instead, or just drop the argument, since the default is FALSE.

ggplot2: Releveling factors for sequentially added layers

I am writing a plotting method for class "foo". I would like this plot function to take multiple foo objects and plot them on the same graph.
The Code
#parabola function
parabolas <- function(x, parm) {
y <- parm[1]*(x^2)+parm[2]*x+parm[3]
return(y)
}
#make foo object
make_foo <- function(a, b, c) {
x <- runif(100, 0 , 20)
y <- parabolas(x = x, parm = c(a,b,c)) + rnorm(100, mean = 100 ,sd = 100)
foo <- list(data = data.frame(x = x, y = y), parameters = c(a,b,c))
class(foo) <- "foo"
return(foo)
}
#plot function
plot.foo <- function(x,
...,
labels) {
a <- ggplot(NULL, aes(x = x, y = y))
foo.list <- list(x, ...)
#browser()
#build plot
for(i in 1:length(foo.list)){
foo.obj <- foo.list[[i]]
foo.obj$data$lab <- factor(rep(labels[i], nrow(foo.obj$data)), levels = labels)
a <- a + geom_point(data = foo.obj$data, size = 5, alpha = .7, aes(color = lab))
a <- a + stat_function(data = foo.obj$data,
fun = parabolas,
args = list(parm = foo.obj$parameters), size = 1.2)
}
return(a)
}
The Problem
ggplot relevels the factor lab in alphabetical order of its labels. I do not know how to choose the level order for lab when adding these layers sequentially. I would like the first element of labels to correspond to the first foo object plotted, the second element to the second foo object, and so on.
foo1 <- make_foo(2, 10, 3)
foo2 <- make_foo(-6, -3, 2000)
plot(foo1, foo2, labels = c("obj1","obj2"))
#label for foo1 is "obj1" and label for foo2 is "obj2"
plot(foo1, foo2, labels = c("obj3","obj2"))
#label for foo1 should be "obj3" and label for foo2 should be "obj2"
The motivation
The reason I structure the plot function like this as opposed to binding the data frames together and assigning the correct factor levels to lab is because in that particular case, facet_wrap and stat_function do not work well together. After applying multiple stat_function and using facet_wrap together, all curves will appear in each panel. This thread illustrates a similar problem.
Because I have these different layers limited to different data sets, facet_wrap will correctly facet each stat_function plot according to the data/parameters used to draw it.
plot(foo1, foo2, labels = c("z","a")) + facet_wrap(~lab, scales = "free")
#Shows facet_wrap works as intended but the labels for foo1 and foo2 are
#still not in the intended order
You can manually override the order of the color scale by setting the limits. Here is how:
plot.foo <- function(x,
...,
labels) {
a <- ggplot(NULL, aes(x = x, y = y))
foo.list <- list(x, ...)
#browser()
#build plot
for(i in 1:length(foo.list)){
foo.obj <- foo.list[[i]]
foo.obj$data$lab <- factor(rep(labels[i], nrow(foo.obj$data)), levels = labels)
a <- a + geom_point(data = foo.obj$data, size = 5, alpha = .7, aes(color = lab))
a <- a + stat_function(data = foo.obj$data,
fun = parabolas,
args = list(parm = foo.obj$parameters), size = 1.2)
}
### added line:
a <- a + scale_color_discrete(limits = labels)
###
return(a)
}
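Setting limits pins the order explicitly instead of relying on how levels from separate layers are merged. R's own factor() default, shown below, is also alphabetical, which is a reminder of why explicit ordering matters. This sketch does not reproduce ggplot2's scale-training code; it only shows the base-R default. lab_order is an illustrative name.

```r
lab_order <- c("obj3", "obj2")

levels(factor(lab_order))                      # "obj2" "obj3": sorted by default
levels(factor(lab_order, levels = lab_order))  # "obj3" "obj2": order as supplied
```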

Store `ggplot` call in a `data.frame` (or alternative) and then evaluate it

I'd like to store ggplot calls in a data.frame (or alternative) and evaluate it later.
An example:
define <- function(df, call) {
df[nrow(df) + 1, ] <- call
df
}
plot <- function(df, index) {
eval(parse(text = df$plots[index]))
}
df <- data.frame(plots = character(0), stringsAsFactors = FALSE)
df <- define(df, "ggplot() + geom_segment(aes(x = 1, y = 1, xend = 2, yend = 2))")
df <- define(df, "ggplot() + geom_segment(aes(x = 1, y = 2, xend = 2, yend = 1))")
plot(df, 1)
plot(df, 2)
This sort of works, but there are some issues:
I'd like to define the calls without the quotation marks, like define(df, ggplot() + geom_..).
I'd rather store the calls as call objects.
How can I achieve this?
I would use a list for this case. data.frames are a bit tricky when storing unevaluated content (because underneath they contain more information than just the call). Lists are more versatile (and easier to use in this context):
#saves the unevaluated call
define <- function(mylist, call) {
mylist[[length(mylist) + 1]] <- substitute(call)
mylist
}
#evaluates the call
ploteval <- function(mylist, index) {
eval(mylist[[index]])
}
mylist <- list()
mylist <- define(mylist, ggplot() + geom_segment(aes(x = 1, y = 1, xend = 2, yend = 2)))
mylist <- define(mylist, ggplot() + geom_segment(aes(x = 1, y = 2, xend = 2, yend = 1)))
ploteval(mylist, 1)
ploteval(mylist, 2)
This will work.
As a short explanation, substitute will store the unevaluated call, which will then be evaluated with ploteval. It is also not a good idea to overwrite plot so I gave it a new name ploteval.
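Note that substitute() stores only the expression, not the environment it came from. ploteval() will therefore only find variables visible from its own evaluation environment, typically the global environment. The sketch below also records the caller's environment via parent.frame(). define_with_env() and ploteval_env() are hypothetical names, and the stored call is plain arithmetic so the example runs without ggplot2.

```r
# Store the unevaluated expression together with the caller's environment
define_with_env <- function(mylist, call) {
  mylist[[length(mylist) + 1]] <- list(expr = substitute(call), env = parent.frame())
  mylist
}

# Evaluate the stored expression in the environment it was defined in
ploteval_env <- function(mylist, index) {
  entry <- mylist[[index]]
  eval(entry$expr, entry$env)
}

make_list <- function() {
  k <- 5                            # exists only inside make_list()
  define_with_env(list(), k * 2)
}

stored <- make_list()
ploteval_env(stored, 1)             # 10, even though k is not global
```

rlang's quosures (rlang::enquo() paired with rlang::eval_tidy()) bundle the same two pieces, expression and environment, if you are already using rlang.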
We could also capture the expression with rlang::enexpr() and store it as a string:
define <- function(df, call) {
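# expr_text() keeps the whole expression; as_label() is meant for short labels and may truncate long calls
df[nrow(df) + 1, ] <- rlang::expr_text(rlang::enexpr(call))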
df
}
plot <- function(df, index) {
eval(parse(text = df[["plots"]][index]))
}
df <- data.frame(plots = character(0), stringsAsFactors = FALSE)
df <- define(df, ggplot() + geom_segment(aes(x = 1, y = 1, xend = 2, yend = 2)))
df <- define(df, ggplot() + geom_segment(aes(x = 1, y = 2, xend = 2, yend = 1)))
plot(df, 1)
plot(df, 2)
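Storing the expression as text means it has to survive a round trip through deparse() and parse(). The base-R sketch below walks through that round trip. It uses a small arithmetic call instead of a ggplot call so it runs on its own.

```r
expr <- quote(sum(1:10) + max(c(3, 7, 5)))

# Expression -> single string (collapse, since deparse() can split long calls over lines)
txt <- paste(deparse(expr), collapse = " ")
txt                  # "sum(1:10) + max(c(3, 7, 5))"

# String -> expression -> value
back <- parse(text = txt)[[1]]
eval(back)           # 62
```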

How to write a ggplot '+'-pipeable function that can refer to the input plot

I'm trying to write a function that can be called using the '+'-based ggplot2 syntax.
myplot + myfunction
Specifically, the function I'm writing symmetrizes the y-axis about zero, so it needs to determine the y-axis range for the input plot.
So let,
ylim_sym <- function(p){
get_y_range <- function(p){
ggplot2::ggplot_build(p)$layout$panel_ranges[[1]]$y.range
}
max_offset <- max(abs(get_y_range(p)))
p + ylim(- max_offset, max_offset)
}
With this function, the following works:
qplot(x = 1:10, y = exp(rnorm(10))) %>% ylim_sym()
But this doesn't work, because %>% binds more tightly than +, so only geom_abline(slope = 0) gets piped into ylim_sym():
qplot(x = 1:10, y = exp(rnorm(10))) +
geom_abline(slope = 0) %>%
ylim_sym()
(I could write the latter (all_my_ggplot_pipeline) %>% ylim_sym() but it's pretty ugly syntax).
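In R, any %op% operator binds more tightly than +. The parse tree shows this without loading magrittr or ggplot2, because quote() does not need the functions to exist. `%pipe%` below is a hypothetical pipe-like operator used only to show the effect on values.

```r
# How the problematic line is parsed
e <- quote(p + geom_abline(slope = 0) %>% ylim_sym())
e[[1]]        # `+`: the top-level call is the addition
e[[3]][[1]]   # `%>%`: only geom_abline(...) is piped

# Hypothetical pipe-like operator: applies the function on the right to the left value
`%pipe%` <- function(lhs, fun) fun(lhs)

10 + 16 %pipe% sqrt     # 14: parsed as 10 + (16 %pipe% sqrt)
(10 + 16) %pipe% sqrt   # sqrt(26): parentheses change the grouping
```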
Ideally, I'd like to be able to write ylim_sym such that it can be piped like so,
qplot(x = 1:10, y = exp(rnorm(10))) + ylim_sym()
but I can't work out how to access the plot on the LHS of + within ylim_sym
Any ideas?
I was able to solve it by doing the following.
StatSymYLim <- ggproto(
"StatSymYLim", Stat,
compute_group = function(data, scales) {
out <- data.frame(
x = median(data$x),
y = c(-1, 1) * max(abs(data$y))
)
out
},
required_aes = c("x", "y")
)
ylim_sym <- function(...){
geom_blank(..., stat = StatSymYLim)
}
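compute_group() returns just two points, at y = -max|y| and y = +max|y|. geom_blank() draws nothing but still trains the scales, so the y range ends up symmetric about zero. The arithmetic in isolation is shown below; sym_limits() is a hypothetical name.

```r
# Hypothetical helper mirroring the y computation in StatSymYLim's compute_group
sym_limits <- function(y) c(-1, 1) * max(abs(y))

sym_limits(c(-2, 0.5, 3))   # -3  3
```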
Then the following works as required:
qplot(x = 1:10, y = exp(rnorm(10))) +
geom_abline(slope = 0) +
ylim_sym()
My understanding of ggplot2 internals is pretty shaky to be fair, so this might be a naive solution.
Note: your function needs an update, because the structure of the built plot object has changed: layout$panel_ranges is now layout$panel_params.
Using package ggfun this would work:
# devtools::install_github("moodymudskipper/ggfun")
library(ggfun)
ylim_sym <- function(p){
get_y_range <- function(p){
ggplot2::ggplot_build(p)$layout$panel_params[[1]]$y.range
}
max_offset <- max(abs(get_y_range(p)))
p + ylim(- max_offset, max_offset)
}
qplot(x = 1:10, y = exp(rnorm(10))) +
geom_abline(slope = 0) +
ylim_sym
