Context: I have a function that takes an object created with stats::lm() as its main argument. The goal of this function is to make ggplots from this lm object alone. Warning: the variables used in the model are NOT arguments of the function, i.e. if the model is lmobj <- lm(y ~ x, data = df), then the function only takes lmobj as an argument. This makes it different from questions like this one. Moreover, I am not looking for "ggplot only" solutions that take the raw data and compute the regression line and scatterplot (e.g. ggplot2::geom_smooth(method = "lm")).
Problem: ggplot() geom functions have x and y arguments that require unquoted variable names (see reference manual); how can I recover these from lmobj?
Expected output:
library(ggplot2)
lmobj <- lm(Petal.Width ~ Petal.Length, data = iris)
myfun <- function(.lm) {
# make a scatterplot with .lm using ggplot
ggplot(data = .lm[["model"]], aes(x = Petal.Width, y = Petal.Length)) +
geom_point()
}
myfun(lmobj)
Trials and errors
I tried to grab an unquoted variable name from lmobj using cat():
> cat(names(lmobj[["model"]][2]))
Petal.Length
But it creates an error:
> myfuntest <- function(.lm) {
+ # make a scatterplot with .lm using ggplot
+ ggplot(data = .lm[["model"]], aes(x = cat(names(.lm[["model"]][2])),
+ y = cat(names(.lm[["model"]][1])))) +
+ geom_point()
+ }
> myfuntest(lmobj)
Petal.LengthPetal.WidthPetal.LengthPetal.WidthError: geom_point requires the following missing aesthetics: x and y
The following works:
myfun <- function(model) {
coefs <- names(model$model)
ggplot(data = model$model) +
aes(x = !! rlang::sym(coefs[1L]), y = !! rlang::sym(coefs[2L])) +
geom_point()
}
The relevant point here is that aes uses rlang's tidy evaluation and as such requires the column names to be injected via !! as names (symbols).
One way to do it is to first build the arguments of aes as symbols and then call aes by wrapping it in do.call:
myfun <- function(.lm) {
ggplot(data = .lm[["model"]],
do.call(aes, list(x = sym(names(.lm[["model"]])[2]),
y = sym(names(.lm[["model"]])[1])))) +
geom_point()
}
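For what it's worth, here is a minimal sketch of the same idea using the .data pronoun instead of injected symbols. The helper name plot_lm_scatter is hypothetical, and the sketch assumes a model with a single untransformed predictor, so that the first column of the model frame is the response and the second is the predictor.

```r
library(ggplot2)

# Hypothetical helper: scatterplot of response vs. predictor,
# built only from the fitted lm object (assumes one untransformed predictor).
plot_lm_scatter <- function(.lm) {
  mf  <- stats::model.frame(.lm)  # the data frame stored in .lm[["model"]]
  nms <- names(mf)                # response first, then the predictor
  ggplot(mf, aes(x = .data[[nms[2]]], y = .data[[nms[1]]])) +
    geom_point() +
    labs(x = nms[2], y = nms[1])  # readable axis titles
}

lmobj <- lm(Petal.Width ~ Petal.Length, data = iris)
p <- plot_lm_scatter(lmobj)
p
```

This puts the predictor on x and the response on y; swap the two indices if you want the orientation used in the question's expected output.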
I would like to create a function that takes a column name and creates a plot based on that. For example, I want to be able to plot the Petal.Length column of the iris dataset against other variables. The way to do it without indirection is
ggplot(data = iris) + geom_point(aes(x = Petal.Width, y = Petal.Length))
This is the plot I would like to replicate through indirection, but none of the following attempts work. These two return similar undesired plots:
colname = "Petal.Width"
ggplot(data = iris) + geom_point(aes(x = colname, y = Petal.Length))
ggplot(data = iris) + geom_point(aes(x = {{colname}}, y = Petal.Length))
The following attempt does not work either; it returns an error:
ggplot(data = iris) + geom_point(aes(x = !!!rlang::syms(colname), y = Petal.Length))
#> Warning in geom_point(aes(x = !!!rlang::syms(colname_x), y = Petal.Length)):
#> Ignoring unknown aesthetics:
#> Error in `geom_point()`:
#> ! Problem while setting up geom.
#> ℹ Error occurred in the 1st layer.
#> Caused by error in `compute_geom_1()`:
#> ! `geom_point()` requires the following missing aesthetics: x
Any hint on how we could do this? The idea is to have a function that is able to plot that kind of chart just by giving a string corresponding to one x variable that appears in the dataset.
Using the .data pronoun and by wrapping inside aes() you could do:
library(ggplot2)
colname <- "Petal.Width"
ggplot(data = iris) +
geom_point(aes(x = .data[[colname]], y = Petal.Length))
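Wrapped in a function, as the question asks, this could look roughly like the sketch below. The name plot_vs_petal_length and the input check are made up for illustration; only the .data[[colname]] part comes from the answer above.

```r
library(ggplot2)

# Hypothetical wrapper: plot Petal.Length against a column given as a string.
plot_vs_petal_length <- function(colname, data = iris) {
  # Fail early if the string does not name exactly one column of the data.
  stopifnot(is.character(colname), length(colname) == 1,
            colname %in% names(data))
  ggplot(data) +
    geom_point(aes(x = .data[[colname]], y = Petal.Length)) +
    labs(x = colname)  # use the column name as the axis title
}

p <- plot_vs_petal_length("Petal.Width")
p
```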
Your plot #2 is invariant in x because it takes "Petal.Width" as a literal value (as in data.frame(x="Petal.Width")), obviously not what you intend.
There are a few ways to work with programmatic variables:
We can use the .data pronoun in ggplot, as in
var1 <- "mpg"
var2 <- "disp"
ggplot(mtcars, aes(x = .data[[var1]], y = .data[[var2]])) +
geom_point()
We can use rlang::quo and !! for interactive work:
x <- rlang::quo(mpg)
y <- rlang::quo(disp)
ggplot(mtcars, aes(x = !!x, y = !!y)) +
geom_point()
If in a function, we can enquo (and !!) them:
fun <- function(data, x, y) {
x <- rlang::enquo(x)
y <- rlang::enquo(y)
ggplot(data, aes(!!x, !!y)) +
geom_point()
}
fun(mtcars, mpg, disp)
Another (clumsier) approach with base R:
colname = "Petal.Width"
ggplot(data = iris) +
geom_point(aes(x = eval(parse(text = colname)), y = Petal.Length)
)
I would like to use a variable of the data frame passed to the data parameter of ggplot() in another ggplot2 function in the same call.
For instance, in the following example I want to refer to the variable x of the data frame passed to the data parameter of ggplot() from within another function, scale_x_continuous, as in:
library(ggplot2)
set.seed(2017)
samp <- sample(x = 20, size = 1000, replace = TRUE)
ggplot(data = data.frame(x = samp), mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(x), max(x)))
And I get the error :
Error in seq(min(x)) : object 'x' not found
which I understand. Of course I can avoid the problem by doing :
df <- data.frame(x = samp)
ggplot(data = df, mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(df$x), max(df$x)))
but I don't want to be forced to define the object df outside the call to ggplot. I want to be able to directly refer to the variables in the dataframe I passed in data.
Thanks a lot
The scale_x_continuous function does not evaluate its parameters in the data environment. One reason for this is that each layer can have its own data source, so by the time you get to the scales it wouldn't be clear which data environment is the "correct" one any more.
You could write a helper function to initialize the plot with your default. For example
helper <- function(df, col) {
ggplot(data = df, mapping = aes(x = .data[[col]])) + # .data[[col]] replaces the deprecated aes_string()
scale_x_continuous(breaks = seq(min(df[[col]]), max(df[[col]])))
}
and then call
helper(data.frame(x = samp), "x") + geom_bar()
Or you could write a wrapper around just the scale part. For example
scale_x_custom <- function(x) {
scale_x_continuous(breaks = seq(min(x) , max(x)))
}
and then you can add your custom scale to your plot
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom(df$x)
Or, since you just want breaks at integer values, you can calculate the breaks from the default limits without needing to actually specify the data. For example
scale_x_custom <- function() {
scale_x_continuous(expand=expansion(0, .3),
breaks = function(x) {
seq(ceiling(min(x)), floor(max(x)))
})
}
ggplot(data = df, mapping = aes(x = x)) +
geom_bar() +
scale_x_custom()
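To see what the breaks function in that last variant does, you can call it by hand. The sketch below relies on the documented behaviour that a function supplied to breaks receives the scale limits as its input; int_breaks is just an illustrative name, and the limits passed in are invented values.

```r
# Same body as the anonymous function passed to `breaks` above.
int_breaks <- function(limits) {
  seq(ceiling(min(limits)), floor(max(limits)))
}

int_breaks(c(0.7, 20.3))  # the integers 1 through 20
int_breaks(c(2, 5))       # 2 3 4 5
```

Because the breaks are derived from the limits ggplot2 computes, no reference to the data frame is needed in the scale call.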
Another less than ideal alternative would be to use the . placeholder of the magrittr pipe in combination with {}.
Enclosing the ggplot call in curly brackets allows one to reference . multiple times.
library(magrittr) # provides %>%
data.frame(x = samp) %>%
{ggplot(data = ., mapping = aes(x = x)) + geom_bar() +
scale_x_continuous(breaks = seq(min(.$x), max(.$x)))}
I am teaching a statistics course where I'm trying to gently introduce my students to R syntax (specifically, ggplot). To do so, I have created wrapper functions for many basic commands. For example:
basic.plot.function = function(x,y, data=d){
p = ggplot(data, aes_string(x=x, y=y)) + geom_point() + geom_smooth()
print(p)
#dput(p) # this function isn't doing what I want it to
}
I want the function to output the plot (which is what print(p) does), but I also want it to write to the console the actual code used to create this. In other words, if the user types:
mydata = data.frame(x1 = runif(100), x2 = runif(100))
basic.plot.function("x1","x2", data=mydata)
I want it to output:
ggplot(mydata, aes_string(x="x1", y="x2")) + geom_point() + geom_smooth()
Any ideas how I can do that?
Solution
library(ggplot2)
basic.plot.function = function(x, y, data = d){
call <- paste0('ggplot(', deparse(substitute(data)), ', aes_string(x=',
deparse(substitute(x)), ', y=', deparse(substitute(y)),
')) + geom_point() + geom_smooth()')
p <- eval(parse(text = call))
print(p)
print(call)
}
Example
data("iris")
basic.plot.function('Sepal.Length', 'Sepal.Width', iris)
> basic.plot.function('Sepal.Length', 'Sepal.Width', iris)
`geom_smooth()` using method = 'loess'
[1] "ggplot(iris, aes_string(x=\"Sepal.Length\", y=\"Sepal.Width\")) + geom_point() + geom_smooth()"
Explanation
deparse(substitute(x)) converts an argument x into a string. You can use that to build a string of the function call, which you print alongside your ggplot object. You can use eval(parse()) to evaluate that string to make your ggplot object.
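A small base R sketch of what deparse(substitute()) returns for different kinds of arguments; show_arg is a hypothetical helper used only for this illustration.

```r
# Returns the code the caller typed for `x`, as a string, without evaluating it.
show_arg <- function(x) deparse(substitute(x))

show_arg(iris)            # "iris"
show_arg("Sepal.Length")  # "\"Sepal.Length\"" (the quotes are kept)
show_arg(a + b)           # "a + b" (a and b need not exist)
```

The second case is why the pasted call in the solution ends up with the column names already quoted.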
I would like to create a function that generates graphs using ggplot. For the sake of simplicity, the typical graph may be
ggplot(cars, aes(x=speed, y=dist)) + geom_point()
The function I would like to create is of the type
f <- function(DS, x, y) ggplot(DS, aes(x=x, y=y)) + geom_point()
This however won't work, since x and y are not strings. This problem has been noted in previous SO questions (e.g., this one), but without providing, in my view, a satisfactory answer. How would one modify the function above to make it work with arbitrary data frames?
One solution would be to pass x and y as string names of columns in data frame DS.
f <- function(DS, x, y) {
ggplot(DS, aes(x = .data[[x]], y = .data[[y]])) + geom_point() # .data[[ ]] replaces the deprecated aes_string()
}
And then call the function as:
f(cars, "speed", "dist")
However, it seems that you don't want that? Can you provide an example why you would need different functionality? Is it because you don't want to have the arguments in the same data frame?
I think code of the following type is possible, which only builds the aes component.
require(ggplot2)
DS <- data.frame(speed=rnorm(10), dist=rnorm(10))
f <- function(DS, x, y, geom, opts=NULL) {
aes <- eval(substitute(aes(x, y),
list(x = substitute(x), y = substitute(y))))
p <- ggplot(DS, aes) + geom + opts
}
p <- f(DS, speed, dist, geom_point())
p
However, it seems to be a complicated approach.
Another option is to use do.call. Here is a one-line copy-paste from working code:
gg <- gg + geom_rect( do.call(aes, args=list(xmin=xValues-0.5, xmax=xValues+0.5, ymin=yValues, ymax=rep(Inf, length(yValues))) ), alpha=0.2, fill=colors )
One approach that I can think of is using match.call() to reach the variable names contained by the parameters/arguments passed to the custom plotting function and then use eval() on them. In this way you avoid passing them as quoted to your custom function, if you do not like that.
library(ggplot2)
fun <- function(df, x, y) {
arg <- match.call()
ggplot(df, aes(x = eval(arg$x), y = eval(arg$y))) + geom_point()
}
fun(mpg, cty, hwy) # no need to pass the variables (column names) as quoted / as strings
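For comparison, a sketch of the same function written with the embrace operator {{ }}. This assumes ggplot2 3.0.0 or later together with an rlang version that supports {{ }}; it is an alternative to the match.call() approach, not something the answers above used.

```r
library(ggplot2)

# {{ x }} forwards the unquoted column name the caller typed into aes().
f <- function(DS, x, y) {
  ggplot(DS, aes(x = {{ x }}, y = {{ y }})) + geom_point()
}

p <- f(cars, speed, dist)  # column names passed unquoted
p
```

Unlike eval(arg$x), the axis titles here come out as the column names without extra work.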