I am teaching a statistics course where I'm trying to gently introduce my students to R syntax (specifically, ggplot). To do so, I have created wrapper functions for many basic commands. For example:
basic.plot.function = function(x,y, data=d){
p = ggplot(data, aes_string(x=x, y=y)) + geom_point() + geom_smooth()
print(p)
#dput(p) # this function isn't doing what I want it to
}
I want the function to output the plot (which is what print(p) does), but I also want it to write to the console the actual code used to create this. In other words, if the user types:
mydata = data.frame(x1 = runif(100), x2 = runif(100))
basic.plot.function("x1","x2", data=mydata)
I want it to output:
ggplot(mydata, aes_string(x="x1", y="x2")) + geom_point() + geom_smooth()
Any ideas how I can do that?
Solution
library(ggplot2)
basic.plot.function = function(x, y, data = d){
call <- paste0('ggplot(', deparse(substitute(data)), ', aes_string(x=',
deparse(substitute(x)), ', y=', deparse(substitute(y)),
')) + geom_point() + geom_smooth()')
p <- eval(parse(text = call))
print(p)
print(call)
}
Example
data("iris")
basic.plot.function('Sepal.Length', 'Sepal.Width', iris)
> basic.plot.function('Sepal.Length', 'Sepal.Width', iris)
`geom_smooth()` using method = 'loess'
[1] "ggplot(iris, aes_string(x=\"Sepal.Length\", y=\"Sepal.Width\")) + geom_point() + geom_smooth()"
Explanation
deparse(substitute(x)) converts the expression supplied for the argument x into a string. You can use that to build a string of the full ggplot call, which you print alongside the plot. You can then use eval(parse(text = ...)) to evaluate that same string and create the ggplot object.
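As a minimal sketch of just the string-building step, without ggplot2 loaded: the helper name show_call() below is hypothetical and not part of the answer. It also uses cat() instead of print(), which writes the code without the [1] prefix and the escaped quotes seen in the example output.

```r
# Hypothetical helper: rebuild the ggplot call as text from the caller's arguments.
show_call <- function(x, y, data) {
  paste0("ggplot(", deparse(substitute(data)),
         ", aes_string(x=", deparse(substitute(x)),
         ", y=", deparse(substitute(y)),
         ")) + geom_point() + geom_smooth()")
}

mydata <- data.frame(x1 = runif(100), x2 = runif(100))
code <- show_call("x1", "x2", data = mydata)

# cat() prints the raw text, so the quotes appear as the user would type them.
cat(code, "\n")
```

The string in code can still be passed to eval(parse(text = code)) once ggplot2 is attached, as in the solution above.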
Related
I am trying to convert my use of ggplot2 in functions to tidy evaluation so as to avoid the deprecation warnings that are now emitted. In particular, I have used aes_string() extensively, and these calls need to be converted to aes(). I can handle cases where just the name of a column is passed as a character string. However, I have been unable to work out how to deal with the case where the string is a mathematical expression.
Here is a small reproducible example of the problem that I am trying to solve.
library(ggplot2)
set.seed(1)
dat <- data.frame(x=rnorm(100),y=rnorm(100))
xvar <- 'x+100'
yvar <- 'y'
#This works but uses the deprecated aes_string
ggplot(dat,aes_string(x=xvar,y=yvar))
#This works
ggplot(dat, ggplot2::aes(x=x+100, y=.data[[yvar]])) + geom_point()
#This does not work
ggplot(dat, aes(x={{xvar}}, y=.data[[yvar]])) + geom_point()
My question is what tidy evaluation techniques do I need to employ to use xvar to specify the x variable as is possible with aes_string()?
You could use eval(str2expression()):
library(ggplot2)
ggplot(dat, aes(x = eval(str2expression(xvar)), y = eval(str2expression(yvar)))) +
geom_point() +
labs(x = xvar, y = yvar)
Or using the analogous rlang functions:
library(rlang)
ggplot(dat, aes(x = eval_tidy(parse_expr(xvar)), y = eval_tidy(parse_expr(yvar)))) +
geom_point() +
labs(x = xvar, y = yvar)
Note you’ll want to manually set the axis labels using labs(); otherwise you’ll end up with e.g. "eval(str2expression(xvar))" for the x axis.
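A possible variation, offered as a sketch rather than a drop-in replacement: aes() supports rlang's !! injection, so the parsed expression can be spliced into the mapping instead of being evaluated inside it. Assuming ggplot2 3.0.0 or later and the rlang package, the default x label should then come from the expression itself rather than from the eval() call, so labs() becomes optional.

```r
library(ggplot2)

set.seed(1)
dat  <- data.frame(x = rnorm(100), y = rnorm(100))
xvar <- "x+100"
yvar <- "y"

# parse_expr() turns the string into the call x + 100;
# !! injects that call into aes() before the mapping is captured.
p <- ggplot(dat, aes(x = !!rlang::parse_expr(xvar), y = .data[[yvar]])) +
  geom_point()

# print(p) draws the plot.
```

As with any eval/parse approach, this assumes the string comes from a trusted source, since it is parsed as R code.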
It may need parse_expr/eval
library(ggplot2)
ggplot(dat, aes(x=eval(rlang::parse_expr(xvar)), y=.data[[yvar]])) +
geom_point() +
xlab(xvar)
Or another option would be to interpolate and do the eval/parse
eval(parse(text = glue::glue("ggplot(dat, aes(x = {xvar}, y = {yvar}))",
"+ geom_point()")))
Context: I have a function that takes an object created with stats::lm() as its main argument. The goal of this function is to make ggplots from this lm object alone. Warning: the variables used in the model are NOT arguments of the function, i.e. if the model is lmobj <- lm(y ~ x, data = df) then the function takes only lmobj as an argument. This makes it different from questions like this one. Moreover, I am not looking for "ggplot only" solutions that take the raw data and compute the regression line and scatterplot (e.g. ggplot2::geom_smooth(method = "lm")).
Problem: ggplot() geom functions have x and y arguments that require unquoted variables (see reference manual); how can I recover these from lmobj?
Expected output:
library(ggplot2)
lmobj <- lm(Petal.Width ~ Petal.Length, data = iris)
myfun <- function(.lm) {
# make a scatterplot with .lm using ggplot
ggplot(data = .lm[["model"]], aes(x = Petal.Width, y = Petal.Length)) +
geom_point()
}
myfun(lmobj)
Trials and errors
I tried to grab an unquoted variable name from lmobject using cat():
> cat(names(lmobj[["model"]][2]))
Petal.Length
But it creates an error:
> myfuntest <- function(.lm) {
+ # make a scatterplot with .lm using ggplot
+ ggplot(data = .lm[["model"]], aes(x = cat(names(.lm[["model"]][2])),
+ y = cat(names(.lm[["model"]][1])))) +
+ geom_point()
+ }
> myfuntest(lmobj)
Petal.LengthPetal.WidthPetal.LengthPetal.WidthError: geom_point requires the following missing aesthetics: x and y
The following works:
myfun <- function(model) {
coefs <- names(model$model)
ggplot(data = model$model) +
aes(x = !! rlang::sym(coefs[1L]), y = !! rlang::sym(coefs[2L])) +
geom_point()
}
The relevant point here is that aes() uses rlang's tidy evaluation, so column names held in strings must first be converted to symbols with sym() and then injected via !!.
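For comparison, here is a sketch of the same idea using the .data pronoun instead of !! and sym(). The function name plot_lm() is hypothetical. The sketch assumes a single-predictor model whose model frame stores the response first, and it puts the predictor on the x axis.

```r
library(ggplot2)

lmobj <- lm(Petal.Width ~ Petal.Length, data = iris)

plot_lm <- function(model) {
  mf   <- model$model   # model frame kept by lm() (model = TRUE is the default)
  vars <- names(mf)     # response first, then the predictor
  ggplot(mf, aes(x = .data[[vars[2]]], y = .data[[vars[1]]])) +
    geom_point() +
    labs(x = vars[2], y = vars[1])   # readable labels instead of the .data call
}

p <- plot_lm(lmobj)
```

Swap the two indices to reproduce the axis order used in the answer above.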
One way to do it is to first convert the arguments of aes to symbols and then call aes through do.call:
myfun <- function(.lm) {
ggplot(data = .lm[["model"]],
do.call(aes, list(x = sym(names(.lm[["model"]])[2]),
y = sym(names(.lm[["model"]])[1])))) +
geom_point()
}
How do I pass multiple arguments through to my ggplot function?
Here is an example of the plot I want to automate.
library(ggplot2)
library(scales)
p <- ggplot(diamonds, aes(x=cut, y=price) ) +
geom_boxplot() +
scale_y_continuous(labels = dollar)
p
But I want to graph multiple different variables and use the appropriate scale e.g. price, depth etc, some are in dollars.
So I made a function
myfunction <- function(var1,var2){
p <- ggplot(diamonds, aes(x=cut, y= var1) ) +
geom_boxplot() +
scale_y_continuous(labels = var2)
p
return(p)
}
When I test the function, it doesn't work. Both arguments cause different errors on their own.
myfunction("price","dollar")
For var1 I get:
Error: Discrete value supplied to continuous scale
and var2:
Error in f(..., self = self) : Breaks and labels are different lengths
Question 1. Why doesn't that work? This is the most important question for me.
I then wish to make multiple graphs, which I can do with a for loop, but I keep hearing I should do it with apply. Here's what I tried.
Question 2. How would you make the multiple graphs work with apply?
FirstPlotData <- c("price","dollar")
SecondPlotData <- c("depth", "comma")
plotMetaData <- data.frame(FirstPlotData,SecondPlotData)
lapply doesn't work for me with multiple arguments. Can it pass multiple arguments?
lapply(plotMetaData, function(avar,bvar)myfunction(avar, bvar))
Would mapply work? How?
mapply(mytestfunction,plotMetaData[1,],plotMetaDataList[2,])
Thanks in advance. I note that I could do the multiple graphs with facet, but for my more complex example, with hiding outliers, scaling, and also doing stats, then doing the multiple plots and putting in a {cowplot} grid seems easier.
Try this
library(ggplot2)
library(scales)
library(rlang) # for sym
myfunction <- function(var1,var2){
p <- ggplot(diamonds, aes(x=cut, y= !! sym(var1)) ) +
geom_boxplot() +
scale_y_continuous(labels = get(var2))
p
return(p)
}
myfunction('price','dollar')
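On Question 1, my reading of the two errors is this: aes(y = var1) maps y to the constant string "price", which is discrete, and labels = "dollar" is taken as a literal vector of labels rather than as a function. The second part is fixed by resolving the name to a function, which get() does above. A small sketch of that step, using base R's match.fun() as an alternative and assuming the scales package is attached:

```r
library(scales)

var2 <- "dollar"

# A string is not a function, so scale_y_continuous() treats it as a label vector.
is.function(var2)

# match.fun() looks the name up and returns the function itself.
labeller <- match.fun(var2)
is.function(labeller)

# The resolved function can be called directly or passed as labels = labeller.
labeller(1234)
```

match.fun() also accepts a function object unchanged, so a wrapper built on it could take either dollar or "dollar".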
You probably want aes_string. This function has been designed to make programming with ggplot easier (similar ideas have also been applied to dplyr commands). The following works:
library(tidyverse)
data(diamonds)
myfunction <- function(var1){
p <- ggplot(diamonds, aes_string(x="cut", y= var1) ) +
geom_boxplot()
p
return(p)
}
myfunction("price")
Why?
contrast the following:
# works
ggplot(diamonds, aes(x=cut, y= price) ) + geom_boxplot()
# these 2 are equivalent, but do not work
ggplot(diamonds, aes(x=cut, y= "price") ) + geom_boxplot()
var1 = "price"
ggplot(diamonds, aes(x=cut, y= var1) ) + geom_boxplot()
# these 2 are equivalent, both work but inputs are strings
ggplot(diamonds, aes_string(x="cut", y= "price") ) + geom_boxplot()
var1 = "price"
ggplot(diamonds, aes_string(x="cut", y= var1) ) + geom_boxplot()
Using apply?
For this purpose I would be inclined to use loops (others are welcome to disagree). If you are set on using an apply approach, note that lapply, mapply, vapply and sapply are the list, multivariate, vector and simplified variants respectively; mapply is the one that iterates over several arguments in parallel.
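For the multiple-argument case in Question 2, base R's mapply(), or its list-returning wrapper Map(), walks several vectors in parallel. A minimal sketch, using a stand-in function (describe_plot(), a hypothetical name) in place of myfunction() so that it runs without ggplot2:

```r
# Stand-in for the plotting function: it only reports what it would plot.
describe_plot <- function(var1, var2) {
  paste(var1, "labelled with", var2)
}

vars   <- c("price", "depth")
labels <- c("dollar", "comma")

# Map() calls describe_plot(vars[i], labels[i]) for each i and returns a list.
# Because the first vector is character, its values become the list names.
res <- Map(describe_plot, vars, labels)
res[["depth"]]
```

Replacing describe_plot with a two-argument plotting function would return a list of plots in the same way.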
A more idiomatic ggplot2 way of doing this now is to use the .data pronoun.
library(ggplot2)
myfunction <- function(var1, var2) {
p <- ggplot(diamonds, aes(x = cut, y = .data[[var1]])) +
geom_boxplot() +
scale_y_continuous(
labels = getFromNamespace(x = var2, ns = "scales")
)
p
return(p)
}
myfunction("price", "dollar")
myfunction("price", "comma")
Then, to create multiple plots with this function by passing multiple arguments, a tidier approach is to use the map functions from {purrr}:
plots <- purrr::map2(
.x = c("price", "price"),
.y = c("dollar", "comma"),
.f = myfunction
)
So, plots[[1]] contains the 1st plot with var1 = "price" and var2 = "dollar" and plots[[2]] contains the 2nd plot with var1 = "price" and var2 = "comma".
I would like to create a function that generates graphs using ggplot. For the sake of simplicity, the typical graph may be
ggplot(cars, aes(x=speed, y=dist)) + geom_point()
The function I would like to create is of the type
f <- function(DS, x, y) ggplot(DS, aes(x=x, y=y)) + geom_point()
This however won't work, since x and y are not strings. This problem has been noted in previous SO questions (e.g., this one), but without providing, in my view, a satisfactory answer. How would one modify the function above to make it work with arbitrary data frames?
One solution would be to pass x and y as string names of columns in data frame DS.
f <- function(DS, x, y) {
ggplot(DS, aes_string(x = x, y = y)) + geom_point()
}
And then call the function as:
f(cars, "speed", "dist")
However, it seems that you don't want that? Can you provide an example why you would need different functionality? Is it because you don't want to have the arguments in the same data frame?
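If the aim is to keep the column names unquoted, one option is to embrace the arguments with {{ }}. This is a sketch that assumes a recent ggplot2 together with rlang 0.4.0 or later, where {{ }} was introduced:

```r
library(ggplot2)

# {{ x }} forwards the caller's unquoted column name into aes().
f <- function(DS, x, y) {
  ggplot(DS, aes(x = {{ x }}, y = {{ y }})) + geom_point()
}

p <- f(cars, speed, dist)

# print(p) draws the plot.
```

Because the columns are looked up in DS, the same function works with any data frame that contains the named columns.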
I think something like the following code is possible; it builds only the aes component from the unquoted arguments.
require(ggplot2)
DS <- data.frame(speed=rnorm(10), dist=rnorm(10))
f <- function(DS, x, y, geom, opts=NULL) {
aes <- eval(substitute(aes(x, y),
list(x = substitute(x), y = substitute(y))))
p <- ggplot(DS, aes) + geom + opts
}
p <- f(DS, speed, dist, geom_point())
p
However, it seems to be a complicated approach.
Another option is to use do.call. Here is a one line copy paste from a working code:
gg <- gg + geom_rect( do.call(aes, args=list(xmin=xValues-0.5, xmax=xValues+0.5, ymin=yValues, ymax=rep(Inf, length(yValues))) ), alpha=0.2, fill=colors )
One approach that I can think of is using match.call() to reach the variable names contained by the parameters/arguments passed to the custom plotting function and then use eval() on them. In this way you avoid passing them as quoted to your custom function, if you do not like that.
library(ggplot2)
fun <- function(df, x, y) {
arg <- match.call()
ggplot(df, aes(x = eval(arg$x), y = eval(arg$y))) + geom_point()
}
fun(mpg, cty, hwy) # no need to pass the variables (column names) as quoted / as strings
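To see what match.call() hands back, here is a base R sketch (the function name capture_args() is hypothetical). The captured arguments are unevaluated symbols; they only turn into columns once they are evaluated with the data frame in scope.

```r
# Hypothetical helper: return the x and y arguments exactly as the caller typed them.
capture_args <- function(df, x, y) {
  arg <- match.call()
  list(x = arg$x, y = arg$y)
}

captured <- capture_args(mtcars, mpg, wt)

# The captured argument is a symbol (a name), not a vector of values.
is.name(captured$x)

# Evaluating the symbol inside the data frame yields the column.
x_values <- eval(captured$x, mtcars)
head(x_values)
```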