Use of ggplot() within another function in R

I'm trying to write a simple plot function, using the ggplot2 library. But the call to ggplot doesn't find the function argument.
Consider a data.frame called means that stores two conditions and two mean values that I want to plot (condition will appear on the X axis, means on the Y).
library(ggplot2)
m <- c(13.8, 14.8)
cond <- c(1, 2)
means <- data.frame(means=m, condition=cond)
means
# The output should be:
# means condition
# 1 13.8 1
# 2 14.8 2
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=meansdf$condition, y=meansdf$means, x = meansdf$condition))
p + geom_bar(position="dodge", stat="identity")
}
testplot(means)
# This will output the following error:
# Error in eval(expr, envir, enclos) : object 'meansdf' not found
So it seems that ggplot is calling eval, which can't find the argument meansdf. Does anyone know how I can successfully pass the function argument to ggplot?
(Note: Yes I could just call the ggplot function directly, but in the end I hope to make my plot function do more complicated stuff! :) )

The "proper" way to use ggplot programmatically is to use aes_string() instead of aes() and to give the column names as character strings rather than as objects.
This suits more programmatic uses, for example if you want users to be able to specify column names for the various aesthetics as arguments, or if the function is going into a package that needs to pass R CMD check without notes about undefined variable names. Note that aes_string() has been soft-deprecated since ggplot2 3.0.0 in favour of tidy evaluation.
testplot <- function(meansdf, xvar = "condition", yvar = "means",
                     fillvar = "condition") {
  # Return the plot itself (not an assignment) so that it prints when called
  ggplot(meansdf,
         aes_string(x = xvar, y = yvar, fill = fillvar)) +
    geom_bar(position = "dodge", stat = "identity")
}
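Since aes_string() is soft-deprecated, the same string-based interface can be sketched with the .data pronoun that ggplot2 re-exports from rlang. This is only an illustration under that assumption; the function name testplot_data is made up for the example and does not come from the answer above.

```r
library(ggplot2)

# Hypothetical variant of testplot(): column names arrive as strings and
# are looked up in the plot data through the .data pronoun.
testplot_data <- function(meansdf, xvar = "condition", yvar = "means",
                          fillvar = "condition") {
  ggplot(meansdf,
         aes(x = .data[[xvar]], y = .data[[yvar]], fill = .data[[fillvar]])) +
    geom_bar(position = "dodge", stat = "identity")
}

means <- data.frame(means = c(13.8, 14.8), condition = c(1, 2))
p <- testplot_data(means)
```

Printing p draws the chart. Because the column is selected by name at build time, no string parsing is involved.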

As Joris and Chase have already correctly answered, standard best practice is to simply omit the meansdf$ part and directly refer to the data frame columns.
testplot <- function(meansdf)
{
p <- ggplot(meansdf,
aes(fill = condition,
y = means,
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
This works because the variables referred to in aes are looked for either in the global environment or in the data frame passed to ggplot. That is also why your example code, which uses meansdf$condition etc., did not work: meansdf is available neither in the global environment nor inside the data frame passed to ggplot, which is meansdf itself.
The fact that the variables are looked for in the global environment instead of in the calling environment is actually a known bug in ggplot2 that Hadley does not consider fixable at the moment.
This leads to problems if one wishes to use a local variable, say scale, to influence the data used for the plot:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale, # does not work, since scale is not found
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
A very nice workaround for this case is provided by Winston Chang in the referenced GitHub issue: explicitly set the environment parameter to the current environment in the call to ggplot.
Here's what that would look like for the above example:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale,
x = condition),
environment = environment()) # This is the only line changed / added
p + geom_bar(position = "dodge", stat = "identity")
}
## Now, the following works
testplot(means)
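For comparison, here is a sketch of the same function on a recent ggplot2 (3.0.0 or later is assumed here). In those versions aes() records the environment it was called from, so a local variable should be found without the environment argument. The name testplot_local is hypothetical.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.0.0, where aes() captures its calling environment.
testplot_local <- function(meansdf) {
  scale <- 0.5  # local variable used inside aes()
  ggplot(meansdf,
         aes(fill = condition,
             y = means * scale,
             x = condition)) +
    geom_bar(position = "dodge", stat = "identity")
}

means <- data.frame(means = c(13.8, 14.8), condition = c(1, 2))
p_local <- testplot_local(means)
```

Columns of the data frame still take precedence over local variables of the same name, so it is worth avoiding names that clash with column names.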

Here is a simple trick I use a lot to define my variables in my function's environment (second line):
FUN <- function(fun.data, fun.y) {
fun.data$fun.y <- fun.data[, fun.y]
ggplot(fun.data, aes(x, fun.y)) +
geom_point() +
scale_y_continuous(fun.y)
}
# x must exist before data.frame() is called: columns cannot refer to each other there
x <- rnorm(100, 0, 1)
datas <- data.frame(x = x,
                    y = x + rnorm(100, 2, 2),
                    z = x + rnorm(100, 5, 10))
FUN(datas, "y")
FUN(datas, "z")
Note how the y-axis label also changes when different variables or data-sets are used.

I don't think you need to include the meansdf$ part in your function call itself. This seems to work on my machine:
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=condition, y=means, x = condition))
p + geom_bar(position="dodge", stat="identity")
}
testplot(meansdf)
This produces the expected bar chart.

This is an example of a problem that is discussed earlier. Basically, it comes down to ggplot2 being coded for use in the global environment mainly. In the aes() call, the variables are looked for either in the global environment or within the specified dataframe.
library(ggplot2)
means <- data.frame(means=c(13.8,14.8),condition=1:2)
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=condition,
y=means, x = condition))
p + geom_bar(position="dodge", stat="identity")
}
EDIT: After seeing the other answer and updating the ggplot2 package, the code above works. The reason is, as explained in the comments, that ggplot will look for the variables in aes in either the global environment (when the data frame is specifically added as meansdf$... ) or within the mentioned environment.
For this, be sure you work with the latest version of ggplot2.

If it is important to pass the variables (column names) to the custom plotting function unquoted, while different variable names are used within the function, then another workaround that I tried is to make use of match.call() and eval (like here as well):
library(ggplot2)
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(df, x, y) {
arg <- match.call()
scale <- 0.5
p <- ggplot(df, aes(x = eval(arg$x),
y = eval(arg$y) * scale,
fill = eval(arg$x)))
p + geom_bar(position = "dodge", stat = "identity")
}
testplot(meansdf, condition, means)
Created on 2019-01-10 by the reprex package (v0.2.1)
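A tidy-evaluation alternative to match.call() and eval is the embrace operator {{ }}, which needs a reasonably recent rlang and ggplot2. The sketch below assumes those versions; the function name testplot_embrace is invented for the illustration.

```r
library(ggplot2)

meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)

# Hypothetical variant: x and y are passed unquoted and forwarded to aes()
# with the embrace operator, while scale stays an ordinary local variable.
testplot_embrace <- function(df, x, y) {
  scale <- 0.5
  ggplot(df, aes(x = {{ x }},
                 y = {{ y }} * scale,
                 fill = {{ x }})) +
    geom_bar(position = "dodge", stat = "identity")
}

p_embrace <- testplot_embrace(meansdf, condition, means)
```

Compared with eval(arg$x), this keeps track of where each expression came from, so it should also behave when the function is called from inside another function.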
Another workaround, this time passing quoted variables to the custom plotting function, is to use get():
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(df, x, y) {
scale <- 0.5
p <- ggplot(df, aes(x = get(x),
y = get(y) * scale,
fill = get(x)))
p + geom_bar(position = "dodge", stat = "identity")
}
testplot(meansdf, "condition", "means")

This frustrated me for some time. I wanted to send different data frames with different variable names, and I wanted the ability to plot different columns from the data frame. I finally got a workaround by creating some dummy (global) variables to handle plotting and forcing assignment inside the function:
plotgraph <- function(df,df.x,df.y) {
dummy.df <<- df
dummy.x <<- df.x
dummy.y <<- df.y
p = ggplot(dummy.df,aes(x=dummy.x,y=dummy.y,.....))
print(p)
}
then in the main code I can just call the function
plotgraph(data,data$time,data$Y1)
plotgraph(data,data$time,data$Y2)

Short answer: Use qplot
Long answer:
In essence you want something like this:
my.barplot <- function(x=this.is.a.data.frame.typically) {
# R code doing the magic comes here
...
}
But that lacks flexibility because you must stick to consistent column naming to avoid the annoying R scope idiosyncrasies. Of course the next logical step is:
my.barplot <- function(data=data.frame(), x=..., y=...) {
# R code doing something really really magical here
...
}
But then that starts looking suspiciously like a call to qplot(), right?
qplot(data=my.data.frame, x=some.column, y=some.other.column,
geom="bar", stat="identity",...)
Of course now you'd like to change things like scale titles, and for that a function comes in handy... the good news is that scoping issues are mostly gone.
my.plot <- qplot(data=my.data.frame, x=some.column, y=some.other.column,...)
set.scales <- function(p, xscale=scale_x_continuous, xtitle=NULL,
yscale=scale_y_continuous, ytitle=NULL) {
# Scale functions take the axis title as their name argument
return(p + xscale(name=xtitle) + yscale(name=ytitle))
}
my.plot.prettier <- set.scales(my.plot, scale_x_discrete, 'Days',
scale_y_discrete, 'Count')

Another workaround is to pass the aes(...) mapping as an argument of your function:
func <- function(meansdf, mapping) ggplot(meansdf, mapping) + geom_bar(stat = "identity")
func(means, aes(x = condition, y = means))
This worked fine for me on a similar topic.

You don't need anything fancy. Not even dummy variables. You only need to add a print() inside your function; it is like using cat() when you want something to show in the console.
myplot <- ggplot(......) # + whatever you want here
print(myplot)
This has worked for me more than once inside the same function.

I just generate new data frame variables with the desired names inside the function:
testplot <- function(df, xVar, yVar, fillVar) {
  df$xVar <- df[[xVar]]
  df$yVar <- df[[yVar]]
  df$fillVar <- df[[fillVar]]
  # The names in aes() must match the new columns exactly (R is case-sensitive)
  ggplot(df,
         aes(x = xVar, y = yVar, fill = fillVar)) +
    geom_bar(position = "dodge", stat = "identity")
}

Related

Scatterplot function that can change based on variables for axes

I am trying to write a function that seems like it should be very simple, but I am having problems with it. I want to write a function that takes in three arguments: a dataframe, an x-axis variable and a y-axis variable. Based on these, I want it to return a scatterplot in which the x-axis variable and y-axis variable can be changed. This is the very basic function I wrote:
scatter_plot <- function(dataframe, x_input, y_input) {
plot <- ggplot(data = dataframe) +
geom_point(mapping = aes(x = x_input, y = y_input),
)
}
For reproducibility, consider the dataset midwest that is in the ggplot2 package. The code I wrote does not produce errors when I run it, but when I try to pass arguments into it, such as
scatter_plot(midwest, percollege, percpovertyknown)
the function returns
"Error in FUN(X[[i]], ...) : object 'percollege' not found"
It seems like it does not recognize the variables in the argument, but I have been playing around with the function for quite some time and I can't seem to figure it out. Can someone help me with how to fix this so my function works correctly?
The tidyverse uses non-standard evaluation (NSE), which makes using its facilities in functions slightly more complicated than you might expect. Here's a version of your function that works for me.
scatter_plot <- function(dataframe, x_input, y_input) {
qX <- enquo(x_input)
qY <- enquo(y_input)
plot <- ggplot(data = dataframe) +
geom_point(mapping = aes(x = !! qX, y = !! qY),
)
return(plot)
}
As you've assigned your plot to an object, I've added a return statement.
See here for more information on NSE.
Using !!rlang::ensym() in your function should work.
scatter_plot <- function(dataframe, x_input, y_input) {
plot <- ggplot(data = dataframe) +
geom_point(mapping = aes(x = !!rlang::ensym(x_input), y = !!rlang::ensym(y_input)))
plot
}
Example
scatter_plot(midwest, percollege, percpovertyknown)

How to write a facet_wrap (ggplot2) within a function

I have written a function to plot a bar graph. But when I get to facet wrap the '~' sign is making things difficult.
rf.funct <- function(dat, predictor, feature){
ggplot(get(dat), aes(get(predictor), N)) +
geom_bar(stat = 'identity') +
facet_wrap(get(~feature)) # this is where the problem is
}
I've tried the following:
facet_wrap((get(~feature))) # invalid first argument
facet_wrap(paste0("~ ", get(feature))) # object 'feature' not found
How do I make sure the '~' sign gets included with the function?
You don't need to use get. You've passed the data frame into the function using the dat argument, so just feed dat to ggplot and it will have the data from within its environment.
rf.funct <- function(dat, predictor, feature) {
ggplot(dat, aes_string(predictor, "N")) +
geom_bar(stat = 'identity') +
facet_wrap(feature)
}
The predictor and feature arguments should be entered as strings. Then you can use aes_string to specify the aesthetics. facet_wrap can now take a character vector directly, without need of a formula (as pointed out by @WeihuangWong).
I was having a similar problem and the answers & comments on here helped me fix it. However, this post is about 6 years old now, and I think the most modern solution would be along these lines:
rf.funct <- function(dat, predictor, feature){
ggplot(dat, aes({{predictor}}, N)) +
geom_bar(stat = 'identity') +
facet_wrap(enquo(feature))
}
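The ggplot2 documentation usually shows facetting on an unquoted argument with vars() wrapped around the embraced variable. Below is a small sketch of that form. The toy data frame and the names rf_funct_vars, pred and grp are assumptions made up for the example; only the column N comes from the question.

```r
library(ggplot2)

# Hypothetical variant of rf.funct(): predictor and feature are passed
# unquoted, and the facet variable is wrapped in vars().
rf_funct_vars <- function(dat, predictor, feature) {
  ggplot(dat, aes({{ predictor }}, N)) +
    geom_bar(stat = "identity") +
    facet_wrap(vars({{ feature }}))
}

# Toy data, invented for illustration only
toy <- data.frame(pred = c("a", "b", "a", "b"),
                  N    = c(3, 5, 2, 4),
                  grp  = c("g1", "g1", "g2", "g2"))

p_facet <- rf_funct_vars(toy, pred, grp)
```

With this toy data the plot has one panel per value of grp.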

how to use aes_string for groups in ggplot2 inside a function when making boxplot

new to ggplot2, I've scoured the web but still couldn't figure this out.
I understand how to plot a boxplot in ggplot2, my problem is that I can't pass along the variable I use for groups when it is inside a function.
so, normally (i.e. NOT inside a function), I would write this:
ggplot(myData, aes(factor(Variable1), Variable2)) +
geom_boxplot(fill="grey", colour="black")+
labs(title = "Variable1 vs. Variable2" ) +
labs (x = "variable1", y = "Variable2")
Where myData is my data frame
Variable 1 is a 2 level factor variable
Variable 2 is a continuous variable
I want to make boxplots of Variable 1 by its 2 levels/groups
and this works fine,
but as soon as I write this inside a function I couldn't get it to work.
my attempt in writing the function:
myfunction = function (data, Variable1) {
ggplot(data=myData, aes_string(factor("Variable1"), "Variable2"))+
geom_boxplot(fill="grey", colour="black")+
labs(title = paste("Variable1 vs. Variable2" )) +
labs (x = "variable1", y = "Variable2")
}
this only gives me a single boxplot(instead of 2), as if it never understood the factor(Variable1) command (and did a single boxplot of the entire Variable 2, rather than separate them by Variable 1 level first, then boxplot them).
aes_string() evaluates the entire string, so if you use sprintf("factor(%s)",Variable1) you get the desired result. As a further remark: your function has a data argument, but inside the plotting call you use myData. I have also edited the x label and title, so that you can pass 'Variable3' and get proper labels.
With some example data:
set.seed(123)
dat <- data.frame(Variable2=rnorm(100),Variable1=c(0,1),Variable3=sample(0:1,100,T))
myfunction = function (data, Variable1) {
ggplot(data=data, aes_string(sprintf("factor(%s)",Variable1), "Variable2"))+
geom_boxplot(fill="grey", colour="black")+
labs(title = sprintf("%s and Variable2", Variable1)) +
labs (x = Variable1, y = "Variable2")
}
p1 <- myfunction(dat,"Variable1")
p2 <- myfunction(dat,"Variable3")
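The same grouping can also be sketched without building a string, by wrapping the .data pronoun in factor(). This assumes a ggplot2 version that supports tidy evaluation in aes(); the function name box_by_group is hypothetical, and the example data mirrors the data above with Variable1 written out explicitly.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(Variable2 = rnorm(100),
                  Variable1 = rep(c(0, 1), 50),
                  Variable3 = sample(0:1, 100, TRUE))

# Hypothetical variant of myfunction(): the grouping column is given as a
# string and converted to a factor inside aes().
box_by_group <- function(data, group_var) {
  ggplot(data, aes(x = factor(.data[[group_var]]), y = Variable2)) +
    geom_boxplot(fill = "grey", colour = "black") +
    labs(title = sprintf("%s and Variable2", group_var),
         x = group_var, y = "Variable2")
}

b1 <- box_by_group(dat, "Variable1")
b2 <- box_by_group(dat, "Variable3")
```

Each call should give one box per level of the chosen grouping column.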

Remove a layer from a ggplot2 chart

I'd like to remove a layer (in this case the results of geom_ribbon) from a ggplot2 created grid object. Is there a way I can remove it once it's already part of the object?
library(ggplot2)
dat <- data.frame(x=1:3, y=1:3, ymin=0:2, ymax=2:4)
p <- ggplot(dat, aes(x=x, y=y)) + geom_ribbon(aes(ymin=ymin, ymax=ymax), alpha=0.3) +
  geom_line()
# This has the geom_ribbon
p
# This overlays another ribbon on top
p + geom_ribbon(aes(ymin=ymin, ymax=ymax, fill=NA))
I'd like this functionality to allow me to build more complicated plots on top of less complicated ones. I am using functions that return a grid object and then printing out the final plot once it is fully assembled. The base plot has a single line with a corresponding error bar (geom_ribbon) surrounding it. The more complicated plot will have several lines and the multiple overlapping geom_ribbon objects are distracting. I'd like to remove them from the plots with multiple lines. Additionally, I'll be able to quickly create alternative versions using facets or other ggplot2 functionality.
Edit: Accepting @mnel's answer as it works. Now I need to determine how to dynamically access the geom_ribbon layer, which is captured in the SO question here.
Edit 2: For completeness, this is the function I created to solve this problem:
remove_geom <- function(ggplot2_object, geom_type) {
layers <- lapply(ggplot2_object$layers, function(x) if(x$geom$objname == geom_type) NULL else x)
layers <- layers[!sapply(layers, is.null)]
ggplot2_object$layers <- layers
ggplot2_object
}
Edit 3: See the accepted answer below for the latest versions of ggplot (>=2.x.y)
For ggplot2 version 2.2.1, I had to modify the proposed remove_geom function like this:
remove_geom <- function(ggplot2_object, geom_type) {
# Delete layers that match the requested type.
layers <- lapply(ggplot2_object$layers, function(x) {
if (class(x$geom)[1] == geom_type) {
NULL
} else {
x
}
})
# Delete the unwanted layers.
layers <- layers[!sapply(layers, is.null)]
ggplot2_object$layers <- layers
ggplot2_object
}
Here's an example of how to use it:
library(ggplot2)
set.seed(3000)
d <- data.frame(
x = runif(10),
y = runif(10),
label = sprintf("label%s", 1:10)
)
p <- ggplot(d, aes(x, y, label = label)) + geom_point() + geom_text()
Let's show the original plot:
p
Now let's remove the labels and show the plot again:
p <- remove_geom(p, "GeomText")
p
If you look at
p$layers
[[1]]
mapping: ymin = ymin, ymax = ymax
geom_ribbon: na.rm = FALSE, alpha = 0.3
stat_identity:
position_identity: (width = NULL, height = NULL)
[[2]]
geom_line:
stat_identity:
position_identity: (width = NULL, height = NULL)
You will see that you want to remove the first layer.
You can do this by redefining the layers as just the second component in the list.
p$layers <- p$layers[2]
Now build and plot p
p
Note that p$layers[[1]] <- NULL would work as well. I agree with @Andrie and @Joran's comments regarding in what cases this might be useful, and would not expect this to be necessarily reliable.
As this problem looked interesting, I have expanded my 'ggpmisc' package with functions to manipulate the layers in a ggplot object (currently in package 'gginnards'). The functions are more polished versions of the example in my earlier answer to this same question. However, be aware that in most cases this is not the best way of working, as it violates the Grammar of Graphics. In most cases one can assemble different variations of the same figure in the normal way with operator +, possibly "packaging" groups of layers into lists to have combined building blocks that can simplify the assembly of complex figures. Exceptionally we may want to edit an existing plot or a plot output by a higher level function whose definition we cannot modify. In such cases these layer manipulation functions can be useful. The example above becomes:
library(gginnards)
p1 <- delete_layers(p, match_type = "GeomText")
See the documentation of the package for other examples, and for information on the companion functions useful for modifying the ordering of layers, and for inserting new layers at arbitrary positions.
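The idea of "packaging" groups of layers into lists can be sketched with the ribbon example from the question. Adding a list of layers to a plot with + is standard ggplot2 behaviour; the object names below are made up for the illustration.

```r
library(ggplot2)

dat <- data.frame(x = 1:3, y = 1:3, ymin = 0:2, ymax = 2:4)

# Building blocks: a line on its own, and a line with its error ribbon.
line_only <- list(geom_line())
line_with_ribbon <- list(
  geom_ribbon(aes(ymin = ymin, ymax = ymax), alpha = 0.3),
  geom_line()
)

base <- ggplot(dat, aes(x = x, y = y))

p_simple  <- base + line_with_ribbon  # single line, ribbon shown
p_complex <- base + line_only         # ribbon never added, nothing to remove
```

Choosing the block when the plot is assembled avoids having to delete the ribbon layer afterwards.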
@Kamil Slowikowski Thanks! Very useful. However, I could not stop myself from creating a new variation on the same theme... hopefully easier to understand than that in the original post or the updated version by Kamil, also avoiding some assignments.
remove_geoms <- function(x, geom_type) {
# Find layers that match the requested type.
selector <- sapply(x$layers,
function(y) {
class(y$geom)[1] == geom_type
})
# Delete the layers.
x$layers[selector] <- NULL
x
}
This version is functionally identical to Kamil's function, so the usage example above does not need to be repeated here.
As an aside, this function can be easily adapted to select the layers based on the class of the stat instead of the class of the geom.
remove_stats <- function(x, stat_type) {
# Find layers that match the requested type.
selector <- sapply(x$layers,
function(y) {
class(y$stat)[1] == stat_type
})
# Delete the layers.
x$layers[selector] <- NULL
x
}
@Kamil and @Pedro Thanks a lot! For those interested, one can also augment Pedro's function to select only specific layers, as shown here with a last_only argument:
remove_geoms <- function(x, geom_type, last_only = T) {
# Find layers that match the requested type.
selector <- sapply(x$layers,
function(y) {
class(y$geom)[1] == geom_type
})
if(last_only)
selector <- max(which(selector))
# Delete the layers.
x$layers[selector] <- NULL
x
}
Coming back to @Kamil's example plot:
set.seed(3000)
d <- data.frame(
x = runif(10),
y = runif(10),
label = sprintf("label%s", 1:10)
)
p <- ggplot(d, aes(x, y, label = label)) + geom_point() + geom_point(color = "green") + geom_point(size = 5, color = "red")
p
library(magrittr) # provides the %>% pipe used below
p %>% remove_geoms("GeomPoint")
p %>% remove_geoms("GeomPoint") %>% remove_geoms("GeomPoint")
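A slightly more defensive variation on the functions above, as a sketch only: it matches with inherits() rather than comparing the first class name, and it leaves the plot untouched when no layer matches. The function name remove_layers_of is hypothetical, and the code assumes the plot object exposes its layers as p$layers, as in the answers above.

```r
library(ggplot2)

# Hypothetical helper: drop every layer whose geom inherits from geom_class.
remove_layers_of <- function(plot, geom_class) {
  keep <- !vapply(plot$layers,
                  function(layer) inherits(layer$geom, geom_class),
                  logical(1))
  plot$layers <- plot$layers[keep]
  plot
}

set.seed(3000)
d <- data.frame(x = runif(10), y = runif(10),
                label = sprintf("label%s", 1:10))
p <- ggplot(d, aes(x, y, label = label)) + geom_point() + geom_text()

p_no_text   <- remove_layers_of(p, "GeomText")    # drops the text layer
p_unchanged <- remove_layers_of(p, "GeomRibbon")  # no match, nothing removed
```

Using vapply() with logical(1) also keeps the selector well-typed for a plot that has no layers at all.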

How to pass column names to a function that processes data.frames

I'm plotting lots of similar graphs, so I thought I would write a function to simplify the task. I'd like to pass it a data.frame and the name of the column to be plotted. Here is what I have tried:
plot_individual_subjects <- function(var, data)
{
require(ggplot2)
ggplot(data, aes(x=Time, y=var, group=Subject, colour=SubjectID)) +
geom_line() + geom_point() +
geom_text(aes(label=Subject), hjust=0, vjust=0)
}
Now if var is a string it will not work. It will not work either if I change the aes part of the ggplot command to y=data[,var]; it then complains about not being able to subset a closure.
So what is the correct way/best practice to solve this and similar problems? How can I pass column names easily and safely to functions that would like to do processing on data.frames?
Bad Joran, answering in the comments!
You want to use aes_string, which allows you to pass variable names as strings. In your particular case, since you only seem to want to modify the y variable, you probably want to reorganize which aesthetics are mapped in which geoms. For instance, maybe something like this:
ggplot(data, aes_string(y = var)) +
geom_line(aes(x = Time,group = Subject,colour = SubjectID)) +
geom_point(aes(x = Time,group = Subject,colour = SubjectID)) +
geom_text(aes(x = Time,group = Subject,colour = SubjectID,label = Subject),hjust =0,vjust = 0)
or perhaps the other way around, depending on your tastes.

Resources