Variables can't be found by function (R language)

I ran into a problem when trying to use my own function to analyse a dataset.
I am trying to plot the relationship between hits and runs in the mlb11 dataset in R.
The function is:
f_plot<-function(x,y,z){
ggplot(x,aes(y,z))+geom_point()+geom_smooth(method="lm")
}
If I call it like this:
f_plot(mlb11,hits, runs)
it gives:
Error in FUN(X[[i]], ...) : object 'hits' not found
If I try this instead:
f_plot(mlb11,mlb11$hits, mlb11$runs)
it produces the expected plot, so this fixed the problem.
But I am curious why the function can't find the variable names automatically, even though the dataset "mlb11" has already been passed in. I would appreciate knowing more about this basic problem. Thanks!

The reason is in this line:
ggplot(x,aes(y,z))+geom_point()+geom_smooth(method="lm")
It's looking for variables called y and z in the data x. The "easiest" way to solve this is:
f_plot<-function(x,y,z){
yy <- as.character(substitute(y))
zz <- as.character(substitute(z))
code = sprintf('ggplot(x,aes(%s,%s))+geom_point()+geom_smooth(method="lm")', yy, zz)
#print(code)
eval(parse(text = code))
}
f_plot(mlb11, hits, runs)
This will do the trick, but it is not the best way to write it. The function substitute() takes what you passed in as y and turns it into an unevaluated expression, which as.character() converts into a string. sprintf() then places the strings inside the text of the ggplot call, which is evaluated by eval(parse()).
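To see what substitute() and as.character() do in isolation, here is a minimal base R sketch. The helper name capture_name is hypothetical and only serves as an illustration.

```r
# Hypothetical helper: returns the text of the argument instead of its value.
capture_name <- function(arg) {
  expr <- substitute(arg)   # the unevaluated expression typed by the caller
  as.character(expr)        # a symbol becomes a one-element character vector
}

# No object called 'hits' has to exist, because the argument is never evaluated.
name <- capture_name(hits)
print(name)
```

This is why the answer's function can be called with bare column names: the names are turned into text before anything tries to look them up.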

The problem here is that the helper function aes() expects bare variable names as input. This means that when you type aes(y, z), the helper function will not look for the values of the variables "y" and "z" that you supplied to the function f_plot(). Instead, it will look for variables called "y" and "z" in the supplied data frame; independently of the values for y and z that were given to f_plot().
To circumvent this behaviour, you can use aes_(): this variant doesn't treat its inputs as bare variable names, but instead as variables that refer to a bare variable name in the form of an R expression. For this approach to work, you then only have to convert the "y" and "z" inputs of the function f_plot() to R expressions using substitute().
library(ggplot2)
f_plot <- function(x, y, z) {
y <- substitute(y)
z <- substitute(z)
ggplot(x, aes_(y, z)) +
geom_point() +
geom_smooth(method = "lm")
}
df <- data.frame(var1 = 1:10, var2 = 1:10)
f_plot(df, var1, var2)
This should make your function work as intended.
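As an alternative to aes_() and substitute(), newer versions support the embrace operator inside aes(). The following is only a sketch, assuming ggplot2 3.0.0 or later together with rlang 0.4.0 or later. The data frame df is made-up example data, not the mlb11 dataset.

```r
library(ggplot2)

# {{ }} forwards the bare column names given by the caller into aes().
# Assumes ggplot2 >= 3.0.0 and rlang >= 0.4.0.
f_plot <- function(data, x, y) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }})) +
    geom_point() +
    geom_smooth(method = "lm")
}

# Made-up example data
df <- data.frame(var1 = 1:10, var2 = (1:10)^2)
p <- f_plot(df, var1, var2)
```

Printing p draws the scatter plot with the fitted line, exactly as a top-level call with aes(var1, var2) would.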

Related

Dash in column name yields "object not found" Error

I have a function to generate scatter plots from data, where an argument is provided to select which column to use for coloring the points. Here is a simplified version:
library(ggplot2)
plot_gene <- function (df, gene) {
ggplot(df, aes(x, y)) +
geom_point(aes_string(col = gene)) +
scale_color_gradient()
}
where df is a data.frame with columns x, y, and then a bunch of gene names. This works fine for most gene names; however, some have dashes and these fail:
print(plot_gene(df, "Gapdh")) # great!
print(plot_gene(df, "H2-Aa")) # Error: object "H2" not found
It appears the gene variable is getting parsed ("H2-Aa" becomes H2 - Aa). How can I get around this? Is there a way to indicate that a string should not go through eval in aes_string?
Reproducible Input
If you need some input to play with, this fails like my data:
df <- data.frame(c(1,2), c(2,1), c(1,2), c(2,1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")
For my real data, I am using read.table(..., header=TRUE) and get column names with dashes because the raw data files have them.
Normally R tries very hard to make sure you have column names in your data.frame that can be valid variable names. Using non-standard column names (those that are not valid variable names) will lead to problems when using functions that use non-standard evaluation type syntax. When forced to use such variable names, you often have to wrap them in backticks. In the normal case
ggplot(df, aes(x, y)) +
geom_point(aes(col = H2-Aa)) +
scale_color_gradient()
# Error in FUN(X[[i]], ...) : object 'H2' not found
would return an error but
ggplot(df, aes(x, y)) +
geom_point(aes(col = `H2-Aa`)) +
scale_color_gradient()
would work.
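The same parsing behaviour can be reproduced without ggplot2. This is a small base R sketch using a made-up two-row data frame. It shows the unquoted name being parsed as a subtraction, the backtick form working, and the syntactic name that make.names() would produce.

```r
# Made-up data with one non-syntactic column name
df <- data.frame(x = c(1, 2), y = c(2, 1))
df[["H2-Aa"]] <- c(1, 2)

# Without backticks the name is parsed as H2 minus Aa, so lookup of H2 fails
err <- tryCatch(with(df, H2-Aa), error = function(e) conditionMessage(e))

# With backticks the whole string is treated as one name
via_backticks <- with(df, `H2-Aa` * 10)

# make.names() shows the syntactic name R would otherwise substitute
fixed <- make.names("H2-Aa")
```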
You can paste in backticks if you really want
geom_point(aes_string(col = paste0("`", gene, "`")))
or you could treat it as a symbol from the get-go and use aes_q instead
geom_point(aes_q(col = as.name(gene)))
The latest release of ggplot2 supports unquoting via !! rather than using aes_string or aes_q, so you could do
geom_point(aes(col = !!rlang::sym(gene)))
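Another option that avoids parsing the string at all is the .data pronoun. This is a sketch, assuming ggplot2 3.0.0 or later; it is not part of the answers above. It looks the column up by its string name, so the dash is never parsed.

```r
library(ggplot2)

# Reproducible input from the question
df <- data.frame(c(1, 2), c(2, 1), c(1, 2), c(2, 1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")

# .data[[gene]] looks the column up by its string name, so nothing is parsed.
# Assumes ggplot2 >= 3.0.0.
plot_gene <- function(df, gene) {
  ggplot(df, aes(x, y)) +
    geom_point(aes(colour = .data[[gene]])) +
    scale_color_gradient()
}

p_ok   <- plot_gene(df, "Gapdh")
p_dash <- plot_gene(df, "H2-Aa")
```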

How to pass argument input to a self-defined function in R?

I made a self-defined plottest function as this:
plottest<-function(dataframe, var1){
ggplot(dataframe)+geom_point(aes(x=T, y=var1))
}
I would like to pass a data frame and a column name to it so that I can repeatedly plot different columns.
df <- data.frame(T=(1:10), y1=(21:30), y2=(51:60), y3 = (61:70))
But when I do:
library(ggplot2)
plottest(df, y1)
An error message shows up saying: object 'var1' not found.
What should I do to make this work?
Try:
df <- data.frame(T=(1:10), y1=(21:30), y2=(51:60), y3 = (61:70))
plottest<-function(dataframe, var1){
ggplot(dataframe, aes_string(x='T', y=var1))+geom_point()
}
plottest(df, 'y1')
It would be even cleaner to fix the abscissa as a default parameter in the function arguments.
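A minimal sketch of that suggestion follows. The argument names xvar and yvar are hypothetical. It uses the .data pronoun instead of aes_string(), which assumes ggplot2 3.0.0 or later.

```r
library(ggplot2)

# xvar defaults to "T" but can be overridden by the caller.
# xvar and yvar are hypothetical argument names; assumes ggplot2 >= 3.0.0.
plottest <- function(dataframe, yvar, xvar = "T") {
  ggplot(dataframe, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_point()
}

df <- data.frame(T = 1:10, y1 = 21:30, y2 = 51:60, y3 = 61:70)
p1 <- plottest(df, "y1")               # uses the default abscissa T
p2 <- plottest(df, "y3", xvar = "y2")  # overrides the abscissa
```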
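You can make your original function work with a simple added line, plus !! inside aes() (I also added the ylab() call to prevent the plot from always using "y" as the y-axis label):
plottest<-function(dataframe, var1){
y <- substitute(var1)  # capture the column name as a symbol
# !! unquotes the symbol so aes() maps the column (assumes ggplot2 >= 3.0.0)
ggplot(dataframe)+geom_point(aes(x=T, y=!!y)) + ylab(deparse(y))
}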
library(ggplot2)
plottest(df, y1)

Plotting inside function: subset(df,id_==...) gives wrong plot, df[df$id_==...,] is right

I have a df with multiple y-series which I want to plot individually, so I wrote a fn that selects one particular series, assigns to a local variable dat, then plots it. However ggplot/geom_step when called inside the fn doesn't treat it properly like a single series. I don't see how this can be a scoping issue, since if dat wasn't visible, surely ggplot would fail?
You can verify the code is correct when executed from the toplevel environment, but not inside the function. This is not a duplicate question. I understand the problem (this is a recurring issue with ggplot), but I've read all the other answers; this is not a duplicate and they do not give the solution.
set.seed(1234)
require(ggplot2)
require(scales)
N = 10
df <- data.frame(x = 1:N,
id_ = c(rep(20,N), rep(25,N), rep(33,N)),
y = c(runif(N, 1.2e6, 2.9e6), runif(N, 5.8e5, 8.9e5) ,runif(N, 2.4e5, 3.3e5)),
row.names=NULL)
plot_series <- function(id_, envir=environment()) {
dat <- subset(df,id_==id_)
p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
# Unsuccessfully trying the approach from http://stackoverflow.com/questions/22287498/scoping-of-variables-in-aes-inside-a-function-in-ggplot
p$plot_env <- envir
plot(p)
# Displays wrongly whether we do the plot here inside fn, or return the object to parent environment
return(p)
}
# BAD: doesn't plot geom_step!
plot_series(20)
# GOOD! but what's causing the difference?
ggplot(data=subset(df,id_==20), mapping=aes(x,y), color='red') + geom_step()
#plot_series(25)
#plot_series(33)
This works fine:
plot_series <- function(id_) {
dat <- df[df$id_ == id_,]
p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
return(p)
}
print(plot_series(20))
If you simply step through the original function using debug, you'll quickly see that the subset line did not actually subset the data frame at all: it returned all rows!
Why? Because subset uses non-standard evaluation and you used the same name for both the column name and the function argument. As jlhoward's answer demonstrates, it would have worked (but probably not been advisable) to have simply used different names for the two.
The reason is that subset evaluates with the data frame first. So all it sees in the logical expression is the always true id_ == id_ within that data frame.
One way to think about it is to play dumb (like a computer) and ask yourself when presented with the condition id_ == id_ how do you know what exactly each symbol refers to. It's ambiguous, and subset makes a consistent choice: use what's in the data frame.
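The effect is easy to reproduce in base R without any plotting. This is a sketch with a small made-up data frame and the hypothetical function names select_bad and select_good.

```r
# Made-up data: two series of three rows each
df <- data.frame(x = 1:6, id_ = rep(c(20, 25), each = 3))

# Both sides of == resolve to the column, so every row is kept
select_bad <- function(id_) subset(df, id_ == id_)

# df$id_ is the column and the bare id_ is the function argument
select_good <- function(id_) df[df$id_ == id_, ]

n_bad  <- nrow(select_bad(20))   # all six rows
n_good <- nrow(select_good(20))  # only the three rows with id_ equal to 20
```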
Notwithstanding the comments, this works:
plot_series <- function(z, envir=environment()) {
dat <- subset(df,id_==z)
p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
p$plot_env <- envir
plot(p)
# Displays wrongly whether we do the plot here inside fn, or return the object to parent environment
return(p)
}
plot_series(20)
The problem seems to be that subset is interpreting id_ on the RHS of the == as identical to the LHS, so this is equivalent to subsetting on TRUE, which of course includes all the rows of df. That's the plot you are seeing.

Function that returns an aesthetic mapping

I would like to create a function that "works just like" ggplot2's aes() function. My humble attempts fail with an "Object not found" error:
library(ggplot2)
data <- data.frame(a=1:5, b=1:5)
# Works
ggplot(data) + geom_point() + aes(x=a, y=b)
my.aes <- function(x, y) { aes(x=x, y=y) }
# Fails with "Error in eval(expr, envir, enclos) : object 'x' not found"
ggplot(data) + geom_point() + my.aes(x=a, y=b)
What is the correct way to implement my.aes()? This is for encapsulation and code reuse.
Perhaps this is related, I just don't see yet how:
How to write an R function that evaluates an expression within a data-frame.
Type aes without any parentheses or arguments to see what it's doing:
function (x, y, ...)
{
aes <- structure(as.list(match.call()[-1]), class = "uneval")
rename_aes(aes)
}
It takes the name of its arguments without evaluating them. It's basically saving the names for later so it can evaluate them in the context of the data frame you're trying to plot (that's why your error message is complaining about eval). So when you include my.aes(x=a, y=b) in your ggplot construction, it's looking for x in data--because x was not evaluated in aes(x=x, y=y).
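The capture step can be tried on its own in base R. This sketch uses a hypothetical function capture_args that mimics only the match.call() part of the old aes() shown above.

```r
# Hypothetical stand-in for the capturing part of the old aes()
capture_args <- function(x, y, ...) {
  as.list(match.call()[-1])  # the arguments as unevaluated expressions
}

# Neither a nor b exists here; the expressions are stored, not evaluated
captured <- capture_args(x = a, y = b * 2)

# Evaluation happens later, in whatever context is supplied
later <- eval(captured$y, list(b = 1:5))
```

Here captured$x is the symbol a, and captured$y is the call b * 2, which is evaluated only once data is supplied.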
An alternate way of thinking about what's going on in aes is something like
my.aes <- function(x, y) {
ans <- list(x = substitute(x), y = substitute(y))
class(ans) <- "uneval"
ans
}
which should work in the example above, but see the note in plyr::. (which uses the same match.call()[-1] paradigm as aes):
Similar tricks can be performed with substitute, but when functions
can be called in multiple ways it becomes increasingly tricky to
ensure that the values are extracted from the correct frame.
Substitute tricks also make it difficult to program against the
functions that use them, while the quoted class provides
as.quoted.character to convert strings to the appropriate data
structure.
If you want my.aes to call aes itself, perhaps something like:
my.aes <- function(x,y) {
do.call(aes, as.list(match.call()[-1]))
}
Example with the aes_string function pointed out by Roman Luštrik:
my.aes <- function(x,y) {
aes_string(x = x, y = y)
}
but you would need to change your call to my.aes("a", "b") in this case.
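With newer versions there is another way to keep the original call my.aes(x=a, y=b). This is a sketch, assuming ggplot2 3.0.0 or later and rlang 0.4.0 or later. It relies on the embrace operator, which is not what the answers above used.

```r
library(ggplot2)

data <- data.frame(a = 1:5, b = 1:5)

# {{ }} passes the caller's bare names through to aes().
# Assumes ggplot2 >= 3.0.0 and rlang >= 0.4.0.
my.aes <- function(x, y) {
  aes(x = {{ x }}, y = {{ y }})
}

p <- ggplot(data) + geom_point() + my.aes(x = a, y = b)
```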

Local Variables Within aes

I'm trying to use a local variable in aes when I plot with ggplot. This is my problem boiled down to the essence:
xy <- data.frame(x=1:10,y=1:10)
plotfunc <- function(Data,YMul=2){
ggplot(Data,aes(x=x,y=y*YMul))+geom_line()
}
plotfunc(xy)
This results in the following error:
Error in eval(expr, envir, enclos) : object 'YMul' not found
It seems as if I cannot use local variables (or function arguments) in aes. Could it be that it occurs due to the content of aes being executed later when the local variable is out of scope? How can I avoid this problem (other than not using the local variable within aes)?
I would capture the local environment,
xy <- data.frame(x=1:10,y=1:10)
plotfunc <- function(Data, YMul = 2){
.e <- environment()
ggplot(Data, aes(x = x, y = y*YMul), environment = .e) + geom_line()
}
plotfunc(xy)
Here's an alternative that allows you to pass in any value through the YMul argument without having to add it to the Data data.frame or to the global environment:
plotfunc <- function(Data, YMul = 2){
eval(substitute(
expr = {
ggplot(Data,aes(x=x,y=y*YMul)) + geom_line()
},
env = list(YMul=YMul)))
}
plotfunc(xy, YMul=100)
To see how this works, try out the following line in isolation:
substitute({ggplot(Data, aes(x=x, y=y*YMul))}, list(YMul=100))
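The same replacement can be tried in base R without ggplot2. In this small sketch the data list passed to eval() is made up for illustration.

```r
YMul <- 100

# substitute() replaces the symbol YMul with its current value
expr <- substitute(y * YMul, list(YMul = YMul))
expr_text <- deparse(expr)

# The resulting expression no longer depends on a variable called YMul
result <- eval(expr, list(y = 1:3))
```

Because the value is baked into the expression, it no longer matters whether YMul is still in scope when the expression is eventually evaluated.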
ggplot()'s aes expects YMul to be a variable within the data data frame. Try including YMul there instead:
Thanks to @Justin: ggplot()'s aes seems to look for YMul in the data data frame first, and if not found, then in the global environment. I like to add such variables to the data frame, as follows, as it makes sense to me conceptually. I also don't have to worry about changes to global variables having unexpected consequences for functions. But all of the other answers are also correct. So, use whichever suits you.
require("ggplot2")
xy <- data.frame(x = 1:10, y = 1:10)
xy <- cbind(xy, YMul = 2)
ggplot(xy, aes(x = x, y = y * YMul)) + geom_line()
Or, if you want the function in your example:
plotfunc <- function(Data, YMul = 2)
{
ggplot(cbind(Data, YMul), aes(x = x, y = y * YMul)) + geom_line()
}
plotfunc(xy)
I am using ggplot2, and your example seems to work fine with the current version.
However, it is easy to come up with variants which still create trouble. I was myself confused by similar behavior, and that's how I found this post (top Google result for "ggplot how to evaluate variables when passed"). For example, if we move ggplot out of plotfunc:
xy <- data.frame(x=1:10,y=1:10)
plotfunc <- function(Data,YMul=2){
geom_line(aes(x=x,y=y*YMul))
}
ggplot(xy)+plotfunc(xy)
# Error in eval(expr, envir, enclos) : object 'YMul' not found
In the above variant, "capturing the local environment" is not a solution because ggplot is not called from within the function, and only ggplot has the "environment=" argument.
But there is now a family of functions "aes_", "aes_string", "aes_q" which are like "aes" but capture local variables. If we use "aes_" in the above, we still get an error because now it doesn't know about "x". But it is easy to refer to the data directly, which solves the problem:
plotfunc <- function(Data,YMul=2){
geom_line(aes_(x=Data$x,y=Data$y*YMul))
}
ggplot(xy)+plotfunc(xy)
# works
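For comparison, here is a sketch that assumes ggplot2 3.0.0 or later, where aes() records the environment in which it was called. Under that assumption both variants should find the local YMul without any workaround. The name layerfunc is hypothetical.

```r
library(ggplot2)

xy <- data.frame(x = 1:10, y = 1:10)

# Variant 1: ggplot() is called inside the function
plotfunc <- function(Data, YMul = 2) {
  ggplot(Data, aes(x = x, y = y * YMul)) + geom_line()
}

# Variant 2: only the layer is built inside the function (hypothetical name)
layerfunc <- function(YMul = 2) {
  geom_line(aes(x = x, y = y * YMul))
}

p1 <- plotfunc(xy, YMul = 3)
p2 <- ggplot(xy) + layerfunc(YMul = 5)
```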
Have you looked at the solution given by @wch (W. Chang)?
https://github.com/hadley/ggplot2/issues/743
I think it is the better one. It is essentially like that of @baptiste, but includes the reference to the environment directly in the call to ggplot.
I reproduce it here:
g <- function() {
foo3 <- 4
ggplot(mtcars, aes(x = wt + foo3, y = mpg),
environment = environment()) +
geom_point()
}
g()
# Works
If you execute your code outside of the function it works. And if you execute the code within the function with YMul defined globally, it works. I don't fully understand the inner workings of ggplot but this works...
YMul <- 2
plotfunc <- function(Data){
ggplot(Data,aes(x=x,y=y*YMul))+geom_line()
}
plotfunc(xy)
