I have a huge dataframe whose variable/column names start with a number, such as `1_variable`. I am trying to write a function that takes these column names as arguments and then draws a few boxplots with ggplot. The problem is that I need the name as a string for subsetting, but ggplot needs it wrapped in backticks. I am not sure how to turn a character string such as "1_variable" into the backticked input `1_variable` that ggplot expects.
Small reproducible example:
dfx = data.frame(`1ev`=c(rep(1,5), rep(2,5)), `2ev`=sample(10:99, 10),
`3ev`=10:1, check.names = FALSE)
If I were to plot the figure manually, the input would look like this:
dfx$`1ev` <- as.factor(dfx$`1ev`)
ggplot(dfx, aes(x = `1ev`, y = `2ev`))+
geom_boxplot()
the function I'd like to be able to run for the dataframe is this one:
plot_boxplot <- function(data, group, value){
data = data[c(group, value)]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes(x = group, y = value))+
geom_boxplot()
return(plot)
}
1. Try
plot_boxplot(dfx, `1ev`, `2ev`)
which gives me the error: Error in `[.data.frame`(data, c(group, value)) : object '1ev' not found
2. Try
entering the arguments with double quotes "" gives me unexpectedly this:
plot_boxplot(dfx, "1ev", "2ev")
3. Try
I also tried to replace the double quotes of the string with gsub in the function
gsub('\"', '`', group)
but that does not change anything about the output.
4. Try
Finally, I also tried aes_string, but that just gives me the same errors.
plot_boxplot <- function(data, group, value){
data = data[c(as.character(group), as.character(value))]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes_string(x= group, y=value))+
geom_boxplot()
return(plot)
}
plot_boxplot(dfx, `1ev`, `2ev`)
plot_boxplot(dfx, "1ev", "2ev")
Ideally I would like to run the function to produce this output:
plot_boxplot(dfx, group = "1ev", value = "2ev")
[can be produced with this code manually]
ggplot(dfx, aes(x= `1ev`, y=`2ev`)) +
geom_boxplot()
Any help would be greatly appreciated.
One way to do this is a combination of aes_ and as.name():
plot_boxplot <- function(data, group, value){
data = data[c(group, value)]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes_(x= as.name(group), y=as.name(value))) +
geom_boxplot()
return(plot)
}
And passing in strings for group and value:
plot_boxplot(dfx, "1ev", "2ev")
It's not the same plot you show above, but it looks to align with the data.
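As an alternative sketch, assuming ggplot2 3.0.0 or later (where aes_() is soft-deprecated): the .data pronoun can look up a column by its string name inside a regular aes(), so no backticks or as.name() are needed. The data below is a fixed stand-in for the question's dfx, and the labs() call is only there to keep the axis titles readable.

```r
library(ggplot2)

# Fixed stand-in for the question's dfx (column names start with a digit)
dfx <- data.frame(`1ev` = c(rep(1, 5), rep(2, 5)),
                  `2ev` = 11:20,
                  `3ev` = 10:1, check.names = FALSE)

plot_boxplot <- function(data, group, value) {
  # [[ ]] works with any column name, including non-syntactic ones
  data[[group]] <- as.factor(data[[group]])
  # .data[[name]] looks the column up by its string name inside aes()
  ggplot(data, aes(x = .data[[group]], y = .data[[value]])) +
    geom_boxplot() +
    labs(x = group, y = value)
}

p <- plot_boxplot(dfx, "1ev", "2ev")
```

Printing p should give one box per level of `1ev`.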
I'm writing a wrapper function around ggplot2 and having difficulty with one of the string arguments passed. Here's the sample code
myPlot <- function(tim, labx){
ggplot(subset(dat, TIME == tim), aes(x=WT, y=Var))+
geom_point(size=2)+
facet_wrap(~Dose)+
scale_x_continuous(expression(bold("Predicted"~labx~"Concentration ("*mu*"g/mL)")))
}
When I call myPlot(100, "Week3"), my x-axis label shows as "Predicted labx Concentration (µg/mL)" instead of "Predicted Week3 Concentration (µg/mL)". How do I fix this?
One solution is to use bquote() instead of expression(), and use .() inside of bquote to evaluate character (string) variables.
Below is a fully reproducible example of this approach.
library(ggplot2)
labx = "Week3"
p = ggplot(data.frame(x=1:5, y=1:5), aes(x, y)) +
geom_point() +
xlab(bquote(bold(Predicted~.(labx)~Concentration~(mu*g/mL))))
p
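For completeness, here is a sketch of the same idea placed back inside a wrapper like the one in the question. The data frame dat and the helper conc_label() are made up for illustration; only bquote() and .() come from the answer above.

```r
library(ggplot2)

# Hypothetical helper: builds the plotmath label with the value of labx filled in
conc_label <- function(labx) {
  bquote(bold("Predicted" ~ .(labx) ~ "Concentration (" * mu * "g/mL)"))
}

# Hypothetical stand-in for the question's dat
dat <- data.frame(TIME = rep(c(50, 100), each = 4),
                  WT   = c(60, 70, 80, 90, 65, 75, 85, 95),
                  Var  = c(1.2, 1.5, 1.9, 2.4, 1.1, 1.6, 2.0, 2.6),
                  Dose = rep(c(10, 20), times = 4))

myPlot <- function(tim, labx) {
  ggplot(subset(dat, TIME == tim), aes(x = WT, y = Var)) +
    geom_point(size = 2) +
    facet_wrap(~Dose) +
    scale_x_continuous(conc_label(labx))
}

p <- myPlot(100, "Week3")
```

Because .(labx) is evaluated when bquote() runs, the label carries the string "Week3" rather than the symbol labx.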
I'm new to ggplot2; I've scoured the web but still couldn't figure this out.
I understand how to draw a boxplot in ggplot2. My problem is that I can't pass along the variable I use for groups when the plot call is inside a function.
Normally (i.e. NOT inside a function), I would write this:
ggplot(myData, aes(factor(Variable1), Variable2)) +
geom_boxplot(fill="grey", colour="black")+
labs(title = "Variable1 vs. Variable2" ) +
labs (x = "variable1", y = "Variable2")
Where myData is my data frame
Variable 1 is a 2 level factor variable
Variable 2 is a continuous variable
I want to make boxplots of Variable 1 by its 2 levels/groups
and this works fine,
but as soon as I write this inside a function I couldn't get it to work.
my attempt in writing the function:
myfunction = function (data, Variable1) {
ggplot(data=myData, aes_string(factor("Variable1"), "Variable2"))+
geom_boxplot(fill="grey", colour="black")+
labs(title = paste("Variable1 vs. Variable2" )) +
labs (x = "variable1", y = "Variable2")
}
This only gives me a single boxplot (instead of 2), as if it never understood the factor(Variable1) command: it draws one boxplot of all of Variable2 rather than splitting it by the levels of Variable1 first.
aes_string() evaluates the entire string, so if you use sprintf("factor(%s)", Variable1) you get the desired result. As a further remark: your function has a data argument, but inside the plotting code you use myData. I have also edited the x label and title so that you can pass "Variable3" and get proper labels.
With some example data:
set.seed(123)
dat <- data.frame(Variable2=rnorm(100),Variable1=c(0,1),Variable3=sample(0:1,100,TRUE))
myfunction = function (data, Variable1) {
ggplot(data=data, aes_string(sprintf("factor(%s)",Variable1), "Variable2"))+
geom_boxplot(fill="grey", colour="black")+
labs(title = sprintf("%s and Variable2", Variable1)) +
labs (x = Variable1, y = "Variable2")
}
p1 <- myfunction(dat,"Variable1")
p2 <- myfunction(dat,"Variable3")
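If you would rather not build code strings with sprintf(), a possible alternative (assuming ggplot2 3.0.0 or later) is to wrap the .data pronoun in factor() directly. The argument name group_var is my own choice; the data is the same as in the example above.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(Variable2 = rnorm(100),
                  Variable1 = c(0, 1),
                  Variable3 = sample(0:1, 100, TRUE))

myfunction <- function(data, group_var) {
  # factor() is applied to the column selected by its string name
  ggplot(data, aes(x = factor(.data[[group_var]]), y = Variable2)) +
    geom_boxplot(fill = "grey", colour = "black") +
    labs(title = sprintf("%s and Variable2", group_var),
         x = group_var, y = "Variable2")
}

p1 <- myfunction(dat, "Variable1")
p3 <- myfunction(dat, "Variable3")
```

Each call should produce two boxes, one per level of the grouping column.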
I am trying to write a function that calls ggplot with varying arguments to the aes:
hmean <- function(data, column, Label=label){
ggplot(data,aes(column)) +
geom_histogram() +
facet_wrap(~Antibody,ncol=2) +
ggtitle(paste("Mean Antibody Counts (Log2) for ",Label," stain"))
}
hmean(Log2Means,Primary.Mean, Label="Primary")
Error in eval(expr, envir, enclos) : object 'column' not found
Primary.Mean is the varying argument (I have multiple means). Following various posts here I have tried
passing the column name quoted and unquoted (which yields either an "unexpected string constant" or the "object not found" error)
setting up a local environment (foo <- environment() followed by an environment= arg in ggplot)
creating a new copy of the data set using data2$column <- data[,column]
None of these appear to work within ggplot. How do I write a function that works?
I will be calling it with different data.frames and columns:
hmean(Log2Means, Primary.mean, Label="Primary")
hmean(Log2Means, Secondary.mean, Label="Secondary")
hmean(SomeOtherFrame, SomeColumn, Label="Pretty Label")
Your example is not reproducible, but this will likely work:
hmean <- function(data, column, Label=label){
ggplot(data, do.call("aes", list(x = substitute(column))) ) +
geom_histogram() +
facet_wrap(~Antibody,ncol=2) +
ggtitle(paste("Mean Antibody Counts (Log2) for ",Label," stain"))
}
hmean(Log2Means,Primary.Mean, Label="Primary")
If you need more arguments to aes, do like this:
do.call("aes", list(y = substitute(function_parameter), x = quote(literal_parameter)))
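A shorter route that may work on newer setups (this assumes ggplot2 3.0.0+ with rlang 0.4.0+, which is not what the question was written against) is the embrace operator {{ }}, which forwards an unquoted column argument into aes(). The Log2Means data here is invented for the sketch, and bins = 10 is an arbitrary choice.

```r
library(ggplot2)

# Invented stand-in for the question's Log2Means
set.seed(1)
Log2Means <- data.frame(Primary.Mean = rnorm(40, mean = 8),
                        Antibody = rep(c("A", "B"), each = 20))

hmean <- function(data, column, Label) {
  # {{ column }} passes the caller's unquoted column name through to aes()
  ggplot(data, aes(x = {{ column }})) +
    geom_histogram(bins = 10) +
    facet_wrap(~Antibody, ncol = 2) +
    ggtitle(paste("Mean Antibody Counts (Log2) for", Label, "stain"))
}

p <- hmean(Log2Means, Primary.Mean, Label = "Primary")
```

The call site stays exactly as in the question: the column is passed unquoted.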
You could try this:
hmean <- function(data, column, Label=label){
# cool trick?
data$pColumn <- data[, column]
ggplot(data,aes(pColumn)) +
geom_histogram() +
facet_wrap(~Antibody,ncol=2) +
ggtitle(paste("Mean Antibody Counts (Log2) for ",Label," stain"))
}
hmean(Log2Means,'Primary.Mean', Label="Primary")
I eventually got it to work with an aes_string() call: aes_string(x=foo, y=y, colour=color), where y and color were also defined outside ggplot().
I'm trying to write a simple plot function, using the ggplot2 library. But the call to ggplot doesn't find the function argument.
Consider a data.frame called means that stores two conditions and two mean values that I want to plot (condition will appear on the X axis, means on the Y).
library(ggplot2)
m <- c(13.8, 14.8)
cond <- c(1, 2)
means <- data.frame(means=m, condition=cond)
means
# The output should be:
# means condition
# 1 13.8 1
# 2 14.8 2
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=meansdf$condition, y=meansdf$means, x = meansdf$condition))
p + geom_bar(position="dodge", stat="identity")
}
testplot(means)
# This will output the following error:
# Error in eval(expr, envir, enclos) : object 'meansdf' not found
So it seems that ggplot is calling eval, which can't find the argument meansdf. Does anyone know how I can successfully pass the function argument to ggplot?
(Note: Yes I could just call the ggplot function directly, but in the end I hope to make my plot function do more complicated stuff! :) )
The "proper" way to use ggplot programmatically is to use aes_string() instead of aes() and use the names of the columns as characters rather than as objects:
For more programmatic uses, for example if you wanted users to be able to specify column names for various aesthetics as arguments, or if this function is going in a package that needs to pass R CMD CHECK without warnings about variable names without definitions, you can use aes_string(), with the columns needed as characters.
testplot <- function(meansdf, xvar = "condition", yvar = "means",
fillvar = "condition") {
# return the plot visibly so that calling testplot() at the console draws it
ggplot(meansdf,
aes_string(x = xvar, y= yvar, fill = fillvar)) +
geom_bar(position="dodge", stat="identity")
}
As Joris and Chase have already correctly answered, standard best practice is to simply omit the meansdf$ part and directly refer to the data frame columns.
testplot <- function(meansdf)
{
p <- ggplot(meansdf,
aes(fill = condition,
y = means,
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
This works because the variables referred to in aes are looked up either in the global environment or in the data frame passed to ggplot. That is also why your example code, which uses meansdf$condition etc., did not work: meansdf is neither available in the global environment nor inside the data frame passed to ggplot, which is meansdf itself.
The fact that the variables are looked for in the global environment instead of in the calling environment is actually a known bug in ggplot2 that Hadley does not consider fixable at the moment.
This leads to problems if one wishes to use a local variable, say, scale, to influence the data used for the plot:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale, # does not work, since scale is not found
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
A very nice workaround for this case is provided by Winston Chang in the referenced GitHub issue: Explicitly setting the environment parameter to the current environment during the call to ggplot.
Here's what that would look like for the above example:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale,
x = condition),
environment = environment()) # This is the only line changed / added
p + geom_bar(position = "dodge", stat = "identity")
}
## Now, the following works
testplot(means)
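A note for newer installations: as far as I know, from ggplot2 3.0.0 onward aes() captures the environment it is called from, and the environment argument of ggplot() is documented as deprecated. Under that assumption, the local variable is found without any extra argument. A minimal sketch with the same data:

```r
library(ggplot2)

means <- data.frame(means = c(13.8, 14.8), condition = c(1, 2))

testplot <- function(meansdf) {
  scale <- 0.5  # local variable used inside aes()
  ggplot(meansdf,
         aes(fill = condition,
             y = means * scale,  # resolved in the function's environment
             x = condition)) +
    geom_bar(position = "dodge", stat = "identity")
}

p <- testplot(means)
```

Column names are still looked up in the data frame first, so means refers to the column and scale to the local variable.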
Here is a simple trick I use a lot to define my variables in my function's environment (second line):
FUN <- function(fun.data, fun.y) {
fun.data$fun.y <- fun.data[, fun.y]
ggplot(fun.data, aes(x, fun.y)) +
geom_point() +
scale_y_continuous(fun.y)
}
# x must exist before it can be used to build the y and z columns
x <- rnorm(100, 0, 1)
datas <- data.frame(x = x,
y = x + rnorm(100, 2, 2),
z = x + rnorm(100, 5, 10))
FUN(datas, "y")
FUN(datas, "z")
Note how the y-axis label also changes when different variables or data-sets are used.
I don't think you need to include the meansdf$ part in your function call itself. This seems to work on my machine:
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=condition, y=means, x = condition))
p + geom_bar(position="dodge", stat="identity")
}
testplot(meansdf)
to produce the expected bar chart.
This is an example of a problem that is discussed earlier. Basically, it comes down to ggplot2 being coded for use in the global environment mainly. In the aes() call, the variables are looked for either in the global environment or within the specified dataframe.
library(ggplot2)
means <- data.frame(means=c(13.8,14.8),condition=1:2)
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=condition,
y=means, x = condition))
p + geom_bar(position="dodge", stat="identity")
}
EDIT: After seeing the other answer and updating the ggplot2 package, the code above works. The reason, as explained in the comments, is that ggplot looks for the variables in aes either in the global environment (when the dataframe is explicitly referenced as meansdf$...) or within the mentioned environment.
So be sure you are working with the latest version of ggplot2.
If it is important to pass the variables (column names) to the custom plotting function unquoted, while different variable names are used within the function, then another workaround that I tried is to make use of match.call() and eval():
library(ggplot2)
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(df, x, y) {
arg <- match.call()
scale <- 0.5
p <- ggplot(df, aes(x = eval(arg$x),
y = eval(arg$y) * scale,
fill = eval(arg$x)))
p + geom_bar(position = "dodge", stat = "identity")
}
testplot(meansdf, condition, means)
Created on 2019-01-10 by the reprex package (v0.2.1)
Another workaround, this time passing quoted variable names to the custom plotting function, is to use get():
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(df, x, y) {
scale <- 0.5
p <- ggplot(df, aes(x = get(x),
y = get(y) * scale,
fill = get(x)))
p + geom_bar(position = "dodge", stat = "identity")
}
testplot(meansdf, "condition", "means")
This frustrated me for some time. I wanted to send different data frames with different variable names, and I wanted the ability to plot different columns from the data frame. I finally found a workaround: creating some dummy (global) variables to handle plotting and forcing assignment inside the function:
plotgraph <- function(df,df.x,df.y) {
dummy.df <<- df
dummy.x <<- df.x
dummy.y <<- df.y
p = ggplot(dummy.df,aes(x=dummy.x,y=dummy.y,.....))
print(p)
}
then in the main code I can just call the function
plotgraph(data,data$time,data$Y1)
plotgraph(data,data$time,data$Y2)
Short answer: Use qplot
Long answer:
In essence you want something like this:
my.barplot <- function(x=this.is.a.data.frame.typically) {
# R code doing the magic comes here
...
}
But that lacks flexibility, because you must stick to consistent column naming to avoid the annoying R scope idiosyncrasies. Of course the next logical step is:
my.barplot <- function(data=data.frame(), x=..., y....) {
# R code doing something really really magical here
...
}
But then that starts looking suspiciously like a call to qplot(), right?
qplot(data=my.data.frame, x=some.column, y=some.other.column,
geom="bar", stat="identity",...)
Of course now you'd like to change things like scale titles, and for that a function comes in handy... the good news is that scoping issues are mostly gone.
my.plot <- qplot(data=my.data.frame, x=some.column, y=some.other.column,...)
# xscale and yscale are scale functions; their first argument (name) is the axis title
set.scales <- function(p, xscale=scale_x_continuous, xtitle=NULL,
yscale=scale_y_continuous, ytitle=NULL) {
return(p + xscale(name=xtitle) + yscale(name=ytitle))
}
my.plot.prettier <- set.scales(my.plot, scale_x_discrete, 'Days',
scale_y_discrete, 'Count')
Another workaround is to pass the aes(...) mapping as an argument of your function:
func <- function(meansdf, mapping) { ggplot(meansdf, mapping) }
This worked fine for me on a similar problem.
You don't need anything fancy, not even dummy variables. You only need to add a print() inside your function; it is like using cat() when you want something to show in the console.
myplot <- ggplot(......) + Whatever you want here
print(myplot)
It has worked for me more than once inside the same function.
I just generate new data frame variables with the desired names inside the function:
testplot <- function(df, xVar, yVar, fillVar) {
df$xVar = df[,which(names(df)==xVar)]
df$yVar = df[,which(names(df)==yVar)]
df$fillVar = df[,which(names(df)==fillVar)]
# use the columns created above (R names are case-sensitive) and return the plot visibly
ggplot(df,
aes(x=xVar, y=yVar, fill=fillVar)) +
geom_bar(position="dodge", stat="identity")
}