How to pass column names to a function that processes data.frames - r

I'm plotting lots of similar graphs so I thought I write a function to simplify the task. I'd like to pass it a data.frame and the name of the column to be plotted. Here is what I have tried:
plot_individual_subjects <- function(var, data)
{
require(ggplot2)
ggplot(data, aes(x=Time, y=var, group=Subject, colour=SubjectID)) +
geom_line() + geom_point() +
geom_text(aes(label=Subject), hjust=0, vjust=0)
}
Now if var is a string it will not work. It will not work either if change the aes part of the ggplot command to y=data[,var] and it will complain about not being able to subset a closure.
So what is the correct way/best practice to solve this and similar problems? How can I pass column names easily and safely to functions that would like to do processing on data.frames?

Bad Joran, answering in the comments!
You want to use aes_string, which allows you to pass variable names as strings. In your particular case, since you only seem to want to modify the y variable, you probably want to reorganize which aesthetics are mapped in which geoms. For instance, maybe something like this:
ggplot(data, aes_string(y = var)) +
geom_line(aes(x = Time,group = Subject,colour = SubjectID)) +
geom_point(aes(x = Time,group = Subject,colour = SubjectID)) +
geom_text(aes(x = Time,group = Subject,colour = SubjectID,label = Subject),hjust =0,vjust = 0)
or perhaps the other way around, depending on your tastes.

Related

How can I print mutliple (81) ggplots from a for loop?

I have a large, long dataset like this example:
df <- data.frame("Sample" = c("CM","PB","CM","PB"),"Compound" = c("Hydrogen","Hydrogen","Helium","Helium"), "Value" = c(8,3,3,2))
however I have about 162 rows (81 sample/compound pairs)
I am trying to write a loop that prints individual geom_col() plots of each compound where
x=Sample
y=Value
and there are 81 plots for each compound.
I think I am close with this loop:
I want i in "each compound" etc.
for (i in df$Compound){
print(ggplot(data = i),
aes(x=Sample,
y=Value))+
geom_col()
}
What am I missing from this loop? I have also tried facet_wrap(~Compound) However it looks like 81 is too large and each plot is tiny once made. I am looking for a full size bar graph of each compound.
Two issues with your code:
Your aes needs to be combined with ggplot(.) somehow, not as a second argument to print.
Your geom_col needs to be added to the ggplot(.) chain, not to print.
I think then that your code should be
for (i in df$Compound){
print(
ggplot(data = i) +
aes(x = Sample, y = Value) +
geom_col()
)
}
A known-working example:
for (CYL in unique(mtcars$cyl)) {
print(
ggplot(subset(mtcars, cyl == CYL), aes(mpg, disp)) +
geom_point() +
labs(title = paste("cyl ==", CYL))
)
}
produces three plots (rapidly).
Note:
If you want a break, consider adding readline("next ...") after your print.
I tend to use gg <- ggplot(..) + ... ; print(gg) (instead of print(ggplot(.)+...)) mostly out of habit, but it can provide a little clarity in errors if/when they occur. It's minor and perhaps more technique than anything.
I think you can loop and pull out the selected data set for each index.
for (i in df$Compound){
print(ggplot(data = df[df$Compound == i,],
aes(x=Sample,
y=Value))+
geom_col())
}
(This code also fixes the problems/misplaced parentheses pointed out by #r2evans)
There are a variety of other ways to do this, e.g. split() the data frame by Compound, or something tidyverse-ish, or ...

What exactly is ggplot doing with the `+` operator?

I started using R recently, and have been confused with ggplot which my class is using. I'm used to the + operator just adding two outputs, but I see that in ggplot you can things such as:
ggplot(data = bechdel, aes(x = domgross_2013)) +
geom_histogram(bins = 10, color="purple", fill="white") +
labs(title = "Domestic Growth of Movies", x = " Domestic Growth")
Here we are adding two function calls together. What exactly is happening here? Is ggplot "overriding" the + operator (maybe like how you can override the == operator in dart?) in order to do something different? Or is it that the '+' operator means something different in R than I am used to with other programming languages?
I'll answer the first question. You should ask the second question in a separate posting.
R lets you override most operators. The easiest way to do it is using the "S3" object system. This is a very simple system where you attach an attribute named "class" to the object, and that affects how R processes some functions. (The ones this applies to are called "generic functions". There are other functions that don't pay any attention to the class.)
Each ggplot2 function returns an object with a class. You can use the class() function to get the class. For example, class(ggplot(data = "mtcars")) is a character vector containing c("gg", "ggplot"), and class(geom_histogram(bins = 10, color="purple", fill="white")) is the vector c("LayerInstance","Layer","ggproto","gg").
If you ask for methods("+") you'll see all the classes with methods defined for addition, and that includes "gg", so R will call that method to process the addition in the expression you used.
The + operator is part of the philosophy of ggplot2. It's inspired by The Grammar of Graphics, which is worth reading. Essentially, you keep creating new and new layers.
Try taking this one step at a time in your code and it should make sense!
one <- ggplot2::ggplot(data = mtcars) +
labs(title = "Mtcars", subtitle = "Blank Canvas")
two <- ggplot2::ggplot(data = mtcars, aes(x = mpg)) +
labs(title = "Mtcars", subtitle = "+ Aesthetic Mapping")
three <- ggplot2::ggplot(data = mtcars, aes(x = mpg, y = after_stat(count))) +
geom_histogram()
library(patchwork)
one + two + three

How to write a facet_wrap (ggplot2) within a function

I have written a function to plot a bar graph. But when I get to facet wrap the '~' sign is making things difficult.
rf.funct <- function(dat, predictor, feature){
ggplot(get(dat), aes(get(predictor), N)) +
geom_bar(stat = 'identity') +
facet_wrap(get(~feature)) # this is where the problem is
}
I've tried the following:
facet_wrap((get(~feature))) # invalid first argument
facet_wrap(paste0("~ ", get(feature))) # object 'feature' not found
How do i make sure the '~' sign gets included with the function?
You don't need to use get. You've passed the data frame into the function using the dat argument, so just feed dat to ggplot and it will have the data from within its environment.
rf.funct <- function(dat, predictor, feature) {
ggplot(dat, aes_string(predictor, "N")) +
geom_bar(stat = 'identity') +
facet_wrap(feature)
}
The predictor and feature arguments should be entered as strings. Then you can use aes_string to specify the aesthetics. facet_wrap can now take a character vector directly, without need of a formula (as pointed out by #WeihuangWong).
I was having a similar problem and the answers & comments on here helped me fix it. However, this post is about 6 years old now, and I think the most modern solution would be along these lines:
rf.funct <- function(dat, predictor, feature){
ggplot(dat, aes({{predictor}}, N)) +
geom_bar(stat = 'identity') +
facet_wrap(enquo(feature))
}

Refactoring recurring ggplot code

I'm using R and ggplot2 to analyze some statistics from basketball games. I'm new to R and ggplot, and I like the results I'm getting, given my limited experience. But as I go along, I find that my code gets repetitive; which I dislike.
I created several plots similar to this one:
Code:
efgPlot <- ggplot(gmStats, aes(EFGpct, Nrtg)) +
stat_smooth(method = "lm") +
geom_point(aes(colour=plg_ShortName, shape=plg_ShortName)) +
scale_shape_manual(values=as.numeric(gmStats$plg_ShortName))
Only difference between the plots is the x-value; next plot would be:
orPlot <- ggplot(gmStats, aes(ORpct, Nrtg)) +
stat_smooth(method = "lm") + ... # from here all is the same
How could I refactor this, such that I could do something like:
efgPlot <- getPlot(gmStats, EFGpct, Nrtg))
orPlot <- getPlot(gmStats, ORpct, Nrtg))
Update
I think my way of refactoring this isn't really "R-ish" (or ggplot-ish if you will); based on baptiste's comment below, I solved this without refactoring anything into a function; see my answer below.
The key to this sort of thing is using aes_string rather than aes (untested, of course):
getPlot <- function(data,xvar,yvar){
p <- ggplot(data, aes_string(x = xvar, y = yvar)) +
stat_smooth(method = "lm") +
geom_point(aes(colour=plg_ShortName, shape=plg_ShortName)) +
scale_shape_manual(values=as.numeric(data$plg_ShortName))
print(p)
invisible(p)
}
aes_string allows you to pass variable names as strings, rather than expressions, which is more convenient when writing functions. Of course, you may not want to hard code to color and shape scales, in which case you could use aes_string again for those.
Although Joran's answer helpt me a lot (and he accurately answers my question), I eventually solved this according to baptiste's suggestion:
# get the variablesI need from the stats data frame:
forPlot <- gmStats[c("wed_ID","Nrtg","EFGpct","ORpct","TOpct","FTTpct",
"plg_ShortName","Home")]
# melt to long format:
forPlot.m <- melt(forPlot, id=c("wed_ID", "plg_ShortName", "Home","Nrtg"))
# use fact wrap to create 4 plots:
p <- ggplot(forPlot.m, aes(value, Nrtg)) +
geom_point(aes(shape=plg_ShortName, colour=plg_ShortName)) +
scale_shape_manual(values=as.numeric(forPlot.m$plg_ShortName)) +
stat_smooth(method="lm") +
facet_wrap(~variable,scales="free")
Which gives me:

Use of ggplot() within another function in R

I'm trying to write a simple plot function, using the ggplot2 library. But the call to ggplot doesn't find the function argument.
Consider a data.frame called means that stores two conditions and two mean values that I want to plot (condition will appear on the X axis, means on the Y).
library(ggplot2)
m <- c(13.8, 14.8)
cond <- c(1, 2)
means <- data.frame(means=m, condition=cond)
means
# The output should be:
# means condition
# 1 13.8 1
# 2 14.8 2
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=meansdf$condition, y=meansdf$means, x = meansdf$condition))
p + geom_bar(position="dodge", stat="identity")
}
testplot(means)
# This will output the following error:
# Error in eval(expr, envir, enclos) : object 'meansdf' not found
So it seems that ggplot is calling eval, which can't find the argument meansdf. Does anyone know how I can successfully pass the function argument to ggplot?
(Note: Yes I could just call the ggplot function directly, but in the end I hope to make my plot function do more complicated stuff! :) )
The "proper" way to use ggplot programmatically is to use aes_string() instead of aes() and use the names of the columns as characters rather than as objects:
For more programmatic uses, for example if you wanted users to be able to specify column names for various aesthetics as arguments, or if this function is going in a package that needs to pass R CMD CHECK without warnings about variable names without definitions, you can use aes_string(), with the columns needed as characters.
testplot <- function(meansdf, xvar = "condition", yvar = "means",
fillvar = "condition") {
p <- ggplot(meansdf,
aes_string(x = xvar, y= yvar, fill = fillvar)) +
geom_bar(position="dodge", stat="identity")
}
As Joris and Chase have already correctly answered, standard best practice is to simply omit the meansdf$ part and directly refer to the data frame columns.
testplot <- function(meansdf)
{
p <- ggplot(meansdf,
aes(fill = condition,
y = means,
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
This works, because the variables referred to in aes are looked for either in the global environment or in the data frame passed to ggplot. That is also the reason why your example code - using meansdf$condition etc. - did not work: meansdf is neither available in the global environment, nor is it available inside the data frame passed to ggplot, which is meansdf itself.
The fact that the variables are looked for in the global environment instead of in the calling environment is actually a known bug in ggplot2 that Hadley does not consider fixable at the moment.
This leads to problems, if one wishes to use a local variable, say, scale, to influence the data used for the plot:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale, # does not work, since scale is not found
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
A very nice workaround for this case is provided by Winston Chang in the referenced GitHub issue: Explicitly setting the environment parameter to the current environment during the call to ggplot.
Here's what that would look like for the above example:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale,
x = condition),
environment = environment()) # This is the only line changed / added
p + geom_bar(position = "dodge", stat = "identity")
}
## Now, the following works
testplot(means)
Here is a simple trick I use a lot to define my variables in my functions environment (second line):
FUN <- function(fun.data, fun.y) {
fun.data$fun.y <- fun.data[, fun.y]
ggplot(fun.data, aes(x, fun.y)) +
geom_point() +
scale_y_continuous(fun.y)
}
datas <- data.frame(x = rnorm(100, 0, 1),
y = x + rnorm(100, 2, 2),
z = x + rnorm(100, 5, 10))
FUN(datas, "y")
FUN(datas, "z")
Note how the y-axis label also changes when different variables or data-sets are used.
I don't think you need to include the meansdf$ part in your function call itself. This seems to work on my machine:
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=condition, y=means, x = condition))
p + geom_bar(position="dodge", stat="identity")
}
testplot(meansdf)
to produce:
This is an example of a problem that is discussed earlier. Basically, it comes down to ggplot2 being coded for use in the global environment mainly. In the aes() call, the variables are looked for either in the global environment or within the specified dataframe.
library(ggplot2)
means <- data.frame(means=c(13.8,14.8),condition=1:2)
testplot <- function(meansdf)
{
p <- ggplot(meansdf, aes(fill=condition,
y=means, x = condition))
p + geom_bar(position="dodge", stat="identity")
}
EDIT:
update: After seeing the other answer and updating the ggplot2 package, the code above works. Reason is, as explained in the comments, that ggplot will look for the variables in aes in either the global environment (when the dataframe is specifically added as meandf$... ) or within the mentioned environment.
For this, be sure you work with the latest version of ggplot2.
If is important to pass the variables (column names) to the custom plotting function unquoted, while different variable names are used within the function, then another workaround that I tried, was to make use of match.call() and eval (like here as well):
library(ggplot2)
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(df, x, y) {
arg <- match.call()
scale <- 0.5
p <- ggplot(df, aes(x = eval(arg$x),
y = eval(arg$y) * scale,
fill = eval(arg$x)))
p + geom_bar(position = "dodge", stat = "identity")
}
testplot(meansdf, condition, means)
Created on 2019-01-10 by the reprex package (v0.2.1)
Another workaround, but with passing quoted variables to the custom plotting function is using get():
meansdf <- data.frame(means = c(13.8, 14.8), condition = 1:2)
testplot <- function(df, x, y) {
scale <- 0.5
p <- ggplot(df, aes(x = get(x),
y = get(y) * scale,
fill = get(x)))
p + geom_bar(position = "dodge", stat = "identity")
}
testplot(meansdf, "condition", "means")
Created on 2019-01-10 by the reprex package (v0.2.1)
This frustrated me for some time. I wanted to send different data frames with different variable names and I wanted the ability to plot different columns from the data frame. I finally got a work around by creating some dummy (global) variables to handle plotting and forcing assignment inside the function
plotgraph function(df,df.x,df.y) {
dummy.df <<- df
dummy.x <<- df.x
dummy.y <<- df.y
p = ggplot(dummy.df,aes(x=dummy.x,y=dummy.y,.....)
print(p)
}
then in the main code I can just call the function
plotgraph(data,data$time,data$Y1)
plotgraph(data,data$time,data$Y2)
Short answer: Use qplot
Long answer:
In essence you want something like this:
my.barplot <- function(x=this.is.a.data.frame.typically) {
# R code doing the magic comes here
...
}
But that lacks flexibility because you must stick to consistent column naming to avoid the annoying R scope idiosyncrasies. Of course the next logic step is:
my.barplot <- function(data=data.frame(), x=..., y....) {
# R code doing something really really magical here
...
}
But then that starts looking suspiciously like a call to qplot(), right?
qplot(data=my.data.frame, x=some.column, y=some.other column,
geom="bar", stat="identity",...)
Of course now you'd like to change things like scale titles but for that a function comes handy... the good news is that scoping issues are mostly gone.
my.plot <- qplot(data=my.data.frame, x=some.column, y=some.other column,...)
set.scales(p, xscale=scale_X_continuous, xtitle=NULL,
yscale=scale_y_continuous(), title=NULL) {
return(p + xscale(title=xtitle) + yscale(title=ytitle))
}
my.plot.prettier <- set.scale(my.plot, scale_x_discrete, 'Days',
scale_y_discrete, 'Count')
Another workaround is to define the aes(...) as a variable of your function :
func<-function(meansdf, aes(...)){}
This just worked fine for me on a similar topic
You don't need anything fancy. Not even dummy variables. You only need to add a print() inside your function, is like using cat() when you want something to show in the console.
myplot <- ggplot(......) + Whatever you want here
print(myplot)
It worked for me more than one time inside the same function
I just generate new data frame variables with the desired names inside the function:
testplot <- function(df, xVar, yVar, fillVar) {
df$xVar = df[,which(names(df)==xVar)]
df$yVar = df[,which(names(df)==yVar)]
df$fillVar = df[,which(names(df)==fillVar)]
p <- ggplot(df,
aes(x=xvar, y=yvar, fill=fillvar)) +
geom_bar(position="dodge", stat="identity")
}

Resources