Refactoring recurring ggplot code - r

I'm using R and ggplot2 to analyze some statistics from basketball games. I'm new to R and ggplot, and I like the results I'm getting, given my limited experience. But as I go along, I find that my code gets repetitive, which I dislike.
I created several plots similar to this one:
Code:
efgPlot <- ggplot(gmStats, aes(EFGpct, Nrtg)) +
stat_smooth(method = "lm") +
geom_point(aes(colour=plg_ShortName, shape=plg_ShortName)) +
scale_shape_manual(values=as.numeric(gmStats$plg_ShortName))
The only difference between the plots is the x variable; the next plot would be:
orPlot <- ggplot(gmStats, aes(ORpct, Nrtg)) +
stat_smooth(method = "lm") + ... # from here all is the same
How could I refactor this, such that I could do something like:
efgPlot <- getPlot(gmStats, EFGpct, Nrtg)
orPlot <- getPlot(gmStats, ORpct, Nrtg)
Update
I think my way of refactoring this isn't really "R-ish" (or ggplot-ish if you will); based on baptiste's comment below, I solved this without refactoring anything into a function; see my answer below.

The key to this sort of thing is using aes_string rather than aes (untested, of course):
getPlot <- function(data,xvar,yvar){
p <- ggplot(data, aes_string(x = xvar, y = yvar)) +
stat_smooth(method = "lm") +
geom_point(aes(colour=plg_ShortName, shape=plg_ShortName)) +
scale_shape_manual(values=as.numeric(data$plg_ShortName))
print(p)
invisible(p)
}
aes_string allows you to pass variable names as strings, rather than expressions, which is more convenient when writing functions. Of course, you may not want to hard-code the color and shape scales, in which case you could use aes_string again for those.
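Recent ggplot2 releases mark aes_string as deprecated in favour of tidy evaluation. A minimal sketch of the same helper using the .data pronoun, still taking column names as strings. The toy data frame is a made-up stand-in for gmStats, and the colour and shape mappings are left out to keep it short:

```r
library(ggplot2)

# Hypothetical stand-in for gmStats; the values are invented.
toy <- data.frame(
  EFGpct = c(0.45, 0.50, 0.55, 0.48, 0.52),
  ORpct  = c(0.30, 0.25, 0.35, 0.28, 0.32),
  Nrtg   = c(-5, 2, 10, -1, 4)
)

# xvar and yvar are column names given as strings.
getPlot <- function(data, xvar, yvar) {
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    stat_smooth(method = "lm", formula = y ~ x) +
    geom_point() +
    labs(x = xvar, y = yvar)
}

efgPlot <- getPlot(toy, "EFGpct", "Nrtg")
orPlot  <- getPlot(toy, "ORpct", "Nrtg")
```

Unlike the version above, this one returns the plot without printing it, so the caller decides when to draw it.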

Although Joran's answer helped me a lot (and it accurately answers my question), I eventually solved this according to baptiste's suggestion:
library(reshape2) # provides melt()
# get the variables I need from the stats data frame:
forPlot <- gmStats[c("wed_ID","Nrtg","EFGpct","ORpct","TOpct","FTTpct",
"plg_ShortName","Home")]
# melt to long format:
forPlot.m <- melt(forPlot, id=c("wed_ID", "plg_ShortName", "Home","Nrtg"))
# use facet_wrap to create 4 plots:
p <- ggplot(forPlot.m, aes(value, Nrtg)) +
geom_point(aes(shape=plg_ShortName, colour=plg_ShortName)) +
scale_shape_manual(values=as.numeric(forPlot.m$plg_ShortName)) +
stat_smooth(method="lm") +
facet_wrap(~variable,scales="free")
Which gives me:

Related

ordering function seq() in R with the order of input value

Apologies for my English; I'm a student from France.
I have a small problem with a function in R. I have a data frame like this:
https://imgur.com/G5ToQrL
With this code:
testtransect2$TOTAL<-testtransect2$TOTAL*-1
plot(testtransect2$DECA,testtransect2$TOTAL,asp = 1)
xl <- seq(min(testtransect2$DECA),max(testtransect2$DECA), (max(testtransect2$DECA)-min(testtransect2$DECA))/1000)
lines(xl, predict(loess(testtransect2$TOTAL~testtransect2$DECA,span = 0.25), newdata=xl))
I want to create a plot with a smooth line that passes through all the points in the order of the data frame, but when I add the line using my xl values and predict, the plot is not what I want:
https://imgur.com/cSlhNtV
Here is a plot showing what I want:
https://imgur.com/mnVgvQ7
I think the problem is the order of my xl values, but I can't work out how to fix it.
Thanks in advance for any solution.
You can use ggplot2.
First, store your data frame in df:
df <- data.frame(DECA=c(0,10,15,-23,15,40,90,140,190,250,310,370,420),
TOTAL=c(0,-9,-15,-31.5,-48,-50,-44,-24,-17,-10,-6,-5,0))
You are interested in geom_point and geom_line. You can map DECA and TOTAL in aes like this:
library(ggplot2)
ggplot(df, aes(x=DECA, y=TOTAL)) +
geom_line() + geom_point()
Yielding
The "but when i want put my line with my value xl and predict my plot is not like i want" part is unfortunately unclear to me, please rephrase if this solution does not work for you.
Updated
There are other smoothing layers that can be added, e.g. geom_smooth. Is this what you are asking for?
ggplot(df, aes(x=DECA, y=TOTAL)) +
geom_line() + geom_point() +
geom_smooth(se=FALSE, method = "lm", col="red") + # linear fit
geom_smooth(se=FALSE, col="green") # default smoother (loess for small data sets)
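If the goal is a line that visits the points in the order they appear in the data frame, note that geom_line connects observations in order of x, whereas geom_path connects them in row order. A small sketch of that swap, using the same df (rebuilt here so the block runs on its own):

```r
library(ggplot2)

df <- data.frame(
  DECA  = c(0, 10, 15, -23, 15, 40, 90, 140, 190, 250, 310, 370, 420),
  TOTAL = c(0, -9, -15, -31.5, -48, -50, -44, -24, -17, -10, -6, -5, 0)
)

# geom_path() joins the points in row order;
# geom_line() would sort them by DECA first.
p_path <- ggplot(df, aes(x = DECA, y = TOTAL)) +
  geom_path() +
  geom_point()
```

This gives straight segments between consecutive rows. Whether that matches the "smooth" curve in the linked image is an assumption, since the image is not reproduced here.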

Strictly and only in the style of ggplot(df), is there a function that adds lines and points to the plot at the same time?

This question pertains to the second type of ggplot, which does not require reshaping to longer data frames. Reshaping to a longer data frame isn't easily done in this case due to the memory requirements.
Only answers that begin with ggplot(df) will be accepted. If you do not wish to follow the ggplot(df) manner then please ignore this question and move on.
df=data.frame(xx=runif(10),yy=runif(10),zz=runif(10))
require(ggplot2)
ggplot(df) +
geom_line(aes(xx,yy, color='yy'))+
geom_point(aes(xx,yy, color='yy'))+
geom_line(aes(xx,zz, color='zz'))+
geom_point(aes(xx,zz, color='zz'))+
ggtitle("Title")
Is there a way to create a geom_both function that works in the ggplot manner?
This does not work:
geom_both=function(...) { geom_line(...)+geom_point(...) }
I believe this does what you asked
library(ggplot2)
library(lemon) ## contains geom_pointline
df=data.frame(xx=runif(10),yy=runif(10),zz=runif(10))
ggplot(df) +
geom_pointline(aes(xx,yy, color='yy'))+
geom_pointline(aes(xx,zz, color='zz'))+
ggtitle("Title")
To eliminate the gap between the lines and the points, you can add distance=0 like this:
ggplot(df) +
geom_pointline(aes(xx,yy, color='yy'), distance=0)+
geom_pointline(aes(xx,zz, color='zz'), distance=0)+
ggtitle("Title")
EDIT: Another option is to define a function like this
add_line_points = function(g, ...){
gg = g + geom_point(...) + geom_line(...)
return(gg)
}
and use %>% (from the magrittr package, which needs to be loaded) instead of +
ggplot(df) %>% ## use pipe operator, not plus
add_line_points(aes(xx,yy, color='yy')) %>%
add_line_points(aes(xx,zz, color='zz'))
Note: I adapted this from here.
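The geom_both attempt in the question fails because two layers cannot be added to each other with +. ggplot2 does, however, accept a list of layers on the right-hand side of +, so a sketch along these lines keeps the ggplot(df) style without extra packages. Here geom_both is the hypothetical name taken from the question, not a ggplot2 function:

```r
library(ggplot2)

# Return both layers as a list; `+` adds each element to the plot in turn.
geom_both <- function(...) {
  list(geom_line(...), geom_point(...))
}

set.seed(1)
df <- data.frame(xx = runif(10), yy = runif(10), zz = runif(10))

p <- ggplot(df) +
  geom_both(aes(xx, yy, color = "yy")) +
  geom_both(aes(xx, zz, color = "zz")) +
  ggtitle("Title")
```

Because every argument is forwarded to both geoms, only parameters that both geom_line and geom_point understand (such as the mapping, colour or alpha) should be passed this way.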

How to write a facet_wrap (ggplot2) within a function

I have written a function to plot a bar graph. But when I get to facet wrap the '~' sign is making things difficult.
rf.funct <- function(dat, predictor, feature){
ggplot(get(dat), aes(get(predictor), N)) +
geom_bar(stat = 'identity') +
facet_wrap(get(~feature)) # this is where the problem is
}
I've tried the following:
facet_wrap((get(~feature))) # invalid first argument
facet_wrap(paste0("~ ", get(feature))) # object 'feature' not found
How do I make sure the '~' sign gets included with the function?
You don't need to use get. You've passed the data frame into the function using the dat argument, so just feed dat to ggplot and it will have the data from within its environment.
rf.funct <- function(dat, predictor, feature) {
ggplot(dat, aes_string(predictor, "N")) +
geom_bar(stat = 'identity') +
facet_wrap(feature)
}
The predictor and feature arguments should be entered as strings. Then you can use aes_string to specify the aesthetics. facet_wrap can now take a character vector directly, without the need for a formula (as pointed out by @WeihuangWong).
I was having a similar problem and the answers & comments on here helped me fix it. However, this post is about 6 years old now, and I think the most modern solution would be along these lines:
rf.funct <- function(dat, predictor, feature){
ggplot(dat, aes({{predictor}}, N)) +
geom_bar(stat = 'identity') +
facet_wrap(enquo(feature))
}
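A variant of the same idea wraps the facet variable in vars(), which facet_wrap accepts and which supports the {{ }} operator, so both arguments can be passed as bare column names. A minimal sketch; the toy data frame and its values are invented for illustration, and only the column name N follows the question:

```r
library(ggplot2)

rf.funct <- function(dat, predictor, feature) {
  ggplot(dat, aes({{ predictor }}, N)) +
    geom_col() +                        # same as geom_bar(stat = "identity")
    facet_wrap(vars({{ feature }}))
}

# Hypothetical example data.
toy <- data.frame(
  level = c("low", "high", "low", "high"),
  group = c("a", "a", "b", "b"),
  N     = c(3, 5, 2, 7)
)

p <- rf.funct(toy, level, group)
```

Note that the call uses unquoted names (level, group) rather than strings.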

How to pass column names to a function that processes data.frames

I'm plotting lots of similar graphs, so I thought I'd write a function to simplify the task. I'd like to pass it a data.frame and the name of the column to be plotted. Here is what I have tried:
plot_individual_subjects <- function(var, data)
{
require(ggplot2)
ggplot(data, aes(x=Time, y=var, group=Subject, colour=SubjectID)) +
geom_line() + geom_point() +
geom_text(aes(label=Subject), hjust=0, vjust=0)
}
Now if var is a string, it will not work. It will not work either if I change the aes part of the ggplot command to y=data[,var]; it then complains about not being able to subset a closure.
So what is the correct way/best practice to solve this and similar problems? How can I pass column names easily and safely to functions that would like to do processing on data.frames?
Bad Joran, answering in the comments!
You want to use aes_string, which allows you to pass variable names as strings. In your particular case, since you only seem to want to modify the y variable, you probably want to reorganize which aesthetics are mapped in which geoms. For instance, maybe something like this:
ggplot(data, aes_string(y = var)) +
geom_line(aes(x = Time,group = Subject,colour = SubjectID)) +
geom_point(aes(x = Time,group = Subject,colour = SubjectID)) +
geom_text(aes(x = Time,group = Subject,colour = SubjectID,label = Subject),hjust =0,vjust = 0)
or perhaps the other way around, depending on your tastes.
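For the more general part of the question (processing data frames by column name outside ggplot2), one common base R pattern is to pass the name as a string and index with [[ ]], after checking that the column exists. A small sketch; the helper name column_range and the toy data are hypothetical:

```r
# Look up a column by its name (a string) and return its range.
column_range <- function(data, var) {
  stopifnot(is.data.frame(data), is.character(var), length(var) == 1)
  if (!var %in% names(data)) {
    stop("Column not found: ", var)
  }
  range(data[[var]], na.rm = TRUE)
}

toy <- data.frame(Time = 1:4, Score = c(2.5, 3.0, 1.5, 4.0))
score_range <- column_range(toy, "Score")
```

Checking the name up front turns a typo into a clear error instead of a silent NULL from data[[var]].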

plotting two vectors of data on a GGPLOT2 scatter plot using R

I've been experimenting with both ggplot2 and lattice to graph panels of data. I'm having a little trouble wrapping my mind around the ggplot2 model. In particular, how do I plot a scatter plot with two sets of data on each panel:
in lattice I could do this:
xyplot(Predicted_value + Actual_value ~ x_value | State_CD, data=dd)
and that would give me a panel for each State_CD with each column
I can do one column with ggplot2:
pg <- ggplot(dd, aes(x_value, Predicted_value)) + geom_point(shape = 2) +
facet_wrap(~ State_CD) + theme(aspect.ratio = 1)
print(pg)
What I can't grok is how to add Actual_value to the ggplot above.
EDIT Hadley pointed out that this really would be easier with a reproducible example. Here's code that seems to work. Is there a better or more concise way to do this with ggplot? Why is the syntax for adding another set of points to ggplot so different from adding the first set of data?
library(lattice)
library(ggplot2)
#make some example data
dd<-data.frame(matrix(rnorm(108),36,3),c(rep("A",24),rep("B",24),rep("C",24)))
colnames(dd) <- c("Predicted_value", "Actual_value", "x_value", "State_CD")
#plot with lattice
xyplot(Predicted_value + Actual_value ~ x_value | State_CD, data=dd)
#plot with ggplot
pg <- ggplot(dd, aes(x_value, Predicted_value)) + geom_point(shape = 2) + facet_wrap(~ State_CD) + theme(aspect.ratio = 1)
print(pg)
pg + geom_point(data=dd,aes(x_value, Actual_value,group=State_CD), colour="green")
The lattice output looks like this:
(source: cerebralmastication.com)
and ggplot looks like this:
(source: cerebralmastication.com)
Just following up on what Ian suggested: for ggplot2 you really want all the y-axis stuff in one column with another column as a factor indicating how you want to decorate it. It is easy to do this with melt. To wit:
qplot(x_value, value,
data = melt(dd, measure.vars=c("Predicted_value", "Actual_value")),
colour=variable) + facet_wrap(~State_CD)
Here's what it looks like for me:
(source: princeton.edu)
To get an idea of what melt is actually doing, here's the head:
> head(melt(dd, measure.vars=c("Predicted_value", "Actual_value")))
     x_value State_CD        variable      value
1  1.2898779        A Predicted_value  1.0913712
2  0.1077710        A Predicted_value -2.2337188
3 -0.9430190        A Predicted_value  1.1409515
4  0.3698614        A Predicted_value -1.8260033
5 -0.3949606        A Predicted_value -0.3102753
6 -0.1275037        A Predicted_value -1.2945864
You see, it "melts" Predicted_value and Actual_value into one column called value and adds another column called variable letting you know what column it originally came from.
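For readers on the tidyr package, pivot_longer performs the same kind of reshaping as the melt call above. A minimal sketch with a small invented data frame that reuses the column names from the question:

```r
library(tidyr)

# Small made-up data set with the same columns as dd.
set.seed(42)
dd_small <- data.frame(
  Predicted_value = rnorm(6),
  Actual_value    = rnorm(6),
  x_value         = rnorm(6),
  State_CD        = rep(c("A", "B", "C"), each = 2)
)

# Stack the two measurement columns into "variable" / "value".
long <- pivot_longer(
  dd_small,
  cols      = c(Predicted_value, Actual_value),
  names_to  = "variable",
  values_to = "value"
)
```

The result has the same four columns as the melt output, although the rows come out in a different order (grouped by original row rather than by variable).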
Update: several years on now, I almost always use Jonathan's method (via the tidyr package) with ggplot2. My answer below works in a pinch, but gets tedious fast when you have 3+ variables.
I'm sure Hadley will have a better answer, but - the syntax is different because the ggplot(dd,aes()) syntax is (I think) primarily intended for plotting just one variable. For two, I would use:
ggplot() +
geom_point(data=dd, aes(x_value, Actual_value, group=State_CD), colour="green") +
geom_point(data=dd, aes(x_value, Predicted_value, group=State_CD), shape = 2) +
facet_wrap(~ State_CD) +
theme(aspect.ratio = 1)
Pulling the first set of points out of the ggplot() gives it the same syntax as the second. I find this easier to deal with because the syntax is the same and it emphasizes the "Grammar of Graphics" that is at the core of ggplot2.
You might just want to change the form of your data a little bit, so that you have one y-axis variable, with an additional factor variable indicating whether it is a predicted or actual value.
Is this something like what you are trying to do?
dd<-data.frame(type=rep(c("Predicted_value","Actual_value"),20),y_value=rnorm(40),
x_value=rnorm(40),State_CD=rnorm(40)>0)
qplot(x_value,y_value,data=dd,colour=type,facets=.~State_CD)
Well, after posting the question I ran across this R Help thread that may have helped me. It looks like I can do this:
pg + geom_line(data=dd,aes(x_value, Actual_value,group=State_CD), colour="green")
Is that a good way of doing things? It seems odd to me because adding the second item has a totally different syntax than the first.
