R - generic function to plot dataframe with ggplot2 (multiple columns) [duplicate]

I am trying to figure out how to create multiple ggplots with one function, to avoid tedious copying and pasting. I do not want to have to change argument names in the function when I want to use different columns in the same data.frame. There may be a completely different approach to this problem, but I am including two attempts that almost worked but still fall short of what I want.
Thanks!
Edit
I would also like the function to add a facet depending on an argument, such as groupBy="brand", for example. I think aes_string alongside https://stat.ethz.ch/pipermail/r-help/2009-October/213946.html may get me there. I included my facet request as part of my question, because aes_string alone falls short of my goal of being able to facet as part of the plot function. I added brand to the dataset, just to share what I could not find by searching online today.
library(ggplot2)
### sample data ###
n=25
dataTest = data.frame(
xVar=sample(1:3, n, replace=TRUE),
yVar = rnorm(n, 5, 2),
zVar=rnorm(n, 5, .5),
brand=letters[1:5])
### a first attempt ###
### works, but forces me to create a new function whenever column names need to change
my_plot =function(data) {ggplot(data=data, aes(x=xVar, y=yVar))+geom_bar(stat="identity")}
do.call("my_plot", list(data=dataTest))
### wish for something like this... but this does not work
my_plot = function(data) {ggplot(data=data, aes(x=x, y=y))+geom_bar(stat="identity")}
do.call("my_plot", list(data=dataTest, x=xVar, y=yVar))
do.call("my_plot", list(data=dataTest, x=xVar, y=zVar))
### a second attempt, does not work ###
my.plot = function(x, y, data)
{
arguments <- as.list(match.call())
data = eval(arguments$data, envir=data)
x = eval(arguments$x, envir=data)
y = eval(arguments$y, envir=data)
p=ggplot(data=data, aes(x, y))+geom_bar(stat="identity")
return(p)
}
my.plot(x=xVar, y=yVar, data=dataTest)

Using aes_string lets you pass column names as character strings to your plotting function, so you can change the mapped columns programmatically:
my.plot = function(x, y, data)
{
p=ggplot(data, aes_string(x=x, y=y))+geom_bar(stat="identity")
print(p)
}
my.plot(x="xVar", y="yVar", data=dataTest)
my.plot(x="xVar", y="zVar", data=dataTest)
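Newer ggplot2 releases (3.0.0 and later) deprecate aes_string(). A minimal sketch of the same idea using the .data pronoun instead, assuming such a release is installed; my_plot_data is a hypothetical name, not part of the original answer:

```r
library(ggplot2)

# Hypothetical variant of my.plot: column names arrive as strings and are
# looked up in the plot data through the .data pronoun.
my_plot_data <- function(data, x, y) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_bar(stat = "identity")
}

# Same shape of sample data as in the question.
set.seed(1)
dataTest <- data.frame(
  xVar = sample(1:3, 25, replace = TRUE),
  yVar = rnorm(25, 5, 2),
  zVar = rnorm(25, 5, 0.5)
)

p_y <- my_plot_data(dataTest, "xVar", "yVar")
p_z <- my_plot_data(dataTest, "xVar", "zVar")
```

The function returns the plot instead of printing it, so the caller decides whether to print, save, or modify it further.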

What about using %+% to update the plots instead?
Example:
library(ggplot2)
### sample data ###
n=25
dataTest = data.frame(
xVar=sample(1:3, n, replace=TRUE),
yVar = rnorm(n, 5, 2),
zVar=rnorm(n, 5, .5) )
p1 <- ggplot(data = dataTest, aes(x = xVar, y = yVar)) +
geom_bar(stat = "identity")
aes2 <- aes(x = xVar, y = zVar)
p2 <- p1 %+% aes2
(The original answer showed the rendered plots p1 and p2 here.)
EDIT
or, as @sebastian-c mentioned, aes_string:
plots <- function(x, y, data = dataTest) {
p1 <- ggplot(data = data, aes_string(x = x, y = y)) +
geom_bar(stat = "identity")
p1
}
plots('xVar','yVar')
plots('xVar','zVar')
EDIT 2: beat me to the punch :o

Using @sebastian-c's answer and other sources, I have a function that I think will work and I wanted to share it. I think I see @Henrik's solution, but it seems like more typing, as I have 4 groups, 4 'x' categories, and a third category related to time (year, quarters, months).
library(ggplot2)
### sample data ###
n=25
dataTest = data.frame(
xVar=sample(1:3, n, replace=TRUE),
yVar = rnorm(n, 5, 2),
zVar=rnorm(n, 5, .5),
brand=letters[1:5])
### function
my.plot = function(x, y, data, group=NULL)
{
p=ggplot(data, aes_string(x=x, y=y, fill=group))+
geom_bar(stat="identity")
# make a facet if group is not null
if(length(group)>0) {
facets = facet_wrap(formula(paste("~", group)))
p = p + facets
}
return(p)
}
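The function above is defined but never called. Below is a hedged sketch of the same optional-facet idea together with a usage example. my_facet_plot is a hypothetical name, and the sketch assumes ggplot2 3.0.0 or later for the .data pronoun. facet_wrap() also accepts a character vector of column names, so no formula has to be pasted together:

```r
library(ggplot2)

set.seed(1)
dataTest <- data.frame(
  xVar  = sample(1:3, 25, replace = TRUE),
  yVar  = rnorm(25, 5, 2),
  zVar  = rnorm(25, 5, 0.5),
  brand = letters[1:5]
)

# Hypothetical helper: facet and fill by `group` only when it is supplied.
my_facet_plot <- function(data, x, y, group = NULL) {
  p <- ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_bar(stat = "identity")
  if (!is.null(group)) {
    p <- p + aes(fill = .data[[group]]) + facet_wrap(group)
  }
  p
}

p_plain   <- my_facet_plot(dataTest, "xVar", "yVar")                   # no facets
p_faceted <- my_facet_plot(dataTest, "xVar", "zVar", group = "brand")  # one panel per brand
```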

Related

plotly tooltip names returned from ggplot function

I have defined a function that takes a data.frame and returns a plot, which I later on pass to plotly. I need this function to be flexible and it's going to be called a number of times (that's why I wrote a function). A simple reproducible example:
a <- data.frame(x = 1:3, y = c(2, 6, 3))
library(ggplot2)
library(plotly)
plotTrend <- function(x){
var1 <- names(x)[1]
var2 <- names(x)[2]
p <- ggplot(x, aes(x = get(var1), y = get(var2)))+
geom_point()+
geom_smooth(method = "lm")
return(p)
}
Of course I can call plotTrend on a and I'll get the plot I'm expecting.
But when I call ggplotly on it, the tooltip reads an ugly get(var1) instead of the name of the column ("x" in this example).
plotTrend(a)
ggplotly()
I'm aware I could create a text column for the data.frame inside the function, and call ggplotly(tooltip = "text") (I read plenty of questions in SO about that), but I wondered if there's another way to get the right names in the tooltips, either by modifying the function or by using some special argument in ggplotly.
My expected output is:
A plotly plot with tooltips that accurately read the values and whose names are "x" and "y".
We can use aes_string to display the evaluated column names in the ggplotly tooltips:
library(ggplot2)
library(plotly)
a <- data.frame(x = 1:3, y = c(2, 6, 3))
var1 <- names(a)[1]
var2 <- names(a)[2]
p <- ggplot(a, aes_string(x = var1, y = var2)) +
geom_point()+
geom_smooth(method = "lm")
ggplotly(p)
NB: this works the same inside the plotTrend function call.
Alternatively, use tidy evaluation to pass the column names as function arguments in the plotTrend function:
plotTrend <- function(data, x, y) {
x_var <- enquo(x)
y_var <- enquo(y)
p <- ggplot(data, aes(x = !!x_var, y = !!y_var)) +
geom_point()+
geom_smooth(method = "lm")
return(p)
}
plotTrend(a, x = x, y = y) %>%
ggplotly()
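The enquo() and !! pair can also be written with the embrace operator {{ }}. A minimal sketch, assuming rlang 0.4.0 or later and a ggplot2 release recent enough to support it; plot_trend is a hypothetical name used to avoid overwriting the function above:

```r
library(ggplot2)

a <- data.frame(x = 1:3, y = c(2, 6, 3))

# {{ x }} captures the column the caller typed and injects it into aes(),
# so the mapping records the column name rather than a get() call.
plot_trend <- function(data, x, y) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }})) +
    geom_point() +
    geom_smooth(method = "lm")
}

p <- plot_trend(a, x = x, y = y)
```

The resulting object can be passed to ggplotly(p) in the same way as the plots above.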

Multiple Curves With Different Domains in a Single Plot ( with ggplot2)

library("ggplot2")
eq = function(x){x^-1}
ggplot(data.frame(x=c(-6,6)), aes(x = x, y=eq(x)))+
geom_line(data=as.data.frame(curve(from=-6, to=-.01, eq)))+
geom_line(data=as.data.frame(curve(from=.01, to=6, eq)))
I am trying to produce a single plot, and this code gives me the plot I want, but with two additional plots, one with each geom_line. I don't understand why those additional two plots are being created.
In addition to my comment above, you don't need two separate calls to geom_line to produce this plot. You can use stat_function if you redefine your function as follows.
eq <- function(x) ifelse(x==0, NA,x^-1)
Then you can plot it as follows
df <- data.frame(x=seq(-6,6,.01))
ggplot(df) + stat_function(aes(x), fun = eq)
As @shayaa noted in the comments, curve itself generates plots, which is why you are getting the extra plots. To avoid this, you can just create a dataframe before you plot, and subset it in geom_line:
library("ggplot2")
eq = function(x){x^-1}
df <- data.frame(x =seq(-6, 6, 0.01), y = eq(seq(-6, 6, 0.01)))
ggplot(df) +
geom_line(data=subset(df, x<=-.01), aes(x = x, y = y)) +
geom_line(data=subset(df, x>=.01), aes(x = x, y = y))
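A variation on the same idea, offered as a sketch rather than part of the original answers: mark each side of the discontinuity with a grouping column and let a single geom_line() draw both branches. The column name branch is an assumption made for this example:

```r
library(ggplot2)

eq <- function(x) x^-1

# Build both domains up front, leaving a gap around the pole at x = 0.
df <- data.frame(x = c(seq(-6, -0.01, by = 0.01), seq(0.01, 6, by = 0.01)))
df$y <- eq(df$x)
df$branch <- ifelse(df$x < 0, "negative", "positive")

# group = branch stops geom_line() from connecting the two curves.
p <- ggplot(df, aes(x = x, y = y, group = branch)) +
  geom_line()
```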

ggplot2 multiplot using changing variables

I am trying to create multiple plots with ggplot2 and then arrange them with multiplot. However, when I try to create X graphs, I end up with X copies of the same graph.
My problem code pretty much boils down to this, assuming df is the data frame:
library(ggplot2)
i = 1
j = 2
xVar = df[[i]]
yVar = df[[j]]
plot1 = ggplot(data = df, aes(xVar, yVar)) + geom_point(shape=1)
i = 1
j = 3
xVar = df[[i]]
yVar = df[[j]]
plot2 = ggplot(data = df, aes(xVar, yVar)) + geom_point(shape=1)
multiplot(plot1,plot2, cols=2)
At this point plot1 is identical to plot2, and I don't understand why.
My full code if interested:
n = 1
columns = colnames(df)
plots = list()
for(i in 3:7)
{
for(j in (i+1):7)
{
if(j < 8 & i < 7) {
xVar = df[[i]]
yVar = df[[j]]
plots[[n]] = ggplot(data = df, aes(x=xVar, y=yVar)) +
geom_point(shape=1) +
labs(x=columns[[i]], y=columns[[j]]) +
theme(axis.title=element_text(size=8))
n = n + 1
}
}
}
multiplot(plotlist = plots, cols=3)
There are lots of things going on here.
First, it is a really, really, really bad idea to use external variables in calls to aes(...). The arguments to aes(...) are evaluated in the context of the data=... argument, so in the context of df in your case. If that fails they are evaluated in the global environment. So it is highly preferable to do something like this:
gg <- data.frame(x=df[[i]],y=df[[j]])
plots[[n]] = ggplot(data = gg, aes(x,y)) +...
Second, ggplot stores the expressions from aes(...) and evaluates them when the plot is rendered (so, during the call to multiplot(...)). All of your plots use variables named xVar and yVar in aes(...). So when these plots are rendered, ggplot uses whatever is stored in those variables at the time - presumably from the last plot definition. That's why all your plots look like the last one. This is the reference to "lazy evaluation" in the other answer.
On the other hand, ggplot evaluates the data=... argument immediately, and stores the dataset as part of the plot definition (in the plot object's data element). So creating a different data frame (called gg above) for each plot will work.
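The two behaviours described above can be reproduced with a small sketch. The data frame and the helper name make_plot are made up for illustration:

```r
library(ggplot2)

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Problem: aes() stores the expression `yVar`, not its value at this moment.
yVar <- df$b
p_lazy <- ggplot(df, aes(x = a, y = yVar)) + geom_point()
yVar <- df$c                      # reassigned before the plot is built
lazy_y <- layer_data(p_lazy)$y    # built from df$c, not df$b

# Fix from the answer: give each plot its own data frame, which is stored
# with the plot when ggplot() is called.
make_plot <- function(data, xcol, ycol) {
  gg <- data.frame(x = data[[xcol]], y = data[[ycol]])
  ggplot(gg, aes(x, y)) +
    geom_point(shape = 1) +
    labs(x = xcol, y = ycol)
}
plot_b <- make_plot(df, "a", "b")
plot_c <- make_plot(df, "a", "c")
```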
Finally, it looks like you are trying to create a pairs plot (every column vs. every other column, more or less). Unless this is a homework assignment, there are much easier ways to do this. You could use ggpairs(...) in the GGally package (which uses grid graphics), or you could do it this way using basic ggplot with facets:
# make up some data
set.seed(1) # for reproducible example
df <- data.frame(matrix(rnorm(700),nc=7))
df[4] <- 1+2*df[3] + rnorm(100)
df[5] <- 3*df[3] - 2*df[4] + rnorm(100)
df[6] <- -10*df[5] + rnorm(100)
# you start here...
gg.pairs <- function(data) { # scatterplot matrix using ggplot facets
require(ggplot2)
require(data.table)
require(reshape2) # for melt(...)
DT <- data.table(melt(cbind(id=1:nrow(data),data),id="id"),key="id")
gg <- DT[DT,allow.cartesian=T]
setnames(gg,c("id","H","x","V","y"))
ggplot(gg[as.integer(gg$H)<as.integer(gg$V),], aes(x,y)) +
geom_point(shape=1) +
facet_grid(V~H, scales="free")
}
gg.pairs(df[3:7])
I think that your problem is R's lazy evaluation. What happens is that plot1 and plot2 are not evaluated when you assign them but when you print them, and at that moment there is only one copy (the last one) of xVar and yVar, so the plots are the same.
Well, I can't explain what is happening, but a workaround is to use column names instead of columns, with aes_string. The following makes two unique plots in multiplot for me, and this change could easily be incorporated into your plot loop.
dat = data.frame(x = rnorm(10), y1 = rnorm(10), y2 = rpois(10, 5))
xVar = names(dat)[1]
yVar = names(dat)[2]
plot1 = ggplot(data = dat, aes_string(xVar, yVar)) + geom_point(shape=1)
yVar = names(dat)[3]
plot2 = ggplot(data = dat, aes_string(xVar, yVar)) + geom_point(shape=1)
multiplot(plot1, plot2, cols=2)
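For the full loop over column pairs, one possible sketch builds the plots with combn() and lapply(). It assumes ggplot2 3.0.0 or later for the .data pronoun, and the names col_pairs and plots are illustrative. Each call of the anonymous function has its own environment, so every plot keeps its own pair of column names:

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rnorm(10), y1 = rnorm(10), y2 = rpois(10, 5))

# Every unordered pair of column names: (x, y1), (x, y2), (y1, y2).
col_pairs <- combn(names(dat), 2, simplify = FALSE)

plots <- lapply(col_pairs, function(p) {
  ggplot(dat, aes(x = .data[[p[1]]], y = .data[[p[2]]])) +
    geom_point(shape = 1) +
    labs(x = p[1], y = p[2])
})
```

The resulting list can then be passed on as multiplot(plotlist = plots, cols = 3).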

Data driven plot names in data.table

This is a personal project to learn the syntax of the data.table package. I am trying to use the data values to create multiple graphs and label each based on the by group value. For example, given the following data:
# Generate dummy data
require(data.table)
set.seed(222)
DT = data.table(grp=rep(c("a","b","c"),each=10),
x = rnorm(30, mean=5, sd=1),
y = rnorm(30, mean=8, sd=1))
setkey(DT, grp)
The data consists of random x and y values for 3 groups (a, b, and c). I can create a formatted plot of all values with the following code:
# Example of plotting all groups in one plot
require(ggplot2)
p <- ggplot(data=DT, aes(x = x, y = y)) +
aes(shape = factor(grp))+
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
labs(title = "Group: ALL")
p
This creates the following plot:
Instead I would like to create a separate plot for each by group, and change the plot title from “Group: ALL” to “Group: a”, “Group: b”, “Group: c”, etc. The documentation for data.table says:
.BY is a list containing a length 1 vector for each item in by. This can be useful when by is not known in advance. The by variables are also available to j directly by name; useful for example for titles of graphs if j is a plot command, or to branch with if()
That being said, I do not understand how to use .BY or .SD to create separate plots for each group. Your help is appreciated.
Here is the data.table solution, though again, not what I would recommend:
make_plot <- function(dat, grp.name) {
print(
ggplot(dat, aes(x=x, y=y)) +
geom_point() + labs(title=paste0("Group: ", grp.name$grp))
)
NULL
}
DT[, make_plot(.SD, .BY), by=grp]
What you really should do for this particular application is what @dmartin recommends. At least, that's what I would do.
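If one plot object per group is wanted without data.table, a base R sketch can split the data and build a titled plot for each piece. This is an alternative technique, not the .BY approach above; DF and plots are illustrative names:

```r
library(ggplot2)

set.seed(222)
DF <- data.frame(
  grp = rep(c("a", "b", "c"), each = 10),
  x   = rnorm(30, mean = 5, sd = 1),
  y   = rnorm(30, mean = 8, sd = 1)
)

# split() returns one data frame per group, named after the group value.
plots <- lapply(split(DF, DF$grp), function(d) {
  ggplot(d, aes(x = x, y = y)) +
    geom_point(size = 3) +
    labs(title = paste0("Group: ", d$grp[1]))
})
```

Each element, for example plots$b, can then be printed or saved on its own.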
Instead of using data.table, you could use facet_grid in ggplot with the labeller argument:
p <- ggplot(data=DT, aes(x = x, y = y)) + aes(shape = factor(grp)) +
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
facet_grid(. ~ grp, labeller = label_both)
See the ggplot documentation for more information.
I see you already have a "facetting" option. I had done this
p+facet_wrap('grp')
But this gives the same result:
p+facet_wrap(~grp)
