ggplot2 multiplot using changing variables - r

I am trying to create multiple plots using ggplot2 that are then combined using multiplot. However, when I try to create X graphs I end up with X copies of the same graph.
My problem code pretty much boils down to this, assuming df is the data frame:
library(ggplot2)
i = 1
j = 2
xVar = df[[i]]
yVar = df[[j]]
plot1 = ggplot(data = df, aes(xVar, yVar)) + geom_point(shape=1)
i = 1
j = 3
xVar = df[[i]]
yVar = df[[j]]
plot2 = ggplot(data = df, aes(xVar, yVar)) + geom_point(shape=1)
multiplot(plot1,plot2, cols=2)
At this point plot1 is equal to plot2 and I don't understand why.
My full code if interested:
n = 1
columns = colnames(df)
plots = list()
for(i in 3:7)
{
for(j in (i+1):7)
{
if(j < 8 & i < 7) {
xVar = df[[i]]
yVar = df[[j]]
plots[[n]] = ggplot(data = df, aes(x=xVar, y=yVar)) +
geom_point(shape=1) +
labs(x=columns[[i]], y=columns[[j]]) +
theme(axis.title=element_text(size=8))
n = n + 1
}
}
}
multiplot(plotlist = plots, cols=3)

There are lots of things going on here.
First, it is a really, really, really bad idea to use external variables in calls to aes(...). The arguments to aes(...) are evaluated in the context of the data=... argument, so in the context of df in your case. If that fails they are evaluated in the global environment. So it is highly preferable to do something like this:
gg <- data.frame(x=df[[i]],y=df[[j]])
plots[[n]] = ggplot(data = gg, aes(x,y)) +...
Second, ggplot stores the expressions from aes(...) and evaluates them when the plot is rendered (so, during the call to multiplot(...)). All of your plots use variables named xVar and yVar in aes(...). So when these plots are rendered, ggplot uses whatever is stored in those variables at the time - presumably from the last plot definition. That's why all your plots look like the last one. This is the reference to "lazy evaluation" in the other answer.
On the other hand, ggplot evaluates the data=... argument immediately and stores the dataset as part of the plot definition (in the ggplot object itself). So creating a different data frame (called gg above) for each plot will work.
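As a minimal sketch of the per-plot data frame idea applied to a whole loop: the data below is made up, and the names pairs_idx and plots are only illustrative. Each plot carries its own copy of the two columns, so nothing depends on variables that change later.

```r
library(ggplot2)

# Made-up data standing in for the asker's df (columns X1..X5)
set.seed(1)
df <- data.frame(matrix(rnorm(50), ncol = 5))

# Every (i, j) column pair with i < j, one pair per matrix column
pairs_idx <- combn(ncol(df), 2)

plots <- lapply(seq_len(ncol(pairs_idx)), function(k) {
  i <- pairs_idx[1, k]
  j <- pairs_idx[2, k]
  # The data frame is built and stored now, so the plot keeps its own values
  gg <- data.frame(x = df[[i]], y = df[[j]])
  ggplot(gg, aes(x, y)) +
    geom_point(shape = 1) +
    labs(x = names(df)[i], y = names(df)[j])
})
```

The resulting list can be passed on as in the question, e.g. multiplot(plotlist = plots, cols = 3), assuming multiplot is defined in your session.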
Finally, it looks like you are trying to create a pairs plot (every column vs. every other column, more or less). Unless this is a homework assignment, there are much easier ways to do this. You could use ggpairs(...) in the GGally package (which uses grid graphics), or you could do it this way using basic ggplot with facets:
# make up some data
set.seed(1) # for reproducible example
df <- data.frame(matrix(rnorm(700),nc=7))
df[4] <- 1+2*df[3] + rnorm(100)
df[5] <- 3*df[3] - 2*df[4] + rnorm(100)
df[6] <- -10*df[5] + rnorm(100)
# you start here...
gg.pairs <- function(data) { # scatterplot matrix using ggplot facets
require(ggplot2)
require(data.table)
require(reshape2) # for melt(...)
DT <- data.table(melt(cbind(id=1:nrow(data),data),id="id"),key="id")
gg <- DT[DT,allow.cartesian=T]
setnames(gg,c("id","H","x","V","y"))
ggplot(gg[as.integer(gg$H)<as.integer(gg$V),], aes(x,y)) +
geom_point(shape=1) +
facet_grid(V~H, scales="free")
}
gg.pairs(df[3:7])

I think your problem is R's lazy evaluation. What happens is that plot1 and plot2 are not evaluated when you assign them but when you render them, and at that moment there is only one copy (the last one) of xVar and yVar, so the plots are the same.
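The effect can be reproduced on a tiny made-up data frame. This is only a sketch: layer_data() is used here to build the plot without drawing it, and the column names are invented for the illustration.

```r
library(ggplot2)

# Made-up data, not from the question
df <- data.frame(a = 1:5, b = c(2, 4, 6, 8, 10), c = 5:1)

yVar <- df$b
p <- ggplot(df, aes(a, yVar)) + geom_point()  # aes() stores the symbol yVar, not its value

yVar <- df$c            # reassign after the plot object was created
built <- layer_data(p)  # building the plot looks yVar up at this point
built$y                 # follows the current yVar (df$c), not df$b
```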

Well, I can't explain what is happening, but a workaround is to use column names instead of the columns themselves, with aes_string. The following makes two unique plots in multiplot for me, and this change could easily be incorporated into your plot loop.
dat = data.frame(x = rnorm(10), y1 = rnorm(10), y2 = rpois(10, 5))
xVar = names(dat)[1]
yVar = names(dat)[2]
plot1 = ggplot(data = dat, aes_string(xVar, yVar)) + geom_point(shape=1)
yVar = names(dat)[3]
plot2 = ggplot(data = dat, aes_string(xVar, yVar)) + geom_point(shape=1)
multiplot(plot1, plot2, cols=2)
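Note that aes_string has been soft-deprecated since ggplot2 3.0.0. A sketch of the same workaround with tidy evaluation, assuming ggplot2 >= 3.0.0 and the rlang package (installed as a ggplot2 dependency): the !! operator forces the column name to be resolved when aes() is called rather than when the plot is drawn.

```r
library(ggplot2)

dat <- data.frame(x = rnorm(10), y1 = rnorm(10), y2 = rpois(10, 5))

xVar <- "x"
yVar <- "y1"
# rlang::sym() turns the string into a symbol; !! inlines it immediately
plot1 <- ggplot(dat, aes(!!rlang::sym(xVar), !!rlang::sym(yVar))) +
  geom_point(shape = 1)

yVar <- "y2"  # changing yVar no longer affects plot1
plot2 <- ggplot(dat, aes(!!rlang::sym(xVar), !!rlang::sym(yVar))) +
  geom_point(shape = 1)
```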

Related

ggplot: adding a label to a geom_line aes_string

I have a for loop plotting 3 geom_lines, how do I add a label/legend so they won't all be 3 indiscernible black lines?
methods.list <- list(rwf,snaive,meanf)
cv.list <- lapply(methods.list, function(method) {
taylor%>% tsCV(forecastfunction = method, h=48)
})
gg <- ggplot(NULL, aes(x))
for (i in seq(1,3)){
gg <- gg + geom_line(aes_string( y=sqrt(colMeans(cv.list[[i]]^2, na.rm=TRUE))))
}
gg + guides(colour=guide_legend(title="Forecast"))
If I don't use a loop, I can use aes instead of that horrible aes_string and then everything works, but I have to write the same code 3 times and replace the loop with this:
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[1]]^2, na.rm=TRUE)), colour=names(cv.list)[1]))
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[2]]^2, na.rm=TRUE)), colour=names(cv.list)[2]))
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[3]]^2, na.rm=TRUE)), colour=names(cv.list)[3]))
and then there are nice automatic colors and legend. What am I missing? Why is r being so noob-unfriendly?
The example is not reproducible (there is no data!), but it seems you have a list, cv.list, which contains multiple data.frames, and you want to plot a summary statistic of each against a common variable stored in x.
The simplest method is simply to create a data.frame and plot using the data.frame.
#Create 3 data.frames with data (forecast?)
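df <- lapply(1:3, function(group){
# index by group: i is not defined inside this function
summ_stat <- sqrt(colMeans(cv.list[[group]]^2, na.rm=TRUE))
# a factor gives a discrete colour scale and legend
data.frame(summ_stat, group = factor(group), x = x)
})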
#bind the data.frames into a single data.frame
df <- do.call(rbind, df)
#Create the plot
ggplot(data = df, aes(x = x, y = summ_stat, colour = group)) +
geom_line() +
labs(colour = "Forecast")
Note the change of label in the labs argument. This is changing the label of colour which is part of aes.
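Since the question's data is not available, here is a self-contained sketch of the same long-format idea. The rmse list is a hypothetical stand-in for the per-method error curves, and the values are invented purely for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the per-method error curves (values are made up)
h <- 1:48
rmse <- list(rwf = sqrt(h), snaive = rep(4, 48), meanf = rep(6, 48))

# One data frame per method, stacked into long format with a method column
long <- do.call(rbind, lapply(names(rmse), function(m) {
  data.frame(h = h, rmse = rmse[[m]], method = m)
}))

# Mapping colour to the method column gives distinct colours and a legend
p <- ggplot(long, aes(h, rmse, colour = method)) +
  geom_line() +
  labs(colour = "Forecast")
```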

R - generic function to plot dataframe with ggplot2 (multiple columns) [duplicate]

I am trying to figure out how to create multiple ggplots with one function, to avoid tedious copying and pasting. I do not want to have to change argument names in the function when I want to use different columns in the same data.frame. There may be completely different approach to this problem, but I am including two attempts that almost worked but still fall short of what I want.
Thanks!
Edit
I would also like the function to add a facet depending on an argument, such as groupBy="brand", for example. I think aes_string alongside https://stat.ethz.ch/pipermail/r-help/2009-October/213946.html may get me there. I included my facet request as part of my question, because aes_string alone falls short of my goal of being able to facet as part of the plot function. I added brand to the dataset, just to share what I could not find by searching online today.
library(ggplot2)
### sample data ###
n=25
dataTest = data.frame(
xVar=sample(1:3, n, replace=TRUE),
yVar = rnorm(n, 5, 2),
zVar=rnorm(n, 5, .5),
brand=letters[1:5])
### a first attempt ###
### works, but forces me to create a new function whenever column names need to change
my_plot =function(data) {ggplot(data=data, aes(x=xVar, y=yVar))+geom_bar(stat="identity")}
do.call("my_plot", list(data=dataTest))
### wish for something like this... but this does not work
my_plot = function(data) {ggplot(data=data, aes(x=x, y=y))+geom_bar(stat="identity")}
do.call("my_plot", list(data=dataTest, x=xVar, y=yVar))
do.call("my_plot", list(data=dataTest, x=xVar, y=zVar))
### a second attempt, does not work ###
my.plot = function(x, y, data)
{
arguments <- as.list(match.call())
data = eval(arguments$data, envir=data)
x = eval(arguments$x, envir=data)
y = eval(arguments$y, envir=data)
p=ggplot(data=data, aes(x, y))+geom_bar(stat="identity")
return(p)
}
my.plot(x=xVar, y=yVar, data=dataTest)
Using aes_string will allow you to pass character strings into your ggplot2 function, allowing you to programmatically change it more easily:
my.plot = function(x, y, data)
{
p=ggplot(data, aes_string(x=x, y=y))+geom_bar(stat="identity")
print(p)
}
my.plot(x="xVar", y="yVar", data=dataTest)
my.plot(x="xVar", y="zVar", data=dataTest)
What about using %+% to update the plots instead?
Example:
library(ggplot2)
### sample data ###
n=25
dataTest = data.frame(
xVar=sample(1:3, n, replace=TRUE),
yVar = rnorm(n, 5, 2),
zVar=rnorm(n, 5, .5) )
p1 <- ggplot(data = dataTest, aes(x = xVar, y = yVar)) +
geom_bar(stat = "identity")
aes2 <- aes(x = xVar, y = zVar)
p2 <- p1 %+% aes2
(Images of the rendered plots p1 and p2 appeared here.)
EDIT
or as @sebastian-c mentioned, aes_string
plots <- function(x, y, data = dataTest) {
p1 <- ggplot(data = data, aes_string(x = x, y = y)) +
geom_bar(stat = "identity")
p1
}
plots('xVar','yVar')
plots('xVar','zVar')
EDIT 2: beat me to the punch :o
Using @sebastian-c's answer and other sources, I have a function that I think will work and I wanted to share it. I think I see @Henrik's solution, but it seems like more typing, as I have 4 groups, 4 'x' categories, and a third category related to time (year, quarters, months).
library(ggplot2)
### sample data ###
n=25
dataTest = data.frame(
xVar=sample(1:3, n, replace=TRUE),
yVar = rnorm(n, 5, 2),
zVar=rnorm(n, 5, .5),
brand=letters[1:5])
### function
my.plot = function(x, y, data, group=NULL)
{
p=ggplot(data, aes_string(x=x, y=y, fill=group))+
geom_bar(stat="identity")
# make a facet if group is not null
if(length(group)>0) {
facets = facet_wrap(formula(paste("~", group)))
p = p + facets
}
return(p)
}
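For newer ggplot2 versions, where aes_string is soft-deprecated, a comparable function can be sketched with the .data pronoun. This assumes ggplot2 >= 3.0.0; the name my_plot2 is made up for the example, and geom_col() is used as the equivalent of geom_bar(stat="identity").

```r
library(ggplot2)

### sample data, as in the question ###
n <- 25
dataTest <- data.frame(
  xVar = sample(1:3, n, replace = TRUE),
  yVar = rnorm(n, 5, 2),
  zVar = rnorm(n, 5, .5),
  brand = letters[1:5])

# x, y and group are column names given as strings (group is optional)
my_plot2 <- function(data, x, y, group = NULL) {
  p <- ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_col() +
    labs(x = x, y = y)
  if (!is.null(group)) {
    # add a fill mapping and one panel per level of the grouping column
    p <- p + aes(fill = .data[[group]]) + facet_wrap(group) + labs(fill = group)
  }
  p
}

p_plain   <- my_plot2(dataTest, "xVar", "yVar")
p_faceted <- my_plot2(dataTest, "xVar", "zVar", group = "brand")
```

Because the strings are looked up inside the function's own environment, each call keeps its own x, y and group.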

ggplot2: adding lines in a loop and retaining colour mappings

When running the following two pieces of code, I unexpectedly get different results. I need to add lines in a loop as in EX2, but all lines end up having the same colour. Why is this?
EX1
economics2 <- economics
economics2$unemploy <- economics$unemploy + 1000
economics3 <- economics
economics3$unemploy <- economics$unemploy + 2000
economics4 <- economics
economics4$unemploy <- economics$unemploy + 3000
b <- ggplot() +
geom_line(aes(x = date, y = unemploy, colour = as.character(1)), data=economics2) +
geom_line(aes(x = date, y = unemploy, colour = as.character(2)), data=economics3) +
geom_line(aes(x = date, y = unemploy, colour = as.character(3)), data=economics4)
print(b)
EX2
#economics2, economics3, economics4 are reused from EX1.
b <- ggplot()
econ <- list(economics2, economics3, economics4)
for(i in 1:3){
b <- b + geom_line(aes(x = date, y = unemploy, colour = as.character(i)), data=econ[[i]])
}
print(b)
This is not a good way to use ggplot. Try this way:
econ <- list(e1=economics2, e2=economics3, e3=economics4)
df <- cbind(cat=rep(names(econ),sapply(econ,nrow)),do.call(rbind,econ))
ggplot(df, aes(date,unemploy, color=cat)) + geom_line()
This puts your three versions of economics into a single data.frame, in long format (all the data in 1 column, with a second column, cat in this example, identifying the source). Once you've done that, ggplot takes care of everything else. No loops.
The specific reason your loop failed, as pointed out in the comment, is that using aes(...) stores the expression in the ggplot object, and that expression is evaluated when you call print(...). At that point i is 3.
Note that this does not apply to the data=... argument, so you could have done something like this:
b=ggplot()
for(i in 1:3){
b <- b + geom_line(aes(x=date,y=unemploy,colour=cat),
data=cbind(cat=as.character(i),econ[[i]]))
}
print(b)
But, this is still the wrong way to use ggplot.
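If the loop really has to stay, another option is to force the value of i into the mapping when aes() is called. This is only a sketch and assumes ggplot2 >= 3.0.0, where aes() supports the !! operator; the long-format approach above remains the simpler one.

```r
library(ggplot2)

# Three shifted copies of the economics data, as in the question
econ <- lapply(1:3, function(k) {
  e <- economics
  e$unemploy <- e$unemploy + 1000 * k
  e
})

b <- ggplot()
for (i in 1:3) {
  # !! evaluates as.character(i) now, so each layer stores "1", "2" or "3"
  b <- b + geom_line(aes(x = date, y = unemploy, colour = !!as.character(i)),
                     data = econ[[i]])
}
b <- b + labs(colour = "series")
```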

Data driven plot names in data.table

This is a personal project to learn the syntax of the data.table package. I am trying to use the data values to create multiple graphs and label each based on the by group value. For example, given the following data:
# Generate dummy data
require(data.table)
set.seed(222)
DT = data.table(grp=rep(c("a","b","c"),each=10),
x = rnorm(30, mean=5, sd=1),
y = rnorm(30, mean=8, sd=1))
setkey(DT, grp)
The data consists of random x and y values for 3 groups (a, b, and c). I can create a formatted plot of all values with the following code:
# Example of plotting all groups in one plot
require(ggplot2)
p <- ggplot(data=DT, aes(x = x, y = y)) +
aes(shape = factor(grp))+
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
labs(title = "Group: ALL")
p
This creates the following plot:
Instead I would like to create a separate plot for each by group, and change the plot title from “Group: ALL” to “Group: a”, “Group: b”, “Group: c”, etc. The documentation for data.table says:
.BY is a list containing a length 1 vector for each item in by. This can be useful when by is not known in advance. The by variables are also available to j directly by name; useful for example for titles of graphs if j is a plot command, or to branch with if()
That being said, I do not understand how to use .BY or .SD to create separate plots for each group. Your help is appreciated.
Here is the data.table solution, though again, not what I would recommend:
make_plot <- function(dat, grp.name) {
print(
ggplot(dat, aes(x=x, y=y)) +
geom_point() + labs(title=paste0("Group: ", grp.name$grp))
)
NULL
}
DT[, make_plot(.SD, .BY), by=grp]
What you really should do for this particular application is what @dmartin recommends. At least, that's what I would do.
Instead of using data.table, you could use facet_grid in ggplot with the labeller argument:
p <- ggplot(data=DT, aes(x = x, y = y)) + aes(shape = factor(grp)) +
geom_point(aes(colour = factor(grp), shape = factor(grp)), size = 3) +
facet_grid(. ~ grp, labeller = label_both)
See the ggplot documentation for more information.
I see you already have a "facetting" option. I had done this
p+facet_wrap('grp')
But this gives the same result:
p+facet_wrap(~grp)
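If separate plot objects with their own titles are still wanted (rather than facets), one more sketch in base R: split the data by group and build one plot per piece. A plain data.frame is used here instead of a data.table to keep the example dependency-free, and the names d and plots are made up.

```r
library(ggplot2)

# Same shape as the dummy data in the question, as a plain data.frame
set.seed(222)
d <- data.frame(grp = rep(c("a", "b", "c"), each = 10),
                x = rnorm(30, mean = 5, sd = 1),
                y = rnorm(30, mean = 8, sd = 1))

# split() gives one data frame per group; each becomes its own titled plot
plots <- lapply(split(d, d$grp), function(sub) {
  ggplot(sub, aes(x = x, y = y)) +
    geom_point(size = 3) +
    labs(title = paste0("Group: ", sub$grp[1]))
})

names(plots)  # one entry per group
```

Each element can then be printed on its own, e.g. print(plots$a).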
