Pass by value in R - r

When trying to call grid.arrange to put multiple plots on the same ggplot2 graph, I first build a list of the plots I want. Then I build the corresponding argument list to call grid.arrange, as was explained in a previous question. This is my code (my data frame is called manip):
args.list <- NULL;
plot.list <- NULL;
for (m in names(manip[2:10])) {
plot.list <- c(plot.list, list(qplot(manip$side, y=manip[,m],ylab=m)))
}
args.list <- c(plot.list, 1, 9)
names(args.list) <- c(names(manip)[2:10], list("nrow","ncol"))
do.call(grid.arrange, args.list)
This works, except that the 9 graphs are exactly the same! After checking, it turns out that the data is always the one corresponding to m=10. So my guess was that the value of m is not assigned in the loop, but evaluated later. However, the label ylab=m is assigned correctly and is different for all the graphs.
So I don't really get what the difference is and how the interpreter chooses when to evaluate m for the plots. Can someone explain?

The behavior is due to the lazy evaluation of R.
Here is a minimal(?) example:
d <- 1:3
args.list <- NULL;
plot.list <- NULL;
for (m in 1:3) {
plot.list <- c(plot.list, list(qplot(d[m], d[m], ylab=letters[m])))
}
args.list <- c(plot.list, nrow=1, ncol=3)
do.call(grid.arrange, args.list)
In this case, d[m] is not evaluated until the plots are drawn inside do.call, so m is 3 for every panel.
Here is a workaround:
d <- 1:3
args.list <- NULL;
plot.list <- NULL;
for (m in 1:3) {
plot.list <- c(plot.list,
list(qplot(d, d, data=data.frame(d=d[m]), ylab=letters[m])))
}
args.list <- c(plot.list, nrow=1, ncol=3)
do.call(grid.arrange, args.list)
In this case, d[m] is evaluated at the call to qplot, and its value is stored in the object that qplot returns.
So the simple solution is to pass the data to qplot() or ggplot() through the data argument.
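The difference between the two versions can be seen without ggplot2. What follows is only a base-R sketch of the idea, not what qplot does internally: quote() stands in for an expression that is kept unevaluated, and the names captured, stored, late and early are made up for this illustration.

```r
# Base-R sketch of deferred vs. immediate evaluation (hypothetical names).
d <- c(10, 20, 30)
env <- environment()   # where d and m live

captured <- list()     # expressions kept unevaluated, like a mapping on d[m]
stored   <- list()     # values computed right away, like a data argument

for (m in 1:3) {
  captured[[m]] <- quote(d[m])  # stores the expression, not its value
  stored[[m]]   <- d[m]         # evaluated now, while m is 1, then 2, then 3
}

# "Drawing" happens after the loop, when m is already 3.
late  <- vapply(captured, function(e) eval(e, envir = env), numeric(1))
early <- unlist(stored)

print(late)   # 30 30 30
print(early)  # 10 20 30
```

The late values are all the same because every stored expression looks up m only when it is finally evaluated.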

I will first answer your question and then show an alternative using a facet plot.
The following, much simplified, code seems to work:
library(ggplot2)
library(gridExtra)
manip <- mtcars
plot.list <- lapply(2:11,
function(x)qplot(manip$mpg, y=manip[, x],
ylab=names(manip)[x]))
do.call(grid.arrange, c(plot.list, nrow=10))
It produces this ugly plot:
Without knowing your objectives, it is dangerous to try and give advice, I know. Nonetheless, have you considered using facets for your plot instead?
The following code is much simpler, executes quicker and produces a graph that is easier to interpret:
library(reshape2)
manip <- mtcars
mmanip <- melt(manip, id.vars="mpg")
str(mmanip)
ggplot(mmanip, aes(x=mpg, y=value)) +
geom_point(stat="identity") +
facet_grid(.~variable, scales="free")

Perhaps it would be better to melt the data and use faceting?
library(ggplot2)
library(reshape2)
manip <- data.frame(car = row.names(mtcars), mtcars)
manip.m <- melt(manip)
qplot(car, value, data = manip.m) + facet_wrap(~variable, scales = "free_y")
The x-axis labels need some polishing:
last_plot() + theme(axis.text.x = element_text(angle = 90))
HTH

Related

How can I create a list of plots to be rendered with ggplot?

I am trying to construct a list of ggplot graphics, which will be plotted later. What I have so far, using Anscombe's quartet for an example, is:
library(ggplot2)
library(gridExtra)
base <- ggplot() + xlim(4,19)
plots = vector(mode = "list", length = 4)
for(i in 1:4) {
x <- anscombe[,i]
y <- anscombe[,i+4]
p <- geom_point(aes(x,y),colour="blue")
q <- geom_smooth(aes(x,y),method="lm",colour="red",fullrange=T)
plots[[i]] <- base+p+q
}
grid.arrange(grobs = plots,ncol=2)
As I travel through the loop, I want the current values of the plots p and q to be added with the base plot, into the i-th value of the list. That is, so that list element number i contains the plots relating to the i-th x and y columns from the dataset.
However, what happens is that the last plot only is drawn, four times. I've done something very similar with base R, using mfrow, plot and abline, so that I believe my logic is correct, but my implementation isn't. I suspect that the issue is with these lines:
plots = vector(mode = "list", length = 4)
plots[[i]] <- base+p+q
How can I create a list of ggplot graphics; starting with an empty list?
(If this is a trivial and stupid question, I apologise. I am very new both to R and to the Grammar of Graphics.)
The code works properly if lapply() is used instead of a for loop.
plots <- lapply(1:4, function(i) {
# create plot number i
})
The reason for this issue is that ggplot uses lazy evaluation. By the time the plots are rendered, the loop has already iterated to i=4, so the last plot is displayed four times.
Full working example:
library(ggplot2)
library(gridExtra)
base <- ggplot() + xlim(4,19)
plots <- lapply(1:4, function(i) {
x <- anscombe[,i]
y <- anscombe[,i+4]
p <- geom_point(aes(x,y),colour="blue")
q <- geom_smooth(aes(x,y),method="lm",colour="red",fullrange=T)
base+p+q
})
grid.arrange(grobs = plots,ncol=2)
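One way to see why lapply() behaves differently from the for loop is to look at plain closures. This is a base-R analogy under my own assumptions, not ggplot2 code, and the names loop_fns and apply_fns are invented for the sketch. In the loop, every function shares the single variable i. With lapply(), each call gets its own environment with its own copy of i.

```r
# for loop: all closures share the one loop variable i
loop_fns <- list()
for (i in 1:4) {
  loop_fns[[i]] <- function() i
}
loop_values <- vapply(loop_fns, function(f) f(), integer(1))

# lapply: each call to the anonymous function has its own i
apply_fns <- lapply(1:4, function(i) {
  force(i)        # evaluate the argument now (explicit, to be safe)
  function() i
})
apply_values <- vapply(apply_fns, function(f) f(), integer(1))

print(loop_values)   # 4 4 4 4
print(apply_values)  # 1 2 3 4
```

The plots built inside lapply() keep their own x and y for the same reason: those variables live in the environment of one function call and are not overwritten by the next iteration.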
To force evaluation, there's a simple solution: change aes(...) into aes_(...) and your code works.
library(ggplot2)
library(gridExtra)
base <- ggplot() + xlim(4,19)
plots <- lapply(1:4, function(i) {
x <- anscombe[,i]
y <- anscombe[,i+4]
p <- geom_point(aes_(x,y),colour="blue")
q <- geom_smooth(aes_(x,y),method="lm",colour="red",fullrange=T)
base+p+q
})
grid.arrange(grobs = plots,ncol=2)

Iterating over a list using ggarrange

I have the following bit of code and don't understand why the for loop isn't working. I'm new to this, so excuse me if this is obvious, but it doesn't produce a combined set of graphs (as the more brute-force method below does); it just prints out each graph individually.
library(ggpubr)
graphs <- lapply(names(hemi_split), function(i){
ggplot(data=hemi_split[[i]], aes(x=type, y=shoot.mass))+
geom_point()+
facet_wrap(.~host, scales="free")+
theme_minimal()+
labs(title=i)
});graphs
for (i in 1:length(graphs)) {
ggarrange(graphs[[i]])
} ##not working
## this works, and is the desired output
ggarrange(graphs[[1]], graphs[[2]], graphs[[3]],
graphs[[4]], graphs[[5]], graphs[[6]],
graphs[[7]], graphs[[8]], graphs[[9]],
graphs[[10]], graphs[[11]])
thank you!
You can use do.call to provide all of the list elements of graphs as arguments of ggarrange:
library(ggpubr)
graphs <- lapply(names(mtcars)[2:5],function(x){
ggplot(mtcars,aes_string(x = x, y = "mpg")) +
geom_point()})
do.call(ggarrange,graphs)
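If do.call is unfamiliar, here is a small base-R sketch of what it does, using paste instead of ggarrange so that it runs without any plotting packages (the variable names are just for illustration). Each element of the list becomes one argument of the call, and named elements become named arguments.

```r
parts <- list("a", "b", "c")

# do.call spreads the list elements out as separate arguments ...
via_do_call <- do.call(paste, c(parts, sep = "-"))

# ... which is the same as writing the call out by hand
by_hand <- paste(parts[[1]], parts[[2]], parts[[3]], sep = "-")

print(via_do_call)  # "a-b-c"
print(identical(via_do_call, by_hand))  # TRUE
```

Appending named elements with c(), as done with sep here, is the same trick as adding nrow or ncol to a list of plots before calling do.call.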
Another solution, using purrr:
library(tidyverse)
ggraphs <- map(names(mtcars)[2:5],
~ ggplot(mtcars,aes_string(x = .x, y = "mpg")) +
geom_point())
ggarrange(plotlist = ggraphs)

ggplot on grid with a grobList in R

I'm trying to plot multiple plots on a grid using ggplot2 in a for loop, followed by grid.arrange. But all the plots are identical afterwards.
library(ggplot2)
library(grid)
library(gridExtra)
test = data.frame(matrix(rnorm(320), ncol=16 ))
names(test) = sapply(1:16, function(x) paste0("var_",as.character(x)))
plotlist = list()
for (i in 1:(dim(test)[2]-1)){
plotlist[[i]] = ggplot(test) +
geom_point(aes(get(x=names(test)[dim(test)[2]]), y=get(names(test)[i])))
}
pdf("output.pdf")
do.call(grid.arrange, list(grobs=plotlist, nrow=3))
dev.off(4)
When running this code, it seems like the get() calls are only evaluated at the time of the grid.arrange call, so all of the y vectors in the plot are identical as "var_15". Is there a way to force get evaluation immediately, so that I get 15 different plots?
Thanks!
Here are two ways that use purrr::map functions instead of a for-loop. I find that I have less of a clear sense of what's going on when I try to use loops, and since there are functions like the apply and map families that fit so neatly into R's vector operations paradigm, I generally go with mapping instead.
The first example makes use of cowplot::plot_grid, which can take a list of plots and arrange them. The second uses the newer patchwork package, which lets you add plots together—like literally saying plot1 + plot2—and add a layout. To do all those additions, I use purrr::reduce with + as the function being applied to all the plots.
library(tidyverse)
set.seed(722)
test = data.frame(matrix(rnorm(320), ncol=16 ))
names(test) = sapply(1:16, function(x) paste0("var_",as.character(x)))
# extract all but last column
xvars <- test[, -ncol(test)]
By using purrr::imap, I can map over all the columns and apply a function with 2 arguments: the column itself, and its name. That way I can set an x-axis label that specifies the column name. I can also easily access the column of data without having to use get or any tidyeval tricks (although for something more complicated, a tidyeval solution might be better).
plots <- imap(xvars, function(variable, var_name) {
df <- tibble(x = variable, y = test[, ncol(test)])
ggplot(df, aes(x = x, y = y)) +
geom_point() +
xlab(var_name)
})
cowplot::plot_grid(plotlist = plots, nrow = 3)
library(patchwork)
# same as plots[[1]] + plots[[2]] + plots[[3]] + ...
reduce(plots, `+`) + plot_layout(nrow = 3)
Created on 2018-07-22 by the reprex package (v0.2.0).
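For anyone unsure what reducing with + means, here is a tiny sketch with numbers instead of plots. Note that it uses base R's Reduce rather than purrr::reduce, and the names values, folded and by_hand are only for this example. The function is applied to the first two elements, then to that result and the third element, and so on.

```r
values <- list(1, 2, 3, 4)

# Reduce folds the list from the left: ((1 + 2) + 3) + 4
folded <- Reduce(`+`, values)

# the same thing written out by hand
by_hand <- values[[1]] + values[[2]] + values[[3]] + values[[4]]

print(folded)  # 10
```

With patchwork loaded, the same left-to-right fold is what chains plots[[1]] + plots[[2]] + ... together.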
Try this:
library(ggplot2)
library(grid)
library(gridExtra)
set.seed(1234)
test = data.frame(matrix(rnorm(320), ncol=16 ))
names(test) = sapply(1:16, function(x) paste0("var_",as.character(x)))
plotlist = list()
for (i in 1:(dim(test)[2]-1)) {
# Define here the dataset for the i-th plot
df <- data.frame(x=test$var_16, y=test[, i])
plotlist[[i]] = ggplot(data=df, aes(x=x, y=y)) + geom_point()
}
grid.arrange(grobs=plotlist, nrow=3)

R: Error looping variable in ggplot2 geom_line

I am trying to add multiple lines to a plot using a for loop. Using i as a variable to navigate in my data, for some reason, I get an error in the ggplot part all of a sudden, saying that object i is not found ("Error in as.name(names(data$SURF)[i]) : object 'i' not found"). It seems like only those variables are accepted that are linked to the data frame given to ggplot. I have used different variables in ggplots before though, so I don't know why it doesn't work now. This is my code:
library(ggplot2)
#creating object 'data' for example purpose
surf <- list()
day <- c(1:7)
for(i in 1:5){
surf[[i]] <- data.frame(day*i, day-i)
names(surf)[i] <- paste("var",i,sep = "")
colnames(surf[[i]]) <- c("T0", "whatever")
}
data <- list(surf)
names(data)[1] <- "SURF"
df <- data.frame(day)
ret <- ggplot(df, aes(x=day))
for(i in 1:length(names(data$SURF))){
df[,i+1] <- data$SURF[[i]]$T0
colnames(df)[i+1] <- names(data$SURF)[i]
ret <- ret + geom_line(data = df, aes(y=as.name(names(data$SURF)[i]), colour= names(data$SURF)[i]))
}
I managed to solve the problem in a not fully pleasing way, by omitting new variables in the plot part. I am not fully content with this solution though, because I want to keep working from a more automated code. This is the "dirty" solution:
df <- data.frame(day)
ret <- ggplot(df, aes(x=day))
for(i in 1:length(names(data$SURF))){
df$y <- data$SURF[[i]]$T0
df$name <- names(data$SURF[i])
ret <- ret + geom_line(data = df, aes(y=y, colour= name))
}
I'd be grateful if someone could help me figure out why the use of 'external' variable i does not work in this example.
Thanks for the help. I see, my head was thinking in the wrong direction and way too complicated. Using the melt() function of the reshape2 package really gives me everything I need for this purpose. For anyone interested, here is how I solved my problem:
library(reshape2)
library(ggplot2)
for(i in 1:length(data$SURF)){
df[,i+1] <- data$SURF[[i]]$T0
}
colnames(df) <- c("day", names(data$SURF))
mlt <- melt(data = df, id.vars = "day")
ret <- ggplot(mlt) +
aes(x=day, y=value, group=variable, colour = variable) +
geom_line()
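To show what the long format looks like, here is a small sketch with made-up data. It uses base R's stack() rather than reshape2::melt(), so the column names differ (values and ind instead of value and variable), and the data frame wide is invented for the example.

```r
# wide format: one column per series (made-up numbers)
wide <- data.frame(day = 1:3, var1 = c(1, 2, 3), var2 = c(2, 4, 6))

# long format: one row per (day, series) pair
stacked <- stack(wide[, c("var1", "var2")])   # columns: values, ind
long <- data.frame(day = rep(wide$day, times = 2), stacked)

print(long)
```

Every series now sits in the same values column, with ind saying which series each row belongs to, which is the shape that lets a single geom_line() with a colour or group mapping draw all the lines.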

Histograms using ggplot2 within loop

I would like to create a grid of histograms using a loop and ggplot2. Say I have the following code:
library(gridExtra)
library(ggplot2)
df<-matrix(NA,2000,5)
df[,1]<-rnorm(2000,1,1)
df[,2]<-rnorm(2000,2,1)
df[,3]<-rnorm(2000,3,1)
df[,4]<-rnorm(2000,4,1)
df[,5]<-rnorm(2000,5,1)
df<-data.frame(df)
out<-NULL
for (i in 1:5){
out[[i]]<-ggplot(df, aes(x=df[,i])) + geom_histogram(binwidth=.5)
}
grid.arrange(out[[1]],out[[2]],out[[3]],out[[4]],out[[5]], ncol=2)
Note that all of the plots appear, but that they all have the same mean and shape, despite having set each of the columns of df to have different means.
It seems to only plot the last plot (out[[5]]), that is, the loop seems to be reassigning all of the out[[i]]s with out[[5]].
I'm not sure why, could someone help?
I agree with @GabrielMagno, faceting is the way to go. But if for some reason you need to work with the loop, then either of these will do the job.
library(gridExtra)
library(ggplot2)
df<-matrix(NA,2000,5)
df[,1]<-rnorm(2000,1,1)
df[,2]<-rnorm(2000,2,1)
df[,3]<-rnorm(2000,3,1)
df[,4]<-rnorm(2000,4,1)
df[,5]<-rnorm(2000,5,1)
df<-data.frame(df)
out<-list()
for (i in 1:5){
x = df[,i]
out[[i]] <- ggplot(data.frame(x), aes(x)) + geom_histogram(binwidth=.5)
}
grid.arrange(out[[1]],out[[2]],out[[3]],out[[4]],out[[5]], ncol=2)
or
out1 = lapply(df, function(x){
ggplot(data.frame(x), aes(x)) + geom_histogram(binwidth=.5) })
grid.arrange(out1[[1]],out1[[2]],out1[[3]],out1[[4]],out1[[5]], ncol=2)
I would recommend using facet_wrap instead of aggregating and arranging the plots by yourself. It requires you to specify a grouping variable in the data frame that separates the values for each distribution. You can use the melt function from the reshape2 package to create such a new data frame. So, having your data stored in df, you could simply do this:
library(ggplot2)
library(reshape2)
ggplot(melt(df), aes(x = value)) +
facet_wrap(~ variable, scales = "free", ncol = 2) +
geom_histogram(binwidth = .5)
That would give you something similar to this:
