I'm using a for loop to assign ggplots to a list, which is then passed to plot_grid() (package cowplot). plot_grid places multiple ggplots side by side in a single figure. This works fine manually, but when I use a for loop, the last plot generated is repeated in each subframe of the figure (shown below). In other words, all the subframes show the same ggplot.
Here is a toy example:
library(ggplot2) # loaded explicitly rather than relying on cowplot to attach it
library(cowplot)
dfrm <- data.frame(A=1:10, B=10:1)
v <- c("A","B")
dfmsize <- nrow(dfrm)
myplots <- vector("list",2)
count = 1
for(i in v){
myplots[[count]] <- ggplot(dfrm, aes(x=1:dfmsize, y=dfrm[,i])) + geom_point() + labs(y=i)
count = count +1
}
plot_grid(plotlist=myplots)
Expected Figure:
Figure from for loop:
I tried converting the list elements to grobs, as described in this question, like this:
mygrobs <- lapply(myplots, ggplotGrob)
plot_grid(plotlist=mygrobs)
But I got the same result.
I think the problem lies in the loop assignment, not plot_grid(), but I can't see what I'm doing wrong.
The answers so far are very close, but unsatisfactory in my opinion. The problem is the following - after your for loop:
myplots[[1]]$mapping
#* x -> 1:dfmsize
#* y -> dfrm[, i]
myplots[[1]]$plot_env
#<environment: R_GlobalEnv>
myplots[[2]]$mapping
#* x -> 1:dfmsize
#* y -> dfrm[, i]
myplots[[2]]$plot_env
#<environment: R_GlobalEnv>
i
#[1] "B"
As the other answers mention, ggplot doesn't actually evaluate those expressions until plotting, and since these are all in the global environment, and the value of i is "B", you get the undesirable results.
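The delayed evaluation can be reproduced without ggplot2. The following is a minimal base R sketch, not ggplot2's actual internals: quote() stands in for the unevaluated mapping that aes() stores, and the names exprs, res_A and res_B are made up for illustration.

```r
dfrm <- data.frame(A = 1:10, B = 10:1)

# Store one unevaluated expression per loop iteration,
# much as aes() stores an unevaluated mapping.
exprs <- list()
for (i in c("A", "B")) {
  exprs[[i]] <- quote(dfrm[, i])
}

# Evaluation happens only now, after the loop, when i is "B".
res_A <- eval(exprs[["A"]])
res_B <- eval(exprs[["B"]])

identical(res_A, res_B)  # TRUE: both expressions see the final value of i
```

Both stored expressions are literally the same call, so whatever i holds at evaluation time decides the result for all of them.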
There are several ways of avoiding this issue, the simplest of which in fact simplifies your expressions:
myplots = lapply(v, function(col)
ggplot(dfrm, aes(x=1:dfmsize, y=dfrm[,col])) + geom_point() + labs(y=col))
This works because each call made by lapply gets its own environment:
myplots[[1]]$mapping
#* x -> 1:dfmsize
#* y -> dfrm[, col]
myplots[[1]]$plot_env
#<environment: 0x000000000bc27b58>
myplots[[2]]$mapping
#* x -> 1:dfmsize
#* y -> dfrm[, col]
myplots[[2]]$plot_env
#<environment: 0x000000000af2ef40>
eval(quote(dfrm[, col]), envir = myplots[[1]]$plot_env)
#[1] 1 2 3 4 5 6 7 8 9 10
eval(quote(dfrm[, col]), envir = myplots[[2]]$plot_env)
#[1] 10 9 8 7 6 5 4 3 2 1
So even though the expressions are the same, the results are different.
And in case you're wondering what exactly is stored/copied to the environment of lapply - unsurprisingly it's just the column name:
ls(myplots[[1]]$plot_env)
#[1] "col"
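The per-call environments can also be inspected without building any plots. This is a small base R sketch under the assumption that the anonymous function simply returns its own environment; envs, first and second are illustrative names, not part of the original answer.

```r
dfrm <- data.frame(A = 1:10, B = 10:1)
v <- c("A", "B")

# Each call of the anonymous function gets a fresh environment
# with its own binding of `col`; return that environment.
envs <- lapply(v, function(col) {
  force(col)      # make sure the argument is evaluated now
  environment()
})

identical(envs[[1]], envs[[2]])  # FALSE: two distinct environments

# The same expression gives different results in each environment.
first  <- eval(quote(dfrm[, col]), envir = envs[[1]])  # column A
second <- eval(quote(dfrm[, col]), envir = envs[[2]])  # column B
```

A for loop at the top level has no such per-iteration environment, which is why every plot built there shares one value of i.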
I believe the problem here is that the non-standard evaluation of the aes method delays evaluating i until the plot is actually plotted. By the time of plotting, i is the last value (in the toy example "B") and thus the y aesthetic mapping for all plots refers to that last value. Meanwhile, the labs call uses standard evaluation and so the labels correctly refer to each iteration of i in the loop.
This can be fixed by simply using the standard evaluation version of the mapping function, aes_q:
library(ggplot2) # loaded explicitly rather than relying on cowplot to attach it
library(cowplot)
dfrm <- data.frame(A=1:10, B=10:1)
v <- c("A","B")
dfmsize <- nrow(dfrm)
myplots <- vector("list",2)
count = 1
for(i in v){
myplots[[count]] <- ggplot(dfrm, aes_q(x=1:dfmsize, y=dfrm[,i])) + geom_point() + labs(y=i)
count = count +1
}
plot_grid(plotlist=myplots)
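As a rough base R analogy for what standard evaluation buys you here (an illustration only, not how aes_q is implemented), bquote() can substitute the current value of i into the expression at the moment it is created. The names eager, col_A and col_B are made up for this sketch.

```r
dfrm <- data.frame(A = 1:10, B = 10:1)

eager <- list()
for (i in c("A", "B")) {
  # .(i) is replaced by the value of i at this moment
  eager[[i]] <- bquote(dfrm[, .(i)])
}

eager[["A"]]                 # the stored call now contains "A", not i

# Evaluating after the loop still gives the right columns,
# even though i is "B" by now.
col_A <- eval(eager[["A"]])
col_B <- eval(eager[["B"]])
```

Because the value is fixed when the expression is built, later changes to i no longer matter.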
There is a nice explanation of what happens with ggplot2's lazy evaluation and for loops in [this answer](https://stackoverflow.com/a/26246791/2461552).
I usually switch to aes_string or aes_ for situations like this so I can use variables as strings in ggplot2.
I find lapply loops easier than a for loop in your case as initializing the list and using the counter can be avoided.
First, I add the x variable to the dataset.
dfrm$index = 1:nrow(dfrm)
Now, the lapply loop, looping through the columns in v.
myplots = lapply(v, function(x) {
ggplot(dfrm, aes_string(x = "index", y = x)) +
geom_point() +
labs(y = x)
})
plot_grid(plotlist = myplots)
I think ggplot is getting confused by looking for your x and y variables inside of dfrm even though you are actually defining them on the fly. If you change the for loop slightly to build a new sub data.frame as the first line it works just fine.
myplots <- list()
count = 1
for(i in v){
df <- data.frame(x = 1:dfmsize, y = dfrm[,i])
myplots[[count]] <- ggplot(df, aes(x=x, y=y)) + geom_point() + labs(y=i)
count = count + 1
}
plot_grid(plotlist=myplots)
I have a dataset with numeric and factor variables. I want one page of plots for the numeric variables and another for the factor variables. First, I select the factor variables and their indices.
My df is the iris dataset.
df<-iris
df$y<-sample(0:1,nrow(iris),replace=TRUE)
fact<-colnames(df)[sapply(df,is.factor)]
index_fact<-which(names(df)%in%fact)
Then I count the remaining (numeric) variables:
nm<-ncol(df)-length(fact)
The next step is the loop:
i_F=1
i_N=1
list_plotN<- list()
list_plotF<- list()
for (i in 1:length(df)){
plot <- ggplot(df,aes(x=df[,i],color=y,fill=y))+xlab(names(df)[i])
if (is.factor(df[,i])){
p_factor<-plot+geom_bar()
list_plotF[[i_F]]<-p_factor
i_F=i_F+1
}else{
p_numeric <- plot+geom_histogram()
list_plotN[[i_N]]<-p_numeric
i_N=i_N+1
}
}
When I inspect list_plotF and list_plotN, the results are wrong: every plot shows the same variable. I don't know what I'm doing wrong.
Thanks!
I don't really follow your for loop code all that well. But from what I see it seems to be saving the last plot in every loop you make. I've reconstructed what I think you need using lapply. I generally prefer lapply to for loops whenever I can.
lapply takes a list of values and a function, and applies that function to every value. You can define the function separately, as I have, so everything looks cleaner; then you just pass its name to lapply.
In our case the list is the column names of your data frame df. The function first creates the base plot, then checks whether the column it is looking at is a factor. If it is, it creates a bar graph; otherwise it creates a histogram.
histOrBar <- function(var) {
basePlot <- ggplot(df, aes_string(var))
if ( is.factor(df[[var]]) ) {
basePlot + geom_bar()
} else {
basePlot + geom_histogram()
}
}
loDFs <- lapply(colnames(df), histOrBar)
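To get the two separate lists the question asks for, one option is to split the column names by type first and then apply histOrBar to each group. This is only a sketch: is_fact, factor_cols and numeric_cols are names introduced here, and the last two lines are left as comments because they need ggplot2 and the histOrBar function defined above.

```r
df <- iris
df$y <- sample(0:1, nrow(df), replace = TRUE)

# TRUE for factor columns, FALSE for everything else
is_fact <- vapply(df, is.factor, logical(1))

factor_cols  <- names(df)[is_fact]    # column names for the bar charts
numeric_cols <- names(df)[!is_fact]   # column names for the histograms

# With histOrBar() defined as above, the two lists would then be:
# list_plotF <- lapply(factor_cols, histOrBar)
# list_plotN <- lapply(numeric_cols, histOrBar)
```

This avoids the two manual counters from the original loop.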
Consider passing column names with aes_string to better align x with df:
for (i in 1:length(df)){
plot <- ggplot(df, aes_string(x=names(df)[i], color="y", fill="y")) +
xlab(names(df)[i])
...
}
To demonstrate the problem using aes() and solution using aes_string() in OP's context, consider the following random data frame with columns of different data types: factor, char, int, num, bool, date.
Data
library(ggplot2)
set.seed(1152019)
alpha <- c(LETTERS, letters, c(0:9))
data_tools <- c("sas", "stata", "spss", "python", "r", "julia")
random_df <- data.frame(
group = sample(data_tools, 500, replace=TRUE),
int = as.numeric(sample(1:15, 500, replace=TRUE)),
num = rnorm(500),
char = replicate(500, paste(sample(LETTERS[1:2], 3, replace=TRUE), collapse="")),
bool = as.numeric(sample(c(TRUE, FALSE), 500, replace=TRUE)),
date = as.Date(sample(as.integer(as.Date('2019-01-01', origin='1970-01-01')):as.integer(Sys.Date()),
500, replace=TRUE), origin='1970-01-01'),
stringsAsFactors = TRUE # keep group and char as factors; R >= 4.0 no longer does this by default
)
Graph
fact <- colnames(random_df)[sapply(random_df,is.factor)]
index_fact <- which(names(random_df) %in% fact)
i_F=1
i_N=1
list_plotN <- list()
list_plotF <- list()
plot <- NULL
for (i in 1:length(random_df)){
# aes() VERSION
#plot <- ggplot(random_df, aes(x=random_df[,i], color=group, fill=group)) +
# xlab(names(random_df)[i])
# aes_string() VERSION
plot <- ggplot(random_df, aes_string(x=names(random_df)[i], color="group", fill="group")) +
xlab(names(random_df)[i])
if (is.factor(random_df[,i])){
p_factor <- plot + geom_bar()
list_plotF[[i_F]] <- p_factor
i_F=i_F+1
}else{
p_numeric <- plot + geom_histogram()
list_plotN[[i_N]] <- p_numeric
i_N=i_N+1
}
}
Problem (using aes() where graph outputs DO NOT change according to type)
Solution (using aes_string() where graphs DO change according to type)
I'm building a function that performs some actions on a dataset, and I want it to return a variety of plots that may or may not be needed at any given time.
My approach so far is to return a list of objects, including some ggplot objects.
The issue is that when I call the function without assignment, or execute the resulting list, the ggplots are plotted alongside the printing of the list summary, a behaviour I want to avoid as the object may include many ggplot objects.
Example:
library(ggplot2)
df <- data.frame(
x = 1:10,
y = 10:1,
g = rep(c('a', 'b'), each=5)
)
df_list <- split(df, df$g)
plot_list <- lapply(df_list, function(d){
ggplot(d) +
geom_point(aes(x=x, y=y))
})
plot_list
# $`a` # plots plot_list$a
#
# $`b` # plots plot_list$b
I don't want to modify the default behaviour of ggplot2 objects. I am open to a more advanced S3 solution, but I have no idea how to go about it.
You can simply override the default behaviour by having your function return an object from a custom class.
Option 1: subclass your plots
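Here the plots get an additional class, quiet_plot, whose print method does not draw anything.
To actually draw one you have to bypass that method, for example by calling the ggplot print method explicitly or by removing the "quiet_plot" class before calling print().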
library(ggplot2)
df <- data.frame(
x = 1:10,
y = 10:1,
g = rep(c('a', 'b'), each=5)
)
df_list <- split(df, df$g)
plot_list <- lapply(df_list, function(d){
out <- ggplot(d) +
geom_point(aes(x=x, y=y))
class(out) <- c("quiet_plot", class(out))
out
})
print.quiet_plot <- function(x, ...) {
print("A plot not displayed!")
}
plot_list
Option 2 - subclass your list
This allows you to specify how you want the list to be printed when you just type plot_list in the console. Here I had it print the names of the list instead of the full list content.
plot_list <- lapply(df_list, function(d){
ggplot(d) +
geom_point(aes(x=x, y=y))
})
class(plot_list) <- c("quiet_list", class(plot_list))
print.quiet_list <- function(x, ...) {
cat("A list with names:")
print(names(x))
}
plot_list
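The dispatch mechanism behind both options can be tried without ggplot2. Below is a minimal sketch with a plain list standing in for the list of plots; the object names quiet, shown and full are illustrative, and returning invisible(x) is an addition that follows the usual convention for print methods.

```r
# A plain list standing in for the list of plots
quiet <- structure(list(a = 1:3, b = letters[1:2]),
                   class = c("quiet_list", "list"))

# Autoprinting at the console calls print(), which dispatches here
print.quiet_list <- function(x, ...) {
  cat("A list with names:", names(x), "\n")
  invisible(x)   # conventional return value for print methods
}

shown <- capture.output(print(quiet))  # the single summary line
full  <- unclass(quiet)                # drop the class to get default printing back
```

Printing full (or, for real plots, printing individual elements such as plot_list$a) still shows the complete content whenever it is needed.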
I have a list of elemental compositions and I'd like to display a count for the number of times an element is included in a composition mapped onto the periodic table (e.g. CH4 would increase the count on H and C by one).
How can I do this with ggplot? Is there a map I can use?
With a bit of searching I found information about the periodic table in this example code project. They had an Access Database with element information. I've exported it to this gist. You can import the data using the httr library with
library(httr)
dd <- read.table(text=content(GET("https://gist.githubusercontent.com/MrFlick/c1183c911bc5398105d4/raw/715868fba2d0d17a61a8081de17c468bbc525ab1/elements.txt")), sep=",", header=TRUE)
(You should probably create your own local version for easier loading in the future.)
Then your other challenge is decomposing something like "CH4" into the raw element counts. I've created this helper function which I think does what you need.
decompose <- function(x) {
m <- gregexpr("([A-Z][a-z]?)(\\d*)", x, perl=TRUE)
dx <- Map(function(x, y) {
ElementSymbol <- gsub("\\d","", x)
cnt <- as.numeric(gsub("\\D","", x))
cnt[is.na(cnt)]<-1
cbind(Sym=y, as.data.frame(xtabs(cnt~ElementSymbol)))
}, regmatches(x,m), x)
do.call(rbind, dx)
}
Here I test the function
test_input <- c("H2O","CH4")
decompose(test_input)
# Sym ElementSymbol Freq
# 1 H2O H 2
# 2 H2O O 1
# 3 CH4 C 1
# 4 CH4 H 4
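If all you need is the number of compositions that contain each element, as the question describes, a shorter route might look like the sketch below. It assumes simple formulas without parentheses or hydrates, and the names formulas, symbols and counts are made up for illustration.

```r
formulas <- c("H2O", "CH4", "CO2")

# Pull out the element symbols: a capital letter, optionally
# followed by one lowercase letter.
symbols <- regmatches(formulas, gregexpr("[A-Z][a-z]?", formulas))

# Count each element at most once per formula, then tabulate.
counts <- table(unlist(lapply(symbols, unique)))
counts
```

The resulting table can be turned into a data frame with as.data.frame() and merged with the element reference data in the same way as the output of decompose.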
Now we can combine the data and the reference information to make a plot
library(ggplot2)
ggplot(merge(decompose("CH4"), dd), aes(Column, -Row)) +
geom_tile(data=dd, aes(fill=GroupName), color="black") +
geom_text(aes(label=Freq))
Clearly there are opportunities for improvement but this should give you a good start.
You might look for a more robust decomposition function. Looks like the CHNOSZ package has one
library(CHNOSZ)
data(thermo)
decompose <- function(x) {
do.call(`rbind`, lapply(x, function (x) {
z <- makeup(x)
cbind(data.frame(ElementSymbol = names(z),Freq=z), Sym=x)
}))
}
ggplot(merge(decompose("CaAl2Si2O7(OH)2*H2O"), dd), aes(Column, -Row)) +
geom_tile(data=dd, aes(fill=GroupName), color="black") +
geom_text(aes(label=Freq))
I'm trying to plot from a rather complex array in R. I want to produce an image with 3 by 3 graphs, each with red and blue points on it.
I've got a structure of apply loops which works, but I'd like to change the y maximum value by each row.
I would normally do this using a counter, like i, in other languages. But the apply thing in R is completely baffling me!
par(mfrow=c(3,3),pty="s") # a 3 by 3 graphic
set.seed(1001)
x <- 1:54 # with 1 to 54 along the x axis
y <- array(rexp(20), dim=c(54,6,3,2)) # and the y axis coming
# from an array with dimensions as shown.
ymax <- c(1,0.1,0.3) # three different y maximum values I want
# on the graphic, one for each row of graphs
counter <- 1 # a counter, starting at 1,
# as I would use in a traditional loop
apply(y[,3:5,,], 2, function(i) # my first apply, which only considers
# the 3rd, 4th and 5th columns
{
yy <- ymax[counter] # using the counter to select my ylimit maximum
apply(i, 2, function (ii) # my second apply, considering the 3rd
# dimension of y
{
plot(x,ii[,1], col="blue", ylim=c(0,yy))
# plotting the 4th dimension
points(x,ii[,2], col="red")
# adding points in a different
# colour from the 4th dim.
})
})
Thank you in advance for your thoughts, they are very much appreciated!
Cheers
Kate
I think it might be easier to use loops in this case.
Also, your code has no line that updates the counter, such as counter <- counter + 1. Inside apply, a plain <- would only create a local variable, so you need <<- (note the doubled <), which assigns to the existing variable in the enclosing environment, here the global one. An example using lapply:
Single lapply usage
counter <- 0
lapply(1:3, function(x) {
counter <<- counter + 1
cat("outer", counter, "\n")
plot(1:10, main=counter)
})
Or nested usage of lapply
counter <- 0
lapply(1:3, function(x) {
counter <<- counter + 1
cat("outer", counter, "\n")
lapply(1:3, function(x) {
counter <<- counter + 1
cat("inner", counter, "\n")
plot(1:10, main=counter)
})
})
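For comparison, the plain-loop version suggested at the start of this answer could look like the sketch below. It assumes the array layout from the question (second dimension selects the row of graphs, third the column, fourth the blue or red series) and fills the array with one random draw per cell instead of recycling 20 values; the loop variables r and k are names chosen here.

```r
set.seed(1001)
x <- 1:54
dims <- c(54, 6, 3, 2)
y <- array(rexp(prod(dims)), dim = dims)
ymax <- c(1, 0.1, 0.3)              # one y maximum per row of graphs

op <- par(mfrow = c(3, 3), pty = "s")
for (r in 1:3) {                    # rows: columns 3, 4 and 5 of y
  for (k in 1:3) {                  # columns: third dimension of y
    plot(x, y[, r + 2, k, 1], col = "blue", ylim = c(0, ymax[r]))
    points(x, y[, r + 2, k, 2], col = "red")
  }
}
par(op)                             # restore the previous graphics settings
```

Here the loop index r plays the role of the counter, so no separate counter variable or <<- is needed.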
The key thing here is to use lapply on the index rather than on the array itself, so then you can use the index to subset both your y limits and the array ahead of the inner loop. This also avoids having to use the <<- construct.
Simplified your data a bit:
par(mfrow=c(3,3),pty="s") # a 3 by 3 graphic
set.seed(1001)
x <- 1:10 # 1 to 10 along the x axis
dims <- c(10,6,3,2)
y <- array(rexp(prod(dims)), dim=dims) # the y values, one random draw per cell
ymax <- c(1,0.1,0.3)
lapply(1:3, function(counter, arr) {
apply(
arr[ ,counter + 2, , ], 2,
function(ii) {
plot(x, ii[,1], col="blue", ylim=c(0,ymax[counter]))
points(x, ii[,2], col="red")
} )
},
arr=y
)
I am not going to rewrite your code as I must say it is difficult to comprehend, but this will help: you can update a variable outside of the scope of apply by using <<- assignment, e.g. to update some external "counter"
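A minimal sketch of that idea, with no plotting involved; counter and res are illustrative names:

```r
counter <- 0

res <- sapply(1:3, function(x) {
  # <<- updates the counter defined outside the function;
  # a plain <- would only create a local copy.
  counter <<- counter + 1
  x * 10
})

counter  # 3: incremented once per call
```

Inside the function, counter can then be used as an index, for example ymax[counter].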