My goal is to get a list p which contains two graphs, p[[1]] and p[[2]].
p[[1]] and p[[2]] are supposed to be plots with the point (10,10) and the point (20,20), respectively. But after executing the code below, only p[[2]] in the list shows the expected graph; p[[1]] does not show its expected point.
How can I correct this so that p[[1]] in the list has the point (10,10)?
(It seems that the variables cordx and cordy are tightly coupled to p[[1]],
so whenever cordx and cordy change, the already-created p[[1]] is revised every time.)
library(ggplot2)
xx<-list(10,20);yy<-list(10,20)
p<-list()
for (i in (1:2) ) {
cordy<-yy[[i]];cordx<-xx[[i]] # But at the 2nd loop (i=2), after executing this line, my p[[1]] is unexpectedly affected and contains point (20,20)
g<-ggplot()+geom_point(aes(x=cordx,y=cordy))
p[[i]]<-g # at the 1st loop (i=1), p[[1]] contains point (10,10) as expected.
}
print(p[[1]])
print(p[[2]])
May I suggest using mapply() to avoid looping?
Here is the code:
library(ggplot2)
xx <- list(10,20)
yy <- list(10,20)
p <- mapply(function(cordx, cordy) { ggplot() + geom_point(aes(x = cordx, y = cordy)) }, xx, yy, SIMPLIFY = FALSE)
print(p[[1]])
print(p[[2]])
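What it does: mapply passes each element of xx and yy to the function that creates the plot, and the outputs of the function are stored in the object p. SIMPLIFY = FALSE forces p to be a list. Because each call of the function has its own cordx and cordy arguments, each plot keeps its own values instead of sharing two variables that a loop keeps overwriting.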
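If you would rather keep the for loop, here is a minimal sketch (assuming ggplot2 3.0 or later, where aes() accepts rlang's !! injection): splice the current numbers into the mapping when each plot is created. Without !!, aes() stores the names cordx and cordy and looks them up only when the plot is printed, so every plot sees the values from the last iteration.

```r
library(ggplot2)

xx <- list(10, 20)
yy <- list(10, 20)
p <- list()
for (i in 1:2) {
  # !! splices the current values into the mapping right now, so the plot
  # no longer depends on variables that the loop overwrites later
  p[[i]] <- ggplot() + geom_point(aes(x = !!xx[[i]], y = !!yy[[i]]))
}
print(p[[1]])  # point at (10, 10)
print(p[[2]])  # point at (20, 20)
```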
I want to make many plots using multiple pairs of variables in a dataframe, all with the same x. I store the plots in a named list. For simplicity, below is an example with only 1 variable in each plot.
Key to this function is a select() call that is clearly not necessary here but is necessary with my actual data.
The body of the function works fine on each variable, but when I loop through a list of variables, the last one in the list always produces
Error in get(ll): object 'd' not found.
(or whatever the last variable is, if not 'd'). Replacing data <- df %>% select(x,ll) with data <- df avoids the error.
library(ggplot2)
library(dplyr)
## make data
df2 <- data.frame(x = 1:10,
a = 1:10,
b = 2:11,
c = 101:110,
d = 10*(1:10))
## make function
testfun <- function(df = df2, vars = letters[1:4]){
## initialize list to store plots
plotlist <- list()
for (ll in vars){
## subset data
data <- df %>% select(x, ll) ## comment out select() to get working function
# print(data) ## uncomment to check that dataframe subset works correctly
## plot variable vs. x
p <- ggplot(data,
aes(x = x, y = get(ll))) +
geom_point() +
ylab(ll)
## add plot to named list
plotlist[[ll]] <- p
# print(p) ## uncomment to see that each plot is being made
}
return(plotlist) ## unnecessary, being explicit for troubleshooting
}
## use function
pl <- testfun(df2)
## error ?
pl
I have a work-around that avoids select() by renaming variables in my actual dataframe, but I am curious why this does not work. Any ideas?
get() could work, but not with ll directly. Try y = get(!!ll), or inject a symbol instead of a string with y = !!rlang::sym(ll).
aes() captures its arguments as unevaluated expressions, and ggplot evaluates them only when the plot is built (for example, when it is printed), as the error in the provided code demonstrates. By the time each ggplot evaluates get(ll), the for loop has already finished, so ll evaluates to the last value of the loop variable, "d", for all four ggplots. Because ll is "d" in the error, it looks as if the final ggplot object fails, but it's actually evaluating the first one that causes this error.
In the body of the loop we'd like a way to evaluate the ll variable and stick that resulting string ("a", "b", "c", or "d") into this code, the rest of which won't run until later. Changing y = get(ll) to y = get(!!ll) is one way to do this: !! performs "surgery" on the unevaluated expression (called a "blueprint for code" in Tidyverse docs) so that the expression passed into ggplot contains a literal string like "a" instead of the variable reference ll.
testfun <- function(df = df2, vars = letters[1:4]){
plotlist <- list()
for (ll in vars){
data <- df %>% select(x, ll)
p <- ggplot(data,
aes(x = x, y = get(!!ll))) +
geom_point() +
ylab(ll)
plotlist[[ll]] <- p
}
return(plotlist)
}
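The same "surgery" can be seen without ggplot. The sketch below uses base R's bquote(), whose .() plays a role similar to rlang's !! (a base-R analogue, not what ggplot2 uses internally): the first list stores the symbol ll, while the second splices in the string that ll holds at that moment.

```r
vars <- c("a", "b", "c", "d")
lazy <- list()
injected <- list()
for (ll in vars) {
  lazy[[ll]] <- quote(toupper(ll))          # keeps the symbol ll; looked up later
  injected[[ll]] <- bquote(toupper(.(ll)))  # .() splices in the current string now
}

env <- environment()
lazy_values <- vapply(lazy, eval, character(1), envir = env)
injected_values <- vapply(injected, eval, character(1), envir = env)
lazy_values      # all "D": ll is looked up after the loop has finished
injected_values  # "A" "B" "C" "D"
```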
Read on for explanation and an alternate solution.
The loop problem: late binding
In a given function or in the global scope in R, there's just one variable of any given name. A for (x in xs) loop repeatedly rebinds that variable to a new value. That means that after a for loop has finished, that variable still exists and retains the last value it was assigned. Here's a way this can trip you up:
vars <- c("a", "b", "c", "d")
results <- list()
for (ll in vars){
message("in for loop, ll: ", ll)
func <- function () { ll }
results[[ll]] <- c(ll, func)
}
message("after for loop, ll: ", ll)
# after for loop, now ll is "d"
for (vec in results) {
message(vec[[1]], " ", vec[[2]]())
}
This outputs
in for loop, ll: a
in for loop, ll: b
in for loop, ll: c
in for loop, ll: d
after for loop, ll: d
a d
b d
c d
d d
Each of the four functions constructed here uses the same outer-scope variable ll, which, by the time the functions are actually called after the for loop, is "d". The late-binding part is that the variable's value at call time (late) is used, not its value when the function was defined (early).
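A minimal sketch of the usual fix for late binding in base R: build each function inside its own call environment (a function factory) and force() the argument there, so each closure keeps the value ll had when it was created. make_getter() is a hypothetical name used only for illustration.

```r
# Each call to make_getter() gets its own environment, so each returned
# function remembers its own `value` instead of sharing the loop variable.
make_getter <- function(value) {
  force(value)  # evaluate the argument now; otherwise the promise would read ll later
  function() value
}

vars <- c("a", "b", "c", "d")
getters <- list()
for (ll in vars) {
  getters[[ll]] <- make_getter(ll)
}
ll  # still "d" after the loop
called <- vapply(getters, function(f) f(), character(1))
called  # a = "a", b = "b", c = "c", d = "d"
```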
The NSE problem
The OP isn't creating functions in a loop, though; they're calling ggplot. ggplot does something similar to creating a function: it takes some code as an argument that it doesn't evaluate until later. ggplot (or maybe aes) "captures" code from some of its arguments instead of running them. In the OP's case, get(ll) isn't evaluated until later.
When this code is evaluated, it's in a new context with a "data mask" that allows names of a data frame to be referenced directly. This part is great, and it's what we want: it's what makes get("a") work at all. But the fact that the evaluation happens later is a problem for the OP: ll in get(ll) evaluated to "d", as in get("d"), because the code is evaluated after the for-loop iteration in which ll had the expected value.
Ignoring the data mask part, here's a function called run.later that, like ggplot, doesn't run one of its arguments. When we run that code later, we again find that ll evaluates to "d" for all four of the saved expressions.
vars <- c("a", "b", "c", "d")
unevaluated.exprs <- list();
run.later <- function(name, something) {
expr <- substitute(something)
unevaluated.exprs[[name]] <<- c(name, expr)
}
for (ll in vars){
run.later(ll, ll)
}
for (vec in unevaluated.exprs) {
message(c(vec[[1]], " ", eval(vec[[2]])))
}
prints
a d
b d
c d
d d
That's the ll part of the problem. The rule of thumb from languages like Python of "Don't define functions in a loop (if they reference loop variables)" could be generalized for R to "don't define functions or otherwise write code that won't be immediately evaluated in a loop (if that code references loop variables)."
Fixing the scope problem instead of metaprogramming
The !! solution provided at the top uses metaprogramming to evaluate the ll variable in the loop instead of evaluating it later.
Theoretically, one could instead dynamically create variables in each iteration of a loop, then carefully reference that dynamically created variable name with metaprogramming. But a more elegant way would be to use the same variable name but in different scopes. This is what Nithin's answer does with a function: every function creates a new scope and tada, you can use the same variable name in each. Here's another version of that, closer to OP's code:
testfun <- function(df = df2, vars = letters[1:4]){
plotlist <- list()
plot.fn <- function(var) {
data <- df %>% select(x, var)
p <- ggplot(data,
aes(x = x, y = get(var))) +
geom_point() +
ylab(var)
plotlist[[ll]] <<- p
}
for (ll in vars){
plot.fn(ll)
}
return(plotlist)
}
pl <- testfun(df2)
pl
There are 4 distinct variables called var in this code, and each iteration of the loop references a different one.
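A scope-based variant that stays inside the loop is to wrap each iteration's body in local(), which evaluates it in a fresh environment. A minimal sketch, assuming ggplot2 3.0 or later (for the .data pronoun), with base subsetting standing in for select():

```r
library(ggplot2)

df2 <- data.frame(x = 1:10, a = 1:10, b = 2:11, c = 101:110, d = 10 * (1:10))

plotlist <- list()
for (ll in c("a", "b", "c", "d")) {
  plotlist[[ll]] <- local({
    var <- ll  # a fresh binding in a new environment for this iteration
    ggplot(df2[, c("x", var)], aes(x = x, y = .data[[var]])) +
      geom_point() +
      ylab(var)
  })
}
```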
Prettier metaprogramming
It might seem that get(!!ll) is equivalent to {{ll}} here, but it isn't: {{ll}}, like a bare !!ll, injects the value of ll (the string "a"), so the aesthetic becomes the constant "a" rather than the column a. get(!!ll) works because get() then looks the string up as a variable in the data mask. The tidier way to reference the column is to turn the string into a symbol before injecting it, as in y = !!rlang::sym(ll) (or !!as.name(ll)), or to use y = .data[[!!ll]].
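A minimal sketch of injecting a column reference rather than a string, assuming ggplot2 3.0 or later: as.name() (base R; rlang::sym() does the same) turns the string into a symbol, and !! splices it into the mapping when each plot is created. Injecting the bare string with !!ll instead would map y to the constant "a".

```r
library(ggplot2)

df2 <- data.frame(x = 1:10, a = 1:10, b = 2:11, c = 101:110, d = 10 * (1:10))

plotlist <- list()
for (ll in c("a", "b", "c", "d")) {
  # as.name() turns the string into a symbol; !! splices that symbol into the
  # mapping now, so the stored expression is `a`, `b`, ... rather than `ll`
  plotlist[[ll]] <- ggplot(df2, aes(x = x, y = !!as.name(ll))) +
    geom_point() +
    ylab(ll)
}
```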
Write a custom function like this:
library(ggplot2)
library(dplyr)
library(purrr)
plot_fn<- function(df,y){
df %>% ggplot(aes(x=x,
y=get(y)))+
geom_point()+
ylab(y)
}
Iterate over plots with purrr::map:
map(letters[1:4],~plot_fn(df=df2,y=.x))
The issue is that we cannot use get to access dplyr/tidyverse data in a "programming" paradigm. Instead, we should use non-standard evaluation to access the data. I offer a simplified function below (originally I thought it was a function-masking issue, as I quickly skimmed the question).
testfun <- function(df = df2, vars = letters[1:4]){
lapply(vars, function(y) {
ggplot(df,
aes(x = x, y = .data[[y]] )) +
geom_point() +
ylab(y)
})
}
Calling
plots <- testfun(df2)
plots[[1]]
EDIT
Since the OP would like to know what the issue is, I have used a traditional loop, as in the question:
testfun2 <- function(df = df2, vars = letters[1:4]){
## initialize list to store plots
plotlist <- list()
for (ll in vars){
## subset data
d_t <- df %>% select(x, ll) ## select() is kept, as in the question
# print(d_t) ## uncomment to check that dataframe subset works correctly
## plot variable vs. x; !! injects the current value of ll, because aes()
## is evaluated only when the plot is printed, after the loop has finished
p <- ggplot(d_t,
aes(x = x, y = .data[[!!ll]])) +
geom_point() +
ylab(ll)
## add plot to named list
plotlist[[ll]] <- p
# print(p) ## uncomment to see that each plot is being made
}
plotlist
}
pl <- testfun2(df2)
pl[[1]]
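The reason get does not work is that we need to use non-standard evaluation, as the docs state. Note that inside a for loop ll itself must also be injected with !!, as in .data[[!!ll]]; otherwise it is looked up only when the plot is printed, after the loop has set it to "d".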
I need to loop over files and create an object for each file.
Here is an example:
filenames <- Sys.glob("/Users/Desktop/*.nwk")
for (i in filenames ) {
print(paste0("Processing the phylogeny: ",i))
# p <- code that generates a figure
}
I then have 5 figures, which I combine with this code:
multiplot(p1, p2, p3, p4, p5, ncol=2, labels=c('A', 'B','C','D','E'))
But how can I append the numbers 1, 2, etc. to the variable name p, so that each iteration creates its own object (p1, p2, ...)?
I tried to create an object nb=1 and then assign with p+nb <- (code that generates a figure), but it does not work.
There are dedicated packages for plotting/merging multiple plots: patchwork, cowplot, grid, egg, etc.
Use lapply to generate ggplot objects in a list, then use cowplot::plot_grid, something like:
cowplot::plot_grid(
plotlist = lapply(list.files(...), function(i){
#import file
d <- read.table(i)
#plot
ggplot(d, aes(...)) + geom_...
}),
ncol = 2)
You can do that, but I suggest not creating 5 plot objects in the global environment. Store the plots in a list instead:
list_plot <- vector('list', length(filenames))
for (i in seq_along(filenames)) {
cat("\nProcessing the phylogeny: ",filenames[i])
# list_plot[[i]] <- code that generates a figure, reading the file filenames[i]
}
do.call(multiplot, c(list_plot, list(ncol=2, labels=c('A', 'B','C','D','E'))))
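One detail to check when building the argument list for do.call(): c() spreads a named vector into separate list elements, so labels=c('A', ...) becomes labels1, labels2, ... and is no longer passed as labels. Wrapping the extra arguments in list() avoids this. A base-R check, where show_args() is a hypothetical stand-in for multiplot():

```r
# Hypothetical stand-in for multiplot(): it just reports what it received.
show_args <- function(..., ncol = 1, labels = NULL) {
  list(n_plots = length(list(...)), ncol = ncol, labels = labels)
}
list_plot <- list("plot 1", "plot 2")  # stand-ins for the plot objects

# c() with a named vector spreads it into separate elements "labels1", "labels2"
bad_args <- c(list_plot, ncol = 2, labels = c("A", "B"))
names(bad_args)  # "" "" "ncol" "labels1" "labels2"

# Wrapping the extra arguments in list() keeps `labels` as one argument
good_args <- c(list_plot, list(ncol = 2, labels = c("A", "B")))
res <- do.call(show_args, good_args)
str(res)
```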
I found a way by doing:
Phylo_name<-paste0("p",nb)
eval(call("<-", as.name(Phylo_name), ggtree(tr = phylo,
mapping = aes(color = group)) + geom_tiplab() + theme(legend.position="right")+
scale_color_manual(values=color_vector)))
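For comparison, base R's assign() does the same as eval(call("<-", ...)) more directly, and mget() gathers the numbered objects back into a list. A minimal sketch in which the file names and make_figure() are hypothetical stand-ins for the real ggtree code:

```r
# Hypothetical stand-ins: the file names and make_figure() replace the real
# ggtree code; any R object is stored the same way.
filenames <- c("tree1.nwk", "tree2.nwk", "tree3.nwk")
make_figure <- function(path) paste("figure for", path)

for (nb in seq_along(filenames)) {
  # assign() creates p1, p2, p3 in the current environment
  assign(paste0("p", nb), make_figure(filenames[nb]))
}
p2  # "figure for tree2.nwk"

# mget() collects them back into a named list, e.g. for do.call()
plots <- mget(paste0("p", seq_along(filenames)))
names(plots)  # "p1" "p2" "p3"
```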
I'm saving objects to a list and I'd like to clear the plots pane and viewer pane whenever I load those objects. I basically want the back button to grey out after I've gone through the charts in that element of the list.
This is my code:
library(ggplot2)
gg <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) + geom_point()
l <- list()
l[["element"]] <- gg
I want it so that when I run l$element, it's as if I had first clicked the broom in the Plots and Viewer tabs in RStudio.
You can do this as long as you store the ggplot inside a custom S3 class whose default print method calls dev.off then plots the enclosed ggplot:
gg_wipe <- function(x) structure(list(plot = x), class = "gg_wipe")
print.gg_wipe <- function(x, ...) {
if (dev.cur() > 1) dev.off() # close the current device only if one is open
print(x$plot)
invisible(x)
}
gg <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) + geom_point()
l <- list()
l[["element"]] <- gg_wipe(gg)
l[["element"]]
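The wrapper works because auto-printing dispatches on the object's class, so storing the plot in a gg_wipe object is enough to run the clearing step every time it is shown. A device-free sketch of the same dispatch, where with_notice() is a hypothetical class and the cat() line stands in for dev.off():

```r
# Hypothetical wrapper class: its print method runs a side effect first
# (a cat() line standing in for dev.off()) and then prints the payload.
with_notice <- function(x) structure(list(value = x), class = "with_notice")
print.with_notice <- function(x, ...) {
  cat("-- pane cleared --\n")
  print(x$value)
  invisible(x)
}

l <- list()
l[["element"]] <- with_notice(1:3)
out <- capture.output(print(l[["element"]]))
out  # "-- pane cleared --" "[1] 1 2 3"
```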
Or, rather than wrapping the plot as Allan did (which is a good idea), you can provide a helper function and then provide different ways to get at it. This calls the helper function directly:
l <- list(
apple=ggplot(data.frame(x=1:3, y=1:3)) + geom_point(aes(x,y)),
banana=ggplot(data.frame(x=1:3, y=3:1)) + geom_point(aes(x,y))
)
clear_and_print <- function(x, ele) {
graphics.off(); invisible(print(x[[deparse(substitute(ele))]]))
}
clear_and_print(l, apple)
You can define new operators, but they have to be syntactically valid. You can't have ^^^, but you could have %^%:
`%^%` <- clear_and_print
l %^% apple
l %^% banana
Or you can create a special class for your container:
`$.plot_clear` <- function(x, ele) {
graphics.off(); x[[ele]]
}
class(l) <- "plot_clear"
l$apple
l$banana
I'm trying to automate making a series of the same plot using different objects; I'm working with S4 class phyloseq objects. When I use a for loop to iterate over a list of objects and try to use the object name as a title for each plot and in a filename for ggsave I can't quite get it to recognize the correct name, though it's making the correct plots for a given object in the list.
I've tried using variations of deparse(substitute(object)) with get() and quote() and end up getting slightly different, but still off-target results.
object_list <- c(object1, object2, object3)
automate_graphs <- function(x){
for(object in x){
name <- deparse(substitute(object))
ordination <- ordinate(object, "NMDS", "bray")
plot <- plot_ordination(object, ordination) + ggtitle(label = name)
ggsave(plot, filename=sprintf("NMDS_bray_%s.pdf", name), height=4, width=7)}}
automate_graphs(object_list)
I'm expecting to save 3 pdfs named NMDS_bray_object1, NMDS_bray_object2, NMDS_bray_object3.
Instead I get NMDS_bray_S4 object of class structure("phyloseq", package = "phyloseq") (so it's saving the deparse of the object to the variable name rather than the substitution) or with quote I get NMDS_bray_object which I suppose is to be expected haha. Thanks in advance for any help!
Just make it a named vector (list) of objects and iterate over the names:
object_list <- list(object1 = object1, object2 = object2, object3 = object3)
automate_graphs <- function(x){
for(nm in names(x)){
object <- x[[nm]] # Pick out the one named nm ([[ returns the object itself; [ would return a one-element list)
ordination <- ordinate(object, "NMDS", "bray")
plot <- plot_ordination(object, ordination) + ggtitle(label = nm)
ggsave(plot, filename=sprintf("NMDS_bray_%s.pdf", nm), height=4, width=7)
}}
automate_graphs(object_list)
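A base-R sketch of why the original loop mislabels its files and why the name-based loop works, using integer vectors as hypothetical stand-ins for the phyloseq objects and computing the PDF names instead of calling ggsave(). Inside a function, substitute() on an ordinary (non-argument) variable returns its value, so deparse() gives the deparsed object; and x[nm] returns a one-element list, whereas x[[nm]] returns the object itself.

```r
# Stand-ins for the phyloseq objects; any R objects behave the same way here.
object1 <- 1:3
object2 <- 4:6
object_list <- list(object1 = object1, object2 = object2)

# Inside a function, the loop variable is an ordinary variable, so substitute()
# returns its value, not the name the caller used.
loop_labels <- function(x) {
  out <- character()
  for (object in x) out <- c(out, deparse(substitute(object)))
  out
}
loop_labels(object_list)  # "1:3" "4:6"

# `[` keeps the list wrapper; `[[` extracts the object itself.
class(object_list["object1"])    # "list"
class(object_list[["object1"]])  # "integer"

# Iterating over the names gives a usable label for each object.
pdf_names <- sprintf("NMDS_bray_%s.pdf", names(object_list))
pdf_names  # "NMDS_bray_object1.pdf" "NMDS_bray_object2.pdf"
```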
@Joran, thanks again for the help; it pushed me in the right direction towards a solution, even if it's not the most elegant. I took the idea of generating a vector of names and then just created an extra variable to cycle through that vector. This way it keeps the class of each object and creates a separate list of corresponding names:
object_list <- c("object1" = object1, "object2" = object2, "object3" = object3)
automate_graphs <- function(x){
names = names(x)
obj_num = 1
for(object in x){
name <- names[obj_num]
ordination <- ordinate(object, "NMDS", "bray")
plot <- plot_ordination(object, ordination) + ggtitle(label = name)
ggsave(plot, filename=sprintf("NMDS_bray_%s.pdf", name), height=4, width=7)
obj_num = obj_num + 1
}
}
automate_graphs(object_list)
The aim of this script is to replicate something like the figure found at https://robjhyndman.com/hyndsight/tscv/.
The problem I have encountered relates to (I think) how R is handling my promises in ggplot.
Below is an example which reproduces my problem.
library(tidyverse)
process_starting_row <- 600
per_validation_period <- 30
number_of_validations <- 5
graphical_data <- data.frame(x= 1:(process_starting_row + 1 + (number_of_validations)*per_validation_period))
for (it in 1:number_of_validations) {
# For this graph there is always a line and then a colour component explaining each one...
graphical_data[,paste0("iteration",it,"line")] <- c(it)
# First make the whole row grey and then "dolly up" the colours.
graphical_data[,paste0("iteration",it,"colour")] <- "grey"
graphical_data[1:(process_starting_row + (it-1)*per_validation_period), paste0("iteration",it,"colour")] <- "blue"
graphical_data[(process_starting_row + 1 + (it)*per_validation_period), paste0("iteration",it,"colour")] <- "red"
}
#graphical_data
The above code creates a data frame that could be used to create the desired figure. For each iteration (a separate line in the original figure) it creates a column holding that iteration's "height" above the axis (always named iteration#line), and a corresponding character column, iteration#colour, with the colour code for each of the dots.
The next bit is to create a base ggplot object.
ggbase <- ggplot(data = graphical_data, aes(x=x)) +
coord_cartesian(xlim = c(process_starting_row-1*per_validation_period, nrow(graphical_data))) +
theme_bw()
It is upon this base object that I wish to iterate.
I wrote a function, gg_adding(), which adds a single iteration, and another, ggaddfor(), which runs the for loop.
gg_adding <- function(data, iteration_sub, color_sub){
iteration_promise <- enquo(iteration_sub)
colour_promise <- enquo(color_sub)
gg <- geom_point(data = data, aes(x= x, y= !! iteration_promise, color = !! colour_promise))
return(gg)
}
ggaddfor <- function(data, gg){
ggout <- gg
for (it in 1:number_of_validations) {
#print(it)
iterationsub <- paste0("iteration",it,"line")
coloursub <- paste0("iteration",it,"colour")
ggout <- ggout + gg_adding(data, iterationsub, coloursub)
}
return(ggout)
}
When I run this function I get the following:
# Not working
ggaddfor(graphical_data, ggbase)
The resulting plot is clearly not what I was hoping for...
In order to test things I stipulated each iteration explicitly.
# Working...
ggadd <- ggbase
ggadd <- ggadd + gg_adding(graphical_data, iteration1line, iteration1colour)
ggadd <- ggadd + gg_adding(graphical_data, iteration2line, iteration2colour)
ggadd <- ggadd + gg_adding(graphical_data, iteration3line, iteration3colour)
ggadd <- ggadd + gg_adding(graphical_data, iteration4line, iteration4colour)
ggadd <- ggadd + gg_adding(graphical_data, iteration5line, iteration5colour)
This produces the desired output.
I want to put these functions into a package I'm currently writing and so explicitly stipulating the additions (as I do directly above) is not going to work...
I'm not sure why my earlier code is not producing the same results. I'm somewhat new to handling promises with the rlang package and I suspect my mistake could be there...
What worked for me is to replace your enquo() calls in your gg_adding() function by as.symbol(), so that the new function would look like this:
gg_adding <- function(data, iteration_sub, color_sub){
iteration_promise <- as.symbol(iteration_sub)
colour_promise <- as.symbol(color_sub)
gg <- geom_point(data = data, aes(x= x, y= !! iteration_promise, color = !! colour_promise))
return(gg)
}
However, to avoid duplicating your data in every layer, I would suggest this as your geom_point() call:
gg <- geom_point(aes(y= !! iteration_promise, color = !! colour_promise))
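The difference between the two versions can be sketched in base R, with bquote()'s .() standing in for rlang's !! and a small hypothetical data frame: injecting a string gives a constant, whereas injecting a symbol gives a column lookup. In the loop version, enquo() captured the expression iterationsub, which is evaluated only when the plot is built (by then it holds the last iteration's string), and even a correctly timed string would be a constant; as.symbol() evaluates the argument immediately and yields a column reference.

```r
# Hypothetical two-column stand-in for graphical_data
graphical <- data.frame(iteration1line = 1, iteration2line = 2)
col <- "iteration2line"

as_string <- bquote(.(col))             # injects the string "iteration2line"
as_symbol <- bquote(.(as.symbol(col)))  # injects the symbol iteration2line

eval(as_string, graphical)  # "iteration2line": a constant, whatever the data
eval(as_symbol, graphical)  # 2: looked up as a column of the data
```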
I'm only tangentially familiar with tidy evaluation and quotation. What I do understand is that whatever you put in aes() is evaluated in the context of the data's column names: the layer's data if the layer has its own, otherwise the plot's data, falling back to the surrounding environment for names that are not columns (unless the expression doesn't refer to a column at all, e.g. aes(fill = "black")). Because x and data are already specified in your ggbase construction, we do not need them in your geom_point() call.
I know this is maybe an unsolicited tip, and I apologise, but ggplot seems to prefer working with long data rather than wide data. By 'wide' data I mean that your iterations are sort of cbind()-ed together. Therefore, if you first calculate each iteration and then rbind() them together, you can shorten your script by quite a bit and circumvent the (quasi)quotation stuff altogether to produce a similar plot:
new_gr_dat <- lapply(seq_len(number_of_validations), function(it){
df <- data.frame(x= 1:(process_starting_row + 1 + (number_of_validations)*per_validation_period),
line = it, # doubles as y-value and iteration tracker
colour = "grey")
df[1:(process_starting_row + (it-1)*per_validation_period), "colour"] <- "blue"
df[(process_starting_row + 1 + (it)*per_validation_period), "colour"] <- "red"
return(df)
})
new_gr_dat <- do.call(rbind, new_gr_dat)
ggplot(new_gr_dat, aes(x = x, y = line, colour = colour)) +
geom_point() +
coord_cartesian(xlim = c(process_starting_row-1*per_validation_period, max(new_gr_dat$x)))