I am creating many plots for different subsets of my data, using split() together with lapply().
My problem is: how do I easily save all the plots? Ideally I want to name each file after the title of the plot (which indicates the subset of data).
library(ggplot2)
data("mtcars")
split <- split(mtcars, list(mtcars$gear, mtcars$am))
my_plot <- function(mtcars) {
ggplot(mtcars, aes(x=mpg, y=cyl)) +
geom_point() +
labs (title= paste(unique(mtcars$gear), paste("- ", unique(mtcars$am)))) }
lapply(split, my_plot)
So far, I have been copying the plots out of the temporary directory, but this is a lot of manual labour:
plots.dir.path <- list.files(tempdir(), pattern="rs-graphics", full.names = TRUE)
plots.png.paths <- list.files(plots.dir.path, pattern="\\.png$", full.names = TRUE)
file.copy(from=plots.png.paths, to="......")
You could include ggsave() in your function and then loop over names(split) instead:
data("mtcars")
library(ggplot2)
split <- split(mtcars, list(mtcars$gear, mtcars$am))
my_plot <- function(x) {
p <- ggplot(split[[x]], aes(x=mpg, y=cyl)) +
geom_point() +
labs (title= paste(unique(split[[x]][["gear"]]), paste("- ", unique(split[[x]][["am"]]))))
ggsave(paste0("my_path/mtcars_", x, ".png"), plot = p)
p
}
lapply(names(split), my_plot)
Alternatively, we could use a slightly different workflow with purrr::map() and purrr::iwalk(). First we create a list of plots with purrr::map(), then loop over the named list of plots with purrr::iwalk() to save them:
library(purrr)
my_plot <- function(df) {
ggplot(df, aes(x=mpg, y=cyl)) +
geom_point() +
labs (title= paste(unique(df[["gear"]]), paste("- ", unique(df[["am"]]))))
}
# save plots to list
plots <- map(split, my_plot)
# loop over plots and save
iwalk(plots, ~ ggsave(paste0("my_path/mtcars_", .y, ".png"),
plot = .x))
Created on 2023-02-16 by the reprex package (v2.0.1)
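The question also asks for files named after each plot's title. A minimal sketch of one way to do that, assuming ggplot2 is installed: build the title string once and reuse it for both labs() and the file name. The helper save_by_title() and its out_dir and ext arguments are hypothetical names for this illustration, and drop = TRUE is added so that gear/am combinations with no cars in mtcars (3.1 and 5.0) do not produce empty plots.

```r
library(ggplot2)

# Split mtcars by gear and am as in the question; drop = TRUE removes
# combinations that contain no rows (gear 3/am 1 and gear 5/am 0)
groups <- split(mtcars, list(mtcars$gear, mtcars$am), drop = TRUE)

# Hypothetical helper: build the title once and reuse it for the file name
save_by_title <- function(df, out_dir = tempdir(), ext = "png") {
  title <- paste0("gear ", unique(df$gear), " - am ", unique(df$am))
  p <- ggplot(df, aes(x = mpg, y = cyl)) +
    geom_point() +
    labs(title = title)
  # Collapse spaces, dashes, etc. into underscores for a safe file name
  file_name <- paste0(gsub("[^A-Za-z0-9]+", "_", title), ".", ext)
  path <- file.path(out_dir, file_name)
  ggsave(path, plot = p, width = 6, height = 4)
  path
}

# One file per subset; the result is a named vector of file paths
paths <- vapply(groups, save_by_title, character(1))
```

Writing to tempdir() by default means nothing is overwritten until you pass a real out_dir; the title "gear 3 - am 0", for example, becomes the file gear_3_am_0.png.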
Problem
I am creating boxplots for 14 water chemistry elements, and each element is in a separate data frame. I have created a for loop that goes through each data frame and plots the appropriate graph. I want to add each of the plots to a list of plots that I have created outside of the for loop.
Working Code
# Libraries
library(dplyr)
library(ggplot2)
library(readr)
# read in all CSV files
myFiles <- list.files(pattern = "\\.csv$")
# create a list of all 14 data frames, named by file
dataList <- list()
for (var in myFiles) {
dataList[[var]] <- read.csv(var, sep = ",", header = TRUE)
}
# Plot the data as a boxplot
for (data in dataList){
p <-
data %>%
ggplot(aes_string(x='Month', y=data[,5])) +
geom_boxplot() +
theme_classic() +
labs(y= colnames(data[5])) +
scale_x_discrete()
print(p)
}
Attempts
Attempt 1:
# Plot the data and add to list
bplot_list <- list()
for (data in dataList){
plot_list[[data]] <-
data %>%
ggplot(aes_string(x='Month', y=data[,5])) +
geom_boxplot() +
theme_classic() +
labs(y= colnames(data[5])) +
scale_x_discrete()
}
Attempt 2:
# Plot the data and add to list
bplot_list <- list()
for (data in dataList){
p <-
data %>%
ggplot(aes_string(x='Month', y=data[,5])) +
geom_boxplot() +
theme_classic() +
labs(y= colnames(data[5])) +
scale_x_discrete()
bplot_list[[]] <- p
}
This is the solution I came up with: I create a name for each graph from the column name and use it as the key when adding the plot to the list.
# Plot the data and add to list
bplot_list <- list()
for (data in dataList){
chemElement <- colnames(data[5])
p <-
data %>%
ggplot(aes_string(x='Month', y=data[,5])) +
geom_boxplot() +
theme_classic() +
labs(y= chemElement) +
scale_x_discrete()
bplot_list[[chemElement]] <- p
}
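The same named list can also be built without an explicit loop. This is a sketch, not the asker's code: make_fake() and its toy data frames are hypothetical stand-ins for the CSV files (the element is assumed to sit in column 5, as in the question), and .data[[element]] is used instead of aes_string(), which recent ggplot2 versions deprecate.

```r
library(ggplot2)

# Hypothetical stand-ins for two of the 14 CSV files;
# column 5 holds the element concentration, as in the question
make_fake <- function(element) {
  df <- data.frame(
    Site  = "S1",
    Year  = 2020,
    Day   = 1:12,
    Month = factor(month.abb, levels = month.abb),
    value = 1:12
  )
  names(df)[5] <- element
  df
}
dataList <- list(ca.csv = make_fake("Ca"), mg.csv = make_fake("Mg"))

# One boxplot per data frame; each call has its own `element`
make_boxplot <- function(data) {
  element <- names(data)[5]
  ggplot(data, aes(x = Month, y = .data[[element]])) +
    geom_boxplot() +
    theme_classic() +
    labs(y = element)
}

# lapply() returns the list directly; then name it by the element column
bplot_list <- lapply(dataList, make_boxplot)
names(bplot_list) <- unname(vapply(dataList, function(d) names(d)[5], character(1)))
```

Afterwards bplot_list[["Ca"]] retrieves a single plot, just like in the loop version.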
I created a function that splits a data frame by the grouping variable gear and makes a plot for each resulting data frame. How do I then change the title of each plot?
library(ggplot2)
library(gridExtra)
rank20 <- function(df){
sdf <- df
clusterName <- paste0("cluster",sdf$gear)
splitData <- split(sdf,clusterName)
plot <- lapply(splitData, function (x) {ggplot(x, aes(mpg, cyl)) + geom_point()+
labs(x="mpg", y="cyl",
title="This Needs To Be Changed") +
theme_minimal()})
do.call(grid.arrange,plot)
}
rank20(mtcars)
I want the following titles: gear3, gear4, etc. (corresponding to their gear value)
Update: both answers give the right result with mtcars. In my real case, however, I transform the initial data frame, so I should have phrased the question differently: I need to take the titles from the names of the splitData list itself rather than from a column.
I find it easier to use the indexes as the input to lapply. This gives more flexibility if you need to link each list element to something else, e.g. the names of the list:
library(ggplot2)
library(gridExtra)
rank20 <- function(df, col="gear"){
sdf <- df
clusterName <- paste0("cluster", sdf[[col]])
splitData <- split(sdf, clusterName)
# do the apply across the splitData indexes instead and pull out cluster
# name from the column
plot <- lapply(seq_along(splitData), function(i) {
X <- splitData[[i]]
i_title <- paste0(col, X[[col]][1])
## to use the list names instead (e.g. cluster3 instead of gear3):
#i_title <- names(splitData)[i]
ggplot(X, aes(mpg, cyl)) +
geom_point() +
labs(x="mpg", y="cyl", title=i_title) +
theme_minimal()
})
do.call(grid.arrange,plot)
}
rank20(mtcars)
As you said you wanted the titles to be gear3, gear4, etc., I've gone with this, but I have commented out an alternative i_title that uses the clusterName value instead.
The col argument of the function lets you switch to a different column, so gear isn't hard-coded.
May I suggest a slightly different approach (still looping over indices, as suggested in Jonny Phelps's answer)? I create a list of plots and then use patchwork::wrap_plots() to plot them. I find this smoother.
library(tidyverse)
library(patchwork)
len_ind <- length(unique(mtcars$cyl))
ls_plot <-
mtcars %>%
split(., .$cyl) %>%
map2(1:len_ind, ., function(x, y) {
ggplot(y, aes(mpg, cyl)) +
geom_point() +
labs(x = "mpg", y = "cyl",
title = names(.)[x]
) +
theme_minimal()
})
wrap_plots(ls_plot) + plot_layout(ncol = 1)
I just noticed I used the wrong column (cyl instead of gear). Oops. It was kind of fun to wrap this into a function that takes the data frame and the column as arguments:
plot_col <- function(df, col, plotx, ploty){
len_ind <- length(unique(df[[col]]))
x_name <- deparse(substitute(plotx))
y_name <- deparse(substitute(ploty))
ls_plot <-
df %>%
split(., .[[col]]) %>%
map2(1:len_ind, ., function(x, y) {
ggplot(y, aes({{plotx}}, {{ploty}})) +
geom_point() +
labs(x = x_name, y = y_name,
title = names(.)[x]
) +
theme_minimal()
})
wrap_plots(ls_plot) + plot_layout(ncol = 1)
}
plot_col(mtcars, "gear", mpg, cyl)
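Regarding the update in the question (titles taken from the names of the split list rather than from a column): purrr::imap() passes each element together with its name, so the names can be used directly, without indexing or relying on the `.` placeholder inside the anonymous function. A minimal sketch, assuming purrr and patchwork are installed; split_data is a hypothetical name:

```r
library(ggplot2)
library(purrr)
library(patchwork)

# Split by gear; the list names ("gear3", "gear4", "gear5") become the titles
split_data <- split(mtcars, paste0("gear", mtcars$gear))

# imap() calls the function with each element and its name
ls_plot <- imap(split_data, function(df, nm) {
  ggplot(df, aes(mpg, cyl)) +
    geom_point() +
    labs(x = "mpg", y = "cyl", title = nm) +
    theme_minimal()
})

wrap_plots(ls_plot, ncol = 1)
```

If your real data are transformed before splitting, only the split() line changes; the titles follow whatever names the list ends up with.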
I am trying to create two line plots.
However, I noticed that using a for loop generates two plots with y = mev2 (instead of one plot based on y = mev1 and another based on y = mev2).
The code below reproduces the problem.
mev1 <- c(1,3,7)
mev2 <- c(9,8,2)
Period <- c(1960, 1970, 1980)
df <- data.frame(Period, mev1, mev2)
library(ggplot2)
# Method 1: Creating plot1 and plot2 without using "for" loop (hard-code)
plot1 <- ggplot(data = df, aes(x=Period, y=unlist(as.list(df[2])))) + geom_line()
plot2 <- ggplot(data = df, aes(x=Period, y=unlist(as.list(df[3])))) + geom_line()
# Method 2: Creating plot1 and plot2 using "for" loop
for (i in 1:2) {
y_var <- unlist(as.list(df[i+1]))
assign(paste("plot", i, sep = ""), ggplot(data = df, aes(x=Period, y=y_var)) + geom_line())
}
This seems to be due to some aspect of how ggplot() works that I am not aware of.
Question:
If I want to use Method 2, how should I modify the logic?
People have said that using assign() is not idiomatic R, so what is an alternative way to do this? Say, using a list?
One possible answer, without adding any tidyverse functions, is:
library(ggplot2)
y_var <- colnames(df)
for (i in 1:2) {
assign(paste("plot", i, sep = ""),
ggplot(data = df, aes_string(x=y_var[1], y=y_var[1 + i])) +
geom_line())
}
plot1
plot2
You can use aes_string(). I hope it helps.
EDIT 1
If you want to store your plots in a list, you can do this:
Initialize the list:
n <- 2 # number of plots
list_plot <- vector(mode = "list", length = n)
names(list_plot) <- paste0("plot", 1:n)
Fill it:
for (i in 1:n) {
list_plot[[i]] <- ggplot(data = df, aes_string(x=y_var[1], y=y_var[1 + i])) +
geom_line()
}
Display:
list_plot[[1]]
list_plot[[2]]
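As for why Method 2 fails: aes() does not store the values of y_var, it stores the expression y_var and evaluates it only when the plot is built, e.g. when it is printed. By then the loop has finished and y_var holds the mev2 values, so both plots show mev2. aes_string() avoids this because the string is converted when the plot is created. A sketch of a list-based loop that keeps aes(), assuming ggplot2 3.0 or later: the !! operator injects the current value of col into the mapping at creation time.

```r
library(ggplot2)

mev1 <- c(1, 3, 7)
mev2 <- c(9, 8, 2)
Period <- c(1960, 1970, 1980)
df <- data.frame(Period, mev1, mev2)

plot_list <- list()
for (col in c("mev1", "mev2")) {
  # !!col is replaced by "mev1" or "mev2" right now, so each plot keeps
  # its own column instead of looking up `col` when it is printed
  plot_list[[col]] <- ggplot(df, aes(x = Period, y = .data[[!!col]])) +
    geom_line()
}

plot_list[["mev1"]]  # uses mev1, even though col is now "mev2"
```

Without the !!, .data[[col]] would suffer from the same late lookup as y_var, because col is also a global variable that the loop keeps changing.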
To show the lines in separate panels, you can simplify this with facet_wrap():
library(tidyverse)
df %>%
gather(variable, value, -c(Period)) %>% # wide to long format
ggplot(aes(Period, value)) + geom_line() + facet_wrap(vars(variable))
You can also put it in a loop if necessary and store the results in a list:
# empty list
listed <- list()
# fill the list with the plots
for (i in 2:3){
listed[[i-1]] <- df[, c(1, i)] %>%
gather(variable, value, -c(Period)) %>%
ggplot(aes(Period, value)) + geom_line()
}
# to get the plots
listed[[1]]
listed[[2]]
Why do you want two separate plots? The ggplot2 way to do this is to reshape the data to long format and then plot it:
library(tidyverse)
df %>%
pivot_longer(cols = -Period) %>%
ggplot() + aes(Period, value, color = name) + geom_line()
Here is an alternative approach using a function and lapply. I recognize that you asked how to solve this using a loop. Still, I think it might be useful to consider this approach.
library(ggplot2)
library(rlang) # provides sym()
mev1 <- c(1,3,7)
mev2 <- c(9,8,2)
Period <- c(1960, 1970, 1980)
df <- data.frame(Period, mev1, mev2)
myplot <- function(yvar){
plot <- ggplot(df, aes(Period, !!sym(yvar))) + geom_line()
return(plot)
}
colnames <- c("mev1","mev2")
list <- lapply(colnames, myplot)
names(list) <- paste0("plot_", colnames)
# Alternative naming: names(list) <- paste0("plot", 1:2)
Using this approach you can easily apply your plot function to whatever columns you like. You can specify the columns by name, which may be preferable to specifying them by position. The plots are stored in a list and named afterwards via names(). In my example I named the plots plot_mev1 and plot_mev2, but you can easily adjust this; e.g. write names(list) <- paste0("plot", 1:2) to get plot1 and plot2.
Note that I used !!sym() in the ggplot call. This is essentially an alternative to aes_string(), which was used in the answer by Rémi Coulaud. This way, ggplot understands, even inside a function or a loop, that "mev1" is a column of your dataset and not just a text string.
I have a list of ggplot objects:
library(ggplot2)
library(gridExtra)
p <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
plots <- list(p,p,p)
names(plots) <- c("A","B","C")
which I can manipulate:
plots <- lapply(plots, function(x) x + theme_bw())
grid.arrange(plots[[1]], plots[[2]], plots[[3]])
This works fine. However, what I cannot do is paste the names of the ggplot objects into the ggtitle() argument:
plots <- lapply(plots, function(x) x + ggtitle(paste(names(x))))
grid.arrange(plots[[1]], plots[[2]], plots[[3]])
Something was pasted into the title, but I basically missed the correct level of the hierarchy:
grid.arrange(plots[[1]], plots[[2]], plots[[3]]) ### all titled "data"
names(plots[[1]][1])
[1] "data"
Moving up the hierarchy does not work:
plots <- lapply(plots, function(x) x + ggtitle(paste(names(plots[[x]]))))
Error in plots[[x]] : invalid subscript type 'list'
I remembered an old question of mine about base-R plotting, but I could not transfer that solution here:
plots <- lapply(names(plots), function(x) plots[[x]] + ggtitle(paste(names(x))))
names(plots)
NULL
Where did I go wrong?
Here is solution using Map:
# Your sample plots
library(ggplot2)
library(gridExtra)
p <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
plots <- list(p,p,p)
names(plots) <- c("A","B","C")
plots <- lapply(plots, function(x) x + theme_bw())
# Add title to ggplots
plots <- Map(function(gg, title) gg + ggtitle(title), plots, names(plots))
grid.arrange(grobs = plots)
The problem is that you lose the names of the list you iterate over with lapply().
If you check
lapply(plots, function(x) browser())
then you will see that names(x) returns
[1] "data" "layers" "scales" "mapping" "theme" "coordinates" "facet"
[8] "plot_env" "labels"
One way of adding titles iteratively would be to iterate over an index and use that to subset plots and names(plots):
plots <- lapply(seq_along(plots), function(i) {
plots[[i]] + ggtitle(names(plots)[i])
})
Each plot then gets its list name as its title, e.g. plots[[1]] is titled "A".
Adding another option, using cowplot::plot_grid(), which I found helpful:
cowplot::plot_grid(plotlist = plots,
labels = names(plots),
ncol = 1)
Data
p <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
plots <- list(p,p,p)
names(plots) <- c("A","B","C")
plots <- lapply(plots, function(x) x + theme_bw())
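The reason the names get lost can be seen without ggplot2 at all. In this base-R sketch, the list x is a hypothetical stand-in for plots: the function passed to lapply() only receives each element, so names(el) is NULL, whereas iterating over indices or using Map() gives access to the names.

```r
# Hypothetical stand-in for the named list of plots
x <- list(A = 1, B = 2, C = 3)

# Inside lapply(), the function sees only the element, not its name
inner <- lapply(x, function(el) names(el))

# Iterating over indices gives access to both element and name
titled <- lapply(seq_along(x), function(i) paste(names(x)[i], x[[i]]))

# Map() pairs each element with its name and keeps the list names
titled2 <- Map(function(el, nm) paste(nm, el), x, names(x))
```

Note that the index-based version returns an unnamed list, while Map() keeps the names A, B and C on the result.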
I want to use a loop to create multiple plots for different values of PDC. My data frame df has the columns Results, Capacity, Power, LDI, LDE, LB, PDC, D and CostPerkWh.
As output I would like one graph for each unique value of PDC.
The following plots work:
plot1 <- ggplot(subset(df, df$PDC=='PDC0'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
plot2 <- ggplot(subset(df, df$PDC=='PDC0.25'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
plot3 <- ggplot(subset(df, df$PDC=='PDC0.5'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
plot4 <- ggplot(subset(df, df$PDC=='PDC0.75'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
plot5 <- ggplot(subset(df, df$PDC=='PDC1'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
All these plots work; however, I would like to use a loop since I have a large number of parameters, and I found this example.
So I tried to implement it into my own model:
#plot data
StoreResults <- "/Users/IMA/Documents/Results/"
PDC.graph <- function(df, na.rm = TRUE, ...){
PDClist <- unique(df$PDC)
for (i in seq_along(PDClist)){
plot <-
ggplot(subset(df, df$PDC==PDClist[i]),
aes(Capacity, CostPerkWh)) + geom_point()+
ggtitle(paste(PDClist[i], 'PDC, Power \n', "Capacity \n", sep='')) +
geom_line()
print(plot)
#save plot as PNG
ggsave(plot, file= paste(StoreResults, '/projection_graphs/PDCgraph/',
PDClist[i], ".png", sep=''), scale=2)
}
}
The code does not give me an error message, but I don't see any graphs and nothing gets stored in the folder I defined. How can I resolve this? Or is there a better way to export many graphs for different values of PDC?
Did you forget to run the function you created?
This minimal version works for me:
library(ggplot2)
df = iris
StoreResults <- "/Users/timfaber/Desktop"
PDC.graph <- function(df, na.rm = TRUE, ...){
PDClist <- unique(df$Species)
for (i in seq_along(PDClist)){
p <- ggplot(subset(df, df$Species==PDClist[i]),
aes(Sepal.Length, Sepal.Width)) + geom_point() +
ggtitle(paste(PDClist[i], 'PDC, Power \n', "Capacity \n", sep=''))
#save plot as PNG
ggsave(plot = p, filename = paste(StoreResults, '/etc/',
PDClist[i], ".png", sep=''), scale=2)
}
}
PDC.graph(df)
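As for a better way to export many graphs: a sketch of a reusable helper, assuming ggplot2 is installed. save_group_plots() and its arguments are hypothetical, and iris stands in for the question's data, with Species playing the role of PDC. It creates the output folder if needed, builds one plot per group, passes the plot to ggsave() explicitly and returns the file paths, so nothing depends on last_plot().

```r
library(ggplot2)

# Hypothetical helper: one plot file per unique value of group_col
save_group_plots <- function(df, group_col, x_col, y_col, out_dir, ext = "png") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  paths <- character(0)
  for (g in as.character(unique(df[[group_col]]))) {
    sub <- df[df[[group_col]] == g, , drop = FALSE]
    p <- ggplot(sub, aes(x = .data[[x_col]], y = .data[[y_col]])) +
      geom_point() +
      geom_line() +
      ggtitle(paste(group_col, g))
    path <- file.path(out_dir, paste0(g, ".", ext))
    ggsave(path, plot = p, width = 6, height = 4)
    paths <- c(paths, path)
  }
  paths
}

# Defining the function is not enough; it has to be called
out <- save_group_plots(iris, "Species", "Sepal.Length", "Sepal.Width",
                        out_dir = file.path(tempdir(), "group_plots"))
```

For the question's data the call would look like save_group_plots(df, "PDC", "Capacity", "CostPerkWh", out_dir = ...), with out_dir pointing at your own results folder.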