Plot multiple graphs from multiple csv and follow up analysis - r

I have several hundred csv files and would like to plot a graph for each of them. I've searched through the forum and found something that I can use, but it still needs some editing.
The code is originally from Plotting multiple graphs from multiple .csv files using R.
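library(dplyr)
library(ggplot2)
# pattern is a regular expression; full.names = TRUE returns paths that read.csv can open
list_of_dfs = lapply(list.files('path/to/files', pattern = '\\.csv$', full.names = TRUE),
  function(x) {
    dat = read.csv(x)
    dat$fname = basename(x)
    return(dat)
  })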
one_big_df = list_of_dfs %>% bind_rows()
one_big_df %>% ggplot(aes(x = x, y = y)) + geom_point() + facet_wrap(~ fname)
It works fine except I need to save all the graphs separately.
I also need to analyse the graphs by overlaying them according to their suffixes. Is it possible to incorporate that in the code?
Example file names:
MAX_C1-B3.csv
MAX_C2-B3.csv
MAX_C1-B4.csv
MAX_C2-B4.csv
...
So the ones with B3 should be in one graph and the ones with B4 in another.
Thanks for your help in advance!

I am not sure that the following is what the question is asking for.
The main method is always the same:
- split the data with the base function split. This creates a named list;
- pipe the resulting list to seq_along to get index numbers into the list. This allows access to the list's names attribute, so that file names can be composed from them;
- pipe the numbers to purrr::map and plot each list member separately;
- save the results to disk.
First load the packages needed.
suppressPackageStartupMessages({
library(dplyr)
library(ggplot2)
library(purrr)
})
This is a common function to save the plots.
save_plot <- function(graph, graph_name, type = "") {
  # file name depends on suffix and on directory structure
  # the files are to be saved to a temp directory
  # (it's just a code test)
  # drop a trailing ".csv" so that "MAX_C1-B3.csv" is saved as "MAX_C1-B3.pdf"
  graph_name <- sub("\\.csv$", "", graph_name)
  if(type != "")
    graph_name <- paste0(graph_name, "_", type)
  filename <- paste0(graph_name, ".pdf")
  filename <- file.path("~/Temp", filename)
  ggsave(filename, graph, device = "pdf")
}
1. Plot all graphs separately
From the question:
It works fine except I need to save all the graphs separately.
Does this mean that the graph for each file is to be saved separately? If yes, then the following code plots and saves them to files whose names have the extension .csv changed to .pdf.
list_dfs_by_fname <- split(one_big_df, one_big_df$fname)
list_dfs_by_fname %>%
seq_along() %>%
map(.f = \(i) {
graph_name <- names(list_dfs_by_fname)[i]
DF <- list_dfs_by_fname[[i]]
graph <- DF %>%
ggplot(aes(x = x, y = y)) +
geom_point()
save_plot(graph, graph_name)
})
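As an alternative to indexing with seq_along, here is a minimal sketch of the same per-file loop written with purrr::iwalk, which passes each list element together with its name. The data frame demo_df and the output folder out_dir are made-up stand-ins for one_big_df and the real target directory, so adjust both before use.

```r
library(ggplot2)
library(purrr)

# Hypothetical stand-in for one_big_df: two "files" with x/y columns.
demo_df <- data.frame(
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 6, 1, 3, 5),
  fname = rep(c("MAX_C1-B3", "MAX_C1-B4"), each = 3)
)

# Assumed output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "plots_by_file")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# iwalk() calls the function once per list element, passing the data frame
# and its name; it is used here only for the side effect of writing files.
iwalk(split(demo_df, demo_df$fname), function(DF, graph_name) {
  graph <- ggplot(DF, aes(x = x, y = y)) + geom_point()
  ggsave(file.path(out_dir, paste0(graph_name, ".pdf")), graph,
         device = "pdf", width = 5, height = 4)
})

saved_files <- list.files(out_dir, pattern = "\\.pdf$")
```

Because iwalk returns its input invisibly, nothing is printed to the console apart from any messages emitted by ggsave.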
2. Plot by suffix
First create a new column with either the suffix "B3" or the suffix "B4". Then split the data by groups so defined. The split data is needed for the two plots that follow. The pattern also accepts a trailing ".csv", in case fname holds complete file names.
inx <- grepl("B4(\\.csv)?$", one_big_df$fname)
one_big_df$group <- c("B3", "B4")[inx + 1L]
list_dfs_by_suffix <- split(one_big_df, one_big_df$group)
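With hundreds of files there may be more than two suffixes. The following is a minimal base R sketch, assuming every file name ends in a hyphen followed by the suffix and an optional ".csv". The vector fnames is illustrative only and uses the example names from the question.

```r
# Example file names from the question (illustrative only).
fnames <- c("MAX_C1-B3.csv", "MAX_C2-B3.csv", "MAX_C1-B4.csv", "MAX_C2-B4.csv")

# Keep whatever sits between the last hyphen and the optional ".csv".
suffix <- sub("^.*-([^-.]+)(\\.csv)?$", "\\1", fnames)

# Group the file names by the extracted suffix.
groups <- split(fnames, suffix)
```

The same sub call could be applied to one_big_df$fname to build the group column without listing the suffixes by hand.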
2.1. Plot by suffix, overlapped
To have the groups of fname overlap, map that variable to the color aesthetic.
list_dfs_by_suffix %>%
seq_along() %>%
map(.f = \(i) {
graph_name <- names(list_dfs_by_suffix)[i]
DF <- list_dfs_by_suffix[[i]]
graph <- DF %>%
ggplot(aes(x = x, y = y, color = fname)) +
geom_point()
save_plot(graph, graph_name, type = "overlapped")
})
2.2. Plot by suffix, faceted
If the plots are faceted by fname, the code is copied from the question, with scales = "free" added.
list_dfs_by_suffix %>%
seq_along() %>%
map(.f = \(i) {
graph_name <- names(list_dfs_by_suffix)[i]
DF <- list_dfs_by_suffix[[i]]
graph <- DF %>%
ggplot(aes(x = x, y = y)) +
geom_point() +
facet_wrap( ~ fname, scales = "free")
save_plot(graph, graph_name, "faceted")
})
Test data
Use built-in data sets iris and mtcars to test the code.
Only the last two instructions matter to the question: they check the column names of the data set one_big_df and the values in fname.
suppressPackageStartupMessages({
library(dplyr)
})
df1 <- iris[3:5]
df2 <- mtcars[c("hp", "qsec", "cyl")]
names(df1) <- c("x", "y", "categ")
names(df2) <- c("x", "y", "categ")
df2$categ <- factor(df2$categ)
sp1 <- split(df1[1:2], df1$categ)
sp2 <- split(df2[1:2], df2$categ)
names(sp1) <- sprintf("MAX_C%d-B3", seq_along(sp1))
names(sp2) <- sprintf("MAX_C%d-B4", seq_along(sp2))
list_of_dfs <- c(sp1, sp2)
list_of_dfs <- lapply(seq_along(list_of_dfs), \(i) {
list_of_dfs[[i]]$fname <- names(list_of_dfs)[i]
list_of_dfs[[i]]
})
one_big_df <- list_of_dfs %>% dplyr::bind_rows()
names(one_big_df)
#> [1] "x" "y" "fname"
unique(one_big_df$fname)
#> [1] "MAX_C1-B3" "MAX_C2-B3" "MAX_C3-B3" "MAX_C1-B4" "MAX_C2-B4" "MAX_C3-B4"
Created on 2022-05-31 by the reprex package (v2.0.1)

Related

Loop over ID in ggplot2, then save each plot individually

I have a large dataframe in long format that contains about 13k rows. Here is example data of what it looks like.
# make data
set.seed(1234)
id <- c(101,101,101,101,101,101,101,101,101,
102,102,102,102,102,102,
103,103,103,103,103,103,103,103,103,103,103)
time <- c(1:9, 1:6, 1:11)
var1 <- sample(1:20, 26, replace=TRUE)
df <- data.frame(id,time,var1)
I want to:
Generate a list of plots with x=time and y=var1 for each id
Save each id's plot individually as an image to a folder
My code so far is:
# loop to make plots
library(tidyverse)
id <- unique(df$id)
for(i in id){
list <- ggplot(df, aes_string(x = time, y = var1)) +
geom_point() +
stat_smooth() +
ggtitle("Plot for", paste(df$id))
print(list)
}
# loop to save plots
filename <- paste("plot_", df$id, ".png")
path <- "~/test"
for(i in list){
ggsave(filename = filename, plot = plot[[i]], path = path)
}
The code to make a list of ggplots runs without errors, but whenever I try to view the list, it only shows the plot for id 101. The code to save the list of plots results in an error: Error: device must be NULL, a string or a function. How should the code be fixed to achieve both goals?
You can split the data into list and use map to save each plot as an image.
library(tidyverse)
df %>%
group_split(id) %>%
map(~ggsave(sprintf('plot_%d.png', first(.x$id)),
ggplot(.x, aes(x = time, y = var1)) +
geom_point() +
stat_smooth() + ggtitle(paste0("Plot for ", first(.x$id)))))
The first plot is saved as plot_101.png.
Your script is overwriting list on each iteration of the loop. It would be easier to save the plot when it is created.
But if you would like to store the plot created on each iteration for use later in the script, I suggest creating a list object outside the loop and then append objects on each iteration.
See the comments for more details.
# Create an empty list
plotlist <- list()
id <- unique(df$id)
for(i in id){
  # filter the dataframe for each id
  p <- df %>% filter(id == i) %>% ggplot(aes(x = time, y = var1)) +
    geom_point() +
    stat_smooth() +
    ggtitle(paste("Plot for", i))
  # print the plot
  print(p)
  # store the plot (not the id vector) in the list, with the id as the name
  plotlist[[as.character(i)]] <- p
}
Now one can use this list, plotlist, for additional analysis later on.
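Once the plots are stored in a named list, saving them is a second loop over the names. This is a minimal sketch under stated assumptions: demo is a small made-up data frame standing in for df, and out_dir is an assumed folder inside the temporary directory rather than the ~/test path from the question.

```r
library(ggplot2)

# Small stand-in for the question's data (values are made up).
demo <- data.frame(id = rep(c(101, 102), each = 4),
                   time = rep(1:4, times = 2),
                   var1 = c(3, 8, 5, 12, 7, 2, 9, 4))

# Build the named list of plots, one per id.
plotlist <- list()
for (i in unique(demo$id)) {
  plotlist[[as.character(i)]] <- ggplot(demo[demo$id == i, ], aes(x = time, y = var1)) +
    geom_point() +
    ggtitle(paste("Plot for", i))
}

# Assumed output folder; replace with the real target directory.
out_dir <- file.path(tempdir(), "plots_by_id")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One file per list element, named after the id.
for (nm in names(plotlist)) {
  ggsave(filename = paste0("plot_", nm, ".png"), plot = plotlist[[nm]],
         path = out_dir, width = 5, height = 4)
}

saved <- list.files(out_dir)
```

Passing a single file name and a single plot to each ggsave call avoids the error from the question, where a whole vector of names and an undefined plot object were supplied.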

Returning a list object from a function

I am trying to write a function that returns a series of ggplot scatterplots from a data frame. Below is a reproducible data frame as well as the function I've written.
a <- sample(0:20,20,rep=TRUE)
b <- sample(0:300,20,rep=TRUE)
c <- sample(0:3, 20, rep=TRUE)
d <- rep("dog", 20)
df <- data.frame(a,b,c,d)
loopGraph <- function(dataFrame, y_value, x_type){
if(is.numeric(dataFrame[,y_value]) == TRUE && x_type == "numeric"){
dataFrame_number <- dataFrame %>%
dplyr::select_if(is.numeric) %>%
filter(y_value!=0) %>%
dplyr::select(-y_value)
x <- 1
y<-ncol(dataFrame_number)
endList <- list()
for (i in x:y)
{
i
x_value <-colnames(dataFrame_number)[i]
plotDataFrame <- cbind(dataFrame_number[,x_value], dataFrame[,y_value]) %>% as.data.frame()
r2 <- summary(lm(plotDataFrame[,2]~plotDataFrame[,1]))$r.squared
ggplot <- ggplot(plotDataFrame, aes(y=plotDataFrame[,2],x=plotDataFrame[,1])) +
geom_point()+
geom_smooth(method=lm) +
labs(title = paste("Scatterplot of", y_value, "vs.", x_value), subtitle= paste0("R^2=",r2), x = x_value,y=y_value)
endList[[i]] <- ggplot
}
return(endList)
}
else{print("Try again")}
}
loopGraph(df, "a", "numeric")
What I want is to return the object endList so I can look at the multiple scatterplots generated by this function. What happens is that the function prints each scatterplot in the plots window without giving me access to the endList object.
How can I get this function to return the endList object? Is there a better way to go about this? Thanks in advance!
Update
Thanks to @GordonShumway for solving my first issue. Now, when I define plots as plots <- loopGraph(df, "a", "numeric"), I can view all the outputs. However, all the graphs are of the first ggplot feature, even though the labels change. Any intuition as to why this is happening? Or how to fix it? I tried adding dev.set(dev.next()) to no avail.

Creating a boxplot loop with ggplot2 for only certain variables

I have a dataset with 99 observations and I need to create boxplots for ones with a specific string in them. However, when I run this code I get 57 of the exact same plots from the original function instead of the loop. I was wondering how to prevent the plots from being overwritten but still create all 57. Here is the code and a picture of the plot.
Thanks!
Boxplot Format
#starting boxplot function
myboxplot <- function(mydata=ivf_dataset, myexposure =
"ART_CURRENT", myoutcome = "MEG3_DMR_mean")
{bp <- ggplot(ivf_dataset, aes(ART_CURRENT, MEG3_DMR_mean))
bp <- bp + geom_boxplot(aes(group =ART_CURRENT))
}
#pulling out variables needed for plots
outcomes = names(ivf_dataset)[grep("_DMR_", names(ivf_dataset),
ignore.case = T)]
#creating loop for 57 boxplots
allplots <- list()
for (i in seq_along(outcomes))
{
allplots[[i]]<- myboxplot (myexposure = "ART_CURRENT", myoutcome =
outcomes[i])
}
allplots
I recommend reading about standard and non-standard evaluation and how this works with the tidyverse. Here are some links
http://adv-r.had.co.nz/Functions.html#function-arguments
http://adv-r.had.co.nz/Computing-on-the-language.html
I also found this useful
https://rstudio-pubs-static.s3.amazonaws.com/97970_465837f898094848b293e3988a1328c6.html
Also, you need to produce an example so that it is possible to replicate your problem. Here is the data that I created.
df <- data.frame(label = rep(c("a","b","c"), 5),
x = rnorm(15),
y = rnorm(15),
x2 = rnorm(15, 10),
y2 = rnorm(15, 5))
I kept most of your code the same and only changed what needed to be changed.
myboxplot2 <- function(mydata = df, myexposure, myoutcome){
bp <- ggplot(mydata, aes_(as.name(myexposure), as.name(myoutcome))) +
geom_boxplot()
print(bp)
}
myboxplot2(myexposure = "label", myoutcome = "y")
Because aes() uses non-standard evaluation, you need to use aes_(). Again, read the links above.
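Note that aes_() has been soft-deprecated since ggplot2 3.0.0. With a recent ggplot2, a possible alternative is the .data pronoun inside aes(). The sketch below assumes such a version; myboxplot3 and the data frame demo are hypothetical names used only for illustration.

```r
library(ggplot2)

set.seed(1)
# Made-up data with the same shape as the example data above.
demo <- data.frame(label = rep(c("a", "b", "c"), 5),
                   x = rnorm(15),
                   x2 = rnorm(15, 10))

# myboxplot3 is a hypothetical name; the column names arrive as strings
# and are looked up in the data through the .data pronoun.
myboxplot3 <- function(mydata, myexposure, myoutcome) {
  ggplot(mydata, aes(x = .data[[myexposure]], y = .data[[myoutcome]])) +
    geom_boxplot() +
    labs(x = myexposure, y = myoutcome)
}

# One plot per column whose name starts with "x".
outcomes <- grep("^x", names(demo), value = TRUE)
allplots <- lapply(outcomes, function(v) myboxplot3(demo, "label", v))
names(allplots) <- outcomes
```

The function returns the plot instead of printing it, so the list holds one ggplot object per outcome and each can be printed or saved later.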
Here I am getting all the columns that start with x. I am assuming that your code gets the columns that you want.
outcomes <- names(df)[grep("^x", names(df), ignore.case = TRUE)]
Here I am looping through in the same way that you did. I am only storing the plot object though.
allplots <- list()
for (i in seq_along(outcomes)){
allplots[[i]] <- myboxplot2(myexposure = "label", myoutcome = outcomes[i])
}
allplots

Sending dataframes within list to a plot function

I'm trying to make multiple ggplot charts from multiple data frames. I have developed the code below but the final loop doesn't work.
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x) {
x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b)) +
ggsave(paste0(substitute(x),".png"))
}
ll <- list(df1,df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]])
}
I know it's something to do with
ll[[i]]
but I don't understand why, because when I put that in the console it gives the dataframe I want. Also, is there a way to do this the tidyverse way, with the map functions instead of a for loop?
I assume you want to see two files called df1.png and df2.png at the end.
You need to somehow pass on the names of the dataframes to the function. One way of doing it would be through a named list, passing the name along with the content of the list element.
library(ggplot2)
library(purrr)
library(tibble) # needed for tibble()
df1 <- tibble(
a = rnorm(10),
b = rnorm(10)
)
df2 <- tibble(
a = rnorm(20),
b = rnorm(20)
)
chart_it <- function(x, nm) {
p <- x %>% ggplot() +
geom_line(mapping = aes(y=a,x=b))
ggsave(paste0(nm,".png"), p, device = "png")
}
ll <- list(df1=df1,df2=df2)
for (i in seq_along(ll)) {
chart_it(ll[[i]], names(ll[i]))
}
In tidyverse you could just replace the loop with the following command without modifying the function.
purrr::walk2(ll, names(ll),chart_it)
or simply
purrr::iwalk(ll, chart_it)
There's also imap and lmap, but they will leave some output in the console, which is not what you would like to do, I guess.
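The difference in return values can be seen in a small sketch that needs no plotting. Here describe is a hypothetical stand-in for chart_it that only builds the file name, and the list ll holds plain integer vectors instead of data frames.

```r
library(purrr)

# Plain vectors stand in for the data frames; only the names matter here.
ll <- list(df1 = 1:3, df2 = 4:6)

# Hypothetical stand-in for chart_it: takes the element and its name.
describe <- function(x, nm) paste0(nm, ".png")

# imap() collects the function results in a list, which the console prints.
res_imap <- imap(ll, describe)

# iwalk() discards the results and returns its input invisibly.
res_iwalk <- iwalk(ll, describe)
```

This is why iwalk suits functions called for a side effect such as writing a file.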
The problem is in your chart_it function. It doesn't return a ggplot. Try saving the result of the pipe into a variable and return() that (or place it as the last statement in the function).
Something along the lines of
chart_it <- function(x, nm) {
  chart <- x %>% ggplot() +
    geom_line(mapping = aes(y=a,x=b))
  # substitute(x) would capture the call ll[[i]] inside the loop, so the name is passed in as nm
  ggsave(paste0(nm, ".png"), chart)
  return(chart)
}

R ggplot2 boxplot from 10 files

I have 4 files called 0_X_cell.csv, 0_S_cell.csv, 15_X_cell.csv and 15_S_cell.csv, each of the format:
p U:0 U:1 U:2 Tracer Tracer_0 U_0:0
-34.014 0.15268 -3.7907 -0.20155 10.081 10.032 0.12454
-33.836 0.07349 -2.1457 -0.30531 27.706 27.278 0.076542
I'd like to create boxplots out of the values for Tracer/3600 and put them on the same graph using ggplot2 but I'm finding it not quite so straightforward. Any suggestions would be much appreciated:
I'm thinking it might be something like this:
Import data from all files into separate variables:
Extract Tracer from each one and put into a data.frame
Plot the boxplots of every column Tracer/3600. But each column will be called Tracer...
What would the correct procedure be?
Here's one way to do it (if I understood you correctly):
`0_X_cell.csv` <- `0_S_cell.csv` <- `15_X_cell.csv` <- `15_S_cell.csv` <- read.table(header=T, text="
p U:0 U:1 U:2 Tracer Tracer_0 U_0:0
-34.014 0.15268 -3.7907 -0.20155 10.081 10.032 0.12454
-33.836 0.07349 -2.1457 -0.30531 27.706 27.278 0.076542")
lst <- mget(grep("cell.csv", ls(), fixed=TRUE, value=TRUE))
df <- stack(lapply(lapply(lst, "[", "Tracer"), unlist))
df$ind <- sub("^(\\d+_[A-Z]).*$", "\\1", df$ind)
library(ggplot2)
ggplot(df, aes(ind, values/3600)) + geom_boxplot()
To read in the data from your dir:
z <- list.files(pattern = ".*cell\\.csv$")
z <- lapply(seq_along(z), function(x) {
  # file names look like "<time>_<treatment>_cell.csv"
  chars <- strsplit(z[x], "_")
  cbind(data.frame(Tracer = read.csv(z[x])$Tracer),
        time = chars[[1]][1], treatment = chars[[1]][2])
})
z <- do.call(rbind, z)
Then plot it:
library(ggplot2)
ggplot(z, aes(y = Tracer/3600, x = factor(time))) +geom_boxplot(aes(fill = factor(treatment))) + ylab("Tracer")
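The file-name parsing that the second approach relies on can be checked on its own, without reading any data. A minimal base R sketch, assuming names of the form time_treatment_cell.csv as in the question; the data frame meta is a hypothetical helper.

```r
# File names as listed in the question.
files <- c("0_X_cell.csv", "0_S_cell.csv", "15_X_cell.csv", "15_S_cell.csv")

# Split each name on "_" and bind the pieces into a character matrix:
# column 1 is the time, column 2 the treatment, column 3 "cell.csv".
parts <- do.call(rbind, strsplit(files, "_", fixed = TRUE))

# meta is a hypothetical lookup table of the parsed pieces.
meta <- data.frame(file = files,
                   time = as.integer(parts[, 1]),
                   treatment = parts[, 2],
                   stringsAsFactors = FALSE)
```

Converting time to an integer keeps 0 and 15 in numeric order if it is later turned into a factor for the x axis.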

Resources