How to export several graphics from R [duplicate] - r

This question already has answers here:
Save ggplot within a function
(2 answers)
Closed 5 years ago.
I have a list of data frames and I want to export one bar chart per data frame. I am trying to use lapply, but it does not work. Does anyone know how to do this?
my_data <- lapply(X = seq(from = 1, to = length(in_files_path), by = 1), FUN = function(x){
data_tables <- read.table(file = in_files_path[[x]], header = TRUE)
})
lapply(X = seq(from = 1, to = length(in_files_path), by = 1), FUN = function(x){
setwd(dir = ou_graph_path)
png(filename = in_files_name[[x]],
units = "in",
width = 15,
height = 10,
res = 300)
ggplot(data = my_data[[x]], aes(x = my_data[[x]]$A, y = my_data[[x]]$B)) +
geom_bar()
dev.off()
})

I would advise using the following approach:
# Get list of files
# Start loop -
# read files
# make plot
# store plots in list
# - end loop
#
# Start loop -
# perform plot operation
# save plots
# - end loop
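library(data.table) # for fread()
library(ggplot2)
# your_location_of_the_files and your_location_for_the_plots are placeholders for your own folders
setwd(your_location_of_the_files)
list_files = list.files(pattern = "\\.csv$")
for(i_file in list_files){
dummy = fread(i_file, header = TRUE)
# Name each PNG after its input file (x is not defined inside this loop)
png(filename = file.path(your_location_for_the_plots, sub("\\.csv$", ".png", i_file)),
units = "in",
width = 15,
height = 10,
res = 300)
# You can just say A here, not dummy$A
# geom_col() draws bars of height B; geom_bar() with a y aesthetic would error because it counts rows
# Inside a loop a ggplot is only drawn when printed explicitly, hence plot()
plot(ggplot(data = dummy, aes(x = A, y = B)) + geom_col())
dev.off()
}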
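The outline above describes two loops (build and store the plots, then save them), while the code uses a single loop. A minimal sketch of the two-loop variant, assuming ggplot2's ggsave() for the saving step; it writes three small hypothetical CSV files to a temporary folder so that it runs on its own:

```r
library(ggplot2)

# Hypothetical input: three small CSV files with columns A and B in a temp folder
in_dir <- file.path(tempdir(), "in_csv")
out_dir <- file.path(tempdir(), "out_png")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
for (k in 1:3) {
  write.csv(data.frame(A = letters[1:5], B = runif(5)),
            file.path(in_dir, paste0("data_", k, ".csv")), row.names = FALSE)
}

# Loop 1: read each file, build its bar chart and store it in a named list
list_files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
plots <- list()
for (i_file in list_files) {
  dummy <- read.csv(i_file)
  name <- sub("\\.csv$", "", basename(i_file))
  plots[[name]] <- ggplot(dummy, aes(x = A, y = B)) + geom_col()
}

# Loop 2: save every stored plot; ggsave() opens and closes the device itself
for (name in names(plots)) {
  ggsave(filename = paste0(name, ".png"), plot = plots[[name]], path = out_dir,
         width = 15, height = 10, units = "in", dpi = 300)
}
```

Because ggsave() takes the plot object as an argument, there is no need to remember print() or dev.off() inside the loop.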

Related

ggsave ggsurvplot with risk.table

I am trying to save a ggsurvplot with risk.table using ggsave. However, the output of ggsave is always just the risk.table. I also tried this and this. None of them works.
library(data.table)
library(survival)
library(survminer)
OS <- c(c(1:100), seq(1, 75, length = 50), c(1:50))
dead <- rep(1, times = 200)
variable <- c(rep(0, times = 100), rep(1, times = 50), rep(2, times = 50))
dt <- data.table(OS = OS,
dead = dead,
variable = variable)
survfit <- survfit(Surv(OS, dead) ~ variable, data = dt)
ggsurvplot(survfit, data = dt,
risk.table = TRUE)
ggsave("test.png")
The main issue is that a ggsurvplot object is a list of plots. Hence, when using ggsave, only the last plot (the last element of the list) is saved.
There is already a GitHub issue on that topic with several workarounds. Using one of the more recent suggestions, the following works fine for me:
library(survival)
library(survminer)
OS <- c(c(1:100), seq(1, 75, length = 50), c(1:50))
dead <- rep(1, times = 200)
variable <- c(rep(0, times = 100), rep(1, times = 50), rep(2, times = 50))
dt <- data.frame(OS = OS,
dead = dead,
variable = variable)
survfit <- survfit(Surv(OS, dead) ~ variable, data = dt)
# add method to grid.draw
grid.draw.ggsurvplot <- function(x){
survminer:::print.ggsurvplot(x, newpage = FALSE)
}
p <- ggsurvplot(survfit, data = dt, risk.table = TRUE)
ggsave("test.png", p, height = 6, width = 6)
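If you prefer not to define a grid.draw method, a simpler alternative is to open the PNG device yourself, print the object and close the device. This is a sketch that relies on survminer's print method for ggsurvplot objects, which draws the survival curves and the risk table together; the file name test_print.png is a placeholder.

```r
library(survival)
library(survminer)

# Same toy data as in the question, written more compactly
dt <- data.frame(OS = c(1:100, seq(1, 75, length = 50), 1:50),
                 dead = 1,
                 variable = rep(0:2, times = c(100, 50, 50)))
fit <- survfit(Surv(OS, dead) ~ variable, data = dt)
p <- ggsurvplot(fit, data = dt, risk.table = TRUE)

# Open the device, draw the whole ggsurvplot (curves and risk table), close the device
out_file <- file.path(tempdir(), "test_print.png")
png(out_file, width = 6, height = 6, units = "in", res = 300)
print(p)
dev.off()
```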

ggarrange generates an empty pdf file

I am dealing with a function that takes a big data frame (36 rows and 194 columns), performs a principal component analysis and then, using expand.grid, generates a list of plots for the pairwise combinations of 26 principal components, 325 plots in total.
My problem is that when I use ggarrange() from ggpubr to merge all the plots into a single PDF file, the file is empty.
My code:
a = 26
row.pairs = 325
PC.Graph <- function(df, col1, col2, tag, id){
df1 <- df[,-c(col1:col2)]
pca <- prcomp(df1, scale. = T)
pc.summ <- summary(pca)
a <- sum(pc.summ$importance[3,] < 0.975)
b <- c(1:a)
pc.grid <- expand.grid(b, b)
pc.pairs <- pc.grid[pc.grid$Var1 < pc.grid$Var2,]
row.pairs <- nrow(pc.pairs)
components <- c(1:row.pairs)
S.apply.FUN <- function(x){
c <- sapply(pc.pairs, "[", x, simplify = F)
pcx <- c$Var1
pcy <- c$Var2
df2 <- df
row.names(df2) <- df[, tag]
name = paste("PCA_", pcx, "_vs_", pcy)
autoplot(pca, data = df2, colour = id, label = T, label.repel = T, main = name,
x = pcx, y = pcy)
}
all.plots <- Map(S.apply.FUN, components)
pdf(file = "All_PC.pdf", width = 50, height = 70)
print(ggarrange(all.plots))
dev.off()
}
PC.Graph(Final_DF, col1 = 1, col2 = 5, tag = "Sample", id = "Maturation")
You would have to pass the list through the plotlist argument (ggarrange(plotlist = all.plots)), but I doubt you would get a useful plot out of 325 panels on a single PDF page, so I would advise splitting the plot list into chunks (e.g. of 20) and drawing them on multiple pages.
Specifically, I would return all.plots from your PC.Graph function (and remove the code that writes the PDF there).
I would also replace expand.grid(b, b) and the Var1 < Var2 filter with t(combn(b, 2)), which generates each pair of components exactly once without building the full grid first.
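A quick base-R check of the combn() suggestion, using the 26 components from the question: both approaches give the same 325 pairs, but combn() never builds the 676-row grid. Converting the result to a data frame with Var1/Var2 columns (an assumption about how you would plug it in) keeps the existing pc.pairs$Var1 style access working.

```r
a <- 26
b <- seq_len(a)

# Approach from the question: full grid, then keep pairs with Var1 < Var2
pc.grid <- expand.grid(b, b)
pc.pairs.grid <- pc.grid[pc.grid$Var1 < pc.grid$Var2, ]

# Suggested replacement: combn() enumerates each unordered pair exactly once
pc.pairs <- setNames(as.data.frame(t(combn(b, 2))), c("Var1", "Var2"))

nrow(pc.grid)        # 676 rows before filtering
nrow(pc.pairs.grid)  # 325 after filtering
nrow(pc.pairs)       # 325, i.e. choose(26, 2)
```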
Then I would do something like this:
# export the full list of plots
plots <- PC.Graph(Final_DF, col1 = 1, col2 = 5, tag = "Sample", id = "Maturation")
# split the plotlist
splitPlots <- split(plots, ceiling(seq_along(plots)/20))
plotPlots <- function(x){
out <- cowplot::plot_grid(plotlist = x, ncol = 5, nrow = 4)
plot(out)
}
pdf(file = "All_PC.pdf", width = 50, height = 45)
lapply(splitPlots, plotPlots)
dev.off()
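To see how split(plots, ceiling(seq_along(plots)/20)) groups the plots into pages, here is the same expression applied to 325 placeholder elements (plain integers standing in for the ggplot objects):

```r
# Placeholders for the 325 plots returned by PC.Graph()
plots <- as.list(seq_len(325))

# Element k goes to page ceiling(k / 20)
splitPlots <- split(plots, ceiling(seq_along(plots) / 20))

length(splitPlots)     # 17 pages
lengths(splitPlots)    # 20 plots on each of the first 16 pages, 5 on the last
```

With ncol = 5 and nrow = 4 in plot_grid(), each full page is a 5 x 4 grid and the last page is only partly filled.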

Memory leakage in using `ggplot` on large binned datasets

I am making various ggplots from a very large dataset (much larger than the example). I wrote a function that bins both the x- and y-axes so that such a large dataset can be plotted.
In the following example, memory.size() is recorded at the start. Then a large dataset, dt, is simulated, and its column x2 is plotted against x1 with binning. The plotting is repeated for different subsets of dt. The size of the plot objects is checked with object.size() and stored. After the plot objects have been created, rm(dt) is executed, followed by two gc() calls, and memory.size() is recorded again. Finally, the memory.size() at the end is compared with that at the beginning, and both are printed.
Given the small size of the plot objects, I expected memory.size() at the end to be similar to that at the beginning. It is not: memory.size() does not go down again until I restart R.
REPRODUCIBLE EXAMPLE
library(data.table)
library(ggplot2)
library(magrittr)
# The binning function
# x = column name for x-axis (character)
# y = column name for y-axis (character)
# xNItv = Number of bin for x-axis
# yNItv = Number of bin for y-axis
# Value: A binned data.table
tab_by_bin_idxy <- function(dt, x, y, xNItv, yNItv) {
#Binning
xBreaks = dt[, seq(min(get(x), na.rm = T), max(get(x), na.rm = T), length.out = xNItv + 1)]
yBreaks = dt[, seq(min(get(y), na.rm = T), max(get(y), na.rm = T), length.out = yNItv + 1)]
xbinCode = dt[, .bincode(get(x), breaks = xBreaks, include.lowest = T)]
xbinMid = sapply(seq(xNItv), function(i) {return(mean(xBreaks[c(i, i+1)]))})[xbinCode]
ybinCode = dt[, .bincode(get(y), breaks = yBreaks, include.lowest = T)]
ybinMid = sapply(seq(yNItv), function(i) {return(mean(yBreaks[c(i, i+1)]))})[ybinCode]
#Creating table
tab_match = CJ(xbinCode = seq(xNItv), ybinCode = seq(yNItv))
tab_plot = data.table(xbinCode, xbinMid, ybinCode, ybinMid)[
tab_match, .(xbinMid = xbinMid[1], ybinMid = ybinMid[1], N = .N), keyby = .EACHI, on = c("xbinCode", "ybinCode")
]
#Returning table
return(tab_plot)
}
before.mem.size <- memory.size()
# Simulation of dataset
nrow <- 6e5
ncol <- 60
dt <- do.call(data.table, lapply(seq(ncol), function(i) {return(runif(nrow))}) %>% set_names(paste0("x", seq(ncol))))
# Graph plotting
dummyEnv <- new.env()
with(dummyEnv, {
fcn <- function(tab) {
binned.dt <- tab_by_bin_idxy(dt = tab, x = "x1", y = "x2", xNItv = 50, yNItv = 50)
plot <- ggplot(binned.dt, aes(x = xbinMid, y = ybinMid)) + geom_point(aes(size = N))
return(plot)
}
lst_plots <- list(
plot1 = fcn(dt),
plot2 = fcn(dt[x1 <= 0.7]),
plot3 = fcn(dt[x5 <= 0.3])
)
assign("size.of.plots", object.size(lst_plots), envir = .GlobalEnv)
})
rm(dummyEnv)
# After use, remove and clean up of dataset
rm(dt)
gc();gc()
after.mem.size <- memory.size()
# Memory reports
print(paste0("before.mem.size = ", before.mem.size))
print(paste0("after.mem.size = ", after.mem.size))
print(paste0("plot.objs.size = ", size.of.plots / 1000000))
I have tried the following modifications to the code:
Inside fcn, removing ggplot and returning NULL instead of a plot object: the memory leak is completely gone. But this is not a solution; I need the plot.
The fewer plots requested (or the fewer columns or rows passed to fcn), the smaller the memory leak.
The leak also occurs if I do not subset at all and create only one plot object (in the example, I plotted 3).
Even after calling rm(list = ls()) at the end, the memory is still not recovered.
I would like to know why this happens and how to avoid it without giving up binned plots or subsetting dt to make different plots.
Thanks for your attention!
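A caveat on the object.size() figures used above: object.size() does not count the environments an object refers to (its documentation states that associated space, such as the environment of a function, is not included). As far as I know, a ggplot object also keeps a reference to the environment it was created in (plot_env), which inside fcn would be the call frame that holds tab. The base-R sketch below does not reproduce the leak; it only shows, with a hypothetical make_holder() helper, that a closure reporting a tiny object.size() can keep a large vector alive.

```r
# make_holder() is a hypothetical helper for this illustration only:
# it returns a small function whose enclosing environment holds a large vector
make_holder <- function(n) {
  big <- runif(n)          # roughly 8 * n bytes
  function() length(big)   # the returned closure keeps `big` alive
}

h <- make_holder(1e6)

# object.size() ignores the enclosing environment, so this is small
size_closure <- object.size(h)
# the vector that h still references is about 8 MB
size_hidden <- object.size(get("big", envir = environment(h)))

print(size_closure, units = "Kb")
print(size_hidden, units = "Mb")
```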

Save 2-plot figures in pdf within for loop

I have multiple plots to save as .pdf files, and they are created in R using par(mfrow=c(1,2)), i.e. each figure to be saved contains 2 plots arranged in 1 row and 2 columns.
Since the total number of plots is quite high, I am creating the plots in a for loop.
How can I save the figures (with 2 plots each) as PDF files inside the for loop?
Here's some funky code:
## create data.frames
df_1 = data.frame(x = c(1:100), y = rnorm(100))
df_2 = data.frame(x = c(1:100), y = rnorm(100))
df_3 = data.frame(x = c(1:100), y = rnorm(100))
df_4 = data.frame(x = c(1:100), y = rnorm(100))
## create list of data.frames
df_lst = list(df_1, df_2, df_3, df_4)
## plot in for loop by 1 row and 2 cols
par(mar=c(3,3,1,0), mfrow=c(1,2))
for (i in 1:length(df_lst)) {
barplot(df_lst[[i]]$y)
}
Let's say I want to save the plots with the pdf function. Here's what I tried:
for (i in 1:length(df_lst)) {
pdf(paste('my/directory/file_Name_', i, '.pdf', sep = ''), height = 6, width = 12)
barplot(df_lst[[i]]$y)
dev.off()
}
My solution is clearly wrong because pdf() creates a new file at each iteration, each holding a single plot (i.e. 4 files instead of 2).
Any suggestion?
Thanks
Sounds like you could use a nested loop here: an outer loop for each file you create, and an inner loop for each panel of the multi-panel figure in that file. Since all the data frames are stored in a 1-d list, you'll need to keep track of the index of the list element you are plotting.
Here's one way to do that:
nrow <- 1
ncol <- 2
n_panels <- nrow * ncol
n_files <- length(df_lst) / n_panels
for (i in seq_len(n_files)) {
file <- paste0("file_", i, ".pdf")
pdf(file, height = 6, width = 12)
# plot params need to be set for each device
par(mar = c(3, 3, 1, 0), mfrow = c(nrow, ncol))
for (j in seq_len(n_panels)) {
idx <- (i - 1) * n_panels + j
barplot(df_lst[[idx]]$y)
}
# updated to also add a legend
legend("bottom", legend = "Bar", fill = "grey")
dev.off()
}
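The nested loop above assumes the number of data frames is a multiple of the panels per file (4 / 2 = 2 files). One way to handle a remainder, assuming you want the leftover data frame in a final, partly empty file, is to round the file count up with ceiling() and stop the inner loop once the index runs past the list. The five simulated data frames and the temporary output folder are placeholders.

```r
# Hypothetical data: 5 data frames, not a multiple of the 2 panels per file
df_lst <- lapply(1:5, function(k) data.frame(x = 1:100, y = rnorm(100)))

n_row <- 1
n_col <- 2
n_panels <- n_row * n_col
# ceiling() so the leftover data frame still gets its own file
n_files <- ceiling(length(df_lst) / n_panels)
out_dir <- tempdir()

for (i in seq_len(n_files)) {
  pdf(file.path(out_dir, paste0("file_", i, ".pdf")), height = 6, width = 12)
  # plot parameters need to be set for each new device
  par(mar = c(3, 3, 1, 0), mfrow = c(n_row, n_col))
  for (j in seq_len(n_panels)) {
    idx <- (i - 1) * n_panels + j
    if (idx > length(df_lst)) break  # no data frames left for this file
    barplot(df_lst[[idx]]$y)
  }
  dev.off()
}
```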
If you just want one file with multiple pages, all you need to do is move the pdf() call outside your original loop and call par() after pdf():
pdf('my/directory/file_Name.pdf', height = 6, width = 12)
par(mar=c(3,3,1,0), mfrow=c(1,2))
for (i in 1:length(df_lst)) {
barplot(df_lst[[i]]$y)
}
dev.off()

Apply loop in automated forecast

I am trying to forecast individual variables from a data.frame in long format. I get stuck in the loop [apply] part. The question is: how can I replace the manual forecasting with an apply?
library(forecast)
library(data.table)
# get time series
www = "http://staff.elena.aut.ac.nz/Paul-Cowpertwait/ts/cbe.dat"
cbe = read.table(www, header = T)
# in this case, there is a data.frame in long format to start with
df = data.table(cbe[, 2:3])
df[, year := 1958:1990]
dfm = melt(df, id.var = "year", variable.name = "indicator", variable.factor = F) # will give warning because beer = num and others are int
dfm[, site := "A"]
dfm2= copy(dfm) # make duplicate to simulate other site
dfm2[, site := "B"]
dfm = rbind(dfm, dfm2)
# function to make time series & forecast
f.forecast = function(df, mysite, myindicator, forecast.length = 6, frequency = freq) {
# get site and indicator
x = df[site == mysite & indicator == myindicator,]
# convert to time series
start.date = min(x$year)
myts = ts(x$value, frequency = frequency, start = start.date)
# forecast
myfc = forecast(myts, h = forecast.length, fan = F, robust = T)
plot(myfc, main = paste(mysite, myindicator, sep = " / "))
grid()
return(myfc)
}
# the manual solution
par(mfrow = c(2,1))
f1 = f.forecast(dfm, mysite = "A", myindicator = "beer", forecast.length = 6, freq = 12)
f2 = f.forecast(dfm, mysite = "A", myindicator = "elec", forecast.length = 6, freq = 12)
# how to loop? [in the actual data set there are many variables per site]
par(mfrow = c(2,1))
myindicators = unique(dfm$indicator)
sapply(myindicator, f.forecast(dfm, "A", myindicator = myindicators, forecast.length = 6, freq = 12)) # does not work
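I'd suggest using split and dropping the second and third arguments of f.forecast, so that you pass the subset of the data you want to forecast directly. For instance:
f.forecast = function(x, forecast.length = 6, frequency = 12) {
# x is already one site/indicator subset, so this line is no longer needed:
# x = df[site == mysite & indicator == myindicator,]
# convert to time series (frequency = 12 matches the freq used in the manual calls)
start.date = min(x$year)
myts = ts(x$value, frequency = frequency, start = start.date)
# forecast
myfc = forecast(myts, h = forecast.length, fan = F, robust = T)
# take the plot title from the subset itself
plot(myfc, main = paste(x$site[1], x$indicator[1], sep = " / "))
grid()
return(myfc)
}
Now split the whole long-format table (dfm) by site and indicator, and call f.forecast on each subset:
dflist <- split(dfm, list(dfm$site, dfm$indicator), drop = TRUE)
lapply(dflist, f.forecast)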
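To try the split() + lapply() pattern without the forecast package or the remote cbe.dat file, here is a self-contained sketch on simulated long-format data (hypothetical values). It swaps in stats::HoltWinters() with beta = FALSE and gamma = FALSE (simple exponential smoothing) for forecast::forecast(); f.forecast.base is a hypothetical name.

```r
set.seed(1)

# Hypothetical long-format data: 2 sites x 2 indicators, 33 yearly values each
dfm <- expand.grid(year = 1958:1990,
                   indicator = c("beer", "elec"),
                   site = c("A", "B"),
                   stringsAsFactors = FALSE)
dfm$value <- 100 + rnorm(nrow(dfm), sd = 5)

# One forecast per subset; HoltWinters() stands in for forecast::forecast()
f.forecast.base <- function(x, forecast.length = 6) {
  myts <- ts(x$value, start = min(x$year), frequency = 1)
  fit <- HoltWinters(myts, beta = FALSE, gamma = FALSE)  # simple exponential smoothing
  predict(fit, n.ahead = forecast.length)
}

# One list element per site/indicator combination, named like "A.beer"
dflist <- split(dfm, list(dfm$site, dfm$indicator), drop = TRUE)
fc <- lapply(dflist, f.forecast.base)
names(fc)
```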
