I am having trouble saving multiple plots from the output of a loop. To give some background:
I have multiple data frames, each with the data for single chemical toxicity for multiple species. I have labelled each data frame for the chemical that it represents, ie "ChemicalX". The data is in this format as this is how the "SSDTools" package works, which creates a species sensitivity distribution for a single chemical.
Because I have a lot of chemicals, I want to create a loop that iterates over each data frame, calculates the required metrics to create an SSD, plot the SSD, and then save the plot.
The code below works for calculating all of the metrics and plotting the SSDs. It only breaks when I try to create a title within the loop, and when I try to save the plot within the loop.
For reference, I am using the packages:
SSDTools, ggplot2, tidyverse, fitdistrplus
My code is as follows:
# Create a list of data frames
list_dfs <- list(ChemicalX, ChemicalY, ChemicalZ)
# make the loop
for (i in list_dfs){ # for each chemical (ie data frame)
ssd.fits <- ssd_fit_dists(i, dists = c("llogis", "gamma", "lnorm", "gompertz", "lgumbel", "weibull", "burrIII3", "invpareto", "llogis_llogis", "lnorm_lnorm")) # Test the goodness of fit using all distributions available
ssd.gof_fits <- ssd_gof(ssd.fits) # Save the goodness of fit statistics
chosen_dist <- ssd.gof_fits %>% # Choose the best fit distribution by
filter(aicc==min(aicc)) # finding the minimum aicc
final.fit <- ssd_fit_dists(i, dists = chosen_dist$dist) # Use the chosen distribution only
final.predict <-predict(final.fit, ci = TRUE) # generate the final predictions
plotdata <- i # create a separate plot data frame
final.plot <- ssd_plot(plotdata, final.predict, # generate the final plot
color = "Taxa",
label = "Species",
xlab = "Concentration (ug/L)", ribbon = TRUE) +
expand_limits(x = 10e6) + # to ensure the species labels fit
ggtitle(paste("Species Sensitivity for",chem_names_df[i], sep = " ")) +
scale_colour_ssd()
ggsave(filename = paste("SSD for",chem_names_df[i], ".pdf", sep = ""),
plot = final.plot)
}
The code works great right up until the last part, where I want to create a title for each chemical in each iteration, and where I want to save the filename as the chemical name.
I have two issues:
I want the title of the plot to be "Species Sensitivity for ChemicalX", with ChemicalX being the name of the data frame. However, when I use the following code the title gets all messed up, and gives me a list of the species in that data frame (see image).
ggtitle(paste("Species Sensitivity. for",i, sep = " "))
Graph title output using "i"
To try and get around this, I created a vector of chemical names that matches the order of the data frame list, called "chem_names_df". When I use ggtitle(paste("Species Sensitivity for",chem_names_df[i], sep = " ")) however, it gives me the error of Error in chem_names_df[i] : invalid subscript type 'list'
A similar issue happens when I try to save the plot using ggsave. I am trying to save the file for each chemical data frame as "SSD_ChemicalX", but, as above, it just outputs a list of the species in place of i.
I think it has something to do with how R is calling from my list of dataframes - I am not sure why it is calling the species list (ie c("Danio Rerio, Lepomis macrochirus,...)) instead of the chemical name.
Any help would be appreciated! Thank you!
Basically your problem here is that you are sometimes using i as if it is an index, and sometimes as if it is a data frame, but in fact it is a data frame.
Your example is not reproducible so let me provide one. You have done the equivalent of:
list_dfs2 <- list(mtcars, mtcars, cars)
for(i in list_dfs2){
print(i)
}
This is just going to print the whole mtcars dataset twice and then the cars dataset. You can then define a vector:
cars_names <- c("mtcars", "mtcars", "cars")
If you call cars_names[i], on the first iteration you're not calling cars_names[1], you're trying to subset a vector with an entire data frame. That won't work. It is better to loop over seq_along() of your list of data frames, so that i is an integer index, and then use list_dfs[[i]] when you want to refer to the actual data frame rather than the index. Something like:
# Create a list of data frames
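list_dfs <- list(ChemicalX, ChemicalY, ChemicalZ)
# Vector of chemical names, in the same order as list_dfs
chem_names_df <- c("ChemicalX", "ChemicalY", "ChemicalZ")
# make the loop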
for (i in seq_along(list_dfs)){ # for each chemical (ie data frame)
ssd.fits <- ssd_fit_dists(list_dfs[[i]], dists = c("llogis", "gamma", "lnorm", "gompertz", "lgumbel", "weibull", "burrIII3", "invpareto", "llogis_llogis", "lnorm_lnorm")) # Test the goodness of fit using all distributions available
ssd.gof_fits <- ssd_gof(ssd.fits) # Save the goodness of fit statistics
chosen_dist <- ssd.gof_fits %>% # Choose the best fit distribution by
filter(aicc==min(aicc)) # finding the minimum aicc
final.fit <- ssd_fit_dists(list_dfs[[i]], dists = chosen_dist$dist) # Use the chosen distribution only
final.predict <-predict(final.fit, ci = TRUE) # generate the final predictions
plotdata <- list_dfs[[i]] # create a separate plot data frame
final.plot <- ssd_plot(plotdata, final.predict, # generate the final plot
color = "Taxa",
label = "Species",
xlab = "Concentration (ug/L)", ribbon = TRUE) +
expand_limits(x = 10e6) + # to ensure the species labels fit
ggtitle(paste("Species Sensitivity for",chem_names_df[i], sep = " ")) +
scale_colour_ssd()
ggsave(filename = paste0("SSD for ", chem_names_df[i], ".pdf"), # paste0 keeps the space after "for" and adds none before ".pdf"
plot = final.plot)
}
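To see the index-versus-element distinction in isolation, here is a minimal base R sketch. The two small data frames and the chem_names vector are hypothetical stand-ins for the real chemical data; no SSD fitting is done.

```r
# Hypothetical stand-ins for the chemical data frames
list_dfs <- list(head(mtcars, 3), head(cars, 3))
chem_names <- c("ChemicalX", "ChemicalY")  # same order as list_dfs

titles <- character(length(list_dfs))
for (i in seq_along(list_dfs)) {
  df <- list_dfs[[i]]  # the data frame itself, for fitting and plotting
  # i is an integer here, so it can subset the vector of names
  titles[i] <- paste("Species Sensitivity for", chem_names[i])
}
print(titles)
```

Because i is 1, 2, ... rather than a data frame, the same i can index both the list of data frames and the vector of names.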
Consider using a defined method that receives name and data frame as input parameters. Then, pass a named list into the method using Map to iterate through data frames and corresponding names elementwise:
Function
build_plot <- function(plotdata, plotname) {
# Test the goodness of fit using all distributions available
ssd.fits <- ssd_fit_dists(
plotdata,
dists = c(
"llogis", "gamma", "lnorm", "gompertz", "lgumbel", "weibull",
"burrIII3", "invpareto", "llogis_llogis", "lnorm_lnorm"
)
)
# Save the goodness of fit statistics
ssd.gof_fits <- ssd_gof(ssd.fits)
# Choose the best fit distribution by finding the minimum aicc
chosen_dist <- filter(ssd.gof_fits, aicc==min(aicc))
# Use the chosen distribution only
final.fit <- ssd_fit_dists(plotdata, dists = chosen_dist$dist)
# generate the final predictions
final.predict <- predict(final.fit, ci = TRUE)
# generate the final plot
final.plot <- ssd_plot(
plotdata, final.predict, color = "Taxa", label = "Species",
xlab = "Concentration (ug/L)", ribbon = TRUE) +
expand_limits(x = 10e6) + # to ensure the species labels fit
ggtitle(paste("Species Sensitivity for", plotname)) +
scale_colour_ssd()
# export plot to pdf
ggsave(filename = paste0("SSD for ", plotname, ".pdf"), plot = final.plot)
# return plot to environment
return(final.plot)
}
Call
# create a named list of data frames
chem_dfs <- list(
"ChemicalX"=ChemicalX, "ChemicalY"=ChemicalY, "ChemicalZ"=ChemicalZ
)
chem_plots <- Map(build_plot, chem_dfs, names(chem_dfs))
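As a rough illustration of how Map() pairs each data frame with its name, here is a base R sketch. The describe() function and the toy data frames are made up for the example; they stand in for build_plot and the chemical data.

```r
# Toy named list standing in for chem_dfs
dfs <- list(A = data.frame(v = 1:3), B = data.frame(v = 4:6))

# Hypothetical worker: receives one data frame and its name
describe <- function(d, nm) {
  paste0(nm, ": ", nrow(d), " rows, mean ", mean(d$v))
}

# Map() walks both arguments in parallel and keeps the list names
res <- Map(describe, dfs, names(dfs))
print(res)
```

The result is a list named after the first argument, so with the real data an individual plot can be retrieved as chem_plots$ChemicalX.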
I am forecasting with a combination of data sets from the fpp2 package and forecasting functions from the forecast package. The output of this forecasting is a list object called SNAIVE_MODELS_ALL. This object contains the data separately for two series, where the first is Electricity and the second is Cement.
You can see code below :
# CODE
library(fpp2)
library(dplyr)
library(forecast)
library(gridExtra)
library(ggplot2)
#INPUT DATA
mydata_qauselec <- qauselec
mydata_qcement <- window(qcement, start = 1956, end = c(2010, 2))
# Merging data
mydata <- cbind(mydata_qauselec, mydata_qcement)
colnames(mydata) <- c("Electricity", "Cement")
# Test Extract Name
mydata1 <- data.frame(mydata)
COL_NAMES <- names(mydata1)
rm(mydata_qauselec, mydata_qcement)
# FORECASTING HORIZON
forecast_horizon <- 12
# FORECASTING
BuildForecast <- function(Z, hrz = forecast_horizon) {
timeseries <- msts(Z, start = 1956, seasonal.periods = 4)
forecast <- snaive(timeseries, biasadj = TRUE, h = hrz)
}
frc_list <- lapply(X = mydata1, BuildForecast)
# FINAL FORECASTING
SNAIVE_MODELS_ALL<-lapply(frc_list, forecast)
My intention is to pass the SNAIVE_MODELS_ALL object to the autoplot function in order to get two plots like the picture below.
With the code below I draw both plots separately, but what I really want is to do this with autoplot and a function like apply or something similar, which can draw these two charts automatically, as in the picture above. This is only a small example; in the real case I will have maybe 5 or 10 charts.
#PLOT 1
P_PLOT1<-autoplot(SNAIVE_Electricity,main = "Snaive Electricity forecast",xlab = "Year", ylab = "in billion kWh")+
autolayer(SNAIVE_Electricity,series="Data")+
autolayer(SNAIVE_Electricity$fitted,series="Forecasts")
# PLOT 2
P_PLOT2<-autoplot(SNAIVE_Cement,main = "Snaive Cement forecast",xlab = "Year", ylab = "in millions of tonnes")+
autolayer(SNAIVE_Cement,series="Data")+
autolayer(SNAIVE_Cement$fitted,series="Forecasts")
#UNION PLOTS (PLOT 1 AND PLOT 2)
SNAIVE_PLOT_ALL<-grid.arrange(P_PLOT1,P_PLOT2)
Can anybody help me with this code?
If I understand correctly, one of the difficulties with this problem is that each plot should have a specific title and y label. One possible solution is to make the plot titles and y labels function arguments:
PlotForecast <- function(df_pl, main_pl, ylab_plt){
autoplot(df_pl,
main = main_pl,
xlab = "Year", ylab = ylab_plt)+
autolayer(df_pl,series="Data")+
autolayer(df_pl$fitted,series="Forecasts")
}
Prepare lists of the plot labels to be used with PlotForecast():
main_lst <- list("Snaive Electricity forecast", "Snaive Cement forecast")
ylab_lst <- list("in billion kWh", "in millions of tonnes")
Construct a list of plot-objects using a base Map() function:
PL_list <- Map(PlotForecast, df_pl = SNAIVE_MODELS_ALL, main_pl = main_lst,
ylab_plt= ylab_lst)
Then all we have to do is to call grid.arrange() with the plot list:
do.call(grid.arrange, PL_list)
Please note that main_lst and ylab_lst are created manually here for demonstration purposes, which is not the best approach if you work with a lot of charts. Ideally, the labels should be generated automatically from the original SNAIVE_MODELS_ALL list, for example from its names.
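One way the labels could be generated is sketched below in base R. It assumes the series names are available as names(SNAIVE_MODELS_ALL); here they are typed in directly so the snippet runs on its own. The units lookup vector is a hypothetical helper, since the units cannot be derived from the names.

```r
# In the real script this would be names(SNAIVE_MODELS_ALL)
series_names <- c("Electricity", "Cement")

# Hypothetical lookup of y-axis units, keyed by series name
units <- c(Electricity = "in billion kWh", Cement = "in millions of tonnes")

# Build the title and y-label lists in the same order as the series
main_lst <- as.list(paste("Snaive", series_names, "forecast"))
ylab_lst <- as.list(unname(units[series_names]))

print(main_lst)
print(ylab_lst)
```

These lists can then be passed to Map() together with SNAIVE_MODELS_ALL exactly as before.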
I'm using a function in R able to analyse my data and produce several plots.
The function is "snpzip" from adegenet package.
I would like to save automatically the three plots that the function produces as part of the output. Do you have any suggestion on how to do it?
I want to point to the fact that I know how to save a single plot, for instance with png or pdf followed by dev.off(). My problem is that when I run snpzip(snps, phen, method = "centroid"), the outcomes are three plots (which I would like to save).
I report here the same example as in the "adegenet" package:
simpop <- glSim(100, 10000, n.snp.struc = 10, grp.size = c(0.3,0.7),
LD = FALSE, alpha = 0.4, k = 4)
snps <- as.matrix(simpop)
phen <- simpop@pop
outcome <- snpzip(snps, phen, method = "centroid")
If you use a filename with a C integer format in it, then R will substitute the page number for that part of the name, generating multiple files. For example,
png("page%d.png")
plot(1)
plot(2)
plot(3)
dev.off()
will generate 3 files, page1.png, page2.png, and page3.png. For pdf(), you also need onefile=FALSE:
pdf("page%d.pdf", onefile = FALSE)
plot(1)
plot(2)
plot(3)
dev.off()
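A small self-contained check of this behaviour is sketched below. The output folder under tempdir() is an arbitrary choice for the example. The three plot() calls stand in for the three plots drawn by snpzip(); the assumption is that wrapping the snpzip() call between pdf() and dev.off() in the same way would capture its plots, since they are drawn on the active device.

```r
# Write into a scratch folder so the working directory is not touched
out_dir <- file.path(tempdir(), "multi_page_demo")
dir.create(out_dir, showWarnings = FALSE)

# %d is replaced by the page number; onefile = FALSE gives one file per page
pdf(file.path(out_dir, "page%d.pdf"), onefile = FALSE)
plot(1)  # stand-ins for the plots produced by the analysis function
plot(2)
plot(3)
invisible(dev.off())

files <- sort(list.files(out_dir, pattern = "^page[0-9]+\\.pdf$"))
print(files)
```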
In the following reproducible example I try to create a function for a ggplot distribution plot and saving it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine until now.
Then I want to make a second plot (note mean=100 instead of the previous 10):
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then I try to replot the first distribution, with mean 10:
output1[1]
I get the same distribution as the second plot?
If, however, I take the information used inside the function, return it, and set it as global variables, it works.
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens, I would like to know. It seems that in order to draw the intended distribution I have to reset the data frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to do this? Luckily, I can replot the first distribution,
but I can't plot them both at the same time:
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
The answer to "Grid of multiple ggplot2 plots which have been made in a for loop" suggests using a list to hold the plots:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
Will produce the distribution with mean=100 again instead of mean=10
and:
grid.arrange(p1,p2)
will produce the same Error
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt, I tried to use recordPlot() to record everything about the plot in an object. The following is now inside the function.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function produces the same problems as before: redrawing depends on resetting the dat and var1 variables to what is needed for drawing the distribution, and the result similarly can't be put inside a grid.
I've tried similar things, like arrangeGrob() from the question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid", but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time. I would also like to understand why this is happening, as I don't consider it intuitive at all.
The only solution I can think of is to draw the plot to a png file saved somewhere, and then have the function return the path so that it can be reused. Is that what other people are doing?
Thanks for reading, and sorry for the long question.
Found a solution in the question "How can I reference the local environment within a function, in R?",
by inserting
localenv <- environment()
And referencing that in the ggplot
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
made it all work! even with grid arrange!
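For comparison, here is a sketch of an alternative that avoids the environment issue altogether. The underlying reason for the surprise is that expressions inside aes() are evaluated when the plot is drawn, not when it is created, so aes(dat[,var1]) depends on whatever dat and var1 can be found at drawing time. The sketch assumes a reasonably recent ggplot2 (with the .data pronoun and after_stat()) and that var1 is passed as a column name. It also returns the plot itself: output1[1] is a one-element list, not a plot (output1[[1]] would extract the plot), which is why grid.arrange reported that only 'grobs' are allowed.

```r
library(ggplot2)
library(gridExtra)

# Refer to the column by name through the .data pronoun, so the plot
# only depends on the data frame stored inside the plot object.
ggplothist <- function(dat, var1) {
  ggplot(dat, aes(x = .data[[var1]])) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                   colour = "black", fill = "white")
}

set.seed(100)
df  <- data.frame(x = rnorm(100, mean = 10), y = rep(1, 100))
df2 <- data.frame(x = rep(1, 1000), y = rnorm(1000, mean = 100))

p1 <- ggplothist(df, "x")   # plot object, not a list wrapping a plot
p2 <- ggplothist(df2, "y")

grid.arrange(p1, p2)        # both plots keep their own data
```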
I've been trying to write out an R script that will plot the date-temp series for a set of locations that are identified by a Deployment_ID.
Ideally, each page of the output pdf would have the name of the Deployment_ID (check), a graph with proper axes (check) and correct scaling of the x-axis to best show the date-temp series for that specific Deployment_ID (not check).
At the moment, the script makes a pdf that shows each ID over the full range of the dates in the date column (i.e. 1988-2010), instead of just the relevant dates (i.e. just 2005), which squishes the scatterplot down into uselessness.
I'm pretty sure it's something to do with how you define xlim, but I can't figure out how to have R access the date min and the date max for each factor as it draws the plots.
Script I have so far:
#Get CSV to read data from, change the file path and name
data <- read.csv(file.path("C:/Users/Person/Desktop", "SampleData.csv")) # forward slashes: a bare backslash starts an escape sequence in an R string
#Make Date real date - must be in yyyy/mm/dd format from the csv to do so
data$Date <- as.Date(data$Date)
#Load lattice, note to install.packages("lattice") if you don't have it
library(lattice)
#Make the plots with lattice, this takes a while.
dataplot <- xyplot(data$Temp~data$Date|factor(data$Deployment_ID),
data=data,
stack = TRUE,
auto.key = list(space = "right"),
layout = c(1,1),
ylim = c(-10,40)
)
#make the pdf
pdf("Dataplots_SampleData.pdf", onefile = TRUE)
#print to the pdf? Not really sure how this works. Takes a while.
print(dataplot)
dev.off()
Use the scales argument, which lets each panel have its own axis limits. Give this a try:
dataplot <- xyplot(data$Temp~data$Date|factor(data$Deployment_ID),
data=data,
stack = TRUE,
auto.key = list(space = "right"),
layout = c(1,1),
scales= list( relation ="free")
)
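If only the date axis should vary between pages while the temperature axis stays fixed, the relation can be set for x alone. Below is a sketch on made-up data; the demo data frame and the output file name are hypothetical stand-ins for SampleData.csv and the real PDF.

```r
library(lattice)

# Hypothetical stand-in for SampleData.csv: two deployments, years apart
demo <- data.frame(
  Deployment_ID = rep(c("A", "B"), each = 30),
  Date = c(as.Date("1990-06-01") + 0:29, as.Date("2005-06-01") + 0:29),
  Temp = c(seq(5, 20, length.out = 30), seq(10, 25, length.out = 30))
)

# Free x scale per panel, shared y limits across panels
demo_plot <- xyplot(Temp ~ Date | factor(Deployment_ID), data = demo,
                    layout = c(1, 1), ylim = c(-10, 40),
                    scales = list(x = list(relation = "free")))

# One panel per page because layout = c(1, 1)
out_file <- file.path(tempdir(), "Dataplots_demo.pdf")
pdf(out_file, onefile = TRUE)
print(demo_plot)
invisible(dev.off())
```

With relation = "free" applied to both axes, as in the answer, the y range would also be recomputed per deployment, so this variant is only preferable when a common temperature range matters.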