Can't open PDF plot in R

For some reason, the PDFs produced by this loop end up corrupt. However, when I create each plot individually, the file is saved and I can open it. Please advise, going mad!
for (l in 1:length(which_genes)) {
gene_name <- which_genes[[l]]
cases_values <- cases[cases$HGNC == genes[gene_name],]
controls_values <- controls[controls$HGNC == genes[gene_name],]
t <- t.test(cases_values[c(2:ncol(cases_values))], controls_values[c(2:ncol(controls_values))])
case <- cbind(t(cases_values[c(2:ncol(cases_values))]), "cases")
cont <- cbind(t(controls_values[c(2:ncol(controls_values))]), "controls")
dat <- as.data.frame(rbind(case, cont))
names(dat) <- c("expression", "type")
dat$expression <- as.numeric(dat$expression)
#plot significant genes
pdf(file = paste(genes[gene_name], "_different.pdf", sep=""))
ggplot(dat, aes(type, expression, fill=type)) +
geom_boxplot() +
ggtitle(paste(genes[gene_name], "pvalue", t$p.value)) +
xlab("cases vs controls")
dev.off()
}

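Yet another instance of the failure-to-print error (as described in the R FAQ, 7.22): inside a for loop or a function, a ggplot object is not auto-printed, so nothing is drawn on the device and the PDF is left without any pages. Use this instead inside the loop: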
pdf(file = paste(genes[gene_name], "_different.pdf", sep=""))
print( ggplot(dat, aes(type, expression, fill=type)) +
geom_boxplot() +
ggtitle(paste(genes[gene_name], "pvalue", t$p.value)) +
xlab("cases vs controls")
)
dev.off()
If the goal was to have a multi-page output then you should have opened the PDF-device outside the loop, print-ed within the loop, and then closed the device outside.
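The multi-page variant described above could look roughly like the following. This is a minimal sketch, not the asker's actual code: the built-in iris data stands in for the per-gene data, and the output file name is a made-up example written to a temporary directory.

```r
library(ggplot2)

# Assumption: iris stands in for the per-gene expression data,
# and "all_species.pdf" is a hypothetical output name.
out_file <- file.path(tempdir(), "all_species.pdf")

pdf(out_file)                        # open the device once, before the loop
for (sp in levels(iris$Species)) {
  p <- ggplot(subset(iris, Species == sp),
              aes(Species, Sepal.Length, fill = Species)) +
    geom_boxplot() +
    ggtitle(sp)
  print(p)                           # explicit print: one page per iteration
}
dev.off()                            # close the device once, after the loop
```

Each print call adds one page to the same file, so the PDF ends up with one page per group.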

Related

Plot data using loop in R

I want to make a plot of the Daily Streamflow in each Station and save it in png format. I want a separate png for each station, something like the image below:
I have a list with the data frame for each station, as shown in the figure below:
I am trying the following code, but it does not work: R aborts, and I am not sure whether that is because of the quantity of data:
for (i in 1:length(listDF2))
{
df1 <- as.data.frame(listDF2[[i]])
df1[is.na(df1)] <- 0
temp_plot <- ggplot(df1, aes(x = day, y = DailyMeanStreamflow, colour=Station)) +
geom_line(size = 1) +
geom_point(size=1.5, shape=21, fill="white") +
facet_wrap(~ month, ncol = 3) +
labs(title = "Daily Mean Streamflow",
subtitle = "Data plotted by month",
y = "Daily Mean Streamflow [m3/s]", x="Days") +
scale_y_continuous (breaks=seq(0,max(df1$DailyMeanStreamflow, na.rm=TRUE),by=1500)) +
scale_x_continuous (breaks=seq(1,max(df1$day),by=1)) + theme(axis.text.x = element_text(size=9))
print(temp_plot)
name4<- paste("DailyStreamflow_byMonth","_", siteNumber[i], ".png", sep="")
ggsave(temp_plot,filename = name4,width=22,height=11,units="in",dpi=500)
#while (!is.null(dev.list()))
dev.off()
}
I have also a "big" data frame with the data for each station one after the other. This data frame is useful when I want to apply functions like data_frame %>% group_by(station) %>% summarise(...)
Any idea how to make the plots for each station? Is it better to use the list or the "big" data frame for this purpose?
I am not sure where the problem in your workflow occurs. It is quite hard to help you, as we do not have a minimal working example. I am also not sure whether you just want to produce your plots in a loop or (also) want to put them together in one visualization.
Anyway, I tried to give you a starting point; maybe this will help:
"%>%" <- magrittr::"%>%"
df_list <- list(
A=dplyr::tibble(ID=1:10,
x=rnorm(10),
y=rnorm(10)),
B=dplyr::tibble(ID=1:10,
x=rnorm(10),
y=rnorm(10)),
C=dplyr::tibble(ID=1:10,
x=rnorm(10),
y=rnorm(10)),
D=dplyr::tibble(ID=1:10,
x=rnorm(10),
y=rnorm(10)))
# Lapply approach
lapply(df_list, function(dat){
p <- dat %>%
ggplot2::ggplot(ggplot2::aes(x=x,y=y)) +
ggplot2::geom_point()
print(p)
})
# Loop approach
for (i in 1:length(df_list)){
p <- df_list[[i]] %>%
ggplot2::ggplot(ggplot2::aes(x=x,y=y)) +
ggplot2::geom_point()
print(p)
fname <- paste("test","_", i, ".png", sep="")
ggsave(p,
filename=fname,
width=22,
height=11,
units="in",
dpi=500)
}
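On the list-versus-"big"-data-frame question, one option is to keep the big data frame and split it by station only when plotting. The sketch below assumes columns named Station, day and DailyMeanStreamflow as in the question; the data values and file names are invented for illustration. Note that ggsave writes the file itself, so no print or dev.off call is needed.

```r
library(ggplot2)

# Assumption: made-up data with the column names used in the question.
big_df <- data.frame(
  Station = rep(c("S1", "S2"), each = 10),
  day = rep(1:10, times = 2),
  DailyMeanStreamflow = c(seq(100, 190, by = 10), seq(50, 95, by = 5))
)

# One data frame per station, named by station
by_station <- split(big_df, big_df$Station)

out_dir <- tempdir()
files <- vapply(names(by_station), function(st) {
  p <- ggplot(by_station[[st]], aes(day, DailyMeanStreamflow)) +
    geom_line() +
    ggtitle(paste("Daily Mean Streamflow, station", st))
  f <- file.path(out_dir, paste0("DailyStreamflow_", st, ".png"))
  ggsave(f, plot = p, width = 6, height = 4, dpi = 100)  # modest size and dpi
  f
}, character(1))
```

A smaller width, height and dpi than 22 x 11 inches at 500 dpi also keeps each PNG far lighter, which may matter if memory is the reason R aborts.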

Create loops to write multiple graphs

I want to use a loop to create multiple plots for different values of PDC. The data I have looks like:
df <- data.frame(c("Results", "Capacity", "Power", "LDI", "LDE", "LB", "PDC", "D", "CostPerkWh"))
As output I would like one graph for each unique value of PDC.
The following plots work:
plot1 <- ggplot(subset(df, df$PDC=='PDC0'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
plot2 <- ggplot(subset(df, df$PDC=='PDC0.25'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
plot3 <- ggplot(subset(df, df$PDC=='PDC0.5'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
plot4 <- ggplot(subset(df, df$PDC=='PDC0.75'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
plot5 <- ggplot(subset(df, df$PDC=='PDC1'),
aes(Capacity, CostPerkWh))+ geom_point()+geom_line()
All these plots work; however, I would like to create a loop since I have a large number of parameters, and I found this example.
So I tried to implement it into my own model:
#plot data
StoreResults <- "/Users/IMA/Documents/Results/"
PDC.graph <- function(df, na.rm = TRUE, ...){
PDClist <- unique(df$PDC)
for (i in seq_along(PDClist)){
plot <-
ggplot(subset(df, df$PDC==PDClist[i]),
aes(Capacity, CostPerkWh)) + geom_point()+
ggtitle(paste(PDClist, 'PDC, Power \n', "Capacity \n", sep='')) +
geom_line()
print(plot)
#save plot as PNG
ggsave(plot, file= paste(StoreResults, '/projection_graphs/PDCgraph/',
PDClist[i], ".png", sep=''), scale=2)
}
}
The code does not give me an error message, but I don't see any graphs and nothing gets stored in the folder that is defined. How can I resolve this? Or is there a better way to export many graphs for different values of PDC?
Did you forget to call the function you created?
This minimal version works for me:
df = iris
StoreResults <- "/Users/timfaber/Desktop"
PDC.graph <- function(df, na.rm = TRUE, ...){
PDClist <- unique(df$Species)
for (i in seq_along(PDClist)){
ggplot(subset(df, df$Species==PDClist[i]),
aes(Sepal.Length, Sepal.Width)) + geom_point() +
ggtitle(paste(PDClist[i], 'PDC, Power \n', "Capacity \n", sep=''))
#save plot as PNG
ggsave(plot = last_plot(), filename = paste(StoreResults, '/etc/',
PDClist[i], ".png", sep=''), scale=2)
}
}
PDC.graph(df)
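Two things commonly make "nothing gets stored" happen: the function is defined but never called, or the target folder does not exist. A hedged sketch that covers both follows; save_plots_by_group is a hypothetical helper name, and iris with a temporary directory stands in for the real data and path.

```r
library(ggplot2)

# Hypothetical helper: one PNG per group, returns the file paths it wrote.
save_plots_by_group <- function(df, group_col, out_dir) {
  # Create the output folder if it does not exist yet
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  groups <- unique(as.character(df[[group_col]]))
  files <- character(0)
  for (g in groups) {
    p <- ggplot(df[df[[group_col]] == g, ], aes(Sepal.Length, Sepal.Width)) +
      geom_point() +
      ggtitle(g)
    f <- file.path(out_dir, paste0(g, ".png"))
    ggsave(filename = f, plot = p, width = 5, height = 4, dpi = 100)
    files <- c(files, f)
  }
  invisible(files)
}

# Defining the function does nothing by itself; it has to be called.
saved <- save_plots_by_group(iris, "Species", file.path(tempdir(), "PDCgraph"))
```

Returning the file paths makes it easy to check afterwards which files were actually written.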

Create the same plot for various data.frames

I have three different data.frames (GRCYPT_flows, ESIEIT_flows, GRCYPT_flows) which contain the same variables (report_ctry, partner_ctry, indicator, year, value), but with different levels/observations. Now I want to create plots for each of those data.frames. Since the plots are supposed to look the same, it seems reasonable to use an iterative command. I tried the foreach loop:
foreach(i=GRCYPT_flows, ESIEIT_flows, GRCYPT_flows) %do% { ggplot(i, aes(year, value)) +
geom_line(aes(colour=partner_ctry, linetype=indicator)) + facet_wrap(~report_ctry) +
theme(axis.text.x=element_text(angle=90, vjust=0.5)) +
scale_x_continuous(breaks=seq(2002, 2012, 2), name="") +
scale_y_continuous(name="Billion Euros") +
scale_colour_discrete(breaks=c("EA17", "ROW_NON_EA17"), labels=c("EA17", "Extra-EA17")) +
scale_linetype_discrete(breaks=c("EA17", "ROW_NON_EA17"), labels=c("Trade", "Capital")) +
theme(legend.title=element_blank())}
The code, as it is, does not work. I face two problems here:
Assign a data.frame to an iteration variable.
Tell the foreach loop to save each iteration to a different list with a distinct name (plot1, plot2, plot3, etc.).
I'm relatively sure this is quite easy to solve if you have some experience with R. I'm a total greenhorn, however, so I really don't know where to start (I could easily do it with Stata, with which I have at least some experience).
What I want to do is tell R: "Make a plot for each of these data.frames and save each of it in an individual list."
I would suggest separating the plotting code from the loop, that way you can test it on one example and then run it for the batch easily. And you probably want to save the batch to files.
library(tidyverse)
myplot <- function(df, filename = NULL) {
df %>%
ggplot(aes(Sepal.Length, Petal.Length)) +
geom_point() ->
result
if(!is.null(filename)) ggsave(filename, plot = result, width = 6, height = 4)
else result
}
# test the plot
myplot(iris)
# do the batch
l <- list(one = iris, two = iris)
l %>% names %>% walk(function(n) myplot(l[[n]], paste0(n, ".pdf")))
Here's an example with three data.frames of iris, which I've named i1, i2 and i3 for simplicity's sake.
i2 <- i3 <- i1 <- iris
foreach(m = 1:3) %do% {
dat <- paste0("i" , m) %>% get
ggplot(dat, aes(Sepal.Length, Petal.Length)) + geom_line()
}
Basically the trick is to call for the specific data.frame with get. In your case, this should work:
data.names <- c("GRCYPT_flows", "ESIEIT_flows", "GRCYPT_flows")
foreach(i=1:length(data.names)) %do% {
dat <- get(data.names[i])
ggplot(dat, aes(year, value)) +
geom_line(aes(colour=partner_ctry, linetype=indicator)) +
facet_wrap(~report_ctry) +
theme(axis.text.x=element_text(angle=90, vjust=0.5)) +
scale_x_continuous(breaks=seq(2002, 2012, 2), name="") +
scale_y_continuous(name="Billion Euros") +
scale_colour_discrete(breaks=c("EA17", "ROW_NON_EA17"),
labels=c("EA17", "Extra-EA17")) +
scale_linetype_discrete(breaks=c("EA17", "ROW_NON_EA17"),
labels=c("Trade", "Capital")) +
theme(legend.title=element_blank())
}
I think the most "R"-y solution here would be lapply. Lapply takes a vector of things and does the same thing to all of them, then stores the outputs as a list. Since you're using ggplot, you may like a neatly organized list of all the similar plots.
First organize your data frames together in a list
my_data <- list(GRCYPT_flows, ESIEIT_flows)
Two of your "three" data frames have exactly the same name. I'm going to assume you actually meant two, but this would work with any number of data frames.
my_plots = lapply(my_data, function(i) {
ggplot(i, aes(year, value))
})
This takes each element of the list ("i") and applies the custom function to it, where the custom function builds your elaborate plot.
Since you're using ggplot, you can store these plots as objects, so my_plots will be a neat list with all your plots.
So, with your full plot function, try:
my_plots <- lapply(my_data, function(i) {
ggplot(i, aes(year, value)) +
geom_line(aes(colour=partner_ctry, linetype=indicator)) + facet_wrap(~report_ctry) +
theme(axis.text.x=element_text(angle=90, vjust=0.5)) +
scale_x_continuous(breaks=seq(2002, 2012, 2), name="") +
scale_y_continuous(name="Billion Euros") +
scale_colour_discrete(breaks=c("EA17", "ROW_NON_EA17"), labels=c("EA17", "Extra-EA17")) +
scale_linetype_discrete(breaks=c("EA17", "ROW_NON_EA17"), labels=c("Trade", "Capital")) +
theme(legend.title=element_blank())
})
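To get the distinct names the question asks for (plot1, plot2, ...), the resulting list can simply be named. A small sketch, assuming two slices of iris as stand-ins for the real data frames:

```r
library(ggplot2)

# Assumption: two slices of iris stand in for the *_flows data frames.
my_data <- list(head(iris, 75), tail(iris, 75))

my_plots <- lapply(my_data, function(d) {
  ggplot(d, aes(Sepal.Length, Petal.Length)) + geom_line()
})

# Give every element a distinct name: plot1, plot2, ...
names(my_plots) <- paste0("plot", seq_along(my_plots))
```

After this, typing my_plots$plot1 at the console draws the first plot, and my_plots[["plot2"]] retrieves the second one.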

grid.arrange plotting same graphic for all plots in list

I am running R version 3.1.1. in RStudio and am having difficulties with grid.arrange.
I am reading sales data for over 100 stores and plotting the sales over time. I am also trying to group the plots to display a set number at a time.
However, after many different attempts, the command
do.call("grid.arrange", c(plotlist, ncol=nCol))
results in all plots in the plot matrix being identical. Further, plotting individual items from the plotlist demonstrates that all plots are stored as identical elements in the plotlist as well.
Here is code that duplicates the issue:
require(ggplot2)
require(gridExtra)
wss <- data.frame(ind=1:10, dep=(1:10))
plotlist <- list()
for (i in 1:4) {
wss <- data.frame(ind=wss$ind, dep=(i*wss$dep))
plotname <- paste0("Store", i, sep="")
plotlist[[ plotname ]] <- ggplot(data=wss, aes(x=wss$ind, y=wss$dep)) + geom_line() + theme_bw() + ggtitle(paste0("Store #",i, sep="")) + ylab("Sales Revenue") + theme(axis.title.x=element_blank())
}
n <- length(plotlist)
nCol <- floor(sqrt(n))
do.call("grid.arrange", c(plotlist, ncol=nCol))
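I had the same issue; using lapply instead of the for loop fixed the problem, as long as each plot gets its own data frame and aes refers to its columns rather than to wss$ind and wss$dep. ggplot evaluates aes lazily, so by the time the plots are drawn, wss$dep in every plot refers to whatever wss holds after the last iteration.
pList <- lapply(1:4, function(i){
# each plot gets its own data frame; aes() refers to its columns
dat <- data.frame(ind=wss$ind, dep=(i*wss$dep))
ggplot(data=dat, aes(x=ind, y=dep)) + geom_line() + theme_bw() + ggtitle(paste0("Store #",i)) + ylab("Sales Revenue") + theme(axis.title.x=element_blank())
})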
# print 1 by 1
for (i in 1:4){
print(pList[[i]])
}
# print all
g <- grid.arrange(grobs=pList, ncol=2, nrow=2)
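A plain for loop can also work, provided aes refers to column names of the data frame passed to ggplot rather than to a global object. A minimal sketch under that assumption; base_sales is a made-up stand-in for the store data:

```r
library(ggplot2)

# Assumption: base_sales is invented data standing in for the real sales.
base_sales <- data.frame(ind = 1:10, dep = 1:10)

plotlist <- list()
for (i in 1:4) {
  # A fresh data frame per iteration, stored inside the plot object
  dat <- data.frame(ind = base_sales$ind, dep = i * base_sales$dep)
  plotlist[[paste0("Store", i)]] <-
    ggplot(dat, aes(x = ind, y = dep)) +   # column names, not dat$ind / dat$dep
    geom_line() +
    ggtitle(paste0("Store #", i))
}
```

Because ind and dep are looked up in each plot's own data, every element of plotlist keeps the values from its own iteration.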

statements not getting executed within a function, executing independently

I have run into a strange issue (I am new to R). I have tried creating a function as follows:
library(ggplot2)
median_confidence_interval <- function(x) {
quart_list<-c()
return_data<-data.frame(lower_ci=0,median=0,upper_ci=0)
for(i in 1:1000){
y<-x[as.integer(runif(length(x), min = 1, max = length(x) + 1))]
median<-median(y)
quart_list=c(quart_list,median)
}
return_data$median<-median(quart_list)
return_data$lower_ci<-quantile(quart_list,probs=0.025)
return_data$upper_ci<-quantile(quart_list,probs=0.975)
p <- ggplot()
p <- p + geom_density(aes(x=x)) + geom_density(aes(x=quart_list))
p <- p + geom_vline(aes(xintercept = return_data$median, color='red'))
p <- p + geom_vline(aes(xintercept = return_data$lower_ci, color='blue'))
p <- p + geom_vline(aes(xintercept = return_data$upper_ci, color='green')) + coord_cartesian(xlim = c(min(x),max(x)))
png("density_confidence_internal.png")
plot(p)
dev.off()
return_data
}
In this code I am simply trying to create a plot and save it. I am able to execute each of these statements independently outside the function, but not inside it. The function is defined without errors, but when I run it, it says 'quart_list' not found.
If quart_list and return_data are present in the workspace, then I am able to execute the function and get the result. When I clear the workspace and execute the function, I run into the same error while running (not compiling).
Another issue is that when I call the function median_confidence_interval(x), it expects me to only provide 'x' as the argument, it doesn't take something like median_confidence_interval(possum$earconch). Why could that be?
Would someone please be able to point me in some direction?
The environment evaluation of ggplot objects is a little mysterious. However, if you remember that ggplot wants data.frames to be passed to the data argument and the values in aes should be columns of the data.frame you'll generally avoid issues.
To debug things like this, I find it helpful to insert print statements into the function to sort out how far through it I get. (see the commented lines)
Adjusting your function accordingly gives:
median_confidence_interval <- function(x) {
quart_list<-c()
return_data<-data.frame(lower_ci=0,median=0,upper_ci=0)
for(i in 1:1000){
y<-x[as.integer(runif(length(x), min = 1, max = length(x) + 1))]
median<-median(y)
# print ('inside for loop')
quart_list=c(quart_list,median)
}
# print('past for loop')
return_data$median<-median(quart_list)
return_data$lower_ci<-quantile(quart_list,probs=0.025)
return_data$upper_ci<-quantile(quart_list,probs=0.975)
# print('start of ggplot code')
# quart_list (1000 values) and x usually differ in length, so keep them in separate data frames
boot_df=data.frame(q=quart_list)
obs_df=data.frame(x=x)
p <- ggplot()
p <- p + geom_density(data=obs_df, aes(x=x)) + geom_density(data=boot_df, aes(x=q))
# print('past first quart_list reference in ggplot')
# colours are set outside aes() so that they are used literally
p <- p + geom_vline(data=return_data, aes(xintercept = median), color='red')
p <- p + geom_vline(data=return_data, aes(xintercept = lower_ci), color='blue')
p <- p + geom_vline(data=return_data, aes(xintercept = upper_ci), color='green') + coord_cartesian(xlim = c(min(x), max(x)))
png("/tmp/density_confidence_internal.png")
plot(p)
dev.off()
return_data
}
Also, I think @DWin has a good point in his comment!
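For comparison, here is a more compact sketch of the same idea, where every layer receives its own data frame so that nothing depends on objects in the workspace. bootstrap_median_ci, its arguments and the output file name are hypothetical, and sample with replace = TRUE is swapped in for the runif-based index draw:

```r
library(ggplot2)

# Hypothetical helper: bootstrap confidence interval for the median,
# optionally saving a density plot of the data and the bootstrap medians.
bootstrap_median_ci <- function(x, n_boot = 1000, file = NULL) {
  boot_medians <- replicate(n_boot, median(sample(x, replace = TRUE)))
  ci <- data.frame(
    lower_ci = unname(quantile(boot_medians, probs = 0.025)),
    median   = median(boot_medians),
    upper_ci = unname(quantile(boot_medians, probs = 0.975))
  )
  # One row per vertical line, so colour can be mapped to a real column
  lines_df <- data.frame(
    pos  = c(ci$lower_ci, ci$median, ci$upper_ci),
    what = c("lower_ci", "median", "upper_ci")
  )
  p <- ggplot() +
    geom_density(data = data.frame(value = x), aes(x = value)) +
    geom_density(data = data.frame(value = boot_medians), aes(x = value),
                 linetype = "dashed") +
    geom_vline(data = lines_df, aes(xintercept = pos, colour = what)) +
    coord_cartesian(xlim = range(x))
  if (!is.null(file)) ggsave(file, plot = p, width = 6, height = 4, dpi = 100)
  ci
}

set.seed(1)
plot_file <- file.path(tempdir(), "density_confidence_interval.png")
res <- bootstrap_median_ci(iris$Sepal.Length, file = plot_file)
```

Since the function only uses its argument x and data frames built inside it, a call such as bootstrap_median_ci(possum$earconch) should behave the same as passing a variable named x.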
