I'm writing code that runs long calculations over many time steps and plots the results step by step.
This is R 3.6.1 under Windows 7, RStudio 1.1.383.
I'm working in RStudio, and trying to do the plots with ggplot2. The overall structure of the code is something like
for (step in 1:nb_steps){
... do big calculations
m<-ggplot(.... the data...) + ... some options
print(m)
}
You will note that I did assign the result of ggplot() to a variable, and I did explicitly print() it, as suggested in many related posts, here as well as on the RStudio web site.
In my (actual) example, the result is that the loop takes about 2-3 seconds for each iteration (the "big calculation" part). The (gg)graph is flashed for an instant, then disappears and the plot window blanks out, as far as I can tell shortly after the print() statement.
If I use a "regular" plot (in this case an image()), the code works as intended, i.e. the plot stays visible until it is over-plotted by something else.
Now my actual code is a bit long, so I tried to design a minimal reproducible example. This is what I came up with, for a result that is similar to the "main" example.
library(ggplot2)
data(mpg)
for(i in 1:10){
cat(i); cat("\n")
for (j in 1:100000){j*j} # Do something time-consuming
m<-ggplot(mpg, aes_(~ displ, ~ hwy, colour = ~trans)) +
geom_point() + ggtitle(paste("graph number",i))
print(m)
}
This gives the same result, i.e. the graph is briefly shown, then disappears, the window stays blank for a moment before the new graph comes in. It is a bit hard to keep an eye simultaneously on the console and the plot (!), but my impression is that the actual plot building (the ggplot() command) is somewhat time-consuming, and starts by blanking the window, then creates the plot, which is then drawn at the very end. Thus, I see a blank window for all the time it takes to run the ggplot() command itself. In my actual code, the ggplot() is more complex (it is a geom_raster() of a 50*50 matrix) so the delay is longer, so much in fact that I have more often a blank window than a plot !
Of course, I could add a Sys.sleep() at the end. I'd see the graph for a longer time, but the blank periods do not seem to get any shorter, and obviously this would make the run time longer, which is not what I want in the real-life case.
What I would like instead would be that the plot window should stay as it is until the print() statement. This would give the illusion of one plot replacing the previous one, without the interruption.
Any way of doing so ?
Thanks !
One suggestion for you might be to save your ggplot object into a grob (grid graphical object) and then print the grob.
Doing it in this way - using your sample code on my laptop - shortens the time of blank periods by half.
library(ggplot2)
library(grid)
data(mpg)
for(i in 1:10){
cat(i); cat("\n")
for (j in 1:100000){j*j} # Do something time-consuming
m<-ggplot(mpg, aes_(~ displ, ~ hwy, colour = ~trans)) +
geom_point() + ggtitle(paste("graph number",i))
grid.draw(ggplotGrob(m))
}
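To see where the time goes, the build step and the draw step can be separated and timed. This is only a sketch under the assumption that print() clears the device before it builds the plot, so that most of the blank period is build time; the names build_time and draw_time are made up for illustration. Building the grob first and clearing the device only right before drawing should keep the previous plot on screen during the build.

```r
library(ggplot2)
library(grid)

for (i in 1:3) {
  m <- ggplot(mpg, aes(displ, hwy, colour = trans)) +
    geom_point() + ggtitle(paste("graph number", i))

  # Expensive part: turn the plot specification into a grob.
  # Nothing is drawn yet, so the device is left untouched.
  build_time <- system.time(g <- ggplotGrob(m))[["elapsed"]]

  # Cheap part: clear the page and draw the finished grob.
  draw_time <- system.time({
    grid.newpage()
    grid.draw(g)
  })[["elapsed"]]

  cat("step", i, "build:", build_time, "s, draw:", draw_time, "s\n")
}
```

If the build time dominates, the blank interval should shrink to roughly the draw time.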
For completeness, I will add the default, which is to use base graphics. If there is no need for ggplot graphics and the plots are only being used for diagnostics, the base drawing package can whip out a graph very quickly. When I run MCMC, I will typically use base graphics for diagnostics, then ggplot2 for the fancy final stuff.
#base way
start <- Sys.time()
trans <- factor(mpg$trans) # mpg comes from ggplot2
for(i in 1:10){
cat(i); cat("\n")
for (j in 1:100000){j*j} # Do something time-consuming
plot(mpg[["displ"]], mpg[["hwy"]], col = trans,
main = paste("Graph Number", i))
# one legend entry per level, filled with the colour code plot() uses for that level
legend("topleft", fill = seq_along(levels(trans)), legend = levels(trans),
ncol = 4)
}
end_time <- Sys.time()
end_time - start
It draws each graph very quickly and leaves it visible until the next one replaces it.
I was experimenting with the waffle package in R, and was trying to use a for loop to make multiple plots at once, but was not able to get my code to work. I have a dataset with values for each year of renewables, and since it covers over 40 years of data, I was looking for a simple way to plot these with a for loop rather than manually year by year. What am I doing wrong?
I have it from 1:16 as an experiment to see if it would work, although in reality I would do it for all the years in my dataset.
for(i in 1:16){
renperc<-islren$Value[i]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"),
title="Iceland Primary Energy Supply",
xlab=islren$TIME)
}
If I understand your question correctly, you want to plot all 16 iterations in the same panel? You can divide the plot window into 16 smaller plots using par(mfrow = c(4,4)) (this creates a 4 by 4 matrix that successive plots fill cell by cell, row-wise).
## Setting the graphical parameters
par(mfrow = c(4,4))
## Running the loop normally
for(i in 1:16){
renperc<-islren$Value[i]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"),
title="Iceland Primary Energy Supply",
xlab=islren$TIME)
}
If you need more plots (e.g. 40) you can increase the numbers in the graphical parameters (e.g. par(mfrow = c(6,7))) but that will create really tiny plots. One solution is to do it in multiple loops (for(i in 1:16); for(i in 17:32); etc.)
UPDATE: The code simply wasn't plotting anything when I tried putting in anything above one value (e.g. 1:16) or a letter, either as separate plots or as many plots in one plot window (which I think perhaps waffle does not support in the same way as regular plots). In the end, I managed by making it into a function, although I'm still not sure why my original method wouldn't work if this did. See the code that worked below. I also tweaked it a bit, adding ggsave for example.
#function
waffling <- function(x){
renperc<-islren$Value[x]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"), title="",
xlab=islren$TIME[x])
ggsave(file=paste0("plot_", x,".png"))}
for(i in 1:57){
waffling(i)
}
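A likely reason the original loop showed nothing: as far as I can tell, waffle() returns a ggplot object, and inside a for loop (or a function) ggplot objects are not auto-printed, so they need an explicit print(). The sketch below shows the pattern with plain ggplot2 and made-up data (the shares data frame and the output folder are assumptions, not the original islren data); the same print() and ggsave(plot = ...) calls should apply to a waffle chart.

```r
library(ggplot2)

# Hypothetical stand-in for the islren data: one renewable share per year.
shares <- data.frame(year = 2001:2003, renewable = c(70, 72, 75))
out_dir <- tempdir()
saved <- character(0)

for (i in seq_len(nrow(shares))) {
  parts <- data.frame(
    source  = c("Renewable", "Non-Renewable"),
    percent = c(shares$renewable[i], 100 - shares$renewable[i])
  )
  p <- ggplot(parts, aes(x = source, y = percent, fill = source)) +
    geom_col() +
    labs(title = "Primary Energy Supply", x = as.character(shares$year[i]))

  print(p)  # explicit print: auto-printing does not happen inside a loop

  # Pass the plot explicitly rather than relying on the last plot created.
  outfile <- file.path(out_dir, paste0("plot_", shares$year[i], ".png"))
  ggsave(outfile, plot = p, width = 4, height = 3)
  saved <- c(saved, outfile)
}
```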
I've been trying to draw two plots using R's ggplot library in RStudio. Problem is, when I draw two within one function, only the last one displays (in RStudio's "plots" view) and the first one disappears. Even worse, when I run ggsave() after each plot - which saves them to a file - neither of them appear (but the files save as expected). However, I want to view what I've saved in the plots as I was able to before.
Is there a way I can both display what I'll be plotting in RStudio's plots view and also save them? Moreover, when the plots are not being saved, why does the display problem happen when there's more than one plot? (i.e. why does it show the last one but not the ones before?)
The code with the plotting parts are below. I've removed some parts because they seem unnecessary (but can add them if they are indeed relevant).
HHIplot = ggplot(pergame)
# some ggplot geoms and misc. here
ggsave(paste("HHI Index of all games,",year,"Finals.png"),
path = plotpath, width = 6, height = 4)
HHIAvePlot = ggplot(AveHHI, aes(x = AveHHI$n_brokers))
# some ggplot geoms and misc. here
ggsave(paste("Average HHI Index of all games,",year,"Finals.png"),
path = plotpath, width = 6, height = 4)
I've already taken a look here and here but neither have helped. Adding a print(HHIplot) or print(HHIAvePlot) after the ggsave() lines has not displayed the plot.
Many thanks in advance.
Update 1: The solution suggested below didn't work, although it works for the answer's sample code. I passed the ggplot objects to .GlobalEnv and print() gives me an empty gray box in the plot area (which I imagine is an empty ggplot object with no layers). I think the issue might lie in some of the layers or manipulators I have used, so I've included the full code for one ggplot object below. Any thoughts? (Note: I've tried putting the assign() line in all possible locations in relation to ggsave() and ggplot().)
HHIplot = ggplot(pergame)
HHIplot +
geom_point(aes(x = pergame$n_brokers, y = pergame$HHI)) +
scale_y_continuous(limits = c(0,10000)) +
scale_x_discrete(breaks = gameSizes) +
labs(title = paste("HHI Index of all games,",year,"Finals"),
x = "Game Size", y = "Herfindahl-Hirschman Index") +
theme(text = element_text(size=15),axis.text.x = element_text(angle = 0, hjust = 1))
assign("HHIplot",HHIplot, envir = .GlobalEnv)
ggsave(paste("HHI Index of all games,",year,"Finals.png"),
path = plotpath, width = 6, height = 4)
I'll preface this by saying that the following is bad practice. It's considered bad practice to break a programming language's scoping rules for something as trivial as this, but here's how it's done anyway.
So within the body of your function you'll create both plots and put them into variables. Then you'll use ggsave() to write them out. Finally, you'll use assign() to push the variables to the global scope.
library(ggplot2)
myFun <- function() {
#some sample data that you should be passing into the function via arguments
df <- data.frame(x=1:10, y1=1:10, y2=10:1)
p1 <- ggplot(df, aes(x=x, y=y1))+geom_point()
p2 <- ggplot(df, aes(x=x, y=y2))+geom_point()
ggsave('p1.jpg', p1)
ggsave('p2.jpg', p2)
assign('p1', p1, envir=.GlobalEnv)
assign('p2', p2, envir=.GlobalEnv)
return()
}
Now, when you run myFun() it will write out your two plots to .jpg files, and also drop the plots into your global environment so that you can just run p1 or p2 on the console and they'll appear in RStudio's Plot pane.
ONCE AGAIN, THIS IS BAD PRACTICE
Good practice would be to not worry about the fact that they're not popping up in RStudio. They wrote out to files, and you know they did, so go look at them there.
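Regarding the empty gray box reported in Update 1: in that code the result of HHIplot + geom_point(...) + ... is never assigned back, so HHIplot still holds the bare ggplot(pergame) with no layers. Inside a function the unassigned expression is not auto-printed either, while ggsave() without a plot argument saves the last plot created, which would explain why the files look right. A minimal sketch with made-up data (df and the file name are assumptions, not the original pergame data):

```r
library(ggplot2)

# Hypothetical data standing in for `pergame`.
df <- data.frame(n_brokers = c(2, 4, 6, 8),
                 HHI = c(6000, 4200, 3100, 2500))

p <- ggplot(df)

# Assign the result of `+` back; otherwise `p` stays an empty plot.
p <- p +
  geom_point(aes(x = n_brokers, y = HHI)) +
  scale_y_continuous(limits = c(0, 10000)) +
  labs(x = "Game Size", y = "Herfindahl-Hirschman Index")

print(p)  # shows the plot, also when called from inside a function

# Save the same object explicitly instead of relying on the last plot.
outfile <- file.path(tempdir(), "hhi.png")
ggsave(outfile, plot = p, width = 6, height = 4)
```

With the plot object returned from the function (or printed inside it), there is no need for assign() into the global environment.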
I'm attempting to step through a dataset and create a histogram and summary table for each factor and save the output as a .svg . The histogram is created using ggplot2 and the summary table using summary().
I have successfully used the code below to save the output to a single .pdf with each page containing the relevant histogram/table. However, when I attempt to save each histogram/table combo into a set of .svg images using ggsave only the ggplot histogram is showing up in the .svg. The table is just white space.
I've tried using dev.copy, Cairo, and svg, but all end up with the same result: the histogram renders, but the table does not. If I save the image as a .png the table shows up.
I'm using the iris data as a reproducible dataset. I'm not using RStudio, which I saw was causing some "empty plot" grief for others.
#packages used
library(ggplot2)
library(gridExtra)
library(gtable)
library(Cairo)
#Create iris histogram plot
iris.hp<-ggplot(data=iris, aes(x=Sepal.Length)) +
geom_histogram(binwidth =.25,origin=-0.125,
right = TRUE,col="white", fill="steelblue4",alpha=1) +
labs(title = "Iris Sepal Length")+
labs(x="Sepal Length", y="Count")
iris.list<-by(data = iris, INDICES = iris$Species, simplify = TRUE,FUN = function(x)
{iris.hp %+% x + ggtitle(unique(x$Species))})
#Generate list of data to create summary statistics table
sum.str<-aggregate(Sepal.Length~Species,iris,summary)
spec<-sum.str[,1]
spec.stats<-sum.str[,2]
sum.data<-data.frame(spec,spec.stats)
sum.table<-tableGrob(sum.data)
colnames(sum.data) <-c("species","sep.len.min","sep.len.1stQ","sep.len.med",
"sep.len.mean","sep. len.3rdQ","sep.len.max")
table.list<-by(data = sum.data, INDICES = sum.data$"species", simplify = TRUE,
FUN = function(x) {tableGrob(x)})
#Combined histogram and summary table across multiple plots
multi.plots<-marrangeGrob(grobs=(c(rbind(iris.list,table.list))),
nrow=2, ncol=1, top = quote(paste(iris$labels$Species,'\nPage', g, 'of',pages)))
#bypass the class check per @baptiste
ggsave <- ggplot2::ggsave; body(ggsave) <- body(ggplot2::ggsave)[-2]
#
for(i in 1:3){
multi.plots<-marrangeGrob(grobs=(c(rbind(iris.list[i],table.list[i]))),
nrow=2, ncol=1,heights=c(1.65,.35),
top = quote(paste(iris$labels$Species,'\nPage', g, 'of',pages)))
prefix<-unique(iris$Species)
prefix<-prefix[i]
filename<-paste(prefix,".svg",sep="")
ggsave(filename,multi.plots)
#dev.off()
}
Edit: removed theme tt3 that @rawr referenced. It was accidentally left in the example code. It was not causing the problem, just in case anyone was curious.
Edit: Removing previous answer regarding it working under 32bit install and not x64 install because that was not the problem. Still unsure what was causing the issue, but it is working now. Leaving the info about grid.export as it may be a useful alternative for someone else.
Below is the loop for saving the .svg's using grid.export(), although I was having some text formatting issues with this (different dataset).
for(i in 1:3){
multi.plots<-marrangeGrob(grobs=(c(rbind(iris.list[i],table.list[i]))),
nrow=2, ncol=1,heights=c(1.65,.35), top =quote(paste(iris$labels$Species,'\nPage', g,
'of',pages)))
prefix<-unique(iris$Species)
prefix<-prefix[i]
filename<-paste(prefix,".svg",sep="")
grid.draw(multi.plots)
grid.export(filename)
grid.newpage()
}
EDIT: As for using arrangeGrob per @baptiste's comment, below is the updated code. I was incorrectly using single brackets [] on the list returned by by(), so I switched to the correct double brackets [[]] and used grid.draw in the ggsave call.
for(i in 1:3){
prefix<-unique(iris$Species)
prefix<-prefix[i]
multi.plots<-grid.arrange(arrangeGrob(iris.list[[i]],table.list[[i]],
nrow=2,ncol=1,top = quote(paste(iris$labels$Species))))
filename<-paste(prefix,".svg",sep="")
ggsave(filename,grid.draw(multi.plots))
}
I have 16 plots that I want to arrange in a grid with ggplot2. But whenever I plot, I get a grid in which all the plots are the same, i.e. the last plot saved in the list is drawn at all 16 places in the grid. To replicate the issue, I am providing a simple example with two files. Although the data are entirely different, the plots drawn are identical.
library(ggplot2)
library(grid)
library(gridExtra)
library(scales)
set.seed(1006)
date1<- as.POSIXct(seq(from=1443709107,by=3600,to=1446214707),origin="1970-01-01")
power <- rnorm(length(date1),100,5)#with normal distribution
write.csv(data.frame(date1,power),"file1.csv",row.names = FALSE,quote = FALSE)
# Now another dataset with uniform distribution
write.csv(data.frame(date1,power=runif(length(date1))),"file2.csv",row.names = FALSE,quote = FALSE)
path=getwd()
files=list.files(path,pattern="*.csv")
plist<-list()# for saving intermediate ggplots
for(i in 1:length(files))
{
dframe<-read.csv(paste(path,"/",files[i],sep = ""),head=TRUE,sep=",")
dframe$date1= as.POSIXct(dframe$date1)
plist[[i]]<- ggplot(dframe)+aes(dframe$date1,dframe$power)+geom_line()
}
grid.arrange(plist[[1]],plist[[2]],ncol = 1,nrow=2)
You need to remove the dframe from your call to aes. You should do that anyway because you have provided a data-argument. In this case it's even more important because while you save the ggplot-object, things don't get evaluated until the call to plot/grid.arrange. When you do that, it looks at the current value of dframe, which is the last dataset in your iteration.
You need to plot with:
ggplot(dframe)+aes(date1,power)+geom_line()
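To make that concrete, here is a self-contained sketch of the corrected loop. It uses two in-memory data frames instead of the CSV files (the list name frames is made up for illustration). Because aes() refers only to column names, each plot keeps its own data no matter what dframe holds later.

```r
library(ggplot2)

set.seed(1006)
date1 <- as.POSIXct(seq(from = 1443709107, by = 3600, length.out = 48),
                    origin = "1970-01-01", tz = "UTC")

# Two data sets with very different distributions.
frames <- list(
  normal  = data.frame(date1, power = rnorm(48, 100, 5)),
  uniform = data.frame(date1, power = runif(48))
)

# Bare column names in aes(): they are looked up in each plot's own data.
plist <- lapply(frames, function(dframe) {
  ggplot(dframe, aes(date1, power)) + geom_line()
})

# The y values that each plot will actually draw now differ.
range(layer_data(plist[[1]])$y)
range(layer_data(plist[[2]])$y)
```

The resulting list can be passed to grid.arrange() as in the question, e.g. grid.arrange(grobs = plist, ncol = 1).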
I have a for loop which produces 60 plots. I would like to save all these plots in a single file.
If I set par(mfrow=c(10,6)) it says : Error in plot.new() : figure margins too large
What can I do?
My code is as follows:
pdf(file="figure.pdf")
par(mfrow=c(10,6))
for(i in 1:60){
x=rnorm(100)
y=rnorm(100)
plot(x,y)
}
dev.off()
Your default plot, as stated in the loop, does not use the space very effectively. If you look at just a single plot, you can see it has large margins, both between axis and edge and plot area and axis text. Effectively, there is a lot of space-hogging.
Secondly, the default pdf-function creates small pages, 7 by 7 inches. That is not a large sheet to plot on.
Trying to plot a 10 x 6 or 12 x 5 on 7 by 7 inches is therefore trying to squeeze in a lot of whitespace on very little space.
For it to succeed, you must look into the margin options of par, which are mar, mai, oma and omi, and probably some more. Consult the documentation with the command
?par
In addition to this, you could consider not displaying axis-text, tick-marks, tick-labels and titles for every one of the 60 sub-plots, as this too will save you space.
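As a rough sketch of both suggestions (the page size and margin values below are guesses to tune, not recommended settings), a larger PDF page combined with small margins and suppressed axes lets the 10 x 6 layout fit:

```r
out <- tempfile(fileext = ".pdf")

# A taller, wider page than the 7 x 7 inch default.
pdf(file = out, width = 12, height = 20)

# 10 rows x 6 columns; `mar` shrinks the per-plot margins (in lines),
# `oma` leaves a small outer margin around the whole page.
par(mfrow = c(10, 6), mar = c(1, 1, 0.5, 0.5), oma = c(2, 2, 1, 1))

for (i in 1:60) {
  x <- rnorm(100)
  y <- rnorm(100)
  # No axes or labels on the individual panels, just a frame.
  plot(x, y, axes = FALSE, xlab = "", ylab = "")
  box()
}

dev.off()
```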
But somebody has already gone through some of this trouble for you. Look into the lattice-package or ggplot2, which has some excellent methods for making table-like subplots.
But there is another pressing issue: What are you trying to display with 60 subplots?
Update
Seeing what you are trying to do, here is a small example of faceting in ggplot2. It uses the Tufte-theme from jrnold's ggthemes, which is copied into here and then modified slightly in the line after the function.
library(ggplot2)
library(scales)
#### Setup the `theme` for the plot, i.e. the appearance of background, lines, margins, etc. of the plot.
## This function returns a theme-object, which ggplot2 uses to control the appearance.
theme_tufte <- function(ticks=TRUE, base_family="serif", base_size=11) {
ret <- theme_bw(base_family=base_family, base_size=base_size) +
theme(
legend.background = element_blank(),
legend.key = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
strip.background = element_blank(),
plot.background = element_blank(),
axis.line = element_blank(),
panel.grid = element_blank())
if (!ticks) {
ret <- ret + theme(axis.ticks = element_blank())
}
ret
}
## Here I modify the theme returned from the function,
theme <- theme_tufte() + theme(panel.spacing=unit(0, 'lines'), panel.border=element_rect(colour='grey', fill=NA)) ## panel.spacing replaces the deprecated panel.margin
## and instruct ggplot2 to use this theme as default.
theme_set(theme)
#### Some data generation.
size = 60*30
data <- data.frame(x=runif(size), y=rexp(size)+rnorm(size), mdl=sample(60,size, replace=TRUE))
#### Main plotting routine.
## Note: each `+` must end a line; a line that starts with `+` is parsed as a separate statement.
ggplot(data, aes(x,y, group=mdl)) + ## base state of the plot to be used on all "layers", i.e. which data to use and which mappings to use (x should use the x-variable, y should use the y-variable)
geom_point() + ## a layer that renders data as points, creates the scatterplot
stat_quantile(formula=y~x) + ## another layer that adds some statistics, in this case the 25%, 50% and 75% quantile lines (needs the quantreg package)
facet_wrap(~ mdl, ncol=6) ## Without this, all the groups would be displayed in one large plot; this breaks it up according to the `mdl`-variable.
The usual challenge in using ggplot2 is restructuring all your data into data.frames. For this task, the reshape2 and plyr-packages might be of good use.
For you, I would imagine that your function that creates the subplot both calculates the estimation and creates the plot. This means that you have to split the function into calculating the estimation, returning it to a data.frame, which you then can collate and pass to ggplot.
Output the plots to a pdf:
X = matrix(rnorm(60*100), ncol=60)
Y = matrix(rnorm(60*100), ncol=60)
pdf(file="fileName.pdf")
for(j in 1:60){
plot(X[,j], Y[,j])
}
dev.off()
For placing many plots on a page or document (and I have created images with literally thousands of plots in them), it is convenient to separate the work between R--which creates the plots individually--and other software which is better suited for arranging arrays of things. If this reminds you of spreadsheets or word processing tables, then we are thinking alike.
This page, which is a screenshot from a PDF file, contains over 200 statistical graphics. Although it has been greatly reduced (to 40% nominal size) in order to obscure proprietary data, the original has all the detail of the original R graphics and can be zoomed to 1600% without problem.
Two mechanisms have worked reasonably well. For up to several hundred plots, a little macro to import and re-sequence a set of bitmapped image files (.emf or .wmf) into a Word document does fine. For better control, I turn to a comparable Excel macro. It is driven by a sheet that is empty of everything except a row with column headers and a column with row headers. (You can see them at the left and top of the figure.) The macro deletes everything else on that sheet (except for formatting), then munges each possible combination of row and column header into a file name and if it finds that file, it imports it into the corresponding cell. The whole operation takes just a few seconds for several thousand images.
Obviously this communication mechanism between R and the other software is primitive, consisting of a collection of image files having a standard naming convention. But the code needed to implement it all is brief (albeit customized to each situation) and it works reliably. For example, if you encapsulate the plotting code within a function, then it will be called within a loop to create many similar plots. At the end of that function add a few lines to save the plot to a file, something like this:
path <- "W: <whatever>/" # Folder for the output files
ext <- "wmf" # or "emf" or "png" or ... # Format (and extension) of the output
...
if (save) {
outfile <- paste(path, paste(munge(well), munge(parm), sep="_"), sep="/")
outfile <- paste(outfile, ext, sep=".")
savePlot(filename=outfile, type=ext)
}
In this case each plot is identified by two loop variables, well and parm, both of which are strings (they correspond to the column and row headers). The function for creating acceptable filenames merely strips out punctuation, replacing it by an anodyne placeholder:
munge <- function(s) gsub("[[:punct:]]", "_", s)
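Put together, the naming scheme might look like the sketch below. It is only an illustration: the well and parameter names are made up, the files go to a temporary folder, and png() is used in place of savePlot() so that it runs without an open screen device.

```r
munge <- function(s) gsub("[[:punct:]]", "_", s)

out_dir <- tempdir()            # stand-in for the real output folder
ext <- "png"
wells <- c("MW-1", "MW-2")      # hypothetical column headers
parms <- c("pH", "Cl (mg/L)")   # hypothetical row headers

outfiles <- character(0)
for (well in wells) {
  for (parm in parms) {
    # File name built from the two munged loop variables.
    outfile <- file.path(out_dir,
                         paste0(munge(well), "_", munge(parm), ".", ext))
    png(outfile, width = 480, height = 360)
    plot(rnorm(20), type = "b", main = paste(well, parm))
    dev.off()
    outfiles <- c(outfiles, outfile)
  }
}
basename(outfiles)
```

Note that munge() replaces punctuation only; spaces in the names are kept, which the importing macro has to expect.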
Once those images have been imported into Word, Excel, or wherever you like, it's fairly easy to reorganize them, place other material around them, etc., and then print the result in PDF format.
There is an art to creating these very large "small multiples" (in Tufte's terminology). To the extent possible, it helps to follow Tufte's principle of increasing the data:ink ratio by erasing inessential material. That makes graphical patterns clear even when the tableau has been greatly reduced in size in order to comprehend all its rows and columns at once. Although the preceding figure is a poor example--the individual plots had to have axes, gridlines, labels, and so on so that they can be read in detail when zoomed--the power of this method to reveal patterns is clear even at this scale. It is crucial to make the plots comparable to one another. In this example, which consists of time series, every plot has the same range on the x-axis; within each row (which corresponds to a different type of observation), the ranges on the y-axes are the same; and all color schemes and methods of symbolization are the same throughout.
You could also use knitr. This didn't instantly convert over to base graphics (and I've got to run now), but using ggplot works easily.
\documentclass{article}
\begin{document}
<<echo = FALSE, fig.keep='high', fig.height=3, fig.width=4>>=
require(ggplot2)
for (i in 1:10) print(ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point())
@
\end{document}
The above code will produce a nice multi-page pdf with all the graphs.
For a very simple solution to this type of issue, I found that setting a large "Windows" device manages to make the window big enough for many uses.
windows(50,50)
par(mfrow=c(10,6))
for(i in 1:60){
x=rnorm(100)
y=rnorm(100)
plot(x,y)
}
Or in my case,
windows(20,20)
plot(Plotting_I_Need_In_Rows_of_4, mfrow=c(4,4))