Ggplot does not show plots in sourced function - r

I've been trying to draw two plots using R's ggplot2 library in RStudio. The problem is that when I draw two within one function, only the last one displays (in RStudio's "Plots" pane) and the first one disappears. Even worse, when I run ggsave() after each plot, which saves them to a file, neither of them appears (but the files save as expected). However, I want to view what I've saved in the plots as I was able to before.
Is there a way I can both display what I'll be plotting in RStudio's plots view and also save them? Moreover, when the plots are not being saved, why does the display problem happen when there's more than one plot? (i.e. why does it show the last one but not the ones before?)
The code with the plotting parts is below. I've removed some parts because they seem unnecessary (but can add them if they are indeed relevant).
HHIplot = ggplot(pergame)
# some ggplot geoms and misc. here
ggsave(paste("HHI Index of all games,",year,"Finals.png"),
path = plotpath, width = 6, height = 4)
HHIAvePlot = ggplot(AveHHI, aes(x = AveHHI$n_brokers))
# some ggplot geoms and misc. here
ggsave(paste("Average HHI Index of all games,",year,"Finals.png"),
path = plotpath, width = 6, height = 4)
I've already taken a look here and here but neither has helped. Adding a print(HHIplot) or print(HHIAvePlot) after the ggsave() lines has not displayed the plot.
Many thanks in advance.
Update 1: The solution suggested below didn't work, although it works for the answer's sample code. I passed the ggplot objects to .Globalenv and print() gives me an empty gray box on the plot area (which I imagine is an empty ggplot object with no layers). I think the issue might lie in some of the layers or manipulators I have used, so I've brought the full code for one ggplot object below. Any thoughts? (Note: I've tried putting the assign() line in all possible locations in relation to ggsave() and ggplot().)
HHIplot = ggplot(pergame)
HHIplot +
geom_point(aes(x = pergame$n_brokers, y = pergame$HHI)) +
scale_y_continuous(limits = c(0,10000)) +
scale_x_discrete(breaks = gameSizes) +
labs(title = paste("HHI Index of all games,",year,"Finals"),
x = "Game Size", y = "Herfindahl-Hirschman Index") +
theme(text = element_text(size=15),axis.text.x = element_text(angle = 0, hjust = 1))
assign("HHIplot",HHIplot, envir = .GlobalEnv)
ggsave(paste("HHI Index of all games,",year,"Finals.png"),
path = plotpath, width = 6, height = 4)

I'll preface this by saying that the following is bad practice. It's considered bad practice to break a programming language's scoping rules for something as trivial as this, but here's how it's done anyway.
So within the body of your function you'll create both plots and put them into variables. Then you'll use ggsave() to write them out. Finally, you'll use assign() to push the variables to the global scope.
library(ggplot2)
myFun <- function() {
#some sample data that you should be passing into the function via arguments
df <- data.frame(x=1:10, y1=1:10, y2=10:1)
p1 <- ggplot(df, aes(x=x, y=y1))+geom_point()
p2 <- ggplot(df, aes(x=x, y=y2))+geom_point()
ggsave('p1.jpg', p1)
ggsave('p2.jpg', p2)
assign('p1', p1, envir=.GlobalEnv)
assign('p2', p2, envir=.GlobalEnv)
return()
}
Now, when you run myFun() it will write out your two plots to .jpg files, and also drop the plots into your global environment so that you can just run p1 or p2 on the console and they'll appear in RStudio's Plot pane.
ONCE AGAIN, THIS IS BAD PRACTICE
Good practice would be to not worry about the fact that they're not popping up in RStudio. They wrote out to files, and you know they did, so go look at them there.
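An alternative that avoids assign() is to have the function print the plots itself and hand them back as its return value. The sketch below is only an illustration: make_plots, out_dir and the output file names are made up for the example. It also assigns the layers back to the variable. In the question's update, the result of HHIplot + geom_point(...) is never stored, so HHIplot stays an empty ggplot(pergame) object, which would explain the gray box.

```r
library(ggplot2)

# Hypothetical helper: builds two plots, saves them, shows them, returns them.
make_plots <- function(df, out_dir = tempdir()) {
  # Layers must be assigned back; a bare `p + geom_point()` is discarded.
  p1 <- ggplot(df, aes(x = x, y = y1)) + geom_point()
  p2 <- ggplot(df, aes(x = x, y = y2)) + geom_point()

  # Pass the plot explicitly so ggsave() does not depend on the last plot shown.
  ggsave(file.path(out_dir, "p1.png"), plot = p1, width = 6, height = 4)
  ggsave(file.path(out_dir, "p2.png"), plot = p2, width = 6, height = 4)

  # Inside a function nothing is auto-printed, so print() each plot explicitly.
  print(p1)
  print(p2)

  # Return the plots without printing them a second time.
  invisible(list(p1 = p1, p2 = p2))
}

plots <- make_plots(data.frame(x = 1:10, y1 = 1:10, y2 = 10:1))
```

Both plots end up in the plot history of RStudio's Plots pane, and plots$p1 or plots$p2 can be printed again later from the console.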

Related

RStudio takes a long time to display a ggplot graph

I'm writing a code that does long calculations over many time steps, and plots the results step by step.
This is R 3.6.1 under Windows 7, RStudio 1.1.383.
I'm working in RStudio, and trying to do the plots with ggplot2. The overall structure of the code is something like
for (step in 1:nb_steps){
... do big calculations
m<-ggplot(.... the data...) + ... some options
print(m)
}
You will note that I did assign the results of ggplot() to a variable, and I did explicitly print() it -- as suggested in many related posts, here as well as in RStudio web site.
In my (actual) example, the result is that the loop takes about 2-3 seconds for each iteration (the "big calculation" part). The (gg)graph is flashed for an instant, then disappears and the plot window blanks out, as far as I can tell shortly after the print() statement.
If I use a "regular" plot (in this case an image() )the code works as intended, i.e. the plot stays visible until it is over-plotted by something else.
Now my actual code is a bit long, so I tried to design a minimal reproducible example. This is what I came up with, for a result that is similar to the "main" example.
library(ggplot2)
data(mpg)
for(i in 1:10){
cat(i); cat("\n")
for (j in 1:100000){j*j} # Do something time-consuming
m<-ggplot(mpg, aes_(~ displ, ~ hwy, colour = ~trans)) +
geom_point() + ggtitle(paste("graph number",i))
print(m)
}
This gives the same result, i.e. the graph is briefly shown, then disappears, the window stays blank for a moment before the new graph comes in. It is a bit hard to keep an eye simultaneously on the console and the plot (!), but my impression is that the actual plot building (the ggplot() command) is somewhat time-consuming, and starts by blanking the window, then creates the plot, which is then drawn at the very end. Thus, I see a blank window for all the time it takes to run the ggplot() command itself. In my actual code, the ggplot() is more complex (it is a geom_raster() of a 50*50 matrix) so the delay is longer, so much in fact that I have more often a blank window than a plot !
Of course, I could add a Sys.sleep() at the end. I'd see the graph for a longer time, but the blank periods do not seem to decrease, and obviously this would make the run time longer, which is not what I want in the real-life case.
What I would like instead would be that the plot window should stay as it is until the print() statement. This would give the illusion of one plot replacing the previous one, without the interruption.
Any way of doing so ?
Thanks !
One suggestion for you might be to save your ggplot object into a grob (grid graphical object) and then print the grob.
Doing it in this way - using your sample code on my laptop - shortens the time of blank periods by half.
library(ggplot2)
library(grid)
data(mpg)
for(i in 1:10){
cat(i); cat("\n")
for (j in 1:100000){j*j} # Do something time-consuming
m<-ggplot(mpg, aes_(~ displ, ~ hwy, colour = ~trans)) +
geom_point() + ggtitle(paste("graph number",i))
grid.draw(ggplotGrob(m))
}
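A variant of the same idea is to split the slow step from the drawing step. As far as I can tell, print() on a ggplot clears the page before it builds the plot, which would account for the blank period. The sketch below builds the grob first and clears the device only once the grob is ready. It uses aes() instead of aes_(), which is an assumption about running a recent ggplot2.

```r
library(ggplot2)
library(grid)

for (i in 1:3) {
  m <- ggplot(mpg, aes(displ, hwy, colour = trans)) +
    geom_point() + ggtitle(paste("graph number", i))
  g <- ggplotGrob(m)  # slow part: build the grob while the old plot stays visible
  grid.newpage()      # clear the device only once the grob is ready
  grid.draw(g)        # fast part: draw the prebuilt grob
}
```

Without the grid.newpage() call, each grob is drawn on top of the previous one rather than replacing it.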
For completeness, I will add the default, which is to use the base graphics. If there is no need for ggplot graphics and the graphics are just being used for diagnostics, the base drawing package can whip out a graph very quickly. When I run MCMC, I will typically use base graphics for diagnostics, then ggplot2 for the fancy final stuff.
#base way
start <- Sys.time()
for(i in 1:10){
cat(i); cat("\n")
for (j in 1:100000){j*j} # Do something time-consuming
plot(mpg[["displ"]], mpg[["hwy"]], col = factor(mpg$trans),
main = paste("Graph Number", i))
legend("topleft", fill = factor(mpg$trans),legend = levels(factor(mpg$trans)),
ncol = 4)
}
end_time <- Sys.time()
end_time - start
It draws each graph very quickly and leaves it on screen until the next one replaces it.

r - Missing object when ggsave output as .svg

I'm attempting to step through a dataset and create a histogram and summary table for each factor and save the output as a .svg . The histogram is created using ggplot2 and the summary table using summary().
I have successfully used the code below to save the output to a single .pdf with each page containing the relevant histogram/table. However, when I attempt to save each histogram/table combo into a set of .svg images using ggsave only the ggplot histogram is showing up in the .svg. The table is just white space.
I've tried using dev.copy Cairo and svg but all end up with the same result: Histogram renders, but table does not. If I save the image as a .png the table shows up.
I'm using the iris data as a reproducible dataset. I'm not using R-Studio which I saw was causing some "empty plot" grief for others.
#packages used
library(ggplot2)
library(gridExtra)
library(gtable)
library(Cairo)
#Create iris histogram plot
iris.hp<-ggplot(data=iris, aes(x=Sepal.Length)) +
geom_histogram(binwidth =.25,origin=-0.125,
right = TRUE,col="white", fill="steelblue4",alpha=1) +
labs(title = "Iris Sepal Length")+
labs(x="Sepal Length", y="Count")
iris.list<-by(data = iris, INDICES = iris$Species, simplify = TRUE,FUN = function(x)
{iris.hp %+% x + ggtitle(unique(x$Species))})
#Generate list of data to create summary statistics table
sum.str<-aggregate(Sepal.Length~Species,iris,summary)
spec<-sum.str[,1]
spec.stats<-sum.str[,2]
sum.data<-data.frame(spec,spec.stats)
sum.table<-tableGrob(sum.data)
colnames(sum.data) <-c("species","sep.len.min","sep.len.1stQ","sep.len.med",
"sep.len.mean","sep. len.3rdQ","sep.len.max")
table.list<-by(data = sum.data, INDICES = sum.data$"species", simplify = TRUE,
FUN = function(x) {tableGrob(x)})
#Combined histogram and summary table across multiple plots
multi.plots<-marrangeGrob(grobs=(c(rbind(iris.list,table.list))),
nrow=2, ncol=1, top = quote(paste(iris$labels$Species,'\nPage', g, 'of',pages)))
#bypass the class check per @baptiste
ggsave <- ggplot2::ggsave; body(ggsave) <- body(ggplot2::ggsave)[-2]
#
for(i in 1:3){
multi.plots<-marrangeGrob(grobs=(c(rbind(iris.list[i],table.list[i]))),
nrow=2, ncol=1,heights=c(1.65,.35),
top = quote(paste(iris$labels$Species,'\nPage', g, 'of',pages)))
prefix<-unique(iris$Species)
prefix<-prefix[i]
filename<-paste(prefix,".svg",sep="")
ggsave(filename,multi.plots)
#dev.off()
}
Edit: removed the theme tt3 that @rawr referenced. It was accidentally left in the example code. It was not causing the problem, in case anyone was curious.
Edit: Removing previous answer regarding it working under 32bit install and not x64 install because that was not the problem. Still unsure what was causing the issue, but it is working now. Leaving the info about grid.export as it may be a useful alternative for someone else.
Below is the loop for saving the .svg's using grid.export(), although I was having some text formatting issues with this (different dataset).
for(i in 1:3){
multi.plots<-marrangeGrob(grobs=(c(rbind(iris.list[i],table.list[i]))),
nrow=2, ncol=1,heights=c(1.65,.35), top =quote(paste(iris$labels$Species,'\nPage', g,
'of',pages)))
prefix<-unique(iris$Species)
prefix<-prefix[i]
filename<-paste(prefix,".svg",sep="")
grid.draw(multi.plots)
grid.export(filename)
grid.newpage()
}
EDIT: As for using arrangeGrob per @baptiste's comment, below is the updated code. I was incorrectly using single brackets [] on the list returned by by(), so I switched to the correct double brackets [[]] and passed grid.draw() to the ggsave call.
for(i in 1:3){
prefix<-unique(iris$Species)
prefix<-prefix[i]
multi.plots<-grid.arrange(arrangeGrob(iris.list[[i]],table.list[[i]],
nrow=2,ncol=1,top = quote(paste(iris$labels$Species))))
filename<-paste(prefix,".svg",sep="")
ggsave(filename,grid.draw(multi.plots))
}
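For comparison, here is a minimal sketch of the same loop, assuming ggplot2 2.0.0 or later, where ggsave() accepts grobs directly and the class-check workaround should not be needed. The summary columns are simplified, the files go to tempdir(), and SVG output is assumed to go through the svglite package, so the save step is skipped if that package is missing.

```r
library(ggplot2)
library(gridExtra)

# One histogram and one summary table per species.
by_species <- split(iris, iris$Species)
for (sp in names(by_species)) {
  d <- by_species[[sp]]
  hist_plot <- ggplot(d, aes(x = Sepal.Length)) +
    geom_histogram(binwidth = 0.25, colour = "white", fill = "steelblue4") +
    labs(title = sp, x = "Sepal Length", y = "Count")
  stats <- data.frame(min = min(d$Sepal.Length),
                      median = median(d$Sepal.Length),
                      mean = round(mean(d$Sepal.Length), 2),
                      max = max(d$Sepal.Length))
  # arrangeGrob() builds the layout without drawing it.
  combined <- arrangeGrob(hist_plot, tableGrob(stats, rows = NULL),
                          nrow = 2, heights = c(1.65, 0.35))
  # Assumption: ggsave() writes .svg files via svglite.
  if (requireNamespace("svglite", quietly = TRUE)) {
    ggsave(file.path(tempdir(), paste0(sp, ".svg")), combined,
           width = 6, height = 6)
  }
}
```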

Cannot save plots as pdf when ggplot function is called inside a function

I am going to plot a boxplot from a 4-column matrix pl1 using ggplot with dots on each box. The instruction for plotting is like this:
p1 <- ggplot(pl1, aes(x=factor(Edge_n), y=get(make.names(y_label)), ymax=max(get(make.names(y_label)))*1.05))+
geom_boxplot(aes(fill=method), outlier.shape= NA)+
theme(text = element_text(size=20), aspect.ratio=1)+
xlab("Number of edges")+
ylab(y_label)+
scale_fill_manual(values=color_box)+
geom_point(aes(x=factor(Edge_n), y=get(make.names(true_des)), ymax=max(get(make.names(true_des)))*1.05, color=method),
position = position_dodge(width=0.75))+
scale_color_manual(values=color_pnt)
Then, I use print(p1) to print it on an opened pdf. However, this does not work for me and I get the below error:
Error in make.names(true_des) : object 'true_des' not found
Can anyone help?
Your example is not very clear because you give a call but you don't show the values of your variables so it's really hard to figure out what you're trying to do (for instance, is method the name of a column in the data frame pl1, or is it a variable (and if it's a variable, what is its type? string? name?)).
Nonetheless, here's an example that should help set you on the way to doing what you want:
pl1 <- data.frame(Edge_n = sample(5, 20, TRUE), foo = rnorm(20), bar = rnorm(20))
y_label <- 'foo'
ax <- do.call(aes, list(
x=quote(factor(Edge_n)),
y=as.name(y_label),
ymax = substitute(max(y)*1.05, list(y=as.name(y_label)))))
p1 <- ggplot(pl1) + geom_boxplot(ax)
print(p1)
This should get you started to figuring out the rest of what you're trying to do.
Alternatively (under a different interpretation of your question), you may be running into a problem with the environment in which aes evaluates its arguments. See https://github.com/hadley/ggplot2/issues/743 for details. If this is the issue, then the answer might be to override the default value of the environment argument to aes, for instance: aes(x=factor(Edge_n), y=get(make.names(y_label)), ymax=max(get(make.names(y_label)))*1.05, environment=environment())
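In ggplot2 3.0.0 and later, another option is the .data pronoun, which looks a column up by a name held in a string and so sidesteps get() and the environment question altogether. This is a sketch under that version assumption. The wrapper name box_by_edges is made up, and the data is the same toy data frame as in the example above.

```r
library(ggplot2)

# Hypothetical wrapper: `y_label` is a column name held in a string.
box_by_edges <- function(df, y_label) {
  ggplot(df, aes(x = factor(Edge_n), y = .data[[y_label]])) +
    geom_boxplot(outlier.shape = NA) +
    xlab("Number of edges") +
    ylab(y_label)
}

set.seed(1)
pl1 <- data.frame(Edge_n = sample(5, 20, TRUE), foo = rnorm(20), bar = rnorm(20))
p1 <- box_by_edges(pl1, "foo")
print(p1)
```

Because the column is resolved when the plot is built from the data itself, the plot can be printed to a PDF device from inside or outside the function.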

Assigning "beanplot" object to variable in R

I have found that the beanplot is the best way to represent my data. I want to look at multiple beanplots together to visualize my data. Each of my plots contains 3 variables, so each one looks something like what would be generated by this code:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
c <- rnorm(100)
beanplot(a, b ,c ,ylim = c(-4, 4), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
(Would have just included an image but my reputation score is not high enough, sorry)
I have 421 of these that I want to put into one long PDF (EDIT: One plot per page is fine, this was just poor wording on my part). The approach I have taken was to first generate the beanplots in a for loop and store them in a list at each iteration. Then I will use the multiplot function (from the R Cookbook page on multiplot) to display all of my plots on one long column so I can begin my analysis.
The problem is that the beanplot function does not appear to be set up to assign plot objects as a variable. Example:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
plot1 <- beanplot(a, b, ylim = c(-5,5), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
plot1
If you then type plot1 into the R console, you will get back two of the plot parameters but not the plot itself. This means that when I store the plots in the list, I am unable to graph them with multiplot. It will simply return the plot parameters and a blank plot.
This behavior does not seem to be the case with qplot for example which will return a plot when you recall the stored plot. Example:
library(ggplot2)
a <- rnorm(100)
b <- rnorm(100)
plot2 <- qplot(a,b)
plot2
There is no equivalent to the beanplot that I know of in ggplot. Is there some sort of workaround I can use for this issue?
Thank you.
You can simply open a PDF device with pdf() and keep the default parameter onefile=TRUE. Then call all your beanplot()s, one after the other. They will all be in one PDF document, each one on a separate page. See here.
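A minimal sketch of that approach follows. The output path and the loop length are placeholders, and the fallback to boxplot() is only there so the example still runs when the beanplot package is not installed.

```r
# Fall back to boxplot() if beanplot is not installed, so the sketch still runs.
draw <- if (requireNamespace("beanplot", quietly = TRUE)) {
  beanplot::beanplot
} else {
  graphics::boxplot
}

out <- file.path(tempdir(), "beanplots.pdf")
pdf(out, onefile = TRUE)  # one device; each high-level plot call starts a new page
for (i in 1:5) {          # 421 in the question; 5 keeps the example quick
  a <- rnorm(100)
  b <- rnorm(100)
  c <- rnorm(100)
  draw(a, b, c, ylim = c(-4, 4), main = paste("Beanplot", i))
}
dev.off()                 # close the device so the file is written completely
```

Nothing needs to be stored in a list: each call draws straight onto the open device.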

How to deal with a lot of plots in R

I have a for loop which produces 60 plots. I would like to save all these plots in only one file.
If I set par(mfrow=c(10,6)) it says : Error in plot.new() : figure margins too large
What can I do?
My code is as follows:
pdf(file="figure.pdf")
par(mfrow=c(10,6))
for(i in 1:60){
x=rnorm(100)
y=rnorm(100)
plot(x,y)
}
dev.off()
Your default plot, as stated in the loop, does not use the space very effectively. If you look at just a single plot, you can see it has large margins, both between axis and edge and plot area and axis text. Effectively, there is a lot of space-hogging.
Secondly, the default pdf-function creates small pages, 7 by 7 inches. That is not a large sheet to plot on.
Trying to plot a 10 x 6 or 12 x 5 on 7 by 7 inches is therefore trying to squeeze in a lot of whitespace on very little space.
For it to succeed, you must look into the margin options of par, which are mar, mai, oma and omi, and probably some more. Consult the documentation with the command
?par
In addition to this, you could consider not displaying axis-text, tick-marks, tick-labels and titles for every one of the 60 sub-plots, as this too will save you space.
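Putting those suggestions together, a rough sketch could look like this. The page size and margin values are guesses to be tuned, not recommended settings, and the file is written to tempdir().

```r
out <- file.path(tempdir(), "figure.pdf")
pdf(out, width = 12, height = 20)  # larger page than the 7 x 7 inch default
par(mfrow = c(10, 6),
    mar = c(1, 1, 1, 1),           # thin inner margins, in lines of text
    oma = c(2, 2, 2, 2))           # outer margin around the whole grid
for (i in 1:60) {
  x <- rnorm(100)
  y <- rnorm(100)
  # Drop axes and axis labels to save space; keep a small panel number.
  plot(x, y, axes = FALSE, xlab = "", ylab = "", main = i, cex.main = 0.8)
  box()                            # keep a frame so panels stay distinct
}
dev.off()
```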
But somebody has already gone through some of this trouble for you. Look into the lattice-package or ggplot2, which has some excellent methods for making table-like subplots.
But there is another pressing issue: What are you trying to display with 60 subplots?
Update
Seeing what you are trying to do, here is a small example of faceting in ggplot2. It uses the Tufte-theme from jrnold's ggthemes, which is copied into here and then modified slightly in the line after the function.
library(ggplot2)
library(scales)
#### Setup the `theme` for the plot, i.e. the appearance of background, lines, margins, etc. of the plot.
## This function returns a theme-object, which ggplot2 uses to control the appearance.
theme_tufte <- function(ticks=TRUE, base_family="serif", base_size=11) {
ret <- theme_bw(base_family=base_family, base_size=base_size) +
theme(
legend.background = element_blank(),
legend.key = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
strip.background = element_blank(),
plot.background = element_blank(),
axis.line = element_blank(),
panel.grid = element_blank())
if (!ticks) {
ret <- ret + theme(axis.ticks = element_blank())
}
ret
}
## Here I modify the theme returned from the function,
theme <- theme_tufte() + theme(panel.spacing=unit(0, 'lines'), panel.border=element_rect(colour='grey', fill=NA)) ## panel.spacing replaces the deprecated panel.margin
## and instruct ggplot2 to use this theme as default.
theme_set(theme)
#### Some data generation.
size = 60*30
data <- data.frame(x=runif(size), y=rexp(size)+rnorm(size), mdl=sample(60,size, replace=TRUE))
#### Main plotting routine.
ggplot(data, aes(x,y, group=mdl)) + ## base state of the plot to be used on all "layers", i.e. which data to use and which mappings to use (x should use the x-variable, y should use the y-variable)
geom_point() + ## a layer that renders data as points, creates the scatterplot
stat_quantile(formula=y~x) + ## another layer that adds some statistics, in this case the 25%, 50% and 75% quantile lines.
facet_wrap(~ mdl, ncol=6) ## Without this, all the groups would be displayed in one large plot; this breaks it up according to the `mdl`-variable.
The usual challenge in using ggplot2 is restructuring all your data into data.frames. For this task, the reshape2 and plyr-packages might be of good use.
For you, I would imagine that your function that creates the subplot both calculates the estimation and creates the plot. This means that you have to split the function into calculating the estimation, returning it to a data.frame, which you then can collate and pass to ggplot.
Output the plots to a pdf:
X = matrix(rnorm(60*100), ncol=60)
Y = matrix(rnorm(60*100), ncol=60)
pdf(file="fileName.pdf")
for(j in 1:60){
plot(X[,j], Y[,j])
}
dev.off()
For placing many plots on a page or document (and I have created images with literally thousands of plots in them), it is convenient to separate the work between R--which creates the plots individually--and other software which is better suited for arranging arrays of things. If this reminds you of spreadsheets or word processing tables, then we are thinking alike.
This page, which is a screenshot from a PDF file, contains over 200 statistical graphics. Although it has been greatly reduced (to 40% nominal size) in order to obscure proprietary data, the original has all the detail of the original R graphics and can be zoomed to 1600% without problem.
Two mechanisms have worked reasonably well. For up to several hundred plots, a little macro to import and re-sequence a set of bitmapped image files (.emf or .wmf) into a Word document does fine. For better control, I turn to a comparable Excel macro. It is driven by a sheet that is empty of everything except a row with column headers and a column with row headers. (You can see them at the left and top of the figure.) The macro deletes everything else on that sheet (except for formatting), then munges each possible combination of row and column header into a file name and if it finds that file, it imports it into the corresponding cell. The whole operation takes just a few seconds for several thousand images.
Obviously this communication mechanism between R and the other software is primitive, consisting of a collection of image files having a standard naming convention. But the code needed to implement it all is brief (albeit customized to each situation) and it works reliably. For example, if you encapsulate the plotting code within a function, then it will be called within a loop to create many similar plots. At the end of that function add a few lines to save the plot to a file, something like this:
path <- "W: <whatever>/" # Folder for the output files
ext <- "wmf" # or "emf" or "png" or ... # Format (and extension) of the output
...
if (save) {
outfile <- paste(path, paste(munge(well), munge(parm), sep="_"), sep="/")
outfile <- paste(outfile, ext, sep=".")
savePlot(filename=outfile, type=ext)
}
In this case each plot is identified by two loop variables, well and parm, both of which are strings (they correspond to the column and row headers). The function for creating acceptable filenames merely strips out punctuation, replacing it by an anodyne placeholder:
munge <- function(s) gsub("[[:punct:]]", "_", s)
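The naming scheme can be tried out end to end with a small sketch. It swaps savePlot() for the png() device so that it runs without an open screen device. save_one, the sample well and parameter names, and the plotted data are all made up for the illustration.

```r
munge <- function(s) gsub("[[:punct:]]", "_", s)

# Hypothetical helper: draws one placeholder plot and saves it under a munged name.
save_one <- function(well, parm, dir = tempdir()) {
  outfile <- file.path(dir, paste0(munge(well), "_", munge(parm), ".png"))
  png(outfile, width = 480, height = 360)
  on.exit(dev.off())  # close the device even if plotting fails
  plot(cumsum(rnorm(50)), type = "l", main = paste(well, parm))
  outfile
}

outfile <- save_one("MW-12", "Cl/SO4")
basename(outfile)  # "MW_12_Cl_SO4.png"
```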
Once those images have been imported into Word, Excel, or wherever you like, it's fairly easy to reorganize them, place other material around them, etc., and then print the result in PDF format.
There is an art to creating these very large "small multiples" (in Tufte's terminology). To the extent possible, it helps to follow Tufte's principle of increasing the data:ink ratio by erasing inessential material. That makes graphical patterns clear even when the tableau has been greatly reduced in size in order to comprehend all its rows and columns at once. Although the preceding figure is a poor example--the individual plots had to have axes, gridlines, labels, and so on so that they can be read in detail when zoomed--the power of this method to reveal patterns is clear even at this scale. It is crucial to make the plots comparable to one another. In this example, which consists of time series, every plot has the same range on the x-axis; within each row (which corresponds to a different type of observation), the ranges on the y-axes are the same; and all color schemes and methods of symbolization are the same throughout.
You could also use knitr. This didn't instantly convert over to base graphics (and I've got to run now), but using ggplot works easily.
\documentclass{article}
\begin{document}
<<echo = FALSE, fig.keep='high', fig.height=3, fig.width=4>>=
require(ggplot2)
for (i in 1:10) print(ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point())
@
\end{document}
The above code will produce a nice multi-page pdf with all the graphs.
For a very simple solution to this type of issue, I found that setting a large "Windows" device manages to make the window big enough for many uses.
windows(50,50)
par(mfrow=c(10,6))
for(i in 1:60){
x=rnorm(100)
y=rnorm(100)
plot(x,y)
}
Or in my case,
windows(20,20)
plot(Plotting_I_Need_In_Rows_of_4, mfrow=c(4,4))
