As a part of my dissertation research, I wrote R code that performs a basic exploratory data analysis (EDA) of initial datasets. The code is supposed to output the results of EDA in three formats: 1) screen (RStudio Plots window); 2) SVG files (single file per plot); 3) PDF file (one file for all univariate EDA plots and another one for all multivariate EDA plots). It is still a work in progress in terms of covering all variables of interest and their relationships, but the basic infrastructure has already been designed and implemented (using ggplot2 and gridExtra packages). I am experiencing the following two issues with this:
1) When, during EDA, the code is supposed to display a current plot in the RStudio Plots window, the screen just blinks and no output is performed. The following is the code that generates and displays plots (screen and SVG only, see below for PDF output) [similar blocks of code are wrapped in functions, returning plot objects (g), which form a list, which, in turn, is passed to lapply() for iterating through all plots]:
df$var <- factor(df[[colName]])
title <- paste("Projects distribution across", colName, "range")
g <- ggplot(data=df, aes(x=var, fill=var)) +
geom_bar(stat="bin") +
scale_fill_discrete(colName) +
xlab(colName) +
ylab("Number of projects") +
ggtitle(label=title)
if (.Platform$GUI == "RStudio") {print(g); dev.off()}
edaFile <- str_replace_all(string=colName, pattern=" ", repl="")
edaFile <- paste0(EDA_RESULTS_DIR, "/", edaFile, ".svg")
suppressMessages(ggsave(file=edaFile, plot=g))
2) The PDF file with EDA results opens, but further attempts to navigate it (e.g., line or page scrolling) or otherwise work with it (e.g., changing the zoom level) cause the PDF reader (Adobe Reader XI) to hang, with "... (Not Responding)" in its title bar. Sometimes, after quite a while, Adobe Reader becomes responsive again, but only for a short time, until the next action sends it back into the hanging state. I noticed that Adobe Reader takes a long time to display one particular plot, specifically a Q-Q plot; I mention this as it might give additional insight. The following is the code that outputs the saved plots to a PDF file (these are the same plots that were displayed on the screen and written to SVG files, as described above):
edaFilePDF <- paste0(EDA_RESULTS_DIR, "/", "eda-univar.pdf")
mg <- do.call(marrangeGrob, c(allPlots, list(nrow=2, ncol = 1)));
suppressMessages(ggsave(filename=edaFilePDF, mg, width=8.5, height=11))
Your help is much appreciated! P.S. I use RStudio Server, so the output is browser-based.
I have 3 R plots saved as pdf files (upper_left.pdf, upper_right.pdf, lower.pdf) as vector graphics, and I want to make a one-page pdf file and arrange them on it as follows (upper_left and upper_right side by side on top, lower underneath):
What I have tried already
I have tried reading the pdf's using magick::image_read_pdf and appending them using magick::image_append. More specifically,
library(magick)
panel.ul <- image_read_pdf("upper_left.pdf")
panel.ur <- image_read_pdf("upper_right.pdf")
panel.l <- image_read_pdf("lower.pdf")
whole <- c(panel.ul, panel.ur) %>%
image_append() %>%
c(panel.l) %>%
image_append(stack = TRUE)
The first issue is that magick::image_read_pdf imports the plot as a raster image (png, if I'm right), not as a vector graphic.
magick::image_append also 'works' and gives me what I want in viewer pane (in RStudio, next to Help).
I then try to save them using export::graph2pdf(whole), but it gives me a blank page.
So, if I am to use magick, there are two issues that need to be solved:
importing plots as vector graphic objects (do not know the technical term in R)
Exporting the stacked plot to a vector pdf file.
How can I solve it? thanks in advance.
You're basically done. You only need to add
plot(whole) # plot the external object generated in ImageMagick to R's plotting device
savePlot(type = "pdf") # saves the current plotting device to a pdf file.
You will find your plot in your working directory, called "Rplot.pdf".
savePlot has many options to customize your pdf output. Make sure to check ?savePlot.
To recreate your scheme from above, you'll need to temporarily save the upper panel as a separate pdf before you paste it on top of the lower panel:
whole2 <- image_append(c(panel.ul, panel.ur))
plot(whole2)
savePlot("whole2.pdf", type = "pdf")
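If the upper and lower panels do not look proportionate, adjust the size of the first pdf before reading it back in. Note that savePlot itself takes no height or width arguments, so resize the plotting window (or open it at the desired size) before calling savePlot.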
panel.upr <- image_read_pdf("whole2.pdf")
final <- image_append(c(image_append(panel.upr),panel.l), stack = TRUE)
plot(final)
savePlot("final.pdf", type = "pdf")
I'm creating a script to cluster my data on a server. I need to save the text output and the images as well. The text output works just fine, but when I try to use png() + plot() + dev.off() to save the plots, no image is created.
[ADDED FOR CLARIFICATION]
What I need to do is to SAVE the plot (i.e. generate an image file) in running mode. If I run the code step by step, the file is created.
I already tried to change the image format to PDF and JPG using the corresponding functions but I'm still getting no images as output when running the code as script. When stepping, it works great.
Since it takes a little while for R to render the image when I'm running step by step, I tried to add Sys.sleep(2) in between commands (code below) but nothing changed.
I think the problem might be related to the package that I'm using and the type of object it generates (library(NMF)). I looked at the documentation to see if there was something about the way the plot() function works with the type of object that the clustering algorithm generates but the text is vague:
"The result (of estim.r <- nmf(esGolub, 2:6, nrun=10, seed=123456) for example) is a S3 object of class NMF.rank, that contains a data.frame with the quality measures in column, and the values of r in row. It also contains a list of the consensus matrix for each value of r".
"All the measures can be plotted at once with the method plot (Figure 1), and the function consensusmap generates heatmaps of the consensus matrix for each value of the rank".
There is another type of image that can be generated after the clustering runs: the consensusmap. This one works on both cases (stepping and running).
The script is pretty short. Here it is:
library(NMF)
data = read.csv('R.csv', header=TRUE, sep=";")
res1 <- nmf(data, rank=2:5, nrun=1, "brunet", "random")
# this always works
capture.output(summary(res1) ,file = "summary.txt", append = TRUE)
# this always works too
png(filename = 'consensus.png', width = 1366, height = 768, units = 'px')
consensusmap(res1)
dev.off()
# this does not work on 'running mode', only 'stepping mode'
png(filename = 'metrics.png', width = 1366, height = 768, units = 'px')
# added hoping it would fix the issue. It didn't
Sys.sleep(2)
plot(res1)
# added hoping it would fix the issue. It didn't
Sys.sleep(2)
dev.off()
The summary.txt file is generated, the consensus.png too. The metrics.png is not. What's going on here??
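One possible explanation, offered as an assumption because the post does not say what plot() returns for this object: if plot(res1) builds a grid-based plot object (as lattice and ggplot2 do) rather than drawing directly, the object is auto-printed when the line is typed at the console but not when the script is run via source(), so nothing reaches the png device. Wrapping the call in print(), e.g. print(plot(res1)), would force the drawing. The sketch below shows the same pattern with a lattice plot as a stand-in for the NMF object; the file name and data are illustrative only.

```r
library(lattice)  # stand-in for a package whose plot function returns an object

out <- file.path(tempdir(), "metrics_sketch.pdf")  # hypothetical output file
p <- xyplot(dist ~ speed, data = cars)  # builds a plot object, draws nothing yet

pdf(out, width = 8, height = 5)  # png() is used the same way
print(p)                         # the explicit print() does the drawing
invisible(dev.off())
```

consensusmap() presumably draws directly as a side effect, which would explain why it works in both modes.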
I've made different plots (more than a hundred) for a project and I haven't captured them along the way (yes, it's bad, I know). Now I need to save them all at once, but without running my script again (which takes hours). Is there a way to do so within RStudio?
Edit: All the plots are already there and I don't want to generate them again.
In RStudio, every session has a temporary directory that can be obtained using tempdir(). Inside that temporary directory, there is another directory that always starts with "rs-graphics" and contains all the plots saved as ".png" files. Therefore, to get the list of ".png" files you can do the following:
plots.dir.path <- list.files(tempdir(), pattern="rs-graphics", full.names = TRUE)
# anchor the pattern so that only file names ending in ".png" are matched
plots.png.paths <- list.files(plots.dir.path, pattern="\\.png$", full.names = TRUE)
Now, you can copy these files to your desired directory, as follows:
file.copy(from=plots.png.paths, to="path_to_your_dir")
Additional feature:
As you will notice, the .png file names are automatically generated (e.g., 0078cb77-02f2-4a16-bf02-0c5c6d8cc8d8.png). So if you want to number the .png files according to their plotting order in RStudio, you may do so as follows:
plots.png.detials <- file.info(plots.png.paths)
plots.png.detials <- plots.png.detials[order(plots.png.detials$mtime),]
sorted.png.names <- gsub(plots.dir.path, "path_to_your_dir", row.names(plots.png.detials), fixed=TRUE)
numbered.png.names <- paste0("path_to_your_dir/", 1:length(sorted.png.names), ".png")
# Rename all the .png files as: 1.png, 2.png, 3.png, and so on.
file.rename(from=sorted.png.names, to=numbered.png.names)
Hope it helps.
Although this discussion has been inactive for a while, there are some persons, like myself, who still come across the same problem, and the other solutions don't really seem to even get what the actual question is.
So, hands on. Your plot history gets saved in a variable called .SavedPlots. You can either access it directly, assign it to another variable in code or do the latter from the plots window.
# ph for plot history
ph <- .SavedPlots
In R 3.4.2, I could index ph to reproduce the corresponding plot in a device. What follows is rather straightforward:
Open a new device (png, jpeg, pdf...).
Reproduce your plot ph[index_of_plot_in_history].
Close the device (or keep plotting if it is a pdf with multiple pages).
Example:
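# lastplot is the number of plots in the history (set it yourself)
for(i in 1:lastplot) {
  # use a distinct file name per plot so that earlier files are not overwritten
  png(sprintf('plot%03d.png', i))
  print(ph[i])
  dev.off()
}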
Note: Sometimes this doesn't happen because of poor programming. For instance, I was using the MICE package to impute many datasets with a large number of variables, and plotting as shown in section 4.3 of this paper. Problem was, that only three variables per plot were displayed, and if I used a png device in my code, only the last plot of each dataset would be saved. However, if the plots were printed to a window, all the plots of each dataset would be recorded.
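As far as I know, .SavedPlots belongs to the plot history of the Windows graphics device, so a more portable variant of the same idea is sketched below. It assumes you can record each plot at the time it is made: recordPlot() captures the current plot and replayPlot() redraws it later on any device. The helper name replay_to_pdf and the file name are made up for illustration.

```r
# Record a plot off-screen; pdf(NULL) opens a device that writes no file
pdf(NULL)
dev.control(displaylist = "enable")  # recording needs the display list
plot(1:10, main = "recorded plot")
rec <- recordPlot()
invisible(dev.off())

# Hypothetical helper: replay a list of recorded plots into one multi-page PDF
replay_to_pdf <- function(plots, file) {
  pdf(file)
  on.exit(dev.off())
  for (p in plots) replayPlot(p)
  invisible(file)
}

out <- replay_to_pdf(list(rec), file.path(tempdir(), "replayed.pdf"))
```

Recorded plots are meant to be replayed within the same R session; they are not a reliable storage format across sessions or R versions.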
If your plots are 3d, you can take a snapshot of all your plots and save them as a .png file format.
snapshot3d(filename = '../Plots/SnapshotPlots.png', fmt = 'png')
Or else, the best way is to create a multi-panel page layout with par(mfrow = ...). Try the following:
plotsPath = "../Plots/allPlots.pdf"
pdf(file=plotsPath)
# set the layout once, before the loop: two plots per page, one above the other
par(mfrow = c(2,1))
for (x in seq(1,100))
{
  p1=rnorm(x)
  p2=rnorm(x)
  plot(p1,p2)
}
dev.off()
You can also use png, bmp, tiff, and jpeg functions instead of pdf. You can read their advantages and disadvantages and choose the one you think is good for your needs.
I am not sure how RStudio opens the device where the plots are drawn, but I guess it uses dev.new(). In that case, one quick way to save all open graphs is to loop through all the devices and write them out using dev.print.
Something like :
lapply(dev.list(),function(d){dev.set(d);dev.print(pdf,file=file.path(folder,paste0("graph_",d,".pdf"))})
where folder is the path of the folder where you want to store your graphs (for example, folder="~" if you are on Linux and want to store all your graphs in your home folder).
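The one-liner can be expanded into a small function. This is a sketch under assumptions: save_open_devices is a made-up name, and dev.print() only works for devices that keep a display list (screen devices do by default). If RStudio exposes its Plots pane as a single device, only the plot currently shown would be written.

```r
save_open_devices <- function(folder) {
  dir.create(folder, showWarnings = FALSE, recursive = TRUE)
  current <- dev.cur()
  on.exit(if (current > 1) dev.set(current))  # restore the active device
  files <- character(0)
  for (d in dev.list()) {  # dev.list() is NULL when no device is open
    dev.set(d)
    out <- file.path(folder, paste0("graph_", d, ".pdf"))
    dev.print(pdf, file = out)  # copy the contents of device d into a PDF
    files <- c(files, out)
  }
  invisible(files)  # paths of the files that were written
}
```

A call such as save_open_devices("~/graphs") would then write one PDF per open device.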
If you open a pdf device as follows, every plot that follows will be saved in a single document until you call dev.off():
pdf("nameofthedocument.pdf")
plot(x~y)
plot(...
dev.off()
You can also use tiff(), jpeg()... see ?pdf and ?png
A lot of the time, I find it very useful to output graphics with pdf() as it allows me to scroll through pages and observe subtle differences (e.g. the page numbers may correspond to a particular parameter in a simulation).
Sometimes if the plot is quite packed with information, the fact that the PDF is a vector graphic means that it takes a long time to load in a PDF reader and is useless for scrolling through pages. I could plot with png(), but this would result in many image files.
My ideal solution would be to have a device that will plot a bitmap graphic (e.g. PNG) to a PDF.
I have read that cairo_pdf() outputs to a bitmap sometimes? Or I could write something that outputs to PNG, then combines these all together into a PDF?
Any other thoughts? Or does anyone have a solution for this already?
UPDATE: have now added method based on readPNG() as suggested in comments above. It's a bit slower (3s vs 9s) and seems to result in slightly larger file sizes than ImageMagick. rasterImage() interpolation makes no difference to filesize or timing, but alters the appearance slightly. If it's FALSE, then it looks the same as ImageMagick
I have just come up with the following solution using ImageMagick. It's not perfect, but it seems to work well so far.
png2pdf <- function(name=NULL,removepngs=TRUE,method="imagemagick",pnginterpolate=FALSE){
  # Run the png() function with a filename of the form name%03d.png
  # Then the actual plotting functions, e.g. plot(), lines() etc.
  # Then dev.off()
  # Then run png2pdf() and specify the name= argument if other pngs exist in the directory
  # Need to incorporate a way of dealing with non-square plots
  if(is.null(name)){
    # "$" anchors the pattern so that only names ending in .png are matched
    names <- list.files(pattern="[.]png$")
    name <- unique(sub("[0-9][0-9][0-9][.]png$","",names))
    if(length(name)!=1) stop("png2pdf() error: Check filenames")
  }else{
    names <- list.files(pattern=paste0(name,"[0-9][0-9][0-9][.]png$"))
  }
  if(method=="imagemagick"){
    # Can change this to "convert" if it is correctly in the system path
    cmd <- c('C:\\Program Files\\ImageMagick-6.9.0-Q16\\convert.exe',names,paste0(name,".pdf"))
    system2(cmd[1],cmd[-1])
  }else if(method=="readPNG"){
    library(png)
    pdf(paste0(name,".pdf"))
    par(mar=rep(0,4))
    # one PDF page per png, drawn as a raster image filling the plot region
    for(i in seq_along(names)){
      plot(c(0,1),c(0,1),type="n")
      rasterImage(readPNG(names[i]),0,0,1,1,interpolate=pnginterpolate)
    }
    dev.off()
  }
  if(removepngs) file.remove(names)
}
I do a lot of data exploration in R and I would like to keep every plot I generate (from the interactive R console). I am thinking of a directory where everything I plot is automatically saved as a time-stamped PDF. I also do not want this to interfere with the normal display of plots.
Is there something that I can add to my ~/.Rprofile that will do this?
The general idea is to write a script generating the plot in order to regenerate it. The ESS documentation (in a README) says it well under 'Philosophies for using ESS':
The source code is real. The objects are realizations of the
source code. Source for EVERY user modified object is placed in a
particular directory or directories, for later editing and
retrieval.
With any editor that allows stepwise (or regionwise) execution of commands, you can keep track of your work this way.
The best approach is to use a script file (or sweave or knitr file) so that you can just recreate all the graphs when you need them (into a pdf file or other).
But here is the start of an approach that does the basics of what you asked:
savegraphs <- local({
  i <- 1
  function(){
    if(dev.cur()>1){
      filename <- sprintf('graphs/SavedPlot%03d.pdf', i)
      dev.copy2pdf( file=filename )
      i <<- i + 1
    }
  }
})
setHook('before.plot.new', savegraphs )
setHook('before.grid.newpage', savegraphs )
Now just before you create a new graph the current one will be saved into the graphs folder of the current working folder (make sure that it exists). This means that if you add to a plot (lines, points, abline, etc.) then the annotations will be included. However you will need to run plot.new in order for the last plot to be saved (and if you close the current graphics device without running another plot.new then that last plot will not be saved).
This version will overwrite plots saved from a previous R session in the same working directory. It will also fail if you use something other than base or grid graphics (and maybe even with some complicated plots). I would not be surprised if some extra plots show up on occasion (when a plot is created internally to get some parameters and then immediately replaced with the one of interest). There are probably other things that I have overlooked as well, but this might get you started.
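To get the time-stamped file names asked for in the question, and to avoid overwriting files from an earlier session, the counter could be replaced by a time stamp. The following is only a sketch: save_current_plot and its default folder are hypothetical names, and like the hook above it relies on dev.copy2pdf(), so it needs a device that keeps a display list (any screen device).

```r
save_current_plot <- function(dir = "graphs") {
  if (dev.cur() == 1) return(invisible(NULL))  # no open device, nothing to save
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # time stamp with milliseconds, so that plots made in quick succession
  # still get distinct file names
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%OS3")
  filename <- file.path(dir, paste0("SavedPlot_", stamp, ".pdf"))
  dev.copy2pdf(file = filename)
  invisible(filename)
}

# It can be registered the same way as savegraphs:
# setHook('before.plot.new', save_current_plot)
```

The same caveats apply: the last plot of a session is only saved if the function is called once more by hand.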
You could write your own wrapper functions for your commonly used plot functions. Each wrapper would produce both the on-screen display and a time-stamped pdf version. You could source() these functions in your ~/.Rprofile so that they are available every time you run R.
For lattice's xyplot, using the windows device for the on-screen display:
library(lattice)
my.xyplot <- function(...){
  # create the output folder if needed (no warning when it already exists)
  dir.create(file.path("~","RPlots"), showWarnings = FALSE)
  my.chart <- xyplot(...)
  # on-screen copy (the "windows" device is only available on Windows)
  trellis.device(device="windows",height = 8, width = 8)
  print(my.chart)
  # time-stamped pdf copy
  trellis.device(device = "pdf",
                 file = file.path("~", "RPlots",
                                  paste("xyplot",format(Sys.time(),"_%Y%m%d_%H-%M-%S"),
                                        ".pdf", sep = "")),
                 paper = "letter", width = 8, height = 8)
  print(my.chart)
  dev.off()
}
my.data <- data.frame(x=-100:100)
my.data$y <- my.data$x^2
my.xyplot(y~x,data=my.data)
As others have said, you should probably get in the habit of working from an R script, rather than working exclusively from the interactive terminal. If you save your scripts, everything is reproducible and modifiable in the future. Nonetheless, a "log of plots" is an interesting idea.