R - Keep log of all plots

I do a lot of data exploration in R and I would like to keep every plot I generate (from the interactive R console). I am thinking of a directory where everything I plot is automatically saved as a time-stamped PDF. I also do not want this to interfere with the normal display of plots.
Is there something that I can add to my ~/.Rprofile that will do this?

The general idea is to write a script that generates each plot, so the plot can be regenerated at any time. The ESS documentation (in a README) says it well under 'Philosophies for using ESS':
The source code is real. The objects are realizations of the
source code. Source for EVERY user modified object is placed in a
particular directory or directories, for later editing and
retrieval.
With any editor that allows stepwise (or regionwise) execution of commands, you can keep track of your work this way.

The best approach is to use a script file (or sweave or knitr file) so that you can just recreate all the graphs when you need them (into a pdf file or other).
But here is the start of an approach that does the basics of what you asked:
savegraphs <- local({
  i <- 1
  function() {
    if (dev.cur() > 1) {  # only act if a graphics device is open
      filename <- sprintf('graphs/SavedPlot%03d.pdf', i)
      dev.copy2pdf(file = filename)
      i <<- i + 1         # the counter persists inside the closure
    }
  }
})
setHook('before.plot.new', savegraphs)
setHook('before.grid.newpage', savegraphs)
Now, just before you create a new graph, the current one will be saved into the graphs folder of the current working directory (make sure that folder exists). This means that if you add to a plot (lines, points, abline, etc.), the annotations will be included. However, you will need to call plot.new for the last plot to be saved (and if you close the current graphics device without running another plot.new, that last plot will not be saved).
This version will overwrite plots saved from a previous R session in the same working directory. It will also fail if you use something other than base or grid graphics (and maybe even with some complicated plots). I would not be surprised if the occasional extra plot shows up (when a plot is created internally to get some parameters, then immediately replaced with the one of interest). There are probably other things I have overlooked as well, but this might get you started.
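Since the question asked for time-stamped PDFs, here is a hedged sketch of the same hook idea with a timestamp in each filename, so plots from different sessions never collide (the filename format is my own assumption; adjust to taste):
# Sketch: time-stamped filenames instead of a counter. Assumes the
# 'graphs' folder exists; %OS3 gives seconds to millisecond precision.
savegraphs_ts <- function() {
  if (dev.cur() > 1) {
    stamp <- format(Sys.time(), "%Y%m%d_%H%M%OS3")
    dev.copy2pdf(file = file.path("graphs", paste0("plot_", stamp, ".pdf")))
  }
}
# Use action = 'replace' if the counter-based hooks above are already set.
setHook('before.plot.new', savegraphs_ts)
setHook('before.grid.newpage', savegraphs_ts)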

You could write your own wrapper functions for your commonly used plot functions. The wrapper would produce both the on-screen display and a time-stamped PDF version. You could source() this function in your ~/.Rprofile so that it's available every time you run R.
For lattice's xyplot, using the windows device for the on-screen display:
library(lattice)
my.xyplot <- function(...) {
  dir.create(file.path("~", "RPlots"), showWarnings = FALSE)  # no warning if it exists
  my.chart <- xyplot(...)
  # On-screen copy
  trellis.device(device = "windows", height = 8, width = 8)
  print(my.chart)
  # Time-stamped PDF copy
  trellis.device(device = "pdf",
                 file = file.path("~", "RPlots",
                                  paste0("xyplot",
                                         format(Sys.time(), "_%Y%m%d_%H-%M-%S"),
                                         ".pdf")),
                 paper = "letter", width = 8, height = 8)
  print(my.chart)
  dev.off()
}
my.data <- data.frame(x = -100:100)
my.data$y <- my.data$x^2
my.xyplot(y ~ x, data = my.data)
As others have said, you should probably get in the habit of working from an R script, rather than working exclusively from the interactive terminal. If you save your scripts, everything is reproducible and modifiable in the future. Nonetheless, a "log of plots" is an interesting idea.

Related

R: Need to restart RStudio to view and save a plot to a file using dev.copy() and dev.off()

I am trying to create a plot and eventually save it as a file. But because I am making a lot of changes and want to test them out, I want to be able to view and save the plot at the same time. I have looked at this page to do what I want, but on my system it does not seem to be working as it is supposed to.
Here is my code:
png('Save.png')
sample.df <- data.frame(group = c('A','B','A','C','B','A','A','C','B','C','C','C','B'),
                        X = c(2,11,3,4,1,6,3,7,5,9,10,2,8),
                        Y = c(3,8,5,2,7,9,3,6,6,1,3,4,10))
plot(Y ~ X, data = sample.df)
dev.copy(png, 'Save.png')
dev.off()
There are several issues (I am new to R so I might be missing something entirely):
(1) When I use png(), I cannot view the plot in RStudio, so I tried dev.copy(), but that does not let me view the plot in RStudio either.
(2) Even after I use dev.off(), I cannot view the saved file until I close RStudio (it says "Windows Photo Viewer can't open this picture because the picture is being edited in another program"). I need to restart every time, which is very inconvenient.
What am I doing wrong, and how can I view the plot and the saved file without restarting RStudio every time? Thank you in advance!
Addition
Based on Love Tätting's comments, when I run dev.list(), this is what I get.
> png('Save.png')
>
> sample.df <- data.frame(group = c('A','B','A','C','B','A','A','C','B','C','C','C','B'),
+ X = c(2,11,3,4,1,6,3,7,5,9,10,2,8),
+ Y = c(3,8,5,2,7,9,3,6,6,1,3,4,10))
>
> plot(Y ~ X, data = sample.df)
>
> dev.copy(png, 'Save.png')
png
3
> dev.off()
png
2
> dev.list()
png
2
> dev.off()
null device
1
> dev.list()
NULL
Why do I not get RStudioGD?
RStudio has its own graphics device, "RStudioGD". You can see it with dev.list(), where by default it is the first and only one.
R decouples rendering from output backends through the abstraction of devices. Which devices you can use is platform- and environment-dependent. dev.list() shows the stack of current devices.
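For example, in a fresh RStudio session (the exact numbering depends on which devices have been opened):
plot(1:10)   # opens the RStudio device if none is active
dev.list()   # typically shows something like:
             #   RStudioGD
             #           2
dev.cur()    # index of the currently active device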
If I understand your problem correctly, you want to display the graph in RStudio first and then decide whether to save it. Depending on how often you save an image, you could use the 'Export' button in the Plots pane in RStudio and save it manually.
Otherwise, your choice of trying to copy it would be the obvious one for me as well.
To my knowledge, R's device abstraction does not let you encapsulate a device as an object that could, for example, be passed as an argument to the function that does the actual plotting. Since dev.set() takes an index as its argument, passing that index around depends on the current state of the device stack.
I have not come up with a clean solution to this myself, and have sometimes resorted to bracketing the plot-rendering code with a call to a certain device and saving right after, switching devices depending on a global.
So, if you can, use RStudio's export functionality. Otherwise an abstraction would need to maintain the state of the global device stack and test that state extensively, because the stack is global and you cannot direct a plot call to a particular device; a plot simply goes to the current device (to my knowledge).
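A minimal sketch of that bracketing pattern (save_to_file is a hypothetical global toggle, not part of any package):
# If the toggle is on, open a file device before plotting and close it
# right after; otherwise the plot goes to the current (screen) device.
save_to_file <- TRUE
if (save_to_file) png("Save.png")
plot(Y ~ X, data = sample.df)
if (save_to_file) dev.off()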
Edit after OP comment
It seems that you are experiencing somewhat different behaviour if you cannot view the file after dev.off() but also need to quit RStudio. For some plotting frameworks you need to call print on the graphical object for it actually to be written to the file. Perhaps RStudio does this at shutdown as part of the normal teardown of open devices? In that case the file should be empty if you forcibly look at its contents before quitting RStudio.
The other thing that sometimes works is to call dev.off() twice. I don't know exactly why, but sometimes more devices get created than I anticipated. After you have done dev.off(), what does dev.list() show?
Edit after OP's edit
I can see that you do png(); dev.copy(); dev.off(). This leaves you with one more device opened than closed: the first graphics device you started is still open, as your listing shows. You can simply remove the dev.copy() call. The image will be saved on dev.off() and you should then be able to open it from the filesystem.
As to why you cannot see the RStudio graphics device, I am not entirely sure. It might be that other code is messing with your device stack. I would check in a clean session whether it is there, to make sure nothing is tampering with the device stack. From the RStudio forums and other SO questions there seem to have been plot-pane problems in RStudio that were resolved by updating RStudio to the latest version. If that is a viable solution for you, I would try that.
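Putting that together, a minimal sketch of the view-then-save workflow (using the sample data from the question): draw on the screen device first, then copy to a file.
plot(Y ~ X, data = sample.df)  # shows in the RStudio Plots pane
dev.copy(png, 'Save.png')      # copy the on-screen plot to a PNG device
dev.off()                      # close the PNG device; the file is now written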
I've just added support for RStudio's RStudioGD device to the developer's version of the R.devices package (I'm the author). This will allow you to do the following in RStudio:
library("R.devices")
sample.df <- data.frame(
  group = c('A','B','A','C','B','A','A','C','B','C','C','C','B'),
  X = c(2,11,3,4,1,6,3,7,5,9,10,2,8),
  Y = c(3,8,5,2,7,9,3,6,6,1,3,4,10)
)
figs <- devEval(c("RStudioGD", "png"), name = "foo", {
  plot(Y ~ X, data = sample.df)
})
You can specify any set of output target types, e.g. c("RStudioGD", "png", "pdf", "x11"). The devices that output to file will by default write the files in folder figures/ with filenames as <name>.<ext>, e.g. figures/foo.png in the above example.
The value of the call, figs, holds references to all figures produced, e.g. figs$png. You can open them directly from R using the ! operator. For example:
> figs$png
[1] "figures/foo.png"
> !figs$png
[1] "figures/foo.png"
The latter call should show the PNG file using your system's PNG viewer.
Until I submit these updates to CRAN, you can install the developer's version (2.15.1.9000) as:
remotes::install_github("HenrikBengtsson/R.devices#develop")

Save all plots already present in the plots panel of RStudio

I've made different plots (more than a hundred) for a project and I haven't captured them along the way (yes, it's bad, I know). Now I need to save them all at once, but without running my script again (which takes hours). Is there a way to do so within RStudio?
Edit: All the plots are already there and I don't want to run them again.
In RStudio, every session has a temporary directory that can be obtained using tempdir(). Inside that temporary directory, there is another directory that always starts with "rs-graphics" and contains all the plots saved as ".png" files. Therefore, to get the list of ".png" files you can do the following:
plots.dir.path <- list.files(tempdir(), pattern = "rs-graphics", full.names = TRUE)
plots.png.paths <- list.files(plots.dir.path, pattern = "\\.png$", full.names = TRUE)
Now, you can copy these files to your desired directory, as follows:
file.copy(from=plots.png.paths, to="path_to_your_dir")
Additional feature:
As you will notice, the .png file names are automatically generated (e.g., 0078cb77-02f2-4a16-bf02-0c5c6d8cc8d8.png). So if you want to number the .png files according to their plotting order in RStudio, you may do so as follows:
plots.png.details <- file.info(plots.png.paths)
plots.png.details <- plots.png.details[order(plots.png.details$mtime), ]
sorted.png.names <- gsub(plots.dir.path, "path_to_your_dir", row.names(plots.png.details), fixed = TRUE)
numbered.png.names <- paste0("path_to_your_dir/", 1:length(sorted.png.names), ".png")
# Rename all the .png files as: 1.png, 2.png, 3.png, and so on.
file.rename(from = sorted.png.names, to = numbered.png.names)
Hope it helps.
Although this discussion has been inactive for a while, some people, like myself, still come across the same problem, and the other solutions don't really address what the actual question is.
So, hands on. Your plot history gets saved in a variable called .SavedPlots. You can either access it directly, assign it to another variable in code, or do the latter from the Plots window.
# ph for plot history
ph <- .SavedPlots
In R 3.4.2, I could index ph to reproduce the corresponding plot on a device. What follows is rather straightforward:
1. Open a new device (png, jpeg, pdf...).
2. Reproduce your plot: ph[index_of_plot_in_history].
3. Close the device (or keep plotting if it is a pdf with multiple pages).
Example:
for (i in 1:lastplot) {  # lastplot: the number of plots in the history
  png(sprintf('plotname%03d.png', i))  # a unique filename per plot
  print(ph[i])
  dev.off()
}
Note: sometimes this doesn't work because of poor programming. For instance, I was using the MICE package to impute many datasets with a large number of variables, and plotting as shown in section 4.3 of this paper. The problem was that only three variables per plot were displayed, and if I used a png device in my code, only the last plot of each dataset would be saved. However, if the plots were printed to a window, all the plots of each dataset were recorded.
If your plots are 3D (made with the rgl package, which provides snapshot3d()), you can take a snapshot of all your plots and save them in .png format:
library(rgl)
snapshot3d(filename = '../Plots/SnapshotPlots.png', fmt = 'png')
Or else, the best way is to create a multi-panel plotting window with par(mfrow = ...). Try the following:
plotsPath <- "../Plots/allPlots.pdf"
pdf(file = plotsPath)
for (x in seq(1, 100)) {
  par(mfrow = c(2, 1))
  p1 <- rnorm(x)
  p2 <- rnorm(x)
  plot(p1, p2)
}
dev.off()
You can also use the png, bmp, tiff, and jpeg devices instead of pdf. You can read about their advantages and disadvantages and choose the one that best fits your needs.
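For instance, the bitmap devices write one file per page, and a C-style integer format in the filename gives numbered files (standard base-R device behaviour):
# Each page goes to its own numbered file: allPlots001.png, allPlots002.png, ...
png("../Plots/allPlots%03d.png", width = 600, height = 600)
for (x in seq(1, 100)) {
  plot(rnorm(x), rnorm(x))
}
dev.off()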
I am not sure how RStudio opens the device where the plots are drawn, but I guess it uses dev.new(). In that case, one quick way to save all opened graphs is to loop through all the devices and write each one out using dev.print.
Something like:
lapply(dev.list(), function(d) {
  dev.set(d)  # make device d the current device
  dev.print(pdf, file = file.path(folder, paste0("graph_", d, ".pdf")))
})
where folder is the path of the folder in which you want to store your graphs (for example, folder = "~" if you are on Linux and want to store them all in your home folder).
If you open a device with the following function, everything that follows will be saved into one document:
pdf("nameofthedocument.pdf")
plot(x ~ y)
plot(...)  # further plots, one per page
dev.off()
You can also use tiff(), jpeg(), etc.; see ?pdf.

R: Bitmap output in PDF

A lot of the time, I find it very useful to output graphics with pdf() as it allows me to scroll through pages and observe subtle differences (e.g. the page numbers may correspond to a particular parameter in a simulation).
Sometimes if the plot is quite packed with information, the fact that the PDF is a vector graphic means that it takes a long time to load in a PDF reader and is useless for scrolling through pages. I could plot with png(), but this would result in many image files.
My ideal solution would be to have a device that will plot a bitmap graphic (e.g. PNG) to a PDF.
I have read that cairo_pdf() outputs to a bitmap sometimes? Or I could write something that outputs to PNG, then combines these all together into a PDF?
Any other thoughts? Or does anyone have a solution for this already?
UPDATE: I have now added a method based on readPNG(), as suggested in the comments above. It's a bit slower (3 s vs 9 s) and seems to result in slightly larger file sizes than ImageMagick. rasterImage() interpolation makes no difference to file size or timing, but alters the appearance slightly; if it's FALSE, the output looks the same as ImageMagick's.
I have just come up with the following solution using ImageMagick. It's not perfect, but it seems to work well so far.
png2pdf <- function(name = NULL, removepngs = TRUE, method = "imagemagick",
                    pnginterpolate = FALSE) {
  # Run the png() function with a filename of the form name%03d.png,
  # then the actual plotting functions, e.g. plot(), lines() etc.,
  # then dev.off().
  # Then run png2pdf() and specify the name= argument if other pngs exist
  # in the directory.
  # Need to incorporate a way of dealing with non-square plots.
  if (is.null(name)) {
    names <- list.files(pattern = "[.]png")
    name <- unique(sub("[0-9][0-9][0-9][.]png", "", names))
    if (length(name) != 1) stop("png2pdf() error: Check filenames")
  } else {
    names <- list.files(pattern = paste0(name, "[0-9][0-9][0-9][.]png"))
  }
  if (method == "imagemagick") {
    # Can change this to "convert" if it is correctly in the system path
    cmd <- c('C:\\Program Files\\ImageMagick-6.9.0-Q16\\convert.exe',
             names, paste0(name, ".pdf"))
    system2(cmd[1], cmd[-1])
  } else if (method == "readPNG") {
    library(png)
    pdf(paste0(name, ".pdf"))
    par(mar = rep(0, 4))
    for (i in 1:length(names)) {
      plot(c(0, 1), c(0, 1), type = "n")
      rasterImage(readPNG(names[i]), 0, 0, 1, 1, interpolate = pnginterpolate)
    }
    dev.off()
  }
  if (removepngs) file.remove(names)
}
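A hedged usage sketch, following the workflow described in the function's own comments (the filenames and plot content are placeholders; note that removepngs = TRUE deletes the intermediate PNGs afterwards):
# Write numbered PNGs via the %03d pattern, then bundle them into one PDF
png("simulation%03d.png", width = 800, height = 800)
for (p in 1:5) plot(rnorm(100), main = paste("parameter =", p))
dev.off()
png2pdf(name = "simulation", method = "readPNG")  # produces simulation.pdf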

How to join PDF files in R?

I've an R program that outputs a booklet of graphics as a PDF file onto the local server. There's a separate PDF file, an introduction piece not written in R, that I would like to join my output to.
I can complete this in Adobe and R-bloggers has the process here, both of which involve joining the files by hand, as it were:
http://www.r-bloggers.com/splitting-and-combining-r-pdf-graphics/
But what I'd really prefer to do is just run my code and have the files joined. I wasn't able to find similar posts while searching for "[R] Pdf" and "join", "merge", "import pdf", etc.
My intent is to run the code for a different ID number ("Physician") each time. The report will save as a PDF titled by ID number on the server, and the same addendum would be joined to each document.
Here's the current code creating the R report.
Physician <- 1
# Creates handle for file name and location using ID
Jumanji <- paste("X:\\Feedback_ID_", Physician, ".pdf", sep = "")
# PDF graphics device on, using file handle
pdf(file = Jumanji, 8.5, 11)
Several plots for this ID occur here and then the PDF is completed with dev.off().
dev.off()
I think I need to pull the outside document into R and reference it in between the opening and closing, but I haven't been successful here.
To do this in R, follow @cbeleites' suggestion (who, I think, is rightly suggesting you move your whole workflow to knitr) to do just this bit in Sweave/knitr. knit the following to PDF, where test.pdf is the report you're appending to, and you'll get the result you want:
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
\includepdf{test.pdf} % your other document
<<echo=FALSE>>=
x <- rnorm(100)
hist(x)
# or whatever you need to do to get your plot
@
\end{document}
Also, the post you link to seems like overkill, because it's easy to combine plots into a single PDF in R (in fact it's the default behaviour). Simply leave the pdf device open with its parameter onefile = TRUE (the default).
x <- rnorm(100)
y <- rnorm(100)
pdf("test.pdf")
hist(x)
hist(y)
dev.off()
Plots will automatically get paginated.
You can also consider something like:
library(qpdf)
path_PDF1 <- "C:/1.pdf"
path_PDF2 <- "C:/2.pdf"
pdf_combine(input = c(path_PDF1, path_PDF2), output = "C:/merged.pdf")
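Applied to the workflow in the question above, a hedged sketch (the introduction's path is an assumption) could look like:
library(qpdf)
intro  <- "X:\\Introduction.pdf"  # assumed path to the introduction piece
report <- paste0("X:\\Feedback_ID_", Physician, ".pdf")  # produced by pdf()/dev.off()
pdf_combine(input = c(intro, report),
            output = paste0("X:\\Final_ID_", Physician, ".pdf"))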

R: Improving workflow and keeping track of output

I have what I think is a common enough issue concerning workflow optimisation in R. Specifically, how can I avoid the common problem of having a folder full of output (plots, RData files, csv, etc.) without, after some time, having a clue where they came from or how they were produced? In part, it surely involves trying to be intelligent about folder structure. I have been looking around, but I'm unsure what the best strategy is.
So far I have tackled it in a rather unsophisticated (overkill) way: I created a function MetaInfo (see below) that writes a text file with metadata, with a given file name. The idea is that if a plot is produced, this command is issued to produce a text file with exactly the same file name as the plot (except, of course, the extension), with information on the system, session, packages loaded, R version, the function and file the metadata function was called from, etc. The questions are:
(i) How do people approach this general problem? Are there obvious ways to avoid the issue I mentioned?
(ii) If not, does anyone have any tips on improving this function? At the moment it's perhaps clunky and not ideal. In particular, getting the file name from which the plot is produced doesn't necessarily work (the solution I use is one provided by @hadley in [1]). Any ideas would be welcome!
The function assumes git, so please ignore the probable warning produced. This is the main function, stored in a file metainfo.R:
MetaInfo <- function(message=NULL, filename)
{
  # message  - character string - Any message to be written into the
  #            information file (e.g., data used).
  # filename - character string - the name of the txt file (including relative
  #            path). Should be the same as the output file it describes
  #            (RData, csv, pdf).
  #
  if (is.null(filename))
  {
    stop('Provide an output filename - parameter filename.')
  }
  filename <- paste(filename, '.txt', sep='')
  # Try to get as close as possible to getting the file name from which the
  # function is called.
  source.file <- lapply(sys.frames(), function(x) x$ofile)
  source.file <- Filter(Negate(is.null), source.file)
  t.sf <- try(source.file <- basename(source.file[[length(source.file)]]),
              silent=TRUE)
  if (inherits(t.sf, 'try-error'))
  {
    source.file <- NULL
  }
  func <- deparse(sys.call(-1))
  # MetaInfo isn't always called from within another function, so func could
  # return as NULL or as the general environment.
  if (any(grepl('eval', func, ignore.case=TRUE)))
  {
    func <- NULL
  }
  time <- strftime(Sys.time(), "%Y/%m/%d %H:%M:%S")
  git.h <- system('git log --pretty=format:"%h" -n 1', intern=TRUE)
  meta <- list(Message=message,
               Source=paste(source.file, ' on ', time, sep=''),
               Functions=func,
               System=Sys.info(),
               Session=sessionInfo(),
               Git.hash=git.h)
  sink(file=filename)
  print(meta)
  sink(file=NULL)
}
which can then be called in another function, stored in another file, e.g.:
source('metainfo.R')

RandomPlot <- function(x, y)
{
  fn <- 'random_plot'
  pdf(file=paste(fn, '.pdf', sep=''))
  plot(x, y)
  MetaInfo(message=NULL, filename=fn)
  dev.off()
}

x <- 1:10
y <- runif(10)
RandomPlot(x, y)
This way, a text file with the same file name as the plot is produced, with information that could hopefully help figure out how and where the plot was produced.
In terms of general R organization: I like to have a single script that recreates all work done for a project. Any project should be reproducible with a single click, including all plots or papers associated with that project.
So, to stay organized: keep a different directory for each project; each project has its own functions.R script to store non-package functions associated with that project, and each project has a master script that starts like:
## myproject
source("functions.R")
source("read-data.R")
source("clean-data.R")
etc., all the way through. This should help keep everything organized, and if you get new data you just go back to the early scripts to fix up headers or whatever and rerun the entire project with a single click.
There is a package called ProjectTemplate that helps organize and automate the typical workflow with R scripts, data files, charts, etc. There are also a number of helpful documents, such as Workflow of statistical data analysis by Oliver Kirchkamp.
If you use Emacs and ESS for your analyses, learning Org-Mode is a must. I use it to organize all my work. Here is how it integrates with R: R Source Code Blocks in Org Mode.
There is also this new free tool called Drake which is advertised as "make for data".
I think my question belies a certain level of confusion. Having looked around, as well as explored the suggestions provided so far, I have reached the conclusion that it is probably not important to know where and how a file was produced. You should in fact be able to wipe out any output and reproduce it by rerunning the code. So while I might still use the above function for extra information, it really is a question of being ruthless and indeed cleaning up folders every now and then. These ideas are more eloquently explained here. This of course does not preclude the use of Make/Drake or ProjectTemplate, which I will try to pick up on. Thanks again for the suggestions @noah and @alex!
There is also now an R package called drake (Data Frames in R for Make), independent of Factual's Drake. The R package is also a Make-like build system that links code/dependencies with output.
install.packages("drake") # It is on CRAN.
library(drake)
load_basic_example()
plot_graph(my_plan)
make(my_plan)
Like its predecessor remake, it has the added bonus that you do not have to keep track of a cumbersome pile of files. Objects generated in R are cached during make() and can be reloaded easily.
readd(summ_regression1_small) # Read objects from the cache.
loadd(small, large) # Load objects into your R session.
print(small)
But you can still work with files as single-quoted targets. (See 'report.Rmd' and 'report.md' in my_plan from the basic example.)
There is a package developed by RStudio called pins that might also address this problem.
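A minimal sketch of the pins idea, assuming the pins >= 1.0 API (board_local(), pin_write() and pin_read() are real pins functions; the pin name is a placeholder):
library(pins)
board <- board_local()  # a board backed by a local folder
pin_write(board, mtcars, name = "analysis-output")  # store an object with metadata
pin_read(board, "analysis-output")  # retrieve it later, in any session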
