In an R Markdown document, I want to accelerate the knitting process by building the plots only when they have not already been built and saved.
I did this using the code below, with a simple example.
x = rnorm(10)
if (! "figurename.png" %in% dir("figure")) { # compare against the full file name, including the extension
png("figure/figurename.png")
hist(x)
dev.off()
}
Now I want to make a function that does the above automatically, with a plot call as input. Also, the plot call should not be evaluated beforehand (too slow!). I learned about the substitute command and wrote this:
x = rnorm(10)
plot_call = substitute(hist(x))
function(plot_call, figurename){
if (! figurename %in% dir("figure")) {
png(file.path("figure", figurename))
eval(plot_call)
dev.off()
}
knitr::include_graphics(file.path("figure", figurename))
}
I have two issues with this:
- it does not seem to work with multi-line plot calls
- it seems like a dubious hack
What do you think? Is there a better way?
A more formal way to cache code blocks is to use chunk options. Adding cache = TRUE to the chunk header makes knitr store the chunk's results and re-evaluate the chunk only when its options or its code change:
```{r expensive_plot, cache = TRUE}
# Some expensive plot
df %>%
ggplot(aes(x, y)) +
geom_point()
```
If the plot needs to be recomputed whenever the underlying data changes, you can invalidate the cache each time the file's modification time changes by adding cache.extra = file.mtime('your-csv.csv') to the chunk options.
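If you prefer to keep a hand-rolled helper like the one in the question, here is a minimal sketch. It assumes base graphics and a PNG device; `cached_png` and its arguments are names made up for this example, not part of knitr. Calling substitute() inside the function captures the plot code unevaluated, and wrapping several statements in braces covers multi-line plot calls.

```r
# Hypothetical helper (not part of knitr): draw the plot only if the PNG is missing.
cached_png <- function(plot_expr, figurename, dir = "figure") {
  expr   <- substitute(plot_expr)   # capture the code without evaluating it
  caller <- parent.frame()          # environment the plot code should run in
  path   <- file.path(dir, figurename)
  if (!file.exists(path)) {
    dir.create(dir, showWarnings = FALSE, recursive = TRUE)
    png(path)
    # run the plot code, and close the device even if it fails
    tryCatch(eval(expr, envir = caller), finally = dev.off())
  }
  invisible(path)
}

# Usage: braces let the plot call span several statements.
x <- rnorm(10)
fig <- cached_png({
  hist(x)
  abline(v = mean(x), col = "red")
}, "figurename.png", dir = file.path(tempdir(), "figure"))
# In an R Markdown chunk you would then call knitr::include_graphics(fig).
```

Note that a ggplot object would need an explicit print() inside the braces, since eval() does not auto-print it.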
I want to create a footer within the float for a figure created with ggplot2 in an rmarkdown document that is generating a .pdf file via LaTeX.
My question: Is there a way within rmarkdown/knitr to add more LaTeX commands within the figure environment?
Specifically, I'd like to find a way to insert custom LaTeX using either the floatrow or caption* macro as described in https://tex.stackexchange.com/questions/56529/note-below-figure within the figure environment.
When I looked at the chunk options (https://yihui.org/knitr/options/#plots), something like out.extra seems close to what I want, but that is used as an extra option to \includegraphics while I want access to put extra LaTeX within the figure environment, outside of any other LaTeX command.
The solution to your question is perhaps quite similar to this one. However, I believe yours is a bit more general, so I'll try to be a bit more general as well...
As far as I know, there's no simple solution to add extra LaTeX code within the figure environment. What you can do is update the knit (or output) hook (i.e. the LaTeX code output generated by the figure chunk).
The source code for the LaTeX figure output hook can be found here (hook_plot_tex). The output generated can be found starting at line 159. Here we can see how the output is structured and we're able to modify it before it reaches the latex engine.
However, we only want to modify it for the relevant figure chunks, not all of them. This is where Martin Schmelzer's answer comes in handy. We can create a new chunk option that controls when the modification is activated. As an example that enables the use of caption* and floatrow, we can define the following knit hook:
defOut <- knitr::knit_hooks$get("plot")
knitr::knit_hooks$set(plot = function(x, options) {
#reuse the default knit_hook which will be updated further down
x <- defOut(x, options)
#Make sure the modifications only take place when we enable the customplot option
if(!is.null(options$customplot)) {
x <- gsub("caption", "caption*", x) #(1)
inter <- sprintf("\\floatfoot{%s}\\end{figure}", options$customplot[1]) #(2)
x <- gsub("\\end{figure}", inter, x, fixed=T) #(3)
}
return(x)
})
What we're doing here is (1) replacing the \caption command with \caption*, (2) defining the custom floatfoot text input, (3) replacing \end{figure} with \floatfoot{custom text here}\end{figure} such that floatfoot is inside the figure environment.
As you can probably tell, the sky's the limit for what you can add or replace in the figure environment. Just make sure it is added inside the environment and in the appropriate location. See the example below for how the chunk option is used to enable floatfoot and caption*. (You can also split the customplot option into e.g. starcaption and floatfoot by simply dividing up the !is.null(options$customplot) condition. This should allow for better control.)
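To see what the two gsub() calls do without knitting anything, you can run them on a hand-written string. The LaTeX below is only an assumed approximation of what the default plot hook emits (the real output has more pieces), and `add_floatfoot` is a name invented for this sketch. It matches `\caption{` literally so that the word "caption" inside the caption text is left alone, which is a slight tightening of step (1).

```r
# Assumed, simplified stand-in for the figure code produced by the default hook.
tex <- paste(
  "\\begin{figure}",
  "\\includegraphics{plot-1}",
  "\\caption{A caption}\\label{fig:plot}",
  "\\end{figure}",
  sep = "\n"
)

# Hypothetical helper that mirrors steps (1) to (3).
add_floatfoot <- function(tex, note) {
  tex <- gsub("\\caption{", "\\caption*{", tex, fixed = TRUE)   # (1)
  inter <- sprintf("\\floatfoot{%s}\\end{figure}", note)        # (2)
  gsub("\\end{figure}", inter, tex, fixed = TRUE)               # (3)
}

out <- add_floatfoot(tex, "This is float text")
cat(out, "\n")
```

With fixed = TRUE both the pattern and the replacement are taken literally, which is why the backslashes survive.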
Working example:
---
header-includes:
- \usepackage[capposition=top]{floatrow}
- \usepackage{caption}
output: pdf_document
---
```{r, echo = F}
library(ggplot2)
defOut <- knitr::knit_hooks$get("plot")
knitr::knit_hooks$set(plot = function(x, options) {
x <- defOut(x, options)
if(!is.null(options$customplot)) {
x <- gsub("caption", "caption*", x)
inter <- sprintf("\\floatfoot{%s}\\end{figure}", options$customplot[1])
x <- gsub("\\end{figure}", inter, x, fixed=T)
}
return(x)
})
```
```{r echo = F, fig.cap = "Custom LaTeX hook chunk figure", fig.align="center", customplot = list("This is float text using floatfoot and floatrow")}
ggplot(data = iris, aes(x=Sepal.Length, y=Sepal.Width))+
geom_point()
```
PS
The example above requires the fig.align option to be enabled. Should be fairly easy to fix, but I didn't have the time to look into it.
@henrik_ibsen gave the answer that got me here. I made some modifications to the code, and this is what I ended up using to make it work a bit more simply:
hook_plot_tex_footer <- function(x, options) {
x_out <- knitr:::hook_plot_tex(x, options)
if(!is.null(options$fig.footer)) {
inter <- sprintf("\\floatfoot{%s}\n\\end{figure}", options$fig.footer[1])
x_out <- gsub(x=x_out, pattern="\n\\end{figure}", replacement=inter, fixed=TRUE)
}
x_out
}
knitr::knit_hooks$set(plot=hook_plot_tex_footer)
Overall, this does the same thing. But it uses knitr:::hook_plot_tex() instead of defOut(), so it still works if rerun in the same session. And since the text goes into a \floatfoot specifically, I named the option fig.footer. Those changes are minor, though, and the credit for the answer definitely goes to @henrik_ibsen.
I am very new to knitr and I am trying to do the following in an R Markdown file.
I have a set of names, for each name I make two plots. I need to get a HTML file for each name, containing the respective plots.
{r, echo=FALSE}
for (name in setofNames) {
barplot(xx)
barplot(yy)
}
I am quite lost on how to do this. Does anyone have any ideas?
EDIT:
I am now able to generate a different HTML file for each name using stitch(). However, I don't get all the plots: the code I have retains only the last iteration. I've also explored opts_chunk, but in vain. It probably has something to do with clearing the cache, but I am not sure.
Below is the piece of code:
for (name in setofNames) {
opts_chunk$set(echo=FALSE, fig.keep='all', fig.show='asis')
fname=paste0(name,".html") # paste0 avoids the space that paste() would insert before ".html"
stitch_rhtml("../testSub.r",output=fname,envir=globalenv())
}
===testSub.r file===
barplot(xx)
barplot(yy)
Would appreciate some inputs.
You could use the par function to get what you need. Also, I usually remove the "echo=FALSE" because it messes up my knitted html.
http://www.statmethods.net/advgraphs/layout.html.
Here is an example of a chunk that draws several plots together in one figure when knitted:
```{r}
df<- replicate(100, runif(n=20))
par(mfrow=c(2,3))
for (i in 2:7) hist(df[,i],main=colnames(df)[i])
```
Let me know if you need more specific help and I'll edit this post.
One solution is to use a "control" file that calls knitr several times (once per name). Each time knitr processes the same Rmd-template but with different data.
In the code below I exploited the fact that knitr by default uses the objects in the calling environment (see ?knit: envir = parent.frame()). Hence it is possible to modify an object in the "control" file and a subsequent call of knitr will use that object when processing the template.
(Of course, global variables could be avoided. Then, the control file would need to assign objects in a specific environment and pass this environment to knitr.)
The "control" file (control.R) could look like this:
library(knitr)
## Generate data
set.seed(1)
n <- 1000
dat <- data.frame(
name = factor(sample(x = LETTERS, size = n, replace = TRUE)), # factor, so that levels() works in R >= 4.0
value = rnorm(n))
## knit the template once per "name"
lapply(X = levels(dat$name), FUN = function(name) {
currentSubset <- dat[dat$name == name, ]
knit2html(input = "template.Rmd", output = sprintf("output_%s.html", name))
})
template.Rmd:
```{r}
op <- par(mfrow = c(1, 2))
plot(currentSubset$value, col = "green", main = name)
plot(currentSubset$value, col = "red", main = name)
par(op)
```
This generates a separate HTML file (output_[Letter].html) for each letter in levels(dat$name).
Note that each call to knit2html overwrites the plots in the figure directory. However, this does not matter because the HTML files do not reference external figures but embed the figures as data URIs. This is due to markdown::markdownToHTML(), which is called from knitr::knit2html():
Any local images linked using the <img> tag will be base64 encoded and included in the output HTML.
(Source: markdownToHTML)
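The parenthetical remark about avoiding global variables could be sketched roughly as follows. `make_env` and `render_one` are names made up for this example; the only knitr feature relied on is the envir argument of knit2html(). Each template run gets its own environment holding just the objects it needs.

```r
# Hypothetical helper: build a fresh environment for one "name".
make_env <- function(dat, name) {
  env <- new.env(parent = globalenv())   # functions such as plot() are still found
  env$name <- name
  env$currentSubset <- dat[dat$name == name, ]
  env
}

# Hypothetical helper: knit the template once, inside that environment.
render_one <- function(dat, name, template = "template.Rmd") {
  knitr::knit2html(
    input  = template,
    output = sprintf("output_%s.html", name),
    envir  = make_env(dat, name)
  )
}

# Usage (needs knitr and template.Rmd in the working directory):
# for (nm in unique(as.character(dat$name))) render_one(dat, nm)
```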
I am trying to create a PDF with several plots. More specifically, I want to save my plots, 4 on each page. I have the following code in R (which works, but leaves the first page empty):
pdf("Plots/plots_numeric_four_in_page.pdf",paper="a4r",width = 14)
graphlist <- lapply(3:NCOL(agg_num), function(i) {
force(i)
tempColName=dataName_num[i]
print (tempColName)
p<-qplot(Group.1,agg_num[[tempColName]],data = agg_num,color=Group.2,geom = "line",main=tempColName) + xlab("Date") + ylab(paste("Count of ", tempColName)) + geom_line(size=1.5)+ scale_x_date(labels = date_format("%m/%Y"))+
theme(legend.position="bottom",legend.direction="horizontal")+ guides(col=guide_legend(ncol=3))
})
do.call("marrangeGrob",c(graphlist,ncol=2,nrow=2))
dev.off()
It correctly writes around 50 plots to the PDF, 4 on each page. However, it leaves the first page empty and starts from the second. I looked at the marrangeGrob options, but I couldn't find anything to address the problem. Do you know of any workaround, or any way to resolve this issue?
There's a known bug between ggplot2 and gridExtra that causes this for some marrangeGrob results that contain ggplots. Manually overriding the grid.draw.arrangelist function (src) (marrangeGrob returns an arrangelist object) may fix it (suggested here).
grid.draw.arrangelist <- function(x, ...) {
for(ii in seq_along(x)){
if(ii>1) grid.newpage() # skips grid.newpage() call the first time around
grid.draw(x[[ii]])
}
}
It may be safer to define a new class for the arrangelist object in question and apply the fix to it than to override grid.draw for every marrangeGrob call in scope.
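A rough sketch of that subclass idea, using only the grid package. The class name `arrangelist_skipfirst` and the helper `skip_first_page` are hypothetical, and a plain list of two grobs stands in for a real marrangeGrob() result.

```r
library(grid)

# Method for the hypothetical subclass; other arrangelist objects are unaffected.
grid.draw.arrangelist_skipfirst <- function(x, ...) {
  for (ii in seq_along(x)) {
    if (ii > 1) grid.newpage()   # no new page before the first element
    grid.draw(x[[ii]])
  }
  invisible(x)
}

# Tag one object with the subclass so that grid.draw() picks the method above.
skip_first_page <- function(x) {
  class(x) <- c("arrangelist_skipfirst", class(x))
  x
}

# Stand-in for the result of marrangeGrob(): a plain list of two grobs.
pages <- skip_first_page(list(rectGrob(), circleGrob()))

pdf(NULL)          # null PDF device, nothing is written to disk
grid.draw(pages)   # dispatches to grid.draw.arrangelist_skipfirst
dev.off()
```

With a real marrangeGrob() result you would presumably call grid.draw(skip_first_page(ml)) explicitly between pdf() and dev.off() rather than rely on auto-printing; I have not checked this against every gridExtra version.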
I do a lot of data exploration in R and I would like to keep every plot I generate (from the interactive R console). I am thinking of a directory where everything I plot is automatically saved as a time-stamped PDF. I also do not want this to interfere with the normal display of plots.
Is there something that I can add to my ~/.Rprofile that will do this?
The general idea is to write a script generating the plot in order to regenerate it. The ESS documentation (in a README) says it well under 'Philosophies for using ESS':
The source code is real. The objects are realizations of the
source code. Source for EVERY user modified object is placed in a
particular directory or directories, for later editing and
retrieval.
With any editor that allows stepwise (or regionwise) execution of commands, you can keep track of your work this way.
The best approach is to use a script file (or sweave or knitr file) so that you can just recreate all the graphs when you need them (into a pdf file or other).
But here is the start of an approach that does the basics of what you asked:
savegraphs <- local({i <- 1;
function(){
if(dev.cur()>1){
filename <- sprintf('graphs/SavedPlot%03d.pdf', i)
dev.copy2pdf( file=filename )
i <<- i + 1
}
}
})
setHook('before.plot.new', savegraphs )
setHook('before.grid.newpage', savegraphs )
Now just before you create a new graph the current one will be saved into the graphs folder of the current working folder (make sure that it exists). This means that if you add to a plot (lines, points, abline, etc.) then the annotations will be included. However you will need to run plot.new in order for the last plot to be saved (and if you close the current graphics device without running another plot.new then that last plot will not be saved).
This version will overwrite plots saved from a previous R session in the same working directory. It will also fail if you use something other than base or grid graphics (and maybe even with some complicated plots). I would not be surprised if some extra plots show up on occasion (when a plot is created internally to get some parameters, then immediately replaced with the one of interest). There are probably other things that I have overlooked as well, but this might get you started.
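One way to get around the overwriting, and closer to the time-stamped files asked for, is to build the file name from the current time plus a counter. This is only a sketch; `make_plot_namer` and `next_name` are names invented here, and the counter is kept so that two plots saved within the same second do not collide.

```r
# Hypothetical helper: returns a function that yields a fresh,
# time-stamped file name on every call.
make_plot_namer <- function(dir = "graphs", prefix = "SavedPlot") {
  i <- 0
  function() {
    i <<- i + 1
    stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
    file.path(dir, sprintf("%s_%s_%03d.pdf", prefix, stamp, i))
  }
}

next_name <- make_plot_namer()
next_name()
# Inside savegraphs() one could then use: dev.copy2pdf(file = next_name())
```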
You could write your own wrapper functions for your commonly used plot functions. Each wrapper would produce both the on-screen display and a time-stamped PDF version. You could source() these functions in your ~/.Rprofile so that they are available every time you run R.
For lattice's xyplot, using the windows device for the on-screen display:
library(lattice)
my.xyplot <- function(...){
dir.create(file.path("~","RPlots"), showWarnings = FALSE) # do not warn if the directory already exists
my.chart <- xyplot(...)
trellis.device(device="windows",height = 8, width = 8)
print(my.chart)
trellis.device(device = "pdf",
file = file.path("~", "RPlots",
paste("xyplot",format(Sys.time(),"_%Y%m%d_%H-%M-%S"),
".pdf", sep = "")),
paper = "letter", width = 8, height = 8)
print(my.chart)
dev.off()
}
my.data <- data.frame(x=-100:100)
my.data$y <- my.data$x^2
my.xyplot(y~x,data=my.data)
As others have said, you should probably get in the habit of working from an R script, rather than working exclusively from the interactive terminal. If you save your scripts, everything is reproducible and modifiable in the future. Nonetheless, a "log of plots" is an interesting idea.
I have a set of survey data, and I'd like to generate plots of a particular variable, grouped by the respondent's country. The code I have written to generate the plots so far is:
countries <- isplit(drones, drones$v3)
foreach(country = countries) %dopar% {
png(file = paste(output.exp, "/Histogram of Job Satisfaction in ", country$key[[1]], ".png", sep = ""))
country.df <- data.frame(country) #ggplot2 doesn't appreciate the lists nextElem() produces
ggplot(country.df, aes(x = value.v51)) + geom_histogram()
dev.off()
}
The truly bizarre thing? I can run the isplit(), set country <- nextElem(countries), and then run through the code without sending the foreach line - and get a lovely plot. If I send the foreach, I get some blank .png files.
I can definitely do this with standard R loops, but I'd really like to get a better grasp on foreach.
You need to print the plot if you want it to display:
print(ggplot(country.df, aes(x = value.v51)) + geom_histogram())
By default, ggplot commands return a plot object, but the command itself does not actually display the plot; that is done by print. Note that when you run code interactively, the results of top-level commands get printed, which is why you often don't need the explicit print. But when wrapping the code in a foreach, you need to print explicitly, since the results of the commands in the body are not echoed.
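The same rule can be seen with plain base R, without ggplot2 or foreach: a value is echoed only when it is the visible result of a top-level expression, so inside a loop body it has to be printed explicitly. A small sketch that uses capture.output() to record what gets printed:

```r
# Inside a loop body, a bare value is not auto-printed.
silent <- capture.output(for (i in 1:2) i)          # captures nothing
shown  <- capture.output(for (i in 1:2) print(i))   # explicit print()

length(silent)  # 0
shown           # "[1] 1" "[1] 2"
```

A ggplot object behaves like `i` here: it is drawn only when it is printed.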