Difference: "Compile PDF" button in RStudio vs. knit() and knit2pdf()

TL;DR
What are the (possibly unwanted) side-effects of using knit()/knit2pdf() instead of the "Compile PDF" button [1] in RStudio?
Motivation
Most users of knitr seem to write their documents in RStudio and compile them using the "Compile PDF" / "Knit HTML" button. This works smoothly most of the time, but every once in a while there are special requirements that cannot be met with the compile button. In these cases, the solution is usually to call knit()/knit2pdf()/rmarkdown::render() (or similar functions) directly.
Some examples:
How to knit/Sweave to a different file name?
Is there a way to knitr markdown straight out of your workspace using RStudio?
Insert date in filename while knitting document using RStudio Knit button
Using knit2pdf() instead of the "Compile PDF" button usually offers a simple solution to such questions. However, this comes at a price: There is the fundamental difference that "Compile PDF" processes the document in a separate process and environment whereas knit2pdf() and friends don't.
This has implications, and the problem is that not all of them are obvious. Take the fact that knit() uses objects from the global environment (whereas "Compile PDF" does not) as an example. This might be obvious, and even the desired behavior, in cases like the second example above, but it is an unexpected consequence when knit() is used to overcome problems like those in examples 1 and 3.
Moreover, there are more subtle differences:
The working directory might not be set as expected.
Packages need to be loaded.
Some options that are usually set by RStudio may have unexpected values.
The Question and its goal
Whenever I read/write the advice to use knit2pdf() instead of "Compile PDF", I think "correct, but the user should understand the consequences …".
Therefore, the question here is:
What are the (possibly unwanted) side-effects of using knit()/knit2pdf() instead of the "Compile PDF" button in RStudio?
If there were a comprehensive (community wiki?) answer to this question, it could be linked in future answers that suggest using knit2pdf().
Related Questions
There are dozens of questions related to this one. However, they either propose only code that (more or less) reproduces the behavior of the RStudio button, or they explain what "basically" happens without mentioning the possible pitfalls. Others look very similar but turn out to be (very) special cases of it. Some examples:
Knit2html not replicating functionality of Knit HTML button in R Studio: Caching issue.
HTML outputs are different between using knitr in Rstudio & knit2html in command line: Markdown versions.
How to convert R Markdown to HTML? I.e., What does “Knit HTML” do in Rstudio 0.96?: Rather superficial answer by Yihui (explains what "basically" happens) and some options for reproducing the behavior of the RStudio button. Neither the suggested Sys.sleep(30) nor the "Compile PDF" log is insightful (both hints point to the same thing).
What does “Knit HTML” do in Rstudio 0.98?: Reproduce behavior of button.
About the answer
I think this question raises many of the issues that an answer should cover. However, there may be many more aspects I don't know about, which is why I am reluctant to answer it myself (though I might try if nobody answers).
Probably, an answer should cover three main points:
The new session vs. current session issue (global options, working directory, loaded packages, …).
A consequence of the first point: The fact that knit() uses objects from the calling environment (default: envir = parent.frame()) and implications for reproducibility. I tried to tackle the issue of preventing knit() from using objects from outside the document in this answer (second bullet point).
Things RStudio secretly does …
… when starting an interactive session (example) --> Not available when hitting "Compile PDF"
… when hitting "Compile PDF" (anything special besides the new session with the working directory set to the file processed?)
I am not sure about the right perspective on the issue. I think both "What happens when I hit 'Compile PDF' + implications" and "What happens when I use knit() + implications" are good approaches to the question.
[1] The same applies to the "Knit HTML" button when writing Rmd documents.

First of all, I think this question is easier to answer if you limit the scope to the "Compile PDF" button, because the "Knit HTML" button is a different story. "Compile PDF" is only for Rnw documents (R + LaTeX, or think Sweave).
I'll answer your question following the three points you suggested:
Currently RStudio always launches a new R session to compile Rnw documents, and it first changes the working directory to the directory of the Rnw file. You can imagine the process as a shell script like this:
cd path/to/your-Rnw-directory
Rscript -e "library(knitr); knit('your.Rnw')"
pdflatex your.tex
Note that the knitr package is always attached, and pdflatex may be replaced by another LaTeX engine (e.g., xelatex), depending on your RStudio configuration for Sweave documents. If you want to replicate this from your current R session, you may rewrite the script in R:
owd = setwd("path/to/your-Rnw-directory")
system2("Rscript", c("-e", shQuote("library(knitr); knit('your.Rnw')")))
system2("pdflatex", "your.tex")
setwd(owd)
which is not as simple as knitr::knit('path/to/your.Rnw'), in which case the working directory is not automatically changed, and everything is executed in the current R session (in the globalenv() by default).
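The shell-style steps above can also be wrapped in a small R function. This is only a sketch: `compile_rnw_fresh()` and its `latex`/`run_latex` arguments are hypothetical names (not part of knitr or RStudio), and it assumes knitr is installed for the child process and that the chosen LaTeX engine is on the PATH.

```r
# Hypothetical helper that mimics RStudio's "Compile PDF" for one Rnw file:
# knit in a *separate* R process started from the file's own directory.
compile_rnw_fresh <- function(rnw, latex = "pdflatex", run_latex = TRUE) {
  rnw <- normalizePath(rnw, mustWork = TRUE)
  owd <- setwd(dirname(rnw))          # like `cd path/to/your-Rnw-directory`
  on.exit(setwd(owd), add = TRUE)     # restore the caller's working directory

  # Knit in a new R session, so nothing from the current session leaks in
  rscript <- file.path(R.home("bin"), "Rscript")
  expr <- sprintf("library(knitr); knit('%s')", basename(rnw))
  status <- system2(rscript, c("-e", shQuote(expr)))
  if (status != 0) stop("knitting in the new R session failed")

  # Run the LaTeX engine on the generated .tex file (optional)
  tex <- sub("\\.[Rr]nw$", ".tex", basename(rnw))
  if (run_latex) system2(latex, tex)
  invisible(file.path(dirname(rnw), tex))
}
```

Usage would look like `compile_rnw_fresh("path/to/your.Rnw", latex = "xelatex")`. The working directory is changed only for the duration of the call and restored via on.exit(), while the knitting itself never sees objects from the calling session.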
Because the Rnw document is always compiled in a new R session, it won't use any objects in your current R session. This is hard to replicate through the envir argument of knitr::knit() alone in the current R session. In particular, you cannot use knitr::knit(envir = new.env()) because, although new.env() is a new environment, its default parent environment is parent.frame(), which is typically the globalenv(); you cannot use knitr::knit(envir = emptyenv()) either, because it is "too clean", and you will have trouble with objects even from the R base package. The only reliable way to replicate what the "Compile PDF" button does is what I described in the first point: system2("Rscript", c("-e", shQuote("library(knitr); knit('your.Rnw')"))), in which case knit() uses the globalenv() of a new R session.
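The environment argument can be probed directly in base R to see why none of these choices matches a fresh session. In this sketch (run at top level), `x_global` and `e1` to `e3` are illustrative names; the `baseenv()` case is an extra one not discussed above, shown because it hides globals but also hides attached packages.

```r
x_global <- 42

# new.env() defaults to parent = parent.frame(), i.e. globalenv() at top level
e1 <- new.env()
exists("x_global", envir = e1)   # TRUE: global objects are still found

# emptyenv() is "too clean": even base functions such as sum() are missing
e2 <- new.env(parent = emptyenv())
exists("sum", envir = e2)        # FALSE

# baseenv() hides globals, but also skips attached packages (stats, utils, ...)
e3 <- new.env(parent = baseenv())
exists("x_global", envir = e3)   # FALSE
exists("rnorm", envir = e3)      # FALSE: rnorm() lives in stats, not base
```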
I'm not entirely sure about what RStudio does for the repos option. It probably automatically sets this option behind the scenes if it is not set. I think this is a relatively minor issue. You can set it in your .Rprofile, and I think RStudio should respect your CRAN mirror setting.
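For example, a snippet like the following in ~/.Rprofile fixes the CRAN mirror for R sessions that read that file (sessions started with --vanilla do not). The cloud mirror URL is just one possible choice, and whether RStudio's compile session honors it is, as said above, an assumption.

```r
# ~/.Rprofile: choose a CRAN mirror explicitly (the URL is one example)
local({
  repos <- getOption("repos")
  repos["CRAN"] <- "https://cloud.r-project.org"
  options(repos = repos)
})
```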
Users often ask why Rnw documents (or R Markdown documents) are not compiled in the current R session. To us, it basically boils down to which of the following consequences is more surprising or undesirable:
If we knit a document in the current R session, there is no guarantee that your results can be reproduced in another R session (e.g., the next time you open RStudio, or your collaborators open RStudio on their computers).
If we knit a document in a new R session, users can be surprised that objects are not found (and when they type the object names in the R console, they can see them). This can be surprising, but it is also a good and early reminder that your document probably won't work the next time.
To sum it up, I think:
Knitting in a new R session is better for reproducibility;
Knitting in the current R session is sometimes more convenient (e.g., you try to knit with different temporary R objects in the current session). Sometimes you also have to knit in the current R session, especially when you are generating PDF reports programmatically, e.g., you use a (for) loop to generate a series of reports. There is no way that you can achieve this only through the "Compile PDF" button (the button is mostly only for a single Rnw document).
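A sketch of the programmatic case: one template, one report per group, knitted in a loop from the current session. The template name species-report.Rnw, the variable sp, and the use of the built-in iris data are illustrative. Each iteration gets its own environment so reports do not see each other's objects, although that environment's parent is still globalenv().

```r
# One template, one report per species, knitted from the current session
template <- file.path(tempdir(), "species-report.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Mean sepal length for \\Sexpr{sp}:",
  "\\Sexpr{round(mean(iris$Sepal.Length[iris$Species == sp]), 2)}.",
  "\\end{document}"
), template)

for (sp in levels(iris$Species)) {
  # A fresh environment per report; note its parent is still globalenv()
  env <- new.env()
  assign("sp", sp, envir = env)
  out <- file.path(tempdir(), paste0("report-", sp, ".tex"))
  knitr::knit(template, output = out, envir = env, quiet = TRUE)
  # With LaTeX installed, knitr::knit2pdf(template, output = out, envir = env)
  # would also produce the PDF.
}
```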
BTW, I think what I said above can also apply to the Knit or Knit HTML buttons, but the underlying function is rmarkdown::render() instead of knitr::knit().

Related

RStudio environment pane does not show knitr variables

I have a very simple knitr Rnw script. When I run it in RStudio, the Environment Pane shows that the global environment is empty although the script compiles into pdf correctly, and the variable is evaluated correctly.
\documentclass{article}
\begin{document}
<<settings, echo=FALSE>>=
library(knitr)
a<-1+10
@
The outcome is equal to \Sexpr{a}.
\end{document}
This always worked fine until recently. I wonder whether this has to do with some RStudio settings or knitr options. Variables in a regular R script show up fine in the Environment pane. For more complex knitr projects, being able to look at the variables can make work much easier.
Normally when you knit a document by clicking knit in RStudio, it is run in a separate R process, and the variables are deleted when done. Your code chunks won't be able to see variables in your environment, and those variables won't get modified.
There are ways to run in your current process: run each chunk as code, or run rmarkdown::render("somefilename.Rmd") in your main process. Then your document can see your current workspace and make modifications there.
For debugging, the 2nd approach is convenient, but for reproducibility of the final result, you should run the separate R process.
In the end, this is what worked for me. Instead of clicking the "Compile PDF" button in RStudio, I ran the following in the console:
knitr::knit2pdf("file.Rnw", envir = globalenv())
This had been recommended here:
knitr: why does nothing appear in the "environment" panel when I compile my .Rnw file
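The effect can be checked without LaTeX. This sketch writes the question's document (minus library(knitr), which knit() itself does not need) to a temporary file with an arbitrary name and calls knitr::knit(), which produces only the .tex file; knit2pdf() would additionally run LaTeX. After knitting into globalenv(), a is visible in the console and in the Environment pane.

```r
rnw <- file.path(tempdir(), "simple.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<settings, echo=FALSE>>=",
  "a <- 1 + 10",
  "@",
  "The outcome is equal to \\Sexpr{a}.",
  "\\end{document}"
), rnw)

# Knit in the current session; chunk objects are created in globalenv()
knitr::knit(rnw, output = file.path(tempdir(), "simple.tex"),
            envir = globalenv(), quiet = TRUE)
a  # 11, now visible in the console and the Environment pane
```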

Is there a way to create an R knitr program file which is also an R (console) program file?

I've started using knitr (without pander) and I'm very impressed.
I can find instructions for writing inline knitr Markdown, which is processed even though a hash is written at the beginning of the line (which will be useful). However, it has occurred to me that if knitr can read and process such information, perhaps there is a way to write ALL Markdown instructions, e.g. ```{r}, with a hash at the beginning of the line? I.e., I would like ##```{r} to also work when run via knit.
This would allow me to create files which work without errors when run using R console and also when run via knit – which might be useful when files are submitted for review.
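One way to get this, not mentioned in the question and offered here only as a suggestion, is knitr::spin(): it treats an ordinary R script as the source, where lines starting with #' are Markdown text and all other lines are R code. The script therefore still runs in the console. A minimal sketch, using the text argument so no files are needed (the script content is illustrative):

```r
script <- c(
  "#' # Analysis",
  "#' Lines starting with #' are Markdown text; other lines are R code.",
  "x <- rnorm(10)",
  "summary(x)"
)

# 1. The script is ordinary R: #' lines are just comments to the console
eval(parse(text = script))

# 2. knitr::spin() converts the same lines to R Markdown (knit = FALSE: no knitting)
rmd <- knitr::spin(text = script, knit = FALSE)
cat(rmd, sep = "\n")
```

With a real file, source("analysis.R") runs it as plain R, and knitr::spin("analysis.R") or rmarkdown::render("analysis.R") turns it into a report.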

How to use objects from global environment in Rstudio Markdown

I've seen similar questions on Stack Overflow but virtually no conclusive answers, and certainly no answer that worked for me.
What is the easiest way to access and use objects (regression fits, data frames, other objects) located in the global R environment from an R Markdown document in RStudio?
I find it surprising that there is no easy solution to this, given the tendency of the RStudio team to make things comfortable and effective.
Thanks in advance.
For better or worse, this omission is intentional. Relying on objects created outside the document makes your document less reproducible--that is, if your document needs data in the global environment, you can't just give someone (or yourself in two years) the document and data files and let them recreate it themselves.
For this reason, and in order to perform the render in the background, RStudio actually creates a separate R session to render the document. That background R session cannot see any of the environments in the interactive R session you see in RStudio.
The best way around this problem is to take the code you used to create the contents of your global environment and move it inside your document (you can use echo = FALSE if you don't want it to show up in the document). This makes your document self-contained and reproducible.
If you can't do that, there are a few approaches you can take to use the data in the global environment directly:
Instead of using the Knit HTML button, type rmarkdown::render("your_doc.Rmd") at the R console. This will knit in the current session instead of a background session. Alternatively:
Save your global environment to an .Rdata file prior to rendering (use R's save function), and load it in your document.
Well, in my case I found the following solution:
(1) Save your global environment to a .RData file in the same folder as your .Rmd file. (You just need to click the save (floppy disk) icon in the Environment pane.)
(2) Add the following code to a chunk in your R Markdown document:
load(file = "filename.RData") # loads the file that you saved before
and stop suffering.
Going to RStudio's Tools > Global Options and opening the R Markdown tab, you can make a selection under 'Evaluate chunks in directory': select 'Document' there, and the R Markdown knitting engine will access the global environment as plain R code does. Hope this helps those searching for this info!
The thread is old but in case anyone's still looking for a solution (as I was):
You can pass an envir argument to render() (or knit()) so that it can access objects from the environment it was called from:
rmarkdown::render(
input = input_rmd,
output_file = output_file,
envir = parent.frame()
)
I have the same problem myself: some things are too time-consuming to recompute every time.
I think there could be another answer: save your environment with save.image() to a file other than the default .RData, and bring it back later with load().
To make sure you are using the same data, compare checksums with tools::md5sum().
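A sketch of this save/load-plus-checksum idea. The snapshot file name is hypothetical, and in a real document the hash would be pasted in as a string literal rather than taken from the variable checksum, which only exists in the interactive session.

```r
snapshot <- file.path(tempdir(), "analysis-snapshot.RData")  # hypothetical name

# In the interactive session: create objects, then save the whole workspace
fit <- lm(mpg ~ wt, data = mtcars)
save.image(file = snapshot)
checksum <- unname(tools::md5sum(snapshot))
checksum   # copy this hash into the document

# In a setup chunk of the document: verify the file, then load it
expected <- checksum   # in practice, paste the hash here as a string literal
stopifnot(identical(unname(tools::md5sum(snapshot)), expected))
load(snapshot)
```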
Cheers, Cord
I think I solved this problem by referring to the package explicitly in the code that is being knitted. Using the yarrr package, for example, I loaded the data frame "pirates" using data(pirates). This worked fine at the console and within an RStudio code chunk, but with knitr it failed following the pattern in the question above. If, however, I loaded the data into memory by creating an object using pirates <- yarrr::pirates, the document then knitted cleanly to HTML.
You can load the script in the desired environment as follows:
```{r, include=FALSE}
source("your-script.R", local = knitr::knit_global())
# or sys.source("your-script.R", envir = knitr::knit_global())
```
Next, in the R Markdown document, you can use objects created in these scripts (e.g., data objects or functions).
https://bookdown.org/yihui/rmarkdown-cookbook/source-script.html
One option that I have not yet seen is the use of parameters.
This chapter goes through a simple example of how to do this.
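A minimal sketch of the parameter approach, assuming the rmarkdown package: the document declares its inputs under params: in the YAML header and reads them from params$..., and the caller supplies values through rmarkdown::render(params = ...). The file name report.Rmd and the parameter n are illustrative. Rendering needs Pandoc, so the call is guarded by rmarkdown::pandoc_available().

```r
rmd <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: \"Parameterized report\"",
  "output: html_document",
  "params:",
  "  n: 5",
  "---",
  "",
  "```{r}",
  "x <- rnorm(params$n)",
  "summary(x)",
  "```"
), rmd)

# Defaults declared in the header, parsed without knitting anything
defaults <- knitr::knit_params(readLines(rmd))
defaults[[1]]$name    # "n"
defaults[[1]]$value   # 5

# Override the default at render time; the document never reads globalenv()
if (rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd, params = list(n = 100), envir = new.env(), quiet = TRUE)
}
```

Unlike relying on the global environment, the inputs are explicit in the document header, so the report can still be reproduced from the file alone with the default values.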

ESS & Knitr/Sweave: How to source the Rnw file into an interactive session?

This is a terribly simple request, and I can't believe I haven't found the solution to this yet, but I've been searching for it far and wide without luck.
I have an .Rnw file loaded up in Emacs, I use M-n s to compile it.
Everything works well, and it even opens an R buffer. Great. But that buffer is entirely useless: it doesn't contain the objects that I just sourced!
Example minimal .Rnw file:
\documentclass{article}
\begin{document}
<<>>=
foo <- "bar"
@
\end{document}
Using M-n s, I now have a new R-buffer with a session loaded up, but:
> foo
Error: object 'foo' not found
That is disappointing. I would like to play around with the data interactively.
How do I achieve that? I don't want to be sourcing the file line-by-line, or region-by-region with C-c C-c or something similar, every time I change my code.
Ideally, it should be just like RStudio's source function, which leaves me with a fully prepared R session.
I haven't tried this with sweave yet, only with knitr.
EDIT: the eval=TRUE chunk option does not seem to result in the correct behaviour.
This behaviour was recently changed in ESS. Now Sweave and knitr are executed directly in the global environment, just as if you had typed the call yourself at the command line. So wait a couple more weeks until ESS v13.09 is out, or use the development version.
Alternatively, you can set ess-swv-processing-command to "%s(%s)" and you will get the same result, except for automatic library loading.
For the record, knitr (in contrast to Sweave) evaluates everything in its own environment unless you instruct it otherwise.
[edit: Something went wrong. I no longer see the correct .ess_weave. Probably a git commit mix-up again, so it is not fixed in 13.09. Fixing it now. Sorry.]
Open an interactive R session, and then call Sweave directly, I believe like this (untested though). knitr works in the same way, though you need to load the knitr library first.
> Sweave("yourfile.Rnw")
There is some potential for peril here, though. If you call Sweave in a session after doing other things, your code can use things previously in the workspace, thus making your results unreproducible.
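A runnable sketch of that suggestion: it writes the minimal .Rnw from the question to a temporary directory (the file name is arbitrary), calls utils::Sweave() from the interactive session, and then finds foo in the workspace. The same caveat applies: anything already in the workspace is also visible to the document.

```r
rnw <- file.path(tempdir(), "minimal.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "foo <- \"bar\"",
  "@",
  "\\end{document}"
), rnw)

owd <- setwd(tempdir())               # Sweave writes minimal.tex here
Sweave("minimal.Rnw", quiet = TRUE)   # chunks are evaluated in the global environment
setwd(owd)

foo                                   # "bar": available for interactive work
# knitr equivalent: knitr::knit("minimal.Rnw", envir = globalenv())
```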

avoid displayed figures during sweave/pgfsweave compilation

When compiling with Sweave/pgfSweave, every time a figure is created in R it is shown in a graphics window (during the Sweave compilation process). This is helpful in many cases, as I can see what the figures look like while the document is being compiled.
But when I compile a large document over ssh, this can be very slow. Is there a way to tell Sweave/pgfSweave not to display the figures during compilation? (I still want the figures in the final PDF document, though.)
For interactive sessions, the figs.only Sweave option controls this behavior. To plot figures only to the target graphics files (and not to a console graphical window) set figs.only=TRUE.
As explained in the RweaveLatex help file:
figs.only: logical (‘FALSE’). By default each figure chunk is run
once, then re-run for each selected type of graphics. That
will open a default graphics device for the first figure
chunk and use that device for the first evaluation of all
subsequent chunks. If this option is true, the figure chunk
is run only for each selected type of graphics, for which a
new graphics device is opened and then closed.
As with other Sweave options, you can set this option: (1) for the current compilation (e.g., Sweave("example.Rnw", figs.only=TRUE)); (2) within the .Rnw file, using \SweaveOpts{figs.only=TRUE}; or (3) as a global default, by putting SWEAVE_OPTIONS="figs.only=TRUE" in, e.g., $R_HOME/etc/Renviron.site.
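A runnable sketch of option (1), assuming a throwaway example.Rnw written to a temporary directory (its content and the chunk label myplot are illustrative). With the default prefix.string, Sweave names the figure file after the Rnw file and the chunk label, e.g. example-myplot.pdf.

```r
rnw <- file.path(tempdir(), "example.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<myplot, fig=TRUE, echo=FALSE>>=",
  "plot(cars)",
  "@",
  "\\end{document}"
), rnw)

owd <- setwd(tempdir())
# figs.only=TRUE: the figure chunk is run only once per selected graphics type
# (PDF by default), each time on a new file device, so no screen window opens
Sweave("example.Rnw", figs.only = TRUE, quiet = TRUE)
setwd(owd)
```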
figs.only is the correct way to go, and I also want to mention the default graphical device in R here:
For now you may look at this: http://yihui.name/en/2010/12/a-special-graphics-device-in-r-the-null-device/
After R 2.14.1 (not released yet) you will be able to set the default device to a null PDF device, which is both safe and fast: https://github.com/yihui/knitr/issues/9
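In current versions of R this is available: pdf(file = NULL) opens a PDF device that writes no file. A minimal sketch of making it the default device through options(device = ...); this is a general R mechanism and not necessarily exactly what knitr does internally.

```r
# Every implicitly opened device becomes a null PDF device: nothing is shown
# on screen and no file is written (pdf(file = NULL) needs R >= 2.14.1)
options(device = function(...) grDevices::pdf(file = NULL, ...))

plot(1:10)                                 # opens the default device
current_device <- names(grDevices::dev.cur())
current_device                             # "pdf"
grDevices::dev.off()
```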
If you run Sweave from the command line instead of in an interactive session, graphics aren't produced in an interactive graphics window.
You can compile from the command line by typing R CMD Sweave mydoc.Rnw, or via a batch file, or a makefile for larger projects. I've started to use makefiles for many of my Sweave documents, as they handle dependencies, can clean up after themselves, and much more.
One option could be
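<<label=myplotlabel, fig=TRUE, include=FALSE>>=
plot(cars)  # replace with your graph code
@
then
\begin{figure}[h]
% Sweave names figure files <prefix.string>-<label>; prefix.string defaults to the Rnw file name (assumed here to be yourfile.Rnw)
\includegraphics[width=6cm, height=6cm]{yourfile-myplotlabel}
\caption{My Plot}
\label{fig:label}
\end{figure}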

Resources