R Markdown / Quarto - Avoid loading package every time I knit a document

Simple question but can't find a solution.
I have a Quarto document (but this applies to R Markdown as well) in which I use R to execute some code. Obviously, in the first chunk of the document, I load the packages needed, for example:
```{r setup}
library(tidyverse)
library(survival)
library(survminer)
```
Now, every time I knit the file to render the document, these packages are loaded, which can be pretty time-consuming, especially if you have a long list of packages to import. Using cache=TRUE doesn't seem to work properly. Is there any way to avoid loading the packages every time I knit the document, and only load them when they are not already loaded in the environment, or at least only in the first knit call of the session?

The usual way to run Quarto or RMarkdown is in a clean session, not in the current session. So you normally only have a minimal set of packages already loaded.
If you run rmarkdown::render( ... ) in a session, that doesn't happen, and things will run in the current session. That will speed up library() calls a lot, because they do nothing if the package has already been attached to the search list.
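A minimal sketch of that last point, using the base package stats purely as a stand-in for a heavier package (that substitution is my assumption, not something from the question): once a package is on the search path, a second library() call leaves the search path unchanged.

```r
# `stats` is only a stand-in; substitute any package you actually use.
library(stats)                          # attaches the package if it is not attached yet

path_before <- search()                 # search path after the first call
timing <- system.time(library(stats))   # second call: package is already attached
path_after <- search()

# The second call did not change the search path.
unchanged <- identical(path_before, path_after)
print(unchanged)
print(timing[["elapsed"]])
```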
I don't know if something similar is available for Quarto, but in any case, it's a risky strategy: what you hope for from an RMarkdown or Quarto document is something that is reproducible. If you run in the current session, you run the risk of getting results that depend on variables in the current session.
I'd advise you to identify which packages are slow to load, and try to follow @Arthur's suggestion from a comment: precompute the objects that those packages produce, and just load them in the document. Then you may not need the package at all.
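Here is a rough sketch of the precompute-and-load idea. The file name and the lm() call are made up for illustration; lm() just stands in for whatever expensive step needs the slow package.

```r
# Hypothetical cache file; in a real project keep it next to the document.
cache_file <- file.path(tempdir(), "fit_summary.rds")

if (!file.exists(cache_file)) {
  # Stand-in for an expensive step that needs a slow-to-load package.
  fit <- lm(mpg ~ wt, data = mtcars)
  saveRDS(coef(summary(fit)), cache_file)
}

# In the document: only the stored result is read, no modelling code runs.
fit_summary <- readRDS(cache_file)
print(fit_summary)
```

The precompute step can live in a separate script that you run only when the inputs change, so the document itself stays cheap to render in a clean session.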

Related

Is there a way to create an R knitr program file which is also an R (console) program file?

I've started using knitr (without pander) and I'm very impressed.
I can find instructions for writing inline knitr markdown – which will be processed even though a hash is written at the beginning of a line (which will be useful). However, it has occurred to me that if knitr can read and process such information, perhaps there is a way to write ALL markdown instructions e.g. ```{r} with a hash at the beginning of the line ? I.e., I would like it if ##```{r} also worked when run via knit.
This would allow me to create files which work without errors when run using R console and also when run via knit – which might be useful when files are submitted for review.

My code will run as a chunk but gives me an error when I try to knit

Hi I'm new to R and I'm using RStudio Cloud for a university stats course.
The code I'm having trouble with runs as a chunk, but when I try to knit the project I get an error saying that the object 'filename' was not found.
The 'filename' is listed in the global environment but it is a tbl_df, which I'm thinking is not the right kind of object for knitting.

It is difficult to answer without having all the code. And code is almost always better than a screenshot.
My guess is that you loaded the dataset X2019THBrier manually in RStudio. Thus you can access it in chunks, in the current R session, but not in the knitted R session.
You need to write commands to load the data. As you are loading an XLSX file, you might want to install the openxlsx package, and use the openxlsx::read.xlsx() command.
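Something along these lines could go in the first chunk of the document. The path is hypothetical (I don't know where your file lives), and load_data() is just a helper name I made up for the sketch.

```r
# Hypothetical path: replace with the location of your own spreadsheet.
data_path <- "your-data.xlsx"

# Hypothetical helper: fails early with a readable message instead of
# leaving the document to stop later with "object not found".
load_data <- function(path) {
  if (!requireNamespace("openxlsx", quietly = TRUE)) {
    stop("Install the openxlsx package first: install.packages('openxlsx')")
  }
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  openxlsx::read.xlsx(path)
}

# In the document's first chunk:
# X2019THBrier <- load_data(data_path)
```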

Difference: "Compile PDF" button in RStudio vs. knit() and knit2pdf()

TL;DR
What are the (possibly unwanted) side-effects of using knit()/knit2pdf() instead of the "Compile PDF" [1] button in RStudio?
Motivation
Most users of knitr seem to write their documents in RStudio and compile the documents using the "Compile PDF" / "Knit HTML" button. This works smoothly most of the time, but every once in a while there are special requirements that cannot be achieved using the compile button. In these cases, the solution is usually to call knit()/knit2pdf()/rmarkdown::render() (or similar functions) directly.
Some examples:
How to knit/Sweave to a different file name?
Is there a way to knitr markdown straight out of your workspace using RStudio?
Insert date in filename while knitting document using RStudio Knit button
Using knit2pdf() instead of the "Compile PDF" button usually offers a simple solution to such questions. However, this comes at a price: There is the fundamental difference that "Compile PDF" processes the document in a separate process and environment whereas knit2pdf() and friends don't.
This has implications and the problem is that not all of these implications are obvious. Take the fact that knit() uses objects from the global environment (whereas "Compile PDF" does not) as an example. This might be obvious and the desired behavior in cases like the second example above, but it is an unexpected consequence when knit() is used to overcome problems like in example 1 and 3.
Moreover, there are more subtle differences:
The working directory might not be set as expected.
Packages need to be loaded.
Some options that are usually set by RStudio may have unexpected values.
The Question and its goal
Whenever I read/write the advice to use knit2pdf() instead of "Compile PDF", I think "correct, but the user should understand the consequences …".
Therefore, the question here is:
What are the (possibly unwanted) side-effects of using knit()/knit2pdf() instead of the "Compile PDF" button in RStudio?
If there was a comprehensive (community wiki?) answer to this question, it could be linked in future answers that suggest using knit2pdf().
Related Questions
There are dozens of related questions to this one. However, they either propose only code to (more or less) reproduce the behavior of the RStudio button or they explain what "basically" happens without mentioning the possible pitfalls. Others look like being very similar questions but turn out to be a (very) special case of it. Some examples:
Knit2html not replicating functionality of Knit HTML button in R Studio: Caching issue.
HTML outputs are different between using knitr in Rstudio & knit2html in command line: Markdown versions.
How to convert R Markdown to HTML? I.e., What does “Knit HTML” do in Rstudio 0.96?: Rather superficial answer by Yihui (explains what "basically" happens) and some options how to reproduce the behavior of the RStudio button. Neither the suggested Sys.sleep(30) nor the "Compile PDF" log are insightful (both hints point to the same thing).
What does “Knit HTML” do in Rstudio 0.98?: Reproduce behavior of button.
About the answer
I think this question raised many of the issues that should be part of an answer. However, there might be many more aspects I don't know about which is the reason why I am reluctant to self-answer this question (though I might try if nobody answers).
Probably, an answer should cover three main points:
The new session vs. current session issue (global options, working directory, loaded packages, …).
A consequence of the first point: The fact that knit() uses objects from the calling environment (default: envir = parent.frame()) and implications for reproducibility. I tried to tackle the issue of preventing knit() from using objects from outside the document in this answer (second bullet point).
Things RStudio secretly does …
… when starting an interactive session (example) --> Not available when hitting "Compile PDF"
… when hitting "Compile PDF" (anything special besides the new session with the working directory set to the file processed?)
I am not sure about the right perspective on the issue. I think both "What happens when I hit 'Compile PDF' + implications" and "What happens when I use knit() + implications" are good approaches to tackle the question.
[1] The same applies to the "Knit HTML" button when writing RMD documents.

First of all, I think this question is easier to answer if you limit the scope to the "Compile PDF" button, because the "Knit HTML" button is a different story. "Compile PDF" is only for Rnw documents (R + LaTeX, or think Sweave).
I'll answer your question following the three points you suggested:
Currently RStudio always launches a new R session to compile Rnw documents, and first changes the working directory to the directory of the Rnw file. You can imagine the process as a shell script like this:
cd path/to/your-Rnw-directory
Rscript -e "library(knitr); knit('your.Rnw')"
pdflatex your.tex
Note that the knitr package is always attached, and pdflatex may be replaced by another LaTeX engine (depending on your RStudio configuration for Sweave documents, e.g., xelatex). If you want to replicate it in your current R session, you may rewrite the script in R:
owd = setwd("path/to/your-Rnw-directory")
system2("Rscript", c("-e", shQuote("library(knitr); knit('your.Rnw')")))
system2("pdflatex", "your.tex")
setwd(owd)
which is not as simple as knitr::knit('path/to/your.Rnw'), in which case the working directory is not automatically changed, and everything is executed in the current R session (in the globalenv() by default).
Because the Rnw document is always compiled in a new R session, it won't use any objects in your current R session. This is hard to replicate only through the envir argument of knitr::knit() in the current R session. In particular, you cannot use knitr::knit(envir = new.env()) because although new.env() is a new environment, it has a default parent environment parent.frame(), which is typically the globalenv(); you cannot use knitr::knit(envir = emptyenv()), either, because it is "too clean", and you will have trouble with objects even in the R base package. The only reliable way to replicate what the "Compile PDF" button does is what I described in the first point: system2("Rscript", c("-e", shQuote("library(knitr); knit('your.Rnw')"))), in which case knit() uses the globalenv() of a new R session.
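The two environment pitfalls can be checked in plain R without knitr. This is only an illustrative sketch; the object name marker is made up.

```r
marker <- "defined outside the document"

# new.env() inherits from the calling environment, so outside objects leak in.
leaky <- new.env()
found_in_leaky <- exists("marker", envir = leaky)   # inherits = TRUE by default

# emptyenv() has no parent at all, so even `+` from base cannot be found.
empty_result <- tryCatch(
  eval(quote(1 + 1), envir = emptyenv()),
  error = function(e) conditionMessage(e)
)

print(found_in_leaky)
print(empty_result)
```

If the second evaluation had worked, empty_result would be a number; getting an error message back instead is the "too clean" problem.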
I'm not entirely sure about what RStudio does for the repos option. It probably automatically sets this option behind the scenes if it is not set. I think this is a relatively minor issue. You can set it in your .Rprofile, and I think RStudio should respect your CRAN mirror setting.
Users have always been asking why Rnw documents (or R Markdown documents) are not compiled in the current R session. To us, it basically boils down to which of the following consequences is more surprising or undesired:
If we knit a document in the current R session, there is no guarantee that your results can be reproduced in another R session (e.g., the next time you open RStudio, or your collaborators open RStudio on their computers).
If we knit a document in a new R session, users can be surprised that objects are not found (and when they type the object names in the R console, they can see them). This can be surprising, but it is also a good and early reminder that your document probably won't work the next time.
To sum it up, I think:
Knitting in a new R session is better for reproducibility;
Knitting in the current R session is sometimes more convenient (e.g., you try to knit with different temporary R objects in the current session). Sometimes you also have to knit in the current R session, especially when you are generating PDF reports programmatically, e.g., you use a (for) loop to generate a series of reports. There is no way that you can achieve this only through the "Compile PDF" button (the button is mostly only for a single Rnw document).
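A rough sketch of the loop case, written with rmarkdown::render() rather than knit2pdf(). Everything named here is hypothetical: the template report.Rmd, the region names, and the assumption that the template refers to an object called region. The render call is skipped when rmarkdown or the template is missing.

```r
# Hypothetical inputs: a template that uses an object named `region`.
regions <- c("north", "south", "west")
template <- "report.Rmd"
output_files <- sprintf("report-%s.pdf", regions)

can_render <- requireNamespace("rmarkdown", quietly = TRUE) && file.exists(template)

for (i in seq_along(regions)) {
  region <- regions[i]   # set in the current session before each render
  if (can_render) {
    # envir = environment() lets the document see `region` from this session.
    rmarkdown::render(template, output_file = output_files[i],
                      envir = environment())
  }
}
print(output_files)
```

This only works because the rendering happens in the current session; the same loop cannot be driven from the button.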
BTW, I think what I said above can also apply to the Knit or Knit HTML buttons, but the underlying function is rmarkdown::render() instead of knitr::knit().

How to use objects from global environment in Rstudio Markdown

I've seen similar questions on Stack Overflow but virtually no conclusive answers, and certainly no answer that worked for me.
What is the easiest way to access and use objects (regression fits, data frames, other objects) that are located in the global R environment in the Markdown (Rstudio) script.
I find it surprising that there is no easy solution to this, given the tendency of the RStudio team to make things comfortable and effective.
Thanks in advance.

For better or worse, this omission is intentional. Relying on objects created outside the document makes your document less reproducible--that is, if your document needs data in the global environment, you can't just give someone (or yourself in two years) the document and data files and let them recreate it themselves.
For this reason, and in order to perform the render in the background, RStudio actually creates a separate R session to render the document. That background R session cannot see any of the environments in the interactive R session you see in RStudio.
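You can see the effect of a separate session without RStudio at all. The sketch below starts a second R process with Rscript and asks it whether it can see an object from the current session; the object name is made up, and this is only an approximation of what the Knit button does.

```r
# An object that exists only in the current (interactive) session.
only_in_this_session <- 42

# Start a separate R process and ask it whether it can see that object.
rscript <- file.path(R.home("bin"), "Rscript")
seen_by_child <- system2(
  rscript,
  c("--vanilla", "-e",
    shQuote('writeLines(as.character(exists("only_in_this_session")))')),
  stdout = TRUE
)
print(seen_by_child)
```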
The best way around this problem is to take the code you used to create the contents of your global environment and move it inside your document (you can use echo = FALSE if you don't want it to show up in the document). This makes your document self-contained and reproducible.
If you can't do that, there are a few approaches you can take to use the data in the global environment directly:
Instead of using the Knit HTML button, type rmarkdown::render("your_doc.Rmd") at the R console. This will knit in the current session instead of a background session. Alternatively:
Save your global environment to an .Rdata file prior to rendering (use R's save function), and load it in your document.
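For the second approach, a minimal sketch with made-up object and file names; the two stand-in objects replace whatever you built interactively.

```r
# Stand-ins for objects built interactively.
fit <- lm(mpg ~ wt, data = mtcars)
cleaned <- subset(mtcars, cyl == 4)

# Before rendering: save only the objects the document needs.
rdata_file <- file.path(tempdir(), "for_report.RData")  # hypothetical file name
save(fit, cleaned, file = rdata_file)

# Inside the document (e.g. in the setup chunk): restore them.
doc_env <- new.env()
restored <- load(rdata_file, envir = doc_env)   # returns the names of the loaded objects
print(restored)
```

In a real document a plain load(rdata_file) is enough; the separate doc_env is only there so the sketch does not overwrite the originals.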

Well, in my case I found the following solution:
(1) Save your global environment to an .RData file in the same folder as your .Rmd file. (You just need to click the floppy-disk icon in the "Environment" panel.)
(2) Write the following code in your script of Rmarkdown:
load(file = "filename.RData") # loads the file that you saved before
and stop suffering.

In RStudio, go to 'Tools', then 'Global Options', and open the 'R Markdown' tab. Under 'Evaluate chunks in directory', select the option 'Document', and the R Markdown knitting engine will access the global environment as plain R code does. Hope this helps those who search for this info!

The thread is old but in case anyone's still looking for a solution (as I was):
You can pass an envir parameter to the render() (or knit() function) so that it can access objects from the environment it was called from.
rmarkdown::render(
input = input_rmd,
output_file = output_file,
envir = parent.frame()
)

I have the same problem myself. Some stuff is pretty time consuming to reproduce every time.
I think there could be another answer. What if you save your environment with the save.image() function to a different file than the standard .RData one, and then bring it back with load()?
To be sure you are using the same data, use md5sum() from the tools package.
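Roughly like this; the snapshot name and the stand-in object are made up, and where you keep the recorded checksum is up to you.

```r
snapshot <- file.path(tempdir(), "analysis_snapshot.RData")  # hypothetical name
slow_result <- cumsum(1:10)   # stand-in for something expensive to recompute

# Save the workspace to a non-default file and record its checksum.
save.image(file = snapshot)
recorded_md5 <- unname(tools::md5sum(snapshot))

# Later, in the document: refuse to load a snapshot that has changed.
current_md5 <- unname(tools::md5sum(snapshot))
if (!identical(current_md5, recorded_md5)) {
  stop("Snapshot differs from the one that was recorded")
}
load(snapshot)
```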
Cheers, Cord

I think I solved this problem by referring to the package explicitly in the code that is being knitted. Using the yarrr package, for example, I loaded the dataframe "pirates" using data(pirates). This worked fine at the console and within an Rstudio code chunk, but with knitr it failed following the pattern in the question above. If, however, I loaded the data into memory by creating an object using pirates <- yarrr::pirates, the document then knitted cleanly to HTML.

You can load the script in the desired environment as follows:
```{r, include=FALSE}
source("your-script.R", local = knitr::knit_global())
# or sys.source("your-script.R", envir = knitr::knit_global())
```
Next in the R Markdown document, you can use objects created in these scripts (e.g., data objects or functions).
https://bookdown.org/yihui/rmarkdown-cookbook/source-script.html

One option that I have not yet seen is the use of parameters.
This chapter goes through a simple example of how to do this.

ESS & Knitr/Sweave: How to source the Rnw file into an interactive session?

This is a terribly simple request, and I can't believe I haven't found the solution to this yet, but I've been searching for it far and wide without luck.
I have an .Rnw file loaded up in Emacs, I use M-n s to compile it.
Everything works well, and it even opens an R buffer. Great. But that buffer
is entirely useless: it doesn't contain the objects that I just sourced!
Example minimal .Rnw file:
\documentclass{article}
\begin{document}
<<>>=
foo <- "bar"
@
\end{document}
Using M-n s, I now have a new R-buffer with a session loaded up, but:
> foo
Error: object 'foo' not found
That is disappointing. I would like to play around with the data interactively.
How do I achieve that? I don't want to be sourcing the file line-by-line, or
region-by-region with C-c C-c or something similar every time I change my code.
Ideally, it should be just like RStudio's source function, that leaves me with
a fully prepared R session.
I haven't tried this with sweave yet, only with knitr.
EDIT: the eval=TRUE chunk option does not seem to result in the correct behaviour.

This behaviour was recently changed in ESS. Now Sweave and knitr are executed directly in the global environment, as if you had typed the command yourself at the command line. So wait a couple more weeks until ESS v13.09 is out, or use the development version.
Alternatively, you can also set ess-swv-processing-command to "%s(%s)" and you will get the same result, except automatic library loading.
For the record, knitr (in contrast to Sweave) evaluates everything in its own environment unless you instruct it otherwise.
[edit: Something went wrong. I don't see the correct .ess_weave any more. Probably some git commit messup again. So it is not fixed in 13.09. Fixing it now. Sorry.]

Open an interactive R session, and then call Sweave directly, I believe like this (untested though). knitr works in the same way, though you need to load the knitr library first.
> Sweave("yourfile.Rnw")
There is some potential for peril here, though. If you call Sweave in a session after doing other things, your code can use things previously in the workspace, thus making your results unreproducible.