How to use objects from global environment in Rstudio Markdown - r

I've seen similar questions on Stack Overflow but virtually no conclusive answers, and certainly no answer that worked for me.
What is the easiest way to access and use objects (regression fits, data frames, other objects) that are located in the global R environment in the Markdown (Rstudio) script.
I find it surprising that there is no easy solution to this, given the tendency of the RStudio team to make things comfortable and effective.
Thanks in advance.

For better or worse, this omission is intentional. Relying on objects created outside the document makes your document less reproducible--that is, if your document needs data in the global environment, you can't just give someone (or yourself in two years) the document and data files and let them recreate it themselves.
For this reason, and in order to perform the render in the background, RStudio actually creates a separate R session to render the document. That background R session cannot see any of the environments in the interactive R session you see in RStudio.
The best way around this problem is to take the code you used to create the contents of your global environment and move it inside your document (you can use echo = FALSE if you don't want it to show up in the document). This makes your document self-contained and reproducible.
If you can't do that, there are a few approaches you can take to use the data in the global environment directly:
Instead of using the Knit HTML button, type rmarkdown::render("your_doc.Rmd") at the R console. This will knit in the current session instead of a background session. Alternatively:
Save your global environment to an .Rdata file prior to rendering (use R's save function), and load it in your document.

Well, in my case i found the following solution:
(1) Save your Global Environmental in a .Rdata file inside the same folder where you have your .Rmd file. (You just need click at disquet picture that is on "Global Environmental" panel)
(2) Write the following code in your script of Rmarkdown:
load(file = "filename.RData") # it load the file that you saved before
and stop suffering.

Going to RStudio´s 'Tools' and 'Global options' and visiting the 'R Markdown' tab, you can make a selection in 'Evaluate chunks in directory', there select the option 'Documents' and the R Markdown knitting engine will be accessing the global environment as plain R code does. Hope this helps those who search this info!

The thread is old but in case anyone's still looking for a solution (as I was):
You can pass an envir parameter to the render() (or knit() function) so that it can access objects from the environment it was called from.
rmarkdown::render(
input = input_rmd,
output_file = output_file,
envir = parent.frame()
)

I have the same problem myself. Some stuff is pretty time consuming to reproduce every time.
I think there could be another answer. What if you save your environment with the save.image() function to a different file than the standard .Rdata one. Then, bring it back with load().
To be sure you are using the same data, use the md5sum() from tools.
Cheers, Cord

I think I solved this problem by referring to the package explicitly in the code that is being knitted. Using the yarrr package, for example, I loaded the dataframe "pirates" using data(pirates). This worked fine at the console and within an Rstudio code chunk, but with knitr it failed following the pattern in the question above. If, however, I loaded the data into memory by creating an object using pirates <- yarrr::pirates, the document then knitted cleanly to HTML.

You can load the script in the desired environment as follows:
```{r, include=FALSE}
source("your-script.R", local = knitr::knit_global())
# or sys.source("your-script.R", envir = knitr::knit_global())
```
Next in the R Markdown document, you can use objects created in these scripts (e.g., data objects or functions).
https://bookdown.org/yihui/rmarkdown-cookbook/source-script.html

One option that I have not yet seen is the use of parameters.
This chapter goes through a simple example of how to do this.

Related

Difference: "Compile PDF" button in RStudio vs. knit() and knit2pdf()

TL;DR
What are the (possibly unwanted) side-effects of using knit()/knit2pdf() instead of the "Compile PDF"1 button in RStudio?
Motivation
Most users of knitr seem to write their documents in RStudio and compile the documents using the "Compile PDF" / "Knit HTML" button. This works smoothly most of the time, but every once a while there are special requirements that cannot be achieved using the compile button. In these cases, the solution is usually to call knit()/knit2pdf()/rmarkdown::render() (or similar functions) directly.
Some examples:
How to knit/Sweave to a different file name?
Is there a way to knitr markdown straight out of your workspace using RStudio?
Insert date in filename while knitting document using RStudio Knit button
Using knit2pdf() instead of the "Compile PDF" button usually offers a simple solution to such questions. However, this comes at a price: There is the fundamental difference that "Compile PDF" processes the document in a separate process and environment whereas knit2pdf() and friends don't.
This has implications and the problem is that not all of these implications are obvious. Take the fact that knit() uses objects from the global environment (whereas "Compile PDF" does not) as an example. This might be obvious and the desired behavior in cases like the second example above, but it is an unexpected consequence when knit() is used to overcome problems like in example 1 and 3.
Moreover, there are more subtle differences:
The working directory might not be set as expected.
Packages need to be loaded.
Some options that are usually set by RStudio may have unexpected values.
The Question and it's goal
Whenever I read/write the advice to use knit2pdf() instead of "Compile PDF", I think "correct, but the user should understand the consequences …".
Therefore, the question here is:
What are the (possibly unwanted) side-effects of using knit()/knit2pdf() instead of the "Compile PDF" button in RStudio?
If there was a comprehensive (community wiki?) answer to this question, it could be linked in future answers that suggest using knit2pdf().
Related Questions
There are dozens of related questions to this one. However, they either propose only code to (more or less) reproduce the behavior of the RStudio button or they explain what "basically" happens without mentioning the possible pitfalls. Others look like being very similar questions but turn out to be a (very) special case of it. Some examples:
Knit2html not replicating functionality of Knit HTML button in R Studio: Caching issue.
HTML outputs are different between using knitr in Rstudio & knit2html in command line: Markdown versions.
How to convert R Markdown to HTML? I.e., What does “Knit HTML” do in Rstudio 0.96?: Rather superficial answer by Yihui (explains what "basically" happens) and some options how to reproduce the behavior of the RStudio button. Neither the suggested Sys.sleep(30) nor the "Compile PDF" log are insightful (both hints point to the same thing).
What does “Knit HTML” do in Rstudio 0.98?: Reproduce behavior of button.
About the answer
I think this question raised many of the issues that should be part of an answer. However, there might be many more aspects I don't know about which is the reason why I am reluctant to self-answer this question (though I might try if nobody answers).
Probably, an answer should cover three main points:
The new session vs. current session issue (global options, working directory, loaded packages, …).
A consequence of the first point: The fact that knit() uses objects from the calling environment (default: envir = parent.frame()) and implications for reproducibility. I tried to tackle the issue of preventing knit() from using objects from outside the document in this answer (second bullet point).
Things RStudio secretly does …
… when starting an interactive session (example) --> Not available when hitting "Compile PDF"
… when hitting "Compile PDF" (anything special besides the new session with the working directory set to the file processed?)
I am not sure about the right perspective on the issue. I think both, "What happens when I hit 'Compile PDF' + implications" as well as "What happens when I use knit() + implications" is a good approach to tackle the question.
1 The same applies to the "Knit HTML" button when writing RMD documents.
First of all, I think this question is easier to answer if you limit the scope to the "Compile PDF" button, because the "Knit HTML" button is a different story. "Compile PDF" is only for Rnw documents (R + LaTeX, or think Sweave).
I'll answer your question following the three points you suggested:
Currently RStudio always launch a new R session to compile Rnw documents, and first changes the working directory to the directory of the Rnw file. You can imagine the process as a shell script like this:
cd path/to/your-Rnw-directory
Rscript -e "library(knitr); knit('your.Rnw')"
pdflatex your.tex
Note that the knitr package is always attached, and pdflatex might be other LaTeX engines (depending on your RStudio configurations for Sweave documents, e.g., xelatex). If you want to replicate it in your current R session, you may rewrite the script in R:
owd = setwd("path/to/your-Rnw-directory")
system2("Rscript", c("-e", shQuote("library(knitr); knit('your.Rnw')"))
system2("pdflatex", "your.tex")
setwd(owd)
which is not as simple as knitr::knit('path/to/your.Rnw'), in which case the working directory is not automatically changed, and everything is executed in the current R session (in the globalenv() by default).
Because the Rnw document is always compiled in a new R session, it won't use any objects in your current R session. This is hard to replicate only through the envir argument of knitr::knit() in the current R session. In particular, you cannot use knitr::knit(envir = new.env()) because although new.env() is a new environment, it has a default parent environment parent.frame(), which is typically the globalenv(); you cannot use knitr::knit(envir = emptyenv()), either, because it is "too clean", and you will have trouble with objects even in the R base package. The only reliable way to replicate what the "Compile PDF" button does is what I said in 1: system2("Rscript", c("-e", shQuote("library(knitr); knit('your.Rnw')")), in which case knit() uses the globalenv() of a new R session.
I'm not entirely sure about what RStudio does for the repos option. It probably automatically sets this option behind the scenes if it is not set. I think this is a relatively minor issue. You can set it in your .Rprofile, and I think RStudio should respect your CRAN mirror setting.
Users have always been asking why the Rnw document (or R Markdown documents) are not compiled in the current R session. To us, it basically boils down to which of the following consequences is more surprising or undesired:
If we knit a document in the current R session, there is no guarantee that your results can be reproduced in another R session (e.g., the next time you open RStudio, or your collaborators open RStudio on their computers).
If we knit a document in a new R session, users can be surprised that objects are not found (and when they type the object names in the R console, they can see them). This can be surprising, but it is also a good and early reminder that your document probably won't work the next time.
To sum it up, I think:
Knitting in a new R session is better for reproducibilty;
Knitting in the current R session is sometimes more convenient (e.g., you try to knit with different temporary R objects in the current session). Sometimes you also have to knit in the current R session, especially when you are generating PDF reports programmatically, e.g., you use a (for) loop to generate a series of reports. There is no way that you can achieve this only through the "Compile PDF" button (the button is mostly only for a single Rnw document).
BTW, I think what I said above can also apply to the Knit or Knit HTML buttons, but the underlying function is rmarkdown::render() instead of knitr::knit().

knitr: Referring to and evaluating external code chunks

How could I refer to and evaluate (in an .Rmd file) specific chunks of R code, located not in a different .Rmd file, but in an R module, containing chunks of code, tagged with ## #knitr chunk_name? Thanks!
I just figured out what the problem was: I have simply forgotten to call read_chunk() function for the R module, containing those external code chunks. So far, everything appears to be working, with the exception, mentioned below.
One problem I'm currently experiencing (and this might be a good separate question, but I'll leave it as is for now) is that knitr doesn't seem to respect working directory and paths, constructed on its basis, using relative paths, such as file.path(getwd(), "data/transform"). I think this contradicts with knitr design, which allows code reuse via chunks in external R modules. What are approaches that people are using to solve this peculiar situation? I believe that it might be a good idea to submit as a feature request.

Knitr global environment

I can't seem to get R markdown/knitr to see/use objects in my global environment in R.
From what I've read knitr should use the global environment as standard but every one of my objects I include in a code chunk returns the error
## Error: object 'XXX' not found
Am I missing something really simple here?
Do I need to manually load objects from the global environment first?
Thanks in advance
Marty
If you've already saved the object(s) to a file, then one clean approach for markdown purposes is as follows:
if(file.exists("rfModel.Rda")){
load("rfModel.Rda")} else {
modFit <- train(class~.,method="rf",data=train)
}
This effectively bypasses the lengthy model build time by only building it if it does not exist as an object yet, so that it preserves reproducibility. This is similar to the cache idea, but is more generalizable IMHO.
It sounds like you want the same code to work with both knitr and your global environment. This can be useful when building complicated Rmd files that require testing during construction.
The issue lies in that knitr uses the local folder when you press knit, and does not look for the project home folder (i.e. your Rproj - I am assuming you use relative paths). So when you go to run code it only works for one or the other. The approach around this is to write code in your Rmd using relative paths to the project folder (as you would in a normal R script), and redirect knitr to use the project home folder. To do this insert the following code at the top of your rmd script.
```{r setup, include=FALSE}
library(knitr)
dd <- getwd()
knitr::opts_knit$set(root.dir = paste0(dd,'/../../'))
knitr::opts_chunk$set(cache.path = paste0(dd,'/cache/'))
knitr::opts_chunk$set(fig.path = paste0(dd,'/figures/'))
```
This code does the following:
first, finds the current directory for your rmd.
second, sets the project root directory. I keep my files two folders in, hence the '/../../' , this will need to be adjusted for your folder structure.
third, you need to set the cache folder path manually, as the default setting no longer works, hence cache wont work.
finally, do the same for the figures folder, as again you need to overwrite the default.
Happy coding.

ESS & Knitr/Sweave: How to source the Rnw file into an interactive session?

This is a terribly simple request, and I can't believe I haven't found the solution to this yet, but I've been searching for it far and wide without luck.
I have an .Rnw file loaded up in Emacs, I use M-n s to compile it.
Everything works well, and it even opens an R buffer. Great. But that buffer
is entirely useless: it doesn't contain the objects that I just sourced!
Example minimal .Rnw file:
\documentclass{article}
\begin{document}
<<>>=
foo <- "bar"
#
\end{document}
Using M-n s, I now have a new R-buffer with a session loaded up, but:
> foo
Error: object 'foo' not found
That is disappointing. I would like to play around with the data interactively.
How do I achieve that? I don't want to be sourcing the file line-by-line, or
region-by-region with C-c C-c or something similar every time I change my code.
Ideally, it should be just like RStudio's source function, that leaves me with
a fully prepared R session.
I haven't tried this with sweave yet, only with knitr.
EDIT: the eval=TRUE chunk option does not seem to result in the correct behaviour.
This behaviour was recently changed in ESS. Now sweave and knitr are executed directly in the global environment, as if when you write it yourself at command line. So wait for a couple of more weeks till ESSv13.09 is out or use the development version.
Alternatively, you can also set ess-swv-processing-command to "%s(%s)" and you will get the same result, except automatic library loading.
For the record, knitr (in contrast to sweave) evaluates everything in it's own environment unless you instruct it otherwise.
[edit: Something went wrong. I don't see the correct .ess_weave any more. Probably some git commit messup again. So it is not fixed in 13.09. Fixing it now. Sorry.]
Open an interactive R session, and then call Sweave directly, I believe like this (untested though). knitr works in the same way, though you need to load the knitr library first.
> Sweave("yourfile.Rnw")
There is some potential for peril here, though. If you call Sweave in a session after doing other things, your code can use things previously in the workspace, thus making your results unreproducible.

Making knitr run a r script: do I use read_chunk or source?

I am running R version 2.15.3 with RStudio version 0.97.312. I have one script that reads my data from various sources and creates several data.tables. I then have another r script which uses the data.tables created in the first script. I wanted to turn the second script into a R markdown script so that the results of analysis can be outputted as a report.
I do not know the purpose of read_chunk, as opposed to source. My read_chunk is not working, but source is working. With either instance I do not get to see the objects in my workspace panel of RStudio.
Please explain the difference between read_chunk and source? Why would I use one or the other? Why will my .Rmd script not work
Here is ridiculously simplified sample
It does not work. I get the following message
Error: object 'z' not found
Two simple files...
test of source to rmd.R
x <- 1:10
y <- 3:4
z <- x*y
testing source.Rmd
Can I run another script from Rmd
========================================================
Testing if I can run "test of source to rmd.R"
```{r first part}
require(knitr)
read_chunk("test of source to rmd.R")
a <- z-1000
a
```
The above worked only if I replaced "read_chunk" with "source". I
can use the vectors outside of the code chunk as in inline usage.
So here I will tell you that the first number is `r a[1]`. The most
interesting thing is that I cannot see the variables in RStudio
workspace but it must be there somewhere.
read_chunk() only reads the source code (for future references); it does not evaluate code like source(). The purpose of read_chunk() was explained in this page as well as the manual.
There isn't an option to run a chunk interactively from within knitr AFAIK. However, this can be done easily enough with something like:
#' Run a previously loaded chunk interactively
#'
#' Takes labeled code loaded with load_chunk and runs it in the /global/ envir (unless otherwise specified)
#'
#' #param chunkName The name of the chunk as a character string
#' #param envir The environment in which the chunk is to be evaluated
run_chunk <- function(chunkName,envir=.GlobalEnv) {
chunkName <- unlist(lapply(as.list(substitute(.(chunkName)))[-1], as.character))
eval(parse(text=knitr:::knit_code$get(chunkName)),envir=envir)
}
NULL
In case it helps anyone else, I've found using read_chunk() to read a script without evaluating can be useful in two ways. First, you might have a script with many chunks and want control over which ones run where (e.g., a plot or a table in a specific place). I use source when I want to run everything in a script (for example, at the start of a document to load a standard set of packages or custom functions). I've started using read_chunk early in the document to load scripts and then selectively run the chunks I want where I need them.
Second, if you are working with an R script directly or interactively, you might want a long preamble of code that loads packages, data, etc. Such a preamble, however, could be unnecessary and slow if, for example, prior code chunks in the main document already loaded data.

Resources