I have set root.dir in the setup chunk:
```{r setup, include=TRUE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = "E:/something/Lists")
```
With getwd() I can check that it worked.
On top of that, I have set the working directory to exactly the same path in the RStudio tools.
This works without problems for the code chunks. E.g. I load a table and it appears correctly in the global environment:
```{r, include = TRUE, echo = FALSE}
fevofe <- read.table(file = "112_Auswertg.csv", header = TRUE, sep = ";", dec = ".")
```
However, I have some inline code in order to have some numbers generated within my plain text.
"All in all we found `r length(fevove[,1])` specimens..."
And this inline code does not react to setting root.dir. As soon as I try to run these snippets, R Markdown tells me it cannot find the object, although it is already in the global environment because it was loaded in the previous chunk.
After the inline code fails, I ask for getwd() and suddenly it reports my Documents folder!
Consequently, when knitting, the process is cancelled because the object is lost, and I get the error:
Error in eval(parse_only(code), envir = envir) : object 'fevofe' not found
Calls: <Anonymous> ... inline_exec -> hook_eval -> withVisible -> eval -> eval
Does anybody have an idea what causes this stubborn resetting of the working directory?
Any hint is welcome.
So this is an interesting case:
The basic problem was a typo: in the problematic lines, fevofe was spelled fevove...
Correcting this let knitr complete the job.
However, it seems strange that this causes R Markdown to reset the working directory to something that is not even the default set in the options.
This topic remains interesting...
I've grown tired of repeating the beginning of R Markdown documents over and over again to set up my preferences for knitting and chunk options. An example:
```{r, include=FALSE}
library(tidyverse)
knitr::opts_chunk$set(error = FALSE, message = FALSE,
                      warning = FALSE, fig.align = 'center')
knitr::opts_knit$set(root.dir = 'DATA/PATH')
```
If you use this chunk to begin the majority of your .Rmd files, can you embed it into the .Renviron or .Rprofile for an R Project? I know you can load libraries via your .Rprofile, and I know knitr::opts_knit$set also has some sort of integration with options(knitr.package.foo) in your .Rprofile, but I have been unsuccessful in getting the options() route to work, and it only accounts for the knitting options, not the chunk options. It feels like there should be an easier way to reproduce this, but I can't find any resources exploring it.
The recommended way is to develop a small package that includes it following the steps outlined here: https://rstudio.github.io/rstudio-extensions/rmarkdown_templates.html
Here's an example: https://github.com/rstudio/rticles
As link-only answers are bad practice, here are the steps:
Create a package
Create the folder: inst/rmarkdown/templates/my_template
Inside this folder, create a file called template.yaml and a file skeleton/skeleton.Rmd
Build and install the package and the templates should show up.
template.yaml will contain the topmatter YAML
skeleton/skeleton.Rmd will contain your default Rmd.
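The steps above can be sketched from R itself. A minimal sketch, assuming a hypothetical package directory mypkg/ and using only base R (the template name and YAML fields are placeholders):

```r
# Hypothetical layout: mypkg/inst/rmarkdown/templates/my_template
tpl_dir <- file.path("mypkg", "inst/rmarkdown/templates/my_template")
dir.create(file.path(tpl_dir, "skeleton"), recursive = TRUE)

# template.yaml: the topmatter describing the template
writeLines(c("name: My Template",
             "description: Starter document with my default setup chunk"),
           file.path(tpl_dir, "template.yaml"))

# skeleton/skeleton.Rmd: the default Rmd opened when the template is chosen
writeLines(c("---",
             "title: \"Untitled\"",
             "output: html_document",
             "---"),
           file.path(tpl_dir, "skeleton", "skeleton.Rmd"))
```

After building and installing the package, the template should appear under File > New File > R Markdown > From Template.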
Alternative:
Edit the default files of your RStudio installation. Depending on your platform, they are in something like the resources/r_markdown_v2.Rmd file of your RStudio install. But this will get wiped out anytime you update RStudio.
When I render() an *.Rmd file locally in RStudio, the output from the render() function is displayed in the console:
Basic .Rmd file:
---
title: "test"
output: html_document
---
```{r setup, include=FALSE}
sink("./output.txt", type = 'output')
knitr::opts_chunk$set(echo = TRUE)
```
## Summary
```{r cars}
summary(cars)
```
## Error
```{r}
dbGetQuery()
```
To build:
```r
library(rmarkdown)
render('./test.rmd')
```
This is great when I'm creating reports locally and I can see the progress and errors thrown (if any). I need to monitor this output in stdout (or stderr), but I can't sink this output to that location because knitr uses capture.output(), which uses sink() (see first comment). I even tried sinking to a file instead, but although the file output.txt is created, nothing is recorded in it.
This is an issue for me because I'm using render() in a Docker container and I can't send the chunk output from the .Rmd file in the container to stderr or stdout. I need to monitor the chunk output for errors in the R code inside the .Rmd file (to diagnose database connection errors), and sending those chunks to stdout or stderr is the only way to do that without logging in to the container, which in my use case (deployed to AWS) is impossible.
I've reviewed the knitr chunk options and there doesn't seem to be any option I can set to force the chunk output to a file or to stdout or stderr.
Is there some way I can write all of the chunk output to stdout or stderr inside of the render() function? This several-years-old question is similar to mine (if not identical), but the accepted answer does not fit my use case.
There are two approaches I would take to this problem, depending on what you want.
If you want to see the stderr/stdout output, as you would in a console, the simplest way to accomplish this is to use a shell script to render your doc, and pipe the output to text.
The littler package contains an example shell script render.r that is very useful for this purpose. After installing littler, and ensuring that the scripts are available on your path, you can run:
```shell
render.r test.rmd > output.txt
```
or
```shell
render.r test.rmd > output.txt 2>&1
```
to include stderr in the output file.
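If you'd rather not install littler, the same redirection works with plain Rscript (assuming Rscript is on your PATH):

```shell
Rscript -e "rmarkdown::render('test.rmd')" > output.txt 2>&1
```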
However, I've found that this console output is often not sufficiently detailed for debugging and troubleshooting purposes.
So, another option is to edit the Rmd file to log more details about progress/errors & to direct the logger output to an external file.
This is particularly helpful in the context of a cloud or docker-container compute environment since the log can be directed to a datastore allowing for real-time logging, and searching logs across many jobs. Personally, I do this using the futile.logger package.
Using a logger created by futile.logger, this would work as follows:
In your Rmd file or in functions called by code in your Rmd file, redirect important error & warning messages to the logger. The best way to do this is the topic of another question, and in my experience varies by task.
At a minimum, this results in inserting a series of R commands like follows in your Rmd file:
```r
library(futile.logger)
flog.info('Querying data .. ')
data <- tryCatch(dbGetQuery(...),
                 warning = function(war) {flog.warn(war)},
                 error = function(err) {flog.error(err)},
                 ...)
```
possibly editing the logged messages to provide more context.
A more thorough solution will apply globally either to a code chunk or a file. I have not tested this personally in the context of an Rmd, but it might involve using withCallingHandlers, or changing options(error = custom_logging_function).
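As a rough sketch of the withCallingHandlers idea (untested inside an Rmd, as said; assumes futile.logger is attached and a logger is configured):

```r
library(futile.logger)

# Wrap a block of code so warnings and messages are sent to the logger
# and muffled, instead of going to the (captured) console.
with_logging <- function(expr) {
  withCallingHandlers(expr,
    warning = function(w) {
      flog.warn(conditionMessage(w))
      invokeRestart("muffleWarning")
    },
    message = function(m) {
      flog.info(conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
}

# Usage inside a chunk, e.g. around a database query:
result <- with_logging({
  message("querying data ...")  # goes to the logger as INFO
  42                            # hypothetical query result
})
```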
In your R session, before rendering your Rmd, redirect the logger output to a file or to your desired destination.
This looks something like:
```r
library(futile.logger)
flog.logger(name = 'ROOT',
            appender = appender.file('render.log'))
rmarkdown::render('my_document.Rmd')
```
As the document is rendering, you will see the logger output printed to the render.log file.
I would note that, while I actively use the futile.logger package, it has now been superseded by a new iteration called logger. I haven't tried this approach specifically with logger, but I suspect it would work just as well, if not better. The differences between logger and futile.logger are described very well in this vignette on migration, from the logger docs.
I believe the usual way to do this is to run a separate R instance and capture its output. Without any error checking:
```r
output <- system2("R", "-e \"rmarkdown::render('test.Rmd')\"",
                  stdout = TRUE, stderr = TRUE)
```
This puts all of the output into the output vector. Maybe you can run analysis code in the docker container to look for problems.
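For example, a minimal check of the captured vector; the patterns are assumptions about what R and knitr print on failure:

```r
# `output` is the character vector returned by system2() above
problems <- grep("Error|Quitting from lines", output, value = TRUE)
if (length(problems) > 0) {
  message("render failed:\n", paste(problems, collapse = "\n"))
}
```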
I have just started testing R Markdown for use in creating a codebook of a dataset, and I am quite puzzled by its behaviour when using cache = TRUE. I'm running RStudio 1.1.463 with rmarkdown_1.11, knitr_1.21 and tidyverse_1.2.1.
Take the following sample code, which includes some doc and chunk options I'm interested in and attaches all libraries I normally use:
---
title: "Test"
date: 2019-03-11
output:
html_document
---
```{r header, echo= FALSE, include=FALSE, cache = TRUE, warning= FALSE}
attach(mtcars)
library(sf)
library(tidyverse)
library(knitr)
library(summarytools)
opts_chunk$set(echo = FALSE, error = TRUE)
```
# mtcars dataset heading
## map of car purchases
## cyl variable
```{r}
kable(descr(cyl))
```
When I hit the Knit button on RStudio for the first time (without an existing cache folder), the results are as expected. If I hit Knit again, the following happens:
cyl is not found
kable and descr both throw 'could not find function' errors
If the parent packages/dataframes are called explicitly, these problems disappear. If cache = FALSE there are no issues.
Why would cache = TRUE trigger this behaviour? For this codebook, I thought of attaching the final dataset and then presenting some summaries for each variable. I would also like to generate a couple of sf maps with many of the variables. I thought of processing everything in such a header chunk and then calling on various bits throughout the document. Should I think differently?
Incidentally, I don't quite understand why it is necessary to explicitly call library(knitr) in an R Markdown document, as I thought it was the key package that 'knits' the document... If I remove it, opts_chunk is not found.
Thanks for any help!
I believe cache = TRUE tries to cache the R objects created in a chunk. Your first chunk does a lot more than just create objects: the attach() and library() calls each have side effects: modifying the search list, loading packages, etc. Those side effects aren't cached, but they are needed for your document to work: since knitr sees no reason to run the chunk again, your document fails on the second run.
You normally use cache = TRUE when the chunk does a long slow computation to produce a dataset for later plotting or summarizing, because then subsequent runs can skip the slow part of the computation.
You ask why library(knitr) is needed. Strictly speaking, it's not: you could have used knitr::opts_chunk instead. But more to the point, the idea is that an R Markdown document is a description of a standalone R session. Yes, you need knitr to process it, but it should give the same results as if you just ran the code on its own in an empty session. (This isn't exactly true: knitr's chunk options and hooks modify the behaviour a bit, but it's a convenient mental model of what's going on.)
Loading libraries is not time-consuming. I usually split libraries and setup from data loading into two different chunks, and set cache = TRUE only on the data chunk.
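Applied to the example above, the split might look like this (only the data step is cached; package loading stays uncached so its side effects happen on every run; the data-preparation step is a placeholder):

```{r setup, include=FALSE}
library(sf)
library(tidyverse)
library(summarytools)
knitr::opts_chunk$set(echo = FALSE, error = TRUE)
```

```{r data, cache=TRUE}
cars_df <- mtcars  # stands in for a slow data-preparation step
```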
I am trying to print to the console (or the output window) for debugging purposes. For example:
\documentclass{article}
\begin{document}
<<foo>>=
print(getwd())
message(getwd())
message("ERROR:")
cat(getwd(), file=stderr())
not_a_command() # Does not throw an error?
stop("Why doesn't this throw an error?")
@
\end{document}
I get the results in the output PDF, but my problem is I have a script that is not completing (so there is no output PDF to check), and I'm trying to understand why. There also appears to be no log file output if the knitting doesn't complete successfully.
I am using knitr 1.13 and RStudio 0.99.896.
EDIT: The above code will correctly output (and break) if I change to Sweave, so that makes me think it is a knitr issue.
This question has several aspects, and it's partly an XY problem. At the core, the question is (as I read it):
How can I see what's wrong if knitr fails and doesn't produce an output file?
In case of PDF output, quite often compiling the output PDF fails after an error occurred, but there is still the intermediate TeX file. Opening this file may reveal error messages.
As suggested by Gregor, you can run the code in the chunks line by line in the console (or by chunk). However, this may not reproduce all problems, especially if they are related to the working directory or the environment.
capture.output can be used to print debug information to an external file.
Finally (as opposed to my earlier comment), it is possible to print to RStudio's progress window: messages from hooks will be printed there. Basically, the message must come from knitr itself, not from the code knitr evaluates.
How can I print debug information on the progress window in RStudio?
The following example prints all objects in the environment after each chunk with debug = TRUE:
\documentclass{article}
\begin{document}
<<>>=
knitr::knit_hooks$set(debug = function(before, options, envir) {
if (!before) {
message(
paste(names(envir), as.list(envir),
sep = " = ", collapse = "\n"))
}
})
@
<<debug = TRUE>>=
a <- 5
foo <- "bar"
@
\end{document}
The progress window then shows the objects logged by the hook (here, a = 5 and foo = bar).
Of course, for documents with more or larger objects the hook should be adjusted to selectively print (parts of) objects.
Use stop(). For example, stop("Hello World").
Let's say you have an R markdown document that will not render cleanly.
I know you can set the knitr chunk option error to TRUE to request that evaluation continue, even in the presence of errors. You can do this for an individual chunk via error = TRUE or in a more global way via knitr::opts_chunk$set(error = TRUE).
But sometimes there are errors that are still fatal to the knitting process. Two examples I've recently encountered: trying to unlink() the current working directory (oops!) and calling rstudioapi::getVersion() from inline R code when RStudio is not available. Is there a general description of these sorts of errors, i.e. the ones beyond the reach of error = TRUE? Is there a way to tolerate errors in inline R code vs in chunks?
Also, are there more official ways to halt knitting early or to automate debugging in this situation?
To exit early from the knitting process, you may use the function knitr::knit_exit() anywhere in the source document (in a code chunk or inline expression). Once knit_exit() is called, knitr will ignore all the rest of the document and write out the results it has collected so far.
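For example, once knitting reaches this chunk, everything below it is ignored:

```{r}
knitr::knit_exit()
```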
There is no way to tolerate errors in inline R code at the moment. You need to make sure inline R code always runs without errors¹. If errors do occur, you should see the range of lines that produced the error in the knitr log in the console, of the form Quitting from lines x1-x2 (filename.Rmd). Then you can go to the file filename.Rmd and see what is wrong with the lines from x1 to x2. The same applies to code chunks with the chunk option error = FALSE.
Beyond the types of errors mentioned above, it may be tricky to find the source of the problem. For example, when you unintentionally unlink() the current directory, it should not stop the knitting process, because unlink() succeeded anyway. You may run into problems after the knitting process, e.g., LaTeX/HTML cannot find the output figure files. In this case, you can try to apply knit_exit() to all code chunks in the document one by one. One way to achieve this is to set up a chunk hook to run knit_exit() after a certain chunk. Below is an example of using linear search (you can improve it by using bisection instead):
```r
#' Render an input document chunk by chunk until an error occurs
#'
#' @param input the input filename (an Rmd file in this example)
#' @param compile a function to compile the input file, e.g. knitr::knit, or
#'   rmarkdown::render
knit_debug = function(input, compile = knitr::knit) {
  library(knitr)
  lines = readLines(input)
  chunk = grep(all_patterns$md$chunk.begin, lines)  # line numbers of chunk headers
  knit_hooks$set(debug = function(before) {
    if (!before) {
      chunk_current <<- chunk_current + 1
      if (chunk_current >= chunk_num) knit_exit()
    }
  })
  opts_chunk$set(debug = TRUE)
  # try to exit after the i-th chunk and see which chunk introduced the error
  for (chunk_num in seq_along(chunk)) {
    chunk_current = 0  # a chunk counter, incremented after each chunk
    res = try(compile(input))
    if (inherits(res, 'try-error')) {
      message('The first error came from line ', chunk[chunk_num])
      break
    }
  }
}
```
¹ This is by design. I think it is a good idea to have error = TRUE for code chunks, since sometimes we want to show errors, for example, for teaching purposes. However, if I allowed errors for inline code as well, authors might fail to recognize fatal errors in the inline code. Inline code is normally used to embed values inline, and I don't think it makes much sense if an inline value is an error. Imagine a sentence in a report like "The P-value of my test is ERROR"; if knitr didn't signal the error, it would require the authors to read the report output very carefully to spot this issue. I think it is a bad idea to have to rely on human eyes to find such mistakes.
IMHO, difficulty debugging an Rmd document is a warning that something is wrong. I have a rule of thumb: Do the heavy lifting outside the Rmd. Do rendering inside the Rmd, and only rendering. That keeps the Rmd code simple.
My large R programs look like this.
```r
data <- loadData()
analytics <- doAnalytics(data)
rmarkdown::render("theDoc.Rmd", envir = analytics)
```
(Here, doAnalytics returns a list or environment. That list or environment gets passed to the Rmd document via the envir parameter, making the results of the analytics computations available inside the document.)
The doAnalytics function does the complicated calculations. I can debug it using the regular tools, and I can easily check its output. By the time I call rmarkdown::render, I know the hard stuff is working correctly. The Rmd code is just "print this" and "format that", easy to debug.
This division of responsibility has served me well, and I can recommend it. Especially compared to the mind-bending task of debugging complicated calculations buried inside a dynamically rendered document.