How can I print to the console when using knitr? - r

I am trying to print to the console (or the output window) for debugging purposes. For example:
\documentclass{article}
\begin{document}
<<foo>>=
print(getwd())
message(getwd())
message("ERROR:")
cat(getwd(), file=stderr())
not_a_command() # Does not throw an error?
stop("Why doesn't this throw an error?")
@
\end{document}
I get the results in the output PDF, but my problem is I have a script that is not completing (so there is no output PDF to check), and I'm trying to understand why. There also appears to be no log file output if the knitting doesn't complete successfully.
I am using knitr 1.13 and RStudio 0.99.896.
EDIT: The above code will correctly output (and break) if I change to Sweave, so that makes me think it is a knitr issue.

This question has several aspects – and it's partly an XY problem. At the core, the question is (as I read it):
How can I see what's wrong if knitr fails and doesn't produce an output file?
In the case of PDF output, compiling the PDF quite often fails after an error has occurred, but the intermediate TeX file is still there. Opening this file may reveal error messages.
As Gregor suggested, you can run the code in the chunks line by line (or chunk by chunk) in the console. However, this may not reproduce all problems, especially those related to the working directory or the environment.
capture.output can be used to print debug information to an external file.
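For instance, a minimal sketch (the filename debug_log.txt is made up for illustration):
```r
# Append the working directory and the objects currently in scope to an
# external file, so they can be inspected even if knitting never finishes
capture.output(getwd(), ls(), sessionInfo(),
               file = "debug_log.txt", append = TRUE)
```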
Finally (as opposed to my earlier comment), it is possible to print to RStudio's progress window (or whatever it is called): messages from hooks are printed to the progress window. Basically, the message must come from knitr itself, not from the code knitr evaluates.
How can I print debug information on the progress window in RStudio?
The following example prints all objects in the environment after each chunk with debug = TRUE:
\documentclass{article}
\begin{document}
<<>>=
knitr::knit_hooks$set(debug = function(before, options, envir) {
  if (!before) {
    message(
      paste(names(envir), as.list(envir),
            sep = " = ", collapse = "\n"))
  }
})
@
<<debug = TRUE>>=
a <- 5
foo <- "bar"
@
\end{document}
The progress window reads:
Of course, for documents with more or larger objects, the hook should be adjusted to print only selected objects (or parts of objects).
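For instance, a sketch of a more selective hook that sends only each object's name and a compact str() summary to the progress window, instead of full values:
```r
knitr::knit_hooks$set(debug = function(before, options, envir) {
  if (!before) {
    for (nm in ls(envir)) {
      # capture str()'s printed output and forward it as a message
      message(nm, ": ",
              paste(capture.output(str(get(nm, envir = envir))), collapse = "\n  "))
    }
  }
})
```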

Use stop(). For example, stop("Hello World").

Related

RStudio environment pane does not show knitr variables

I have a very simple knitr Rnw script. When I run it in RStudio, the Environment pane shows that the global environment is empty, although the script compiles into a PDF correctly and the variable is evaluated correctly.
\documentclass{article}
\begin{document}
<<settings, echo=FALSE>>=
library(knitr)
a <- 1 + 10
@
The outcome is equal to \Sexpr{a}.
\end{document}
This always worked fine until recently. I wonder whether this has to do with some RStudio settings or knitr options. Variables in a regular R script show up fine in the environment pane. For more complex knitr projects, being able to look at the variables can make work much, much easier.
Normally, when you knit a document by clicking Knit in RStudio, it runs in a separate R process, and the variables are deleted when it finishes. Your code chunks won't be able to see variables in your environment, and those variables won't get modified.
There are ways to run it in your current process: run each chunk's code interactively, or run rmarkdown::render("somefilename.Rmd") in your main process. Then your document can see your current workspace and make modifications there.
For debugging, the second approach is convenient, but for reproducibility of the final result, you should run the separate R process.
In the end, this is what worked for me. Instead of clicking the "Compile PDF" button in RStudio, I ran the following in the console:
knitr::knit2pdf("file.Rnw", envir = globalenv())
This had been recommended here:
knitr: why does nothing appear in the "environment" panel when I compile my .Rnw file

rmarkdown::render() output to stdout

When I render() an *.Rmd file locally in RStudio, the output from the render() function is displayed in the console:
Basic .Rmd file:
---
title: "test"
output: html_document
---
```{r setup, include=FALSE}
sink("./output.txt", type = 'output')
knitr::opts_chunk$set(echo = TRUE)
```
## Summary
```{r cars}
summary(cars)
```
## Error
```{r}
dbGetQuery()
```
To build:
library(rmarkdown)
render('./test.rmd')
Output:
This is great when I'm creating reports locally: I can see the progress and any errors thrown. I need to monitor this output in stdout (or stderr), but I can't sink this output to that location because knitr uses capture.output, which uses sink() (see the first comment). I even tried sinking to a file instead, but though the file output.txt is created, nothing is recorded in it.
This is an issue for me because I'm using render() in a Docker container and I can't send the chunk output from the .Rmd file in the container to stderr or stdout. I need to monitor the chunk output for errors in the R code inside the .Rmd file (to diagnose database connection errors), and sending those chunks to stdout or stderr is the only way to do that without logging in to the container, which in my use case (deployed to AWS) is impossible.
I've reviewed the knitr chunk options and there doesn't seem to be any option I can set to force the chunk output to a file or to stdout or stderr.
Is there some way I can write all of the chunk output to stdout or stderr inside of the render() function? This several-years-old question is similar to mine (if not identical), but the accepted answer does not fit my use case.
There are two approaches I would take to this problem - depending on what you want.
If you want to see the stderr/stdout output, as you would in a console, the simplest way to accomplish this is to use a shell script to render your doc, and pipe the output to text.
- The littler package contains an example shell script render.r that is very useful for this purpose. After installing littler, and ensuring that its scripts are available on your path, you can run:
render.r test.rmd > output.txt
or
render.r test.rmd > output.txt 2>&1
to include stderr in the output file.
However, I've found that this console output is often not sufficiently detailed for debugging and troubleshooting purposes.
So, another option is to edit the Rmd file to log more details about progress/errors & to direct the logger output to an external file.
This is particularly helpful in the context of a cloud or docker-container compute environment since the log can be directed to a datastore allowing for real-time logging, and searching logs across many jobs. Personally, I do this using the futile.logger package.
Using a logger created by futile.logger, this would work as follows:
In your Rmd file or in functions called by code in your Rmd file, redirect important error & warning messages to the logger. The best way to do this is the topic of another question, and in my experience varies by task.
At a minimum, this means inserting a series of R commands like the following into your Rmd file:
library(futile.logger)

flog.info('Querying data .. ')
data <- tryCatch(dbGetQuery(...),
                 warning = function(war) {flog.warn(war)},
                 error = function(err) {flog.error(err)},
                 ...)
possibly editing the logged messages to provide more context.
A more thorough solution will apply globally either to a code chunk or a file. I have not tested this personally in the context of an Rmd, but it might involve using withCallingHandlers, or changing options(error = custom_logging_function).
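As a rough illustration of the withCallingHandlers idea (untested in an Rmd, as noted; with_logging is a made-up helper name), one could wrap chunk code so warnings and messages are forwarded to the logger:
```r
library(futile.logger)

# Hypothetical wrapper: route warnings and messages through futile.logger
# while the wrapped expression runs, then muffle the original conditions
with_logging <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      flog.warn(conditionMessage(w))
      invokeRestart("muffleWarning")
    },
    message = function(m) {
      flog.info(conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
}

# Usage inside a chunk (sketch):
# data <- with_logging(dbGetQuery(con, "SELECT ..."))
```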
In your R session, before rendering your Rmd, redirect the logger output to a file or to your desired destination.
This looks something like:
library(futile.logger)
flog.logger(name = 'ROOT',
            appender = appender.file('render.log'))
rmarkdown::render('my_document.Rmd')
As the document is rendering, you will see the logger output printed to the render.log file.
I would note that, while I actively use the futile.logger package, it has now been superseded by a new iteration called logger. I haven't tried this approach specifically with logger, but I suspect it would work just as well, if not better. The differences between logger and futile.logger are described very well in the migration vignette from the logger docs.
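For reference, a rough equivalent of the session-level setup above using the logger package might look like this (a sketch I have not verified in an Rmd render; render.log is reused from the example above):
```r
library(logger)

# Send logger output to a file instead of the console
log_appender(appender_file("render.log"))

log_info("Starting render")
rmarkdown::render("my_document.Rmd")
```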
I believe the usual way to do this is to run a separate R instance and capture its output. Without any error checking:
output <- system2("R", "-e \"rmarkdown::render('test.Rmd')\"",
                  stdout = TRUE, stderr = TRUE)
This puts all of the output into the output vector. Maybe you can run analysis code in the docker container to look for problems.
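A quick sketch of such a check, scanning the captured output for error messages (the pattern "Error" is just an example):
```r
# Flag any lines from the render output that look like errors
errors <- grep("Error", output, value = TRUE)
if (length(errors) > 0) {
  message("Problems during render:\n", paste(errors, collapse = "\n"))
}
```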

How to request an early exit when knitting an Rmd document?

Let's say you have an R markdown document that will not render cleanly.
I know you can set the knitr chunk option error to TRUE to request that evaluation continue, even in the presence of errors. You can do this for an individual chunk via error = TRUE or in a more global way via knitr::opts_chunk$set(error = TRUE).
But sometimes there are errors that are still fatal to the knitting process. Two examples I've recently encountered: trying to unlink() the current working directory (oops!) and calling rstudioapi::getVersion() from inline R code when RStudio is not available. Is there a general description of these sorts of errors, i.e. the ones beyond the reach of error = TRUE? Is there a way to tolerate errors in inline R code vs in chunks?
Also, are there more official ways to halt knitting early or to automate debugging in this situation?
To exit early from the knitting process, you may use the function knitr::knit_exit() anywhere in the source document (in a code chunk or inline expression). Once knit_exit() is called, knitr will ignore all the rest of the document and write out the results it has collected so far.
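For example, a minimal Rmd sketch: everything after the chunk that calls knit_exit() is dropped from the output.
```{r}
summary(cars)
knitr::knit_exit()
```
Any text or chunks following this point in the source document will not appear in the rendered result.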
There is no way to tolerate errors in inline R code at the moment. You need to make sure inline R code always runs without errors¹. If errors do occur, you should see the range of lines that produced the error in the knitr log in the console, of the form Quitting from lines x1-x2 (filename.Rmd). Then you can go to filename.Rmd and see what is wrong with lines x1 to x2. The same applies to code chunks with the chunk option error = FALSE.
Beyond the types of errors mentioned above, it may be tricky to find the source of the problem. For example, when you unintentionally unlink() the current directory, it should not stop the knitting process, because unlink() succeeded anyway. You may run into problems after the knitting process, e.g., LaTeX/HTML cannot find the output figure files. In this case, you can try to apply knit_exit() to all code chunks in the document one by one. One way to achieve this is to set up a chunk hook to run knit_exit() after a certain chunk. Below is an example of using linear search (you can improve it by using bisection instead):
#' Render an input document chunk by chunk until an error occurs
#'
#' @param input the input filename (an Rmd file in this example)
#' @param compile a function to compile the input file, e.g. knitr::knit, or
#'   rmarkdown::render
knit_debug = function(input, compile = knitr::knit) {
  library(knitr)
  lines = readLines(input)
  chunk = grep(all_patterns$md$chunk.begin, lines)  # line numbers of chunk headers

  knit_hooks$set(debug = function(before) {
    if (!before) {
      chunk_current <<- chunk_current + 1
      if (chunk_current >= chunk_num) knit_exit()
    }
  })

  opts_chunk$set(debug = TRUE)

  # try to exit after the i-th chunk and see which chunk introduced the error
  for (chunk_num in seq_along(chunk)) {
    chunk_current = 0  # a chunk counter, incremented after each chunk
    res = try(compile(input))
    if (inherits(res, 'try-error')) {
      message('The first error came from line ', chunk[chunk_num])
      break
    }
  }
}
¹ This is by design. I think it is a good idea to have error = TRUE for code chunks, since sometimes we want to show errors, for example, for teaching purposes. However, if I allowed errors in inline code as well, authors might fail to recognize fatal errors there. Inline code is normally used to embed values in text, and I don't think it makes much sense for an inline value to be an error. Imagine a sentence in a report like "The P-value of my test is ERROR"; if knitr didn't signal the error, authors would have to read the report output very carefully to spot the issue. I think it is a bad idea to have to rely on human eyes to find such mistakes.
IMHO, difficulty debugging an Rmd document is a warning that something is wrong. I have a rule of thumb: Do the heavy lifting outside the Rmd. Do rendering inside the Rmd, and only rendering. That keeps the Rmd code simple.
My large R programs look like this.
data <- loadData()
analytics <- doAnalytics(data)
rmarkdown::render("theDoc.Rmd", envir=analytics)
(Here, doAnalytics returns a list or environment. That list or environment gets passed to the Rmd document via the envir parameter, making the results of the analytics computations available inside the document.)
The doAnalytics function does the complicated calculations. I can debug it using the regular tools, and I can easily check its output. By the time I call rmarkdown::render, I know the hard stuff is working correctly. The Rmd code is just "print this" and "format that", easy to debug.
This division of responsibility has served me well, and I can recommend it. Especially compared to the mind-bending task of debugging complicated calculations buried inside a dynamically rendered document.
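A minimal sketch of this layout (doAnalytics is your own function; here mtcars stands in for real data loading, and theDoc.Rmd is assumed to reference the objects by name):
```r
# Heavy lifting outside the Rmd: build all results, return them in an environment
doAnalytics <- function(data) {
  results <- list(
    model    = lm(mpg ~ wt, data = data),
    data_sum = summary(data)
  )
  # an environment can be passed directly to render(envir = ...)
  list2env(results, envir = new.env())
}

data <- mtcars
analytics <- doAnalytics(data)

# Rendering only: the document just prints and formats objects from `analytics`
rmarkdown::render("theDoc.Rmd", envir = analytics)
```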

How to define Sweave driver on RStudio

I'm using Sweave to make a report based on my R code. However, since some code chunks take too long to process, I'm planning to use the cacheSweave package to avoid this issue.
In cacheSweave's vignette, it says I need to specify a driver
Sweave("foo.Rnw", driver = cacheSweaveDriver)
However, I would like to keep using the "Compile PDF" button inside RStudio, so that it automatically runs Sweave command and pdflatex as well.
How do I tell RStudio to use that specific driver when calling Sweave function?
The expected result is that when I process the following ".Rnw" code twice (example based on code taken from cacheSweave's vignette), the second time will be much faster since data is cached.
\documentclass{article}
\begin{document}
\SweaveOpts{concordance=TRUE}
<<cache=TRUE>>=
set.seed(1)
x <- local({
  Sys.sleep(10)
  rnorm(100)
})
results <- mean(x)
@
\end{document}
The Sweave function help says *the environment variable SWEAVE_OPTIONS can be used to override the initial options set by the driver*. So I tried the following command in the RStudio console,
Sys.setenv(SWEAVE_OPTIONS="driver=cacheSweaveDriver")
then clicked "Compile PDF" twice again, but with no success.
Solution:
As posted in this "ghost" blog, I created a file named .Rprofile in my working directory with the following content:
library(utils)
library(cacheSweave)
assignInNamespace("RweaveLatex", cacheSweave::cacheSweaveDriver, "utils")

Knitr: redirect chunk code output to terminal

I want to monitor some pretty lengthy parallelized computations embedded in a knitr file.
The computations rely on a package I have written, and the relevant function uses mclapply from the multicore package for parallelization. This function outputs progress bars to monitor the progress of the computations using a slightly modified implementation of txtProgressBar from the utils package. The progress bar is printed to the terminal and updated through a fifo connection every time an iteration of mclapply is complete.
This works fine when sourcing a file or calling the function directly, but I can find no way to get it to work within knitr. I have tried the relevant chunk options; I can get messages and warnings redirected to the terminal, but not the progress bar. Can anyone help?
Sorry for not providing a minimal working example, but I don't see how I could make one in this setting.
txtProgressBar() writes to stdout, and knitr captures everything in stdout, so currently it is not easy to show your progress bar if it is text-based and writes to stdout. Perhaps I could use evaluate::evaluate(debug = TRUE) internally to achieve what you want, but I'm not entirely sure that works well with the text progress bar.
My suggestions at the moment are:
- use a GUI-based progress bar like tcltk::tkProgressBar()
- write the progress to other places, e.g. (ab)using stderr:
```{r progress}
pb = txtProgressBar(min = 0, max = 100, file = stderr())
for (i in 1:100) {
  setTxtProgressBar(pb, i)
  Sys.sleep(0.05)
}
close(pb)
```
- or use your function outside a code chunk, e.g. in an inline expression (such as \Sexpr{my_multicore_function()} in Rnw or `r my_cool_fun()` in Rmd), because inline evaluation does not capture stdout
Having read the point about printing the progress bar to stderr in Yihui's answer, I would suggest temporarily redirecting stdout to stderr with sink():
sink(stderr())
your_code()
sink()
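A slightly safer variant (a sketch; your_code() is the placeholder from above): if the code errors while stdout is redirected, the sink stays in place, so it can help to reset it in a finally clause.
```r
sink(stderr())
tryCatch(your_code(), finally = sink())  # sink() with no args restores stdout
```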
