Knitr: redirect chunk code output to terminal

I want to monitor some pretty lengthy parallelized computations embedded in a knitr file.
The computations rely on a package I have written, and the relevant function uses mclapply from the multicore package for parallelization. This function outputs progress bars to monitor the progress of the computations using a slightly modified implementation of txtProgressBar from the utils package. The progress bar is printed to the terminal and updated through a fifo connection every time an iteration of mclapply is complete.
This works fine when sourcing a file or calling the function directly, but I can find no way to get it to work within knitr. I have tried the relevant chunk options, and I can get messages and warnings redirected to the terminal, but not the progress bar. Can anyone help?
Sorry for not providing a minimal working example but I don't see how I could make one in this setting.

txtProgressBar() writes to stdout, and knitr captures everything sent to stdout, so currently it is not easy to show a progress bar that is text-based and writes there. Perhaps I can use evaluate::evaluate(debug = TRUE) internally to achieve what you want, but I'm not entirely sure that works well with the text progress bar.
My suggestions at the moment are:
use a GUI-based progress bar like tcltk::tkProgressBar()
write the progress to other places, e.g. by (ab)using stderr:
```{r progress}
pb = txtProgressBar(min = 0, max = 100, file = stderr())
for (i in 1:100) {
  setTxtProgressBar(pb, i)
  Sys.sleep(0.05)
}
close(pb)
```
or call your function outside a code chunk, e.g. in an inline expression (such as \Sexpr{my_multicore_function()} in Rnw or `r my_cool_fun()` in Rmd), because inline evaluation does not capture stdout

Having read the point about printing the progress bar to stderr in Yihui's answer, I would suggest temporarily redirecting stdout to stderr with sink():
sink(stderr())
your_code()
sink()
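If your_code() throws an error, an unbalanced sink() would leave stdout redirected for the rest of the session. A small wrapper (the name with_stderr is illustrative, not an existing function) restores the redirect with on.exit():

```r
# Redirect stdout to stderr for the duration of one call, restoring
# stdout afterwards even if the call throws an error.
with_stderr <- function(expr) {
  sink(stderr())
  on.exit(sink())  # undo the redirect on any exit path
  expr             # lazily evaluated here, while the redirect is active
}
```

with_stderr(your_code()) then behaves like the sink() pair above, but is safe against errors.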

Related

How can I print to the console when using knitr?

I am trying to print to the console (or the output window) for debugging purposes. For example:
\documentclass{article}
\begin{document}
<<foo>>=
print(getwd())
message(getwd())
message("ERROR:")
cat(getwd(), file=stderr())
not_a_command() # Does not throw an error?
stop("Why doesn't this throw an error?")
@
\end{document}
I get the results in the output PDF, but my problem is I have a script that is not completing (so there is no output PDF to check), and I'm trying to understand why. There also appears to be no log file output if the knitting doesn't complete successfully.
I am using knitr 1.13 and RStudio 0.99.896.
EDIT: The above code will correctly output (and break) if I change to Sweave, so that makes me think it is a knitr issue.
This question has several aspects – and it's partly an XY problem. At the core, the question is (as I read it):
How can I see what's wrong if knitr fails and doesn't produce an output file?
In the case of PDF output, compiling the output PDF quite often fails after an error occurred, but the intermediate TeX file is still there. Opening this file may reveal error messages.
As suggested by Gregor, you can run the code in the chunks line by line in the console (or by chunk). However, this may not reproduce all problems, especially if they are related to the working directory or the environment.
capture.output can be used to print debug information to an external file.
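For example, appending the session info and the current objects to a log file (the file name is illustrative) leaves a trace on disk even when knitting aborts before any output document is written:

```r
# Write debug information to an external file; append = TRUE keeps
# earlier entries, so each chunk can add to the same log.
log_file <- "knitr-debug.log"
capture.output(sessionInfo(), file = log_file, append = TRUE)
capture.output(print(ls()),   file = log_file, append = TRUE)
```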
Finally (as opposed to my earlier comment), it is possible to print to RStudio's progress window: messages from hooks are shown there. Basically, the message must come from knitr itself, not from the code knitr evaluates.
How can I print debug information on the progress window in RStudio?
The following example prints all objects in the environment after each chunk with debug = TRUE:
\documentclass{article}
\begin{document}
<<>>=
knitr::knit_hooks$set(debug = function(before, options, envir) {
  if (!before) {
    message(
      paste(names(envir), as.list(envir),
            sep = " = ", collapse = "\n"))
  }
})
@
<<debug = TRUE>>=
a <- 5
foo <- "bar"
@
\end{document}
The progress window then shows the messaged objects (here a = 5 and foo = bar).
Of course, for documents with more or larger objects the hook should be adjusted to selectively print (parts of) objects.
Use stop(). For example, stop("Hello World").

How to request an early exit when knitting an Rmd document?

Let's say you have an R markdown document that will not render cleanly.
I know you can set the knitr chunk option error to TRUE to request that evaluation continue, even in the presence of errors. You can do this for an individual chunk via error = TRUE or in a more global way via knitr::opts_chunk$set(error = TRUE).
But sometimes there are errors that are still fatal to the knitting process. Two examples I've recently encountered: trying to unlink() the current working directory (oops!) and calling rstudioapi::getVersion() from inline R code when RStudio is not available. Is there a general description of these sorts of errors, i.e. the ones beyond the reach of error = TRUE? Is there a way to tolerate errors in inline R code vs in chunks?
Also, are there more official ways to halt knitting early or to automate debugging in this situation?
To exit early from the knitting process, you may use the function knitr::knit_exit() anywhere in the source document (in a code chunk or inline expression). Once knit_exit() is called, knitr will ignore all the rest of the document and write out the results it has collected so far.
There is no way to tolerate errors in inline R code at the moment. You need to make sure inline R code always runs without errors¹. If errors do occur, you should see the range of lines that produced the error in the knitr log in the console, of the form Quitting from lines x1-x2 (filename.Rmd). Then you can go to filename.Rmd and see what is wrong with lines x1 to x2. The same applies to code chunks with the chunk option error = FALSE.
Beyond the types of errors mentioned above, it may be tricky to find the source of the problem. For example, when you unintentionally unlink() the current directory, it should not stop the knitting process, because unlink() succeeded anyway. You may run into problems after the knitting process, e.g., LaTeX/HTML cannot find the output figure files. In this case, you can try to apply knit_exit() to all code chunks in the document one by one. One way to achieve this is to set up a chunk hook to run knit_exit() after a certain chunk. Below is an example of using linear search (you can improve it by using bisection instead):
#' Render an input document chunk by chunk until an error occurs
#'
#' @param input the input filename (an Rmd file in this example)
#' @param compile a function to compile the input file, e.g. knitr::knit, or
#'   rmarkdown::render
knit_debug = function(input, compile = knitr::knit) {
  library(knitr)
  lines = readLines(input)
  chunk = grep(all_patterns$md$chunk.begin, lines)  # line numbers of chunk headers
  knit_hooks$set(debug = function(before) {
    if (!before) {
      chunk_current <<- chunk_current + 1
      if (chunk_current >= chunk_num) knit_exit()
    }
  })
  opts_chunk$set(debug = TRUE)
  # try to exit after the i-th chunk and see which chunk introduced the error
  for (chunk_num in seq_along(chunk)) {
    chunk_current = 0  # a chunk counter, incremented after each chunk
    res = try(compile(input))
    if (inherits(res, 'try-error')) {
      message('The first error came from line ', chunk[chunk_num])
      break
    }
  }
}
¹ This is by design. I think it is a good idea to have error = TRUE for code chunks, since sometimes we want to show errors, for example, for teaching purposes. However, if I allowed errors in inline code as well, authors might fail to recognize fatal errors there. Inline code is normally used to embed values in the text, and I don't think it makes much sense for an inline value to be an error. Imagine a sentence in a report like "The P-value of my test is ERROR"; if knitr didn't signal the error, it would require the authors to read the report output very carefully to spot the issue. I think it is a bad idea to have to rely on human eyes to find such mistakes.
IMHO, difficulty debugging an Rmd document is a warning that something is wrong. I have a rule of thumb: Do the heavy lifting outside the Rmd. Do rendering inside the Rmd, and only rendering. That keeps the Rmd code simple.
My large R programs look like this.
data <- loadData()
analytics <- doAnalytics(data)
rmarkdown::render("theDoc.Rmd", envir=analytics)
(Here, doAnalytics returns a list or environment, which is passed to the Rmd document via render's envir parameter, making the results of the analytics computations available inside the document. Note that render() expects an environment, so a plain list should be converted with list2env() first.)
The doAnalytics function does the complicated calculations. I can debug it using the regular tools, and I can easily check its output. By the time I call rmarkdown::render, I know the hard stuff is working correctly. The Rmd code is just "print this" and "format that", easy to debug.
This division of responsibility has served me well, and I can recommend it. Especially compared to the mind-bending task of debugging complicated calculations buried inside a dynamically rendered document.
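One detail worth noting about this pattern: rmarkdown::render()'s envir argument expects an environment, so if doAnalytics() returns a plain list, it can be converted with list2env() before rendering (the names below are illustrative stand-ins):

```r
# Stand-in for doAnalytics(): a list of computed results.
analytics <- list(data = head(mtcars), n = nrow(mtcars))

# Convert the list to an environment so render() can evaluate the
# Rmd chunks with these objects in scope.
env <- list2env(analytics, parent = globalenv())
# rmarkdown::render("theDoc.Rmd", envir = env)
```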

ESS & Knitr/Sweave: How to source the Rnw file into an interactive session?

This is a terribly simple request, and I can't believe I haven't found the solution to this yet, but I've been searching for it far and wide without luck.
I have an .Rnw file loaded up in Emacs, I use M-n s to compile it.
Everything works well, and it even opens an R buffer. Great. But that buffer
is entirely useless: it doesn't contain the objects that I just sourced!
Example minimal .Rnw file:
\documentclass{article}
\begin{document}
<<>>=
foo <- "bar"
@
\end{document}
Using M-n s, I now have a new R-buffer with a session loaded up, but:
> foo
Error: object 'foo' not found
That is disappointing. I would like to play around with the data interactively.
How do I achieve that? I don't want to be sourcing the file line-by-line, or
region-by-region with C-c C-c or something similar every time I change my code.
Ideally, it should be just like RStudio's source function, that leaves me with
a fully prepared R session.
I haven't tried this with sweave yet, only with knitr.
EDIT: the eval=TRUE chunk option does not seem to result in the correct behaviour.
This behaviour was recently changed in ESS. Now sweave and knitr are executed directly in the global environment, as if you had typed the code yourself at the command line. So wait a couple more weeks until ESS v13.09 is out, or use the development version.
Alternatively, you can also set ess-swv-processing-command to "%s(%s)" and you will get the same result, except automatic library loading.
For the record, knitr (in contrast to sweave) evaluates everything in its own environment unless you instruct it otherwise.
[edit: Something went wrong. I don't see the correct .ess_weave any more. Probably some git commit messup again. So it is not fixed in 13.09. Fixing it now. Sorry.]
Open an interactive R session and then call Sweave directly, I believe like this (untested). knitr works the same way, though you need to load the knitr package first.
> Sweave("yourfile.Rnw")
There is some potential for peril here, though. If you call Sweave in a session after doing other things, your code can use things previously in the workspace, thus making your results unreproducible.

Strange output from fread when called from knitr

I'm using the recently introduced fread function from data.table to read data files.
When I wrap my code into a knitr (Rmd) document, I noticed some strange output, namely lines like:
##
0%
even though the verbose option of fread was set to FALSE. I've used sink to hide this output, but I'd like to report the exact problem to the package author(s). Here's a minimal example:
library(knitr)
test = "```{r}
require(data.table)
fread('1 2 3\n')
```"
knit2html(text=test, output="test.html")
browseURL("test.html")
What is the 0% output?
It's a % progress counter. For me it prints 0%, 5%, 10%, ... 95%, 100% (for example) with a \r at the end to make it appear on one line just underneath the call to fread when typed at the prompt.
But when called from functions, batches and knitr this is undesirable. This has now been removed. From NEWS for v1.8.9 (rev 851) :
% progress console meter has been removed. The output was inconvenient in batch mode, log files and reports which don't handle \r. It was too difficult to detect where fread was being called from; plus, removing it speeds up fread a little by saving code inside the C for loop (which is why it wasn't made optional instead). Use your operating system's system monitor to confirm fread is progressing. Thanks to Baptiste for highlighting:
Strange output from fread when called from knitr
Just a quick reminder for completeness. From the top of ?fread :
This function is still under development. For example, dates are read
as character (they can be converted afterwards using the excellent
fasttime package or standard base functions) and embedded quotes ("\""
and """") have problems. There are other known issues that haven't
been fixed and features not yet implemented. But, you may find it
works in many cases. Please report problems to datatable-help or Stack
Overflow's data.table tag.
Not for production use yet. Not because it's unstable in the sense
that it crashes or is buggy (your testing will show whether it is
stable in your cases or not) but because fread's arguments and
behaviour is likely to change in future; i.e., we expect to make
(hopefully minor) non-backwards-compatible changes. Why has it been
released to CRAN then? Because a maintenance release was asked for by
CRAN maintainers to comply with new stricter tests in R-devel, and a
few Bioconductor packages depend on data.table and Bioconductor
requires packages to pass R-devel checks. It was quicker to leave
fread in and write these paragraphs, than take fread out.
It isn't a problem to be reported.
As stated by Matthew Dowle, this is a progress counter from fread.
You can set results = 'hide' to avoid this output being included:
library(knitr)
test = "```{r, results = 'hide'}
require(data.table)
fread('1 2 3\n')
```"
knit2html(text=test, output="test.html")
browseURL("test.html")
Look, no progress bar.
At a practical level, I think it would be sensible to have results = 'hide' or even include = FALSE for a step like this.
Practically, you will not want to repeat this kind of read-in step: you only ever want to read the data in once, then serialize it (using save, saveRDS or similar) so you can load that copy next time, which is faster.
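A sketch of that caching pattern (file names illustrative): parse the raw file only when no serialized copy exists, otherwise load the faster .rds version.

```r
library(data.table)

# Toy input file so the sketch is self-contained.
csv_file <- "mydata.csv"
if (!file.exists(csv_file)) writeLines(c("a,b", "1,2", "3,4"), csv_file)

cache <- "mydata.rds"
if (file.exists(cache)) {
  DT <- readRDS(cache)        # fast path: load the serialized copy
} else {
  DT <- fread(csv_file)       # slow path: parse the raw file once
  saveRDS(DT, cache)          # serialize it for next time
}
```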
Edit in light of the comment
I would split the processing up into a number of smaller chunks. You could then not include the reading in chunk, but include a dummy version that is not evaluated (so you can see the code, but not include the results)
```{r libraries}
require(data.table)
```
```{r loaddata, include = FALSE}
DT <- fread('yourfile')
```
```{r loaddummy, ref.label = 'loaddata', eval = FALSE, echo = TRUE}
```
```{r dostuff}
# doing other stuff
```
There is a showProgress argument in fread(); if you set it to FALSE, you will not see the progress output. (This is useful when knitting an R Markdown document.)
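For example (writing a toy file first so the call is self-contained):

```r
library(data.table)

# A two-row toy file; the name is illustrative.
writeLines(c("x,y", "1,2", "3,4"), "toy.csv")

# showProgress = FALSE suppresses the % progress output entirely.
DT <- fread("toy.csv", showProgress = FALSE)
```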

avoid displayed figures during sweave/pgfsweave compilation

When compiling with sweave/pgfsweave, every time a figure is created in R it is shown in a graphics window during the compilation process. This is helpful in many cases, as I can see what the figures look like as the document is being compiled.
But when I compile a large document over ssh this can be very slow. Is there a way to tell sweave/pgfsweave not to display figures during compilation? (I still want the figures in the final PDF document, though.)
For interactive sessions, the figs.only Sweave option controls this behavior. To plot figures only to the target graphics files (and not to a console graphical window) set figs.only=TRUE.
As explained in the RweaveLatex help file:
figs.only: logical (‘FALSE’). By default each figure chunk is run once,
    then re-run for each selected type of graphics. That will open a
    default graphics device for the first figure chunk and use that
    device for the first evaluation of all subsequent chunks. If this
    option is true, the figure chunk is run only for each selected type
    of graphics, for which a new graphics device is opened and then
    closed.
As with other Sweave options, you can set this option: (1) for the current compilation (e.g. Sweave("example.Rnw", figs.only=TRUE)); (2) within the .Rnw file, using \SweaveOpts{figs.only=TRUE}; or (3) as a global default, by putting SWEAVE_OPTIONS="figs.only=TRUE" in, e.g., $R_HOME/etc/Renviron.site.
figs.only is the correct way to go, and I also want to mention the default graphical device in R here:
For now you may look at this: http://yihui.name/en/2010/12/a-special-graphics-device-in-r-the-null-device/
After R 2.14.1 (not released yet) you will be able to set the default device to a null PDF device, which is both safe and fast: https://github.com/yihui/knitr/issues/9
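Once that R version is available, the idea is to install the null PDF device as the session default, so figure chunks never open an on-screen window (a sketch; pdf(file = NULL) opens a device that produces no file):

```r
# Make new graphics devices silent: a PDF device with file = NULL
# draws nowhere, so Sweave's figure chunks don't flash windows.
options(device = function(...) pdf(file = NULL))
```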
If you sweave from the command line instead of in an interactive session, graphics aren't produced in an interactive graphic window.
You can run R from the command line by typing R CMD Sweave mydoc.Rnw, or via a batch file or a makefile for larger projects. I've started to use makefiles for many of my sweave documents, as they handle dependencies, can clean up after themselves, and much more.
One option could be
<<label=myplotlabel, fig=TRUE, include=FALSE>>=
graph code
@
then
\begin{figure}[h]
\includegraphics[width=6cm, height=6cm]{myplotlabel}
\caption{My Plot}
\label{fig:label}
\end{figure}
