R2HTML or knitr for dynamic report generation? - r

I want to write an R function which processes some data and then automatically outputs an html report. This report should contain some fixed text, some text changing according to the underlying data and some figures.
What is the best way to go?
R2HTML or knitr?
What are the advantages of one over the other?
As far as I understood R2HTML allows me to build the html file sequentially while knitr already operates on an predefined .Rhtml file.
So, either use R2HTML or stitch and spin from knitr for on the fly report generation.
I would appreciate any suggestions or hints.

I grab this nice opportunity to promote pander a bit :)
This package was written for similar reasons like #Yihui's great knitr, although I wanted to let users really concentrate on the text and R code without dealing with chunk options etc. So letting users generate pretty HTML, pdf or even docx or odt output automatically with some predefined options.
These options affects e.g. the cache engine (handling dependencies without any chunk options) or the default plot options (let it be a "base" R graphics, lattice or ggplot2), so that you do no thave to set the color palette or the minor grid in each of your plots, just once - or live with the package defaults :)
The package captures the results (besides errors/warnings and other messages and the output) of all run R expression and can convert to Pandoc's markdown automatically. There are some helper functions that let you convert the resulting document written in a brew-like syntax automatically to e.g. HTML if you have pandoc installed, or export R objects to markdown/HTML/any other supported format in a live R session with a reference class.
Short demo:
brew file
Pandoc.brew('file_name.brew', output = 'foo.html', convert = 'html')
HTML output

knitr, every time. Handles graphics, lets you write your report with markdown instead of having to write html everywhere (if you want), caches things, makes coffee for you etc.

You can also build an HTML file sequentially as long as you have a decent text editor like Emacs/ESS or RStudio, etc. R2HTML is excellent in terms of its wide support to many R objects (see methods(HTML)), but I'll probably frown on RweaveHTML() due to its root Sweave().
That said, I think it may be a good idea to combine R2HTML and knitr, e.g.
# A LOESS Example
```{r loess-demo, results='asis'}
cars.lo <- loess(dist ~ speed, cars)
library(R2HTML)
HTML(cars.lo, file = '')
```
I was using the R Markdown syntax in the above example. The key is results='asis' which means to writing raw HTML code into the output.

I believe that you can also use Sweave to create HTML files, though I have heard that knitr is easier to use.

Related

have knitr output figures in different formats

Let's say I have a long report that produces a lot of figures which knitr makes as pdfs and it works really well with LaTeX. At the end of the project, my co-authors would like to have also raster based figures. One option would be to convert everything using ImageMagick. Another option would be to specify for each chunk dev = c("jpg", "pdf"), but given the number of figures, this could be cumbersome.
Is there a global switch to make knitr produce figures in pdf and other formats at the same time?
I think in the preamble
opts_chunk$set(dev = c("pdf", "jpg"))
should do. Within a R-chunk of course.

How to create R documentation file (.Rd) in latex?

Is there a simple way to create R documentation file for simple R functions?
I know I can edit a .Rd file in R-studio and preview it in HTML file. But how to put it into latex to edit and preview? Is there some latex package producing R documentation format?
There is the Rd2latex function in the tools package that will convert from the .Rd format to LaTeX format. This will let you preview the documentation in LaTeX. However this does not allow converting edits to the LaTeX document back to the .Rd document.
Look at Sweave, maybe it helpful for you.
Sweave is a tool that allows to embed the R code for complete data analyses in latex documents.
The purpose is to create dynamic reports, which can be updated automatically if data or analysis change. Instead of inserting a prefabricated graph or table into the report, the master document contains the R code necessary to obtain it. When run through R, all data analysis output (tables, graphs, etc.) is created on the fly and inserted into a final latex document.
The report can be automatically updated if data or analysis change, which allows for truly reproducible research.
Check out printr http://yihui.name/printr/ . It should do what you need if you are using knitr.
The problem with Rd2latex is that i haven't figured out which style file I need to use, otherwise it works fine.
When you generate the latex code with the Rd2latex function, make sure that you copy the Rd.sty file from the R directory, paste it and somewhere that latex can see it and use \usepackage{Rd}.
Try the knitr package, an easy way to generate flexible and fast dynamic reports with R for LaTex.

Create and save R's default codebooks as a pdf

If I load data(mtcars) it comes with a very neat codebook that I can call using ?mtcars.
I'm interested to document my data in the same way and, furthermore, save that neat codebook as a pdf.
Is it possible to save the 'content' of ?mtcars and how is it created?
Thanks, Eric
P.S. I did read this thread.
update 2012-05-14 00:39:59 PDT
I am looking for a solution using only R; unfortunately I cannot rely on other software (e.g. Tex)
update 2012-05-14 09:49:05 PDT
Thank you very much everyone for the many answers.
Reading these answers I realized that I should have made my priorities much clearer. Therefore, here is a list of my priorities in regard to this question.
R, I am looking for a solution that is based exclusively on R.
Reproducibility, that the codebook can be part of a automated script.
Readability, the text should be easy to read.
Searchability, a file that can be open with any standard software and searched (this is why I thought pdf would be a good solution, but this is overruled by 1 through 3).
I am currently labeling my variables using label() from the Hmisc package and might end up writing a .txt codebook using Label() from the same package.
(I'm not completely sure what you're after, but):
Like other package documentation, the file for mtcars is an .Rd file. You can convert it into other formats (ASCII) than pdf, but the usual way of producing a pdf does use pdflatex.
However, most information in such an .Rd file is written more or less by hand (unless you use yet another R package like roxygen/roxygen2 help you to generate parts of it automatically.
For user-data, usually Noweb is much more convenient.
.Rnw -Sweave-> -> .tex -pdflatex-> pdf is certainly the most usual way with such files.
However, you can use it e.g. with Openoffice (if that is installed) or use it with plain ASCII files instead of TeX.
Have a look at package knitr which may be easier with pure-ASCII files. (I'm not an expert, just switching over from Sweave)
If html is an option, both Sweave and knitr can work with that.
I don't know how to get the pdf of individual data sets but you can build the pdf of the entire datasets package from the LaTeX version using:
path <- find.package('datasets')
system(paste(shQuote(file.path(R.home("bin"), "R")),"CMD",
"Rd2pdf",shQuote(path)))
I'm not sure on this but it only makes sense you'd have to have some sort of LaTeX program like MikTex. Also I'm not sure how this will work on different OS as mine is windows and this works for me.
PS this is only a partial answer to your question as you want to do this for your data, but if nothing else it may get the ball rolling.
The help page that is displayed when entering ?mtcars is generated from an .Rd file, which is a LaTeX-like file that is used for all of R's help pages. Although .Rd files are LaTeX-like, you don't actually need to know LaTeX to read or write them. The actual mtcars.Rd file is available here: http://commondatastorage.googleapis.com/jthetzel-public/mtcars.Rd , which can be viewed with any text editor.
.Rd files included in the ./man directory of a package are converted to .html files when installing the package. They are converted by functions in the "tools" package.. If you would like functionality like ?mtcars for your datasets, you would need to create a package for them. That might sound complicated if you have never created a package before, but it is easy enough to learn and will make you a better R programmer. There are a number of examples of dataset-only packages on CRAN, for example msProstate: http://cran.r-project.org/web/packages/msProstate/index.html . Consider downloading the package source to see how it is organized.
For more information on creating your own packages, writing .Rd files, and building packages:
http://cran.r-project.org/doc/manuals/R-exts.html, especially "1.1.5 Data in packages".
Edit
And if you want to convert the .Rd file in your package to a .pdf, you can do so when building your package, but you will need a LaTeX compiler. If you are on Windows, see here: http://cran.r-project.org/bin/windows/Rtools/ .
You can't create a PDF with just R; you need to use other software that creates PDFs.
You could use a combination of utils::promptData, tools::Rd2HTML, and a simple custom function to open the created HTML file in the users' browser.
It would probably be easier to just make a package containing your data sets. Look at the "datasets" package for an example.
It looks like that if you want to generate a pdf, an external tool like LaTeX is always needed. I would recommend using a simple ASCII text format to generate such a file. In principle the .Rd files are also ASCII text, but I do not find them particularly readable.
Instead, I would recommend using a plain text ASCII format such as Markdown (which is e.g. used on StackOverflow) to write the text file. Such a file is already much more readable than an .Rd formatted file, and as a bonus it can quite easily be processed into a PDF should you choose to do so later on. The knitr package I think is capable of generating PDF files from Markdown sources. In addition, knitr allows you to mix in R code in the Markdown text. This code can be evaluated and the results (even figures) added to the resulting PDF.
In practice you can use sprintf to generate character vectors that you can pipe to a file in order to dynamically generate the markdown text. Just write the template one time, and mark the places for the text you want to add later like this:
base_text = "
First header
============
This document was generated on %s, by %s.
"
text_forfile = sprintf(text, some_date, some_name)
Just dump the text in text_forfile to a .md file and your done, no external tools needed. See this post on SO for how dump text to a file.

Redirecting R output and graphs

I use Sweave and LaTex to create reports from R output and graphs. But sometime it is required to have graphs in editable format. I tried R2wd package but it doesn't seem very flexible with ggplot2. I'd highly appreciate if someone point out me some efficient ways. Thanks
It really depends what you mean by "editable", and what kinds of files/endpoints you're talking about. There are a lot of discussions out there (e.g. this R-help thread from 2006) about (1) the best options for generating figures to embed in Word (or PowerPoint, which is pretty much the same question) and (2) the best options for figures that can be edited (by which I mean that they can be modified by non-R-users, not just moved from file to file). The general conclusions I have seen are:
PDF files: vector format, editable in Adobe Illustrator ($$$), only sorta-kinda-embeddable in MS Office documents
Windows metafiles or extended metafiles (WMF/EMF): vector format, very limited support outside of the Windows platform. Somewhat wonky format, but MS Office-native. Will certainly have limited support for things like alpha channels (transparency).
SVG: vector format. Very modern, editable in Inkscape (don't know where else), not particularly MS Office-compatible (I think). (Generatable at least via the Cairo package.)
PNG: raster format, but very compact (you can make the resolution absurdly large and still have a reasonably small output file); probably the easiest/lowest-common-denominator solution if you only need portability and not editability.
As of R 2.13.0, Sweave can automatically generate both PDF and PNG files on the fly for each figure chunk. If this is saved as document foo.Rnw:
\documentclass{article}
\begin{document}
\SweaveOpts{png=TRUE,pdf=TRUE,eps=FALSE}
<<fig1,fig=TRUE>>=
plot(1:5,1:5,col=1:5,pch=16)
#
\end{document}
... then after Sweaveing your directory will contain the files foo-fig1.png and foo-fig1.pdf. I don't know if that answers your question, but your question isn't entirely clear ...
The package pgfSweave links together Sweave with tikzDevice. tikzDevice uses the LaTeX tikz package to put the instructions to draw a plot into LaTeX code. So you can copy the resulting code from one file to another. pgfSweave also adds some handy features like cacheing.
You could also just output PDFs of your plots with pdf() and then insert the LaTeX code to load those plots as figures. You lose the automated file management of Sweave that way though.

practically getting started with Sweave

my question(s) might be less general than the title suggests. I am running R on Mac OS X with a MySQL database to store the data. I have been working with the Komodo / Sciviews-R for some time. Recently I had the need for auto-generated reports and looked into Sweave. I guess StatET / Eclipse appears to be the "standard" solution for Sweavers.
1) Is it reasonable to switch from Komodo to StatET Eclipse? I tried StatET before but chose Komodo over StatET because I liked the calltip / autosuggest and the more convenient config from Komodo so much.
2) What´s a reasonable workflow to generate Sweave files? Usually I develop my R code first and then care about the report later. I just learned today that there is one file in Sweave that contains R code and Latex code at once and that from this file the .tex document is created. While the example files look handily and can't really imagine how to enter my 250 + lines of R code to a file and mixed it up with Latex.
Is it possible to just enter the qplot() and ggplot() statements to a such a document and source the functionality like database connection and intermediate results somehow?
Or is it just a matter of being used to the mix of Latex and R code?
Thx for any suggestions, hints, links and back-to-the-roots-shout-outs…
You've asked several questions, so here's several answers;
Is StatEt/Eclipse the right way to do Sweave ?
Not nessarily (note: I'm an avid StatEt/Eclipse user, and use it for both pure R and Sweave/R and love it, I haven't used Komodo / sciviews-R). You should be able to run the sweave command from any R command line which will generate a .tex file. You can then turn the .tex file into something readable (like pdf) from any tex environment.
What's a good Sweave workflow ?
When I have wanted to turn an r script into a sweave report I generaly start with an empty sweave template and copy/paste my entire R script into a sweave R block just after the title, i.e;
<<label=myEntireRScript, echo=false, include=false>>
#Insert code here
myTable<-dataframe(...)
myPlot<-qplot(....)
#
Then I go through and find the parts I want to report. For instance, if i want to put a table into the report, I'll cut the R block and put an xtable block in, and the same for variables and plots.
<<label=myEntireRScript, echo=false, include=false>>=
#Insert code here
#
Put any text I want before my table here, maybe with a \Sexpr{print(variable)} named variable
<<label=myTable, result=Tex>>=
myTable<-dataframe(...)
print(xtable(mytable,...),...)
#
Any text I want before my figure
<label=myplot, result=figure>>=
myPlot<-qplot(....)
print(qplot)
#
You may want to look at these related SO posts. The rest of my post relates to your question 2.
When creating reports with Sweave, I usually keep most of the R code and the report text separate. If the R code is fast to run, then I prefer I will include something like the following at the start of the .Rnw file:
<<>>
source('/path/to/script.r')
#
On the other hand, if the R code takes a long time, I will often include something like the following at the end of the R script:
Sweave('/path/to/report.Rnw'); system('pdflatex report.tex')
That way, I can re-generate the report quickly, without needing to run all the R code again. Then, the only work R has to do in the Sweave file is print tables, make graphs and maybe extract a few figures.
Like nullglob, I prefer to keep the R and Sweave files separate, but I prefer to save the workspace with save.image() rather than to source() the file. This avoids running the R calculations with each .Rnw file compiling (and I always end up tinkering with the typesetting more than I'd like).
My general work flow is to do each paper/project in it's own folder with it's own R file(s). When the calculation side is "done", I save.image() to store all the workspace variables as-is.
Then, in the .Rnw file in the same directory I set the working directory with setwd() and load all variables with load(".Rdata"). Of course, you can change the name you use for your workspace, but I do one workspace per folder and keep the default name. Oh, and if you tinker with the R file, be sure save the workspace image and watch out for variables that linger in the workspace and .Rnw file, but are no longer part of the R file... this is where the save.image() approach can cause some headaches.
I am on a Mac and I suggest TextMate if you're mildly geeky and emacs/ess if you're really geeky. I use vim and command line R, but emacs/ess works best for most. If you're in this for the long haul, I doubt you'll regret learning emacs/ess for R, Sweave, and LaTeX.

Resources