Reducing the output file size of tikzDevice

I'm writing a large LaTeX document with lots of plots generated by R's tikzDevice package. Currently, I'm running into the LaTeX error "TeX capacity exceeded".
I've managed to fix this issue temporarily by following the remedy from Leo Liu's answer here (I've also tried the accepted answer from the same source), but this solution is only temporary because I will eventually be adding many more plots to my document, at roughly 1 MB per plot. Some people say that LuaTeX allocates memory dynamically, so this is never an issue with LuaTeX (same source); however, I have a big document and limited time available, so converting to LuaTeX is not an option (unless there is automatic conversion software).
During my search, I found this post by chance, which discusses the same problem but with MATLAB's matlab2tikz package. The solution there is to reduce the number of samples in the plot and thus the size of the resulting file. I looked through R's tikzDevice documentation (here and here) for a similar option, but unfortunately could not find one.
So the question is: how can I control the size of the plots (tikzpictures) generated by tikzDevice?

One workaround is to export the file produced by tikzDevice with the option standAlone = TRUE, then compile the resulting file with lualatex to avoid the "TeX capacity exceeded" error. This can be done directly from R, e.g.
library(tikzDevice)
tikz("figure.tex", standAlone = TRUE)
# ... plotting code ...
dev.off()
system("lualatex figure.tex; rm *.aux; rm *.log")
Then, include the file in LaTeX as a graphic using \includegraphics.
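For instance, assuming lualatex produced figure.pdf next to the .tex source, the compiled plot can be pulled in like any other graphic:
\includegraphics{figure}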
To reduce the size of the resulting PDF, you may wish to convert the image to a lossy format such as JPEG. Converting to PostScript and back can also produce smaller files, albeit at the cost of some resolution.
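As a sketch of one such post-processing step (assuming Ghostscript is installed and on the PATH; the file names are illustrative), the PDF can be recompressed from R with a system call:
# Recompress figure.pdf using Ghostscript's "ebook" quality preset
system(paste("gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4",
             "-dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH",
             "-sOutputFile=figure-small.pdf figure.pdf"))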

Related

Input .tex in Rmarkdown

I'm using R Markdown/bookdown to write a paper/PDF document, which is an amazing tool (@Yihui, thanks!). Now I'm trying to include a table I have already written in LaTeX by reading in this external .tex file. However, when knitting in RStudio with \include{some-file.tex} or \input{some-file.tex} in the body of the .Rmd, outside of a chunk, a LaTeX Error: Can be used only in preamble. is produced and the process stops. I also haven't found a way to input the file directly inside a chunk, through knitr or otherwise.
I found this question: Rmarkdown v2, embed Latex document; the question is similar, but no answer there shows how to input/include .tex files into an .Rmd.
Why would I want this? Sometimes LaTeX tables offer more layout options than building tables directly in R, for example tables containing only text rather than R-computed numbers. Also, when running models on a cluster, exporting results directly to .tex, ready for compilation, saves a lot of computation compared to having to open all those heavy .RData files just to get the results into a PDF. Similarly, when maintaining several types of reports for different audiences, keeping the full R code in one main .Rmd file and integrating only the necessary results into the other files reduces complexity, since I don't have to redo every step in each file. This way, I can keep one report with the full picture and do not have to check whether I included every little change in several documents simultaneously.
So, finally, the question is: how do I get prepared .tex files into an .Rmd document?
Thanks for your answers!
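One approach that often works (a sketch, not a definitive answer: it assumes some-file.tex contains only body-level LaTeX, i.e. no \documentclass or \begin{document}, and that the output format is PDF) is to emit the file from a chunk marked results='asis':
```{r, results='asis'}
# Read the LaTeX fragment and write it verbatim into the document
cat(readLines("some-file.tex"), sep = "\n")
```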

Transitioning research project to knitr-based setup

Finally, I've decided to move my dissertation research closer to the goal of making it as reproducible as it can be, given my circumstances. Since I currently don't use LaTeX for my dissertation report (though I'm considering that option), I believe that knitr is the best way to go.
The software project implementing the empirical part of my dissertation research (data analysis) is written in R. The project contains multiple files within a directory structure that is fairly typical for scientific workflows (top-level sub-directories: analysis, cache, data, figures, import, prepare, present, results, sandbox, utils).
I have read a lot of information (including examples) on using knitr for auto-generating reports and for reproducible research in general. However, I'm somewhat overwhelmed by the multitude of configuration options and, more importantly, still confused about the best/correct/optimal approach for using knitr in projects like mine, which contain multiple files and directories. In particular, I'm interested in advice on a framework and steps for transitioning the existing codebase without too many modifications to the R modules.
As an example, let's consider my modules related to exploratory data analysis (EDA). My current EDA workflow includes:
preliminary data, transformed from the original raw data (located in "data/transform" sub-directories);
the module "eda.R", located in the "analysis" directory;
the directory "results/eda", where my current code generates figures (SVG files) for univariate and multivariate EDA, as well as a single report document (PDF file) containing the same information in graphical form only (descriptive statistics are printed to the console when the "eda.R" script is run).
In order to transition to a knitr-based project, I have created the file "eda-report.Rmd" with R Markdown statements for setting local knitr options, including read_chunk("eda.R"). My understanding is that I now need to define the existing blocks of R code in "eda.R" as knitr chunks and then call these named chunks according to my EDA workflow.
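For illustration, the pattern I have in mind (the chunk label and code here are made up) looks like this:
# in eda.R
## ---- univariate-eda ----
hist(mtcars$mpg)  # placeholder for my actual EDA code

# in eda-report.Rmd, after read_chunk("eda.R") has run in a setup
# chunk, an empty chunk with the matching label pulls that code in:
```{r univariate-eda}
```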
Questions:
Is this the correct approach? What are the best practices for using knitr with regard to setting up project paths, using source(), grouping some plots via gridExtra, and preventing potential issues? It seems to me that, in addition to "eda-report.Rmd", I need to create another R module that will initiate processing of the .Rmd file by knitr. If so, which call should I use: rmarkdown::render() or knitr::knit()? (While I use RStudio for development, I want my code to be independent of the development environment.)
UPDATE 1 (Additional question):
Why does processing an .Rmd file in RStudio via the "Knit HTML" button produce an HTML document, while processing via the Makefile command Rscript -e 'library("knitr"); knit("eda-report.Rmd")' produces an .md file, but not HTML, despite the presence of the output: html_document directive?
Thank you for reading this! Your advice will be greatly appreciated!
In order to transition your workflow to using knitr, I suggest that rather than trying to make every last piece of code you write reproducible, you should start with the bits that will be most useful.
Since knitr is a report generation tool, the best place to start is by writing your dissertation in knitr. (You mention that you don't use LaTeX at the moment. That's fine: knitr also supports AsciiDoc, which I find easier to write. If your dissertation doesn't have many equations or tables, you might also get away with writing it in Markdown or Textile, which are even easier.)
Similarly, knitr is good for any reports or papers that you might write.
For more advanced usage, you can create presentations using knitr. (I sometimes knit xhtml Slidy presentations.)
What I wouldn't bother with is trying to knit all your exploratory data analysis. Most things you'll find are boring or dead ends, so it isn't worth the extra effort. Concentrate on exploring as fast as you can, then knit the interesting bits afterwards. Likewise, data cleaning isn't usually that interesting, so well commented code often suffices.
To answer your question about directory structure, my preference is that since knitr reports are for final output, they should be sandboxed away from scrappier exploratory work. That is, they can have their own directory, and produce their own copies of figures.
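On UPDATE 1: the "Knit HTML" button calls rmarkdown::render(), which first knits the .Rmd to .md and then runs pandoc on it, honoring the output: html_document directive; plain knitr::knit() stops at the .md stage. So for an IDE-independent driver script or Makefile, rmarkdown::render() is the call you want. A minimal sketch (the paths are assumed from the question):
# render-eda.R -- driver script; file names are assumptions
# render() knits the .Rmd, then converts the intermediate .md to
# the format declared in the YAML header (html_document here)
rmarkdown::render("analysis/eda-report.Rmd",
                  output_dir = "results/eda")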

R2HTML or knitr for dynamic report generation?

I want to write an R function that processes some data and then automatically outputs an HTML report. This report should contain some fixed text, some text that changes according to the underlying data, and some figures.
What is the best way to go?
R2HTML or knitr?
What are the advantages of one over the other?
As far as I understand, R2HTML lets me build the HTML file sequentially, while knitr operates on a predefined .Rhtml file.
So, either use R2HTML, or stitch() and spin() from knitr, for on-the-fly report generation.
I would appreciate any suggestions or hints.
I'll grab this nice opportunity to promote pander a bit :)
This package was written for reasons similar to @Yihui's great knitr, although I wanted to let users really concentrate on the text and R code without dealing with chunk options etc., and to let them generate pretty HTML, PDF, or even docx or odt output automatically with some predefined options.
These options affect e.g. the cache engine (which handles dependencies without any chunk options) or the default plot options (be it base R graphics, lattice, or ggplot2), so that you do not have to set the color palette or the minor grid in each of your plots, just once - or live with the package defaults :)
The package captures the results (besides errors/warnings, other messages, and the printed output) of every R expression it runs and can convert them to Pandoc's markdown automatically. There are some helper functions that convert a document written in a brew-like syntax to e.g. HTML (if you have pandoc installed), or export R objects to markdown/HTML/any other supported format in a live R session via a reference class.
Short demo, converting a brew file to HTML output:
Pandoc.brew('file_name.brew', output = 'foo.html', convert = 'html')
knitr, every time. Handles graphics, lets you write your report with markdown instead of having to write html everywhere (if you want), caches things, makes coffee for you etc.
You can also build an HTML file sequentially as long as you have a decent text editor like Emacs/ESS or RStudio, etc. R2HTML is excellent in terms of its wide support for many R objects (see methods(HTML)), but I'd probably frown on RweaveHTML() due to its roots in Sweave().
That said, I think it may be a good idea to combine R2HTML and knitr, e.g.
# A LOESS Example
```{r loess-demo, results='asis'}
cars.lo <- loess(dist ~ speed, cars)
library(R2HTML)
HTML(cars.lo, file = '')
```
I was using the R Markdown syntax in the example above. The key is results='asis', which means writing raw HTML code directly into the output.
I believe that you can also use Sweave to create HTML files, though I have heard that knitr is easier to use.

Large filesize of PDF output from spplot

Using a truncated shapefile and some data that I have added, I have created a colored map using spplot. However, when I export the resulting graph as a PDF, the file size is very large, and so is the final document (the PDF is around 50 MB).
Is this normal, or am I doing something wrong? I am not sure I can provide a minimum working example, since the shapefile is also quite large (300 MB).
Alternatively, is there any way to reduce the file size after the fact?
To quote Prof Brian Ripley in http://markmail.org/message/ravkpnjexagmpm4o:
If compression is enough, pdf() in R-devel does it, as does cairo_pdf() in current R. And there are other ways than Acrobat to compress/compact a PDF file: see ?tools::compactPDF and the 'Writing R Extensions' manual.
As an alternative to PDF, I suggest generating a high-resolution PNG. I have submitted such PNGs without any problem to journals such as Computers & Geosciences. Using such a PNG makes the documents they are put into much more workable. If you want to stick to PDF, pdftk (PDF toolkit) allows you to compress PDFs.
library(tools)
help(compactPDF)
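A sketch of both routes (the Ghostscript quality preset and all file names are assumptions; compactPDF() needs qpdf or Ghostscript available on the PATH):
library(tools)
# Compact every PDF found under figures/ in place
compactPDF("figures", gs_quality = "ebook")

# Or write the map as a high-resolution PNG in the first place
library(sp)  # provides spplot()
png("map.png", width = 3000, height = 2400, res = 300)
print(spplot(my_spatial_object))  # my_spatial_object is hypothetical
dev.off()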

Redirecting R output and graphs

I use Sweave and LaTeX to create reports from R output and graphs. But sometimes the graphs are required in an editable format. I tried the R2wd package, but it doesn't seem very flexible with ggplot2. I'd highly appreciate it if someone could point me to some efficient approaches. Thanks
It really depends on what you mean by "editable", and what kinds of files/endpoints you're talking about. There are a lot of discussions out there (e.g. this R-help thread from 2006) about (1) the best options for generating figures to embed in Word (or PowerPoint, which is pretty much the same question) and (2) the best options for figures that can be edited (by which I mean that they can be modified by non-R-users, not just moved from file to file). The general conclusions I have seen are:
PDF files: vector format, editable in Adobe Illustrator ($$$), only sorta-kinda-embeddable in MS Office documents
Windows metafiles or extended metafiles (WMF/EMF): vector format, very limited support outside of the Windows platform. Somewhat wonky format, but MS Office-native. Will certainly have limited support for things like alpha channels (transparency).
SVG: vector format. Very modern, editable in Inkscape (don't know where else), not particularly MS Office-compatible (I think). (Generatable at least via the Cairo package.)
PNG: raster format, but very compact (you can make the resolution absurdly large and still have a reasonably small output file); probably the easiest/lowest-common-denominator solution if you only need portability and not editability.
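For reference, a quick sketch of writing one plot to the two formats I reach for most (file names are illustrative; svg() requires cairo support, which most R builds include):
# Vector output, editable in Illustrator or Inkscape
svg("plot.svg", width = 5, height = 4)
plot(1:5, 1:5, col = 1:5, pch = 16)
dev.off()

# Compact high-resolution raster output
png("plot.png", width = 1500, height = 1200, res = 300)
plot(1:5, 1:5, col = 1:5, pch = 16)
dev.off()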
As of R 2.13.0, Sweave can automatically generate both PDF and PNG files on the fly for each figure chunk. If this is saved as document foo.Rnw:
\documentclass{article}
\begin{document}
\SweaveOpts{png=TRUE,pdf=TRUE,eps=FALSE}
<<fig1,fig=TRUE>>=
plot(1:5,1:5,col=1:5,pch=16)
@
\end{document}
... then after running Sweave, your directory will contain the files foo-fig1.png and foo-fig1.pdf. I don't know if that answers your question, but your question isn't entirely clear ...
The package pgfSweave links Sweave together with tikzDevice. tikzDevice uses the LaTeX tikz package to express the instructions for drawing a plot as LaTeX code, so you can copy the resulting code from one file to another. pgfSweave also adds some handy features like caching.
You could also just output PDFs of your plots with pdf() and then insert the LaTeX code to load those plots as figures. You lose the automated file management of Sweave that way, though.
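A sketch of that manual route (the file name is illustrative):
# Write the plot to its own PDF file
pdf("scatter.pdf", width = 5, height = 4)
plot(dist ~ speed, cars)
dev.off()
Then \includegraphics{scatter} inside a figure environment in the LaTeX source pulls it in.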
