knitr HTML output too large - r

I have been using rmarkdown/knitr's knit to html capability to generate html code for some blogs. I've found it extremely helpful and convenient, but have been running into some problems lately with file size.
When I knit a script that has graphics that use shapefiles or ggmap images, the html file gets too big for the blog host to make sense of it (I've tried with both blogger and wordpress). I believe this has to do with the relatively large data.frames/files that are the shapefiles/ggmap being put into html form. Is there anything I can do to get a smaller html file that can be parsed by a blog host?
For reference, the html output from an rmarkdown script with one graphic using a ggmap layer, a layer of shapefiles and some data is 1.90MB, which is too big for blogger or wordpress to handle in html input. Thanks for any ideas.

Below are 3 different options to help you reduce the file size of HTML files with encoded images.
1. Optimize an existing HTML file
You can run this Python script on an existing HTML file. The script will:
decode the base64 encoded images
run pngquant to optimize the images
re-encode the optimized images as base64
Usage:
python optimize_html.py infile.html
It writes output to infile-optimized.html.
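The script itself is not reproduced here. As a rough sketch of the three steps listed above (not the original optimize_html.py), the following assumes that images are embedded as `data:image/png;base64,...` URIs and that pngquant is on the PATH and can read from stdin when given `-`. The names `optimize_html`, `optimize_file` and the `optimize_png` parameter are made up for illustration.

```python
import base64
import re
import subprocess

# Matches PNG images embedded as base64 data URIs (assumed embedding format).
DATA_URI = re.compile(r"data:image/png;base64,([A-Za-z0-9+/=]+)")


def pngquant(png_bytes):
    """Compress PNG bytes by piping them through pngquant (must be on PATH)."""
    result = subprocess.run(
        ["pngquant", "--speed=1", "-"],
        input=png_bytes, stdout=subprocess.PIPE, check=True,
    )
    return result.stdout


def optimize_html(html, optimize_png=pngquant):
    """Return html with every embedded PNG passed through optimize_png."""
    def replace(match):
        original = base64.b64decode(match.group(1))   # step 1: decode
        optimized = optimize_png(original)            # step 2: optimize
        # Keep the original image if the optimizer did not make it smaller.
        best = optimized if len(optimized) < len(original) else original
        encoded = base64.b64encode(best).decode("ascii")  # step 3: re-encode
        return "data:image/png;base64," + encoded
    return DATA_URI.sub(replace, html)


def optimize_file(infile):
    """Read infile.html and write infile-optimized.html next to it."""
    with open(infile, encoding="utf-8") as f:
        html = f.read()
    outfile = re.sub(r"\.html$", "", infile) + "-optimized.html"
    with open(outfile, "w", encoding="utf-8") as f:
        f.write(optimize_html(html))
    return outfile
```

Calling `optimize_file("infile.html")` would then mirror the usage shown above. The optimizer is passed in as a parameter so that another tool (for example optipng) could be swapped in without touching the rest of the code.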
2. Use the built-in knitr hook for optimizing PNG images
knitr 1.15 includes a hook called hook_optipng that will run the optipng program on generated PNG files to reduce file size.
Here is a .Rmd example (taken from: knitr-examples/035-optipng.Rmd):
# 035-optipng.Rmd
This demo shows you how to optimize PNG images with `optipng`.
```{r setup}
library(knitr)
knit_hooks$set(optipng = hook_optipng)
```
Now we set the chunk option `optipng` to a non-`NULL` value,
e.g. `optipng=''`, to activate the hook. This string is passed to
`optipng`, so you can use `optipng='-o7'` to optimize more heavily.
```{r use-optipng, optipng=''}
library(methods)
library(ggplot2)
set.seed(123)
qplot(rnorm(1e3), rnorm(1e3))
```
3. Write your own knitr hook for any image optimizer
Writing your own hook is also quite easy, so I wrote a hook that calls the pngquant program. I find that pngquant runs faster, and the output files are smaller and look better.
Here is a .R example that defines and uses hook_pngquant (taken from this gist).
#' ---
#' title: "pngquant demo"
#' author: "Kamil Slowikowski"
#' date: "`r Sys.Date()`"
#' output:
#'   html_document:
#'     self_contained: true
#' ---
#+ setup, include=FALSE
library(knitr)
# Functions taken from knitr/R/utils.R
all_figs = function(options, ext = options$fig.ext, num = options$fig.num) {
fig_path(ext, options, number = seq_len(num))
}
in_dir = function(dir, expr) {
if (!is.null(dir)) {
owd = setwd(dir); on.exit(setwd(owd))
}
wd1 = getwd()
res = expr
wd2 = getwd()
if (wd1 != wd2) warning(
'You changed the working directory to ', wd2, ' (probably via setwd()). ',
'It will be restored to ', wd1, '. See the Note section in ?knitr::knit'
)
res
}
is_windows = function() .Platform$OS.type == 'windows'
in_base_dir = function(expr) {
d = opts_knit$get('base.dir')
if (is.character(d) && !file_test('-d', d)) dir.create(d, recursive = TRUE)
in_dir(d, expr)
}
# Here is the code you can modify to use any image optimizer.
hook_pngquant <- function(before, options, envir) {
  # Run only after the chunk has been evaluated and its figures written.
  if (before)
    return()
  ext = tolower(options$fig.ext)
  if (ext != "png") {
    warning("this hook only works with PNG")
    return()
  }
  if (!nzchar(Sys.which("pngquant"))) {
    warning("cannot find pngquant; please install and put it in PATH")
    return()
  }
  paths = all_figs(options, ext)
  in_base_dir(lapply(paths, function(x) {
    message("optimizing ", x)
    cmd = paste(
      "pngquant",
      if (is.character(options$pngquant)) options$pngquant,
      shQuote(x)
    )
    message(cmd)
    (if (is_windows()) shell else system)(cmd)
    # By default pngquant writes its result next to the input as *-fs8.png.
    x_opt = sub("\\.png$", "-fs8.png", x)
    # Replace the original only if pngquant actually produced a file.
    if (file.exists(x_opt)) file.rename(x_opt, x)
  }))
  return()
}
# Enable this hook in this R script.
knit_hooks$set(
pngquant = hook_pngquant
)
#' Here we set the chunk option `pngquant='--speed=1 --quality=0-50'`,
#' which activates the hook.
#+ use-pngquant, pngquant='--speed=1 --quality=0-50'
library(methods)
library(ggplot2)
set.seed(123)
qplot(rnorm(1e3), rnorm(1e3))
I prefer to write my reports in R scripts (.R) instead of R markdown documents (.Rmd). See http://yihui.name/knitr/demo/stitch/ for more information on how to do that.

One thing you could do is stop embedding images and other resources in the HTML file. To achieve this, set the self_contained option in the YAML header of your document to false, e.g.:
---
output:
  html_document:
    self_contained: false
---
More info here: http://rmarkdown.rstudio.com/html_document_format.html
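Before switching off self_contained, it can help to check how much of the knitted file really is embedded data. The following is a rough diagnostic sketch, assuming the resources are embedded as base64 data URIs; `embedded_share` is a made-up helper name, and the pattern will miss resources embedded in other ways.

```python
import re

# Matches base64 data URIs of any MIME type (images, fonts, scripts, ...).
DATA_URI = re.compile(r"data:[\w.+-]+/[\w.+-]+;base64,[A-Za-z0-9+/=]+")


def embedded_share(html):
    """Return (embedded_chars, total_chars) for base64 data URIs in html."""
    embedded = sum(len(m.group(0)) for m in DATA_URI.finditer(html))
    return embedded, len(html)
```

If nearly all of the characters turn out to be embedded data, externalizing the resources with self_contained: false (or shrinking the images) should remove most of the file size.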

Related

Highlighting some references in RMarkdown documents?

Is it possible to emphasise (e.g., put in bold) some references that contain a particular string (e.g., the name of a particular author) in a papaja .Rmd document (where the refs are taken from a bib file and using the apa7.csl file)?
I can propose a solution based on a pandoc Lua filter, which works not just for PDF but also for HTML output and doesn't require manual editing of the LaTeX or HTML file.
---
title: "The title"
bibliography: "r-references.bib"
output:
  pdf_document:
    pandoc_args: [ "--lua-filter", "ref-bold.lua"]
  html_document:
    pandoc_args: [ "--lua-filter", "ref-bold.lua"]
---
```{r setup, include = FALSE}
library("papaja")
r_refs("r-references.bib")
```
We used `R` [@R-base] and `Tidyverse` [@R-tidyverse] for all our analyses. Especially [@R-tidyverse] made things easy.
\vspace{10mm}
# References
ref-bold.lua
function Cite(el)
  if pandoc.utils.stringify(el.content) == "[@R-tidyverse]" then
    return (pandoc.Strong(el))
  end
end
This demo bolds all references to the tidyverse package. If we wanted to bold the reference to base R instead, we would change the second line in ref-bold.lua to pandoc.utils.stringify(el.content) == "[@R-base]", and all instances of references to base R would be bold (highlighted).
pdf output
html output
A Lua-Filter would be the most elegant solution. If you use BibLaTeX and biber, you can use the general annotation feature (see this SO answer).
Include the following in your preamble:
\renewcommand*{\mkbibnamegiven}[1]{%
\ifitemannotation{bold}
{\textbf{#1}}
{#1}}
\renewcommand*{\mkbibnamefamily}[1]{%
\ifitemannotation{bold}
{\textbf{#1}}
{#1}}
In your bib-file, use the Author+an field to define which author to highlight:
@Misc{pawel2022power,
title = {Power Priors for Replication Studies},
author = {S Pawel and F Aust and L Held and E J Wagenmakers},
year = {2022},
eprinttype = {arxiv},
eprint = {2207.14720},
url = {https://arxiv.org/abs/2207.14720},
Author+an = {2=bold}
}
Now, render the R Markdown file with XeLaTeX, keep the intermediate files and the TeX file, and render the TeX file again using biber:
rmarkdown::render("paper.Rmd", clean = FALSE)
tinytex::xelatex("paper.tex", bib_engine = "biber")
Posting a solution in case it can be useful to others as well. We can first render the LaTeX file from the RMarkdown document, then find and replace all instances of the name to be emphasised, and finally generate the pdf from the modified LaTeX file.
# knitting the original Rmd file (with "keep_tex: true" in YAML)
rmarkdown::render(input = "some_file.Rmd")
# reading the generated LaTeX file
tex_file <- readLines(con = "some_file.tex")
# putting a particular author in bold in the references list
new_tex_file <- gsub(pattern = "James, W.", replacement = "\\textbf{James, W.}", x = tex_file, fixed = TRUE)
# writing the (updated) LaTeX file
writeLines(text = new_tex_file, con = "some_file.tex")
# generating the pdf file (may need to be run twice)
system(command = "xelatex some_file.tex")

Is it possible to set chunk defaults only for a specific engine?

I am using SQL blocks in an RMarkdown document, and I want the echo option to default to FALSE for all of them – but only for sql blocks, not others.
I know I can set knitr::opts_chunk$set(echo = TRUE), but that would set it for all chunks.
As Yihui suggested in a comment, the proper way to do this is to use an option hook. The following sets echo=FALSE for sql chunks and echo=TRUE otherwise:
knitr::opts_hooks$set(echo = function(options) {
options$echo <- options$engine != "sql"
return(options)
})
I'll leave my original answer below … for entertainment. It's a workaround, required in a hypothetical parallel universe without option hooks.
You can query the current engine via opts_current$get("engine"). Based on this, you can use the following function (and extend it however you like) to determine the desired value for echo:
conditionalDefaut_echo <- function() {
return(opts_current$get("engine") != "sql")
}
The challenge is to evaluate this function whenever parsing a new chunk. This can be achieved with quote:
opts_chunk$set(echo = quote(conditionalDefaut_echo()))
To be honest, I am not sure how reliable this is – this kind of metaprogramming depends on the internal workings of knitr and might break in the future. (Maybe Yihui wants to comment on this …)
A full example with engines r and asis, where echo is FALSE for asis chunks and TRUE otherwise:
```{r}
library(knitr)
conditionalDefaut_echo <- function() {
return(opts_current$get("engine") != "asis")
}
opts_chunk$set(echo = quote(conditionalDefaut_echo()))
```
```{asis}
I'm invisible.
```
```{r}
print(1) # code visible
```

wrapping figure with latex environment in knitr/rmarkdown with hooks

I'd like to wrap figures created with knitr and rmarkdown in a "wrapfigure" environment using hooks. However, when running the minimal example below, the figure chunk only gets compiled into a markdown picture:
\begin{wrapfigure}{R}{0.3\textwidth}
![](test_files/figure-latex/unnamed-chunk-2-1.pdf)
\end{wrapfigure}
and not the expected:
\begin{wrapfigure}{R}{0.3\textwidth}
\includegraphics{test_files/figure-latex/unnamed-chunk-2-1.pdf}
\end{wrapfigure}
Minimal example:
---
header-includes:
  - \usepackage{wrapfig}
output:
  pdf_document:
    keep_tex: TRUE
---
```{r}
library(knitr)
knit_hooks$set(wrapf = function(before, options, envir) {
if(before) {
"\\begin{wrapfigure}{R}{0.3\\textwidth}"
} else {
"\\end{wrapfigure}"
}
})
```
```{r, wrapf=TRUE}
library(ggplot2)
qplot(cars$speed, cars$dist)
```
pandoc is responsible for converting the Markdown document to a TeX document. Because pandoc does not process anything between \begin{…} and \end{…}, the Markdown image syntax inside the wrapfigure environment is not converted to TeX.
You could …
1. Hide the plot (fig.show = 'hide') and use something along the lines of cat("\\includegraphics{figure/unnamed-chunk-2-1.pdf}").
2. Hide the plot as above and include some magic in the hook that saves the cat.
3. Write RNW instead of RMD if you want PDF output.
Here's an example for option 2:
knit_hooks$set(wrapf = function(before, options, envir) {
  if (before) {
    return("\\begin{wrapfigure}{R}{0.3\\textwidth}")
  } else {
    n <- options$fig.num
    output <- vector(mode = "character", length = n + 1)
    # seq_len() avoids the 1:0 trap when the chunk produced no figures
    for (i in seq_len(n)) {
      output[i] <- sprintf("\\includegraphics{%s}", fig_path(number = i))
    }
    output[n + 1] <- "\\end{wrapfigure}"
    return(paste(output, collapse = ""))
  }
})
This hook can be used with wrapf = TRUE and fig.show = "hide". (Moreover, you need to add \usepackage{graphics} to header-includes.)
But note that I would not do it! Too many things can go wrong in more complex settings. Think of cache, captions, labels, cache (again!) …
Therefore, if it is really necessary to control the typesetting of the PDF, I recommend writing RNW (option 3).

R markdown file: include help information

I would like to include the help page for the mtcars dataset at the end of an R Markdown document.
In my file I included the following:
```{r}
?mtcars
```
When I compile the markdown (output is to PDF - knitr), upon processing this instruction the help page comes up in my browser but the resulting pdf lacks this section.
Is there a way I could achieve this other than copying from one place to the other?
Thank you.
We can adapt Yihui Xie's static_help function to get the html source for a given help file
static_help <- function(pkg, topic, out, links = tools::findHTMLlinks()) {
pkgRdDB = tools:::fetchRdDB(file.path(find.package(pkg), 'help', pkg))
force(links)
tools::Rd2HTML(pkgRdDB[[topic]], out, package = pkg,
Links = links, no_links = is.null(links))
}
If we write the source to a temporary file we can then read it back in and strip off the header and footer, giving you the body of the help file to include in your markdown document
```{r, echo = FALSE, results = "asis"}
static_help <- function(pkg, topic, out, links = tools::findHTMLlinks()) {
pkgRdDB = tools:::fetchRdDB(file.path(find.package(pkg), 'help', pkg))
force(links)
tools::Rd2HTML(pkgRdDB[[topic]], out, package = pkg,
Links = links, no_links = is.null(links))
}
tmp <- tempfile()
static_help("datasets", "mtcars", tmp)
out <- readLines(tmp)
headfoot <- grep("body", out)
cat(out[(headfoot[1] + 1):(headfoot[2] - 1)], sep = "\n")
```
EDIT
The above solution produces HTML output, whereas the question actually asked for PDF output. We can adapt it to return LaTeX output instead; this time the only post-editing required is to switch % for \n
```{r, echo = FALSE, results = "asis"}
static_help <- function(pkg, topic, out, links = tools::findHTMLlinks()) {
pkgRdDB = tools:::fetchRdDB(file.path(find.package(pkg), 'help', pkg))
force(links)
tools::Rd2latex(pkgRdDB[[topic]], out, package = pkg,
Links = links, no_links = is.null(links))
}
tmp <- tempfile()
static_help("datasets", "mtcars", tmp)
out <- readLines(tmp)
out <- gsub("%", "\n", out, fixed = TRUE)
cat(out, sep = "\n")
```
However the .Rd files depend on Rd.sty. The simplest way to get LaTeX to find Rd.sty is to put a copy in the same directory as your .Rmd file. Then you need to define a custom template to replace the default pandoc LaTeX template. Again, the simplest solution is to put a copy of the default template in the same directory as your .Rmd file, then modify it by replacing everything between the \documentclass command and the \begin{document} command (lines 2 - 145) with the command
\usepackage{Rd}
Finally modify the metadata of your .Rmd file to use the new template
---
output:
  pdf_document:
    template: template.tex
---

Is it possible to view an HTML table in the viewer pane?

I would like to know if there is any function which makes it easy to visualize an html object in the RStudio's viewer pane. For instance, I would like to know if it would be possible to view an html table in the viewer pane.
library("Quandl")
library("knitr")
df <- Quandl("FBI_UCR/USCRIME_TYPE_VIOLENTCRIMERATE")
kable(head(df[,1:9]), format = 'html', table.attr = "class=nofluid")
I have a solution that works for kable tables.
library(kableExtra)  # also makes the %>% pipe available
knitr::kable(iris) %>% kable_styling()
This is automatically displayed in the viewer pane. No need for tempfile.
I have this functionality in my htmlTable package and the function is rather simple:
print.htmlTable<- function(x, useViewer = TRUE, ...){
# Don't use viewer if in knitr
if (useViewer &&
!"package:knitr" %in% search()){
htmlFile <- tempfile(fileext=".html")
htmlPage <- paste("<html>",
"<head>",
"<meta http-equiv=\"Content-type\" content=\"text/html;charset=UTF-8\">",
"</head>",
"<body>",
"<div style=\"margin: 0 auto; display: table; margin-top: 1em;\">",
x,
"</div>",
"</body>",
"</html>", sep="\n")
cat(htmlPage, file=htmlFile)
viewer <- getOption("viewer")
if (!is.null(viewer) &&
is.function(viewer)){
# (code to write some content to the file)
viewer(htmlFile)
}else{
utils::browseURL(htmlFile)
}
}else{
cat(x)
}
}
RStudio recommends that you use getOption("viewer") instead of @Ramnath's suggestion, the raw rstudio::viewer(). My solution also falls back to utils::browseURL() in case you are not using RStudio. I got the idea from this blog post.
Here is a quick way to do this in RStudio
view_kable <- function(x, ...){
tab <- paste(capture.output(kable(x, ...)), collapse = '\n')
tf <- tempfile(fileext = ".html")
writeLines(tab, tf)
rstudio::viewer(tf)
}
view_kable(head(df[,1:9]), format = 'html', table.attr = "class=nofluid")
If the kable function can return an object of class kable, then one could rename view_kable as print.kable in which case merely calling the kable function would open the table in the viewer. If you think this is useful, please go ahead and file a feature request on the knitr github page.
As explained on this RStudio Support page, the key is to use tempfile():
Note that the Viewer pane can only be used for local web content. This
content can either be static HTML files written to the session
temporary directory (i.e. files with paths generated by the tempfile
function) or a locally run web application.
See my answer to this question for a bare-bones example.
For kable objects, we can use print.kableExtra
library(knitr)
x <- kable(head(iris), format = "html")
library(kableExtra)
class(x) <- c("kableExtra", class(x))
print(x)
