I'm having a problem with assigning LaTeX environments within an RMarkdown for-loop code-chunk.
In short, I've written an R Markdown document and a series of R-scripts to automatically generate PDF reports at the end of a long data analysis pipeline. The main section of the report can have a variable number of sections that I'm generating using a for-loop, with each section containing a \subsection heading, a datatable and plot generated by ggplot. Some of these sections will be very long (spanning several pages) and some will be very short (~1/4 of a page).
At the moment I'm just inserting a \pagebreak at the end of each for-loop iteration, but that leaves a lot of wasted space with the shorter sections, so I'm trying to "group" each section (i.e. the heading, table and chart) so that there can be several per page, but they will break to a new page if the whole section won't fit.
I've tried using a figure or minipage environment, but for some reason those commands are printed as literal text when the plot is included; these work as expected with the heading and data table, but aren't returned properly in the presence of the image.
I've also tried to create a LaTeX samepage environment around the whole subsection (although not sure this will behave correctly with multi-page sections?) and then it appears that the Markdown generated for the plot is not interpreted correctly somewhere along the way (Pandoc?) when it's within that environment and throws an error when compiling the TeX due to the raw Markdown ![]... image tag.
Finally, I've also tried implementing \pagebreak[x] and \nopagebreak[y] hints at various points in the subsection but can't seem get these to be produce the desired page breaking behaviour.
I've generated an MWE that reproduces my issues below.
I'd be really grateful for any suggestions on how to get around this, or better ways of approaching "grouping" of elements that are generated in a dynamic fashion like this?
---
title: "Untitled"
author: "I don't know what I'm doing"
date: "26/07/2020"
output:
pdf_document:
latex_engine: xelatex
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, dev = "cairo_pdf")
```
```{r cars, results='asis'}
for (i in 1:5){
cat("\\begin{figure}")
cat(paste0("\\subsection{This is subsection ",i,"}"))
cat("\\Huge Here's some bulk text that would represent a data table... kasvfkwsvg fiauwe grfiwgiu iudaldbau iausbd ouasbou asdbva asdbaisd i iuahihai hiuh iaiuhqijdblab ihlibljkb liuglugu h uhi uhi uhqw iuh qoijhoijoijoi qwegru wqe grouw egq\\newline")
plot(mtcars$wt,mtcars[,i])
cat("\\end{figure}")
}
```
Edit to add: interestingly these figure and minipage environments seems to work as expected when executing the same example in an .Rnw using knitr... so does that narrow it down to an issue with Pandoc? Again, any help much appreciated!
What happens is that the raw TeX commands are not treated as TeX when going through Markdown. You can fix that by explicitly marking the relevant snippets as LaTeX:
for (i in 1:5){
cat("`\\begin{figure}`{=latex}")
cat(paste0("\\subsection{This is subsection ",i,"}"))
cat("\\Huge Here's some bulk text that would represent a data table... kasvfkwsvg fiauwe grfiwgiu iudaldbau iausbd ouasbou asdbva asdbaisd i iuahihai hiuh iaiuhqijdblab ihlibljkb liuglugu h uhi uhi uhqw iuh qoijhoijoijoi qwegru wqe grouw egq\\newline")
plot(mtcars$wt,mtcars[,i])
cat("`\\end{figure}`{=latex}")
}
See the generic raw attribute section in the pandoc manual for details.
Related
I created a custom function which sets mfrow to nxn and creates n^2 scatter plots, with multiple data sets on each plot, based on an input list of data frames. The signature of my plotting function looks like this:
plot.return.list<-function(df.list,num.plot,title)
Where df.list is my list of data frames, num.plot is the total number of plots to generate (used to set mfrow) and title is the overall plot title (the function generates titles for each individual sub-graph).
This creats plots fine when I run the function from the console. However, I'm trying to get this figure into a markdown document using RStudio, like so:
```{r, fig.height=6,fig.width=6}
plot.return.list(f.1.list,4,bquote(atop("Numerical Approximations vs Exact Soltuions for "
,dot(x)==-1*x*(t))))
```
Since I haven't set the echo option in my {r} statement, this prints both the plotting code as well as the plot itself. However, if my first line instead reads:
{r, fig.height=6,fig.width=6,echo=FALSE}
Then both the code AND the plot disappear from the final document.
How do I make the plot appear WITHOUT the code? According to the example RStudio gives, setting echo=FALSE should make the plot appear without the code, but that isn't the behavior I'm observing.
EDIT: I seem to have tracked my problem down to kable. Whether or not I'm making a custom plot-helper function, any call to kable kills my plot. This can be reproduced in a markdown:
---
title: "repro"
author: "Frank Moore-Clingenpeel"
date: "October 9, 2016"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
options(default=TRUE)
repro.df<-data.frame((0.1*1:10)%*%t(1:10))
```
```{r, echo=FALSE}
kable(repro.df)
```
```{r, fig.height=6,fig.width=6,echo=FALSE}
plot(repro.df[,1],repro.df[,2])
```
In this code, the plot won't plot because I have echo set to false; removing the flag makes the plot visible
Also note that in my repro code, kable produces a table with a bunch of garbage in the last line--I don't know why, but this isn't true for my full original code, and I don't think it's related to my problem.
Thanks for the reproducible example. From this I can see that the problem is you don't have a newline between your table chunk and your plot chunk.
If you were to knit this and examine the MD file produced by knit (or set html_document as your output format and have keep_md: true to look at it), you would see that the table code and plot code are not separated by any newline. Pandoc needs this to delimit the end of the table. Without it, it thinks your ![](path/to/image.png) is part of the table and hence puts it as a "junk line" in the table rather than an image on its own.
Just add a newline between the two chunks and you will be fine. (Tables need to be surrounded with blank lines).
(I know you are compiling to LaTeX so it may confuse you why I am talking about markdown. In case it does, when you do Rmd -> PDF, Rmarkdown uses knit to go from RMD to MD, and then pandoc to go from MD to tex. This is why you still need to make sure your markdown looks OK).
I'm writing a report in R Markdown in which I don't want to print any of my R code in the main body of the report – I just want to show plots, calculate variables that I substitute into the text inline, and sometimes show a small amount of raw R output. Therefore, I write something like this:
In the following plot, we see that blah blah blah:
```{r snippetName, echo=F}
plot(df$x, df$y)
```
Now...
That's all well and good. But I would also like to provide the R code at the end of the document for anybody curious to see how it was produced. Right now I have to manually write something like this:
Here is snippet 1, and a description of what section of the report
this belongs to and how it's used:
```{r snippetName, eval=F}
```
Here is snippet 2:
```{r snippetTwoName, eval=F}
```
<!-- and so on for 20+ snippets -->
This gets rather tedious and error-prone once there are more than a few code snippets. Is there any way I could loop over the snippets and print them out automatically? I'm hoping I could do something like:
```{r snippetName, echo=F, comment="This is snippet 1:"}
# the code for this snippet
```
and somehow substitute the following result into the document at a specified point when it's knitted:
This is snippet 1:
```{r snippetName, eval=F}
```
I suppose I could write some post-processing code to scan through the .Rmd file, find all the snippets, and pull out the code with a regex or something (I seem to remember there's some kind of options file you can use to inject commands into the pandoc process?), but I'm hoping there might be something simpler.
Edit: This is definitely not a duplicate – if you read my question thoroughly, the last code block shows me doing exactly what the answer to the linked question suggests (with a slight difference in syntax, which could have been the source of the confusion?). I'm looking for a way to not have to write out that last code block manually for all 20+ snippets in the document.
This is do-able within knitr, no need to use pandoc. Based on an example posted by Yihui at https://github.com/yihui/knitr-examples/blob/master/073-code-appendix.Rnw
Set echo=FALSE throughout your document: opts_chunk$set(echo = FALSE)
Then put this chunk at the end to print all code:
```{r show-code, ref.label=all_labels(), echo = TRUE, eval=FALSE}
```
This will print code for all chunks. Currently they all show up in a single block; I'd love to figure out how to put in the chunk label or some other header... For now I start my chunks with comments (probably not a bad idea in any case).
Updated: to show only the chunks that were evaluated, use:
ref.label = all_labels(!exists('engine')) - see question 40919201
Since this is quite difficult if not impossible to do with knitr, we can take advantage of the next step, the pandoc compilation, and of pandoc's ability to manipulate content with filters. So we write a normal Rmd document with echo=TRUE and the code chunks are printed as usual when they are called.
Then, we write a filter that finds every codeblock of language R (this is how a code chunk will be coded in pandoc), removes it from the document (replacing it, here, with an empty paragraph) and storing it in a list. We then add the list of all codeblocks at the end of the document. For this last step, the problem is that there really is no way to tell a python filter to add content at the end of a document (there might be a way in haskell, but I don't know it). So we need to add a placeholder at the end of the Rmd document to tell the filter to add the R code at this point. Here, I consider that the placeholder will be a CodeBlock with code lastchunk.
Here is the filter, which we could save as postpone_chunks.py.
#!/usr/bin/env python
from pandocfilters import toJSONFilter, Str, Para, CodeBlock
chunks = []
def postpone_chunks(key, value, format, meta):
if key == 'CodeBlock':
[[ident, classes, keyvals], code] = value
if "r" in classes:
chunks.append(CodeBlock([ident, classes, keyvals], code))
return Para([Str("")])
elif code == 'lastchunk':
return chunks
if __name__ == "__main__":
toJSONFilter(postpone_chunks)
Now, we can ask knitr to execute it with pandoc_args. Note that we need to remember to add the placeholder at the end of the document.
---
title: A test
output:
html_document:
pandoc_args: ["--filter", "postpone_chunks.py"]
---
Here is a plot.
```{r}
plot(iris)
```
Here is a table.
```{r}
table(iris$Species)
```
And here are the code chunks used to make them:
lastchunk
There is probably a better way to write this in haskell, where you won't need the placeholder. One could also customize the way the code chunks are returned at the end to add a title before each one for instance.
This is my first questions on StackOverflow, so please let me know if I'm doing anything wrong.
I'm using R to generate a lot of very large PDF documents. My data is about 580,000 observations, and breaks down in to 32 categories with each category containing 70 answers to between 20 and 300 questions. Currently I use two for loops (I try to avoid for loops, but for creating these pdfs it was the only way that worked). The first goes through and creates a pdf for the category with a title page, then the second adds a page for each graph showing the results of that question. I'm using ggplot2 & the "pdf" function.
The script works great, creating 32 pdfs (one for each category) with a custom title page and pages for all the questions in that category. I would like to add a Table of Contents after the title page. I know how to add a page with labels and page numbers, but I need one that links to each question.
I've searched this site and Google, but haven't found any way to do this in R. This question: Adding a table of contents to PDF with R plots talks about using RPython. I've also come across sources mentioning "hyperref", LaTex, Pandoc, and Knitr. I know how to use Kintr in an Rmarkdown doc, but that doesn't work for what I'm trying to do. I'm not really sure how to work with any of the others, so solutions with using them went over my head.
Is there not a way to work with creating a Table of Contents or just hyperlinks to PDF pages inside R, without going to those other languages?
Have you tried just clicking on the section names in the table of contents? By default, these seem to be hyperlinked, although there isn't any colouration that hints at it.
To help you see what might be happening, add / change your YAML header to add the following:
output:
pdf_document:
keep_tex: true
toc: true
toc_depth: 3
That will get the intermediate .tex file kept. If you open that up after knitting, you should already see references to hyperref in it.
I then find my table of contents being defined as:
{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{3}
\tableofcontents
}
which produces a hyperlinked TOC, but with "black" hyperlinks!
If you want to change the colour and see them show up, you can open the tex file in RSudio and simply change the "black" to "blue" and have RStudio run "Compile PDF" and you should see them showing up.
If you want your page numbers hyperlinked rather than the description, add the following into your YAML:
header-includes:
- \hypersetup{linktocpage}
Share & Enjoy!
I just remembered I left this open and thought I'd go back and post how I ended up solving it, well sorta. Instead of an R script, I used a R Markdownfile to create a combined pdf, which included all sections with their subsequent questions as different levels. I was able to create a pdf for each section individually with a linked clickable Table of Contents including all of its questions(pages) and different header levels for title pages.
The key was pandoc.header, which allowed me to create the headers, which show in the TOC. I think neither the for loops, nor the ggplot, which was created for each page, is relevant. Here is an overview of the .rmd :
title:
author:
output:
pdf_document:
toc: true
```{r results = "asis", message=FALSE, warning=FALSE, echo=FALSE, fig.height = 11, fig.width = 8}
for(i in 1:length(categories){
pandoc.header(paste("Category ",category_num, ": ", category discription), level = 1)
category title page
for(i in 1:numberofquestions){
pandoc.header(paste("Question ",question_num, ": ", subtitle1), level = 2)
print(ggplot())
}}
```
The only inconvenient part is that each page must have a header to be linked to and I really didn't like the title pages having one, but it looks like I can manually edit that out with what dsz posted.
I am using knit()and markdownToHTML() to automatically generate reports.
The issue is that I am not outputting plots when using these commands. However, when I use RStudio's Knit HTML button, the plots get generated. When I then use my own knit/markdown function, it suddenly outputs the plot. When I switch to another document and knit that one, the old plot appears.
Example:
```{r figA, result='asis', echo=TRUE, dpi=300, out.width="600px",
fig=TRUE, fig.align='center', fig.path="figure/"}
plot(1:10)
```
Using commands:
knit(rmd, md, quiet=TRUE)
markdownToHTML(md, html, stylesheet=style)
So I guess there are 2 questions, depending on how you want to approach it:
What magic is going on in Rstudio's Knit HTML?
How can I produce/include without depending on RStudio's Knit HTML button?
The only issue I see here is that this doesn't work when you have the chunk options {...} spanning two lines. If it's all on one line, it works fine. Am I missing something?
See how this is not allowed under knitr in the documentation:
Chunk options must be written in one line; no line breaks are allowed inside chunk options;
RStudio must handle linebreaks in a non-standard way.
This is really embarrassing, I really thought I read the documentation carefully:
include: (TRUE; logical) whether to include the chunk output in the
final output document; if include=FALSE, nothing will be written into
the output document, but the code is still evaluated and plot files
are generated if there are any plots in the chunk, so you can manually
insert figures; note this is the only chunk option that is not cached,
i.e., changing it will not invalidate the cache
Simply adding {..., include=TRUE} did the trick. I would say it would be a pretty sensible default though.
I'm trying to come up with a good system for generating slides and accompanying handouts. The ideal system would have the following properties:
beautiful in both presentation (PDF/HTML) and handout (PDF) layouts (handouts should have room for taking notes)
embedded R chunks, figures, other JPG/PNG pictures, etc.
easy to compose
build using command-line tools
bibliography support
pandoc slide separator format (automatically generate a new slide after headers of a specified level) is preferred
I can live with a little bit of additional processing (e.g. via sed), but would prefer not to write a huge infrastructure
two-column layouts: there is a SO post on how to get multi-column slides from pandoc, but it is LaTeX- rather than HTML-oriented.
ability to adjust sizes of embedded images (other than R-generated figures) and column widths on the fly
Here's what I've discovered so far about the various options:
Slidify:
doesn't do pandoc slide separator format, although there is a workaround
the suggestion for creating handouts is to print to PDF; I'd like to leave room for notes etc. (I could probably figure out a way to do that using something like PDFtk or psnup ...)
RStudio presentations (.Rpres files):
does lots of things nicely, including multi-columns with specified widths
doesn't support pandoc slide separator format
I can't figure out what's going on under the hood. There is RStudio documentation that describes the translation process for regular HTML, but it doesn't seem to cover the R presentation format (which isn't quite the same). (I have previously invested some effort in figuring out how to get RStudio-like output via pandoc ...), which means I can't generate slides etc. from the command line.
RStudio's Development Version (as of March 2014) comes bundled with Pandoc and version 2 of rmarkdown. It addresses many of the above issues with the .Rpres format.
pandoc: may be the only markdown-translator that has features such as footnotes, bibliography support, etc.. I can also use pandoc to generate LaTeX using the tufte-handout class, which meets my criteria of beauty.
Unfortunately, it seems not to have built-in two-column format support. Yihui Xie's HTML5 example doesn't show any two-column examples, and it claims (on slide 5) that clicking the "Knit HTML" button in RStudio is equivalent to pandoc -s -S -i -t dzslides --mathjax knitr-slides.md -o knitr-slides.html, but it doesn't seem to be ...
LaTeX/beamer: I could simply compose in Rnw (knitr-dialect Sweave) rather than R markdown to begin with. This would give me ultimate flexibility ...
despite many years of LaTeX use I do find LaTeX composition more of a pain than markdown composition.
After all that, my specific question is: what's the best (easiest) way to generate a two-column layout for HTML output?
Any other advice will also be appreciated.
This is an old Q, but I was recently plagued by a similar question, here's what I found:
Using the RPres format, two columns can be specified like so (details). Note that RPres can only be converted to HTML by clicking a button in RStudio, there doesn't seem to be any command line method, which is a bit annoying. Despite, that I'd say it is currently the simplest and most flexible method for getting slide columns with markdown:
===
Two Column Layout
===
This slide has two columns
***
```{r, echo=FALSE}
plot(cars)
```
Some flexibility is afforded by adjusting the column proportions:
===
Two Column Layout
===
left: 30%
This slide has two columns
***
```{r, echo=FALSE}
plot(cars)
```
With rmarkdown we can get two columns, but with no control over where the break is, which is a bit of a problem:
---
output: ioslides_presentation
---
## Two Column Layout {.columns-2}
This slide has two columns
```{r, echo=FALSE}
plot(cars)
```
We can also mix markdown and LaTeX in an Rmd file using the beamer_presentation format in RStudio to get two columns like this, but can't run any code in either column, which is a limitation:
---
output: beamer_presentation
---
Two Column Layout
-------
\begin{columns}
\begin{column}{0.48\textwidth}
This slide has two columns
\end{column}
\begin{column}{0.48\textwidth}
If I put any code in here I get an error, see
https://support.rstudio.com/hc/communities/public/questions/202717656-Can-we-have-columns-in-rmarkdown-beamer-presentations-
\end{column}
\end{columns}
Seems like a regular Rnw LaTeX doc is the best way to get columns if you want to use LaTex, not this markdown hybrid (cf. two column beamer/sweave slide with grid graphic)
In all of the above an image can be placed in an column.
The slidify website has instructions on making two columns here: http://slidify.org/customize.html but it's not clear what has to go into the assets/layouts folder to make it work
You can use fenced_divs notation or ::: to create columns or `Two Content layout'. See also this page to know more about the notation.
## Slide With Image Left
::: columns
:::: column
left
::::
:::: column
right
```{r your-chunk-name, echo=FALSE, fig.cap="your-caption-name"}
knitr::include_graphics("your/figure/path/to/the-image.pdf")
#The figure will appear on the right side of the slide...
```
::::
:::
Since pandoc 2+, which supports the notation, was implemented in RStudio v1.2+, you may need to install RStudio v1.2+ first. The installation is easy enough (at least in my case); just download and install RStudio v1.2+. In the way of installation, the former version of RStudio on your computer will be replaced with the new one without uninstalling it manually.
The ::: notation can be used even when you knit .Rmd files with beamer_presentation option, as well as when you create HTML slides. So we don't have to neither mix markdown and LaTeX notation in one file, nor add additional codes any longer: just knit the file as you knit other .Rmd with other options.
I now have what I think is a reasonable solution that should apply at least to ioslides-based solutions, and maybe (?) to other HTML5-based formats. Starting here, I added
<style>
div#before-column p.forceBreak {
break-before: column;
}
div#after-column p.forceBreak {
break-after: column;
}
</style>
to the beginning of my document; then putting <p class="forceBreak"></p> within a slide with {.columns-2} breaks the column at that point, e.g.
## Latin hypercube sampling {.columns-2}
- sample evenly, randomly across (potentially many) uncertain parameters
<p class="forceBreak"></p>
![](LHScrop.png)
[User:Saittam, Wikipedia](https://commons.wikimedia.org/wiki/File:LHSsampling.png#/media/File:LHSsampling.png)
There may be an even better way, but this isn't too painful.
#ChrisMerkord points out in comments that
.forceBreak { -webkit-column-break-after: always; break-after: column; }
worked instead (I haven't tested ...)
I got an idea from HERE, the basic solutions was:
### Function *inner_join*
. . .
`<div style="float: left; width: 50%;">`
``` {r, echo = FALSE, results = 'markup', eval = TRUE}
kable(cbind(A,B))
```
`</div>`
`<div style="float: right; width: 50%;">`
```{r, echo = TRUE, results = 'markup', eval = TRUE}
inner_join(A,B, by="C")
```
`</div>`
There is a workaround for beamer error.
In short: Error is related to pandoc conversion engine, which treats everything between \begin{...} and \end{...} as TeX. It can be avoided by giving new definition for begin{column} and end{column} in yaml header.
Create mystyle.tex and write there:
\def\begincols{\begin{columns}}
\def\begincol{\begin{column}}
\def\endcol{\end{column}}
\def\endcols{\end{columns}}
In the Rmd file use these new definitions
---
output:
beamer_presentation:
includes:
in_header: mystyle.tex
---
Two Column Layout
-------
\begincols
\begincol{.48\textwidth}
This slide has two columns.
\endcol
\begincol{.48\textwidth}
```{r}
#No error here i can run any r code
plot(cars)
```
\endcol
\endcols
And you get:
So far I haven't been able to do better than hacking my own little bit of markup on top of the rmd format: I call my source file rmd0 and run a script including this sed tidbit to translate it to rmd before calling knit:
sed -e 's/BEGIN2COLS\(.*\)/<table><tr><td style="vertical-align:top; width=50%" \1>/' \
-e 's/SWITCH2COLS/<\/td><td style="vertical-align:top">/' \
-e 's/END2COLS/<\/td><\/tr><\/table>/' ...
There are a few reasons I don't like this. (1) It's ugly and special-purpose, and I don't have a particularly good way to allow optional arguments (e.g. relative widths of columns, alignment, etc.). (2) It has to be tweaked for each output format (e.g. if I wanted LaTeX/beamer output I would need to substitute \begin{columns}\begin{column}{5cm} ... \end{column}\begin{column}{5cm} ... \end{column}\end{columns} instead (as it turns out I want to ignore the two-column formatting when I make LaTeX-format handouts, so it's a little easier, but it's still ugly).
Slidify may yet be the answer.
Not a direct solution, but Yihui's Xaringan package https://github.com/yihui/xaringan/ works for me. It's based on remark.js. In the default template, you can use .pull-left[] and .pull-right[] . Example: https://slides.yihui.name/xaringan/#15. You only need a minimum tweak on the existing .rmd files.