Automating the generation of preformated text in Rmarkdown using R - r

I'm creating a document in which I repeat the same formatting multiple times. So I want to automate the process using the for loop in R. Here's a simple example.
Assume, I have an R code that computes some information for all cut values in ggplot2::diamonds dataset, which I want then to print in my document in five separate sections (one section per cut):
library(knitr); library(data.table)
dt <- data.table(ggplot2::diamonds)
for (cutX in unique(dt$cut)) {
dtCutX <- dt[cut==cutX, lapply(.SD,mean), .SDcols=5:7]
#### START of the Rmd part that needs to be printed
# Section: The Properties of Cut `cutX`
<!-- NB: This is the Section title in Rmd format, not the comment in R format ! -->
This Section describes the properties of cut `r cutX`. Table below shows its mean values:
`r knitr::kable(dtCutX)`
The largest carat value for cut `r cutX` is `r dt[cut=='Ideal', max(carat)]`
#### END of the Rmd part that needs to be printed
}
How do I do that?
I.e., How do I insert inside my main Rmd code an R code that tells it to insert other Rmd codes (in a for loop) - to produce automatically five Sections for five types of diamond cuts?
PS.
I found these related posts:
Reusing chunks in Knitr and
Use loop to generate section of text in rmarkdown
but was not able yet to recreate the solution for the above example.

For this kind of task, you can use glue package to evaluate R expressions inside character strings.
Here's an Rmd file that answer your question:
---
title: "Untitled"
output: html_document
---
```{r echo=FALSE, results='asis'}
library(data.table)
dt <- data.table(ggplot2::diamonds)
for (cutX in unique(dt$cut)) {
dtCutX <- dt[cut==cutX, lapply(.SD,mean), .SDcols=5:7]
cat("\n\n# Section: The Properties of Cut `cutX`\n")
cat(glue::glue("This Section describes the properties of cut {cutX}. Table below shows its mean values:\n"))
print(knitr::kable(dtCutX))
cat(glue::glue("\n\nThe largest carat value for cut {cutX} is {dt[cut=='Ideal', max(carat)]}\n"))
}
```

Related

Removing last 10 lines of each RMarkdown in a folder

I wish to run a R script where I can remove the last 10 lines of each Rmarkdown file as the last 10 lines are confidential content. I have been given 10 such RMarkdowns in total. Is there a better way to do this than doing it one by one?
I have provieded a dummy tempalte of the Rmarkdown.
---
title: "Untitled"
author: "Unknown"
date: "28/05/2021"
output: pdf_document
---
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
## Including Plots
You can also embed plots, for example:
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
## Including Plots
You can also embed plots, for example:
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
## Including Plots
You can also embed plots, for example:
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
## Including Plots
You can also embed plots, for example:
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
You can use readLinesand writeLinesfor this:
txt <- readLines("test.md")
N <- length(txt)
writeLines(txt[1:(N-10)], "test_short.md")
Then use dirto enumerate all files and automate with lapply or a for loop.

R Markdown Grouping of Figures to Prevent Pagebreak

I'm having a problem with assigning LaTeX environments within an RMarkdown for-loop code-chunk.
In short, I've written an R Markdown document and a series of R-scripts to automatically generate PDF reports at the end of a long data analysis pipeline. The main section of the report can have a variable number of sections that I'm generating using a for-loop, with each section containing a \subsection heading, a datatable and plot generated by ggplot. Some of these sections will be very long (spanning several pages) and some will be very short (~1/4 of a page).
At the moment I'm just inserting a \pagebreak at the end of each for-loop iteration, but that leaves a lot of wasted space with the shorter sections, so I'm trying to "group" each section (i.e. the heading, table and chart) so that there can be several per page, but they will break to a new page if the whole section won't fit.
I've tried using a figure or minipage environment, but for some reason those commands are printed as literal text when the plot is included; these work as expected with the heading and data table, but aren't returned properly in the presence of the image.
I've also tried to create a LaTeX samepage environment around the whole subsection (although not sure this will behave correctly with multi-page sections?) and then it appears that the Markdown generated for the plot is not interpreted correctly somewhere along the way (Pandoc?) when it's within that environment and throws an error when compiling the TeX due to the raw Markdown ![]... image tag.
Finally, I've also tried implementing \pagebreak[x] and \nopagebreak[y] hints at various points in the subsection but can't seem get these to be produce the desired page breaking behaviour.
I've generated an MWE that reproduces my issues below.
I'd be really grateful for any suggestions on how to get around this, or better ways of approaching "grouping" of elements that are generated in a dynamic fashion like this?
---
title: "Untitled"
author: "I don't know what I'm doing"
date: "26/07/2020"
output:
pdf_document:
latex_engine: xelatex
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, dev = "cairo_pdf")
```
```{r cars, results='asis'}
for (i in 1:5){
cat("\\begin{figure}")
cat(paste0("\\subsection{This is subsection ",i,"}"))
cat("\\Huge Here's some bulk text that would represent a data table... kasvfkwsvg fiauwe grfiwgiu iudaldbau iausbd ouasbou asdbva asdbaisd i iuahihai hiuh iaiuhqijdblab ihlibljkb liuglugu h uhi uhi uhqw iuh qoijhoijoijoi qwegru wqe grouw egq\\newline")
plot(mtcars$wt,mtcars[,i])
cat("\\end{figure}")
}
```
Edit to add: interestingly these figure and minipage environments seems to work as expected when executing the same example in an .Rnw using knitr... so does that narrow it down to an issue with Pandoc? Again, any help much appreciated!
What happens is that the raw TeX commands are not treated as TeX when going through Markdown. You can fix that by explicitly marking the relevant snippets as LaTeX:
for (i in 1:5){
cat("`\\begin{figure}`{=latex}")
cat(paste0("\\subsection{This is subsection ",i,"}"))
cat("\\Huge Here's some bulk text that would represent a data table... kasvfkwsvg fiauwe grfiwgiu iudaldbau iausbd ouasbou asdbva asdbaisd i iuahihai hiuh iaiuhqijdblab ihlibljkb liuglugu h uhi uhi uhqw iuh qoijhoijoijoi qwegru wqe grouw egq\\newline")
plot(mtcars$wt,mtcars[,i])
cat("`\\end{figure}`{=latex}")
}
See the generic raw attribute section in the pandoc manual for details.

Estimating read time for an R Markdown document and printing it possibly below the title

System: Windows 10, RStudio 1.0.153, R 3.4.1
I write blog posts in R Markdown. I would like to use Medium's approach of including an estimated reading time for the blog post just underneath the title, because research has suggested that including reading time in your blog post increases the likelihood that people will read it all. To estimate reading time I would like to use a simplified version of the algorithm used on Medium:
compute the number N of words in the document, preferably excluding the YAML preface and the R code chunks with echo=FALSE, i.e., all the text that my visitors won't see)
compute the number m of imported images and R-generated plots: if counting plots makes the problem too complicated, it's ok to count only imported images. I import each image separately in its own R chunk with the include_graphics function, thus even just counting the number of occurences of the include_graphics keyword would work.
estimated time in minutes is
seconds <- N/200*60+12*m
minutes <- round(seconds/60)
fin <- ifelse(minutes>1, " minutes", " minute")
estimation <- paste0("Estimated reading time: ", minutes, fin)
The code for the estimation should be executed each time the R Markdown doc is knitted, so ideally inside an R chunk in the R Markdown doc itself, or in an external function which is then sourced in an R chunk. This way, the estimation would automatically update as I edit the R Markdown document. How can I do that? For example, given the sample R document below, which has N=106 words and m=2 images, the text
"Estimated reading time: 1 minute"
should be be included in the HTML output, just below the title, author and date.
Sample R Markdown document:
---
title: "Testy test"
author: "anonymous"
date: "`r Sys.Date()`"
output: html_document
---
```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r load_data, echo=FALSE}
include_graphics("doge.jpg")
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

R Markdown makes custom plot disappear if I set echo=FALSE

I created a custom function which sets mfrow to nxn and creates n^2 scatter plots, with multiple data sets on each plot, based on an input list of data frames. The signature of my plotting function looks like this:
plot.return.list<-function(df.list,num.plot,title)
Where df.list is my list of data frames, num.plot is the total number of plots to generate (used to set mfrow) and title is the overall plot title (the function generates titles for each individual sub-graph).
This creats plots fine when I run the function from the console. However, I'm trying to get this figure into a markdown document using RStudio, like so:
```{r, fig.height=6,fig.width=6}
plot.return.list(f.1.list,4,bquote(atop("Numerical Approximations vs Exact Soltuions for "
,dot(x)==-1*x*(t))))
```
Since I haven't set the echo option in my {r} statement, this prints both the plotting code as well as the plot itself. However, if my first line instead reads:
{r, fig.height=6,fig.width=6,echo=FALSE}
Then both the code AND the plot disappear from the final document.
How do I make the plot appear WITHOUT the code? According to the example RStudio gives, setting echo=FALSE should make the plot appear without the code, but that isn't the behavior I'm observing.
EDIT: I seem to have tracked my problem down to kable. Whether or not I'm making a custom plot-helper function, any call to kable kills my plot. This can be reproduced in a markdown:
---
title: "repro"
author: "Frank Moore-Clingenpeel"
date: "October 9, 2016"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
options(default=TRUE)
repro.df<-data.frame((0.1*1:10)%*%t(1:10))
```
```{r, echo=FALSE}
kable(repro.df)
```
```{r, fig.height=6,fig.width=6,echo=FALSE}
plot(repro.df[,1],repro.df[,2])
```
In this code, the plot won't plot because I have echo set to false; removing the flag makes the plot visible
Also note that in my repro code, kable produces a table with a bunch of garbage in the last line--I don't know why, but this isn't true for my full original code, and I don't think it's related to my problem.
Thanks for the reproducible example. From this I can see that the problem is you don't have a newline between your table chunk and your plot chunk.
If you were to knit this and examine the MD file produced by knit (or set html_document as your output format and have keep_md: true to look at it), you would see that the table code and plot code are not separated by any newline. Pandoc needs this to delimit the end of the table. Without it, it thinks your ![](path/to/image.png) is part of the table and hence puts it as a "junk line" in the table rather than an image on its own.
Just add a newline between the two chunks and you will be fine. (Tables need to be surrounded with blank lines).
(I know you are compiling to LaTeX so it may confuse you why I am talking about markdown. In case it does, when you do Rmd -> PDF, Rmarkdown uses knit to go from RMD to MD, and then pandoc to go from MD to tex. This is why you still need to make sure your markdown looks OK).

How do I present a variable out of sequence in R markdown?

I am preparing a .Rmd document that begins with an executive summary. I would like to include some inline R code to present a few of the key results up front; however, those results are calculated as part of the body later in the document.
Is there any way to present a result in the rendered document out of order/sequence with the actual calculation?
You could use chunk reuse in knitr (see http://yihui.name/knitr/demo/reference/). Here you would put your chunks to analyze first but not create the output, then output the summary, then the details. Here is some quick markdown knitr code to show this:
```{r chunk1, echo=FALSE, results='hide'}
x <- rnorm(1)
x
```
the value of x is `r x`.
```{r chunk2, ref.label='chunk1', echo=TRUE, results='markup', eval=2}
```
Note that the code will be evaluated twice unless you take steps to prevent this (the eval=2 in my example).
Another option would be to create 2 child documents, the first runs your main code and creates the output, the second creates the summary. Then in your parent document you include the summary first, then the detail part. I think that you will need to run knitr on these by hand so that you do it in the correct order, the automatic child document tools would probably run in the wrong order.

Resources