Automate PDF Reports in R - r

I have a .CSV file that includes an ID column and several text columns (title of story, content of story) and columns for a multiple choice questions (each question in a different column). Also, there are columns for a numerical variable (ternary plots).
Here is a screen shot of the CSV file:
CSV File
Now what I'm trying to do is to automatically generate multiple PDF reports for each ID number (generate a unique report for each individual person). With different values in the report depending on the ID column in the CSV.
I thought the best way to do that in R was to create a RMarkdown file and use parameters to make the values of the report match the ID number values.
Here is my code for the RMarkdown file:
---
title: "`r params$new_title`"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
pdf_document:
latex_engine: xelatex
html_document:
df_print: paged
header-includes:
\usepackage{fontspec}
\usepackage{fancyhdr}
mainfont: Arial
params:
id:
label: ID
value: 1
input: select
choices:
- 1
- 2
- 3
- 4
- 5
new_title: "My Title!"
---
library(tidyverse)
library(ggtern)
library(ggplot2)
library(readr)
library(lubridate)
library(magrittr)
library(rmarkdown)
knitr::opts_chunk$set(echo = FALSE)
data <- readr::read_csv("dummy.csv")
data_id <- data %>%
filter(id == v)
**Your title:** `r data_id$title`
**Your micro-narrative:** `r data_id$narrative`
Now the code is working, but the formatting in the generated report is not how I want it.
If the same ID number has multiple entries for story title and story content, the values are displayed next to each other. What I want is this:
Story #1 title:
Story #1 content:
Story #2 title:
Story #2 content:
and NOT:
Title: story#1 title, story#2 title, etc...
Content: story#1, story#2, etc...
To automatically generate multiple reports with one click, I created a loop. Here is the code:
require(rmarkdown)
data = read_csv("dummy.csv")
slices = unique(data$id)
for (v in slices){
render("~/Desktop/My_L3/report.Rmd", output_file = paste0("~/Desktop/report_", v, ".pdf"),
params=list(new_title=paste("Quarterly Report -", v)))
}
The loop is working and I was able to generate multiple PDFs by just running this code.
Is this the easiest way to do it? Any other way you're aware of?
And lastly, how do I include the multiple choice questions in the RMarkdown file?
For example, if a certain ID number has 3 choices selected (three 1s in the CSV) how do I display the result as the following:
You selected the following choices: bananas, apples, oranges
I would really appreciate your help as I'm an R noob and still learning a lot of stuff.

#badi congrats! For a newcomer to R you managed already quite a steep hill.
I hope the following will help you moving further:
(I) observation: use of rmarkdown::render(... , params = list(...))
You can pass multiple variables and objects as params to your "report" Rmd.
Thus, when you have lengthy preparatory steps, you can load your data, prepare it, and filter it with the loop you use to call rmarkdown::render().
E.g. inside your for loop you could do something like df <- data %>% filter(id == v) and then pass df as (one of the) params, e.g. rmarkdown::render(... , params = list(new_title=paste("Quarterly Report -", v)), data = df)
Then define a params for the dataframe. I recommend to "load" a dummy object/df, e.g.
...
params:
id ...
data: !mtcars # dummy object/df - do not forget the ! mark
(II) printing dynamic and static text
There are different ways to achieve this. For your example, it looks like you are looking for something relatively well-formatted that can be constructed from your table columns.
For this sprintf() is your friend. I abbreviated your example with a lighter dataframe.
I print this in the beginning of the document/pdf output.
For this you have to set the chunk parameter results = "as-is" and wrap the sprintf() call into a cat() to allow the template formatting and block R/Rmd from adding other output format stuff (e.g. the ## you can see when I print the table above).
The choices you can combine with a paste() call. Of course this can be done with varying levels of sophistication that I leave to you to explore.
I keep the 1 and NA coding. You can replace these with what you think is appropriate (ifelse/case_when, or complex(er) string substitute operations.
To prepare the list of choices, I just paste everything together:
df <- params$data %>%
mutate(choice_text = paste(choice1, choice2, choice3, sep = ","))
The following code-chunk defines the static/dynamic text template for sprintf() and we iterate over the rows of the data dataframe
# to programmatically print text sprintf()
# allows to combine static and dynamic text
# first define a template how your section looks like
# %s points to a string - not used here by %f caters for (float)numbers
template <- "
## Title %s
With content: %s.
\n You selected the following choices: %s
" # end of your "dynamic" text template
# recall to add an empty line for spacing
# you can force a new line for text entries with \n
# iterate over the input data frame
for (i in seq(nrow(df))) {
current <- df[i, ]
cat(sprintf(template, current$title, current$content, current$choice_text))
}
With the adequately set-up pdf template, you will get the following.
Note: My report breaks over to a 2nd page, I only show the first page here.

Related

How to create text paragraph for results in R?

I'm writing my results (both text and tables) and the process is quite time-consuming...
I was wondering if a function in R could help me paste my results into the text so I could copy it into WORD?
For example:
R square = put number here, B = put number here... The difference between the models was significant/nonsignificant with p < put number here
Then I would love to paste it into WORD.
Best regards,
Daniel
Couldn't find any function that would help me... Tried Flextable...
If I understand the question correctly, this can largely be done in base R and reported with R Markdown or Quarto. You can either write the document as a .qmd or .rmd file and export to word, or simply render in RStudio and then copy and paste as you like.
I work the following way:
First assign the models to variables using <-.
Then examine the structure of the model with str().
In the text of the document, use $ with the variable name to access the various parts. Or paste it in the console for cut and paste of the values.
You may want to round() some numbers. Additionally, the function scales::pvalue() is very helpful for p values.
library(scales)
# Generate some model and assign it to a variable
model <- cor.test(cars$speed, cars$dist)
# Examine the structure of the model object
str(model)
Then in the text of an R markdown or Quarto document you can write:
$r$(`r model$parameter`) = `r round(model$estimate, digits = 2)`, $p$= `r pvalue(model$p.value, accuracy = 0.05)`
This will give the following: r(48) = 0.81, p= <0.05
In a code chunk $ is used to access a component. In text $ is used to start an equation. That can be confusing. To access a piece of R code in R Markdown text, use the convention `r <function or variable>` as in `r model$parameter` above. Alternatively, you can simply paste model$parameter into the console and copy the results to your target document.
You could knit the document to word, here's a quick example using inline R:
---
title: "test.Rmd"
output: word_document
---
``` {r, echo = FALSE}
# some variables
a <- 24
b <- 100
```
R square = `r a`, B = `r b` ... The difference between the models was `r if (a == 24) {"significant"} else {"insignificant"}` with p < `r b - a`

Split dataset into clusters and save each cluster on a separate pdf document in R

Using the 'mtcars' dataset, how can one split the dataset into clusters using the 'Carb' field and output each grid on a separate pdf document with the Carb value being the name of the pdf document. I am new in R and the solutions I have found enable one to save each cluster on a different page of a pdf document. Have not found one where its possible to save each cluster as a separate document.
You can create pdfs for each part of dataset using approach of parameterized reports in Rmarkdown and not just creating tables, you can create a whole report for each clusters of the dataset.
So to do that, we need to first create a template rmarkdown file containing code for printing data as table where we also need to specify params in yaml of the file.
---
title: "Untitled"
author: "None"
date: '2022-07-26'
output: pdf_document
params:
carb: 1
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown table
```{r, echo=FALSE}
data(mtcars)
df <- mtcars[mtcars$carb %in% params$carb,]
knitr::kable(df, caption = paste("mtcars table for carb", params$carb))
```
Then from a separate R file (r script) or from console run this code which will create six pdfs for each value of carb
lapply(unique(mtcars$carb), function(carb_i) {
rmarkdown::render("tables.Rmd",
params = list(carb = carb_i),
output_file = paste0("table_for_carb",carb_i, ".pdf"))
})
So, for example, table_for_carb1.pdf looks like this
To know more how to create parameterized report with rmarkdown, see here
Here is an option with package gridExtra.
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
lapply(seq_along(sp), \(i) {
carb <- names(sp)[i]
filename <- sprintf("carb%s.pdf", carb)
pdf(filename)
grid.table(sp[[i]])
dev.off()
})
To write the clusters to the same PDF file, one table per page, start by exporting the first table, then, in the lapply loop go to next page and export the next table. The new pages must be between the tables and there must already exist a page (the 1st) before starting a new one for the next table.
And since the filename doesn't depend on the number of carburetors, the code can be simplified and rewritten without the need for seq_along.
library(grid)
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
pdf("carb.pdf")
grid.table(sp[[1]])
lapply(sp[-1], \(x) {
grid.newpage()
grid.table(x)
})
dev.off()

Creating R Markdown htmls by looping

I have a large report that I am running through R Markdown. The report has a data frame. At the beginning of the script, the data frame is filtered. After that, it does lots of manipulation and interpretation.
Currently, I change what I filter for and knit each report individually. I want to automate this process so that I can provide a vector of terms to filter with and the reports are generated.
Here is an example:
---
title: "Create markdown htmls with loop"
author: "Nathan Roe"
date: "2/17/2022"
output: html_document
---
library(dplyr)
my_df <- data.frame(my_letters = letters[1:5], my_numbers = 1:5)
my_df %>% filter(my_letters == "a")
I want to generate reports for a, b, c, d, and e. Currently, I have to go in and change what is being filtered for. As shown in the example above, I am filtering for "a". After that, I would have to change it to filter for "b", and so on. Is there a way to automate this, so that I provide a vector a, b, c, d, and e and reports are generated based on those filters and htmls are generated using the letter as the title. For example, I provide my_letters <- letters[1:5] and the script creates a.html, b.html, c.html, d.html, and e.html.
It seems similar to this, https://community.rstudio.com/t/loop-for-output-files/79716, but this example is poorly explained, if it does even answer the question.
The link you mention gives all the elements to generate a parametrized report.
On your example, you could knit with custom parameters using rmarkdown::render.
markdown file : test.Rmd
---
title: "Create markdown htmls with loop"
author: "Nathan Roe"
date: "2/17/2022"
output: html_document
params:
letter: 'a'
---
# `r paste('Processing letter ',letter)`
```{r}
letter
```
html file generation with loop :
for (letter in letters[1:5] ) {
rmarkdown::render(input = 'test.Rmd',
output_file = paste0(letter,".html"),
params = list(letter = letter))
}
...

R Markdown embedding a function into a simple table

I'm new to R and I'm really liking the flexibility of R Markdown to create reports.
My problem is that I want to use a random number generating function I've created my tables. I want simple tables to include string headers and the following function:
> ran<-function(x){
+ x<-runif(n=1, min=13.5,max=15.5)
+ round(x, digits=2)}.
It won't allow me to create a table using this method?
```{r}
String |String |String
-------|------|------
ran(x)|ran(x)|ran(x)
```
My ultimate goal is to create practice worksheets with simple statistics that will be different but within a bounded integer range - so I can ask follow-up questions with some idea of the mean, median etc.
Any help would be greatly appreciated.
Perhaps you should read up on both how to build working R code and how to code up Rmd files since your function doesn't work and there are a few places in the R Markdown docs that show how to do this:
---
output: html_document
---
```{r echo=FALSE}
ran <- function(x) {
runif(n=1, min=13.5, max=15.5) + round(x, digits=2)
}
```
One |Two |Three
-----------|-------------|-----------
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
generates:
Also, neither SO nor RStudio charges extra for spaces in code blocks. It'd be good to show students good code style while you're layin' down stats tracks.
Here is an approach that automates much of the report generation and reduces the amount of code you need to type. For starters, you can turn this into a parameterized report, which would make it easier to create worksheets using different values of x. Here's an example:
In your rmarkdown document you would declare parameters x and n in the yaml header. n is the number of random values you want to produce for each value of x. The x and n values in the yaml header are just the defaults knitr uses if no other values are input when you render the report:
---
output: html_document
params:
x: !r c(1,5,10)
n: 10
---
Then, in the same rmarkdown document you would have the text and code for your worksheet. You access the parameters x and n with params$x and params$n, respectively.
For example, the rest of the rmarkdown document could look like the code below. We put x into a list called x_vals with named elements, so that the resulting column names in the output will be the names of the list elements. We feed that list to sapply to get a column of n random values for each value of x. The whole sapply statement is wrapped in kable, which produces a table in rmarkdown format.
```{r, include=FALSE}
library(knitr)
```
```{r, echo=FALSE}
# Create a named list of the x values that we passed into this document
x_vals = as.list(setNames(params$x, paste0("x=", params$x)))
kable(sapply(x_vals, function(i) round(runif(params$n, 13.5, 15.5) + i, 2)))
```
You can now click the "knit" button and it will produce a table using the default parameter values:
If instead you want to use different values for x and/or n, open a separate R script file and type the following:
rmarkdown::render("Worksheet.Rmd",
params = list(x = c(2,4,6,8),
n = 5),
output_file="Worksheet.html")
In the code above, the render function compiles the rmarkdown document we just created, but with new x and n values, and saves the output to a file called Worksheet.html. (I've assumed that we've saved the rmarkdown document to a file called Worksheet.Rmd.) Here's what the output looks like:
You can also, of course, add parameters for the lower and upper limits of the runif function, rather than hard-coding them as 13.5 and 15.5.
If you want to create several worksheets, each with different x values, you can put render in a loop:
df = expand.grid(1:3,5:6,10:11)
for (i in 1:nrow(df)) {
rmarkdown::render("Worksheet.Rmd",
params = list(x=unlist(df[i,]), n=10),
output_file=paste0(paste(unlist(df[i,]),collapse="_"),".html"))
}

Generate both markdown comments and html wigets in a loop rmarkdown

I'm trying to generate a flexdashboard, creating each page from within a loop and with each of the generated pages containing a dygraph (although any HTML widget ought to behave the same).
I have looked extensively and it seems that rmarkdown comments can be generated using cat("title") (as per the solution here: generate markdown comments within for loop).
The HTML widgets on the other hand only behave nicely if you use htmltools::tagList() (as per the solution here:For loop over dygraph does not work in R).
I dont have working code to share, but this broadly gives the picture of what I am hoping to achieve:
for (i in 1:ncol(downloadedData)){
fund_NAVS <- downloadedData[,i] #this is an xts object
fund_NAVS <- fund_NAVS[!is.na(fund_NAVS)]
cat("pageTitle")
cat("===================================== \n")
cat("Row\n")
cat("------------------------------------- \n")
cat("### Page title")
dygraph(fund_NAVS)
}
I've been able to get this to partially work using pander::pandoc.header. However, getting content (plots and HTML objects) to actually show up in the final HTML file is a different story.
---
output:
flexdashboard::flex_dashboard:
orientation: columns
source: embed
vertical_layout: fill
---
```{r results='asis'}
library(pander)
pages <- 5
cols <- 2
sections <- 3
for (p in 1:pages) {
pandoc.header(paste("Page", p), level = 1)
for (c in 1:cols) {
pandoc.header(paste("Column", c), level = 2)
for (s in 1:sections) {
pandoc.header(paste("Section", s), level = 3)
cat("hi")
pandoc.p("")
}
}
}
```
I was able to autogenerate content by building r chunks explicitly then kniting them inline with r paste(knitr::knit(text = out)). This amazing line of code was found in an SO post.
In my case, I wanted to produce a series of graphs, each with a separate tab, with different content. Each graph was similar but there were numerous (about 15) and I didn't want to copy/paste all of the separate chunks.
Here is a gist you can download of a more simple example. (The code is also below but note that I add \ before each chunk so that it rendered as a single block of code so remove the \ before running.) I built a much more complicated function to build graphs, but the idea of the R chunks can be carried forward to any list object containing htmlwidgets as elements.
---
title: "Loop to Auto Build Tabs Containing htmlwidgets"
output: flexdashboard::flex_dashboard
---
\```{r setup, echo =FALSE, eval = TRUE}
library(tidyverse)
library(flexdashboard)
library(highcharter)
labels <- mtcars %>% names # these will serve as labels for each tab
# create a bunch of random, nonsensical line graphs
hcs <- purrr::map(.x = mtcars, ~highcharter::hchart(mtcars, y = .x, type = 'line')) %>%
setNames(labels) # assign names to each element to use later as tab titles
\```
Page
====================
Column {.tabset .tabset-fade}
-----------------------------
<!-- loop to build each tabs (in flexdashboard syntax) -->
<!-- each element of the list object `out` is a single tab written in rmarkdown -->
<!-- you can see this running the next chunk and typing `cat(out[[1]])` -->
\```{r, echo = FALSE, eval = TRUE}
out <- lapply(seq_along(hcs), function(i) {
a1 <- knitr::knit_expand(text = sprintf("### %s\n", names(hcs)[i])) # tab header, auto extracts names of `hcs`
a2 <- knitr::knit_expand(text = "\n```{r}") # start r chunk
a3 <- knitr::knit_expand(text = sprintf("\nhcs[[%d]]", i)) # extract graphs by "writing" out `hcs[[1]]`, `hcs[[2]]` etc. to be rendered later
a4 <- knitr::knit_expand(text = "\n```\n") # end r chunk
paste(a1, a2, a3, a4, collapse = '\n') # collapse together all lines with newline separator
})
\```
<!-- As I mentioned in the SO post, I don't quite understand why it has to be -->
<!-- 'r paste(knitr::knit(...)' vs just 'r knitr::knit(...)' but hey, it works -->
`r paste(knitr::knit(text = paste(out, collapse = '\n')))`

Resources