Create PDF using Rhtml and iterate over data frame - r

I have a data frame that has demographic information split into 16 groups. I basically need to iterate over these groups and create a PDF page for each group. I've tried using Rhtml but so far I can only get one page to generate. Is there a way to use templates or something?

If you need PDF output, why not compile .Rnw directly to .pdf?
Here is an example using the iris dataset. It prints the first few rows of each species on a new page:
\documentclass{article}
\begin{document}
<<results = "asis", echo = FALSE>>=
library(xtable)
newpage <- ""  # no page break before the first species
invisible(lapply(unique(iris$Species), FUN = function(x) {
  cat(newpage)
  cat(sprintf("\\section{%s}", x))
  current <- head(subset(x = iris, subset = Species == x))
  print(xtable(current))
  newpage <<- "\\clearpage"  # page break before every later species
}))
@
\end{document}
I additionally used xtable to get a nicely formatted table.
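The page-break logic of the chunk above can be looked at in isolation with base R. This is only a minimal sketch: `group_pages()` is a hypothetical helper (not part of knitr or xtable) that builds the LaTeX strings and puts `\clearpage` before every section except the first.

```r
# Hypothetical helper: one LaTeX \section per group, with \clearpage
# before every group except the first.
group_pages <- function(groups) {
  groups <- as.character(groups)
  breaks <- ifelse(seq_along(groups) > 1, "\\clearpage", "")
  paste0(breaks, sprintf("\\section{%s}", groups))
}

pages <- group_pages(unique(iris$Species))
cat(pages, sep = "\n")
```

Inside a `results = "asis"` chunk, each element would be printed with `cat()` and followed by the table for that group.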

Related

Split dataset into clusters and save each cluster on a separate pdf document in R

Using the 'mtcars' dataset, how can one split the dataset into clusters by the 'carb' field and write each cluster to a separate PDF document, with the carb value as the name of the PDF? I am new to R, and the solutions I have found save each cluster on a different page of a single PDF document. I have not found one where it's possible to save each cluster as a separate document.
You can create a PDF for each part of the dataset using parameterized reports in R Markdown. This is not limited to tables: you can create a whole report for each cluster of the dataset.
To do that, first create a template R Markdown file containing the code that prints the data as a table, and declare params in the YAML header of the file.
---
title: "Untitled"
author: "None"
date: '2022-07-26'
output: pdf_document
params:
  carb: 1
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown table
```{r, echo=FALSE}
data(mtcars)
df <- mtcars[mtcars$carb %in% params$carb,]
knitr::kable(df, caption = paste("mtcars table for carb", params$carb))
```
Then, from a separate R script or from the console, run this code, which creates one PDF for each of the six values of carb:
lapply(unique(mtcars$carb), function(carb_i) {
  rmarkdown::render("tables.Rmd",
                    params = list(carb = carb_i),
                    output_file = paste0("table_for_carb", carb_i, ".pdf"))
})
So, for example, table_for_carb1.pdf contains the rows of mtcars with carb equal to 1.
To learn more about creating parameterized reports with rmarkdown, see the R Markdown documentation on parameterized reports.
Here is an option with package gridExtra.
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
lapply(seq_along(sp), \(i) {
  carb <- names(sp)[i]
  filename <- sprintf("carb%s.pdf", carb)
  pdf(filename)
  grid.table(sp[[i]])
  dev.off()
})
To write the clusters to the same PDF file, one table per page, start by exporting the first table; then, in the lapply loop, go to a new page and export the next table. New pages must come between the tables, and a page (the first) must already exist before a new one is started for the next table.
And since the filename no longer depends on the number of carburetors, the code can be simplified and rewritten without seq_along.
library(grid)
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
pdf("carb.pdf")
grid.table(sp[[1]])
lapply(sp[-1], \(x) {
  grid.newpage()
  grid.table(x)
})
dev.off()

RMarkdown: ggplot into a table

There are already a few questions about ggplots in RMarkdown, but none has answered my question of how to put a ggplot into a table with kable() from knitr.
I've tried this link:
How can I embed a plot within a RMarkdown table?
But I have not had any luck so far. Any ideas?
The idea was to put all plots into a list with
a <- list(p1, p2, p3...)
and then build the table with
kable(a)
It should also be possible to include additional text:
b <- c("x", "y", "z", ...)
kable(c(a, b), col.names = c())
Thanks for your help
Frieder
I experimented some with this and the following is the best I could come up with. This is a complete markdown document you should be able to paste into RStudio and hit the Knit button.
Two relevant notes here.
Putting the file links directly into kable() doesn't work: the HTML is escaped and ends up displayed as text, so we need to gsub() it in afterwards. An alternative is to set kable(..., escape = FALSE), but there is a risk that other text in the table then causes problems.
Also, the chunk option results = 'asis' is necessary so that print(kab) emits raw HTML.
I don't know if these are problems for the real application.
---
title: "Untitled"
author: "me"
date: "02/06/2020"
output: html_document
---
```{r, results = 'asis'}
library(ggplot2)
library(svglite)
n <- length(unique(iris$Species))
data <- split(iris, iris$Species)
# Create list of plots
plots <- lapply(data, function(df) {
  ggplot(df, aes(Sepal.Width, Sepal.Length)) +
    geom_point()
})
# Create temporary files
tmpfiles <- replicate(n, tempfile(fileext = ".svg"))
# Save plots as files, get HTML links
links <- mapply(function(plot, file) {
  # Suit exact dimensions to your needs
  ggsave(file, plot, device = "svg", width = 4, height = 3)
  paste0('<figure><img src="', file, '" style = "width:100%"></figure>')
}, plot = plots, file = tmpfiles)
# Table formatting
tab <- data.frame(name = names(plots), fig = paste0("dummy", LETTERS[seq_len(n)]))
kab <- knitr::kable(tab, "html")
# Substitute dummy column for figure links
for (i in seq_len(n)) {
  kab <- gsub(paste0("dummy", LETTERS[i]), links[i], kab, fixed = TRUE)
}
print(kab)
```
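The placeholder substitution used in the document above can be tried in isolation with base R. This is a small sketch with a hand-written HTML row and a hypothetical file name `plot.svg`, not output taken from kable():

```r
# A table row holding a placeholder string in the figure cell
row <- "<tr><td> setosa </td><td> dummyA </td></tr>"

# Raw HTML that should end up in the cell (the file name is hypothetical)
link <- '<figure><img src="plot.svg" style="width:100%"></figure>'

# fixed = TRUE treats the pattern as a literal string, not a regex
row <- gsub("dummyA", link, row, fixed = TRUE)
cat(row, "\n")
```

Because the substitution happens after the table has been rendered to HTML, the tags in `link` are inserted as-is rather than escaped.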
I found a workaround, as described in the link I posted:
I. Saved my plot as a picture.
II. Used sprintf() to insert the picture into the table with this R Markdown syntax:
![](path/to/file)
Crude, but it works. If anybody finds a better solution, I am still interested.
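The workaround above could look roughly like this. The file names and labels are hypothetical, and the plots are assumed to have been saved already (for example with `ggsave()`); the `kable()` call is guarded because it needs knitr.

```r
# Hypothetical image files saved earlier
plot_files <- c("plots/p1.png", "plots/p2.png", "plots/p3.png")

# Markdown image syntax for each file
cells <- sprintf("![](%s)", plot_files)

# Text column and image column side by side
tab <- data.frame(label = c("x", "y", "z"), plot = cells)

# In an R Markdown chunk, kable() prints this as a Markdown table
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(tab))
}
```

The image cells are plain Markdown, so the pictures should appear once the document is converted by pandoc.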

rstudio hangs and aborts with rmarkdown loop

I have several datasets, each of which has a common grouping factor. I want to produce one large report with a separate section for each level of the grouping factor, so I want to re-run a set of R Markdown code for each level.
The following approach, found here, doesn't work for me:
---
title: "Untitled"
author: "Author"
output: html_document
---
```{r, results='asis'}
for (i in 1:2){
  cat('\n')
  cat("# This is a heading for ", i, "\n")
  hist(cars[,i])
  cat('\n')
}
```
This is because the markdown I want to run for each level of the grouping factor does not easily fit within one code chunk. The report must be ordered by grouping factor, and I want to be able to move in and out of code chunks on each iteration.
So I went for calling render() on an Rmd file in a loop from an R script, once per grouping factor, as found here:
# run a markdown file to summarise each one.
for(each_group in the_groups){
  render("/Users/path/xx.Rmd",
         output_format = "pdf_document",
         output_file = paste0(each_group,"_report_", Sys.Date(),".pdf"),
         output_dir = "/Users/path/folder")
}
My plan was to then combine the individual reports with pdftk. However, at about the 5th iteration my RStudio session hangs and eventually aborts with a fatal error. I have run the Rmd individually for the grouping factors it stops at, and those work fine.
I tested some looping with the following simple test files:
.R
# load packages
library(knitr)
library(markdown)
library(rmarkdown)
# use first 5 rows of mtcars as example data
mtcars <- mtcars[1:5,]
# for each type of car in the data create a report
# these reports are saved in output_dir with the name specified by output_file
for (car in rep(unique(rownames(mtcars)), 100)){
  # for pdf reports
  rmarkdown::render(input = "/Users/xx/Desktop/2.Rmd",
                    output_format = "pdf_document",
                    output_file = paste("test_report_", car, Sys.Date(), ".pdf", sep=''),
                    output_dir = "/Users/xx/Desktop")
}
.Rmd
```{r, include = FALSE}
# packages
library(knitr)
library(markdown)
library(rmarkdown)
library(tidyr)
library(dplyr)
library(ggplot2)
library(xtable)  # needed for xtable() in the last chunk
```
```{r}
# limit data to car name that is currently specified by the loop
cars <- mtcars[rownames(mtcars)==car,]
# create example data for each car
x <- sample(1:10, 1)
cars <- do.call("rbind", replicate(x, cars, simplify = FALSE))
# create hypothetical lat and lon for each row in cars
cars$lat <- sapply(rownames(cars), function(x) round(runif(1, 30, 46), 3))
cars$lon <- sapply(rownames(cars), function(x) round(runif(1, -115, -80),3))
cars
```
Today is `r Sys.Date()`.
```{r, results = 'asis'}
# data table of cars sold (results = 'asis' so the LaTeX is not printed verbatim)
table <- xtable(cars[,c(1:2, 12:13)])
print(table, type="latex", comment = FALSE)
```
This works fine. So I also looked at memory pressure while running my actual loop over the Rmd file, and it gets very high.
Is there a way to reduce memory use when looping over a render call to an Rmd file?
Is there a better way to create a report for multiple grouping factors than looping over a render call to an Rmd file, one that doesn't rely on the entire loop being inside one code chunk?
I found a solution in "rmarkdown::render() in a loop - cannot allocate vector of size":
knitr::knit_meta(class=NULL, clean = TRUE)
Put this line before the render line; it seems to work.
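In the context of the loop from the question, the fix might look like the sketch below. `render_each()` and its `render_fun` argument are hypothetical names introduced here so the loop can be tried without a real template; normally `render_fun` would be left at its default, `rmarkdown::render`.

```r
# Hypothetical wrapper: clear knitr's accumulated metadata before each render
render_each <- function(groups, input, output_dir,
                        render_fun = rmarkdown::render) {
  for (group in groups) {
    # Reset the metadata collected during previous knit runs
    knitr::knit_meta(class = NULL, clean = TRUE)
    render_fun(input,
               output_format = "pdf_document",
               output_file = paste0(group, "_report_", Sys.Date(), ".pdf"),
               output_dir = output_dir)
  }
  invisible(groups)
}
```

With the paths from the question, the call would be `render_each(the_groups, "/Users/path/xx.Rmd", "/Users/path/folder")`. Whether this resolves a particular hang depends on what is accumulating between renders, so treat it as something to try rather than a guaranteed fix.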
I am dealing with the same issue now and it's very perplexing. I tried to create some simple MWEs, but they sometimes loop successfully. So far, I've tried:
Checking the garbage collection between iterations of rmarkdown::render (it doesn't reveal any special accumulation)
Removing all inessential objects
Deleting any cached files manually
Here is my question:
How can we debug hangs? Should we set up special log files to understand what's going wrong?
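One way to narrow down where a loop stops is to append a line to a log file before and after each iteration, so the last entry survives a crash. This is a minimal sketch in base R: `log_step()` and `with_log()` are hypothetical helpers, and the work function stands in for the `rmarkdown::render()` call.

```r
log_file <- file.path(tempdir(), "render_log.txt")

# Append a timestamped line to the log file
log_step <- function(...) {
  cat(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), ..., "\n",
      file = log_file, append = TRUE)
}

# Run work(item) for each item, logging start, end, errors and memory use
with_log <- function(items, work) {
  for (item in items) {
    log_step("start", item)
    tryCatch({
      work(item)
      # Column 2 of gc() is the memory currently used, in Mb
      log_step("done", item, "| Mb used:", sum(gc()[, 2]))
    }, error = function(e) log_step("error", item, conditionMessage(e)))
  }
}

# Stand-in for the real render call: fails on the second item
with_log(c("a", "b"), function(item) if (item == "b") stop("boom"))
readLines(log_file)
```

A hang is not an error, so `tryCatch()` will not catch it, but a "start" line without a matching "done" line shows which iteration was running when the session froze.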

How to split kable over multiple columns?

I am trying to produce a "longitudinal" layout for long tables in RMarkdown with kable. For example, I would like a table to be split over two columns, like in the example below:
dd <- data.frame(state=state.abb, freq=1:50)
kable(list(state=dd[1:25,], state=dd[26:50,]))
However, this hack produces output that looks much worse than the normal kable output (for example, the header is not in bold). Is there a "proper" way to do this using kable?
kable is a great tool, but it has limits. For the type of table you're describing I would use one of two different tools, depending on the output wanted:
Hmisc::latex for .Rnw -> .tex -> .pdf
htmlTable::htmlTable for .Rmd -> .md -> .html
Here is an example of the latter:
dd <- data.frame(state=state.name, freq=1:50)
dd2 <- cbind(dd[1:25, ], dd[26:50, ])
library(htmlTable)
htmlTable(dd2,
          cgroup = c("Set 1:25", "Set 26:50"),
          n.cgroup = c(2, 2),
          rnames = FALSE)
You can still use kable with a slight modification to your code.
dd <- data.frame(state=state.abb, freq=1:50)
knitr::kable(
  list(dd[1:25,], dd[26:50,]),
  caption = 'Two tables placed side by side.',
  booktabs = TRUE
)
This code is a modification of an existing example; the page it comes from has more information about tables.

R Knitr PDF: Is there a possibility to automatically save PDF reports (generated from .Rmd) through a loop?

I would like to create a loop, which allows me to automatically save PDF reports, which were generated from a .Rmd file. For instance, if a variable "ID" has 10 rows, I would like R to automatically save me 10 reports, into a specific directory. These reports shall vary based on the ID selected.
A previous post (Using loops with knitr to produce multiple pdf reports... need a little help to get me over the hump) has dealt with the creation of multiple pdf reports generated from .Rnw files. I tried to apply the approach as follows:
#Data
```{r, include=FALSE}
set.seed(500)
Score <- rnorm(40, 100, 15)
Criteria1<-rnorm(40, 10, 5)
Criteria2<-rnorm(40, 20, 5)
ID <- sample(1:1000,8,replace=T)
df <- data.frame(ID,Score,Criteria1,Criteria2)
# instead of manually choosing the ID:
subgroup <- subset(df, ID==1)
# I would like to subset the data through a loop. My approach was like this:
for (id in unique(df$ID)){
  subgroup <- df[df$ID == id,]
}
```
```{r, echo=FALSE}
#Report Analysis
summary(subgroup)
```
#Here will be some text about the summary.
# At the end the goal is to produce automatic pdf reports with the ID name as a filename:
library("rmarkdown")
render("Automated_Report.rmd",output_file = paste('report.', id, '.pdf', sep=''))
Adapting your example:
You need one .rmd "template" file. It could be something like this, save it as template.rmd.
This is a subgroup report.
```{r, echo=FALSE}
#Report Analysis
summary(subgroup)
```
Then you need an R script that loads the data you want, loops through the data subsets, and for each subset:
1. defines the subgroup object used inside the template
2. renders the template to the desired output
So, in this separate script:
# load data
set.seed(500)
Score <- rnorm(40, 100, 15)
Criteria1<-rnorm(40, 10, 5)
Criteria2<-rnorm(40, 20, 5)
ID <- sample(1:1000,8,replace=T)
df <- data.frame(ID,Score,Criteria1,Criteria2)
library("rmarkdown")
# in a single for loop
# 1. define subgroup
# 2. render output
for (id in unique(df$ID)){
  subgroup <- df[df$ID == id,]
  render("template.rmd", output_file = paste0('report.', id, '.html'))
}
This produced 8 html files in my working directory, each with a summary of a different subset of the data.
Note that this will not work if you try clicking the "knit" button inside RStudio, as that runs the R code in a separate R session. However, when you run from the console explicitly using render (or knit2pdf) the R code in the rmd file still has access to the global environment.
Rather than relying on global variables, another option would be to use parametrized reports, defining parameters in the YAML header, and passing the parameter values in as arguments to rmarkdown::render.
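A sketch of the parameterized variant, under some assumptions: the template is written to a temporary file only to keep the example self-contained, the parameter name `id` and the example IDs are made up for illustration, and the template still reads `df` from the calling environment. Rendering is skipped when rmarkdown or pandoc is not available.

```r
# Example data modelled on the question (IDs and scores are made up)
set.seed(500)
df <- data.frame(ID = rep(c(11, 22, 33, 44), each = 10),
                 Score = rnorm(40, 100, 15))

# Template declaring a parameter `id` in its YAML header (default 11)
fence <- strrep("`", 3)
template <- file.path(tempdir(), "template.Rmd")
writeLines(c(
  "---",
  "output: html_document",
  "params:",
  "  id: 11",
  "---",
  "",
  "This is the report for ID `r params$id`.",
  "",
  paste0(fence, "{r, echo=FALSE}"),
  "summary(df[df$ID == params$id, ])",
  fence
), template)

can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

# One report per ID; the value is passed in through `params`
for (id in unique(df$ID)) {
  if (can_render) {
    rmarkdown::render(template,
                      params = list(id = id),
                      output_file = paste0("report.", id, ".html"),
                      output_dir = tempdir(),
                      quiet = TRUE)
  }
}
```

Compared with the global `subgroup` object, the parameter makes explicit which value each report depends on, and the template can be knitted on its own using the default from the YAML header.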
