rstudio hangs and aborts with rmarkdown loop - r

I have several datasets each of which have a common grouping factor. I want to produce one large report with separate sections for each grouping factor. Therefore I want to re-run a set of rmarkdown code for each iteration of the grouping factor.
Using the following approach from here doesnt work for me. i.e.:
---
title: "Untitled"
author: "Author"
output: html_document
---
```{r, results='asis'}
for (i in 1:2){
cat('\n')
cat("#This is a heading for ", i, "\n")
hist(cars[,i])
cat('\n')
}
```
Because the markdown I want to run on each grouping factor does not easily fit within one code chunk. The report must be ordered by grouping factor and I want to be able to come in and out of code chunks for each iteration over grouping factor.
So I went for calling an Rmd. with render using a loop from an Rscript for each grouping factor as found here:
# run a markdown file to summarise each one.
for(each_group in the_groups){
render("/Users/path/xx.Rmd",
output_format = "pdf_document",
output_file = paste0(each_group,"_report_", Sys.Date(),".pdf"),
output_dir = "/Users/path/folder")
}
My plan was to then combine the individual reports with pdftk. However, when I get to the about the 5th iteration my Rstudio session hangs and eventually aborts with a fatal error. I have ran individually the Rmd. for the grouping factors it stops at which work fine.
I tested some looping with the following simple test files:
.R
# load packages
library(knitr)
library(markdown)
library(rmarkdown)
# use first 5 rows of mtcars as example data
mtcars <- mtcars[1:5,]
# for each type of car in the data create a report
# these reports are saved in output_dir with the name specified by output_file
for (car in rep(unique(rownames(mtcars)), 100)){
# for pdf reports
rmarkdown::render(input = "/Users/xx/Desktop/2.Rmd",
output_format = "pdf_document",
output_file = paste("test_report_", car, Sys.Date(), ".pdf", sep=''),
output_dir = "/Users/xx/Desktop")
}
.Rmd
```{r, include = FALSE}
# packages
library(knitr)
library(markdown)
library(rmarkdown)
library(tidyr)
library(dplyr)
library(ggplot2)
```
```{r}
# limit data to car name that is currently specified by the loop
cars <- mtcars[rownames(mtcars)==car,]
# create example data for each car
x <- sample(1:10, 1)
cars <- do.call("rbind", replicate(x, cars, simplify = FALSE))
# create hypotheical lat and lon for each row in cars
cars$lat <- sapply(rownames(cars), function(x) round(runif(1, 30, 46), 3))
cars$lon <- sapply(rownames(cars), function(x) round(runif(1, -115, -80),3))
cars
```
Today is `r Sys.Date()`.
```{r}
# data table of cars sold
table <- xtable(cars[,c(1:2, 12:13)])
print(table, type="latex", comment = FALSE)
```
This works fine. So I also looked at memory pressure while running my actual loop over the Rmd. which gets very high.
Is there a way to reduce memory when looping over a render call to an Rmd. file?
Is there a better way to create a report for multiple grouping factors than looping over a render call to an Rmd. file, which doesn't rely on the entire loop being inside one code chunk?

Found a solution here rmarkdown::render() in a loop - cannot allocate vector of size
knitr::knit_meta(class=NULL, clean = TRUE)
use this line before the render line and it seems to work

I am dealing with the same issue now and it's very perplexing. I tried to create some simple MWEs but they loop successfully on occasion. So far, I've tried
Checking the garbage collection between iterations of rmarkdown::render. (They don't reveal any special accumulations.)
Removing all inessential objects
Deleting any cached files manually
Here is my question:
How can we debug hangs? Should we set up special log files to understand what's going wrong?

Related

Split dataset into clusters and save each cluster on a separate pdf document in R

Using the 'mtcars' dataset, how can one split the dataset into clusters using the 'Carb' field and output each grid on a separate pdf document with the Carb value being the name of the pdf document. I am new in R and the solutions I have found enable one to save each cluster on a different page of a pdf document. Have not found one where its possible to save each cluster as a separate document.
You can create pdfs for each part of dataset using approach of parameterized reports in Rmarkdown and not just creating tables, you can create a whole report for each clusters of the dataset.
So to do that, we need to first create a template rmarkdown file containing code for printing data as table where we also need to specify params in yaml of the file.
---
title: "Untitled"
author: "None"
date: '2022-07-26'
output: pdf_document
params:
carb: 1
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown table
```{r, echo=FALSE}
data(mtcars)
df <- mtcars[mtcars$carb %in% params$carb,]
knitr::kable(df, caption = paste("mtcars table for carb", params$carb))
```
Then from a separate R file (r script) or from console run this code which will create six pdfs for each value of carb
lapply(unique(mtcars$carb), function(carb_i) {
rmarkdown::render("tables.Rmd",
params = list(carb = carb_i),
output_file = paste0("table_for_carb",carb_i, ".pdf"))
})
So, for example, table_for_carb1.pdf looks like this
To know more how to create parameterized report with rmarkdown, see here
Here is an option with package gridExtra.
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
lapply(seq_along(sp), \(i) {
carb <- names(sp)[i]
filename <- sprintf("carb%s.pdf", carb)
pdf(filename)
grid.table(sp[[i]])
dev.off()
})
To write the clusters to the same PDF file, one table per page, start by exporting the first table, then, in the lapply loop go to next page and export the next table. The new pages must be between the tables and there must already exist a page (the 1st) before starting a new one for the next table.
And since the filename doesn't depend on the number of carburetors, the code can be simplified and rewritten without the need for seq_along.
library(grid)
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
pdf("carb.pdf")
grid.table(sp[[1]])
lapply(sp[-1], \(x) {
grid.newpage()
grid.table(x)
})
dev.off()

Convert list of different length into data table for markdown for html format

This is what Im doing to generate a markdown so that all the things should be in one place.
How can i put these output into a datatable form which are more readable and easier to search.The list which is made are of different length. Each list has a series of table under it.
If there a way to convert these differing length list to data table format that would be really helpful
The table looks like this
## Prepare for analyses
```{r,warning=FALSE,message=FALSE}
set.seed(1234)
library(europepmc)
library(tidypmc)
library(tidyverse)
#library(dplyr)
```
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
##Cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial. 828 records found, showing 10
```{r,include=FALSE}
b <-epmc_search(query = 'cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial OPEN_ACCESS:Y',limit = 10)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
```
```{r}
names(my_tables) <- pmcids
```
The code chunk input and output is then displayed as follows:
```{r basicconsole}
source("flat.R")
L1 <- flattenlist(my_tables)
l.f <- Filter(function(a) any(!is.na(a)), L1)
l.f
#tibble:::print.tbl_df(head(df))
#n <- paste0("Valporic_", names(l.f), ".txt")
for (i in 1:length(l.f)) {
write.table(l.f[i], sep = "\t",row.names = FALSE,col.names = TRUE,file=paste0(names(l.f)[i], ".txt"))
}
UPDATE
I have manged to covert those tibble into dataframe
using this solution
##Outout
```{r}
abc <- mapply(cbind, l.f)
abc
But when it is rendered in the markdown the column formatting is gone. Now i have now dataframe inside list.
But still im not sure how to put that into a data table
**UPDATE 2.0 **
The better approach is to read those saved output as list of files into data table and then use it as markdown but so far it is taking only one ID only. My code.
tbl_fread <-
list.files(pattern = "*.txt") %>%
map_df(~fread(.))
knitr::kable(head(tbl_fread), "pipe")
Is it possible to put these files as such.
if a list of file are from one PMCID then those would be all in one column such as if PMCID one has 3 output then all of them should be one the same row. Then the next PMCID in the second one etc etc.
UPDATE new
I have managed to align the output into more readable format. But It seems that by default all the files assigned to multiple columns which would be the case given that im reading all the files together since my idea of using the list to data table didn't work.
If i can push or stack each unique PMCID over one another instead of all in one after another that would be. Good
knitr::kable(tbl_fread, align = "lccrr")
This may be something you can adapt for R Markdown. I'm not sure what the rationale is to save and load the tables. Instead, you could obtain the tables and show in html directly.
As you are using HTML, make sure to have results='asis' in your chunk. You can use a for loop and seq_along to show each table. You can include information in your table caption, such as the PMCID as well as table number.
---
title: "test13121"
author: "Ben"
date: "1/31/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Libraries
```{r}
library(tidypmc)
library(tidyverse)
library(europepmc)
library(kableExtra)
```
# Get Articles
```{r, echo = FALSE}
b <-epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y',limit = 6)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
names(my_tables) <- pmcids
```
# Show Tables
```{r, echo=F, results='asis'}
for (i in seq_along(my_tables)) {
for (j in seq_along(my_tables[[i]])) {
print(kable(x = my_tables[[i]][[j]], caption = paste0(names(my_tables)[i], ": Table ", j)))
}
}
```

Write multiple PDF object in one file using magick in R

I am working to stick two PNGs side by side and then convert the concatenated object to a PDF using the code below. Now how can I write all the PDF objects as pages in a single file? I tried to save all the objects in a list and then pass the list to image_write(), but it did not work.
library(magick)
vec_out <- list()
for(i in 1:length(all_stims)){
stim <- all_stims[i]
img1 <- image_read(file.path(figures_folder, "across_cluster_heatmaps", paste0("bendall_",stim,".png")))
img2 <- image_read(file.path(figures_folder, "across_cluster_heatmaps", paste0("farmer_", stim,".png")))
imgs <- c(img1, img2)
imgs <- image_append(imgs)
imgs_pdf <- image_convert(imgs)
vec_out[[i]] <- imgs_pdf
}
image_write(vec_out, path = file.path(figures_folder, "test.pdf"), format = "pdf")
Any suggestions would be helpful. Thanks.
Couldn't you write a knitr-document instead, or as rmarkdown-file?
I could not run your code, since it's not reproducible.
A mini example in rmarkdown:
The following code produces 3 plots in a for-loop.
By choosing result = 'asis' as option and by inserting
cat("\n\n\\pagebreak\n") in the for-loop, every output is printed on a separate page.
---
title: "Test page break between two figures"
output: pdf_document
---
```{r, echo=FALSE, results='asis'}
for (i in 1:3) {
print(plot(1:i, rnorm(i)))
cat("\n\n\\pagebreak\n")
}
```
I suggest doing something as mentioned above, instead of creating in a first step all pdf-files separately and glue them together in a second step.

How to remove display of strange characters in R-Markdown chunk output?

I am getting frequencies from a few rasters and am doing so in R-Markdown. I am using lapply to get the frequencies from the rasters in a list. When I store those frequencies in a list of data.frames, the chunk output displays some unexpected non-numeric characters.
Example rasters:
```{r}
require(raster)
r1 <- setValues(raster(nrows = 10, ncols = 10), sample(1:10, 100, replace = TRUE))
r2 <- setValues(raster(nrows = 10, ncols = 10), sample(1:10, 100, replace = TRUE))
rList <- list(r1, r2)
```
Getting the frequencies:
```{r}
lapply(rList, function(ras) {
data.frame(freq(ras))
})
```
Output from the above chunk:
If I display only the data frame itself, those characters are not displayed:
```{r}
lapply(rList, function(ras) {
data.frame(freq(ras))
})[[2]]
```
The correct values are also shown if do not use data.frame:
```{r}
lapply(rList, function(ras) {
freq(ras)
})
```
I've tried saving the Rmd with UTF-8 encoding and am on RStudio 1.2.5019. Any ideas on how to get the list of data frames to display properly would be appreciated.
Edit: Just a note that the characters do not display in any scenario in the generated html file, only in the specific chunk in the R Notebook file itself.
Edit 2:
The full code and YAML header for the notebook that generates the strange characters is below:
---
title: "R Notebook"
output: html_notebook
---
```{r}
require(raster)
r1 <- setValues(raster(nrows = 10, ncols = 10), sample(1:10, 100, replace = TRUE))
r2 <- setValues(raster(nrows = 10, ncols = 10), sample(1:10, 100, replace = TRUE))
rList <- list(r1, r2)
```
```{r}
lapply(rList, function(ras) {
data.frame(freq(ras))
})
```
```{r}
lapply(rList, function(ras) {
data.frame(freq(ras))
})[[2]]
```
Check out this code is properly showing the output. You can use print
On previewing the html notebook output is not showing any UTF characters
If you want to use the chunk output than you can use
as.data.frame()
I've had this experience before with that same "ΓΏ" character, but haven't thought about it a lot (Mac RStudio). I have been able to re-run my code and haven't been able to reproduce it on the same chunk, even in the same session.
I do use the formattable package often, and since folks are mentioning the formatting aspect, perhaps that's at play? Although it's hard for me to envision exactly how.
I did find these related items about Pandoc rendering Unicode artifacts on certain character inputs, and about Pandoc converting to UTF-8. Since the workflow involves knitr feeding Pandoc, maybe that's a piece?
https://github.com/jgm/pandoc/issues/844
https://yihui.org/en/2018/11/biggest-regret-knitr/
Can't quite connect the dots, but hope it helps someone else put it together.
I don't remember it happening recently, so potentially one avenue is to update pandoc and knitr (or even RStudio altogether -- I'm on 1.2.5019)?

R Knitr PDF: Is there a posssibility to automatically save PDF reports (generated from .Rmd) through a loop?

I would like to create a loop, which allows me to automatically save PDF reports, which were generated from a .Rmd file. For instance, if a variable "ID" has 10 rows, I would like R to automatically save me 10 reports, into a specific directory. These reports shall vary based on the ID selected.
A previous post (Using loops with knitr to produce multiple pdf reports... need a little help to get me over the hump) has dealt with the creation of multiple pdf reports generated from .Rnw files. I tried to apply the approach as follows:
#Data
```{r, include=FALSE}
set.seed(500)
Score <- rnorm(40, 100, 15)
Criteria1<-rnorm(40, 10, 5)
Criteria2<-rnorm(40, 20, 5)
ID <- sample(1:1000,8,replace=T)
df <- data.frame(ID,Score,Criteria1,Criteria2)
#instead of manually choosing the ID:
subgroup<- subset(df, ID==1)
# I would like to subset the Data through a loop. My approach was like like this:
for (id in unique(df$ID)){
subgroup<- df[df$ID == id,]}
```
```{r, echo=FALSE}
#Report Analysis
summary(subgroup)
```
#Here will be some text about the summary.
# At the end the goal is to produce automatic pdf reports with the ID name as a filename:
library("rmarkdown")
render("Automated_Report.rmd",output_file = paste('report.', id, '.pdf', sep=''))
Adapting your example:
You need one .rmd "template" file. It could be something like this, save it as template.rmd.
This is a subgroup report.
```{r, echo=FALSE}
#Report Analysis
summary(subgroup)
```
Then, you need an R script that will load the data you want, loop through the data subsets, and for each subset
Define the subgroup object used inside the template
render the template to the desired output
So, in this separate script:
# load data
set.seed(500)
Score <- rnorm(40, 100, 15)
Criteria1<-rnorm(40, 10, 5)
Criteria2<-rnorm(40, 20, 5)
ID <- sample(1:1000,8,replace=T)
df <- data.frame(ID,Score,Criteria1,Criteria2)
library("rmarkdown")
# in a single for loop
# 1. define subgroup
# 2. render output
for (id in unique(df$ID)){
subgroup <- df[df$ID == id,]
render("template.rmd",output_file = paste0('report.', id, '.html'))
}
This produced 8 html files in my working directory, each with a summary of a different subset of the data.
Note that this will not work if you try clicking the "knit" button inside RStudio, as that runs the R code in a separate R session. However, when you run from the console explicitly using render (or knit2pdf) the R code in the rmd file still has access to the global environment.
Rather than relying on global variables, another option would be to use parametrized reports, defining parameters in the YAML header, and passing the parameter values in as arguments to rmarkdown::render.

Resources