I have a large report that I am running through R Markdown. The report has a data frame. At the beginning of the script, the data frame is filtered. After that, it does lots of manipulation and interpretation.
Currently, I change what I filter for and knit each report individually. I want to automate this process so that I can provide a vector of terms to filter with and the reports are generated.
Here is an example:
---
title: "Create markdown htmls with loop"
author: "Nathan Roe"
date: "2/17/2022"
output: html_document
---
library(dplyr)
my_df <- data.frame(my_letters = letters[1:5], my_numbers = 1:5)
my_df %>% filter(my_letters == "a")
I want to generate reports for a, b, c, d, and e. Currently, I have to go in and change what is being filtered for. As shown in the example above, I am filtering for "a". After that, I would have to change it to filter for "b", and so on. Is there a way to automate this, so that I provide a vector a, b, c, d, and e and reports are generated based on those filters and htmls are generated using the letter as the title. For example, I provide my_letters <- letters[1:5] and the script creates a.html, b.html, c.html, d.html, and e.html.
It seems similar to this, https://community.rstudio.com/t/loop-for-output-files/79716, but this example is poorly explained, if it does even answer the question.
The link you mention gives all the elements to generate a parametrized report.
On your example, you could knit with custom parameters using rmarkdown::render.
markdown file : test.Rmd
---
title: "Create markdown htmls with loop"
author: "Nathan Roe"
date: "2/17/2022"
output: html_document
params:
letter: 'a'
---
# `r paste('Processing letter ',letter)`
```{r}
letter
```
html file generation with loop :
for (letter in letters[1:5] ) {
rmarkdown::render(input = 'test.Rmd',
output_file = paste0(letter,".html"),
params = list(letter = letter))
}
...
Related
I have a .CSV file that includes an ID column and several text columns (title of story, content of story) and columns for a multiple choice questions (each question in a different column). Also, there are columns for a numerical variable (ternary plots).
Here is a screen shot of the CSV file:
CSV File
Now what I'm trying to do is to automatically generate multiple PDF reports for each ID number (generate a unique report for each individual person). With different values in the report depending on the ID column in the CSV.
I thought the best way to do that in R was to create a RMarkdown file and use parameters to make the values of the report match the ID number values.
Here is my code for the RMarkdown file:
---
title: "`r params$new_title`"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
pdf_document:
latex_engine: xelatex
html_document:
df_print: paged
header-includes:
\usepackage{fontspec}
\usepackage{fancyhdr}
mainfont: Arial
params:
id:
label: ID
value: 1
input: select
choices:
- 1
- 2
- 3
- 4
- 5
new_title: "My Title!"
---
library(tidyverse)
library(ggtern)
library(ggplot2)
library(readr)
library(lubridate)
library(magrittr)
library(rmarkdown)
knitr::opts_chunk$set(echo = FALSE)
data <- readr::read_csv("dummy.csv")
data_id <- data %>%
filter(id == v)
**Your title:** `r data_id$title`
**Your micro-narrative:** `r data_id$narrative`
Now the code is working, but the formatting in the generated report is not how I want it.
If the same ID number has multiple entries for story title and story content, the values are displayed next to each other. What I want is this:
Story #1 title:
Story #1 content:
Story #2 title:
Story #2 content:
and NOT:
Title: story#1 title, story#2 title, etc...
Content: story#1, story#2, etc...
To automatically generate multiple reports with one click, I created a loop. Here is the code:
require(rmarkdown)
data = read_csv("dummy.csv")
slices = unique(data$id)
for (v in slices){
render("~/Desktop/My_L3/report.Rmd", output_file = paste0("~/Desktop/report_", v, ".pdf"),
params=list(new_title=paste("Quarterly Report -", v)))
}
The loop is working and I was able to generate multiple PDFs by just running this code.
Is this the easiest way to do it? Any other way you're aware of?
And lastly, how do I include the multiple choice questions in the RMarkdown file?
For example, if a certain ID number has 3 choices selected (three 1s in the CSV) how do I display the result as the following:
You selected the following choices: bananas, apples, oranges
I would really appreciate your help as I'm an R noob and still learning a lot of stuff.
#badi congrats! For a newcomer to R you managed already quite a steep hill.
I hope the following will help you moving further:
(I) observation: use of rmarkdown::render(... , params = list(...))
You can pass multiple variables and objects as params to your "report" Rmd.
Thus, when you have lengthy preparatory steps, you can load your data, prepare it, and filter it with the loop you use to call rmarkdown::render().
E.g. inside your for loop you could do something like df <- data %>% filter(id == v) and then pass df as (one of the) params, e.g. rmarkdown::render(... , params = list(new_title=paste("Quarterly Report -", v)), data = df)
Then define a params for the dataframe. I recommend to "load" a dummy object/df, e.g.
...
params:
id ...
data: !mtcars # dummy object/df - do not forget the ! mark
(II) printing dynamic and static text
There are different ways to achieve this. For your example, it looks like you are looking for something relatively well-formatted that can be constructed from your table columns.
For this sprintf() is your friend. I abbreviated your example with a lighter dataframe.
I print this in the beginning of the document/pdf output.
For this you have to set the chunk parameter results = "as-is" and wrap the sprintf() call into a cat() to allow the template formatting and block R/Rmd from adding other output format stuff (e.g. the ## you can see when I print the table above).
The choices you can combine with a paste() call. Of course this can be done with varying levels of sophistication that I leave to you to explore.
I keep the 1 and NA coding. You can replace these with what you think is appropriate (ifelse/case_when, or complex(er) string substitute operations.
To prepare the list of choices, I just paste everything together:
df <- params$data %>%
mutate(choice_text = paste(choice1, choice2, choice3, sep = ","))
The following code-chunk defines the static/dynamic text template for sprintf() and we iterate over the rows of the data dataframe
# to programmatically print text sprintf()
# allows to combine static and dynamic text
# first define a template how your section looks like
# %s points to a string - not used here by %f caters for (float)numbers
template <- "
## Title %s
With content: %s.
\n You selected the following choices: %s
" # end of your "dynamic" text template
# recall to add an empty line for spacing
# you can force a new line for text entries with \n
# iterate over the input data frame
for (i in seq(nrow(df))) {
current <- df[i, ]
cat(sprintf(template, current$title, current$content, current$choice_text))
}
With the adequately set-up pdf template, you will get the following.
Note: My report breaks over to a 2nd page, I only show the first page here.
This is what Im doing to generate a markdown so that all the things should be in one place.
How can i put these output into a datatable form which are more readable and easier to search.The list which is made are of different length. Each list has a series of table under it.
If there a way to convert these differing length list to data table format that would be really helpful
The table looks like this
## Prepare for analyses
```{r,warning=FALSE,message=FALSE}
set.seed(1234)
library(europepmc)
library(tidypmc)
library(tidyverse)
#library(dplyr)
```
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
##Cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial. 828 records found, showing 10
```{r,include=FALSE}
b <-epmc_search(query = 'cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial OPEN_ACCESS:Y',limit = 10)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
```
```{r}
names(my_tables) <- pmcids
```
The code chunk input and output is then displayed as follows:
```{r basicconsole}
source("flat.R")
L1 <- flattenlist(my_tables)
l.f <- Filter(function(a) any(!is.na(a)), L1)
l.f
#tibble:::print.tbl_df(head(df))
#n <- paste0("Valporic_", names(l.f), ".txt")
for (i in 1:length(l.f)) {
write.table(l.f[i], sep = "\t",row.names = FALSE,col.names = TRUE,file=paste0(names(l.f)[i], ".txt"))
}
UPDATE
I have manged to covert those tibble into dataframe
using this solution
##Outout
```{r}
abc <- mapply(cbind, l.f)
abc
But when it is rendered in the markdown the column formatting is gone. Now i have now dataframe inside list.
But still im not sure how to put that into a data table
**UPDATE 2.0 **
The better approach is to read those saved output as list of files into data table and then use it as markdown but so far it is taking only one ID only. My code.
tbl_fread <-
list.files(pattern = "*.txt") %>%
map_df(~fread(.))
knitr::kable(head(tbl_fread), "pipe")
Is it possible to put these files as such.
if a list of file are from one PMCID then those would be all in one column such as if PMCID one has 3 output then all of them should be one the same row. Then the next PMCID in the second one etc etc.
UPDATE new
I have managed to align the output into more readable format. But It seems that by default all the files assigned to multiple columns which would be the case given that im reading all the files together since my idea of using the list to data table didn't work.
If i can push or stack each unique PMCID over one another instead of all in one after another that would be. Good
knitr::kable(tbl_fread, align = "lccrr")
This may be something you can adapt for R Markdown. I'm not sure what the rationale is to save and load the tables. Instead, you could obtain the tables and show in html directly.
As you are using HTML, make sure to have results='asis' in your chunk. You can use a for loop and seq_along to show each table. You can include information in your table caption, such as the PMCID as well as table number.
---
title: "test13121"
author: "Ben"
date: "1/31/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Libraries
```{r}
library(tidypmc)
library(tidyverse)
library(europepmc)
library(kableExtra)
```
# Get Articles
```{r, echo = FALSE}
b <-epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y',limit = 6)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
names(my_tables) <- pmcids
```
# Show Tables
```{r, echo=F, results='asis'}
for (i in seq_along(my_tables)) {
for (j in seq_along(my_tables[[i]])) {
print(kable(x = my_tables[[i]][[j]], caption = paste0(names(my_tables)[i], ": Table ", j)))
}
}
```
I'm new to R and I'm really liking the flexibility of R Markdown to create reports.
My problem is that I want to use a random number generating function I've created my tables. I want simple tables to include string headers and the following function:
> ran<-function(x){
+ x<-runif(n=1, min=13.5,max=15.5)
+ round(x, digits=2)}.
It won't allow me to create a table using this method?
```{r}
String |String |String
-------|------|------
ran(x)|ran(x)|ran(x)
```
My ultimate goal is to create practice worksheets with simple statistics that will be different but within a bounded integer range - so I can ask follow-up questions with some idea of the mean, median etc.
Any help would be greatly appreciated.
Perhaps you should read up on both how to build working R code and how to code up Rmd files since your function doesn't work and there are a few places in the R Markdown docs that show how to do this:
---
output: html_document
---
```{r echo=FALSE}
ran <- function(x) {
runif(n=1, min=13.5, max=15.5) + round(x, digits=2)
}
```
One |Two |Three
-----------|-------------|-----------
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
generates:
Also, neither SO nor RStudio charges extra for spaces in code blocks. It'd be good to show students good code style while you're layin' down stats tracks.
Here is an approach that automates much of the report generation and reduces the amount of code you need to type. For starters, you can turn this into a parameterized report, which would make it easier to create worksheets using different values of x. Here's an example:
In your rmarkdown document you would declare parameters x and n in the yaml header. n is the number of random values you want to produce for each value of x. The x and n values in the yaml header are just the defaults knitr uses if no other values are input when you render the report:
---
output: html_document
params:
x: !r c(1,5,10)
n: 10
---
Then, in the same rmarkdown document you would have the text and code for your worksheet. You access the parameters x and n with params$x and params$n, respectively.
For example, the rest of the rmarkdown document could look like the code below. We put x into a list called x_vals with named elements, so that the resulting column names in the output will be the names of the list elements. We feed that list to sapply to get a column of n random values for each value of x. The whole sapply statement is wrapped in kable, which produces a table in rmarkdown format.
```{r, include=FALSE}
library(knitr)
```
```{r, echo=FALSE}
# Create a named list of the x values that we passed into this document
x_vals = as.list(setNames(params$x, paste0("x=", params$x)))
kable(sapply(x_vals, function(i) round(runif(params$n, 13.5, 15.5) + i, 2)))
```
You can now click the "knit" button and it will produce a table using the default parameter values:
If instead you want to use different values for x and/or n, open a separate R script file and type the following:
rmarkdown::render("Worksheet.Rmd",
params = list(x = c(2,4,6,8),
n = 5),
output_file="Worksheet.html")
In the code above, the render function compiles the rmarkdown document we just created, but with new x and n values, and saves the output to a file called Worksheet.html. (I've assumed that we've saved the rmarkdown document to a file called Worksheet.Rmd.) Here's what the output looks like:
You can also, of course, add parameters for the lower and upper limits of the runif function, rather than hard-coding them as 13.5 and 15.5.
If you want to create several worksheets, each with different x values, you can put render in a loop:
df = expand.grid(1:3,5:6,10:11)
for (i in 1:nrow(df)) {
rmarkdown::render("Worksheet.Rmd",
params = list(x=unlist(df[i,]), n=10),
output_file=paste0(paste(unlist(df[i,]),collapse="_"),".html"))
}
Revised title to clarify focus.
We notice an anomaly that R markdown and data.table interact in a surprising way. Same does not happen when knitting LaTeX. Commands which not have a return within the R session do cause a return within the knitted markdown output. I trace the problem back to commands like the following, which do not produce an output in R,
````{r}
poolballs[ , weight2:=2 * weight]
```
but inside Rmarkdown, the output includes the full print of the poolballs DT. Same does not happen if we knit an equivalent chunk in LaTeX.
This produced some funny HTML output because I wrote chunks like this, intending to display only the first 5 lines
```{r}
poolballs[ , weight2:=2 * weight]
head(poolballs)
```
Markdown parses that as the equivalent of two chunks,
> poolballs[ , weight2:=2 * weight]
> poolballs
> head(poolballs)
Here's the markdown file to demonstrate
---
title: "Data Table Guide"
author:
- name: Paul Johnson
affiliation: Center for Research Methods and Data Analysis, University of Kansas
email: pauljohn#ku.edu
date: "`r format(Sys.time(), '%Y %B %d')`"
output:
html_document:
theme: united
highlight: haddock
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, comment=NA)
options(width = 70)
```
```{r make_pb_dt}
set.seed(234234)
library(data.table)
poolballs <- data.table(
number = 1:15,
weight = rnorm(15, 45.7, 0.8),
diameter = c(3, 2.9, 3.1) #shows recyling
)
poolballs
```
I want the following to show only head in line 2
```{r}
poolballs[ , weight2:=2 * weight]
head(poolballs)
```
Compare the HTML output:
http://pj.freefaculty.org/scraps/mre-dt.html
I'm sorry if this is a known feature of markdown. I've coded around this wrinkle by hiding chunks, but it seems somewhat inconvenient. Today i'm curious enough to ask you about it. I wrote the same chunks in a LaTeX file and the funny DT output problem does not happen. I put link to PDF from LaTeX in http:/pj.freefaculty.org/scraps/mre-dt-3.pdf
In your final chunk, knitr sees that you have two objects that its attempting to print and you're getting the output for both. This isn't a feature, and has been addressed in a previous question.
If you want to only print the head of the first object in that chunk, your code should be head(poolballs[, weight2:=2 * weight])
I have a standard piece of analysis to perform on multiple datasets and want to present them in one report using a template.
The analysis per dataset could look like this:
child.Rmd
## Name of dataset
```{r calculate_stats}
summary(ds)
nrows <- nrow(ds)
```
The number of rows in the dataset is `r nrows`
The full report has this structure:
parent.Rmd
# Report
```{r import_all_datasets}
...import all datasets form csv...
ds.list <- c(ds1, ds2, ds3, ...)
```
for ds in ds.list
run child.Rmd with ds as a parameter
An additional requirement is that I can run the child.Rmd report alone with a specified parameter. The linked answer in comments below uses double curly braces ({{i}}) and knit_expand replaces it with i in the parent environment. This is unsatisfactory as it makes it a faff to call child.Rmd on its own.
Is it possible for the child to be a parametrised report and for the parent to pass the child the list of parameters.
I'm just attempting to do this now by trying:
child.Rmd
---
output: pdf_document
params:
ds: !r cars
name: "cars"
---
`r params$name`
=====
```{r}
summary(params$ds)
nrows <- nrow(params$ds)
```
The number of rows in the dataset is `r nrows`
And passing params to child within parent.Rmd