How to include the description of a data set in RMarkdown?

I am creating an RMarkdown for teaching and I would like to include the description of the data set on the R Markdown file. For example, if I use the data marketing from the R package datarium, I would like to be able to include the description obtained with ?marketing without having to open it online or in R.
marketing {datarium} R Documentation
Marketing Data Set
Description
A data frame containing the impact of three advertising medias (youtube, facebook and newspaper) on sales. Data are the advertising budget in thousands of dollars along with the sales. The advertising experiment has been repeated 200 times.
Usage
data("marketing")
Format
A data frame with 200 rows and 4 columns.
Examples
data(marketing)
res.lm <- lm(sales ~ youtube*facebook, data = marketing)
summary(res.lm)
Is this possible?

Using @MrFlick's (more like MrShy) suggestion:
How to get text data from help pages in R?
We can create an R Markdown (I also wanted to hide the function used to get the help text) showing the description of the data as follows:
---
title: "Marketing"
author: 'Jon Doe'
date: ""
output: html_document
---
```{r}
library(datarium)
data("marketing")
```
```{r include=FALSE}
help_text <- function(...) {
  file <- help(...)
  path <- dirname(file)
  dirpath <- dirname(path)
  pkgname <- basename(dirpath)
  RdDB <- file.path(path, pkgname)
  rd <- tools:::fetchRdDB(RdDB, basename(file))
  capture.output(tools::Rd2txt(rd, out="", options=list(underline_titles=FALSE)))
}
```
```{r}
# ?marketing
cat(help_text(marketing), sep="\n")
# Data
head(marketing, 4)
```
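The helper above relies on the internal function tools:::fetchRdDB. As a hedged alternative, the following sketch goes through the exported tools::Rd_db() instead. The function name help_text_db is made up for illustration, the built-in mtcars data set stands in for datarium's marketing so that the example runs without extra packages, and the lookup assumes that the Rd file is named after the topic, which is not guaranteed for every package.

```r
# Look up the parsed Rd for a data set through tools::Rd_db(), an exported
# function, instead of the internal tools:::fetchRdDB().
# `help_text_db` is a hypothetical helper name, not part of any package.
help_text_db <- function(topic, package) {
  db <- tools::Rd_db(package)
  # Assumption: entries are named after the topic (e.g. "mtcars.Rd");
  # the pattern also tolerates a missing ".Rd" extension.
  hit <- grep(paste0("^", topic, "(\\.Rd)?$"), names(db))
  if (length(hit) == 0) stop("no Rd file named after topic: ", topic)
  capture.output(
    tools::Rd2txt(db[[hit[1]]], out = "", options = list(underline_titles = FALSE))
  )
}

# The built-in mtcars data set (package datasets) stands in for marketing.
txt <- help_text_db("mtcars", "datasets")
cat(txt, sep = "\n")
```

With datarium installed, help_text_db("marketing", "datarium") should behave the same way, provided the Rd file is called marketing.Rd.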

Related

Split dataset into clusters and save each cluster on a separate pdf document in R

Using the 'mtcars' dataset, how can one split the dataset into clusters using the 'Carb' field and output each grid on a separate pdf document with the Carb value being the name of the pdf document. I am new in R and the solutions I have found enable one to save each cluster on a different page of a pdf document. Have not found one where its possible to save each cluster as a separate document.
You can create a PDF for each part of the dataset using parameterized reports in R Markdown. This is not limited to tables: you can create a whole report for each cluster of the dataset.
To do that, first create a template R Markdown file containing the code that prints the data as a table, and specify params in the YAML header of the file.
---
title: "Untitled"
author: "None"
date: '2022-07-26'
output: pdf_document
params:
  carb: 1
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown table
```{r, echo=FALSE}
data(mtcars)
df <- mtcars[mtcars$carb %in% params$carb,]
knitr::kable(df, caption = paste("mtcars table for carb", params$carb))
```
Then, from a separate R script or from the console, run this code, which creates six PDFs, one for each value of carb:
lapply(unique(mtcars$carb), function(carb_i) {
  rmarkdown::render("tables.Rmd",
                    params = list(carb = carb_i),
                    output_file = paste0("table_for_carb", carb_i, ".pdf"))
})
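Before rendering, it can help to check which values the loop will iterate over and which files it will produce. A small sketch in base R, using the same file naming scheme as the call above (the variable names carb_values and out_files are only illustrative):

```r
# Distinct carb values in mtcars: 1, 2, 3, 4, 6 and 8, hence six reports.
carb_values <- sort(unique(mtcars$carb))

# File names that the render loop would produce, one per carb value.
out_files <- paste0("table_for_carb", carb_values, ".pdf")

# Number of rows that end up in each report.
rows_per_report <- table(mtcars$carb)

print(data.frame(carb = carb_values,
                 output_file = out_files,
                 rows = as.integer(rows_per_report)))
```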
So, for example, table_for_carb1.pdf looks like this
To know more how to create parameterized report with rmarkdown, see here
Here is an option with package gridExtra.
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
lapply(seq_along(sp), \(i) {
  carb <- names(sp)[i]
  filename <- sprintf("carb%s.pdf", carb)
  pdf(filename)
  grid.table(sp[[i]])
  dev.off()
})
To write the clusters to the same PDF file, one table per page, start by exporting the first table, then, in the lapply loop go to next page and export the next table. The new pages must be between the tables and there must already exist a page (the 1st) before starting a new one for the next table.
And since the filename doesn't depend on the number of carburetors, the code can be simplified and rewritten without the need for seq_along.
library(grid)
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
pdf("carb.pdf")
grid.table(sp[[1]])
lapply(sp[-1], \(x) {
grid.newpage()
grid.table(x)
})
dev.off()

Creating R Markdown htmls by looping

I have a large report that I am running through R Markdown. The report has a data frame. At the beginning of the script, the data frame is filtered. After that, it does lots of manipulation and interpretation.
Currently, I change what I filter for and knit each report individually. I want to automate this process so that I can provide a vector of terms to filter with and the reports are generated.
Here is an example:
---
title: "Create markdown htmls with loop"
author: "Nathan Roe"
date: "2/17/2022"
output: html_document
---
library(dplyr)
my_df <- data.frame(my_letters = letters[1:5], my_numbers = 1:5)
my_df %>% filter(my_letters == "a")
I want to generate reports for a, b, c, d, and e. Currently, I have to go in and change what is being filtered for. As shown in the example above, I am filtering for "a". After that, I would have to change it to filter for "b", and so on. Is there a way to automate this, so that I provide a vector a, b, c, d, and e and reports are generated based on those filters and htmls are generated using the letter as the title. For example, I provide my_letters <- letters[1:5] and the script creates a.html, b.html, c.html, d.html, and e.html.
It seems similar to this, https://community.rstudio.com/t/loop-for-output-files/79716, but this example is poorly explained, if it does even answer the question.
The link you mention gives all the elements to generate a parametrized report.
On your example, you could knit with custom parameters using rmarkdown::render.
markdown file : test.Rmd
---
title: "Create markdown htmls with loop"
author: "Nathan Roe"
date: "2/17/2022"
output: html_document
params:
  letter: 'a'
---
# `r paste('Processing letter ', params$letter)`
```{r}
params$letter
```
html file generation with loop :
for (letter in letters[1:5] ) {
rmarkdown::render(input = 'test.Rmd',
output_file = paste0(letter,".html"),
params = list(letter = letter))
}
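The same idea as a self-contained sketch: it writes a minimal template to a temporary directory, reads the value inside the document through params$letter, and renders one HTML file per letter. The temporary paths and the variable names my_letters and out_files are assumptions for illustration, and rendering is skipped when rmarkdown or pandoc is not available.

```r
# Write a minimal parameterized template to a temporary directory.
# Only inline R is used, and the value is read from params$letter.
rmd <- file.path(tempdir(), "test.Rmd")
writeLines(c(
  "---",
  "title: \"Create markdown htmls with loop\"",
  "output: html_document",
  "params:",
  "  letter: 'a'",
  "---",
  "",
  "# Processing letter `r params$letter`"
), rmd)

my_letters <- letters[1:5]
out_files <- paste0(my_letters, ".html")

# Render only when rmarkdown and pandoc are available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  for (i in seq_along(my_letters)) {
    rmarkdown::render(
      input = rmd,
      output_file = out_files[i],
      output_dir = tempdir(),
      params = list(letter = my_letters[i]),
      envir = new.env(),  # fresh environment for every report
      quiet = TRUE
    )
  }
}
```

Reading the value as params$letter rather than as a bare letter keeps the template independent of variables that happen to exist in the calling session, so it also knits on its own with the default value.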
...

Automate PDF Reports in R

I have a .CSV file that includes an ID column and several text columns (title of story, content of story) and columns for a multiple choice questions (each question in a different column). Also, there are columns for a numerical variable (ternary plots).
Here is a screen shot of the CSV file:
Now what I'm trying to do is to automatically generate multiple PDF reports for each ID number (generate a unique report for each individual person). With different values in the report depending on the ID column in the CSV.
I thought the best way to do that in R was to create a RMarkdown file and use parameters to make the values of the report match the ID number values.
Here is my code for the RMarkdown file:
---
title: "`r params$new_title`"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  pdf_document:
    latex_engine: xelatex
  html_document:
    df_print: paged
header-includes:
  \usepackage{fontspec}
  \usepackage{fancyhdr}
mainfont: Arial
params:
  id:
    label: ID
    value: 1
    input: select
    choices:
    - 1
    - 2
    - 3
    - 4
    - 5
  new_title: "My Title!"
---
library(tidyverse)
library(ggtern)
library(ggplot2)
library(readr)
library(lubridate)
library(magrittr)
library(rmarkdown)
knitr::opts_chunk$set(echo = FALSE)
data <- readr::read_csv("dummy.csv")
data_id <- data %>%
filter(id == v)
**Your title:** `r data_id$title`
**Your micro-narrative:** `r data_id$narrative`
Now the code is working, but the formatting in the generated report is not how I want it.
If the same ID number has multiple entries for story title and story content, the values are displayed next to each other. What I want is this:
Story #1 title:
Story #1 content:
Story #2 title:
Story #2 content:
and NOT:
Title: story#1 title, story#2 title, etc...
Content: story#1, story#2, etc...
To automatically generate multiple reports with one click, I created a loop. Here is the code:
require(rmarkdown)
data = read_csv("dummy.csv")
slices = unique(data$id)
for (v in slices){
render("~/Desktop/My_L3/report.Rmd", output_file = paste0("~/Desktop/report_", v, ".pdf"),
params=list(new_title=paste("Quarterly Report -", v)))
}
The loop is working and I was able to generate multiple PDFs by just running this code.
Is this the easiest way to do it? Any other way you're aware of?
And lastly, how do I include the multiple choice questions in the RMarkdown file?
For example, if a certain ID number has 3 choices selected (three 1s in the CSV) how do I display the result as the following:
You selected the following choices: bananas, apples, oranges
I would really appreciate your help as I'm an R noob and still learning a lot of stuff.
@badi congrats! For a newcomer to R you have already climbed quite a steep hill.
I hope the following will help you moving further:
(I) observation: use of rmarkdown::render(... , params = list(...))
You can pass multiple variables and objects as params to your "report" Rmd.
Thus, when you have lengthy preparatory steps, you can load your data, prepare it, and filter it with the loop you use to call rmarkdown::render().
E.g. inside your for loop you could do something like df <- data %>% filter(id == v) and then pass df as one of the params, e.g. rmarkdown::render(... , params = list(new_title = paste("Quarterly Report -", v), data = df))
Then define a params for the dataframe. I recommend to "load" a dummy object/df, e.g.
...
params:
  id ...
  data: !r mtcars # dummy object/df - do not forget the !r tag
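A minimal sketch of that idea in base R: split the data by id and build one params list per report. The small data frame is a hypothetical stand-in for the CSV from the question, and the render call only runs if a report.Rmd that declares new_title and data under params exists in the working directory.

```r
# Hypothetical stand-in for the CSV described in the question.
data <- data.frame(
  id = c(1, 1, 2),
  title = c("Story A", "Story B", "Story C"),
  narrative = c("Text A", "Text B", "Text C")
)

# One params list per id: the report title plus the rows for that id.
param_sets <- lapply(split(data, data$id), function(df) {
  list(new_title = paste("Quarterly Report -", df$id[1]), data = df)
})

# Render only if the template is present and rmarkdown is installed.
# Assumption: report.Rmd declares `new_title` and `data` under params.
if (file.exists("report.Rmd") && requireNamespace("rmarkdown", quietly = TRUE)) {
  for (id in names(param_sets)) {
    rmarkdown::render("report.Rmd",
                      output_file = paste0("report_", id, ".pdf"),
                      params = param_sets[[id]])
  }
}
```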
(II) printing dynamic and static text
There are different ways to achieve this. For your example, it looks like you are looking for something relatively well-formatted that can be constructed from your table columns.
For this sprintf() is your friend. I abbreviated your example with a lighter dataframe.
I print this in the beginning of the document/pdf output.
For this you have to set the chunk option results = "asis" and wrap the sprintf() call in cat(), so that the template formatting is kept and R/Rmd does not add other output formatting (e.g. the ## you can see when I print the table above).
The choices you can combine with a paste() call. Of course this can be done with varying levels of sophistication that I leave to you to explore.
I keep the 1 and NA coding. You can replace these with whatever you think is appropriate (ifelse/case_when, or more complex string substitution operations).
To prepare the list of choices, I just paste everything together:
df <- params$data %>%
mutate(choice_text = paste(choice1, choice2, choice3, sep = ","))
The following code-chunk defines the static/dynamic text template for sprintf() and we iterate over the rows of the data dataframe
# to programmatically print text, sprintf()
# allows you to combine static and dynamic text
# first define a template for how your section looks
# %s points to a string - not used here, but %f caters for (float) numbers
template <- "
## Title %s
With content: %s.
\n You selected the following choices: %s
" # end of your "dynamic" text template
# recall to add an empty line for spacing
# you can force a new line for text entries with \n
# iterate over the input data frame
for (i in seq_len(nrow(df))) {
  current <- df[i, ]
  cat(sprintf(template, current$title, current$content, current$choice_text))
}
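As one possible refinement of the choice text, the 1/NA coding can be turned into the names of the selected choices, and the stories can be numbered as requested in the question. This is a sketch in base R; the data frame and the column names bananas, apples and oranges are hypothetical and only mirror the example sentence from the question.

```r
# Hypothetical data: one row per story; choice columns hold 1 (selected) or NA.
df <- data.frame(
  id = c(1, 1),
  title = c("First story", "Second story"),
  narrative = c("Once upon a time", "The end"),
  bananas = c(1, NA),
  apples = c(1, 1),
  oranges = c(1, NA)
)
choice_cols <- c("bananas", "apples", "oranges")

# For each row, keep the names of the columns that are coded 1.
df$choice_text <- apply(df[choice_cols], 1, function(row) {
  paste(choice_cols[!is.na(row) & row == 1], collapse = ", ")
})

# One numbered block per story instead of all titles on a single line.
template <- "\nStory #%d title: %s\n\nStory #%d content: %s\n\nYou selected the following choices: %s\n"
for (i in seq_len(nrow(df))) {
  cat(sprintf(template, i, df$title[i], i, df$narrative[i], df$choice_text[i]))
}
```

Inside the report, this loop belongs in a chunk with results = "asis", as described above.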
With the adequately set-up pdf template, you will get the following.
Note: My report breaks over to a 2nd page, I only show the first page here.

Convert list of different length into data table for markdown for html format

This is what I'm doing to generate a markdown document so that everything is in one place.
How can I put this output into a data table form that is more readable and easier to search? The lists that are created have different lengths. Each list has a series of tables under it.
If there is a way to convert these lists of differing lengths to a data table format, that would be really helpful.
The table looks like this
## Prepare for analyses
```{r,warning=FALSE,message=FALSE}
set.seed(1234)
library(europepmc)
library(tidypmc)
library(tidyverse)
#library(dplyr)
```
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
##Cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial. 828 records found, showing 10
```{r,include=FALSE}
b <-epmc_search(query = 'cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial OPEN_ACCESS:Y',limit = 10)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
```
```{r}
names(my_tables) <- pmcids
```
The code chunk input and output is then displayed as follows:
```{r basicconsole}
source("flat.R")
L1 <- flattenlist(my_tables)
l.f <- Filter(function(a) any(!is.na(a)), L1)
l.f
#tibble:::print.tbl_df(head(df))
#n <- paste0("Valporic_", names(l.f), ".txt")
for (i in seq_along(l.f)) {
  write.table(l.f[i], sep = "\t", row.names = FALSE, col.names = TRUE, file = paste0(names(l.f)[i], ".txt"))
}
```
UPDATE
I have managed to convert those tibbles into data frames
using this solution
## Output
```{r}
abc <- mapply(cbind, l.f)
abc
```
But when it is rendered in the markdown, the column formatting is gone. I now have data frames inside a list.
But I'm still not sure how to put that into a data table.
**UPDATE 2.0**
A better approach is to read the saved outputs as a list of files into a data table and then use that in the markdown, but so far it only picks up one ID. My code:
tbl_fread <-
list.files(pattern = "*.txt") %>%
map_df(~fread(.))
knitr::kable(head(tbl_fread), "pipe")
Is it possible to arrange these files as follows?
If several files come from one PMCID, they should all be grouped together: for example, if the first PMCID has 3 outputs, all of them should be on the same row, then the next PMCID on the second row, and so on.
UPDATE new
I have managed to align the output into a more readable format. But by default all the files seem to be assigned to multiple columns, which is to be expected given that I'm reading all the files together, since my idea of converting the list to a data table didn't work.
It would be good if I could stack each unique PMCID on top of the next instead of having them all one after another.
knitr::kable(tbl_fread, align = "lccrr")
This may be something you can adapt for R Markdown. I'm not sure what the rationale is to save and load the tables. Instead, you could obtain the tables and show in html directly.
As you are using HTML, make sure to have results='asis' in your chunk. You can use a for loop and seq_along to show each table. You can include information in your table caption, such as the PMCID as well as table number.
---
title: "test13121"
author: "Ben"
date: "1/31/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Libraries
```{r}
library(tidypmc)
library(tidyverse)
library(europepmc)
library(kableExtra)
```
# Get Articles
```{r, echo = FALSE}
b <-epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y',limit = 6)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
names(my_tables) <- pmcids
```
# Show Tables
```{r, echo=F, results='asis'}
for (i in seq_along(my_tables)) {
for (j in seq_along(my_tables[[i]])) {
print(kable(x = my_tables[[i]][[j]], caption = paste0(names(my_tables)[i], ": Table ", j)))
}
}
```
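Because the tables have different shapes, they cannot simply be stacked into one data table. One hedged option is to build a searchable index with one row per table and keep the tables themselves in the nested list. The sketch below uses base R only; the nested list and the PMCID-like names are made-up placeholders standing in for the real my_tables.

```r
# Hypothetical stand-in for `my_tables`: a named list (one entry per PMCID),
# each holding a list of data frames of differing shapes.
my_tables <- list(
  PMC0000001 = list(data.frame(a = 1:2),
                    data.frame(b = 1:3, c = letters[1:3])),
  PMC0000002 = list(data.frame(d = 1))
)

# One index row per table: which article it came from, its position
# within that article, and its dimensions.
index <- do.call(rbind, lapply(names(my_tables), function(id) {
  tabs <- my_tables[[id]]
  data.frame(
    pmcid = rep(id, length(tabs)),  # rep() keeps articles without tables valid
    table = seq_along(tabs),
    rows = vapply(tabs, nrow, integer(1)),
    cols = vapply(tabs, ncol, integer(1))
  )
}))

print(index)
```

The index can be shown with kable() at the top of the report, followed by the per-table loop above.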

Calling an .Rmd file from within an .Rmd file

I have a standard piece of analysis to perform on multiple datasets and want to present them in one report using a template.
The analysis per dataset could look like this:
child.Rmd
## Name of dataset
```{r calculate_stats}
summary(ds)
nrows <- nrow(ds)
```
The number of rows in the dataset is `r nrows`
The full report has this structure:
parent.Rmd
# Report
```{r import_all_datasets}
...import all datasets from csv...
ds.list <- list(ds1, ds2, ds3, ...)
```
for ds in ds.list
run child.Rmd with ds as a parameter
An additional requirement is that I can run the child.Rmd report alone with a specified parameter. The linked answer in comments below uses double curly braces ({{i}}) and knit_expand replaces it with i in the parent environment. This is unsatisfactory as it makes it a faff to call child.Rmd on its own.
Is it possible for the child to be a parametrised report and for the parent to pass the child the list of parameters.
I'm just attempting to do this now by trying:
child.Rmd
---
output: pdf_document
params:
  ds: !r cars
  name: "cars"
---
`r params$name`
=====
```{r}
summary(params$ds)
nrows <- nrow(params$ds)
```
The number of rows in the dataset is `r nrows`
And passing params to child within parent.Rmd
