Loop through data instead of indexing in R

I am trying to convert my data into an HTML document using Rmarkdown, and I am currently relying on conversion to vectors and indexing to solve my problem.
Although my sample data has only 3 observations, my actual dataset has over 30 records, so indexing seems cumbersome and unnatural.
Is there a better way to pull out each of these elements in sequence? Any suggestions would be great.
---
title: "Rmarkdown report"
output: html_document
---
```{r echo = FALSE}
mydata <- data.frame(First = c("John", "Hui", "Jared"), Second = c("Smith", "Chang", "Jzu"), Sport = c("Football","Soccer","Ballet"), Age = c("12", "13", "12"), submission = c("Microbes may be the friends of future colonists living off the land on the moon, Mars or elsewhere in the solar system and aiming to establish self-sufficient homes. Space colonists, like people on Earth, will need what are known as rare earth elements, which are critical to modern technologies. These 17 elements, with daunting names like yttrium, lanthanum, neodymium and gadolinium, are sparsely distributed in the Earths crust. Without the rare earths, we wouldn’t have certain lasers, metallic alloys and powerful magnets that are used in cellphones and electric cars. But mining them on Earth today is an arduous process. It requires crushing tons of ore and then extracting smidgens of these metals using chemicals that leave behind rivers of toxic waste water.",
"Experiments conducted aboard the International Space Station show that a potentially cleaner, more efficient method could work on other worlds: let bacteria do the messy work of separating rare earth elements from rock. The idea is the biology is essentially catalyzing a reaction that would occur very slowly without the biology, said Charles S. Cockell, a professor of astrobiology at the University of Edinburgh.
On Earth, such biomining techniques are already used to produce 10 to 20 percent of the world’s copper and also at some gold mines; scientists have identified microbes that help leach rare earth elements out of rocks.",
"Experiments conducted aboard the International Space Station show that a potentially cleaner, more efficient method could work on other worlds: let bacteria do the messy work of separating rare earth elements from rock. The idea is the biology is essentially catalyzing a reaction that would occur very slowly without the biology, said Charles S. Cockell, a professor of astrobiology at the University of Edinburgh.
On Earth, such biomining techniques are already used to produce 10 to 20 percent of the world’s copper and also at some gold mines; scientists have identified microbes that help leach rare earth elements out of rocks."))
first<- as.vector(mydata$First)
sec <- as.vector(mydata$Second)
age <- as.vector(mydata$Age)
submission <- as.vector(mydata$submission)
```
##
**First:** `r first[1]`   **Second:** `r sec[1]` <br>
**Age:** `r age[1]`
**submission** <br>
`r submission[1]`
***
**First:** `r first[2]`   **Second:** `r sec[2]` <br>
**Age:** `r age[2]`
**submission** <br>
`r submission[2]`

Here's a way to iterate over all rows
---
title: "Rmarkdown report"
output: html_document
---
```{r echo = FALSE}
# using data from above
# mydata <- data.frame(...)
# Define template (using column names from data.frame)
template <- "**First:** `r First`   **Second:** `r Second` <br>
**Age:** `r Age`
**submission** <br>
`r submission`"
# Now process the template for each row of the data.frame
src <- lapply(1:nrow(mydata), function(i) {
knitr::knit_child(text=template, envir=mydata[i, ], quiet=TRUE)
})
```
# Print result to document
`r knitr::knit_child(text=unlist(src))`
Here we use knit_child to take a template string and then use that for each row of the data.frame. I used a trick here to pass in the row of the data.frame as an environment so the template can see all the columns as variables so we don't need to create the vector versions of all the data.frame columns.
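To see why passing a row as the environment works, here is a minimal sketch outside knitr. It assumes a small hypothetical data frame `df` standing in for `mydata`. A one-row data frame behaves like a list of named values, so an expression evaluated against it can refer to the columns by name:

```r
# Hypothetical two-row data frame, standing in for mydata
df <- data.frame(First = c("John", "Hui"),
                 Second = c("Smith", "Chang"),
                 stringsAsFactors = FALSE)

# Evaluate the same expression once per row, with that row's columns
# visible as variables (the same idea as envir = mydata[i, ] above)
full_names <- vapply(seq_len(nrow(df)), function(i) {
  eval(quote(paste(First, Second)), envir = df[i, ])
}, character(1))

print(full_names)
```

This only illustrates the lookup mechanism; the template processing itself is still done by knit_child.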

If we need to create objects in the global env, subset the columns of data into a list, rename it and use list2env
nm1 <- c('First', 'Second', 'Age', 'submission')
nm2 <- c('first', 'sec', 'age', 'submission')
list2env(setNames(unclass(mydata[nm1]), nm2), .GlobalEnv)
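If you would rather not write into the global environment, the same call can target a fresh environment. This is a small sketch with a hypothetical, shortened data frame `df`; the names in `nm1` and `nm2` are chosen only for illustration:

```r
# Hypothetical, shortened data
df <- data.frame(First = c("John", "Hui"), Age = c("12", "13"),
                 stringsAsFactors = FALSE)
nm1 <- c("First", "Age")   # column names in the data
nm2 <- c("first", "age")   # names the new objects should get

# Copy the renamed columns into a fresh environment instead of .GlobalEnv
e <- list2env(setNames(unclass(df[nm1]), nm2), envir = new.env())

print(ls(e))
print(get("first", envir = e))
```

The objects can then be read with `e$first` or `get("first", envir = e)` without cluttering the workspace.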

This is the answer I gave to your previous question:
You can use cat to add the HTML code to an R markdown chunk in order to loop through your data.
Important
You have to add results = "asis" to {r}
Here is the loop:
{r results="asis", echo = FALSE}
i <- 1
NR_OF_ROWS <- nrow(data) # number of rows that the loop will go through
while (i <= NR_OF_ROWS) {
  cat("\n **First:** ", data[i, 1], "  **Last:** ", data[i, 2], "<br> \n")
  cat("\n **Age:** ", data[i, 3], "  **Sport:** ", data[i, 4], "<br> \n")
  cat("\n **submission** ", data[i, 5], "<br> \n")
  # cat("\n <br> \n") # extra space between entries
  cat("\n *** \n") # line between entries
  i <- i + 1
}
Here is the result:

Related

Automate PDF Reports in R

I have a .CSV file that includes an ID column, several text columns (title of story, content of story) and columns for multiple choice questions (each question in a different column). There are also columns for a numerical variable (ternary plots).
Here is a screen shot of the CSV file:
CSV File
Now what I'm trying to do is automatically generate multiple PDF reports, one for each ID number (a unique report for each individual person), with different values in the report depending on the ID column in the CSV.
I thought the best way to do that in R was to create a RMarkdown file and use parameters to make the values of the report match the ID number values.
Here is my code for the RMarkdown file:
---
title: "`r params$new_title`"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  pdf_document:
    latex_engine: xelatex
  html_document:
    df_print: paged
header-includes:
  \usepackage{fontspec}
  \usepackage{fancyhdr}
mainfont: Arial
params:
  id:
    label: ID
    value: 1
    input: select
    choices:
      - 1
      - 2
      - 3
      - 4
      - 5
  new_title: "My Title!"
---
library(tidyverse)
library(ggtern)
library(ggplot2)
library(readr)
library(lubridate)
library(magrittr)
library(rmarkdown)
knitr::opts_chunk$set(echo = FALSE)
data <- readr::read_csv("dummy.csv")
data_id <- data %>%
filter(id == v)
**Your title:** `r data_id$title`
**Your micro-narrative:** `r data_id$narrative`
Now the code is working, but the formatting in the generated report is not how I want it.
If the same ID number has multiple entries for story title and story content, the values are displayed next to each other. What I want is this:
Story #1 title:
Story #1 content:
Story #2 title:
Story #2 content:
and NOT:
Title: story#1 title, story#2 title, etc...
Content: story#1, story#2, etc...
To automatically generate multiple reports with one click, I created a loop. Here is the code:
require(rmarkdown)
data = read_csv("dummy.csv")
slices = unique(data$id)
for (v in slices){
render("~/Desktop/My_L3/report.Rmd", output_file = paste0("~/Desktop/report_", v, ".pdf"),
params=list(new_title=paste("Quarterly Report -", v)))
}
The loop is working and I was able to generate multiple PDFs by just running this code.
Is this the easiest way to do it? Any other way you're aware of?
And lastly, how do I include the multiple choice questions in the RMarkdown file?
For example, if a certain ID number has 3 choices selected (three 1s in the CSV) how do I display the result as the following:
You selected the following choices: bananas, apples, oranges
I would really appreciate your help as I'm an R noob and still learning a lot of stuff.
@badi congrats! For a newcomer to R you have already managed quite a steep hill.
I hope the following will help you move further:
(I) observation: use of rmarkdown::render(... , params = list(...))
You can pass multiple variables and objects as params to your "report" Rmd.
Thus, when you have lengthy preparatory steps, you can load your data, prepare it, and filter it with the loop you use to call rmarkdown::render().
E.g. inside your for loop you could do something like df <- data %>% filter(id == v) and then pass df as (one of the) params, e.g. rmarkdown::render(... , params = list(new_title = paste("Quarterly Report -", v), data = df))
Then define a param for the dataframe. I recommend "loading" a dummy object/df, e.g.
...
params:
  id ...
  data: !r mtcars # dummy object/df - do not forget the !r tag
(II) printing dynamic and static text
There are different ways to achieve this. For your example, it looks like you are looking for something relatively well-formatted that can be constructed from your table columns.
For this sprintf() is your friend. I abbreviated your example with a lighter dataframe.
I print this in the beginning of the document/pdf output.
For this you have to set the chunk option results = "asis" and wrap the sprintf() call in cat() to apply the template formatting and keep R/Rmd from adding other output formatting (e.g. the ## you can see when I print the table above).
You can combine the choices with a paste() call. Of course this can be done with varying levels of sophistication that I leave to you to explore.
I keep the 1 and NA coding. You can replace these with whatever you think is appropriate (ifelse/case_when, or more complex string substitution operations).
To prepare the list of choices, I just paste everything together:
df <- params$data %>%
mutate(choice_text = paste(choice1, choice2, choice3, sep = ","))
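The paste() call above keeps the raw 1/NA codes. As a rough sketch of one way to turn them into readable labels, assume (hypothetically) that each choice column is named after its option, here `bananas`, `apples` and `oranges`, and that 1 means "selected":

```r
# Hypothetical answers: one column per option, 1 = selected, NA = not selected
answers <- data.frame(id = c(1, 2),
                      bananas = c(1, NA),
                      apples  = c(1, 1),
                      oranges = c(NA, 1))
choice_cols <- c("bananas", "apples", "oranges")

# For each row, keep the names of the columns that contain a 1
answers$choice_text <- apply(answers[choice_cols], 1, function(row) {
  paste(choice_cols[!is.na(row) & row == 1], collapse = ", ")
})

print(answers$choice_text)
```

The resulting `choice_text` column can be dropped into the text template in the same way as the pasted version.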
The following code-chunk defines the static/dynamic text template for sprintf() and we iterate over the rows of the data dataframe
# to programmatically print text sprintf()
# allows to combine static and dynamic text
# first define a template how your section looks like
# %s points to a string; not used here, but %f caters for (float) numbers
template <- "
## Title %s
With content: %s.
\n You selected the following choices: %s
" # end of your "dynamic" text template
# recall to add an empty line for spacing
# you can force a new line for text entries with \n
# iterate over the input data frame
for (i in seq_len(nrow(df))) {  # seq_len() also copes with an empty data frame
  current <- df[i, ]
  cat(sprintf(template, current$title, current$content, current$choice_text))
}
With the adequately set-up pdf template, you will get the following.
Note: My report breaks over to a 2nd page, I only show the first page here.
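As a side note, sprintf() is vectorized, so the row-by-row loop can also be written without an explicit loop. The sketch below uses a hypothetical stand-in data frame with the same column names as in the loop above:

```r
# Hypothetical stand-in for the filtered data frame used above
df <- data.frame(title = c("Story A", "Story B"),
                 content = c("First text", "Second text"),
                 choice_text = c("bananas", "apples, oranges"),
                 stringsAsFactors = FALSE)

template <- "\n## Title %s\n\nWith content: %s.\n\nYou selected the following choices: %s\n"

# sprintf() recycles the template over the columns: one string per row
sections <- sprintf(template, df$title, df$content, df$choice_text)

# In a chunk with results = "asis", this writes all sections to the report
cat(sections, sep = "")
```

Whether the loop or the vectorized form reads better is mostly a matter of taste; the output should be the same.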

Create multiple rmarkdown reports with one dataset

I would like to create several pdf files in rmarkdown.
This is a sample of my data:
mydata <- data.frame(First = c("John", "Hui", "Jared","Jenner"), Second = c("Smith", "Chang", "Jzu","King"), Sport = c("Football","Ballet","Ballet","Football"), Age = c("12", "13", "12","13"), submission = c("Microbes may be the friends of future colonists living off the land on the moon, Mars or elsewhere in the solar system and aiming to establish self-sufficient homes.
Space colonists, like people on Earth, will need what are known as rare earth elements, which are critical to modern technologies. These 17 elements, with daunting names like yttrium, lanthanum, neodymium and gadolinium, are sparsely distributed in the Earth’s crust. Without the rare earths, we wouldn’t have certain lasers, metallic alloys and powerful magnets that are used in cellphones and electric cars.", "But mining them on Earth today is an arduous process. It requires crushing tons of ore and then extracting smidgens of these metals using chemicals that leave behind rivers of toxic waste water.
Experiments conducted aboard the International Space Station show that a potentially cleaner, more efficient method could work on other worlds: let bacteria do the messy work of separating rare earth elements from rock.", "“The idea is the biology is essentially catalyzing a reaction that would occur very slowly without the biology,” said Charles S. Cockell, a professor of astrobiology at the University of Edinburgh.
On Earth, such biomining techniques are already used to produce 10 to 20 percent of the world’s copper and also at some gold mines; scientists have identified microbes that help leach rare earth elements out of rocks.", "Blank"))
With help from the community, I was able to arrive at a cool rmarkdown solution that would create a single html file, with all the data I want.
This is saved as Essay to Word.Rmd
```{r echo = FALSE}
# using data from above
# mydata <- mydata
# Define template (using column names from data.frame)
template <- "**First:** `r First`   **Second:** `r Second` <br>
**Age:** `r Age`
**Submission** <br>
`r submission`"
# Now process the template for each row of the data.frame
src <- lapply(1:nrow(mydata), function(i) {
knitr::knit_child(text=template, envir=mydata[i, ], quiet=TRUE)
})
```
# Print result to document
`r knitr::knit_child(text=unlist(src))`
This creates a single file:
I would like to create a single html (or preferably PDF file) for each "sport" listed in the data. So I would have all the submissions for students who do "Ballet" in one file, and a separate file with all the submissions of students who play football.
I have been looking at a few different solutions, and I found this to be the most helpful:
R Knitr PDF: Is there a posssibility to automatically save PDF reports (generated from .Rmd) through a loop?
Following suit, I created a separate R script to loop through and subset the data by sport.
Unfortunately, this creates one file per sport that contains ALL the students, not just those who belong to that sport:
for (sport in unique(mydata$Sport)){
subgroup <- mydata[mydata$Sport == sport,]
render("Essay to Word.Rmd",output_file = paste0('report.',sport, '.html'))
}
Any idea what might be going on with this code above?
Is it possible to directly create these files as PDF docs instead of html? I know I can click on each file to save them as pdf after the fact, but I will have 40 different sports files to work with.
Is it possible to add a thin line between each "submission" essay within a file?
Any help would be great, thank you!!!
This could be achieved via a parametrized report like so:
Add parameters for the data and e.g. the type of sport to your Rmd
Inside the for loop, pass your subgroup dataset to render via the params argument
You can add horizontal lines via ***
If you want pdf then use output_format="pdf_document". Additionally to render your document I had to switch the latex engine via output_options
Rmd:
---
params:
  data: null
  sport: null
---
```{r echo = FALSE}
# using data from above
data <- params$data
# Define template (using column names from data.frame)
template <- "
***
**First:** `r First`   **Second:** `r Second` <br>
**Age:** `r Age`
**Submission** <br>
`r Submission`"
# Now process the template for each row of the data.frame
src <- lapply(1:nrow(data), function(i) {
knitr::knit_child(text=template, envir=data[i, ], quiet=TRUE)
})
```
# Print result to document. Sport: `r params$sport`
`r knitr::knit_child(text=unlist(src))`
R Script:
mydata <- data.frame(First = c("John", "Hui", "Jared","Jenner"),
Second = c("Smith", "Chang", "Jzu","King"),
Sport = c("Football","Ballet","Ballet","Football"),
Age = c("12", "13", "12","13"),
Submission = c("Microbes may be the friends of future colonists living off the land on the moon, Mars or elsewhere in the solar system and aiming to establish self-sufficient homes.
Space colonists, like people on Earth, will need what are known as rare earth elements, which are critical to modern technologies. These 17 elements, with daunting names like yttrium, lanthanum, neodymium and gadolinium, are sparsely distributed in the Earth’s crust. Without the rare earths, we wouldn’t have certain lasers, metallic alloys and powerful magnets that are used in cellphones and electric cars.", "But mining them on Earth today is an arduous process. It requires crushing tons of ore and then extracting smidgens of these metals using chemicals that leave behind rivers of toxic waste water.
Experiments conducted aboard the International Space Station show that a potentially cleaner, more efficient method could work on other worlds: let bacteria do the messy work of separating rare earth elements from rock.", "“The idea is the biology is essentially catalyzing a reaction that would occur very slowly without the biology,” said Charles S. Cockell, a professor of astrobiology at the University of Edinburgh.
On Earth, such biomining techniques are already used to produce 10 to 20 percent of the world’s copper and also at some gold mines; scientists have identified microbes that help leach rare earth elements out of rocks.", "Blank"))
for (sport in unique(mydata$Sport)){
subgroup <- mydata[mydata$Sport == sport,]
rmarkdown::render("test.Rmd", output_format = "html_document", output_file = paste0('report.', sport, '.html'), params = list(data = subgroup, sport = sport))
rmarkdown::render("test.Rmd", output_format = "pdf_document", output_options = list(latex_engine = "xelatex"), output_file = paste0('report.', sport, '.pdf'), params = list(data = subgroup, sport = sport))
}
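The subsetting step in the loop above can also be done once up front with split(). This is a sketch on a hypothetical, shortened version of the data; the render() call is left as a comment so the snippet runs without test.Rmd being present:

```r
# Hypothetical, shortened version of the data
mydata <- data.frame(First = c("John", "Hui", "Jared", "Jenner"),
                     Sport = c("Football", "Ballet", "Ballet", "Football"),
                     stringsAsFactors = FALSE)

# One data frame per sport, named by sport
groups <- split(mydata, mydata$Sport)

for (sport in names(groups)) {
  out_file <- paste0("report.", sport, ".html")
  message(out_file, ": ", nrow(groups[[sport]]), " rows")
  # The actual call, as in the loop above:
  # rmarkdown::render("test.Rmd", output_format = "html_document",
  #                   output_file = out_file,
  #                   params = list(data = groups[[sport]], sport = sport))
}
```

Printing the row counts first is a cheap way to check that each report will only receive its own students.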
In order to create a PDF directly from your Rmd file, you could use the following function in a separate R script where your data is loaded, and then use map from the purrr package to iterate over the sports (in the Rmd file the output must be set to pdf_document):
library(tidyverse)
library(rmarkdown)
get_report <- function(sport) {
  # keep only the rows for this sport; render() evaluates the Rmd in the calling
  # environment by default, so the Rmd sees this filtered copy of mydata
  mydata <- mydata %>%
    filter(Sport == sport)
  render("test.rmd", output_file = paste0("report_", sport, ".pdf"))
}
map(unique(mydata$Sport), get_report)
Hope that is what you are looking for?

How to include the description of a data set in RMarkdown?

I am creating an RMarkdown for teaching and I would like to include the description of the data set on the R Markdown file. For example, if I use the data marketing from the R package datarium, I would like to be able to include the description obtained with ?marketing without having to open it online or in R.
marketing {datarium} R Documentation
Marketing Data Set
Description
A data frame containing the impact of three advertising medias (youtube, facebook and newspaper) on sales. Data are the advertising budget in thousands of dollars along with the sales. The advertising experiment has been repeated 200 times.
Usage
data("marketing")
Format
A data frame with 200 rows and 4 columns.
Examples
data(marketing)
res.lm <- lm(sales ~ youtube*facebook, data = marketing)
summary(res.lm)
Is this possible?
Using @MrFlick's (more like MrShy) suggestion:
How to get text data from help pages in R?
We can create an R Markdown (I also wanted to hide the function used to get the help text) showing the description of the data as follows:
---
title: "Marketing"
author: 'Jon Doe'
date: ""
output: html_document
---
```{r}
library(datarium)
data("marketing")
```
```{r include=FALSE}
help_text <- function(...) {
file <- help(...)
path <- dirname(file)
dirpath <- dirname(path)
pkgname <- basename(dirpath)
RdDB <- file.path(path, pkgname)
rd <- tools:::fetchRdDB(RdDB, basename(file))
capture.output(tools::Rd2txt(rd, out="", options=list(underline_titles=FALSE)))
}
```
```{r}
# ?marketing
cat(help_text(marketing), sep="\n")
# Data
head(marketing, 4)
```

Using an R script to create a highly repetitive Rmarkdown script

I analyze survey results regularly and like to use Rmarkdown so I can make nice HTML output of the results.
The surveys can be many questions (like 40), so creating 40 code chunks, with highly repetitive code and headers, can be annoying. And I can easily do this with a loop in R, I think. However, I'm just stuck on how to combine these 2 processes!
This was close --
how to create a loop that includes both a code chunk and text with knitr in R
But in the end, it was just a loop, and not very flexible. So, I couldn't add a figure to question 22 (or whatever).
### Question 1
#### `r key$Question_Text[key$Question=="Q1"][1]`
```{r chunk1}
quest <- "Q1"
# code for question 1
```
### Question 2
#### `r key$Question_Text[key$Question=="Q2"][1]`
```{r chunk2}
quest <- "Q2"
# Identical code for question 2
```
....and so on....
### Question 35
#### `r key$Question_Text[key$Question=="Q35"][1]`
```{r chunk35}
quest <- "Q35"
# Identical code for question 35
```
Because sometimes, a question has a special type of figure or tweak, I want the output to be something I can paste into RMD and make all the changes there. I just want to skip ahead as much as possible... by making all the boring, identical steps, fully automated.
make strings that I can refer to in loop
question <- paste(rep("Question", 20), 1:20, sep = " ")
qnum <- paste0(rep("Q", 20), 1:20)
# one header line per question (vectorized over qnum)
quest_text_code <- paste0("#### ", "`r key$Question_Text[key$Question==", "\"", qnum, "\"", "][1]`")
chunk <- paste(rep("chunk", 20), 1:20, sep = "")
use sink() to send to a text file
sink("outfile.txt")
loop and paste and output into the sink
for (i in 1:20) { cat(paste("###", question[i], "\n", "\n",
  quest_text_code[i], "\n", "\n",
  "```", "{r ", chunk[i], "}", "\n", "\n",
  "function.dat(", qnum[i], ")", "\n", "\n",
  "function.dat.nr(", qnum[i], ")", "\n", "\n",
  "```", "\n", "\n"))
}
sink() # ends sink
After that, I was able to copy the result into an RMD file and use find and replace to fix a few glitches (extra leading spaces). I also had trouble adding "" marks with paste.
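One way to sidestep both glitches (the extra spaces that paste() inserts and the awkward quoting) is to build each block with sprintf() and write it out with writeLines(). This is only a sketch: it assumes the helper calls should receive the question id as a quoted string, and it writes to a temporary file rather than outfile.txt.

```r
n <- 3  # number of questions; 3 here to keep the output short
qnum <- paste0("Q", seq_len(n))
fence <- strrep("`", 3)  # three backticks, built here to keep this listing readable

# One block of Rmd source per question. function.dat() and function.dat.nr()
# are the helper names from the post above and are only written out as text.
# %1$d is the question number, %2$s the question id.
template <- paste0(
  "### Question %1$d\n\n",
  "#### `r key$Question_Text[key$Question==\"%2$s\"][1]`\n\n",
  fence, "{r chunk%1$d}\n",
  "function.dat(\"%2$s\")\n",
  "function.dat.nr(\"%2$s\")\n",
  fence, "\n")
blocks <- sprintf(template, seq_len(n), qnum)

out_file <- tempfile(fileext = ".Rmd")  # hypothetical output location
writeLines(blocks, out_file)
cat(readLines(out_file), sep = "\n")
```

The generated file can then be pasted into the real Rmd and edited by hand for the questions that need a special figure.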

R Markdown embedding a function into a simple table

I'm new to R and I'm really liking the flexibility of R Markdown to create reports.
My problem is that I want to use a random number generating function I've created in my tables. I want simple tables to include string headers and the following function:
ran <- function(x) {
  x <- runif(n = 1, min = 13.5, max = 15.5)
  round(x, digits = 2)
}
It won't allow me to create a table using this method?
```{r}
String |String |String
-------|------|------
ran(x)|ran(x)|ran(x)
```
My ultimate goal is to create practice worksheets with simple statistics that will be different but within a bounded integer range - so I can ask follow-up questions with some idea of the mean, median etc.
Any help would be greatly appreciated.
Perhaps you should read up on both how to build working R code and how to code up Rmd files since your function doesn't work and there are a few places in the R Markdown docs that show how to do this:
---
output: html_document
---
```{r echo=FALSE}
ran <- function(x) {
runif(n=1, min=13.5, max=15.5) + round(x, digits=2)
}
```
One |Two |Three
-----------|-------------|-----------
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
generates:
Also, neither SO nor RStudio charges extra for spaces in code blocks. It'd be good to show students good code style while you're layin' down stats tracks.
Here is an approach that automates much of the report generation and reduces the amount of code you need to type. For starters, you can turn this into a parameterized report, which would make it easier to create worksheets using different values of x. Here's an example:
In your rmarkdown document you would declare parameters x and n in the yaml header. n is the number of random values you want to produce for each value of x. The x and n values in the yaml header are just the defaults knitr uses if no other values are input when you render the report:
---
output: html_document
params:
  x: !r c(1,5,10)
  n: 10
---
Then, in the same rmarkdown document you would have the text and code for your worksheet. You access the parameters x and n with params$x and params$n, respectively.
For example, the rest of the rmarkdown document could look like the code below. We put x into a list called x_vals with named elements, so that the resulting column names in the output will be the names of the list elements. We feed that list to sapply to get a column of n random values for each value of x. The whole sapply statement is wrapped in kable, which produces a table in rmarkdown format.
```{r, include=FALSE}
library(knitr)
```
```{r, echo=FALSE}
# Create a named list of the x values that we passed into this document
x_vals = as.list(setNames(params$x, paste0("x=", params$x)))
kable(sapply(x_vals, function(i) round(runif(params$n, 13.5, 15.5) + i, 2)))
```
You can now click the "knit" button and it will produce a table using the default parameter values:
If instead you want to use different values for x and/or n, open a separate R script file and type the following:
rmarkdown::render("Worksheet.Rmd",
params = list(x = c(2,4,6,8),
n = 5),
output_file="Worksheet.html")
In the code above, the render function compiles the rmarkdown document we just created, but with new x and n values, and saves the output to a file called Worksheet.html. (I've assumed that we've saved the rmarkdown document to a file called Worksheet.Rmd.) Here's what the output looks like:
You can also, of course, add parameters for the lower and upper limits of the runif function, rather than hard-coding them as 13.5 and 15.5.
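As a sketch of that idea, the table logic can be wrapped in a small function whose `lower` and `upper` arguments play the role of two extra parameters. The function name `make_table` and both argument names are made up for this illustration:

```r
# Hypothetical helper: lower and upper stand in for two extra params
make_table <- function(x, n, lower = 13.5, upper = 15.5) {
  x_vals <- as.list(setNames(x, paste0("x=", x)))
  sapply(x_vals, function(i) round(runif(n, lower, upper) + i, 2))
}

set.seed(1)  # only to make the sketch reproducible
tab <- make_table(x = c(1, 5, 10), n = 4, lower = 0, upper = 1)
print(tab)
```

In the Rmd itself you would add, say, `lower: 13.5` and `upper: 15.5` under params and pass params$lower and params$upper to runif.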
If you want to create several worksheets, each with different x values, you can put render in a loop:
df = expand.grid(1:3,5:6,10:11)
for (i in 1:nrow(df)) {
rmarkdown::render("Worksheet.Rmd",
params = list(x=unlist(df[i,]), n=10),
output_file=paste0(paste(unlist(df[i,]),collapse="_"),".html"))
}
