Mass create documents in R Markdown

I'm wondering if anyone can help me with a dilemma I'm having.
I have a dataset of around 300 individuals that contains some basic information about them (e.g. age, gender). I want to knit a separate R markdown report for each individual that details this basic information, in Word format. And then I want to save each report with their unique name.
The actual code which sits behind the report doesn't change, only the details of each individual. For example:
Report 1: "Sally is a female who is 34 years old". (Which I would want to save as Sally.doc)
Report 2: "Mike is a male who is 21 years old." (Saved as Mike.doc)
Etc, etc.
Is there a way I can do this without manually filtering the data, re-knitting the document, and then saving it manually with a unique name?
Thanks a lot!

Use the render function and pass a list of names to the function:
renderMyDocument <- function(name) {
  rmarkdown::render("./printing_procedures/dagr_parent.Rmd",
                    params = list(names = name),
                    output_file = paste("~/document_", name, '.doc', sep = ''))
}

lapply(names, renderMyDocument)
Then just make sure your .Rmd file can take the params argument via the YAML:
---
params:
  names:
---
RStudio documentation on parameterized reports here: https://rmarkdown.rstudio.com/developer_parameterized_reports.html
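For the original question (one Word report per individual, saved under that person's name), a minimal end-to-end sketch might look like the following. The data frame people, its columns (name, gender, age), and the template file name individual_report.Rmd are assumptions for illustration, not part of the answer above.

# Sketch only: assumes a data frame `people` with columns name, gender, age,
# and a parameterized template called individual_report.Rmd.
library(rmarkdown)

people <- data.frame(
  name   = c("Sally", "Mike"),
  gender = c("female", "male"),
  age    = c(34, 21),
  stringsAsFactors = FALSE
)

render_person <- function(name, gender, age) {
  render(
    "individual_report.Rmd",
    output_format = "word_document",
    output_file   = paste0(name, ".docx"),
    params        = list(name = name, gender = gender, age = age)
  )
}

# One Word document per row of the dataset
for (i in seq_len(nrow(people))) {
  render_person(people$name[i], people$gender[i], people$age[i])
}

The template would declare name, gender, and age under params: in its YAML and could then contain a sentence such as `r params$name` is a `r params$gender` who is `r params$age` years old.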

Related

Looping variables in the parameters of the YAML header of an R Markdown file and automatically outputting a PDF for each variable

I am applying for junior data analyst positions and have come to the realization that I will be sending out a lot of cover letters.
To (somewhat) ease the pain and suffering that this will entail, I want to automate the parts of the cover letter that is suited for automation and will be using R Markdown to (hopefully) achieve this.
For the purposes of this question, let's say that the parts I am looking to automate is the position applied for and the company looking to hire someone for that position, to be used in the header of the cover letter.
These are the steps I envision in my mind's eye:
Gather the positions of interest and corresponding companies in an Excel spreadsheet. This gives an Excel sheet with two columns containing the variables position and company, respectively.
Read the Excel file into the R Markdown as a data frame/tibble (let's call this jobs).
Define two parameters in the YAML header of the .Rmd file to look something like this:
---
output: pdf_document
params:
  position: jobs$position[i]
  company: jobs$company[i]
---
The heading of the cover letter would then look something like this:
"Application for the position as r params$position at r params$company"
To summarize: In order to not have to change the values of the parameters manually for each cover letter, I would like to read an Excel file with the position titles and company names, loop these through the parameters in the YAML header, and then have R Markdown output a PDF for each pair of position and company (and ideally have the name of each PDF include the position title and company name for easier identification when sending the letters out). Is that possible? (Note: the title of the position and the company name does not necessarily have to be stored in an Excel file, that's just how I've collected them.)
Hopefully, the above makes clear what I am trying to achieve.
Any nudges in the right direction are greatly appreciated!
EDIT (11 July 2021):
I have partly arrived at an answer to this.
The trick is to define a function that includes the rmarkdown::render function. This function can then be included in a nested for-loop to produce the desired PDF files.
Again, assuming that I want to automate the position and the company, I defined the rendering function as follows (in a script separate from the "main" .Rmd file containing the text [named "loop_test.Rmd" here]):
render_function <- function(position, company) {
  rmarkdown::render(
    # Name of the 'main' .Rmd file
    'loop_test.Rmd',
    # What should the output PDF files be called?
    output_file = paste0(position, '-', company, '.pdf'),
    # Define the parameters that are used in the 'main' .Rmd file
    params = list(position = position, company = company),
    envir = parent.frame()
  )
}
Then, use the function in a for-loop:
for (position in positions$position) {
  for (company in positions$company) {
    render_function(position, company)
  }
}
Here the Excel file containing the relevant positions has been read into a data frame called positions, with two variables called position and company.
I tested this method using 3 "observations" for a position and a company, respectively ("Company 1", "Company 2" and "Company 3" and "Position 1", "Position 2" and "Position 3"). One problem with the above method is that it produces 3^2 = 9 reports. For example, Position 1 is used in letters for Company 1, Company 2 and Company 3. I obviously only want to match outputs for Company 1 and Position 1. Does anyone have any idea on how to achieve this? This is quite unproblematic for two variables with only three observations, but my intent is to use several additional parameters. The number of companies (i.e. "observations") is, unfortunately, also highly likely to be quite numerous before I can end my search... With, say, 5-6 parameters and 20 companies, the number of reports output will obviously become ridiculous.
As said, I am almost there, but any nudges in the right direction for how to restrict the output to only "match" the company with the position would be highly appreciated.
You can iterate over the rows like below:
for (i in 1:nrow(positions)) {
  render_function(positions$position[i], positions$company[i])
}
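If more than two columns need to be passed along (the 5-6 parameters mentioned in the edit), a row-wise iteration scales better than nested loops. A hedged sketch using Map() over the columns of positions; any extra column names would be whatever the template's params declare:

# Sketch: Map() walks the columns in parallel, so each call receives the values
# from one row only -- no Cartesian product of positions and companies.
invisible(Map(render_function, positions$position, positions$company))

# With many parameters, purrr::pwalk() takes a data frame row-wise; the column
# names must match the arguments of render_function().
# library(purrr)
# pwalk(positions[, c("position", "company")], render_function)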

Extracting/Parsing from a PDF to CSV using R?

I am trying to extract data from a poorly formatted PDF into a .csv file for geocoding. The data I am concerned with are the locations of Farmers' Markets in Colorado for 2018 (https://www.colorado.gov/pacific/sites/default/files/Colorado%20Farmers%27%20Markets.pdf). The necessary fields I am looking to have are Business_Name, Address, City, State, Zip, Hours, Season, Email, and Website. The trouble is that the data are all in one column, and not all of the entries have 100% complete data. That is to say that one entry may have five attributes under it (name, address, hours, zip, website) and another may only have 2 lines of the attributes (name, address).
I found an embedded map of locations here (http://www.coloradofarmers.org/find-markets/) that references the PDF file above. I was able to save this map to MyMaps and copy/paste the table to a CSV, but there are missing entries.
Is there a way to cleanly parse this data from PDF to CSV? I imagine what I need to do is create a dictionary of Colorado towns with markets (e.g. 'Denver', 'Canon City', 'Telluride') and then have R scan the column and, for every block of lines between look-up cities, collapse those lines onto the previous city's row as separate field columns, or as one comma-delimited field to then parse out based on what each field looks like.
Here's what I have so far:
# Set the working directory
setwd("C:/Users/bwhite/Desktop")
# Download the PDF of data (mode = "wb" so the binary PDF is not corrupted)
download.file("https://www.colorado.gov/pacific/sites/default/files/Colorado%20Farmers%27%20Markets.pdf",
              destfile = "./ColoradoMarkets2018.pdf", method = "auto",
              quiet = FALSE, mode = "wb", cacheOK = TRUE)
# Import the pdftables package from CRAN
install.packages("pdftables")
library(pdftables)
# Convert the downloaded PDF to CSV
convert_pdf("./ColoradoMarkets2018.pdf", output_file = "FarmersMarkets.csv",
            format = "csv", message = TRUE, api_key = "n7qgsnz2nkun")
# Read in the CSV
Markets18 <- read.csv("./FarmersMarkets.csv")
# Create a look-up table of Colorado cities
install.packages("htmltab")
library(htmltab)
CityList <- htmltab("https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_Colorado", 1)
names(CityList)
Any help is appreciated.
You can only attempt to extract information that is consistent. I'm not an expert, but I tried to build a logic for part of it. Pages 2-20 are relatively free of dirty data. Also, if you notice, each group can (for the most part) be split at "p.m.". Since the number of columns differs between entries, it was difficult to build a single logic. Even the extracted data frame would require some transformation.
library(pdftools)
library(plyr)

# pdf_text() returns one character string per page
text <- pdf_text("Colorado Farmers' Markets.pdf")
text4 <- data.frame(Reduce(rbind, text), row.names = c(), stringsAsFactors = FALSE)

new <- data.frame()
for (i in 2:20) {
  # Split each page into market entries at "p.m."
  page <- text4[i, 1]
  entries <- strsplit(page, 'p.m.')
  final <- data.frame(Reduce(rbind, entries), row.names = c(), stringsAsFactors = FALSE)
  # Split each entry into its lines and bind them as the columns of one row
  for (j in 1:dim(final)[1]) {
    entry <- strsplit(final[j, ], '\n')
    new <- rbind.fill(new, data.frame(t(data.frame(entry, row.names = c()))))
  }
}
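As a rough alternative along the lines the question itself sketches (splitting the single text column at known city names), here is a hedged illustration on a toy vector of lines. The sample lines, the small city vector, and the assumption that a record ends with a line that is exactly a city name are all illustrative; the real city list would come from the htmltab() lookup above.

# Sketch only: segment a flat character vector of lines into one record per
# market, closing a record whenever a line is exactly a known city name.
lines  <- c("Denver Farmers Market", "123 Main St", "Denver",
            "Canon City Market", "456 Oak Ave", "Canon City")
cities <- c("Denver", "Canon City", "Telluride")   # would come from CityList

# A new record starts on the line after each city line
record_id <- cumsum(c(TRUE, head(lines %in% cities, -1)))
records   <- split(lines, record_id)

# Collapse each record into one comma-delimited field for later parsing
market_rows <- vapply(records, paste, character(1), collapse = ", ")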

Parameterized reports in RMarkdown - How to ask for parameters once?

I am currently trying to utilize parameterized reports to allow users to input a dataset (and a few more variables of interest) that will then be fed into an R script that performs and outputs a variety of analyses. These datasets will have information on multiple subjects, and the goal is to produce one report for each subject within the dataset. Thus, I utilize a for loop that loops through the Usernames within the dataset (called map). I then input a .Rmd file which is responsible for the bulk of the analysis. The for loop essentially refers to this .Rmd file for the 50 or so subjects, and outputs the 50 or so reports.
for (id in unique(map$UserName)) {
  # bunch of code for processing
  render(input = "../lib/scripthtml.Rmd",
         output_file = paste0('report.', id, '.html'),
         output_format = "html_document",
         output_dir = "Script_output",
         params = "ask")
}
I am currently trying to utilize parameterized reports in Shiny to allow the user to input their own dataset (map). Thus, I specified a parameter and used params = "ask" in the render step. The main issue lies here:
Since the render step is inside the for loop, it is run for each subject. As a result, the params = "ask" interface loads up 50 times, asking the user to provide their dataset each time.
Is there any way I can avoid this? How can I get a user to supply their dataset file as a parameter once, then use it for all 50 reports?
All your variables may be passed through your render command; I do this for thousands of reports currently.
YAML of .Rmd template
This may include default values for certain parameters depending on your requirements, for illustrative purposes I have left them as empty strings here.
---
params:
  var1: ""
  var2: ""
  var3: ""
---
Loading data set
In Shiny, you can have the file input once and re-use it for each report, passing elements of the data frame to the render command as in the next section.
Pseudo code for render in for loop
for (i in 1:n) {
  rmarkdown::render(
    "template.Rmd",
    params = list(
      var1 = df$var1[i],
      var2 = df$var2[i],
      var3 = df$var3[i]
    ),
    output_file = out_file   # e.g. built per iteration with paste0()
  )
}
Note: within a shiny app, you will need to use df()$var1 assuming the file input will become a reactive function.
You can then use the parameters throughout your template using the params$var1 convention.
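A minimal sketch of the Shiny side, assuming a fileInput with the id "map_file", a CSV upload, and a template called report.Rmd that declares data and id parameters; all of these names are illustrative. The point is that the dataset is read once in a reactive and the loop only re-uses it:

# Sketch only: widget ids, the CSV format, and report.Rmd are assumptions.
library(shiny)
library(rmarkdown)

ui <- fluidPage(
  fileInput("map_file", "Upload dataset"),
  actionButton("go", "Generate reports")
)

server <- function(input, output, session) {
  map_data <- reactive({
    req(input$map_file)
    read.csv(input$map_file$datapath, stringsAsFactors = FALSE)
  })

  observeEvent(input$go, {
    df <- map_data()                         # the file is read once, here
    for (id in unique(df$UserName)) {
      rmarkdown::render(
        "report.Rmd",                        # must declare these params in its YAML
        output_file = paste0("report.", id, ".html"),
        params = list(data = df[df$UserName == id, ], id = id)
      )
    }
  })
}

shinyApp(ui, server)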

markdown and bookdown - run by factors

I am brand new to markdown language.
I am using bookdown to generate reports. I have two questions:
If I have one stacked dataset with one column being a factor...
Is it possible to run separate reports for each factor level? (Note: I know how to do this analysis in R; I want to know if I can export separate results for each factor level as separate reports. Any tips to do this are appreciated.)
Can you reference this level in the TEXT section of the report?
I want one report titled "Results for A" with stats=1234 and another report titled "Results for B" with stats=567, where A and B are the levels of a factor.
Does that make sense? All help is appreciated.
You can pass a parameter to the report. The parameter has to be defined in the YAML header.
Example:
in example.rmd:
---
output: html_document
params:
  stats: NA # default value
---
Results for stats = `r params$stats`
And pass the parameter as such:
rmarkdown::render("example.rmd", params = list(stats = 123))
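For the original question (one report per factor level), a hedged sketch building on the same idea; the data frame dat, its factor column grp, its value column, and the statistic computed here are assumptions for illustration, while example.rmd is the template from the answer above:

# Sketch: render example.rmd once per level of a factor column,
# passing the level's statistic in as the stats parameter.
library(rmarkdown)

for (lvl in levels(dat$grp)) {
  stats_for_lvl <- sum(dat$value[dat$grp == lvl])   # whatever statistic you need
  render(
    "example.rmd",
    output_file = paste0("Results_for_", lvl, ".html"),
    params = list(stats = stats_for_lvl)
  )
}

If the level itself is needed in the text or the title, it can be passed as an additional parameter and referenced with the same `r params$...` convention; an inline R expression in the YAML title field should also work in rmarkdown.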

Is it possible to to export from reporttools?

I am using tableNominal{reporttools} to produce frequency tables. The way I understand it, tableNominal() produces LaTeX code which has to be copied and pasted into a text file and then saved as .tex. But is it possible to simply export the table produced, as can be done with print(xtable(table), file = "path/outfile.tex")?
You may be able to use either latex or latexTranslate from the "Hmisc" package for this purpose. If you have the necessary program infrastructure the output gets sent to your TeX engine. (You may be able to improve the level of our answers by adding specific examples.)
Looks like that function does not return a character vector, so you need to use a strategy to capture the output from cat(). Using the example in the help page:
capture.output(
  TN <- tableNominal(vars = vars, weights = weights, group = group,
                     cap = "Table of nominal variables.", lab = "tab: nominal"),
  file = "outfile.tex"
)
