Passing parameters through an RMarkdown document? - r

I am looking to create individual reports for a given list of vendors. The ideal output would be a separate HTML file for each vendor's information.
The issue is that I am having trouble wrapping my head around creating parameterized reports in RMarkdown. I have been taking a look at this link to understand how to loop/iterate through RMarkdown reports.
To illustrate, the logic I want to execute is the following:
for (vendor in vendor.name) {
rmarkdown::render('input.Rmd', params = list(vendor = vendor))
}
I would then get output files such as:
Vendor-1.html, Vendor-2.html, …, Vendor-4.html, and vendor-4.html
These files are then saved locally to my computer. The part I have trouble wrapping my head around is this: say we have a bar graph of sales by month. How would the parameter passed through the document know when to change the vendor number to give a unique view?
If anyone can share an example using iris, mtcars, or any base dataset within R, I would really appreciate it. I would like to see how this workflow would work, because I am struggling to understand the concept.
To be specific, say I have this chunk of code here. How would params$vendor know to loop over another vendor if I am not calling it within the chunk? Within my dplyr verbs, in my filter, should I use MVNDR_NBR == qc_sales$vendor_number or params$vendor? This is the piece that has me most confounded.
sales_2021_stock <- qc_sales %>%
filter(FSCL_YR == 2021
, STR_NBR != '8119'
, MAPPED_ORD_SRC == 'QC'
, so_flg == 0
, YTLW_TY_LY_FLG == 'TY'
, MVNDR_NBR == '60031167'
, !SUB_DEPT_NBR %in% c('0025','0028')) %>%
group_by(MVNDR_NBR, MVNDR_NM, FSCL_WK_NBR, FSCL_YR) %>%
summarise(Sales = sum(ESVS_NET_SLS)) %>%
mutate(FSCL_YR = as.character(FSCL_YR)) %>%
collect()
sales_2020_stock <- qc_sales %>%
filter(FSCL_YR == 2020
, STR_NBR != '8119'
, MAPPED_ORD_SRC == 'QC'
, YTLW_TY_LY_FLG == 'LY'
, so_flg == 0
, MVNDR_NBR == '60031167'
, !SUB_DEPT_NBR %in% c('0025','0028')) %>%
group_by(MVNDR_NBR, MVNDR_NM, FSCL_WK_NBR, FSCL_YR) %>%
summarise(Sales = sum(ESVS_NET_SLS)) %>%
mutate(FSCL_YR = as.character(FSCL_YR)) %>%
collect()
sales_comp_line_stock <- rbind(sales_2021_stock, sales_2020_stock)
stock_comp <- ggplot(sales_comp_line_stock, aes(x = FSCL_WK_NBR, y = Sales, color = FSCL_YR ))+
geom_line(size = 1.25, aes(color = FSCL_YR))+
geom_smooth(size = .50, aes(color = FSCL_YR), se = FALSE, method = "auto")+
scale_x_continuous(breaks=seq(0,weeks,1))+
scale_y_continuous(labels = scales::dollar_format(scale = .0001, suffix = "K"))+
scale_color_manual(values=c("#0298F9", "#F96302"))+
theme_economist()+
ggtitle('Week-Over-Week Sales (Stock) 2021 v 2020')+
theme(
panel.grid.major = element_line(linetype = "dotted"),
axis.text = element_text( size = 10),
legend.position = c(0, 1),legend.justification = c(0, 1),
plot.title = element_text( size = 14, margin=margin(0,0,20,0), hjust = 0.5),
panel.background = element_rect(fill = NA),
strip.text = element_text(size=10)
)
stock_comp
My dataset looks as follows; these are the same vendors included in my parameter. How would I create a ggplot displaying sales month over month, printed into an individual HTML output?
Ideally one plot would be written like this:
ggplot(sample_vendor_tbl, aes(x = FSCL_MTH_NM, y = Sales)) +
geom_col()
To print out multiple ggplots within a document I would do the following:
for (i in vendor_nbr){
ggplot(mydata, aes(x = Month, y = sales))+
geom_col()
}
I am just confused about where we need to account for the parameter. How can I create a plot for the given vendor for its printout, similar to the example posted in your answer? I basically want to do exactly what you did in your answer with the charts, but using ggplot instead of base R.
To create a ggplot using parameters, I had to pull params$MVNDR_NBR into my dplyr verbs as follows:
sample_vendor_tbl %>%
filter(MVNDR_NBR == params$MVNDR_NBR) %>%
ggplot(aes(x = FSCL_MTH_NM, y = Sales)) +
geom_col()

I think the step you were missing was specifying the output filename so that each "vendor" would have its own file; otherwise, the same filename is overwritten each time leaving you with a single HTML document.
An example:
---
title: mtcars cyl
author: r2evans
params:
  cyl: null
---
We have chosen to plot a histogram of `mtcars` where `$cyl` is equal to `r params$cyl`:
```{r}
dat <- subset(mtcars, cyl == params$cyl)
if (nrow(dat) > 0) {
hist(dat$disp)
}
```
Calling this with:
for (cy in c(4,6,8)) {
rmarkdown::render("~/StackOverflow/10466439/67525642.Rmd",
output_file = sprintf("cyl_%s.html", cy),
params = list(cyl = cy))
}
will render three HTML files, cyl_4.html, cyl_6.html, and cyl_8.html, each with differing content.
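To see why filtering on `params$cyl` is enough, it may help to note that inside the document `params` behaves like a named list holding whatever was passed to `render()`. The sketch below mimics that with a plain list and builds the per-value file names used above. The `jobs` name is made up for illustration, and the block does not call `rmarkdown::render()` itself.

```r
# A stand-in for the `params` object that rmarkdown exposes inside the document.
# (Assumption: a plain list is used here purely for illustration.)
params <- list(cyl = 6)

# The same filter the chunk above uses: only rows for the current parameter value.
dat <- subset(mtcars, cyl == params$cyl)
nrow(dat)

# Hypothetical structure: one set of render() arguments per parameter value.
jobs <- lapply(c(4, 6, 8), function(cy) {
  list(
    output_file = sprintf("cyl_%s.html", cy),
    params = list(cyl = cy)
  )
})

# The file names that a render loop over `jobs` would write.
vapply(jobs, function(job) job$output_file, character(1))
```

Each element of `jobs` carries exactly the two arguments that change between reports, so a loop only needs to hand them to `rmarkdown::render()` together with the input file.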

Related

How can you make a loop so that you are creating this same graph over and over but for a different variable each time?

This is the coding that I have right now that creates the graph that I want. This is only for well 'aa1' though, and I want to learn how to make a loop so that I can make this graph for all of my wells. Any ideas?
longer_raindata %>%
select(well, metal, level, smpl_mth) %>%
filter(well == 'aa1') %>%
ggplot(aes(metal, level, fill = smpl_mth))+
scale_fill_manual(values = c('plum', 'lightsteelblue', 'darkolivegreen', 'indianred'))+
geom_col(position = "dodge")+
labs(title = 'Metals in Well AA1',
x = 'Metals',
y = 'µg/l')
One option to achieve your desired result would be to put your code into a plotting function which takes one argument .well and adjust your filter statement to filter your data for that .well.
Then you could use e.g. lapply to loop over the unique wells in your dataset to get a list of plots for each well.
Using some fake example data:
library(ggplot2)
library(dplyr)
longer_raindata <- data.frame(
metal = LETTERS[1:3],
level = 1:6,
well = rep(c("aa1", "aa2"), each = 6),
smpl_mth = rep(letters[1:2], each = 3)
)
plot_fun <- function(.well) {
longer_raindata %>%
select(well, metal, level, smpl_mth) %>%
filter(well == .well) %>%
ggplot(aes(metal, level, fill = smpl_mth))+
scale_fill_manual(values = c('plum', 'lightsteelblue', 'darkolivegreen', 'indianred'))+
geom_col(position = "dodge")+
labs(title = paste0('Metals in Well ', toupper(.well)),
x = 'Metals',
y = 'µg/l')
}
lapply(unique(longer_raindata$well), plot_fun)
#> [[1]]
#>
#> [[2]]
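If the plots need to be looked up by well later, one option is to name the list as it is built. This is a minimal sketch under the same fake data as above; `plot_well` and `plots` are hypothetical names, and base subsetting replaces the dplyr chain only to keep the example short.

```r
library(ggplot2)

# Fake data with the same shape as the example above.
longer_raindata <- data.frame(
  metal = LETTERS[1:3],
  level = 1:6,
  well = rep(c("aa1", "aa2"), each = 6),
  smpl_mth = rep(letters[1:2], each = 3)
)

# Same idea as plot_fun(): one plot for the well passed in.
plot_well <- function(.well) {
  ggplot(longer_raindata[longer_raindata$well == .well, ],
         aes(metal, level, fill = smpl_mth)) +
    geom_col(position = "dodge") +
    labs(title = paste0("Metals in Well ", toupper(.well)), x = "Metals")
}

# Naming the input vector makes lapply() return a list named by well.
wells <- unique(longer_raindata$well)
plots <- lapply(setNames(wells, wells), plot_well)
names(plots)
```

A single plot can then be retrieved with `plots[["aa2"]]` instead of by position.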

R: Creating multiple maps using a loop through variable names

I'd like to create a map that shows the value of a variable for a given state. The dataset contains around a thousand variables and is at the state level, for about 100 years.
The code I have that works is:
plot_usmap(data = database, values = "var1") + scale_fill_continuous(
low = "white", high = "blue", na.value="light gray", name = "Title of graph", label = scales::comma
) + theme(legend.position = "right")
Now what I'd like to do is create this map for a list of about 15 variables and 10 years.
I'm usually a STATA user, and there I could define a variable list and then loop through the variable list. On page 7 of this document of "A Quick Introduction to R (for Stata Users)", I tried applying the following solution:
vars <- c("database$var1", "database$var2", "database$var3","database$var4", "database$var5", "database$var6", "database$var7", "database$var8", "database$var9", "database$var10", "database$var11", "database$var12")
for(var in vars) {
v <- get(var)
plot_usmap(data = database, values = "v") +
scale_fill_continuous(low = "white", high = "blue", na.value="light gray", name = "v", label = scales::comma) + theme(legend.position = "right")}
With this code, I get the error "Error in get(var) : object 'database$var1' not found". When I try view(database$var1), it appears. The next problem is that I'd like the name of the graph to be the label of the variable rather than the variable name. In the example above, I'd restricted the whole dataset to include only 1 year, so if there's a way to set the code up so that I could use the whole database but map only selected years, that would be great.
Any insights would be appreciated! I read that in R, "for" isn't used as much, so if there is a better way to do it, please let me know.
Basically, it's not that different in R. First, there is no need to use get, which in general should be avoided. Second, while for loops are fine, the more R-ish way would be to use lapply. Especially when making plots via ggplot2, it is recommended to use lapply.
Making use of some fake example data to mimic your database:
library(usmap)
library(ggplot2)
# Example data
database <- statepop
names(database) <- c("fips", "abbr", "full", "var1")
database$var2 <- database$var1
vars <- c("var1", "var2")
lapply(vars, function(x) {
plot_usmap(data = database, values = x) +
scale_fill_continuous(
low = "white", high = "blue", na.value="light gray", name = "Title of graph", label = scales::comma
) +
theme(legend.position = "right") +
labs(title = x)
})
#> [[1]]
#>
#> [[2]]
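The reason `get("database$var1")` fails is that `get()` looks for an object literally named `database$var1`; it does not evaluate the `$`. Keeping only the column names in the vector and indexing with `[[` avoids the problem. A small base R sketch, with made-up data and a hypothetical `means` result:

```r
# Made-up data standing in for the real database.
database <- data.frame(var1 = c(10, 20, 30), var2 = c(1, 2, 3))

# No object is literally called "database$var1", so get() cannot find it.
exists("database$var1")

# Store bare column names and index the data frame with [[ ]] instead.
vars <- c("var1", "var2")
means <- vapply(vars, function(v) mean(database[[v]]), numeric(1))
means
```

The same pattern is what lets `values = x` work in the `lapply()` call above: `x` is just a column name passed as a string.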
EDIT: Assuming that your data contains a column with years, I would suggest wrapping the plotting code inside a function which takes your database, a vector of vars and the desired year as arguments. But there are other approaches, and which works best depends on your desired result.
library(usmap)
library(ggplot2)
library(labelled)
# Example data
database <- statepop
names(database) <- c("fips", "abbr", "full", "var1")
database$year <- 2015
database <- rbind(database, transform(database, year = 2020))
var_label(database$var1) <- "Population"
vars <- c("var1")
names(vars) <- vars
map_vars <- function(.data, vars, year) {
lapply(vars, function(x, year) {
# Read the label before subsetting, then keep only the rows for the requested year
label <- var_label(.data[[x]])
.data <- .data[.data$year == year, ]
# Plot the filtered data rather than the full database
plot_usmap(data = .data, values = x) +
scale_fill_continuous(
low = "white", high = "blue", na.value = "light gray", name = "Title of graph", label = scales::comma
) +
theme(legend.position = "right") +
labs(title = paste(label, "in", year))
}, year = year)
}
map_vars(database, vars, 2015)
#> $var1
map_vars(database, vars, 2020)
#> $var1

How can I loop colnames as plot titles along with data using lapply in R?

I have this function that works close to what I need -- it creates a clean table from my original raw data, makes it a ggplot, and uses lapply to run it through all the variables I want from the original table, data:
#Get colnames of all numeric variables
nlist <- names(data[,sapply(data,is.numeric)])
#Create function
varviz_n <- function(dat, var){
var <- dat[,which(names(dat) == var)]
title<-var
tab <- dat %>%
group_by(group = cut(var, breaks = seq(0, max(var), 10)),
groupedsupport) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
filter(!is.na(group),n>10)
tab2 <- tab %>%
group_by(groupedsupport) %>%
summarise(mean = mean(freq),
median = median(freq))
finaltab <- tab %>% left_join(tab2, by = "groupedsupport")
fplot <- finaltab %>%
ggplot(aes(fill=group,x=groupedsupport,y=freq)) +
geom_col(position="dodge") +
geom_text(aes(label = paste("n =",n), n = (n + 0.05)), position = position_dodge(0.9), vjust = 0, size=2) +
geom_errorbar(aes(groupedsupport, ymax = median, ymin = mean),
size=0.5, linetype = "longdash", inherit.aes = F, width = 1) +
scale_y_continuous(labels = scales::percent) +
xlab("") + ylab("") +
ggtitle(title) +
scale_fill_discrete("")
filename = filename <- paste0(finaltab$var)
ggsave(paste("Plots/",filename,".png"), width = 10, height = 7)
return(fplot)
}
#Run function
lapply(nlist, varviz_n, dat = data)
This does almost exactly what I want -- the problem is that all of the variables it's running through are 0-100 numeric and it's creating the plots but I can't at all figure out how to get the column name as the title of the plot or of the key. So I have no idea which graph is getting returned.
Can someone please help me figure out a way to get the column name from nlist to be the title of my plot? The way it is now prints out the first value of the column instead of the actual column name:
The final piece of code to save it in the 'Plots' folder doesn't work either since the title/var isn't populating correctly.
You can use something like this to create data to test out the code: data <- data.frame(v1 = sample(1:100,1000,replace=T),v2 = sample(1:100,1000,replace=T),v3 = sample(1:100,1000,replace=T),groupedsupport = sample(LETTERS[1:3],1000,replace = TRUE))
Thanks!
I think you just need to swap these steps:
var <- dat[,which(names(dat) == var)]
title <- var
should be
title <- var
var <- dat[,which(names(dat) == var)]
var is reassigned to the selected column of data, so when it is used again for title, it refers to that vector and not the column name.
If this doesn't resolve it, please give us some code to mimic the contents of data.
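A way to avoid the clash altogether is to keep the column name and the column values under different names. The sketch below is a stripped-down stand-in for varviz_n (the `describe_var` helper and the tiny `data` frame are made up); it also builds the file path with `file.path()` and `paste0()`, since `paste()` with its default separator would put spaces into the path.

```r
# Tiny made-up data set with the same column layout as the test data above.
data <- data.frame(
  v1 = c(5, 15, 25),
  v2 = c(10, 20, 30),
  groupedsupport = c("A", "B", "C")
)

# Hypothetical helper: keeps the name and the values apart.
describe_var <- function(dat, var) {
  title <- var           # the column name, used for titles and file names
  values <- dat[[var]]   # the column itself, under a different name
  list(
    title = title,
    file = file.path("Plots", paste0(title, ".png")),
    max = max(values)
  )
}

# Names of all numeric columns, then one result per column.
nlist <- names(data)[sapply(data, is.numeric)]
res <- lapply(nlist, describe_var, dat = data)
res[[1]]$file
```

In the original function, the `title` value would go to `ggtitle()` and the `file` value to `ggsave()`.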

For loop to create a list of ggplots always save the same coordinates for points and segments

First, you need to load these packages:
library(ggplot2)
library(ggrepel)
I have a dataframe "dframe" like this:
V1 V2 V3 V4 V5 V6 V7 Groups
0.05579838 -0.44781204 -0.164612982 -0.05362210 -0.23103516 -0.04923499 -0.06634579 1
0.14097924 -0.35582736 0.385218841 0.18004788 -0.18429327 0.29398646 0.69460669 2
0.10699838 -0.38486299 -0.107284020 0.16468591 0.48678593 -0.70776085 0.20721932 3
0.22720072 -0.30860464 -0.197930310 -0.24322096 -0.30969028 -0.04460600 -0.08420536 4
0.24872635 -0.23415141 0.410406163 0.07072146 -0.09302970 0.01662256 -0.21683816 5
0.24023670 -0.27807097 -0.096301697 -0.02373198 0.28474825 0.27397862 -0.29397324 6
0.30358363 0.05630646 -0.115190308 -0.51532428 -0.08516130 -0.08785924 0.12178198 7
0.28680386 0.07609196 0.488432908 -0.13082951 0.00439161 -0.17572986 -0.25854047 8
0.30084361 0.06323714 -0.008347161 -0.26762137 0.40739524 0.22959024 0.19107494 9
0.27955675 0.22533959 -0.095640072 -0.27988676 -0.04921808 -0.10662521 0.19934074 10
0.25209125 0.22723231 0.408770841 0.13013867 -0.03850484 -0.23924023 -0.16744745 11
0.29377963 0.13650247 -0.105720288 -0.00316556 0.29653723 0.25568169 0.06087608 12
0.24561895 0.28729625 -0.167402464 0.24251060 -0.22199262 -0.17182828 0.16363196 13
0.25150342 0.25298115 -0.147945172 0.43827820 0.02938933 0.01778563 0.15241257 14
0.30902922 -0.01299330 -0.261085058 0.13509982 -0.40967529 -0.11366113 -0.06020937 15
0.28696274 -0.12896680 -0.196764195 0.39259942 0.08362863 0.25464125 -0.29386260 16
Here is a reproducible dataframe that you can use from Mark Peterson:
dframe <-
rnorm(70) %>%
matrix(nrow = 10) %>%
as_tibble() %>%
setNames(paste0("V", 1:ncol(.))) %>%
mutate(Groups = 1:nrow(.)
, Label = 1:nrow(.))
I created a table of combinations of columns I want to be used from my dataframe:
#Create all possible combinations
combs<-expand.grid(seq(7),seq(7))
#Remove duplicate and order
combs<-combs[combs$Var1 != combs$Var2,]
combs<-combs[order(combs[,1]),]
Then I made a for loop that is supposed to generate a list of ggplots, one plot per combination:
list_EVplots<-list()
for(i in seq(nrow(combs))){
list_EVplots[[paste(combs[i,1],"&",combs[i,2])]]<- ggplot(data=dframe) +
ggtitle(paste("Eigenvector Plot - Pairwise",
"correlation with","adjustment")) +
geom_point(aes(x = dframe[,combs[i,1]], y = dframe[,combs[i,2]],
color = Groups)) +
geom_segment(aes(x = rep(0,nrow(dframe)), y = rep(0,nrow(dframe)),
xend = dframe[,combs[i,1]], yend = dframe[,combs[i,2]],
color = Groups),
size = 1, arrow = arrow(length = unit(0.3,"cm"))) +
geom_label_repel(aes(x = dframe[,combs[i,1]], y = dframe[,combs[i,2]],
label = rownames(dframe))) +
scale_color_manual(values=colors) +
xlab(paste0("Eigenvector ",combs[i,1])) +
ylab(paste0("Eigenvector ",combs[i,2])) +
theme(plot.title = element_text(hjust = 0.5),
axis.title = element_text(size = 13),
legend.text = element_text(size=12)) +
geom_hline(yintercept = 0, linetype="dashed") +
geom_vline(xintercept = 0, linetype="dashed")
}
After running this for loop, I obtain my list "list_EVplots".
Problem: the iterations seem to work for xlab() and ylab(), and they also work for the names of the plots in the list, but the coordinates of geom_point(aes()) and geom_segment(aes()) do not change. The coordinates stay the same when they obviously should change!
I think the coordinates stay locked on the one used for the first plot of the first iteration.
If anyone has the solution for that I would be very grateful for your help.
Working under Linux 16.04 with R Studio. R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
I tried with a subsetted dataframe with only the columns I wanted to work with instead of using an 8 columns dataframe: didn't work.
Expected: The list should contain different plots: all plots should be different.
Problem: All plots have the same coordinates for dots and segments in the list.
The simplest answer is often the easiest one: try to avoid using for loops in places where lapply is more appropriate. I don't see anything obvious in your code that suggests where the problem lies, but I am guessing that it is a problem in the deeply nested [] statements.
Here is an approach using lapply and aes_string to handle the variables. If you want something other than a full pairwise set of plots, you may have to modify the calls to the two lapply's a bit.
First, some reproducible data (made using dplyr). Note that I made the Labels explicit instead of relying on the rownames (this is good practice, and far easier to use in calls to ggplot).
dframe <-
rnorm(70) %>%
matrix(nrow = 10) %>%
as_tibble() %>%
setNames(paste0("V", 1:ncol(.))) %>%
mutate(Groups = 1:nrow(.)
, Label = 1:nrow(.))
Then, I am pulling out the columns that you want to use for your plots. I am naming them so that the returned list has the column names automatically assigned.
my_cols <-
names(dframe)[1:7] %>%
setNames(.,.)
Then, just set up a nested lapply to work through all of the pairwise comparisons:
plot_list <-
lapply(my_cols, function(col1){
lapply(my_cols, function(col2){
if(col1 == col2){
return(NULL)
}
ggplot(dframe) +
ggtitle(paste("Eigenvector Plot - Pairwise",
"correlation with","adjustment")) +
geom_point(aes_string(x = col1
, y = col2
, color = "Groups")) +
geom_segment(aes_string(xend = col1
, yend = col2
, color = "Groups")
, x = 0
, y = 0
, size = 1
, arrow = arrow(length = unit(0.3,"cm"))) +
geom_label_repel(aes_string(x = col1
, y = col2
, label = "Label")) +
xlab(paste0("Eigenvector ", col1)) +
ylab(paste0("Eigenvector ", col2)) +
theme(plot.title = element_text(hjust = 0.5),
axis.title = element_text(size = 13),
legend.text = element_text(size=12)) +
geom_hline(yintercept = 0, linetype="dashed") +
geom_vline(xintercept = 0, linetype="dashed")
})
})
Note that you did not include the colors that you wanted to use for the groups, so I left the defaults instead.
The plots come out correctly and this should be easier to work through.
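A likely reason the original loop misbehaves is that aes() stores expressions that are only evaluated when a plot is drawn, so every stored plot ends up reading the same value of i. Passing column names as strings sidesteps this. In recent ggplot2 versions aes_string is soft-deprecated, and the `.data[[name]]` pronoun is a possible alternative. The sketch below assumes a smaller made-up `dframe` with three columns; `make_plot` is a hypothetical helper.

```r
library(ggplot2)

# Made-up data: three "eigenvector" columns and a grouping column.
set.seed(1)
dframe <- data.frame(
  V1 = rnorm(10), V2 = rnorm(10), V3 = rnorm(10),
  Groups = factor(1:10)
)

# Every unordered pair of column names.
pairs <- combn(c("V1", "V2", "V3"), 2, simplify = FALSE)

make_plot <- function(col1, col2) {
  # Fix the column names now, so each plot keeps its own pair.
  force(col1)
  force(col2)
  ggplot(dframe, aes(x = .data[[col1]], y = .data[[col2]], color = Groups)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    labs(x = paste0("Eigenvector ", col1), y = paste0("Eigenvector ", col2))
}

plot_list <- lapply(pairs, function(p) make_plot(p[1], p[2]))
names(plot_list) <- vapply(pairs, paste, character(1), collapse = " & ")
names(plot_list)
```

Because each call to `make_plot()` has its own `col1` and `col2`, the plots no longer share a loop variable.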

Subset/filter in dplyr chain with ggplot2

I'd like to make a slopegraph, along the lines (no pun intended) of this. Ideally, I'd like to do it all in a dplyr-style chain, but I hit a snag when I try to subset the data to add specific geom_text labels. Here's a toy example:
# make tbl:
df <- tibble(
area = rep(c("Health", "Education"), 6),
sub_area = rep(c("Staff", "Projects", "Activities"), 4),
year = c(rep(2016, 6), rep(2017, 6)),
value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)
# plot:
df %>% filter(area == "Health") %>%
ggplot() +
geom_line(aes(x = as.factor(year), y = value,
group = sub_area, color = sub_area), size = 2) +
geom_point(aes(x = as.factor(year), y = value,
group = sub_area, color = sub_area), size = 2) +
theme_minimal(base_size = 18) +
geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"),
aes(x = as.factor(year), y = value,
color = sub_area, label = area), size = 6, hjust = 1)
But this gives me Error in filter_(.data, .dots = lazyeval::lazy_dots(...)) :
object '.' not found. Using subset instead of dplyr::filter gives me a similar error. What I've found on SO/Google is this question, which addresses a slightly different problem.
What is the correct way to subset the data in a chain like this?
Edit: My reprex is a simplified example, in the real work I have one long chain. Mike's comment below works for the first case, but not the second.
If you wrap the plotting code in {...}, you can use . to specify exactly where the previously calculated results are inserted:
library(tidyverse)
df <- tibble(
area = rep(c("Health", "Education"), 6),
sub_area = rep(c("Staff", "Projects", "Activities"), 4),
year = c(rep(2016, 6), rep(2017, 6)),
value = rep(c(15000, 12000, 18000), 4)
) %>% arrange(area)
df %>% filter(area == "Health") %>% {
ggplot(.) + # add . to specify to insert results here
geom_line(aes(x = as.factor(year), y = value,
group = sub_area, color = sub_area), size = 2) +
geom_point(aes(x = as.factor(year), y = value,
group = sub_area, color = sub_area), size = 2) +
theme_minimal(base_size = 18) +
geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"), # and here
aes(x = as.factor(year), y = value,
color = sub_area, label = area), size = 6, hjust = 1)
}
While that plot is probably not what you really want, at least it runs so you can edit it.
What's happening: Normally %>% passes the results of the left-hand side (LHS) to the first parameter of the right-hand side (RHS). However, if you wrap the RHS in braces, %>% will only pass the results in to wherever you explicitly put a .. This formulation is useful for nested sub-pipelines or otherwise complicated calls (like a ggplot chain) that can't otherwise be sorted out just by redirecting with a .. See help('%>%', 'magrittr') for more details and options.
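The difference between the two behaviours can be seen without any plotting. A minimal sketch using magrittr directly (the variable names are made up):

```r
library(magrittr)

# Without braces, the left-hand side is inserted as the first argument.
first_arg <- c(4, 9, 16) %>% sqrt()

# With braces, nothing is inserted automatically; `.` marks each place it is used.
range_of <- 1:3 %>% { c(min(.), max(.)) }

first_arg
range_of
```

In the plotting example, the braces play the same role: `.` is available both to `ggplot(.)` and to the nested `dplyr::filter(., ...)` call.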
Writing:
geom_text(data = df[df$year == 2016 & df$sub_area == "Activities",],...
instead of
geom_text(data = dplyr::filter(., year == 2016 & sub_area == "Activities"),...
makes it work but you still have issues about the position of the text (you should be able to easily find help on SO for that issue).
