Using knitr to produce complex dynamic documents - r

The minimal reproducible example (RE) below is my attempt to figure out how I can use knitr to generate complex dynamic documents, where "complex" refers not to the document's elements and their layout, but to the non-linear logic of the underlying R code chunks. While the RE and its results show that a solution based on this approach might work well, I would like to know: 1) is this a correct way of using knitr in such situations; 2) are there any optimizations that would improve the approach; 3) what alternative approaches could decrease the granularity of the code chunks?
EDA source code (file "reEDA.R"):
## @knitr CleanEnv
rm(list = ls(all.names = TRUE))
## @knitr LoadPackages
library(psych)
library(ggplot2)
## @knitr PrepareData
set.seed(100) # for reproducibility
data(diamonds, package='ggplot2') # use built-in data
## @knitr PerformEDA
generatePlot <- function (df, colName) {
  df <- df
  df$var <- df[[colName]]
  g <- ggplot(data.frame(df)) +
    scale_fill_continuous("Density", low="#56B1F7", high="#132B43") +
    scale_x_log10("Diamond Price [log10]") +
    scale_y_continuous("Density") +
    geom_histogram(aes(x = var, y = ..density..,
                       fill = ..density..),
                   binwidth = 0.01)
  return (g)
}
performEDA <- function (data) {
  d_var <- paste0("d_", deparse(substitute(data)))
  assign(d_var, describe(data), envir = .GlobalEnv)
  for (colName in names(data)) {
    if (is.numeric(data[[colName]]) || is.factor(data[[colName]])) {
      t_var <- paste0("t_", colName)
      assign(t_var, summary(data[[colName]]), envir = .GlobalEnv)
      g_var <- paste0("g_", colName)
      assign(g_var, generatePlot(data, colName), envir = .GlobalEnv)
    }
  }
}
performEDA(diamonds)
EDA report R Markdown document (file "reEDA.Rmd"):
```{r KnitrSetup, echo=FALSE, include=FALSE}
library(knitr)
opts_knit$set(progress = TRUE, verbose = TRUE)
opts_chunk$set(
echo = FALSE,
include = FALSE,
tidy = FALSE,
warning = FALSE,
comment=NA
)
```
```{r ReadChunksEDA, cache=FALSE}
read_chunk('reEDA.R')
```
```{r CleanEnv}
```
```{r LoadPackages}
```
```{r PrepareData}
```
Narrative: Data description
```{r PerformEDA}
```
Narrative: Intro to EDA results
Let's look at summary descriptive statistics for our dataset
```{r DescriptiveDataset, include=TRUE}
print(d_diamonds)
```
Now, let's examine each variable of interest individually.
Variable Price is ... Descriptive statistics for 'Price':
```{r DescriptivePrice, include=TRUE}
print(t_price)
```
Finally, let's examine price distribution across the dataset visually:
```{r VisualPrice, include=TRUE, fig.align='center'}
print(g_price)
```
The result can be found here:
http://rpubs.com/abrpubs/eda1

I don't understand what's non-linear about this code; perhaps because the example (thanks for that by the way) is small enough to demonstrate the code but not large enough to demonstrate the concern.
In particular, I don't understand the reason for the performEDA function. Why not put that functionality into the markdown? It would seem to be simpler and clearer to read. (This is untested...)
Let's look at summary descriptive statistics for our dataset
```{r DescriptiveDataset, include=TRUE}
print(describe(diamonds))
```
Now, let's examine each variable of interest individually.
Variable Price is ... Descriptive statistics for 'Price':
```{r DescriptivePrice, include=TRUE}
print(summary(diamonds[["price"]]))
```
Finally, let's examine price distribution across the dataset visually:
```{r VisualPrice, include=TRUE, fig.align='center'}
print(generatePlot(diamonds, "price"))
```
It looked like you were going to show the plots for all the variables; are you perhaps looking to loop there?
Also, this wouldn't change the functionality, but it would be much more within the R idiom to have performEDA return a list with the things it had created, rather than assigning into the global environment. It took me a while to figure out what the code did as those new variables didn't seem to be defined anywhere.
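A minimal sketch of the "return a list" suggestion, using only base R so it runs without psych or ggplot2. The function name `perform_eda`, the element names `overall` and `by_column`, and the use of the built-in `iris` data in place of `diamonds` are all assumptions made for illustration; they are not part of the original code.

```r
# Hypothetical variant of performEDA(): returns its results instead of
# assigning them into the global environment.
perform_eda <- function(data) {
  # keep only numeric and factor columns, as the original loop does
  keep <- vapply(data,
                 function(col) is.numeric(col) || is.factor(col),
                 logical(1))
  list(
    overall   = summary(data),              # whole-dataset summary
    by_column = lapply(data[keep], summary) # one summary per kept column
  )
}

eda <- perform_eda(iris)   # iris stands in for diamonds here
print(eda$by_column$Sepal.Length)
```

A report chunk would then print `eda$overall` or `eda$by_column[["some_column"]]`, which makes it visible where each object comes from.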

Related

How do I reference simple vectors from one code chunk to another in knitr?

I'm new to R, and trying to work a side project to supplement my in-class learning.
How on earth do I reference simple vectors from one code chunk to another in knitr?
For example, if I have a user input 3 values, and want to use those values in a later chunk, what's the best way to do so?
EX:
```{R User Input, echo=FALSE, REPLACE=FALSE}
redoffensedicenumber <- as.numeric(readline(prompt="Red Dice : "))
blackoffensedicenumber <- as.numeric(readline(prompt="Black Dice : "))
whiteoffensedicenumber <- as.numeric(readline(prompt="White Dice : "))
```
```{R Create Storage Matrix, eval=FALSE, include=FALSE}
zero <- c(0)
storagematrix <- matrix (zero, nrow = 10000 , ncol = redoffensedicenumber + blackoffensedicenumber + whiteoffensedicenumber , byrow = TRUE)
```
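A hedged sketch rather than an answer from the original thread: knitr evaluates all chunks of a document in one shared environment, so an object created in an evaluated chunk is visible in later chunks. Two details in the example above may get in the way. The second chunk sets `eval=FALSE`, so it is never run, and `readline()` returns an empty string when R is not interactive (as during knitting), which `as.numeric()` turns into `NA`. The hard-coded values below are hypothetical stand-ins for the user input.

```r
# Stand-ins for the three user inputs (hypothetical values;
# readline() cannot prompt while a document is being knitted)
redoffensedicenumber   <- 2
blackoffensedicenumber <- 1
whiteoffensedicenumber <- 3

# A later chunk can use these objects directly, provided it is evaluated
total_dice <- redoffensedicenumber + blackoffensedicenumber + whiteoffensedicenumber
storagematrix <- matrix(0, nrow = 10000, ncol = total_dice)
dim(storagematrix)
```

For knitted documents, values like these are usually supplied up front rather than prompted for, for example through R Markdown's `params` mechanism.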

How can I make a variable-length report using Rmarkdown?

I am using R / Rmarkdown / knitr to generate multiple reports (pdfs) via render(), but the content / length of the reports will vary depending on certain characteristics of the underlying data.
As an example, let's say I have 10 different datasets of 50 variables each and I'm examining a correlation matrix of all 50 variables in the data. I want to produce a report for each dataset that has a new page for each variable pair that has a correlation that is greater than 0.5 and each variable pair that has a correlation that is less than -0.5. The number of correlations that will meet these thresholds will vary by dataset, and thus the report length / number of pages will vary by dataset.
I've learned to use {asis, echo = somecondition, eval = somecondition} to evaluate whether an entire section needs to be included (e.g., when there are no negative correlations less than -0.5). I have also seen solutions utilizing 'for' loops when there might be variable-length arguments across reports, but those solutions don't include printing each result on a new page. I'd also like to include section headers on each of the pages reporting the correlations as well.
The difficulty for me is that any solution I can think of requires nesting chunks of text and r code within one another. For some sample Rmd code of how I am approaching the problem, I've tried to print a new histogram for each small dataset on a new page, using "```" to denote where three ticks would usually be as to not mess up the sample code formatting:
"```"{r, echo = FALSE}
datlist <- list(df1 = rnorm(100), df2 = rnorm(100), df3 = rnorm(100)) # fake data
"```"
Some Text Introducing the Report
"```"{'asis', eval = length(datlist) > 0} # evaluating if the section is to be included
"```"{r, echo = FALSE, eval = length(datlist) > 0}
for(i in 1:length(datlist)){ # starting the variable-length scope
"```"{'asis', eval = length(datlist) > 0} # the information to be included on each new page
\newpage
\section{`r (names(datlist[i]))`}
Here is a histogram of the data found in `r (names(datlist[i]))`.
`r hist(unlist(datlist[i]))`
"```"
} # closing the for loop above
"```"
"```"
Any help, including a solution using a completely different approach, is most welcome.
A correlation is always between two variables, so I am unsure whether this is what you want, but the following code will display the correlation for every pair of variables whose correlation is greater than 0.5 in absolute value.
---
title: "Untitled"
author: "Author"
date: "18 November 2019"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, echo = FALSE}
datlist <- data.frame(var1 = rnorm(100),
                      var2 = rnorm(100),
                      var3 = rnorm(100)) # fake data
# add some correlations
datlist$var4 <- datlist$var1*(rnorm(100,0,0.05)+1)
datlist$var5 <- datlist$var3*(rnorm(100,0,0.05)-1)
# get all correlations, there is probably an easier way of doing this...
corlist <- as.data.frame(t(combn(colnames(datlist),2)))
corlist$cor <- apply(corlist,1,function(x) {
  cor(datlist[,x[1]],datlist[,x[2]])
})
```
Some Text Introducing the Report
```{r, results='asis', echo=F}
# invisible() keeps apply()'s NULL return value out of the rendered output
invisible(apply(corlist[abs(corlist$cor)>0.5,],1, function(x) {
  cat('\n')
  cat("# Correlation between ", x[1], " and ",x[2],"\n")
  cat("The correlation between both variables was ", x[3], ".\n")
}))
```
Of course you can extend the content of the loop to do whatever you want with the variables.
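As one possible extension, here is a sketch of a loop body that also starts a new page and draws a plot for each qualifying pair. It is meant as the body of a chunk declared with `results='asis'` in a PDF document; the data frame `dat`, its columns, and the seed are made up for illustration.

```r
# Body of a chunk declared with results='asis' (PDF output assumed).
set.seed(1)
dat <- data.frame(var1 = rnorm(100), var2 = rnorm(100))
dat$var3 <- dat$var1 + rnorm(100, sd = 0.1)    # strongly correlated with var1

pairs <- t(combn(names(dat), 2))                # one row per variable pair
for (k in seq_len(nrow(pairs))) {
  a <- pairs[k, 1]
  b <- pairs[k, 2]
  r <- cor(dat[[a]], dat[[b]])
  if (abs(r) <= 0.5) next                       # skip weak correlations
  cat("\n\\newpage\n\n")                        # raw LaTeX page break
  cat(sprintf("# Correlation between %s and %s\n\n", a, b))
  cat(sprintf("The correlation is %.2f.\n\n", r))
  plot(dat[[a]], dat[[b]], xlab = a, ylab = b)  # one scatterplot per pair
  cat("\n\n")
}
```

Because the header, text, and plot are all emitted from inside the loop, the number of pages follows the number of pairs that pass the threshold.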
Original solution from here

How to show significance stars in R Markdown (rmarkdown) html output notes?

I want to show regression outputs in HTML documents using R Markdown. I tried the texreg and stargazer packages. My problem is that I can't get the significance stars to show up in the notes. Because the notes are generated automatically, it seems I can't escape them. I've been puzzling over this and this, but with no success. What am I missing? Thanks a lot!!
Here's some code:
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r data}
library(car)
lm1 <- lm(prestige ~ income + education, data=Duncan)
```
## with STARGAZER
```{r table1, results = "asis", message=FALSE}
library(stargazer)
stargazer(lm1, type="html", notes="stargazer html 1") # nothing
stargazer(lm1, type="html", notes="stargazer html 2", star.char = "\\*") # nothing, even gone in table
```
## with TEXREG
```{r table2, results = "asis", message=FALSE}
library(texreg)
htmlreg(lm1, custom.note="%stars. htmlreg") # nothing
htmlreg(lm1, custom.note="%stars. htmlreg", star.symbol = "\\*") # still nothing!
```
Note: This question was formerly a sub-question, which I have now split off.
Use the HTML entity for the asterisk:
star.symbol='&#42;'
See http://www.ascii.cl/htmlcodes.htm.
You could also add the "legend" manually:
stargazer(lm1, type="html", notes = "<em>&#42;p&lt;0.1;&#42;&#42;p&lt;0.05;&#42;&#42;&#42;p&lt;0.01</em>", notes.append = F)
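A small base-R sketch of the entity idea: a literal asterisk in `results = "asis"` output is liable to be read as Markdown emphasis, whereas the numeric entity `&#42;` is displayed as a star by the browser. The helper name `html_note` is hypothetical and not part of stargazer or texreg.

```r
# Hypothetical helper: entity-encode the characters that cause trouble
# in a hand-written HTML table note.
html_note <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)   # must run first
  x <- gsub("<", "&lt;",  x, fixed = TRUE)
  gsub("*", "&#42;", x, fixed = TRUE)
}

note <- html_note("*p<0.1; **p<0.05; ***p<0.01")
print(note)
```

The resulting string could then be passed as the `notes` argument of `stargazer()` or the `custom.note` argument of `htmlreg()`.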

How do I parameterize template blocks in knitr?

Say I have the following code in knitr. How can I run it multiple times with different values of i?
```{r, echo=FALSE}
i<-0.1
```
### X,Y plot of Y=X+e where e is a standard normal distro: mean=0, sd=`r i`
```{r, echo=FALSE}
r<-rnorm(100,mean=0,sd=i)
x<-seq(0,1,length.out=100)
y<-x+r
plot(x,y)
```
EDIT:
As has been suggested ... I tried to do something like this: start a loop in an R code block, put a template in between, and then close the loop. R throws an error.
```{r, echo=FALSE}
for (i in 1:4) {
```
# bla
```{r, echo=FALSE}
}
```
What makes this question tricky is that not only the chunk content (the plot) must be repeated, but the heading as well. That's why we can neither simply reuse the chunk nor just loop over the plot command like
for (i in 1:3) { plot(rnorm(100, sd = i)) }
But it's almost that simple: We loop over the code that produces the plot and output the heading from inside the loop. This requires the chunk option results="asis" and cat to get verbatim markdown output:
```{r, echo=FALSE, results = "asis"}
sdVec <- c(0.1, 0.2, 0.3)
for (sd in sdVec) {
  cat(sprintf("\n### X,Y plot of Y=X+e where e ~ N(0, %s)", sd))
  r <- rnorm(100, mean = 0, sd = sd)
  x <- seq(0, 1, length.out = 100)
  y <- x + r
  plot(x, y)
}
```
See this answer for related issues.
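Another route, sketched here under assumptions, is to keep the heading and chunk as a text template and fill it in once per value with `knitr::knit_expand()`, which substitutes expressions wrapped in `{{ }}`. The placeholder name `s` and the template text are made up for illustration, and the code fences are built with `strrep()` only to avoid nesting literal fences in this example.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, built programmatically
template <- paste(
  "### X,Y plot of Y=X+e where e ~ N(0, {{s}})",
  "",
  paste0(fence, "{r, echo=FALSE}"),
  "x <- seq(0, 1, length.out = 100)",
  "plot(x, x + rnorm(100, mean = 0, sd = {{s}}))",
  fence,
  sep = "\n"
)

# One expanded copy of the template per value of s
src <- vapply(c(0.1, 0.2, 0.3),
              function(s) knit_expand(text = template, s = s),
              character(1))

# Inside a chunk with results='asis', the expanded source could then be knitted:
# cat(knit_child(text = src, quiet = TRUE), sep = "\n")
```

Compared with the `cat()` loop, this keeps the repeated block readable as ordinary R Markdown, at the cost of an extra knitting step.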

Print RMarkdown captions from a loop

I am creating a series of plots from within a loop in an RMarkdown document, then knitting this to a PDF. I can do this without any problem, but I would like the caption to reflect the change between each plot. A MWE is shown below:
---
title: "Caption loop"
output: pdf_document
---
```{r, echo=FALSE}
library(tidyverse)
p <-
  map(names(mtcars), ~ggplot(mtcars) +
        geom_point(aes_string(x = 'mpg', y = .))) %>%
  set_names(names(mtcars))
```
```{r loops, fig.cap=paste(for(i in seq_along(p)) print(names(p)[[i]])), echo=FALSE}
for(i in seq_along(p)) p[[i]] %>% print
```
I have made a first attempt at capturing the plots and storing in a variable p, and trying to use that to generate the captions, but this isn't working. I haven't found too much about this on SO, despite this surely being something many people would need to do. I did find this question, but it looks so complicated that I was wondering if there is a clear and simple solution that I am missing.
I wondered if it has something to do with eval.after, as with this question, but that does not involve plots generated within a loop.
many thanks for your help!
It seems that knitr is smart enough to do the task automatically. By adding names(mtcars) to the figure caption, knitr iterates through these in turn to produce the correct caption. The only problem now is how to stop all of the list indexes from printing in the document...
---
title: "Caption loop"
output: pdf_document
---
```{r loops, fig.cap=paste("Graph of mpg vs.", names(mtcars)), message=FALSE, echo=FALSE, warning=FALSE}
library(tidyverse)
map(
  names(mtcars),
  ~ ggplot(mtcars) +
    geom_point(aes_string(x = 'mpg', y = .))
)
```
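A quick way to see what the `fig.cap` expression above hands to knitr: `paste()` is vectorised, so it yields one caption per column of `mtcars`, in the same order in which the plots are produced. This is a base-R check only; the variable name `captions` is an assumption.

```r
# The same expression used in fig.cap, evaluated on its own
captions <- paste("Graph of mpg vs.", names(mtcars))
length(captions)    # one caption per column of mtcars
head(captions, 3)
```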
In case this is useful to somebody, here is an adaptation of Jonny's solution that produces captions without printing list indices. This is achieved by using purrr::walk instead of purrr::map. Also included are a LaTeX figure label and text that references each plot.
---
title: "Loop figures with captions"
output:
  pdf_document
---
```{r loops, fig.cap=paste(sprintf("\\label{%s}", names(mtcars)), "Graph of mpg vs.", names(mtcars)),results='asis', message=FALSE, echo=FALSE, warning=FALSE}
library(tidyverse)
library(stringi)
walk(names(mtcars),
     ~{
       p <- ggplot(mtcars) +
         geom_point(aes_string(x = 'mpg', y = .))
       # print plot
       cat('\n\n')
       print(p)
       # print text with reference to plot
       cat('\n\n')
       cat(sprintf("Figure \\ref{%s} is a Graph of mpg vs. %s. %s \n\n", ., .,
                   stri_rand_lipsum(1)))
       cat("\\clearpage")
     })
```

Resources