I am using R / Rmarkdown / knitr to generate multiple reports (pdfs) via render(), but the content / length of the reports will vary depending on certain characteristics of the underlying data.
As an example, let's say I have 10 different datasets of 50 variables each and I'm examining a correlation matrix of all 50 variables in the data. I want to produce a report for each dataset that has a new page for each variable pair that has a correlation that is greater than 0.5 and each variable pair that has a correlation that is less than -0.5. The number of correlations that will meet these thresholds will vary by dataset, and thus the report length / number of pages will vary by dataset.
I've learned to use {asis, echo = somecondition, eval = somecondition} to evaluate whether an entire section needs to be included (e.g., when there are no negative correlations less than -0.5). I have also seen solutions utilizing 'for' loops when there might be variable-length arguments across reports, but those solutions don't include printing each result on a new page. I'd also like to include section headers on each of the pages reporting the correlations as well.
The difficulty for me is that every solution I can think of requires nesting chunks of text and R code within one another. Below is some sample Rmd code showing how I am approaching the problem: I try to print a new histogram for each small dataset on a new page. I use "```" where three backticks would normally go, so as not to break the sample code formatting:
"```"{r, echo = FALSE}
datlist <- list(df1 = rnorm(100), df2 = rnorm(100), df3 = rnorm(100)) # fake data
"```"
Some Text Introducing the Report
"```"{'asis', eval = length(datlist) > 0} # evaluating if the section is to be included
"```"{r, echo = FALSE, eval = length(datlist) > 0}
for(i in 1:length(datlist)){ # starting the variable-length scope
"```"{'asis', eval = length(datlist) > 0} # the information to be included on each new page
\newpage
\section{`r (names(datlist[i]))`}
Here is a histogram of the data found in `r (names(datlist[i]))`.
`r hist(unlist(datlist[i]))`
"```"
} # closing the for loop above
"```"
"```"
Any help, including a solution using a completely different approach, is most welcome.
A correlation is always between two variables, so I am unsure whether this is what you want, but the following code will display the correlation for every pair of variables whose correlation is greater than 0.5 in absolute value.
---
title: "Untitled"
author: "Author"
date: "18 November 2019"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, echo = FALSE}
datlist <- data.frame(var1 = rnorm(100),
var2 = rnorm(100),
var3 = rnorm(100)) # fake data
# add some correlations
datlist$var4 <- datlist$var1*(rnorm(100,0,0.05)+1)
datlist$var5 <- datlist$var3*(rnorm(100,0,0.05)-1)
# get all correlations, there is probably an easier way of doing this...
corlist <- as.data.frame(t(combn(colnames(datlist),2)))
corlist$cor <- apply(corlist,1,function(x) {
cor(datlist[,x[1]],datlist[,x[2]])
})
```
Some Text Introducing the Report
```{r, results='asis', echo=F}
# invisible() keeps apply()'s NULL return value out of the 'asis' output
invisible(apply(corlist[abs(corlist$cor)>0.5,],1, function(x) {
cat('\n')
cat("# Correlation between ", x[1], " and ",x[2],"\n")
cat("The correlation between both variables was ", x[3], ".\n")
}))
```
Of course you can extend the content of the loop to do whatever you want with the variables.
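Since the question asks for each pair on its own page, here is a minimal sketch of one way to extend the loop. It assumes PDF output and a chunk with `results='asis'`; the names `dat`, `cm`, `idx` and `strong` are made up for this illustration, and the scatter plot is just one possible page content. It also takes the pairs directly from the correlation matrix instead of using `combn()`.

```r
set.seed(42)
dat <- data.frame(var1 = rnorm(100), var2 = rnorm(100), var3 = rnorm(100))
dat$var4 <- dat$var1 * (rnorm(100, 0, 0.05) + 1)  # strongly positive with var1
dat$var5 <- dat$var3 * (rnorm(100, 0, 0.05) - 1)  # strongly negative with var3

# Correlation matrix; the upper triangle keeps each pair exactly once
cm  <- cor(dat)
idx <- which(upper.tri(cm) & abs(cm) > 0.5, arr.ind = TRUE)
strong <- data.frame(v1  = rownames(cm)[idx[, "row"]],
                     v2  = colnames(cm)[idx[, "col"]],
                     cor = cm[idx],
                     stringsAsFactors = FALSE)

# One page per pair: page break, section header, sentence, plot
for (i in seq_len(nrow(strong))) {
  v1 <- strong$v1[i]
  v2 <- strong$v2[i]
  cat("\n\n\\newpage\n\n")
  cat("# Correlation between ", v1, " and ", v2, "\n\n", sep = "")
  cat("The correlation between both variables was ",
      round(strong$cor[i], 3), ".\n\n", sep = "")
  plot(dat[[v1]], dat[[v2]], xlab = v1, ylab = v2)
  cat("\n\n")
}
```

Because `seq_len(nrow(strong))` is empty when no pair passes the threshold, the loop simply emits nothing for such a dataset, so the report length follows the data without any extra `eval =` condition.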
Original solution from here
I'm new to R, and trying to work a side project to supplement my in-class learning.
How on earth do I reference simple vectors from one code chunk to another in knitr?
For example, if I have a user input 3 values, and want to use those values in a later chunk, what's the best way to do so?
EX:
```{R User Input, echo=FALSE, REPLACE=FALSE}
redoffensedicenumber <- as.numeric(readline(prompt="Red Dice : "))
blackoffensedicenumber <- as.numeric(readline(prompt="Black Dice : "))
whiteoffensedicenumber <- as.numeric(readline(prompt="White Dice : "))
```
```{R Create Storage Matrix, eval=FALSE, include=FALSE}
zero <- c(0)
storagematrix <- matrix (zero, nrow = 10000 , ncol = redoffensedicenumber + blackoffensedicenumber + whiteoffensedicenumber , byrow = TRUE)
```
I am unable to find a way to use R code inside an inline LaTeX equation in R Markdown. The goal is to avoid hard-coding the values of my variable 'values' in case they change.
Given:
values <- c(1.4, 2.5, 7, 9)
avg <- sum(values)/length(values)
avg
My current approach was to just copy and paste the values of my R variable into the LaTeX inline equation as such:
The average of $values$ is $\hat{v} = \frac{1.4 + 2.5 + 7 + 9}{4} = 4.975$
But this is cumbersome even with such a trivial example.
Using inline R code such as `r values[1]` does not work inside a LaTeX equation in R Markdown.
---
title: Inline LaTeX using \textsf{\textbf{R}} variables
output: pdf_document
---
```{r, echo=FALSE}
# set variables
set.seed(1)
values <- sample(10:100, sample(3:5, 1))/10  # draw 3 to 5 random values
lv <- length(values)
avg <- sum(values)/lv
```
\begin{center}
The average of $values$ is
$\hat{v} = \frac{`r paste(values, collapse=" + ")`}{`r lv`} = `r round(avg, 3)`$.
\end{center}
If you save that as an .Rmd file and render it, the equation is filled in with the current contents of the R variables.
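If the equation gets longer, another option is to build the whole LaTeX string in R and then drop it into the text with a single inline expression. The following is only a sketch of that idea; the variable name `eq` is made up for this illustration, and the values are the ones from the question.

```r
values <- c(1.4, 2.5, 7, 9)
avg <- mean(values)

# Assemble the complete inline equation as one string
eq <- sprintf("$\\hat{v} = \\frac{%s}{%d} = %s$",
              paste(values, collapse = " + "),  # numerator: 1.4 + 2.5 + 7 + 9
              length(values),                   # denominator: number of values
              format(round(avg, 3), nsmall = 3))
cat(eq, "\n")
```

In the document body you would then write the sentence as usual and put an inline R expression that evaluates `eq` where the equation should appear.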
I am using RStudio to write my R Markdown files. How can I remove the hashes (##) in the final HTML output file that are displayed before the code output?
As an example:
---
output: html_document
---
```{r}
head(cars)
```
You can include in your chunk options something like
comment=NA # to remove all hashes
or
comment='%' # to use a different character
More help on knitr available from here: http://yihui.name/knitr/options
If you are using R Markdown as you mentioned, your chunk could look like this:
```{r comment=NA}
summary(cars)
```
If you want to change this globally, you can include a chunk in your document:
```{r include=FALSE}
knitr::opts_chunk$set(comment = NA)
```
Just HTML
If your output is just HTML, you can make good use of the PRE or CODE HTML tag.
Example
```{r my_pre_example,echo=FALSE,include=TRUE,results='asis'}
knitr::opts_chunk$set(comment = NA)
cat('<pre>')
print(t.test(mtcars$mpg,mtcars$wt))
cat('</pre>')
```
HTML Result:
Welch Two Sample t-test
data: mtcars$mpg and mtcars$wt
t = 15.633, df = 32.633, p-value < 0.00000000000000022
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
14.67644 19.07031
sample estimates:
mean of x mean of y
20.09062 3.21725
Just PDF
If your output is PDF, then you may need a replacement function. Here is what I am using:
```r
library(stringr)  # provides str_replace_all()
tidyPrint <- function(data) {
content <- paste0(data,collapse = "\n\n")
content <- str_replace_all(content,"\\t"," ")
content <- str_replace_all(content,"\\ ","\\\\ ")
content <- str_replace_all(content,"\\$","\\\\$")
content <- str_replace_all(content,"\\*","\\\\*")
content <- str_replace_all(content,":",": ")
return(content)
}
```
Example
The code also needs to be a little different:
```{r my_pre_example,echo=FALSE,include=TRUE,results='asis'}
knitr::opts_chunk$set(comment = NA)
resultTTest <- capture.output(t.test(mtcars$mpg,mtcars$wt))
cat(tidyPrint(resultTTest))
```
PDF Result
PDF and HTML
If you really need the page to work for both PDF and HTML output, the last step of tidyPrint should be a little different.
```r
tidyPrint <- function(data) {
content <- paste0(data,collapse = "\n\n")
content <- str_replace_all(content,"\\t"," ")
content <- str_replace_all(content,"\\ ","\\\\ ")
content <- str_replace_all(content,"\\$","\\\\$")
content <- str_replace_all(content,"\\*","\\\\*")
content <- str_replace_all(content,":",": ")
return(paste("<code>",content,"</code>\n"))
}
```
Result
The PDF result is the same, and the HTML result is close to the previous, but with some extra border.
It is not perfect but maybe is good enough.
The minimal reproducible example (RE) below is my attempt to figure out how I can use knitr for generating complex dynamic documents, where "complex" refers not to the document's elements and their layout, but to the non-linear logic of the underlying R code chunks. While the provided RE and its results show that a solution based on this approach might work well, I would like to know: 1) is this a correct way of using knitr in such situations; 2) are there any optimizations that could improve the approach; 3) what alternative approaches could decrease the granularity of code chunks.
EDA source code (file "reEDA.R"):
## #knitr CleanEnv
rm(list = ls(all.names = TRUE))
## #knitr LoadPackages
library(psych)
library(ggplot2)
## #knitr PrepareData
set.seed(100) # for reproducibility
data(diamonds, package='ggplot2') # use built-in data
## #knitr PerformEDA
generatePlot <- function (df, colName) {
df <- df
df$var <- df[[colName]]
g <- ggplot(data.frame(df)) +
scale_fill_continuous("Density", low="#56B1F7", high="#132B43") +
scale_x_log10("Diamond Price [log10]") +
scale_y_continuous("Density") +
geom_histogram(aes(x = var, y = ..density..,
fill = ..density..),
binwidth = 0.01)
return (g)
}
performEDA <- function (data) {
d_var <- paste0("d_", deparse(substitute(data)))
assign(d_var, describe(data), envir = .GlobalEnv)
for (colName in names(data)) {
if (is.numeric(data[[colName]]) || is.factor(data[[colName]])) {
t_var <- paste0("t_", colName)
assign(t_var, summary(data[[colName]]), envir = .GlobalEnv)
g_var <- paste0("g_", colName)
assign(g_var, generatePlot(data, colName), envir = .GlobalEnv)
}
}
}
performEDA(diamonds)
EDA report R Markdown document (file "reEDA.Rmd"):
```{r KnitrSetup, echo=FALSE, include=FALSE}
library(knitr)
opts_knit$set(progress = TRUE, verbose = TRUE)
opts_chunk$set(
echo = FALSE,
include = FALSE,
tidy = FALSE,
warning = FALSE,
comment=NA
)
```
```{r ReadChunksEDA, cache=FALSE}
read_chunk('reEDA.R')
```
```{r CleanEnv}
```
```{r LoadPackages}
```
```{r PrepareData}
```
Narrative: Data description
```{r PerformEDA}
```
Narrative: Intro to EDA results
Let's look at summary descriptive statistics for our dataset
```{r DescriptiveDataset, include=TRUE}
print(d_diamonds)
```
Now, let's examine each variable of interest individually.
Variable Price is ... Descriptive statistics for 'Price':
```{r DescriptivePrice, include=TRUE}
print(t_price)
```
Finally, let's examine price distribution across the dataset visually:
```{r VisualPrice, include=TRUE, fig.align='center'}
print(g_price)
```
The result can be found here:
http://rpubs.com/abrpubs/eda1
I don't understand what's non-linear about this code; perhaps because the example (thanks for that by the way) is small enough to demonstrate the code but not large enough to demonstrate the concern.
In particular, I don't understand the reason for the performEDA function. Why not put that functionality into the markdown? It would seem to be simpler and clearer to read. (This is untested...)
Let's look at summary descriptive statistics for our dataset
```{r DescriptiveDataset, include=TRUE}
print(describe(diamonds))
```
Now, let's examine each variable of interest individually.
Variable Price is ... Descriptive statistics for 'Price':
```{r DescriptivePrice, include=TRUE}
print(summary(diamonds[["price"]]))
```
Finally, let's examine price distribution across the dataset visually:
```{r VisualPrice, include=TRUE, fig.align='center'}
print(generatePlot(diamonds, "price"))
```
It looked like you were going to show the plots for all the variables; are you perhaps looking to loop there?
Also, this wouldn't change the functionality, but it would be much more within the R idiom to have performEDA return a list with the things it had created, rather than assigning into the global environment. It took me a while to figure out what the code did as those new variables didn't seem to be defined anywhere.
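To make that last suggestion concrete, here is a rough sketch of a performEDA that returns a list instead of assigning into the global environment. It is not a drop-in replacement: to keep it runnable without extra packages it uses base `summary()` rather than `psych::describe()`, the `plotFun` argument is a hypothetical hook where something like your generatePlot could be passed in, and `mtcars` stands in for the real data.

```r
# Collect all results in one list and return it to the caller
performEDA <- function(data, plotFun = NULL) {
  # Keep the same columns as the original: numeric or factor
  keep <- vapply(data, function(col) is.numeric(col) || is.factor(col),
                 logical(1))
  cols <- names(data)[keep]

  # Plots are optional; plotFun(data, colName) is assumed to return a plot object
  plots <- NULL
  if (!is.null(plotFun)) {
    plots <- setNames(lapply(cols, function(colName) plotFun(data, colName)),
                      cols)
  }

  list(overview = summary(data),              # whole-dataset summary
       tables   = lapply(data[cols], summary), # one summary per column
       plots    = plots)
}

eda <- performEDA(mtcars)
print(eda$tables$mpg)
```

In the report, a chunk would then refer to `eda$tables$price` or `eda$plots$price`, so a reader can see where each object comes from, and looping over `names(eda$tables)` covers all variables.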
Is there a simple way (e.g., via a chunk option) to get a chunk's source code and the plot it produces side by side, as on page 8 (among others) of this document?
I tried using out.width="0.5\\textwidth", fig.align='right', which makes the plot correctly occupy only half the page and be aligned to the right, but the source code is displayed on top of it, which is the normal behaviour.
I would like to have it on the left side of the plot.
Thanks
Sample code:
<<someplot, out.width="0.5\\textwidth", fig.align='right'>>=
plot(1:10)
@
Well, this ended up being trickier than I'd expected.
On the LaTeX side, the adjustbox package gives you great control over alignment of side-by-side boxes, as nicely demonstrated in this excellent answer over on tex.stackexchange.com. So my general strategy was to wrap the formatted, tidied, colorized output of the indicated R chunk with LaTeX code that: (1) places it inside of an adjustbox environment; and (2) includes the chunk's graphical output in another adjustbox environment just to its right. To accomplish that, I needed to replace knitr's default chunk output hook with a customized one, defined in section (2) of the document's <<setup>>= chunk.
Section (1) of <<setup>>= defines a chunk hook that can be used to temporarily set any of R's global options (and in particular here, options("width")) on a per-chunk basis. See here for a question and answer that treat just that one piece of this setup.
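The save-and-restore idea behind that chunk hook can be seen in isolation in plain R. This is only an illustrative sketch, not part of the document below; the helper name `with_r_opts` is made up. It relies on the fact that `options()` returns the previous values of whatever it changes, which is what the hook stores in `ropts`.

```r
w0 <- getOption("width")  # remember the starting width for comparison

# Hypothetical helper: evaluate an expression under temporary options
with_r_opts <- function(opts, expr) {
  old <- options(opts)    # set new values, keep the old ones
  on.exit(options(old))   # put the old values back when the function exits
  force(expr)             # expr is evaluated here, under the new options
}

narrow <- with_r_opts(list(width = 45), capture.output(print(1:60)))
cat(narrow, sep = "\n")
```

The chunk hook does the same thing in two steps, setting the options when `before` is TRUE and restoring them afterwards, so that only the chunk carrying `r.opts` is affected.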
Finally, Section (3) defines a knitr "template", a bundle of several options that need to be set each time a side-by-side code-block and figure are to be produced. Once defined, it allows the user to trigger all of the required actions by simply typing opts.label="codefig" in a chunk's header.
\documentclass{article}
\usepackage{adjustbox} %% to align tops of minipages
\usepackage[margin=1in]{geometry} %% a bit more text per line
\begin{document}
<<setup, include=FALSE, cache=FALSE>>=
## These two settings control text width in codefig vs. usual code blocks
partWidth <- 45
fullWidth <- 80
options(width = fullWidth)
## (1) CHUNK HOOK FUNCTION
## First, to set R's textual output width on a per-chunk basis, we
## need to define a hook function which temporarily resets R's global
## options() settings, just for the current chunk
knit_hooks$set(r.opts=local({
ropts <- NA
function(before, options, envir) {
if (before) {
ropts <<- options(options$r.opts)
} else {
options(ropts)
}
}
}))
## (2) OUTPUT HOOK FUNCTION
## Define a custom output hook function. This function processes _all_
## evaluated chunks, but will return the same output as the usual one,
## UNLESS a 'codefig' argument appeared in the chunk's header. In that
## case, wrap the usual textual output in LaTeX code placing it in a
## narrower adjustbox environment and setting the graphics that it
## produced in another box beside it.
defaultChunkHook <- environment(knit_hooks[["get"]])$defaults$chunk
codefigChunkHook <- function (x, options) {
main <- defaultChunkHook(x, options)
before <-
"\\vspace{1em}\n
\\adjustbox{valign=t}{\n
\\begin{minipage}{.59\\linewidth}\n"
after <-
paste("\\end{minipage}}
\\hfill
\\adjustbox{valign=t}{",
paste0("\\includegraphics[width=.4\\linewidth]{figure/",
options[["label"]], "-1.pdf}}"), sep="\n")
## Was a codefig option supplied in chunk header?
## If so, wrap code block and graphical output with needed LaTeX code.
if (!is.null(options$codefig)) {
return(sprintf("%s %s %s", before, main, after))
} else {
return(main)
}
}
knit_hooks[["set"]](chunk = codefigChunkHook)
## (3) TEMPLATE
## codefig=TRUE is just one of several options needed for the
## side-by-side code block and a figure to come out right. Rather
## than typing out each of them in every single chunk header, we
## define a _template_ which bundles them all together. Then we can
## set all of those options simply by typing opts.label="codefig".
opts_template[["set"]](
codefig = list(codefig=TRUE, fig.show = "hide",
r.opts = list(width=partWidth),
tidy = TRUE,
tidy.opts = list(width.cutoff = partWidth)))
@
A chunk without \texttt{opts.label="codefig"} set...
<<A>>=
1:60
@
\texttt{opts.label="codefig"} \emph{is} set for this one
<<B, opts.label="codefig", fig.width=8, cache=FALSE>>=
library(raster)
library(RColorBrewer)
## Create a factor raster with a nice RAT (Rast. Attr. Table)
r <- raster(matrix(sample(1:10, 100, replace=TRUE), ncol=10, nrow=10))
r <- as.factor(r)
rat <- levels(r)[[1]]
rat[["landcover"]] <- as.character(1:10)
levels(r) <- rat
## To get a nice grid...
p <- as(r, "SpatialPolygonsDataFrame")
## Plot it
plot(r, col = brewer.pal("Set3", n=10),
legend = FALSE, axes = FALSE, box = FALSE)
plot(p, add = TRUE)
text(p, label = getValues(r))
@
\texttt{opts.label="codefig"} not set, and all settings back to ``normal''.
<<C>>=
lm(mpg ~ cyl + disp + hp + wt + gear, data=mtcars)
@
\end{document}
I see 3 possibilities:
1. For beamer presentations, I'd go for \begin{columns} ... \end{columns} as well.
2. If it is only one such plot: minipages.
3. Here I used a table (one column for the code and one for the result). (This example is "normal" Sweave.)
For all three, the chunk options would have include = FALSE, and the plot would be put in the right place "manually" with \includegraphics[]{}.
You can display the text in a 'textplot' from package PerformanceAnalytics or gplots.
(Small) downside: to my knowledge, syntax highlighting is not possible this way.
Sample Code:
```{r fig.width=8, fig.height=5, fig.keep = 'last', echo=FALSE}
suppressMessages(library(PerformanceAnalytics))
layout(t(1:2))
textplot('plot(1:10)')
plot(1:10)
```