`bookdown`/`rmarkdown`/`knitr`: Non-code sequential processing howto?

Programmatically, my bookdown project proceeds as follows:
1. Reading in raw data - produces all kinds of stats.
2. Data preprocessing (logarithmization, normalization, imputation) - produces various plots for monitoring the population-level defects incurred.
3. PCA for analysis QC - produces plots for the PCA and the loadings-dominating data points.
4. Differential expression analysis - produces volcano plots and plots characterizing prominent differentially expressed features.
5. Overrepresentation analysis of the differentially expressed features from 4. in various biological ontology systems - produces example bar plots for enriched categories.
I have the analysis and narrative nicely integrated using bookdown, which enables efficient on-the-fly discarding of temporary (sizable) data sets/ggplot2 objects (pre-/post-transformation data, etc.).
HOWEVER: The target audience is mostly/only interested in 4. and 5., which leads me to aspire to the following structure:
4., 5., Appendix (1., 2., 3.)
Is there any way other than precomputing 1.-5. and then revisiting them in the targeted order? I would prefer to avoid accumulating all those ggplot2 objects in memory if at all possible.

You could do the following:
Split steps 1-3 and 4-5 into two separate *.Rmd files, say 123.Rmd and 45.Rmd.
Add a code chunk to the beginning of 45.Rmd that knits 123.Rmd to 123.md:
```{r knit123, include = FALSE}
knitr::knit("123.Rmd", output = "123.md")
```
This will generate the output of steps 1-3 in Markdown and make all the objects created thereby available to steps 4-5.
Add a code chunk to the end of 45.Rmd that reads 123.md and prints its content:
```{r include123, results = "asis"}
cat(readLines("123.md"), sep = "\n")
```
The results = "asis" setting prevents any further processing, as the content is already valid Markdown.
Knit 45.Rmd to whatever target format you want.
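Putting the pieces together, a minimal sketch of what 45.Rmd could look like (file names and chunk labels as above; the placeholder line stands in for your steps 4-5 narrative and chunks):
```{r knit123, include = FALSE}
# Run steps 1-3 first: their rendered Markdown is written to 123.md,
# and the objects they create remain available to the chunks below.
knitr::knit("123.Rmd", output = "123.md")
```
... narrative and code chunks for steps 4 and 5 ...
```{r include123, results = "asis"}
# Append the pre-rendered Markdown for steps 1-3 as the appendix.
cat(readLines("123.md"), sep = "\n")
```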
edit (1):
TL;DR: Instead of storing the objects from steps 1-3 in memory throughout steps 4-5 just to print them afterwards, print them first and store the results on disk.
edit (2):
Since you explicitly mentioned bookdown: I would not be surprised if there was a YAML option to include a Markdown file at the end of the knitting process (something like include-after: 123.md), but I don't know off the top of my head and I'm too lazy to look it up myself. ;-)

Related

Is there a knitr strat for passing R content to LaTeX commands?

I'm creating a small R package that will allow a user to create exams with R code for tables and figures, multiple question types, randomly ordered questions, and randomly ordered responses on multiple choice items. Inserting R code into LaTeX is not problematic, but one issue I've run into is the need to "slap" together text ingested via R with LaTeX commands. Consider this example:
I have this in a LaTeX file:
\newcommand{\question}[1]{
\begin{minipage}{\linewidth}
\item
{#1}
\end{minipage}
}
I read the content with readr::read_file and store it in a variable. I then have the contents of the questions in a .json file:
...
{
"type" : "mc",
"section_no" : 1,
"points" : 2,
"question" : "What is the best beer?",
"correct_answer" : "Hamm's",
"lure_1" : "Miller Lite",
"lure_2" : "PBR",
"lure_3" : "Naturdays",
"lure_4" : "Leine's"
},
...
which I read with jsonlite::fromJSON (which converts to a dataframe), do some massaging, and store in a variable. Let's call the questions and their available options questions. What I've been doing is putting the necessary LaTeX content together with the character string manually with
question.tex <- paste0("\\question{", question[i], "\\\\")
to achieve this in the knitted .tex file:
\question{What is the best beer?\\A. PBR\\B. Naturdays\\C. Miller Lite\\D. Leine's\\E. Hamm's\\}
but I'm thinking there has to be a better way to do this. I'm looking for a function that will allow for a more seamless passing of arguments to my LaTeX command, something like knitr::magic_func(latex.command, question[i]) to achieve the result above. Does this exist?
Maybe I am asking for an extra level of abstraction that knitr doesn't have (or wasn't designed to have)? Or perhaps there's a better way? I guess at this point I'm not far away from being able to create a function that reads the LaTeX command name, number of arguments, and inserts text appropriately, but better to not reinvent the wheel! Also, I think this question could be generalized to simpler commands like \title, \documentclass, etc.
Small MWE:
## (more backslashes since we need to escape in R)
tex.command <- "\\newcommand{\\question}[1]{
\\begin{minipage}{\\linewidth}
\\item
{#1}
\\end{minipage}
}"
q <- "What is the best beer?\\\\A. PBR\\\\B. Naturdays\\\\C. Miller Lite\\\\D. Leine's\\\\E. Hamm's\\\\"
## some magic function here?
magic_func(tex.command, q)
## desired result
"\\question{What is the best beer?\\\\A. PBR\\\\B. Naturdays\\\\C. Miller Lite\\\\D. Leine's\\\\E. Hamm's\\\\\\\\"

Can you make inline chunks in knitr evaluate last?

I am using knitr to write a manuscript.
I am using inline chunks to make sure the text matches my actual data.
For example "I performed regression on \rinline{nrow(df)} data points."
However, this information is needed in the abstract and other early parts of the text, while df is created by code that is next to the methods section that explains how it is created.
Can I force all inline chunks to evaluate last?
To be clear, here is a Markdown example.
Abstract
---------
My study is really interesting.
I performed regression on `r nrow(df)` data points.
Methods
--------
I used simulated data drawn from a normal distribution.
```{r data}
df <- data.frame(x = rnorm(10), y = rnorm(10))
```
The second sentence in the abstract should read "I performed regression on 10 data points."
In writing the MRE I discovered the answer.
If you knit the document in an R session, the variables will be saved to the global namespace.
Knitting the document a second time will fill in the inline chunks.
I didn't get any errors in the first knit, so you do need to check that the final document actually contains all the inline values.
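In script form, that two-pass approach looks roughly like this (manuscript.Rmd is a placeholder file name):
```r
## Knit twice from the same interactive session: the first pass creates df
## in the global environment, so the second pass can resolve the inline
## `r nrow(df)` in the abstract. "manuscript.Rmd" is a placeholder name.
library(knitr)
knit("manuscript.Rmd")  # first pass: objects end up in the workspace
knit("manuscript.Rmd")  # second pass: inline values are filled in
```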

R graphical output issues: "ghost" plots and malformed PDF files

As a part of my dissertation research, I wrote R code that performs a basic exploratory data analysis (EDA) of initial datasets. The code is supposed to output the results of EDA in three formats: 1) screen (RStudio Plots window); 2) SVG files (single file per plot); 3) PDF file (one file for all univariate EDA plots and another one for all multivariate EDA plots). It is still a work in progress in terms of covering all variables of interest and their relationships, but the basic infrastructure has already been designed and implemented (using ggplot2 and gridExtra packages). I am experiencing the following two issues with this:
1) When, during EDA, the code is supposed to display a current plot in the RStudio Plots window, the screen just blinks and no output is performed. The following is the code that generates and displays plots (screen and SVG only, see below for PDF output) [similar blocks of code are wrapped in functions, returning plot objects (g), which form a list, which, in turn, is passed to lapply() for iterating through all plots]:
df$var <- factor(df[[colName]])
title <- paste("Projects distribution across", colName, "range")
g <- ggplot(data=df, aes(x=var, fill=var)) +
geom_bar(stat="bin") +
scale_fill_discrete(colName) +
xlab(colName) +
ylab("Number of projects") +
ggtitle(label=title)
if (.Platform$GUI == "RStudio") {print(g); dev.off()}
edaFile <- str_replace_all(string=colName, pattern=" ", repl="")
edaFile <- paste0(EDA_RESULTS_DIR, "/", edaFile, ".svg")
suppressMessages(ggsave(file=edaFile, plot=g))
2) After attempting to open the PDF file with EDA results, it opens, but further attempts to navigate it (i.e., line or page scrolling) or otherwise work with it (i.e., change zoom level) result in the PDF reader program (Adobe Reader XI) hanging, with the message "... (Not Responding)" in its title bar. Sometimes, after quite a while, Adobe Reader returns to a responsive state, but only for a short period of time, until the next action sends it back into the hanging state. I noticed that it takes time for Adobe Reader to display one particular plot, specifically a Q-Q plot. Just wanted to mention this, as it might give additional insight. The following is the code that outputs saved plots to a PDF file (these are the same plots that were saved and displayed on the screen as well as output to SVG files, as described above):
edaFilePDF <- paste0(EDA_RESULTS_DIR, "/", "eda-univar.pdf")
mg <- do.call(marrangeGrob, c(allPlots, list(nrow=2, ncol = 1)));
suppressMessages(ggsave(filename=edaFilePDF, mg, width=8.5, height=11))
Your help is much appreciated! P.S. I use RStudio Server, so the output is browser-based.

How to join PDF files in R?

I have an R program that outputs a booklet of graphics as a PDF file onto the local server. There's a separate PDF file, an introduction piece, not written in R, that I would like to join my output to.
I can do this in Adobe, and R-bloggers describes the process here; both approaches involve joining the files by hand, as it were:
http://www.r-bloggers.com/splitting-and-combining-r-pdf-graphics/
But what I'd really prefer to do is just run my code and have the files join. I wasn't able to find similar posts while searching for "[R] Pdf" and "join", "merge", "import pdf", etc.
My intent is to run the code for a different ID number ("Physician") each time. The report will save as a PDF titled by ID number on the server, and the same addendum would be joined to each document.
Here's the current code creating the R report.
Physician<- 1
#creates handle for file name and location using ID
Jumanji<- paste ("X:\\Feedback_ID_", Physician, ".pdf", sep="")
#PDF graphics device on, using file handle
pdf(file=Jumanji,8.5, 11)
Several plots for this ID occur here and then the PDF is completed with dev.off().
dev.off()
I think I need to pull the outside document into R and reference it in between the opening and closing, but I haven't been successful here.
To do this in R, follow @cbeleites' suggestion (who, I think, is rightly suggesting you move your whole workflow to knitr) and do just this bit in Sweave/knitr. Knit the following to PDF, where "test.pdf" is the report you're appending to, and you'll get the result you want:
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
\includepdf{test.pdf} % your other document
<<echo=FALSE>>=
x <- rnorm(100)
hist(x)
# or whatever you need to do to get your plot
@
\end{document}
Also, the post you link to seems crazy because it's easy to combine plots into a single pdf in R (in fact it's the default option). Simply leave the pdf device open with its parameter onefile=TRUE (the default).
x <- rnorm(100)
y <- rnorm(100)
pdf("test.pdf")
hist(x)
hist(y)
dev.off()
Plots will automatically get paginated.
You can also consider something like:
library(qpdf)
path_PDF1 <- "C:/1.pdf"
path_PDF2 <- "C:/2.pdf"
pdf_combine(input = c(path_PDF1, path_PDF2), output = "C:/merged.pdf")
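Tying that back to the per-ID workflow in the question, a rough sketch along these lines (the introduction path and the ID values are placeholders) would join the same addendum to each generated report:
```r
library(qpdf)

## Placeholder path to the introduction piece that is not produced in R
intro <- "X:\\Introduction.pdf"

for (Physician in 1:3) {  # placeholder ID numbers
  report <- paste0("X:\\Feedback_ID_", Physician, ".pdf")
  merged <- paste0("X:\\Feedback_ID_", Physician, "_full.pdf")
  pdf_combine(input = c(intro, report), output = merged)
}
```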

Assign sage variable values into R objects via sagetex and Sweave

I am writing a short Sweave document that outputs into a Beamer presentation, in which I am using the sagetex package to solve an equation for two parameters in the beta binomial distribution, and I need to assign the parameter values into the R session so I can do additional processing on those values. The following code excerpt shows how I am interacting with sage:
<<echo=false,results=hide>>=
mean.raw <- c(5, 3.5, 2)
theta <- 0.5
var.raw <- mean.raw + ((mean.raw^2)/theta)
@
\begin{frame}[fragile]
\frametitle{Test of Sage 2}
\begin{sagesilent}
var('a1, b1, a2, b2, a3, b3')
eqn1 = [1000*a1/(a1+b1)==\Sexpr{mean.raw[1]}, ((1000*a1*b1)*(1000+a1+b1))/((a1+b1)^2*(a1+b1+1))==\Sexpr{var.raw[1]}]
eqn2 = [1000*a2/(a2+b2)==\Sexpr{mean.raw[2]}, ((1000*a2*b2)*(1000+a2+b2))/((a2+b2)^2*(a2+b2+1))==\Sexpr{var.raw[2]}]
eqn3 = [1000*a3/(a3+b3)==\Sexpr{mean.raw[3]}, ((1000*a3*b3)*(1000+a3+b3))/((a3+b3)^2*(a3+b3+1))==\Sexpr{var.raw[3]}]
s1 = solve(eqn1, a1,b1)
s2 = solve(eqn2, a2,b2)
s3 = solve(eqn3, a3,b3)
\end{sagesilent}
Solutions of Beta Binomial Parameters:
\begin{itemize}
\item $\sage{s1[0]}$
\item $\sage{s2[0]}$
\item $\sage{s3[0]}$
\end{itemize}
\end{frame}
Everything compiles just fine, and on that slide I am able to see the solutions for the three equations' respective parameters in the itemized list (for example, the first item in the itemized list from that beamer slide is output as [a1=(328/667), b1=(65272/667)]; I am not able to post an image of the beamer slide, but I hope you get the idea).
I would like to save the parameter values a1,b1,a2,b2,a3,b3 into R objects so that I can use them in simulations. I cannot find any documentation in the sagetex package on how to save output from sage commands into variables for use with other programs (in this case R). Any suggestions on how to get these values into R?
Wow, you are really mixing two worlds ;)
The only idea I can give you is the "solution_dict=True" parameter for the solve command. Then you get a Python dictionary, which might help you to output just the values. But I have no idea what exactly Sweave does, or which step of the process rewrites what and when.
In general, it might be better to write it in sagetex only and call R via the rpy2 Python wrapper. But that might be too much work for you - maybe do just that one slide that way and then stitch things together via some PDF merging?
