R Markdown, output test results in loop - r

I'm looking for a nicely formated markdown output of test results that are produced within a for loop and structured with headings. For example
df <- data.frame(x = rnorm(1000),
y = rnorm(1000),
z = rnorm(1000))
for (v in c("y","z")) {
cat("##", v, " (model 0)\n")
summary(lm(x~1, df))
cat("##", v, " (model 1)\n")
summary(lm(as.formula(paste0("x~1+",v)), df))
}
whereas the output should be
y (model 0)
Call:
lm(formula = x ~ 1, data = df)
Residuals:
Min 1Q Median 3Q Max
-3.8663 -0.6969 -0.0465 0.6998 3.1648
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05267 0.03293 -1.6 0.11
Residual standard error: 1.041 on 999 degrees of freedom
y (model 1)
Call:
lm(formula = as.formula(paste0("x~1+", v)), data = df)
Residuals:
Min 1Q Median 3Q Max
-3.8686 -0.6915 -0.0447 0.6921 3.1504
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05374 0.03297 -1.630 0.103
y -0.02399 0.03189 -0.752 0.452
Residual standard error: 1.042 on 998 degrees of freedom
Multiple R-squared: 0.0005668, Adjusted R-squared: -0.0004346
F-statistic: 0.566 on 1 and 998 DF, p-value: 0.452
z (model 0)
and so on...
There are several results discussing parts of the question like here or here suggesting the asis-tag in combination with the cat-statement. This one includes headers.
Closest to me request seems to be this question from two years ago. However, even though highly appreciated, some of suggestions are deprecated like the asis_output or I can't get them to work in general conditions like the formattable suggestion (e.g. withlm-output). I just wonder -- as two years have past since then -- if there is a modern approach that facilitates what I'm looking for.

Solution Type 1
You could do a capture.output(cat(.)) approach with some lapply-looping. Send the output to a file and use rmarkdown::render(.).
This is the R code producing a *.pdf.
capture.output(cat("---
title: 'Test Results'
author: 'Tom & co.'
date: '11 10 2019'
output: pdf_document
---\n\n```{r setup, include=FALSE}\n
knitr::opts_chunk$set(echo = TRUE)\n
mtcars <- data.frame(mtcars)\n```\n"), file="_RMD/Tom.Rmd") # here of course your own data
lapply(seq(mtcars), function(i)
capture.output(cat("# Model", i, "\n\n```{r chunk", i, ", comment='', echo=FALSE}\n\
print(summary(lm(mpg ~ ", names(mtcars)[i] ,", mtcars)))\n```\n"),
file="_RMD/Tom.Rmd", append=TRUE))
rmarkdown::render("_RMD/Tom.Rmd")
Produces:
Solution Type 2
When we want to automate the output of multiple model summaries in the rmarkdown itself, we could chose between 1. selecting chunk option results='asis' which would produce code output but e.g. # Model 1 headlines, or 2. to choose not to select it, which would produce Model 1 but destroys the code formatting. The solution is to use the option and combine it with inline code that we can paste() together with another sapply()-loop within the sapply() for the models.
In the main sapply we apply #G.Grothendieck's venerable solution to nicely substitute the Call: line of the output using do.call("lm", list(.)). We need to wrap an invisible(.) around it to avoid the unnecessary sapply() output [[1]] [[2]]... of the empty lists produced.
I included a ". " into the cat(), because leading white space like ` this` will be rendered to this in lines 6 and 10 of the summary outputs.
This is the rmarkdown script producing a *pdf that can also be executed ordinary line by line:
---
title: "Test results"
author: "Tom & co."
date: "15 10 2019"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Overview
This is an example of an ordinary code block with output that had to be included.
```{r mtcars, fig.width=3, fig.height=3}
head(mtcars)
```
# Test results in detail
The test results follow fully automated in detail.
```{r mtcars2, echo=FALSE, message=FALSE, results="asis"}
invisible(sapply(tail(seq(mtcars), -2), function(i) {
fo <- reformulate(names(mtcars)[i], response="mpg")
s <- summary(do.call("lm", list(fo, quote(mtcars))))
cat("\n## Model", i - 2, "\n")
sapply(1:19, function(j)
cat(paste0("`", ". ", capture.output(s)[j]), "` \n"))
cat(" \n")
}))
```
***Note:*** This is a concluding remark to show that we still can do other stuff afterwards.
Produces:
(Note: Site 3 omitted)

Context
I was hit by the same need as that of OP when trying to generate multiple plots in a loop, but one of them would apparently crash the graphical device (because of unpredictable bad input) even when called using try() and prevent all the remaining figures from being generated. I needed really independent code blocks, like in the proposed solution.
Solution
I've thought of preprocessing the source file before it was passed to knitr, preferably inside R, and found that the jinjar package was a good candidate. It uses a dynamic template syntax based on the Jinja2 templating engine from Python/Django. There are no syntax clashes with document formats accepted by R Markdown, but the tricky part was integrating it nicely with its machinery.
My hackish solution was to create a wrapper rmarkdown::output_format() that executes some code inside the rmarkdown::render() call environment to process the source file:
preprocess_jinjar <- function(base_format) {
if (is.character(base_format)) {
base_format <- rmarkdown:::create_output_format_function(base_format)
}
function(...) {
# Find the markdown::render() environment.
callers <- sapply(sys.calls(), function(x) deparse(as.list(x)[[1]]))
target <- grep('^(rmarkdown::)?render$', callers)
target <- target[length(target)] # render may be called recursively
render_envir <- sys.frames()[[target]]
# Modify input with jinjar.
input_paths <- evalq(envir = render_envir, expr = {
original_knit_input <- sub('(\\.[[:alnum:]]+)$', '.jinjar\\1', knit_input)
file.rename(knit_input, original_knit_input)
input_lines <- jinjar::render(paste(input_lines, collapse = '\n'))
writeLines(input_lines, knit_input)
normalize_path(c(knit_input, original_knit_input))
})
# Add an on_exit hook to revert the modification.
rmarkdown::output_format(
knitr = NULL,
pandoc = NULL,
on_exit = function() file.rename(input_paths[2], input_paths[1]),
base_format = base_format(...),
)
}
}
Then I can call, for example:
rmarkdown::render('input.Rmd', output_format = preprocess_jinjar('html_document'))
Or, more programatically, with the output format specified in the source file metadata as usual:
html_jinjar <- preprocess_jinjar('html_document')
rmarkdown::render('input.Rmd')
Here is a minimal example for input.Rmd:
---
output:
html_jinjar:
toc: false
---
{% for n in [1, 2, 3] %}
# Section {{ n }}
```{r block-{{ n }}}
print({{ n }}**2)
```
{% endfor %}
Caveats
It's a hack. This code depends on the internal logic of markdown::render() and likely there are edge cases where it won't work. Use at your own risk.
For this solution to work, the output format contructor must be called by render(). Therefore, evaluating it before passing it to render() will fail:
render('input.Rmd', output_format = 'html_jinja') # works
render('input.Rmd', output_format = html_jinja) # works
render('input.Rmd', output_format = html_jinja()) # fails
This second limitation could be circumvented by putting the preprocessing code inside the pre_knit() hook, but then it would only run after other output format hooks, like intermediates_generator() and other pre_knit() hooks of the format.

Related

How does R Markdown automatically format print effects into dataframes? Or how can I access special print methods?

I'm working with the WRS2 package and there are cases where it'll output its analysis (bwtrim) into a list with a special class of the analysis type class = "bwtrim". I can't as.data.frame() it, but I found that there is a custom print method called print.bwtrim associated with it.
As an example let's say this is the output: bwtrim.out <- bwtrim(...). When I run the analysis output in an Rmarkdown chunk, it seems to "steal" part of the text output and make it into a dataframe.
So here's my question, how can I either access print.bwtrim or how does R markdown automatically format certain outputs into dataframes? Because I'd like to take this outputted dataframe and use it for other purposes.
Update: Here is a minimally working example -- put the following in a chunk in Rmd file."
```{r}
library(WRS2)
df <-
data.frame(
subject = rep(c(1:100), each = 2),
group = rep(c("treatment", "control"), each = 2),
timepoint = rep(c("pre", "post"), times = 2),
dv = rnorm(200, mean = 2)
)
analysis <- WRS2::bwtrim(dv ~ group * timepoint,
id = subject,
data = df,
tr = .2)
analysis
```
With this, a data.frame automatically shows up in the chunk afterwards and it shows all the values very nicely. My main question is how can I get this data.frame for my own uses. Because if you do str(analysis), you see that it's a list. If you do class(analysis) you get "bwtrim". if you do methods(class = "bwtrim"), you get the print method. And methods(print) will have a line that says print.bwtrim*. But I can't seem to figure out how to call print.bwtrim myself.
Regarding what Rmarkdown is doing, compare the following
If you run this in a chunk, it actually steals the data.frame part and puts it into a separate figure.
```{r}
capture.output(analysis)
```
However, if you run the same line in the console, the entire output comes out properly. What's also interesting is that if you try to assign it to another object, the output will be stolen before it can be assigned.
Compare x when you run the following in either a chunk or the console.
```{r}
x<-capture.output(analysis)
```
This is what I get from the chunk approach when I call x
[1] "Call:"
[2] "WRS2::bwtrim(formula = dv ~ group * timepoint, id = subject, "
[3] " data = df, tr = 0.2)"
[4] ""
[5] ""
This is what I get when I do it all in the console
[1] "Call:"
[2] "WRS2::bwtrim(formula = dv ~ group * timepoint, id = subject, "
[3] " data = df, tr = 0.2)"
[4] ""
[5] " value df1 df2 p.value"
[6] "group 1.0397 1 56.2774 0.3123"
[7] "timepoint 0.0001 1 57.8269 0.9904"
[8] "group:timepoint 0.5316 1 57.8269 0.4689"
[9] ""
My question is what can I call whatever Rstudio/Rmarkdown is doing to make data.frames, so that I can have an easy data.frame myself?
Update 2: This is probably not a bug, as discussed here https://github.com/rstudio/rmarkdown/issues/1150.
Update 3: You can access the method by using WRS2:::bwtrim(analysis), though I'm still interested in what Rmarkdown is doing.
Update 4: It might not be the case that Rmarkdown is stealing the output and automatically making dataframes from it, as you can see when you call x after you've already captured the output. Looking at WRS2:::print.bwtrim, it prints a dataframe that it creates, which I'm guessing Rmarkdown recognizes then formats it out.
See below for the print.bwtrim.
function (x, ...)
{
cat("Call:\n")
print(x$call)
cat("\n")
dfx <- data.frame(value = c(x$Qa, x$Qb, x$Qab), df1 = c(x$A.df[1],
x$B.df[1], x$AB.df[1]), df2 = c(x$A.df[2], x$B.df[2],
x$AB.df[2]), p.value = c(x$A.p.value, x$B.p.value, x$AB.p.value))
rownames(dfx) <- c(x$varnames[2], x$varnames[3], paste0(x$varnames[2],
":", x$varnames[3]))
dfx <- round(dfx, 4)
print(dfx)
cat("\n")
}
<bytecode: 0x000001f587dc6078>
<environment: namespace:WRS2>
In R Markdown documents, automatic printing is done by knitr::knit_print rather than print. I don't think there's a knit_print.bwtrim method defined, so it will use the default method, which is defined as
function (x, ..., inline = FALSE)
{
if (inline)
x
else normal_print(x)
}
and normal_print will call print().
You are asking why the output is different. I don't see that when I knit the document to html_document, but I do see it with html_notebook. I don't know the details of what is being done, but if you look at https://rmarkdown.rstudio.com/r_notebook_format.html you can see a discussion of "output source functions", which manipulate chunks to produce different output.
The fancy output you're seeing looks a lot like what knitr::knit_print does for a dataframe, so maybe html_notebook is substituting that in place of print.

Real data not found in R/exams

I am trying to develop an exam based on the results of a logit model fitted to a real data set. I try to load the data set, fit the model, and include some variables extracted from the model using the r varname syntax.
I first developed a small example using artificial data generated within the exercise. That worked fine and this is the corresponding Rmd file:
```{r data generation, echo = FALSE, results = "hide"}
library(tidyverse)
d <- tibble(y = rbinom(100, 1, 0.6), x1 = rnorm(100), x2=rnorm(100))
# randomize exams
nsize <- sample(50:150, 1)
sampled_dat <- sample(1:nrow(d), nsize, replace = TRUE)
fd <- d[sampled_dat, ]
fmodel <- glm(y ~ x1 + x2, data = fd, family = binomial("logit"))
```
Question
========
`r nrow(fd)`
```{r}
summary(fmodel)
```
Choose the correct answer.
Answerlist
----------
* sol1 `r nrow(fd)`
* sol2
Meta-information
================
exname: bdvDeviance
extype: schoice
exsolution: 10
exshuffle: TRUE
```
This worked as expected when launching
elearn_exam <- c("ess3.Rmd")
set.seed(1234567)
exams2nops(elearn_exam, n = 2, language = "en",
institution = "U", title = "Exam",
dir = "nops_pdf", name = "BDV", date = "2018-01-08", duplex = FALSE)
However, this is the analogous exercise loading a real data set:
```{r data generation, echo = FALSE, results = "hide"}
load("d.Rdata")
# randomize exams
nsize <- sample(180:250, 1)
sampled_dat <- sample(1:nrow(d), nsize, replace = TRUE)
fd <- d[sampled_dat, ]
logitModel <- glm(Adopted ~ CultArea + Trained + LabRice+ Education + ExtContact, data = fd, family=binomial("logit"))
```
Question
========
`r nrow(fd)`
Choose the correct answer.
Answerlist
----------
* When adding variables, the deviance did not change. The variables did not bring some useful information.
* sol2 `r nrow(fd)`
Meta-information
================
exname: bdvDeviance
extype: schoice
exsolution: 10
exshuffle: TRUE
```
This time, I get the following error:
> elearn_exam <- c("ess4.Rmd")
> set.seed(1234567)
> exams2nops(elearn_exam, n = 2, language = "en",
+ institution = "Uu", title = "Exam",
+ dir = "nops_pdf", name = "BDV_R", date = "2018-01-08", duplex = FALSE)
Quitting from lines 14-35 (ess4.Rmd)
Error in nrow(fd) : object 'fd' not found
I do not understand what the problem is in the second case. Apparently, the fd variable is not found when including it in r fd. The problem does not come from the regression because that works fine when knitting the Rmd file.
Your second example using the real data set just loads the corresponding data file via load("d.Rdata"), assuming that it is in the current working directory. However, when using any exams2xyz() interface, the exercises are processed in a temporary directory in order not to clutter the user's workspace. Hence, the d.Rdata file is not found in that directory and consequently cannot be loaded. And because of this problem, the fd object cannot be created and inserted. In short, the r fd code is working fine, the problem is loading the data.
To avoid this problem, you must either specify the full absolute path to your data file in load("/path/to/d.Rdata") or you need to copy the data to the temporary directory before loading it. For the latter, there is the convenience function include_supplement() that copies supplementary files to the temporary directory. By default, it takes them from the directory the exercise resides in. So you simply need to add:
include_supplement("d.Rdata")
before loading the data file. Note that when the file is not in the exercise directory itself but some sub-directory you can add the argument recursive = TRUE. Then sub-directories are searched recursively.

R interate over files and write results

Making my first attempt in R to iterate over data files to do an analysis and write output to a file. Here is the code I have:
# load packages igraph, dils, sna
sink('analysis-output.txt')
for (week in c("002","003","004","005","006")) {
cat("*** ",week,"\n")
obs <- read.table(paste(week,"obs.txt",sep="-"),sep="\t", header=FALSE)
per <- read.table(paste(week,"per.txt",sep="-"),sep="\t", header=FALSE)
n <- length(obs)
mper <- AdjacencyFromEdgelist(per, check.full = TRUE)
mobs <- AdjacencyFromEdgelist(obs, check.full = TRUE)
g<-array(c(mper[[1]],mobs[[1]]),c(n,n,2))
q<-qaptest(g,gcor,g1=1,g2=2)
summary(q)
}
sink()
When I run this, there are no errors, but analysis-output.txt contains only the output of the cat function, i.e.
*** 002
*** 003
*** 004
*** 005
*** 006
However, when I run the code interior to the loop by itself like so
sink('analysis-output.txt')
week = "002"
cat("*** ",week,"\n")
obs <- read.table(paste(week,"obs.txt",sep="-"),sep="\t", header=FALSE)
per <- read.table(paste(week,"per.txt",sep="-"),sep="\t", header=FALSE)
n <- length(obs)
mper <- AdjacencyFromEdgelist(per, check.full = TRUE)
mobs <- AdjacencyFromEdgelist(obs, check.full = TRUE)
g<-array(c(mper[[1]],mobs[[1]]),c(n,n,2))
q<-qaptest(g,gcor,g1=1,g2=2)
summary(q)
sink()
I do get the summary results in the output file:
*** 002
QAP Test Results
Estimated p-values:
p(f(perm) >= f(d)): 0.355
p(f(perm) <= f(d)): 0.656
Test Diagnostics:
Test Value (f(d)): -0.09045692
Replications: 1000
Distribution Summary:
Min: -0.207416
1stQ: -0.1971768
Med: -0.1720356
Mean: 0.01895663
3rdQ: 0.1185298
Max: 0.9982945
What am I doing wrong?
sink diverts the console output to a file, so the data needs to be output by R to appear in the file. I think what's missing is that, within a loop, you need to explicitly print the output of summary to get it to appear in the console (and thus be written by sink).
Compare the console output of these two loops:
for (i in 1:5) {
print(summary(cars))
}
Prints results to console
for (i in 1:5) {
summary(cars)
}
Does not print to console

I cannot figure out how to get R to recognize the Pander package

I am trying to get a R -> Docx workflow. I used the tutorial given here. The commands to setup your R system (which I used from the tutorial) are:
install.packages('pander')
library(knitr)
knit2html("example.rmd")
# installing/loading the package:
if(!require(installr)) { install.packages("installr"); require(installr)} #load / install+load installr
# Installing pandoc
install.pandoc()
FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))
The example file from the site (example.rmd) is:
Doc header 1
============
```{r set_knitr_chunk_options}
opts_chunk$set(echo=FALSE,message=FALSE,results = "asis") # important for making sure the output will be well formatted.
```
```{r load_pander_methods}
require(pander)
replace.print.methods <- function(PKG_name = "pander") {
PKG_methods <- as.character(methods(PKG_name))
print_methods <- gsub(PKG_name, "print", PKG_methods)
for(i in seq_along(PKG_methods)) {
f <- eval(parse(text=paste(PKG_name,":::", PKG_methods[i], sep = ""))) # the new function to use for print
assign(print_methods[i], f, ".GlobalEnv")
}
}
replace.print.methods()
## The following might work with some tweaks:
## print <- function (x, ...) UseMethod("pander")
```
Some text explaining the analysis we are doing
```{r}
summary(cars)# a summary table
fit <- lm(dist~speed, data = cars)
fit
plot(cars) # a plot
```
This creates a doc file as shown below (there is a graph at the end too):
Doc header 1
opts_chunk$set(echo = FALSE, message = FALSE, results = "asis") # important for making sure the output will be well formatted.
## Warning: there is no package called 'pander'
## Error: no function 'pander' is visible
Some text explaining the analysis we are doing speed dist
Min. : 4.0 Min. : 2
1st Qu.:12.0 1st Qu.: 26
Median :15.0 Median : 36
Mean :15.4 Mean : 43
3rd Qu.:19.0 3rd Qu.: 56
Max. :25.0 Max. :120
Call: lm(formula = dist ~ speed, data = cars)
Coefficients: (Intercept) speed
-17.58 3.93
![generated graph image][1]
Now how can I remove the errors in the generated doc file? I would like to resolve the errors if possible.
pander is not knitr.
You will need to install the pander package. (i.e. install.packages('pander')) just as you installed knitr.

Adding custom row(s) to an LaTeX-outputted R table of regression results using memisc, xtable etc.

It's common practice for tables of regression outcomes in academic papers to have a row(s) that describe some feature of the estimated model. For example, you might have a row name:
"Model included individual fixed effects" and then each associated cell will have a Yes/No as appropriate.
My question is whether it is possible in any of the many tools for making LaTeX tables with R (c.f., Tools for making latex tables in R) to pass the table-generating functions such a row To make this more concrete, I'm imagining having a parameter like:
model.info.row <- list(name = "Fixed effects", values = c("Y", "N", "Y"))
I've read through the memisc mtable and toLaTeX documentation and did not see anything that seems capable of this---not sure about other packages / approaches, but this seems like such a common use case that I suspect there is some way of doing this.
You might try to add that new line(s) directly to the table you want to pass to e.g. xtable. Really lame example:
Let us have some model:
m <- lm(mtcars$hp ~ mtcars$wt)
Grab the table which is returned in xtable and other helpers:
df <- as.data.frame(summary(m)$coefficient)
Add a new line with some values:
df[3, ] <- c(sample(c('foo', 'bar'), 4, replace = TRUE))
Update the rowname of your custom line:
rownames(df)[3] <- 'FOOBAR'
Check out results:
> df
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.82092177119464 32.3246158121787 -0.0563323561763288 0.95545056134944
mtcars$wt 46.1600502824445 9.62530003926982 4.79569988406785 4.14582744107531e-05
FOOBAR bar foo bar bar
Or just call xtable:
> xtable(df)
% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Tue Jun 12 01:39:46 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rllll}
\hline
& Estimate & Std. Error & t value & Pr($>$$|$t$|$) \\
\hline
(Intercept) & -1.82092177119464 & 32.3246158121787 & -0.0563323561763288 & 0.95545056134944 \\
mtcars\$wt & 46.1600502824445 & 9.62530003926982 & 4.79569988406785 & 4.14582744107531e-05 \\
FOOBAR & bar & foo & bar & bar \\
\hline
\end{tabular}
\end{center}
\end{table}
I ended up writing some hacky R code (note that it only works on a system w/ sed, wc and awk available) that's more flexible and works well the memisc 'mtable' function, which is my preferred way of generating LaTeX tables. Basically you write your table to a text file, then call this function with (1) the line number in the file where you want to make an insertion (2) the line you want to insert and (3) the name of the file with you want to make the insertions into (note that this function will overwrite your existing file). The code is:
insert.note <-function(linenumber, line, file){
num.lines <- as.numeric(system(paste("wc", file, "| awk '{print $1}'"), intern=TRUE))
tmp <- tempfile()
system(paste("head -n ", linenumber, file, "> ", tmp))
sink(tmp, append=TRUE)
cat(line)
sink()
system(paste("tail -n", num.lines - linenumber, file, ">>", tmp))
system(paste("mv", tmp, file))
}
As a helper function, this code creates a valid line of LaTeX using mtable's double & column spacing:
create.note <- function(l, include.row.end = TRUE){
n <- length(l)
s <- ""
i <- 1
for(note in l){
if(i < n){
cap <- "&&"
} else {
if(include.row.end){
cap <- "\\\\ \n "
} else {
cap <- " \n"
}
}
s <- paste(s, note, cap)
i <- i + 1
}
s
}
The include.row.end parameter is in case you want to pass it something like "\midrule" and don't want to get an extra line.

Resources