RMarkdown: Statamarkdown produces undesired output when collectcode=TRUE

I'm using Statamarkdown to produce HTML documents using RMarkdown and Stata.
As documented here, each code chunk is executed as a separate Stata session. collectcode=TRUE is a chunk option to collect Stata code across chunks.
While this works neatly, the output of the second chunk (and of any later chunk) following the first one with collectcode=TRUE contains an undesired echo at the top:
Running .......\profile.do
For instance, when running a second chunk with {stata stata2, echo = T,collectcode=TRUE}
reg mpg price i.foreign , noheader
yields this output:
reg mpg price i.foreign , noheader
Running C:\Cloud\Methods\prog\profile.do

. reg mpg price i.foreign , noheader
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |   -.000959   .0001815    -5.28   0.000     -.001321    -.000597
             |
     foreign |
    Foreign  |   5.245271   1.163592     4.51   0.000     2.925135    7.565407
       _cons |   25.65058   1.271581    20.17   0.000     23.11512    28.18605
------------------------------------------------------------------------------
Here's my RMarkdown reprex:
---
title: "Statamarkdown output problem"
output: html_document
---
```{r setup, include = F}
library(Statamarkdown)
```
First chunk is clean:
```{stata stata1,collectcode=TRUE}
sysuse auto
su mpg price
```
Second Stata Output contains undesired `Running .......\profile.do` output:
```{stata stata2, echo = T,collectcode=TRUE}
reg mpg price i.foreign , noheader
```
Problem persists even in chunks with `collectcode=FALSE`:
```{stata new_data, echo = T,collectcode=F}
webuse bpwide, clear
su sex agegrp
```
`cleanlog = F` does not do the trick:
```{stata new_data2, echo = T,collectcode=F, cleanlog = FALSE}
webuse bpwide, clear
su sex agegrp
```
Avoiding collectcode=T altogether, i.e. loading and preparing the data in each chunk, would of course be a workaround, but an extremely tedious one.
I'm using R 3.6.3 and Stata 16.1 on a Windows machine.
Any ideas are very much appreciated!

It turns out that Stata changed its startup message from
running .......\profile.do
to
Running .......\profile.do
A new version of the Statamarkdown package (0.5.0) now accommodates this.
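To illustrate why a change in capitalisation is enough to break log cleaning, here is a minimal sketch of pattern-based filtering. It is not Statamarkdown's actual code: the helper name strip_profile_line is hypothetical, and the only assumption is that the startup line begins with "running" or "Running" and ends in profile.do.

```r
# Hypothetical helper: drop Stata's profile.do startup line from a log,
# regardless of whether it starts with "running" or "Running".
strip_profile_line <- function(log_lines) {
  is_profile <- grepl(
    "^[[:space:]]*running .*profile\\.do[[:space:]]*$",
    log_lines,
    ignore.case = TRUE
  )
  log_lines[!is_profile]
}

log_lines <- c(
  "Running C:\\Cloud\\Methods\\prog\\profile.do",
  ". reg mpg price i.foreign , noheader"
)
cleaned <- strip_profile_line(log_lines)
print(cleaned)
```

Matching case-insensitively means the filter keeps working whichever spelling the installed Stata version prints.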

Related

Why is LaTeX failing to compile in RMarkdown

My code should be a pretty easy knit to a pdf, but it will not compile and I'm getting this message in R Markdown:
! LaTeX Error: Unicode character ₁ (U+2081)
not set up for use with LaTeX.
Error: LaTeX failed to compile L-work-5.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See L-work-5.log for more info.
Execution halted
here is the code:
---
title: "work 5"
author: "PLars"
date: "4/2/2022"
output: pdf_document
fonttheme: professionalfonts
fontsize: 12pt
editor_options:
  markdown:
    wrap: 72
---
```{r, echo = FALSE, results = "hide", message = FALSE, purl = FALSE}
library(knitr)
opts_chunk$set(tidy = FALSE,
fig.align = "left",
background = '#a6a6a6',
fig.width = 10,
fig.height = 10,
out.width ="\\linewidth",
out.height = "\\linewidth",
message = FALSE,
warning = FALSE,
fig.align = "left"
)
options(width = 55, digits = 3)
library(scales)
percent <- function(x, digits = 2, format = "f", ...) {
paste0(formatC(100 * x, format = format, digits = digits, ...), "%")
}
library(haven)
library(tinytex)
library(stargazer)
library(tidyverse)
library(texreg)
library(dplyr)
library(texreg)
library(AER)
library(tidyverse)
```
**Part I - Categorical Models (5 points)**
Say that you estimate an ordered logit model with a three category
dependent variable and two independent variables, X~₁i~ and X~₂i~, and
obtain the following results:
```{=tex}
\begin{center}
\begin{tabular}{c|rc}
\hline \hline
& $\hat{\beta}$ & SE \\
\hline
$X_{1}$ & $-0.68$ & $(0.23)$ \\
$X_{2}$ & $-0.47$ & $(0.13)$ \\
\hline
$\tau_1$ & $-1.02$ & $(0.46)$ \\
$\tau_2$ & $.85$ & $(0.21)$ \\
\hline
\end{tabular}
\end{center}
```
```{=tex}
\begin{enumerate}
\item Calculate $\Pr(Y_i=1 | X_{1i}=1, X_{2i}=0)$.
\item Calculate $\Pr(Y_i=2 | X_{1i}=1, X_{2i}=0)$.
\item Calculate $\Pr(Y_i=3 | X_{1i}=1, X_{2i}=0)$.
\item Calculate the first difference (the difference in the probability of each category) that results from changing $X_{2i}$ from $-2$ to $2$, holding $X_{1i}$ fixed at 0. Do the calculations for each possible value of $Y_i$.
\item Explain how we might assess whether the parallel regression assumption holds for this model? If it does not, what alternative might you pursue if this were your model?
\end{enumerate}
```
#
First, we calculate $X_i \beta$
```{r}
(xiB <- (-0.68*1) + (-0.47*0))
```
Then, plug into the following equations:
```{r}
(prob1 <- 1/(1 + exp(-(-1.02-xiB))))
(prob2 <- 1/(1 + exp(-(.85-xiB))) - prob1)
(prob3 <- 1 - (1/(1 + exp(-(.85-xiB)))))
prob1 + prob2 + prob3
```
For a start, try deleting the special characters ₁ and ₂ in the line
two independent variables, X~₁i~ and X~₂i~
This will let you compile.
You might be able to get this to work by loading the newunicodechar LaTeX package and including something like
\newunicodechar{₁}{\ensuremath{{}_1}}
and similarly for the subscript-2 character, at the top of your file (from this TeX Stack Exchange question), but I haven't tested it and don't want to go down that rabbit hole right now ...
Or just change the relevant text to
two independent variables, $X_{1i}$ and $X_{2i}$
which will probably typeset it as originally intended!
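If it is unclear where such characters hide in a longer document, a small sketch like the following can list the offending lines. The helper name find_non_ascii is hypothetical and the two sample lines are taken from the question; it assumes the text is (or can be converted to) UTF-8.

```r
# Hypothetical helper: return the indices of lines that contain
# characters outside the ASCII range (code points above 127).
find_non_ascii <- function(lines) {
  lines <- enc2utf8(lines)
  flagged <- vapply(
    lines,
    function(s) any(utf8ToInt(s) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  which(flagged)
}

doc_lines <- c(
  "dependent variable and two independent variables, X~\u2081i~ and X~\u2082i~, and",
  "obtain the following results:"
)
hits <- find_non_ascii(doc_lines)
print(hits)
```

For a real file, the lines could be read with readLines("L-work-5.Rmd", encoding = "UTF-8") (the file name here is a guess based on the error message). Base R also ships tools::showNonASCIIfile() for the same purpose.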

R Markdown, output test results in loop

I'm looking for a nicely formatted markdown output of test results that are produced within a for loop and structured with headings. For example
df <- data.frame(x = rnorm(1000),
y = rnorm(1000),
z = rnorm(1000))
for (v in c("y","z")) {
cat("##", v, " (model 0)\n")
summary(lm(x~1, df))
cat("##", v, " (model 1)\n")
summary(lm(as.formula(paste0("x~1+",v)), df))
}
whereas the output should be
y (model 0)

Call:
lm(formula = x ~ 1, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-3.8663 -0.6969 -0.0465  0.6998  3.1648

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05267    0.03293    -1.6     0.11

Residual standard error: 1.041 on 999 degrees of freedom

y (model 1)

Call:
lm(formula = as.formula(paste0("x~1+", v)), data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-3.8686 -0.6915 -0.0447  0.6921  3.1504

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.05374    0.03297  -1.630    0.103
y           -0.02399    0.03189  -0.752    0.452

Residual standard error: 1.042 on 998 degrees of freedom
Multiple R-squared:  0.0005668, Adjusted R-squared:  -0.0004346
F-statistic: 0.566 on 1 and 998 DF,  p-value: 0.452

z (model 0)

and so on...
There are several results discussing parts of the question, like here or here, suggesting the asis tag in combination with the cat statement. This one includes headers.
The closest to my request seems to be this question from two years ago. However, even though they are highly appreciated, some of the suggestions are deprecated, like asis_output, or I can't get them to work under general conditions, like the formattable suggestion (e.g. with lm output). I just wonder, as two years have passed since then, whether there is a modern approach that facilitates what I'm looking for.
Solution Type 1
You could do a capture.output(cat(.)) approach with some lapply-looping. Send the output to a file and use rmarkdown::render(.).
This is the R code producing a *.pdf.
capture.output(cat("---
title: 'Test Results'
author: 'Tom & co.'
date: '11 10 2019'
output: pdf_document
---\n\n```{r setup, include=FALSE}\n
knitr::opts_chunk$set(echo = TRUE)\n
mtcars <- data.frame(mtcars)\n```\n"), file="_RMD/Tom.Rmd") # here of course your own data
lapply(seq(mtcars), function(i)
capture.output(cat("# Model", i, "\n\n```{r chunk", i, ", comment='', echo=FALSE}\n\
print(summary(lm(mpg ~ ", names(mtcars)[i] ,", mtcars)))\n```\n"),
file="_RMD/Tom.Rmd", append=TRUE))
rmarkdown::render("_RMD/Tom.Rmd")
Produces:
Solution Type 2
When we want to automate the output of multiple model summaries in the rmarkdown document itself, we could choose between 1. setting the chunk option results='asis', which renders lines such as # Model 1 as headings but destroys the code formatting of the output, or 2. not setting it, which keeps the code formatting but prints # Model 1 literally instead of as a heading. The solution is to use the option and combine it with inline code that we can paste() together with another sapply() loop within the sapply() for the models.
In the main sapply we apply @G.Grothendieck's venerable solution to nicely substitute the Call: line of the output using do.call("lm", list(.)). We need to wrap an invisible(.) around it to avoid the unnecessary sapply() output [[1]] [[2]]... of the empty lists produced.
I included a ". " into the cat(), because leading white space like ` this` will be rendered to this in lines 6 and 10 of the summary outputs.
This is the rmarkdown script producing a *.pdf, which can also be executed normally, line by line:
---
title: "Test results"
author: "Tom & co."
date: "15 10 2019"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Overview
This is an example of an ordinary code block with output that had to be included.
```{r mtcars, fig.width=3, fig.height=3}
head(mtcars)
```
# Test results in detail
The test results follow fully automated in detail.
```{r mtcars2, echo=FALSE, message=FALSE, results="asis"}
invisible(sapply(tail(seq(mtcars), -2), function(i) {
fo <- reformulate(names(mtcars)[i], response="mpg")
s <- summary(do.call("lm", list(fo, quote(mtcars))))
cat("\n## Model", i - 2, "\n")
sapply(1:19, function(j)
cat(paste0("`", ". ", capture.output(s)[j]), "` \n"))
cat(" \n")
}))
```
***Note:*** This is a concluding remark to show that we still can do other stuff afterwards.
Produces:
(Note: Site 3 omitted)
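As a variation on the same idea (not the method used above), the summary output can be wrapped in a fenced block instead of per-line inline code. The sketch below only builds the markdown text; the helper name model_section and the simulated data are assumptions for illustration. Inside a chunk with results="asis", the final cat() call would emit a heading followed by a verbatim block for each model.

```r
# Hypothetical helper: build markdown lines for one model, consisting of
# a level-2 heading and the printed summary inside a fenced block.
model_section <- function(title, model) {
  fence <- strrep("`", 3)
  c(paste("##", title), "", fence,
    capture.output(print(summary(model))),
    fence, "")
}

set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))

sections <- unlist(lapply(c("y", "z"), function(v) {
  fo <- reformulate(v, response = "x")
  # do.call() keeps the Call: line readable, as in the answer above
  fit <- do.call("lm", list(fo, quote(df)))
  model_section(paste(v, "(model 1)"), fit)
}))

# In an R Markdown chunk, use the chunk option results="asis" here.
cat(sections, sep = "\n")
```

Because each summary sits in its own fenced block, leading white space is preserved without the ". " prefix.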
Context
I was hit by the same need as that of OP when trying to generate multiple plots in a loop, but one of them would apparently crash the graphical device (because of unpredictable bad input) even when called using try() and prevent all the remaining figures from being generated. I needed really independent code blocks, like in the proposed solution.
Solution
I've thought of preprocessing the source file before it was passed to knitr, preferably inside R, and found that the jinjar package was a good candidate. It uses a dynamic template syntax based on the Jinja2 templating engine from Python/Django. There are no syntax clashes with document formats accepted by R Markdown, but the tricky part was integrating it nicely with its machinery.
My hackish solution was to create a wrapper rmarkdown::output_format() that executes some code inside the rmarkdown::render() call environment to process the source file:
preprocess_jinjar <- function(base_format) {
  if (is.character(base_format)) {
    base_format <- rmarkdown:::create_output_format_function(base_format)
  }
  function(...) {
    # Find the rmarkdown::render() environment.
    callers <- sapply(sys.calls(), function(x) deparse(as.list(x)[[1]]))
    target <- grep('^(rmarkdown::)?render$', callers)
    target <- target[length(target)] # render may be called recursively
    render_envir <- sys.frames()[[target]]
    # Modify input with jinjar.
    input_paths <- evalq(envir = render_envir, expr = {
      original_knit_input <- sub('(\\.[[:alnum:]]+)$', '.jinjar\\1', knit_input)
      file.rename(knit_input, original_knit_input)
      input_lines <- jinjar::render(paste(input_lines, collapse = '\n'))
      writeLines(input_lines, knit_input)
      normalize_path(c(knit_input, original_knit_input))
    })
    # Add an on_exit hook to revert the modification.
    rmarkdown::output_format(
      knitr = NULL,
      pandoc = NULL,
      on_exit = function() file.rename(input_paths[2], input_paths[1]),
      base_format = base_format(...)
    )
  }
}
Then I can call, for example:
rmarkdown::render('input.Rmd', output_format = preprocess_jinjar('html_document'))
Or, more programmatically, with the output format specified in the source file metadata as usual:
html_jinjar <- preprocess_jinjar('html_document')
rmarkdown::render('input.Rmd')
Here is a minimal example for input.Rmd:
---
output:
  html_jinjar:
    toc: false
---
{% for n in [1, 2, 3] %}
# Section {{ n }}
```{r block-{{ n }}}
print({{ n }}**2)
```
{% endfor %}
Caveats
It's a hack. This code depends on the internal logic of rmarkdown::render(), and there are likely edge cases where it won't work. Use at your own risk.
For this solution to work, the output format constructor must be called by render(). Therefore, evaluating it before passing it to render() will fail:
render('input.Rmd', output_format = 'html_jinjar') # works
render('input.Rmd', output_format = html_jinjar) # works
render('input.Rmd', output_format = html_jinjar()) # fails
This second limitation could be circumvented by putting the preprocessing code inside the pre_knit() hook, but then it would only run after other output format hooks, like intermediates_generator() and other pre_knit() hooks of the format.

Pander formats tables weirdly when using significance stars and pandoc

If I run a linear regression with significance stars, render it through pander, and "Knit PDF" such as this:
pander(lm(crimerate ~ conscripted + birthyr + indigenous + naturalized, data = data), add.significance.stars = T)
I occasionally get output where there are weird spacing issues between rows in the output table.
I've tried setting pander options to report fewer digits panderOptions('digits', 2), but the problem persists.
Does anybody have any ideas?
I had the same problem. Something is wrong with the cell alignment; the error disappeared when I changed the style to rmarkdown.
library(data.table)
dt <- data.table(Test = c("0 - 10 000"),
ALDT = "99.18 %")
First(space in table):
pandoc.table(dt, justify = c("left", "right"))
# From pandoc below
------------------
Test ALDT
---------- -------
0 - 10 000 99.18 %
------------------
Second(good formatting):
pandoc.table(dt, style = "rmarkdown", justify = c("left", "right"))
# From pandoc below
| Test | ALDT |
|:--------------|--------:|
| 0 - 10 000 | 99.18 % |
The first try doesn't work, something is wrong with the formatting pandoc gives us. But if you specify the style as rmarkdown it seems like the formatting is as it should be.

Display subset of R output with knitR

Is there a way to display only part of the R output with knitR? I want to display only part of the summary output from an lm model in a beamer presentation so that it doesn't run off the slide. (As a side note, why is my code not wrapping?) A minimal example is provided below.
\documentclass{beamer}
\begin{document}
\title{My talk}
\author{Me}
\maketitle
\begin{frame}[fragile, t]{Slide 1}
<<setup, include=FALSE, cache=FALSE, tidy=TRUE>>=
options(width=60, digits=5, show.signif.stars=FALSE)
@
<<mod1, tidy=TRUE>>=
data(cars) # load data
g <- lm(dist ~ speed + I(speed^2) + I(speed^3), data = cars)
summary(g)
@
\end{frame}
\end{document}
To be very specific, say that I wanted to return only the following output:
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.50505   28.40530  -0.687    0.496
speed         6.80111    6.80113   1.000    0.323
I(speed^2)   -0.34966    0.49988  -0.699    0.488
I(speed^3)    0.01025    0.01130   0.907    0.369
Residual standard error: 15.2 on 46 degrees of freedom
Multiple R-squared: 0.6732, Adjusted R-squared: 0.6519
F-statistic: 31.58 on 3 and 46 DF, p-value: 3.074e-11
There's probably a better way to do this, but the following should work for you. It uses capture.output to select what parts of the printed output to display:
\documentclass{beamer}
\begin{document}
\title{My talk}
\author{Me}
\maketitle
\begin{frame}[fragile, t]{Slide 1}
<<setup, include=FALSE, cache=FALSE, tidy=TRUE>>=
options(width=60, digits=5, show.signif.stars=FALSE)
@
<<mod1, tidy=TRUE>>=
data(cars) # load data
g <- lm(dist ~ speed + I(speed^2) + I(speed^3), data = cars)
tmp <- capture.output(summary(g))
cat(tmp[9:length(tmp)], sep='\n')
@
\end{frame}
\end{document}
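The hard-coded start line 9 depends on how many lines the Call: and Residuals: parts occupy. A slightly more robust sketch, under the assumption that the wanted output always begins at the "Coefficients:" line, looks that line up instead:

```r
# Fit the model from the question and capture the printed summary.
g <- lm(dist ~ speed + I(speed^2) + I(speed^3), data = cars)
tmp <- capture.output(summary(g))

# Locate the first line that starts the coefficient table,
# rather than relying on a fixed line number.
start <- grep("^Coefficients:", tmp)[1]
shown <- tmp[start:length(tmp)]
cat(shown, sep = "\n")
```

The same two lines can replace the tmp[9:length(tmp)] call inside the mod1 chunk.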
The summary.lm() method being invoked here returns a list of relevant outputs formatted nicely with print.summary.lm. If you want individual components of the list, try double brackets:
Input:
summary(g)[[4]]
summary(g)[[6]]
summary(g)[[7]]
summary(g)[[8]]
Output:
> summary(g)[[4]]
                Estimate  Std. Error    t value  Pr(>|t|)
(Intercept) -19.50504910 28.40530273 -0.6866693 0.4957383
speed         6.80110597  6.80113480  0.9999958 0.3225441
I(speed^2)   -0.34965781  0.49988277 -0.6994796 0.4877745
I(speed^3)    0.01025205  0.01129813  0.9074113 0.3689186
> summary(g)[[6]]
[1] 15.20466
> summary(g)[[7]]
[1] 4 46 4
> summary(g)[[8]]
[1] 0.6731808
There must be a better way to combine the niceness of the summary method with list indexing, though.
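One possible way to get closer to that, sketched here as a suggestion rather than a definitive answer, is to access the components by name and print the coefficient matrix with stats::printCoefmat(), which gives the same kind of compact table that the summary print method shows. The variable names coefs, sigma_hat and r2 are only illustrative.

```r
g <- lm(dist ~ speed + I(speed^2) + I(speed^3), data = cars)
s <- summary(g)

# Named access is equivalent to the positional indexing above.
coefs <- s$coefficients   # same as s[[4]]
sigma_hat <- s$sigma      # same as s[[6]]
r2 <- s$r.squared         # same as s[[8]]

# Print only the coefficient table, formatted like the summary output.
printCoefmat(coefs, digits = 4, signif.stars = FALSE)
```

Named components are less fragile than positions if the structure of the summary object ever changes.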

I cannot figure out how to get R to recognize the Pander package

I am trying to get an R -> Docx workflow. I used the tutorial given here. The commands to set up your R system (which I used from the tutorial) are:
install.packages('pander')
library(knitr)
knit2html("example.rmd")
# installing/loading the package:
if(!require(installr)) { install.packages("installr"); require(installr)} #load / install+load installr
# Installing pandoc
install.pandoc()
FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))
The example file from the site (example.rmd) is:
Doc header 1
============
```{r set_knitr_chunk_options}
opts_chunk$set(echo=FALSE,message=FALSE,results = "asis") # important for making sure the output will be well formatted.
```
```{r load_pander_methods}
require(pander)
replace.print.methods <- function(PKG_name = "pander") {
PKG_methods <- as.character(methods(PKG_name))
print_methods <- gsub(PKG_name, "print", PKG_methods)
for(i in seq_along(PKG_methods)) {
f <- eval(parse(text=paste(PKG_name,":::", PKG_methods[i], sep = ""))) # the new function to use for print
assign(print_methods[i], f, ".GlobalEnv")
}
}
replace.print.methods()
## The following might work with some tweaks:
## print <- function (x, ...) UseMethod("pander")
```
Some text explaining the analysis we are doing
```{r}
summary(cars)# a summary table
fit <- lm(dist~speed, data = cars)
fit
plot(cars) # a plot
```
This creates a doc file as shown below (there is a graph at the end too):
Doc header 1
opts_chunk$set(echo = FALSE, message = FALSE, results = "asis") # important for making sure the output will be well formatted.
## Warning: there is no package called 'pander'
## Error: no function 'pander' is visible
Some text explaining the analysis we are doing speed dist
Min. : 4.0 Min. : 2
1st Qu.:12.0 1st Qu.: 26
Median :15.0 Median : 36
Mean :15.4 Mean : 43
3rd Qu.:19.0 3rd Qu.: 56
Max. :25.0 Max. :120
Call: lm(formula = dist ~ speed, data = cars)
Coefficients: (Intercept) speed
-17.58 3.93
![generated graph image][1]
Now how can I remove the errors in the generated doc file? I would like to resolve the errors if possible.
pander is not knitr.
You will need to install the pander package (i.e. install.packages('pander')), just as you installed knitr.
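A quick way to confirm this before knitting is to check which of the required packages are actually installed. The helper name missing_packages below is hypothetical; it relies only on base R's requireNamespace().

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("knitr", "pander"))
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```

If pander shows up in the message, the warning "there is no package called 'pander'" in the generated document is explained.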
