Using R Hmisc summary/summaryM latex command within Knitr Markdown pdf

I have been trying to get the Hmisc latex.summary and latex.summaryM examples to work within a pdf document created using Knitr in RStudio, but I keep getting error messages. The example data is:
options(digits=3)
set.seed(173)
sex <- factor(sample(c("m","f"), 500, rep=TRUE))
country <- factor(sample(c('US', 'Canada'), 500, rep=TRUE))
age <- rnorm(500, 50, 5)
sbp <- rnorm(500, 120, 12)
label(sbp) <- 'Systolic BP'
units(sbp) <- "mmHg"
treatment <- factor(sample(c("Drug","Placebo"), 500, rep=TRUE))
sbp[1] <- NA
# Generate a 3-choice variable; each of 3 variables has 5 possible levels
symp <- c('Headache','Stomach Ache','Hangnail',
'Muscle Ache','Depressed')
symptom1 <- sample(symp, 500,TRUE)
symptom2 <- sample(symp, 500,TRUE)
symptom3 <- sample(symp, 500,TRUE)
Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
And I want to create a pdf document that contains the tables
tab1 <- summary(sex ~ treatment + Symptoms, fun=table)
tab2 <- summaryM(age + sex + sbp + Symptoms ~ treatment,
groups='treatment', test=TRUE)
I'm running R version 3.5.2 (2018-12-20), RStudio 1.1.463, Hmisc_4.2-0, and have installed tinytex using tinytex::install_tinytex().
After a few hours of trial and error I worked out how to do it, and am posting the code below in case it helps others.

The following code works for me. Notes:
The relsize LaTeX package is required when the Hmisc units attribute is used; without it, compilation fails with
! Undefined control sequence.
<recently read> \smaller
The mylatex function is taken from https://stackoverflow.com/a/31443576/4241780 and is needed to strip unwanted comment lines from the output.
The option file = "" is needed to prevent the error
Error in system(comd, intern = TRUE, wait = TRUE) : 'yap' not found
Calls: <Anonymous> ... print -> print.latex -> show.latex -> show.dvi -> system
The option where = "!htbp" keeps the tables where they are placed instead of letting them float to the top of the page (the default is where = "!tbp"); see https://tex.stackexchange.com/a/2282.
---
title: "Untitled"
author: "Author"
date: "15 April 2019"
output:
  pdf_document:
    extra_dependencies: ["relsize"]
---
```{r setup, include=FALSE}
library(Hmisc)
library(dplyr)
mylatex <- function(...) {
  # file = "" sends the LaTeX to the console so it can be captured
  o <- capture.output(latex(file = "", where = "!htbp", ...))
  # this will strip /all/ line-only comments; or if you're only
  # interested in stripping the first such comment you could
  # adjust accordingly
  o <- grep('^%', o, invert = TRUE, value = TRUE)
  cat(o, sep = '\n')
}
```
```{r data}
# As in question above ...
```
Here is the first table
```{r tab1, results = "asis"}
tab1 <- summary(sex ~ treatment + Symptoms, fun=table)
mylatex(tab1)
```
Here is the second table
```{r tab2, results = "asis"}
tab2 <- summaryM(age + sex + sbp + Symptoms ~ treatment, test=TRUE)
mylatex(tab2)
```
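To see what the grep() call inside mylatex does in isolation, here is a small sketch on a hand-written character vector. The sample lines are made up for illustration and are not real latex() output; only the filtering step is the same.

```r
# Hypothetical lines standing in for captured latex() output
latex_lines <- c(
  "%latex.default(tab1, file = \"\")%",
  "\\begin{table}[!htbp]",
  "\\caption{Example}",
  "\\end{table}"
)

# Keep every line that does NOT start with the LaTeX comment character
kept <- grep("^%", latex_lines, invert = TRUE, value = TRUE)
cat(kept, sep = "\n")
```

Only the first line is dropped; everything else is passed through to the document unchanged.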

Related

Why is LaTeX failing to compile in RMarkdown

My code should be a pretty easy knit to a pdf, but it will not compile and I'm getting this message in R Markdown:
! LaTeX Error: Unicode character ₁ (U+2081)
not set up for use with LaTeX.
Error: LaTeX failed to compile L-work-5.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See L- work-5.log for more info.
Execution halted
here is the code:
---
title: "work 5"
author: "PLars"
date: "4/2/2022"
output: pdf_document
fonttheme: professionalfonts
fontsize: 12pt
editor_options:
  markdown:
    wrap: 72
---
```{r, echo = FALSE, results = "hide", message = FALSE, purl = FALSE}
library(knitr)
opts_chunk$set(tidy = FALSE,
fig.align = "left",
background = '#a6a6a6',
fig.width = 10,
fig.height = 10,
out.width ="\\linewidth",
out.height = "\\linewidth",
message = FALSE,
warning = FALSE,
fig.align = "left"
)
options(width = 55, digits = 3)
library(scales)
percent <- function(x, digits = 2, format = "f", ...) {
paste0(formatC(100 * x, format = format, digits = digits, ...), "%")
}
library(haven)
library(tinytex)
library(stargazer)
library(tidyverse)
library(texreg)
library(dplyr)
library(texreg)
library(AER)
library(tidyverse)
```
**Part I - Categorical Models (5 points)**
Say that you estimate an ordered logit model with a three category
dependent variable and two independent variables, X~₁i~ and X~₂i~, and
obtain the following results:
```{=tex}
\begin{center}
\begin{tabular}{c|rc}
\hline \hline
& $\hat{\beta}$ & SE \\
\hline
$X_{1}$ & $-0.68$ & $(0.23)$ \\
$X_{2}$ & $-0.47$ & $(0.13)$ \\
\hline
$\tau_1$ & $-1.02$ & $(0.46)$ \\
$\tau_2$ & $.85$ & $(0.21)$ \\
\hline
\end{tabular}
\end{center}
```
```{=tex}
\begin{enumerate}
\item Calculate $\Pr(Y_i=1 | X_{1i}=1, X_{2i}=0)$.
\item Calculate $\Pr(Y_i=2 | X_{1i}=1, X_{2i}=0)$.
\item Calculate $\Pr(Y_i=3 | X_{1i}=1, X_{2i}=0)$.
\item Calculate the first difference (difference in probability in category) that results from changing $X_{2i}$ from $-2$ to $2$, holding $X_{1i}$ fixed at 0. Do calculations for each possible value of $Y_i$.
\item Explain how we might assess whether the parallel regression assumption holds for this model? If it does not, what alternative might you pursue if this were your model?
\end{enumerate}
```
#
First, we calculate $X_i \beta$
```{r}
(xiB <- (-0.68*1) + (-0.47*0))
```
Then, plug into the following equations:
```{r}
(prob1 <- 1/(1 + exp(-(-1.02-xiB))))
(prob2 <- 1/(1 + exp(-(.85-xiB))) - prob1)
(prob3 <- 1 - (1/(1 + exp(-(.85-xiB)))))
prob1 + prob2 + prob3
```
For a start, try deleting the special characters ₁ and ₂ in the line
two independent variables, X~₁i~ and X~₂i~
This will let you compile.
You might be able to get this to work by loading the newunicodechar LaTeX package and including something like
\newunicodechar{₁}{\ensuremath{{}_1}}
and similarly for the subscript-2 character, in the preamble of your document (from this TeX Stack Exchange question), but I haven't tested it and don't want to go down that rabbit hole right now ...
Or just change the relevant text to
two independent variables, $X_{1i}$ and $X_{2i}$
which will probably typeset it as originally intended!
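If it isn't obvious where the offending characters are, a quick check in R can point to them. This is only a sketch: find_non_ascii is a hypothetical helper name, and the example lines are copied from the question rather than read from a file.

```r
# Return the indices of lines that contain at least one non-ASCII character
find_non_ascii <- function(lines) {
  has_non_ascii <- vapply(
    lines,
    function(line) any(utf8ToInt(line) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  which(has_non_ascii)
}

# "\u2081" and "\u2082" are the subscript-one and subscript-two characters
rmd_lines <- c(
  "dependent variable and two independent variables,",
  "X~\u2081i~ and X~\u2082i~, and",
  "obtain the following results:"
)
find_non_ascii(rmd_lines)
```

For a real document you would pass in something like readLines("your_file.Rmd", encoding = "UTF-8"), where the file name is a placeholder.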

Return both text and inline r markdown from within a function

Uses:
install.packages("bookdown")
library(bookdown)
GitHub: https://github.com/MartinJLambert/r-markdown_function_test
Given that I need to reproduce these values multiple times within the same document, I have created a function that calculates a simple ANOVA and determines the F, df, p and n statistics, as well as an asterisk indicator for significance based on the p-value.
---
output:
  bookdown::pdf_document2
---
```{r include= FALSE}
# function for calculating and displaying statistics results from an ANOVA
func_aov_stats <- function(input_df, input_var, input_factor) {
aov_tmp <- aov(input_var ~ input_factor, input_df)
anova_tmp <- anova(aov_tmp)
temp_signif <- if(anova_tmp[1,5] < 0.001){print("***")}
else if(anova_tmp[1,5] < 0.01){print("**")}
else if(anova_tmp[1,5] <0.05){print("*")}
else {print("")}
paste(anova_tmp[1,1], anova_tmp[1,4], anova_tmp[1,5], temp_signif, anova_tmp[2,1]+2)
}
```
`r func_aov_stats(mtcars, mtcars$mpg, mtcars$cyl)`
This is simple enough and knitting this does exactly what I want it to do.
1 79.5610275293349 6.11268714258098e-10 *** 32
However, numbers alone are kinda useless, so I would like to report it as a string of text. Something along these lines:
ANOVA: F(df=anova_tmp[1,1]) = anova_tmp[1,4], p = anova_tmp[1,5] temp_signif, n = anova_tmp[2,1]+2
I was thinking of simply pasting the inline r-markdown inside the function:
paste("ANOVA: F~(df=`r anova_tmp[1,1]`)~ = `r anova_tmp[1,4]`, p = `r paste(anova_tmp[1,5] temp_signif)`, n = `r anova_tmp[2,1]+2`")
But I get this:
ANOVA: F(df=r anova_tmp[1,1]) = r anova_tmp[1,4], p = r anova_tmp[1,5] temp_signif, n = r anova_tmp[2,1]+2
At least the markdown formatting worked, but it obviously doesn't paste the 'r' components as hoped.
What does work, is if I write it out manually outside of the function, elsewhere in the markdown document:
```{r outside_of_function, include= FALSE}
aov_tmp <- aov(mpg ~ cyl, mtcars)
anova_tmp <- anova(aov_tmp)
temp_signif <- if(anova_tmp[1,5] < 0.001){print("***")} else if(anova_tmp[1,5] < 0.01){print("**")} else if(anova_tmp[1,5] <0.05){print("*")} else {print("")}
```
ANOVA: F~(df=`r anova_tmp[1,1]`)~ = `r anova_tmp[1,4]`, p = `r paste(anova_tmp[1,5], temp_signif)`, n = `r anova_tmp[2,1]+2`
ANOVA: F(df=1) = 79.5610275, p = 6.11268714258098e-10 ***, n = 32
So the issue is within the function itself. While it does seem to be able to produce the formatting, the computation of the 'r' code seems to require something beyond my understanding.
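The solution is paste0(). Inline `r ...` expressions are only evaluated by knitr when they appear in the document text, not when they sit inside a character string built in R, so the function has to assemble the sentence from the computed values itself.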
Full code, updated and fully functioning:
---
output:
  bookdown::pdf_document2
---
```{r include = FALSE}
# function for calculating and displaying statistics results from an ANOVA
func_aov_stats <- function(input_df, input_var, input_factor) {
  aov_tmp <- aov(input_var ~ input_factor, input_df)
  anova_tmp <- anova(aov_tmp)
  # significance stars based on the p-value (no print() needed, just return the string)
  temp_signif <- if (anova_tmp[1,5] < 0.001) {"***"}
  else if (anova_tmp[1,5] < 0.01) {"**"}
  else if (anova_tmp[1,5] < 0.05) {"*"}
  else {""}
  # assemble the sentence from the computed values
  paste0("ANOVA: F~(df=", anova_tmp[1,1], ")~ = ", anova_tmp[1,4], ", p = ", anova_tmp[1,5], " ", temp_signif, ", n = ", anova_tmp[2,1]+2)
}
```
`r func_aov_stats(mtcars, mtcars$mpg, mtcars$cyl)`
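As a variation on the same idea, the numbers can be rounded while the string is built. The sketch below is not the code from the post: fmt_aov is a hypothetical name, it takes a formula instead of separate vectors, and it assumes a model with an intercept so that n can be recovered as the total degrees of freedom plus one.

```r
# Format a one-way ANOVA as a sentence with rounded values
fmt_aov <- function(formula, data) {
  tab <- anova(aov(formula, data = data))
  p <- tab[1, "Pr(>F)"]
  stars <- if (p < 0.001) "***" else if (p < 0.01) "**" else if (p < 0.05) "*" else ""
  # assumption: intercept model, so n = sum of all Df + 1
  n <- as.integer(sum(tab[, "Df"]) + 1)
  sprintf("ANOVA: F~(df=%d)~ = %.2f, p = %.3g %s, n = %d",
          as.integer(tab[1, "Df"]), tab[1, "F value"], p, stars, n)
}

res <- fmt_aov(mpg ~ cyl, mtcars)
res
```

It is called inline in the same way, for example `r fmt_aov(mpg ~ cyl, mtcars)`.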

Real data not found in R/exams

I am trying to develop an exam based on the results of a logit model fitted to a real data set. I try to load the data set, fit the model, and include some variables extracted from the model using the inline `r varname` syntax.
I first developed a small example using artificial data generated within the exercise. That worked fine and this is the corresponding Rmd file:
```{r data generation, echo = FALSE, results = "hide"}
library(tidyverse)
d <- tibble(y = rbinom(100, 1, 0.6), x1 = rnorm(100), x2=rnorm(100))
# randomize exams
nsize <- sample(50:150, 1)
sampled_dat <- sample(1:nrow(d), nsize, replace = TRUE)
fd <- d[sampled_dat, ]
fmodel <- glm(y ~ x1 + x2, data = fd, family = binomial("logit"))
```
Question
========
`r nrow(fd)`
```{r}
summary(fmodel)
```
Choose the correct answer.
Answerlist
----------
* sol1 `r nrow(fd)`
* sol2
Meta-information
================
exname: bdvDeviance
extype: schoice
exsolution: 10
exshuffle: TRUE
This worked as expected when launching
elearn_exam <- c("ess3.Rmd")
set.seed(1234567)
exams2nops(elearn_exam, n = 2, language = "en",
institution = "U", title = "Exam",
dir = "nops_pdf", name = "BDV", date = "2018-01-08", duplex = FALSE)
However, this is the analogous exercise loading a real data set:
```{r data generation, echo = FALSE, results = "hide"}
load("d.Rdata")
# randomize exams
nsize <- sample(180:250, 1)
sampled_dat <- sample(1:nrow(d), nsize, replace = TRUE)
fd <- d[sampled_dat, ]
logitModel <- glm(Adopted ~ CultArea + Trained + LabRice+ Education + ExtContact, data = fd, family=binomial("logit"))
```
Question
========
`r nrow(fd)`
Choose the correct answer.
Answerlist
----------
* When adding variables, the deviance did not change. The variables did not bring some useful information.
* sol2 `r nrow(fd)`
Meta-information
================
exname: bdvDeviance
extype: schoice
exsolution: 10
exshuffle: TRUE
This time, I get the following error:
> elearn_exam <- c("ess4.Rmd")
> set.seed(1234567)
> exams2nops(elearn_exam, n = 2, language = "en",
+ institution = "Uu", title = "Exam",
+ dir = "nops_pdf", name = "BDV_R", date = "2018-01-08", duplex = FALSE)
Quitting from lines 14-35 (ess4.Rmd)
Error in nrow(fd) : object 'fd' not found
I do not understand what the problem is in the second case. Apparently, the fd variable is not found when it is used in an inline expression such as `r nrow(fd)`. The problem does not come from the regression because that works fine when knitting the Rmd file.
Your second example using the real data set just loads the corresponding data file via load("d.Rdata"), assuming that it is in the current working directory. However, when using any exams2xyz() interface, the exercises are processed in a temporary directory in order not to clutter the user's workspace. Hence, the d.Rdata file is not found in that directory and consequently cannot be loaded. And because of this problem, the fd object cannot be created and inserted. In short, the inline `r ...` code is working fine; the problem is loading the data.
To avoid this problem, you must either specify the full absolute path to your data file in load("/path/to/d.Rdata") or you need to copy the data to the temporary directory before loading it. For the latter, there is the convenience function include_supplement() that copies supplementary files to the temporary directory. By default, it takes them from the directory the exercise resides in. So you simply need to add:
include_supplement("d.Rdata")
before loading the data file. Note that when the file is not in the exercise directory itself but some sub-directory you can add the argument recursive = TRUE. Then sub-directories are searched recursively.
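The mechanism can be reproduced with base R alone. The sketch below does not use the exams package at all; it only mimics the situation with two temporary directories (exercise_dir and work_dir are made-up names) and a tiny invented data set, to show why a relative load() fails until the file has been copied over.

```r
# Stand-ins for the exercise directory and the temporary processing directory
exercise_dir <- tempfile("exercise_")
work_dir <- tempfile("work_")
dir.create(exercise_dir)
dir.create(work_dir)

# Save a small made-up data set next to the "exercise"
d <- data.frame(y = c(0, 1, 1), x = c(1.2, 0.4, 2.5))
save(d, file = file.path(exercise_dir, "d.Rdata"))
rm(d)

# Processing happens in a different working directory
old_wd <- setwd(work_dir)
found_before <- file.exists("d.Rdata")  # the file is not here yet

# Copy the file into the working directory first, then load it
file.copy(file.path(exercise_dir, "d.Rdata"), "d.Rdata")
found_after <- file.exists("d.Rdata")
load("d.Rdata")
setwd(old_wd)

nrow(d)
```

In an actual exercise the copying step is what include_supplement("d.Rdata") takes care of.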

Trying to publish an R notebook and keep getting the same error (Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror)

I use OSX Yosemite with XQuartz as was suggested in other questions, and I've been attempting to publish a notebook but get the same error every time. This is what the .R file looks like:
#' ---
#' title: "MLB Payroll Analysis"
#' author: "Steven Quartz Universe"
#' date: "21 March 2015"
#' output: pdf_document
#' ---
#loading the payroll data from the Python document
payroll <- read.table("~/Documents/payroll.txt", header=TRUE, quote="\"")
View(payroll)
summary(payroll)
bank <- payroll$PayrollMillions
wins <- payroll$X2014Wins
#loading the payroll data from the Python document
payroll <- read.table("~/Documents/payroll.txt", header=TRUE, quote="\"")
summary(payroll)
bank <- payroll$PayrollMillions
wins <- payroll$X2014Wins
#displaying the mean and sd of payroll and wins (out of 162, of course)
mean(bank)
sd(bank)
mean(wins)
sd(wins)
#setting a linear regression
reg <- lm(wins ~ bank)
summary(reg)
#the regression is valid to significance < .10 (p-value .05072),
#but the R-squared is only .1296, a weak correlation
#a means of comparing the histogram to a normal distribution
histNorm <- function(x, densCol = "darkblue"){
m <- mean(x)
std <- sqrt(var(x))
h <- max(hist(x,plot=FALSE)$density)
d <- dnorm(x, mean=m, sd=std)
maxY <- max(h,d)
hist(x, prob=TRUE,
xlab="x", ylim=c(0, maxY),
main="(Probability) Histogram with Normal Density")
curve(dnorm(x, mean=m, sd=std),
col=densCol, lwd=2, add=TRUE)
}
#showing the histogram with normal distribution line
histNorm(reg$residuals, "purple")
#QQplots and Shapiro-Wilk test
qqnorm(reg$residuals)
qqline(reg$residuals)
shapiro.test(reg$residuals)
#p-value is .383; this can be considered a normal distribution
plot(reg$fitted.values,reg$residuals)
abline(h = 0)
#variances are wide, but in a channel
install.packages("lmtest")
library(lmtest)
bptest(reg)
#p-value of .849 given; we can assume variances are constant throughout the distribution
hats <- hatvalues(reg)
hatmu <- mean(hats)
hats[hats > 2 * hatmu]
#we get teams 14 and 19 with high leverage; the Dodgers and Yankees with their astronomical payrolls
treg <- rstudent(reg)
n <- length(treg)
p <- length(reg$coefficients)  # number of estimated coefficients
df <- n - p - 1
alpha <- 0.05
#no bonferroni correction for outliers
crit <- qt(1 - alpha/2,df)
treg[abs(treg) > crit]
#no outliers are found
#with bonferroni correction
crit <- qt(1 - (alpha/2)/n,df)
treg[abs(treg) > crit]
#no outliers are found
#comparison of outlier tests
pvals <- pt(-abs(treg),df)*2
padjb <- p.adjust(pvals, method = "bonferroni")
padjf <- p.adjust(pvals, method = "fdr")
cbind(pvals,padjb,padjf)
When I hit Compile Notebook, this is the output:
|...................... | 33%
ordinary text without R code
|........................................... | 67%
label: unnamed-chunk-1
processing file: payroll.spin.Rmd
Quitting from lines 9-90 (payroll.spin.Rmd)
Error in contrib.url(repos, "source") :
trying to use CRAN without setting a mirror
Calls: <Anonymous> ... withVisible -> eval -> eval -> install.packages -> contrib.url
I've looked through other questions on how to rectify this, but to no avail. I've done the command line fixes, again to no avail. Could someone point me as to what I'm doing wrong? Thanks kindly.
The line
install.packages("lmtest")
is the problem here. As is hinted by the error message
Error in contrib.url(repos, "source") :
trying to use CRAN without setting a mirror
it is expected that you provide a link to a repo for the package. So changing it to (for instance):
install.packages("lmtest", repos = "http://cran.us.r-project.org")
should do the trick. But as MrFlick and Ben Bolker pointed out in their comments, the install should probably only be run when the package is not already installed.
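One way to do that is to check for the package first. This is a sketch rather than anything from the comments: install_if_missing is a hypothetical helper, and the default repos URL is just one CRAN mirror that you can swap for your own.

```r
# Install only those packages that cannot already be loaded
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    # an explicit repos argument avoids the "without setting a mirror" error
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}

# "stats" and "utils" ship with R, so nothing is installed here
already_there <- install_if_missing(c("stats", "utils"))
length(already_there)
```

In the script from the question this would be called as install_if_missing("lmtest") before library(lmtest).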
I had this same issue with a Knit HTML publish. I modified the very beginning of the file like so:
---
title: "dialectic"
author: "micah smith"
date: "3/4/2017"
output: html_document
---
```{r setup, include=FALSE}
chooseCRANmirror(graphics=FALSE, ind=1)
knitr::opts_chunk$set(echo = TRUE)
```
The chooseCRANmirror(graphics=FALSE, ind=1) line was what fixed it.
chooseCRANmirror(graphics=FALSE, ind=1)
knitr::opts_chunk$set(echo = TRUE)
Write this at the start of your chunk if you have already installed the package.
If you have already run your install.packages("___") script, you can set that code chunk to eval = FALSE when you knit your markdown file.
If you are installing multiple packages via a pkgs variable, just do this. It worked for me; I couldn't knit to PDF until I fixed it.
pkgs <- c("moments", "ggplot2", "dplyr", "tidyr", "tidyverse")
install.packages(pkgs, repos = "http://cran.us.r-project.org")

R markdown v2 and Hmisc tables

How can I take the output from summary in Hmisc and have it rendered in knitr with the correct formatting and preferably transferred to word as a table for my collaborators?
The following chunk produces a table but the formatting is off (all the value labels and numbers for the variables are on the same line, not beneath each other)
---
output: word_document
---
```{r table, results='asis'}
library(Hmisc)
options(digits=3)
set.seed(173)
sex <- factor(sample(c("m","f"), 500, rep=TRUE))
age <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), 500, rep=TRUE))
# Generate a 3-choice variable; each of 3 variables has 5 possible levels
symp <- c('Headache','Stomach Ache','Hangnail',
'Muscle Ache','Depressed')
symptom1 <- sample(symp, 500,TRUE)
symptom2 <- sample(symp, 500,TRUE)
symptom3 <- sample(symp, 500,TRUE)
Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
table(Symptoms)
# Note: In this example, some subjects have the same symptom checked
# multiple times; in practice these redundant selections would be NAs
# mChoice will ignore these redundant selections
#Frequency table sex*treatment, sex*Symptoms
summary(sex ~ treatment + Symptoms, fun=table)
```
My main focus was to get the summary.formula.reverse table from Hmisc into word for submission. I tend to use it a lot so I ended up with a quick hack that gets the table into word - although not using knitr. Feel free to improve and apply the same logic to the other summary.formula tables...
library(stringr)
library(Hmisc)
library(rtf)
tabl <- function(x, filename = "tab.doc") {
  # capture the printed table as text, one element per line
  u <- capture.output(print(x, exclude1 = FALSE, long = TRUE, pctdig = 1))
  col <- max(str_count(string = u, "\\|"))
  row <- sum(as.numeric(str_detect(u, "\\|") == TRUE))
  su <- which(str_detect(u, "\\|") == TRUE)
  # the first two table lines hold the column headers
  i <- str_trim(unlist(str_split(u[su[1]], "\\|")))
  i2 <- str_trim(unlist(str_split(u[su[2]], "\\|")))
  i3 <- paste(i, i2, sep = "\n")
  i3 <- i3[-c(1, col + 1)]
  uo <- u[su[-c(1:2)]]
  val <- lapply(uo, function(x) str_trim(unlist(str_split(x, "\\|"))))
  misd <- lapply(val, function(x) ifelse(x[3] == "", paste("\\tab", x[2], sep = " "), paste("\\ql", x[2], sep = " ")))
  f <- t(matrix(unlist(val), col + 1))
  f2 <- f[, -c(1, col + 1)]
  f2[, 1] <- unlist(misd)
  colnames(f2) <- i3
  blank <- which(str_detect(f2, "\\ql") == TRUE)
  # helper: insert a row into a data frame at a given position
  inser <- function(df, place, vector) {
    df1 <- rbind(df[1:place-1, ], vector, df[place:length(df[, 1]), ])
    df1
  }
  f3 <- as.data.frame(f2)
  lapply(c(1:length(names(f3))), function(x) levels(f3[[x]]) <<- c(levels(f3[[x]]), ""))
  g <- 1
  for (i in blank[-1]) {
    f3 <- inser(f3, i - 1 + g, c(rep("", col - 1)))
    g <- g + 1
  }
  y <- as.data.frame(f3)
  di <- apply(y, 2, function(x) max(nchar(x))) / 12 ## 12 char/inch
  di[di < .5] <- .5
  # write the table to an RTF file that Word can open
  u <- RTF(file = filename, width = 8.5, height = 11, omi = c(1, 1, 1, 1), font.size = 10)
  addHeader(u, title = "Table", subtitle = paste(date(), "\n", sep = ""))
  addTable(u, y, font.size = 10, row.names = FALSE, NA.string = "-", col.justify = c("L", rep("C", col - 2)), header.col.justify = c("L", rep("C", col - 2)), col.widths = di)
  done(u)
  return(u)
}
