Good morning everybody,
I’m trying to render multiple Rmarkdown reports with different parameters for each report. Basically, I have a folder of .csv files which I have to clean up. I have packed all the steps into an .Rmd file, because this way the data gets cleaned and a short report is generated documenting the results: some figures, some stats, nothing very dramatic, just an overview of how the cleaning went.
As each .csv file is slightly different, I have to tweak some parameters. This is the easy part. I found some nice code in the “R for Data Science” book: https://r4ds.had.co.nz/r-markdown.html#parameters
This is my version:
library(dplyr)
library(stringr)
library(purrr)
# Create a vector with names
files <- c("dataframe", "datatable")
# Create a tibble with filenames and lists of parameters
reports <- tibble(
  filename = str_c(files, ".html"),
  params = map(files, ~ list(name = .,
                             factor = if_else(. == "dataframe", 2.5, 5))))
#-------------------------------------------------------------------
# make reports
reports <- reports %>%
  select(output_file = filename, params) %>%
  purrr::pwalk(rmarkdown::render, input = "template_datatable.Rmd")
Everything runs fine when the .Rmd file uses data.frames.
As my .csv files are about 1 GB each, I would like to use data.table to speed things up. But as soon as my .Rmd file contains some data.table code, I get this error message:
Error: `:=` can only be used within a quasiquoted argument
If I render just one file with rmarkdown::render(input = "template_datatable.Rmd", output_file = "test.html", params = list(name = "datatable", factor = 5)), the .Rmd with the data.table code works fine.
My questions are:
What is causing this error? And is there a way to fix it?
Here is my code for the .Rmd using data.frames:
---
title: "A report for `r params$name`"
params:
  name: "name"
  factor: 1
output:
  bookdown::html_document2:
    fig_caption: yes
    toc: yes
    toc_float: true
    code_folding: "hide"
---
```{r setup, include=FALSE}
# Setup Chunk
# Some knitr options
knitr::opts_chunk$set(echo = FALSE)
# Packages
library(dplyr)
library(ggplot2)
```
```{r dataImport}
df <- data.frame(A = seq(1, 100), B = seq(1, 100))
df <- df %>%
mutate(C = B * params$factor)
```
```{r makePlot}
ggplot(df, aes(A, C)) +
geom_line()
```
And my code for the .Rmd using data.tables:
---
title: "A report for `r params$name`"
params:
  name: "name"
  factor: 1
output:
  bookdown::html_document2:
    fig_caption: yes
    toc: yes
    toc_float: true
    code_folding: "hide"
---
```{r setup, include=FALSE}
# Setup Chunk
# Some knitr options
knitr::opts_chunk$set(echo = FALSE)
# Packages
library(data.table)
library(ggplot2)
```
```{r dataImport}
dt <- data.table(A = seq(1, 100), B = seq(1, 100))
dt <- dt[, C := B*params$factor]
```
```{r makePlot}
ggplot(dt, aes(A, C)) +
geom_line()
```
Thanks for your help.
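One plausible explanation, not confirmed in this thread: `rmarkdown::render()` defaults to `envir = parent.frame()`, and when it is called through `purrr::pwalk()` that frame is presumably one created inside purrr. If data.table's `[` then does not treat the calling code as data.table-aware, it falls back to data.frame behaviour and `:=` resolves to rlang's version, which would match the error above. A minimal sketch of a workaround under that assumption calls `render()` from a plain `for` loop and gives each report its own `new.env()`. `make_params()` and `render_reports()` are hypothetical helper names.

```r
# Hypothetical helper: one parameter list per report, mirroring the
# tibble built with purrr::map() above (2.5 for "dataframe", 5 otherwise).
make_params <- function(files) {
  lapply(files, function(f) {
    list(name = f, factor = if (f == "dataframe") 2.5 else 5)
  })
}

# Hypothetical helper: render one report per element of `files`. Each call
# gets a fresh environment whose parent is this function's frame (and
# ultimately the global environment), not a frame inside purrr.
render_reports <- function(files, input = "template_datatable.Rmd") {
  params_list <- make_params(files)
  for (i in seq_along(files)) {
    rmarkdown::render(
      input       = input,
      output_file = paste0(files[i], ".html"),
      params      = params_list[[i]],
      envir       = new.env()
    )
  }
  invisible(files)
}

files <- c("dataframe", "datatable")
str(make_params(files))
# render_reports(files)  # needs rmarkdown, pandoc and the template .Rmd
```

A fresh environment per report also keeps objects created by one report out of the next.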
I would like to apply some LaTeX-style formatting to column headings in a pander table in R Markdown, knitting to PDF.
Notice in the toy document below that the LaTeX commands that work for the elements of the data frame do not work for the headings. Specifically, I would like (1) to be able to italicise some headings, and (2) to be able to have headings with spaces between the words (at the moment R automatically replaces each space with a .). More generally, I am interested in how to get the headings of a data frame to accept the same LaTeX commands as its elements.
---
title: "Chapter 12: Adding to the Discrete Time Hazard Model"
output:
  pdf_document: default
  html_document: null
  word_document: null
toc: yes
linestretch: 1.3
classoption: fleqn
header-includes:
  - \setlength{\mathindent}{0pt}
  - \setlength\parindent{0pt}
  - \usepackage{amssymb}
---
```{r global_options, include=FALSE, echo = FALSE}
#this sets global knit options (i.e. options for the entire document). The following suppresses any warnings from being included in the output and sets plot parameters. Note that setting dev to pdf allows us to set the size of graphs easily
rm(list = ls())
knitr::opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
echo=FALSE, warning=FALSE, message=FALSE, dev = 'pdf')
```
``` {r table p 446}
abC <- 0.3600344
bC <- 0.2455304
intC <- 0.4787285
dfTrans <- data.frame("Prototype" = c("$\\textit{Left/not Blue}$", "Left/Blue", "Right/not Blue", "Right/Blue"),
"$LEFT$" = c(0,1,0,1),
"$\\textit{BLUE}$" = c(0,0,1,1),
`Combined Parameter Estimates` = c(paste("0 x ", round(abC,4), "+ 0 x", round(bC,4), "+ 0 x", round(intC, 4), sep = " "), 8, 9, 0))
library(pander)
panderOptions('table.split.table', 300) # this forces the table to the width of the page.
pander(dfTrans, justify = "left")
```
I'm not sure how to do this with pander, but here is a method using the kable function from knitr and kableExtra functions for detailed table formatting. I haven't changed the YAML header; the updated code chunks are pasted below, followed by the output.
```{r global_options, include=FALSE, echo = FALSE}
#this sets global knit options (i.e. options for the entire document). The following suppresses any warnings from being included in the output and sets plot parameters. Note that setting dev to pdf allows us to set the size of graphs easily
knitr::opts_chunk$set(fig.width=12, fig.height=8, fig.path='Figs/',
echo=FALSE, warning=FALSE, message=FALSE, dev = 'pdf')
# rm(list = ls()) This is unnecessary. knitr runs the rmarkdown document in a clean session.
library(knitr)
library(kableExtra)
options(knitr.table.format = "latex") # latex output (instead of default html)
library(tidyverse) # For dplyr pipe (%>%) and mutate
```
```{r table p 446}
abC <- 0.3600344
bC <- 0.2455304
intC <- 0.4787285
# I've removed the latex formatting from the data frame code
dfTrans <- data.frame(Prototype = c("Italic_Left/not Blue", "Left/Blue", "Right/not Blue", "Right/Blue"),
LEFT = c(0,1,0,1),
BLUE = c(0,0,1,1),
`Combined Parameter Estimates` = c(paste("0 x ", round(abC,4), "+ 0 x", round(bC,4), "+ 0 x", round(intC, 4), sep = " "), 8, 9, 0))
# Remove periods in column names
names(dfTrans) = gsub("\\.", " ", names(dfTrans))
# Two other options:
# 1. Use tibble() (called data_frame() in older versions of the tibble package) rather than the base data.frame function.
#    tibble() doesn't add periods, so you won't need to fix the column names afterwards.
# 2. Set check.names=FALSE in data.frame
# Use kableExtra cell_spec function to format on a cell-by-cell basis
dfTrans = dfTrans %>%
mutate(Prototype = cell_spec(Prototype, color=c("black","blue"),
align=rep(c("l","r"), each=2)))
# Format each of the column names using kableExtra text_spec
names(dfTrans)[1] = text_spec(names(dfTrans)[1], italic=TRUE)
names(dfTrans)[2] = text_spec(names(dfTrans)[2], align="l")
names(dfTrans)[3] = text_spec(names(dfTrans)[3], align="r", italic=TRUE, color="blue")
names(dfTrans)[4] = text_spec(names(dfTrans)[4], align="r")
# Output the table
kable(dfTrans, booktabs=TRUE, escape=FALSE)
```
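The periods come from `data.frame()` running the column names through `make.names()` unless `check.names = FALSE` is given. A minimal base-R illustration, using two of the headers from the question:

```r
# Default check.names = TRUE: make.names() turns spaces and "$" into "."
d_default <- data.frame("Combined Parameter Estimates" = 1, "$LEFT$" = 0)
names(d_default)

# check.names = FALSE keeps the headers exactly as typed
d_verbatim <- data.frame("Combined Parameter Estimates" = 1, "$LEFT$" = 0,
                         check.names = FALSE)
names(d_verbatim)
```

With the names kept verbatim, whether any LaTeX in them survives is then up to the table function, e.g. `kable(..., escape = FALSE)` as used above.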
One thing I'm not sure how to do yet is to format just the first value of dfTrans$Prototype as italic. cell_spec seems to use only the first value of an italic logical vector, so the following italicizes the whole column:
dfTrans = dfTrans %>%
mutate(Prototype = cell_spec(Prototype, color=c("black","blue"),
align=rep(c("l","r"), each=2),
italic=c(TRUE, rep(FALSE, n()-1))))
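One possible workaround, sketched here and not checked against every kableExtra version: keep the `cell_spec()` step as it is and wrap only the selected values in `\textit{}` by hand afterwards. Since the table is printed with `kable(..., escape = FALSE)`, the LaTeX should pass through. The data frame `tab` and the vector `italic_rows` are hypothetical stand-ins; in the document this would run on `dfTrans` after `cell_spec()`.

```r
# Hypothetical stand-in for dfTrans after the cell_spec() step
tab <- data.frame(Prototype = c("Left/not Blue", "Left/Blue",
                                "Right/not Blue", "Right/Blue"),
                  stringsAsFactors = FALSE)

# One logical per row; only TRUE rows get wrapped in \textit{...}
italic_rows <- c(TRUE, rep(FALSE, nrow(tab) - 1))
tab$Prototype <- ifelse(italic_rows,
                        paste0("\\textit{", tab$Prototype, "}"),
                        tab$Prototype)
tab$Prototype
```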
Here is a huxtable-based solution (my package):
abC <- 0.3600344
bC <- 0.2455304
intC <- 0.4787285
dfTrans <- data.frame(Prototype = c("Italic_Left/not Blue", "Left/Blue", "Right/not Blue", "Right/Blue"),
LEFT = c(0,1,0,1),
BLUE = c(0,0,1,1),
`Combined Parameter Estimates` = c(paste("0 x ", round(abC,4), "+ 0 x", round(bC,4), "+ 0 x", round(intC, 4), sep = " "), 8, 9, 0))
library(huxtable)
huxTrans <- hux(dfTrans, add_colnames = TRUE) # column names become first row
huxTrans[1, 4] <- 'Combined Parameter Estimates' # get rid of the dots
align(huxTrans)[4:5, 1] <- 'right'
text_color(huxTrans)[c(3, 5), 1] <- 'blue'
text_color(huxTrans)[1, 3] <- 'blue'
italic(huxTrans)[1, c(1, 3)] <- TRUE
huxTrans # will automatically become LaTeX in RMarkdown
quick_pdf(huxTrans)
Which looks like this in the terminal:
And this in PDF output:
You can add borders as well if you want.
How can I use a variable as the chunk name? I have a child document which gets called a number of times, and I need to advance the chunk labels in such a manner that I can also cross-reference them.
Something like this:
child.Rmd
```{r }
if(!exists('existing')) existing <- 0
existing = existing + 1
myChunk <- sprintf("myChunk-%s",existing)
```
## Analysis Routine `r existing`
```{r myChunk,echo = FALSE}
#DO SOMETHING, LIKE PLOT
```
master.Rmd
# Analysis Routines
Analysis for this can be seen in figures \ref{myChunk-1}, \ref{myChunk-2} and \ref{myChunk-3}
```{r child = 'child.Rmd'}
```
```{r child = 'child.Rmd'}
```
```{r child = 'child.Rmd'}
```
EDIT: POTENTIAL SOLUTION
Here is one potential workaround, inspired by SQL injection of all things...
child.Rmd
```{r }
if(!exists('existing')) existing <- 0
existing = existing + 1
myChunk <- sprintf("myChunk-%s",existing)
```
## Analysis Routine `r existing`
```{r myChunk,echo = FALSE,fig.cap=sprintf("The Caption}\\label{%s",myChunk)}
#DO SOMETHING, LIKE PLOT
```
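To see why the trick works, the caption string that `sprintf()` builds for the first child can be printed directly. This sketch assumes the caption ends up verbatim inside a LaTeX `\caption{...}`; whether that holds depends on the output path. The extra `}` closes the caption early, and the caption's own closing brace then closes the `\label{` that carries the dynamic name.

```r
existing <- 1
myChunk  <- sprintf("myChunk-%s", existing)

# The fig.cap value from the chunk header above
cap <- sprintf("The Caption}\\label{%s", myChunk)

# What a verbatim \caption{...} wrapper would then contain
writeLines(sprintf("\\caption{%s}", cap))
# \caption{The Caption}\label{myChunk-1}
```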
A suggestion: preknit the Rmd file into another Rmd file before knitting and rendering, as follows.
master.Rmd:
# Analysis Routines
Analysis for this can be seen in figures `r paste(paste0("\\ref{", CHUNK_NAME, 1:NUM_CHUNKS, "}"), collapse=", ")`
###
rmdTxt <- unlist(lapply(1:NUM_CHUNKS, function(n) {
c(paste0("## Analysis Routine ", n),
paste0("```{r ",CHUNK_NAME, n, ", child = 'child.Rmd'}"),
"```")
}))
writeLines(rmdTxt)
###
child.Rmd:
```{r,echo = FALSE}
plot(rnorm(100))
```
To knit & render the Rmd:
devtools::install_github("chinsoon12/PreKnitPostHTMLRender")
library(PreKnitPostHTMLRender) #requires version >= 0.1.1
NUM_CHUNKS <- 5
CHUNK_NAME <- "myChunk-"
preknit_knit_render_postrender("master.Rmd", "test__test.html")
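The text that the `###` block in master.Rmd generates can be inspected in plain R without the PreKnitPostHTMLRender package. This sketch only reproduces that generation step (and the list of `\ref{}`s) for two chunks; it does not knit anything.

```r
NUM_CHUNKS <- 2
CHUNK_NAME <- "myChunk-"

# Same construction as the ### block in master.Rmd: a heading plus a
# child-chunk header with a numbered label, per routine
rmdTxt <- unlist(lapply(seq_len(NUM_CHUNKS), function(n) {
  c(paste0("## Analysis Routine ", n),
    paste0("```{r ", CHUNK_NAME, n, ", child = 'child.Rmd'}"),
    "```")
}))
writeLines(rmdTxt)

# The inline expression used for the cross-references
refs <- paste(paste0("\\ref{", CHUNK_NAME, seq_len(NUM_CHUNKS), "}"),
              collapse = ", ")
writeLines(refs)
```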
Hope it helps. Cheers!
If you're getting to this level of complexity, I suggest you look at the brew package.
That provides a templating engine where you can dynamically create the Rmd for knitting.
You get to reference R variables in the outer brew environment, and build your dynamic Rmd from there.
Dynamic chunk names are possible with knitr::knit_expand(). Arguments are referenced in the child document, including in the chunk headers, using {{arg_name}}.
So my parent doc contains:
```{r child_include, results = "asis"}
###
# Generate a section for each dataset
###
species <- c("a", "b")
out <- lapply(species, function(sp) knitr::knit_expand("child.Rmd"))
res = knitr::knit_child(text = unlist(out), quiet = TRUE)
cat(res, sep = "\n")
```
And my child doc, which has no YAML header, contains:
# EDA for species {{sp}}
```{r getname-{{sp}}}
paste("The species is", "{{sp}}")
```
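A minimal sketch of what `knit_expand()` does to the child text, passing the template through its `text` argument instead of a file so it runs on its own (requires knitr). The template mirrors the child document above; `child_template` is a hypothetical name.

```r
library(knitr)

# Child template, as in the child doc above; {{sp}} is filled in by knit_expand()
child_template <- c(
  "# EDA for species {{sp}}",
  "```{r getname-{{sp}}}",
  'paste("The species is", "{{sp}}")',
  "```"
)

# One expanded copy per species, each with its own chunk label
expanded <- vapply(c("a", "b"), function(sp) {
  knit_expand(text = paste(child_template, collapse = "\n"), sp = sp)
}, character(1))
cat(expanded, sep = "\n\n")
```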
See the R Markdown Cookbook for more on this.
Outside of Rmarkdown the standalone googleVis chart works fine, but when I plug it into the Rmarkdown file I only get the Rmarkdown code:
Viewer Output:
> TEST H 4/13/2016 require(googleVis) Loading required package:
> googleVis Welcome to googleVis version 0.5.10 Please read the Google
> API Terms of Use before you start using the package:
> https://developers.google.com/terms/
>
> Note, the plot method of googleVis will by default use the standard
> browser to display its output. See the googleVis package vignettes
> for more details, or visit http://github.com/mages/googleVis. To
> suppress this message use:
> suppressPackageStartupMessages(library(googleVis))
>
> dttm = data.frame(DT_ENTRY=Sys.Date()-1:20,variable="x",value=1:20)
> g1=gvisAnnotationChart(dttm,datevar="DT_ENTRY",numvar="value",idvar="variable")
> plot(g1) starting httpd help server ... done
Rmarkdown Code Below:
---
title: "test"
author: "H"
date: "4/13/2016"
output: html_document
highlight: tango
number_sections: yes
---
```{r}
require(googleVis)
dttm = data.frame(DT_ENTRY=Sys.Date()-1:20,variable="x",value=1:20)
g1=gvisAnnotationChart(dttm,datevar="DT_ENTRY",numvar="value",idvar="variable")
plot(g1)
```
The R chunk has to be declared with `results='asis'`, i.e. with a header such as `{r plotg0, results='asis', tidy=FALSE, echo=FALSE}`.
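The `asis` setting is important because it makes knitr insert the raw HTML printed by the chunk into the page as-is, instead of showing it as code output.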
I am not sure if you still need an answer. However, I faced the same problem while trying to embed a sankey plot from gvisSankey into a section of an Rmarkdown document. Thankfully, I found the solution on github.com/mages/googleVis (weirdly, it did not come up in Google Search).
Below is a reproducible example.
First, to create a sankey plot, I do this:
# install.packages("googleVis")
library(googleVis)
# Create a data.frame that is gvisSankey-conform
# i.e., has these columns: FROM - TO - WEIGHT
df <- data.frame(FROM = rep(c("A","B","C"), each = 3),
TO = rep(c("D","E","F"), times = 3),
WEIGHT = sample(1:10, 9, replace = TRUE))
sankeyplot <- gvisSankey(data = df,
from = "FROM",
to = "TO",
weight = "WEIGHT")
plot(sankeyplot)
plot(sankeyplot) will open a new browser tab containing the sankey plot. Now, how do I embed the sankey plot in my Rmarkdown document?
There are two important points:
1. Set `self_contained: false` in the YAML header.
2. Set `results='asis'` in the R code chunk where the sankey plot is embedded.
---
title: "SankeyPlot_Example"
output:
  html_document:
    self_contained: false
date: '2022-10-21'
---
## Embed googleVis object (sankey plot) onto Rmarkdown
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(googleVis)
# Create a reproducible data.frame
df <- data.frame(FROM = rep(c("A","B","C"), each = 3),
TO = rep(c("D","E","F"), times = 3),
WEIGHT = sample(1:10, 9, replace = TRUE))
```
```{r results='asis'}
sankeyplot <- gvisSankey(data = df,
from = "FROM",
to = "TO",
weight = "WEIGHT")
print(sankeyplot, 'chart')
```
I would like to create an automated knitr report that will produce histograms for each numeric field within my dataframe. My goal is to do this without having to specify the actual fields (this dataset contains over 70 and I would also like to reuse the script).
I've tried a few different approaches:
- Saving the plot to an object, `p`, and then calling `p` after the loop. This only plots the final plot.
- Creating an array of plots, `PLOTS <- NULL`, and appending the plots within the loop with `PLOTS <- append(PLOTS, p)`. Accessing these plots outside the loop did not work at all.
- Saving each plot to a .png file, but I would rather not deal with the overhead of saving and then re-accessing each file.
I'm afraid the intricacies of the plot devices are escaping me.
Question
How can I make the following chunk output each plot within the loop to the report? Currently, the best I can achieve is output of the final plot produced by saving it to an object and calling that object outside of the loop.
R markdown chunk using knitr in RStudio:
```{r plotNumeric, echo=TRUE, fig.height=3}
suppressPackageStartupMessages(library(ggplot2))
FIELDS <- names(df)[sapply(df, class)=="numeric"]
for (field in FIELDS){
qplot(df[,field], main=field)
}
```
From this point, I hope to customize the plots further.
Wrap the qplot in print.
knitr will do that for you if the qplot is outside a loop, but (at least the version I have installed) doesn't detect this inside the loop (which is consistent with the behaviour of the R command line).
I wanted to add a quick note:
I googled the same question and landed on this page.
As of 2018, just use print() in the loop.
for (i in 1:n){
...
f <- ggplot(.......)
print(f)
}
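A self-contained version of the loop with an explicit `print()`, assuming ggplot2 3.0 or later (for the `.data` pronoun) and a small made-up data frame. It selects columns with `is.numeric()`, which, unlike `class(x) == "numeric"` in the question, also picks up integer columns. Inside a knitr chunk each `print()` call should produce its own figure.

```r
library(ggplot2)

# Made-up example data: a double column, an integer column and a character column
set.seed(1)
df <- data.frame(height = rnorm(50, 170, 10),
                 count  = rpois(50, 4),
                 group  = sample(letters[1:3], 50, replace = TRUE))

# Names of all numeric columns (double or integer)
numeric_fields <- names(df)[vapply(df, is.numeric, logical(1))]

for (field in numeric_fields) {
  p <- ggplot(df, aes(x = .data[[field]])) +
    geom_histogram(bins = 15) +
    ggtitle(field)
  print(p)  # explicit print: values inside a for loop are not auto-printed
}
```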
I am using child Rmd files in R Markdown; this also works in Sweave.
In the Rmd, use the following snippet:
```{r run-numeric-md, include=FALSE}
out = NULL
for (i in c(1:num_vars)) {
out = c(out, knitr::knit_child('da-numeric.Rmd'))
}
```
da-numeric.Rmd looks like:
Variable `r num_var_names[i]`
------------------------------------
Missing : `r sum(is.na(data[[num_var_names[i]]]))`
Minimum value : `r min(na.omit(data[[num_var_names[i]]]))`
Percentile 1 : `r quantile(na.omit(data[[num_var_names[i]]]),probs = seq(0, 1, 0.01))[2]`
Percentile 99 : `r quantile(na.omit(data[[num_var_names[i]]]),probs = seq(0, 1, 0.01))[100]`
Maximum value : `r max(na.omit(data[[num_var_names[i]]]))`
```{r results='asis', comment="" }
warn_extreme_values=3
d1 = quantile(na.omit(data[[num_var_names[i]]]),probs = seq(0, 1, 0.01))[2] > warn_extreme_values*quantile(na.omit(data[[num_var_names[i]]]),probs = seq(0, 1, 0.01))[1]
d99 = quantile(na.omit(data[[num_var_names[i]]]),probs = seq(0, 1, 0.01))[101] > warn_extreme_values*quantile(na.omit(data[[num_var_names[i]]]),probs = seq(0, 1, 0.01))[100]
if(d1){cat('Warning : Suspect extreme values in left tail')}
if(d99){cat('Warning : Suspect extreme values in right tail')}
```
``` {r eval=TRUE, fig.width=6, fig.height=2}
library(ggplot2)
v <- num_var_names[i]
hp <- ggplot(data[!is.na(data[[v]]), ], aes_string(x=v)) + geom_histogram( colour="grey", fill="grey", binwidth=diff(range(na.omit(data[[v]]))/100))
hp + theme(axis.title.x = element_blank(),axis.text.x = element_text(size=10)) + theme(axis.title.y = element_blank(),axis.text.y = element_text(size=10))
```
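The two warnings above compare the 1st percentile with the minimum, and the maximum with the 99th percentile. A small base-R helper doing the same check with a single `quantile()` call might look like this; `tail_warnings()` is a hypothetical name, and `k = 3` mirrors `warn_extreme_values`.

```r
# Flags suspect tails the same way as the chunk above:
#   left:  1st percentile > k * minimum
#   right: maximum        > k * 99th percentile
tail_warnings <- function(x, k = 3) {
  q <- quantile(x, probs = c(0, 0.01, 0.99, 1), na.rm = TRUE, names = FALSE)
  c(left = q[2] > k * q[1], right = q[4] > k * q[3])
}

tail_warnings(1:100)         # both FALSE
tail_warnings(c(1:99, 1000)) # right is TRUE: 1000 > 3 * 108.01
```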
See my dataMineR package on GitHub:
https://github.com/hugokoopmans/dataMineR
As an addition to Hugo's excellent answer, I believe that as of 2016 you also need to print the collected child output after the chunk:
```{r run-numeric-md, include=FALSE}
out = NULL
for (i in c(1:num_vars)) {
out = c(out, knitr::knit_child('da-numeric.Rmd'))
}
```
`r paste(out, collapse = '\n')`
For knitting Rmd to HTML, I find it more convenient to have a list of figures. In this case I get the desired output with results='hide', as follows:
---
title: "Make a list of figures and show it"
output:
  html_document
---
```{r}
suppressPackageStartupMessages({
library(ggplot2)
library(dplyr)
requireNamespace("scater")
requireNamespace("SingleCellExperiment")
})
```
```{r}
plots <- function() {
print("print")
cat("cat")
message("message")
warning("warning")
# These calls generate unwanted text
scater::mockSCE(ngene = 77, ncells = 33) %>%
scater::logNormCounts() %>%
scater::runPCA() %>%
SingleCellExperiment::reducedDim("PCA") %>%
as.data.frame() %>%
{
list(
f12 = ggplot(., aes(x = PC1, y = PC2)) + geom_point(),
f22 = ggplot(., aes(x = PC2, y = PC3)) + geom_point()
)
}
}
```
```{r, message=FALSE, warning=TRUE, results='hide'}
plots()
```
Only the plots and the warnings are shown (the warnings can be switched off as well).