Using the 'mtcars' dataset, how can one split the dataset into clusters using the 'carb' field and output each group to a separate PDF document, with the carb value being the name of the PDF? I am new to R, and the solutions I have found let one save each cluster on a different page of a single PDF document. I have not found one where it is possible to save each cluster as a separate document.
You can create PDFs for each part of the dataset using the parameterized-reports approach in R Markdown, and not just tables: you can create a whole report for each cluster of the dataset.
To do that, we first create a template R Markdown file (saved here as tables.Rmd) containing code for printing the data as a table, where we also declare params in the YAML header of the file.
---
title: "Untitled"
author: "None"
date: '2022-07-26'
output: pdf_document
params:
  carb: 1
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown table
```{r, echo=FALSE}
data(mtcars)
df <- mtcars[mtcars$carb %in% params$carb,]
knitr::kable(df, caption = paste("mtcars table for carb", params$carb))
```
Then, from a separate R script or from the console, run the following code, which will create six PDFs, one for each value of carb:
lapply(unique(mtcars$carb), function(carb_i) {
  rmarkdown::render("tables.Rmd",
                    params = list(carb = carb_i),
                    output_file = paste0("table_for_carb", carb_i, ".pdf"))
})
So, for example, table_for_carb1.pdf looks like this
To learn more about how to create parameterized reports with R Markdown, see here.
Here is an option with package gridExtra.
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
lapply(seq_along(sp), \(i) {
  carb <- names(sp)[i]
  filename <- sprintf("carb%s.pdf", carb)
  pdf(filename)
  grid.table(sp[[i]])
  dev.off()
})
To write the clusters to the same PDF file, one table per page, start by exporting the first table; then, in the lapply loop, move to the next page and export the next table. The new pages must come between the tables, and a page (the first) must already exist before a new one is started for the next table.
And since the filename doesn't depend on the number of carburetors, the code can be simplified and rewritten without the need for seq_along.
library(grid)
library(gridExtra)
sp <- split(mtcars, mtcars$carb)
pdf("carb.pdf")
grid.table(sp[[1]])
lapply(sp[-1], \(x) {
  grid.newpage()
  grid.table(x)
})
dev.off()
I have a large report that I am running through R Markdown. The report has a data frame. At the beginning of the script, the data frame is filtered. After that, it does lots of manipulation and interpretation.
Currently, I change what I filter for and knit each report individually. I want to automate this process so that I can provide a vector of terms to filter with and the reports are generated.
Here is an example:
---
title: "Create markdown htmls with loop"
author: "Nathan Roe"
date: "2/17/2022"
output: html_document
---
```{r}
library(dplyr)
my_df <- data.frame(my_letters = letters[1:5], my_numbers = 1:5)
my_df %>% filter(my_letters == "a")
```
I want to generate reports for a, b, c, d, and e. Currently, I have to go in and change what is being filtered for; as shown in the example above, I am filtering for "a", after which I would have to change it to filter for "b", and so on. Is there a way to automate this, so that I provide a vector of a, b, c, d, and e, reports are generated based on those filters, and the HTML files are named using the letter? For example, I provide my_letters <- letters[1:5] and the script creates a.html, b.html, c.html, d.html, and e.html.
It seems similar to this, https://community.rstudio.com/t/loop-for-output-files/79716, but that example is poorly explained, if it even answers the question.
The link you mention gives all the elements to generate a parameterized report.
For your example, you could knit with custom parameters using rmarkdown::render.
Markdown file: test.Rmd
---
title: "Create markdown htmls with loop"
author: "Nathan Roe"
date: "2/17/2022"
output: html_document
params:
  letter: 'a'
---
# `r paste('Processing letter', params$letter)`
```{r}
params$letter
```
HTML file generation with a loop:
for (letter in letters[1:5]) {
  rmarkdown::render(input = 'test.Rmd',
                    output_file = paste0(letter, ".html"),
                    params = list(letter = letter))
}
...
I'm after a way to render an Rmd document (that contains references to various "child" files) to a self-contained R Notebook without these dependencies.
At the moment, the .Rmd code chunks are located throughout a number of .R, .py and .sql files and are referenced in the report using
```{r extraction, include=FALSE, cache=FALSE}
knitr::read_chunk("myscript.R")
```
followed by
```{r chunk_from_myscript}
```
as documented here.
I've done this to avoid code duplication and to allow the source files to be run separately; however, these code chunks are only executable in the report via a call to knit or render (when read_chunk is run and the code chunk is available).
Is there a way to spin off an Rmd (prior to knitting) with just these chunks populated?
This function
rmarkdown::render("report.Rmd", clean = FALSE)
almost gets there, as it leaves the intermediate markdown files behind while removing the extraction chunk and populating chunk_from_myscript; however, as these files are plain markdown, the chunks are no longer executable and the chunk options are missing. It also obviously doesn't include the code of chunks with eval=TRUE, echo=FALSE, which would be needed to run the resulting notebook.
I've also looked at knitr::spin; however, this would mean disseminating the contents of the report across every source file, which isn't terribly ideal.
Reprex
report.Rmd
---
title: 'Report'
---
```{r read_chunks, include=FALSE, cache=FALSE}
knitr::read_chunk("myscript.R")
```
Some documentation
```{r chunk_from_myscript}
```
Some more documentation
```{r chunk_two_from_myscript, eval=TRUE, echo=FALSE}
```
myscript.R
#' # MyScript
#'
#' This is a valid R source file which is formatted
#' using the `knitr::spin` style comments and code
#' chunks.
#' The file's code can be used in large .Rmd reports by
#' extracting the various chunks using `knitr::read_chunk` or
#' it can be spun into its own small commented .Rmd report
#' using `knitr::spin`
# ---- chunk_from_myscript
sessionInfo()
#' This is the second chunk
# ---- chunk_two_from_myscript
1 + 1
Desired Output
notebook.Rmd
---
title: 'Report'
---
Some documentation
```{r chunk_from_myscript}
sessionInfo()
```
Some more documentation
```{r chunk_two_from_myscript, eval=TRUE, echo=FALSE}
1 + 1
```
Working through your reprex I now better understand the issue you are trying to solve. You can knit into an output.Rmd to merge your report and scripts into a single markdown file.
Instead of using knitr::read_chunk, I've read the script in with knitr::spin and cat the as-is output into another .Rmd file. Also note the params$final flag, which renders the final document when set to TRUE and knits to an intermediate .Rmd when FALSE (the default).
report.Rmd
---
title: "Report"
params:
  final: false
---
```{r load_chunk, include=FALSE}
chunk <- knitr::spin(text = readLines("myscript.R"), report = FALSE, knit = params$final)
```
Some documentation
```{r print_chunk, results='asis', echo=FALSE}
cat(chunk, sep = "\n")
```
to produce the intermediate file:
rmarkdown::render("report.Rmd", "output.Rmd")
output.Rmd
---
title: "Report"
---
Some documentation
```{r chunk_from_myscript, echo=TRUE}
sessionInfo()
```
With the secondary output.Rmd, you can continue with my original response below and render to html_notebook, so that the document can be shared without needing to be regenerated while still containing the source R Markdown file.
To render the final document from report.Rmd you can use:
rmarkdown::render("report.Rmd", params = list(final = TRUE))
Original response
You need to include additional arguments to your render statement.
rmarkdown::render(
input = "output.Rmd",
output_format = "html_notebook",
output_file = "output.nb.html"
)
When you open the .nb.html file in RStudio the embedded .Rmd will be viewable in the editing pane.
Since neither knitr::knit nor rmarkdown::render seems suited to rendering to R Markdown, I've managed to work around this somewhat by dynamically inserting the chunk text into each empty chunk and writing that to a new file:
library(magrittr)
library(stringr)
# Find the line numbers of every empty code chunk
get_empty_chunk_line_nums <- function(file_text){
  # Create an Nx2 matrix where the rows correspond
  # to code chunks and the columns are start/end line nums
  mat <- file_text %>%
    grep(pattern = "^```") %>%
    matrix(ncol = 2, byrow = TRUE)
  # Return the chunk line numbers where the end line number
  # immediately follows the starting line (ie. chunk is empty)
  empty_chunks <- mat[, 1] + 1 == mat[, 2]
  mat[empty_chunks, 1]
}

# Substitute each empty code chunk with the code from `read_chunk`
replace_chunk_code <- function(file_text, this_chunk_num) {
  this_chunk <- file_text[this_chunk_num]
  # Extract the chunk alias
  chunk_name <- stringr::str_match(this_chunk, "^```\\{\\w+ (\\w+)")[2]
  # Replace the closing "```" with "<chunk code>\n```"
  chunk_code <- paste0(knitr:::knit_code$get(chunk_name), collapse = "\n")
  file_text[this_chunk_num + 1] %<>% {paste(chunk_code, ., sep = "\n")}
  file_text
}

render_to_rmd <- function(input_file, output_file, source_files) {
  lapply(source_files, knitr::read_chunk)
  file_text <- readLines(input_file)
  empty_chunks <- get_empty_chunk_line_nums(file_text)
  for (chunk_num in empty_chunks){
    file_text <- replace_chunk_code(file_text, chunk_num)
  }
  writeLines(file_text, output_file)
}
source_files <- c("myscript.R")
render_to_rmd("report.Rmd", "output.Rmd", source_files)
This has the added benefits of preserving chunk options and working
with Python and SQL chunks too since there is no requirement to evaluate
any chunks in this step.
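For example, the helper can be chained with the html_notebook rendering from the earlier answer; a minimal sketch, using the file names from the reprex above:
# Populate the empty chunks from the source scripts, then render the
# now self-contained output.Rmd to an R Notebook for sharing.
render_to_rmd("report.Rmd", "output.Rmd", c("myscript.R"))
rmarkdown::render(
  input = "output.Rmd",
  output_format = "html_notebook",
  output_file = "output.nb.html"
)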
I'm new to R and I'm really liking the flexibility of R Markdown to create reports.
My problem is that I want to use a random number generating function I've created in my tables. I want simple tables that include string headers and the following function:
ran <- function(x){
  x <- runif(n = 1, min = 13.5, max = 15.5)
  round(x, digits = 2)
}
It won't allow me to create a table using this method:
```{r}
String |String |String
-------|------|------
ran(x)|ran(x)|ran(x)
```
My ultimate goal is to create practice worksheets with simple statistics that will be different but within a bounded integer range - so I can ask follow-up questions with some idea of the mean, median etc.
Any help would be greatly appreciated.
Perhaps you should read up both on how to build working R code and on how to code up Rmd files, since your function doesn't work, and there are a few places in the R Markdown docs that show how to do this:
---
output: html_document
---
```{r echo=FALSE}
ran <- function(x) {
runif(n=1, min=13.5, max=15.5) + round(x, digits=2)
}
```
One |Two |Three
-----------|-------------|-----------
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
`r ran(2)` | `r ran(3)` | `r ran(4)`
generates:
Also, neither SO nor RStudio charges extra for spaces in code blocks. It'd be good to show students good code style while you're layin' down stats tracks.
Here is an approach that automates much of the report generation and reduces the amount of code you need to type. For starters, you can turn this into a parameterized report, which would make it easier to create worksheets using different values of x. Here's an example:
In your rmarkdown document you would declare parameters x and n in the yaml header. n is the number of random values you want to produce for each value of x. The x and n values in the yaml header are just the defaults knitr uses if no other values are input when you render the report:
---
output: html_document
params:
  x: !r c(1,5,10)
  n: 10
---
Then, in the same rmarkdown document you would have the text and code for your worksheet. You access the parameters x and n with params$x and params$n, respectively.
For example, the rest of the rmarkdown document could look like the code below. We put x into a list called x_vals with named elements, so that the resulting column names in the output will be the names of the list elements. We feed that list to sapply to get a column of n random values for each value of x. The whole sapply statement is wrapped in kable, which produces a table in rmarkdown format.
```{r, include=FALSE}
library(knitr)
```
```{r, echo=FALSE}
# Create a named list of the x values that we passed into this document
x_vals = as.list(setNames(params$x, paste0("x=", params$x)))
kable(sapply(x_vals, function(i) round(runif(params$n, 13.5, 15.5) + i, 2)))
```
You can now click the "knit" button and it will produce a table using the default parameter values:
If instead you want to use different values for x and/or n, open a separate R script file and type the following:
rmarkdown::render("Worksheet.Rmd",
params = list(x = c(2,4,6,8),
n = 5),
output_file="Worksheet.html")
In the code above, the render function compiles the rmarkdown document we just created, but with new x and n values, and saves the output to a file called Worksheet.html. (I've assumed that we've saved the rmarkdown document to a file called Worksheet.Rmd.) Here's what the output looks like:
You can also, of course, add parameters for the lower and upper limits of the runif function, rather than hard-coding them as 13.5 and 15.5.
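For instance, a minimal sketch of that variation might look like this (the parameter names min and max are my own choice, not part of the worksheet above):
---
output: html_document
params:
  x: !r c(1,5,10)
  n: 10
  min: 13.5
  max: 15.5
---
with the table chunk drawing from runif(params$n, params$min, params$max) instead of the hard-coded limits:
```{r, echo=FALSE}
x_vals = as.list(setNames(params$x, paste0("x=", params$x)))
kable(sapply(x_vals, function(i) round(runif(params$n, params$min, params$max) + i, 2)))
```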
If you want to create several worksheets, each with different x values, you can put render in a loop:
df = expand.grid(1:3, 5:6, 10:11)
for (i in 1:nrow(df)) {
  rmarkdown::render("Worksheet.Rmd",
                    params = list(x = unlist(df[i,]), n = 10),
                    output_file = paste0(paste(unlist(df[i,]), collapse = "_"), ".html"))
}
Is there a standard way to include the computed values from variables early on in the written knitr report before those values are computed in the code itself? The purpose is to create an executive summary at the top of the report.
For example, something like this, where variable1 and variable2 are not defined until later:
---
title: "Untitled"
output: html_document
---
# Summary
The values from the analysis are `r variable1` and `r variable2`
## Section 1
In this section we compute some values. We find that the value of variable 1 is `r variable1`
```{r first code block}
variable1 <- cars[4, 2]
```
## Section 2
In this section we compute some more values. In this section we compute some values. We find that the value of variable 2 is `r variable2`
```{r second code block}
variable2 <- cars[5, 2]
```
A simple solution is to knit() the document twice from a fresh Rgui session.
The first time through, the inline R code will trigger some complaints about variables that can't be found, but the chunks will be evaluated, and the variables they return will be left in the global workspace. The second time through, the inline R code will find those variables and substitute in their values without complaint:
knit("eg.Rmd")
knit2html("eg.Rmd")
## RStudio users will need to explicitly set knit's environment, like so:
# knit("eg.Rmd", envir=.GlobalEnv)
# knit2html("eg.Rmd", envir=.GlobalEnv)
Note 1: In an earlier version of this answer, I had suggested doing knit(purl("eg.Rmd")); knit2html("eg.Rmd"). This had the (minor) advantage of not running the inline R code the first time through, but has the (potentially major) disadvantage of missing out on knitr caching capabilities.
Note 2 (for RStudio users): RStudio necessitates an explicit envir=.GlobalEnv because, as documented here, it runs knit() in a separate process and environment by default. Its default behavior aims to avoid touching anything in your global environment, which means that the first run won't leave the needed variables lying around anywhere the second run can find them.
Here is another approach, which uses brew + knit. The idea is to let knitr make a first pass on the document and then run it through brew. You can automate this workflow by introducing the brew step as a document hook that runs after knitr is done with its magic. Note that you will have to use the brew markup <%= variable %> to print values in place.
---
title: "Untitled"
output: html_document
---
# Summary
The values from the analysis are <%= variable1 %> and
<%= variable2 %>
## Section 1
In this section we compute some values. We find that the value of variable 1
is <%= variable1 %>
```{r first code block}
variable1 = cars[6, 2]
```
## Section 2
In this section we compute some more values. In this section we compute
some values. We find that the value of variable 2 is <%= variable2 %>
```{r second code block}
variable2 = cars[5, 2]
```
```{r cache = F}
require(knitr)
knit_hooks$set(document = function(x){
  x1 = paste(x, collapse = '\n')
  paste(capture.output(brew::brew(text = x1)), collapse = '\n')
})
```
This has become pretty easy using the ref.label chunk option. See below:
---
title: Report
output: html_document
---
```{r}
library(pixiedust)
options(pixiedust_print_method = "html")
```
### Executive Summary
```{r exec-summary, echo = FALSE, ref.label = c("model", "table")}
```
Now I can make reference to `fit` here, even though it isn't yet defined in the script. For example, I can get the slope for the `qsec` variable by calling `round(coef(fit)[2], 2)`, which yields 0.93.
Next, I want to show the full table of results. This is stored in the `fittab` object created in the `"table"` chunk.
```{r, echo = FALSE}
fittab
```
### Results
Then I need a chunk named `"model"` in which I define a model of some kind.
```{r model}
fit <- lm(mpg ~ qsec + wt, data = mtcars)
```
And lastly, I create the `"table"` chunk to generate `fittab`.
```{r table}
fittab <-
  dust(fit) %>%
  medley_model() %>%
  medley_bw() %>%
  sprinkle(pad = 4,
           bg_pattern_by = "rows")
```
I work in knitr, and the following two-pass system works for me. I have two (invisible) code chunks, one at the top and one at the bottom. The one at the bottom saves the values of any variables I need to include in the text before they are actually computed to a file (statedata.R). The top chunk sets the variable values to something that stands out if they haven't been defined yet, and then (if it exists) grabs the actual values from the stored file.
The script needs to be knit twice, as values will be available only after one pass through. Note that the second chunk erases the saved state file at the end of the second pass, so that any later changes to the script that affect the saved variables will have to be computed anew (so that we don't accidentally report old values from an earlier run of the script).
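For example, the two passes might be run like this (assuming the Rmd below is saved as report.Rmd; the file name is my assumption):
# First pass: the inline values print as "UNDEFINED", but the closing
# chunk dumps variable1 and variable2 to statedata.R.
rmarkdown::render("report.Rmd")
# Second pass: the values computed earlier are now available, so the
# summary shows the real numbers; the closing chunk then removes statedata.R.
rmarkdown::render("report.Rmd")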
---
title: "Untitled"
output: html_document
---
```{r, echo=FALSE, results='hide'}
# grab saved computed values from earlier passes
if (!exists("variable1")) {
variable1 <- "UNDEFINED"
variable2 <- "UNDEFINED"
if (file.exists("statedata.R")) {
source("statedata.R")
}
}
# Summary
The values from the analysis are `r variable1` and `r variable2`
## Section 1
In this section we compute some values. We find that the value of variable 1 is `r variable1`
```{r first code block}
variable1 <- cars[4, 2]
```
## Section 2
In this section we compute some more values. In this section we compute some values. We find that the value of variable 2 is `r variable2`
```{r second code block}
variable2 <- cars[5, 2]
```
```{r save variables for summary,echo=FALSE,results='hide'}
if (!file.exists("statedata.R")) {
dump(c("variable1","variable2"), file="statedata.R")
} else {
file.remove("statedata.R")
}
```
LaTeX macros can solve this problem. See this answer to my related question.
% knitr evaluates the chunks in source order, so x is computed inside the
% \body definition before the summary chunk prints it; LaTeX then typesets
% \body after the executive summary.
\newcommand\body{
\section{Analysis}
<<>>=
x <- 2
@
Some text here
} % Finishes body

\section*{Executive Summary}
<<>>=
x
@

\body