How to conditionally exclude a chunk after evaluating his content? - r

I'm creating a parametrized report in Rmarkdown, whereas some chunks should not be evaluated (included in the report) based on characteristics of the content within the chunk.
The report calculates individual summaries on a large survey for ~120 facilities with different numbers of units in them. Additionally unit size and volume is largely variable, therefore we exclude unit-analysis if the number of valid answers per unit is less than 10 (this is already recoded to NA in the dataframe-object). I therefore need to write a statement, in which the number of NA's within an object is counted per unit and if for every unit there is only NA, I'd like to do include = FALSE on the chunk. This would need to be repeated for ~50 chunks, therefore I tried to use eval.after.
Martin Schmelzer's comment made me realize I have 2 different problems:
1) I need to use regular expressions to detect the name of the object in a self-written function within the chunk.
2) I need to set up a function for conditionally evaluating eval.after in the chunks.
For Problem 1): The R-Chunk that needs to be checked for eval.after looks like this:
```{r leadership unit, eval=exclude_ifnot_unitC }
kable.unit.tblc(unitblc_leadership, caption = "Führung")
```
kable.unit.tblc(df, caption)is a self-written function that implements kableExtra()functions to style the tables and the first input is a dataframe (that was beforehand created in an R file). I should now use regular expression to extract the name of the dataframe out of the chunk, meaning everything from kable.unit.tblc(to , caption.
I tried this so far for first steps in regular expressions, but I'm not able to get the object "in between" those two expression:
x <- 'kable.unit.tblc(unitblc_leadership, caption = "Führung")'
stringr::str_extract(x, "^kable.unit.tblc\\(")
stringr::str_extract(x, ", caption")
The desired result of the extracted object would in this case be unitblc_leadership and stored in a variable, say test_object.
Regarding the second problem: I should set eval.after = 'include_if_valid' for those chunks and the function for testing this would be:
include_if_valid <- function() {
## search the chunk with regular expression for detecting the
# test object (Problem 1)
# count the number of NAs in all numeric variables of the
# test_object and if all cells are NA's give FALSE, if any
# cell has a value give TRUE
test_object %>%
select_if(is.numeric) %>%
summarise_all(.funs = list(~n.valid)) %>%
gather(key = "Unit", value = "nvalid") %>%
count(nvalid > 0) %>% pull(`nvalid > 0`)
as you can see, I need the test_object that should be derived with the function before - but I'm not sure if my intention is even possible.
The chunk should then look like something like this:
```{r leadership unit, eval.after=include_if_valid }
kable.unit.tblc(unitblc_leadership, caption = "Führung")
```
Edit: I thought too complicated - this solution by Martin worked just fine:
include_if_valid <- function(df) {
if (df %>%
select_if(is.numeric) %>%
summarise_all(.funs = list(~n.valid)) %>%
gather(key = "Unit", value = "nvalid") %>%
pull() %>% sum() > 0) {TRUE} else {FALSE}
}
and within the chunk:
{r leadership unit, eval=include_if_valid(unitblc_leadership) }
kable.unit.tblc(unitblc_leadership, caption = "Führung")

You can change the chunk option results to "hide", but this has to happen before you start evaluating the chunk (since eval.after is limited in which options it applies to). So to get what you want, you would need two chunks:
Compute enough to determine whether the chunk should be computed and displayed. Hide this one, in case no display is wanted.
In the next chunk, repeat calculations if you want to display them, and display the results, all conditional on the previously computed result.
Your example isn't reproducible, so here's a simple one. Suppose I only want to display x if its value is bigger than 10:
```{r include=FALSE}
# compute x as a random value between 9 and 11, but don't display anything
x <- runif(1, 9, 11)
```
```{r include = x > 10}
# display x conditional on its value being > 10
x
```

Here is a way to inject the data as a chunk option, check its validity and print a kable conditional on that result. Nice thing is that we can reference the first generic chunk and call it with a different dataframe.
With knit_hooks$set we create a new chunk hook named df. Everything inside if(before) will be evaluated before the chunk itself will be evaluated. The argument options contains all the chunk options set for the current chunk and envir is the chunk environment.
---
title: "Conditional Evaluation"
output: html_document
---
```{r setup, include = F}
library(dplyr)
library(knitr)
A <- data.frame(A = LETTERS[1:4])
B <- data.frame(B = rep(NA, 4))
C <- data.frame(C = letters[1:4])
include_if_valid <- function(df) {
return(all(!is.na(df)))
}
knit_hooks$set(df = function(before, options, envir) {
if (before) {
assign("valid", include_if_valid(options$df), envir = envir)
}
})
```
```{r generic, df = A, echo = F}
if(valid) kable(opts_current$get("df"))
```
```{r ref.label="generic", df = B, echo = F}
```
```{r ref.label="generic", df = C, echo = F}
```

Related

Convert list of different length into data table for markdown for html format

This is what Im doing to generate a markdown so that all the things should be in one place.
How can i put these output into a datatable form which are more readable and easier to search.The list which is made are of different length. Each list has a series of table under it.
If there a way to convert these differing length list to data table format that would be really helpful
The table looks like this
## Prepare for analyses
```{r,warning=FALSE,message=FALSE}
set.seed(1234)
library(europepmc)
library(tidypmc)
library(tidyverse)
#library(dplyr)
```
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
##Cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial. 828 records found, showing 10
```{r,include=FALSE}
b <-epmc_search(query = 'cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial OPEN_ACCESS:Y',limit = 10)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
```
```{r}
names(my_tables) <- pmcids
```
The code chunk input and output is then displayed as follows:
```{r basicconsole}
source("flat.R")
L1 <- flattenlist(my_tables)
l.f <- Filter(function(a) any(!is.na(a)), L1)
l.f
#tibble:::print.tbl_df(head(df))
#n <- paste0("Valporic_", names(l.f), ".txt")
for (i in 1:length(l.f)) {
write.table(l.f[i], sep = "\t",row.names = FALSE,col.names = TRUE,file=paste0(names(l.f)[i], ".txt"))
}
UPDATE
I have manged to covert those tibble into dataframe
using this solution
##Outout
```{r}
abc <- mapply(cbind, l.f)
abc
But when it is rendered in the markdown the column formatting is gone. Now i have now dataframe inside list.
But still im not sure how to put that into a data table
**UPDATE 2.0 **
The better approach is to read those saved output as list of files into data table and then use it as markdown but so far it is taking only one ID only. My code.
tbl_fread <-
list.files(pattern = "*.txt") %>%
map_df(~fread(.))
knitr::kable(head(tbl_fread), "pipe")
Is it possible to put these files as such.
if a list of file are from one PMCID then those would be all in one column such as if PMCID one has 3 output then all of them should be one the same row. Then the next PMCID in the second one etc etc.
UPDATE new
I have managed to align the output into more readable format. But It seems that by default all the files assigned to multiple columns which would be the case given that im reading all the files together since my idea of using the list to data table didn't work.
If i can push or stack each unique PMCID over one another instead of all in one after another that would be. Good
knitr::kable(tbl_fread, align = "lccrr")
This may be something you can adapt for R Markdown. I'm not sure what the rationale is to save and load the tables. Instead, you could obtain the tables and show in html directly.
As you are using HTML, make sure to have results='asis' in your chunk. You can use a for loop and seq_along to show each table. You can include information in your table caption, such as the PMCID as well as table number.
---
title: "test13121"
author: "Ben"
date: "1/31/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Libraries
```{r}
library(tidypmc)
library(tidyverse)
library(europepmc)
library(kableExtra)
```
# Get Articles
```{r, echo = FALSE}
b <-epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y',limit = 6)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
names(my_tables) <- pmcids
```
# Show Tables
```{r, echo=F, results='asis'}
for (i in seq_along(my_tables)) {
for (j in seq_along(my_tables[[i]])) {
print(kable(x = my_tables[[i]][[j]], caption = paste0(names(my_tables)[i], ": Table ", j)))
}
}
```

Inline referencing of a code chunk in document

Suppose there is a code chunk as follows:
```{r mean diff}
(5-mean(dnorm(40,5,2))/5
```
I would like to be able to reference this code chunk with its label in inline form in a markdown document, so that the reference is replaced by the output of this chunk. Is there a way to do it?
" the difference is `r mean diff` %." ##something like this?
I think the answer to your question is "yes, but it's tricky". By default knitr doesn't save the last value computed in a code chunk. In regular code, the last value calculated is saved in .Last.value, but knitr doesn't simulate this.
However, a simple modification lets you do something very similar:
```{r}
meandiff <- 5-mean(dnorm(40,5,2))/5
meandiff # if you want the chunk to print its value
```
and then in the text, use
" the difference is `r meandiff` %."
If you really want to save the last value, it's possible by setting a "render hook". For example, the code below saves the last value, then calls the old hook:
```{r}
.Last.value <- NULL
old_hook <- knitr::opts_chunk$get("render")
knitr::opts_chunk$set(render = function(x, options, ...) {
.Last.value <<- x
if (!is.null(old_hook))
old_hook(x, options, ...)
else
knitr::knit_print(x, options, ...)
})
```
```{r mean diff}
5-mean(dnorm(40,5,2))/5
```
The value was `r .Last.value`.

wrapping wide table in rmarkdown

I have a really wide table (300+ columns) and would like to display it by wrapping the columns. In the example I will just use 100 columns.
What I have in mind is repetitively using kable to display the subset of the table:
library(kableExtra)
set.seed(1)
data = data.frame(matrix(rnorm(300, 10, 1), ncol = 100))
kable(data[, 1:5], 'latex', booktabs = T)
kable(data[, 6:10], 'latex', booktabs = T)
kable(data[, 11:15], 'latex', booktabs = T)
But this is apparently tedious... I know there are scaling down options but since I have so many columns, it won't be possible.
Is there any parameter I can twist in kable to make it happen?
Updated:
#jay.sf 's answer seems working well, but it didn't yield the same result here. Instead I got some plain code - could you please have a second look and let me know where can I improve? Thanks!
my sessionInfo() is: R version 3.5.1 (2018-07-02) with rmarkdown::pandoc_version() of 1.19.2.1.
This question is actually trickier than I thought at first glance. I used some tidyverse functions, specifically dplyr::select to get columns and purrr::map to move along groups of column indices.
My thinking with this was to make a list of vectors of column indices to choose, such that the first list item is 1:20, the second is 21:40, and so on, in order to break the data into 20 tables of 5 columns each (the number you use can be a different factor of ncol(data)). I underestimated the work to do that, but got ideas from an old SO post to rep the numbers 1 to 20 along the number of columns, sort it, and use that as the grouping then to split the columns.
Then each of those vectors becomes the column indices in select. The resulting list of data frames each gets passed to knitr::kable and kableExtra::kable_styling. Leaving things off there would get map's default of printing names as well, which isn't ideal, so I added a call to purrr::walk to print them neatly.
Note also that making the kable'd tables this way meant putting results="asis" in the chunk options.
---
title: "knitr chunked"
output: pdf_document
---
```{r include=FALSE}
library(knitr)
library(kableExtra)
library(dplyr)
library(purrr)
set.seed(1)
data = data.frame(matrix(rnorm(300, 10, 1), ncol = 100))
```
```{r results='asis'}
split(1:ncol(data), sort(rep_len(1:20, ncol(data)))) %>%
map(~select(data, .)) %>%
map(kable, booktabs = T) %>%
map(kable_styling) %>%
walk(print)
```
Top of the PDF output:
You could use a matrix containing your columns numbers and give it into a for loop with the cat function inside.
---
output: pdf_document
---
```{r, results="asis", echo=FALSE}
library(kableExtra)
set.seed(1)
dat <- data.frame(matrix(rnorm(300, 10, 1), ncol=100))
m <- matrix(1:ncol(dat), 5)
for (i in 1:ncol(m)) {
cat(kable(dat[, m[, i]], 'latex', booktabs=TRUE), "\\newline")
}
```
Result

rstudio hangs and aborts with rmarkdown loop

I have several datasets each of which have a common grouping factor. I want to produce one large report with separate sections for each grouping factor. Therefore I want to re-run a set of rmarkdown code for each iteration of the grouping factor.
Using the following approach from here doesnt work for me. i.e.:
---
title: "Untitled"
author: "Author"
output: html_document
---
```{r, results='asis'}
for (i in 1:2){
cat('\n')
cat("#This is a heading for ", i, "\n")
hist(cars[,i])
cat('\n')
}
```
Because the markdown I want to run on each grouping factor does not easily fit within one code chunk. The report must be ordered by grouping factor and I want to be able to come in and out of code chunks for each iteration over grouping factor.
So I went for calling an Rmd. with render using a loop from an Rscript for each grouping factor as found here:
# run a markdown file to summarise each one.
for(each_group in the_groups){
render("/Users/path/xx.Rmd",
output_format = "pdf_document",
output_file = paste0(each_group,"_report_", Sys.Date(),".pdf"),
output_dir = "/Users/path/folder")
}
My plan was to then combine the individual reports with pdftk. However, when I get to the about the 5th iteration my Rstudio session hangs and eventually aborts with a fatal error. I have ran individually the Rmd. for the grouping factors it stops at which work fine.
I tested some looping with the following simple test files:
.R
# load packages
library(knitr)
library(markdown)
library(rmarkdown)
# use first 5 rows of mtcars as example data
mtcars <- mtcars[1:5,]
# for each type of car in the data create a report
# these reports are saved in output_dir with the name specified by output_file
for (car in rep(unique(rownames(mtcars)), 100)){
# for pdf reports
rmarkdown::render(input = "/Users/xx/Desktop/2.Rmd",
output_format = "pdf_document",
output_file = paste("test_report_", car, Sys.Date(), ".pdf", sep=''),
output_dir = "/Users/xx/Desktop")
}
.Rmd
```{r, include = FALSE}
# packages
library(knitr)
library(markdown)
library(rmarkdown)
library(tidyr)
library(dplyr)
library(ggplot2)
```
```{r}
# limit data to car name that is currently specified by the loop
cars <- mtcars[rownames(mtcars)==car,]
# create example data for each car
x <- sample(1:10, 1)
cars <- do.call("rbind", replicate(x, cars, simplify = FALSE))
# create hypotheical lat and lon for each row in cars
cars$lat <- sapply(rownames(cars), function(x) round(runif(1, 30, 46), 3))
cars$lon <- sapply(rownames(cars), function(x) round(runif(1, -115, -80),3))
cars
```
Today is `r Sys.Date()`.
```{r}
# data table of cars sold
table <- xtable(cars[,c(1:2, 12:13)])
print(table, type="latex", comment = FALSE)
```
This works fine. So I also looked at memory pressure while running my actual loop over the Rmd. which gets very high.
Is there a way to reduce memory when looping over a render call to an Rmd. file?
Is there a better way to create a report for multiple grouping factors than looping over a render call to an Rmd. file, which doesn't rely on the entire loop being inside one code chunk?
Found a solution here rmarkdown::render() in a loop - cannot allocate vector of size
knitr::knit_meta(class=NULL, clean = TRUE)
use this line before the render line and it seems to work
I am dealing with the same issue now and it's very perplexing. I tried to create some simple MWEs but they loop successfully on occasion. So far, I've tried
Checking the garbage collection between iterations of rmarkdown::render. (They don't reveal any special accumulations.)
Removing all inessential objects
Deleting any cached files manually
Here is my question:
How can we debug hangs? Should we set up special log files to understand what's going wrong?

When selecting multiple inputs in a shiny app, why does the error depend on the order that the inputs were selected?

I have a flexdashboard document with runtime: shiny (I posted the app here https://arie.shinyapps.io/reproducible_example/ and embedded the code, but wanted to put the code below as well in case the app exceeds its allotted usage on shinyapps.io):
---
title: "Example"
runtime: shiny
output:
flexdashboard::flex_dashboard:
source_code: embed
---
Given the following example data set:
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
df <- tibble(name = c("peter", "paul", "mary"),
value = c(1:3))
```
I want to be able to make multiple selections from the following user interface:
Column {data-width=250, .sidebar}
-----------------------------------------------------------------------
```{r}
# creates interface
selectInput("name_input", label = "Name", choices = df$name,
selected = NULL, multiple = TRUE, selectize = TRUE)
```
and have a ggplot "react" to the selections. So I make a reactive data set:
```{r}
# reactive data
df_reactive <- reactive(df[df$name == input$name_input,])
```
and create the following plot:
Column {data-width=750}
-----------------------------------------------------------------------
### Chart B
```{r}
renderPlot(
ggplot(df_reactive(), aes(x = input$name_input, y = value) ) +
geom_col()
)
```
Now, when I Run Document and select first peter, then paul, and then mary, the plot reacts exactly as expected: It adds a bar each time a name is added. The problem occurs when I, for example, first select paul and then peter, which throws the error Aesthetics must be either length 1 or the same as the data (2): x, y.
The error makes sense to me in the context of a static chart, but I am confused about why the order of selecting the names should matter and how it can be resolved.
The problem is within:
df_reactive <- reactive(df[df$name == input$name_input,])
If length(input$name_input) is < 3 you will try to compare two arrays of different length. R will throw an error and it is also not the test you actually want to perform.
As I see it, you want to test for each element in df$name if it is included in input$name_input. Luckily there is a shortcut for that in R, so you wont have to use a for loop or sapply(),...
Like I wrote in the comments: df_reactive <- reactive(df[df$name %in% input$name_input,]) will work as well.
For more details concerning the notation, i would refer to an existing answer as the answer would become more of a duplicate then.
The difference between == and %in% is explained here:
difference between `%in%` VS `==`

Resources