I'm using Rmarkdown to produce a PDF of frequency tables. Producing a complex frequency table after running freq from questionr and adding row groupings with group_rows leads to an alignment problem on the last line of the first group. Reproducible example here:
---
output:
pdf_document:
latex_engine: xelatex
fig_caption: true
---
```{r}
library(haven)
library(questionr)
library(dplyr)
library(magrittr)
library(knitr)
library(kableExtra)
# Build some data
x <- rep(c(1,0),times=50)
y <- c(rep(1,times=25),rep(0,times=75))
z <- c(rep(1,times=75),rep(0,times=25))
# Function to run frequencies on several variables at a time
MassFreq <- function(...){
step1 <- list(...) # Wrap items into a list
step2 <- lapply(step1,freq,total=TRUE) # run frequencies on all items
step3 <- bind_rows(step2) # collapse list results into single df
Response <- unlist(lapply(step2,row.names),recursive=FALSE) # Get row names from frequencies
step4 <- cbind(Response,step3) #Stick row names at front of the dataframe
}
# Run function - returns a data frame object
test <- MassFreq(x,y,z)
# Build table
test %>%
kable(format="latex", booktabs = TRUE, row.names=FALSE) %>%
group_rows("Group 1",1,3) %>%
group_rows("Group 2",4,6) %>%
group_rows("Group 3",7,9)
```
Gives me this upon knitting:
The first "Total" text is right-aligned, but everything else is fine. Adding align=('lrrr') in the kable line does nothing, and align=('crrr') is kind of a mess. Using the index method for group_rows produces the same results. When leaving out the group_rows commands, everything in the first column is left-aligned and looks fine. My hunch is that kableExtra isn't playing well with questionr because the "Total" rows are created when running questionr::freq.
This is a bug in current CRAN version of kableExtra, 0.5.2. It has been fixed in the dev version. I will make a CRAN release next week.
Related
Below is my code that I am using to manually create a table. When I knit in R markdown, there is a border about it and ## in front of each row. Is there a way to remove these items? Or would I be best creating data frames using these data points and then using gt?
library(dplyr)
comparison <- matrix(c(-2267,-345916,-185344,-44.4,-24.0,-57.1,"+1224","+191534","+80,347","51.4","55.4","43.4"),ncol=3,byrow=TRUE)
colnames(comparison) <- c("Daycare Services","Total, All Industries","Accommodation and Food Services")
rownames(comparison) <- c("Decline","% Decline","Recovery","% Recovery")
comparison <- as.table(comparison)
comparison
Here is how I create it. Create a new R markdown file. Next, I put the code chunk in the first chunk (look below) and then enter the word, "comparison" in the place of summary(cars) below in the section that appears as such
knitr::opts_chunk$set(echo = TRUE)
comparison <- matrix(c(-2267,-345916,-185344,-44.4,-24.0,-57.1,"+1224","+191534","+80,347","51.4","55.4","43.4"),ncol=3,byrow=TRUE)
colnames(comparison) <- c("Daycare Services","Total, All Industries","Accommodation and Food Services")
rownames(comparison) <- c("Decline","% Decline","Recovery","% Recovery")
comparison <- as.table(comparison)
comparison
summary(cars)
If we are using gt, then we do
```{r}
library(gt)
library(dplyr)
comparison <- matrix(c(-2267,-345916,-185344,-44.4,-24.0,-57.1,"+1224","+191534","+80,347","51.4","55.4","43.4"),ncol=3,byrow=TRUE)
colnames(comparison) <- c("Daycare Services","Total, All Industries","Accommodation and Food Services")
rownames(comparison) <- c("Decline","% Decline","Recovery","% Recovery")
out <- as.data.frame(comparison)
gt(out, rownames_to_stub = TRUE)
```
-output
I have a really wide table (300+ columns) and would like to display it by wrapping the columns. In the example I will just use 100 columns.
What I have in mind is repetitively using kable to display the subset of the table:
library(kableExtra)
set.seed(1)
data = data.frame(matrix(rnorm(300, 10, 1), ncol = 100))
kable(data[, 1:5], 'latex', booktabs = T)
kable(data[, 6:10], 'latex', booktabs = T)
kable(data[, 11:15], 'latex', booktabs = T)
But this is apparently tedious... I know there are scaling down options but since I have so many columns, it won't be possible.
Is there any parameter I can twist in kable to make it happen?
Updated:
#jay.sf 's answer seems working well, but it didn't yield the same result here. Instead I got some plain code - could you please have a second look and let me know where can I improve? Thanks!
my sessionInfo() is: R version 3.5.1 (2018-07-02) with rmarkdown::pandoc_version() of 1.19.2.1.
This question is actually trickier than I thought at first glance. I used some tidyverse functions, specifically dplyr::select to get columns and purrr::map to move along groups of column indices.
My thinking with this was to make a list of vectors of column indices to choose, such that the first list item is 1:20, the second is 21:40, and so on, in order to break the data into 20 tables of 5 columns each (the number you use can be a different factor of ncol(data)). I underestimated the work to do that, but got ideas from an old SO post to rep the numbers 1 to 20 along the number of columns, sort it, and use that as the grouping then to split the columns.
Then each of those vectors becomes the column indices in select. The resulting list of data frames each gets passed to knitr::kable and kableExtra::kable_styling. Leaving things off there would get map's default of printing names as well, which isn't ideal, so I added a call to purrr::walk to print them neatly.
Note also that making the kable'd tables this way meant putting results="asis" in the chunk options.
---
title: "knitr chunked"
output: pdf_document
---
```{r include=FALSE}
library(knitr)
library(kableExtra)
library(dplyr)
library(purrr)
set.seed(1)
data = data.frame(matrix(rnorm(300, 10, 1), ncol = 100))
```
```{r results='asis'}
split(1:ncol(data), sort(rep_len(1:20, ncol(data)))) %>%
map(~select(data, .)) %>%
map(kable, booktabs = T) %>%
map(kable_styling) %>%
walk(print)
```
Top of the PDF output:
You could use a matrix containing your columns numbers and give it into a for loop with the cat function inside.
---
output: pdf_document
---
```{r, results="asis", echo=FALSE}
library(kableExtra)
set.seed(1)
dat <- data.frame(matrix(rnorm(300, 10, 1), ncol=100))
m <- matrix(1:ncol(dat), 5)
for (i in 1:ncol(m)) {
cat(kable(dat[, m[, i]], 'latex', booktabs=TRUE), "\\newline")
}
```
Result
I'm creating a parametrized report in Rmarkdown, whereas some chunks should not be evaluated (included in the report) based on characteristics of the content within the chunk.
The report calculates individual summaries on a large survey for ~120 facilities with different numbers of units in them. Additionally unit size and volume is largely variable, therefore we exclude unit-analysis if the number of valid answers per unit is less than 10 (this is already recoded to NA in the dataframe-object). I therefore need to write a statement, in which the number of NA's within an object is counted per unit and if for every unit there is only NA, I'd like to do include = FALSE on the chunk. This would need to be repeated for ~50 chunks, therefore I tried to use eval.after.
Martin Schmelzer's comment made me realize I have 2 different problems:
1) I need to use regular expressions to detect the name of the object in a self-written function within the chunk.
2) I need to set up a function for conditionally evaluating eval.after in the chunks.
For Problem 1): The R-Chunk that needs to be checked for eval.after looks like this:
```{r leadership unit, eval=exclude_ifnot_unitC }
kable.unit.tblc(unitblc_leadership, caption = "Führung")
```
kable.unit.tblc(df, caption)is a self-written function that implements kableExtra()functions to style the tables and the first input is a dataframe (that was beforehand created in an R file). I should now use regular expression to extract the name of the dataframe out of the chunk, meaning everything from kable.unit.tblc(to , caption.
I tried this so far for first steps in regular expressions, but I'm not able to get the object "in between" those two expression:
x <- 'kable.unit.tblc(unitblc_leadership, caption = "Führung")'
stringr::str_extract(x, "^kable.unit.tblc\\(")
stringr::str_extract(x, ", caption")
The desired result of the extracted object would in this case be unitblc_leadership and stored in a variable, say test_object.
Regarding the second problem: I should set eval.after = 'include_if_valid' for those chunks and the function for testing this would be:
include_if_valid <- function() {
## search the chunk with regular expression for detecting the
# test object (Problem 1)
# count the number of NAs in all numeric variables of the
# test_object and if all cells are NA's give FALSE, if any
# cell has a value give TRUE
test_object %>%
select_if(is.numeric) %>%
summarise_all(.funs = list(~n.valid)) %>%
gather(key = "Unit", value = "nvalid") %>%
count(nvalid > 0) %>% pull(`nvalid > 0`)
as you can see, I need the test_object that should be derived with the function before - but I'm not sure if my intention is even possible.
The chunk should then look like something like this:
```{r leadership unit, eval.after=include_if_valid }
kable.unit.tblc(unitblc_leadership, caption = "Führung")
```
Edit: I thought too complicated - this solution by Martin worked just fine:
include_if_valid <- function(df) {
if (df %>%
select_if(is.numeric) %>%
summarise_all(.funs = list(~n.valid)) %>%
gather(key = "Unit", value = "nvalid") %>%
pull() %>% sum() > 0) {TRUE} else {FALSE}
}
and within the chunk:
{r leadership unit, eval=include_if_valid(unitblc_leadership) }
kable.unit.tblc(unitblc_leadership, caption = "Führung")
You can change the chunk option results to "hide", but this has to happen before you start evaluating the chunk (since eval.after is limited in which options it applies to). So to get what you want, you would need two chunks:
Compute enough to determine whether the chunk should be computed and displayed. Hide this one, in case no display is wanted.
In the next chunk, repeat calculations if you want to display them, and display the results, all conditional on the previously computed result.
Your example isn't reproducible, so here's a simple one. Suppose I only want to display x if its value is bigger than 10:
```{r include=FALSE}
# compute x as a random value between 9 and 11, but don't display anything
x <- runif(1, 9, 11)
```
```{r include = x > 10}
# display x conditional on its value being > 10
x
```
Here is a way to inject the data as a chunk option, check its validity and print a kable conditional on that result. Nice thing is that we can reference the first generic chunk and call it with a different dataframe.
With knit_hooks$set we create a new chunk hook named df. Everything inside if(before) will be evaluated before the chunk itself will be evaluated. The argument options contains all the chunk options set for the current chunk and envir is the chunk environment.
---
title: "Conditional Evaluation"
output: html_document
---
```{r setup, include = F}
library(dplyr)
library(knitr)
A <- data.frame(A = LETTERS[1:4])
B <- data.frame(B = rep(NA, 4))
C <- data.frame(C = letters[1:4])
include_if_valid <- function(df) {
return(all(!is.na(df)))
}
knit_hooks$set(df = function(before, options, envir) {
if (before) {
assign("valid", include_if_valid(options$df), envir = envir)
}
})
```
```{r generic, df = A, echo = F}
if(valid) kable(opts_current$get("df"))
```
```{r ref.label="generic", df = B, echo = F}
```
```{r ref.label="generic", df = C, echo = F}
```
My data frame has ugly column names, but when displaying the table in my report, I want to their "real" names including special characters '(', new lines, greek letters, repeated names, etc.
Is there an easy way of replacing the names in knitr to allow such formatting?
Proposed solution
What I have tried to do is suppress the printing of the data frame names and use add_header_above for better names and names that span several columns. Some advice I've seen says to use:
x <- kable(df)
gsub("<thead>.*</thead>", "", x)
to remove the column names. That's fine, but the issue is that when I subsequently add_header_above, the original column names come back. If I use col.names=rep('',times=ncol(d.df)) in kable(...) the names are gone but the row remains, leaving a gap between my new column names and the table body. Here's a code chunk to illustrate:
```{r functions,echo=T}
drawTable <- function(d.df,caption='Given',hdr.above){
require(knitr)
require(kableExtra)
require(dplyr)
hdr.2 <- rep(c('Value','Rank'),times=ncol(d.df)/2)
x <- knitr::kable(d.df,format='latex',align='c',
col.names=rep('',times=ncol(d.df))) %>%
kable_styling(bootstrap_options=c('striped','hover',
'condensed','responsive'),position='center',
font_size = 9,full_width=F)
x %>% add_header_above(hdr.2) %>%
add_header_above(hdr.above)
}
```
```{r}
df <- data.frame(A=c(1,2),B=c(4,2),C=c(3,4),D=c(8,7))
hdr.above <- c('A2','B2','C2','D2')
drawTable(df,hdr.above = hdr.above)
```
I am not sure where you got the advice to replace rownames, but it seems excessively complex. It is much easier just to use the built-in col.names argument within kable. This solution works for both HTML and LaTeX outputs:
---
output:
pdf_document: default
html_document: default
---
```{r functions,echo=T}
require(knitr)
df <- data.frame(A=c(1,2),B=c(4,2),C=c(3,4),D=c(8,7))
knitr::kable(df,
col.names = c("Space in name",
"(Special Characters)",
"$\\delta{m}_1$",
"Space in name"))
```
PDF output:
HTML output:
If you're targeting HTML, then Δ is an option too.
I couldn't get the accepted answer to work on HTML, so used the above.
I have several datasets each of which have a common grouping factor. I want to produce one large report with separate sections for each grouping factor. Therefore I want to re-run a set of rmarkdown code for each iteration of the grouping factor.
Using the following approach from here doesnt work for me. i.e.:
---
title: "Untitled"
author: "Author"
output: html_document
---
```{r, results='asis'}
for (i in 1:2){
cat('\n')
cat("#This is a heading for ", i, "\n")
hist(cars[,i])
cat('\n')
}
```
Because the markdown I want to run on each grouping factor does not easily fit within one code chunk. The report must be ordered by grouping factor and I want to be able to come in and out of code chunks for each iteration over grouping factor.
So I went for calling an Rmd. with render using a loop from an Rscript for each grouping factor as found here:
# run a markdown file to summarise each one.
for(each_group in the_groups){
render("/Users/path/xx.Rmd",
output_format = "pdf_document",
output_file = paste0(each_group,"_report_", Sys.Date(),".pdf"),
output_dir = "/Users/path/folder")
}
My plan was to then combine the individual reports with pdftk. However, when I get to the about the 5th iteration my Rstudio session hangs and eventually aborts with a fatal error. I have ran individually the Rmd. for the grouping factors it stops at which work fine.
I tested some looping with the following simple test files:
.R
# load packages
library(knitr)
library(markdown)
library(rmarkdown)
# use first 5 rows of mtcars as example data
mtcars <- mtcars[1:5,]
# for each type of car in the data create a report
# these reports are saved in output_dir with the name specified by output_file
for (car in rep(unique(rownames(mtcars)), 100)){
# for pdf reports
rmarkdown::render(input = "/Users/xx/Desktop/2.Rmd",
output_format = "pdf_document",
output_file = paste("test_report_", car, Sys.Date(), ".pdf", sep=''),
output_dir = "/Users/xx/Desktop")
}
.Rmd
```{r, include = FALSE}
# packages
library(knitr)
library(markdown)
library(rmarkdown)
library(tidyr)
library(dplyr)
library(ggplot2)
```
```{r}
# limit data to car name that is currently specified by the loop
cars <- mtcars[rownames(mtcars)==car,]
# create example data for each car
x <- sample(1:10, 1)
cars <- do.call("rbind", replicate(x, cars, simplify = FALSE))
# create hypotheical lat and lon for each row in cars
cars$lat <- sapply(rownames(cars), function(x) round(runif(1, 30, 46), 3))
cars$lon <- sapply(rownames(cars), function(x) round(runif(1, -115, -80),3))
cars
```
Today is `r Sys.Date()`.
```{r}
# data table of cars sold
table <- xtable(cars[,c(1:2, 12:13)])
print(table, type="latex", comment = FALSE)
```
This works fine. So I also looked at memory pressure while running my actual loop over the Rmd. which gets very high.
Is there a way to reduce memory when looping over a render call to an Rmd. file?
Is there a better way to create a report for multiple grouping factors than looping over a render call to an Rmd. file, which doesn't rely on the entire loop being inside one code chunk?
Found a solution here rmarkdown::render() in a loop - cannot allocate vector of size
knitr::knit_meta(class=NULL, clean = TRUE)
use this line before the render line and it seems to work
I am dealing with the same issue now and it's very perplexing. I tried to create some simple MWEs but they loop successfully on occasion. So far, I've tried
Checking the garbage collection between iterations of rmarkdown::render. (They don't reveal any special accumulations.)
Removing all inessential objects
Deleting any cached files manually
Here is my question:
How can we debug hangs? Should we set up special log files to understand what's going wrong?