wrapping wide table in rmarkdown - r

I have a really wide table (300+ columns) and would like to display it by wrapping the columns. In the example I will just use 100 columns.
What I have in mind is repetitively using kable to display the subset of the table:
library(kableExtra)
set.seed(1)
data = data.frame(matrix(rnorm(300, 10, 1), ncol = 100))
kable(data[, 1:5], 'latex', booktabs = T)
kable(data[, 6:10], 'latex', booktabs = T)
kable(data[, 11:15], 'latex', booktabs = T)
But this is obviously tedious. I know there are options for scaling the table down, but with so many columns that won't work.
Is there any parameter I can tweak in kable to make this happen?
Updated:
@jay.sf's answer seems to work well, but it didn't yield the same result here. Instead I got some plain code. Could you please have a second look and let me know what I can improve? Thanks!
my sessionInfo() is: R version 3.5.1 (2018-07-02) with rmarkdown::pandoc_version() of 1.19.2.1.

This question is actually trickier than I thought at first glance. I used some tidyverse functions, specifically dplyr::select to get columns and purrr::map to move along groups of column indices.
My thinking with this was to make a list of vectors of column indices to choose, such that the first list item is 1:5, the second is 6:10, and so on, in order to break the data into 20 tables of 5 columns each (the number of tables can be a different factor of ncol(data)). I underestimated the work to do that, but got ideas from an old SO post: rep the numbers 1 to 20 along the number of columns, sort the result, and use that as the grouping to split the column indices.
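To make the grouping step concrete, here is a minimal base R sketch of how the index list is built. The names n_cols, group_id and groups are only illustrative, and 100 columns split into 20 groups are assumed to match the example data.

```r
# Number of columns in the example data frame (assumed: 100)
n_cols <- 100

# Repeat 1..20 along the columns, then sort so equal ids sit next to each other
group_id <- sort(rep_len(1:20, n_cols))

# Split the column indices by group id: one integer vector per table
groups <- split(seq_len(n_cols), group_id)

length(groups)   # 20
groups[[1]]      # 1 2 3 4 5
groups[[20]]     # 96 97 98 99 100
```

If the number of columns is not a multiple of 20, rep_len still works; some of the first groups simply receive one extra column.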
Then each of those vectors becomes the column indices in select. The resulting list of data frames each gets passed to knitr::kable and kableExtra::kable_styling. Leaving things off there would get map's default of printing names as well, which isn't ideal, so I added a call to purrr::walk to print them neatly.
Note also that making the kable'd tables this way meant putting results="asis" in the chunk options.
---
title: "knitr chunked"
output: pdf_document
---
```{r include=FALSE}
library(knitr)
library(kableExtra)
library(dplyr)
library(purrr)
set.seed(1)
data = data.frame(matrix(rnorm(300, 10, 1), ncol = 100))
```
```{r results='asis'}
split(1:ncol(data), sort(rep_len(1:20, ncol(data)))) %>%
map(~select(data, .)) %>%
map(kable, booktabs = T) %>%
map(kable_styling) %>%
walk(print)
```
Top of the PDF output:

You could build a matrix of your column numbers, with one group of columns per matrix column, and loop over it with a for loop that calls cat on each kable.
---
output: pdf_document
---
```{r, results="asis", echo=FALSE}
library(kableExtra)
set.seed(1)
dat <- data.frame(matrix(rnorm(300, 10, 1), ncol=100))
m <- matrix(1:ncol(dat), 5)
for (i in 1:ncol(m)) {
cat(kable(dat[, m[, i]], 'latex', booktabs=TRUE), "\\newline")
}
```
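For reference, this is roughly what the index matrix m looks like for 100 columns. The sketch also shows one possible way to handle a column count that is not a multiple of 5, where matrix() would recycle indices and emit a warning. chunk_cols is a hypothetical helper name used for illustration, not part of any package.

```r
# Each column of m holds the indices for one 5-column table
m <- matrix(1:100, nrow = 5)
dim(m)      # 5 20
m[, 1]      # 1 2 3 4 5
m[, 20]     # 96 97 98 99 100

# Hypothetical helper: split 1..n_cols into chunks of at most `size` indices,
# so the last chunk may be shorter instead of being recycled
chunk_cols <- function(n_cols, size = 5) {
  idx <- seq_len(n_cols)
  split(idx, ceiling(idx / size))
}

chunks <- chunk_cols(103)
length(chunks)   # 21
chunks[[21]]     # 101 102 103
```

Looping over chunks instead of the columns of m would then give a final, narrower table rather than repeated columns.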

Convert list of different length into data table for markdown for html format

This is what I'm doing to generate a Markdown report so that everything is in one place.
How can I put this output into a data table, which would be more readable and easier to search? The lists are of different lengths, and each list has a series of tables under it.
If there is a way to convert these lists of differing lengths to a data table format, that would be really helpful.
The table looks like this
## Prepare for analyses
```{r,warning=FALSE,message=FALSE}
set.seed(1234)
library(europepmc)
library(tidypmc)
library(tidyverse)
#library(dplyr)
```
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
##Cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial. 828 records found, showing 10
```{r,include=FALSE}
b <-epmc_search(query = 'cytarabine cytogenetically normal aml adult clinical trial Randomized Controlled Trial OPEN_ACCESS:Y',limit = 10)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
```
```{r}
names(my_tables) <- pmcids
```
The code chunk input and output is then displayed as follows:
```{r basicconsole}
source("flat.R")
L1 <- flattenlist(my_tables)
l.f <- Filter(function(a) any(!is.na(a)), L1)
l.f
#tibble:::print.tbl_df(head(df))
#n <- paste0("Valporic_", names(l.f), ".txt")
for (i in 1:length(l.f)) {
write.table(l.f[i], sep = "\t",row.names = FALSE,col.names = TRUE,file=paste0(names(l.f)[i], ".txt"))
}
```
UPDATE
I have managed to convert those tibbles into data frames using this solution:
## Output
```{r}
abc <- mapply(cbind, l.f)
abc
```
But when it is rendered in the Markdown document, the column formatting is gone. I now have data frames inside a list.
But I'm still not sure how to put that into a data table.
**UPDATE 2.0**
A better approach might be to read the saved output files as a list into a data table and then use that in the Markdown document, but so far it picks up only one ID. My code:
tbl_fread <-
list.files(pattern = "*.txt") %>%
map_df(~fread(.))
knitr::kable(head(tbl_fread), "pipe")
Is it possible to lay these files out as follows?
If several files come from one PMCID, they should all be in one column; for example, if the first PMCID has 3 outputs, all of them should be on the same row. The next PMCID then goes on the second row, and so on.
UPDATE new
I have managed to align the output into a more readable format. But by default the files are spread across multiple columns, which is to be expected given that I'm reading all the files together, since my idea of converting the list to a data table didn't work.
It would be good if I could stack the tables for each unique PMCID on top of one another instead of placing them all one after another.
knitr::kable(tbl_fread, align = "lccrr")
This may be something you can adapt for R Markdown. I'm not sure what the rationale is to save and load the tables. Instead, you could obtain the tables and show in html directly.
As you are using HTML, make sure to have results='asis' in your chunk. You can use a for loop and seq_along to show each table. You can include information in your table caption, such as the PMCID as well as table number.
---
title: "test13121"
author: "Ben"
date: "1/31/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Libraries
```{r}
library(tidypmc)
library(tidyverse)
library(europepmc)
library(kableExtra)
```
# Get Articles
```{r, echo = FALSE}
b <-epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y',limit = 6)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
docs <- map(pmcids, epmc_ftxt)
my_tables <- map(docs, pmc_table)
names(my_tables) <- pmcids
```
# Show Tables
```{r, echo=F, results='asis'}
for (i in seq_along(my_tables)) {
for (j in seq_along(my_tables[[i]])) {
print(kable(x = my_tables[[i]][[j]], caption = paste0(names(my_tables)[i], ": Table ", j)))
}
}
```

How to conditionally exclude a chunk after evaluating its content?

I'm creating a parametrized report in R Markdown, where some chunks should not be evaluated (included in the report) based on characteristics of the content within the chunk.
The report calculates individual summaries on a large survey for ~120 facilities with different numbers of units in them. Additionally unit size and volume is largely variable, therefore we exclude unit-analysis if the number of valid answers per unit is less than 10 (this is already recoded to NA in the dataframe-object). I therefore need to write a statement, in which the number of NA's within an object is counted per unit and if for every unit there is only NA, I'd like to do include = FALSE on the chunk. This would need to be repeated for ~50 chunks, therefore I tried to use eval.after.
Martin Schmelzer's comment made me realize I have 2 different problems:
1) I need to use regular expressions to detect the name of the object in a self-written function within the chunk.
2) I need to set up a function for conditionally evaluating eval.after in the chunks.
For Problem 1): The R-Chunk that needs to be checked for eval.after looks like this:
```{r leadership unit, eval=exclude_ifnot_unitC }
kable.unit.tblc(unitblc_leadership, caption = "Führung")
```
kable.unit.tblc(df, caption) is a self-written function that uses kableExtra functions to style the tables; its first input is a data frame (created beforehand in an R file). I now want to use a regular expression to extract the name of the data frame from the chunk, meaning everything between kable.unit.tblc( and , caption.
So far I have tried this as a first step with regular expressions, but I'm not able to get the object "in between" those two expressions:
x <- 'kable.unit.tblc(unitblc_leadership, caption = "Führung")'
stringr::str_extract(x, "^kable.unit.tblc\\(")
stringr::str_extract(x, ", caption")
The desired result of the extracted object would in this case be unitblc_leadership and stored in a variable, say test_object.
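One possible way to pull out the name is to capture the text between the two markers with a single pattern instead of matching each marker separately. This is only a sketch in base R; it assumes the call always has the form kable.unit.tblc(<name>, caption = ...) and that the data frame name itself contains no comma.

```r
x <- 'kable.unit.tblc(unitblc_leadership, caption = "Führung")'

# Escape the dots and the opening parenthesis, capture lazily up to ", caption",
# and replace the whole string with the captured group
test_object <- sub("^kable\\.unit\\.tblc\\((.*?),\\s*caption.*$", "\\1", x, perl = TRUE)

test_object   # "unitblc_leadership"
```

If the object exists in the session, get(test_object) would then return the data frame itself.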
Regarding the second problem: I should set eval.after = 'include_if_valid' for those chunks and the function for testing this would be:
include_if_valid <- function() {
## search the chunk with regular expression for detecting the
# test object (Problem 1)
# count the number of NAs in all numeric variables of the
# test_object and if all cells are NA's give FALSE, if any
# cell has a value give TRUE
test_object %>%
select_if(is.numeric) %>%
summarise_all(.funs = list(~n.valid)) %>%
gather(key = "Unit", value = "nvalid") %>%
count(nvalid > 0) %>% pull(`nvalid > 0`)
}
As you can see, I need the test_object that should be derived in the previous step, but I'm not sure if what I intend is even possible.
The chunk should then look something like this:
```{r leadership unit, eval.after=include_if_valid }
kable.unit.tblc(unitblc_leadership, caption = "Führung")
```
Edit: I thought too complicated - this solution by Martin worked just fine:
include_if_valid <- function(df) {
if (df %>%
select_if(is.numeric) %>%
summarise_all(.funs = list(~n.valid)) %>%
gather(key = "Unit", value = "nvalid") %>%
pull() %>% sum() > 0) {TRUE} else {FALSE}
}
and within the chunk:
```{r leadership unit, eval=include_if_valid(unitblc_leadership) }
kable.unit.tblc(unitblc_leadership, caption = "Führung")
```
You can change the chunk option results to "hide", but this has to happen before you start evaluating the chunk (since eval.after is limited in which options it applies to). So to get what you want, you would need two chunks:
1) Compute enough to determine whether the chunk should be computed and displayed. Hide this one, in case no display is wanted.
2) In the next chunk, repeat calculations if you want to display them, and display the results, all conditional on the previously computed result.
Your example isn't reproducible, so here's a simple one. Suppose I only want to display x if its value is bigger than 10:
```{r include=FALSE}
# compute x as a random value between 9 and 11, but don't display anything
x <- runif(1, 9, 11)
```
```{r include = x > 10}
# display x conditional on its value being > 10
x
```
Here is a way to inject the data as a chunk option, check its validity and print a kable conditional on that result. Nice thing is that we can reference the first generic chunk and call it with a different dataframe.
With knit_hooks$set we create a new chunk hook named df. Everything inside if(before) will be evaluated before the chunk itself will be evaluated. The argument options contains all the chunk options set for the current chunk and envir is the chunk environment.
---
title: "Conditional Evaluation"
output: html_document
---
```{r setup, include = F}
library(dplyr)
library(knitr)
A <- data.frame(A = LETTERS[1:4])
B <- data.frame(B = rep(NA, 4))
C <- data.frame(C = letters[1:4])
include_if_valid <- function(df) {
return(all(!is.na(df)))
}
knit_hooks$set(df = function(before, options, envir) {
if (before) {
assign("valid", include_if_valid(options$df), envir = envir)
}
})
```
```{r generic, df = A, echo = F}
if(valid) kable(opts_current$get("df"))
```
```{r ref.label="generic", df = B, echo = F}
```
```{r ref.label="generic", df = C, echo = F}
```

Replace column names in kable/R markdown

My data frame has ugly column names, but when displaying the table in my report, I want to show their "real" names, including special characters such as '(', new lines, Greek letters, repeated names, etc.
Is there an easy way of replacing the names in knitr to allow such formatting?
Proposed solution
What I have tried to do is suppress the printing of the data frame names and use add_header_above for better names and names that span several columns. Some advice I've seen says to use:
x <- kable(df)
gsub("<thead>.*</thead>", "", x)
to remove the column names. That's fine, but the issue is that when I subsequently add_header_above, the original column names come back. If I use col.names=rep('',times=ncol(d.df)) in kable(...) the names are gone but the row remains, leaving a gap between my new column names and the table body. Here's a code chunk to illustrate:
```{r functions,echo=T}
drawTable <- function(d.df,caption='Given',hdr.above){
require(knitr)
require(kableExtra)
require(dplyr)
hdr.2 <- rep(c('Value','Rank'),times=ncol(d.df)/2)
x <- knitr::kable(d.df,format='latex',align='c',
col.names=rep('',times=ncol(d.df))) %>%
kable_styling(bootstrap_options=c('striped','hover',
'condensed','responsive'),position='center',
font_size = 9,full_width=F)
x %>% add_header_above(hdr.2) %>%
add_header_above(hdr.above)
}
```
```{r}
df <- data.frame(A=c(1,2),B=c(4,2),C=c(3,4),D=c(8,7))
hdr.above <- c('A2','B2','C2','D2')
drawTable(df,hdr.above = hdr.above)
```
I am not sure where you got the advice to strip the column names with gsub, but it seems excessively complex. It is much easier just to use the built-in col.names argument of kable. This solution works for both HTML and LaTeX outputs:
---
output:
pdf_document: default
html_document: default
---
```{r functions,echo=T}
require(knitr)
df <- data.frame(A=c(1,2),B=c(4,2),C=c(3,4),D=c(8,7))
knitr::kable(df,
col.names = c("Space in name",
"(Special Characters)",
"$\\delta{m}_1$",
"Space in name"))
```
PDF output:
HTML output:
If you're targeting HTML, then the entity &Delta; (rendered as Δ) is an option too.
I couldn't get the accepted answer to work on HTML, so used the above.
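As a rough illustration of the HTML route, the sketch below passes an HTML entity through col.names. It assumes HTML output and uses kable's escape = FALSE so the ampersand is not converted to &amp;amp;. The column labels are made up for the example.

```r
library(knitr)

df <- data.frame(A = c(1, 2), B = c(4, 2))

# escape = FALSE leaves HTML entities and tags in the header untouched
html_tab <- kable(
  df,
  format = "html",
  escape = FALSE,
  col.names = c("&Delta;m<sub>1</sub>", "Rate (%)")
)

html_tab
```

With escape = FALSE, any special characters in the table body are also left unescaped, so this is best used when the cell contents are known to be safe.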

Efficient way to wrap column names of proportion tables in rmarkdown pdf output

I'm making weighted tables of row proportions using the questionr package. I want to wrap the column names when they are too long. Because I'm making hundreds of tables, the solution needs to work on tables with varying numbers of columns. I also want to avoid setting all columns to a specific width. Ideally, short column names would remain at their normal width while names exceeding the specified maximum length would be wrapped.
Here are a bunch of solutions I've tried so far, written as .Rmd file:
---
title: "Example"
output: pdf_document
---
```{r setup, include=FALSE}
library(questionr)
library(knitr)
data("happy")
```
A simple weighted table with the "kable" method:
```{r table1, echo=TRUE}
kable(wtd.table(happy$degree, happy$happy, weights = happy$wtssall),
digits = 0)
```
The same "kable" table, but with row proportions:
```{r table2, echo=TRUE}
kable(rprop(wtd.table(happy$degree, happy$happy, weights = happy$wtssall)),
digits = 0)
```
I want to wrap the column headers, but kableExtra::column_spec() gives an error.
Even if it worked, it would require manually setting each column width:
```{r table3, echo=TRUE}
library(kableExtra)
kable(rprop(wtd.table(happy$degree, happy$happy, weights = happy$wtssall)),
digits = 0) %>%
column_spec(column = 2, width = ".25in")
```
Maybe str_wrap will do the trick?
```{r table4, echo=TRUE}
library(stringr)
kable(rprop(wtd.table(happy$degree, str_wrap(happy$happy, width = 8),
weights = happy$wtssall)),
digits = 0)
```
Giving up on knitr::kable(), maybe pander has a solution.
Here is the simple weighted frequency table.
```{r table5, echo=TRUE, results='asis'}
library(pander)
pandoc.table(wtd.table(happy$degree, str_wrap(happy$happy, width = 8),
weights = happy$wtssall),
split.cells=8)
```
So far, so good. But it doesn't work for the table of row proportions,
because the rprop table is of class ([1]"proptab" [2]"table")
while the wtd.table() is just class "table"
```{r table6, echo=TRUE, results='asis', error=TRUE}
pandoc.table(rprop(wtd.table(happy$degree, str_wrap(happy$happy, width = 8),
weights = happy$wtssall)),
split.cells=8)
```
But wait! I can pass a kable() product as pandoc output.
This table looks great, but I don't think I can pass any
pandoc.table() arguments like "split.cells=8" to it.
```{r table7, echo=TRUE, results='asis', error=TRUE}
kable(rprop(wtd.table(happy$degree, happy$happy, weights = happy$wtssall)),
digits = 0, format = "pandoc")
```
And here is what the output of that .Rmd file looks like:
For kableExtra at least, you need to set format in your kable call to either latex or html.
To make it dynamic, you can save the table to a variable before it goes into kable and use 2:(ncol(your_table) + 1) in the column_spec function (the +1 accounts for the first column, which holds the row names).
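A minimal sketch of that idea follows. It uses the built-in mtcars data in place of the questionr table, and the 0.75in width is an arbitrary choice for illustration; tab and n_cols are just example names.

```r
library(knitr)
library(kableExtra)

# Stand-in for the proportion table: a cross-tab with row names
tab <- as.data.frame.matrix(table(mtcars$cyl, mtcars$gear))
n_cols <- ncol(tab)

# Column 1 of the rendered table holds the row names,
# so the data columns are 2 to n_cols + 1
out <- kable(tab, format = "latex", booktabs = TRUE) %>%
  column_spec(2:(n_cols + 1), width = "0.75in")

out
```

Because the column range is computed from ncol(tab), the same call can be reused for tables with any number of columns. It does apply one width to every data column, though, which the question hoped to avoid.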

Column alignment with kableExtra using group_rows

I'm using Rmarkdown to produce a PDF of frequency tables. Producing a complex frequency table after running freq from questionr and adding row groupings with group_rows leads to an alignment problem on the last line of the first group. Reproducible example here:
---
output:
pdf_document:
latex_engine: xelatex
fig_caption: true
---
```{r}
library(haven)
library(questionr)
library(dplyr)
library(magrittr)
library(knitr)
library(kableExtra)
# Build some data
x <- rep(c(1,0),times=50)
y <- c(rep(1,times=25),rep(0,times=75))
z <- c(rep(1,times=75),rep(0,times=25))
# Function to run frequencies on several variables at a time
MassFreq <- function(...){
step1 <- list(...) # Wrap items into a list
step2 <- lapply(step1,freq,total=TRUE) # run frequencies on all items
step3 <- bind_rows(step2) # collapse list results into single df
Response <- unlist(lapply(step2,row.names),recursive=FALSE) # Get row names from frequencies
step4 <- cbind(Response,step3) #Stick row names at front of the dataframe
}
# Run function - returns a data frame object
test <- MassFreq(x,y,z)
# Build table
test %>%
kable(format="latex", booktabs = TRUE, row.names=FALSE) %>%
group_rows("Group 1",1,3) %>%
group_rows("Group 2",4,6) %>%
group_rows("Group 3",7,9)
```
Gives me this upon knitting:
The first "Total" text is right-aligned, but everything else is fine. Adding align=('lrrr') in the kable line does nothing, and align=('crrr') is kind of a mess. Using the index method for group_rows produces the same results. When leaving out the group_rows commands, everything in the first column is left-aligned and looks fine. My hunch is that kableExtra isn't playing well with questionr because the "Total" rows are created when running questionr::freq.
This is a bug in the current CRAN version of kableExtra, 0.5.2. It has been fixed in the dev version. I will make a CRAN release next week.
