Allow duplicated names in binded tables - r

UPDATE for why I changed my votes.
This code has the table displayed but R doesn't knit pdf.
all_jt %>%
kbl(longtable = T, booktabs = T,
caption = "table") %>%
remove_column(7) %>%
add_header_above(c(" " = 2, "Year 1" = 4, "Year 2" = 4)) %>%
kable_styling(latex_options = c("repeat_header"))
Quitting from lines 13-37 (test_table.Rmd)
Error in remove_column(., 7) :
Removing columns was not implemented for latex kables yet
switching to select(-7) as in here Remove_Column from a kable table which will be output as latex/pdf doesn't work because R doesn't like duplicated column names.
I have two ANOVA tables, jt_1 and jt_2 below, that I want to merge and keep 1 column for the model term only. As I remove the duplicated column, R added .1 to the tail of columns' 7, 8, 9 and 10 names.
library(emmeans)
library(stringr)
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
jt_1 <- print(joint_tests(warp.lm), export = T) %>% as.data.frame()
jt_2 <- jt_1
all_jt <- cbind(jt_1, jt_2) %>%
setNames(gsub("summary.", "", colnames(.)))
all_jt[,-6]%>% #to remove the duplicated column for model term
data.frame(check.names = F) %>%
kbl(longtable = T, booktabs = T,
caption = "table") %>%
add_header_above(c(" " = 2, "Year 1" = 4, "Year 2" = 4)) %>%
kable_styling(latex_options = c("repeat_header"))
Here is a brief idea of what I need.
Many thanks in advance.

You can use remove_column function from kableExtra to remove a column instead of all_jt[,-6] which makes the column name unique.
library(knitr)
library(kableExtra)
all_jt %>%
kbl(longtable = T, booktabs = T,
caption = "table") %>%
remove_column(7) %>%
add_header_above(c(" " = 2, "Year 1" = 4, "Year 2" = 4)) %>%
kable_styling(latex_options = c("repeat_header"))

R does not like duplicate column names in data.frames. If you step through your last code block line by line you will notice that all_jt[, -6] makes column names unique by adding the ".1" suffix.
The/a solution is to provide column names to kbl directly, e.g.
all_jt[,-6] %>%
kbl(longtable = T, booktabs = T,
col.names = gsub("\\.\\d", "", names(.)),
caption = "table") %>%
add_header_above(c(" " = 2, "Year 1" = 4, "Year 2" = 4)) %>%
kable_styling(latex_options = c("repeat_header"))
This produces

Related

gt Tables - applying formatting to multiple dataframes

I have several tables that all need to be formatted similarly. I figured out how I want the tables formatted on one table, and have been trying to write a function that I can apply to all tables easily without having to copy and paste the code over and over
Here is my sample data frame
example <- data.frame(
stringsAsFactors = FALSE,
age = c("age 12 to 17", "age 18 to 25", "age 26 and up"),
alc_est = c(14579.28558, 131872.35964, 660957.6512),
bing_alc = c(0.0477, 0.3143, 0.224),
marj_est = c(35913.3345, 137033.12968, 534667.52856),
mj_use = c(0.1175, 0.3266, 0.1812),
heroin_est = c(NA, 419.5748, 18294.36356),
heroin_use = c(NA, 0.001, 0.0062),
meth_est = c(550.16172, 2139.83148, 68161.25778),
meth_use = c(0.0018, 0.0051, 0.0231)
)
Here is some of the formatting that will be applied to multiple tables
table_format <- example %>%
gt(rowname_col = "age") %>%
tab_stubhead(label = md("**Statewide**")) %>%
tab_header(
title = md("**table title here**"),
subtitle = "more info here"
) %>%
fmt_number(
columns = c(2, 4, 6, 8),
decimals = 0
) %>%
tab_spanner_delim(
delim = "_",
columns = c(2:9)
)
Here is how I tried to write a function to apply to multiple tables. The function runs but doesn't actually format the table
format_use_est <- function(df_here) {
result <- df_here %>%
gt(rowname_col = "age") %>%
tab_stubhead(label = md("**Statewide**")) %>%
tab_header(
title = md("**table title here**"),
subtitle = "more info here"
) %>%
fmt_percent(
columns = c(3, 5, 7, 9),
decimals = 1
) %>%
fmt_number(
columns = c(2, 4, 6, 8),
decimals = 0
) %>%
tab_spanner_delim(
delim = "_",
columns = c(2:9)
)
}
How can I write a function for formatting these tables and apply to multiple tables more effectively? Thank you!

Adding Headers to a table using formattable in R

I would like to add headers entitled "Category 1", "Category 2", "Category 3" and "Category 4" to a table, separated by a black line. I would like it to look like the picture attached. Please see the code for the table without the Categories/black lines between columns:
library(formattable)
data(mtcars)
df <- mtcars
formattable(df)
Desired plot
Here is one with kable where can use add_header_above by passing a vector of key/value pair with key suggesting the name and value for the number of columns to be expanded
library(kableExtra)
library(dplyr)
library(kableExtra)
mtcars %>%
kable() %>%
kable_styling(bootstrap_options = "bordered",
full_width = FALSE) %>%
add_header_above(c("", "Category1" = 3, "Category2" = 3,
"Category3" = 3, "Category4" = 2)) %>%
collapse_rows(columns = 1,
valign = "middle")
-output
Some of the styling can be changed as well
mtcars %>%
kable() %>%
kable_styling(bootstrap_options = "responsive",
full_width = FALSE) %>%
add_header_above(c("", "Category1" = 3, "Category2" = 3,
"Category3" = 3, "Category4" = 2)) %>%
collapse_rows(columns = 1,
valign = "middle")

R markdown contingency table %>% tabulate column variables with selective values

I am quite new to R, coming from Stata. Below is the r markdown chunk with reproducible data example. The data is representative to the data i am working with. But only with more binary (logical) and factor variables in number.
The libraries and data:
# Setup and load package:
library(dplyr)
library(expss)
library(hablar)
library(kableExtra)
library(summarytools)
# Load data:
data("mtcars")
raw_df <- select(mtcars,c(wt,cyl,gear,vs,am))
# Data prep and labelling:
df <- raw_df %>%
apply_labels(wt = "Facility ID",
cyl = "Geographical Area",
cyl = c("Area A" = 4,"Area B" = 6, "Area C" = 8),
gear = "Tier",
gear = c("Tier 1" = 3, "Tier 2" = 4, "Tier 3" = 5),
vs = "E.coli",
am = "V.choleri") %>%
convert(chr(wt),
fct(cyl,gear),
lgl(vs,am))
Please note that in my actual data there are more categorical and logical variables.
I have managed to make the following table in r markdown (html output):
df %>%
tab_cells(cyl, gear) %>%
tab_total_row_position("below") %>%
tab_total_statistic("u_rpct")%>%
tab_total_label("Total hosts (Row proportions)") %>%
tab_cols(vs, am) %>%
tab_stat_rpct() %>%
tab_cols(total(label = "Number of hosts")) %>%
tab_stat_cases() %>%
tab_pivot(stat_position = "outside_columns") %>%
recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy) %>%
split_table_to_df() %>%
kable(align = "c", digits = 1) %>%
kable_styling(bootstrap_options = c("striped", "condensed", "responsive"),
full_width = F, position = "center") %>%
row_spec(1:2, bold = TRUE)
Problems:
1. I wish i could include only "TRUE" columns, dropping "FALSE" columns from the table. But keeping the 1st Row label intact ("E. coli", "V.choleri"). In fact i would not be needing the 2nd row ("TRUE","FALSE)
2. I have labelled the "Total Row proportion" (#Total hosts), But can not remove the leading "#" sign. In the right most column cell of the row with the "Total row proportion", it shows "100". I tried it to be the summation of column cells instead, but failed. "100" is totally misleading.
3. I have also tried to get my desired table through "ctable" function of "summarytools" package. As it has an excellent structure, with number of observations also induced within proportion cells. :
print(ctable(df$cyl,df$am), method = 'render')
But the problem is it seems to permit only one pair of categorical variables. And also, the "FALSE" can not be omited. But the last column is perfect with rowtotals (observations)
Details:
R : 4.0.0
R studio: 1.2.5042
The packages are all up-to-date.
Tables from expss are usual data.frames. Column labels is just column names with rows separated with "|" symbol. So, you can manipulate them as usual column names. Row labels are located in the column row_labels and we can remove '#' sign with search and replace operations.
"Total row proportion" shows "100" because at the beginning you specify total statistic as row percent and row percent for single column is 100.
Taking into account all the above:
library(dplyr)
library(expss)
library(hablar)
library(kableExtra)
library(summarytools)
# Load data:
data("mtcars")
raw_df <- select(mtcars,c(wt,cyl,gear,vs,am))
# Data prep and labelling:
df <- raw_df %>%
apply_labels(wt = "Facility ID",
cyl = "Geographical Area",
cyl = c("Area A" = 4,"Area B" = 6, "Area C" = 8),
gear = "Tier",
gear = c("Tier 1" = 3, "Tier 2" = 4, "Tier 3" = 5),
vs = "E.coli",
am = "V.choleri") %>%
convert(chr(wt),
fct(cyl,gear),
lgl(vs,am))
tbl = df %>%
tab_cells(cyl, gear) %>%
tab_total_row_position("below") %>%
tab_total_statistic("u_rpct")%>%
tab_total_label("Total hosts (Row proportions)") %>%
tab_cols(vs, am) %>%
tab_stat_rpct() %>%
tab_cols(total(label = "Number of hosts")) %>%
# specify total statistic for last column
tab_stat_cases(total_statistic = "u_cases") %>%
tab_pivot(stat_position = "outside_columns") %>%
recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy) %>%
# remove columns with FALSE
except(contains("FALSE")) %>%
compute(
# remove '#' sign from row labels
row_labels = gsub("#", "", row_labels)
)
# remove '#' sign from column labels
colnames(tbl) = gsub("\\|TRUE", "", colnames(tbl))
tbl %>%
split_table_to_df() %>%
kable(align = "c", digits = 1) %>%
kable_styling(bootstrap_options = c("striped", "condensed", "responsive"),
full_width = F, position = "center") %>%
row_spec(1:2, bold = TRUE)

LaTex table in knitr with complex structure (rotating multirow text, removing column separators)

I need to create a latex table in RStudio for pdf output with the following structure:
This table was created for html output with the following code:
mat <- data.frame(a = c("column header","column header"),
rowx=c("row1","row2"),b = c("a","b"),
c = c("x","y"))
kable(mat, align = "c",col.names = c("","","v1","v2")) %>%
kable_styling(bootstrap_options = "striped", full_width = F,
position = "left",font_size = 12) %>%
column_spec(1, bold = T,width="2em",extra_css="transform: rotate(-90deg);") %>%
collapse_rows(columns = 1, valign = "middle") %>%
add_header_above(c(" " = 2, "row header" = 2))
I need to create a similar structure with LaTeX tables.
His is how far I got:
mat <- data.frame(a = c("column header","column header"),
rowx=c("row1","row2"),b = c("a","b"),c = c("x","y"))
kable(mat, align = "c",col.names = c("","","v1","v2")) %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "left",font_size = 12) %>%
collapse_rows(columns = 1, latex_hline = "none") %>%
add_header_above(c(" " = 2, "rows" = 2))
So I still need at least 2 more things:
rotate the label in the very first column
remove the spurious leftmost column separator in the second row.
Can this be achieved with kableExtra commands and parameters?
Here's a shot with huxtable (my package):
as_hux(mat, add_colnames = TRUE) %>%
insert_row(c("", "", "rows", "")) %>%
merge_cells(3:4, 1) %>%
merge_cells(1, 3:4) %>%
merge_cells(1:2, 1:2) %>%
set_rotation(3, 1, 90) %>%
set_bottom_border(0.4) %>%
set_bold(1:2, everywhere, TRUE) %>%
set_wrap(3, 1, TRUE) %>%
set_bottom_padding(4, -1, 48) %>%
set_bottom_padding(3, -1, 30) %>%
set_row_height(c("1em", "1em", "1.5em", "1.5em")) %>%
quick_pdf()
I have to admit, this took a lot of tweaking. TeX tables are hard to understand....

Row indentation with add_indent() in 2nd column in kableExtra

I try to indent cells of the 2nd column of a dataframe using the kableExtra-package for RMarkdown. It seems add_indent() only works for the first column, therefore does not change anything in my table of the reprex below:
Reprex with dummy data:
---
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(kableExtra)
group <- c(1, NA, NA, 2, NA, NA)
quest <- c("How is your mood today?", "good or very good", "bad or very bad",
"What colour is your hair?", "brown", "other")
percent <- c(NA, 80, 20, NA, 50, 50)
df <- tibble(group, quest, percent)
```
## Reprex
```{r, echo=TRUE}
# output without add_indent()
kable(df, booktabs = T, escape = T) %>%
add_header_above(header = c("Group" = 1,
"Question & answer options" = 1,
" %Agreement" = 1)) %>%
gsub("NA", " ", .)
```
```{r with indent, echo=TRUE}
# output with add_indent()
kable(df, booktabs = T, escape = T) %>%
add_header_above(header = c("Group" = 1,
"Question & answer options" = 1,
" %Agreement" = 1)) %>%
gsub("NA", " ", .) %>%
add_indent(positions = c(2,3,5,6))
Desired output: I would like to indent the rows 2, 3, 5, 6 of the 2nd column (that the answer options are intended below the questions and ideally also in italics). Italics could also be covered with cell_spec() but that works just column wise I think.
Is my desired output possible? (I guess it does not make sense to mix questions & answer options, but to keep up the format of an earlier report we would like to try it that way?)
Here are two possible ways.
Use kableExtra::group_rows isntead of an extra group column.
Add the indentation (kableExtra adds 1em) manually using cell_spec.
Option 1
```{r, echo = F}
df <- data.frame(quest, percent)
df %>%
mutate(quest = cell_spec(quest, italic = ifelse(row_number() %in% c(2,3,5,6), T, F))) %>%
kable(booktabs = T, escape = F) %>%
add_indent(c(2,3,5,6)) %>%
group_rows("Group 1", 1, 3) %>%
group_rows("Group 2", 4, 6) %>%
gsub("NA", " ", .)
```
Option 2
```{r, echo = F}
df <- data.frame(group, quest, percent)
df %>%
mutate(quest = cell_spec(quest, italic = ifelse(row_number() %in% c(2,3,5,6), T, F)),
quest = ifelse(row_number() %in% c(2,3,5,6), paste0("\\hspace{1em}", quest), quest)) %>%
kable(booktabs = T, escape = F) %>%
gsub("NA", " ", .)
```

Resources