Grid Table Format in R - r

I am using Expss package to create Tables in R.
I am have 5 Statements each statement Have 5 brands.
5 Statements are in 5 consecutive variables like a1,a2,a3,a4,a5
Can i have table in grid format like the following?
enter image description here

There are two solutions: one is verbose but not scalable, second is scalable but is not very simple. Both solutions are based on the idea that we reposition labels from variables to banner location.
library(expss)
# create sample of data
set.seed(123)
N = 150
df = data.frame(
st1 = sample(paste0("brand", 1:5), N, replace = TRUE),
st2 = sample(paste0("brand", 1:5), N, replace = TRUE),
st3 = sample(paste0("brand", 1:5), N, replace = TRUE),
st4 = sample(paste0("brand", 1:5), N, replace = TRUE),
st5 = sample(paste0("brand", 1:5), N, replace = TRUE)
) %>% apply_labels(
st1 = 'Statement 1',
st2 = 'Statement 2',
st3 = 'Statement 3',
st4 = 'Statement 4',
st5 = 'Statement 5'
)
# verbose solution with Tab_*. It is not scalable for large number of variables
# manipulation with variable labels is needed to repostion variable labels from rows to column
df %>%
tab_total_row_position("above") %>%
tab_cells("|" = drop_var_labs(st1)) %>%
tab_stat_cpct(label = var_lab(st1)) %>%
tab_cells("|" = drop_var_labs(st2)) %>%
tab_stat_cpct(label = var_lab(st2)) %>%
tab_cells("|" = drop_var_labs(st3)) %>%
tab_stat_cpct(label = var_lab(st3)) %>%
tab_cells("|" = drop_var_labs(st4)) %>%
tab_stat_cpct(label = var_lab(st4)) %>%
tab_cells("|" = drop_var_labs(st5)) %>%
tab_stat_cpct(label = var_lab(st5)) %>%
tab_pivot(stat_position = "inside_columns") %>%
tab_transpose()
# solution wich will work for arbirary number of variables
df %>%
calculate(
lapply(st1 %to% st5, function(item)
# manipulation with variable labels is needed to repostion variable labels from rows to column
cro(list(drop_var_labs(item)), list(var_lab(item)), total_row_position = "above")
)
) %>%
Reduce("%merge%", .) %>%
tab_transpose()

Related

R Highcharter - highlight same group in multiple stacked columns chart + order groups in columns

R newbie here :)
I have recently started using R library Highcharter as an alternative to ggplot2.
This is the sample code I am currently working on:
library(highcharter)
library(dplyr)
## Sample dataframe
YEAR <- c(2019,2020,2021)
CATEGORY <- c("dog", "cat", "mouse")
SAMPLE_DATA <- expand.grid(YEAR, CATEGORY)
names(SAMPLE_DATA)[1] <- "CATEGORY"
names(SAMPLE_DATA)[2] <- "YEAR"
SAMPLE_DATA$VALUE <- runif(n = 9, min = 400, max = 900)
## Chart
SAMPLE_DATA <- SAMPLE_DATA %>%
group_by(YEAR, CATEGORY) %>%
summarise(VALUE = sum(VALUE, na.rm = T))
highchart() %>%
hc_add_series(data = SAMPLE_DATA, hcaes(x = YEAR, y = round(VALUE,0), group = CATEGORY), type = "column") %>%
hc_plotOptions(column = list(stacking = "normal"))
What I am trying to do is:
Sort how the group "CATEGORY" is piled in each column, based on ascending/descending "VALUE"
Have that effect which highlights the same group in all columns as you hover over it
Does anyone have an idea? Thank you!
This is a late answer but I believe this is what you want.
Adding the data again because I think you swapped some column names on accident:
YEAR <- c(2019,2020,2021)
CATEGORY <- c("dog", "cat", "mouse")
SAMPLE_DATA <- expand.grid(YEAR, CATEGORY)
names(SAMPLE_DATA)[1] <- "YEAR"
names(SAMPLE_DATA)[2] <- "CATEGORY"
SAMPLE_DATA$VALUE <- runif(n = 9, min = 400, max = 900)
## Chart
SAMPLE_DATA <- SAMPLE_DATA %>%
group_by(YEAR, CATEGORY) %>%
summarise(VALUE = sum(VALUE, na.rm = T))
Creating plot:
SAMPLE_DATA %>%
ungroup() %>%
mutate(YEAR = factor(YEAR) %>% fct_reorder(VALUE, .desc = TRUE)) %>%
mutate(year_index = as.numeric(YEAR)) %>%
hchart(
type = "column",
hcaes(x = year_index,
y = VALUE,
group = CATEGORY,
name = YEAR),
) %>%
hc_xAxis(type = "category", labels = list(step = 1)) %>%
hc_plotOptions(series = list(stacking = TRUE))

How to display results from only select subgroups + the whole data frame in an expss table?

Apologies for what is a pretty basic question... I am loving using the expss package for table creation, but am having trouble working through some of the output display. Specifically, I have a situation where my data frame contains a grouping variable as well as a few variables that will be summarized. I'd like to create output that displays certain summary statistics for each value of the subgroup in turn (each value of the grouping variable) plus the total for the whole sample. Something like the code below, but appending the output1 and output2 objects together in a single table that maintains the formatting of expss's RStudio Viewer output.
library(expss)
set.seed(12345)
df <- data.frame(group = rep(1:5, each = 4),
varA = sample(1:4, 20, replace = TRUE),
varB = sample(6:9, 20, replace = TRUE))
output1 <- df[df$group == 1, ] %>%
tab_cells(varA, varB) %>%
tab_cols(total(label = "")) %>%
tab_stat_fun("Valid N" = w_n, "Mean" = w_mean, "SD" = w_sd,
"Median" = w_median, method = list) %>%
tab_pivot() %>%
set_caption("Group 1")
output2 <- df %>%
tab_cells(varA, varB) %>%
tab_cols(total(label = "")) %>%
tab_stat_fun("Valid N" = w_n, "Mean" = w_mean, "SD" = w_sd,
"Median" = w_median, method = list) %>%
tab_pivot() %>%
set_caption("All Groups")
expss_output_viewer()
output1
output2
I know that I can add tab_rows(group) to the piping which will display all of the groups; however, I am only interested in displaying each group in turn (plus the total), not all groups, for output.
There are special function for subgroups: tab_subgroup:
library(expss)
set.seed(12345)
df <- data.frame(group = rep(1:5, each = 4),
varA = sample(1:4, 20, replace = TRUE),
varB = sample(6:9, 20, replace = TRUE))
output <- df %>%
tab_cells(varA, varB) %>%
tab_cols(total(label = "")) %>%
tab_subgroup(group == 1) %>%
tab_row_label("Group 1") %>%
tab_stat_fun("Valid N" = w_n, "Mean" = w_mean, "SD" = w_sd,
"Median" = w_median, method = list) %>%
tab_row_label("All Groups") %>%
tab_subgroup() %>%
tab_stat_fun("Valid N" = w_n, "Mean" = w_mean, "SD" = w_sd,
"Median" = w_median, method = list) %>%
tab_pivot()
expss_output_viewer()
output
Alternatively, you can use tab_rows and net:
library(expss)
set.seed(12345)
df <- data.frame(group = rep(1:5, each = 4),
varA = sample(1:4, 20, replace = TRUE),
varB = sample(6:9, 20, replace = TRUE))
output <- df %>%
tab_cells(varA, varB) %>%
tab_cols(total(label = "")) %>%
tab_rows(net(group, "Group 1" = 1, "All Groups" = 1:5, position = "above")) %>%
tab_stat_fun("Valid N" = w_n, "Mean" = w_mean, "SD" = w_sd,
"Median" = w_median, method = list) %>%
tab_pivot()
expss_output_viewer()
output

Skip "zero" level of dichotomous variables in expss tables

I want to create a summary table for some dichotomous variables using the expss package. Since the variables are dichotomous, one of the two levels would the sufficient to "show the picture".
I tried to use the function tab_net_cell, but was not able to get the right results. Here is some example code with BrCa (Breast cancer) with 1 or 0. I only want to show the number of patients with but not without breast cancer.
df <- data.frame(BrCa = c(1,1,1,0,0,0,NA,NA,0,0))
df$group <- c(1,2,1,2,1,2,1,2,1,2)
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_net_cells("BrCa" = eq(1)) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")
The simplest way is to filter resulted table:
df <- data.frame(BrCa = c(1,1,1,0,0,0,NA,NA,0,0))
df$group <- c(1,2,1,2,1,2,1,2,1,2)
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows") %>%
expss::where(grepl(1, row_labels))
Another way is to use mean and sum instead of cpct and cases:
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa*100) %>%
expss::tab_stat_mean(label = "%") %>%
expss::tab_stat_sum(label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")

Normalising data with dplyr mutate() brings inconsistencies

I'm trying to reproduce the framework from this blogpost http://www.luishusier.com/2017/09/28/balance/ with the following code but it looks like I get inconsistent results
library(tidyverse)
library(magrittr)
ids <- c("1617", "1516", "1415", "1314", "1213", "1112", "1011", "0910", "0809", "0708", "0607", "0506")
data <- ids %>%
map(function(i) {read_csv(paste0("http://www.football-data.co.uk/mmz4281/", i ,"/F1.csv")) %>%
select(Date:AST) %>%
mutate(season = i)})
data <- bind_rows(data)
data <- data[complete.cases(data[ , 1:3]), ]
tmp1 <- data %>%
select(season, HomeTeam, FTHG:FTR,HS:AST) %>%
rename(BP = FTHG,
BC = FTAG,
TP = HS,
TC = AS,
TCP = HST,
TCC = AST,
team = HomeTeam)%>%
mutate(Pts = ifelse(FTR == "H", 3, ifelse(FTR == "A", 0, 1)),
Terrain = "Domicile")
tmp2 <- data %>%
select(season, AwayTeam, FTHG:FTR, HS:AST) %>%
rename(BP = FTAG,
BC = FTHG,
TP = AS,
TC = HS,
TCP = AST,
TCC = HST,
team = AwayTeam)%>%
mutate(Pts = ifelse(FTR == "A", 3 ,ifelse(FTR == "H", 0 , 1)),
Terrain = "Extérieur")
tmp3 <- bind_rows(tmp1, tmp2)
l1_0517 <- tmp3 %>%
group_by(season, team)%>%
summarise(j = n(),
pts = sum(Pts),
diff_but = (sum(BP) - sum(BC)),
diff_t_ca = (sum(TCP, na.rm = T) - sum(TCC, na.rm = T)),
diff_t = (sum(TP, na.rm = T) - sum(TC, na.rm = T)),
but_p = sum(BP),
but_c = sum(BC),
tir_ca_p = sum(TCP, na.rm = T),
tir_ca_c = sum(TCC, na.rm = T),
tir_p = sum(TP, na.rm = T),
tir_c = sum(TC, na.rm = T)) %>%
arrange((season), desc(pts), desc(diff_but))
Then I apply the framework mentioned above:
l1_0517 <- l1_0517 %>%
mutate(
# First, see how many goals the team scores relative to the average
norm_attack = but_p %>% divide_by(mean(but_p)) %>%
# Then, transform it into an unconstrained scale
log(),
# First, see how many goals the team concedes relative to the average
norm_defense = but_c %>% divide_by(mean(but_c)) %>%
# Invert it, so a higher defense is better
raise_to_power(-1) %>%
# Then, transform it into an unconstrained scale
log(),
# Now that we have normalized attack and defense ratings, we can compute
# measures of quality and attacking balance
quality = norm_attack + norm_defense,
balance = norm_attack - norm_defense
) %>%
arrange(desc(norm_attack))
When I look at the column norm_attack, I expect to find the same value for equivalent but_p values, which is not the case here:
head(l1_0517, 10)
for instance when but_p has value 83, row 5 and row 7, I get norm_attack at 0.5612738 and 0.5128357 respectively.
Is it normal? I would expect mean(l1_0517$but_p) to be fixed and therefore obtaining the same result when a value of l1_0517$but_p is log normalised?
UPDATE
I have tried to work on a simpler example but I can't reproduce this issue:
df <- tibble(a = as.integer(runif(200, 15, 100)))
df <- df %>%
mutate(norm_a = a %>% divide_by(mean(a)) %>%
log())
I found the solution after looking at the type of l1_0517
It is a grouped_df hence the different results.
The correct code is:
l1_0517 <- tmp3 %>%
group_by(season, team)%>%
summarise(j = n(),
pts = sum(Pts),
diff_but = (sum(BP) - sum(BC)),
diff_t_ca = (sum(TCP, na.rm = T) - sum(TCC, na.rm = T)),
diff_t = (sum(TP, na.rm = T) - sum(TC, na.rm = T)),
but_p = sum(BP),
but_c = sum(BC),
tir_ca_p = sum(TCP, na.rm = T),
tir_ca_c = sum(TCC, na.rm = T),
tir_p = sum(TP, na.rm = T),
tir_c = sum(TC, na.rm = T)) %>%
ungroup() %>%
arrange((season), desc(pts), desc(diff_but))

Pretty tables with cumulative count / percentage and group totals using R "tables" package

I am trying to produce a formatted html table which has columns for frequency, cumulative frequency, column percentage, and cumulative column percentage. The table should also have the data subsetted by a grouping variable, and including a group total.
I can almost achieve this using a combination of dplyr and tidyr, but the output is a dataframe which doesn't look so pretty. I wonder if there is an easier way using the tables::tabulate command?
# Sample data
dat <- data.frame(
id = 1:100,
group = factor(sample(c("A", "B"), 100, replace = TRUE)),
sessions = factor(sample(1:10, 100, replace = TRUE))
)
# dplyr/tidyr solution
library(dplyr)
library(tidyr)
dat %>%
group_by(group, sessions) %>%
tally() %>%
spread(key = group, value = n) %>%
mutate(All = rowSums(.[-1])) %>%
gather(key = group, value = n, -sessions) %>%
group_by(group) %>%
mutate(
cum_n = cumsum(n),
p = round(n / sum(n)*100,1),
cum_p = round(cum_n / sum(n)*100,1),
) %>%
data.frame() %>%
reshape(timevar = "group", idvar = "sessions", direction = "wide")
# As far as I get using tables::tabulate
library(tables)
tabular(
Factor(sessions, "Sessions") ~
(Heading()*group + 1) *
(
(n = 1) +
# (cum_n = ??) +
Heading("%")*Percent(denom = "col")*Format(digits = 2)
# + Heading("cum_%")*??*Format(digits = 2)
),
data = dat
)
I would recommend using knitr::kable and kableExtra, amazing packages for producing tables. You can also set it up for multiple format outputs, for example using the same code to produce html and latex for pdf.
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)
dat %>%
group_by(group, sessions) %>%
tally() %>%
spread(key = group, value = n) %>%
mutate(All = rowSums(.[-1])) %>%
gather(key = group, value = n, -sessions) %>%
group_by(group) %>%
mutate(
cum_n = cumsum(n),
p = round(n / sum(n)*100,1),
cum_p = round(cum_n / sum(n)*100,1),
) %>%
data.frame() %>%
reshape(timevar = "group", idvar = "sessions", direction = "wide") %>%
kable("html") %>%
kable_styling(bootstrap_options = c("striped", "hover"))

Resources