how to further refine expss table format? - r

I am trying to improve my table design using expss. My current design is shown below using the following code:
library(expss)
# bogus example data
x<-structure(list(visits= structure(c(17, 2, 23, 1, 21), label = "Total # Home Visits", class = c("labelled", "numeric")), months_enrolled = structure(c(21.42474, 51.105, 52.474, 53.75, 60.0392105), label = "Enrollment Duration (months)", class =c("labelled","numeric")), marital2 = structure(c("Married", NA, "Married", "Married", "Married"), label = "Marital Status", class = c("labelled", "character")), Relationship2 = structure(c("Mother", "Mother", "Mother", "Mother", "Mother"), label = "Relationship (recoded)", class = c("labelled", "character"))), row.names = c(NA, 5L), class = "data.frame")
htmlTable(x %>%
tab_cells(visits,months_enrolled) %>%
tab_rows(marital2, Relationship2, total()) %>% tab_stat_fun(Mean = w_mean, "Valid N" = w_n, method = list) %>%
tab_pivot() %>%
set_caption("Table 6: Bogus Visits and Duration by Characteristics") %>%
htmlTable(.,css.cell = c("width: 220px", # first column width
rep("width: 50px", ncol(.) - 1))))
I'd like to improve the table design by placing the mean statistics for Home Visits and Enrollment Duration as columns, thus saving a row for each level of Marital Status (and other vars in tab_rows). How is this achieved? Also, is it possible to shade alternate rows?

It seems, the simplest way is to transpose table:
htmlTable(x %>%
tab_cells(visits, months_enrolled) %>%
tab_cols(marital2, Relationship2, total()) %>%
tab_rows(total(label = "|")) %>%
tab_stat_fun(Mean = w_mean, "Valid N" = w_n) %>%
tab_pivot() %>%
tab_transpose() %>%
set_caption("Table 6: Bogus Visits and Duration by Characteristics") %>%
htmlTable(.,css.cell = c("width: 220px", # first column width
rep("width: 50px", ncol(.) - 1))))

Related

Grouping By Multiple Selection Answer for Likert Package

I wanted to create a likert graph that is grouped by Question i. I can create the likert graph for total responses ungrouped, but im uncertain of how to reformat question 6 without losing the column for question i. (aka do the reformatting done below but also have it take into account who selected what in question i.)
What I want is the sufficiency of Q6 grouped by their answer in question i.
Sample Dataframe:
SurveyClean2 <- data.frame(i = c("Mail,Email", "Mail", "Mail,Email,Podcast", "Radio,Podcast", "Radio", "Mail,Radio"), Q6_3 = c("Not Sufficient", "Very Sufficient", "Completely Sufficient", "Moderately Sufficient", "Moderately Sufficient", "Not Sufficient"))
Unnesting Question i:
UnnestQi <- SurveyClean2 %>%
as_tibble() %>%
mutate(i = str_split(Q3, ",")) %>%
unnest(i)
Survey2Q6 <- UnnestQi |> drop_na(Q5) |> drop_na(i)
Reformating Question 6 to Likert-friendly format:
clean_survey <- function(data, column, question) {
data %>%
dplyr::select(all_of({{column}})) %>%
dplyr::mutate(Question = question) %>%
dplyr::group_by(Question, across(1)) %>%
dplyr::count() %>%
dplyr::ungroup() %>%
tidyr::pivot_wider(names_from = 2, values_from = n)
}
# table that contains survey questions/columns and the question name
survey_table <- dplyr::tibble(
column = c("Q6_3"),
question = c("Expert advice")
)
# loop through your data and clean it, then bind as dataframe
LikertGroupqi62 <- purrr::map2_df(survey_table$column, survey_table$question, function(x, y){
clean_survey(Survey2Q6, x, y)}) |>
mutate(across(everything(), ~ifelse(is.na(.), 0, .)))
## Likert
LikertGroupqi62 <- LikertGroupqi62 |> dplyr::select(Question, `Not Sufficient`, `Slightly Sufficient`, `Moderately Sufficient`, `Very Sufficient`, `Completely Sufficient`)
Likert WITHOUT grouping:
likert(Question~., LikertGroupqi62, ReferenceZero = 0, auto.key.in = list(columns = 1), main = list("Sufficiency of Cost-share Advice Based on Person or Agency Worked With"), col = c("#db6d00", "#924900", "#000000", "#004949", "#009292"),strip.left = FALSE, ylab = "", xlab = "Total Number of Respondents")

how to use R if statements to make distinctions between two options of a variable

I am new to R and would like to understand how to use the if statement/if-else function to make distinctions between two options of a variable, which in my case would be gender. Specifically, i would like to use this dataset to see if there is a difference between men and women in having a bank account on their name(-1 and -2 are women and men, from 1:5 it's the options for the financial account).
dput(Finaccount)
> dput(head(Finaccount))
structure(list(finaccount = c("0", "0", "0", "2 6", "0", "0"),
genderresp = c(-2, -1, -2, -2, -2, -2)), row.names = c(NA,
6L), class = "data.frame")
Next, i will create a chi squared table, but i would also find helpful any advice on how to show the direction of the data (so not only that there is a significance but that one gender has more bank accounts on their name than others ).
I am using other ways to make the gendered distinction, mostly by creating two datasets with subset or grepl like in the examples below but i am assuming there are easier ways.
RespSeason <-- data.frame(respjobhh, gendercleaned)
View(RespSeason)
RespseasonCleaned = na.omit(RespSeason)
View(RespseasonCleaned)
Seasonwoman <-- subset(RespseasonCleaned, gendercleaned == "-2")
Seasonmen<-- subset(RespseasonCleaned, gendercleaned == "-1")
gender = hh_complete_fixed_cleaned[,22 ]
gendercleaned = na.omit(gender)jobhh = hh_complete_fixed_cleaned[, 128]
jobhhcleaned = na.omit(jobhh)
sapply(jobhhcleaned, mean, na.rm = TRUE)
data.frame(jobhh, gendercleaned)
View(jobhh, gendercleaned)
genderjobbhh <-- data.frame(jobhh, gendercleaned)
View(genderjobbhh)
malehh <--genderjobbhh[!grepl('2', genderjobbhh$gendercleaned),]
View(malehh)
sapply(malehh, mean, na.rm = TRUE)
femalehh <--genderjobbhh[!grepl('1', genderjobbhh$gendercleaned),]
View(femalehh)
sapply(femalehh, mean, na.rm = TRUE)
Thank you in advance for your contribution
The below might help if you a looking to get a mean by a group:
library(tidyverse)
df %>%
mutate(gender = if_else(gender == -1, "male", if_else(gender == -2, "female", "unknown"))) %>%
group_by(gender) %>%
summerize(mean_gender = mean(gender))
or
library(tidyverse)
df %>%
mutate(gender = as.character(gender),
gender = recode(gender, "-1" = "male", "-2" = "female")) %>%
group_by(gender) %>%
summerize(mean_gender = mean(gender))

Count Backwards in String until pattern R

I'm trying to extract UPCs from item descriptions. There is a varying number of /'s in the front of the description, but the UPC is always right before the last /, so I was using a count of characters, however, there is a variable number of characters at the end based on pack size. In the replication, you can see on the first row what this is supposed to look like at the end, but the second row has dropped the first digit of the UPC and picked up the /. Looking for a way to do this inline with DPLYR. My original code is under the replication.
test <- structure(list(Month = structure(c(17987, 17987), class = "Date"),store_id = c("7005", "7005"), UPC = c("000004150860081","00001200050404/"), `Item Description` = c("ACQUA PANNA SPRING WATER/EACH/000004150860081/1","AQUAFINA 24PK/24PK/000001200050404/24"), `Cals Item Description` = c(NA_character_,NA_character_), `Sub-Category` = c(NA_character_, NA_character_), Category = c(NA_character_, NA_character_), Department = c(NA_character_,NA_character_), `Sales Dollars` = c(17.43, 131.78), Units = c(7,528), Cost = c(8.4, 112.2), `Gross Margin` = c(9.03, 19.58), `Gross Margin %` = c(0.5181, 0.1486)), row.names = c(NA,-2L), class = c("tbl_df", "tbl", "data.frame"))
foo <- list.files(pattern = "*.csv", full.names = T) %>%
map_df(~read_csv(.)) %>%
mutate(date = lubridate::mdy(str_sub(textbox43, start = -10))) %>%
mutate(store_id = str_sub(textbox6, start = 1, end = 4)) %>%
mutate(item_desc = textbox57) %>%
filter(!is.na(item_desc), item_desc != "") %>%
mutate(dollars = textbox58,
units = textbox59,
cost = textbox61,
gm = textbox66,
gm_pct = textbox67) %>%
mutate(UPC = str_sub(item_desc, start = -17, end = -3))
Is this what you want?
sub("^.*/([^/]+)/[^/]*$",
"\\1",
test$`Item Description`)
Returns:
[1] "000004150860081" "000001200050404"
Edit: You were asking for dplyr style:
test %>%
mutate(item_id = sub("^.*/([^/]+)/[^/]*$",
"\\1",
test$`Item Description`))

How to convert data with different levels of information into wide format? [duplicate]

This question already has an answer here:
Reshaping data.frame with a by-group where id variable repeats [duplicate]
(1 answer)
Closed 2 years ago.
I have a data of patients' operations/procedures (example as shown in the picture below) where one row describes a patient's procedure. There are 2 levels of information,
the first being the operation details, i.e. op_start_dt, priority_operation and asa_status
the second being the procedure details, i.e. proc_desc and proc_table
An operation can have more than 1 procedures. In the example below, patient A has 2 operations (defined by distinct op_start_dt). In his first operation, he had 1 procedure (defined by distinct proc_desc) and in his second, he had 2 procedures.
I would like to convert the data into a wide format, where a patient only has one row, and his information will be arranged operation by operation and within each operation, it will be arrange procedure by procedure, as shown below. So, proc_descxy refers to the proc_desc on xth operation and yth procedure.
Data:
df <- structure(list(patient = c("A", "A", "A"), department = c("GYNAECOLOGY /OBSTETRICS DEPT",
"GYNAECOLOGY /OBSTETRICS DEPT", "GYNAECOLOGY /OBSTETRICS DEPT"
), op_start_dt = structure(c(1424853000, 1424870700, 1424870700
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), priority_operation = c("Elective",
"Elective", "Elective"), asa_status = c(2, 3, 3), proc_desc = c("UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
"KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
"HEART, VARIOUS LESIONS, HEART TRANSPLANTATION"), proc_table = c("99",
"6A", "7C")), row.names = c(NA, 3L), class = "data.frame")
Desired output:
df <- structure(list(patient = "A", department = "GYNAECOLOGY /OBSTETRICS DEPT",
no_op = 2, op_start_dt1 = structure(1424853000, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_proc1 = 1, priority_operation1 = "Elective",
asa_status1 = 2, proc_desc11 = "UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
proc_table11 = "99", op_start_dt2 = structure(1424870700, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_of_proc2 = 2, priority_operation2 = "Elective",
asa_status2 = 3, proc_desc21 = "KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
proc_table21 = "6A", proc_desc22 = "HEART, VARIOUS LESIONS, HEART TRANSPLANTATION",
proc_table22 = "7C"), row.names = 1L, class = "data.frame")
My attempt:
I tried to work this out, but it gets confusing along the way, with pivot_longer then pivot_wideragain.
df %>%
# Operation-level Information
group_by(patient) %>%
mutate(op_nth = dense_rank(op_start_dt),
no_op = n_distinct(op_start_dt)) %>%
# Procedure-level Information
group_by(patient, op_start_dt) %>%
mutate(proc_nth = row_number(),
no_proc = n_distinct(proc_desc)) %>%
ungroup() %>%
# Make pivoting easier
mutate_all(as.character) %>%
# Pivot Procedure-level Information
pivot_longer(-c(patient, department, no_op, op_nth, proc_nth)) %>%
# Remove the indices for "Procedure" for Operation_level Information
mutate(proc_nth = case_when(!(name %in% c("op_start_dt", "no_proc", "priority_operation", "asa_status")) ~ proc_nth)) %>%
# Create the column names
unite(name, c(name, op_nth, proc_nth), sep = "", na.rm = TRUE) %>%
distinct() %>%
pivot_wider(names_from = name, values_from = value)
Create a unique ID column for each patient and then use pivot_wider.
library(dplyr)
df %>%
group_by(patient) %>%
mutate(row = row_number()) %>%
tidyr::pivot_wider(names_from = row, values_from = op_start_dt:proc_table)

How to create a row by dividing First row by third row

I have a dataset which has values in first row & total in third row. I want to create a fourth row which is percentage of first by total which can be done by dividing first row with fourth row.
below is structure of dataframe
ds = structure(list(t1 = structure(c("1", "2", "Total"), label = "currently smoke any tobacco product", labels = c(no = 0,
yes = 1), class = "haven_labelled"), c1Female = c(679357.516868591,
8394232.81394577, 9073590.33081436), c1Male = c(2254232.8617363,
5802560.20343018, 8056793.06516647), se.c1Female = c(63743.4459540534,
421866.610586848, 485610.056540901), se.c1Male = c(185544.754820322,
386138.725133411, 571683.479953732), Total_1 = c(`1` = 2933590.37860489,
`2` = 14196793.0173759, `3` = 17130383.3959808), per = c(`1` = 0.171250713471665,
`2` = 0.828749286528335, `3` = 1)), class = "data.frame", row.names = c(NA,
-3L))
My try & what is wrong with this
ds %>% mutate(percentage = .[1,]/.[3,])
OUTPUT SHOULD BE : Below is the dput of Output Dataframe that I want
structure(list(t1 = structure(c(1L, 2L, 4L, 3L), .Label = c("1",
"2", "Percentage", "Total"), class = "factor"), c1Female = c(679357.517,
8394232.814, 9073590.331, 0.074871963), c1Male = c(2254232.86,
5802560.2, 8056793.07, 0.279792821), se.c1Female = c(63743.446,
421866.611, 485610.057, 0.131264674), se.c1Male = c(185544.755,
386138.725, 571683.48, 0.324558539), Total_1 = c(2933590.38,
14196793.02, 17130383.4, 0.171250714), per = c(0.171250713, 0.828749287,
1, 0.171250713)), class = "data.frame", row.names = c(NA, -4L
))
Do share the tidyverse way to do this. Also, do tell what is wrong with this approach below line code
ds %>% mutate(percentage = .[1,]/.[3,])
We can use summarise_at to divide multiple column values to return a single row and then bind with the original dataset
library(dplyr)
ds %>%
summarise_at(-1, ~ .[1]/.[3]) %>%
mutate(t1 = 'Percentage') %>%
bind_rows(ds, .)
# t1 c1Female c1Male se.c1Female se.c1Male Total_1 per
#1 1 6.793575e+05 2.254233e+06 6.374345e+04 1.855448e+05 2.933590e+06 0.1712507
#2 2 8.394233e+06 5.802560e+06 4.218666e+05 3.861387e+05 1.419679e+07 0.8287493
#3 Total 9.073590e+06 8.056793e+06 4.856101e+05 5.716835e+05 1.713038e+07 1.0000000
#4 Percentage 7.487196e-02 2.797928e-01 1.312647e-01 3.245585e-01 1.712507e-01 0.1712507
Or another option is add_row
ds %>%
add_row(t1 = 'Percentage') %>%
mutate_at(-1, ~ replace_na(., .[1]/.[3]))
Or do this within the add_row step itself
ds %>%
add_row(t1 = 'Percentage', !!!as.list(.[-1][1,]/.[-1][3,]))
# t1 c1Female c1Male se.c1Female se.c1Male Total_1 per
#1 1 6.793575e+05 2.254233e+06 6.374345e+04 1.855448e+05 2.933590e+06 0.1712507
#2 2 8.394233e+06 5.802560e+06 4.218666e+05 3.861387e+05 1.419679e+07 0.8287493
#3 Total 9.073590e+06 8.056793e+06 4.856101e+05 5.716835e+05 1.713038e+07 1.0000000
#4 Percentage 7.487196e-02 2.797928e-01 1.312647e-01 3.245585e-01 1.712507e-01 0.1712507

Resources