Grouping By Multiple Selection Answer for Likert Package - r

I wanted to create a likert graph that is grouped by Question i. I can create the likert graph for total responses ungrouped, but im uncertain of how to reformat question 6 without losing the column for question i. (aka do the reformatting done below but also have it take into account who selected what in question i.)
What I want is the sufficiency of Q6 grouped by their answer in question i.
Sample Dataframe:
SurveyClean2 <- data.frame(i = c("Mail,Email", "Mail", "Mail,Email,Podcast", "Radio,Podcast", "Radio", "Mail,Radio"), Q6_3 = c("Not Sufficient", "Very Sufficient", "Completely Sufficient", "Moderately Sufficient", "Moderately Sufficient", "Not Sufficient"))
Unnesting Question i:
UnnestQi <- SurveyClean2 %>%
as_tibble() %>%
mutate(i = str_split(Q3, ",")) %>%
unnest(i)
Survey2Q6 <- UnnestQi |> drop_na(Q5) |> drop_na(i)
Reformating Question 6 to Likert-friendly format:
clean_survey <- function(data, column, question) {
data %>%
dplyr::select(all_of({{column}})) %>%
dplyr::mutate(Question = question) %>%
dplyr::group_by(Question, across(1)) %>%
dplyr::count() %>%
dplyr::ungroup() %>%
tidyr::pivot_wider(names_from = 2, values_from = n)
}
# table that contains survey questions/columns and the question name
survey_table <- dplyr::tibble(
column = c("Q6_3"),
question = c("Expert advice")
)
# loop through your data and clean it, then bind as dataframe
LikertGroupqi62 <- purrr::map2_df(survey_table$column, survey_table$question, function(x, y){
clean_survey(Survey2Q6, x, y)}) |>
mutate(across(everything(), ~ifelse(is.na(.), 0, .)))
## Likert
LikertGroupqi62 <- LikertGroupqi62 |> dplyr::select(Question, `Not Sufficient`, `Slightly Sufficient`, `Moderately Sufficient`, `Very Sufficient`, `Completely Sufficient`)
Likert WITHOUT grouping:
likert(Question~., LikertGroupqi62, ReferenceZero = 0, auto.key.in = list(columns = 1), main = list("Sufficiency of Cost-share Advice Based on Person or Agency Worked With"), col = c("#db6d00", "#924900", "#000000", "#004949", "#009292"),strip.left = FALSE, ylab = "", xlab = "Total Number of Respondents")

Related

loading combined data in R

I have got several Excel files (see link: we.tl/t-qJl3kVcY0j) that I combine together.
Each Excel file have columns from A to AF as below:
t-ph
Load
HR
BF
V'E
V'O2
V'CO2
d O2/dW
RER
EqO2
EqCO2
PETCO2
VES (ml)
VESi (ml/m²)
FC (bpm)
QC (l/min)
IC (l/min/m²)
PAS (mmHg)
PAD (mmHg)
PAM (mmHg)
ICT
TEV (ms)
RPD (%)
WCI (kg.m/m²)
RVSi (dyn.s/cm5.m²)
RVS (dyn.s/cm5)
VTD est (ml)
FE est (%)
O2Hb
HHb
tHb
HbDiff
My previous code was working, but since I have added columns AC (HHb) to AF (HbDiff), I can't load them anymore. I have tried to match the number of columns, the title, but still not. I can't produce a reproducible example since I can't load my data.
Here is the code I used:
pacman::p_load(tidyverse, readxl, ggpubr)
library(dplyr)
library(ggplot2)
library(afex) ##statistic package
# load data and format
load_files <- function(files){
temp <- read_excel(files) %>%
select(-(c(8:11, 14:15, 18:23))) %>%
mutate(id = pull(.[4,1])) %>% ##ID
mutate(body_mass = pull(.[7,3])) %>% ##body mass
mutate(training = pull(.[4,2])) %>% ##training group
set_names(c("time", "power", "hr", "fr", "VE", "absVO2", "VCO2", "PETCO2", "VES", "QC", "IC", "WCI", "RVSi", "RVS", "VTD", "FE", "O2Hb", "HHb", "tHb", "HbDiff", "id", "body_mass", "training")) %>%
slice(86:which(grepl("ration", VE))-1) %>% ##until recovery period
mutate_at(vars(1:16), as.numeric) %>%
mutate_at(vars(18), as.numeric) %>%
mutate(time = format(as.POSIXct(Sys.Date() + time), "%H:%M", tz="UTC"),
absVO2 = absVO2/1000,
VCO2 = VCO2/1000)
}
# apply function to all files
df <- map_df(file_list, load_files)
# remove those with who have less than four similar power
df <- df %>%
mutate(len_seq = rep(rle(power)$lengths, rle(power)$lengths)) %>%
filter(len_seq == 4) %>%
mutate(seq_id = rep(1:(n()/4), each = 4)) %>%
group_by(id) %>%
select(-seq_id)%>%
select(-(20))
# group data
df_sum <- df %>%
type.convert(as.is = TRUE) %>%
group_by(id, power, training) %>%
summarise_if(is.numeric, mean) %>%
group_by(id) %>%
mutate(percent_absVO2 = absVO2/max(absVO2)*100,
percent_power = power/max(power)*100,
percent_QC = QC/max(QC)*100,
percent_SV = VES/max(VES)*100,
percent_VCO2 = VCO2/max(VCO2)*100,
percent_VE = VE/max(VE)*100) %>%
mutate(VE_VO2 = VE/absVO2,
VE_VCO2 = VE/VCO2) %>%
mutate(RER = VCO2/absVO2, VT = VE/fr) %>%
mutate(relVO2 = absVO2/body_mass*1000,
percent_relVO2 = relVO2/max(relVO2)*100) %>%
mutate(BF = VE/VT) %>%
mutate(mech_perf = (power/(((0.003*power+0.1208)*1000*body_mass)/60))*100) %>%
mutate(group = ifelse(grepl(".*-PRD-C", id), "CAD", "Healthy")) %>%
mutate(temps = ifelse(grepl(".*-PRD-C1", id), "1", ifelse(grepl(".*-PRD-S1", id), "1", "2")))
Then the outcome I get:
Error in `set_names()`:
! The size of `nm` (23) must be compatible with the size of `x` (20).
Run `rlang::last_error()` to see where the error occurred.
Thank you very much for your precious help.

gtsummary tbl_merge with multiple columns of variable length

I am doing a meta-analysis and would like to use gtsummary for Table 1 (Description of the Included Studies). I would like to have each column be a detail of the study (e.g. Authors, Intervention, Number, etc). Within this MA, there are some studies that have more than 2 interventions, so the rows won't be equal among studies (i.e. first column has 1 row per study, second column variable rows per study, etc).
Here is a dataset for the problem that matches my own dataset.
library(tidyverse)
#Create dataset
MA <-
tibble(
Study = c("Study 1", "Study 2"),
Intervention1 = c("Placebo", "Control"),
Intervention2 = c("Walking", "Running"),
Intervention3 = c("Running", NA),
Number_Int1 = c(21, 19),
Number_Int2 = c(19, 20),
Number_Int3 = c(20, NA)
)
Created on 2022-06-27 by the reprex package (v2.0.1)
I've tried to use tbl_summary and tbl_merge to generate a summary table, but to no avail.
Here is what I would like the table to look like:
Any help would be appreciated.
Ben
I've managed to find a solution using the gt package. Here is the code:
MA %>% pivot_longer(
cols = !Study,
names_to = c(".value", ".value"),
names_pattern = "(.)(.)",
values_drop_na = TRUE
) %>%
rename(Intervention = In) %>%
rename(Number = Nu) %>%
gt(groupname_col = "Study") %>%
tab_stubhead(label = "Study") %>%
tab_options(row_group.as_column = TRUE)
This gives the following output table:
If anyone has any solutions using the gtsummary package, that'd be great.
Thanks,
Ben

Text transform issue inserting image in a gt table

I had previously asked about inserting images in gt tables here and gotten a lot of help. But I am now encountering a new issue. The difficult part is that I am having trouble creating a minimal example.
Consider this code from the answer to my original post
library(gt)
library(magrittr)
library(purrr)
library(ggplot2)
# Let's make some pngs
mtcars %>%
split(.$cyl) %>%
map(~ ggplot(.x, aes(hp, mpg, color = factor(gear))) + geom_point()) %>%
set_names(c("CA", "UT", "OH")) %>%
iwalk(~ ggsave(paste0(.y, "_test.png"), .x))
column_one <- c("CA", "UT", "OH")
# Put the filenames in the column
column_two <- c("CA", "UT", "OH")
dashboard.data <- data.frame(column_one, column_two, stringsAsFactors = FALSE)
names(dashboard.data)[1] <- "State"
names(dashboard.data)[2] <- "IncidenceGauge"
dboard3 <- dashboard.data %>%
gt() %>%
tab_header(
title = md("**Big Title**"),
subtitle = md("*Subtitle*")
) %>%
text_transform(
locations = cells_body(vars(IncidenceGauge)),
fn = function(x) {
# loop over the elements of the column
map_chr(x, ~ local_image(
filename = paste0(.x, "_test.png"),
height = 100
))
}) %>%
cols_label(IncidenceGauge = "Risk Level")
print(dboard3)
It runs fine and produces this image
Now, imagine that I want to create a header row, and in that row there is no image in column two. In my case I create an image called NA_test.png that is just a blank white square. For the example below, you'll have to image that's what NA_test.png is and not a plot from mtcars. You'll see that column two now begins with NA...
library(gt)
library(magrittr)
library(purrr)
library(ggplot2)
# Let's make some pngs
mtcars %>%
split(.$cyl) %>%
map(~ ggplot(.x, aes(hp, mpg, color = factor(gear))) + geom_point()) %>%
set_names(c("NA", "UT", "OH")) %>%
iwalk(~ ggsave(paste0(.y, "_test.png"), .x))
column_one <- c("Header", "UT", "OH")
# Put the filenames in the column
column_two <- c(NA, "UT", "OH")
dashboard.data <- data.frame(column_one, column_two, stringsAsFactors = FALSE)
names(dashboard.data)[1] <- "State"
names(dashboard.data)[2] <- "IncidenceGauge"
dboard3 <- dashboard.data %>%
gt() %>%
tab_header(
title = md("**Big Title**"),
subtitle = md("*Subtitle*")
) %>%
text_transform(
locations = cells_body(vars(IncidenceGauge)),
fn = function(x) {
# loop over the elements of the column
map_chr(x, ~ local_image(
filename = paste0(.x, "_test.png"),
height = 100
))
}) %>%
cols_label(IncidenceGauge = "Risk Level")
print(dboard3)
This runs as well and produces this...again you'll just have to imagine that next to Header NA_test.png is just a blank white image.
My issue is that in my script (much longer, but literally copied in form from this example) when I encounter NA's it seems as if it's treating them like blanks rather then substituting NA. I get error messages that R can't find _test.png. Not NA_test.png as I would expect it to look for, but _test.png.
Here's the dataframe that is passed to gt()
There is no call to text_transform() for the Base column, but there is for the CommunityNewCases column and that's where the error occurs.
Can anyone suggest any reason as to why this is happening?
How to handle NA fields is important to dashboard design. So the real question is - should a dashboard really aggregate a plot of the entries which have "NA" as location? NA is typically a data entry error, so isn't the plot misleading?
As a statistical language, R considers the logical NA as a blank or empty data field. This is by design. And it is powerful, because NA retains its meaning in both character and numeric fields.
To change R's standard behavior, simply reassign an alternative value to the logical NA. For example, your sample code can replace logical NA with the character "NA" to use it in a paste command.
column_two <- c("NA", "UT", "OH")
Or if your data is more complex than this toy example, replace NA with "NA" using tidyr library.
dboard3 <- dashboard.data %>%
tidyr::replace_na("NA") %>%
gt() %>%
...
Alternatively, you could retain the logical NA, but handle with ifelse and if.na() test as shown here...
dboard3 <- dashboard.data %>%
gt() %>%
tab_header(
title = md("**Big Title**"),
subtitle = md("*Subtitle*")
) %>%
text_transform(
locations = cells_body(vars(IncidenceGauge)),
fn = function(x) {
# loop over the elements of the column
map_chr(x, ~ local_image(
filename = ifelse(is.na(.x),
"NA_test.png",
paste0(.x, "_test.png")),
height = 100
))
}) %>%
cols_label(IncidenceGauge = "Risk Level")

Formatting an ftable in R

I have the following 3 way table I created in R.
with(dataset, ftable(xtabs(count ~ dos + sex + edu)))
The output looks like
edu high low medium unknown
dos sex
five-to-ten-years female 247776 44916 127133 23793
male 225403 37858 147821 20383
five-years-or-less female 304851 58018 182152 33649
male 253977 55720 193621 28972
more-than-ten-years female 709303 452605 539403 165675
male 629162 309193 689299 121336
native-born female 1988476 1456792 2094297 502153
male 1411509 1197395 2790522 395953
unknown female 57974 75480 73204 593141
male 40176 57786 93108 605542
I want to rename the variables and format the table so that I can include it in a report. I know that I can use dnn to rename the variables, but are there any other recommendations to rename the variables? And to format the table (similar to using kable)?
You could convert the output to a text matrix using the following function, after which you can style with kable however you choose:
ftab_to_matrix <- function(ft)
{
row_vars <- attr(ft, "row.vars")
for(i in seq_along(row_vars)){
row_vars[[i]] <- c(names(row_vars[i]), row_vars[[i]])}
rowvar_widths <- sapply(row_vars, function(x) max(nchar(x))) + 1
col_vars <- attr(ft, "col.vars")
rowvar_widths <- c(1, cumsum(c(rowvar_widths, max(nchar(names(col_vars))))))
ft_text <- capture.output(print(ft))
row_cols <- sapply(seq_along(rowvar_widths)[-1], function(x)
substr(ft_text, rowvar_widths[x - 1], rowvar_widths[x]))
ft_text <- substr(ft_text, rowvar_widths[length(rowvar_widths)] + 2, 100)
ft_breaks <- c(1, cumsum(lapply(strsplit(ft_text[length(ft_text)], "\\d "),
function(x) nchar(x) + 2)[[1]]))
col_cols <- sapply(seq_along(ft_breaks)[-1], function(x)
substr(ft_text, ft_breaks[x - 1], ft_breaks[x]))
trimws(cbind(row_cols, col_cols))
}
So, for example, using my example data from your last question, you could do something like:
my_tab <- with(`3waydata`, ftable(xtabs(count ~ duration + sex + education)))
as_image(kable_styling(kable(ftab_to_df(my_tab))), file = "kable.png")
Might have been easier had you given the full picture when you asked your first question... You could use gt to make fancy tables for reports. This is an edited version more fully demonstrating some capabilities.
library(dplyr)
library(gt)
way3data <- data %>%
group_by(duration, education, sex) %>%
summarise(count = sum(number)) %>%
ungroup
# Reorder with select and Titlecase with stringr
longer <- tidyr::pivot_wider(way3data,
values_from = count,
names_from = "education") %>%
select(duration, sex, high, medium, low, unknown) %>%
rename_with(stringr::str_to_title)
# Demonstrating some of the features of gt
# obviously could have done some of this
# to the original dataframe
myresults <- longer %>%
group_by(Duration) %>%
gt(rowname_col = "Sex") %>%
row_group_order(
groups = c("native-born",
"more-than-ten-years",
"five-to-ten-years",
"five-years-or-less",
"unknown")
) %>%
tab_spanner(label = "Education",
columns = matches("High|Low|Medium|Unknown")) %>%
tab_stubhead(label = "Duration or something") %>%
tab_style(
style = cell_text(style = "oblique", weight = "bold"),
locations = cells_row_groups()) %>%
tab_style(
style = cell_text(align = "right", style = "italic", weight = "bold"),
locations = cells_column_labels(
columns = vars(High, Low, Medium, Unknown)
)) %>%
tab_style(
style = cell_text(align = "right", weight = "bold"),
locations = cells_stub()) %>%
tab_header(
title = "Fancy table of counts with Duration, Education and Gender") %>%
tab_source_note(md("More information is available at https://stackoverflow.com/questions/62284264."))
# myresults
# Can save in other formats including .rtf
myresults %>%
gtsave(
"tab_1.png", expand = 10
)
You can read about all the formatting choices here
Data compliments of Allan
set.seed(69)
data <- data.frame(education = sample(c("high","low","medium","unknown"), 600, T),
sex = rep(c("Male", "Female"), 300),
duration = sample(c("unknown", "native-born",
"five-years-or-less", "five-to-ten-years",
"more-than-ten-years"), 600, T),
number = rpois(600, 10))

R- melting groups of columns from survey data -understanding how R(gather) works

New to code in general and have been through hundreds of google searches and stackoverflow threads but nothing yet really explains how my solution works.
There are many ways to melt data, most appear overly complex...curious why my solution works when all other solutions are overly complex.
Original dataframe
> df <- data.frame(
ResponseID = c("123", "1234"),
Q1_W1 = c("Agree", "strongly disagree"),
Q1_W2 = c("disagree", "Agree"),
Q2_W1 = c("Disagree", "Disagree"),
Q2_W2 = c("Agree", "NA")
)
Desired output
ResponseID variable value variable value
123 Q1_W1 agree Q2_W1 disagree
1234 Q1_W2 disagree Q2_W2 agree
I was able to achieve this with:
nalh5=ALH %>% gather(question,response, Q1_W1:Q1_W7)%>%
gather(q2, r2,Q2_W1:Q2_W3)%>%
gather(q3, r3, Q3_W1:Q3_W5)
It works well, but are there more efficient ways to achieve this?
I guess this is cleaner but still in my opnion you are butchering an already tidy dataset.
df %>%
pivot_longer(names_to = "Q1_questions",values_to = "Q1_answers",cols = contains("Q1")) %>%
pivot_longer(names_to = "Q2_questions",values_to = "Q2_answers",cols = contains("Q2"))
You can even make it into a function
butcher_function <- function(df,Q) {
names_to_par <- str_c(Q,"questions",sep = "_")
values_to_par <- str_c(Q,"answers",sep = "_")
pivot_longer(data = df,
names_to = names_to_par,
values_to = values_to_par,
cols = contains(Q))
}
df %>%
butcher_function(Q = "Q1") %>%
butcher_function(Q = "Q2")

Resources