I am doing a meta-analysis and would like to use gtsummary for Table 1 (Description of the Included Studies). I would like each column to be a detail of the study (e.g. Authors, Intervention, Number). Some studies in this meta-analysis have more than two interventions, so the number of rows differs between studies (i.e. the first column has one row per study, while the second column has a variable number of rows per study, etc.).
Here is a sample dataset with the same structure as my own:
library(tidyverse)
#Create dataset
MA <-
tibble(
Study = c("Study 1", "Study 2"),
Intervention1 = c("Placebo", "Control"),
Intervention2 = c("Walking", "Running"),
Intervention3 = c("Running", NA),
Number_Int1 = c(21, 19),
Number_Int2 = c(19, 20),
Number_Int3 = c(20, NA)
)
Created on 2022-06-27 by the reprex package (v2.0.1)
I've tried to use tbl_summary and tbl_merge to generate a summary table, but to no avail.
Here is what I would like the table to look like:
Any help would be appreciated.
Ben
I've managed to find a solution using the gt package. Here is the code:
library(gt)
MA %>%
  pivot_longer(
    cols = !Study,
    # "Intervention1" -> .value "Intervention", arm "1"
    # "Number_Int1"   -> .value "Number",       arm "1"
    names_to = c(".value", "arm"),
    names_pattern = "(Intervention|Number)\\D*(\\d+)",
    values_drop_na = TRUE  # drops Study 2's empty third arm
  ) %>%
  select(-arm) %>%
  gt(groupname_col = "Study") %>%
  tab_stubhead(label = "Study") %>%
  tab_options(row_group.as_column = TRUE)
This gives the following output table:
If anyone has any solutions using the gtsummary package, that'd be great.
Thanks,
Ben
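Not a gtsummary solution, but a possible flat alternative: if the table only needs the study label on the first row of each study (the layout described in the question), the long table can be built with tidyr and the repeated labels blanked, after which it can be passed to gt, knitr::kable() or a similar function. This is a sketch assuming tidyr >= 1.0 (for pivot_longer() with `.value`); the regular expression assumes every column is named Intervention<k> or Number_Int<k>.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
MA <- tibble(
  Study = c("Study 1", "Study 2"),
  Intervention1 = c("Placebo", "Control"),
  Intervention2 = c("Walking", "Running"),
  Intervention3 = c("Running", NA),
  Number_Int1 = c(21, 19),
  Number_Int2 = c(19, 20),
  Number_Int3 = c(20, NA)
)

MA_long <- MA %>%
  pivot_longer(
    cols = !Study,
    names_to = c(".value", "arm"),            # one row per study arm
    names_pattern = "(Intervention|Number)\\D*(\\d+)",
    values_drop_na = TRUE                     # drop arms that do not exist
  ) %>%
  select(-arm) %>%
  # Show each study label only on its first row, as in a typical Table 1
  mutate(Study = if_else(duplicated(Study), "", Study))

MA_long
```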
Related
I am trying to calculate the percentages for cigarette smoking status by sex (for example, the percentage of males/females who are non-smokers, occasional smokers, regular smokers, prefer not to say, etc.). The default seems to calculate the percentage from the row total rather than the column total. Any help would be greatly appreciated.
Dataframe
smoking_data <- structure(list(sex = c("Female", "Male", "Female", "Female"),
cigarettes_smoking_status = c("Non-smoker", "Non-smoker",
"Non-smoker", "Non-smoker")), row.names = c(NA, 4L), class = "data.frame")
Code
library(dplyr)
library(tidyr)
library(janitor)
smoking_status_by_sex <- smoking_data %>%
group_by(sex) %>%
dplyr::count(cigarettes_smoking_status) %>%
pivot_wider(names_from = sex, values_from = n) %>% #increase number of columns & reduce rows
adorn_totals(c("row", "col") )
smoking_status_by_sex_per <- smoking_status_by_sex %>%
mutate(female_pct = round((100*.[[2]]/Total),digits =2),
male_pct = round((100*.[[3]]/Total),digits =2),
prefer_not_to_say_pct = round((100*.[[4]]/Total), digits=2),
unknown_pct = round((100*.[[5]]/Total),digits =2),
total_pct = round((100*.[[6]]/Total), digits=2))
This is the table I am trying to replicate: https://i.stack.imgur.com/hhDA4.png
I have tried using count, colSums, adorn_totals, etc., and then pivot_wider.
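It's easier to group_by() both sex and smoking status and then compute the relative frequencies. After summarise(), the result is still grouped by the first grouping variable, so sum(count1) in the following mutate() is that group's total and the percentages add up to 100 within each group; for your data, group by sex first to get per-sex (column) percentages. An example using the starwars data is given below.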
library(tidyverse)
df<-starwars
df %>%
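  group_by(eye_color, skin_color) %>%                 # group by eye color and skin color
  summarise(count1 = n(), .groups = "drop_last") %>%  # result stays grouped by eye_color
  mutate(grouppercentage = (count1 / sum(count1)) * 100)  # percentage within each eye_color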
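Applied to the question's variables, a minimal sketch could look like this. The small smoking_data below is hypothetical, in the same shape as the question's. It counts each sex x status combination and divides by each sex's own total, which gives column percentages once the sexes are spread into columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data in the same shape as the question's smoking_data
smoking_data <- data.frame(
  sex = c("Female", "Male", "Female", "Female", "Male", "Male"),
  cigarettes_smoking_status = c("Non-smoker", "Non-smoker", "Regular smoker",
                                "Non-smoker", "Occasional smoker", "Non-smoker")
)

col_pct <- smoking_data %>%
  count(sex, cigarettes_smoking_status) %>%     # n per sex x status
  group_by(sex) %>%                             # denominator = each sex's total
  mutate(pct = round(100 * n / sum(n), 2)) %>%
  ungroup() %>%
  # one row per status; columns n_Female, n_Male, pct_Female, pct_Male
  pivot_wider(names_from = sex, values_from = c(n, pct), values_fill = 0)

col_pct
```

If you are already using janitor, `tabyl(cigarettes_smoking_status, sex)` followed by `adorn_percentages("col")` should give the same column percentages.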
I want to create a Likert graph grouped by the answers to question i. I can create the Likert graph for all responses ungrouped, but I'm unsure how to reformat question 6 without losing the column for question i (i.e. do the reformatting shown below, but also take into account what each respondent selected in question i).
What I want is the Q6 sufficiency ratings grouped by each respondent's answer to question i.
Sample Dataframe:
SurveyClean2 <- data.frame(i = c("Mail,Email", "Mail", "Mail,Email,Podcast", "Radio,Podcast", "Radio", "Mail,Radio"), Q6_3 = c("Not Sufficient", "Very Sufficient", "Completely Sufficient", "Moderately Sufficient", "Moderately Sufficient", "Not Sufficient"))
Unnesting Question i:
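library(tidyverse)
UnnestQi <- SurveyClean2 %>%
  as_tibble() %>%
  mutate(i = str_split(i, ",")) %>%  # split multi-select answers, e.g. "Mail,Email"
  unnest(i)                          # one row per respondent x selected option
Survey2Q6 <- UnnestQi |> drop_na(Q6_3) |> drop_na(i)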
Reformatting Question 6 into a Likert-friendly format:
clean_survey <- function(data, column, question) {
data %>%
dplyr::select(all_of({{column}})) %>%
dplyr::mutate(Question = question) %>%
dplyr::group_by(Question, across(1)) %>%
dplyr::count() %>%
dplyr::ungroup() %>%
tidyr::pivot_wider(names_from = 2, values_from = n)
}
# table that contains survey questions/columns and the question name
survey_table <- dplyr::tibble(
column = c("Q6_3"),
question = c("Expert advice")
)
# loop through your data and clean it, then bind as dataframe
LikertGroupqi62 <- purrr::map2_df(survey_table$column, survey_table$question, function(x, y){
clean_survey(Survey2Q6, x, y)}) |>
mutate(across(everything(), ~ifelse(is.na(.), 0, .)))
## Likert
LikertGroupqi62 <- LikertGroupqi62 |> dplyr::select(Question, `Not Sufficient`, `Slightly Sufficient`, `Moderately Sufficient`, `Very Sufficient`, `Completely Sufficient`)
Likert WITHOUT grouping:
library(HH) # likert() comes from the HH package
likert(Question~., LikertGroupqi62, ReferenceZero = 0, auto.key.in = list(columns = 1), main = list("Sufficiency of Cost-share Advice Based on Person or Agency Worked With"), col = c("#db6d00", "#924900", "#000000", "#004949", "#009292"),strip.left = FALSE, ylab = "", xlab = "Total Number of Respondents")
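For the grouped version, one possible approach (a sketch, not tested against the full survey) is to skip the column-by-column helper and count Q6_3 within each option of question i directly, so that each option of i becomes one row, and therefore one bar, of the Likert data. It assumes tidyr >= 1.2.0 for `names_expand = TRUE`, which keeps scale levels nobody chose (such as "Slightly Sufficient" in the sample) as columns of zeros.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Sample data from the question
SurveyClean2 <- data.frame(
  i = c("Mail,Email", "Mail", "Mail,Email,Podcast", "Radio,Podcast", "Radio", "Mail,Radio"),
  Q6_3 = c("Not Sufficient", "Very Sufficient", "Completely Sufficient",
           "Moderately Sufficient", "Moderately Sufficient", "Not Sufficient")
)

# Full response scale, in Likert order
levels_q6 <- c("Not Sufficient", "Slightly Sufficient", "Moderately Sufficient",
               "Very Sufficient", "Completely Sufficient")

likert_by_i <- SurveyClean2 %>%
  mutate(i = str_split(i, ",")) %>%               # split multi-select answers
  unnest(i) %>%                                   # one row per respondent x option
  drop_na(i, Q6_3) %>%
  mutate(Q6_3 = factor(Q6_3, levels = levels_q6)) %>%
  count(Question = i, Q6_3) %>%                   # responses per option and level
  pivot_wider(names_from = Q6_3, values_from = n,
              values_fill = 0, names_expand = TRUE) %>%  # unused levels become 0 columns
  select(Question, all_of(levels_q6))             # enforce scale order

likert_by_i
```

The result has the same layout as LikertGroupqi62 (a Question column followed by one column per scale level), so it should be possible to pass it to the same likert(Question ~ ., ...) call shown above, giving one bar per option of question i.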
I have two datasets whose values are arranged in bins. I want to compare the mean of each bin between dataset1 and dataset2 and visualize the comparison in a line plot (or any other suitable method). I am new to this kind of analysis, so any suggestions would be very helpful.
In my actual data, both datasets have the same bins (bin1 to bin200), but the number of rows differs (roughly 200 to 300), so the datasets below are only a small sample. Because the number of rows differs between the datasets, and because extreme low and high values (outliers) in some bins may distort the means, I want to use a bootstrap: repeatedly draw a random sample (e.g. 100 rows) from each dataset and compute the mean of every bin.
Any suggestions on how to do this in R would be appreciated; I am new to R and still learning.
dataset1=structure(list(genenames = c("data1", "data2", "data3", "data4", "data5", "data6"),
bin1 = c(0,20,9,0,2,0),
bin2 = c(5,20,8,30,10,0),
bin3 = c(0,0,1,1,3,0),
bin4 =c(6, 20, 10, 5, 0, 1),
bin5 =c(10,15,30,10,9, 4)),
class = "data.frame", row.names = c(NA, -6L))
dataset2=structure(list(genenames = c("data10", "data11", "data12", "data13", "data14", "data15"),
bin1 = c(0,30,0,0,20,0),
bin2 = c(0,0,8,10,20,0),
bin3 = c(0,10,19,15,3,10),
bin4 =c(30, 0, 0, 25, 0, 20),
bin5 =c(0,5,0,20,30, 29)),
class = "data.frame", row.names = c(NA, -6L))
dataset1_mean=colMeans(dataset1[,-1])
dataset2_mean=colMeans(dataset2[,-1])
Is there a statistical method to deal with these outliers, or is there any problem with using the bootstrap here? Please let me know.
Thank you
Here is one way: after some data wrangling, you could use a boxplot and mark the mean with a red point. Note that this pools all bins of each dataset into a single box per dataset:
library(dplyr)
library(ggplot2)
library(tidyr)
dataset1 <- dataset1 %>%
mutate(df = "df1")
dataset2 <- dataset2 %>%
mutate(df = "df2")
bind_rows(dataset1, dataset2) %>%
pivot_longer(
cols = starts_with("bin"),
names_to = "name",
values_to = "value"
) %>%
ggplot(aes(df, value))+
geom_boxplot() +
stat_summary(fun=mean,
geom="point",
shape=20,
size=4,
color="red",
position = position_dodge2 (width = 0.7, preserve = "single"))
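Since the question asks specifically about bootstrapping the per-bin means, here is a minimal base-R sketch of a nonparametric bootstrap. Rows (genes) are resampled with replacement, the mean of every bin is recomputed each time, and the average and the 2.5%/97.5% percentiles of those bootstrap means are plotted per bin. The function name boot_bin_means(), B = 1000 and the percentile interval are illustrative choices, not part of the answer above. The bootstrap does not remove outliers; it only shows how much each bin mean varies because of them.

```r
set.seed(1)  # reproducible resampling

# The question's sample data (genes in rows, bins in columns)
dataset1 <- data.frame(genenames = paste0("data", 1:6),
  bin1 = c(0, 20, 9, 0, 2, 0), bin2 = c(5, 20, 8, 30, 10, 0),
  bin3 = c(0, 0, 1, 1, 3, 0), bin4 = c(6, 20, 10, 5, 0, 1),
  bin5 = c(10, 15, 30, 10, 9, 4))
dataset2 <- data.frame(genenames = paste0("data", 10:15),
  bin1 = c(0, 30, 0, 0, 20, 0), bin2 = c(0, 0, 8, 10, 20, 0),
  bin3 = c(0, 10, 19, 15, 3, 10), bin4 = c(30, 0, 0, 25, 0, 20),
  bin5 = c(0, 5, 0, 20, 30, 29))

# Hypothetical helper: resample n rows with replacement B times and
# summarise the bootstrap distribution of each bin's mean
boot_bin_means <- function(df, B = 1000, n = nrow(df)) {
  m <- as.matrix(df[, -1])   # drop the genenames column
  boots <- replicate(B, colMeans(m[sample(nrow(m), n, replace = TRUE), , drop = FALSE]))
  # boots is a bins x B matrix
  data.frame(bin = seq_len(ncol(m)),
             mean = rowMeans(boots),
             lower = apply(boots, 1, quantile, probs = 0.025),
             upper = apply(boots, 1, quantile, probs = 0.975))
}

res1 <- boot_bin_means(dataset1)
res2 <- boot_bin_means(dataset2)

# Line plot: solid = bootstrap mean, dashed = 95% percentile interval
plot(res1$bin, res1$mean, type = "l", col = "blue",
     ylim = range(res1$lower, res1$upper, res2$lower, res2$upper),
     xlab = "Bin", ylab = "Bootstrap mean")
lines(res1$bin, res1$lower, lty = 2, col = "blue")
lines(res1$bin, res1$upper, lty = 2, col = "blue")
lines(res2$bin, res2$mean, col = "red")
lines(res2$bin, res2$lower, lty = 2, col = "red")
lines(res2$bin, res2$upper, lty = 2, col = "red")
legend("topleft", legend = c("dataset1", "dataset2"), col = c("blue", "red"), lty = 1)
```

To draw 100 rows per replicate from the larger real datasets, as described in the question, call boot_bin_means(dataset1, n = 100). If outliers are the main concern, a more robust statistic could be swapped in, for example replacing colMeans() with apply(..., 2, mean, trim = 0.1) for a 10% trimmed mean.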
I have the following three-way table that I created in R:
with(dataset, ftable(xtabs(count ~ dos + sex + edu)))
The output looks like
edu high low medium unknown
dos sex
five-to-ten-years female 247776 44916 127133 23793
male 225403 37858 147821 20383
five-years-or-less female 304851 58018 182152 33649
male 253977 55720 193621 28972
more-than-ten-years female 709303 452605 539403 165675
male 629162 309193 689299 121336
native-born female 1988476 1456792 2094297 502153
male 1411509 1197395 2790522 395953
unknown female 57974 75480 73204 593141
male 40176 57786 93108 605542
I want to rename the variables and format the table so that I can include it in a report. I know I can use dnn to rename the variables, but are there other ways to rename them? And how can I format the table (similar to using kable)?
You could convert the output to a text matrix using the following function, after which you can style with kable however you choose:
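ftab_to_matrix <- function(ft)
{
  # Row-variable labels, each prefixed with its variable name
  row_vars <- attr(ft, "row.vars")
  for(i in seq_along(row_vars)){
    row_vars[[i]] <- c(names(row_vars[i]), row_vars[[i]])}
  # Printed width of each row-variable column
  rowvar_widths <- sapply(row_vars, function(x) max(nchar(x))) + 1
  col_vars <- attr(ft, "col.vars")
  # Character positions where each row-label column starts and ends
  rowvar_widths <- c(1, cumsum(c(rowvar_widths, max(nchar(names(col_vars))))))
  # Capture the printed ftable as lines of text
  ft_text <- capture.output(print(ft))
  # Slice the left-hand (row label) part of each line into columns
  row_cols <- sapply(seq_along(rowvar_widths)[-1], function(x)
    substr(ft_text, rowvar_widths[x - 1], rowvar_widths[x]))
  # Keep the right-hand (count) part of each line, however wide it is
  ft_text <- substr(ft_text, rowvar_widths[length(rowvar_widths)] + 2, max(nchar(ft_text)))
  # Use the last printed line to find where each count column ends
  ft_breaks <- c(1, cumsum(lapply(strsplit(ft_text[length(ft_text)], "\\d "),
                                  function(x) nchar(x) + 2)[[1]]))
  col_cols <- sapply(seq_along(ft_breaks)[-1], function(x)
    substr(ft_text, ft_breaks[x - 1], ft_breaks[x]))
  # Combine into one character matrix and drop padding
  trimws(cbind(row_cols, col_cols))
}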
So, for example, using my example data from your last question, you could do something like:
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), as_image()
my_tab <- with(`3waydata`, ftable(xtabs(count ~ duration + sex + education)))
as_image(kable_styling(kable(ftab_to_matrix(my_tab))), file = "kable.png")
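Two base-R alternatives that may be simpler than slicing the printed text: the variable names of the underlying xtabs table can be renamed by assigning to names(dimnames()), and format() on an ftable already returns a character matrix of the printed layout. A sketch with hypothetical placeholder counts (the dataset below only mimics the dos/sex/edu structure of the question):

```r
# Hypothetical counts with the same three variables as the question
dataset <- expand.grid(
  dos = c("five-to-ten-years", "native-born", "unknown"),
  sex = c("female", "male"),
  edu = c("high", "low", "medium", "unknown")
)
dataset$count <- seq_len(nrow(dataset)) * 10   # placeholder counts

tab <- xtabs(count ~ dos + sex + edu, data = dataset)

# Rename the variables (dimension names); level labels stay unchanged
names(dimnames(tab)) <- c("Duration of stay", "Sex", "Education")

ft <- ftable(tab)                     # last variable (Education) becomes the columns
ft_mat <- format(ft, quote = FALSE)   # character matrix of the printed layout

ft_mat
```

ft_mat can then be styled with kable in the same way as the output of ftab_to_matrix().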
This might have been easier had you given the full picture when you asked your first question. You could use gt to make fancy tables for reports; this edited version demonstrates more of its capabilities.
library(dplyr)
library(gt)
# `data` is the simulated dataset defined at the end of this answer
way3data <- data %>%
  group_by(duration, education, sex) %>%
  summarise(count = sum(number), .groups = "drop")
# Reorder with select and Titlecase with stringr
longer <- tidyr::pivot_wider(way3data,
values_from = count,
names_from = "education") %>%
select(duration, sex, high, medium, low, unknown) %>%
rename_with(stringr::str_to_title)
# Demonstrating some of the features of gt
# obviously could have done some of this
# to the original dataframe
myresults <- longer %>%
group_by(Duration) %>%
gt(rowname_col = "Sex") %>%
row_group_order(
groups = c("native-born",
"more-than-ten-years",
"five-to-ten-years",
"five-years-or-less",
"unknown")
) %>%
tab_spanner(label = "Education",
columns = matches("High|Low|Medium|Unknown")) %>%
tab_stubhead(label = "Duration or something") %>%
tab_style(
style = cell_text(style = "oblique", weight = "bold"),
locations = cells_row_groups()) %>%
tab_style(
style = cell_text(align = "right", style = "italic", weight = "bold"),
locations = cells_column_labels(
columns = c(High, Low, Medium, Unknown)  # vars() is deprecated in current gt
)) %>%
tab_style(
style = cell_text(align = "right", weight = "bold"),
locations = cells_stub()) %>%
tab_header(
title = "Fancy table of counts with Duration, Education and Gender") %>%
tab_source_note(md("More information is available at https://stackoverflow.com/questions/62284264."))
# myresults
# Can save in other formats including .rtf
myresults %>%
gtsave(
"tab_1.png", expand = 10
)
You can read about all the formatting choices here
Data compliments of Allan
set.seed(69)
data <- data.frame(education = sample(c("high","low","medium","unknown"), 600, T),
sex = rep(c("Male", "Female"), 300),
duration = sample(c("unknown", "native-born",
"five-years-or-less", "five-to-ten-years",
"more-than-ten-years"), 600, T),
number = rpois(600, 10))
I am having trouble writing R code for a choropleth map using the highcharter package. I am trying to replicate the code on lines 84-112 of the following notebook: https://www.kaggle.com/gloriousc/global-terrorism-in-1970-2016/code.
I have been encountering two errors:
When running line 95, I get an error saying that there is no object called "countrycode_data". I searched online to find out what countrycode_data is and found that it is a dataset of country codes used to match the country names in other datasets. From what I understood, countrycode_data should be included in the countrycode package, which I had installed, but I could not find out how to access it. To work around this, I downloaded the dataset from the internet and was able to continue with the code.
When running the choropleth code starting on line 103, I get the following error: "Error: %in%(x = tail(joinBy, 1), table = names(df)) is not TRUE". I have no idea what this error means, so I'm asking for help here.
I managed to get past the first error, even though I am not sure my workaround is correct.
Here is the entire code:
knitr::opts_chunk$set(echo=TRUE, error=FALSE)
library(dplyr) #manipulate table
library(ggplot2) #visualization
library(highcharter) #making map
library("viridisLite") #Default Color Maps
library(countrycode) #list of country code
library(treemap) #make a treemap chart
library(reshape2) #melt function
library(plotly) #pie chart
library(tm) #text mining
library(SnowballC) #stemming text
library(wordcloud) #make a text chart
library(RColorBrewer) #make a color pallette
library(DT) #make datatable
#input the data
terror <- read.csv("../input/globalterrorismdb_0617dist.csv")
Terrorist Incidents Map
#count terrorism incidents per country as a dataframe
countries <- terror %>%
group_by(country_txt) %>%
summarise(Total = round(n()))
#Making a terrorism map
#Credit to umeshnarayanappa
names(countries) <- c("country.name", "total") #change the column name
countries$iso3 <- countrycode_data[match(countries$country.name, countrycode_data$country.name.en), "iso3c"] #add iso3 column from country_code
data(worldgeojson, package = "highcharter")
dshmstops <- data.frame(q = c(0, exp(1:5)/exp(5)),
c = substring(viridis(5 + 1, option = "D"), 0, 7)) %>% #from viridisLite, make a color
list_parse2() #from highchart package, parse df to list
highchart() %>% #from highchart package
hc_add_series_map(worldgeojson, countries, value = "total", joinBy = "iso3") %>%
hc_colorAxis(stops = dshmstops) %>%
hc_legend(enabled = TRUE) %>%
hc_add_theme(hc_theme_db()) %>%
hc_mapNavigation(enabled = TRUE) %>%
hc_title(text = "Global Terrorism in 1970-2016", style = list(fontSize = "25px")) %>%
hc_add_theme(hc_theme_google()) %>%
hc_credits(enabled = TRUE, text = "Sources: National Consortium for the Study of Terrorism and Responses to Terrorism (START)", style = list(fontSize = "10px"))
I should add that even though I copied and pasted these lines exactly, they do not work for me.
Thank you for reading everything and also, I hope, for your help.
I tried to replicate the example, and I hope the following is enough for you to continue on your own. It seems that countrycode_data is in the psData package. That package requires the rJava package, which is not installed on my machine right now. Since you were looking for a workaround anyway, I found my own way: I scrape country data, including iso3 codes, from the web. (You could probably use the ISOcodes package too.)
You need to check whether the country names in the two datasets are identical, which is a common challenge; you usually see some mismatches. I did not have time to correct them all, but I showed how to revise some country names with recode(). The bottom line is that you want to add iso3 codes to countries, so you need to make the country names match as closely as possible. (Obviously, some countries no longer exist, and there is not much you can do about them.) The author used match() in his code; I used left_join() to do the same thing. After this, I think you are ready to follow the rest of the code.
Note that hc_add_series_map() also performs a join: worldgeojson has a property called iso3, so countries must have a column called iso3. Otherwise, you will get the same error message again.
library(tidyverse)
library(data.table)
library(rvest)
library(highcharter)
library(viridisLite)
# I used fread(). This is much faster.
terror <- fread("globalterrorismdb_0919dist.csv")
# I wrote my own code which does the same job.
count(terror, country_txt) %>%
setNames(nm = c("country.name", "total")) -> countries
# Get iso3 data
map_dfc(.x = c("official", "shortname", "iso3"),
.f = function(x) {read_html("http://www.fao.org/countryprofiles/iso3list/en/") %>%
html_nodes(paste("td.", x, sep = "")) %>%
html_text() %>%
gsub(pattern = "\\n(\\s+)?", replacement = "")}) %>%
setNames(nm = c("official", "shortname", "iso3")) -> iso3
# Revise some country names.
mutate(iso3, shortname = trimws(sub(x = shortname, pattern = "\\(.*\\)",
replacement = "")),
shortname = recode(.x = shortname,
`Bosnia and Herzegovina` = "Bosnia-Herzegovina",
`Brunei Darussalam` = "Brunei",
Czechia = "Czech Republic",
Congo = "Republic of the Congo",
`Côte d'Ivoire` = "Ivory Coast",
`Russian Federation` = "Russia",
`United Kingdom of Great Britain and Northern Ireland` = "United Kingdom",
`United States of America`= "United States"
)) -> iso3
# Join the two data sets
left_join(countries, iso3, by = c("country.name" = "shortname")) -> countries
data(worldgeojson, package = "highcharter")
dshmstops <- data.frame(q = c(0, exp(1:5)/exp(5)),
c = substring(viridis(5 + 1, option = "D"), 0, 7)) %>% #from viridisLite, make a color
list_parse2()
highchart() %>% #from highchart package
hc_add_series_map(worldgeojson, df = countries,
value = "total", joinBy = "iso3") %>%
hc_colorAxis(stops = dshmstops) %>%
hc_legend(enabled = TRUE) %>%
hc_add_theme(hc_theme_db()) %>%
hc_mapNavigation(enabled = TRUE) %>%
hc_title(text = "Global Terrorism in 1970-2016", style = list(fontSize = "25px")) %>%
hc_add_theme(hc_theme_google()) %>%
hc_credits(enabled = TRUE,
text = "Sources: National Consortium for the Study of Terrorism and Responses to Terrorism (START)",
style = list(fontSize = "10px"))
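An alternative to scraping, assuming a reasonably recent version of the countrycode package: its countrycode() function converts country names to ISO3 codes directly. If I remember correctly, the lookup table that the Kaggle code calls countrycode_data is shipped in newer releases of countrycode as codelist. The small countries data frame below is a hypothetical stand-in for the one built from the GTD file. The check near the end targets the second error: hc_add_series_map(..., joinBy = "iso3") stops with "%in%(x = tail(joinBy, 1), table = names(df)) is not TRUE" when df has no iso3 column.

```r
library(dplyr)
library(countrycode)

# Hypothetical stand-in for the per-country counts built from the GTD csv
countries <- data.frame(
  country.name = c("United States", "Russia", "Ivory Coast", "Bosnia-Herzegovina"),
  total = c(10, 20, 30, 40)   # placeholder counts
)

# countrycode() matches English country names with regular expressions, so many
# naming variants resolve without manual recoding; unmatched names become NA
countries <- countries %>%
  mutate(iso3 = countrycode(country.name, origin = "country.name", destination = "iso3c"))

# hc_add_series_map(..., joinBy = "iso3") needs a column with exactly this name
stopifnot("iso3" %in% names(countries))

# Names that countrycode() could not match, to be fixed by hand
unmatched <- filter(countries, is.na(iso3))
unmatched
```

After this, countries can go into the hc_add_series_map() call as before; any rows left in unmatched still need manual recoding, for example with recode() as in the answer.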