Add custom rows/header in table1()

I'm using the table1 package to create summary statistics. My current table1() looks like this (code below):
I want to add another section, with a bold label row called "Co-occurring disorder" that does not refer to a specific variable. After it, I want two un-bolded rows that show just the number of TRUE values in two distinct variables: "Mental Health" for one row and "Substance use" for the other. For example, if 12 people have TRUE for "Mental Health" and 7 people have TRUE for "Substance use", the rows following it would start like this:
Alternatively, how do I add a blank row to the table?
My current code is pasted below.
library(table1)
opp2$SEX <-
  factor(opp2$SEX, levels=c(1,2),
         labels=c("Male", "Female"))
opp2$REGION <-
  factor(opp2$REGION, levels=c(1:5),
         labels = c("Northeast", "North Central", "South", "West", "Unknown"))
opp2$GS <- factor(opp2$GS, levels = c(1:5),
                  labels = c("Male", "Female",
                             "Transmasculine", "Transfeminine", "Unknown"))
#units(opp2$AGE) <- "years"
labels <- list(variables=list(SEX="Sex",
                              AGE="Age (years)",
                              REGION="Region"),
               groups=list("", "Cis", "TGM"))
strata <- c(list(Total=opp2), split(opp2, opp2$GS))
my.render.cont <- function(x) {
  with(stats.apply.rounding(stats.default(x), digits=2),
       c("",
         "Median (IQR)"=sprintf("%s (± %s)", MEDIAN, IQR)))
}
my.render.cat <- function(x) {
  c("", sapply(stats.default(x),
               function(y) with(y, sprintf("%d (%0.0f %%)", FREQ, PCT))))
}
table1(strata, labels, groupspan=c(1,2, 3),
       render.continuous=my.render.cont, render.categorical=my.render.cat)
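No answer to this question appears here, so the following is only a partial sketch under stated assumptions, not a table1 feature: it computes, per stratum, the two "n (%)" values you would want under a "Co-occurring disorder" heading, using the same "%d (%0.0f %%)" format as my.render.cat. The logical columns MH and SU and the toy data frame toy_opp2 are hypothetical stand-ins for your real "Mental Health" and "Substance use" variables.

```r
# Hypothetical toy data: MH and SU stand in for the real logical
# "Mental Health" and "Substance use" columns of opp2.
toy_opp2 <- data.frame(
  GS = factor(c("Male", "Female", "Male", "Unknown", "Female")),
  MH = c(TRUE, FALSE, TRUE, TRUE, NA),
  SU = c(FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Count TRUE values and express them as a percentage of non-missing
# values, in the same "%d (%0.0f %%)" style as my.render.cat.
count_true <- function(x) {
  n <- sum(x, na.rm = TRUE)
  sprintf("%d (%0.0f %%)", n, 100 * n / sum(!is.na(x)))
}

# Same stratification as in the question: Total first, then each GS level.
toy_strata <- c(list(Total = toy_opp2), split(toy_opp2, toy_opp2$GS))

# One row per variable, one column per stratum.
co_occurring <- rbind(
  "Mental Health" = sapply(toy_strata, function(d) count_true(d$MH)),
  "Substance use" = sapply(toy_strata, function(d) count_true(d$SU))
)
print(co_occurring)
```

The result is a character matrix whose columns line up with the strata passed to table1(), so its rows could be placed under a manually added heading row.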


How do I create barplots with categories instead of numbers?

I'm just getting started in R and I'm trying to wrap my head around barplot for a university assignment. Specifically, I am using the General Social Survey 2018 dataset (codebook: https://www.thearda.com/Archive/Files/Codebooks/GSS2018_CB.asp) and I am trying to figure out whether religion has any effect on the way people seek help for mental health. I want to use reliten (self-assessed religiousness, from strong to no religion) as the IV and tlkclrgy (whether a person with mental health issues should reach out to a religious leader: yes or no) as the DV. To visualize the data, I want a side-by-side barplot with reliten on the x-axis, showing how many people answered yes and no to tlkclrgy. My problem is that the barplot shows numbers instead of categories (from strong to no religion). This is what I tried, but I keep getting NA on the x-axis:
GSS$reliten <- factor(as.character(GSS$reliten),
                      levels = c("No religion", "Somewhat strong", "Not very strong",
                                 "Strong"))
GSS <- GSS18[!GSS18$tlkclrgy %in% c(0, 8, 9),]
GSS$reliten <- as_factor(GSS$reliten)
GSS$tlkclrgy <- as_factor(GSS$tlkclrgy)
ggplot(data=GSS,mapping=aes(x=reliten,fill=tlkclrgy))+
geom_bar(position="dodge")
Does anybody have any tips?
Here is complete code to download the codebook and data, table the two columns of interest and plot the frequencies.
1. Read the data
The data will be downloaded to a temporary directory, to keep my disk tidy. These first two instructions are optional.
od <- getwd()
setwd("~/Temp")
These are the links to the two files that need to be read, and their filenames.
cols_url <- "https://osf.io/ydxu4/download"
cols_file <- "General Social Survey, 2018.col"
data_url <- "https://osf.io/e76rv/download"
data_file <- "General Social Survey, 2018.dat"
download.file(cols_url, cols_file, mode = "wb")
download.file(data_url, data_file, mode = "wb")
Now read in the codebook and process it, extracting the column widths and column names.
cols <- readLines(cols_file)
cols <- strsplit(cols, ": ")
widths_char <- sapply(cols, '[', 2)
i_widths <- grepl("-", widths_char)
f <- function(x) -eval(parse(text = x)) + 1L
widths <- rep(1L, length(widths_char))
widths[i_widths] <- sapply(widths_char[i_widths], f)  # f parses one "start-end" string at a time
col_names <- sapply(cols, '[', 1)
col_names <- trimws(sub("^.[^ ]* ", "", col_names))
col_names <- tolower(col_names)
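As an alternative to eval(parse()), the width of each "start-end" range can be computed by splitting on the hyphen. This sketch assumes, as the grepl("-") test above does, that each entry is either a single column position (width 1) or a range such as "7-10"; the helper range_width and the sample entries are hypothetical.

```r
# Hypothetical sample entries in the assumed codebook format.
example_char <- c("1-5", "6", "7-10")

# Width of one entry: 1 for a single position, end - start + 1 for a range.
range_width <- function(x) {
  bounds <- as.integer(strsplit(x, "-", fixed = TRUE)[[1]])
  if (length(bounds) == 1L) 1L else bounds[2] - bounds[1] + 1L
}

# vapply enforces one integer per entry.
example_widths <- vapply(example_char, range_width, integer(1), USE.NAMES = FALSE)
print(example_widths)  # 5 1 4
```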
Finally, read the fixed width text file.
df1 <- read.fwf(data_file, widths = widths, header = FALSE, na.strings = "-", col.names = col_names)
2. Table the data
Use grep to find where the two columns we want are.
i_cols <- c(
grep("reliten", col_names, ignore.case = TRUE),
grep("tlkclrgy", col_names, ignore.case = TRUE)
)
head(df1[i_cols])
Table those columns and coerce the result to a data.frame, then coerce the columns to factor.
There is a problem here: the published codebook has no answer 3 for tlkclrgy, but the data file does contain 3s, so I have created an extra factor level.
GSS <- as.data.frame(table(df1[i_cols]))
labels_reliten <- c(
"Not applicable",
"Strong",
"Not very strong",
"Somewhat Strong",
"No religion",
"Don't know",
"No answer"
)
levels_reliten <- c(0, 1, 2, 3, 4, 8, 9)
labels_tlkclrgy <- c(
"Not applicable",
"Yes",
"No",
"Not in codebook",
"Don't know",
"No answer"
)
levels_tlkclrgy <- c(0, 1, 2, 3, 8, 9)
GSS$reliten <- factor(
GSS$reliten,
labels = labels_reliten,
levels = levels_reliten
)
GSS$tlkclrgy <- factor(
GSS$tlkclrgy,
labels = labels_tlkclrgy,
levels = levels_tlkclrgy
)
3. Plot the frequencies table
library(ggplot2)
ggplot(data = GSS, mapping = aes(x = reliten, y = Freq, fill = tlkclrgy)) +
geom_col(position = "dodge")
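In the question's first attempt, the display names were passed as levels, but the column holds numeric codes, so no value matched and every entry became NA. factor() needs the stored codes in levels and the display names in labels, which is what the answer does. Below is a minimal base-R sketch of the same idea on a few made-up rows; the data frame raw is hypothetical, and the code meanings follow the answer's labels_reliten and labels_tlkclrgy.

```r
# Hypothetical toy rows holding the raw numeric codes.
raw <- data.frame(
  reliten  = c(1, 4, 2, 3, 1, 0, 4, 8),
  tlkclrgy = c(1, 2, 2, 1, 9, 1, 2, 2)
)

# The question's mistake: display names passed as `levels`, while the
# stored values are codes like "1" and "4", so nothing matches.
bad <- factor(as.character(raw$reliten), levels = c("No religion", "Strong"))
print(bad)  # all NA

# Keep substantive answers only (drops 0 = not applicable,
# 8 = don't know, 9 = no answer).
raw <- raw[raw$reliten %in% 1:4 & raw$tlkclrgy %in% 1:2, ]

# levels = stored codes, labels = displayed text; the order of
# `levels` also sets the order of the bars.
raw$reliten <- factor(raw$reliten,
                      levels = c(1, 3, 2, 4),
                      labels = c("Strong", "Somewhat strong",
                                 "Not very strong", "No religion"))
raw$tlkclrgy <- factor(raw$tlkclrgy, levels = c(1, 2), labels = c("Yes", "No"))

# Rows = answers, columns = religiousness; beside = TRUE gives dodged bars.
counts <- table(raw$tlkclrgy, raw$reliten)
print(counts)
barplot(counts, beside = TRUE, legend.text = TRUE)
```

With the factors labeled this way, the question's ggplot(...) + geom_bar(position = "dodge") call should also show category names on the x-axis.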

How to have varying number of arguments with default value in function in R?

I am new to R and am creating a function that highlights a list of countries from the dataset in the plot.
Functionality issue: if country names are not passed as arguments (their number can vary), the function should fall back to a default list of countries.
I understand that ... is used for a variable number of arguments and that I could then use list(...), but I cannot put this together with default values.
Is there a way I can write country_highlight_plot(Australia, Singapore, Norway)
and, if I don't name any country, have it use the default countries?
Below is the code (using gapminder data to reproduce):
library(tidyverse)
library(gghighlight)
library(scales)
library(gapminder)
country_highlight_plot <- function(df = gapminder, y_var = gdpPercap,
                                   background_line_color = "grey",
                                   countries = default_list
                                   ){
  # default list of highlight countries
  default_list = c("India","Singapore","Malaysia","Norway",
                   "Denmark","United States","United Kingdom","China")
  # quoting y-axis variable
  y_var = enquo(y_var)
  # Data Prep.
  df %>%
    mutate(highlight_type = case_when(country %in% countries ~ "Yes",
                                      TRUE ~ "No")) %>%
    # Plotting
    ggplot() +
    geom_line(aes(x = year, y = round(!!y_var,2), col = country), size = 1.1) +
    gghighlight(highlight_type == "Yes",
                unhighlighted_params = list(size = 1, colour = alpha(background_line_color, 0.4))) +
    # facet_wrap(~continent) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90)
    ) +
    labs(title = "GDP/Cap for world countries across time",
         subtitle = "created by ViSa",
         caption = "Data Source: Gapminder",
         y = "Total Tax Revenue % of GDP"
    )
}
country_highlight_plot()
# EDITED below line to gapminder df only
# country_highlight_plot(df=gapminder, y_var=gdpPercap, background_line_color= "pink")
I think setting a sane default value and checking for it makes sense.
I tend to differentiate between NULL and NA, where NULL means something like "use nothing or everything" and NA means "use a sane default".
Untested:
country_highlight_plot <- function(df = gapminder, y_var = gdpPercap,
                                   background_line_color = "grey",
                                   countries = NA) {
  if (is.null(countries)) {
    countries <- sort(unique(df[["country"]])) # assuming 'country' is in there
  } else if (anyNA(countries)) {
    countries <- c("India", "Singapore", "Malaysia", "Norway",
                   "Denmark", "United States", "United Kingdom", "China")
    countries <- intersect(countries, unique(df[["country"]]))
  }
  # ...
}
This allows one to use country_highlight_plot(..., countries=NULL) for all countries in df, and country_highlight_plot(..., countries=NA) for your default list of countries. The intersect call ensures that if this function is called with a filtered gapminder dataset, it won't look for countries that are not present. (Based on your current %in% usage, this might not be strictly necessary, but if you use countries for anything else, it might still be useful: defensive programming.)
Tangentially: if you're writing this function as part of a package, I suggest you import (depend on) the gapminder package and then use a similar technique for that dataset:
country_highlight_plot <- function(df, y_var = gdpPercap,
                                   background_line_color = "grey",
                                   countries = NA) {
  if (missing(df)) {
    df <- get("gapminder", envir = asNamespace("gapminder"))
  }
  if (is.null(countries)) {
    countries <- sort(unique(df[["country"]])) # assuming 'country' is in there
  } else if (anyNA(countries)) {
    countries <- c("India", "Singapore", "Malaysia", "Norway",
                   "Denmark", "United States", "United Kingdom", "China")
  }
  # ...
}
With this, you can do
# default data, your default_list
country_highlight_plot()
# default data, all countries
country_highlight_plot(countries = NULL)
# pre-defined data, all default_list countries found
dplyr::filter(gapminder, ...) %>%
country_highlight_plot(.)
# pre-defined data, all countries found
dplyr::filter(gapminder, ...) %>%
country_highlight_plot(., countries = NULL)
# default data, manual countries
country_highlight_plot(countries = c("a","B"))
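The answer uses a named countries argument rather than `...`. If you do want the variadic call style from the question, one minimal sketch is a helper that falls back to a default when `...` is empty. The helper name resolve_countries is hypothetical, and names must be passed as quoted strings here: unquoted names like Australia would need non-standard evaluation, and a multi-word name like "United States" could not be written as a plain symbol anyway.

```r
# Hypothetical helper: collect country names passed through `...`,
# falling back to a default vector when none are supplied.
resolve_countries <- function(...,
                              default = c("India", "Singapore", "Malaysia", "Norway",
                                          "Denmark", "United States",
                                          "United Kingdom", "China")) {
  # ...length() (R >= 3.5.0) counts the arguments captured by `...`
  if (...length() == 0L) {
    return(default)
  }
  # c(...) accepts separate strings as well as character vectors
  as.character(c(...))
}

print(resolve_countries())
print(resolve_countries("Australia", "Singapore", "Norway"))
```

Inside the plotting function you would then write countries <- resolve_countries(...). Because arguments placed after `...` can only be matched by name, df, y_var and background_line_color would have to be passed by name in that design.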

Order by name if repeat

I would like to sort the bars in descending order by value; where a value is repeated, the cities must appear in alphabetical order.
library(plotly)
city <- c("Paris", "New York", "Rio", "Salvador", "Curitiba", "Natal")
value <- c(10,20,30,10,10,10)
data <- data.frame(city, value, stringsAsFactors = FALSE)
data$city <- factor(data$city, levels = unique(data$city)[order(data$value, decreasing = FALSE)])
fig <- plot_ly(y = data$city, x = data$value, type = "bar", orientation = 'h')
This can be achieved by calling order() on the data frame: it sorts by the value column first (the - sign makes that decreasing) and then by city name to break ties.
data_ordered <- data[order(-data$value, data$city),]
data_ordered
city value
3 Rio 30
2 New York 20
5 Curitiba 10
6 Natal 10
1 Paris 10
4 Salvador 10
data_ordered$city <- factor(data_ordered$city, levels = data_ordered$city)
plot_ly(y = data_ordered$city, x = data_ordered$value, type = "bar", orientation = 'h') %>%
layout(yaxis = list(autorange = "reversed"))
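To check the ordering logic without plotting, the sketch below builds the level order both ways on the question's data: order() with two keys, as in this answer, and reorder() as in the tidyverse answer. On current R versions reorder() sorts the per-city scores with a stable sort, so tied cities keep the alphabetical order they start in; treat that as an observation about this example rather than a documented guarantee for every case.

```r
city  <- c("Paris", "New York", "Rio", "Salvador", "Curitiba", "Natal")
value <- c(10, 20, 30, 10, 10, 10)

# First key: -value (value descending); second key: city (alphabetical).
city_levels <- city[order(-value, city)]
print(city_levels)
# "Rio" "New York" "Curitiba" "Natal" "Paris" "Salvador"

# reorder() computes one score per city (the mean of -value) and sorts
# the cities by it; here the resulting levels match city_levels.
print(levels(reorder(city, -value)))
```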
Using the tidyverse, I suggest:
library(tidyverse)
city <- c("Paris", "New York", "Rio", "Salvador", "Curitiba", "Natal")
value <- c(10,20,30,10,10,10)
data <- data.frame(city, value)
db <- as_tibble(data)
db %>%
ggplot(aes(x = reorder(city, -value), y=value))+
geom_col()
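Calling reorder() inside the x aesthetic does what you want, and alphabetical order is respected for tied values.
To put the cities on the vertical axis (horizontal bars, as in the question), add coord_flip() at the end.
Switch "-value" to "value" if you want the reverse order; combined with coord_flip(), this puts the largest value at the top: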
library(tidyverse)
city <- c("Paris", "New York", "Rio", "Salvador", "Curitiba", "Natal", "Zoo", "Aaa")
value <- c(10,20,30,10,10,10,10,10)
data <- data.frame(city, value)
db <- as_tibble(data)
db %>%
ggplot(aes(x = reorder(city, value), y=value))+
geom_col() +
coord_flip()

No labels in sjPlot tab_df

I am trying to build a table with sjPlot and the tab_df function, but I cannot get the label names into the table:
library(sjPlot)
library(stargazer)
Region<-c("Berlin", "Hamburg", "Berchtesgarden")
Sensor<-c("Riegl ", "Riegl ", "Riegl ")
Platform<-c("Aircraft", "UAV", "Helicopter")
Acquisition_Year <-c("2002", "2002", "2002")
Month<-c("August", "September", "July")
Flight_Height<-c("400-600m AGL","400-600m AGL","400-600m AGL")
LidarAcq<-as.data.frame(cbind(Region, Sensor, Platform, Acquisition_Year, Month, Flight_Height))
LidarAcq
attr(LidarAcq$Region, "label") <- "Region"
attr(LidarAcq$Sensor, "label") <- "Sensor"
attr(LidarAcq$Platform, "label") <- "Platform"
attr(LidarAcq$Acquisition_Year, "label") <- "Acquisition Year"
attr(LidarAcq$Month, "label") <- "Month"
attr(LidarAcq$Flight_Height, "label") <- "Flight Height"
tab_df(LidarAcq, title = "Lidar Acquisition Parameters" , file= "~/LidarAcq.doc")
As I understand it, sjPlot is supposed to recognise the labels automatically...
So what am I doing wrong?
I also cannot get sjPlot to recognize the label for each column. You can instead use col.header with a character vector ("labels") to name your columns.
labels <- c("Region", "Sensor", "Platform", "Acquisition Year", "Month", "Flight Height")
tab_df(LidarAcq,
title = "Lidar Acquisition Parameters" ,
col.header = labels,
file= "~/LidarAcq.doc")
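Since col.header takes a character vector, one option is to derive it from the label attributes you already set, so the labels are defined in one place. A minimal base-R sketch; the toy columns are illustrative, and only col.header comes from the answer above. Columns without a "label" attribute fall back to their column name.

```r
# Illustrative subset of the question's columns.
LidarAcq <- data.frame(
  Region = c("Berlin", "Hamburg"),
  Acquisition_Year = c("2002", "2002"),
  Month = c("August", "September")
)
attr(LidarAcq$Region, "label") <- "Region"
attr(LidarAcq$Acquisition_Year, "label") <- "Acquisition Year"
# Month deliberately has no label, to show the fallback.

# Read each column's "label" attribute, or use the column name if absent.
col_labels <- mapply(function(col, nm) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab)) nm else lab
}, LidarAcq, names(LidarAcq), USE.NAMES = FALSE)
print(col_labels)

# e.g. sjPlot::tab_df(LidarAcq, col.header = col_labels)
```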

Function for label variable before plotting in R

I hope this question is not too easy for this forum (actually, I'm a bit embarrassed to ask it here, but I've been struggling with this small issue the whole day...)
I have data frames that look like the following:
df <- data.frame(runif(4),
c("po", "pr", "po", "pr"),
c("Control 1","Control 1", "Treatment 1", "Treatment 1"))
names(df) <- list("values", "test_type", "group")
Now I want to easily relabel the variables "test_type" and "group" for the plot afterwards (it's nicer to read "pretest" instead of "pr" in a presentation :-) ).
I could do it manually with:
df$test_type <- factor(df$test_type,
levels = c("pr", "po"),
labels = c("pretest", "posttest"))
df$group <- factor(df$group,
levels = c("Control 1", "Treatment 1"),
labels = c("control", "EST"))
However, I would have to repeat this for many more data frames, which led me to write a function:
var_label <- function(df, test, groups){
  # Create labels
  df$test_type <- factor(df$test,
                         levels = c("pr", "po"),
                         labels = c("pretest", "posttest"))
  df$group <- factor(df$groups,
                     levels = c("Control 1", "Treatment 1"),
                     labels = c("control", "EST"))
  return(list(df$test_type, df$group))
}
Unfortunately, this doesn't work. I tried many slightly different versions, and also different commands from the Hmisc package, but none of them worked. I know I can solve this problem another way, but I am trying to write more efficient, shorter code and would really like to know what I have to change to make this function work. Or, even better, do you have a suggestion for a more efficient way?
Thank you a lot in advance!!
I think forcats::fct_relabel() is what you want here, along with dplyr::mutate_at(). Assuming your relabeling needs are no more complex than what is outlined in your question, the following should get you what you appear to be looking for.
####BEGIN YOUR DATAFRAME CREATION####
df <- data.frame(runif(4),
c("po", "pr", "po", "pr"),
c("Control 1","Control 1", "Treatment 1", "Treatment 1"))
names(df) <- list("values", "test_type", "group")
#####END YOUR DATAFRAME CREATION#####
# Load dplyr and forcats
library(dplyr)
library(forcats)
# create a map of labels and levels based on your implied logic
# the setup is label = level
label_map <- c("pretest" = "pr"
,"posttest" = "po"
,"control" = "Control 1"
,"EST" = "Treatment 1")
# create a function to exploit the label map
fct_label_select <- function(x, map) {
  names(which(map == x))
}
# create a function which is responsive to a character vector
# as required by fct_relabel
fct_relabeler <- function(x, map) {
  unlist(lapply(x, fct_label_select, map = map))
}
# factor() guarantees there are levels to inspect, even when data.frame()
# stored test_type as character (the default since R 4.0.0)
fct_relabeler(levels(factor(df$test_type)), map = label_map)
# [1] "posttest" "pretest"
# function to meet your apparent needs
var_label <- function(df, cols, map){
  df %>%
    mutate_at(.vars = cols
              ,.funs = fct_relabeler
              ,map = map)
}
var_label(df = df, cols = c("test_type", "group"), map = label_map)
# values test_type group
# 1 0.05159681 posttest control
# 2 0.89050323 pretest control
# 3 0.42988881 posttest EST
# 4 0.32012811 pretest EST
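For comparison, the question's own function can also be made to work with two changes, sketched here in base R: index the columns with [[ ]] using the names passed in (df$groups looks for a column literally called groups, not the one named by the argument), and return the modified data frame instead of a list of two columns. The default argument values are an assumption added for convenience.

```r
df <- data.frame(values = runif(4),
                 test_type = c("po", "pr", "po", "pr"),
                 group = c("Control 1", "Control 1", "Treatment 1", "Treatment 1"))

var_label <- function(df, test = "test_type", groups = "group") {
  # df[[test]] uses the column name stored in the argument
  df[[test]] <- factor(df[[test]],
                       levels = c("pr", "po"),
                       labels = c("pretest", "posttest"))
  df[[groups]] <- factor(df[[groups]],
                         levels = c("Control 1", "Treatment 1"),
                         labels = c("control", "EST"))
  # return the whole data frame so the caller can reassign it
  df
}

df <- var_label(df)
str(df)
```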
