Function to label variables before plotting in R

I hope this question is not too easy for this forum (actually, I'm a bit embarrassed to ask it here, but I've been struggling with this small issue the whole day...)
I have data frames that look like the following:
df <- data.frame(runif(4),
c("po", "pr", "po", "pr"),
c("Control 1","Control 1", "Treatment 1", "Treatment 1"))
names(df) <- list("values", "test_type", "group")
Now, I want to easily re-label the variables "test_type" and "group" for the plot afterwards (it's nicer to read "pretest" instead of "pr" in a presentation :-) ).
I could do it manually with:
df$test_type <- factor(df$test_type,
levels = c("pr", "po"),
labels = c("pretest", "posttest"))
df$group <- factor(df$group,
levels = c("Control 1", "Treatment 1"),
labels = c("control", "EST"))
In this case, I would have to repeat this for a lot more data frames, which led me to write a function:
var_label <- function(df, test, groups){
# Create labels
df$test_type <- factor(df$test,
levels = c("pr", "po"),
labels = c("pretest", "posttest"))
df$group <- factor(df$groups,
levels = c("Control 1", "Treatment 1"),
labels = c("control", "EST"))
return(list(df$test_type, df$group))
}
Unfortunately, this doesn't work. I tried many slightly different versions and also different commands from the Hmisc package, but none of them worked. I know I can solve this problem in another way, but I am trying to write more efficient and shorter code, and I would be really interested in what I have to change to make this function work. Or even better, do you have a suggestion for a more efficient way?
Thank you a lot in advance!!

I think forcats::fct_relabel() is what you want here, along with dplyr::mutate_at(). Assuming that your relabeling needs are no more complex than what is outlined in your question, the following should get you what you appear to be looking for.
####BEGIN YOUR DATAFRAME CREATION####
df <- data.frame(runif(4),
c("po", "pr", "po", "pr"),
c("Control 1","Control 1", "Treatment 1", "Treatment 1"))
names(df) <- list("values", "test_type", "group")
#####END YOUR DATAFRAME CREATION#####
# Load dplyr and forcats
library(dplyr)
library(forcats)
# create a map of labels and levels based on your implied logic
# the setup is label = level
label_map <- c("pretest" = "pr"
,"posttest" = "po"
,"control" = "Control 1"
,"EST" = "Treatment 1")
# create a function to exploit the label map
fct_label_select <- function(x, map) {
names(which(map == x))
}
# create a function which is responsive to a character vector
# as required by fct_relabel
fct_relabeler <- function(x, map) {
unlist(lapply(x, fct_label_select, map = map))
}
fct_relabeler(levels(df$test_type), map = label_map)
# function to meet your apparent needs
var_label <- function(df, cols, map){
df %>%
mutate_at(.vars = cols
,.funs = fct_relabeler
,map = map)
}
var_label(df = df, cols = c("test_type", "group"), map = label_map)
# values test_type group
# 1 0.05159681 posttest control
# 2 0.89050323 pretest control
# 3 0.42988881 posttest EST
# 4 0.32012811 pretest EST
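To answer the direct question of what breaks the original function: inside var_label(), df$test looks for a column literally named "test" (which does not exist), and the function returns a list of two vectors instead of the modified data frame. A minimal base-R sketch that fixes both points, assuming the column names are passed as strings (the name var_label_base is hypothetical, chosen only for illustration):

```r
# Hypothetical base-R fix of the question's var_label()
var_label_base <- function(df, test, groups) {
  # df$test would look for a column literally called "test";
  # df[[test]] looks up the column whose name is stored in `test`
  df[[test]] <- factor(df[[test]],
                       levels = c("pr", "po"),
                       labels = c("pretest", "posttest"))
  df[[groups]] <- factor(df[[groups]],
                         levels = c("Control 1", "Treatment 1"),
                         labels = c("control", "EST"))
  df  # return the whole modified data frame, not a list of vectors
}

df <- data.frame(values = runif(4),
                 test_type = c("po", "pr", "po", "pr"),
                 group = c("Control 1", "Control 1", "Treatment 1", "Treatment 1"))
df <- var_label_base(df, test = "test_type", groups = "group")
str(df)
```

Because R functions work on a copy of their arguments, the result has to be assigned back (df <- var_label_base(df, ...)) for the relabelled columns to persist.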

Related

Add custom rows/header in Table1()

I'm using the table1 package to create summary statistics. My current table1() looks like this (code below):
I want to add another section, with a labeled row in bold called "Co-occurring disorder", that does not refer to a specific variable. After it, I want two un-bolded rows that show just the number of TRUE values in two distinct variables: "Mental Health" for one row and "Substance use" for another. For example, if 12 people have TRUE for "mental health" and 7 people have TRUE for "substance use," the following row would start like this:
Alternatively, how do I add a blank row to the table?
My current code is pasted below.
library(table1)
opp2$SEX <-
factor(opp2$SEX, levels=c(1,2),
labels=c("Male", "Female"))
opp2$REGION <-
factor(opp2$REGION, levels=c(1:5),
labels = c("Northeast", "North Central", "South", "West", "Unknown"))
opp2$GS <- factor(opp2$GS, levels = c(1:5),
labels = c("Male", "Female",
"Transmasculine", "Transfeminine", "Unknown"))
#units(opp2$AGE) <- "years"
labels <- list(variables=list(SEX="Sex",
AGE="Age (years)",
REGION="Region"),
groups=list("", "Cis", "TGM"))
strata <- c(list(Total=opp2), split(opp2, opp2$GS))
my.render.cont <- function(x) {
with(stats.apply.rounding(stats.default(x), digits=2),
c("",
"Median (IQR)"=sprintf("%s (± %s)", MEDIAN, IQR)))
}
my.render.cat <- function(x) {
c("", sapply(stats.default(x),
function(y) with(y, sprintf("%d (%0.0f %%)", FREQ, PCT))))
}
table1(strata, labels, groupspan=c(1,2, 3),
render.continuous=my.render.cont, render.categorical=my.render.cat)
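For the two un-bolded rows themselves, the numbers could be computed per stratum with a small helper, in the same "n (pct %)" format that my.render.cat uses. This is only a sketch: count_true, MH and SU are hypothetical names, it does not use any table1 feature to insert the rows into the table, and taking the percentage over all rows of each stratum is an assumption.

```r
# Hypothetical helper: one formatted cell per stratum for a logical column
count_true <- function(strata, var) {
  vapply(strata, function(d) {
    n <- sum(d[[var]], na.rm = TRUE)           # number of TRUE values
    sprintf("%d (%0.0f %%)", n, 100 * n / nrow(d))
  }, character(1))
}

# Made-up data; MH and SU stand in for the two logical flags
opp <- data.frame(GS = c("Male", "Female", "Male", "Female"),
                  MH = c(TRUE, FALSE, TRUE, TRUE),
                  SU = c(FALSE, FALSE, TRUE, FALSE))
strata <- c(list(Total = opp), split(opp, opp$GS))

rows <- rbind(`Mental Health` = count_true(strata, "MH"),
              `Substance use` = count_true(strata, "SU"))
print(rows)
```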

Subset a data frame in R/Shiny to generate a ggplot2 sf object

A complete ggplot2/Shiny beginner here. I have been searching Stack Overflow and Google for days and could not come up with a decent solution.
Task: to create an interactive leaflet map showing a user-selected column in a long data format (Covid vaccine doses - first, second, and third dose; need shiny to feed this into ggplot2's "data"), which are pre-filtered based on additional user choices (month of the year, age group, type of vaccine administered; these cannot be fed into ggplot2 directly so I need to filter out the data). I am therefore interested in subsetting selected columns (time, age_group, vaccine) based on the values the users select in the input.
I am importing a data frame in .csv which needs to be merged with a sf object later on to match the data with the sf coordinates (supplied by RCzechia).
# Load packages
library(shiny)
library(here)
library(tidyverse)
library(ggplot2)
library(RCzechia)
library(sf)
# Load data
df <- read.csv("data", encoding = "UTF-8")
# load geo-spatial sf data for ggplot
czrep <- republika()
regions <- kraje(resolution = "low")
# Defining UI for the ggplot application
ui <- fluidPage(
titlePanel(),
# Sidebar
sidebarLayout(
sidebarPanel(width = 3,
selectInput("box_time", label = "Month & Year",
choices = sort(unique(df$time)), selected = "",
width = "100%", selectize=FALSE),
selectInput("box_age", label = "Age group",
choices = sort(unique(df$age_group)), selected = "",
width = "100%", selectize=FALSE),
selectInput("box_vax", label = "Type of vaccine",
choices = sort(unique(df$vaccine)), selected = "",
width = "100%", selectize=FALSE),
radioButtons("button_dose", label = "Vaccine dose",
choices = c("First dose" = "first_dose",
"Second dose" = "second_dose",
"Booster" = "booster"))
),
# Displaying the user-defined ggplot
mainPanel(
plotOutput("map")
)))
# Server
server <- function(input, output) {
# select column for ggplot
r_button_dose <- reactive({input$button_dose})
### Subset based on user choices - this is where I tried to create a new data frame (new_df) as a result of subsetting by - see below. ###
# merge the df with the sf object
new_df <- merge(regions, new_df, by.x = "region_id", by.y="region_id")
# transform data set into an sf object (readable by ggplot)
new_df <- st_as_sf(new_df)
})
# Generating the plot based on user choices
output$map <- renderPlot({
ggplot(data = new_df) +
geom_sf(aes_string(fill = r_button_dose(), colour = NA, lwd = 2)) +
geom_sf(data = czrep, color = "grey27", fill = NA) +
scale_fill_viridis_c(trans = "log", labels = scales::comma) +
labs(fill = "log scale") +
theme_bw() +
theme(legend.text.align = 1,
legend.title.align = 0.5)
})
}
# Starting the Shiny application
shinyApp(ui = ui, server = server)
I cannot figure out how to subset the data. I have tried many different things that I found here and on the RStudio Community forums.
Here are a couple of things I have already tried:
# used both filter() and subset(); also tried both '==' and '%in%'
new_df %>%
filter(time %in% box_time() &
age_group %in% input$box_age() &
vaccine %in% input$box_vax())
})
#OR#
new_df <- reactive({
df <- df %>%
filter(time %in% box_time() &
age_group %in% input$box_age() &
vaccine %in% input$box_vax())
})
#OR#
new_df <- df
new_df$time <- df[df$time==box_time(),]
new_df$age_group <- df[df$age_group==input$box_age(),]
new_df$vaccine <- df[df$vaccine ==input$box_vax(),]
# I also tried passing them the same way as this example:
r_button_dose <- reactive({input$button_dose})
#OR EVEN#
new_df <- reactive({
new_df <- df
new_df$time <- df[df$X.U.FEFF.year_mo==box_time(),]
new_df$age_group <- df[df$age_group==input$box_age(),]
new_df$vaccine <- df[df$vaccine ==input$box_vax(),]
})
With the last option, I get the following errors, even though the options are similar:
Listening on http://127.0.0.1:4092
Warning: Error in $: object of type 'closure' is not subsettable
1: runApp
Warning: Error in $: object of type 'closure' is not subsettable
1: runApp
Warning: Error in as.data.frame.default: cannot coerce class ‘c("reactiveExpr", "reactive", "function")’ to a data.frame
176: stop
175: as.data.frame.default
172: merge.data.frame
168: renderPlot [C:/Users/xyz/Documents/R/example/gg_app.R#78]
166: func
126: drawPlot
112: <reactive:plotObj>
96: drawReactive
83: renderFunc
82: output$map
1: runApp
I don't know what to do - looking for more examples online has not worked. I know that I cannot pass a reactive value directly (even though I am not sure if it is because it returns a logical value). I would be extremely grateful for any tips regarding how to resolve this - thank you!
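The "cannot coerce class 'reactiveExpr'" error comes from passing the reactive itself to merge(): reactive() returns a function, so merge(regions, new_df, ...) receives that function rather than a data frame, and it has to be called as new_df(). The "closure is not subsettable" error means $ was applied to a function in the same way; a reactive is one such function, and so is R's built-in df() density function if the data frame named df is not visible where the code runs. The behaviour can be reproduced without Shiny; in this sketch get_data is a hypothetical stand-in for a reactive expression.

```r
# get_data() plays the role of a reactive(): a function returning a data frame
get_data <- function() {
  data.frame(time = c("2021-01", "2021-02"), n = c(10, 20),
             stringsAsFactors = FALSE)
}

# Using $ on the function itself fails, like new_df$time in the app
msg <- tryCatch(get_data$time, error = function(e) conditionMessage(e))
print(msg)

# Calling the function first returns the data frame, which can be subset
print(get_data()$time)
```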
You can define your reactive dataframe as a reactiveVal:
df_filtered <- reactiveVal(df) ## df being your initial static dataframe
The tricky bit is to treat your reactive dataframe as a function, not a static object:
## works:
df_filtered(df %>% filter(age_group == input$box_age))
renderDataTable(df_filtered()) ## note the parentheses
instead of:
## won't work:
df_filtered <- df %>% filter(age_group %in% input$box_age)
renderDataTable(df_filtered)
Finally, wrap the update in an observer that is triggered with bindEvent():
observe({df_filtered(df %>% filter(age_group == input$box_age))
## note: function argument, not assignment operator
output$map <- renderPlot({
df_filtered() %>% ## again: note function (parentheses)
ggplot() # etc.
})
}) %>% bindEvent(input$box_age, input$some_other_picker)
I think you are almost there; it is a slight syntax issue. Note that I return new_df at the end of the reactive block (which is essentially a function), and in renderPlot I set 'data' to the result of calling r_button_dose(). You will need to modify the fill attribute, as I'm not sure what you want it to be filled with.
# reactive block that builds the plot data and returns it
r_button_dose <- reactive({
### Subset df based on the user choices here to create new_df ###
# merge the df with the sf object
new_df <- merge(regions, new_df, by.x = "region_id", by.y="region_id")
# transform data set into an sf object (readable by ggplot)
new_df <- st_as_sf(new_df)
new_df # the last expression is the value returned by the reactive
})
# Generating the plot based on user choices
output$map <- renderPlot({
ggplot(data = r_button_dose()) +
geom_sf(aes_string(fill = r_button_dose()$region_id, colour = NA, lwd = 2)) +
geom_sf(data = czrep, color = "grey27", fill = NA) +
scale_fill_viridis_c(trans = "log", labels = scales::comma) +
labs(fill = "log scale") +
theme_bw() +
theme(legend.text.align = 1,
legend.title.align = 0.5)
})
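A further option is to keep the subsetting in a plain function, so it can be tested outside Shiny and then called from inside a reactive(). Input values such as input$box_age are plain character vectors, so they are used without parentheses; only reactive expressions are called with (). In this sketch the helper filter_selection() and the toy data are hypothetical, and the Shiny wiring at the end is shown as comments only.

```r
# Hypothetical helper: subset by the three user choices
filter_selection <- function(data, time, age_group, vaccine) {
  # %in% also copes with multi-select inputs
  data[data$time %in% time &
         data$age_group %in% age_group &
         data$vaccine %in% vaccine, , drop = FALSE]
}

# Made-up data standing in for the real CSV
toy <- data.frame(
  region_id  = c(1, 2, 1, 2),
  time       = c("2021-01", "2021-01", "2021-02", "2021-02"),
  age_group  = c("18-29", "30-49", "18-29", "18-29"),
  vaccine    = c("A", "A", "B", "A"),
  first_dose = c(100, 200, 150, 250),
  stringsAsFactors = FALSE
)
sel <- filter_selection(toy, "2021-02", "18-29", "A")
print(sel)

# In the server this could be wired up roughly like this:
# map_data <- reactive({
#   new_df <- filter_selection(df, input$box_time, input$box_age, input$box_vax)
#   st_as_sf(merge(regions, new_df, by.x = "region_id", by.y = "region_id"))
# })
# and then ggplot(data = map_data()) inside renderPlot().
```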

How do I create barplots with categories instead of numbers?

I'm just getting started in R and I'm trying to wrap my head around barplots for a university assignment. Specifically, I am using the General Social Survey 2018 dataset (for codebook: https://www.thearda.com/Archive/Files/Codebooks/GSS2018_CB.asp) and I am trying to figure out whether religion has any effect on the way people seek help for mental health. I want to use reliten (self-assessment of religiousness, from strong to no religion) as the IV and tlkclrgy (asks whether a person with mental health issues should reach out to a religious leader, yes or no) as the DV. For a better visualization of the data, I want to create a side-by-side barplot with reliten on the x-axis, showing how many people answered yes and no on tlkclrgy. My problem is that on the barplot I get numbers instead of categories (from strong to no religion). This is what I tried, but I keep getting NA on the x-axis:
GSS$reliten <- factor(as.character(GSS$reliten),
levels = c("No religion", "Somewhat strong", "Not very strong",
"Strong"))
GSS <- GSS18[!GSS18$tlkclrgy %in% c(0, 8, 9),]
GSS$reliten <- as_factor(GSS$reliten)
GSS$tlkclrgy <- as_factor(GSS$tlkclrgy)
ggplot(data=GSS,mapping=aes(x=reliten,fill=tlkclrgy))+
geom_bar(position="dodge")
Does anybody have any tips?
Here is complete code to download the codebook and data, table the two columns of interest and plot the frequencies.
1. Read the data
Data will be downloaded to a temporary directory, to keep my disk tidy. These first two instructions are optional.
od <- getwd()
setwd("~/Temp")
These are the links to the two files that need to be read and the filenames.
cols_url <- "https://osf.io/ydxu4/download"
cols_file <- "General Social Survey, 2018.col"
data_url <- "https://osf.io/e76rv/download"
data_file <- "General Social Survey, 2018.dat"
download.file(cols_url, cols_file, mode = "wb")
download.file(data_url, data_file, mode = "wb")
Now read in the codebook and process it, extracting the column widths and column names.
cols <- readLines(cols_file)
cols <- strsplit(cols, ": ")
widths_char <- sapply(cols, '[', 2)
i_widths <- grepl("-", widths_char)
f <- function(x) -eval(parse(text = x)) + 1L
widths <- rep(1L, length(widths_char))
# apply f() to each range string; f() handles one string at a time
widths[i_widths] <- vapply(widths_char[i_widths], f, numeric(1))
col_names <- sapply(cols, '[', 1)
col_names <- trimws(sub("^.[^ ]* ", "", col_names))
col_names <- tolower(col_names)
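The width arithmetic can be checked on a few made-up codebook entries. This sketch assumes entries look like "start-end" for multi-column fields and a single position otherwise. f() must be applied to one string at a time, because eval() on several parsed expressions only returns the value of the last one.

```r
f <- function(x) -eval(parse(text = x)) + 1L   # "12-15" -> -(12 - 15) + 1 = 4

# Made-up position strings in the assumed codebook format
widths_char <- c("1-4", "5", "6-10")
i_widths <- grepl("-", widths_char)             # which entries are ranges
widths <- rep(1L, length(widths_char))          # single positions are 1 wide
widths[i_widths] <- vapply(widths_char[i_widths], f, numeric(1))
print(widths)
```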
Finally, read the fixed width text file.
df1 <- read.fwf(data_file, widths = widths, header = FALSE, na.strings = "-", col.names = col_names)
2. Table the data
Find out where the two columns we want are, using grep.
i_cols <- c(
grep("reliten", col_names, ignore.case = TRUE),
grep("tlkclrgy", col_names, ignore.case = TRUE)
)
head(df1[i_cols])
Table those columns and coerce the result to a data.frame. Then coerce the columns to factor.
There is a problem here: there is no answer 3 for tlkclrgy in the published survey, but there are answers coded 3 in the data file. So I have created an extra factor level.
GSS <- as.data.frame(table(df1[i_cols]))
labels_reliten <- c(
"Not applicable",
"Strong",
"Not very strong",
"Somewhat Strong",
"No religion",
"Don't know",
"No answer"
)
levels_reliten <- c(0, 1, 2, 3, 4, 8, 9)
labels_tlkclrgy <- c(
"Not applicable",
"Yes",
"No",
"Not in codebook",
"Don't know",
"No answer"
)
levels_tlkclrgy <- c(0, 1, 2, 3, 8, 9)
GSS$reliten <- factor(
GSS$reliten,
labels = labels_reliten,
levels = levels_reliten
)
GSS$tlkclrgy <- factor(
GSS$tlkclrgy,
labels = labels_tlkclrgy,
levels = levels_tlkclrgy
)
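The NA values in the question come from a mismatch between the stored values and the levels: the reliten column holds numeric codes, so giving the text labels as levels matches nothing. A small sketch with made-up codes, using the code-to-label mapping from this answer:

```r
# Made-up numeric codes standing in for the reliten column
codes <- c(1, 4, 2, 1, 3)

# Levels given as text: none of the stored codes match, so all become NA
bad <- factor(as.character(codes),
              levels = c("No religion", "Somewhat strong",
                         "Not very strong", "Strong"))
print(bad)

# levels = the codes actually stored, labels = the text to display
good <- factor(codes,
               levels = c(1, 2, 3, 4),
               labels = c("Strong", "Not very strong",
                          "Somewhat strong", "No religion"))
print(good)
```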
3. Plot the frequencies table
library(ggplot2)
ggplot(data = GSS, mapping = aes(x = reliten, y = Freq, fill = tlkclrgy)) +
geom_col(position = "dodge")

How do I get a summary table from a custom function to react to user input of variables?

This is my first attempt at using Shiny.
I have a simulated patient-level dataset with 4 variables:
group: Categorical, takes on values A, B and C. Represents 3 different treatment types that were used in the study.
week: Numeric variable, takes on values 1, 4, 8. Represents follow-up week.
painscore: Numeric variable, score on scale of 1-10, with 1 indicating no pain, 10 indicating extreme pain.
dependscore: Numeric variable, score on scale of 1-10, with 1 indicating no dependency on pain meds, 10 indicating extreme dependency.
Trying to build a simple app that accepts two inputs: the week, and the variable, and provides two outputs:
A boxplot of distribution of scores for the selected variable for the selected week. The x axis would represent the 3 levels of group (A, B and C).
A summary table that shows the number of observations, median, 25th percentile, 75th percentile and number of missing.
I was able to create the interactive boxplot, but I am unable to create the summary table. I was able to create static versions of this table in RMarkdown using the summaryBy function from doBy, but I am not able to implement it in Shiny. Tried following the advice here and here but I'm missing something.
Here's my code for reproducibility. Excuse the extensive annotations (I'm a complete beginner); they are more for myself than for anyone else.
#libraries--------------------
library(shiny)
library(tidyverse)
library(knitr)
library(doBy)
#----------------------------
#input data
set.seed(123)
mydf <- data.frame( group = rep(rep(c("A","B","C"), each = 3), times = 3),
week = rep(rep(c(1,4,8), each = 9)),
painscore = sample(1:10, 27, replace = TRUE),
dependscore = sample(1:10, 27, replace = TRUE) )
#--------------------------
#define custom function to calculate summary statistics for column of interest.
#function explained in a little more detail when applied in the server function.
fun <- function(x) {
c( n = length(x),
m = median(x, na.rm = TRUE),
firstq = round(quantile(x, na.rm = TRUE)[2], 1),
lastq = round(quantile(x, na.rm = TRUE)[4], 1),
missing = sum(is.na(x))) # count (not proportion) of missing values
}
#-------------------------
#UI
ui <- fluidPage(
titlePanel("Shiny Boxplot and Table"),
#User can provide two different inputs
sidebarLayout(
sidebarPanel(
#1. allow user to pick week using radiobuttons
radioButtons(inputId = "pickedwk",
label = "week you want to display",
choices = c(1,4,8),
selected = 1),
#2. user can pick variable to visualize using dropdownboxes
selectInput(inputId = "var",
label = "variable to visualize",
list("How much pain did you feel today?" = "painscore",
"How dependent are you on medication?" = "dependscore")),
#helpertext
helpText("Enter week/variable choices here")
),
#Spaceholders for output
mainPanel(
plotOutput("boxplot"), #boxplot placeholder
htmlOutput("descriptives") #kable html table placeholder
)
)
)
#-------------------------
#Server
server <- function(input, output) {
#create dataset that subsets down to the week picked by user.
weeksub <- reactive({
mydf %>% filter(week == input$pickedwk[1])
})
#1. use reactive datasubset to render boxplot.
output$boxplot <- renderPlot({
ggplot(weeksub(), aes_string(x = "group", y = input$var)) + #input$var works here
geom_boxplot(fill = "red", outlier.shape = 15, outlier.color = "black") +
scale_y_continuous(name = as.character(input$var)) +
scale_x_discrete(name = "group") +
ggtitle(paste("Distribution of", as.character(input$var), "by treatment group"))
})
#2. use same reactive datasubset to render kable descriptive statistics for the variable picked.
output$descriptives <- renderText({
kable(summaryBy(input$var ~ group, data = as.data.frame(weeksub()), FUN = fun),
#note: here, I'm using the summaryBy function from package doBy. It takes the form var~ categoricalvar
# so, either painscore ~ group, or dependscore ~ group depending on what the user picked, and uses
#my custom function to return a table of count, median, 25th percentile, 75th percentile and missing count for
#the 3 levels of the group variable (for A, B, and C)
col.names = c("Number", "Median", "1Q", "3Q", "Missing"))
})
}#server function ends
# Run the application
shinyApp(ui = ui, server = server)
There are a couple of problems in your code:
1. The formula notation doesn't know how to deal with input$var. summaryBy supports an alternate syntax that works better. (You could also use as.formula and paste to build a formula.)
2. You are missing the "Group" column in col.names.
3. You have to generate HTML from kable and pass it as HTML to the UI.
Change your table output to this:
output$descriptives <- renderUI({
HTML(kable(summaryBy(list(input$var, 'group'), data = as.data.frame(weeksub()), FUN = fun),
col.names = c('Group', "Number", "Median", "1Q", "3Q", "Missing"),
format='html'
))
})
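The as.formula/paste route mentioned above can be sketched with base R's reformulate(), which builds painscore ~ group from strings. To keep the sketch free of doBy, it swaps in base aggregate() for summaryBy(). aggregate()'s formula method drops rows with NA by default, so na.action = na.pass is set to keep the missing count meaningful; counting missing values with sum(is.na(x)) is an assumption about the intended "number of missing".

```r
set.seed(123)
mydf <- data.frame(group = rep(rep(c("A", "B", "C"), each = 3), times = 3),
                   week = rep(c(1, 4, 8), each = 9),
                   painscore = sample(1:10, 27, replace = TRUE),
                   dependscore = sample(1:10, 27, replace = TRUE))

var <- "painscore"                          # stands in for input$var
fml <- reformulate("group", response = var) # painscore ~ group
print(fml)

stats_fun <- function(x) {
  c(n = length(x),
    median = median(x, na.rm = TRUE),
    q1 = unname(quantile(x, 0.25, na.rm = TRUE)),
    q3 = unname(quantile(x, 0.75, na.rm = TRUE)),
    missing = sum(is.na(x)))
}
res <- aggregate(fml, data = subset(mydf, week == 1), FUN = stats_fun,
                 na.action = na.pass)
# aggregate() stores the statistics as a matrix column; flatten it for kable()
res <- do.call(data.frame, res)
print(res)
```

In the app, "painscore" would come from input$var and subset(mydf, week == 1) from weeksub().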

Stack bar generated by Phyloseq

I am using the R package "phyloseq" to analyze bioinformatics data.
otumat = matrix(sample(1:100, 100, replace = TRUE), nrow = 10, ncol = 10)
otumat
rownames(otumat) <- paste0("OTU", 1:nrow(otumat))
colnames(otumat) <- paste0("Sample", 1:ncol(otumat))
otumat
taxmat = matrix(sample(letters, 70, replace = TRUE), nrow = nrow(otumat), ncol = 7)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus",
"Species")
taxmat
library("phyloseq")
OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)
OTU
TAX
physeq = phyloseq(OTU, TAX)
physeq
plot_bar(physeq, fill = "Family")
So the bar graph generated does not stack the same Family together. For example, there are two separate "I" blocks in sample 10. I know phyloseq plots graphs using ggplot2. Does anyone know what ggplot2-related code I can add to plot_bar(physeq, fill = "Family") to stack the same family together in the bar graph?
You need to reorder the levels of the factor being used for the x-axis. physeq presumably has a column called "Sample" (I don't have the relevant package installed); you need to reorder its levels.
It should be possible to use a command like this
physeq$Sample <- factor(physeq$Sample, levels = paste0("Sample", 1:10))
Then it should plot correctly.
You might need to dig around to find the relevant part to change.
Actually, with respect, the plot_bar function does already do what you're asking:
# preliminaries
rm(list = ls())
library("phyloseq"); packageVersion("phyloseq")
data("GlobalPatterns")
gp.ch = subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")
# the function call that does what you're asking for
plot_bar(gp.ch, fill = "Family")
See the following help tutorial for more details, examples:
https://joey711.github.io/phyloseq/plot_bar-examples.html
You can also specify the x-axis grouping as well.
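The separate blocks most likely appear because plot_bar() draws one block per OTU, so two OTUs from the same family give two blocks of the same colour. phyloseq's tax_glom(physeq, taxrank = "Family") merges OTUs that share a family before plotting. The underlying operation is a grouped sum of the OTU rows, which can be sketched in base R on toy matrices built like the question's:

```r
set.seed(1)
otumat <- matrix(sample(1:100, 100, replace = TRUE), nrow = 10, ncol = 10,
                 dimnames = list(paste0("OTU", 1:10), paste0("Sample", 1:10)))
taxmat <- matrix(sample(letters, 70, replace = TRUE), nrow = 10, ncol = 7,
                 dimnames = list(rownames(otumat),
                                 c("Domain", "Phylum", "Class", "Order",
                                   "Family", "Genus", "Species")))

# Sum the counts of all OTUs that share a Family: one row (one block) per family
family_counts <- rowsum(otumat, group = taxmat[, "Family"])
print(dim(family_counts))
```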
