Gtsummary columns for mixed model p-value and interaction - r

I have been conducting a cross-over experiment, testing a specific treatment to a group of patients who received treatment "1" and "2" in random order.
I am fairly new to R, and I wish to generate a table with tbl_summary with one column of each treatment effect on various parameters, as well as a column for the p-value from the mixed model analysis (between-group comparison) and a sequence-interaction p-value.
I have calculated the between-period difference in response to treatment within each period by using a mixed model approach with the lme4-package. Then, I compared the treatment response between groups by the estimated marginal means (emmeans).
I have conducted my statistics using the following code:
library(emmeans)
library(lme4)
library(lmerTest)
df <- data.frame (record_id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12),
treatment = c(1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2),
treatment_sequence = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
treatment_response = c(-43.5, 135.0, 8.4, -7.2, 99.0, 159.0, 12.0, -27.0, 3.0, 12.0, -15.0, 91.5, 6.0, -9.0, 177.0, 27.0, 52.8, -54.0, -50.7, 63.0, -9.0, 186.0, -72.0, 15.0)
)
df
df_mm <- lmer(treatment_response ~ as.factor(treatment)*treatment_sequence + (1|record_id), data=df)
anova(df_mm)
emmeans(df_mm, list(pairwise ~ treatment), adjust = "bonferroni")
This gives the following output:
> df_mm <- lmer(treatment_response ~ as.factor(treatment)*treatment_sequence + (1|record_id), data=df)
> anova(df_mm) ###show model as anova???
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
as.factor(treatment) 1890.0 1890.0 1 10 0.4575 0.5141
treatment_sequence 832.1 832.1 1 10 0.2014 0.6632
as.factor(treatment):treatment_sequence 7466.0 7466.0 1 10 1.8071 0.2086
> emmeans(df_mm, list(pairwise ~ treatment), adjust = "bonferroni")
NOTE: Results may be misleading due to involvement in interactions
$`emmeans of treatment`
treatment emmean SE df lower.CL upper.CL
1 1.45 19.9 19.7 -40.1 43
2 61.83 19.9 19.7 20.3 103
Results are averaged over the levels of: treatment_sequence
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$`pairwise differences of treatment`
1 estimate SE df t.ratio p.value
treatment1 - treatment2 -60.4 26.2 10 -2.301 0.0442
Results are averaged over the levels of: treatment_sequence
Degrees-of-freedom method: kenward-roger
I would like the between-treatment comparison p-value (0.0442) along with the interaction p-value of 0.21 in the table. My aim is to create a table like this:
I have tried modifying the code from this post (Gtsummary columns for all post hoc pairwise comparisons), but I cannot seem to get it right.
Is this possible? And can someone help with the coding?

Below is a working example. BUT I don't think the emmeans method you're using is correct. If you want to use it, you'll need to update the code to grab the p-value from the emmeans object (it's just a random number for now).
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.1'
df <- data.frame (record_id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12),
treatment = c(1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2),
treatment_sequence = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
treatment_response = c(-43.5, 135.0, 8.4, -7.2, 99.0, 159.0, 12.0, -27.0, 3.0, 12.0, -15.0, 91.5, 6.0, -9.0, 177.0, 27.0, 52.8, -54.0, -50.7, 63.0, -9.0, 186.0, -72.0, 15.0)
)
mod <- lme4::lmer(treatment_response ~ as.factor(treatment) * treatment_sequence + (1 | record_id), data=df)
tt <- emmeans::emmeans(mod, list(pairwise ~ treatment), adjust = "bonferroni")$`pairwise differences of treatment`
#> NOTE: Results may be misleading due to involvement in interactions
tt |> as.data.frame() |> dplyr::select(dplyr::last_col()) |> dplyr::pull()
#> [1] 0.04419325
car::Anova(mod) %>%
broom::tidy() %>%
dplyr::filter(dplyr::n() == dplyr::row_number()) |>
dplyr::pull(p.value)
#> [1] 0.1788567
my_custom_stats <- function(data, variable, ...) {
formula <-
as.formula(glue::glue(
"{variable} ~ as.factor(treatment) * treatment_sequence + (1 | record_id)"
))
mod <- lme4::lmer(formula, data = data)
# I think this is not appropriate because of the interaction,
# but if you're confident about this approach, update pw_difference_p to be from emmeans
pw_difference_p <-
emmeans::emmeans(mod, list(pairwise ~ treatment), adjust = "bonferroni")
pw_difference_p <- runif(1)
interacton_p <-
car::Anova(mod) %>%
broom::tidy() %>%
dplyr::filter(dplyr::n() == dplyr::row_number()) |>
dplyr::pull(p.value)
dplyr::tibble(
pw_difference_p = pw_difference_p,
interacton_p = interacton_p
)
}
df %>%
tbl_summary(
by = treatment,
include = treatment_response,
statistic = all_continuous() ~ "{mean} ± {sd}"
) %>%
add_stat(fns = ~my_custom_stats) %>%
modify_header(interacton_p = "**Interaction P**",
pw_difference_p = "**Treatment P**") %>%
modify_fmt_fun(c(interacton_p, pw_difference_p) ~ style_pvalue) %>%
as_kable()
#> NOTE: Results may be misleading due to involvement in interactions
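| Characteristic     | 1, N = 12 | 2, N = 12 | Treatment P | Interaction P |
|:-------------------|:---------:|:---------:|:-----------:|:-------------:|
| treatment_response | 1 ± 60    | 62 ± 76   | 0.3         | 0.2           |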
Created on 2022-08-24 by the reprex package (v2.0.1)

Related

Loop glm for every column in R dataset

I have a dataset of 100 patients (7 are shown here), 2 covariates, and 50 phenotypes (5 are shown here). I want to fit a multivariable logistic regression for each phenotype, using the phenotype together with Covariate1 and Covariate2 to predict the Outcome. I would like to get a table with the p-value, OR, and confidence interval (CI) for each of the covariates.
I tried:
for (i in df) {
print(i)
model <-glm(Outcome~ x[i] +Covariate1 +Covariate2, family = binomial(link = "logit"), data=df)
I also tried the solution to this question, but x and y are reversed in my case, so it did not work:
R: automate table for results of several multivariable logistic regressions
Thanks very much for your help!
This is an example dataset
df<-structure(list(ID = c(1, 2, 3, 4, 5, 6, 7), Outcome = c(0, 0,
1, 1, 0, 1, 0), Covariate1 = c(1, 2, 3, 4, 5, 6, 7), Covariate2 = c(0,
0, 0, 1, 1, 1, 1), P1 = c(1, 0, 0, 1, 1, 1, 2), P2 = c(0, 2,
0, 1, 1, 1, 1), P3 = c(0, 0, 0, 1, 1, 1, 1), P4 = c(0, 0, 0,
1, 2, 1, 1), P5 = c(0, 0, 0, 1, 1, 1, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -7L))
If I understood correctly:
df <- structure(
list(
ID = c(1, 2, 3, 4, 5, 6, 7),
Outcome = c(0, 0, 1, 1, 0, 1, 0),
Covariate1 = c(1, 2, 3, 4, 5, 6, 7),
Covariate2 = c(0, 0, 0, 1, 1, 1, 1),
P1 = c(1, 0, 0, 1, 1, 1, 2),
P2 = c(0, 2, 0, 1, 1, 1, 1),
P3 = c(0, 0, 0, 1, 1, 1, 1),
P4 = c(0, 0, 0, 1, 2, 1, 1),
P5 = c(0, 0, 0, 1, 1, 1, 2)
),
class = c("tbl_df",
"tbl", "data.frame"),
row.names = c(NA,-7L)
)
library(tidyverse)
first_tables <- map(
.x = select(df, starts_with("P")),
.f = ~ glm(
Outcome ~ .x + Covariate1 + Covariate2,
family = binomial(link = "logit"),
data = df
)
) %>%
map(broom::tidy)
map_df(
.x = first_tables,
.f = ~ .x %>% mutate(
p = p.value,
OR = exp(estimate),
# Wald 95% interval on the OR scale: CI5 is its lower bound, CI95 its upper bound
CI5 = exp(estimate - 1.96 * std.error),
CI95 = exp(estimate + 1.96 * std.error),
.keep = "unused"
) %>%
select(-statistic),
.id = "phenotype"
) %>%
filter(term == ".x") %>%
select(-term)
#> # A tibble: 5 x 5
#> phenotype p OR CI5 CI95
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 P1 0.997 5.84e-10 0 Inf
#> 2 P2 0.996 1.53e- 4 0 Inf
#> 3 P3 0.824 2.00e+ 0 0.00442 904.
#> 4 P4 0.998 3.66e- 9 0 Inf
#> 5 P5 0.997 2.72e-10 0 Inf
Created on 2023-01-11 with reprex v2.0.2
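The same per-phenotype loop can also be sketched in base R, without purrr or broom. This is only an illustration under assumptions: the helper name fit_one is made up here, the interval is a Wald interval from confint.default(), and the example data are stored as a plain data.frame. With only 7 rows, glm() warns about fitted probabilities of 0 or 1 (suppressed below), so the numbers themselves are not meaningful.

```r
df <- data.frame(
  ID = 1:7,
  Outcome = c(0, 0, 1, 1, 0, 1, 0),
  Covariate1 = c(1, 2, 3, 4, 5, 6, 7),
  Covariate2 = c(0, 0, 0, 1, 1, 1, 1),
  P1 = c(1, 0, 0, 1, 1, 1, 2),
  P2 = c(0, 2, 0, 1, 1, 1, 1),
  P3 = c(0, 0, 0, 1, 1, 1, 1),
  P4 = c(0, 0, 0, 1, 2, 1, 1),
  P5 = c(0, 0, 0, 1, 1, 1, 2)
)

# Phenotype columns are assumed to be named P1, P2, ...
phenotypes <- grep("^P[0-9]+$", names(df), value = TRUE)

# fit_one is a hypothetical helper: one logistic model per phenotype
fit_one <- function(p) {
  # The phenotype goes first so it is the term that is kept
  # if a covariate happens to be collinear with it
  f <- reformulate(c(p, "Covariate1", "Covariate2"), response = "Outcome")
  fit <- suppressWarnings(glm(f, family = binomial(link = "logit"), data = df))
  est <- coef(summary(fit))[p, ]      # estimate, SE, z, p for the phenotype
  ci <- confint.default(fit)[p, ]     # Wald 95% interval on the log-odds scale
  data.frame(
    phenotype = p,
    p = est[["Pr(>|z|)"]],
    OR = exp(est[["Estimate"]]),
    CI_low = exp(ci[[1]]),
    CI_high = exp(ci[[2]])
  )
}

results <- do.call(rbind, lapply(phenotypes, fit_one))
results
```

Building the formula with reformulate() keeps the phenotype name as the term label, so the row of interest can be picked by name instead of filtering on ".x".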

R: Calculate row sum (MERSQI score), adjusted to missing values / not applicable categories

I would like to calculate sums of rows, including adjustment for missing data.
The row sums are actually "MERSQI" scores (scoring the quality of studies, one study per row). Each column is a question about quality with a specific maximum number of achievable points.
However, some questions were not applicable to some studies, which leads to missing values. The row sum should be adjusted to the standard denominator of 18 as the maximal score, where the maximum achievable points is the sum of the maximal points of the applicable questions (columns):
total MERSQI score = row sum / max achievable points * 18
For example:
questions <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) #number of question or col number
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3) #maximum of every single question
study1 <- c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3) #points for every single questions for study1
study2 <- c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1, 3) # for study2
study3 <- c(2, 1.5, NA, 3, NA, 1, NA, 1, 1, 3) #for study3
df <- rbind (questions, max_quest, study1, study2, study3)
For study1 we would have a row sum and resulting score of 10.5, as there are no missing values.
For study2 we have a row sum of 10. We have three NA, maximal achievable points for study2 were 15 (=18 maximal points - 3*1 point of the NA questions), and adjusted MERSQI score of 12.85 (=10 *18/15).
For study3: row sum= 12.5, maximal achievable points=15.5 (=18 -(1.5+1+1)), adjusted MERSQI score= 15.53
Do you have any idea how to calculate the row sums while adjusting for missing values? Maybe by going through every row, using a for loop and if with is.na?
Thank you!
PS: Link / explanation to the MERSQI score: https://www.aliem.com/article-review-how-do-you-assess/ and https://pubmed.ncbi.nlm.nih.gov/26107881/
There is an issue with the lengths of the vectors. I edited the dataset so that they are all length 9, but this should work:
apply(mat[, 3:5],
2,
FUN = function (x) {
tot = sum(x, na.rm = TRUE)
nas = which(is.na(x))
total_max = sum(max_quest)
if (!length(nas))
return(tot)
else
return(tot * total_max / (total_max - sum(max_quest[nas])))
})
Data:
questions <- c(1, 2, 3, 4, 5, 6, 7, 8, 9) #number of question or col number
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3) #maximum of every single question
study1 <- c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3) #points for every single questions for study1
study2 <- c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1) # for study2
study3 <- c(2, 1.5, NA, 3, NA, 1, NA, 1, 1) #for study3
## rename mat because cbind(...) of vectors returns matrix.
mat <- cbind (questions, max_quest, study1, study2, study3)
For each study column, calculate its sum, multiply by the sum of max_quest, and divide by the sum of max_quest minus the maximum points of the questions that are NA.
library(dplyr)
val <- sum(df$max_quest)
df %>%
summarise(across(starts_with('study'),
~sum(., na.rm = TRUE)* val/(val - sum(max_quest[is.na(.)]))))
data
The data shared is not complete because the vectors have incompatible lengths. It would also make more sense to store these values column-wise rather than row-wise.
questions <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3, 3)
study1 <- c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3, 0)
study2 <- c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1, 3)
study3 <- c(2, 1.5, NA, 3, NA, 1, NA, 1, 1, 3)
df <- data.frame(questions, max_quest, study1, study2, study3)
This can be done with vectorization.
First compute the row sums, ignoring NAs:
row_sums <- apply(df, 1, function(x) sum(x, na.rm = TRUE))
Then pull out the studies and the maximum points:
studies <- row_sums[3:length(row_sums)]
max <- row_sums[2]
Compute the MERSQI from the adjusted max. Subtracting the number of NAs would only be correct if every skipped question were worth 1 point, so subtract the maximum points of the NA questions instead:
na_max <- apply(df[-(1:2), , drop = FALSE], 1, function(x) sum(df[2, is.na(x)]))
adjusted_max <- max - na_max
MERSQI <- studies * max / adjusted_max
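The formula from the question (row sum / max achievable points * 18) can also be written directly, one study at a time. A minimal sketch, assuming the 10-question example data shown earlier; the function name mersqi and its scale argument are hypothetical, and note that these example maxima sum to 19 rather than 18.

```r
max_quest <- c(3, 1.5, 1.5, 3, 1, 1, 1, 1, 3, 3)
studies <- data.frame(
  study1 = c(1.5, 0.5, 1.5, 3, 0, 0, 0, 1, 3, 0),
  study2 = c(1, 0.5, 0.5, 3, NA, NA, NA, 1, 1, 3),
  study3 = c(2, 1.5, NA, 3, NA, 1, NA, 1, 1, 3)
)

# mersqi (hypothetical name): points earned divided by the points that were
# achievable on the applicable (non-NA) questions, rescaled to `scale`
mersqi <- function(points, max_points, scale = 18) {
  applicable <- !is.na(points)
  sum(points[applicable]) / sum(max_points[applicable]) * scale
}

# One adjusted score per study column
scores <- sapply(studies, mersqi, max_points = max_quest)
scores
```

For study2, the applicable questions are worth 16 points in total and 10 were earned, so the adjusted score is 10 / 16 * 18 = 11.25.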

How to format table and or chi-square for testing significance among categorical variables

I have this data set where I want to test whether the groups differ significantly on several categorical variables. When I use the following code, R returns a warning message saying the p-value may be wrong. Am I formatting the chi-square incorrectly?
df <- read.csv(text = "group, gender, race, handedness
1, 0, 3, 0
1, 1, 3, 1
2, 1, 3, 0
2, 1, 3, 0
3, 0, 2, 1
3, 0, 2, 0")
# Is Gender significant among groups
gendertab<-table(df$gender, df$group)
chisq.test(gendertab)
# Is Race significant
racetab<-table(df$race, df$group)
chisq.test(racetab)
# Is Handedness significant
handtab<-table(df$handedness, df$group)
chisq.test(handtab)
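The warning appears because the expected cell counts are too small for the chi-squared approximation to be reliable. Try the argument simulate.p.value = TRUE and compare the results to fisher.test (see the R manual page or Wikipedia). They seem to agree with each other.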
# Is Gender significant among groups
gendertab<-table(df$gender, df$group)
fisher.test(gendertab)
chisq.test(gendertab, simulate.p.value = TRUE)
# Is Race significant
racetab<-table(df$race, df$group)
fisher.test(racetab)
chisq.test(racetab, simulate.p.value = TRUE)
# Is Handedness significant
handtab<-table(df$handedness, df$group)
fisher.test(handtab)
chisq.test(handtab, simulate.p.value = TRUE)
Data.
df <- read.csv(text = "
group, gender, race, handedness
1, 0, 3, 0
1, 1, 3, 1
2, 1, 3, 0
2, 1, 3, 0
3, 0, 2, 1
3, 0, 2, 0")
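To see where the warning comes from, it can help to look at the expected cell counts that chisq.test computes. A small sketch using the same six rows, built with data.frame() so the column names are unambiguous; the cutoff of 5 used below is the usual rule of thumb for the chi-squared approximation.

```r
df <- data.frame(
  group      = c(1, 1, 2, 2, 3, 3),
  gender     = c(0, 1, 1, 1, 0, 0),
  race       = c(3, 3, 3, 3, 2, 2),
  handedness = c(0, 1, 0, 0, 1, 0)
)

gendertab <- table(df$gender, df$group)

# Expected counts under independence: row total * column total / n
expected <- suppressWarnings(chisq.test(gendertab))$expected
expected

# With 6 observations spread over 6 cells, every expected count is small
any(expected < 5)

# The exact test does not rely on the large-sample approximation
fisher_p <- fisher.test(gendertab)$p.value
fisher_p
```

With so few observations per cell, an exact or simulated p-value is the safer choice; no reformatting of the table will make the asymptotic test reliable.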

cv.glmnet gives auc value greater than 1

I want to get the AUC on the testing set from cv.glmnet for the best set of hyperparameters. According to this post, I should read it from cvm; however, when I do this I get a value greater than 1, and my understanding is that the AUC should be between 0 and 1. Here's an example:
age <- c(4, 8, 7, 12, 6, 9, 10, 14, 7)
gender <- as.factor(c(1, 0, 1, 1, 1, 0, 1, 0, 0))
bmi_p <- c(0.86, 0.45, 0.99, 0.84, 0.85, 0.67, 0.91, 0.29, 0.88)
m_edu <- as.factor(c(0, 1, 1, 2, 2, 3, 2, 0, 1))
p_edu <- as.factor(c(0, 2, 2, 2, 2, 3, 2, 0, 0))
f_color <- as.factor(c("blue", "blue", "yellow", "red", "red", "yellow",
"yellow", "red", "yellow"))
asthma <- c(1, 1, 0, 1, 0, 0, 0, 1, 1)
xfactors <- model.matrix(asthma ~ gender + m_edu + p_edu + f_color)[, -1]
x <- as.matrix(data.frame(age, bmi_p, xfactors))
library(glmnet)
cv.glmmod <- cv.glmnet(x, y = asthma, alpha = 1, family = "binomial", type.measure = "auc")
max(cv.glmmod$cvm)
[1] 7.0223
How do I interpret this number? Is it really just .70223?
Thanks,
Steve
For your dataset, cv.glmnet() does not measure the loss by "AUC" but by "deviance", which is what you obtained from cv.glmmod$cvm.
Although you ran the CV with cv.glmnet(type.measure="auc"), your dataset is too small. In this situation, cv.glmnet() (actually cv.lognet()) issues the warning "Too few (< 10) observations per fold for type.measure='auc' in cv.lognet; changed to type.measure='deviance'. Alternatively, use smaller value for nfolds" and, as the warning says, sets type.measure="deviance".
You can verify this by inspecting cv.glmmod$name, which should be "Binomial Deviance" in your case, instead of "AUC".
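As a sanity check on why a value above 1 cannot be an AUC, here is a base-R sketch of the rank-based (Mann-Whitney) form of the AUC. The function name auc_rank and the toy scores are made up for illustration; this is not what cv.glmnet runs internally.

```r
# auc_rank (hypothetical name): probability that a randomly chosen positive
# case gets a higher score than a randomly chosen negative case
auc_rank <- function(score, label) {
  r <- rank(score)            # average ranks, so ties count as 1/2
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

score <- c(0.10, 0.40, 0.35, 0.80)
label <- c(0, 0, 1, 1)
auc_example <- auc_rank(score, label)
auc_example
```

Because the statistic is a proportion of correctly ordered positive/negative pairs, it always lies between 0 and 1, so a cross-validated value such as 7.0223 must be on a different scale.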

How does one calculate LD50 from a glmer?

I am analyzing a data set where ~10 individuals are exposed to a set treatment (Time) and mortality is recorded (Alive, Dead). glmer was used to model the data because Treatments were blocked (Trial).
From the following model I want to predict the Time at which 50% of individuals die.
Trial <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
Time <- c(2, 6, 9, 12, 15, 18, 21, 24, 1, 2, 3, 4, 5, 6, 1.5, 3, 4.5, 6, 39)
Alive <- c(10, 0, 0, 0, 0, 0, 0, 0, 6, 2, 8, 1, 0, 0, 4, 6, 1, 2, 0)
Dead <- c(0, 10, 6, 10, 10, 10, 7, 10, 0, 8, 1, 9, 10, 10, 5, 0, 8, 6, 10)
library(lme4)
ostrinaA.glmm <- glmer(cbind(Alive, Dead) ~ Time + (1 | Trial), family = binomial(link = "logit"))
summary(ostrinaA.glmm)
If I were simply modelling with glm, the dose.p function from MASS could be used. On a different forum I found generalized code for a dose.p.glmm from Bill Pikounis. It is as follows:
dose.p.glmm <- function(obj, cf = 1:2, p = 0.5) {
eta <- obj$family$linkfun(p)
b <- fixef(obj)[cf]
x.p <- (eta - b[1L])/b[2L]
names(x.p) <- paste("p = ", format(p), ":", sep = "")
pd <- -cbind(1, x.p)/b[2L]
SE <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
res <- structure(x.p, SE = SE, p = p)
class(res) <- "glm.dose"
res
}
I'm new to coding and need help adjusting this code for my model. My attempt is as follows:
dose.p.glmm <- function(ostrinaA.glmm, cf = 1:2, p = 0.5) {
eta <- ostrinaA.glmm$family$linkfun(p)
b <- fixef(ostrinaA.glmm)[cf]
x.p <- (eta - b[1L])/b[2L]
names(x.p) <- paste("p = ", format(p), ":", sep = "")
pd <- -cbind(1, x.p)/b[2L]
SE <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
res <- structure(x.p, SE = SE, p = p)
class(res) <- "glm.dose"
res
}
dose.p.glmm(ostrinaA.glmm, cf=1:2, p=0.5)
Error in ostrinaA.glmm$family : $ operator not defined for this S4 class
Any assistance adjusting this code for my model would be greatly appreciated.
At a quick glance I would think replacing
eta <- obj$family$linkfun(p)
with
f <- family(obj)
eta <- f$linkfun(p)
should do the trick.
You also need to replace the res <- ... line with
res <- structure(x.p, SE = matrix(SE), p = p)
This is rather obscure, but is necessary because the print.glm.dose method (from the MASS package) automatically tries to cbind() some stuff together. This fails if SE is a fancy matrix from the Matrix package rather than a vanilla matrix from base R: matrix() does the conversion.
If you are very new to coding, you might not realize that you don't have to change the obj variable name in the code you've copied to ostrinaA.glmm. In other words, Pikounis's code should work perfectly well with only the two modifications I suggested above.
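For intuition about what dose.p.glmm computes, the same arithmetic can be sketched on an ordinary glm in base R: the dose at probability p is (eta - b0) / b1 on the link scale, and the standard error comes from the delta method. The data below are made up for illustration, and coef() stands in for fixef(), so this is a sketch of the calculation rather than a replacement for the mixed-model version.

```r
# Hypothetical dose-response data: 10 individuals per dose
dose  <- c(1, 2, 3, 4, 5, 6)
dead  <- c(0, 2, 4, 6, 8, 10)
alive <- 10 - dead

fit <- glm(cbind(dead, alive) ~ dose, family = binomial(link = "logit"))

p   <- 0.5
eta <- family(fit)$linkfun(p)        # logit(0.5) = 0
b   <- coef(fit)                     # intercept and slope
ld50 <- unname((eta - b[1]) / b[2])  # dose at which P(dead) = p

# Delta method: gradient of ld50 with respect to (intercept, slope)
pd <- -c(1, ld50) / b[2]
se <- sqrt(drop(t(pd) %*% vcov(fit) %*% pd))

c(LD50 = ld50, SE = se)
```

The toy data are symmetric around a dose of 3.5, so the estimated LD50 lands there; with a glmer fit, the only changes are the ones described above (family(obj) instead of obj$family, and fixef() for the coefficients).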
