loop through gtsummary table to pick out only significant variables - r

I have a question. I am, relatively new to R. I am transitioning some code from another app to R. In that code, I was able to loop through a table and pick out only the significant variables based on the p-value and the size of the odds ratio for logistic regression. Then I was able to say something like "x had a significant link with y" when the p was less than or equal to 0.05 and the odds ratio as above 1.00 and do the converse "x had a significant negative link with " when the p value was less than 0.05 and the odds ration was below 1.00. Then, I was able to do what I understand from the gtsummary literature is inline_text these statements. As I am trying to get my bearings with R, I was wondering how I would I accomplish this with gtsummary tables? My reproducible code does not work, but it is below:
# install.packages("gtsummary")
library(gtsummary)
library(tidyverse)
#simulated data
gender <- sample(c(0,1), size = 1000, replace = TRUE)
age <- round(runif(1000, 18, 80))
xb <- -9 + 3.5*gender + 0.2*age
p <- 1/(1 + exp(-xb))
y <- rbinom(n = 1000, size = 1, prob = p)
mod <- glm(y ~ gender + age, family = "binomial")
summary(mod)
#create the gtsummary table
tab1 = mod %>%
tbl_regression(exponentiate = TRUE) %>%
as_gt() %>%
gt::tab_source_note(gt::md("*This data is simulated*"))
#attempt of going through the gtsummary table
for (i in 1:nrow(tab1[1:3,])) { # does one row at a time
pv = tab1[["_data"]]$p.value
num = tab1[i, "pv"]
name = tab1[i, "variable"]
if(pv <=0.05 ){
cat("The link between", name, "and is significant. ")
}
}
I ask about the gtsummary regression table because, I will have to do the same thing with the tbl_summary as well. I thought I would begin with the regression version. The idea is to get the gorgeous inline_text via an if else. All of this is triggered by the going down the p-value column, and then pulling the name of the variable and the amazing inline_text information into the sentence. I have looked through the available questions others have asked, but I haven't found anything that gets to the heart of this. If I have missed it, please, point me in the correct direction.

There is a data frame in every gtsummary table called x$table_body. I think it's easier to extract the information you need from there. Example below! (you could also wrap the last line in an inline_text() if that is better for you).
# install.packages("gtsummary")
library(gtsummary)
#> #BlackLivesMatter
library(tidyverse)
#simulated data
gender <- sample(c(0,1), size = 1000, replace = TRUE)
age <- round(runif(1000, 18, 80))
xb <- -9 + 3.5*gender + 0.2*age
p <- 1/(1 + exp(-xb))
y <- rbinom(n = 1000, size = 1, prob = p)
mod <- glm(y ~ gender + age, family = "binomial")
#create the gtsummary table
tab1 = mod %>% tbl_regression(exponentiate = TRUE)
# extract the variable names and the pvalues
tab1$table_body %>%
select(variable, p.value) %>%
filter(p.value <= 0.05) %>% # only keep the sig pvalues
deframe() %>%
imap(~str_glue("The link between 'y' and {.y} is significant ({style_pvalue(.x, prepend_p = TRUE)})."))
#> $gender
#> The link between 'y' and gender is significant (p<0.001).
#>
#> $age
#> The link between 'y' and age is significant (p<0.001).
Created on 2022-11-07 with reprex v2.0.2

Related

ggplot legend values rescale

I would like to rescale the values on the legend of a plot coming from conditional_effects.
By doing something like this
plot(conditional_effects(brm_c_5, effects = "t:w_c_ratio",cond = conditions5), rug = T, points = T)
I'm getting the following
For time being I'm doing
p_col_1 <- ggplot_build(p_col_1)
and then I'm chainging the ranges in here p_col_1$plot$scales$scales[[3]]$range$range and here p_col_1$plot$scales$scales[[4]]$range$range but I'm not trusting this solution.
EXAMPLE:
As example please see this code. The defaults values for kidney$age is from 10 to 69 but let's say that I want to rescale it from -1 to 1. Then I could use the solution via ggplot_build but I'm looking for a smarter and more elegant solution.
library(brms)
fit1 <- brm(time | cens(censored) ~ age + sex + disease,
data = kidney, family = weibull, init = "0")
fit1
p_tr <- (plot(conditional_effects(fit1, effects = "disease:age"), rug = T, points = T)[[1]])
p_tr <- ggplot_build(p_tr)
p_tr$plot$scales$scales[[3]]$range$range <- c(1,0, -1) %>% as.character()
p_tr$plot$scales$scales[[4]]$range$range <- c(1, 0 ,-1)%>% as.character()
plot(p_tr %>% ggplot_gtable)
`
How could I rescale the values of w_c_ratio from -0.9:+0.9 in the original scale (which is going from 2 to 10)?

Removing certain parts of modelsummary in R at specific statistics

I am using gamm4:gamm4 to model longitudinal change.
I am trying to use the modelsummary::modelsummary function to create an output table of the following results:
I would like to add t-values and std.error to the output of the fixed effects, and remove the empty tags values from the random effects
model_lmer <- gamm4(Y ~ Tract + s(Age, by = Tract, k = 10) + Sex,
data = (DF1),
random = ~ (0 + Tract | ID))
modelsummary(model_lmer$mer,
statistic = c("s.e. = {std.error}",
"t = {statistic}"))
But I am struggling to write the correct syntax to remove the "t" and "s.e." from the random effects output.
This is kind of tricky, actually. The issue is that modelsummary()
automatically drops empty rows when they are filled with NA or an
empty string "". However, since glue strings can include arbitrary
text, it is hard to think of a general way to figure out if the row is
empty or not, because modelsummary() cannot know ex ante what
constitutes an empty string.
If you have an idea on how this check could be implemented, please report it
on Github:
https://github.com/vincentarelbundock/modelsummary
In the meantime, you could use the powerful tidy_custom.CLASSNAME
mechanism
to customize the statistic and p.value statistics directly instead
of through a glue string:
library(gamm4)
library(modelsummary)
# simulate
x <- runif(100)
fac <- sample(1:20,100,replace=TRUE)
eta <- x^2*3 + fac/20; fac <- as.factor(fac)
y <- rpois(100,exp(eta))
# fit
mod <- gamm4(y~s(x),family=poisson,random=~(1|fac))
# customize
tidy_custom.glmerMod <- function(x) {
out <- parameters::parameters(x)
out <- insight::standardize_names(out, style = "broom")
out$statistic <- sprintf("t = %.3f", out$statistic)
out$p.value <- sprintf("p = %.3f", out$p.value)
out
}
# summarize
modelsummary(mod$mer,
statistic = c("{statistic}", "{p.value}"))
Model 1
X(Intercept)
1.550
t = 17.647
p = 0.000
Xs(x)Fx1
0.855
t = 4.445
p = 0.000
Num.Obs.
100
RMSE
2.49
Note that I used simple glue strings in statistic = "{p.value}", otherwise they would be wrapped up in parentheses, as is default for standard errors.

Implementing multinomial-Poisson transformation with multilevel models

I know variations of this question have been asked before but I haven't yet seen an answer on how to implement the multinomial Poisson transformation with multilevel models.
I decided to make a fake dataset and follow the method outlined here, also consulting the notes the poster mentions as well as the Baker paper on MP transformation.
In order to check if I'm doing the coding correctly, I decided to create a binary outcome variable as a first step; because glmer can handle binary response variables, this will let me check I'm correctly recasting the logit regression as multiple Poissons.
The context of this problem is running multilevel regressions with survey data where the outcome variable is response to a question and the possible predictors are demographic variables. As I mentioned above, I wanted to see if I could properly code the binary outcome variable as a Poisson regression before moving on to multi-level outcome variables.
library(dplyr)
library(lme4)
key <- expand.grid(sex = c('Male', 'Female'),
age = c('18-34', '35-64', '45-64'))
set.seed(256)
probs <- runif(nrow(key))
# Make a fake dataset with 1000 responses
n <- 1000
df <- data.frame(sex = sample(c('Male', 'Female'), n, replace = TRUE),
age = sample(c('18-34', '35-64', '45-64'), n, replace = TRUE),
obs = seq_len(n), stringsAsFactors = FALSE)
age <- model.matrix(~ age, data = df)[, -1]
sex <- model.matrix(~ sex, data = df)[, -1]
beta_age <- matrix(c(0, 1), nrow = 2, ncol = 1)
beta_sex <- matrix(1, nrow = 1, ncol = 1)
# Create class probabilities as a function of age and sex
probs <- plogis(
-0.5 +
age %*% beta_age +
sex %*% beta_sex +
rnorm(n)
)
id <- ifelse(probs > 0.5, 1, 0)
df$y1 <- id
df$y2 <- 1 - df$y1
# First run the regular hierarchical logit, just with a varying intercept for age
glm_out <- glmer(y1 ~ (1|age), family = 'binomial', data = df)
summary(glm_out)
#Next, two Poisson regressions
glm_1 <- glmer(y1 ~ (1|obs) + (1|age), data = df, family = 'poisson')
glm_2 <- glmer(y2 ~ (1|obs) + (1|age), data = df, family = 'poisson')
coef(glm_1)$age - coef(glm_2)$age
coef(glm_out)$age
The outputs for the last two lines are:
> coef(glm_1)$age - coef(glm_2)$age
(Intercept)
18-34 0.14718933
35-64 0.03718271
45-64 1.67755129
> coef(glm_out)$age
(Intercept)
18-34 0.13517758
35-64 0.02190587
45-64 1.70852847
These estimates seem close but they are not exactly the same. I'm thinking I've specified an equation wrong with the intercept.

lmerTest::rand() behaves strangely when variable names contain '.'

I have some experience with lme4, but today I tried lmerTest for the first time and was surprised by some results when using the rand() function to examine the random components. (I know this is not advised by the authors of lme4!) In troubleshooting, I think I discovered some undesired behavior: when rand() sees a random effect term whose variable name contains a dot, it appears to parse the term into multiple variables. Of course these may or may not exist in the dataset, but unfortunately it doesn't throw an error, it just gives weird results.
Here's a MWE:
library(dplyr)
nsub <- 500
nvis <- 6
data <- data.frame(subjid = factor(sort(rep(c(1:nsub),nvis))),
visit = rep(c(1:nvis),nsub))
base <- filter(data, visit==1) %>%
select(subjid) %>%
mutate(baseage=rnorm(nsub, 40, 10)) %>%
merge(data, by="subjid") %>%
mutate(interval = ifelse(visit==1, 0, 2*(visit-1) + runif(nsub*nvis, 0, 1)),
age = baseage + interval,
ageverylong = age,
age.very.long = age,
y = 100 - 0.1*age + rnorm(nsub*nvis))
mod1 <- lmer(y ~ (age | subjid), data=base)
mod2 <- lmer(y ~ (ageverylong | subjid), data=base)
mod3 <- lmer(y ~ (age.very.long | subjid), data=base)
summary(mod1) # These three give the same results (so it's not an lme4 problem)
summary(mod2)
summary(mod3)
rand(mod1) # The top two are the same
rand(mod2)
rand(mod3) # But this one is different

R: cant get a lme{nlme} to fit when using self-constructed interaction variables

I'm trying to get a lme with self constructed interaction variables to fit. I need those for post-hoc analysis.
library(nlme)
# construct fake dataset
obsr <- 100
dist <- rep(rnorm(36), times=obsr)
meth <- dist+rnorm(length(dist), mean=0, sd=0.5); rm(dist)
meth <- meth/dist(range(meth)); meth <- meth-min(meth)
main <- data.frame(meth = meth,
cpgl = as.factor(rep(1:36, times=obsr)),
pbid = as.factor(rep(1:obsr, each=36)),
agem = rep(rnorm(obsr, mean=30, sd=10), each=36),
trma = as.factor(rep(sample(c(TRUE, FALSE), size=obsr, replace=TRUE), each=36)),
depr = as.factor(rep(sample(c(TRUE, FALSE), size=obsr, replace=TRUE), each=36)))
# check if all factor combinations are present
# TRUE for my real dataset; Naturally TRUE for the fake dataset
with(main, all(table(depr, trma, cpgl) >= 1))
# construct interaction variables
main$depr_trma <- interaction(main$depr, main$trma, sep=":", drop=TRUE)
main$depr_cpgl <- interaction(main$depr, main$cpgl, sep=":", drop=TRUE)
main$trma_cpgl <- interaction(main$trma, main$cpgl, sep=":", drop=TRUE)
main$depr_trma_cpgl <- interaction(main$depr, main$trma, main$cpgl, sep=":", drop=TRUE)
# model WITHOUT preconstructed interaction variables
form1 <- list(fixd = meth ~ agem + depr + trma + depr*trma + cpgl +
depr*cpgl +trma*cpgl + depr*trma*cpgl,
rndm = ~ 1 | pbid,
corr = ~ cpgl | pbid)
modl1 <- nlme::lme(fixed=form1[["fixd"]],
random=form1[["rndm"]],
correlation=corCompSymm(form=form1[["corr"]]),
data=main)
# model WITH preconstructed interaction variables
form2 <- list(fixd = meth ~ agem + depr + trma + depr_trma + cpgl +
depr_cpgl + trma_cpgl + depr_trma_cpgl,
rndm = ~ 1 | pbid,
corr = ~ cpgl | pbid)
modl2 <- nlme::lme(fixed=form2[["fixd"]],
random=form2[["rndm"]],
correlation=corCompSymm(form=form2[["corr"]]),
data=main)
The first model fits without any problems whereas the second model gives me following error:
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
Nothing i found out about this error so far helped me to solve the problem. However the solution is probably pretty easy.
Can someone help me? Thanks in advance!
EDIT 1:
When i run:
modl3 <- lm(form1[["fixd"]], data=main)
modl4 <- lm(form2[["fixd"]], data=main)
The summaries reveal that modl4 (with the self constructed interaction variables) in contrast to modl3 shows many more predictors. All those that are in 4 but not in 3 show NA as coefficients. The problem therefore definitely lies within the way i create the interaction variables...
EDIT 2:
In the meantime I created the interaction variables "by hand" (mainly paste() and grepl()) - It seems to work now. However I would still be interested in how i could have realized it by using the interaction() function.
I should have only constructed the largest of the interaction variables (combining all 3 simple variables).
If i do so the model gets fit. The likelihoods then are very close to each other and the number of coefficients matches exactly.

Resources