Removing certain parts of modelsummary in R at specific statistics

I am using gamm4::gamm4 to model longitudinal change.
I am trying to use the modelsummary::modelsummary function to create an output table of the model's results.
I would like to add t-values and standard errors to the output of the fixed effects, and remove the empty values from the random-effects rows:
model_lmer <- gamm4(Y ~ Tract + s(Age, by = Tract, k = 10) + Sex,
                    data = DF1,
                    random = ~ (0 + Tract | ID))
modelsummary(model_lmer$mer,
             statistic = c("s.e. = {std.error}",
                           "t = {statistic}"))
But I am struggling to write the correct syntax to remove the "t" and "s.e." from the random effects output.

This is kind of tricky, actually. The issue is that modelsummary() automatically drops empty rows when they are filled with NA or an empty string "". However, since glue strings can include arbitrary text, it is hard to think of a general way to figure out whether a row is empty, because modelsummary() cannot know ex ante what constitutes an empty string.
If you have an idea of how this check could be implemented, please report it on GitHub:
https://github.com/vincentarelbundock/modelsummary
In the meantime, you can use the powerful tidy_custom.CLASSNAME mechanism to customize the statistic and p.value statistics directly instead of through a glue string:
library(gamm4)
library(modelsummary)

# simulate data
x <- runif(100)
fac <- sample(1:20, 100, replace = TRUE)
eta <- x^2 * 3 + fac / 20
fac <- as.factor(fac)
y <- rpois(100, exp(eta))

# fit the model
mod <- gamm4(y ~ s(x), family = poisson, random = ~ (1 | fac))

# customize the statistics reported for glmerMod objects
tidy_custom.glmerMod <- function(x) {
  out <- parameters::parameters(x)
  out <- insight::standardize_names(out, style = "broom")
  out$statistic <- sprintf("t = %.3f", out$statistic)
  out$p.value <- sprintf("p = %.3f", out$p.value)
  out
}

# summarize
modelsummary(mod$mer,
             statistic = c("{statistic}", "{p.value}"))
                 Model 1
X(Intercept)     1.550
                 t = 17.647
                 p = 0.000
Xs(x)Fx1         0.855
                 t = 4.445
                 p = 0.000
Num.Obs.         100
RMSE             2.49
Note that I used plain glue strings in statistic = c("{statistic}", "{p.value}"); otherwise, the values would be wrapped in parentheses, as is the default for standard errors.
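For comparison, here is a minimal sketch (reusing the mod fit above) of the difference; the parentheses behavior for plain column names is the documented modelsummary default:
# a plain column name is wrapped in parentheses by default
modelsummary(mod$mer, statistic = "p.value")
# a glue string is printed verbatim, with nothing added
modelsummary(mod$mer, statistic = "{p.value}")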

Related

loop through gtsummary table to pick out only significant variables

I have a question. I am relatively new to R and am transitioning some code from another app. In that code, I was able to loop through a table and pick out only the significant variables from a logistic regression based on the p-value and the size of the odds ratio. I could then say something like "x had a significant positive link with y" when the p-value was less than or equal to 0.05 and the odds ratio was above 1.00, and conversely "x had a significant negative link with y" when the p-value was less than 0.05 and the odds ratio was below 1.00. Then I was able to do what I understand from the gtsummary literature is called inline_text for these statements. As I am trying to get my bearings with R, how would I accomplish this with gtsummary tables? My reproducible code does not work, but it is below:
# install.packages("gtsummary")
library(gtsummary)
library(tidyverse)

# simulated data
gender <- sample(c(0, 1), size = 1000, replace = TRUE)
age <- round(runif(1000, 18, 80))
xb <- -9 + 3.5 * gender + 0.2 * age
p <- 1 / (1 + exp(-xb))
y <- rbinom(n = 1000, size = 1, prob = p)
mod <- glm(y ~ gender + age, family = "binomial")
summary(mod)

# create the gtsummary table
tab1 <- mod %>%
  tbl_regression(exponentiate = TRUE) %>%
  as_gt() %>%
  gt::tab_source_note(gt::md("*This data is simulated*"))

# attempt at looping through the gtsummary table
for (i in 1:nrow(tab1[1:3, ])) { # one row at a time
  pv <- tab1[["_data"]]$p.value
  num <- tab1[i, "pv"]
  name <- tab1[i, "variable"]
  if (pv <= 0.05) {
    cat("The link between", name, "and y is significant. ")
  }
}
I ask about the gtsummary regression table because I will have to do the same thing with tbl_summary as well; I thought I would begin with the regression version. The idea is to get the gorgeous inline_text via an if/else: go down the p-value column, pull the name of the variable, and insert the inline_text information into the sentence. I have looked through the questions others have asked, but I haven't found anything that gets to the heart of this. If I have missed it, please point me in the correct direction.
There is a data frame in every gtsummary table called x$table_body. I think it's easier to extract the information you need from there. Example below! (you could also wrap the last line in an inline_text() if that is better for you).
# install.packages("gtsummary")
library(gtsummary)
#> #BlackLivesMatter
library(tidyverse)

# simulated data
gender <- sample(c(0, 1), size = 1000, replace = TRUE)
age <- round(runif(1000, 18, 80))
xb <- -9 + 3.5 * gender + 0.2 * age
p <- 1 / (1 + exp(-xb))
y <- rbinom(n = 1000, size = 1, prob = p)
mod <- glm(y ~ gender + age, family = "binomial")

# create the gtsummary table
tab1 <- mod %>% tbl_regression(exponentiate = TRUE)

# extract the variable names and the p-values
tab1$table_body %>%
  select(variable, p.value) %>%
  filter(p.value <= 0.05) %>% # keep only the significant p-values
  deframe() %>%
  imap(~ str_glue("The link between 'y' and {.y} is significant ({style_pvalue(.x, prepend_p = TRUE)})."))
#> $gender
#> The link between 'y' and gender is significant (p<0.001).
#>
#> $age
#> The link between 'y' and age is significant (p<0.001).
Created on 2022-11-07 with reprex v2.0.2
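The answer above mentions inline_text(); a hedged sketch of that route (the variable and pattern arguments are assumptions on my part; see ?gtsummary::inline_text):
# report one row of the regression table inline; pattern selects and
# formats the columns exposed to glue, such as {p.value}
inline_text(tab1, variable = "gender", pattern = "p = {p.value}")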

Passing variable names as strings into the contrasts() argument in lm

I am trying to create a function that allows me to pass outcome and predictor variable names as strings into the lm() regression function. I have actually asked this before here, but I learned a new technique here and would like to try and apply the same idea in this new format.
Here is the process:
library(tidyverse)

# toy data
df <- tibble(f1 = factor(rep(letters[1:3], 5)),
             c1 = rnorm(15),
             out1 = rnorm(15))

# pass the relevant inputs into new objects, as inside a function
d <- df
outcome <- "out1"
predictors <- c("f1", "c1")

# now create the model formula to be entered into the model
form <- as.formula(
  paste(outcome,
        paste(predictors, collapse = " + "),
        sep = " ~ "))

# now pass the formula into the model
model <- eval(bquote(lm(.(form),
                        data = d)))
model
# Call:
# lm(formula = out1 ~ f1 + c1, data = d)
#
# Coefficients:
# (Intercept)          f1b          f1c           c1
#     0.16304     -0.01790     -0.32620     -0.07239
So this all works nicely: an adaptable way of passing variables into lm(). But what if we want to apply special contrast coding to the factor variable? I tried
model <- eval(bquote(lm(.(form),
                        data = d,
                        contrasts = list(predictors[1] = contr.treatment(3)) %>%
                          setNames(predictors[1]))))
But got this error
Error: unexpected '=' in:
" data = d,
contrasts = list(predictors[1] ="
Any help much appreciated.
Reducing this to the command generating the error:
list(predictors[1] = contr.treatment(3))
Results in:
Error: unexpected '=' in "list(predictors[1] ="
list() chokes here because argument names are taken literally at parse time: the left-hand side of = cannot be a variable that needs to be evaluated.
Your approach of using setNames() works, but it needs to be wrapped around the list construction step itself:
setNames(list(contr.treatment(3)), predictors[1])
Output is a named list containing a contrast matrix:
$f1
  2 3
1 0 0
2 1 0
3 0 1
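Putting that back into the original call, the model with custom contrasts can then be fit like this (same objects as above):
# build the named list first, then hand it to lm()'s contrasts argument
model <- eval(bquote(lm(.(form),
                        data = d,
                        contrasts = setNames(list(contr.treatment(3)),
                                             predictors[1]))))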

retain several best models during model dredging in R

Is there a way to retain the best models, for example those within two Akaike Information Criterion (AIC) units of the best-fitting model, during a model-dredging approach in R? I am using the glmulti package, which returns the AIC values of the best models but does not let me view the models associated with those values.
Thanks in advance.
Here is my example (data here):
results <- read.csv("gameresults.csv")
require(glmulti)
M <- glmulti(result ~ speed * svl * tailsize * strategy,
             data = results, name = "glmulti.analysis",
             intercept = TRUE, marginality = FALSE,
             level = 2, minsize = 0, maxsize = -1, minK = 0, maxK = -1,
             fitfunction = Multinom, method = "h", crit = "aic",
             confsetsize = 100, includeobjects = TRUE)
summary(M)
The function glmulti::glmulti returns an S4 object that can be accessed like a list. All of your models, not just the best one, can be accessed. Since I don't have your fitting function and some of the other optional inputs, I fit a simplified version of your model just as a demonstration:
results <- read.csv("gameresults.csv")
library(glmulti)
M <- glmulti(result ~ speed * svl * strategy, data = results, crit = "aic", plotty = TRUE)
Here is a list of all the models, accessed via the @ operator:
M@formulas
# [[1]]
# result ~ 1 + speed + svl:speed + strategy:speed
# <environment: 0x11a616750>
#
# [[2]]
# result ~ 1 + speed + svl + svl:speed + strategy:speed
# <environment: 0x11a616750>
#
# [[3]]
# result ~ 1 + strategy + speed + svl:speed + strategy:speed
# <environment: 0x11a616750>
#
## **I omitted the remaining 36-3=33 models**
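To get at the original question of retaining models within two AIC units of the best fit, here is a sketch using glmulti's weightable() accessor (I am assuming the IC column is named aic when crit = "aic"; check the returned data frame):
# weightable() returns a data frame with one row per candidate model,
# including its formula and information-criterion value
tab <- weightable(M)
# keep only the models within 2 AIC units of the best one
best <- tab[tab$aic <= min(tab$aic) + 2, ]
best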
You can plot them individually based on the formula, using base graphics or any package that supports model formulas. For example, I randomly selected one from the list:
plot(result ~ 1 + speed + svl, data=results)
## Hit <Return> to see next plot:
## Hit <Return> to see next plot:

R: can't get an lme {nlme} to fit when using self-constructed interaction variables

I'm trying to get an lme with self-constructed interaction variables to fit. I need those for post-hoc analysis.
library(nlme)

# construct fake dataset
obsr <- 100
dist <- rep(rnorm(36), times = obsr)
meth <- dist + rnorm(length(dist), mean = 0, sd = 0.5); rm(dist)
meth <- meth / dist(range(meth)) # dist() here is stats::dist; the variable was removed above
meth <- meth - min(meth)
main <- data.frame(meth = meth,
                   cpgl = as.factor(rep(1:36, times = obsr)),
                   pbid = as.factor(rep(1:obsr, each = 36)),
                   agem = rep(rnorm(obsr, mean = 30, sd = 10), each = 36),
                   trma = as.factor(rep(sample(c(TRUE, FALSE), size = obsr, replace = TRUE), each = 36)),
                   depr = as.factor(rep(sample(c(TRUE, FALSE), size = obsr, replace = TRUE), each = 36)))

# check if all factor combinations are present
# TRUE for my real dataset; naturally TRUE for the fake dataset
with(main, all(table(depr, trma, cpgl) >= 1))

# construct interaction variables
main$depr_trma <- interaction(main$depr, main$trma, sep = ":", drop = TRUE)
main$depr_cpgl <- interaction(main$depr, main$cpgl, sep = ":", drop = TRUE)
main$trma_cpgl <- interaction(main$trma, main$cpgl, sep = ":", drop = TRUE)
main$depr_trma_cpgl <- interaction(main$depr, main$trma, main$cpgl, sep = ":", drop = TRUE)
# model WITHOUT preconstructed interaction variables
form1 <- list(fixd = meth ~ agem + depr + trma + depr*trma + cpgl +
                depr*cpgl + trma*cpgl + depr*trma*cpgl,
              rndm = ~ 1 | pbid,
              corr = ~ cpgl | pbid)
modl1 <- nlme::lme(fixed = form1[["fixd"]],
                   random = form1[["rndm"]],
                   correlation = corCompSymm(form = form1[["corr"]]),
                   data = main)

# model WITH preconstructed interaction variables
form2 <- list(fixd = meth ~ agem + depr + trma + depr_trma + cpgl +
                depr_cpgl + trma_cpgl + depr_trma_cpgl,
              rndm = ~ 1 | pbid,
              corr = ~ cpgl | pbid)
modl2 <- nlme::lme(fixed = form2[["fixd"]],
                   random = form2[["rndm"]],
                   correlation = corCompSymm(form = form2[["corr"]]),
                   data = main)
The first model fits without any problems whereas the second model gives me following error:
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
Nothing I have found about this error so far has helped me solve the problem. However, the solution is probably pretty simple.
Can someone help me? Thanks in advance!
EDIT 1:
When I run:
modl3 <- lm(form1[["fixd"]], data = main)
modl4 <- lm(form2[["fixd"]], data = main)
The summaries reveal that modl4 (with the self-constructed interaction variables) contains many more predictors than modl3. All those that are in modl4 but not in modl3 show NA as coefficients. The problem therefore definitely lies in the way I create the interaction variables...
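One way to confirm this diagnosis (a hedged aside, not from the original post) is to ask R which of the constructed terms are linearly dependent:
# alias() reports linear dependencies in the design matrix; the rows
# under $Complete are the redundant dummies that came out as NA
alias(modl4)$Complete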
EDIT 2:
In the meantime I created the interaction variables "by hand" (mainly paste() and grepl()), and it seems to work now. However, I would still be interested in how I could have achieved this with the interaction() function.
I should have constructed only the largest of the interaction variables (the one combining all three simple variables).
If I do so, the model fits. The likelihoods are then very close to each other, and the number of coefficients matches exactly.
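A minimal sketch of that fix, assuming (based on the matching coefficient count) that the simple factors are dropped in favor of the single combined one:
# the combined three-way factor spans the same column space as the
# full factorial expansion in form1, so only it enters the fixed part
form3 <- list(fixd = meth ~ agem + depr_trma_cpgl,
              rndm = ~ 1 | pbid,
              corr = ~ cpgl | pbid)
modl5 <- nlme::lme(fixed = form3[["fixd"]],
                   random = form3[["rndm"]],
                   correlation = corCompSymm(form = form3[["corr"]]),
                   data = main)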

Get coefficients estimated by maximum likelihood into a stargazer table

Stargazer produces very nice LaTeX tables for lm (and other) objects. Suppose I've fit a model by maximum likelihood. I'd like stargazer to produce an lm-like table for my estimates. How can I do this?
Although it's a bit hacky, one way might be to create a "fake" lm object containing my estimates -- I think this would work as long as summary(my.fake.lm.object) works. Is that easily doable?
An example:
library(stargazer)

N <- 200
df <- data.frame(x = runif(N, 0, 50))
df$y <- 10 + 2 * df$x + 4 * rt(N, 4) # true params
plot(df$x, df$y)

model1 <- lm(y ~ x, data = df)
stargazer(model1, title = "A Model") # I'd like to produce a similar table for the model below

ll <- function(params) {
  ## log likelihood for y ~ x with Student's t errors
  params <- as.list(params)
  return(sum(dt((df$y - params$const - params$beta * df$x) / params$scale,
                df = params$degrees.freedom, log = TRUE) -
               log(params$scale)))
}

model2 <- optim(par = c(const = 5, beta = 1, scale = 3, degrees.freedom = 5),
                lower = c(-Inf, -Inf, 0.1, 0.1),
                fn = ll, method = "L-BFGS-B",
                control = list(fnscale = -1), hessian = TRUE)
model2.coefs <- data.frame(coefficient = names(model2$par),
                           value = as.numeric(model2$par),
                           se = as.numeric(sqrt(diag(solve(-model2$hessian)))))
stargazer(model2.coefs, title = "Another Model", summary = FALSE)
# Works, but how can I mimic what stargazer does with lm objects?
To be more precise: with lm objects, stargazer nicely prints the dependent variable at the top of the table, includes SEs in parentheses below the corresponding estimates, and has the R^2 and number of observations at the bottom of the table. Is there a(n easy) way to obtain the same behavior with a "custom" model estimated by maximum likelihood, as above?
Here are my feeble attempts at dressing up my optim output as a lm object:
model2.lm <- list() # Mimic an lm object
class(model2.lm) <- c(class(model2.lm), "lm")
model2.lm$rank <- model1$rank # Problematic?
model2.lm$coefficients <- model2$par
names(model2.lm$coefficients)[1:2] <- names(model1$coefficients)
model2.lm$fitted.values <- model2$par["const"] + model2$par["beta"]*df$x
model2.lm$residuals <- df$y - model2.lm$fitted.values
model2.lm$model <- df
model2.lm$terms <- model1$terms # Problematic?
summary(model2.lm) # Not working
I was just having this problem and overcame it through the coef, se, and omit arguments of stargazer, e.g.
stargazer(regressions, ...,
          coef = list(... list of coefs ...),
          se = list(... list of standard errors ...),
          omit = c(sequence),
          covariate.labels = c("new names"),
          dep.var.labels.include = FALSE,
          notes.append = FALSE)
You need to first instantiate a dummy lm object, then dress it up:
# ...
model2.lm <- lm(y ~ ., data.frame(y = runif(5), beta = runif(5),
                                  scale = runif(5), degrees.freedom = runif(5)))
model2.lm$coefficients <- model2$par
model2.lm$fitted.values <- model2$par["const"] + model2$par["beta"] * df$x
model2.lm$residuals <- df$y - model2.lm$fitted.values
stargazer(model2.lm, se = list(model2.coefs$se), summary = FALSE, type = "text")
# ===============================================
#                         Dependent variable:
#                     ---------------------------
#                                  y
# -----------------------------------------------
# const                        10.127***
#                               (0.680)
#
# beta                          1.995***
#                               (0.024)
#
# scale                         3.836***
#                               (0.393)
#
# degrees.freedom               3.682***
#                               (1.187)
#
# -----------------------------------------------
# Observations                    200
# R2                             0.965
# Adjusted R2                    0.858
# Residual Std. Error       75.581 (df = 1)
# F Statistic              9.076 (df = 3; 1)
# ===============================================
# Note:               *p<0.1; **p<0.05; ***p<0.01
(and then of course make sure the remaining summary stats are correct)
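Since R2, the F statistic, and the residual standard error are computed from the fake lm internals and are not meaningful here, one option (a sketch using stargazer's omit.stat codes; see ?stargazer for the full list) is simply to drop them:
# drop the summary statistics that do not apply to the ML fit
stargazer(model2.lm, se = list(model2.coefs$se),
          omit.stat = c("rsq", "adj.rsq", "ser", "f"),
          type = "text")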
I don't know how committed you are to using stargazer, but you can try the broom and xtable packages; the problem is that this won't give you the standard errors for the optim model:
library(broom)
library(xtable)
xtable(tidy(model1))
xtable(tidy(model2))
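If the standard errors are needed in the table anyway, one workaround is to typeset the hand-built coefficient data frame from above, which already carries the Hessian-based SEs:
# model2.coefs holds the coefficient, value, and se columns built earlier
xtable(model2.coefs)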
