Reporting mgcv::gam summary with modelsummary

I'm attempting to report the model summary from mgcv::gam() using the modelsummary package. The flextable package provides a summary that is consistent with the summary output in R and what is often presented in publications. It separates out reporting for the fixed/parametric effects and the smooth terms.
Although flextable works well, I'd like to use modelsummary (mainly for its ability to output to gt, kable, etc.). My plan was to produce two separate tables and report the appropriate data for parametric and smooth terms separately (there might be a better way?). However, I get hung up trying to omit coefficients in modelsummary().
Flextable example:
library(mgcv)
library(flextable)
library(modelsummary)
dat <- gamSim(1, n = 4000, dist = "normal", scale = 2)
mod <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat)
flextable::as_flextable(mod)
My first step at getting the summary for parametric terms using modelsummary():
modelsummary(mod,
estimate = "estimate",
statistic = c("Std.Error" = "std.error",
"t-value" = "statistic",
"p-value" = "p.value"),
shape = term ~ model + statistic,
gof_map = NA)
I want to drop the smooth terms and include those in a different table or group, so I tried the coef_omit argument:
modelsummary(mod,
estimate = "estimate",
statistic = c("Std.Error" = "std.error",
"t-value" = "statistic",
"p-value" = "p.value"),
coef_omit = "^(?!.*Intercept)", #this should retain the intercept term
omit = ".*",
shape = term ~ model + statistic,
gof_map = NA)
Error in if (dat$part[i] == "estimates" && dat[[column]][i - 1] == dat[[column]][i]) { :
missing value where TRUE/FALSE needed
Interestingly, if I remove the shape argument to report statistics in "long format" the error goes away. I might be approaching formatting this summary completely wrong and am open to suggestions.
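One possible route to the two-table layout described above, offered as a sketch rather than a fix for the coef_omit error: summary() on a gam object stores the parametric coefficients in $p.table and the smooth terms in $s.table, so each can be turned into its own data frame. The helper name as_term_table is hypothetical, and a smaller n is used only to keep the example quick.

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
mod <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat)

# Hypothetical helper: turn a summary matrix into a data frame
# with the term names in a regular column instead of row names.
as_term_table <- function(tab) {
  out <- data.frame(term = rownames(tab), tab, check.names = FALSE)
  rownames(out) <- NULL
  out
}

sm <- summary(mod)
parametric <- as_term_table(sm$p.table)  # intercept and other parametric terms
smooth     <- as_term_table(sm$s.table)  # edf, Ref.df, F, p-value per smooth

parametric
smooth
```

Each data frame could then be passed to modelsummary::datasummary_df() to render it through the same output back ends (gt, kable, and so on).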

Related

Stargazer custom confidence intervals with multiple models

Stargazer is exponentiating 'wrong' confidence intervals because it is using normal distribution instead of t-distribution. So one has to use custom confidence intervals (Stargazer Confidence Interval Incorrect?).
But how does one do it with multiple models?
model1 <- glm(vs ~ mpg + hp, data = mtcars, family = 'binomial')
model2 <- glm(vs ~ mpg + disp, data = mtcars, family = 'binomial')
library(stargazer)
stargazer(model1,
apply.coef = exp,
digits = 3,
ci = T,
t.auto = F,
type = "text",
ci.custom = list(exp(confint(model1))))
This works as intended. But when I am adding
ci.custom = list(exp(confint(model1, model2))))
then I'll get
Error in Pnames[which] : invalid subscript type 'list'
I tried with c() but to no avail.
The documentation says
a list of two-column numeric matrices ...
so
cc <- lapply(list(model1, model2), function(x) exp(confint(x)))
stargazer(model1, model2,
...,
ci.custom = cc)
should work. (cc <- list(exp(confint(model1)), exp(confint(model2))) also works, and is a little more explicit, but won't generalize as well ...)
For what it's worth, the difference for generalized linear models between the default CIs and those provided by confint() is not a Normal-vs-Student-t distinction (this is different from the case in the linked answer about linear models) — it's the difference between Wald and profile likelihood confidence intervals. (There is some theory for finite-size corrections in GLMs, called Bartlett corrections, but they're not easy to compute/widely available.)
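To make the Wald versus profile likelihood distinction concrete, here is a small sketch using model1 from the question. It assumes confint.default() as the way to obtain Wald intervals (estimate plus or minus a normal quantile times the standard error), while confint() on a glm gives profile likelihood intervals.

```r
model1 <- glm(vs ~ mpg + hp, data = mtcars, family = "binomial")

# Wald intervals: symmetric around the estimate on the link scale
wald <- confint.default(model1)

# Profile likelihood intervals: generally asymmetric
prof <- suppressMessages(confint(model1))

# Side-by-side comparison on the odds-ratio scale
exp(cbind(wald, prof))
```

The two sets of limits differ most when the sample is small or the likelihood is far from quadratic, which is why passing exp(confint(x)) through ci.custom changes the table.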

Formatting regression results in R for latex

I am trying to create a regression results table in R for latex. I would like this table to have two separate columns: one for the estimates and one for the standard error. The following code
library(fixest)
library(car)
library(pander)
##Using the built-in CO2 data frame, run regression
i<- feols(conc ~ uptake + Treatment | Type, CO2, vcov = "hetero")
summary(i)
##Create regression table for latex
etable(i, postprocess.df = pandoc.table.return, style = "rmarkdown")
my_style = style.df(depvar.title = "", fixef.title = "",
fixef.suffix = " fixed effect", yesNo = "yes", default = TRUE)
setFixest_etable(style.df = my_style, postprocess.df = pandoc.table.return)
etable(i, style = "rmarkdown", caption = "New default values", se.below = NULL )
etable(i, tex = TRUE)
print(etable(i, tex = TRUE), file = "filename2.tex")
When put into a LaTeX document on overleaf.com, the following image is produced.
How can I alter my above code to have estimates and standard error in different columns in my table?
You can try the following:
library(modelsummary)
# glue-style template: every statistic name must be wrapped in braces
modelsummary(your_regression, fmt = 2,
estimate = "{estimate}{stars} ({std.error})",
statistic = NULL,
output = "latex")
What I also like about modelsummary is that it lets you pass several models as a list to compare them.
Thanks to @léo-henry for suggesting modelsummary. I just wanted to point out that in the latest version of the package you can use the shape argument to display the estimates and standard errors (or other statistics) side-by-side. You will find details here: https://vincentarelbundock.github.io/modelsummary/articles/modelsummary.html#shape
Here is a minimal example:
library(modelsummary)
mod <- lm(mpg ~ hp + drat, data = mtcars)
modelsummary(mod, shape = term ~ model + statistic)

partykit: Modify terminal node to include standard deviation and significance of regressors

I would like to be able to personalize the plot that is displayed to include standard deviation and statistical significance of the regressors after using the partykit::mob() function.
The following code is from partykit documentation.
library("partykit")
if(require("mlbench")) {
## Pima Indians diabetes data
data("PimaIndiansDiabetes", package = "mlbench")
## a simple basic fitting function (of type 1) for a logistic regression
logit <- function(y, x, start = NULL, weights = NULL, offset = NULL, ...) {
glm(y ~ 0 + x, family = binomial, start = start, ...)
}
## set up a logistic regression tree
pid_tree <- mob(diabetes ~ glucose | pregnant + pressure + triceps + insulin +
mass + pedigree + age, data = PimaIndiansDiabetes, fit = logit)
## see lmtree() and glmtree() for interfaces with more efficient fitting functions
## print tree
print(pid_tree)
## print information about (some) nodes
print(pid_tree, node = 3:4)
## visualization
plot(pid_tree,terminal_panel = NULL)
}
This is what is produced:
And this is what I would like to get (for all the nodes).
Thanks in advance.
When using the node_terminal() function for visualizing the information within the terminal nodes, you can plug in a function FUN that customizes and formats the information. The input to FUN is the $info from the respective terminal node which for mob trees includes the fitted model $object. The output should be a character vector. As an example consider this custom summary:
mysummary <- function(info, digits = 2) {
n <- info$nobs
na <- format(names(coef(info$object)))
cf <- format(coef(info$object), digits = digits)
se <- format(sqrt(diag(vcov(info$object))), digits = digits)
c(paste("n =", n),
"Estimated parameters:",
paste(na, cf, se)
)
}
Based on this you get:
plot(pid_tree,
terminal_panel = node_terminal,
tp_args = list(FUN = mysummary))
This just shows coefficients and standard errors - but you can add significance stars or any other information you like. However, you need to do all the formatting yourself in the custom FUN.
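As a sketch of the "add significance stars" idea, the variant below reads the p-values from coef(summary(info$object)) and appends stars using the conventional R cutpoints. The name mysummary_stars is hypothetical, and the code assumes the node's fitted object supports summary() the way a glm does. So that it runs without partykit, it is demonstrated on a hand-built list that mimics a node's $info.

```r
mysummary_stars <- function(info, digits = 2) {
  # Coefficient table: estimate, std. error, test statistic, p-value
  ct <- coef(summary(info$object))
  p  <- ct[, 4]
  # Map p-values to the usual significance codes
  stars <- c("***", "**", "*", ".", "")[
    findInterval(p, c(0.001, 0.01, 0.05, 0.1)) + 1
  ]
  c(paste("n =", info$nobs),
    "Estimated parameters:",
    paste(format(rownames(ct)),
          format(ct[, 1], digits = digits),
          paste0("(", format(ct[, 2], digits = digits), ")"),
          stars))
}

# Stand-in for a terminal node's $info (assumption: it holds nobs and object)
fit <- glm(am ~ wt, data = mtcars, family = binomial)
fake_info <- list(nobs = nobs(fit), object = fit)
out <- mysummary_stars(fake_info)
out
```

With a fitted tree, the same function would be supplied through tp_args = list(FUN = mysummary_stars) in the plot() call above.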

Is there a way to write a regression table from R with coefficients, se, p, odds ratio, and CIs?

I have rather robust data and would like to write my regression outputs to a "journal ready" format from R.
At the moment, I have gotten the outreg package to work in getting the coefficients, se, and p-values (code below). However, I cannot seem to find how I can also include the odds ratios or confidence intervals. Does anyone have suggestions on such a package that does this, or how I can achieve this using outreg?
Above-mentioned code for outreg table:
dm1.output <- list(dm1a, dm1b, dm1c, dm1d, dm1e, dm1f, dm1g, dm1h)
dm1.output2 <- as.data.frame(outreg(dm1.output, pv=T, starred = c("pv")))
If they are standard models you could look at modelsummary here:
library(modelsummary)
# use `=` (not `<-`) inside list() so the models are actually named
models <- list("Model 1" = glm(am ~ mpg, data = mtcars, family = binomial),
"Model 2" = glm(am ~ cyl, data = mtcars, family = binomial))
modelsummary(models, exponentiate = TRUE, stars = TRUE, statistic = "conf.int", conf_level = .95)
Or stargazer:
library(stargazer)
model_1 <- glm(am ~ mpg, data = mtcars, family = binomial)
stargazer(model_1, coef = list(exp(model_1$coefficients)), type = "text")
Or texreg
library(texreg)
model_trexreg <- texreg::extract(model_1)
screenreg(model_1, override.coef = exp(model_trexreg@coef), override.se = exp(model_trexreg@se))
The stargazer package is good for your tasks. It was designed with the goal of producing "journal ready" tables, since the developer is an academic.

sjt.lmer displaying incorrect p-values

I've just noticed that sjt.lmer tables are displaying incorrect p-values, e.g., p-values that do not reflect the model summary. This appears to be a new-ish issue, as this worked fine last month?
Using the provided data and code in the package vignette
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(lme4)
library(sjstats)
# load sample data
data(efc)
# prepare grouping variables
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- get_labels(efc$e15relat)
efc$care.level <- rec(efc$n4pstu, rec = "0=0;1=1;2=2;3:4=4",
val.labels = c("none", "I", "II", "III"))
# data frame for fitted model
mydf <- data.frame(
neg_c_7 = efc$neg_c_7,
sex = to_factor(efc$c161sex),
c12hour = efc$c12hour,
barthel = efc$barthtot,
education = to_factor(efc$c172code),
grp = efc$grp,
carelevel = to_factor(efc$care.level)
)
# fit sample models
fit1 <- lmer(neg_c_7 ~ sex + c12hour + barthel + (1 | grp), data = mydf)
summary(fit1)
p_value(fit1, p.kr =TRUE)
model summary
p_value summary
sjt.lmer output does not show these p-values??
Note that the first summary comes from a model fitted with lmerTest, which computes p-values with df based on Satterthwaite approximation (see first line in output).
p_value(), however, with p.kr = TRUE, uses the Kenward-Roger approximation from package pbkrtest, which is a bit more conservative.
Your output from sjt.lmer() seems to be messed up somehow, and I can't reproduce it with your example. My output looks ok:
