I have a number of linear mixed models, which I have fitted with the lmerTest library, so that the summary() of the function would provide me with p-values of fixed effects.
I have written a loop function that extract the fixed effects of gender:time and gender:time:explanatory variable of interest.
Trying to now also extract the p-value of gender:time fixed effect (step 1) and also gender:time:explanatory variable (step 2).
Normally I can extract the p-value with this code:
coef(summary(model))[,5]["genderfemale:time"]
But inside the loop function it doesn't work and gives the error: "Error in coef(summary(model))[, 5] : subscript out of bounds"
See code
library(lmerTest)
# Create a list of models with interaction terms to loop over
models <- list(
mixed_age_interaction,
mixed_tnfi_year_interaction,
mixed_crp_interaction
)
# Create a list of explanatory variables to loop over
explanatoryVariables <- list(
"age_at_diagnosis",
"bio_drug_start_year",
"crp"
)
loop_function <- function(models, explanatoryVariables) {
# Create an empty data frame to store the results
coef_df <- data.frame(adj_coef_gender_sex = numeric(), coef_interaction_term = numeric(), explanatory_variable = character(), adj_coef_pvalue = numeric())
# Loop over the models and explanatory variables
for (i in seq_along(models)) {
model <- models[[i]]
explanatoryVariable <- explanatoryVariables[[i]]
# Extract the adjusted coefficients for the gender*time interaction
adj_coef <- fixef(model)["genderfemale:time"]
# Extract the fixed effect of the interaction term
interaction_coef <- fixef(model)[paste0("genderfemale:time:", explanatoryVariable)]
# Extract the p-value for the adjusted coefficient for gender*time
adj_coef_pvalue <- coef(summary(model))[,5]["genderfemale:time"]
# Add a row to the data frame with the results for this model
coef_df <- bind_rows(coef_df, data.frame(adj_coef_gender_sex = adj_coef, coef_interaction_term = interaction_coef, explanatory_variable = explanatoryVariable, adj_coef_pvalue = adj_coef_pvalue))
}
return(coef_df)
}
# Loop over the models and extract the fixed effects
coef_df <- loop_function(models, explanatoryVariables)
coef_df
My question is how can I extract the p-values from the models for gender:time and gender:time:explanatory variable and add them to the final data.frame coef_df?
Also adding a summary of one of the models for reference
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: basdai ~ 1 + gender + time + age_at_diagnosis + gender * time +
time * age_at_diagnosis + gender * age_at_diagnosis + gender *
time * age_at_diagnosis + (1 | ID) + (1 | country)
Data: dat
AIC BIC logLik deviance df.resid
254340.9 254431.8 -127159.5 254318.9 28557
Scaled residuals:
Min 1Q Median 3Q Max
-3.3170 -0.6463 -0.0233 0.6092 4.3180
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 154.62 12.434
country (Intercept) 32.44 5.695
Residual 316.74 17.797
Number of obs: 28568, groups: ID, 11207; country, 13
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 4.669e+01 1.792e+00 2.082e+01 26.048 < 2e-16 ***
genderfemale 2.368e+00 1.308e+00 1.999e+04 1.810 0.0703 .
time -1.451e+01 4.220e-01 2.164e+04 -34.382 < 2e-16 ***
age_at_diagnosis 9.907e-02 2.220e-02 1.963e+04 4.463 8.12e-06 ***
genderfemale:time 1.431e-01 7.391e-01 2.262e+04 0.194 0.8464
time:age_at_diagnosis 8.188e-02 1.172e-02 2.185e+04 6.986 2.90e-12 ***
genderfemale:age_at_diagnosis 8.547e-02 3.453e-02 2.006e+04 2.476 0.0133 *
genderfemale:time:age_at_diagnosis 4.852e-03 1.967e-02 2.274e+04 0.247 0.8052
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) gndrfm time ag_t_d gndrf: tm:g__ gnd:__
genderfemal -0.280
time -0.241 0.331
age_t_dgnss -0.434 0.587 0.511
gendrfml:tm 0.139 -0.519 -0.570 -0.293
tm:g_t_dgns 0.228 -0.313 -0.951 -0.533 0.543
gndrfml:g__ 0.276 -0.953 -0.329 -0.639 0.495 0.343
gndrfml::__ -0.137 0.491 0.567 0.319 -0.954 -0.596 -0.516
The internal function get_coefmat of {lmerTest} might be handy:
if fm is an example {lmer} model ...
library("lmerTest")
fm <- lmer(Informed.liking ~ Gender + Information * Product +
(1 | Consumer) + (1 | Consumer:Product),
data=ham
)
... you can obtain the coefficients including p-values as a dataframe like so (note the triple colon to expose the internal function):
df_coeff <- lmerTest:::get_coefmat(fm) |>
as.data.frame()
output:
## > df_coeff
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 5.8490289 0.2842897 322.3364 20.5741844 1.173089e-60
## Gender2 -0.2442835 0.2605644 79.0000 -0.9375169 3.513501e-01
## Information2 0.1604938 0.2029095 320.0000 0.7909626 4.295517e-01
## Product2 -0.8271605 0.3453291 339.5123 -2.3952818 1.714885e-02
## Product3 0.1481481 0.3453291 339.5123 0.4290057 6.681912e-01
## ...
edit
Here's a snippet which will return you the extracted coefficents for, e.g., models m1 and m2 as a combined dataframe:
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
list('m1', 'm2') |> ## observe the quotes
map_dfr( ~ list(
model = .x,
coeff = lmerTest:::get_coefmat(get(.x)) |>
as.data.frame() |>
rownames_to_column()
)
) |>
as_tibble() |>
unnest_wider(coeff)
output:
## + # A tibble: 18 x 7
## model rowname Estimate `Std. Error` df `t value` `Pr(>|t|)`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 m1 (Intercept) 5.85 0.284 322. 20.6 1.17e-60
## 2 m1 Gender2 -0.244 0.261 79.0 -0.938 3.51e- 1
## ...
## 4 m1 Product2 -0.827 0.345 340. -2.40 1.71e- 2
## ...
## 8 m1 Information2:Product3 0.272 0.287 320. 0.946 3.45e- 1
## ...
## 10 m2 (Intercept) 5.85 0.284 322. 20.6 1.17e-60
## 11 m2 Gender2 -0.244 0.261 79.0 -0.938 3.51e- 1
## ...
Related
I don't have any 'treatment' except the passage of time (date), and 10 times points. I have a total of 43190 measurements, they are continuous binomial data (0.0 to 1.0) of the percentual response variable (canopycov). In glm logic, this is a quasibinomial case, but I find just only glmmPQL in MASS package for use, but the model is not OK and I have NA for p-values in all the dates. In my case, I try:
#Packages
library(MASS)
# Dataset
ds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/pred_attack_F.csv")
str(ds)
# 'data.frame': 43190 obs. of 3 variables:
# $ date : chr "2021-12-06" "2021-12-06" "2021-12-06" "2021-12-06" ...
# $ canopycov: int 22 24 24 24 25 25 25 25 26 26 ...
# $ rep : chr "r1" "r1" "r1" "r1" ...
# Binomial Generalized Linear Mixed Models
m.1 <- glmmPQL(canopycov/100~date,random=~1|date,
family="quasibinomial",data=ds)
summary(m.1)
#Linear mixed-effects model fit by maximum likelihood
# Data: ds
# AIC BIC logLik
# NA NA NA
# Random effects:
# Formula: ~1 | date
# (Intercept) Residual
# StdDev: 1.251838e-06 0.1443305
# Variance function:
# Structure: fixed weights
# Formula: ~invwt
# Fixed effects: canopycov/100 ~ date
# Value Std.Error DF t-value p-value
# (Intercept) -0.5955403 0.004589042 43180 -129.77442 0
# date2021-06-14 -0.1249648 0.006555217 0 -19.06341 NaN
# date2021-07-09 0.7661870 0.006363749 0 120.39868 NaN
# date2021-07-24 1.0582366 0.006434893 0 164.45286 NaN
# date2021-08-03 1.0509474 0.006432295 0 163.38607 NaN
# date2021-08-08 1.0794612 0.006442704 0 167.54784 NaN
# date2021-09-02 0.9312346 0.006395722 0 145.60274 NaN
# date2021-09-07 0.9236196 0.006393780 0 144.45595 NaN
# date2021-09-22 0.7268144 0.006359224 0 114.29293 NaN
# date2021-12-06 1.3109809 0.006552314 0 200.07907 NaN
# Correlation:
# (Intr) d2021-06 d2021-07-0 d2021-07-2 d2021-08-03 d2021-08-08
# date2021-06-14 -0.700
# date2021-07-09 -0.721 0.505
# date2021-07-24 -0.713 0.499 0.514
# date2021-08-03 -0.713 0.499 0.514 0.509
# date2021-08-08 -0.712 0.499 0.514 0.508 0.508
# date2021-09-02 -0.718 0.502 0.517 0.512 0.512 0.511
# date2021-09-07 -0.718 0.502 0.518 0.512 0.512 0.511
# date2021-09-22 -0.722 0.505 0.520 0.515 0.515 0.514
# date2021-12-06 -0.700 0.490 0.505 0.499 0.500 0.499
# d2021-09-02 d2021-09-07 d2021-09-2
# date2021-06-14
# date2021-07-09
# date2021-07-24
# date2021-08-03
# date2021-08-08
# date2021-09-02
# date2021-09-07 0.515
# date2021-09-22 0.518 0.518
# date2021-12-06 0.503 0.503 0.505
# Standardized Within-Group Residuals:
# Min Q1 Med Q3 Max
# -6.66259139 -0.47887669 0.09634211 0.54135914 4.32231889
# Number of Observations: 43190
# Number of Groups: 10
I'd like to correctly specify that my data is temporally pseudo replicated in a mixed-effects, but I don't find another approach for this. Please, I need any help to solve it.
I don't understand the motivation for a quasi-binomial model here, there's some nice discussion of the binomial and quasi binomial densities here and here that might be worth reading (including applications).
The problem with the code is that you have date as a character, so R doesn't know its a date. You will have to decide the units of measurement for time as well as the reference point, but the model works fine once you fix this.
ds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/pred_attack_F.csv")
str(ds)
head(ds)
class(ds$date)
ds$datex <- as.Date(ds$date)
summary(as.numeric(difftime(ds$datex, as.Date(Sys.Date(), "%d%b%Y"), units = "days")))
ds$date_time <- as.numeric(difftime(ds$datex, as.Date(Sys.Date(), "%d%b%Y"), units = "days"))
# scale for laplace
ds$sdate_time <- scale(ds$date_time)
m1 <- glmer(cbind(canopycov,100 - canopycov)~ date_time + (1|date_time),
family="binomial",data=ds)
summary(m1)
# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
# Family: binomial ( logit )
# Formula: canopycov/100 ~ date_time + (1 | date_time)
# Data: ds
#
# AIC BIC logLik deviance df.resid
# 35109.7 35135.7 -17551.9 35103.7 43187
#
# Scaled residuals:
# Min 1Q Median 3Q Max
# -19.0451 -1.5605 -0.5155 -0.1594 0.8081
#
# Random effects:
# Groups Name Variance Std.Dev.
# date_time (Intercept) 0 0
# Number of obs: 43190, groups: date_time, 10
#
# Fixed effects:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 8.5062394 0.0866696 98.15 <2e-16 ***
# date_time 0.0399010 0.0004348 91.77 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Correlation of Fixed Effects:
# (Intr)
# date_time 0.988
# optimizer (Nelder_Mead) convergence code: 0 (OK)
# boundary (singular) fit: see ?isSingular
m2 <- MASS::glmmPQL(canopycov/100~ date_time,random=~1|date_time,
family="quasibinomial",data=ds)
summary(m2)
# Linear mixed-effects model fit by maximum likelihood
# Data: ds
# AIC BIC logLik
# NA NA NA
#
# Random effects:
# Formula: ~1 | date_time
# (Intercept) Residual
# StdDev: 0.3082808 0.1443456
#
# Variance function:
# Structure: fixed weights
# Formula: ~invwt
# Fixed effects: canopycov/100 ~ date_time
# Value Std.Error DF t-value p-value
# (Intercept) 0.1767127 0.0974997 43180 1.812443 0.0699
# date_time 0.3232878 0.0975013 8 3.315728 0.0106
# Correlation:
# (Intr)
# date_time 0
#
# Standardized Within-Group Residuals:
# Min Q1 Med Q3 Max
# -6.66205315 -0.47852364 0.09635514 0.54154467 4.32129236
#
# Number of Observations: 43190
# Number of Groups: 10
Random coefficient Poisson models are rather difficult to fit, there tends to be some variability in parameter estimates between lme4 and glmmADMB. But in my case:
# Packages
library(lme4)
library(glmmADMB)
#Open my dataset
myds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_glmm_dataset.csv")
str(myds)
# 'data.frame': 526 obs. of 10 variables:
# $ Bioma : chr "Pampa" "Pampa" "Pampa" "Pampa" ...
# $ estacao : chr "verao" "verao" "verao" "verao" ...
# $ ciclo : chr "1°" "1°" "1°" "1°" ...
# $ Hour : int 22 23 0 1 2 3 4 5 6 7 ...
# $ anthill : num 23.5 23.5 23.5 23.5 23.5 ...
# $ formigueiro: int 2 2 2 2 2 2 2 2 2 2 ...
# $ ladenant : int 34 39 29 25 20 31 16 28 21 12 ...
# $ unladen : int 271 258 298 317 316 253 185 182 116 165 ...
# $ UR : num 65.7 69 71.3 75.8 78.1 ...
# $ temp : num 24.3 24.3 24 23.7 23.1 ...
I have a number of insects (ladenant) in the function of biome(Bioma), temperature (temp) and humidity(UR), but formiguieros is pseudoreplication. Then I try to model the relationship using lme4 and glmmADMB.
First I try lme4:
m.laden.1 <- glmer(ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro), data = DataBase, family = poisson(link = "log"))
summary(m.laden.1)
# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
# Family: poisson ( log )
# Formula: ladenant ~ Bioma + poly(temp, 2) + UR + (1 | formigueiro)
# Data: DataBase
# AIC BIC logLik deviance df.resid
# 21585.9 21615.8 -10786.0 21571.9 519
# Scaled residuals:
# Min 1Q Median 3Q Max
# -10.607 -4.245 -1.976 2.906 38.242
# Random effects:
# Groups Name Variance Std.Dev.
# formigueiro (Intercept) 0.02049 0.1432
# Number of obs: 526, groups: formigueiro, 5
# Fixed effects:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 0.7379495 0.0976701 7.556 4.17e-14 ***
# BiomaTransition 1.3978383 0.0209623 66.684 < 2e-16 ***
# BiomaPampa -0.1256759 0.0268164 -4.687 2.78e-06 ***
# poly(temp, 2)1 7.1035195 0.2079550 34.159 < 2e-16 ***
# poly(temp, 2)2 -7.2900687 0.2629908 -27.720 < 2e-16 ***
# UR 0.0302810 0.0008029 37.717 < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Correlation of Fixed Effects:
# (Intr) BmTrns BimPmp p(,2)1 p(,2)2
# BiomaTrnstn -0.586
# BiomaPampa -0.199 0.352
# ply(tmp,2)1 -0.208 0.267 0.312
# ply(tmp,2)2 -0.191 0.085 -0.175 -0.039
# UR -0.746 0.709 0.188 0.230 0.316
# optimizer (Nelder_Mead) convergence code: 0 (OK)
# Model is nearly unidentifiable: very large eigenvalue
# - Rescale variables?
Second I try glmmADMB:
m.laden.2 <- glmmadmb(ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro), data = DataBase, family = "poisson", link = "log")
summary(m.laden.2)
# Call:
# glmmadmb(formula = ladenant ~ Bioma + poly(temp, 2) + UR + (1 |
# formigueiro), data = DataBase, family = "poisson", link = "log")
# AIC: 12033.9
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 1.52390 0.26923 5.66 1.5e-08 ***
# BiomaTransition 0.23967 0.08878 2.70 0.0069 **
# BiomaPampa 0.09680 0.05198 1.86 0.0626 .
# poly(temp, 2)1 -0.38754 0.55678 -0.70 0.4864
# poly(temp, 2)2 -1.16028 0.39608 -2.93 0.0034 **
# UR 0.01560 0.00261 5.97 2.4e-09 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Number of observations: total=526, formigueiro=5
# Random effect variance(s):
# Group=formigueiro
# Variance StdDev
# (Intercept) 0.07497 0.2738
Despite the between different statistical packages and estimations, the models are completely different significance levels in the case of Biome variable. My question is,
where is any other approach that can be used to compare the results and choose a final model?
Thanks in advance.
I got a little bit carried away. tl;dr as pointed out in comments, it's hard to get glmmADMB to work with a Poisson model, but a model with overdispersion (e.g. negative binomial) is clearly a lot better. Furthermore, you should probably incorporate some aspect of random slopes in the model ...
Packages (colorblindr is optional):
library(lme4)
library(glmmADMB)
library(glmmTMB)
library(broom.mixed)
library(tidyverse) ## ggplot2, dplyr, tidyr, purrr ...
library(colorblindr) ## remotes::install_github("clauswilke/colorblindr")
theme_set(theme_bw())
Get data: standardize input variables so we can easily make a sensible coefficient plot
## read.csv doesn't work for me out of the box, locale/encoding issues
myds <- readr::read_csv("my_glmm_dataset.csv") %>%
mutate(across(formigueiro, as.factor),
across(c(UR, temp), ~ drop(scale(.))))
Formulas and models:
form <- ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro)
## random slopes, all independent (also tried with correlations (| instead
## of ||), but fails)
form_x <- ladenant ~ Bioma + poly(temp,2) + UR + (1 + UR + poly(temp,2) || formigueiro)
glmer_pois <- glmer(form, data = myds, family = poisson(link = "log"))
## fails
glmmADMB_pois <- try(glmmadmb(form, data = myds, family = "poisson"))
## fails ("Parameters were estimated, but standard errors were not:
## the most likely problem is that the curvature at MLE was zero or negative"
glmmTMB_pois <- glmmTMB(form, data = myds, family = poisson)
glmer_nb2 <- glmer.nb(form, data = myds)
glmmADMB_nb2 <- glmmadmb(form, data = myds, family = "nbinom2")
glmmTMB_nb2 <- update(glmmTMB_pois, family = "nbinom2")
glmmTMB_nb1 <- update(glmmTMB_pois, family = "nbinom1")
glmmTMB_nb2ext <- update(glmmTMB_nb2, formula = form_x)
Put it all together:
modList <- tibble::lst(glmer_pois, glmmTMB_pois, glmer_nb2, glmmADMB_nb2, glmmTMB_nb2,
glmmTMB_nb1, glmmTMB_nb2ext)
bbmle::AICtab(modList)
dAIC df
glmmTMB_nb2ext 0.0 11
glmer_nb2 27.0 8
glmmADMB_nb2 27.1 8
glmmTMB_nb2 27.1 8
glmmTMB_nb1 79.5 8
glmmTMB_pois 16658.0 7
glmer_pois 16658.0 7
The 'nb2' models are all OK, but the random-slopes model is considerably better.
Coefficient plots, including the fixed effects from all methods:
tt <- (purrr::map_dfr(modList, tidy, effects = "fixed", conf.int = TRUE,
.id = "model") |>
dplyr::filter(term != "(Intercept)") |>
tidyr::separate(model, into = c("platform", "distrib"))
)
ggplot(tt, aes(y = term, x = estimate, xmin = conf.low, xmax = conf.high,
colour = platform, shape = distrib)) +
geom_pointrange(position = position_dodge(width = 0.25)) +
geom_vline(xintercept = 0, lty =2) +
scale_colour_OkabeIto()
I'm trying to build a list in R that contains all the regressor names which have pvalue below the 5% threshold.
For example:
first regression
#gender (male - female)
regr1 <- lm(salary ~ female, data = test)
summary(regr1)
output first regression:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.855618 0.001888 453.24 <2e-16 ***
female -0.054514 0.003088 -17.65 <2e-16 ***
second regression:
#education (PhD - Master - Bachelor - HighSchool - None)
regr2 <- lm(salary ~ Master + Bachelor + HighSchool + None, data = test)
summary(regr2)
output second regression:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.91008 0.02829 32.164 < 2e-16 ***
Master -0.05446 0.02836 -1.920 0.054811 .
Bachelor -0.10291 0.02848 -3.613 0.000303 ***
HighSchool -0.10173 0.02911 -3.495 0.000475 ***
None -0.12590 0.02864 -4.396 1.11e-05 ***
The variable master is not significant, so i don't want it in the List.
This is teh list that I would like to get:
varnames <- c("female", "Bachelor", "HighSchool", "None")
You can use broom::tidy and then manipulate the table, like this:
library(tidyverse)
tab <- lm(data = mtcars, mpg ~ cyl + disp + hp) %>% summary() %>% broom::tidy()
tab
# A tibble: 4 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 34.2 2.59 13.2 1.54e-13
2 cyl -1.23 0.797 -1.54 1.35e- 1
3 disp -0.0188 0.0104 -1.81 8.09e- 2
4 hp -0.0147 0.0147 -1.00 3.25e- 1
Then you filter the p.value column:
tab %>% filter(p.value < 0.05)
# A tibble: 1 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 34.2 2.59 13.2 1.54e-13
So now you can take the regressors name:
tab %>% filter(p.value < 0.05) %>% select(term) %>% as.character()
[1] "(Intercept)"
In base R you can do something like the following:
lr1 <- lm(Sepal.Length ~ ., data = iris)
coef_table <- coef(summary(tab))
row.names(coef_table)[coef_table[, "Pr(>|t|)"] < 0.001]
# "(Intercept)" "Sepal.Width" "Petal.Length"
I have a dataset that I am using to build generalised linear models. The response variable is binary (absence/presence) and the explanatory variables are categorical.
CODE
library(tidyverse)
library(AICcmodavg)
# Data
set.seed(123)
t <- tibble(ID = 1:100,
A = as.factor(sample(c(0, 1), 100, T)),
B = as.factor(sample(c("black", "white"), 100, T)),
C = as.factor(sample(c("pos", "neg", "either"), 100, T)))
# Candidate set of models - Binomial family because response variable
# is binary (0 for absent & 1 for present)
# Global model is A ~ B_black + C_either
m1 <- glm(A ~ 1, binomial, t)
m2 <- glm(A ~ B, binomial, t)
m3 <- glm(A ~ C, binomial, t)
m4 <- glm(A ~ B + C, binomial, t)
# List with all models
ms <- list(null = m1, m_B = m2, m_C = m3, m_BC = m4)
# Summary table
aic_tbl <- aictab(ms)
PROBLEM
I want to build a table like the one below that summarises the coefficients, standard errors, and Akaike weights of the models within my candidate set.
QUESTION
Can anyone suggest how to best build this table using my list of models and AIC table?
Just to point it out: broom gets you half-way to where you want to get by turning the model output into a dataframe, which you can then reshape.
library(broom)
bind_rows(lapply(ms, tidy), .id="key")
key term estimate std.error statistic p.value
1 null (Intercept) -0.12014431182649532 0.200 -0.59963969517107030 0.549
2 m_B (Intercept) 0.00000000000000123 0.283 0.00000000000000433 1.000
3 m_B Bwhite -0.24116205496397874 0.401 -0.60071814968372905 0.548
4 m_C (Intercept) -0.47957308026188367 0.353 -1.35892869678271544 0.174
5 m_C Cneg 0.80499548069651150 0.507 1.58784953814722285 0.112
6 m_C Cpos 0.30772282333522433 0.490 0.62856402205887851 0.530
7 m_BC (Intercept) -0.36339654526926718 0.399 -0.90984856337213305 0.363
8 m_BC Bwhite -0.25083209866475475 0.408 -0.61515191157571303 0.538
9 m_BC Cneg 0.81144822536950656 0.508 1.59682131202527056 0.110
10 m_BC Cpos 0.32706970242195277 0.492 0.66527127770403538 0.506
And if you must insist of the layout of your table, I came up with the following (arguably clumsy) way of rearranging everything:
out <- bind_rows(lapply(ms, tidy), .id="mod")
t1 <- out %>% select(mod, term, estimate) %>% spread(term, estimate) %>% base::t
t2 <- out %>% select(mod, term, std.error) %>% spread(term, std.error) %>% base::t
rownames(t2) <- paste0(rownames(t2), "_std_e")
tmp <- rbind(t1, t2[-1,])
new_t <- as.data.frame(tmp[-1,])
colnames(new_t) <- tmp[1,]
new_t
Alternatively, you may want to familiarise yourself with packages that are meant to display model output for publication, e.g. texreg or stargazer come to mind:
library(texreg)
screenreg(ms)
==================================================
null m_B m_C m_BC
--------------------------------------------------
(Intercept) -0.12 0.00 -0.48 -0.36
(0.20) (0.28) (0.35) (0.40)
Bwhite -0.24 -0.25
(0.40) (0.41)
Cneg 0.80 0.81
(0.51) (0.51)
Cpos 0.31 0.33
(0.49) (0.49)
--------------------------------------------------
AIC 140.27 141.91 141.66 143.28
BIC 142.87 147.12 149.48 153.70
Log Likelihood -69.13 -68.95 -67.83 -67.64
Deviance 138.27 137.91 135.66 135.28
Num. obs. 100 100 100 100
==================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
I have several models such as the example below for which I have estimates, standard errors, p-values, r2 etc. as data.frames in tidy format, but I don't have the original model objects (analysis was run on a different machine).
require(broom)
model <- lm(mpg ~ hp + cyl, mtcars)
tidy_model <- tidy(model)
glance_model <- glance(model)
# tidy_model
# # A tibble: 3 x 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 36.9 2.19 16.8 1.62e-16
# 2 hp -0.0191 0.0150 -1.27 2.13e- 1
# 3 cyl -2.26 0.576 -3.93 4.80e- 4
# glance(model)
# # A tibble: 1 x 11
# r.squared adj.r.squared sigma ...
# * <dbl> <dbl> <dbl> ...
# 1 0.760 0.743 3.06 ...
There exist several packages (e.g. stargazer or texreg) which transform one or more model objects (lm, glm, etc.) into well-formatted regression tables side-by-side, see below for an example of texreg:
require(texreg)
screenreg(list(model1, model1)
# =================================
# Model 1 Model 2
# ---------------------------------
# (Intercept) 34.66 *** 34.66 ***
# (2.55) (2.55)
# cyl -1.59 * -1.59 *
# (0.71) (0.71)
# disp -0.02 -0.02
# (0.01) (0.01)
# ---------------------------------
# R^2 0.76 0.76
# Adj. R^2 0.74 0.74
# Num. obs. 32 32
# RMSE 3.06 3.06
# =================================
# *** p < 0.001, ** p < 0.01, * p < 0.05
Is there a similar package that uses tidy estimation results produced with broom as inputs rather than model objects to produce a table such as the above example?
Is there a similar package that uses tidy estimation results produced with broom as inputs
Not to my knowledge, but stargazer allows you to use custom inputs to generate regression tables. This allows us to create "fake" shell tables that we can populate with values from the tidy table. Using your example
# create fake models
dat <- lapply(tidy_model$term, function(...) rnorm(10))
dat <- as.data.frame(setNames(dat, c("mpg", tidy_model$term[-1])))
f <- as.formula(paste("mpg ~", paste(tidy_model$term[-1], collapse = " + ")))
fit <- lm(f, dat)
# set up model statistics
fit_stats <- data.frame(labels = names(glance_model),
mod1 = round(unlist(glance_model), 3),
mod2 = round(unlist(glance_model), 3),
row.names = NULL,
stringsAsFactors = FALSE)
We can then feed these values into stargazer:
library(stargazer)
stargazer(fit, fit, type = "text",
coef = list(tidy_model$estimate, tidy_model$estimate),
se = list(tidy_model$std.error, tidy_model$std.error),
add.lines = lapply(1:nrow(fit_stats), function(i) unlist(fit_stats[i, ])),
omit.table.layout = "s"
)
# ==========================================
# Dependent variable:
# ----------------------------
# mpg
# (1) (2)
# ------------------------------------------
# hp -0.019 -0.019
# (0.015) (0.015)
# cyl -2.265*** -2.265***
# (0.576) (0.576)
# Constant 36.908*** 36.908***
# (2.191) (2.191)
# ------------------------------------------
# r.squared 0.741 0.741
# adj.r.squared 0.723 0.723
# sigma 3.173 3.173
# statistic 41.422 41.422
# p.value 0 0
# df 3 3
# logLik -80.781 -80.781
# AIC 169.562 169.562
# BIC 175.425 175.425
# deviance 291.975 291.975
# df.residual 29 29
# ==========================================
# Note: *p<0.1; **p<0.05; ***p<0.01
I had another look at texreg, inspired by this answer, and there is a more native way to do this by defining an additional extraction method for texreg in addition to the previous answer:
extract_broom <- function(tidy_model, glance_model) {
# get estimates/standard errors from tidy
coef <- tidy_model$estimate
coef.names <- as.character(tidy_model$term)
se <- tidy_model$std.error
pvalues <- tidy_model$p.value
# get goodness-of-fit statistics from glance
glance_transposed <- as_tibble(cbind(name = names(glance_model), t(glance_model)))
gof.names <- as.character(glance_transposed$name)
gof <- as.double(glance_transposed$value)
gof.decimal <- gof %% 1 > 0
tr_object <- texreg::createTexreg(coef.names = coef.names,
coef = coef,
se = se,
pvalues = pvalues,
gof.names = gof.names,
gof = gof,
gof.decimal = gof.decimal)
return(tr_object)
}
This results in the following output:
texreg_model <- extract_broom(tidy_model, glance_model)
screenreg(list(texreg_model, texreg_model))
# =====================================
# Model 1 Model 2
# -------------------------------------
# (Intercept) 36.91 *** 36.91 ***
# (2.19) (2.19)
# hp -0.02 -0.02
# (0.02) (0.02)
# cyl -2.26 *** -2.26 ***
# (0.58) (0.58)
# -------------------------------------
# r.squared 0.74 0.74
# adj.r.squared 0.72 0.72
# sigma 3.17 3.17
# statistic 41.42 41.42
# p.value 0.00 0.00
# df 3 3
# logLik -80.78 -80.78
# AIC 169.56 169.56
# BIC 175.42 175.42
# deviance 291.97 291.97
# df.residual 29 29
# =====================================
# *** p < 0.001, ** p < 0.01, * p < 0.05