Example
library(glmmTMB)
library(ggeffects)
## Zero-inflated negative binomial model
(m <- glmmTMB(count ~ spp + mined + (1 | site),
              ziformula = ~ spp + mined,
              family = nbinom2,
              data = Salamanders,
              na.action = "na.fail"))
summary(m)
ggemmeans(m, terms="spp")
spp | Predicted | 95% CI
--------------------------------
GP | 1.11 | [0.66, 1.86]
PR | 0.42 | [0.11, 1.59]
DM | 1.32 | [0.81, 2.13]
EC-A | 0.75 | [0.37, 1.53]
EC-L | 1.81 | [1.09, 3.00]
DES-L | 2.00 | [1.25, 3.21]
DF | 0.99 | [0.61, 1.62]
ggeffects::ggeffect(m, terms="spp")
spp | Predicted | 95% CI
--------------------------------
GP | 1.14 | [0.69, 1.90]
PR | 0.44 | [0.12, 1.63]
DM | 1.36 | [0.85, 2.18]
EC-A | 0.78 | [0.39, 1.57]
EC-L | 1.87 | [1.13, 3.07]
DES-L | 2.06 | [1.30, 3.28]
DF | 1.02 | [0.63, 1.65]
Questions
Why are ggeffect and ggemmeans giving different results for the marginal effects? Is it simply something internal in how the emmeans and effects packages compute them? Also, does anyone know of resources on how to compute marginal effects from scratch for a model like the one in the example?
You fit a complex model: a zero-inflated negative binomial model with random effects.
What you observe, however, has little to do with the model specification. Let's show this by fitting a simpler model: a Poisson regression with fixed effects only.
library("glmmTMB")
library("ggeffects")
m <- glmmTMB(
  count ~ spp + mined,
  family = poisson,
  data = Salamanders
)
ggemmeans(m, terms = "spp")
#> # Predicted counts of count
#>
#> spp | Predicted | 95% CI
#> --------------------------------
#> GP | 0.73 | [0.59, 0.89]
#> PR | 0.18 | [0.12, 0.27]
#> DM | 0.91 | [0.76, 1.10]
#> EC-A | 0.34 | [0.25, 0.45]
#> EC-L | 1.35 | [1.15, 1.59]
#> DES-L | 1.43 | [1.22, 1.68]
#> DF | 0.79 | [0.64, 0.96]
ggeffect(m, terms = "spp")
#> # Predicted counts of count
#>
#> spp | Predicted | 95% CI
#> --------------------------------
#> GP | 0.76 | [0.62, 0.93]
#> PR | 0.19 | [0.13, 0.28]
#> DM | 0.96 | [0.79, 1.15]
#> EC-A | 0.35 | [0.26, 0.47]
#> EC-L | 1.41 | [1.20, 1.66]
#> DES-L | 1.50 | [1.28, 1.75]
#> DF | 0.82 | [0.67, 1.00]
The documentation explains that internally ggemmeans() calls emmeans::emmeans() while ggeffect() calls effects::Effect().
Both emmeans and effects compute marginal effects, but they make different (default) choices about how to marginalize out (i.e., average over) mined in order to get the effect of spp.
mined is a categorical variable with two levels, "yes" and "no". The crucial bit is that the two levels are not balanced: there are slightly more "no"s than "yes"s.
xtabs(~ mined + spp, data = Salamanders)
#> spp
#> mined GP PR DM EC-A EC-L DES-L DF
#> yes 44 44 44 44 44 44 44
#> no 48 48 48 48 48 48 48
Intuitively, this means that the weighted average over mined [think of (44 × yes + 48 × no) / 92] is not the same as the simple average over mined [think of (yes + no) / 2].
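A two-line toy calculation makes the point concrete (the cell means 2 and 1 are made-up numbers):
yes <- 2; no <- 1
(44 * yes + 48 * no) / 92 # weighted average: ~1.48
(yes + no) / 2            # simple average: 1.5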
Let's check the intuition by specifying how to marginalize out mined when we call emmeans::emmeans() directly.
# equal weights (the default): simple mean over mined
emmeans::emmeans(m, "spp", type = "response", weights = "equal")
#> spp rate SE df lower.CL upper.CL
#> GP 0.726 0.0767 636 0.590 0.893
#> PR 0.181 0.0358 636 0.123 0.267
#> DM 0.914 0.0879 636 0.757 1.104
#> EC-A 0.336 0.0497 636 0.251 0.449
#> EC-L 1.351 0.1120 636 1.148 1.590
#> DES-L 1.432 0.1163 636 1.221 1.679
#> DF 0.786 0.0804 636 0.643 0.961
#>
#> Results are averaged over the levels of: mined
#> Confidence level used: 0.95
#> Intervals are back-transformed from the log scale
# proportional weights: frequency-weighted mean over mined
emmeans::emmeans(m, "spp", type = "response", weights = "proportional")
#> spp rate SE df lower.CL upper.CL
#> GP 0.759 0.0794 636 0.618 0.932
#> PR 0.190 0.0373 636 0.129 0.279
#> DM 0.955 0.0909 636 0.793 1.152
#> EC-A 0.351 0.0517 636 0.263 0.469
#> EC-L 1.412 0.1153 636 1.203 1.658
#> DES-L 1.496 0.1196 636 1.279 1.751
#> DF 0.822 0.0832 636 0.674 1.003
#>
#> Results are averaged over the levels of: mined
#> Confidence level used: 0.95
#> Intervals are back-transformed from the log scale
The first option reproduces the output of ggemmeans(); the second option returns the marginal effects computed by ggeffects::ggeffect().
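As for computing these marginal effects from scratch: both tools average predictions over mined on the link (log) scale and then back-transform, so a hand-rolled version does the same. A minimal sketch for the simple Poisson model m above (fixed effects only, so predict() needs no random-effect handling):
# predictions for every spp x mined combination, on the link (log) scale
nd <- expand.grid(spp = levels(Salamanders$spp), mined = levels(Salamanders$mined))
nd$eta <- predict(m, newdata = nd, type = "link")
# equal weights over mined -> reproduces the ggemmeans()/emmeans() default
exp(tapply(nd$eta, nd$spp, mean))
# proportional weights (observed frequencies of mined) -> reproduces ggeffect()
w <- prop.table(table(Salamanders$mined))
exp(sapply(split(nd, nd$spp), function(d) sum(d$eta * w[as.character(d$mined)])))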
Update
@Daniel points out that ggeffects accepts a weights argument and passes it on to emmeans. This way you can keep using ggeffects and still control how predictions are averaged to compute marginal effects.
Try it out for yourself with:
ggemmeans(m, terms="spp", weights = "proportional")
ggemmeans(m, terms="spp", weights = "equal")
Related
I'm trying to convert a list of data frames from a ggeffects object into one data frame, so I can use it more easily in ggplot2. This is a simple example of what I'm trying:
library(ggeffects)
library(dplyr)
data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)
full <- ggpredict(fit)
df <- bind_rows(full, .id="id")
But this gives me the following error:
Error: Can't recycle c12hour (size 35) to match neg_c_7 (size 12).
I'm new to R and Stack Overflow, so I hope this is all clear. Thank you!
I don't fully understand your goal, but one way of binding the data frames in a list is to use do.call(bind_rows, thelist):
do.call(bind_rows, full)
# Predicted values of Total score BARTHEL INDEX
# c12hour
c12hour | Predicted | 95% CI
------------------------------------
0 | 75.44 | [73.26, 77.63]
35 | 66.58 | [64.91, 68.25]
70 | 57.71 | [55.81, 59.61]
100 | 50.11 | [47.55, 52.68]
170 | 32.38 | [27.73, 37.03]
# c161sex
c12hour | Predicted | 95% CI
------------------------------------
1 | 63.96 | [60.57, 67.35]
2 | 65.00 | [63.11, 66.90]
# c172code
c12hour | Predicted | 95% CI
------------------------------------
1 | 64.06 | [61.01, 67.10]
2 | 64.78 | [63.12, 66.43]
3 | 65.49 | [62.32, 68.67]
# neg_c_7
c12hour | Predicted | 95% CI
------------------------------------
6 | 78.17 | [75.11, 81.22]
10 | 68.98 | [67.14, 70.81]
14 | 59.79 | [57.88, 61.69]
20 | 46.00 | [42.04, 49.97]
28 | 27.63 | [20.31, 34.95]
Adjusted for:
* neg_c_7 = 11.84
* c161sex = 1.76
* c172code = 1.97
However, this printed form does not show all of the rows and columns. To see everything, convert the result with as.data.frame() or as_tibble().
do.call(bind_rows, full) |> as_tibble()
# A tibble: 52 × 6
x predicted std.error conf.low conf.high group
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 0 75.4 1.12 73.3 77.6 c12hour
2 5 74.2 1.06 72.1 76.3 c12hour
3 10 72.9 1.01 70.9 74.9 c12hour
4 15 71.6 0.965 69.8 73.5 c12hour
5 20 70.4 0.925 68.6 72.2 c12hour
6 25 69.1 0.893 67.4 70.9 c12hour
7 30 67.8 0.868 66.1 69.5 c12hour
8 35 66.6 0.851 64.9 68.2 c12hour
9 40 65.3 0.842 63.7 67.0 c12hour
10 45 64.0 0.843 62.4 65.7 c12hour
# … with 42 more rows
The result can then be passed to ggplot() to create a plot. For example:
do.call(bind_rows, full) |>
  ggplot(aes(x = x, y = predicted, col = group)) +
  geom_point()
The resulting plot:
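As an aside, a base-R equivalent of the binding step (a sketch): since all list elements share the same columns, you can coerce each to a plain data frame and rbind them.
do.call(rbind, lapply(full, as.data.frame))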
I am trying to reproduce a result from R in Stata (please note that the data below are fictitious and serve just as an example). For some reason, however, Stata appears to deal with certain issues differently than R: it chooses different dummy variables to drop in case of multicollinearity.
I have posted a related question dealing with the statistical implications of these country-year dummies being removed here.
In the example below, R drops 2 dummies while Stata drops 3, leading to different results. Check, for example, the coefficients and p-values for vote and votewon.
In essence, all I want to know is how to tell either R or Stata which variables to drop, so that both do the same.
Data
The data looks as follows:
library(data.table)
library(dplyr)
library(foreign)
library(censReg)
library(wooldridge)
data('mroz')
year= c(2005, 2010)
country = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
n <- 2
DT <- data.table( country = rep(sample(country, length(mroz), replace = T), each = n),
year = c(replicate(length(mroz), sample(year, n))))
x <- DT
DT <- rbind(DT, DT); DT <- rbind(DT, DT); DT <- rbind(DT, DT) ; DT <- rbind(DT, DT); DT <- rbind(DT, x)
mroz <- mroz[-c(749:753),]
DT <- cbind(mroz, DT)
DT <- DT %>%
group_by(country) %>%
mutate(base_rate = as.integer(runif(1, 12.5, 37.5))) %>%
group_by(country, year) %>%
mutate(taxrate = base_rate + as.integer(runif(1,-2.5,+2.5)))
DT <- DT %>%
group_by(country, year) %>%
mutate(vote = sample(c(0,1),1),
votewon = ifelse(vote==1, sample(c(0,1),1),0))
rm(mroz,x, country, year)
The lm regression in R
summary(lm(educ ~ exper + I(exper^2) + vote + votewon + country:as.factor(year), data=DT))
Call:
lm(formula = educ ~ exper + I(exper^2) + vote + votewon + country:as.factor(year),
data = DT)
Residuals:
Min 1Q Median 3Q Max
-7.450 -0.805 -0.268 0.954 5.332
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.170064 0.418578 26.69 < 0.0000000000000002 ***
exper 0.103880 0.029912 3.47 0.00055 ***
I(exper^2) -0.002965 0.000966 -3.07 0.00222 **
vote 0.576865 0.504540 1.14 0.25327
votewon 0.622522 0.636241 0.98 0.32818
countryA:as.factor(year)2005 -0.196348 0.503245 -0.39 0.69653
countryB:as.factor(year)2005 -0.530681 0.616653 -0.86 0.38975
countryC:as.factor(year)2005 0.650166 0.552019 1.18 0.23926
countryD:as.factor(year)2005 -0.515195 0.638060 -0.81 0.41968
countryE:as.factor(year)2005 0.731681 0.502807 1.46 0.14605
countryG:as.factor(year)2005 0.213345 0.674642 0.32 0.75192
countryH:as.factor(year)2005 -0.811374 0.637254 -1.27 0.20334
countryI:as.factor(year)2005 0.584787 0.503606 1.16 0.24594
countryJ:as.factor(year)2005 0.554397 0.674789 0.82 0.41158
countryA:as.factor(year)2010 0.388603 0.503358 0.77 0.44035
countryB:as.factor(year)2010 -0.727834 0.617210 -1.18 0.23869
countryC:as.factor(year)2010 -0.308601 0.504041 -0.61 0.54056
countryD:as.factor(year)2010 0.785603 0.503165 1.56 0.11888
countryE:as.factor(year)2010 0.280305 0.452293 0.62 0.53562
countryG:as.factor(year)2010 0.672074 0.674721 1.00 0.31954
countryH:as.factor(year)2010 NA NA NA NA
countryI:as.factor(year)2010 NA NA NA NA
countryJ:as.factor(year)2010 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.3 on 728 degrees of freedom
Multiple R-squared: 0.037, Adjusted R-squared: 0.0119
F-statistic: 1.47 on 19 and 728 DF, p-value: 0.0882
Same regression in Stata
write.dta(DT, "C:/Users/.../mroz_adapted.dta")
encode country, gen(n_country)
reg educ c.exper c.exper#c.exper vote votewon n_country#i.year
note: 9.n_country#2010.year omitted because of collinearity
note: 10.n_country#2010.year omitted because of collinearity
Source | SS df MS Number of obs = 748
-------------+---------------------------------- F(21, 726) = 1.80
Model | 192.989406 21 9.18997171 Prob > F = 0.0154
Residual | 3705.47583 726 5.1039612 R-squared = 0.0495
-------------+---------------------------------- Adj R-squared = 0.0220
Total | 3898.46524 747 5.21882897 Root MSE = 2.2592
---------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
exper | .1109858 .0297829 3.73 0.000 .052515 .1694567
|
c.exper#c.exper | -.0031891 .000963 -3.31 0.001 -.0050796 -.0012986
|
vote | .0697273 .4477115 0.16 0.876 -.8092365 .9486911
votewon | -.0147825 .6329659 -0.02 0.981 -1.257445 1.227879
|
n_country#year |
A#2010 | .0858634 .4475956 0.19 0.848 -.7928728 .9645997
B#2005 | -.4950677 .5003744 -0.99 0.323 -1.477421 .4872858
B#2010 | .0951657 .5010335 0.19 0.849 -.8884818 1.078813
C#2005 | -.5162827 .447755 -1.15 0.249 -1.395332 .3627664
C#2010 | -.0151834 .4478624 -0.03 0.973 -.8944434 .8640767
D#2005 | .3664596 .5008503 0.73 0.465 -.6168283 1.349747
D#2010 | .5119858 .500727 1.02 0.307 -.4710599 1.495031
E#2005 | .5837942 .6717616 0.87 0.385 -.7350329 1.902621
E#2010 | .185601 .5010855 0.37 0.711 -.7981486 1.169351
F#2005 | .5987978 .6333009 0.95 0.345 -.6445219 1.842117
F#2010 | .4853639 .7763936 0.63 0.532 -1.038881 2.009608
G#2005 | -.3341302 .6328998 -0.53 0.598 -1.576663 .9084021
G#2010 | .2873193 .6334566 0.45 0.650 -.956306 1.530945
H#2005 | -.4365233 .4195984 -1.04 0.299 -1.260294 .3872479
H#2010 | -.1683725 .6134262 -0.27 0.784 -1.372673 1.035928
I#2005 | -.39264 .7755549 -0.51 0.613 -1.915238 1.129958
I#2010 | 0 (omitted)
J#2005 | 1.036108 .4476018 2.31 0.021 .1573591 1.914856
J#2010 | 0 (omitted)
|
_cons | 11.58369 .350721 33.03 0.000 10.89514 12.27224
---------------------------------------------------------------------------------
Just to address your question about which "variables to kick out": I guess you mean which combination of interaction terms serves as the reference group for calculating the regression coefficients.
By default, Stata uses the combination of the lowest values of the two variables as the reference, while R uses the highest values. I use Stata's auto data to demonstrate this:
# In R
webuse::webuse("auto")
auto$foreign = as.factor(auto$foreign)
auto$rep78 = as.factor(auto$rep78)
# Model
r_model <- lm(mpg ~ rep78:foreign, data=auto)
broom::tidy(r_model)
# A tibble: 11 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 26.3 1.65 15.9 2.09e-23
2 rep781:foreign0 -5.33 3.88 -1.38 1.74e- 1
3 rep782:foreign0 -7.21 2.41 -2.99 4.01e- 3
4 rep783:foreign0 -7.33 1.91 -3.84 2.94e- 4
5 rep784:foreign0 -7.89 2.34 -3.37 1.29e- 3
6 rep785:foreign0 5.67 3.88 1.46 1.49e- 1
7 rep781:foreign1 NA NA NA NA
8 rep782:foreign1 NA NA NA NA
9 rep783:foreign1 -3.00 3.31 -0.907 3.68e- 1
10 rep784:foreign1 -1.44 2.34 -0.618 5.39e- 1
11 rep785:foreign1 NA NA NA NA
In Stata:
. reg mpg i.foreign#i.rep78
note: 1.foreign#1b.rep78 identifies no observations in the sample
note: 1.foreign#2.rep78 identifies no observations in the sample
Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(7, 61) = 4.88
Model | 839.550121 7 119.935732 Prob > F = 0.0002
Residual | 1500.65278 61 24.6008652 R-squared = 0.3588
-------------+---------------------------------- Adj R-squared = 0.2852
Total | 2340.2029 68 34.4147485 Root MSE = 4.9599
-------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
foreign#rep78 |
Domestic#2 | -1.875 3.921166 -0.48 0.634 -9.715855 5.965855
Domestic#3 | -2 3.634773 -0.55 0.584 -9.268178 5.268178
Domestic#4 | -2.555556 3.877352 -0.66 0.512 -10.3088 5.19769
Domestic#5 | 11 4.959926 2.22 0.030 1.082015 20.91798
Foreign#1 | 0 (empty)
Foreign#2 | 0 (empty)
Foreign#3 | 2.333333 4.527772 0.52 0.608 -6.720507 11.38717
Foreign#4 | 3.888889 3.877352 1.00 0.320 -3.864357 11.64213
Foreign#5 | 5.333333 3.877352 1.38 0.174 -2.419912 13.08658
|
_cons | 21 3.507197 5.99 0.000 13.98693 28.01307
-------------------------------------------------------------------------------
To reproduce the previous R results in Stata, we can recode the two variables foreign and rep78 (into foreign2 and rep2 below; the recode itself is not shown, but it reverses the original category order so that Stata's base cell matches R's reference):
. reg mpg i.foreign2#i.rep2
note: 0b.foreign2#1.rep2 identifies no observations in the sample
note: 0b.foreign2#2.rep2 identifies no observations in the sample
Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(7, 61) = 4.88
Model | 839.550121 7 119.935732 Prob > F = 0.0002
Residual | 1500.65278 61 24.6008652 R-squared = 0.3588
-------------+---------------------------------- Adj R-squared = 0.2852
Total | 2340.2029 68 34.4147485 Root MSE = 4.9599
-------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
foreign2#rep2 |
0 1 | 0 (empty)
0 2 | 0 (empty)
0 3 | -3 3.306617 -0.91 0.368 -9.61199 3.61199
0 4 | -1.444444 2.338132 -0.62 0.539 -6.119827 3.230938
1 0 | 5.666667 3.877352 1.46 0.149 -2.086579 13.41991
1 1 | -5.333333 3.877352 -1.38 0.174 -13.08658 2.419912
1 2 | -7.208333 2.410091 -2.99 0.004 -12.02761 -2.389059
1 3 | -7.333333 1.909076 -3.84 0.000 -11.15077 -3.515899
1 4 | -7.888889 2.338132 -3.37 0.001 -12.56427 -3.213506
|
_cons | 26.33333 1.653309 15.93 0.000 23.02734 29.63933
-------------------------------------------------------------------------------
The same approach applies for reproducing Stata's results in R: just redefine the levels of the two factor variables.
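In R, one way to mimic Stata's choice is to reverse the factor level order before fitting, which shifts which cells end up aliased. A sketch using the auto example from above (foreign_r and rep78_r are hypothetical recoded variables):
# reverse the level order so R's aliased (NA) cells line up with Stata's base cells
auto$foreign_r <- factor(auto$foreign, levels = rev(levels(auto$foreign)))
auto$rep78_r <- factor(auto$rep78, levels = rev(levels(auto$rep78)))
broom::tidy(lm(mpg ~ rep78_r:foreign_r, data = auto))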
I'm using the lme4 package [the lmer() function] to estimate several average models whose estimated coefficients I want to plot. I found the document "Plotting Estimates (Fixed Effects) of Regression Models" by Daniel Lüdecke, which explains how to plot estimates. It works with average models, but it uses the conditional-average values instead of the full-average values.
Example script:
library(lme4)
options(na.action = "na.omit")
PA_model_clima1_Om_ST <- lmer(O.matt ~ mes_N + Temperatura_Ar_PM_ST + RH_PM_ST + Vento_V_PM_ST + Evapotranspiracao_PM_ST + Preci_total_PM_ST + (1|ID), data=Abund)
library(MuMIn)
options(na.action = "na.fail")
PA_clima1_Om_ST <- dredge(PA_model_clima1_Om_ST)
sort.PA_clima1_Om_ST <- PA_clima1_Om_ST[order(PA_clima1_Om_ST$AICc), ]
top.models_PA_clima1_Om_ST <- get.models(sort.PA_clima1_Om_ST, subset = delta < 2)
model.sel(top.models_PA_clima1_Om_ST)
Avg_PA_clima1_Om_ST <- model.avg(top.models_PA_clima1_Om_ST, fit = TRUE)
summary(Avg_PA_clima1_Om_ST)
Results of this script:
Term codes:
Evapotranspiracao_PM_ST Preci_total_PM_ST RH_PM_ST Temperatura_Ar_PM_ST
1 2 3 4
Vento_V_PM_ST
5
Model-averaged coefficients:
(full average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 5.4199 1.4094 1.4124 3.837 0.000124 ***
Preci_total_PM_ST -0.8679 1.0300 1.0313 0.842 0.400045
RH_PM_ST 0.6116 0.8184 0.8193 0.746 0.455397
Temperatura_Ar_PM_ST -1.9635 0.7710 0.7725 2.542 0.011026 *
Vento_V_PM_ST -0.6214 0.7043 0.7052 0.881 0.378289
Evapotranspiracao_PM_ST -0.1202 0.5174 0.5183 0.232 0.816654
(conditional average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 5.4199 1.4094 1.4124 3.837 0.000124 ***
Preci_total_PM_ST -1.2200 1.0304 1.0322 1.182 0.237249
RH_PM_ST 1.0067 0.8396 0.8410 1.197 0.231317
Temperatura_Ar_PM_ST -1.9635 0.7710 0.7725 2.542 0.011026 *
Vento_V_PM_ST -0.8607 0.6936 0.6949 1.238 0.215546
Evapotranspiracao_PM_ST -0.3053 0.7897 0.7912 0.386 0.699619
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Plot script:
library(sjPlot)
library(sjlabelled)
library(sjmisc)
library(ggplot2)
data(efc)
theme_set(theme_sjplot())
plot_model(Avg_PA_clima1_Om_ST, type="est", vline.color="black", sort.est = TRUE, show.values = TRUE, value.offset = .3, title= "O. mattogrossae")
Plot:
As you can see, the plot uses the conditional-average values instead of the full-average values.
How can I plot the estimates of average models using the full-average values?
I think it plots the conditional average, so unless you hack the function (or contact the author to request such an option), one way is to plot the coefficients yourself:
library(lme4)
library(MuMIn)
options(na.action = "na.fail")
set.seed(888)
dat <- data.frame(y = rnorm(100),
                  var1 = rnorm(100), var2 = rnorm(100),
                  var3 = rnorm(100), rvar = sample(1:2, replace = TRUE, 100))
lme_mod <- lmer(y ~ var1 + var2 + var3 + (1 | rvar), dat)
dre_mod <- dredge(lme_mod)
avg_mod <- model.avg(dre_mod, fit = TRUE)
summary(avg_mod)
Model-averaged coefficients:
(full average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) -0.02988 0.18699 0.18936 0.158 0.875
var2 -0.03791 0.08817 0.08858 0.428 0.669
var1 -0.02999 0.07740 0.07778 0.386 0.700
var3 0.01521 0.05371 0.05404 0.281 0.778
(conditional average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) -0.02988 0.18699 0.18936 0.158 0.875
var2 -0.16862 0.11197 0.11339 1.487 0.137
var1 -0.15293 0.10841 0.10978 1.393 0.164
var3 0.11227 0.10200 0.10327 1.087 0.277
The full-average coefficient matrix is stored in:
summary(avg_mod)$coefmat.full
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) -0.02988418 0.18698720 0.18935677 0.1578194 0.8745991
var2 -0.03791016 0.08816936 0.08857788 0.4279867 0.6686608
var1 -0.02998709 0.07740247 0.07778028 0.3855360 0.6998404
var3 0.01520633 0.05371407 0.05404100 0.2813850 0.7784151
We extract it, reshape it a little, and plot:
library(ggplot2)
df <- data.frame(summary(avg_mod)$coefmat.full)
df$variable <- rownames(df)
colnames(df)[2] <- "std_error"
df <- df[df$variable != "(Intercept)", ]
df$type <- ifelse(df$Estimate > 0, "pos", "neg")
ggplot(df, aes(x = variable, y = Estimate)) +
  geom_point(aes(col = type), size = 3) +
  geom_errorbar(aes(col = type,
                    ymin = Estimate - 1.96 * std_error,
                    ymax = Estimate + 1.96 * std_error),
                width = 0, size = 1) +
  geom_text(aes(label = round(Estimate, digits = 2)), nudge_x = 0.1) +
  geom_hline(yintercept = 0, col = "black") +
  theme_bw() + coord_flip() +
  scale_color_manual(values = c("#c70039", "#111d5e")) +
  theme(legend.position = "none")
You can also use parameters::model_parameters(), which is used internally by sjPlot::plot_model(). model_parameters() has a component argument to decide which component to return. However, plot_model() does not yet pass additional arguments down to model_parameters(); I'm going to address this in sjPlot. Meanwhile, using model_parameters() directly at least offers a quick plot() method.
library(lme4)
library(MuMIn)
options(na.action = "na.fail")
set.seed(888)
dat <- data.frame(y = rnorm(100),
                  var1 = rnorm(100), var2 = rnorm(100),
                  var3 = rnorm(100), rvar = sample(1:2, replace = TRUE, 100))
lme_mod <- lmer(y ~ var1 + var2 + var3 + (1 | rvar), dat)
dre_mod <- dredge(lme_mod)
avg_mod <- model.avg(dre_mod, fit = TRUE)
library(parameters)
model_parameters(avg_mod)
#> Parameter | Coefficient | SE | 95% CI | z | df | p
#> --------------------------------------------------------------------
#> (Intercept) | -0.03 | 0.19 | [-0.40, 0.34] | 0.16 | 96 | 0.875
#> var2 | -0.17 | 0.11 | [-0.39, 0.05] | 1.49 | 96 | 0.137
#> var1 | -0.15 | 0.11 | [-0.37, 0.06] | 1.39 | 96 | 0.164
#> var3 | 0.11 | 0.10 | [-0.09, 0.31] | 1.09 | 96 | 0.277
model_parameters(avg_mod, component = "full")
#> Parameter | Coefficient | SE | 95% CI | z | df | p
#> --------------------------------------------------------------------
#> (Intercept) | -0.03 | 0.19 | [-0.40, 0.34] | 0.16 | 96 | 0.875
#> var2 | -0.04 | 0.09 | [-0.21, 0.14] | 0.43 | 96 | 0.669
#> var1 | -0.03 | 0.08 | [-0.18, 0.12] | 0.39 | 96 | 0.700
#> var3 | 0.02 | 0.05 | [-0.09, 0.12] | 0.28 | 96 | 0.778
plot(model_parameters(avg_mod, component = "full"))
You can make some minor modifications to the plot:
library(ggplot2)
plot(model_parameters(avg_mod, component = "full")) +
geom_text(aes(label = round(Coefficient, 2)), nudge_x = .2)
Created on 2020-06-27 by the reprex package (v0.3.0)
I normally work with the lme4 package, but the glmmTMB package is increasingly becoming better suited to highly complicated data (think overdispersion and/or zero-inflation).
Is there a way to extract posterior modes and credible intervals from glmmTMB models, similar to how it is done for lme4 models (example here)?
Details:
I am working with count data (available here) that are zero-inflated and overdispersed, and the model has random effects. The package best suited to this sort of data is glmmTMB (details here). (Note two outliers: euc0 == 78 and np_other_grass == 20.)
The data looks like this:
euc0 ea_grass ep_grass np_grass np_other_grass month year precip season prop_id quad
3 5.7 0.0 16.7 4.0 7 2006 526 Winter Barlow 1
0 6.7 0.0 28.3 0.0 7 2006 525 Winter Barlow 2
0 2.3 0.0 3.3 0.0 7 2006 524 Winter Barlow 3
0 1.7 0.0 13.3 0.0 7 2006 845 Winter Blaber 4
0 5.7 0.0 45.0 0.0 7 2006 817 Winter Blaber 5
0 11.7 1.7 46.7 0.0 7 2006 607 Winter DClark 3
The glmmTMB model:
model <- glmmTMB(euc0 ~ ea_grass + ep_grass + np_grass + np_other_grass + (1 | prop_id), data = euc, family = nbinom2) # nbinom2: the variance increases quadratically with the mean
summary(model)
confint(model) #this gives the confidence intervals
How I would normally extract the posterior mode and credible intervals for an lmer/glmer model:
# extracting model estimates and credible intervals
sm.model <- arm::sim(model, n.sim = 1000)
smfixef.model <- sm.model@fixef
smfixef.model <- coda::as.mcmc(smfixef.model)
MCMCglmm::posterior.mode(smfixef.model) # mode of the distribution
coda::HPDinterval(smfixef.model) # credible intervals
# among-brood variance
bid <- sm.model@ranef$prop_id[,,1]
bvar <- as.vector(apply(bid, 1, var)) # between-brood variance posterior distribution
bvar <- coda::as.mcmc(bvar)
MCMCglmm::posterior.mode(bvar) # mode of the distribution
coda::HPDinterval(bvar) # credible intervals
Most of an answer:
Getting a multivariate normal sample of the parameters of the conditional model is pretty easy (I think this is what arm::sim() is doing):
library(MASS)
pp <- fixef(model)$cond
vv <- vcov(model)$cond
samp <- MASS::mvrnorm(1000, mu=pp, Sigma=vv)
(then use the rest of your method above).
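For instance, a minimal sketch that reuses the coda/MCMCglmm steps from the question:
samp_mcmc <- coda::as.mcmc(samp) # treat the draws as an MCMC sample
MCMCglmm::posterior.mode(samp_mcmc) # posterior modes of the fixed effects
coda::HPDinterval(samp_mcmc) # credible intervals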
I'm a little skeptical that your second example is doing what you want it to do. The variance of the conditional modes is not necessarily a good estimate of the between-group variance (e.g. see here). Furthermore, I'm nervous about the half-assed Bayesian approach (e.g., why no priors? why look at the posterior mode, which is rarely a meaningful value in a Bayesian context?), although I do sometimes use similar approaches myself! However, it's not too hard to use glmmTMB results to do a proper Markov chain Monte Carlo analysis:
library(tmbstan)
library(rstan)
library(coda)
library(emdbook) ## for lump.mcmc.list(), or use runjags::combine.mcmc()
t2 <- system.time(m2 <- tmbstan(model$obj))
m3 <- rstan::As.mcmc.list(m2)
lattice::xyplot(m3,layout=c(5,6))
m4 <- emdbook::lump.mcmc.list(m3)
coda::HPDinterval(m4)
It may be helpful to know that the theta column of m4 is the log of the among-group standard deviation ...
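Exponentiating its HPD interval therefore gives a credible interval for the among-group standard deviation itself. A sketch (the exact parameter name, e.g. "theta" vs. "theta[1]", depends on the model):
hpd <- coda::HPDinterval(m4)
exp(hpd[grep("theta", rownames(hpd)), , drop = FALSE])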
(See vignette("mcmc", package="glmmTMB") for a little bit more information ...)
I think Ben has already answered your question, so my answer does not add much to the discussion. Maybe just one thing: as you wrote in your comments, you're interested in the within- and between-group variances. You can get this information via parameters::random_parameters() (if I did not misunderstand what you were looking for). The example below first generates simulated samples from a multivariate normal (just like in Ben's example) and then gives a summary of the random-effect variances.
library(readr)
library(glmmTMB)
library(parameters)
library(bayestestR)
library(insight)
euc_data <- read_csv("D:/Downloads/euc_data.csv")
model <-
glmmTMB(
euc0 ~ ea_grass + ep_grass + np_grass + np_other_grass + (1 | prop_id),
data = euc_data,
family = nbinom2
) # nbinom2: the variance increases quadratically with the mean
# generate samples
samples <- parameters::simulate_model(model)
#> Model has no zero-inflation component. Simulating from conditional parameters.
# describe samples
bayestestR::describe_posterior(samples)
#> # Description of Posterior Distributions
#>
#> Parameter | Median | 89% CI | pd | 89% ROPE | % in ROPE
#> --------------------------------------------------------------------------------
#> (Intercept) | -1.072 | [-2.183, -0.057] | 0.944 | [-0.100, 0.100] | 1.122
#> ea_grass | -0.001 | [-0.033, 0.029] | 0.525 | [-0.100, 0.100] | 100.000
#> ep_grass | -0.050 | [-0.130, 0.038] | 0.839 | [-0.100, 0.100] | 85.297
#> np_grass | -0.020 | [-0.054, 0.012] | 0.836 | [-0.100, 0.100] | 100.000
#> np_other_grass | -0.002 | [-0.362, 0.320] | 0.501 | [-0.100, 0.100] | 38.945
# or directly get summary of sample description
sp <- parameters::simulate_parameters(model, ci = .95, ci_method = "hdi", test = c("pd", "p_map"))
sp
#> Model has no zero-inflation component. Simulating from conditional parameters.
#> # Description of Posterior Distributions
#>
#> Parameter | Coefficient | p_MAP | pd | CI
#> --------------------------------------------------------------
#> (Intercept) | -1.037 | 0.281 | 0.933 | [-2.305, 0.282]
#> ea_grass | -0.001 | 0.973 | 0.511 | [-0.042, 0.037]
#> ep_grass | -0.054 | 0.553 | 0.842 | [-0.160, 0.047]
#> np_grass | -0.019 | 0.621 | 0.802 | [-0.057, 0.023]
#> np_other_grass | 0.019 | 0.999 | 0.540 | [-0.386, 0.450]
plot(sp) + see::theme_modern()
#> Model has no zero-inflation component. Simulating from conditional parameters.
# random effect variances
parameters::random_parameters(model)
#> # Random Effects
#>
#> Within-Group Variance 2.92 (1.71)
#> Between-Group Variance
#> Random Intercept (prop_id) 2.1 (1.45)
#> N (groups per factor)
#> prop_id 18
#> Observations 346
insight::get_variance(model)
#> Warning: mu of 0.2 is too close to zero, estimate of random effect variances may be unreliable.
#> $var.fixed
#> [1] 0.3056285
#>
#> $var.random
#> [1] 2.104233
#>
#> $var.residual
#> [1] 2.91602
#>
#> $var.distribution
#> [1] 2.91602
#>
#> $var.dispersion
#> [1] 0
#>
#> $var.intercept
#> prop_id
#> 2.104233
Created on 2020-05-26 by the reprex package (v0.3.0)
I noticed that plot_models() from the sjPlot package gives confidence intervals based on the naive standard errors. I want it to use the robust SEs. Is there a simple fix?
Currently, sjPlot does not support this option; however, it is planned for a forthcoming update. sjPlot uses the parameters package to compute model parameters, so if you don't mind updating the parameters package from GitHub (and installing the see package), you can already use this feature.
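If it helps, the update might look like this (a sketch; it assumes the remotes helper package and the easystats/parameters GitHub repository):
install.packages(c("remotes", "see")) # remotes for GitHub installs, plus see for plotting
remotes::install_github("easystats/parameters") # development version of parameters
With the updated packages in place: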
library(parameters)
library(gee)
data(warpbreaks)
model <- gee(breaks ~ tension, id = wool, data = warpbreaks)
#> Beginning Cgee S-function, #(#) geeformula.q 4.13 98/01/27
#> running glm to get initial regression estimate
#> (Intercept) tensionM tensionH
#> 36.38889 -10.00000 -14.72222
mp <- model_parameters(model)
mp
#> Parameter | Coefficient | SE | 95% CI | z | df | p
#> ------------------------------------------------------------------------
#> (Intercept) | 36.39 | 2.80 | [ 30.90, 41.88] | 12.99 | 51 | < .001
#> tension [M] | -10.00 | 3.96 | [-17.76, -2.24] | -2.53 | 51 | 0.015
#> tension [H] | -14.72 | 3.96 | [-22.48, -6.96] | -3.72 | 51 | < .001
plot(mp)
mp <- model_parameters(model, robust = TRUE)
mp
#> Parameter | Coefficient | SE | 95% CI | z | df | p
#> ------------------------------------------------------------------------
#> (Intercept) | 36.39 | 5.77 | [ 25.07, 47.71] | 6.30 | 51 | < .001
#> tension [M] | -10.00 | 7.46 | [-24.63, 4.63] | -3.94 | 51 | 0.186
#> tension [H] | -14.72 | 3.73 | [-22.04, -7.41] | -1.34 | 51 | < .001
plot(mp)
Created on 2019-12-23 by the reprex package (v0.3.0)