Using emmeans with brms - r

I regularly use emmeans to calculate custom contrasts across a wide range of statistical models. One of its strengths is its versatility: it is compatible with a huge range of packages. I have recently discovered that emmeans is compatible with the brms package, but I am having trouble getting it to work. I will conduct an example multinomial logistic regression analysis using a dataset provided here. I will also conduct the same analysis in another package (nnet) to demonstrate what I need.
library(brms)
library(nnet)
library(emmeans)
library(foreign)  # provides read.dta() for Stata files
# read in data
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")
The data set contains variables on 200 students. The outcome variable is prog, program type, a three-level categorical variable (general, academic, vocation). The predictor variable is socioeconomic status, ses, a three-level categorical variable. Now to conduct the analysis via the nnet package:
# first relevel so 'academic' is the reference level
ml$prog2 <- relevel(ml$prog, ref = "academic")
# run test in nnet
test_nnet <- multinom(prog2 ~ ses,
data = ml)
Now run the same test in brms
# run test in brms (note: will take 30 - 60 seconds)
test_brm <- brm(prog2 ~ ses,
data = ml,
family = "categorical")
I will not print the output of the two models, but the coefficients are roughly equivalent in both.
Now to create an emmeans object that will allow us to conduct pairwise tests:
# pass into emmeans
rg_nnet <- ref_grid(test_nnet)
em_nnet <- emmeans(rg_nnet,
specs = ~prog2|ses)
# regrid to get coefficients as logit
em_nnet_logit <- regrid(em_nnet,
transform = "logit")
em_nnet_logit
# output
# ses = low:
# prog2 prob SE df lower.CL upper.CL
# academic -0.388 0.297 6 -1.115 0.3395
# general -0.661 0.308 6 -1.415 0.0918
# vocation -1.070 0.335 6 -1.889 -0.2519
#
# ses = middle:
# prog2 prob SE df lower.CL upper.CL
# academic -0.148 0.206 6 -0.651 0.3558
# general -1.322 0.252 6 -1.938 -0.7060
# vocation -0.725 0.219 6 -1.260 -0.1895
#
# ses = high:
# prog2 prob SE df lower.CL upper.CL
# academic 0.965 0.294 6 0.246 1.6839
# general -1.695 0.363 6 -2.582 -0.8072
# vocation -1.986 0.403 6 -2.972 -0.9997
#
# Results are given on the logit (not the response) scale.
# Confidence level used: 0.95
So now we have our lovely emmeans() object that we can use to perform a vast array of different comparisons.
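For example, one kind of comparison I routinely run is program type within each level of ses; a quick sketch using the standard pairs() method from emmeans (the choice of by variable here is just illustrative):
# e.g. pairwise comparisons of program type within each ses level
pairs(em_nnet_logit, by = "ses")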
However, when I try to do the same thing with the brms object, I don't even get past the first step of converting it into a reference grid before I hit an error:
# do the same for brm
rg_brm <- ref_grid(test_brm)
Error : The select parameter is not predicted by a linear formula. Use the 'dpar' and 'nlpar' arguments to select the parameter for which marginal means should be computed.
Predicted distributional parameters are: 'mugeneral', 'muvocation'
Predicted non-linear parameters are: ''
Error in ref_grid(test_brm) :
Perhaps a 'data' or 'params' argument is needed
Obviously, and unsurprisingly, there are some steps I am not aware of to get the Bayesian software to play nicely with emmeans. Clearly there are some extra parameters I need to specify at some stage of the process, but I'm not sure whether these belong in brms or in emmeans. I've searched around the web but am having trouble finding a simple but thorough guide.
Can anyone who knows how help me get the brms model into an emmeans object?
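For what it's worth, my best guess from the error message (untested, and I may well be misreading it) is that something like this is needed, requesting one distributional parameter at a time:
# guess based on the error message: select one distributional parameter via 'dpar'
rg_brm <- ref_grid(test_brm, dpar = "mugeneral")
em_brm <- emmeans(rg_brm, specs = ~ses)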

Related

Marginal effects plot of PLM

I’ve run an individual-fixed effects panel model in R using the plm-package. I now want to plot the marginal effects.
However, neither plot_model() nor effect_plot() work for plm objects. plot_model() works for type = "est" but not for type = "pred".
My online search so far only suggests using ggplot (which, however, only displays OLS regressions, not fixed effects) or outdated functions (e.g., sjp.lm()).
Does anyone have any recommendations how I can visualize effects of plm-objects?
IFE_Aut_uc <- plm(LoC_Authorities_rec ~ Compassion_rec, index = c("id","wave"), model = "within", effect = "individual", data = D3_long2)
summary(IFE_Aut_uc)
plot_model(IFE_Aut_uc, type = "pred")
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 50238, 82308
and:
effect_plot(IFE_Pol_uc, pred = Compassion_rec)
Error in `stop_wrap()`:
! ~does not appear to be a one- or two-sided formula.
`LoC_Politicians_rec` does not appear to be a one- or two-sided formula.
`Compassion_rec` does not appear to be a one- or two-sided formula.
Backtrace:
1. jtools::effect_plot(IFE_Pol_uc, pred = Compassion_rec)
2. jtools::get_data(model, warn = FALSE)
4. jtools:::get_lhs(formula)
Edit 2022-08-20: The latest version of plm on CRAN now includes a predict() method for within models. In principle, the commands illustrated below using fixest should now work with plm as well.
In my experience, plm models are kind of tricky to deal with, and many of the packages which specialize in “post-processing” fail to handle these objects properly.
One alternative would be to estimate your “within” model using the fixest package and to plot the results using the marginaleffects package. (Disclaimer: I am the marginaleffects author.)
Note that many of the models estimated by plm are officially supported and tested with marginaleffects (e.g., random effects, Amemiya, Swamy-Arora). However, this is not the case for this specific "within" model, which is even trickier than the others to support.
First, we estimate two models to show that the plm and fixest versions are equivalent:
library(plm)
library(fixest)
library(marginaleffects)
library(modelsummary)
data("EmplUK")
mod1 <- plm(
emp ~ wage * capital,
index = c("firm", "year"),
model = "within",
effect = "individual",
data = EmplUK)
mod2 <- feols(
emp ~ wage * capital | firm,
se = "standard",
data = EmplUK)
models <- list("PLM" = mod1, "FIXEST" = mod2)
modelsummary(models)
                 PLM       FIXEST
wage              0.000     0.000
                 (0.034)   (0.034)
capital           2.014     2.014
                 (0.126)   (0.126)
wage × capital   -0.043    -0.043
                 (0.004)   (0.004)
Num.Obs.         1031      1031
R2                0.263     0.986
R2 Adj.           0.145     0.984
R2 Within                   0.263
R2 Within Adj.              0.260
AIC              4253.9    4253.9
BIC              4273.7    4273.7
RMSE             1.90      1.90
Std.Errors                 IID
FE: firm                   X
Now, we use the marginaleffects package to plot the results. There are two main functions for this:
plot_cap(): plot conditional adjusted predictions. How does my predicted outcome change as a function of a covariate?
plot_cme(): plot conditional marginal effects. How does the slope of my model with respect to one variable (i.e., a derivative or “marginal effect”) change with respect to another variable?
See the website for definitions and details: https://vincentarelbundock.github.io/marginaleffects/
plot_cap(mod2, condition = "capital")
plot_cme(mod2, effect = "wage", condition = "capital")
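Note: in newer releases of marginaleffects, these two functions have been renamed. If plot_cap()/plot_cme() are not found, the equivalent calls should be (to the best of my knowledge):
plot_predictions(mod2, condition = "capital")
plot_slopes(mod2, variables = "wage", condition = "capital")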

How to compute marginal effects of a multinomial logit model created with the nnet package?

I have a multinomial logit model created with the nnet R package, using the multinom command. The dependent variable has three categories/choice options. I am modelling the probability of selecting a certain irrigation type (no irrigation, surface irrigation, drip irrigation) based on farmer characteristics.
I would like to estimate marginal effects, i.e. by how much does the probability of selecting irrigation type Y change when I increase independent variable X by one unit? I have tried doing this with the margins package (marginal_effects), but this gives only 1 value per observation in the dataset. I was expecting three values, since I want the marginal effect for each of the three irrigation types.
Does someone know if there is a better R package to use for this? Or whether I am doing something wrong with the margins package? Thank you.
You can use the marginaleffects package to do that (disclaimer: I am the maintainer). Please note the warning.
library(nnet)
library(marginaleffects)
mod <- multinom(factor(cyl) ~ hp + mpg, data = mtcars, trace = FALSE)  # trace = FALSE suppresses the fitting log
mfx <- marginaleffects(mod, type = "probs")
## Warning in sanity_model_specific.multinom(model, ...): The standard errors
## estimated by `marginaleffects` do not match those produced by Stata for
## `nnet::multinom` models. Please be very careful when interpreting the results.
summary(mfx)
## Average marginal effects
## type Group Term Effect Std. Error z value Pr(>|z|) 2.5 %
## 1 probs 6 hp 2.792e-04 0.000e+00 Inf < 2.22e-16 2.792e-04
## 2 probs 6 mpg -1.334e-03 0.000e+00 -Inf < 2.22e-16 -1.334e-03
## 3 probs 8 hp 2.396e-05 1.042e-126 2.298e+121 < 2.22e-16 2.396e-05
## 4 probs 8 mpg -2.180e-04 1.481e-125 -1.472e+121 < 2.22e-16 -2.180e-04
## 97.5 %
## 1 2.792e-04
## 2 -1.334e-03
## 3 2.396e-05
## 4 -2.180e-04
##
## Model type: multinom
## Prediction type: probs
The marginaleffects package should work in theory, but my example doesn't run because of memory limits (I don't have enough RAM for the 1.5 GB vector it tries to allocate). It's not even that large a dataset, which is odd.
If you use marginal_effects() (margins package) for multinomial models, it only displays the output for a default category. You have to manually set each category you want to see. You can clean up the output with broom and then combine some other way. It's clunky, but it can work.
marginal_effects(model, category = 'cat1')
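For example, a rough sketch of that clunky workflow (the category labels below are placeholders; substitute your three irrigation types):
# rough sketch: request each outcome category in turn, then stack the results
cats <- c("cat1", "cat2", "cat3")  # placeholder category labels
mfx_list <- lapply(cats, function(ct) {
  cbind(category = ct, marginal_effects(model, category = ct))
})
mfx_all <- do.call(rbind, mfx_list)
head(mfx_all)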

Cluster bootstrapped standard errors in R for plm functions

I have a fixed effects model with only a few observations and would therefore like to bootstrap in order to obtain more accurate standard errors. At the same time, I assume the standard errors are clustered, so I would also like to correct for clustering, i.e. do a cluster bootstrap.
I found a function for lm models (vcovBS) but could not find anything for plm models. Does anybody know an analogous function to obtain cluster-bootstrapped SEs for fixed effects models?
The clusterSEs package has an implementation of the wild cluster bootstrap for plm models: https://www.rdocumentation.org/packages/clusterSEs/versions/2.6.2/topics/cluster.wild.plml.
An alternative package is fwildclusterboot. It does not work with plm but with two other fixed effects regression packages, lfe and fixest, and should be significantly faster than clusterSEs.
With the fixest package, its syntax would look like this:
library(fixest)
library(fwildclusterboot)
# load data set voters included in fwildclusterboot
data(voters)
# estimate the regression model via feols
feols_fit <- feols(proposition_vote ~ treatment + ideology1 + log_income + Q1_immigration, data = voters)
# bootstrap inference
boot_feols <- boottest(feols_fit, clustid = "group_id1", param = "treatment", B = 9999)
summary(boot_feols)
#> boottest.fixest(object = lm_fit, clustid = "group_id1", param = "treatment",
#> B = 9999)
#>
#> Observations: 300
#> Bootstr. Iter: 9999
#> Bootstr. Type: rademacher
#> Clustering: 1-way
#> Confidence Sets: 95%
#> Number of Clusters: 40
#>
#> term estimate statistic p.value conf.low conf.high
#> 1 treatment 0.073 3.786 0.001 0.033 0.114
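Since boottest() also supports lfe, the analogous call with felm() should look roughly like this (a sketch assuming the same voters data and the same specification as above):
library(lfe)
# estimate the same model via felm
felm_fit <- felm(proposition_vote ~ treatment + ideology1 + log_income + Q1_immigration, data = voters)
# bootstrap inference, clustered by group_id1
boot_felm <- boottest(felm_fit, clustid = "group_id1", param = "treatment", B = 9999)
summary(boot_felm)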

In R, plot emmeans of glmmTMB linear model. Error in linkinv(summ$the.emmean) : could not find function "linkinv"

I would like to plot the emmeans of a glmmTMB model using plot().
When my glmmTMB model takes in log transformed data, plot(emmeans(glmmTMB_model)) works just fine.
However, when I attempt to plot the emmeans of the glmmTMB model of non-transformed data, I get the following error: Error in linkinv(summ$the.emmean) : could not find function "linkinv".
See my code below:
library(glmmTMB)
library(emmeans)
Site <- c(4,4,5,5)
Treatment <- c("Burnt_Intact-Vegetation","Burnt_Intact-Vegetation", "Burnt_Cleared", "Burnt_Cleared")
pH <- c(5.94, 5.91, 5.44, 5.49)
pH_EC_data <- data.frame(Site, Treatment, pH)
pH_model <- glmmTMB(pH ~ Treatment + (1|Site), data = pH_EC_data)
log10pH_model <- glmmTMB(log10(pH) ~ Treatment + (1|Site), data = pH_EC_data)
log10pH_analysis <- emmeans(log10pH_model, pairwise~Treatment, type = "response")
plot(log10pH_analysis)
##This plot works just fine.
pH_analysis <- emmeans(pH_model, pairwise~Treatment, type = "response")
plot(pH_analysis)
##This code results in the following error: Error in linkinv(summ$the.emmean) : could not find function "linkinv"
Note, log10pH_analysis and pH_analysis differ by one column. emmeans of the glmmTMB model of logged data creates a "response" column, whereas the same manipulation of non-transformed data results in an "emmean" column. See below:
log10pH_analysis
$emmeans
Treatment response SE df lower.CL upper.CL
Burnt_Cleared 5.47 0.0167 2 5.40 5.54
Burnt_Intact-Vegetation 5.90 0.0180 2 5.82 5.98
Confidence level used: 0.95
Intervals are back-transformed from the log10 scale
$contrasts
contrast estimate SE df t.ratio p.value
Burnt_Cleared - (Burnt_Intact-Vegetation) -0.0329 0.00188 2 -17.520 0.0032
Note: contrasts are still on the log10 scale
pH_analysis
$emmeans
Treatment emmean SE df lower.CL upper.CL
Burnt_Cleared 5.47 0.0176 2 5.39 5.55
Burnt_Intact-Vegetation 5.90 0.0176 2 5.82 5.98
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Burnt_Cleared - (Burnt_Intact-Vegetation) -0.43 0.0249 2 -17.238 0.0033
Thank you.
I am able to duplicate this with a similar example. It is due to a coding error that I will correct in the next update of the package.
The problem is related to the fact that your pH_analysis object is actually a list of two emmGrid objects -- one for the estimated marginal means, and the other for the pairwise comparisons of them. If you do
plot(pH_analysis, which = 1) # or plot(pH_analysis[[1]])
it will work just fine. The error occurs in trying to plot the second one as well, and the coding error has to do with the fact that the pairwise comparisons are not on a transformed scale but it thinks they are.
I do suggest that you not use plot() on the results of an emmeans() call with a two-sided formula or list of specs, and instead pick out the part you actually want to plot. Also, I think you must have a fairly old version of emmeans, as the default for plot() in recent versions is which = 1.
Thanks for reporting this bug.
It seems that type = "response" causes the problem for the non-transformed data.
pH_analysis <- emmeans(pH_model, pairwise~Treatment)
plot(pH_analysis)
##Plots beautifully; problem resolved

R - 'data' argument is of the wrong type using effect() function to summarise mixed models (lmer) estimate

I'm quite new to R, and recently I was tasked to use ggplot to visualise the results of an lmer model. To do so, I'm first trying to summarise and convert the mixed model estimates into a dataframe.
my code:
library(lme4)
library(effects)
model <- lmer(outcome ~ group*time + (1|ID), data = data)
model.eff <- effect("group*time", model)  # which supposedly summarises the mixed model estimates
But here is where I'm stuck. I keep getting this error message
Error in terms.formula(formula, data = data) : 'data' argument is of the wrong type
After reading around, I gather the problem might lie with the class of my "model", but I'm not sure how to rectify this problem. Any help would be appreciated!
There is a package "broom" (with its companion "broom.mixed" for mixed models) which makes handling model outputs much easier. It simply requires you to pass the model to the function "tidy":
library(broom.mixed)  # tidy() methods for lmer models are provided here
model <- lmer(outcome ~ group*time + (1|ID), data = data)
model.eff <- tidy(model)
You will then have an output in this style (example from my data as you didn't provide any example data):
effect group term estimate std.error statistic
1 fixed NA (Intercept) 6.14 4.68 1.31
2 fixed NA PFS_days -0.561 0.573 -0.981
3 ran_pars sex sd__(Intercept) 1.36 NA NA
4 ran_pars Residual sd__Observation 3.50 NA NA
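Since the original goal was a ggplot of the estimated group-by-time effects, another option is to convert estimated marginal means to a data frame; a sketch using emmeans (assuming group and time are factors, with the variable names taken from the question):
library(emmeans)
library(ggplot2)
# estimated marginal means for the group*time interaction
emm_df <- as.data.frame(emmeans(model, ~ group * time))
# emm_df has columns group, time, emmean, SE, df, lower.CL, upper.CL
ggplot(emm_df, aes(x = time, y = emmean, colour = group, group = group)) +
  geom_point() +
  geom_line() +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.1)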
