Error using model averaging and predicting with MuMIn in R

I'm using the MuMIn package in R to obtain an averaged model (http://www.inside-r.org/packages/cran/MuMIn/docs/model.avg) and to predict from it. The package also includes a predict method specifically for objects returned by model.avg (http://www.inside-r.org/node/123636). I tried the examples listed, code as follows:
library(MuMIn)
options(na.action = "na.fail")  # required by dredge()

# Example from Burnham and Anderson (2002), page 100:
fm1 <- lm(y ~ X1 + X2 + X3 + X4, data = Cement)
ms1 <- dredge(fm1)
# obtain model average for AIC delta < 2
avgm <- model.avg(ms1, subset = delta < 2)
# predict from the averaged model
averaged.full <- predict(avgm, full = TRUE)
But I keep getting
Error in predict.averaging(avgm, full = TRUE): can predict only from 'averaging' object containing model list
which I don't understand, because I did follow the examples and used an object returned by model.avg. Am I missing something?

When you create an "averaging" object directly from a "model.selection" object, it does not contain the component models, which are required for predict to work. You can use model.avg(..., fit = TRUE), which will fit the models again.
To avoid fitting the models twice, you can first create a list of all the fitted models with lapply(dredge(..., evaluate = FALSE), eval), and afterwards use model.avg(..., subset = ...) on that list.
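For concreteness, here is a sketch assembling both recipes from this answer, reusing the object names from the question:
# Recipe 1: have model.avg() refit the component models
avgm1 <- model.avg(ms1, subset = delta < 2, fit = TRUE)
predict(avgm1, full = TRUE)
# Recipe 2: fit each candidate model only once, keep the list, then average it
models <- lapply(dredge(fm1, evaluate = FALSE), eval)
avgm2 <- model.avg(models, subset = delta < 2)
predict(avgm2, full = TRUE)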

Clustered standard errors with imputed and weighted data in R

I am attempting to get clustered SEs (at the school level in my data) with data that is both imputed (MICE) and weighted (CBPS). I have tried a couple of different approaches, which have thrown different errors.
This is what I have to start, which works fine:
library(tidyverse)
library(mice)
library(MatchThem)
library(CBPS)
tempdata <- mice(d, m = 10, maxit = 50, meth = "pmm", seed = 99)
weighted_data <- weightthem(trtmnt ~ x1 + x2 + x3,
                            data = tempdata,
                            method = "cbps",
                            estimand = "ATT")
Using this (https://www.r-bloggers.com/2021/05/clustered-standard-errors-with-r/) as a guide, I attempted all three approaches, and each resulted in a different error message.
My data is on a restricted server, so unfortunately I can't bring it in here to reproduce things exactly, although if it's useful I could attempt to recreate some sample data.
So attempting with estimatr first, I get this error:
m1 <- estimatr::lm_robust(outcome ~ trtmnt + x1 + x2 + x3,
                          clusters = schoolID,
                          data = weighted_data)
Error in eval_tidy(mfargs[[da]], data = data) :
  object 'schoolID' not found
I have no clue why the schoolID variable would have dropped out or not be recognized. It isn't part of the weighting procedure, but it should still be in the data frame: if I use it as a covariate in a standard model without clustering, it's there.
I also attempted with miceadds and got this error:
m2 <- miceadds::lm.cluster(outcome ~ trtmnt + x1 + x2 + x3,
                           cluster = "schoolID",
                           data = weighted_data)
Error in as.data.frame.default(data) :
  cannot coerce class "wimids" to a data.frame
And finally, with sandwich and lmtest:
library(sandwich)
library(lmtest)
m3 <- weighted_models <- with(weighted_data,
                              exp = lm(outcome ~ trtmnt + x1 + x2 + x3))
msandwich <- coeftest(m3, vcov = vcovCL, cluster = ~schoolID)
Error in UseMethod("estfun") :
  no applicable method for 'estfun' applied to an object of class "c('mimira', 'mira')"
Any ideas on any of the above methods, or where to go next?
You were really close. You need to use with(weighted_data, .) to fit a model in your weighted datasets, and you need to use estimatr::lm_robust() to get the clustered standard errors. So try the following:
weighted_models <- with(weighted_data,
                        estimatr::lm_robust(outcome ~ trtmnt + x1 + x2 + x3,
                                            clusters = schoolID))
Your first and second approaches were incorrect because you supplied weighted_data to a single model as if it were a data frame, but it's not; it's a complicated wimids object. You need to use the with() infrastructure to fit a model to the imputed, weighted data.
Your third approach was close, but coeftest() needs to be used on a single model, not on a mimira object, which contains all the models fit to the imputed datasets. Although you can use coeftest() inside with() for mira objects, you cannot do so with mimira objects from MatchThem. This is where estimatr::lm_robust() comes in, since it applies the clustering within each imputed dataset.
I also recommend you take a look at this blog post on estimating treatment effects after weighting with multiply imputed data. The only difference between your case and the code presented in the post is that you would change vcov = "HC3" to vcov = ~schoolID in whichever function you use.
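Afterwards, the per-imputation fits can be pooled in the usual mice fashion. A sketch, assuming pool() can combine the lm_robust fits through their broom tidiers:
# pool the clustered fits across the imputations with Rubin's rules
pooled <- pool(weighted_models)
summary(pooled, conf.int = TRUE)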

Quasi-Poisson mixed-effect model on overdispersed count data from multiple imputed datasets in R

I'm dealing with a problem that has three parts, each of which I can solve separately, but now I need to solve them together:
extremely skewed, over-dispersed dependent count variable (the number of incidents while doing something),
necessity to include random effects,
lots of missing values -> multiple imputation -> 10 imputed datasets.
To solve the first two parts, I chose a quasi-Poisson mixed-effect model. Since stats::glm isn't able to include random effects properly (or I haven't figured it out) and lme4::glmer doesn't support the quasi-families, I worked with glmer(family = "poisson") and then adjusted the std. errors, z statistics and p-values as recommended here and discussed here. So I basically turn Poisson mixed-effect regression into quasi-Poisson mixed-effect regression "by hand".
This is all good with one dataset. But I have 10 of them.
I roughly understand the procedure for analyzing multiply imputed datasets – 1. imputation, 2. model fitting, 3. pooling results (I'm using the mice library). I can do these steps for a Poisson regression but not for a quasi-Poisson mixed-effect regression. Is it even possible to (A) pool across models based on a quasi-distribution, and (B) get residuals from a pooled object (class "mipo")? I'm not sure. Also, I'm not sure how to interpret the pooled results for mixed models (I don't see the random effects in the pooled output; although I've found this page, which I'm currently trying to work through).
Can I get some help, please? Any suggestions on how to complete the analysis (addressing all three issues above) would be highly appreciated.
Example data are here (repre_d_v1 and repre_all_data are stored in there), and below is the crucial part of my code.
library(dplyr); library(tidyr); library(tidyverse); library(lme4); library(broom.mixed); library(mice)
# please download "qP_data.RData" from the last link above and load it
## ===========================================================================================
# quasi-Poisson mixed model from single data set (this is OK)
# first run Poisson regression on df "repre_d_v1", then turn it into quasi-Poisson
modelSingle = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
                    family = "poisson",
                    data = repre_d_v1)
# I know there are some warnings but it's because I share only a modified subset of data with you (:
printCoefmat(coef(summary(modelSingle))) # unadjusted coefficient table
# define quasi-likelihood adjustment function
quasi_table = function(model, ctab = coef(summary(model))) {
  phi = sum(residuals(model, type = "pearson")^2) / df.residual(model)
  qctab = within(as.data.frame(ctab), {
    `Std. Error` = `Std. Error` * sqrt(phi)
    `z value` = Estimate / `Std. Error`
    `Pr(>|z|)` = 2 * pnorm(abs(`z value`), lower.tail = FALSE)
  })
  return(qctab)
}
printCoefmat(quasi_table(modelSingle)) # done, makes sense
## ===========================================================================================
# now let's work with more than one data set
# object "repre_all_data" of class "mids" contains 10 imputed data sets
# fit model using with() function, then pool()
modelMultiple = with(data = repre_all_data,
                     expr = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
                                  family = "poisson"))
summary(pool(modelMultiple)) # class "mipo" ("mipo.summary")
# this has quite a similar structure to coef(summary(someGLM)),
# but where are the random effects?
# and more importantly, I wanted a quasi-Poisson model, not just a Poisson model...
# ...but here it is not possible to use the quasi_table function (defined earlier)...
# ...because I can't compute "phi"
This seems reasonable, with the caveat that I'm only thinking about the computation, not whether this makes statistical sense. What I'm doing here is computing the dispersion for each of the individual fits and then applying it to the summary table, using a variant of the machinery that you posted above.
## compute dispersion values
phivec <- vapply(modelMultiple$analyses,
                 function(model) sum(residuals(model, type = "pearson")^2) / df.residual(model),
                 FUN.VALUE = numeric(1))
phi_mean <- mean(phivec)
ss <- summary(pool(modelMultiple))  # class "mipo" ("mipo.summary")
## adjust
qctab <- within(as.data.frame(ss), {
  std.error <- std.error * sqrt(phi_mean)
  statistic <- estimate / std.error
  p.value <- 2 * pnorm(abs(statistic), lower.tail = FALSE)
})
The results look weird (dispersion < 1, all model results identical), but I'm assuming that's because you gave us a weird subset as a reproducible example ...

How to get a model matrix from clmm objects?

I want to estimate a multilevel ordered logistic model and afterwards access the model matrix. When running a simplified example from ?clmm:
library("ordinal")
mod1 <- clmm(SURENESS ~ PROD + (1|RESP), data = soup)
model.matrix(mod1)
I get the error message Error in eval(predvars, data, env) : object 'SURENESS' not found. From other packages I'm used to setting parameters like model = TRUE so that the input data are also stored in the estimated model, but here all relevant parameters seem to be set accordingly by default. Did I miss some parameter, or some element of mod1? (I went through attributes(mod1) but did not find a model matrix.)
Strangely, if I use a random data.frame, it works:
set.seed(123)
df <- data.frame(y = factor(sample(c("A", "B", "C"), size = 1000, replace = TRUE), ordered = TRUE),
                 x = rnorm(1000),
                 id = factor(rep(1:10, each = 100)))
mod2 <- clmm(y ~ 1 + x + (1|id), data = df)
model.matrix(mod2)
So what's the difference between mod1 and mod2 and how do I get a model.matrix from mod1?
I do not think model.matrix() works reliably for clmm objects. However, you can build a parallel model for the fixed-effects part using a function such as polr and apply model.matrix() to the resulting object. The random-effects part can be handled separately using the clmm output.
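For instance, a minimal sketch of that workaround with MASS::polr (one choice of fitter on my part; any proportional-odds function with a working model.matrix() method would do), reusing the fixed-effects formula from mod1:
library(MASS)
library(ordinal)  # for the soup data
# parallel fixed-effects-only fit: same response and fixed predictors as mod1
fixed_only <- polr(SURENESS ~ PROD, data = soup)
X <- model.matrix(fixed_only)  # the fixed-effects design matrix
head(X)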

Probability predictions with model-averaged cumulative link mixed models fitted with clmm in the ordinal package

I found that the predict function is currently not implemented for cumulative link mixed models fitted using the clmm function in the ordinal R package. While predict is implemented for clmm2 in the same package, I chose to apply clmm instead because the latter allows for more than one random effect. Further, I also fitted several clmm models and performed model averaging using the model.avg function in the MuMIn package. Ideally, I want to predict probabilities using the averaged model. However, while MuMIn supports clmm models, predict will not work with the averaged model either.
Is there a way to hack the predict function so that it can predict probabilities not only from a clmm model, but also from model-averaged clmm coefficients (i.e. an object of class "averaging")? For example:
require(ordinal)
require(MuMIn)
mm1 <- clmm(SURENESS ~ PROD + (1|RESP) + (1|RESP:PROD), data = soup,
            link = "probit", threshold = "equidistant")
## test random effect:
mm2 <- clmm(SURENESS ~ PROD + (1|RESP) + (1|RESP:PROD), data = soup,
            link = "logistic", threshold = "equidistant")
# create a model selection object
mm.sel <- model.sel(mm1, mm2)
## perform a model average
mm.avg <- model.avg(mm.sel)
# create new data and predict
new.data <- soup
## predict with an individual model
predict(mm1, new.data)
I got the following error message:
Error in UseMethod("predict") :
  no applicable method for 'predict' applied to an object of class "clmm"
##predict with model average
predict(mm.avg, new.data)
Another error is returned:
Error in predict.averaging(mm.avg, new.data) :
  predict for models 'mm1' and 'mm2' caused errors
I've been using clmm as well, and yes, I can confirm predict.clmm is NOT (yet?) implemented. I haven't checked the source code of fake.predict.clmm yet; it might work. If it doesn't, you're stuck with doing things by hand or using predict.clmm2.
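For example, a minimal sketch of the predict.clmm2 fallback, assuming you can live with a single random-effects term (clmm2's main limitation):
library(ordinal)
# clmm2 takes one random term via 'random' and does implement predict()
mm1b <- clmm2(SURENESS ~ PROD, random = RESP, data = soup,
              link = "probit", threshold = "equidistant", Hess = TRUE)
# fitted probabilities of the observed response categories
head(predict(mm1b, newdata = soup))
Note that predict.clmm2 expects the response column to be present in newdata, since it returns the probability of each observed category.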
I found a potential solution (pasted below) but have not been able to make it work for my data.
Solution here: https://gist.github.com/mainambui/c803aaf857e54a5c9089ea05f91473bc
I think the problem is the number of coefficients I am using, but I'm not experienced enough to figure it out. Hopefully this helps someone out, though.
This is the model and newdata that I am using, though it is actually a model-averaged version. Same predictors, though.
ma10 <- clmm(Location3 ~ Sex * Grass3 + Sex * Forb3 + (1|Tag_ID),
             data = IP_all_dunes)
ma_1 <- model.avg(ma10, ma8, ma5)  ## top 3 models
new_ma <- data.frame(Sex = c("m", "f", "m", "f", "m", "f", "m", "f"),
                     Grass3 = c("1", "1", "1", "1", "0", "0", "0", "0"),
                     Forb3 = c("0", "0", "1", "1", "0", "0", "1", "1"))
# Arguments:
# - modelAvg = a clmm model average (object of class "averaging")
# - newdata = a dataframe of new data to apply the model to
# Returns a dataframe of predicted probabilities for each row and response level
fake.predict.clmm <- function(modelAvg, newdata) {
  # Actual prediction function
  pred <- function(eta, theta, cat = 1:(length(theta) + 1), inv.link = plogis) {
    Theta <- c(-1000, theta, 1000)
    sapply(cat, function(j) inv.link(Theta[j + 1] - eta) - inv.link(Theta[j] - eta))
  }
  # Multiply each row by the coefficients
  # coefs <- c(model$beta, unlist(model$ST))  ## turn off if a model average is used
  beta <- modelAvg$coefficients[2, 3:12]
  coefs <- c(beta, unlist(modelAvg$ST))
  xbetas <- sweep(newdata, MARGIN = 2, coefs, `*`)
  # Make predictions
  Theta <- modelAvg$coefficients[2, 1:2]
  # pred.mat <- data.frame(pred(eta = rowSums(xbetas), theta = model$Theta))
  pred.mat <- data.frame(pred(eta = rowSums(xbetas), theta = Theta))
  # colnames(pred.mat) <- levels(model$model[, 1])
  a <- attr(modelAvg, "modelList")
  colnames(pred.mat) <- levels(a[[1]]$model[, 1])
  pred.mat
}

R: obtain coefficients & CIs from bootstrapped mixed-effect model results

The working data looks like:
set.seed(1234)
df <- data.frame(y = rnorm(30),
                 fac1 = as.factor(sample(c("A", "B", "C", "D", "E"), 30, replace = TRUE)),
                 fac2 = as.factor(sample(c("NY", "NC", "CA"), 30, replace = TRUE)),
                 x = rnorm(30))
The lmer model is fitted as:
library(lme4)
mixed <- lmer(y ~ x + (1|fac1) + (1|fac2), data = df)
I used bootMer to run the parametric bootstrap, and I can successfully obtain the coefficients (intercepts) and SEs for the fixed and random effects:
mixed_boot_sum <- function(data) {
  s <- sigma(data)
  c(beta = getME(data, "fixef"), theta = getME(data, "theta"), sigma = s)
}
mixed_boot <- bootMer(mixed, FUN = mixed_boot_sum, nsim = 100,
                      type = "parametric", use.u = FALSE)
My first question is how to obtain the coefficients (slopes) for each individual level of the two random effects from the bootstrapping result mixed_boot?
I have no problem extracting the coefficients (slopes) from the mixed model by using the augment function from the broom package; see below:
library(broom)
mixed.coef <- augment(mixed, df)
However, it seems that broom can't deal with objects of class boot, so I can't use the above functions directly on mixed_boot.
I also tried to modify mixed_boot_sum by adding mmList (I thought this would be what I am looking for), but R complains:
Error in bootMer(mixed, FUN = mixed_boot_sum, nsim = 100, type = "parametric", :
  bootMer currently only handles functions that return numeric vectors
Furthermore, is it possible to obtain CIs for both fixed and random effects by specifying FUN as well?
Right now I am very confused about the correct specification of FUN to achieve my needs. Any help regarding my questions would be greatly appreciated!
My first question is how to obtain the coefficients (slopes) for each individual level of the two random effects from the bootstrapping result mixed_boot?
I'm not sure what you mean by "coefficients (slopes) of each individual level". broom::augment(mixed, df) gives the predictions (residuals, etc.) for every observation. If you want the predicted coefficients at each level, I would try
mixed_boot_coefs <- function(fit) {
  unlist(coef(fit))
}
which for the original model gives
mixed_boot_coefs(mixed)
## fac1.(Intercept)1 fac1.(Intercept)2 fac1.(Intercept)3 fac1.(Intercept)4
## -0.4973925 -0.1210432 -0.3260958 0.2645979
## fac1.(Intercept)5 fac1.x1 fac1.x2 fac1.x3
## -0.6288728 0.2187408 0.2187408 0.2187408
## fac1.x4 fac1.x5 fac2.(Intercept)1 fac2.(Intercept)2
## 0.2187408 0.2187408 -0.2617613 -0.2617613
## ...
If you want the resulting object to be more clearly named you can use:
flatten <- function(cc) setNames(unlist(cc),
                                 outer(rownames(cc), colnames(cc),
                                       function(x, y) paste0(y, x)))
mixed_boot_coefs <- function(fit) {
  unlist(lapply(coef(fit), flatten))
}
When run through bootMer/confint/boot::boot.ci, these functions will give confidence intervals for each of these values (note that all of the slopes facW.xZ are identical across groups, because the model assumes random variation in the intercepts only). In other words, whatever information you know how to extract from a fitted model (conditional modes/BLUPs [ranef], predicted intercepts and slopes for each level of the grouping variable [coef], parameter estimates [fixef, getME], random-effects variances [VarCorr], predictions under specific conditions [predict], ...) can be used in bootMer's FUN argument, as long as you can flatten its structure into a simple numeric vector.
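For example, a sketch of the confidence-interval step (percentile intervals via boot::boot.ci; index selects which element of the flattened vector to summarize):
boot_coefs <- bootMer(mixed, FUN = mixed_boot_coefs, nsim = 200,
                      type = "parametric", use.u = FALSE)
# percentile CI for the first flattened coefficient, here fac1.(Intercept)A
boot::boot.ci(boot_coefs, type = "perc", index = 1)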
