Is there a way to extrapolate predicted data from lmer - r

I am using lmer to fit a multilevel polynomial regression model with several fixed effects (including subject-specific variables like age, short-term memory span, etc.) and two sets of random effects (Subject and Subject:Condition). Now I would like to predict data for a hypothetical subject with particular properties (age, short-term memory span, etc.). I fit the model (m) and created a new data frame (pred) that contains my hypothetical subject, but when I tried predict(m, pred) I got an error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "mer"
I know I could use the brute-force method of extracting fixed effects from my model and multiplying it all out, but is there a more elegant solution?

You can do this type of extrapolated prediction easily with the merTools package for R: http://www.github.com/jknowles/merTools
merTools includes a function called predictInterval which provides robust prediction capabilities for lmer and glmer fits. Specifically, you can use this function to predict extrapolated data, and to obtain prediction intervals that account for the variance in both the fixed and random effects, as well as the residual error of the model.
Here's a quick code example:
library(merTools)
m1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)
predOut <- predictInterval(m1, newdata = sleepstudy, n.sims = 100)
# extrapolated data
extrapData <- sleepstudy[1:10,]
extrapData$Days <- 20
extrapPred <- predictInterval(m1, newdata = extrapData)

Related

Using merTools::predictInterval for Poisson family mixed models

I am utilizing the predictInterval() function from the merTools package. My model is fit utilizing a Poisson family specification like the below:
glmer(y ~ (1|key) + x, data = dat, family = poisson())
When I use predictInterval() to calculate the prediction interval associated with my model, I get the following warning message:
Warning message:
Prediction for NLMMs or GLMMs that are not mixed binomial regressions is not tested. Sigma set at 1.
I am taking this to mean that predictInterval() doesn't have an implementation for models fit with a Poisson distribution. I therefore do not trust the resulting interval.
Is my interpretation correct? I have searched around for similar issues but haven't found anything.
Any help would be greatly appreciated.

Longitudinal analysis using sampling weigths in R

I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm (for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight)
svyglm((post-pre) ~ group, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
However, for categorical variables, I don't know how to do the analyses. WeMix::mix() has a parameter weights, but I'm not sure if it treats them as sampling weights. Still, this function can't support multinomial family.
So, to resume: can you enlighten me on how to do a pre-post test analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.
I give below some data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
id=rep(1:5,2),
time=c(rep("Pre",5),rep("Post",5)),
outcome1=sample(c("Yes","No"),10,replace=T),
outcome2=sample(c("Low","Medium","High"),10,replace=T),
outcome3=rnorm(10),
group=rep(sample(c("Man","Woman"),5,replace=T),2),
weight=rep(c(1,0.5,1.5,0.75,1.25),2)
)
data_wide <- dcast(data_long, id~time, value.var = c('outcome1','outcome2','outcome3','group','weight'))[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer and glmer with variables used to calculate the weights as predictors. It happens that glmer returns a lot of problems (convergence, high eigenvalues...), so I give another look at #ThomasLumley answer in this post and others (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r).
So, my question is now if a can use participants id as clusters in svydesign
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.875e+01 1.000e+00 18.746 0.0339 *
groupWoman -1.903e+01 1.536e+00 -12.394 0.0513 .
timePre 5.443e-09 5.443e-09 1.000 0.5000
groupWoman:timePre 2.877e-01 1.143e+00 0.252 0.8431
and still interpret groupWoman:timePre as differences in the average rate of change/improvement in the outcome over time between sex groups, as if I was using mixed models with participants as random effects.
Thank you once again!
A linear model with svyglm does not give the same parameter estimates as lme4::lmer. It does estimate the same parameters as lme4::lmer if the model is correctly specified, though.
Generalised linear models with svyglm or svy_vglm don't estimate the same parameters as lme4::glmer, as you note. However, they do estimate perfectly good regression parameters and if you aren't specifically interested in the variance components or in estimating the realised random effects (BLUPs) I would recommend just using svy_glm.
Another option if you have non-survey software for random effects versions of the models is to use that. If you scale the weights to sum to the sample size and if all the clustering in the design is modelled by random effects in the model, you will get at least a reasonable approximation to valid inference. That's what I've seen recommended for Bayesian survey modelling, for example.

How to use sample weights in GAM (mgcv) on survey data for Logit regression?

I'm interesting in performing a GAM regression on data from a national wide survey which presents sample weights. I read with interest this post.
I selected my vars of interest generating a DF:
nhanesAnalysis <- nhanesDemo %>%
select(fpl,
age,
gender,
persWeight,
psu,
strata)
Than, for what I understood, I generated a weighted DF with the following code:
library(survey)
nhanesDesign <- svydesign( id = ~psu,
strata = ~strata,
weights = ~persWeight,
nest = TRUE,
data = nhanesAnalysis)
Let's say that I would select only subjects with ageā‰„30:
ageDesign <- subset(nhanesDesign, age >= 30)
Now, I would fit a GAM model (fpl ~ s(age) + gender) with mgcv package. Is it possible to do so with the weights argument or using svydesign object ageDesign ?
EDIT
I was wondering if is it correct to extrapolate computed weights from the an svyglm object and use it for weights argument in GAM.
This is more difficult than it looks. There are two issues
You want to get the right amount of smoothing
You want valid standard errors.
Just giving the sampling weights to mgcv::gam() won't do either of these: gam() treats the weights as frequency weights and so will think it has a lot more data than it actually has. You will get undersmoothing and underestimated standard errors because of the weights, and you will also likely get underestimated standard errors because of the cluster sampling.
The simple work-around is to use regression splines (splines package) instead. These aren't quite as good as the penalised splines used by mgcv, but the difference usually isn't a big deal, and they work straightforwardly with svyglm. You do need to choose how many degrees of freedom to assign.
library(splines)
svglm(fpl ~ ns(age,4) + gender, design = nhanesDesign)

R multilevel mediation with uneven sample sizes in Y and M models

I'm trying to run a multilevel mediation analysis in R.
I get the error: Error in mediate(model.M, model.Y, treat = "treat", mediator = mediator, data=data):
number of observations do not match between mediator and outcome models
Models M and Y are multilevel lme4 models, and there are uneven sample sizes in these models. Is there anything I can do to run this analysis? Will it really only run if I have the same sample sizes in each model?
Fit the model with less observations first (which is, I guess, model.Y, because that model has more predictors and thus is more likely to have more missings), then use the model frame from that model as data for the 2nd model:
model.M <- lmer(..., data = model.Y#frame)
That should work.

glmer - predict with binomial data (cbind count data)

I am trying to predict values over time (Days in x axis) for a glmer model that was run on my binomial data. Total Alive and Total Dead are count data. This is my model, and the corresponding steps below.
full.model.dredge<-glmer(cbind(Total.Alive,Total.Dead)~(CO2.Treatment+Lime.Treatment+Day)^3+(Day|Container)+(1|index),
data=Survival.data,family="binomial")
We have accounted for overdispersion as you can see in the code (1:index).
We then use the dredge command to determine the best fitted models with the main effects (CO2.Treatment, Lime.Treatment, Day) and their corresponding interactions.
dredge.models<-dredge(full.model.dredge,trace=FALSE,rank="AICc")
Then made a workspace variable for them
my.dredge.models<-get.models(dredge.models)
We then conducted a model average to average the coefficients for the best fit models
silly<-model.avg(my.dredge.models,subset=delta<10)
But now I want to create a graph, with the Total Alive on the Y axis, and Days on the X axis, and a fitted line depending on the output of the model. I understand this is tricky because the model concatenated the Total.Alive and Total.Dead (see cbind(Total.Alive,Total.Dead) in the model.
When I try to run a predict command I get the error
# 9: In UseMethod("predict") :
# no applicable method for 'predict' applied to an object of class "mer"
Most of your problem is that you're using a pre-1.0 version of lme4, which doesn't have the predict method implemented. (Updating would be easiest, but I believe that if you can't for some reason, there's a recipe at http://glmm.wikidot.com/faq for doing the predictions by hand by extracting the fixed-effect design matrix and the coefficients ...)There's actually not a problem with the predictions, which predict the log-odds (by default) or the probability (if type="response"); if you wanted to predict numbers, you'd have to multiply by N appropriately.
You didn't give one, but here's a reproducible (albeit somewhat trivial) example using the built-in cbpp data set (I do get some warning messages -- no non-missing arguments to max; returning -Inf -- but I think this may be due to the fact that there's only one non-trivial fixed-effect parameter in the model?)
library(lme4)
packageVersion("lme4") ## 1.1.4, but this should work as long as >1.0.0
library(MuMIn)
It's convenient for later use (with ggplot) to add a variable for the proportion:
cbpp <- transform(cbpp,prop=incidence/size)
Fit the model (you could also use glmer(prop~..., weights=size, ...))
gm0 <- glmer(cbind(incidence, size - incidence) ~ period+(1|herd),
family = binomial, data = cbpp)
dredge.models<-dredge(gm0,trace=FALSE,rank="AICc")
my.dredge.models<-get.models(dredge.models)
silly<-model.avg(my.dredge.models,subset=delta<10)
Prediction does work:
predict(silly,type="response")
Creating a plot:
library(ggplot2)
theme_set(theme_bw()) ## cosmetic
g0 <- ggplot(cbpp,aes(period,prop))+
geom_point(alpha=0.5,aes(size=size))
Set up a prediction frame:
predframe <- data.frame(period=levels(cbpp$period))
Predict at the population level (ReForm=NA -- this may have to be REForm=NA in lme4 `1.0.5):
predframe$prop <- predict(gm0,newdata=predframe,type="response",ReForm=NA)
Add it to the graph:
g0 + geom_point(data=predframe,colour="red")+
geom_line(data=predframe,colour="red",aes(group=1))

Resources