High GLMER dispersion parameters - r

I am running a glmer with a random effect for count data (x) and two categorical variables (y and z):
fullmodel <- glmer(x ~ y * z + (1 | Replicate), family = poisson, data = Data)
However, when I look at the dispersion parameter:
> dispersion_glmer(fullmodel)
[1] 2.338742
It is way higher than 1. Does this mean my model is overdispersed? How do I correct it? I want to keep my random effect, but when I tried to swap the family to quasipoisson, R says it cannot be used with glmer.
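Since quasipoisson is not available for glmer(), two common workarounds are a negative binomial GLMM via glmer.nb(), or an observation-level random effect (OLRE) that absorbs the extra-Poisson variation. A minimal sketch, untested on this data:
library(lme4)
## Option 1: negative binomial GLMM
nb_model <- glmer.nb(x ~ y * z + (1 | Replicate), data = Data)
## Option 2: observation-level random effect
Data$obs <- factor(seq_len(nrow(Data)))  # one level per row
olre_model <- glmer(x ~ y * z + (1 | Replicate) + (1 | obs),
                    family = poisson, data = Data)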

Related

Is there a way to include an autocorrelation structure in the gam function of mgcv?

I am building a model using the mgcv package in R. The data have serial measures (collected during scans 15 minutes apart in time, but discontinuously, e.g. there might be 5 consecutive scans on one day and then none until the next day). The model has a binomial response, a random effect of day, a fixed effect, and three smooth effects. My understanding is that REML is the best fitting method for binomial models, but that this method cannot be specified using the gamm function for a binomial model. Thus, I am using the gam function to allow for REML fitting. When I fit the model, I am left with residual autocorrelation at a lag of 2 (i.e. at 30 minutes), assessed using ACF and PACF plots.
So, I wanted to include an autocorrelation structure in the model, but my understanding is that only the gamm function, and not the gam function, allows the inclusion of such structures. I am wondering if there is anything I am missing and/or if there is a way to deal with autocorrelation with a binomial response variable in a GAMM built in mgcv.
My current model structure looks like:
gam(Response ~
      s(Day, bs = "re") +
      s(SmoothVar1, bs = "cs") +
      s(SmoothVar2, bs = "cs") +
      s(SmoothVar3, bs = "cs") +
      as.factor(FixedVar),
    family = binomial(link = "logit"), method = "REML",
    data = dat)
I tried thinning my data (using only every 3rd data point from consecutive scans), but found that this left too little data to detect effects, given my relatively small sample size (only 42 data points left after thinning).
I also tried using the prior value of the binomial response variable as a factor in the model to account for the autocorrelation. This did appear to resolve the residual autocorrelation (based on the updated ACF/PACF plots), but it doesn't feel like the most elegant way to do so and I worry this added variable might be adjusting for more than just the autocorrelation (though it was not collinear with the other explanatory variables; VIF < 2).
I would use bam() for this. You don't need big data to fit a model with bam(); you just lose some of the guarantees about convergence that you get with gam(). bam() will fit a GEE-like model with an AR(1) working correlation matrix, but you need to specify the AR parameter via rho. This only works for non-Gaussian families if you also set discrete = TRUE when fitting the model.
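For example, a minimal sketch of what that might look like here (untested; the rho value is a placeholder you would estimate from the ACF of a rho = 0 fit, and ar_start is an assumed logical column that is TRUE at the first scan of each uninterrupted series):
library(mgcv)
m <- bam(Response ~
           s(Day, bs = "re") +
           s(SmoothVar1, bs = "cs") +
           s(SmoothVar2, bs = "cs") +
           s(SmoothVar3, bs = "cs") +
           as.factor(FixedVar),
         family = binomial(link = "logit"),
         rho = 0.6,                # placeholder AR(1) parameter
         AR.start = dat$ar_start,  # TRUE at the first scan of each series (assumed column)
         discrete = TRUE,          # needed for AR(1) with non-Gaussian families
         data = dat)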
You could use gamm() with family = binomial(), but this uses PQL to estimate the GLMM version of the GAMM, and if your binomial counts are low that method isn't very good.

Running a glmer model with only random factors to get an estimate

I have run the following GLMM using the mixed function (from the afex package) to estimate the paternity success of two different types of sires in different ecological scenarios. The predicted outcome is the relative number of offspring from each of the sires. I would like to know whether sex ratio and density contribute to differential paternity success.
I used the following model to estimate this:
sex.model.3 <- mixed(cbind(Ive, IV) ~ log2ratio + Total + (1 | VialID) + (1 | Batch),
                     method = "LRT", data = sexratioSO, family = binomial())
summary(sex.model.3)
sex.model.3$anova_table
Now, I would like to get estimates for each of these different sex ratios, so that I can plot them, with relative fitness (calculated from the estimate) on the y axis against sex ratio on the x axis.
So, for this I need the estimates specific to each sex ratio. I subsetted the data according to one of the sex ratios using the following code:
unique(sexratioSO$ratio)  # choosing the first sex ratio to subset
data <- subset(sexratioSO, ratio == "2:01")  # repeat for every sex ratio in the data file
I then ran the null model with only my random factors to get an estimate of the relative number of offspring for that sex ratio, so that I can plot it (I would like to do the same for all the sex ratios):
sex.ratio.estimate <- lmer(cbind(Ive, IV) ~ (1 | VialID) + (1 | Batch), data = data,
                           control = lmerControl(check.nobs.vs.nlev = "ignore",
                                                 check.nobs.vs.rankZ = "ignore",
                                                 check.nobs.vs.nRE = "ignore"))
However, I get this error -
Error in v/v.e : non-conformable arrays
FYI - "VialID" and "Batch" are factors since they are taken are random effects.
You can find the data file here.
It would be great if I can have some help on how I can obtain estimates for each of the sex ratios to plot them.
Many thanks.
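One observation (my reading, not from the original thread): lmer() does not accept a two-column cbind() response, which is what triggers the non-conformable arrays error; an intercept-only binomial fit per subset would instead use glmer(). A minimal sketch:
library(lme4)
sex.ratio.estimate <- glmer(cbind(Ive, IV) ~ (1 | VialID) + (1 | Batch),
                            data = data, family = binomial())
fixef(sex.ratio.estimate)  # intercept on the logit scale for this sex ratio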

How can I get the null deviance of a glmer() model?

Is there a way to get the null deviance and df for a generalized linear mixed model fit with glmer()? Is there a reason that this is not included in the summary() output, the way that it is with a glm() object?
You can compute the null deviance by re-fitting the model with an intercept term only, e.g.
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)
gm0 <- update(gm1, . ~ 1 + (1 | herd))
deviance(gm1)  ## 73.47
deviance(gm0)  ## 92.42 (null deviance)
I'm not sure what you mean by the "null df" for the GLMM; the 'denominator degree of freedom' measure of effective sample size that works perfectly for balanced ANOVAs and questionably for linear mixed models [via inclusion/exclusion, Satterthwaite, Kenward-Roger, etc.] is hard to define for GLMMs.
I can think of a couple of reasons that lme4 doesn't automatically do this computation for you:
it could be an expensive re-fit (even for GLMs it does require refitting the model, see here for the code in glm that does it)
it's less obvious for GLMMs what the appropriate null model for comparison is. Do you remove both random and fixed effects and reduce the model to a GLM? Do you keep all of the random effects, or only intercept-level random effects, or some other mixture depending on the context of the question? Making the user do it themselves forces them to make this choice.
(That said, I don't believe that omitting the null deviance was an explicit choice.)
If you do choose to discard all of the random effects (i.e. comparing to deviance(glm(cbind(incidence, size - incidence) ~ period, data = cbpp, family = binomial)) in the example above), you should be able to make a meaningful comparison with a glmer fit, but there are some subtleties: you might want to read the section on deviance and log-likelihood of GLMMs in ?deviance.merMod.
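For concreteness, a short sketch of that fixed-effects-only comparison, using the same cbpp example:
glm0 <- glm(cbind(incidence, size - incidence) ~ period,
            data = cbpp, family = binomial)
deviance(glm0)  # deviance of the GLM without random effects, for comparison with gm1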

How to specify random effects names in a newdata data.frame used in predict() function? - lme4

I have a problem using the predict() function in lme4.
More precisely, I am not very clear on how to declare the names of random effects to be used in the newdata data frame, which I feed the predict() function with, in order to get some predictions.
I will try and describe my problem in detail.
Data
The data I am working with are longitudinal. I have 119 observational units, with several (6-7) measurements each, representing the size of proteins, which aggregate over time and grow bigger (let's call the response LDL).
Model
The model used to describe this process is a Richards curve (generalized logistic function), which, matching the model function in the code below, can be written as
LDL(t) = 15 + (alpha - 15) / (1 + exp((gamma - t) / delta))
Now, I fit a separate curve to each observation's group of measurements, with the following fixed effects, random effects, and variables:
alpha_fix - a fixed effect for alpha
alpha|Obs - a random effect for alpha, which varies among observations
gamma_fix - a fixed effect for gamma
gamma|Obs - a random effect for gamma, which varies among observations
delta_f - a fixed effect
Time - a continuous variable, time in hours
LDL - response variable, continuous, representing size of proteins at time point t.
Predictions
Once I fit the model, I want to use it to predict the value of LDL at a specific time point, for each observation. In order to do this, I need to use the predict function and supply a data frame as newdata. Reading through the documentation here, it says the following:
If any random effects are included in re.form (see below), newdata must contain columns corresponding to all of the grouping variables and random effects used in the original model, even if not all are used in prediction; however, they can be safely set to NA in this case.
Now, the way I understand this, I need a newdata data frame which in my case contains the following columns: "Time", "Obs", "alpha_f", "gamma_f", "delta_f", as well as two columns for the random effects of alpha and gamma, respectively. However, I am not sure how these two random-effect columns should be named for the predict() function to understand them. I tried "alpha|Obs" and "gamma|Obs", as well as "Obs$alpha" and "Obs$gamma", but both throw the error
Error in FUN(X[[i]], ...) : random effects specified in re.form
that were not present in original model.
I was wondering whether anyone has any idea what the correct way to do this is.
For completeness, the code used to fit the model is provided below:
ModelFunction <- function(alpha, gamma, delta, Time) {
  15 + (alpha - 15) / (1 + exp((gamma - Time) / delta))
}
ModelGradient <- deriv(body(ModelFunction)[[2]],
                       namevec = c("alpha", "gamma", "delta"),
                       function.arg = ModelFunction)
starting_conditions <- c(alpha = 5000, gamma = 1.5, delta = 0.2)  # based on visual observation
fit <- nlmer(
  LDL ~ ModelGradient(alpha, gamma, delta, Time) ~ (gamma | Obs) + (alpha | Obs),
  start = starting_conditions,
  control = nlmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 100000)),
  data = ldlData)
I would really appreciate it if someone could give me some advice.
Thanks for your time!
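A note on the newdata layout, based on my reading of ?predict.merMod (and assuming predict() accepts newdata for this nlmer fit): the data frame needs columns only for the covariates and grouping variables appearing in the model, here Time and Obs; there are no separately named columns for the random effects themselves. A sketch:
newdat <- data.frame(Time = 5, Obs = "Obs1")  # "Obs1" is a hypothetical level of Obs
predict(fit, newdata = newdat)                # conditional on the Obs random effects
predict(fit, newdata = newdat, re.form = NA)  # population-level prediction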

Modeling a random-effects component using the lme4 package

I would like to estimate a generalized linear mixed-effects model using the glmer.nb function in R's lme4 package. I have panel data of various crime outcomes. My cross-sectional unit is the "precinct" (over 40 precincts) and I observe crime in those precincts over many months. I am evaluating an intervention that 'turns on/off' (dummy coded) over the month-years. I include "precinct" and "month" fixed effects (i.e., a full set of precinct and month dummies enter the model). I have only one independent variable I am assessing. The second model, fit with glmer.nb, is the one that returns an error.
# How the two-way fixed-effects model is specified (works well)
model_fe <- glm.nb(crime_counts ~ as.factor(precinct) + as.factor(month_year) + policy,
                   data = df)
# Modeling "precinct" as the random effect (fails)
model_re <- glmer.nb(crime_counts ~ (1 | precinct) + as.factor(month_year) + policy,
                     data = df)
The error returned is shown below...
failure to converge in 10000 evaluations
Model failed to converge with max|grad| = 0.00295777 (tol = 0.001, component 1)
iteration limit reached
In sum, I would like to specify precincts as the random intercept component. I also tried to specify precinct as a factor variable but that did not help. Any ideas? I am new to this package.
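A hedged sketch of common convergence remedies, not a guaranteed fix: raise the evaluation limit and/or switch the optimizer via glmerControl(), which glmer.nb passes through to glmer.
library(lme4)
model_re <- glmer.nb(crime_counts ~ (1 | precinct) + as.factor(month_year) + policy,
                     data = df,
                     control = glmerControl(optimizer = "bobyqa",
                                            optCtrl = list(maxfun = 2e5)))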
