Using confint in R on dataset with NAs

For a null glmer() model, I would like to calculate the 95% CI of the intercept using the confint() function in R, on a dataset that contains NAs. Below is the model summary:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(df$Valid.detections, df$Missed.detections) ~ 1 + (1 | SUR.ID) + (1 | Unit)
Data: df
Control: glmerControl(calc.derivs = F, optCtrl = list(maxfun = 20000))
AIC BIC logLik deviance df.resid
21286.9 21305.4 -10640.4 21280.9 3549
Scaled residuals:
Min 1Q Median 3Q Max
-0.40089 -0.39994 -0.00010 0.02841 0.56340
Random effects:
Groups Name Variance Std.Dev.
Unit (Intercept) 2.237e+01 4.729e+00
SUR.ID (Intercept) 1.883e-10 1.372e-05
Number of obs: 3552, groups: Unit, 3552; SUR.ID, 20
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.07468 0.08331 -24.9 <2e-16 ***
However, when I try to calculate 95% CIs for the intercept it returns an error:
Computing bootstrap confidence intervals ...
Error in if (const(t, min(1e-08, mean(t, na.rm = TRUE)/1e+06))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In bootMer(object, bootFun, nsim = nsim, ...) :
some bootstrap runs failed (200/200)
Timing stopped at: 42.47 0.289 45.836
I googled the error and warning messages for a solution and found this thread http://r-sig-mixed-models.r-project.narkive.com/3vst8TmK/r-sig-me-confint-mermod-method-boot-throws-error, in which Ben Bolker suggested that one way to work around this issue is to remove the rows with NAs before using the confint() function. I tried this, and no errors or warnings were returned, but I found that the calculated 95% CIs do not envelop the intercept estimate.
> c0
2.5 % 97.5 %
-3.129255 -2.931859
The calculated CIs do envelop the intercept estimate of the null model from which the NAs were excluded before using confint(); however, I need to keep the NAs in, if possible. Any suggestions on how this could be done?
Thank you for any help.
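For reference, a minimal sketch of the NA-removal workaround described above, assuming the variable names shown in the model summary (which columns actually contain the NAs depends on the data, so the complete.cases() subset below is a hypothetical choice to adjust):
library(lme4)
## keep only rows that are complete in the model variables (hypothetical subset)
df_complete <- df[complete.cases(df[, c("Valid.detections", "Missed.detections",
                                        "SUR.ID", "Unit")]), ]
m0 <- glmer(cbind(Valid.detections, Missed.detections) ~ 1 + (1 | SUR.ID) + (1 | Unit),
            family = binomial, data = df_complete,
            control = glmerControl(calc.derivs = FALSE, optCtrl = list(maxfun = 20000)))
c0 <- confint(m0, method = "boot", nsim = 200)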

Related

lme4 1.1-27.1 error: pwrssUpdate did not converge in (maxit) iterations

Sorry that this error has been discussed before; each answer on Stack Overflow seems specific to the data.
I'm attempting to run the following negative binomial model in lme4:
Model5.binomial <- glmer.nb(countvariable ~ waves + var1 + dummycodedvar2 + dummycodedvar3 + (1 | record_id), data = datadfomit)
However, I receive the following error when attempting to run the model:
Error in f_refitNB(lastfit, theta = exp(t), control = control) : pwrssUpdate did not converge in (maxit) iterations
I first ran the model with only 3 predictor variables (waves, var1, dummycodedvar2) and got the same error. But centering the predictors fixed this problem and the model ran fine.
Now, with 4 variables (all centered), I expected the model to run smoothly, but I receive the error again.
Since every answer on this site seems to point towards a problem in the data, data that replicates the problem can be found here:
https://file.io/3vtX9RwMJ6LF
Your response variable has a lot of zeros. You can check this directly; a quick sketch, using the question's own object names:
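## proportion of zero counts in the response
mean(datadfomit$countvariable == 0)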
I would suggest fitting a model that takes account of this, such as a zero-inflated model. The GLMMadaptive package can fit zero-inflated negative binomial mixed effects models:
library(GLMMadaptive)
library(magrittr)  # for the %>% pipe used below
mixed_model(countvariable ~ waves + var1 + dummycodedvar2 + dummycodedvar3,
            random = ~ 1 | record_id, data = data,
            family = zi.negative.binomial(),
            zi_fixed = ~ var1,
            zi_random = ~ 1 | record_id) %>% summary()
Random effects covariance matrix:
StdDev Corr
(Intercept) 0.8029
zi_(Intercept) 1.0607 -0.7287
Fixed effects:
Estimate Std.Err z-value p-value
(Intercept) 1.4923 0.1892 7.8870 < 1e-04
waves -0.0091 0.0366 -0.2492 0.803222
var1 0.2102 0.0950 2.2130 0.026898
dummycodedvar2 -0.6956 0.1702 -4.0870 < 1e-04
dummycodedvar3 -0.1746 0.1523 -1.1468 0.251451
Zero-part coefficients:
Estimate Std.Err z-value p-value
(Intercept) 1.8726 0.1284 14.5856 < 1e-04
var1 -0.3451 0.1041 -3.3139 0.00091993
log(dispersion) parameter:
Estimate Std.Err
0.4942 0.2859
Integration:
method: adaptive Gauss-Hermite quadrature rule
quadrature points: 11
Optimization:
method: hybrid EM and quasi-Newton
converged: TRUE

Non-converging glmmTMB

I am working with a multivariable model with a Gamma distribution, and I would like to use the lme4-style syntax available in glmmTMB. However, I have noticed something strange with my model: apparently it converges easily when I use stats::glm, but fails to converge in the glmmTMB framework. Here is a reproducible example:
library(glmmTMB)
d <- data.frame(gamlss.data::plasma) # Sample dataset
m4.1 <- glm(calories ~ fat * fiber, family = Gamma(link = "log"), data = d)     # Two parameters with interaction
m4.2 <- glmmTMB(calories ~ fat * fiber, family = Gamma(link = "log"), data = d) # Two parameters with interaction
Warning message:
In fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
I guess the solution might lie in the control parameters, but after looking at the troubleshooting vignette, I am not sure where to start.
One solution can be to scale the variables (as long as they are numeric):
d <- data.frame(gamlss.data::plasma) # Sample dataset
m4.1 <- glm(calories ~ fat*fiber, family = Gamma(link = "log"), data = d)
m4.2 <- glmmTMB(calories ~ scale(fat)*scale(fiber), family = Gamma(link = "log"), data = d)
Here, the second model converges fine, whereas it didn't before.
However, note the difference in parameter estimates between the two models:
> summary(m4.1)
Call:
glm(formula = calories ~ fat * fiber, family = Gamma(link = "log"),
data = d)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.42031 -0.07605 -0.00425 0.07011 0.60073
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.120e+00 5.115e-02 119.654 <2e-16 ***
fat 1.412e-02 6.693e-04 21.104 <2e-16 ***
fiber 5.108e-02 3.704e-03 13.789 <2e-16 ***
fat:fiber -4.092e-04 4.476e-05 -9.142 <2e-16 ***
(Dispersion parameter for Gamma family taken to be 0.0177092)
Null deviance: 40.6486 on 314 degrees of freedom
Residual deviance: 5.4494 on 311 degrees of freedom
AIC: 4307.2
Number of Fisher Scoring iterations: 4
______________________________________________________________
> summary(m4.2)
Family: Gamma ( log )
Formula: calories ~ scale(fat) * scale(fiber)
Data: d
AIC BIC logLik deviance df.resid
4307.2 4326.0 -2148.6 4297.2 310
Dispersion estimate for Gamma family (sigma^2): 0.0173
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.458146 0.007736 964.0 <2e-16 ***
scale(fat) 0.300768 0.008122 37.0 <2e-16 ***
scale(fiber) 0.104224 0.007820 13.3 <2e-16 ***
scale(fat):scale(fiber) -0.073786 0.008187 -9.0 <2e-16 ***
This is because the estimates are based on scaled predictors, so they must be interpreted with caution or back-transformed ('un-scaled'). See "Understanding `scale` in R" for what the scale() function does, and "interpretation of scaled regression coefficients..." for a more in-depth discussion of what this means in a model.
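One way to convince yourself that scaling only reparameterizes the model (a sketch, assuming the m4.1 and m4.2 fits from above): fitted values on the response scale should agree closely, even though the coefficients differ.
p1 <- predict(m4.1, type = "response")  # glm fit on raw predictors
p2 <- predict(m4.2, type = "response")  # glmmTMB fit on scaled predictors
all.equal(unname(p1), unname(p2), tolerance = 1e-3)  # expect (approximately) TRUE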
As a last note, the fact that a model converges doesn't mean that it is a good fit.

bootstrapping for lmer with interaction term

I am running a mixed model using lme4 in R:
full_mod3 <- lmer(logcptplus1 ~ logdepth * logcobb + (1 | fyear) + (1 | flocation),
                  data = cpt, REML = TRUE)
summary:
Formula: logcptplus1 ~ logdepth * logcobb + (1 | fyear) + (1 | flocation)
Data: cpt
REML criterion at convergence: 577.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.7797 -0.5431 0.0248 0.6562 2.1733
Random effects:
Groups Name Variance Std.Dev.
fyear (Intercept) 0.2254 0.4748
flocation (Intercept) 0.1557 0.3946
Residual 0.9663 0.9830
Number of obs: 193, groups: fyear, 16; flocation, 16
Fixed effects:
Estimate Std. Error t value
(Intercept) 4.3949 1.2319 3.568
logdepth 0.2681 0.4293 0.625
logcobb -0.7189 0.5955 -1.207
logdepth:logcobb 0.3791 0.2071 1.831
I have used the effects package in R to calculate and extract the 95% confidence intervals and standard errors for the model output, so that I can examine the relationship between the predictor variable of importance (logcobb) and the response variable while holding the secondary predictor variable (logdepth) constant at its median (2.5) in the dataset:
gm <- 4.3949 + 0.2681 * depth_median - 0.7189 * logcobb_range +
      0.3791 * (depth_median * logcobb_range)
ef2 <- effect("logdepth*logcobb", full_mod3,
              xlevels = list(logcobb = seq(log(0.03268), log(0.37980), length.out = 200)))
I have attempted to bootstrap the 95% CIs using code from here. However, I need to calculate the 95% CIs for the median depth (2.5) only. Is there a way to specify this in the confint() call so that I can calculate the CIs needed to visualize the bootstrapped results as in the plot above?
confint(full_mod3,method="boot",nsim=200,boot.type="perc")
You can do this by specifying a custom function:
library(lme4)
?confint.merMod
FUN: bootstrap function; if ‘NULL’, an internal function that returns the fixed-effect parameters as well as the random-effect parameters on the standard deviation/correlation scale will be used. See ‘bootMer’ for details.
So FUN can be a prediction function (?predict.merMod) that uses a newdata argument that varies and fixes appropriate predictor variables.
An example with built-in data (not quite as interesting as yours since there's a single continuous predictor variable, but I think it should illustrate the approach clearly enough):
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
pframe <- data.frame(Days = seq(0, 20, by = 0.5))
## predicted values at population level (re.form = NA)
pfun <- function(fit) {
    predict(fit, newdata = pframe, re.form = NA)
}
set.seed(101)
cc <- confint(fm1, method = "boot", FUN = pfun)
Picture:
par(las = 1, bty = "l")
matplot(pframe$Days, cc, lty = 2, col = 1, type = "l",
        xlab = "Days", ylab = "Reaction")
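Returning to the original model, a sketch of the same idea for full_mod3, assuming the cpt data from the question; depth is held at its median (2.5) and the logcobb grid reuses the range given in the question:
nd <- data.frame(logdepth = 2.5,
                 logcobb = seq(log(0.03268), log(0.37980), length.out = 200))
pfun2 <- function(fit) predict(fit, newdata = nd, re.form = NA)
cc2 <- confint(full_mod3, method = "boot", nsim = 200, FUN = pfun2)
## each row of cc2 is the bootstrap CI for the prediction at one logcobb value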

How do I grab the AR1 estimate and its SE from the gls function in R?

I am attempting to get the lag-one autocorrelation estimate from the gls function (package {nlme}), along with its SE. This is being done on a non-stationary univariate time series. Here is the output:
Generalized least squares fit by REML
Model: y ~ year
Data: tempdata
AIC BIC logLik
51.28921 54.37957 -21.64461
Correlation Structure: AR(1)
Formula: ~1
Parameter estimate(s):
Phi
0.9699799
Coefficients:
Value Std.Error t-value p-value
(Intercept) -1.1952639 3.318268 -0.3602072 0.7234
year -0.2055264 0.183759 -1.1184567 0.2799
Correlation:
(Intr)
year -0.36
Standardized residuals:
Min Q1 Med Q3 Max
-0.12504485 -0.06476076 0.13948378 0.51581993 0.66030397
Residual standard error: 3.473776
Degrees of freedom: 18 total; 16 residual
The Phi coefficient seemed promising, since it appears under the correlation structure in the output:
Correlation Structure: AR(1)
Formula: ~1
Parameter estimate(s):
Phi
0.9699799
but it regularly goes over one, which is not possible for a correlation. Then there is the
Correlation:
(Intr)
Yearctr -0.36
but I was advised that this was likely not a correct estimate for the data (there were multiple test sites so this is just one of the unexpected estimates). Is there a function that outputs an AR1 estimate and its SE (other than arima)?
Sample of autocorrelated data:
library(nlme)
set.seed(29)
y <- diffinv(rnorm(500))
x <- 1:length(y)
gls(y ~ x, correlation = corAR1(form = ~1))
Note: I am comparing arima() to gls() (or another method) to compare AR1 estimates and SEs. I am doing this at my adviser's request.
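For what it's worth, here is a sketch (not from this thread) of two standard nlme accessors that address this, assuming fit holds the gls() fit from the sample code above:
fit <- gls(y ~ x, correlation = corAR1(form = ~1))
## Phi on the natural (correlation) scale
coef(fit$modelStruct$corStruct, unconstrained = FALSE)
## approximate 95% interval for Phi; nlme reports an interval rather than an SE,
## but a rough SE can be backed out of the interval half-width
intervals(fit, which = "var-cov")$corStruct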

Interpreting the output of summary(glmer(...)) in R

I'm an R noob, I hope you can help me:
I'm trying to analyse a dataset in R, but I'm not sure how to interpret the output of summary(glmer(...)) and the documentation isn't a big help:
> data_chosen_stim<-glmer(open_chosen_stim~closed_chosen_stim+day+(1|ID),family=binomial,data=chosenMovement)
> summary(data_chosen_stim)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: open_chosen_stim ~ closed_chosen_stim + day + (1 | ID)
Data: chosenMovement
AIC BIC logLik deviance df.resid
96.7 105.5 -44.4 88.7 62
Scaled residuals:
Min 1Q Median 3Q Max
-1.4062 -1.0749 0.7111 0.8787 1.0223
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 0 0
Number of obs: 66, groups: ID, 35
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4511 0.8715 0.518 0.605
closed_chosen_stim2 0.4783 0.5047 0.948 0.343
day -0.2476 0.5060 -0.489 0.625
Correlation of Fixed Effects:
(Intr) cls__2
clsd_chsn_2 -0.347
day -0.916 0.077
I understand the GLM behind it, but I can't see the weights of the independent variables and their error bounds.
update: weights.merMod already has a type argument ...
I think what you're looking for is weights(object, type = "working").
I believe these are the diagonal elements of W in your notation?
Here's a trivial example that matches up the results of glm and glmer (since the random effect is bogus and gets an estimated variance of zero, the fixed effects, weights, etc. converge to the same values).
Note that the weights() accessor returns the prior weights by default (these are all equal to 1 for the example below).
Example (from ?glm):
d.AD <- data.frame(treatment = gl(3, 3),
                   outcome = gl(3, 1, 9),
                   counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson(),
               data = d.AD)
library(lme4)
d.AD$f <- 1  ## dummy grouping variable
glmer.D93 <- glmer(counts ~ outcome + treatment + (1 | f),
                   family = poisson(),
                   data = d.AD,
                   control = glmerControl(check.nlev.gtr.1 = "ignore"))
Fixed effects and weights are the same:
all.equal(fixef(glmer.D93),coef(glm.D93)) ## TRUE
all.equal(unname(weights(glm.D93, type = "working")),
          weights(glmer.D93, type = "working"),
          tol = 1e-7)  ## TRUE
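Separately, if by "weights of the independent variables and their error bounds" the question meant the fixed-effect coefficients and their uncertainty, a sketch (my reading, not part of the original answer):
fixef(glmer.D93)                     # fixed-effect estimates
sqrt(diag(vcov(glmer.D93)))          # their standard errors
confint(glmer.D93, method = "Wald")  # quick Wald confidence intervals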
