Using glmer.nb(), the error "(maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate" is returned

When using glmer.nb, we get the following error message:
> glm1 <- glmer.nb(Jul ~ scale(A7) + Maylg + (1 | Year), data = bph.df)
Error: (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
In addition: Warning message:
In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace > :
iteration limit reached
Can anyone help? Thanks very much! My data are listed below.
Year Jul A7 Maylg L7b
331 1978 1948 6 1.322219 4
343 1979 8140 32 2.678518 2
355 1980 106896 26 2.267172 2
367 1981 36227 25 4.028205 2
379 1982 19085 18 2.752816 2
391 1983 26010 32 2.086360 3
403 1984 1959 1 2.506505 4
415 1985 8025 18 2.656098 0
427 1986 9780 20 1.939519 0
439 1987 48235 29 4.093912 1
451 1988 7473 30 2.974972 2
463 1989 2850 25 2.107210 2
475 1990 10555 18 2.557507 3
487 1991 70217 30 4.843563 0
499 1992 2350 31 1.886491 2
511 1993 3363 32 2.956649 4
523 1994 5140 37 1.934498 4
535 1995 14210 36 2.492760 1
547 1996 3644 27 1.886491 1
559 1997 9828 29 1.653213 1
571 1998 3119 41 2.535294 4
583 1999 5382 10 2.472756 3
595 2000 690 5 1.886491 2
607 2001 871 13 NA 2
619 2002 12394 27 0.845098 5
631 2003 4473 36 1.342423 2

You're going to have a lot of problems with this data set, among other things, because you have an observation-level random effect (you only have one data point per Year) and are trying to fit a negative binomial model. That essentially means you're trying to fit the overdispersion in two different ways at the same time.
If you fit the Poisson model, you can see that the results are strongly underdispersed (for a Poisson model, the residual deviance should be approximately equal to the residual degrees of freedom).
library("lme4")
glm0 <- glmer(Jul ~ scale(A7) + Maylg + (1 | Year), data = bph.df,
              family = "poisson")
print(glm0)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: poisson ( log )
Formula: Jul ~ scale(A7) + Maylg + (1 | Year)
Data: bph.df
AIC BIC logLik deviance df.resid
526.4904 531.3659 -259.2452 518.4904 21
Random effects:
Groups Name Std.Dev.
Year (Intercept) 0.9555
Number of obs: 25, groups: Year, 25
Fixed Effects:
(Intercept) scale(A7) Maylg
7.3471 0.3363 0.6732
deviance(glm0)/df.residual(glm0)
## [1] 0.0003479596
Or alternatively:
library("aods3")
gof(glm0)
## D = 0.0073, df = 21, P(>D) = 1
## X2 = 0.0073, df = 21, P(>X2) = 1
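Incidentally (my suggestion, not part of the original answer): since the per-Year intercept already absorbs the extra variation, and there is only one observation per Year, you could drop the observation-level random effect entirely and fit a plain negative binomial with MASS::glm.nb, which sidesteps the glmer.nb machinery:
library("MASS")
## negative binomial with fixed effects only; with one observation per Year,
## a per-Year random intercept is not identifiable alongside the NB theta
glm.nb(Jul ~ scale(A7) + Maylg, data = bph.df)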
glmmADMB does manage to fit it, but I don't know how far I would trust the results (the dispersion parameter is very large, indicating that the model has basically converged to a Poisson distribution anyway).
bph.df <- na.omit(transform(bph.df, Year = factor(Year)))
library("glmmADMB")
glmmadmb(Jul ~ scale(A7) + Maylg + (1 | Year), data = bph.df,
         family = "nbinom")
GLMM's in R powered by AD Model Builder:
Family: nbinom
alpha = 403.43
link = log
Fixed effects:
Log-likelihood: -259.25
AIC: 528.5
Formula: Jul ~ scale(A7) + Maylg + (1 | Year)
(Intercept) scale(A7) Maylg
7.3628472 0.3348105 0.6731953
Random effects:
Structure: Diagonal matrix
Group=Year
Variance StdDev
(Intercept) 0.9105 0.9542
Number of observations: total=25, Year=25
The results are essentially identical to the Poisson model from lme4 above.

Related

Prediction in a linear mixed model in R

Consider the sleepstudy data in the lme4 package, shown below. The data contain 18 subjects with repeated measurements of Reaction (the response) taken on different days.
library("lme4")
head(sleepstudy)
Reaction Days Subject
1 249.5600 0 308
2 258.7047 1 308
3 250.8006 2 308
4 321.4398 3 308
5 356.8519 4 308
6 414.6901 5 308
The following code fits a linear mixed model with a random intercept.
fit1 = lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
We can obtain the subject-specific random intercepts using "ranef(fit1)". Also, "predict(fit1)" gives predictions of the response at all the time points in the original data.
However, I would like to predict the response (Reaction) for the 18 subjects at Days 12 and 14 (days that are not in the original data, but for which I would like predictions of Reaction).
That is, I should end up with a dataset that looks like this.
Days Subject Predicted_Response
12 308
12 309
...
12 371
12 372
14 308
14 309
...
14 371
14 372
We can accomplish this with the "newdata" argument of the predict method:
library("lme4")
fit1 = lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
newdata <- expand.grid(
  Days = c(12, 14),
  Subject = unique(sleepstudy$Subject)
)
newdata$Predicted_Response <- predict(fit1, newdata = newdata)
newdata
Days Subject Predicted_Response
1 12 308 417.7962
2 14 308 438.7308
3 12 309 299.1630
4 14 309 320.0976
5 12 310 313.9040
6 14 310 334.8385
7 12 330 381.4190
8 14 330 402.3536
9 12 331 387.2287
10 14 331 408.1633
11 12 332 385.2338
12 14 332 406.1683
... etc ...
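Note that these predictions incorporate each subject's estimated random intercept. If you instead want population-level predictions that ignore the subject effects, predict.merMod in recent lme4 versions takes a re.form argument (shown here as an optional extra, not part of the original answer):
## population-level predictions: random effects set to zero
newdata$Pop_Response <- predict(fit1, newdata = newdata, re.form = NA)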

MICE package in R: passive imputation

I aim to handle missing values with multiple imputation and then analyse the data with a linear mixed model.
I am stuck on the passive imputation of "BMI" (body mass index) and "BMI category". "BMI" is calculated from height and weight and then categorized into "BMI category".
How do I impute 'BMI category'?
The database looks like below:
sub_eu_surf[1:5, 3:12]
age gender smoking exercise education sbp dbp height weight bmi
1 41 1 1 2 18 120 80 185 107 31.26370
2 46 1 3 2 18 130 70 182 102 30.79338
3 46 1 3 2 18 130 70 182 102 30.79338
4 47 1 1 2 14 130 80 178 78 24.61810
5 47 1 1 1 14 150 80 175 85 27.75510
Since 'bmi category' is not a predictor in my imputation, I decided to create it after imputation. The details are below:
1. Define the method and predictor matrix
ini <- mice(sub_eu_surf, maxit = 0)
meth <- ini$meth
meth["bmi"] <- "~I(weight/(height/100)^2)"
pred <- ini$predictorMatrix
pred[c("pm25_global", "pm25_eu", "pm10_eu", "no2_eu"), ] <- 0
pred[, c("bmi", "hba1c", "pm25_eu", "pm10_eu")] <- 0
pred[, "tc"] <- 0
pred[c("smoking", "exercise", "hdl", "glucose"), "tc"] <- 1
pred[c("smoking", "exercise", "hdl", "glucose"), "ldl"] <- 0
vis <- ini$vis
imp_eu <- mice(sub_eu_surf, meth = meth, pred = pred, vis = vis, seed = 200, print = FALSE, m = 5, maxit = 5)
long_eu <- complete(imp_eu, "long", include = TRUE)
long_eu$bmi_category <- cut(as.numeric(long_eu$bmi), breaks = c(0, 18.5, 25, 30, 72))
complete_eu <- as.mids(long_eu)
But I received an error when analyzing my data:
test1 <- with(imp_eu, lme(sbp ~ pm25_global + gender + age + education + bmi_category, random = ~1 | centre))
Error in eval(expr, envir, enclos) : object 'bmi_category' not found
Why does this happen?
You are running your analyses on the original mids object imp_eu, not on the modified complete_eu. Try:
test1 <- with(complete_eu, lme(sbp ~ pm25_global + gender + age + education + bmi_category, random = ~1 | centre))
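Once the with() call runs on complete_eu, the per-imputation fits can be combined with Rubin's rules in the usual mice way (a standard follow-up, not part of the original answer):
summary(pool(test1))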

Plotting estimated probabilities from binary logistic regression when one or more predictor variables are held constant

I am a biology grad student who has been spinning my wheels for about thirty hours on the following issue. In summary, I would like to plot a figure of estimated probabilities from a glm binary logistic regression model I produced. I have already gone through model selection, validation, etc., and am now simply trying to produce figures. I had no problem plotting probability curves for the model I selected, but what I am really interested in is a figure that shows the probability of a binary outcome across one predictor variable while the other predictor variable is held constant.
I cannot figure out how to assign this constant value to only one of the predictor variables and plot the probability for the other. Ultimately, I would like to produce figures similar to the crude example I attached (desired output). I admit I am a novice in R, and I certainly appreciate folks' time, but I have exhausted online searches and have yet to find the approach adequately explained. This is the closest information related to my question, but I found the explanation vague and it lacks an example of holding one predictor constant while plotting the probability over the other: https://stat.ethz.ch/pipermail/r-help/2010-September/253899.html
Below I provide a simulated dataset and my progress. Thank you very much for your expertise; I believe a solution and code example would be helpful for other ecologists who use logistic regression.
The simulated dataset shows survival outcomes over the winter for lizards. The predictor variables are "mass" and "depth".
x<-read.csv('logreg_example_data.csv',header = T)
x
survival mass depth
1 0 4.294456 262
2 0 8.359857 261
3 0 10.740580 257
4 0 10.740580 257
5 0 6.384678 257
6 0 6.384678 257
7 0 11.596380 270
8 0 11.596380 270
9 0 4.294456 262
10 0 4.294456 262
11 0 8.359857 261
12 0 8.359857 261
13 0 8.359857 261
14 0 7.920406 258
15 0 7.920406 258
16 0 7.920406 261
17 0 10.740580 257
18 0 10.740580 258
19 0 38.824960 262
20 0 9.916840 239
21 1 6.384678 257
22 1 6.384678 257
23 1 11.596380 270
24 1 11.596380 270
25 1 11.596380 270
26 1 23.709520 288
27 1 23.709520 288
28 1 23.709520 288
29 1 38.568970 262
30 1 38.568970 262
31 1 6.581013 295
32 1 6.581013 298
33 1 0.766564 269
34 1 5.440803 262
35 1 5.440803 262
36 1 19.534710 252
37 1 19.534710 259
38 1 8.359857 263
39 1 10.740580 257
40 1 38.824960 264
41 1 38.824960 264
42 1 41.556970 239
# Dataset name is x
# Time to run the glm model
model1 <- glm(formula = survival ~ mass + depth, family = "binomial", data = x)
model1
summary(model1)
# OK, now here's how I predict the probability of a lizard "Bob" surviving
# the winter with a mass of 32.949 grams and a burrow depth of 264 mm
newdata <- data.frame(mass = 32.949, depth = 264)
predict(model1, newdata, type = "response")
# the lizard "Bob" has an 87.3% chance of surviving the winter
# Now let's assume the glm model is robust and the lizard is endangered.
# From all my research, I know the average burrow depth at a national park is 263.9 mm;
# I am also interested in survival probabilities at burrow depths of 200 and 100 mm.
# How do I use the glm model produced above to generate a plot showing the
# probability of lizards surviving at those burrow depths, across a range of
# mass values from 0.0 to 100.0 grams?
# I know I need the plot and predict functions, but I cannot figure out how to
# tell R to predict "survival" from "mass" while the other predictor, "depth",
# is held at constant values of biological relevance.
# I would also like to add dashed lines for the 95% CI.
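A minimal sketch of one way to do this (the 263.9 mm depth comes from the question above; the mass range, grid size, and Wald-style interval are my assumptions): predict on the link scale with se.fit = TRUE, build the interval there, and back-transform with plogis().
## predicted survival across mass, with depth held at the park average
mass_seq <- seq(0, 100, length.out = 200)
newdat <- data.frame(mass = mass_seq, depth = 263.9)
## predict on the link (logit) scale so the CI can be back-transformed
pr <- predict(model1, newdata = newdat, type = "link", se.fit = TRUE)
newdat$fit <- plogis(pr$fit)
newdat$lwr <- plogis(pr$fit - 1.96 * pr$se.fit)
newdat$upr <- plogis(pr$fit + 1.96 * pr$se.fit)
plot(fit ~ mass, data = newdat, type = "l", ylim = c(0, 1),
     xlab = "Mass (g)", ylab = "Estimated probability of survival")
lines(lwr ~ mass, data = newdat, lty = 2)  # dashed 95% CI
lines(upr ~ mass, data = newdat, lty = 2)
## repeat with depth = 200 and depth = 100 for the other scenarios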

How to pass a specific column of a data frame to a function in a for-loop, while preventing the output from being named after the iterator

Given the following example:
library(metafor)
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg, data = dat.bcg, append = TRUE)
dat
trial author year tpos tneg cpos cneg ablat alloc yi vi
1 1 Aronson 1948 4 119 11 128 44 random -0.8893 0.3256
2 2 Ferguson & Simes 1949 6 300 29 274 55 random -1.5854 0.1946
3 3 Rosenthal et al 1960 3 228 11 209 42 random -1.3481 0.4154
4 4 Hart & Sutherland 1977 62 13536 248 12619 52 random -1.4416 0.0200
5 5 Frimodt-Moller et al 1973 33 5036 47 5761 13 alternate -0.2175 0.0512
6 6 Stein & Aronson 1953 NA NA NA NA 44 alternate NA NA
7 7 Vandiviere et al 1973 8 2537 10 619 19 random -1.6209 0.2230
8 8 TPT Madras 1980 505 87886 499 87892 NA random 0.0120 0.0040
9 9 Coetzee & Berjak 1968 29 7470 45 7232 27 random -0.4694 0.0564
10 10 Rosenthal et al 1961 17 1699 65 1600 42 systematic -1.3713 0.0730
11 11 Comstock et al 1974 186 50448 141 27197 18 systematic -0.3394 0.0124
12 12 Comstock & Webster 1969 5 2493 3 2338 33 systematic 0.4459 0.5325
13 13 Comstock et al 1976 27 16886 29 17825 33 systematic -0.0173 0.0714
Now what I basically want is to iterate the rma() command (varying only the mods argument) over columns - let's say - [7:8], and to store each result in a variable named after the column.
Two problems:
1) When I enter the command:
rma(yi, vi, data = dat, mods = ~dat[[8]], subset = (alloc == "systematic"), knha = TRUE)
the moderator is labelled dat[[8]], but I want it to be labelled with the column name (i.e. colnames(dat[i])).
Model Results:
estimate se tval pval ci.lb ci.ub
intrcpt 0.5543 1.4045 0.3947 0.7312 -5.4888 6.5975
dat[[8]] -0.0312 0.0435 -0.7172 0.5477 -0.2185 0.1560
2) Now imagine that I have many more columns and I want to iterate from [8:53], such that each result gets stored in a variable named after the column.
Problem 2) has been solved:
for (i in 7:8) {
  assign(paste(colnames(dat[i]), i, sep = ""),
         rma(yi, vi, data = dat, mods = ~dat[[i]],
             subset = (alloc == "systematic"), knha = TRUE))
}
To answer the first part of your question: you can change the names by accessing the attributes of the model object. In this case:
# 'model' is one fitted rma object, e.g. the result of the rma() call above
# inspect the attributes
attr(model$vb, which = "dimnames")
# assign the name
attr(model$vb, which = "dimnames")[[1]][2] <- colnames(dat)[8]
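Putting the two pieces together (a sketch using the question's 7:8 range; 'fit' is just a loop-local name), the loop can relabel the moderator before storing each model under its column name:
for (i in 7:8) {
  fit <- rma(yi, vi, data = dat, mods = ~dat[[i]],
             subset = (alloc == "systematic"), knha = TRUE)
  ## overwrite the generic "dat[[i]]" label with the real column name
  attr(fit$vb, which = "dimnames")[[1]][2] <- colnames(dat)[i]
  assign(colnames(dat)[i], fit)
}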

Predicting/imputing the missing values of a Poisson GLM Regression in R?

I'm trying to explore ways of imputing missing values in a data set. My dataset contains counts of occurrences (Unnatural, Natural, and the sum Total) by Year (2001-2009), Month (1-12), Gender (M/F), and AgeGroup (4 groups).
One of the imputation techniques I'm exploring is (Poisson) regression imputation.
Say my data looks like this:
Year Month Gender AgeGroup Unnatural Natural Total
569 2006 5 Male 15up 278 820 1098
570 2006 6 Male 15up 273 851 1124
571 2006 7 Male 15up 304 933 1237
572 2006 8 Male 15up 296 1064 1360
573 2006 9 Male 15up 298 899 1197
574 2006 10 Male 15up 271 819 1090
575 2006 11 Male 15up 251 764 1015
576 2006 12 Male 15up 345 792 1137
577 2007 1 Female 0 NA NA NA
578 2007 2 Female 0 NA NA NA
579 2007 3 Female 0 NA NA NA
580 2007 4 Female 0 NA NA NA
581 2007 5 Female 0 NA NA NA
...
After fitting a basic GLM, 96 observations were deleted due to missingness.
Is there a way/package/function in R that will use the coefficients of this GLM model to 'predict' (i.e. impute) the missing values of Total (even if it just stores them in a separate dataframe - I will use Excel to merge them)? I know I can use the coefficients to predict the different hierarchical rows by hand, but this would take forever. Hopefully there's a one-step function/method.
Call:
glm(formula = Total ~ Year + Month + Gender + AgeGroup, family = poisson)
Deviance Residuals:
Min 1Q Median 3Q Max
-13.85467 -1.13541 -0.04279 1.07133 10.33728
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 13.3433865 1.7541626 7.607 2.81e-14 ***
Year -0.0047630 0.0008750 -5.443 5.23e-08 ***
Month 0.0134598 0.0006671 20.178 < 2e-16 ***
GenderMale 0.2265806 0.0046320 48.916 < 2e-16 ***
AgeGroup01-4 -1.4608048 0.0224708 -65.009 < 2e-16 ***
AgeGroup05-14 -1.7247276 0.0250743 -68.785 < 2e-16 ***
AgeGroup15up 2.8062812 0.0100424 279.444 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 403283.7 on 767 degrees of freedom
Residual deviance: 4588.5 on 761 degrees of freedom
(96 observations deleted due to missingness)
AIC: 8986.8
Number of Fisher Scoring iterations: 4
First, be very careful about the missing-at-random assumption. In your example, missingness appears to co-occur with Female and AgeGroup; you should test whether missingness is related to any predictors (or whether any predictors are themselves missing). If so, the responses could be skewed.
Second, the function you are seeking is likely predict, which can take a glm model; see ?predict.glm for more guidance. You may want to fit a cascade of models (i.e. nested models) to address missing values.
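For example (a minimal sketch assuming the data frame is called dat; column names as in the output above):
## refit the model, then predict Total for the rows where it is missing
fit <- glm(Total ~ Year + Month + Gender + AgeGroup,
           family = poisson, data = dat)
miss <- is.na(dat$Total)
dat$Total_imputed <- dat$Total
dat$Total_imputed[miss] <- predict(fit, newdata = dat[miss, ], type = "response")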
The mice package provides a function of the same name that allows each missing value to be predicted using a regression scheme based on the other values. It can cope with predictors also being missing because it uses an iterative MCMC algorithm.
I don't think Poisson regression is an option in mice, but if all of your counts are as large as in the example, normal regression should offer a reasonable approximation.
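A minimal sketch of the mice route (dat, m, and seed are illustrative; method = "norm" is Bayesian linear regression, matching the normal-approximation suggestion above):
library("mice")
imp <- mice(dat, method = "norm", m = 5, seed = 123, printFlag = FALSE)
completed <- complete(imp, 1)  # first of the m completed data sets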