Modeling beta-binomial distributed data using glmmTBM - r

Im trying to fit a mixed effect model to asses for effects upon the rate of germinated polen grains. I started with a binomial distribution with a model structure like this:
glmer(cbind(NGG,NGNG) ~ RH3*Altitude + AbH + Date3 + (1 | Receptor/Code/Plant) +
(1 | Mountain/Community), data=database, family="binomial",
control = glmerControl(optimizer="bobyqa"))
Where NGG is the number of successes (germinated grains per stigma, can vary from 0 to e.g. 55), NGNG the number of failures (non-germinated grains 0 to e.g. 80). The issue is, after seeing the results, data seems to be over-dispersed, as indicated by the function (found in http://rstudio-pubs-static.s3.amazonaws.com/263877_d811720e434d47fb8430b8f0bb7f7da4.html):
overdisp_fun <- function(model) {
vpars <- function(m) {
nrow(m)*(nrow(m)+1)/2
}
model.df <- sum(sapply(VarCorr(model), vpars)) + length(fixef(model))
rdf <- nrow(model.frame(model))-model.df
rp <- residuals(model, type = "pearson") # computes pearson residuals
Pearson.chisq <- sum(rp^2)
prat <- Pearson.chisq/rdf
pval <- pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE)
c(chisq = Pearson.chisq, ratio = prat, rdf = rdf, p = pval)
}
The output was:
chisq = 1.334567e+04, ratio = 1.656201e+00, rdf = 8.058000e+03, p = 3.845911e-268
So I decided to try a beta-binomial in glmmTMB as follows (its important to keep this hierarchical structure):
glmmTMB(cbind(NGG,NGNG) ~ RH3*Altitude + AbH + Date3 + (1 | Receptor/Code/Plant) +
(1 | Mountain/Community), data=database,
family=betabinomial(link = "logit"), na.action = na.omit, weights=NGT)
When I run it.. says:
Error in nlminb(start = par, objective = fn, gradient = gr, control = control$optCtrl) : (converted from warning) NA/NaN function evaluation
Is there something wrong in the model writing? I already checked for posible issues in (http://rstudio-pubs-static.s3.amazonaws.com/263877_d811720e434d47fb8430b8f0bb7f7da4.html) but did not find any solution yet.
thanks

Related

Comparison between configural and metric model (MI) using semTools

I try to analyze for measurement invariance with multi groups (8 groups; group = "stort"). Below you can see my syntax. The wb variables are all ordinal, ranging from 1-5. There are 8 different groups: n1 =233, n2= 832 n3=67 n4=68 n5=530 n6=169 n7=139 n8=108
The following error occurs: If I try to compare the configural and the metric model using semTools (see syntax) I get an error message: Error in A %*% P.inv : requires numeric/complex matrix/vector arguments (also if I use lavTestLRT()). Other model comparisons run without any errors.
My question is, how can I fix this? I appreciate all comments :)
Here, you can find the dataset: https://drive.google.com/drive/folders/1h9hpFoRhz-zphJ3NSfXtDxbWoad3CmQI?usp=sharing
This is my model:
modelwb.1<- 'kaw =~ wb1.1 + wb2.1 + wb3.1invers + wb4.1invers
paw =~ wb5.1 + wb6.1 + wb7.1invers + wb8.1invers
saw =~ wb9.1 + wb10.1 + wb11.1invers + wb12.1invers
kaw ~ paw + saw
paw ~ saw
kaw ~~ paw + saw
paw ~~ saw
'
fitwb.1 <- cfa(modelwb.1, data=df_wide, estimator = "WLSMV")
summary(fitwb.1, fit.measures = TRUE, standardized = TRUE)
#### Configural model
fitwbcon.1 <- cfa(modelwb.1, data = df_wide, group = "stort", estimator = "WLSMV", missing = "pairwise")
summary(fitwbcon.1, fit.measures = TRUE, standardized = TRUE)
#### Metric model
fitwbmet.1 <- cfa(modelwb.1, data = df_wide, group = "stort", estimator = "WLSMV", missing = "pairwise", group.equal = "loadings")
summary(fitwbmet.1, fit.measures = TRUE, standardized = TRUE)
##### Model comparison: configural & metric
x1<- semTools::compareFit(fitwbcon.1, fitwbmet.1)
summary(x1)
##or:
lavTestLRT(fitwbcon.1, fitwbmet.1)

R: cant get a lme{nlme} to fit when using self-constructed interaction variables

I'm trying to get a lme with self constructed interaction variables to fit. I need those for post-hoc analysis.
library(nlme)
# construct fake dataset
obsr <- 100
dist <- rep(rnorm(36), times=obsr)
meth <- dist+rnorm(length(dist), mean=0, sd=0.5); rm(dist)
meth <- meth/dist(range(meth)); meth <- meth-min(meth)
main <- data.frame(meth = meth,
cpgl = as.factor(rep(1:36, times=obsr)),
pbid = as.factor(rep(1:obsr, each=36)),
agem = rep(rnorm(obsr, mean=30, sd=10), each=36),
trma = as.factor(rep(sample(c(TRUE, FALSE), size=obsr, replace=TRUE), each=36)),
depr = as.factor(rep(sample(c(TRUE, FALSE), size=obsr, replace=TRUE), each=36)))
# check if all factor combinations are present
# TRUE for my real dataset; Naturally TRUE for the fake dataset
with(main, all(table(depr, trma, cpgl) >= 1))
# construct interaction variables
main$depr_trma <- interaction(main$depr, main$trma, sep=":", drop=TRUE)
main$depr_cpgl <- interaction(main$depr, main$cpgl, sep=":", drop=TRUE)
main$trma_cpgl <- interaction(main$trma, main$cpgl, sep=":", drop=TRUE)
main$depr_trma_cpgl <- interaction(main$depr, main$trma, main$cpgl, sep=":", drop=TRUE)
# model WITHOUT preconstructed interaction variables
form1 <- list(fixd = meth ~ agem + depr + trma + depr*trma + cpgl +
depr*cpgl +trma*cpgl + depr*trma*cpgl,
rndm = ~ 1 | pbid,
corr = ~ cpgl | pbid)
modl1 <- nlme::lme(fixed=form1[["fixd"]],
random=form1[["rndm"]],
correlation=corCompSymm(form=form1[["corr"]]),
data=main)
# model WITH preconstructed interaction variables
form2 <- list(fixd = meth ~ agem + depr + trma + depr_trma + cpgl +
depr_cpgl + trma_cpgl + depr_trma_cpgl,
rndm = ~ 1 | pbid,
corr = ~ cpgl | pbid)
modl2 <- nlme::lme(fixed=form2[["fixd"]],
random=form2[["rndm"]],
correlation=corCompSymm(form=form2[["corr"]]),
data=main)
The first model fits without any problems whereas the second model gives me following error:
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
Nothing i found out about this error so far helped me to solve the problem. However the solution is probably pretty easy.
Can someone help me? Thanks in advance!
EDIT 1:
When i run:
modl3 <- lm(form1[["fixd"]], data=main)
modl4 <- lm(form2[["fixd"]], data=main)
The summaries reveal that modl4 (with the self constructed interaction variables) in contrast to modl3 shows many more predictors. All those that are in 4 but not in 3 show NA as coefficients. The problem therefore definitely lies within the way i create the interaction variables...
EDIT 2:
In the meantime I created the interaction variables "by hand" (mainly paste() and grepl()) - It seems to work now. However I would still be interested in how i could have realized it by using the interaction() function.
I should have only constructed the largest of the interaction variables (combining all 3 simple variables).
If i do so the model gets fit. The likelihoods then are very close to each other and the number of coefficients matches exactly.

custom split rule with partykit

this post follows this question : https://stackoverflow.com/questions/31234329/rpart-user-defined-implementation
I'm very interested in tools which could handle tree growing with customized criteria, such that I could test different model.
I tried to use the partykit R package to grow a tree for which the split rule is given by the negative log-likelihood of a Cox model (which is log-quasi-likelihood in case of the Cox model) and a Cox model is fitted in each leaf.
As I understood reading the vignette about the MOB function, there are two way to implement my own split criteria, namely to get the fit function return either a list or a model object.
For my purpose, I tried the two solutions but I failed to make it work.
Solution 1 : return a list object :
I take as an example the "breast cancer dataset" as in the "mob" vignette.
I tried this :
cox1 = function(y,x, start = NULL, weights = NULL, offset = NULL, ...,
estfun = FALSE, object = TRUE){
res_cox = coxph(formula = y ~ x )
list(
coefficients = res_cox$coefficients,
objfun = - res_cox$loglik[2],
object = res_cox)
}
mob(formula = Surv(time, cens) ~ horTh + pnodes - 1 | age + tsize + tgrade + progrec +
estrec + menostat ,
data = GBSG2 ,
fit = cox1,
control = mob_control(alpha = 0.0001) )
There is a warning about the singularity of the X matrix, and the mob function a tree with a single node (even with smaller values for alpha).
Note that there is no singularity problem with the X matrix when running the coxph function :
res_cox = coxph( formula = Surv(time, cens) ~ horTh + pnodes ,
data = GBSG2 )
Solution 2 : Return a coxph.object :
I tried this :
cox2 = function(y,x, start = NULL, weights = NULL, offset = NULL, ... ){
res_cox = coxph(formula = y ~ x )
}
logLik.cox2 <- function(object, ...)
structure( - object$loglik[2], class = "logLik")
mob(formula = Surv(time, cens) ~ horTh + pnodes - 1 | age + tsize + tgrade + progrec +
estrec + menostat ,
data = GBSG2 ,
fit = cox2,
control = mob_control(alpha = 0.0001 ) )
So this time I get a split along the "progrec" variable :
Model-based recursive partitioning (cox2)
Model formula:
Surv(time, cens) ~ horTh + pnodes - 1 | age + tsize + tgrade +
progrec + estrec + menostat
Fitted party:
[1] root
| [2] progrec <= 21: n = 281
| xhorThno xhorThyes xpnodes
| 0.19306661 NA 0.07832756
| [3] progrec > 21: n = 405
| xhorThno xhorThyes xpnodes
| 0.64810352 NA 0.04482348
Number of inner nodes: 1
Number of terminal nodes: 2
Number of parameters per node: 3
Objective function: 1531.132
Warning message:
In coxph(formula = y ~ x) : X matrix deemed to be singular; variable 2
I would like to know what's wrong with my Solution 1.
I also tried a similar thing for a regression problem and get the same result, ending with a single leaf :
data("BostonHousing", package = "mlbench")
BostonHousing <- transform(BostonHousing,
chas = factor(chas, levels = 0:1, labels = c("no", "yes")),
rad = factor(rad, ordered = TRUE))
linear_reg = function(y,x, start = NULL, weights = NULL, offset = NULL, ...,
estfun = FALSE, object = TRUE){
res_lm = glm(formula = y ~ x , family = "gaussian")
list(
coefficients = res_lm$coefficients,
objfun = res_lm$deviance,
object = res_lm )
}
mob( formula = medv ~ log(lstat) + I(rm^2) | zn + indus + chas + nox +
+ age + dis + rad + tax + crim + b + ptratio,
data = BostonHousing ,
fit = linear_reg)
Also I would like to know if there is no problem using a variable for both "fit the model in a node" and "make a split".
Thank you in advance.
I will probably have other questions about partykit functioning.
The problem with the cox1() and linear_reg() functions you have set up are that you do not supply the estimating functions aka score contributions. As these are the basis for the inference that selects the splitting variable, the algorithm does not split at all if these are not provided. See this recent answer for some discussion of this issues.
But for coxph() objects (unlike the fitdistr() example in the discussion linked above) it is very easy to obtain these estimating functions or scores because there is an estfun() method available. So your cox2() approach is the easier route to go here.
The reason that the latter doesn't work correctly is due to the special handling of intercepts in coxph(). Internally, this always forces the intercept into the model but then omits the first column from the design matrix. When interfacing this through mob() you need to be careful not to mess this up because mob() sets up its own model matrix. And because you exclude the intercept, mob() thinks that it can estimate both levels of horTh. But this is not the case because the intercept is not identified in the Cox-PH model.
The best solution in this case (IMO) is the following: You let mob() set up an intercept but then exclude it again when passing the model matrix to coxph(). Because there are coef(), logLik(), and estfun() methods for the resulting objects, one can use the simple setup of your cox2() function.
Packages and data:
library("partykit")
library("survival")
data("GBSG2", package = "TH.data")
Fitting function:
cox <- function(y, x, start = NULL, weights = NULL, offset = NULL, ... ) {
x <- x[, -1]
coxph(formula = y ~ 0 + x)
}
Fitting of the MOB tree to the GBSG2 data:
mb <- mob(formula = Surv(time, cens) ~ horTh + pnodes | age + tsize + tgrade + progrec + estrec + menostat,
data = GBSG2, fit = cox)
mb
## Model-based recursive partitioning (cox)
##
## Model formula:
## Surv(time, cens) ~ horTh + pnodes | age + tsize + tgrade + progrec +
## estrec + menostat
##
## Fitted party:
## [1] root: n = 686
## xhorThyes xpnodes
## -0.35701115 0.05768026
##
## Number of inner nodes: 0
## Number of terminal nodes: 1
## Number of parameters per node: 2
## Objective function: 1758.86

correlation in multivariate mixed model in r

I am running multivariate mixed model in R by using nlme package. Suppose that x and y are responses variables for longitudinal data which assumed that the error within group is correlated. The residual error matrix is presented as:
So my question is how to involve the correlation into lme function?
I tried commands corr = corComSymm(from =~ 1 | x) or corr = corAR1(from =~ 1 | x) but did not work!
here en example:
# visiting time by months
time = rep(c(0,3,6,9),time = 4, 200)
# subjects
subject = rep(1:50, each = 4)
# first response variable "identity"
x = c(rep(0, 100), rep(1,100))
# second response variable "identity"
y = c(rep(1, 100), rep(0,100))
# values of both reponses variables (x_1, x_2)
value = c(rnorm(100,20,1),rnorm(100,48,1))
# variables refer to reponses variables (x_1, x_2)
variable = factor(c(rep(0,150),rep(1,50)), label=c("X","Y"))
df = data.frame(subject , time, x,y,value, variable)
library(nlme)
# fit the model that each response variable has intercept and slope (time) for each random and fixed effects
# as well as fixed effects slopes for sex and lesion, and each response has different variance
f= lme(value ~ -1 + x + y + x:time + y:time , random = ~ -1 + (x + y) + time:( x + y)|subject ,
weights = varIdent(form=~1| x),corr = corAR1(from = ~ 1|x), control=lmeControl(opt="optim"), data =df)
Error in corAR1(from = ~1 | x) : unused argument (from = ~1 | x)
Any suggestions?
I found this website (below) which helpful and useful, I posted here in case someone might has this problem in future.
https://rpubs.com/bbolker/3336

bootstrapping with lme4 model and missing values

I am working through an example from Aguinis, Gottfredson, & Culpepper (2013). They have provided some R code to perform a bootstrapping procedure in R to estimate confidence intervals for slope variances. This is their original R code:
library(RLRsim)
#STEP 3: Random Intercept and Random Slope model
lmm.fit3=lmer(Y ~ (Xc|l2id) + Xc + I(Wj-mean(Wj)), data=exdata, REML=F)
# Nonparametric Bootstrap Function
REMLVC=VarCorr(lmer(Y ~Xc+(Xc|l2id)+I(Wj-mean(Wj) ),data=exdata,REML=T))$l2id[1:2,1:2]
U.R=chol(REMLVC)
REbootstrap=function(Us,es,X,gs){
nj=nrow(Us)
idk=sample(1:nj,size=nj,replace=T)
Usk=as.matrix(Us[idk,])
esk=sample(es,size=length(es),replace=T)
S=t(Usk)%*%Usk/nj
U.S = chol(S)
A=solve(U.S)%*%U.R
Usk = Usk%*%A
datk=expand.grid(l1id = 1:6,l2id = 1:nj)
colnames(X)=c('one','Xc','Wjc')
datk=cbind(datk,X)
datk$yk = X%*%gs + Usk[datk$l2id,1]+Usk[datk$l2id,2]*X[,2]+esk
lmm.fitk=lmer(yk ~Xc+(Xc|l2id)+Wjc,data=datk,REML=F)
tau11k = VarCorr(lmm.fitk)$l2id[2,2]
tau11k
}
# Implementing Bootstrap
bootks=replicate(1500,REbootstrap(Us=ranef(lmm.fit3)$l2id,es=resid(lmm.fit3),X=model.matrix(lmm.fit3),gs=fixef(lmm.fit3)))
quantile(bootks,probs=c(.025,.975))
I was trying to adapt the code to suit my own data and model. That was unfruitful so far because (a) I do not fully understand all the lines of code and (b) I have missing datapoints in one of my predictors. Here is what I have so far:
#reproducible code
set.seed(855)
exdf <- data.frame(
ID= c(rep(1:105, 28)),
content= sort(c(rep(1:28, 105))),
PrePost= sample(0:1, 105*28, replace=TRUE),
eyeFRF= sort(rep(rnorm(28), 105)),
APMs= sample(0:1, 105*28, replace=TRUE),
Gf= rep(rnorm(105), 28)
)
exdf[which(exdf$ID==62), "eyeFRF"] <- NA
RandomMissing <- sample(rownames(exdf[-which(exdf$ID==62), ]), 17)
exdf[RandomMissing, "eyeFRF"] <- NA
View(exdf)
#model
M03b <- glmer(APMs ~ PrePost + Gf + eyeFRF + (1|content) + (eyeFRF|ID), data=exdf, family=binomial("logit"))
#own adaptation
REMLVC=VarCorr(M03b)$ID[1:2,1:2]
U.R=chol(REMLVC)
REbootstrap=function(Us, es, X, gs){
#Us = random effects
#es = residuals
#X = design matrix
#gs = fixed effects
nj = nrow(Us) #104 in this case, one is excluded (#62) b/c no eye-data
idk = sample(1:nj, size=nj, replace=TRUE) #104 IDs
Usk = as.matrix(Us[idk,]) #104 intercepts and slopes
esk = sample(es, size=length(es), replace=TRUE) #2895 datapoints called 'x' (errors?)
S = t(Usk)%*%Usk/nj #?
U.S = chol(S) #?
A = solve(U.S)%*%U.R #?
Usk = Usk%*%A #?
datk = expand.grid(content=1:28, ID=1:nj)
colnames(X) = c('one', 'PrePost', 'Gf', 'eyeFRF')
datk = cbind(datk, X)
datk$APMsk = X%*%gs + Usk[datk$ID,1] + Usk[datk$ID,2]*X[ ,2] + esk
lmm.fitk = glmer(APMsk ~ PrePost + Gf + eyeFRF + (1|content) + (zb|ID), data=datk, family=binomial("logit"))
tau11k = VarCorr(lmm.fitk)$l2id[2,2]
tau11k
}
# Implementing Bootstrap
bootks <- replicate(1500, REbootstrap(Us=ranef(M03b)$ID, es=resid(M03b), X=model.matrix(M03b), gs=fixef(M03b)))
quantile(bootks, probs=c(.025,.975))
(upgrading comment to an answer)
If you're trying to get confidence intervals via parametric bootstrapping, would confint(M03b,method="boot") work for you? (I think these methods may be new or better developed since that paper was written ...)

Resources