obscure warning from lme4's optwrap when using lmer - r

Using lmer I get the following warning:
Warning messages:
1: In optwrap(optimizer, devfun, x@theta, lower = x@lower) :
convergence code 3 from bobyqa: bobyqa -- a trust region step failed to reduce q
This error is generated after using anova(model1, model2). I tried to make this reproducible, but if I dput the data and try again, the error does not reproduce on the dput data, despite the original and new data frames having the exact same str.
I have tried again in a clean session, and the error reproduces, and again it is lost with a dput.
I know I am not giving people much to work with here; like I said, I would love to reproduce the problem. Can anyone shed light on this warning?

(I'm not sure whether this is a comment or an answer, but it's a bit long and might be an answer.)
The proximal cause of your difficulty with reproducing the result is that lme4 uses both environments and reference classes: these are tricky to "serialize", i.e. to translate to a linear stream that can be saved via dput() or save(). (Can you please try save() and see if it works better than dput()?)
In addition, both environments and reference classes use "pass-by-reference" semantics, so operating on the saved model can change it. anova() automatically refits the model, which makes some tiny but non-zero changes in the internal structure of the saved model object (we are still trying to track this down).
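(As an aside, not from the original exchange: a rough sketch of the dput()-vs-save() comparison suggested above, using the built-in sleepstudy data as a stand-in for the real dataset.)
library(lme4)
fm_full <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
fm_null <- lmer(Reaction ~ 1 + (Days | Subject), sleepstudy)
## dput() flattens the objects to text and can drop the environments a
## merMod carries around; save()/saveRDS() use binary serialization,
## which preserves them
saveRDS(list(full = fm_full, null = fm_null), "fits.rds")
fits <- readRDS("fits.rds")
anova(fits$full, fits$null)  ## anova() refits the models internally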
@alexkeil's comment is wrong: the nonlinear optimizers used within lme4 do not use any calls to the pseudo-random number generator. They are deterministic (but the two points above explain why things might look a bit weird).
To allay your concerns about the fit, I would check it by computing the gradient and Hessian at the final parameter estimates, e.g.
library(lme4)
library(numDeriv)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
dd <- update(fm1,devFunOnly=TRUE)
params <- getME(fm1,"theta") ## also need beta for glmer fits
grad(dd,params)
## all values 'small', say < 1e-3
## [1] 0.0002462423 0.0003276917 0.0003415010
eigen(solve(hessian(dd,params)),only.values=TRUE)$values
## all values positive and of similar magnitude
## [1] 0.029051631 0.002757233 0.001182232
We are in the process of implementing similar checks to run automatically within lme4.
That said, I would still love to see your example, if there's a way to reproduce it relatively easily.
PS: in order to be using bobyqa, you must either be using glmer or have used lmerControl to modify the default optimizer choice ... ??
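(Not part of the original answer: for anyone unsure what that means, a sketch of switching the optimizer via lmerControl; the sleepstudy and cbpp examples are just placeholders for the real model.)
library(lme4)
## historically lmer defaulted to Nelder-Mead, so BOBYQA had to be requested
## explicitly through the control argument ...
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
           control = lmerControl(optimizer = "bobyqa"))
## ... whereas glmer has traditionally used BOBYQA by default for (at least
## part of) its optimization
gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)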

Related

Is glmmTMB truncated negative binomial family still under development?

I have been implementing some negative binomial hurdle models in the R package glmmTMB and have come across something perplexing about the truncated negative binomial family.
In examining the source for that family argument I have found:
truncated_nbinom2 <- function(link = "log") {
    r <- list(family = "truncated_nbinom2",
              variance = function(mu, theta) {
                  stop("variance for truncated nbinom2 family not yet implemented")
              })
    return(make_family(r, link))
}
I am wondering whether this family is still under development (as the stop() call in the variance function seems to indicate)?
It is documented as working in the vignette, and I am getting reasonable estimates from the models I have fit using this family (e.g. simulated data from the model seem sensible). I know many of the authors of the package are on this forum so I hoped someone might be able to clarify.
The truncated_nbinom2 family should work fine for most purposes. Looking through the glmmTMB source code (grep "\$variance" R/*.R), the $variance component of the family object is used only for:
computing Pearson residuals
creating objects to be used by the effects package
You may run into trouble somewhere else in the pipeline if you're using downstream packages that need the expected variance of an object to compute something, but everything else should be fine.
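(Added for illustration, not from the original answer: a minimal hurdle-style fit with this family, using the Salamanders data shipped with glmmTMB; the exact model structure here is only a placeholder.)
library(glmmTMB)
## hurdle-type model: a zero-inflation component plus truncated NB2 counts
fit <- glmmTMB(count ~ mined + (1 | site),
               ziformula = ~ mined,
               family = truncated_nbinom2(link = "log"),
               data = Salamanders)
summary(fit)                      ## estimates, SEs etc. do not need $variance
residuals(fit, type = "pearson")  ## this is where a missing $variance would bite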
PS I found an expression for this variance and created an issue to remind us to implement it: https://github.com/glmmTMB/glmmTMB/issues/606
PPS this is in the development version now (unfortunately, I'm pretty sure the paper I found only covers truncated NB2, so truncated NB1 may have to wait a while. However, the answer still applies - the absence of a variance function will only cause trouble in a few circumstances, and should never cause subtle trouble ...)

error when fitting random effects model using bam() rather than gam() function in mgcv package, R

I am fitting a model with many random effects using the bam() function within the mgcv package for R. My basic model structure looks like:
fit <- bam(y ~ s(x1) + s(x2) + s(xn) + s(plot, bs = 're'), data = dat)
This function works for 4 subsets of my data, but not the fifth, which is surprising. Instead, it throws this error:
Error in qr.qty(qrx, f) :
right-hand side should have 14195 not 14196 rows
This error goes away if I switch to using the gam() rather than the bam() function. It also goes away if I drop the random effect from the model. I am really unsure what's causing this error, or what to do about it. Unfortunately, generating a reproducible example would require passing along a very large dataset, as it's not clear why this error is thrown on this particular dataset, compared to 4 other datasets fitting the exact same model.
Any idea why this error is being thrown, and how to overcome it, would be greatly appreciated.
I had the same problem and found this r-help mail, which tries to solve the same issue:
[R] bam (mgcv) not using the specified number of cores
After reading the mail, I deleted all the cluster-related code, such as the cluster argument in the bam() call, and the error message went away.
I don't know the details, but I hope this trick will help you.
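(For concreteness, this is my reading of the suggestion above, using the model structure from the question; dat, y, x1, x2 and plot are the asker's objects.)
library(mgcv)
library(parallel)
## parallel version, reported above to trigger the qr.qty() error in some cases ...
cl <- makeCluster(2)
fit <- bam(y ~ s(x1) + s(x2) + s(plot, bs = "re"),
           data = dat, cluster = cl)
stopCluster(cl)
## ... versus the same model with all cluster-related code removed
fit <- bam(y ~ s(x1) + s(x2) + s(plot, bs = "re"), data = dat)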
One possible cause of
Error in qr.qty(qrx, f) :
right-hand side should have 14195 not 14196 rows
is running out of RAM. This may explain why you have seen the error for some datasets but not others. This is especially common when using a large cluster size.
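(A hedged sketch, not from the original answer: if RAM really is the issue, bam() has options that reduce its memory footprint; again dat and the smooth terms stand in for the asker's model.)
library(mgcv)
## option 1: smaller chunks for the incremental QR updates
fit1 <- bam(y ~ s(x1) + s(x2) + s(plot, bs = "re"),
            data = dat, chunk.size = 5000)
## option 2: discretized covariate methods (method = "fREML", discrete = TRUE),
## which usually need much less memory than a cluster-based fit
fit2 <- bam(y ~ s(x1) + s(x2) + s(plot, bs = "re"),
            data = dat, method = "fREML", discrete = TRUE)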

R: gls error "false convergence (8)" and glsControl function

I've seen that a common error when running a generalized least squares (gls) model from the nlme package in R is "false convergence (8)". I am trying to run gls models to account for the spatial dependence of my residuals, but I got stuck on the same problem. For example:
library(nlme)
set.seed(2)
samp.sz<-400
lat<-runif(samp.sz,-4,4)
lon<-runif(samp.sz,-4,4)
exp1<-rnorm(samp.sz)
exp2<-rnorm(samp.sz)
resp<-1+4*exp1-3*exp2-2*lat+rnorm(samp.sz)
mod.cor<-gls(resp~exp1+exp2,correlation=corGaus(form=~lat,nugget=TRUE))
Error in gls(resp ~ exp1 + exp2, correlation = corGaus(form = ~lat, nugget = TRUE)) :
false convergence (8)
(the above data simulation was copied from here because it yields the same problem I am facing).
Then, I read that the function glsControl has some parameters (maxIter, msMaxIter, returnObject) that can be set before running the analysis, which can solve this error. As an attempt to understand what was going on, I set the three parameters above to 500, 2000 and TRUE, and ran the same code again, but the error still shows up. I think that glsControl didn't work at all, because no result was shown even though I asked for it.
glsControl(maxIter = 500, msMaxIter=2000, returnObject = TRUE)
mod.cor<-gls(resp~exp1+exp2,correlation=corGaus(form=~lat,nugget=TRUE))
For comparison, if I run different models with the same variables, it works fine and no error is shown.
For example, models containing only one explanatory variable.
mod.cor2<-gls(resp~exp1,correlation=corGaus(form=~lat,nugget=TRUE))
mod.cor3<-gls(resp~exp2,correlation=corGaus(form=~lat,nugget=TRUE))
I really dug into several sites, forums and books in a desperate search trying to solve it, and then I came to learn that 'false convergence' is a recurrent error that many users have faced. However, none of the previous posts seems to solve it for me. I really thought glsControl could provide an alternative, but it didn't. Do you guys have a clue on how I can solve that?
I really appreciate any help. Thanks in advance.
The problem is that the nugget effect is very small. Provide better starting values:
mod.cor <- gls(resp ~ exp1 + exp2,
correlation = corGaus(c(200, 0.1), form = ~lat, nugget = TRUE))
summary(mod.cor)
#<snip>
#Correlation Structure: Gaussian spatial correlation
# Formula: ~lat
# Parameter estimate(s):
# range nugget
#2.947163e+02 5.209379e-06
#</snip>
Note that this model may be sensitive to starting values even if there is no error or warning.
I would like to add a quote from library(lme4); help("convergence"):
The lme4 package uses general-purpose nonlinear optimizers (e.g.
Nelder-Mead or Powell's BOBYQA method) to estimate the
variance-covariance matrices of the random effects. Assessing reliably
whether such algorithms have converged is difficult.
I believe something similar applies here. This model is clearly problematic and you should be grateful for getting this error. You should at least check how the fit changes with different starting values and try increasing the number of iterations or decreasing the tolerance. In the end, I would suggest looking for a model that better fits the data (we know that this would be an OLS model including lat as a linear predictor here).
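(A rough sketch of that suggestion, not from the original answer. Note that glsControl() settings only take effect when they are passed to gls() through its control argument, which may be why the standalone glsControl() call in the question appeared to do nothing.)
library(nlme)
ctrl <- glsControl(maxIter = 500, msMaxIter = 2000,
                   tolerance = 1e-5, returnObject = TRUE)
## compare fits started from different (range, nugget) values
fit_a <- gls(resp ~ exp1 + exp2,
             correlation = corGaus(c(200, 0.1), form = ~lat, nugget = TRUE),
             control = ctrl)
fit_b <- gls(resp ~ exp1 + exp2,
             correlation = corGaus(c(50, 0.5), form = ~lat, nugget = TRUE),
             control = ctrl)
## do the estimated range/nugget agree across starting values?
coef(fit_a$modelStruct$corStruct, unconstrained = FALSE)
coef(fit_b$modelStruct$corStruct, unconstrained = FALSE)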
PS: A good coding style uses blanks where appropriate.

lmerTest:::anova uses lazy loading of data sets?

Ran into this problem while trying to get the empirical distribution of the K-R degrees of freedom...
This seems like fairly dangerous behaviour? Does it constitute a bug?
Reproducible example:
## import lmerTest package
library(lmerTest)
## an object of class merModLmerTest
m <- lmer(Informed.liking ~ Gender+Information+Product +(1|Consumer), data=ham)
# simulate data from fitted model
simData=ham
simData$Informed.liking=unlist(simulate(m))
# fit model to simulated data
m1 <- lmer(Informed.liking ~ Gender+Information+Product +(1|Consumer), data=simData)
stats:::anova(m1)
lmerTest:::anova(m1)
# simulate again, WITHOUT refitting
simData$Informed.liking=unlist(simulate(m))
stats:::anova(m1) # same as before
lmerTest:::anova(m1) # not same as before!
My response does not constitute a solid answer, rather an extended comment:
This looks pretty bad - in fact, I have discovered today that almost all the analyses I conducted in a project that was on the verge of submission have to be redone because of a related behavior of lmerTest.
The problem I ran into occurred when I used a short function that fits a model with lmer and then returns coef(summary(model)) - simple stuff, two lines of code. However, the input to this function was named data, and I also had a data frame called data in the workspace. It seems that, although the local variable from the function scope was correctly used during fitting with lmer, the workspace data variable was used during summary (which often was not the same as the data frame passed to the function), leading to invalid t values and degrees of freedom and hence incorrect p values (the estimates and their standard errors were OK, however).
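(A minimal sketch of the pattern just described, using the ham data from the question; whether it actually reproduces the problem will depend on your lmerTest version, since this behavior may have been fixed since.)
library(lmerTest)
## a data frame named "data" sitting in the global workspace ...
data <- ham
data$Informed.liking <- data$Informed.liking + rnorm(nrow(data))  ## deliberately different
## ... and a helper whose argument is *also* called "data"
fit_and_coef <- function(data) {
  m <- lmer(Informed.liking ~ Gender + Information + Product + (1 | Consumer),
            data = data)
  coef(summary(m))  ## reported to pick up the workspace "data" in old versions
}
## compare against a direct fit on ham: t, df and p values should be identical
fit_and_coef(ham)
coef(summary(lmer(Informed.liking ~ Gender + Information + Product + (1 | Consumer),
                  data = ham)))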
So, answering your question:
This seems like fairly dangerous behaviour? Does it constitute a bug?
It seems dangerous indeed, and I would definitely call this a bug.

R package "spatstat": How do you get standard errors for non-smooth terms in a poisson process model (function: ppm) when use.gam=TRUE?

In the R package spatstat (I am using the current version, 1.31-0), there is an option use.gam. When you set this to TRUE, you can include smooth terms in the linear predictor, the same way you do with the R package mgcv. For example,
g <- ppm(nztrees, ~1+s(x,y), use.gam=TRUE)
Now, if I want a confidence interval for the intercept, I would usually use summary or vcov, which works when you don't use gam but fails when you do:
vcov(g)
which gives the error message
Error in model.frame.default(formula = fmla, data = list(.mpl.W = c(7.09716796875,  :
  invalid type (list) for variable 's(x, y)'
I am aware that this standard error approximation here is not justified when you use gam, but this is captured by the warning message:
In addition: Warning message: model was fitted by gam();
asymptotic variance calculation ignores this
I'm not concerned about this - I am prepared to justify the use of these standard errors for the purpose I'm using them - I just want the numbers and would like to avoid "writing-my-own" to do so.
The error message I got above does not seem to depend on the data set I'm using. I used the nztrees example here because I know it comes pre-loaded with spatstat. It seems like it's complaining about the variable itself, but the model clearly understands the syntax since it fits the model (and the predicted values, for my own dataset, look quite good, so I know it's not just pumping out garbage).
Does anybody have any tips or insights about this? Is this a bug? To my surprise, I've been unable to find any discussion of this online. Any help or hints are appreciated.
Edit: Although I have definitively answered my own question here, I will not accept my answer for the time being. That way, if someone is interested and willing to put in the effort to find a "workaround" for this without waiting for the next edition of spatstat, I can award the bounty to him/her. Otherwise, I'll just accept my own answer at the end of the bounty period.
I have contacted one of the package authors, Adrian Baddeley, about this. He promptly responded and let me know that this is indeed a bug with the software and, apparently, I am the first person to encounter it. Fortunately, it only took him a short time to track down the issue and correct it. The fix will be included in the next release of spatstat, 1.31-1.
Edit: The updated version of spatstat has been released and does not have this bug anymore:
g <- ppm(nztrees, ~1+s(x,y), use.gam=TRUE)
sqrt( vcov(g)[1,1] )
[1] 0.1150982
Warning message:
model was fitted by gam(); asymptotic variance calculation ignores this
See the spatstat website for other release notes. Thanks to all who read and participated in this thread!
I'm not sure you can specify the trend the way you have, which is possibly what is causing the error. It doesn't seem to make sense according to the documentation:
The default formula, ~1, indicates the model is stationary and no trend is to be fitted.
But instead you can specify the model like so:
g <- ppm(nztrees, ~x+y, use.gam=TRUE)
# Then to extract the coefficients:
> coef(g)
(Intercept) x y
-5.0346019490 0.0013582470 -0.0006416421
#And calculate their se:
vc <- vcov(g)
se <- sqrt(diag(vc))
> se
(Intercept) x y
0.264854030 0.002244702 0.003609366
Does this make sense / is this the expected result? I know that the package authors are very active on the r-sig-geo mailing list, as they have helped me in the past. You may also want to post your question to that mailing list, but you should reference your question here when you do.
