R: ...centering vec message

I am running a regression with lots of regressors (due to multiple interactions). R has been evaluating this regression for over 6 hours now and I keep receiving messages like:
...centering vec 2 i:6356053 c:2.0e-007 d:3.2e-001(t:2.1e-006)
ETA:7/21/2016 5:43:18 PM
I couldn't find anything about this type of message on the web; does anyone know what it means?

The message originates from the C source for lfe (line 214). According to the source:
i is the number of iterations you've gone through to try to converge on a solution (so in the OP's case, 6.35 million...)
c is your convergence rate for a given iteration
d is delta
t is your target
The message itself appears to be a primitive progress bar to keep you apprised of how many iterations you've done and the ETA of a result. In this case, it appears you are having trouble converging on a solution. The reason why that is happening would be a better question for Cross Validated, and you would be best served by providing them with a minimal reproducible example.
A good start on your own would be to carefully read the lfe documentation and vignettes. There is an entire vignette about speed that suggests other functions you may want to try (apart from felm) if your procedure is going slowly or not converging.
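If the bottleneck is the centering step itself, here is a minimal sketch of the kind of knobs lfe exposes (the variable and factor names are hypothetical, and the lfe.threads / lfe.eps options should be checked against the documentation of your installed version):
library(lfe)
# Use more threads for the demeaning/centering step and, if you can justify it,
# a looser convergence tolerance than the package default:
options(lfe.threads = 4)
options(lfe.eps = 1e-6)
# Hypothetical model: interactions of x1 and x2, with the high-dimensional
# factors f1 and f2 projected out rather than estimated as dummies:
fit <- felm(y ~ x1 * x2 | f1 + f2, data = mydata)
summary(fit)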

Related

Results different after latest JAGS update?

I am running Bayesian Hierarchical Modeling in R using R2jags. When I open code I used a month ago and run it on a dataset I used a month ago (verified by "date modified" in windows explorer), I get different results than I got a month ago. The only difference I can think of is I got a new work computer in the last month and we installed JAGS 4.3.0. I was previously using 4.2.0.
Is it remotely possible to get different results just from updating my version of JAGS? I'm not posting code or results here because I don't need help troubleshooting it - everything is exactly the same.
Edit:
Convergence seems fine - Geweke diagnostics, autocorrelation plots, and trace plots look good. That hasn't changed.
I have a seed set both via set.seed() and jags.seed=. Is that enough? I've never had a problem replicating these types of results before.
As far as how different the results are, they are large enough to cause a meaningful difference in the inference. I am assessing relationships between 30 chemical exposures and a health outcome among 336 humans. Here are two examples. Chemical B troubles me the most because of the credible interval shift. Chemical A is another example.
I also doubled the number of iterations from 50k to 100k, which resulted in very minor/inconsequential differences.
Edit 2:
I posted at source forge asking about the different default RNGs for versions: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/52bfef7d17/
There are at least 3 possible reasons for you seeing a difference between results from these models:
One or both of your attempts to fit this model did not converge, and/or your effective sample size is so small that random sampling error is having a large impact on your inference. If you have already checked to ensure convergence and sufficient effective sample size (for both models) then you can rule this out.
You are seeing small differences in the posteriors due to the random sampling inherent to MCMC in otherwise converged results. If these differences are big enough to cause a meaningful difference in inference then your effective sample size is not high enough - so just run the models for longer and the difference should reduce. You can also set the random seed in JAGS using initial values for .RNG.seed and .RNG.name so that successive model runs are numerically identical. If you run the models for longer and this difference does not reduce (or if it is a large difference to begin with) then you can rule this out.
Your model contains a node for which the default sampling scheme changed between JAGS 4.2.0 and 4.3.0 - there were some changes to sampling schemes (and the order of precedence for assigning samplers to nodes) that could conceivably have changed your results (from memory I think this affected GLM particularly, but I can't remember exactly). However, although this may affect the probability of convergence, it should not substantially affect the posterior if the model does converge. It may be contributing to a numerical difference as explained for point (2) though.
I'd recommend first ensuring convergence of both models, and then (assuming they did both converge) looking at exactly how much of a difference you are seeing. If it looks like both models converged AND the difference is more than just random sampling variation, then please reply here and/or update your question (as that shouldn't happen ... i.e. we may need to look into the possibility of a bug in JAGS).
Thanks,
Matt
--------- Edit following additional information added to the question --------
Based on what you have written, it does seem that the difference in inference exceeds what might be expected due to random variation, so there may be some kind of underlying issue here. In order to diagnose this further we would need a minimal reproducible example (https://stackoverflow.com/help/minimal-reproducible-example). This means that you would need to provide not only the model (or preferably a simplified model that still exhibits the problem) but also some data to which we can fit the model. If your data are too sensitive to share then this could be a fictitious dataset for which you also see a difference between JAGS 4.2.0 and JAGS 4.3.0.
The official help forum for JAGS is at https://sourceforge.net/p/mcmc-jags/discussion/610037/ - so you can certainly post there, although we would still need a minimal reproducible example to be able to do anything. If you do so, then please update both posts with a link to the other so that anyone reading either post knows about the cross-posting. You should also note that R2jags is not officially supported on the JAGS forums, so please provide the minimal reproducible example using plain rjags code (or runjags if you prefer) rather than using the R2jags wrapper.
To answer your question in the comments: to obtain information on the samplers used, you can call rjags::list.samplers(), e.g.:
library(rjags)
# LINE is just a small example model built into rjags:
data(LINE)
LINE$recompile()
list.samplers(LINE)
# $`bugs::ConjugateGamma`
# [1] "tau"
# $`bugs::ConjugateNormal`
# [1] "alpha"
# $`bugs::ConjugateNormal`
# [1] "beta"

Can I tell JAGS to re-start automatically after failure with initial values?

My model failed with the following error:
Compiling rjags model...
Error: The following error occured when compiling and adapting the model using rjags:
Error in rjags::jags.model(model, data = dataenv, inits = inits, n.chains = length(runjags.object$end.state), :
Error in node Y[34,10]
Observed node inconsistent with unobserved parents at initialization.
Try setting appropriate initial values.
I have done some diagnosis and found that there was a problem with initial values in chain 3. However, this can happen from time to time. Is there any way to tell run.jags or JAGS itself to re-try and re-run the model in such cases? For example, to tell it to make another N attempts to initialize the model properly. That would be a much more logical thing to do than just failing. Or do I have to do it manually with some tryCatch thing?
P.S.: note that I am currently using run.jags to run JAGS from R.
There is no facility for that provided within runjags, but it would be fairly simple to write yourself like so:
success <- FALSE
while (!success) {
  # '...' stands for your usual run.jags arguments
  s <- try(results <- run.jags(...))
  # keep retrying until run.jags returns without an error
  success <- !inherits(s, "try-error")
}
results
[Note that if this model NEVER works, the loop will never stop!]
A better idea might be to specify an initial values function/list that provides initial values that are guaranteed to work (if possible).
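For instance, here is a sketch of an initial-values function supplied to run.jags (the parameter names, distributions, and the model_string / data_list objects are illustrative; runjags documents that inits can also be a list of named lists, and the single-argument form below is taken to receive the chain number - check ?run.jags on your version):
inits_fun <- function(chain) {
  # draw starting values from a range known to be consistent with the data,
  # and fix the RNG per chain for reproducibility
  list(mu  = rnorm(1, 0, 1),
       tau = rgamma(1, shape = 1, rate = 1),
       .RNG.name = "base::Mersenne-Twister",
       .RNG.seed = chain)
}
results <- run.jags(model = model_string, monitor = c("mu", "tau"),
                    data = data_list, n.chains = 3, inits = inits_fun)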
In runjags version 2 it will be possible to recover successful simulations when some have crashed, so if you ran (say) 5 chains in parallel and 1 or 2 crashed, you would still have the other 3 or 4. That version should be released within the next couple of weeks, and contains a large number of other improvements.
Usually when this error occurs it is an indication of a serious underlying problem. I don't think a strategy of "try again" is useful in general (and especially because default initial values are deterministic).
The default initial values generated by JAGS are given by a "typical" value from the prior distribution (e.g. mean, median, or mode). If it turns out that this is inconsistent with the data then there are typically two possible causes:
1. A posteriori constraints that need to be taken into account, such as when modelling censored survival data with the dinterval distribution.
2. Prior-data conflict, e.g. the prior mean is so far away from the value supported by the data that it has zero likelihood.
These problems remain the same when you are supplying your own initial values.
If you think you can generate good initial values most of the time, with occasional failures, then it might be worth repeated attempts inside a call to try() but I think this is an unusual case.
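To illustrate the first cause above: with dinterval-censored survival data, the latent times for censored observations must start above the censoring limit, otherwise the observed dinterval node is inconsistent with its parent at initialization. A sketch with hypothetical variable names:
# t_obs is NA where the observation is right-censored, c_lim is the censoring
# limit, and the model contains: is.censored[i] ~ dinterval(t[i], c_lim[i])
# Censored times get a legal starting value just above the limit; observed
# times must not be initialised (hence NA):
t_init <- ifelse(is.na(t_obs), c_lim + 1, NA)
inits <- list(t = t_init)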

Calculating Gelman and Rubin's convergence statistic for only a subset of iterations (coda package)

I am trying to calculate Gelman and Rubin's convergence diagnostic for a JAGS analysis I am currently running in R using the R package rjags.
For example, I would like to assess the convergence diagnostic for my parameter beta. To do this, I am using the library coda and the command:
library(coda)
gelman.diag(out_2MCMC$beta)
with out_2MCMC being an MCMC list object with more than one chain, resulting in correct output with no error messages whatsoever. However, as I used a large number of iterations as burn-in, I would like to calculate the convergence diagnostic for only a subset of iterations (the part after the burn-in only!).
To do this, I tried:
gelman.diag(out_2MCMC$beta[,10000:15000,])
This resulted in the following error:
Error in mcmc.list(x) : Arguments must be mcmc objects
Therefore, I tried:
gelman.diag(as.mcmc(out_2MCMC$beta[,10000:15000,]))
But, surprisingly this resulted in the following error:
Error in gelman.diag(as.mcmc(out_2MCMC$beta[, 10000:15000,]))
You need at least two chains
As this is the same MCMC list object as I get from the JAGS analysis and the same as I am using when I am assessing the convergence diagnostic for all iterations (which works just perfect), I don't see the problem here.
The function itself only provides the option to use the second half of the series (iterations) in the computation of the convergence diagnostic. As my burn-in phase is longer than that, this is unfortunately not enough for me.
I guess that it is something very obvious that I am just missing. Does anyone have any suggestions or tips?
As it is a lot of code, I did not provide R code to run a full 2MCMC-JAGS analysis. I hope that the code above illustrates the problem well enough; maybe someone encountered the same problem before or recognizes a mistake in my syntax. If you feel that the complete code is necessary to understand my problem, I can still provide example code that runs a 2MCMC-JAGS analysis.
I was looking for a solution to the same problem and found that the window() function (a generic in the stats package, with methods for mcmc objects provided by coda) will accomplish the task. For your case:
window(out_2MCMC, start = 10001, end = 15000)
will return an mcmc.list object containing the last 5,000 samples for all monitored parameters and updated attributes for the new list.
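A minimal sketch of the idea (assuming each chain holds at least 15,000 iterations; the same call also works on a single-parameter mcmc.list such as the one in your original gelman.diag() call):
library(coda)
# drop the burn-in, keeping iterations 10001 to 15000 of every chain
post_burnin <- window(out_2MCMC, start = 10001, end = 15000)
# Gelman-Rubin diagnostic computed on the retained samples only
gelman.diag(post_burnin)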

R package "spatstat": How do you get standard errors for non-smooth terms in a poisson process model (function: ppm) when use.gam=TRUE?

In the R package spatstat (I am using the current version, 1.31-0), there is an option use.gam. When you set this to TRUE, you can include smooth terms in the linear predictor, the same way you do with the R package mgcv. For example,
g <- ppm(nztrees, ~1+s(x,y), use.gam=TRUE)
Now, if I want a confidence interval for the intercept, I can usually use summary or vcov, which works when I don't use gam but fails when I do:
vcov(g)
which gives the error message
Error in model.frame.default(formula = fmla, data = list(.mpl.W = c(7.09716796875, :
  invalid type (list) for variable 's(x, y)'
I am aware that this standard error approximation here is not justified when you use gam, but this is captured by the warning message:
In addition: Warning message: model was fitted by gam();
asymptotic variance calculation ignores this
I'm not concerned about this - I am prepared to justify the use of these standard errors for the purpose I'm using them - I just want the numbers and would like to avoid "writing-my-own" to do so.
The error message I got above does not seem to depend on the data set I'm using. I used the nztrees example here because I know it comes pre-loaded with spatstat. It seems like it's complaining about the variable itself, but the model clearly understands the syntax since it fits the model (and the predicted values, for my own dataset, look quite good, so I know it's not just pumping out garbage).
Does anybody have any tips or insights about this? Is this a bug? To my surprise, I've been unable to find any discussion of this online. Any help or hints are appreciated.
Edit: Although I have definitively answered my own question here, I will not accept my answer for the time being. That way, if someone is interested and willing to put in the effort to find a "workaround" for this without waiting for the next edition of spatstat, I can award the bounty to him/her. Otherwise, I'll just accept my own answer at the end of the bounty period.
I have contacted one of the package authors, Adrian Baddeley, about this. He promptly responded and let me know that this is indeed a bug with the software and, apparently, I am the first person to encounter it. Fortunately, it only took him a short time to track down the issue and correct it. The fix will be included in the next release of spatstat, 1.31-1.
Edit: The updated version of spatstat has been released and does not have this bug anymore:
g <- ppm(nztrees, ~1+s(x,y), use.gam=TRUE)
sqrt( vcov(g)[1,1] )
[1] 0.1150982
Warning message:
model was fitted by gam(); asymptotic variance calculation ignores this
See the spatstat website for other release notes. Thanks to all who read and participated in this thread!
I'm not sure you can specify the trend the way you have, which is possibly what is causing the error. It doesn't seem to make sense according to the documentation:
The default formula, ~1, indicates the model is stationary and no trend is to be fitted.
But instead you can specify the model like so:
g <- ppm(nztrees, ~x+y, use.gam=TRUE)
#Then to extract the coefficients:
> coef(g)
(Intercept) x y
-5.0346019490 0.0013582470 -0.0006416421
#And calculate their se:
vc <- vcov(g)
se <- sqrt(diag(vc))
> se
(Intercept) x y
0.264854030 0.002244702 0.003609366
Does this make sense / give the expected result? I know that the package authors are very active on the r-sig-geo mailing list, as they have helped me in the past. You may also want to post your question to that mailing list, but you should reference your question here when you do.

convergence error codes in nlminb -- where stored?

I am building a Monte Carlo simulation for the purpose of power estimation, and I need to run 10,000 iterations, each of which involves fitting a bunch of mixed linear & logistic models to data I generate. Once in a blue moon I get an error like this:
nlminb problem, convergence error code = 1 ; message = iteration limit
reached without convergence
I gather from Googling the error that this is common and probably a function of my data (since it does not happen on most runs through the simulation program). However, it is a pain because it makes my simulation crash and I can lose days' worth of runtime. I would like to make the program more robust by adding some error-handling to it, but I don't know where the "convergence error code" is stored, if anywhere.
Checking the manual pages for lme, lmeObject, and nlminb didn't really help. Any ideas?
That sounds more like a warning than an error. The "convergence" element of the list that nlminb returns will be 0 for successful convergence. I would ask whether you might want to increase the "iter.max" element in the control list. This information is on the help page.
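A minimal sketch of inspecting that element directly in nlminb's return value (the objective function here is just a toy example):
# toy objective: a quadratic with its minimum at (3, 3)
f <- function(x) sum((x - 3)^2)
fit <- nlminb(start = c(0, 0), objective = f,
              control = list(iter.max = 500))
fit$convergence   # 0 on successful convergence, non-zero otherwise
fit$message       # human-readable reason, e.g. "iteration limit reached without convergence"
For the simulation loop itself, wrapping each model fit in try() or tryCatch() is the usual way to keep an occasional non-converging fit from aborting the whole run.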
