When running a rather complicated model in JAGS via rjags, I get an error
Error: Error in node ttt3[126,509]
Current value is inconsistent with data
The strange thing is that I get this error after the model has initialized (including an adaptation period) and burned in for 50000 iterations. That is, jags.model() runs fine, update() runs fine, but coda.samples() returns the above error after several days of computation (I expect the model to take about 20 days to fit, if everything worked properly). So it seems that the MCMC algorithm is accepting a proposal for a posterior sample that JAGS then feels is inconsistent with the data, which I would have thought was impossible!
I would greatly appreciate any insight about what might be going on here. Unfortunately, I have no reproducible example other than my full model, which takes several days to fit. I can probably provide the full model specification and the data upon request. I don't even know for sure if my example is reproducible, though I have encountered the same error twice in a row (but presumably the error arises stochastically during the MCMC fitting?).
I've posted a bit more about the outlines of the model at Martyn's page here.
Apparently, it's a bug in JAGS. I keep getting the same thing, too. It's about the beta conjugate distribution.
Related
We want to run regressions on panel data, and use fixed-effects (to alleviate the problem of a firm- or time-effect) and censor the model at zero. However, I find little when googling the model, in terms of R code. Does anyone perhaps have any insights?
I will post a screenshot of our data, and of the intended outcome.
We have tried the code described on this site: http://vps58738.lws-hosting.com/Rpkg/plm/reference/pldv.html
But we are slightly unsure what to supply for all of the arguments (or parameters), and we get errors in the results. We are also unsure whether the method described on that site is appropriate - it was just the closest thing we found.
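For what it's worth, a minimal sketch of what a pldv() call might look like. The data frame and column names (firm_data, firm, year, y, x1, x2) are hypothetical stand-ins for your own data, and you should check the pldv() documentation to confirm the estimator choice:

```r
library(plm)

# Hypothetical data: 'firm_data' with columns firm, year, y, x1, x2.
# lower = 0 censors the dependent variable at zero; model = "fd" is the
# first-difference estimator for censored panels with fixed effects.
pdat <- pdata.frame(firm_data, index = c("firm", "year"))
fit  <- pldv(y ~ x1 + x2, data = pdat, model = "fd", lower = 0)
summary(fit)
```

Whether first-differencing is the right way to handle your firm/time effects is a modelling question worth asking on Cross Validated.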
I am running Bayesian Hierarchical Modeling in R using R2jags. When I open code I used a month ago and run it on a dataset I used a month ago (verified by "date modified" in windows explorer), I get different results than I got a month ago. The only difference I can think of is I got a new work computer in the last month and we installed JAGS 4.3.0. I was previously using 4.2.0.
Is it remotely possible to get different results just from updating my version of JAGS? I'm not posting code or results here because I don't need help troubleshooting it - everything is exactly the same.
Edit:
Convergence seems fine - Geweke diagnostics, autocorrelation plots, and trace plots all look good. That hasn't changed.
I have a seed set both via set.seed() and jags.seed=. Is that enough? I've never had a problem replicating these types of results before.
As far as how different the results are, they are large enough to cause a meaningful difference in the inference. I am assessing relationships between 30 chemical exposures and a health outcome among 336 humans. Here are two examples. Chemical B troubles me the most because of the credible interval shift. Chemical A is another example.
I also doubled the number of iterations from 50k to 100k which resulted in very minor/inconsequential differences.
Edit 2:
I posted at SourceForge asking about the different default RNGs between versions: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/52bfef7d17/
There are at least 3 possible reasons for you seeing a difference between results from these models:
One or both of your attempts to fit this model did not converge, and/or your effective sample size is so small that random sampling error is having a large impact on your inference. If you have already checked to ensure convergence and sufficient effective sample size (for both models) then you can rule this out.
You are seeing small differences in the posteriors due to the random sampling inherent to MCMC in otherwise converged results. If these differences are big enough to cause a meaningful difference in inference then your effective sample size is not high enough - so just run the models for longer and the difference should reduce. You can also set the random seed in JAGS using initial values for .RNG.seed and .RNG.name so that successive model runs are numerically identical. If you run the models for longer and this difference does not reduce (or if it is a large difference to begin with) then you can rule this out.
Your model contains a node for which the default sampling scheme changed between JAGS 4.2.0 and 4.3.0 - there were some changes to sampling schemes (and the order of precedence for assigning samplers to nodes) that could conceivably have changed your results (from memory I think this affected GLM particularly, but I can't remember exactly). However, although this may affect the probability of convergence, it should not substantially affect the posterior if the model does converge. It may be contributing to a numerical difference as explained for point (2) though.
I'd recommend first ensuring convergence of both models, and then (assuming they did both converge) looking at exactly how much of a difference you are seeing. If it looks like both models converged AND the difference is more than just random sampling variation, then please reply here and/or update your question (as that shouldn't happen ... i.e. we may need to look into the possibility of a bug in JAGS).
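As an aside, the .RNG.seed / .RNG.name approach mentioned in point (2) can be sketched as follows. The model file name and data object here are placeholders for your own:

```r
library(rjags)

# One inits list per chain: fixing .RNG.name and .RNG.seed makes
# successive runs numerically identical (given the same JAGS version
# and sampler assignments).
inits <- list(
  list(.RNG.name = "base::Mersenne-Twister", .RNG.seed = 1),
  list(.RNG.name = "base::Mersenne-Twister", .RNG.seed = 2)
)
# model <- jags.model("model.txt", data = dat, inits = inits, n.chains = 2)
```

Note that fixing the seed makes runs reproducible within a JAGS version, but does not guarantee identical streams across 4.2.0 and 4.3.0.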
Thanks,
Matt
--------- Edit following additional information added to the question --------
Based on what you have written, it does seem that the difference in inference exceeds what might be expected due to random variation, so there may be some kind of underlying issue here. In order to diagnose this further we would need a minimal reproducible example (https://stackoverflow.com/help/minimal-reproducible-example). This means that you would need to provide not only the model (or preferably a simplified model that still exhibits the problem) but also some data to which we can fit the model. If your data are too sensitive to share then this could be a fictitious dataset for which you also see a difference between JAGS 4.2.0 and JAGS 4.3.0.
The official help forum for JAGS is at https://sourceforge.net/p/mcmc-jags/discussion/610037/ - so you can certainly post there, although we would still need a minimal reproducible example to be able to do anything. If you do so, then please update both posts with a link to the other so that anyone reading either post knows about the cross-posting. You should also note that R2jags is not officially supported on the JAGS forums, so please provide the minimal reproducible example using plain rjags code (or runjags if you prefer) rather than using the R2jags wrapper.
To answer your question in the comments: in order to obtain information on the samplers used you can use rjags::list.samplers() eg:
library(rjags)
# LINE is just a small example model built into rjags:
data(LINE)
LINE$recompile()
list.samplers(LINE)
# $`bugs::ConjugateGamma`
# [1] "tau"
# $`bugs::ConjugateNormal`
# [1] "alpha"
# $`bugs::ConjugateNormal`
# [1] "beta"
I am running a regression with lots of regressors (due to multiple interactions). R has been evaluating this regression for over 6 hours now and I keep receiving messages like:
...centering vec 2 i:6356053 c:2.0e-007 d:3.2e-001(t:2.1e-006)
ETA:7/21/2016 5:43:18 PM
I couldn't find anything about this type of message on the web, does anyone know what it means?
The error originates from the C source for lfe (line 214). According to the source:
i is the number of iterations you've gone through to try to converge on a solution (so in the OP's case, 6.35 million...)
c is your convergence rate for a given iteration
d is delta
t is your target
The message itself appears to be a primitive progress bar to keep you apprised of how many iterations you've done and the ETA of a result. In this case, it appears you are having trouble converging on a solution. The reason why that is happening would be a better question for Cross Validated, and you would be best served by providing them with a minimal reproducible example.
A good start on your own would be to carefully read the lfe documentation and vignettes. There is an entire vignette about speed that suggests other functions you may want to try (apart from felm) if your procedure is going slowly or not converging.
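As a starting point, a hedged sketch of the knobs the lfe documentation describes for the demeaning step (the part that prints those "centering" messages). The formula and data here are hypothetical; check ?felm and the speed vignette for the exact options available in your version:

```r
library(lfe)

# Use more threads for the centering step, and loosen the convergence
# tolerance slightly if the default is too strict for your data.
options(lfe.threads = 4, lfe.eps = 1e-6)

# High-dimensional factors go after the '|' so they are projected out
# rather than entering the design matrix as dummies.
fit <- felm(y ~ x1 + x2 | f1 + f2, data = dat)
summary(fit)
```

If the factors after the `|` are nearly collinear (e.g. firm and firm-year effects), the centering iterations can converge very slowly, which matches the behaviour you are seeing.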
My model failed with the following error:
Compiling rjags model...
Error: The following error occured when compiling and adapting the model using rjags:
Error in rjags::jags.model(model, data = dataenv, inits = inits, n.chains = length(runjags.object$end.state), :
Error in node Y[34,10]
Observed node inconsistent with unobserved parents at initialization.
Try setting appropriate initial values.
I have done some diagnosis and found that there was a problem with the initial values in chain 3. However, this can happen from time to time. Is there any way to tell run.jags or JAGS itself to retry and re-run the model in such cases? For example, to make another N attempts to initialize the model properly. That would be a much more logical thing to do than simply failing. Or do I have to do it manually with some tryCatch construct?
P.S.: note that I am currently using run.jags to run JAGS from R.
There is no facility for that provided within runjags, but it would be fairly simple to write yourself like so:
success <- FALSE
while(!success){
  s <- try(results <- run.jags(...))
  success <- !inherits(s, "try-error")
}
results
[Note that if this model NEVER works, the loop will never stop!]
A better idea might be to specify an initial values function/list that provides initial values that are guaranteed to work (if possible).
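A sketch of such an initial-values function, in case it helps. runjags accepts a function of the chain number; the parameter names (mu, tau) are hypothetical stand-ins for whatever nodes your model needs initialised:

```r
# One list per chain, drawing starting values from a range known to be
# consistent with the data, plus a per-chain RNG seed for reproducibility.
inits_fun <- function(chain){
  list(mu  = rnorm(1, 0, 0.1),
       tau = runif(1, 0.5, 2),
       .RNG.name = "base::Wichmann-Hill",
       .RNG.seed = chain)
}
# results <- run.jags(model, inits = inits_fun, n.chains = 3)
```

The key point is to make the sampled ranges narrow enough that an inconsistent starting value is effectively impossible, rather than relying on retries.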
In runjags version 2, it will be possible to recover successful simulations when some simulations have crashed, so if you ran (say) 5 chains in parallel then if 1 or 2 crashed you would still have 3 or 4. That should be released within the next couple of weeks, and contains a large number of other improvements.
Usually when this error occurs it is an indication of a serious underlying problem. I don't think a strategy of "try again" is useful in general (and especially because default initial values are deterministic).
The default initial values generated by JAGS are given by a "typical" value from the prior distribution (e.g. mean, median, or mode). If it turns out that this is inconsistent with the data then there are typically two possible causes:
A posteriori constraints that need to be taken into account, such as when modelling censored survival data with the dinterval distribution.
Prior-data conflict, e.g. the prior mean is so far away from the value supported by the data that it has zero likelihood.
These problems remain the same when you are supplying your own initial values.
If you think you can generate good initial values most of the time, with occasional failures, then it might be worth repeated attempts inside a call to try() but I think this is an unusual case.
I am trying to calculate Gelman and Rubin's convergence diagnostic for a JAGS analysis I am currently running in R using the R package rjags.
For example, I would like to assess the convergence diagnostic for my parameter beta. To do this, I am using the library coda and the command:
library(coda)
gelman.diag(out_2MCMC$beta)
with out_2MCMC being an MCMC list object with more than one chain, resulting in correct output with no error messages whatsoever. However, as I used a large number of iterations as burn-in, I would like to calculate the convergence diagnostic for only a subset of iterations (the part after the burn-in only!).
To do this, I tried:
gelman.diag(out_2MCMC$beta[,10000:15000,])
This resulted in the following error:
Error in mcmc.list(x) : Arguments must be mcmc objects
Therefore, I tried:
gelman.diag(as.mcmc(out_2MCMC$beta[,10000:15000,]))
But, surprisingly this resulted in the following error:
Error in gelman.diag(as.mcmc(out_2MCMC$beta[, 10000:15000,]))
You need at least two chains
As this is the same MCMC list object as I get from the JAGS analysis and the same as I am using when I am assessing the convergence diagnostic for all iterations (which works just perfect), I don't see the problem here.
The function itself only provides the option to use the second half of the series (iterations) in the computation of the convergence diagnostic. As my burn-in phase is longer than that, this is unfortunately not enough for me.
I guess it is something very obvious that I am just missing. Does anyone have any suggestions or tips?
As it is a lot of code, I did not provide the R code to run a full 2MCMC-JAGS analysis. I hope the code above illustrates the problem well enough; maybe someone has encountered the same problem before or recognizes a mistake in my syntax. If you feel that the complete code is necessary to understand my problem, I can still provide example code that runs a 2MCMC JAGS analysis.
I was looking for a solution to the same problem and found that the window() function in the stats package will accomplish the task. For your case:
window(out_2MCMC, start = 10001, end = 15000)
will return an mcmc.list object containing the last 5,000 samples for all monitored parameters and updated attributes for the new list.
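Putting it together for the original question (assuming out_2MCMC is the full mcmc.list returned by coda.samples()):

```r
library(coda)

# Drop the burn-in, keeping iterations 10001-15000 across all chains,
# then compute the Gelman-Rubin diagnostic on the retained samples only.
post <- window(out_2MCMC, start = 10001, end = 15000)
gelman.diag(post)
```

Because window() preserves the mcmc.list class and all chains, gelman.diag() no longer complains about needing at least two chains, which was the problem with subsetting via `[` or wrapping in as.mcmc().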