Related
UPDATE: Now with Traceplot example
UPDATE: Now with new traceplot
I am trying to adapt Outhwaite et. als 2018 code for occupancy modelling and have a couple of questions that I just can't seem to find an answer for...
Code used to create model
cat(
"model{
### Model ###
# State model
for (i in 1:nsite){
for (t in 1:nyear){
z[i,t] ~ dbern(psi[i,t])
logit(psi[i,t])<- b[t] + u[i]
}}
# Observation model
for(j in 1:nvisit) {
y[j] ~ dbern(Py[j]+0.0001)
Py[j]<- z[Site[j],Year[j]]*p[j]
logit(p[j]) <- a[Year[j]] + c*logL[j]
}
### Priors ###
# State model priors
for(t in 1:nyear){
b[t] ~ dunif(-10,10) # fixed year effect
}
for (i in 1:nsite) {
u[i] ~ dnorm(0, tau.u) # random site effect
}
tau.u <- 1/(sd.u * sd.u)
sd.u ~ dunif(0, 5) # half-uniform hyperpriors
# Observation model priors
for (t in 1:nyear) {
a[t] ~ dnorm(mu.a, tau.a) # random year effect
}
mu.a ~ dnorm(0, 0.01)
tau.a <- 1 / (sd.a * sd.a)
sd.a ~ dunif(0, 5) # half-uniform hyperpriors
c ~ dunif(-10, 10) # sampling effort effect
### Derived parameters ###
# Finite sample occupancy - proportion of occupied sites
for (t in 1:nyear) {
psi.fs[t] <- sum(z[1:nsite,t])/nsite
}
#data# nyear, nsite, nvisit, y, logL, Site, Year
}", file="bmmodel.txt"
)
Note that dbern(Py[j]+0.0001) includes a correction factor since dbern(0) is not supported in JAGS.
I am running the model on some plant data just basically trying it out to see if it runs and converges and behaves as I would expect it to.
Question number 1(ANSWERED): I am interested in the quantity psi.fs[t]. But since the model calculates this quantity after the actual modelling process, can convergence be assessed for psi.fs[t]?
R code for running model with R2JAGS
jagsrespsi<-jags(data.list, inits=test.inits,
n.chains=2, n.iter=15000, n.thin=3,
DIC=T,
model.file=paste0(modeltype,"model.txt"), parameters.to.save=c("psi.fs"))
Question number 2: When I use traceplot(jagsrespsi) to plot the traceplot seems all over the place but the Rhat for jagsrespsi$BUGSoutput is 1 for all my years? gelman.diag(as.mcmc(jagsrespsi)) also indicates convergence. Same goes for monitoring psi!
I am very astonished by this model behaviour and am suspecting there is something wrong... but no idea where to look
Yes, you can check psi.ft[] for convergence in exactly the same way as you check the convergence of the model's parameters. That's exactly what happens, for example, in a logistic regression, where the fitted probabilities of response are calculated as exp(z)/(1 + exp(z)) for some linear predictor z.
When you say the traceplot is "all over the place", what do you mean? This could be either good or bad. Can you show an example? A "good" traceplot looks like a "fat, hairy caterpillar": consecutive samples taken from all regions of the sample space, a horizontal hair ball. Although written for SAS, this page gives a reasonable high level description of what a good trace plot looks like, and what problems might be indicated by less-than-ideal examples.
In response to your edit to include the trace plot...
That doesn't look like a particularly good traceplot to me: there seems to be some negative autocorrelation between successive samples. Have you calculated the effective sample size [ESS]?
But the plot may look a little odd because your chain is very short, IMHO. You can use the ESS to provide a very rough approximation for the accuracy of an estimated probability: the worst case half width CI of a binomial proportion is +/-2 * sqrt(0.5*0.5/N), where N is the sample size (or ESS in this case). So even if the efficiency of your MCMC process is 1 - so that the ESS is equal to the chain length - then the accuracy of your estimates is only +/-0.02. To estimate a probability to 2 decimal places (so that the half width of the CI is no more than 0.005), you need an ESS of 40,000.
There's nothing wrong with using short chain lengths during testing, but for "production" runs, then I would always use a chan length much greater than 2,500. (And I'd also use multiple chains so that I can use Gelman-Rubin statistics to test for convergence.)
My goal is to basically migrate this code to R.
All the preprocessing wrt datasets has been already done, now however I am stuck in writing the "model" file. As a first attempt, and for the sake of clarity, I wrote the code which is shown below in R language.
What I want to do is to run an MCMC to have an estimate of the parameter R_t, given the daily reported data for Italian Country.
The main steps that have been pursued are:
Sample an array parameter, namely the log(R_t), from a Gaussian RW distribution
Gauss_RandomWalk <- function(N, x0, mu, variance) {
z <- cumsum(rnorm(n=N, mean=mu, sd=sqrt(variance)))
t <- 1:N
x <- (x0 + t*mu + z)
return(x)
}
log_R_t <- Gauss_RandomWalk(tot_dates, 0., 0., 0.035**2)
R_t_candidate <- exp(log_R_t)
Compute some quantities, that are function of this sampled parameters, namely the number of infections. This dependence is quite simple, since it is linear algebra:
infections <- rep(0. , tot_dates)
infections[1] <- exp(seed)
for (t in 2:tot_dates){
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
Convolve the array I have just computed with a delay distribution (onset+reporting delay), finally rescaling it by the exposure variable:
test_adjusted_positive <- convolve(infections, delay_distribution_df$density, type = "open")
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- round(test_adjusted_positive*exposure)
Compute the Likelihood, which is proportional to the probability that a certain set of data was observed (i.e. daily confirmed cases), by sampling the aforementioned log(R_t) parameter from which the variable positive is computed.
likelihood <- dnbinom(round(Italian_data$daily_confirmed), mu = positive, size = 1/6)
Finally, here we come to my BUGS model file:
model {
#priors as a Gaussian RW
log_rt[1] ~ dnorm(0, 0.035)
log_rt[2] ~ dnorm(0, 0.035)
for (t in 3:tot_dates) {
log_rt[t] ~ dnorm(log_rt[t-1] + log_rt[t-2], 0.035)
R_t_candidate[t] <- exp(log_rt[t])
}
# data likelihood
for (t in 2:tot_dates) {
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
test_adjusted_positive <- convolve(infections, delay_distribution)
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- test_adjusted_positive*exposure
for (t in 2:tot_dates) {
confirmed[t] ~ dnbinom( obs[t], positive[t], 1/6)
}
}
where gt_to_convolution is a constant matrix, tot_dates is a constant value and exposure is a constant array.
When trying to compile it through:
data <- NULL
data$obs <- round(Italian_data$daily_confirmed)
data$tot_dates <- n_days
data$delay_distribution <- delay_distribution_df$density
data$exposure <- exposure
data$gt_to_convolution <- gt_to_convolution
inits <- NULL
inits$log_rt <- rep(0, tot_dates)
library (rjags)
library (coda)
set.seed(1995)
model <- "MyModel.bug"
jm <- jags.model(model , data, inits)
It raises the following raising error:
Compiling model graph
Resolving undeclared variables
Allocating nodes
Deleting model
Error in jags.model(model, data, inits) : RUNTIME ERROR:
Compilation error on line 19.
Possible directed cycle involving test_adjusted_positive
Hence I am not even able to debug it a little, even though I'm pretty sure there is something wrong more in general but I cannot figure out what and why.
At this point, I think the best choice would be to implement a Metropolis Algorithm myself according to the likelihood above, but obviously, I would way much more prefer to use an already tested framework that is BUGS/JAGS, this is the reason why I am asking for help.
Assuming this ís my Bayesian model, how can i calculate the expected value of my Weibull distribution? Is there a command for finding the expected value of the Weibull distribution in R and JAGS? Thanks
model{
#likelihood function
for (i in 1:n)
{
t[i] ~ dweib(v,lambda)#MTBF
}
#Prior for MTBF
v ~ dgamma(0.0001, 0.0001)
lambda ~ dgamma(0.0001, 0.0001)
}
#inits
list(v=1, lambda=1,mu=0,tau=1)
#Data
list(n=10, t=c(5.23333333,8.95,8.6,230.983333,1.55,85.1,193.033333,322.966667,306.716667,1077.8)
The mean, or expected value, of the Weibull distribution using the moment of methods with parameters v and lambda, is:
lambda * Gamma(1 + 1/v)
JAGS does not have the Gamma function, but we can use a work around with a
function that is does have: logfact. You can add this line to your code and track the derived parameter exp_weibull.
exp_weibull <- lambda * exp(logfact(1/v))
Gamma is just factorial(x - 1), so the mean simplifies a bit. I illustrate
below with some R functions how this derivation is the same.
lambda <- 5
v <- 2
mu_traditional <- lambda * gamma(1 + 1/v)
mu_logged <- lambda * exp(lfactorial(1/v))
identical(mu_traditional, mu_logged)
[1] TRUE
EDIT:
It seems like JAGS also has the log of the Gamma distribution as well: loggam. Thus, another solution would be
exp_weibull <- lambda * exp(loggam(1 + 1/v))
My understanding is that the parameterization of the Weibull distribution used by JAGS is different from that used by dweibull in R. I believe the JAGS version uses shape, v and rate lambda with an expected value of lambda^{-1/v}*gamma(1+1/v). Thus, I've implemented the expected value in JAGS as lambda^(-1/v)*exp(loggam(1+(1/v))). Interested if others disagree, admittedly I've had a tough time tracking which parameterization is used and how the expected value is formulated, especially give some of the interchangeability in symbols used for different parameters in different formulations!
I have the following latent variable model: Person j has two latent variables, Xj1 and Xj2. The only thing we get to observe is their maximum, Yj = max(Xj1, Xj2). The latent variables are bivariate normal; they each have mean mu, variance sigma2, and their correlation is rho. I want to estimate the three parameters (mu, sigma2, rho) using only Yj, with data from n patients, j = 1,...,n.
I've tried to fit this model in JAGS (so I'm putting priors on the parameters), but I can't get the code to compile. Here's the R code I'm using to call JAGS. First I generate the data (both latent and observed variables), given some true values of the parameters:
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
Then I define the JAGS model, and write a little function to run the JAGS sampler and return the samples:
# JAGS model code
model.text <- '
model {
for (i in 1:n) {
Y[i] <- max(X[i,1], X[i,2]) # Ack!
X[i,1:2] ~ dmnorm(X_mean, X_prec)
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
}
'
# run JAGS code. If latent=FALSE, remove the line defining Y[i] from the JAGS model
fit.jags <- function(latent=TRUE, data, n.adapt=1000, n.burnin, n.samp) {
require(rjags)
if (!latent)
model.text <- sub('\n *Y.*?\n', '\n', model.text)
textCon <- textConnection(model.text)
fit <- jags.model(textCon, data, n.adapt=n.adapt)
close(textCon)
update(fit, n.iter=n.burnin)
coda.samples(fit, variable.names=c("mu","sigma2","rho"), n.iter=n.samp)[[1]]
}
Finally, I call JAGS, feeding it only the observed data:
samp1 <- fit.jags(latent=TRUE, data=list(n=n, Y=Y), n.burnin=1000, n.samp=2000)
Sadly this results in an error message: "Y[1] is a logical node and cannot be observed". JAGS does not like me using "<-" to assign a value to Y[i] (I denote the offending line with an "Ack!"). I understand the complaint, but I'm not sure how to rewrite the model code to fix this.
Also, to demonstrate that everything else (besides the "Ack!" line) is fine, I run the model again, but this time I feed it the X data, pretending that it's actually observed. This runs perfectly and I get good estimates of the parameters:
samp2 <- fit.jags(latent=FALSE, data=list(n=n, X=X), n.burnin=1000, n.samp=2000)
colMeans(samp2)
If you can find a way to program this model in STAN instead of JAGS, that would be fine with me.
Theoretically you can implement a model like this in JAGS using the dsum distribution (which in this case uses a bit of a hack as you are modelling the maximum and not the sum of the two variables). But the following code does compile and run (although it does not 'work' in any real sense - see later):
set.seed(2017-02-08)
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
model.text <- '
model {
for (i in 1:n) {
Y[i] ~ dsum(max_X[i])
max_X[i] <- max(X[i,1], X[i,2])
X[i,1:2] ~ dmnorm(X_mean, X_prec)
ranks[i,1:2] <- rank(X[i,1:2])
chosen[i] <- ranks[i,2]
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
#data# n, Y
#monitor# mu, sigma2, rho, tau, chosen[1:10]
#inits# X
}
'
library('runjags')
results <- run.jags(model.text)
results
plot(results)
Two things to note:
JAGS isn't smart enough to initialise the matrix of X while satisfying the dsum(max(X[i,])) constraint on its own - so we have to initialise X for JAGS using sensible values. In this case I'm using the simulated values which is cheating - the answer you get is highly dependent on the choice of initial values for X, and in the real world you won't have the simulated values to fall back on.
The max() constraint causes problems to which I can't think of a solution within a general framework: unlike the usual dsum constraint that allows one parameter to decrease while the other increases and therefore both parameters are used at all times, the min() value of X[i,] is ignored and the sampler is therefore free to do as it pleases. This will very very rarely (i.e. never) lead to values of min(X[i,]) that happen to be identical to Y[i], which is the condition required for the sampler to 'switch' between the two X[i,]. So switching never happens, and the X[] that were chosen at initialisation to be the maxima stay as the maxima - I have added a trace parameter 'chosen' which illustrates this.
As far as I can see the other potential solutions to the 'how do I code this' question will fall into essentially the same non-mixing trap which I think is a fundamental problem here (although I might be wrong and would very much welcome working BUGS/JAGS/Stan code that illustrates otherwise).
Solutions to the failure to mix are harder, although something akin to the Carlin & Chibb method for model selection may work (force a min(pseudo_X) parameter to be equal to Y to encourage switching). This is likely to be tricky to get working, but if you can get help from someone with a reasonable amount of experience with BUGS/JAGS you could try it - see:
Carlin, B.P., Chib, S., 1995. Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B 57, 473–484.
Alternatively, you could try thinking about the problem slightly differently and model X directly as a matrix with the first column all missing and the second column all equal to Y. You could then use dinterval() to set a constraint on the missing values that they must be lower than the corresponding maximum. I'm not sure how well this would work in terms of estimating mu/sigma2/rho but it might be worth a try.
By the way, I realise that this doesn't necessarily answer your question but I think it is a useful example of the difference between 'is it codeable' and 'is it workable'.
Matt
ps. A much smarter solution would be to consider the distribution of the maximum of two normal variates directly - I am not sure if such a distribution exists, but it it does and you can get a PDF for it then the distribution could be coded directly using the zeros/ones trick without having to consider the value of the minimum at all.
I believe you can model this in the Stan language treating the likelihood as a two component mixture with equal weights. The Stan code could look like
data {
int<lower=1> N;
vector[N] Y;
}
parameters {
vector<upper=0>[2] diff[N];
real mu;
real<lower=0> sigma;
real<lower=-1,upper=1> rho;
}
model {
vector[2] case_1[N];
vector[2] case_2[N];
vector[2] mu_vec;
matrix[2,2] Sigma;
for (n in 1:N) {
case_1[n][1] = Y[n]; case_1[n][2] = Y[n] + diff[n][1];
case_2[n][2] = Y[n]; case_2[n][1] = Y[n] + diff[n][2];
}
mu_vec[1] = mu; mu_vec[2] = mu;
Sigma[1,1] = square(sigma);
Sigma[2,2] = Sigma[1,1];
Sigma[1,2] = Sigma[1,1] * rho;
Sigma[2,1] = Sigma[1,2];
// log-likelihood
target += log_mix(0.5, multi_normal_lpdf(case_1 | mu_vec, Sigma),
multi_normal_lpdf(case_2 | mu_vec, Sigma));
// insert priors on mu, sigma, and rho
}
I am trying to evaluate themodel fit of several regressions in R, and I have run into a problem I have had multiple times now: the log-likelihood of my Poisson regression is infinite.
I'm using a non-integer dependent variable (Note: I know what I'm doing in this regard), and I'm wondering if maybe that's the problem. However, I don't get an infinite log-likelihood when running the regression with glm.nb.
Code to reproduce the issue is below.
Edit: the problem appears to go away when I coerce the DV to integer. Any idea how to get log likelihood from Poissons with non-integer DVs?
# Input Data
so_data <- data.frame(dv = c(21.0552722691125, 24.3061351414885, 7.84658638053276,
25.0294679770848, 15.8064731063311, 10.8171744654056, 31.3008088413026,
2.26643928259238, 18.4261153345417, 5.62915828161753, 17.0691184593063,
1.11959635820499, 30.0154935602592, 23.0000809735738, 28.4389825676123,
27.7678405415711, 23.7108405071757, 23.5070651053276, 14.2534787168392,
15.2058525068363, 19.7449094187771, 2.52384709295823, 29.7081691356397,
32.4723790240354, 19.2147002673637, 61.7911384519901, 10.5687170234821,
23.9047421013736, 18.4889651451222, 13.0360878554798, 15.1752866581849,
11.5205948111817, 31.3539840929108, 31.7255952728076, 25.3034625215724,
5.00013988265465, 30.2037887018226, 1.86123112349445, 3.06932041603219,
22.6739418581257, 6.33738321053804, 24.2933951601142, 14.8634827414491,
31.8302947881089, 34.8361908525564, 1.29606416941288, 13.206844629927,
28.843579313401, 25.8024295609021, 14.4414831628722, 18.2109680632694,
14.7092063453463, 10.0738043919183, 28.4124482962025, 27.1004208775326,
1.31350378236957, 14.3009307888745, 1.32555197766214, 2.70896028922312,
3.88043749517381, 3.79492216916016, 19.4507965653633, 32.1689088941444,
2.61278585713499, 41.6955885902228, 2.13466761675063, 30.4207256294235,
24.8231524369244, 20.7605955978196, 17.2182798298094, 2.11563574288652,
12.290778250655, 0.957467139696772, 16.1775287334746))
# Run Model
p_mod <- glm(dv ~ 1, data = so_data, family = poisson(link = 'log'))
# Be Confused
logLik(p_mod)
Elaborating on #ekstroem's comment: the Poisson distribution is only supported over the non-negative integers (0, 1, ...). So, technically speaking, the probability of any non-integer value is zero -- although R does allow for a little bit of fuzz, to allow for round-off/floating-point representation issues:
> dpois(1,lambda=1)
[1] 0.3678794
> dpois(1.1,lambda=1)
[1] 0
Warning message:
In dpois(1.1, lambda = 1) : non-integer x = 1.100000
> dpois(1+1e-7,lambda=1) ## fuzz
[1] 0.3678794
It is theoretically possible to compute something like a Poisson log-likelihood for non-integer values:
my_dpois <- function(x,lambda,log=FALSE) {
LL <- -lambda+x*log(lambda)-lfactorial(x)
if (log) LL else exp(LL)
}
but I would be very careful - some quick tests with integrate suggest it integrates to 1 (after I fixed the bug in it), but I haven't checked more carefully that this is really a well-posed probability distribution. (On the other hand, some reasonable-seeming posts on CrossValidated suggest that it's not insane ...)
You say "I know what I'm doing in this regard"; can you give some more of the context? Some alternative possibilities (although this is steering into CrossValidated territory) -- the best answer depends on where your data really come from (i.e., why you have "count-like" data that are non-integer but you think should be treated as Poisson).
a quasi-Poisson model (family=quasipoisson). (R will still not give you log-likelihood or AIC values in this case, because technically they don't exist -- you're supposed to do inference on the basis of the Wald statistics of the parameters; see e.g. here for more info.)
a Gamma model (probably with a log link)
if the data started out as count data that you've scaled by some measure of effort or exposure), use an appropriate offset model ...
a generalized least-squares model (nlme::gls) with an appropriate heteroscedasticity specification
Poisson log-likelihood involves calculating log(factorial(x)) (https://www.statlect.com/fundamentals-of-statistics/Poisson-distribution-maximum-likelihood). For values larger than 30 it has to be done using Stirling's approximation formula in order to avoid exceeding the limit of computer arithmetic. Sample code in Python:
# define a likelihood function. https://www.statlect.com/fundamentals-of- statistics/Poisson-distribution-maximum-likelihood
def loglikelihood_f(lmba, x):
#Using Stirling formula to avoid calculation of factorial.
#logfactorial(n) = n*ln(n) - n
n = x.size
logfactorial = x*np.log(x+0.001) - x #np.log(factorial(x))
logfactorial[logfactorial == -inf] = 0
result =\
- np.sum(logfactorial) \
- n * lmba \
+ np.log(lmba) * np.sum(x)
return result