Truncated normal pdf - stan

According to the documentation, I can deal with truncated normal distributions with unknown truncation points by doing y[n] ~ normal(mu, sigma) T[L,U]; where L and U are parameters.
However, what is the normal_lpdf version of y[n] ~ normal(mu, sigma) T[L,U]? I'm working on an HMM and if I use the forward algorithm as described in the semisupervised estimation on this page, I need to do something similar to acc[j] = gamma[t-1, j] + log(theta[j, k]) + log(phi[k, u[t]]); but in my case I need to replace log(phi[k, u[t]]) with the log of a truncated normal pdf.

Related

How can I use RStan without for loops?

Is there a way to more efficiently perform the following calculations in RStan?
I have only provided the minimal amount of coded that is needed:
parameters {
real beta_0;
real beta_1;
}
model {
vector [n] p_i = exp(beta_0 + beta_1*x)/[1 + exp(beta_0 + beta_1*x)];
y ~ bernoulli(p_i);
/* Likelihood:
for(i in 1:n){
p_i[i] = exp(beta_0 + beta_1*x[i])/(1 + exp(beta_0 + beta_1*x[i]));
y[i] ~ bernoulli(p_i[i]);
*/}
// Prior:
beta_0 ~ normal(m_beta_0, s_beta_0);
beta_1 ~ normal(m_beta_1, s_beta_1);
}
I obtain the following error message: "Matrix expression elements must be type row_vector and row vector expression elements must be int or real, but found element of type vector". If I use the for loop (which is commented out), the code works fine, but I would like to limit the use of for loops in my code. In the above code, x, is a vector of length n.
Another example:
parameters {
real gamma1;
real gamma2;
real gamma3;
real gamma4;
}
model {
// Likelihood:
real lambda;
real beta;
real phi;
for(i in 1:n){
lambda = exp(gamma1)*x[n_length[i]]^gamma2;
beta = exp(gamma3)*x[n_length[i]]^gamma4;
phi = lambda^(-1/beta);
y[i] ~ weibull(beta, phi);
}
//y ~ weibull(exp(gamma1)*x^gamma2, exp(gamma3)*x^gamma4); //cannot raise a vector to a power
// Prior:
gamma1 ~ normal(m_gamma1, s_gamma1);
gamma2 ~ normal(m_gamma2, s_gamma2);
gamma3 ~ normal(m_gamma3, s_gamma3);
gamma4 ~ normal(m_gamma4, s_gamma4);
}
The above code works, but the commented out likelihood calculation does not work since I "cannot raise a vector to a power" (but you can in R). I would, once again, like to not be forced to use for loops. In the above code, n_length, is a vector of length n.
A final example. If I want to draw 10000 samples from a normal distribution in R, I can simply specify
rnorm(10000, mu, sigma)
But in RStan, I would have to use a for loop, for example
parameters {
real mu;
real sigma;
}
generated quantities {
vector[n] x;
for(i in 1:n) {
x[i] = normal_rng(mu, sigma);
}
}
Is there anything that I can do to speed up my RStan examples?
This line of code:
vector [n] p_i = exp(beta_0 + beta_1*x)/[1 + exp(beta_0 + beta_1*x)];
is not valid syntax in the Stan language because square brackets are only used for indexing. It could instead be
vector [n] p_i = exp(beta_0 + beta_1*x) ./ (1 + exp(beta_0 + beta_1*x));
which utilizes the elementwise division operator, or better yet
vector [n] p_i = inv_logit(beta_0 + beta_1*x);
in which case y ~ bernoulli(p_i); would work as a likelihood. Better still, just do
y ~ bernoulli_logit(beta_0 + beta_1 * x);
and it will do the transformation for you in a numerically stable fashion. You could also use bernoulli_logit_glm, which is slightly faster particularly with large datasets.
In Stan 2.19.x, I think you can draw N values from a probability distribution in the generated quantities block. But you are too worried about for loops. The Stan program is transpiled to C++ where loops are fast and almost all of the functions in the Stan language that accept vector inputs and produce vector outputs actually involve the same loop in C++ as if you had done the loop yourself.

Expected value command R and JAGS

Assuming this ís my Bayesian model, how can i calculate the expected value of my Weibull distribution? Is there a command for finding the expected value of the Weibull distribution in R and JAGS? Thanks
model{
#likelihood function
for (i in 1:n)
{
t[i] ~ dweib(v,lambda)#MTBF
}
#Prior for MTBF
v ~ dgamma(0.0001, 0.0001)
lambda ~ dgamma(0.0001, 0.0001)
}
#inits
list(v=1, lambda=1,mu=0,tau=1)
#Data
list(n=10, t=c(5.23333333,8.95,8.6,230.983333,1.55,85.1,193.033333,322.966667,306.716667,1077.8)
The mean, or expected value, of the Weibull distribution using the moment of methods with parameters v and lambda, is:
lambda * Gamma(1 + 1/v)
JAGS does not have the Gamma function, but we can use a work around with a
function that is does have: logfact. You can add this line to your code and track the derived parameter exp_weibull.
exp_weibull <- lambda * exp(logfact(1/v))
Gamma is just factorial(x - 1), so the mean simplifies a bit. I illustrate
below with some R functions how this derivation is the same.
lambda <- 5
v <- 2
mu_traditional <- lambda * gamma(1 + 1/v)
mu_logged <- lambda * exp(lfactorial(1/v))
identical(mu_traditional, mu_logged)
[1] TRUE
EDIT:
It seems like JAGS also has the log of the Gamma distribution as well: loggam. Thus, another solution would be
exp_weibull <- lambda * exp(loggam(1 + 1/v))
My understanding is that the parameterization of the Weibull distribution used by JAGS is different from that used by dweibull in R. I believe the JAGS version uses shape, v and rate lambda with an expected value of lambda^{-1/v}*gamma(1+1/v). Thus, I've implemented the expected value in JAGS as lambda^(-1/v)*exp(loggam(1+(1/v))). Interested if others disagree, admittedly I've had a tough time tracking which parameterization is used and how the expected value is formulated, especially give some of the interchangeability in symbols used for different parameters in different formulations!

LaplacesDemon: when should I sum prior density?

I am migrating from JAGS to LaplacesDemon and trying to rewrite some of my codes. I have read the LaplacesDemon Tutorial and LaplacesDemon Examples vignettes and am a bit confused about some of examples in the vignettes.
In the simple example in LaplacesDemon Tutorial (p.5), the model is written as:
Model <- function(parm, Data)
{beta <- parm[Data$pos.beta]
sigma <- interval(parm[Data$pos.sigma], 1e-100, Inf)
parm[Data$pos.sigma] <- sigma
beta.prior <- dnormv(beta, 0, 1000, log=TRUE)
sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
mu <- tcrossprod(beta, Data$X)
LL <- sum(dnorm(Data$y, mu, sigma, log=TRUE))
LP <- LL + sum(beta.prior) + sigma.prior
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
yhat=rnorm(length(mu), mu, sigma), parm=parm)
return(Modelout)}
Here, the beta.prior was summed up for LP as there are more than one beta parameters.
But I found in the more advanced examples in the LaplacesDemon Example vignette, it doesn't seem to always follow the rule. Such as in example 87 (p.162):
Model <- function(parm, Data)
{### Log-Prior
beta.prior <- sum(dnormv(beta[,1], 0, 1000, log=TRUE), dnorm(beta[,-1], beta[,-Data$T], matrix(tau, Data$K, Data$T-1), log=TRUE))
zeta.prior <- dmvn(zeta, rep(0,Data$S), Sigma[ , , 1], log=TRUE)
phi.prior <- sum(dhalfnorm(phi[1], sqrt(1000), log=TRUE), dtrunc(phi[-1], "norm", a=0, b=Inf, mean=phi[-Data$T], sd=sigma[2], log=TRUE))
### Log-Posterior
LP <- LL + beta.prior + zeta.prior + sum(phi.prior) + sum(kappa.prior) + sum(lambda.prior) + sigma.prior + tau.prior
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP, yhat=rnorm(prod(dim(mu)), mu, sigma[1]), parm=parm)
return(Modelout)}
(Put only part of the codes owing to the length of the example codes)
Here, zeta is more than one but wasn't summed in either the Log-Prior or Log-Posterior part, beta is more than one and was summed in Log-Prior and phi is also more than one parameters but it was summed in both Log-Prior and Log-Posterior parts.
And in the next example on p.167, it seems to be different again.
I was wondering in what scenario we should sum the prior density? Many thanks!
Have you tried running the code line by line? You would learn that there is nothing to sum since dmvn is the density function of multivariate normal distribution and it returns a single value -- probability density of observing vector zeta. The reason for all the sums is that to obtain probability of observing two independent events together we multiply their marginal probabilities (or sum their logs). So we multiply the probabilities of observing all the priors together to obtain their joint distribution.

Using JAGS or STAN when an observed node is the max of latent nodes

I have the following latent variable model: Person j has two latent variables, Xj1 and Xj2. The only thing we get to observe is their maximum, Yj = max(Xj1, Xj2). The latent variables are bivariate normal; they each have mean mu, variance sigma2, and their correlation is rho. I want to estimate the three parameters (mu, sigma2, rho) using only Yj, with data from n patients, j = 1,...,n.
I've tried to fit this model in JAGS (so I'm putting priors on the parameters), but I can't get the code to compile. Here's the R code I'm using to call JAGS. First I generate the data (both latent and observed variables), given some true values of the parameters:
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
Then I define the JAGS model, and write a little function to run the JAGS sampler and return the samples:
# JAGS model code
model.text <- '
model {
for (i in 1:n) {
Y[i] <- max(X[i,1], X[i,2]) # Ack!
X[i,1:2] ~ dmnorm(X_mean, X_prec)
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
}
'
# run JAGS code. If latent=FALSE, remove the line defining Y[i] from the JAGS model
fit.jags <- function(latent=TRUE, data, n.adapt=1000, n.burnin, n.samp) {
require(rjags)
if (!latent)
model.text <- sub('\n *Y.*?\n', '\n', model.text)
textCon <- textConnection(model.text)
fit <- jags.model(textCon, data, n.adapt=n.adapt)
close(textCon)
update(fit, n.iter=n.burnin)
coda.samples(fit, variable.names=c("mu","sigma2","rho"), n.iter=n.samp)[[1]]
}
Finally, I call JAGS, feeding it only the observed data:
samp1 <- fit.jags(latent=TRUE, data=list(n=n, Y=Y), n.burnin=1000, n.samp=2000)
Sadly this results in an error message: "Y[1] is a logical node and cannot be observed". JAGS does not like me using "<-" to assign a value to Y[i] (I denote the offending line with an "Ack!"). I understand the complaint, but I'm not sure how to rewrite the model code to fix this.
Also, to demonstrate that everything else (besides the "Ack!" line) is fine, I run the model again, but this time I feed it the X data, pretending that it's actually observed. This runs perfectly and I get good estimates of the parameters:
samp2 <- fit.jags(latent=FALSE, data=list(n=n, X=X), n.burnin=1000, n.samp=2000)
colMeans(samp2)
If you can find a way to program this model in STAN instead of JAGS, that would be fine with me.
Theoretically you can implement a model like this in JAGS using the dsum distribution (which in this case uses a bit of a hack as you are modelling the maximum and not the sum of the two variables). But the following code does compile and run (although it does not 'work' in any real sense - see later):
set.seed(2017-02-08)
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
model.text <- '
model {
for (i in 1:n) {
Y[i] ~ dsum(max_X[i])
max_X[i] <- max(X[i,1], X[i,2])
X[i,1:2] ~ dmnorm(X_mean, X_prec)
ranks[i,1:2] <- rank(X[i,1:2])
chosen[i] <- ranks[i,2]
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
#data# n, Y
#monitor# mu, sigma2, rho, tau, chosen[1:10]
#inits# X
}
'
library('runjags')
results <- run.jags(model.text)
results
plot(results)
Two things to note:
JAGS isn't smart enough to initialise the matrix of X while satisfying the dsum(max(X[i,])) constraint on its own - so we have to initialise X for JAGS using sensible values. In this case I'm using the simulated values which is cheating - the answer you get is highly dependent on the choice of initial values for X, and in the real world you won't have the simulated values to fall back on.
The max() constraint causes problems to which I can't think of a solution within a general framework: unlike the usual dsum constraint that allows one parameter to decrease while the other increases and therefore both parameters are used at all times, the min() value of X[i,] is ignored and the sampler is therefore free to do as it pleases. This will very very rarely (i.e. never) lead to values of min(X[i,]) that happen to be identical to Y[i], which is the condition required for the sampler to 'switch' between the two X[i,]. So switching never happens, and the X[] that were chosen at initialisation to be the maxima stay as the maxima - I have added a trace parameter 'chosen' which illustrates this.
As far as I can see the other potential solutions to the 'how do I code this' question will fall into essentially the same non-mixing trap which I think is a fundamental problem here (although I might be wrong and would very much welcome working BUGS/JAGS/Stan code that illustrates otherwise).
Solutions to the failure to mix are harder, although something akin to the Carlin & Chibb method for model selection may work (force a min(pseudo_X) parameter to be equal to Y to encourage switching). This is likely to be tricky to get working, but if you can get help from someone with a reasonable amount of experience with BUGS/JAGS you could try it - see:
Carlin, B.P., Chib, S., 1995. Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B 57, 473–484.
Alternatively, you could try thinking about the problem slightly differently and model X directly as a matrix with the first column all missing and the second column all equal to Y. You could then use dinterval() to set a constraint on the missing values that they must be lower than the corresponding maximum. I'm not sure how well this would work in terms of estimating mu/sigma2/rho but it might be worth a try.
By the way, I realise that this doesn't necessarily answer your question but I think it is a useful example of the difference between 'is it codeable' and 'is it workable'.
Matt
ps. A much smarter solution would be to consider the distribution of the maximum of two normal variates directly - I am not sure if such a distribution exists, but it it does and you can get a PDF for it then the distribution could be coded directly using the zeros/ones trick without having to consider the value of the minimum at all.
I believe you can model this in the Stan language treating the likelihood as a two component mixture with equal weights. The Stan code could look like
data {
int<lower=1> N;
vector[N] Y;
}
parameters {
vector<upper=0>[2] diff[N];
real mu;
real<lower=0> sigma;
real<lower=-1,upper=1> rho;
}
model {
vector[2] case_1[N];
vector[2] case_2[N];
vector[2] mu_vec;
matrix[2,2] Sigma;
for (n in 1:N) {
case_1[n][1] = Y[n]; case_1[n][2] = Y[n] + diff[n][1];
case_2[n][2] = Y[n]; case_2[n][1] = Y[n] + diff[n][2];
}
mu_vec[1] = mu; mu_vec[2] = mu;
Sigma[1,1] = square(sigma);
Sigma[2,2] = Sigma[1,1];
Sigma[1,2] = Sigma[1,1] * rho;
Sigma[2,1] = Sigma[1,2];
// log-likelihood
target += log_mix(0.5, multi_normal_lpdf(case_1 | mu_vec, Sigma),
multi_normal_lpdf(case_2 | mu_vec, Sigma));
// insert priors on mu, sigma, and rho
}

How to obtain new samples from ZIP or ZINB-model for bayesian p-value

Hopefully someone can help me with this one, because I am really stuck and do not find my coding error!
I am fitting zero-inflated poisson / negative binomial GLMs (no random effects) in JAGS (with R2Jags) and everything is fine with the parameter estimates, priors, initial values and chains convergence. All results are perfectly in line with, e.g., the estimates from the pscl-package, including my calculation of pearson residuals in the model...
The only thing I cannot get to work is to sample from the model a new sample to obtain a bayesian p-value for evaluating the model fit. The "normal" poisson and negative binomial models I fit before all gave the expected replicated samples and no problems occured.
Here's my code so far, but the important part is "#New Samples":
model{
# 1. Priors
beta ~ dmnorm(b0[], B0[,])
aB ~ dnorm(0.001, 1)
#2. Likelihood function
for (i in 1:N){
# Logistic part
W[i] ~ dbern(psi.min1[i])
psi.min1[i] <- 1 - psi[i]
eta.psi[i] <- aB
logit(psi[i]) <- eta.psi[i]
# Poisson part
Y[i] ~ dpois(mu.eff[i])
mu.eff[i] <- W[i] * mu[i]
log(mu[i]) <- max(-20, min(20, eta.mu[i]))
eta.mu[i] <- inprod(beta[], X[i,])
# Discrepancy measures:
ExpY[i] <- mu [i] * (1 - psi[i])
VarY[i] <- (1- psi[i]) * (mu[i] + psi[i] * pow(mu[i], 2))
PRes[i] <- (Y[i] - ExpY[i]) / sqrt(VarY[i])
D[i] <- pow(PRes[i], 2)
# New Samples:
YNew[i] ~ dpois(mu.eff[i])
PResNew[i] <- (YNew[i] - ExpY[i]) / sqrt(VarY[i])
DNew[i] <- pow(PResNew[i], 2)
}
Fit <- sum(D[1:N])
FitNew <- sum(DNew[1:N])
}
The big problem is, that I really tried all combinations and alterations I think could/should work, but when I look at the simulated samples, I get this here:
> all.equal( Jags1$BUGSoutput$sims.list$YNew, Jags1$BUGSoutput$sims.list$Y )
[1] TRUE
And, to make it really weird, when using the means of Fit and FitNew:
> Jags1$BUGSoutput$mean$Fit
[1] 109.7883
> Jags1$BUGSoutput$mean$FitNew
[1] 119.2111
Has anyone a clue what I am doing wrong? Any help would be deeply appreciated!
Kind regards, Ulf
I suspect this isn't the case, but the only obvious reason I can suspect for Y[i] and YNew[i] being always identical is if mu.eff[i] is ~zero, either because W[i] is 0 or mu[i] is close to zero. This implies that Y[] is always zero, which is easy to check from your data, but as I said it does seem odd that you would be trying to model this... Otherwise, I'm not sure what is going on ... try simplifying the code to see if that solves the problem, and then add things back in until it breaks again. Some other suggestions:
It may be helpful for debugging to look at the absolute values of Y and YNew rather than just Y==YNew
If you want a negative binomial (= gamma-Poisson) try sampling mu[i] from a gamma distribution - I have used this formulation for ZINB models extensively, so am sure it works
Your prior for aB looks odd to me - it gives a prior 95% CI for zero inflation around 12-88% - is that what you intended? And why a mean of 0.001 not 0? If you have no predictors then a beta prior for psi.min seems more natural - and if you have no useful prior information a beta(1,1) prior would be an obvious choice.
Minor point but you are calculating a lot of deterministic functions of aB within the for loop - this is going to slow down your model...
Hope that helps,
Matt
So, after getting a nervous breakdown and typing all again and again while searching for my coding error, I found the most stupid error I have ever made - so far:
I just did not specify "Y" as a parameter to save, only "YNew", so when I compared YNew and Y from the sims.list with all.equal, I did not get what I thought I should. I do not know why JAGS gives me the Y at all (from the sims.list of the JAGS-object), but for some reason it is just giving me YNew when asked to give Y. So this part is actually right:
Jags1$BUGSoutput$mean$Fit
[1] 109.7883
Jags1$BUGSoutput$mean$FitNew
[1] 119.2111
So I hope that I did not cause a major confusion for anybody...

Resources