Is there a way to more efficiently perform the following calculations in RStan?
I have only provided the minimal amount of code that is needed:
parameters {
real beta_0;
real beta_1;
}
model {
vector [n] p_i = exp(beta_0 + beta_1*x)/[1 + exp(beta_0 + beta_1*x)];
y ~ bernoulli(p_i);
/* Likelihood:
for(i in 1:n){
p_i[i] = exp(beta_0 + beta_1*x[i])/(1 + exp(beta_0 + beta_1*x[i]));
y[i] ~ bernoulli(p_i[i]);
}*/
// Prior:
beta_0 ~ normal(m_beta_0, s_beta_0);
beta_1 ~ normal(m_beta_1, s_beta_1);
}
I obtain the following error message: "Matrix expression elements must be type row_vector and row vector expression elements must be int or real, but found element of type vector". If I use the for loop (which is commented out), the code works fine, but I would like to limit the use of for loops in my code. In the above code, x is a vector of length n.
Another example:
parameters {
real gamma1;
real gamma2;
real gamma3;
real gamma4;
}
model {
// Likelihood:
real lambda;
real beta;
real phi;
for(i in 1:n){
lambda = exp(gamma1)*x[n_length[i]]^gamma2;
beta = exp(gamma3)*x[n_length[i]]^gamma4;
phi = lambda^(-1/beta);
y[i] ~ weibull(beta, phi);
}
//y ~ weibull(exp(gamma1)*x^gamma2, exp(gamma3)*x^gamma4); //cannot raise a vector to a power
// Prior:
gamma1 ~ normal(m_gamma1, s_gamma1);
gamma2 ~ normal(m_gamma2, s_gamma2);
gamma3 ~ normal(m_gamma3, s_gamma3);
gamma4 ~ normal(m_gamma4, s_gamma4);
}
The above code works, but the commented-out likelihood calculation does not work since I "cannot raise a vector to a power" (but you can in R). I would, once again, like to not be forced to use for loops. In the above code, n_length is a vector of length n.
A final example. If I want to draw 10000 samples from a normal distribution in R, I can simply specify
rnorm(10000, mu, sigma)
But in RStan, I would have to use a for loop, for example
parameters {
real mu;
real sigma;
}
generated quantities {
vector[n] x;
for(i in 1:n) {
x[i] = normal_rng(mu, sigma);
}
}
Is there anything that I can do to speed up my RStan examples?
This line of code:
vector [n] p_i = exp(beta_0 + beta_1*x)/[1 + exp(beta_0 + beta_1*x)];
is not valid syntax in the Stan language because square brackets do not do grouping; they are used for indexing and for constructing matrix / row_vector expressions (hence the error message). It could instead be
vector [n] p_i = exp(beta_0 + beta_1*x) ./ (1 + exp(beta_0 + beta_1*x));
which utilizes the elementwise division operator, or better yet
vector [n] p_i = inv_logit(beta_0 + beta_1*x);
in which case y ~ bernoulli(p_i); would work as a likelihood. Better still, just do
y ~ bernoulli_logit(beta_0 + beta_1 * x);
and it will do the transformation for you in a numerically stable fashion. You could also use bernoulli_logit_glm, which is slightly faster, particularly with large datasets.
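For reference, a minimal sketch of what that could look like (untested; it assumes the predictor is passed in as a one-column matrix, here called X):

data {
  int<lower=0> n;
  matrix[n, 1] X;                  // the predictor x, stored as a one-column matrix
  int<lower=0, upper=1> y[n];
}
parameters {
  real beta_0;
  real beta_1;
}
model {
  // same likelihood as bernoulli_logit(beta_0 + beta_1 * x), written in GLM form;
  // [beta_1]' wraps the scalar slope into the vector the GLM signature expects
  y ~ bernoulli_logit_glm(X, beta_0, [beta_1]');
  // priors on beta_0 and beta_1 go here as before
}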
In Stan 2.19.x, I think you can draw N values from a probability distribution in the generated quantities block. But you are too worried about for loops. The Stan program is transpiled to C++ where loops are fast and almost all of the functions in the Stan language that accept vector inputs and produce vector outputs actually involve the same loop in C++ as if you had done the loop yourself.
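For example, with a recent enough RStan (where the _rng functions accept vector arguments), the third example can be written without a loop as

generated quantities {
  real x[n] = normal_rng(rep_vector(mu, n), sigma);
}

and the Weibull example can be vectorized by working on the log scale, since exp() and log() accept vectors. A sketch, assuming n_length is declared as an integer index array (int n_length[n]):

model {
  vector[n] log_xi = log(x[n_length]);
  vector[n] shape = exp(gamma3 + gamma4 * log_xi);              // beta_i
  vector[n] scale = exp(-(gamma1 + gamma2 * log_xi) ./ shape);  // phi_i = lambda_i^(-1/beta_i)
  y ~ weibull(shape, scale);
  // priors on gamma1 ... gamma4 as before
}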
Related
I'm trying to implement a modified version of a Kalman filter, in which I have a n-dimensional Normal prior on my vector of n hidden variables, and then a sequence of m independent data vectors which are distributed with different-but-known covariance matrices according to the hidden variables.
More formally, write Sigma_i for the known covariance of the i-th observed vector O_i, mu and Delta for the prior mean and covariance of the hidden variables, and Lambda for the remaining (fixed) covariance term.
The generalised solution for the posterior distribution of the hidden variables conditional on all previously observed vectors is then the recursion
Delta_i = ( Sigma_i^(-1) + (Lambda + Delta_(i-1))^(-1) )^(-1)
mu_i = Delta_i ( Sigma_i^(-1) O_i + (Lambda + Delta_(i-1))^(-1) mu_(i-1) )
with the prior (mu, Delta) used in place of (mu_0, Lambda + Delta_0) at the first step.
So it is fairly "easy" to compute the final posterior distribution – all you have to do is apply the above transformations iteratively, starting with your prior and using the known covariance matrices and observed values at each step i. So a potential way to code this, assuming I have a list Sigma with the known covariance matrices, a matrix O with the observed value vectors, and the other variables stored in Lambda, Delta, and mu, is:
inv_Sigma = solve (Sigma [[1]])
inv_Lambda_plus_Delta = solve (Delta)
Delta_i = solve (inv_Sigma + inv_Lambda_plus_Delta)
mu_i = Delta_i %*% (inv_Sigma %*% O [, 1] + inv_Lambda_plus_Delta %*% mu)
if (n > 1) {
for (i in seq (2, n)) {
inv_Sigma = solve (Sigma [[i]])
inv_Lambda_plus_Delta = solve (Lambda + Delta_i)
Delta_i = solve (inv_Sigma + inv_Lambda_plus_Delta)
mu_i = Delta_i %*% (inv_Sigma %*% O [, i] + inv_Lambda_plus_Delta %*% mu_i)
}
}
However, as the dimension of the vectors grows, having to calculate these matrix inverses iteratively over and over and over gets increasingly computationally expensive, to the point where (for the problem I want to solve) it's prohibitive. Is there a more efficient way to do this calculation, either with some already-existing function or package or with a more efficient implementation of matrix inversion? Is a simulation-based solution better for my problem?
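(For reference, the same update can be rewritten via the Woodbury identity in the "Kalman gain" form, which needs a single matrix solve per step instead of three explicit inversions. With P_i = Lambda + Delta_(i-1), and P_1 = Delta as in the code above:

K_i = P_i (P_i + Sigma_i)^(-1)
Delta_i = P_i - K_i P_i
mu_i = mu_(i-1) + K_i (O_i - mu_(i-1))

This is algebraically equivalent to the update above, so it may already remove most of the cost before resorting to simulation.)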
I have a time series model (INGARCH):
lambda_t = alpha0 + alpha1*(x_(t-1)) + beta1*(lambda_(t-1))
X_t ~ poisson (lambda_t)
where t indexes the observations, and alpha0, alpha1, and beta1 are the parameters.
X_t is the observed series and lambda_t is the series of conditional means.
This model has the condition of alpha1 + beta1 < 1.
For estimation, I would like to impose the condition alpha1 + beta1 < 1 in my code. I added a while loop in the log-likelihood function, but the loop never stops.
What can I do to solve this problem? Is there another way to add the constraint alpha1 + beta1 < 1 without using a while loop?
Below is my code:
ll <- function(par) {
h.new = rep(0,n)
#par[1] is alpha0
#par[2] is alpha1
#par[3] is beta1
while(par[2] + par[3] < 1){
for (i in 2:n) {
h.new[i] <- par[1] + par[2] * dat[i-1] + par[3] * h.new[i-1]
}
-sum(dpois(dat, h.new, log=TRUE))
}
}
#simply generate a dataset as I have not found a suitable real dataset to fit in
set.seed(77)
n=400
dat <- rpois(n,36)
nlminb(start = c(0.1,0.1,0.1), lower = 1e-6, ll)
You do not change par at all inside the while loop. In particular, if you had printed par[2] and par[3] inside the loop, you would see that they keep their original values, 0.1, forever - hence you are stuck in the while loop for ever.
par is a single, unchanging object within each call from nlminb. You just have to make sure that if par violates the constraint, you return something that is not minimal, so nlminb does not keep searching in that direction:
ll <- function(par) {
#If alpha1 + beta1 > 1, this is terrible, so return an infinite score
#It may be better to throw an error if you get NaN values! The if will
#fail anyway, but if you want to power through add checks:
if( is.nan(par[2]) || is.nan(par[3]) || par[2]+par[3]>1) return(Inf)
h.new = rep(0,n)
#remove while
for (i in 2:n) {
h.new[i] <- par[1] + par[2] * dat[i-1] + par[3] * h.new[i-1]
}
-sum(dpois(dat, h.new, log=TRUE))
}
The algorithm nlminb (or any minimization function) very roughly goes:
1. Set parameters to the initial guess
2. Send parameters to the objective function
3. Guess new parameters:
a. if the score did not improve much, return the minimized guess
b. if the score is good, keep searching in this direction
c. else, search in some other direction
4. Go back to (2) with the new parameters
Note that you have to return a score for each set of parameters; you do not iterate over them inside the objective function.
I'm new to Stan, so hoping you can point me in the right direction. I'll build up to my situation to make sure we're on the same page...
If I had a collection of univariate normals, the docs tell me that:
y ~ normal(mu_vec, sigma);
provides the same model as the unvectorized version:
for (n in 1:N)
y[n] ~ normal(mu_vec[n], sigma);
but that the vectorized version is (much?) faster. Ok, fine, makes good sense.
So the first question is: is it possible to take advantage of this vectorization speedup in the univariate normal case where both the mu and sigma of the samples vary by position in the vector? I.e. if both mu_vec and sigma_vec are vectors (in the previous case sigma was a scalar), then is this:
y ~ normal(mu_vec, sigma_vec);
equivalent to this:
for (n in 1:N)
y[n] ~ normal(mu_vec[n], sigma_vec[n]);
and if so is there a comparable speedup?
Ok. That's the warmup. The real question is how to best approach the multi-variate equivalent of the above.
In my particular case, I have N observations of bivariate data for some variable y, which I store in an N x 2 matrix. (For order of magnitude, N is about 1000 in my use case.)
My belief is that the mean of each component of each observation is 0 and that the stdev of each component of each observation is 1 (and I'm happy to hard code them as such). However, my belief is that the correlation (rho) varies from observation to observation as a (simple) function of another observed variable, x (stored in an N element vector). For example, we might say that rho[n] = 2*inverse_logit(beta * x[n]) - 1 for n in 1:N and our goal is to learn about beta from our data. I.e. the covariance matrix for the nth observation would be:
[1, rho[n]]
[rho[n], 1 ]
My question is what's the best way to put this together in a STAN model so that it isn't slow as heck? Is there a vectorized version of the multi_normal distribution so that I could specify this as:
y ~ multi_normal(vector_of_mu_2_tuples, vector_of_sigma_matrices)
or perhaps as some other similar formulation? Or will I need to write:
for (n in 1:N)
y[n] ~ multi_normal(vector_of_mu_2_tuples[n], vector_of_sigma_matrices[n])
after having set up vector_of_sigma_matrices and vector_of_mu_2_tuples in an earlier block?
Thanks in advance for any guidance!
Edit to add code
Using python, I can generate data in the spirit of my problem as follows:
import numpy as np
import pandas as pd
import pystan as pys
import scipy as sp
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import seaborn as sns
def gen_normal_data(N, true_beta, true_mu, true_stdevs):
N = N
true_beta = true_beta
true_mu = true_mu
true_stdevs = true_stdevs
drivers = np.random.randn(N)
correls = 2.0 * sp.special.expit(drivers*true_beta)-1.0
observations = []
for i in range(N):
covar = np.array([[true_stdevs[0]**2, true_stdevs[0] * true_stdevs[1] * correls[i]],
[true_stdevs[0] * true_stdevs[1] * correls[i], true_stdevs[1]**2]])
observations.append(sp.stats.multivariate_normal.rvs(true_mu, covar, size=1).tolist())
observations = np.array(observations)
return {
'N': N,
'true_mu': true_mu,
'true_stdev': true_stdevs,
'y': observations,
'd': drivers,
'correls': correls
}
and then actually generate the data using:
normal_data = gen_normal_data(100, 1.5, np.array([1., 5.]), np.array([2., 5.]))
Here's what the data set looks like (scatterplot of y colored by correls in the left pane and by drivers in the right pane). The idea is that the higher the driver, the closer to 1 the correl, and the lower the driver, the closer to -1 the correl. So we would expect red dots in the left pane to be "down-left to up-right" and blue dots to be "up-left to down-right", and indeed they are:
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
x = normal_data['y'][:, 0]
y = normal_data['y'][:, 1]
correls = normal_data['correls']
drivers = normal_data['d']
for ax, colordata, cmap in zip(axes, [correls, drivers], ['coolwarm', 'viridis']):
color_extreme = max(abs(colordata.max()), abs(colordata.min()))
sc = ax.scatter(x, y, c=colordata, lw=0, cmap=cmap, vmin=-color_extreme, vmax=color_extreme)
divider = make_axes_locatable(ax)
cax = divider.append_axes('right', size='5%', pad=0.05)
fig.colorbar(sc, cax=cax, orientation='vertical')
fig.tight_layout()
Using the brute force approach, I can set up a STAN model that looks like this:
model_naked = pys.StanModel(
model_name='naked',
model_code="""
data {
int<lower=0> N;
vector[2] true_mu;
vector[2] true_stdev;
real d[N];
vector[2] y[N];
}
parameters {
real beta;
}
transformed parameters {
}
model {
real rho[N];
matrix[2, 2] cov[N];
for (n in 1:N) {
rho[n] = 2.0*inv_logit(beta * d[n]) - 1.0;
cov[n, 1, 1] = true_stdev[1]^2;
cov[n, 1, 2] = true_stdev[1] * true_stdev[2] * rho[n];
cov[n, 2, 1] = true_stdev[1] * true_stdev[2] * rho[n];
cov[n, 2, 2] = true_stdev[2]^2;
}
beta ~ normal(0, 10000);
for (n in 1:N) {
y[n] ~ multi_normal(true_mu, cov[n]);
}
}
"""
)
This fits nicely:
fit_naked = model_naked.sampling(data=normal_data, iter=1000, chains=2)
f = fit_naked.plot();
f.tight_layout()
But I'm hoping someone can point me in the right direction for the "marginalized" approach where we break down our bivariate normal into a pair of independent normals that can be blended using the correlation. The reason I need this is that in my actual use case, both dimensions of y are fat-tailed. I am happy to model this as a student-t distribution, but the issue is that STAN only allows a single nu to be specified (not one for each dimension), so I think I'll need to find a way to decompose a multi_student_t into a pair of independent student_t's so that I can set the degrees of freedom separately for each dimension.
The univariate normal distribution does accept vectors for any or all of its arguments and it will be faster than looping over the N observations to call it N times with scalar arguments.
However, the speedup is only going to be linear because the calculations are all the same, but it only has to allocate memory once if you only call it once. The overall wall time is more affected by the number of function evaluations you have to do, which is up to 2^10 - 1 per MCMC iteration (by default), but whether you hit the maximum treedepth depends on the geometry of the posterior distribution you are trying to sample from, which, in turn, depends on everything including the data you condition on.
The bivariate normal distribution can be written as a product of a marginal univariate normal distribution for the first variable and a conditional univariate normal distribution for the second variable given the first variable. In Stan code, we can utilize element-wise multiplication and division to write its log-density like
target += normal_lpdf(first_variable | first_means, first_sigmas);
target += normal_lpdf(second_variable | second_means +
rhos .* first_sigmas ./ second_sigmas .* (first_variable - first_means),
second_sigmas .* sqrt(1 - square(rhos)));
Unfortunately, the more general multivariate normal distribution in Stan does not have an implementation that inputs arrays of covariance matrices.
This isn't quite answering your question, but you can make your program more efficient by removing a bunch of redundant calculations and by changing the scaling a bit to use tanh rather than a scaled inverse logit. I'd get rid of the scaling and just use smaller betas, but I left it so that it should get the same results.
data {
int<lower=0> N;
vector[2] mu;
vector[2] sigma;
vector[N] d;
vector[2] y[N];
}
transformed data {
real var1 = square(sigma[1]);
real var2 = square(sigma[2]);
real covar12 = sigma[1] * sigma[2];
vector[N] d_div_2 = d * 0.5;
}
parameters {
real beta;
}
model {
// note: 2 * inv_logit(u) - 1 = tanh(u / 2)
vector[N] rho = tanh(beta * d_div_2);
matrix[2, 2] Sigma;
Sigma[1, 1] = var1;
Sigma[2, 2] = var2;
// only reassign what's necessary with minimal recomputation
for (n in 1:N) {
Sigma[1, 2] = rho[n] * covar12;
Sigma[2, 1] = Sigma[1, 2];
y[n] ~ multi_normal(mu, Sigma);
}
// weakly informative priors fit more easily
beta ~ normal(0, 8);
}
You could also factor the covariance matrix by working out its Cholesky factorization as a function of rho and the other fixed values and use that---it saves a solver step in the multivariate normal.
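For the 2x2 case the Cholesky factor has a simple closed form, so a sketch of that idea (untested, reusing the names from the program above) could be:

model {
  vector[N] rho = tanh(beta * d_div_2);
  matrix[2, 2] L;                        // Cholesky factor of the covariance matrix
  L[1, 1] = sigma[1];
  L[1, 2] = 0;
  for (n in 1:N) {
    L[2, 1] = sigma[2] * rho[n];
    L[2, 2] = sigma[2] * sqrt(1 - square(rho[n]));
    y[n] ~ multi_normal_cholesky(mu, L);
  }
  beta ~ normal(0, 8);
}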
The other option you have is to write out the multi-student-t directly rather than using our built-in implementation. The built-in probably won't be a whole lot faster as the whole operation's pretty heavily dominated by the matrix solve.
I have the following latent variable model: Person j has two latent variables, Xj1 and Xj2. The only thing we get to observe is their maximum, Yj = max(Xj1, Xj2). The latent variables are bivariate normal; they each have mean mu, variance sigma2, and their correlation is rho. I want to estimate the three parameters (mu, sigma2, rho) using only Yj, with data from n patients, j = 1,...,n.
I've tried to fit this model in JAGS (so I'm putting priors on the parameters), but I can't get the code to compile. Here's the R code I'm using to call JAGS. First I generate the data (both latent and observed variables), given some true values of the parameters:
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
Then I define the JAGS model, and write a little function to run the JAGS sampler and return the samples:
# JAGS model code
model.text <- '
model {
for (i in 1:n) {
Y[i] <- max(X[i,1], X[i,2]) # Ack!
X[i,1:2] ~ dmnorm(X_mean, X_prec)
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
}
'
# run JAGS code. If latent=FALSE, remove the line defining Y[i] from the JAGS model
fit.jags <- function(latent=TRUE, data, n.adapt=1000, n.burnin, n.samp) {
require(rjags)
if (!latent)
model.text <- sub('\n *Y.*?\n', '\n', model.text)
textCon <- textConnection(model.text)
fit <- jags.model(textCon, data, n.adapt=n.adapt)
close(textCon)
update(fit, n.iter=n.burnin)
coda.samples(fit, variable.names=c("mu","sigma2","rho"), n.iter=n.samp)[[1]]
}
Finally, I call JAGS, feeding it only the observed data:
samp1 <- fit.jags(latent=TRUE, data=list(n=n, Y=Y), n.burnin=1000, n.samp=2000)
Sadly this results in an error message: "Y[1] is a logical node and cannot be observed". JAGS does not like me using "<-" to assign a value to Y[i] (I denote the offending line with an "Ack!"). I understand the complaint, but I'm not sure how to rewrite the model code to fix this.
Also, to demonstrate that everything else (besides the "Ack!" line) is fine, I run the model again, but this time I feed it the X data, pretending that it's actually observed. This runs perfectly and I get good estimates of the parameters:
samp2 <- fit.jags(latent=FALSE, data=list(n=n, X=X), n.burnin=1000, n.samp=2000)
colMeans(samp2)
If you can find a way to program this model in STAN instead of JAGS, that would be fine with me.
Theoretically you can implement a model like this in JAGS using the dsum distribution (which in this case uses a bit of a hack as you are modelling the maximum and not the sum of the two variables). But the following code does compile and run (although it does not 'work' in any real sense - see later):
set.seed(2017-02-08)
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
model.text <- '
model {
for (i in 1:n) {
Y[i] ~ dsum(max_X[i])
max_X[i] <- max(X[i,1], X[i,2])
X[i,1:2] ~ dmnorm(X_mean, X_prec)
ranks[i,1:2] <- rank(X[i,1:2])
chosen[i] <- ranks[i,2]
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
#data# n, Y
#monitor# mu, sigma2, rho, tau, chosen[1:10]
#inits# X
}
'
library('runjags')
results <- run.jags(model.text)
results
plot(results)
Two things to note:
JAGS isn't smart enough to initialise the matrix of X while satisfying the dsum(max(X[i,])) constraint on its own - so we have to initialise X for JAGS using sensible values. In this case I'm using the simulated values which is cheating - the answer you get is highly dependent on the choice of initial values for X, and in the real world you won't have the simulated values to fall back on.
The max() constraint causes problems to which I can't think of a solution within a general framework: unlike the usual dsum constraint that allows one parameter to decrease while the other increases and therefore both parameters are used at all times, the min() value of X[i,] is ignored and the sampler is therefore free to do as it pleases. This will very very rarely (i.e. never) lead to values of min(X[i,]) that happen to be identical to Y[i], which is the condition required for the sampler to 'switch' between the two X[i,]. So switching never happens, and the X[] that were chosen at initialisation to be the maxima stay as the maxima - I have added a trace parameter 'chosen' which illustrates this.
As far as I can see the other potential solutions to the 'how do I code this' question will fall into essentially the same non-mixing trap which I think is a fundamental problem here (although I might be wrong and would very much welcome working BUGS/JAGS/Stan code that illustrates otherwise).
Solutions to the failure to mix are harder, although something akin to the Carlin & Chib method for model selection may work (force a min(pseudo_X) parameter to be equal to Y to encourage switching). This is likely to be tricky to get working, but if you can get help from someone with a reasonable amount of experience with BUGS/JAGS you could try it - see:
Carlin, B.P., Chib, S., 1995. Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B 57, 473–484.
Alternatively, you could try thinking about the problem slightly differently and model X directly as a matrix with the first column all missing and the second column all equal to Y. You could then use dinterval() to set a constraint on the missing values that they must be lower than the corresponding maximum. I'm not sure how well this would work in terms of estimating mu/sigma2/rho but it might be worth a try.
By the way, I realise that this doesn't necessarily answer your question but I think it is a useful example of the difference between 'is it codeable' and 'is it workable'.
Matt
ps. A much smarter solution would be to consider the distribution of the maximum of two normal variates directly - I am not sure if such a distribution exists, but if it does and you can get a PDF for it then the distribution could be coded directly using the zeros/ones trick without having to consider the value of the minimum at all.
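One candidate form (worth verifying) follows from the conditional decomposition of the bivariate normal: if X1 and X2 both have mean mu and standard deviation sigma with correlation rho < 1, the density of Y = max(X1, X2) is

f(y) = 2 * dnorm((y - mu)/sigma)/sigma * pnorm( ((y - mu)/sigma) * sqrt((1 - rho)/(1 + rho)) )

where dnorm and pnorm are the standard normal pdf and cdf. In Stan this could be written directly as a user-defined log density, for example:

functions {
  // Sketch (untested): log-density of Y = max(X1, X2) for bivariate normal (X1, X2)
  // with common mean mu, common sd sigma, and correlation rho (assumed < 1)
  real max_binormal_lpdf(real y, real mu, real sigma, real rho) {
    real z = (y - mu) / sigma;
    return log(2) + normal_lpdf(y | mu, sigma)
           + normal_lcdf(z * sqrt((1 - rho) / (1 + rho)) | 0, 1);
  }
}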
I believe you can model this in the Stan language treating the likelihood as a two component mixture with equal weights. The Stan code could look like
data {
int<lower=1> N;
vector[N] Y;
}
parameters {
vector<upper=0>[2] diff[N];
real mu;
real<lower=0> sigma;
real<lower=-1,upper=1> rho;
}
model {
vector[2] case_1[N];
vector[2] case_2[N];
vector[2] mu_vec;
matrix[2,2] Sigma;
for (n in 1:N) {
case_1[n][1] = Y[n]; case_1[n][2] = Y[n] + diff[n][1];
case_2[n][2] = Y[n]; case_2[n][1] = Y[n] + diff[n][2];
}
mu_vec[1] = mu; mu_vec[2] = mu;
Sigma[1,1] = square(sigma);
Sigma[2,2] = Sigma[1,1];
Sigma[1,2] = Sigma[1,1] * rho;
Sigma[2,1] = Sigma[1,2];
// log-likelihood
target += log_mix(0.5, multi_normal_lpdf(case_1 | mu_vec, Sigma),
multi_normal_lpdf(case_2 | mu_vec, Sigma));
// insert priors on mu, sigma, and rho
}
I am learning rstan and at the moment I'm solving exercises from Gelman's "Bayesian Data Analysis". For reference, this is about example 5 in chapter 3.
It keeps failing with:
Initialization failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
error occurred during calling the sampler; sampling not done
this is my R code:
library(rstan)
scode <- "
transformed data {
real o_data[5];
o_data[1] <- 10;
o_data[2] <- 10;
o_data[3] <- 12;
o_data[4] <- 11;
o_data[5] <- 9;
}
parameters {
real mu;
real<lower=0> sigma;
real tru_val[5];
}
model {
mu ~ uniform(0.0,20.0);
sigma ~ gamma(2,1);
for (i in 1:5) {
tru_val[i] ~ normal(mu,sigma);
tru_val[i] ~ uniform(o_data[i]-0.5, o_data[i]+0.5);
}
}
"
afit <- stan(model_code = scode, verbose=TRUE)
The funny thing is - if I change the second tru_val sampling to tru_val[i] ~ normal(o_data[i],0.5); the model will evaluate just fine.
So far I tried in the stan code:
rearranging the sampling statements
introducing helper variables
explicitly writing increment_log_p statements
changing variable names in case I had accidentally used a keyword
adding print statements in the stan code
setting mu to 10
relaxing/widening the constraints in the uniform distribution
and combinations of above
I noticed something surprising when I printed the values of tru_val: no matter in which order I put the statements, it prints values around 0, typically between -2 and +2 - even when I set mu <- 10; sigma <- 1; (in the data section) and use the sampling statement tru_val[i] ~ uniform(9.5,10.5). I don't really see how it can get these numbers.
I really hope someone can shine some light onto this.
The constraints of the variable need to match the support of the distribution you're using. For tru_val[i] ~ uniform(9.5, 10.5), tru_val has to be defined as real<lower=9.5,upper=10.5> tru_val[5].
In this statement, tru_val[i] ~ normal(mu, sigma), Stan is not drawing a sample from a normal distribution and setting it to tru_val[i]. It is incrementing the log joint density; in this case, it evaluates the normal probability density function of tru_val[i] given mu and sigma (in log space).
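As a concrete illustration of matching the declared support to the constraint, here is a minimal sketch (untested, using newer Stan syntax) that encodes the +/-0.5 rounding interval with a bounded offset parameter:

transformed data {
  vector[5] o_data = [10, 10, 12, 11, 9]';
}
parameters {
  real<lower=0, upper=20> mu;            // bounds match the uniform(0, 20) prior
  real<lower=0> sigma;
  vector<lower=-0.5, upper=0.5>[5] err;  // err[i] = tru_val[i] - o_data[i]
}
transformed parameters {
  vector[5] tru_val = o_data + err;      // each tru_val[i] lies in o_data[i] +/- 0.5
}
model {
  sigma ~ gamma(2, 1);
  tru_val ~ normal(mu, sigma);           // the uniform priors are implied by the bounds
}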
(The best place to ask questions is on the Stan users mailing list.)