I'm having trouble setting up a Hierarchical Multinomial Processing tree in Stan. As a starting point, I am trying to add a hierarchy to the simple model here:
https://github.com/stan-dev/example-models/blob/master/Bayesian_Cognitive_Modeling/CaseStudies/MPT/MPT_1_Stan.R
I'm not sure why the code does not work. Any help would be greatly appreciated.
Example Data (Based on Julia syntax):
Nsub = 2
Ntrials = 100
FCat = [20 60 20;30 50 20]
data {
// Number of subjects
int<lower=1> Nsub;
// Number of Trials
int<lower=1> Ntrials;
// Data
int<lower=0,upper=Ntrials> FCat[Nsub,4];
}
parameters {
vector<lower=0,upper=1>[Nsub] c;
vector<lower=0,upper=1>[Nsub] r;
vector<lower=0,upper=1>[Nsub] u;
real<lower=0> c_omega;
real<lower=0> r_omega;
real<lower=0> u_omega;
real<lower=0,upper=1> c_kappa;
real<lower=0,upper=1> r_kappa;
real<lower=0,upper=1> u_kappa;
}
transformed parameters {
simplex[4] theta[Nsub];
real<lower=0> c_A;
real<lower=0> c_B;
real<lower=0> r_A;
real<lower=0> r_B;
real<lower=0> u_A;
real<lower=0> u_B;
c_A <- c_kappa*c_omega;
c_B <- (1-c_kappa)*c_omega;
r_A <- r_kappa*r_omega;
r_B <- (1-r_kappa)*r_omega;
u_A <- u_kappa*u_omega;
u_B <- (1-u_kappa)*u_omega;
// Create category responses
for (i in 1:Nsub){
theta[i,1] <- c[i]*r[i];
theta[i,2] <- (1 - c[i])*sqrt(u[i]);
theta[i,3] <- (1 - c[i])*2*u[i]*(1 - u[i]);
theta[i,4] <- c[i]*(1 - r[i]) + (1 - c[i])*sqrt(1 - u[i]);
}
}
model {
// HyperPriors
c_omega ~ gamma(2,8);
r_omega ~ gamma(2,8);
u_omega ~ gamma(2,8);
c_kappa ~ beta(50,50);
r_kappa ~ beta(50,50);
u_kappa ~ beta(70,30);
// Priors
c ~ beta(c_A, c_B);
r ~ beta(r_A, r_B);
u ~ beta(u_A, u_B);
for (i in 1:Nsub){
FCat[i] ~ multinomial(theta[i]);
}
}
The sqrt() function just takes square roots of a scalar (real or int), returning a real.
You can use print() to see if your values sum to 1, or you can test this yourself and reject if they don't. The usual stick-breaking construction for a simplex is given in the manual under the simplex transform: take the first value to be between 0 and 1, then the next to be some fraction of what's left (1 - first value), and so on. We also have softmax built in. You have to be careful with identifiability with a softmax parameterization: the priors will matter unless you set one of the inputs to a constant.
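As a quick check outside Stan, the sketch below sums the four category probabilities for one subject in two ways: with the sqrt() terms from the posted model, and with squared terms, which is only an assumption about the intended form (the usual pair-clustering parameterization). The function name mpt_theta and the parameter values are made up for illustration.

```python
import math

def mpt_theta(c, r, u, use_sqrt=False):
    """Category probabilities for one subject.

    use_sqrt=True reproduces the sqrt() terms from the posted model;
    use_sqrt=False uses u**2 and (1 - u)**2 (assumed intended form).
    """
    if use_sqrt:
        f_u, f_1mu = math.sqrt(u), math.sqrt(1 - u)
    else:
        f_u, f_1mu = u ** 2, (1 - u) ** 2
    return [
        c * r,
        (1 - c) * f_u,
        (1 - c) * 2 * u * (1 - u),
        c * (1 - r) + (1 - c) * f_1mu,
    ]

# Arbitrary illustrative values for c, r and u
squared_total = sum(mpt_theta(0.5, 0.5, 0.25))
sqrt_total = sum(mpt_theta(0.5, 0.5, 0.25, use_sqrt=True))
print(squared_total, sqrt_total)
```

With the squared terms the total is c + (1 - c)*(u + (1 - u))^2 = 1 for any c, r and u, whereas the sqrt() version generally does not sum to 1, so it cannot be assigned to a simplex.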
I am trying to do Variational inference, so that I can get the best approximating distribution with respect to the target distribution: a Normal-Inverse-Wishart.
Normal-Inverse-Wishart distribution
However when I compile the stan file with the model code it gives the error:
Error in stanc(file = file, model_code = model_code, model_name = model_name, : 0
Syntax error in 'string', line 16, column 14 to column 15, parsing error:
Expected "generated quantities {" or end of file after end of model block.
I have tried to investigate what this is referring to but I require some help. My R code is:
stan_file <- "RStan VI model.stan"
stan_model <- rstan::stan_model(file = stan_file) # Error occurs at this line
The RStan file model code is:
data {
int<lower=1> N; // number of assets
real<lower=0> nu0; // prior confidence for sigma
matrix[N, N] sigma_0; // prior sigma
real<lower=0> T0; // prior confidence for mu
vector[N] mu0; // prior mu
}
parameters {
matrix[N, N] sigma;
vector[N] mu;
}
transformed parameters {
matrix[N, N] a;
matrix[N, N] b;
a = sigma0*nu0;
b = sigma/T0;
}
model {
target += inv_wishart_lpdf(sigma | nu0, a);
target += normal_lpdf(mu | mu0, b);
}
I even tried changing the last section of the model code to:
model {
sigma ~ inv_wishart(nu0, a);
mu ~ normal(mu0, b);
}
But I still get the same error. Would anyone know what the error is and how I can fix it?
Many thanks.
Best,
Nihaar
I am trying to code a custom Probit function in Stan to improve my understanding of the Stan language and likelihoods. So far I've written the logarithm of the normal pdf but am receiving an error message that I've found to be unintelligible when I am trying to write the likelihood. What am I doing wrong?
Stan model
functions {
real normal_lpdf(real mu, real sigma) {
return -log(2 * pi()) / 2 - log(sigma)
- square(mu) / (2 * sigma^2);
}
real myprobit_lpdf(int y | real mu, real sigma) {
return normal_lpdf(mu, sigma)^y * (1 - normal_lpdf(mu, sigma))^(1-y);
}
}
data {
int N;
int y[N];
}
parameters {
real mu;
real<lower = 0> sigma;
}
model {
for (n in 1:N) {
target += myprobit_lpdf(y[n] | mu, sigma);
}
}
Error
PARSER EXPECTED:
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, :
failed to parse Stan model 'Probit_lpdf' due to the above error.
R code to simulate data
## DESCRIPTION
# testing a Probit model
## DATA
N <- 2000
sigma <- 1
mu <- 0.3
u <- rnorm(N, 0, 2)
y.star <- rnorm(N, mu, sigma)
y <- ifelse(y.star > 0,1, 0)
data = list(
N = N,
y = y
)
## MODEL
out.stan <- stan("Probit_lpdf.stan",data = data, chains = 2, iter = 1000 )
The full error message is
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Probabilty functions with suffixes _lpdf, _lpmf, _lcdf, and _lccdf,
require a vertical bar (|) between the first two arguments.
error in 'model2a7252aef8cf_probit' at line 7, column 27
-------------------------------------------------
5: }
6: real myprobit_lpdf(real y, real mu, real sigma) {
7: return normal_lpdf(mu, sigma)^y * (1 - normal_lpdf(mu, sigma))^(1-y);
^
8: }
-------------------------------------------------
which is telling you that the normal_lpdf function expects three inputs and a vertical bar separating the first from the second.
It is also not a good idea to give your function the same name as a function that is already in the Stan language, such as normal_lpdf.
But the functions you have written do not implement the log-likelihood of a probit model anyway. First, the standard deviation of the errors is not identified by the data, so you do not need sigma. Then, the correct expressions would be something like
real Phi_mu = Phi(mu);
real log_Phi_mu = log(Phi_mu);
real log1m_Phi_mu = log1m(Phi_mu);
for (n in 1:N)
target += y[n] == 1 ? log_Phi_mu : log1m_Phi_mu;
although that is just a slow way of doing
target += bernoulli_lpmf(y | Phi(mu));
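To see why the loop and the one-line Bernoulli form agree, here is a small Python sketch of the same log-likelihood. It is only an illustration: the helper names and the outcome vector are made up, and Phi is computed from math.erf rather than taken from Stan.

```python
import math

def Phi(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_loglik_loop(y, mu):
    """Add log(Phi(mu)) for each 1 and log(1 - Phi(mu)) for each 0."""
    p = Phi(mu)
    return sum(math.log(p) if y_n == 1 else math.log1p(-p) for y_n in y)

def probit_loglik_counts(y, mu):
    """Same quantity, computed from the number of ones and zeros."""
    p = Phi(mu)
    ones = sum(y)
    return ones * math.log(p) + (len(y) - ones) * math.log1p(-p)

y = [1, 0, 1, 1, 0, 1]  # made-up binary outcomes
mu = 0.3
ll_loop = probit_loglik_loop(y, mu)
ll_counts = probit_loglik_counts(y, mu)
print(ll_loop, ll_counts)
```

Both functions return the same value, which is the Bernoulli log-likelihood with success probability Phi(mu). Note that sigma does not appear anywhere.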
I have implemented a hierarchical model in Stan in which level 1 (within groups) is a linear model and level 2 (within subjects) is a Gaussian mixture model (GMM). That is, the slopes obtained from level 1 are used by the level 2 GMM for clustering. When I run the model, it has a convergence problem.
WARNING:pystan:Maximum (flat) parameter count (1000) exceeded:
skipping diagnostic tests for n_eff and Rhat. To run all diagnostics call pystan.check_hmc_diagnostics(fit)
WARNING:pystan:2 of 500 iterations ended with a divergence (0.4 %).
WARNING:pystan:Try running with adapt_delta larger than 0.8 to remove the divergences.
WARNING:pystan:Chain 1: E-BFMI = 0.0611
WARNING:pystan:E-BFMI below 0.2 indicates you may need to reparameterize your model
Any comments on improving the model?
multi_level_model = """
data {
int<lower=0> N; // No of observations
int J; // No of subjects
int<lower=1,upper=J> RID[N];
vector[N] x; // Cognitive measure
}
parameters{
real a;
vector[J] b;
real mu_b;
real<lower=0,upper=2> sigma_b;
// Gaussian parameters for level 2
ordered[2] mu;
real<lower=0> sigma[2];
real<lower=0, upper=1> theta;
}
transformed parameters {
vector[N] y_hat;
for(i in 1:N)
y_hat[i] = a + x[i] * b[RID[i]];
}
model {
sigma_b ~ normal(0, 1);
b ~ normal (mu_b, sigma_b);
a ~ normal (0, 1);
sigma ~ normal(0, 1);
mu ~ normal(0, 1);
theta ~ beta(5, 5);
for (n in 1:J)
target += log_mix(theta,
normal_lpdf(b[n] | mu[1], sigma[1]),
normal_lpdf(b[n] | mu[2], sigma[2]));
}
"""
I am using the brms package to build a multilevel model with a gaussian process on the predictor, x. The model looks like this: make_stancode(y ~ gp(x, cov = "exp_quad", by= groups) + (1| groups), data = dat) so a gp on the x predictor and a multilevel group variable. In my case I have 5 groups. I've been looking at the code for that (below) and I'm trying to figure out the meanings and dimensions of some of the parameters.
I see that M_1 is the number of groups
My questions are:
What is the meaning of N_1? Is it the same as the number of observations, N? It is used here: vector[N_1] z_1[M_1]; // unscaled group-level effects
For Kgp_1 and Mgp_1 ( int Kgp_1; and int Mgp_1;), if I have 5 groups are both Kgp_1 and Mgp_1 equal to 5? If so, why are two variables used?
// generated with brms 1.10.0
functions {
/* compute a latent Gaussian process
* Args:
* x: array of continuous predictor values
* sdgp: marginal SD parameter
* lscale: length-scale parameter
* zgp: vector of independent standard normal variables
* Returns:
* a vector to be added to the linear predictor
*/
vector gp(vector[] x, real sdgp, real lscale, vector zgp) {
matrix[size(x), size(x)] cov;
cov = cov_exp_quad(x, sdgp, lscale);
for (n in 1:size(x)) {
// deal with numerical non-positive-definiteness
cov[n, n] = cov[n, n] + 1e-12;
}
return cholesky_decompose(cov) * zgp;
}
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> Kgp_1;
int<lower=1> Mgp_1;
vector[Mgp_1] Xgp_1[N];
int<lower=1> Igp_1[Kgp_1];
int<lower=1> Jgp_1_1[Igp_1[1]];
int<lower=1> Jgp_1_2[Igp_1[2]];
int<lower=1> Jgp_1_3[Igp_1[3]];
int<lower=1> Jgp_1_4[Igp_1[4]];
int<lower=1> Jgp_1_5[Igp_1[5]];
// data for group-level effects of ID 1
int<lower=1> J_1[N];
int<lower=1> N_1;
int<lower=1> M_1;
vector[N] Z_1_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
real temp_Intercept; // temporary intercept
// GP hyperparameters
vector<lower=0>[Kgp_1] sdgp_1;
vector<lower=0>[Kgp_1] lscale_1;
vector[N] zgp_1;
real<lower=0> sigma; // residual SD
vector<lower=0>[M_1] sd_1; // group-level standard deviations
vector[N_1] z_1[M_1]; // unscaled group-level effects
}
transformed parameters {
// group-level effects
vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
}
model {
vector[N] mu = rep_vector(0, N) + temp_Intercept;
mu[Jgp_1_1] = mu[Jgp_1_1] + gp(Xgp_1[Jgp_1_1], sdgp_1[1], lscale_1[1], zgp_1[Jgp_1_1]);
mu[Jgp_1_2] = mu[Jgp_1_2] + gp(Xgp_1[Jgp_1_2], sdgp_1[2], lscale_1[2], zgp_1[Jgp_1_2]);
mu[Jgp_1_3] = mu[Jgp_1_3] + gp(Xgp_1[Jgp_1_3], sdgp_1[3], lscale_1[3], zgp_1[Jgp_1_3]);
mu[Jgp_1_4] = mu[Jgp_1_4] + gp(Xgp_1[Jgp_1_4], sdgp_1[4], lscale_1[4], zgp_1[Jgp_1_4]);
mu[Jgp_1_5] = mu[Jgp_1_5] + gp(Xgp_1[Jgp_1_5], sdgp_1[5], lscale_1[5], zgp_1[Jgp_1_5]);
for (n in 1:N) {
mu[n] = mu[n] + (r_1_1[J_1[n]]) * Z_1_1[n];
}
// priors including all constants
target += student_t_lpdf(sdgp_1 | 3, 0, 10)
- 1 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(lscale_1 | 0, 0.5)
- 1 * normal_lccdf(0 | 0, 0.5);
target += normal_lpdf(zgp_1 | 0, 1);
target += student_t_lpdf(sigma | 3, 0, 10)
- 1 * student_t_lccdf(0 | 3, 0, 10);
target += student_t_lpdf(sd_1 | 3, 0, 10)
- 1 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(z_1[1] | 0, 1);
// likelihood including all constants
if (!prior_only) {
target += normal_lpdf(Y | mu, sigma);
}
}
generated quantities {
// actual population-level intercept
real b_Intercept = temp_Intercept;
}
If you use make_standata(...) on the same formula, you can see the data that would be passed on to Stan. From there, you can piece together what some of the variables do. If I use the lme4::sleepstudy dataset as a proxy for your data, I get:
library(brms)
dat <- lme4::sleepstudy
dat$groups <- dat$Subject
dat$y <- dat$Reaction
dat$x <- dat$Days
s_data <- make_standata(
y ~ gp(x, cov = "exp_quad", by= groups) + (1| groups), data = dat)
s_data$N_1
#> 18
For N_1, I get 18, which is the number of levels of groups in this dataset.
For Kgp_1 and Mgp_1 ( int Kgp_1; and int Mgp_1;), if I have 5 groups are both Kgp_1 and Mgp_1 equal to 5? If so, why are two variables used?
s_data$Mgp_1
#> 1
s_data$Kgp_1
#> 18
It looks like Kgp_1 is again the number of groups. I am not sure what Mgp_1 does besides setting the length of each vector in the declaration vector[Mgp_1] Xgp_1[N];
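To make the role of zgp in the generated gp() function more concrete, here is a rough pure-Python sketch of the same computation: build the exponentiated-quadratic covariance, add the small diagonal jitter, take the Cholesky factor, and multiply it by the standard normal vector. It assumes each predictor value is a scalar (vectors of length 1, as in the sleepstudy output above); the helper names and input values are made up for illustration.

```python
import math

def cov_exp_quad(x, sdgp, lscale):
    """sdgp^2 * exp(-(x_i - x_j)^2 / (2 * lscale^2)) for scalar inputs."""
    n = len(x)
    return [[sdgp ** 2 * math.exp(-(x[i] - x[j]) ** 2 / (2 * lscale ** 2))
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def gp(x, sdgp, lscale, zgp):
    """Latent GP values: Cholesky factor of the covariance times zgp."""
    cov = cov_exp_quad(x, sdgp, lscale)
    for n in range(len(x)):
        cov[n][n] += 1e-12  # same jitter as in the generated Stan code
    L = cholesky(cov)
    return [sum(L[i][k] * zgp[k] for k in range(i + 1)) for i in range(len(x))]

x = [0.0, 1.0, 2.5]  # made-up predictor values for one group
f = gp(x, sdgp=1.5, lscale=1.0, zgp=[1.0, 0.0, 0.0])
print(f)
```

Because zgp holds independent standard normal variables, multiplying by the Cholesky factor gives a vector whose covariance is the kernel matrix. This is why the Stan model only needs a normal(0, 1) prior on zgp_1.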
I am trying to migrate some code from JAGS to Stan. Say I have the following dataset:
N <- 10
nchoices <- 3
ncontrols <- 3
toydata <- list("y" = rbinom(N, nchoices - 1, .5) + 1,  # shift to 1..nchoices, as categorical() requires
"controls" = matrix(runif(N*ncontrols), N, ncontrols),
"N" = N,
"nchoices" = nchoices,
"ncontrols" = ncontrols)
and that I want to run a multinomial logit with the following code (taken from section 9.5 of the documentation):
data {
int N;
int nchoices;
int y[N];
int ncontrols;
vector[ncontrols] controls[N];
}
parameters {
matrix[nchoices, ncontrols] beta;
}
model {
for (k in 1:nchoices)
for (d in 1:ncontrols)
beta[k,d] ~ normal(0,100);
for (n in 1:N)
y[n] ~ categorical(softmax(beta * controls[n]));
}
I now want to fix the first row of beta to zero. In JAGS I would simply declare in the model block that
for (i in 1:ncontrols) {
beta[1,i] <- 0
}
but I am not sure about how to do this in Stan. I have tried many combinations along the lines of section 6.2 of the documentation (Partially Known Parameters) like, for instance,
parameters {
matrix[nchoices, ncontrols] betaNonObs;
}
transformed parameters {
matrix[nchoices, ncontrols] beta;
for (i in 1:ncontrols) beta[1][i] <- 0
for (k in 2:nchoices) beta[k] <- betaNonObs[k - 1]
}
but none of them work. Any suggestions?
It would be helpful to mention the error message. In this case, if beta is declared to be a matrix, then the syntax you want is the R-like syntax
beta[1,i] <- 0.0; // you also omitted the semicolon
To answer your broader question, I believe you were on the right track with your last approach. I would create a matrix of parameters in the parameters block called free_beta and copy those elements to another matrix declared in the model block called beta that has one extra row at the top for the fixed zeros. Like
data {
int N;
int nchoices;
int y[N];
int ncontrols;
vector[ncontrols] controls[N];
}
parameters {
matrix[nchoices-1, ncontrols] free_beta;
}
model {
// copy free beta into beta
matrix[nchoices,ncontrols] beta;
for (d in 1:ncontrols)
beta[1,d] <- 0.0;
for (k in 2:nchoices)
for (d in 1:ncontrols)
beta[k,d] <- free_beta[k-1,d];
// priors on free_beta, which execute faster this way
for (k in 1:(nchoices-1))
row(free_beta,k) ~ normal(0.0, 100.0);
// likelihood
for (n in 1:N)
y[n] ~ categorical(softmax(beta * controls[n]));
}
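The reason for pinning one row of beta to zero can be checked numerically. In the Python sketch below (helper names and numbers are made up), adding the same offset vector to every row of beta leaves the softmax probabilities unchanged, so the unconstrained matrix is not identified by the likelihood alone.

```python
import math

def softmax(v):
    """Numerically stable softmax of a list of reals."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def matvec(mat, vec):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]

# Made-up values: (nchoices - 1) x ncontrols free coefficients
free_beta = [[0.5, -1.0, 0.2],
             [1.5, 0.3, -0.7]]
beta = [[0.0, 0.0, 0.0]] + free_beta  # first row fixed at zero
controls = [0.2, 0.9, 0.4]

probs = softmax(matvec(beta, controls))

# Add the same offset vector to every row: the probabilities do not change
delta = [2.0, -1.0, 0.5]
shifted_beta = [[b + d for b, d in zip(row, delta)] for row in beta]
probs_shifted = softmax(matvec(shifted_beta, controls))
print(probs, probs_shifted)
```

Fixing the first row removes that freedom, which is what copying free_beta into rows 2 through nchoices of beta achieves in the Stan program above.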