In Stan, I get the following error:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
require unconstrained variable declaration. found simplex.
ERROR at line 48
46: for (j in 1:records) {
47: real phenology_predictor;
48: simplex[7] pi;
^
I don't quite understand what the problem is. When I used real pi[7] instead of simplex[7] pi, I got a different error:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
no matches for function name="categorical_log"
arg 0 type=int
arg 1 type=real[1]
available function signatures for categorical_log:
0. categorical_log(int, vector) : real
1. categorical_log(int[1], vector) : real
unknown distribution=categorical
ERROR at line 63
62:
63: Y[j] ~ categorical(pi);
^
64:
which I don't understand either... My whole code:
data {
int sites;
int records;
int Y[records];
vector[records] yday;
int site[records];
}
transformed data {
int M[sites];
}
parameters {
real<lower=0,upper=1> psi;
real<lower=0,upper=1000> phi_phen_scale;
real phi_alpha;
real q_date;
real q_date2;
real q_site[sites];
}
model {
real p[records];
real q[records];
// priors
phi_phen_scale ~ normal(0, 10);
phi_alpha ~ normal(0, 10);
q_date ~ normal(0, 10);
q_date2 ~ normal(0, 10);
// vectorized
M ~ bernoulli(psi);
q_site ~ normal(0, 10);
for (j in 1:records) {
real phenology_predictor;
simplex[7] pi;
phenology_predictor <- q_date * yday[j] + q_date2 * yday[j]^2;
p[j] <- M[site[j]] * inv_logit(phi_alpha + phi_phen_scale * phenology_predictor);
q[j] <- inv_logit(q_site[site[j]] + phenology_predictor);
pi[1] <- 1-p[j] + p[j]*(1-q[j])^6;
pi[2] <- p[j]*q[j] ;
pi[3] <- p[j]*(1-q[j])*q[j];
pi[4] <- p[j]*(1-q[j])^2*q[j];
pi[5] <- p[j]*(1-q[j])^3*q[j];
pi[6] <- p[j]*(1-q[j])^4*q[j];
pi[7] <- p[j]*(1-q[j])^5*q[j];
Y[j] ~ categorical(pi);
}
}
Constrained local variables, such as simplexes, cannot be declared inside the model block because their constraints are not checked there. So, you should declare pi as a plain vector of length 7: vector[7] pi;. It has to be a vector rather than real pi[7] because categorical only accepts a vector as its second argument, which is what your second error message is saying. Nevertheless, pi needs to be on the simplex in order to be an admissible argument to the categorical function.
If pi is symbolically non-negative and sums to 1, then it is a question of making sure that numerically its elements are sufficiently close to non-negative and sum to something sufficiently close to 1. I'm not sure what the numerical tolerance is for a simplex in Stan, but there is some wiggle room. If numerical error is the problem, then doing pi <- pi / sum(pi); before passing pi to the categorical function may help.
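As a quick sanity check outside Stan, the seven expressions for pi from the question can be evaluated numerically to confirm that they are non-negative and sum to 1 up to floating-point error. This is only a sketch in Python: the function names cell_probs and normalize and the values of p and q are made up for illustration, and the renormalization mirrors the pi <- pi / sum(pi) suggestion.

```python
def cell_probs(p, q, terms=6):
    """Evaluate the seven pi expressions from the question for given p and q."""
    first = (1 - p) + p * (1 - q) ** terms               # pi[1]
    rest = [p * (1 - q) ** k * q for k in range(terms)]  # pi[2] .. pi[7]
    return [first] + rest


def normalize(v):
    """Rescale a non-negative vector so that it sums to 1."""
    total = sum(v)
    return [x / total for x in v]


# Illustrative values only; any p and q in [0, 1] should behave the same way.
pi = cell_probs(p=0.7, q=0.3)
pi_norm = normalize(pi)

print(len(pi), sum(pi), sum(pi_norm))
```

If the printed sums differ from 1 only in the last few digits, the expressions are fine symbolically and any rejection by categorical would be a matter of numerical tolerance.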
I am trying to do Variational inference, so that I can get the best approximating distribution with respect to the target distribution: a Normal-Inverse-Wishart.
Normal-Inverse-Wishart distribution
However when I compile the stan file with the model code it gives the error:
Error in stanc(file = file, model_code = model_code, model_name = model_name, : 0
Syntax error in 'string', line 16, column 14 to column 15, parsing error:
Expected "generated quantities {" or end of file after end of model block.
I have tried to investigate what this is referring to but I require some help. My R code is:
stan_file <- "RStan VI model.stan"
stan_model <- rstan::stan_model(file = stan_file) # Error occurs at this line
The RStan file model code is:
data {
int<lower=1> N; // number of assets
real<lower=0> nu0; // prior confidence for sigma
matrix[N, N] sigma_0; // prior sigma
real<lower=0> T0; // prior confidence for mu
vector[N] mu0; // prior mu
}
parameters {
matrix[N, N] sigma;
vector[N] mu;
}
transformed parameters {
matrix[N, N] a;
matrix[N, N] b;
a = sigma0*nu0;
b = sigma/T0;
}
model {
target += inv_wishart_lpdf(sigma | nu0, a);
target += normal_lpdf(mu | mu0, b);
}
I even tried changing the last section of the model code to:
model {
sigma ~ inv_wishart(nu0, a);
mu ~ normal(mu0, b);
}
But I still get the same error. Would anyone know what the error is and how I can fix it?
Many thanks.
Best,
Nihaar
I'm having trouble fitting a linear regression model in Stan. The error message points to the transformed parameters block.
See below the structure of the code in stan.
Packages:
library(rstan)
library(bayesplot)
Data:
head(Orange)
cols <- c(colnames(Orange[-1]))
Orange <- Orange[,cols]
str(Orange)
Code in stan:
The block structure of the Stan program follows the recommended pattern, but I am not able to identify which part of the code is wrong.
y = Orange$circumference
x = Orange$age
n = length(y)
regresstan = '
data{
int n;
real y[n];
real x[n];
}
parameters{
real alpha;
real beta;
real sigma;
}
transformed parameters{
real mu[n];
mu = alpha + beta*x;
}
model{
//Priors
alpha ~ normal(0, 100);
beta ~ normal(0, 100);
sigma ~ uniform(0, 100);
//Likelihood
y ~ normal(mu, sigma);
}
'
Error:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:
real * real[ ]
Available argument signatures for operator*:
real * real
vector * real
row_vector * real
matrix * real
row_vector * vector
vector * row_vector
matrix * vector
row_vector * matrix
matrix * matrix
real * vector
real * row_vector
real * matrix
No matches for:
real + ill-formed
Available argument signatures for operator+:
int + int
real + real
vector + vector
row_vector + row_vector
matrix + matrix
vector + real
row_vector + real
matrix + real
real + vector
real + row_vector
real + matrix
+int
+real
+vector
+row_vector
+matrix
Expression is ill formed.
error in 'modele28054257a16_a9d23411185fa271b60f20be43062e80' at line 16, column 23
-------------------------------------------------
14: transformed parameters{
15: real mu[n];
16: mu = alpha + beta*x;
^
17: }
-------------------------------------------------
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model 'a9d23411185fa271b60f20be43062e80' due to the above error.
The error comes from the transformed parameters block at the line
mu = alpha + beta*x;
The error is saying you can't multiply a real scalar by an array of reals (the real * real[ ] in the message). Note that real[ ] is an array, not a vector. You can solve this by looping over the elements of mu:
transformed parameters {
  real mu[n];
  for (i in 1:n) {
    mu[i] = alpha + beta * x[i];
  }
}
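This resolves the issue because each iteration now multiplies a real scalar by a real scalar. Alternatively, declaring x and mu as vector[n] instead of arrays would make the original line valid, since real * vector and real + vector are both among the available signatures listed in the error message.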
I am trying to code a custom Probit function in Stan to improve my understanding of the Stan language and likelihoods. So far I've written the logarithm of the normal pdf but am receiving an error message that I've found to be unintelligible when I am trying to write the likelihood. What am I doing wrong?
Stan model
functions {
real normal_lpdf(real mu, real sigma) {
return -log(2 * pi()) / 2 - log(sigma)
- square(mu) / (2 * sigma^2);
}
real myprobit_lpdf(int y | real mu, real sigma) {
return normal_lpdf(mu, sigma)^y * (1 - normal_lpdf(mu, sigma))^(1-y);
}
}
data {
int N;
int y[N];
}
parameters {
real mu;
real<lower = 0> sigma;
}
model {
for (n in 1:N) {
target += myprobit_lpdf(y[n] | mu, sigma);
}
}
Error
PARSER EXPECTED:
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname, :
failed to parse Stan model 'Probit_lpdf' due to the above error.
R code to simulate data
## DESCRIPTION
# testing a Probit model
## DATA
N <- 2000
sigma <- 1
mu <- 0.3
u <- rnorm(N, 0, 2)
y.star <- rnorm(N, mu, sigma)
y <- ifelse(y.star > 0,1, 0)
data = list(
N = N,
y = y
)
## MODEL
out.stan <- stan("Probit_lpdf.stan",data = data, chains = 2, iter = 1000 )
The full error message is
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Probabilty functions with suffixes _lpdf, _lpmf, _lcdf, and _lccdf,
require a vertical bar (|) between the first two arguments.
error in 'model2a7252aef8cf_probit' at line 7, column 27
-------------------------------------------------
5: }
6: real myprobit_lpdf(real y, real mu, real sigma) {
7: return normal_lpdf(mu, sigma)^y * (1 - normal_lpdf(mu, sigma))^(1-y);
^
8: }
-------------------------------------------------
which is telling you that the normal_lpdf function expects three inputs, with a vertical bar separating the first from the second.
It is also not a good idea to give your function the same name as a function that is already in the Stan language, such as normal_lpdf.
But the functions you have written do not implement the log-likelihood of a probit model anyway. First, the standard deviation of the errors is not identified by the data, so you do not need sigma. Then, the correct expressions would be something like
real Phi_mu = Phi(mu);
real log_Phi_mu = log(Phi_mu);
real log1m_Phi_mu = log1m(Phi_mu);
for (n in 1:N)
  target += y[n] == 1 ? log_Phi_mu : log1m_Phi_mu;
although that is just a slow way of doing
target += bernoulli_lpmf(y | Phi(mu));
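To see that the loop and the single bernoulli_lpmf call describe the same log-likelihood, the two forms can be compared numerically. The following is a sketch in plain Python rather than Stan. The helper names std_normal_cdf, loglik_loop and loglik_bernoulli are hypothetical, the data are invented, and the standard normal CDF is computed from math.erf as a stand-in for Stan's Phi.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, playing the role of Stan's Phi()."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def loglik_loop(y, mu):
    """Element-by-element form: add log(Phi(mu)) or log(1 - Phi(mu)) per observation."""
    p = std_normal_cdf(mu)
    log_p = math.log(p)
    log_1m_p = math.log1p(-p)
    return sum(log_p if yi == 1 else log_1m_p for yi in y)


def loglik_bernoulli(y, mu):
    """Vectorized form: Bernoulli log mass summed over all observations at once."""
    p = std_normal_cdf(mu)
    ones = sum(y)
    zeros = len(y) - ones
    return ones * math.log(p) + zeros * math.log1p(-p)


# Invented data for illustration only.
y = [1, 0, 1, 1, 0, 1]
mu = 0.3
loop_value = loglik_loop(y, mu)
vectorized_value = loglik_bernoulli(y, mu)
print(loop_value, vectorized_value)
```

Both functions depend on the data only through the counts of ones and zeros, which is also why a second parameter such as sigma cannot be identified here.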
What is behind approx and approxfun? I know that these two functions perform linear interpolation, but I didn't find any reference on how they do it. I guess they use a least squares regression model, but I am not sure.
Finally, if it's true that they use a least squares regression model, what is the difference between them and lm + predict?
As commented, you should read the source code. This is an interpolation problem, so no least squares fit is involved: find y(v), given (x,y)[i], i = 0,..,n-1.
For example, approxfun uses this simple algorithm for linear approximation:
First, find the correct interval (i,j) by bisection.
Then, use i and j for linear interpolation.
Here is some R code that paraphrases the C function approx1:
approx1 <- function(v, x, y)
{
  ## Approximate y(v), given (x, y)[i], i = 1, ..., n
  i <- 1
  j <- length(x)
  ij <- 0
  ## find the correct interval by bisection
  while (i < (j - 1)) {
    ij <- floor((i + j) / 2)
    if (v < x[ij])
      j <- ij
    else
      i <- ij
  }
  ## linear interpolation
  if (v == x[j]) return(y[j])
  if (v == x[i]) return(y[i])
  return(y[i] + (y[j] - y[i]) * ((v - x[i]) / (x[j] - x[i])))
}
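To illustrate how this differs from lm + predict, here is a small Python sketch that contrasts piecewise linear interpolation with a hand-computed least squares line on the same points. The names linear_interp and ols_line and the data are invented for the example, and the sketch assumes x is sorted in increasing order with v inside its range. It is not the R implementation, only the same idea.

```python
from bisect import bisect_right


def linear_interp(v, x, y):
    """Piecewise linear interpolation; assumes x is sorted and x[0] <= v <= x[-1]."""
    j = min(max(bisect_right(x, v), 1), len(x) - 1)  # right end of the bracketing interval
    i = j - 1                                        # left end of the bracketing interval
    return y[i] + (y[j] - y[i]) * (v - x[i]) / (x[j] - x[i])


def ols_line(x, y):
    """Intercept and slope of the least squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented data: four points on a parabola.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 4.0, 9.0]
intercept, slope = ols_line(x, y)

at_data_point = linear_interp(2.0, x, y)   # reproduces the observed y exactly
fitted_at_point = intercept + slope * 2.0  # the regression line generally does not
print(at_data_point, fitted_at_point)
```

Interpolation only ever uses the two neighbouring points and passes through every observation, whereas the least squares line uses all points at once and generally passes through none of them.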
I am trying to migrate some code from JAGS to Stan. Say I have the following dataset:
N <- 10
nchoices <- 3
ncontrols <- 3
toydata <- list("y" = rbinom(N, nchoices - 1, .5),
"controls" = matrix(runif(N*ncontrols), N, ncontrols),
"N" = N,
"nchoices" = nchoices,
"ncontrols" = ncontrols)
and that I want to run a multinomial logit with the following code (taken from section 9.5 of the documentation):
data {
int N;
int nchoices;
int y[N];
int ncontrols;
vector[ncontrols] controls[N];
}
parameters {
matrix[nchoices, ncontrols] beta;
}
model {
for (k in 1:nchoices)
for (d in 1:ncontrols)
beta[k,d] ~ normal(0,100);
for (n in 1:N)
y[n] ~ categorical(softmax(beta * controls[n]));
}
I now want to fix the first row of beta to zero. In JAGS I would simply declare in the model block that
for (i in 1:ncontrols) {
beta[1,i] <- 0
}
but I am not sure about how to do this in Stan. I have tried many combinations along the lines of section 6.2 of the documentation (Partially Known Parameters) like, for instance,
parameters {
matrix[nchoices, ncontrols] betaNonObs;
}
transformed parameters {
matrix[nchoices, ncontrols] beta;
for (i in 1:ncontrols) beta[1][i] <- 0
for (k in 2:nchoices) beta[k] <- betaNonObs[k - 1]
}
but none of them work. Any suggestions?
It would be helpful to mention the error message. In this case, if beta is declared to be a matrix, then the syntax you want is the R-like syntax
beta[1,i] <- 0.0; // you also omitted the semicolon
To answer your broader question, I believe you were on the right track with your last approach. I would create a matrix of parameters in the parameters block called free_beta and copy those elements to another matrix called beta, declared in the model block, that has one extra row at the top for the fixed zeros. Like this:
data {
  int N;
  int nchoices;
  int y[N];
  int ncontrols;
  vector[ncontrols] controls[N];
}
parameters {
  matrix[nchoices-1, ncontrols] free_beta;
}
model {
  // copy free beta into beta
  matrix[nchoices,ncontrols] beta;
  for (d in 1:ncontrols)
    beta[1,d] <- 0.0;
  for (k in 2:nchoices)
    for (d in 1:ncontrols)
      beta[k,d] <- free_beta[k-1,d];
  // priors on free_beta, which execute faster this way
  for (k in 1:(nchoices-1))
    row(free_beta,k) ~ normal(0.0, 100.0);
  // likelihood
  for (n in 1:N)
    y[n] ~ categorical(softmax(beta * controls[n]));
}
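The copy step above, and the reason a row can be pinned to zero at all, can be illustrated numerically. The Python sketch below uses hypothetical helper names (with_zero_first_row, matvec, softmax) and invented coefficients. It builds beta from free_beta and then checks that softmax is unchanged when a constant is added to every linear predictor, which is what makes one row redundant.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def with_zero_first_row(free_beta):
    """Prepend a row of zeros, mirroring the copy into beta in the model block."""
    ncols = len(free_beta[0])
    return [[0.0] * ncols] + [list(row) for row in free_beta]


def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Invented coefficients for nchoices = 3 and ncontrols = 3.
free_beta = [[0.5, -1.0, 2.0],
             [1.5, 0.25, -0.5]]
controls = [0.2, 0.4, 0.6]

beta = with_zero_first_row(free_beta)
eta = matvec(beta, controls)
probs = softmax(eta)
shifted_probs = softmax([e + 10.0 for e in eta])  # same probabilities after a shift
print(probs, shifted_probs)
```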