Writing log likelihood for WAIC (logistic hierarchal stan model) - r

I'm creating a new model and I want to compare this with another model using WAIC. I understand that I need to write a generated quantities block. However, I'm struggling to convert the logsumexp of beta. I would greatly appreciate any leads/help. My model block looks like this:
model {
//prior for phi,b
phi ~ cauchy(0,5);
mu_b ~ normal(0,1);
sigma_b ~ cauchy(0,1);
mu ~ normal(0,1);
sigma ~ cauchy(0,1);
//model
log_b_z ~ normal(0, 1);
theta_raw ~ normal(mu, sigma);
for (i in 1:n) {
vector[number_segments] test;
for (j in 1:number_segments) {
test[j] = beta_lpdf(response[i] | p[j][i]*phi, (1-p[j][i])*phi) + log(prob_segment[j]);
}
target += log_sum_exp(test);
}
}

You need to define a generated quantities block that defines your posterior predictive log likelihood for each data point.
You can do it this way for a mixture with minimal recomputation.
transformed parameters {
vector[n] log_lik;
{
vector[number_semgnents log_prob_segment = log(prob_segment);
for (i in 1:n) {
vector[number_segments] lp = log_prob_segment;
for (j in 1:number_segments) {
lp[j] += beta_lpdf(response[i] | p[j, i] * phi, (1 - p[j, i]) * phi);
log_lik[i] = log_sum_exp(lp);
}
}
...
model {
target += sum(log_lik);
...
You could also define log_lik as a generated quantity---that can be more efficient if you can vectorize the likelihood (which isn't possible yet for mixtures in Stan).
Once you've done that, you can use the loo package to calculate WAIC, etc., as described in the vignette and references.

Related

Error in Stan Code when variable is clearly defined

I am getting the following error in my Stan code:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:
gpareto_lcdf(real, real, real)
Available argument signatures for gpareto_lcdf:
gpareto_lcdf(vector, real, real)
error in 'modelafda6ff99d79_gpd' at line 54, column 50
-------------------------------------------------
52: for (i in 1:n) {
53: if (censored[i]) {
54: target += gpareto_lcdf(value[i] | k, sigma);
^
55: } else {
-------------------------------------------------
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model 'gpd' due to the above error.
In my R studio version, it seems to be complaining about the sigma parameter and not being able to find a match for it. I don't understand why this is an issue given that sigma is defined in my gpareto_lcdf. Here is the code that I am using:
functions {
real gpareto_lpdf(vector y, real k, real sigma) {
// generalised Pareto log pdf
int N = rows(y);
real inv_k = inv(k);
if (k<0 && max(y)/sigma > -inv_k)
reject("k<0 and max(y)/sigma > -1/k; found k, sigma =", k, sigma)
if (sigma<=0)
reject("sigma<=0; found sigma =", sigma)
if (fabs(k) > 1e-15)
return -(1+inv_k)*sum(log1p((y) * (k/sigma))) -N*log(sigma);
else
return -sum(y)/sigma -N*log(sigma); // limit k->0
}
real gpareto_lcdf(vector y, real k, real sigma) {
// generalised Pareto log cdf
real inv_k = inv(k);
if (k<0 && max(y)/sigma > -inv_k)
reject("k<0 and max(y)/sigma > -1/k; found k, sigma =", k, sigma)
if (sigma<=0)
reject("sigma<=0; found sigma =", sigma)
if (fabs(k) > 1e-15)
return sum(log1m_exp((-inv_k)*(log1p((y) * (k/sigma)))));
else
return sum(log1m_exp(-(y)/sigma)); // limit k->0
}
}
data {
// the input data
int<lower = 1> n;
real<lower = 0> value[n];
int<lower = 0, upper = 1> censored[n];
// parameters for the prior
real<lower = 0> a;
real<lower = 0> b;
}
parameters {
real k;
real sigma;
}
model {
// prior
k ~ gamma(a, b);
sigma ~ gamma(a,b);
// likelihood
for (i in 1:n) {
if (censored[i]) {
target += gpareto_lcdf(value[i] | k, sigma);
} else {
target += gpareto_lpdf(value[i] | k, sigma);
}
}
}
Clearly sigma is defined in the gpareto_lcdf and so I am unsure why Stan is complaining about this.
Your code in the likelihood section of the model block doesn't match the way you have defined the gpareto...() functions in the functions block. The gpareto functions take a vector as the first argument but instead you are looping through and trying to pass a single element of value each time. That's why you get the error that the data types you are passing to gpareto_lcdf() do not match the "signature" of the function. The function expects the first argument to be a vector, the second to be a real, and the third to be a real. But you are passing three reals.
The error has nothing to do with sigma. The ^ symbol is pointing to the entire function call to gpareto_lcdf() and just happens to be pointing near where the word sigma is, but the error isn't related to sigma.
To fix this error, you would need to do one of the following:
Redefine the gpareto() functions to take three real arguments and keep your loop in the model block as is.
Rewrite your model block to not use a loop and instead be vectorized.
I'm not sure the vectorization will work with the condition you have in the model block so you may be forced to go with the first solution.
I would recommend posting this question on the Stan forum where you may get a better answer.

Error in optim:Function cannot be evaluated at initial parameters

So I want to estimate 5 parameters consist of intercept, age, disease, mdu, and alfa from panel data. And I write the pdf, here's the code of multivariate negative binomial for panel data. I use optim() package to estimate the parameters. Please help me to fix what is wrong with my code..
library(pglm)
library(plm)
data("HealthIns")
dat<- pdata.frame(HealthIns,index = c("id","year"))
y<-data.matrix(dat$mdu)
y[is.na(y)]=0
Y<-matrix(data=y,nrow=5908,ncol=5)
dat$ageclass<-ifelse(dat$age >=30,1,0)
x1<-data.matrix(dat$ageclass)
x1[is.na(x1)]=0
X1<-matrix(data=x1,nrow=5908,ncol=5)
dat$gender <-ifelse(dat$sex=="male",1,0)
x2<-data.matrix(dat$gender)
x2[is.na(x2)]=0
X2<-matrix(data=x2,nrow=5908,ncol=5)
x3<-data.matrix(dat$disease)
x3[is.na(x3)]=0
X3<-matrix(data=x3,nrow=5908,ncol=5)
Function for optim package
po.gam=function(para){
#Lambda(i,t)
{for (i in (1:5908)){
for(t in (1:5)){
lambda<-matrix(para[1] + para[2]*X1 + para[3]*X2 +
para[4]*X3,nrow=5908,ncol=5)}}
}
#Sigma N(i,t) terhadap t
num.claims.of.t <-numeric(nrow(Y))
{for (i in seq(nrow(Y))){
num.claims.of.t[i] <-sum(Y[i,])}
}
#Sigma Lambda(i,t) terhadap t
num.lambda.of.t<-numeric(nrow(Y))
{for (i in seq(nrow(Y))){
num.lambda.of.t[i]<-sum(lambda[i,])}
}
#Produc Exponential Dist
prod.exp<-numeric(nrow(Y))
{for (i in seq(nrow(Y))){
prod.exp[i]<-prod(lambda[i,]^Y[i,]/factorial(Y[i,]))}
}
#JOINT PROBABILITY OF TIMESNYA...
joint.pdf.mvnb<-prod.exp*gamma(num.claims.of.t + (1/para[5]))/gamma(1/para[5])*((1/para[5])/(num.lambda.of.t + (1/para[5])))^(1/para[5])*(num.lambda.of.t + (1/para[5]))^(-num.claims.of.t)
#PRODUC NUMBER OF CLAIMS SEMUA INDIVIDU
-log(prod(joint.pdf.mvnb))
}
start.value <- c(beta0=1,beta1=1,beta2=1,beta3=1,alfa=1)
MLE_estimator<-optim(start.value,po.gam,hessian=TRUE)
MLE_estimator
And here is my result
> MLE_estimator<-optim(start.value,po.gam,hessian=TRUE)
Error in optim(start.value, po.gam, hessian = TRUE) :
function cannot be evaluated at initial parameters

Error in declaring data in rstan for graded response model

I am trying to use Stan, specifically through rstan, to fit a graded response model. Luo and Jiao (2018), available here, provide Stan code for doing so. Here is their code, edited only to include more white space:
data{
int<lower=2, upper=4> K; //number of categories
int <lower=0> n_student;
int <lower=0> n_item;
int<lower=1,upper=K> Y[n_student,n_item];
}
parameters {
vector[n_student] theta;
real<lower=0> alpha [n_item];
ordered[K-1] kappa[n_item]; //category difficulty
real mu_kappa; //mean of the prior distribution of category difficulty
real<lower=0> sigma_kappa; //sd of the prior distribution of category difficulty
}
model{
alpha ~ cauchy(0,5);
theta ~ normal(0,1);
for (i in 1: n_item){
for (k in 1:(K-1)){
kappa[i,k] ~ normal(mu_kappa,sigma_kappa);
}}
mu_kappa ~ normal(0,5);
sigma_kappa ~ cauchy(0,5);
for (i in 1:n_student){
for (j in 1:n_item){
Y[i,j] ~ ordered_logistic(theta[i]*alpha[j],kappa[j]);
}}
}
generated quantities {
vector[n_item] log_lik[n_student];
for (i in 1: n_student){
for (j in 1: n_item){
log_lik[i, j] = ordered_logistic_log (Y[i, j],theta[i]*alpha[j],kappa[j]);
}}
}
However, when I try to use this code, the parser throws an error. Here is the R code to reproduce the error:
library("rstan")
n <- 100
m <- 10
K <- 4
example_responses <- sample(x = 1:4, size = n * m, replace = TRUE)
example_responses <- matrix(example_responses, nrow = n, ncol = m)
example_dat <- list(K = K,
n_student = n,
n_item = m,
Y = example_responses)
fit <- stan(file = "~/grm.stan", data = example_dat)
Here is the error I receive:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
error in 'modelf6471b3f018_grm' at line 2, column 21
-------------------------------------------------
2:
3: data {
4: int<lower=2, upper=4> K; // number of categories
^
5: int<lower=0> n_student;
-------------------------------------------------
PARSER EXPECTED: <one of the following:
a variable declaration, beginning with type,
(int, real, vector, row_vector, matrix, unit_vector,
simplex, ordered, positive_ordered,
corr_matrix, cov_matrix,
cholesky_corr, cholesky_cov
or '}' to close variable declarations>
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model 'grm' due to the above error.
I've tried going through the code and the Stan manual to see what the issue is with the data declaration, but I can't find a problem with it. The declaration appears to be very similar to a declaration example in the Stan Language Reference:
int<lower = 1> N;
Can anyone tell me what I'm missing?
Your code has non-standard characters in some of the white space, including right after K;

'expected a comma' error in OpenBUGS

I am trying to fit a model using OpenBUGS. Here is the code:
model {
# N observations
for (i in 1:N) {
y[i] ~ dbin(p.bound[i],1)
p.bound[i]<-max(0,min(1,p[i]))
logit(p[i])<-Xbeta[i]
Xbeta[i] <- a[sp[i]]-0.5*pow(((X1[i]-opt1[sp[i]])/tol1[sp[i]])+((X2[i]-opt2[sp[i]])/tol2[sp[i]])+((X3[i]-opt3[sp[i]])/tol3[sp[i]]))
}
for (j in 1:n.sp) {
a[j] ~ dnorm(a.hat[j],tau.a)
a.hat[j]<-mu.a
opt1[j] ~ dnorm(opt.hat1[j],tau.opt1)
opt.hat1[j]<-mu.opt1
tol1[j] ~ dnorm(tol.hat1[j],tau.tol1)
tol.hat1[j]<-mu.tol1
opt2[j] ~ dnorm(opt.hat2[j],tau.opt2)
opt.hat2[j]<-mu.opt2
tol2[j] ~ dnorm(tol.hat2[j],tau.tol2)
tol.hat2[j]<-mu.tol2
opt3[j] ~ dnorm(opt.hat3[j],tau.opt3)
opt.hat3[j]<-mu.opt3
tol3[j] ~ dnorm(tol.hat3[j],tau.tol3)
tol.hat3[j]<-mu.tol3
}
mu.a~dnorm(0,0.0001)
mu.opt1~dnorm(0,0.0001)
mu.tol1~dunif(0.04,37)
mu.opt2~dnorm(0,0.0001)
mu.tol2~dunif(0.04,37)
mu.opt3~dnorm(0,0.0001)
mu.tol3~dunif(0.04,37)
tau.a<-pow(sigma.a,-2)
sigma.a~dunif(0,100)
tau.opt1<-pow(sigma.opt1,-2)
sigma.opt1~dunif(0,100)
tau.opt2<-pow(sigma.opt2,-2)
sigma.opt2~dunif(0,100)
tau.opt3<-pow(sigma.opt3,-2)
sigma.opt3~dunif(0,100)
tau.tol1<-pow(sigma.tol1,-2)
sigma.tol1~dunif(0,100)
tau.tol2<-pow(sigma.tol2,-2)
sigma.to2~dunif(0,100)
tau.tol3<-pow(sigma.tol3,-2)
sigma.tol3~dunif(0,100)
}
But, When I run this code, I get the error 'expected a comma'.
If anyone can help me to solve this problem that will be great.
The error is in the power function. From the BUGS manual...
pow(e1, e2) e1^e2
At the moment you just have e1;
Xbeta[i] <- a[sp[i]]-0.5*pow(e1)
where e1 = ((X1[i]-opt1[sp[i]])/....

How to estimate parameters from BG / NBD model using bbmle?

I am trying to estimate the probability for a customer to be alive after a certain period. I have data for 500 customers of a firm. For each customer I know x (number of transactions by a given customer over all time periods), tx (time of the last transaction) and T (total time between the first purchase and the end of the observation window).
I work with the BG/NBD model. In order to estimate the probability to be alive, I first need to estimate 4 parameters (r, alpha, a and b) included in this model. To optimize the value of these parameters, I am using the « bbmle » package (shown below).
However when I run the code, it doesn’t give any result. Furthermore it also seems that R doesn’t recognise many of the « objects » that are included in the functions below.
Does anyone notice any error that I’ve made in the code ? Is there another way to write it ?
bgLlh <- function(mydata, r, alpha, a, b) {
with (mydata, {
if (a<=0 | b<=0 | r<=0 | alpha<=0) return (NaN)
term1 <-log(gamma(r+mydata$x)) - log(gamma(r)) + r*log(alpha)
term2 <-log(gamma(a+b))+log(gamma(b+mydata$x))-log(gamma(b))-log(gamma(a+b+mydata$x))
term3<- -(r+mydata$x)*log(alpha+mydata$T)
term4 <- if(mydata$x > 1) {log(a)-log(b+mydata$x-1)-(r+mydata$x)*log(alpha+mydata$tx)
} else {0}
llh <- term1 + term2 +log(exp(term3)+(mydata$x>0)*exp(term4))
f <- -sum(llh)
return(f)
})
}
bgEstimateParameters <- function(mydata, initValues, safeMode=FALSE) {
llhd <- function(r, alpha, a, b) {
return (bgLlh(data, r, alpha, a, b))
}
library(bbmle)
if (safeMode) {
fit <- mle2(llhd, initValues, skip.hessian=TRUE, method="Nelder-Mead")
} else {
fit <- mle2(llhd, initValues)
}
return (fit)
}

Resources