Simple Gamma GLM in Stan (R)

I'm trying to fit a simple Gamma GLM in Stan from R, but R crashes immediately.
Generate the data:
set.seed(1)
library(rstan)
N <- 500  # sample size
dat <- data.frame(x1 = runif(N, -1, 1), x2 = runif(N, -1, 1))
# the model matrix
X <- model.matrix(~., dat)
K <- dim(X)[2]  # number of regression parameters
# the regression slopes
betas <- runif(K, -1, 1)
shape <- 10
# simulate gamma data
mus <- exp(X %*% betas)
y <- rgamma(N, shape = shape, rate = shape / mus)
This is my Stan model:
model_string <- "
data {
int<lower=0> N; //the number of observations
int<lower=0> K; //the number of columns in the model matrix
matrix[N,K] X; //the model matrix
vector[N] y; //the response
}
parameters {
vector[K] betas; //the regression parameters
real<lower=0, upper=1000> shape; //the shape parameter
}
model {
y ~ gamma(shape, (shape/exp(X * betas)));
}"
When I run this model, R immediately crashes:
m <- stan(model_code = model_string, data = list(X=X, K=3, N=500, y=y), chains = 1, cores=1)
Update: I think the problem is somewhere in the vectorization, since I can get a working model if I pass each column of X as a separate vector.
Update 2: this also works:
for (i in 1:N)
  y[i] ~ gamma(shape, (shape / exp(X[i,] * betas)));

The problem with the original code is that there is no operator defined in Stan (at least in the version current when this was asked) for a scalar divided by a vector, which is exactly what
shape / exp(X * betas)
asks for. You can make the division elementwise by first expanding the scalar into a vector, e.g.
rep_vector(shape, N) ./ exp(X * betas)
or, equivalently, avoid the division altogether and write the rate as
shape * exp(-X * betas)
since multiplying a vector by a scalar is defined.
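Putting that together, a corrected version of the model might look like the sketch below (the name model_string_fixed is just for illustration; the data block is unchanged from the question, and dropping the upper bound on shape is optional):
model_string_fixed <- "
data {
  int<lower=0> N;   // the number of observations
  int<lower=0> K;   // the number of columns in the model matrix
  matrix[N,K] X;    // the model matrix
  vector[N] y;      // the response
}
parameters {
  vector[K] betas;       // the regression parameters
  real<lower=0> shape;   // the shape parameter
}
model {
  // rate = shape / mu, written with the elementwise operator so every
  // operation is defined on vectors
  y ~ gamma(shape, rep_vector(shape, N) ./ exp(X * betas));
}"
m <- stan(model_code = model_string_fixed, data = list(X = X, K = K, N = N, y = y), chains = 1)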

Related

Estimating parameters using Stan when the distribution of the response variable in a regression is non-normal

I am using R + Stan for Bayesian estimation of model parameters when the distribution of the response variable in a regression is not normal but rather a custom distribution, as below.
Say I have the following data-generating process:
x <- rnorm(100, 0, .5)
noise <- rnorm(100,0,1)
y = exp(10 * x + noise) / (1 + exp(10 * x + noise))
data <- list( x= x, y = y, N = length(x))
In Stan, I create the model object below:
Model = "
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
transformed parameters {
vector[N] mu;
for (f in 1:N) {
mu[f] = alpha + beta * x[f];
}
}
model {
sigma ~ chi_square(5);
alpha ~ normal(0, 1);
beta ~ normal(0, 1);
y ~ ???;
}
"
However, as you can see, I don't know which continuous Stan distribution to use for y in the model block.
Any pointers would be highly appreciated.
Thanks for your time.
The problem is not so much that the distribution of the errors isn't normal (which is the assumption in a regular linear regression), but that the relationship between x and y is clearly not linear. You DO have a linear relationship with normally distributed noise (z = 10 * x + noise, where I use z to avoid confusion with your y), but you then pass z through the inverse-logit (logistic) function: y = inv_logit(z). If you want to model this with a linear regression, you need to undo that transformation, i.e. recover z from y, which you do by applying the logit function (the inverse of the inverse logit). Then you can run a standard linear regression on z.
model = "
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
transformed data{
// invert the inverse-logit (logistic) transformation, so there's a linear relationship between x and z
vector[N] z = logit(y);
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
transformed parameters {
vector[N] mu;
// no need to loop here, this can be vectorized
mu = alpha + beta * x;
}
model {
sigma ~ chi_square(5);
alpha ~ normal(0, 1);
beta ~ normal(0, 1);
z ~ normal(mu, sigma);
}
generated quantities {
// now if you want to check the prediction,
// you predict linearly and then apply the inverse logit again
vector[N] y_pred = inv_logit(alpha + beta * x);
}
"
If you won't use mu again outside the model, you can skip the transformed parameters block and compute it directly when needed:
z ~ normal(alpha + beta * x, sigma);
On a side note: you might want to reconsider your priors. The true values for alpha and beta here are 0 and 10, respectively. The likelihood is precise enough to largely overwhelm the prior, but you'll probably still see some shrinkage of beta towards zero (i.e. you might get 9 instead of 10). Try something like normal(0, 10) instead. And I've never seen anyone use a chi-squared distribution as a prior on a standard deviation.
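For reference, a minimal R sketch for compiling and fitting this model (reusing the model string and data list defined above; the object name fit and the sampler settings are illustrative) could be:
library(rstan)
fit <- stan(model_code = model,  # the Stan program defined above
            data = data,         # list(x = x, y = y, N = length(x))
            chains = 4, iter = 2000)
print(fit, pars = c("alpha", "beta", "sigma"))
# alpha should come out near 0 and beta near 10 (modulo some shrinkage from the priors)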

Generated quantities block in a Stan model

I'm building a standard linear regression model, and I want to include a generated quantities block that uses the dot_self() function. The problem is I can't get simulation samples; the error is: Stan model 'LinearRegression' does not contain samples. I think the dot_self() function is not being recognized.
I show the Stan code and R code below.
Thanks in advance.
Note: I am sure that the data entered are correct, because the model without the generated quantities block works perfectly.
Stan Code:
data {
int<lower=1> N;
int<lower=1> K;
matrix[N, K] X;
vector[N] y;
}
parameters {
vector[K] beta;
real<lower=0> sigma;
}
model{
vector[N] mu;
mu = X * beta;
beta ~ normal(0, 10);
sigma ~ cauchy(0, 5);
y ~ normal(mu, sigma);
}
generated quantities {
real rss;
real totalss;
real<lower=0, upper=1> R2;
vector[N] mu;
mu=X * beta;
rss=dot_self(y-mu);
totalss=dot_self(y-mean(y));
R2=1 - rss/totalss;
}
R Code to run Stan model:
library(rstan)
library(coda)
library(ggplot2)
rstan_options(auto_write=T)
options(mc.cores=parallel::detectCores())
dat=list(N=N, K=ncol(X), y=y, X=X)
fit3 = stan(file = "C:.... LinearRegression.stan", data = dat, iter = 100,chains = 4)
print(fit3, digits=3, prob=c(.025,.5,.975))
The error is due to the bounds on R2: if any saved draw produces an R2 value outside [0, 1], the declared constraint is violated and no samples are returned. There is no need to impose bounds on generated quantities.
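Concretely, dropping the constraint from the declaration of R2 is enough; a sketch of the corrected block (everything else unchanged):
generated quantities {
  real rss;
  real totalss;
  real R2;          // no bounds declared here
  vector[N] mu;
  mu = X * beta;
  rss = dot_self(y - mu);
  totalss = dot_self(y - mean(y));
  R2 = 1 - rss / totalss;
}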
Here I used simulated X and y:
X = matrix(runif(N*K), N, K)
y = rowSums(X)
After removing the bounds, the model samples normally and the generated quantities (rss, totalss, and R2) appear in the output.

Outcome prediction using JAGS from R

[Code is updated and does not correspond to error messages anymore]
I am trying to understand how JAGS predicts outcome values (for a mixed Markov model). I've trained the model on a dataset that includes the outcome m and covariates x1, x2, and x3.
Predicting the outcome without fixing parameter values runs fine in R, but the output seems completely random:
preds <- run.jags("model.txt",
data=list(x1=x1, x2=x2, x3=x3, m=m,
statealpha=rep(1,times=M), M=M, T=T, N=N), monitor=c("m_pred"),
n.chains=1, inits = NA, sample=1)
Compiling rjags model...
Calling the simulation using the rjags method...
Note: the model did not require adaptation
Burning in the model for 4000 iterations...
|**************************************************| 100%
Running the model for 1 iterations...
Simulation complete
Finished running the simulation
However, as soon as I try to fix parameters (i.e. use model estimates to predict the outcome m), I get errors:
preds <- run.jags("model.txt",
data=list(x1=x1, x2=x2, x3=x3,
statealpha=rep(1,times=M), M=M, T=T, N=N, beta1=beta1), monitor=c("m"),
n.chains=1, inits = NA, sample=1)
Compiling rjags model...
Error: The following error occured when compiling and adapting the model using rjags:
Error in rjags::jags.model(model, data = dataenv, n.chains = length(runjags.object$end.state), :
RUNTIME ERROR:
Compilation error on line 39.
beta1[2,1] is a logical node and cannot be observed
beta1 in this case is a 2x2 matrix of coefficient estimates.
How is JAGS predicting m in the first example (with no fixed parameters)? Is it just choosing m completely at random?
How can I feed previously estimated model parameters back in to simulate new outcome values?
The model is:
model{
for (i in 1:N)
{
for (t in 1:T)
{
m[t,i] ~ dcat(ps[i,t,])
}
for (state in 1:M)
{
ps[i,1,state] <- probs1[state]
for (t in 2:T)
{
ps[i,t,state] <- probs[m[(t-1),i], state, i,t]
}
for (prev in 1:M){
for (t in 1:T) {
probs[prev,state,i,t] <- odds[prev,state,i,t]/totalodds[prev,i,t]
odds[prev,state,i,t] <- exp(alpha[prev,state,i] +
beta1[prev,state]*x1[t,i]
+ beta2[prev,state]*x2[t,i]
+ beta3[prev,state]*x3[t,i])
}}
alpha[state,state,i] <- 0
for (t in 1:T) {
totalodds[state,i,t] <- odds[state,1,i,t] + odds[state,2,i,t]
}
}
alpha[1,2,i] <- raneffs[i,1]
alpha[2,1,i] <- raneffs[i,2]
raneffs[i,1:2] ~ dmnorm(alpha.means[1:2],alpha.prec[1:2, 1:2])
}
for (state in 1:M)
{
beta1[state,state] <- 0
beta2[state,state] <- 0
beta3[state,state] <- 0
}
beta1[1,2] <- rcoeff[1]
beta1[2,1] <- rcoeff[2]
beta2[1,2] <- rcoeff[3]
beta2[2,1] <- rcoeff[4]
beta3[1,2] <- rcoeff[5]
beta3[2,1] <- rcoeff[6]
alpha.Sigma[1:2,1:2] <- inverse(alpha.prec[1:2,1:2])
probs1[1:M] ~ ddirich(statealpha[1:M])
for (par in 1:6)
{
alpha.means[par] ~ dt(T.constant.mu,T.constant.tau,T.constant.k)
rcoeff[par] ~ dt(T.mu, T.tau, T.k)
}
T.constant.mu <- 0
T.mu <- 0
T.constant.tau <- 1/T.constant.scale.squared
T.tau <- 1/T.scale.squared
T.constant.scale.squared <- T.constant.scale*T.constant.scale
T.scale.squared <- T.scale*T.scale
T.scale <- 2.5
T.constant.scale <- 10
T.constant.k <- 1
T.k <- 1
alpha.prec[1:2,1:2] ~ dwish(Om[1:2,1:2],2)
Om[1,1] <- 1
Om[1,2] <- 0
Om[2,1] <- 0
Om[2,2] <- 1
## Prediction
for (i in 1:N)
{
m_pred[1,i] <- m[1,i]
for (t in 2:T)
{
m_pred[t,i] ~ dcat(ps_pred[i,t,])
}
for (state in 1:M)
{
ps_pred[i,1,state] <- probs1[state]
for (t in 2:T)
{
ps_pred[i,t,state] <- probs_pred[m_pred[(t-1),i], state, i,t]
}
for (prev in 1:M)
{
for (t in 1:T)
{
probs_pred[prev,state,i,t] <- odds_pred[prev,state,i,t]/totalodds_pred[prev,i,t]
odds_pred[prev,state,i,t] <- exp(alpha[prev,state,i] +
beta1[prev,state]*x1[t,i]
+ beta2[prev,state]*x2[t,i]
+ beta3[prev,state]*x3[t,i])
}}
for (t in 1:T) {
totalodds_pred[state,i,t] <- odds_pred[state,1,i,t] + odds_pred[state,2,i,t]
}
}
}
TL;DR: I think you're just missing a likelihood.
Your model is complex, so perhaps I'm missing something, but as far as I can tell there is no likelihood. You are supplying the predictors x1, x2, and x3 as data, but you aren't giving any observed m. So in what sense can JAGS be "fitting" the model?
To answer your questions:
Yes, it appears that m is drawn at random from a categorical distribution conditioned on the rest of the model. Since no m is supplied as data, none of the parameter distributions have any cause for updating, so your result for m is no different from what you'd get by drawing from all the priors and propagating them through the model in R or anywhere else.
Though it still wouldn't constitute fitting the model in any sense, you would be free to supply values for beta1 if they weren't already completely defined in the model. JAGS is complaining because currently beta1[i] = rcoeff[i] ~ dt(T.mu, T.tau, T.k), and the parameters of that t distribution are all fixed. If any of (T.mu, T.tau, T.k) were instead given priors (identifying them as random), then beta1 could be supplied as data and JAGS would treat rcoeff[i] ~ dt(T.mu, T.tau, T.k) as a likelihood. But in the model's current form, as far as JAGS is concerned, supplying beta1 as data conflicts with the fixed definition already in the model.
I'm stretching here, but my guess is that if you're using JAGS you have fit (or would like to fit) the model in JAGS too. It's a common pattern to include both an observed response and a desired predicted response in a JAGS model, e.g. something like this:
model {
b ~ dnorm(0, 1) # prior on b
for(i in 1:N) {
y[i] ~ dnorm(b * x[i], 1) # Likelihood of y | b (and fixed precision = 1 for the example)
}
for(i in 1:N_pred) {
pred_y[i] ~ dnorm(b * pred_x[i], 1) # Prediction
}
}
In this example model, x, y, and pred_x are supplied as data, the unknown parameter b is to be estimated, and we desire the posterior predictions pred_y at each value of pred_x. JAGS knows that the distribution in the first for loop is a likelihood, because y is supplied as data. Posterior samples of b will be constrained by this likelihood. The second for loop looks similar, but since pred_y is not supplied as data, it can do nothing to constrain b. Instead, JAGS knows to simply draw pred_y samples conditioned on b and the supplied pred_x. The values of pred_x are commonly defined to be the same as observed x, giving a predictive interval for each observed data point, or as a regular sequence of values along the x axis to generate a smooth predictive interval.
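As an illustration (not part of the original answer), the example model might be called from R with runjags roughly like this; the data values and object names are made up:
library(runjags)
# toy data for the example model above
N <- 50
x <- rnorm(N)
y <- rnorm(N, 2 * x, 1)                  # simulated with slope b = 2
pred_x <- seq(-3, 3, length.out = 25)    # grid for a smooth predictive interval
example_model <- "
model {
  b ~ dnorm(0, 1)
  for (i in 1:N) {
    y[i] ~ dnorm(b * x[i], 1)
  }
  for (i in 1:N_pred) {
    pred_y[i] ~ dnorm(b * pred_x[i], 1)
  }
}"
fit <- run.jags(example_model,
                data = list(x = x, y = y, N = N,
                            pred_x = pred_x, N_pred = length(pred_x)),
                monitor = c("b", "pred_y"),
                n.chains = 2, sample = 1000)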

Stan: Copula on observed variable and latent variable

Consider observed data y1 and y2, where y1 is measured on a continuous scale and y2 on a binary scale. A continuous latent variable z is assumed to generate y2 as y2 = I(z > 0) (if z is normal, then y2 is marginally a binary probit). Furthermore, a copula is used to model the dependency between y1 and z. This model can be written hierarchically (with some abuse of notation) as:
y2 = I(z > 0)
(y1, z) ~ C(F_y1(· | w), F_z(· | w) | phi)
w, phi ~ priors
where w is the vector of marginal parameters for y1 and z, F_y1 and F_z are the respective marginal CDFs of y1 and z, and phi is the copula parameter.
How could this be modelled in Stan? I have written a custom probability function that evaluates the bivariate log likelihood produced by the copula for y1 and z. What I don't know is how to account for (generate?) the latent variable z, and how to specify the relationship between y2 and z.
I have already looked at Probit regression with data augmentation in Stan, but that did not seem helpful because of the copula in my model.
Edit: I might be mistaken about the above link not being useful. I have written the following code and would appreciate comments on whether it looks correct (theoretically).
functions {
real copulapdf_log(real[] y1, real[] z, vector mu1, vector mu2, real sigma1, real phi, int n){
real logl;
real s;
logl <- 0.0;
for (i in 1:n){
s <- log(dCphi_du1du2_s(normal_cdf(y1[i],mu1[i],sigma1), logistic_cdf(z[i],mu2[i],1), phi)) + normal_log(y1[i],mu1[i],sigma1) + logistic_log(z[i],mu2[i],1);
logl <- logl + s;
}
return logl;
}
}
data {
int<lower=0> n; // number of subjects
int<lower=0> k1; // number of predictors for y1
int<lower=0> k2; // number of predictors for y2
real y1[n]; // continuous data
real y2[n]; // 0/1 binary data
matrix[n, k1] x1; // predictor variables for y1
matrix[n, k2] x2; // predictor variables for y2
}
transformed data{
int<lower=-1, upper=1> sign[n];
for (i in 1:n) {
if (y2[i]==1)
sign[i] <- 1;
else
sign[i] <- -1;
}
}
parameters {
real phi; // frank copula param
vector[k1] b1; // beta coefficients for y1
vector[k2] b2; // beta coefficients for y2
real<lower=0> abs_z[n]; // abs value of latent variable
real<lower=0> sigma1; // sd for y1's normal distribution
}
transformed parameters {
real v[n];
vector[n] mu1; // location for y1
vector[n] mu2; // location for z
for (i in 1:n) {
v[i] <- sign[i] * abs_z[i];
}
mu1 <- x1 * b1;
mu2 <- x2 * b2;
}
model {
b1 ~ normal(0, 100);
b2 ~ normal(0, 100);
phi ~ normal(0, 10);
increment_log_prob(copulapdf_log(y1, v, mu1, mu2, sigma1, phi, n));
}
If you need the latent-parameter formulation, it's just like the Albert and Chib characterization of probit regression. What you need to do is declare the truncation in the parameters block. There's an example in the manual's regression chapter, in the section on the multivariate probit model, that shows how it's done. Basically, the latent values corresponding to y2 = 1 get a lower = 0 constraint and those corresponding to y2 = 0 get an upper = 0 constraint, and then you put both sets of parameters together into a single z vector (if you actually need to put it back together).
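For illustration only (not tied to the copula likelihood above), here is a minimal sketch, in current Stan syntax, of the constrained-declaration pattern that the multivariate probit example uses; the regression structure and copula terms from the question are omitted:
data {
  int<lower=0> n;
  int<lower=0, upper=1> y2[n];      // binary indicator I(z > 0)
}
transformed data {
  int<lower=0> n_pos = sum(y2);     // observations with y2 == 1
  int<lower=0> n_neg = n - n_pos;   // observations with y2 == 0
}
parameters {
  vector<lower=0>[n_pos] z_pos;     // latent z constrained positive
  vector<upper=0>[n_neg] z_neg;     // latent z constrained negative
}
transformed parameters {
  vector[n] z;                      // latent variable reassembled in data order
  {
    int i_pos = 1;
    int i_neg = 1;
    for (i in 1:n) {
      if (y2[i] == 1) {
        z[i] = z_pos[i_pos];
        i_pos = i_pos + 1;
      } else {
        z[i] = z_neg[i_neg];
        i_neg = i_neg + 1;
      }
    }
  }
}
model {
  // the copula likelihood for (y1, z) and the priors would go here, with z
  // entering exactly as the variable v does in the question's code
}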

Coda for visualizing Stan trajectories

I'm using Stan (specifically, rstan) to fit a Bayesian univariate linear regression $y = \beta_0 + \beta_1 x + \varepsilon$, and I'm trying to use the coda package to visualize the resulting trajectories and distributions for the $\beta$s. However, this produces the error Error in plot.new() : figure margins too large. traceplot and densplot seem to work fine; the problem seems to be with plot.mcmc, which is supposed to produce a nice panel output. You can see an example of the expected output here, on the slide "Traceplots and Density Plots".
Here's a minimal (non-)working example using the mtcars dataset:
library(rstan)
library(coda)
stanmodel <- "
data { // Data block: Exogenously given information
int<lower=1> N; // Sample size
vector<lower=1>[N] y; // Response or output.
// [N] means this is a vector of length N
vector<lower=0, upper=1>[N] x; // The single regressor; either 0 or 1
}
parameters { // Parameter block: Unobserved variables to be estimated
vector[2] beta; // Regression coefficients
real<lower=0> sigma; // Standard deviation of the error term
}
model { // Model block: Connects data to parameters
vector[N] yhat; // Regression estimate for y
yhat <- beta[1] + x*beta[2];
// Priors
beta ~ normal(0, 10);
// To plot in R: plot(function (x) {dnorm(x, 0, 10)}, -30, 30)
sigma ~ cauchy(0, 5); // With sigma bounded at 0, this is half-cauchy
// http://en.wikipedia.org/wiki/Cauchy_distribution
// To plot in R: plot(function (x) {dcauchy(x, 0, 5)}, 0, 10)
// Likelihood
y ~ normal(yhat, sigma); // yhat is the estimator, plus the N(0, sigma^2) error
// Note that Stan uses standard deviation
}
"
# Designate data
nobs <- nrow(mtcars)
y <- mtcars$mpg
x <- mtcars$am # Simple regression version doesn't include constant
data <- list(
N = nobs, # Sample size or number of observations
y = y, # The response or output
x = x # The single variable regressor, transmission type
)
# Set a seed for the random number generator
set.seed(123456)
# Run the model
bayes = stan(
model_code = stanmodel,
data = data, # Use the model and data we just defined
iter = 12000, # We're going to take 12,000 draws from the posterior,
warmup = 2000, # But throw away the first 2,000
thin = 10, # And only keep every tenth draw.
chains = 3 # But we'll do these 12,000 draws 3 times.
)
# Use the coda library to visualize parameter trajectories and distributions
param_samples <-
as.data.frame(bayes)[,c('beta[1]', 'beta[2]')]
plot(as.mcmc(param_samples))
