I understand how to extract chains from a Stan model, but I was wondering whether there is any quick way to extract the values displayed in the default Stan output table.
Here is some toy data:
# simulate linear model
a <- 3 # intercept
b <- 2 # slope
# we can have both the predictor and the noise vary
x <- rnorm(28, 0, 1)
eps <- rnorm(28, 0, 2)
y <- a + b*x + eps
which we then analyse with a linear model:
df <- data.frame(x = x, y = y)
mod <- lm(y ~ x, df)
We can extract the coefficients with
mod$coefficients
# (Intercept) x
# 3.355967 2.151597
I wondered whether there is any way to do the equivalent with the Stan output table.
# Step 1: Make List
data_reg <- list(N = 28, x = x, y = y)
# Step 2: Create Model String
write("
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
model {
vector[N] mu;
sigma ~ cauchy(0, 2);
beta ~ normal(0,10);
alpha ~ normal(0,100);
for ( i in 1:N ) {
mu[i] = alpha + beta * x[i];
}
y ~ normal(mu, sigma);
}
", file = "temp.stan")
# Step 3: Generate MCMC Chains
library(rstan)
fit1 <- stan(file = "temp.stan",
data = data_reg,
chains = 2,
warmup = 1000,
iter = 2000,
cores = 2,
refresh = 1000)
Now, when we call the model
fit1
# Output
# mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
# alpha 3.33 0.01 0.40 2.57 3.06 3.33 3.59 4.13 1229 1
# beta 2.14 0.01 0.40 1.37 1.89 2.14 2.40 2.98 1470 1
# sigma 1.92 0.01 0.27 1.45 1.71 1.90 2.09 2.51 1211 1
# lp__ -31.92 0.05 1.30 -35.27 -32.50 -31.63 -30.96 -30.43 769 1
Is there any way to index and extract elements from the Output table displayed above?
If you only want the means, then the get_posterior_mean function will work. Otherwise, if you assign the result of print(fit1) or summary(fit1) to an object, you can extract elements from that object, but it is probably better to just do as.matrix(fit1) or as.data.frame(fit1) and calculate whatever you want yourself from the resulting columns.
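For example, here is a minimal sketch (the parameter names alpha, beta, and sigma match the model above; summary() on a stanfit object returns a list whose $summary element is the numeric matrix that print(fit1) displays):
# the full table printed by print(fit1), as a numeric matrix
tab <- summary(fit1)$summary
tab["alpha", "mean"]              # posterior mean of alpha
tab["beta", c("2.5%", "97.5%")]   # 95% interval for beta
# or compute whatever you want directly from the draws
draws <- as.data.frame(fit1)
mean(draws$alpha)
quantile(draws$beta, c(0.025, 0.975))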
I'm trying to fit a Zero-Inflated Poisson Hidden Markov Model (ZIP-HMM) with Stan. A setup for the plain Poisson-HMM was shown in a past forum post (see link).
Fitting the ZIP-HMM with the classical approach is well documented; the code and model are below.
ziphsmm
library(ziphsmm)
set.seed(123)
prior_init <- c(0.5,0.5)
emit_init <- c(20,6)
zero_init <- c(0.5,0)
tpm <- matrix(c(0.9, 0.1, 0.2, 0.8),2,2,byrow=TRUE)
result <- hmmsim(n=100,M=2,prior=prior_init, tpm_parm=tpm,emit_parm=emit_init,zeroprop=zero_init)
y <- result$series
serie <- data.frame(y = result$series, m = result$state)
fit1 <- fasthmmfit(y,x=NULL,ntimes=NULL,M=2,prior_init,tpm,
emit_init,0.5, hessian=FALSE,method="BFGS",
control=list(trace=1))
fit1
$prior
[,1]
[1,] 0.997497445
[2,] 0.002502555
$tpm
[,1] [,2]
[1,] 0.9264945 0.07350553
[2,] 0.3303533 0.66964673
$zeroprop
[1] 0.6342182
$emit
[,1]
[1,] 20.384688
[2,] 7.365498
$working_parm
[1] -5.9879373 -2.5340475 0.7065877 0.5503559 3.0147840 1.9968067
$negloglik
[1] 208.823
Stan
library(rstan)
ZIPHMM <- 'data {
int<lower=0> N;
int<lower=0> y[N];
int<lower=1> m;
}
parameters {
real<lower=0, upper=1> theta; // zero-inflation probability
positive_ordered[m] lambda; // Poisson means of the states
simplex[m] Gamma[m]; // tpm
}
model {
vector[m] log_Gamma_tr[m];
vector[m] lp;
vector[m] lp_p1;
// priors
lambda ~ gamma(0.1,0.01);
theta ~ beta(0.05, 0.05);
// transposing tpm and taking the log of each entry
for(i in 1:m)
for(j in 1:m)
log_Gamma_tr[j, i] = log(Gamma[i, j]);
lp = rep_vector(-log(m), m); // uniform initial state probabilities
for(n in 1:N) {
for(j in 1:m){
if (y[n] == 0)
lp_p1[j] = log_sum_exp(log_Gamma_tr[j] + lp) +
log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta) + poisson_lpmf(y[n] | lambda[j]));
else
lp_p1[j] = log_sum_exp(log_Gamma_tr[j] + lp) +
bernoulli_lpmf(0 | theta) +
poisson_lpmf(y[n] | lambda[j]);
}
lp = lp_p1;
}
target += log_sum_exp(lp);
}'
mod_ZIP <- stan(model_code = ZIPHMM, data=list(N=length(y), y=y, m=2), iter=1000, chains=1)
print(mod_ZIP,digits_summary = 3)
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
theta 0.518 0.002 0.052 0.417 0.484 0.518 0.554 0.621 568 0.998
lambda[1] 7.620 0.039 0.787 6.190 7.038 7.619 8.194 9.132 404 1.005
lambda[2] 20.544 0.039 0.957 18.861 19.891 20.500 21.189 22.611 614 1.005
Gamma[1,1] 0.664 0.004 0.094 0.473 0.604 0.669 0.730 0.841 541 0.998
Gamma[1,2] 0.336 0.004 0.094 0.159 0.270 0.331 0.396 0.527 541 0.998
Gamma[2,1] 0.163 0.003 0.066 0.057 0.114 0.159 0.201 0.312 522 0.999
Gamma[2,2] 0.837 0.003 0.066 0.688 0.799 0.841 0.886 0.943 522 0.999
lp__ -222.870 0.133 1.683 -227.154 -223.760 -222.469 -221.691 -220.689 161 0.999
True values
real = list(tpm = tpm,
zeroprop = nrow(serie[serie$m == 1 & serie$y == 0, ]) / nrow(serie[serie$m == 1,]),
emit = t(t(tapply(serie$y[serie$y != 0],serie$m[serie$y != 0], mean))))
real
$tpm
[,1] [,2]
[1,] 0.9 0.1
[2,] 0.2 0.8
$zeroprop
[1] 0.6341463
$emit
[,1]
1 20.433333
2 7.277778
The estimates come out rather oddly, and I would appreciate help figuring out what I am doing wrong. As you can see, the Stan estimate of zeroprop is 0.518 while the real value is 0.634; the values of the t.p.m. from Stan are also quite far off; and the means lambda[1] = 7.62 and lambda[2] = 20.54, although close enough in value, come out in the opposite order to the real 20.43 and 7.27. I think I'm making some mistake in defining the model in Stan, but I do not know which.
Although I don't know the inner workings of the ZIP-HMM fitting algorithm, there are some obvious differences between what you have implemented in the Stan model and how the ZIP-HMM optimization algorithm describes itself. Addressing these appears to be sufficient to generate similar results.
Differences Between the Models
Initial State Probability
The values that the ZIP-HMM estimates, specifically fit1$prior, indicate that it is able to learn a probability distribution over the initial state. However, in the Stan model this is fixed to a uniform 1:1 split:
lp = rep_vector(-log(m), m);
This should be changed to allow the model to estimate the initial state probabilities.
Priors on Parameters (optional)
The Stan model has non-flat priors on lambda and theta, but presumably the ZIP-HMM places no such weighting on the values it arrives at. If one wanted to mimic the ZIP-HMM more closely, then flat priors would be better. However, the ability to use non-flat priors in Stan is really an opportunity to develop a better-tuned model than is achievable with standard HMM inference algorithms.
Zero-Inflation on State 1
From the documentation of the fasthmmfit method
Fast gradient descent / stochastic gradient descent algorithm to learn the parameters in a specialized zero-inflated hidden Markov model, where zero-inflation only happens in State 1. [emphasis added]
The Stan model assumes zero-inflation on all states. This is likely why the estimated theta value is deflated relative to the ZIP-HMM MAP estimate.
State Ordering
When estimating discrete latent states or clusters in Stan, one can use an ordered vector as a trick to mitigate against label switching issues. This is effectively achieved here with
positive_ordered[m] lambda;
However, since the ZIP-HMM only has zero-inflation on the first state, correctly implementing this behavior in Stan requires prior knowledge of what the rank of the lambda is for the "first" state. This seems very problematic for generalizing this code. For now, let's just move forward under the assumption that we can always recover this information somehow. In this specific case, we will assume that state 1 in the HMM has the higher lambda value, and therefore will be state 2 in the Stan model.
Updated Stan Model
Incorporating the above changes in the model should be something like
Stan Model
data {
int<lower=0> N; // length of chain
int<lower=0> y[N]; // emissions
int<lower=1> m; // num states
}
parameters {
simplex[m] start_pos; // initial pos probs
real<lower=0, upper=1> theta; // zero-inflation parameter
positive_ordered[m] lambda; // emission poisson params
simplex[m] Gamma[m]; // transition prob matrix
}
model {
vector[m] log_Gamma_tr[m];
vector[m] lp;
vector[m] lp_p1;
// transposing tpm and taking the log of each entry
for (i in 1:m) {
for (j in 1:m) {
log_Gamma_tr[j, i] = log(Gamma[i, j]);
}
}
// initial position log-lik
lp = log(start_pos);
for (n in 1:N) {
for (j in 1:m) {
// log-lik for state
lp_p1[j] = log_sum_exp(log_Gamma_tr[j] + lp);
// log-lik for emission
if (j == 2) { // assuming only state 2 has zero-inflation
if (y[n] == 0) {
lp_p1[j] += log_mix(theta, 0, poisson_lpmf(0 | lambda[j]));
} else {
lp_p1[j] += log1m(theta) + poisson_lpmf(y[n] | lambda[j]);
}
} else {
lp_p1[j] += poisson_lpmf(y[n] | lambda[j]);
}
}
lp = lp_p1; // log-lik for next position
}
target += log_sum_exp(lp);
}
MAP Estimate
Loading the above as a string variable code.ZIPHMM, we first compile it and run a MAP estimate (since MAP estimation is going to behave most like the HMM fitting algorithm):
model.ZIPHMM <- stan_model(model_code=code.ZIPHMM)
# note the use of some initialization on the params,
# otherwise it can occasionally converge to strange extrema
map.ZIPHMM <- optimizing(model.ZIPHMM, algorithm="BFGS",
data=list(N=length(y), y=y, m=2),
init=list(theta=0.5, lambda=c(5,10)))
Examining the estimated parameters
> map.ZIPHMM$par
start_pos[1] start_pos[2]
9.872279e-07 9.999990e-01
theta
6.342449e-01
lambda[1] lambda[2]
7.370525e+00 2.038363e+01
Gamma[1,1] Gamma[2,1] Gamma[1,2] Gamma[2,2]
6.700871e-01 7.253215e-02 3.299129e-01 9.274678e-01
shows that they closely reflect the values that fasthmmfit inferred, except that the state orders are switched.
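For instance, a quick side-by-side check (a hypothetical comparison, not part of either package's output; it simply flips the Stan state order to match fasthmmfit's convention that state 1 is the zero-inflated, high-lambda state):
comparison <- cbind(
  fasthmmfit = c(fit1$emit[, 1], fit1$zeroprop),
  stan_map = c(map.ZIPHMM$par[c("lambda[2]", "lambda[1]")],
               map.ZIPHMM$par["theta"]))
rownames(comparison) <- c("lambda (state 1)", "lambda (state 2)", "zeroprop")
comparison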
Sampling the Posterior
This model can also be run with MCMC to infer a full posterior,
samples.ZIPHMM <- stan(model_code = code.ZIPHMM,
data=list(N=length(y), y=y, m=2),
iter=2000, chains=4)
which samples well and yields similar results (and without any parameter initializations)
> samples.ZIPHMM
Inference for Stan model: b29a2b7e93b53c78767aa4b0c11b62a0.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
start_pos[1] 0.45 0.00 0.29 0.02 0.20 0.43 0.69 0.97 6072 1
start_pos[2] 0.55 0.00 0.29 0.03 0.31 0.57 0.80 0.98 6072 1
theta 0.63 0.00 0.05 0.53 0.60 0.63 0.67 0.73 5710 1
lambda[1] 7.53 0.01 0.72 6.23 7.02 7.49 8.00 9.08 4036 1
lambda[2] 20.47 0.01 0.87 18.83 19.87 20.45 21.03 22.24 5964 1
Gamma[1,1] 0.65 0.00 0.11 0.43 0.57 0.65 0.72 0.84 5664 1
Gamma[1,2] 0.35 0.00 0.11 0.16 0.28 0.35 0.43 0.57 5664 1
Gamma[2,1] 0.08 0.00 0.03 0.03 0.06 0.08 0.10 0.16 5605 1
Gamma[2,2] 0.92 0.00 0.03 0.84 0.90 0.92 0.94 0.97 5605 1
lp__ -214.76 0.04 1.83 -219.21 -215.70 -214.43 -213.43 -212.25 1863 1
Working with the "prostate" dataset in "ElemStatLearn" package.
library(caret)
library(ElemStatLearn)
# trainset: the training subset of the prostate data (rows flagged by its train column)
trainset <- subset(prostate, train == TRUE, select = -train)
set.seed(3434)
fit.lm = train(data=trainset, lpsa~., method = "lm")
fit.ridge = train(data=trainset, lpsa~., method = "ridge")
fit.lasso = train(data=trainset, lpsa~., method = "lasso")
Comparing RMSE (for bestTune in case of ridge and lasso)
fit.lm$results[,"RMSE"]
[1] 0.7895572
fit.ridge$results[fit.ridge$results[,"lambda"]==fit.ridge$bestTune$lambda,"RMSE"]
[1] 0.8231873
fit.lasso$results[fit.lasso$results[,"fraction"]==fit.lasso$bestTune$fraction,"RMSE"]
[1] 0.7779534
Comparing absolute value of coefficients
abs(round(fit.lm$finalModel$coefficients,2))
(Intercept) lcavol lweight age lbph svi lcp gleason pgg45
0.43 0.58 0.61 0.02 0.14 0.74 0.21 0.03 0.01
abs(round(predict(fit.ridge$finalModel, type = "coef", mode = "norm")$coefficients[8,],2))
lcavol lweight age lbph svi lcp gleason pgg45
0.49 0.62 0.01 0.14 0.65 0.05 0.00 0.01
abs(round(predict(fit.lasso$finalModel, type = "coef", mode = "norm")$coefficients[8,],2))
lcavol lweight age lbph svi lcp gleason pgg45
0.56 0.61 0.02 0.14 0.72 0.18 0.00 0.01
My question is: how can the "ridge" RMSE be higher than that of plain "lm"? Doesn't that defeat the very purpose of penalized regression versus plain "lm"?
Also, how can the absolute value of the "lweight" coefficient actually be higher in ridge (0.62) than in lm (0.61)? Both coefficients are positive originally, before taking abs().
I was expecting ridge to perform similarly to lasso, which not only reduced the RMSE but also shrank the coefficients relative to plain "lm".
Thank you!
I have a question about how to calculate differences between the coefficients (categorical variables) of a glm in RStan.
As an example, I used the iris dataset in R to check whether I can calculate the posterior distribution of differences between coefficients.
First, I ran a basic glm as shown below and calculated the significant differences between coefficients.
library(tidyverse)
library(magrittr)
library(multcomp)
iris_glm <-
glm(Sepal.Length ~ Species, data = iris)
multcomp::glht(iris_glm, linfct = mcp(Species = "Tukey")) %>%
summary(.) %>%
broom::tidy()
lhs rhs estimate std.error statistic p.value
1 versicolor - setosa 0 0.930 0.1029579 9.032819 0.000000e+00
2 virginica - setosa 0 1.582 0.1029579 15.365506 0.000000e+00
3 virginica - versicolor 0 0.652 0.1029579 6.332686 4.294805e-10
Next, I ran a Bayesian glm using Stan with the code below, and calculated the posterior distribution of the differences between coefficients in the generated quantities section.
# Make the model matrix for Rstan
iris_mod <-
model.matrix(Sepal.Length ~ Species, data = iris) %>%
as.data.frame(.)
# Input data
stan_data <-
list(N = nrow(iris_mod),
SL = iris$Sepal.Length,
Intercept = iris_mod$`(Intercept)`,
versicolor = iris_mod$Speciesversicolor,
virginica = iris_mod$Speciesvirginica)
# Stan code (saved as iris.stan)
data{
int N;
real <lower = 0> SL[N];
int <lower = 1> Intercept[N];
int <lower = 0, upper = 1> versicolor[N];
int <lower = 0, upper = 1> virginica[N];
}
parameters{
real beta0;
real beta1;
real beta2;
real <lower = 0> sigma;
}
transformed parameters{
real mu[N];
for(n in 1:N) mu[n] = beta0*Intercept[n] + beta1*versicolor[n] +
beta2*virginica[n];
}
model{
for(n in 1:N) SL[n] ~ normal(mu[n], sigma);
}
generated quantities{
real diff_beta0_beta1;
real diff_beta1_beta2;
real diff_beta0_beta2;
diff_beta0_beta1 = (beta0 + beta1) - beta0;
diff_beta1_beta2 = (beta0 + beta1) - (beta0 + beta2);
diff_beta0_beta2 = (beta0 + beta2) - beta0;
}
library(rstan)
fit_stan <-
stan(file = "iris.stan", data = stan_data, chains = 4,
seed = 1234)
# confirmation of posterior distribution
print(fit_stan, pars = c("diff_beta0_beta1", "diff_beta1_beta2",
"diff_beta0_beta2"))
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
diff_beta0_beta1 0.92 0 0.1 0.73 0.86 0.92 0.99 1.13 2041 1
diff_beta1_beta2 0.65 0 0.1 0.45 0.58 0.65 0.72 0.86 4000 1
diff_beta0_beta2 1.58 0 0.1 1.38 1.51 1.58 1.64 1.78 1851 1
Finally, I got the same results from the frequentist method and the Bayesian method.
I think this is the correct way, but I'm not sure, because I could not find any information or examples on it.
I would also like to confirm that this approach can be extended to other error distributions (Poisson, gamma, binomial, negative binomial, etc.).
If there are better ways or any advice, please let me know.
You can calculate any function (including the difference in coefficients) of draws (such as those produced by Stan) from any proper posterior distribution.
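For example, a minimal sketch using the fit_stan object from the question (rstan::extract returns the post-warmup draws indexed by parameter name):
draws <- rstan::extract(fit_stan)
# versicolor - setosa contrast: (beta0 + beta1) - beta0, which reduces to beta1
contrast <- draws$beta1
mean(contrast)                        # matches diff_beta0_beta1 above (~0.92)
quantile(contrast, c(0.025, 0.975))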
I have a logistic regression model, for which I have been using the rms package. The model fits best using a log term for tn1, and for clinical interpretation I’m using log2. I ran the model using lrm from the rms package, and then to double check, I ran it using glm. The initial coefficients are the same:
h <- lrm(formula = outcomehosp ~ I(log2(tn1 + 0.001)) + apscore_ad +
emsurg + corrapiidiag, data = d, x = TRUE, y = TRUE)
Coef S.E. Wald Z Pr(>|Z|)
Intercept -3.4570 0.3832 -9.02 <0.0001
tn1 0.0469 0.0180 2.60 0.0093
apscore_ad 0.1449 0.0127 11.44 <0.0001
emsurg 0.0731 0.3228 0.23 0.8208
f <- glm(formula = outcomehosp ~ apscore_ad + emsurg + corrapiidiag +
I(log2(tn1 + 0.001)), family = binomial(), data = tn24)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.45699 0.38315 -9.023 < 2e-16
I(log2(tn1 + 0.001)) 0.04690 0.01804 2.600 0.00932
apscore_ad 0.14487 0.01267 11.438 < 2e-16
emsurg 0.07310 0.32277 0.226 0.82082
However, when I try to get the odds ratios, they are noticeably different for tn1 between the two models, and the difference doesn't seem to be explained by the log2 transformation.
summary(h)
Effects Response : outcomehosp
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
tn1 0 0.21 0.21 0.362120 0.15417 6.5300e-02 0.673990
Odds Ratio 0 0.21 0.21 1.436400 NA 1.0675e+00 1.962100
apscore_ad 14 25.00 11.00 1.593600 0.15631 1.3605e+00 1.961000
Odds Ratio 14 25.00 11.00 4.921400 NA 3.8981e+00 7.106600
emsurg 0 1.00 1.00 0.073103 0.33051 -5.8224e-01 0.734860
Odds Ratio 0 1.00 1.00 1.075800 NA 5.5865e-01 2.085200
exp(f$coefficients)
(Intercept) 0.03152467
apscore_ad 1.15589222
emsurg 1.07584115
I(log2(tn1 + 0.001)) 1.04802
Would anyone be able to explain what the rms package is calculating the odds ratio of? Many thanks.
The tn1 effect from summary(h) is the change in the log odds (i.e. the log of the odds ratio) for tn1 going from 0 to 0.21 -- the inter-quartile range. See ?summary.rms.
So, the effect from the first row of summary(h) is 0.36212 = (log2(0.211)-log2(0.001))*.0469.
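As a quick check in R (using only the numbers shown above; exponentiating the effect recovers the odds ratio reported by summary(h)):
effect <- (log2(0.21 + 0.001) - log2(0 + 0.001)) * 0.0469
effect       # ~0.3621, the Effect column for tn1
exp(effect)  # ~1.436, the Odds Ratio row for tn1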