Fixing a parameter to a distribution in JAGS - r

In the Bayesian programing language JAGS, I am looking for a way to fix a parameter to a specific distribution, as opposed to a constant. The paragraph below presents this question more explicitly and references JAGS code. I would also be open to answers that use other probabilistic programming languages (e.g., stan).
The first code chunk below (model1) is a JAGS script designed to estimate a two-group Gaussian mixture model with unequal variances. I am looking for a way to fix one of the parameters (say $\mu_2$) to a particular distribution (e.g., dnorm(0,0.0001)). I know how to fix $\mu_2$ to a constant (e.g., see model2 in code chunk 2), though I cannot find a way to fix $\mu_2$ to my prior belief(e.g., see model3 in code chunk 3, which shows conceptually what I am trying to do).
Thanks in advance!
Code chunk 1
model1 = "
model {
for (i in 1:n1){
y1[i] ~ dnorm (mu1 , phi1)
}
for (i in 1:n2){
y2[i] ~ dnorm (mu2 , phi2)
}
# Priors
phi1 ~ dgamma(.001,.001)
phi2 ~ dgamma(.001,.001)
sigma2.1 <- 1/phi1
sigma2.2 <- 1/phi2
mu1 ~ dnorm (0,0.0001)
mu2 ~ dnorm (0,0.0001)
# Create a variable for the mean difference
delta <- mu1 - mu2
}
"
Code chunk 2
model2 = "
model {
for (i in 1:n1){
y1[i] ~ dnorm (mu1 , phi1)
}
for (i in 1:n2){
y2[i] ~ dnorm (mu2 , phi2)
}
# Priors
phi1 ~ dgamma(.001,.001)
phi2 ~ dgamma(.001,.001)
sigma2.1 <- 1/phi1
sigma2.2 <- 1/phi2
mu1 ~ dnorm (0,0.0001)
mu2 <- 1.27
# Create a variable for the mean difference
delta <- mu1 - mu2
}
"
Code chunk 3
model3 = "
model {
for (i in 1:n1){
y1[i] ~ dnorm (mu1 , phi1)
}
for (i in 1:n2){
y2[i] ~ dnorm (mu2 , phi2)
}
# Priors
phi1 ~ dgamma(.001,.001)
phi2 ~ dgamma(.001,.001)
sigma2.1 <- 1/phi1
sigma2.2 <- 1/phi2
mu1 ~ dnorm (0,0.0001)
mu2 <- dnorm (0,0.0001)
# Create a variable for the mean difference
delta <- mu1 - mu2
}
"

I don't know JAGS, but here are two Stan versions. One takes a single sample of mu2 across all iterations; the second takes a different sample of mu2 for each iteration.
Either way, I'm not qualified to judge whether this is actually a good idea. (The second version, in particular, is something that the Stan team has deliberately tried to avoid, for the reasons described here.) But it's at least possible.
(In both examples, I changed some of the prior distributions to make the data easier to work with, but the basic idea is the same.)
One sample of mu2
First, the Stan model.
data {
int<lower=0> n1;
vector[n1] y1;
int<lower=0> n2;
vector[n2] y2;
}
transformed data {
// Set mu2 to a single randomly selected value (instead of giving it a prior
// and estimating it).
real mu2 = normal_rng(0, 0.0001);
}
parameters {
real mu1;
real<lower=0> phi1;
real<lower=0> phi2;
}
transformed parameters {
real sigma1 = 1 / phi1;
real sigma2 = 1 / phi2;
}
model {
mu1 ~ normal(0, 0.0001);
phi1 ~ gamma(1, 1);
phi2 ~ gamma(1, 1);
y1 ~ normal(mu1, sigma1);
y2 ~ normal(mu2, sigma2);
}
generated quantities {
real delta = mu1 - mu2;
// We can't return mu2 from the transformed data block. So if we want to see
// what it was, we have to copy its value into a generated quantity and return
// that.
real mu2_return = mu2;
}
Next, R code to generate fake data and fit the model.
# Generate fake data.
n1 = 1000
n2 = 1000
mu1 = rnorm(1, 0, 0.0001)
mu2 = rnorm(1, 0, 0.0001)
phi1 = rgamma(1, shape = 1, rate = 1)
phi2 = rgamma(1, shape = 1, rate = 1)
y1 = rnorm(n1, mu1, 1 / phi1)
y2 = rnorm(n2, mu2, 1 / phi2)
delta = mu1 - mu2
# Fit the Stan model.
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = T)
stan.data = list(n1 = n1, y1 = y1, n2 = n2, y2 = y2)
stan.model = stan(file = "stan_model.stan",
data = stan.data,
cores = 3, iter = 1000)
We can extract the samples from the Stan model and see that we correctly recovered the parameters' true values - except, of course, in the case of mu2.
# Pull out the samples.
library(tidybayes)
library(tidyverse)
stan.model %>%
spread_draws(mu1, phi1, mu2_return, phi2) %>%
ungroup() %>%
dplyr::select(.draw, mu1, phi1, mu2 = mu2_return, phi2) %>%
pivot_longer(cols = -c(.draw), names_to = "parameter") %>%
ggplot(aes(x = value)) +
geom_histogram() +
geom_vline(data = data.frame(parameter = c("mu1", "phi1", "mu2", "phi2"),
true.value = c(mu1, phi1, mu2, phi2)),
aes(xintercept = true.value), color = "red", size = 1.5) +
facet_wrap(~ parameter, scales = "free") +
theme_bw() +
scale_x_continuous("Parameter value") +
scale_y_continuous("Number of samples")
New sample of mu2 for each iteration
We can't generate a random number in the parameters, transformed parameters, or model block; again, this is a deliberate design choice. But we can generate a whole bunch of numbers in the transformed data block and grab a new one for each iteration. To do this, we need a way to figure out which iteration we're on in the parameters block. I used Louis's solution from the end of this discussion on the Stan forums. First, save the following C++ code as iter.hpp in your working directory:
static int itct = 1;
inline void add_iter(std::ostream* pstream__) {
itct += 1;
}
inline int get_iter(std::ostream* pstream__) {
return itct;
}
Next, define the Stan model as follows. The functions add_iter() and get_iter() are defined in iter.hpp; if you're working in RStudio, you'll get error symbols when you edit the Stan file because RStudio doesn't know that we're going to bring in those function definitions from elsewhere.
functions {
void add_iter();
int get_iter();
}
data {
int<lower=0> n1;
vector[n1] y1;
int<lower=0> n2;
vector[n2] y2;
int<lower=0> n_iterations;
}
transformed data {
vector[n_iterations + 1] all_mu2s;
for(n in 1:(n_iterations + 1)) {
all_mu2s[n] = normal_rng(0, 0.0001);
}
}
parameters {
real mu1;
real<lower=0> phi1;
real<lower=0> phi2;
}
transformed parameters {
real sigma1 = 1 / phi1;
real sigma2 = 1 / phi2;
real mu2 = all_mu2s[get_iter()];
}
model {
mu1 ~ normal(0, 0.0001);
phi1 ~ gamma(1, 1);
phi2 ~ gamma(1, 1);
y1 ~ normal(mu1, sigma1);
y2 ~ normal(mu2, sigma2);
}
generated quantities {
real delta = mu1 - mu2;
add_iter();
}
Note that the model actually generates 1 more random value for mu2 than we need. When I tried generating exactly n_iterations random values, I got an error informing me that Stan had tried to access all_mu2s[1001].
I find this worrisome, because it means I don't fully understand what's going on internally - shouldn't there be only 1000 iterations, given the R code below? But it just looks like an off-by-one error, and the fitted model looks reasonable, so I didn't pursue this further.
Also, note that this approach gets the iteration number, but not the chain. I ran just one chain; if you run more than one chain, the ith value of mu2 will be the same in each chain. That same Stan forums discussion has a suggestion for distinguishing among chains, but I didn't explore it.
Finally, generate fake data and fit the model to it. When we compile the model, we need to sneak in the function definitions from iter.hpp, as described here.
# Generate fake data.
n1 = 1000
n2 = 1000
mu1 = rnorm(1, 0, 0.0001)
mu2 = rnorm(1, 0, 0.0001)
phi1 = rgamma(1, shape = 1, rate = 1)
phi2 = rgamma(1, shape = 1, rate = 1)
y1 = rnorm(n1, mu1, 1 / phi1)
y2 = rnorm(n2, mu2, 1 / phi2)
delta = mu1 - mu2
n.iterations = 1000
# Fit the Stan model.
library(rstan)
stan.data = list(n1 = n1, y1 = y1, n2 = n2, y2 = y2,
n_iterations = n.iterations)
stan.model = stan_model(file = "stan_model.stan",
allow_undefined = T,
includes = paste0('\n#include "',
file.path(getwd(), 'iter.hpp'),
'"\n'))
stan.model.fit = sampling(stan.model,
data = stan.data,
chains = 1,
iter = n.iterations,
pars = c("mu1", "phi1", "mu2", "phi2"))
Once again, we recovered the values of mu1, phi1, and phi2 reasonably well. This time, we used a whole range of values for mu2, which follow the specified distribution.
# Pull out the samples.
library(tidybayes)
library(tidyverse)
stan.model.fit %>%
spread_draws(mu1, phi1, mu2, phi2) %>%
ungroup() %>%
dplyr::select(.draw, mu1, phi1, mu2 = mu2, phi2) %>%
pivot_longer(cols = -c(.draw), names_to = "parameter") %>%
ggplot(aes(x = value)) +
geom_histogram() +
stat_function(dat = data.frame(parameter = "mu2", value = 0),
fun = function(.x) { dnorm(.x, 0, 0.0001) * 0.01 },
color = "blue", size = 1.5) +
geom_vline(data = data.frame(parameter = c("mu1", "phi1", "mu2", "phi2"),
true.value = c(mu1, phi1, mu2, phi2)),
aes(xintercept = true.value), color = "red", size = 1.5) +
facet_wrap(~ parameter, scales = "free") +
theme_bw() +
scale_x_continuous("Parameter value") +
scale_y_continuous("Number of samples")

Related

Failing to simulate data for a negative binomial probability distribution

I am hoping someone can help me.
In a beginners workshop I attended, in the process of fitting a multiple regression model, the instructor initially established a prior predictive check using a Poisson distribution for the outcome. This was done in two steps. Initially, a function was created:
multiple_regression_poisson_dgp <- function(predictor1,
predictor2, alpha_mean, alpha_sd, beta_predictor1_mean,
beta_predictor1_sd, beta_predictor2_mean,
beta_predictor2_sd) {
N <- length(predictor1)
alpha <- rnorm(1, mean = alpha_mean, sd = alpha_sd);
beta_predictor1 <- rnorm(1, mean = beta_predictor1_mean,
sd = beta_predictor1_sd);
beta_predictor2 <- rnorm(1, mean = beta_predictor2_mean,
sd = beta_super_sd);
outcome <- rpois(N, lambda = alpha + beta_predictor1 *
predictor1 + beta_predictor2 * predictor2)
return(outcome)
}
After this function was created, the following priors were generated:
multiple_regression_poisson_dgp(dataset$predictor1,
dataset$predictor2,
alpha_mean = 1,
alpha_sd = 0.5,
beta_predictor1_mean = -0.25,
beta_predictor1_sd = 0.5,
beta_predictor2_mean = 0,
beta_predictor2_sd = 1)
This worked fine. The issue is that, further down the line, it was shown that the Poisson distribution was not the most adequate. The negative binomial was suggested as the next step. Unfortunately, when I try to replicate the process for the negative binomial, I am unsuccessful. I have tried to replicate both of the steps shown above, but for the negative binomial. The first step was coded as:
multiple_regression_negative_binomial_dgp <-
function(predictor1, predictor2, alpha_mean, alpha_sd,
beta_predictor1_mean, beta_predictor1_sd,
beta_predictor2_mean, beta_predictor2_sd, phi_mean,
phi_sd) {
N <- length(predictor1)
alpha <- rnorm(1, mean = alpha_mean, sd = alpha_sd);
beta_predictor1 <- rnorm(1, mean = beta_predictor1_mean,
sd = beta_predictor1_sd);
beta_predictor2 <- rnorm(1, mean = beta_predictor2_mean,
sd = beta_super_sd);
phi <- rnorm(1, mean = phi_mean, sd = phi_sd);
outcome<- rnbinom(N, size = mu + mu^2/phi, mu = alpha +
beta_predictor1 * predictor1 + beta_predictor2 *
predictor2)
return(outcome)
}
Because there is a phi in the negative binomial, and given that it will be a parameter whose prior I will be calculating, I assumed it needed to be added to the equation. Additionally, given the documentation for rnbinom(), I thought i could treat mu as I treated lambda in the Poisson generation, feeding the regression equation onto it.
The function is likely inadequate, but after I create it and move onto the second step, the errors emerge. The second step I coded as:
multiple_regression_negative_binomial_dgp(dataset$predictor1,
dataset$predictor2,
alpha_mean = 1,
alpha_sd = 0.5,
beta_predictor1_mean = -0.25,
beta_predictor1_sd = 0.5,
beta_predictor2_mean = 0,
beta_predictor2_sd = 1,
phi_mean = 0,
phi_sd = 1)
However, as soon as I try to run this data generating process, I get the warning stating:
Error in rnbinom(N, size = mu, mu = alpha + beta_predictor1 * predictor1 + beta_predictor2 * predictor2 : object 'mu' not found
Any help would be much appreciated, I realize that I am applying a more mechanistic mindset in trying to replicate the Poisson data generating process for the negative binomial one, but I have been unable to find any clues as to how to solve this. Most examples I came across define a value for mu and for size, instead of 'feeding' it the formula.
In your multiple_regression_negative_binomial_dgp function, you call rnbinom. That function needs a size argument, and you assign mu + mu^2/phi to it, but mu is not defined within the function, nor passed to it. The fact that rnbinom contains a mu argument, which you do provide (alpha + beta_predictor1 * predictor1 + beta_predictor2 * predictor2), doesn't take care of it, because rnbinom doesn't pass that information over to size. I would suggest you try:
multiple_regression_negative_binomial_dgp <- function(predictor1, predictor2,
alpha_mean, alpha_sd, beta_predictor1_mean,
beta_predictor1_sd, beta_predictor2_mean,
beta_predictor2_sd, phi_mean, phi_sd) {
N <- length(predictor1)
alpha <- rnorm(1, mean=alpha_mean, sd=alpha_sd)
beta_predictor1 <- rnorm(1, mean=beta_predictor1_mean, sd=beta_predictor1_sd)
beta_predictor2 <- rnorm(1, mean=beta_predictor2_mean, sd=beta_super_sd);
phi <- rnorm(1, mean=phi_mean, sd=phi_sd)
Mu <- alpha + beta_predictor1*predictor1 + beta_predictor2*predictor2
outcome <- rnbinom(N, size=Mu + Mu^2/phi, mu=Mu)
return(outcome)
}

What is causing this rjags error: dimension mismatch?

I am experiencing issues with running the following time-series JAGS model in R:
data(lynx)
y <- as.vector(lynx)
y
x <- 1:length(y)
library(rjags)
mod <- "model {
alpha ~ dnorm(0, 0.0001)
beta ~ dnorm(0, 0.0001)
lambda ~ dgamma(1, 1)
for (i in 2:length(y)) {
y[i] ~ dpois(lambda[i])
lambda[i] <- alpha + beta * x[i - 1]
}
}"
mod <- textConnection(mod)
samples <- jags.model(mod, data = list('x' = x, 'y' = y), n.chains = 3) #
# Error in jags.model(mod, data = list(x = x, y = y), n.chains = 3) :
# RUNTIME ERROR:
# Cannot insert node into lambda[1:114]. Dimension mismatch
Is someone able to explain what the above error is referring to and how to fix it?
lambda is written as the rate term of the Poisson distribution in your loop but then you specify it as a gamma distribution in your priors. This is causing a dimension mismatch. On top of this, you need to use the appropriate link function for the Poisson distribution.
mod <- "model {
alpha ~ dnorm(0, 0.0001)
beta ~ dnorm(0, 0.0001)
for (i in 2:length(y)) {
y[i] ~ dpois(lambda[i])
log(lambda[i]) <- alpha + beta * x[i - 1]
}
}"
mod <- textConnection(mod)
# create model object
model_fit <- jags.model(mod, data = list('x' = x, 'y' = y), n.chains = 3)
# collect samples
samples <- coda.samples(model_fit, c("alpha", "beta"), n.iter = 10000)

Stan Model Posterior Predictions Outside Possible Range of Data

I am having a lot of fun right now learning the ropes of modeling in Stan. Right now I'm wrestling with my model of a mixed between- and within-subjects factorial experimental design. There are different groups of subjects, Each subject indicates how much they expect each of three different beverages (water, decaf, and coffee) to reduce their caffeine withdrawal. The outcome variable - expectancy of withdrawal reduction - was measured via a Visual Analog Scale from 0 - 10 with 0 indicating no expectation of withdrawal reduction and 10 indicating a very high expectation of withdrawal reduction. I want to test if there are between-group differences in the amount of expected withdrawal-reduction potential of the three different beverages.
Here is the data
df <- data.frame(id = rep(1:46, each = 3),
group = c(3,3,3,1,1,1,3,3,3,1,1,1,3,3,3,1,1,1,3,3,3,2,2,2,1,1,1,3,3,3,3,3,3,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,2,2,2,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,2,2,2,3,3,3,3,3,3,2,2,2,3,3,3,3,3,3,1,1,1,3,3,3,3,3,3,1,1,1,2,2,2,2,2,2,1,1,1,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,1,1,1,1,3,3,3,1,1,1,3,3,3),
bevType = rep(c(3,2,1), times = 46),
score = c(2.9,1.0,0.0,9.5,5.0,4.5,9.0,3.0,5.0,5.0,0.0,3.0,9.5,2.0,3.0,8.5,0.0,6.0,5.2,3.0,4.0,8.4,7.0,2.0,10.0,0.0,3.0,7.3,1.0,1.8,8.5,2.0,9.0,10.0,5.0,10.0,8.3,2.0,5.0,6.0,0.0,5.0,6.0,0.0,5.0,10.0,0.0,5.0,6.8,1.0,4.8,8.0,1.0,4.0,7.0,4.0,6.0,6.5,1.0,3.1,9.0,1.0,0.0,6.0,0.0,2.0,9.5,4.0,6.0,8.0,1.0,3.8,0.4,0.0,7.0,7.0,0.0,3.0,9.0,2.0,5.0,9.5,2.0,7.0,7.9,5.0,4.9,8.0,1.0,1.0,9.3,5.0,7.9,6.5,2.0,3.0,8.0,2.0,6.0,10.0,0.0,5.0,6.0,0.0,5.0,6.8,0.1,7.0,8.0,3.0,9.1,8.2,0.0,7.9,8.2,5.0,0.0,9.2,1.0,3.1,9.1,3.0,0.6,5.7,2.0,5.1,7.0,0.0,7.4,8.0,1.0,1.5,9.1,4.0,4.3,8.5,8.0,5.0))
Now for the model. The model has a grand mean parameter a, a categorical predictor representing groups deflections from the grand mean bGroup, a term for deflections of the different beverage types from the grand mean bBev, a term for each subject's intercept bSubj, and a term for the group by beverage interaction bGxB. I also estimated separate noise parameters for each beverage type.
To allow posterior predictive checks I drew from the joint posterior using the generated quantities block and the normal_rng function.
### Step 1: Put data into list
dList <- list(N = 138,
nSubj = 46,
nGroup = 3,
nBev = 3,
sIndex = df$id,
gIndex = df$group,
bIndex = df$bevType,
score = df$score,
gMean = 4.718841,
gSD = 3.17)
#### Step 1 model
write("
data{
int<lower=1> N;
int<lower=1> nSubj;
int<lower=1> nGroup;
int<lower=1> nBev;
int<lower=1,upper=nSubj> sIndex[N];
int<lower=1,upper=nGroup> gIndex[N];
int<lower=1,upper=nBev> bIndex[N];
real score[N];
real gMean;
real gSD;
}
parameters{
real a;
vector[nSubj] bSubj;
vector[nGroup] bGroup;
vector[nBev] bBev;
vector[nBev] bGxB[nGroup]; // vector of vectors, stan no good with matrix
vector[nBev] sigma;
real<lower=0> sigma_a;
real<lower=0> sigma_s;
real<lower=0> sigma_g;
real<lower=0> sigma_b;
real<lower=0> sigma_gb;
}
model{
vector[N] mu;
//hyper-priors
sigma_s ~ normal(0,10);
sigma_g ~ normal(0,10);
sigma_b ~ normal(0,10);
sigma_gb ~ normal(0,10);
//priors
sigma ~ cauchy(0,1);
a ~ normal(gMean, gSD);
bSubj ~ normal(0, sigma_s);
bGroup ~ normal(0,sigma_g);
bBev ~ normal(0,sigma_b);
for (i in 1:nGroup) { //hierarchical prior on interaction
bGxB[i] ~ normal(0, sigma_gb);
}
// likelihood
for (i in 1:N){
score[i] ~ normal(a + bGroup[gIndex[i]] + bBev[bIndex[i]] + bSubj[sIndex[i]] + bGxB[gIndex[i]][bIndex[i]], sigma[bIndex[i]]);
}
}
generated quantities{
real y_draw[N];
for (i in 1:N) {
y_draw[i] = normal_rng(a + bGroup[gIndex[i]] + bBev[bIndex[i]] + bSubj[sIndex[i]] + bGxB[gIndex[i]][bIndex[i]], sigma[bIndex[i]]);
}
}
", file = "temp.stan")
##### Step 3: generate the chains
mod <- stan(file = "temp.stan",
data = dList,
iter = 5000,
warmup = 3000,
cores = 1,
chains = 1)
Next we extract the draws from the joint posterior, and generate estimates of the group mean, upper and lower 95% HPDI. First we need a function to calculate the HPDI
HPDIFunct <- function (vector) {
sortVec <- sort(vector)
ninetyFiveVec <- ceiling(.95*length(sortVec))
fiveVec <- length(sortVec) - length(ninetyFiveVec)
diffVec <- sapply(1:fiveVec, function (i) sortVec[i + ninetyFiveVec] - sortVec[i])
minVal <- sortVec[which.min(diffVec)]
maxVal <- sortVec[which.min(diffVec) + ninetyFiveVec]
return(list(sortVec, minVal, maxVal))
}
Now to extract the draws from the posterior
#### Step 5: Posterior predictive checks
y_draw <- data.frame(extract(mod, pars = "y_draw"))
And plot the mean, lower HPDI and upper HPDI draws of these draws against the actual data.
df$drawMean <- apply(y_draw, 2, mean)
df$HPDI_Low <- apply(y_draw, 2, function(i) HPDIFunct(i)[[2]][1])
df$HPDI_Hi <- apply(y_draw, 2, function(i) HPDIFunct(i)[[3]][1])
### Step 6: plot posterior draws against actual data
ggplot(df, aes(x = factor(bevType), colour = factor(group))) +
geom_jitter(aes(y = score), shape = 1, position = position_dodge(width=0.9)) +
geom_point(aes(y = drawMean), position = position_dodge(width=0.9), stat = "summary", fun.y = "mean", shape = 3, size = 3, stroke = 2) +
geom_point(aes(y = HPDI_Low), position = position_dodge(width=0.9), stat = "summary", fun.y = "mean", shape = 1, size = 3, stroke = 1) +
geom_point(aes(y = HPDI_Hi), position = position_dodge(width=0.9), stat = "summary", fun.y = "mean", shape = 1, size = 3, stroke = 1) +
scale_colour_manual(name = "Experimental Group", labels = c("Group 1", "Group 2", "Group 3"), values = c("#616a6b", "#00AFBB", "#E7B800")) +
scale_x_discrete(labels = c("Water", "Decaf", "Coffee")) +
labs(x = "Beverage Type", y = "Expectancy of Withdrawal Alleviation") +
scale_y_continuous(breaks = seq(0,10,2)) +
theme(axis.text.x = element_text(size = 12),
axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold"),
axis.text.y = element_text(size = 12),
legend.title = element_text(size = 13, face = "bold"))
Looking at the graph, for Water expectancies the model seems to represent the centre (crosses) and spread (open circles) of the data quite well. But this breaks down for the Decaf and Coffee expectancies. For Decaf expectancies the lower HPDI is below the range of possible values (lower limit = 0) and the spread of the draws from the posterior (represented in each group by the open circles) is too large. The Coffee group's upper HPDI limit is also above the range of the data (upper limit = 10) and the spread is too large for the actual data.
So my question is:
How do I constrain the draws from the joint posterior to the actual range of the data?
Is there some sort of brute-force way to constrain the draws from the posterior in Stan? Or would a more adaptable estimation of differences in the variance across the three beverage conditions be more effective (in which case this would be more of a CV question than a SO question)?
The standard way to constrain a posterior variable is to use a link function to transform it. That's the way generalized linear models (GLMs) like logistic regression and Poisson regression work. For example, to go from positive ot unconstrained, we use a log transform. To go from a probability in (0, 1) to unconstrained, we use a log odds transform.
If your outcomes are ordinal values on a 1-10 scale, a common approach that respects that data scale is ordinal logistic regression.
To expand on #Bob Carpenter's answer, here are two ways you could approach the problem. (I've had cause to use both of these recently and struggled to get them up and running. This may be useful to other beginners like me.)
Method 1: Ordered Logistic Regression
We're going to assume that each user has a "true" expectancy for each response, which is on an arbitrary continuous scale, and model it as a latent variable. If the user's actual responses fall into K categories, we also model K - 1 cutpoints between those categories. The probability that the user selects a given response category is equal to the area under the logistic pdf between the relevant cutpoints.
The Stan model looks like this. The main difference is that the model fits an additional ordered vector of cutpoints, and uses the ordered_logistic distribution. (I also changed the priors on the sigmas to Cauchy, to keep them positive, and switched to non-centered parameterization. But those changes are independent of the question at hand.) Edit: Also added inputs for new (hypothetical) observations about which we want to make predictions, and added a new generated quantity for those predictions.
data {
// the real data
int<lower=1> N;
int<lower=1> nSubj;
int<lower=1> nGroup;
int<lower=1> nBev;
int minResponse;
int maxResponse;
int<lower=1,upper=nSubj> sIndex[N];
int<lower=1,upper=nGroup> gIndex[N];
int<lower=1,upper=nBev> bIndex[N];
int<lower=minResponse,upper=maxResponse> score[N];
// hypothetical observations for new predictions
int<lower=1> nNewPred;
int<lower=0> nNewSubj;
int<lower=0> nNewGroup;
int<lower=0> nNewBev;
int<lower=1,upper=nSubj+nNewSubj> sNewIndex[nNewPred];
int<lower=1,upper=nGroup+nNewGroup> gNewIndex[nNewPred];
int<lower=1,upper=nBev+nNewBev> bNewIndex[nNewPred];
}
parameters {
real a;
vector[nSubj] bSubj;
vector[nGroup] bGroup;
vector[nBev] bBev;
vector[nBev] bGxB[nGroup];
real<lower=0> sigma_s;
real<lower=0> sigma_g;
real<lower=0> sigma_b;
real<lower=0> sigma_gb;
ordered[maxResponse - minResponse] cutpoints;
}
model {
// hyper-priors
sigma_s ~ cauchy(0, 1);
sigma_g ~ cauchy(0, 1);
sigma_b ~ cauchy(0, 1);
sigma_gb ~ cauchy(0, 1);
// priors
a ~ std_normal();
bSubj ~ std_normal();
bGroup ~ std_normal();
bBev ~ std_normal();
for (i in 1:nGroup) {
bGxB[i] ~ std_normal();
}
// likelihood
for(i in 1:N) {
score[i] ~ ordered_logistic(a +
(bGroup[gIndex[i]] * sigma_g) +
(bBev[bIndex[i]] * sigma_b) +
(bSubj[sIndex[i]] * sigma_s) +
(bGxB[gIndex[i]][bIndex[i]] * sigma_gb),
cutpoints);
}
}
generated quantities {
real y_draw[N];
real y_new_pred[nNewPred];
vector[nGroup+nNewGroup] bNewGroup;
vector[nBev+nNewBev] bNewBev;
vector[nSubj+nNewSubj] bNewSubj;
vector[nBev+nNewBev] bNewGxB[nGroup+nNewGroup];
// generate posterior predictions for the real data
for (i in 1:N) {
y_draw[i] = ordered_logistic_rng(a +
(bGroup[gIndex[i]] * sigma_g) +
(bBev[bIndex[i]] * sigma_b) +
(bSubj[sIndex[i]] * sigma_s) +
(bGxB[gIndex[i]][bIndex[i]] * sigma_gb),
cutpoints);
}
// generate predictions for the new observations
for (i in 1:(nGroup+nNewGroup)) {
if (i <= nGroup) { bNewGroup[i] = bGroup[i]; }
else { bNewGroup[i] = normal_rng(0, 1); }
}
for (i in 1:(nBev+nNewBev)) {
if (i <= nBev) { bNewBev[i] = bBev[i]; }
else { bNewBev[i] = normal_rng(0, 1); }
}
for (i in 1:(nSubj+nNewSubj)) {
if (i <= nSubj) { bNewSubj[i] = bSubj[i]; }
else { bNewSubj[i] = normal_rng(0, 1); }
}
for (i in 1:(nBev+nNewBev)) {
for(j in 1:(nGroup+nNewGroup)) {
if (i <= nBev && j <= nGroup) { bNewGxB[i][j] = bGxB[i][j]; }
else { bNewGxB[i][j] = normal_rng(0, 1); }
}
}
for (i in 1:nNewPred) {
y_new_pred[i] = ordered_logistic_rng(a +
(bNewGroup[gNewIndex[i]] * sigma_g) +
(bNewBev[bNewIndex[i]] * sigma_b) +
(bNewSubj[sNewIndex[i]] * sigma_s) +
(bNewGxB[gNewIndex[i]][bNewIndex[i]] * sigma_gb),
cutpoints);
}
}
It looks like responses in your dataset are recorded to the nearest tenth, so that gives us 101 possible categories between 0 and 10. To keep everything as Stan-friendly integers, we can multiply all the responses by 10. (I also added one to each response because I had trouble fitting the model when one of the possible categories was zero.) Edit: Added new test data for a hypothetical "subject 47", one observation for each group/beverage.
new.pred.obs = expand.grid(group = 1:3, bevType = 2:3) %>%
mutate(id = max(df$id) + 1)
dList <- list(N = 138,
nSubj = 46,
nGroup = 3,
nBev = 3,
minResponse = 1,
maxResponse = 101,
sIndex = df$id,
gIndex = df$group,
bIndex = df$bevType,
score = (df$score * 10) + 1,
nNewPred = nrow(new.pred.obs),
nNewSubj = 1,
nNewGroup = 0,
nNewBev = 0,
sNewIndex = new.pred.obs$id,
gNewIndex = new.pred.obs$group,
bNewIndex = new.pred.obs$bevType)
After we extract y_draw, we can convert it back to the original scale:
y_draw <- (data.frame(extract(mod, pars = "y_draw")) - 1) / 10
Everything else is the same as before. Now the posterior predictions are correctly confined to [0, 10].
To draw inferences on the original scale about differences between beverages, we can use the predictions for our hypothetical data. For each sample, we have one predicted output for a new subject in each group/beverage combination. We can compare the "coffee" vs. "decaf" responses within each sample and group:
# Get predictions for hypothetical observations
new.preds.df = data.frame(rstan::extract(mod, pars = "y_new_pred")) %>%
rownames_to_column("sample") %>%
gather(obs, pred, -sample) %>%
mutate(obs = gsub("y_new_pred\\.", "", obs),
pred = (pred - 1) / 10) %>%
inner_join(new.pred.obs %>%
rownames_to_column("obs") %>%
mutate(bevType = paste("bev", bevType, sep = ""),
group = paste("Group", group)),
by = c("obs")) %>%
select(-obs) %>%
spread(bevType, pred) %>%
mutate(bevTypeDiff = bev3 - bev2)
(Alternatively, we could have done this prediction for new observations in R, or in a separate Stan model; see here for examples of how this could be done.)
Method 2: Beta Regression
Once we get up to 101 response categories, calling these possibilities discrete categories seems a little strange. It feels more natural to say, as your original model tried to do, that we're capturing a continuous outcome that happens to be bounded between 0 and 10. Also, in ordered logistic regression, the response categories don't have to be regularly spaced with respect to the latent variable. (This is a feature, not a bug; for example, for Likert responses, there's no guarantee that the difference between "Strongly agree" and "Agree" is the same as the difference between "Agree" and "Neither agree not disagree".) as a result, it's difficult to say anything about the "distance" a particular factor causes a response to move on the original scale (as opposed to the scale of the latent variable). But the cutpoints inferred by the model above are pretty regularly spaced, which again suggests that the outcome in your dataset is already reasonably scale-like:
# Get the sampled parameters
sampled.params.df = data.frame(as.array(mod)[,1,]) %>%
select(-matches("y_draw")) %>%
rownames_to_column("iteration")
# Plot selected cutpoints
sampled.params.df %>%
select(matches("cutpoints")) %>%
gather(cutpoint, value) %>%
mutate(cutpoint.num = as.numeric(gsub("^cutpoints\\.([0-9]+)\\.$", "\\1", cutpoint))) %>%
group_by(cutpoint.num) %>%
summarize(mean.value = mean(value),
lower.95 = quantile(value, 0.025),
lower.50 = quantile(value, 0.25),
upper.50 = quantile(value, 0.75),
upper.95 = quantile(value, .975)) %>%
ggplot(aes(x = cutpoint.num, y = mean.value)) +
geom_point(size = 3) +
geom_linerange(aes(ymin = lower.95, ymax = upper.95)) +
geom_linerange(aes(ymin = lower.50, ymax = upper.50), size = 2) +
scale_x_continuous("cutpoint", breaks = seq(0, 100, 10)) +
scale_y_continuous("") +
theme_bw()
(Thick and thin lines represent 50% and 95% intervals, respectively. I'm enjoying the little "jump" every 10 cutpoints, which suggests subjects treated, say, 5.9 vs. 6.0 as a larger difference than 5.8 vs. 5.9. But the effect seems to be quite mild. The scale also seems to stretch out a bit towards the high end, but again, it's not too drastic.)
For a continuous outcome in a bounded interval, we can use the beta distribution; see here and here for further discussion.
For the beta distribution, we need two parameters, mu and phi, both of which must be positive. In this example, I allowed mu to be unbounded and applied inv_logit before feeding it into the beta distribution; I constrained phi to be positive and gave it a Cauchy prior. But you could do it in any number of ways. I also coded a full set of mu parameters but only a single phi; again, you can experiment with other options.
data {
int<lower=1> N;
int<lower=1> nSubj;
int<lower=1> nGroup;
int<lower=1> nBev;
int<lower=1,upper=nSubj> sIndex[N];
int<lower=1,upper=nGroup> gIndex[N];
int<lower=1,upper=nBev> bIndex[N];
vector<lower=0,upper=1>[N] score;
}
parameters {
real a;
real a_phi;
vector[nSubj] bSubj;
vector[nGroup] bGroup;
vector[nBev] bBev;
vector[nBev] bGxB[nGroup];
real<lower=0> sigma_s;
real<lower=0> sigma_g;
real<lower=0> sigma_b;
real<lower=0> sigma_gb;
}
model {
vector[N] mu;
//hyper-priors
sigma_s ~ cauchy(0, 1);
sigma_g ~ cauchy(0, 1);
sigma_b ~ cauchy(0, 1);
sigma_gb ~ cauchy(0, 1);
//priors
a ~ std_normal();
a_phi ~ cauchy(0, 1);
bSubj ~ std_normal();
bGroup ~ std_normal();
bBev ~ std_normal();
for (i in 1:nGroup) {
bGxB[i] ~ std_normal();
}
// likelihood
for(i in 1:N) {
mu[i] = a +
(bGroup[gIndex[i]] * sigma_g) +
(bBev[bIndex[i]] * sigma_b) +
(bSubj[sIndex[i]] * sigma_s) +
(bGxB[gIndex[i]][bIndex[i]] * sigma_gb);
score[i] ~ beta(inv_logit(mu[i]) .* a_phi,
(1 - inv_logit(mu[i])) .* a_phi);
}
}
generated quantities {
real y_draw[N];
real temp_mu;
for (i in 1:N) {
temp_mu = a +
(bGroup[gIndex[i]] * sigma_g) +
(bBev[bIndex[i]] * sigma_b) +
(bSubj[sIndex[i]] * sigma_s) +
(bGxB[gIndex[i]][bIndex[i]] * sigma_gb);
y_draw[i] = beta_rng(inv_logit(temp_mu) .* a_phi,
(1 - inv_logit(temp_mu)) .* a_phi);
}
}
The beta distribution is supported on (0, 1), so we divide the observed scores by 10. (The model also fails if we give it scores of exactly 0 or 1, so I converted all such scores to 0.01 and 0.99, respectively.)
dList.beta <- list(N = 138,
nSubj = 46,
nGroup = 3,
nBev = 3,
sIndex = df$id,
gIndex = df$group,
bIndex = df$bevType,
score = ifelse(df$score == 0, 0.01,
ifelse(df$score == 10, 0.99,
df$score / 10)))
Undo the transformation when extracting y_draw, and then the procedure is the same as before.
y_draw.beta <- data.frame(extract(mod.beta, pars = "y_draw")) * 10
Once again, the posterior draws are correctly bounded.

Simulating datasets in R for model selection

I made a code to simulate a dataset in R to see how backward selection works in machine learning. And I generated poly() function to write polynomial function and then wanted to choose the suitable polynomial using Cp, BIC, adjusted R^2.
The code is:
###Generating dataset
set.seed(1)
X = rnorm(100)
eps = rnorm(100)
beta0 = 3
beta1 = 2
beta2 = -3
beta3 = 0.3
Y = beta0 + beta1 * X + beta2 * X^2 + beta3 * X^3 + eps
library(leaps)
data.full = data.frame(y = Y, x = X)
mod.full = regsubsets(y ~ poly(x, 10, raw = T), data = data.full, nvmax = 10)
mod.summary = summary(mod.full)
### Find the model size for best cp, BIC and adjr2
which.min(mod.summary$cp)
For cp, BIC and adjusted R^2 I get model with polynomial 3 as it should be
However, now I want to simulate 100 datasets and see in how many datasets do I get the right model. I simulated 100 datasets but now I am not getting polynomial 3 for each of the measures. And I don't quite understand what I'm doing wrong. My code for simulation is:
###Generating 100 datasets
data <- replicate(100, rnorm(n=100))
epsilon <- replicate(100,rnorm(n=100))
###Formula (same as before)
Y = beta0 + beta1 * data + beta2 * data^2 + beta3 * data^3 + epsilon
data.full = data.frame(y = Y, x = data)
###Using polynomial terms
mod.bwd = regsubsets(data.full$y.1 ~ poly(data.full$x.1, 10, raw = T), data = data.full, nvmax = 10,
method = "backward")
bwd.summary = summary(mod.bwd)
which.min(bwd.summary$cp)
which.min(bwd.summary$bic)
which.max(bwd.summary$adjr2)
For a given subset cp, Bic, adjr2 are giving me different results. For example, using y.1 and x.1 (first dataset in simulation) gives following results:
which.min(bwd.summary$cp): 7
which.min(bwd.summary$bic): 4
which.max(bwd.summary$adjr2): 9
Can someone help me what I'm doing wrong in simulating these 100 datasets.
If I've read your code correctly you run the model on the same simulated dataset 100 times instead of all 100 simulated datasets, this should do the trick:
set.seed(42)
###Generating 100 datasets
data <- replicate(100, rnorm(n=100))
epsilon <- replicate(100,rnorm(n=100))
###Formula (same as before)
Y = beta0 + beta1 * data + beta2 * data^2 + beta3 * data^3 + epsilon
data.full = data.frame(y = Y, x = data)
res <- lapply(1:100, function(i){
###Using polynomial terms
mod.bwd = regsubsets(data.full[[i]] ~ poly(data.full[[100+i]], 10, raw = T), data = data.full, nvmax = 10,
method = "backward")
bwd.summary = summary(mod.bwd)
c(
which.min(bwd.summary$cp),
which.min(bwd.summary$bic),
which.max(bwd.summary$adjr2)
)
})
res <- do.call(rbind, res)
With this rng-seed this gives some lines where all cirteria select the correct model.

How to define function arguments based on data.frame columns (R)?

I have a script that runs maximum likelihood estimation for a linear model. The model has several variables and I need to vary them occasionally, maybe add or drop some. The usual way to define the likelihood function is like this:
LL <- function(beta0, beta1, beta2, mu, sigma){
R = y - beta0*X$x0 + beta1*X$x1 + beta2*X$x2
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
I have dependent variable in vector y and covariates in data.frame X:
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100)*2)
y <- X$x0 + X$x1 + X$x2 + rnorm(100)
Now the amount of variables is subject to change by application and I need to reformulate the function so that it will take as many covariates as there are columns in the data.frame X. I was already able to reformulate this to a more general form:
cols <- 0:(ncol(X)-1)
betas <- paste0("beta", cols)
eqR <- paste0("y - ", paste0(betas, "*X$x", cols, collapse = " - "))
LL <- function(beta0, beta1, beta2, mu, sigma){
R = as.formula(eqR)
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
I'm still struggling to find a way to dynamically define the function so that it would take the same number of beta arguments as there are columns in the covariate matrix. Ellipsis is perhaps useful here? I also tried with do.call:
LL <- function(betas, mu, sigma){
R <- do.call(dnorm(as.formula(eqR), mu, sigma, log = T), betas)
-sum(R)
}
That doesn't work when you fit the model, which has another stumbling block in the list of initial values:
require(stats4)
fit <- mle(LL, start = list(beta0 = 0, beta1 = 0, beta2 = 0, mu = 0, sigma = 1))
Any ideas for this?
EDIT:
I made some advance with bbmle package:
require(bbmle)
dfModel <- cbind(y, X)
cols <- 0:(ncol(X)-1)
betas <-paste0("beta",cols)
betaList <- as.list(rep(0), length(betas)))
names(betaList) <- betas
initList <- c(betaList, mu = 0, sigma = 1)
fitML <- mle2(mu ~ dnorm(mean = y - beta0*x0 - beta1*x1 - beta2*x2, sd = sigma),
start = initList,
data = dfModel)
The above example works. But when I try to define the function beforehand with as.formula, I can't get it working. So the following does not work.
eqR <- paste0("y - ", paste0(betas, "*x", cols, collapse = " - "))
fitML <- mle2(mu ~ dnorm(mean = as.formula(eqR), sd = sigma),
start = initList,
data = dfModel)
The error message is:
Error in eval(expr, envir, enclos) : object 'beta0' not found
I suspect that this might have something to do with scoping - conflict between dnorm and as.formula? I just can't find workaround for that.
Try this:
betas = c(0,0,0)
X <- data.frame(x0 = 1, x1 = runif(100), x2 = runif(100)*2)
y <- apply(X,1,sum) + rnorm(100)
where betas is (b0, b1, b2, ...etc) and its length must be equal to the number of columns of X.
Since X could have a different number of columns y should be defined as above.
Your LL function should change to:
LL <- function(betas, mu, sigma){
R = y - as.matrix(X) %*% as.matrix(betas)
R = dnorm(R, mu, sigma, log = T)
-sum(R)
}
where %*% is the matrix product. This is the same as doing b[1]*X[,1] + b[2]*X[,2] + b[3]*X[,3] + ... + b[n]*X[,n]
With these changes, you could have data frame X with any number of columns, betas an array of the same length as columns of X.
I hope I understood what you needed.

Resources