I'm trying to implement a hierarchical mixture model in Stan that describes how performance on a task changes over time. In the model (code below), three lower-level parameters (dperf_int, dperf_sd, and sf) are assumed to be drawn from a mixture of two normals. To implement these, I've made the prior on each of these lower-level parameters a mixture. Two further parameters (perf1_int and perf1_sd) are estimated only at the group level; for these, I've assumed that the observation itself is drawn from a mixture of normals.
data {
  int<lower=0> Nsubj;        // Number of subjects (60)
  int<lower=0> Nobs;         // Number of observations per subject (10)
  matrix[Nsubj, Nobs] perf;  // Matrix of performance observations
}
parameters {
  vector<lower=0, upper=1>[Nsubj] mixing_prop;  // mixing proportion for each subject
  ordered[2] perf1_int;             // mean performance at time 1 (estimated at group level only)
  vector<lower=0>[2] perf1_sd;      // sd of performance at time 1 (estimated at group level only)
  ordered[2] dperf_int_mean;        // mean of performance change intercept for each mixture
  vector<lower=0>[2] dperf_int_sd;  // sd of performance change intercept for each mixture
  vector<lower=0>[2] dperf_sd_mean; // mean of performance change sd
  vector<lower=0>[2] dperf_sd_sd;   // sd of performance change sd
  vector[2] sf_mean;                // mean self-feedback for each mixture
  vector<lower=0>[2] sf_sd;         // sd of self-feedback for each mixture
  // subject-level parameters
  vector[Nsubj] dperf_int;
  vector<lower=0>[Nsubj] dperf_sd;
  vector[Nsubj] sf;
}
model {
  real perf_change;
  for (i in 1:Nsubj) {
    vector[4] increments;
    increments[1] = log_mix(mixing_prop[i],
                            normal_lpdf(dperf_int[i] | dperf_int_mean[1], dperf_int_sd[1]),
                            normal_lpdf(dperf_int[i] | dperf_int_mean[2], dperf_int_sd[2]));
    increments[2] = log_mix(mixing_prop[i],
                            normal_lpdf(dperf_sd[i] | dperf_sd_mean[1], dperf_sd_sd[1]),
                            normal_lpdf(dperf_sd[i] | dperf_sd_mean[2], dperf_sd_sd[2]));
    increments[3] = log_mix(mixing_prop[i],
                            normal_lpdf(sf[i] | sf_mean[1], sf_sd[1]),
                            normal_lpdf(sf[i] | sf_mean[2], sf_sd[2]));
    increments[4] = log_mix(mixing_prop[i],
                            normal_lpdf(perf[i, 1] | perf1_int[1], perf1_sd[1]),
                            normal_lpdf(perf[i, 1] | perf1_int[2], perf1_sd[2]));
    target += log_sum_exp(increments);
    for (j in 2:Nobs) {
      perf_change = dperf_int[i] + perf[i, j-1] * sf[i];
      perf[i, j] ~ normal(perf[i, j-1] + perf_change, dperf_sd[i]);
    }
  }
}
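To check what each increments[k] term computes, here is a minimal Python sketch of log_mix as the Stan Functions Reference defines it, log_mix(theta, lp1, lp2) = log(theta * exp(lp1) + (1 - theta) * exp(lp2)), evaluated stably on the log scale. The helper names and the component values in the usage lines are illustrative only, and this normal_lpdf keeps the normalizing constant.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y, including the constant term."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_mix(theta, lp1, lp2):
    """log(theta*exp(lp1) + (1-theta)*exp(lp2)) via log-sum-exp; assumes 0 < theta < 1."""
    a = math.log(theta) + lp1
    b = math.log1p(-theta) + lp2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# Mixture prior log density of one subject-level value under two hypothetical components
lp = log_mix(0.3,
             normal_lpdf(0.5, 0.0, 1.0),
             normal_lpdf(0.5, 2.0, 0.5))
print(lp)
```

Each log_mix call scores one subject-level value under its own two-component mixture, weighted by that subject's mixing_prop[i].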
I'm pretty sure I've implemented the mixture of the group-level parameters correctly, as I've just followed the Stan manual. However, I wanted to check that I've implemented the hierarchical mixture components of the model correctly. The model is quite inefficient: I've run 4 chains of 4000 iterations, and many of the parameters have n_eff < 10, so I suspect I'm missing something.
Here is the code used to run the model:
library(tidyverse)
library(rstan)
# load data
load(file = "wide_data.RData")
# prepare data for Stan
model_data = list(
  Nsubj = dim(wide_data)[1],
  Nobs  = 10,
  perf  = as.matrix(select(wide_data, perf1:perf10))
)
fit = stan(file = "stan_univariate_hierarchical_mixture.stan",
           data = model_data, cores = 4,
           thin = 10,
           warmup = 1000,
           chains = 4,
           iter = 4000)
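For context on the n_eff numbers, a quick sketch of how many draws these settings keep, assuming rstan's convention that iter includes warmup and that thinning is applied to the post-warmup iterations:

```python
def saved_draws(iter_, warmup, thin, chains):
    """Post-warmup draws kept per chain and in total (assumes (iter_ - warmup) divides evenly by thin)."""
    per_chain = (iter_ - warmup) // thin
    return per_chain, per_chain * chains

per_chain, total = saved_draws(iter_=4000, warmup=1000, thin=10, chains=4)
print(per_chain, total)  # 300 1200
```

So n_eff < 10 means fewer than 10 effectively independent draws out of the 1,200 saved ones.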
...and the data is here: https://www.dropbox.com/s/eqzw1lou6uba8i3/wide_data.RData?dl=0
Any help would be much appreciated!
I'm running a meta-analysis where I'm interested in the effect of X on the effect of age on habitat use (raw mean values and variances) using the metafor package.
An example of one of my models is:
mod6 <-
  rma.mv(
    yi = Used_value,
    V = Used_variance,
    slab = Citation,
    mods = ~ Age + poly(Slope, degree = 2),
    random = ~ 1 | Region,
    data = vel.focal,
    method = "ML"
  )
My justification for not using Citation as a random effect is that using only Region accounts for more of the heterogeneity than when random = list( ~ 1 | Citation/ID, ~ 1 | Region) or when Citation/ID is used by itself.
What I need as output is the predicted value for each age by region, but predict() (and the associated forest plot) returns a prediction for each row, as if each row in the data were a unique study. That is not the case here: each study contributes several rows, because my input values are split by age and season.
predict(mod6)
                                    pred      se   ci.lb   ci.ub   pi.lb   pi.ub
Riehle and Griffith 1993.1        9.3437  2.3588  4.7205 13.9668  0.2362 18.4511
Riehle and Griffith 1993.2        9.3437  2.3588  4.7205 13.9668  0.2362 18.4511
Riehle and Griffith 1993.3        9.3437  2.3588  4.7205 13.9668  0.2362 18.4511
Spina 2000.1                      8.7706  2.7386  3.4030 14.1382 -0.7364 18.2776
Spina 2000.2                      8.5407  2.7339  3.1824 13.8991 -0.9611 18.0426
Spina 2000.3                      8.5584  2.7406  3.1868 13.9299 -0.9509 18.0676
Vondracek and Longanecker 1993.1 12.6116  2.5138  7.6847 17.5385  3.3462 21.8769
Vondracek and Longanecker 1993.2 12.6116  2.5138  7.6847 17.5385  3.3462 21.8769
Vondracek and Longanecker 1993.3 12.3817  2.5327  7.4176 17.3458  3.0965 21.6669
Vondracek and Longanecker 1993.4 12.3817  2.5327  7.4176 17.3458  3.0965 21.6669
Does anybody know a way to modify the arguments inside predict() to tell it how you want your predictions output or to tell it that there are multiple rows per slab?
You need to use the newmods argument to specify the values for Age for which you want predicted values. You will have to plug in something for the linear and quadratic terms for the Slope variable as well (e.g., holding Slope constant at its mean and hence the quadratic term will just be the mean squared). Region is not a fixed effect, so it is not relevant if you want to compute predicted values based on the fixed effects. If you want to compute BLUPs for those random effects, you can do so with ranef(). One can then combine the predictions based on the fixed effects with the BLUPs. That would be the general idea, but implementing this will require a bit of programming.
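A minimal numpy sketch of that general idea: build one design row per age with Slope held at its mean, multiply by the fixed-effect coefficients, then add each region's BLUP (what ranef() would supply). All coefficients, BLUPs and the function name are hypothetical placeholders, not values from mod6. It also assumes a numeric Age and a raw quadratic in Slope (as with poly(..., raw = TRUE)); with poly()'s default orthogonal polynomials, the newmods values would need the same orthogonal transformation rather than the plain mean and mean squared.

```python
import numpy as np

def predict_by_region(beta, ages, slope_mean, blups):
    """Fixed-effect prediction per age (Slope at its mean) plus each region's BLUP.

    Columns of the design matrix: intercept, Age, Slope, Slope^2 (raw quadratic, assumed).
    """
    ages = np.asarray(ages, dtype=float)
    X_new = np.column_stack([
        np.ones_like(ages),                 # intercept
        ages,                               # Age
        np.full_like(ages, slope_mean),     # Slope held at its mean
        np.full_like(ages, slope_mean**2),  # quadratic term = mean squared
    ])
    fixed = X_new @ np.asarray(beta, dtype=float)
    return fixed, {region: fixed + u for region, u in blups.items()}

# Hypothetical coefficients and BLUPs, purely for illustration
fixed, by_region = predict_by_region(beta=[10.0, -0.5, 0.2, -0.01], ages=[0, 1, 2],
                                     slope_mean=12.0, blups={"North": 1.5, "South": -0.8})
print(fixed)
print(by_region)
```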
Question: Have a look at data set Two.csv. It contains a potentially dependent binary variable Y and two potentially independent variables {X1, X2} for each unit of measurement.
(a) Read data set Two.csv into R and have a look at the format of the dependent variable. Discuss three models which might be appropriate in this data situation. Discuss which aspects speak in favor of each model, and which aspects against.
(b) Suppose variable Y measures financial ratings A: y = 1, B: y = 2, and C: y = 3, that is, the creditworthiness (A: high, B: intermediate, C: low) of firm i, the unit of measurement. Model Y by means of an ordered Logit model as a function of {X1, X2} and estimate your model by means of a built-in command.
(c) Explain the proportional odds assumption and test whether the assumption is critical in the context of the data set at hand.
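For part (c), a minimal sketch of what the proportional odds (PO) assumption means, written with the latent-variable convention P(Y <= j | x) = logistic(theta_j - x'beta); packages differ in the sign convention for x'beta, so check each one's documentation. Under PO, one slope vector beta is shared by every cumulative logit, so a change in x shifts the log-odds of P(Y <= j) by the same amount for every j. The thresholds and coefficients below are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_logit_probs(x, beta, thresholds):
    """Category probabilities for an ordered logit under proportional odds.

    P(Y <= j | x) = logistic(thresholds[j] - x.beta), with one beta shared by all thresholds.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    cum = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def log_odds(p):
    return math.log(p / (1.0 - p))

# Hypothetical thresholds and slopes for three ordered categories (A, B, C)
thresholds = [-0.5, 1.0]
beta = [1.2, -0.4]
p0 = cumulative_logit_probs([0.0, 0.0], beta, thresholds)
p1 = cumulative_logit_probs([1.0, 0.0], beta, thresholds)  # x1 increased by one unit

# Under PO, the shift in the cumulative log-odds is -beta[0] for every threshold j
for j in range(len(thresholds)):
    c0, c1 = sum(p0[: j + 1]), sum(p1[: j + 1])
    print(j + 1, log_odds(c1) - log_odds(c0))
```

Dropping PO (VGAM's parallel = FALSE) lets each cumulative logit have its own slopes, and that is the larger model the test in part (c) compares against.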
##a) Read data set Two.csv into R and have a look at the format of the dependent variable.
O <- read.table("C:/Users/DELL/Downloads/ExamQEIII2021/Two.csv",header=TRUE,sep=";")
str(O)
dim(O)
View(O)
##b)
library(oglmx)
ologit <- oglmx(y ~ x1 + x2, data = O, link = "logit", constantMEAN = FALSE, constantSD = FALSE,
                delta = 0, threshparam = NULL)
results.ologis <- ologit.reg(y ~ x1 + x2, data = O)
summary(results.ologis)
## x1  1.46251
## x2 -0.45391
margins.oglmx(results.ologis, ascontinuous = FALSE) # built-in command for the average marginal effects (AME) of the logit
##c) Explain the proportional odds assumption and test whether the assumption is critical
# in the context of the data set at hand.
# ordinal logit WITH proportional odds (PO)
library(VGAM)
a <- vglm(y ~ x1 + x2, family = cumulative(parallel = TRUE), data = O)
summary(a)
# ordinal logit WITHOUT proportional odds [a assumes PO, c does not]
c <- vglm(y ~ x1 + x2, family = cumulative(parallel = FALSE), data = O)
summary(c)
pchisq(deviance(a) - deviance(c), df.residual(a) - df.residual(c), lower.tail = FALSE)
## 0.4936413 ## No significant difference in deviance between the PO and non-PO models,
# so the data give no evidence against the PO assumption.
# Equivalently, as a likelihood-ratio test on the log-likelihoods
# small (restricted, PO) model
LLa <- logLik(a)
# large (unrestricted, non-PO) model
LLc <- logLik(c)
# residual degrees of freedom of each model; their difference is the df of the test
df.residual(c)
df.residual(a)
# LR statistic = 2*(LLc - LLa)
LL <- 2*(LLc - LLa)
1 - pchisq(LL, df.residual(a) - df.residual(c))
## 0.4936413 [same p-value as the deviance-based test above]
## Conclusion: the log-likelihoods do not differ significantly, so relaxing PO does not improve the fit; the PO assumption is not critical here.
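The deviance-based and log-likelihood-based tests above are the same likelihood-ratio test. A minimal Python sketch of it using scipy's chi-square survival function (chi2.sf(x, df) corresponds to R's pchisq(x, df, lower.tail = FALSE)); the log-likelihoods in the usage line are made up to show the arithmetic. With three ordered categories and two predictors, relaxing PO gives each of the two cumulative logits its own slope for x1 and x2, i.e. two extra parameters, so the test should have 2 degrees of freedom, which df.residual(a) - df.residual(c) should confirm.

```python
import math
from scipy.stats import chi2

def lr_test(loglik_restricted, loglik_full, df):
    """Likelihood-ratio test: 2*(LL_full - LL_restricted) compared with chi-square(df)."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2.sf(stat, df)

# Hypothetical log-likelihoods, only to illustrate the computation
stat, p = lr_test(loglik_restricted=-100.0, loglik_full=-99.0, df=2)
print(stat, p)
```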
UPDATE: Now with Traceplot example
UPDATE: Now with new traceplot
I am trying to adapt Outhwaite et al.'s (2018) code for occupancy modelling and have a couple of questions that I can't seem to find answers to.
Code used to create model
cat(
"model{
  ### Model ###
  # State model
  for (i in 1:nsite){
    for (t in 1:nyear){
      z[i,t] ~ dbern(psi[i,t])
      logit(psi[i,t]) <- b[t] + u[i]
    }
  }
  # Observation model
  for (j in 1:nvisit) {
    y[j] ~ dbern(Py[j] + 0.0001)
    Py[j] <- z[Site[j],Year[j]] * p[j]
    logit(p[j]) <- a[Year[j]] + c*logL[j]
  }
  ### Priors ###
  # State model priors
  for (t in 1:nyear){
    b[t] ~ dunif(-10,10)      # fixed year effect
  }
  for (i in 1:nsite) {
    u[i] ~ dnorm(0, tau.u)    # random site effect
  }
  tau.u <- 1/(sd.u * sd.u)
  sd.u ~ dunif(0, 5)          # half-uniform hyperpriors
  # Observation model priors
  for (t in 1:nyear) {
    a[t] ~ dnorm(mu.a, tau.a) # random year effect
  }
  mu.a ~ dnorm(0, 0.01)
  tau.a <- 1 / (sd.a * sd.a)
  sd.a ~ dunif(0, 5)          # half-uniform hyperpriors
  c ~ dunif(-10, 10)          # sampling effort effect
  ### Derived parameters ###
  # Finite sample occupancy - proportion of occupied sites
  for (t in 1:nyear) {
    psi.fs[t] <- sum(z[1:nsite,t])/nsite
  }
  #data# nyear, nsite, nvisit, y, logL, Site, Year
}", file="bmmodel.txt"
)
Note that dbern(Py[j]+0.0001) includes a correction factor since dbern(0) is not supported in JAGS.
I am running the model on some plant data, basically just to see whether it runs, converges, and behaves as I would expect it to.
Question 1 (ANSWERED): I am interested in the quantity psi.fs[t]. Since the model calculates this quantity as a derived parameter rather than as part of the model itself, can convergence be assessed for psi.fs[t]?
R code for running the model with R2jags:
jagsrespsi <- jags(data.list, inits = test.inits,
                   n.chains = 2, n.iter = 15000, n.thin = 3,
                   DIC = TRUE,
                   model.file = paste0(modeltype, "model.txt"), parameters.to.save = c("psi.fs"))
Question 2: When I plot the chains with traceplot(jagsrespsi), the traceplot seems all over the place, yet the Rhat in jagsrespsi$BUGSoutput is 1 for all my years, and gelman.diag(as.mcmc(jagsrespsi)) also indicates convergence. The same happens when I monitor psi.
I am puzzled by this behaviour and suspect something is wrong, but I have no idea where to look.
Yes, you can check psi.fs[] for convergence in exactly the same way as you check the convergence of the model's parameters. That's exactly what happens, for example, in a logistic regression, where the fitted probabilities of response are calculated as exp(z)/(1 + exp(z)) for some linear predictor z.
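To make that concrete: psi.fs[t] is a deterministic function of z, so every saved MCMC draw of z yields one draw of psi.fs[t], and psi.fs therefore has its own chain that can be diagnosed like any parameter. A small numpy sketch, assuming the z draws are available as a 0/1 array of shape (draws, nsite, nyear); this layout and the function name are hypothetical, and in practice you would simply monitor psi.fs in JAGS.

```python
import numpy as np

def psi_fs_draws(z_draws):
    """Finite-sample occupancy per draw: psi.fs[t] = sum(z[1:nsite, t]) / nsite.

    z_draws: 0/1 array of shape (n_draws, nsite, nyear). Returns shape (n_draws, nyear).
    """
    z = np.asarray(z_draws)
    return z.sum(axis=1) / z.shape[1]

# Two hypothetical draws of z for 4 sites and 2 years
z_draws = np.array([
    [[1, 0], [1, 1], [0, 0], [1, 0]],
    [[0, 0], [1, 1], [1, 0], [1, 1]],
])
print(psi_fs_draws(z_draws))
```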
When you say the traceplot is "all over the place", what do you mean? This could be either good or bad. Can you show an example? A "good" traceplot looks like a "fat, hairy caterpillar": consecutive samples taken from all regions of the sample space, a horizontal hair ball. Although written for SAS, this page gives a reasonable high level description of what a good trace plot looks like, and what problems might be indicated by less-than-ideal examples.
In response to your edit to include the trace plot...
That doesn't look like a particularly good traceplot to me: there seems to be some negative autocorrelation between successive samples. Have you calculated the effective sample size [ESS]?
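A rough way to compute the ESS yourself is N / (1 + 2 * sum of autocorrelations), truncating the sum at the first negative autocorrelation. The sketch below is a simplified version of that idea, not the exact estimator used by coda or R2jags; with a negative lag-1 autocorrelation this truncation simply returns N. The AR(1) chain in the usage lines is synthetic.

```python
import numpy as np

def ess(chain):
    """Simplified effective sample size: N / (1 + 2 * sum_k rho_k), stopping at the first rho_k < 0."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    # Autocovariance via FFT, zero-padded to 2n to avoid wrap-around
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conjugate(f), n=2 * n)[:n] / n
    rho = acov / acov[0]
    rho_sum = 0.0
    for k in range(1, n):
        if rho[k] < 0:
            break
        rho_sum += rho[k]
    return n / (1.0 + 2.0 * rho_sum)

# Usage: a strongly autocorrelated AR(1) chain has a much smaller ESS than its length
rng = np.random.default_rng(1)
n, phi = 20000, 0.9
ar = np.empty(n)
ar[0] = rng.normal()
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + rng.normal()
iid = rng.normal(size=n)
print(round(ess(ar)), round(ess(iid)))
```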
But the plot may look a little odd because your chain is very short, IMHO. You can use the ESS to provide a very rough approximation for the accuracy of an estimated probability: the worst-case half width of the CI for a binomial proportion is +/-2 * sqrt(0.5*0.5/N), where N is the sample size (or the ESS in this case). So even if the efficiency of your MCMC process were 1, so that the ESS equals the chain length, the accuracy of your estimates would only be +/-0.02. To estimate a probability to 2 decimal places (so that the half width of the CI is no more than 0.005), you need an ESS of 40,000.
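The arithmetic can be checked with a few lines of Python. The chain-length helper assumes R2jags' default burn-in of floor(n.iter/2) and that thinning applies to the post-burn-in iterations; under those assumptions the question's settings (n.iter = 15000, n.thin = 3) keep 2,500 draws per chain, the length that corresponds to the +/-0.02 figure.

```python
import math

def worst_case_half_width(ess):
    """Approximate CI half width for a proportion near 0.5: 2 * sqrt(0.5 * 0.5 / ESS)."""
    return 2.0 * math.sqrt(0.5 * 0.5 / ess)

def ess_needed(half_width):
    """Invert the formula above: ESS = (2 * 0.5 / half_width) ** 2."""
    return (2.0 * 0.5 / half_width) ** 2

def saved_per_chain(n_iter, n_thin, n_burnin=None):
    """Draws kept per chain; n_burnin defaults to floor(n_iter / 2) (assumed R2jags default)."""
    if n_burnin is None:
        n_burnin = n_iter // 2
    return (n_iter - n_burnin) // n_thin

n_keep = saved_per_chain(15000, 3)
print(n_keep, worst_case_half_width(n_keep), ess_needed(0.005))
```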
There's nothing wrong with using short chains during testing, but for "production" runs I would always use a chain length much greater than 2,500. (I'd also use multiple chains so that I can use Gelman-Rubin statistics to test for convergence.)
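For reference, a minimal sketch of the basic Gelman-Rubin potential scale reduction factor (the quantity behind Rhat and gelman.diag()), computed from m chains of n draws each. This is the textbook form without the refinements (such as split chains or degrees-of-freedom corrections) that packages like coda and Stan apply, so its values will differ slightly from theirs. The chains in the usage lines are synthetic.

```python
import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor for an (m chains x n draws) array."""
    x = np.asarray(chains, dtype=float)
    m, n = x.shape
    W = x.var(axis=1, ddof=1).mean()        # mean within-chain variance
    B = n * x.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))                        # all chains sample the same distribution
stuck = mixed + np.array([[0.0], [0.0], [0.0], [3.0]])    # one chain sits in a different region
print(gelman_rubin(mixed), gelman_rubin(stuck))
```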
I ran the following code for a binary classification task with an SVM in both R (first example) and Python (second example).
Given randomly generated data (X) and response (Y), this code performs leave-group-out cross-validation 1000 times. Each entry of ans is therefore the mean prediction for that observation across the CV iterations in which it was in the test set.
Computing the area under the ROC curve should give ~0.5, since X and Y are completely random. However, this is not what we see: the AUC is frequently significantly higher than 0.5. The number of rows of X is very small, which can obviously cause problems.
Any idea what could be happening here? I know that I can either increase the number of rows of X or decrease the number of columns to mitigate the problem, but I am looking for other issues.
Y = as.factor(rep(c(1, 2), times = 14))
X = matrix(runif(length(Y)*100), nrow = length(Y))
library(e1071)
library(pROC)
colnames(X) = 1:ncol(X)
iter = 1000
ansMat = matrix(NA, length(Y), iter)
for (i in seq(iter)) {
  # get train
  train = sample(seq(length(Y)), 0.5*length(Y))
  if (min(table(Y[train])) == 0)
    next
  # test = rows not in train
  test = seq(length(Y))[-train]
  # train model
  XX = X[train, ]
  YY = Y[train]
  mod = svm(XX, YY, probability = FALSE)
  XXX = X[test, ]
  predVec = predict(mod, XXX)
  RFans = attr(predVec, 'decision.values')
  ansMat[test, i] = as.numeric(predVec)
}
ans = rowMeans(ansMat, na.rm = TRUE)
r = roc(Y, ans)$auc
print(r)
Similarly, when I implement the same thing in Python I get similar results.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc
Y = np.array([1, 2]*14)
X = np.random.uniform(size=[len(Y), 100])
n_iter = 1000
ansMat = np.full((len(Y), n_iter), np.nan)
for i in range(n_iter):
    # Get train/test index
    train = np.random.choice(range(len(Y)), size=int(0.5*len(Y)), replace=False, p=None)
    # skip splits whose training set contains only one class
    if len(np.unique(Y[train])) == 1:
        continue
    test = np.array([k for k in range(len(Y)) if k not in train])
    # train model
    mod = SVC(probability=False)
    mod.fit(X=X[train, :], y=Y[train])
    # predict and collect answer
    ansMat[test, i] = mod.predict(X[test, :])
ans = np.nanmean(ansMat, axis=1)
fpr, tpr, thresholds = roc_curve(Y, ans, pos_label=1)
print(auc(fpr, tpr))
You should consider each iteration of cross-validation to be an independent experiment, where you train using the training set, test using the testing set, and then calculate the model skill score (in this case, AUC).
So what you should actually do is calculate the AUC for each CV iteration. And then take the mean of the AUCs.
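A minimal sketch of that suggestion in Python, reusing the question's setup (random X, balanced Y, repeated 50/50 splits). It scores each test set with SVC.decision_function rather than hard labels (an assumption on my part, since an AUC on hard labels is very coarse) and skips splits where the training or test set contains a single class. Within a split, the test scores are independent of the test labels, so each per-split AUC is centred on 0.5.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC

def mean_cv_auc(X, Y, n_iter=200, seed=0):
    """Compute the AUC separately on each random 50/50 split, then average the AUCs."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    aucs = []
    for _ in range(n_iter):
        train = rng.choice(n, size=n // 2, replace=False)
        test = np.setdiff1d(np.arange(n), train)
        if len(np.unique(Y[train])) < 2 or len(np.unique(Y[test])) < 2:
            continue  # need both classes to fit the SVM and to compute an AUC
        mod = SVC().fit(X[train], Y[train])
        scores = mod.decision_function(X[test])  # larger = more like mod.classes_[1]
        aucs.append(roc_auc_score(Y[test] == mod.classes_[1], scores))
    return float(np.mean(aucs)), len(aucs)

rng = np.random.default_rng(42)
Y = np.array([1, 2] * 14)
X = rng.uniform(size=(len(Y), 100))
mean_auc, n_used = mean_cv_auc(X, Y)
print(mean_auc, n_used)
```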
I am learning rstan and am currently working through the exercises in Gelman's "Bayesian Data Analysis". For reference, this is example 5 in chapter 3.
It keeps failing with:
Initialization failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
error occurred during calling the sampler; sampling not done
this is my R code:
library(rstan)
scode <- "
transformed data {
  real o_data[5];
  o_data[1] <- 10;
  o_data[2] <- 10;
  o_data[3] <- 12;
  o_data[4] <- 11;
  o_data[5] <- 9;
}
parameters {
  real mu;
  real<lower=0> sigma;
  real tru_val[5];
}
model {
  mu ~ uniform(0.0, 20.0);
  sigma ~ gamma(2, 1);
  for (i in 1:5) {
    tru_val[i] ~ normal(mu, sigma);
    tru_val[i] ~ uniform(o_data[i]-0.5, o_data[i]+0.5);
  }
}
"
afit <- stan(model_code = scode, verbose = TRUE)
The funny thing is - if I change the second tru_val sampling to tru_val[i] ~ normal(o_data[i],0.5); the model will evaluate just fine.
So far I tried in the stan code:
rearranging the sampling statements
introducing helper variables
explicitly writing increment_log_prob statements
changing variable names in case I had accidentally used a keyword
adding print statements in the Stan code
setting mu to 10
relaxing/widening the constraints in the uniform distribution
and combinations of above
I noticed something surprising when I printed the values of tru_val: no matter how I order the statements, it prints values around 0, typically between -2 and +2, even when I set mu <- 10; sigma <- 1; (in the data section) and use the sampling statement tru_val[i] ~ uniform(9.5,10.5). I don't see how it can get these numbers.
I really hope someone can shed some light on this.
The constraints of the variable need to match the support of the distribution you're using. For tru_val[i] ~ uniform(9.5, 10.5), tru_val has to be defined as real<lower=9.5,upper=10.5> tru_val[5].
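This also explains the printed values between -2 and +2: by default, Stan draws initial values uniformly from (-2, 2) on the unconstrained scale. With tru_val declared unconstrained, those initial values lie outside [o_data[i] - 0.5, o_data[i] + 0.5], the uniform log density is -inf, and every initialization attempt is rejected. Declaring lower and upper bounds makes Stan work with an unconstrained u and map it into the interval with a scaled inverse logit, so every initial value is valid. A small Python sketch of that mapping, using per-element bounds because o_data differs across the five observations; the function names are illustrative, not Stan's API.

```python
import math
import random

def inv_logit(u):
    return 1.0 / (1.0 + math.exp(-u))

def to_interval(u, lower, upper):
    """Map an unconstrained value u into (lower, upper), as a lower/upper-bounded declaration does."""
    return lower + (upper - lower) * inv_logit(u)

o_data = [10, 10, 12, 11, 9]
random.seed(0)
inits = [random.uniform(-2, 2) for _ in o_data]  # mimics Stan's default init range
raw_ok = [d - 0.5 <= u <= d + 0.5 for u, d in zip(inits, o_data)]
mapped = [to_interval(u, d - 0.5, d + 0.5) for u, d in zip(inits, o_data)]
print(raw_ok)   # every raw init falls outside its interval
print(mapped)   # every mapped value lies inside its interval
```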
In the statement tru_val[i] ~ normal(mu, sigma), Stan is not drawing a sample from a normal distribution and assigning it to tru_val[i]. It is incrementing the log joint density: it evaluates the normal log density of tru_val[i] given mu and sigma and adds it to the running total.
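To illustrate that point, here is a Python sketch of the log density that the question's model block accumulates: each ~ statement adds one log density term to a running total (Stan's ~ also drops constant terms, which does not matter here). Nothing is drawn; the function only scores a given set of values. The density helpers are written out by hand for illustration.

```python
import math

def normal_lpdf(y, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def uniform_lpdf(y, a, b):
    return -math.log(b - a) if a <= y <= b else -math.inf

def gamma_lpdf(y, alpha, beta):
    """Gamma(shape alpha, rate beta) log density, matching Stan's gamma(alpha, beta)."""
    return alpha * math.log(beta) - math.lgamma(alpha) + (alpha - 1) * math.log(y) - beta * y

o_data = [10, 10, 12, 11, 9]

def log_target(mu, sigma, tru_val):
    """Mirror of the question's model block: a sum of log densities, nothing sampled."""
    lp = uniform_lpdf(mu, 0.0, 20.0) + gamma_lpdf(sigma, 2, 1)
    for t, d in zip(tru_val, o_data):
        lp += normal_lpdf(t, mu, sigma)
        lp += uniform_lpdf(t, d - 0.5, d + 0.5)
    return lp

print(log_target(10.0, 1.0, [0.5] * 5))  # values near 0 fall outside the uniforms: -inf
print(log_target(10.0, 1.0, o_data))     # values inside the uniforms: a finite log density
```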
(The best place to ask questions is on the Stan users mailing list.)