Winbugs to Rjags beta binomial model translation - r

I am working through the textbook "Bayesian Ideas and Data Analysis" by Christensen et al.
There is a simple exercise in the book that involves cutting and pasting the following code to run in Winbugs:
model{
y ~ dbin(theta, n) # Model the data
ytilde ~ dbin(theta, m) # Prediction of future binomial
theta ~ dbeta(a, b) # The prior
prob <- step(ytilde - 20) # Pred prob that ytilde >= 20
}
list(n=100, m=100, y=10, a=1, b=1) # The data
list(theta=0.5, ytilde=10) # Starting/initial values
I am trying to translate the above into R2jags code and am running into some trouble. I thought I could write my R2jags model fairly directly, like this:
model {
#Likelihoods
y ~ dbin(theta,n)
yt ~ dbin(theta,m)
#Priors
theta ~ dbeta(a,b)
prob <- step(yt - 20)
}
with the R code:
library(R2jags)
n <- 100
m <- 100
y <- 10
a <- 1
b <- 1
jags.data <- list(n = n,
m = m,
y = y,
a = a,
b = b)
jags.inits <- list(
list(theta = 0.5, yt = 10), #Chain 1 init
list(theta = 0.5, yt = 10), #Chain 2 init
list(theta = 0.5, yt = 10) #Chain 3 init
)
jags.param <- c("theta", "yt")
jags.fit <- jags.model(data = jags.data,
inits = jags.inits,
parameters.to.save = jags.param,
model.file = "hw21.bug",
n.chains = 3,
n.iter = 5000,
n.burnin = 100)
print(jags.fit)
However, calling the R code brings about the following error:
Error in jags.model(data = jags.data, inits = jags.inits, parameters.to.save = jags.param, :
unused arguments (parameters.to.save = jags.param, model.file = "hw21.bug", n.iter = 5000, n.burnin = 100)
Is it because I am missing a necessary for loop in my R2Jags model code?

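The error is coming from the R function jags.model (from the rjags package), not from JAGS itself: parameters.to.save, model.file, n.iter and n.burnin are arguments of the jags function in R2jags, not of jags.model. No for loop is missing from the model.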
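For reference, here is a minimal sketch of the same call routed through R2jags's jags function, which does accept these arguments. It assumes JAGS and R2jags are installed; the temporary file name is arbitrary (the question used "hw21.bug"), and "prob" is added to the monitored nodes so that its posterior mean estimates the predictive probability.

```r
library(R2jags)

# Write the model from the question to a temporary file
# (assumption: any writable path works; the question used "hw21.bug").
model_file <- tempfile(fileext = ".bug")
writeLines("
model {
  y  ~ dbin(theta, n)      # likelihood
  yt ~ dbin(theta, m)      # future binomial count
  theta ~ dbeta(a, b)      # prior
  prob <- step(yt - 20)    # 1 if yt >= 20, else 0
}", model_file)

jags.data  <- list(n = 100, m = 100, y = 10, a = 1, b = 1)

# One list of starting values per chain
jags.inits <- replicate(3, list(theta = 0.5, yt = 10), simplify = FALSE)

jags.fit <- jags(data = jags.data,
                 inits = jags.inits,
                 parameters.to.save = c("theta", "yt", "prob"),
                 model.file = model_file,
                 n.chains = 3,
                 n.iter = 5000,
                 n.burnin = 100)
print(jags.fit)
```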
If you want to keep the model as similar to WinBUGS as possible, there is an easier way than specifying the data and initial values in R. Put the following into a text file called 'model.txt' in your working directory:
model{
y ~ dbin(theta, n) # Model the data
ytilde ~ dbin(theta, m) # Prediction of future binomial
theta ~ dbeta(a, b) # The prior
prob <- step(ytilde - 20) # Pred prob that ytilde >= 20
}
data{
list(n=100, m=100, y=10, a=1, b=1) # The data
}
inits{
list(theta=0.5, ytilde=10) # Starting/initial values
}
And then run this in R:
library('runjags')
results <- run.jags('model.txt', monitor='theta')
results
plot(results)
For more information on this method of translating WinBUGS models to JAGS see:
http://runjags.sourceforge.net/quickjags.html
Matt
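As a cross-check on whatever sampler is used: this model is conjugate, so the posterior of theta is Beta(a + y, b + n - y) and the predictive distribution of ytilde is beta-binomial. The sketch below (base R only; the variable names are illustrative, not from the book) computes P(ytilde >= 20 | y) exactly and by simple Monte Carlo, which is the quantity the posterior mean of prob estimates.

```r
# Data and prior from the question
n <- 100; y <- 10; m <- 100; a <- 1; b <- 1

# Conjugate update: theta | y ~ Beta(a + y, b + n - y)
a_post <- a + y
b_post <- b + n - y

# Posterior predictive of ytilde is beta-binomial; evaluate its pmf on 0..m
k <- 0:m
pmf <- exp(lchoose(m, k) + lbeta(k + a_post, m - k + b_post) -
           lbeta(a_post, b_post))

prob_exact <- sum(pmf[k >= 20])   # P(ytilde >= 20 | y)

# Monte Carlo version of the same quantity, mimicking what the sampler does
set.seed(1)
theta_draws  <- rbeta(1e5, a_post, b_post)
ytilde_draws <- rbinom(1e5, m, theta_draws)
prob_mc <- mean(ytilde_draws >= 20)

c(exact = prob_exact, monte_carlo = prob_mc)
```

If the MCMC estimate of prob is far from these two numbers, the translation of the model is the first thing to check.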

This old blog post has an extensive example of converting BUGS to JAGS, accessed via the rjags package rather than R2jags. (I like the runjags package even better.) I know we're supposed to present self-contained answers here, not just links, but the post is rather long. It goes through each logical step of a script, including:
loading the package
specifying the model
assembling the data
initializing the chains
running the chains
examining the results
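Applied to the beta-binomial model from the question, those steps might look roughly like the following with rjags. This is a sketch that assumes JAGS and rjags are installed; it is not taken from the blog post, and the object names are illustrative.

```r
library(rjags)                                   # 1. load the package

model_string <- "
model {
  y      ~ dbin(theta, n)
  ytilde ~ dbin(theta, m)
  theta  ~ dbeta(a, b)
  prob  <- step(ytilde - 20)
}"                                               # 2. specify the model

data_list <- list(n = 100, m = 100, y = 10,
                  a = 1, b = 1)                  # 3. assemble the data

inits_list <- replicate(3, list(theta = 0.5, ytilde = 10),
                        simplify = FALSE)        # 4. initialize the chains

# jags.model() only compiles and adapts the model; it does not take
# parameters.to.save, n.iter or n.burnin.
jm <- jags.model(textConnection(model_string), data = data_list,
                 inits = inits_list, n.chains = 3)
update(jm, n.iter = 100)                         # burn-in

samples <- coda.samples(jm,
                        variable.names = c("theta", "ytilde", "prob"),
                        n.iter = 5000)           # 5. run the chains

summary(samples)                                 # 6. examine the results
```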

Related

Logistic Regression in R: Optimization Issues concerning Initial Guess

I need to implement a logistic regression manually, using the Score/GMM approach, without the use of GLM. This is because at later stages the model will be much more complicated. Currently I am running into a problem where the optimization procedures for the logistic regression are very sensitive to the initial point. To illustrate, here is my code using an online dataset. More details about the procedure are in the comments:
library(data.table)
library(nleqslv)
library(Matrix)
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
data_analysis<-data.table(mydata)
data_analysis[,constant:=1]
#Likelihood function for logit
#The logistic regression will regress the binary variable
#admit on a constant and the variable gpa
LL <- function(beta){
beta=as.numeric(beta)
data_temp=data_analysis
mat_temp2 = cbind(data_temp$constant,
data_temp$gpa)
one = rep(1,dim(mat_temp2)[1])
h = exp(beta %*% t(mat_temp2))
choice_prob = h/(1+h)
llf <- sum(data_temp$admit * log(choice_prob)) + (sum((one-data_temp$admit) * log(one-choice_prob)))
return(-1*llf)
}
#Score to be used when optimizing using LL
#Identical to the Score function below but returns negative output
Score_LL <- function(beta){
data_temp=data_analysis
mat_temp2 = cbind(data_temp$constant,
data_temp$gpa)
one = rep(1,dim(mat_temp2)[1])
h = exp(beta %*% t(mat_temp2))
choice_prob = h/(1+h)
resid = as.numeric(data_temp$admit - choice_prob)
score_final2 = t(mat_temp2) %*% Diagonal(length(resid), x=resid) %*% one
return(-1*as.numeric(score_final2))
}
#The Score/Deriv/Jacobian of the Likelihood function
Score <- function(beta){
data_temp=data_analysis
mat_temp2 = cbind(data_temp$constant,
data_temp$gpa)
one = rep(1,dim(mat_temp2)[1])
h = exp(beta %*% t(mat_temp2))
choice_prob = as.numeric(h/(1+h))
resid = as.numeric(data_temp$admit - choice_prob)
score_final2 = t(mat_temp2) %*% Diagonal(length(resid), x=resid) %*% one
return(as.numeric(score_final2))
}
#Derivative of the Score function
Score_Deriv <- function(beta){
data_temp=data_analysis
mat_temp2 = cbind(data_temp$constant,
data_temp$gpa)
one = rep(1,dim(mat_temp2)[1])
h = exp(beta %*% t(mat_temp2))
weight = (h/(1+h)) * (1- (h/(1+h)))
weight_mat = Diagonal(length(weight), x=weight)
deriv = t(mat_temp2)%*%weight_mat%*%mat_temp2
return(-1*as.array(deriv))
}
#Quadratic Gain function
#Minimized at Score=0 and so minimizing is equivalent to solving the
#FOC of the Likelihood. This is the GMM approach.
Quad_Gain<- function(beta){
h=Score(as.numeric(beta))
return(sum(h*h))
}
#Derivative of the Quadratic Gain function
Quad_Gain_deriv <- function(beta){
return(2*t(Score_Deriv(beta))%*%Score(beta))
}
sol1=glm(admit ~ gpa, data = data_analysis, family = "binomial")
sol2=optim(c(2,2),Quad_Gain,gr=Quad_Gain_deriv,method="BFGS")
sol3=optim(c(0,0),Quad_Gain,gr=Quad_Gain_deriv,method="BFGS")
When I run this code, I get that sol3 matches what glm produces (sol1) but sol2, with a different initial point, differs from the glm solution by a lot. This is something happening in my main code with the actual data as well. One solution is to create a grid and test multiple starting points. However, my main data set has 10 parameters and this would make the grid very large and the program computationally infeasible. Is there a way around this problem?
Your code seems overly complicated. The following two functions define the negative log-likelihood and negative score vector for a logistic regression with the logit link:
logLik_Bin <- function (betas, y, X) {
eta <- c(X %*% betas)
- sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}
score_Bin <- function (betas, y, X) {
eta <- c(X %*% betas)
- crossprod(X, y - plogis(eta))
}
Then you can use it as follows:
# load the data
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
# fit with optim()
opt1 <- optim(c(-1, 1, -1), logLik_Bin, score_Bin, method = "BFGS",
y = mydata$admit, X = cbind(1, mydata$gre, mydata$gpa))
opt1$par
# compare with glm()
glm(admit ~ gre + gpa, data = mydata, family = binomial())
Typically, for well-behaved covariates (i.e., when you expect the coefficients to lie in the interval [-4, 4]), starting at 0 is a good idea.
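The sensitivity to starting values can be checked empirically. The negative log-likelihood of a logistic regression is convex in the coefficients, so BFGS on it should reach the same optimum from different starting points, which is not guaranteed for the squared norm of the score. The sketch below uses simulated data rather than the UCLA file, and the function names (negloglik, negscore) are illustrative.

```r
set.seed(1)
n <- 400
x <- rnorm(n)
X <- cbind(1, x)                                  # design matrix with intercept
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))         # simulated binary outcome

# Negative log-likelihood and its gradient (negative score)
negloglik <- function(betas, y, X) {
  eta <- c(X %*% betas)
  -sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}
negscore <- function(betas, y, X) {
  eta <- c(X %*% betas)
  -c(crossprod(X, y - plogis(eta)))
}

# Fit from several different starting points
starts <- list(c(0, 0), c(2, 2), c(-3, 3))
fits <- lapply(starts, function(s) {
  optim(s, negloglik, negscore, method = "BFGS", y = y, X = X)$par
})

# Reference fit
glm_coef <- unname(coef(glm(y ~ x, family = binomial())))

# One row per starting point, last row is glm()
rbind(do.call(rbind, fits), glm_coef)
```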

How to estimate the Kalman Filter with 'KFAS' R package, with an AR(1) transition equation?

I am using 'KFAS' package from R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = Z_t * x_t + \eps_t (measurement)
x_t = T_t * x_{t-1} + R_t * \eta_t (transition),
with \eps_t ~ N(0,H_t) and \eta_t ~ N(0,Q_t).
So, I want to estimate the variances H_t and Q_t, but also T_t, the AR(1) coefficient. My code is as follows:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
fit <- fitSSM(ss_model, inits = c(0,0.6,0), method = 'L-BFGS-B')
But it returns: "Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)),: System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07"
The NA definitions for the variances work well, as documented in the package's paper. However, it seems this cannot be done for the AR coefficient. Does anyone know how I can do this?
Note that I am aware of the SSMarima function, which eases the definition of the transition equation as ARIMA models. Although I am able to estimate the AR(1) coef. and Q_t this way, I still cannot estimate the \eps_t variance (H_t). Moreover, I am migrating my Kalman filter codes from EViews to R, so I need to learn SSMcustom for other models that are more complicated.
Thanks!
It seems that something is missing from your example, as your error message comes from the function fitSSM. If you want to use fitSSM to estimate general state space models, you need to provide your own model updating function. The default behaviour can only handle NAs in the covariance matrices H and Q. The main goal of fitSSM is just to get you started with simple models. For complex models and/or large data, I would recommend writing your own objective function (with the help of the logLik method) and calling your favourite numerical optimization routine manually for maximum performance. Something like this:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
objf <- function(pars, model, estimate = TRUE) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
if (estimate) {
-logLik(model)
} else {
model
}
}
opt <- optim(c(1, 0.5, 1), objf, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100), model = ss_model)
ss_model_opt <- objf(opt$par, ss_model, estimate = FALSE)
Same with fitSSM:
updatefn <- function(pars, model) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
identical(ss_model_opt, fit$model)
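To see what such an objective function evaluates, the Gaussian log-likelihood of the AR(1)-plus-noise model can also be written out by hand with the Kalman recursions. The following base R sketch uses simulated Gaussian data and assumes a stationary initial state, which may differ from how KFAS initializes the filter, so the numbers need not match logLik(model) exactly; kalman_negloglik is an illustrative name, not a KFAS function.

```r
# Negative log-likelihood of y_t = x_t + eps_t, x_t = phi * x_{t-1} + eta_t,
# with eps_t ~ N(0, H) and eta_t ~ N(0, Q); pars = c(H, phi, Q).
kalman_negloglik <- function(pars, y) {
  H <- pars[1]; phi <- pars[2]; Q <- pars[3]
  a <- 0                    # predicted state mean
  P <- Q / (1 - phi^2)      # predicted state variance (stationary start, an assumption)
  ll <- 0
  for (i in seq_along(y)) {
    v  <- y[i] - a          # one-step-ahead prediction error
    Fv <- P + H             # its variance
    ll <- ll - 0.5 * (log(2 * pi) + log(Fv) + v^2 / Fv)
    K  <- P / Fv            # Kalman gain
    a_filt <- a + K * v     # filtered mean
    P_filt <- P * (1 - K)   # filtered variance
    a <- phi * a_filt       # predict the next state
    P <- phi^2 * P_filt + Q
  }
  -ll
}

# Simulated data: AR(1) state with phi = 0.6, Q = 0.5, plus N(0, 1) noise
set.seed(100)
n <- 500
x <- as.numeric(arima.sim(n = n, list(ar = 0.6), sd = sqrt(0.5)))
y <- x + rnorm(n, sd = 1)

opt <- optim(c(1, 0.5, 1), kalman_negloglik, y = y, method = "L-BFGS-B",
             lower = c(1e-4, -0.99, 1e-4), upper = c(100, 0.99, 100))
setNames(opt$par, c("H", "phi", "Q"))
```

The parameter order matches the objf and updatefn functions above: measurement variance, AR coefficient, state variance.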

Data file seemingly too large with wine and openBUGS

I have recently started trying to do some bayesian modelling with OpenBUGS and R using R2OpenBUGS.
I followed a great link for installing wine and OpenBUGS by David Eagles and the schools example posted there worked fine.
I then tried to run some code I had and kept getting an error, after the "model is syntactically correct" message, telling me the data was not loading properly.
After a few days of troubleshooting, it seems that when the data file is above a certain length, the top part (where you traditionally highlight list in OpenBUGS to load data) is lost and the data cannot be loaded by OpenBUGS.
[Screenshot: my OpenBUGS program with the loaded data, where the list section has been cut off]
The screenshot shows that the top of my data.txt file no longer starts with list{, which is basically stopping all of my attempts to run the model.
To confirm this, I ran a simple linear regression example with the number of points set first to 1000 and then to 100:
# Load in paths for running OpenBUGS through wine
source("/Users/dp323/Desktop/R/scripts/Where_is_my_OpenBUGS.R")
# set up model
linemodel <- function() {
for (j in 1:N) {
Y[j] ~ dnorm(mu[j], tau) ## Response values Y are Normally distributed
mu[j] <- alpha + beta * (x[j] - xbar) ## linear model with x values centred
}
## Priors
alpha ~ dnorm(0, 0.001)
beta ~ dnorm(0, 0.001)
tau ~ dgamma(0.001, 0.001)
sigma <- 1/sqrt(tau)
}
# set up data ####
linedata <- list(Y = c(1:100), x = c(1:100), N = 100, xbar = 3)
# set initial values ####
lineinits <- function() {
list(alpha = 1, beta = 1, tau = 1)
}
# run bugs ####
lineout <- bugs(data = linedata, inits = lineinits,
parameters.to.save = c("alpha", "beta", "sigma"),
model.file = linemodel, n.chains = 1, n.iter = 10000,
OpenBUGS.pgm = OpenBUGS.pgm, WINE = WINE, WINEPATH = WINEPATH,
useWINE = TRUE, debug = TRUE)
The model runs great when linedata <- list(Y = c(1:100), x = c(1:100), N = 100, xbar = 3) but falls apart at the loading in data stage when linedata <- list(Y = c(1:1000), x = c(1:1000), N = 1000, xbar = 3).
I have exhausted a google search of limits of data file size on wine and OpenBUGS and not found anything helpful.
Anyone have any suggestions of what to try/where to start/experience of this before?

How can I save a JAGS model object in R?

I am using the package rjags to do MCMC in R and I would like to save the output of the function jags.model for later use in another R session.
Here is a simple example for the mean of a Normal distribution:
library(rjags)
N <- 1000
x <- rnorm(N, 0, 5)
model.str <- 'model {for (i in 1:N) {
x[i] ~ dnorm(mu, 5)}
mu ~ dnorm(0, .0001)}'
jags <- jags.model(textConnection(model.str), data = list(x = x, N = N))
update(jags, 1000)
I can generate samples of mu like this:
coda.samples(model=jags,n.iter=1,variable.names="mu")
# [[1]]
# Markov Chain Monte Carlo (MCMC) output:
# Start = 2001
# End = 2001
# Thinning interval = 1
# mu
# [1,] 0.2312028
#
# attr(,"class")
# [1] "mcmc.list"
Now I would like to save the model object jags for later use in a new R session, so that I don't have to initialize and burn in the Markov Chain again:
save(file="/tmp/jags.Rdata", list="jags")
quit()
However, after starting a new R session and reloading the model I get an error message that the JAGS model must be recompiled:
load("/tmp/jags.Rdata")
coda.samples(model=jags,n.iter=1,variable.names="mu")
# Error in model$iter() : JAGS model must be recompiled
Why is that? How can I save the object jags in R for later use?
Note: The question has been asked before, but the OP was not very specific about the problem.
Maybe I am totally off track regarding what you really want to do, but I would set up the JAGS model like this, using R2jags instead of rjags (R2jags is just a different wrapper):
library(R2jags)
N <- 1000
x <- rnorm(N, 0, 5)
sink("test.txt")
cat("
model{
for (i in 1:N) {
x[i] ~ dnorm(mu, 5)
}
mu ~ dnorm(0, .0001)
}
",fill = TRUE)
sink()
inits <- function() {
list(
mu = rnorm(1, 0, 0.01)) # random starting value near 0
}
params <- c("mu")
chains <- 3
iter <- 1000
jags1 <- jags(model.file = "test.txt", data = list(x = x, N = N),
parameters.to.save = params, inits = inits,
n.chains = chains, n.iter = iter, n.burnin=floor(iter/2),
n.thin = ifelse(floor(iter/100) < 1, 1, floor(iter/100)))
jags2 <- update(jags1, 10000)
jags2
plot(jags2)
traceplot(jags2)
jags2.mcmc <- as.mcmc(jags2)
There is no difference in the results, and I like this procedure because it is much closer to the way I used WinBUGS.
The last line of code converts the jags2 object to an mcmc list, which can be handled by the coda package.
Good luck!
P.S. Here's a second answer:
After looking at your code again, the only thing missing after loading the jags object to get the behavior you want is:
jags$recompile()
coda.samples(model=jags,n.iter=1,variable.names="mu")
But if you really just want to use the already obtained posterior samples or maybe just want to update the chains for more iterations, you can also use the R2jags-procedure.

error message JAGS subset out of range

I am attempting to call the following jags model in R:
model{
# Main model level 1
for (i in 1:N){
ficon[i] ~ dnorm(mu[i], tau)
mu[i] <- alpha[country[i]]
}
# Priors level 1
tau ~ dgamma(.1,.1)
# Main model level 2
for (j in 1:J){
alpha[j] ~ dnorm(mu.alpha, tau.alpha)
}
# Priors level 2
mu.alpha ~ dnorm(0,.01)
tau.alpha ~ dgamma(.1,.1)
sigma.1 <- 1/(tau)
sigma.2 <- 1/(tau.alpha)
ICC <- sigma.2 / (sigma.1+sigma.2)
}
This is a hierarchical model, where ficon is a continuous variable 0-60, that may have a different mean or distribution by country. N = number of total observations (2244) and J = number of countries (34). When I run this model, I keep getting the following error message:
Compilation error on line 5.
Subset out of range: alpha[35]
This code worked earlier, but it's not working now. I assume the problem is that there are only 34 countries, and that's why it's getting stuck at i=35, but I'm not sure how to solve the problem. Any advice you have is welcome!
The R code that I use to call the model:
### input files JAGS ###
data <- list(ficon = X$ficon, country = X$country, J = 34, N = 2244)
inits1 <- list(alpha = rep(0, 34), mu.alpha = 0, tau = 1, tau.alpha = 1)
inits2 <- list(alpha = rep(1, 34), mu.alpha = 1, tau = .5, tau.alpha = .5)
inits <- list(inits1, inits2)
# call empty model
eqlsempty <- jags(data, inits, model.file = "eqls_emptymodel.R",
parameters = c("mu.alpha", "sigma.1", "sigma.2", "ICC"),
n.chains = 2, n.iter = itt, n.burnin = bi, n.thin = 10)
To solve the problem, you need to renumber your countries so that they take only the values 1 to 34. If you have only 34 countries and yet get the error message you state, then at least one of the countries must be coded as 35. To fix this, you could run the following R code before bundling the data:
X$country <- factor(X$country)
X$country <- droplevels(X$country)
X$country <- as.integer(X$country)
Hope this helps
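A small base R illustration of the renumbering, using made-up country codes (the vector country_raw is hypothetical, not the asker's data): codes with gaps are mapped to consecutive integers 1..J, and the factor levels give a lookup table back to the original codes.

```r
# Hypothetical country codes with gaps (e.g. after subsetting the data)
country_raw <- c(1, 2, 5, 35, 2, 35, 7)

country_fac <- factor(country_raw)      # levels are the sorted unique codes
country_idx <- as.integer(country_fac)  # consecutive indices 1..J
J <- nlevels(country_fac)               # number of distinct countries

# Lookup table from JAGS index back to the original code
lookup <- data.frame(index = seq_len(J),
                     original_code = levels(country_fac))

country_idx
# [1] 1 2 3 5 2 5 4
J
# [1] 5
```

Passing country_idx and J to JAGS guarantees that alpha[country[i]] never indexes beyond alpha[J].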
