I have a time series model (INGARCH):
lambda_t = alpha0 + alpha1*(x_(t-1)) + beta1*(lambda_(t-1))
X_t ~ poisson (lambda_t)
where t indexes time, and alpha0, alpha1 and beta1 are the parameters.
X_t is the observed series and lambda_t is the series of conditional means.
The model requires alpha1 + beta1 < 1.
I would like to impose the condition alpha1 + beta1 < 1 during estimation. I added a while loop to the log-likelihood function, but the loop never stops.
How can I solve this problem? Is there another way to impose the constraint alpha1 + beta1 < 1 without a while loop?
Below is my code:
ll <- function(par) {
h.new = rep(0,n)
#par[1] is alpha0
#par[2] is alpha1
#par[3] is beta1
while(par[2] + par[3] < 1){
for (i in 2:n) {
h.new[i] <- par[1] + par[2] * dat[i-1] + par[3] * h.new[i-1]
}
-sum(dpois(dat, h.new, log=TRUE))
}
}
#simply generate a dataset as I have not found a suitable real dataset to fit in
set.seed(77)
n=400
dat <- rpois(n,36)
nlminb(start = c(0.1,0.1,0.1), lower = 1e-6, ll)
You do not change par at all inside the while loop. If you printed par[2] and par[3] inside the loop, you would see the original values, 0.1, printed endlessly, which is why you are stuck in the while forever.
par is a single, unchanging object within each call from nlminb. You just have to make sure that if par is bad, you return a score that is not minimal, so nlminb does not keep searching in that direction:
ll <- function(par) {
#If alpha1 + beta1 >= 1 the parameters are inadmissible, so return an infinite score
#It may be better to throw an error if you get NaN values! The if will
#fail anyway, but if you want to power through add checks:
if( is.nan(par[2]) || is.nan(par[3]) || par[2]+par[3] >= 1) return(Inf)
h.new = rep(0,n)
#remove while
for (i in 2:n) {
h.new[i] <- par[1] + par[2] * dat[i-1] + par[3] * h.new[i-1]
}
#h.new[1] stays at 0, and dpois(dat[1], 0, log=TRUE) is -Inf whenever dat[1] > 0,
#so the first observation is left out of the sum
-sum(dpois(dat[-1], h.new[-1], log=TRUE))
}
The algorithm nlminb (or any minimization function) very roughly goes:
1. Set parameters to initial guess
2. Send parameters to the objective function
3. Guess new parameters:
a. if the score did not improve much, return minimized guess
b. if the score is good, keep searching in this direction
c. else, search in some other direction
4. Go back to (2) with new parameters
Note you have to return a score for each set of parameters, you do not iterate them in the objective function.
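If you would rather not return Inf at all, another option is to reparameterise so that every value the optimiser tries is admissible. The sketch below is only one way to do it: to_natural() and nll() are names I made up, and starting the recursion at mean(dat) is my assumption, not something from the question.

```r
set.seed(77)
n <- 400
dat <- rpois(n, 36)

# Map an unconstrained vector theta to (alpha0, alpha1, beta1).
# total = alpha1 + beta1 lies in (0, 1); share splits it between the two.
to_natural <- function(theta) {
  total <- plogis(theta[2])
  share <- plogis(theta[3])
  c(alpha0 = exp(theta[1]),
    alpha1 = total * share,
    beta1  = total * (1 - share))
}

# Negative log-likelihood written in terms of the unconstrained theta
nll <- function(theta) {
  par <- to_natural(theta)
  lam <- numeric(n)
  lam[1] <- mean(dat)  # assumed starting value for the recursion
  for (i in 2:n) {
    lam[i] <- par[1] + par[2] * dat[i - 1] + par[3] * lam[i - 1]
  }
  -sum(dpois(dat, lam, log = TRUE))
}

start <- c(log(10), 0, 0)  # alpha0 = 10, alpha1 = beta1 = 0.25
fit <- optim(start, nll)   # unconstrained Nelder-Mead
est <- to_natural(fit$par)
est
```

The optimiser works on theta, and to_natural(fit$par) converts the result back to the original parameters. With this simulated data (i.i.d. Poisson, so alpha1 and beta1 are really 0) the likelihood is nearly flat along a ridge, so do not read much into the individual estimates.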
Related
My goal is to basically migrate this code to R.
All the preprocessing of the datasets has already been done, but I am stuck writing the "model" file. As a first attempt, and for the sake of clarity, I wrote the code shown below in R.
What I want to do is run an MCMC to estimate the parameter R_t, given the daily reported data for Italy.
The main steps that have been pursued are:
Sample an array parameter, namely the log(R_t), from a Gaussian RW distribution
Gauss_RandomWalk <- function(N, x0, mu, variance) {
# zero-mean increments; the drift enters once, through t*mu below
z <- cumsum(rnorm(n=N, mean=0, sd=sqrt(variance)))
t <- 1:N
x <- (x0 + t*mu + z)
return(x)
}
log_R_t <- Gauss_RandomWalk(tot_dates, 0., 0., 0.035**2)
R_t_candidate <- exp(log_R_t)
Compute some quantities, that are function of this sampled parameters, namely the number of infections. This dependence is quite simple, since it is linear algebra:
infections <- rep(0. , tot_dates)
infections[1] <- exp(seed)
for (t in 2:tot_dates){
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
Convolve the array I have just computed with a delay distribution (onset+reporting delay), finally rescaling it by the exposure variable:
test_adjusted_positive <- convolve(infections, delay_distribution_df$density, type = "open")
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- round(test_adjusted_positive*exposure)
Compute the Likelihood, which is proportional to the probability that a certain set of data was observed (i.e. daily confirmed cases), by sampling the aforementioned log(R_t) parameter from which the variable positive is computed.
likelihood <- dnbinom(round(Italian_data$daily_confirmed), mu = positive, size = 1/6)
Finally, here we come to my BUGS model file:
model {
#priors as a Gaussian RW
log_rt[1] ~ dnorm(0, 0.035)
log_rt[2] ~ dnorm(0, 0.035)
for (t in 3:tot_dates) {
log_rt[t] ~ dnorm(log_rt[t-1] + log_rt[t-2], 0.035)
R_t_candidate[t] <- exp(log_rt[t])
}
# data likelihood
for (t in 2:tot_dates) {
infections[t] <- sum(R_t_candidate * infections * gt_to_convolution[t-1,])
}
test_adjusted_positive <- convolve(infections, delay_distribution)
test_adjusted_positive <- test_adjusted_positive[1:tot_dates]
positive <- test_adjusted_positive*exposure
for (t in 2:tot_dates) {
confirmed[t] ~ dnbinom( obs[t], positive[t], 1/6)
}
}
where gt_to_convolution is a constant matrix, tot_dates is a constant value and exposure is a constant array.
When trying to compile it through:
data <- NULL
data$obs <- round(Italian_data$daily_confirmed)
data$tot_dates <- n_days
data$delay_distribution <- delay_distribution_df$density
data$exposure <- exposure
data$gt_to_convolution <- gt_to_convolution
inits <- NULL
inits$log_rt <- rep(0, tot_dates)
library (rjags)
library (coda)
set.seed(1995)
model <- "MyModel.bug"
jm <- jags.model(model , data, inits)
It raises the following error:
Compiling model graph
Resolving undeclared variables
Allocating nodes
Deleting model
Error in jags.model(model, data, inits) : RUNTIME ERROR:
Compilation error on line 19.
Possible directed cycle involving test_adjusted_positive
Hence I am not even able to debug it a little, even though I'm pretty sure something more general is wrong; I just cannot figure out what and why.
At this point, I think the best choice would be to implement a Metropolis algorithm myself using the likelihood above, but I would much prefer to use an already tested framework such as BUGS/JAGS, which is why I am asking for help.
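One detail worth checking independently of the compilation error: dnorm in BUGS/JAGS takes a mean and a precision (1/variance), whereas the R sampling code above passes a variance of 0.035^2. A small sketch of the conversion, where rw_sd and rw_precision are names I am introducing for illustration:

```r
# Standard deviation of the random-walk increments used in the R code
rw_sd <- 0.035

# JAGS/BUGS dnorm(mean, precision) expects precision = 1 / sd^2
rw_precision <- 1 / rw_sd^2
rw_precision
```

The value could then be passed in the data list (for example data$rw_precision <- rw_precision) and used as the second argument of dnorm in the model file, instead of 0.035.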
When specifying a model in JAGS/BUGS, the "is distributed as" symbol ~ is very useful. How to do this in R when using MCMC methods which require me to specify the likelihood?
Let's say, I want to estimate three parameters which are multivariate normally distributed.
In JAGS, I would do this by specifying pars[1:n] ~ dmnorm(mu[1:3], sigma[1:3, 1:3]). If everything is specified correctly, JAGS would go on to estimate these parameters under the given distribution.
In R, there are similar functions, like the dmvnorm() function, from the mvtnorm package. However, I'm not sure how to use these. I have to provide the data to get the probability density, whereas in JAGS, I only have to provide the parameters of the distribution like mu and sigma. What is the R equivalent to the ~ syntax in JAGS?
Here's some random data:
set.seed(123)
y = rbinom(10, 1, 0.2)
y
> y
[1] 0 0 0 1 1 0 0 1 0 0
So we know that the value of p that generated this data is 0.2. Let's see how we could try and recover that information (assuming we didn't know it). In JAGS I would write the following model:
model{
for(i in 1:10){
y[i] ~ dbern(p)
}
p ~ dunif(0, 1)
}
So I have said that the data is generated using (or sampled from) a Bernoulli distribution with parameter p, and that the prior for p is a Beta(1,1) which is equivalent to a uniform distribution.
So let's (initially) forget the Bayesian part. You have asked how to compute the likelihood. The likelihood for a parameter theta given independent and identically distributed data y = (y_1, ..., y_N) is
L(theta | y) = product(f(y_i | theta), i = 1,...,N)
In our example, the pdf f(y_i | theta) is p^y_i * (1 - p)^(1 - y_i). I know this just simplifies to p if y_i is 1, or (1 - p) if y_i is zero, but let's assume we don't know this and we are just using the Binomial probability function with parameters n = 1, and p to compute this, then you could get the likelihood like this:
Like = function(p){
prod(dbinom(y, 1, p))
}
This is a pretty simple function that only works for single values of p, but it works, e.g.
> Like(0.1)
[1] 0.0004782969
> Like(0.2)
[1] 0.001677722
> Like(0.3)
[1] 0.002223566
We can make it work for a whole range of values of p by using sapply
Like = function(p){
sapply(p, function(p.i)prod(dbinom(y, 1, p.i)))
}
So now, for example I could calculate the likelihood of values of p ranging from 0.01 to 0.99 in steps of 0.01 by
p = seq(0.01, 0.99, by = 0.01)
l = Like(p)
And I could plot them
plot(p, l, type = "l")
You can see from the plot that the likelihood is maximized at 0.3, so this is the MLE of p based on this data.
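As a cross-check on reading the maximum off the plot, the same answer can be obtained numerically. This is a minimal sketch: log_like is a name I made up, and y is typed in by hand from the output shown earlier. For Bernoulli data the MLE is the sample mean, so the two numbers should agree.

```r
# Data as printed above
y <- c(0, 0, 0, 1, 1, 0, 0, 1, 0, 0)

# Log-likelihood for a single value of p
log_like <- function(p) sum(dbinom(y, size = 1, prob = p, log = TRUE))

# One-dimensional maximisation over (0, 1)
mle <- optimize(log_like, interval = c(0, 1), maximum = TRUE)$maximum

c(numerical = mle, analytical = mean(y))
```

Working on the log scale avoids the underflow that prod() runs into with larger samples.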
Going back to the Bayesian problem, here's an implementation of Metropolis-Hastings (uncommented sorry):
MH = function(N = 1000, p0 = runif(1)){
log.like = function(p){
sum(dbinom(y, size = 1, p, log = TRUE))
}
ll0 = log.like(p0)
r = c(p0, rep(0, N))
for(i in 1:N){
p1 = runif(1)
ll1 = log.like(p1)
if(ll1 > ll0 || log(runif(1)) < ll1 - ll0){
p0 = p1
ll0 = ll1
}
r[i + 1] = p0
}
return(r)
}
Now we take a sample of size 10,000 from this, with
set.seed(123)
p = MH(10000)
plot(density(p))
abline(v = c(mean(p), mean(p) + c(-1,1)*qnorm(0.975)*sd(p)))
and plot the KDE of the sample (plus some credible intervals)
And see that the Metropolis-Hastings has worked -- the intervals are wide because the sample size is small.
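For this particular model the posterior is available in closed form, which gives something to compare the sampler against. With a uniform (Beta(1,1)) prior and Bernoulli data, the posterior is Beta(1 + successes, 1 + failures). A short sketch, with y typed in from the output above and variable names of my own choosing:

```r
y <- c(0, 0, 0, 1, 1, 0, 0, 1, 0, 0)

# Conjugate update: Beta(1, 1) prior plus 3 successes and 7 failures
a <- 1 + sum(y)
b <- 1 + sum(1 - y)

post_mean <- a / (a + b)               # exact posterior mean
ci <- qbeta(c(0.025, 0.975), a, b)     # exact 95% equal-tailed interval

# For an MCMC sample p (e.g. the vector returned by MH above) the
# corresponding quantities would be mean(p) and quantile(p, c(0.025, 0.975))
c(mean = post_mean, lower = ci[1], upper = ci[2])
```

The quantile-based interval is probably a better summary here than mean plus or minus 1.96 standard deviations, since the posterior is skewed and bounded on (0, 1).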
The criterion is defined by the function in the attached image.
I have been working at it for a while now but this is what I have. I need to
ridge.crit <- function(y, X, lambda, beta){
n <- length(y) #number of rows
p <- ncol(X) #number of predictor variables
for(i in 1:n){
(y[i] - beta[1] - sum(beta[2:p+1] * X[i,]))^2
##WE GOTTA ADD THEM TOGETHER AND STORE FINAL ANSWER AND RETURN IT!
}
return(final_answer)
}
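Since the image is not available here, the following is only a sketch that assumes the usual ridge criterion: the residual sum of squares plus lambda times the sum of squared slope coefficients, with beta[1] as an unpenalised intercept. Note also that 2:p+1 in the loop above is parsed as (2:p)+1, so the slopes need to be indexed as 2:(p+1).

```r
# Assumed form: sum_i (y_i - beta_0 - sum_j beta_j x_ij)^2 + lambda * sum_j beta_j^2
ridge.crit <- function(y, X, lambda, beta) {
  p <- ncol(X)                          # number of predictor variables
  slopes <- beta[2:(p + 1)]             # beta[1] is the intercept
  fitted <- beta[1] + drop(X %*% slopes)
  sum((y - fitted)^2) + lambda * sum(slopes^2)
}

# Small worked example
X <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)  # columns (1,2,3) and (4,5,6)
y <- c(0, 0, 0)
beta <- c(1, 0.5, -1)
ridge.crit(y, X, lambda = 2, beta = beta)
```

The matrix product replaces the explicit loop over rows; if the criterion in the image differs (for instance if it penalises the intercept or scales by n), the last line of the function is the place to change.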
I have a problem regarding the following model,
where I want to make inference on μ and tau, u is a known vector and x is the data vector. The log-likelihood is
I have a problem writing a log-likelihood in R.
x <- c(3.3569,1.9247,3.6156,1.8446,2.2196,6.8194,2.0820,4.1293,0.3609,2.6197)
mu <- seq(0,10,length=1000)
normal.lik1<-function(theta,x){
u <- c(1,3,0.5,0.2,2,1.7,0.4,1.2,1.1,0.7)
mu<-theta[1]
tau<-theta[2]
n<-length(x)
logl <- sapply(c(mu,tau),function(mu,tau){logl<- -0.5*n*log(2*pi) -0.5*n*log(tau^2+u^2)- (1/(2*tau^2+u^2))*sum((x-mu)^2) } )
return(logl)
}
#test if it works for mu=1, tau=2
head(normal.lik1(c(1,2),x))
#Does not work..
I want to be able to plug in the vector for mu and plot it over mu for a fixed value of tau, say 2. I also want to find out the MLE's of tau and mu using the optim function. I tried:
theta.hat<-optim(c(1,1),loglike2,control=list(fnscale=-1),x=x,,method="BFGS")$par
But it does not work.. Any suggestions to how I can write the likelihood?
First, as has been mentioned in the comments to your question, there is no need to use sapply(). You can simply use sum() – just as in the formula of the logLikelihood.
I changed this part in normal.lik1() and multiplied the expression assigned to logl by minus 1, so that the function computes the negative log-likelihood. Optimisers such as nlm() minimise, and minimising the negative log-likelihood over theta is equivalent to maximising the log-likelihood.
x <- c(3.3569,1.9247,3.6156,1.8446,2.2196,6.8194,2.0820,4.1293,0.3609,2.6197)
u <- c(1,3,0.5,0.2,2,1.7,0.4,1.2,1.1,0.7)
normal.lik1 <- function(theta,x,u){
mu <- theta[1]
tau <- theta[2]
n <- length(x)
logl <- - n/2 * log(2*pi) - 1/2 * sum(log(tau^2+u^2)) - 1/2 * sum((x-mu)^2/(tau^2+u^2))
return(-logl)
}
This can be done using nlm(), for example
nlm(normal.lik1, c(0,1), hessian=TRUE, x=x,u=u)$estimate
where c(0,1) are the starting values for the algorithm.
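Since the question used optim(), here is how the same fit might look with it. This is a sketch under the same model; negloglik is just a compact rewrite of the negative log-likelihood above, and the starting values are the same arbitrary c(0,1).

```r
x <- c(3.3569, 1.9247, 3.6156, 1.8446, 2.2196, 6.8194, 2.0820, 4.1293, 0.3609, 2.6197)
u <- c(1, 3, 0.5, 0.2, 2, 1.7, 0.4, 1.2, 1.1, 0.7)

# Negative log-likelihood: x_i ~ N(mu, tau^2 + u_i^2), independent
negloglik <- function(theta, x, u) {
  mu <- theta[1]
  tau <- theta[2]
  v <- tau^2 + u^2                      # per-observation variance
  0.5 * sum(log(2 * pi * v)) + 0.5 * sum((x - mu)^2 / v)
}

start <- c(0, 1)
fit <- optim(start, negloglik, x = x, u = u, method = "BFGS")
fit$par          # estimates of mu and tau
fit$convergence  # 0 indicates that optim reports convergence
```

optim() minimises by default, so no fnscale=-1 is needed when the function already returns the negative log-likelihood. The likelihood depends on tau only through tau^2, so the sign of the estimated tau is arbitrary.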
To plot the logLikelihood for a range of values of mu and some fixed tau you can adjust the function such that mu and tau are separate numeric arguments.
normal.lik2 <- function(mu,tau,x,u){
n <- length(x)
logl <- - n/2 * log(2*pi) - 1/2 * sum(log(tau^2+u^2)) - 1/2 * sum((x-mu)^2/(tau^2+u^2))
return(logl)
}
Then define some range for mu, compute the loglikelihood and use plot().
range.mu <- seq(-10,20,0.1)
loglik <- sapply(range.mu, function(m) normal.lik2(mu=m,tau=2,x=x,u=u))
plot(range.mu, loglik, type = "l")
I'm sure there are more elegant ways to do this but this does the trick.
I have the following latent variable model: Person j has two latent variables, Xj1 and Xj2. The only thing we get to observe is their maximum, Yj = max(Xj1, Xj2). The latent variables are bivariate normal; they each have mean mu, variance sigma2, and their correlation is rho. I want to estimate the three parameters (mu, sigma2, rho) using only Yj, with data from n patients, j = 1,...,n.
I've tried to fit this model in JAGS (so I'm putting priors on the parameters), but I can't get the code to compile. Here's the R code I'm using to call JAGS. First I generate the data (both latent and observed variables), given some true values of the parameters:
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
Then I define the JAGS model, and write a little function to run the JAGS sampler and return the samples:
# JAGS model code
model.text <- '
model {
for (i in 1:n) {
Y[i] <- max(X[i,1], X[i,2]) # Ack!
X[i,1:2] ~ dmnorm(X_mean, X_prec)
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
}
'
# run JAGS code. If latent=FALSE, remove the line defining Y[i] from the JAGS model
fit.jags <- function(latent=TRUE, data, n.adapt=1000, n.burnin, n.samp) {
require(rjags)
if (!latent)
model.text <- sub('\n *Y.*?\n', '\n', model.text)
textCon <- textConnection(model.text)
fit <- jags.model(textCon, data, n.adapt=n.adapt)
close(textCon)
update(fit, n.iter=n.burnin)
coda.samples(fit, variable.names=c("mu","sigma2","rho"), n.iter=n.samp)[[1]]
}
Finally, I call JAGS, feeding it only the observed data:
samp1 <- fit.jags(latent=TRUE, data=list(n=n, Y=Y), n.burnin=1000, n.samp=2000)
Sadly this results in an error message: "Y[1] is a logical node and cannot be observed". JAGS does not like me using "<-" to assign a value to Y[i] (I denote the offending line with an "Ack!"). I understand the complaint, but I'm not sure how to rewrite the model code to fix this.
Also, to demonstrate that everything else (besides the "Ack!" line) is fine, I run the model again, but this time I feed it the X data, pretending that it's actually observed. This runs perfectly and I get good estimates of the parameters:
samp2 <- fit.jags(latent=FALSE, data=list(n=n, X=X), n.burnin=1000, n.samp=2000)
colMeans(samp2)
If you can find a way to program this model in STAN instead of JAGS, that would be fine with me.
Theoretically you can implement a model like this in JAGS using the dsum distribution (which in this case uses a bit of a hack as you are modelling the maximum and not the sum of the two variables). But the following code does compile and run (although it does not 'work' in any real sense - see later):
set.seed(2017-02-08)
# true parameter values
mu <- 3
sigma2 <- 2
rho <- 0.7
# generate data
n <- 100
Sigma <- sigma2 * matrix(c(1, rho, rho, 1), ncol=2)
X <- MASS::mvrnorm(n, c(mu,mu), Sigma) # n-by-2 matrix
Y <- apply(X, 1, max)
model.text <- '
model {
for (i in 1:n) {
Y[i] ~ dsum(max_X[i])
max_X[i] <- max(X[i,1], X[i,2])
X[i,1:2] ~ dmnorm(X_mean, X_prec)
ranks[i,1:2] <- rank(X[i,1:2])
chosen[i] <- ranks[i,2]
}
# mean vector and precision matrix for X[i,1:2]
X_mean <- c(mu, mu)
X_prec[1,1] <- 1 / (sigma2*(1-rho^2))
X_prec[2,1] <- -rho / (sigma2*(1-rho^2))
X_prec[1,2] <- X_prec[2,1]
X_prec[2,2] <- X_prec[1,1]
mu ~ dnorm(0, 1)
sigma2 <- 1 / tau
tau ~ dgamma(2, 1)
rho ~ dbeta(2, 2)
#data# n, Y
#monitor# mu, sigma2, rho, tau, chosen[1:10]
#inits# X
}
'
library('runjags')
results <- run.jags(model.text)
results
plot(results)
Two things to note:
JAGS isn't smart enough to initialise the matrix of X while satisfying the dsum(max(X[i,])) constraint on its own - so we have to initialise X for JAGS using sensible values. In this case I'm using the simulated values which is cheating - the answer you get is highly dependent on the choice of initial values for X, and in the real world you won't have the simulated values to fall back on.
The max() constraint causes problems to which I can't think of a solution within a general framework: unlike the usual dsum constraint that allows one parameter to decrease while the other increases and therefore both parameters are used at all times, the min() value of X[i,] is ignored and the sampler is therefore free to do as it pleases. This will very very rarely (i.e. never) lead to values of min(X[i,]) that happen to be identical to Y[i], which is the condition required for the sampler to 'switch' between the two X[i,]. So switching never happens, and the X[] that were chosen at initialisation to be the maxima stay as the maxima - I have added a trace parameter 'chosen' which illustrates this.
As far as I can see the other potential solutions to the 'how do I code this' question will fall into essentially the same non-mixing trap which I think is a fundamental problem here (although I might be wrong and would very much welcome working BUGS/JAGS/Stan code that illustrates otherwise).
Solutions to the failure to mix are harder, although something akin to the Carlin & Chib method for model selection may work (force a min(pseudo_X) parameter to be equal to Y to encourage switching). This is likely to be tricky to get working, but if you can get help from someone with a reasonable amount of experience with BUGS/JAGS you could try it - see:
Carlin, B.P., Chib, S., 1995. Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B 57, 473–484.
Alternatively, you could try thinking about the problem slightly differently and model X directly as a matrix with the first column all missing and the second column all equal to Y. You could then use dinterval() to set a constraint on the missing values that they must be lower than the corresponding maximum. I'm not sure how well this would work in terms of estimating mu/sigma2/rho but it might be worth a try.
By the way, I realise that this doesn't necessarily answer your question but I think it is a useful example of the difference between 'is it codeable' and 'is it workable'.
Matt
ps. A much smarter solution would be to consider the distribution of the maximum of two normal variates directly - I am not sure if such a distribution exists, but if it does and you can get a PDF for it then the distribution could be coded directly using the zeros/ones trick without having to consider the value of the minimum at all.
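On that last point: for two normals with a common mean mu, common variance sigma2 and correlation rho, the maximum does have a closed-form density, f(y) = (2/sigma) * dnorm(z) * pnorm(z * sqrt((1 - rho)/(1 + rho))) with z = (y - mu)/sigma, which is a skew-normal. The sketch below codes it in plain R and fits it by maximum likelihood; dmaxnorm, nll and the transformed parameterisation are names and choices of mine, and the data are simulated without MASS so the block runs on its own.

```r
# Density of Y = max(X1, X2) for an exchangeable bivariate normal
dmaxnorm <- function(y, mu, sigma2, rho, log = FALSE) {
  sigma <- sqrt(sigma2)
  z <- (y - mu) / sigma
  shape <- sqrt((1 - rho) / (1 + rho))
  out <- log(2) - log(sigma) + dnorm(z, log = TRUE) +
    pnorm(shape * z, log.p = TRUE)
  if (log) out else exp(out)
}

# Sanity checks: total mass 1 and mean mu + sigma * sqrt((1 - rho) / pi)
total_mass <- integrate(dmaxnorm, -Inf, Inf, mu = 3, sigma2 = 2, rho = 0.7)$value
mean_max <- integrate(function(y) y * dmaxnorm(y, 3, 2, 0.7), -Inf, Inf)$value

# Simulated data with the true values from the question
set.seed(1)
n <- 100
z1 <- rnorm(n)
z2 <- rnorm(n)
x1 <- 3 + sqrt(2) * z1
x2 <- 3 + sqrt(2) * (0.7 * z1 + sqrt(1 - 0.7^2) * z2)
Y <- pmax(x1, x2)

# Unconstrained parameters: theta = (mu, log(sigma2), atanh(rho))
nll <- function(theta, y) {
  -sum(dmaxnorm(y, theta[1], exp(theta[2]), tanh(theta[3]), log = TRUE))
}
start <- c(mean(Y), log(var(Y)), 0)
fit <- optim(start, nll, y = Y)
c(mu = fit$par[1], sigma2 = exp(fit$par[2]), rho = tanh(fit$par[3]))
```

The same log-density could be passed to JAGS through the zeros trick or written as a custom log-density in Stan. I would expect rho to be poorly determined with n = 100, because it enters only through the skewness of Y.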
I believe you can model this in the Stan language treating the likelihood as a two component mixture with equal weights. The Stan code could look like
data {
int<lower=1> N;
vector[N] Y;
}
parameters {
vector<upper=0>[2] diff[N];
real mu;
real<lower=0> sigma;
real<lower=-1,upper=1> rho;
}
model {
vector[2] case_1[N];
vector[2] case_2[N];
vector[2] mu_vec;
matrix[2,2] Sigma;
for (n in 1:N) {
case_1[n][1] = Y[n]; case_1[n][2] = Y[n] + diff[n][1];
case_2[n][2] = Y[n]; case_2[n][1] = Y[n] + diff[n][2];
}
mu_vec[1] = mu; mu_vec[2] = mu;
Sigma[1,1] = square(sigma);
Sigma[2,2] = Sigma[1,1];
Sigma[1,2] = Sigma[1,1] * rho;
Sigma[2,1] = Sigma[1,2];
// log-likelihood
target += log_mix(0.5, multi_normal_lpdf(case_1 | mu_vec, Sigma),
multi_normal_lpdf(case_2 | mu_vec, Sigma));
// insert priors on mu, sigma, and rho
}