Learning hidden markov model in R - r

A hidden Markov model (HMM) is one in which you observe a sequence of observations, but do not know the sequence of states the model went through to generate the observations. Analyses of hidden Markov models seek to recover the sequence of hidden states from the observed data.
I have data with both observations and hidden states (observations are of continuous values) where the hidden states were tagged by an expert. I would like to train a HMM that would be able - based on a (previously unseen) sequence of observations - to recover the corresponding hidden states.
Is there any R package to do that? Studying the existing packages (depmixS4, HMM, seqHMM - for categorical data only) allows you to specify a number of hidden states only.
EDIT:
Example:
data.tagged.by.expert = data.frame(
hidden.state = c("Wake", "REM", "REM", "NonREM1", "NonREM2", "REM", "REM", "Wake"),
sensor1 = c(1,1.2,1.2,1.3,4,2,1.78,0.65),
sensor2 = c(7.2,5.3,5.1,1.2,2.3,7.5,7.8,2.1),
sensor3 = c(0.01,0.02,0.08,0.8,0.03,0.01,0.15,0.45)
)
data.newly.measured = data.frame(
sensor1 = c(2,3,4,5,2,1,2,4,5,8,4,6,1,2,5,3,2,1,4),
sensor2 = c(2.1,2.3,2.2,4.2,4.2,2.2,2.2,5.3,2.4,1.0,2.5,2.4,1.2,8.4,5.2,5.5,5.2,4.3,7.8),
sensor3 = c(0.23,0.25,0.23,0.54,0.36,0.85,0.01,0.52,0.09,0.12,0.85,0.45,0.26,0.08,0.01,0.55,0.67,0.82,0.35)
)
I would like to create a HMM with discrete time t whrere random variable x(t) represents the hidden state at time t, x(t) {"Wake", "REM", "NonREM1", "NonREM2"}, and 3 continuous random variables sensor1(t), sensor2(t), sensor3(t) representing the observations at time t.
model.hmm = learn.model(data.tagged.by.user)
Then I would like to use the created model to estimate hidden states responsible for newly measured observations
hidden.states = estimate.hidden.states(model.hmm, data.newly.measured)

Data (training/testing)
To be able to run learning methods for Naive Bayes classifier, we need longer data set
states = c("NonREM1", "NonREM2", "NonREM3", "REM", "Wake")
artificial.hypnogram = rep(c(5,4,1,2,3,4,5), times = c(40,150,200,300,50,90,30))
data.tagged.by.expert = data.frame(
hidden.state = states[artificial.hypnogram],
sensor1 = log(artificial.hypnogram) + runif(n = length(artificial.hypnogram), min = 0.2, max = 0.5),
sensor2 = 10*artificial.hypnogram + sample(c(-8:8), size = length(artificial.hypnogram), replace = T),
sensor3 = sample(1:100, size = length(artificial.hypnogram), replace = T)
)
hidden.hypnogram = rep(c(5,4,1,2,4,5), times = c(10,10,15,10,10,3))
data.newly.measured = data.frame(
sensor1 = log(hidden.hypnogram) + runif(n = length(hidden.hypnogram), min = 0.2, max = 0.5),
sensor2 = 10*hidden.hypnogram + sample(c(-8:8), size = length(hidden.hypnogram), replace = T),
sensor3 = sample(1:100, size = length(hidden.hypnogram), replace = T)
)
Solution
In the solution, we used Viterbi algorithm - combined with Naive Bayes classifier.
At each clock time t, a Hidden Markov Model consist of
an unobserved state (denoted as hidden.state in this case) taking a finite number of states
states = c("NonREM1", "NonREM2", "NonREM3", "REM", "Wake")
a set of observed variables (sensor1, sensor2, sensor3 in this case)
Transition matrix
A new state is entered based upon a transition probability distribution
(transition matrix). This can be easily computed from data.tagged.by.expert e.g. using
library(markovchain)
emit_p <- markovchainFit(data.tagged.by.expert$hidden.state)$estimate
Emission matrix
After each transition is made, an observation (sensor_i) is produced according to a conditional probability distribution (emission matrix) which depends on the current state H of hidden.state only. We will replace emmision matrices by Naive Bayes classifier.
library(caret)
library(klaR)
library(e1071)
model = train(hidden.state ~ .,
data = data.tagged.by.expert,
method = 'nb',
trControl=trainControl(method='cv',number=10)
)
Viterbi algorithm
To solve the problem, we use Viterbi algorithm with the initial probability of 1 for "Wake" state and 0 otherwise. (We expect the patient to be awake in the beginning of the experiment)
# we expect the patient to be awake in the beginning
start_p = c(NonREM1 = 0,NonREM2 = 0,NonREM3 = 0, REM = 0, Wake = 1)
# Naive Bayes model
model_nb = model$finalModel
# the observations
observations = data.newly.measured
nObs <- nrow(observations) # number of observations
nStates <- length(states) # number of states
# T1, T2 initialization
T1 <- matrix(0, nrow = nStates, ncol = nObs) #define two 2-dimensional tables
row.names(T1) <- states
T2 <- T1
Byj <- predict(model_nb, newdata = observations[1,])$posterior
# init first column of T1
for(s in states)
T1[s,1] = start_p[s] * Byj[1,s]
# fill T1 and T2 tables
for(j in 2:nObs) {
Byj <- predict(model_nb, newdata = observations[j,])$posterior
for(s in states) {
res <- (T1[,j-1] * emit_p[,s]) * Byj[1,s]
T2[s,j] <- states[which.max(res)]
T1[s,j] <- max(res)
}
}
# backtract best path
result <- rep("", times = nObs)
result[nObs] <- names(which.max(T1[,nObs]))
for (j in nObs:2) {
result[j-1] <- T2[result[j], j]
}
# show the result
result
# show the original artificial data
states[hidden.hypnogram]
References
To read more about the problem, see Vomlel Jiří, Kratochvíl Václav : Dynamic Bayesian Networks for the Classification of Sleep Stages , Proceedings of the 11th Workshop on Uncertainty Processing (WUPES’18), p. 205-215 , Eds: Kratochvíl Václav, Vejnarová Jiřina, Workshop on Uncertainty Processing (WUPES’18), (Třeboň, CZ, 2018/06/06) [2018] Download

Related

Simulating ODE model for different initial conditions in R

I have a model, and I want to generate random initial conditions, run the model, and save the output so that each simulation is a replicate. But I have a hard time interpreting and implementing loops (and I also know they are not always the best to use in R), so I'm struggling.
My ultimate goal is to iterate the simulation across 10 different random initial conditions, and save the output of the ODE including a column for simulation number.
First I have my random initial conditions:
library(deSolve)
states <- c(r=runif(1, min=0.1, max=25), # resource state variable
c=runif(1, min=0.1, max=10)) # consumer state variable
Then I have my parameters and model:
parameters <- c(g=5, # resource growth rate )
K=25, # resource carrying capacity
a=1, # consumer attack rate
h=1, # consumer handling time
e=0.9, # consumer conversion efficiency
m=0.5, # consumer mortality rate
avgrain = 1500, # average rainfall
A = 1000,
w = 0.6,
phi = 8.5,
ropt1 = 1500, # optimal rainfall for resource growth
s1 = 1000, # standard deviation for plant growth rate as a function of rainfall
ropt2 = 1000, # optimal rainfall for herbivore attack (feeding) rate
s2 = 500, # standard deviation for herbivore attack rate as a function of rainfall
avgtemp = 20, # average temperature
A_temp = 7,
w_temp = 0.5,
phi_temp = 0.5,
topt1 = 13, # optimal temperature for resource growth
ts1 = 10 # standard deviation for plant growth rate as a function of temperature
)
model <- function(t, states, parameters) {
with(as.list(c(states, parameters)), {
# rainfall time series function
rain <- avgrain + (A*sin((w*t)+phi)) # rainfall function
# temperature time series function
temp = avgtemp + (A_temp*sin((w_temp*t)+phi_temp))
# dynamic g and a equations
dg_both <- (exp(-(rain - ropt1)^2/(s1^2))) + (exp(-(temp - topt1)^2/(ts1^2)))
da = exp(-(rain - ropt2)^2/(s2^2))
# rate of change of state variables
dr <- dg_both*r*(1-(r/K)) - ((c*da*r)/(1+(da*h*r)))
dc <- ((c*e*da*r)/(1+(da*h*r)))- c*m
# return rate of change
list(c(dr, dc), rain=rain, temp=temp, dg_both=dg_both, da=da)
})
}
times <- seq(0, 200, by = 1)
out <- ode(y = states, times = times, func = model, parms = parameters, method="lsoda")
Would I do this with a for loop? Thank you in advance!
Here one of the other approaches, mentioned by #Ben Bolker. Here we use replicate instead of a loop. This has the advantage, that we don't need to create a list() for the results beforehand.
N <- 10
res <- replicate(N, ode(y = c(r = runif(1, min = 0.1, max = 25),
c = runif(1, min = 0.1, max = 10)),
times = times, func = model,
parms = parameters, method="lsoda"),
simplify = FALSE)
plot(out, res)
As an additional goody, we can also plot the results using deSolve's built-in plotting function. This works of course also with res in Ben's approach. The resulting data structure can then be simplified to something like a matrix or array, either with do.call(rbind, res) as in Ben's example, or with option simplify directly in replicate.
Yes, a for loop will be fine. There are lots of other slightly fancier ways to do this (replicate or lapply from base R, purrr::map_dfr from tidyverse ...), but they won't save you any time or memory — they're just a slightly more elegant way to do the same thing.
set.seed(101)
N <- 10
res <- list()
for (i in 1:N) {
## pick new initial conditions
cur_states <- c(r=runif(1, min=0.1, max=25),
c=runif(1, min=0.1, max=10))
## run model and attach index column to the matrix
res[[i]] <-
cbind(run = i,
ode(y = cur_states, times = times, func = model,
parms = parameters, method="lsoda")
)
}
## combine individual runs into one long matrix
res_final <- do.call(rbind,res)

r: for loop to simulate predictions when random sampling is applied

I am trying to simulate how replacement/reassignment of values on random samples affect predictions conveyed by AUC.
I have a tumor classification in a dataframe denoted df$who which has levels 1, 2, 3 corresponding to the severity of the tumor lesion.
Intro to the question
Lets say the baseline data looks like this:
set.seed(1)
df <- data.frame(
who = as.factor(sample(1:3, size = 6000, replace = TRUE, prob = c(0.8, 0.15, 0.05))),
age = round(runif(n = 6000, min = 18, max = 95), digits = 1),
gender = sample(c("m", "f"), size = 6000, replace = TRUE, prob = c(1/3, 2/3)),
event.time = runif(n = 6000, min = 8, max = 120),
event = as.factor(sample(0:2, size = 6000, replace = TRUE, prob = c(0.25, 0.2, 0.55)))
)
And a standard cause-specific Cox regression looks like:
library(survival)
a_baseline <- coxph(Surv(event.time, event == 1) ~ who + age + gender, data = df, x = TRUE)
From which AUC can be obtained as a measure of predictive performance. Here, leave-one-out bootstrap on 5-year prediction on df$event == 1.
library(riskRegression)
u <- Score(list("baseline" = a_baseline),
Surv(event.time, event == 1) ~ 1,
data = df,
times = 60,
plots = "cal",
B = 50,
split.method = "loob",
metrics = c("auc", "brier")
)
# The AUC is then obtained
u$AUC$score$AUC[2]
Question
I want to simulate how re-classifying a random 5% of df$who == 1 to dfwho == 2 affect the 5-year prediction on df$event == 1
I want to create 10 separate and simulated subsets of the baseline data df, but each containing a random allocation of 5% df$who == 1 to .. == 2. Then, I want to apply each of these 10 separate and simulated subsets to predict the 5-year risk of df$event == 1.
I have applied a for loop to this. The expected output is dataframe that tells me which of the 10 simulated datasets yielded the highest and lowest u$AUC$score$AUC[2] (i.e., the best and worst prediction).
I am new to for loop, but here is my go (that obviously did not work).
all_auc <- data.frame() ## create a dataframe to fill in AUC from all 10 simulated sub-datasets
for(i in 1:10){ #1:10 represent the simulated datasets from 1 to 10
df[i] <- df #allocating baseline data to each of the 10 datasets
df[i]$who[sample(which(df[i]$who==1), round(0.05*length(which(df[i]$who==1))))]=2 #create the random 5% allocation of who==1 to who==2 in the i'th simulated dataset
ith_cox <- coxph(Surv(event.time, event == 1) ~ who + age + gender, data = df[i], x = TRUE) #create the i'th Cox regression based on the i´th dataset
# create the predictions based on the i´th Cox
u[i] <- Score(list("baseline" = ith_cox),
Surv(event.time, event == 1) ~ 1,
data = df[i],
times = 60,
plots = "cal",
B = 50,
split.method = "loob",
metrics = c("auc", "brier")
)
# summarize all AUC from all 10 sub-datasets
all_auc <- u[i]$AUC$score$AUC[2]
}
(1) I could not get this for loop to work as described, and
(2) the final dataframe all_auc should provide only which of the 10 datasets yielded the worst and best predictions (I will then use these two data sets for further analysis).
A final note
This is only a reproducible example. The for loop will be applied to 10.000 simulated datasets in our analysis. I do not know if this could affect the answer - but, it illustrates the importance of the result: a dataframe (or vector?) that simply tells me which simulated dataset yielded the best vs worst predictions, and that I subsequently will be able to use these two dataframes for furter analysis, eg df2930 and df8939.

R LightGBM ignores init_score when continuing training with init_model

General description of my problem
I am performing a Poisson regression using LightGBM in R.
I am using an "offset" for the training, similar to using log(time) in a GLM as the offset when modelling insurance claims because we want to ensure that expected value of the response is proportional to time. I do this using the init_score parameter within lab.train().
I am using the "continue training" option in lgb.train (where you specify a value for init_model). This is because I want to build a "stumps" model first, and then continue training with a more complex model. This is to help me identify potential interaction terms in the data. This is just for background why I am doing this - not relevant to the specific issue described below.
However, when I continue training, the offset originally specified in the first model I build is no longer used by the fitting process. I think init_model overrides any value of init_score, but init_model does NOT itself contain or allow for init_score. So, as far as I can see, the init_score is totally lost from the fitting process once you continue training using init_model.
This means that the "starting point" when continuing to train a model is not the "finishing point" from the original model build. e.g. in my example below, I want the poisson log-likelihood error metric for models 2 and 3 to "start" from where model 1 finished. This isn't the case - but surely that is what "continue training" should deliver?
I have entered comments into the code below to explain the issue more clearly.
Reproducible example
library(lightgbm)
library(data.table)
# simulate some data
# z follows a Poisson distribution
# the mean of z is given by t * exp(x+y), where t is the "time exposed to risk"
# t is uniform(0,10)
# x and y are uniform(0,1)
# I want to specify log(t) using init_score in the lightGBM
# i.e. just like Poisson regression in insurance where log(t) is the offset in a GLM or GBM
n <- 10000 # number of rows
set.seed(42)
d <- data.table(t = runif(n,0,10), x = runif(n,0,1), y = runif(n,0,1))
d[, z := rpois(n, t * exp(x+y))]
# check weighted mean looks about right
# should get actual = 2.957188 and
# underlying = 2.939975
d[, list(actual = sum(z)/sum(t),
underlying = sum(t * exp(x+y))/sum(t)),]
# build a lightGBM using 100 rounds and specify log(t) as init_score
feature_cols <- c('x','y')
dm <- as.matrix(d[, ..feature_cols])
l_train <- lgb.Dataset(dm, label=d[,z], free_raw_data = FALSE)
setinfo(l_train, "init_score", log(d$t))
params <- list(objective='poisson', metric = 'poisson')
lgbm_1 <- lgb.train(params = params,
valids = list(train = l_train),
data = l_train,
nrounds = 100,
num_leaves = 2,
bagging_fraction = 1,
bagging_freq = 1,
feature_fraction = 1,
learning_rate=0.2)
train_log_1 <- lgb.get.eval.result(lgbm_1, "train", 'poisson')
# get the model predictions and check that they are close to expected
# remember that we need to manually apply the init_score to get the prediction
# i.e. we need to add log(t) onto the raw score, or multiply the scaled prediction by t
# the predictions are all very close
d[, lgbm_predicted_1 := t*predict(lgbm_1, dm, raw_score = FALSE)]
d[, list(actual = sum(z)/sum(t),
predicted_1 = sum(lgbm_predicted_1)/sum(t),
underlying = sum(t * exp(x+y))/sum(t)),]
# save the model
lgb.save(lgbm_1, 'lgbm_1.txt')
# ATTEMPT A - CONTINUE TRAINING FROM MODEL 1
# don't change the init_score
# note iterations in console start at 101 because we are continuing training
# however, the error metric (poisson log likelihood)
# start from a totally different value to where the first model ended
lgbm_2 <- lgb.train(params = params,
init_model = 'lgbm_1.txt',
valids = list(train = l_train),
data = l_train,
nrounds = 100,
num_leaves = 2,
bagging_fraction = 1,
bagging_freq = 1,
feature_fraction = 1,
learning_rate=0.2)
train_log_2 <- lgb.get.eval.result(lgbm_2, "train", 'poisson')
# check predictions - predicted_2 are WAY TOO HIGH now!
# I think this is because lightGBM uses the predictions from the first model
# as the starting point for training
# but the predictions from model 1 DO NOT ALLOW FOR THE log(t) being the offset to the original model!
d[, lgbm_predicted_2 := t*predict(lgbm_2, dm, raw_score = FALSE)]
d[, list(actual = sum(z)/sum(t),
predicted_1 = sum(lgbm_predicted_1)/sum(t),
predicted_2 = sum(lgbm_predicted_2)/sum(t),
underlying = sum(t * exp(x+y))/sum(t)),]
# ATTEMPT B - try init_score = 0?
# doesn't seem to make any difference
# so my hypothesis is that init_score is being ignored
# and over-written by the init_model
# but... how does the original init_score ever get back into the fitting process?
# init_score + init_model is a good stating point
# init_model on it's own is not
setinfo(l_train, "init_score", rep(0, nrow(d)))
lgbm_3 <- lgb.train(params = params,
valids = list(train = l_train),
init_model = 'lgbm_1.txt',
data = l_train,
nrounds = 100,
num_leaves = 2,
bagging_fraction = 1,
bagging_freq = 1,
feature_fraction = 1,
learning_rate=0.2)
train_log_3 <- lgb.get.eval.result(lgbm_3, "train", 'poisson')
# check predictions - models 2 and 3 are identical, the init_score made no difference
d[, lgbm_predicted_3 := t*predict(lgbm_3, dm, raw_score = FALSE)]
d[, list(actual = sum(z)/sum(t),
predicted_1 = sum(lgbm_predicted_1)/sum(t),
predicted_2 = sum(lgbm_predicted_2)/sum(t),
predicted_3 = sum(lgbm_predicted_3)/sum(t),
underlying = sum(t * exp(x+y))/sum(t)),]
# compare training logs
# question - why do V2 and V3 not start from the "finishing" point of V1?
# it's because the init_model is wrong, because it doesn't allow for the init_score
logs <- data.table(v1 = train_log_1, v2 = train_log_2, v3 = train_log_3)

How can I calculate the Sobol sensitivities of the area under a curve in R?

I have written a fairly complex ordinary differential equation pharmacokinetic / pharmacodynamic model in R using the dMod package. I’ve converted this into equations readable by the ODEsensitivity package so I can get Sobol sensitivity indices for my rate constants. I can return the contribution of the rate constants to the value of my drug concentration in different compartments.
However, my desired readout is a duration of drug exposure in a compartment – this could be achieved by the amount of time for which the drug is over an effective concentration, or by an area-under-the-curve. However, I don’t understand how I can evaluate a Sobol sensitivity for any metric more complicated than the drug concentration. How can I return the Sobol indices for a different output?
I wrote the following simple toy version to simulate drug transfer between two compartments of different size to demonstrate. The code below returns a data frame of sensitivities at each point in time for ster_C1 and ster_C2. I would rather evaluate the sensitivity of ster_C1/ster_C2, or the area under the curve for ster_C1 between some arbitrary timepoints. Thanks in advance!
library(ODEsensitivity)
toymod <- function(Time, State, Pars) {
with(as.list(c(State, Pars)), {
dchem_B <- -1*(k01*chem_B) +1*(k02*chem_C)*(1/10)
dchem_C <- 1*(k01*chem_B)*(10/1) -1*(k02*chem_C)
return(list( c(dchem_B, dchem_C) ))
})
}
toyparams <- c(
chem_B = 1e-10,
chem_C = 0,
k01 = 200,
k02 = 1000
)
toyparnames <- names(toyparams)
toylower <- toyparams/10
toyupper <- toyparams*10
toyinit <- c(chem_B = 1e-10, chem_C = 0)
toytimes <- seq(5e-05, 0.005, len = 100)
set.seed(59281)
toy_sobol <- ODEsobol(mod = toymod, pars = toyparnames,
state_init = toyinit,
times = toytimes,
n = 500,
rfuncs = "runif",
rargs = paste0("min = ", toylower,
", max = ", toyupper),
sobol_method = "Martinez",
ode_method = "lsoda",
parallel_eval = TRUE,
parallel_eval_ncores = 2)

multivariate state space model dlm okuns law

I'm trying to estimate an Okun's law equation with a dlm using the dlm package in R. I can estimate the non-time varying model using nls as follows:
const_coef <- nls(formula = dur~ b1*dur_lag1 + b2*(d2lgdp-b0) + b3*d2lrulc_lag2 ,
start = list(b0 =0.1, b1=0.1, b2=0.1, b3=0.1),
data = mod_data)
the dlm model I want to be able to estimate allows for b1 and b0 in the above to follow random walks. I can do this in Eviews by declaring the measurement equation and appending the states (below is some code provided by the authors of the original paper which I can replicate:
'==========================
' SPECIFY THE KALMAN FILTER
'==========================
'Priors on state variables
vector(2) mprior
mprior(1) = 4 'Prior on starting value for trend GDP growth (annual average GDP growth over 1950s)
mprior(2) = 0 'Prior on starting value for lagged dependent variable
sym(2) vprior
vprior(1,1) = 5 'Prior on variance of trend GDP growth (variance of annual GDP growth over 1950s)
vprior(2,2) = 1 'Prior on variance of lagged dependent variable
'Specify coefficient vector
coef(8) ckf
'Declare state space
sspace ss1
ss1.append dur = lag*dur(-1) + ckf(2)*(d2lgdp-trend)+ckf(3)*D2LRULC(-2)+[var=exp(ckf(4))] 'Measurement equation
ss1.append #state trend = 1*trend(-1) + [var = exp(ckf(5))] 'State equation for trend GDP growth (random walk)
ss1.append #state lag = 1*lag(-1) + [var = exp(ckf(6))] 'State equation for lagged dependent variable (random walk)
'Apply priors to state space
ss1.append #mprior mprior
ss1.append #vprior vprior
'Set parameter starting values
param ckf(2) -0.0495 ckf(3) 0.01942 ckf(4) -2.8913 ckf(5) -4.1757 ckf(6) -6.2466 'starting values for parameters
'=====================
' ESTIMATE THE MODEL
'=====================
'Estimate state space
smpl %estsd %ested 'Estimation sample
ss1.ml(m=500,showopts) 'Estimate Kalman filter by maximum likelihood
freeze(mytab) ss1.stats
I'm really not sure how to do this with the dlm package. I've tried the following:
buildSS <- function(v){
dV <- exp(v[1]) # Variance of the measurment equation (ckf4)
dW <- c(exp(v[2]), # variance of the lagged dep (ckf6)
0, # variance of the coef on d2lgdp ckf(2) set to 0
0, # variance of the coef on d2lrulc ckf(3) set to 0
exp(v[3]) # variance of the random walk intercept (ckf5)
)
beta.vec <- c(1,v[4],v[5],1) # Params ckf(2) ckf3(3)
okuns <- dlmModReg(mod_data.tvp[,-1], addInt = TRUE, dV =dV, dW = dW, m0 = beta.vec)
}
#'Set parameter starting values
ckf4Guess <- -2.8913
ckf2guess <- -0.0495
ckf3guess <- 0.01942
ckf5guess <- -4.1757
ckf6guess <- -6.2466
params <- c(ckf4Guess,
ckf5guess,
ckf6guess,
ckf2guess,
ckf3guess)
tvp_mod.mle <- dlmMLE(mod_data.tvp[,"dur"] , parm = params, build = buildSS)
tvp_mod <- buildSS(tvp_mod.mle$par)
tvp_filter <- dlmFilter(mod_data$dur,tvp_mod)
The above code runs, but the outputs are not correct. I am not specifying the the states properly. Does anyone have any experience in building dlms with mutlvirate regression in R?
I think I have gotten to a solution - I've managed to recreate the estimates in the paper which estimates this model using Eviews (also checked this using Eviews).
#--------------------------------------------------------------------------------------------------------------------------
# tvp model full model - dur = alpha*dur(-1)+ beta(dgdp-potential) + gamma*wages
#--------------------------------------------------------------------------------------------------------------------------
# Construct DLM
OkunsDLMfm <- dlm(
FF = matrix(c(1,1,1,1),ncol = 4, byrow = TRUE),
V = matrix(1),
GG = matrix(c(1,0,0,0,
0,1,0,0,
0,0,1,0,
0,0,0,1), ncol = 4, byrow = TRUE),
W = matrix(c(1,0,0,0,
0,1,0,0,
0,0,1,0,
0,0,0,1), ncol = 4, byrow = TRUE),
JFF = matrix(c(1,2,3,0),ncol = 4, byrow = TRUE),
X = cbind(mod_data$dur_lag1,mod_data$d2lgdp, mod_data$d2lrulc_lag2), # lagged dep var, dgdp, wages.
m0 = c(0,0,0,0),
C0 = matrix(c(1e+07,0,0,0,
0,1e+07,0,0,
0,0,1e+07,0,
0,0,0,1e+07), ncol = 4, byrow = TRUE)
)
buildOkunsFM <- function(p){
V(OkunsDLMfm) <- exp(p[2])
GG(OkunsDLMfm)[1,1] <- 1
GG(OkunsDLMfm)[2,2] <- 1
GG(OkunsDLMfm)[3,3] <- 1
GG(OkunsDLMfm)[4,4] <- 1
W(OkunsDLMfm)[1,1] <- exp(p[3])
W(OkunsDLMfm)[2,2] <- 0
W(OkunsDLMfm)[3,3] <- 0
W(OkunsDLMfm)[4,4] <- exp(p[4])
m0(OkunsDLMfm) <- c(0,0,0,p[1]*4)
C0(OkunsDLMfm)[1,1] <- 1
C0(OkunsDLMfm)[4,4] <- 5
return(OkunsDLMfm)
}
okuns.estfm <- dlmMLE(y = mod_data$dur, parm = c(-0.049,-1.4,-6,-5), build = buildOkunsFM)
OkunsDLM1fm <- buildOkunsFM(okuns.estfm$par)
The time varying level, the estimate of potential output, is derived by dividing the 4 element of the state vector by the second * by negative 1.
Not sure if this is best way to specify the DLM, but the results from the model are very close to what is reported (within 0.01) of the results from using Eviews. That being said, very open to any other specifications.

Resources