Predicting future emissions from a fitted HMM in R

I've fitted an HMM to my data using the hmm.discnp package in R as follows:
library(hmm.discnp)
zs <- hmm(y=lis,K=5)
Now I want to predict the next k observations (emissions) from this model, but I am only able to get the most probable state sequence for the observations I already have, via the Viterbi algorithm.
I have t emissions already, i.e. (y(1),...,y(t)).
I want the most probable k future emissions from the fitted HMM object, i.e. (y(t+1),...,y(t+k)).
Is there a function to calculate this? If not, how do I calculate it manually?

Generating emissions from an HMM is pretty straightforward to do manually. I'm not really familiar with R, but I'll explain the steps to generate the data you ask for.
The first thing to keep in mind is that, by its Markovian nature, the HMM has no memory. At any time, only the current state is known; what happened before is "forgotten". This means that generating the sample at time t+1 depends only on the state at time t.
If you have a sequence, the first thing you can do is infer the most probable state sequence (with the Viterbi algorithm), as you did. Now you know the state that generated the last observation you have (the one you denote y(t)).
From this state, the transition matrix gives you the probability of moving to each state of the model. This is a probability mass function (pmf), and you can draw a state number from it (not by hand! R's built-in sample() function can draw from a pmf). The state number you draw is the state your system is in at time t+1.
With this information, you can now draw a sample observation from the probability distribution assigned to this new state (likewise, if it were a Gaussian distribution, you would use a Gaussian random generator, which exists in R).
From this state at time t+1, you can apply the same procedure to reach a state at time t+2, and so on.
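As a rough R sketch of this procedure (hypothetical object layout: I assume the fitted object zs exposes the transition matrix as zs$tpm and the discrete emission probabilities as zs$Rho with one column per state, and that last_state is the Viterbi-decoded state at time t):

# Hypothetical sketch: draw k future emissions from the fitted discrete HMM.
simulate_future <- function(zs, last_state, k) {
  states <- integer(k)
  emissions <- integer(k)
  s <- last_state
  for (i in seq_len(k)) {
    s <- sample(ncol(zs$tpm), size = 1, prob = zs$tpm[s, ])             # state at t+i
    emissions[i] <- sample(nrow(zs$Rho), size = 1, prob = zs$Rho[, s])  # emission at t+i
    states[i] <- s
  }
  list(states = states, emissions = emissions)
}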
Keep in mind that if you run this full procedure several times (to generate samples from time t+1 to t+k), you will end up with different results each time. This is due to the probabilistic nature of the model. I am not sure what you mean by "most probable future emissions", and I am not sure whether there are routines to compute them. You can compute the likelihood of the full sequence you obtain at the end (from 1 to t+k). It will in general be greater than the likelihood of the sequence up to t, as the last part has been generated by the model itself and thus "perfectly" fits it in some regard.

Related

Is there a numerical method for approximating the first derivative at t = 0 s in a real-time application?

I want to improve, step by step, as unevenly sampled data arrive, the estimate of the first derivative at t = 0 s. For example, suppose you want to find the initial velocity in a projectile's motion, but you do not know its final position and velocity; however, you are (slowly) receiving measurements of the projectile's current position and time.
Update - 26 Aug 2018
I would like to give you more details:
"Unevenly-sampled data" means the time intervals are not regular (irregular times between successive measurements). However, data have almost the same sampling frequency, i.e., it is about 15 min. Thus, there are some measurements without changes, because of the nature of the phenomenon (heat transfer). It gives an exponential tendency and I can fit data to a known model, but an important amount of information is required. For practical purposes, I only need to know the value of the very first slope for the whole process.
I tried a progresive Weighted Least Squares (WLS) fitting procedure, with a weight matrix such as
W = diag((0.5).^(1:kk)); % where kk is the last measurement id
But it was using preprocessed data (i.e., jitter-removing, smoothing, and fitting using the theoretical functional). I gave me the following result:
[Figure: a real example of the problem and its "current solution"]
It is good for me, but I would like to know if there is an optimal way of doing this, employing the raw data (or smoothed data).
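As a hypothetical R equivalent of that weighting (with t and y holding the raw measurement times and values, and a straight line standing in for the theoretical functional):

kk <- length(y)                    # number of measurements so far
w  <- 0.5^(1:kk)                   # geometric down-weighting of later samples
fit_wls <- lm(y ~ t, weights = w)  # weighted least-squares fit
slope0 <- coef(fit_wls)[2]         # current estimate of the slope at t = 0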
IMO, additional data are not relevant for improving the estimate at zero, because perturbations come into play and the correlation between the first and last samples decreases.
Also, the asymptotic behaviour of the phenomenon is probably not known rigorously (is it truly a first-order linear model?), and this can introduce a bias.
I would stick to the first points (say up to t = 20) and fit a simple model, say a quadratic.
If in fact what you are trying to do is fit a first-order linear model to the data, then least-squares fitting on the raw data is fine. If there are significant outliers, robust fitting is preferable.
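As a minimal R sketch of the quadratic suggestion (assuming vectors t and y hold the raw measurements):

early <- t <= 20                                      # keep only the first points
fit <- lm(y[early] ~ poly(t[early], 2, raw = TRUE))   # quadratic fit
slope_at_zero <- coef(fit)[2]   # derivative of b0 + b1*t + b2*t^2 at t = 0 is b1
# robust alternative if there are outliers:
# fit <- MASS::rlm(y[early] ~ poly(t[early], 2, raw = TRUE))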

Simulating returns from ARMA(1,1) - MCsGARCH(1,1) model

How can I find expected intraday return of ARMA(1,1) - MCsGARCH(1,1) Model in R?
The sample code of the model is available at http://www.unstarched.net/2013/03/20/high-frequency-garch-the-multiplicative-component-garch-mcsgarch-model/
I think you are mixing something up here. There is no "expected intraday return"; for the ARMA(1,1)-MCsGARCH(1,1) there is only an estimate of the volatility of the following period/day (sigma, as you've already noticed in the comments).
I assume you are referring to the last plot on the website you provided; that would mean you want the VaR (Value-at-Risk), which is calculated from the volatility produced by the estimation procedure.
If you look at the code that was used to produce the plot:
D = as.POSIXct(rownames(roll@forecast$VaR))
VaRplot(0.01, actual = xts(roll@forecast$VaR[, 3], D), VaR = xts(roll@forecast$VaR[, 1], D))
You can see that the VaR (and the returns) were taken from the object roll. After you've run the simulation (without changing any variable names from the example), you can store them in variables for later use like this:
my_VaR = roll@forecast$VaR[, 1]
my_act = roll@forecast$VaR[, 3]
where [, 1] selects the first list element of VaR. If you check str(roll), you will see near the end that:
Element 1: stands for the alpha(1%) VaR
Element 2: stands for the alpha(5%) VaR and
Element 3: stands for the realized return.
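With those columns extracted, a quick hypothetical check of the forecasts against the realized returns could be:

sum(my_act < my_VaR)                   # number of days the loss exceeded the 1% VaR
sum(my_act < my_VaR) / length(my_act)  # exceedance rate, should be near 0.01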
To address what you said in your comment ("I want to compare the expected return and the actual return"): have a look at the variable df (generated from as.data.frame(roll)); it may include what you are looking for.
This seems to drift more in the direction of Cross Validated, but I'll try to give a brief outline.
GARCH models are primarily used for volatility forecasting and for learning about the volatility dynamics of a time series (and/or the correlation dynamics in multivariate models). Since variance is a second moment, i.e. a squared quantity, it is always positive. But are returns always positive? Of course they are not. This means the volatility forecast gives us an idea of the magnitude of next period's return, but at that point we don't know whether it will be a positive or a negative return. That's where the Value-at-Risk (VaR) comes into play.
Take, e.g., a portfolio manager who owns one asset. With a GARCH model he can predict the volatility of the next period (say he uses a daily return series; then that would be tomorrow). Traders monitor the risk of their portfolio much more closely than its potential gains. So with the volatility forecast he can make a good guess about the risk of his asset losing value tomorrow. A 95% VaR of, say, 1,000 EUR means that with 95% probability the loss tomorrow will not exceed 1,000 EUR. A higher confidence level comes with a larger threshold, so a 99% VaR will be higher, e.g. 1,500 EUR.
To wrap this up: there is no "expected" return; there is only a volatility forecast for tomorrow, which gives an inclination (never certainty) of how tomorrow's return could turn out. With the VaR, this can be used for risk management. This is what is being done in the last part of the article you provided.
What is the difference between the ugarchsim and ugarchroll functions?
You can check the documentation of the rugarch package; every function and its properties are explained there in more detail. At a quick glance, I would say ugarchsim is used if you want to fit a model to a complete time series; the last standard deviation is then the forecast for the next period. The documentation for ugarchroll says:
ugarchroll-methods {rugarch}: Univariate GARCH Rolling Density Forecast and Backtesting
Description: Method for creating a rolling density forecast from ARMA-GARCH models, with the option of refitting every n periods and with parallel functionality. It is used for forecasting as well as for backtesting.
This is if you want to test how your model would have performed in the past. It takes, e.g., only the first 300 data points provided and gives the forecast for data point 301. Then the VaR (95% or 99%) is compared with the realized return of data point 301. Then the model is refitted, giving a forecast for data point 302, and so on.
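A hypothetical minimal call for such a backtest (using a plain sGARCH variance model for brevity, since the article's mcsGARCH spec additionally needs intraday seasonality inputs; returns stands for your return series):

library(rugarch)
# ARMA(1,1) mean model with a GARCH(1,1) variance model
spec = ugarchspec(variance.model = list(model = "sGARCH"),
                  mean.model = list(armaOrder = c(1, 1)))
# roll through the series, refitting every 25 observations
roll = ugarchroll(spec, data = returns, n.start = 300,
                  refit.every = 25, refit.window = "moving",
                  calculate.VaR = TRUE, VaR.alpha = c(0.01, 0.05))
report(roll, type = "VaR", VaR.alpha = 0.01, conf.level = 0.95)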
Edit: added answers to the questions from the comments.

Update Stanfit object with new data

I am generating data via API calls, one data point at a time. I want to feed each point to a Stan model, save the updated model, and discard the data point.
Is this possible with Stan?
If so, how do you deal with group-level parameters? For example, if my model has J group-level parameters, but I'm only inputting one data point at a time, will this not generate an error?
I think your problem can be conceptualized as Bayesian updating. In other words, your beliefs about the parameters are currently represented by some joint distribution; then you get one more data point, and you want to update your beliefs in light of this data point. And then repeat that process.
If so, then you can write a Stan model that has only one data point, but you need some way of representing your current beliefs with a probability distribution to use as the prior. This would typically be done with some multivariate normal distribution on the parameters in the unconstrained space. You can use the unconstrain_pars function in the rstan package to obtain a matrix of unconstrained posterior draws and then see what multivariate normal it is close to. You probably want to use some shrunken covariance estimator for the multivariate normal if you have a lot of parameters. Then, in your Stan program, use a multivariate normal prior on the parameters and do whatever transformations you need to get transformed parameters in the constrained space (many such transformations are documented in the Stan User Manual).
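A rough rstan sketch of that idea (fit is assumed to be the current stanfit object; the exact reshaping depends on your parameter shapes):

library(rstan)
post <- extract(fit, permuted = TRUE)   # posterior draws as a named list
post$lp__ <- NULL                       # lp__ is not a model parameter
n_draws <- nrow(as.matrix(fit))
# map each draw (as a named list) into the unconstrained space
unc <- t(sapply(seq_len(n_draws), function(i) {
  draw <- lapply(post, function(p) if (is.matrix(p)) p[i, ] else p[i])
  unconstrain_pars(fit, pars = draw)
}))
mu    <- colMeans(unc)   # mean of the approximating multivariate normal
Sigma <- cov(unc)        # consider a shrunken estimator with many parameters
# mu and Sigma then parameterize the multivariate normal prior (on the
# unconstrained scale) in the next one-data-point Stan program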
It is true that when you estimate a hierarchical model with only one data point, that data point carries essentially no information about the groups it is not in. However, in that case, the margins of the posterior distribution for the parameters of the omitted groups will be essentially the same as the prior distribution. That is fine.

HMM training with multiple observations and the mhsmm package in R

I want to train a new HMM using Poisson observations, which are the only thing I know.
I'm using the mhsmm package for R.
The first thing that bugs me is the initialization of the model; in the examples it is:
J <- 3                                # number of hidden states
initial <- rep(1/J, J)                # uniform initial state probabilities
P <- matrix(1/J, nrow = J, ncol = J)  # uniform transition matrix
b <- list(lambda = c(1, 3, 6))        # starting Poisson emission parameters
model <- hmmspec(init = initial, trans = P, parms.emission = b, dens.emission = dpois.hsmm)
In my case I don't have initial values for the emission distribution parameters; those are exactly what I want to estimate. How?
Secondly: if I only have observations, how do I pass them to
h1 = hmmfit(list_of_observations, model ,mstep=mstep.pois)
in order to obtain the trained model?
In the examples, list_of_observations contains a vector of states, one of observations, and one of observation sequence lengths, and is usually obtained by simulating from the model:
list_of_observations = simulate(model, N, rand.emis = rpois.hsmm)
EDIT: I found this old question with an answer that partially solved my problem:
MHSMM package in R-Input Format?
These two lines did the trick:
train <- list(x = data.df$sequences, N = N)
class(train) <- "hsmm.data"
where data.df$sequences is the array containing all observation sequences and N is the array containing the number of observations in each sequence.
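For example, starting from a plain list of observation vectors (a hypothetical obs_list), these could be built as:

obs_list <- list(c(2, 0, 1, 4), c(3, 3, 5))   # your Poisson observation sequences
train <- list(x = unlist(obs_list), N = sapply(obs_list, length))
class(train) <- "hsmm.data"
h1 <- hmmfit(train, model, mstep = mstep.pois)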
Still, the initial model is totally random, but I guess this is how it is meant to be, since it will be re-estimated. Am I right?
The problem of initialization is critical not only for HMMs and HSMMs, but for all learning methods based on a form of the Expectation-Maximization (EM) algorithm. EM converges to a local optimum of the likelihood of the data under the model, but it is not guaranteed to reach the global optimum.
Goal: find estimates of the emission distribution parameters (the same applies to the initial probabilities and the transition matrix).
Algorithm: needs an initial estimate to start the optimisation from.
You: have to provide an initial "guess" of the parameters.
This may seem confusing at first, but the EM algorithm needs a point from which to start the optimisation. It then iterates and gives you a better estimate than your own initial guess (the re-estimation you mentioned). It is not able to find the best parameters on its own, without being initialised.
In my experience, there is no general way of initialising the parameters that guarantees convergence to the global optimum; it depends on the case at hand. That is why initialisation plays a critical role (mostly for the emission distribution).
What I usually do in such a case is split the training data into different groups (e.g. percentiles of a certain parameter in the set), estimate the parameters on these groups, and then use them as initial parameter estimates for the EM algorithm. Basically, you have to try different methods and see which one works best.
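A hypothetical sketch of this for the Poisson case above (obs holding all pooled observation values):

J <- 3
grp <- cut(obs, breaks = quantile(obs, probs = seq(0, 1, length.out = J + 1)),
           include.lowest = TRUE, labels = FALSE)   # J quantile groups
lambda_init <- as.numeric(tapply(obs, grp, mean))   # one starting lambda per state
b <- list(lambda = lambda_init)                     # use instead of a blind guess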
I'd also recommend searching the literature for similar problems that have been solved with HMMs, and trying their initialisation method.

Simple algorithm for online outlier detection of a generic time series

I am working with a large number of time series.
These time series are basically network measurements arriving every 10 minutes; some of them are periodic (e.g. bandwidth), while others aren't (e.g. the amount of routing traffic).
I would like a simple algorithm for online "outlier detection". Basically, I want to keep the whole historical data for each time series in memory (or on disk), and I want to detect any outlier in a live scenario (each time a new sample is captured).
What is the best way to achieve these results?
I'm currently using a moving average to remove some noise, but then what? Simple things like standard deviation, MAD, ... against the whole data set don't work well (I can't assume the time series are stationary), and I would like something more "accurate", ideally a black box like:
double outlier_detection(double* vector, double value);
where vector is the array of doubles containing the historical data, and the return value is the anomaly score for the new sample "value".
This is a big and complex subject, and the answer will depend on (a) how much effort you want to invest and (b) how effective you want your outlier detection to be. One possible approach is adaptive filtering, which is typically used for applications like noise-cancelling headphones. You have a filter that constantly adapts to the input signal, effectively matching its filter coefficients to a hypothetical short-term model of the signal source, thereby minimising the mean squared error of the output. The output (the residual error) then stays at a low level, except when you get an outlier, which produces a spike that is easy to detect (threshold). Read up on adaptive filtering, LMS filters, etc., if you're serious about this kind of technique.
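To give the flavour, here is a hypothetical normalised-LMS sketch in R: an adaptive linear predictor forecasts each sample from the previous p samples, and the score is the residual relative to its running scale:

lms_outlier_scores <- function(x, p = 8, mu = 0.01) {
  w <- numeric(p)                      # filter coefficients, adapted online
  scale_est <- 1e-6                    # running estimate of residual magnitude
  scores <- rep(NA_real_, length(x))
  for (t in (p + 1):length(x)) {
    u <- x[(t - 1):(t - p)]            # most recent p samples
    e <- x[t] - sum(w * u)             # prediction error (residual)
    scores[t] <- abs(e) / scale_est    # large spike => likely outlier
    scale_est <- 0.99 * scale_est + 0.01 * abs(e)
    w <- w + mu * e * u / (sum(u^2) + 1e-12)   # normalised LMS update
  }
  scores
}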
I suggest the scheme below, which should be implementable in a day or so (an R sketch follows after the list):
Training
Collect as many samples as you can hold in memory
Remove obvious outliers using the standard deviation for each attribute
Calculate and store the correlation matrix and also the mean of each attribute
Calculate and store the Mahalanobis distances of all your samples
Calculating "outlierness":
For the single sample of which you want to know its "outlierness":
Retrieve the means, correlation matrix and Mahalanobis distances from training
Calculate the Mahalanobis distance "d" for your sample
Return the percentile in which "d" falls (using the Mahalanobis distances from training)
That will be your outlier score: 100% is an extreme outlier.
PS. In calculating the Mahalanobis distance, use the correlation matrix, not the covariance matrix. This is more robust if the sample measurements vary in unit and number.
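A hypothetical R sketch of the whole scheme (train being an n x d matrix of samples; attributes are standardised first so the correlation matrix can be used, per the PS):

train_outlier_model <- function(train) {
  mu  <- colMeans(train)
  sdv <- apply(train, 2, sd)
  z   <- scale(train, center = mu, scale = sdv)   # standardise attributes
  R   <- cor(train)                               # correlation matrix
  d   <- sqrt(mahalanobis(z, center = rep(0, ncol(z)), cov = R))
  list(mu = mu, sdv = sdv, R = R, d = d)
}

outlier_score <- function(m, x) {
  z <- (x - m$mu) / m$sdv
  d <- sqrt(mahalanobis(rbind(z), center = rep(0, length(z)), cov = m$R))
  mean(m$d <= d)   # percentile of training distances; near 1 = extreme outlier
}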
