I'm trying to detect anomalies in new data relative to training data.
First, I train a model on a given time series using the forecast package:
train <- forecast(ts(sin((2*pi)*seq(from=0,to=10,by=0.01)),frequency=100))
Then, once I get a new time series, I try to see how well it fits the trained data, and thereby find anomalies.
Currently I'm using the accuracy function, which doesn't seem to be the right tool for the job:
test <- ts(sin((2*pi)*seq(from=0,to=20,by=0.01))+sin((3*pi)*seq(from=0,to=20,by=0.01)),frequency=100)
accuracy(train,test)
I have also thought of analyzing the residuals of the new dataset under the trained model.
Does anyone have any good ideas on how to approach this task?
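To make the residual idea concrete, here is the kind of check I have in mind (a sketch only; it swaps in auto.arima and forecast::Arima's model= argument to refit the trained model to new data, rather than the plain forecast() call above):

# fit a model to the training series
library(forecast)
train_ts <- ts(sin((2*pi)*seq(from=0,to=10,by=0.01)), frequency=100)
fit <- auto.arima(train_ts)

# new series to screen for anomalies
test_ts <- ts(sin((2*pi)*seq(from=0,to=20,by=0.01)) +
              sin((3*pi)*seq(from=0,to=20,by=0.01)), frequency=100)

# apply the trained model to the new data without re-estimating coefficients
refit <- Arima(test_ts, model = fit)
res <- residuals(refit)

# flag points whose residual is large relative to the training noise level
threshold <- 3 * sd(residuals(fit))
anomalies <- which(abs(res) > threshold)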
Suppose I fit an AR(p) model using R's arima function from the stats package, estimated on a sample x_1, ..., x_n. In theory, when predicting x_{n+1}, this model needs access to x_n, ..., x_{n-p+1}.
How does the model know which observation I want to predict? What if I actually wanted to predict x_n based on x_{n-1}, ..., x_{n-p}, and how would my code differ in that case? Can I make in-sample forecasts, similar to Python's functionality?
If my questions imply that I am thinking about forecasting in the wrong way, please kindly correct my understanding of the subject.
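To illustrate what I mean by in-sample forecasts, on simulated data (not my actual series): each residual from arima is x_t minus its one-step-ahead prediction from the preceding observations, so x minus the residuals recovers the in-sample one-step predictions.

set.seed(1)
x <- arima.sim(list(ar = c(0.5, -0.3)), n = 200)  # toy AR(2) sample
fit <- arima(x, order = c(2, 0, 0))

# out-of-sample: x_{n+1} predicted from the last p observations
predict(fit, n.ahead = 1)$pred

# in-sample: one-step-ahead prediction of every observed x_t
in_sample <- x - residuals(fit)
head(in_sample)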
I am working with the R programming language. In particular, I am using Markov switching models to model more complicated datasets with varying degrees of volatility. For this problem, I am using the MSwM package in R:
https://cran.r-project.org/web/packages/MSwM/MSwM.pdf
https://cran.r-project.org/web/packages/MSwM/vignettes/examples.pdf
I am following the first example from the vignette:
# load library
library(MSwM)
# load data
data(example)
# plot data
plot(ts(example))
# fit basic linear model
mod <- lm(y ~ x, data = example)
# fit Markov switching model (k = 2 regimes, AR order p = 1, all coefficients switching)
mod.mswm <- msmFit(mod, k = 2, p = 1, sw = c(TRUE, TRUE, TRUE, TRUE),
                   control = list(parallel = FALSE))
The above code successfully creates the model, and 2 regimes have been identified.
Question: I checked the documentation of this package, and there do not seem to be any functions that allow you to predict future values of this time series. I did some research, and it seems like a Markov switching model should allow you to predict future values of a time series, e.g.: https://stats.stackexchange.com/questions/90448/how-to-forecast-a-markov-switching-model
However, there does not seem to be a straightforward way to do this in R. Can someone please suggest how to resolve this problem?
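For illustration, this is the kind of one-step-ahead calculation the linked thread describes, pieced together by hand from the fitted object. The slot names and the transition-matrix orientation here are my assumptions, not documented API guarantees, so treat it as a sketch:

# one-step-ahead point forecast from an MSwM fit; assumes
# (a) fit@transMat[i, j] = P(regime i next | regime j now)
#     (use t(fit@transMat) if the orientation is the other way), and
# (b) fit@Coef has one row per regime with columns "(Intercept)",
#     "x" and "y_1" (the AR(1) coefficient)
one_step_mswm <- function(fit, x_next, y_last) {
  # filtered regime probabilities at the last observed time point
  p_now <- fit@Fit@filtProb[nrow(fit@Fit@filtProb), ]
  # propagate one step through the transition matrix
  p_next <- as.vector(fit@transMat %*% p_now)
  # regime-specific conditional means
  mu <- fit@Coef[, "(Intercept)"] + fit@Coef[, "x"] * x_next +
        fit@Coef[, "y_1"] * y_last
  # probability-weighted point forecast
  sum(p_next * mu)
}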
Thanks
I used the forecast package to forecast a daily time series of a variable Y using its lagged values and a time series of an external parameter X. I found the nnetar model (a NARX model) was the best in terms of overall performance. However, I was not able to predict the peaks of the time series well, despite various attempts at parameter tuning.
I then extracted the peak values of Y above a threshold (of course, this is not a regular time series anymore) and the corresponding X values, and tried to fit a regression model (note: not an autoregression model) using various models in the caret package. I found that the prediction of the peak values using the brnn (Bayesian regularized neural network) model, using just the X values, is better than that of nnetar, which uses both lagged values and X values.
Now my question is how I go from here to create an ensemble of these two models: whenever the prediction from the brnn regression model (or any other regression model) is better, I want it to replace the nnetar prediction and move forward (I am mostly concerned about the peaks). Is this a commonly used approach?
Instead of trying to pick the one model that would be superior at any given time, it is typically better to average the models, in order to include as many individual views as possible.
In the experiments I have been involved in, where we tried to pick the one model that would outperform based on historical performance, a simple average typically proved as good or better. This is in line with the typical results on this problem: https://otexts.com/fpp2/combinations.html
So, before you go more advanced by trying to pick a specific model based on previous performance, or by using a weighted average, consider a simple average of the two models, as sketched below.
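For instance, a minimal sketch of such an average on simulated data (all of the data and model objects here are illustrative stand-ins, not taken from the question):

library(forecast)
library(brnn)  # any regression model would do here

set.seed(1)
n <- 300
x <- rnorm(n)                                   # external regressor X
y <- ts(5 * x + arima.sim(list(ar = 0.7), n))   # target Y
x_future <- rnorm(12)                           # X over the forecast horizon

# model 1: NARX-style neural net on lags of y plus x
fit_nn <- nnetar(y, xreg = x)
fc_nn <- as.numeric(forecast(fit_nn, xreg = x_future)$mean)

# model 2: plain regression of y on x (no lags)
fit_reg <- brnn(as.matrix(x), as.numeric(y))
fc_reg <- as.numeric(predict(fit_reg, as.matrix(x_future)))

# simple average ensemble
fc_avg <- (fc_nn + fc_reg) / 2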
If you want to continue with some sort of selection/weighted averaging, have a look at the FFORMA package in R: https://github.com/pmontman/fforma
I have not tried that specific package (yet), but I have seen promising results in my tests using the original m4metalearning package.
I am working on an LDA model with textmineR, have calculated coherence and log-likelihood measures, and have optimized my model.
As a last step, I would like to see how well the model predicts topics on unseen data. Thus, I am using the predict() function from the textmineR package in combination with Gibbs sampling on my test-set sample.
This results in predicted "theta" values for each document in my test-set sample.
While I have read in another post that perplexity calculations are not available with the textmineR package (see this post: How do I measure perplexity scores on an LDA model made with the textmineR package in R?), I am now wondering what the purpose of the predict function is, then. Especially with a large dataset of over 100,000 documents, it is hard to assess just visually whether the prediction has performed well or not.
I do not want to use perplexity for model selection (I am using coherence/log-likelihood instead), but as far as I understand, perplexity would help me understand how good the prediction is and how "surprised" the model is by new, previously unseen data.
Since this does not seem to be available for textmineR, I am not sure how to assess the model's predictions. Is there anything else I could use to measure the prediction quality of my textmineR model?
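The only workaround I can think of is computing a rough held-out perplexity by hand from the predicted theta and the model's phi. A sketch, where the object names are hypothetical and I assume the test DTM's columns line up with the columns of model$phi:

# model <- FitLdaModel(dtm = dtm_train, k = 20, ...)
# theta_test <- predict(model, dtm_test, method = "gibbs",
#                       iterations = 200, burnin = 175)

# predicted word distribution per test document: theta_d %*% phi
p_words <- theta_test %*% model$phi                 # docs x vocabulary
ll <- sum(as.matrix(dtm_test) * log(p_words + 1e-12))
perplexity <- exp(-ll / sum(dtm_test))              # lower = less "surprised"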
Thank you!
I have found two ways of training and testing hts models, and I am not sure which one is appropriate.
1. In this question (link given)
MASE Extraction Hierarchical Data ('hts' and 'forecast' packages R)
you divided the time series into train and test sets and then applied hts.
2. In this question (link provided)
Hierarchical Time Series
you applied hts to the entire time series and then split it into train and test sets.
My question is: which of these approaches is correct? Please help.
Also, I would like to know how I can check the accuracy of the forecasts I make; that is, how far into the future a forecast from the hts model I choose would remain statistically appropriate. Any pointers to articles or an example would be helpful. I would like to determine whether I should forecast 4, 8, or 12 weeks ahead.
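To show what I mean, here is a sketch of option 1 (split first, then forecast the training hts) evaluated at those horizons, on a made-up toy hierarchy; all names and numbers are illustrative:

library(hts)

set.seed(42)
# toy hierarchy: 4 bottom-level weekly series, 2 years long
bts <- ts(matrix(rnorm(104 * 4, mean = 10), ncol = 4), frequency = 52)
y <- hts(bts, nodes = list(2, c(2, 2)))

# split first, then forecast the training hts
train <- window(y, end = c(1, 52))    # year 1
test <- window(y, start = c(2, 1))    # year 2, held out

# compare forecast accuracy at the candidate horizons (weeks ahead)
for (h in c(4, 8, 12)) {
  fc <- forecast(train, h = h, method = "bu", fmethod = "arima")
  acc <- accuracy.gts(fc, window(test, end = c(2, h)))
  print(acc["MASE", ])
}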
Thanks in advance