Forecasting based on MAD or MAPE in R

I'm a student learning forecasting and a beginner in R.
I'm forecasting seasonal data with Holt-Winters, but I want R to optimize alpha, beta and gamma based on a criterion other than MSE (maybe MAD or MAPE).
1) Is there a way HoltWinters() or hw() can do this by themselves?
2) Even better, is there a way R can optimize the parameters based on a different function or equation that I choose?
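Neither HoltWinters() nor hw() lets you swap out the fitting criterion, but you can optimise the smoothing parameters yourself against any loss you like. Below is a minimal sketch, assuming y is your seasonal ts object with strictly positive values; replace the MAPE objective with mean(abs(actual - f)) to use MAD instead:
# objective: one-step-ahead in-sample MAPE for a given (alpha, beta, gamma)
hw_mape <- function(par, y) {
  fit <- HoltWinters(y, alpha = par[1], beta = par[2], gamma = par[3])
  f <- fitted(fit)[, "xhat"]                      # fitted values (first cycle dropped)
  actual <- window(y, start = start(fitted(fit))) # align actuals with fitted span
  mean(abs((actual - f) / actual)) * 100
}
opt <- optim(c(0.3, 0.1, 0.1), hw_mape, y = y, method = "L-BFGS-B",
             lower = rep(1e-4, 3), upper = rep(0.9999, 3))
best_fit <- HoltWinters(y, alpha = opt$par[1], beta = opt$par[2],
                        gamma = opt$par[3])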

tsCV() function versus train/test split for time-series in R

The frequency of the data is monthly and it is stationary. I have an ARIMA model built with auto.arima; a couple of tests (ACF, ADF, etc.) were applied to the data before creating the model.
y is my monthly time series object, created with the ts() function:
myarima <- auto.arima(y, stepwise = FALSE, approximation = FALSE, trace = TRUE)
Then I use the forecast() function:
forecast <- forecast(myarima, h = 10)
autoplot(forecast)
For this case I did not create train and test sets, because my data fluctuates at the end: if I made a train/test split with the test set equal to the forecast horizon (the last 10 months), the model would never see those final fluctuations, since they would sit in the test set. It would be great to be enlightened about K-fold cross-validation to avoid this kind of scenario.
Without a train/test split, after creating the model and visualizing the forecast, I went for tsCV():
myforecast_arima <- function(x, h) {
  # note: stepwise/approximation are arguments of auto.arima(), not forecast()
  forecast(auto.arima(x, stepwise = FALSE, approximation = FALSE), h = h)
}
error_myarima <- tsCV(y, myforecast_arima, h = 10)
mean(error_myarima^2, na.rm = TRUE)  # MSE over all horizons
This gives a fairly low MSE of around 0.30.
So my questions are: is this a trustworthy way to evaluate ARIMA models before deployment, or should I use a train/test split instead? Which would you prefer in general? Should I use any other method? How can I determine the window parameter of tsCV()? If my pathway is correct, how can I improve it? And what are the biggest differences between K-fold CV and tsCV()?
Thank you!
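Regarding the window parameter: a hedged sketch of a rolling-origin variant of the evaluation above, where each model is refit on a fixed-width window (the 60 months here is an arbitrary choice) instead of an ever-growing sample:
# rolling-origin CV with a fixed 60-month estimation window (illustrative)
error_rolling <- tsCV(y, myforecast_arima, h = 10, window = 60)
# MSE by forecast horizon (tsCV returns one column per step h = 1..10)
colMeans(error_rolling^2, na.rm = TRUE)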

Student-t distribution for portfolio optimization in R

How can I use a Student-t distribution for portfolio optimization in R?
I would fit the data via the estimated parameters and then feed the resulting distribution into a portfolio optimization package.
From the beginning: I'm trying to do a portfolio optimization via the Entropy Pooling approach by Meucci. As a basis (reference model) I would like to use historical data fitted by a multivariate skewed t-distribution.
Basics: the Entropy Pooling approach builds on Black-Litterman. Simply put, you can incorporate views (absolute or relative) into your model/portfolio optimization. The difference compared to BL is that you can use a non-normal distribution (not even returns), non-linear views, and views on a variety of parameters (returns, correlation, standard deviation, etc.). Therefore you can put any data into your model as a reference model. The following step is to blend this model with your individually selected views.
So now I have a distribution object, but how do I get the distribution into my optimizer (optimize.portfolio from the 'PortfolioAnalytics' package)? The requirement there is "an xts, vector, matrix, data frame, timeSeries or zoo object of asset returns". The gap in my knowledge is the transition from the distribution to the new data set.
Thanks in advance!
My code follows:
# fit a multivariate skew-t to the return matrix (first column dropped)
return_distribution <- sn::mst.mple(y = returns[, -1])
xi    <- c(return_distribution[['dp']]$beta)
omega <- return_distribution[['dp']]$Omega
alpha <- return_distribution[['dp']]$alpha
df    <- return_distribution[['dp']]$nu
marketDistribution <- BLCOP::mvdistribution('mst', xi = xi, Omega = omega,
                                            alpha = alpha, nu = df)
You should look into scenario optimisation; see e.g. https://quant.stackexchange.com/questions/31818/optimize-portfolio-of-non-normal-binary-return-assets/31847#31847 . For an implementation in R, see for instance https://quant.stackexchange.com/questions/42339/target-market-correlation-for-long-short-equity-portfolio/50622#50622 (though it does not use PortfolioAnalytics).
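One way to close the gap the question describes is scenario generation: simulate returns from the fitted skew-t and pass the simulated panel to optimize.portfolio() in place of historical data. A minimal sketch, reusing the parameters fitted above (the 10,000 draws and the dummy daily index are arbitrary choices):
library(sn)
library(xts)
set.seed(42)
# draw return scenarios from the fitted multivariate skew-t
sims <- sn::rmst(n = 10000, xi = xi, Omega = omega, alpha = alpha, nu = df)
colnames(sims) <- colnames(returns[, -1])
# optimize.portfolio() wants an xts-like object, so attach a dummy index
sim_returns <- xts(sims, order.by = seq(Sys.Date(), by = "day", length.out = 10000))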

How to build a model for temperature-outcome using dlm?

I have a dataset containing information about weather, air pollution and health outcomes. I want to regress temperature (T) and lagged temperature (T1) against cardiac deaths (CVD). I have previously used a GLM in R with the following script:
# for mean daily temperature and temperature lags separately
modelT <- glm(cvd ~ T, data = datapoisson, family = poisson(link = "log"),
              na.action = na.omit)
I get the effect estimates and standard errors, which I convert to risk ratios.
Now I want to use a dynamic linear model (or distributed lag model) to check the predictor-outcome and lagged-predictor-outcome associations. However, I can't find a script for running such a model in R.
I installed the dlm package in R, but I still can't figure out how to build a model with it.
I would appreciate it if someone could help with this.
Could you try least squares multiple regression to predict the outcome? I used that method when I tried to 'predict' which factors influenced power in a floating offshore wind turbine. It is good for correlating multiple parameters.
It fits a plane to a set of points, but it seems like a similar idea.
https://math.stackexchange.com/questions/99299/best-fitting-plane-given-a-set-of-points
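If you still want a dynamic regression in the dlm package, here is a hedged sketch: CVD regressed on temperature and its lag with time-varying coefficients. Note that dlm assumes Gaussian observations, so log counts stand in for the Poisson outcome; variable names follow the question and the variance initialisations are arbitrary starting points:
library(dlm)
X <- cbind(datapoisson$T, datapoisson$T1)   # temperature and its one-day lag
build <- function(par) {
  # regression DLM: intercept + 2 covariates = 3 states, hence 3 dW terms
  dlmModReg(X, dV = exp(par[1]), dW = exp(par[2:4]))
}
fit  <- dlmMLE(log(datapoisson$cvd + 1), parm = rep(0, 4), build = build)
mod  <- build(fit$par)
filt <- dlmFilter(log(datapoisson$cvd + 1), mod)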

How do I decide between different forecasting model families to automate forecasting for 150 time series?

I have weekly time series data for multiple departments (retail domain), and based on some research I am automating the process of finding model parameters for each time series. So far, I have implemented the following models for each time series in a for loop:
1) ARIMA (auto.arima in R)
2) stlf (cannot use R's ets function since I have weekly data)
3) TBATS
4) Regression on ARIMA errors (using fourier terms)
5) Baseline models: naive & mean
I want to understand how to choose a model for each time series. I have multiple approaches in mind:
1) Choose the model with the lowest RMSE on the test data (risk: overfitting to the test data)
2) Choose the model with the lowest RMSE based on time series cross-validation (tsCV)
3) Choose one family of models for all the time series, based on which family gives the lowest average RMSE across all the time series.
Are there ways I can improve my approach? Any disadvantages to the above approaches? Any better approach?
Thanks a lot!
Forecast your data with all the forecasting methods mentioned above, then calculate the MAPE and check which model gives the best results; use that model to forecast your data. A sketch of this loop follows below.
Also try different data transformations (log, inverse, etc.) for your input data.
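A minimal sketch of that loop for a single series y (the 13-week holdout and the candidate list are illustrative; subset() here is forecast's subset.ts):
library(forecast)
train <- subset(y, end = length(y) - 13)    # hold out the last 13 weeks
test  <- subset(y, start = length(y) - 12)
fits <- list(
  arima = forecast(auto.arima(train), h = 13),
  stlf  = stlf(train, h = 13),
  tbats = forecast(tbats(train), h = 13),
  naive = naive(train, h = 13)
)
# MAPE on the holdout for each candidate; pick the smallest
mapes <- sapply(fits, function(f) accuracy(f, test)["Test set", "MAPE"])
mapes[which.min(mapes)]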

Multivariate time series model using MARSS package (or maybe dlm)

I have two temporal processes. I would like to see whether one process (X_{t,2}) can be used to improve forecasts of the other (X_{t,1}). I have multiple sources providing temporal data on X_{t,2} (e.g. 3 time series measuring X_{t,2}). All the time series require a seasonal component.
I found MARSS's notation pretty natural for fitting this type of model, and the code looks like this:
Z = factor(c("R", "S", "S", "S"))        # observation matrix: 1 series for R, 3 for S
B = matrix(list(1, 0, "beta", 1), 2, 2)  # evolution matrix
A = "zero"                               # demeaned
R = matrix(list(0), 4, 4); diag(R) = c("r", "s", "s", "s")  # observation error variances
Q = "diagonal and unequal"
U = "zero"
period = 12
per.1st = 1                              # now create factors for seasons
c.in = diag(period)                      # TT below is the number of time steps
for (i in 2:(ceiling(TT / period))) { c.in = cbind(c.in, diag(period)) }
c.in = c.in[, (1:TT) + (per.1st - 1)]
rownames(c.in) = month.abb
C = "unconstrained"                      # 2 x 12 matrix of seasonal effects
dlmfit = MARSS(data, model = list(Z = Z, B = B, Q = Q, C = C, c = c.in,
                                  R = R, A = A, U = U))
I got a beta estimate implying that the second temporal process is useful in forecasting the first, but to my dismay MARSS gives me an error when I use MARSSsimulate to forecast, because one of the matrices (related to seasonality) is time-varying.
Does anyone know a way around this issue in the MARSS package? If not, any tips on fitting an analogous model using, say, the dlm package?
I was able to represent my state-space model in a form suitable for the dlm package, but I ran into problems with dlm too. First, the ML estimates are VERY unstable; I bypassed this by constructing the dlm model from the MARSS estimates. Second, dlmFilter did not work properly. I think the issue is that dlmFilter is not designed for models with multiple sources for one time series plus additional seasonal components. dlmForecast, however, gives me the forecasts I need!
In summary, for my multivariate time series model (with multiple sources providing data for one of the temporal processes), the MARSS library gave me reasonable parameter estimates and filtered and smoothed state values, but no forecasts. The dlm package, on the other hand, gave fishy estimates and its dlmFilter didn't work, but I was able to use dlmForecast to produce forecasts from the model fitted in MARSS and re-expressed in dlm form; a rough sketch of that last step follows.
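Every *_from_marss object below is a placeholder for a system matrix re-expressed by hand from the MARSS fit, not a real estimate:
library(dlm)
# assemble a dlm from the MARSS point estimates; m0/C0 can be set to the
# final smoothed state from MARSS so the forecast starts where the data end
mod <- dlm(FF = FF_from_marss, GG = GG_from_marss,
           V = V_from_marss, W = W_from_marss,
           m0 = m0_from_marss, C0 = C0_from_marss)
fc <- dlmForecast(mod, nAhead = 12)   # 12-step-ahead forecast
fc$f                                  # forecast means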
