How to determine the most significant predictors - multivariate forecasting - r

I would like to create a forecasting model with time series in R. I have a target time series 'Sales' that I would like to forecast. I also have several time series that represent, for example, GDP or advertising spend. Unfortunately I have a lot of independent time series and I don't know how to figure out the most significant ones. It would be best to find out the most important ones already before building the model.
I have already worked with classification problems, where I always used the Pearson correlation coefficient. This is not possible with time series, right? How can I measure correlation between time series and use it to find the series that best describe my target time series?
I tried to use the corr.test() function in R, but I think that's not right.
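One possible way to screen candidates before building a model is to look at the cross-correlation of the differenced series, because Pearson correlation on raw trending series tends to produce spurious results. The sketch below uses base R's ccf(); the simulated data, the names advertising, gdp and sales, and the helper screen_predictor() are all hypothetical. First differencing is assumed to be enough to remove the trend, which may not hold for real data.

```r
set.seed(1)
n <- 120

# Simulated stand-ins for the real data (assumption): two random-walk predictors
advertising <- cumsum(rnorm(n))
gdp <- cumsum(rnorm(n))
# Sales follow advertising with a one-period delay; gdp is unrelated
sales <- 2 * c(0, advertising[-n]) + cumsum(rnorm(n, sd = 0.5))
predictors <- data.frame(advertising = advertising, gdp = gdp)

# Hypothetical helper: strongest cross-correlation at lags >= 0,
# i.e. where the predictor leads (or coincides with) the target
screen_predictor <- function(target, predictor, max_lag = 6) {
  # ccf(x, y) at lag k estimates cor(x[t + k], y[t])
  cc <- ccf(diff(target), diff(predictor), lag.max = max_lag, plot = FALSE)
  keep <- cc$lag >= 0
  best <- which.max(abs(cc$acf[keep]))
  c(lag = cc$lag[keep][best], correlation = cc$acf[keep][best])
}

screening <- t(sapply(predictors, screen_predictor, target = sales))
# Rank predictors by absolute cross-correlation
ranked <- screening[order(-abs(screening[, "correlation"])), ]
print(ranked)
```

This only ranks predictors one at a time; it does not account for predictors that are correlated with each other.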

Related

Classifying M multivariate time-series based on known K classes in R

I have M multivariate time series. By multivariate I mean that each time series is represented by more than one variable that varies in time (see example image for simulated data). All series have the same length. I want to build a classifier trained on K classes (e.g. every time series belongs to class A, B or C).
Is there a straightforward implementation of this in R? Regular classification approaches (e.g. random forest, SVM) ignore the dependence in the data and give different predictions within the same time series. I have an intuition for how this could be solved, e.g. using some ensemble classification, or concatenating each time series into a univariate vector, but I have a feeling there is a better approach that doesn't require me to reinvent the wheel. I also know that a KNN and DTW approach could work in theory, but I am not sure how it gets around the issues above (e.g. the multivariate problem).
I would appreciate any pointers and references.
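As a rough illustration of how a KNN and DTW approach can handle several variables at once, the sketch below computes the local cost between two time points as the Euclidean distance across all variables, so one warping path is shared by all of them, and assigns one label per whole series. It is a hand-written base R sketch with simulated data; dtw_distance(), classify_1nn() and make_series() are hypothetical helpers, not functions from a package.

```r
# DTW distance between two multivariate series
# a, b: numeric matrices with rows = time points, columns = variables
dtw_distance <- function(a, b) {
  n <- nrow(a)
  m <- nrow(b)
  cost <- matrix(Inf, n + 1, m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # Local cost: Euclidean distance across all variables at once
      d <- sqrt(sum((a[i, ] - b[j, ])^2))
      cost[i + 1, j + 1] <- d + min(cost[i, j + 1], cost[i + 1, j], cost[i, j])
    }
  }
  cost[n + 1, m + 1]
}

# 1-nearest-neighbour: one label for the whole series
classify_1nn <- function(new_series, train_series, train_labels) {
  dists <- vapply(train_series, function(s) dtw_distance(new_series, s), numeric(1))
  train_labels[which.min(dists)]
}

# Simulated data (assumption): class A oscillates, class B drifts linearly
set.seed(123)
tt <- seq(0, 2 * pi, length.out = 30)
make_series <- function(class) {
  noise <- matrix(rnorm(60, sd = 0.1), ncol = 2)
  if (class == "A") {
    shift <- runif(1, 0, 1)
    cbind(sin(tt + shift), cos(tt + shift)) + noise
  } else {
    cbind(tt / max(tt), -tt / max(tt)) + noise
  }
}

train_labels <- rep(c("A", "B"), each = 5)
train_series <- lapply(train_labels, make_series)

pred_a <- classify_1nn(make_series("A"), train_series, train_labels)
pred_b <- classify_1nn(make_series("B"), train_series, train_labels)
print(c(pred_a, pred_b))
```

The double loop is quadratic in the series length, so for larger data a dedicated DTW implementation would likely be preferable.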

Changing seasonality over time in structural model using KFAS package in R

I have a time series to which I want to fit a structural model (trend, seasonal and cycle) using KFAS. However, seasonality only starts at a certain point in time. Say the time series is monthly from January 2000 through August 2022, but seasonality starts in 2011. Is there a way to capture such behavior in the series without splitting the data at that point?
I have already tried splitting the time series, but I would like a unified model. I am using KFAS in R for the estimation, though I have also used autostm for automatic structural models. Even though they achieve an appropriate fit (even for the whole time series), I think it can be improved with this idea. I thought I could use a regressor for the seasonality, but I couldn't find out how.
Are you using SSModel with a formula input? You could try adding a seasonality term to your data and then adding that term to the right-hand side of ~ in the formula.
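One way to read the suggestion above is to build monthly dummy variables that are zero before 2011 and active afterwards. The sketch below only constructs such regressors in base R for the date range given in the question; the names seasonal_regressors and after_start and the choice of January as the baseline month are assumptions.

```r
# Monthly index from January 2000 through August 2022, as in the question
dates <- seq(as.Date("2000-01-01"), as.Date("2022-08-01"), by = "month")
month <- factor(format(dates, "%m"))

# Indicator that switches on when seasonality is assumed to start
after_start <- as.numeric(dates >= as.Date("2011-01-01"))

# 11 monthly dummies (January is the baseline, dropped with the intercept column)
dummies <- model.matrix(~ month)[, -1]

# Multiply each row by the indicator so all dummies are zero before 2011
seasonal_regressors <- dummies * after_start
colnames(seasonal_regressors) <- paste0("seas_", levels(month)[-1])

# February 2011 is the first month with an active dummy
print(seasonal_regressors[dates == as.Date("2011-02-01"), ])
```

These columns could then be added to the data and to the right-hand side of the model formula. Note that regressors with fixed coefficients imply a seasonal pattern that does not change after 2011, which may or may not be appropriate for the series.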

Detecting seasonality for time series and apply cross correlation function

I have a question using R's ccf() function. I have two time series that represents snow water equivalent on the surface and groundwater head under the ground. I want to find out the "propagation" time from the surface to the ground, so I think that using cross correlation between two time series can help me to detect what's the "lag" time between them.
It seems that the ccf() function is a proper way to determine the lag between two time series. But according to the mathematical concept of cross-correlation, the input data should be stationary, and both of my time series are seasonal, because intuitively we know that snow occurs in winter. Data with seasonality is considered non-stationary, so I think I might need to decompose the data to make it stationary. I then used both the stl() function and the decompose() function to check whether there is a seasonal pattern, but both of them gave me this error message:
Error in decompose(swefoothill):
time series has no or less than 2 periods
which is pretty self-explanatory: R does not detect a clear seasonality in either time series. But that doesn't mean that my data are not seasonal. So I want to ask: under these circumstances, is it okay to apply ccf() directly to both time series? I did a sample analysis and the cross-correlation plot looks like this:
I'm observing a cyclic pattern here. Am I doing it wrong? Thanks a lot for your help!
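The error from decompose() is raised when the ts object has no frequency set or covers fewer than two full periods. A minimal sketch of one possible workflow, assuming monthly data: declare the frequency, remove the seasonal and trend parts with stl(), and apply ccf() to the remainders. The data below are simulated so that the groundwater series responds two months after the snow series; the names swe and gw and the monthly frequency are assumptions.

```r
set.seed(7)
n <- 240
months <- seq_len(n)
shock <- rnorm(n + 2)

# Simulated stand-ins (assumption): gw reacts to the same shocks 2 months later
swe_values <- 10 * sin(2 * pi * months / 12) + shock[3:(n + 2)]
gw_values <- 5 * sin(2 * pi * (months - 2) / 12) + 0.8 * shock[1:n] + rnorm(n, sd = 0.3)

# Declaring the frequency is what lets stl() and decompose() find the periods
swe <- ts(swe_values, start = c(2000, 1), frequency = 12)
gw <- ts(gw_values, start = c(2000, 1), frequency = 12)

# Keep only the remainder after removing the seasonal and trend components
swe_rem <- stl(swe, s.window = "periodic")$time.series[, "remainder"]
gw_rem <- stl(gw, s.window = "periodic")$time.series[, "remainder"]

# ccf(x, y) at lag k estimates cor(x[t + k], y[t]);
# a peak at positive k means gw follows swe by k
cc <- ccf(gw_rem, swe_rem, lag.max = 12, plot = FALSE)

# For ts objects the lags are reported in years, so convert to months
best_lag_months <- cc$lag[which.max(abs(cc$acf))] * 12
print(best_lag_months)
```

Applying ccf() to the raw seasonal series would mostly show the shared annual cycle, which is one plausible explanation for a cyclic pattern in the cross-correlation plot.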

R - Forecast multiple time-series (15K Products)

Hi Stack Overflow community.
I have 5 years of weekly price data for more than 15K products (5 × 15K × 52 records). Each product is a univariate time series. The objective is to forecast the price of each product.
I am familiar with univariate time series analysis, in which we can visualize each series, plot its ACF and PACF, and forecast it. But this manual approach is not feasible when I have 15K different time series: I cannot visualize each series and its ACF and PACF, forecast each product separately, and make tweaks or decisions for each one.
I am looking for recommendations and directions to solve this multi-series forecasting problem, preferably using R. Any help and support will be appreciated.
Thanks in advance.
I would suggest you use auto.arima from the forecast package.
This way you don't have to search for the right ARIMA model.
auto.arima: Returns best ARIMA model according to either AIC, AICc or BIC value. The function conducts a search over possible models within the order constraints provided.
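library(forecast)            # provides auto.arima() and forecast()
fit <- auto.arima(WWWusage)  # WWWusage is a built-in example series
plot(forecast(fit, h = 20))  # forecast 20 time steps ahead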
Instead of WWWusage you could pass in one of your own time series to fit an ARIMA model.
With forecast you then perform the forecast, in this case 20 time steps ahead (h=20).
auto.arima basically chooses the ARIMA parameters for you according to an information criterion (AICc by default; AIC and BIC are also available).
You would have to try whether it is too computationally expensive for you. But in general it is not that uncommon to forecast that many time series.
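Applied to many products, the same idea can be wrapped in a loop. The following is a minimal sketch under stated assumptions: prices is a hypothetical named list with one numeric vector per product (simulated here), forecast_one() and forecast_all() are hypothetical helpers, and a fixed ARIMA(0,1,1) from base R is used as a fallback only so that the example also runs when the forecast package is not installed.

```r
set.seed(42)
# Hypothetical stand-in for the real data: 3 products, 260 weekly prices each
prices <- replicate(3, 100 + cumsum(rnorm(260)), simplify = FALSE)
names(prices) <- c("product_a", "product_b", "product_c")

# Point forecasts for a single series
forecast_one <- function(x, h = 20) {
  if (requireNamespace("forecast", quietly = TRUE)) {
    fit <- forecast::auto.arima(x)
    as.numeric(forecast::forecast(fit, h = h)$mean)
  } else {
    # Fallback (assumption): a fixed-order model instead of an automatic search
    fit <- arima(x, order = c(0, 1, 1))
    as.numeric(predict(fit, n.ahead = h)$pred)
  }
}

# Apply to every series; a failure in one product does not stop the rest
forecast_all <- function(series_list, h = 20) {
  lapply(series_list, function(x) {
    tryCatch(forecast_one(x, h), error = function(e) rep(NA_real_, h))
  })
}

point_forecasts <- forecast_all(prices, h = 20)
str(point_forecasts)
```

Since each series is fitted independently, the loop could also be run in parallel if the runtime for 15K series turns out to be too long.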
Another thing to keep in mind is that there may well be some cross-correlation between the time series. So from a forecasting accuracy standpoint it could make sense not to treat this as a purely univariate forecasting problem.
The setting sounds quite similar to the M5 forecasting competition that was recently held on Kaggle. The goal was to produce point forecasts of the unit sales of various products sold in the USA by Walmart.
So there were a lot of sales time series to forecast. In this case the winner did not do a univariate forecast. Here is a link to a description of the winning solution. Since the setting seems so similar to yours, it probably makes sense to read a bit in the Kaggle forum of this challenge; there might even be useful notebooks (code examples) available.

How to deal with time series data with many 0's?

I have time series data ranging from 0 to 30 million. It's basically weekly web traffic data. I am working on building a forecasting model with this data. I want to understand how I can deal with this range of data. I tried a Box-Cox transformation with a Prophet model. I am not sure which metrics I could use to evaluate the performance of the model. The data has a lot of 0's. I can't remove them from the dataset. Is there a better way to deal with the 0's than the Box-Cox transformation? I had issues with the inverse transformation, but I added a small value (0.1) to the data to avoid negative values.
If your series has a lot of periodic zeros, Croston's method is one option. It is basically a forecasting strategy for products with intermittent demand. You can also try exponential smoothing and traditional ARIMA or SARIMA models and clip the negative values in the forecast (depending on your use case).
You can find Croston's method in the forecast package.
Also refer to these links:
https://stats.stackexchange.com/questions/8779/analysis-of-time-series-with-many-zero-values/8782
https://stats.stackexchange.com/questions/373689/forecasting-intermittent-demand-with-zeroes-in-times-series
https://robjhyndman.com/papers/foresight.pdf
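To illustrate the idea behind Croston's method, here is a hand-written base R sketch, not the forecast package's implementation: the non-zero values and the gaps between them are smoothed separately with simple exponential smoothing, and the forecast is their ratio. The function name croston_sketch(), the smoothing parameter of 0.1, and the way the first interval is counted are assumptions and may differ from what the package does.

```r
# Hypothetical helper: Croston-style forecast of the mean value per period
croston_sketch <- function(y, alpha = 0.1) {
  nonzero_at <- which(y > 0)
  if (length(nonzero_at) < 2) stop("need at least two non-zero observations")

  sizes <- y[nonzero_at]              # non-zero values
  intervals <- diff(c(0, nonzero_at)) # periods between non-zero values

  # Simple exponential smoothing, initialised with the first observation
  ses <- function(v) {
    Reduce(function(level, obs) level + alpha * (obs - level), v[-1], v[1])
  }

  # Smoothed size divided by smoothed interval
  ses(sizes) / ses(intervals)
}

# Toy weekly series with many zeros (made-up numbers)
traffic <- c(0, 0, 4, 0, 0, 4, 0, 0, 4)
croston_forecast <- croston_sketch(traffic)
print(croston_forecast)
```

The result is a flat forecast of the average value per period, so it is never negative for non-negative data, but it also does not predict when the next non-zero value will occur.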
