Forecasting panel data and time series in R

I have a panel data set of, let's say, 1000 cross-sectional units, so i = 1, 2, ..., 1000. The data set runs at a daily frequency for a month, so t = 1, 2, ..., 31.
I want to estimate an individual-specific model in R:
y_{i,10} = α_i + β_i·y_{i,9} + γ_i·y_{i,8} + ... + δ_i·y_{i,1} + ε_{i,10}
and then produce density forecasts for the next 21 days, that is, produce density forecasts for y_{i,11}, y_{i,12}, etc.
My questions are:
Can I do this with the plm package? I know how to estimate the model with plm, but I do not know how to produce the forecasts.
Would it be easier (and correct) to treat each unit as a separate time series, fit an ARIMA(9,0,0) to each one, and then get the density forecasts? If so, how can I get the density forecasts? (A sketch of this approach is given below.)
In (2), how can I include individual-specific effects that are constant over time?
Thanks a lot
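One way to sketch approach (2), under the assumption that the data sit in a hypothetical data frame called panel with columns id, t and y (sorted by t within each id): fit an AR(9) with a constant (the individual-specific effect) to each unit using the forecast package, then approximate the forecast density by simulated sample paths. With only 31 observations and 10 parameters per unit the fits will be fragile, so treat this as an illustration rather than a recipe.

library(forecast)

# Fit an AR(9) with an intercept and draw nsim simulated future paths of
# length h; the paths together approximate the forecast density.
density_paths <- function(y, h = 21, nsim = 1000) {
  fit <- Arima(y, order = c(9, 0, 0), include.mean = TRUE)
  replicate(nsim, simulate(fit, nsim = h, future = TRUE, bootstrap = TRUE))
}

# `panel` is the assumed data frame; one fit per unit
paths <- lapply(split(panel$y, panel$id), density_paths)

# paths[["1"]] is a 21 x 1000 matrix for unit i = 1: quantiles across a row
# give interval forecasts, density() applied to a row gives the forecast
# density at that horizon.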

Related

Seasonal ARIMA not being able to model seasonal components efficiently

The following is a (not perfectly) seasonal time series that I am trying to fit an ARIMA model to:
[Figure: data plot]
I made it stationary (confirmed by an ADF test) using a first-order regular difference and a first-order seasonal difference, after which it looked like this:
[Figure: ACF and PACF of the stationary data]
The seasonal lags at 12 and 24 are both significant in the ACF plot, while in the PACF plot only the 12th is significant.
I have tried many combinations of p, d, q, P, D, Q, but whatever the combination, the residuals always have a significant first seasonal lag in both the ACF and PACF plots.
I decided to go with ARIMA(4,1,4)(1,1,1)[96] (even though the data is monthly with frequency 12, it shows seasonality at roughly 96-observation intervals, i.e. every 8 years) because it gave the best log-likelihood, but it still doesn't fit the seasonal components.
[Figure: residual ACF and PACF]
Can anyone suggest what should be improved or tried so that the model fits all the lags?
Sharing the R file and dataset here: https://drive.google.com/drive/folders/1okMUkBj2W2nF9NkoX4igq2-7QP-cgnSO?usp=sharing
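As an illustration only (not a guaranteed fix), one thing to try is fitting a handful of candidate seasonal orders at the chosen period and comparing them on AICc and residual diagnostics. The object name y_raw (the raw data vector) is an assumption, and the period is set to 12 here; swap in 96 if that is the seasonality you want to capture.

library(forecast)

y <- ts(y_raw, frequency = 12)          # y_raw is the assumed raw data vector

# Candidate seasonal (P, D, Q) orders to compare at the chosen period
candidates <- list(c(0, 1, 1), c(1, 1, 0), c(1, 1, 1), c(2, 1, 1))
fits <- lapply(candidates, function(seas)
  Arima(y, order = c(4, 1, 4),
        seasonal = list(order = seas, period = frequency(y))))

sapply(fits, function(f) f$aicc)        # AICc for each candidate; lower is better
best <- fits[[which.min(sapply(fits, function(f) f$aicc))]]
checkresiduals(best)                    # Ljung-Box test plus residual ACF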

Create a bivariate probability density to sample from in R

I have some data with two variables, spend and outcomes, given at a weekly frequency.
I would like to model the relationship between the two at a yearly level, but I do not have enough years of data to build a model. I do have about 3 years of weekly data, however, and would like to simulate several more weeks of data points (spend and outcomes) from a bivariate probability density of spend and outcomes, which I could then roll up to a yearly frequency.
Is there a package in R that can take two variables and estimate their density function, which I could then use to simulate many more data points?
Thanks so much!
The simulate_kde function in the simukde package internally performs a kernel density estimation and creates samples from it.
Alternatively, the MASS package has the kde2d function for obtaining a bivariate kernel density estimate.
You could then sample from that, as described for instance in this post.
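A minimal sketch of the MASS route: estimate the bivariate kernel density with kde2d and draw new pairs with the standard KDE sampling trick (pick an observed pair at random, then add Gaussian noise whose sd matches the kernel; kde2d divides the bandwidth.nrd() bandwidth by 4 internally). The data frame weekly and its columns spend and outcome are assumed names.

library(MASS)

# Kernel sd matching kde2d's default normal-reference bandwidth
h <- c(bandwidth.nrd(weekly$spend), bandwidth.nrd(weekly$outcome)) / 4

n_new <- 500
idx <- sample(nrow(weekly), n_new, replace = TRUE)
simulated <- data.frame(
  spend   = weekly$spend[idx]   + rnorm(n_new, sd = h[1]),
  outcome = weekly$outcome[idx] + rnorm(n_new, sd = h[2])
)

# Quick visual check: contour of the estimated density with the new points
dens <- kde2d(weekly$spend, weekly$outcome, n = 100)
contour(dens)
points(simulated, col = "red", pch = 20, cex = 0.5)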

Can one improve coefficients in recursion using some machine learning algorithm?

I've got this prediction problem for daily data across several years. My data has both yearly and weekly seasonality.
I tried using the following recurrence (which I just came up with, from nowhere if you like):
x_n = (1/4) [ x_{n-738} + x_{n-364} + x_{n-7} + (1/6)(x_{n-1} + x_{n-2} + x_{n-3} + x_{n-4} + x_{n-5} + x_{n-6}) ]
Basically, I take into account some of the days in the week before the day I am trying to predict, as well as the corresponding day one and two years earlier, and average over them.
My question is: can one improve the prediction by replacing the coefficients 1/4, 1/6, etc. with coefficients that make the mean squared residual smaller?
Personally, I see your problem as a regression problem.
If you have enough data, I would go for time series prediction.
You said that the data has yearly and weekly seasonality. To cope with that, you can have two models, one with a weekly window and one dealing with the yearly pattern, and then combine them somehow (a linear combination or even another model).
However, if you don't have enough data, you could try passing the above x_i as features to a regression model such as linear regression, an SVM, or a feed-forward neural network, and in theory it will find coefficients that produce a small enough loss (error).
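To make that last suggestion concrete, here is a hedged sketch that turns the lags from the recurrence into features and lets ordinary linear regression choose the weights. The vector x (the daily series) and the lag offsets come from the question; lag_at is a hypothetical helper.

# Shift a series back by k days, padding the start with NA
lag_at <- function(x, k) c(rep(NA, k), head(x, -k))

feats <- data.frame(
  y        = x,
  lag738   = lag_at(x, 738),
  lag364   = lag_at(x, 364),
  lag7     = lag_at(x, 7),
  last6avg = rowMeans(sapply(1:6, function(k) lag_at(x, k)))
)
feats <- na.omit(feats)

fit <- lm(y ~ lag738 + lag364 + lag7 + last6avg, data = feats)
coef(fit)   # fitted weights replace the fixed 1/4 and 1/6 coefficients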

Difference between simulate() and forecast() in "forecast" package

I am working on building a time series model.
However, I am having trouble understanding what the difference is between the simulate function and the forecast function in the forecast package.
Suppose I built an ARIMA model and want to use it to simulate future values as far as 10 years ahead. The data is hourly, and we have a year's worth of data.
When using forecast() to produce the next 1000 steps ahead, I got the following plot.
[Figure: using the forecast method]
Then I used the simulate function to simulate the next 1000 values and got the following plot.
[Figure: using the simulate method]
Data points after the red line are simulated data points.
In the latter example, I used the following code to simulate the future values:
simulate(arima1, nsim = 1000, future = TRUE, bootstrap = TRUE)
where arima1 is my trained ARIMA model; bootstrapped residuals are used because the model residuals are not very normal.
Per the definition in the forecast package, future = TRUE means that we are simulating future values conditional on the historical data.
Can anyone tell me what the difference is between these two methods? Why does simulate() give me much more realistic results, while the forecasts from forecast() just converge to a constant after several steps, with much less fluctuation than the results from simulate()?
A simulation is a possible future sample path of the series.
A point forecast is the mean of all possible future sample paths. So the point forecasts are usually much less variable than the data.
The forecast function produces point forecasts (the mean) and interval forecasts containing the estimated variation in the future sample paths.
As a side point, an ARIMA model is not appropriate for this time series because of the skewness. You might need to use a transformation first.
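To see the difference concretely, here is a small sketch (assuming arima1 is the fitted model from the question): forecast() returns the mean of all future sample paths plus prediction intervals, while repeated calls to simulate() return individual sample paths, and averaging many of them recovers something close to the point forecast.

library(forecast)

fc <- forecast(arima1, h = 1000)                 # point forecasts + intervals
paths <- replicate(50, simulate(arima1, nsim = 1000,
                                future = TRUE, bootstrap = TRUE))

plot(fc)                                          # the mean path flattens out
matlines(time(fc$mean), paths, lty = 1,
         col = adjustcolor("grey", 0.4))          # individual sample paths
lines(time(fc$mean), rowMeans(paths), col = "red")  # average of the paths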

R: Generate a Seasonal ARIMA time-series model using parameters of existing data

I have count time series data from which I can determine the parameters of the underlying stochastic process. For example, say I have a SARIMA(p,d,q)(P,D,Q)[S] seasonal ARIMA model.
How do I use this to generate a new count time series data set?
Being even more specific: for a SARIMA(1,0,1)(1,0,0)[12], how can I generate a time series covering a 10-year period, one value per month? (i.e., 120 points estimating the number of 'counts'.)
Use simulate.Arima() from the forecast package. It handles seasonal ARIMA models whereas arima.sim() does not.
However, ARIMA models are not suitable for count time series as they assume the process is defined on the whole real line.
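A minimal sketch of that suggestion, assuming counts_raw is the existing monthly count series used to estimate the SARIMA(1,0,1)(1,0,0)[12]:

library(forecast)

counts <- ts(counts_raw, frequency = 12)   # counts_raw is the assumed raw series

fit <- Arima(counts, order = c(1, 0, 1),
             seasonal = list(order = c(1, 0, 0), period = 12))

sim <- simulate(fit, nsim = 120, future = TRUE)  # 120 months = 10 years

# As noted above, the simulation is real-valued; rounding and truncating at
# zero is only a crude way to force it back to counts.
sim_counts <- pmax(0, round(sim))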
