Can one improve the coefficients in a recursion using a machine learning algorithm?

I've got this prediction problem for daily data across several years. My data has both yearly and weekly seasonality.
I tried the following recurrence (which I just came up with, more or less out of nowhere):
$$x_n = \frac{1}{4}\Bigl(x_{n-738} + x_{n-364} + x_{n-7} + \frac{1}{6}\bigl(x_{n-1}+x_{n-2}+x_{n-3}+x_{n-4}+x_{n-5}+x_{n-6}\bigr)\Bigr)$$
Basically, I take into account some of the days in the week before the day I am trying to predict, together with the corresponding day one and two years earlier, and average over them.
My question is: can one improve the prediction by replacing the coefficients 1/4, 1/6, etc. with coefficients that make the mean squared residual smaller?

Personally, I see your problem as a regression problem.
If you have enough data, I would treat this as a time-series prediction task.
You said that the data has yearly and weekly seasonality. To cope with that, you can build two models, one with a weekly window and one dealing with the yearly pattern, and then combine them somehow (a linear combination, or even another model on top).
However, if you don't have enough data, you could try passing the lagged values $x_i$ above as features to a regression model such as linear regression, an SVM, or a feed-forward neural network; in theory it will find the coefficients that produce a small enough loss (error).
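For concreteness, here is a minimal sketch in R of that last suggestion, assuming x is a numeric vector holding the daily series (the variable names are made up for illustration). Ordinary least squares then picks the weight on each lag that minimises the mean squared residual, instead of the hand-chosen 1/4 and 1/6:

# Build the same lagged features used in the recurrence above.
lag_vec <- function(x, k) c(rep(NA, k), head(x, -k))

df <- data.frame(
  y        = x,
  lag738   = lag_vec(x, 738),
  lag364   = lag_vec(x, 364),
  lag7     = lag_vec(x, 7),
  lastweek = rowMeans(sapply(1:6, function(k) lag_vec(x, k)))
)
df <- na.omit(df)  # drop rows where some lag is undefined

# OLS finds the lag weights that minimise the mean squared residual.
fit <- lm(y ~ lag738 + lag364 + lag7 + lastweek, data = df)
coef(fit)

The fitted coefficients play the role of the hand-picked 1/4 and 1/6.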

Related

Extracting linear term from a polynomial predictor in a GLM

I am relatively new to both R and Stack Overflow, so please bear with me. I am currently using GLMs to model ecological count data under a negative binomial distribution in brms. Here is my general model structure, which I chose based on fit, convergence, low LOOIC compared to other models, etc.:
My goal is to characterize population trends of the study organisms over the study period. I have created marginal-effects plots by using the model to predict on a new dataset where all covariates are constant except year (shaded areas are 80% and 95% credible intervals for the posterior predicted means).
I am now hoping to extract trend magnitudes that I can report and compare across species (i.e. say a certain species declined or increased by x% (+/- y%) per year). Because I use poly() in the model, my understanding is that R uses orthogonal polynomials, and the resulting polynomial coefficients are not easily interpretable. I have tried generating raw polynomials (setting raw=TRUE in poly()), which I thought would produce the same fit and have directly interpretable coefficients. However, the resulting models don't really run (after 5 hours neither chain gets through even a single iteration, whereas the same model with raw=FALSE only takes a few minutes to run). Very simplified versions of the model (e.g. count ~ poly(year, 2, raw=TRUE)) do run, but take several orders of magnitude longer than setting raw=FALSE, and the resulting model also predicts different counts than the model with orthogonal polynomials. My questions are (1) what is going on here? and (2) more broadly, how can I feasibly extract the linear term of the quartic polynomial describing response to year, or otherwise get at a value corresponding to population trend?
I feel like this should be relatively simple and I apologize if I'm overlooking something obvious. Please let me know if there is further code I should share for clarity; I didn't want to make the initial post crazy long, but I'm happy to show specific predictions from different models or anything else. Thank you for any help.
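As a small self-contained illustration (synthetic data and plain lm(), not the original brms model), orthogonal and raw polynomials give identical fitted values; only the coefficients differ, which is why the raw coefficients are the directly interpretable ones:

# Orthogonal vs. raw polynomials: same fit, different coefficients.
set.seed(1)
year  <- 2000:2020
count <- rpois(length(year), exp(0.1 * (year - 2010)))

m_orth <- lm(count ~ poly(year, 2))
m_raw  <- lm(count ~ poly(year, 2, raw = TRUE))

all.equal(fitted(m_orth), fitted(m_raw))  # TRUE: identical predictions
coef(m_raw)   # interpretable on the raw year scale

The sampling trouble with raw = TRUE in brms is typically a scaling issue: raw powers of a variable like year are enormous and highly correlated, which is exactly what orthogonal polynomials are designed to avoid.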

Survival analysis for various timepoints

I am running a survival analysis on a very large data set and attempting to examine the impact of a particular variable on survival at various time points (30 days, 90 days, 180 days, 365 days).
I want to run a univariate Cox regression and I am not sure how to do this properly. The data set contains a variable, "Time", which holds the number of days each patient has been present in the data set.
At first, I simply subset the main data set at the various time points (i.e. by Time <= 30, etc.) and then ran a simple Cox regression on each data frame, coxph(Surv(Time, event) ~ x). This was obviously flawed, because each subset only included patients whose follow-up ended within that interval. I have no idea how to attack this problem otherwise and was unable to find a good answer through my searches.
Any suggestions would be greatly appreciated. Thank you!
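One standard remedy (a hedged sketch, not from the original thread; mydata, x, and event are assumed column/object names) is to censor administratively at each horizon instead of subsetting, so patients still at risk past the cutoff contribute as censored observations:

library(survival)

# Refit the Cox model at a given horizon by truncating follow-up there:
# anyone with Time beyond the horizon is kept, but marked as censored.
fit_at_horizon <- function(data, horizon) {
  d <- data
  d$event <- ifelse(d$Time > horizon, 0, d$event)  # censor past cutoff
  d$Time  <- pmin(d$Time, horizon)                 # truncate follow-up
  coxph(Surv(Time, event) ~ x, data = d)
}

fits <- lapply(c(30, 90, 180, 365), fit_at_horizon, data = mydata)

This keeps the full risk set at every time point, unlike the Time <= 30 subset, which silently discards everyone who survived past day 30.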

Forecast in R - auto.arima with external regressors

Further to this discussion about fitting an ARIMA model with external regressors:
From Auto.arima to forecast in R
I was able to forecast perfectly for the next 5 months, given that I had future values of the predictors explaining my response variable (churn_rate).
arima_model_churn_rate <- auto.arima(tsm_churn_rate, stepwise = FALSE,
                                     approximation = FALSE,
                                     xreg = xreg_in_out_p_month_1)

number_of_future_month <- 5
forecast_churn_rate <- forecast(arima_model_churn_rate,
                                xreg = xreg_fut_in_out_p_month_churn_rate,
                                h = number_of_future_month)
plot(forecast_churn_rate)
My question is: since I need to predict the future, I cannot wait for the predictors to be measured before making a prediction for future months.
If I have to wait until the end of the month, then I can do a simple calculation to see what the churn rate is anyway.
My goal is to predict the next 3 months; in that case, what should I do to get future values for my predictors?
I am somewhat confused by this whole scenario as discussed in the blog. For an ARIMA model with external regressors we need future values of the regressors. It worked perfectly for the example case, where I trained my model on 2 years of data and used the next 5 months of predictor measurements as the future values.
But what if I want to predict 3 months, 6 months, or even a year ahead? If I have to wait for the future values, then I am already at that time point, and the prediction no longer makes sense.
Can someone explain this whole concept to me, please? Sorry if I could not explain the scenario really well; I tried my best to get it across.
Thanks in advance!
If you don't have values for your future predictors, then you need to either forecast them first, or use a different model.
You could try a model without those predictors, or you could include lagged values of the predictors where the lag is at least as long as the forecast horizon.
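To make the second suggestion concrete, here is a minimal sketch (reusing the object names from the question, and assuming the regressor is a single numeric series): shift the regressor back by the forecast horizon, so the "future" regressor values needed at forecast time have already been observed:

library(forecast)

h <- 3                       # forecast horizon in months
x <- xreg_in_out_p_month_1   # observed external regressor
n <- length(tsm_churn_rate)

# x_{t-h} explains y_t: align y from month h+1 with x from month 1.
y        <- window(tsm_churn_rate, start = time(tsm_churn_rate)[h + 1])
x_lagged <- cbind(xlag = x[1:(n - h)])

fit <- auto.arima(y, xreg = x_lagged)

# The h regressor values needed for the forecast are simply the last
# h observed values of x, so no waiting is required.
fc <- forecast(fit, xreg = cbind(xlag = x[(n - h + 1):n]), h = h)
plot(fc)

The price of this trick is that the model can only use information that is at least h months old, so its fit is usually somewhat worse than the contemporaneous-regressor model.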

Difference between simulate() and forecast() in "forecast" package

I am working on building a time series model.
However, I am having trouble understanding the difference between the simulate function and the forecast function in the forecast package.
Suppose I built an ARIMA model and want to use it to simulate future values as far as 10 years out. The data is hourly and we have a year's worth of data.
When using forecast to produce the next 1000-step-ahead predictions, I got the following plot.
[Figure: forecasts from the forecast method]
Then I used the simulate function to generate the next 1000 simulated values and got the following plot.
[Figure: simulated values from the simulate method]
Data points after the red line are simulated data points.
In the latter example, I used the following code to simulate the future values:
simulate(arima1, nsim = 1000, future = TRUE, bootstrap = TRUE)
where arima1 is my trained ARIMA model; bootstrapped residuals are used because the model residuals are not very normal.
Per the definition in the forecast package, future = TRUE means that we are simulating future values conditional on the historical data.
Can anyone tell me what the difference is between these two methods? Why does simulate() give much more realistic results, while the values from forecast() just converge to a constant after several steps, with far less fluctuation than the results from simulate()?
A simulation is a possible future sample path of the series.
A point forecast is the mean of all possible future sample paths. So the point forecasts are usually much less variable than the data.
The forecast function produces point forecasts (the mean) and interval forecasts containing the estimated variation in the future sample paths.
As a side point, an ARIMA model is not appropriate for this time series because of the skewness. You might need to use a transformation first.
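A tiny synthetic illustration of this distinction (made-up AR(1) data, not the asker's series): the point forecast decays to the series mean, while each simulated path keeps fluctuating like the data:

library(forecast)
set.seed(42)

y   <- arima.sim(model = list(ar = 0.8), n = 200)
fit <- Arima(y, order = c(1, 0, 0))

fc  <- forecast(fit, h = 50)                     # mean of all future paths
sim <- simulate(fit, nsim = 50, future = TRUE)   # one possible future path

plot(fc)                 # the point forecast flattens towards the mean
lines(sim, col = "red")  # the sample path keeps realistic variation

Averaging many such simulated paths would reproduce the point forecast, which is exactly why the forecast() line looks so smooth.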

Forecasting panel data and time series

I have a panel data set with, let's say, 1000 individuals, so i = 1, 2, ..., 1000. The data set runs on a daily basis for a month, so t = 1, 2, ..., 31.
I want to estimate individual-specific coefficients in R:
$$y_{i,10} = \alpha_i + \beta_i y_{i,9} + \gamma_i y_{i,8} + \dots + \delta_i y_{i,1} + \epsilon_{i,t}$$
and then produce density forecasts for the next 21 days, that is, density forecasts for $y_{i,11}, y_{i,12}$, etc.
My questions are:
1. Can I do this with the plm package? I know how to estimate the model with plm, but I do not know how to produce the forecasts.
2. Would it be easier (and correct) to treat each individual as a separate time series, fit an arima(9,0,0) to each one, and then get the density forecasts? If so, how can I get the density forecasts?
3. In (2), how can I include individual-specific effects that are constant over time?
Thanks a lot.
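On question (2), here is a hedged sketch (assuming a data frame panel with columns id and y; the names are made up for illustration): fit an AR(9) with a mean for each individual, then simulate many future paths, whose empirical distribution at each horizon is the density forecast. The estimated mean plays the role of the individual-specific effect:

library(forecast)

# For one individual's series: fit AR(9) and simulate future paths.
density_forecast <- function(y, h = 21, npaths = 1000) {
  fit <- Arima(y, order = c(9, 0, 0))  # include.mean = TRUE by default
  replicate(npaths, as.numeric(simulate(fit, nsim = h, future = TRUE)))
}

y1    <- panel$y[panel$id == 1]
paths <- density_forecast(y1)              # h x npaths matrix of paths
quantile(paths[21, ], c(0.05, 0.5, 0.95))  # density summary for day 31

Note that with only 31 observations per individual, an AR(9) with 10 free parameters is stretched thin; pooling information across individuals (as plm does) may be statistically safer.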
