Confidence interval for training data in R

I am building a time series model in R with training data and predicting the future values.
fit_arima <- auto.arima(train.ts, xreg=xreg.vars.train)
I get the CI for the predicted data using the model that I developed with training data.
fcast_arima <- forecast(fit_arima, xreg = xreg.vars.test, h= nrow(test.data), level=95)
Point Forecast Lo 95 Hi 95
Apr 2015 2.000000 1.396790 2.603210
May 2015 2.000000 1.396790 2.603210
Jun 2015 2.397746 1.794537 3.000956
Jul 2015 2.000000 1.396790 2.603210
Aug 2015 2.397746 1.794537 3.000956
Sep 2015 2.000000 1.396790 2.603210
Oct 2015 2.000000 1.396790 2.603210
Nov 2015 2.397746 1.794537 3.000956
Dec 2015 2.795493 2.192283 3.398702
But I am looking for a way to get a CI for the training data as well. Can someone help me find a way to do this?
Thanks,
Kaly
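One rough way to get in-sample intervals for a fitted model like the one above (a sketch only: it assumes roughly normal residuals and ignores parameter uncertainty, so the band will be somewhat too narrow) is to combine the one-step fitted values with the residual standard deviation:

```r
library(forecast)

# Sketch: approximate 95% in-sample intervals from the one-step
# fitted values, assuming roughly normal residuals
fitted_vals <- fitted(fit_arima)               # one-step-ahead fits on train.ts
s <- sd(residuals(fit_arima), na.rm = TRUE)    # residual standard deviation
z <- qnorm(0.975)
train_ci <- data.frame(
  fit  = as.numeric(fitted_vals),
  lo95 = as.numeric(fitted_vals) - z * s,
  hi95 = as.numeric(fitted_vals) + z * s
)
head(train_ci)
```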

Related

How to change negative values to 0 of forecasts in R?

As the data is rainfall, I want to replace the negative values, in both the point forecasts and the intervals, with 0. How can this be done in R? I am looking for R code that makes the required changes.
The forecast values obtained in R using an ARIMA model are given below:
> Predictions
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 2021 -1.6625108 -165.62072 162.2957 -252.41495 249.0899
Feb 2021 0.8439712 -165.57869 167.2666 -253.67752 255.3655
Mar 2021 35.9618300 -130.53491 202.4586 -218.67297 290.5966
Apr 2021 53.4407679 -113.05822 219.9398 -201.19746 308.0790
May 2021 206.7464927 40.24744 373.2455 -47.89184 461.3848
Jun 2021 436.2547446 269.75569 602.7538 181.61641 690.8931
Jul 2021 408.2814434 241.78239 574.7805 153.64311 662.9198
Aug 2021 431.7649076 265.26585 598.2640 177.12657 686.4032
Sep 2021 243.5520546 77.05300 410.0511 -11.08628 498.1904
Oct 2021 117.4581047 -49.04095 283.9572 -137.18023 372.0964
Nov 2021 25.0773401 -141.42171 191.5764 -229.56098 279.7157
Dec 2021 28.9468415 -137.55188 195.4456 -225.69098 283.5847
Jan 2022 -0.4912674 -171.51955 170.5370 -262.05645 261.0739
Feb 2022 2.2963271 -168.86759 173.4602 -259.47630 264.0690
Mar 2022 43.3561613 -127.81187 214.5242 -218.42275 305.1351
Apr 2022 48.6538398 -122.51431 219.8220 -213.12526 310.4329
May 2022 228.4762035 57.30805 399.6444 -33.30290 490.2553
Jun 2022 445.3540781 274.18592 616.5222 183.57497 707.1332
Jul 2022 441.8287867 270.66063 612.9969 180.04968 703.6079
Aug 2022 592.5766086 421.40845 763.7448 330.79751 854.3557
Sep 2022 220.6996396 49.53148 391.8678 -41.07946 482.4787
Oct 2022 158.7952154 -12.37294 329.9634 -102.98389 420.5743
Nov 2022 29.9052184 -141.26288 201.0733 -231.87380 291.6842
Dec 2022 25.9432583 -145.22303 197.1095 -235.83298 287.7195
In this context, try using:
Predictions[Predictions < 0] <- 0
This replaces all values less than 0 with 0. For loops are discouraged wherever vectorized operations like this can be applied, since vectorization is much faster in R.
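Note that if Predictions is a forecast-class object (the usual output of forecast()) rather than a plain matrix, the logical-subset assignment above may not reach its components. A sketch of the component-wise alternative, clamping the point forecasts and both interval bounds with pmax:

```r
# Sketch, assuming Predictions is a forecast-class object:
# clamp the point forecasts and the interval bounds at zero
Predictions$mean  <- pmax(Predictions$mean, 0)
Predictions$lower <- pmax(Predictions$lower, 0)
Predictions$upper <- pmax(Predictions$upper, 0)
Predictions
```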

Modeling a repeated measures logistic growth curve

I have cumulative population totals for the end of each month for two years (2016, 2017). I would like to combine these two years, treat each month's cumulative total as a repeated measure (one for each year), and fit a non-linear growth model to these data. The goal is to determine whether our current 2018 cumulative monthly totals are on track to meet our higher 2018 year-end population goal, by increasing the model's asymptote to our 2018 year-end goal. I would ideally like to integrate a confidence interval into the model that reflects the variability between the two years at each month.
My columns in my data.frame are as follows:
- Year is year
- Month is month
- Time is the month's number (1-12)
- Total is the month-end cumulative population total
- Norm is the proportion of year-end total for that month
- log is the log-transformed Total
Year Month Total Time Norm log
1 2016 January 3919 1 0.2601567 8.273592
2 2016 February 5887 2 0.3907993 8.680502
3 2016 March 7663 3 0.5086962 8.944159
4 2016 April 8964 4 0.5950611 9.100972
5 2016 May 10014 5 0.6647637 9.211739
6 2016 June 10983 6 0.7290892 9.304104
7 2016 July 11775 7 0.7816649 9.373734
8 2016 August 12639 8 0.8390202 9.444543
9 2016 September 13327 9 0.8846920 9.497547
10 2016 October 13981 10 0.9281067 9.545455
11 2016 November 14533 11 0.9647504 9.584177
12 2016 December 15064 12 1.0000000 9.620063
13 2017 January 3203 1 0.2163458 8.071843
14 2017 February 5192 2 0.3506923 8.554874
15 2017 March 6866 3 0.4637622 8.834337
16 2017 April 8059 4 0.5443431 8.994545
17 2017 May 9186 5 0.6204661 9.125436
18 2017 June 10164 6 0.6865248 9.226607
19 2017 July 10970 7 0.7409659 9.302920
20 2017 August 11901 8 0.8038501 9.384378
21 2017 September 12578 9 0.8495778 9.439705
22 2017 October 13422 10 0.9065856 9.504650
23 2017 November 14178 11 0.9576494 9.559447
24 2017 December 14805 12 1.0000000 9.602720
Here is my data plotted as a scatter plot:
Should I treat the two years as separate models or can I combine all the data into one?
I've been able to calculate the intercept and the growth parameter for just 2016 using the following code:
coef(lm(logit(df_tot$Norm[1:12]) ~ df_tot$Time[1:12]))
and got a non-linear least squares regression for 2016 with this code:
fit <- nls(Total ~ phi1/(1+exp(-(phi2+phi3*Time))), start = list(phi1=15064, phi2 = -1.253, phi3 = 0.371), data = df_tot[c(1:12),], trace = TRUE)
Any help is more than appreciated! Time series non-linear modeling is not my strong suit and googling hasn't got me very far at this point.
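One possible direction, sketched here under the assumption that both years follow the same underlying curve and using the df_tot data frame and starting values from the question, is to pool all 24 observations into a single nls fit and derive a rough pointwise band from the residual standard error:

```r
# Pool both years into one logistic fit (sketch; assumes a shared curve)
fit_all <- nls(Total ~ phi1 / (1 + exp(-(phi2 + phi3 * Time))),
               start = list(phi1 = 15000, phi2 = -1.253, phi3 = 0.371),
               data  = df_tot)
summary(fit_all)

# Rough pointwise 95% band from the residual standard error
# (assumes independent, homoscedastic errors -- a simplification)
new_t <- data.frame(Time = 1:12)
pred  <- predict(fit_all, newdata = new_t)
s     <- summary(fit_all)$sigma
band  <- data.frame(Time = new_t$Time, fit = pred,
                    lo = pred - 1.96 * s, hi = pred + 1.96 * s)
```

Raising phi1 toward the 2018 year-end goal then gives a target curve against which the running 2018 totals can be compared.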

Can this time series forecasting model (in R) be further improved?

I am trying to build this forecasting model but can't get impressive results. I believe the low number of records available to train the model is one reason for the poor results, so I am seeking help.
Here is the time series matrix of predictor variables. The Paidts7 variable is actually a lagged version of Paidts6.
XREG =
Paidts2 Paidts6 Paidts7 Paidts4 Paidts5 Paidts8
Jan 2014 32932400 29703000 58010000 21833 38820 102000.0
Feb 2014 33332497 35953000 29703000 10284 38930 104550.0
Mar 2014 35811723 40128000 35953000 11132 39840 104550.0
Apr 2014 28387000 29167000 40128000 13171 40010 104550.0
May 2014 27941601 27942000 29167000 9192 39640 104550.0
Jun 2014 34236746 35010000 27942000 8766 39430 104550.0
Jul 2014 22986887 26891000 35010000 11217 39060 104550.0
Aug 2014 31616679 31990000 26891000 8118 38840 104550.0
Sep 2014 41839591 46052000 31990000 10954 38380 104550.0
Oct 2014 36945266 36495000 46052000 14336 37920 104550.0
Nov 2014 44026966 41716000 36495000 12362 36810 104550.0
Dec 2014 57689000 60437000 41716000 14498 36470 104550.0
Jan 2015 35150678 35263000 60437000 22336 34110 104550.0
Feb 2015 33477565 33749000 35263000 12188 29970 107163.8
Mar 2015 41226928 41412000 33749000 11122 28580 107163.8
Apr 2015 31031405 30588000 41412000 12605 28970 107163.8
May 2015 31091543 29327000 30588000 9520 27820 107163.8
Jun 2015 38212015 35818000 29327000 10445 28880 107163.8
Jul 2015 32523660 32102000 35818000 12006 28730 107163.8
Aug 2015 33749299 33482000 32102000 9303 27880 107163.8
Sep 2015 48275932 44432000 33482000 10624 25950 107163.8
Oct 2015 32067045 32542000 44432000 15324 25050 107163.8
Nov 2015 46361434 40862000 32542000 10706 25190 107163.8
Dec 2015 68206802 71005000 40862000 14499 24670 107163.8
Jan 2016 34847451 29226000 71005000 23578 23100 107163.8
Feb 2016 34249625 43835001 29226000 13520 21430 109842.9
Mar 2016 45707923 56087003 43835001 15247 19980 109842.9
Apr 2016 33512366 37116000 56087003 18797 20900 109842.9
May 2016 33844153 42902002 37116000 11870 21520 109842.9
Jun 2016 40251630 53203010 42902002 14374 23150 109842.9
Jul 2016 33947604 38411008 53203010 18436 24230 109842.9
Aug 2016 35391779 38545003 38411008 11654 24050 109842.9
Sep 2016 49399281 55589008 38545003 13448 23510 109842.9
Oct 2016 36463617 45751005 55589008 19871 23940 109842.9
Nov 2016 45182618 51641006 45751005 14998 24540 109842.9
Dec 2016 64894588 79141002 51641006 18143 24390 109842.9
Here is the Y variable (to be predicted)
Jan Feb Mar Apr May Jun
2014 1266757.8 1076023.4 1285495.7 1026840.2 910148.8 1111744.5
2015 1654745.7 1281946.6 1372669.3 1017266.6 841578.4 1353995.5
2016 1062048.8 1860531.1 1684564.3 1261672.0 1249547.7 1829317.9
Jul Aug Sep Oct Nov Dec
2014 799973.1 870778.9 1224827.3 1179754.0 1186726.3 1673259.5
2015 1127006.2 779374.9 1223445.6 925473.6 1460704.8 1632066.2
2016 1410316.4 1276771.1 1668296.7 1477083.3 1466419.2 2265343.3
I tried forecast::Arima and forecast::nnetar models with external regressors but couldn't bring MAPE below 7. I am targeting MAPE below 3 and RMSE under 50000. You are welcome to use any other package and function.
Here is the test data: XREG =
Paidts2test Paidts6test Paidts7test Paidts4test
Jan 2017 31012640 36892000 79141002 27912
Feb 2017 33009746 39020000 36892000 9724
Mar 2017 39296653 52787000 39020000 11335
Apr 2017 36387649 36475000 52787000 17002
May 2017 40269571 41053000 36475000 11436
Paidts5test Paidts8test
Jan 2017 25100 109842.9
Feb 2017 25800 112589.0
Mar 2017 25680 112589.0
Apr 2017 25540 112589.0
May 2017 25830 112589.0
Y =
1627598 1041766 1381536 1346429 1314992
If you find that removing one or more of the predictor variables improves the result significantly, please go ahead. Your help will be greatly appreciated; please suggest solutions in R only, not in some other tool.
-Thanks
Try auto.arima; it also allows you to use xreg.
https://www.rdocumentation.org/packages/forecast/versions/8.1/topics/auto.arima
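A minimal sketch of that suggestion, assuming the Y values have been stored as a monthly ts called y and the regressors as matrices XREG (training) and XREGtest, with Ytest holding the five holdout values (these object names are assumptions, not from the question):

```r
library(forecast)

# auto.arima searches over ARIMA orders and accepts external regressors
fit <- auto.arima(y, xreg = as.matrix(XREG))
fc  <- forecast(fit, xreg = as.matrix(XREGtest), h = 5)
accuracy(fc, Ytest)   # reports MAPE and RMSE against the Jan-May 2017 holdout
```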

hybridModel of Auto.arima and ANN produce point forecast outside of 95% CI

I have been working on time series forecasting and recently read about how a hybrid model of auto.arima and an ANN can provide more accurate forecasting results.
I have six time series data sets; the hybrid model works wonders for five of them, but it gives weird results for the sixth.
I ran the model using the following two packages:
library(forecast)
library(forecastHybrid)
Here is the data:
ts.data
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012 1 16 41 65 87 104 152 203 213 263
2013 299 325 388 412 409 442 447 421 435 448 447 443
2014 454 446 467 492 525
Model:
fit <- hybridModel(ts.data, model="an")
Forecast results for the next 5 periods:
forecast(fit, 5)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jun 2014 594.6594 519.2914 571.0163 505.6007 584.7070
Jul 2014 702.1626 528.7327 601.8827 509.3710 621.2444
Aug 2014 738.5732 540.6665 630.2566 516.9534 653.9697
Sep 2014 752.1329 553.8905 657.3403 526.5090 684.7218
Oct 2014 762.7481 567.9391 683.5994 537.3256 714.2129
You see how the point forecasts are outside of the 95% confidence interval.
Does anybody know why this is happening and how I could fix it?
Any thoughts and insights are appreciated!
Thanks in advance.
See the description of this issue here.
tl;dr: nnetar models do not create prediction intervals, so these are not included in the ensemble prediction intervals. When the "forecast" package adds this behavior (on the road map for 2016), the prediction intervals and point forecasts will be consistent.
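Until that lands, one workaround sketch is to build the ensemble only from components that all produce prediction intervals, for example dropping the nnetar ("n") component (the exact argument spelling may vary across forecastHybrid versions):

```r
library(forecastHybrid)

# Sketch: an ensemble whose components (auto.arima + ets here) both
# supply prediction intervals, keeping points and intervals consistent
fit2 <- hybridModel(ts.data, models = "ae")
forecast(fit2, 5)
```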

How to get forecast dataset from R language?

I am following along with this guide to forecast data using an ARIMA model.
The question I have is: how do I extract the data points from the forecasted data?
I would like to have those points so I could graph the exact same thing in Excel. Is this possible?
Thank you.
Suppose you use something like
library(forecast)
m_aa <- auto.arima(AirPassengers)
f_aa <- forecast(m_aa, h=24)
then you can show values for the forecast, for example with
f_aa
which gives
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 1961 446.7582 431.7435 461.7729 423.7953 469.7211
Feb 1961 420.7582 402.5878 438.9286 392.9690 448.5474
Mar 1961 448.7582 427.9043 469.6121 416.8649 480.6515
Apr 1961 490.7582 467.5287 513.9877 455.2318 526.2846
May 1961 501.7582 476.3745 527.1419 462.9372 540.5792
Jun 1961 564.7582 537.3894 592.1270 522.9012 606.6152
Jul 1961 651.7582 622.5388 680.9776 607.0709 696.4455
Aug 1961 635.7582 604.7986 666.7178 588.4096 683.1069
Sep 1961 537.7582 505.1511 570.3653 487.8900 587.6264
Oct 1961 490.7582 456.5830 524.9334 438.4918 543.0246
Nov 1961 419.7582 384.0838 455.4326 365.1989 474.3176
Dec 1961 461.7582 424.6450 498.8714 404.9985 518.5179
Jan 1962 476.5164 431.6293 521.4035 407.8675 545.1653
Feb 1962 450.5164 401.1834 499.8494 375.0681 525.9647
Mar 1962 478.5164 425.1064 531.9265 396.8328 560.2000
Apr 1962 520.5164 463.3192 577.7137 433.0408 607.9920
May 1962 531.5164 470.7676 592.2652 438.6092 624.4237
Jun 1962 594.5164 530.4126 658.6203 496.4780 692.5548
Jul 1962 681.5164 614.2245 748.8083 578.6024 784.4304
Aug 1962 665.5164 595.1809 735.8519 557.9475 773.0853
Sep 1962 567.5164 494.2636 640.7692 455.4859 679.5469
Oct 1962 520.5164 444.4581 596.5747 404.1953 636.8376
Nov 1962 449.5164 370.7525 528.2803 329.0574 569.9754
Dec 1962 491.5164 410.1368 572.8961 367.0570 615.9758
and you can save these values with something like
write.csv(f_aa, file="location_and_filename.csv")
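The individual series can also be pulled straight out of the forecast object: $mean holds the point forecasts, while $lower and $upper hold the interval bounds (one column per level, here 80% and 95%):

```r
# Extract the forecast components into a plain data frame
df <- data.frame(
  point = as.numeric(f_aa$mean),
  lo80  = as.numeric(f_aa$lower[, 1]),
  hi80  = as.numeric(f_aa$upper[, 1]),
  lo95  = as.numeric(f_aa$lower[, 2]),
  hi95  = as.numeric(f_aa$upper[, 2])
)
head(df)
```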