Error using 'residuals' on moving average for ts - vector

I have been working on forecasting with a ts object. To test the accuracy of a moving average, I used the code below:
fixt_ma <- ma(fixtures_training, 3)
residuals(fixt_ma)
acc_fixt_ma <- accuracy(fixt_ma, fixtures_test)
dput output for fixtures_training:
structure(c(161L, 338L, 393L, 405L, 439L, 386L, 442L, 406L, 413L,
421L), .Tsp = c(2019.48076923077, 2019.65384615385, 52), class = "ts")
When I use the residuals(fixt_ma) function, or alternatively when I write residuals$fixt_ma, I get the error below:
Error: $ operator is invalid for atomic vectors
Does anyone know how I can fix this?

forecast::ma does moving average smoothing. It is not a model and so it has no residuals.
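Incidentally, the $ error itself has nothing to do with forecast: $ is invalid on any atomic vector, which is exactly what ma() returns. A minimal base-R illustration (toy series, not the question's data):

```r
x <- ts(1:10)   # a plain ts vector, much like the output of forecast::ma()
is.atomic(x)    # TRUE - there are no named components to extract
tryCatch(x$foo, error = function(e) conditionMessage(e))
# "$ operator is invalid for atomic vectors"
```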
Perhaps you wanted an MA(3) model, in which case you could use
fixt_ma <- Arima(fixtures_training, order=c(0,0,3))
Then residuals() will work:
residuals(fixt_ma)
#> Time Series:
#> Start = c(2019, 26)
#> End = c(2019, 35)
#> Frequency = 52
#> [1] -79.553356 99.757891 -16.836345 -8.949918 42.906861 4.752855
#> [7] 39.762007 -29.453682 47.281713 11.804971
However accuracy() will give an error using the code in your question. If you want forecast accuracy measures on the test set, you first have to produce forecasts:
fc_ma <- forecast(fixt_ma, h=length(fixtures_test))
acc_fixt_ma <- accuracy(fc_ma, fixtures_test)

Related

Use lag(x,1) or lag(x,-1) for dynamic regression?

I have a simple yet somehow confusing question about dynamic regressions and lagged independent variables. I have 3 time series and I want to study the effect of 3 independent variables (namely PSVI, NSVI, and BTC_Ret) from the previous week on the current week's bitcoin log returns. I want to analyse, for example, whether a negative change in PSVI (Positive Sentiment Index) from the previous week can tell us something about the direction of the BTC returns in the following week.
I came across the lag function, which can do exactly that.
If I understand the function correctly, I would use the lag function in combination with the dyn$lm function from the package dyn to get the results I want.
My code would then look as follows:
test1 <- dyn$lm(BTC_Ret~lag(PSVI,1)+lag(NSVI,1)+lag(BTC_Ret,1))
summary(test1)
Am I right to assume that I need to use lag(x,1) and not lag(x,-1)?
And should I use dyn$lm to study the effect or is there a better way to do all of this?
My data looks as follows:
structure(c(0.151825062532955, -0.179352391776254, -0.171610266403897,
0.0159227765884022, -0.353420091085592, -0.0179223189753976,
0.260710954985742, -0.0878045204765083, 0.17494222283881, -0.183889954532262,
-0.15249960475038, 0.0325479482522972, -0.216135243885031, 0.0258548317723122,
0.170469815313808, 0.0552681180119521, 0.0676987678252168, 0.0247151614282206,
-0.101373110320685, -0.0244444101458825, -0.363995910827583,
-0.819549195465083, -0.311532754839479, -0.661660753934884, -0.036159476713393,
-0.0116417252109642, -0.219357256430676, -0.386169350367107,
-0.468384245564164, 0.226420789220966, -0.2366560332375, 0.2425676656972,
-0.351430535471613, -0.287492079068963, 0.548071569094531, -0.228973857164721,
-0.139490538928287, 0.247548840497568, -0.361502742177194, 0.0604938285432965,
0.619445016304069, 0.0947076213861557, -0.887137767470338, 0.0485516007581502,
0.0429273907756451, -0.701341407090506, 0.34191134646093, -0.428167056300805,
-0.298917079322128, 0.517537828051947, 0.0474069010338689, -0.118044838446349,
-0.414289228784203, 0.143198527419672, 0.0733053148180489, 0.0131259707878403,
-0.106103445964187, 0.107827719520595, -0.604074345624302, 0.444400965939648
), .Dim = c(20L, 3L), .Dimnames = list(NULL, c("BTC_Ret", "PSVI",
"NSVI")), .Tsp = c(2018, 2018.36538461538, 52), class = c("mts",
"ts", "matrix"))
Many thanks!
Assuming tt as defined in the Note at the end (copied from the question), we use the following.
The ts class is normally used with R's own lag. The -1 there means move the series one period ahead, so that the previous value lines up with the current row. There is more information in ?lag.
Do not use dplyr's lag, which does not work with the ts class and, furthermore, uses the opposite sign convention. If you want to load dplyr, use library(dplyr, exclude = c("filter", "lag")) to ensure that you are using R's lag.
library(dyn)
test1 <- dyn$lm(BTC_Ret ~ lag(PSVI,-1) + lag(NSVI,-1) + lag(BTC_Ret,-1), tt)
These alternatives also work:
Lag <- function(x, k = 1) lag(x, -k)
test2 <- dyn$lm(BTC_Ret ~ Lag(PSVI) + Lag(NSVI) + Lag(BTC_Ret), tt)
test3 <- dyn$lm(BTC_Ret ~ lag(tt, -1), tt)
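A quick base-R check (toy series, not the question's data) of the alignment that lag(x, -1) produces:

```r
x <- ts(1:5)
# lag(x, -1) moves the series one period ahead, so after intersecting,
# the value from time t-1 sits next to the value at time t
aligned <- ts.intersect(current = x, previous = lag(x, -1))
# current column: 2 3 4 5; previous column: 1 2 3 4
```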
Note
tt <- structure(c(0.151825062532955, -0.179352391776254, -0.171610266403897, 0.0159227765884022, -0.353420091085592, -0.0179223189753976, 0.260710954985742, -0.0878045204765083, 0.17494222283881, -0.183889954532262, -0.15249960475038, 0.0325479482522972, -0.216135243885031, 0.0258548317723122, 0.170469815313808, 0.0552681180119521, 0.0676987678252168, 0.0247151614282206, -0.101373110320685, -0.0244444101458825, -0.363995910827583, -0.819549195465083, -0.311532754839479, -0.661660753934884, -0.036159476713393, -0.0116417252109642, -0.219357256430676, -0.386169350367107, -0.468384245564164, 0.226420789220966, -0.2366560332375, 0.2425676656972, -0.351430535471613, -0.287492079068963, 0.548071569094531, -0.228973857164721, -0.139490538928287, 0.247548840497568, -0.361502742177194, 0.0604938285432965, 0.619445016304069, 0.0947076213861557, -0.887137767470338, 0.0485516007581502, 0.0429273907756451, -0.701341407090506, 0.34191134646093, -0.428167056300805, -0.298917079322128, 0.517537828051947, 0.0474069010338689, -0.118044838446349, -0.414289228784203, 0.143198527419672, 0.0733053148180489, 0.0131259707878403, -0.106103445964187, 0.107827719520595, -0.604074345624302, 0.444400965939648 ), .Dim = c(20L, 3L), .Dimnames = list(NULL, c("BTC_Ret", "PSVI", "NSVI")), .Tsp = c(2018, 2018.36538461538, 52), class = c("mts", "ts", "matrix"))

replace() error in MAGE function in R's iglu package for glucose analysis

I'm trying to use the all_metrics() function of the iglu package in R. The function always throws an error when calling mage_ma_single(), which itself gets called by mage(). Here's the source code of the package function that leads to the error; see lines 58 to 64.
Here's the error I'm getting:
Error: Problem with mutate() column MA_Long.
ℹ MA_Long = replace(MA_Long, 1:long_ma, MA_Long[long_ma]).
ℹ MA_Long must be size 5 or 1, not 23.
Here's some sample data with which I can reproduce the error:
structure(list(id = c(1, 1, 1, 1, 1), time = structure(c(1611228720, 1611229080, 1611247620, 1611249960, 1611263940), class = c("POSIXct", "POSIXt"), tzone = ""), gl = c(97L, 90L, 89L, 96L, 87L)), row.names = c(NA, 5L), class = "data.frame")
It turned out there needed to be at least 23 observations per ID in order for all_metrics() (and mage_ma_single()) to run successfully. I've emailed the package's author, Irina Gaynanova, and she'll look into providing a fix soon.
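Until a fix lands, a simple pre-check (base R, using the column names from the sample data) can flag IDs with fewer than the 23 observations mage_ma_single() needs:

```r
dat <- data.frame(id = c(1, 1, 1, 1, 1),
                  gl = c(97L, 90L, 89L, 96L, 87L))
counts <- table(dat$id)
names(counts)[counts < 23]   # IDs too short for all_metrics()
# "1"
```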

Error in xy.coords when trying to fit ARIMA model, please advise

I'm hoping you might be able to help me with an issue I'm having when trying to fit an ARIMA model for a school project I'm working on.
The data I'm using shows weekly sales figures starting from 2019 and going through 2021. My goal is to produce a forecast for the remainder of 2021 based on those figures. As my dataset is comprised of weekly data, and the seasonality based on the ACF and PACF plots seems to occur once a year, I've set the S argument of the sarima() function to 52. The problem is that every time I try to run the model I get an error, and I can't figure out any way of getting rid of it.
I've tried the same code with other data sets in the DataCamp environment with S = 52, and the model runs without a problem. I'm hoping somebody might be able to give me some advice on how to deal with this issue. Thank you!
P.S.
If the S argument is set lower than 35, the model will run. (Just in case this information helps.)
####Load Packages####
library(tidyverse)
library(zoo)
library(xts)
library(lubridate)
library(astsa)
library(tseries)
library(forecast)
######Load and inspect the data########
unit_sales <- structure(list(Date = c("30/03/2019", "06/04/2019", "13/04/2019",
"20/04/2019", "27/04/2019", "04/05/2019", "11/05/2019", "18/05/2019",
"25/05/2019", "01/06/2019", "08/06/2019", "15/06/2019", "22/06/2019",
"29/06/2019", "06/07/2019", "13/07/2019", "20/07/2019", "27/07/2019",
"03/08/2019", "10/08/2019", "17/08/2019", "24/08/2019", "31/08/2019",
"07/09/2019", "14/09/2019", "21/09/2019", "28/09/2019", "05/10/2019",
"12/10/2019", "19/10/2019", "26/10/2019", "02/11/2019", "09/11/2019",
"16/11/2019", "23/11/2019", "30/11/2019", "07/12/2019", "14/12/2019",
"21/12/2019", "28/12/2019", "04/01/2020", "11/01/2020", "18/01/2020",
"25/01/2020", "01/02/2020", "08/02/2020", "15/02/2020", "22/02/2020",
"29/02/2020", "07/03/2020", "14/03/2020", "21/03/2020", "28/03/2020",
"04/04/2020", "11/04/2020", "18/04/2020", "25/04/2020", "02/05/2020",
"09/05/2020", "16/05/2020", "23/05/2020", "30/05/2020", "06/06/2020",
"13/06/2020", "20/06/2020", "27/06/2020", "04/07/2020", "11/07/2020",
"18/07/2020", "25/07/2020", "01/08/2020", "08/08/2020", "15/08/2020",
"22/08/2020", "29/08/2020", "05/09/2020", "12/09/2020", "19/09/2020",
"26/09/2020", "03/10/2020", "10/10/2020", "17/10/2020", "24/10/2020",
"31/10/2020", "07/11/2020", "14/11/2020", "21/11/2020", "28/11/2020",
"05/12/2020", "12/12/2020", "19/12/2020", "26/12/2020", "02/01/2021",
"09/01/2021", "16/01/2021", "23/01/2021", "30/01/2021", "06/02/2021",
"13/02/2021", "20/02/2021", "27/02/2021", "06/03/2021", "13/03/2021",
"20/03/2021", "27/03/2021"), Units = c(967053.4, 633226.9, 523264,
473914.2, 418087.5, 504342.2, 477819, 415650, 406972.3, 429791.4,
441724.4, 453221.8, 402005.8, 414993.4, 381457.2, 391218.7, 486925.9,
409791.8, 399217.9, 409210, 478121.2, 495549.1, 503918.3, 535949.5,
517450.4, 523036.8, 616456.9, 665979.3, 705201.5, 700168.1, 763538.8,
875501.2, 886586.6, 967806, 1094195, 1285950.5, 1450436.1, 1592162.8,
2038160.5, 1676988.8, 1026193.7, 820405.5, 738643.9, 669657.6,
720287.7, 673194.1, 754102.5, 639532, 680413.6, 710702, 711722.8,
834036.8, 427817.2, 505849.6, 441047.4, 439411, 487634.1, 594594.8,
548796.7, 565682, 528275.2, 448092, 467780.1, 544160.3, 538275.8,
485055.5, 592097.3, 537514.3, 493381.9, 445280.8, 448111.2, 419263.4,
457125.7, 561169.6, 704575.3, 656423.1, 653751.3, 622937.7, 718022.8,
768901.9, 793443, 814604.2, 876269.3, 982921.8, 1064920.7, 1201494.4,
1337374.9, 1619595.8, 1734773.8, 1624071, 1777832.3, 1648201.9,
1106253.8, 940141.1, 796129.1, 853392.9, 932059.1, 905990.4,
981188.6, 907823.9, 956098.8, 1003966.7, 1331125.5, 805593.6,
799486.2)), class = "data.frame", row.names = c(NA, -105L))
####Convert date column to date format
unit_sales$Date <- as.Date(unit_sales$Date, format ="%d/%m/%Y" )
###Convert to xts object
unit_sales_xts <- xts(unit_sales, unit_sales$Date)
periodicity(unit_sales_xts)
###Convert to ts object
unit_sales_vector <- unit_sales$Units
unit_sales_ts <- ts(unit_sales_vector, start = decimal_date(as.Date("2019-03-30")), frequency = 52)
###Plot data
ts.plot(unit_sales_ts)
###Make data stationary and plot it
ts.plot(diff(log(unit_sales_ts)))
###Plot ACF and PACF
pacf_plot <- pacf(diff(log(unit_sales_ts)), lag.max = 105)
acf_plot <- acf(diff(log(unit_sales_ts)), lag.max = 105)
###Test if data is stationary
adf.test(diff(log(unit_sales_ts)))
###Fit ARIMA model
sarima(unit_sales_ts, p = 1, d = 1, q = 0)
sarima.for(unit_sales_ts, n.ahead = 39, 1,1,0)
###Fit Seasonal ARIMA model - THIS IS WHERE THE ERROR OCCURS
sarima(unit_sales_ts, p = 1, d = 1, q = 0, P = 0, D = 1, Q = 0, S = 52)
###Forecast using the above model
sarima.for(unit_sales_ts,n.ahead = 39, p = 1, d = 1, q = 0, P = 0, D = 1, Q = 0, S = 52)
I tested your code and got the same error, so I read through the astsa::sarima() implementation and found these two lines, which concern the use of seasonality with your data:
alag <- max(10 + sqrt(num), 3 * S)
nlag <- ifelse(S < 7, 20, 3 * S)
Without reading the whole implementation, I deduce that the package author assumes three times the season length of data for the parameter to work correctly, which is not the case with your 105 observations when using S = 52. Whether that is a bug, or just not well documented or properly handled in the code, I cannot tell you. I do not know which version of the package DataCamp runs, or what the update history of the package is. But we can assume that at least one of the two lines causes the error, since every value of S from 35 upward triggers it (3 * 35 = 105, the length of your series).
One workaround is to print the function's implementation to the console (just type astsa::sarima and hit Enter), copy it, modify those lines (I used 2 * instead of 3 *), and assign the result to a function name of your own. Then the code runs. You could also print the function in the DataCamp environment and compare it to your local installation.
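To see why 35 is the breaking point, you can evaluate those two lines with the question's dimensions (assuming num is the series length, here 105 weekly observations):

```r
num <- 105   # observations in unit_sales_ts
S <- 52
alag <- max(10 + sqrt(num), 3 * S)   # 156
nlag <- ifelse(S < 7, 20, 3 * S)     # 156
# Both lag windows come out at 156, longer than the series itself;
# 3 * 35 = 105 already equals the series length, so any S >= 35 breaks.
```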

Checking residuals (from ETS+STL method) with checkresiduals() function

I have a ts object containing one column of weekly data (frequency = 52) for the period 2016-2019 (only one week from 2019).
#>TEST_1
#>Time Series:
#>Start = c(2016, 1)
#>End = c(2019, 1)
#>Frequency = 52
So I am forecasting with this ts object using the forecast() function from the forecast package. This function selects the best ETS (exponential smoothing) model for my series.
Forecast method: STL + ETS(M,A,N)
Model Information:
ETS(M,A,N)
Call:
ets(y = x, model = etsmodel, allow.multiplicative.trend = allow.multiplicative.trend)
Smoothing parameters:
alpha = 0.0044
beta = 0.0044
Initial states:
l = 496.0001
b = -0.7495
sigma: 0.2538
AIC AICc BIC
2328.009 2328.406 2343.290
But here a new problem arises for me. Namely, I am trying to perform residual diagnostics on the residuals from this model with the function checkresiduals(), but I receive this message.
#> Warning message:
#> In modeldf.default(object) :
#> Could not find appropriate degrees of freedom for this model.
So can anybody help me find the appropriate degrees of freedom to pass to checkresiduals() for this model? Below are the residuals.
residuals<-structure(c(103.861587225712, 232.922530738897, -177.501044573567,
-32.3310448885088, 51.8658720663952, -127.669525632371, -21.3736988850188,
31.8283388622758, 134.388167819753, -202.279672375648, -150.211885150427,
59.7872220312138, 7.21928088178879, -31.0067512774922, 240.664063232754,
-259.693899860492, 51.2068097649542, 133.051059120384, 153.754774108432,
-245.448120335887, -41.7151580882252, 329.736089553496, -176.574681226445,
-5.49877539363433, -57.9440644242901, -141.920372666123, 59.631632197218,
30.3566233456523, -19.5674149569647, 49.8299466802158, 8.08039437858747,
-179.219757481181, 61.6262480548803, 14.2886335749734, 147.521659709062,
-203.114556948222, 232.39658682842, 17.0359701527633, 122.671792930753,
1.17404214154658, -21.3604900851155, 43.6067134825538, 56.6694972222097,
-74.206099457236, 22.2154797604099, -42.6209506582884, -69.0881062270763,
44.9935627424999, -65.4843011281191, 45.9859871219855, 38.48475732006,
217.607886572158, -81.752879329815, -62.3165846738133, 91.3280029935076,
13.8065979268541, -27.5160607993942, -2.45614326754531, 8.82428074173083,
-21.9816546447523, 58.6350169306539, 2.99591624137327, 25.4548944489055,
-7.80971451574547, -33.741824891111, 148.727324165574, -103.887619405031,
13.6976122890256, -6.22642628362576, -89.0151943344358, 151.68500527824,
113.373271376477, 165.103295852743, -295.039665234726, 213.698114407198,
-76.4034402042766, -9.34573346398901, -71.4103830503603, 122.800589573655,
-55.724016585403, 63.7939569095491, 44.9784699409192, 151.519180259845,
-58.4408170188741, -74.3037359893916, -47.7713298497972, 163.367074626196,
-249.379445021869, -112.112655284116, -43.5458433646284, -53.5666005867634,
281.491207440336, -121.212142480196, -33.9138735682901, -31.1438180301793,
-31.2555698825003, 20.3181357200996, -46.2564548372715, 19.2769399131227,
82.0903051423776, -53.9874588993755, -81.7381076026692, -109.42037514781,
-128.567530337503, 239.606771386708, -163.928615298084, 88.3650587021525,
22.3840519205474, -19.7936259061341, 133.392615761316, 14.8789465334592,
-7.35384302392632, -193.309220279654, 199.807229000058, 124.081926626315,
-52.3795507957004, 26.248230162833, -123.352126375918, -136.687848362162,
242.06397333675, -49.2896526387001, -47.0413692896267, -315.639803224046,
122.111855110991, -135.453045844048, -34.9514109509343, -51.0671430546247,
75.2304903204274, 58.5168476811577, 205.900859581612, -195.231017102347,
17.0666471041718, -55.7835085816988, -105.931678098968, -173.52733115843,
229.313605012801, 4.76417288414814, 24.9291766474627, -324.904858037879,
449.500524512662, -126.709163220759, 18.7291455153395, -76.1328146141673,
-298.217791616455, 137.973841964018, -16.2916958267025, -31.8650948708939,
99.4876416447454, -49.4760819558044, 84.1071094148195, 44.155870901787,
-133.53348599245, 117.30321085781, 35.0222913102854, 71.5981819455558,
-87.2032279610021, -272.900607282635), .Tsp = c(2016, 2019, 52
), class = "ts")
The degrees of freedom are: DF = (# lags to be tested) - (# parameters that you estimate).
You can manually set the degrees of freedom using:
checkresiduals(..., df = DF)
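For example, checkresiduals() runs a Ljung-Box test, which base R's Box.test() also exposes. An ETS(M,A,N) model estimates two smoothing parameters (alpha, beta) and two initial states (l, b), so df = 4 is one defensible choice; the lag count of 104 (2 x 52) below is likewise an assumption, not the function's documented default:

```r
# Hypothetical stand-in for the question's residual series
set.seed(1)
res <- ts(rnorm(157), frequency = 52)
# Ljung-Box test with 104 lags and 4 estimated parameters (assumed values);
# the test then uses 104 - 4 = 100 degrees of freedom
Box.test(res, lag = 104, type = "Ljung-Box", fitdf = 4)
```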
Hope it works.
Goosse

ARIMA forecasts are way off

I am using ARIMA (auto.arima) to forecast for 52 weeks. The time series model fits the data well (see plot below, red line is the fitted value). The input data has a decreasing trend.
The forecasts (highlighted area), however, seem to just take off after the actual values end.
How can the forecasts be tamed?
dput of the input
> dput(baseTs)
structure(c(5.41951956469523, 5.49312499014084, 5.56299025716832,
5.64442852110163, 5.71385023974044, 5.77578632033402, 5.82985917237953,
5.86346591034374, 5.89626165157029, 5.92013286862512, 5.94200331713403,
5.93996840759539, 5.93917517855891, 5.90355191030718, 5.87180377346416,
5.83190030607801, 5.79624428055153, 5.75377043604686, 5.71445345904649,
5.70025269940165, 5.69789272204017, 5.73728731204876, 5.77015169357394,
5.78936321107329, 5.80113284575595, 5.79449448552444, 5.78193215198878,
5.74003482344406, 5.71694163930612, 5.66689345413153, 5.614357635737,
5.58578389962286, 5.55824727570498, 5.58495146060423, 5.61344117957187,
5.63637441850401, 5.65948408172102, 5.65558124383951, 5.64909390802285,
5.6664546352889, 5.68205689033408, 5.69991437586231, 5.72273650369514,
5.72006065065194, 5.71556512542993, 5.6717608006789, 5.64610326418084,
5.57193975508467, 5.49406607804055, 5.40126523530993, 5.31513540386482,
5.238437956722, 5.15362077920702, 5.11960611878249, 5.08498887979172,
5.08408134201562, 5.07361213981111, 5.04830559379816, 5.01401413448689,
5.0418662607737, 5.06947584464062, 5.08771495309317, 5.10587165060358,
5.1438369937098, 5.1815251206981, 5.2318657906363, 5.29385492077065,
5.29652029253008, 5.29998067741868, 5.28242409629194, 5.2722770646788,
5.24927444462166, 5.22226735874711, 5.16555064465208, 5.10956459841778,
5.09439240612378, 5.07617974794969, 5.04418337811006, 5.0075619037348,
4.99108423417745, 4.9874504485194, 4.99135285004736, 4.99217791657733,
4.94874445528885, 4.90320874819525, 4.84508278068469, 4.79086127023963,
4.75236840849279, 4.71431573721527, 4.71936529020481, 4.72422850167074,
4.72203091743033, 4.71732868614755, 4.71175323610448, 4.70566162766782,
4.71165837247331, 4.71767529028615, 4.75129316683193, 4.7863855803437,
4.85248191548789, 4.91865394024373, 4.9590849617955, 4.99960686851895,
5.02020678181827, 5.04201201976595, 5.02025906892952, 4.99735920720967,
4.92520279823639, 4.84822505567723, 4.81118504683572, 4.77330440072099,
4.72636395544651, 4.6861111959621, 4.64912520396312, 4.61348981514599,
4.58517820348434, 4.56378688913207, 4.549011597464, 4.52900600122321,
4.56028365470815, 4.60248987909752, 4.65628990381626, 4.70496326660038,
4.73779351647955, 4.76616725791407, 4.79569018347378, 4.83185281078024,
4.85177852259102, 4.87488251014986, 4.89468916229158, 4.9077984323135,
4.92375782591088, 4.96363767543938, 5.05416277704822, 5.1426680212522,
5.232495043331, 5.32153608753653, 5.41780853915163, 5.51131526881126,
5.62791210324026), .Tsp = c(2015.05769230769, 2017.73076923077,
52), class = "ts")
The code used
fc <- try(auto.arima(baseTs, ic = 'aic', approximation = FALSE))
baseFc <- forecast(fc,h = weeks_forecasted)
baseVolume_forecast_new <- baseFc$mean
What could be the reason behind the forecasts exploding?
