Issues with ARIMAX forecasting (auto.arima) - r

I'm trying to forecast an accumulated monthly time series (see data below) with the auto.arima function with exogenous regressors. I have two issues.
1) My first issue is that when I fit the model and use the forecast function to predict the second half of 2019, the forecast starts from zero, as can be seen in this forecast plot.
This only happens when I include a matrix of exogenous regressors, not when I use a single time series as the regressor, as can be seen in this plot.
Why is that? My code is:
regnskab <- ts(data$Regnskab, frequency = 12, start = c(2014,1), end = c(2019,6))
budget <- ts(data$Budget, frequency = 12, start = c(2014,1), end = c(2019,6))
dagtilbud <- ts(data$Dagtilbud, frequency = 12, start = c(2014,1), end = c(2019,6))
skole <- ts(data$Skole, frequency = 12, start = c(2014,1), end = c(2019,6))
sundhed <- ts(data$Sundhed, frequency = 12, start = c(2014,1), end = c(2019,6))
miljø <- ts(data$Miljø, frequency = 12, start = c(2014,1), end = c(2019,6))
tsmatrix <- cbind(budget, dagtilbud, miljø, skole, sundhed)
fit <- auto.arima(regnskab, xreg = tsmatrix)
fcast <- forecast(fit, h = 6, xreg = tsmatrix)
autoplot(fcast)
summary(fcast)
2) My second issue is that I want a forecast for 6 months forward, but the h=6 option does not apply when including exogenous regressors. Can this be solved in any way? Again, it is not a problem without exogenous regressors.
I hope you can help and sorry for the data spamming!
A summary of my model:
> summary(fcast)
Forecast method: Regression with ARIMA(1,0,0)(1,0,0)[12] errors
Model Information:
Series: regnskab
Regression with ARIMA(1,0,0)(1,0,0)[12] errors
Coefficients:
         ar1    sar1  budget  dagtilbud   miljø   skole  sundhed
      0.7466  0.6693  0.0101     2.0861  0.1037  2.5240   7.7623
s.e.  0.0935  0.1042  0.0077     0.6967  1.7672  0.7535   2.6611
sigma^2 estimated as 1.884: log likelihood=-114.84
AIC=245.68 AICc=248.21 BIC=263.2
Error measures:
                      ME     RMSE       MAE        MPE      MAPE      MASE        ACF1
Training set -0.01739231 1.297694 0.9002519 -0.1065542 0.9060671 0.3687968 -0.03222251
> regnskab
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2014 19.11281 36.68003 54.66383 74.93864 94.10328 113.36373 134.96638 152.75095 170.79800 189.55430 207.00803 227.82096
2015 18.90205 37.20079 55.73305 75.44689 94.74538 115.03997 136.79829 155.41164 173.69889 191.96484 210.42391 231.52982
2016 20.12939 38.51516 56.32522 78.04822 97.46681 116.58424 139.43255 157.83048 175.26727 195.06259 213.73833 234.45281
2017 20.43082 38.55219 57.50119 78.07558 97.50132 119.13735 141.71973 161.49281 180.32002 199.27769 216.92571 239.40683
2018 19.35194 37.40571 55.36897 76.33412 95.90922 117.41442 140.03545 159.10527 177.88068 194.43207 215.28905 245.85670
2019 20.85722 40.01691 59.97383 81.92719 103.15225 123.81454
> tsmatrix
budget dagtilbud miljø skole sundhed
Jan 2014 230.0605 2.616639 0.597125 3.193017 0.456470
Feb 2014 230.0605 5.025708 1.047983 6.402845 1.012468
Mar 2014 230.0605 7.548424 1.458105 9.816814 1.602384
Apr 2014 230.0605 10.350321 1.957022 13.446215 2.263646
May 2014 230.0605 12.913356 2.439587 17.100957 2.873934
Jun 2014 230.0605 15.380146 2.915020 20.791343 3.498350
Jul 2014 230.0605 17.931069 3.434464 23.701276 3.987042
Aug 2014 230.0605 20.441732 3.837721 27.319389 4.597127
Sep 2014 230.0605 22.839922 4.295486 30.859254 5.185271
Oct 2014 230.0605 25.234620 4.761740 34.350629 5.819948
Nov 2014 230.0605 27.554525 5.163576 37.688182 6.416112
Dec 2014 230.0605 30.109529 5.742699 42.095747 7.313195
Jan 2015 234.5089 2.404843 0.643976 3.185265 0.477921
Feb 2015 234.5089 5.090533 1.094641 6.654691 1.040235
Mar 2015 234.5089 7.319261 1.462134 10.168618 1.659232
Apr 2015 234.5089 10.040823 1.943120 14.082780 2.356247
May 2015 234.5089 12.470742 2.431818 17.827494 2.963360
Jun 2015 234.5089 14.846720 3.019969 21.612527 3.615607
Jul 2015 234.5089 17.543682 3.540084 24.702634 4.126374
Aug 2015 234.5089 19.786612 3.984587 28.330977 4.741392
Sep 2015 234.5089 22.037785 4.362497 31.942762 5.367815
Oct 2015 234.5089 24.347196 4.805391 35.423452 6.019133
Nov 2015 234.5089 26.751255 5.250481 38.964450 6.642436
Dec 2015 234.5089 29.276667 5.789919 43.428855 7.555361
Jan 2016 237.2361 2.538133 0.721184 3.352676 0.508847
Feb 2016 237.2361 4.906975 1.377086 6.804320 1.100914
Mar 2016 237.2361 7.184724 1.719629 10.290800 1.744743
Apr 2016 237.2361 9.895237 2.333842 14.223635 2.480869
May 2016 237.2361 12.316509 2.850905 17.957433 3.115473
Jun 2016 237.2361 14.578536 3.404785 21.759111 3.858713
Jul 2016 237.2361 17.215216 3.867858 24.949928 4.359129
Aug 2016 237.2361 19.399769 4.406750 28.503968 5.030926
Sep 2016 237.2361 21.702215 4.792190 32.112449 5.674259
Oct 2016 237.2361 24.112579 5.238401 35.625806 6.328084
Nov 2016 237.2361 26.453919 5.677270 39.158270 6.977991
Dec 2016 237.2361 28.969565 6.098136 43.558768 7.974787
Jan 2017 241.9089 2.538901 0.917354 3.488151 0.535639
Feb 2017 241.9089 4.847981 1.450172 6.857674 1.138782
Mar 2017 241.9089 7.281994 1.899543 10.394615 1.808938
Apr 2017 241.9089 10.031959 2.388542 14.335895 2.554613
May 2017 241.9089 12.411935 2.893036 18.042788 3.206503
Jun 2017 241.9089 14.982942 3.282057 22.137085 3.959622
Jul 2017 241.9089 17.567382 3.770244 25.392706 4.540047
Aug 2017 241.9089 19.738993 4.484434 29.108498 5.196528
Sep 2017 241.9089 22.273634 5.051894 32.693173 5.870257
Oct 2017 241.9089 24.636583 5.456458 36.203329 6.544383
Nov 2017 241.9089 27.259158 5.793056 39.867875 7.249982
Dec 2017 241.9089 29.831986 6.079033 44.273697 8.269454
Jan 2018 246.0944 2.467981 0.985846 3.377469 0.544258
Feb 2018 246.0944 4.877189 1.383190 6.815726 1.167431
Mar 2018 246.0944 7.367918 1.738033 10.486250 1.848972
Apr 2018 246.0944 10.148353 2.249466 14.439246 2.614913
May 2018 246.0944 12.687311 2.844656 18.194669 3.328234
Jun 2018 246.0944 15.482606 3.616200 22.433048 4.108966
Jul 2018 246.0944 17.715938 3.982451 25.305411 4.689087
Aug 2018 246.0944 20.077201 4.696088 29.018017 5.396796
Sep 2018 246.0944 22.659831 5.158706 32.860215 6.087975
Oct 2018 246.0944 24.719623 5.586616 36.143198 6.713136
Nov 2018 246.0944 27.750904 6.069519 40.237747 7.501346
Dec 2018 246.0944 30.326036 6.308786 44.733470 8.564162
Jan 2019 251.9230 2.653607 0.932776 3.501389 0.595458
Feb 2019 251.9230 5.070721 1.445741 6.991538 1.243721
Mar 2019 251.9230 7.542256 1.825956 10.737607 1.941444
Apr 2019 251.9230 10.301781 2.330015 14.647082 2.733956
May 2019 251.9230 13.193286 2.999816 18.671285 3.455616
Jun 2019 251.9230 15.423716 3.516735 22.612031 4.145206

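The xreg argument of the forecast function must contain the regressor values for the future time periods, not the historical values used for fitting. When xreg is supplied, the forecast horizon is taken from the number of rows in xreg, which is why h = 6 appears to be ignored. If you want h = 6, pass a matrix with 6 rows, one for each of those 6 periods, with the same columns as the training matrix.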
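As a minimal sketch of this, using simulated data rather than the series from the question: the names y, x1, x2, hist_xreg and future_xreg are hypothetical stand-ins for regnskab and the columns of tsmatrix, and the future regressor values here are random placeholders. In practice those values have to be known in advance (such as a budget) or forecast separately.

```r
library(forecast)

set.seed(1)
n <- 66  # Jan 2014 to Jun 2019, the same span as in the question

# Hypothetical historical regressors (stand-ins for the real columns)
hist_xreg <- cbind(x1 = rnorm(n, 10, 2), x2 = rnorm(n, 5, 1))

# Simulated response that depends on the regressors
y <- ts(3 + 2 * hist_xreg[, "x1"] + 0.5 * hist_xreg[, "x2"] + rnorm(n),
        frequency = 12, start = c(2014, 1))

fit <- auto.arima(y, xreg = hist_xreg)

# Six rows of FUTURE regressor values (Jul-Dec 2019), same column names
# as the training matrix; the values are placeholders for illustration
future_xreg <- cbind(x1 = rnorm(6, 10, 2), x2 = rnorm(6, 5, 1))

# The horizon comes from nrow(future_xreg), so h is not needed
fcast <- forecast(fit, xreg = future_xreg)
length(fcast$mean)
start(fcast$mean)
```

Passing the 66-row training matrix instead, as in the question, would produce 66 forecasts driven by the 2014 onwards regressor values, which is not what is wanted.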


Sub string from character string using regex in R

I'm scraping PDF reports for their data.
I'm trying to extract the location the report is based off. I've got a character string with the location, and then a rolling 13 months header seen here:
header_line <- "Corp Dec '20 Jan '21 Feb '21 Mar '21 Apr '21 May '21 Jun '21 Jul '21 Aug '21 Sep '21 Oct '21 Nov '21 Dec '21"
I'd like to extract all characters from the beginning of the string up to the start of whatever month appears after it. Because it's a rolling 13-month report, any of the month abbreviations could appear next to the location.
I have this working for the above example, but I'm not sure how to create an "Or pattern" with regex. I know I could brute force it with a loop or apply function, but I was hoping there was a less dirty way.
stringr::str_extract(header_line, "[^Dec]+")
[1] "Corp "
It is difficult to anticipate every form the location could take, but the solution below should cover most cases. It matches everything before 3 alphabetical characters followed by a space, an apostrophe, and 2 digits.
str_extract(header_line, '^(.*?)(?=[a-zA-Z]{3}\\s\'\\d{2})')
Test cases:
header_line <- "Corp Dec '20 Jan '21 Feb '21 Mar '21 Apr '21 May '21 Jun '21 Jul '21 Aug '21 Sep '21 Oct '21 Nov '21 Dec '21"
header_line2 <- "Corp multiple words Dec '20 Jan '21 Feb '21 Mar '21 Apr '21 May '21 Jun '21 Jul '21 Aug '21 Sep '21 Oct '21 Nov '21 Dec '21"
header_line3 <- "Corp multiple words 1 Dec '20 Jan '21 Feb '21 Mar '21 Apr '21 May '21 Jun '21 Jul '21 Aug '21 Sep '21 Oct '21 Nov '21 Dec '21"
header_line4 <- "Corp multiple 444 Dec '20 Jan '21 Feb '21 Mar '21 Apr '21 May '21 Jun '21 Jul '21 Aug '21 Sep '21 Oct '21 Nov '21 Dec '21"
str_extract(header_line, '^(.*?)(?=[a-zA-Z]{3}\\s\'\\d{2})')
[1] "Corp "
str_extract(header_line2, '^(.*?)(?=[a-zA-Z]{3}\\s\'\\d{2})')
[1] "Corp multiple words "
str_extract(header_line3, '^(.*?)(?=[a-zA-Z]{3}\\s\'\\d{2})')
[1] "Corp multiple words 1 "
str_extract(header_line4, '^(.*?)(?=[a-zA-Z]{3}\\s\'\\d{2})')
[1] "Corp multiple 444 "
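For the "or pattern" asked about in the question, one possible sketch builds the alternation from R's built-in month.abb constant, so only real month abbreviations end the match. It uses base R (regexpr with perl = TRUE) instead of stringr, and it puts the whitespace inside the lookahead so the result has no trailing space. The variable names months_alt and pattern are only illustrative.

```r
header_line  <- "Corp Dec '20 Jan '21 Feb '21 Mar '21"
header_line3 <- "Corp multiple words 1 Dec '20 Jan '21 Feb '21"

# "Jan|Feb|...|Dec": any one of the twelve month abbreviations
months_alt <- paste(month.abb, collapse = "|")

# Lazily match from the start, stopping before: whitespace, a month
# abbreviation, whitespace, an apostrophe, and two digits
pattern <- paste0("^.*?(?=\\s+(?:", months_alt, ")\\s'\\d{2})")

extract_location <- function(x) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

extract_location(header_line)   # "Corp"
extract_location(header_line3)  # "Corp multiple words 1"
```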

rrule to get the 2nd Monday, Wednesday and Friday of the month, for every month

I am trying to create an rrule for my fullcalendar event that occurs on the 2nd Monday, Wednesday and Friday of the month, for every month.
Here is the rrule I have tried
RRULE:FREQ=MONTHLY;COUNT=10;INTERVAL=1;WKST=SU;BYDAY=MO,WE,FR;BYSETPOS=2
events: [{
title: 'rrule event',
rrule: {
freq: RRule.MONTHLY,
count: 10,
interval: 1,
wkst: RRule.SU,
byweekday: [RRule.MO, RRule.WE, RRule.FR],
bysetpos: [2]
},
duration: '02:00',
rendering: 'inverse-background'
}
],
This is what I get
1 Fri, 03 May 2019 12:33:53 GMT
2 Wed, 05 Jun 2019 12:33:53 GMT
3 Wed, 03 Jul 2019 12:33:53 GMT
4 Mon, 05 Aug 2019 12:33:53 GMT
5 Wed, 04 Sep 2019 12:33:53 GMT
6 Fri, 04 Oct 2019 12:33:53 GMT
7 Mon, 04 Nov 2019 12:33:53 GMT
8 Wed, 04 Dec 2019 12:33:53 GMT
9 Fri, 03 Jan 2020 12:33:53 GMT
10 Wed, 05 Feb 2020 12:33:53 GMT
What is expected is
1 Mon, 08 Apr 2019
2 Wed, 10 Apr 2019
3 Fri, 12 Apr 2019
4 Mon, 13 May 2019
5 Wed, 08 May 2019
6 Fri, 10 May 2019.........
RFC 5545, section 3.3.10. states:
Each BYDAY value can also be preceded by a positive (+n) or
negative (-n) integer. If present, this indicates the nth
occurrence of a specific day within the MONTHLY or YEARLY "RRULE".
So the rule you're looking for literally specifies the 2nd Monday (2MO), Wednesday (2WE) and Friday (2FR) of each month.
FREQ=MONTHLY;COUNT=10;BYDAY=2MO,2WE,2FR
Note that INTERVAL=1 is the default and WKST=SU is meaningless in this case, so you can just as well omit them.
Btw, your rule basically says, of all Mondays, Wednesdays and Fridays of a month, take the second instance in that month.
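The difference between the two rules can be checked with a small date calculation. The sketch below is plain R and does not use any rrule library; nth_weekday is a hypothetical helper written for this illustration, with weekdays numbered 0 (Sunday) to 6 (Saturday) as in as.POSIXlt()$wday.

```r
# n-th occurrence of a given weekday in a month (what BYDAY=2MO etc. asks for)
nth_weekday <- function(year, month, wday, n) {
  first <- as.Date(sprintf("%04d-%02d-01", year, month))
  # Days from the 1st of the month to the first occurrence of `wday`
  offset <- (wday - as.POSIXlt(first)$wday) %% 7
  first + offset + 7 * (n - 1)
}

# BYDAY=2MO,2WE,2FR for April 2019: 2nd Monday, 2nd Wednesday, 2nd Friday
april <- c(nth_weekday(2019, 4, 1, 2),
           nth_weekday(2019, 4, 3, 2),
           nth_weekday(2019, 4, 5, 2))
format(april)  # "2019-04-08" "2019-04-10" "2019-04-12"

# BYDAY=MO,WE,FR;BYSETPOS=2 for May 2019: pool all Mondays, Wednesdays
# and Fridays of the month, then take the second date in that pool
days <- seq(as.Date("2019-05-01"), as.Date("2019-05-31"), by = "day")
mwf <- days[as.POSIXlt(days)$wday %in% c(1, 3, 5)]
format(mwf[2])  # "2019-05-03"
```

The first result matches the expected dates in the question, and the second matches the first date the original rule produced (Fri, 03 May 2019).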

Is there a better way to order levels and factors together? I have successfully ordered them but there must be a more elegant way.

Create a month vector.
> mths<-month.abb
> mths
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
But this is a character vector, so I convert it to a factor as follows:
> mths1<-factor(mths)
> mths1
[1] Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Levels: Apr Aug Dec Feb Jan Jul Jun Mar May Nov Oct Sep
But now the levels are in alphabetical order, which is not what we want. Trying the following, we get the levels in the correct order, but the data gets jumbled.
> levels(mths1)<-mths
> mths1
[1] May Apr Aug Jan Sep Jul Jun Feb Dec Nov Oct Mar
Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
And then I tried this:
> mths1[]<-mths
> mths1
[1] Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
And now the values as well as their levels are in the correct order.
I want to know what is happening behind the scenes in each of the above cases, as the assignments to levels and data got me a bit confused.
Finally, what is a more elegant way to achieve the same thing?
Combining comments above into an answer
mths<-month.abb
mths
# [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
mths1 <-factor(mths, levels=mths, ordered=TRUE)
mths1
# [1] Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec
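As for what happens behind the scenes, a factor stores integer codes plus a vector of level labels. The sketch below (variable names f, g and h are only illustrative) shows that levels<- only relabels the existing codes, which is why the data got jumbled, whereas factor(..., levels = ) assigns the codes in the order you ask for. The mths1[] <- mths step worked because assigning into a factor matches the new values against the existing level labels by name.

```r
mths <- month.abb

f <- factor(mths)
levels(f)      # alphabetical: "Apr" "Aug" "Dec" "Feb" "Jan" ...
as.integer(f)  # codes into those levels: "Jan" is code 5, "Feb" is 4, ...

# Replacing the levels keeps the codes and swaps the labels:
# code 5 used to mean "Jan" and now means the 5th new label, "May"
g <- f
levels(g) <- mths
as.character(g)[1:3]  # "May" "Apr" "Aug"

# Supplying levels at creation sets both order and codes correctly
h <- factor(mths, levels = mths)
as.integer(h)  # 1 2 3 ... 12
```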

Subsetting an R xts object by year returns data from February to January of following year

I have a monthly data series formatted as an xts object with 361 months of data ending in 2016. Below is a sample.
Jan 2015 20.97
Feb 2015 13.34
Mar 2015 15.29
Apr 2015 14.55
May 2015 13.84
Jun 2015 18.23
Jul 2015 12.12
Aug 2015 28.43
Sep 2015 24.50
Oct 2015 15.07
Nov 2015 16.13
Dec 2015 18.21
Jan 2016 27.59
I'm trying to extract a subset with VIXCLS["2015/"]. When I get back the results, the year starts in February 2015 and ends in January 2016. I've also tried to extract a specific date (e.g. VIXCLS["2015-12-31"]) and I get back the January 2016 data with the January time stamp.
VIXCLS['2015-12-31']
# VIXCLS
# Jan 2016 27.59
Please let me know if you have any ideas as why this is happening.
This is likely a timezone issue. Your index is yearmon, which does not have a timezone. But your xts object may have a non-UTC timezone, which would cause conversion issues. For example, this xts object has my local timezone, so the subsetting works as you describe:
VIXCLS <- structure(c(20.97, 13.34, 15.29, 14.55, 13.84, 18.23, 12.12, 28.43, 24.5,
15.07, 16.13, 18.21, 27.59), .Dim = c(13L, 1L), index = structure(c(1420070400,
1422748800, 1425168000, 1427846400, 1430438400, 1433116800, 1435708800,
1438387200, 1441065600, 1443657600, 1446336000, 1448928000, 1451606400),
tzone = "", tclass = "yearmon"), class = c("xts", "zoo"),
.indexCLASS = "yearmon", tclass = "yearmon", .indexTZ = "", tzone = "")
VIXCLS['2015/']
# [,1]
# Feb 2015 13.34
# Mar 2015 15.29
# Apr 2015 14.55
# May 2015 13.84
# Jun 2015 18.23
# Jul 2015 12.12
# Aug 2015 28.43
# Sep 2015 24.50
# Oct 2015 15.07
# Nov 2015 16.13
# Dec 2015 18.21
# Jan 2016 27.59
But if I set the indexTZ to "UTC", the issue goes away:
indexTZ(VIXCLS) <- "UTC"
VIXCLS['2015/']
# [,1]
# Jan 2015 20.97
# Feb 2015 13.34
# Mar 2015 15.29
# Apr 2015 14.55
# May 2015 13.84
# Jun 2015 18.23
# Jul 2015 12.12
# Aug 2015 28.43
# Sep 2015 24.50
# Oct 2015 15.07
# Nov 2015 16.13
# Dec 2015 18.21
# Jan 2016 27.59
If you get a warning about the object timezone being different than the local timezone, you can suppress it using options(xts_check_TZ=FALSE).
Also note that this is corrected in the development version of xts on GitHub. Index classes that do not have a timezone attribute are not allowed to have a non-UTC timezone.
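The mechanism can be illustrated in base R without xts. The index value for Jan 2015 in the structure above is 1420070400 seconds, which is midnight UTC on 2015-01-01. If the subset string "2015/" is interpreted in a timezone west of UTC, the start of 2015 falls after that instant, so January is excluded. The timezone "America/Chicago" below is only an assumption for illustration; the answer does not state which local timezone was used.

```r
# Index value stored for Jan 2015 (midnight UTC)
jan_index <- as.POSIXct(1420070400, origin = "1970-01-01", tz = "UTC")

# "Start of 2015" as seen from a non-UTC timezone vs. from UTC
start_local <- as.POSIXct("2015-01-01 00:00:00", tz = "America/Chicago")
start_utc   <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC")

jan_index >= start_local  # FALSE: Jan 2015 is before the local start of 2015
jan_index >= start_utc    # TRUE:  Jan 2015 is included once everything is UTC
```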

Counting number of month between two dates whose class is yearmon?

I need to create a new data frame from my original whose format given below.
MonthFrom MonthTo
Jan 2010 May 2010
Mar 2010 Jan 2012
Jan 2011 Jun 2011
Mar 2010 Jun 2010
Feb 2012 Mar 2012
Feb 2013 Feb 2013 #please note that these two months same.
The example data set above is from my data. I want to create a data frame as below.
Month NumberofMonth
Jan 5
Jan 12
Feb 1
Feb 2
Mar 16
Mar 4
In general, the function should count the number of months between two dates (of class yearmon) and assign this number to the corresponding starting month. For example, if the number of months in the first row is 5 and the MonthFrom in the first row is January, the function should assign 5 to January. Can anyone help me, please?
Given that the zoo type yearmon you're using allows for basic math manipulation and month name extraction with format(), the following should work for you (unless I've missed something in your requirements):
library(zoo)
my.df <- data.frame(
MonthFrom=as.yearmon(c("Jan 2010", "Mar 2010", "Jan 2011", "Mar 2010", "Feb 2012", "Feb 2013")),
MonthTo=as.yearmon(c("May 2010", "Jan 2012", "Jun 2011", "Jun 2010", "Mar 2012", "Feb 2013")))
print(my.df)
## MonthFrom MonthTo
## 1 Jan 2010 May 2010
## 2 Mar 2010 Jan 2012
## 3 Jan 2011 Jun 2011
## 4 Mar 2010 Jun 2010
## 5 Feb 2012 Mar 2012
## 6 Feb 2013 Feb 2013
new.df <- data.frame(
Month=format(my.df$MonthFrom, "%b"),
# yearmon differences are fractions of a year, so round after scaling to months
NumberOfMonth=round((my.df$MonthTo - my.df$MonthFrom) * 12) + 1)
print(new.df)
## Month NumberOfMonth
## 1 Jan 5
## 2 Mar 23
## 3 Jan 6
## 4 Mar 4
## 5 Feb 2
## 6 Feb 1
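The desired output in the question lists the rows grouped by calendar month. A possible follow-up, sketched here in base R with the result values typed in by hand so that it runs without zoo, orders the rows by each month's position in month.abb. The name sorted.df is only illustrative.

```r
# Same values as the new.df printed above
new.df <- data.frame(
  Month = c("Jan", "Mar", "Jan", "Mar", "Feb", "Feb"),
  NumberOfMonth = c(5, 23, 6, 4, 2, 1))

# match() gives each month's position in the calendar (Jan = 1, ...),
# and order() sorts the rows by that position
sorted.df <- new.df[order(match(new.df$Month, month.abb)), ]
print(sorted.df)
```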
