ARIMA forecasts are way off - r

I am using ARIMA (auto.arima) to forecast 52 weeks ahead. The time series model fits the data well (see plot below; the red line shows the fitted values). The input data has a decreasing trend.
The forecasts (highlighted area), however, seem to just take off after the actual values end.
How can the forecasts be tamed?
dput of the input
> dput(baseTs)
structure(c(5.41951956469523, 5.49312499014084, 5.56299025716832,
5.64442852110163, 5.71385023974044, 5.77578632033402, 5.82985917237953,
5.86346591034374, 5.89626165157029, 5.92013286862512, 5.94200331713403,
5.93996840759539, 5.93917517855891, 5.90355191030718, 5.87180377346416,
5.83190030607801, 5.79624428055153, 5.75377043604686, 5.71445345904649,
5.70025269940165, 5.69789272204017, 5.73728731204876, 5.77015169357394,
5.78936321107329, 5.80113284575595, 5.79449448552444, 5.78193215198878,
5.74003482344406, 5.71694163930612, 5.66689345413153, 5.614357635737,
5.58578389962286, 5.55824727570498, 5.58495146060423, 5.61344117957187,
5.63637441850401, 5.65948408172102, 5.65558124383951, 5.64909390802285,
5.6664546352889, 5.68205689033408, 5.69991437586231, 5.72273650369514,
5.72006065065194, 5.71556512542993, 5.6717608006789, 5.64610326418084,
5.57193975508467, 5.49406607804055, 5.40126523530993, 5.31513540386482,
5.238437956722, 5.15362077920702, 5.11960611878249, 5.08498887979172,
5.08408134201562, 5.07361213981111, 5.04830559379816, 5.01401413448689,
5.0418662607737, 5.06947584464062, 5.08771495309317, 5.10587165060358,
5.1438369937098, 5.1815251206981, 5.2318657906363, 5.29385492077065,
5.29652029253008, 5.29998067741868, 5.28242409629194, 5.2722770646788,
5.24927444462166, 5.22226735874711, 5.16555064465208, 5.10956459841778,
5.09439240612378, 5.07617974794969, 5.04418337811006, 5.0075619037348,
4.99108423417745, 4.9874504485194, 4.99135285004736, 4.99217791657733,
4.94874445528885, 4.90320874819525, 4.84508278068469, 4.79086127023963,
4.75236840849279, 4.71431573721527, 4.71936529020481, 4.72422850167074,
4.72203091743033, 4.71732868614755, 4.71175323610448, 4.70566162766782,
4.71165837247331, 4.71767529028615, 4.75129316683193, 4.7863855803437,
4.85248191548789, 4.91865394024373, 4.9590849617955, 4.99960686851895,
5.02020678181827, 5.04201201976595, 5.02025906892952, 4.99735920720967,
4.92520279823639, 4.84822505567723, 4.81118504683572, 4.77330440072099,
4.72636395544651, 4.6861111959621, 4.64912520396312, 4.61348981514599,
4.58517820348434, 4.56378688913207, 4.549011597464, 4.52900600122321,
4.56028365470815, 4.60248987909752, 4.65628990381626, 4.70496326660038,
4.73779351647955, 4.76616725791407, 4.79569018347378, 4.83185281078024,
4.85177852259102, 4.87488251014986, 4.89468916229158, 4.9077984323135,
4.92375782591088, 4.96363767543938, 5.05416277704822, 5.1426680212522,
5.232495043331, 5.32153608753653, 5.41780853915163, 5.51131526881126,
5.62791210324026), .Tsp = c(2015.05769230769, 2017.73076923077,
52), class = "ts")
The code used
fc <- try(auto.arima(baseTs,ic='aic',approximation = F))
baseFc <- forecast(fc,h = weeks_forecasted)
baseVolume_forecast_new <- baseFc$mean
What could be the reason behind the forecasts exploding?
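One possible explanation, which I have not verified by running auto.arima on this series, is that the selected model uses two differences (d = 2). A twice-differenced model without damping eventually carries the most recent local slope forward in a straight line, and the last few observations rise by roughly 0.1 per week. The base R sketch below only illustrates that arithmetic with the last five values from the dput above; it is not the fitted model, and the names tail_vals, last_slope and linear_path are made up for the illustration.

```r
# Last five observations of baseTs, copied from the dput output above
tail_vals <- c(5.232495043331, 5.32153608753653, 5.41780853915163,
               5.51131526881126, 5.62791210324026)

# Average week-over-week change at the end of the series (about 0.099)
last_slope <- mean(diff(tail_vals))

# Straight-line continuation of that slope for 52 weeks. This mimics what an
# undamped d = 2 model tends to do; it is NOT the auto.arima forecast itself.
h <- 52
linear_path <- tail_vals[length(tail_vals)] + last_slope * seq_len(h)

last_slope
linear_path[h]   # about 10.77, almost double the last observed value
```

If printing fc shows an order with d = 2, one thing to try is capping the differencing, for example auto.arima(baseTs, d = 1), and comparing the two sets of forecasts.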

Related

Error in xy.coords when trying to fit ARIMA model, please advise

I'm hoping you might be able to help me with an issue that I'm having when trying to fit an ARIMA model for a school project that I'm working on.
The data that I'm using shows weekly sales figures starting from 2019 and going till 2021. My goal is to produce a forecast for the remainder of 2021 based on those figures. As my dataset is comprised of weekly data and the seasonality based on the ACF and PACF plots seems to occur once a year I've set the "S =" argument from the sarima() function to 52. The problem is that every time I try to run the model, I keep getting an error and I can't figure out any way of getting rid of it.
I've tried to use the same code with other data sets on the datacamp environment with "S = 52" and the model runs without a problem. I'm hoping that somebody might be able to give me some advice on how to deal with this issue. Thank you!
P.S.
If the "S =" argument is set lower than 35 then the model will run. (Just in case this information might help)
####Load packages####
library(tidyverse)
library(zoo)
library(xts)
library(lubridate)
library(astsa)
library(tseries)
library(forecast)
######Load and inspect the data########
unit_sales <- structure(list(Date = c("30/03/2019", "06/04/2019", "13/04/2019",
"20/04/2019", "27/04/2019", "04/05/2019", "11/05/2019", "18/05/2019",
"25/05/2019", "01/06/2019", "08/06/2019", "15/06/2019", "22/06/2019",
"29/06/2019", "06/07/2019", "13/07/2019", "20/07/2019", "27/07/2019",
"03/08/2019", "10/08/2019", "17/08/2019", "24/08/2019", "31/08/2019",
"07/09/2019", "14/09/2019", "21/09/2019", "28/09/2019", "05/10/2019",
"12/10/2019", "19/10/2019", "26/10/2019", "02/11/2019", "09/11/2019",
"16/11/2019", "23/11/2019", "30/11/2019", "07/12/2019", "14/12/2019",
"21/12/2019", "28/12/2019", "04/01/2020", "11/01/2020", "18/01/2020",
"25/01/2020", "01/02/2020", "08/02/2020", "15/02/2020", "22/02/2020",
"29/02/2020", "07/03/2020", "14/03/2020", "21/03/2020", "28/03/2020",
"04/04/2020", "11/04/2020", "18/04/2020", "25/04/2020", "02/05/2020",
"09/05/2020", "16/05/2020", "23/05/2020", "30/05/2020", "06/06/2020",
"13/06/2020", "20/06/2020", "27/06/2020", "04/07/2020", "11/07/2020",
"18/07/2020", "25/07/2020", "01/08/2020", "08/08/2020", "15/08/2020",
"22/08/2020", "29/08/2020", "05/09/2020", "12/09/2020", "19/09/2020",
"26/09/2020", "03/10/2020", "10/10/2020", "17/10/2020", "24/10/2020",
"31/10/2020", "07/11/2020", "14/11/2020", "21/11/2020", "28/11/2020",
"05/12/2020", "12/12/2020", "19/12/2020", "26/12/2020", "02/01/2021",
"09/01/2021", "16/01/2021", "23/01/2021", "30/01/2021", "06/02/2021",
"13/02/2021", "20/02/2021", "27/02/2021", "06/03/2021", "13/03/2021",
"20/03/2021", "27/03/2021"), Units = c(967053.4, 633226.9, 523264,
473914.2, 418087.5, 504342.2, 477819, 415650, 406972.3, 429791.4,
441724.4, 453221.8, 402005.8, 414993.4, 381457.2, 391218.7, 486925.9,
409791.8, 399217.9, 409210, 478121.2, 495549.1, 503918.3, 535949.5,
517450.4, 523036.8, 616456.9, 665979.3, 705201.5, 700168.1, 763538.8,
875501.2, 886586.6, 967806, 1094195, 1285950.5, 1450436.1, 1592162.8,
2038160.5, 1676988.8, 1026193.7, 820405.5, 738643.9, 669657.6,
720287.7, 673194.1, 754102.5, 639532, 680413.6, 710702, 711722.8,
834036.8, 427817.2, 505849.6, 441047.4, 439411, 487634.1, 594594.8,
548796.7, 565682, 528275.2, 448092, 467780.1, 544160.3, 538275.8,
485055.5, 592097.3, 537514.3, 493381.9, 445280.8, 448111.2, 419263.4,
457125.7, 561169.6, 704575.3, 656423.1, 653751.3, 622937.7, 718022.8,
768901.9, 793443, 814604.2, 876269.3, 982921.8, 1064920.7, 1201494.4,
1337374.9, 1619595.8, 1734773.8, 1624071, 1777832.3, 1648201.9,
1106253.8, 940141.1, 796129.1, 853392.9, 932059.1, 905990.4,
981188.6, 907823.9, 956098.8, 1003966.7, 1331125.5, 805593.6,
799486.2)), class = "data.frame", row.names = c(NA, -105L))
####Convert date column to date format
unit_sales$Date <- as.Date(unit_sales$Date, format ="%d/%m/%Y" )
###Convert to xts object
unit_sales_xts <- xts(unit_sales$Units, order.by = unit_sales$Date) # numeric data only; dates go in the index
periodicity(unit_sales_xts)
###Convert to ts object
unit_sales_vector <- unit_sales$Units
unit_sales_ts <- ts(unit_sales_vector, start = decimal_date(as.Date("2019-03-30")), frequency = 52)
###Plot data
ts.plot(unit_sales_ts)
###Make data stationary and plot it
ts.plot(diff(log(unit_sales_ts)))
###Plot ACF and PACF
pacf_plot <- pacf(diff(log(unit_sales_ts)), lag.max = 105)
acf_plot <- acf(diff(log(unit_sales_ts)), lag.max = 105)
###Test if data is stationary
adf.test(diff(log(unit_sales_ts)))
###Fit ARIMA model
sarima(unit_sales_ts, p = 1, d = 1, q = 0)
sarima.for(unit_sales_ts, n.ahead = 39, 1,1,0)
###Fit Seasonal ARIMA model - THIS IS WHERE THE ERROR OCCURS
sarima(unit_sales_ts, p = 1, d = 1, q = 0, P = 0, D = 1, Q = 0, S = 52)
###Forecast using the above model
sarima.for(unit_sales_ts,n.ahead = 39, p = 1, d = 1, q = 0, P = 0, D = 1, Q = 0, S = 52)
I tested your code and got the same error, so I read through the astsa::sarima() implementation and found these two lines, which involve the seasonal period and interact with the length of your data:
alag <- max(10 + sqrt(num), 3 * S)
nlag <- ifelse(S < 7, 20, 3 * S)
Without reading the whole implementation, I deduce that the package author assumes the series is at least three times as long as the seasonal period. That is not the case here: you have 105 observations and S = 52, so 3 * S = 156. Whether that is a bug, or just not well documented or not properly handled in the code, I cannot tell. I do not know which version of the package DataCamp runs, or the update history of the package itself. But we can assume that at least one of the two lines causes the error, since every value of S from 35 upwards produces the same error.
One workaround is to print the function's source to the console (type astsa::sarima and hit Enter), copy it, modify those lines (I tried 2 * instead of 3 *), and assign the result to a function name of your own. Then the code runs. You could also print the source in the DataCamp environment and compare it with your local installation.
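To make the arithmetic concrete, here is a small base R sketch that evaluates the two quoted lines for this series (105 observations). The comparison 3 * S >= num is my own assumption about what triggers the failure; it is not taken from the astsa source, but it matches the observation that the model runs for S below 35 and fails from 35 upwards.

```r
num <- 105                          # length(unit_sales_ts)
S   <- 52                           # seasonal period passed to sarima()

# The two lines quoted from astsa::sarima()
alag <- max(10 + sqrt(num), 3 * S)
nlag <- ifelse(S < 7, 20, 3 * S)
c(alag = alag, nlag = nlag)         # both 156, more than the 105 observations

# Assumption: trouble starts once 3 * S reaches the series length
S_grid <- 1:52
first_bad_S <- S_grid[3 * S_grid >= num][1]
first_bad_S                         # 35
```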

Can I account for a shift in a timeseries from quarter x onwards in the seas() package

I want to seasonally adjust the following time series:
s <- c(37.74991 , 37.77515 , 37.75384 , 37.94780 , 39.77417, 39.79430 , 39.81612,
39.99021, 42.20018, 42.22210, 42.26033, 42.45065, 45.07874, 45.15395,
45.22697, 45.51945, 47.79920, 47.85800, 47.92614, 48.25480, 49.91503,
49.91005, 50.04165, 50.42833, 51.72783, 51.60934, 51.61586, 51.89149,
55.29518, 55.17836, 55.22256, 55.55186, 57.44819, 57.25458, 57.25675,
57.68244, 59.67495, 59.42791, 59.44048, 59.87237, 61.81162, 61.60417,
61.63524, 62.06323, 63.39596, 63.19328, 63.14214, 63.49329, 65.63239,
65.46829, 65.48508, 65.65813, 69.06712, 68.92904, 68.94045, 69.18806,
73.69241, 73.47518, 73.42786, 73.82185, 69.54005, 69.59465, 69.85418,
70.30283, 73.80306, 73.69583, 73.69901, 74.12954)
s <-ts(s, start = c(1999, 1), frequency = 4)
There is an intended jump in the values of s from t=21 onwards. Therefore, I create a Dummy:
t <- rep_len(1,68)
t[1:20] <- 0
t <-ts(t, start = c(1999, 1), frequency = 4)
An X13-ARIMA model to seasonally adjust the data, including the dummy,
s_a <- final(seas(s, outlier = NULL, xreg = t))
gives me the error:
Error: X-13 run failed
Errors:
- forecasts end date, 2018.4, must end on or before user-defined
regression variables end date, 2015.4.
But actually, I am not interested in a forecast, only in the adjusted time series without the jump. So can I turn the forecast off, or do I have to specify the seas() call differently?
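The error message says the regressor stops at 2015.4 while the forecast runs to 2018.4, which is 12 quarters further. One way around it, sketched here under the assumption that a 12-quarter horizon is what X-13 wants in this run, is to extend the dummy over the forecast period instead of switching forecasting off. The names n_obs, horizon and t_ext are introduced only for this illustration, and the seas() call is left as a comment because it needs the seasonal package and the series s defined above.

```r
n_obs   <- 68    # quarters in s (1999 Q1 to 2015 Q4)
horizon <- 12    # assumed forecast length, taken from the error message

# Step dummy: 0 for the first 20 quarters, 1 afterwards, extended so that it
# also covers the forecast period
t_ext <- ts(c(rep(0, 20), rep(1, n_obs - 20 + horizon)),
            start = c(1999, 1), frequency = 4)

end(t_ext)       # 2018 4, matching the forecast end date in the error

# With the seasonal package installed and s defined as above, the call would be:
# s_a <- final(seas(s, outlier = NULL, xreg = t_ext))
```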

How to get years from a time series when the plot is displaying time series of a monthly freq (Rbeast )

I'd like to extract the years from a time series index (the underlying time series has monthly frequency). The reason is that I want to create a yearly axis.
I am using a package Rbeast.
Here is my result for tsp(NDVI_Chobe001.ts)
[1] 2002.500 2020.417 12.000
##Here is a more detailed time series
time(NDVI_Chobe001.ts)
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep
2002                                                       2002.500 2002.583 2002.667
2003 2003.000 2003.083 2003.167 2003.250 2003.333 2003.417 2003.500 2003.583 2003.667
2004 2004.000 2004.083 2004.167 2004.250 2004.333 2004.417 2004.500 2004.583 2004.667
2005 2005.000 2005.083 2005.167 2005.250 2005.333 2005.417 2005.500 2005.583 2005.667
2006 2006.000 2006.083 2006.167 2006.250 2006.333 2006.417 2006.500 2006.583 2006.667
2007 2007.000 2007.083 2007.167 2007.250 2007.333 2007.417 2007.500 2007.583 2007.667
2008 2008.000 2008.083 2008.167 2008.250 2008.333 2008.417 2008.500 2008.583 2008.667
2009 2009.000 2009.083 2009.167 2009.250 2009.333 2009.417 2009.500 2009.583 2009.667
2010 2010.000 2010.083 2010.167 2010.250 2010.333 2010.417 2010.500 2010.583 2010.667
2011 2011.000 2011.083 2011.167 2011.250 2011.333 2011.417 2011.500 2011.583 2011.667
2012 2012.000 2012.083 2012.167 2012.250 2012.333 2012.417 2012.500 2012.583 2012.667
2013 2013.000 2013.083 2013.167 2013.250 2013.333 2013.417 2013.500 2013.583 2013.667
2014 2014.000 2014.083 2014.167 2014.250 2014.333 2014.417 2014.500 2014.583 2014.667
2015 2015.000 2015.083 2015.167 2015.250 2015.333 2015.417 2015.500 2015.583 2015.667
2016 2016.000 2016.083 2016.167 2016.250 2016.333 2016.417 2016.500 2016.583 2016.667
2017 2017.000 2017.083 2017.167 2017.250 2017.333 2017.417 2017.500 2017.583 2017.667
2018 2018.000 2018.083 2018.167 2018.250 2018.333 2018.417 2018.500 2018.583 2018.667
2019 2019.000 2019.083 2019.167 2019.250 2019.333 2019.417 2019.500 2019.583 2019.667
2020 2020.000 2020.083 2020.167 2020.250 2020.333 2020.417
          Oct      Nov      Dec
2002 2002.750 2002.833 2002.917
2003 2003.750 2003.833 2003.917
2004 2004.750 2004.833 2004.917
2005 2005.750 2005.833 2005.917
2006 2006.750 2006.833 2006.917
2007 2007.750 2007.833 2007.917
2008 2008.750 2008.833 2008.917
2009 2009.750 2009.833 2009.917
2010 2010.750 2010.833 2010.917
2011 2011.750 2011.833 2011.917
2012 2012.750 2012.833 2012.917
2013 2013.750 2013.833 2013.917
2014 2014.750 2014.833 2014.917
2015 2015.750 2015.833 2015.917
2016 2016.750 2016.833 2016.917
2017 2017.750 2017.833 2017.917
2018 2018.750 2018.833 2018.917
2019 2019.750 2019.833 2019.917
2020
##Here is the dput result of the time series
dput(NDVI_Chobe001.ts)
structure(c(0.258672185, 0.237639041, 0.223988035, 0.22397988,
0.28132144, 0.387710909, 0.556186453, 0.719580311, 0.443294248,
0.314433357, 0.292755672, 0.278023297, 0.246809774, 0.230477039,
0.228955071, 0.234966762, 0.396659718, 0.491330645, 0.716274496,
0.73417416, 0.576279481, 0.479403466, 0.423930923, 0.377550602,
0.313801774, 0.297335261, 0.27886131, 0.285054814, 0.323942137,
0.482912961, 0.544500134, 0.692623308, 0.512637643, 0.526592884,
0.432898755, 0.331892323, 0.294543398, 0.274633904, 0.247602217,
0.24889248, 0.268682448, 0.623132479, 0.732958587, 0.794572999,
0.697092229, 0.588587711, 0.389213056, 0.348485514, 0.282932917,
0.264831569, 0.254937301, 0.277647575, 0.342873991, 0.450392312,
0.631838679, 0.716638566, 0.508704542, 0.423229761, 0.324663042,
0.321186864, 0.295935102, 0.255618784, 0.257619908, 0.27254904,
0.270580191, 0.432220414, 0.585539863, 0.716581759, 0.593955107,
0.559614127, 0.388112159, 0.386833323, 0.323056858, 0.296201373,
0.299547175, 0.340499135, 0.487932489, 0.531619232, 0.755898976,
0.630418376, 0.624101053, 0.46624194, 0.407831925, 0.396696504,
0.348860597, 0.311130303, 0.301932775, 0.359536613, 0.389388896,
0.639178419, 0.693073002, 0.544238686, 0.608043068, 0.520223438,
0.489590537, 0.371915235, 0.322345492, 0.285747424, 0.262060636,
0.290601893, 0.272739968, 0.465184219, 0.597999142, 0.58280379,
0.498312536, 0.351555151, 0.313456794, 0.30176279, 0.272389062,
0.256200802, 0.257570355, 0.261360949, 0.280664516, 0.472871998,
0.635346196, 0.809469884, 0.708410897, 0.454619991, 0.374153554,
0.31216354, 0.277643235, 0.273912332, 0.279202729, 0.274954368,
0.3220379, 0.66542086, 0.786922135, 0.749038686, 0.494324261,
0.380828443, 0.31699487, 0.293321759, 0.275010923, 0.263032267,
0.254334753, 0.270928799, 0.319622211, 0.699834174, 0.733615599,
0.743950233, 0.701439492, 0.514223884, 0.414668945, 0.353937354,
0.304874948, 0.263697644, 0.25397833, 0.270116643, 0.310931558,
0.529621168, 0.769325391, 0.676328487, 0.561630209, 0.550257129,
0.408412794, 0.362071348, 0.294350827, 0.271100888, 0.26194204,
0.265314367, 0.311826075, 0.556399621, 0.586507353, 0.715697274,
0.71326184, 0.554275685, 0.423167172, 0.332214247, 0.316422341,
0.264796933, 0.253106015, 0.274547234, 0.274233387, 0.331729551,
0.568674796, 0.678751599, 0.651238604, 0.550351642, 0.50121297,
0.39598441, 0.373748125, 0.342521719, 0.308862893, 0.368914514,
0.346751371, 0.590197072, 0.550842239, 0.641805925, 0.73276961,
0.616976479, 0.501595155, 0.441702349, 0.420716154, 0.297677153,
0.302351869, 0.347572841, 0.347128221, 0.488411226, 0.739632011,
0.773688839, 0.527578039, 0.531276077, 0.481584383, 0.450086834,
0.331415825, 0.298545112, 0.281891087, 0.301013691, 0.334150401,
0.537372378, 0.756594273, 0.778707894, 0.728867792, 0.634829201,
0.415475576, 0.353712963), .Tsp = c(2002.5, 2020.41666666667,
12), class = "ts")
Here is my code
#Here I am Specifying the option parameters explicitly
opt=list() #Create an empty list to append individual model parameters
opt$period=12 #Period of the cyclic/seasonal component of the modis time series
opt$startTime=2002.500000 ##specify start time
Here I am plotting the data
out=beast(NDVI_Chobe001.ts, opt)
p<-plot(out)
However, the plotted time series is displayed at monthly frequency; see the example image [1].
##Now I am extracting years from the time series to use when plotting
time <- as.yearmon( time(NDVI_Chobe001.ts))
time <- format(time, "%Y")
plot(out,axis(1,time, labels = TRUE))
But this is not working; the plot is still displayed at monthly frequency. How do I plot my time series by year?
[1]: https://i.stack.imgur.com/VNxJC.jpg
What I tried is something like this:
Your time series data:
df <- structure(c(0.258672185, 0.237639041, 0.223988035, 0.22397988,
0.28132144, 0.387710909, 0.556186453, 0.719580311, 0.443294248,
0.314433357, 0.292755672, 0.278023297, 0.246809774, 0.230477039,
0.228955071, 0.234966762, 0.396659718, 0.491330645, 0.716274496,
0.73417416, 0.576279481, 0.479403466, 0.423930923, 0.377550602,
0.313801774, 0.297335261, 0.27886131, 0.285054814, 0.323942137,
0.482912961, 0.544500134, 0.692623308, 0.512637643, 0.526592884,
0.432898755, 0.331892323, 0.294543398, 0.274633904, 0.247602217,
0.24889248, 0.268682448, 0.623132479, 0.732958587, 0.794572999,
0.697092229, 0.588587711, 0.389213056, 0.348485514, 0.282932917,
0.264831569, 0.254937301, 0.277647575, 0.342873991, 0.450392312,
0.631838679, 0.716638566, 0.508704542, 0.423229761, 0.324663042,
0.321186864, 0.295935102, 0.255618784, 0.257619908, 0.27254904,
0.270580191, 0.432220414, 0.585539863, 0.716581759, 0.593955107,
0.559614127, 0.388112159, 0.386833323, 0.323056858, 0.296201373,
0.299547175, 0.340499135, 0.487932489, 0.531619232, 0.755898976,
0.630418376, 0.624101053, 0.46624194, 0.407831925, 0.396696504,
0.348860597, 0.311130303, 0.301932775, 0.359536613, 0.389388896,
0.639178419, 0.693073002, 0.544238686, 0.608043068, 0.520223438,
0.489590537, 0.371915235, 0.322345492, 0.285747424, 0.262060636,
0.290601893, 0.272739968, 0.465184219, 0.597999142, 0.58280379,
0.498312536, 0.351555151, 0.313456794, 0.30176279, 0.272389062,
0.256200802, 0.257570355, 0.261360949, 0.280664516, 0.472871998,
0.635346196, 0.809469884, 0.708410897, 0.454619991, 0.374153554,
0.31216354, 0.277643235, 0.273912332, 0.279202729, 0.274954368,
0.3220379, 0.66542086, 0.786922135, 0.749038686, 0.494324261,
0.380828443, 0.31699487, 0.293321759, 0.275010923, 0.263032267,
0.254334753, 0.270928799, 0.319622211, 0.699834174, 0.733615599,
0.743950233, 0.701439492, 0.514223884, 0.414668945, 0.353937354,
0.304874948, 0.263697644, 0.25397833, 0.270116643, 0.310931558,
0.529621168, 0.769325391, 0.676328487, 0.561630209, 0.550257129,
0.408412794, 0.362071348, 0.294350827, 0.271100888, 0.26194204,
0.265314367, 0.311826075, 0.556399621, 0.586507353, 0.715697274,
0.71326184, 0.554275685, 0.423167172, 0.332214247, 0.316422341,
0.264796933, 0.253106015, 0.274547234, 0.274233387, 0.331729551,
0.568674796, 0.678751599, 0.651238604, 0.550351642, 0.50121297,
0.39598441, 0.373748125, 0.342521719, 0.308862893, 0.368914514,
0.346751371, 0.590197072, 0.550842239, 0.641805925, 0.73276961,
0.616976479, 0.501595155, 0.441702349, 0.420716154, 0.297677153,
0.302351869, 0.347572841, 0.347128221, 0.488411226, 0.739632011,
0.773688839, 0.527578039, 0.531276077, 0.481584383, 0.450086834,
0.331415825, 0.298545112, 0.281891087, 0.301013691, 0.334150401,
0.537372378, 0.756594273, 0.778707894, 0.728867792, 0.634829201,
0.415475576, 0.353712963), .Tsp = c(2002.5, 2020.41666666667,
12), class = "ts")
Create the dataframe with proper dates:
# this will extract, convert and create a Date column from your data
df1 <- data.frame(as.matrix(df), date=time(df))
df1$year <- trunc(df1$date)
df1$month <- (df1$date - df1$year) * 12 + 1
df1$month <- as.integer(round(df1$month, 0))
df1$Day <- 1
names(df1) <- c("Data", "Original_Date", "Year", "Month", "Day")
# convert to a real Date so ggplot2 treats the x-axis as continuous time
df1$Date <- as.Date(paste(df1$Year, df1$Month, df1$Day, sep="-"))
You can review, if all is good:
# check the structure
str(df1)
Finally, plot it:
# plot the data
library(ggplot2)
theme_set(theme_minimal())
ggplot(data = df1, aes(x = Date, y = Data)) + geom_line(color = "#00AFBB", size = 1)
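For a plain yearly axis without ggplot2, a base R sketch along the following lines may be enough. The sine series is only a placeholder with the same start, frequency and length as the NDVI series, and the names x and year_ticks are made up for this example; substitute the real ts object.

```r
# Placeholder monthly series with the same time attributes as the NDVI data
# (start 2002.5, frequency 12, 216 values); replace with the real ts object
x <- ts(sin(2 * pi * (1:216) / 12), start = 2002.5, frequency = 12)

# Whole years that fall inside the time range of the series
year_ticks <- seq(ceiling(tsp(x)[1]), floor(tsp(x)[2]))

# Draw the series without the default x-axis, then add one tick per year
plot(as.numeric(time(x)), as.numeric(x), type = "l",
     xaxt = "n", xlab = "Year", ylab = "Value")
axis(1, at = year_ticks, labels = year_ticks)
```

The year of every observation, if needed as a column, can be obtained in the same spirit with floor(time(x)).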
A bit closer
plot(y=out$s, x=out$time, type='l') #The same as plot(out$s[,1]): plot the seasonal curve
plot(y=out$sProb, x=out$time, type='l') #Plot the probability of observing seasonal changepoints
plot(y=out$t, x=out$time, type='l') #The same as plot(out$t[,1]): plot the trend
plot(y=out$tProb, x=out$time, type='l') #Plot the probability of observing trend changepoints
I know this question is old and my late reply may be irrelevant. Regardless, in the old version of Rbeast, if given a ts input, the beast function uses only its data vector not the time tag information (i.e., .tsp attributes such as start and frequency). In other words, the time info has to be additionally provided to beast; if not, the indices 1:N are used as default times. In the latest version (Rbeast v0.9.2), I changed the API a little bit; now beast can handle the time tag of ts objects.
Below is a test using your sample data for time series decomposition and changepoint detection.
ndvi= c( 0.258,0.237,0.223,0.223,0.281,0.387,0.556,0.719,0.443,0.314,0.292,0.278,0.246,0.230,0.228,0.234,0.396,0.491,0.716, 0.734,0.576,0.479,0.423,0.377,0.313,0.297,0.278,0.285,0.323,0.482,0.544,0.692,0.512,0.526,0.432,0.331,0.294,0.274,
0.247,0.248,0.268,0.623,0.732,0.794,0.697,0.588,0.389,0.348,0.282,0.264,0.254,0.277,0.342,0.450,0.631,0.716,0.508, 0.423,0.324,0.321,0.295,0.255,0.257,0.272,0.270,0.432,0.585,0.716,0.593,0.559,0.388,0.386,0.323,0.296,0.299,0.340,
0.487,0.531,0.755,0.630,0.624,0.466,0.407,0.396,0.348,0.311,0.301,0.359,0.389,0.639,0.693,0.544,0.608,0.520,0.489, 0.371,0.322,0.285,0.262,0.290,0.272,0.465,0.597,0.582,0.498,0.351,0.313,0.301,0.272,0.256,0.257,0.261,0.280,0.472,
0.635,0.809,0.708,0.454,0.374,0.312,0.277,0.273,0.279,0.274,0.322,0.665,0.786,0.749,0.494,0.380,0.316,0.293,0.275, 0.263,0.254,0.270,0.319,0.699,0.733,0.743,0.701,0.514,0.414,0.353,0.304,0.263,0.253,0.270,0.310,0.529,0.769,0.676,
0.561,0.550,0.408,0.362,0.294,0.271,0.261,0.265,0.311,0.556,0.586,0.715,0.713,0.554,0.423,0.332,0.316,0.264,0.253, 0.274,0.274,0.331,0.568,0.678,0.651,0.550,0.501,0.395,0.373,0.342,0.308,0.368,0.346,0.590,0.550,0.641,0.732,0.616,
0.501,0.441,0.420,0.297,0.302,0.347,0.347,0.488,0.739,0.773,0.527,0.531,0.481,0.450,0.331,0.298,0.281,0.301,0.334, 0.537,0.756,0.778,0.728,0.634,0.415,0.353)
ndvi is a data vector; create a ts object out of it as the input to beast:
ndvi_ts = ts(ndvi, start=2002.5, frequency = 12)
out = beast(ndvi_ts)
plot(out)
# By default, seasonality is fitted as a harmonic curve; use SVD-based bases instead
out = beast(ndvi_ts, season='svd')
plot(out)
# If the input is a pure data vector, the time info has to be specified
# via 'start', 'deltat' (delta T), and 'freq'; otherwise, default times
# (i.e., 1:216) would be used.
out = beast( ndvi, start=2002.5, deltat=1/12, freq=12, season='svd')
plot(out)
The beast output is a LIST object of class 'beast' and can't be plotted directly with ggplot2. The plot function above is not R's own function but Rbeast's implementation of the S3 method 'plot.beast'. If custom plots are needed, individual elements of the output can be extracted for use with R's base plot or ggplot2. Below is an example:
out=beast(ndvi_ts, season='svd')
# out is a LIST variable; extract some elements
t = out$time # time
y = out$trend$Y # fitted trend compnt
yci = out$trend$CI[,,1] # estimated 95% CI interval for trend
cp = out$trend$cp # most probable changepoints in trend
ncp = sum(!is.nan(cp)) # number of changepoints given in cp
ymax=max(yci)
ymin=min(yci)
plot( x=c(min(t),max(t)), y=c(ymin,ymax), type='n', xlab='time', ylab='trend' )
lines(t,y)
polygon( x=c(t,rev(t)), y=c( yci[,1], rev(yci[,2]) ) , col=rgb(0,1,0,.2),border=NA )
yrange=par('usr')[3:4]
for (i in seq_len(ncp)) { # seq_len() skips the loop when ncp is 0
  lines( c(cp[i], cp[i] ), y=yrange, type='l', lty='dashed' )
}
If multiple subplots are needed, one way is to use R's par(new=TRUE) option, as shown below.
t = out$time # time
y = out$trend$Y # fitted trend compnt
yci = out$trend$CI[,,1] # estimated 95% CI interval for trend
cp = out$trend$cp # most probable changepoints in trend
ncp = sum(!is.nan(cp)) # number of changepoints given in cp
prob = out$trend$cpOccPr # probability of changepoints occurrence over time
ymax=max(yci)
ymin=min(yci)
plot.new()
# FIRST PLOT: Fine-tune the four margin numbers in plt to re-position
par(plt = c(0.15, 1-0.01, 1-0.01-.5, 1-0.01-.5+0.4),new=TRUE)
plot( x=c(min(t),max(t)), y=c(ymin,ymax), type='n', xaxt='n', xlab=NA, ylab='trend')
lines( t, y )
polygon( x=c( t, rev(t) ), y=c( yci[,1], rev(yci[,2]) ) , col=rgb(0,1,0,.2),border=NA )
yrange=par('usr')[3:4]
for (i in seq_len(ncp)) { # seq_len() skips the loop when ncp is 0
  lines( c(cp[i], cp[i] ), y=yrange, type='l', lty='dashed' )
}
# SECOND PLOT: Fine-tune the four margin numbers in plt to re-position
par(plt = c(0.15, 1-0.01, 1-0.01-.5-0.3, 1-0.01-.5-0.3+0.3),new=TRUE)
plot( x=t, y=prob, type='l', xlab='time', ylab='Prob' )
Not sure how useful this is. In any case, thanks for testing out Rbeast. If new features are desired, pls let me know.

Building and analysing trends in time series

I need advice about building time series. I have a bunch of files with monthly sea surface temperature data for a number of locations across 408 months. I have aggregated the monthly values in a data frame with the following structure
longitude, latitude, SST for month 1, SST for month 2, .... SST for month n
This is just a small piece of the data frame so you can see
dput(sst_subset)
structure(list(lon = c(-19.875, -19.625, -19.375, -19.125), lat = c(30.125,
30.125, 30.125, 30.125), sst = c(293.197412803228, 293.092251515256,
292.999348291526, 293.013219258958), sst.1 = c(292.490350607051,
292.504279178168, 292.502850606771, 292.438922036772), sst.2 = c(291.994832184947,
291.887412832509, 291.832896704695, 291.810638640677), sst.3 = c(292.095993473008,
292.066660140331, 292.091993473098, 292.110326806021), sst.4 = c(293.071606354427,
293.095799902274, 293.106445063326, 293.116122482465), sst.5 = c(294.981993408501,
294.996326741514, 295.004660074661, 295.018993407674), sst.6 = c(295.568703072806,
295.600315975326, 295.597735330222, 295.49418694544), sst.7 = c(296.250961122073,
296.175154672154, 296.079348222683, 296.052251449095)), .Names = c("lon",
"lat", "sst", "sst.1", "sst.2", "sst.3", "sst.4", "sst.5", "sst.6",
"sst.7"), row.names = c(NA, 4L), class = "data.frame")
To build a time series I have extracted a row of the data frame, that corresponds to all the monthly values in a location (defined by longitude and latitude), transposed to a column and created a new data frame
ncolumnes<-ncol(sst_all)
sst_point1<-sst_all[1,3:ncolumnes] # row 1 = one location; columns 3..n = monthly SST (skips lon, lat)
sst1_df <- as.data.frame(t(sst_point1))
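As a hedged sketch of the step just described, the snippet below pulls one location's monthly values out of a wide data frame and turns them into a monthly ts object. The three SST values per row are rounded from sst_subset above, and the start date c(1982, 1) is purely a placeholder assumption, since the post does not say when the record begins.

```r
# Tiny stand-in for sst_all: lon, lat, then one column per month
# (values rounded from the sst_subset shown above)
sst_all <- data.frame(lon   = c(-19.875, -19.625),
                      lat   = c(30.125, 30.125),
                      sst   = c(293.197, 293.092),
                      sst.1 = c(292.490, 292.504),
                      sst.2 = c(291.995, 291.887))

# Row 1 is one location; columns 3..n hold its monthly values
sst_row <- unlist(sst_all[1, 3:ncol(sst_all)], use.names = FALSE)

# Monthly time series; the start date here is an assumed placeholder
sst1_ts <- ts(sst_row, start = c(1982, 1), frequency = 12)
sst1_ts
```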
dput(sst1_ts)
structure(c(293.197412803228, 292.490350607051, 291.994832184947,
292.095993473008, 293.071606354427, 294.981993408501, 295.568703072806,
296.250961122073, 296.73166003606, 296.385154667461, 294.611660083445,
293.484186990367, 292.372896692626, 291.348207775437, 291.627090257683,
291.957326809441, 292.71063862056, 293.545326773947, 295.897412742879,
296.671928854599, 296.681326703851, 296.483864342674, 294.934660076226,
293.76709020985, 292.45870314232, 291.399993488565, 291.446767681068,
291.918993476964, 292.889025713347, 293.71099343691, 294.01418697852,
296.219025638916, 296.90166003226, 296.119993383065, 294.936326742855,
293.405154734069, 291.834509607885, 291.638564911804, 291.527412840556,
292.055326807251, 292.020961216621, 294.573660084295, 295.850315969738,
295.978380483004, 296.863660033109, 297.228380455065, 296.00866005222,
294.711606317771, 293.067735386772, 291.577136341748, 291.426445100877,
291.602993484028, 292.42096120768, 293.742993436195, 294.709348253305,
295.973219192797, 296.913993365318, 296.213219187433, 294.494326752735,
293.59225150408, 292.492251528667, 291.838207764485, 292.225477341082,
292.385993466526, 294.063864396765, 295.407326732328, 295.98386435385,
297.471928836718, 297.880660010378, 297.070638523107, 294.419993421063,
293.154509578381, 292.307735403759, 291.263441767479, 291.197412847932,
292.566660129155, 293.590316020253, 294.627660083088, 295.085477277156,
296.166122414292, 296.608660038809, 296.143864350273, 294.568660084407,
293.292251510786, 292.269670888481, 291.425350630855, 291.424832197687,
291.351326822986, 292.945799905626, 296.319660045269, 297.158380456629,
297.712251411991, 297.68699334804, 296.391928860858, 294.519660085502,
292.856445068914, 291.953864443927, 291.813922050742, 291.561606388179,
291.680660148958, 293.242574092542, 294.903326743593, 295.748057907507,
297.715799799009, 298.00999334082, 297.161606263009, 295.690326726002,
294.133541814562, 292.727412813734, 292.312493468169, 291.931928960546,
291.646326816392, 291.639670902563, 293.339326778551, 295.357090174311,
297.108703038385, 298.576993328147, 296.577735308317, 295.347660066995,
293.425154733622, 292.446445078078, 291.951027959007, 291.967735411359,
291.957993476093, 292.77838055453, 294.320326756624, 295.738703069007,
296.466122407586, 296.747993369028, 296.3506385392, 294.958326742363,
293.579348278562, 292.182574116234, 291.279279205549, 291.659993482754,
291.872993477993, 292.670316040816, 294.635326749583, 295.305477272238,
296.348057894096, 297.221993358433, 296.08612241608, 294.042993429489,
292.95160635711, 292.009670894293, 291.243207777784, 290.859025758721,
291.319993490353, 292.587412816863, 294.628660083066, 294.788057928965,
296.454832085258, 296.454326708925, 296.265477250781, 295.604326727924,
294.013219236607, 293.043541838926, 292.523922034872, 292.038703151708,
292.477326797818, 294.406122453631, 295.478993397392, 296.886122398199,
297.362251419814, 297.879993343726, 296.978703041291, 295.939326720436,
293.980638592173, 293.048703129133, 291.979993475601, 291.462896712966,
292.266326802534, 293.046445064667, 294.074993428774, 295.435477269333,
296.886122398199, 297.262660024191, 296.517090148383, 295.193326737111,
293.43967086233, 292.486122496546, 292.043564902752, 291.806767673021,
292.480660131077, 293.707735372467, 295.127326738586, 295.877735323964,
296.78192885214, 297.788326679108, 297.02450949188, 295.75766005783,
294.890315991195, 293.371606347722, 292.426422037051, 292.379670886022,
292.746993458457, 293.078057967186, 294.512993418984, 295.54612242815,
296.109348222013, 297.133660027074, 296.816767561039, 295.519326729824,
294.220638586809, 292.947412808816, 291.781422051468, 291.450638648723,
292.118660139168, 293.846122466148, 294.885993410647, 295.964832096211,
297.745154637062, 298.001326674347, 297.287735292448, 295.068993406557,
293.324509574581, 291.593864451974, 291.534821071758, 291.633219289804,
292.017993474752, 292.164187019871, 293.516660107921, 295.506122429044,
296.33321918475, 297.117660027432, 296.34741273282, 294.993660074907,
293.8032192413, 293.077735386549, 292.511779178, 292.344832177124,
292.459326798221, 293.437412797864, 295.860326722202, 296.416444989342,
297.083864329263, 298.678993325867, 297.782251410427, 295.657993393391,
293.652251502739, 293.274186995061, 292.307136325432, 291.922251541408,
291.564993484877, 292.452574110199, 293.996326763866, 294.823219218502,
296.541283696229, 297.421660020637, 296.747735304518, 295.771993390843,
294.041928913384, 293.317090219908, 292.421422037163, 292.680316040593,
292.577660128909, 293.240316028076, 295.254993402399, 296.815477238487,
297.524186900066, 298.126326671553, 297.598380446795, 295.563326728841,
294.207735361291, 293.43805795914, 293.115855519178, 292.753864426046,
292.466993464716, 292.925154744798, 296.035326718291, 296.538380470487,
298.612573972513, 298.241993335634, 297.065154652261, 295.770993390866,
293.72934827521, 292.379670886022, 291.370350632085, 291.601928967922,
292.473326797908, 293.597412794288, 294.678993415274, 296.042896610595,
297.383541741919, 297.729326680427, 296.714186918171, 295.008993407898,
293.465154732728, 292.365154757315, 292.279993468896, 291.722896707154,
292.651993460581, 293.469670861659, 295.145993404835, 296.262896605677,
297.257090131842, 297.550326684428, 297.544832060895, 296.194326714737,
294.499670838637, 293.095799902274, 292.836064885038, 292.445799916802,
292.78566012426, 293.216445060867, 294.3869934218, 295.256767595908,
296.333864346026, 296.692993370257, 296.250315960797, 295.23466006952,
293.713864404588, 292.874187004001, 292.378614156346, 291.931606379908,
292.099326806267, 293.999348269175, 295.055660073521, 296.170638543223,
296.729670788792, 297.024993362837, 296.646444984201, 294.817993412167,
293.368057960704, 292.39579991792, 291.174279207896, 291.343541876924,
291.974660142387, 292.742574103717, 294.785993412882, 296.685477241393,
297.067735297365, 297.318326689613, 297.265154647791, 296.419993376359,
294.439993420616, 293.224509576816, 293.140707735371, 292.928057970539,
293.028326785502, 293.116767643741, 294.067993428931, 295.034832116997,
296.24192886421, 297.204660025487, 297.0212836855, 295.618993394263,
294.195477297049, 293.26644505975, 292.1507077575, 291.842574123834,
292.212326803741, 292.898380551848, 293.698660103853, 294.868057927177,
296.104832093081, 297.440660020212, 296.802574012969, 295.234993402846,
293.692574082483, 292.617090235554, 291.535510726915, 291.344832199475,
292.175660137894, 293.799025693007, 295.795993390307, 296.195799832983,
297.432573998888, 298.643659993323, 297.612251414226, 296.027326718469,
294.692896640769, 293.446122475089, 292.611779175765, 292.494832173771,
293.027326785525, 293.948380528378, 294.144326760558, 295.259670821649,
296.524509503055, 297.014660029734, 296.854832076317, 295.413326732193,
294.306122455866, 292.857735391466, 291.982493475545, 291.549025743299,
292.710993459262, 293.044832161478, 294.210660092408, 296.063864352061,
296.959993364289, 298.161660004097, 297.040315943139, 295.179326737424,
293.474509571228, 292.265799920826, 291.409993488342, 291.042574141715,
291.81732681257, 293.374186992826, 294.908993410133, 296.215799832536,
297.686767541593, 298.667326659461, 297.63999334909, 295.589993394911,
294.077412783559), .Dim = c(408L, 1L), .Dimnames = list(NULL,
"1"), .Tsp = c(1982, 2015.91666666667, 12), class = "ts")
and then decompose it into its additive trend, seasonal and random components and remove the seasonal component from the original data
sst1_dec<-decompose(sst1_ts)
sst1_noseason<-sst1_ts - sst1_dec$seasonal
Now, how do I get a linear regression for this data (sst1_noseason)? I have tried lm(), but as there is only a single variable in the data frame I don't think I can. Should I build a new date column (time) with monthly dates and then run lm(sst ~ time)?
Is there any other R package that deals with time series and can do this better? I have looked at ggseas and tidyr; they seem promising, but maybe I need to build that date column to run this analysis in any case.
My final objective is to have a single value for the trend in each longitude and latitude point and plot a map to look for the areas with the highest climatic trend for sea surface temperature.
Maybe there is a better procedure and you could point me to another R package running spatio-temporal analysis. Any help would be appreciated.
Thanks in advance for your help
I am not a fan of specialised classes in R, since they are usually not as intuitive and require additional vocabulary to deal with. Here's an attempt to convert the time series you made into a data.frame, using the zoo package:
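library(zoo)
# Convert the ts to a data.frame; as.yearmon() turns decimal years into year-month values
df1 <- data.frame(zoo(sst1_ts), time=as.yearmon(time(sst1_ts)))
# as.Date() maps each year-month to the first day of that month
df1$jday <- as.Date(df1$time)
# Regress temperature on date; the slope is per day because Date values count days
(fit1<-lm(X1 ~ jday, df1))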
Call:
lm(formula = X1 ~ jday, data = df1)
Coefficients:
(Intercept) jday
2.937e+02 6.025e-05
Plotting is more intuitive with a data.frame as well:
library(ggplot2)
base <- ggplot(df1, aes(jday, X1)) + geom_line() + stat_smooth(method="lm")
p<-base + scale_x_date(date_labels = "%Y")
You can also use an interactive package such as plotly to explore the plot; ggplotly() converts the ggplot object into an interactive one.
library(plotly)
ggplotly(p)
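As a sketch of the same regression without building a date column: time() on a ts object returns decimal years, so it can serve directly as the regressor, and the slope then comes out per year. The example below uses a synthetic series (sst_demo, a made-up stand-in for sst1_ts with a known trend of 0.02 per year) and a hypothetical matrix grid_demo standing in for several lon/lat points. None of these names or values come from the original post.

```r
# Synthetic stand-in for sst1_ts (assumption): 408 monthly values from
# January 1982 with a known trend of 0.02 per year plus a seasonal cycle.
set.seed(1)
n <- 408
yrs <- (seq_len(n) - 1) / 12
sst_demo <- ts(293 + 0.02 * yrs + 3 * sin(2 * pi * yrs) + rnorm(n, sd = 0.3),
               start = c(1982, 1), frequency = 12)

# Remove the seasonal component, as in the question.
sst_demo_noseason <- sst_demo - decompose(sst_demo)$seasonal

# time() gives decimal years, so the slope is in units per year.
fit_demo <- lm(sst_demo_noseason ~ time(sst_demo_noseason))
trend_per_year <- unname(coef(fit_demo)[2])
print(trend_per_year)

# Hypothetical grid: one column per lon/lat point, each with its own trend.
true_trends <- c(0.01, 0.02, 0.03)
grid_demo <- sapply(true_trends, function(b) 293 + b * yrs + rnorm(n, sd = 0.3))

# lm() accepts a matrix response and fits every column at once;
# row 2 of the coefficient matrix holds one slope per grid point.
t_dec <- 1982 + yrs
grid_trends <- coef(lm(grid_demo ~ t_dec))[2, ]
print(grid_trends)
```

For comparison, the slope reported in the answer above (6.025e-05) is per day, because Date values count days; multiplied by 365.25 it is roughly 0.022 per year. The vector of per-point slopes is the kind of single value per location that could then be mapped.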

Error running bfast function

This is probably obvious, but I cannot see where the problem is. I’m trying to run bfast on a yearly time series to detect abrupt changes in trend, but I keep getting the following error (it is indeed the call to stl that is causing the problem):
bfast(GM85.ts,h=0.15,max.iter=1,season="none")
Error in stl(Yt, "periodic") : series is not periodic or has less than two periods
My time series has frequency 1 and 95 years of data
GM85.ts
Time Series:
Start = 2006
End = 2100
Frequency = 1
[1] 13.88868 13.89915 13.91431 13.93718 13.94067 13.94063 13.96324 13.99648 14.01391 14.03268 14.04667 14.05893 14.05230 14.06443 14.07909 14.11433 14.14736 14.14514 14.15454 14.19593 14.23417 14.23578 14.25171 14.27545 14.27213
[26] 14.29543 14.32851 14.34124 14.36091 14.38245 14.41517 14.42666 14.45183 14.49599 14.50378 14.52052 14.54298 14.58360 14.60798 14.62069 14.64962 14.68641 14.71247 14.72497 14.76606 14.79369 14.81297 14.84822 14.86503 14.89134
[51] 14.92601 14.95497 14.98318 15.01789 15.05929 15.09193 15.11453 15.14574 15.17960 15.20188 15.23737 15.27275 15.28612 15.32248 15.34883 15.38858 15.42155 15.45223 15.48342 15.51099 15.54076 15.58005 15.59959 15.63353 15.66272
[76] 15.69312 15.71358 15.73641 15.76502 15.79923 15.83983 15.87472 15.91833 15.93602 15.99177 16.03119 16.05529 16.07834 16.10982 16.14174 16.17376 16.22898 16.25100 16.27703 16.30971
Therefore it is periodic and has more than two periods… what is causing the error then?
Seasonality only occurs when there is more than one observation per period (here, per year). With a frequency of 1, stl() has no seasonal cycle to estimate, so bfast can't find a harmonic equation given a single value per year.
You need to specify a frequency greater than 1 when you create the time series object using ts(). Example of data with six measurements per year:
mydata <- rep(c(1,4,6,8,5,2), 50) + rnorm(50*6)
plot(mydata, type = "l")
# named my_ts to avoid masking the ts() function
my_ts <- ts(mydata, frequency = 6, start = 1969)
Check by typing my_ts, which will give you:
Time Series:
Start = c(1969, 1)
End = c(2018, 6)
Frequency = 6
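The point about frequency can be checked with base R alone. The sketch below is only an illustration: annual_demo is a made-up annual series loosely resembling GM85.ts, and sub_annual_demo repeats the six-per-year example from the answer. It calls stl() directly rather than bfast, on the assumption (taken from the error message in the question) that stl() is where the failure happens.

```r
# Made-up annual series (frequency = 1), 95 values starting in 2006.
set.seed(42)
annual_demo <- ts(13.9 + 0.025 * (0:94) + rnorm(95, sd = 0.01),
                  start = 2006, frequency = 1)

# stl() needs a frequency of at least 2; capture the error message instead
# of stopping the script.
annual_msg <- tryCatch(
  {
    stl(annual_demo, "periodic")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(annual_msg)

# Six observations per year, as in the answer's example: stl() runs.
sub_annual_demo <- ts(rep(c(1, 4, 6, 8, 5, 2), 50) + rnorm(50 * 6),
                      start = 1969, frequency = 6)
fit_stl <- stl(sub_annual_demo, "periodic")
print(frequency(sub_annual_demo))
```

Note that a genuinely annual series has no within-year cycle to recover, so setting a higher frequency only makes sense when the data really contain several observations per period.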

Resources