Error running bfast function - r

This is probably obvious, but I cannot see where the problem is. I’m trying to run bfast on a yearly time series to detect abrupt changes in trend, but I keep getting the following error (it is the call to stl that causes the problem):
bfast(GM85.ts,h=0.15,max.iter=1,season="none")
Error in stl(Yt, "periodic") : series is not periodic or has less than two periods
My time series has a frequency of 1 and 95 years of data:
GM85.ts
Time Series:
Start = 2006
End = 2100
Frequency = 1
[1] 13.88868 13.89915 13.91431 13.93718 13.94067 13.94063 13.96324 13.99648 14.01391 14.03268 14.04667 14.05893 14.05230 14.06443 14.07909 14.11433 14.14736 14.14514 14.15454 14.19593 14.23417 14.23578 14.25171 14.27545 14.27213
[26] 14.29543 14.32851 14.34124 14.36091 14.38245 14.41517 14.42666 14.45183 14.49599 14.50378 14.52052 14.54298 14.58360 14.60798 14.62069 14.64962 14.68641 14.71247 14.72497 14.76606 14.79369 14.81297 14.84822 14.86503 14.89134
[51] 14.92601 14.95497 14.98318 15.01789 15.05929 15.09193 15.11453 15.14574 15.17960 15.20188 15.23737 15.27275 15.28612 15.32248 15.34883 15.38858 15.42155 15.45223 15.48342 15.51099 15.54076 15.58005 15.59959 15.63353 15.66272
[76] 15.69312 15.71358 15.73641 15.76502 15.79923 15.83983 15.87472 15.91833 15.93602 15.99177 16.03119 16.05529 16.07834 16.10982 16.14174 16.17376 16.22898 16.25100 16.27703 16.30971
Therefore it is periodic and has more than two periods… what is causing the error then?

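Seasonality can only be estimated when there is more than one observation per period (here, per year). stl() stops with this error whenever the frequency of the series is below 2, or when the series does not span more than two full periods, so with frequency 1 the message appears no matter how many years you have. With a single value per year there is no seasonal cycle to fit a harmonic model to.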

You need to specify frequency when you create the time series object using ts(). Example of data with six measurements per year:
mydata <- rep(c(1,4,6,8,5,2), 50) + rnorm(50*6)
plot(mydata, type = "l")
mydata.ts <- ts(mydata, frequency = 6, start = 1969) # avoid naming the object ts, which masks the function
Check by typing mydata.ts, which will give you:
Time Series:
Start = c(1969, 1)
End = c(2018, 6)
Frequency = 6
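To see the check that triggers the error, here is a minimal sketch that calls stl() directly on two made-up series (both are illustrative assumptions, not the data from the question): a yearly series with frequency 1 and a series with six observations per year.

```r
# Made-up yearly series: 95 values, one observation per year
yearly <- ts(seq(13.9, 16.3, length.out = 95), start = 2006, frequency = 1)

# stl() refuses any series whose frequency is below 2; capture the message
msg <- tryCatch(stl(yearly, "periodic"),
                error = function(e) conditionMessage(e))
print(msg)

# Made-up series with six observations per year: stl() runs
set.seed(1)
sub_annual <- ts(rep(c(1, 4, 6, 8, 5, 2), 50) + rnorm(50 * 6),
                 frequency = 6, start = 1969)
fit <- stl(sub_annual, "periodic")
print(class(fit))
```

The first call fails regardless of the length of the series, which is why 95 years of yearly data still count as "not periodic".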

Related

Can I account for a shift in a timeseries from quarter x onwards in the seas() package

I want to seasonally adjust the following time series:
s <- c(37.74991 , 37.77515 , 37.75384 , 37.94780 , 39.77417, 39.79430 , 39.81612,
39.99021, 42.20018, 42.22210, 42.26033, 42.45065, 45.07874, 45.15395,
45.22697, 45.51945, 47.79920, 47.85800, 47.92614, 48.25480, 49.91503,
49.91005, 50.04165, 50.42833, 51.72783, 51.60934, 51.61586, 51.89149,
55.29518, 55.17836, 55.22256, 55.55186, 57.44819, 57.25458, 57.25675,
57.68244, 59.67495, 59.42791, 59.44048, 59.87237, 61.81162, 61.60417,
61.63524, 62.06323, 63.39596, 63.19328, 63.14214, 63.49329, 65.63239,
65.46829, 65.48508, 65.65813, 69.06712, 68.92904, 68.94045, 69.18806,
73.69241, 73.47518, 73.42786, 73.82185, 69.54005, 69.59465, 69.85418,
70.30283, 73.80306, 73.69583, 73.69901, 74.12954)
s <-ts(s, start = c(1999, 1), frequency = 4)
There is an intended jump in the values of s from t=21 onwards. Therefore, I create a dummy variable:
t <- rep_len(1,68)
t[1:20] <- 0
t <-ts(t, start = c(1999, 1), frequency = 4)
An X-13ARIMA model to seasonally adjust the data, including the dummy,
s_a <- final(seas(s, outlier = NULL, xreg = t))
gives me the error:
Error: X-13 run failed
Errors:
- forecasts end date, 2018.4, must end on or before user-defined
regression variables end date, 2015.4.
But actually, I am not interested in a forecast, only in the adjusted time series without the jump. So can I turn the forecast off, or do I have to specify the seas() call differently?
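The error message itself hints at one thing to try: X-13 wants the user-defined regressor to reach the end of the forecast span (2018.4, which is 12 quarters past the last observation in 2015.4). A minimal sketch, assuming that simply extending the dummy is acceptable; the names n_obs, n_fc and t_ext are made up for illustration, and the seas() call is left as a comment because it has not been run here:

```r
n_obs <- 68   # quarters of data, 1999 Q1 to 2015 Q4
n_fc  <- 12   # extra quarters implied by the error message (up to 2018 Q4)

# Dummy: 0 for the first 20 quarters, 1 afterwards, continued over the forecast span
t_ext <- ts(c(rep(0, 20), rep(1, n_obs - 20 + n_fc)),
            start = c(1999, 1), frequency = 4)

print(end(t_ext))     # last period covered by the regressor
print(length(t_ext))

# Assumed usage, same call as in the question but with the longer dummy:
# s_a <- final(seas(s, outlier = NULL, xreg = t_ext))
```

Whether the adjustment then behaves as intended still has to be checked against the actual output.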

Run analysis only on an intraday time period - zoo

I need to run an analysis on the data from 10 AM to 4 PM.
The original data runs from 9 AM to 5 PM, every day for one year.
How can I include only the indicated time period in the analysis?
window in zoo does not help with this.
structure(c(0, 7.12149266486255e-05, 0.000142429853297251, 0.000213644779945877,
0.000284859706594502, 0.000356074633243128, 0.000427289559891753,
0.000498504486540379, 0.000569719413189004, 0.00064093433983763,
0.000712149266486256, 0.000783364193134881, 0.000854579119783507,
0.000925794046432132, 0.000997008973080758, 0.00106822389972938,
0.00113943882637801, 0.00121065375302663, 0.00128186867967526,
0.00135308360632389, 0.00142429853297251, 0.00149551345962114,
0.00156672838626976, 0.00163794331291839, 0.00170915823956701,
0.00178037316621564, 0.00185158809286426, 0.00192280301951289,
0.00199401794616152, 0.00206523287281014), index = structure(c(1009942620,
1009942680, 1009942740, 1009942800, 1009942860, 1009942920, 1009942980,
1009943040, 1009943100, 1009943160, 1009943220, 1009943280, 1009943340,
1009943400, 1009943460, 1009943520, 1009943580, 1009943640, 1009943700,
1009943760, 1009943820, 1009943880, 1009943940, 1009944000, 1009944060,
1009944120, 1009944180, 1009944240, 1009944300, 1009944360), class = c("POSIXct",
"POSIXt")), class = "zoo")
How can I select times after 10 AM and before 4 PM, across several days?
If z is the zoo object then
1) use this to extract hour of each time point and then subset to only those that are 10, 11, 12, 13, 14 or 15.
z[format(time(z), "%H") %in% 10:15]
2) or use this alternative which is similar but uses POSIXlt to get the hour:
z[as.POSIXlt(time(z))$hour %in% 10:15]
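3) or convert the series to xts (with the xts package loaded) and use this; the range ends at 15:59 so that it covers the same hours as the two filters above:
x <- as.xts(z)["T10:00/T15:59"]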
drop(as.zoo(x))
Omit the second line if it is ok to return an xts series.
Time Zone
Be sure that you have the time zone set correctly since the time in one time zone is not the same as in another time zone.
We can query the current time zone of the session like this:
Sys.timezone()
and can set it like this:
Sys.setenv(TZ = "...")
where ... is replaced with the time zone desired. Common settings are:
Sys.setenv(TZ = "GMT")
Sys.setenv(TZ = "") # restore default
The following will show the possible time zones that can be used:
OlsonNames()
You only need all this if the time zone of your session is not already set to the time zone of the data.
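As a small illustration of why the time zone matters for an hour-based filter, here is a base R sketch on made-up half-hourly timestamps (the dates and the second zone, "Asia/Kolkata", are arbitrary assumptions chosen only for the example):

```r
# Half-hourly timestamps from 09:00 to 17:00 on one day, defined in UTC
times <- seq(as.POSIXct("2002-01-02 09:00:00", tz = "UTC"),
             as.POSIXct("2002-01-02 17:00:00", tz = "UTC"),
             by = "30 min")

# Hour of day as seen in UTC: keep hours 10, 11, ..., 15
keep_utc <- as.POSIXlt(times, tz = "UTC")$hour %in% 10:15
kept <- times[keep_utc]
print(format(range(kept), "%H:%M", tz = "UTC"))

# The same instants viewed in another zone fall into different hours,
# so the same filter keeps a different set of points
keep_other <- as.POSIXlt(times, tz = "Asia/Kolkata")$hour %in% 10:15
print(sum(keep_utc))
print(sum(keep_other))
```

Passing tz explicitly, as here, avoids depending on the session setting at all.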
You could build a tibble for the analysis with time, value and hour columns (here df is the zoo object). You can then keep only the rows between 10 AM and 4 PM.
library(dplyr)
library(zoo)
tibble(time = index(df),
value = coredata(df),
hour = lubridate::hour(time)) %>%
filter(between(hour, 10, 15)) -> result
result

How to interpolate data within groups in R using seqtime?

I am trying to use seqtime (https://github.com/hallucigenia-sparsa/seqtime) to analyze time-series microbiome data, as follows:
meta = data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta<- meta[order(meta$day, meta$condition),]
meta.ts<-as.data.frame(t(meta))
otu=matrix(1:390, ncol = 39)
oturar<-rarefyFilter(otu, min=0)
rarotu<-oturar$rar
time<-meta.ts[1,]
interp.otu<-interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
The interpolation returns the following error:
[1] "Processing group a"
[1] "Number of members 13"
intervals
0
12
[1] "Selected interval: 1"
[1] "Length of time series: 13"
[1] "Length of time series after interpolation: 1"
Error in stinepack::stinterp(time.vector, as.numeric(x[i, ]), xout = xout, :
The values of x must strictly increasing
I tried to change method to "hyman", but it returns the error below:
Error in interpolateSub(x = x, time.vector = time.vector, method = method) :
Time points must be provided in chronological order.
I am using R version 3.6.1 and I am a bit new to R.
Can anyone tell me what I am doing wrong, or how to get around these errors?
Many thanks!
I spent quite some time stumbling around trying to figure this out. It all comes down to the data structure of meta and the resulting time variable used as input for the time.vector parameter.
t(meta) is a character matrix, and when it is converted to a data frame in R versions before 4.0.0 (such as your 3.6.1), all strings are automatically converted to factors. This includes day, so time ends up holding factors rather than the day numbers.
To fix this, you can edit your code as follows:
library(seqtime)
meta <- data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta <- meta[order(meta$day, meta$condition),]
meta.ts <- as.data.frame(t(meta), stringsAsFactors = FALSE) # Set stringsAsFactors = FALSE
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min=0)
rarotu <- oturar$rar
time <- as.integer(meta.ts[1,]) # Now 'day' is character, so convert to integer
interp.otu <- interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
As a bonus, read this blogpost for information on the stringsAsFactors parameter. Strings being converted to factors automatically is a common source of confusion.
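The effect can be reproduced without seqtime. The sketch below uses a plain data.frame in place of the data.table from the question (an assumption made to keep it base R only) and sets stringsAsFactors explicitly, so it behaves the same on old and new R versions:

```r
meta <- data.frame(day = rep(15:27, each = 3),
                   condition = c("a", "b", "c"))

# With factors: every column of the transposed frame is a factor,
# and as.integer() returns the factor code, not the day
bad <- as.data.frame(t(meta), stringsAsFactors = TRUE)
bad_time <- unname(sapply(bad[1, ], as.integer))
print(unique(bad_time))

# With characters: the day values survive and can be converted
good <- as.data.frame(t(meta), stringsAsFactors = FALSE)
good_time <- as.integer(unlist(good[1, ]))
print(head(good_time))
```

In the factor version every time point collapses to the same code, which is consistent with the complaint that the values are not strictly increasing.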

ARIMA forecasts are way off

I am using ARIMA (auto.arima) to forecast 52 weeks ahead. The time series model fits the data well (see plot below; the red line is the fitted value). The input data has a decreasing trend.
The forecasts (highlighted area), however, seem to take off right after the actual values end.
How can the forecasts be tamed?
dput of the input
> dput(baseTs)
structure(c(5.41951956469523, 5.49312499014084, 5.56299025716832,
5.64442852110163, 5.71385023974044, 5.77578632033402, 5.82985917237953,
5.86346591034374, 5.89626165157029, 5.92013286862512, 5.94200331713403,
5.93996840759539, 5.93917517855891, 5.90355191030718, 5.87180377346416,
5.83190030607801, 5.79624428055153, 5.75377043604686, 5.71445345904649,
5.70025269940165, 5.69789272204017, 5.73728731204876, 5.77015169357394,
5.78936321107329, 5.80113284575595, 5.79449448552444, 5.78193215198878,
5.74003482344406, 5.71694163930612, 5.66689345413153, 5.614357635737,
5.58578389962286, 5.55824727570498, 5.58495146060423, 5.61344117957187,
5.63637441850401, 5.65948408172102, 5.65558124383951, 5.64909390802285,
5.6664546352889, 5.68205689033408, 5.69991437586231, 5.72273650369514,
5.72006065065194, 5.71556512542993, 5.6717608006789, 5.64610326418084,
5.57193975508467, 5.49406607804055, 5.40126523530993, 5.31513540386482,
5.238437956722, 5.15362077920702, 5.11960611878249, 5.08498887979172,
5.08408134201562, 5.07361213981111, 5.04830559379816, 5.01401413448689,
5.0418662607737, 5.06947584464062, 5.08771495309317, 5.10587165060358,
5.1438369937098, 5.1815251206981, 5.2318657906363, 5.29385492077065,
5.29652029253008, 5.29998067741868, 5.28242409629194, 5.2722770646788,
5.24927444462166, 5.22226735874711, 5.16555064465208, 5.10956459841778,
5.09439240612378, 5.07617974794969, 5.04418337811006, 5.0075619037348,
4.99108423417745, 4.9874504485194, 4.99135285004736, 4.99217791657733,
4.94874445528885, 4.90320874819525, 4.84508278068469, 4.79086127023963,
4.75236840849279, 4.71431573721527, 4.71936529020481, 4.72422850167074,
4.72203091743033, 4.71732868614755, 4.71175323610448, 4.70566162766782,
4.71165837247331, 4.71767529028615, 4.75129316683193, 4.7863855803437,
4.85248191548789, 4.91865394024373, 4.9590849617955, 4.99960686851895,
5.02020678181827, 5.04201201976595, 5.02025906892952, 4.99735920720967,
4.92520279823639, 4.84822505567723, 4.81118504683572, 4.77330440072099,
4.72636395544651, 4.6861111959621, 4.64912520396312, 4.61348981514599,
4.58517820348434, 4.56378688913207, 4.549011597464, 4.52900600122321,
4.56028365470815, 4.60248987909752, 4.65628990381626, 4.70496326660038,
4.73779351647955, 4.76616725791407, 4.79569018347378, 4.83185281078024,
4.85177852259102, 4.87488251014986, 4.89468916229158, 4.9077984323135,
4.92375782591088, 4.96363767543938, 5.05416277704822, 5.1426680212522,
5.232495043331, 5.32153608753653, 5.41780853915163, 5.51131526881126,
5.62791210324026), .Tsp = c(2015.05769230769, 2017.73076923077,
52), class = "ts")
The code used
library(forecast)
weeks_forecasted <- 52 # forecast horizon in weeks
fc <- try(auto.arima(baseTs, ic = 'aic', approximation = FALSE))
baseFc <- forecast(fc, h = weeks_forecasted)
baseVolume_forecast_new <- baseFc$mean
What could be the reason behind the forecasts exploding?

Averaging multiple time series contained in a list object month-by-month

I have a list object with multiple slots, 2 in this example, each containing a time series of monthly data over 5 years (60 values). I want to calculate an "average" time series from these two, i.e. the January value of my new series should be the mean of the 2 Januaries from the two time series, and so on. I thought of using lapply(), but if I understand correctly, that applies a function within each slot and not across slots. I could be wrong, though.
Here is a dput() of my data:
list(structure(c(-2.70881589936525, -1.25455287657218, 2.20891093254408,
5.47494447650721, 9.22974987813887, 12.0978184689361, 15.8529078203063,
14.5682520133134, 10.8615272853853, 5.13086415717895, 0.728917940858284,
2.13993708024285, 0.0592607633855364, -1.08188244487586, -1.19467731719249,
5.03740002827978, 10.3763483415682, 13.3292119845773, 12.838352493412,
15.3580851547661, 9.4829099091539, 6.56223017400025, 1.36042454896383,
0.899805834524198, -2.13189083053455, -0.083918862391372, 0.994166453698637,
2.71436535566226, 11.3453352141603, 15.0712013841955, 13.7110193822507,
9.8693411661721, 9.60321957581941, 5.2375499185438, -0.184162586424226,
-1.50175258729513, -6.9445058128996, -3.21184575341925, 0.383804323362742,
5.59544079002557, 7.80248270514967, 12.4958346580684, 14.3387761065989,
12.1472112399243, 12.3920738957853, 7.03456285321734, 1.04268672395181,
-1.38758815045495, -3.32477056135693, -0.447356879470411, 4.56295165574707,
5.68189626665318, 6.74697976141299, 12.0703824641417, 16.8904454284777,
14.2920547883889, 12.1655598473256, 6.77734909883441, 3.00180135903895,
1.94856648801937), .Tsp = c(2001, 2005.91666666667, 12), class = "ts"),
structure(c(-1.63889806183691, -3.44715647302858, 0.394739200062096,
5.23920359311239, 9.57664849661865, 14.0415975194851, 16.7884967982053,
13.6157495618028, 10.5269221330342, 7.71132825720641, -0.0288215700483627,
-3.13091409964762, -0.970448606448803, -1.87539827694689,
0.765137214031195, 4.44395722618218, 10.680721392289, 10.3468681880514,
14.3053058161559, 16.3132350056912, 12.8839577165089, 9.98091681764607,
2.69020486688223, 0.290392068555248, -0.924761078500446,
-5.67205756065117, 1.41326224137296, 6.36338872204811, 8.92258840663339,
13.0624643120579, 12.8689225828453, 14.3836922928304, 12.3805992934003,
7.60234172866889, 2.86744304241512, 1.35829952041713, -2.82629733903844,
-0.768552317763034, -0.568688011194226, 3.57676644057355,
4.99664005346314, 11.0140757656585, 15.498475017639, 13.4278279144656,
11.8598222456967, 7.31027938974563, 3.10247804880477, -2.67278197280972,
-2.49516231672057, -3.63941768231319, 1.89945951183736, 4.26424942213747,
9.37058647983393, 14.5133688239731, 14.6719630140624, 15.5022840542944,
13.3686764903323, 6.20332371420166, 3.05229549361941, -0.975912393030021
), .Tsp = c(2001, 2005.91666666667, 12), class = "ts"))
If there is an automated way of doing this, that would be great, because I will eventually have a list with 1000 ts() objects, each with 600 data points.
Thanks.
If L is the list of "ts" objects then assuming the time index of each component series is the same:
1) rowMeans/cbind
combined <- ts(rowMeans(do.call("cbind", L)))
tsp(combined) <- tsp(L[[1]]) # fix up times
2) Reduce
Reduce("+", L) / length(L)
These should both work even if there are more than 2 components in L.
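A quick way to convince yourself that the two approaches agree is to run them on a made-up list (three random monthly series here, purely an illustrative assumption) and compare the results:

```r
set.seed(42)
# Three made-up monthly series over 5 years, all with the same time index
L <- replicate(3, ts(rnorm(60), start = c(2001, 1), frequency = 12),
               simplify = FALSE)

# 1) rowMeans/cbind
avg1 <- ts(rowMeans(do.call("cbind", L)))
tsp(avg1) <- tsp(L[[1]])   # restore the monthly time index

# 2) Reduce
avg2 <- Reduce("+", L) / length(L)

print(all.equal(as.numeric(avg1), as.numeric(avg2)))
print(tsp(avg2))
```

Both results are "ts" objects with the same start, end and frequency as the inputs, so the January of each year in the result is the mean of that January across the list.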
