I want to seasonally adjust the following time series:
s <- c(37.74991 , 37.77515 , 37.75384 , 37.94780 , 39.77417, 39.79430 , 39.81612,
39.99021, 42.20018, 42.22210, 42.26033, 42.45065, 45.07874, 45.15395,
45.22697, 45.51945, 47.79920, 47.85800, 47.92614, 48.25480, 49.91503,
49.91005, 50.04165, 50.42833, 51.72783, 51.60934, 51.61586, 51.89149,
55.29518, 55.17836, 55.22256, 55.55186, 57.44819, 57.25458, 57.25675,
57.68244, 59.67495, 59.42791, 59.44048, 59.87237, 61.81162, 61.60417,
61.63524, 62.06323, 63.39596, 63.19328, 63.14214, 63.49329, 65.63239,
65.46829, 65.48508, 65.65813, 69.06712, 68.92904, 68.94045, 69.18806,
73.69241, 73.47518, 73.42786, 73.82185, 69.54005, 69.59465, 69.85418,
70.30283, 73.80306, 73.69583, 73.69901, 74.12954)
s <- ts(s, start = c(1999, 1), frequency = 4)
There is an intended jump in the values of s from t = 21 onwards. Therefore, I create a dummy variable:
t <- rep_len(1,68)
t[1:20] <- 0
t <- ts(t, start = c(1999, 1), frequency = 4)
An X-13-ARIMA model to seasonally adjust the data, including the dummy,
s_a <- final(seas(s, outlier = NULL, xreg = t))
gives me the error:
Error: X-13 run failed
Errors:
- forecasts end date, 2018.4, must end on or before user-defined
regression variables end date, 2015.4.
But actually, I am not interested in a forecast, only in the adjusted time series without the jump. So can I turn the forecast off, or do I have to specify the seas() call differently?
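No answer is included here, but the error itself points at the cause: the user-defined regressor must extend at least to the end of the forecast horizon (the series ends 2015Q4, the forecasts end 2018Q4). The following is only a sketch of two possible workarounds, assuming the seasonal package's spec.argument syntax; neither comes from the original post:
library(seasonal)
# (a) extend the dummy by 12 quarters so it reaches 2018Q4, the forecast end date in the error
t_ext <- ts(c(rep(0, 20), rep(1, 48 + 12)), start = c(1999, 1), frequency = 4)
s_a <- final(seas(s, outlier = NULL, xreg = t_ext))
# (b) or try suppressing the regARIMA forecasts via the X-13 forecast spec
#     (forecast.maxlead = 0 is an assumption, not something stated in the question)
s_a <- final(seas(s, outlier = NULL, xreg = t, forecast.maxlead = 0))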
I need to run an analysis from 10 AM to 4 PM.
The original data runs from 9 AM to 5 PM, every day, for one year.
How can I include only the indicated time period in the analysis?
window() in zoo does not help here.
structure(c(0, 7.12149266486255e-05, 0.000142429853297251, 0.000213644779945877,
0.000284859706594502, 0.000356074633243128, 0.000427289559891753,
0.000498504486540379, 0.000569719413189004, 0.00064093433983763,
0.000712149266486256, 0.000783364193134881, 0.000854579119783507,
0.000925794046432132, 0.000997008973080758, 0.00106822389972938,
0.00113943882637801, 0.00121065375302663, 0.00128186867967526,
0.00135308360632389, 0.00142429853297251, 0.00149551345962114,
0.00156672838626976, 0.00163794331291839, 0.00170915823956701,
0.00178037316621564, 0.00185158809286426, 0.00192280301951289,
0.00199401794616152, 0.00206523287281014), index = structure(c(1009942620,
1009942680, 1009942740, 1009942800, 1009942860, 1009942920, 1009942980,
1009943040, 1009943100, 1009943160, 1009943220, 1009943280, 1009943340,
1009943400, 1009943460, 1009943520, 1009943580, 1009943640, 1009943700,
1009943760, 1009943820, 1009943880, 1009943940, 1009944000, 1009944060,
1009944120, 1009944180, 1009944240, 1009944300, 1009944360), class = c("POSIXct",
"POSIXt")), class = "zoo")
How can I select the periods of time between 10 AM and 4 PM, across several days?
If z is the zoo object then
1) use this to extract the hour of each time point and then subset to only those where it is 10, 11, 12, 13, 14 or 15:
z[format(time(z), "%H") %in% 10:15]
2) or use this alternative which is similar but uses POSIXlt to get the hour:
z[as.POSIXlt(time(z))$hour %in% 10:15]
3) or convert the series to xts and use this:
x <- as.xts(z)["T10:00/T15:00"]
drop(as.zoo(x))
Omit the second line if it is ok to return an xts series.
Time Zone
Be sure that you have the time zone set correctly, since a given clock time refers to different instants in different time zones.
We can query the current time zone of the session like this:
Sys.timezone()
and can set it like this:
Sys.setenv(TZ = "...")
where ... is replaced with the time zone desired. Common settings are:
Sys.setenv(TZ = "GMT")
Sys.setenv(TZ = "") # restore default
The following will show the possible time zones that can be used:
OlsonNames()
You only need all this if the time zone of your session is not already set to the time zone of the data.
You could build a tibble for the analysis with time, value and hour columns. You can then filter the rows to keep only those between 10 AM and 4 PM.
library(dplyr)
library(zoo)
tibble(time = index(df),   # df is the zoo series from the question
       value = coredata(df),
       hour = lubridate::hour(time)) %>%
  filter(between(hour, 10, 15)) -> result
result
I am trying to use seqtime (https://github.com/hallucigenia-sparsa/seqtime) to analyze time-series microbiome data, as follows:
library(seqtime)  # provides rarefyFilter() and interpolate()

meta <- data.table::data.table(day = rep(15:27, each = 3), condition = c("a", "b", "c"))
meta <- meta[order(meta$day, meta$condition), ]
meta.ts <- as.data.frame(t(meta))
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min = 0)
rarotu <- oturar$rar
time <- meta.ts[1, ]
interp.otu <- interpolate(rarotu, time.vector = time,
                          method = "stineman", groups = meta$condition)
The interpolation returns the following error:
[1] "Processing group a"
[1] "Number of members 13"
intervals
0
12
[1] "Selected interval: 1"
[1] "Length of time series: 13"
[1] "Length of time series after interpolation: 1"
Error in stinepack::stinterp(time.vector, as.numeric(x[i, ]), xout = xout, :
The values of x must strictly increasing
I tried changing the method to "hyman", but it returns the error below:
Error in interpolateSub(x = x, time.vector = time.vector, method = method) :
Time points must be provided in chronological order.
I am using R version 3.6.1 and I am a bit new to R.
Can anyone please tell me what I am doing wrong / how to get around these errors?
Many thanks!
I spent quite some time stumbling around trying to figure this out. It all comes down to the data structure of meta and the resulting time variable used as input for the time.vector argument.
When meta is transposed and converted to a data frame (meta.ts), all strings are automatically converted to factors by default, and this includes day.
To adjust, you can edit your code to the following:
library(seqtime)
meta <- data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta <- meta[order(meta$day, meta$condition),]
meta.ts <- as.data.frame(t(meta), stringsAsFactors = FALSE) # Set stringsAsFactors = FALSE
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min=0)
rarotu <- oturar$rar
time <- as.integer(meta.ts[1,]) # Now 'day' is character, so convert to integer
interp.otu <- interpolate(rarotu, time.vector = time,
                          method = "stineman", groups = meta$condition)
As a bonus, read this blog post for information on the stringsAsFactors parameter. Strings being silently converted to factors is a common source of confusion.
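For what it's worth, a minimal illustration of the difference (the question uses R 3.6.1, where data.frame() still defaults to stringsAsFactors = TRUE; since R 4.0.0 the default is FALSE):
df1 <- data.frame(x = c("a", "b"))                           # x becomes a factor on R < 4.0.0
df2 <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE) # x stays character
class(df1$x)  # "factor"
class(df2$x)  # "character"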
I am using ARIMA (auto.arima) to forecast 52 weeks ahead. The time series model fits the data well (see the plot below; the red line is the fitted values). The input data has a decreasing trend.
The forecasts (the highlighted area), however, seem to just take off after the actual values end.
How can the forecasts be tamed?
dput of the input
> dput(baseTs)
structure(c(5.41951956469523, 5.49312499014084, 5.56299025716832,
5.64442852110163, 5.71385023974044, 5.77578632033402, 5.82985917237953,
5.86346591034374, 5.89626165157029, 5.92013286862512, 5.94200331713403,
5.93996840759539, 5.93917517855891, 5.90355191030718, 5.87180377346416,
5.83190030607801, 5.79624428055153, 5.75377043604686, 5.71445345904649,
5.70025269940165, 5.69789272204017, 5.73728731204876, 5.77015169357394,
5.78936321107329, 5.80113284575595, 5.79449448552444, 5.78193215198878,
5.74003482344406, 5.71694163930612, 5.66689345413153, 5.614357635737,
5.58578389962286, 5.55824727570498, 5.58495146060423, 5.61344117957187,
5.63637441850401, 5.65948408172102, 5.65558124383951, 5.64909390802285,
5.6664546352889, 5.68205689033408, 5.69991437586231, 5.72273650369514,
5.72006065065194, 5.71556512542993, 5.6717608006789, 5.64610326418084,
5.57193975508467, 5.49406607804055, 5.40126523530993, 5.31513540386482,
5.238437956722, 5.15362077920702, 5.11960611878249, 5.08498887979172,
5.08408134201562, 5.07361213981111, 5.04830559379816, 5.01401413448689,
5.0418662607737, 5.06947584464062, 5.08771495309317, 5.10587165060358,
5.1438369937098, 5.1815251206981, 5.2318657906363, 5.29385492077065,
5.29652029253008, 5.29998067741868, 5.28242409629194, 5.2722770646788,
5.24927444462166, 5.22226735874711, 5.16555064465208, 5.10956459841778,
5.09439240612378, 5.07617974794969, 5.04418337811006, 5.0075619037348,
4.99108423417745, 4.9874504485194, 4.99135285004736, 4.99217791657733,
4.94874445528885, 4.90320874819525, 4.84508278068469, 4.79086127023963,
4.75236840849279, 4.71431573721527, 4.71936529020481, 4.72422850167074,
4.72203091743033, 4.71732868614755, 4.71175323610448, 4.70566162766782,
4.71165837247331, 4.71767529028615, 4.75129316683193, 4.7863855803437,
4.85248191548789, 4.91865394024373, 4.9590849617955, 4.99960686851895,
5.02020678181827, 5.04201201976595, 5.02025906892952, 4.99735920720967,
4.92520279823639, 4.84822505567723, 4.81118504683572, 4.77330440072099,
4.72636395544651, 4.6861111959621, 4.64912520396312, 4.61348981514599,
4.58517820348434, 4.56378688913207, 4.549011597464, 4.52900600122321,
4.56028365470815, 4.60248987909752, 4.65628990381626, 4.70496326660038,
4.73779351647955, 4.76616725791407, 4.79569018347378, 4.83185281078024,
4.85177852259102, 4.87488251014986, 4.89468916229158, 4.9077984323135,
4.92375782591088, 4.96363767543938, 5.05416277704822, 5.1426680212522,
5.232495043331, 5.32153608753653, 5.41780853915163, 5.51131526881126,
5.62791210324026), .Tsp = c(2015.05769230769, 2017.73076923077,
52), class = "ts")
The code used
fc <- try(auto.arima(baseTs, ic = "aic", approximation = FALSE))
baseFc <- forecast(fc, h = weeks_forecasted)
baseVolume_forecast_new <- baseFc$mean
What could be the reason behind the forecasts exploding?
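No answer is included here, but a minimal diagnostic sketch (using standard forecast-package functions, not code from the original post) can show which model auto.arima() selected and whether a drift or differencing term is driving the extrapolation:
library(forecast)
fit <- auto.arima(baseTs, ic = "aic", approximation = FALSE)
summary(fit)                  # selected order, and whether a drift term is included
checkresiduals(fit)           # residual diagnostics
plot(forecast(fit, h = 52))   # inspect the 52-week forecasts and their intervals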
I have a list object with multiple slots, 2 in this example, each containing a time series of monthly data over 5 years (60 values). I want to calculate an "average" time series from these, i.e. the January value of my new series should be the mean of the January values of the two series, and so on. I thought of using lapply(), but if I understand correctly, that applies a function within each slot rather than across slots; I could be wrong, though.
Here is a dput() of my data:
list(structure(c(-2.70881589936525, -1.25455287657218, 2.20891093254408,
5.47494447650721, 9.22974987813887, 12.0978184689361, 15.8529078203063,
14.5682520133134, 10.8615272853853, 5.13086415717895, 0.728917940858284,
2.13993708024285, 0.0592607633855364, -1.08188244487586, -1.19467731719249,
5.03740002827978, 10.3763483415682, 13.3292119845773, 12.838352493412,
15.3580851547661, 9.4829099091539, 6.56223017400025, 1.36042454896383,
0.899805834524198, -2.13189083053455, -0.083918862391372, 0.994166453698637,
2.71436535566226, 11.3453352141603, 15.0712013841955, 13.7110193822507,
9.8693411661721, 9.60321957581941, 5.2375499185438, -0.184162586424226,
-1.50175258729513, -6.9445058128996, -3.21184575341925, 0.383804323362742,
5.59544079002557, 7.80248270514967, 12.4958346580684, 14.3387761065989,
12.1472112399243, 12.3920738957853, 7.03456285321734, 1.04268672395181,
-1.38758815045495, -3.32477056135693, -0.447356879470411, 4.56295165574707,
5.68189626665318, 6.74697976141299, 12.0703824641417, 16.8904454284777,
14.2920547883889, 12.1655598473256, 6.77734909883441, 3.00180135903895,
1.94856648801937), .Tsp = c(2001, 2005.91666666667, 12), class = "ts"),
structure(c(-1.63889806183691, -3.44715647302858, 0.394739200062096,
5.23920359311239, 9.57664849661865, 14.0415975194851, 16.7884967982053,
13.6157495618028, 10.5269221330342, 7.71132825720641, -0.0288215700483627,
-3.13091409964762, -0.970448606448803, -1.87539827694689,
0.765137214031195, 4.44395722618218, 10.680721392289, 10.3468681880514,
14.3053058161559, 16.3132350056912, 12.8839577165089, 9.98091681764607,
2.69020486688223, 0.290392068555248, -0.924761078500446,
-5.67205756065117, 1.41326224137296, 6.36338872204811, 8.92258840663339,
13.0624643120579, 12.8689225828453, 14.3836922928304, 12.3805992934003,
7.60234172866889, 2.86744304241512, 1.35829952041713, -2.82629733903844,
-0.768552317763034, -0.568688011194226, 3.57676644057355,
4.99664005346314, 11.0140757656585, 15.498475017639, 13.4278279144656,
11.8598222456967, 7.31027938974563, 3.10247804880477, -2.67278197280972,
-2.49516231672057, -3.63941768231319, 1.89945951183736, 4.26424942213747,
9.37058647983393, 14.5133688239731, 14.6719630140624, 15.5022840542944,
13.3686764903323, 6.20332371420166, 3.05229549361941, -0.975912393030021
), .Tsp = c(2001, 2005.91666666667, 12), class = "ts"))
An automated way of doing this would be great, because I will eventually have a list of 1000 ts() objects, each with 600 data points.
Thanks.
If L is the list of "ts" objects then, assuming the time index of each component series is the same:
1) rowMeans/cbind
combined <- ts(rowMeans(do.call("cbind", L)))
tsp(combined) <- tsp(L[[1]]) # fix up times
2) Reduce
Reduce("+", L) / length(L)
These should both work even if there are more than 2 components in L.
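For example, with the two-series list from the question (a quick check, not part of the original answer), the two approaches give the same values:
# assuming L is the list shown in the dput() above
avg1 <- ts(rowMeans(do.call("cbind", L)))
tsp(avg1) <- tsp(L[[1]])            # restore the monthly time index
avg2 <- Reduce("+", L) / length(L)  # keeps the ts attributes automatically
all.equal(as.numeric(avg1), as.numeric(avg2))  # TRUE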