Designating dynamic start in time-series vectors in R - r

I have some problems with time-series designation of vectors in R.
I work with time-series and when I want to set a vector to a certain period, I feel quite confident about how to do it. I have simply done as follow name<- ts(name, frequency=12, start=c(2007,1)). As you can see I have monthly data
I am making an R template for colleagues to use, and I want them to be able to carry out a recursive ARIMA regression from any given starting point. That is, I have a range of in-sample predicted valued and I want to designate a start-value that is n monthly observation after 2007 (or whatever start data is used), where n is the start-value of the recursive regression.

first and last from the xts time-series package do exactly what you want.i.e. to get the first 2 months of an object x:
first(x, '2 months’)
or the last 6 weeks:
last(x, '6 weeks’)
Valid period.types are: secs, seconds, mins, minutes, hours, days, weeks, months, quarters, and years. As always you can find much more detailed information using ?xts::first.

Related

Analyzing disparate time series in R

Are there tools in R that simplify analysis of lagged and disparate time series. For example:
Daily values that only occur on weekdays (no entry on weekends or holidays)
vs
Bi-annual values
What I'm seeking is ways to:
Complete the missing daily values (with interpolated, or last value rolled forward, etc.)
Look for correlation between daily values and the bi-annual value (only the values that came before the bi-annual event)
As an example:
10-year treasury note interest rate (daily on non-holiday weekdays) as "X" and i-bond fixed rate as "Y" (set May 1/Nov 1)
Any suggestions appreciated.
I've built a test dataset manually for "x" and used functions in zoo to populate the missing values (interpolated), but I'm hoping for a less "brute-force" method for looking at analyzing the disparate time series. I've used lag functions in the past, but those were on matching interval time series.
What Jon commented is what I had in mind:
expand a weekday time series to full week using missing value function(s) in zoo
Sample the daily value - say April 15 for the May 1, Oct 15 for Nov 1
Ideally be able to automate - say loop through April 1-30, Oct 1-30 to look for highest RSqr for the model of choice (linear, polynomial, etc.)
Not have to build discrete datasets for each of the above - but if that is what is required I can do it programmatically - I've done that with stock data in the past. I was looking for a more efficient means of selecting the datasets ad hoc during the analysis.
I don't have code to post, because I'm clueless as to the feature/function that would make the date selection I'm after possible (at least in R).
Thanks for the input so far. It has already been useful in helping me look at alternative methods to achieve what I'm after.

How to specify interval or frequency with tsibble and fable for service hours?

I want to forecast the number of customers entering a shop during service hours. I have hourly data for
Monday to Friday
8:00 to 18:00
Thus, I assume my time series is in fact regular, but atypical in a sense, since I have 10 hours a day and 5 days a week.
I am able to do modeling with this regular 24/7 time series by setting non-service hours to zero, but I find this inefficient and also incorrect, because the times are not missing. Rather, they do not exist.
Using the old ts-framework I was able to explicitly specify
myTS <- ts(x, frequency = 10)
However, within the new tsibble/fable-framework this is not possible. It detects hourly data and expects 24 hours per day and not 10. Every subsequent function reminds me of implicit gaps in time. Manually overriding the interval-Attribute works:
> attr(ts, "interval") <- new_interval(hour = 10)
> has_gaps(ts)
# A tibble: 1 x 1
.gaps
<lgl>
1 FALSE
But has no effect on modeling:
model(ts,
snaive = SNAIVE(customers ~ lag("week")))
I still get the same error message:
1 error encountered for snaive [1] .data contains implicit gaps in
time. You should check your data and convert implicit gaps into
explicit missing values using tsibble::fill_gaps() if required.
Any help would be appreciated.
This question actually corresponds to this gh issue. As far as I know, there's no R packages that allow users to construct custom schedule, for example to specify certain intra-days and days. A couple of packages provide some specific calendars (like business dates), but none gives a solution to setting up intra days. Tsibble will gain a calendar argument for custom calendars to respect structural missings, when such a package is made available. But currently no support for that.
As you stated, it's hourly data. Hence the data interval should be 1 hour, not 10 hours. However, ts() frequency is seasonal periods, 10 hours per day, for modelling.

Decompose fails because time-series period is set incorrectly

I am trying to plot a decomposed time series, but running into an error:
Error in decompose(ts_ret) : time series has no or less than 2 periods`.
I am forcing the time series to a fixed period that is higher than 2.
Why does the ts think the period is less than 2?
Shouldn't the period be set automatically based on the time intervals in the data? (which are daily)
rm(list=ls())
library(jsonlite)
library(xts)
item.id<-18
eve.url<-paste0("http://eve-marketdata.com/api/item_history2.json?char_name=demo&region_ids=10000002&type_ids=",item.id,"&days=100")
eve.data<-data.frame(fromJSON(txt=eve.url))$emd.row
eve.data$date<-as.POSIXct(eve.data$date,format="%Y-%m-%d",tz="EST")
xxx<-xts(as.numeric(eve.data[,"avgPrice"]),eve.data$date)
colnames(xxx)<-"trit"
ts_ret<-ts(xxx,frequency=52) #but Im setting the periods here.....
plot(decompose(ts_ret))
As #ufelder pointed out my dataset was too small to look at seasonal decomposition because I only had a few months of data (measured hourly), but not an entire seasons worth (which is 4 months). To fix this I had to modify the period of the dataset to once per day by using ts(xxx,frequency=365) so decompose would compare across days, not seasons.

Time Series for Periods Over One Year

I'm having trouble doing time series for my data set. Most examples have quarterly or monthly frequencies but my issue comes with data that is collect annually or every two years. Consider my code:
data<-data.frame(year=seq(1978,2012,2), number=runif(18,100,500))
time<-ts(data$number, start=1978, frequency=.5)
decomp<-decompose(time)
Error in decompose(time) : time series has no or less than 2 periods
How do I make R recognize time series values from data that is collected over an annual basis? Thanks!
Seasonal decomposition only makes sense with intra-yearly data, because you have seasons within years. So, trying to calculate seasonal effects with decompose on data collected every two years you get the error.

Role of frequency parameter in ts

How does the ts() function use its frequency parameter? What is the effect of assigning wrong values as frequency?
I am trying to use 1.5 years of website usage data to build a time series model so that I can forecast the usage for coming periods. I am using data at daily level. What should be the frequency here - 7 or 365 or 365.25?
The frequency is "the" period at which seasonal cycles repeat. I use "the" in scare quotes since, of course, there are often multiple cycles in time series data. For instance, daily data often exhibit weekly patterns (a frequency of 7) and yearly patterns (a frequency of 365 or 365.25 - the difference often does not matter).
In your case, I would assume that weekly patterns dominate, so I would assign frequency=7. If your data exhibits additional patterns, e.g., holiday effects, you can use specialized methods accounting for multiple seasonalities, or work with dummy coding and a regression-based framework.
Here, the frequency parameter is not a frequency that you can observe in the data of your time series. Instead, you have to specify the frequency at which samples of the time series were taken. In your case, this is simply 1 day, or 1.
The value you give here will influence the results you get later when running analysis operations (examples are average requests per time unit or fourier transformation to get the (real) frequencies in the data). E.g. if you wanted to get all your results in the unit of hours instead of in days, you would pass 24 instead of 1 as frequency, because your data samples were taken in a frequency of 24 hours.

Resources