I'm creating an xts object with a weekly (7 day) frequency to use in forecasting. However, even when using the frequency=7 argument in the xts call, the resulting xts object has a frequency of 1.
Here's an example with random data:
> values <- rnorm(364, 10)
> days <- seq.Date(from=as.Date("2014-01-01"), to=as.Date("2014-12-30"), by='days')
> x <- xts(values, order.by=days, frequency=7)
> frequency(x)
[1] 1
I have also tried, after using the above code, frequency(x) <- 7. However, this changes the class of x to only zooreg and zoo, losing the xts class and messing with the time stamp formats.
Does xts automatically choose a frequency based on analyzing the data in some way? If so, how can you override this to set a specific frequency for forecasting purposes (in this case, passing a seasonal time series to ets from the forecast package)?
I understand that xts may not allow frequencies that don't make sense, but a frequency of 7 with daily time stamps seems pretty logical.
Consecutive Date class dates always have a frequency of 1 since consecutive dates are 1 apart. Use ts or zooreg to get a frequency of 7:
tt <- ts(values, frequency = 7)
library(zoo)
zr <- as.zooreg(tt)
# or
zr <- zooreg(values, frequency = 7)
These will create a series whose times are 1, 1+1/7, 1+2/7, ...
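A quick check of the first few index values confirms this:
head(time(zr), 3)
## [1] 1.000000 1.142857 1.285714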
If we have some index values of zr
zrdates <- index(zr)[5:12]
we can recover the dates from zrdates like this:
days[match(zrdates, index(zr))]
As pointed out in the comments, xts does not support this type of series.
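For the forecasting goal mentioned in the question, the ts version above can be passed to the forecast package directly; a minimal sketch (assuming the forecast package is installed):
library(forecast)
fit <- ets(tt)               # tt has frequency 7, so ets can pick up weekly seasonality
fc <- forecast(fit, h = 14)  # forecast two weeks ahead
plot(fc)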
My objective is to impute NAs in a zooreg time series object. The pattern of the time series is cyclic. My code is:
# load required libraries
library("zoo")
# create a sequence every 15 minutes from 1st Jan to 20th Jan, 2018
timeStamp <- seq.POSIXt(from=as.POSIXct('2018-01-01 00:00:00', tz="UTC"), to=as.POSIXct('2018-01-20 23:45:00', tz="UTC"), by = "15 min")
# data which increases from 12am to 12pm, then decreases till 12 am of next day, for 20 days
readings <- rep(c(seq(1,48,1), seq(48,1,-1)), 20)
dF <- data.frame(timeStamp=timeStamp, readings=readings)
# create a regular zooreg object; frequency is one day (4 readings/hour * 24 hours = 96)
readingsZooReg <- zooreg(dF$readings, order.by = dF$timeStamp, frequency = 4*24)
plot(readingsZooReg)
# force some data to be NAs
window(readingsZooReg, start = as.POSIXct("2018-01-14 00:00:00", tz="UTC"), end = as.POSIXct("2018-01-16 23:45:00", tz="UTC")) <- NA
plot(readingsZooReg)
# plot imputed values
plot(na.approx(readingsZooReg))
The plots (not shown here) are: the full time series, the series with NAs added, and the imputed time series.
I'm purposely using zoo here, since the time series I work on are irregular (e.g. solar, oil wells, etc.).
1) Is my usage of "zooreg" correct? Or would a "zoo" object suffice?
2) Is my frequency variable right?
3) Why won't na.approx work? I've also tried na.StructTS; the R script hangs.
4) Is there a solution using any other package? xts, ts, etc?
Your current example time series is a regular time series.
(An irregular time series would have time steps with different time distances between observations.)
E.g.:
10:00:10, 10:00:20, 10:00:30, 10:00:40, 10:00:50 (regular spaced)
10:00:10, 10:00:17, 10:00:33, 10:00:37, 10:00:50 (irregular spaced)
If you really need to handle irregularly spaced time series, zoo is your go-to package. Otherwise you can also use other time series classes such as xts and ts.
About the frequency:
You usually set the frequency of a time series to the number of observations after which you expect the pattern to repeat (in your example this could be 96). In real life this period is often 1 day, 1 week, or 1 month, but it can also be something else, like 1.5 days. (E.g. if you have daily recurring patterns and 1-minute observations, you would set the frequency to 1440.)
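For instance, the frequency for your 15-minute data with a daily pattern works out like this:
readings_per_hour <- 60 / 15    # 4 readings per hour
readings_per_hour * 24          # 96 readings per day -> frequency = 96
60 * 24                         # 1440 for 1-minute data with a daily pattern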
na.approx from zoo works perfectly; it is doing exactly what it is expected to do. A linear interpolation between the value just before the gap and the value just after it (both at the low point of the daily cycle here) gives an essentially flat line. Of course that is probably not the result you expected, because it does not account for seasonality. That is why G. Grothendieck suggests na.StructTS as the method to choose (it is usually better at accounting for seasonality).
The best choice, if you are not bound to zoo, would in this specific case be na_seadec from the imputeTS package (a package solely dedicated to time series imputation).
I have added an example, with nice plots, from the imputeTS package:
library(imputeTS)
yourTS <- ts(coredata(readingsZooReg), frequency = 96)  # 96 quarter-hour readings per day
ggplot_na_distribution(yourTS)                          # show where the NAs are
imputedTS <- na_seadec(yourTS)                          # seasonally decomposed imputation
ggplot_na_imputations(yourTS, imputedTS)                # compare imputed vs. known values
Usually imputeTS also works perfectly with zoo time series as input. I only converted it to ts here because something about your zoo object seems odd... that is also why na.StructTS from zoo itself breaks. Maybe somebody with better knowledge can help out here.
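If you need the POSIXct index back afterwards, the imputed values can be reattached to the original time stamps (a small sketch reusing the objects from above):
imputedZoo <- zoo(as.numeric(imputedTS), order.by = time(readingsZooReg))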
Beware: if you really do have irregular time series, do not use imputation functions from packages other than zoo, because they all assume the data to be regularly spaced and will give results accordingly.
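To illustrate that last point: zoo's na.approx interpolates along the actual (possibly irregular) time axis. A minimal sketch with made-up irregular timestamps:
library(zoo)
irr_times <- as.POSIXct("2018-01-01 10:00:00", tz = "UTC") + c(10, 17, 33, 37, 50)
z_irr <- zoo(c(1, NA, NA, 4, 5), order.by = irr_times)
na.approx(z_irr)  # interpolated values reflect the unequal time gaps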
What is the meaning of the frequency shown below? When I converted my xts object to a ts object and printed the ts object, I got the information below. My data is hourly data, but I could not understand how this frequency is calculated. I want to make sure my ts object is treating my data as hourly data.
Time Series:
Start = 1
End = 15548401
Frequency = 0.000277777777777778 (how is this equivalent to an hourly frequency?)
So, my dataframe initially looks like this:
y
1484337600 19.22819
1484341200 19.28906
1484344800 19.28228
1484348400 19.21669
1484352000 19.32759
1484355600 19.21833
1484359200 19.20626
1484362800 19.28737
1484366400 19.20651
1484370000 19.18424
It has epoch times and values. Epoch times are row.names in this dataframe.
Now, I converted it into an xts object using:
xts_dataframe <- xts(x = dataframe$y,
                     order.by = as.POSIXct(as.numeric(row.names(dataframe)), origin = "1970-01-01"))
ts_dataframe <- as.ts(xts_dataframe)
Please suggest what I'm doing wrong. Basically I want to convert my initial dataframe to a ts() object because I need to apply ARIMA to it. This is hourly data, and I'm having a hard time working with it.
The frequency is equivalent to 1/deltat, where deltat is the fraction of the sampling period between successive observations. ?frequency gives the example that deltat would be "1/12 for monthly data".
In the case of this hourly data, deltat is 3600: the xts index is POSIXct, which as.ts treats as seconds, so successive observations are 3600 time units apart. Since frequency = 1/deltat, that means frequency = 1/3600, or 0.0002777778.
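If the goal is for the ts object to treat the data as hourly with a daily seasonal period (e.g. for a seasonal ARIMA), the frequency has to be set explicitly when the ts is built; a sketch, assuming a 24-observation daily cycle is what you want the model to see:
ts_hourly <- ts(as.numeric(coredata(xts_dataframe)), frequency = 24)  # 24 hourly observations per period
frequency(ts_hourly)
## [1] 24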
I need to process five years of weekly data. I used the following command to create a time series from that:
my.ts <- ts(x[,3], start = c(2009,12), freq=52)
When plotting the series it looks good. However, the time points of the observations are stored as:
time(my.ts)
# Time Series:
# Start = c(2009, 12)
# End = c(2014, 26)
# Frequency = 52
# [1] 2009.212 2009.231 2009.250 2009.269 2009.288 2009.308 2009.327 ...
I expected to see proper dates instead (which should be aligned with a Calendar). What shall I do?
That is how the "ts" class works.
The zoo package can represent time series with dates (and other indexes):
library(zoo)
z <- zooreg(1:3, start = as.Date("2009-12-01"), deltat = 7)
giving:
> z
2009-12-01 2009-12-08 2009-12-15
1 2 3
> time(z)
[1] "2009-12-01" "2009-12-08" "2009-12-15"
The xts package and a number of other packages can also represent time series with dates although they do it by converting to POSIXct internally whereas zoo maintains the original class.
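Applied to the question's weekly series, the same idea works once the calendar date of the first observation is known (the date below is only a hypothetical placeholder):
my.z <- zooreg(as.numeric(my.ts), start = as.Date("2009-03-16"), deltat = 7)  # replace with the real start date
head(time(my.z))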
I have a two variable dataframe (df) in R of daily sales for a ten year period from 2004-07-09 through 2014-12-31. Not every single date is represented in the ten year period, but pretty much most days Monday through Friday.
My objective is to aggregate sales by quarter, convert to a time series object, and run a seasonal decomposition and other time series forecasting.
I am having trouble with the conversion, as ultimately I receive an error:
time series has no or less than 2 periods
Here's the structure of my code.
# create a time series object
library(xts)
x <- xts(df$amount, df$date)
# create a time series object aggregated by quarter
q.x <- apply.quarterly(x, sum)
When I try to run
fit <- stl(q.x, s.window = "periodic")
I get the error message
series is not periodic or has less than two periods
When I try to run
q.x.components <- decompose(q.x)
# or
decompose(x)
I get the error message
time series has no or less than 2 periods
So, how do I take my original dataframe, with a date variable and an amount variable (sales), aggregate that quarterly as a time series object, and then run a time series analysis?
I think I was able to answer my own question. I did this. Can anyone confirm if this structure makes sense?
library(lubridate)
# add a new variable indicating the calendar year.quarter (i.e. 2004.3) of each observation
df$year.quarter <- quarter(df$date, with_year = TRUE)
library(plyr)
# summarize gift amount by year.quarter
new.data <- ddply(df, .(year.quarter), summarize,
                  sum = round(sum(amount), 2))
# convert the new data to a quarterly time series object beginning
# in July 2004 (2004, Q3) and ending in December 2014 (2014, Q4)
nd.ts <- ts(new.data$sum, start = c(2004,3), end = c(2014,4), frequency = 4)
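With frequency = 4, the quarterly series has a proper seasonal period, so the decompositions from the original objective should now run; a quick check:
fit <- stl(nd.ts, s.window = "periodic")  # seasonal decomposition by loess
plot(fit)
plot(decompose(nd.ts))                    # classical decomposition also works now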
I have a chunk of data logging temperatures from a few dozen devices every hour for over a year. The data are stored as a zoo object. I'd very much like to summarize those data by looking at the average values for every one of the 24 hours in a day (1am, 2am, 3am, etc.). So that for each device I can see what its average value is for all the 1am times, 2am times, and so on. I can do this with a loop but sense that there must be a way to do this in zoo with an artful use of aggregate.zoo. Any help?
require(zoo)
# random hourly data over 30 days for five series
x <- matrix(rnorm(24 * 30 * 5),ncol=5)
# Assign hourly data with a real time and date
x.DateTime <- as.POSIXct("2014-01-01 01:00", format = "%Y-%m-%d %H:%M") +
  seq(0, (24 * 30 - 1) * 60 * 60, by = 3600)  # 720 hourly time stamps, matching the 720 rows of x
# make a zoo object
x.zoo <- zoo(x, x.DateTime)
#plot(x.zoo)
# what I want:
# the average value for each series at 1am, 2am, 3am, etc. so that
# the dimensions of the output are 24 (hours) by 5 (series)
# If I were just working on x I might do something like:
res <- matrix(NA,ncol=5,nrow=24)
for(i in 1:nrow(res)){
res[i,] <- apply(x[seq(i,nrow(x),by=24),],2,mean)
}
res
# how can I avoid the loop and write an aggregate statement in zoo that
# will get me what I want?
Calculate the hour for each time point and then aggregate by that:
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)
dim(ag)
## [1] 24 5
ADDED
Alternatively, use hours from chron or hour from data.table (a sketch of the latter follows the chron code):
library(chron)
ag <- aggregate(x.zoo, hours, mean)
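The data.table variant looks much the same (a sketch; it assumes data.table is attached so its hour() function is found):
library(data.table)
ag <- aggregate(x.zoo, hour, mean)  # data.table::hour extracts the hour of day from the POSIXct index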
This is quite similar to the other answer but takes advantage of the fact that the by=... argument to aggregate.zoo(...) can be a function, which will be applied to time(x.zoo):
as.hour <- function(t) as.numeric(format(t,"%H"))
result <- aggregate(x.zoo,as.hour,mean)
identical(result,ag) # ag from G. Grothendieck answer
# [1] TRUE
Note that this produces a result identical to the other answer, but not the same as yours. This is because your dataset starts at 1:00am, not midnight, so your loop produces a matrix wherein the 1st row corresponds to 1:00am and the last row corresponds to midnight. These solutions produce zoo objects wherein the first row corresponds to midnight.
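As a quick consistency check of that offset, row 2 of the aggregated object (hour 1, i.e. 1am) should match row 1 of the loop's result:
all.equal(as.numeric(coredata(ag)[2, ]), res[1, ])  # expected to be TRUE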