Decompose xts hourly time series - r

I want to decompose an hourly time series with decompose, ets, stl, or any other suitable function. Here is some example code and its output:
require(xts)
require(forecast)
time_index1 <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by="hour")
head(time_index1 <- format(time_index1, format="%Y-%m-%d %H:%M:%S",
tz="UTC", usetz=TRUE))
# [1] "2012-05-15 05:00:00 UTC" "2012-05-15 06:00:00 UTC"
# [3] "2012-05-15 07:00:00 UTC" "2012-05-15 08:00:00 UTC"
# [5] "2012-05-15 09:00:00 UTC" "2012-05-15 10:00:00 UTC"
head(time_index <- as.POSIXct(time_index1))
# [1] "2012-05-15 05:00:00 CEST" "2012-05-15 06:00:00 CEST"
# [3] "2012-05-15 07:00:00 CEST" "2012-05-15 08:00:00 CEST"
# [5] "2012-05-15 09:00:00 CEST" "2012-05-15 10:00:00 CEST"
Why does the timezone for time_index change back to CEST?
set.seed(1)
value <- rnorm(n = length(time_index1))
eventdata1 <- xts(value, order.by = time_index)
tzone(eventdata1)
# [1] ""
head(index(eventdata1))
# [1] "2012-05-15 05:00:00 CEST" "2012-05-15 06:00:00 CEST"
# [3] "2012-05-15 07:00:00 CEST" "2012-05-15 08:00:00 CEST"
# [5] "2012-05-15 09:00:00 CEST" "2012-05-15 10:00:00 CEST"
ets(eventdata1)
# ETS(A,N,N)
#
# Call:
# ets(y = eventdata1)
#
# Smoothing parameters:
# alpha = 1e-04
#
# Initial states:
# l = 0.1077
#
# sigma: 0.8481
#
# AIC AICc BIC
# 229.8835 230.0940 234.0722
decompose(eventdata1)
# Error in decompose(eventdata1) :
# time series has no or less than 2 periods
stl(eventdata1)
# Error in stl(eventdata1) :
# series is not periodic or has less than two periods
When I call tzone or indexTZ there is no timezone, but the index clearly shows that the times are defined with a timezone.
Also, why does only ets work? Can it be used to decompose a time series?

Why does the timezone for time_index change back to CEST?
Because you didn't specify tz= in your call to as.POSIXct. It will only pick up the timezone from the string if it's specified by offset from UTC (e.g. -0800). See ?strptime.
R> head(time_index <- as.POSIXct(time_index1, "UTC"))
[1] "2012-05-15 12:00:00 UTC" "2012-05-15 13:00:00 UTC"
[3] "2012-05-15 14:00:00 UTC" "2012-05-15 15:00:00 UTC"
[5] "2012-05-15 16:00:00 UTC" "2012-05-15 17:00:00 UTC"
When I call tzone or indexTZ there is no timezone, but the index clearly shows that the times are defined with a timezone.
All POSIXct objects have a timezone. A timezone of "" simply means R wasn't able to determine a specific timezone, so it is using the timezone specified by your operating system. See ?timezone.
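A minimal base R sketch of the two points above, using a timestamp string of the same shape as in the question. The variable names are illustrative only. Without tz=, the result carries an empty "tzone" attribute and prints in the system timezone; with tz = "UTC" the attribute is set explicitly.

```r
s <- "2012-05-15 05:00:00"

# No tz= given: the "tzone" attribute is "" and printing uses the system timezone
t_local <- as.POSIXct(s)
attr(t_local, "tzone")

# tz= given: the string is interpreted as UTC and the attribute records that
t_utc <- as.POSIXct(s, tz = "UTC")
attr(t_utc, "tzone")
format(t_utc, usetz = TRUE)
```

Passing the same tz-aware vector to xts(..., order.by = ) should then give an index that reports "UTC" rather than "".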
Only the ets function works because your xts object doesn't have a properly defined frequency attribute, which decompose and stl need in order to know the length of one seasonal period. This is a known limitation of xts objects, and I have plans to address it over the next several months. You can work around the current issue by explicitly specifying the frequency attribute after calling the xts constructor.
R> set.seed(1)
R> value <- rnorm(n = length(time_index1))
R> eventdata1 <- xts(value, order.by = time_index)
R> attr(eventdata1, 'frequency') <- 24 # set frequency attribute
R> decompose(as.ts(eventdata1)) # decompose expects a 'ts' object
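For comparison, here is a sketch of the same idea without xts, assuming it is acceptable to drop the timestamps and keep only the values. A plain ts object with frequency = 24 treats one day as one seasonal period, and the 60 hourly observations cover 2.5 periods, which is enough for both decompose and stl. Note that stl also needs an s.window argument.

```r
set.seed(1)
# Same hourly range as in the question, built in UTC to avoid DST surprises
time_index <- seq(from = as.POSIXct("2012-05-15 07:00", tz = "UTC"),
                  to   = as.POSIXct("2012-05-17 18:00", tz = "UTC"),
                  by   = "hour")
value <- rnorm(length(time_index))

# frequency = 24: 24 observations make up one seasonal cycle (one day)
hourly <- ts(value, frequency = 24)

parts <- decompose(hourly)                  # classical decomposition
fit   <- stl(hourly, s.window = "periodic") # loess-based decomposition
colnames(fit$time.series)
```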

You can use tbats to decompose hourly data:
require(forecast)
set.seed(1)
time_index1 <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by="hour")
value <- rnorm(n = length(time_index1))
eventdata1 <- msts(value, seasonal.periods = c(24) )
seasonaldecomp <- tbats(eventdata1)
plot(seasonaldecomp)
Additionally, using msts instead of xts allows you to specify multiple seasons/cycles, for instance a daily as well as a weekly cycle in hourly data: c(24, 24*7)

Related

Floor datetime with custom start time (lubridate)

Is there a way to floor dates using a custom start time instead of the earliest possible time?
For example, flooring hours in a day into 2 12-hour intervals starting at 8am and 8pm rather than 12am and 12pm.
Example:
x <- ymd_hms("2009-08-03 21:00:00")
y <- ymd_hms("2009-08-03 09:00:00")
floor_date(x, '12 hours')
floor_date(y, '12 hours')
# default lubridate output:
[1] "2009-08-03 12:00:00 UTC"
[1] "2009-08-03 UTC"
# what I would like to have:
[1] "2009-08-03 20:00:00 UTC"
[1] "2009-08-03 08:00:00 UTC"
You could program a small switch (without lubridate, though).
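FUN <- function(x) {
h <- as.numeric(substr(x, 12, 13)) # hour of day
# findInterval gives 0 before 08:00, 1 from 08:00 to 19:59 and 2 from 20:00 on
s <- switch(findInterval(h, c(8, 20)) + 1, "20:00:00", "08:00:00", "20:00:00")
# before 08:00 the current interval started at 20:00 on the previous day
as.POSIXct(paste(as.Date(x) - as.integer(h < 8), s))
}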
FUN("2009-08-03 21:00:00")
# [1] "2009-08-03 20:00:00 CEST"
FUN("2009-08-03 09:00:00")
# [1] "2009-08-03 08:00:00 CEST"
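A vectorised alternative is to do the flooring on the underlying seconds: shift the times back by the offset, floor to the period, and shift forward again. The sketch below is base R only; floor_with_offset is a hypothetical helper, not a lubridate function, and it assumes UTC (or any zone without DST), because the arithmetic is done on seconds since the epoch.

```r
# Hypothetical helper: floor to `period_hours` blocks that start
# `offset_hours` after midnight. Assumes times are in UTC.
floor_with_offset <- function(x, period_hours = 12, offset_hours = 8) {
  period <- period_hours * 3600
  offset <- offset_hours * 3600
  secs <- as.numeric(x)
  # shift back, floor to a multiple of the period, shift forward again
  floored <- ((secs - offset) %/% period) * period + offset
  as.POSIXct(floored, origin = "1970-01-01", tz = "UTC")
}

x <- as.POSIXct(c("2009-08-03 21:00:00",
                  "2009-08-03 09:00:00",
                  "2009-08-03 07:00:00"), tz = "UTC")
res <- floor_with_offset(x)
format(res, "%Y-%m-%d %H:%M:%S")
```

The third value shows the case before 08:00, which floors to 20:00 on the previous day.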

Generate random times in sample of POSIXct

I want to generate a load of POSIXct dates. I want to have the time component only between 9am and 5pm and only at 15 minute blocks. I know how to generate the random POSIXct between certain dates but how do I specify the minute blocks and the time range. This is where I am at:
sample(seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="day"), 1000)
Just change the by argument to 15mins:
sample(seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="15 mins"), 1000)
EDIT:
I overlooked that the time component should be between 9am and 5pm. To take this into account I would filter the sequence:
library(lubridate)
possible_dates <- seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="15 mins")
possible_dates <- possible_dates[hour(possible_dates) < 17 & hour(possible_dates) >=9]
sample(possible_dates, 1000)
As @AEF also pointed out, you can use the argument by to create the sequence in steps of 15 minutes.
x <- seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="15 mins")
You then can use lubridate::hour() like this to extract the values from the sequence and create the sample:
library(lubridate)
# compare the hour as a number; comparing it with strings such as "09:00"
# coerces hour(x) to character and lets times like 01:00 slip through
sample(x[hour(x) >= 9 & hour(x) < 17], 1000)
OK so I used this in the end:
ApptDate <- sample(seq(as.Date('2013/01/01'), as.Date('2017/05/01'), by="day"), 1000)
# hours 9 to 16 so that the latest slot is 16:45; sprintf zero-pads to HH:MM
Time <- sprintf("%02d:%02d", sample(9:16, 1000, replace=TRUE), sample(seq(0, 45, by=15), 1000, replace=TRUE))
FinalPOSIXDate <- as.POSIXct(paste(ApptDate, Time))
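The same idea can be sketched without building the full 15-minute sequence: sample a day and a slot independently and combine them. This is a base R sketch under a few assumptions that are not in the original posts: sampling is done with replacement, the window is taken as 09:00 up to and including 16:45, and the result is built in UTC. All variable names are illustrative.

```r
set.seed(42)
n <- 1000

days  <- seq(as.Date("2013-01-01"), as.Date("2017-05-01"), by = "day")
slots <- seq(9 * 60, 17 * 60 - 15, by = 15)  # minutes after midnight: 09:00 .. 16:45

d <- sample(days,  n, replace = TRUE)
m <- sample(slots, n, replace = TRUE)

# Build "YYYY-MM-DD HH:MM:00" strings and parse them in one go
appt <- as.POSIXct(paste(d, sprintf("%02d:%02d:00", m %/% 60, m %% 60)), tz = "UTC")
head(appt)
```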

Generate a uniformly sampled time series object in R

Hi I am looking to generate a uniformly sampled time series at 30 minute interval from a particular start date to some end date. However the constraint is that on each day the 30 minute interval begins at 7:00 and ends at 18:30 i.e. I need the time series object to be something like
c('2016-08-19 07:00:00',
'2016-08-19 07:30:00',
...,
'2016-08-19 18:30:00',
'2016-08-20 07:00:00',
...,
'2016-08-20 18:30:00',
...
'2016-08-31 18:30:00')
Without the constraints it can be done with something like
seq(as.POSIXct('2016-08-19 07:00:00'), as.POSIXct('2016-08-21 18:30:00'), by="30 min")
But I don't want the times between '2016-08-20 18:30:00' and '2016-08-21 07:30:00' in this case. Any help will be appreciated. Thanks!
Using the example series you created:
ts <- seq(as.POSIXct('2016-08-19 07:00:00'),
as.POSIXct('2016-08-21 18:30:00'), by="30 min")
Pull out the hours from your series using strftime:
hours <- strftime(ts, format="%H:%M:%S")
> head(hours)
[1] "07:00:00" "07:30:00"
[3] "08:00:00" "08:30:00"
[5] "09:00:00" "09:30:00"
You can then convert it back to POSIXct:
hours <- as.POSIXct(hours, format="%H:%M:%S")
This will retain the times of the day but it will make the date today's date:
> head(hours)
[1] "2016-09-11 07:00:00 EDT"
[2] "2016-09-11 07:30:00 EDT"
[3] "2016-09-11 08:00:00 EDT"
[4] "2016-09-11 08:30:00 EDT"
[5] "2016-09-11 09:00:00 EDT"
[6] "2016-09-11 09:30:00 EDT"
> tail(hours)
[1] "2016-09-11 16:00:00 EDT"
[2] "2016-09-11 16:30:00 EDT"
[3] "2016-09-11 17:00:00 EDT"
[4] "2016-09-11 17:30:00 EDT"
[5] "2016-09-11 18:00:00 EDT"
[6] "2016-09-11 18:30:00 EDT"
You can then create a TRUE/FALSE vector based on the condition you want:
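# use the same (today's) date on both sides, and >= / <= so that 07:00 and 18:30 are kept
condition <- hours >= as.POSIXct("07:00:00", format="%H:%M:%S") &
hours <= as.POSIXct("18:30:00", format="%H:%M:%S")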
Then filter your original series based on this condition:
ts[condition]
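A shorter variant of the same filter, sketched in base R: because zero-padded "HH:MM" strings sort in clock order, the time of day can be compared as text without converting back to POSIXct. The series is built in UTC here, which is an assumption made to keep the example independent of the local timezone.

```r
x <- seq(as.POSIXct("2016-08-19 07:00:00", tz = "UTC"),
         as.POSIXct("2016-08-21 18:30:00", tz = "UTC"), by = "30 min")

# Zero-padded clock times compare correctly as strings
hhmm <- format(x, "%H:%M")
keep <- x[hhmm >= "07:00" & hhmm <= "18:30"]

length(keep)  # 24 half-hour slots per day over 3 days
```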
Here is my short and handy solution with package lubridate
library("lubridate")
list <- lapply(0:2, function(x){
temp <- ymd_hms('2016-08-19 07:00:00') + days(x)
result <- temp + minutes(seq(0, 690, 30))
return(strftime(result))
})
do.call("c", list)
I have to use strftime(result) to remove the timezone and to have the right times.

Sum half-hourly values to 6-hourly using zApply in Raster package R

library(raster)
library(zoo)
library(xts)
I have a large RasterBrick object (RB) with 144 layers. Each layer is separated from the next by a 30-minute time step. The start time for accumulation is 00:00:00–00:30:00 UTC. Please use UTC and not CST.
date30mins <- seq(as.POSIXct("2015-04-01 00:00:00", tz="UTC"), as.POSIXct("2015-04-03 23:59:59", tz="UTC"), by="30 mins") # length(date30mins) = 144
i.e. date30mins
"2015-04-01 00:00:00 UTC"
"2015-04-01 00:30:00 UTC"
"2015-04-01 01:00:00 UTC"
"2015-04-01 01:30:00 UTC"
"2015-04-01 02:00:00 UTC"
"2015-04-01 02:30:00 UTC"
"2015-04-01 03:00:00 UTC"
"2015-04-01 03:30:00 UTC"
"2015-04-01 04:00:00 UTC"
"2015-04-01 04:30:00 UTC"
"2015-04-01 05:00:00 UTC"
"2015-04-01 05:30:00 UTC" and so on...
becomes: dates6hourly
(i.e. sum values from 00:00:00 UTC to 05:30:00 UTC;06:00:00 UTC to 11:30:00 UTC)
dates6hourly <- seq(as.POSIXct("2015-04-01 00:00:00", tz="UTC"), as.POSIXct("2015-04-03 23:59:59", tz="UTC"), by="6 hours")
"2015-04-01 00:00:00 UTC"
"2015-04-01 06:00:00 UTC" and so on
How can one implement this using zApply?
My data set is too large to make a reproducible example via dput but sample data can be found here sample data set
EDIT
The following should accomplish the task but I get an error:
index=rep(seq(1,12,by=1),each=12)
ras <- setZ(RB,date30mins)
6hrly <- zApply(ras,by=index, FUN=sum);
Error: unexpected symbol in "6hrly"
Probably zApply does not recognize "index"?
With the rts package this can easily be done using:
rasrts=rts(x=RB,time=date30mins)
agg <- period.apply(rasrts,index,sum)#rts
But I would like to implement using the raster package.
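I got it working with the stackApply function from the raster package. The earlier error was not caused by index: R variable names cannot start with a digit, so 6hrly is a syntax error, and the result is named sixhrly below.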
library(raster)
library(zoo)
library(xts)
index=rep(seq(1,12,by=1),each=12)
ras <- setZ(RB,date30mins)
sixhrly <- stackApply(ras,indices=index, fun=sum);
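The grouping index can also be derived from the timestamps instead of being typed by hand. The sketch below uses base R only and a small plain matrix as a stand-in for the cell values of RB (one row per cell, one column per layer); the matrix and all names in it are hypothetical. With a real RasterBrick, the resulting index vector would be passed to stackApply as above.

```r
date30mins <- seq(as.POSIXct("2015-04-01 00:00:00", tz = "UTC"),
                  as.POSIXct("2015-04-03 23:30:00", tz = "UTC"), by = "30 min")

# Hours elapsed since the first layer, cut into 6-hour blocks numbered from 1
elapsed_h <- as.numeric(difftime(date30mins, date30mins[1], units = "hours"))
index <- elapsed_h %/% 6 + 1

# Stand-in data: 4 cells x 144 layers, every value 1
vals <- matrix(1, nrow = 4, ncol = length(date30mins))

# rowsum() sums rows by group, so transpose to put the layers in rows
sixhourly <- t(rowsum(t(vals), group = index))
dim(sixhourly)
```

Each 6-hour block contains 12 half-hourly layers, so every cell of the stand-in result is 12.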

lubridate parses dates one day off

When I put a single date to be parsed, it parses accurately
> ymd("20011001")
[1] "2001-10-01 UTC"
But when I try to create a vector of dates they all come out one day off:
> b=c(ymd("20111001"),ymd("20101001"),ymd("20091001"),ymd("20081001"),ymd("20071001"),ymd("20061001"),ymd("20051001"),ymd("20041001"),ymd("20031001"),ymd("20021001"),ymd("20011001"))
> b
[1] "2011-09-30 19:00:00 CDT" "2010-09-30 19:00:00 CDT" "2009-09-30 19:00:00 CDT"
[4] "2008-09-30 19:00:00 CDT" "2007-09-30 19:00:00 CDT" "2006-09-30 19:00:00 CDT"
[7] "2005-09-30 19:00:00 CDT" "2004-09-30 19:00:00 CDT" "2003-09-30 19:00:00 CDT"
[10] "2002-09-30 19:00:00 CDT" "2001-09-30 19:00:00 CDT"
how can I fix this??? Many thanks.
I don't claim to understand exactly what's going on here, but the proximal problem is that c() strips attributes, so using c() on a POSIX[c?]t vector drops the time zone attribute and the result is displayed in the time zone specified by your locale instead of UTC, messing it up (even if you set the time zone to agree with the one specified by your locale). On my system:
library(lubridate)
(y1 <- ymd("20011001"))
## [1] "2001-10-01 UTC"
(y2 <- ymd("20011002"))
c(y1,y2)
## now in EDT (and a day earlier/4 hours before UTC):
## [1] "2001-09-30 20:00:00 EDT" "2001-10-01 20:00:00 EDT"
(y12 <- ymd(c("20011001","20011002")))
## [1] "2001-10-01 UTC" "2001-10-02 UTC"
c(y12)
## back in EDT
## [1] "2001-09-30 20:00:00 EDT" "2001-10-01 20:00:00 EDT"
You can set the time zone explicitly ...
(y3 <- ymd("20011001",tz="EDT"))
## [1] "2001-10-01 EDT"
But c() is still problematic.
(y3c <- c(y3))
## [1] "2001-09-30 20:00:00 EDT"
So two solutions are:
1. convert a character vector rather than combining the objects after converting them one by one, or
2. restore the tzone attribute after combining.
For example:
attr(y3c,"tzone") <- attr(y3,"tzone")
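The second option can be sketched in base R without lubridate. The values and names below are illustrative; the point is only that setting "tzone" after c() changes how the same instants are printed, not the instants themselves.

```r
y1 <- as.POSIXct("2001-10-01", tz = "UTC")
y2 <- as.POSIXct("2001-10-02", tz = "UTC")

combined <- c(y1, y2)

# Put the time zone back explicitly after combining
attr(combined, "tzone") <- "UTC"
format(combined, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
```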
@Joran points out that this is almost certainly a general property of applying c() to POSIX[c?]t objects, not specifically lubridate-related. I hope someone will chime in and explain whether this is a well-known design decision/infelicity/misfeature.
Update: there is some discussion of this on R-help in 2012, and Brian Ripley comments:
But in any case, the documentation (?c.POSIXct) is clear:
Using ‘c’ on ‘"POSIXlt"’ objects converts them to the current time
zone, and on ‘"POSIXct"’ objects drops any ‘"tzone"’ attributes
(even if they are all marked with the same time zone).
So the recommended way is to add a "tzone" attribute if you know what
you want it to be. POSIXct objects are absolute times: the timezone
merely affects how they are converted (including to character for
printing).
It might be nice if lubridate added a method to do this ...
