Floor datetime with custom start time (lubridate) - r

Is there a way to floor dates using a custom start time instead of the earliest possible time?
For example, flooring hours in a day into 2 12-hour intervals starting at 8am and 8pm rather than 12am and 12pm.
Example:
x <- ymd_hms("2009-08-03 21:00:00")
y <- ymd_hms("2009-08-03 09:00:00")
floor_date(x, '12 hours')
floor_date(y, '12 hours')
# default lubridate output:
[1] "2009-08-03 12:00:00 UTC"
[1] "2009-08-03 UTC"
# what I would like to have:
[1] "2009-08-03 20:00:00 UTC"
[1] "2009-08-03 08:00:00 UTC"

You could program a small switch (without lubridate, though).
FUN <- function(x) {
s <- switch(which.min(abs(mapply(`-`, c(8, 20), as.numeric(substr(x, 12, 13))))),
"08:00:00", "20:00:00")
as.POSIXct(paste(as.Date(x), s))
}
FUN("2009-08-03 21:00:00")
# [1] "2009-08-03 20:00:00 CEST"
FUN("2009-08-03 09:00:00")
# [1] "2009-08-03 08:00:00 CEST"
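The switch above picks whichever of 08:00 and 20:00 is nearest on the same calendar day, so a time such as 07:00 maps forward to 08:00 rather than back to 20:00 on the previous evening. A minimal base R sketch of actual flooring follows, assuming the timestamps are in UTC (so daylight saving shifts play no role). The helper name `floor_custom` and its parameters are hypothetical. The idea is to shift the time back by the offset, floor to a multiple of the bin width, and shift forward again.

```r
# Hypothetical helper: floor to 12-hour bins that start at 08:00 and 20:00 UTC.
floor_custom <- function(x, width_hours = 12, start_hour = 8) {
  width  <- width_hours * 3600          # bin width in seconds
  offset <- start_hour * 3600           # bin start relative to midnight UTC
  secs   <- as.numeric(x) - offset      # shift so the bins align with the epoch
  as.POSIXct(floor(secs / width) * width + offset,
             origin = "1970-01-01", tz = "UTC")
}

x <- as.POSIXct(c("2009-08-03 21:00:00", "2009-08-03 09:00:00",
                  "2009-08-03 07:00:00"), tz = "UTC")
floored <- floor_custom(x)
format(floored, "%Y-%m-%d %H:%M:%S")
# "2009-08-03 20:00:00" "2009-08-03 08:00:00" "2009-08-02 20:00:00"
```

With lubridate loaded, the same shift-floor-shift idea can probably be written as floor_date(x - hours(8), '12 hours') + hours(8).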

Related

Generate random times in sample of POSIXct

I want to generate a load of POSIXct dates. I want to have the time component only between 9am and 5pm and only at 15 minute blocks. I know how to generate the random POSIXct between certain dates but how do I specify the minute blocks and the time range. This is where I am at:
sample(seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="day"), 1000)
Just change the by argument to "15 mins":
sample(seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="15 mins"), 1000)
EDIT:
I overlooked that the time component should be between 9am and 5pm. To take this into account I would filter the sequence:
library(lubridate)
possible_dates <- seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="15 mins")
possible_dates <- possible_dates[hour(possible_dates) < 17 & hour(possible_dates) >=9]
sample(possible_dates, 1000)
As @AEF also pointed out, you can use the argument by to create the sequence in steps of 15 minutes.
x <- seq(as.POSIXct('2013/01/01'), as.POSIXct('2017/05/01'), by="15 mins")
You then can use lubridate::hour() like this to extract the values from the sequence and create the sample:
library(lubridate)
sample(x[hour(x) >= 9 & hour(x) < 17], 1000)
# hour() returns numbers, so compare against 9 and 17; comparing against strings such as "09:00"
# falls back to text comparison and lets times like 01:00 through.
# The result is a random sample, for example:
# [1] "2015-06-28 12:45:00 CEST" "2014-05-04 10:15:00 CEST" "2015-06-22 12:30:00 CEST"
# .....
OK, so I used this in the end:
ApptDate <- sample(seq(as.Date('2013/01/01'), as.Date('2017/05/01'), by="day"), 1000)
# Hours 9:16 so that slots up to 16:45 are possible; sprintf() zero-pads the minutes
Time <- sprintf("%d:%02d", sample(9:16, 1000, replace=TRUE), sample(seq(0, 45, by=15), 1000, replace=TRUE))
FinalPOSIXDate <- as.POSIXct(paste(ApptDate, Time))
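The approaches above can also be sketched in base R without building the full 15-minute sequence: sample a day and a quarter-hour slot independently, then add them up in seconds. This assumes that 17:00 itself is excluded (so the last slot is 16:45) and that UTC is acceptable; the variable names are illustrative only.

```r
set.seed(42)  # only so the sketch is reproducible
n <- 1000

# Sample calendar days, then a 15-minute slot within 09:00-16:45 for each.
days  <- sample(seq(as.Date("2013-01-01"), as.Date("2017-05-01"), by = "day"),
                n, replace = TRUE)
slots <- sample(0:31, n, replace = TRUE)      # 32 quarter-hours starting at 09:00

# Midnight UTC of each day plus 9 hours plus the slot offset, all in seconds.
appt <- as.POSIXct(as.numeric(days) * 86400 + 9 * 3600 + slots * 900,
                   origin = "1970-01-01", tz = "UTC")

range(as.integer(format(appt, "%H")))   # hours stay within 9..16
```

Sampling with replace = TRUE allows repeated appointment times; drop it if every day should appear at most once.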

Generate a uniformly sampled time series object in R

Hi I am looking to generate a uniformly sampled time series at 30 minute interval from a particular start date to some end date. However the constraint is that on each day the 30 minute interval begins at 7:00 and ends at 18:30 i.e. I need the time series object to be something like
c('2016-08-19 07:00:00',
'2016-08-19 07:30:00',
...,
'2016-08-19 18:30:00',
'2016-08-20 07:00:00',
...,
'2016-08-20 18:30:00',
...
'2016-08-31 18:30:00')
Without the constraints it can be done with something like
seq(as.POSIXct('2016-08-19 07:00:00'), as.POSIXct('2016-08-21 18:30:00'), by="30 min")
But I don't want the times between '2016-08-20 18:30:00' and '2016-08-21 07:00:00' in this case. Any help will be appreciated. Thanks!
Using the example series you created:
ts <- seq(as.POSIXct('2016-08-19 07:00:00'),
as.POSIXct('2016-08-21 18:30:00'), by="30 min")
Pull out the hours from your series using strftime:
hours <- strftime(ts, format="%H:%M:%S")
> head(hours)
[1] "07:00:00" "07:30:00"
[3] "08:00:00" "08:30:00"
[5] "09:00:00" "09:30:00"
You can then convert it back to POSIXct:
hours <- as.POSIXct(hours, format="%H:%M:%S")
This will retain the times of the day but it will make the date today's date:
> head(hours)
[1] "2016-09-11 07:00:00 EDT"
[2] "2016-09-11 07:30:00 EDT"
[3] "2016-09-11 08:00:00 EDT"
[4] "2016-09-11 08:30:00 EDT"
[5] "2016-09-11 09:00:00 EDT"
[6] "2016-09-11 09:30:00 EDT"
> tail(hours)
[1] "2016-09-11 16:00:00 EDT"
[2] "2016-09-11 16:30:00 EDT"
[3] "2016-09-11 17:00:00 EDT"
[4] "2016-09-11 17:30:00 EDT"
[5] "2016-09-11 18:00:00 EDT"
[6] "2016-09-11 18:30:00 EDT"
You can then create a TRUE/FALSE vector based on the condition you want:
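# Inclusive bounds so that 07:00 and 18:30 are kept; parsing a bare time uses today's date, matching hours
condition <- hours >= as.POSIXct("07:00:00", format="%H:%M:%S") &
hours <= as.POSIXct("18:30:00", format="%H:%M:%S")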
Then filter your original series based on this condition:
ts[condition]
Here is my short and handy solution with package lubridate
library("lubridate")
list <- lapply(0:2, function(x){
temp <- ymd_hms('2016-08-19 07:00:00') + days(x)
result <- temp + minutes(seq(0, 690, 30))
return(strftime(result))
})
do.call("c", list)
I have to use strftime(result) to remove the timezone and to have the right times.
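The same day-by-day construction can be sketched in base R by combining every day with every half-hour offset. This is only a sketch under the assumption that UTC is acceptable; in a local zone with daylight saving, adding raw seconds to midnight would need more care.

```r
days    <- seq(as.Date("2016-08-19"), as.Date("2016-08-21"), by = "day")
offsets <- seq(7 * 3600, 18.5 * 3600, by = 1800)   # 07:00 .. 18:30 in seconds

# Every (offset, day) pair; the first column varies fastest, so rows run day by day.
grid   <- expand.grid(offset = offsets, day = as.numeric(days))
series <- as.POSIXct(grid$day * 86400 + grid$offset,
                     origin = "1970-01-01", tz = "UTC")

length(series)                                  # 24 slots per day * 3 days = 72
format(series[c(1, 24, 25)], "%Y-%m-%d %H:%M")
# "2016-08-19 07:00" "2016-08-19 18:30" "2016-08-20 07:00"
```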

Decompose xts hourly time series

I want to decompose hourly time series with decompose, ets, or stl or whatever function. Here is an example code and its output:
require(xts)
require(forecast)
time_index1 <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by="hour")
head(time_index1 <- format(time_index1, format="%Y-%m-%d %H:%M:%S",
tz="UTC", usetz=TRUE))
# [1] "2012-05-15 05:00:00 UTC" "2012-05-15 06:00:00 UTC"
# [3] "2012-05-15 07:00:00 UTC" "2012-05-15 08:00:00 UTC"
# [5] "2012-05-15 09:00:00 UTC" "2012-05-15 10:00:00 UTC"
head(time_index <- as.POSIXct(time_index1))
# [1] "2012-05-15 05:00:00 CEST" "2012-05-15 06:00:00 CEST"
# [3] "2012-05-15 07:00:00 CEST" "2012-05-15 08:00:00 CEST"
# [5] "2012-05-15 09:00:00 CEST" "2012-05-15 10:00:00 CEST"
Why does the timezone for time_index change back to CEST?
set.seed(1)
value <- rnorm(n = length(time_index1))
eventdata1 <- xts(value, order.by = time_index)
tzone(eventdata1)
# [1] ""
head(index(eventdata1))
# [1] "2012-05-15 05:00:00 CEST" "2012-05-15 06:00:00 CEST"
# [3] "2012-05-15 07:00:00 CEST" "2012-05-15 08:00:00 CEST"
# [5] "2012-05-15 09:00:00 CEST" "2012-05-15 10:00:00 CEST"
ets(eventdata1)
# ETS(A,N,N)
#
# Call:
# ets(y = eventdata1)
#
# Smoothing parameters:
# alpha = 1e-04
#
# Initial states:
# l = 0.1077
#
# sigma: 0.8481
#
# AIC AICc BIC
# 229.8835 230.0940 234.0722
decompose(eventdata1)
# Error in decompose(eventdata1) :
# time series has no or less than 2 periods
stl(eventdata1)
# Error in stl(eventdata1) :
# series is not periodic or has less than two periods
When I call tzone or indexTZ there is no timezone but the index clearly show that the times are defined with a timezone.
Also, why does only ets work? Can it be used to decompose a time series?
Why does the timezone for time_index change back to CEST?
Because you didn't specify tz= in your call to as.POSIXct. It will only pick up the timezone from the string if it's specified by offset from UTC (e.g. -0800). See ?strptime.
R> head(time_index <- as.POSIXct(time_index1, "UTC"))
[1] "2012-05-15 12:00:00 UTC" "2012-05-15 13:00:00 UTC"
[3] "2012-05-15 14:00:00 UTC" "2012-05-15 15:00:00 UTC"
[5] "2012-05-15 16:00:00 UTC" "2012-05-15 17:00:00 UTC"
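A small sketch of the point above: the trailing "UTC" in the text is not parsed, so the tz argument alone decides how the clock time is interpreted. "Europe/Berlin" is an assumed zone chosen for illustration because the question's output shows CEST.

```r
s <- "2012-05-15 05:00:00 UTC"

# Same text, two interpretations of the wall-clock time.
as_utc    <- as.POSIXct(s, tz = "UTC")
as_berlin <- as.POSIXct(s, tz = "Europe/Berlin")   # assumed zone for illustration

format(as_utc, "%H:%M", tz = "UTC")        # "05:00"
format(as_berlin, "%H:%M", tz = "UTC")     # "03:00"
as.numeric(difftime(as_utc, as_berlin, units = "hours"))   # 2
```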
When I call tzone or indexTZ there is no timezone but the index clearly show that the times are defined with a timezone.
All POSIXct objects have a timezone. A timezone of "" simply means R wasn't able to determine a specific timezone, so it is using the timezone specified by your operating system. See ?timezone.
Only the ets function works because your xts object doesn't have a properly defined frequency attribute. This is a known limitation of xts objects, and I have plans to address them over the next several months. You can work around the current issues by explicitly specifying the frequency attribute after calling the xts constructor.
R> set.seed(1)
R> value <- rnorm(n = length(time_index1))
R> eventdata1 <- xts(value, order.by = time_index)
R> attr(eventdata1, 'frequency') <- 24 # set frequency attribute
R> decompose(as.ts(eventdata1)) # decompose expects a 'ts' object
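The role of the frequency can also be seen without xts. The following base R sketch builds a plain ts object with 24 observations per cycle from 60 simulated hourly values (the same length as the series in the question); the variable names are illustrative. Note that stl() additionally requires the s.window argument, which the call in the question omits.

```r
set.seed(1)
value <- rnorm(60)                         # 60 hourly observations

hourly <- ts(value, frequency = 24)        # 24 observations per daily cycle

parts <- decompose(hourly)                 # classical additive decomposition
length(parts$figure)                       # one seasonal index per hour: 24

fit <- stl(hourly, s.window = "periodic")  # stl() also needs s.window
colnames(fit$time.series)                  # "seasonal" "trend" "remainder"
```

With only two and a half daily cycles of random noise the components carry no real meaning; the sketch only shows that both functions run once the series has a frequency and covers at least two full periods.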
You can use tbats to decompose hourly data:
require(forecast)
set.seed(1)
time_index1 <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by="hour")
value <- rnorm(n = length(time_index1))
eventdata1 <- msts(value, seasonal.periods = c(24) )
seasonaldecomp <- tbats(eventdata1)
plot(seasonaldecomp)
Additionally, using msts instead of xts allows you to specify multiple seasons/cycles, for instance daily as well as weekly cycles in hourly data: c(24, 24*7)

How to add the time to a date when using as.date?

I have measurements that were taken at this time: 13880 and they represent "days since 1970-01-01 00:00:00"
So now I want to know the date and time:
as.Date(13880, origin="1970-01-01")
[1] "2008-01-02" # works fine
Now to add the time:
as.Date(13880, origin="1970-01-01",tz = "UTC", format="%Y/%m/%d %H:%M:%S")
[1] NA
or
as.POSIXct(13880, origin="1970-01-01")
[1] "1970-01-01 04:51:20 CET"
as.POSIXlt(13879, origin="1970-01-01")
[1] "1970-01-01 04:51:19 CET"
None of these worked for me. Any idea?
as.POSIXct(as.Date("1970-01-01") + 13880) # returns "2008-01-01 19:00:00 EST"
as.POSIXct(as.Date("1970-01-01") + 13880.5) # returns "2008-01-02 07:00:00 EST"
You can also set your time zone:
How to change the default time zone in R?
also: http://blog.revolutionanalytics.com/2009/06/converting-time-zones.html
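The attempts in the question return times on 1970-01-01 because as.POSIXct() counts seconds from the origin, whereas the measurements are in days. A minimal sketch of the conversion, assuming the result should be shown in UTC:

```r
days_since_epoch <- c(13880, 13880.5)

# Convert days to seconds before handing the numbers to as.POSIXct().
stamps <- as.POSIXct(days_since_epoch * 86400, origin = "1970-01-01", tz = "UTC")
format(stamps, "%Y-%m-%d %H:%M:%S")
# "2008-01-02 00:00:00" "2008-01-02 12:00:00"
```

Passing tz explicitly avoids the shift to the local zone seen in the EST output above.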

How to create a range of dates in R

From two integers (1, 5) one can create a range in the following way
1:5
[1] 1 2 3 4 5
How can you make a range of dates if you are given two dates ("2014-09-04 JST", "2014-09-11 JST")?
The output must be
[1] ("2014-09-04 JST", "2014-09-05 JST", "2014-09-06 JST", "2014-09-07 JST", "2014-09-08 JST")
Does this help?
seq(as.Date("2014/09/04"), by = "day", length.out = 5)
# [1] "2014-09-04" "2014-09-05" "2014-09-06" "2014-09-07" "2014-09-08"
edit: adding in something about timezones
this works for my current timezone
seq(c(ISOdate(2014,4,9)), by = "DSTday", length.out = 5)
#[1] "2014-04-09 08:00:00 EDT" "2014-04-10 08:00:00 EDT" "2014-04-11 08:00:00 EDT" "2014-04-12 08:00:00 EDT"
#[5] "2014-04-13 08:00:00 EDT"
edit2:
OlsonNames() # I used this to find out what to write for the JST tz - it's "Japan"
x <- as.POSIXct("2014-09-04 23:59:59", tz="Japan")
format(seq(x, by="day", length.out=5), "%Y-%m-%d %Z")
# [1] "2014-09-04 JST" "2014-09-05 JST" "2014-09-06 JST" "2014-09-07 JST" "2014-09-08 JST"
To get a sequence of dates ( days, weeks,.. ) using only start and end dates you can use:
seq(as.Date("2014/1/1"), as.Date("2014/1/10"), "days")
[1] "2014-01-01" "2014-01-02" "2014-01-03" "2014-01-04" "2014-01-05" "2014-01-06" "2014-01-07"
[8] "2014-01-08" "2014-01-09" "2014-01-10"
Here's an answer, admittedly worse than @jalapic's, that doesn't use seq and instead uses a for loop:
date1 <- "2014-09-04"
date2 <- "2014-09-11"
dif <- as.numeric(abs(as.Date(date1) - as.Date(date2)))
dates <- as.Date(character(0)) # start from an empty Date vector so append() keeps the Date class
for (i in seq_len(dif)) {
date <- (as.Date(date1) + i)
dates <- append(dates, date)
}
dates
# [1] "2014-09-05" "2014-09-06" "2014-09-07" "2014-09-08" "2014-09-09" "2014-09-10" "2014-09-11"
here's a shot though the timezone JST isn't recognized by my system
d1<-ISOdate(year=2014,month=9,day=4,tz="GMT")
seq(from=d1,by="day",length.out=5)
[1] "2014-09-04 12:00:00 GMT" "2014-09-05 12:00:00 GMT" "2014-09-06 12:00:00 GMT" "2014-09-07 12:00:00 GMT" "2014-09-08 12:00:00 GMT"
While using seq(date1, date2, "days") is by far the better option in nearly all cases, I'd just like to add, that the following works too, if you need a range of dates that are n_number of days from a date:
1:10 + as.Date("2020-01-01")
# [1] "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05"
# [5] "2020-01-06" "2020-01-07" "2020-01-08" "2020-01-09"
# [9] "2020-01-10" "2020-01-11"
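For the original question, where both endpoints are given and carry a Japanese timezone, a sketch that combines the ideas above might look like this. It assumes the tz database name "Asia/Tokyo" is available on the system (check OlsonNames() if unsure).

```r
start <- as.POSIXct("2014-09-04", tz = "Asia/Tokyo")
end   <- as.POSIXct("2014-09-11", tz = "Asia/Tokyo")

rng <- seq(start, end, by = "day")
length(rng)                               # 8 days, both endpoints included
format(rng[c(1, 8)], "%Y-%m-%d")
# "2014-09-04" "2014-09-11"
```

Adding %Z to the format string appends the zone abbreviation, as in the edit2 example above.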
