I need to create a sequence of dates and times in R, increasing in 15 minute periods.
Currently, I am doing this:
datestimes=seq(as.POSIXlt("2011-01-01 00:00:00"), as.POSIXlt("2015-09-30 23:45:00"), by="15 min")
I should have one reading for each time in the year. The problem is that since it is adjusting for BST, I get two values for certain dates in October.
anm <- aggregate(list(count = datestimes), by = list(datestimes), FUN = length)
anm[which(anm$count > 1), ]
                   Group.1 count
28993  2011-10-30 01:00:00     2
28994  2011-10-30 01:15:00     2
28995  2011-10-30 01:30:00     2
28996  2011-10-30 01:45:00     2
63933  2012-10-28 01:00:00     2
63934  2012-10-28 01:15:00     2
63935  2012-10-28 01:30:00     2
63936  2012-10-28 01:45:00     2
98873  2013-10-27 01:00:00     2
98874  2013-10-27 01:15:00     2
98875  2013-10-27 01:30:00     2
98876  2013-10-27 01:45:00     2
133813 2014-10-26 01:00:00     2
133814 2014-10-26 01:15:00     2
133815 2014-10-26 01:30:00     2
133816 2014-10-26 01:45:00     2
I tried using the as.chron function from the chron package, since it does not use time zones, but it will not allow increments of 15 minutes, which is what I need.
The problem is that since it is adjusting for BST, I get two values for certain dates in October.
That's because the 'fall back' (a mnemonic for the daylight saving time adjustment adding an hour in the fall) happens in civil time, and that is what you get by default unless you override it.
R> seq(as.POSIXlt("2012-10-28 00:00:00", tz="UTC"),
+ as.POSIXlt("2012-10-28 03:00:00", tz="UTC"), by="15 min")
[1] "2012-10-28 00:00:00 UTC" "2012-10-28 00:15:00 UTC"
[3] "2012-10-28 00:30:00 UTC" "2012-10-28 00:45:00 UTC"
[5] "2012-10-28 01:00:00 UTC" "2012-10-28 01:15:00 UTC"
[7] "2012-10-28 01:30:00 UTC" "2012-10-28 01:45:00 UTC"
[9] "2012-10-28 02:00:00 UTC" "2012-10-28 02:15:00 UTC"
[11] "2012-10-28 02:30:00 UTC" "2012-10-28 02:45:00 UTC"
[13] "2012-10-28 03:00:00 UTC"
R>
The example I show here covers the same subset as above, but without the fall back, as we now impose UTC as the time zone. And UTC, by construction, has no daylight saving adjustment.
Maybe try this (the UTC time zone should not produce any duplicates):
datestimes=seq(as.POSIXlt("2015-09-01 00:00:00", tz="UTC"),
as.POSIXlt("2015-10-30 23:45:00", tz="UTC"),
by="15 min")
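As a quick sanity check (my sketch, not part of the original answer), anyDuplicated() on a UTC-based sequence should return 0, confirming the fall-back duplicates are gone:

```r
# Build one year of 15-minute timestamps in UTC and confirm no duplicates.
datestimes <- seq(as.POSIXct("2011-01-01 00:00:00", tz = "UTC"),
                  as.POSIXct("2011-12-31 23:45:00", tz = "UTC"),
                  by = "15 min")
anyDuplicated(datestimes)  # 0, i.e. no repeated timestamps
length(datestimes)         # 365 days * 96 readings = 35040
```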
Related
I am trying to import in R a text file including datetimes. Times are stored in character format, without timezone information, but we know it is French time (Europe/Paris).
An issue arises for the days of the time-zone change: e.g. clocks jump from 2018-10-28 03:00:00 CEST back to 2018-10-28 02:00:00 CET, so we have duplicates in our character format, and R cannot tell whether a given string is CEST or CET.
Consider the following example:
data_in <- "date,val
2018-10-28 01:30:00,25
2018-10-28 02:00:00,26
2018-10-28 02:30:00,27
2018-10-28 02:00:00,28
2018-10-28 02:30:00,29
2018-10-28 03:00:00,30"
library(readr)
data <- read_delim(data_in, ",", locale = locale(tz = "Europe/Paris"))
We end up having duplicates in our dates:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CET" "2018-10-28 02:00:00 CEST"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Expected output would be:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST" "2018-10-28 02:00:00 CET"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Any idea how to solve the issue (besides telling people to use UTC or ISO formats)? I guess the only way is to assume the dates are sorted, so we can tell the first occurrences are CEST.
If you are certain that your time is always increasing, then you can look for an apparent decrease in the time of day, manually insert the TZ offset into the string, and then parse as usual. I added some logic to look for this decrease only around 2-3am, so that if you have multiple days of data spanning midnight you would not get a false alarm.
data <- read.csv(text = data_in)
fakedate <- as.POSIXct(gsub("^[-0-9]+ ", "2000-01-01 ", data$date))
# cumany() is from dplyr; the base-R equivalent is cumsum(.) > 0
decreases <- dplyr::cumany(grepl(" 0[23]:", data$date) & c(FALSE, diff(fakedate) < 0))
data$date <- paste(data$date, ifelse(decreases, "+0100", "+0200"))
data
# date val
# 1 2018-10-28 01:30:00 +0200 25
# 2 2018-10-28 02:00:00 +0200 26
# 3 2018-10-28 02:30:00 +0200 27
# 4 2018-10-28 02:00:00 +0100 28
# 5 2018-10-28 02:30:00 +0100 29
# 6 2018-10-28 03:00:00 +0100 30
as.POSIXct(data$date, format="%Y-%m-%d %H:%M:%S %z", tz="Europe/Paris")
# [1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST"
# [4] "2018-10-28 02:00:00 CET" "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
My use of "2000-01-01" was just some non-DST day, so that we can parse the timestamp into POSIXt and calculate a diff on it. (If we didn't insert a date, we could still use as.POSIXct with a format, but if you ever ran this on one of the two DST days, you might get different results, since as.POSIXct("01:02:03", format="%H:%M:%S") always assumes "today".)
This is obviously a bit fragile with its assumptions, but perhaps it'll be good enough for what you need.
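The heart of the trick is the %z offset: the same wall-clock string parses to two different instants depending on the offset appended to it. A minimal self-contained illustration (my addition, not from the original answer):

```r
# 02:30 local time exists twice on 2018-10-28 in Europe/Paris;
# the numeric UTC offset disambiguates which instant is meant.
x <- as.POSIXct(c("2018-10-28 02:30:00 +0200",   # summer-time occurrence
                  "2018-10-28 02:30:00 +0100"),  # winter-time occurrence
                format = "%Y-%m-%d %H:%M:%S %z", tz = "Europe/Paris")
difftime(x[2], x[1], units = "hours")  # the two instants are 1 hour apart
```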
When converting a date object to a POSIXct object, I expected the hours to be zero.
Turns out the hours are either 1 or 2, depending on summer/winter time.
eg:
oct.days <- (as.Date("2018-10-26")+0:5)
as.POSIXct(oct.days)
[1] "2018-10-26 02:00:00 CEST" "2018-10-27 02:00:00 CEST" "2018-10-28 02:00:00 CEST"
[4] "2018-10-29 01:00:00 CET" "2018-10-30 01:00:00 CET" "2018-10-31 01:00:00 CET"
(I'm in Germany, winter time was implemented on Oct 28th at 3 am.)
Rounding it down fixed the issue:
round(as.POSIXct(oct.days), "days")
But I wonder: why does the date object contain extra hours?
Thanks!
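A sketch of what is going on (my reading, since no answer is quoted here): a Date carries no time of day, so as.POSIXct() treats it as midnight UTC and then prints the result in your local time zone; in CEST that midnight displays as 02:00, in CET as 01:00. Parsing the plain dates with an explicit UTC time zone keeps the hours at zero (and recent versions of R changed as.POSIXct(<Date>) itself to return midnight UTC):

```r
oct.days <- as.Date("2018-10-26") + 0:5
# Parse the formatted dates as midnight UTC instead of displaying
# midnight UTC in the local (German) time zone:
as.POSIXct(format(oct.days), tz = "UTC")
```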
I know this question has been asked over and over again. But this time, the problem is a little different.
a<-matrix(c("01-02-2014", "02-02-2014", "03-02-2014",
"04-02-2014","05-02-2014","0 1", "0 2", "0 3", "0 4","0 5"),nrow=5)
a<-data.frame(a)
names(a)<-c("date","time")
a$date<-as.Date(a$date, format="%d-%m-%Y")
So now I get this data frame.
date time
1 2014-02-01 0 1
2 2014-02-02 0 2
3 2014-02-03 0 3
4 2014-02-04 0 4
5 2014-02-05 0 5
As you can see, the time is stored as the hour and minute of the day separated by a space, not in the typical 00:00 form, so R does not recognize it as a time. My question is: how do I transform the time column into a 00:00 format, so I can merge it with the date column to form %Y-%m-%d %H:%M?
We can use sprintf after splitting the 'time' column by space (" ") to get the required format
a$time <- sapply(strsplit(as.character(a$time), " "),
function(x) do.call(sprintf, c(fmt = "%02d:%02d", as.list(as.numeric(x)))))
a$time
#[1] "00:01" "00:02" "00:03" "00:04" "00:05"
Then, paste the columns and convert to POSIXct
as.POSIXct(paste(a$date, a$time))
#[1] "2014-02-01 00:01:00 EST" "2014-02-02 00:02:00 EST"
#[3] "2014-02-03 00:03:00 EST" "2014-02-04 00:04:00 EST"
#[5] "2014-02-05 00:05:00 EST"
Or using lubridate we can directly convert it to POSIXct without formatting the 'time' column
library(lubridate)
ymd_hm(paste(a$date, a$time), tz = "EST")
#[1] "2014-02-01 00:01:00 EST" "2014-02-02 00:02:00 EST"
#[3] "2014-02-03 00:03:00 EST" "2014-02-04 00:04:00 EST"
#[5] "2014-02-05 00:05:00 EST"
You do not need to reformat anything; just use it the way it is:
strptime(do.call(paste,a),"%Y-%m-%d %H %M","UTC")
[1] "2014-02-01 00:01:00 UTC" "2014-02-02 00:02:00 UTC"
[3] "2014-02-03 00:03:00 UTC" "2014-02-04 00:04:00 UTC"
[5] "2014-02-05 00:05:00 UTC"
or just even
strptime(paste(a$date,a$time),"%Y-%m-%d %H %M")
[1] "2014-02-01 00:01:00 PST" "2014-02-02 00:02:00 PST"
[3] "2014-02-03 00:03:00 PST" "2014-02-04 00:04:00 PST"
[5] "2014-02-05 00:05:00 PST"
Time Customer Count
11:00 13
13:00 25
15:00 22
17:00 21
19:00 15
21:00 10
I have the above data frame for the number of customers coming into a small shop from 11:00 (11am) to 21:00 (9pm).
I need to make a time series of this data, but I'm having trouble declaring the ts function for it. There is only data for 10 hours, from 11am to 9pm, and it is only taken every two hours, so I can't decide how to declare the frequency.
If anyone could help, I would be really grateful. Thanks in advance.
One option is to create a sequence on a 2-hour basis and then filter for the time range:
library(lubridate)
v <- seq(from = as.POSIXct("2017-01-01 11:00"),
to = as.POSIXct("2017-01-05 21:00"), by = "2 hour")
v[hour(v)>=9 & hour(v)<=21]
# [1] "2017-01-01 11:00:00 GMT" "2017-01-01 13:00:00 GMT" "2017-01-01 15:00:00 GMT"
# [4] "2017-01-01 17:00:00 GMT" "2017-01-01 19:00:00 GMT" "2017-01-01 21:00:00 GMT"
# [7] "2017-01-02 09:00:00 GMT" "2017-01-02 11:00:00 GMT" "2017-01-02 13:00:00 GMT"
# [10] "2017-01-02 15:00:00 GMT" "2017-01-02 17:00:00 GMT" "2017-01-02 19:00:00 GMT"
# [13] "2017-01-02 21:00:00 GMT" "2017-01-03 09:00:00 GMT" "2017-01-03 11:00:00 GMT"
# [16] "2017-01-03 13:00:00 GMT" "2017-01-03 15:00:00 GMT" "2017-01-03 17:00:00 GMT"
# so on few more rows.
The above seq generates a time series from 1st January till 5th January at a 2-hour interval. A filter condition is then applied, keeping only timestamps for which hour >= 9 and hour <= 21. This provides the desired time series.
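To actually attach the customer counts from the question, one simple option (my sketch in base R, since ts assumes a regular frequency over the whole day) is a data frame keyed by POSIXct timestamps; the date used here is an assumption, as the question gives none:

```r
counts <- c(13, 25, 22, 21, 15, 10)
# one observation every 2 hours from 11:00 to 21:00 on an assumed date
times <- seq(as.POSIXct("2017-01-01 11:00", tz = "UTC"),
             by = "2 hours", length.out = length(counts))
shop <- data.frame(time = times, customers = counts)
shop
```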
I have a straight sequence of time series, for example:
library(lubridate)
start = parse_date_time("2018-01-01","%Y-%m-%d")
end = parse_date_time("2018-01-02","%Y-%m-%d")
series = seq(start,end,by=600)
> series
[1] "2018-01-01 00:00:00 UTC" "2018-01-01 00:10:00 UTC" "2018-01-01 00:20:00 UTC" "2018-01-01 00:30:00 UTC"
[5] "2018-01-01 00:40:00 UTC" "2018-01-01 00:50:00 UTC" "2018-01-01 01:00:00 UTC" "2018-01-01 01:10:00 UTC"
[9] "2018-01-01 01:20:00 UTC" "2018-01-01 01:30:00 UTC" "2018-01-01 01:40:00 UTC" "2018-01-01 01:50:00 UTC"
[13] "2018-01-01 02:00:00 UTC" "2018-01-01 02:10:00 UTC" "2018-01-01 02:20:00 UTC" "2018-01-01 02:30:00 UTC"...
And I also have a data frame of irregular on/off error periods, for example:
error = data.frame(
on = parse_date_time(c("2018-01-01 00:13:57","2018-01-01 01:01:44"),"%Y-%m-%d %H:%M:%S"),
off = parse_date_time(c("2018-01-01 00:21:32","2018-01-01 02:33:45"),"%Y-%m-%d %H:%M:%S")
)
> error
on off
1 2018-01-01 00:13:57 2018-01-01 00:21:32
2 2018-01-01 01:01:44 2018-01-01 02:33:45
How can I flag my series with the error just like below?
> flag
series error
[1] "2018-01-01 00:00:00 UTC" "OK"
[2] "2018-01-01 00:10:00 UTC" "OK"
[3] "2018-01-01 00:20:00 UTC" "ERROR"
[4] "2018-01-01 00:30:00 UTC" "ERROR"
[5] "2018-01-01 00:40:00 UTC" "OK"
[6] "2018-01-01 00:50:00 UTC" "OK"
[7] "2018-01-01 01:00:00 UTC" "OK"
[8] "2018-01-01 01:10:00 UTC" "ERROR"
[9] "2018-01-01 01:20:00 UTC" "ERROR"
[10] "2018-01-01 01:30:00 UTC" "ERROR"
[11] "2018-01-01 01:40:00 UTC" "ERROR"
[12] "2018-01-01 01:50:00 UTC" "ERROR"
[13] "2018-01-01 02:00:00 UTC" "ERROR"
[14] "2018-01-01 02:10:00 UTC" "ERROR"
[15] "2018-01-01 02:20:00 UTC" "ERROR"
[16] "2018-01-01 02:30:00 UTC" "ERROR"
[17] "2018-01-01 02:40:00 UTC" "ERROR"
[18] "2018-01-01 02:50:00 UTC" "OK"
Here is a solution using map_lgl, because lubridate intervals play funny with dplyr functions for me. Note that I use ceiling_date on off to reproduce your desired output, even though it's not obvious to me why the last rows count as ERROR, since, for example, row 4 in the output ("2018-01-01 00:30:00 UTC") is after the first off value ("2018-01-01 00:21:32"). The key parts are the creation of intervals with interval() (or alternatively, on %--% off), and then the use of any() with %within% to return a logical value for whether a given value in the series falls inside one of the error intervals. ifelse lets us convert the logical values into character flags.
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
start = parse_date_time("2018-01-01","%Y-%m-%d")
end = parse_date_time("2018-01-02","%Y-%m-%d")
series = seq(start,end,by=600)
error = data.frame(
on = parse_date_time(c("2018-01-01 00:13:57","2018-01-01 01:01:44"),"%Y-%m-%d %H:%M:%S"),
off = parse_date_time(c("2018-01-01 00:21:32","2018-01-01 02:33:45"),"%Y-%m-%d %H:%M:%S")
) %>%
mutate(
off = ceiling_date(off, unit = "10 minutes"),
intvs = interval(on, off)
)
series %>%
tibble(dttm = .) %>%
bind_cols(status = map_lgl(series, ~ any(. %within% error$intvs))) %>%
mutate(status = ifelse(status == TRUE, "ERROR", "OK")) %>%
print(n = 20)
#> # A tibble: 145 x 2
#> dttm status
#> <dttm> <chr>
#> 1 2018-01-01 00:00:00 OK
#> 2 2018-01-01 00:10:00 OK
#> 3 2018-01-01 00:20:00 ERROR
#> 4 2018-01-01 00:30:00 ERROR
#> 5 2018-01-01 00:40:00 OK
#> 6 2018-01-01 00:50:00 OK
#> 7 2018-01-01 01:00:00 OK
#> 8 2018-01-01 01:10:00 ERROR
#> 9 2018-01-01 01:20:00 ERROR
#> 10 2018-01-01 01:30:00 ERROR
#> 11 2018-01-01 01:40:00 ERROR
#> 12 2018-01-01 01:50:00 ERROR
#> 13 2018-01-01 02:00:00 ERROR
#> 14 2018-01-01 02:10:00 ERROR
#> 15 2018-01-01 02:20:00 ERROR
#> 16 2018-01-01 02:30:00 ERROR
#> 17 2018-01-01 02:40:00 ERROR
#> 18 2018-01-01 02:50:00 OK
#> 19 2018-01-01 03:00:00 OK
#> 20 2018-01-01 03:10:00 OK
#> # ... with 125 more rows
Created on 2018-03-15 by the reprex package (v0.2.0).
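For completeness, the same interval test can be sketched in base R without the tidyverse (lubridate is kept only for the parse_date_time calls already used in the question); rounding off up to the next 10-minute boundary mirrors the ceiling_date step above:

```r
library(lubridate)  # only for parse_date_time(), as in the question

series <- seq(parse_date_time("2018-01-01", "%Y-%m-%d"),
              parse_date_time("2018-01-02", "%Y-%m-%d"), by = 600)
error <- data.frame(
  on  = parse_date_time(c("2018-01-01 00:13:57", "2018-01-01 01:01:44"),
                        "%Y-%m-%d %H:%M:%S"),
  off = parse_date_time(c("2018-01-01 00:21:32", "2018-01-01 02:33:45"),
                        "%Y-%m-%d %H:%M:%S")
)
# round each 'off' up to the next 10-minute boundary (base-R ceiling_date)
off10 <- as.POSIXct(ceiling(as.numeric(error$off) / 600) * 600,
                    origin = "1970-01-01", tz = "UTC")
# a timestamp is flagged ERROR if it falls inside any [on, off10] window
status <- ifelse(vapply(series,
                        function(t) any(t >= error$on & t <= off10),
                        logical(1)),
                 "ERROR", "OK")
head(data.frame(series, status), 6)
```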