Convert two columns into date and time in R

I know this question has been asked over and over again. But this time, the problem is a little different.
a <- matrix(c("01-02-2014", "02-02-2014", "03-02-2014",
              "04-02-2014", "05-02-2014",
              "0 1", "0 2", "0 3", "0 4", "0 5"), nrow = 5)
a <- data.frame(a)
names(a) <- c("date", "time")
a$date <- as.Date(a$date, format = "%d-%m-%Y")
So now I get this data frame.
date time
1 2014-02-01 0 1
2 2014-02-02 0 2
3 2014-02-03 0 3
4 2014-02-04 0 4
5 2014-02-05 0 5
As you can see, the time is the minute of the day, but it is not in the typical 00:00 form, so R doesn't recognize it as a time. My question is: how do I transform the time column into a 00:00 format so I can merge it with the date column to form %Y-%m-%d %H:%M?

We can use sprintf after splitting the 'time' column by space (" ") to get the required format
a$time <- sapply(strsplit(as.character(a$time), " "),
function(x) do.call(sprintf, c(fmt = "%02d:%02d", as.list(as.numeric(x)))))
a$time
#[1] "00:01" "00:02" "00:03" "00:04" "00:05"
Then, paste the columns and convert to POSIXct
as.POSIXct(paste(a$date, a$time))
#[1] "2014-02-01 00:01:00 EST" "2014-02-02 00:02:00 EST"
#[3] "2014-02-03 00:03:00 EST" "2014-02-04 00:04:00 EST"
#[5] "2014-02-05 00:05:00 EST"
Or, using lubridate, we can convert directly to POSIXct without reformatting the 'time' column:
library(lubridate)
ymd_hm(paste(a$date, a$time), tz = "EST")
#[1] "2014-02-01 00:01:00 EST" "2014-02-02 00:02:00 EST"
#[3] "2014-02-03 00:03:00 EST" "2014-02-04 00:04:00 EST"
#[5] "2014-02-05 00:05:00 EST"

You do not need to do anything; just use the data the way it is:
strptime(do.call(paste, a), "%Y-%m-%d %H %M", tz = "UTC")
[1] "2014-02-01 00:01:00 UTC" "2014-02-02 00:02:00 UTC"
[3] "2014-02-03 00:03:00 UTC" "2014-02-04 00:04:00 UTC"
[5] "2014-02-05 00:05:00 UTC"
Or even just:
strptime(paste(a$date, a$time), "%Y-%m-%d %H %M")
[1] "2014-02-01 00:01:00 PST" "2014-02-02 00:02:00 PST"
[3] "2014-02-03 00:03:00 PST" "2014-02-04 00:04:00 PST"
[5] "2014-02-05 00:05:00 PST"

Related

Generate an ordered series of datetimes

I am working in R.
I have to generate a series of dates and times. In particular, I would like to have two data points per day, i.e. each date assigned twice with a different time, for instance:
"2001-05-13 00:00:00"
"2001-05-13 12:00:00"
"2001-05-14 00:00:00"
"2001-05-14 12:00:00"
I found the following code to produce a series of dates:
seq(as.Date("2000/1/1"), as.Date("2003/1/1"), by = 0.5)
Nevertheless, even if I set by = 0.5, the code returns only dates, not datetimes.
Any idea how to produce a series of datetimes?
as.Date produces only dates; use as.POSIXct to produce date-times:
seq(as.POSIXct("2000-01-01 00:00:00", tz = 'UTC'),
as.POSIXct("2003-01-01 00:00:00", tz = 'UTC'), by = '12 hours')
# [1] "2000-01-01 00:00:00 UTC" "2000-01-01 12:00:00 UTC"
# [3] "2000-01-02 00:00:00 UTC" "2000-01-02 12:00:00 UTC"
# [5] "2000-01-03 00:00:00 UTC" "2000-01-03 12:00:00 UTC"
# [7] "2000-01-04 00:00:00 UTC" "2000-01-04 12:00:00 UTC"
# [9] "2000-01-05 00:00:00 UTC" "2000-01-05 12:00:00 UTC"
#[11] "2000-01-06 00:00:00 UTC" "2000-01-06 12:00:00 UTC"
#[13] "2000-01-07 00:00:00 UTC" "2000-01-07 12:00:00 UTC"
#...
#...

Read character datetimes without timezones

I am trying to import into R a text file that includes datetimes. Times are stored in character format, without timezone information, but we know they are French time (Europe/Paris).
An issue arises on the days of the timezone change: e.g. there is a change from 2018-10-28 03:00:00 CEST to 2018-10-28 02:00:00 CET, so we have duplicates in our character format, and R cannot tell whether a given time is CEST or CET.
Consider the following example:
data_in <- "date,val
2018-10-28 01:30:00,25
2018-10-28 02:00:00,26
2018-10-28 02:30:00,27
2018-10-28 02:00:00,28
2018-10-28 02:30:00,29
2018-10-28 03:00:00,30"
library(readr)
data <- read_delim(data_in, ",", locale = locale(tz = "Europe/Paris"))
We end up having duplicates in our dates:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CET" "2018-10-28 02:00:00 CEST"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Expected output would be:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST" "2018-10-28 02:00:00 CET"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Any idea how to solve the issue (besides telling people to use UTC or ISO formats)? I guess the only way is to assume the dates are sorted, so we can tell the first ones are CEST.
If you are certain that your time is always increasing, then you can look for an apparent decrease (of time-of-day), manually insert the TZ offset into the string, and then parse as usual. I added some logic to look for this decrease only around 2-3am, so that if you have multiple days of data spanning midnight you would not get a false alarm.
data <- read.csv(text = data_in)
library(dplyr)  # cumany() comes from dplyr
fakedate <- as.POSIXct(gsub("^[-0-9]+ ", "2000-01-01 ", data$date))
decreases <- cumany(grepl(" 0[23]:", data$date) & c(FALSE, diff(fakedate) < 0))
data$date <- paste(data$date, ifelse(decreases, "+0100", "+0200"))
data
# date val
# 1 2018-10-28 01:30:00 +0200 25
# 2 2018-10-28 02:00:00 +0200 26
# 3 2018-10-28 02:30:00 +0200 27
# 4 2018-10-28 02:00:00 +0100 28
# 5 2018-10-28 02:30:00 +0100 29
# 6 2018-10-28 03:00:00 +0100 30
as.POSIXct(data$date, format="%Y-%m-%d %H:%M:%S %z", tz="Europe/Paris")
# [1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST"
# [4] "2018-10-28 02:00:00 CET" "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
My use of "2000-01-01" was just some non-DST day so that we can parse the timestamp into POSIXt and calculate a diff on it. (If we didn't insert a date, we could still use as.POSIXct with a format, but if you ever ran this on one of the two DST days, you might get different results, since as.POSIXct("01:02:03", format="%H:%M:%S") always assumes "today".)
This is obviously a bit fragile with its assumptions, but perhaps it'll be good enough for what you need.

How can I flag a regular time series with an irregular error signal in R?

I have a straight sequence of time series, for example:
library(lubridate)
start = parse_date_time("2018-01-01","%Y-%m-%d")
end = parse_date_time("2018-01-02","%Y-%m-%d")
series = seq(start,end,by=600)
> series
[1] "2018-01-01 00:00:00 UTC" "2018-01-01 00:10:00 UTC" "2018-01-01 00:20:00 UTC" "2018-01-01 00:30:00 UTC"
[5] "2018-01-01 00:40:00 UTC" "2018-01-01 00:50:00 UTC" "2018-01-01 01:00:00 UTC" "2018-01-01 01:10:00 UTC"
[9] "2018-01-01 01:20:00 UTC" "2018-01-01 01:30:00 UTC" "2018-01-01 01:40:00 UTC" "2018-01-01 01:50:00 UTC"
[13] "2018-01-01 02:00:00 UTC" "2018-01-01 02:10:00 UTC" "2018-01-01 02:20:00 UTC" "2018-01-01 02:30:00 UTC"...
And I also have a data frame of irregular error periods, for example:
error = data.frame(
on = parse_date_time(c("2018-01-01 00:13:57","2018-01-01 01:01:44"),"%Y-%m-%d %H:%M:%S"),
off = parse_date_time(c("2018-01-01 00:21:32","2018-01-01 02:33:45"),"%Y-%m-%d %H:%M:%S")
)
> error
on off
1 2018-01-01 00:13:57 2018-01-01 00:21:32
2 2018-01-01 01:01:44 2018-01-01 02:33:45
How can I flag my series with the error just like below?
> flag
series error
[1] "2018-01-01 00:00:00 UTC" "OK"
[2] "2018-01-01 00:10:00 UTC" "OK"
[3] "2018-01-01 00:20:00 UTC" "ERROR"
[4] "2018-01-01 00:30:00 UTC" "ERROR"
[5] "2018-01-01 00:40:00 UTC" "OK"
[6] "2018-01-01 00:50:00 UTC" "OK"
[7] "2018-01-01 01:00:00 UTC" "OK"
[8] "2018-01-01 01:10:00 UTC" "ERROR"
[9] "2018-01-01 01:20:00 UTC" "ERROR"
[10] "2018-01-01 01:30:00 UTC" "ERROR"
[11] "2018-01-01 01:40:00 UTC" "ERROR"
[12] "2018-01-01 01:50:00 UTC" "ERROR"
[13] "2018-01-01 02:00:00 UTC" "ERROR"
[14] "2018-01-01 02:10:00 UTC" "ERROR"
[15] "2018-01-01 02:20:00 UTC" "ERROR"
[16] "2018-01-01 02:30:00 UTC" "ERROR"
[17] "2018-01-01 02:40:00 UTC" "ERROR"
[18] "2018-01-01 02:50:00 UTC" "OK"
Here is a solution using map_lgl, because lubridate intervals play funny with dplyr functions for me. Note that I use ceiling_date on off to reproduce your desired output, even though it's not obvious to me why "2018-01-01 02:40:00" counts as ERROR, since, for example, row 4 of the output ("2018-01-01 00:30:00 UTC") is also after the first off value ("2018-01-01 00:21:32"). The key parts are simply the creation of intervals with interval() (or, equivalently, on %--% off) and then the use of any() with %within% to return a logical value for whether a given value in the series falls inside one of the error intervals. ifelse lets us convert the logical values into character flags.
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
start = parse_date_time("2018-01-01","%Y-%m-%d")
end = parse_date_time("2018-01-02","%Y-%m-%d")
series = seq(start,end,by=600)
error = data.frame(
on = parse_date_time(c("2018-01-01 00:13:57","2018-01-01 01:01:44"),"%Y-%m-%d %H:%M:%S"),
off = parse_date_time(c("2018-01-01 00:21:32","2018-01-01 02:33:45"),"%Y-%m-%d %H:%M:%S")
) %>%
mutate(
off = ceiling_date(off, unit = "10 minutes"),
intvs = interval(on, off)
)
series %>%
tibble(dttm = .) %>%
bind_cols(status = map_lgl(series, ~ any(. %within% error$intvs))) %>%
mutate(status = ifelse(status == TRUE, "ERROR", "OK")) %>%
print(n = 20)
#> # A tibble: 145 x 2
#> dttm status
#> <dttm> <chr>
#> 1 2018-01-01 00:00:00 OK
#> 2 2018-01-01 00:10:00 OK
#> 3 2018-01-01 00:20:00 ERROR
#> 4 2018-01-01 00:30:00 ERROR
#> 5 2018-01-01 00:40:00 OK
#> 6 2018-01-01 00:50:00 OK
#> 7 2018-01-01 01:00:00 OK
#> 8 2018-01-01 01:10:00 ERROR
#> 9 2018-01-01 01:20:00 ERROR
#> 10 2018-01-01 01:30:00 ERROR
#> 11 2018-01-01 01:40:00 ERROR
#> 12 2018-01-01 01:50:00 ERROR
#> 13 2018-01-01 02:00:00 ERROR
#> 14 2018-01-01 02:10:00 ERROR
#> 15 2018-01-01 02:20:00 ERROR
#> 16 2018-01-01 02:30:00 ERROR
#> 17 2018-01-01 02:40:00 ERROR
#> 18 2018-01-01 02:50:00 OK
#> 19 2018-01-01 03:00:00 OK
#> 20 2018-01-01 03:10:00 OK
#> # ... with 125 more rows
Created on 2018-03-15 by the reprex package (v0.2.0).
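For comparison, the same flagging can be written in base R without purrr, testing each timestamp against every on/off window directly (a sketch; it relies on the ceiling-adjusted off column from the error data frame built above):

```r
# TRUE when a timestamp falls inside any [on, off] error window
in_error <- sapply(series, function(t) any(t >= error$on & t <= error$off))
flag <- data.frame(series = series, error = ifelse(in_error, "ERROR", "OK"))
head(flag, 4)
```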

Create a time series with 30-minute intervals

I am trying to create a time series with 30 min intervals. I used the following command with the output also shown:
ts = seq(as.POSIXct("2009-01-01 00:00"), as.POSIXct("2014-12-31 23:30"),by = "hour")
"2010-02-21 12:00:00 EST" "2010-02-21 13:00:00 EST" "2010-02-21 14:00:00 EST"
When I change it to by = "min", it generates every minute.
How do I create a time series with every 30 minute intervals?
You can specify minutes in the by argument, and pass the time zone "UTC" as Adrian pointed out. Check ?seq.POSIXt for more details about the by argument specified as a character string:
A character string, containing one of "sec", "min", "hour", "day",
"DSTday", "week", "month", "quarter" or "year". This can optionally be
preceded by a (positive or negative) integer and a space, or followed
by "s".
ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
as.POSIXct("2017-01-02", tz = "UTC"),
by = "30 min")
head(ts)
Output
[1] "2017-01-01 00:00:00 UTC"
[2] "2017-01-01 00:30:00 UTC"
[3] "2017-01-01 01:00:00 UTC"
[4] "2017-01-01 01:30:00 UTC"
[5] "2017-01-01 02:00:00 UTC"
[6] "2017-01-01 02:30:00 UTC"
The default unit is seconds, so just use 1800 seconds to get 30 minutes:
ts = seq(as.POSIXct("2009-01-01 00:00"), as.POSIXct("2014-12-31 23:30"),by = 1800)
ts[1:20]
[1] "2009-01-01 00:00:00 EST" "2009-01-01 00:30:00 EST" "2009-01-01 01:00:00 EST" "2009-01-01 01:30:00 EST" "2009-01-01 02:00:00 EST"
[6] "2009-01-01 02:30:00 EST" "2009-01-01 03:00:00 EST" "2009-01-01 03:30:00 EST" "2009-01-01 04:00:00 EST" "2009-01-01 04:30:00 EST"
[11] "2009-01-01 05:00:00 EST" "2009-01-01 05:30:00 EST" "2009-01-01 06:00:00 EST" "2009-01-01 06:30:00 EST" "2009-01-01 07:00:00 EST"
[16] "2009-01-01 07:30:00 EST" "2009-01-01 08:00:00 EST" "2009-01-01 08:30:00 EST" "2009-01-01 09:00:00 EST" "2009-01-01 09:30:00 EST"

Create sequence of dates and times in R without time zones

I need to create a sequence of dates and times in R, increasing in 15 minute periods.
Currently, I am doing this:
datestimes=seq(as.POSIXlt("2011-01-01 00:00:00"), as.POSIXlt("2015-09-30 23:45:00"), by="15 min")
I should have one reading for each time in the year. The problem is that since it is adjusting for BST, I get two values for certain dates in October.
anm=aggregate(datestimes, by=list(datestimes$datestimes), FUN=length)
anm[which(anm$datestimes>1),]
Group.1 datestimes X.Date.
28993 2011-10-30 01:00:00 2 2
28994 2011-10-30 01:15:00 2 2
28995 2011-10-30 01:30:00 2 2
28996 2011-10-30 01:45:00 2 2
63933 2012-10-28 01:00:00 2 2
63934 2012-10-28 01:15:00 2 2
63935 2012-10-28 01:30:00 2 2
63936 2012-10-28 01:45:00 2 2
98873 2013-10-27 01:00:00 2 2
98874 2013-10-27 01:15:00 2 2
98875 2013-10-27 01:30:00 2 2
98876 2013-10-27 01:45:00 2 2
133813 2014-10-26 01:00:00 2 2
133814 2014-10-26 01:15:00 2 2
133815 2014-10-26 01:30:00 2 2
133816 2014-10-26 01:45:00 2 2
I tried using the as.chron command, since this does not use timezones, but it will not allow increments of 15 minutes, which is what I need.
The problem is that since it is adjusting for BST, I get two values for certain dates in October.
That's because the 'fall back' (the daylight saving time adjustment that adds an hour in the autumn) happens in civil time, and that is what you get by default unless you override it.
R> seq(as.POSIXlt("2012-10-28 00:00:00", tz="UTC"),
+ as.POSIXlt("2012-10-28 03:00:00", tz="UTC"), by="15 min")
[1] "2012-10-28 00:00:00 UTC" "2012-10-28 00:15:00 UTC"
[3] "2012-10-28 00:30:00 UTC" "2012-10-28 00:45:00 UTC"
[5] "2012-10-28 01:00:00 UTC" "2012-10-28 01:15:00 UTC"
[7] "2012-10-28 01:30:00 UTC" "2012-10-28 01:45:00 UTC"
[9] "2012-10-28 02:00:00 UTC" "2012-10-28 02:15:00 UTC"
[11] "2012-10-28 02:30:00 UTC" "2012-10-28 02:45:00 UTC"
[13] "2012-10-28 03:00:00 UTC"
R>
The example shown here covers the same subset as above, but without the fall back, as we now impose UTC as the time zone. And UTC, by construction, has no daylight saving adjustment.
Maybe try this (the UTC time zone should not produce any duplicates):
datestimes=seq(as.POSIXlt("2015-09-01 00:00:00", tz="UTC"),
as.POSIXlt("2015-10-30 23:45:00", tz="UTC"),
by="15 min")
