Handling of switch to daylight saving time with POSIXct in R - r

I have hourly data associated with time stamps of the following format.
xx <- c("2019-03-30 12:00", "2019-03-30 13:00", "2019-03-30 14:00", "2019-03-30 15:00", "2019-03-30 16:00", "2019-03-30 17:00", "2019-03-30 18:00", "2019-03-30 19:00", "2019-03-30 20:00", "2019-03-30 21:00", "2019-03-30 22:00", "2019-03-30 23:00", "2019-03-31 00:00", "2019-03-31 01:00", "2019-03-31 02:00","2019-03-31 03:00", "2019-03-31 04:00", "2019-03-31 05:00", "2019-03-31 06:00", "2019-03-31 07:00", "2019-03-31 08:00", "2019-03-31 09:00", "2019-03-31 10:00", "2019-03-31 11:00", "2019-03-31 12:00")
If I convert this to POSIXct, I get a format stripped of the hours:
> as.POSIXct(xx)
[1] "2019-03-30 CET" "2019-03-30 CET" "2019-03-30 CET"
[4] "2019-03-30 CET" "2019-03-30 CET" "2019-03-30 CET"
[7] "2019-03-30 CET" "2019-03-30 CET" "2019-03-30 CET"
[10] "2019-03-30 CET" "2019-03-30 CET" "2019-03-30 CET"
[13] "2019-03-31 CET" "2019-03-31 CET" "2019-03-31 CET"
[16] "2019-03-31 CET" "2019-03-31 CET" "2019-03-31 CET"
[19] "2019-03-31 CET" "2019-03-31 CET" "2019-03-31 CET"
[22] "2019-03-31 CET" "2019-03-31 CET" "2019-03-31 CET"
[25] "2019-03-31 CET"
But I need to retain the hourly timestamp. However if I execute as.POSIXct() with the correct formatting option, I get the following problem:
> as.POSIXct(xx, format = "%Y-%m-%d %H:%M")
[1] "2019-03-30 12:00:00 CET" "2019-03-30 13:00:00 CET"
[3] "2019-03-30 14:00:00 CET" "2019-03-30 15:00:00 CET"
[5] "2019-03-30 16:00:00 CET" "2019-03-30 17:00:00 CET"
[7] "2019-03-30 18:00:00 CET" "2019-03-30 19:00:00 CET"
[9] "2019-03-30 20:00:00 CET" "2019-03-30 21:00:00 CET"
[11] "2019-03-30 22:00:00 CET" "2019-03-30 23:00:00 CET"
[13] "2019-03-31 00:00:00 CET" "2019-03-31 01:00:00 CET"
[15] NA "2019-03-31 03:00:00 CEST"
[17] "2019-03-31 04:00:00 CEST" "2019-03-31 05:00:00 CEST"
[19] "2019-03-31 06:00:00 CEST" "2019-03-31 07:00:00 CEST"
[21] "2019-03-31 08:00:00 CEST" "2019-03-31 09:00:00 CEST"
[23] "2019-03-31 10:00:00 CEST" "2019-03-31 11:00:00 CEST"
[25] "2019-03-31 12:00:00 CEST"
Apparently POSIXct cannot handle the switch to daylight saving time? What's going on here?
I know I can solve this by using lubridate's ymd_hm(), but I am asking in order to understand what is happening here. Is it possible to solve this in base R, or does as.POSIXct have a basic dysfunction here?
Thanks.
EDIT: SOLUTION
Thanks to zoowalk and Roland in the comments for this solution:
My time series was recorded without daylight saving switches, but my OS time zone does observe them throughout the year. Accordingly, I need to pass the function a time zone that likewise has no switches, such as UTC:
as.POSIXct(xx, format = "%Y-%m-%d %H:%M", tz="UTC")
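To see why a zone without transitions helps, here is a minimal sketch, assuming the series was logged on a clock that never switches. On 2019-03-31 the CET/CEST wall clock jumps from 02:00 straight to 03:00, so "2019-03-31 02:00" names no instant in that zone, which is consistent with the single NA above. The "Etc/GMT-1" variant is my own addition and only applies if the logger ran on permanent CET (UTC+1); the sign in that zone name follows the POSIX convention and is inverted.

```r
# Three timestamps straddling the 2019 spring-forward gap in Central Europe
xx <- c("2019-03-31 01:00", "2019-03-31 02:00", "2019-03-31 03:00")
fmt <- "%Y-%m-%d %H:%M"

# UTC has no daylight saving transitions, so every wall-clock time exists
utc <- as.POSIXct(xx, format = fmt, tz = "UTC")
anyNA(utc)                               # check for unparseable entries
as.numeric(diff(utc), units = "hours")   # spacing between consecutive stamps

# Assumption: the data were recorded on permanent CET (UTC+1, no switch).
# "Etc/GMT-1" is the Olson name for a fixed UTC+1 offset.
fixed <- as.POSIXct(xx, format = fmt, tz = "Etc/GMT-1")

# Same strings, different instants: the two readings are one hour apart
as.numeric(difftime(utc, fixed, units = "hours"))
```

Which zone is right depends on how the data were recorded; UTC keeps the hourly spacing intact either way, but only the matching fixed-offset zone places the values at the correct absolute instants.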


Related

Convert character to datetime object produces NAs

I have this character vector but when I try to convert it to datetime object I get NAs
timess<-c("11.01.2021 04:40", "03.01.2021 05:45", "30.12.2020 02:28",
"02.01.2021 08:13", "03.01.2021 05:45", "04.01.2021 04:33", "03.01.2021 05:45",
"02.01.2021 08:13", "03.01.2021 05:45", "02.01.2021 08:13")
timess<-as.POSIXct(timess, format="%d.%m.%y %H:%M:%S")
Paste the seconds onto the strings, and take good care with the format string (%Y for a four-digit year, not %y); see ?strptime.
as.POSIXct(paste0(timess, ':00'), format="%d.%m.%Y %H:%M:%S")
# [1] "2021-01-11 04:40:00 CET" "2021-01-03 05:45:00 CET"
# [3] "2020-12-30 02:28:00 CET" "2021-01-02 08:13:00 CET"
# [5] "2021-01-03 05:45:00 CET" "2021-01-04 04:33:00 CET"
# [7] "2021-01-03 05:45:00 CET" "2021-01-02 08:13:00 CET"
# [9] "2021-01-03 05:45:00 CET" "2021-01-02 08:13:00 CET"
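A variant of the same fix, sketched in base R: instead of pasting ":00" onto each string, the format can simply leave out %S. The two things that matter are %Y (four-digit year) rather than %y (two-digit year), and a format that matches the fields actually present in the input. The tz = "UTC" argument is my assumption, added only so the result does not depend on the session's time zone.

```r
# Two of the strings from the question (day.month.year hour:minute)
timess <- c("11.01.2021 04:40", "30.12.2020 02:28")

# Original attempt: %y reads only "20" of the year, and %S has nothing to match
bad <- as.POSIXct(timess, format = "%d.%m.%y %H:%M:%S", tz = "UTC")
is.na(bad)

# Format that mirrors the input: four-digit year, no seconds
good <- as.POSIXct(timess, format = "%d.%m.%Y %H:%M", tz = "UTC")
format(good, "%Y-%m-%d %H:%M")
```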

Using lubridate to get dates/times from continuous increments

I am trying to use lubridate to process the results of a differential equation solved using ode. My simulation begins on a certain date (01-01-2021) and is on the order of days (a one unit-time increase is equal to a one day calendar time increase). How can I use lubridate to process a continuous double of time since simulation start?
For example, I want to go from the left column to the right column:
ODE time
Calendar Time
0.0
01-01-2021 00:00
0.5
01-01-2021 12:00
1.0
01-02-2021 00:00
etc...
Thank you
I am not fully sure I understand your question, but from your example it appears you want to create time steps. If I understand correctly, one unit adds 24 hours and half a unit adds 12 hours. Your example suggests you want to have this in a dataframe/tibble.
With {lubridate} you can coerce character strings to date-time stamps; it has some handy parsing functions for this.
For example
# create dataframe/tibble of ODE and Calendar times
library(tibble)   # tribble()
library(dplyr)    # mutate() and %>%
mydata <- tribble(
~ODE_time, ~Calendar_Time
,0.0 , "01-01-2021 00:00"
,0.5 , "01-01-2021 12:00"
,1.0 , "01-02-2021 00:00"
,1.5 , "01-02-2021 12:00"
)
# parse the month-day-year strings into date-times
mydata <- mydata %>%
mutate(time = lubridate::mdy_hm(Calendar_Time))
Here I use the mdy_hm() function to make a timestamp (dttm) object.
I assign it to the time column so you can check how it is displayed in R/RStudio.
What I take from your question is that you want to create a sequence of timestamps.
For that you can use the seq() function with a time offset, in your case 12 hours (half a day). I limit the length to 10 here; you can define longer sequences or set an end date instead (the to parameter of seq()).
date_time_seq <- seq( from = lubridate::mdy_hm("01-01-2021 00:00")
,length.out = 10
,by = "12 hours")
This gives you a sequence of timestamps
date_time_seq
[1] "2021-01-01 00:00:00 UTC" "2021-01-01 12:00:00 UTC" "2021-01-02 00:00:00 UTC"
[4] "2021-01-02 12:00:00 UTC" "2021-01-03 00:00:00 UTC" "2021-01-03 12:00:00 UTC"
[7] "2021-01-04 00:00:00 UTC" "2021-01-04 12:00:00 UTC" "2021-01-05 00:00:00 UTC"
[10] "2021-01-05 12:00:00 UTC"
The by argument accepts increments in different time units, e.g. mins, hours, days, weeks, etc.
You can add this vector of timestamps to your dataframe/tibble and carry on with your other operations.
Good luck!
You could directly add the number of seconds to the start date:
ODETime <- seq(0,10,by=0.5)
calendarTime <- as.POSIXct("2021-01-01 00:00") + ODETime * 86400
calendarTime
[1] "2021-01-01 00:00:00 CET" "2021-01-01 12:00:00 CET" "2021-01-02 00:00:00 CET"
[4] "2021-01-02 12:00:00 CET" "2021-01-03 00:00:00 CET" "2021-01-03 12:00:00 CET"
[7] "2021-01-04 00:00:00 CET" "2021-01-04 12:00:00 CET" "2021-01-05 00:00:00 CET"
[10] "2021-01-05 12:00:00 CET" "2021-01-06 00:00:00 CET" "2021-01-06 12:00:00 CET"
[13] "2021-01-07 00:00:00 CET" "2021-01-07 12:00:00 CET" "2021-01-08 00:00:00 CET"
[16] "2021-01-08 12:00:00 CET" "2021-01-09 00:00:00 CET" "2021-01-09 12:00:00 CET"
[19] "2021-01-10 00:00:00 CET" "2021-01-10 12:00:00 CET" "2021-01-11 00:00:00 CET"
or with lubridate:
as.POSIXct("2021-01-01 00:00") + lubridate::period(24,'hour') * ODETime
[1] "2021-01-01 00:00:00 CET" "2021-01-01 12:00:00 CET" "2021-01-02 00:00:00 CET"
[4] "2021-01-02 12:00:00 CET" "2021-01-03 00:00:00 CET" "2021-01-03 12:00:00 CET"
[7] "2021-01-04 00:00:00 CET" "2021-01-04 12:00:00 CET" "2021-01-05 00:00:00 CET"
[10] "2021-01-05 12:00:00 CET" "2021-01-06 00:00:00 CET" "2021-01-06 12:00:00 CET"
[13] "2021-01-07 00:00:00 CET" "2021-01-07 12:00:00 CET" "2021-01-08 00:00:00 CET"
[16] "2021-01-08 12:00:00 CET" "2021-01-09 00:00:00 CET" "2021-01-09 12:00:00 CET"
[19] "2021-01-10 00:00:00 CET" "2021-01-10 12:00:00 CET" "2021-01-11 00:00:00 CET"
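The seconds-based approach can be wrapped in a small helper. This is a sketch only: ode_to_datetime is a hypothetical name, and fixing the time zone to UTC is my assumption, chosen so that one ODE day is always exactly 24 wall-clock hours. In a zone with daylight saving (such as the CET shown above), adding 86400 seconds across a transition shifts the displayed clock time by an hour.

```r
# Hypothetical helper: map ODE time (days since simulation start) to a date-time.
# tz = "UTC" is an assumption; UTC has no daylight saving transitions.
ode_to_datetime <- function(t_days, start = "2021-01-01 00:00", tz = "UTC") {
  as.POSIXct(start, tz = tz) + t_days * 86400   # 86400 seconds per day
}

ode_time <- c(0, 0.5, 1, 1.25)
calendar_time <- ode_to_datetime(ode_time)

# Month-day-year layout, as in the question's table
format(calendar_time, "%m-%d-%Y %H:%M")
```

Because the conversion is plain arithmetic on seconds, it works for any fractional ODE time, not only for multiples of half a day.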

Converting values in Date Time column into common format [duplicate]

I have a dataframe with a column containing Date Time values, but the values are in different formats.
I want to bring them all to the format "dd/mm/yyyy hh:mm". I tried using the lubridate package to convert the dates with the AM/PM text appended to the dates, but am unable to do so.
Date_Time
"11/01/2019 10:00"
"11/01/2019 11:00"
"11/01/2019 12:00"
"11/01/2019 13:00"
"11/01/2019 14:00"
"11/01/2019 15:00"
"11/01/2019 16:00"
"10/03/2019 23:00"
"10/04/2019 1:00"
"10/28/2019 05:00:00 AM"
"10/28/2019 10:00:00 PM"
"10/29/2019 02:00:00 AM"
"10/29/2019 03:00:00 AM"
"10/31/2019 01:00:00 PM"
"10/31/2019 02:00:00 PM"
"10/31/2019 10:00:00 PM"
You can use lubridate's parse_date_time :
lubridate::parse_date_time(df$Date_Time, c('mdYHM', 'mdYIMSp'))
#[1] "2019-11-01 10:00:00 UTC" "2019-11-01 11:00:00 UTC" "2019-11-01 12:00:00 UTC"
#[4] "2019-11-01 13:00:00 UTC" "2019-11-01 14:00:00 UTC" "2019-11-01 15:00:00 UTC"
#[7] "2019-11-01 16:00:00 UTC" "2019-10-03 23:00:00 UTC" "2019-10-04 01:00:00 UTC"
#[10] "2019-10-28 05:00:00 UTC" "2019-10-28 22:00:00 UTC" "2019-10-29 02:00:00 UTC"
#[13] "2019-10-29 03:00:00 UTC" "2019-10-31 13:00:00 UTC" "2019-10-31 14:00:00 UTC"
#[16] "2019-10-31 22:00:00 UTC"
data
df <- structure(list(Date_Time = c("11/01/2019 10:00", "11/01/2019 11:00",
"11/01/2019 12:00", "11/01/2019 13:00", "11/01/2019 14:00", "11/01/2019 15:00",
"11/01/2019 16:00", "10/03/2019 23:00","10/04/2019 1:00","10/28/2019 05:00:00 AM",
"10/28/2019 10:00:00 PM", "10/29/2019 02:00:00 AM", "10/29/2019 03:00:00 AM",
"10/31/2019 01:00:00 PM", "10/31/2019 02:00:00 PM", "10/31/2019 10:00:00 PM"
)), class = "data.frame", row.names = c(NA, -16L))
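For comparison, here is a base R sketch of the same idea without lubridate: parse with each candidate format, fill the gaps of one with the other, then print in the "dd/mm/yyyy hh:mm" layout the question asks for. This is not the answer's method but a swapped-in alternative. It assumes the input is month/day/year (as the 10/28 entries imply), uses tz = "UTC" for reproducibility, and relies on %p recognising "AM"/"PM", which depends on the locale.

```r
x <- c("11/01/2019 10:00", "10/04/2019 1:00",
       "10/28/2019 05:00:00 AM", "10/28/2019 10:00:00 PM")

# Try the stricter 12-hour format first. strptime() ignores trailing
# characters, so the 24-hour format would also match the AM/PM strings
# and would read "10:00:00 PM" as 10:00 instead of 22:00.
p12 <- as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
p24 <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Use the 12-hour result where it parsed, otherwise fall back to 24-hour
parsed <- p12
parsed[is.na(p12)] <- p24[is.na(p12)]

# Common output layout: dd/mm/yyyy hh:mm
format(parsed, "%d/%m/%Y %H:%M")
```

The order of the two formats matters here, which is the main design point of the sketch.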

Create a time series by 30 minute intervals

I am trying to create a time series with 30 min intervals. I used the following command with the output also shown:
ts = seq(as.POSIXct("2009-01-01 00:00"), as.POSIXct("2014-12-31 23:30"),by = "hour")
"2010-02-21 12:00:00 EST" "2010-02-21 13:00:00 EST" "2010-02-21 14:00:00 EST"
When I change it to by ="min" it changes to be every minute.
How do I create a time series with every 30 minute intervals?
You can specify minutes in the by argument, and pass the time zone "UTC" as Adrian pointed out. Check ?seq.POSIXt for more details about the by argument specified as a character string:
A character string, containing one of "sec", "min", "hour", "day",
"DSTday", "week", "month", "quarter" or "year". This can optionally be
preceded by a (positive or negative) integer and a space, or followed
by "s".
ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
as.POSIXct("2017-01-02", tz = "UTC"),
by = "30 min")
head(ts)
Output
[1] "2017-01-01 00:00:00 UTC"
[2] "2017-01-01 00:30:00 UTC"
[3] "2017-01-01 01:00:00 UTC"
[4] "2017-01-01 01:30:00 UTC"
[5] "2017-01-01 02:00:00 UTC"
[6] "2017-01-01 02:30:00 UTC"
A numeric by is interpreted in seconds, so use 1800 seconds to get 30 minutes.
ts = seq(as.POSIXct("2009-01-01 00:00"), as.POSIXct("2014-12-31 23:30"),by = 1800)
ts[1:20]
[1] "2009-01-01 00:00:00 EST" "2009-01-01 00:30:00 EST" "2009-01-01 01:00:00 EST" "2009-01-01 01:30:00 EST" "2009-01-01 02:00:00 EST"
[6] "2009-01-01 02:30:00 EST" "2009-01-01 03:00:00 EST" "2009-01-01 03:30:00 EST" "2009-01-01 04:00:00 EST" "2009-01-01 04:30:00 EST"
[11] "2009-01-01 05:00:00 EST" "2009-01-01 05:30:00 EST" "2009-01-01 06:00:00 EST" "2009-01-01 06:30:00 EST" "2009-01-01 07:00:00 EST"
[16] "2009-01-01 07:30:00 EST" "2009-01-01 08:00:00 EST" "2009-01-01 08:30:00 EST" "2009-01-01 09:00:00 EST" "2009-01-01 09:30:00 EST"
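The two answers describe the same step size in different notations. A short sketch that puts them side by side, using the UTC range from the first answer (the variable names are mine):

```r
start <- as.POSIXct("2017-01-01", tz = "UTC")
end   <- as.POSIXct("2017-01-02", tz = "UTC")

by_string  <- seq(start, end, by = "30 min")   # character step
by_seconds <- seq(start, end, by = 1800)       # numeric step: 30 * 60 seconds

length(by_string)               # 48 half-hours plus the closing endpoint
all(by_string == by_seconds)    # both notations give the same instants
```

Either form steps in elapsed time, so in a zone with daylight saving the printed clock times skip or repeat an hour at a transition while the spacing between instants stays at 30 minutes.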

switch to DST: round_date() returns NAs

In 2013, the switch from Central European Time (CET) to Central European Summer Time (CEST) took place on Sunday 2013-03-31. Clocks are advanced by one hour from 2am to 3am, so basically there is no 2am.
start <- strptime("2013-03-31 01:00:00", format="%F %T", tz="CET")
times <- start + (0:5) * 60*15
times
[1] "2013-03-31 01:00:00 CET" "2013-03-31 01:15:00 CET"
[3] "2013-03-31 01:30:00 CET" "2013-03-31 01:45:00 CET"
[5] "2013-03-31 03:00:00 CEST" "2013-03-31 03:15:00 CEST"
Rounding the vector times to hours gives NAs. Even for times before 01:30, which aren't affected by the transition at all.
library(lubridate)
round_date(times, unit = "hour")
[1] "2013-03-31 01:00:00 CET" NA
[3] NA NA
[5] NA "2013-03-31 03:00:00 CEST"
This seems to be a bug, or am I missing something? I am running:
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Austria.1252 LC_CTYPE=German_Austria.1252
[3] LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=German_Austria.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.3.3
loaded via a namespace (and not attached):
[1] digest_0.6.4 memoise_0.2.1 plyr_1.8.1 Rcpp_0.11.2 stringr_0.6.2
It looks like the culprit is ceiling_date which is called by round_date:
ceiling_date(times,"hour")
[1] "2013-03-31 01:00:00 CET" NA
[3] NA NA
[5] NA "2013-03-31 04:00:00 CEST"
Looking at the code, it works by adding 1 to the hour, thereby creating a nonexistent time. It is definitely a bug.
base::round has support for times to do what you want though:
round(times,"hour")
[1] "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET"
[3] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
[5] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
It's an edge case and you could consider the behavior a bug. round_date uses ceiling_date and there this happens:
y <- floor_date(times - eseconds(1), "hour")
#[1] "2013-03-31 00:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 03:00:00 CEST"
hour(y) <- hour(y) + 1
#[1] "2013-03-31 01:00:00 CET" NA NA NA NA "2013-03-31 04:00:00 CEST"
As you see, it tries to increment 2013-03-31 01:00:00 CET by one hour on the wall clock, which lands on the nonexistent 02:00, and it doesn't deal correctly with the time zone transition.
The root issue is probably in the "hour<-" POSIXct S4 method.
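The failure mode above comes from editing the wall-clock hour field. Arithmetic on elapsed seconds never produces a nonexistent local time, which the following base R sketch illustrates. round_to_hour is a hypothetical helper, not lubridate's implementation, and it is only valid under the assumption that the zone's UTC offset is a whole number of hours (true for "Europe/Vienna", used here to stand in for the question's CET).

```r
tz <- "Europe/Vienna"
start <- as.POSIXct("2013-03-31 01:00:00", tz = tz)
times <- start + (0:5) * 60 * 15   # 15-minute steps across the DST gap

# Hypothetical helper: round to the nearest hour on the seconds-since-epoch
# scale (halves round up), then convert back to a date-time in the same zone.
# Assumes the zone's UTC offset is a whole number of hours.
round_to_hour <- function(x) {
  secs <- floor(as.numeric(x) / 3600 + 0.5) * 3600
  as.POSIXct(secs, origin = "1970-01-01", tz = attr(x, "tzone"))
}

rounded <- round_to_hour(times)
format(rounded, "%Y-%m-%d %H:%M:%S %Z")
```

For this input the result should agree with the base round(times, "hour") output shown earlier: no NA values, and nothing lands on 02:00.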
This has been fixed in master:
> times <- ymd_hms("2013-03-31 01:00:00 CET", "2013-03-31 01:15:00 CEST",
+ "2013-03-31 01:30:00 CEST", "2013-03-31 01:45:00 CEST",
+ "2013-03-31 03:00:00 CEST", "2013-03-31 03:15:00 CEST",
+ tz = "Europe/Amsterdam")
> round_date(times, unit = "hour")
[1] "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 03:00:00 CEST"
[4] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
> ceiling_date(times, unit = "hour")
[1] "2013-03-31 01:00:00 CET" "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
[4] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST" "2013-03-31 04:00:00 CEST"
