How can I find the element in a timestamp vector where it switches to a different time zone due to the time changeover?
Sample data:
ts <- structure(c(1521921600, 1521925200, 1521928800, 1521932400, 1521936000,
1521939600, 1521943200, 1521946800, 1521950400, 1521954000, 1521957600
), class = c("POSIXct", "POSIXt"))
Output:
"2018-03-24 21:00:00 CET" "2018-03-24 22:00:00 CET" "2018-03-24 23:00:00 CET" "2018-03-25 00:00:00 CET" "2018-03-25 01:00:00 CET" "2018-03-25 03:00:00 CEST" "2018-03-25 04:00:00 CEST" "2018-03-25 05:00:00 CEST" "2018-03-25 06:00:00 CEST" "2018-03-25 07:00:00 CEST" "2018-03-25 08:00:00 CEST"
The first 5 elements are in CET and then it switches to CEST. So the answer here would be 5 or 6. Both answers would be fine.
In the sample data the difference is always 1 hour, but I need it as well for different time intervals, for example 15 or 30 minutes.
seq(min(ts), to = max(ts), by = 15*60)
seq(min(ts), to = max(ts), by = 30*60)
The expected answer for 15 min would be 20/21.
The expected answer for 30 min would be 10/11.
You can use lubridate's dst:
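library(lubridate)
which(!duplicated(dst(ts)))[2]
This gives you the index of the first element whose DST flag differs from that of the first element, i.e. the point where the time zone changes to DST (6 for the sample data).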
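For the 15- and 30-minute cases from the question, the same idea can be sketched in base R. This is only a sketch: the helper name dst_switch_index is made up for illustration, and pinning the zone to "Europe/Berlin" is an assumption standing in for the asker's CET/CEST system time zone.

```r
# Hourly sample data from the question, pinned to an explicit zone
# (assumption: the asker's system zone behaves like Europe/Berlin).
ts <- as.POSIXct(1521921600 + 3600 * 0:10, origin = "1970-01-01",
                 tz = "Europe/Berlin")

# Hypothetical helper: index of the last element before the DST flag flips.
dst_switch_index <- function(x) {
  isdst <- as.POSIXlt(x)$isdst   # 0 = standard time, 1 = DST
  which(diff(isdst) != 0)[1]
}

# Finer grids over the same time span, as in the question
ts15 <- seq(min(ts), max(ts), by = 15 * 60)
ts30 <- seq(min(ts), max(ts), by = 30 * 60)

idx_60 <- dst_switch_index(ts)    # last CET element, hourly data
idx_15 <- dst_switch_index(ts15)  # last CET element, 15-minute data
idx_30 <- dst_switch_index(ts30)  # last CET element, 30-minute data
```

Adding 1 to each index gives the first CEST element instead, which matches the "either answer is fine" wording of the question.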
Related
I am trying to use lubridate to process the results of a differential equation solved using ode. My simulation begins on a certain date (01-01-2021) and is on the order of days (a one unit-time increase is equal to a one day calendar time increase). How can I use lubridate to process a continuous double of time since simulation start?
For example, I want to go from the left column to the right column:
ODE time   Calendar Time
0.0        01-01-2021 00:00
0.5        01-01-2021 12:00
1.0        01-02-2021 00:00
etc...
Thank you
I am not fully sure I understand your question, but from your example it appears you want to create time steps. If I understand correctly, one unit corresponds to adding 24 hours, and half a unit to adding 12 hours. Your example suggests you want this in a dataframe/tibble.
With {lubridate} you can "coerce" datetimestamps. There are some handy time formatting functions. From a character you can go to a timestamp.
For example
# create dataframe/tibble of ODE and Calendar times
library(tibble)   # tribble()
library(dplyr)    # mutate() and %>%

mydata <- tribble(
~ODE_time, ~Calendar_Time
,0.0 , "01-01-2021 00:00"
,0.5 , "01-01-2021 12:00"
,1.0 , "01-02-2021 00:00"
,1.5 , "01-02-2021 12:00"
)
# parse the month-day-year strings into a date-time (dttm) column
mydata <- mydata %>%
  mutate(time = lubridate::mdy_hm(Calendar_Time))
In your case, I use the mdy_hm() function to make a timestamp (dttm) object.
I assign it to the time variable/column so you can check the presentation in R/RStudio.
What I get from your question is that you want to create a sequence of timestamps.
Here you can use the seq() function and work with the time offset, in your case 12 hours (or half a day). I limit the length to 10, but you can obviously define longer sequences or set your end date (i.e. the to parameter of seq()).
date_time_seq <- seq( from = lubridate::mdy_hm("01-01-2021 00:00")
,length.out = 10
,by = "12 hours")
This gives you a sequence of timestamps
date_time_seq
[1] "2021-01-01 00:00:00 UTC" "2021-01-01 12:00:00 UTC" "2021-01-02 00:00:00 UTC"
[4] "2021-01-02 12:00:00 UTC" "2021-01-03 00:00:00 UTC" "2021-01-03 12:00:00 UTC"
[7] "2021-01-04 00:00:00 UTC" "2021-01-04 12:00:00 UTC" "2021-01-05 00:00:00 UTC"
[10] "2021-01-05 12:00:00 UTC"
The syntax allows you to add various "steps" and you can use increments of different time units, e.g. mins, hours, days, weeks, etc.
You can then use this vector of timestamps in your dataframe/tibble for your other operations.
Good luck!
You could directly add the number of seconds to the start date:
ODETime <- seq(0,10,by=0.5)
calendarTime <- as.POSIXct("2021-01-01 00:00") + ODETime * 86400
calendarTime
[1] "2021-01-01 00:00:00 CET" "2021-01-01 12:00:00 CET" "2021-01-02 00:00:00 CET"
[4] "2021-01-02 12:00:00 CET" "2021-01-03 00:00:00 CET" "2021-01-03 12:00:00 CET"
[7] "2021-01-04 00:00:00 CET" "2021-01-04 12:00:00 CET" "2021-01-05 00:00:00 CET"
[10] "2021-01-05 12:00:00 CET" "2021-01-06 00:00:00 CET" "2021-01-06 12:00:00 CET"
[13] "2021-01-07 00:00:00 CET" "2021-01-07 12:00:00 CET" "2021-01-08 00:00:00 CET"
[16] "2021-01-08 12:00:00 CET" "2021-01-09 00:00:00 CET" "2021-01-09 12:00:00 CET"
[19] "2021-01-10 00:00:00 CET" "2021-01-10 12:00:00 CET" "2021-01-11 00:00:00 CET"
or with lubridate:
as.POSIXct("2021-01-01 00:00") + lubridate::period(24,'hour') * ODETime
[1] "2021-01-01 00:00:00 CET" "2021-01-01 12:00:00 CET" "2021-01-02 00:00:00 CET"
[4] "2021-01-02 12:00:00 CET" "2021-01-03 00:00:00 CET" "2021-01-03 12:00:00 CET"
[7] "2021-01-04 00:00:00 CET" "2021-01-04 12:00:00 CET" "2021-01-05 00:00:00 CET"
[10] "2021-01-05 12:00:00 CET" "2021-01-06 00:00:00 CET" "2021-01-06 12:00:00 CET"
[13] "2021-01-07 00:00:00 CET" "2021-01-07 12:00:00 CET" "2021-01-08 00:00:00 CET"
[16] "2021-01-08 12:00:00 CET" "2021-01-09 00:00:00 CET" "2021-01-09 12:00:00 CET"
[19] "2021-01-10 00:00:00 CET" "2021-01-10 12:00:00 CET" "2021-01-11 00:00:00 CET"
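If the calendar times should not depend on the machine's local time zone (the outputs above are in CET only because of the system setting), one option is to pin the start date to UTC. A minimal sketch, assuming the simulation start is meant as midnight UTC; the names ode_time, start, result and labels are illustrative only:

```r
# ODE output times in days since simulation start
ode_time <- c(0, 0.5, 1, 1.5)

# Assumption: the simulation starts at midnight UTC on 2021-01-01
start <- as.POSIXct("2021-01-01 00:00:00", tz = "UTC")

# One day is 86400 seconds; UTC has no DST, so every day has this length
result <- data.frame(
  ODE_time = ode_time,
  Calendar_Time = start + ode_time * 86400
)

# Format as month-day-year, matching the layout used in the question
labels <- format(result$Calendar_Time, "%m-%d-%Y %H:%M")
```

In a zone with DST, adding a fixed 86400 seconds per day would shift the clock time by an hour after a changeover, which is why a fixed-offset zone is used here.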
I am trying to import in R a text file including datetimes. Times are stored in character format, without timezone information, but we know it is French time (Europe/Paris).
An issue arises on the days of a time change: e.g. there is a time change from 2018-10-28 03:00:00 CEST to 2018-10-28 02:00:00 CET, so we have duplicates in our character format, and R cannot tell whether a given time is CEST or CET.
Consider the following example:
data_in <- "date,val
2018-10-28 01:30:00,25
2018-10-28 02:00:00,26
2018-10-28 02:30:00,27
2018-10-28 02:00:00,28
2018-10-28 02:30:00,29
2018-10-28 03:00:00,30"
library(readr)
data <- read_delim(data_in, ",", locale = locale(tz = "Europe/Paris"))
We end up having duplicates in our dates:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CET" "2018-10-28 02:00:00 CEST"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Expected output would be:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST" "2018-10-28 02:00:00 CET"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Any idea how to solve the issue (besides telling people to use UTC or ISO formats)? I guess the only way is to assume the dates are sorted, so we can tell that the first ones are CEST.
If you are certain that your time is always-increasing, then you can look for an apparent decrease (of time-of-day) and manually insert the TZ offset to the string, then parse as usual. I added some logic to look for this decrease only around 2-3am so that if you have multiple days of data spanning midnight, you would not get a false-alarm.
library(dplyr)  # for cumany()
data <- read.csv(text = data_in)
fakedate <- as.POSIXct(gsub("^[-0-9]+ ", "2000-01-01 ", data$date))
decreases <- cumany(grepl(" 0[23]:", data$date) & c(FALSE, diff(fakedate) < 0))
data$date <- paste(data$date, ifelse(decreases, "+0100", "+0200"))
data
# date val
# 1 2018-10-28 01:30:00 +0200 25
# 2 2018-10-28 02:00:00 +0200 26
# 3 2018-10-28 02:30:00 +0200 27
# 4 2018-10-28 02:00:00 +0100 28
# 5 2018-10-28 02:30:00 +0100 29
# 6 2018-10-28 03:00:00 +0100 30
as.POSIXct(data$date, format="%Y-%m-%d %H:%M:%S %z", tz="Europe/Paris")
# [1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST"
# [4] "2018-10-28 02:00:00 CET" "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
My use of "2000-01-01" was just some non-DST day so that we can parse the timestamp into POSIXt and calculate a diff on it. (If we didn't insert a date, we could still use as.POSIXct with a format, but if you ever ran this on one of the two DST days, you might get different results, since as.POSIXct("01:02:03", format="%H:%M:%S") always assumes "today".)
This is obviously a bit fragile with its assumptions, but perhaps it'll be good enough for what you need.
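Because the approach rests on the assumption that the input is sorted, it may be worth checking the parsed result afterwards. A small sketch of such a check, using the offset-stamped strings from the output above (the variable names are illustrative):

```r
# Strings with the UTC offset already appended, as produced above
stamped <- c("2018-10-28 01:30:00 +0200", "2018-10-28 02:00:00 +0200",
             "2018-10-28 02:30:00 +0200", "2018-10-28 02:00:00 +0100",
             "2018-10-28 02:30:00 +0100", "2018-10-28 03:00:00 +0100")

parsed <- as.POSIXct(stamped, format = "%Y-%m-%d %H:%M:%S %z",
                     tz = "Europe/Paris")

# Steps between consecutive instants, in seconds; if the offsets were
# assigned correctly, none of them is zero or negative
steps <- diff(as.numeric(parsed))
all_increasing <- all(steps > 0)
```

If all_increasing is FALSE, the offsets were most likely attached to the wrong rows, for example because the file was not sorted after all.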
When converting a date object to a POSIXct object, I expected the hours to be zero.
Turns out the hours are either 1 or 2, depending on summer/winter time.
eg:
oct.days <- (as.Date("2018-10-26")+0:5)
as.POSIXct(oct.days)
[1] "2018-10-26 02:00:00 CEST" "2018-10-27 02:00:00 CEST" "2018-10-28 02:00:00 CEST"
[4] "2018-10-29 01:00:00 CET" "2018-10-30 01:00:00 CET" "2018-10-31 01:00:00 CET"
(I'm in Germany, winter time was implemented on Oct 28th at 3 am.)
Rounding it down fixed the issue
round(as.POSIXct(oct.days),"days")
but I wonder for what reason the date object contains extra hours?
tks!
Can anyone explain this behaviour of as.POSIXct in R.
ts <- c("2018-03-24 23:00:00", "2018-03-25 01:00:00", "2018-03-25 01:15:00",
"2018-03-25 01:30:00", "2018-03-25 01:45:00", "2018-03-25 02:00:00")
as.POSIXct(ts)
as.POSIXct(ts[1:5])
diff(as.POSIXct(ts))
diff(as.POSIXct(ts[1:5]))
The results when running interactively in RStudio are:
> as.POSIXct(ts)
[1] "2018-03-24 CET" "2018-03-25 CET" "2018-03-25 CET" "2018-03-25 CET" "2018-03-25 CET" "2018-03-25 CET"
> as.POSIXct(ts[1:5])
[1] "2018-03-24 23:00:00 CET" "2018-03-25 01:00:00 CET" "2018-03-25 01:15:00 CET" "2018-03-25 01:30:00 CET" "2018-03-25 01:45:00 CET"
>
> diff(as.POSIXct(ts))
Time differences in secs
[1] 86400 0 0 0 0
> diff(as.POSIXct(ts[1:5]))
Time differences in mins
[1] 120 15 15 15
> ts[1:5]
[1] "2018-03-24 23:00:00" "2018-03-25 01:00:00" "2018-03-25 01:15:00" "2018-03-25 01:30:00" "2018-03-25 01:45:00"
> ts
[1] "2018-03-24 23:00:00" "2018-03-25 01:00:00" "2018-03-25 01:15:00" "2018-03-25 01:30:00" "2018-03-25 01:45:00" "2018-03-25 02:00:00"
First observation: when converting all 6 values, the time part disappears. Is this only a printing phenomenon?
Second observation, the behaviour of diff seems completely bizarre.
The problem does not lie in the diff function. It lies with as.POSIX* combined with the DST (Daylight saving time). R does not handle this automatically.
On 25 March 2018 at 02:00:00, clocks in the CET zone are set 1 hour forward, officially switching to CEST. This means 2018-03-25 02:00:00 CET simply does not exist.
Why does this happen?
When calling as.POSIXct() some parameters are set as default. One of them is the tz (timezone) set at the system's default (mine is CET).
To clarify, I edited your dataset
ts <- c("2018-03-25 01:45:00", "2018-03-25 02:00:00", "2018-03-25 03:00:00")
Now we run the following line
as.POSIXct(ts)
#"2018-03-25 CET" "2018-03-25 CET" "2018-03-25 CET"
No format argument is given, so R tries several formats in turn, and the one it settles on here drops the time of day. So what happens if we force a format that includes the time? Running the following line results in:
as.POSIXct(ts, format = "%Y-%m-%d %H:%M:%OS")
# "2018-03-25 01:45:00 CET" NA "2018-03-25 03:00:00 CEST"
Note that the second value, a time that does not actually exist, becomes NA. This is also why the call without a format dropped the times: R could not parse every value with "%Y-%m-%d %H:%M:%OS", so it fell back to the simpler "%Y-%m-%d". Also note that the third value is in CEST, i.e. after the DST switch. With a time zone that has no DST, such as UTC, all three values parse:
as.POSIXct(ts[1:3], format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
#"2018-03-25 01:45:00 UTC" "2018-03-25 02:00:00 UTC" "2018-03-25 03:00:00 UTC"
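To find such non-existent local times in a vector up front, a round-trip check can help: parse with an explicit format and time zone, format the result again, and compare it with the input. This is only a sketch; exists_in_tz is a made-up helper name, and "Europe/Berlin" stands in for a CET/CEST system zone.

```r
# Hypothetical helper: TRUE where the wall-clock time exists in `tz`.
# A skipped time parses to NA or to a shifted time, depending on the
# platform, so it never formats back to the original string.
exists_in_tz <- function(x, tz) {
  fmt <- "%Y-%m-%d %H:%M:%S"
  parsed <- as.POSIXct(x, format = fmt, tz = tz)
  !is.na(parsed) & format(parsed, fmt) == x
}

ts <- c("2018-03-25 01:45:00", "2018-03-25 02:00:00", "2018-03-25 03:00:00")

ok_berlin <- exists_in_tz(ts, "Europe/Berlin")  # zone with a DST gap
ok_utc    <- exists_in_tz(ts, "UTC")            # zone without DST
```

Here only the middle element should be flagged for Europe/Berlin, while all three times exist in UTC.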
The short answer: there is no "2018-03-25 02:00:00" in CET because of the switch to summer time.
In detail, as.POSIXct has a tryFormats parameter, a set of formats it tries in turn when converting from character to the POSIXct class.
Since "2018-03-25 02:00:00" cannot be converted with the %Y-%m-%d %H:%M:%OS format, the function falls back to the %Y-%m-%d format.
If you try another time zone that has no summer time, e.g. Asia/Seoul, you will see that the full date-time is shown for every element.
Sys.setenv(TZ='Asia/Seoul')
as.POSIXct(ts)
> [1] "2018-03-24 23:00:00 KST" "2018-03-25 01:00:00 KST" "2018-03-25 01:15:00 KST" "2018-03-25 01:30:00 KST" "2018-03-25 01:45:00 KST" "2018-03-25 02:00:00 KST"
I've got some data with POSIXct timestamps in "CET" (Central European Time = Winter time = UTC+0100) and "CEST" (Central European Summer Time = UTC+0200). Since I've had some trouble with plots and calculations because of that daylight savings time, I want all of the timestamps to be in UTC+0100 time.
Here is an example for my timestamps on switch-back-to-winter-time-day:
> tdf$time_posix_vec[1:20]
[1] "2015-10-25 00:00:00 CEST" "2015-10-25 00:15:00 CEST" "2015-10-25 00:30:00 CEST" "2015-10-25 00:45:00 CEST" "2015-10-25 01:00:00 CEST"
[6] "2015-10-25 01:15:00 CEST" "2015-10-25 01:30:00 CEST" "2015-10-25 01:45:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:15:00 CEST"
[11] "2015-10-25 02:30:00 CEST" "2015-10-25 02:45:00 CEST" "2015-10-25 02:00:00 CET" "2015-10-25 02:15:00 CET" "2015-10-25 02:30:00 CET"
[16] "2015-10-25 02:45:00 CET" "2015-10-25 03:00:00 CET" "2015-10-25 03:15:00 CET" "2015-10-25 03:30:00 CET" "2015-10-25 03:45:00 CET"
To demonstrate the issue I picked an example timestamp:
> tx <- tdf$time_posix_vec[7]
> tx
[1] "2015-10-25 01:30:00 CEST"
I already tried lubridate's with_tz function, but if I use it with "CET", this is what happens:
> with_tz(tx, tzone = "CET")
[1] "2015-10-25 01:30:00 CEST"
I assume the timezone handler knows that in my location CET becomes CEST between the last week of March and the last week of October.
To solve the issue I could use Algeria's timezone, since Algeria uses CET without daylight saving time (as Wikipedia told me). However, this could change in the future, and I wonder whether this solution would be a bit unsafe because of that.
> with_tz(tx, tzone = "Africa/Algiers")
[1] "2015-10-25 00:30:00 CET"
The best way, I thought, would be to use "UTC+1", but the behaviour of with_tz is exactly the opposite of what I expected:
> with_tz(tx, tzone = "UTC+1")
[1] "2015-10-24 22:30:00 UTC"
to get 00:30:00 I would have to use:
> with_tz(tx, tzone = "UTC-1")
[1] "2015-10-25 00:30:00 UTC"
but then also the label "UTC" is wrong, because in UTC it would be
> with_tz(tx, tzone = "UTC")
[1] "2015-10-24 23:30:00 UTC"
Why is "UTC+1" switching the timestamp to UTC-0100 instead of UTC+0100?
And is there a function that forces the timestamp to UTC+0100 and also puts the correct timezone label on the timestamp, so the result would be "2015-10-25 00:30:00 UTC+1"?
Thanks in advance,
greetings, Peter
I think I found the solution: now I use
t1 <- as.POSIXct("2016-07-12 17:43", tz = "Etc/GMT-1")
for example. It confused me that Etc/GMT-1 is the same as UTC+0100: the Etc/GMT zone names use the POSIX-style sign convention, which is inverted relative to the usual UTC offset notation.
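Applied to the example timestamp from the question, this can be sketched in base R as follows. The instant is written in UTC here (23:30 UTC is the same moment as 01:30 CEST), and changing the tzone attribute only changes how the instant is displayed, much like with_tz does.

```r
# The example instant from the question: 2015-10-25 01:30 CEST = 23:30 UTC
tx <- as.POSIXct("2015-10-24 23:30:00", tz = "UTC")

# Re-label the same instant with the fixed-offset zone Etc/GMT-1 (= UTC+1)
fixed <- tx
attr(fixed, "tzone") <- "Etc/GMT-1"

# The clock time should now read 00:30 on 2015-10-25
clock <- format(fixed, "%Y-%m-%d %H:%M:%S")

# %z prints the numeric UTC offset next to the time
with_offset <- format(fixed, "%Y-%m-%d %H:%M:%S %z")
```

The underlying instant is unchanged, so as.numeric(fixed) equals as.numeric(tx); only the printed representation moves to the fixed one-hour offset.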