I am trying to convert a character string into a POSIXct date format and running into a problem with the time zone information.
The original character data looks like this:
SD$BGN_DTTM
[1] "1956-05-25 14:30:00 CST" "1956-06-05 16:30:00 CST" "1956-07-04 15:30:00 CST"
[4] "1956-07-08 08:00:00 CST" "1956-08-19 12:00:00 CST" "1956-12-23 00:50:00 CST"
but when I attempt to convert using as.POSIXct, this happens:
SD$BGN_DTTM <- as.POSIXct(SD$BGN_DTTM)
[1] "1956-05-25 14:30:00 PDT" "1956-06-05 16:30:00 PDT" "1956-07-04 15:30:00 PDT"
[4] "1956-07-08 08:00:00 PDT" "1956-08-19 12:00:00 PDT" "1956-12-23 00:50:00 PST"
It looks like the function isn't reading the time zone I've specified. Since my computer is on PDT, it looks like it has used that instead. Note also that it has appended PST to the last date (seems odd). Can anyone tell me what is going on here, and whether there is a method to get R to read the time zone information as shown?
This would still have the problem you noticed with daylight/standard times:
> strptime(test, format="%Y-%m-%d %H:%M:%S", tz="America/Chicago")
[1] "1956-05-25 14:30:00 CDT" "1956-06-05 16:30:00 CDT"
[3] "1956-07-04 15:30:00 CDT" "1956-07-08 08:00:00 CDT"
[5] "1956-08-19 12:00:00 CDT" "1956-12-23 00:50:00 CST"
The strptime function refuses to honor the "%Z" format for input (which, in its defense, is documented). Many people have lost great gobs of hair, and probably thrown some keyboards into monitors, in efforts to get R timezones working to their (dis?)satisfaction.
As we all know, time is a relative thing. Storing time as UTC/GMT or relative to UTC/GMT will make sure that daylight savings etc only come into play when you want them to, as per: Does UTC observe daylight saving time?
So, if:
x <- c("1956-05-25 14:30:00 CST","1956-06-05 16:30:00 CST", "1956-07-04 15:30:00 CST",
"1956-07-08 08:00:00 CST", "1956-08-19 12:00:00 CST","1956-12-23 00:50:00 CST")
You can find out that CST is 6 hours behind UTC/GMT (as opposed to CDT, which is daylight saving time and is 5 hours behind)
Therefore:
out <- as.POSIXct(x,tz="Etc/GMT+6")
will represent CST without any daylight savings shift to CDT.
That way when or if you convert to local central timezones, the proper CST time will be returned without changing the actual data for daylight savings. (i.e. - when R prints CDT, it is only shifting the display of the time forward an hour, but the underlying numerical data is not changed. The last case displays as expected when standard time kicks back in):
attr(out,"tzone") <- "America/Chicago"
out
#[1] "1956-05-25 15:30:00 CDT" "1956-06-05 17:30:00 CDT" "1956-07-04 16:30:00 CDT"
#[4] "1956-07-08 09:00:00 CDT" "1956-08-19 13:00:00 CDT" "1956-12-23 00:50:00 CST"
I.e. - for case 1, 15:30 CDT == 14:30 CST - as originally specified, and when daylight savings stops, for case 6, 00:50 CST == 00:50 CST as originally specified.
Comparing this final out to the other answer, you can see there is an actual numerical time difference of one hour for all the daylight savings cases:
out - strptime(x, format="%Y-%m-%d %H:%M:%S", tz="America/Chicago")
#Time differences in secs
#[1] 3600 3600 3600 3600 3600 0
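If the input ever mixes abbreviations (say CST and CDT in the same column), the same fixed-offset idea can be applied per element. This is only a rough sketch: the lookup table and the function name parse_fixed_offset are made up for illustration, and the offsets are assumptions you would need to confirm for your own data.

```r
# Hypothetical lookup table: abbreviation -> offset from UTC in hours.
# These two entries are assumptions for US Central time; extend as needed.
abbrev_offset_hours <- c(CST = -6, CDT = -5)

parse_fixed_offset <- function(x, offsets = abbrev_offset_hours) {
  abbrev <- sub("^.* ", "", x)           # text after the last space, e.g. "CST"
  stamp  <- sub(" [A-Za-z]+$", "", x)    # the timestamp without the abbreviation
  stopifnot(all(abbrev %in% names(offsets)))
  # Read the wall-clock time as if it were UTC, then remove the offset
  wall_utc <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  wall_utc - unname(offsets[abbrev]) * 3600
}

mixed <- c("1956-05-25 14:30:00 CST", "1956-05-25 14:30:00 CDT")
res <- parse_fixed_offset(mixed)
```

The result is stored in UTC; setting attr(res, "tzone") <- "America/Chicago" afterwards only changes how it prints, not the underlying instants.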
I have the following string describing a date and time in Central European
Summer Time: "2021-09-23 12:00:00".
In fact, I've got a whole, long column of such time points.
For certain reasons I have to handle these times with functions in the 'lubridate'
package. Using 'as_datetime' I get
t0 = "2021-09-23 12:00:00"
t1 = as_datetime(t0); t1
## [1] "2021-09-23 12:00:00 UTC"
that is, as_datetime uses the hours and minutes given and adds Universal
Time "UTC" as time zone. Supplying the time zone CEST gives instead
t1 = as_datetime(t0, tz = "CEST"); t1
## [1] "2021-09-23 10:00:00 CEST"
that is, it changes the time, which I don't want.
What I would like to get (what I really need) is "2021-09-23 12:00:00 CEST",
that is changing the time zone without changing the time.
I tried force_tz and with_tz, but that didn't work either.
What I am also wondering is why 'lubridate' converts 12:00:00 UTC to
10:00:00 CEST, because CEST is the same as GMT+2 and UTC is GMT+0, so
the result should actually be the other way around.
Thanks for any help.
Try:
t0 = "2021-09-23 12:00:00"
library(lubridate)
#1
t1 = as_datetime(t0, tz = "Europe/Berlin")
[1] "2021-09-23 12:00:00 CEST"
or
#2
t1 = as_datetime(t0, tz = "CST6CDT")
[1] "2021-09-23 12:00:00 CDT"
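For a column that has already been parsed as UTC, the difference between force_tz and with_tz may be worth spelling out. A small sketch, assuming "Europe/Berlin" is the zone you actually mean by CEST:

```r
library(lubridate)

t_utc <- as_datetime("2021-09-23 12:00:00")   # parsed as 12:00 UTC

# force_tz keeps the clock reading and changes the instant
kept_clock <- force_tz(t_utc, tzone = "Europe/Berlin")

# with_tz keeps the instant and changes the clock reading
same_instant <- with_tz(t_utc, tzone = "Europe/Berlin")

format(kept_clock, "%H:%M %Z")     # clock reading is still 12:00
format(same_instant, "%H:%M %Z")   # same instant, shown as 14:00 local time
```

So force_tz is the one that "changes the time zone without changing the time", provided it is given a full zone name rather than an abbreviation.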
I want to change a date-time string into POSIXct with a UTC offset such as '+00' at the end indicating the time zone.
This is what I did
as.POSIXct("2018-2-12-1230", format = "%Y-%m-%d-%H%M")
>"2018-02-12 12:30:00 CST"
However instead of "2018-02-12 12:30:00 CST", I would like "2018-02-12 12:30:00 +08" as output. How can I do that in as.POSIXct?
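One possible approach, sketched under the assumption that the intended zone is a fixed UTC+8 (the question does not say which zone "CST" stands for here): parse with an explicit zone, then build the label from the numeric offset that format() reports with "%z". Note that the labelled result is a character string; the POSIXct value itself only stores the instant and a zone name.

```r
# "Etc/GMT-8" means UTC+8 (the sign is inverted in these POSIX-style names).
# Using it here is an assumption about the intended zone.
x <- as.POSIXct("2018-2-12-1230", format = "%Y-%m-%d-%H%M", tz = "Etc/GMT-8")

offset <- format(x, "%z")                      # numeric offset, "+0800"
label  <- paste(format(x, "%Y-%m-%d %H:%M:%S"),
                substr(offset, 1, 3))          # keep only sign and hours
label
```

How x itself prints (abbreviation versus numeric offset) depends on the time zone database installed on your system, so I would not rely on the printed form.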
I am trying to parse strings of the form 25.10.2013 17:30 (the timezone is CET/CEST but this is not specified in the strings themselves) as POSIXct using lubridate's dmy_hm(..., tz = 'Europe/Brussels') function.
I have the problem that after parsing, there are duplicate values on the day CEST switches to CET (the clock jumps one hour back). The cause seems to be the way this shift is indicated in my data: 02A:00 for 2 o'clock CEST and 02B:00 2 o'clock CET, which is one hour later. dmy_hm(..., tz = 'Europe/Brussels') interprets both as CET.
Minimal working example:
> library(lubridate)
> times = c("27.10.2013 01:00", "27.10.2013 02A:00",
"27.10.2013 02B:00", "27.10.2013 03:00")
> times = dmy_hm(times, tz = "Europe/Brussels")
> times
[1] "2013-10-27 01:00:00 CEST" "2013-10-27 02:00:00 CET"
[3] "2013-10-27 02:00:00 CET" "2013-10-27 03:00:00 CET"
My question is: What would be the best way to fix the "wrong" dates?
I tried to use which(duplicated(times)) to find the indices of the duplicate values and remove one hour from the "wrong" values, however there seems to be another problem:
> times[2] - hours(1)
[1] "2013-10-27 01:00:00 CEST"
Why does removing one hour from '"2013-10-27 02:00:00 CET"' bring me to '"2013-10-27 01:00:00 CEST"'? Isn't that a jump of two hours? I would expect to land at '"2013-10-27 02:00:00 CEST"'.
EDIT: The last part is a known issue (see https://github.com/tidyverse/lubridate/issues/498). The solution is to use dhours() instead of hours()
> times[2] - dhours(1)
[1] "2013-10-27 02:00:00 CEST"
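For the original question (telling 02A apart from 02B), one option that does not depend on how the parser resolves the ambiguous hour is to treat the flagged entries separately: read their clock time as if it were UTC and subtract the known offset. A base R sketch; the function name parse_flagged is hypothetical, and the offsets (A = CEST = UTC+2, B = CET = UTC+1) are assumptions that only hold for this particular zone.

```r
parse_flagged <- function(x, tz = "Europe/Brussels") {
  fmt   <- "%d.%m.%Y %H:%M"
  flag  <- sub("^.* [0-9]{2}([AB]?):.*$", "\\1", x)   # "A", "B" or ""
  clean <- sub("([0-9]{2})[AB]:", "\\1:", x)          # drop the A/B marker

  # Unflagged entries: ordinary local-time parsing
  secs <- as.numeric(as.POSIXct(clean, format = fmt, tz = tz))

  # Flagged entries: wall clock read as UTC, minus the assumed offset
  offset_h <- c(A = 2, B = 1)                         # assumption: CEST / CET
  flagged  <- flag != ""
  wall_utc <- as.numeric(as.POSIXct(clean[flagged], format = fmt, tz = "UTC"))
  secs[flagged] <- wall_utc - unname(offset_h[flag[flagged]]) * 3600

  as.POSIXct(secs, origin = "1970-01-01", tz = tz)
}

times <- c("27.10.2013 01:00", "27.10.2013 02A:00",
           "27.10.2013 02B:00", "27.10.2013 03:00")
fixed <- parse_flagged(times)
```

With this, consecutive entries should be exactly one hour apart and no duplicates remain.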
When I put a single date to be parsed, it parses accurately
> ymd("20011001")
[1] "2001-10-01 UTC"
But when I try to create a vector of dates they all come out one day off:
> b=c(ymd("20111001"),ymd("20101001"),ymd("20091001"),ymd("20081001"),ymd("20071001"),ymd("20061001"),ymd("20051001"),ymd("20041001"),ymd("20031001"),ymd("20021001"),ymd("20011001"))
> b
[1] "2011-09-30 19:00:00 CDT" "2010-09-30 19:00:00 CDT" "2009-09-30 19:00:00 CDT"
[4] "2008-09-30 19:00:00 CDT" "2007-09-30 19:00:00 CDT" "2006-09-30 19:00:00 CDT"
[7] "2005-09-30 19:00:00 CDT" "2004-09-30 19:00:00 CDT" "2003-09-30 19:00:00 CDT"
[10] "2002-09-30 19:00:00 CDT" "2001-09-30 19:00:00 CDT"
how can I fix this??? Many thanks.
I don't claim to understand exactly what's going on here, but the proximal problem is that c() strips attributes, so using c() on a POSIX[c?]t vector strips the time zone attribute, and the result then prints in the time zone specified by your locale instead of UTC (this happens even if you set the time zone to agree with the one specified by your locale). On my system:
library(lubridate)
(y1 <- ymd("20011001"))
## [1] "2001-10-01 UTC"
(y2 <- ymd("20011002"))
c(y1,y2)
## now in EDT (and a day earlier/4 hours before UTC):
## [1] "2001-09-30 20:00:00 EDT" "2001-10-01 20:00:00 EDT"
(y12 <- ymd(c("20011001","20011002")))
## [1] "2001-10-01 UTC" "2001-10-02 UTC"
c(y12)
## back in EDT
## [1] "2001-09-30 20:00:00 EDT" "2001-10-01 20:00:00 EDT"
You can set the time zone explicitly ...
y3 <- ymd("20011001",tz="EDT")
## [1] "2001-10-01 EDT"
But c() is still problematic.
(y3c <- c(y3))
## [1] "2001-09-30 20:00:00 EDT"
So two solutions are to convert a character vector rather than combining the objects after converting them one by one, or to restore the tzone attribute after combining.
For example:
attr(y3c,"tzone") <- attr(y3,"tzone")
@Joran points out that this is almost certainly a general property of applying c() to POSIX[c?]t objects, not specifically lubridate-related. I hope someone will chime in and explain whether this is a well-known design decision/infelicity/misfeature.
Update: there is some discussion of this on R-help in 2012, and Brian Ripley comments:
But in any case, the documentation (?c.POSIXct) is clear:
Using ‘c’ on ‘"POSIXlt"’ objects converts them to the current time
zone, and on ‘"POSIXct"’ objects drops any ‘"tzone"’ attributes
(even if they are all marked with the same time zone).
So the recommended way is to add a "tzone" attribute if you know what
you want it to be. POSIXct objects are absolute times: the timezone
merely affects how they are converted (including to character for
printing).
It might be nice if lubridate added a method to do this ...
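In the meantime, the "restore the attribute" advice can be wrapped in a small helper. This is just a sketch in base R; combine_keep_tz is a made-up name, and it assumes all inputs share the time zone of the first one.

```r
# Combine POSIXct values, then put back the tzone of the first argument.
# Assumption: every input is meant to be shown in that same zone.
combine_keep_tz <- function(...) {
  parts <- list(...)
  tz    <- attr(parts[[1]], "tzone")
  out   <- do.call(c, parts)
  attr(out, "tzone") <- tz
  out
}

y1  <- as.POSIXct("2001-10-01", tz = "UTC")
y2  <- as.POSIXct("2001-10-02", tz = "UTC")
y12 <- combine_keep_tz(y1, y2)
```

Because POSIXct values are absolute times, this only affects how y12 prints; the stored numbers are the same as in c(y1, y2).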
I would like to add 1 hour to a POSIXct object, but it does not support '+'.
This command:
as.POSIXct("2012/06/30","GMT")
+ as.POSIXct(paste(event_hour, event_minute,0,":"), ,"%H:%M:$S")
returns this error:
Error in `+.POSIXt`(as.POSIXct("2012/06/30", "GMT"), as.POSIXct(paste(event_hour, :
binary '+' is not defined for "POSIXt" objects
How can I add a few hours to a POSIXct object ?
POSIXct objects are a measure of seconds from an origin, usually the UNIX epoch (1st Jan 1970). Just add the requisite number of seconds to the object:
x <- Sys.time()
x
[1] "2012-08-12 13:33:13 BST"
x + 3*60*60 # add 3 hours
[1] "2012-08-12 16:33:13 BST"
The lubridate package also implements this nicely with convenience functions hours, minutes, etc.
x = Sys.time()
library(lubridate)
x + hours(3) # add 3 hours
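Applied to the situation in the question, a base R sketch might look like the following. The values of event_hour and event_minute are invented here, since the question does not show them; as.difftime is used only to make the units explicit instead of multiplying by 3600 and 60 by hand.

```r
event_hour   <- 14   # hypothetical example values
event_minute <- 45

base_day   <- as.POSIXct("2012/06/30", tz = "GMT")
event_time <- base_day +
  as.difftime(event_hour, units = "hours") +
  as.difftime(event_minute, units = "mins")

format(event_time, "%Y-%m-%d %H:%M")
```

The point is to add a duration to a POSIXct, not a second POSIXct, which is what triggered the error above.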
James and Gregor's answers are great, but they handle daylight saving differently. Here's an elaboration of them.
# Start with d1 set to 12AM on March 10th, 2019 in U.S. Central time, two hours before daylight saving
d1 <- as.POSIXct("2019-03-10 00:00:00", tz = "America/Chicago")
print(d1) # "2019-03-10 CST"
# Daylight saving begins at 2AM. See how a sequence of hours works. (Basically it skips the time between 2AM and 3AM)
seq.POSIXt(from = d1, by = "hour", length.out = 4)
# "2019-03-10 00:00:00 CST" "2019-03-10 01:00:00 CST" "2019-03-10 03:00:00 CDT" "2019-03-10 04:00:00 CDT"
# Now let's add 24 hours to d1 by adding 86400 seconds to it.
d1 + 24*60*60 # "2019-03-11 01:00:00 CDT"
# Next we add 24 hours to d1 via lubridate seconds/hours/days
d1 + lubridate::seconds(24*60*60) # "2019-03-11 CDT" (i.e. 2019-03-11 00:00:00 CDT)
d1 + lubridate::hours(24) # "2019-03-11 CDT" (i.e. 2019-03-11 00:00:00 CDT)
d1 + lubridate::days(1) # "2019-03-11 CDT" (i.e. 2019-03-11 00:00:00 CDT)
So, either answer is correct depending on what you want. Of course, if you're using UTC or some other timezone that doesn't observe daylight saving, these two methods should be the same.
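The same two behaviours can also be seen in base R without lubridate, since seq() distinguishes between by = "day" (a fixed 86400 seconds) and by = "DSTday" (same clock time on the next calendar day). A short sketch reusing the date from above:

```r
d1 <- as.POSIXct("2019-03-10 00:00:00", tz = "America/Chicago")

# Fixed 24 hours later: lands at 01:00 because the day is only 23 hours long
fixed_24h <- seq(d1, by = "day", length.out = 2)[2]

# Next calendar day at the same clock time
next_day <- seq(d1, by = "DSTday", length.out = 2)[2]

difftime(next_day, d1, units = "hours")   # length of that calendar day
```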