lubridate yyyy-MM-dd'T'HH:mm:ssX conversion unexpected. Bug? - r

Very unexpected behaviour when parsing "yyyy-MM-dd'T'HH:mm:ssX"-string (ISO 8601)
> as_datetime("2017-03-22T15:48:00.000Z")
[1] "2017-03-21 23:00:00 UTC"
> packageDescription("lubridate")$Version
[1] "1.6.0"
Could someone explain the rationale for this?
edit: Seems like a bug, see issue #536

update: resolved in lubridate commit here (May 2017). Works with lubridate 1.7.4, probably some earlier versions as well.
Without digging into the guts of as_datetime,
I think this may be a combination of (1) as_datetime
not being able to handle (i.e., ignore) the T in your format;
(2) conversion from local to UTC time zone.
dstr <- "2017-03-22T15:48:00.000Z"
library(lubridate)
as_datetime(dstr)
## [1] "2017-03-22 04:00:00 UTC"
If as_datetime() ignores everything after the T
that gets us to midnight on 2017-03-22. However, this is
taken as midnight in my local time zone, which is four hours behind UTC (EDT),
so the resulting time is 04:00:00 UTC. Presumably your local time
is one hour ahead of UTC.
If you manually substitute a space for the T things work better (you can use
stringr::str_replace if you prefer)
as_datetime(sub("T"," ",dstr))
## [1] "2017-03-22 19:48:00 UTC"
Or use strptime:
strptime(dstr,format="%Y-%m-%dT%H:%M:%S")
## [1] "2017-03-22 15:48:00 EDT"
(note that strptime automatically discards trailing characters)
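Note that the trailing Z marks the timestamp as UTC, whereas strptime without a tz argument reads the clock time in the local zone (hence the EDT above). A minimal base R sketch, assuming the input is always UTC, passes tz = "UTC" explicitly; the variable name parsed is only illustrative:

```r
dstr <- "2017-03-22T15:48:00.000Z"

# %OS accepts the fractional seconds; the trailing "Z" is discarded,
# so the UTC zone has to be supplied through the tz argument.
parsed <- as.POSIXct(dstr, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

This keeps the 15:48:00 clock time and labels it as UTC instead of shifting it by the local offset.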
For what it's worth Dirk Eddelbuettel's anytime package handles this case:
anytime(dstr)
## [1] "2017-03-22 15:48:00 EDT"

If you have imported your data in the format shown here and want to convert it into a date-time object with lubridate, I would recommend the ymd_hms function.
In your case it would look like this:
ymd_hms("2017-03-22T15:48:00.000Z")
[1] "2017-03-22 15:48:00 UTC"

Related

How to convert a text string 20200821144500 into date and time format in r?

I have a date/time column saved like this 20200821144500. How do I convert it to date and time format?
I tried as.POSIXct('20200821144500',format="%Y-%m-%d %H:%M:%S") which returned NA.
Am I using the wrong format or is there another way to do it?
The key is to tell the parser the order in which year, month, day, etc. appear in the string, and to use upper-case HMS, because:
hms, hm and ms usage is defunct, please use HMS, HM or MS instead.
Deprecated in version '1.5.6'.
library(lubridate)
string <- "20200821144500"
parse_date_time(string, "ymd_HMS")
[1] "2020-08-21 14:45:00 UTC"
A possible solution, based on lubridate::ymd_hms:
lubridate::ymd_hms("20200821144500")
#> [1] "2020-08-21 14:45:00 UTC"
Another option using strptime with right format:
strptime(as.character(20200821144500),format="%Y%m%d%H%M%S")
#> [1] "2020-08-21 14:45:00 CEST"
Created on 2022-07-08 by the reprex package (v2.0.1)
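As for why the original attempt returned NA: the format "%Y-%m-%d %H:%M:%S" expects dashes, a space and colons, none of which occur in the input. A small base R sketch of the same conversion with as.POSIXct follows; tz = "UTC" is an assumption here, so substitute the zone the data was actually recorded in:

```r
string <- "20200821144500"

# The format must mirror the input exactly: no separators at all.
parsed <- as.POSIXct(string, format = "%Y%m%d%H%M%S", tz = "UTC")

# The format from the question contains separators the input lacks,
# so the match fails and the result is NA.
failed <- as.POSIXct(string, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M:%S")
is.na(failed)
```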

How to convert date to datetime to seconds since UNIX epoch in R with lubridate?

I'm noticing this very confusing behavior.
library(lubridate)
x = as_date(-25567)
as.integer(as_datetime(x)) # Returns NA
How can I get this to return the seconds since (or in this case before) UNIX epoch?
This works with base R, now that we covered that you really want as.Date("1970-01-01").
R> as.POSIXct("1900-01-01 00:00:00")
[1] "1900-01-01 CST"
R> as.numeric(as.POSIXct("1900-01-01 00:00:00"))
[1] -2208967200
R>
I vaguely recall some OS-level irritations for dates prior to the epoch. This may fail for you on the world's most commonly used OS but that is not really R's fault...
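One likely reason for the NA in the question is integer overflow rather than a date problem: 25567 days is 2208988800 seconds, which lies outside R's 32-bit integer range (about ±2.1e9), so as.integer() cannot represent it while as.numeric() can. A base R sketch under that assumption, using as.Date() in place of lubridate's as_date():

```r
x <- as.Date(-25567, origin = "1970-01-01")   # 1900-01-01

# Seconds relative to the epoch as a double; negative means "before 1970".
secs <- as.numeric(as.POSIXct(x))

# The magnitude exceeds the largest 32-bit integer ...
abs(secs) > .Machine$integer.max

# ... so coercing to integer gives NA (with a warning, suppressed here).
as_int <- suppressWarnings(as.integer(secs))
is.na(as_int)
```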

Problem with converting unix time zone in lubridate

I import a dataset from treasuredata in JSON format and then transform the Unix time column to standard time using the lubridate function as_datetime.
I set tz (the time zone) to GMT+7, as I am in Bangkok, Thailand, and used that to convert date-times in my analysis. Later, I found that this is wrong, and that to get the correct date-time I have to pass tz as "GMT-7". I don't understand why.
as_datetime(1565923100)
#[1] "2019-08-16 02:38:20 UTC"
as_datetime(1565923100, tz = "gmt+7")
#[1] "2019-08-15 19:38:20 gmt"
as_datetime(1565923100, tz = "gmt-7")
#[1] "2019-08-16 09:38:20 gmt"
now()
#[1] "2019-08-16 10:16:17 +07"
From the code, gmt-7 gives the correct time, although I think I should use gmt+7 because the Bangkok time zone is GMT+7. I checked with the now() function and got the correct time too. I don't understand the logic behind this. Thank you for any explanation.
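A probable explanation is the POSIX sign convention: in a POSIX-style zone string such as "gmt+7", and in the tz database names "Etc/GMT+7" and "Etc/GMT-7", a positive offset means west of Greenwich, the opposite of everyday usage. "gmt-7" therefore means seven hours ahead of UTC, which matches Bangkok. A hedged sketch that avoids the ambiguity by using the named zone "Asia/Bangkok" (base R is used here; the same tz value can be passed to as_datetime):

```r
ts <- 1565923100

# Named zone: unambiguous, and follows any future rule changes.
bangkok <- as.POSIXct(ts, origin = "1970-01-01", tz = "Asia/Bangkok")

# POSIX-style name: the sign is inverted, so "-7" means UTC+7.
etc_minus7 <- as.POSIXct(ts, origin = "1970-01-01", tz = "Etc/GMT-7")

format(bangkok, "%Y-%m-%d %H:%M:%S")
format(etc_minus7, "%Y-%m-%d %H:%M:%S")
```

Both calls print the same wall-clock time, which agrees with the "gmt-7" result in the question.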

Why R package lubridate can't parse vector with multiple formats?

I'm using package lubridate to parse a vector of heterogeneously-formatted dates and convert them to string, like this:
parse_date_time(c('12/17/1996 04:00:00 PM','4/18/1950 0130'), c('%m/%d/%Y %I:%M:%S %p','%m/%d/%Y %H%M'))
This is the result:
[1] NA NA
Warning message:
All formats failed to parse. No formats found.
If I remove the %p in the 1st format string, it incorrectly parses the 1st date string, and still doesn't parse the 2nd, like so:
[1] "1996-12-17 04:00:00 UTC" NA
Warning message:
1 failed to parse.
The 4PM time in the string is parsed to 4AM in the result.
Has anyone experienced this strange behavior?
This is probably related to your system locale.
parse_date_time {lubridate}
p : AM/PM indicator in the locale. Used in conjunction with I and not with H. An empty string in some locales.
Because different languages use different strings for AM/PM, if your locale is not English, lubridate will not pick up the AM/PM indicator even if you specify it.
The OS locale can cover display language, time format and time zone. I'm using English Windows with a US time zone and a Chinese locale, so I have been fighting with AM/PM in time parsing too.
Sys.getlocale("LC_TIME")
[1] "Chinese (Simplified)_China.936"
You can specify locale in parse_date_time {lubridate}, but it didn't work for me at first:
Sys.setlocale("LC_TIME", "en_US")
[1] ""
Warning message:
In Sys.setlocale("LC_TIME", "en_US") :
OS reports request to set locale to "en_US" cannot be honored
locales {base}
The locale describes aspects of the internationalization of a program. Initially most aspects of the locale of R are set to "C" (which is the default for the C language and reflects North-American usage).
strptime for uses of category = "LC_TIME".
Then I found this, which worked:
Sys.setlocale("LC_TIME", "C")
[1] "C"
After this the parsing works:
parse_date_time('12/17/1996 04:00:00 PM', '%m/%d/%Y %I:%M:%S %p')
[1] "1996-12-17 16:00:00 UTC"
You can also specify the time zone and locale directly in the call:
parse_date_time('12/17/1996 04:00:00 PM', '%m/%d/%Y %I:%M:%S %p', tz = "America/New_York", locale = "C")
[1] "1996-12-17 16:00:00 EST"
The problem with %p part is locale related. See this issue.
The inability to parse has to do with the way the lubridate guesser works.
There are two ways lubridate infers formats, flex and exact. With flex matching, all numeric elements can have flexible length (for example, both 4 and 04 for day will work), but there must be non-numeric separators between the elements. With the exact matcher there need not be non-numeric separators, but elements must have an exact number of digits (like 04).
Unfortunately you cannot combine both matchers within one expression. It would be extremely hard to fix this and preserve the current flexibility of the lubridate parser.
In your example
> parse_date_time('4/18/1950 0130', 'mdY HM')
[1] NA
Warning message:
All formats failed to parse. No formats found.
you want to perform flex matching on the date part 4/18/1950 and exact matching on time part 0130.
Please note that if your date-time is in fully flex, or fully exact format the parsing will work as expected:
> parse_date_time('04/18/1950 0130', 'mdY HM')
[1] "1950-04-18 01:30:00 UTC"
> parse_date_time('4/18/1950 1:30', 'mdY HM')
[1] "1950-04-18 01:30:00 UTC"
lubridate 1.4.1 "fixes" this by adding a new argument to parse_date_time, exact = FALSE. When set to TRUE, the orders argument is interpreted as containing exact strptime formats and no guessing or training is performed. This way you can add as many exact formats as you want, and you also gain speed because no guessing is performed at all.
> parse_date_time(c('12/17/1996 04:00:00','4/18/1950 0130'),
+ c('%m/%d/%Y %I:%M:%S','%m/%d/%Y %H%M'),
+ exact = TRUE)
[1] "1996-12-17 04:00:00 UTC" "1950-04-18 01:30:00 UTC"
Relatedly, there was an explicit request asking for such an option.

POSIXct: as.POSIXct("2008-03-30 02:00:00",format="%Y-%m-%d %H:%M:%S") fails. Is it a bug in R?

Very strange things happen while converting to POSIXct:
> as.POSIXct("2008-03-30 02:00:00",format="%Y-%m-%d %H:%M:%S")
[1] NA
but:
> as.POSIXct("2008-02-28 02:00:00",format="%Y-%m-%d %H:%M:%S")
[1] "2008-02-28 02:00:00 CET"
I am clueless. Is it a bug in R?
Does it perhaps have to do with my German (Berlin) locale?
I am using R 2.14.2 for windows.
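This is the beginning of daylight saving time in Germany in 2008, see this link. Clocks jump from 02:00 straight to 03:00 on that date, so the local time 02:00:00 does not exist. That time is effectively equivalent to "2008-03-30 03:00:00" (an hour later), which should work on your system. So, yes, this is related to your locale, or more precisely to your time zone.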
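The missing hour can be made visible with a short base R sketch. It assumes the tz database name "Europe/Berlin" is available on the system. How a nonexistent local time is handled is platform-dependent (NA on the asker's Windows setup), so the sketch only uses times that do exist:

```r
zone <- "Europe/Berlin"

# The last second before the switch and the first instant after it.
before <- as.POSIXct("2008-03-30 01:59:59", tz = zone)
after  <- as.POSIXct("2008-03-30 03:00:00", tz = zone)

# Only one second of real time separates them: 02:00:00 to 02:59:59
# never appears on the clock in this zone on that date.
gap <- as.numeric(difftime(after, before, units = "secs"))
gap

# UTC has no daylight saving time, so the same string parses fine there.
utc <- as.POSIXct("2008-03-30 02:00:00", tz = "UTC")
is.na(utc)
```

If the timestamps were actually recorded in UTC or another zone without daylight saving time, passing that tz explicitly avoids the NA.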
