Convert unconventional datetime string to POSIXct in R - r

What format string argument of the as.POSIXct() function would allow me to coerce the following timestamp into POSIXct?
datetime <- "2018/02/08T23:58:33z"
datetime <- as.POSIXct(datetime, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
Desired result
2018-02-08 23:58:33

Just put a literal "T" in the string to match (trailing characters are ignored by default anyway):
as.POSIXct(datetime, format = "%Y/%m/%dT%H:%M:%S", tz = "UTC")
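A quick check of both points, as a minimal sketch in base R. The format from the question has a space where the input has "T", so it gives NA. With the literal "T" the string parses, and the trailing "z" is left unmatched and dropped. Note that it is the tz argument, not the "z", that makes the result UTC.

```r
datetime <- "2018/02/08T23:58:33z"

# Format from the question: a space where the input has "T", so parsing fails
bad <- as.POSIXct(datetime, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
is.na(bad)
#[1] TRUE

# Literal "T" in the format; the trailing "z" is not matched and is ignored
good <- as.POSIXct(datetime, format = "%Y/%m/%dT%H:%M:%S", tz = "UTC")
format(good, "%Y-%m-%d %H:%M:%S", tz = "UTC")
#[1] "2018-02-08 23:58:33"
```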

Another option is the anytime package, which detects the format automatically:
anytime::anytime(datetime, tz = "UTC", asUTC = TRUE)
#[1] "2018-02-08 23:58:33 UTC"

Related

Caused by error in `as.POSIXlt.character()`: ! character string is not in a standard unambiguous format. How can I change CHR to POSIXCT?

I have a huge data frame called 'cyclist_trip_data_all'
head(cyclist_trip_data_all)
The most important columns of dates have chr class and I need it to be converted to POSIXct for calculations.
combined_ctd <-
mutate(cyclist_trip_data_all, ride_length =
difftime(ended_at,started_at,units='mins'))
The above throws the "not in a standard unambiguous format" error, and I'm pretty sure that is because the two columns are of class chr.
cyclist_trip_data_all %>%
transmute(ended_at = as.POSIXct(ended_at,tz="",tryFormats=
c("%Y-%m-%d %H:%M:%OS",
"%Y/%m/%d %H:%M:%OS",
"%Y-%m-%d %H:%M",
"%Y/%m/%d %H:%M",
"%Y-%m-%d",
"%Y/%m/%d")))
head(cyclist_trip_data_all$ended_at)
Using the above code throws a similar "unambiguous format" error:
Error in `transmute()`:
! Problem while computing `ended_at = as.POSIXct(...)`.
Caused by error in `as.POSIXlt.character()`:
! character string is not in a standard unambiguous format
Traceback:
1. cyclist_trip_data_all %>% transmute(ended_at = as.POSIXct(ended_at,
. tz = "", tryFormats = c("%Y-%m-%d %H:%M:%OS", "%Y/%m/%d %H:%M:%OS",
. "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M", "%Y-%m-%d", "%Y/%m/%d")))
2. transmute(., ended_at = as.POSIXct(ended_at, tz = "", tryFormats = c("%Y-%m-%d %H:%M:%OS",
. "%Y/%m/%d %H:%M:%OS", "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M",
. "%Y-%m-%d", "%Y/%m/%d")))
3. transmute.data.frame(., ended_at = as.POSIXct(ended_at, tz = "",
. tryFormats = c("%Y-%m-%d %H:%M:%OS", "%Y/%m/%d %H:%M:%OS",
. "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M", "%Y-%m-%d", "%Y/%m/%d")))
4. mutate_cols(.data, dots, caller_env = caller_env())
5. withCallingHandlers({
. for (i in seq_along(dots)) {
I've also tried as.POSIXct(as.numeric(as.character())) instead, but it turns every date value in the column into NA ("NAs introduced by coercion").
Also tried:
cyclist_trip_data_all$ended_at <- as.POSIXct(cyclist_trip_data_all$ended_at,format="%Y-%m-%d %H:%M:%S",tz="UTC")
cyclist_trip_data_all$started_at <- as.POSIXct(cyclist_trip_data_all$started_at,format="%Y-%m-%d %H:%M:%S",tz="UTC")
And it also changes every value to NA.
How do I actually change it to a POSIXct class without it becoming NA?
To note, this error doesn't happen in RStudio but it's happening in my Kaggle notebook for R.
There are alternatives -- at some point I found having to chase (known) formats to be too repetitive and boring and wrote a package that does it for me (and fast):
> library(anytime)
> datevec <- c("5/30/2021 11:58", "5/30/2021 12:10", "5/30/2021 11:29")
> anytime(datevec)
[1] "2021-05-30 11:58:00 CDT" "2021-05-30 12:10:00 CDT" "2021-05-30 11:29:00 CDT"
>
The example just shows the first three of your (non-reproducibly presented) dates. The package has other functions too for converting dates, or specific timezones as well as formatters. Take a look: anytime at CRAN -- and yes it of course also works in pipes and with other packages and whatnot. It "just" aims to take care of converting 'any time or date in any format' to POSIXct or Date.
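If you would rather stay in base R, here is a minimal sketch. It assumes the columns use the month/day/year layout of the three sample values above. The data frame `trips` and its `ended_at` values are made up for illustration. An all-NA result from as.POSIXct usually means the format string does not match the data, so the sketch checks for NA before computing the ride length.

```r
# Hypothetical stand-in for cyclist_trip_data_all (ended_at values are invented)
trips <- data.frame(
  started_at = c("5/30/2021 11:58", "5/30/2021 12:10", "5/30/2021 11:29"),
  ended_at   = c("5/30/2021 12:20", "5/30/2021 12:15", "5/30/2021 12:00"),
  stringsAsFactors = FALSE
)

fmt <- "%m/%d/%Y %H:%M"  # month first, four-digit year, no seconds
trips$started_at <- as.POSIXct(trips$started_at, format = fmt, tz = "UTC")
trips$ended_at   <- as.POSIXct(trips$ended_at,   format = fmt, tz = "UTC")

# Fail loudly if the assumed format did not match every row
stopifnot(!anyNA(trips$started_at), !anyNA(trips$ended_at))

trips$ride_length <- difftime(trips$ended_at, trips$started_at, units = "mins")
as.numeric(trips$ride_length)
#[1] 22  5 31
```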

as.POSIXct, timezone part is ignored

Here is my example.
test <- as.POSIXct(as.Date("2019-11-01"), tz = "UTC")
test
It prints:
[1] "2019-10-31 19:00:00 CDT"
It looks like it ignored tz parameter:
attr(test, "tzone")
returns NULL.
Why is it coming with "19" hours and not 00? How can I make it 00 hours and take UTC?
UPDATE
Here is even better case:
test_2 <- as.POSIXct("2019-11-01 00:00:00", tz = "UTC")
str(test_2)
attr(test_2, "tzone")
strftime(test_2, "%H")
It generates:
POSIXct[1:1], format: "2019-11-01"
[1] "UTC"
[1] "19"
Now it looks like the tz parameter is not ignored, but the hour is still 19. Why?
We can use with_tz from lubridate
library(lubridate)
test1 <- with_tz(as.Date("2019-11-01"), tzone = 'UTC')
attr(test1, 'tzone')
#[1] "UTC"
Also, as.POSIXct can be directly applied
test2 <- as.POSIXct("2019-11-01", tz = "UTC")
test2
#[1] "2019-11-01 UTC"
attr(test2, 'tzone')
#[1] "UTC"
With strftime, use the option for tz
strftime(test2, "%H", tz = 'UTC')
#[1] "00"
Looking at the source of strftime, it performs a second conversion with as.POSIXlt, using its own tz argument (which defaults to ""), and then calls format to produce the output:
strftime
function (x, format = "", tz = "", usetz = FALSE, ...)
format(as.POSIXlt(x, tz = tz), format = format, usetz = usetz,
...)
According to ?as.POSIXlt
tz - time zone specification to be used for the conversion, if one is required. System-specific (see time zones), but "" is the current time zone, and "GMT" is UTC (Universal Time, Coordinated). Invalid values are most commonly treated as UTC, on some platforms with a warning.
as.POSIXlt(test2)$hour
#[1] 0
as.POSIXlt(test2, tz = "")$hour
#[1] 20
tz = "" means the current time zone, so it gives the same result as passing Sys.timezone() explicitly:
as.POSIXlt(test2, tz = Sys.timezone())$hour
#[1] 20
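To summarise the behaviour above, here is a small base R sketch. The POSIXct value is a single instant and the time zone only affects how it is displayed. "America/Chicago" is an assumption standing in for the asker's system zone, inferred from the "CDT" in the printed output.

```r
x <- as.POSIXct("2019-11-01 00:00:00", tz = "UTC")

# One instant, stored as seconds since 1970-01-01 00:00:00 UTC
as.numeric(x)
#[1] 1572566400

# format() without tz falls back to the object's "tzone" attribute
format(x, "%H")
#[1] "00"

# The same instant displayed in the assumed system zone (UTC-5 on that date)
format(x, "%H", tz = "America/Chicago")
#[1] "19"

# strftime() defaults to tz = "", the system zone, so pass tz explicitly
strftime(x, "%H", tz = "UTC")
#[1] "00"
```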

String column to Dtm(date time) column in R

How does one convert a column from str to dtm? I've tried as.Date and strptime and neither of them works. Say I have a table with a column holding 3 values (2003/11/04 19:29, 2001/04/02 21:32, 2003/10/28 09:51) stored as strings. How would I convert this column so that it is in the dtm format? Thank you in advance!
Check ?strptime for different format arguments. You can do:
x <- c('2003/11/04 19:29', '2001/04/02 21:32', '2003/10/28 09:51')
as.POSIXct(x, format = "%Y/%m/%d %H:%M", tz = "UTC")
#Can also be done with `strptime`
#strptime(x, format = "%Y/%m/%d %H:%M", tz = "UTC")
#[1] "2003-11-04 19:29:00 UTC" "2001-04-02 21:32:00 UTC" "2003-10-28 09:51:00 UTC"
Or with lubridate
lubridate::ymd_hm(x)
Replace x with column name df$column_name.
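A minimal sketch of the column version, using a made-up data frame `df` with a hypothetical `when` column. Once converted, the column sorts and compares chronologically.

```r
# Hypothetical data frame; only the three timestamps come from the question
df <- data.frame(
  id   = 1:3,
  when = c("2003/11/04 19:29", "2001/04/02 21:32", "2003/10/28 09:51"),
  stringsAsFactors = FALSE
)

# Overwrite the character column with its POSIXct equivalent
df$when <- as.POSIXct(df$when, format = "%Y/%m/%d %H:%M", tz = "UTC")
class(df$when)
#[1] "POSIXct" "POSIXt"

# Rows ordered from earliest to latest timestamp
df$id[order(df$when)]
#[1] 2 3 1
```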

R: ISO 8601 converting UTC to ISO 8601. An hour is added why?

So I'm looking to convert some ISO 8601 time to UTC format in R. For example:
library("lubridate")
x <- "2010-04-14-01-00-00-UTC"
datetime <- lubridate::ymd_hms(x)
datetime
[1] "2010-04-14 01:00:00 UTC"
strftime(datetime, "%Y-%m-%dT%H:%M:%SZ")
[1] "2010-04-14T02:00:00Z"
However, in ISO 8601 "Z" indicates UTC time, so I would have expected "2010-04-14T01:00:00Z", but an hour has been added to the datetime. Why? Am I misunderstanding something?
What is the correct way in R to convert between the two? And to convert backwards?
From the documentation:
strftime(x, format = "", tz = "", usetz = FALSE, ...)
[...]
tz A character string specifying the time zone to be used for the
conversion. System-specific (see as.POSIXlt), but "" is the current
time zone, and "GMT" is UTC. Invalid values are most commonly treated
as UTC, on some platforms with a warning.
So, you need to specify the correct timezone:
strftime(datetime, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
#[1] "2010-04-14T01:00:00Z"
Otherwise it takes the timezone from your system's locale settings:
strftime(datetime, "%Y-%m-%dT%H:%M:%S", usetz = TRUE)
#[1] "2010-04-14T03:00:00 CEST"
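For the reverse direction, a minimal base R sketch of the round trip. The "Z" in the format string is just a literal character, so the output is only correct when tz = "UTC" is given when formatting, and the parsed value is only correct when tz = "UTC" is given when parsing.

```r
x <- as.POSIXct("2010-04-14 01:00:00", tz = "UTC")

# POSIXct -> ISO 8601 string; tz = "UTC" is what makes the literal "Z" truthful
iso <- format(x, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
iso
#[1] "2010-04-14T01:00:00Z"

# ISO 8601 string -> POSIXct; "T" and "Z" are matched as literal characters
back <- as.POSIXct(iso, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
back == x
#[1] TRUE
```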

Remove timezone during POSIXlt Conversion in R

I have a datetime column (a factor) in my data frame with values such as "15-10-2017 16:41:00".
I want this data converted to "2017-10-15 16:41:00".
When I try to convert it, the time zone is included in the output as well.
I tried using tz="" and usetz=F, but it made no difference.
Any suggestions?
Code:
as.POSIXlt("15-10-2017 16:41:00",format = "%d-%m-%Y %H:%M:%S")
[1] "2017-10-15 16:41:00 IST"
From the help page of as.POSIXlt:
"" is the current time zone
which is the default.
That's why it does not work. You could remove the timezone information this way, and it will not show while printing:
my_datetime <- as.POSIXlt("15-10-2017 16:41:00",format = "%d-%m-%Y %H:%M:%S")
my_datetime$zone <- NULL
my_datetime
but I don't understand why you would want to do that. You should convert to GMT if you don't want to worry about the time zone. Also, the lubridate package has a nice force_tz function if you need to force a specific time zone.
If you are ok storing the datetime as a character instead of as a POSIXlt, then you can use strftime():
my_datetime <- as.POSIXlt("15-10-2017 16:41:00",format = "%d-%m-%Y %H:%M:%S")
strftime(my_datetime)
I do it like this:
strip.tz <- function(dt) {
fmt <- "%Y-%m-%d %H:%M:%S"
strptime(strftime(dt, format = fmt, tz=""), format = fmt, tz="UTC")
}
and you would use it like this:
my_datetime <- as.POSIXct("15-10-2017 16:41:00",format = "%d-%m-%Y %H:%M:%S")
strip.tz(my_datetime)
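If only the displayed text matters, a minimal sketch is to parse with a fixed zone and then format without the zone. The variable `raw` is a made-up stand-in for the factor column, and the choice of "UTC" is an assumption. Any fixed zone works as long as the same one is used for parsing and formatting.

```r
# Hypothetical factor value, as in the question's column
raw <- factor("15-10-2017 16:41:00")

# Convert the factor to character first, then parse day-month-year in a fixed zone
parsed <- as.POSIXct(as.character(raw), format = "%d-%m-%Y %H:%M:%S", tz = "UTC")

# format() returns plain text; usetz defaults to FALSE, so no zone suffix is added
format(parsed, "%Y-%m-%d %H:%M:%S")
#[1] "2017-10-15 16:41:00"
```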
