guess_formats + R + lubridate - r

I'm having trouble understanding how to use the guess_formats function in lubridate. I have a vector of dates in some unknown set/order of formats. I'd like to convert them to a Date object (or at least convert as many as possible). The following code is what I've tried:
library(lubridate)
sampleDates <- c("4/6/2004","4/6/2004","4/6/2004","4/7/2004",
"4/6/2004","4/7/2004","2014-06-28","2014-06-30","2014-07-12",
"2014-07-29","2014-07-29","2014-08-12")
formats <- guess_formats(sampleDates, c("Ymd", "mdY"))
dates <- as.Date(sampleDates, format=formats)
This gives all NA's.
This is obviously just a short example. In the real case, I wouldn't know where the various formats are scattered about, and I wouldn't be 100% sure there are only %m/%d/%Y and %Y-%m-%d. Could someone let me know either A. how would guess_formats be used in this example or B. is there something more appropriate to use in lubridate/base R, hopefully without a lot of regex'ing. Thanks!
Edit:
I've also tried parse_date_time. What I don't understand is the following works for this example:
parse_date_time(sampleDates,
orders = c("Ymd", "mdY"),
locale = "eng")
But this does not:
parse_date_time(sampleDates,
orders = c("mdY", "Ydm"),
locale = "eng")
In my actual set of data, I will not know the order of the formatting, which seems to be important for this function.
Double Edit: Dur, OK, I see I had Ymd in the first parse_date_time example and Ydm in the second...carry on.

No need to call guess_formats; just use parse_date_time:
parse_date_time(sampleDates, c("Ymd", "mdY"))
[1] "2004-04-06 UTC" "2004-04-06 UTC" "2004-04-06 UTC" "2004-04-07 UTC" "2004-04-06 UTC"
[6] "2004-04-07 UTC" "2014-06-28 UTC" "2014-06-30 UTC" "2014-07-12 UTC" "2014-07-29 UTC"
[11] "2014-07-29 UTC" "2014-08-12 UTC"
Internally it will call guess_formats.
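Since the real data may contain formats beyond the two listed, it can help to check which inputs were not parsed. A minimal sketch, assuming a shortened input vector with one deliberately bad value (the names parsed, dates and unparsed are illustrative):

```r
library(lubridate)

# Shortened example input; the last value is intentionally not a date
sampleDates <- c("4/6/2004", "2014-06-28", "not a date")

# quiet = TRUE suppresses the "failed to parse" warning; failures become NA
parsed <- parse_date_time(sampleDates, orders = c("Ymd", "mdY"), quiet = TRUE)

# Drop the time component to get Date objects
dates <- as.Date(parsed)

# Inputs that matched none of the supplied orders
unparsed <- sampleDates[is.na(dates)]
```

Inspecting unparsed shows which extra orders, if any, need to be added to the orders vector.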

A general purpose option that does a good job at matching date formats is the anytime package:
library(anytime)
anydate(sampleDates)
[1] "2004-04-06" "2004-04-06" "2004-04-06" "2004-04-07" "2004-04-06" "2004-04-07" "2014-06-28"
[8] "2014-06-30" "2014-07-12" "2014-07-29" "2014-07-29" "2014-08-12"

Related

POSIXct formatting in R?

I'm trying to convert characters into POSIXct format using lubridate's parse_date_time function.
This is my code:
df$TimeClosed <- parse_date_time(df$TimeClosed, 'mdy HMSp')
Most of my variables are in this format : 09/12/2017 11:08:51 AM
but there are some that are like this : 7/6/19 15:37
and because of that, they are failing to parse.
Any suggestions on how I can fix this so all date/times are in POSIX and in the same format?
Thanks,
parse_date_time can take a vector of orders. It guesses the matching formats from the input and applies them in turn, so elements that one format fails to parse are retried with the next.
library(lubridate)
x <- c("09/12/2017 11:08:51 AM", "7/6/19 15:37")
parse_date_time(x, c("mdYHMSp", "mdyHM"))
## [1] "2017-09-12 11:08:51 UTC" "2019-07-06 15:37:00 UTC"
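The same fallback idea can be written out by hand in base R. This is only a sketch of the technique, not how lubridate is implemented. It assumes both layouts are month/day/year, as in the question's own 'mdy HMSp' attempt, and that the locale spells the %p indicator as AM/PM. The names formats, out and todo are illustrative.

```r
x <- c("09/12/2017 11:08:51 AM", "7/6/19 15:37")

# Candidate strptime formats, tried in this order
formats <- c("%m/%d/%Y %I:%M:%S %p", "%m/%d/%y %H:%M")

# Start with all-NA results, then fill in whatever each format can parse
out <- as.POSIXct(rep(NA, length(x)), tz = "UTC")
for (fmt in formats) {
  todo <- is.na(out)  # elements not parsed by an earlier format
  out[todo] <- as.POSIXct(x[todo], format = fmt, tz = "UTC")
}
out
```

Anything still NA after the loop matched none of the listed formats.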

reformatting datetime in R

I have data "A" in the format chr "5/7/2021 15:15". I would like to convert it to a format which R will recognize. (It is giving me errors when I try to plot, for instance, which leads me to believe it needs to be reformatted.)
Here is the format "B" I would like to achieve. R seems to like this ok, so I might as well match it (?):
POSIXct, format: "2021-8-11 16:00:00". I am not sure if the seconds are needed, and they do not exist in data "A" so the seconds could be omitted. If R doesn't care then I don't either. The timezone is UTC.
How do I do it? I have tried a couple things, including:
CTD_datetime_UTC <- as.POSIXct(CTD$Date.and.Time, tz = "UTC").
You can use strptime from base R. But there are many parsers for dates...
Assuming the format is "day/month/year" (example is not unambiguous, could also be "month/day/year")
strptime("5/7/2021 15:15", "%d/%m/%Y %H:%M", tz = "UTC")
Returns:
[1] "2021-07-05 15:15:00 UTC"
Using parsedate, which reads the same string as month/day/year:
library(parsedate)
parse_date("5/7/2021 15:15")
[1] "2021-05-07 15:15:00 UTC"
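Because "5/7/2021" is ambiguous, it may be safer to state the intended order explicitly. A short sketch using lubridate's dmy_hm and mdy_hm, which are not part of the answer above and which default to UTC:

```r
library(lubridate)

a <- "5/7/2021 15:15"

# Day first: 5 July 2021
as_day_first <- dmy_hm(a)

# Month first: 7 May 2021
as_month_first <- mdy_hm(a)

format(as_day_first, "%Y-%m-%d %H:%M:%S")    # "2021-07-05 15:15:00"
format(as_month_first, "%Y-%m-%d %H:%M:%S")  # "2021-05-07 15:15:00"
```

Both results are POSIXct, so they can be plotted directly. Which one is right depends on how the data was recorded.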

How to convert a date character column (with Chinese characters) to a datetime format column in a table in R?

Is there a way to convert a date character column (with Chinese characters) to a datetime format column in a table in R? The column is now in the form of "xxx年xxx月xxx日". I am a bit new to R and I know that "as.Date" can only read certain formats of date characters like xxx-xxx-xxx (where xxx are numerical values), but it fails to recognize the format in my case.
Basically the data looks like this:
dput(head(lands_full$contract_signed_date))
c("2004年10月11日", "2008年09月10日", "2011年10月25日",
"2011年12月31日", "2018年08月07日", "2016年06月24日"
)
Any help would be appreciated!
This can be done with the package lubridate:
lubridate::parse_date_time(x, '%Y年%m月%d日')
# [1] "2004-10-11 UTC" "2008-09-10 UTC" "2011-10-25 UTC" "2011-12-31 UTC" "2018-08-07 UTC"
# [6] "2016-06-24 UTC"
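The time zone can be set with the tz argument of parse_date_time.
Alternatively, in base R, replace each non-digit character with a hyphen and parse the result: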
strptime(gsub("\\D","-",x),"%F")
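If a plain Date column is wanted instead of a date-time, the substitution can be spelled out step by step. A minimal sketch, assuming every value has the year, month, day layout shown in the question (the names cleaned and dates are illustrative):

```r
x <- c("2004年10月11日", "2008年09月10日")

# Turn each run of non-digits into "-" and drop the trailing one: "2004-10-11"
cleaned <- sub("-$", "", gsub("\\D+", "-", x))

# Parse the cleaned strings as dates
dates <- as.Date(cleaned, format = "%Y-%m-%d")

# Date objects support arithmetic, e.g. the gap between consecutive dates
diff(dates)
```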

Change string including "T07:57:00Z" to a datetime type?

I have a list of strings which take the following form: "2019-03-05T07:57:00Z" and I need to convert them to a Date data type so that I can do calculations with them, however I can't get R to recognize the "T07:57:00Z" as a time. The format itself doesn't matter as long as it is in a form where I can do calculations, does R have a way to handle this form of date?
Use lubridate for datetime use cases:
lubridate::ymd_hms("2019-03-05T07:57:00Z")
[1] "2019-03-05 07:57:00 UTC"
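Or with base R (pass tz = "UTC", because the trailing Z marks UTC and the default would be the local time zone):
(res <- as.POSIXct("2019-03-05T07:57:00Z", format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"))
# [1] "2019-03-05 07:57:00 UTC"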
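Once the strings are POSIXct values, calculations work directly. A small sketch in base R, where the second timestamp is made up for illustration and iso_format is just a helper name:

```r
iso_format <- "%Y-%m-%dT%H:%M:%SZ"

# Parse both timestamps as UTC, since the trailing Z denotes UTC
t1 <- as.POSIXct("2019-03-05T07:57:00Z", format = iso_format, tz = "UTC")
t2 <- as.POSIXct("2019-03-05T09:27:00Z", format = iso_format, tz = "UTC")  # hypothetical second value

# Elapsed time between the two, in minutes
elapsed <- difftime(t2, t1, units = "mins")
as.numeric(elapsed)  # 90
```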

Format multiple date formats in one columns using lubridate

Sometimes I am given data sets that have two different date formats but common variables, and they have to be joined into one data frame. Over the years, I've tried various solutions to get around this workflow hassle. Now that I've been using lubridate, it seems like many of these problems are easily solved. However, I am encountering some behaviour that seems weird to me, though I imagine there is a good explanation that is beyond me. Say I am given a data set with different date formats that I join into one data frame. This data frame looks like this:
library(lubridate)
library(dplyr)
df<-data.frame(Lab=c("A","B"),DATE=c("12/15/15","12/15/2013")); df
I want to convert this data to a date format with lubridate. However the following does not format consistently:
df %>%
mutate(mdy(DATE))
...but rather creates a 0015 date. If I filter just for Lab "A":
df %>%
filter(Lab=="A") %>%
mutate(mdy(DATE))
... or even group_by Lab:
df %>%
group_by(Lab) %>%
mutate(mdy(DATE))
Then I get the desired year format. Is this the correct behaviour of the lubridate family of date formatting functions? Is there a better way to accomplish what I am doing? I am sure that multiple date formats in one column is a relatively common (and annoying) occurrence.
Thanks in advance.
parse_date_time from the lubridate package can parse multiple date formats in one go.
Syntax:
df$date = parse_date_time(df$date, c(format1, format2, format3))
You need to specify all the possible formats.
Because lubridate's default format selection can misread some inputs (here, two-digit years), you may need to supply a custom selection function.
The help page for parse_date_time includes the illustration below, which you can adapt to your needs.
## ** how to use `select_formats` argument **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC" "2013-09-27 UTC"
## to give priority to %y format, define your own select_format function:
my_select <- function(trained){
n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5
names(trained[ which.max(n_fmts) ])
}
parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## [1] "2013-09-27 UTC" "2013-09-27 UTC"
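When the only difference between the formats is a two-digit versus four-digit year, the two groups can also be separated explicitly before parsing. A base R sketch using the dates from the question, assuming month/day/year throughout (the names two_digit and out are illustrative):

```r
DATE <- c("12/15/15", "12/15/2013")

# TRUE where the string ends in a slash followed by exactly two digits
two_digit <- grepl("/\\d{2}$", DATE)

# Parse each group with the format that fits it
out <- as.Date(rep(NA, length(DATE)))
out[two_digit]  <- as.Date(DATE[two_digit],  format = "%m/%d/%y")
out[!two_digit] <- as.Date(DATE[!two_digit], format = "%m/%d/%Y")
out
```

This avoids any guessing at the cost of hard-coding the two layouts, so it only fits when the set of formats is known.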
