Parsing 12-hour times using lubridate

I'm trying to get my head around parsing 12-hour times using lubridate. If I run
library(lubridate)
times <- c("1:30 AM", "6:29 AM", "6:59 AM", "9:54 AM", "2:45 PM")
hm(times)
I get
[1] "1H 30M 0S" "6H 29M 0S" "6H 59M 0S" "9H 54M 0S" "2H 45M 0S"
Note that the AM/PM designation is not used. However, if the time strings also include a date, the parsing works:
ymd_hm(paste("01-01-01", times))
[1] "2001-01-01 01:30:00 UTC" "2001-01-01 06:29:00 UTC"
[3] "2001-01-01 06:59:00 UTC" "2001-01-01 09:54:00 UTC"
[5] "2001-01-01 14:45:00 UTC"
It seems to me that the time-parsing functions (hm, hms, ...) don't recognize AM/PM, but the date-time functions do. Is it possible to parse 12-hour times without going through dates?
[I know I can do this by manipulating the strings, but I was wondering if it was possible within lubridate.]

The two objects belong to different classes, each designed for a specific purpose.
With the first function you create a Period object. This class is designed to represent lengths of time, such as a race time: "how long does Bolt take to run 100 meters?" is 0 hours, 0 minutes, 9.58 seconds.
See:
a <- hm(times)
class(a)
[1] "Period"
attr(,"package")
[1] "lubridate"
The second function, ymd_hm, creates an object of a different class:
b <- ymd_hm(paste("01-01-01", times))
class(b)
[1] "POSIXct" "POSIXt"
This class is designed to represent points in time on a calendar (Gregorian, or possibly others). It also parses AM/PM, which is vital for telling apart the hours of the day on a 12-hour clock.
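If only the clock time is needed, one possible workaround is to convert the 12-hour strings to 24-hour strings first and pass those to hm(). The sketch below uses base R's strptime() rather than anything lubridate-specific, and it assumes a locale in which %p recognises "AM"/"PM" (an English or C locale). The variable names are illustrative.

```r
times <- c("1:30 AM", "6:29 AM", "6:59 AM", "9:54 AM", "2:45 PM")

# Parse as 12-hour clock times; the date part defaults to today and is discarded below
parsed <- strptime(times, format = "%I:%M %p", tz = "UTC")

# Re-express as 24-hour "HH:MM" strings
times_24h <- format(parsed, "%H:%M")
times_24h
```

The resulting strings can then be passed to hm(), so that the last element becomes a period of 14 hours and 45 minutes rather than 2 hours and 45 minutes.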

Related

How to stop lubridate dropping the hour when it's just gone midnight

How do I get lubridate to keep the hour when the time is e.g. 00:30:42? hms() outputs 30M 42S, whereas I want 0H 30M 42S.
library(lubridate)
hms("00:32:40")
[1] "32M 40S"
The reason I need this is because I'm using the time to put together a datetime:
ymd_hms("2022-09-13 0H 40M 32S")
[1] "2022-09-13 00:40:32 UTC"
ymd_hms("2022-09-13 40M 32S")
[1] NA
Warning message:
All formats failed to parse. No formats found.

It depends a bit on what your original data looks like. With strings:
ymd("2022-09-13") + hms("00:32:40")
ymd("2022-09-13") + as.duration("32M 40S")
or, as @king_of_limes already suggested in his answer:
ymd_hms(paste("2022-09-13", "00:32:40"))
All will output:
[1] "2022-09-13 00:32:40 UTC"
If we truly want to work with a single string whose time lacks the hours, we can create a ymd_ms function, which does not exist in lubridate. It is a bit more robust than needed: it supports both (HH:)MM:SS and (H) M S formats.
# let's create a lubridate-style function ymd_ms(), which lubridate itself does not provide
ymd_ms <- function(x) {
  # insert "0H " in front of a time that starts at the minutes
  ymd_hms(gsub("(\\d )(\\d{1,2}M|(\\d{1,2}:\\d{1,2})$)", "\\10H \\2", x, perl = TRUE))
}
v <- c("2022-09-13 40M 32S", "2022-09-13 3H 40M 32S", "2022-09-13 2H 40:32",
       "2022-09-13 12:40:32", "2022-09-13 40:32", "2022-09-13 11H 40:32 PM",
       "2022-09-13 11:40:32 PM", "2022-09-13 11:40:32 AM")
ymd_ms(v)
# [1] "2022-09-13 00:40:32 UTC" "2022-09-13 03:40:32 UTC" "2022-09-13 02:40:32 UTC" "2022-09-13 12:40:32 UTC" "2022-09-13 00:40:32 UTC" "2022-09-13 23:40:32 UTC" "2022-09-13 23:40:32 UTC"
# [8] "2022-09-13 11:40:32 UTC"
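To see what the substitution inside ymd_ms() does before any parsing happens, the gsub() call can be run on its own. This is only an illustrative sketch in base R; the names pattern, v_small and rewritten are made up for the example.

```r
# Same pattern and replacement as in ymd_ms() above
pattern <- "(\\d )(\\d{1,2}M|(\\d{1,2}:\\d{1,2})$)"

v_small <- c("2022-09-13 40M 32S",     # minutes first: gets "0H " inserted
             "2022-09-13 3H 40M 32S",  # already has hours: left alone
             "2022-09-13 40:32",       # MM:SS at the end: gets "0H " inserted
             "2022-09-13 12:40:32")    # full HH:MM:SS: left alone

# "\\1" is the digit-plus-space before the time, followed by a literal "0H "
rewritten <- gsub(pattern, "\\10H \\2", v_small, perl = TRUE)
rewritten
# [1] "2022-09-13 0H 40M 32S" "2022-09-13 3H 40M 32S" "2022-09-13 0H 40:32"
# [4] "2022-09-13 12:40:32"
```

Only the strings whose time starts at the minutes are changed, which is why the same function can be applied to mixed input.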

If you have two strings
a <- "2022-09-13"
b <- "00:32:40"
then you can do
lubridate::ymd_hms(paste(a,b))

Working with difficult AM/PM formats and REGEX with lubridate in R

Hello everyone, I am trying to work with some AM/PM formats in lubridate in R, but I can't come up with a proper solution. I hope you can correct me and help me out!
I have a HUGE dataset whose date-times come in a very unusual format:
First a number for the day, then the month (abbreviated OR fully spelled out), then a 12-hour time, and finally the string "a. m." or "p. m.", sometimes with extra spaces or missing dots, such as "a. m". For example, take a look at this vector:
dates<-c("02 dec 05:47 a. m",
"7 November 09:47 p. m.",
"3 jul 12:28 a.m.",
"23 sept 08:53 a m.",
"7 may 09:05 PM")
These make up more than 95% of the unusual date-time formats in the data set. I have been trying to use the lubridate function
ydm_hm(paste(2021,dates))
because all dates are from 2021, but I always get:
[1] NA NA NA
[4] NA "2021-05-07 21:05:00 UTC"
Warning message:
4 failed to parse.
The 4 that fail to parse give me NAs, and the only one that parses is correct. I notice that this one has AM/PM in uppercase letters without dots, but most of the time my formats will look like this:
ydm_hm("7 may 09:05 p.m.")
and this gives me NA...
So I feel the only way to get these dates to work is to change their structure using regex, converting every "a. m." / "p. m." variant into plain "AM" or "PM". After analyzing the data I realized that all "p.m" or "a. m." strings come ONE or TWO spaces after the 12-hour time, which always has a length of 5 characters. So the regex pattern should consider the following:
the string begins with one or two digits, then spaces, then letters (the month, abbreviated or fully spelled out), then spaces, then 5 characters (the 12-hour time), and then letters, spaces and dots covering all possible a.m. and p.m. formats. I have tried to convert the structure of the dates with no luck. If you could help me I would be very thankful. I don't know whether there is another way, or another package in R, that would resolve this without regex. Thank you everyone for your help!
My desired output is:
"2021-12-02 05:47:00 UTC"
"2021-11-07 09:47:00 UTC"
"2021-07-03 12:28:00 UTC"
"2021-09-23 08:53:00 UTC"
"2021-05-07 21:05:00 UTC"

In this case, parse_date from parsedate works
library(parsedate)
parse_date(paste(2021, dates))
Output:
[1] "2021-12-02 05:47:00 UTC"
[2] "2021-11-07 09:47:00 UTC"
[3] "2021-07-03 12:28:00 UTC"
[4] "2021-09-23 08:53:00 UTC"
[5] "2021-05-07 21:05:00 UTC"
Or, if the second value should be PM, use str_remove_all to strip the dots and spaces inside the am/pm marker first, so that the marker is recognised:
library(stringr)
parse_date(paste(2021, str_remove_all(dates,
"(?<=[A-Za-z])[. ]+(?=[A-Za-z])")))
[1] "2021-12-02 05:47:00 UTC"
[2] "2021-11-07 21:47:00 UTC"
[3] "2021-07-03 00:28:00 UTC"
[4] "2021-09-23 08:53:00 UTC"
[5] "2021-05-07 21:05:00 UTC"
With ydm_hm, the issue is that the am/pm markers contain spaces and dots in varying combinations (one of them has a space without the dot), and these may not get parsed. We can normalise the markers by removing those characters:
library(lubridate)
library(stringr)
ydm_hm(paste(2021, str_remove_all(dates,
"(?<=[A-Za-z])[. ]+(?=[A-Za-z])")))
[1] "2021-12-02 05:47:00 UTC"
[2] "2021-11-07 21:47:00 UTC"
[3] "2021-07-03 00:28:00 UTC"
[4] "2021-09-23 08:53:00 UTC"
[5] "2021-05-07 21:05:00 UTC"

Since you raised the issue of regular expressions, here is one way to do it that way:
library(stringr)
library(lubridate)
# flag the PM entries: a "p"/"P" one or two whitespace characters after the HH:MM time
pm <- str_detect(dates, "(?<=\\d\\d:\\d\\d\\s{1,2})[pP]")
# keep everything up to and including the HH:MM time, dropping the am/pm marker
dates <- str_extract(dates, "^.*:\\d\\d")
# append a clean "PM" or "AM" marker
dates[pm] <- paste(dates[pm], "PM")
dates[!pm] <- paste(dates[!pm], "AM")
# now your original approach works
ydm_hm(paste(2021, dates))
Output
[1] "2021-12-02 05:47:00 UTC" "2021-11-07 21:47:00 UTC" "2021-07-03 00:28:00 UTC" "2021-09-23 08:53:00 UTC"
[5] "2021-05-07 21:05:00 UTC"
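The same normalisation can also be sketched as a single substitution in base R. The helper name normalise_meridiem is hypothetical, and the pattern only covers the marker variants shown in the question (a/p, optional dots and spaces, m, optional trailing dot, at the end of the string); other variants would need the pattern extended.

```r
dates <- c("02 dec 05:47 a. m",
           "7 November 09:47 p. m.",
           "3 jul 12:28 a.m.",
           "23 sept 08:53 a m.",
           "7 may 09:05 PM")

# Hypothetical helper: rewrite any trailing a/p + dots/spaces + m marker as " AM" or " PM".
# (?i) makes the match case-insensitive; \\U upper-cases the rest of the replacement.
normalise_meridiem <- function(x) {
  sub("(?i)\\s+([ap])[. ]*m\\.?\\s*$", " \\U\\1M", x, perl = TRUE)
}

normalised <- normalise_meridiem(dates)
normalised
# [1] "02 dec 05:47 AM"        "7 November 09:47 PM"    "3 jul 12:28 AM"
# [4] "23 sept 08:53 AM"       "7 may 09:05 PM"
```

The normalised strings can then be passed to ydm_hm(paste(2021, normalised)) as in the answers above.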

Converting Lubridate Period Class Time to Time Stamp Format in R

My code reads in a .txt file that holds a series of time stamps in one column. I needed to account for daylight saving time in that column, so I used the lubridate package to subtract an hour from these time stamps. I'm struggling to convert the Period object from lubridate back into a time format of %I:%M:%S %p.
Here is my code.
# Changing from 24 Hr to 12 Hr Format #
raw_data_sample$Time <- format(strptime(raw_data_sample$Time, format='%H:%M:%S'), '%I:%M:%S %p')
# Subtracting an Hour for Daylight Savings
raw_data_sample$Time <- hms(raw_data_sample$Time)
raw_data_sample$Time <- raw_data_sample$Time - hours(1)
Here is my current output.
c("1H 41M 54S", "1H 42M 4S", "1H 42M 14S", "1H 42M 31S", "1H 42M 41S", "1H 43M 1S")
I'm hoping to get an output like
1:41:54 PM, 1:42:40 PM
Any advice? Thank you!

You can use the parse_date_time function to convert your Period object to POSIXct, then use format to get the format you want.
library(lubridate)
raw_data_sample$Time1 <- format(parse_date_time(raw_data_sample$Time, 'HMS'), '%I:%M:%S %p')
For example,
x <- period(c("1H 41M 54S", "1H 42M 4S", "1H 42M 14S", "1H 42M 31S", "1H 42M 41S", "1H 43M 1S"))
format(parse_date_time(x, 'HMS'), '%I:%M:%S %p')
#[1] "01:41:54 AM" "01:42:04 AM" "01:42:14 AM" "01:42:31 AM" "01:42:41 AM" "01:43:01 AM"

If we need to subtract an hour, do it on the original date-time object and then do the formatting:
library(lubridate)
# convert to a date-time class
raw_data_sample$Time <- as.POSIXct(raw_data_sample$Time, format = "%H:%M:%S")
# subtract 1 hour from the date-time and use format to change the format
format(raw_data_sample$Time %m-% hours(1), "%I:%M:%S %p")
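The order of operations (parse, shift, then format) can be illustrated without lubridate. This is a minimal base R sketch on made-up sample times, not the asker's data; the 12-hour output relies on %p, whose text depends on the locale.

```r
# Made-up 24-hour clock times standing in for the Time column
time_24h <- c("14:41:54", "14:42:04", "00:30:00")

# Parse to POSIXct first (today's date is filled in automatically)
parsed <- as.POSIXct(time_24h, format = "%H:%M:%S", tz = "UTC")

# Subtract one hour (3600 seconds) while still a date-time
shifted <- parsed - 3600

# Only now convert to the 12-hour display format
format(shifted, "%I:%M:%S %p")
```

Because the subtraction happens on a date-time, a time shortly after midnight rolls back to 11 PM of the previous day instead of producing a negative hour.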

Convert and round column of string times in h:m format to time

I have a column of "times" in string format in hour and minute (no seconds)
time ...
<char>
18:40
12:20
23:59
2:15
...
Is there a way to convert these into times and then round them down such that my data will look like this
time ...
<time>
18:00
12:00
23:00
2:00
...

The POSIXct class needs both a date and a time, so if no date is provided it uses today's date by default. You can then use floor_date to round down to the nearest hour.
library(lubridate)
floor_date(as.POSIXct(df$time, 'UTC', format = '%H:%M'), 'hour')
#[1] "2020-07-06 18:00:00 UTC" "2020-07-06 12:00:00 UTC" "2020-07-06 23:00:00 UTC"
#[4] "2020-07-06 02:00:00 UTC"
You can then use format to keep part that you are interested in.
format(floor_date(as.POSIXct(df$time, 'UTC', format = '%H:%M'), 'hour'), '%H:%M')
#[1] "18:00" "12:00" "23:00" "02:00"
A solution without date-time manipulation using regex :
sub(':.*', ':00', df$time)
#[1] "18:00" "12:00" "23:00" "2:00"
However, note that manipulating date and times using regex is probably not the best option.
data
df <- structure(list(time = c("18:40", "12:20", "23:59", "2:15")),
class = "data.frame", row.names = c(NA, -4L))
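The regex version above leaves "2:00" without a leading zero. If zero-padded strings are wanted without going through a date-time class, a small base R sketch is to extract the hour as an integer and format it; the variable names are illustrative.

```r
time <- c("18:40", "12:20", "23:59", "2:15")

# Everything before the first ":" is the hour
hour <- as.integer(sub(":.*", "", time))

# Rounding down to the hour means dropping the minutes; pad the hour to two digits
floored <- sprintf("%02d:00", hour)
floored
# [1] "18:00" "12:00" "23:00" "02:00"
```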

Maybe Period class in lubridate is what you need:
library(lubridate)
Parse periods with hour and minute
hm(df$time)
# [1] "18H 40M 0S" "12H 20M 0S" "23H 59M 0S" "2H 15M 0S"
Extract hours component
hour(hm(df$time))
# [1] 18 12 23 2
Create a new period object
hours(hour(hm(df$time)))
# [1] "18H 0M 0S" "12H 0M 0S" "23H 0M 0S" "2H 0M 0S"

Handling integer times in R

Times in my data frame are recorded as integers, as in: 1005, 1405, 745, 1130, 2030 etc. How do I convert these integers so that R will understand them and I can use them in functions such as strptime? Thanks in advance for your help.

Solution using strptime()
As was pointed out by Psidom in his comment, you can convert the integers to character and use strptime():
int_times <- c(1005,1405,745,1130,2030)
strptime(as.character(int_times), format="%H%M")
## [1] "2016-04-21 10:05:00 CEST" "2016-04-21 14:05:00 CEST" NA
## [4] "2016-04-21 11:30:00 CEST" "2016-04-21 20:30:00 CEST"
However, as you can see, you run into trouble as soon as the number has only three digits. You can get around this by using formatC() to format the integers to character with four digits and a leading zero (if needed):
char_times <- formatC(int_times, flag = 0, width = 4)
char_times
## [1] "1005" "1405" "0745" "1130" "2030"
Now, conversion works:
strptime(char_times, format="%H%M")
## [1] "2016-04-21 10:05:00 CEST" "2016-04-21 14:05:00 CEST" "2016-04-21 07:45:00 CEST"
## [4] "2016-04-21 11:30:00 CEST" "2016-04-21 20:30:00 CEST"
Note that strptime() always returns a POSIXlt object that involves both time and date. Since no date was given, the current day was used. But you could also use paste() to combine the times with any date:
strptime(paste("2010-03-21", char_times), format="%Y-%m-%d %H%M")
## [1] "2010-03-21 10:05:00 CET" "2010-03-21 14:05:00 CET" "2010-03-21 07:45:00 CET"
## [4] "2010-03-21 11:30:00 CET" "2010-03-21 20:30:00 CET"
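The class of the result can be checked directly. The short sketch below also converts it with as.POSIXct(), which is often the more convenient class for storing in a data frame column; the time zone is set to UTC here only to keep the example independent of the local setting.

```r
# Parse one of the padded times together with a fixed date
lt <- strptime("2010-03-21 0745", format = "%Y-%m-%d %H%M", tz = "UTC")
class(lt)
# [1] "POSIXlt" "POSIXt"

# Convert to POSIXct if a compact numeric-backed date-time is preferred
ct <- as.POSIXct(lt)
class(ct)
# [1] "POSIXct" "POSIXt"
```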
Solution using lubridate::hm()
As was suggested by Richard Telford in his comment, you could also make use of lubridate's period class, if you prefer not to have any date involved. This class is for periods of times and thus you could represent a clock time, say 10:23, as the period 10 hours, 23 minutes. However, simply using hm() from lubridate does not work:
library(lubridate)
hm(char_times)
## [1] NA NA NA NA NA
## Warning message:
## In .parse_hms(..., order = "HM", quiet = quiet) :
## Some strings failed to parse
The reason is that without a separator, it is not clear how these times should be converted. hm() just expects a representation that has hours before minutes. But "1005" could be 100 hours and 5 minutes just as well as 10 hours and 5 minutes. So you need to introduce a separation between hours and minutes, which you could do for instance as follows:
char_times2 <- paste(substr(char_times, 1, 2), substr(char_times, 3, 4))
hm(char_times2)
## [1] "10H 5M 0S" "14H 5M 0S" "7H 45M 0S" "11H 30M 0S" "20H 30M 0S"
Note that I have again used the fixed-width string representation char_times, because then the hours are always given by the first two characters. This makes it easy to use substr().
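As an alternative to padding and substr(), the hours and minutes can be split off arithmetically, which sidesteps the three-digit problem altogether. This is a sketch in base R with illustrative variable names; it assumes every value is a valid HHMM integer.

```r
int_times <- c(1005, 1405, 745, 1130, 2030)

# Integer division and remainder by 100 separate hours from minutes
hh <- as.integer(int_times %/% 100)
mm <- as.integer(int_times %% 100)

# Build "HH:MM" strings with a separator
char_times3 <- sprintf("%02d:%02d", hh, mm)
char_times3
# [1] "10:05" "14:05" "07:45" "11:30" "20:30"
```

Strings in this form have an explicit separator, so they can be passed to strptime() with format "%H:%M" or to hm().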
