I have a local.time column in my data frame of class character containing elements like this :
> a$local.time
[1] "1:30 AM" "6:29 AM" "6:59 AM" "9:54 AM" "10:14 AM" "10:34 AM" "12:54 PM" "1:15 PM" "1:20 PM"
[10] "1:20 PM" "2:15 PM" "2:15 PM" "4:23 AM" "6:28 AM" "2:45 PM" "3:08 PM" "3:23 PM" "3:58 PM"
I wanted to convert them from class character to time variables. So I used:
> as.POSIXct(a$local.time, tz = "", format = "%I:%M %p", usetz = FALSE)
This resulted in :
[1] "2014-10-31 01:30:00 EDT" "2014-10-31 06:29:00 EDT" "2014-10-31 06:59:00 EDT" "2014-10-31 09:54:00 EDT"
[5] "2014-10-31 10:14:00 EDT" "2014-10-31 10:34:00 EDT" "2014-10-31 12:54:00 EDT" "2014-10-31 13:15:00 EDT"
I have a date variable in a different column and the intention is to provide the capability of filtering by date and zooming on time bands to the minute in a dynamic dashboard.
I want to remove the date and time zone from a$local.time but keep it in a time format so the chronology is maintained i.e. 18:57 is the 19th hour and 57th minute of the day etc.
If I use
a$local.time <- format(a$local.time, "%Y-%m-%d %H:%M:%S", usetz = FALSE)
a$local.time <- strftime(a$local.time, format = "%H:%M")
the class changes to character! What's the right approach?
The chron package has a "times" class that might be helpful for you. Starting with something similar to what you have so far:
x <- c("1:30 AM", "6:29 AM", "6:59 AM", "9:54 AM", "10:14 AM", "3:15 PM")
a <- as.POSIXct(x, tz = "", format = "%I:%M %p", usetz = FALSE)
Then we can use the times function with format
library(chron)
(tms <- times(format(a, "%H:%M:%S")))
# [1] 01:30:00 06:29:00 06:59:00 09:54:00 10:14:00 15:15:00
attributes(tms)
# $format
# [1] "h:m:s"
#
# $class
# [1] "times"
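For intuition: as far as I know, a times object is stored as a plain number, the fraction of a day elapsed since midnight, which is why sorting and arithmetic keep working. Below is a rough base R sketch of that idea without chron. The variable names are illustrative only, and it assumes an English (or C) locale so that %p recognizes AM/PM.

```r
# Illustrative base R sketch: represent clock times as a fraction of a day.
x <- c("3:15 PM", "1:30 AM", "12:54 PM", "6:29 AM")

# strptime attaches today's date; only the clock fields are used below.
lt <- strptime(x, format = "%I:%M %p", tz = "UTC")

# Seconds since midnight divided by the number of seconds in a day.
frac_of_day <- (lt$hour * 3600 + lt$min * 60 + lt$sec) / 86400

# Ordering by the fraction gives chronological order within the day.
sorted <- x[order(frac_of_day)]

# 24-hour labels for display, in the original element order.
labels <- sprintf("%02d:%02d", lt$hour, lt$min)
```

Keeping the numeric column for filtering and the formatted label only for display avoids the character-sorting problem described in the question.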
You can use the hms (hour-minute-second) series of functions in the lubridate package.
library(lubridate)
times = c("1:30 AM", "6:29 AM", "6:59 AM", "9:54 AM", "2:45 PM")
I was hoping you could just do:
hm(times)
[1] "1H 30M 0S" "6H 29M 0S" "6H 59M 0S" "9H 54M 0S" "2H 45M 0S"
But notice that hm doesn't recognize the AM/PM distinction. So here's a more convoluted method that requires first using strptime, which does recognize AM/PM, and then putting the result in a form hm recognizes.
hm(paste0(hour(strptime(times, "%I:%M %p")),":",
minute(strptime(times, "%I:%M %p"))))
[1] "1H 30M 0S" "6H 29M 0S" "6H 59M 0S" "9H 54M 0S" "14H 45M 0S"
There's probably a better way, but this seems to work.
UPDATE: To address your comment, you can use the hour and minute functions to get the hours and minutes (although I like @RichardScriven's answer better). For example:
hour(times)
[1] 1 6 6 9 14
mean(hour(times) + minute(times)/60)
[1] 7.923333
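Note that hour() and minute() need the parsed Period object, not the original strings. Here is a self-contained sketch of the whole sequence, assuming the parsed result is stored under a separate (hypothetical) name, parsed:

```r
library(lubridate)

times <- c("1:30 AM", "6:29 AM", "6:59 AM", "9:54 AM", "2:45 PM")

# strptime understands AM/PM; tz is fixed so the result does not depend on the machine.
clock <- strptime(times, "%I:%M %p", tz = "UTC")

# Rebuild 24-hour "H:M" strings that hm() can read, then parse them as Periods.
parsed <- hm(paste0(hour(clock), ":", minute(clock)))

hours_of_day <- hour(parsed)
mean_hour <- mean(hour(parsed) + minute(parsed) / 60)
```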
My code reads in a .txt file that holds a series of time stamps in one column. I needed to account for daylight saving time in that column, so I used the lubridate package to subtract an hour from these time stamps. I'm struggling to convert the Period class from lubridate back into a time format of %I:%M:%S %p.
Here is my code.
# Changing from 24 Hr to 12 Hr Format #
raw_data_sample$Time <- format(strptime(raw_data_sample$Time, format='%H:%M:%S'), '%I:%M:%S %p')
# Subtracting an Hour for Daylight Savings
raw_data_sample$Time <- hms(raw_data_sample$Time)
raw_data_sample$Time <- raw_data_sample$Time - hours(1)
Here is my current output.
c("1H 41M 54S", "1H 42M 4S", "1H 42M 14S", "1H 42M 31S", "1H 42M 41S", "1H 43M 1S")
I'm hoping to get an output like
1:41:54 PM, 1:42:40 PM
Any advice? Thank you!
You can use the parse_date_time function to convert your Period object to POSIXct, then use format to get it into the format you want.
library(lubridate)
raw_data_sample$Time1 <- format(parse_date_time(raw_data_sample$Time, 'HMS'), '%I:%M:%S %p')
For example,
x <- period(c("1H 41M 54S", "1H 42M 4S", "1H 42M 14S", "1H 42M 31S", "1H 42M 41S", "1H 43M 1S"))
format(parse_date_time(x, 'HMS'), '%I:%M:%S %p')
#[1] "01:41:54 AM" "01:42:04 AM" "01:42:14 AM" "01:42:31 AM" "01:42:41 AM" "01:43:01 AM"
If we need to subtract an hour, do this on the original datetime object, and then do the formatting
library(lubridate)
# // convert to Datetime class
raw_data_sample$Time <- as.POSIXct(raw_data_sample$Time, format = "%H:%M:%S")
# // subtract 1 hour from the Datetime and use format to change the format
format(raw_data_sample$Time %m-% hours(1), "%I:%M:%S %p")
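The same idea can be sketched in base R alone, since subtracting 3600 seconds from a POSIXct value shifts it back one hour. The input vector below is made up for illustration, the time zone is fixed to UTC so no real daylight-saving jump interferes, and an English (or C) locale is assumed for the AM/PM text.

```r
# Made-up 24-hour clock times standing in for raw_data_sample$Time.
time_24h <- c("14:41:54", "14:42:04", "00:30:00")

# as.POSIXct attaches today's date; only the clock part is kept at the end.
parsed <- as.POSIXct(time_24h, format = "%H:%M:%S", tz = "UTC")

# POSIXct is a count of seconds, so one hour is 3600.
shifted <- parsed - 3600

# Format back to a 12-hour string with an AM/PM marker.
time_12h <- format(shifted, "%I:%M:%S %p")
```

The last element shows the wrap-around case: 00:30:00 minus one hour lands on 11:30:00 PM of the previous day, which is invisible once the date is dropped.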
I have a column of "times" in string format in hour and minute (no seconds)
time ...
<char>
18:40
12:20
23:59
2:15
...
Is there a way to convert these into times and then round them down such that my data will look like this
time ...
<time>
18:00
12:00
23:00
2:00
...
The POSIXct class needs both a date and a time, so if no date is provided it defaults to today's date. You can then use floor_date to round down to the nearest hour.
library(lubridate)
floor_date(as.POSIXct(df$time, 'UTC', format = '%H:%M'), 'hour')
#[1] "2020-07-06 18:00:00 UTC" "2020-07-06 12:00:00 UTC" "2020-07-06 23:00:00 UTC"
#[4] "2020-07-06 02:00:00 UTC"
You can then use format to keep only the part that you are interested in.
format(floor_date(as.POSIXct(df$time, 'UTC', format = '%H:%M'), 'hour'), '%H:%M')
#[1] "18:00" "12:00" "23:00" "02:00"
A solution without date-time manipulation using regex :
sub(':.*', ':00', df$time)
#[1] "18:00" "12:00" "23:00" "2:00"
However, note that manipulating date and times using regex is probably not the best option.
data
df <- structure(list(time = c("18:40", "12:20", "23:59", "2:15")),
class = "data.frame", row.names = c(NA, -4L))
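If lubridate is not available, roughly the same result can be had in base R with trunc(), which accepts units = "hours" for date-time objects. This is only a sketch on the sample data above; the column name hour_floor is made up.

```r
df <- data.frame(time = c("18:40", "12:20", "23:59", "2:15"))

# Today's date is attached on parsing; UTC avoids daylight-saving surprises.
parsed <- as.POSIXct(df$time, format = "%H:%M", tz = "UTC")

# trunc() drops the minutes and seconds (it returns a POSIXlt object).
floored <- trunc(parsed, units = "hours")

# Keep only the clock part as a zero-padded string.
df$hour_floor <- format(floored, "%H:%M")
```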
Maybe the Period class in lubridate is what you need:
library(lubridate)
Parse periods with hour and minute
hm(df$time)
# [1] "18H 40M 0S" "12H 20M 0S" "23H 59M 0S" "2H 15M 0S"
Extract hours component
hour(hm(df$time))
# [1] 18 12 23 2
Create a new period object
hours(hour(hm(df$time)))
# [1] "18H 0M 0S" "12H 0M 0S" "23H 0M 0S" "2H 0M 0S"
I'm trying to parse dates (using lubridate functions) from a vector which has mixed date formats.
departureDate <- c("Aug 17, 2020 12:00:00 AM", "Nov 19, 2019 12:00:00 AM", "Dec 21, 2020 12:00:00 AM",
"Dec 24, 2020 12:00:00 AM", "Dec 24, 2020 12:00:00 AM", "Apr 19, 2020 12:00:00 AM", "28/06/2019",
"16/08/2019", "04/02/2019", "10/04/2019", "28/07/2019", "26/07/2019",
"Jun 22, 2020 12:00:00 AM", "Apr 5, 2020 12:00:00 AM", "May 1, 2021 12:00:00 AM")
As I didn't notice this at first, I tried to parse with lubridate::mdy_hms(departureDate), which resulted in NA values for the dates whose format differs from the one the parser expects.
As the format may change at random positions in the vector, I tried the following statement:
departureDate <- tryCatch(mdy_hms(departureDate),
warning = function(w){return(dmy(departureDate))})
Which brought even more NA's as it only applied the warning function call. Is there a way to solve this by using my approach?
Thanks in advance
We can use lubridate::parse_date_time which can take multiple formats.
lubridate::parse_date_time(departureDate, c('%b %d, %Y %I:%M:%S %p', '%d/%m/%Y'))
#[1] "2020-08-17 UTC" "2019-11-19 UTC" "2020-12-21 UTC" "2020-12-24 UTC"
#[5] "2020-12-24 UTC" "2020-04-19 UTC" "2019-06-28 UTC" "2019-08-16 UTC"
#[9] "2019-02-04 UTC" "2019-04-10 UTC" "2019-07-28 UTC" "2019-07-26 UTC"
#[13] "2020-06-22 UTC" "2020-04-05 UTC" "2021-05-01 UTC"
Since the month names in departureDate are in English, your locale needs to be English as well.
Refer to "How to change the locale of R?" if you have a non-English locale.
The ideal situation is that the code should be able to deal with every format on its own, without letting it fall to an exception.
Another issue to take into account is that the mdy_hms() function returns dates in the POSIXct data type, whereas dmy() returns the Date type, so they wouldn't mix well together.
The code below applies mdy_hms(), then converts it to Date. It then tests for NA's and applies the second function dmy() on the missing values. More rules can be added in the pipeline at will if more formats are to be recognized.
library(dplyr)
library(lubridate)  # provides mdy_hms() and dmy()
dates.converted <-
  mdy_hms(departureDate) %>%
  as.Date() %>%
  ifelse(!is.na(.), ., dmy(departureDate)) %>%  # ifelse() drops the Date class
  structure(class = "Date")                     # so restore it here
print(dates.converted)
Output
[1] "2020-08-17" "2019-11-19" "2020-12-21" "2020-12-24" "2020-12-24" "2020-04-19" "2019-06-28" "2019-08-16"
[9] "2019-02-04" "2019-04-10" "2019-07-28" "2019-07-26" "2020-06-22" "2020-04-05" "2021-05-01"
One method would be to iterate through a list of candidate formats and apply it only to dates not previously parsed correctly.
# %p only takes effect together with %I (12-hour clock), not with %H
fmts <- c("%b %d, %Y %I:%M:%S %p", "%d/%m/%Y")
dates <- rep(Sys.time()[NA], length(departureDate))
for (fmt in fmts) {
  isna <- is.na(dates)
  if (!any(isna)) break
  dates[isna] <- as.POSIXct(departureDate[isna], format = fmt)
}
dates
#  [1] "2020-08-17 PDT" "2019-11-19 PST" "2020-12-21 PST" "2020-12-24 PST"
#  [5] "2020-12-24 PST" "2020-04-19 PDT" "2019-06-28 PDT" "2019-08-16 PDT"
#  [9] "2019-02-04 PST" "2019-04-10 PDT" "2019-07-28 PDT" "2019-07-26 PDT"
# [13] "2020-06-22 PDT" "2020-04-05 PDT" "2021-05-01 PDT"
as.Date(dates)
# [1] "2020-08-17" "2019-11-19" "2020-12-21" "2020-12-24" "2020-12-24" "2020-04-19" "2019-06-28"
# [8] "2019-08-16" "2019-02-04" "2019-04-10" "2019-07-28" "2019-07-26" "2020-06-22" "2020-04-05"
# [15] "2021-05-01"
I encourage you to put the most-likely formats first in the fmts vector.
The way this is set up, as soon as all elements are correctly found, no further formats are attempted (i.e., break).
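The loop can be wrapped into a small reusable helper. The sketch below is one way to do it. The function name parse_mixed and its tz argument are made up for illustration, it pairs %p with %I (the 12-hour hour field), and it assumes an English (or C) locale for the month names and AM/PM.

```r
# Hypothetical helper: try each format in turn on the not-yet-parsed elements.
parse_mixed <- function(x, fmts, tz = "UTC") {
  # Start with an all-NA POSIXct vector of the right length.
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in fmts) {
    todo <- is.na(out)
    if (!any(todo)) break              # everything parsed, stop early
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

x <- c("Aug 17, 2020 12:00:00 AM", "28/06/2019", "Apr 5, 2020 12:00:00 AM")
res <- parse_mixed(x, c("%b %d, %Y %I:%M:%S %p", "%d/%m/%Y"))
res_dates <- format(as.Date(res))
```

Elements that match none of the formats simply stay NA, so a final anyNA(res) check is a cheap way to spot formats that still need to be added.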
Edit: if there is a difference in LOCALE where AM/PM are not locally recognized, then one method would be to first remove them from the strings:
departureDate <- gsub("\\s[AP]M$", "", departureDate)
departureDate
# [1] "Aug 17, 2020 12:00:00" "Nov 19, 2019 12:00:00" "Dec 21, 2020 12:00:00"
# [4] "Dec 24, 2020 12:00:00" "Dec 24, 2020 12:00:00" "Apr 19, 2020 12:00:00"
# [7] "28/06/2019" "16/08/2019" "04/02/2019"
# [10] "10/04/2019" "28/07/2019" "26/07/2019"
# [13] "Jun 22, 2020 12:00:00" "Apr 5, 2020 12:00:00" "May 1, 2021 12:00:00"
and then use a simpler format:
fmts <- c("%b %d, %Y %H:%M:%S", "%d/%m/%Y")
I have a timestamp vector like
time_stamp <- c("7/1/2013", "7/1/2013 12:00:30 AM", "7/1/2013 12:01:00 AM", "7/1/2013 12:01:30 AM", "8/1/2013","8/1/2013 11:02:30 PM")
I want to format this to date class. I tried
strptime(time_stamp, format = "%d/%m/%Y %H:%M:%S", tz = "GMT")
but since two of the timestamps have no time component, they come back as NA; those should instead be filled with a default time of 12:00:00 AM.
I can run a loop such as:
for (i in seq_along(time_stamp)) {
  if (nchar(time_stamp[i]) < 11) {
    time_stamp[i] <- paste(time_stamp[i], "12:00:00 AM")
  }
}
time_stamp <- format(strptime(time_stamp, format = "%d/%m/%Y %I:%M:%S %p", tz = "GMT"), "%d/%m/%Y %H:%M:%S", tz = "GMT")
Is there a faster and cleaner way to accomplish this? The vector is a part of large dataset so I don't want to loop over it.
lubridate::parse_date_time can take multiple token orders, with or without the %:
lubridate::parse_date_time(time_stamp, orders = c("dmy IMS p", "dmy"))
## [1] "2013-01-07 00:00:00 UTC" "2013-01-07 00:00:30 UTC" "2013-01-07 00:01:00 UTC"
## [4] "2013-01-07 00:01:30 UTC" "2013-01-08 00:00:00 UTC" "2013-01-08 23:02:30 UTC"
Or use its truncated parameter:
lubridate::parse_date_time(time_stamp, orders = 'dmy IMS p', truncated = 4)
which returns the same thing.
Or use a bit of regex replacement and then process as normal:
as.POSIXct(sub("(\\d{4}$)", "\\1 12:00:00 AM", time_stamp),
           format = "%d/%m/%Y %I:%M:%S %p", tz = "GMT")
#[1] "2013-01-07 00:00:00 GMT" "2013-01-07 00:00:30 GMT" "2013-01-07 00:01:00 GMT"
#[4] "2013-01-07 00:01:30 GMT" "2013-01-08 00:00:00 GMT" "2013-01-08 23:02:30 GMT"
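The loop from the question can also be vectorized directly in base R, without a regex. A minimal sketch, assuming (as in the sample data) that any element containing a colon already carries a time, and that the locale is English (or C) so %p matches AM/PM. The shortened input vector is for illustration only.

```r
time_stamp <- c("7/1/2013", "7/1/2013 12:00:30 AM", "8/1/2013", "8/1/2013 11:02:30 PM")

# TRUE where a time component is already present.
has_time <- grepl(":", time_stamp, fixed = TRUE)

# Append the default midnight time only where it is missing.
padded <- ifelse(has_time, time_stamp, paste(time_stamp, "12:00:00 AM"))

# %I with %p reads the 12-hour clock correctly.
parsed <- as.POSIXct(padded, format = "%d/%m/%Y %I:%M:%S %p", tz = "GMT")
result <- format(parsed, "%d/%m/%Y %H:%M:%S")
```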
I'm trying to get my head around parsing 12-hour times using lubridate. If I run
library(lubridate)
times <- c("1:30 AM", "6:29 AM", "6:59 AM", "9:54 AM", "2:45 PM")
hm(times)
I get
[1] "1H 30M 0S" "6H 29M 0S" "6H 59M 0S" "9H 54M 0S" "2H 45M 0S"
Note that the AM/PM designation is not used. However, if the time strings also include a date, then the parsing works
ymd_hm(paste("01-01-01", times))
[1] "2001-01-01 01:30:00 UTC" "2001-01-01 06:29:00 UTC"
[3] "2001-01-01 06:59:00 UTC" "2001-01-01 09:54:00 UTC"
[5] "2001-01-01 14:45:00 UTC"
It seems to me that the time parsing functions (hm, hms, ...) don't recognize AM/PM, but the date functions do. Is it possible to allow for 12-hour parsing without going through the dates?
[I know I can do this by parsing the strings myself, but I was wondering if it was possible within lubridate]
The two objects belong to different classes each one designed for a specific purpose.
With the first function you create a Period class object. This class is designed to represent spans of time, such as a race result: "how long does Bolt take to run 100 meters?" is 0 hours, 0 minutes, 9.58 seconds.
See:
a <- hm(times)
class(a)
[1] "Period"
attr(,"package")
[1] "lubridate"
The second function, ymd_hm, creates an object of class POSIXct:
b <- ymd_hm(paste("01-01-01", times))
class(b)
[1] "POSIXct" "POSIXt"
This class of object is designed to represent "time" in the sense of a point on the Gregorian calendar (or maybe other kinds of calendars). It also parses AM/PM, which is vital for distinguishing hours of the day on a 12-hour clock.
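If a Period is still what you want, one workaround is to let base R's strptime handle the AM/PM part and hand hm() a 24-hour string. The wrapper below is only a sketch: hm_12h is a made-up name, not a lubridate function, and it assumes an English (or C) locale for %p.

```r
library(lubridate)

# Hypothetical helper: 12-hour clock strings -> lubridate Period.
hm_12h <- function(x) {
  clock <- strptime(x, format = "%I:%M %p", tz = "UTC")  # understands AM/PM
  hm(format(clock, "%H:%M"))                             # 24-hour text for hm()
}

times <- c("1:30 AM", "6:29 AM", "6:59 AM", "9:54 AM", "2:45 PM")
periods <- hm_12h(times)
period_hours <- hour(periods)
period_minutes <- minute(periods)
```

A date is still attached briefly inside strptime, but it never leaves the helper, so the caller only ever sees Periods.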