R lubridate: Apply helper to dataframe

I have a dataframe of times looking like this:
library(lubridate)
times <- structure(list(exp1 = c("17:19:04 \r", "17:28:53 \r", "17:38:44 \r"),
exp2 = c("17:22:04 \r", "17:31:53 \r", "17:41:45 \r")),
row.names = c(NA, 3L), class = "data.frame")
I want to convert the times into more convenient time objects, which I will do with the hms() helper function from the lubridate package.
Running hms() on one column of my dataframe works like a charm:
hms(times[,1])
[1] "17H 19M 4S" "17H 28M 53S" "17H 38M 44S"
Great, surely I can just apply() on my whole dataframe then.
apply(times, 2, hms)
Which gives a weird dataframe with some integers, definitely not what I am expecting.
What is the right way to convert all of my times dataframe with the hms() function?

library(dplyr)
library(lubridate)
times %>% mutate_all(hms)
#OR
mutate_all(times, hms)
# exp1 exp2
#1 17H 19M 4S 17H 22M 4S
#2 17H 28M 53S 17H 31M 53S
#3 17H 38M 44S 17H 41M 45S

Instead of apply, we can use lapply: apply first converts the data frame to a matrix, which wouldn't hold the attributes created by hms.
library(lubridate)
times[] <- lapply(times, hms)
str(times)
#'data.frame': 3 obs. of 2 variables:
# $ exp1:Formal class 'Period' [package "lubridate"] with 6 slots
# .. ..@ .Data : num 4 53 44
# .. ..@ year : num 0 0 0
# .. ..@ month : num 0 0 0
# .. ..@ day : num 0 0 0
# .. ..@ hour : num 17 17 17
# .. ..@ minute: num 19 28 38
# $ exp2:Formal class 'Period' [package "lubridate"] with 6 slots
# .. ..@ .Data : num 4 53 45
# .. ..@ year : num 0 0 0
# .. ..@ month : num 0 0 0
# .. ..@ day : num 0 0 0
# .. ..@ hour : num 17 17 17
# .. ..@ minute: num 22 31 41
With dplyr 1.0.0 or later (the devel version when this answer was written), we can use mutate with across
library(dplyr)
times %>%
mutate(across(everything(), hms))
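To see why apply() goes wrong while lapply() works, here is a minimal sketch. It uses a cut-down copy of the times data frame without the trailing "\r" characters (my simplification), and the names via_apply and via_lapply are made up for illustration.

```r
library(lubridate)

# Cut-down copy of the question's data (trailing "\r" removed for clarity)
times <- data.frame(exp1 = c("17:19:04", "17:28:53"),
                    exp2 = c("17:22:04", "17:31:53"))

# apply() coerces the data frame to a character matrix and then
# simplifies the per-column results, so the Period class is dropped.
via_apply <- apply(times, 2, hms)
class(via_apply)

# lapply() works column by column and returns each Period object as is.
via_lapply <- lapply(times, hms)
class(via_lapply$exp1)
```

Assigning the lapply() result with times[] <- ... keeps the data frame shape, which is what the answer above relies on.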

Related

Date to day and time

I have read a lot of blogs, but I cannot find the answer to my question:
I have a date 2020-25-02 17:42:03 and I would like to convert it into two columns, day and time.
hello <- strptime("2020-25-02 17:42:03", "%Y-%d-%m %H:%M:%S") # input is year-day-month, so the format must be %Y-%d-%m
df$day <- as.Date(hello, format = "%Y-%d-%m")
But I also would like df$time. Is it possible ?
library(chron)
dtimes = c("2002-06-09 12:45:40","2003-01-29 09:30:40",
  "2002-09-04 16:45:40","2002-11-13 20:00:40",
  "2002-07-07 17:30:40")
# split each string into its date part and its time part
dtparts = t(as.data.frame(strsplit(dtimes,' ')))
row.names(dtparts) = NULL
thetimes = chron(dates=dtparts[,1],times=dtparts[,2],
  format=c('y-m-d','h:m:s'))
thetimes
[1] (02-06-09 12:45:40) (03-01-29 09:30:40) (02-09-04 16:45:40)
[4] (02-11-13 20:00:40) (02-07-07 17:30:40)
Please see this link
Use function hms in package lubridate.
df <- data.frame(day = as.Date(hello, format = "%Y-%d-%m"))
df$time <- lubridate::hms(sub("^[^ ]*\\b(.*)$", "\\1", hello))
df
# day time
#1 2020-02-25 17H 42M 3S
str(df)
#'data.frame': 1 obs. of 2 variables:
# $ day : Date, format: "2020-02-25"
# $ time:Formal class 'Period' [package "lubridate"] with 6 slots
# .. ..@ .Data : num 3
# .. ..@ year : num 0
# .. ..@ month : num 0
# .. ..@ day : num 0
# .. ..@ hour : num 17
# .. ..@ minute: num 42
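If a plain character time column is enough, base R can do the split without extra packages. This is only a sketch: the variable name stamp and the choice of tz = "UTC" are my assumptions, not part of the question.

```r
# Parse the year-day-month string once, then derive both columns from it.
stamp <- as.POSIXct("2020-25-02 17:42:03",
                    format = "%Y-%d-%m %H:%M:%S", tz = "UTC")

df <- data.frame(
  day  = as.Date(stamp),            # calendar date only
  time = format(stamp, "%H:%M:%S")  # clock time as a character string
)
df
```

The time column here is text, so use the lubridate version above if you need to do arithmetic on it.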

Convert list of Lubridate periods to vector of periods [duplicate]

I have a list of periods that I would like to convert so that it behaves like a vector (eventually to add as a column in a data frame).
library(lubridate)
x <- list(ms("09:10"), ms("09:02"), ms("1:10"))
# some_function(x)
# with output
ms(c("09:10", "09:02", "1:10"))
unlist and purrr::flatten don't work in this case since they lose the Period properties.
d <- do.call("c", x)
class(d)
[1] "Period"
attr(,"package")
[1] "lubridate"
Or
d <- data.frame(date = do.call("c", x))
str(d)
'data.frame': 3 obs. of 1 variable:
$ date:Formal class 'Period' [package "lubridate"] with 6 slots
.. ..@ .Data : num 10 2 10
.. ..@ year : num 0 0 0
.. ..@ month : num 0 0 0
.. ..@ day : num 0 0 0
.. ..@ hour : num 0 0 0
.. ..@ minute: num 9 9 1
d
date
1 9M 10S
2 9M 2S
3 1M 10S
See here: why does unlist() kill dates in R
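For a side-by-side comparison, here is a small sketch of the two approaches on the question's own list. The names flat and combined are just for illustration.

```r
library(lubridate)

x <- list(ms("09:10"), ms("09:02"), ms("1:10"))

# unlist() strips the S4 class and leaves a bare vector
flat <- unlist(x)
class(flat)

# do.call("c", x) calls c() on the Period objects themselves,
# so lubridate's method for combining Periods is used
combined <- do.call("c", x)
class(combined)
length(combined)
```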

How to create a time interval that count the rows in such time interval in R

I have a data frame that stores call records from a call center. My purpose is to count how many records exist per time interval, for example, in a time interval of 30 minutes there may be three call records (that is, three calls entered within that specific time interval); In case there are no records for that time interval, then my counter should show me a zero value.
This post was useful, but I cannot get a zero to show up for time intervals that have no records.
This is the structure of my call_log:
Classes ‘data.table’ and 'data.frame': 24416 obs. of 23 variables:
$ closecallid : int 1145000 1144998 1144997 1144996 1144995 1144991 1144989 1144987 1144986 1144984 ...
$ lead_id : int 1167647 1167645 1167644 1167643 1167642 1167638 1167636 1167634 1167633 1167631 ...
$ list_id :integer64 998 998 998 998 998 998 998 998 ...
$ campaign_id : chr "212120" "212120" "212120" "212120" ...
$ call_date : POSIXct, format: "2019-08-26 20:25:30" "2019-08-26 19:32:28" "2019-08-26 19:27:03" ...
$ start_epoch : POSIXct, format: "2019-08-26 20:25:30" "2019-08-26 19:32:28" "2019-08-26 19:27:03" ...
$ end_epoch : POSIXct, format: "2019-08-26 20:36:25" "2019-08-26 19:44:52" "2019-08-26 19:40:23" ...
$ length_in_sec : int 655 744 800 1109 771 511 640 153 757 227 ...
$ status : chr "Ar" "Ar" "Ar" "Ar" ...
$ phone_code : chr "1" "1" "1" "1" ...
$ phone_number : chr "17035555" "43667342" "3135324788" "3214255222" ...
$ user : chr "jfino" "jfino" "jfino" "jfino" ...
$ comments : chr "AUTO" "AUTO" "AUTO" "AUTO" ...
$ processed : chr "N" "N" "N" "N" ...
$ queue_seconds : num 0 524 692 577 238 95 104 0 0 0 ...
$ user_group : chr "CEAS" "CEAS" "CEAS" "CEAS" ...
$ xfercallid : int 0 0 0 0 0 0 0 0 0 0 ...
$ term_reason : chr "CALLER" "CALLER" "CALLER" "AGENT" ...
$ uniqueid : chr "1566869112.557969" "1566865941.557957" "1566865611.557952" "1566865127.557947" ...
$ agent_only : chr "" "" "" "" ...
$ queue_position: int 1 2 2 2 1 2 1 1 1 1 ...
$ called_count : int 1 1 1 1 1 1 1 1 1 1 ...
And, this is my code
df <- setDT(call_log)[ , list(number_customers_arrive = sum(called_count)), by = cut(call_date, "30 min")]
Thanks in advance.
Since there is not a reproducible example, I attempt the solution on a simulated data frame. First we create a log of calls with ID and time:
library(lubridate)
library(dplyr)
library(magrittr)
set.seed(123)
# Generate 100 random call times during a day
calls.df <- data.frame(id=seq(1,100,1), calltime=sample(seq(as.POSIXct('2019/10/01'),
as.POSIXct('2019/10/02'), by="min"), 100))
Not every interval may be represented in your call data, so generate a sequence of all 30-minute bins just in case:
full.df <- data.frame(bin=seq(as.POSIXct('2019/10/01'), as.POSIXct('2019/10/02'), by="30 min"))
Next tally up counts of calls in represented bins:
calls.df %>% arrange(calltime) %>% mutate(diff=interval(lag(calltime),calltime)) %>%
mutate(mins=diff@.Data/60) %>% select(-diff) %>%
mutate(bin=floor_date(calltime, unit="30 minutes")) %>%
group_by(bin) %>% tally() -> orig.counts
Now make sure there are zeroes for unrepresented bins:
right_join(orig.counts,full.df,by="bin") %>% mutate(count=ifelse(is.na(n), 0, n))
# A tibble: 49 x 3
bin n count
<dttm> <int> <dbl>
1 2019-10-01 00:00:00 2 2
2 2019-10-01 00:30:00 1 1
3 2019-10-01 01:00:00 2 2
4 2019-10-01 01:30:00 NA 0
5 2019-10-01 02:00:00 2 2
6 2019-10-01 02:30:00 4 4
7 2019-10-01 03:00:00 1 1
8 2019-10-01 03:30:00 1 1
9 2019-10-01 04:00:00 2 2
10 2019-10-01 04:30:00 1 1
# ... with 39 more rows
Hope this is helpful for you.
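As an alternative that needs no join, base R can produce the zero rows directly: cut() with an explicit vector of breaks creates a factor level for every bin, and table() counts empty levels as 0. This is a sketch on three made-up call times, not on the asker's call_log. The names calltime, breaks, bins and counts are assumptions for illustration.

```r
# Three made-up call times; the 00:30 and 01:00 bins are deliberately empty
calltime <- as.POSIXct(c("2019-10-01 00:05:00",
                         "2019-10-01 00:20:00",
                         "2019-10-01 01:40:00"), tz = "UTC")

# Explicit 30-minute break points covering the whole window
breaks <- seq(as.POSIXct("2019-10-01 00:00:00", tz = "UTC"),
              as.POSIXct("2019-10-01 02:00:00", tz = "UTC"),
              by = "30 min")

# Each call is assigned to the bin [start, start + 30 min)
bins <- cut(calltime, breaks = breaks, right = FALSE)

# table() keeps unused factor levels, so empty bins get a count of 0
counts <- as.data.frame(table(bin = bins))
counts$Freq
# 2 0 0 1
```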

R - How to convert a column times in ####M #S format to just number of minutes

I have used lubridate to create a column of time format ####M ##S
df$delay <- minutes(df$finish-Delays$start)
How can I convert this column to just give the number #### in front of minutes?
Thanks for any help.
Take a look at the S4 slots...
library(lubridate)
# create data
df <- data.frame(finish = 20)
Delays <- data.frame(start = 10)
(df$delay <- minutes(df$finish-Delays$start))
[1] "10M 0S"
# take a look at the 'delay' object
str(df$delay)
Formal class 'Period' [package "lubridate"] with 6 slots
..@ .Data : num 0
..@ year : num 0
..@ month : num 0
..@ day : num 0
..@ hour : num 0
..@ minute: num 10
# access the 'minute' slot
df$delay@minute
[1] 10
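Reading the slot only returns the minute component. If a period also carries hours or seconds, you may prefer the total number of minutes instead. A short sketch of both, using lubridate's minute() accessor and period_to_seconds(); the example values are made up.

```r
library(lubridate)

delay <- minutes(c(10, 125))

# minute() reads the same component as the @minute slot
minute(delay)

# A period with an hour part: the minute component is only 30 ...
mixed <- hms("01:30:00")
minute(mixed)

# ... while the total length in minutes is 90
total_minutes <- period_to_seconds(mixed) / 60
total_minutes
```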

Work with durations over 24 hours in R

I have a series of durations that range up to 118 hours in a format like "118:34:42", where 118 is hours, 34 is minutes, and 42 is seconds. Output should be a number of seconds.
I would like to convert this to some kind of time type in R, but most of the libraries I've looked at want to add a date (lubridate, zoo, xts), or return "NA" due to the hours being beyond a 24 hour range. I could parse the string and return a number of seconds, but I'm wondering if there's a faster way.
I'm slightly new to R (maybe 3 months in to working with this).
Any help figuring out how to deal with this would be appreciated.
Example:
library(lubridate)
x <- c("118:34:42", "114:12:12")
tt <- hms(x)
Error in parse_date_time(hms, orders, truncated = truncated, quiet = TRUE) :
No formats could be infered from the training set.
#try another route
w <- "118:34:42"
tt2 <- hms(w)
tt2
#[1] NA
z <- "7:02:02"
tt3 <- hms(z)
tt3
#[1] "7H 2M 2S"
In the lubridate package there is a function hms() that returns a time object:
library(lubridate)
x <- c("118:34:42", "114:12:12")
tt <- hms(x)
tt
[1] 118 hours, 34 minutes and 42 seconds
[2] 114 hours, 12 minutes and 12 seconds
The function hms() returns an object of class Period:
str(tt)
Formal class 'Period' [package "lubridate"] with 6 slots
..@ .Data : num [1:2] 42 12
..@ year : num [1:2] 0 0
..@ month : num [1:2] 0 0
..@ day : num [1:2] 0 0
..@ hour : num [1:2] 118 114
..@ minute: num [1:2] 34 12
You can do arithmetic using these objects. For example:
tt[2] - tt[1]
[1] -4 hours, -22 minutes and -30 seconds
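Since the question asks for a number of seconds, here is a base R sketch of the "parse the string" route mentioned there. hms_to_seconds is a hypothetical helper name, and it assumes every input has exactly three colon-separated fields.

```r
# Hypothetical helper: "H:M:S" strings (hours may exceed 24) to seconds
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  # weight hours, minutes and seconds, then add them up per string
  vapply(parts,
         function(p) sum(as.numeric(p) * c(3600, 60, 1)),
         numeric(1))
}

secs <- hms_to_seconds(c("118:34:42", "114:12:12"))
secs
# 426882 411132
```

With lubridate loaded, period_to_seconds(hms(x)) should give the same numbers, provided your version of hms() parses hours above 24 as shown in the answer above.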
