I have 2 data frames as shown. The first (df1) has order IDs, user IDs, and the time the user ordered something. In df2, I have the order IDs and the time each order was responded to (timeResponse). What I need is a data frame that takes these two data frames and outputs each order ID and, if the order was responded to, the time difference between the order time and the earliest response. For example, the first order (order ID 1) received 3 responses, the earliest at 2 PM, so it would be a 2-hour response.
I'm looking for a way to do this in R.
df1 <- data.frame(
orderID = c(1,2,3,4,5),
userID = c(101, 102, 103, 104, 105),
timeOrdered = c("1/1/2020 12:00:00 PM", "1/2/20 1:00PM", "1/3/20 12:00 AM", "1/4/20 12:00 AM", "1/5/20 12:00 AM"))
df2 <- data.frame(responseID = c(1,2,3,4,5),
orderID = c(1, 1, 1, 4, 5),
timeResponse = c("1/1/20 2:00 PM", "1/1/20 3:00 PM", "1/1/20 4:00 PM", "1/4/20 2:00 PM", "1/5/20 2:00 PM"))
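A minimal base-R sketch of the requested output, under two assumptions: df2$orderID references df1$orderID as c(1, 1, 1, 4, 5) (consistent with order 1 receiving three responses), and all timestamps are normalized to one "m/d/yy h:mm AM/PM" format, since the strings above mix several formats. The column name hoursToFirstResponse is hypothetical.

```r
# Assumed, cleaned-up versions of the question's data
df1 <- data.frame(
  orderID = c(1, 2, 3, 4, 5),
  userID = c(101, 102, 103, 104, 105),
  timeOrdered = c("1/1/20 12:00 PM", "1/2/20 1:00 PM", "1/3/20 12:00 AM",
                  "1/4/20 12:00 AM", "1/5/20 12:00 AM"))
df2 <- data.frame(
  responseID = c(1, 2, 3, 4, 5),
  orderID = c(1, 1, 1, 4, 5),
  timeResponse = c("1/1/20 2:00 PM", "1/1/20 3:00 PM", "1/1/20 4:00 PM",
                   "1/4/20 2:00 PM", "1/5/20 2:00 PM"))

fmt <- "%m/%d/%y %I:%M %p"  # %p (AM/PM) assumes an English or C locale
df1$timeOrdered  <- as.POSIXct(df1$timeOrdered,  format = fmt, tz = "UTC")
df2$timeResponse <- as.POSIXct(df2$timeResponse, format = fmt, tz = "UTC")

# Earliest response per order, as seconds since the epoch, named by orderID
first <- vapply(split(as.numeric(df2$timeResponse), df2$orderID), min, numeric(1))

# Look up each order; orders with no response get NA
first_for_order <- unname(first[as.character(df1$orderID)])

result <- data.frame(
  orderID = df1$orderID,
  hoursToFirstResponse = (first_for_order - as.numeric(df1$timeOrdered)) / 3600)
result
```

Orders 2 and 3 have no responses, so their difference is NA; order 1 comes out at 2 hours.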
Related
I have a column called "time" with some observations in "hours:minutes:seconds" format and others in "hours:minutes" only. I would like to remove the seconds and be left with only hours and minutes.
So far I have loaded the lubridate package and tried:
format(data$time, format = "%H:%M")
but nothing changes.
And with:
data$time <- hm(data$time)
all the observations with h:m:s become NAs
What should I do?
You can use parse_date_time from lubridate to convert time to POSIXct and then use format to keep only the information you need.
data <- data.frame(time = c('10:04:00', '14:00', '15:00', '12:34:56'))
# Try "HMS" first, then fall back to "HM"; keep only hours and minutes
data$time1 <- format(lubridate::parse_date_time(data$time, c('HMS', 'HM')), '%H:%M')
data
# time time1
#1 10:04:00 10:04
#2 14:00 14:00
#3 15:00 15:00
#4 12:34:56 12:34
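If you would rather not parse the values at all, a base-R alternative (a sketch, not part of the answer above) is to strip an optional trailing ":SS" with a regular expression. It assumes every value looks like H:MM or HH:MM, optionally followed by :SS; anything else is returned unchanged. The original format(data$time, format = "%H:%M") call presumably had no effect because the column is character, and the %H:%M template only applies to date-time classes such as POSIXct.

```r
time <- c("10:04:00", "14:00", "15:00", "12:34:56")

# Keep the leading "H:MM" or "HH:MM" and drop an optional ":SS" suffix
hm_only <- sub("^(\\d{1,2}:\\d{2})(:\\d{2})?$", "\\1", time)
hm_only
```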
I'm trying to calculate business hours between two dates. Business hours vary depending on the day.
Weekdays have 15 business hours (8:00-23:00); Saturdays and Sundays have 12 business hours (9:00-21:00).
For example, with start date 07/24/2020 22:20 (Friday) and end date 07/25/2020 21:20 (Saturday), since I'm only interested in business hours, the result should be 12.67 hours.
Here is an example of the data frame and the desired output:
start_date end_date business_hours
07/24/2020 22:20 07/25/2020 21:20 12.67
07/14/2020 21:00 07/16/2020 09:30 18.50
07/18/2020 08:26 07/19/2020 10:00 13.00
07/10/2020 08:00 07/13/2020 11:00 42.00
Here is something you can try with lubridate. I adapted another function I had that I thought might be helpful.
First, create a sequence of dates between the two dates of interest. Then create intervals based on business hours, checking whether each date falls on a weekend.
Then "clamp" the start and end times to the allowed business-hours intervals using pmin and pmax.
You can use time_length to get the time measurement of the intervals; summing them up will give you total time elapsed.
library(lubridate)
library(dplyr)
calc_bus_hours <- function(start, end) {
  # One entry per calendar day from the start date to the end date
  my_dates <- seq.Date(as.Date(start), as.Date(end), by = "day")
  # Business hours: 09:00-21:00 on weekends, 08:00-23:00 on weekdays
  my_intervals <- if_else(weekdays(my_dates) %in% c("Saturday", "Sunday"),
    interval(ymd_hm(paste(my_dates, "09:00"), tz = "UTC"), ymd_hm(paste(my_dates, "21:00"), tz = "UTC")),
    interval(ymd_hm(paste(my_dates, "08:00"), tz = "UTC"), ymd_hm(paste(my_dates, "23:00"), tz = "UTC")))
  # Clamp the first day's interval so it begins at the actual start time
  int_start(my_intervals[1]) <- pmax(pmin(start, int_end(my_intervals[1])), int_start(my_intervals[1]))
  # Clamp the last day's interval so it ends at the actual end time
  int_end(my_intervals[length(my_intervals)]) <- pmax(pmin(end, int_end(my_intervals[length(my_intervals)])), int_start(my_intervals[length(my_intervals)]))
  # Sum the remaining open time in hours
  sum(time_length(my_intervals, "hour"))
}
calc_bus_hours(as.POSIXct("07/24/2020 22:20", format = "%m/%d/%Y %H:%M", tz = "UTC"), as.POSIXct("07/25/2020 21:20", format = "%m/%d/%Y %H:%M", tz = "UTC"))
[1] 12.66667
Edit: If your system locale is Spanish, use c("sábado", "domingo") instead of c("Saturday", "Sunday"), because weekdays() returns day names in the current locale.
For the data frame example, you can use mapply to call the function with the two date-time columns as arguments (convert both columns to POSIXct with tz = "UTC" first, as in the call above). Try:
df$business_hours <- mapply(calc_bus_hours, df$start_date, df$end_date)
start_date end_date business_hours
1 2020-07-24 22:20:00 2020-07-25 21:20:00 12.66667
2 2020-07-14 21:00:00 2020-07-16 09:30:00 18.50000
3 2020-07-18 08:26:00 2020-07-19 10:00:00 13.00000
4 2020-07-10 08:00:00 2020-07-13 11:00:00 42.00000
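For comparison, here is a dependency-free base-R sketch of the same clamping idea (the helper name bus_hours_base is hypothetical). For each calendar day it computes the overlap between [start, end] and that day's opening hours, floored at zero. It assumes both arguments are POSIXct in UTC and uses POSIXlt$wday to detect weekends, which avoids the locale issue with weekdays().

```r
# Hypothetical helper: business hours between two POSIXct times (UTC).
# Weekdays are open 08:00-23:00, weekends 09:00-21:00, as in the question.
bus_hours_base <- function(start, end) {
  days <- seq(as.Date(start, tz = "UTC"), as.Date(end, tz = "UTC"), by = "day")
  total <- 0
  for (i in seq_along(days)) {
    d <- days[i]
    # POSIXlt$wday: 0 = Sunday, 6 = Saturday (locale-independent)
    weekend <- as.POSIXlt(d)$wday %in% c(0, 6)
    open_h  <- if (weekend) 9 else 8
    close_h <- if (weekend) 21 else 23
    day0  <- as.POSIXct(format(d), tz = "UTC")
    open  <- day0 + open_h * 3600
    close <- day0 + close_h * 3600
    # Overlap of [start, end] with [open, close], never negative
    overlap <- as.numeric(difftime(min(end, close), max(start, open), units = "hours"))
    total <- total + max(0, overlap)
  }
  total
}

df <- data.frame(
  start_date = c("07/24/2020 22:20", "07/14/2020 21:00", "07/18/2020 08:26", "07/10/2020 08:00"),
  end_date   = c("07/25/2020 21:20", "07/16/2020 09:30", "07/19/2020 10:00", "07/13/2020 11:00"))
df$start_date <- as.POSIXct(df$start_date, format = "%m/%d/%Y %H:%M", tz = "UTC")
df$end_date   <- as.POSIXct(df$end_date,   format = "%m/%d/%Y %H:%M", tz = "UTC")

# Row-wise call by index, so each argument stays a POSIXct scalar
df$business_hours <- vapply(seq_len(nrow(df)), function(i)
  bus_hours_base(df$start_date[i], df$end_date[i]), numeric(1))
df
```

Iterating by index rather than with for (d in days) matters here, because a plain for loop over a Date vector drops the Date class.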
This is a tough one for me. I have 3 months of data (up to 1M observations) and 2 columns in my data.frame:
Date_Time Number
12/1/2015 12:00:01 AM 92222222
12/1/2015 12:00:29 AM 32211111
12/1/2015 12:00:41 AM 22333333
12/1/2015 12:00:43 AM 12222222
..... .....
12/1/2015 9:00:02 AM 92222222
12/2/2015 12:00:02 AM 32211111
How can I count the occurrences (frequency) of each value in column "Number" within a 24-hour time frame?
The expected result for the example above:
92222222 Freq: 2
32211111 Freq: 2
22333333 Freq: 1
12222222 Freq: 1
EDIT
A time frame of 24 hours refers to a 24-hour interval; it does not mean midnight to midnight. For example, if someone calls at 5 PM today and calls again at 3 PM the next day, this should be counted as 2.
Edit 2:
To be clearer, the objective of this analysis is to know the number of repeat calls to the call center within a 24-hour window.
For example, a customer called from contact number 01101111 on 1/Jan/2016 1:32:01 PM,
then called again on 1/Jan/2016 1:59:43 PM, and finally called the next day on 2/Jan/2016 12:21:02 PM.
The frequency of 01101111 is considered to be "3" because the number is repeated 3 times in less than 24 hours.
Based on your comments, for any number the start of the period is the earliest call from that number.
Below is the commented code:
library(lubridate)
library(dplyr)
calls <- structure(list(Date_Time = structure(1:6, .Label = c("12/1/2015 12:00:01 AM",
"12/1/2015 12:00:29 AM", "12/1/2015 12:00:41 AM", "12/1/2015 12:00:43 AM",
"12/1/2015 9:00:02 AM", "12/2/2015 12:00:02 AM"), class = "factor"),
Number = structure(c(4L, 3L, 2L, 1L, 4L, 3L), .Label = c("12222222",
"22333333", "32211111", "92222222"), class = "factor")), .Names = c("Date_Time",
"Number"), row.names = c(NA, -6L), class = "data.frame")
count_freq <- function(timestamps){
#Given all the occurrences of calls from a number, find the
#earliest one and count how many occur within 24 hours of it
dtime <- sort(mdy_hms(timestamps))
start_time <- dtime[1]
end_time <- start_time + hours(24)
sum(dtime >= start_time & dtime <= end_time)
}
out <- group_by(calls, Number) %>%
summarise(freq = count_freq(Date_Time))
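For comparison, the same earliest-call rule can be sketched in base R with tapply(), without lubridate or dplyr. This is an alternative to the code above, not part of the original answer, and it assumes the timestamps parse with the format string shown.

```r
# Example data from the question (Number kept as character)
calls <- data.frame(
  Date_Time = c("12/1/2015 12:00:01 AM", "12/1/2015 12:00:29 AM",
                "12/1/2015 12:00:41 AM", "12/1/2015 12:00:43 AM",
                "12/1/2015 9:00:02 AM",  "12/2/2015 12:00:02 AM"),
  Number = c("92222222", "32211111", "22333333", "12222222",
             "92222222", "32211111"),
  stringsAsFactors = FALSE)

# %p (AM/PM) parsing assumes an English or C locale
calls$Date_Time <- as.POSIXct(calls$Date_Time,
                              format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# For each number, count calls no later than 24 hours after its earliest call
freq <- tapply(calls$Date_Time, calls$Number,
               function(t) sum(t <= min(t) + 24 * 60 * 60))
freq
```

The result is named by number, in sorted order, and matches the expected output in the question (2 for 92222222 and 32211111, 1 for the others).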
Here is another approach that outputs, for each row, the frequency of that row's number within the following 24 hours; it is most likely slower than tfc's answer.
df<-read.table(header = TRUE, sep=",", text="Date_Time, Number
12/1/2015 12:00:01 AM, 92222222
12/1/2015 12:00:29 AM, 32211111
12/1/2015 12:00:41 AM, 22333333
12/1/2015 12:00:43 AM, 12222222
12/1/2015 9:00:02 AM, 92222222
12/2/2015 12:00:02 AM, 32211111")
df$Date_Time<-as.POSIXct(df$Date_Time, format="%m/%d/%Y %I:%M:%S %p")
library(dplyr)
ncount <- function(x) {
  # apply() passes each row as a character vector: x[1] = Date_Time, x[2] = Number
  target <- as.numeric(x[2])
  starttime <- as.POSIXct(x[1])
  endtime <- starttime + 24*60*60 #1 day later
  # Count rows with the same number inside [starttime, endtime]
  nrow(filter(df, Number == target & Date_Time >= starttime & Date_Time <= endtime))
}
df$freq <- apply(df, 1, ncount)
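Filtering the whole data frame once per row is quadratic, which may be slow for about 1M rows. A sketch of a vectorized alternative (not from either answer; the helper name count_next_24h is hypothetical) uses ave() to group by number and findInterval() on each group's sorted times. For a call at time t, the count of calls in [t, t + 24h] equals the number of times at or before t + 24h minus the number of times strictly before t.

```r
# Example data from the question
df <- data.frame(
  Date_Time = c("12/1/2015 12:00:01 AM", "12/1/2015 12:00:29 AM",
                "12/1/2015 12:00:41 AM", "12/1/2015 12:00:43 AM",
                "12/1/2015 9:00:02 AM",  "12/2/2015 12:00:02 AM"),
  Number = c(92222222, 32211111, 22333333, 12222222, 92222222, 32211111))
# %p (AM/PM) parsing assumes an English or C locale
df$Date_Time <- as.POSIXct(df$Date_Time, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# (# times <= t + 24h) - (# times < t), computed on a sorted copy per number
count_next_24h <- function(t) {
  s <- sort(t)
  findInterval(t + 24 * 60 * 60, s) - findInterval(t, s, left.open = TRUE)
}

# ave() applies the function within each Number group and keeps row order
df$freq <- ave(as.numeric(df$Date_Time), df$Number, FUN = count_next_24h)
df
```

This should give the same per-row counts as the apply() version, including ties at the same timestamp, since both count calls with Date_Time >= t and Date_Time <= t + 24h.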
Given:
A data frame with three character columns.
Date Time Zone
1950-04-18 01:30 CST
1950-04-18 01:45 CST
1951-02-20 16:00 CST
1951-06-08 09:00 CST
1951-11-15 15:00 CST
1951-11-15 20:00 CST
Required:
1. Combine Date, Time, and Zone
2. Convert from character to a date-time
What I have tried:
1. datetime <- paste(Date, Time)
2. strptime(datetime[1], "%Y-%m-%d %H:%M", tz=Zone[1])
This successfully parses the first element; however, I would like to convert the entire vector using one of the looping functions, such as lapply or sapply.
How can I use loop functions to parse the entire vector?
NOTE: I forgot to mention earlier that the data contains various abbreviated time zones other than CST.
Time zones can be tricky to handle because different formats are in use. To get a list of the time zones your system recognizes, run OlsonNames().
The CST time zone used in your example is not always supported, so you might get the following warning when you use it as a time zone:
In as.POSIXct.POSIXlt(x) : unknown timezone 'CST'
I've constructed an example dataset (see below) to show how you can update your datetime with timezone information. The following for loop:
for (i in 1:nrow(d))
d$datetime[i] <- strftime(paste(d$Date, d$Time)[i],
format="%Y-%m-%d %H:%M",
tz = as.character(d$Zone[i]),
usetz = TRUE)
will give:
> d
Date Time Zone datetime
1 1950-04-18 01:30 GMT 1950-04-18 01:30 GMT
2 1950-04-18 01:45 CET 1950-04-18 01:45 CET
3 1951-02-20 16:00 EET 1951-02-20 16:00 EET
4 1951-06-08 09:00 EST 1951-06-08 09:00 EST
5 1951-11-15 15:00 WET 1951-11-15 15:00 WET
6 1951-11-15 20:00 MST 1951-11-15 20:00 MST
As mentioned, your dataset might contain time zone abbreviations that your system does not recognize. You could replace these with the help of this list, for example.
Used data:
d <- read.table(text="Date Time Zone
1950-04-18 01:30 GMT
1950-04-18 01:45 CET
1951-02-20 16:00 EET
1951-06-08 09:00 EST
1951-11-15 15:00 WET
1951-11-15 20:00 MST", header=TRUE, stringsAsFactors = FALSE)
I think this should work:
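df1 <- data.frame(x = paste(df$Date, df$Time), Zone = df$Zone)
# Parse each element in its own time zone; returns a list of POSIXlt objects
d <- mapply(FUN = strptime, x = df1$x, tz = as.character(df1$Zone),
            format = "%Y-%m-%d %H:%M", SIMPLIFY = FALSE, USE.NAMES = FALSE)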
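Because a single POSIXct vector carries only one tzone attribute, another option (a sketch, not from either answer) is to parse each row in its own zone and store the resulting instants in UTC. This assumes every Zone value is a name your system recognizes (see OlsonNames()); an unrecognized name typically triggers an "unknown timezone" warning and is treated as UTC. The column name datetime_utc is hypothetical.

```r
d <- read.table(text = "Date Time Zone
1950-04-18 01:30 GMT
1950-04-18 01:45 CET
1951-02-20 16:00 EET
1951-06-08 09:00 EST
1951-11-15 15:00 WET
1951-11-15 20:00 MST", header = TRUE, stringsAsFactors = FALSE)

# Parse each row in its own zone and keep the seconds since the epoch
secs <- vapply(seq_len(nrow(d)), function(i) {
  as.numeric(as.POSIXct(paste(d$Date[i], d$Time[i]),
                        format = "%Y-%m-%d %H:%M", tz = d$Zone[i]))
}, numeric(1))

# One POSIXct column can carry only one time zone, so store the instants in UTC
d$datetime_utc <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
d
```

Unlike the strftime() loop, which produces character strings, this gives a proper date-time column that can be sorted and subtracted directly.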
I have two columns in R, both containing dates in the following format:
1/2/2015 3:00:00 PM
I need to create a new column, 'Hours', containing the number of hours that have passed between the two dates. I've tried the following, but it gives me the difference in days:
col1 <- df$collection.when
col2 <- df$discardable_when
col3 <- as.Date(col1) - as.Date(col2)
head(col3)
# > Time differences in days
[1] -393 NA NA NA -485 NA
EDIT:
It seems that @HubertL's answer is the correct solution; however, I cannot get a complete data.frame for an unknown reason. Here's a screenshot that shows my workflow. Any help is greatly appreciated.
Instead of as.Date, use as.POSIXct which includes time information.
You can also use difftime to specify units of output.
time1 = as.POSIXct("2015-01-01 01:00:00", tz = "UTC")
time2 = as.POSIXct("2015-03-02 05:00:00", tz = "UTC")
difftime(time2, time1, units = "min")
Time difference of 86640 mins
Date objects don't store a time of day, so their difference is in whole days.
You could use POSIXct instead:
a <- as.POSIXct(strptime("1/2/2015 2:00:00 PM", "%m/%d/%Y %I:%M:%S %p"), tz="")
b <- as.POSIXct(strptime("1/2/2015 5:00:00 PM", "%m/%d/%Y %I:%M:%S %p"), tz="")
delta <- b - a
units(delta) <- "hours"
as.numeric(delta)
[1] 3
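Applied to the question's two columns, a minimal sketch could look like the following. The two-row data frame is hypothetical; it only reuses the question's column names and "1/2/2015 3:00:00 PM" format. It computes discardable_when minus collection.when, so swap the difftime() arguments if you want the opposite sign.

```r
# Hypothetical data frame using the question's column names and date format
df <- data.frame(
  collection.when  = c("1/2/2015 3:00:00 PM", "1/5/2015 9:30:00 AM"),
  discardable_when = c("1/3/2015 3:00:00 PM", NA),
  stringsAsFactors = FALSE)

fmt <- "%m/%d/%Y %I:%M:%S %p"  # %p (AM/PM) assumes an English or C locale
start <- as.POSIXct(df$collection.when,  format = fmt, tz = "UTC")
end   <- as.POSIXct(df$discardable_when, format = fmt, tz = "UTC")

# Hours elapsed per row; rows with a missing date give NA
df$Hours <- as.numeric(difftime(end, start, units = "hours"))
df
```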