How to get the time difference with minutes and seconds?

I have two columns of time information in minutes and seconds in a data.frame, with no additional date information. I want to calculate the difference between these two columns (end_time - start_time) as a new column, either in minutes and seconds as expressed in the original variables (diff_time1) or in seconds (diff_time2). How can I calculate this in R?
For example:
start_time end_time diff_time1 diff_time2
12'10" 16'23" 4'13" 253
1'05" 76'20" 75'15" 4515
96'10" 120'22" 24'12" 1452

Assuming that your times are stored as strings, in which case the quote denoting seconds must be escaped:
times <- data.frame(start_time = c("12'10\"", "1'05\"", "96'10\""),
end_time = c("16'23\"", "76'20\"", "120'22\"")
)
Then you can use lubridate::ms to convert to minutes + seconds and do the calculations. You'll need to do some additional text conversions if you want the results for diff_time1 as strings:
library(lubridate)
library(dplyr)
times %>%
mutate(diff_time1 = ms(end_time) - ms(start_time)) %>%
mutate(diff_time2 = as.numeric(diff_time1)) %>%
mutate(diff_time1 = gsub("M ", "'", diff_time1)) %>%
mutate(diff_time1 = gsub("S", "\"", diff_time1))
start_time end_time diff_time1 diff_time2
1 12'10" 16'23" 4'13" 253
2 1'05" 76'20" 75'15" 4515
3 96'10" 120'22" 24'12" 1452
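If lubridate is not available, roughly the same result can be sketched in base R by parsing the strings directly. This is only a minimal sketch: it assumes every value has the form minutes'seconds" and that end_time is never earlier than start_time. The helper names to_seconds and to_min_sec are made up for illustration and are not part of the answer above.

```r
# Hypothetical helpers (not from the answer above): parse strings such as
# 12'10" into seconds, and format seconds back into the same notation.
to_seconds <- function(x) {
  parts <- regmatches(x, regexec("^([0-9]+)'([0-9]+)\"$", x))
  vapply(parts, function(p) as.numeric(p[2]) * 60 + as.numeric(p[3]),
         numeric(1), USE.NAMES = FALSE)
}

to_min_sec <- function(s) {
  sprintf("%d'%02d\"", as.integer(s %/% 60), as.integer(s %% 60))
}

times <- data.frame(
  start_time = c("12'10\"", "1'05\"", "96'10\""),
  end_time   = c("16'23\"", "76'20\"", "120'22\""),
  stringsAsFactors = FALSE
)

# Difference in seconds, then the same difference as minutes'seconds"
times$diff_time2 <- to_seconds(times$end_time) - to_seconds(times$start_time)
times$diff_time1 <- to_min_sec(times$diff_time2)
times
```

Working in plain seconds keeps the arithmetic simple; the string formatting is applied only at the end.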

You can separate the minutes and seconds and store them as difftime objects, which can be added and subtracted:
library(tidyverse)
df <- structure(list(start_time = c("12'10\"", "1'05\"", "96'10\""),
end_time = c("16'23\"", "76'20\"", "120'22\"")), class = "data.frame", row.names = c(NA,
-3L), .Names = c("start_time", "end_time"))
df %>%
separate(start_time, c('start_min', 'start_sec'), convert = TRUE, extra = 'drop') %>%
separate(end_time, c('end_min', 'end_sec'), convert = TRUE, extra = 'drop') %>%
mutate(start = as.difftime(start_min, units = 'mins') + as.difftime(start_sec, units = 'secs'),
end = as.difftime(end_min, units = 'mins') + as.difftime(end_sec, units = 'secs'),
diff_time = end - start)
#> start_min start_sec end_min end_sec start end diff_time
#> 1 12 10 16 23 730 secs 983 secs 253 secs
#> 2 1 5 76 20 65 secs 4580 secs 4515 secs
#> 3 96 10 120 22 5770 secs 7222 secs 1452 secs
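The difftime arithmetic used above can be tried on a single pair of values without any packages. A small sketch using the first row of the example data (the variable names are illustrative):

```r
# Build 12'10" and 16'23" as difftime objects; adding difftimes with
# different units yields a result in seconds.
start <- as.difftime(12, units = "mins") + as.difftime(10, units = "secs")
end   <- as.difftime(16, units = "mins") + as.difftime(23, units = "secs")

diff_time <- end - start

# Extract the plain number in the unit you need
diff_secs <- as.numeric(diff_time, units = "secs")
diff_mins <- as.numeric(diff_time, units = "mins")
diff_secs
```

Passing units explicitly to as.numeric() avoids depending on whichever unit the difftime object happens to carry.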

Related

Is there a simple way to calculate minutes between two date_times, excluding a specific interval?

start End minutes
2019-01-11 14:36:00 2019-01-13 16:27:00 2991
What I want is to calculate minutes excluding the interval between 00:00 and 06:00.
interval_time_excluded=as.numeric(round(difftime("2019-01-13 16:27:00", "2019-01-11 14:36:00", units = "days")))*as.difftime(c("06:00:00", "00:00:00"), units = "mins")[1]
interval_time_excluded
# output : Time difference of 720 mins
difftime("2019-01-13 16:27:00", "2019-01-11 14:36:00", units = "mins")
# output : Time difference of 2991 mins
difftime("2019-01-13 16:27:00", "2019-01-11 14:36:00", units = "mins")-interval_time_excluded
# Your desired output : Time difference of 2271 mins
Your question is simple; you just need more practice with R functions like apply:
end_date="2019-01-11 14:36:00"
start_date="2019-01-13 16:27:00"
start_date_excluded="00:00:00"
end_date_excluded="06:00:00"
diff_times<-function(start_date="2019-01-13 16:27:00",end_date="2019-01-11 14:36:00",start_date_excluded="00:00:00",end_date_excluded="06:00:00"){
interval_time_excluded=as.numeric(round(difftime(start_date, end_date, units = "days")))*as.difftime(c(end_date_excluded, start_date_excluded), units = "mins")[1]
# interval_time_excluded
# output : Time difference of 720 mins
# part for the special case
if(round(difftime(start_date, end_date, units = "days"))==0){
if(c(as.numeric(lubridate::hour(start_date))+1)==as.numeric(substr(end_date_excluded,1,2)) & as.numeric(substr(start_date_excluded,1,2))==as.numeric(lubridate::hour(end_date))) {
return(0)
}else{
return(difftime(start_date, end_date, units = "mins"))
}
}
# part for the special case
# difftime(start_date, end_date, units = "mins")
# output : Time difference of 2991 mins
return(difftime(start_date, end_date, units = "mins")-interval_time_excluded)
# Your desired output : Time difference of 2271 mins
}
diff_times() # An example of a running using the default entered function values
data=structure(list(BLOCK_DATE_TIME.x = structure(c(1547217360, 1547225100, 1547392800, 1554900060, 1555930500, 1556305620), class = c("POSIXct", "POSIXt"), tzone = "UTC"), BLOCK_DATE_TIME.y = structure(c(1547396820, 1547228280, 1547397600, 1554905520, 1555936980, 1556362320), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 6L), class = "data.frame")
apply(data, 1 , function(x) return(diff_times(x[2],x[1],start_date_excluded="00:00:00",end_date_excluded="06:00:00"))) # The result using your data ( returned as a vector )
This outputs:
Time difference of 2271 mins
1 2 3 4 5 6
2271 53 80 91 108 585
Examples for special cases :
diff_times(start_date="2019-01-09 05:59:00",end_date="2019-01-09 00:00:00",start_date_excluded="00:00:00",end_date_excluded="06:00:00")
diff_times(start_date="2019-01-09 07:59:00",end_date="2019-01-09 00:00:00",start_date_excluded="00:00:00",end_date_excluded="06:00:00")
[1] 0
Time difference of 479 mins
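An alternative that avoids rounding to whole days is to compute the actual overlap between the interval and each day's excluded window. The following is a minimal base R sketch under stated assumptions: start is not later than end, the excluded window lies within a single calendar day, and all times are in one fixed time zone (UTC here). The function name minutes_excluding and its arguments are hypothetical. Note that it returns 119 minutes for the 00:00 to 07:59 example, where the function above returns 479, because it subtracts the 360 excluded minutes even when both times fall on the same day.

```r
# Hypothetical helper: minutes between start and end, not counting the
# daily window from excl_from_hour to excl_to_hour.
minutes_excluding <- function(start, end, excl_from_hour = 0, excl_to_hour = 6,
                              tz = "UTC") {
  start <- as.POSIXct(start, tz = tz)
  end   <- as.POSIXct(end, tz = tz)
  total <- as.numeric(difftime(end, start, units = "mins"))

  # One excluded window per calendar day touched by the interval
  days      <- seq(as.Date(start, tz = tz), as.Date(end, tz = tz), by = "day")
  midnight  <- as.numeric(as.POSIXct(paste(days, "00:00:00"), tz = tz))
  win_start <- midnight + excl_from_hour * 3600
  win_end   <- midnight + excl_to_hour * 3600

  # Overlap in seconds between [start, end] and each window, floored at zero
  overlap <- pmax(0, pmin(as.numeric(end), win_end) -
                     pmax(as.numeric(start), win_start))
  total - sum(overlap) / 60
}

res_multi_day <- minutes_excluding("2019-01-11 14:36:00", "2019-01-13 16:27:00")
res_inside    <- minutes_excluding("2019-01-09 00:00:00", "2019-01-09 05:59:00")
res_partial   <- minutes_excluding("2019-01-09 00:00:00", "2019-01-09 07:59:00")
```

Time zones with daylight saving changes would need extra care, since a local day is then not always 24 hours long.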

Comparing intervals across multiple time series in R

I have two concurrent time series A and B, both containing events defined by start and end times - here is a sample:
A.df <- structure(list(A.eventid = 1:53,
A.start = structure(c(1563219814.52, 1563219852.37, 1563220313.16, 1563220472.66, 1563220704.35, 1563220879.51, 1563221108.24, 1563221158.33, 1563221387.43, 1563221400.7, 1563221602.34, 1563221828.33, 1563222165.52, 1563222314.2, 1563222557.28, 1563222669.44, 1563222905.52, 1563223091.62, 1563223237.19, 1563223273.64, 1563223580.14, 1563223908.66, 1563224093.27, 1563224497.41, 1563224554.64, 1563224705.57, 1563225011.55, 1563225192.59, 1563225305.14, 1563225414.38, 1563225432.21, 1563225898.61, 1563226034.51, 1563226110.18, 1563226206.49, 1563226528.13, 1563226570.18, 1563226788.53, 1563227026.21, 1563227502.2, 1563227709.3, 1563227832.51, 1563228127.44, 1563228188.4, 1563228293.59, 1563228558.39, 1563228680.32, 1563228819.44, 1563229208.51, 1563229282.14, 1563229528.52, 1563229959.21, 1563230268.65), class = c("POSIXct", "POSIXt")),
A.end = structure(c(1563219846.43, 1563220304.39, 1563220470.68, 1563220702.37, 1563220877.5, 1563221102.18, 1563221151.47, 1563221379.63, 1563221389.22, 1563221600.32, 1563221819.27, 1563222157.29, 1563222312.23, 1563222555.25, 1563222667.42, 1563222894.56, 1563223079.44, 1563223230.39, 1563223273.24, 1563223578.14, 1563223900.48, 1563224089.24, 1563224493.45, 1563224550.37, 1563224699.47, 1563225005.13, 1563225188.17, 1563225293.21, 1563225412.17, 1563225417.46, 1563225894.44, 1563226025.2, 1563226108.13, 1563226204.37, 1563226517.59, 1563226562.41, 1563226780.59, 1563227022.28, 1563227493.57, 1563227705.52, 1563227830.38, 1563228125.49, 1563228184.21, 1563228286.39, 1563228546.47, 1563228677.67, 1563228816.5, 1563229198.68, 1563229273.54, 1563229526.53, 1563229952.57, 1563230257.16, 1563230742.25), class = c("POSIXct", "POSIXt"))),
row.names = 1:53, class = "data.frame")
B.df <- structure(list(B.eventid = 1:52,
B.start = structure(c(1563221811.888, 1563222153.835, 1563222156.013, 1563222220.14, 1563222289.692, 1563222305.607, 1563222611.565, 1563222631.139, 1563222636.867, 1563222763.565, 1563222774.301, 1563222848.507, 1563222849.957, 1563222853.513, 1563223225.656, 1563223302.539, 1563223326.153, 1563223328.934, 1563223590.144, 1563223592.904, 1563224035.038, 1563224692.704, 1563226451.642, 1563226454.731, 1563226819.701, 1563226824.685, 1563227278.677, 1563227770.247, 1563227773.907, 1563227800.529, 1563227804.663, 1563227809.749, 1563227813.237, 1563227819.043, 1563227829.781, 1563227973.727, 1563229396.472, 1563229454.515, 1563229473.079, 1563229488.669, 1563229521.413, 1563229542.954, 1563229553.595, 1563229565.988, 1563229569.095, 1563229618.857, 1563229791.585, 1563229936.355, 1563230339.141, 1563230734.677, 1563231667.173, 1563231978.567), class = c("POSIXct", "POSIXt")),
B.end = structure(c(1563221815.058, 1563222154.295, 1563222158.633, 1563222222.07, 1563222289.872, 1563222308.617, 1563222614.265, 1563222633.509, 1563222640.367, 1563222769.045, 1563222774.801, 1563222848.677, 1563222850.237, 1563222856.103, 1563223226.166, 1563223305.339, 1563223328.763, 1563223333.234, 1563223591.454, 1563223593.084, 1563224043.618, 1563224695.234, 1563226454.622, 1563226456.771, 1563226822.551, 1563226827.225, 1563227282.067, 1563227771.787, 1563227774.477, 1563227802.199, 1563227806.653, 1563227811.569, 1563227817.897, 1563227823.643, 1563227830.351, 1563227978.177, 1563229401.282, 1563229457.905, 1563229478.359, 1563229492.439, 1563229527.723, 1563229545.694, 1563229558.975, 1563229568.658, 1563229571.255, 1563229621.117, 1563229792.055, 1563229952.055, 1563230344.351, 1563230739.647, 1563231672.983, 1563231979.987), class = c("POSIXct", "POSIXt"))),
row.names = 1:52, class = "data.frame")
Events in series A are longer, while events in B are shorter.
I've drawn a schematic to help explain:
For each A event during which ≥ 4 B events occur, I'd like to compare (also shown on the schematic):
X = the mean interval between B events occurring during the A event
with
Y = the interval between the last B event occurring during the A event, and the first B event occurring after the A event
My issues are with the calculation of X and Y.
To calculate X, I tried using foverlaps to group B events by the A events in which they occur. But, this excludes B events occurring within gaps between A events.
Also, my attempts to calculate the mean intervals between grouped B events using mutate and lag failed, as I couldn't restrict lag to working only within the groups (i.e. it calculated intervals between groups as well).
Finally, I'm not sure how to efficiently identify the start/end of the Y interval to calculate its duration.
I was thinking my R/coding was improving, but this has me floundering a bit - any help would be very much appreciated!
Assuming your B events are in chronological order, do not overlap each other, and each fall within at most one A event...
Explanation and intermediate output are given as comments in the code below.
I could not verify the output, since you provided no expected output in your question. The results look plausible to me at first glance.
library(data.table)
setDT(A.df); setDT(B.df)
#get time to next B
B.df[, time.to.next.B := shift(B.start, type = "lead") - B.end ][]
#get A-event that the B-events falls into
B.df[ A.df,
A.eventid := i.A.eventid,
on = .(B.start >= A.start, B.end <= A.end )][]
# B.eventid B.start B.end time.to.next.B A.eventid
# 1: 1 2019-07-15 22:16:51 2019-07-15 22:16:55 338.777 secs 11
# 2: 2 2019-07-15 22:22:33 2019-07-15 22:22:34 1.718 secs 12
# 3: 3 2019-07-15 22:22:36 2019-07-15 22:22:38 61.507 secs NA
# 4: 4 2019-07-15 22:23:40 2019-07-15 22:23:42 67.622 secs 13
# 5: 5 2019-07-15 22:24:49 2019-07-15 22:24:49 15.735 secs 13
# 6: 6 2019-07-15 22:25:05 2019-07-15 22:25:08 302.948 secs 13
# ...
#summarise by A.eventid, get number of B-events, and B.eventid of last B-event
#only keep A.eventids with 4 or more B-events
ans <- B.df[ !is.na( A.eventid),
.( B.events = .N,
last.B.eventid = max( B.eventid ),
next.B.eventid = max( B.eventid ) + 1,
mean.B.interval.within.A = mean( time.to.next.B[ B.eventid != max( B.eventid ) ] ) ),
by = .(A.eventid) ][ B.events >= 4, ]
# A.eventid B.events last.B.eventid next.B.eventid mean.B.interval.within.A
# 1: 16 5 14 15 20.879500 secs
# 2: 41 8 35 36 6.097714 secs
# 3: 50 4 40 41 26.239000 secs
# 4: 51 7 48 49 62.953500 secs
#now find the needed intervals using update joins
ans[ B.df, start_time := i.B.end, on = .(last.B.eventid = B.eventid)]
ans[ B.df, end_time := i.B.start, on = .(next.B.eventid = B.eventid)]
# A.eventid B.events last.B.eventid next.B.eventid mean.B.interval.within.A start_time end_time
# 1: 16 5 14 15 20.879500 secs 2019-07-15 22:34:16 2019-07-15 22:40:25
# 2: 41 8 35 36 6.097714 secs 2019-07-15 23:57:10 2019-07-15 23:59:33
# 3: 50 4 40 41 26.239000 secs 2019-07-16 00:24:52 2019-07-16 00:25:21
# 4: 51 7 48 49 62.953500 secs 2019-07-16 00:32:32 2019-07-16 00:38:59
X <- ans$mean.B.interval.within.A
# Time differences in secs
# [1] 20.879500 6.097714 26.239000 62.953500
Y <- ans$end_time - ans$start_time
# Time differences in secs
# [1] 369.553 143.376 28.974 387.086
I tried to come up with a possible solution, minus the averaging step, which should be straightforward. First I renamed the columns, which makes it easier to join the data sets:
library(dplyr)
library(stringr)
# rename_with() replaces the deprecated rename_all(funs(...)); the anchored,
# escaped pattern strips only the literal "A." / "B." prefix
A.df = A.df %>%
rename_with(~ str_replace(.x, "^A\\.", "")) %>%
mutate(type="A")
B.df = B.df %>%
rename_with(~ str_replace(.x, "^B\\.", "")) %>%
mutate(type="B")
Then the overall data, sorted by time, is:
data = bind_rows(A.df, B.df) %>%
arrange(start)
Now I add a column showing the time stamp of the last start of an A event. Forward filling this value will show for each event the time of the last A event.
data = data %>%
mutate(last.A.start=ifelse(type=='A', start, NA)) %>%
tidyr::fill(last.A.start)
Finally, the A events can be removed. As long as last.A.start is the same, the B events belong to the same A event. Based on this information, x and y can be calculated.
data = data %>%
filter(type == "B") %>%
mutate(
duration=end-start, # Not needed.
delta=start - lag(end),
sameA=(last.A.start == lag(last.A.start)),
x=ifelse(sameA, delta, NA),
y=ifelse(sameA, NA, delta)
)
Does this help?
Bests, M
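The question mentions that lag calculated intervals across groups as well as within them. One way to keep the previous value inside each group, sketched here in base R with ave(), is shown below. The toy data and column names are invented for illustration and are not the A.df/B.df data from the question.

```r
# Toy events: 'group' stands in for the A event each B event falls into
events <- data.frame(
  group = c(1, 1, 1, 2, 2),
  start = c(0, 10, 25, 100, 130),
  end   = c(5, 12, 30, 110, 135)
)

# Previous end time within each group; the first row of a group gets NA
prev_end <- ave(events$end, events$group,
                FUN = function(v) c(NA, head(v, -1)))

# Gap between the end of the previous event and the start of this one
events$gap <- events$start - prev_end

# Mean gap per group, ignoring the NA at the start of each group
mean_gap <- tapply(events$gap, events$group, mean, na.rm = TRUE)
mean_gap
```

With dplyr, the equivalent idea is to call group_by() on the grouping column before using lag() inside mutate().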

Find difference between time in R

I would like to calculate time difference in R of "A" and "B". The data that I have is the hour/minute/am-pm of individuals when they go to sleep("A") and at what time they wake up("B"): (df is called time)
Hour(A) Min(A) AMPM(A) Hour(B) Min(B) AMPM(B)
1 30 AM 7 30 AM
4 00 AM 9 00 AM
11 30 PM 6 30 AM
I have been doing some research, and what I found is that I could build the time as a character string and then convert it to a time format.
First, I used the unite() function (tidyverse) to join the hour(A) and min(A). Then, I created another column with a "fake" date (if it was pm: "2019-04-13" & am "2019-04-14"). Then, I used again the function unite() to join the date and the time and with the function strptime() I change the class to time.
For hour(B), min(B) and AMPM(B), I also used the function unite and join the three columns. Then I applied the function strptime() to change the class to a time.
Finally, I am using the function difftime() to find the difference between A and B, but I can't understand why I am getting unusual results.
time <- time %>% mutate(Date = ifelse(AMPM(A) == " AM", "2019-04-14", "2019-04-13"))
time$Date <- as.Date(time$Date)
#Using unite to join Hour(A) with Mins(A) and Hour(B) with Mins(B)
time <- time %>% unite(Sleeptime,HourA,MinsA, sep = ":") %>% unite(Wakeuptime, HourB,MinsB, sep = ":")
#Adding the seconds
time$Sleeptime <- paste0(time$Sleeptime,":00")
#Using unite to join Hours(B)Mins(B) with AMPM(B)
time <- time %>% unite(Wakeuptime, Wakeuptime ,AMPMWake, sep = "" )
#Changing the class for time (B)
time$Wakeuptime2 <- strptime(x = paste0(time$Wakeuptime2, "m"), format = "%I:%M %p")
#Joining the fake date for (A) with the time(A)
time <- time %>% unite(ST, Date, Sleeptime, sep = " ")
#Changing the class for time (A)
time$ST = strptime(time$ST,format='%Y-%m-%d %H:%M:%S')
#Calculating the difference in time
time$difference <- difftime(time$Wakeuptime2, time$ST, units = "hours")
What I need is another column with the difference in hour or minutes
Hour(A) Min(A) AMPM(A) Hour(B) Min(B) AMPM(B) DIFF (min)
1 30 AM 7 30 AM 300
4 00 AM 9 00 AM 300
11 30 PM 6 30 AM 420
We could use paste to assemble the fragments of time(A) and time(B), then convert as.POSIXct. From bed-times with PM we subtract 8.64e4 (one day in seconds). Now it's easy to calculate the differences within an apply.
tmp <- sapply(list(1:3, 4:6), function(x) {
cl <- as.POSIXct(apply(time[x], 1, paste, collapse=":"), format="%I:%M:%p")
return(ifelse(time[tail(x, 1)] == "PM", cl - 8.64e4, cl))
})
time <- cbind(time, `DIFF(min)`=apply(tmp, 1, diff)/60)
time
# Hour(A) Min(A) AMPM(A) Hour(B) Min(B) AMPM(B) DIFF(min)
# 1 1 30 AM 7 30 AM 360
# 2 4 0 AM 9 0 AM 300
# 3 11 30 PM 6 30 AM 420
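The same differences can also be obtained without date-time classes by converting each clock time to minutes since midnight and wrapping negative differences around 24 hours. This is a minimal base R sketch, assuming sleep always lasts less than 24 hours; the helper name to_minutes is hypothetical. Like the output above, it gives 360 minutes for the first row (1:30 AM to 7:30 AM).

```r
# Hypothetical helper: 12-hour clock parts to minutes since midnight
to_minutes <- function(hour, min, ampm) {
  hour24 <- (hour %% 12) + ifelse(ampm == "PM", 12, 0)  # 12 AM -> 0, 12 PM -> 12
  hour24 * 60 + min
}

sleep <- to_minutes(c(1, 4, 11), c(30, 0, 30), c("AM", "AM", "PM"))
wake  <- to_minutes(c(7, 9, 6),  c(30, 0, 30), c("AM", "AM", "AM"))

# Modulo 1440 (minutes per day) handles bed-times before midnight
diff_min <- (wake - sleep) %% 1440
diff_min
```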
Data
time <- structure(list(`Hour(A)` = c(1L, 4L, 11L), `Min(A)` = c(30L,
0L, 30L), `AMPM(A)` = c("AM", "AM", "PM"), `Hour(B)` = c(7L,
9L, 6L), `Min(B)` = c(30L, 0L, 30L), `AMPM(B)` = c("AM", "AM",
"AM")), row.names = c(NA, -3L), class = "data.frame")

Flag strange observations (rows) within lubridate::interval class object

Referring to my previous question here:
Flag rows with interval overlap in r
I have got a dataframe with some location information (1 = location A , 4 = location B)
:
df <- data.frame(stringsAsFactors=FALSE,
date = c("2018-09-02", "2018-09-02", "2018-09-02", "2018-09-02",
"2018-09-02", "2018-09-02", "2018-09-02", "2018-09-02",
"2018-09-02"),
ID = c("18101276-aa", "18101276-aa", "18102843-aa", "18102843-aa", "18102843-ab",
"18102843-aa", "18104148-aa", "18104148-ab", "18104148-ab"),
location = c(1L, 1L, 1L, 4L, 4L, 1L, 1L, 1L, 4L),
Start = c(111300L, 143400L, 030000L, 034900L, 064400L, 070500L, 060400L,
075100L, 081600L),
End = c(111459L, 143759L, 033059L, 035359L, 064759L, 070559L, 060459L,
81559L, 83559L),
start_hour_minute = c(1113L, 1434L, 0300L, 0349L, 0644L, 0705L, 0604L, 0751L, 0816L),
end_hour_minute = c(1114L, 1437L, 0330L, 0353L, 0647L, 0705L, 0604L, 0815L, 0835L))
Here, some observations (rows 8 and 9) show an individual jumping between two locations within a minute, which is not possible. How can I flag these implausible location shifts within my intervals?
I am using lubridate::interval() as recommended to make an interval class object:
data_out <- df %>%
# Get the hour, minute, and second values as standalone numerics.
mutate(
date = ymd(date),
Start_Hour = floor(Start / 10000),
Start_Minute = floor((Start - Start_Hour*10000) / 100),
Start_Second = (Start - Start_Hour*10000) - Start_Minute*100,
End_Hour = floor(End / 10000),
End_Minute = floor((End - End_Hour*10000) / 100),
End_Second = (End - End_Hour*10000) - End_Minute*100,
# Use the hour, minute, second values to create a start-end timestamp.
Start_TS = ymd_hms(date + hours(Start_Hour) + minutes(Start_Minute) + seconds(Start_Second)),
End_TS = ymd_hms(date + hours(End_Hour) + minutes(End_Minute) + seconds(End_Second)),
# Create an interval object.
Watch_Interval = interval(start = Start_TS, end = End_TS))
Here's a similar approach.
First, I add padding to the two "...minute" variables so that they are unambiguous (e.g. 0349L in the sample data reads in as the integer 349; this step pads it to become the text "0349"). Then I use those, in combination with the date, to get start and end times using lubridate::ymd_hm. (I presume there are no intervals that span midnight; if there were, you'd typically see a negative interval of time between the start and end. You could add a step to catch this and increment the end_time to be the next day.)
Then I sort by ID and start time, and group by ID. This limits the subsequent steps so they only calculate time_elapsed and suspicious within records for a single individual at a time. In this case a record is flagged as suspicious if the location has changed from the prior record, but less than 10 minutes have passed.
library(lubridate); library(dplyr); library(stringr)
df2 <- df %>%
# Add leading zero padding to variables containing "minute"
# (across() replaces the deprecated mutate_at(..., funs(...)))
mutate(across(contains("minute"), ~ str_pad(.x, width = 4, pad = "0"))) %>%
# convert to time stamps
mutate(start_time = ymd_hm(paste(date, start_hour_minute)),
end_time = ymd_hm(paste(date, end_hour_minute))) %>%
# Sort, then treat each individual separately
arrange(ID, start_time) %>%
group_by(ID) %>%
# Did location change while too little time passed?
mutate(time_elapsed = (start_time - lag(end_time)) / dminutes(1),
suspicious = (location != lag(location) & time_elapsed < 10)) %>%
ungroup()
> df2 %>% select(date, ID, location, start_time:suspicious)
# A tibble: 9 x 7
date ID location start_time end_time time_elapsed suspicious
<chr> <chr> <int> <dttm> <dttm> <dbl> <lgl>
1 2018-09-02 181012… 1 2018-09-02 11:13:00 2018-09-02 11:14:00 NA NA
2 2018-09-02 181012… 1 2018-09-02 14:34:00 2018-09-02 14:37:00 200 FALSE
3 2018-09-02 181028… 1 2018-09-02 03:00:00 2018-09-02 03:30:00 NA NA
4 2018-09-02 181028… 4 2018-09-02 03:49:00 2018-09-02 03:53:00 19 FALSE
5 2018-09-02 181028… 1 2018-09-02 07:05:00 2018-09-02 07:05:00 192 FALSE
6 2018-09-02 181028… 4 2018-09-02 06:44:00 2018-09-02 06:47:00 NA NA
7 2018-09-02 181041… 1 2018-09-02 06:04:00 2018-09-02 06:04:00 NA NA
8 2018-09-02 181041… 1 2018-09-02 07:51:00 2018-09-02 08:15:00 NA NA
9 2018-09-02 181041… 4 2018-09-02 08:16:00 2018-09-02 08:35:00 1 TRUE
I don't know if I got it right, but the code below flags a change in location combined with a time difference of 1 minute or less. It will flag row 9 in your example data. If you want to flag both rows 8 and 9, you can make a new column containing the next location (using dplyr::lead(location)) and adjust the condition inside FLAG.
data_out <- df %>%
# Get the hour, minute, and second values as standalone numerics.
mutate(
date = ymd(date),
Start_Hour = floor(Start / 10000),
Start_Minute = floor((Start - Start_Hour*10000) / 100),
Start_Second = (Start - Start_Hour*10000) - Start_Minute*100,
End_Hour = floor(End / 10000),
End_Minute = floor((End - End_Hour*10000) / 100),
End_Second = (End - End_Hour*10000) - End_Minute*100,
# Use the hour, minute, second values to create a start-end timestamp.
Start_TS = ymd_hms(date + hours(Start_Hour) + minutes(Start_Minute) + seconds(Start_Second)),
End_TS = ymd_hms(date + hours(End_Hour) + minutes(End_Minute) + seconds(End_Second)),
Previous_End = lag(End_TS),
Previous_Loc = lag(location),
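# Gap since the previous row as a plain number of minutes; fixing the units
# avoids depending on the unit that the subtraction picks automatically
Timediff = as.numeric(difftime(Start_TS, Previous_End, units = "mins")),
FLAG = ifelse((location != Previous_Loc) & (Timediff <= 1), 1, 0)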
)
EDIT
The snippet below won't flag cases where IDs change from one row to the next
data_out <- df %>%
# Get the hour, minute, and second values as standalone numerics.
mutate(
date = ymd(date),
Start_Hour = floor(Start / 10000),
Start_Minute = floor((Start - Start_Hour*10000) / 100),
Start_Second = (Start - Start_Hour*10000) - Start_Minute*100,
End_Hour = floor(End / 10000),
End_Minute = floor((End - End_Hour*10000) / 100),
End_Second = (End - End_Hour*10000) - End_Minute*100,
# Use the hour, minute, second values to create a start-end timestamp.
Start_TS = ymd_hms(date + hours(Start_Hour) + minutes(Start_Minute) + seconds(Start_Second)),
End_TS = ymd_hms(date + hours(End_Hour) + minutes(End_Minute) + seconds(End_Second)),
Previous_ID = lag(ID),
Previous_End = lag(End_TS),
Previous_Loc = lag(location),
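# Gap since the previous row as a plain number of minutes
Timediff = as.numeric(difftime(Start_TS, Previous_End, units = "mins")),
# Flag only when the location changed, the ID did not, and the gap is short
FLAG = ifelse(
(location != Previous_Loc) & (ID == Previous_ID) & (Timediff <= 1), 1, 0)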
)
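The HHMMSS arithmetic used in these snippets can be checked in isolation. The sketch below converts the integer codes from the last three rows of the sample data to seconds since midnight and flags a location change that follows the previous row within one minute. It deliberately ignores the ID column to stay short, and the helper name hhmmss_to_seconds is hypothetical.

```r
# Hypothetical helper: integer HHMMSS code -> seconds since midnight
hhmmss_to_seconds <- function(x) {
  (x %/% 10000) * 3600 + ((x %/% 100) %% 100) * 60 + (x %% 100)
}

# Rows 7 to 9 of the sample data
start    <- c(60400L, 75100L, 81600L)
end      <- c(60459L, 81559L, 83559L)
location <- c(1L, 1L, 4L)

start_s <- hhmmss_to_seconds(start)
end_s   <- hhmmss_to_seconds(end)

# Minutes elapsed since the previous row ended (NA for the first row)
gap_min <- (start_s - c(NA, head(end_s, -1))) / 60

# Location differs from the previous row and the gap is at most one minute
loc_changed <- c(FALSE, location[-1] != head(location, -1))
flag <- loc_changed & !is.na(gap_min) & gap_min <= 1
flag
```

Only the last row is flagged here, since it starts one second after the previous row ends and the location changes from 1 to 4.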

Calculating new column based on input from existing columns

I have a data frame with start and stop times for an experiment and I want to calculate the duration of each experiment (one line per experiment). Data frame:
start_t stop_t
7:35 7:48
23:50 00:15
11:22 12:06
I created a function to convert the time to POSIX format and calculate the duration, testing if start and stop crosses midnight:
TimeDiff <- function(t1,t2) {
if (as.numeric(as.POSIXct(paste("2016-01-01", t1))) > as.numeric(as.POSIXct(paste("2016-01-01", t2)))) {
t1n <- as.numeric(as.POSIXct(paste("2016-01-01", t1)))
t2n <- as.numeric(as.POSIXct(paste("2016-01-02", t2)))
}
if (as.numeric(as.POSIXct(paste("2016-01-01", t1))) < as.numeric(as.POSIXct(paste("2016-01-01", t2)))) {
t1n <- as.numeric(as.POSIXct(paste("2016-01-01", t1)))
t2n <- as.numeric(as.POSIXct(paste("2016-01-01", t2)))
}
#calculate time-difference in seconds
t2n - t1n
}
Then I wanted to apply this function to my data frame using either the 'mutate' function in 'dplyr' or an 'apply' function, e.g.:
mutate(df, dur = TimeDiff(start_t, stop_t))
But the result is that the 'dur' table is filled with just the same value. I ended up using a clunky for-loop to apply my function to the dataframe, but would want a more elegant solution. Help wanted!
The day can be incremented when the time stamp passes midnight. I am not sure that is necessary if you just want to test whether start and stop cross midnight. Hope this helps!
df = data.frame(start_t = c("7:35", "23:50","11:22"), stop_t=c("7:48", "00:15", "12:06"), stringsAsFactors = F)
myfun = function(tvec1, tvec2, units_args="secs") {
tvec1_t = as.POSIXct(paste("2016-01-01", tvec1))
tvec2_t = as.POSIXct(paste("2016-01-01", tvec2))
time_diff = difftime(tvec2_t, tvec1_t, units = units_args)
return( time_diff )
}
# append new columns (base R)
df$time_diff = myfun(df$start_t, df$stop_t)
df$cross = ifelse(df$time_diff < 0, 1, 0)
output:
start_t stop_t time_diff cross
1 7:35 7:48 780 secs 0
2 23:50 00:15 -84900 secs 1
3 11:22 12:06 2640 secs 0
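The negative value in row 2 can be folded back into a same-night duration with modular arithmetic. Here is a minimal, vectorized base R sketch, assuming times are given as "H:MM" strings and that no experiment lasts 24 hours or longer; the helper name hm_to_minutes and the column dur_min are made up for illustration.

```r
# Hypothetical helper: "H:MM" strings -> minutes since midnight
hm_to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1), USE.NAMES = FALSE)
}

df <- data.frame(
  start_t = c("7:35", "23:50", "11:22"),
  stop_t  = c("7:48", "00:15", "12:06"),
  stringsAsFactors = FALSE
)

# Modulo 1440 (minutes per day) wraps durations that cross midnight
df$dur_min <- (hm_to_minutes(df$stop_t) - hm_to_minutes(df$start_t)) %% (24 * 60)
df
```

Because every step operates on whole vectors, this works directly on data frame columns without a loop.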
Since you don't have dates but only times, there is indeed the problem of experiments crossing midnight. Your function does not work, because it is not vectorized, i.e. it doesn't compute the difference for each element on its own.
The following works but is still not perfectly elegant:
If the start happened before the end, we simply subtract to get the duration.
If we cross midnight (the heuristic for this is not very stable), we calculate the difference until midnight and add the duration on the next day.
library(tidyverse)
diff_time <- function(start, end) {
case_when(start < end ~ end - start,
start > end ~ parse_time("23:59") - start + end + parse_time("0:01")
)
}
df %>%
mutate_all(parse_time) %>%
mutate(duration = diff_time(start_t, stop_t))
#> start_t stop_t duration
#> 1 07:35:00 07:48:00 780 secs
#> 2 23:50:00 00:15:00 1500 secs
#> 3 11:22:00 12:06:00 2640 secs
If you had dates, you could simply do:
df %>%
mutate(duration = stop_t - start_t)
Data
df <- read.table(text = "start_t stop_t
7:35 7:48
23:50 00:15
11:22 12:06", header = T)
The simplest way I can think of involves lubridate:
library(lubridate)
library(dplyr)
#make a fake df
df <- data.frame(start = c('7:35', '23:50', '11:22'), stop = c('7:48', '00:15', '12:06'), stringsAsFactors = FALSE)
#convert to lubridate minutes/seconds format, then subtract
df %>%
mutate(start = ms(start), stop = ms(stop)) %>%
mutate(dur= stop - start)
Output:
start stop dur
1 7M 35S 7M 48S 13S
2 23M 50S 15S -23M -35S
3 11M 22S 12M 6S 1M -16S
The problem in your case is that the second row confuses the calculation: the result comes out negative because all of these times are assumed to be on the same day. (Note also that ms() reads these values as minutes and seconds, not hours and minutes.) You should probably add the day:
library(lubridate)
library(dplyr)
#make a fake df
df <- data.frame(start = c('2017/10/08 7:35', '2017/10/08 23:50', '2017/10/08 11:22'), stop = c('2017/10/08 7:48', '2017/10/09 00:15', '2017/10/08 12:06'), stringsAsFactors = FALSE)
#convert to date-times, then subtract
df %>%
mutate(start = ymd_hm(start), stop = ymd_hm(stop)) %>%
mutate(dur= stop - start)
Output:
start stop dur
1 2017-10-08 07:35:00 2017-10-08 07:48:00 13 mins
2 2017-10-08 23:50:00 2017-10-09 00:15:00 25 mins
3 2017-10-08 11:22:00 2017-10-08 12:06:00 44 mins
