union/merge overlapping time-ranges [duplicate] - r

This question already has answers here:
How to flatten / merge overlapping time periods
(5 answers)
Closed 4 years ago.
I know the following problem can be solved using Bioconductor's IRanges package, using reduce.
But since that function only accepts numeric input, and I am working with data.table anyway, I am wondering if the following can be achieved using data.table's foverlaps().
Sample data
structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B"
), subgroup = c(1, 1, 2, 2, 1, 1, 2, 2), start = structure(c(1514793600,
1514795400, 1514794200, 1514798100, 1514815200, 1514817000, 1514815800,
1514818800), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
end = structure(c(1514794500, 1514797200, 1514794800, 1514799000,
1514816100, 1514818800, 1514817600, 1514820600), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
# group subgroup start end
# 1: A 1 2018-01-01 08:00:00 2018-01-01 08:15:00
# 2: A 1 2018-01-01 08:30:00 2018-01-01 09:00:00
# 3: A 2 2018-01-01 08:10:00 2018-01-01 08:20:00
# 4: A 2 2018-01-01 09:15:00 2018-01-01 09:30:00
# 5: B 1 2018-01-01 14:00:00 2018-01-01 14:15:00
# 6: B 1 2018-01-01 14:30:00 2018-01-01 15:00:00
# 7: B 2 2018-01-01 14:10:00 2018-01-01 14:40:00
# 8: B 2 2018-01-01 15:00:00 2018-01-01 15:30:00
Question
What I would like to achieve is to join/merge events (by group) when:
a range (start - end) overlaps (or partially overlaps) another range
the start of a range is the end of another range
Subgroups can be ignored
As mentioned above, I know this can be done using Bioconductor's IRanges reduce, but I wonder if the same can be achieved using data.table. I can't shake the feeling that foverlaps should be able to tackle my problem, but I cannot figure out how...
Since I'm an intermediate R user, but pretty much a novice in data.table, it's hard for me to 'read' some solutions already provided on Stack Overflow. So I'm not sure if a similar question has already been asked and answered (if so, please be gentle ;-) )
Desired output
structure(list(group = c("A", "A", "A", "B"), start = structure(c(1514793600,
1514795400, 1514798100, 1514815200), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), end = structure(c(1514794800, 1514797200,
1514799000, 1514820600), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
# group start end
# 1: A 2018-01-01 08:00:00 2018-01-01 08:20:00
# 2: A 2018-01-01 08:30:00 2018-01-01 09:00:00
# 3: A 2018-01-01 09:15:00 2018-01-01 09:30:00
# 4: B 2018-01-01 14:00:00 2018-01-01 15:30:00

If you arrange on group and start (in that order) and unselect the indx column, this solution posted by David Arenburg works perfectly: How to flatten/merge overlapping time periods in R
library(dplyr)
df1 %>%
  group_by(group) %>%
  arrange(group, start) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
                              cummax(as.numeric(end)))[-n()])) %>%
  group_by(group, indx) %>%
  summarise(start = first(start), end = last(end)) %>%
  select(-indx)
group start end
<chr> <dttm> <dttm>
1 A 2018-01-01 08:00:00 2018-01-01 08:20:00
2 A 2018-01-01 08:30:00 2018-01-01 09:00:00
3 A 2018-01-01 09:15:00 2018-01-01 09:30:00
4 B 2018-01-01 14:00:00 2018-01-01 15:30:00
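For the data.table route asked about in the question: foverlaps() only reports pairwise overlaps, so the matched pairs would still need to be chained into runs afterwards. The cumulative-max trick in the dplyr answer above translates directly to data.table, though. A minimal sketch, assuming the sample data is in df1:
library(data.table)

dt <- as.data.table(df1)
setorder(dt, group, start)
# a new run starts when an interval begins strictly after the running max end;
# intervals that merely touch (start == an earlier end) stay in the same run
dt[, run := cumsum(as.numeric(start) > shift(cummax(as.numeric(end)), fill = -Inf)),
   by = group]
dt[, .(start = min(start), end = max(end)), by = .(group, run)][, run := NULL][]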

Related

Calculate time intervals without any overlap

I have the following data:
# dput:
data <- structure(list(start = structure(c(1641193200, 1641189600, 1641218400,
1641189600, 1641222000, 1641222000, 1641222000), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), end = structure(c(1641218400, 1641218400,
1641241800, 1641218400, 1641241800, 1641241800, 1641232800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "B", "C", "D", "E",
"F", "G")), row.names = c(NA, -7L), class = c("tbl_df", "tbl",
"data.frame"))
data
# A tibble: 7 x 3
start end group
<dttm> <dttm> <chr>
1 2022-01-03 07:00:00 2022-01-03 14:00:00 A
2 2022-01-03 06:00:00 2022-01-03 14:00:00 B
3 2022-01-03 14:00:00 2022-01-03 20:30:00 C
4 2022-01-03 06:00:00 2022-01-03 14:00:00 D
5 2022-01-03 15:00:00 2022-01-03 20:30:00 E
6 2022-01-03 15:00:00 2022-01-03 20:30:00 F
7 2022-01-03 15:00:00 2022-01-03 18:00:00 G
And I want to calculate at what times only one group has an "active" time interval (start to end), without overlapping any other group.
I already experimented with lubridate and the interval function but had trouble comparing more than two intervals with each other.
Desired Output
The output should give the result that group C has the time interval from 14:00 to 15:00 with no overlap with any other group.
You can check ivs::iv_locate_splits to see which time frame is occupied by which group:
library(ivs)
ivv <- iv(data$start, data$end)
iv_locate_splits(ivv)
                                          key        loc
1 [2022-01-03 06:00:00, 2022-01-03 07:00:00)       2, 4
2 [2022-01-03 07:00:00, 2022-01-03 14:00:00)    1, 2, 4
3 [2022-01-03 14:00:00, 2022-01-03 15:00:00)          3
4 [2022-01-03 15:00:00, 2022-01-03 18:00:00) 3, 5, 6, 7
5 [2022-01-03 18:00:00, 2022-01-03 20:30:00)    3, 5, 6
Updated framework to get the desired outcome:
library(ivs)

# convert to iv format
ivv <- iv(data$start, data$end)

# check the splits
spl <- iv_locate_splits(ivv)

# get the index of splits covered by only 1 group
index <- unlist(spl$loc[lengths(spl$loc) == 1])

# create the desired outcome using the index
data.frame(frame = spl$key[index],
           group = data$group[index])
# frame group
#1 [2022-01-03 14:00:00, 2022-01-03 15:00:00) C
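If you prefer plain start/end columns over the formatted interval key, the ivs accessors iv_start() and iv_end() can unpack it. A small follow-up sketch:
# split the interval key back into ordinary POSIXct columns
data.frame(start = iv_start(spl$key[index]),
           end = iv_end(spl$key[index]),
           group = data$group[index])
#                 start                 end group
# 1 2022-01-03 14:00:00 2022-01-03 15:00:00     C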

Check if dates are within a time frame r

I have two datasets, one with values at specific time points for different IDs and another one with several time frames for the IDs. Now I want to check whether the time point in dataframe one falls within any of the time frames in dataframe two that match the ID.
For example:
df1:
ID date time
1 2020-04-14 11:00:00
1 2020-04-14 18:00:00
1 2020-04-15 10:00:00
1 2020-04-15 20:00:00
1 2020-04-16 11:00:00
1 ...
2 ...
df2:
ID start end
1 2020-04-14 16:00:00 2020-04-14 20:00:00
1 2020-04-15 18:00:00 2020-04-16 13:00:00
2 ...
2
what I want
df1_new:
ID date time mark
1 2020-04-14 11:00:00 0
1 2020-04-14 18:00:00 1
1 2020-04-15 10:00:00 0
1 2020-04-15 20:00:00 1
1 2020-04-16 11:00:00 1
1 ...
2 ...
Any help would be appreciated!
An option could be:
library(tidyverse)
library(lubridate)
df_1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L), date = c("14.04.2020",
"14.04.2020", "15.04.2020", "15.04.2020", "16.04.2020"), time = c("11:00:00",
"18:00:00", "10:00:00", "20:00:00", "11:00:00"), date_time = structure(c(1586862000,
1586887200, 1586944800, 1586980800, 1587034800), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA,
-5L))
df_2 <- structure(list(ID = c(1L, 1L), start = c("14.04.2020 16:00",
"15.04.2020 18:00"), end = c("14.04.2020 20:00", "16.04.2020 13:00"
)), class = "data.frame", row.names = c(NA, -2L))
df_22 <- df_2 %>%
  mutate(across(c("start", "end"), dmy_hm)) %>%
  group_nest(ID)

left_join(x = df_1, y = df_22, by = "ID") %>%
  as_tibble() %>%
  mutate(mark = map2_dbl(date_time, data,
                         ~ +any(.x %within% interval(.y$start, .y$end)))) %>%
  select(-data)
#> # A tibble: 5 x 5
#> ID date time date_time mark
#> <int> <chr> <chr> <dttm> <dbl>
#> 1 1 14.04.2020 11:00:00 2020-04-14 11:00:00 0
#> 2 1 14.04.2020 18:00:00 2020-04-14 18:00:00 1
#> 3 1 15.04.2020 10:00:00 2020-04-15 10:00:00 0
#> 4 1 15.04.2020 20:00:00 2020-04-15 20:00:00 1
#> 5 1 16.04.2020 11:00:00 2020-04-16 11:00:00 1
Created on 2021-05-25 by the reprex package (v2.0.0)
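The same marking can also be done without nesting. A base-R sketch under the same assumptions (df_1 and df_2 as defined above), looping over the rows of df_1:
library(lubridate) # dmy_hm()

# convert the frame boundaries once, up front
df_2$start <- dmy_hm(df_2$start)
df_2$end <- dmy_hm(df_2$end)

# for each observation, check whether it falls inside any frame with the same ID
df_1$mark <- sapply(seq_len(nrow(df_1)), function(i) {
  frames <- df_2[df_2$ID == df_1$ID[i], ]
  +any(df_1$date_time[i] >= frames$start & df_1$date_time[i] <= frames$end)
})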

Creating dataframe rows with timestamp increments

I have a dataframe with a structure similar to this:
name  time_from            time_to              value
a     2020-01-01 00:00:00  2020-01-01 01:30:00  value1
a     2020-01-01 02:00:00  2020-01-01 02:30:00  value2
b     2020-01-01 00:00:00  2020-01-01 01:00:00  value3
I want to convert the dataframe to the following structure by increasing the time_from timestamp by 30 minutes up to the time_to timestamp value while the name and value remain the same over the timestamp increments.
name  time                 value
a     2020-01-01 00:00:00  value1
a     2020-01-01 00:30:00  value1
a     2020-01-01 01:00:00  value1
a     2020-01-01 01:30:00  value1
a     2020-01-01 02:00:00  value2
a     2020-01-01 02:30:00  value2
b     2020-01-01 00:00:00  value3
b     2020-01-01 00:30:00  value3
b     2020-01-01 01:00:00  value3
Help and guidance would be greatly appreciated. Thank you.
Using seq.POSIXt in a by approach.
dat <- do.call(rbind, by(dat, dat[c('name', 'value')], function(x) {
  setNames(
    data.frame(x[1, 1], seq.POSIXt(x[1, 2], x[nrow(x), 3], by = '30 min'), x[1, 4]),
    c('name', 'time', 'value'))
}))
dat
# name time value
# 1 a 2020-01-01 00:00:00 value1
# 2 a 2020-01-01 00:30:00 value1
# 3 a 2020-01-01 01:00:00 value1
# 4 a 2020-01-01 01:30:00 value1
# 5 a 2020-01-01 02:00:00 value2
# 6 a 2020-01-01 02:30:00 value2
# 7 b 2020-01-01 00:00:00 value3
# 8 b 2020-01-01 00:30:00 value3
# 9 b 2020-01-01 01:00:00 value3
Of course, the solution assumes correctly formatted 'POSIXct' input. Convert beforehand if you don't have it:
tcols <- c('time_from', 'time_to')
dat[tcols] <- lapply(dat[tcols], as.POSIXct)
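For comparison, a tidyverse sketch of the same expansion (again assuming 'POSIXct' input):
library(dplyr)
library(tidyr)
library(purrr)

dat %>%
  # build a 30-minute sequence per row, then unnest it into long format
  mutate(time = map2(time_from, time_to, seq, by = "30 min")) %>%
  select(name, time, value) %>%
  unnest(time)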
Data:
dat <- structure(list(name = c("a", "a", "b"), time_from = structure(c(1577833200,
1577840400, 1577833200), class = c("POSIXct", "POSIXt"), tzone = ""),
time_to = structure(c(1577838600, 1577842200, 1577836800), class = c("POSIXct",
"POSIXt"), tzone = ""), value = c("value1", "value2", "value3"
)), row.names = c(NA, -3L), class = "data.frame")

R: Loop through a set of values in one dataframe to update a second dataframe

Updated to a more realistic example; this time added duplicates in interp_b.
I am trying to populate a field in one dataframe (interp_b) using the values from a second dataframe (bait). I want to look at each row's obs_datetime in interp_b, and determine when that plot-station-year was last baited, prior to the obs_datetime. This will later be used to calculate a time-since-bait for each obs_datetime. Bait times are in the bait dataframe in column bait_datetime. The results should go in a field called latestbait_datetime in the interp_b dataframe.
I was visualizing an iterative process where interp_b's latestbait_datetime keeps getting recalculated until the last row of the bait dataframe is reached. The for-loop I tried clearly runs through the rows and does the specified calculations, but I can't seem to get the output in the format I want; it produces output for each loop iteration rather than rewriting and updating the interp_b dataframe.
Here is some code to build the two dataframes; interp_b and bait (please excuse the inelegance)
# interp_b dataframe----
structure(list(plot_station_year = c("Cow_C2_2019", "RidingStable_C3_2018",
"RidingStable_C3_2018", "Raf_C1_2018", "Metcalfe_C2_2019"), obs_datetime = structure(c(1559487600,
1544954400, 1541084400, 1515160800, 1567756800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), latestbait_datetime = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct",
"POSIXt"))), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))
# bait dataframe----
structure(list(plot_station_year = c("Cow_C2_2019", "Cow_C2_2019",
"RidingStable_C3_2018", "Raf_C1_2018"), bait_datetime = structure(c(1557500400,
1559746800, 1543676400, 1491318000), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -4L), spec = structure(list(
cols = list(plot_station_year = structure(list(), class = c("collector_character",
"collector")), bait_datetime = structure(list(format = "%d-%m-%Y %H:%M"), class = c("collector_datetime",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
and the desired result would look like this
Below are two of my attempts. The first resulted in a dataframe that only contained the final run of the loop and the second attempt resulted in a dataframe containing all of the run results (as expected with the bind).
library(tidyverse)
# attempt #1 ----
for (i in 1:nrow(bait)) {
  print(paste("row =", i))
  interpbait <- interp_b %>%
    mutate(latestbait_datetime = if_else(
      plot_station_year == bait$plot_station_year[i] &
        obs_datetime >= bait$bait_datetime[i] &
        (is.na(latestbait_datetime) | latestbait_datetime < bait$bait_datetime[i]),
      bait$bait_datetime[i], latestbait_datetime))
}

# attempt #2 ----
resultb <- data.frame()
for (i in 1:nrow(bait)) {
  print(paste("row =", i))
  interpbait2 <- interp_b %>%
    mutate(latestbait_datetime = if_else(
      plot_station_year == bait$plot_station_year[i] &
        obs_datetime >= bait$bait_datetime[i] &
        (is.na(latestbait_datetime) | latestbait_datetime < bait$bait_datetime[i]),
      bait$bait_datetime[i], latestbait_datetime))
  resultb <- bind_rows(resultb, interpbait2)
  print(resultb)
}
Any help would be greatly appreciated.
I'm not sure how long this will take, but here is a tidyverse solution. For each row in interp_b, we filter the bait dataframe to the correct plot_station_year, and ensure that all date-times are less than the row in interp_b. Then, we arrange the filtered bait data by descending datetime (so that the most recent dates are on top). We slice the first row of that dataframe so that we only get the most recent date. Then we "pull out" the date-time from the dataframe, and add it onto the appropriate row in interp_b.
library(tidyverse)
library(progress) # for progress bar
# create progress bar to update, so that you can estimate the amount of time it will take to finish the entire loop
pb <- progress_bar$new(total = nrow(interp_b))
for (i in 1:nrow(interp_b)) {
  last_time_baited <- bait %>%
    # filter bait dataframe to the appropriate plot-station-year based on
    # the row in interp_b
    filter(plot_station_year == interp_b$plot_station_year[i],
           # ensure all datetimes are less than that row in interp_b
           bait_datetime < interp_b$obs_datetime[i]) %>%
    # arrange by datetime (most recent datetimes first)
    arrange(desc(bait_datetime)) %>%
    # take the top row - this will be the most recent date-time that
    # the plot-station was baited
    slice(1) %>%
    # "pull" that value out of the dataframe so you have a value,
    # not a tibble
    pull(bait_datetime)
  # update the row in interp_b with the date-time baited; when there is no
  # prior bait the result has length zero, so guard to leave the NA in place
  if (length(last_time_baited) == 1) {
    interp_b$latestbait_datetime[i] <- last_time_baited
  }
  pb$tick() # print progress
}
The resulting table matches your expected output (interp_b):
# A tibble: 5 x 3
plot_station_year obs_datetime latestbait_datetime
<chr> <dttm> <dttm>
1 Cow_C2_2019 2019-06-02 15:00:00 2019-05-10 11:00:00
2 RidingStable_C3_2018 2018-12-16 10:00:00 2018-12-01 10:00:00
3 RidingStable_C3_2018 2018-11-01 15:00:00 NA
4 Raf_C1_2018 2018-01-05 14:00:00 2017-04-04 11:00:00
5 Metcalfe_C2_2019 2019-09-06 08:00:00 NA
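As an aside, newer dplyr (>= 1.1.0) can express the whole lookup as a single rolling non-equi join via join_by(), avoiding the row-by-row loop entirely. A hedged sketch:
library(dplyr)

interp_b %>%
  select(-latestbait_datetime) %>%
  # for each row, match the closest bait_datetime strictly before obs_datetime
  left_join(bait,
            by = join_by(plot_station_year, closest(obs_datetime > bait_datetime))) %>%
  rename(latestbait_datetime = bait_datetime)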
You could perform an outer join with data.table, and then select the highest bait_datetime for each plot_station_year.
Edit: I edited my answer to reflect the possibility that there could be multiple obs_datetime for a given unique plot_station_year in interp2. To preserve these, we index them and include the index in the filtering step.
One potential improvement with large files (not tested) could be to merge using roll, instead of performing an outer merge and then filtering.
That version is shown at the end of the reproducible example:
library(data.table)
interp2 <- structure(list(plot_station_year = c("Cow_C2_2019", "Cow_C2_2019", "RidingStable_C3_2018",
"Raf_C1_2018", "Metcalfe_C2_2019"), obs_datetime = structure(c(1559487600, 1559487300,
1544954400, 1515160800, 1567756800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), latestbait_datetime = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_), class = c("POSIXct", "POSIXt"))), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L))
bait2 <- structure(list(plot_station_year = c("Cow_C2_2019", "Cow_C2_2019", "Cow_C2_2019",
"RidingStable_C3_2018", "Raf_C1_2018"), bait_datetime = structure(c(1557500400,
1496674800, 1576674800, 1543676400, 1491318000), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(plot_station_year = structure(list(), class = c("collector_character",
"collector")), bait_datetime = structure(list(format = "%d-%m-%Y %H:%M"), class = c("collector_datetime",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
# add index idx by plot_station_year, remove empty column, set keys
setDT(interp2)[, "latestbait_datetime" := NULL][, idx := 1:.N, by=plot_station_year]
setkeyv(interp2, c("plot_station_year", "idx", "obs_datetime"))
# same for bait2: set as data.table, set keys
setDT(bait2, key=c("plot_station_year", "bait_datetime"))
## option 1: merge files, then filter
# outer join on interp2 and bait2 on first column (and order by bait_datetime)
expected_out <- merge(interp2, bait2, by="plot_station_year", all=TRUE)
# set keys for sorting
setkey(expected_out, plot_station_year, idx, bait_datetime)
# select highest bait_datetime below obs_datetime by plot_station_year and idx
expected_out <- expected_out[is.na(bait_datetime) | bait_datetime < obs_datetime][,
  tail(.SD, 1), by = .(plot_station_year, idx)]
# rename and sort columns
setnames(expected_out, old="bait_datetime", new="latestbait_datetime")
setorder(expected_out, -latestbait_datetime, idx, na.last = TRUE)[]
#> plot_station_year idx obs_datetime latestbait_datetime
#> 1: Cow_C2_2019 1 2019-06-02 15:00:00 2019-05-10 15:00:00
#> 2: Cow_C2_2019 2 2019-06-02 14:55:00 2019-05-10 15:00:00
#> 3: RidingStable_C3_2018 1 2018-12-16 10:00:00 2018-12-01 15:00:00
#> 4: Raf_C1_2018 1 2018-01-05 14:00:00 2017-04-04 15:00:00
#> 5: Metcalfe_C2_2019 1 2019-09-06 08:00:00 <NA>
## option 2 (might use less memory): rolling join
bait2[, latestbait_datetime := bait_datetime]
out_alt <- bait2[interp2, .(plot_station_year, obs_datetime, idx, latestbait_datetime),
                 on = c("plot_station_year", "bait_datetime==obs_datetime"), roll = Inf]
# order
setorder(out_alt, -latestbait_datetime, idx, na.last = TRUE)[]
#> plot_station_year obs_datetime idx latestbait_datetime
#> 1: Cow_C2_2019 2019-06-02 15:00:00 1 2019-05-10 15:00:00
#> 2: Cow_C2_2019 2019-06-02 14:55:00 2 2019-05-10 15:00:00
#> 3: RidingStable_C3_2018 2018-12-16 10:00:00 1 2018-12-01 15:00:00
#> 4: Raf_C1_2018 2018-01-05 14:00:00 1 2017-04-04 15:00:00
#> 5: Metcalfe_C2_2019 2019-09-06 08:00:00 1 <NA>
setcolorder(out_alt, c(1,3,2,4))[]
#> plot_station_year idx obs_datetime latestbait_datetime
#> 1: Cow_C2_2019 1 2019-06-02 15:00:00 2019-05-10 15:00:00
#> 2: Cow_C2_2019 2 2019-06-02 14:55:00 2019-05-10 15:00:00
#> 3: RidingStable_C3_2018 1 2018-12-16 10:00:00 2018-12-01 15:00:00
#> 4: Raf_C1_2018 1 2018-01-05 14:00:00 2017-04-04 15:00:00
#> 5: Metcalfe_C2_2019 1 2019-09-06 08:00:00 <NA>
## test that both options give the same result:
identical(expected_out, out_alt)
#> [1] TRUE

Compare timestamps based on multiple criteria from multiple rows and columns

I have two data frames with timestamps (in as.POSIXct, format="%Y-%m-%d %H:%M:%S") as below.
df_ID1
ID DATETIME TIMEDIFF EV
A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
B 2019-04-03 08:00:00 2019-04-03 02:00:00 1
B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
df_ID0
ID DATETIME
A 2019-03-26 00:02:00
A 2019-03-26 04:55:00
A 2019-03-26 11:22:00
B 2019-04-02 20:43:00
B 2019-04-04 11:03:00
B 2019-04-06 03:12:00
I want to compare each DATETIME in df_ID1 with the DATETIME values in df_ID0 that have the same ID, and find the one that is "smaller than but closest to" it.
For each matched pair, I then want to compare the TIMEDIFF in df_ID1 to the matched DATETIME in df_ID0; if TIMEDIFF is greater than that DATETIME, change EV from 1 to 4 in df_ID1.
My desired result is
df_ID1
ID DATETIME TIMEDIFF EV
A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
B 2019-04-03 08:00:00 2019-04-03 02:00:00 4
B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
I've checked how to compare timestamps and calculate time differences, and how to change values based on criteria...
But I cannot find anything to select the "smaller than but closest to" timestamps, and I cannot figure out how to apply all this logic together.
Any help would be appreciated!
You can do this with a for loop, keeping in mind that if your actual data set is very big, the overhead would be quite bad performance-wise.
for (i in 1:nrow(df_1)) {
  sub <- subset(df_0, ID == df_1$ID[i]) # filter on ID
  # take the max of those with DATETIME less than (i.e. less than but closest to)
  df_0_dt <- max(sub[sub$DATETIME < df_1$DATETIME[i], ]$DATETIME)
  if (df_0_dt < df_1$TIMEDIFF[i]) { # final condition
    df_1[i, "EV"] <- 4
  }
}
df_1
# A tibble: 3 x 4
ID DATETIME TIMEDIFF EV
<chr> <dttm> <dttm> <dbl>
1 A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
2 B 2019-04-03 08:00:00 2019-04-03 02:00:00 4
3 B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
One option, using nested mapply, is to first split df_ID1 and df_ID0 based on ID. Calculate the time difference between each value in df_ID1 and those of df_ID0 with the same ID, get the index of the "smaller than but closest to" value (inds), and change EV to 4 if the corresponding TIMEDIFF value is greater than the matched DATETIME value.
df_ID1$EV[unlist(mapply(function(x, y) {
  mapply(function(p, q) {
    vals <- as.numeric(difftime(p, y$DATETIME))
    inds <- which(vals == min(vals[vals > 0]))
    q > y$DATETIME[inds]
  }, x$DATETIME, x$TIMEDIFF)
}, split(df_ID1, df_ID1$ID), split(df_ID0, df_ID0$ID)))] <- 4
df_ID1
# ID DATETIME TIMEDIFF EV
#1 A 2019-03-26 06:13:00 2019-03-26 00:13:00 1
#2 B 2019-04-03 08:00:00 2019-04-03 02:00:00 4
#3 B 2019-04-04 12:35:00 2019-04-04 06:35:00 1
data
df_ID0 <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
.Label = c("A",
"B"), class = "factor"), DATETIME = structure(c(1553529720, 1553547300,
1553570520, 1554208980, 1554346980, 1554491520), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, -6L), class = "data.frame")
df_ID1 <- structure(list(ID = structure(c(1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), DATETIME = structure(c(1553551980, 1554249600,
1554352500), class = c("POSIXct", "POSIXt"), tzone = ""), TIMEDIFF =
structure(c(1553530380,
1554228000, 1554330900), class = c("POSIXct", "POSIXt"), tzone = ""),
EV = c(1, 1, 1)), row.names = c(NA, -3L), class = "data.frame")
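For the record, a more compact base-R variant of the same per-row logic, sketched under the same data assumptions:
# for each row of df_ID1, find the latest earlier DATETIME in df_ID0 with the
# same ID, and flag the row when TIMEDIFF exceeds that matched time
hit <- vapply(seq_len(nrow(df_ID1)), function(i) {
  cand <- df_ID0$DATETIME[df_ID0$ID == df_ID1$ID[i] &
                            df_ID0$DATETIME < df_ID1$DATETIME[i]]
  length(cand) > 0 && max(cand) < df_ID1$TIMEDIFF[i]
}, logical(1))
df_ID1$EV[hit] <- 4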
