Replace nested ifelse while working with dates in R

Here is my data
sampleData <- structure(list(Category = c("A", "B", "C", "D", "E", "F", "G",
"H", "I", "J", "K"), Date = structure(c(1546300800, 1547510400,
1547769600, 1548288000, 1548979200, 1549756800, 1550188800, 1551398400,
1552348800, 1552608000, 1553472000), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
# A tibble: 11 x 2
Category Date
<chr> <dttm>
1 A 2019-01-01
2 B 2019-01-15
3 C 2019-01-18
4 D 2019-01-24
5 E 2019-02-01
6 F 2019-02-10
7 G 2019-02-15
8 H 2019-03-01
9 I 2019-03-12
10 J 2019-03-15
11 K 2019-03-25
lookupData <- structure(list(`Original Date` = structure(c(1546560000, 1547769600,
1548979200, 1550188800, 1551398400, 1552608000, 1553817600, 1555027200,
1556236800, 1557446400, 1558656000, 1559865600), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
# A tibble: 12 x 1
`Original Date`
<dttm>
1 2019-01-04
2 2019-01-18
3 2019-02-01
4 2019-02-15
5 2019-03-01
6 2019-03-15
7 2019-03-29
8 2019-04-12
9 2019-04-26
10 2019-05-10
11 2019-05-24
12 2019-06-07
Currently I have multiple nested ifelse() statements, something like this, to get it working.
sampleData$ModifiedDate <- ifelse(sampleData$Date <= "2019-01-04", "2019-01-04",
ifelse(sampleData$Date <= "2019-01-18", "2019-01-18",
ifelse(sampleData$Date <= "2019-02-01", "2019-02-01",
ifelse(sampleData$Date <= "2019-02-15", "2019-02-15",
ifelse(sampleData$Date <= "2019-03-01", "2019-03-01",
ifelse(sampleData$Date <= "2019-03-15", "2019-03-15",
ifelse(sampleData$Date <= "2019-03-29", "2019-03-29",
ifelse(sampleData$Date <= "2019-04-12", "2019-04-12",
ifelse(sampleData$Date <= "2019-04-26", "2019-04-26","")))))))))
This works, but it is not the way I would want to do it. Is there a more efficient way? I tried the merge() and fuzzyjoin::fuzzy_left_join() options, but I did not get the desired results.
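A vectorized base-R sketch of the same bucketing with findInterval(), assuming the lookup dates are sorted ascending (toy vectors stand in for the tibbles above):

```r
# Each date maps to the first boundary on or after it. findInterval() with
# left.open = TRUE counts boundaries strictly before the date, so +1 gives
# the index of the next boundary (NA past the last one, like the "" fallback).
dates      <- as.Date(c("2019-01-01", "2019-01-18", "2019-01-24"))
boundaries <- as.Date(c("2019-01-04", "2019-01-18", "2019-02-01"))
idx <- findInterval(dates, boundaries, left.open = TRUE) + 1
as.character(boundaries[idx])
# "2019-01-04" "2019-01-18" "2019-02-01"
```

This replaces the whole ifelse() ladder with one call and scales to any number of boundaries.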

Here's an attempt with fuzzyjoin:
library(dplyr)
lookupData %>%
  mutate(z = lag(`Original Date`, default = as.POSIXct("1970-01-01"))) %>%
  fuzzyjoin::fuzzy_left_join(
    sampleData, .,
    by = c(Date = "z", Date = "Original Date"),
    match_fun = list(`>`, `<=`)
  ) %>%
  select(-z)
# # A tibble: 11 x 3
# Category Date `Original Date`
# <chr> <dttm> <dttm>
# 1 A 2019-01-01 00:00:00 2019-01-04 00:00:00
# 2 B 2019-01-15 00:00:00 2019-01-18 00:00:00
# 3 C 2019-01-18 00:00:00 2019-01-18 00:00:00
# 4 D 2019-01-24 00:00:00 2019-02-01 00:00:00
# 5 E 2019-02-01 00:00:00 2019-02-01 00:00:00
# 6 F 2019-02-15 00:00:00 2019-02-15 00:00:00
# 7 G 2019-02-10 00:00:00 2019-02-15 00:00:00
# 8 H 2019-03-12 00:00:00 2019-03-15 00:00:00
# 9 I 2019-03-01 00:00:00 2019-03-01 00:00:00
# 10 J 2019-03-15 00:00:00 2019-03-15 00:00:00
# 11 K 2019-03-25 00:00:00 2019-03-29 00:00:00

This would be better served with a formula, as it appears you are advancing all dates to the next biweekly Friday. If that is correct, the following will accomplish it regardless of how long a span the dates cover.
Setting a baseDate that serves as the first reference date:
baseDate <- structure(1546560000, class = c("POSIXct", "POSIXt"), tzone = "UTC")
Using ceiling() to advance each date to the next biweekly Friday:
sampleData$NewDate <- baseDate + ceiling((sampleData$Date - baseDate) / 14) * 14
Category Date NewDate
1 A 2019-01-01 2019-01-04
2 B 2019-01-15 2019-01-18
3 C 2019-01-18 2019-01-18
4 D 2019-01-24 2019-02-01
5 E 2019-02-01 2019-02-01
6 F 2019-02-15 2019-02-15
7 G 2019-02-10 2019-02-15
8 H 2019-03-12 2019-03-15
9 I 2019-03-01 2019-03-01
10 J 2019-03-15 2019-03-15
11 K 2019-03-25 2019-03-29
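The same arithmetic works on plain Date vectors; a minimal check of the ceiling idea, assuming the 2019-01-04 base date:

```r
# Advance each date to the next 14-day boundary counted from baseDate;
# a date already on a boundary stays put (ceiling of a whole number).
baseDate <- as.Date("2019-01-04")
d  <- as.Date(c("2019-01-01", "2019-01-24", "2019-03-25"))
nd <- baseDate + ceiling(as.numeric(d - baseDate) / 14) * 14
as.character(nd)
# "2019-01-04" "2019-02-01" "2019-03-29"
```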


Calculate time intervals without any overlap

I have the following data:
# dput:
data <- structure(list(start = structure(c(1641193200, 1641189600, 1641218400,
1641189600, 1641222000, 1641222000, 1641222000), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), end = structure(c(1641218400, 1641218400,
1641241800, 1641218400, 1641241800, 1641241800, 1641232800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "B", "C", "D", "E",
"F", "G")), row.names = c(NA, -7L), class = c("tbl_df", "tbl",
"data.frame"))
data
# A tibble: 7 x 3
start end group
<dttm> <dttm> <chr>
1 2022-01-03 07:00:00 2022-01-03 14:00:00 A
2 2022-01-03 06:00:00 2022-01-03 14:00:00 B
3 2022-01-03 14:00:00 2022-01-03 20:30:00 C
4 2022-01-03 06:00:00 2022-01-03 14:00:00 D
5 2022-01-03 15:00:00 2022-01-03 20:30:00 E
6 2022-01-03 15:00:00 2022-01-03 20:30:00 F
7 2022-01-03 15:00:00 2022-01-03 18:00:00 G
And I want to calculate at what times only one group has an "active" time interval (start to end) that does not overlap with any other group.
I already experimented with lubridate and its interval() function, but had trouble comparing more than two intervals with each other.
Desired Output
The output should give the result that the group C has the time interval from 14:00 to 15:00 that has no overlap with any other group.
You can check ivs::iv_locate_splits to see which time frame is occupied by which group:
library(ivs)
ivv <- iv(data$start, data$end)
iv_locate_splits(ivv)
key loc
1 [2022-01-03 06:00:00, 2022-01-03 07:00:00) 2, 4
2 [2022-01-03 07:00:00, 2022-01-03 08:00:00) 1, 2, 4
3 [2022-01-03 08:00:00, 2022-01-03 14:00:00) 1, 2, 4, 7
4 [2022-01-03 14:00:00, 2022-01-03 15:00:00) 3, 7
5 [2022-01-03 15:00:00, 2022-01-03 18:00:00) 3, 5, 6, 7
6 [2022-01-03 18:00:00, 2022-01-03 20:30:00) 3, 5, 6
Updated framework to get the desired outcome:
library(ivs)
#convert to iv format
ivv <- iv(data$start, data$end)
#Check the splits
spl <- iv_locate_splits(ivv)
#Get the index of splits with only 1 group
index <- unlist(spl$loc[lengths(spl$loc) == 1])
#Create the desired outcome using the index
data.frame(frame = spl$key[index],
           group = data$group[index])
# frame group
#1 [2022-01-03 14:00:00, 2022-01-03 15:00:00) C
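For readers without ivs, the split-counting idea can be sketched in base R: break the timeline at every start/end point and count how many intervals cover each elementary segment (toy vectors here, not the full data):

```r
# A segment belongs to exactly one group when precisely one interval covers it:
# an interval covers a segment iff it starts at/before it and ends at/after it.
start <- as.POSIXct(c("2022-01-03 06:00", "2022-01-03 07:00", "2022-01-03 14:00"), tz = "UTC")
end   <- as.POSIXct(c("2022-01-03 14:00", "2022-01-03 14:00", "2022-01-03 20:30"), tz = "UTC")
brk       <- sort(unique(c(start, end)))
seg_start <- head(brk, -1)
seg_end   <- tail(brk, -1)
n_cover <- sapply(seq_along(seg_start),
                  function(i) sum(start <= seg_start[i] & end >= seg_end[i]))
n_cover
# 1 2 1
```

Segments with `n_cover == 1` are the ones occupied by a single group, which is what `lengths(spl$loc) == 1` selects in the ivs version.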

adding multiple columns dynamically in tidyverse

I have a table with multiple datetime columns; I wish to extract the weekday for each of those columns and add it as a new column.
Sample dataset:
structure(list(mealTime = structure(c(1542492000, 1578852000,
1604253600, 1545901200, 1549821600, 1544306400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), weight_measure_time = structure(c(1542226000, 1578812400,
1594710000, 1545896762, 1546416823, 1544227245), tzone = "UTC", class = c("POSIXct",
"POSIXt")), height_measure_time = structure(c(1542106434, 1543337043,
1543337043, 1542387988, 1542366547, 1542802228), tzone = "UTC", class = c("POSIXct",
"POSIXt")), hba1c_measure_time = structure(c(1542106860, 1573455600,
1594625400, 1544781600, 1545920520, 1544096580), tzone = "UTC", class = c("POSIXct",
"POSIXt")), bpMeasureTime = structure(c(1542380623, 1578812400,
1583218800, 1545896774, 1546416837, 1544266110), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
which looks something like this:
> smple
# A tibble: 6 x 5
mealTime weight_measure_time height_measure_time
<dttm> <dttm> <dttm>
1 2018-11-17 22:00:00 2018-11-14 20:06:40 2018-11-13 10:53:54
2 2020-01-12 18:00:00 2020-01-12 07:00:00 2018-11-27 16:44:03
3 2020-11-01 18:00:00 2020-07-14 07:00:00 2018-11-27 16:44:03
4 2018-12-27 09:00:00 2018-12-27 07:46:02 2018-11-16 17:06:28
5 2019-02-10 18:00:00 2019-01-02 08:13:43 2018-11-16 11:09:07
6 2018-12-08 22:00:00 2018-12-08 00:00:45 2018-11-21 12:10:28
# ... with 2 more variables: hba1c_measure_time <dttm>, bpMeasureTime <dttm>
For the above dataset, the expected result is shown below, i.e. for each datetime column, extract the weekday and add it as a corresponding new column:
glimpse(smple)
Rows: 6
Columns: 10
$ mealTime <dttm> 2018-11-17 22:00:00, 2020-01-12 18:00:00, 20~
$ weight_measure_time <dttm> 2018-11-14 20:06:40, 2020-01-12 07:00:00, 20~
$ height_measure_time <dttm> 2018-11-13 10:53:54, 2018-11-27 16:44:03, 20~
$ hba1c_measure_time <dttm> 2018-11-13 11:01:00, 2019-11-11 07:00:00, 20~
$ bpMeasureTime <dttm> 2018-11-16 15:03:43, 2020-01-12 07:00:00, 20~
$ mealTime_day <chr> "Saturday", "Sunday", "Sunday", "Thursday", "~
$ weight_measure_time_day <chr> "Wednesday", "Sunday", "Tuesday", "Thursday",~
$ height_measure_time_day <chr> "Tuesday", "Tuesday", "Tuesday", "Friday", "F~
$ hba1c_measure_time_day <chr> "Tuesday", "Monday", "Monday", "Friday", "Thu~
$ bpMeasureTime_day <chr> "Friday", "Sunday", "Tuesday", "Thursday", "W~
In base R, I can achieve the above as follows:
smple[paste(colnames(smple), "day", sep="_")] = apply(smple, 2, lubridate::wday, label=TRUE, abbr=FALSE)
I wanted to know if there is a similar way in tidyverse, which adds column dynamically by evaluating both LHS and RHS.
Making use of across() and where(), you could do:
library(dplyr)
library(lubridate)
mutate(smple, across(where(is.POSIXct), lubridate::wday,
                     label = TRUE, abbr = FALSE, .names = "{.col}_day"))
#> # A tibble: 6 x 10
#> mealTime weight_measure_time height_measure_time
#> <dttm> <dttm> <dttm>
#> 1 2018-11-17 22:00:00 2018-11-14 20:06:40 2018-11-13 10:53:54
#> 2 2020-01-12 18:00:00 2020-01-12 07:00:00 2018-11-27 16:44:03
#> 3 2020-11-01 18:00:00 2020-07-14 07:00:00 2018-11-27 16:44:03
#> 4 2018-12-27 09:00:00 2018-12-27 07:46:02 2018-11-16 17:06:28
#> 5 2019-02-10 18:00:00 2019-01-02 08:13:43 2018-11-16 11:09:07
#> 6 2018-12-08 22:00:00 2018-12-08 00:00:45 2018-11-21 12:10:28
#> # … with 7 more variables: hba1c_measure_time <dttm>, bpMeasureTime <dttm>,
#> # mealTime_day <dbl>, weight_measure_time_day <dbl>,
#> # height_measure_time_day <dbl>, hba1c_measure_time_day <dbl>,
#> # bpMeasureTime_day <dbl>
Here is one way to solve your problem:
df[paste0(names(df), "_day")] <- lapply(df, weekdays)
Base R solution:
cbind(
  df,
  setNames(
    data.frame(Map(weekdays, df)),
    paste0(names(df),
           ifelse(grepl("_", names(df)), "_day_of_week", "DayOfWeek"))
  )
)
A dplyr solution using only weekdays() from base R:
library(dplyr)
df %>%
  mutate(across(everything(), weekdays, .names = "{.col}_day"))
Output (the weekday names reflect a German locale):
mealTime weight_measure_time height_measure_time hba1c_measure_time bpMeasureTime mealTime_day weight_measure_time_day
<dttm> <dttm> <dttm> <dttm> <dttm> <chr> <chr>
1 2018-11-17 22:00:00 2018-11-14 20:06:40 2018-11-13 10:53:54 2018-11-13 11:01:00 2018-11-16 15:03:43 Samstag Mittwoch
2 2020-01-12 18:00:00 2020-01-12 07:00:00 2018-11-27 16:44:03 2019-11-11 07:00:00 2020-01-12 07:00:00 Sonntag Sonntag
3 2020-11-01 18:00:00 2020-07-14 07:00:00 2018-11-27 16:44:03 2020-07-13 07:30:00 2020-03-03 07:00:00 Sonntag Dienstag
4 2018-12-27 09:00:00 2018-12-27 07:46:02 2018-11-16 17:06:28 2018-12-14 10:00:00 2018-12-27 07:46:14 Donnerstag Donnerstag
5 2019-02-10 18:00:00 2019-01-02 08:13:43 2018-11-16 11:09:07 2018-12-27 14:22:00 2019-01-02 08:13:57 Sonntag Mittwoch
6 2018-12-08 22:00:00 2018-12-08 00:00:45 2018-11-21 12:10:28 2018-12-06 11:43:00 2018-12-08 10:48:30 Samstag Samstag
# ... with 3 more variables: height_measure_time_day <chr>, hba1c_measure_time_day <chr>, bpMeasureTime_day <chr>
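If English names are wanted regardless of the system locale, LC_TIME can be switched for the call (a sketch; available locale names are platform-dependent, "C" being a reasonably portable choice):

```r
# weekdays() formats names according to the LC_TIME locale; force "C" (English)
# for the conversion, then restore whatever was set before.
old <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
eng <- weekdays(as.Date("2018-11-17"))
Sys.setlocale("LC_TIME", old)
eng
# "Saturday"
```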

Check if dates are within a time frame in R

I have two datasets: one with values at specific time points for different IDs, and another with several time frames per ID. Now I want to check whether each time point in data frame 1 falls within any of the time frames in data frame 2 for the matching ID.
For example:
df1:
ID date time
1 2020-04-14 11:00:00
1 2020-04-14 18:00:00
1 2020-04-15 10:00:00
1 2020-04-15 20:00:00
1 2020-04-16 11:00:00
1 ...
2 ...
df2:
ID start end
1 2020-04-14 16:00:00 2020-04-14 20:00:00
1 2020-04-15 18:00:00 2020-04-16 13:00:00
2 ...
2
what I want
df1_new:
ID date time mark
1 2020-04-14 11:00:00 0
1 2020-04-14 18:00:00 1
1 2020-04-15 10:00:00 0
1 2020-04-15 20:00:00 1
1 2020-04-16 11:00:00 1
1 ...
2 ...
Any help would be appreciated!
An option could be:
library(tidyverse)
library(lubridate)
df_1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L), date = c("14.04.2020",
"14.04.2020", "15.04.2020", "15.04.2020", "16.04.2020"), time = c("11:00:00",
"18:00:00", "10:00:00", "20:00:00", "11:00:00"), date_time = structure(c(1586862000,
1586887200, 1586944800, 1586980800, 1587034800), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA,
-5L))
df_2 <- structure(list(ID = c(1L, 1L), start = c("14.04.2020 16:00",
"15.04.2020 18:00"), end = c("14.04.2020 20:00", "16.04.2020 13:00"
)), class = "data.frame", row.names = c(NA, -2L))
df_22 <- df_2 %>%
  mutate(across(c("start", "end"), dmy_hm)) %>%
  group_nest(ID)

left_join(x = df_1, y = df_22, by = "ID") %>%
  as_tibble() %>%
  mutate(mark = map2_dbl(date_time, data,
                         ~ +any(.x %within% interval(.y$start, .y$end)))) %>%
  select(-data)
#> # A tibble: 5 x 5
#> ID date time date_time mark
#> <int> <chr> <chr> <dttm> <dbl>
#> 1 1 14.04.2020 11:00:00 2020-04-14 11:00:00 0
#> 2 1 14.04.2020 18:00:00 2020-04-14 18:00:00 1
#> 3 1 15.04.2020 10:00:00 2020-04-15 10:00:00 0
#> 4 1 15.04.2020 20:00:00 2020-04-15 20:00:00 1
#> 5 1 16.04.2020 11:00:00 2020-04-16 11:00:00 1
Created on 2021-05-25 by the reprex package (v2.0.0)
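The same check can also be sketched without any tidyverse packages: a time point is marked when it falls inside any window (toy vectors mirroring the ID-1 windows above):

```r
# For each timestamp, test membership against every window and collapse with
# any(); unary + turns the logical into the 0/1 mark.
times  <- as.POSIXct(c("2020-04-14 11:00", "2020-04-14 18:00", "2020-04-16 11:00"), tz = "UTC")
starts <- as.POSIXct(c("2020-04-14 16:00", "2020-04-15 18:00"), tz = "UTC")
ends   <- as.POSIXct(c("2020-04-14 20:00", "2020-04-16 13:00"), tz = "UTC")
mark <- sapply(times, function(t) +any(t >= starts & t <= ends))
mark
# 0 1 1
```

With per-ID windows this would simply run inside a split by ID.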

How do I check whether dates are in chronological order in R?

I have a large data set with 4 date columns (let's say Date_1, Date_2, Date_3, Date_4). I would like to check whether Date_1 occurs before Date_2, Date_2 before Date_3, and Date_3 before Date_4. How would I do this? I've thought of doing a nested if statement but haven't had much luck.
Assuming your data is similar to this :
df <- structure(list(id = 1:4, Date_1 = structure(c(18534, 18544, 18536,
18547), class = "Date"), Date_2 = structure(c(18533, 18539, 18540,
18545), class = "Date"), Date_3 = structure(c(18532, 18535, 18543,
18541), class = "Date"), Date_4 = structure(c(18537, 18550, 18545,
18537), class = "Date")), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
df
# A tibble: 4 x 5
# id Date_1 Date_2 Date_3 Date_4
# <int> <date> <date> <date> <date>
#1 1 2020-09-29 2020-09-28 2020-09-27 2020-10-02
#2 2 2020-10-09 2020-10-04 2020-09-30 2020-10-15
#3 3 2020-10-01 2020-10-05 2020-10-08 2020-10-10
#4 4 2020-10-12 2020-10-10 2020-10-06 2020-10-02
You can use rowwise() with diff() to check whether each date value occurs before the next one.
library(dplyr)
df %>%
  rowwise() %>%
  mutate(check = all(diff(c_across(contains('date'))) > 0))
# id Date_1 Date_2 Date_3 Date_4 check
# <int> <date> <date> <date> <date> <lgl>
#1 1 2020-09-29 2020-09-28 2020-09-27 2020-10-02 FALSE
#2 2 2020-10-09 2020-10-04 2020-09-30 2020-10-15 FALSE
#3 3 2020-10-01 2020-10-05 2020-10-08 2020-10-10 TRUE
#4 4 2020-10-12 2020-10-10 2020-10-06 2020-10-02 FALSE
In base R, you can do this with apply():
cols <- grep('Date', names(df))
df$check <- apply(df[cols], 1, function(x) all(diff(as.Date(x)) > 0))
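When the set of date columns is small and fixed, a fully vectorized variant avoids row-wise work entirely; a sketch on toy columns, chaining the adjacent pairwise comparisons with Reduce():

```r
# check is TRUE only where every column is strictly earlier than the next one.
d1 <- as.Date(c("2020-09-29", "2020-10-01"))
d2 <- as.Date(c("2020-09-28", "2020-10-05"))
d3 <- as.Date(c("2020-09-27", "2020-10-08"))
cols  <- list(d1, d2, d3)
check <- Reduce(`&`, Map(`<`, head(cols, -1), tail(cols, -1)))
check
# FALSE TRUE
```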

union/merge overlapping time-ranges [duplicate]

This question already has answers here:
How to flatten / merge overlapping time periods
(5 answers)
Closed 4 years ago.
I know the following problem can be solved using Bioconductor's IRanges package, using reduce.
But since that function only accepts numeric input, and I am working with data.table anyway, I am wondering whether the following can be achieved using data.table's foverlaps().
Sample data
structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B"
), subgroup = c(1, 1, 2, 2, 1, 1, 2, 2), start = structure(c(1514793600,
1514795400, 1514794200, 1514798100, 1514815200, 1514817000, 1514815800,
1514818800), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
end = structure(c(1514794500, 1514797200, 1514794800, 1514799000,
1514816100, 1514818800, 1514817600, 1514820600), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
# group subgroup start end
# 1: A 1 2018-01-01 08:00:00 2018-01-01 08:15:00
# 2: A 1 2018-01-01 08:30:00 2018-01-01 09:00:00
# 3: A 2 2018-01-01 08:10:00 2018-01-01 08:20:00
# 4: A 2 2018-01-01 09:15:00 2018-01-01 09:30:00
# 5: B 1 2018-01-01 14:00:00 2018-01-01 14:15:00
# 6: B 1 2018-01-01 14:30:00 2018-01-01 15:00:00
# 7: B 2 2018-01-01 14:10:00 2018-01-01 14:40:00
# 8: B 2 2018-01-01 15:00:00 2018-01-01 15:30:00
Question
What I would like to achieve, is to join/merge events (by group) when:
a range (start - end) overlaps (or partially overlaps) another range
the start of a range is the end of another range
Subgroups can be ignored
As mentioned above, I know this can be done using Bioconductor's IRanges reduce, but I wonder if the same can be achieved using data.table. I can't shake the feeling that foverlaps() should be able to tackle my problem, but I cannot figure out how...
Since I'm an intermediate R user but pretty much a novice in data.table, it's hard for me to 'read' some of the solutions already provided on Stack Overflow, so I'm not sure whether a similar question has already been asked and answered (if so, please be gentle ;-) ).
Desired output
structure(list(group = c("A", "A", "A", "B"), start = structure(c(1514793600,
1514795400, 1514798100, 1514815200), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), end = structure(c(1514794800, 1514797200,
1514799000, 1514820600), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
# group start end
# 1: A 2018-01-01 08:00:00 2018-01-01 08:20:00
# 2: A 2018-01-01 08:30:00 2018-01-01 09:00:00
# 3: A 2018-01-01 09:15:00 2018-01-01 09:30:00
# 4: B 2018-01-01 14:00:00 2018-01-01 15:30:00
If you arrange on group and start (in that order) and unselect the indx column, this solution posted by David Arenburg works perfectly: How to flatten/merge overlapping time periods in R
library(dplyr)
df1 %>%
  group_by(group) %>%
  arrange(group, start) %>%
  mutate(indx = c(0, cumsum(as.numeric(lead(start)) >
                              cummax(as.numeric(end)))[-n()])) %>%
  group_by(group, indx) %>%
  summarise(start = first(start), end = last(end)) %>%
  select(-indx)
group start end
<chr> <dttm> <dttm>
1 A 2018-01-01 08:00:00 2018-01-01 08:20:00
2 A 2018-01-01 08:30:00 2018-01-01 09:00:00
3 A 2018-01-01 09:15:00 2018-01-01 09:30:00
4 B 2018-01-01 14:00:00 2018-01-01 15:30:00
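The indx/cummax trick at the heart of that answer can be sketched in base R on a single group, with plain numbers standing in for times (data assumed sorted by start): a new block begins whenever the next start exceeds the running maximum end.

```r
# Running maximum of ends seen so far; a gap (start > running max) increments
# the block counter, and blocks are then collapsed to min(start)/max(end).
start <- c(0, 10, 30, 75)   # cf. 08:00, 08:10, 08:30, 09:15
end   <- c(15, 20, 60, 90)  # cf. 08:15, 08:20, 09:00, 09:30
blk <- c(0, cumsum(start[-1] > cummax(end)[-length(end)]))
blk
# 0 0 1 2
merged <- data.frame(start = as.vector(tapply(start, blk, min)),
                     end   = as.vector(tapply(end,   blk, max)))
merged
```

Here rows 1 and 2 overlap (10 < 15) and fuse into one block [0, 20], matching how 08:00-08:15 and 08:10-08:20 merge in the desired output.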
