I have a table with multiple datetime columns. I wish to extract the weekday for each of those columns and add it as a new column.
Sample dataset:
structure(list(mealTime = structure(c(1542492000, 1578852000,
1604253600, 1545901200, 1549821600, 1544306400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), weight_measure_time = structure(c(1542226000, 1578812400,
1594710000, 1545896762, 1546416823, 1544227245), tzone = "UTC", class = c("POSIXct",
"POSIXt")), height_measure_time = structure(c(1542106434, 1543337043,
1543337043, 1542387988, 1542366547, 1542802228), tzone = "UTC", class = c("POSIXct",
"POSIXt")), hba1c_measure_time = structure(c(1542106860, 1573455600,
1594625400, 1544781600, 1545920520, 1544096580), tzone = "UTC", class = c("POSIXct",
"POSIXt")), bpMeasureTime = structure(c(1542380623, 1578812400,
1583218800, 1545896774, 1546416837, 1544266110), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
which looks something like this:
> smple
# A tibble: 6 x 5
mealTime weight_measure_time height_measure_time
<dttm> <dttm> <dttm>
1 2018-11-17 22:00:00 2018-11-14 20:06:40 2018-11-13 10:53:54
2 2020-01-12 18:00:00 2020-01-12 07:00:00 2018-11-27 16:44:03
3 2020-11-01 18:00:00 2020-07-14 07:00:00 2018-11-27 16:44:03
4 2018-12-27 09:00:00 2018-12-27 07:46:02 2018-11-16 17:06:28
5 2019-02-10 18:00:00 2019-01-02 08:13:43 2018-11-16 11:09:07
6 2018-12-08 22:00:00 2018-12-08 00:00:45 2018-11-21 12:10:28
# ... with 2 more variables: hba1c_measure_time <dttm>, bpMeasureTime <dttm>
For the above dataset, the expected result is as follows, i.e. for each datetime column, extract the weekday and add it as a corresponding new column:
glimpse(smple)
Rows: 6
Columns: 10
$ mealTime <dttm> 2018-11-17 22:00:00, 2020-01-12 18:00:00, 20~
$ weight_measure_time <dttm> 2018-11-14 20:06:40, 2020-01-12 07:00:00, 20~
$ height_measure_time <dttm> 2018-11-13 10:53:54, 2018-11-27 16:44:03, 20~
$ hba1c_measure_time <dttm> 2018-11-13 11:01:00, 2019-11-11 07:00:00, 20~
$ bpMeasureTime <dttm> 2018-11-16 15:03:43, 2020-01-12 07:00:00, 20~
$ mealTime_day <chr> "Saturday", "Sunday", "Sunday", "Thursday", "~
$ weight_measure_time_day <chr> "Wednesday", "Sunday", "Tuesday", "Thursday",~
$ height_measure_time_day <chr> "Tuesday", "Tuesday", "Tuesday", "Friday", "F~
$ hba1c_measure_time_day <chr> "Tuesday", "Monday", "Monday", "Friday", "Thu~
$ bpMeasureTime_day <chr> "Friday", "Sunday", "Tuesday", "Thursday", "W~
In base R, I can achieve the above as follows:
smple[paste(colnames(smple), "day", sep="_")] = apply(smple, 2, lubridate::wday, label=TRUE, abbr=FALSE)
I wanted to know if there is a similar way in the tidyverse that adds columns dynamically by evaluating both the LHS and the RHS.
Making use of across and where, you could do:
library(dplyr)
library(lubridate)
mutate(smple, across(where(is.POSIXct), lubridate::wday,
                     label = TRUE, abbr = FALSE, .names = "{.col}_day"))
#> # A tibble: 6 x 10
#> mealTime weight_measure_time height_measure_time
#> <dttm> <dttm> <dttm>
#> 1 2018-11-17 22:00:00 2018-11-14 20:06:40 2018-11-13 10:53:54
#> 2 2020-01-12 18:00:00 2020-01-12 07:00:00 2018-11-27 16:44:03
#> 3 2020-11-01 18:00:00 2020-07-14 07:00:00 2018-11-27 16:44:03
#> 4 2018-12-27 09:00:00 2018-12-27 07:46:02 2018-11-16 17:06:28
#> 5 2019-02-10 18:00:00 2019-01-02 08:13:43 2018-11-16 11:09:07
#> 6 2018-12-08 22:00:00 2018-12-08 00:00:45 2018-11-21 12:10:28
#> # … with 7 more variables: hba1c_measure_time <dttm>, bpMeasureTime <dttm>,
#> #   mealTime_day <ord>, weight_measure_time_day <ord>,
#> #   height_measure_time_day <ord>, hba1c_measure_time_day <ord>,
#> #   bpMeasureTime_day <ord>
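As a side note, passing extra arguments through across()'s ... was deprecated in dplyr 1.1.0; the same call can be written with an anonymous function instead. A minimal equivalent sketch (same packages as above):

library(dplyr)
library(lubridate)

smple %>%
  mutate(across(where(is.POSIXct),
                ~ wday(.x, label = TRUE, abbr = FALSE),
                .names = "{.col}_day"))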
Here is one way to solve your problem:
df[paste0(names(df), "_day")] <- lapply(df, weekdays)
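As a brief note on why this works: lapply() operates column by column on the underlying list, so each column keeps its POSIXct class when weekdays() is applied, whereas apply(), as used in the question, first coerces the data frame to a character matrix.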
Base R solution:
cbind(
  df,
  setNames(
    data.frame(Map(weekdays, df)),
    paste0(names(df),
           ifelse(grepl("_", names(df)), "_day_of_week", "DayOfWeek"))
  )
)
A dplyr solution using only weekdays() from base R:
library(dplyr)
df %>%
mutate(across(everything(), weekdays, .names = "{.col}_day"))
Output (the day names are in German because weekdays() uses the session's LC_TIME locale):
mealTime weight_measure_time height_measure_time hba1c_measure_time bpMeasureTime mealTime_day weight_measure_time_day
<dttm> <dttm> <dttm> <dttm> <dttm> <chr> <chr>
1 2018-11-17 22:00:00 2018-11-14 20:06:40 2018-11-13 10:53:54 2018-11-13 11:01:00 2018-11-16 15:03:43 Samstag Mittwoch
2 2020-01-12 18:00:00 2020-01-12 07:00:00 2018-11-27 16:44:03 2019-11-11 07:00:00 2020-01-12 07:00:00 Sonntag Sonntag
3 2020-11-01 18:00:00 2020-07-14 07:00:00 2018-11-27 16:44:03 2020-07-13 07:30:00 2020-03-03 07:00:00 Sonntag Dienstag
4 2018-12-27 09:00:00 2018-12-27 07:46:02 2018-11-16 17:06:28 2018-12-14 10:00:00 2018-12-27 07:46:14 Donnerstag Donnerstag
5 2019-02-10 18:00:00 2019-01-02 08:13:43 2018-11-16 11:09:07 2018-12-27 14:22:00 2019-01-02 08:13:57 Sonntag Mittwoch
6 2018-12-08 22:00:00 2018-12-08 00:00:45 2018-11-21 12:10:28 2018-12-06 11:43:00 2018-12-08 10:48:30 Samstag Samstag
# ... with 3 more variables: height_measure_time_day <chr>, hba1c_measure_time_day <chr>, bpMeasureTime_day <chr>
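If English day names are wanted regardless of the locale, one option (a sketch; note that this changes LC_TIME for the whole session) is:

# switch date-time formatting to the C locale, which uses English day names
Sys.setlocale("LC_TIME", "C")

df %>%
  mutate(across(everything(), weekdays, .names = "{.col}_day"))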
Related
I have a dataframe with this data:
# A tibble: 6 × 3
rowid Arrival Depart
<int> <dttm> <dttm>
1 1 2023-02-11 07:00:00 2023-02-11 17:30:00
2 2 2023-02-13 10:00:00 2023-02-13 18:00:00
3 3 2023-02-14 08:00:00 2023-02-14 17:00:00
4 4 2023-02-15 08:00:00 2023-02-15 17:00:00
5 5 2023-02-16 08:00:00 2023-02-16 18:00:00
6 6 2023-02-18 07:00:00 2023-02-18 17:30:00
structure(list(rowid = 1:6, Arrival = structure(c(1676098800,
1676282400, 1676361600, 1676448000, 1676534400, 1676703600), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Depart = structure(c(1676136600, 1676311200, 1676394000,
1676480400, 1676570400, 1676741400), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
I set the following objects:
ri <- 2
int <- test_int[2]
int (an interval) becomes:
> int
[1] 2023-02-13 06:00:00 EST--2023-02-13 13:00:00 EST
and then run this code:
test <- test %>% mutate(
interval_start = if_else(rowid == ri, int_start(int), Arrival),
interval_end = if_else(rowid == ri, int_end(int), Depart)
) %>% select(Arrival, interval_start, Depart, interval_end)
The result is this:
# A tibble: 6 × 4
Arrival interval_start Depart interval_end
<dttm> <dttm> <dttm> <dttm>
1 2023-02-11 07:00:00 2023-02-11 02:00:00 2023-02-11 17:30:00 2023-02-11 12:30:00
2 2023-02-13 10:00:00 2023-02-13 06:00:00 2023-02-13 18:00:00 2023-02-13 13:00:00
3 2023-02-14 08:00:00 2023-02-14 03:00:00 2023-02-14 17:00:00 2023-02-14 12:00:00
4 2023-02-15 08:00:00 2023-02-15 03:00:00 2023-02-15 17:00:00 2023-02-15 12:00:00
5 2023-02-16 08:00:00 2023-02-16 03:00:00 2023-02-16 18:00:00 2023-02-16 13:00:00
6 2023-02-18 07:00:00 2023-02-18 02:00:00 2023-02-18 17:30:00 2023-02-18 12:30:00
structure(list(Arrival = structure(c(1676098800, 1676282400,
1676361600, 1676448000, 1676534400, 1676703600), tzone = "UTC", class = c("POSIXct",
"POSIXt")), interval_start = structure(c(1676098800, 1676286000,
1676361600, 1676448000, 1676534400, 1676703600), class = c("POSIXct",
"POSIXt")), Depart = structure(c(1676136600, 1676311200, 1676394000,
1676480400, 1676570400, 1676741400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), interval_end = structure(c(1676136600, 1676311200,
1676394000, 1676480400, 1676570400, 1676741400), class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
For some reason the if_else() statement is returning some odd times instead of the values of Arrival/Depart, although it is correctly mutating rowid 2 to the intended time.
Does anyone know why this could be and how I can fix it?
Thanks to @George Savva, I was able to see that the erroneous values I was getting were because my int variable was in the EST timezone instead of UTC, which is why there was a 5-hour difference.
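For reference, one way to avoid the mismatch (a sketch, assuming int and ri are defined as above; int_utc is just an introduced helper name) is to express the interval's endpoints in UTC before handing them to if_else(), so the result keeps the same display timezone as Arrival/Depart. with_tz() keeps the same instant and only changes how it is displayed; force_tz() would instead keep the clock time and change the instant.

library(dplyr)
library(lubridate)

# same instants as before, but displayed in UTC like the Arrival/Depart columns
int_utc <- interval(with_tz(int_start(int), "UTC"),
                    with_tz(int_end(int), "UTC"))

test <- test %>%
  mutate(
    interval_start = if_else(rowid == ri, int_start(int_utc), Arrival),
    interval_end   = if_else(rowid == ri, int_end(int_utc), Depart)
  )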
I have the following data:
# dput:
data <- structure(list(start = structure(c(1641193200, 1641189600, 1641218400,
1641189600, 1641222000, 1641222000, 1641222000), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), end = structure(c(1641218400, 1641218400,
1641241800, 1641218400, 1641241800, 1641241800, 1641232800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "B", "C", "D", "E",
"F", "G")), row.names = c(NA, -7L), class = c("tbl_df", "tbl",
"data.frame"))
data
# A tibble: 7 x 3
start end group
<dttm> <dttm> <chr>
1 2022-01-03 07:00:00 2022-01-03 14:00:00 A
2 2022-01-03 06:00:00 2022-01-03 14:00:00 B
3 2022-01-03 14:00:00 2022-01-03 20:30:00 C
4 2022-01-03 06:00:00 2022-01-03 14:00:00 D
5 2022-01-03 15:00:00 2022-01-03 20:30:00 E
6 2022-01-03 15:00:00 2022-01-03 20:30:00 F
7 2022-01-03 15:00:00 2022-01-03 18:00:00 G
And I want to determine at what time only one group has an "active" time interval (start to end) without overlapping with any other group.
I already experimented with lubridate and the interval function, but had trouble comparing more than two intervals with each other.
Desired Output
The output should show that group C has the time interval from 14:00 to 15:00, which has no overlap with any other group.
You can check ivs::iv_locate_splits to see which time frame is occupied by which group:
library(ivs)
ivv <- iv(data$start, data$end)
iv_locate_splits(ivv)
key loc
1 [2022-01-03 06:00:00, 2022-01-03 07:00:00) 2, 4
2 [2022-01-03 07:00:00, 2022-01-03 08:00:00) 1, 2, 4
3 [2022-01-03 08:00:00, 2022-01-03 14:00:00) 1, 2, 4, 7
4 [2022-01-03 14:00:00, 2022-01-03 15:00:00) 3, 7
5 [2022-01-03 15:00:00, 2022-01-03 18:00:00) 3, 5, 6, 7
6 [2022-01-03 18:00:00, 2022-01-03 20:30:00) 3, 5, 6
Updated approach to get the desired outcome:
library(ivs)
#convert to iv format
ivv <- iv(data$start, data$end)
#Check the splits
spl <- iv_locate_splits(ivv)
#Get the index of splits with only 1 group
index <- unlist(spl$loc[lengths(spl$loc) == 1])
#Create the desired outcome using the index
data.frame(frame = spl$key[index],
group = data$group[index])
# frame group
#1 [2022-01-03 14:00:00, 2022-01-03 15:00:00) C
I have a dataset with temperature data for each day, so I grouped it by date. In the end I have a list of dataframes, one for each day. Now I want to filter all of these dataframes by a range; the filter is the mean temperature of that day (dataframe) ± 0.5 °C.
But the problem is that each dataframe in the list has a different mean value (I hope I'm clear).
So I want to filter by the mean value of a column, but this mean changes for every dataframe.
How can I solve this problem?
I'm an amateur in R, so anything is helpful. Thank you in advance.
This is a short version of my list:
structure(list(structure(list(Date = structure(c(1646434800,
1646434800, 1646434800, 1646434800, 1646434800, 1646434800, 1646434800,
1646434800, 1646434800, 1646434800), tzone = "", class = c("POSIXct",
"POSIXt")), V4 = c(0.875, 0.5, 0.1875, -0.1875, -0.5, -0.8125,
-1.125, -1.375, -1.625, -1.875)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(Date = structure(c(1646521200,
1646521200, 1646521200, 1646521200, 1646521200, 1646521200, 1646521200,
1646521200, 1646521200, 1646521200, 1646521200), tzone = "", class = c("POSIXct",
"POSIXt")), V4 = c(3.75, 3.75, 3.6875, 3.6875, 3.6875, 3.6875,
3.6875, 3.625, 3.625, 3.625, 3.625)), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(Date = structure(c(1646607600,
1646607600, 1646607600, 1646607600, 1646607600, 1646607600, 1646607600,
1646607600, 1646607600, 1646607600, 1646607600), tzone = "", class = c("POSIXct",
"POSIXt")), V4 = c(3.6875, 3.6875, 3.6875, 3.6875, 3.6875, 3.625,
3.625, 3.625, 3.625, 3.625, 3.625)), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))), ptype = structure(list(Date = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), V4 = numeric(0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = integer(0)), class = c("vctrs_list_of", "vctrs_vctr",
"list"))
You can do this in several ways. Suppose mydata is the list that you provided in the question.
In dplyr, you can first bind the rows of all the data frames in mydata to create a single data frame, then group it by Date, and then apply the filter to each group. The result is a single data frame.
library(dplyr)

do.call(rbind, mydata) %>%
  group_by(Date) %>%
  filter((V4 <= mean(V4) + 0.5) &
           (V4 >= mean(V4) - 0.5))
# A tibble: 25 x 2
# Groups: Date [3]
# Date V4
# <dttm> <dbl>
# 1 2022-03-05 06:00:00 -0.188
# 2 2022-03-05 06:00:00 -0.5
# 3 2022-03-05 06:00:00 -0.812
# 4 2022-03-06 06:00:00 3.75
# 5 2022-03-06 06:00:00 3.75
# 6 2022-03-06 06:00:00 3.69
# 7 2022-03-06 06:00:00 3.69
# 8 2022-03-06 06:00:00 3.69
# 9 2022-03-06 06:00:00 3.69
# 10 2022-03-06 06:00:00 3.69
# ... with 15 more rows
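Equivalently, the two comparisons can be collapsed into a single condition on the absolute deviation from the group mean (same result, slightly more compact):

do.call(rbind, mydata) %>%
  group_by(Date) %>%
  filter(abs(V4 - mean(V4)) <= 0.5)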
In base R, you can define a function that filters a single data frame and then apply it to mydata with lapply(). The result is a list of data frames.
myfilter <- function(df) {
  cond <- (df$V4 <= mean(df$V4) + 0.5) & (df$V4 >= mean(df$V4) - 0.5)
  result <- df[cond, ]
  return(result)
}
lapply(mydata, myfilter)
# [[1]]
# # A tibble: 3 x 2
# Date V4
# <dttm> <dbl>
# 1 2022-03-05 06:00:00 -0.188
# 2 2022-03-05 06:00:00 -0.5
# 3 2022-03-05 06:00:00 -0.812
#
# [[2]]
# # A tibble: 11 x 2
# Date V4
# <dttm> <dbl>
# 1 2022-03-06 06:00:00 3.75
# 2 2022-03-06 06:00:00 3.75
# 3 2022-03-06 06:00:00 3.69
# 4 2022-03-06 06:00:00 3.69
# 5 2022-03-06 06:00:00 3.69
# 6 2022-03-06 06:00:00 3.69
# 7 2022-03-06 06:00:00 3.69
# 8 2022-03-06 06:00:00 3.62
# 9 2022-03-06 06:00:00 3.62
# 10 2022-03-06 06:00:00 3.62
# 11 2022-03-06 06:00:00 3.62
#
# [[3]]
# # A tibble: 11 x 2
# Date V4
# <dttm> <dbl>
# 1 2022-03-07 06:00:00 3.69
# 2 2022-03-07 06:00:00 3.69
# 3 2022-03-07 06:00:00 3.69
# 4 2022-03-07 06:00:00 3.69
# 5 2022-03-07 06:00:00 3.69
# 6 2022-03-07 06:00:00 3.62
# 7 2022-03-07 06:00:00 3.62
# 8 2022-03-07 06:00:00 3.62
# 9 2022-03-07 06:00:00 3.62
# 10 2022-03-07 06:00:00 3.62
# 11 2022-03-07 06:00:00 3.62
I have two datasets: one with values at specific time points for different IDs, and another with several time frames for those IDs. Now I want to check whether each time point in dataset 1 falls within any of the time frames from dataset 2 for the matching ID.
For example:
df1:
ID date time
1 2020-04-14 11:00:00
1 2020-04-14 18:00:00
1 2020-04-15 10:00:00
1 2020-04-15 20:00:00
1 2020-04-16 11:00:00
1 ...
2 ...
df2:
ID start end
1 2020-04-14 16:00:00 2020-04-14 20:00:00
1 2020-04-15 18:00:00 2020-04-16 13:00:00
2 ...
2
what I want
df1_new:
ID date time mark
1 2020-04-14 11:00:00 0
1 2020-04-14 18:00:00 1
1 2020-04-15 10:00:00 0
1 2020-04-15 20:00:00 1
1 2020-04-16 11:00:00 1
1 ...
2 ...
Any help would be appreciated!
An option could be:
library(tidyverse)
library(lubridate)
df_1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L), date = c("14.04.2020",
"14.04.2020", "15.04.2020", "15.04.2020", "16.04.2020"), time = c("11:00:00",
"18:00:00", "10:00:00", "20:00:00", "11:00:00"), date_time = structure(c(1586862000,
1586887200, 1586944800, 1586980800, 1587034800), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA,
-5L))
df_2 <- structure(list(ID = c(1L, 1L), start = c("14.04.2020 16:00",
"15.04.2020 18:00"), end = c("14.04.2020 20:00", "16.04.2020 13:00"
)), class = "data.frame", row.names = c(NA, -2L))
df_22 <- df_2 %>%
mutate(across(c("start", "end"), dmy_hm)) %>%
group_nest(ID)
left_join(x = df_1, y = df_22, by = "ID") %>%
as_tibble() %>%
mutate(mark = map2_dbl(date_time, data, ~+any(.x %within% interval(.y$start, .y$end)))) %>%
select(-data)
#> # A tibble: 5 x 5
#> ID date time date_time mark
#> <int> <chr> <chr> <dttm> <dbl>
#> 1 1 14.04.2020 11:00:00 2020-04-14 11:00:00 0
#> 2 1 14.04.2020 18:00:00 2020-04-14 18:00:00 1
#> 3 1 15.04.2020 10:00:00 2020-04-15 10:00:00 0
#> 4 1 15.04.2020 20:00:00 2020-04-15 20:00:00 1
#> 5 1 16.04.2020 11:00:00 2020-04-16 11:00:00 1
Created on 2021-05-25 by the reprex package (v2.0.0)
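An alternative sketch, assuming dplyr >= 1.1.0 for non-equi joins via join_by(): join df_1 to the parsed time frames and mark the rows that found a match (df_2_parsed is just an introduced name for df_2 with start/end parsed). If a time point can fall into more than one frame of the same ID, the join duplicates that row, which distinct() removes again.

library(dplyr)
library(lubridate)

df_2_parsed <- df_2 %>%
  mutate(across(c(start, end), dmy_hm))

df_1 %>%
  left_join(df_2_parsed,
            by = join_by(ID, between(date_time, start, end))) %>%
  mutate(mark = as.integer(!is.na(start))) %>%   # 1 if a matching frame exists, else 0
  select(ID, date, time, date_time, mark) %>%
  distinct()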
Here is my data
sampleData <- structure(list(Category = c("A", "B", "C", "D", "E", "F", "G",
"H", "I", "J", "K"), Date = structure(c(1546300800, 1547510400,
1547769600, 1548288000, 1548979200, 1549756800, 1550188800, 1551398400,
1552348800, 1552608000, 1553472000), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"))
# A tibble: 11 x 2
Category Date
<chr> <dttm>
1 A 2019-01-01
2 B 2019-01-15
3 C 2019-01-18
4 D 2019-01-24
5 E 2019-02-01
6 F 2019-02-10
7 G 2019-02-15
8 H 2019-03-01
9 I 2019-03-12
10 J 2019-03-15
11 K 2019-03-25
lookupData <- structure(list(`Original Date` = structure(c(1546560000, 1547769600,
1548979200, 1550188800, 1551398400, 1552608000, 1553817600, 1555027200,
1556236800, 1557446400, 1558656000, 1559865600), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
# A tibble: 12 x 1
`Original Date`
<dttm>
1 2019-01-04
2 2019-01-18
3 2019-02-01
4 2019-02-15
5 2019-03-01
6 2019-03-15
7 2019-03-29
8 2019-04-12
9 2019-04-26
10 2019-05-10
11 2019-05-24
12 2019-06-07
Currently I have multiple nested ifelse() statements, something like this, to get it working:
sampleData$ModifiedDate <- ifelse(sampleData$Date <= "2019-01-04", "2019-01-04",
ifelse(sampleData$Date <= "2019-01-18", "2019-01-18",
ifelse(sampleData$Date <= "2019-02-01", "2019-02-01",
ifelse(sampleData$Date <= "2019-02-15", "2019-02-15",
ifelse(sampleData$Date <= "2019-03-01", "2019-03-01",
ifelse(sampleData$Date <= "2019-03-15", "2019-03-15",
ifelse(sampleData$Date <= "2019-03-29", "2019-03-29",
ifelse(sampleData$Date <= "2019-04-12", "2019-04-12",
ifelse(sampleData$Date <= "2019-04-26", "2019-04-26","")))))))))
This works, but it is not the way I would want it. Is there a more efficient way to do this? I tried the merge() and fuzzy_left_join() options, but I don't get the desired results shown below.
Here's an attempt with fuzzyjoin:
library(dplyr)
lookupData %>%
mutate(z = lag(`Original Date`, default = as.POSIXct("1970-01-01"))) %>%
fuzzyjoin::fuzzy_left_join(
sampleData, .,
by = c(Date = "z", Date = "Original Date"),
match_fun = list(`>`, `<=`)) %>%
select(-z)
# # A tibble: 11 x 3
# Category Date `Original Date`
# <chr> <dttm> <dttm>
# 1 A 2019-01-01 00:00:00 2019-01-04 00:00:00
# 2 B 2019-01-15 00:00:00 2019-01-18 00:00:00
# 3 C 2019-01-18 00:00:00 2019-01-18 00:00:00
# 4 D 2019-01-24 00:00:00 2019-02-01 00:00:00
# 5 E 2019-02-01 00:00:00 2019-02-01 00:00:00
# 6 F 2019-02-15 00:00:00 2019-02-15 00:00:00
# 7 G 2019-02-10 00:00:00 2019-02-15 00:00:00
# 8 H 2019-03-12 00:00:00 2019-03-15 00:00:00
# 9 I 2019-03-01 00:00:00 2019-03-01 00:00:00
# 10 J 2019-03-15 00:00:00 2019-03-15 00:00:00
# 11 K 2019-03-25 00:00:00 2019-03-29 00:00:00
This would be better served with a formula, as it appears you are advancing all dates to the next date in a fortnightly (every second Friday) schedule. If that is correct, then the following will accomplish it, regardless of how long the span of dates is.
Setting baseDate, which is used as the first reference date:
baseDate <- structure(1546560000, class = c("POSIXct", "POSIXt"), tzone = "UTC")
Using ceiling() to advance each date to the next date in the fortnightly schedule:
sampleData$NewDate <- baseDate + ceiling((sampleData$Date - baseDate) / 14) * 14
Category Date NewDate
1 A 2019-01-01 2019-01-04
2 B 2019-01-15 2019-01-18
3 C 2019-01-18 2019-01-18
4 D 2019-01-24 2019-02-01
5 E 2019-02-01 2019-02-01
6 F 2019-02-15 2019-02-15
7 G 2019-02-10 2019-02-15
8 H 2019-03-12 2019-03-15
9 I 2019-03-01 2019-03-01
10 J 2019-03-15 2019-03-15
11 K 2019-03-25 2019-03-29
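If the lookup dates were not evenly spaced every 14 days, a base R alternative (a sketch; idx is just an introduced name) is to pick, for each Date, the first Original Date that is greater than or equal to it using findInterval(). left.open = TRUE makes an exact match map to itself rather than to the following lookup date; dates later than the last lookup date would return NA.

# index of the first lookup date >= each sample date
idx <- findInterval(as.numeric(sampleData$Date),
                    as.numeric(lookupData$`Original Date`),
                    left.open = TRUE) + 1

sampleData$ModifiedDate <- lookupData$`Original Date`[idx]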