I have several periods, each 6 months long. Their start dates are stored in "in_talls_temp_6" and their end dates in "f_talls_temp_6".
in_talls_temp_6 <- seq.Date(from=i_preImp_preref, to=f_postImp, by="6 months")
f_talls_temp_6 <- in_talls_temp_6 + months(6) - days(1)
I have data like this:
name <- paste0("time_point", seq(1:13))
a <- round(runif(length(name), 200, 500), 0)
data <- data.frame(name, a)
name a
1 time_point1 361
2 time_point2 444
3 time_point3 221
4 time_point4 434
5 time_point5 400
6 time_point6 438
7 time_point7 411
8 time_point8 367
9 time_point9 409
10 time_point10 337
11 time_point11 481
12 time_point12 201
13 time_point13 417
And I want to assign to each "time_point" the start and end dates of its period.
So far I have done it the silly way:
data %>%
mutate( i.date.time.point = case_when (name == "time_point1" ~ in_talls_temp_6[1],
name == "time_point2" ~ in_talls_temp_6[2],
name == "time_point3" ~ in_talls_temp_6[3],
name == "time_point4" ~ in_talls_temp_6[4],
name == "time_point5" ~ in_talls_temp_6[5],
name == "time_point6" ~ in_talls_temp_6[6],
name == "time_point7" ~ in_talls_temp_6[7],
name == "time_point8" ~ in_talls_temp_6[8],
name == "time_point9" ~ in_talls_temp_6[9],
name == "time_point10" ~ in_talls_temp_6[10],
name == "time_point11" ~ in_talls_temp_6[11],
name == "time_point12" ~ in_talls_temp_6[12],
name == "time_point13" ~ in_talls_temp_6[13]) ) %>%
mutate( f.date.time.point = case_when (name == "time_point1" ~ f_talls_temp_6[1],
name == "time_point2" ~ f_talls_temp_6[2],
name == "time_point3" ~ f_talls_temp_6[3],
name == "time_point4" ~ f_talls_temp_6[4],
name == "time_point5" ~ f_talls_temp_6[5],
name == "time_point6" ~ f_talls_temp_6[6],
name == "time_point7" ~ f_talls_temp_6[7],
name == "time_point8" ~ f_talls_temp_6[8],
name == "time_point9" ~ f_talls_temp_6[9],
name == "time_point10" ~ f_talls_temp_6[10],
name == "time_point11" ~ f_talls_temp_6[11],
name == "time_point12" ~ f_talls_temp_6[12],
name == "time_point13" ~ f_talls_temp_6[13])
)
Getting this:
name a i.date.time.point f.date.time.point
1 time_point1 361 2014-07-01 2014-12-31
2 time_point2 444 2015-01-01 2015-06-30
3 time_point3 221 2015-07-01 2015-12-31
4 time_point4 434 2016-01-01 2016-06-30
5 time_point5 400 2016-07-01 2016-12-31
6 time_point6 438 2017-01-01 2017-06-30
7 time_point7 411 2017-07-01 2017-12-31
8 time_point8 367 2018-01-01 2018-06-30
9 time_point9 409 2018-07-01 2018-12-31
10 time_point10 337 2019-01-01 2019-06-30
11 time_point11 481 2019-07-01 2019-12-31
12 time_point12 201 2020-01-01 2020-06-30
13 time_point13 417 2020-07-01 2020-12-31
I think there is a better way, but I'm not able to work it out. I'm stuck here because I want to scale the project up, and now I want to do the same with:
in_talls_temp_3 <- seq.Date(from=i_preImp_preref, to=f_postImp, by="3 months")
f_talls_temp_3 <- in_talls_temp_3 + months(3) - days(1)
That means more time_points, and this will probably grow in the future...
I have thought about a recursive function (is that the proper name for it?) like this (just an idea):
repeat_v <- function(x){
n <- length(x)
for (y in 1:n) {
return(x[[y]])
}
}
I don't know whether a for loop is the right way to do it (would apply be better?). I'm also unsure whether the idea is good for the job or whether I will regret it later because it is too time-consuming.
Any ideas?
Any thoughts will be appreciated! ^^
We can just use standard R [ subsetting:
n = readr::parse_number(data$name)
data$i.date.time.point = in_talls_temp_6[n]
data$f.date.time.point = f_talls_temp_6[n]
# name a i.date.time.point f.date.time.point
# 1 time_point1 267 2014-07-01 2014-12-31
# 2 time_point2 208 2015-01-01 2015-06-30
# 3 time_point3 332 2015-07-01 2015-12-31
# 4 time_point4 325 2016-01-01 2016-06-30
# 5 time_point5 455 2016-07-01 2016-12-31
# 6 time_point6 345 2017-01-01 2017-06-30
# 7 time_point7 425 2017-07-01 2017-12-31
# 8 time_point8 212 2018-01-01 2018-06-30
# 9 time_point9 359 2018-07-01 2018-12-31
# 10 time_point10 297 2019-01-01 2019-06-30
# 11 time_point11 230 2019-07-01 2019-12-31
# 12 time_point12 334 2020-01-01 2020-06-30
# 13 time_point13 457 2020-07-01 2020-12-31
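The same lookup extends to any period length if the period table is built once and matched by name. The following is only a sketch in base R: the helper period_table() is hypothetical, and the from/to dates are assumptions read off the printed output, standing in for i_preImp_preref and f_postImp.

```r
# Assumed bounds, inferred from the printed output; substitute the real
# i_preImp_preref and f_postImp.
from <- as.Date("2014-07-01")
to   <- as.Date("2020-12-31")

# Hypothetical helper: one row per period of n_months months.
period_table <- function(from, to, n_months) {
  step <- paste(n_months, "months")
  starts <- seq(from, to, by = step)
  # each period ends the day before the next one starts
  next_starts <- seq(from, by = step, length.out = length(starts) + 1)[-1]
  data.frame(name = paste0("time_point", seq_along(starts)),
             i.date.time.point = starts,
             f.date.time.point = next_starts - 1)
}

periods_6 <- period_table(from, to, 6)
periods_3 <- period_table(from, to, 3)

data <- data.frame(name = paste0("time_point", 1:13),
                   a = round(runif(13, 200, 500)))

# look up each time point's period by name instead of by parsed number
idx <- match(data$name, periods_6$name)
data$i.date.time.point <- periods_6$i.date.time.point[idx]
data$f.date.time.point <- periods_6$f.date.time.point[idx]
data
```

Swapping periods_6 for periods_3 in the match() step gives the 3-month version without touching the rest of the code.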
Just do:
library(lubridate)  # provides the months() and days() periods
generate_df <- function(n_months, time_points, min_val=200, max_val=500,
                        from=i_preImp_preref,
                        to=f_postImp) {
  # start dates of the periods; keep at most time_points of them
  dates <- head(seq.Date(from=from, to=to, by=paste0(n_months, " months")), time_points)
  data.frame(name = paste0("time_point", seq_along(dates)),
             a = round(runif(length(dates), min_val, max_val), 0),
             i.date.time.point = dates,
             f.date.time.point = dates + months(n_months) - days(1))
}
The first df would be something like:
generate_df(6, 13, 200, 500, i_preImp_preref, f_postImp)
And the second (raise time_points if you want more than the first 13 three-month periods):
generate_df(3, 13, 200, 500, i_preImp_preref, f_postImp)
Input table:
Date Qty
2017-01-01 234
2017-01-08 123
2017-01-15 445
2017-01-22 113
2017-01-29 674
2018-02-05 120
2018-02-12 921
2018-02-19 732
2018-02-26 634
2018-03-05 711
Expected table:
Date Qty
2017-01-01 234
2017-01-08 123
2017-01-15 445
2017-01-22 113
2017-01-29 708.28 #674+(120/7 * 2)
2018-02-05 85.71 #(120/7 * 5)
2018-02-12 921
2018-02-19 732
2018-02-26 837.14 #634+(711/7 * 2)
2018-03-05 507.85 #(711/7 * 5)
In the expected table above, part of the quantity on the first date of a new month is moved to the last date of the previous month, in proportion to the days of that week that fall in each month.
E.g.:
2018-02-26 had a quantity of 634 and 2018-03-05 had 711.
So the quantity 711 is divided by 7 (days in a week), i.e. 711/7 = 101.571. February has 28 days, so 2 shares (the days left in February after 2018-02-26) of 101.571 are added to the existing quantity of 2018-02-26, making it 634+(101.571*2) => 634+203.14 => 837.14 (as you can see in the expected table). Those 2 shares are deducted from 2018-03-05, which keeps the remaining 5 shares (the days of that week that fall in March, as the date of that row is 2018-03-05), i.e. 711/7 * 5 => 507.85 (as you can see in the expected table).
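The arithmetic of this example can be checked with a few lines of R. This is only a sketch of the single February/March case; the variable names are made up for illustration.

```r
qty_last_feb  <- 634   # quantity on 2018-02-26
qty_first_mar <- 711   # quantity on 2018-03-05
days_in_feb   <- 28

days_left   <- days_in_feb - 26      # days of February after the 26th
daily_share <- qty_first_mar / 7     # one day's share of the March week

new_feb <- qty_last_feb + daily_share * days_left   # February row after the split
new_mar <- daily_share * (7 - days_left)            # what stays on the March row

round(c(new_feb, new_mar), 2)
```

Rounding to two decimals gives 507.86 for the March row, where the expected table shows the truncated 507.85.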
Using R how should I generalise this situation?
Does this answer:
> library(dplyr)
> library(lubridate)
> first_day_of_month_wday <- function(dx) {
+ day(dx) <- 1
+ wday(dx)
+ }
> fil <- ceiling((day(df$Date) + first_day_of_month_wday(df$Date) - 1) / 7)
>
> df %>% mutate(Qty1 = case_when(fil > 4 ~ Qty + (days_in_month(df$Date) - day(Date)) * lead(Qty)/7, TRUE ~ Qty)) %>%
+ mutate(Qty1 = case_when(lag(fil) > 4 ~ Qty/7 * day(Date), TRUE ~ Qty1)) %>% select(-Qty) %>% rename(Qty = Qty1)
# A tibble: 10 x 2
Date Qty
<date> <dbl>
1 2017-01-01 234
2 2017-01-08 123
3 2017-01-15 445
4 2017-01-22 113
5 2017-01-29 708.
6 2018-02-05 85.7
7 2018-02-12 921
8 2018-02-19 732
9 2018-02-26 837.
10 2018-03-05 508.
>
PS: Used first_day_of_month_wday function from R: How to get the Week number of the month.
I want to subset the data, e.g. all of the previous year, and store it as a new object.
mtdl <- na.omit(getSymbols("MTDL.JK", auto.assign = F, src = "yahoo", periodicity = "weekly"))
week.year.mtdl <- mtdl %>%
filter(DATE >= as.Date("2018-01-01") & DATE <= as.Date("2018-12-31"))
Here are a few ways to go about this if you want to use dplyr.
1 transform xts into data.frame
df_mtdl <- data.frame(date = index(mtdl), coredata(mtdl))
week.year.mtdl <- df_mtdl %>%
filter(date >= as.Date("2018-01-01") & date <= as.Date("2018-12-31"))
head(week.year.mtdl)
date MTDL.JK.Open MTDL.JK.High MTDL.JK.Low MTDL.JK.Close MTDL.JK.Volume MTDL.JK.Adjusted
1 2018-01-01 650 650 620 630 78200 609.6684
2 2018-01-08 630 650 610 610 291800 590.3138
3 2018-01-15 610 750 600 700 9390700 677.4093
4 2018-01-22 700 730 640 700 6816200 677.4093
5 2018-01-29 700 745 685 685 119900 662.8934
6 2018-02-05 695 715 630 635 1533000 614.5070
2 use tidyquant. This returns a tibble instead of an xts object. Tidyquant is built on top of quantmod and a lot of other packages.
library(tidyquant)
tq_mtdl <- tq_get("MTDL.JK", complete_cases = TRUE, periodicity = "weekly")
week.year.mtdl <- tq_mtdl %>%
filter(date >= as.Date("2018-01-01") & date <= as.Date("2018-12-31"))
head(week.year.mtdl)
# A tibble: 6 x 7
date open high low close volume adjusted
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018-01-04 645 645 620 625 137000 605.
2 2018-01-11 620 660 600 645 1460000 624.
3 2018-01-18 645 750 635 660 13683700 639.
4 2018-01-25 680 745 665 685 1359700 663.
5 2018-02-01 700 715 675 700 922200 677.
6 2018-02-08 695 695 630 690 673700 668.
Or use packages timetk (used as part of tidyquant) or tsbox to transform the data from xts to data.frame or tibble.
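Once the data is in a data.frame, the same date-range filter can also be written without dplyr. Here is a minimal base R sketch; the weekly data is synthetic (made-up close values) and only stands in for the converted object.

```r
# Synthetic weekly data standing in for the converted xts object;
# the close values are made up.
dates <- seq(as.Date("2017-12-04"), as.Date("2019-01-28"), by = "week")
df_weekly <- data.frame(date = dates, close = seq_along(dates))

# logical index for the rows that fall in 2018
in_2018 <- df_weekly$date >= as.Date("2018-01-01") &
           df_weekly$date <= as.Date("2018-12-31")

week_year <- df_weekly[in_2018, ]
head(week_year)
```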
This will give 2018 points of an xts object
mtdl["2018"]
All of these also work:
subset(mtdl, time(mtdl) >= "2018-01-01" & time(mtdl) <= "2018-12-31")
subset(mtdl, start = "2018-01-01", end = "2018-12-31")
window(mtdl, start = "2018-01-01", end = "2018-12-31")
dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), "day")
window(mtdl, dates)
mtdl[dates] # dates is from above
mtdl[ format(time(mtdl), "%Y") == 2018 ]
My code executes just fine, but it takes an enormous amount of time to finish. I would like some help to optimize it and, if possible, a way to execute a rolling aggregation on multiple columns.
I've been trying a few other ways, by creating a function and vectorizing my data frame with library(data.table), but with no success: I actually get half of what I should get, and I can only do one column at a time.
# Creating functions
fun <- function(x, date, thresh) {
D <- as.matrix(dist(date)) #distance matrix between dates
D <- D <= thresh
D[lower.tri(D)] <- FALSE #don't sum to future
R <- D * x #FALSE is treated as 0
colMeans(R, na.rm = TRUE)
}
setDT(df_2)
df_2[, invoiceDate := as.Date(invoiceDate, format = "%m/%d/%Y")]
setkey(df_2, cod_unb, cod_pdv, invoiceDate)
df_2[, volume_total_diario_RT30 := fun(volume_total_diario, invoiceDate, 30), by = list(cod_unb, cod_pdv)]
This is my current code that works fine, but takes too much time (Over 8h to process 30 days)
years <- c(2017:2019)
months <- c(1:12)
days <- c(1:31)
df_final <- df_n[1,c('cod_unb','cod_pdv','cpf_cnpj','idade_pdv_meses','status_telefone','col1','col2','col3','year','month','day')] #eliminating first line
for (i in years) {
for (j in months) {
for (k in days) {
if (j == 1){
df_temp <- df_n[(df_n$years == i & df_n$months == j & df_n$days <= k) | (df_n$years == (i-1) & df_n$months == 12 & df_n$days >= k),]
}
if (j != 1){
df_temp <- df_n[(df_n$years == i & df_n$months == j & df_n$days <= k) | (df_n$years == i & df_n$months == (j - 1) & df_n$days >= k),]
}
# Aggregate.
if(nrow(df_temp) >= 1){
df_temp <- aggregate(df_temp[, c('col1','col2','col3')], by = list(df_temp$cod_unb,df_temp$cod_pdv,df_temp$cpf_cnpj,df_temp$idade_pdv_meses,df_temp$status_telefone), FUN = mean)
names(df_temp)[names(df_temp) == "Group.1"] <- "cod_unb"
names(df_temp)[names(df_temp) == "Group.2"] <- "cod_pdv"
names(df_temp)[names(df_temp) == "Group.3"] <- "cpf_cnpj"
names(df_temp)[names(df_temp) == "Group.4"] <- "idade_pdv_meses"
names(df_temp)[names(df_temp) == "Group.5"] <- "status_telefone"
df_temp$years <- i
df_temp$months <- j
df_temp$days <- k
df_final <- rbind(df_final,df_temp)
}
}
}
}
df_final <- df_final[-1,]
Output should be column R30
cod_unb;cod_pdv;Years;Months;Days;date;volume_total_diario;R30
111;1005;2018;11;3;03/11/2018;0.48;
111;1005;2018;11;9;09/11/2018;0.79035;
111;1005;2018;11;16;16/11/2018;1.32105;
111;1005;2018;11;24;24/11/2018;0.6414;
111;1005;2018;11;30;30/11/2018;0.6;
111;1005;2018;12;7;07/12/2018;1.79175;1.02891
111;1005;2018;12;15;15/12/2018;1.4421;1.15926
111;1005;2018;12;21;21/12/2018;0.48;0.99105
111;1005;2018;12;28;28/12/2018;0.5535;0.97347
111;1005;2019;1;4;04/01/2019;0.36;0.92547
If I understand correctly, the OP has requested to aggregate values over a rolling period of 30 days and to append these aggregates to the original data.
This can be solved efficiently by aggregating in a non-equi join.
Here is an example for one variable using sample data provided by the OP:
library(data.table)
# coerce to data.table, coerce character date to class IDate
setDT(df_n)[, date := as.IDate(date, "%d/%m/%Y")]
# intermediate result for demonstration:
df_n[.(upper = date, lower = date - 30), on = .(date <= upper, date >= lower),
mean(volume_total_diario), by = .EACHI]
date date V1
1: 2018-11-03 2018-10-04 0.480000
2: 2018-11-09 2018-10-10 0.635175
3: 2018-11-16 2018-10-17 0.863800
4: 2018-11-24 2018-10-25 0.808200
5: 2018-11-30 2018-10-31 0.766560
6: 2018-12-07 2018-11-07 1.028910
7: 2018-12-15 2018-11-15 1.159260
8: 2018-12-21 2018-11-21 0.991050
9: 2018-12-28 2018-11-28 0.973470
10: 2019-01-04 2018-12-05 0.925470
The intermediate result shows the upper and lower limits of the date range included in the aggregation and the aggregated values for the respective periods. This can be used to add a new column to df_n:
# update df_n by appending new column
setDT(df_n)[, R30_new := df_n[.(upper = date, lower = date - 30), on = .(date <= upper, date >= lower),
mean(volume_total_diario), by = .EACHI]$V1]
df_n
cod_unb cod_pdv Years Months Days date volume_total_diario R30 R30_new
1: 111 1005 2018 11 3 2018-11-03 0.48000 NA 0.480000
2: 111 1005 2018 11 9 2018-11-09 0.79035 NA 0.635175
3: 111 1005 2018 11 16 2018-11-16 1.32105 NA 0.863800
4: 111 1005 2018 11 24 2018-11-24 0.64140 NA 0.808200
5: 111 1005 2018 11 30 2018-11-30 0.60000 NA 0.766560
6: 111 1005 2018 12 7 2018-12-07 1.79175 1.02891 1.028910
7: 111 1005 2018 12 15 2018-12-15 1.44210 1.15926 1.159260
8: 111 1005 2018 12 21 2018-12-21 0.48000 0.99105 0.991050
9: 111 1005 2018 12 28 2018-12-28 0.55350 0.97347 0.973470
10: 111 1005 2019 1 4 2019-01-04 0.36000 0.92547 0.925470
The values of R30 and R30_new are identical; R30_new contains also results for the first 5 rows.
Caveat
Additional grouping variables have been ignored for the sake of clarity but can be included easily. Also, the solution can be extended to aggregate multiple value columns.
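As an independent cross-check that includes the grouping variables, here is a base R sketch of the same 30-day rolling mean computed per cod_unb/cod_pdv group. It does not use the non-equi join; it loops over the rows of each group, so it is quadratic in group size and is meant for verifying results on small data. The helper roll_mean() is hypothetical, and the two rows for cod_pdv 9999 are made up purely to show the grouping.

```r
dates_chr <- c("03/11/2018", "09/11/2018", "16/11/2018", "24/11/2018", "30/11/2018",
               "07/12/2018", "15/12/2018", "21/12/2018", "28/12/2018", "04/01/2019")
df <- data.frame(
  cod_unb = 111,
  # the last two rows (cod_pdv 9999) are made up to illustrate a second group
  cod_pdv = c(rep(1005, 10), 9999, 9999),
  date = as.Date(c(dates_chr, "03/11/2018", "20/12/2018"), format = "%d/%m/%Y"),
  volume_total_diario = c(0.48, 0.79035, 1.32105, 0.6414, 0.6,
                          1.79175, 1.4421, 0.48, 0.5535, 0.36, 1, 3)
)

# Hypothetical helper: mean of all values dated within `window` days
# before (and including) each row's date.
roll_mean <- function(values, dates, window = 30) {
  vapply(seq_along(dates), function(i) {
    in_window <- dates <= dates[i] & dates >= dates[i] - window
    mean(values[in_window])
  }, numeric(1))
}

# row indices per group, then fill the new column group by group
groups <- split(seq_len(nrow(df)), list(df$cod_unb, df$cod_pdv), drop = TRUE)
df$R30_sketch <- NA_real_
for (rows in groups) {
  df$R30_sketch[rows] <- roll_mean(df$volume_total_diario[rows], df$date[rows])
}
df
```

For the rows taken from the question, R30_sketch should agree with the R30_new column shown above.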
Data
library(data.table)
df_n <- fread("
cod_unb;cod_pdv;Years;Months;Days;date;volume_total_diario;R30
111;1005;2018;11;3;03/11/2018;0.48;
111;1005;2018;11;9;09/11/2018;0.79035;
111;1005;2018;11;16;16/11/2018;1.32105;
111;1005;2018;11;24;24/11/2018;0.6414;
111;1005;2018;11;30;30/11/2018;0.6;
111;1005;2018;12;7;07/12/2018;1.79175;1.02891
111;1005;2018;12;15;15/12/2018;1.4421;1.15926
111;1005;2018;12;21;21/12/2018;0.48;0.99105
111;1005;2018;12;28;28/12/2018;0.5535;0.97347
111;1005;2019;1;4;04/01/2019;0.36;0.92547
")
EDIT: Aggregating multiple variables
As the OP has asked for a way to execute a rolling aggregation on multiple columns here is an example.
First, we need to create an additional value var in OP's sample dataset:
df_n <- fread("
cod_unb;cod_pdv;Years;Months;Days;date;volume_total_diario;R30
111;1005;2018;11;3;03/11/2018;0.48;
111;1005;2018;11;9;09/11/2018;0.79035;
111;1005;2018;11;16;16/11/2018;1.32105;
111;1005;2018;11;24;24/11/2018;0.6414;
111;1005;2018;11;30;30/11/2018;0.6;
111;1005;2018;12;7;07/12/2018;1.79175;1.02891
111;1005;2018;12;15;15/12/2018;1.4421;1.15926
111;1005;2018;12;21;21/12/2018;0.48;0.99105
111;1005;2018;12;28;28/12/2018;0.5535;0.97347
111;1005;2019;1;4;04/01/2019;0.36;0.92547
")[
, date := as.IDate(date, "%d/%m/%Y")][, var2 := .I][]
df_n
cod_unb cod_pdv Years Months Days date volume_total_diario R30 var2
1: 111 1005 2018 11 3 2018-11-03 0.48000 NA 1
2: 111 1005 2018 11 9 2018-11-09 0.79035 NA 2
3: 111 1005 2018 11 16 2018-11-16 1.32105 NA 3
4: 111 1005 2018 11 24 2018-11-24 0.64140 NA 4
5: 111 1005 2018 11 30 2018-11-30 0.60000 NA 5
6: 111 1005 2018 12 7 2018-12-07 1.79175 1.02891 6
7: 111 1005 2018 12 15 2018-12-15 1.44210 1.15926 7
8: 111 1005 2018 12 21 2018-12-21 0.48000 0.99105 8
9: 111 1005 2018 12 28 2018-12-28 0.55350 0.97347 9
10: 111 1005 2019 1 4 2019-01-04 0.36000 0.92547 10
So, a column var2 has been added (which simply contains the row number).
This is the code to aggregate multiple columns using the same aggregation function:
cols <- c("volume_total_diario", "var2")
setDT(df_n)[, paste0("mean_", cols) :=
df_n[.(upper = date, lower = date - 30),
on = .(date <= upper, date >= lower),
lapply(.SD, mean),
.SDcols = cols, by = .EACHI][
, .SD, .SDcols = cols]][]
df_n
cod_unb cod_pdv Years Months Days date volume_total_diario R30 var2 mean_volume_total_diario mean_var2
1: 111 1005 2018 11 3 2018-11-03 0.48000 NA 1 0.480000 1.0
2: 111 1005 2018 11 9 2018-11-09 0.79035 NA 2 0.635175 1.5
3: 111 1005 2018 11 16 2018-11-16 1.32105 NA 3 0.863800 2.0
4: 111 1005 2018 11 24 2018-11-24 0.64140 NA 4 0.808200 2.5
5: 111 1005 2018 11 30 2018-11-30 0.60000 NA 5 0.766560 3.0
6: 111 1005 2018 12 7 2018-12-07 1.79175 1.02891 6 1.028910 4.0
7: 111 1005 2018 12 15 2018-12-15 1.44210 1.15926 7 1.159260 5.0
8: 111 1005 2018 12 21 2018-12-21 0.48000 0.99105 8 0.991050 6.0
9: 111 1005 2018 12 28 2018-12-28 0.55350 0.97347 9 0.973470 7.0
10: 111 1005 2019 1 4 2019-01-04 0.36000 0.92547 10 0.925470 8.0
Note that the new columns have been named programmatically.
I have a data frame where for each day, I have several prices.
I would like to modify my data frame with the following code :
newdf <- Data %>%
filter(
if (Data$Date == Data$Echeance) {
Data$Close == lag(Data$Close,1)
} else {
Data$Close == Data$Close
}
)
However, it is not giving me what I want, that is :
create a new data frame where the variable Close takes its normal value, unless the day of Date is equal to the day of Echeance. In this case, take the following Close value.
I added filter because I wanted to remove the duplicate dates, and keep only one date per day where Close satisfies the condition above.
There is no error message; it just doesn't give me the right data frame.
Here is a glimpse of my data:
Date Echeance Compens. Open Haut Bas Close
1 1998-03-27 00:00:00 1998-09-10 00:00:00 125. 828 828 820 820. 197
2 1998-03-27 00:00:00 1998-11-10 00:00:00 128. 847 847 842 842. 124
3 1998-03-27 00:00:00 1999-01-11 00:00:00 131. 858 858 858 858. 2
4 1998-03-30 00:00:00 1998-09-10 00:00:00 125. 821 821 820 820. 38
5 1998-03-30 00:00:00 1998-11-10 00:00:00 129. 843 843 843 843. 1
6 1998-03-30 00:00:00 1999-01-11 00:00:00 131. 860 860 860 860. 5
Thanks a lot in advance.
Sounds like a use case for ifelse, with dplyr:
library(dplyr)
Data %>%
mutate(Close = ifelse(Date==Echeance, lead(Close,1), Close))
Here is an example:
dat %>%
mutate(var_new = ifelse(date1==date2, lead(var,1), var))
# A tibble: 3 x 4
# date1 date2 var var_new
# <date> <date> <int> <int>
# 1 2018-03-27 2018-03-27 10 11
# 2 2018-03-28 2018-01-01 11 11
# 3 2018-03-29 2018-02-01 12 12
The function lead shifts the vector by 1 position, so each row sees the value from the next row (and the last row gets NA). Also note that I created var_new just to show the difference, but you can overwrite var directly in mutate.
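To make the shift concrete, here is a small base R sketch that does the same thing without dplyr. The helper lead1() is a hypothetical stand-in for lead(x, 1), and the vectors mirror the example data.

```r
# Hypothetical stand-in for dplyr::lead(x, 1): drop the first value, pad with NA
lead1 <- function(x) c(x[-1], NA)

date1 <- seq(as.Date("2018-03-27"), as.Date("2018-03-29"), by = "day")
date2 <- as.Date(c("2018-03-27", "2018-01-01", "2018-02-01"))
var   <- 10:12

lead1(var)   # the next row's value for each row, NA for the last row

# take the next value only where the two dates coincide
var_new <- ifelse(date1 == date2, lead1(var), var)
var_new
```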
Data used:
dat <- tibble(date1 = seq(from=as.Date("2018-03-27"), to=as.Date("2018-03-29"), by="day"),
date2 = c(as.Date("2018-03-27"), as.Date("2018-01-01"), as.Date("2018-02-01")),
var = 10:12)
dat
# A tibble: 3 x 3
# date1 date2 var
# <date> <date> <int>
# 1 2018-03-27 2018-03-27 10
# 2 2018-03-28 2018-01-01 11
# 3 2018-03-29 2018-02-01 12
I would like to create a column of 0s and 1s based on inequalities of three columns of dates.
The idea is the following. If event_date is before death_date or study_over, then the column event should be == 1; if event_date occurs after death_date or study_over, event should be == 0. Both event_date and death_date may contain NAs.
set.seed(1337)
rand_dates <- Sys.Date() - 365:1
df <-
data.frame(
event_date = sample(rand_dates, 20),
death_date = sample(rand_dates, 20),
study_over = sample(rand_dates, 20)
)
My attempt was the following
eventR <-
function(x, y, z){
if(is.na(y)){
ifelse(x <= z, 1, 0)
} else if(y <= z){
ifelse(x < y, 1, 0)
} else {
ifelse(x <= z, 1, 0)
}
}
I use it in the following manner
library(dplyr)
df[c(3, 5, 7), "event_date"] <- NA #there are some NA in .$event_date
df[c(3, 4, 6), "death_date"] <- NA #there are some NA in .$death_date
df %>%
mutate(event = sapply(.$event_date, eventR, y = .$death_date, z = .$study_over))
##Error: wrong result size (400), expected 20 or 1
##In addition: There were 40 warnings (use warnings() to see them)
I can't figure out how to do this. Any suggestions?
This would seem to construct a binary column (with NA's where needed) where 1 indicates "event_date is before death_date or study_over" and 0 is used elsewhere. As already pointed out your specification does not cover all cases:
df$event <- with(df, as.numeric( event_date < pmax( death_date , study_over) ) )
df
You can use pmap_dbl() from the purrr package instead of sapply, so that eventR is called once per row with one value from each column:
library(dplyr)
library(purrr)
df %>% mutate(event = pmap_dbl(list(event_date, death_date, study_over), eventR))
event_date death_date study_over event
1 2016-10-20 2017-01-27 2016-12-16 1
2 2016-10-15 2016-12-12 2017-01-20 1
3 <NA> <NA> 2016-10-09 NA
4 2016-09-04 <NA> 2016-11-17 1
5 <NA> 2016-10-13 2016-06-09 NA
6 2016-07-21 <NA> 2016-04-26 0
7 <NA> 2017-02-21 2016-07-12 NA
8 2016-07-02 2017-02-08 2016-08-24 1
9 2016-06-19 2016-09-07 2016-04-11 0
10 2016-05-14 2017-03-13 2016-08-03 1
11 2017-03-06 2017-02-05 2017-02-28 0
12 2017-03-10 2016-04-28 2016-11-30 0
13 2017-01-10 2016-12-10 2016-10-27 0
14 2016-05-31 2016-06-12 2016-08-13 1
15 2017-03-03 2016-12-25 2016-12-20 0
16 2016-04-01 2016-11-03 2016-06-30 1
17 2017-02-26 2017-02-25 2016-05-12 0
18 2017-02-08 2016-12-08 2016-10-14 0
19 2016-07-19 2016-07-03 2016-09-22 0
20 2016-06-17 2016-06-06 2016-11-09 0
You might also be interested in the dplyr function, case_when() for handling many if else statements.
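The original sapply call fails because it iterates over event_date only while passing the whole death_date and study_over vectors on every call, and if() needs a single TRUE or FALSE. A fully vectorised alternative is sketched below with base R ifelse rather than case_when; the function name event_flag and the four test rows are made up for illustration, and the logic is meant to mirror eventR as written in the question.

```r
# Hypothetical vectorised version of eventR: works on whole columns at once.
event_flag <- function(event_date, death_date, study_over) {
  # TRUE where a death date exists and falls on or before the end of the study
  death_first <- !is.na(death_date) & death_date <= study_over
  flag <- ifelse(death_first,
                 event_date < death_date,    # death is the relevant cutoff
                 event_date <= study_over)   # otherwise the study end is
  as.numeric(flag)                           # TRUE/FALSE/NA -> 1/0/NA
}

# Made-up rows covering the three branches plus a missing event date
event_date <- as.Date(c("2020-01-10", "2020-03-01", NA, "2020-02-01"))
death_date <- as.Date(c("2020-02-01", "2020-02-01", "2020-02-01", NA))
study_over <- as.Date(c("2020-06-01", "2020-06-01", "2020-06-01", "2020-01-15"))

event <- event_flag(event_date, death_date, study_over)
event
```

Inside a dplyr pipeline this could be used as mutate(event = event_flag(event_date, death_date, study_over)), with no sapply or pmap needed.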