Below is an example of a table I am working with.
df = data.frame(Test_ID = c('a1','a1','a1','a1','a1','a1','a1','a2','a2','a2','a2','a2','a2'),
Event_ID = c('Failure_x', 'Failure_x', 'Failure_y', 'Failure_y', 'Failure_x',
'Failure_x', 'Failure_y', 'Failure_x', 'Failure_y', 'Failure_y',
'Failure_x','Failure_x', 'Failure_y'),
Fail_Date = c('2018-10-10 17:52:20', '2018-10-11 17:02:16', '2018-10-14 12:52:20',
'2018-11-11 16:18:34', '2018-11-12 17:03:06', '2018-11-25 10:50:10',
'2018-12-01 10:28:50', '2018-09-12 19:02:08', '2018-09-20 11:32:25',
'2018-10-13 14:43:30', '2018-10-15 14:22:28', '2018-10-30 21:55:45',
'2018-11-17 11:53:35'))
I want to compute the time between failures (by Test_ID) only where Failure_y occurs after Failure_x, as the difference between the Fail_Date of Failure_y and the Fail_Date of Failure_x. Within a group I can have multiple Failure_y's. The second Failure_y should be paired with the Failure_x occurring after the first instance of Failure_y.
I have tried to use dplyr to create a column TIME_BETWEEN_FAILURES.
library(dplyr)
library(lubridate)
df$Fail_Date = as.POSIXct(as.character(as.factor(df$Fail_Date)),format="%Y-%m-%d %H:%M:%S")
df = df %>% group_by(Test_ID) %>%
mutate(TIME_BETWEEN_FAILURES = ifelse(Event_ID == "Failure_y" & lag(Event_ID) == "Failure_x",
difftime(Fail_Date, first(Fail_Date),units = "hours"),''))
I was able to create the Time_BETWEEN_FAILURES correctly only for the first instance using first() in dplyr. That's where I am currently stuck. Any help on this matter will be appreciated.
This is the result from the code snippet above.
This is the ideal output needed for my analysis.
Thanks.
Cheers.
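library(dplyr)
# Label the runs from the end so that every group closes on a Failure_y
df %>%
group_by(gr = rev(cumsum(rev(Event_ID)=="Failure_y")), Test_ID) %>%
# Hours from the first row of the run to the Failure_y that closes it
mutate(time_between_failures = ifelse(n() > 1 & Event_ID=="Failure_y", difftime(Fail_Date[n()], Fail_Date[1L], units = "hours"), NA))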
# A tibble: 13 x 5
# Groups: gr, Test_ID [6]
Test_ID Event_ID Fail_Date gr time_between_failures
<fct> <fct> <dttm> <int> <dbl>
1 a1 Failure_x 2018-10-10 17:52:20 6 NA
2 a1 Failure_x 2018-10-11 17:02:16 6 NA
3 a1 Failure_y 2018-10-14 12:52:20 6 91
4 a1 Failure_y 2018-11-11 16:18:34 5 NA
5 a1 Failure_x 2018-11-12 17:03:06 4 NA
6 a1 Failure_x 2018-11-25 10:50:10 4 NA
7 a1 Failure_y 2018-12-01 10:28:50 4 449.
8 a2 Failure_x 2018-09-12 19:02:08 3 NA
9 a2 Failure_y 2018-09-20 11:32:25 3 185.
10 a2 Failure_y 2018-10-13 14:43:30 2 NA
11 a2 Failure_x 2018-10-15 14:22:28 1 NA
12 a2 Failure_x 2018-10-30 21:55:45 1 NA
13 a2 Failure_y 2018-11-17 11:53:35 1 790.
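To see why the reversed cumsum produces the right groups, here is a minimal sketch on a toy vector (the `events` vector is made up for illustration and is not the question's data). Reversing, counting Failure_y's cumulatively, and reversing back gives every run of rows the same label as the Failure_y that closes it:

```r
# Toy sequence of events (hypothetical, for illustration only)
events <- c("Failure_x", "Failure_x", "Failure_y",
            "Failure_y", "Failure_x", "Failure_y")

# Count Failure_y's from the end, then flip back to the original order
gr <- rev(cumsum(rev(events) == "Failure_y"))
gr
# [1] 3 3 3 2 1 1
```

Rows 1 to 3 share label 3 and end on a Failure_y, row 4 is a Failure_y on its own (so n() > 1 is FALSE and it gets NA), and rows 5 to 6 form the last run.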
I have two datasets that I would like to join based on date. One is a survey dataset, and the other is a list of prices at various dates. The dates don't match exactly, so I would like to join on the nearest date in the survey dataset (the price data is weekly).
Here's a brief snippet of what the survey dataset looks like (there are many other variables, but here's the two most relevant):
ID          actual.date
20120377    2012-09-26
2020455822  2020-11-23
20126758    2012-10-26
20124241    2012-10-25
2020426572  2020-11-28
And here's the price dataset (also much larger, but you get the idea):
date        price.var1        price.var2
2017-10-30  2.74733926399869  2.73994826674735
2015-03-16  2.77028200438506  2.74079930272231
2010-10-18  3.4265947805337   3.41591263539176
2012-10-29  4.10095806545397  4.14717556976502
2012-01-09  3.87888859352037  3.93074237884497
What I would like to do is join the price dataset to the survey dataset, joining on the nearest date.
I've tried a number of different things, none of which have worked to my satisfaction.
#reading in sample data
library(data.table)
library(dplyr)
survey <- fread(" ID actual.date
1: 20120377 2012-09-26
2: 2020455822 2020-11-23
3: 20126758 2012-10-26
4: 20124241 2012-10-25
5: 2020426572 2020-11-28") %>% select(-V1)
price <- fread("date price.var1 price.var2
1: 2017-10-30 2.747339 2.739948
2: 2015-03-16 2.770282 2.740799
3: 2010-10-18 3.426595 3.415913
4: 2012-10-29 4.100958 4.147176
5: 2012-01-09 3.878889 3.930742") %>% select(-V1)
#using data.table
setDT(survey)[,DT_DATE := actual.date]
setDT(price)[,DT_DATE := date]
survey_price <- survey[price,on=.(DT_DATE),roll="nearest"]
#This works, and they join, but it drops a ton of observations, which won't work
#using dplyr
library(dplyr)
survey_price <- left_join(survey,price,by=c("actual.date"="date"))
#this joins them without dropping observations, but all of the price variables become NAs
You were almost there.
In the DT[i, on] syntax, i should be survey so that the join keeps all of its rows:
setDT(survey)
setDT(price)
survey_price <- price[survey,on=.(date=actual.date),roll="nearest"]
survey_price
date price.var1 price.var2 ID
<IDat> <num> <num> <int>
1: 2012-09-26 4.100958 4.147176 20120377
2: 2020-11-23 2.747339 2.739948 2020455822
3: 2012-10-26 4.100958 4.147176 20126758
4: 2012-10-25 4.100958 4.147176 20124241
5: 2020-11-28 2.747339 2.739948 2020426572
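For readers who want to see what "nearest" means without data.table, the same idea can be sketched in base R. This is only an illustration, assuming both date columns are already Date objects; `idx` and `survey_price` are names chosen here, and only two survey rows and one price column from the question are used.

```r
survey <- data.frame(
  ID = c(20120377, 2020455822),
  actual.date = as.Date(c("2012-09-26", "2020-11-23"))
)
price <- data.frame(
  date = as.Date(c("2017-10-30", "2015-03-16", "2010-10-18",
                   "2012-10-29", "2012-01-09")),
  price.var1 = c(2.747339, 2.770282, 3.426595, 4.100958, 3.878889)
)

# Work on day counts so the subtraction is plain numeric arithmetic
s <- as.numeric(survey$actual.date)
p <- as.numeric(price$date)

# For each survey date, the row of price with the smallest absolute gap
idx <- vapply(s, function(d) which.min(abs(p - d)), integer(1))

survey_price <- data.frame(
  survey,
  matched.date = price$date[idx],
  price.var1 = price$price.var1[idx]
)
survey_price$price.var1
# [1] 4.100958 2.747339
```

Every survey row is kept because the loop runs over the survey dates, which is the same reason survey has to sit in the i position of the data.table join.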
Convert the dates to numeric, use Closest() from DescTools to find the survey date closest to each price date, and replace the price date with that value.
Example datasets
survey <- tibble(
ID = sample(20000:40000, 9, replace = TRUE),
actual.date = seq(today() %m+% days(5), today() %m+% days(5) %m+% months(2),
"week")
)
price <- tibble(
date = seq(today(), today() %m+% months(2), by = "week"),
price_1 = sample(2:6, 9, replace = TRUE),
price_2 = sample(2:6, 9, replace = TRUE)
)
survey
# A tibble: 9 x 2
ID actual.date
<int> <date>
1 34592 2022-05-07
2 37846 2022-05-14
3 22715 2022-05-21
4 22510 2022-05-28
5 30143 2022-06-04
6 34348 2022-06-11
7 21538 2022-06-18
8 39802 2022-06-25
9 36493 2022-07-02
price
# A tibble: 9 x 3
date price_1 price_2
<date> <int> <int>
1 2022-05-02 6 6
2 2022-05-09 3 2
3 2022-05-16 6 4
4 2022-05-23 6 2
5 2022-05-30 2 6
6 2022-06-06 2 4
7 2022-06-13 2 2
8 2022-06-20 3 5
9 2022-06-27 5 6
library(tidyverse)
library(lubridate)
library(DescTools)
price <- price %>%
mutate(date = Closest(survey$actual.date %>%
as.numeric, date %>%
as.numeric) %>%
as_date())
# A tibble: 9 x 3
date price_1 price_2
<date> <int> <int>
1 2022-05-07 6 6
2 2022-05-14 3 2
3 2022-05-21 6 4
4 2022-05-28 6 2
5 2022-06-04 2 6
6 2022-06-11 2 4
7 2022-06-18 2 2
8 2022-06-25 3 5
9 2022-07-02 5 6
merge(survey, price, by.x = "actual.date", by.y = "date")
actual.date ID price_1 price_2
1 2022-05-07 34592 6 6
2 2022-05-14 37846 3 2
3 2022-05-21 22715 6 4
4 2022-05-28 22510 6 2
5 2022-06-04 30143 2 6
6 2022-06-11 34348 2 4
7 2022-06-18 21538 2 2
8 2022-06-25 39802 3 5
9 2022-07-02 36493 5 6
I have a dataframe that looks like this:
   CYCLE date_cycle Randomization_Date COUPLEID
1      0       <NA>         2016-02-16    10892
2      1 2016-08-17         2016-02-19    10894
3      1 2016-08-14         2016-02-26    10899
4      1       <NA>         2016-02-26    10900
5      2 2016-03---         2016-02-26    10900
6      3 2016-07-19         2016-02-26    10900
7      4 2016-11-15         2016-02-26    10900
8      1       <NA>         2016-02-27    10901
9      2 2016-02---         2016-02-27    10901
10     1 2016-03-27         2016-03-03    10902
11     2 2016-04-21         2016-03-03    10902
12     1       <NA>         2016-03-03    10903
13     2 2016-03---         2016-03-03    10903
14     0       <NA>         2016-03-03    10904
15     1       <NA>         2016-03-03    10905
16     2       <NA>         2016-03-03    10905
17     3       <NA>         2016-03-03    10905
18     4 2016-04-14         2016-03-03    10905
19     5 2016-05---         2016-03-03    10905
20     6 2016-06---         2016-03-03    10905
The goal is to fill in the missing day for a given ID using either an earlier or later date and add/subtract 28 from that.
The date_cycle variable was originally in the dataframe as a character type.
I have tried to code it as follows:
mutate(rowwise(df),
newdate = case_when( str_count(date1, pattern = "\\W") >2 ~ lag(as.Date.character(date1, "%Y-%m-%d"),1) + days(28)))
But I need to incorporate it by ID by CYCLE.
An example of my data could be made like this:
data.frame(stringsAsFactors = FALSE,
CYCLE = c(0,1,1,1,2,3,4,1,2,1,2,1,2,0,1,2,3,4,5,6),
date_cycle = c(NA,"2016-08-17", "2016-08-14",NA,"2016-03---","2016-07-19", "2016-11-15",NA,"2016-02---", "2016-03-27","2016-04-21",NA, "2016-03---",NA,NA,NA,NA,"2016-04-14", "2016-05---","2016-06---"), Randomization_Date = c("2016-02-16","2016-02-19",
"2016-02-26","2016-02-26",
"2016-02-26","2016-02-26",
"2016-02-26",
"2016-02-27","2016-02-27",
"2016-03-03",
"2016-03-03","2016-03-03",
"2016-03-03","2016-03-03",
"2016-03-03",
"2016-03-03","2016-03-03",
"2016-03-03",
"2016-03-03","2016-03-03"),
COUPLEID = c(10892,10894,10899,10900,
10900,10900,10900,10901,10901,
10902,10902,10903,10903,10904,
10905,10905,10905,10905,10905,10905)
)
The output I am after would look like:
COUPLEID CYCLE date_cycle new_date_cycle
a 1 2014-03-27 2014-03-27
a 1 2014-04--- 2014-04-24
b 1 2014-03-24 2014-03-24
b 2 2014-04-21
b 3 2014-05--- 2014-05-19
c 1 2014-04--- 2014-04-02
c 2 2014-04-30 2014-04-30
I have also started to write a long conditional, but I wanted to ask here whether anyone knows of a more straightforward way to do it, instead of explicitly writing out all of the possible conditions.
mutate(rowwise(df),
newdate = case_when(
grp == 1 & str_count(date1, pattern = "\\W") >2 & !is.na(lead(date1,1)) ~ lead(date1,1) - days(28),
grp == 2 & str_count(date1, pattern = "\\W") >2 & !is.na(lead(date1,1)) ~ lead(date1,1) - days(28),
grp == 3 & str_count(date1, pattern = "\\W") >2 & ...)))
Function to fill dates forward and backwards
filldates <- function(dates) {
m = which(!is.na(dates))
if(length(m)>0 && length(m)!=length(dates)) {
if(m[1]>1) for(i in seq(m[1],1,-1)) if(is.na(dates[i])) dates[i]=dates[i+1]-28
if(sum(is.na(dates))>0) for(i in seq_along(dates)) if(is.na(dates[i])) dates[i] = dates[i-1]+28
}
return(dates)
}
Usage:
data %>%
arrange(ID, grp) %>%
group_by(ID) %>%
mutate(date2=filldates(as.Date(date1,"%Y-%m-%d")))
Output:
ID grp date1 date2
<chr> <dbl> <chr> <date>
1 a 1 2014-03-27 2014-03-27
2 a 2 2014-04--- 2014-04-24
3 b 1 2014-03-24 2014-03-24
4 b 2 2014-04--- 2014-04-21
5 b 3 2014-05--- 2014-05-19
6 c 1 2014-03--- 2014-04-02
7 c 2 2014-04-30 2014-04-30
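A loop-free variant is also possible if you are willing to assume that every missing date in a group sits a whole number of 28-day steps away from the first known date in that group. That is a stronger assumption than the loop makes, and `fill_from_anchor` is a hypothetical helper name used only for this sketch:

```r
# Fill NA dates at 28-day steps relative to the first known date.
# Assumption: one cycle per row, and the first known date is the anchor.
fill_from_anchor <- function(dates, step = 28) {
  known <- which(!is.na(dates))
  if (length(known) == 0) return(dates)  # nothing to anchor on
  anchor <- known[1]
  # Date each position would have if all cycles were exactly `step` days
  guess <- dates[anchor] + step * (seq_along(dates) - anchor)
  # Only overwrite the missing entries; known dates are kept as they are
  dates[is.na(dates)] <- guess[is.na(dates)]
  dates
}

fill_from_anchor(as.Date(c("2014-03-24", NA, NA)))
# [1] "2014-03-24" "2014-04-21" "2014-05-19"
fill_from_anchor(as.Date(c(NA, "2014-04-30")))
# [1] "2014-04-02" "2014-04-30"
```

It can be used inside a grouped mutate() in the same way as the loop version. The two differ when a gap follows a second known date: the loop adds 28 days to the previous row, while this sketch always counts from the first known date.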
An option using purrr::accumulate().
library(tidyverse)
library(lubridate) # ymd(), days(), as_date(); attached automatically by tidyverse >= 2.0.0
center <- df %>%
group_by(ID) %>%
mutate(helpDate = ymd(str_replace(date1, '---', '-01')),
refDate = max(ymd(date1), na.rm = T))
backward <- center %>%
filter(refDate == max(helpDate)) %>%
mutate(date2 = accumulate(refDate, ~ . - days(28), .dir = 'backward'))
forward <- center %>%
filter(refDate == min(helpDate)) %>%
mutate(date2 = accumulate(refDate, ~ . + days(28)))
bind_rows(forward, backward) %>%
ungroup() %>%
mutate(date2 = as_date(date2)) %>%
select(-c('helpDate', 'refDate'))
# # A tibble: 7 x 4
# ID grp date1 date2
# <chr> <int> <chr> <date>
# 1 a 1 2014-03-27 2014-03-27
# 2 a 2 2014-04--- 2014-04-24
# 3 b 1 2014-03-24 2014-03-24
# 4 b 2 2014-04--- 2014-04-21
# 5 b 3 2014-05--- 2014-05-19
# 6 c 1 2014-03--- 2014-04-02
# 7 c 2 2014-04-30 2014-04-30
I am trying to extract dates from text and create a new column in a dataset. Dates are entered in different formats in column A1 (either mm/dd/yy or mm/dd). I need to find a way to identify the date in column A1 and then add the year if it is missing. The dates run from September 2019 to August 2020. Thus far, I have been able to extract the date regardless of the format; however, when I use as.Date on the new column A2, the dates in mm/dd format become <NA>. I am aware that there might not be a direct solution for this situation, but a workaround (generalizable to a larger data set) would be great. Additionally, I am not sure why the format I pass to as.Date does not control how the date gets displayed. This latter issue is not that important, but I am surprised by the behavior of the R function. A solution in tidyverse would be much appreciated.
library(tidyverse)
library(stringr)
db <- data.frame(A1 = c("review 11/18", "begins 12/4/19", "3/5/20", NA, "deadline 09/5/19", "9/3"))
db %>% mutate(A2 = str_extract(A1, "[0-9/0-9]+"))
# A1 A2
#1 review 11/18 11/18
#2 begins 12/4/19 12/4/19
#3 3/5/20 3/5/20
#4 <NA> <NA>
#5 deadline 09/5/19 09/5/19
#6 9/3 9/3
db %>% mutate(A2 = str_extract(A1, "[0-9/0-9]+")) %>%
mutate(A2 = A2 %>% as.Date(., "%m/%d/%y"))
# A1 A2
# 1 review 11/18 <NA>
# 2 begins 12/4/19 2019-12-04
# 3 3/5/20 2020-03-05
# 4 <NA> <NA>
# 5 deadline 09/5/19 2019-09-05
# 6 9/3 <NA>
Perhaps:
library(tidyverse)
db <- data.frame(A1 = c("review 11/18", "begins 12/4/19", "3/5/20", NA, "deadline 09/5/19", "9/3"))
# the year runs from September 2019 to August 2020
(db <-
db %>%
mutate(A2 = str_extract(A1, '[\\d\\d/]+'),
A2 = if_else(str_count(A2, '/') == 1 & as.numeric(str_extract(A2, '\\d+')) > 8, paste0(A2, '/19'), A2),
A2 = if_else(str_count(A2, '/') == 1 & as.numeric(str_extract(A2, '\\d+')) <= 8, paste0(A2, '/20'), A2),
A2 = as.Date(A2, "%m/%d/%y")) )
#> A1 A2
#> 1 review 11/18 2019-11-18
#> 2 begins 12/4/19 2019-12-04
#> 3 3/5/20 2020-03-05
#> 4 <NA> <NA>
#> 5 deadline 09/5/19 2019-09-05
#> 6 9/3 2019-09-03
Created on 2021-11-21 by the reprex package (v2.0.1)
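The same rule (months 9 to 12 belong to 2019, months 1 to 8 to 2020) can also be written in base R. This is a minimal sketch on the already extracted strings, assuming the extraction step has produced values like those in column A2; `no_year`, `month` and `suffix` are names picked for this example.

```r
# Extracted date strings, as in column A2 of the question
x <- c("11/18", "12/4/19", "3/5/20", NA, "09/5/19", "9/3")

# TRUE only for entries that are exactly month/day (NA gives FALSE)
no_year <- grepl("^[0-9]{1,2}/[0-9]{1,2}$", x)

# Leading number is the month; September onward means 2019
month <- as.integer(sub("/.*", "", x))
suffix <- ifelse(month >= 9, "/19", "/20")

# Append the inferred year only where it was missing
x[no_year] <- paste0(x[no_year], suffix[no_year])

dates <- as.Date(x, format = "%m/%d/%y")
dates
# [1] "2019-11-18" "2019-12-04" "2020-03-05" NA           "2019-09-05" "2019-09-03"
```

On the display question: the format argument of as.Date only describes how to read the input. A Date always prints as yyyy-mm-dd unless you convert it back to text with format().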
Well, this is neither a beautiful nor a concise nor a tidyverse solution, but it does work and should be flexible in its modularity.
library(tidyverse)
db <- data.frame(A1 = c("review 11/18", "begins 12/4/19", "3/5/20", NA, "deadline 09/5/19", "9/3"))
db <- db %>% mutate(A2 = str_extract(A1, "[0-9/0-9]+"))
# number of "/"-separated pieces per entry (2 means the year is missing)
test1 <- lengths(str_split(db$A2, "/", n = 3))
# month of each entry as a number
test2 <- sapply(str_split(db$A2, "/", n = 3), function(x) as.numeric(x[1]))
# vectorised conditions: if() only accepts a single TRUE/FALSE, not a whole column
no_year <- test1 == 2 & !is.na(db$A2)
# September or later belongs to 2019
db$A2 <- ifelse(test = no_year & test2 >= 9, yes = paste0(db$A2, "/19"), no = db$A2)
# before September belongs to 2020
db$A2 <- ifelse(test = no_year & test2 < 9, yes = paste0(db$A2, "/20"), no = db$A2)
db <- db %>% mutate(A2 = A2 %>% as.Date(., "%m/%d/%y"))
db
A1 A2
1 review 11/18 2019-11-18
2 begins 12/4/19 2019-12-04
3 3/5/20 2020-03-05
4 <NA> <NA>
5 deadline 09/5/19 2019-09-05
6 9/3 2019-09-03
I like the rematch2 package for many regex scenarios.
The first pattern tries to match the full m/d/y values. The second pattern tries to match the partial m/d values (and it separates the month from the day, so that the month can decide whether the year should be 2019 or 2020).
Once those pieces are isolated, the rest is just a sequence of small steps.
db |>
rematch2::bind_re_match(from = A1, "^.*?(?<mdy>\\d{1,2}/\\d{1,2}/\\d{2})$") |>
rematch2::bind_re_match(from = A1, "^.*?(?<md_m>\\d{1,2})/(?<md_d>\\d{1,2})$") |>
dplyr::mutate(
md_m = as.integer(md_m),
md_y = dplyr::if_else(9L <= md_m, "19", "20"), # It's 2019 if the month is Sept or later
md = sprintf("%i/%s/%s", md_m, md_d, md_y), # Assemble components
md = as.Date(md , "%m/%d/%y"), # Convert data type
mdy = as.Date(mdy, "%m/%d/%y"), # Convert data type
date = dplyr::coalesce(mdy, md), # Prefer the mdy if it's not missing
)
Output:
A1 mdy md_m md_d md_y md date
1 review 11/18 <NA> 11 18 19 2019-11-18 2019-11-18
2 begins 12/4/19 2019-12-04 4 19 20 2020-04-19 2019-12-04
3 3/5/20 2020-03-05 5 20 20 2020-05-20 2020-03-05
4 <NA> <NA> NA <NA> <NA> <NA> <NA>
5 deadline 09/5/19 2019-09-05 5 19 20 2020-05-19 2019-09-05
6 9/3 <NA> 9 3 19 2019-09-03 2019-09-03
I have created the RespNum & RespDay variables using the code below (see starting at ______________________)
Now I just need to do the following task: Create a variable called ‘Day’ that is nested by subject and date
Data sample:
ParticipantId DateTime_local RespNum RespDay
<chr> <dttm> <int> <int>
1 1001 2017-10-20 18:42:00 1 1
2 1001 2017-10-20 20:24:00 2 2
3 1001 2017-10-20 23:12:00 3 3
4 1001 2017-10-21 01:23:00 4 1
5 1001 2017-10-21 13:32:00 5 2
6 1001 2017-10-21 15:17:00 6 3
7 1001 2017-10-21 17:32:00 7 4
8 1001 2017-10-21 20:23:00 8 5
9 1001 2017-10-21 22:57:00 9 6
10 1001 2017-10-22 01:54:00 10 1
___________ Code used to create RespNum & RespDay ______________________
data = dataset
# create new variable in correct time zone
data <- data %>%
mutate(DateTime = mdy_hm(DateTime),
DateTime_local = force_tz(DateTime, tzone = "America/New_York"))
# create RespNum
# this variable is the number of responses by subject
data <- data %>%
group_by(ParticipantId) %>%
mutate(RespNum = row_number(DateTime_local)) %>%
ungroup() %>%
arrange(ParticipantId, RespNum, DateTime_local) # arrange data
data %>% select(ParticipantId, DateTime_local, RespNum) #view data
# split date & time into two columns
data$date <- sapply(strsplit(as.character(data$DateTime_local), " "), "[", 1)
data$time <- sapply(strsplit(as.character(data$DateTime_local), " "), "[", 2)
# change date to date format and save as numeric date
(data$date <- ymd(data$date)) #change to date format
class(data$date) #check that it is stored as date
as.numeric(data$date) #save date as numeric
class(data$date) #check that it is still date
# create RespDay variable
# ID = grouping variable
data$ID <- data$ParticipantId
# date = date (not date + time)
# create variable that contains subject ID and date
data$ID_DAY<-paste(data$ID,as.numeric(data$date),sep="")
data <- data %>%
group_by(ID_DAY) %>%
mutate(RespDay = row_number(date)) %>%
ungroup() %>%
arrange(ParticipantId, RespNum, RespDay, DateTime_local) # arrange data
data %>% select(ParticipantId, DateTime_local, RespNum, RespDay) #view data
The ‘Day’ variable should be a series of 1’s for the first day the participant responded, series of 2 for the 2nd day the participant responded, etc.
So using the subset of data example above:
ParticipantId DateTime_local RespNum RespDay Day
<chr> <dttm> <int> <int> <int>
1 1001 2017-10-20 18:42:00 1 1 1
2 1001 2017-10-20 20:24:00 2 2 1
3 1001 2017-10-20 23:12:00 3 3 1
4 1001 2017-10-21 01:23:00 4 1 2
5 1001 2017-10-21 13:32:00 5 2 2
6 1001 2017-10-21 15:17:00 6 3 2
7 1001 2017-10-21 17:32:00 7 4 2
8 1001 2017-10-21 20:23:00 8 5 2
9 1001 2017-10-21 22:57:00 9 6 2
10 1001 2017-10-22 01:54:00 10 1 3
Thank you!
Using the tidyverse and lubridate package, this works!
library(tidyverse)
library(lubridate)
##data = data name
## ParticipantId = unique subject ID
## expday = new variable created
data <- data %>%
group_by(ParticipantId) %>%
mutate(
DateTime = mdy_hm(DateTime),
Date = lubridate::date(DateTime),
expday = dense_rank(Date)) %>%
ungroup() %>%
arrange(ParticipantId, DateTime, expday) # arrange data
data %>% select(ParticipantId, DateTime, expday) #view data
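If it helps to see what dense_rank() is doing here, the same numbering can be sketched in base R: within each participant, a row's day number is the position of its calendar date among that participant's sorted unique dates. The rows for participant 1002 are made up for illustration, and `resp` is a name used only in this sketch.

```r
resp <- data.frame(
  ParticipantId = c("1001", "1001", "1001", "1001", "1002", "1002"),
  DateTime_local = as.POSIXct(
    c("2017-10-20 18:42:00", "2017-10-20 20:24:00",
      "2017-10-21 01:23:00", "2017-10-22 01:54:00",
      "2017-11-01 09:00:00", "2017-11-03 09:00:00"),  # last two rows are hypothetical
    tz = "America/New_York"
  )
)

# Calendar date in the local time zone (not UTC)
resp$date <- as.Date(resp$DateTime_local, tz = "America/New_York")

# Per participant: rank of each date among that participant's distinct dates
resp$Day <- ave(
  as.numeric(resp$date), resp$ParticipantId,
  FUN = function(d) match(d, sort(unique(d)))
)
resp$Day
# [1] 1 1 2 3 1 2
```

Passing tz to as.Date matters for late-evening responses: without it, the 20:24 response on 2017-10-20 would be converted via UTC and land on the next day.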
I have the following dataset with three columns containing dates.
library(dplyr)
set.seed(45)
df1 <- data.frame(hire_date = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="week"), 10),
t1 = sample(seq(as.Date('2000/01/01'), as.Date('2001/01/01'), by="week"), 10),
t2 = sample(seq(as.Date('2000/01/01'), as.Date('2001/01/01'), by="day"), 10))
#this value is actually unknown
df1[10,2] <- NA
hire_date t1 t2
1 1999-08-20 2000-05-13 2000-02-17
2 1999-04-23 2000-11-11 2000-04-27
3 1999-03-26 2000-04-15 2000-08-01
4 1999-05-07 2000-06-03 2000-08-29
5 1999-04-30 2000-05-27 2000-11-19
6 1999-04-09 2000-12-30 2000-01-26
7 1999-03-12 2000-12-23 2000-12-07
8 1999-06-25 2000-02-12 2000-09-26
9 1999-02-26 2000-05-06 2000-08-23
10 1999-01-01 <NA> 2000-03-18
I'd like to perform an if else statement such that df1$com is 1 if the difference in days between either t1 or t2 and hire_date falls within [395, 500].
The following if_else statement almost gets me there, but the NA mucks it up. Any ideas?
df1$com <- if_else((df1$t1 - df1$hire_date) >= 395 &
(df1$t1 - df1$hire_date) <= 500, 1,
if_else((df1$t2 - df1$hire_date) >= 395 &
(df1$t2 - df1$hire_date) <= 500, 1, 0))
You could use dplyr::case_when instead of nesting the if_else statements. It will give you easy control over how to treat NA. And dplyr::between will clean things up as well for your date comparisons.
df1 %>%
mutate(com = case_when(
is.na(t1) | is.na(t2) ~ 999, # or however you want to treat NA cases
between(t1 - hire_date, 395, 500) ~ 1,
between(t2 - hire_date, 395, 500) ~ 1,
TRUE ~ 0 # neither range is between 395 and 500
))
#> hire_date t1 t2 com
#> 1 1999-08-20 2000-05-13 2000-02-17 0
#> 2 1999-04-23 2000-11-11 2000-04-27 0
#> 3 1999-03-26 2000-04-15 2000-08-01 1
#> 4 1999-05-07 2000-06-03 2000-08-29 1
#> 5 1999-04-30 2000-05-27 2000-11-19 0
#> 6 1999-04-09 2000-12-30 2000-01-26 0
#> 7 1999-03-12 2000-12-23 2000-12-07 0
#> 8 1999-06-25 2000-02-12 2000-09-26 1
#> 9 1999-02-26 2000-05-06 2000-08-23 1
#> 10 1999-01-01 <NA> 2000-03-18 999
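If you would rather treat a missing date as "not in range" instead of flagging the whole row, a base R sketch could look like the following. This is a different NA policy from the 999 code above, and `in_window` is a hypothetical helper; the three rows are rows 1, 3 and 10 of the question's data.

```r
df1 <- data.frame(
  hire_date = as.Date(c("1999-08-20", "1999-03-26", "1999-01-01")),
  t1        = as.Date(c("2000-05-13", "2000-04-15", NA)),
  t2        = as.Date(c("2000-02-17", "2000-08-01", "2000-03-18"))
)

# TRUE when later - earlier is within [lo, hi] days; a missing date counts as FALSE
in_window <- function(later, earlier, lo = 395, hi = 500) {
  d <- as.numeric(difftime(later, earlier, units = "days"))
  !is.na(d) & d >= lo & d <= hi
}

df1$com <- as.integer(
  in_window(df1$t1, df1$hire_date) | in_window(df1$t2, df1$hire_date)
)
df1$com
# [1] 0 1 1
```

Under this policy row 10 of the original data gets a 1, because t2 falls 442 days after hire_date even though t1 is unknown. Whether that is what you want depends on how the unknown t1 should be interpreted.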