Let's say I have two dataframes like the ones below:
df1 = structure(list(Date = c("2000-01-05", "2000-02-03", "2000-03-02",
"2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06",
"2000-09-14", "2000-10-19", "2000-11-02", "2000-12-14", "2001-02-01",
"2001-03-01", "2001-04-11", "2001-05-10", "2001-06-07", "2001-06-21",
"2001-07-05", "2001-08-30", "2001-10-11", "2001-11-08", "2001-12-06"
)), row.names = c(NA, 23L), class = "data.frame")
Date
1 2000-01-05
2 2000-02-03
3 2000-03-02
4 2000-03-30
5 2000-04-13
6 2000-05-11
7 2000-06-08
8 2000-07-06
9 2000-09-14
10 2000-10-19
11 2000-11-02
12 2000-12-14
13 2001-02-01
14 2001-03-01
15 2001-04-11
16 2001-05-10
17 2001-06-07
18 2001-06-21
19 2001-07-05
20 2001-08-30
21 2001-10-11
22 2001-11-08
23 2001-12-06
df2 = structure(list(Date = structure(c(10987, 11016, 11047, 11077,
11108, 11138, 11169, 11200, 11230, 11261, 11291, 11322, 11353,
11381, 11412, 11442, 11473, 11503, 11534, 11565, 11595, 11626,
11656, 11687), class = "Date"), x = c(3.04285714285714, 3.27571428571429,
3.5104347826087, 3.685, 3.92, 4.29454545454545, 4.30857142857143,
4.41913043478261, 4.59047619047619, 4.76272727272727, 4.82909090909091,
4.82684210526316, 4.75590909090909, 4.9925, 4.78136363636364,
5.06421052631579, 4.65363636363636, 4.53952380952381, 4.50545454545454,
4.49130434782609, 3.9865, 3.97130434782609, 3.50727272727273,
3.33888888888889)), row.names = c(NA, 24L), class = "data.frame")
Date x
1 2000-01-31 3.042857
2 2000-02-29 3.275714
3 2000-03-31 3.510435
4 2000-04-30 3.685000
5 2000-05-31 3.920000
6 2000-06-30 4.294545
7 2000-07-31 4.308571
8 2000-08-31 4.419130
9 2000-09-30 4.590476
10 2000-10-31 4.762727
11 2000-11-30 4.829091
12 2000-12-31 4.826842
13 2001-01-31 4.755909
14 2001-02-28 4.992500
15 2001-03-31 4.781364
16 2001-04-30 5.064211
17 2001-05-31 4.653636
18 2001-06-30 4.539524
19 2001-07-31 4.505455
20 2001-08-31 4.491304
21 2001-09-30 3.986500
22 2001-10-31 3.971304
23 2001-11-30 3.507273
24 2001-12-31 3.338889
Now, what I would like to do is to create a real-time dataframe, that is, one that uses only the data in df2 that were already available at each date in df1. For instance, at 2000-01-05 (first row of df1) no data in df2 were available, since 2000-01-31 (first row of df2) occurs after 2000-01-05. However, at 2000-02-03 (second row of df1) the observation from 2000-01-31 (first row of df2) is available. The same reasoning applies to every row. The outcome should look like this:
Date y
1 2000-01-05 NA
2 2000-02-03 3.042857
3 2000-03-02 3.275714
4 2000-03-30 3.275714
5 2000-04-13 3.510435
6 2000-05-11 3.685000
....
The rule would be: for each date in df1, pick from df2 only the most recent observation that was already available at that time.
Can anyone help me?
Thanks!
What you can do is complete the df2 dates and then join.
library(dplyr)
library(tidyr)
# create a dataframe with all the days, not just the snapshots
df2_complete <- df2 %>%
  complete(Date = seq.Date(min(Date), max(Date), by = "day")) %>%
  fill(x, .direction = "down")
# convert to Date class for this case and join
df1 %>%
  mutate(Date = as.Date(Date)) %>%
  left_join(df2_complete, by = "Date")
Which gives:
Date x
1 2000-01-05 NA
2 2000-02-03 3.042857
3 2000-03-02 3.275714
4 2000-03-30 3.275714
5 2000-04-13 3.510435
6 2000-05-11 3.685000
....
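If you would rather not build a daily calendar, here is a base R sketch of the same "last available observation" lookup using findInterval(); this is just an alternative I'm suggesting, assuming df1$Date is converted to Date class, and it should reproduce the column shown above.
# for each df1 date, count how many df2 dates fall on or before it;
# a count of 0 means nothing was available yet, which maps to NA
idx <- findInterval(as.Date(df1$Date), df2$Date)
df1$y <- c(NA, df2$x)[idx + 1]
head(df1)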
I have a database where the animals in a herd are tested every 6 months (the number of animals can change over time). The issue is that the animals in a herd are not all tested on the same day, but within a period of about 2 months.
I would like to know how I can create a new column that merges all these close dates (grouping by herd), so I can count the number of times a herd has been tested.
As an example, one herd in my data has been tested 8 times, but each testing round spans several different dates.
Here is an example of the data:
df <- data.frame(
animal = c("Animal1", "Animal2", "Animal3", "Animal4", "Animal5", "Animal6", "Animal1", "Animal2", "Animal3", "Animal4", "Animal5", "Animal6", "Animal7", "Animal8", "Animal9", "Animal10", "Animal11", "Animal12", "Animal7", "Animal8", "Animal9", "Animal10", "Animal11", "Animal12"),
herd = c("Herd1","Herd1","Herd1", "Herd1","Herd1","Herd1", "Herd1","Herd1","Herd1", "Herd1","Herd1","Herd1","Herd2","Herd2", "Herd2","Herd2","Herd2","Herd2", "Herd2","Herd2", "Herd2","Herd2","Herd2","Herd2"),
date = c("2017-01-01", "2017-01-01", "2017-01-17","2017-02-04", "2017-02-04", "2017-02-05", "2017-06-01" , "2017-06-03", "2017-07-01", "2017-06-21", "2017-06-01", "2017-06-15", "2017-02-01", "2017-02-01", "2017-02-15", "2017-02-21", "2017-03-05", "2017-03-01", "2017-07-01", "2017-07-01", "2017-07-15", "2017-07-21", "2017-08-05", "2017-08-01"))
So the desired outcome will be:
animal herd date testing
1 Animal1 Herd1 2017-01-01 1
2 Animal2 Herd1 2017-01-01 1
3 Animal3 Herd1 2017-01-17 1
4 Animal4 Herd1 2017-02-04 1
5 Animal5 Herd1 2017-02-04 1
6 Animal6 Herd1 2017-02-05 1
7 Animal1 Herd1 2017-06-01 2
8 Animal2 Herd1 2017-06-03 2
9 Animal3 Herd1 2017-07-01 2
10 Animal4 Herd1 2017-06-21 2
11 Animal5 Herd1 2017-06-01 2
12 Animal6 Herd1 2017-06-15 2
13 Animal7 Herd2 2017-02-01 1
14 Animal8 Herd2 2017-02-01 1
15 Animal9 Herd2 2017-02-15 1
16 Animal10 Herd2 2017-02-21 1
17 Animal11 Herd2 2017-03-05 1
18 Animal12 Herd2 2017-03-01 1
19 Animal7 Herd2 2017-07-01 2
20 Animal8 Herd2 2017-07-01 2
21 Animal9 Herd2 2017-07-15 2
22 Animal10 Herd2 2017-07-21 2
23 Animal11 Herd2 2017-08-05 2
24 Animal12 Herd2 2017-08-01 2
I would like to apply something like this, but with dates that are close to each other counted as the same testing:
df %>%
  group_by(herd) %>%
  mutate(testing = dense_rank(date))
Thanks!
You can bin the dates into 5-month periods with floor_date() and apply dense_rank. Since the smallest gap between two test dates of the same animal (across testing rounds) is 5 months, the unit has to be 5 months.
library(dplyr)
library(lubridate)
df %>%
  group_by(testing = dense_rank(floor_date(ymd(date), unit = "5 months")))
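If the testing rounds do not line up with fixed calendar bins, a more general sketch (my suggestion, not part of the answer above, with dplyr already loaded) is to start a new round whenever the gap since the previous test in the same herd exceeds a threshold; the 60-day cutoff below is an assumption based on the stated 2-month testing window, and note that this sorts the rows by herd and date.
df %>%
  mutate(date = as.Date(date)) %>%
  arrange(herd, date) %>%
  group_by(herd) %>%
  # a new testing round starts when the gap since the previous test
  # in the same herd is larger than ~60 days
  mutate(testing = cumsum(c(TRUE, diff(date) > 60))) %>%
  ungroup()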
I have a dataframe that looks like this:
   CYCLE date_cycle Randomization_Date COUPLEID
1      0       <NA>         2016-02-16    10892
2      1 2016-08-17         2016-02-19    10894
3      1 2016-08-14         2016-02-26    10899
4      1       <NA>         2016-02-26    10900
5      2 2016-03---         2016-02-26    10900
6      3 2016-07-19         2016-02-26    10900
7      4 2016-11-15         2016-02-26    10900
8      1       <NA>         2016-02-27    10901
9      2 2016-02---         2016-02-27    10901
10     1 2016-03-27         2016-03-03    10902
11     2 2016-04-21         2016-03-03    10902
12     1       <NA>         2016-03-03    10903
13     2 2016-03---         2016-03-03    10903
14     0       <NA>         2016-03-03    10904
15     1       <NA>         2016-03-03    10905
16     2       <NA>         2016-03-03    10905
17     3       <NA>         2016-03-03    10905
18     4 2016-04-14         2016-03-03    10905
19     5 2016-05---         2016-03-03    10905
20     6 2016-06---         2016-03-03    10905
The goal is to fill in the missing dates for a given ID by taking an earlier or later date for that ID and adding/subtracting 28 days from it.
The date_cycle variable was originally in the dataframe as a character type.
I have tried to code it as follows:
mutate(rowwise(df),
       newdate = case_when(str_count(date1, pattern = "\\W") > 2 ~
                             lag(as.Date.character(date1, "%Y-%m-%d"), 1) + days(28)))
But I need to incorporate it by ID and by CYCLE.
An example of my data could be made like this:
data.frame(
  stringsAsFactors = FALSE,
  CYCLE = c(0, 1, 1, 1, 2, 3, 4, 1, 2, 1, 2, 1, 2, 0, 1, 2, 3, 4, 5, 6),
  date_cycle = c(NA, "2016-08-17", "2016-08-14", NA, "2016-03---", "2016-07-19",
                 "2016-11-15", NA, "2016-02---", "2016-03-27", "2016-04-21", NA,
                 "2016-03---", NA, NA, NA, NA, "2016-04-14", "2016-05---", "2016-06---"),
  Randomization_Date = c("2016-02-16", "2016-02-19", "2016-02-26", "2016-02-26",
                         "2016-02-26", "2016-02-26", "2016-02-26", "2016-02-27",
                         "2016-02-27", "2016-03-03", "2016-03-03", "2016-03-03",
                         "2016-03-03", "2016-03-03", "2016-03-03", "2016-03-03",
                         "2016-03-03", "2016-03-03", "2016-03-03", "2016-03-03"),
  COUPLEID = c(10892, 10894, 10899, 10900, 10900, 10900, 10900, 10901, 10901,
               10902, 10902, 10903, 10903, 10904, 10905, 10905, 10905, 10905,
               10905, 10905)
)
The output I am after would look like:
COUPLEID CYCLE date_cycle new_date_cycle
a 1 2014-03-27 2014-03-27
a 1 2014-04--- 2014-04-24
b 1 2014-03-24 2014-03-24
b 1 2014-04--- 2014-04-21
b 3 2014-05--- 2014-05-19
c 1 2014-04--- 2014-04-02
c 2 2014-04-30 2014-04-30
I have also started writing a long conditional, but I wanted to ask here to see if anyone knew of a more straightforward way to do it, instead of explicitly writing out all of the possible conditions.
mutate(rowwise(df),
       newdate = case_when(
         grp == 1 & str_count(date1, pattern = "\\W") > 2 & !is.na(lead(date1, 1)) ~ lead(date1, 1) - days(28),
         grp == 2 & str_count(date1, pattern = "\\W") > 2 & !is.na(lead(date1, 1)) ~ lead(date1, 1) - days(28),
         grp == 3 & str_count(date1, pattern = "\\W") > 2 & ...))
A function to fill the dates forward and backward:
filldates <- function(dates) {
  m <- which(!is.na(dates))
  # only act if there is at least one known date and at least one NA
  if (length(m) > 0 && length(m) != length(dates)) {
    # fill backwards from the first known date, subtracting 28 days per step
    if (m[1] > 1) for (i in seq(m[1], 1, -1)) if (is.na(dates[i])) dates[i] <- dates[i + 1] - 28
    # then fill any remaining NAs forwards, adding 28 days per step
    if (sum(is.na(dates)) > 0) for (i in seq_along(dates)) if (is.na(dates[i])) dates[i] <- dates[i - 1] + 28
  }
  return(dates)
}
Usage (with dplyr loaded):
library(dplyr)

data %>%
  arrange(ID, grp) %>%
  group_by(ID) %>%
  mutate(date2 = filldates(as.Date(date1, "%Y-%m-%d")))
Output:
ID grp date1 date2
<chr> <dbl> <chr> <date>
1 a 1 2014-03-27 2014-03-27
2 a 2 2014-04--- 2014-04-24
3 b 1 2014-03-24 2014-03-24
4 b 2 2014-04--- 2014-04-21
5 b 3 2014-05--- 2014-05-19
6 c 1 2014-03--- 2014-04-02
7 c 2 2014-04-30 2014-04-30
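To run this on the reproducible example from the question (assuming its data.frame() call is assigned to df; the grouping column is then COUPLEID, the ordering column is CYCLE, and the partial dates such as "2016-03---" parse to NA and therefore get filled), a sketch would be:
df %>%
  arrange(COUPLEID, CYCLE) %>%
  group_by(COUPLEID) %>%
  mutate(new_date_cycle = filldates(as.Date(date_cycle, "%Y-%m-%d"))) %>%
  ungroup()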
An option using purrr::accumulate().
library(tidyverse)
library(lubridate) # for ymd(), days(), as_date()

center <- df %>%
  group_by(ID) %>%
  mutate(helpDate = ymd(str_replace(date1, '---', '-01')),
         refDate = max(ymd(date1), na.rm = TRUE))

backward <- center %>%
  filter(refDate == max(helpDate)) %>%
  mutate(date2 = accumulate(refDate, ~ . - days(28), .dir = 'backward'))

forward <- center %>%
  filter(refDate == min(helpDate)) %>%
  mutate(date2 = accumulate(refDate, ~ . + days(28)))

bind_rows(forward, backward) %>%
  ungroup() %>%
  mutate(date2 = as_date(date2)) %>%
  select(-c('helpDate', 'refDate'))
# # A tibble: 7 x 4
# ID grp date1 date2
# <chr> <int> <chr> <date>
# 1 a 1 2014-03-27 2014-03-27
# 2 a 2 2014-04--- 2014-04-24
# 3 b 1 2014-03-24 2014-03-24
# 4 b 2 2014-04--- 2014-04-21
# 5 b 3 2014-05--- 2014-05-19
# 6 c 1 2014-03--- 2014-04-02
# 7 c 2 2014-04-30 2014-04-30
I would like to calculate mean every 5 rows in my df. Here is my df :
Time                  value
03/06/2021 06:15:00      NA
03/06/2021 06:16:00      NA
03/06/2021 06:17:00      20
03/06/2021 06:18:00      22
03/06/2021 06:19:00      25
03/06/2021 06:20:00      NA
03/06/2021 06:21:00      31
03/06/2021 06:22:00      23
03/06/2021 06:23:00      19
03/06/2021 06:24:00      25
03/06/2021 06:25:00      34
03/06/2021 06:26:00      42
03/06/2021 06:27:00      NA
03/06/2021 06:28:00      19
03/06/2021 06:29:00      17
03/06/2021 06:30:00      25
I already have a loop that works well for calculating the mean of each block of 5 rows. My problem is with the mean() function itself.
The problem is:
- if I put na.rm = FALSE, the mean is NA as soon as there is an NA in a block of 5 values;
- if I put na.rm = TRUE, the result gives me averages that are shifted so that each one still takes 5 non-missing values. I would like the NA not to interfere with the average: when there is an NA in a block of 5 values, the average should be computed on the remaining 4 values only.
How can I do this? Thanks for your help!
You can solve your problem by introducing a dummy variable that groups your observations into sets of five and then calculating the mean within each group. Here's an MWE, based on the tidyverse, that assumes your data is in a data.frame named df.
library(tidyverse)
df %>%
  mutate(Group = 1 + floor((row_number() - 1) / 5)) %>%
  group_by(Group) %>%
  summarise(Mean = mean(value, na.rm = TRUE), .groups = "drop")
# A tibble: 4 × 2
Group Mean
<dbl> <dbl>
1 1 22.3
2 2 24.5
3 3 28
4 4 25
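For comparison, a base R sketch of the same block-of-five grouping (assuming df is the 16-row data frame above), which reproduces the four group means:
# gl() builds the grouping factor 1,1,1,1,1,2,... truncated to nrow(df) elements
grp <- gl(ceiling(nrow(df) / 5), 5, length = nrow(df))
tapply(df$value, grp, mean, na.rm = TRUE)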
A solution based on purrr::map_dfr:
library(purrr)
df <- data.frame(
stringsAsFactors = FALSE,
time = c("03/06/2021 06:15:00","03/06/2021 06:16:00",
"03/06/2021 06:17:00",
"03/06/2021 06:18:00","03/06/2021 06:19:00",
"03/06/2021 06:20:00","03/06/2021 06:21:00",
"03/06/2021 06:22:00","03/06/2021 06:23:00",
"03/06/2021 06:24:00","03/06/2021 06:25:00",
"03/06/2021 06:26:00",
"03/06/2021 06:27:00","03/06/2021 06:28:00",
"03/06/2021 06:29:00","03/06/2021 06:30:00"),
value = c(NA,NA,20L,22L,
25L,NA,31L,23L,19L,25L,34L,42L,NA,19L,17L,
25L)
)
map_dfr(seq(1, nrow(df), by = 5),
        ~ data.frame(Group = ceiling(.x / 5),
                     Mean = mean(df$value[.x:min(.x + 4, nrow(df))], na.rm = TRUE)))
#>   Group     Mean
#> 1     1 22.33333
#> 2     2 24.50000
#> 3     3 28.00000
#> 4     4 25.00000
If you want to take the average of every 5 minutes, you may use lubridate's floor_date/ceiling_date functions to round the time.
library(dplyr)
library(lubridate)
df %>%
  mutate(time = mdy_hms(time),
         time = floor_date(time, '5 mins')) %>%
  group_by(time) %>%
  summarise(value = mean(value, na.rm = TRUE))
# time value
# <dttm> <dbl>
#1 2021-03-06 06:15:00 22.3
#2 2021-03-06 06:20:00 24.5
#3 2021-03-06 06:25:00 28
#4 2021-03-06 06:30:00 25
Let's suppose I have two dataframes that look like this:
df1 = structure(list(X1 = c(0.659588465514883, 0.47368422669833, -0.0422047052887636,
-1.75642936005977, 0.339813114272074, 1.09341750942405, 0.327672990051479,
-0.893507823167616, -0.661285321563594, -0.569673784617002, -0.983369868281376,
-2.53659592825309, 0.396220995581641, -1.1994504350227, -0.553343957714012,
1.30884516680972, -0.120561033997931, 0.971506981390537, 0.815610612704566,
1.53103368033727, -0.808956975392184, -1.27332589061096, -1.89082047917723,
0.249755375966669, -0.704051599213331), X2 = c(0.659588465514883,
0.47368422669833, -0.0422047052887636, -1.75642936005977, 0.339813114272074,
1.09341750942405, 0.327672990051479, -0.893507823167616, -0.661285321563594,
-0.569673784617002, -0.983369868281376, -2.53659592825309, 0.396220995581641,
-1.1994504350227, -0.553343957714012, 1.30884516680972, -0.120561033997931,
0.971506981390537, 0.815610612704566, 1.53103368033727, -0.808956975392184,
-1.27332589061096, -1.89082047917723, 0.249755375966669, -0.704051599213331
), Date = structure(c(10957,
10988, 11017, 11048, 11078, 11109, 11139, 11170, 11201, 11231,
11262, 11292, 11323, 11354, 11382, 11413, 11443, 11474, 11504,
11535, 11566, 11596, 11627, 11657, 11688), class = "Date")), class = "data.frame", row.names = c(NA,
-25L))
X1 X2
1 -1.633636896 -1.633636896
2 1.793766808 1.793766808
3 0.440697771 0.440697771
4 0.330091148 0.330091148
5 -1.234246285 -1.234246285
6 0.044951993 0.044951993
7 -2.831295687 -2.831295687
8 -0.735371579 -0.735371579
9 -0.412580789 -0.412580789
10 0.001848622 0.001848622
11 1.480684731 1.480684731
12 -1.088999830 -1.088999830
13 -0.465903929 -0.465903929
14 -0.010743010 -0.010743010
15 1.420995930 1.420995930
16 -0.789190729 -0.789190729
17 -0.750476176 -0.750476176
18 -0.314079067 -0.314079067
19 -0.324779959 -0.324779959
20 -1.192471909 -1.192471909
21 -0.170325813 -0.170325813
22 0.890941125 0.890941125
23 0.863875448 0.863875448
24 -0.088048086 -0.088048086
25 0.021239226 0.021239226
Date
1 2000-01-01
2 2000-02-01
3 2000-03-01
4 2000-04-01
5 2000-05-01
6 2000-06-01
7 2000-07-01
8 2000-08-01
9 2000-09-01
10 2000-10-01
11 2000-11-01
12 2000-12-01
13 2001-01-01
14 2001-02-01
15 2001-03-01
16 2001-04-01
17 2001-05-01
18 2001-06-01
19 2001-07-01
20 2001-08-01
21 2001-09-01
22 2001-10-01
23 2001-11-01
24 2001-12-01
25 2002-01-01
df2 = structure(list(X1 = c(-0.0712460200169048, 1.0131741924359, 0.28590272354409,
-0.835911047943257, -0.146890264431744), X2 = c(-0.0712460200169048,
1.0131741924359, 0.28590272354409, -0.835911047943257, -0.146890264431744
), Date = structure(c(10984, 11120, 11441, 11488, 11712), class = "Date")), class = "data.frame", row.names = c(NA,
-5L))
X1 X2 Date
1 0.03815189 0.03815189 2000-01-28
2 -0.22665838 -0.22665838 2000-06-12
3 0.36459588 0.36459588 2001-04-29
4 0.32772746 0.32772746 2001-06-15
5 -1.22891784 -1.22891784 2002-01-25
What I would like to do is to reduce the number of rows in df1 so that it ends up with the same number of rows as df2. In particular, I would like to remove the rows of df1 whose Date (by year and month) is not present in the Date column of df2. It is easier to show the output I would like to get:
# DF1 shall become like this (n stays for the numbers corresponding to each date row):
X1 X2 Date
1 n n 2000-01-01
2 n n 2000-06-01
3 n n 2001-04-01
4 n n 2001-06-01
5 n n 2002-01-01
# it is not really important which day is displayed in the final output; what matters is just the year and month
I tried to use semi_join, but the problem is that the differing days prevent the function from matching what I need. Ideally, I would need to ignore the days and match by year and month only.
This is what I tried:
library(dplyr)
semi_join(df1, df2, by = "Date")
[1] X1 X2 Date
<0 rows> (or 0-length row.names)
Can anyone help me?
Thanks!
Using the great suggestion from @arg0naut91, here is a possible solution in base R. First format the Date variables, and then you can use %in% to check which dates are present. Here is the code using your df1 and df2:
#Format dates
df1$I1 <- format(df1$Date,'%Y-%m')
df2$I2 <- format(df2$Date,'%Y-%m')
Now this does the comparison:
df1[df1$I1 %in% df2$I2,]
Output:
X1 X2 Date I1
1 0.6595885 0.6595885 2000-01-01 2000-01
6 1.0934175 1.0934175 2000-06-01 2000-06
16 1.3088452 1.3088452 2001-04-01 2001-04
18 0.9715070 0.9715070 2001-06-01 2001-06
25 -0.7040516 -0.7040516 2002-01-01 2002-01
In the end you could assign that result to a new dataframe and remove I1.
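If you prefer to stay in dplyr, a possible sketch (my addition, not part of the base R answer above) is to build the same year-month key on both sides and semi_join on it:
library(dplyr)

df1 %>%
  mutate(ym = format(Date, "%Y-%m")) %>%
  semi_join(mutate(df2, ym = format(Date, "%Y-%m")), by = "ym") %>%
  select(-ym)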
By default I set the argument cut.points to NA, and when it is left at the default the function shouldn't do anything to the data.
But if the user passes, for example, cut.points = c("2012-01-01", "2013-01-01"), then the data should be filtered by the column that contains dates, returning only the dates between 2012-01-01 and 2013-01-01.
The problem is that I'm reading the data inside the function, so in theory I won't know the name of the date column that the user provides. So I find the column with dates and store its name in a variable.
But the condition I wrote to filter based on this variable doesn't work:
modifier <- function(input.data, cut.points = c(NA, NA)) {
  date_check <- sapply(input.data, function(x) !all(is.na(as.Date(as.character(x), format = "%Y-%m-%d"))))
  if (missing(cut.points)) {
    input.data
  } else {
    cols <- colnames(select_if(input.data, date_check == TRUE))
    cut.points <- as.Date(cut.points)
    input.data <- filter(input.data, cols > cut.points[1] & cols < cut.points[2])
  }
}
For example, when I try to run this:
modifier(ex_data, cut.points = c("2012-01-01", "2013-01-01"))
on a sample like this:
ex_data
Row.ID Order.ID Order.Date
1 32298 CA-2012-124891 2012-07-31
2 26341 IN-2013-77878 2013-02-05
3 25330 IN-2013-71249 2013-10-17
4 13524 ES-2013-1579342 2013-01-28
5 47221 SG-2013-4320 2013-11-05
6 22732 IN-2013-42360 2013-06-28
7 30570 IN-2011-81826 2011-11-07
8 31192 IN-2012-86369 2012-04-14
9 40155 CA-2014-135909 2014-10-14
10 40936 CA-2012-116638 2012-01-28
11 34577 CA-2011-102988 2011-04-05
12 28879 ID-2012-28402 2012-04-19
13 45794 SA-2011-1830 2011-12-27
14 4132 MX-2012-130015 2012-11-13
15 27704 IN-2013-73951 2013-06-06
16 13779 ES-2014-5099955 2014-07-31
17 36178 CA-2014-143567 2014-11-03
18 12069 ES-2014-1651774 2014-09-08
19 22096 IN-2014-11763 2014-01-31
20 49463 TZ-2014-8190 2014-12-05
the error is:
character string is not in a standard unambiguous format
I've added lubridate as a dependency so I could get access to %within% and is.Date(). I've also changed the check condition, because I don't think your original one would work with c(NA, NA).
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
ex_data <- read_table(" Row.ID Order.ID Order.Date
1 32298 CA-2012-124891 2012-07-31
2 26341 IN-2013-77878 2013-02-05
3 25330 IN-2013-71249 2013-10-17
4 13524 ES-2013-1579342 2013-01-28
5 47221 SG-2013-4320 2013-11-05
6 22732 IN-2013-42360 2013-06-28
7 30570 IN-2011-81826 2011-11-07
8 31192 IN-2012-86369 2012-04-14
9 40155 CA-2014-135909 2014-10-14
10 40936 CA-2012-116638 2012-01-28
11 34577 CA-2011-102988 2011-04-05
12 28879 ID-2012-28402 2012-04-19
13 45794 SA-2011-1830 2011-12-27
14 4132 MX-2012-130015 2012-11-13
15 27704 IN-2013-73951 2013-06-06
16 13779 ES-2014-5099955 2014-07-31
17 36178 CA-2014-143567 2014-11-03
18 12069 ES-2014-1651774 2014-09-08
19 22096 IN-2014-11763 2014-01-31
20 49463 TZ-2014-8190 2014-12-05")
#> Warning: Missing column names filled in: 'X1' [1]
modifier <- function(input.data, cut.points = NULL) {
  if (length(cut.points) == 2) {
    date_col <- colnames(input.data)[sapply(input.data, is.Date)]
    filtered.data <- input.data %>%
      rename(Date = !!date_col) %>%
      filter(Date %within% interval(cut.points[1], cut.points[2])) %>%
      rename_with(~ date_col, Date)
    return(filtered.data)
  } else {
    input.data
  }
}
modifier(ex_data, cut.points = c("2012-01-01", "2013-01-01"))
#> # A tibble: 5 x 4
#> X1 Row.ID Order.ID Order.Date
#> <dbl> <dbl> <chr> <date>
#> 1 1 32298 CA-2012-124891 2012-07-31
#> 2 8 31192 IN-2012-86369 2012-04-14
#> 3 10 40936 CA-2012-116638 2012-01-28
#> 4 12 28879 ID-2012-28402 2012-04-19
#> 5 14 4132 MX-2012-130015 2012-11-13
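A quick check of the default path (my addition): with cut.points left as NULL, the length test fails and the data is returned untouched.
identical(modifier(ex_data), ex_data)
#> [1] TRUE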