Create time axis for longitudinal data; calculations with data variables - r

I've got the following sample data frame. The data is in long format (longitudinal data). col1 indicates the person ID (for this sample we only have 2 people). col2 indicates the occurrence of a life event (e.g. 0 = not married, 1 = married); the status change from 0 to 1 marks the life event. col3 is 1 for each measurement occasion after the event and 0 for each measurement occasion prior to the event. The year indicates the year of assessment. The month indicates the month of assessment (02 = February).
col1 col2 col3 year month
row.name11 A 0 0 2013 02
row.name12 A 0 0 2014 02
row.name13 A 1 1 2015 02
row.name14 A 0 1 2016 02
row.name15 A 0 1 2018 02
row.name16 B 0 0 2014 02
row.name17 B 0 0 2015 02
row.name18 B 1 1 2016 02
row.name19 B 0 1 2017 04
I now wish to create an event-centered timeline. The new variable should be 0 when the event takes place (col2 == 1). It should be negative prior to the event (indicating the number of months until the event occurs) and positive after the event (indicating the number of months since the event occurred).
It should look like this (see event.time variable):
col1 col2 col3 year month event.time
row.name11 A 0 0 2013 02 -24
row.name12 A 0 0 2014 02 -12
row.name13 A 1 1 2015 02 0
row.name14 A 0 1 2016 02 12
row.name15 A 0 1 2018 02 36
row.name16 B 0 0 2014 02 -24
row.name17 B 0 0 2015 02 -12
row.name18 B 1 1 2016 02 0
row.name19 B 0 1 2017 04 14
I figured out that I should transform my year and month variables into date variables (using the as.Date() function) first. However, I wasn't successful. How could I efficiently calculate the event.time variable afterwards? Maybe using col3, because this variable indicates whether it is prior to or after the event?
I'm more than happy to receive any advice you may have! Thanks in advance :)

# If nchar(month) is 1, pad the month with a leading 0, otherwise use it directly.
# Day 1 is appended so the string can be parsed by as.Date().
df$date <- paste0(df$year, '-', ifelse(nchar(df$month) == 1, paste0(0, df$month), df$month), '-1')
df$date <- as.Date(df$date)
library(dplyr)
df %>% group_by(col1) %>%
  # Get the minimum date where col2 == 1, in case there is more than one 1 for the same ID
  mutate(date_used = min(date[col2 == 1]), event.time = as.numeric(date - date_used))
# A tibble: 9 x 8
# Groups: col1 [2]
col1 col2 col3 year month date date_used event.time
<fct> <int> <int> <int> <int> <date> <date> <dbl>
1 A 0 0 2013 2 2013-02-01 2015-02-01 -730
2 A 0 0 2014 2 2014-02-01 2015-02-01 -365
3 A 1 1 2015 2 2015-02-01 2015-02-01 0
4 A 0 1 2016 2 2016-02-01 2015-02-01 365
5 A 0 1 2018 2 2018-02-01 2015-02-01 1096
6 B 0 0 2014 2 2014-02-01 2016-02-01 -730
7 B 0 0 2015 2 2015-02-01 2016-02-01 -365
8 B 1 1 2016 2 2016-02-01 2016-02-01 0
9 B 0 1 2017 4 2017-04-01 2016-02-01 425
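Note that subtracting two Date objects gives a difference in days (-730, -365, ...), whereas the expected output is in months. A hedged sketch of one way to get whole months directly from the year and month columns instead (assuming, as in the sample data, exactly one col2 == 1 row per person):
df %>% group_by(col1) %>%
  mutate(event.time = (12 * year + month) - (12 * year[col2 == 1] + month[col2 == 1]))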
Data
df <- read.table(text="
col1 col2 col3 year month
row.name11 A 0 0 2013 02
row.name12 A 0 0 2014 02
row.name13 A 1 1 2015 02
row.name14 A 0 1 2016 02
row.name15 A 0 1 2018 02
row.name16 B 0 0 2014 02
row.name17 B 0 0 2015 02
row.name18 B 1 1 2016 02
row.name19 B 0 1 2017 04
",header=T)

Here is an option using lubridate
library(tidyverse)
library(lubridate)
ym <- function(y, m) ymd(sprintf("%s-%s-01", y, m))
df %>%
group_by(col1) %>%
mutate(event.time = interval(ym(year, month)[col2 == 1], ym(year, month)) %/% months(1))
## A tibble: 9 x 6
## Groups: col1 [2]
# col1 col2 col3 year month event.time
# <fct> <int> <int> <int> <int> <dbl>
#1 A 0 0 2013 2 -24.
#2 A 0 0 2014 2 -12.
#3 A 1 1 2015 2 0.
#4 A 0 1 2016 2 12.
#5 A 0 1 2018 2 36.
#6 B 0 0 2014 2 -24.
#7 B 0 0 2015 2 -12.
#8 B 1 1 2016 2 0.
#9 B 0 1 2017 4 14.
Sample data
df <- read.table(text =
" col1 col2 col3 year month
row.name11 A 0 0 2013 02
row.name12 A 0 0 2014 02
row.name13 A 1 1 2015 02
row.name14 A 0 1 2016 02
row.name15 A 0 1 2018 02
row.name16 B 0 0 2014 02
row.name17 B 0 0 2015 02
row.name18 B 1 1 2016 02
row.name19 B 0 1 2017 04", header = T)
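For what it's worth, a slightly shorter variant of the same idea that skips the custom ym() helper, assuming a lubridate version that provides make_date() and, as above, exactly one col2 == 1 row per person:
df %>%
  group_by(col1) %>%
  mutate(d = make_date(year, month),
         event.time = interval(d[col2 == 1], d) %/% months(1)) %>%
  select(-d)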

Related

Impute missing records based on week and year in r

I want to impute the missing weekly records with 0 values in the duration column for each household/individual combination.
The minimum week here is w51 of 2021 and it goes up to w4 of 2022.
For the household 1001 / individual 1 combination, week 3 is missing from the sequence.
For household 1002 / individual 2, weeks 52, 2, and 4 are missing.
The final dataset would have those missing weeks filled in with a duration of 0.
What I tried is using the complete() function from tidyr after grouping by household and individual, but it's not working.
In the actual dataset the minimum and maximum weeks will vary.
Here is the sample dataset
data <- data.frame(household=c(1001,1001,1001,1001,1001,1002,1002,1002,1003,1003,1003),
individual = c(1,1,1,1,1,2,2,2,1,1,1),
year = c(2021,2021,2022,2022,2022,2021,2022,2022,2022,2022,2022),
week =c("w51","w52","w1","w2","w4","w51","w1","w3","w1","w2","w3"),
duration =c(20,23,24,56,78,12,34,67,87,89,90))
Using the examples on the ?complete help page, you can use nesting() to give you what you want
library(dplyr)   # for %>%
library(tidyr)   # for complete() and nesting()
data %>%
complete(nesting(household, individual), nesting(year, week), fill=list(duration=0))
# household individual year week duration
# <dbl> <dbl> <dbl> <chr> <dbl>
# 1 1001 1 2021 w51 20
# 2 1001 1 2021 w52 23
# 3 1001 1 2022 w1 24
# 4 1001 1 2022 w2 56
# 5 1001 1 2022 w3 0
# 6 1001 1 2022 w4 78
# 7 1002 2 2021 w51 12
# 8 1002 2 2021 w52 0
# 9 1002 2 2022 w1 34
# 10 1002 2 2022 w2 0
# 11 1002 2 2022 w3 67
# 12 1002 2 2022 w4 0
# 13 1003 1 2021 w51 0
# 14 1003 1 2021 w52 0
# 15 1003 1 2022 w1 87
# 16 1003 1 2022 w2 89
# 17 1003 1 2022 w3 90
# 18 1003 1 2022 w4 0
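One caveat: nesting(year, week) only generates year/week combinations that appear somewhere in the data, so a week that is missing for every household/individual cannot be filled this way. A hedged sketch that builds the full week sequence explicitly (week_index and all_weeks are helper names introduced here; it assumes plain 52-week numbering, i.e. no ISO week 53):
library(dplyr)
library(tidyr)
# map year + "wNN" to one sequential index so the range can span New Year
week_index <- function(year, week) year * 52 + as.integer(sub("w", "", week))
idx_range <- range(week_index(data$year, data$week))
all_weeks <- data.frame(idx = seq(idx_range[1], idx_range[2]))
all_weeks$year <- (all_weeks$idx - 1) %/% 52                        # recover the year
all_weeks$week <- paste0("w", all_weeks$idx - all_weeks$year * 52)  # and the week label
data %>%
  mutate(idx = week_index(year, week)) %>%
  select(-year, -week) %>%
  complete(nesting(household, individual), idx = all_weeks$idx, fill = list(duration = 0)) %>%
  left_join(all_weeks, by = "idx") %>%
  select(household, individual, year, week, duration)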

Count the occurrences of accidents until the next accident

I have the following data frame and I would like to create the "OUTPUT_COLUMN".
Explanation of columns:
ID is the identification number of the policy
ID_REG_YEAR is the identification number per Registration Year
CALENDAR_YEAR is the year in which the policy has exposure
NUMBER_OF_RENEWALS is the number of times the policy has been renewed
ACCIDENT indicates whether an accident occurred
KEY TO THE DATASET: ID_REG_YEAR and CALENDAR_YEAR
Basically, if NUMBER_OF_RENEWALS = 0 then OUTPUT_COLUMN = 100. Any row before which no accident has occurred should also contain 100 (e.g. rows 13, 16, 17). If an accident occurred, I would like to count the number of renewals until the next accident.
ID ID_REG_YEAR CALENDAR_YEAR NUMBER_OF_RENEWALS ACCIDENT OUTPUT_COLUMN
1 A A_2015 2015 0 YES 100
2 A A_2015 2016 0 YES 100
3 A A_2016 2016 1 YES 0
4 A A_2016 2017 1 YES 0
5 A A_2017 2017 2 NO 1
6 A A_2017 2018 2 NO 1
7 A A_2018 2018 3 NO 2
8 A A_2018 2019 3 NO 2
9 A A_2019 2019 4 YES 0
10 A A_2019 2020 4 YES 0
11 B B_2015 2015 0 NO 100
12 B B_2015 2016 0 NO 100
13 B B_2016 2016 1 NO 100
14 C C_2013 2013 0 NO 100
15 C C_2013 2014 0 NO 100
16 C C_2014 2014 1 NO 100
17 C C_2014 2015 1 NO 100
18 C C_2015 2015 2 YES 0
19 C C_2015 2016 2 YES 0
20 C C_2016 2016 3 NO 1
21 C C_2016 2017 3 NO 1
22 C C_2017 2017 4 NO 2
23 C C_2017 2018 4 NO 2
24 C C_2018 2018 5 YES 0
25 C C_2018 2019 5 YES 0
26 C C_2019 2019 6 NO 1
27 C C_2019 2020 6 NO 1
28 C C_2020 2020 7 NO 2
Here is a dplyr solution. First, obtain a separate column for the registration year, which will be used to calculate renewals since the prior accident (this assumes the count is years since the last accident). Then, after grouping by ID, create a column containing the year of the last accident; using fill(), this value is propagated downwards. The final output column is then set to 100 (if there is no prior accident, or NUMBER_OF_RENEWALS is zero) or to the registration year minus the last accident year.
library(dplyr)
library(tidyr) # needed for separate() and fill()
df %>%
separate(ID_REG_YEAR, into = c("ID_REG", "REG_YEAR"), convert = T) %>%
group_by(ID) %>%
mutate(LAST_ACCIDENT = ifelse(ACCIDENT == "YES", REG_YEAR, NA_integer_)) %>%
fill(LAST_ACCIDENT, .direction = "down") %>%
mutate(OUTPUT_COLUMN_2 = ifelse(
is.na(LAST_ACCIDENT) | NUMBER_OF_RENEWALS == 0, 100, REG_YEAR - LAST_ACCIDENT
))
Output
ID ID_REG REG_YEAR CALENDAR_YEAR NUMBER_OF_RENEWALS ACCIDENT OUTPUT_COLUMN LAST_ACCIDENT OUTPUT_COLUMN_2
<chr> <chr> <int> <int> <int> <chr> <int> <int> <dbl>
1 A A 2015 2015 0 YES 100 2015 100
2 A A 2015 2016 0 YES 100 2015 100
3 A A 2016 2016 1 YES 0 2016 0
4 A A 2016 2017 1 YES 0 2016 0
5 A A 2017 2017 2 NO 1 2016 1
6 A A 2017 2018 2 NO 1 2016 1
7 A A 2018 2018 3 NO 2 2016 2
8 A A 2018 2019 3 NO 2 2016 2
9 A A 2019 2019 4 YES 0 2019 0
10 A A 2019 2020 4 YES 0 2019 0
# … with 18 more rows
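A small usage sketch to double-check the derived rule against the OUTPUT_COLUMN already present in the data (out is just a variable name chosen here; the pipeline is the same one shown above):
out <- df %>%
  separate(ID_REG_YEAR, into = c("ID_REG", "REG_YEAR"), convert = TRUE) %>%
  group_by(ID) %>%
  mutate(LAST_ACCIDENT = ifelse(ACCIDENT == "YES", REG_YEAR, NA_integer_)) %>%
  fill(LAST_ACCIDENT, .direction = "down") %>%
  mutate(OUTPUT_COLUMN_2 = ifelse(is.na(LAST_ACCIDENT) | NUMBER_OF_RENEWALS == 0,
                                  100, REG_YEAR - LAST_ACCIDENT))
all(out$OUTPUT_COLUMN == out$OUTPUT_COLUMN_2)
# TRUE for the 28 sample rows, i.e. the year-based rule reproduces OUTPUT_COLUMN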
Note: If you want to use your policy number (NUMBER_OF_RENEWALS) and not go by the year, you can do something similar. Instead of adding a column with the last accident year, you can include the last accident policy. Then, your output column could reflect the policy number instead of year (to consider the possibility that one or more years could be skipped).
df %>%
separate(ID_REG_YEAR, into = c("ID_REG", "REG_YEAR"), convert = T) %>%
group_by(ID) %>%
mutate(LAST_ACCIDENT_POLICY = ifelse(ACCIDENT == "YES", NUMBER_OF_RENEWALS, NA_integer_)) %>%
fill(LAST_ACCIDENT_POLICY, .direction = "down") %>%
mutate(OUTPUT_COLUMN_2 = ifelse(
is.na(LAST_ACCIDENT_POLICY) | NUMBER_OF_RENEWALS == 0, 100, NUMBER_OF_RENEWALS - LAST_ACCIDENT_POLICY
))

How to use an index within another index to locate a change in a variable - R

I have the following dataset.
id<-c(1001,1001,1001,1002,1002,1003,1004,1005,1005,1005)
year<-c(2010,2013,2016, 2013,2010,2010,2016,2016,2010,2013)
status<-c(2,2,2,3,4,2,1,1,1,5)
df<-data.frame(id, year, status)
df <- df[order(df$id, df$year), ]
My goal is to create a for-loop with two indices, one for id and one for year, so that it runs through the ids first and then, within each id, looks at the years in which there was a change in status. To record the changes with this loop, I want another variable that shows the year in which the change happened.
For example, in the data frame below the variable change records 0 for id 1001 in all three years. But for 1002, a change in status is recorded with 1 in year 2013. For 1005, status changes twice, in 2013 and 2016, which is why 1 is recorded twice. By the way, id is a character variable because the real data I am working with has alphanumeric ids.
id year status change
1 1001 2010 2 0
2 1001 2013 2 0
3 1001 2016 2 0
5 1002 2010 4 0
4 1002 2013 3 1
6 1003 2010 2 0
7 1004 2016 1 0
9 1005 2010 1 0
10 1005 2013 5 1
8 1005 2016 1 1
The actual data frame has over 600k observations and the loop takes a long time to run, so I am open to faster solutions too.
My code is below:
df$change <- NA
df$id <- as.character(df$id)
for (id in unique(df$id)) {
  tau <- df$year[df$id == id]
  if (length(tau) > 1) {
    for (j in 1:(length(tau) - 1)) {
      if (df$status[df$year == tau[j] & df$id == id] != df$status[df$year == tau[j + 1] & df$id == id]) {
        df$change[df$year == tau[j] & df$id == id] <- 0
        df$change[df$year == tau[j + 1] & df$id == id] <- 1
      } else {
        df$change[df$year == tau[j] & df$id == id] <- 0
        df$change[df$year == tau[j + 1] & df$id == id] <- 0
      }
    }
  }
}
You could do:
Base R:
df |>
transform(change = ave(status, id, FUN = \(x)c(0, diff(x))!=0))
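Note: both the native pipe |> and the \(x) lambda shorthand need R >= 4.1; a hedged equivalent for older R versions:
transform(df, change = ave(status, id, FUN = function(x) c(0, diff(x)) != 0))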
In tidyverse:
library(tidyverse)
df %>%
group_by(id) %>%
mutate(change = c(0, diff(status)!=0))
id year status change
<dbl> <dbl> <dbl> <dbl>
1 1001 2010 2 0
2 1001 2013 2 0
3 1001 2016 2 0
4 1002 2010 4 0
5 1002 2013 3 1
6 1003 2010 2 0
7 1004 2016 1 0
8 1005 2010 1 0
9 1005 2013 5 1
10 1005 2016 1 1
Does this yield the correct result?
library(dplyr)
id<-c(1001,1001,1001,1002,1002,1003,1004,1005,1005,1005)
year<-c(2010,2013,2016, 2013,2010,2010,2016,2016,2010,2013)
status<-c(2,2,2,3,4,2,1,1,1,5)
df<-data.frame(id, year, status)
df <- df[order(df$id, df$year), ]
df %>%
group_by(id) %>%
mutate(change = as.numeric(status != lag(status,
default = first(status))))
#> # A tibble: 10 x 4
#> id year status change
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1001 2010 2 0
#> 2 1001 2013 2 0
#> 3 1001 2016 2 0
#> 4 1002 2010 4 0
#> 5 1002 2013 3 1
#> 6 1003 2010 2 0
#> 7 1004 2016 1 0
#> 8 1005 2010 1 0
#> 9 1005 2013 5 1
#> 10 1005 2016 1 1
Note: I put the "NA replacement" in a second mutate() since that step does not have to run on the grouped data, which is faster for large datasets.
We can use ifelse with a logical comparison between status and lag(status). The key is the argument default = first(status), which eliminates common problems with NAs in the output.
df %>% group_by(id) %>%
mutate(change=ifelse(status==lag(status, default = first(status)), 0, 1))
# A tibble: 10 x 4
# Groups: id [5]
id year status change
<dbl> <dbl> <dbl> <dbl>
1 1001 2010 2 0
2 1001 2013 2 0
3 1001 2016 2 0
4 1002 2010 4 0
5 1002 2013 3 1
6 1003 2010 2 0
7 1004 2016 1 0
8 1005 2010 1 0
9 1005 2013 5 1
10 1005 2016 1 1
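Since the question mentions over 600k observations, a hedged data.table sketch of the same lagged comparison may also be worth trying (it assumes the data.table package; shift() is data.table's lag/lead helper):
library(data.table)
dt <- as.data.table(df)
setorder(dt, id, year)
# compare each status to the previous one within id; the first row of each id
# is compared to itself, so it gets change = 0
dt[, change := as.numeric(status != shift(status, fill = status[1])), by = id]
dt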

How to modify a column based on a condition in a time series?

I have data on animal territories by month (1 = January, etc.) for multiple individuals:
year month terr_size id
2018 1 20 1
2018 2 30 1
2019 1 5 1
2019 2 10 1
2018 3 20 2
2018 5 25 2
2018 6 20 2
2018 7 20 2
2019 1 10 2
2019 2 5 2
2019 3 20 2
2019 4 30 2
I want to add a column that has a 1 if two consecutive months exceed some value, e.g. 10. One wrinkle is that my data can run over more than one year for a single id.
year month terr_size id new_col
2018 1 20 1 1
2018 2 30 1 1
2019 1 5 1 0
2019 2 10 1 0
2018 3 20 2 0
2018 5 25 2 1
2018 6 20 2 1
2018 7 20 2 1
2019 1 10 2 0
2019 2 5 2 0
2019 3 20 2 1
2019 4 30 2 1
This can be expressed compactly using a single left join in a single SQL statement.
Using the input shown in the Note at the end, perform a left self join using the indicated on condition and set new_col to 1 if for any original row both it and any matched rows have terr_size greater than or equal to 10. If there is no matched row then use coalesce to set new_col to 0.
library(sqldf)
sqldf("
select a.*,
coalesce(max(a.terr_size >= 10 and b.terr_size >= 10), 0)
new_col
from DF a
left join DF b on
a.id = b.id and
(12 * b.year + b.month = 12 * a.year + a.month + 1 or
12 * b.year + b.month = 12 * a.year + a.month - 1)
group by a.rowid")
giving:
year month terr_size id new_col
1 2018 1 20 1 1
2 2018 2 30 1 1
3 2019 1 5 1 0
4 2019 2 10 1 0
5 2018 3 20 2 0
6 2018 5 25 2 1
7 2018 6 20 2 1
8 2018 7 20 2 1
9 2019 1 10 2 0
10 2019 2 5 2 0
11 2019 3 20 2 1
12 2019 4 30 2 1
Note
The input and output shown in the question are not consistent so to be clear we assumed this:
Lines <- "year month terr_size id
2018 1 20 1
2018 2 30 1
2019 1 5 1
2019 2 10 1
2018 3 20 2
2018 5 25 2
2018 6 20 2
2018 7 20 2
2019 1 10 2
2019 2 5 2
2019 3 20 2
2019 4 30 2 "
DF <- read.table(text = Lines, header = TRUE)
Your data:
df <- read.table(text = "year month terr_size id
2018 1 20 1
2018 2 30 1
2019 1 5 1
2019 2 10 1
2018 3 20 2
2018 2 25 2
2018 6 20 2
2018 7 20 2
2019 1 10 2
2019 2 5 2
2019 3 20 2
2019 4 30 2 ", header = TRUE)
The idea is to create a date variable first, then make two copies of your data with the dates shifted one month ahead and one month back. R is memory-efficient for this kind of operation: thanks to copy-on-modify the unchanged columns are not duplicated, so you essentially only pay for one additional column rather than a full copy of the data frame. You then join the shifted columns onto the original data frame and apply the condition you need (I created a magic_number variable for the threshold). At the end, I selected only the original columns plus the one you needed.
library(dplyr)
library(lubridate)
# the threshold number
magic_number <- 10
# create a date variable
df <- df %>% mutate(date = make_date(year, month))
# [p]revious month
dfp <- df %>% transmute(id, date = date - months(1), terr_size_p = terr_size)
# [n]ext month
dfn <- df %>% transmute(id, date = date + months(1), terr_size_n = terr_size)
# join by id and date
df <- df %>%
left_join(dfp, by = c("id", "date")) %>%
left_join(dfn, by = c("id", "date"))
# new_col is 1 when terr_size exceeds the threshold and at least one of the
# previous / next months does too (a missing month, i.e. NA, counts as below the threshold)
df <- df %>%
  mutate(new_col = as.numeric(terr_size > magic_number &
                                (coalesce(terr_size_p > magic_number, FALSE) |
                                   coalesce(terr_size_n > magic_number, FALSE))))
# remove variables if there is no more use for them
df <- df %>% select(-terr_size_p, -terr_size_n, -date)
df
Result:
year month terr_size id new_col
1 2018 1 20 1 1
2 2018 2 30 1 1
3 2019 1 5 1 0
4 2019 2 10 1 0
5 2018 3 20 2 1
6 2018 2 25 2 1
7 2018 6 20 2 1
8 2018 7 20 2 1
9 2019 1 10 2 0
10 2019 2 5 2 0
11 2019 3 20 2 1
12 2019 4 30 2 1
(The result is not exactly the same because your initial data and expected results do not correspond at row 5)
This solution handles the December-January issue we talked about in the comments.
I'm not exactly sure what the rule is, because your output doesn't follow the rule you describe (e.g. lines 1 and 5 don't have another month for comparison yet you put a 1, line 6 is separated by 2 months, and you put a 1 in line 11 whereas line 12 was < 10).
I assumed the most complicated scenario, so you can remove the extra conditions you don't need:
A 1 is put if the territory size remained > 10 for two consecutive months including the current one (or for the first recorded month if it is > 10), for each individual.
df <- read.table(text = "year month terr_size id
2018 1 20 1
2018 2 30 1
2019 1 5 1
2019 2 10 1
2018 3 20 2
2018 5 25 2
2018 6 20 2
2018 7 20 2
2019 1 10 2
2019 2 5 2
2019 3 20 2
2019 4 30 2", header = TRUE)
Using dplyr and lag:
library(dplyr)
df %>% arrange(id, year,month) %>%
dplyr::mutate(newcol=case_when(is.na(lag(month))==TRUE & terr_size>10~1,
lag(id)!=id & terr_size>10~1,
id==lag(id) & year-lag(year)==0 & month-lag(month)==1 & terr_size>10 & lag(terr_size)>10~1,
id==lag(id) & year-lag(year)==1 & lag(month)-month==11 & terr_size>10 & lag(terr_size)>10~1,
TRUE~0))
output:
year month terr_size id newcol
1 2018 1 20 1 1
2 2018 2 30 1 1
3 2019 1 5 1 0
4 2019 2 10 1 0
5 2018 3 20 2 1
6 2018 5 25 2 0
7 2018 6 20 2 1
8 2018 7 20 2 1
9 2019 1 10 2 0
10 2019 2 5 2 0
11 2019 3 20 2 0
12 2019 4 30 2 1

Counting the distinct values for each day and group and inserting the value in an array in R

I want to transform the data below to give me, for each day, an association array with the count of unique IDs shared between each pair of groups. So, for example, from the data below
Year Month Day Group ID
2014 04 26 1 A
2014 04 26 1 B
2014 04 26 2 B
2014 04 26 2 C
2014 05 12 1 B
2014 05 12 2 E
2014 05 12 2 F
2014 05 12 2 G
2014 05 12 3 G
2014 05 12 3 F
2015 05 19 1 F
2015 05 19 1 D
2015 05 19 2 E
2015 05 19 2 G
2015 05 19 2 D
2015 05 19 3 A
2015 05 19 3 E
2015 05 19 3 B
I want to make an array that gives:
[1] (04/26/2014)
Grp 1 2 3
1 0 1 0
2 1 0 0
3 0 0 0
[2] (05/12/2014)
Grp 1 2 3
1 0 0 1
2 0 0 2
3 1 2 0
[3] (05/19/2015)
Grp 1 2 3
1 0 1 0
2 1 0 1
3 0 1 0
The 'Grp' header just indicates the group number. I know how to count the distinct values within the table overall, but I'm trying to use for-loops to also insert the appropriate value for each day, e.g. counting the unique IDs that are present in both group 1 and group 2 on 04/26/2014 and inserting that number into the group 1 / group 2 cell of the association matrix for that day. Any help would be appreciated.
I don't quite understand how you get the second one, but you can try this
dd <- read.table(header = TRUE, text = "Year Month Day Group ID
2014 04 26 1 A
2014 04 26 1 B
2014 04 26 2 B
2014 04 26 2 C
2014 05 12 1 B
2014 05 12 2 E
2014 05 12 2 F
2014 05 12 2 G
2014 05 12 3 G
2014 05 12 3 F
2015 05 19 1 F
2015 05 19 1 D
2015 05 19 2 E
2015 05 19 2 G
2015 05 19 2 D
2015 05 19 3 A
2015 05 19 3 E
2015 05 19 3 B")
dd <- within(dd, {
date <- as.Date(apply(dd[, 1:3], 1, paste0, collapse = '-'))
Group <- factor(Group)
Year <- Month <- Day <- NULL
})
E.g., for the first one
sp <- split(dd, dd$date)[[1]]
tbl <- table(sp$ID, sp$Group)
`diag<-`(crossprod(tbl), 0)
# 1 2 3
# 1 0 1 0
# 2 1 0 0
# 3 0 0 0
And do them all at once
lapply(split(dd, dd$date), function(x) {
cp <- crossprod(table(x$ID, x$Group))
diag(cp) <- 0
cp
})
# $`2014-04-26`
#
# 1 2 3
# 1 0 1 0
# 2 1 0 0
# 3 0 0 0
#
# $`2014-05-12`
#
# 1 2 3
# 1 0 0 0
# 2 0 0 2
# 3 0 2 0
#
# $`2015-05-19`
#
# 1 2 3
# 1 0 1 0
# 2 1 0 1
# 3 0 1 0
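For intuition on why this works: crossprod(tbl) is t(tbl) %*% tbl, and with tbl an ID x Group incidence table, entry (i, j) counts the IDs present in both group i and group j; zeroing the diagonal drops each group's overlap with itself. A tiny illustration with a made-up incidence matrix:
m <- rbind(A = c(1, 0, 0),   # A is only in group 1
           B = c(1, 1, 0),   # B is in groups 1 and 2
           C = c(0, 1, 0))   # C is only in group 2
crossprod(m)
#      [,1] [,2] [,3]
# [1,]    2    1    0
# [2,]    1    2    0
# [3,]    0    0    0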
A possible solution with dplyr and tidyr will be as follows:
library(dplyr)
library(tidyr)
df$date <- as.Date(paste(df$Year, df$Month, df$Day, sep = '-'))
df %>%
expand(date, Group) %>%
left_join(., df) %>%
group_by(date, Group) %>%
summarise(nID = n_distinct(ID)) %>%
split(., .$date)
Resulting output:
$`2014-04-26`
Source: local data frame [3 x 3]
Groups: date [1]
date Group nID
(date) (int) (int)
1 2014-04-26 1 2
2 2014-04-26 2 2
3 2014-04-26 3 1
$`2014-05-12`
Source: local data frame [3 x 3]
Groups: date [1]
date Group nID
(date) (int) (int)
1 2014-05-12 1 1
2 2014-05-12 2 3
3 2014-05-12 3 2
$`2015-05-19`
Source: local data frame [3 x 3]
Groups: date [1]
date Group nID
(date) (int) (int)
1 2015-05-19 1 2
2 2015-05-19 2 3
3 2015-05-19 3 3
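One caveat with this approach: date/Group combinations that have no rows at all (e.g. Group 3 on 2014-04-26) still show nID = 1, because the left join introduces an NA for ID and n_distinct() counts it. A hedged tweak that excludes the NA would be:
df %>%
  expand(date, Group) %>%
  left_join(., df) %>%
  group_by(date, Group) %>%
  summarise(nID = n_distinct(ID[!is.na(ID)])) %>%
  split(., .$date)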
