Check multiple columns for a value in R after group_by

In the following data frame, for each household-individual combination, if "X_1", "Y_2" and "Z_3" are all > 0, I want to add a new column "criteria" = "C1", else 0.
Only household 1001, individual 1 fulfills this condition.
I tried the ifelse option first and then select_at, but to no avail. It's throwing an error:
data %>% group_by(household,individual) %>%
mutate(criteria = ifelse(X_1 >0 & Y_2 >0 & X_3 >0,"C1",0)) # note: the data has no column X_3 (Z_3 was probably meant), so this line errors
# option_2
data %>% group_by(household,individual) %>%
select_at(vars(X_1 >0 & Y_2 >0 & Z_3 >0,"C1",0),all_vars(.>0)) %>%
mutate(criteria = "c1")
I also want to keep all the other variables that are not part of the group_by (such as year, week and duration) intact in the final data frame for each household-individual combination.
Please suggest.
Sample dataset:
data <- data.frame(household=c(1001,1001,1001,1001,1001,1002,1002,1002,1003,1003,1003),
individual = c(1,1,1,1,1,2,2,2,1,1,1),
year = c(2021,2021,2022,2022,2022,2021,2022,2022,2022,2022,2022),
week =c("w51","w52","w1","w2","w4","w51","w1","w3","w1","w2","w3"),
duration =c(20,23,24,56,78,12,34,67,87,89,90),
X_1 = c(3,3,3,3,3,0,0,0,1,1,1),
Y_2 = c(2,2,2,2,2,1,1,1,0,0,0),
Z_3 = c(4,4,4,4,4,0,0,0,0,0,0))

You could use if_all(), which is more efficient than rowwise() with c_across():
library(dplyr)
data %>%
mutate(criteria = ifelse(if_all(X_1:Z_3, ~ .x > 0), "C1", "0"))
# household individual year week duration X_1 Y_2 Z_3 criteria
# 1 1001 1 2021 w51 20 3 2 4 C1
# 2 1001 1 2021 w52 23 3 2 4 C1
# 3 1001 1 2022 w1 24 3 2 4 C1
# 4 1001 1 2022 w2 56 3 2 4 C1
# 5 1001 1 2022 w4 78 3 2 4 C1
# 6 1002 2 2021 w51 12 0 1 0 0
# 7 1002 2 2022 w1 34 0 1 0 0
# 8 1002 2 2022 w3 67 0 1 0 0
# 9 1003 1 2022 w1 87 1 0 0 0
# 10 1003 1 2022 w2 89 1 0 0 0
# 11 1003 1 2022 w3 90 1 0 0 0

You're doing a row-wise operation, so you can call rowwise() and then do the ifelse using c_across(). Call ungroup() at the end to get out of row-wise mode:
library(dplyr)
data |>
rowwise() |>
mutate(criteria = ifelse(all(c_across(X_1:Z_3) > 0), "C1", "0")) |>
ungroup()
Or you can just do:
data$criteria <- apply(subset(data, select = X_1:Z_3), 1, \(x) ifelse(all(x > 0), "C1", "0"))
household individual year week duration X_1 Y_2 Z_3 criteria
<dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 1001 1 2021 w51 20 3 2 4 C1
2 1001 1 2021 w52 23 3 2 4 C1
3 1001 1 2022 w1 24 3 2 4 C1
4 1001 1 2022 w2 56 3 2 4 C1
5 1001 1 2022 w4 78 3 2 4 C1
6 1002 2 2021 w51 12 0 1 0 0
7 1002 2 2022 w1 34 0 1 0 0
8 1002 2 2022 w3 67 0 1 0 0
9 1003 1 2022 w1 87 1 0 0 0
10 1003 1 2022 w2 89 1 0 0 0
11 1003 1 2022 w3 90 1 0 0 0
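For comparison, the same check can be vectorised in base R without apply(). This is a sketch, not either answerer's code: `data[cols] > 0` builds a logical matrix and `rowSums()` counts the TRUE values in each row, so a row qualifies when that count equals the number of columns checked. Because it only adds a column, year, week and duration stay as they are.

```r
data <- data.frame(household = c(1001,1001,1001,1001,1001,1002,1002,1002,1003,1003,1003),
                   individual = c(1,1,1,1,1,2,2,2,1,1,1),
                   year = c(2021,2021,2022,2022,2022,2021,2022,2022,2022,2022,2022),
                   week = c("w51","w52","w1","w2","w4","w51","w1","w3","w1","w2","w3"),
                   duration = c(20,23,24,56,78,12,34,67,87,89,90),
                   X_1 = c(3,3,3,3,3,0,0,0,1,1,1),
                   Y_2 = c(2,2,2,2,2,1,1,1,0,0,0),
                   Z_3 = c(4,4,4,4,4,0,0,0,0,0,0))

cols <- c("X_1", "Y_2", "Z_3")
# Logical matrix (one column per variable); count the TRUEs in each row
all_positive <- rowSums(data[cols] > 0) == length(cols)
data$criteria <- ifelse(all_positive, "C1", "0")
data
```

Rows with an NA in any of the three columns would get NA here; `rowSums(data[cols] > 0, na.rm = TRUE)` would instead treat NA as "not > 0".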

Related

Time since last event of grouped data in R

I have a data frame that contains a grouping variable (ID), a date, and an event column with numeric values, in which 0 represents no event and >0 represents an event. An example data frame can be generated with the following code:
df <- data.frame(ID = c (1, 1, 1, 1, 2, 2, 2),
date = as.Date(c("2014-08-03", "2014-08-04", "2014-08-07", "2014-08-10", "2015-07-01", "2015-07-03", "2015-08-01")),
event = c(1, 0, 3, 0, 0, 4, 0))
df
> df
ID date event
1 1 2014-08-03 1
2 1 2014-08-04 0
3 1 2014-08-07 3
4 1 2014-08-10 0
5 2 2015-07-01 0
6 2 2015-07-03 4
7 2 2015-08-01 0
Now, I want to calculate the time that has passed since the most recent event (>0) occurred. If the first entry or entries for an ID contain no event, NA should be returned. My desired output would look like this:
> df
ID date event tae
1 1 2014-08-03 1 0
2 1 2014-08-04 0 1
3 1 2014-08-07 3 0
4 1 2014-08-10 0 3
5 2 2015-07-01 0 NA
6 2 2015-07-03 4 0
7 2 2015-08-01 0 29
I have tried several different approaches. The closest I got was this:
library(dplyr)
df %>%
mutate(tmpG = cumsum(c(FALSE, as.logical(diff(event))))) %>%
group_by(ID) %>%
mutate(tmp = c(0, diff(date)) * !event) %>%
group_by(tmpG) %>%
mutate(tae = cumsum(tmp)) %>%
ungroup() %>%
select(-c(tmp, tmpG))
# A tibble: 7 x 4
ID date event tae
<dbl> <date> <dbl> <dbl>
1 1 2014-08-03 1 0
2 1 2014-08-04 0 1
3 1 2014-08-07 3 0
4 1 2014-08-10 0 3
5 2 2015-07-01 0 3
6 2 2015-07-03 4 0
7 2 2015-08-01 0 29
Any suggestions on how to get that code running (or any other alternative) would be greatly appreciated.
Here is another tidyverse approach that uses fill to carry forward the date of the most recent event.
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(last_event = if_else(event > 0, date, as.Date(NA))) %>% # NA must also be a Date for if_else
fill(last_event) %>%
mutate(tae = as.numeric(date - last_event))
Output
ID date event last_event tae
<dbl> <date> <dbl> <date> <dbl>
1 1 2014-08-03 1 2014-08-03 0
2 1 2014-08-04 0 2014-08-03 1
3 1 2014-08-07 3 2014-08-07 0
4 1 2014-08-10 0 2014-08-07 3
5 2 2015-07-01 0 NA NA
6 2 2015-07-03 4 2015-07-03 0
7 2 2015-08-01 0 2015-07-03 29
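A shorter alternative uses lag(). Note that it measures the gap to the previous row rather than to the last event, so it only reproduces the desired output when every non-event row directly follows an event row (or is the first row of its ID), as in this example:
df %>%
group_by(ID) %>%
mutate(tae = if_else(event == 0, as.numeric(date - lag(date)), 0))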
Output:
ID date event tae
<dbl> <date> <dbl> <dbl>
1 1 2014-08-03 1 0
2 1 2014-08-04 0 1
3 1 2014-08-07 3 0
4 1 2014-08-10 0 3
5 2 2015-07-01 0 NA
6 2 2015-07-03 4 0
7 2 2015-08-01 0 29
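As a dependency-free cross-check, here is a base R sketch (the helper name `time_since_event` is my own) that records the position of the latest event with `cummax()` and subtracts that row's date. It assumes the rows are sorted by date within each ID. Unlike the lag() version, it keeps measuring from the last event when several non-event rows follow one another.

```r
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2),
                 date = as.Date(c("2014-08-03", "2014-08-04", "2014-08-07", "2014-08-10",
                                  "2015-07-01", "2015-07-03", "2015-08-01")),
                 event = c(1, 0, 3, 0, 0, 4, 0))

# Days since the latest row with event > 0 (NA before the first event).
# Assumes the rows passed in belong to one ID and are sorted by date.
time_since_event <- function(date, event) {
  # position of the most recent event at or before each row; 0 = no event yet
  last_idx <- cummax(ifelse(event > 0, seq_along(event), 0L))
  out <- rep(NA_real_, length(date))
  has_prev <- last_idx > 0
  out[has_prev] <- as.numeric(date[has_prev] - date[last_idx[has_prev]])
  out
}

df$tae <- NA_real_
for (g in unique(df$ID)) {
  rows <- which(df$ID == g)
  df$tae[rows] <- time_since_event(df$date[rows], df$event[rows])
}
df

# Two non-event rows in a row: both are measured from the event on Jan 1
time_since_event(as.Date(c("2020-01-01", "2020-01-05", "2020-01-09")), c(1, 0, 0))
```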

Keep previous value if it is under a certain threshold

I would like to create a variable called treatment_cont, computed within each ID, as follows:
ID day day_diff treatment treatment_cont
1 0 NA 1 1
1 14 14 1 1
1 20 6 2 2
1 73 53 1 1
2 0 NA 1 1
2 33 33 1 1
2 90 57 2 2
2 112 22 3 2
2 152 40 1 1
2 178 26 4 1
treatment_cont is the same as treatment, except that we want to keep the previous treatment regime when day_diff (the difference in days between treatments) is lower than 30.
I have tried many ways on dplyr, manipulating the table, but I cannot figure out how to do it efficiently.
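A conditional mutate using case_when() and lag() might work: within each ID, take the previous treatment when the gap is under 30 days and the current one otherwise:
df %>% group_by(ID) %>% mutate(treatment_cont = case_when(day_diff < 30 ~ lag(treatment), TRUE ~ treatment))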
You are probably looking for lag (and perhaps its brother, lead):
library(dplyr)
library(tidyr) # for replace_na
df %>%
replace_na(list(day_diff=0)) %>%
group_by(ID) %>%
arrange(day) %>%
mutate(
treatment_cont = ifelse(day_diff < 30, lag(treatment, default = treatment[1]), treatment)
) %>%
ungroup %>%
arrange(ID, day)
# A tibble: 10 x 5
ID day day_diff treatment treatment_cont
<int> <int> <dbl> <int> <int>
1 1 0 0 1 1
2 1 14 14 1 1
3 1 20 6 2 1
4 1 73 53 1 1
5 2 0 0 1 1
6 2 33 33 1 1
7 2 90 57 2 2
8 2 112 22 3 2
9 2 152 40 1 1
10 2 178 26 4 1
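Both answers use lag(treatment), i.e. the previous row's raw treatment. If a run of short gaps should keep carrying the regime that was in force before the run (my reading, not something the question states explicitly), the carried value has to be reused, which is easiest with a small loop. This is a base R sketch with a hypothetical helper `carry_regime`. On this data it agrees with the answers, including 1 on day 20 for ID 1 (the question's table shows 2 there, which appears inconsistent with the stated rule); the two approaches only differ when a treatment change is followed by several short gaps.

```r
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                 day = c(0, 14, 20, 73, 0, 33, 90, 112, 152, 178),
                 treatment = c(1, 1, 2, 1, 1, 1, 2, 3, 1, 4))

# Keep the previous regime when the gap since the last treatment is below threshold
carry_regime <- function(day, treatment, threshold = 30) {
  out <- treatment
  gap <- c(NA, diff(day))  # first row of an ID has no previous treatment
  for (i in seq_along(out)[-1]) {
    if (gap[i] < threshold) out[i] <- out[i - 1]  # reuse the (possibly carried) value
  }
  out
}

df <- df[order(df$ID, df$day), ]
df$treatment_cont <- NA_real_
for (g in unique(df$ID)) {
  rows <- which(df$ID == g)
  df$treatment_cont[rows] <- carry_regime(df$day[rows], df$treatment[rows])
}
df
```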

Calculated Column Based on Rows with Date Range

I have a dataframe as follows:
ID Col1 RespID Col3 Col4 Year Month Day
1 blue 729Ad 3.2 A 2021 April 2
2 orange 295gS 6.5 A 2021 April 1
3 red 729Ad 8.4 B 2021 April 20
4 yellow 592Jd 2.9 A 2021 March 12
5 green 937sa 3.5 B 2021 May 13
I would like to calculate a new column, Col5, whose value is 1 if the row has a Col4 value of A and somewhere in the dataset there is another row with the same RespID but a Col4 value of B; otherwise its value is 0. Then I will drop all rows with a Col4 value of B, keeping just those with A. I'd also like to take the date fields (Year, Month, Day) into account, so that this is done within, say, a 30-day timeframe: only if 'B' appears within 30 days of when 'A' appears is there a 1 (if 'B' only appears within 60 days, there is no 1). Additionally, I'd like to keep everything as data.frames.
Here is what the desired output table would look like prior to dropping rows with Col4 value of B:
ID Col1 RespID Col3 Col4 Col5
1 blue 729Ad 3.2 A 1
2 orange 295gS 6.5 A 0
3 red 729Ad 8.4 B 0
4 yellow 592Jd 2.9 A 0
5 green 937sa 3.5 B 0
I have found Ronak's solution in this thread (Calculated Column Based on Rows in Tidymodels Recipe) useful; however, I would like to modify it for the date range.
A lot of things to unpack here.
I think you're tripping up over your own feet by trying to do too many things at once. I've broken down the code into four distinct steps to make the thought process easy to follow. Obviously, for use in a production environment it should be rewritten more efficiently.
1. Generate some data
library(tidyverse)
set.seed(42)
df <- tibble(
id = c(1:10),
resp_id = c(1701, seq(2286, 2289), 1701, seq(2290, 2293)),
grouping = sample(c("A", "B"), size = 10, replace = TRUE),
date = seq.Date(as.Date("2363-10-04"), as.Date("2363-11-17"), length.out = 10)
)
Resulting data:
# A tibble: 10 × 4
id resp_id grouping date
<int> <dbl> <chr> <date>
1 1 1701 A 2363-10-04
2 2 2286 A 2363-10-08
3 3 2287 A 2363-10-13
4 4 2288 A 2363-10-18
5 5 2289 B 2363-10-23
6 6 1701 B 2363-10-28
7 7 2290 B 2363-11-02
8 8 2291 B 2363-11-07
9 9 2292 A 2363-11-12
10 10 2293 B 2363-11-17
2. Check grouping
df <- df %>%
mutate(
is_a = ifelse(grouping == "A", 1, 0),
is_b = ifelse(grouping == "B", 1, 0)
)
We have the grouping now as easy-to-use dummy variables:
> df
# A tibble: 10 × 6
id resp_id grouping date is_a is_b
<int> <dbl> <chr> <date> <dbl> <dbl>
1 1 1701 A 2363-10-04 1 0
2 2 2286 A 2363-10-08 1 0
3 3 2287 A 2363-10-13 1 0
4 4 2288 A 2363-10-18 1 0
5 5 2289 B 2363-10-23 0 1
6 6 1701 B 2363-10-28 0 1
7 7 2290 B 2363-11-02 0 1
8 8 2291 B 2363-11-07 0 1
9 9 2292 A 2363-11-12 1 0
10 10 2293 B 2363-11-17 0 1
3. Check completeness
df <- df %>%
group_by(
resp_id
) %>%
mutate(
# Check if the grouping has both "A" and "B" values
is_complete = ifelse(
sum(is_a) > 0 & sum(is_b) > 0,
1,
0
)
) %>%
ungroup()
We see that there is only one complete resp_id value, 1701:
> df
# A tibble: 10 × 7
id resp_id grouping date is_a is_b is_complete
<int> <dbl> <chr> <date> <dbl> <dbl> <dbl>
1 1 1701 A 2363-10-04 1 0 1
2 2 2286 A 2363-10-08 1 0 0
3 3 2287 A 2363-10-13 1 0 0
4 4 2288 A 2363-10-18 1 0 0
5 5 2289 B 2363-10-23 0 1 0
6 6 1701 B 2363-10-28 0 1 1
7 7 2290 B 2363-11-02 0 1 0
8 8 2291 B 2363-11-07 0 1 0
9 9 2292 A 2363-11-12 1 0 0
10 10 2293 B 2363-11-17 0 1 0
4. Assign target value
df <- df %>%
group_by(
resp_id
) %>%
mutate(
# Check if the "A" part of a complete grouping has another value within 30 days
is_within_timeframe = ifelse(
is_complete == 1 & is_a == 1 & max(date) - min(date) <= 30,
1,
0
)
) %>%
ungroup()
We see that our one complete set does in fact have a B value that falls within 30 days of the A observation (caveat: this only works if there are always exactly one or two observations per resp_id!). Column is_within_timeframe corresponds to your Col5:
> df
# A tibble: 10 × 8
id resp_id grouping date is_a is_b is_complete is_within_timeframe
<int> <dbl> <chr> <date> <dbl> <dbl> <dbl> <dbl>
1 1 1701 A 2363-10-04 1 0 1 1
2 2 2286 A 2363-10-08 1 0 0 0
3 3 2287 A 2363-10-13 1 0 0 0
4 4 2288 A 2363-10-18 1 0 0 0
5 5 2289 B 2363-10-23 0 1 0 0
6 6 1701 B 2363-10-28 0 1 1 0
7 7 2290 B 2363-11-02 0 1 0 0
8 8 2291 B 2363-11-07 0 1 0 0
9 9 2292 A 2363-11-12 1 0 0 0
10 10 2293 B 2363-11-17 0 1 0 0
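To lift the one-or-two-observations caveat and to work from the question's own Year/Month/Day columns, one option is to compare each A row with every B row of the same RespID. This is a base R sketch, not the answer's method; it assumes "within 30 days" means before or after (absolute difference) and that Month holds English month names, which are matched against R's built-in `month.name` to avoid locale-dependent date parsing.

```r
df <- data.frame(
  ID     = 1:5,
  Col1   = c("blue", "orange", "red", "yellow", "green"),
  RespID = c("729Ad", "295gS", "729Ad", "592Jd", "937sa"),
  Col3   = c(3.2, 6.5, 8.4, 2.9, 3.5),
  Col4   = c("A", "A", "B", "A", "B"),
  Year   = 2021,
  Month  = c("April", "April", "April", "March", "May"),
  Day    = c(2, 1, 20, 12, 13),
  stringsAsFactors = FALSE
)

# Build a proper Date from the three separate columns
df$date <- as.Date(sprintf("%d-%02d-%02d", df$Year, match(df$Month, month.name), df$Day))

window_days <- 30  # the question's "say, 30 days"
# For each A row: is there a B row with the same RespID at most window_days away?
df$Col5 <- vapply(seq_len(nrow(df)), function(i) {
  if (df$Col4[i] != "A") return(0)
  same_resp_b <- df$Col4 == "B" & df$RespID == df$RespID[i]
  gaps <- abs(as.numeric(df$date[same_resp_b] - df$date[i]))
  as.numeric(any(gaps <= window_days))
}, numeric(1))

df_a <- df[df$Col4 == "A", ]  # afterwards, drop the B rows
df_a
```

This compares every pair within a RespID, which is fine for modest data; for large tables a join on RespID followed by a date filter would scale better.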

R MICE Imputation

data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
"time"=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
"v1"=c(16,12,14,12,17,16,12,12,13,12,16,16,10,10,14,17,17,12,10,11),
"v2"=c(1,1,3,2,2,2,3,1,2,1,2,1,3,1,1,2,3,3,1,2),
"v3"=c(4,1,4,4,2,2,2,2,1,3,2,3,1,2,2,1,4,1,1,4),
"v4"=c(NA,27,NA,42,40,48,45,25,29,NA,NA,27,NA,NA,NA,NA,NA,NA,44,39),
"v5"=c(NA,1,NA,NA,1,3,3,2,NA,NA,NA,1,NA,NA,NA,NA,3,2,4,1),
"v6"=c(NA,0,1,NA,1,NA,1,NA,0,NA,1,1,NA,NA,NA,NA,0,0,NA,0),
"v7"=c(0,1,1,NA,0,1,1,0,1,0,NA,0,NA,NA,NA,NA,0,1,NA,1),
"v8"=c(1,NA,0,1,0,0,NA,1,1,NA,0,0,NA,NA,NA,NA,1,0,NA,1))
This is my sample data and with it I am seeking to:
A. For time = 1 use v1-v3 to impute v4-v8 using MICE (v4 is continuous, v5 is categorical, v6-v8 is binary)
B. After imputed values are imputed for time = 1, I want to fill NA values that follow with the previous value. So if the variable for time 1-4 is: NA,NA,0,1 and the imputed value at time 1 is 1, then it could be: 1-1-0-1
I attempted:
dataNEW <- mice(data[,data$time == 1],m=5,maxit=50,meth='pmm',seed=500)
A. For time = 1 use v1-v3 to impute v4-v8 using MICE (v4 is continuous, v5 is categorical, v6-v8 is binary)
First, variables v5-v8 have to be converted to factors:
data$v5 <- factor(data$v5)
data$v6 <- factor(data$v6)
data$v7 <- factor(data$v7)
data$v8 <- factor(data$v8)
Create a predictor matrix to tell mice to use only v1-v3 to predict v4-v8:
Pred_Matrix <- 1 - diag(ncol(data))
Pred_Matrix[,c(1:2, 6:10)] <- 0
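The index-based matrix is easy to get wrong if the column order changes. Here is a sketch of the same restriction built by name; `vars` and `pred` are illustrative names, and `pred` could then be passed as `predictorMatrix`. In mice's predictor matrix, rows are the variables being imputed and columns are the predictors. The rows for student, time and v1-v3 differ from the index version, but those variables have no missing values, so (as far as I know) mice does not impute them and their rows go unused.

```r
# Column order of the question's data frame
vars <- c("student", "time", paste0("v", 1:8))

# Rows = variables to impute, columns = predictors; start with "nothing predicts anything"
pred <- matrix(0, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))
# Let only v1-v3 predict v4-v8
pred[paste0("v", 4:8), paste0("v", 1:3)] <- 1

# Index-based version from the answer, for comparison
Pred_Matrix <- 1 - diag(length(vars))
Pred_Matrix[, c(1:2, 6:10)] <- 0
all(pred[6:10, ] == Pred_Matrix[6:10, ])  # the v4-v8 rows agree
```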
Impute using only 1 imputation (the default is 5) because all you want are the imputed values; you're not doing anything else such as pooling the results for modelling.
impA <- mice(subset(data, subset = time == 1), predictorMatrix = Pred_Matrix, m = 1)
The imputed data can be extracted using the complete function (from the mice package, not tidyr).
B. After imputed values are imputed for time = 1, I want to fill NA values that follow with the previous value. So if the variable for time 1-4 is: NA,NA,0,1 and the imputed value at time 1 is 1, then it could be: 1-1-0-1
library(dplyr)
library(tidyr) # Needed for the fill function
mice::complete(impA) %>%
rbind(subset(data, subset=time!=1)) %>%
arrange(student, time) %>%
group_by(student) %>%
fill(v4:v8)
# A tibble: 20 x 10
# Groups: student [5]
student time v1 v2 v3 v4 v5 v6 v7 v8
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
1 1 1 16 1 4 40 2 1 0 1
2 1 2 12 1 1 27 1 0 1 1
3 1 3 14 3 4 27 1 1 1 0
4 1 4 12 2 4 42 1 1 1 1
5 2 1 17 2 2 40 1 1 0 0
6 2 2 16 2 2 48 3 1 1 0
7 2 3 12 3 2 45 3 1 1 0
8 2 4 12 1 2 25 2 1 0 1
9 3 1 13 2 1 29 1 0 1 1
10 3 2 12 1 3 29 1 0 0 1
11 3 3 16 2 2 29 1 1 0 0
12 3 4 16 1 3 27 1 1 0 0
13 4 1 10 3 1 40 1 0 0 0
14 4 2 10 1 2 40 1 0 0 0
15 4 3 14 1 2 40 1 0 0 0
16 4 4 17 2 1 40 1 0 0 0
17 5 1 17 3 4 40 3 0 0 1
18 5 2 12 3 1 40 2 0 1 0
19 5 3 10 1 1 44 4 0 1 0
20 5 4 11 2 4 39 1 0 1 1
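What `fill()` does within each student is "last observation carried forward". A small base R sketch of the same idea (the helper `locf` and the toy frame `dat` are made up for illustration) reproduces the NA, NA, 0, 1 example from the question once time 1 has been imputed as 1:

```r
# Last observation carried forward: replace each NA with the latest non-NA value before it
locf <- function(x) {
  last_idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  last_idx[last_idx == 0L] <- NA  # leading NAs have nothing to carry
  x[last_idx]
}

# The question's example: time 1-4 is NA, NA, 0, 1 and time 1 is imputed as 1
v <- c(NA, NA, 0, 1)
v[1] <- 1
locf(v)

# Per student, mirroring group_by(student) %>% fill(v4) on a toy frame
dat <- data.frame(student = c(1, 1, 1, 2, 2, 2), v4 = c(40, NA, NA, NA, 48, NA))
dat$v4 <- ave(dat$v4, dat$student, FUN = locf)
dat
```

Grouping matters: without it, student 2's leading NA would be filled with student 1's last value.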
Data
Note: I had to change the first value of v5 to 2; otherwise the polyreg imputation fails (there are only two categories for time = 1).
data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
"time"=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
"v1"=c(16,12,14,12,17,16,12,12,13,12,16,16,10,10,14,17,17,12,10,11),
"v2"=c(1,1,3,2,2,2,3,1,2,1,2,1,3,1,1,2,3,3,1,2),
"v3"=c(4,1,4,4,2,2,2,2,1,3,2,3,1,2,2,1,4,1,1,4),
"v4"=c(NA,27,NA,42,40,48,45,25,29,NA,NA,27,NA,NA,NA,NA,NA,NA,44,39),
"v5"=c(2,1,NA,NA,1,3,3,2,NA,NA,NA,1,NA,NA,NA,NA,3,2,4,1),
"v6"=c(NA,0,1,NA,1,NA,1,NA,0,NA,1,1,NA,NA,NA,NA,0,0,NA,0),
"v7"=c(0,1,1,NA,0,1,1,0,1,0,NA,0,NA,NA,NA,NA,0,1,NA,1),
"v8"=c(1,NA,0,1,0,0,NA,1,1,NA,0,0,NA,NA,NA,NA,1,0,NA,1))

Recode column for whole group based on other column's value of oldest group member

I want to recode two columns indicating the status (x1 or x2 = either 3 or 0) of a whole group based on the value of another column of the oldest member of each group.
In the example below x1(x2) is the sum of key1(key2) inside each group (there are always three values/imputations per person). However, I only want to have either x1>0 or x2>0 for each group. In those groups where there is one person with key1=1 and one person with key2=1 (and therefore x1=3 AND x2=3) the oldest person should decide. If the oldest person has key1=1 and key2=0, like in group A, x1 should be 3 and x2 should be 0 for the whole group and so on.
Reproducible example:
id <- c("A11", "A12", "A13", "A21", "A22", "A23", "B11", "B12", "B13", "C11", "C12", "C13", "C21", "C22", "C23", "D11", "D12", "D13", "D21", "D22", "D23", "E11", "E12", "E13", "E21", "E22", "E23")
group <- c("A","A","A","A","A","A","B","B","B","C","C","C","C","C","C","D","D","D","D","D","D","E","E","E","E","E","E")
imputation <- c(rep(1:3, 9))
age <- c(45,45,45,17,17,17,20,20,20,70,70,70,60,60,60,25,25,25,30,30,30,28,28,28,34,34,34)
key1 <- c(1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0)
key2 <- c(0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0)
x1 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3)
x2 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0,0,0,0)
test <- data.frame(id, group, imputation, age, key1, key2, x1, x2)
Subset where x1 and x2 should be recoded:
> test %>% group_by(group) %>% filter(x1==x2 & x1>0 | x1==x2 & x2>0)
# A tibble: 18 x 8
# Groups: group [3]
id group imputation age key1 key2 x1 x2
<fct> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A11 A 1 45 1 0 3 3
2 A12 A 2 45 1 0 3 3
3 A13 A 3 45 1 0 3 3
4 A21 A 1 17 0 1 3 3
5 A22 A 2 17 0 1 3 3
6 A23 A 3 17 0 1 3 3
7 C11 C 1 70 0 1 3 3
8 C12 C 2 70 0 1 3 3
9 C13 C 3 70 0 1 3 3
10 C21 C 1 60 1 0 3 3
11 C22 C 2 60 1 0 3 3
12 C23 C 3 60 1 0 3 3
13 D11 D 1 25 1 0 3 3
14 D12 D 2 25 1 0 3 3
15 D13 D 3 25 1 0 3 3
16 D21 D 1 30 0 1 3 3
17 D22 D 2 30 0 1 3 3
18 D23 D 3 30 0 1 3 3
The output should be:
id group imputation age key1 key2 x1 x2
1 A11 A 1 45 1 0 3 0
2 A12 A 2 45 1 0 3 0
3 A13 A 3 45 1 0 3 0
4 A21 A 1 17 0 1 3 0
5 A22 A 2 17 0 1 3 0
6 A23 A 3 17 0 1 3 0
7 C11 C 1 70 0 1 0 3
8 C12 C 2 70 0 1 0 3
9 C13 C 3 70 0 1 0 3
10 C21 C 1 60 1 0 0 3
11 C22 C 2 60 1 0 0 3
12 C23 C 3 60 1 0 0 3
13 D11 D 1 25 1 0 0 3
14 D12 D 2 25 1 0 0 3
15 D13 D 3 25 1 0 0 3
16 D21 D 1 30 0 1 0 3
17 D22 D 2 30 0 1 0 3
18 D23 D 3 30 0 1 0 3
I guess it can be done with a combination of group_by, filter, mutate and ifelse, but I haven't figured it out yet. It is important, however, that it includes filter or something similar, because the observations with x1==x2 & x1>0 | x1==x2 & x2>0 are only a subset of my data frame.
Within each group you can compare the unique value of age where key1 is 1 with the unique value of age where key2 is 1 and update x1 and x2 accordingly:
id <- c("A11", "A12", "A13", "A21", "A22", "A23", "B11", "B12", "B13", "C11", "C12", "C13", "C21", "C22", "C23", "D11", "D12", "D13", "D21", "D22", "D23", "E11", "E12", "E13", "E21", "E22", "E23")
group <- c("A","A","A","A","A","A","B","B","B","C","C","C","C","C","C","D","D","D","D","D","D","E","E","E","E","E","E")
imputation <- c(rep(1:3, 9))
age <- c(45,45,45,17,17,17,20,20,20,70,70,70,60,60,60,25,25,25,30,30,30,28,28,28,34,34,34)
key1 <- c(1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0)
key2 <- c(0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0)
x1 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3)
x2 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0,0,0,0)
test <- data.frame(id, group, imputation, age, key1, key2, x1, x2)
library(dplyr)
test %>%
group_by(group) %>%
filter(x1==x2 & x1>0 | x1==x2 & x2>0) %>%
mutate(x1 = ifelse(unique(age[key1==1]) > unique(age[key2==1]), 3, 0),
x2 = ifelse(unique(age[key1==1]) > unique(age[key2==1]), 0, 3)) %>%
ungroup()
# # A tibble: 18 x 8
# id group imputation age key1 key2 x1 x2
# <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A11 A 1 45 1 0 3 0
# 2 A12 A 2 45 1 0 3 0
# 3 A13 A 3 45 1 0 3 0
# 4 A21 A 1 17 0 1 3 0
# 5 A22 A 2 17 0 1 3 0
# 6 A23 A 3 17 0 1 3 0
# 7 C11 C 1 70 0 1 0 3
# 8 C12 C 2 70 0 1 0 3
# 9 C13 C 3 70 0 1 0 3
#10 C21 C 1 60 1 0 0 3
#11 C22 C 2 60 1 0 0 3
#12 C23 C 3 60 1 0 0 3
#13 D11 D 1 25 1 0 0 3
#14 D12 D 2 25 1 0 0 3
#15 D13 D 3 25 1 0 0 3
#16 D21 D 1 30 0 1 0 3
#17 D22 D 2 30 0 1 0 3
#18 D23 D 3 30 0 1 0 3
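Note that the filter() in the answer above drops the groups that don't need recoding (B and E) from the result. If the rest of the data frame should be kept, one option is to recode in place. This base R sketch (not the answerer's code) picks the oldest member directly with which.max(); it assumes the oldest member is unique, since on an age tie which.max() simply takes the first row.

```r
id <- c("A11", "A12", "A13", "A21", "A22", "A23", "B11", "B12", "B13", "C11", "C12", "C13", "C21", "C22", "C23", "D11", "D12", "D13", "D21", "D22", "D23", "E11", "E12", "E13", "E21", "E22", "E23")
group <- c("A","A","A","A","A","A","B","B","B","C","C","C","C","C","C","D","D","D","D","D","D","E","E","E","E","E","E")
imputation <- c(rep(1:3, 9))
age <- c(45,45,45,17,17,17,20,20,20,70,70,70,60,60,60,25,25,25,30,30,30,28,28,28,34,34,34)
key1 <- c(1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0)
key2 <- c(0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0)
x1 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3)
x2 <- c(3,3,3,3,3,3,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,0,0,0,0,0,0)
test <- data.frame(id, group, imputation, age, key1, key2, x1, x2)

# Rows that need recoding: x1 and x2 are equal and positive
needs_fix <- test$x1 == test$x2 & test$x1 > 0

for (g in unique(test$group[needs_fix])) {
  rows <- which(test$group == g & needs_fix)
  oldest <- rows[which.max(test$age[rows])]  # first row belonging to the oldest member
  if (test$key1[oldest] == 1) {
    test$x1[rows] <- 3; test$x2[rows] <- 0
  } else {
    test$x1[rows] <- 0; test$x2[rows] <- 3
  }
}
test
```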
