Problem: I have 2 datasets with no matching identifiers (such as an ID) and need to find the closest match in df1$time for each value of df2$tstart. df1 (with the time column) has 660,000 rows with time stamps approximately every 0.00125 s.
For each df1 row that is the closest match to a df2$tstart value, I would like a new column (df1$trial_start) that says "yes"; every other row should say "no".
I've tried findInterval, but it only returns the index of the last df1$time value at or below each tstart, so it never checks the following value. In the code below most of the outputs look good, but for some indices the value after the returned index is closer to $tstart.
#my actual code:
index_closest <- findInterval(iti_summaries_2183[["24"]]$tstart, poke_1s$time)
poke_1s$trial_start <- ifelse(seq_len(nrow(poke_1s)) %in% index_closest, "yes", "no")
I've also tried which.min, which doesn't work since the vectors' lengths don't match.
Additionally, I've fought with roll = "nearest" like here, but those functions return the matched values and I'm not sure how to create a new column and assign yes/no.
Code to replicate problem:
n <- 773
df1 <- structure(list(initiate = sample(c(0,1), replace=TRUE, size=n),
left = sample(c(0,1), replace=TRUE, size=n),
right = sample(c(0,1), replace=TRUE, size=n),
time = seq(from = 2267.2, to = 2363.75, by = 0.125)))
df1 <- data.frame(df1)
df2 <- structure(list(trial = c(156:162),
control = c(0, 0, 0, 0, 3, 0, 3),
t_start = c(2267.231583, 2289.036355, 2298.046849, 2318.933635, 2328.334036, 2347.870449, 2363.748095),
t_end = c(2268.76760, 2290.83370, 2299.38547, 2320.71400, 2329.93985, 2349.15464, 2365.12455)),
class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), spec = structure(list(
cols = list(trial = structure(list(), class = c("collector_double",
"collector")), control = structure(list(), class = c("collector_double",
"collector")), t_start = structure(list(), class = c("collector_double",
"collector")), t_end = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
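If I understand your question correctly, a data.table rolling join with roll = 'nearest' matches each t_start to the nearest df1 time (x.time). Note that this returns one row per df2 trial, and trial_start here is 'Y' when that nearest time lies strictly between t_start and t_end: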
library(data.table)
setDT(df1)
setDT(df2)
df1[df2,.(initiate,left,right,x.time,trial,control,t_start,t_end,
trial_start=fifelse(x.time>t_start&x.time<t_end,'Y','N')),
on=.(time=t_start),roll='nearest']
initiate left right x.time trial control t_start t_end trial_start
<num> <num> <num> <num> <int> <num> <num> <num> <char>
1: 0 0 1 2267.200 156 0 2267.232 2268.768 N
2: 0 0 1 2289.075 157 0 2289.036 2290.834 Y
3: 0 0 1 2298.075 158 0 2298.047 2299.385 Y
4: 1 1 1 2318.950 159 0 2318.934 2320.714 Y
5: 1 1 1 2328.325 160 3 2328.334 2329.940 N
6: 0 0 1 2347.825 161 0 2347.870 2349.155 N
7: 1 1 0 2363.700 162 3 2363.748 2365.125 N
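The join above returns one row per trial rather than adding a column to df1. If the goal is literally to flag the df1 row nearest to each t_start, the step missing from the findInterval attempt is to compare the two neighbouring times. A minimal base R sketch, assuming df1$time is sorted in ascending order (findInterval requires that); nearest_index() is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: for each value in `target`, return the index of the
# closest element of the ascending vector `x`.
nearest_index <- function(target, x) {
  # all.inside = TRUE keeps i in 1..(length(x) - 1), so x[i + 1] always exists;
  # for targets within range(x), x[i] <= target <= x[i + 1]
  i <- findInterval(target, x, all.inside = TRUE)
  # pick whichever neighbour is closer (ties go to the earlier row)
  ifelse(abs(x[i + 1] - target) < abs(target - x[i]), i + 1, i)
}

# Same time grid and t_start values as the reproducible example above
time    <- seq(from = 2267.2, to = 2363.75, by = 0.125)
t_start <- c(2267.231583, 2289.036355, 2298.046849, 2318.933635,
             2328.334036, 2347.870449, 2363.748095)

idx <- nearest_index(t_start, time)
trial_start <- ifelse(seq_along(time) %in% idx, "yes", "no")
```

With the data from the question this would become poke_1s$trial_start <- ifelse(seq_len(nrow(poke_1s)) %in% nearest_index(iti_summaries_2183[["24"]]$tstart, poke_1s$time), "yes", "no"). Both findInterval and the neighbour comparison are vectorised, so this should scale to the 660,000-row case without a loop.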
This is a follow-up to my original post: How to create a new variable based on condition from different dataframe in R
I have 2 data frames from an experiment. The 1st df records a (roughly) continuous signal over 40 mins. There are 5 columns: columns 1:3 are binary, saying whether a button was pushed. The 4th column is binary and indicates whether the button in either column 2 or 3 was pushed. The 5th column is an approximate time in seconds. Example from the df below:
initiate  left  right  l_or_r  time
0         0     1      1       2.8225
0         0     1      1       2.82375
0         0     1      1       2.82500
0         0     1      1       2.82625
1         0     0      0       16.8200
1         0     0      0       16.8212
etc.
The 2nd data frame is session info where each row is a trial, usually 100-150 rows depending on the day. I have a column that marks trial start time and another column that marks trial end time in seconds. I have another column that states whether or not the trial had an intervention. Example from df below (I omitted several irrelevant columns):
trial  control  t_start   t_end
1      0        16.64709  35.49431
2      0        41.81843  57.74304
3      0        65.54510  71.16612
4      0        82.65743  87.30914
11     3        187.0787  193.5898
12     0        200.0486  203.1883
30     3        415.1710  418.0405
etc.
For the 1st data frame, I want to create a column that indicates whether or not the button was pushed within a trial. If the button was indeed pushed within a trial, I need to label it based on intervention. This is based on those start and end times in the 2nd df, along with the control info. In this table, 0 = intervention and 3 = control.
I would like it to look something like this (iti = inter-trial, wt_int = within trial & intervention, wt_control = within trial & control):
initiate  left  right  l_or_r  time      trial_type
0         0     1      1       2.8225    iti
0         0     1      1       2.82375   iti
0         0     1      1       2.82500   iti
0         0     1      1       2.82625   iti
1         0     0      0       16.82000  wt_int
1         0     0      0       16.82125  wt_int
1         0     0      0       187.0800  wt_control
etc.
Going off previous recommendations, I've tried nested ifelse statements with no success. I can get it to label all of the trials as either "iti" or "wt_int" with different failed attempts, or an error at row 1037 (when it changes from iti to wt). From my original question I have a "trial" column now in my 1st df which I'm using for the following code. Perhaps there is a more straightforward approach that combines the original code?
Errors out part way through:
df %>%
rowwise() %>%
mutate(trial_type = ifelse(any(trial == "wt" & df2$control == 0,
ifelse(trial == "wt" & df2$control == 3,
"wt_omission", "iti"), "wt_odor")))
Also tried this, which labels all as wt_int:
df$trial_type <- ifelse(df$trial == 'wt' && df2$control == 0,
ifelse(df$trial == 'wt' && df2$control == 3,
"wt_control", "iti"), "wt_int")
Thank you!
You could use cut to create intervals and check whether a value falls into one of them:
library(dplyr)
df1 %>%
mutate(
check_1 = cut(time, breaks = c(df2$t_start, Inf), labels = FALSE), # Inf so times inside the last trial are not NA
check_2 = coalesce(cut(time, breaks = df2$t_end, labels = FALSE), 0),
check_3 = df2$control[check_1],
trial_type = case_when(
check_1 - check_2 == 1 & check_3 == 0 ~ "wt_int",
check_1 - check_2 == 1 & check_3 == 3 ~ "wt_control",
TRUE ~ "iti"
)
) %>%
select(-starts_with("check_"))
This returns
# A tibble: 7 x 6
initiate left right l_or_r time trial_type
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 0 0 1 1 2.82 iti
2 0 0 1 1 2.82 iti
3 0 0 1 1 2.82 iti
4 0 0 1 1 2.83 iti
5 1 0 0 0 16.8 wt_int
6 1 0 0 0 16.8 wt_int
7 1 0 0 0 187. wt_control
Data
df1 <- structure(list(initiate = c(0, 0, 0, 0, 1, 1, 1), left = c(0,
0, 0, 0, 0, 0, 0), right = c(1, 1, 1, 1, 0, 0, 0), l_or_r = c(1,
1, 1, 1, 0, 0, 0), time = c(2.8225, 2.82375, 2.825, 2.82625,
16.82, 16.8212, 187.08)), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -7L), spec = structure(list(
cols = list(initiate = structure(list(), class = c("collector_double",
"collector")), left = structure(list(), class = c("collector_double",
"collector")), right = structure(list(), class = c("collector_double",
"collector")), l_or_r = structure(list(), class = c("collector_double",
"collector")), time = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
df2 <- structure(list(trial = c(1, 2, 3, 4, 11, 12, 30), control = c(0,
0, 0, 0, 3, 0, 3), t_start = c(16.64709, 41.81843, 65.5451, 82.65743,
187.0787, 200.0486, 415.171), t_end = c(35.49431, 57.74304, 71.16612,
87.30914, 193.5898, 203.1883, 418.0405)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), spec = structure(list(
cols = list(trial = structure(list(), class = c("collector_double",
"collector")), control = structure(list(), class = c("collector_double",
"collector")), t_start = structure(list(), class = c("collector_double",
"collector")), t_end = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
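For comparison, the same labels can be produced without cut() by looking up, for each time, the last trial that started at or before it and then checking that trial's end time. A base R sketch, assuming the trials in df2 are sorted by t_start and do not overlap; label_trials() is a hypothetical helper:

```r
# Hypothetical helper: label each time as "iti", "wt_int" or "wt_control".
label_trials <- function(time, t_start, t_end, control) {
  i <- findInterval(time, t_start)      # last trial with t_start <= time (0 = none)
  j <- pmax(i, 1)                       # safe index for the lookups below
  inside <- i > 0 & time <= t_end[j]    # trial has started and not yet ended
  out <- rep("iti", length(time))
  out[inside & control[j] == 0] <- "wt_int"
  out[inside & control[j] == 3] <- "wt_control"
  out
}

# Values taken from the example data above
time    <- c(2.8225, 2.82375, 2.825, 2.82625, 16.82, 16.8212, 187.08)
t_start <- c(16.64709, 41.81843, 65.5451, 82.65743, 187.0787, 200.0486, 415.171)
t_end   <- c(35.49431, 57.74304, 71.16612, 87.30914, 193.5898, 203.1883, 418.0405)
control <- c(0, 0, 0, 0, 3, 0, 3)

trial_type <- label_trials(time, t_start, t_end, control)
```

With the data frames from the question this would be df1$trial_type <- label_trials(df1$time, df2$t_start, df2$t_end, df2$control). Unlike cut(), findInterval treats each start time as belonging to its trial (left-closed intervals).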
I have my data in wide format.
What is the easiest way to change it to long format without getting an X in front of the time variables?
Sample data:
structure(list(X1 = c("01/12/2019", "02/12/2019"), `00:30` = c(41.95,
39.689), `01:00` = c(44.96, 40.47), `01:30` = c(42.939, 38.95
), `02:00` = c(43.221, 40.46), `02:30` = c(44.439, 41.97)), class = "data.frame", row.names = c(NA,
-2L), spec = structure(list(cols = list(X1 = structure(list(), class = c("collector_character",
"collector")), `00:30` = structure(list(), class = c("collector_double",
"collector")), `01:00` = structure(list(), class = c("collector_double",
"collector")), `01:30` = structure(list(), class = c("collector_double",
"collector")), `02:00` = structure(list(), class = c("collector_double",
"collector")), `02:30` = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
With pivot_longer and pivot_wider from tidyr (assuming the sample data is stored in dat):
library(tidyr)
dat |>
pivot_longer(`00:30`:`02:30`, names_to = "time", values_to = "val") |>
pivot_wider(names_from = "X1", values_from = "val")
Output:
# A tibble: 5 x 3
time `01/12/2019` `02/12/2019`
<chr> <dbl> <dbl>
1 00:30 42.0 39.7
2 01:00 45.0 40.5
3 01:30 42.9 39.0
4 02:00 43.2 40.5
5 02:30 44.4 42.0
In this special case, you could transpose the numeric part of your data.frame and assign the column names:
df_new <- data.frame(t(df[,-1]))
colnames(df_new) <- df[, 1]
This returns a data.frame df_new:
01/12/2019 02/12/2019
00:30 41.950 39.689
01:00 44.960 40.470
01:30 42.939 38.950
02:00 43.221 40.460
02:30 44.439 41.970
Edit (Thanks to jay.sf)
For R >= 4.1, you could use the native pipe:
t(df[, -1]) |>
data.frame() |>
`colnames<-`(df[, 1])
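On the X prefix itself: if it comes from base R, it is added because data.frame() and read.csv() pass column names through make.names() when check.names = TRUE (the default), and a syntactic name cannot start with a digit. A minimal base R sketch that keeps the original headers with check.names = FALSE and then stacks the time columns into long format; the output column names date, time and value are illustrative choices:

```r
# Sample data with the original time headers; check.names = FALSE stops
# data.frame() from rewriting "00:30" to "X00.30".
df <- data.frame(X1 = c("01/12/2019", "02/12/2019"),
                 `00:30` = c(41.95, 39.689),
                 `01:00` = c(44.96, 40.47),
                 check.names = FALSE)

time_cols <- names(df)[-1]
# unlist() walks column by column, so dates cycle fastest and times repeat
long <- data.frame(date  = rep(df$X1, times = length(time_cols)),
                   time  = rep(time_cols, each = nrow(df)),
                   value = unlist(df[time_cols], use.names = FALSE))

names(data.frame(`00:30` = 1))   # what the default does: "X00.30"
```

The same argument exists on read.csv(), so reading the file with read.csv(..., check.names = FALSE) should avoid the prefix in the first place.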
I need a chart of accounts to stay in order when new accounts are added or dropped in future years. This is because in accounting the accounts are sorted by type (for example Asset, Liability, Equity), but the type is not explicit in the dataset. Here is an example of the code, which puts new "Accounts" from Year2 and Year3 at the bottom:
library(dplyr)
XYZCompany_Consolidated <- XYZCompany_Year1 %>%
full_join(XYZCompany_Year2, by = "Account") %>%
full_join(XYZCompany_Year3, by = "Account")
Example: this picture is just a simplified example. The orange highlight shows where the new accounts are going, the code I'm using is on the right, and the green is what I'm trying to achieve.
Perhaps I'm overthinking this problem but I find it hard to solve. Let's define some data first:
df_year1 <- structure(list(Account = c("Cash", "Accounts", "Loan1", "Auto",
"JaneDoe"), Year_1 = c(100, 1000, 20, 300, 500)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(Account = structure(list(), class = c("collector_character",
"collector")), Year_1 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
df_year2 <- structure(list(Account = c("Cash", "Accounts", "Loan1", "Auto",
"Laptop", "JaneDoe"), Year_2 = c(80, 1200, 50, 300, 500, 0)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), spec = structure(list(
cols = list(Account = structure(list(), class = c("collector_character",
"collector")), Year_2 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
df_year3 <- structure(list(Account = c("Cash", "Accounts", "Loan1", "Auto",
"Rent", "JaneDoe"), Year_3 = c(80, 1200, 50, 300, 1000, 0)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), spec = structure(list(
cols = list(Account = structure(list(), class = c("collector_character",
"collector")), Year_3 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Those are similar to the data shown in the OP's picture, e.g. df_year1 looks like
# A tibble: 5 x 2
Account Year_1
<chr> <dbl>
1 Cash 100
2 Accounts 1000
3 Loan1 20
4 Auto 300
5 JaneDoe 500
Next we transform those data a little bit, namely
library(dplyr)
library(tidyr)
df_y1 <- df_year1 %>%
mutate(Year = 1,
no = row_number()) %>%
rename(value = Year_1)
which returns
# A tibble: 5 x 4
Account value Year no
<chr> <dbl> <dbl> <int>
1 Cash 100 1 1
2 Accounts 1000 1 2
3 Loan1 20 1 3
4 Auto 300 1 4
5 JaneDoe 500 1 5
The new column no stores the account's original position, column Year stores the chart's year. All three data.frames are processed like this, so we get df_y1, df_y2, df_y3.
Finally we bind them together
bind_rows(df_y1, df_y2, df_y3) %>%
mutate(num_years = max(Year)) %>%
group_by(Account) %>%
mutate(rank = sum((num_years - n() + 1) * no), .keep = "unused") %>%
pivot_wider(names_from = Year, names_prefix = "Year_") %>%
arrange(rank) %>%
select(-rank) %>%
ungroup()
and calculate a rank for each account. The accounts are ordered by this rank. As a result, we get
# A tibble: 7 x 4
Account Year_1 Year_2 Year_3
<chr> <dbl> <dbl> <dbl>
1 Cash 100 80 80
2 Accounts 1000 1200 1200
3 Loan1 20 50 50
4 Auto 300 300 300
5 Laptop NA 500 NA
6 Rent NA NA 1000
7 JaneDoe 500 0 0
Note
I believe there are better approaches, but at least this works for the example data.
I'm not sure how stable the calculated rank is (different accounts can end up with the same rank, e.g. Laptop and Rent both get 15 here), so take care.
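A different idea that avoids the rank arithmetic: build one master ordering by inserting each new account just before the next already-known account that follows it in the year where it first appears (or at the end if there is none), then sort the joined table by that ordering. A base R sketch; merge_account_order() is a hypothetical helper, and the data are the account lists from df_year1 to df_year3 above:

```r
# Hypothetical helper: merge several ordered account lists into one ordering.
merge_account_order <- function(...) {
  master <- character(0)
  for (accts in list(...)) {
    for (k in seq_along(accts)) {
      if (accts[k] %in% master) next
      later  <- accts[-seq_len(k)]            # accounts listed after this one
      anchor <- later[later %in% master][1]   # first of them already placed (NA if none)
      pos <- if (is.na(anchor)) length(master) else match(anchor, master) - 1
      master <- append(master, accts[k], after = pos)
    }
  }
  master
}

y1 <- data.frame(Account = c("Cash", "Accounts", "Loan1", "Auto", "JaneDoe"),
                 Year_1 = c(100, 1000, 20, 300, 500))
y2 <- data.frame(Account = c("Cash", "Accounts", "Loan1", "Auto", "Laptop", "JaneDoe"),
                 Year_2 = c(80, 1200, 50, 300, 500, 0))
y3 <- data.frame(Account = c("Cash", "Accounts", "Loan1", "Auto", "Rent", "JaneDoe"),
                 Year_3 = c(80, 1200, 50, 300, 1000, 0))

ord <- merge_account_order(y1$Account, y2$Account, y3$Account)
# full outer join on Account (merge() sorts alphabetically), then reorder
consolidated <- Reduce(function(a, b) merge(a, b, by = "Account", all = TRUE),
                       list(y1, y2, y3))
consolidated <- consolidated[match(ord, consolidated$Account), ]
rownames(consolidated) <- NULL
```

In a dplyr pipeline the same ordering could be applied after the full_join calls with arrange(factor(Account, levels = ord)).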
I have a dataframe with the following structure:
Timestamp Value1 Value2 Problem1 Problem2
00:00 32 40 No No
00:05 12 55 No No
00:10 14 42 Yes No
00:15 50 33 Yes No
00:20 78 47 No No
Here Problem1 indicates whether there is a problem with Value1, and Problem2 whether there is a problem with Value2. If Problem1 is Yes, I need to replace Value1 by Value2. If both have problems, the values should remain unchanged.
My problem is that I don't know in advance how many Value and Problem columns I'll have. If there are more than 2, I need to replace each problematic value by the average of the values without problems.
So, in another example:
Timestamp Value1 Value2 Value3 Problem1 Problem2 Problem3
00:00 32 40 45 No No No
00:05 12 55 48 No No No
00:10 14 42 55 Yes No No
00:15 50 33 13 Yes No Yes
00:20 78 47 75 No No No
Here I'd need to replace Value1 at 00:10 by the average of Value2 and Value3. Also, I'd need to replace Value1 and Value3 at 00:15 by Value2.
I bet there is a more elegant solution.
library(tidyr)
library(dplyr)
df %>%
mutate(across(starts_with("Problem"), ~ .x == "Yes")) %>%
pivot_longer(-Timestamp, names_to = c("name", "id"), names_pattern = "(\\D+)(\\d+)") %>%
pivot_wider() %>%
group_by(Timestamp) %>%
mutate(Value = case_when(sum(Problem) == 0 | sum(Problem) == n() | !Problem ~ Value,
TRUE~ sum(Value * (1 - Problem))/sum(1-Problem))) %>%
pivot_longer(cols=c("Value", "Problem")) %>%
mutate(name = paste0(name,id), .keep="unused") %>%
pivot_wider() %>%
ungroup() %>%
mutate(across(starts_with("Problem"), ~ ifelse(.x == 1, "Yes", "No")))
returns
# A tibble: 5 x 7
Timestamp Value1 Problem1 Value2 Problem2 Value3 Problem3
<time> <dbl> <chr> <dbl> <chr> <dbl> <chr>
1 00'00" 32 No 40 No 45 No
2 05'00" 12 No 55 No 48 No
3 10'00" 48.5 Yes 42 No 55 No
4 15'00" 33 Yes 33 No 33 Yes
5 20'00" 78 No 47 No 75 No
What approach did I use?
Transform your Problem variables into logicals. R can use logicals in arithmetic (TRUE counts as 1, FALSE as 0); here they are also coerced to double once they are stacked together with the Value columns.
Turn your Value/Problem numbers into an id, so that for every timestamp there are several rows, each with a Value and a Problem.
Calculate the new value based on the number of problems and on whether the value itself is problematic.
Restore the shape of your data.frame.
Data
df <- structure(list(Timestamp = structure(c(0, 300, 600, 900, 1200
), class = c("hms", "difftime"), units = "secs"), Value1 = c(32,
12, 14, 50, 78), Value2 = c(40, 55, 42, 33, 47), Value3 = c(45,
48, 55, 13, 75), Problem1 = c("No", "No", "Yes", "Yes", "No"),
Problem2 = c("No", "No", "No", "No", "No"), Problem3 = c("No",
"No", "No", "Yes", "No")), problems = structure(list(row = 5L,
col = "Problem3", expected = "", actual = "embedded null",
file = "literal data"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -5L), spec = structure(list(
cols = list(Timestamp = structure(list(format = ""), class = c("collector_time",
"collector")), Value1 = structure(list(), class = c("collector_double",
"collector")), Value2 = structure(list(), class = c("collector_double",
"collector")), Value3 = structure(list(), class = c("collector_double",
"collector")), Problem1 = structure(list(), class = c("collector_character",
"collector")), Problem2 = structure(list(), class = c("collector_character",
"collector")), Problem3 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
I use the data.table package. I call your data given in the second example "din".
I hope this code helps you:
#I use the library data.table; get data in data.table format
library(data.table)
din <- data.table(din)
din[,Value1:=as.numeric(Value1)]
din[,Value2:=as.numeric(Value2)]
din[,Value3:=as.numeric(Value3)]
#set Values to NA if there is a Problem
din[Problem1=="Yes", Value1:=NA]
din[Problem2=="Yes", Value2:=NA]
din[Problem3=="Yes", Value3:=NA]
#print table with NA replaced if we have a Problem
#print(din)
# Timestamp Value1 Value2 Value3 Problem1 Problem2 Problem3
#1: 00:00 32 40 45 No No No
#2: 00:05 12 55 48 No No No
#3: 00:10 NA 42 55 Yes No No
#4: 00:15 NA 33 NA Yes No Yes
#5: 00:20 78 47 75 No No No
#use the mean function to replace if I have an NA in the table (just working if Timestamp is a unique id, otherwise you need to generate one and use this in the by argument)
din[is.na(Value1), Value1:=mean(c(Value2,Value3), na.rm=TRUE), by=Timestamp]
din[is.na(Value2), Value2:=mean(c(Value1,Value3), na.rm=TRUE), by=Timestamp]
din[is.na(Value3), Value3:=mean(c(Value1,Value2), na.rm=TRUE), by=Timestamp]
#print final table
#print(din)
# Timestamp Value1 Value2 Value3 Problem1 Problem2 Problem3
#1: 00:00 32.0 40 45 No No No
#2: 00:05 12.0 55 48 No No No
#3: 00:10 48.5 42 55 Yes No No
#4: 00:15 33.0 33 33 Yes No Yes
#5: 00:20 78.0 47 75 No No No
I made a sample for the data:
df <- data.frame(value1 = runif(10, min = 0, max = 100),
value2 = runif(10, min = 0, max = 100),
value3 = runif(10, min = 0, max = 100))
df_problem <- data.frame(problem1 = sample(c('yes','no'), 10, replace = T),
problem2 = sample(c('yes','no'), 10, replace = T),
problem3 = sample(c('yes','no'), 10, replace = T))
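Note that I separated the values from the problems. Then:
# problematic values become NA, the others 1 ('no' is replaced first so the 'yes' comparison never meets an NA)
df_problem[df_problem == 'no'] <- 1
df_problem[df_problem == 'yes'] <- NA
df_problem <- matrix(as.numeric(unlist(df_problem)), nrow = nrow(df)) # rebuild as a numeric matrix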
Finally:
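df <- df * df_problem # problematic values become NA
for (i in seq_len(nrow(df))) {
row_vals <- unlist(df[i, ])
if (anyNA(row_vals)) {
# replace each NA with the mean of the row's non-problematic values
df[i, is.na(row_vals)] <- mean(row_vals, na.rm = TRUE)
}
}
df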
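None of the answers above keeps a row unchanged when every value in it is flagged, which the question asks for. A vectorised base R sketch that handles any number of ValueN/ProblemN pairs and leaves fully flagged rows alone; impute_problem_values() is a hypothetical helper, and it assumes every ValueN has a matching ProblemN column containing "Yes"/"No":

```r
# Hypothetical helper: replace flagged ValueN entries by the mean of the
# unflagged values in the same row; rows where every value is flagged stay as-is.
impute_problem_values <- function(df) {
  value_cols   <- grep("^Value[0-9]+$", names(df), value = TRUE)
  problem_cols <- sub("^Value", "Problem", value_cols)  # assumes matching suffixes
  vals <- as.matrix(df[value_cols])
  bad  <- as.matrix(df[problem_cols]) == "Yes"

  good <- vals
  good[bad] <- NA
  row_mean <- rowMeans(good, na.rm = TRUE)   # NaN when a row has no usable value

  fix <- bad & !is.nan(row_mean)             # row_mean recycles down each column
  vals[fix] <- row_mean[row(vals)[fix]]      # row mean for every flagged cell
  df[value_cols] <- as.data.frame(vals)
  df
}

# Second example from the question
df <- data.frame(Timestamp = c("00:00", "00:05", "00:10", "00:15", "00:20"),
                 Value1 = c(32, 12, 14, 50, 78),
                 Value2 = c(40, 55, 42, 33, 47),
                 Value3 = c(45, 48, 55, 13, 75),
                 Problem1 = c("No", "No", "Yes", "Yes", "No"),
                 Problem2 = c("No", "No", "No", "No", "No"),
                 Problem3 = c("No", "No", "No", "Yes", "No"))
out <- impute_problem_values(df)
```

With two columns the row mean of the single unflagged value is just that value, so this also covers the "replace Value1 by Value2" case.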
I am at a loss. I feel as if I am close to the answer, but I have been unable to find it. I really don't know where to start.
I have the results from ranked voting:
It shows how many votes (100 voters) each candidate (5 candidates) got for each place (reproducible data is at the bottom):
Name 1st 2nd 3rd 4th 5th
Can1 50 0 15 25 10
Can2 15 25 0 10 50
Can3 25 50 10 0 15
Can4 0 10 50 15 25
Can5 10 15 0 0 0
I am trying to recreate ballots from the results, 100 ballots like this (also, some ballots have not been filled completely):
Ballot1: Can1, Can3, Can4, Can5, Can2
Ballot2: Can1, Can3, Can5
Ballot3: Can3, Can5, Can2, Can1, Can4
...
Ballot100: Can2, Can5, Can1, Can4
I need to do this with 60 candidates and more than 1000 votes.
voting.results <- structure(list(X1 = c("Can1", "Can2", "Can3", "Can4", "Can5"),
`1place` = c(50L, 15L, 25L, 0L, 10L), `2place` = c(0L, 25L,
50L, 10L, 15L), `3place` = c(15L, 0L, 10L, 50L, 0L), `4place` = c(25L,
10L, 0L, 15L, 0L), `5place` = c(10L, 50L, 15L, 25L, 0L)), .Names = c("X1",
"1place", "2place", "3place", "4place", "5place"), class = "data.frame", row.names = c(NA,
-5L), spec = structure(list(cols = structure(list(X1 = structure(list(), class = c("collector_character",
"collector")), `1place` = structure(list(), class = c("collector_integer",
"collector")), `2place` = structure(list(), class = c("collector_integer",
"collector")), `3place` = structure(list(), class = c("collector_integer",
"collector")), `4place` = structure(list(), class = c("collector_integer",
"collector")), `5place` = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("X1", "1place", "2place", "3place",
"4place", "5place")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
At the beginning it would be nice to have a dataset with each candidate and all of his/her placements. What is done below just repeats each (candidate, place) row by the number of times it occurred in voting.results.
df1 holds the number of votes for each candidate at each place.
library(magrittr);library(dplyr)
df1 <-
voting.results %>%
reshape2::melt() %>%
mutate( variable = as.integer(gsub("place","",variable) )) %>%
rename(place=variable,can=X1)
head(df1)
# can place value
# 1 Can1 1 50
# 2 Can2 1 15
# 3 Can3 1 25
# 4 Can4 1 0
# 5 Can5 1 10
# 6 Can1 2 0
And df2 is a dataset with one row per vote (candidate, place).
df2 <-
df1[ rep(row.names(df1), df1$value) , ] %>%
mutate(id = 1:n()) %>%
select(-value) %>%
arrange(place)
head(df2)
# can place id
# 1 Can1 1 1
# 2 Can1 1 2
# 3 Can1 1 3
# 4 Can1 1 4
# 5 Can1 1 5
# 6 Can1 1 6
We know that every event (ballot) has a winner, so we can initialise each separate event with a first-place candidate (assuming there are no ties, i.e. no ex aequo). Then, in every iteration of for( e in 1:length(events) ), we add candidates at the other places. Added candidates are subtracted from the initial dataset.
If some entries from the initial dataset are not assigned (nrow(temp) > 0), the process is repeated until it finishes successfully.
i <- 0
temp <- data.frame(1)
while(nrow(temp)>0){
i <- i + 1
temp <- df2[ sample(1:nrow(df2)),]
events <- temp %>% filter(place==1) %>% split(1:nrow(.))
for( e in 1:length(events) ){
for( p in setdiff( sort( unique(temp$place) ), 1 ) ){ # place 1 is already set for each event
inAlready <- events[[e]]
toInput <-
temp %>%
filter( !can %in% inAlready$can & place == p) %>%
.[1,]
events[[e]] <- rbind( inAlready , toInput )
}
events[[e]]$event <- e
idToExclude <- lapply( events , function(x) x$id) %>% unlist
temp %<>% filter(!id %in% idToExclude)
}
}
all <-
bind_rows(events) %>%
arrange(event, place) %>%
filter(!is.na(id))
I don't know if this is a perfect solution, or how many iterations are necessary, but I hope it helps you find one. In any case, there is probably more than one valid final solution, so exact reproducibility may be impossible. I'm curious whether there is some operations-research method to solve this problem.
Enjoy and good luck!
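Whichever reconstruction method is used, it is worth checking that the generated ballots reproduce the original table. A minimal base R sketch of that inverse step, counting how often each candidate appears at each place; tally_ballots() is a hypothetical helper, and ballots are assumed to be character vectors in rank order (partial ballots allowed, all names present in candidates):

```r
# Hypothetical helper: tally ballots back into a candidate x place count table.
tally_ballots <- function(ballots, candidates, n_places) {
  counts <- matrix(0L, nrow = length(candidates), ncol = n_places,
                   dimnames = list(candidates, paste0(seq_len(n_places), "place")))
  for (b in ballots) {
    for (p in seq_along(b)) {
      counts[b[p], p] <- counts[b[p], p] + 1L   # candidate b[p] ranked at place p
    }
  }
  counts
}

# The first three ballots shown in the question
ballots <- list(c("Can1", "Can3", "Can4", "Can5", "Can2"),
                c("Can1", "Can3", "Can5"),
                c("Can3", "Can5", "Can2", "Can1", "Can4"))
counts <- tally_ballots(ballots, paste0("Can", 1:5), n_places = 5)
```

For a full reconstruction, all(tally_ballots(ballots, voting.results$X1, 5) == as.matrix(voting.results[-1])) should be TRUE.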