Azure Data Explorer (Kusto): combine rows by column

I want to update (extend) the Process column so that it reflects whether or not the process has ended.
For example, for the input table below, the query should return:
| Id | Process | Note |
|----|------------|------|
| 1 | Completed | |
| 1 | Completed | |
| 1 | Completed | Id 1: the last Process state was End |
| 2 | InProgress | |
| 2 | InProgress | Id 2: the last Process state was not End |
| 3 | InProgress | |
| 3 | InProgress | |
| 3 | InProgress | |
| 3 | InProgress | Id 3: the last Process state was not End |
Input table:
datatable(Id:int, Process:string, UpdateTime: datetime)
[
1, "Initiate", datetime(2020-02-02 12:00:00),
1, "Start", datetime(2020-02-02 13:00:00),
1, "End", datetime(2020-02-02 14:00:00),
2, "Initiate", datetime(2020-02-02 12:00:00),
2, "Start", datetime(2020-02-02 13:00:00),
3, "Initiate", datetime(2020-02-02 12:00:00),
3, "Start", datetime(2020-02-02 13:00:00),
3, "End", datetime(2020-02-02 14:00:00),
3, "Reopen", datetime(2020-02-02 15:00:00),
]

You could try something along the following lines:
datatable(Id:int, Process:string, UpdateTime: datetime)
[
1, "Initiate", datetime(2020-02-02 12:00:00),
1, "Start", datetime(2020-02-02 13:00:00),
1, "End", datetime(2020-02-02 14:00:00),
2, "Initiate", datetime(2020-02-02 12:00:00),
2, "Start", datetime(2020-02-02 13:00:00),
3, "Initiate", datetime(2020-02-02 12:00:00),
3, "Start", datetime(2020-02-02 13:00:00),
3, "End", datetime(2020-02-02 14:00:00),
3, "Reopen", datetime(2020-02-02 15:00:00),
]
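| order by Id asc, UpdateTime asc
// a new session starts whenever Id changes; the 365d limits are wide enough
// that each Id forms a single session in this sample
| extend session_start = row_window_session(UpdateTime, 365d, 365d, Id != prev(Id))
| as hint.materialized = true T
// find the Process value with the latest UpdateTime for each Id and attach it to every row
| lookup (
    T
    | summarize arg_max(UpdateTime, Process) by session_start, Id
    | project Id, LastProcess = Process
) on Id
// map the last state to the requested label
| project Id, Process = case(LastProcess == "End", "Completed", "InProgress")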
This returns:
| Id | Process |
|----|------------|
| 1 | Completed |
| 1 | Completed |
| 1 | Completed |
| 2 | InProgress |
| 2 | InProgress |
| 3 | InProgress |
| 3 | InProgress |
| 3 | InProgress |
| 3 | InProgress |
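For comparison, the same idea (sort by Id and UpdateTime, take each Id's latest Process, and label every row of that Id) can be sketched in base R, the language of the other posts here. This is a hypothetical translation, not Kusto code: the data frame name events is an assumption, and it drops the row_window_session step on the assumption that each Id forms one session, as it does in this sample.

```r
# Hypothetical R translation of the KQL query above (not Kusto code).
events <- data.frame(
  Id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  Process = c("Initiate", "Start", "End", "Initiate", "Start",
              "Initiate", "Start", "End", "Reopen"),
  UpdateTime = as.POSIXct(c("2020-02-02 12:00:00", "2020-02-02 13:00:00",
                            "2020-02-02 14:00:00", "2020-02-02 12:00:00",
                            "2020-02-02 13:00:00", "2020-02-02 12:00:00",
                            "2020-02-02 13:00:00", "2020-02-02 14:00:00",
                            "2020-02-02 15:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Sort by Id, then by time, so the last row of each Id holds its latest state
events <- events[order(events$Id, events$UpdateTime), ]

# ave() returns, for every row, the Process of the last row in the same Id
last_state <- ave(events$Process, events$Id, FUN = function(p) p[length(p)])

result <- data.frame(
  Id = events$Id,
  Process = ifelse(last_state == "End", "Completed", "InProgress"),
  stringsAsFactors = FALSE
)
result
```

Id 3 ends with Reopen, so all of its rows are labelled InProgress, matching the Kusto output.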

Related

R Reshape and Select Max

HAVE = data.frame("WEEK"=c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
"STUDENT"=c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3),
"CLASS"=c('A', 'A', 'B', 'B', 'C', 'C', 'H', 'A', 'A', 'B', 'B', 'C', 'C', 'H', 'A', 'A', 'B', 'B', 'C', 'C', 'H', 'A', 'A', 'B', 'B', 'C', 'C', 'H', 'A', 'A', 'B', 'B', 'C', 'C', 'H', 'A', 'A', 'B', 'B', 'C', 'C', 'H'),
"TEST"=c(1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1),
"SCORE"=c(93, 97, 72, 68, 93, 51, 19, 88, 56, 53, 79, 69, 90, 61, 74, 50, 76, 97, 55, 63, 63, 59, 68, 77, 80, 52, 94, 74, 64, 74, 92, 98, 89, 84, 54, 51, 82, 86, 51, 90, 72, 86))
WANT = data.frame("WEEK"=c(1,1,1,2,2,2),
"STUDENT"=c(1,2,3,1,2,3),
"CLASS"=c('A','A','B','B','B','C'),
"H"=c(19,61,63,74,54,86),
"TEST1"=c(93,88,76,77,92,90),
"TEST2"=c(97,56,97,80,98,72))
I want to group by WEEK and STUDENT and, for each WEEK/STUDENT combination, find the CLASS with the maximum SCORE among the rows where TEST equals 1. Then I want the corresponding SCORE for TEST 2 in that same CLASS. In other words, I want to transform HAVE into WANT, and ALSO add a column H that equals the SCORE where CLASS is H.
We can reshape to 'wide' with pivot_wider, then, grouped by 'WEEK' and 'STUDENT', create the 'H' column from the 'TEST1' value where 'CLASS' is "H", and then slice the row with the maximum 'TEST1'.
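library(dplyr)
library(tidyr)
HAVE %>%
  # one row per WEEK/STUDENT/CLASS, with the two scores in TEST1 and TEST2
  pivot_wider(names_from = TEST, values_from = SCORE,
              names_glue = "TEST{TEST}") %>%
  group_by(WEEK, STUDENT) %>%
  # copy the group's H score onto every row
  mutate(H = TEST1[CLASS == "H"], .before = 'TEST1') %>%
  # keep class H out of the comparison, then take the row with the highest TEST1
  filter(CLASS != "H") %>%
  slice_max(n = 1, order_by = TEST1, with_ties = FALSE) %>%
  ungroup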
Output:
# A tibble: 6 × 6
WEEK STUDENT CLASS H TEST1 TEST2
<dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 1 1 A 19 93 97
2 1 2 A 61 88 56
3 1 3 B 63 76 97
4 2 1 B 74 77 80
5 2 2 B 54 92 98
6 2 3 C 86 90 72
Checking against WANT:
> WANT
WEEK STUDENT CLASS H TEST1 TEST2
1 1 1 A 19 93 97
2 1 2 A 61 88 56
3 1 3 B 63 76 97
4 2 1 B 74 77 80
5 2 2 B 54 92 98
6 2 3 C 86 90 72
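If you'd rather avoid dplyr/tidyr, a base R sketch of the same logic might look like the following. The helper name pick_best_class is hypothetical; like the answer, it keeps the first class when two classes tie on TEST1 (week 1, student 1 has A and C both at 93), and it assumes each WEEK/STUDENT group has exactly one H row. The sample below is a two-student subset of HAVE.

```r
# Hypothetical base R helper (not from the answer above): for each WEEK/STUDENT,
# pick the non-H CLASS with the highest TEST 1 score (the first one wins a tie),
# then look up that class's TEST 2 score and the group's H score.
pick_best_class <- function(have) {
  groups <- split(have, list(have$WEEK, have$STUDENT), drop = TRUE)
  rows <- lapply(groups, function(g) {
    t1 <- g[g$TEST == 1 & g$CLASS != "H", ]
    best <- t1$CLASS[which.max(t1$SCORE)]      # which.max returns the first maximum
    data.frame(WEEK = g$WEEK[1], STUDENT = g$STUDENT[1], CLASS = best,
               H = g$SCORE[g$CLASS == "H"][1],
               TEST1 = max(t1$SCORE),
               TEST2 = g$SCORE[g$TEST == 2 & g$CLASS == best][1],
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$WEEK, out$STUDENT), ]
  rownames(out) <- NULL
  out
}

# Students 1 and 3 from week 1 of HAVE
have <- data.frame(
  WEEK = 1,
  STUDENT = rep(c(1, 3), each = 7),
  CLASS = rep(c("A", "A", "B", "B", "C", "C", "H"), 2),
  TEST = rep(c(1, 2, 1, 2, 1, 2, 1), 2),
  SCORE = c(93, 97, 72, 68, 93, 51, 19,
            74, 50, 76, 97, 55, 63, 63),
  stringsAsFactors = FALSE
)
res <- pick_best_class(have)
res
```

Run on the full data, pick_best_class(HAVE) should reproduce the values in WANT.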

R: Labels with 0 Count

I am doing this research in RStudio and want to generate/tabulate labels that also show 0 counts. I am using the 'cro' and 'calc_cro' functions from the expss package.
My sample codes:
S4 <- cro(data$S4, list(total(), data$S3A_1, data$User, data$S4, data$S8rev, data$S5rev))
and
S9 <- calc_cro(data, mrset(S9_1 %to% S9_993_1),list(total(), S3A_1, User, S4, S8rev, S5rev))
For example, S4 is the variable code for size (1 - Small, 2 - Medium, 3 - Large).
The sample survey results contain only Small and Large respondents, so my output looks like this:
| Size | Total | Male | Female | ... |
|-------|-------|------|--------|-----|
| Small | 15 | 8 | 7 | ... |
| Large | 15 | 8 | 7 | ... |
Can someone help me modify my code so that it also shows labels with 0 counts, like this:
| Size | Total | Male | Female | ... |
|--------|-------|------|--------|-----|
| Small | 15 | 8 | 7 | ... |
| Medium | 0 | 0 | 0 | ... |
| Large | 15 | 8 | 7 | ... |
My current thinking is that this is not possible, because R can't determine the range of the labels (in my example, how would it know that the codes run from 1 to 3, and not up to 4, 5, or any other number?).
However, if I define the range in my code, would it be possible to make this work?
Here's my dput...
structure(list(RespID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
28, 29, 30), S4 = c(1, 3, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 3, 3,
1, 3, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1), Gender = c(1,
1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2,
2, 1, 2, 1, 1, 2, 2, 2), Area = c(2, 2, 2, 4, 1, 1, 3, 4, 1,
1, 1, 4, 2, 3, 1, 3, 3, 1, 3, 4, 1, 3, 3, 4, 4, 3, 1, 1, 4, 1
)), row.names = c(NA, -30L), class = c("tbl_df", "tbl", "data.frame"
))
With data labels:
S4 = {1 - Small, 2 - Medium, 3 - Large}
Gender = {1 - Male, 2 - Female}
Area = {1 - North, 2 - East, 3 - South, 4 - North}
I think you have misunderstood how the function is used; try:
library("expss")
S4 <- cro(data$S4, col_vars=list("S3A_1", "User", "S4", "S8rev", "S5rev"))
S4
# | | | S3A_1 | User | S4 | S8rev | S5rev |
# | ------- | ------------ | ----- | ---- | -- | ----- | ----- |
# | data$S4 | 1 | 15 | 15 | 15 | 15 | 15 |
# | | 3 | 15 | 15 | 15 | 15 | 15 |
# | | #Total cases | 30 | 30 | 30 | 30 | 30 |
I'm not very sure, though, what you're actually asking.
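On the question's idea of declaring the range up front: in base R, that is what factor() with explicit levels does, and table() then keeps unused levels with a count of 0. The sketch below illustrates this on the S4 and Gender columns from the dput, using the labels listed in the question. It is plain base R, not the expss API, so it may not slot directly into the cro/calc_cro calls above.

```r
# Base R illustration (not the expss API): declaring every code in
# factor(levels = ...) keeps unused categories, so they are counted as 0.
data <- data.frame(
  S4 = c(1, 3, 3, 3, 3, 3, 3, 1, 3, 1, 1, 1, 3, 3, 1,
         3, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1),
  Gender = c(1, 1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 1, 1,
             1, 2, 1, 1, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2)
)
size <- factor(data$S4, levels = 1:3, labels = c("Small", "Medium", "Large"))
gender <- factor(data$Gender, levels = 1:2, labels = c("Male", "Female"))

# Cross-tabulate, then prepend a Total column (row sums)
tab <- table(Size = size, Gender = gender)
out <- cbind(Total = rowSums(tab), unclass(tab))
out
```

Medium appears with 0 in every column even though no respondent has S4 = 2, because the full set of codes was declared in levels.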

Within rows of data frame, find first occurrence and longest sequence of value

Consider this data frame, which provides the scored responses on a 15-item test for 10 individuals:
library(tidyverse)
input <- tribble(
~ID, ~i1, ~i2, ~i3, ~i4, ~i5, ~i6, ~i7, ~i8, ~i9, ~i10, ~i11, ~i12, ~i13, ~i14, ~i15,
"A", 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
"B", 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
"C", 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
"D", 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0,
"E", 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0,
"F", 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
"G", 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
"H", 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,
"I", 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
"J", 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1
)
I want R to go row by row and scan the cells in each row from left to right to create these new columns:
first_0_name: the name of the column containing the first occurrence of the value 0
first_0_loc: the position of the column containing the first occurrence of the value 0 (counting the ID column as column 1)
streak_1: starting from the first occurrence of 0, find the next occurrence of 1, and then count how many consecutive 1s occur before the next 0.
The new columns should look like this:
new_cols <- tribble(
~first_0_name, ~first_0_loc, ~streak_1,
"i9", 10, 5,
"i4", 5, 4,
"i6", 7, 8,
"i8", 9, 4,
"i9", 10, 5,
NA, NA, NA,
"i1", 2, 5,
"i3", 4, 8,
"i2", 3, NA,
"i3", 4, 2
)
Thanks in advance for any help!
If you want to use base R more directly and avoid the cost of reshaping the whole data frame, you can work row by row with apply(). This solution also keeps the original row order without extra ordering columns (unlike the tidyverse solution).
results <- apply(input, 1, function(x) {
  # apply() turns each row into a character vector (ID first), so positions
  # count the ID column, and x == 0 / x == 1 match the strings "0" / "1"
  zeros <- which(x == 0)
  # exit early if no zeros are found
  if (length(zeros) == 0) {
    return(data.frame(first_0_name = NA, first_0_loc = NA, streak_1 = NA))
  }
  first.idx <- unname(zeros[1])   # location of first zero
  first.name <- names(zeros)[1]   # name of first 0 column
  # first 1 after the first zero; the streak ends at the next zero (or the row end)
  ones.after <- unname(which(x == 1 & seq_along(x) > first.idx))
  streak <- NA
  if (length(ones.after) > 0) {
    start <- ones.after[1]
    next.zero <- unname(zeros[zeros > start][1])
    streak <- if (is.na(next.zero)) length(x) + 1 - start else next.zero - start
  }
  data.frame(first_0_name = first.name, first_0_loc = first.idx, streak_1 = streak)
})
output <- do.call(rbind, results)
output
   first_0_name first_0_loc streak_1
1            i9          10        5
2            i4           5        4
3            i6           7        8
4            i8           9        4
5            i9          10        5
6          <NA>          NA       NA
7            i1           2        5
8            i3           4        8
9            i2           3       NA
10           i3           4        2
Alternatively, with the tidyverse, combine two summaries: each ID's first zero, and the length of each ID's first post-0 streak of 1s.
input_tidy <- input %>%
gather(col, val, -ID) %>%
group_by(ID) %>%
arrange(ID) %>%
mutate(col_num = row_number() + 1)
input[,1] %>%
# Combine with summary of each ID's first zero
left_join(input_tidy %>% filter(val == 0) %>%
summarize(first_0_name = first(col),
first_0_loc = first(col_num))) %>%
# Combine with length of each ID's first post-0 streak of 1's
left_join(input_tidy %>%
filter(val == 1 & cumsum(val == 1 & lag(val, default = 1) == 0) == 1) %>%
summarize(streak_1 = n()))
# A tibble: 10 x 4
ID first_0_name first_0_loc streak_1
<chr> <chr> <dbl> <int>
1 A i9 10 5
2 B i4 5 4
3 C i6 7 8
4 D i8 9 4
5 E i9 10 5
6 F NA NA NA
7 G i1 2 5
8 H i3 4 8
9 I i2 3 NA
10 J i3 4 2
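An option using melt from data.table:
library(data.table)
# long format: one row per ID and item, with the items in order within each ID
melt(setDT(input), id.vars = 'ID')[, .(
  first_0_name = as.character(variable[value == 0][1]),
  first_0_loc = which(value == 0)[1] + 1,
  # the run counter goes up at each 0 -> 1 change, so run 2 starts at the first
  # 1 after the first 0; count only the 1s in that run
  streak_1 = sum(cumsum(c(TRUE, diff(value == 0) < 0)) == 2 & value == 1)), ID
][streak_1 == 0, streak_1 := NA_integer_][]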
A base R option using apply and rle:
do.call(rbind, apply(input[-1], 1, function(x) {
  first_0_loc <- unname(which(x == 0)[1] + 1)   # +1 for the ID column
  first_0_name <- names(x)[first_0_loc - 1]
  rl <- rle(x)
  # keep only the runs counted before the second run of 0s: the first run of 0s
  # and the run of 1s that follows it (plus any leading 0 run)
  rl1 <- within.list(rl, {
    i1 <- cumsum(values == 0) == 1
    values <- values[i1]
    lengths <- lengths[i1]})
  streak_1 <- unname(rl1$lengths[2])
  data.frame(first_0_name, first_0_loc, streak_1)}))
#   first_0_name first_0_loc streak_1
#1 i9 10 5
#2 i4 5 4
#3 i6 7 8
#4 i8 9 4
#5 i9 10 5
#6 <NA> NA NA
#7 i1 2 5
#8 i3 4 8
#9 i2 3 NA
#10 i3 4 2
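The tricky cases here are consecutive zeros (row D), a row with no zeros (F), and no 1 after the first zero (I). A hypothetical helper that handles one row at a time with rle() makes those cases easy to check in isolation; first_zero_streak and the row_* vectors are illustrative names, and the rows are copied from input.

```r
# Hypothetical helper: x is one row of 0/1 item scores named i1, i2, ...
# (ID column excluded). Returns the first-0 column name, its position counting
# the ID column as column 1, and the length of the first run of 1s after that 0.
first_zero_streak <- function(x) {
  zeros <- which(x == 0)
  if (length(zeros) == 0) {
    return(list(first_0_name = NA_character_, first_0_loc = NA_integer_,
                streak_1 = NA_integer_))
  }
  after <- x[-seq_len(zeros[1])]              # values after the first 0
  runs <- rle(as.vector(after))
  ones <- runs$lengths[runs$values == 1]      # lengths of the runs of 1s
  list(first_0_name = names(x)[zeros[1]],
       first_0_loc = unname(zeros[1]) + 1L,   # +1 for the ID column
       streak_1 = if (length(ones) > 0) ones[1] else NA_integer_)
}

items <- paste0("i", 1:15)
row_D <- setNames(c(1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0), items)
row_F <- setNames(rep(1, 15), items)
row_I <- setNames(c(1, rep(0, 14)), items)
str(first_zero_streak(row_D))
```

Assuming input is still the original tibble (setDT() in the data.table answer converts it in place), something like do.call(rbind, lapply(seq_len(nrow(input)), function(k) as.data.frame(first_zero_streak(unlist(input[k, -1]))))) should apply it to every row in the original order.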

Advanced if/then/loop function to create new columns

I am learning R (focused on the tidyverse packages) and am hoping that someone could help with the following problem that has me stumped.
I have a data-set that looks similar to the following:
library("tibble")
myData <- tribble(
~id, ~r1, ~r2, ~r3, ~r4, ~r5, ~r6, ~r7, ~r8, ~r9, ~r10, ~r11, ~r12, ~r13, ~r14, ~r15, ~r16,
"A", 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
"B", 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
"C", 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2,
"D", 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1,
"E", 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
)
Basically, I have multiple rows of respondent data, and each respondent gave 16 responses of either "1" or "2".
For each respondent (i.e., each row) I would like to create an additional three columns:
The first new column - called "switchCount" - identifies the number of times the respondent switched from a "2" response to a "1" response.
The second new column - called "switch1" - identifies the index of the first time the respondent switched from a "2" response to a "1" response.
The third new column - called "switch2" - identifies the index of the final time the respondent switched from a "2" response to a "1" response.
If there is no switch and all values are "2", then return the index of 1.
If there is no switch and all values are "1", then return the index of 16.
The final datatable should therefore look like this:
myData <- tribble(
~id, ~r1, ~r2, ~r3, ~r4, ~r5, ~r6, ~r7, ~r8, ~r9, ~r10, ~r11, ~r12, ~r13, ~r14, ~r15, ~r16, ~switchCount, ~switch1, ~switch2,
"A", 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 1, 1,
"B", 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4,
"C", 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 3, 9,
"D", 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 3, 6, 15,
"E", 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 16, 16
)
One approach is to concatenate all response columns row-wise and then find the occurrences of "2;1" with gregexpr:
library(dplyr)
myData %>%
  rowwise() %>%
  # e.g. "2;2;2;1;...": response k starts at character position 2k - 1
  mutate(concat_cols = paste(r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14,r15,r16,sep=";"),
         # gregexpr() returns -1 when there is no match
         switchCount = ifelse(gregexpr("2;1", concat_cols)[[1]][1] == -1,
                              0,
                              length(gregexpr("2;1", concat_cols)[[1]])),
         # a match at position 2k - 1 maps back to response k via floor(p/2) + 1
         switch1 = ifelse(switchCount == 0,
                          ifelse(grepl("2",concat_cols), 1, 16),
                          min(floor(gregexpr("2;1", concat_cols)[[1]]/2)+1)),
         switch2 = ifelse(switchCount == 0,
                          ifelse(grepl("2",concat_cols), 1, 16),
                          max(floor(gregexpr("2;1", concat_cols)[[1]]/2)+1))) %>%
  select(-concat_cols)
Output is:
id r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 switchCount switch1 switch2
1 A 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 1 1
2 B 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4
3 C 2 2 2 1 1 1 2 2 2 1 1 1 1 2 2 2 2 3 9
4 D 1 1 2 2 2 2 1 1 2 2 1 1 1 2 2 1 3 6 15
5 E 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 16 16
Sample data:
myData <- structure(list(id = c("A", "B", "C", "D", "E"), r1 = c(2, 2,
2, 1, 1), r2 = c(2, 2, 2, 1, 1), r3 = c(2, 2, 2, 2, 1), r4 = c(2,
2, 1, 2, 1), r5 = c(2, 1, 1, 2, 1), r6 = c(2, 1, 1, 2, 1), r7 = c(2,
1, 2, 1, 1), r8 = c(2, 1, 2, 1, 1), r9 = c(2, 1, 2, 2, 1), r10 = c(2,
1, 1, 2, 1), r11 = c(2, 1, 1, 1, 1), r12 = c(2, 1, 1, 1, 1),
r13 = c(2, 1, 1, 1, 1), r14 = c(2, 1, 2, 2, 1), r15 = c(2,
1, 2, 2, 1), r16 = c(2, 1, 2, 1, 1), switchCount = c(0, 1,
2, 3, 0), switch1 = c(1, 4, 3, 6, 16), switch2 = c(1, 4,
9, 15, 16)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
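Without string manipulation, the same three columns can be computed directly from adjacent pairs of responses: a switch at position k means response k is 2 and response k + 1 is 1. The base R sketch below uses a hypothetical helper, switch_summary, and applies the question's no-switch rule as shown in the expected output (1 if the row contains a 2, otherwise the row length, 16).

```r
# Hypothetical helper: a switch at position k means r[k] == 2 and r[k + 1] == 1.
switch_summary <- function(r) {
  sw <- which(head(r, -1) == 2 & tail(r, -1) == 1)
  if (length(sw) == 0) {
    # no switch: rows containing a 2 get 1, otherwise the row length (16)
    idx <- if (any(r == 2)) 1 else length(r)
    return(c(switchCount = 0, switch1 = idx, switch2 = idx))
  }
  c(switchCount = length(sw), switch1 = min(sw), switch2 = max(sw))
}

# The five respondents' r1..r16 values from the sample data
responses <- rbind(
  A = rep(2, 16),
  B = c(rep(2, 4), rep(1, 12)),
  C = c(2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2),
  D = c(1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1),
  E = rep(1, 16)
)
out <- t(apply(responses, 1, switch_summary))
out
```

On the question's input myData (before the switch columns are added), something like cbind(myData, t(apply(as.matrix(myData[paste0("r", 1:16)]), 1, switch_summary))) should attach the three columns.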

Finding all ids available at a particular time

I have a binary matrix that indicates whether a person (ID) is available to do a job at a given time. The example matrix is:
08:00 08:30 09:00 09:30 10:00 10:30 11:00 11:30 12:00 12:30 13:00 13:30 14:00 14:30 15:00 15:30 16:00 16:30 17:00 17:30 18:00 18:30 19:00 19:30
1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 0 0 0
2 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 0 0 0
3 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 0 0
4 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 0 0
5 0 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 0
6 0 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0 0 0
The row names are the IDs, the columns are half-hour time slots, and a 1 means the ID is available in that slot. In the example, IDs 1 and 2 start work at 8:00 and have break periods at 10:30-11:00 and 13:00-13:30. IDs 3 and 4, who start half an hour later, take breaks at 11:00-11:30 and 13:30-14:00. This ensures that somebody is available to do a job that can start at any time.
dput(matrix)
structure(c(1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0), .Dim = c(6L, 24L), .Dimnames = list(c("1", "2", "3", "4",
"5", "6"), c("08:00", "08:30", "09:00", "09:30", "10:00", "10:30",
"11:00", "11:30", "12:00", "12:30", "13:00", "13:30", "14:00",
"14:30", "15:00", "15:30", "16:00", "16:30", "17:00", "17:30",
"18:00", "18:30", "19:00", "19:30")))
Another dataset has the IDs with their start times:
data1 <- data.frame(ID = 1:6, Start_Time = c("8:00", "8:00", "8:30",
"8:30", "9:00", "9:30"), stringsAsFactors=FALSE)
A third dataset has the start and end times for each task:
data2 <- data.frame(Start = c("8:01", "9:35", "10:42", "11:25", "14:22",
"17:20", "18:19"), End = c("8:22", "9:42", "11:20", "11:32",
"14:35", "18:15", "18:25"), stringsAsFactors=FALSE)
I am trying to create a column in data2 that gives the IDs available to do each task, based on its Start time. The expected output is:
data2$IdsAvail <- c("1, 2", "1, 2, 3, 4, 5, 6", "3, 4, 5, 6",
"1, 2, 5, 6", "1, 2, 3, 4", "3, 4, 5, 6", NA)
It would look like this:
data2
Start End IdsAvail
1 8:01 8:22 1, 2
2 9:35 9:42 1, 2, 3, 4, 5, 6
3 10:42 11:20 3, 4, 5, 6
4 11:25 11:32 1, 2, 5, 6
5 14:22 14:35 1, 2, 3, 4
6 17:20 18:15 3, 4, 5, 6
7 18:19 18:25 <NA>
I tried to match the IDs with the times in the matrix but couldn't find a way. Two jobs could also arrive within the time frame in which one person is doing a job; I am not taking that into account here. This is just to get the initial IDs available based on the matrix.
EDIT: The solution below by @Audiophile works for the example, but on my data it throws a warning about duplicates at this step:
availability <- merge(availability,data2,by.x = 'time',by.y = 'slot',all.y = T)
I had to use allow.cartesian to make it work on the original dataset. My dataset has about 2000 rows, and after the merge it has more than 20000 rows. On this example, too, the merge step gives a different number of rows than either 'availability' or 'data2'. Is there any other method, e.g. using foverlaps from data.table?
Identify the slots in which each person is available, and then merge them with the task list:
library(tidyr)
library(dplyr)
#Convert your availability matrix (mat1) to a data frame
df <- as.data.frame(mat1)
df$ID <- rownames(df)
#Reshape the availability dataset
availability <- df %>%
gather(time,available,-ID) %>%
filter(available==1) %>%
mutate(time = as.POSIXct(time,format = "%H:%M"))
data1$Start_Time <- as.POSIXct(data1$Start_Time,format = "%H:%M")
data2$Start <- as.POSIXct(data2$Start,format = "%H:%M")
#Use start times to refine availability dataset
availability <- merge(availability,data1,by = "ID")
availability <- availability %>%
filter(time>=Start_Time) %>%
select(ID,time)
#Round task time to nearest half hour slot
data2$slot <- as.POSIXct(floor(as.double(data2$Start)/1800)*1800,
format = "%H:%M",origin = as.POSIXct('1970-01-01',tz='UTC'))
availability <- merge(availability,data2,by.x = 'time',by.y = 'slot',all.y = T)
availability <- availability %>%
select(Start,End,ID) %>%
arrange(Start,ID) %>%
group_by(Start,End) %>%
summarise(IdsAvail = toString(ID)) %>%
ungroup() %>%
mutate(Start = format(Start,"%H:%M"))
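Since the question asks for an alternative to the cartesian merge, one option is to skip the join entirely: for each task, find the half-hour column that contains its Start time and read the available IDs straight from the matrix. The base R sketch below does that with findInterval(); to_min and ids_available are hypothetical helper names, and it assumes the matrix rows are in the same order as data1 and the columns are sorted by time. It is not a foverlaps() solution, and a task starting after 20:00 would still be mapped to the last (19:30) column.

```r
# Hypothetical helper: convert "H:MM" or "HH:MM" strings to minutes after midnight
to_min <- function(hhmm) {
  parts <- do.call(rbind, strsplit(hhmm, ":", fixed = TRUE))
  as.integer(parts[, 1]) * 60L + as.integer(parts[, 2])
}

# Hypothetical helper: for each task start, find the slot that contains it and
# list the IDs marked 1 in that slot whose shift has already started.
ids_available <- function(avail, start_times, task_starts) {
  slot_min <- to_min(colnames(avail))
  id_start <- to_min(start_times)
  vapply(task_starts, function(s) {
    col <- findInterval(to_min(s), slot_min)   # 0 if s is before the first slot
    if (col == 0) return(NA_character_)
    ok <- avail[, col] == 1 & id_start <= slot_min[col]
    if (any(ok)) toString(rownames(avail)[ok]) else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

# Data from the question
avail <- structure(c(1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0), .Dim = c(6L, 24L), .Dimnames = list(c("1", "2", "3", "4",
"5", "6"), c("08:00", "08:30", "09:00", "09:30", "10:00", "10:30",
"11:00", "11:30", "12:00", "12:30", "13:00", "13:30", "14:00",
"14:30", "15:00", "15:30", "16:00", "16:30", "17:00", "17:30",
"18:00", "18:30", "19:00", "19:30")))
data1 <- data.frame(ID = 1:6, Start_Time = c("8:00", "8:00", "8:30",
  "8:30", "9:00", "9:30"), stringsAsFactors = FALSE)
data2 <- data.frame(Start = c("8:01", "9:35", "10:42", "11:25", "14:22",
  "17:20", "18:19"), End = c("8:22", "9:42", "11:20", "11:32",
  "14:35", "18:15", "18:25"), stringsAsFactors = FALSE)

data2$IdsAvail <- ids_available(avail, data1$Start_Time, data2$Start)
data2
```

Because each task looks up a single column, the result always has one row per task, so the row count cannot grow the way it does after the merge.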
