KQL: calculate the timespan of a sequence - azure-data-explorer

I have a table with two columns: Ts, which represents a datetime, and Index. I want to calculate the total timespan of each continuous sequence of indexes.
To do that, I used the scan operator to calculate the timespan:
let t = datatable(Ts: datetime, Index: int)
[
    datetime(2022-12-1), 1,
    datetime(2022-12-5), 2,
    datetime(2022-12-6), 3,
    datetime(2022-12-2), 10,
    datetime(2022-12-3), 11,
    datetime(2022-12-3), 12,
    datetime(2022-12-1), 18,
    datetime(2022-12-1), 19,
];
t
| sort by Index asc
| scan declare (startTime: datetime, index: int, totalTime: timespan) with
(
    step inSession: true => startTime = iff(isnull(inSession.startTime), Ts, inSession.startTime), index = Index;
    step endSession: Index != inSession.index + 1 => totalTime = Ts - inSession.startTime;
)
But I get:
Ts                    Index  startTime             index  totalTime
2022-12-01T00:00:00Z  1      2022-12-01T00:00:00Z  1
2022-12-05T00:00:00Z  2      2022-12-01T00:00:00Z  2
2022-12-06T00:00:00Z  3      2022-12-01T00:00:00Z  3
2022-12-02T00:00:00Z  10                                  1.00:00:00
2022-12-02T00:00:00Z  10     2022-12-02T00:00:00Z  10
2022-12-03T00:00:00Z  11                                  2.00:00:00
2022-12-03T00:00:00Z  11     2022-12-02T00:00:00Z  11
2022-12-03T00:00:00Z  12                                  2.00:00:00
2022-12-03T00:00:00Z  12     2022-12-02T00:00:00Z  12
2022-12-01T00:00:00Z  18                                  -1.00:00:00
2022-12-01T00:00:00Z  18     2022-12-01T00:00:00Z  18
2022-12-01T00:00:00Z  19                                  00:00:00
2022-12-01T00:00:00Z  19     2022-12-01T00:00:00Z  19
Instead, the desired result is:
Ts                    Index  startTime             index  totalTime
2022-12-01T00:00:00Z  1      2022-12-01T00:00:00Z  1
2022-12-05T00:00:00Z  2      2022-12-01T00:00:00Z  2
2022-12-06T00:00:00Z  3      2022-12-01T00:00:00Z  3
2022-12-02T00:00:00Z  10                                  5.00:00:00
2022-12-02T00:00:00Z  10     2022-12-02T00:00:00Z  10
2022-12-03T00:00:00Z  11     2022-12-02T00:00:00Z  11
2022-12-03T00:00:00Z  12     2022-12-02T00:00:00Z  12
2022-12-01T00:00:00Z  18                                  1.00:00:00
2022-12-01T00:00:00Z  18     2022-12-01T00:00:00Z  18
2022-12-01T00:00:00Z  19     2022-12-01T00:00:00Z  19
2022-12-01T00:00:00Z  19                                  00:00:00
What is wrong with my query? How can I get the desired result?

The runs of consecutive indexes can be identified without scan: after sorting, Index - row_number() is constant within each run, so it can serve as a group id:
let t = datatable(Ts: datetime, Index: int)
[
    datetime(2022-12-1), 1,
    datetime(2022-12-5), 2,
    datetime(2022-12-6), 3,
    datetime(2022-12-2), 10,
    datetime(2022-12-3), 11,
    datetime(2022-12-3), 12,
    datetime(2022-12-1), 18,
    datetime(2022-12-1), 19,
];
t
| sort by Index asc
| summarize rows = count(), min(Ts), max(Ts), min(Index), max(Index) by group_id = Index - row_number()
| extend totalTime = max_Ts - min_Ts
| project-away group_id
| order by min_Index asc
rows  min_Ts                max_Ts                min_Index  max_Index  totalTime
3     2022-12-01T00:00:00Z  2022-12-06T00:00:00Z  1          3          5.00:00:00
3     2022-12-02T00:00:00Z  2022-12-03T00:00:00Z  10         12         1.00:00:00
2     2022-12-01T00:00:00Z  2022-12-01T00:00:00Z  18         19         00:00:00

Related

How can I write a command in R that groups by multiple criteria?

I am looking for a function with which I can classify my data into five different industries given their SIC codes:
Permno SIC Industry
1 854
2 977
3 549
4 1231
5 3295
6 2000
7 1539
8 2549
9 3950
10 4758
11 4290
12 5498
13 5248
14 142
15 3209
16 2759
17 4859
18 2569
19 739
20 4529
It could be that all SICs between 100-200 and 400-700 should be in Industry 1, all SICs between 300-350 and 980-1020 should be in Industry 2, etc.
So in short: an "if-or" function where I could list all the SICs that match a given industry.
Thank you!
You can add a new column and fill it by filtering on the SIC values.
For example:
data$Group <- 0
data$Group[data$SIC < 1000] <- 1
data$Group[data$SIC >= 1000] <- 2
Floor the value after dividing the SIC value by 1000:
df$Industry <- floor(df$SIC/1000) + 1
df
# Permno SIC Industry
#1 1 854 1
#2 2 977 1
#3 3 549 1
#4 4 1231 2
#5 5 3295 4
#6 6 2000 3
#7 7 1539 2
#8 8 2549 3
#9 9 3950 4
#10 10 4758 5
#11 11 4290 5
#12 12 5498 6
#13 13 5248 6
#14 14 142 1
#15 15 3209 4
#16 16 2759 3
#17 17 4859 5
#18 18 2569 3
#19 19 739 1
#20 20 4529 5
If there is no way to programmatically define groups you may need to individually define the ranges. It is convenient to do this with case_when in dplyr.
library(dplyr)
df %>%
  mutate(Industry = case_when(
    between(SIC, 100, 200) | between(SIC, 400, 700)  ~ 'Industry 1',
    between(SIC, 300, 350) | between(SIC, 980, 1020) ~ 'Industry 2'))
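Rows that match none of the listed conditions come out as NA. If a default label is preferred instead, case_when accepts a final catch-all clause; a small extension of the code above (the 'Other' label is just a placeholder, not something the OP specified):
df %>%
  mutate(Industry = case_when(
    between(SIC, 100, 200) | between(SIC, 400, 700)  ~ 'Industry 1',
    between(SIC, 300, 350) | between(SIC, 980, 1020) ~ 'Industry 2',
    TRUE ~ 'Other'))  # catch-all for SICs outside all listed ranges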

Select the first range of time values in order to add AM/PM

I have a dataframe (df) in which df$time has time values like the following:
df$id df$time
1 12:20
2 12:40
3 1:00
4 1:20
5 2:00
6 3:00
7 3:15
8 4:00
9 7:00
10 11:00
11 12:00
12 12:20
13 12:40
14 1:00
15 1:30
16 3:00
17 4:00
18 4:30
19 5:00
20 5:15
21 8:00
22 10:00
What I want is to indicate that the first range of time values (id 1:10), from 12:00 up to 11:59, is AM, and the second range (id 11:22) is PM, so as to have something like:
df$id df$time
1 12:20am
2 12:40am
....
.....
11 12:00pm
12 12:20pm
I have thousands of tables, so I am thinking of a loop that would somehow mark the first set as AM (where df$time[i] < 12:00, or df$time[i] < 1:00 and i is less than some minimum), but I am not sure whether there is a more effective solution, something that would establish that the first range of values is id 1:10 and the second range is id 11:22.
This should be fairly quick.
df <- read.table(text="
id time
1 12:20
2 12:40
3 1:00
4 1:20
5 2:00
6 3:00
7 3:15
8 4:00
9 7:00
10 11:00
11 12:00
12 12:20
13 12:40
14 1:00
15 1:30
16 3:00
17 4:00
18 4:30
19 5:00
20 5:15
21 8:00
22 10:00", header=TRUE, stringsAsFactors=FALSE)
# convert "h:mm" strings to decimal hours
hm2dh <- function(x) {
  hm <- do.call(rbind, strsplit(x, ":"))
  as.numeric(hm[, 1]) + as.numeric(hm[, 2]) / 60
}
# take the hours modulo 12 so that 12:xx maps below 1:00; the differences are
# then positive within each half-day and negative at the am -> pm jump, so the
# cumulative product of their signs flips from 1 ("am") to -1 ("pm") there
ampm <- c("pm", "am")[(cumprod(sign(diff(c(0, hm2dh(df$time) %% 12)))) + 3) / 2]
df$timep <- paste0(df$time, ampm)
df
# id time timep
# 1 1 12:20 12:20am
# 2 2 12:40 12:40am
# 3 3 1:00 1:00am
# 4 4 1:20 1:20am
# 5 5 2:00 2:00am
# 6 6 3:00 3:00am
# 7 7 3:15 3:15am
# 8 8 4:00 4:00am
# 9 9 7:00 7:00am
# 10 10 11:00 11:00am
# 11 11 12:00 12:00pm
# 12 12 12:20 12:20pm
# 13 13 12:40 12:40pm
# 14 14 1:00 1:00pm
# 15 15 1:30 1:30pm
# 16 16 3:00 3:00pm
# 17 17 4:00 4:00pm
# 18 18 4:30 4:30pm
# 19 19 5:00 5:00pm
# 20 20 5:15 5:15pm
# 21 21 8:00 8:00pm
# 22 22 10:00 10:00pm
Here's a dplyr approach using the data you posted:
# example data
df <- read.table(text="
id time
1 12:20
2 12:40
3 1:00
4 1:20
5 2:00
6 3:00
7 3:15
8 4:00
9 7:00
10 11:00
11 12:00
12 12:20
13 12:40
14 1:00
15 1:30
16 3:00
17 4:00
18 4:30
19 5:00
20 5:15
21 8:00
22 10:00",
header=TRUE, stringsAsFactors=FALSE)
library(dplyr)
# create a vectorised function to extract the hours
GetHrs = function(x) as.numeric(unlist(strsplit(x, ":"))[1])
GetHrs = Vectorize(GetHrs)
df %>%
  mutate(hr = GetHrs(time),  # get the hours
         group = cumsum(hr == 12 & lag(hr, default = 0) != 12),  # start a new group each time a 12 follows a non-12 hour
         time_upd = ifelse(group == 1, paste0(time, "AM"), paste0(time, "PM"))) %>%  # label based on the grouping
  select(id, time_upd)  # keep only the columns of interest
# id time_upd
# 1 1 12:20AM
# 2 2 12:40AM
# 3 3 1:00AM
# 4 4 1:20AM
# 5 5 2:00AM
# 6 6 3:00AM
# 7 7 3:15AM
# 8 8 4:00AM
# 9 9 7:00AM
# 10 10 11:00AM
# 11 11 12:00PM
# 12 12 12:20PM
# 13 13 12:40PM
# 14 14 1:00PM
# 15 15 1:30PM
# 16 16 3:00PM
# 17 17 4:00PM
# 18 18 4:30PM
# 19 19 5:00PM
# 20 20 5:15PM
# 21 21 8:00PM
# 22 22 10:00PM
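As an aside, Vectorize is not strictly needed here, since sub() is already vectorised; the hour extraction could equivalently be written in one line (an alternative sketch, not part of the original answer):
# same result as GetHrs above: drop the ':' and everything after it, then convert
GetHrs <- function(x) as.numeric(sub(":.*$", "", x))
GetHrs(c("12:20", "1:00"))
# [1] 12  1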
We could take the time values modulo 1200 to find the cut point where the difference turns negative. The rest can be done in a Map.
cp <- which(c(0, diff(as.numeric(gsub("\\D", "", df$time)) %% 1200)) < 0)
df$time <- unlist(Map(paste0, list(df$time[1:(cp-1)], df$time[cp:nrow(df)]), c("am", "pm")))
df
# id time
# 1 1 12:20am
# 2 2 12:40am
# 3 3 1:00am
# 4 4 1:20am
# 5 5 2:00am
# 6 6 3:00am
# 7 7 3:15am
# 8 8 4:00am
# 9 9 7:00am
# 10 10 11:00am
# 11 11 12:00pm
# 12 12 12:20pm
# 13 13 12:40pm
# 14 14 1:00pm
# 15 15 1:30pm
# 16 16 3:00pm
# 17 17 4:00pm
# 18 18 4:30pm
# 19 19 5:00pm
# 20 20 5:15pm
# 21 21 8:00pm
# 22 22 10:00pm
Data
df <- structure(list(id = 1:22, time = c("12:20", "12:40", "1:00",
"1:20", "2:00", "3:00", "3:15", "4:00", "7:00", "11:00", "12:00",
"12:20", "12:40", "1:00", "1:30", "3:00", "4:00", "4:30", "5:00",
"5:15", "8:00", "10:00")), row.names = c(NA, -22L), class = "data.frame")

Identify pattern in R, counting and assigning values accordingly

I'm new to R so this question might be quite basic.
There is a column in my data which goes like 4 4 4 4 7 7 7 13 13 13 13 13 13 13 4 4 7 7 7 13 13 13 13 13 13 13 13 4 4.....
One cycle of 4...7...13... is considered as one complete run, to which I will assign a Run Number (1, 2, 3...) to each run.
The number of times that each value (4, 7, 13) repeats is not fixed, and the total number of rows in a run is not fixed either. The total number of runs is unknown (but typically ranging from 60-90). The order of (4, 7, 13) is fixed.
I have attached my current code here. It works fine, but it takes a minute or two when there are a few million rows of data. I'm aware that growing vectors in a for loop is really not recommended in R, so I would like to ask if anyone has a more elegant solution to this.
Sample data can be generated with the code below; the desired output is shown after my current solution.
#Generates sample data
df <- data.frame(Temp = c(sample(50:250, 30)), Pres = c(sample(500:1000, 30)),
Message = c(rep(4, 3), rep(7, 2), rep(13, 6), rep(4, 4), rep(7, 1), rep(13, 7), rep(4, 3), rep(7, 4)))
Current Solution
prev_val = 0
Rcount = 1
Run_Count = c()
for (val in df$Message) {
  delta = prev_val - val
  if (delta == 9)
    Rcount = Rcount + 1
  prev_val = val
  Run_Count = append(Run_Count, Rcount)
}
df$Run = Run_Count
The desired output:
Temp Pres Message Run
226 704 4 1
138 709 4 1
136 684 4 1
57 817 7 1
187 927 7 1
190 780 13 1
152 825 13 1
126 766 13 1
202 855 13 1
214 757 13 1
172 922 13 1
50 975 4 2
159 712 4 2
212 802 4 2
181 777 4 2
102 933 7 2
165 753 13 2
67 962 13 2
119 631 13 2
The data frame will later be split by the Run Number, after the rows have been categorized according to the Message value (a sketch of this split follows the answers below), i.e.
... 4 1
... 4 1
... 4 1
... 4 1
... 4 2
... 4 2
... 4 2
... 4 3
.....
I am not sure if this is an improvement, but it uses rle (run-length encoding) to determine the length of each repeat in each run.
df <- data.frame(Temp = c(sample(50:250, 30)), Pres = c(sample(500:1000, 30)),
Message = c(rep(4, 3), rep(7, 2), rep(13, 6), rep(4, 4), rep(7, 1), rep(13, 7), rep(4, 3), rep(7, 4)))
rleout <- rle(df$Message)
# find the number of 4-7-13 cycles and create the numbering
runcounts <- ceiling(length(rleout$lengths) / 3)
runs <- rep(1:runcounts, each = 3)
# trim the run numbers for cases where there is not a full
# sequence, as in the test case
rleout$values <- runs[1:length(rleout$lengths)]
# create the new column
df$out <- inverse.rle(rleout)
I'm sure someone can come along and demonstrate a better and faster method using data.table.
Or simply use cumsum: a new run starts wherever Message decreases (the 13 -> 4 drop), so counting the negative first differences yields the run number.
df$runID <- cumsum(c(-1,diff(df$Message)) < 0)
# Temp Pres Message runID
# 1 174 910 4 1
# 2 181 612 4 1
# 3 208 645 4 1
# 4 89 601 7 1
# 5 172 812 7 1
# 6 213 672 13 1
# 7 137 848 13 1
# 8 153 833 13 1
# 9 127 591 13 1
# 10 243 907 13 1
# 11 146 599 13 1
# 12 151 567 4 2
# 13 139 855 4 2
# 14 147 793 4 2
# 15 227 533 4 2
# 16 241 959 7 2
# 17 206 948 13 2
# 18 236 875 13 2
# 19 133 537 13 2
# 20 70 688 13 2
# 21 218 528 13 2
# 22 244 927 13 2
# 23 161 697 13 2
# 24 177 572 4 3
# 25 179 911 4 3
# 26 192 559 4 3
# 27 60 771 7 3
# 28 245 682 7 3
# 29 196 614 7 3
# 30 171 536 7 3
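Once a run column exists (runID from the answer above, or Run from the OP's loop), the split the question describes, first by Message value and then by run, can be done with base R's split(); a minimal sketch, assuming the runID column:
# one data frame per (Message, run) combination, named "4.1", "7.1", "13.1", "4.2", ...
groups <- split(df, list(df$Message, df$runID))
groups <- Filter(nrow, groups)  # drop empty Message/run combinations
groups[["4.2"]]                 # all the Message == 4 rows belonging to run 2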

Aggregate daily level data to weekly level in R

I have a huge dataset similar to the following reproducible sample data.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
I want to aggregate this data to weekly level to get the output similar to the following:
Interval value
1 Week 2, June 2012 *aggregate value for day 10 to day 14 of June 2012*
2 Week 3, June 2012 *aggregate value for day 15 to day 21 of June 2012*
3 Week 4, June 2012 *aggregate value for day 22 to day 28 of June 2012*
4 Week 5, June 2012 *aggregate value for day 29 to day 30 of June 2012*
5 Week 1, July 2012 *aggregate value for day 1 to day 7 of July 2012*
6 Week 2, July 2012 *aggregate value for day 8 to day 10 of July 2012*
How do I achieve this easily without writing long code?
If you mean the sum of 'value' by week, I think the easiest way is to convert the data into an xts object, as GSee suggested:
library(xts)
data <- as.xts(data$value, order.by = as.Date(data$Interval))
weekly <- apply.weekly(data, sum)
[,1]
2012-06-10 552
2012-06-17 23629
2012-06-24 23872
2012-07-01 23667
2012-07-08 23552
2012-07-10 10902
I leave the formatting of the output as an exercise for you :-)
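For completeness, one way that formatting might be finished, assuming the xts object weekly from above (a sketch only: apply.weekly buckets end on calendar-week boundaries, so each label is derived from the bucket's end date, using the days 1-7 = week 1 convention discussed further below):
ends <- index(weekly)
data.frame(Interval = sprintf("Week %d, %s",
                              (as.numeric(format(ends, "%d")) - 1) %/% 7 + 1,
                              format(ends, "%b %Y")),
           value = as.numeric(weekly))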
If you were to use week from lubridate, you would only get five weeks to pass to by. Assume dat is your data,
> library(lubridate)
> do.call(rbind, by(dat$value, week(dat$Interval), summary))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 24 552 4146 4188 3759 4529 4850
# 25 490 2498 4256 3396 4438 5156
# 26 564 2578 4206 3355 4346 4866
# 27 698 993 4868 3366 5122 5770
# 28 671 1086 3200 3200 5314 5726
This shows a summary for the 24th through 28th week of the year. Similarly, we can get the means with aggregate with
> aggregate(value~week(Interval), data = dat, mean)
# week(Interval) value
# 1 24 3758.667
# 2 25 3396.286
# 3 26 3355.000
# 4 27 3366.429
# 5 28 3199.500
I just came across this old question because it was used as a dupe target.
Unfortunately, all the upvoted answers (except the one by konvas and a now deleted one) present solutions for aggregating the data by week of the year, while the OP has requested to aggregate by week of the month.
The definition of week of the year and week of the month is ambiguous as discussed here, here, and here.
However, the OP has indicated that he wants to count days 1 to 7 of each month as week 1 of the month, days 8 to 14 as week 2 of the month, etc. Note that week 5 is a stub of only 2 or 3 days for most months (except February in non-leap years).
Having prepared the ground, here is a data.table solution for this kind of aggregation:
library(data.table)
DT[, .(value = sum(value)),
   by = .(Interval = sprintf("Week %i, %s",
                             (mday(Interval) - 1L) %/% 7L + 1L,
                             format(Interval, "%b %Y")))]
Interval value
1: Week 2, Jun 2012 18366
2: Week 3, Jun 2012 24104
3: Week 4, Jun 2012 23348
4: Week 5, Jun 2012 5204
5: Week 1, Jul 2012 23579
6: Week 2, Jul 2012 11573
We can verify that we have picked the correct intervals by
DT[, .(value = sum(value),
       date_range = toString(range(Interval))),
   by = .(Week = sprintf("Week %i, %s",
                         (mday(Interval) - 1L) %/% 7L + 1L,
                         format(Interval, "%b %Y")))]
Week value date_range
1: Week 2, Jun 2012 18366 2012-06-10, 2012-06-14
2: Week 3, Jun 2012 24104 2012-06-15, 2012-06-21
3: Week 4, Jun 2012 23348 2012-06-22, 2012-06-28
4: Week 5, Jun 2012 5204 2012-06-29, 2012-06-30
5: Week 1, Jul 2012 23579 2012-07-01, 2012-07-07
6: Week 2, Jul 2012 11573 2012-07-08, 2012-07-10
which is in line with OP's specification.
Data
library(data.table)
DT <- fread(
"rn Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176", drop = 1L)
DT[, Interval := as.Date(Interval)]
If you are using a data frame, you can easily do this with the tidyquant package. Use the tq_transmute function, which applies a mutation and returns a new data frame. Select the "value" column and apply the xts function apply.weekly. The additional argument FUN = sum will get the aggregate by week.
library(tidyquant)
df
#> # A tibble: 31 x 2
#> Interval value
#> <date> <int>
#> 1 2012-06-10 552
#> 2 2012-06-11 4850
#> 3 2012-06-12 4642
#> 4 2012-06-13 4132
#> 5 2012-06-14 4190
#> 6 2012-06-15 4186
#> 7 2012-06-16 1139
#> 8 2012-06-17 490
#> 9 2012-06-18 5156
#> 10 2012-06-19 4430
#> # ... with 21 more rows
df %>%
  tq_transmute(select = value,
               mutate_fun = apply.weekly,
               FUN = sum)
#> # A tibble: 6 x 2
#> Interval value
#> <date> <int>
#> 1 2012-06-10 552
#> 2 2012-06-17 23629
#> 3 2012-06-24 23872
#> 4 2012-07-01 23667
#> 5 2012-07-08 23552
#> 6 2012-07-10 10902
When you say "aggregate" the values, you mean take their sum? Let's say your data frame is d and assuming d$Interval is of class Date, you can try
# if d$Interval is not of class Date:
# d$Interval <- as.Date(d$Interval)
formatdate <- function(date)
  paste0("Week ", (as.numeric(format(date, "%d")) - 1) %/% 7 + 1,
         ", ", format(date, "%b %Y"))
# change "mean" to your required function
aggregate(d$value, by = list(formatdate(d$Interval)), mean)
# Group.1 x
# 1 Week 1, Jul 2012 3725.667
# 2 Week 2, Jul 2012 3199.500
# 3 Week 2, Jun 2012 3544.000
# 4 Week 3, Jun 2012 3434.000
# 5 Week 4, Jun 2012 3333.143
# 6 Week 5, Jun 2012 3158.667

Running Total example

I have the following Data:
id customer date value1 value2 isTrue
10 13 2013-08-20 00:00:00.0000 170 180680 0
11 13 2013-09-02 00:00:00.0000 190 181830 0
12 13 2013-09-07 00:00:00.0000 150 183000 1
13 13 2013-09-14 00:00:00.0000 150 183930 0
14 13 2013-09-16 00:00:00.0000 150 184830 0
15 13 2013-09-19 00:00:00.0000 150 185765 1
16 13 2013-09-30 00:00:00.0000 800 187080 0
17 13 2013-10-02 00:00:00.0000 100 188210 0
28 13 2013-10-04 00:00:00.0000 380 188250 1
How can I get the following results, where SumValue1 is the running sum of value1 up to each row where isTrue is 1 (resetting afterwards), and DifferenceValue2 is the difference of value2 between consecutive rows where isTrue is 1?
id customer date value1 value2 isTrue SumValue1 DifferenceValue2
10 13 2013-08-20 00:00:00.0000 170 180680 0
11 13 2013-09-02 00:00:00.0000 190 181830 0
12 13 2013-09-07 00:00:00.0000 150 183000 1 510 2320
13 13 2013-09-14 00:00:00.0000 150 183930 0
14 13 2013-09-16 00:00:00.0000 150 184830 0
15 13 2013-09-19 00:00:00.0000 150 185765 1 450 2765
16 13 2013-09-30 00:00:00.0000 800 187080 0
17 13 2013-10-02 00:00:00.0000 100 188210 0
28 13 2013-10-04 00:00:00.0000 380 188250 1 1280 2485
Assuming id ordering, this query will do:
SELECT
  id, customer, date, value1, value2, isTrue,
  CASE isTrue WHEN 1 THEN (SELECT TOTAL(value1) FROM t WHERE customer=t2.customer AND id>t2.prev_id AND id<=t2.id) END AS SumValue1,
  CASE isTrue WHEN 1 THEN value2 - (SELECT value2 FROM t WHERE customer=t2.customer AND id=t2.prev_id) END AS DifferenceValue2
FROM (SELECT *,
             CASE isTrue WHEN 1 THEN COALESCE((SELECT id FROM t AS _ WHERE customer=t.customer AND date<t.date AND isTrue ORDER BY date DESC LIMIT 1), -1) END AS prev_id
      FROM t) AS t2;
Step by step:
The previous id where isTrue holds is given by:
SELECT id FROM t AS _ WHERE customer=t.customer AND date<t.date AND isTrue ORDER BY date DESC LIMIT 1
Using COALESCE(..., -1) ensures a non-null id (-1) that precedes all others.
SELECT *, CASE isTrue WHEN 1 THEN ... END AS prev_id FROM t returns all rows from t with the column prev_id added.
Finally, evaluating SELECT TOTAL(value1) FROM t WHERE customer=t2.customer AND id>t2.prev_id AND id<=t2.id and value2-(SELECT value2 FROM t WHERE customer=t2.customer AND id=t2.prev_id) over the previous result returns the desired columns.
