SQL's lead equivalent in KQL - azure-data-explorer

I am trying to achieve something similar to SQL's lead in KQL. The query in SQL would look like below -
Select lead(changedBy,1,null) Over (partition by incidentId order by historyId) nextChangedBy
from IncidentHistory IH, Incidents I
where I.IncidentId = IH.IncidentId
and I.UpdatedDate > trunc(sysdate) -30;
Could someone let me know how I can achieve the same in Kusto Query Language? I don't see any lead function in KQL.

At least at this point, KQL is more imperative than SQL in the area of window functions.
We need to partition the data and order it, and then we can use next(), which is the equivalent of SQL's lead().
P.S.
Please note that in SQL there is no need for the elaborate syntax of lead(changedBy,1,null).
Since 1 & null are the defaults, lead(changedBy) is enough.
Having said that, if needed, KQL does have an equivalent syntax.
let Incidents = datatable(incidentId:int, UpdatedDate:datetime)
[
1 ,datetime(2022-09-15 01:02:03)
,2 ,datetime(2022-10-01 04:05:06)
,3 ,datetime(2022-10-07 07:08:09)
];
let IncidentHistory = datatable(incidentId:int, historyId:int, changedBy:string)
[
1 , 10 , "CB1"
,2 , 20 , "CB2"
,2 , 50 , "CB3"
,2 , 60 , "CB4"
,2 , 80 , "CB5"
,3 , 30 , "CB6"
,3 , 40 , "CB7"
,3 , 70 , "CB8"
];
Incidents
| where UpdatedDate > startofday(ago(30d))
| join kind=inner IncidentHistory on incidentId
| partition hint.strategy=native by incidentId
(
order by historyId asc
| extend nextChangedBy = next(changedBy)
)
incidentId  UpdatedDate           incidentId1  historyId  changedBy  nextChangedBy
3           2022-10-07T07:08:09Z  3            30         CB6        CB7
3           2022-10-07T07:08:09Z  3            40         CB7        CB8
3           2022-10-07T07:08:09Z  3            70         CB8
2           2022-10-01T04:05:06Z  2            20         CB2        CB3
2           2022-10-01T04:05:06Z  2            50         CB3        CB4
2           2022-10-01T04:05:06Z  2            60         CB4        CB5
2           2022-10-01T04:05:06Z  2            80         CB5
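For readers who want to see what the partition / order / next() combination computes, here is a minimal Python sketch of the same idea. The helper name add_lead and its parameters are hypothetical, chosen only for illustration; the offset and default arguments mirror the optional arguments of lead().

```python
from itertools import groupby
from operator import itemgetter

def add_lead(rows, partition_key, order_key, value_key, offset=1, default=None):
    """Return copies of rows with a 'next_<value_key>' field, computed per partition."""
    out = []
    # Sort by partition first, then by the ordering column inside each partition.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        group = list(group)
        for i, row in enumerate(group):
            j = i + offset
            # Past the end of the partition there is no next row: use the default.
            nxt = group[j][value_key] if j < len(group) else default
            out.append({**row, "next_" + value_key: nxt})
    return out

history = [
    {"incidentId": 2, "historyId": 20, "changedBy": "CB2"},
    {"incidentId": 2, "historyId": 50, "changedBy": "CB3"},
    {"incidentId": 3, "historyId": 30, "changedBy": "CB6"},
    {"incidentId": 3, "historyId": 40, "changedBy": "CB7"},
    {"incidentId": 3, "historyId": 70, "changedBy": "CB8"},
]
result = add_lead(history, "incidentId", "historyId", "changedBy")
```

The last row of each partition gets the default value, which corresponds to the empty nextChangedBy cells in the output above.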

Related

Filter rows by other rows with shared columns in KQL

I have a table with 3 columns
Group  Type   Index
A      Short  1
A      Short  2
A      Long   3
A      Short  4
B      Short  1
...
I want to extract all the rows of a given group, but exclude any row for which a row with the same type and Index - 1 also exists. For example, if I query group A, I want to get:
Group  Type   Index
A      Short  1
A      Long   3
A      Short  4
Here the Short row with Index 2 was removed, since Index 1 already exists with the same type.
I tried this query:
traces
| where Group == "A"
| extend OutterType = Type
| where (Index-1) !in((
traces
| where Group == "A" and OutterType == Type
| project Index))
But it says that OutterType doesn't exist in the context of the inner query.
How can I filter those rows?
let t = datatable(Group:string, Type:string, Index:int)
[
"A" ,"Short" ,1
,"A" ,"Short" ,2
,"A" ,"Long" ,3
,"A" ,"Short" ,4
,"B" ,"Short" ,1
];
t
| join kind=leftanti (t | extend Index = Index + 1) on Group, Type, Index
Group  Type   Index
B      Short  1
A      Short  1
A      Short  4
A      Long   3
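The anti-join above keeps a row only when no row with the same Group and Type has Index - 1. A minimal Python sketch of that same filter, using a set lookup in place of the join (the function name is hypothetical):

```python
rows = [
    ("A", "Short", 1),
    ("A", "Short", 2),
    ("A", "Long", 3),
    ("A", "Short", 4),
    ("B", "Short", 1),
]

def drop_rows_with_predecessor(rows):
    """Keep a row only if (Group, Type, Index - 1) is not present in the data."""
    existing = set(rows)
    return [(g, t, i) for (g, t, i) in rows if (g, t, i - 1) not in existing]

kept = drop_rows_with_predecessor(rows)
```

Only ("A", "Short", 2) is dropped, because ("A", "Short", 1) exists; the row order of the input is preserved, unlike the join output, whose order is not guaranteed.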
datatable(Group:string, Type:string, Index:int)
[
"A" ,"Short" ,1
,"A" ,"Short" ,2
,"A" ,"Long" ,3
,"A" ,"Short" ,4
,"B" ,"Short" ,1
]
| extend GroupType = strcat(Group, ":", Type)
| partition by GroupType
(
project-away GroupType
| order by Index asc
| where Index - 1 != prev(Index)
)
Group  Type   Index
A      Long   3
B      Short  1
A      Short  1
A      Short  4

kusto: Calculating duration

I have a Kusto table that receives data every 2 hours. I need to find the start and end time for a given series of data. The end time should be determined by the system, by detecting that data is no longer present for that message.
Eg:
Id Name Timestamp
1 A 07-12-2022T04:05:00z
2 A 07-12-2022T06:05:00z
3 A 07-12-2022T08:05:00z
4 A 07-12-2022T12:05:00z
In the above example, we received data at 4, 6 and 8, and it is missing for 10. I need to show the start time as 04:05:00 and the end time as 10:00:00 (here the system should detect the gap and fill it in), and then one more start time of 12:05:00z without an end time, as the current time is less than 2 hours from 12:05:00.
let p_max_allowed_gap = 2h;
datatable(Id:int, Name:string, Timestamp:datetime)
[
1 ,"A" ,"2022-07-12T04:05:00z"
,2 ,"A" ,"2022-07-12T06:05:00z"
,3 ,"A" ,"2022-07-12T08:05:00z"
,4 ,"A" ,"2022-07-12T12:05:00z"
]
| partition hint.strategy=native by Name
(
order by Timestamp asc
| extend session_id = row_cumsum(iff(Timestamp - prev(Timestamp) > p_max_allowed_gap, 1, 0))
)
| summarize StartTime = min(Timestamp), EndTime = max(Timestamp) by Name, session_id
| extend EndTime = iff(now() - EndTime < p_max_allowed_gap, datetime(null), EndTime)
Name  session_id  StartTime             EndTime
A     0           2022-07-12T04:05:00Z  2022-07-12T08:05:00Z
A     1           2022-07-12T12:05:00Z  2022-07-12T12:05:00Z
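The query starts a new session whenever the gap to the previous row exceeds the allowed maximum, and blanks the end time of a session that is still recent. A minimal Python sketch of that logic for a single Name, with a hypothetical function name and a fixed, assumed value for the current time so the result is reproducible:

```python
from datetime import datetime, timedelta

def sessionize(timestamps, max_gap, now):
    """Group timestamps into sessions; a gap larger than max_gap starts a new one."""
    sessions = []  # each entry is [start, end]
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= max_gap:
            sessions[-1][1] = ts        # still within the allowed gap: extend
        else:
            sessions.append([ts, ts])   # first row, or gap too large: new session
    # A session whose last row is recent enough is still open, so it has no end time.
    return [(start, None if now - end < max_gap else end) for start, end in sessions]

stamps = [datetime(2022, 7, 12, h, 5) for h in (4, 6, 8, 12)]
# "now" is an assumed value, less than 2 hours after the last row.
sessions = sessionize(stamps, timedelta(hours=2), now=datetime(2022, 7, 12, 13, 0))
```

With the assumed current time, the second session has no end time. The table above shows an end time for it because the query was run long after 12:05.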

Query for the scenario

I have to implement a query where the scenario is something like below;
Col_1 Col_2 Col_3 Col_4 Col_5
A AB AC AD AE
B BC BD BE ?
C CD CE ? ?
D DE ? ? ?
E ? ? ? ?
Any help on the below is highly appreciated.
Thanks,
Amit
You need different rows for each column:
select
col_1
,col_1 || min(col_1) over (order by ... rows between 1 following and 1 following)
,col_1 || min(col_1) over (order by ... rows between 2 following and 2 following)
,col_1 || min(col_1) over (order by ... rows between 3 following and 3 following)
,col_1 || min(col_1) over (order by ... rows between 4 following and 4 following)
SELECT CHR(64+LEVEL) AS A,
DECODE(SIGN(4 - LEVEL), -1, '?', CHR(64+LEVEL)||CHR(65+LEVEL)) B,
DECODE(SIGN(3 - LEVEL), -1, '?', CHR(64+LEVEL)||CHR(66+LEVEL)) C,
DECODE(SIGN(2 - LEVEL), -1, '?', CHR(64+LEVEL)||CHR(67+LEVEL)) D,
DECODE(SIGN(1 - LEVEL), -1, '?', CHR(64+LEVEL)||CHR(68+LEVEL)) E
FROM dual CONNECT BY LEVEL < 6;
The above uses Oracle 10g; other databases will need a different strategy.
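Both answers pair each value with the values that follow it, and fall back to '?' when there is no such row. A minimal Python sketch of that pairing, with a hypothetical function name, in case the expected output is easier to check outside the database:

```python
def pair_matrix(values, width, filler="?"):
    """For each value, concatenate it with the 1st, 2nd, ... following values."""
    rows = []
    for i, v in enumerate(values):
        row = [v]
        for k in range(1, width):
            # No k-th following value: emit the filler instead.
            row.append(v + values[i + k] if i + k < len(values) else filler)
        rows.append(row)
    return rows

matrix = pair_matrix(list("ABCDE"), width=5)
for row in matrix:
    print(" ".join(row))
```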

How to calculate a row value based on the previous row value in the same column

I have the following data set:
DATE CODE RANK PARTITION
? ABS 0 1
12/04/2014 RET 1 1
20/04/2014 RET 2 1
01/05/2014 ABS 2 1
13/05/2014 RET 2 1
01/06/2015 ABS 2 1
09/10/2015 RETk 2 1
? ABS 0 2
02/04/2015 RET 1 2
03/04/2015 RET 2 2
04/04/2015 ABS 2 2
05/04/2015 STT 3 2
06/04/2015 RETk 4 2
07/04/2015 RETk 4 2
RANK is the column I want to calculate in my SQL given the columns DATE, CODE AND the previous value of the same column. It's initialized here to 0.
The logic I want to implement is as follows:
If RANK-1 (previous row) IS NULL AND CODE = ABS THEN RANK = 0
If RANK-1 (previous row) IS NULL AND CODE <> ABS THEN RANK <- (RANK-1) + 1
If RANK-1 = 0 or 1 AND CODE = RET THEN RANK <- (RANK-1) + 1
If RANK-1 = 2 AND CODE = STT THEN RANK <- (RANK-1) + 1
If RANK-1 = 3 AND CODE = RETk THEN RANK <- (RANK-1) + 1
If CODE = ABS THEN RANK <- (RANK-1) (previous row)
Else 0
The Teradata release I am using is R14. The calculation is done on a partition basis as shown in the example above. I have added some more constraints in the model to make it clearer. In this example, if the current code is RET, I do not increase the rank until the previous one is 0 or 1. Similarly, If my current code is RETk, I do not increase the rank until the previous one is equal to 3, otherwise, I do not change the rank. I repeat the same process in the following partition and so on ...
I cannot figure out how to update the current column value given the previous one... I tried many implementations of this logic with OLAP functions, without success.
Can anyone give me a hint?
Thank you very much for your help
You can always use a recursive query for tasks like this. But performance will be bad unless the number of rows per group is low.
First you need a way to advance to the next row. As the next row's date can't be calculated from the current row's date, you must materialize the data and add a ROW_NUMBER:
CREATE TABLE tab(dt DATE, CODE VARCHAR(10), rnk INT, part INT);
INSERT INTO tab( NULL,'ABS' ,0 , 1);
INSERT INTO tab(DATE'2014-04-12','RET' ,1 , 1);
INSERT INTO tab(DATE'2014-04-20','RET' ,2 , 1);
INSERT INTO tab(DATE'2014-05-01','ABS' ,2 , 1);
INSERT INTO tab(DATE'2014-05-13','RET' ,2 , 1);
INSERT INTO tab(DATE'2014-06-01','ABS' ,2 , 1);
INSERT INTO tab(DATE'2014-10-09','RETk',2 , 1);
INSERT INTO tab( NULL,'ABS' ,0 , 2);
INSERT INTO tab(DATE'2015-04-02','RET' ,1 , 2);
INSERT INTO tab(DATE'2015-04-03','RET' ,2 , 2);
INSERT INTO tab(DATE'2015-04-04','ABS' ,2 , 2);
INSERT INTO tab(DATE'2015-04-05','STT' ,3 , 2);
INSERT INTO tab(DATE'2015-04-06','RETk',4 , 2);
INSERT INTO tab(DATE'2015-04-07','RETk',4 , 2);
CREATE VOLATILE TABLE vt AS
(
SELECT dt, code, part
-- used to find the next row
,ROW_NUMBER() OVER (PARTITION BY part ORDER BY dt) AS rn
FROM tab
) WITH DATA
PRIMARY INDEX(part, rn)
ON COMMIT PRESERVE ROWS
;
And now it's just applying your logic using CASE row after row:
WITH RECURSIVE cte (dt, code, rnk, part, rn) AS
(
SELECT
dt
,code
,CASE WHEN code = 'ABS' THEN 0 ELSE 1 END
,part
,rn
FROM vt
WHERE rn = 1
UNION ALL
SELECT
vt.dt
,vt.code
,CASE
WHEN cte.rnk IN (0,1) AND vt.CODE = 'RET' THEN cte.rnk + 1
WHEN cte.rnk = 2 AND vt.CODE = 'STT' THEN cte.rnk + 1
WHEN cte.rnk = 3 AND vt.CODE = 'RETk' THEN cte.rnk + 1
WHEN vt.CODE = 'ABS' THEN cte.rnk
ELSE cte.rnk
END
,vt.part
,vt.rn
FROM vt JOIN cte
ON vt.part =cte.part
AND vt.rn =cte.rn + 1
)
SELECT *
FROM cte
ORDER BY part, dt;
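The recursion above is a row-by-row state machine: each rank depends only on the previous rank and the current code. A minimal Python sketch of the same CASE logic, handy for checking the expected ranks of one partition outside the database (function names are hypothetical):

```python
def next_rank(prev_rank, code):
    """Mirror of the CASE expression: derive a row's rank from the previous rank."""
    if prev_rank is None:                       # first row of the partition
        return 0 if code == "ABS" else 1
    if prev_rank in (0, 1) and code == "RET":
        return prev_rank + 1
    if prev_rank == 2 and code == "STT":
        return prev_rank + 1
    if prev_rank == 3 and code == "RETk":
        return prev_rank + 1
    return prev_rank                            # ABS and everything else: unchanged

def ranks_for_partition(codes):
    """Apply next_rank to the codes of one partition, already ordered by date."""
    out, prev = [], None
    for code in codes:
        prev = next_rank(prev, code)
        out.append(prev)
    return out

part1 = ranks_for_partition(["ABS", "RET", "RET", "ABS", "RET", "ABS", "RETk"])
part2 = ranks_for_partition(["ABS", "RET", "RET", "ABS", "STT", "RETk", "RETk"])
```

Both lists match the RANK column of the sample data in the question.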
But I think your logic is not actually like this (based on the previous row's exact RANK value); you're just stuck in procedural thinking :-)
You might be able to do what you want using OLAP-functions only...
Something along the lines of:
create table table1
(
datecol date,
code varchar(10),
rankcol integer
);
--insert into table1 select '2014/05/13', 'RETj', 0;
select
case
when s1.code='ABS' and s2.rankcol = 1 then 1
when s1.code='RET' and s2.rankcol = 0 then 1
when s1.code='RET' and s2.rankcol = 1 then 2
else 0
end RET_res,
s1.*, s2.*
from
(select rankcol, code, row_number() OVER (order by datecol) var1 from table1) s1,
(select rankcol, code, row_number() OVER (order by datecol) var1 from table1) s2
where s1.var1=s2.var1-1
order by s1.var1
;

How to get data out of multiple `countof`s on a single table in SQLite?

I have a table with a date column and an affinity column, which can have 5 different values (i.e. a, b, c, d, e).
Table A
table_id date affinity
I need to count how many entries there are per month of the year per affinity. I initially created an SQL query for each month per affinity, so the database gets opened about 60 times, which is too much for most Android phones to handle and is super slow.
How can I condense this into a single query, and how can I then get the values? Ideally I would create a temporary table that looks like this with sample values.
Jan Feb Mar Apr May ...
a 2 4 6 4 1
b 4 1 3 4 0
c 2 2 4 2 0
d 7 3 6 0 5
e 9 5 1 9 8
I am not well versed with advanced sql querying, but I do know of JOINS and nested SELECTS. So I just need a little push in the right direction. How can I achieve this?
You can use conditional aggregation to do this by using a case expression in conjunction with the count function:
select
affinity
, count(case when month(`date`) = 1 then affinity end) as "Jan"
, count(case when month(`date`) = 2 then affinity end) as "Feb"
, count(case when month(`date`) = 3 then affinity end) as "Mar"
, count(case when month(`date`) = 4 then affinity end) as "Apr"
, count(case when month(`date`) = 5 then affinity end) as "May"
-- ... etc.
from a -- this is your table, which I assumed is called 'a'
group by affinity;
As SQLite doesn't have any month function, you would have to use the strftime function instead: strftime('%m', date)
For SQLite the query should probably look like this:
select
affinity
, count(case when strftime('%m', date) = '01' then affinity end) as "Jan"
, count(case when strftime('%m', date) = '02' then affinity end) as "Feb"
, count(case when strftime('%m', date) = '03' then affinity end) as "Mar"
, count(case when strftime('%m', date) = '04' then affinity end) as "Apr"
, count(case when strftime('%m', date) = '05' then affinity end) as "May"
from a -- this is your table, which I assumed is called 'a'
group by affinity;
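To try the SQLite version quickly, the following minimal sketch runs it against an in-memory database through Python's sqlite3 module. The sample rows are made up for illustration, and it assumes the date column holds ISO-8601 text (YYYY-MM-DD), which is what strftime expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (table_id INTEGER PRIMARY KEY, date TEXT, affinity TEXT)")
# Hypothetical sample rows, not taken from the question.
conn.executemany(
    "INSERT INTO a (date, affinity) VALUES (?, ?)",
    [
        ("2015-01-05", "a"),
        ("2015-01-20", "a"),
        ("2015-02-03", "a"),
        ("2015-02-11", "b"),
        ("2015-03-09", "b"),
    ],
)

# One pass over the table: each count() only sees the rows of its own month.
query = """
select
  affinity
, count(case when strftime('%m', date) = '01' then affinity end) as "Jan"
, count(case when strftime('%m', date) = '02' then affinity end) as "Feb"
, count(case when strftime('%m', date) = '03' then affinity end) as "Mar"
from a
group by affinity
order by affinity
"""
counts = conn.execute(query).fetchall()
conn.close()
```

Each returned tuple is one row of the desired matrix: the affinity followed by its per-month counts.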
Investigate 'GROUP BY' and its aggregate functions. Something like:
SELECT COUNT(*) AS C, affinity, date
FROM ...
GROUP BY affinity, date
Gives you the list of records. Reorder to matrix if necessary.
