KQL Join on max value - azure-data-explorer

I need to join on a table and return only the MAX value from that right-hand table. I tried to mock it up using 'datatable' but failed miserably :(, so I'll try to describe it in words.
T1 = datatable(ID:int, Properties:string, ConfigTime:datetime) [1,'a,b,c','2021-03-04 00:00:00']
T2 = datatable(ID:int, Properties:string, ConfigTime:datetime) [2,'a,b,c','2021-03-02 00:00:00', 3,'a,b','2021-03-01 00:00:00', 4,'c','2021-03-20 00:00:00']
I'm using this as an update policy on T2, which has a source of T1. So I want to select the rows from T1 and then join the rows from T2 that have the highest timestamp. My first attempt was below:
T1 | join kind=inner T2 on Id
| summarize arg_max(ConfigTime1, Id, Properties, Properties1, ConfigTime) by Id
| project Id, Properties, ConfigTime
In my actual update policy, I merge the properties from T1 and T2 then write to T2, but for simplicity, I've left that for now.
Currently, I'm not getting any output in my T2 from the update policy. Any guidance on another way I should be doing this would be appreciated. Thanks

It seems that you want to push the arg_max calculation into the T2 side of the join, so that T2 is reduced to its latest row per Id before it is joined. Something like this:
T1
| join kind=inner (
T2
| summarize arg_max(ConfigTime, Properties) by Id
) on Id
Inside the subquery only T2's own column names exist. The suffixed names ConfigTime1 and Properties1 appear only in the join output, where they refer to the columns coming from T2.
Note that to keep performance acceptable you should limit the time range that arg_max has to scan, so consider adding a time-based filter before the arg_max.
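To make the "reduce first, then join" idea concrete, here is a minimal Python sketch of the semantics only. It is not KQL, the rows are hypothetical sample data (not the asker's), and the helper name latest_per_id is made up for illustration.

```python
from datetime import datetime

# Hypothetical rows shaped like the question's tables: (Id, Properties, ConfigTime).
t1 = [(2, "a,b,c", datetime(2021, 3, 4))]
t2 = [
    (2, "a", datetime(2021, 3, 1)),
    (2, "a,b", datetime(2021, 3, 2)),
    (3, "c", datetime(2021, 3, 20)),
]

def latest_per_id(rows):
    """Keep the row with the greatest ConfigTime for each Id (the arg_max step)."""
    latest = {}
    for row in rows:
        row_id, _, config_time = row
        if row_id not in latest or config_time > latest[row_id][2]:
            latest[row_id] = row
    return latest

latest_t2 = latest_per_id(t2)

# Inner join: each T1 row is paired with the single latest T2 row for its Id.
joined = [(left, latest_t2[left[0]]) for left in t1 if left[0] in latest_t2]
print(joined)
```

Because T2 is collapsed to one row per Id before the join, each T1 row can match at most one T2 row.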

I think what you're looking for is a union:
let T1 = datatable(ID:int, Properties:string, ConfigTime:datetime) [
1,'a,b,c',datetime(2021-03-04 00:00:00)
];
let T2 = datatable(ID:int, Properties:string, ConfigTime:datetime) [
2,'a,b,c',datetime(2021-03-02 00:00:00),
3,'a,b',datetime(2021-03-01 00:00:00),
4,'c',datetime(2021-03-20 00:00:00)
];
Here is an example using a variable with summarize max:
let Latest = toscalar(T2 | summarize max(ConfigTime));
T1
| union (T2 | where ConfigTime == Latest)
The result will keep the entries from T1 and only the latest entries from T2.
If this doesn't reflect your expected results please show your expected output.

Related

Kusto equivalent of SQL NOT IN

I am trying to identify what records exist in table 1 that are not in table 2 (so essentially using NOT IN)
let outliers =
Table 2
| project UniqueEventGuid;
Table 1
|where UniqueEventGuid !in (outliers)
|project UniqueEventGuid
but getting 0 records back even though I know there are orphans in table 1.
Is the !in not the right syntax?
Thanks in advance!

From the documentation of the !in operator: "In tabular expressions, the first column of the result set is selected."
In the following example I intentionally ordered the columns so that the query results in an error due to mismatched data types.
In your case the data types might match, so the query is valid but the results are wrong.
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| where i !in (t2)
Relop semantic error: SEM0025: One of the values provided to the
'!in' operator does not match the left side expression type 'int',
consider using explicit cast
If that is indeed the case, you can reorder the columns or project only the relevant one.
Note the use of double parentheses.
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| where i !in ((t2 | project i))
i  x
1  A
2  B
3  C
Another option is to use leftanti join
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| join kind=leftanti t2 on i
i  x
2  B
3  C
1  A
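As a rough illustration of why the column choice matters, here is a Python sketch using the same sample rows. It mimics the semantics only: where Kusto reports a type error for the mismatched column, Python silently treats every value as "not in" the set, which resembles the valid-but-wrong case described above.

```python
t1 = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")]
t2 = [("d", 4), ("e", 5), ("f", 6), ("g", 7)]

# Using the first column of t2 (y, a string) compares ints against strings,
# so nothing ever matches and every t1 row is kept.
first_column = {row[0] for row in t2}
wrong = [row for row in t1 if row[0] not in first_column]

# Using only the intended column (i) gives the NOT IN / leftanti result.
i_values = {row[1] for row in t2}
right = [row for row in t1 if row[0] not in i_values]

print(wrong)
print(right)
```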

Kusto: How summarize calculated data

I have calculated start and end columns which I read from Table1,
and I want to count how many events happened between each start and end time.
Input data:
let Mytable1=datatable (Vin:string,start_time:datetime ,End_time:datetime )
["ABC",datetime(2021-03-18 08:49:08.467),datetime(2021-03-18 13:32:28.000),
"ABC",datetime(2021-03-18 13:41:59.323),datetime(2021-03-18 13:41:59.323),
"ABC",datetime(2021-03-18 13:46:59.239),datetime(2021-03-18 14:58:02.000)];
let Mytable2=datatable(Vin:string,Timestamp:datetime)
["ABC",datetime(2021-03-18 08:49:08.467),
"ABC",datetime(2021-03-18 08:59:08.466),
"ABC",datetime(2021-03-18 09:04:08.460),
"ABC",datetime(2021-03-18 13:24:27.0000000)];
Query:
let Test=Table1
|where Vin =="ABC" | distinct Vin,Start_Time,End_Time;
let min1=toscalar(Test |summarize min1= min(Start_Time));
let max1=toscalar(Test |summarize max1=max(End_Time));
Table2
|where Vin =="ABC" and Timestamp between (todatetime(min1) ..todatetime(max1))
| join kind=fullouter Test
on $left.Vin == $right.Vin and $left.Timestamp== $right.Start_Time
|summarize Events= (count()) by Timestamp,Vin,Start_Time,End_Time
|project Timestamp,Start_Time,End_Time,Events
The output of the above query is:
But my expected output is:
That is, the count of events between each start and end time.

You should not have timestamp in your final aggregation. A working example could look like:
let measurement_range=datatable (vin:string,start_time:datetime ,end_time:datetime )
["ABC",datetime(2021-03-18 08:49:08.467),datetime(2021-03-18 13:32:28.000),
"ABC",datetime(2021-03-18 13:41:59.323),datetime(2021-03-18 13:44:59.323),
"ABC",datetime(2021-03-18 13:46:59.239),datetime(2021-03-18 14:58:02.000)
];
let measurement=datatable(vin:string,timestamp:datetime)
["ABC",datetime(2021-03-18 08:49:08.467),
"ABC",datetime(2021-03-18 08:59:08.466),
"ABC",datetime(2021-03-18 09:04:08.460),
"ABC",datetime(2021-03-18 13:42:27.0000000)];
measurement_range
| join kind=inner (measurement)
on vin
| where timestamp between (start_time..end_time)
| summarize event=(count()) by vin, start_time, end_time
With this you get a count per measurement window. Note that the join matches on vin only and the time range is applied afterwards in the where clause, so the intermediate result set is large.
Please see the Azure Data Explorer documentation on how to optimize time window joins; this example is not efficient for larger datasets.
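To show what the query computes, here is a small Python sketch of the same join-then-filter logic on the sample rows above. It only illustrates the semantics and is not an optimized approach; like the inner join, it produces no row for a window that contains no events.

```python
from datetime import datetime

# Same sample rows as the datatables above: (vin, start_time, end_time).
ranges = [
    ("ABC", datetime(2021, 3, 18, 8, 49, 8, 467000), datetime(2021, 3, 18, 13, 32, 28)),
    ("ABC", datetime(2021, 3, 18, 13, 41, 59, 323000), datetime(2021, 3, 18, 13, 44, 59, 323000)),
    ("ABC", datetime(2021, 3, 18, 13, 46, 59, 239000), datetime(2021, 3, 18, 14, 58, 2)),
]
# (vin, timestamp)
events = [
    ("ABC", datetime(2021, 3, 18, 8, 49, 8, 467000)),
    ("ABC", datetime(2021, 3, 18, 8, 59, 8, 466000)),
    ("ABC", datetime(2021, 3, 18, 9, 4, 8, 460000)),
    ("ABC", datetime(2021, 3, 18, 13, 42, 27)),
]

counts = {}
for vin, start, end in ranges:
    for event_vin, timestamp in events:
        # Join on vin, then apply the inclusive "between" filter.
        if event_vin == vin and start <= timestamp <= end:
            key = (vin, start, end)
            counts[key] = counts.get(key, 0) + 1

for (vin, start, end), n in counts.items():
    print(vin, start, end, n)
```

The nested loop makes the cost visible: every event is compared with every window of the same vin, which is the large intermediate result mentioned above.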

Sqlite3 repeats value in other dates

The tables involved:
data_incidencia.
data_ticket.
My query is the following:
select t1.hurtos, t2.fallas,t3.ticket, t1.fecha_carga
from
(select count(ttc) as hurtos,
fecha_carga from data_incidencia
where campo_key_id = 2
group by fecha_carga) t1,
(select count(ttc) as fallas,
fecha_carga from data_incidencia
where campo_key_id = 1
group by fecha_carga) t2,
(select count(ticket) as ticket,
fecha_solicitud as fecha_carga from data_ticket ) t3
where t1.fecha_carga =t2.fecha_carga;
and the output:
but the desired output is:
Notice that "ticket" repeats its value on 2018-05-16, where there are no tickets. It is probably something dumb like a missing CASE WHEN or GROUP BY, but I can't figure it out.
Any ideas on how I should fix this query?

You have three subqueries, t1, t2, and t3.
t1 and t2 are joined, but t3 is not, so you get an implicit cross join.
The column names you're using look as if you want to join all three on the same column:
SELECT t1.hurtos, t2.fallas, t3.ticket, t1.fecha_carga
FROM (...) AS t1,
(...) AS t2,
(...) AS t3
WHERE t1.fecha_carga = t2.fecha_carga
AND t1.fecha_carga = t3.fecha_carga;
Implicit joins have been outdated since 1992, when SQL-92 introduced explicit JOIN syntax; better to use explicit joins:
SELECT t1.hurtos, t2.fallas, t3.ticket, fecha_carga
FROM (...) AS t1
JOIN (...) AS t2 USING (fecha_carga)
JOIN (...) AS t3 USING (fecha_carga);
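The effect can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: the rows are hypothetical (tickets exist only on 2018-05-15), and the GROUP BY added to t3 in the joined versions is my guess at what the elided subquery needs so that it yields one row per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_incidencia (ttc INTEGER, campo_key_id INTEGER, fecha_carga TEXT);
    CREATE TABLE data_ticket (ticket INTEGER, fecha_solicitud TEXT);
    -- Hypothetical sample rows: tickets exist only on 2018-05-15.
    INSERT INTO data_incidencia VALUES
        (1, 2, '2018-05-15'), (2, 1, '2018-05-15'),
        (3, 2, '2018-05-16'), (4, 1, '2018-05-16');
    INSERT INTO data_ticket VALUES (10, '2018-05-15'), (11, '2018-05-15');
""")

t1_sql = """(SELECT count(ttc) AS hurtos, fecha_carga FROM data_incidencia
             WHERE campo_key_id = 2 GROUP BY fecha_carga) AS t1"""
t2_sql = """(SELECT count(ttc) AS fallas, fecha_carga FROM data_incidencia
             WHERE campo_key_id = 1 GROUP BY fecha_carga) AS t2"""
t3_grouped_sql = """(SELECT count(ticket) AS ticket, fecha_solicitud AS fecha_carga
                     FROM data_ticket GROUP BY fecha_solicitud) AS t3"""

# Original shape: t3 has no join condition, so its single row pairs with every date.
implicit_rows = conn.execute(f"""
    SELECT t1.hurtos, t2.fallas, t3.ticket, t1.fecha_carga
    FROM {t1_sql}, {t2_sql},
         (SELECT count(ticket) AS ticket, fecha_solicitud AS fecha_carga
          FROM data_ticket) AS t3
    WHERE t1.fecha_carga = t2.fecha_carga
    ORDER BY t1.fecha_carga
""").fetchall()

# Explicit joins on the date: only dates present in all three subqueries remain.
joined_rows = conn.execute(f"""
    SELECT t1.hurtos, t2.fallas, t3.ticket, t1.fecha_carga
    FROM {t1_sql}
    JOIN {t2_sql} USING (fecha_carga)
    JOIN {t3_grouped_sql} USING (fecha_carga)
    ORDER BY t1.fecha_carga
""").fetchall()

# LEFT JOIN variant: keeps dates without tickets, with NULL in the ticket column.
left_rows = conn.execute(f"""
    SELECT t1.hurtos, t2.fallas, t3.ticket, t1.fecha_carga
    FROM {t1_sql}
    JOIN {t2_sql} USING (fecha_carga)
    LEFT JOIN {t3_grouped_sql} USING (fecha_carga)
    ORDER BY t1.fecha_carga
""").fetchall()

print(implicit_rows)
print(joined_rows)
print(left_rows)
```

If dates without tickets should still appear in the result, the LEFT JOIN variant is probably closer to the desired output than the plain inner join.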

Combining multiple table with certain critera for Teradata

I'm trying to combine 3 different tables into one with certain criteria.
Table 2 - Calculate the Total Weight of the shipments based on Shipper Number and Ship Date
Table 3 - Calculate the Total Revenue Amount based on Shipper Number and Ship Date
Result - Combine the tables into a single table.
Additionally, the SQL should also filter for "Customer Since" <= 720 and "Ship Date" <= 360
(Note: Customer Number = Shipper Number)

It's difficult to tell what you're looking for, but this could get you started:
SELECT t2.Shipper_Number, t2.Ship_Date, t2.Total_Weight, t3.Total_Revenue
FROM (
SELECT Shipper_Number, Ship_Date, SUM(Weight) AS Total_Weight
FROM Table2
GROUP BY Shipper_Number, Ship_Date
) t2
INNER JOIN (
SELECT Shipper_Number, Ship_Date, SUM(Revenue) AS Total_Revenue
FROM Table3
GROUP BY Shipper_Number, Ship_Date
) t3 ON t2.<PK_column> = t3.<PK_column>
A few notes:
INNER JOIN will only return rows that match in both tables
You can add your "Customer Since" and "Ship Date" restrictions as WHERE clauses in the sub-queries
Update
If you want to get one row per (Shipper_Number, Ship_Date) group, then you need to make your JOIN condition for those two derived tables using that column combination (Shipper_Number, Ship_Date).
And if you want to further filter your rows on "Customer Since", then just JOIN to the source table and add the filtering condition. Something like this:
SELECT t2.Shipper_Number, t2.Ship_Date, t2.Total_Weight, t3.Total_Revenue
FROM (
SELECT Shipper_Number, Ship_Date, SUM(Weight) AS Total_Weight
FROM Table2
WHERE Ship_Date >= CURRENT_DATE - 360 -- Assumes Ship_Date is a DATE and you want only orders that shipped within the last 360 days
GROUP BY Shipper_Number, Ship_Date
) t2
INNER JOIN (
SELECT Shipper_Number, Ship_Date, SUM(Revenue) AS Total_Revenue
FROM Table3
WHERE Ship_Date >= CURRENT_DATE - 360 -- Assumes Ship_Date is a DATE and you want only orders that shipped within the last 360 days
GROUP BY Shipper_Number, Ship_Date
) t3 ON t2.Shipper_Number = t3.Shipper_Number AND t2.Ship_Date = t3.Ship_Date
INNER JOIN Table1 t1 ON t2.Shipper_Number = t1.Customer_Number -- Get shipper/customer info
WHERE t1."Customer Since" >= CURRENT_DATE - 720 -- Assumes "Customer Since" is a DATE and you want customers that are fewer than 720 days old
Again, using an INNER JOIN, this assumes that matching rows exist in both tables for (Shipper_Number and Ship_Date), otherwise it won't return that row.
Also, if feasible, you may want to consider combining your t2 and t3 tables into a single table based on a common key (shipper_number, shipping_number) or just shipping_number if it's a unique value.
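The aggregate-then-join pattern can be tried out with Python's built-in sqlite3 module. This is a sketch only: it runs on SQLite rather than Teradata, the rows are hypothetical, and the date filters and the Table1 join are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (Shipper_Number INTEGER, Ship_Date TEXT, Weight REAL);
    CREATE TABLE Table3 (Shipper_Number INTEGER, Ship_Date TEXT, Revenue REAL);
    -- Hypothetical sample rows.
    INSERT INTO Table2 VALUES
        (1, '2021-01-01', 10), (1, '2021-01-01', 5),
        (1, '2021-01-02', 7),  (2, '2021-01-01', 3);
    INSERT INTO Table3 VALUES
        (1, '2021-01-01', 100), (1, '2021-01-02', 40),
        (1, '2021-01-02', 60),  (2, '2021-01-01', 30);
""")

# Each table is summed per (Shipper_Number, Ship_Date) before the join,
# so rows are not multiplied when a group has several detail rows.
rows = conn.execute("""
    SELECT t2.Shipper_Number, t2.Ship_Date, t2.Total_Weight, t3.Total_Revenue
    FROM (
        SELECT Shipper_Number, Ship_Date, SUM(Weight) AS Total_Weight
        FROM Table2
        GROUP BY Shipper_Number, Ship_Date
    ) t2
    INNER JOIN (
        SELECT Shipper_Number, Ship_Date, SUM(Revenue) AS Total_Revenue
        FROM Table3
        GROUP BY Shipper_Number, Ship_Date
    ) t3 ON t2.Shipper_Number = t3.Shipper_Number AND t2.Ship_Date = t3.Ship_Date
    ORDER BY t2.Shipper_Number, t2.Ship_Date
""").fetchall()

for row in rows:
    print(row)
```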

Compare data of the same table

I need to compare rows within the same table. How can I do this?
For example, compare A,10 with B,10, where 10 is a repeated value.

You can do this using the cross-product operator. In SQL this would be done as:
SELECT T1.colA, T2.colA, (T1.colA < T2.colA) as colA_comp
FROM TableName T1, TableName T2
WHERE T1.colB = T2.colB
What this does is take the cross-product of the table TableName with itself (renamed as T1 and T2); the WHERE clause then keeps only those pairs of records that agree on colB (the repeated value 10, in your example).
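Here is a quick way to try the self-join, using Python's built-in sqlite3 module with hypothetical rows. The second query, which adds T1.colA < T2.colA to the WHERE clause, is my own variation for listing each pair only once; it is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableName (colA TEXT, colB INTEGER);
    -- Hypothetical sample rows: A and B share the value 10.
    INSERT INTO TableName VALUES ('A', 10), ('B', 10), ('C', 20);
""")

# The query from the answer: every pair of rows that agree on colB,
# including each row paired with itself.
all_pairs = conn.execute("""
    SELECT T1.colA, T2.colA, (T1.colA < T2.colA) AS colA_comp
    FROM TableName T1, TableName T2
    WHERE T1.colB = T2.colB
    ORDER BY T1.colA, T2.colA
""").fetchall()

# Variation: keep each unordered pair once and drop the self-pairs.
distinct_pairs = conn.execute("""
    SELECT T1.colA, T2.colA
    FROM TableName T1, TableName T2
    WHERE T1.colB = T2.colB AND T1.colA < T2.colA
    ORDER BY T1.colA, T2.colA
""").fetchall()

print(all_pairs)
print(distinct_pairs)
```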

If you are comparing a table with itself, you can use a LEFT JOIN:
SELECT t1.cola,t1.colb,t2.cola,...
from tableA t1
LEFT JOIN tableA t2 on t2.cola = t1.cola
WHERE t1.cola = 10
I hope this works!
