Ranking function in Kusto - azure-data-explorer

I have the data below in a Kusto table (the table has two columns: Run_Date, a datetime, and sensor, a string).
I need to add an auto-increment column that increases by one whenever the Run_Date or sensor value changes.
Please refer to the attached screenshot. I have tried the rank and row number functions in Kusto, with no luck.

You could use the scan operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/scan-operator
datatable(run_date:datetime, sensor:string)
[
datetime(2021-08-05), "A",
datetime(2021-08-05), "A",
datetime(2021-08-05), "A",
datetime(2021-08-05), "B",
datetime(2021-08-05), "B",
datetime(2021-09-05), "B",
]
| order by run_date asc, sensor asc
| scan declare (_rank: long = 0) with
(
step s1: true => _rank = iff(run_date > s1.run_date or sensor != s1.sensor, s1._rank + 1, s1._rank);
)
run_date                    | sensor | _rank
----------------------------+--------+------
2021-08-05 00:00:00.0000000 | A      | 1
2021-08-05 00:00:00.0000000 | A      | 1
2021-08-05 00:00:00.0000000 | A      | 1
2021-08-05 00:00:00.0000000 | B      | 2
2021-08-05 00:00:00.0000000 | B      | 2
2021-09-05 00:00:00.0000000 | B      | 3
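To make the logic of the scan step easier to follow, here is a minimal Python sketch of the same idea: walk the sorted rows, remember the previous row, and bump a counter whenever run_date or sensor differs. The function name change_rank is hypothetical and only serves this illustration; it is not part of Kusto.

```python
from datetime import date


def change_rank(rows):
    """Return (run_date, sensor, rank) tuples; rank grows by one on every change.

    `rows` must already be sorted, mirroring the `order by` before `scan`.
    """
    ranked = []
    previous = None  # (run_date, sensor) of the previous row, None for the first row
    rank = 0
    for run_date, sensor in rows:
        if previous is None or (run_date, sensor) != previous:
            rank += 1  # either column changed, so start a new rank
        ranked.append((run_date, sensor, rank))
        previous = (run_date, sensor)
    return ranked


rows = [
    (date(2021, 8, 5), "A"),
    (date(2021, 8, 5), "A"),
    (date(2021, 8, 5), "A"),
    (date(2021, 8, 5), "B"),
    (date(2021, 8, 5), "B"),
    (date(2021, 9, 5), "B"),
]
ranked = change_rank(sorted(rows))
for row in ranked:
    print(row)
```

On sorted input, comparing the pair for inequality gives the same result as the `run_date > s1.run_date or sensor != s1.sensor` condition in the query.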

Another alternative is to use the row_rank() function:
datatable(run_date:datetime, sensor:string)
[
datetime(2021-08-05), "A",
datetime(2021-08-05), "A",
datetime(2021-08-05), "A",
datetime(2021-08-05), "B",
datetime(2021-08-05), "B",
datetime(2021-09-05), "B",
]
| extend Day = bin(run_date, 1d)
| extend RankColumn = strcat(Day, sensor)
| order by RankColumn asc
| extend Rownumber = row_rank(RankColumn)
| project-away RankColumn, Day

Related

SQLite multiple CASE WHEN weird result

I'm using multiple CASE WHEN expressions to find device actions on selected days, but instead of getting only the abbreviations (like V or C), I sometimes get the full action name. If I replace 'ELSE action' with ELSE '', I get some blanks, even though there aren't any blank actions. How can I improve my query?
SELECT device,
CASE
WHEN action='Vaccum' AND strftime('%d', timestamp_action) = '25' THEN 'V'
WHEN action='Cooling' AND strftime('%d', timestamp_action) = '25' THEN 'C' ELSE action END AS '25',
CASE
WHEN action='Vaccum' AND strftime('%d', timestamp_action) = '26' THEN 'V'
WHEN action='Cooling' AND strftime('%d', timestamp_action) = '26' THEN 'C' ELSE action END AS '26'
FROM diary WHERE strftime('%m', timestamp_action) = '08'
GROUP BY device
ORDER BY device
I want to get the latest action on selected days for all devices. I have around 100 devices and I need the actions for the entire month.
Example table:
timestamp_action | device | action
------------------------+---------------+-----------
2022-08-25 11:08 | 1 | Cooling
2022-08-25 11:09 | 1 | Vaccum
2022-08-25 11:08 | 2 | Cooling
2022-08-26 11:10 | 2 | Vaccum
2022-08-26 11:11 | 2 | Cooling
2022-08-26 12:30 | 1 | Vaccum
So the result I'm looking for is:
device | 25 | 26 .....
-----------+-----------+--------------
1 | V | V
2 | C | C
Use 2 levels of aggregation:
WITH cte AS (
SELECT device,
strftime('%d', timestamp_action) day,
CASE action WHEN 'Vaccum' THEN 'V' WHEN 'Cooling' THEN 'C' ELSE action END action,
MAX(timestamp_action) max_timestamp_action
FROM diary
WHERE strftime('%Y-%m', timestamp_action) = '2022-08'
GROUP BY device, day
)
SELECT device,
MAX(CASE WHEN day = '25' THEN action END) `25`,
MAX(CASE WHEN day = '26' THEN action END) `26`
FROM cte
GROUP BY device;
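The two-level aggregation can be tried locally with Python's built-in sqlite3 module. This is a minimal sketch that loads the example rows from the question into an in-memory table; the column types and the added ORDER BY are assumptions made for the illustration. It relies on SQLite's documented behavior that, in a query with a single MAX(), bare columns take their values from the row that holds the maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diary (timestamp_action TEXT, device INTEGER, action TEXT)"
)
conn.executemany(
    "INSERT INTO diary VALUES (?, ?, ?)",
    [
        ("2022-08-25 11:08", 1, "Cooling"),
        ("2022-08-25 11:09", 1, "Vaccum"),
        ("2022-08-25 11:08", 2, "Cooling"),
        ("2022-08-26 11:10", 2, "Vaccum"),
        ("2022-08-26 11:11", 2, "Cooling"),
        ("2022-08-26 12:30", 1, "Vaccum"),
    ],
)

query = """
WITH cte AS (
    -- Level 1: one row per device and day, carrying the latest action
    SELECT device,
           strftime('%d', timestamp_action) AS day,
           CASE action WHEN 'Vaccum' THEN 'V' WHEN 'Cooling' THEN 'C' ELSE action END AS action,
           MAX(timestamp_action) AS max_timestamp_action
    FROM diary
    WHERE strftime('%Y-%m', timestamp_action) = '2022-08'
    GROUP BY device, day
)
-- Level 2: pivot the days into columns, one row per device
SELECT device,
       MAX(CASE WHEN day = '25' THEN action END) AS "25",
       MAX(CASE WHEN day = '26' THEN action END) AS "26"
FROM cte
GROUP BY device
ORDER BY device
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For a full month, the second SELECT needs one MAX(CASE ...) column per day.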

Summarize dynamic values with Kusto query in Azure Data Explorer

I have this query that almost works:
datatable (timestamp:datetime, value:dynamic)
[
datetime("2021-04-19"), "a",
datetime("2021-04-19"), "b",
datetime("2021-04-20"), 1,
datetime("2021-04-20"), 2,
datetime("2021-04-21"), "b",
datetime("2021-04-22"), 2,
datetime("2021-04-22"), 3,
]
| project timestamp, stringvalue=iif(gettype(value)=="string", tostring(value), ""), numericvalue=iif(gettype(value)=="long", toint(value), int(null))
| summarize any(stringvalue), avg(numericvalue) by bin(timestamp, 1d)
| project timestamp, value=iif(isnan(avg_numericvalue), any_stringvalue, avg_numericvalue)
This splits the values in the value field into stringvalue if the value is a string and numericvalue if the value is a long. Then it summarizes the values at day level: for the string values it just takes any value, and for the numeric values it calculates the average.
After this I want to put the values back into the value field.
I was thinking that the last line could be like the one below, but the dynamic() function only accepts literals:
| project timestamp, value=iif(isnan(avg_numericvalue), dynamic(any_stringvalue), dynamic(avg_numericvalue))
If I do it like this it will actually work:
| project timestamp, value=iif(isnan(avg_numericvalue), parse_json(any_stringvalue), parse_json(tostring(avg_numericvalue)))
But is there a better way than converting it to json and back?
iff expects the type of the 2nd and 3rd arguments to match. In your case, one is a number, and the other one is a string. To fix the issue, just add tostring() around the number:
datatable (timestamp:datetime, value:dynamic)
[
datetime("2021-04-19"), "a",
datetime("2021-04-19"), "b",
datetime("2021-04-20"), 1,
datetime("2021-04-20"), 2,
datetime("2021-04-21"), "b",
datetime("2021-04-22"), 2,
datetime("2021-04-22"), 3,
]
| project timestamp, stringvalue=iif(gettype(value)=="string", tostring(value), ""), numericvalue=iif(gettype(value)=="long", toint(value), int(null))
| summarize any(stringvalue), avg(numericvalue) by bin(timestamp, 1d)
| project timestamp, value=iif(isnan(avg_numericvalue), any_stringvalue, tostring(avg_numericvalue))

KQL window functions - how to partition by multiple columns?

Input table dimVehicleV1:
SaleStart | Product | Model
----------+---------+------
1/1/2020  | Car     | 1
1/2/2020  | Bike    | 1
2/1/2020  | Car     | 2
3/1/2020  | Bike    | 2
Desired output dimVehicleV2:
SaleStart | Product | Model | SaleEnd
----------+---------+-------+---------
1/1/2020  | Car     | 1     | 2/1/2020
1/2/2020  | Bike    | 1     | 3/1/2020
2/1/2020  | Car     | 2     | null
3/1/2020  | Bike    | 2     | null
I see serialization via order by, and then the next() function. I don't see how to make it respect the Product column groupings though.
Failing query:
let dimVehicleV2 =
dimVehicleV1
| order by Product asc, SaleStart asc
| extend SaleEnd = next(SaleStart, 1);
dimVehicleV2
How does one use the next() function so that it respects column groups?
If I understand your question correctly, this should work:
datatable(SaleStart:datetime, Product:string, Model:int)
[
datetime(1/1/2020), 'Car', 1,
datetime(1/2/2020), 'Bike', 1,
datetime(2/1/2020), 'Car', 2,
datetime(3/1/2020), 'Bike', 2,
]
| order by Product asc, SaleStart asc
| extend SaleEnd = iff(next(Product) == Product and next(Model) != Model, next(SaleStart), datetime(null))
SaleStart                   | Product | Model | SaleEnd
----------------------------+---------+-------+----------------------------
2020-01-01 00:00:00.0000000 | Car     | 1     | 2020-02-01 00:00:00.0000000
2020-01-02 00:00:00.0000000 | Bike    | 1     | 2020-03-01 00:00:00.0000000
2020-02-01 00:00:00.0000000 | Car     | 2     |
2020-03-01 00:00:00.0000000 | Bike    | 2     |
I came to this post searching for an answer to the question actually in the title of this post: "How to partition by multiple columns?"
In case someone else needs it, here is what I ended up doing: extend the table with a new column that combines the values of the columns you want, and use that new column as the partition key.
You can combine the columns by using concatenation, or a hash, or something else.
dimVehicleV1
| extend PartitionKey = strcat(Product, ":", Model)
| partition hint.strategy=native by PartitionKey (top 1 by SaleStart) // or whatever partition transformation
In case it is useful to anyone, I found a solution I prefer over Yoni's perfectly adequate one.
let MyTable = datatable(SaleStart:datetime, Product:string, Model:int)
[
datetime(1/1/2020), 'Car', 1,
datetime(1/2/2020), 'Bike', 1,
datetime(2/1/2020), 'Car', 2,
datetime(3/1/2020), 'Bike', 2,
];
MyTable
| partition by Product
(
order by Model asc
| extend SaleEnd = next(SaleStart)
)
This seems to me to abstract away the details of the logic required, expressing just the thought.
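For readers who want to see what "next() within each partition" amounts to, here is a minimal Python sketch of the same idea: group the rows by Product, sort each group by SaleStart, and give every row the SaleStart of the row that follows it in its own group. The function name add_sale_end and the dict-based row shape are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import date


def add_sale_end(rows):
    """Attach SaleEnd = SaleStart of the next row within the same Product."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Product"]].append(row)

    result = []
    for product in sorted(groups):
        ordered = sorted(groups[product], key=lambda r: r["SaleStart"])
        # Pair each row with its successor; the last row of a group gets None.
        for current, following in zip(ordered, ordered[1:] + [None]):
            sale_end = following["SaleStart"] if following is not None else None
            result.append({**current, "SaleEnd": sale_end})
    return result


rows = [
    {"SaleStart": date(2020, 1, 1), "Product": "Car", "Model": 1},
    {"SaleStart": date(2020, 1, 2), "Product": "Bike", "Model": 1},
    {"SaleStart": date(2020, 2, 1), "Product": "Car", "Model": 2},
    {"SaleStart": date(2020, 3, 1), "Product": "Bike", "Model": 2},
]
with_end = add_sale_end(rows)
for row in with_end:
    print(row)
```

Because the successor is looked up inside each group, a value can never leak from one Product into another, which is the problem with the plain next() call after a global order by.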

Looking for a way to calculate aggregates without collapsing rows

As the title says, I'd like to find an efficient way to calculate aggregates over groups of rows without collapsing those rows together. For example, I want to create the mean(value) column in the table below.
|------------|---------|-------------|
| category | value | mean(value) |
|------------|---------|-------------|
| A | 1 | 3 |
|------------|---------|-------------|
| A | 3 | 3 |
|------------|---------|-------------|
| A | 5 | 3 |
|------------|---------|-------------|
| B | 1 | 1.5 |
|------------|---------|-------------|
| B | 2 | 1.5 |
|------------|---------|-------------|
So far, the best way I've found to do this is:
T
| join kind=leftouter (T | summarize avg(value) by category) on category
This seems to be causing performance problems. I'm also aware of a way of doing it using partition by, but need to support having more than 64 categories.
Am I missing a good way of doing this task?
Here you go:
let MyTable = datatable(Category:string, value:long) [
"A", 1,
"A", 3,
"A", 5,
"B", 1,
"B", 2
];
let Avgs = MyTable | summarize avg(value) by Category;
MyTable | lookup (Avgs) on Category
This will output exactly what you want.
Explanation:
First you create a temporary table (using a let statement) named Avgs, where you'll have the average per Category.
Your main statement is to output MyTable, but for every category you want to also display the relevant value from Avgs, which you achieve by using the lookup operator.
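The summarize-then-lookup pattern described above can be sketched in plain Python: first build a small table of averages per category, then attach the matching average to every original row. The function name with_group_mean is hypothetical and the tuple-based rows are an assumption made for this illustration.

```python
from collections import defaultdict


def with_group_mean(rows):
    """Return (category, value, mean) for every input row, keeping all rows."""
    totals = defaultdict(lambda: [0, 0])  # category -> [sum, count]
    for category, value in rows:
        totals[category][0] += value
        totals[category][1] += 1

    # The "Avgs" table: one mean per category.
    means = {category: s / n for category, (s, n) in totals.items()}

    # The "lookup": every row keeps its identity and gains the group mean.
    return [(category, value, means[category]) for category, value in rows]


rows = [("A", 1), ("A", 3), ("A", 5), ("B", 1), ("B", 2)]
result = with_group_mean(rows)
for row in result:
    print(row)
```

The averages table has one entry per category, so it stays small even when the main table is large.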

Flatten nested arrays in cosmosdb sql

I have the following cosmosdb document:
{
"id": "id",
"outer": [
{
"inner": [ "a", "b", "c" ]
},
{
"inner": [ "d", "e", "f" ]
}
]
}
And I need to create a SQL request which would return all of the combined values of the "inner" arrays, like this:
{
"allInners": [ "a", "b", "c", "d", "e", "f" ]
}
I was able to unwind the first array level using the "IN" operator, but I am not sure how to unwind it one more level, or how to handle double or even triple nested arrays.
The following is my subquery to aggregate those items
SELECT
... other stuff.
ARRAY(SELECT VALUE innerObj.inner FROM innerObj IN c.outer) AS allInners,
...
FROM c
I found the following solution to my problem (using the nested "IN" and a subquery):
ARRAY(
SELECT VALUE inner
FROM inner IN (
SELECT VALUE outers.inner
FROM outers IN c.outer
)
)
Please try something like this sql:
select ARRAY(SELECT VALUE e FROM c join d in c["outer"] join e in d["inner"]) AS allInners from c
Hope this can help you.:)
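As a reference for the expected result, here is a minimal Python sketch of the same flattening applied to the example document. It mirrors the two joins in the query (one loop per array level); the function name flatten_inners is hypothetical and this code does not call Cosmos DB.

```python
def flatten_inners(document):
    """Collect every value of every "inner" array under "outer", in order."""
    return [
        value
        for outer_item in document["outer"]   # first level: items of "outer"
        for value in outer_item["inner"]      # second level: items of "inner"
    ]


doc = {
    "id": "id",
    "outer": [
        {"inner": ["a", "b", "c"]},
        {"inner": ["d", "e", "f"]},
    ],
}
result = {"allInners": flatten_inners(doc)}
print(result)
```

A third level of nesting would add one more loop here, just as it would add one more join in the query.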

Resources