KQL/Kusto - How to generate row_number similar to SQL - azure-data-explorer

Here is my data set in Kusto. I am trying to generate a "releaseRank" column based on the value of the release column.
Input dataset:
let T = datatable(release:string, metric:long)
[
"22.05", 20,
"22.04", 40,
"22.03", 50,
"22.01", 560
];
T
|take 100;
Desired output:
I found that Kusto has the serialize operator and the row_number() function:
T
|serialize
|extend releaseRank = row_number()
|take 100;
But if a release value is repeated, I need the releaseRank to be the same. For example, with the following data set I am not getting the desired output:
let T = datatable(release:string, metric:long)
[
"22.05", 20,
"22.05", 21,
"22.04", 40,
"22.03", 50,
"22.01", 560
];
T
|serialize
|extend releaseRank = row_number()
|take 100;
Expected output:

This should do what you want:
let T = datatable(release:string, metric:long)
[
"22.05", 20,
"22.05", 21,
"22.04", 40,
"22.03", 50,
"22.01", 560
];
T
| sort by release desc , metric asc
| extend Rank=row_rank(release)
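Output:
| release | metric | Rank |
|---------|--------|------|
| 22.05   | 20     | 1    |
| 22.05   | 21     | 1    |
| 22.04   | 40     | 2    |
| 22.03   | 50     | 3    |
| 22.01   | 560    | 4    |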
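The following Python sketch (not Kusto) makes the ranking concrete. It reproduces what the sort followed by row_rank(release) yields in the output above. The rank goes up only when the release value differs from the previous row, so rows with the same release share a rank (a dense rank). The function name rank_releases is hypothetical. Releases are compared as strings, and that is assumed to be safe only for fixed-width values like "22.05".

```python
def rank_releases(rows):
    """Sort by release desc, metric asc, then assign a dense rank on release.

    rows: list of (release, metric) tuples. Releases are compared as strings,
    which assumes fixed-width values such as "22.05".
    """
    # Key (release, -metric) with reverse=True gives release desc, metric asc.
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]), reverse=True)
    ranked = []
    rank = 0
    prev_release = None
    for release, metric in ordered:
        if release != prev_release:  # rank only advances when the release changes
            rank += 1
            prev_release = release
        ranked.append((release, metric, rank))
    return ranked


T = [("22.05", 20), ("22.05", 21), ("22.04", 40), ("22.03", 50), ("22.01", 560)]
for row in rank_releases(T):
    print(*row)
```

A plain row_number() would instead number the five rows 1 to 5 no matter what the release values are.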

Related

JQ: How to split array by values and find out length of each piece?

I need to find the lengths of user sessions, given the timestamps of individual visits.
A new session starts every time the delay between adjacent timestamps is longer than a limit.
For example, for this set of timestamps (think of them as seconds since the epoch):
[
101,
102,
105,
116,
128,
129,
140,
145,
146,
152
]
...and for a limit of 10, I need the following output:
[
3,
1,
2,
4
]
Assuming the values are in ascending order, loop through them, accumulating the groups based on your condition. reduce works well in this case.
10 as $limit # remove this so you can feed in your value as an argument
| reduce .[] as $i (
{prev:.[0], group:[], result:[]};
if ($i - .prev > $limit)
then {prev:$i, group:[$i], result:(.result + [.group])}
else {prev:$i, group:(.group + [$i]), result}
end
)
| [(.result[], .group) | length]
If the difference from the previous value exceeds the limit, move the current group of values into the result and start a new group with the current value. Otherwise, the current value belongs to the current group, so add it. At the end, count the sizes of the groups to get your result.
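The same accumulation can also be written in a general-purpose language. This is an illustrative Python translation, not part of the jq answer. session_lengths is a hypothetical name, and the function assumes the timestamps are already in ascending order.

```python
def session_lengths(timestamps, limit):
    """Lengths of sessions in ascending timestamps; a gap > limit starts a new session."""
    if not timestamps:
        return []
    groups = [[timestamps[0]]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > limit:
            groups.append([cur])    # gap too long: close the group, start a new one
        else:
            groups[-1].append(cur)  # still the same session
    return [len(g) for g in groups]


timestamps = [101, 102, 105, 116, 128, 129, 140, 145, 146, 152]
print(session_lengths(timestamps, 10))  # [3, 1, 2, 4]
```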
Here's a slightly modified jq version that just counts the values instead of collecting them:
10 as $limit
| reduce .[] as $i (
{prev:.[0], count:0, result:[]};
if ($i - .prev > $limit)
then {prev:$i, count:1, result:(.result + [.count])}
else {prev:$i, count:(.count + 1), result}
end
)
| [.result[], .count]
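The counting variant translates just as directly. In this hypothetical Python sketch (session_counts), prev starts at the first timestamp, so the first element always lands in the first session. That matches prev:.[0] in the jq initial state.

```python
def session_counts(timestamps, limit):
    """Count values per session without storing the groups themselves."""
    result = []
    count = 0
    prev = timestamps[0] if timestamps else None  # like prev:.[0] in the jq version
    for t in timestamps:
        if t - prev > limit:
            result.append(count)  # close the current session
            count = 1             # the current value opens the next one
        else:
            count += 1
        prev = t
    return result + [count]


timestamps = [101, 102, 105, 116, 128, 129, 140, 145, 146, 152]
print(session_counts(timestamps, 10))  # [3, 1, 2, 4]
```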
Here's another jq approach that uses indices to calculate the breakpoint positions:
Producing the lengths of the segments:
10 as $limit
| [
[0, indices(while(. != []; .[1:]) | select(.[0] + $limit < .[1]))[] + 1, length]
| .[range(length-1):] | .[1] - .[0]
]
[
3,
1,
2,
4
]
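Here is a Python sketch of the breakpoint idea behind the indices version. segment_lengths is a hypothetical name. It records index i + 1 wherever the gap to the next timestamp is strictly longer than the limit. It then brackets those cuts with 0 and the array length, and subtracts adjacent bounds, which mirrors .[1] - .[0] in the jq code.

```python
def segment_lengths(ts, limit):
    # Index i + 1 starts a new segment whenever ts[i + 1] - ts[i] > limit.
    cuts = [i + 1 for i in range(len(ts) - 1) if ts[i + 1] - ts[i] > limit]
    bounds = [0] + cuts + [len(ts)]
    # Differences of adjacent bounds are the segment lengths.
    return [b - a for a, b in zip(bounds, bounds[1:])]


timestamps = [101, 102, 105, 116, 128, 129, 140, 145, 146, 152]
print(segment_lengths(timestamps, 10))  # [3, 1, 2, 4]
```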
Producing the segments themselves:
10 as $limit
| [
(
[indices(while(. != []; .[1:]) | select(.[0] + $limit < .[1]))[] + 1]
| [null, .[0]], .[range(length):]
)
as [$a,$b] | .[$a:$b]
]
[
[
101,
102,
105
],
[
116
],
[
128,
129
],
[
140,
145,
146,
152
]
]
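Python supports the same slicing trick because None is accepted as an open slice bound, much like null in jq. Below is a rough sketch that uses segments as a hypothetical function name and the same strict "longer than the limit" rule.

```python
def segments(ts, limit):
    cuts = [i + 1 for i in range(len(ts) - 1) if ts[i + 1] - ts[i] > limit]
    # None as a slice bound means "from the start" / "to the end", like null in jq.
    bounds = [None] + cuts + [None]
    return [ts[a:b] for a, b in zip(bounds, bounds[1:])]


timestamps = [101, 102, 105, 116, 128, 129, 140, 145, 146, 152]
print(segments(timestamps, 10))  # [[101, 102, 105], [116], [128, 129], [140, 145, 146, 152]]
```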

How to do two summarize operations in one Kusto query?

I am stuck with a Kusto query.
This is what I want to do: I would like to show the day-wise sales amount alongside the previous month's sales amount for the same day.
datatable(DateStamp:datetime, OrderId:string, SalesAmount:int)
[
"02-01-2019", "I01", 100,
"02-01-2019", "I02", 200,
"02-02-2019", "I03", 250,
"02-02-2019", "I04", 150,
"02-03-2019", "I13", 110,
"01-01-2019", "I10", 20,
"01-02-2019", "I11", 50,
"01-02-2019", "I12", 30,
]
| extend SalesDate = format_datetime(DateStamp, 'MM/dd/yyyy')
| summarize AmountOfSales = sum(SalesAmount) by SalesDate
This is what I see.
Instead, this is what I want to show as the result:
I couldn't figure out how to use multiple summarize operators in one query.
Here's an option:
datatable(DateStamp:datetime, OrderId:string, SalesAmount:int)
[
"02-01-2019", "I01", 100,
"02-01-2019", "I02", 200,
"02-02-2019", "I03", 250,
"02-02-2019", "I04", 150,
"02-03-2019", "I13", 110,
"01-01-2019", "I10", 20,
"01-02-2019", "I11", 50,
"01-02-2019", "I12", 30,
]
| summarize AmountOfSales = sum(SalesAmount) by bin(DateStamp, 1d)
| as hint.materialized = true T
| extend prev_month = datetime_add("Month", -1, DateStamp)
| join kind=leftouter T on $left.prev_month == $right.DateStamp
| project SalesDate = format_datetime(DateStamp, 'MM/dd/yyyy'), AmountOfSales, AmountOfSalesPrevMonth = coalesce(AmountOfSales1, 0)
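| SalesDate  | AmountOfSales | AmountOfSalesPrevMonth |
|------------|---------------|------------------------|
| 01/01/2019 | 20            | 0                      |
| 01/02/2019 | 80            | 0                      |
| 02/01/2019 | 300           | 20                     |
| 02/02/2019 | 400           | 80                     |
| 02/03/2019 | 110           | 0                      |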
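The following small Python sketch (not a Kusto API) runs over the same sample rows to show what the summarize plus self-join computes. It assumes the date strings are month-day-year, as the MM/dd/yyyy output above suggests. When a day does not exist in the previous month (for example, March 31), the sketch clamps to that month's last day. That clamping is an assumption about datetime_add, not something the answer states. The helpers minus_one_month and daily_with_prev_month are hypothetical.

```python
import calendar
from collections import defaultdict
from datetime import date, datetime


def minus_one_month(d: date) -> date:
    """Same day one month earlier, clamped to that month's last day (an assumption)."""
    year, month = (d.year, d.month - 1) if d.month > 1 else (d.year - 1, 12)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def daily_with_prev_month(orders):
    # summarize AmountOfSales = sum(SalesAmount) by bin(DateStamp, 1d)
    totals = defaultdict(int)
    for stamp, _order_id, amount in orders:
        totals[datetime.strptime(stamp, "%m-%d-%Y").date()] += amount
    # leftouter join on the same day last month, coalesce(..., 0) for misses
    return [
        (day.strftime("%m/%d/%Y"), totals[day], totals.get(minus_one_month(day), 0))
        for day in sorted(totals)
    ]


orders = [
    ("02-01-2019", "I01", 100), ("02-01-2019", "I02", 200),
    ("02-02-2019", "I03", 250), ("02-02-2019", "I04", 150),
    ("02-03-2019", "I13", 110), ("01-01-2019", "I10", 20),
    ("01-02-2019", "I11", 50), ("01-02-2019", "I12", 30),
]
for row in daily_with_prev_month(orders):
    print(*row)
```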

Create column based on calculated % of failures

Here is the sample data and my query. How do I calculate the % of failures and create a new column whose value is based on that percentage?
let M = datatable (Run_Date:datetime , Job_Name:string , Job_Status:string , Count:int )
["2020-10-21", "Job_A1", "Succeeded", 10,
"2020-10-21", "Job_A1", "Failed", 8,
"10/21/2020", "Job_B2", "Succeeded", 21,
"10/21/2020", "Job_C3", "Succeeded", 21,
"10/21/2020", "Job_D4", "Succeeded", 136,
"10/21/2020", "Job_E5", "Succeeded", 187,
"10/21/2020", "Job_E5", "Failed", 4
];
M
| summarize count() by Job_Name, Count, summary = strcat(Job_Name, " failed " , Count, " out of ", Count ," times.")
And the desired output is below.
You could try something like this:
datatable (Run_Date:datetime , Job_Name:string , Job_Status:string , Count:int )
[
"2020-10-21", "Job_A1", "Succeeded", 10,
"2020-10-21", "Job_A1", "Failed", 8,
"10/21/2020", "Job_B2", "Succeeded", 21,
"10/21/2020", "Job_C3", "Succeeded", 21,
"10/21/2020", "Job_D4", "Succeeded", 136,
"10/21/2020", "Job_E5", "Succeeded", 187,
"10/21/2020", "Job_E5", "Failed", 4
]
| summarize failures = sumif(Count, Job_Status == "Failed"), total = sum(Count) by Job_Name, Run_Date
| project Job_Name, failure_rate = round(100.0 * failures / total, 2), Summary = strcat(Job_Name, " failed " , failures, " out of ", total ," times.")
| extend ['Alert if failure rate > 40%'] = iff(failure_rate > 40, "Yes", "No")
Output:
| Job_Name | failure_rate | Summary | Alert if failure rate > 40% |
|----------|--------------|-----------------------------------|-----------------------------|
| Job_A1 | 44.44 | Job_A1 failed 8 out of 18 times. | Yes |
| Job_B2 | 0 | Job_B2 failed 0 out of 21 times. | No |
| Job_C3 | 0 | Job_C3 failed 0 out of 21 times. | No |
| Job_D4 | 0 | Job_D4 failed 0 out of 136 times. | No |
| Job_E5 | 2.09 | Job_E5 failed 4 out of 191 times. | No |
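The same aggregation is sketched below in Python to show the arithmetic behind failure_rate (failures / total x 100, rounded to 2 decimals) and the 40% alert threshold. It groups by job name only, which should be fine here because every sample row has the same Run_Date. failure_report is a hypothetical name.

```python
from collections import defaultdict


def failure_report(rows, threshold=40.0):
    """rows: (job, status, count) tuples; returns one summary tuple per job."""
    failures = defaultdict(int)
    totals = defaultdict(int)
    for job, status, count in rows:
        totals[job] += count        # sum(Count)
        if status == "Failed":
            failures[job] += count  # sumif(Count, Job_Status == "Failed")
    report = []
    for job in sorted(totals):
        rate = round(100.0 * failures[job] / totals[job], 2)
        summary = f"{job} failed {failures[job]} out of {totals[job]} times."
        report.append((job, rate, summary, "Yes" if rate > threshold else "No"))
    return report


rows = [
    ("Job_A1", "Succeeded", 10), ("Job_A1", "Failed", 8),
    ("Job_B2", "Succeeded", 21), ("Job_C3", "Succeeded", 21),
    ("Job_D4", "Succeeded", 136), ("Job_E5", "Succeeded", 187),
    ("Job_E5", "Failed", 4),
]
for line in failure_report(rows):
    print(line)
```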

How to build a distribution of percentages in U-SQL?

In my database I have a column "Study hours per week". I want to use U-SQL to build a distribution that shows the % of students falling into each study-hours bucket. Is there a built-in function to help me achieve this?
Essentially, I want to populate the right side of this table:
| Study hours per week | % of students |
|----------------------|---------------|
| <= 1                 |               |
| <= 5                 |               |
| <= 10                |               |
| <= 20                |               |
| <= 40                |               |
| <= 100               |               |
Example: if we had 10 unique students with the following study hours per week: [5, 6, 10, 9, 2, 25, 18, 5, 12, 1], the resulting output should be:
| Study hours per week | % of students |
|----------------------|---------------|
| <= 1                 | 10%           |
| <= 5                 | 40%           |
| <= 10                | 70%           |
| <= 20                | 90%           |
| <= 40                | 100%          |
| <= 100               | 100%          |
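The requested output is a cumulative distribution. Each bucket counts every student at or below its threshold, which is why the percentages never decrease. A minimal sketch of that calculation follows. It is written in Python, not U-SQL, and cumulative_distribution is a hypothetical name.

```python
def cumulative_distribution(hours, thresholds=(1, 5, 10, 20, 40, 100)):
    """Percentage of students whose weekly study hours are <= each threshold."""
    n = len(hours)
    result = []
    for t in thresholds:
        at_or_below = sum(1 for h in hours if h <= t)
        result.append((f"<= {t}", round(100 * at_or_below / n)))
    return result


hours = [5, 6, 10, 9, 2, 25, 18, 5, 12, 1]
for label, pct in cumulative_distribution(hours):
    print(f"{label:<7}| {pct}%")
```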

collection findAll in an array list

I am using Groovy and I have a collection:
person 1: age - 1, weight - 25
person 2: age - 2, weight - 20
person 3: age - 3, weight - 25
I need to find all persons whose age or weight is in the list of valid ages/weights returned by the methods getValidAgeForSchool() and getValidWeightForSchool(), e.g. ages [2,3] or weights [20,25].
I know there is something like this (also not working):
persons.findAll{ it.age == 2 || it.weight == 20}
But how can I express something like an SQL IN clause, e.g.:
persons.findAll { it.age in [2,3] || it.weight in [20,25] }
I also tried this (ignoring the weight for now), but it doesn't return the list when it is supposed to:
persons.age.findAll{ it == 2 || it == 3}
Thanks.
The code you have works:
def people = [
[ id: 1, age: 1, weight: 25 ],
[ id: 2, age: 2, weight: 20 ],
[ id: 3, age: 3, weight: 25 ]
]
// This will find everyone (as everyone matches your criteria)
assert people.findAll {
it.age in [ 2, 3 ] || it.weight in [ 20, 25 ]
}.id == [ 1, 2, 3 ]
It also works if you have a list of instances like so:
class Person {
int id
int age
int weight
}
def people = [
new Person( id: 1, age: 1, weight: 25 ),
new Person( id: 2, age: 2, weight: 20 ),
new Person( id: 3, age: 3, weight: 25 )
]
I'm assuming your problem is that you have weight as a double or something?
If weight is a double, you'd need to do:
people.findAll { it.age in [ 2, 3 ] || it.weight in [ 20d, 25d ] }.id
But beware: this does exact double equality comparisons, so if you do any arithmetic on the weight, you may fall victim to rounding and accuracy errors.
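A common way around exact floating-point equality is to compare against a tolerance. The sketch below shows that idea in Python rather than Groovy, so it does not reproduce the Groovy Integer/Double mismatch (in Python, 25 == 25.0 is True). It only illustrates the tolerance check. find_all and the default tol of 1e-9 are hypothetical choices.

```python
import math


def find_all(people, valid_ages, valid_weights, tol=1e-9):
    """Ids of people whose age is valid or whose weight is within tol of a valid weight."""
    def weight_ok(w):
        return any(math.isclose(w, v, rel_tol=0.0, abs_tol=tol) for v in valid_weights)
    return [p["id"] for p in people if p["age"] in valid_ages or weight_ok(p["weight"])]


people = [
    {"id": 1, "age": 1, "weight": 25.0},
    {"id": 2, "age": 2, "weight": 20.0},
    {"id": 3, "age": 3, "weight": 25.0},
]
print(find_all(people, [2, 3], [20.0, 25.0]))  # [1, 2, 3]

# A weight produced by arithmetic: 0.1 + 0.2 is not exactly 0.3
light = [{"id": 4, "age": 9, "weight": 0.1 + 0.2}]
print(0.1 + 0.2 in [0.3])              # False: exact comparison
print(find_all(light, [2, 3], [0.3]))  # [4]: tolerance comparison
```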
