I am stuck with a Kusto query.
This is what I want to do: I would like to show the day-wise sales amount alongside the previous month's sales amount on the same day of the month.
datatable(DateStamp:datetime, OrderId:string, SalesAmount:int)
[
"02-01-2019", "I01", 100,
"02-01-2019", "I02", 200,
"02-02-2019", "I03", 250,
"02-02-2019", "I04", 150,
"02-03-2019", "I13", 110,
"01-01-2019", "I10", 20,
"01-02-2019", "I11", 50,
"01-02-2019", "I12", 30,
]
| extend SalesDate = format_datetime(DateStamp, 'MM/dd/yyyy')
| summarize AmountOfSales = sum(SalesAmount) by SalesDate
This is what I see.
Instead, this is what I want to show as the result:
I couldn't figure out how to add multiple summarize operators in one query.
Here's an option:
datatable(DateStamp:datetime, OrderId:string, SalesAmount:int)
[
"02-01-2019", "I01", 100,
"02-01-2019", "I02", 200,
"02-02-2019", "I03", 250,
"02-02-2019", "I04", 150,
"02-03-2019", "I13", 110,
"01-01-2019", "I10", 20,
"01-02-2019", "I11", 50,
"01-02-2019", "I12", 30,
]
| summarize AmountOfSales = sum(SalesAmount) by bin(DateStamp, 1d)
| as hint.materialized = true T
| extend prev_month = datetime_add("Month", -1, DateStamp)
| join kind=leftouter T on $left.prev_month == $right.DateStamp
| project SalesDate = format_datetime(DateStamp, 'MM/dd/yyyy'), AmountOfSales, AmountOfSalesPrevMonth = coalesce(AmountOfSales1, 0)
| SalesDate  | AmountOfSales | AmountOfSalesPrevMonth |
|------------|---------------|------------------------|
| 01/01/2019 | 20            | 0                      |
| 01/02/2019 | 80            | 0                      |
| 02/01/2019 | 300           | 20                     |
| 02/02/2019 | 400           | 80                     |
| 02/03/2019 | 110           | 0                      |
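One subtlety in the join key is worth calling out: datetime_add clamps dates that fall past the end of a shorter target month, so several end-of-month days can map to the same previous-month day. A quick illustration (not part of the solution):
print datetime_add("Month", -1, datetime(2019-03-31))
// --> 2019-02-28T00:00:00Z, i.e. March 29, 30, and 31 all produce the same join key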
Let's take the following dataset:
| timestamp_bin        | deviceId                             | flow | level | pressure |
|----------------------|--------------------------------------|------|-------|----------|
| 2020-05-15T00:00:00Z | fddf1cec-16db-4461-9057-3d08e46b6bcf | NaN  | 55    | NaN      |
| 2020-05-15T00:00:00Z | aaaaaaaa-fed4-c23b-422b-e85e0877c092 | 365  | 85    | NaN      |
| 2020-05-15T00:00:00Z | cb04ccff-48bc-4108-9d16-7d7db9152895 | NaN  | NaN   | 130      |
I would like to merge the flow, level, and pressure columns without the NaN values, and without having to mention the flow, level, and pressure columns explicitly:
| deviceId                             | timestamp                   | value                          |
|--------------------------------------|-----------------------------|--------------------------------|
| fddf1cec-16db-4461-9057-3d08e46b6bcf | 2020-05-15 17:01:35.7750000 | {"level": 55.0}                |
| aaaaaaaa-fed4-c23b-422b-e85e0877c092 | 2020-05-15 17:01:35.7750000 | {"flow": 365.0, "level": 85.0} |
| cb04ccff-48bc-4108-9d16-7d7db9152895 | 2020-05-15 17:01:35.7750000 | {"pressure": 130.0}            |
The following query achieves this result:
let deviceTelemetry = datatable (deviceId: guid, timestamp: datetime, flow: real, level: real, pressure: real)
[
'fddf1cec-16db-4461-9057-3d08e46b6bcf', '2020-05-15 17:01:35.7750000', real(NaN), 55, real(NaN),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092', '2020-05-15 17:01:35.7750000', 365, 85, real(NaN),
'cb04ccff-48bc-4108-9d16-7d7db9152895', '2020-05-15 17:01:35.7750000', real(NaN), real(NaN), 130,
];
deviceTelemetry
| extend packAll=pack_all()
| extend packWithNan=bag_remove_keys(packAll, dynamic(['deviceId', 'timestamp']))
| project-away packAll
| mv-expand kind=array packWithNan
| where packWithNan[1] != 'NaN'
| extend packWithoutNan=pack(tostring(packWithNan[0]), packWithNan[1])
| summarize value=make_bag(packWithoutNan) by deviceId, timestamp
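For reference, the indexing in the last steps works because mv-expand kind=array turns each property of a bag into a [key, value] array; a minimal check:
print bag = dynamic({"flow": 365, "level": 85})
| mv-expand kind=array bag
// --> two rows: bag == ["flow", 365] and bag == ["level", 85]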
Is there a better way to achieve this?
Edit:
This discussion is a continuation of this previous thread
... and just for the fun of it, here is an extension of the previous solution:
let deviceTelemetry = datatable (deviceId:guid, timestamp:datetime, value:dynamic)[
'fddf1cec-16db-4461-9057-3d08e46b6bcf','2020-05-15 17:01:35.7750000', dynamic({ "level": 60}),
'fddf1cec-16db-4461-9057-3d08e46b6bcf','2020-05-15 18:01:35.7750000', dynamic({ "level": 50}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 17:01:35.7750000', dynamic({ "level": 100, "flow": 350}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 18:01:35.7750000', dynamic({ "level": 90, "flow": 360}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 19:01:35.7750000', dynamic({ "level": 80, "flow": 370}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 20:01:35.7750000', dynamic({ "level": 70, "flow": 380}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 21:01:35.7750000', dynamic({ "pressure": 120}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 20:01:35.7750000', dynamic({ "pressure": 130}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 21:01:35.7750000', dynamic({ "pressure": 140}),
];
deviceTelemetry
| mv-expand kind=array value
| extend k = tostring(value[0]), v = toreal(value[1])
| extend timestamp_bin = bin(timestamp, 1d)
| evaluate pivot(k, avg(v), timestamp_bin, deviceId)
// add the following 2 lines of code
| mv-apply with_itemindex=i pa = pack_all() on (summarize make_bag_if(pa, i >= 2 and tostring(pa) !has_cs "NaN"))
| project timestamp_bin, deviceId, bag_pa
| timestamp_bin        | deviceId                             | bag_pa                  |
|----------------------|--------------------------------------|-------------------------|
| 2020-05-15T00:00:00Z | fddf1cec-16db-4461-9057-3d08e46b6bcf | {"level":55}            |
| 2020-05-15T00:00:00Z | aaaaaaaa-fed4-c23b-422b-e85e0877c092 | {"flow":365,"level":85} |
| 2020-05-15T00:00:00Z | cb04ccff-48bc-4108-9d16-7d7db9152895 | {"pressure":130}        |
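The make_bag_if line does two jobs at once: with_itemindex=i exposes each expanded item's position, so i >= 2 skips the first two packed columns (timestamp_bin and deviceId), while the !has_cs "NaN" test drops the NaN entries. A minimal sketch of the index filter on its own:
print arr = dynamic([10, 20, 30])
| mv-apply with_itemindex=i arr on (summarize picked = make_list_if(arr, i >= 1))
// --> picked == [20, 30]: expanded items whose index fails the predicate are dropped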
This is why it's important to share the whole scenario and not just a fraction of it :-)
let deviceTelemetry = datatable (deviceId:guid, timestamp:datetime, value:dynamic)[
'fddf1cec-16db-4461-9057-3d08e46b6bcf','2020-05-15 17:01:35.7750000', dynamic({ "level": 60}),
'fddf1cec-16db-4461-9057-3d08e46b6bcf','2020-05-15 18:01:35.7750000', dynamic({ "level": 50}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 17:01:35.7750000', dynamic({ "level": 100, "flow": 350}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 18:01:35.7750000', dynamic({ "level": 90, "flow": 360}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 19:01:35.7750000', dynamic({ "level": 80, "flow": 370}),
'aaaaaaaa-fed4-c23b-422b-e85e0877c092','2020-05-15 20:01:35.7750000', dynamic({ "level": 70, "flow": 380}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 21:01:35.7750000', dynamic({ "pressure": 120}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 20:01:35.7750000', dynamic({ "pressure": 130}),
'cb04ccff-48bc-4108-9d16-7d7db9152895','2020-05-15 21:01:35.7750000', dynamic({ "pressure": 140}),
];
deviceTelemetry
| mv-expand kind=array value
| extend k = tostring(value[0]), v = toreal(value[1])
| summarize avg(v) by k, timestamp_bin = bin(timestamp, 1d), deviceId
| summarize make_bag(pack_dictionary(k, avg_v)) by timestamp_bin, deviceId
| timestamp_bin        | deviceId                             | bag_                    |
|----------------------|--------------------------------------|-------------------------|
| 2020-05-15T00:00:00Z | fddf1cec-16db-4461-9057-3d08e46b6bcf | {"level":55}            |
| 2020-05-15T00:00:00Z | aaaaaaaa-fed4-c23b-422b-e85e0877c092 | {"level":85,"flow":365} |
| 2020-05-15T00:00:00Z | cb04ccff-48bc-4108-9d16-7d7db9152895 | {"pressure":130}        |
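The heart of this version is the make_bag(pack_dictionary(...)) aggregation, which merges the per-key pairs of each group into a single bag (pack_dictionary is an older alias of bag_pack). A minimal sketch of that step in isolation:
datatable(k:string, v:real)
[
    "flow", 365,
    "level", 85,
]
| summarize bag = make_bag(bag_pack(k, v))
// --> a single row with bag == {"flow": 365, "level": 85}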
I need to find out the lengths of user sessions given the timestamps of individual visits.
A new session starts every time the delay between adjacent timestamps is longer than a limit.
For example, for this set of timestamps (consider them a sort of seconds-from-epoch):
[
101,
102,
105,
116,
128,
129,
140,
145,
146,
152
]
...and for limit=10, I need the following output:
[
3,
1,
2,
4
]
Assuming the values will be in ascending order, loop through the values accumulating the groups based on your condition. reduce works well in this case.
10 as $limit # remove this so you can feed in your value as an argument
| reduce .[] as $i (
{prev:.[0], group:[], result:[]};
if ($i - .prev > $limit)
then {prev:$i, group:[$i], result:(.result + [.group])}
else {prev:$i, group:(.group + [$i]), result}
end
)
| [(.result[], .group) | length]
If the difference from the previous value exceeds the limit, take the current group of values and move it to the result. Otherwise, the current value belongs to the current group, so add it. At the end, you can count the sizes of the groups to get your result.
Here's a slightly modified version that just counts the values up.
10 as $limit
| reduce .[] as $i (
{prev:.[0], count:0, result:[]};
if ($i - .prev > $limit)
then {prev:$i, count:1, result:(.result + [.count])}
else {prev:$i, count:(.count + 1), result}
end
)
| [.result[], .count]
Here's another approach using indices to calculate the breakpoint positions:
Producing the lengths of the segments:
10 as $limit
| [
[0, indices(while(. != []; .[1:]) | select(.[0] + $limit <= .[1]))[] + 1, length]
| .[range(length-1):] | .[1] - .[0]
]
[
3,
1,
2,
4
]
Producing the segments themselves:
10 as $limit
| [
(
[indices(while(. != []; .[1:]) | select(.[0] + $limit <= .[1]))[] + 1]
| [null, .[0]], .[range(length):]
)
as [$a,$b] | .[$a:$b]
]
[
[
101,
102,
105
],
[
116
],
[
128,
129
],
[
140,
145,
146,
152
]
]
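As an aside, since most of this page is Kusto: the same sessionization can be sketched in KQL with prev() and row_cumsum() over serialized rows. A hedged sketch, assuming the timestamps sit in a single numeric column:
let gap_limit = 10;
datatable(t:long) [101, 102, 105, 116, 128, 129, 140, 145, 146, 152]
| order by t asc
| extend new_session = iff(isnull(prev(t)) or t - prev(t) > gap_limit, 1, 0)
| extend session_id = row_cumsum(new_session)
| summarize session_length = count() by session_id
// --> session lengths 3, 1, 2, 4, matching the jq output above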
Here is my data set in Kusto; I am trying to generate a "releaseRank" column based on the release column value.
Input Dataset:
let T = datatable(release:string, metric:long)
[
"22.05", 20,
"22.04", 40,
"22.03", 50,
"22.01", 560
];
T
| take 100;
Desired output:
I found that Kusto has serialize and row_number:
T
| serialize
| extend releaseRank = row_number()
| take 100;
But if the release value is repeated, I need the releaseRank to be the same. E.g., given the data set below, I am not getting the desired output:
let T = datatable(release:string, metric:long)
[
"22.05", 20,
"22.05", 21,
"22.04", 40,
"22.03", 50,
"22.01", 560
];
T
| serialize
| extend releaseRank = row_number()
| take 100;
Expected output:
This should do what you want:
let T = datatable(release:string, metric:long)
[
"22.05", 20,
"22.05", 21,
"22.04", 40,
"22.03", 50,
"22.01", 560
];
T
| sort by release desc, metric asc
| extend Rank=row_rank(release)
| release | metric | Rank |
|---------|--------|------|
| 22.05   | 20     | 1    |
| 22.05   | 21     | 1    |
| 22.04   | 40     | 2    |
| 22.03   | 50     | 3    |
| 22.01   | 560    | 4    |
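Note: in current Kusto documentation this dense-ranking behavior is exposed as row_rank_dense() (with row_rank_min() as a sibling). If plain row_rank() isn't recognized on your cluster, this should be the equivalent:
T
| sort by release desc, metric asc
| extend Rank = row_rank_dense(release)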
I have a list of employee names and salaries in the following order.
I need to create the output table in the below format, i.e., whenever the accumulated salary total crosses 3000 I have to detect that and mark that row.
I have tried to do row_cumsum and reset the term once it crossed 3000, but it didn't work for the second iteration.
datatable (name:string, month:int, salary:long)
[
"Alice", 1, 1000,
"Alice", 2, 2000,
"Alice", 3, 1400,
"Alice", 3, 1400,
"Alice", 3, 1400,
]
| order by name asc, month asc
| extend total = row_cumsum(salary)
| extend total = iff(total >= 3000, total - prev(total), total)
This is now possible with the scan operator:
datatable (name:string, salary:long)
[
"Alice", 1000,
"Alice", 2000,
"Alice", 1400,
"Alice", 1400,
"Alice", 1400,
"Alice", 1000,
"Bob", 2400,
"Bob", 1000,
"Bob", 1000
]
| sort by name asc
| scan declare (total:long) with
(
step s: true => total = iff(isnull(s.total) or name != s.name, salary, iff(s.total < 3000, s.total + salary, salary));
)
| extend boundary_detected = iff(total >= 3000, 1, long(null))
| name  | salary | total | boundary_detected |
|-------|--------|-------|-------------------|
| Alice | 1000   | 1000  |                   |
| Alice | 2000   | 3000  | 1                 |
| Alice | 1400   | 1400  |                   |
| Alice | 1400   | 2800  |                   |
| Alice | 1400   | 4200  | 1                 |
| Alice | 1000   | 1000  |                   |
| Bob   | 2400   | 2400  |                   |
| Bob   | 1000   | 3400  | 1                 |
| Bob   | 1000   | 1000  |                   |
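The step declaration is what carries state from row to row: s.total is the value total had on the previous row of the scan (null on the first row). A minimal sketch of that mechanism on its own, computing a plain running sum:
range x from 1 to 5 step 1
| scan declare (running:long) with
(
    step s: true => running = iff(isnull(s.running), x, s.running + x);
)
// --> running takes the values 1, 3, 6, 10, 15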
Here is the sample data and my query. How can I calculate the percentage of failures and create a new column based on that percentage?
let M = datatable (Run_Date:datetime, Job_Name:string, Job_Status:string, Count:int)
[
"2020-10-21", "Job_A1", "Succeeded", 10,
"2020-10-21", "Job_A1", "Failed", 8,
"10/21/2020", "Job_B2", "Succeeded", 21,
"10/21/2020", "Job_C3", "Succeeded", 21,
"10/21/2020", "Job_D4", "Succeeded", 136,
"10/21/2020", "Job_E5", "Succeeded", 187,
"10/21/2020", "Job_E5", "Failed", 4
];
M
| summarize count() by Job_Name, Count, summary = strcat(Job_Name, " failed " , Count, " out of ", Count ," times.")
And the desired output is below.
You could try something like this:
datatable (Run_Date:datetime, Job_Name:string, Job_Status:string, Count:int)
[
"2020-10-21", "Job_A1", "Succeeded", 10,
"2020-10-21", "Job_A1", "Failed", 8,
"10/21/2020", "Job_B2", "Succeeded", 21,
"10/21/2020", "Job_C3", "Succeeded", 21,
"10/21/2020", "Job_D4", "Succeeded", 136,
"10/21/2020", "Job_E5", "Succeeded", 187,
"10/21/2020", "Job_E5", "Failed", 4
]
| summarize failures = sumif(Count, Job_Status == "Failed"), total = sum(Count) by Job_Name, Run_Date
| project Job_Name, failure_rate = round(100.0 * failures / total, 2), Summary = strcat(Job_Name, " failed " , failures, " out of ", total ," times.")
| extend ['Alert if failure rate > 40%'] = iff(failure_rate > 40, "Yes", "No")
--->
| Job_Name | failure_rate | Summary | Alert if failure rate > 40% |
|----------|--------------|-----------------------------------|-----------------------------|
| Job_A1 | 44.44 | Job_A1 failed 8 out of 18 times. | Yes |
| Job_B2 | 0 | Job_B2 failed 0 out of 21 times. | No |
| Job_C3 | 0 | Job_C3 failed 0 out of 21 times. | No |
| Job_D4 | 0 | Job_D4 failed 0 out of 136 times. | No |
| Job_E5 | 2.09 | Job_E5 failed 4 out of 191 times. | No |
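If a single yes/no threshold isn't enough, case() generalizes the final iff(). A hedged variant, reusing just the failure_rate column for illustration:
datatable(Job_Name:string, failure_rate:real)
[
    "Job_A1", 44.44,
    "Job_E5", 2.09,
]
| extend Alert = case(failure_rate > 40, "Critical",
                      failure_rate > 10, "Warning",
                      "OK")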