Format kusto `summarize percentiles` result - azure-data-explorer

I have a kusto query like so:
BuildRuns
| where FinishTime >= todatetime("2023-01-16T18:32:00.000Z") and FinishTime <= todatetime("2023-02-16T18:32:59.999Z")
| extend DurationInSecs = datetime_diff("Second", FinishTime, StartTime)
| summarize percentiles(DurationInSecs,50,75,90)
This outputs a table of the percentiles in seconds. Awesome. But how do I format the output into m:ss (minutes:seconds)?

Please see the documentation on datetime/timespan arithmetic and on format_timespan():
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/datetime-timespan-arithmetic
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/format-timespanfunction
For example:
print seconds = 125
| extend as_timespan = seconds * 1s
| extend formatted = format_timespan(as_timespan, "mm:ss")
| seconds | as_timespan | formatted |
|---------|-------------|-----------|
| 125 | 00:02:05 | 02:05 |
and
range seconds from 1 to 3500 step 1
| summarize (p50, p75, p90) = percentiles(seconds, 50, 75, 90)
| extend p50_as_timespan = p50 * 1s, p75_as_timespan = p75 * 1s, p90_as_timespan = p90 * 1s
| extend
    p50_formatted = format_timespan(p50_as_timespan, "mm:ss"),
    p75_formatted = format_timespan(p75_as_timespan, "mm:ss"),
    p90_formatted = format_timespan(p90_as_timespan, "mm:ss")
| p50 | p75 | p90 | p50_as_timespan | p75_as_timespan | p90_as_timespan | p50_formatted | p75_formatted | p90_formatted |
|-----|-----|-----|-----------------|-----------------|-----------------|---------------|---------------|---------------|
| 1749 | 2624 | 3151 | 00:29:09 | 00:43:44 | 00:52:31 | 29:09 | 43:44 | 52:31 |
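The seconds-to-mm:ss arithmetic behind these values can be checked outside Kusto. The following is a minimal Python sketch, not part of the original answer; `format_mmss` is a hypothetical helper name. It assumes non-negative whole seconds and, unlike a timespan format that only shows the minutes component, it does not wrap minutes at 60.

```python
def format_mmss(total_seconds: int) -> str:
    """Format a non-negative number of seconds as mm:ss.

    Assumption: minutes are NOT wrapped at 60, so 3600 seconds gives "60:00".
    """
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # 125 is the value from the first example; the others are p50, p75, p90 above.
    for s in (125, 1749, 2624, 3151):
        print(s, format_mmss(s))
```

For a strict m:ss output without the leading zero on the minutes, the format string would become `f"{minutes}:{seconds:02d}"`.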

Related

Kusto - Group by duration value to show numbers

I use the query below to calculate the time difference between two events, but I am not sure how to group the durations. I tried the case function, but it does not seem to work. Is there a way to group the durations? For example, a pie or column chart showing the number of items with durations of more than 2 hours, more than 5 hours, and more than 10 hours. Thanks
| where EventName in ('Handligrequest','Requestcomplete')
| summarize Time_diff = anyif(Timestamp,EventName == "SlackMessagePosted") - anyif(Timestamp,EventName == "ReceivedSlackMessage") by CorrelationId
| where isnotnull(Time_diff)
| extend Duration = format_timespan(Time_diff, 's')
| sort by Duration desc
// Generate data sample. Not part of the solution
let t = materialize (range i from 1 to 1000 step 1 | extend Time_diff = 24h*rand());
// Solution Starts here
t
| summarize count() by time_diff_range = case(Time_diff >= 10h, "10h <= x", Time_diff >= 5h, "5h <= x < 10h", Time_diff >= 2h, "2h <= x < 5h", "x < 2h")
| render piechart
| time_diff_range | count_ |
|-----------------|--------|
| 10h <= x | 590 |
| 5h <= x < 10h | 209 |
| x < 2h | 89 |
| 2h <= x < 5h | 112 |
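The ordering of the conditions matters: case() returns the first branch whose predicate is true, so the largest threshold has to be tested first. The same first-match bucketing can be sketched in Python; `duration_bucket` and the sample durations are hypothetical and only illustrate the logic.

```python
from collections import Counter
from datetime import timedelta


def duration_bucket(d: timedelta) -> str:
    """Return the label of the first matching range, largest threshold first."""
    if d >= timedelta(hours=10):
        return "10h <= x"
    if d >= timedelta(hours=5):
        return "5h <= x < 10h"
    if d >= timedelta(hours=2):
        return "2h <= x < 5h"
    return "x < 2h"


# Hypothetical sample durations, in hours (not data from the question).
sample = [timedelta(hours=h) for h in (0.5, 2, 4.9, 5, 9, 10, 23)]

# Equivalent of `summarize count() by time_diff_range`.
counts = Counter(duration_bucket(d) for d in sample)

if __name__ == "__main__":
    for label, n in counts.items():
        print(label, n)
```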

How to add a percentage sign to the output in Kusto

I want to add a % sign to the MAX and MIN output in Kusto. How can I do that, if it is possible?
Example:
From
| Date time | source | MAX | Min |
|-----------|--------|-----|-----|
| 8/27/2020, 12:00:00.000 PM | C4592E37E9A | 19.43 | 14.91 |
| 8/21/2020, 1:00:00.000 PM | 2E3437E9A5C | 31.97 | 16.37 |
To
| Date time | source | MAX | Min |
|-----------|--------|-----|-----|
| 8/27/2020, 12:00:00.000 PM | C4592E37E9A | 19.43% | 14.91% |
| 8/21/2020, 1:00:00.000 PM | 2E3437E9A5C | 31.97% | 16.37% |
Query:
on TimeGenerated, Resource
| summarize in_Gbps = max(MaxInBps)/10000000000* 100, out_Gbps =
max(MaxOutBps)/10000000000 * 100 by bin(TimeGenerated, 60m), Resource
So I need the in_Gbps/out_Gbps output to include "%".
Thanks
You could use strcat().
For example:
print p1 = 75.56
| extend p2 = strcat(p1, "%")
--------------
p1 | p2
--------------
75.56 | 75.56%
--------------

how to combine R data frames with non-exact criteria (greater/less than condition)

I have to combine two R data frames which have trade and quote information. Like a join, but based on a timestamp in seconds. I need to match each trade with the most recent quote. There are many more quotes than trades.
I have this table with stock quotes. The Timestamp is in seconds:
+--------+-----------+-------+-------+
| Symbol | Timestamp | bid | ask |
+--------+-----------+-------+-------+
| IBM | 10 | 132 | 133 |
| IBM | 20 | 132.5 | 133.3 |
| IBM | 30 | 132.6 | 132.7 |
+--------+-----------+-------+-------+
And these are trades:
+--------+-----------+----------+-------+
| Symbol | Timestamp | quantity | price |
+--------+-----------+----------+-------+
| IBM | 25 | 100 | 132.5 |
| IBM | 31 | 80 | 132.7 |
+--------+-----------+----------+-------+
I think a native R function or dplyr could do it - I've used both for basic purposes but not sure how to proceed here. Any ideas?
So the trade at 25 seconds should match with the quote at 20 seconds, and the trade #31 matches the quote #30, like this:
+--------+-----------+----------+-------+-------+-------+
| Symbol | Timestamp | quantity | price | bid | ask |
+--------+-----------+----------+-------+-------+-------+
| IBM | 25 | 100 | 132.5 | 132.5 | 133.3 |
| IBM | 31 | 80 | 132.7 | 132.6 | 132.7 |
+--------+-----------+----------+-------+-------+-------+
Consider merging on a calculated field by increments of 10. Specifically, calculate a column for multiples of 10 in both datasets, and merge on that field with Symbol.
Below transform and within are used to assign and de-assign the helper field, mult10. In this use case, both base functions are interchangeable:
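# within() takes an assignment expression (<-); the helper column mult10 is dropped after the merge
final_df <- transform(merge(within(quotes, mult10 <- floor(Timestamp / 10) * 10),
                            within(trades, mult10 <- floor(Timestamp / 10) * 10),
                            by = c("Symbol", "mult10")),
                      mult10 = NULL)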
If a multiple of 10 does not suffice for your needs, adjust to the level you require, such as 15, 5, 2, etc.
within(quotes, mult10 <- floor(Timestamp / 15) * 15)
within(quotes, mult10 <- floor(Timestamp / 5) * 5)
within(quotes, mult10 <- floor(Timestamp / 2) * 2)
You may also need to round in opposite directions, using ceiling for one data set and floor for the other, to calculate the highest multiple for each quote's Timestamp and the lowest multiple for each trade's Timestamp:
within(quotes, mult10 <- ceiling(Timestamp / 15) * 15)
within(trades, mult10 <- floor(Timestamp / 5) * 5)
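For comparison, the "most recent quote at or before each trade" rule from the question can also be expressed directly as an as-of lookup, which is a different technique from the bucketed merge above and does not depend on quotes arriving at regular intervals. The following is a minimal Python sketch using binary search; `asof_join` is a hypothetical name, the rows are the ones from the question, and it assumes that a quote with the same Timestamp as the trade counts as the most recent one.

```python
from bisect import bisect_right

# (Symbol, Timestamp, bid, ask), taken from the question
quotes = [
    ("IBM", 10, 132.0, 133.0),
    ("IBM", 20, 132.5, 133.3),
    ("IBM", 30, 132.6, 132.7),
]
# (Symbol, Timestamp, quantity, price), taken from the question
trades = [("IBM", 25, 100, 132.5), ("IBM", 31, 80, 132.7)]


def asof_join(trades, quotes):
    """Attach to each trade the latest quote with Timestamp <= the trade's."""
    by_symbol = {}
    for sym, ts, bid, ask in sorted(quotes, key=lambda q: (q[0], q[1])):
        times, rows = by_symbol.setdefault(sym, ([], []))
        times.append(ts)
        rows.append((bid, ask))

    result = []
    for sym, ts, qty, price in trades:
        times, rows = by_symbol.get(sym, ([], []))
        i = bisect_right(times, ts)  # number of quotes with Timestamp <= ts
        bid, ask = rows[i - 1] if i else (None, None)  # no earlier quote: None
        result.append((sym, ts, qty, price, bid, ask))
    return result


if __name__ == "__main__":
    for row in asof_join(trades, quotes):
        print(row)
```

Because the quotes are sorted once per symbol and each trade is resolved with a binary search, this stays practical when there are many more quotes than trades.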

Recursive query with sub-graph aggregation

I am trying to use Neo4j to write a query that aggregates quantities along a particular sub-graph.
We have two stores Store1 and Store2 one with supplier S1 the other with supplier S2. We move 100 units from Store1 into Store3 and 200 units from Store2 to Store3.
We then move 100 units from Store3 to Store4. So now Store4 has 100 units and approximately 33 originated from supplier S1 and 66 from supplier S2.
I need the query to effectively return this information, E.g.
S1, 33
S2, 66
I have a recursive query to aggregate all the movements along each path
MATCH p=(store1:Store)-[m:MOVE_TO*]->(store2:Store { Name: 'Store4'})
RETURN store1.Supplier, reduce(amount = 0, n IN relationships(p) | amount + n.Quantity) AS reduction
Returns:
| store1.Supplier | reduction|
|-------------------- |-------------|
| S1 | 200 |
| S2 | 300 |
| null | 100 |
Desired:
| store1.Supplier | reduction|
|---------------------|-------------|
| S1 | 33.33 |
| S2 | 66.67 |
What about this one:
MATCH (s:Store) WHERE s.name = 'Store4'
MATCH (s)<-[t:MOVE_TO]-()<-[r:MOVE_TO]-(supp)
WITH t.qty as total, collect(r) as movements
WITH total, movements, reduce(totalSupplier = 0, r IN movements | totalSupplier + r.qty) as supCount
UNWIND movements as movement
RETURN startNode(movement).name as supplier, round(100.0*movement.qty/supCount) as pct
Which returns:
| supplier | pct |
|----------|-----|
| Store1 | 33 |
| Store2 | 67 |
Returned 2 rows in 151 ms
So the following is pretty ugly, but it works for the example you've given.
MATCH (s4:Store { Name:'Store4' })<-[r1:MOVE_TO]-(s3:Store)<-[r2:MOVE_TO*]-(s:Store)
WITH s3, r1.Quantity as Factor, SUM(REDUCE(amount = 0, r IN r2 | amount + r.Quantity)) AS Total
MATCH (s3)<-[r1:MOVE_TO*]-(s:Store)
WITH s.Supplier as Supplier, REDUCE(amount = 0, r IN r1 | amount + r.Quantity) AS Quantity, Factor, Total
RETURN Supplier, Quantity, Total, toFloat(Quantity) / toFloat(Total) * Factor as Proportion
I'm sure it can be improved.
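The arithmetic behind the desired result is a proportional split: each supplier's share of the units moved out of Store3 equals its share of the units moved in. The following is a minimal Python sketch of that calculation only, not of the Cypher queries; `supplier_shares` is a hypothetical name, and it assumes a single intermediate store in which units from all suppliers mix uniformly.

```python
def supplier_shares(inflows: dict[str, float], moved_out: float) -> dict[str, float]:
    """Split `moved_out` units between suppliers in proportion to their inflows."""
    total_in = sum(inflows.values())
    return {
        supplier: moved_out * quantity / total_in
        for supplier, quantity in inflows.items()
    }


# 100 units from S1 and 200 units from S2 enter Store3; 100 units leave for Store4.
shares = supplier_shares({"S1": 100, "S2": 200}, moved_out=100)

if __name__ == "__main__":
    for supplier, amount in shares.items():
        print(supplier, round(amount, 2))
```

A graph with several intermediate levels would need this split applied at every hop, which is what makes the general query harder than the two-level example.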

Can I calculate the average of these numbers?

I was wondering if it's possible to calculate the average of some numbers if I have this:
int currentCount = 12;
float currentScore = 6.1123 (this is a range of 1 <-> 10).
Now, if I receive another score (let's say 4.5), can I recalculate the average so it would be something like:
int currentCount now equals 13
float currentScore now equals ?????
or is this impossible and I still need to remember the list of scores?
The following formulas allow you to track averages just from stored average and count, as you requested.
currentScore = (currentScore * currentCount + newValue) / (currentCount + 1)
currentCount = currentCount + 1
This relies on the fact that your average is currently your sum divided by the count. So you simply multiply count by average to get the sum, add your new value and divide by (count+1), then increase count.
So, let's say you have the data {7,9,11,1,12} and the only thing you're keeping is the average and count. As each number is added, you get:
+--------+-------+----------------------+----------------------+
| Number | Count | Actual average | Calculated average |
+--------+-------+----------------------+----------------------+
| 7 | 1 | (7)/1 = 7 | (0 * 0 + 7) / 1 = 7 |
| 9 | 2 | (7+9)/2 = 8 | (7 * 1 + 9) / 2 = 8 |
| 11 | 3 | (7+9+11)/3 = 9 | (8 * 2 + 11) / 3 = 9 |
| 1 | 4 | (7+9+11+1)/4 = 7 | (9 * 3 + 1) / 4 = 7 |
| 12 | 5 | (7+9+11+1+12)/5 = 8 | (7 * 4 + 12) / 5 = 8 |
+--------+-------+----------------------+----------------------+
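The update rule can be checked with a short Python sketch that keeps only the average and the count, as in the table above. `add_score` is a hypothetical helper name, not something from the question.

```python
def add_score(average: float, count: int, new_value: float) -> tuple[float, int]:
    """Return the updated (average, count) after adding one new value."""
    new_count = count + 1
    # average * count recovers the previous sum without storing the values.
    return (average * count + new_value) / new_count, new_count


average, count = 0.0, 0
history = []
for value in (7, 9, 11, 1, 12):
    average, count = add_score(average, count, value)
    history.append(average)

if __name__ == "__main__":
    print(history)
    # The case from the question: average 6.1123 over 12 scores, new score 4.5.
    print(add_score(6.1123, 12, 4.5))
```

With the numbers from the question, the count becomes 13 and the average drops to roughly 5.99.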
I like to store the sum and the count. It avoids an extra multiply each time.
current_sum += input;
current_count++;
current_average = current_sum/current_count;
It's quite easy really, when you look at the formula for the average: (A1 + A2 + ... + AN)/N. Now, if you have the old average and N (the count of numbers), you can easily calculate the new average:
newScore = (currentScore * currentCount + someNewValue)/(currentCount + 1)
You can store currentCount and sumScore and you calculate sumScore/currentCount.
Or... if you want to be silly, you can do it in one line:
current_average = (current_sum = current_sum + newValue) / ++current_count;
:)
float currentScore now equals (currentScore * (currentCount-1) + 4.5)/currentCount ?
