Kusto - Group by duration value to show numbers - azure-data-explorer

I use the query below to calculate the time difference between two events, but I am not sure how to group the durations. I tried the case function, but it does not seem to work. Is there a way to group the durations? For example, a pie or column chart showing the number of items with durations of more than 2 hours, more than 5 hours, and more than 10 hours. Thanks
| where EventName in ('Handligrequest','Requestcomplete')
| summarize Time_diff = anyif(Timestamp,EventName == "SlackMessagePosted") - anyif(Timestamp,EventName == "ReceivedSlackMessage") by CorrelationId
| where isnotnull(Time_diff)
| extend Duration = format_timespan(Time_diff, 's')
| sort by Duration desc

// Generate data sample. Not part of the solution
let t = materialize (range i from 1 to 1000 step 1 | extend Time_diff = 24h*rand());
// Solution Starts here
t
| summarize count() by time_diff_range = case(Time_diff >= 10h, "10h <= x", Time_diff >= 5h, "5h <= x < 10h", Time_diff >= 2h, "2h <= x < 5h", "x < 2h")
| render piechart
time_diff_range | count_
10h <= x | 590
5h <= x < 10h | 209
x < 2h | 89
2h <= x < 5h | 112
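The order of the case() arguments matters: the predicates are checked in order and the first match wins, which is why the largest threshold comes first. A minimal Python sketch of the same first-match bucketing, outside Kusto (the names BUCKETS, duration_bucket and count_by_bucket, and the sample values, are made up for illustration):

```python
from collections import Counter
from datetime import timedelta

# Hypothetical thresholds mirroring the case() arguments, largest first.
BUCKETS = [
    (timedelta(hours=10), "10h <= x"),
    (timedelta(hours=5), "5h <= x < 10h"),
    (timedelta(hours=2), "2h <= x < 5h"),
]


def duration_bucket(diff: timedelta) -> str:
    """Return the label of the first threshold that diff reaches."""
    for threshold, label in BUCKETS:
        if diff >= threshold:
            return label
    return "x < 2h"  # default branch, like the last case() argument


def count_by_bucket(diffs):
    """Count durations per bucket, like summarize count() by ..."""
    return Counter(duration_bucket(d) for d in diffs)


# Illustrative durations in hours, not real data.
sample = [timedelta(hours=h) for h in (0.5, 2, 4.9, 5, 9, 10, 23)]
counts = count_by_bucket(sample)
```

If the thresholds were listed smallest first, every duration of 2 hours or more would land in the first bucket, so the ordering is part of the logic.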

Related

Auto update Kusto to pull today's date but a specific time

I am very new to Kusto queries, and I have one that gives me the proper data, which I export to Excel to manage. My only problem is that I only care (right now) about yesterday and today, in two separate sheets. I can manually change the datetime values, but I would like to be able to just refresh the data and have it pull the newest numbers.
It sounds pretty simple, but I cannot figure out how to specify the exact time I want. It has to be from 2:00 am on day 1 until 1:59 am on day 2.
Thanks
['Telemetry.WorkStation']
| where NexusUid == "08463c7b-fe37-43b6-a0d2-237472b9774d"
| where TelemetryLocalTimeStamp >= make_datetime(2023,2,15,2,0,0) and TelemetryLocalTimeStamp < make_datetime(2023,2,16,01,59,0)
| where NumberOfBinPresentations >0
Use ago(), now(), startofday() and some datetime arithmetic.
// Sample data generation. Not part of the solution.
let ['Telemetry.WorkStation'] = materialize(range i from 1 to 1000000 step 1 | extend NexusUid = "08463c7b-fe37-43b6-a0d2-237472b9774d", TelemetryLocalTimeStamp = ago(2d * rand()));
// Solution starts here.
['Telemetry.WorkStation']
| where NexusUid == "08463c7b-fe37-43b6-a0d2-237472b9774d"
| where TelemetryLocalTimeStamp >= startofday(ago(1d)) + 2h
and TelemetryLocalTimeStamp < startofday(now()) + 2h
| summarize count(), min(TelemetryLocalTimeStamp), max(TelemetryLocalTimeStamp)
count_ | min_TelemetryLocalTimeStamp | max_TelemetryLocalTimeStamp
500539 | 2023-02-15T02:00:00.0162031Z | 2023-02-16T01:59:59.8883692Z
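The same window arithmetic can be checked outside Kusto. Here is a rough Python sketch of what startofday(ago(1d)) + 2h and startofday(now()) + 2h evaluate to; the function name reporting_window and the fixed example timestamp are assumptions for illustration only:

```python
from datetime import datetime, timedelta, timezone


def reporting_window(now: datetime, offset: timedelta = timedelta(hours=2)):
    """Return (start, end): yesterday at `offset` up to today at `offset`."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = start_of_today - timedelta(days=1) + offset  # inclusive lower bound
    end = start_of_today + offset                        # exclusive upper bound
    return start, end


# Example with a fixed "now" so the result is reproducible.
example_now = datetime(2023, 2, 16, 9, 30, tzinfo=timezone.utc)
start, end = reporting_window(example_now)
```

With that example timestamp the window runs from 2023-02-15 02:00 to 2023-02-16 02:00 (exclusive), which matches the min and max values in the result above. Note that if the query is refreshed between midnight and 2 am, the upper bound lies in the future, so the window will still be filling up.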

Is it possible to iterate over the row values of a column in KQL to feed each value through a function

I am applying the series_decompose_anomalies algorithm to time data coming from multiple meters. Currently, I am using the ADX dashboard feature to feed my meter identifier as a parameter into the algorithm and return my anomalies and scores as a table.
let dt = 3hr;
Table
| where meter_ID == dashboardParameter
| make-series num=avg(value) on timestamp from _startTime to _endTime step dt
| extend (anomalies,score,baseline) = series_decompose_anomalies( num, 3,-1, 'linefit')
| mv-expand timestamp, num, baseline, anomalies, score
| where anomalies ==1
| project dashboardParameter, todatetime(timestamp), toreal(num), toint(anomalies), toreal(score)
I would like to bulk process all my meters in one go and return a table with all anomalies found across them. Is it possible to feed an array as an iterable in KQL or something similar to allow my parameter to change multiple times in a single run?
Simply add by meter_ID to make-series
(and remove | where meter_ID == dashboardParameter)
| make-series num=avg(value) on timestamp from _startTime to _endTime step dt by meter_ID
P.S.
Anomaly can be positive (num > baseline => flag = 1) or negative (num < baseline => flag = -1)
Demo
let _step = 1h;
let _endTime = toscalar(TransformedServerMetrics | summarize max(Timestamp));
let _startTime = _endTime - 12h;
TransformedServerMetrics
| make-series num = avg(Value) on Timestamp from _startTime to _endTime step _step by SQLMetrics
| extend (flag, score, baseline) = series_decompose_anomalies(num , 3,-1, 'linefit')
| mv-expand Timestamp to typeof(datetime), num to typeof(real), flag to typeof(int), score to typeof(real), baseline to typeof(real)
| where flag != 0
SQLMetrics | num | Timestamp | flag | score | baseline
write_bytes | 169559910.91717172 | 2022-06-14T15:00:30.2395884Z | -1 | -3.4824039875238131 | 170205132.25708669
cpu_time_ms | 17.369556143036036 | 2022-06-14T17:00:30.2395884Z | 1 | 7.8874529842826 | 11.04372634506527
percent_complete | 0.04595588235294118 | 2022-06-14T22:00:30.2395884Z | 1 | 25.019464868749985 | 0.004552738927738928
blocking_session_id | -5 | 2022-06-14T22:00:30.2395884Z | -1 | -25.019464868749971 | -0.49533799533799527
pending_disk_io_count | 0.0019675925925925924 | 2022-06-14T23:00:30.2395884Z | 1 | 6.4686836384225685 | 0.00043773741690408352

How to build a distribution of percentages in U-SQL?

In my database I have the column "Study hours per week". I want to build a distribution using U-SQL which groups the % of students into each 'student hour' bucket. Is there a built-in function to help me achieve this?
Essentially, I want to populate the right side of this table:
Study Hours per week | % of students
<= 1
<= 5
<= 10
<= 20
<= 40
<= 100
Example: If we had 10 unique students with the following study hours/week: [5, 6, 10, 9, 2, 25, 18, 5, 12, 1] the resulting output should be:
Study Hours per week | % of students
<= 1 | 10%
<= 5 | 40%
<= 10 | 70%
<= 20 | 90%
<= 40 | 100%
<= 100 | 100%
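The expected output above is a cumulative distribution: each bucket counts every student at or below its upper bound, divided by the total number of students. This is not U-SQL, only a small Python sketch of the computation using the example numbers from the question (the function name cumulative_percentages is made up):

```python
def cumulative_percentages(hours, upper_bounds):
    """Map each upper bound to the percentage of values <= that bound."""
    total = len(hours)
    if total == 0:
        raise ValueError("need at least one student")
    return {
        bound: 100.0 * sum(1 for h in hours if h <= bound) / total
        for bound in upper_bounds
    }


# Study hours per week for the 10 students in the example.
hours = [5, 6, 10, 9, 2, 25, 18, 5, 12, 1]
result = cumulative_percentages(hours, [1, 5, 10, 20, 40, 100])
# result == {1: 10.0, 5: 40.0, 10: 70.0, 20: 90.0, 40: 100.0, 100: 100.0}
```

In a query language, the same idea would presumably be one conditional count per bucket divided by the overall count.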

How to access the range-step value within `toscalar()` statement used within `range()` statement

I am using a Kusto query to create a timechart within Azure AppInsights to visualize when our webservice is within its SLO (and when it isn't), using one of Google's examples of measuring whether a webservice is within its error budget:
SLI = The proportion of sufficiently fast requests, as measured from the load balancer metrics. “Sufficiently fast” is defined as < 400 ms.
SLO = 90% of requests < 400 ms
Measured as:
count of http_requests with a duration less than or equal to "0.4" seconds
divided by count of all http_requests
Assuming 10-minute inspection intervals over a 7-day window, here is my code:
let fastResponseTimeMaxMs = 400.0;
let errorBudgetThresholdForFastResponseTime = 90.0;
//
let startTime = ago(7days);
let endTime = now();
let timeStep = 10m;
//
let timeRange = range InspectionTime from startTime to endTime step timeStep;
timeRange
| extend RespTimeMax_ms = fastResponseTimeMaxMs
| extend ActualCount = toscalar
(
requests
| where timestamp > InspectionTime - timeStep
| where timestamp <= InspectionTime
| where success == "True"
| where duration <= fastResponseTimeMaxMs
| count
)
| extend TotalCount = toscalar
(
requests
| where timestamp > InspectionTime - timeStep
| where timestamp <= InspectionTime
| where success == "True"
| count
)
| extend Percentage = round(todecimal(ActualCount * 100) / todecimal(TotalCount), 2)
| extend ErrorBudgetMinPercent = errorBudgetThresholdForFastResponseTime
| extend InBudget = case(Percentage >= ErrorBudgetMinPercent, 1, 0)
Sample query output of what I wish to achieve:
InspectionTime [UTC] | RespTimeMax_ms | ActualCount | TotalCount | Percentage | ErrorBudgetMinPercent | InBudget
2019-05-23T21:53:17.894 | 400 | 8,098 | 8,138 | 99.51 | 90 | 1
2019-05-23T22:03:17.894 | 400 | 8,197 | 9,184 | 89.14 | 90 | 0
2019-05-23T22:13:17.894 | 400 | 8,002 | 8,555 | 93.54 | 90 | 1
The error I'm getting is:
'where' operator: Failed to resolve scalar expression named 'InspectionTime'
I've tried todatetime(InspectionTime), fails with same error.
Replacing InspectionTime with other objects of type datetime gets this code to execute OK, but not with the datetime values that I want. By example, using this snippet executes OK, when used within my code sample above:
| extend ActualCount = toscalar
(
requests
| where timestamp > startTime // instead of 'InspectionTime - timeStep'
| where timestamp <= endTime // instead of 'InspectionTime'
| where duration <= fastResponseTimeMaxMs
| count
)
To me it seems that using InspectionTime within toscalar(...) is the crux of this problem, since I'm able to use InspectionTime within similar queries using range(...) that don't nest it within toscalar(...).
Note: I don't want a timechart chart of request.duration, since that doesn't tell me if the count of requests above my threshold (400ms) exceed our error budget according to the formula defined above.
Your query is invalid because you can't reference the InspectionTime column in the subquery that you're running in toscalar().
If I understand your desired logic correctly, the following query might work or give you a different direction (if not, you may want to share a sample input dataset using the datatable operator and specify the desired result that matches it):
let fastResponseTimeMaxMs = 400.0;
let errorBudgetThresholdForFastResponseTime = 90.0;
//
let startTime = ago(7days);
let endTime = now();
let timeStep = 10m;
//
requests
| where timestamp > startTime and timestamp < endTime
| where success == 'True'
| summarize TotalCount = count(), ActualCount = countif(duration <= fastResponseTimeMaxMs) by bin(timestamp, timeStep)
| extend Percentage = round(todecimal(ActualCount * 100) / todecimal(TotalCount), 2)
| extend ErrorBudgetMinPercent = errorBudgetThresholdForFastResponseTime
| extend InBudget = case(Percentage >= ErrorBudgetMinPercent, 1, 0)
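To see what the summarize ... by bin(timestamp, timeStep) step is doing, here is a rough Python sketch of the same idea: assign each request to a fixed 10-minute bin, then compute both counts per bin in a single pass instead of running a subquery per inspection time. The names slo_by_bin and bin_start and the sample rows are assumptions for illustration; aligning bins to multiples of the step since the Unix epoch is a simplification and may not match Kusto's bin() exactly in every case.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_start(ts: datetime, step: timedelta) -> datetime:
    """Round ts down to a multiple of step (measured from the Unix epoch)."""
    return ts - ((ts - EPOCH) % step)


def slo_by_bin(requests, step=timedelta(minutes=10), max_ms=400.0, budget_pct=90.0):
    """requests: iterable of (timestamp, duration_ms) for successful requests."""
    totals = defaultdict(int)
    fast = defaultdict(int)
    for ts, duration_ms in requests:
        key = bin_start(ts, step)
        totals[key] += 1              # count()
        if duration_ms <= max_ms:
            fast[key] += 1            # countif(duration <= threshold)
    rows = []
    for key in sorted(totals):
        pct = round(100.0 * fast[key] / totals[key], 2)
        rows.append((key, fast[key], totals[key], pct, pct >= budget_pct))
    return rows


# Made-up sample: three requests in one bin, one in the next.
day = datetime(2019, 5, 23, 12, 0, tzinfo=timezone.utc)
sample = [
    (day + timedelta(minutes=1), 100.0),
    (day + timedelta(minutes=5), 500.0),
    (day + timedelta(minutes=9), 200.0),
    (day + timedelta(minutes=11), 100.0),
]
rows = slo_by_bin(sample)
```

One difference from the original range-based approach: bins with no requests produce no row at all here, rather than a row with a zero count.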

Can I calculate the average of these numbers?

I was wondering if it's possible to calculate the average of some numbers if I have this:
int currentCount = 12;
float currentScore = 6.1123 (this is a range of 1 <-> 10).
Now, if I receive another score (let's say 4.5), can I recalculate the average so it would be something like:
int currentCount now equals 13
float currentScore now equals ?????
or is this impossible and I still need to remember the list of scores?
The following formulas allow you to track averages just from stored average and count, as you requested.
currentScore = (currentScore * currentCount + newValue) / (currentCount + 1)
currentCount = currentCount + 1
This relies on the fact that your average is currently your sum divided by the count. So you simply multiply count by average to get the sum, add your new value and divide by (count+1), then increase count.
So, let's say you have the data {7,9,11,1,12} and the only thing you're keeping is the average and count. As each number is added, you get:
+--------+-------+----------------------+----------------------+
| Number | Count | Actual average | Calculated average |
+--------+-------+----------------------+----------------------+
| 7 | 1 | (7)/1 = 7 | (0 * 0 + 7) / 1 = 7 |
| 9 | 2 | (7+9)/2 = 8 | (7 * 1 + 9) / 2 = 8 |
| 11 | 3 | (7+9+11)/3 = 9 | (8 * 2 + 11) / 3 = 9 |
| 1 | 4 | (7+9+11+1)/4 = 7 | (9 * 3 + 1) / 4 = 7 |
| 12 | 5 | (7+9+11+1+12)/5 = 8 | (7 * 4 + 12) / 5 = 8 |
+--------+-------+----------------------+----------------------+
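As a quick check of the table above, here is a minimal Python sketch that keeps only the average and the count and applies the update formula to each number in turn (the class name RunningAverage is made up for illustration):

```python
class RunningAverage:
    """Tracks an average using only the stored average and count."""

    def __init__(self):
        self.count = 0
        self.average = 0.0

    def add(self, value):
        # Recover the old sum (average * count), add the new value,
        # and divide by the new count.
        self.average = (self.average * self.count + value) / (self.count + 1)
        self.count += 1
        return self.average


tracker = RunningAverage()
averages = [tracker.add(x) for x in (7, 9, 11, 1, 12)]
# averages == [7.0, 8.0, 9.0, 7.0, 8.0], matching the table
```

Applied to the numbers in the question (count 12, average 6.1123, new score 4.5), the same update gives a count of 13 and an average of about 5.988.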
I like to store the sum and the count. It avoids an extra multiply each time.
current_sum += input;
current_count++;
current_average = current_sum/current_count;
It's quite easy really, when you look at the formula for the average: (A1 + A2 + ... + AN)/N. Now, if you have the old average and N (the count of numbers), you can easily calculate the new average:
newScore = (currentScore * currentCount + someNewValue)/(currentCount + 1)
You can store currentCount and sumScore and you calculate sumScore/currentCount.
or... if you want to be silly, you can do it in one line :
current_average = (current_sum = current_sum + newValue) / ++current_count;
:)
If currentCount has already been incremented to 13, then currentScore = (currentScore * (currentCount - 1) + 4.5) / currentCount.
