Any alternative to passing a range to a user-defined function? - azure-data-explorer

Since user-defined functions can't be invoked with an argument that varies with the row context, I'm struggling to find an alternative to what I am trying to do.
I have a table whose rows record session start and end times; each row is written after the session has closed. I want to visualize the number of open connections over time.
I've provided a sample datatable (T) as an example.
The following doesn't work because of user-defined function restrictions.
let T = datatable (TimeGenerated:datetime, ConnectionId:string, Start:datetime, End:datetime)
[ datetime('2021-02-11T20:21:58.680Z'), "0001", datetime('2021-02-11T20:20:50.172Z'), datetime('2021-02-11T20:21:28.673Z'),
datetime('2021-02-11T20:21:58.517Z'), "0002", datetime('2021-02-11T20:04:40.131Z'), datetime('2021-02-11T20:20:52.742Z'),
datetime('2021-02-11T20:21:57.470Z'), "0003", datetime('2021-02-11T20:17:51.585Z'), datetime('2021-02-11T20:18:41.945Z'),
datetime('2021-02-11T20:21:56.793Z'), "0004", datetime('2021-02-11T20:18:04.508Z'), datetime('2021-02-11T20:19:16.594Z'),
datetime('2021-02-11T20:21:55.697Z'), "0005", datetime('2021-02-11T20:15:20.139Z'), datetime('2021-02-11T20:18:26.688Z')
];
let bins = 1m;
let ConnectionsAtTime = (timestamp:datetime) {
toscalar(
T
| where Start <= timestamp and End >= timestamp
| count
)
};
let fakeNow = datetime('2021-02-11T20:22:00Z');
range timestamp from fakeNow - 5m to fakeNow step bins
| extend Count = ConnectionsAtTime(timestamp)
But, if it did work, I would expect the following output:
timestamp                   Count
2/11/2021, 8:12:00.000 PM   1
2/11/2021, 8:13:00.000 PM   1
2/11/2021, 8:14:00.000 PM   1
2/11/2021, 8:15:00.000 PM   1
2/11/2021, 8:16:00.000 PM   2
2/11/2021, 8:17:00.000 PM   2
2/11/2021, 8:18:00.000 PM   3
2/11/2021, 8:19:00.000 PM   2
2/11/2021, 8:20:00.000 PM   1
2/11/2021, 8:21:00.000 PM   1
2/11/2021, 8:22:00.000 PM   0

You could try something along the following lines:
let T = datatable (TimeGenerated:datetime, ConnectionId:string, Start:datetime, End:datetime)
[
datetime('2021-02-11T20:21:58.680Z'), "0001", datetime('2021-02-11T20:20:50.172Z'), datetime('2021-02-11T20:21:28.673Z'),
datetime('2021-02-11T20:21:58.517Z'), "0002", datetime('2021-02-11T20:04:40.131Z'), datetime('2021-02-11T20:20:52.742Z'),
datetime('2021-02-11T20:21:57.470Z'), "0003", datetime('2021-02-11T20:17:51.585Z'), datetime('2021-02-11T20:18:41.945Z'),
datetime('2021-02-11T20:21:56.793Z'), "0004", datetime('2021-02-11T20:18:04.508Z'), datetime('2021-02-11T20:19:16.594Z'),
datetime('2021-02-11T20:21:55.697Z'), "0005", datetime('2021-02-11T20:15:20.139Z'), datetime('2021-02-11T20:18:26.688Z')
];
let bins = 1m;
let fakeNow = datetime('2021-02-11T20:22:00Z');
let ConnectionsAtTime = (T:(Start:datetime, End:datetime, ConnectionId:string), range_start:datetime, range_end:datetime, bin:timespan)
{
T
| mv-expand timestamp = range(range_start, range_end, bin) to typeof(datetime)
| summarize dcountif(ConnectionId, Start <= timestamp and End >= timestamp) by timestamp
}
;
T
| invoke ConnectionsAtTime(fakeNow - 5m, fakeNow, bins)
-->
timestamp                     dcountif_ConnectionId
2021-02-11 20:17:00.0000000   2
2021-02-11 20:18:00.0000000   3
2021-02-11 20:19:00.0000000   2
2021-02-11 20:20:00.0000000   1
2021-02-11 20:21:00.0000000   1
2021-02-11 20:22:00.0000000   0
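For readers who want to check the counting logic outside Kusto, here is a minimal Python sketch of the same idea: expand a range of timestamps and, for each one, count the distinct connections whose Start and End bracket it. It is not KQL and not part of the answer above; the function name `connections_over_time` and the tuple layout are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Sessions copied from the sample datatable: (ConnectionId, Start, End).
SESSIONS = [
    ("0001", datetime(2021, 2, 11, 20, 20, 50, 172000), datetime(2021, 2, 11, 20, 21, 28, 673000)),
    ("0002", datetime(2021, 2, 11, 20, 4, 40, 131000), datetime(2021, 2, 11, 20, 20, 52, 742000)),
    ("0003", datetime(2021, 2, 11, 20, 17, 51, 585000), datetime(2021, 2, 11, 20, 18, 41, 945000)),
    ("0004", datetime(2021, 2, 11, 20, 18, 4, 508000), datetime(2021, 2, 11, 20, 19, 16, 594000)),
    ("0005", datetime(2021, 2, 11, 20, 15, 20, 139000), datetime(2021, 2, 11, 20, 18, 26, 688000)),
]


def connections_over_time(sessions, range_start, range_end, step):
    """Count distinct connections open at each timestamp (bounds inclusive)."""
    counts = {}
    ts = range_start
    while ts <= range_end:
        # Same predicate as the query: Start <= timestamp and End >= timestamp.
        open_ids = {cid for cid, start, end in sessions if start <= ts <= end}
        counts[ts] = len(open_ids)
        ts += step
    return counts


fake_now = datetime(2021, 2, 11, 20, 22)
result = connections_over_time(
    SESSIONS, fake_now - timedelta(minutes=5), fake_now, timedelta(minutes=1)
)
for ts, n in result.items():
    print(ts.strftime("%H:%M"), n)
```

With the sample data this gives the same six counts as the output above (2, 3, 2, 1, 1, 0 for 20:17 through 20:22).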

Related

Is it possible to iterate over the row values of a column in KQL to feed each value through a function

I am applying the series_decompose_anomalies algorithm to time data coming from multiple meters. Currently, I am using the ADX dashboard feature to feed my meter identifier as a parameter into the algorithm and return my anomalies and scores as a table.
let dt = 3hr;
Table
| where meter_ID == dashboardParameter
| make-series num=avg(value) on timestamp from _startTime to _endTime step dt
| extend (anomalies,score,baseline) = series_decompose_anomalies( num, 3,-1, 'linefit')
| mv-expand timestamp, num, baseline, anomalies, score
| where anomalies ==1
| project dashboardParameter, todatetime(timestamp), toreal(num), toint(anomalies), toreal(score)
I would like to bulk process all my meters in one go and return a table with all anomalies found across them. Is it possible to feed an array as an iterable in KQL or something similar to allow my parameter to change multiple times in a single run?
Simply add by meter_ID to make-series
(and remove | where meter_ID == dashboardParameter)
| make-series num=avg(value) on timestamp from _startTime to _endTime step dt by meter_ID
P.S.
Anomaly can be positive (num > baseline => flag = 1) or negative (num < baseline => flag = -1)
Demo
let _step = 1h;
let _endTime = toscalar(TransformedServerMetrics | summarize max(Timestamp));
let _startTime = _endTime - 12h;
TransformedServerMetrics
| make-series num = avg(Value) on Timestamp from _startTime to _endTime step _step by SQLMetrics
| extend (flag, score, baseline) = series_decompose_anomalies(num , 3,-1, 'linefit')
| mv-expand Timestamp to typeof(datetime), num to typeof(real), flag to typeof(int), score to typeof(real), baseline to typeof(real)
| where flag != 0
SQLMetrics             num                    Timestamp                     flag  score                 baseline
write_bytes            169559910.91717172     2022-06-14T15:00:30.2395884Z  -1    -3.4824039875238131   170205132.25708669
cpu_time_ms            17.369556143036036     2022-06-14T17:00:30.2395884Z  1     7.8874529842826       11.04372634506527
percent_complete       0.04595588235294118    2022-06-14T22:00:30.2395884Z  1     25.019464868749985    0.004552738927738928
blocking_session_id    -5                     2022-06-14T22:00:30.2395884Z  -1    -25.019464868749971   -0.49533799533799527
pending_disk_io_count  0.0019675925925925924  2022-06-14T23:00:30.2395884Z  1     6.4686836384225685    0.00043773741690408352

Kusto - Group by duration value to show numbers

I use the query below to calculate the time difference between two events, but I am not sure how to group the durations. I tried the case function, but it does not seem to work. Is there a way to group the durations, for example in a pie or column chart showing the number of items with durations of more than 2 hours, more than 5 hours, and more than 10 hours? Thanks
| where EventName in ('Handligrequest','Requestcomplete')
| summarize Time_diff = anyif(Timestamp,EventName == "SlackMessagePosted") - anyif(Timestamp,EventName == "ReceivedSlackMessage") by CorrelationId
| where isnotnull(Time_diff)
| extend Duration = format_timespan(Time_diff, 's')
| sort by Duration desc
// Generate data sample. Not part of the solution
let t = materialize (range i from 1 to 1000 step 1 | extend Time_diff = 24h*rand());
// Solution Starts here
t
| summarize count() by time_diff_range = case(Time_diff >= 10h, "10h <= x", Time_diff >= 5h, "5h <= x < 10h", Time_diff >= 2h, "2h <= x < 5h", "x < 2h")
| render piechart
time_diff_range  count_
10h <= x         590
5h <= x < 10h    209
x < 2h           89
2h <= x < 5h     112
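The bucketing works because the thresholds are checked from the largest down and the first match wins. A small Python sketch of that logic, under the assumption that durations are available as `timedelta` values (the sample durations and the helper name `time_diff_range` are made up for illustration):

```python
from collections import Counter
from datetime import timedelta

# (lower bound, label) pairs, checked from the largest threshold down,
# mirroring the order of the case() arguments in the query.
BUCKETS = [
    (timedelta(hours=10), "10h <= x"),
    (timedelta(hours=5), "5h <= x < 10h"),
    (timedelta(hours=2), "2h <= x < 5h"),
]


def time_diff_range(diff):
    """Return the label of the first bucket whose lower bound diff reaches."""
    for lower, label in BUCKETS:
        if diff >= lower:
            return label
    return "x < 2h"  # fallback, like the final argument of case()


# Made-up durations in hours, only to exercise every bucket.
durations = [timedelta(hours=h) for h in (0.5, 2, 3, 5, 9.99, 10, 23)]
counts = Counter(time_diff_range(d) for d in durations)
print(counts)
```

If the thresholds were listed smallest first, every duration of 2 hours or more would land in the first bucket, so the ordering matters.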

How to convert to dynamic type/ apply multiple functions on same 'pack' in KQL/Kusto

I am absolutely in love with ADX time series capabilities; having worked tons on sensor data with Python. Below are the requirements for my case:
Handle sensor data tags at different frequencies: bring them all to a 1 sec frequency (if in milliseconds, aggregate over a 1 sec interval)
Convert stacked data to unstacked data.
Join with another dataset which has multiple "string-labels" by timestamp, after unstack.
Do linear interpolation on some columns, and forward fill in others (around 10-12 in all).
I think I have gotten the first three done with the query below, but I am unable to use series_fill_linear directly on a column. The docs say this function requires a dynamic type as input. The error message is helpful:
series_fill_linear(): argument #1 was not of an expected data type: dynamic
Is it possible to apply series_fill_linear where I'm already using pack, instead of using pack again? How can I apply this function selectively by tag and make my overall query more readable? It's important to note that only the sensor_data table requires both series_fill_linear and series_fill_forward; label_data only requires series_fill_forward.
sensor_data
| where timestamp > datetime(2020-11-24 00:59:59) and timestamp <datetime(2020-11-24 12:00:00)
| where device_number =='PRESSURE_599'
| where tag_name in ("tag1", "tag2", "tag3", "tag4")
| make-series agg_value = avg(value) default = double(null) on timestamp in range (datetime(2020-11-24 00:59:59), datetime(2020-11-24 12:00:00), 1s) by tag_name
| extend series_fill_linear(agg_value, double(null), false) //EDIT
| mv-expand timestamp to typeof(datetime), agg_value to typeof(double)
| summarize b = make_bag(pack(tag_name, agg_value)) by timestamp
| evaluate bag_unpack(b)
|join kind = leftouter (label_data
| where timestamp > datetime(2020-11-24 00:58:59) and timestamp <datetime(2020-11-24 12:00:01)
| where device_number =='PRESSURE_599'
| where tag != "PRESSURE_599_label_Raw"
| summarize x = make_bag(pack(tag, value)) by timestamp
| evaluate bag_unpack(x)) on timestamp
| project timestamp,
MY_LINEAR_COL_1 = series_fill_linear(tag1, double(null), false),
MY_LINEAR_COL_2 = series_fill_forward(tag2),
MY_LABEL_1 = series_fill_forward(PRESSURE_599_label_level1),
MY_LABEL_2 = series_fill_forward(PRESSURE_599_label_level2)
EDIT: I ended up using extend with case to handle different cases of interpolation.
// let forward_tags = dynamic({"tags": ["tag2","tag4"]}); unable to use this in query as "forward_tags.tags"
sensor_data
| where timestamp > datetime(2020-11-24 00:59:59) and timestamp <datetime(2020-11-24 12:00:00)
| where device_number == "PRESSURE_599"
| where tag_name in ("tag1", "tag2", "tag3", "tag4") // use a variable here instead?
| make-series agg_value = avg(value)
default = double(null)
on timestamp
in range (datetime(2020-11-24 00:59:59), datetime(2020-11-24 12:00:00), 1s)
by tag_name
| extend agg_value = case (tag_name in ("tag2", "tag3"), // use a variable here instead?
series_fill_forward(agg_value, double(null)),
series_fill_linear(agg_value, double(null), false)
)
| mv-expand timestamp to typeof(datetime), agg_value to typeof(double)
| summarize b = make_bag(pack(tag_name, agg_value)) by timestamp
| evaluate bag_unpack(b)
| join kind = leftouter (
label_data // don't want to use make-series here, will be unnecessary data generation since already in 'ss' format.
| where timestamp > datetime(2020-11-24 00:58:59) and timestamp <datetime(2020-11-24 12:00:01)
| where tag != "PRESSURE_599_label_Raw"
| summarize x = make_bag(pack(tag, value)) by timestamp
| evaluate bag_unpack(x)
)
on timestamp
I was wondering if it is possible in KQL to pass a list of strings inside a query/function, as in the query above. I have added comments where I think a list of strings could be passed to make the code more readable.
Now I just need to forward-fill the label columns (MY_LABEL_1, MY_LABEL_2) that result from the query above. I would prefer that the code be added to the main query, with the final result being a table with all columns. Here is a sample table based on my case's result.
datatable (timestamp:datetime, tag1:double, tag2:double, tag3:double, tag4:double, MY_LABEL_1: string, MY_LABEL_2: string)
[
datetime(2020-11-24T00:01:00Z), 1, 3, 6, 9, "x", "foo",
datetime(2020-11-24T00:01:01Z), 1, 3, 6, 9, "", "",
datetime(2020-11-24T00:01:02Z), 1, 3, 6, 9,"", "",
datetime(2020-11-24T00:01:03Z), 1, 3, 6, 9,"y", "bar",
datetime(2020-11-24T00:01:04Z), 1, 3, 6, 9,"", "",
datetime(2020-11-24T00:01:05Z), 1, 3, 6, 9,"", "",
]
Series functions in ADX only work on dynamic arrays. You can apply a selective fill function using case() function, by replacing this line:
| extend series_fill_linear(agg_value, double(null), false) //EDIT
With something like the following:
| extend agg_value = case(
tag_name == "tag1", series_fill_linear(agg_value, double(null), false),
tag_name == "tag2", series_fill_forward(agg_value),
series_fill_forward(agg_value)
)
Edit:
Here is an example of string column fill-forward workaround:
let T = datatable ( Timestamp: datetime, Employee: string )
[ datetime(2021-01-01), "Bob",
datetime(2021-01-02), "",
datetime(2021-01-03), "Alice",
datetime(2021-01-04), "",
datetime(2021-01-05), "",
datetime(2021-01-06), "Alan",
datetime(2021-01-07), "",
datetime(2021-01-08), "" ]
| sort by Timestamp asc;
let employeeLookup = toscalar(T | where isnotempty(Employee) | summarize make_list(Employee));
T
| extend idx = row_cumsum(tolong(isnotempty(Employee)))
| extend EmployeeFilled = employeeLookup[idx - 1]
| project-away idx
Timestamp                    Employee  EmployeeFilled
2021-01-01 00:00:00.0000000  Bob       Bob
2021-01-02 00:00:00.0000000            Bob
2021-01-03 00:00:00.0000000  Alice     Alice
2021-01-04 00:00:00.0000000            Alice
2021-01-05 00:00:00.0000000            Alice
2021-01-06 00:00:00.0000000  Alan      Alan
2021-01-07 00:00:00.0000000            Alan
2021-01-08 00:00:00.0000000            Alan
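The trick in this workaround is that a running count of non-empty values doubles as an index into the list of non-empty values. A short Python sketch of the same idea, using the Employee column from the example (the variable names are illustrative, and the guard for rows before the first value is an addition that the KQL above does not have):

```python
from itertools import accumulate

# Employee column from the example, already sorted by Timestamp.
employees = ["Bob", "", "Alice", "", "", "Alan", "", ""]

# Counterpart of make_list(Employee) over the non-empty rows.
lookup = [e for e in employees if e]

# Counterpart of row_cumsum(tolong(isnotempty(Employee))):
# a running count of the non-empty values seen so far.
idx = list(accumulate(1 if e else 0 for e in employees))

# idx - 1 points at the most recent non-empty value.
# Rows before the first value (idx == 0) are left empty here.
filled = [lookup[i - 1] if i > 0 else "" for i in idx]

for original, value in zip(employees, filled):
    print(repr(original), "->", value)
```

The approach relies on the rows being sorted before the running count is taken, which is why the example sorts by Timestamp first.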
Regarding your requirement to convert time series at many frequencies to a common one, have a look at the series_downsample_fl() function in the functions library.

How to access the range-step value within `toscalar()` statement used within `range()` statement

I am using a Kusto query to create a timechart within Azure AppInsights, to visualize when our webservice is within its SLO (and when it isn't), using one of Google's examples of measuring whether a webservice is within its error budget:
SLI = The proportion of sufficiently fast requests, as measured from the load balancer metrics. “Sufficiently fast” is defined as < 400 ms.
SLO = 90% of requests < 400 ms
Measured as:
count of http_requests with a duration less than or equal to "0.4" seconds
divided by count of all http_requests
Assuming 10-minute inspection intervals over a 7-day window, here is my code:
let fastResponseTimeMaxMs = 400.0;
let errorBudgetThresholdForFastResponseTime = 90.0;
//
let startTime = ago(7days);
let endTime = now();
let timeStep = 10m;
//
let timeRange = range InspectionTime from startTime to endTime step timeStep;
timeRange
| extend RespTimeMax_ms = fastResponseTimeMaxMs
| extend ActualCount = toscalar
(
requests
| where timestamp > InspectionTime - timeStep
| where timestamp <= InspectionTime
| where success == "True"
| where duration <= fastResponseTimeMaxMs
| count
)
| extend TotalCount = toscalar
(
requests
| where timestamp > InspectionTime - timeStep
| where timestamp <= InspectionTime
| where success == "True"
| count
)
| extend Percentage = round(todecimal(ActualCount * 100) / todecimal(TotalCount), 2)
| extend ErrorBudgetMinPercent = errorBudgetThresholdForFastResponseTime
| extend InBudget = case(Percentage >= ErrorBudgetMinPercent, 1, 0)
Sample query output of what I wish to achieve:
InspectionTime [UTC]     RespTimeMax_ms  ActualCount  TotalCount  Percentage  ErrorBudgetMinPercent  InBudget
2019-05-23T21:53:17.894  400             8,098        8,138       99.51       90                     1
2019-05-23T22:03:17.894  400             8,197        9,184       89.14       90                     0
2019-05-23T22:13:17.894  400             8,002        8,555       93.54       90                     1
The error I'm getting is:
'where' operator: Failed to resolve scalar expression named 'InspectionTime'
I've tried todatetime(InspectionTime), fails with same error.
Replacing InspectionTime with other objects of type datetime gets this code to execute OK, but not with the datetime values that I want. For example, this snippet executes OK when used within my code sample above:
| extend ActualCount = toscalar
(
requests
| where timestamp > startTime // instead of 'InspectionTime - timeStep'
| where timestamp <= endTime // instead of 'InspectionTime'
| where duration <= fastResponseTimeMaxMs
| count
)
To me it seems that using InspectionTime within toscalar(...) is the crux of this problem, since I'm able to use InspectionTime within similar queries using range(...) that don't nest it within toscalar(...).
Note: I don't want a timechart chart of request.duration, since that doesn't tell me if the count of requests above my threshold (400ms) exceed our error budget according to the formula defined above.
Your query is invalid because you can't reference the InspectionTime column in the subquery that you're running in toscalar().
If I understand your desired logic correctly, the following query might work or give you a different direction (if not, you may want to share a sample input dataset using the datatable operator, and specify the desired result that matches it):
let fastResponseTimeMaxMs = 400.0;
let errorBudgetThresholdForFastResponseTime = 90.0;
//
let startTime = ago(7days);
let endTime = now();
let timeStep = 10m;
//
requests
| where timestamp > startTime and timestamp < endTime
| where success == 'True'
| summarize TotalCount = count(), ActualCount = countif(duration <= fastResponseTimeMaxMs) by bin(timestamp, timeStep)
| extend Percentage = round(todecimal(ActualCount * 100) / todecimal(TotalCount), 2)
| extend ErrorBudgetMinPercent = errorBudgetThresholdForFastResponseTime
| extend InBudget = case(Percentage >= ErrorBudgetMinPercent, 1, 0)
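To make the shape of that aggregation concrete, here is a rough Python sketch of the same steps: keep successful requests, round each timestamp down to its 10-minute bin, count all requests and fast requests per bin, then derive the percentage and the in-budget flag. The input tuples, the helper names `bin_floor` and `slo_by_bin`, and the sample requests are all assumptions for illustration; `bin_floor` is only similar in spirit to Kusto's bin().

```python
from collections import defaultdict
from datetime import datetime, timedelta

FAST_MS = 400.0          # fastResponseTimeMaxMs
BUDGET_PERCENT = 90.0    # errorBudgetThresholdForFastResponseTime
STEP = timedelta(minutes=10)


def bin_floor(ts, step):
    """Round a timestamp down to a multiple of step."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // step) * step


def slo_by_bin(requests):
    """requests: iterable of (timestamp, duration_ms, success) tuples."""
    totals = defaultdict(int)
    fast = defaultdict(int)
    for ts, duration_ms, success in requests:
        if not success:
            continue  # same as: where success == 'True'
        b = bin_floor(ts, STEP)
        totals[b] += 1
        if duration_ms <= FAST_MS:
            fast[b] += 1
    rows = []
    for b in sorted(totals):
        pct = round(fast[b] * 100 / totals[b], 2)
        rows.append((b, fast[b], totals[b], pct, 1 if pct >= BUDGET_PERCENT else 0))
    return rows


# Made-up requests: two successful ones in the first bin (one slow),
# one failed request that is ignored, and one fast request in the next bin.
t0 = datetime(2019, 5, 23, 21, 50)
sample = [
    (t0 + timedelta(minutes=1), 120.0, True),
    (t0 + timedelta(minutes=2), 450.0, True),
    (t0 + timedelta(minutes=3), 300.0, False),
    (t0 + timedelta(minutes=11), 200.0, True),
]
rows = slo_by_bin(sample)
for row in rows:
    print(row)
```

Note that bins cover [bin start, bin start + step), whereas the original attempt looked back over (InspectionTime - timeStep, InspectionTime], so bin edges will not line up exactly with the inspection times in the question.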

Time/Date range grammars

I need to parse strings containing time spans such as:
Thursday 6:30-7:30 AM
December 30, 2009 - January 1, 2010
1/15/09, 7:30 to 8:30 PM
Thursday, from 6:30 to 7:30 AM
and others...
added
6:30 to 7:30
and date/times such as most any cases that Word's insert->date can generate
As I'd be extremely surprised if anything out there covers all the cases I need to cover, I'm looking for grammars to start from.
Ok, the following grammar parses anything in your example:
DTExp = Day, ['-', Day]
Day = DayExp, [[','], ['from'], TimeRange]
DayExp = WeekDay
       | [WeekDay], Month, DayNumber, [[','], YearNumber]
       | [WeekDay], MonthNumber, '/', DayNumber, ['/', YearNumber]
TimeRange = Time, [('-'|'to'), Time]
Time = HourNumber, ':', MinuteNumber, ['AM'|'PM']
WeekDay = 'monday' | 'tuesday' | ...
Month = MonthNumber | MonthName
MonthName = 'january' | 'february' | ...
DayNumber = Number
MonthNumber = Number
YearNumber = Number, ['AD'|'BC']
HourNumber = Number
MinuteNumber = Number
There is a slight problem in the grammar. If a DayExp is read, followed by a Time and a '-', then either another Day or another Time could follow. This is resolved with a lookahead: if it is a time, the number is followed by a ':'.
Let's try to construct a parse tree:
Thursday   6    :   30    -    7    :   30    AM
   |       |        |          |        |      |
WeekDay  Number : Number  -  Number : Number   |
   |     -------|-------     --------|----------
   |           Time       -         Time
   |            -------------|-------------
 DayExp                  TimeRange
   -------------|------------
               Day
                |
              DTExp
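As a starting point for an implementation, the Time and TimeRange rules and the lookahead described above could be sketched in Python roughly as follows. This covers only those two rules, not the full grammar; the tokenizer and the function names are assumptions made for illustration.

```python
import re

# A token is a number, a word, or one of the punctuation marks used by the grammar.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z]+|[:\-,/])")


def tokenize(text):
    """Split the input into number, word, and punctuation tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_time_ahead(tokens, i):
    """Lookahead from the answer: a time is a number followed by ':'."""
    return i + 1 < len(tokens) and tokens[i].isdigit() and tokens[i + 1] == ":"


def parse_time(tokens, i):
    """Time = HourNumber, ':', MinuteNumber, ['AM'|'PM']"""
    if i + 2 >= len(tokens) or not is_time_ahead(tokens, i) or not tokens[i + 2].isdigit():
        raise ValueError(f"expected a time at token {i}")
    hour, minute = int(tokens[i]), int(tokens[i + 2])
    i += 3
    meridiem = None
    if i < len(tokens) and tokens[i].upper() in ("AM", "PM"):
        meridiem = tokens[i].upper()
        i += 1
    return (hour, minute, meridiem), i


def parse_time_range(tokens, i=0):
    """TimeRange = Time, [('-'|'to'), Time]"""
    start, i = parse_time(tokens, i)
    end = None
    # Only consume '-' or 'to' when a time follows; otherwise leave it
    # for the caller, since it may introduce a second Day instead.
    if i < len(tokens) and tokens[i].lower() in ("-", "to") and is_time_ahead(tokens, i + 1):
        end, i = parse_time(tokens, i + 1)
    return (start, end), i


print(parse_time_range(tokenize("6:30-7:30 AM")))
print(parse_time_range(tokenize("7:30 to 8:30 PM")))
```

The parser returns the position of the next unread token, so a caller handling Day or DTExp can continue from there. Deciding whether a trailing AM/PM also applies to the start time is left out of this sketch.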
