Filter columns by dashboard multi-select parameter - azure-data-explorer

I'm trying to render a time series, but I have too many columns to show by default. To remedy this I figured I would present the user with a multi-select of all the columns and downselect the columns I render to that list, but I can't for the life of me figure out or find an answer on how to do it.
I have data with, say, columns Time, X1, X2, ... X120 and a multi-select parameter _columns of that table | getschema | project ColumnName | where ColumnName != "Time". I want to project to Time and the contents of _columns.
I can only find how to filter rows based on some column's value vs the multi-select. I feel like I'm missing something very simple.

Updated
There is also a simple solution for data that looks something like that:
This kind of data might be created by a make-series operator with multiple aggregation functions, e.g. -
make-series series_001 = count(), series_002 = min(x), series_003 = sum(x), series_004 = avg(x), series_005 = countif(type == 1), series_006 = countif(subtype == 123) on Timestamp from ago(7d) to now() step 1d
// Data sample generation, including series creation.
// Not part of the solution.
let p_series_num = 100;
let data = materialize
(
range i from 1 to p_series_num step 1
| project series_name = strcat("series_", substring(strcat("00", i), -3))
| mv-apply range(1, 7, 1) on (summarize make_list(rand()))
| evaluate pivot(series_name, take_any(list_))
| extend Timestamp = range(now() - 6d, now(), 1d)
| project-reorder Timestamp, * granny-asc
);
// Solution starts here
// We assume the creation of a parameter named _series, in the dashboard
// Uncomment the following line when executed outside the context of the dashboard
let ['_series'] = 'series_001';
data
| project Timestamp, column_ifexists(['_series'], real(null))
| render timechart
Timestamp
series_001
["2022-10-08T15:59:51.4634127Z","2022-10-09T15:59:51.4634127Z","2022-10-10T15:59:51.4634127Z","2022-10-11T15:59:51.4634127Z","2022-10-12T15:59:51.4634127Z","2022-10-13T15:59:51.4634127Z","2022-10-14T15:59:51.4634127Z"]
["0.35039128090096505","0.79027849410190631","0.023939659111657484","0.14207071795033441","0.91242141133745414","0.33016368441829869","0.50674771943297525"]
Fiddle

This solution supports multi-selection.
The original data looks something like that:
// Data sample generation. Not part of the solution.
let p_start_time = startofday(ago(1d));
let p_interval = 5m;
let p_rows = 15;
let p_cols = 120;
let data = materialize
(
range Timestamp from p_start_time to p_start_time + p_rows * p_interval step p_interval
| mv-expand MetricID = range(1, p_cols) to typeof(int)
| extend MetricVal = rand(), MetricName = strcat("x", tostring(MetricID))
| evaluate pivot(MetricName, take_any(MetricVal), Timestamp)
| project-reorder Timestamp, * granny-asc
);
// Solution starts here
// We assume the creation of a parameter named _MetricName, in the dashboard
// Uncomment the following line when executed outside the context of the dashboard
let ['_series'] = dynamic(['x1', 'x3', 'x7', 'x100', 'x120']);
data
| project Timestamp, pa = pack_all()
| project Timestamp, cols = bag_remove_keys(pa, set_difference(bag_keys(pa), _series))
| evaluate bag_unpack(cols)
| render timechart
Timestamp
x1
x100
x120
x3
x7
2022-10-20T00:00:00Z
0.40517703772298719
0.86952520047109094
0.67458442932790974
0.20662741864260561
0.19230161743580523
2022-10-20T00:05:00Z
0.098438188653858671
0.14095230636982198
0.10269711129443576
0.99361020447683746
0.093624077251808144
2022-10-20T00:10:00Z
0.3779198279036311
0.095647188329308852
0.38967218915903867
0.62601873006422182
0.18486009896828509
2022-10-20T00:15:00Z
0.141551736845493
0.64623737123678782
... etc.
Fiddle

It might be very simple, if our tabular data (post creation of the series) looks something like that:
This kind of data might be created by a make-series operator with by clause, e.g. -
make-series count() on Timestamp from ago(7d) to now() step 1d by series_name
In that case, all we need to do is add a filter on the series name, E.g. -
// Data sample generation, including series creation.
// Not part of the solution.
let p_series_num = 100;
let data = materialize
(
range i from 1 to 1000000 step 1
| extend Timestamp = ago(rand()*7d)
,series_name = strcat("series_", substring(strcat("00", tostring(toint(rand(p_series_num)))), -3))
| make-series count() on Timestamp from ago(7d) to now() step 1d by series_name
);
// Solution starts here
// We assume the creation of a parameter named _series, in the dashboard
// Uncomment the following line when executed outside the context of the dashboard
let ['_series'] = 'series_001';
data
| where series_name == _series
| render timechart
series_name
count_
Timestamp
series_001
[1434,1439,1430,1428,1422,1372,1475]
["2022-10-07T15:54:57.3677580Z","2022-10-08T15:54:57.3677580Z","2022-10-09T15:54:57.3677580Z","2022-10-10T15:54:57.3677580Z","2022-10-11T15:54:57.3677580Z","2022-10-12T15:54:57.3677580Z","2022-10-13T15:54:57.3677580Z"]
Fiddle

Here is a solution that match the data structure in your scenario.
* It is the same solution is the other solution I just modified, but since the source data structure is different, I posted an additional answer for learning purposes.
The original data looks something like that:
The code is actually very simple, leveraging column_ifexists():
// Data sample generation. Not part of the solution.
let p_start_time = startofday(ago(1d));
let p_interval = 5m;
let p_rows = 15;
let p_cols = 120;
let data = materialize
(
range Timestamp from p_start_time to p_start_time + p_rows * p_interval step p_interval
| mv-expand MetricID = range(1, p_cols) to typeof(int)
| extend MetricVal = rand(), MetricName = strcat("x", tostring(MetricID))
| evaluate pivot(MetricName, take_any(MetricVal), Timestamp)
| project-reorder Timestamp, * granny-asc
);
// Solution starts here
// We assume the creation of a parameter named _MetricName, in the dashboard
// Uncomment the following line when executed outside the context of the dashboard
let ['_MetricName'] = "x42";
data
| project Timestamp, column_ifexists(['_MetricName'], real(null))
| render timechart
Timestamp
x42
2022-10-13T00:00:00Z
0.89472385054721115
2022-10-13T00:05:00Z
0.11275174098360444
2022-10-13T00:10:00Z
0.96233152692333268
2022-10-13T00:15:00Z
0.21751913633816042
2022-10-13T00:20:00Z
0.69591667527850931
2022-10-13T00:25:00Z
0.36802228024058203
2022-10-13T00:30:00Z
0.29060518653083045
2022-10-13T00:35:00Z
0.13362332423562559
2022-10-13T00:40:00Z
0.013920161307282448
2022-10-13T00:45:00Z
0.05909880950497
2022-10-13T00:50:00Z
0.146454957813311
2022-10-13T00:55:00Z
0.318823204227693
2022-10-13T01:00:00Z
0.020087435985750794
2022-10-13T01:05:00Z
0.31110660126024159
2022-10-13T01:10:00Z
0.75531136771424379
2022-10-13T01:15:00Z
0.99289833682620265
Fiddle

Related

unpivot (wide to long) with dynamic column names

I need a function wide_to_long that turns a wide table into a long table and that accepts an argument id_vars for which the values have to be repeated (see example).
Sample input
let T_wide = datatable(name: string, timestamp: datetime, A: int, B: int) [
'abc','2022-01-01 12:00:00',1,2,
'def','2022-01-01 13:00:00',3,4
];
Desired output
Calling wide_to_long(T_wide, dynamic(['name', 'timestamp'])) should produce the following table.
let T_long = datatable(name: string, timestamp: datetime, variable: string, value: int) [
'abc','2022-01-01 12:00:00','A',1,
'abc','2022-01-01 12:00:00','B',2,
'def','2022-01-01 13:00:00','A',3,
'def','2022-01-01 13:00:00','B',4
];
Attempt
I've come pretty far with the following code.
let wide_to_long = (T:(*), id_vars: dynamic) {
// get names of keys to remove later
let all_columns = toscalar(T | getschema | summarize make_list(ColumnName));
let remove = set_difference(all_columns, id_vars);
// expand columns not contained in id_vars
T
| extend packed1 = pack_all()
| extend packed1 = bag_remove_keys(packed1, id_vars)
| mv-expand kind=array packed1
| extend variable = packed1[0], value = packed1[1]
// remove unwanted columns
| project packed2 = pack_all()
| project packed2 = bag_remove_keys(packed2, remove)
| evaluate bag_unpack(packed2)
| project-away packed1
};
The problems are that the solution feels clunky (is there a better way?) and the columns in the result are ordered randomly. The second issue is minor, but annoying.

Kusto for sliding window

I am new to Kusto Query language. Requirement is to alert when the continuous 15 minute value of machine status is 1.
I have two columns with column1:(timestamp in every second) and column2:machine status(values 1 and 0).How can I use a sliding window to find if the machine is 1 for continuous 15 minutes.
Currently I have used the bin function, but it does not seem to be the proper one.
summarize avg_value = avg(status) by customer, machine,bin(timestamp,15m)
What could be the better solution for this.
Thanks in advance
Here is another option using time series functions:
let dt = 1s;
let n_bins = tolong(15m/dt);
let coeffs = repeat(1, n_bins);
let T = view(M:string) {
range Timestamp from datetime(2022-01-11) to datetime(2022-01-11 01:00) step dt
| extend machine = M
| extend status = iif(rand()<0.002, 0, 1)
};
union T("A"), T("B")
| make-series status=any(status) on Timestamp step dt by machine
| extend rolling_status = series_fir(status, coeffs, false)
| extend alerts = series_equals(rolling_status, n_bins)
| project machine, Timestamp, alerts
| mv-expand Timestamp to typeof(datetime), alerts to typeof(bool)
| where alerts == 1
You can also do it using the scan operator.
thanks
Here is one way to do it, the example uses generated data, hopefully it fits in your scenario:
let view = range x from datetime(2022-01-10 13:00:10) to datetime(2022-01-10 13:10:10) step 1s
| extend status = iif(rand()<0.01, 0, 1)
| extend current_sum = row_cumsum(status)
| extend prior_sum = prev(current_sum, 15)
| extend should_alert = (current_sum-prior_sum != 15 and isnotempty(prior_sum))
If you have multiple machines you need to sort it first by machines and restart the row_cumsum operation:
let T = view(M:string) {
range Timestamp from datetime(2022-01-10 13:00:10) to datetime(2022-01-10 13:10:10) step 1s
| extend machine = M
| extend status = iif(rand()<0.01, 0, 1)
};
union T("A"), T("B")
| sort by machine asc, Timestamp asc
| extend current_sum = row_cumsum(status, machine != prev(machine))
| extend prior_sum = iif(machine == prev(machine, 15), prev(current_sum, 15), int(null))
| extend should_alert = (current_sum-prior_sum != 15 and isnotempty(prior_sum))

How to access the range-step value within `toscalar()` statement used within `range()` statement

Am using a Kusto query to create a timechart within Azure AppInsights, to visualize when our webservice is within its SLO (and when it isn't) using one of Google's examples of measuring if a webservice is within its error budget:
SLI = The proportion of sufficiently fast requests, as measured from the load balancer metrics. “Sufficiently fast” is defined as < 400 ms.
SLO = 90% of requests < 400 ms
Measured as:
count of http_requests with a duration less than or equal to "0.4" seconds
divided by count of all http_requests
Assuming 10-minute inspection intervals over a 7-day window, here is my code:
let fastResponseTimeMaxMs = 400.0;
let errorBudgetThresholdForFastResponseTime = 90.0;
//
let startTime = ago(7days);
let endTime = now();
let timeStep = 10m;
//
let timeRange = range InspectionTime from startTime to endTime step timeStep;
timeRange
| extend RespTimeMax_ms = fastResponseTimeMaxMs
| extend ActualCount = toscalar
(
requests
| where timestamp > InspectionTime - timeStep
| where timestamp <= InspectionTime
| where success == "True"
| where duration <= fastResponseTimeMaxMs
| count
)
| extend TotalCount = toscalar
(
requests
| where timestamp > InspectionTime - timeStep
| where timestamp <= InspectionTime
| where success == "True"
| count
)
| extend Percentage = round(todecimal(ActualCount * 100) / todecimal(TotalCount), 2)
| extend ErrorBudgetMinPercent = errorBudgetThresholdForFastResponseTime
| extend InBudget = case(Percentage >= ErrorBudgetMinPercent, 1, 0)
Sample query output of what I wish to achieve:
InspectionTime [UTC] RespTimeMax_ms ActualCount TotalCount Percentage ErrorBudgetMinPercent InBudget
2019-05-23T21:53:17.894 400 8,098 8,138 99.51 90 1
2019-05-23T22:03:17.894 400 8,197 9,184 89.14 90 0
2019-05-23T22:13:17.894 400 8,002 8,555 93.54 90 1
The error I'm getting is:
'where' operator: Failed to resolve scalar expression named 'InspectionTime'
I've tried todatetime(InspectionTime), fails with same error.
Replacing InspectionTime with other objects of type datetime gets this code to execute OK, but not with the datetime values that I want. By example, using this snippet executes OK, when used within my code sample above:
| extend ActualCount = toscalar
(
requests
| where timestamp > startTime // instead of 'InspectionTime - timeStep'
| where timestamp <= endTime // instead of 'InspectionTime'
| where duration <= fastResponseTimeMaxMs
| count
)
To me it seems that using InspectionTime within toscalar(...) is the crux of this problem, since I'm able to use InspectionTime within similar queries using range(...) that don't nest it within toscalar(...).
Note: I don't want a timechart chart of request.duration, since that doesn't tell me if the count of requests above my threshold (400ms) exceed our error budget according to the formula defined above.
your query is invalid as you can't reference the InspectionTime column in the subquery that you're running in toscalar().
if I understand your desired logic correctly, the following query might work or give you a different direction (if not - you may want to share a sample input dataset using the datatable operator, and specify the desired result that matches it)
let fastResponseTimeMaxMs = 400.0;
let errorBudgetThresholdForFastResponseTime = 90.0;
//
let startTime = ago(7days);
let endTime = now();
let timeStep = 10m;
//
requests
| where timestamp > startTime and timestamp < endTime
| where success == 'True'
| summarize TotalCount = count(), ActualCount = countif(duration <= fastResponseTimeMaxMs) by bin(timestamp, timeStep)
| extend Percentage = round(todecimal(ActualCount * 100) / todecimal(TotalCount), 2)
| extend ErrorBudgetMinPercent = errorBudgetThresholdForFastResponseTime
| extend InBudget = case(Percentage >= ErrorBudgetMinPercent, 1, 0)

How to: Run a user defined function for a range of (date) values

So let’s say I want to test a function that finds outliers over past data. I’d love to end up with a table that looks like this:
Time Outliers_At_Time
<somedate> 0
<somedate + interval> 1
The function looks like this:
let OutliersAt = (TheDate:datetime) {
<… outputs zero or a positive integer>
}
My instinct would be to do something like this:
let SomeDates = range AtTime from ago(10d) to now() step 10m;
SomeDates | extend NumOutliers = OutliersAt (AtTime)
… but that gives me this error message:
Error Semantic error: '' has the following semantic error: Unresolved
reference binding: 'AtTime'. clientRequestId:
KustoWebV2;1ea28ba0-12f1-4a52-95e7-975db3310f59
Suggestions?
If you are looking on finding outliers - there is a built-in function in Kusto to do it:
https://learn.microsoft.com/en-us/azure/kusto/query/series-outliersfunction
Example:
let _data =
range Timestamp from ago(7d) to now() step 1min
| extend Value=case(rand(1000)==10, 1200.0, rand(100));
//
_data
| make-series AvgValue=avg(Value) default=0 on Timestamp in range(ago(7d), now(), 5min)
| extend outliers=series_outliers(AvgValue)
| render timechart
If the question is about general way to provide parameters to user-defined functions,
see more info here:
https://learn.microsoft.com/en-us/azure/kusto/query/functions/user-defined-functions
In particular, you can pass a serie into a user-defined-function (e.g. to get statistics):
let OutliersAt = (_serie:dynamic) {
let stats = series_stats_dynamic(_serie);
todouble(stats.max_idx) >= 0
};
let _data =
range Timestamp from ago(7d) to now() step 1min
| extend Value=case(rand(1000)==10, 1200.0, rand(100));
//
_data
| make-series AvgValue=avg(Value) default=0 on Timestamp in range(ago(7d), now(), 5min)
| extend outliers=series_outliers(AvgValue)
| project hasOutliers=OutliersAt(outliers)

Avoid running a function multiple times in a query

I have the following query in Application Insights where I run the parsejson function multiple times in the same query.
Is it possible to reuse the data from the parsejson() function after the first invocation? Right now I call it three times in the query. I am trying to see if calling it just once might be more efficient.
EventLogs
| where Timestamp > ago(1h)
and tostring(parsejson(tostring(Data.JsonLog)).LogId) =~ '567890'
| project Timestamp,
fileSize = toint(parsejson(tostring(Data.JsonLog)).fileSize),
pageCount = tostring(parsejson(tostring(Data.JsonLog)).pageCount)
| limit 10
You can use extend for that:
EventLogs
| where Timestamp > ago(1h)
| extend JsonLog = parsejson(tostring(Data.JsonLog)
| where tostring(JsonLog.LogId) =~ '567890'
| project Timestamp,
fileSize = toint(JsonLog.fileSize),
pageCount = tostring(JsonLog.pageCount)
| limit 10

Resources