Display a message under certain criteria instead of results in Kusto - azure-data-explorer

In Kusto, I want to display a message to the user depending on certain criteria. For example:
isempty(['_tenant'])
| print "Note: ", "You must select a tenant"
else???
Events
| where tenant == ['_tenant']
| ...
The criteria are different for each query, as is the message.

A different approach is a union whose legs are mutually exclusive, so that only one of them returns rows. The catch is that a function must return a consistent schema regardless of its input, so in this example you'll end up with both a Status column and an x column.
let myFunc = (y:long) {
union
(
print Status = "Y must be greater than 0"
| where y <= 0
),
(
range x from 1 to 10 step 1
| where y > 0
)
};
myFunc(-1)
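The union trick relies on two things: complementary filters, so at most one leg produces rows, and a shared schema across legs. As a rough Python analogy (not Kusto; the names `my_func` and `Row` are hypothetical), here is a function whose two "legs" are filtered on the same input, where the message leg is kept only when y is not positive, as the message text implies:

```python
from typing import Optional, TypedDict


class Row(TypedDict):
    # Both legs share this schema: every row has a Status and an x column.
    Status: Optional[str]
    x: Optional[int]


def my_func(y: int) -> list[Row]:
    # Leg 1: the message row, kept only when y is not positive.
    message_leg: list[Row] = (
        [{"Status": "Y must be greater than 0", "x": None}] if y <= 0 else []
    )
    # Leg 2: the regular results (x = 1..10), kept only when y is positive.
    data_leg: list[Row] = [{"Status": None, "x": x} for x in range(1, 11) if y > 0]
    # The filters are complementary, so exactly one leg contributes rows,
    # like a KQL union whose legs carry opposite `where` clauses.
    return message_leg + data_leg
```

Calling `my_func(-1)` yields only the message row, while `my_func(5)` yields ten data rows whose Status is empty, which is the same column layout a consumer of the Kusto function would see.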

It sounds like you might be looking for the assert() function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/assert-function
let checkLength = (len:long, s:string)
{
assert(len > 0, "Length must be greater than zero") and
strlen(s) > len
};
datatable(input:string)
[
'123',
'4567'
]
| where checkLength(len=long(-1), input)

Related

Kusto Query Dynamic sort Order

I have started working on Azure Data Explorer (Kusto) recently.
My requirement is to make the sort order of a Kusto table dynamic.
// Variable declaration
let SortColumn ="run_date";
let OrderBy="desc";
// Actual Code
tblOleMeasurments
| take 10
|distinct column1,column2,column3,run_date
|order by SortColumn OrderBy
My code works fine up to SortColumn, but when I add [OrderBy] after [SortColumn], Kusto gives me an error.
My requirement is to pass the asc/desc value from the variable [OrderBy].
Kindly suggest any workarounds or solutions.
Neither the sort column nor the sort order can be passed as a variable this way: the sort order must be a literal (asc or desc). If you want to pass the sort column and sort order as variables, create a union instead, where filters on the variables select the leg with the desired outcome. Here is an example:
let OrderBy = "desc";
let sortColumn = "run_date";
let Query = tblOleMeasurments | take 10 |distinct column1,column2,column3,run_date;
union
(Query | where OrderBy == "desc" and sortColumn == "run_date" | order by run_date desc),
(Query | where OrderBy == "asc" and sortColumn == "run_date" | order by run_date asc)
The number of union legs is the number of candidate sort columns times two (one leg per sort order).
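Writing those legs by hand gets tedious as the list of candidate columns grows. As a sketch, a small Python helper (the name `build_sort_union` is hypothetical, and the query and variable names `Query`, `OrderBy` and `sortColumn` are taken from the example above) can generate the union text, which also makes the columns × 2 leg count explicit:

```python
def build_sort_union(query_name: str, sort_columns: list[str]) -> str:
    """Return the text of a KQL union with one leg per (column, order) pair.

    Each leg is guarded by a filter on the OrderBy and sortColumn variables,
    so only the leg matching the variables' current values returns rows.
    """
    legs = []
    for column in sort_columns:
        for order in ("desc", "asc"):
            legs.append(
                f'({query_name} | where OrderBy == "{order}" and sortColumn == "{column}"'
                f" | order by {column} {order})"
            )
    return "union\n" + ",\n".join(legs)
```

For two candidate columns, for example `build_sort_union("Query", ["run_date", "column1"])`, it emits four legs, matching the product described above.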
An alternative is to sort by a calculated column based on sort_order and sort_column. The example below works for numeric columns:
let T = range x from 1 to 5 step 1 | extend y = -10 * x;
let sort_order = "asc";
let sort_column = "y";
T
| order by column_ifexists(sort_column, "") * case(sort_order == "asc", -1, 1)
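The sign flip works because `order by` in Kusto sorts descending by default: negating a numeric key and then sorting descending gives ascending order on the original values. A Python sketch of the same idea (the function name `order_by_signed` is hypothetical; `reverse=True` stands in for Kusto's default descending order):

```python
def order_by_signed(rows: list[dict], sort_column: str, sort_order: str) -> list[dict]:
    # Mirror case(sort_order == "asc", -1, 1): flip the sign for ascending order.
    sign = -1 if sort_order == "asc" else 1
    # Kusto's `order by` sorts descending by default, hence reverse=True.
    return sorted(rows, key=lambda r: r[sort_column] * sign, reverse=True)


# Same data as the KQL example: x = 1..5, y = -10 * x.
T = [{"x": x, "y": -10 * x} for x in range(1, 6)]
```

As with the KQL version, this only makes sense for numeric columns, since multiplying a string by -1 has no meaningful ordering.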

SQL to find next greater records for each element

I've got a table defined like this:
CREATE TABLE event (t REAL, event TEXT, value);
For each record in the table that has event='type' and value='G', there will be two corresponding records with event='Z': one with value=1 and one with value=0. Here is an example:
t | event | value
1624838448.123 | type | G
1624838448.123 | Z | 1
1624839543.215 | Z | 0
Note that there could be other event='Z' records that don't have a corresponding event='type', value='G' record. I'm trying to write a query that finds, for each event='type', value='G' record, its pair of event='Z' records, so I can use their timestamps as the bounds for an additional query (or join?).
Note: The t value for the "type" event and the Z event where value=1 will always be the same.
So for instance if the table looked like this:
t | event | value
1624838448.123 | type | G
1624838448.123 | Z | 1
1624839543.215 | Z | 0
1624839555.555 | type | H
1624838555.555 | Z | 1
1624839602.487 | Z | 0
1624839999.385 | type | G
1624839999.385 | Z | 1
1624840141.006 | Z | 0
Then I want the results of the query to return this:
t1 | t2
1624838448.123 | 1624839543.215
1624839999.385 | 1624840141.006
From your comment:
There are always three records (ignoring any other events in between)
in chronological order: the "type" event, the first "Z" record with
the same timestamp, and the second "Z" record with a later timestamp
So, there is no need to return t1 separately since it is equal to t in the row where event = 'type' and value = 'G'.
For t2, you can use conditional aggregation with the MIN() window function:
SELECT t1, t2
FROM (
SELECT t AS t1, event, value,
MIN(CASE WHEN event = 'Z' AND value = '0' THEN t END) OVER (ORDER BY t ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS t2
FROM Event
)
WHERE event = 'type' AND value = 'G'
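The window-function query can be tried against the sample rows from the question with Python's built-in sqlite3 module. This is a sketch under two assumptions: the SQLite library linked into Python is 3.25 or newer (needed for window functions), and the value column holds text ('G', '1', '0'), so that value = '0' matches. An ORDER BY t1 is added here only to make the output order deterministic:

```python
import sqlite3

# The nine sample rows from the question, stored with textual values.
SAMPLE_ROWS = [
    (1624838448.123, "type", "G"),
    (1624838448.123, "Z", "1"),
    (1624839543.215, "Z", "0"),
    (1624839555.555, "type", "H"),
    (1624838555.555, "Z", "1"),
    (1624839602.487, "Z", "0"),
    (1624839999.385, "type", "G"),
    (1624839999.385, "Z", "1"),
    (1624840141.006, "Z", "0"),
]

MIN_QUERY = """
SELECT t1, t2
FROM (
  SELECT t AS t1, event, value,
         -- earliest later 'Z'/0 row, scanning all rows after the current one
         MIN(CASE WHEN event = 'Z' AND value = '0' THEN t END)
           OVER (ORDER BY t ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS t2
  FROM event
)
WHERE event = 'type' AND value = 'G'
ORDER BY t1
"""


def find_bounds(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE event (t REAL, event TEXT, value)")
        con.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)
        return con.execute(MIN_QUERY).fetchall()
    finally:
        con.close()
```

On these rows the query returns the two (t1, t2) pairs listed in the question's expected output; the 'H' block is skipped because only value = 'G' rows survive the outer filter.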
I found a solution using the RANK() function. It gives me an intermediate table in which the "type" record and the first "Z" record share the same rank, since they have the same timestamp, and the second "Z" record has a rank two greater. I use WITH so I can self-join repeatedly without repeating the same query. I first join the "type" row to the first "Z" row by requiring equal ranks, with s1 being the "type" row and s2 the "Z" row (so I only get the type:Z combination and not type:type, Z:type, or Z:Z). Then I self-join again to get the row with rank I+2, which picks up the second "Z" record. Overall, the query looks like this:
WITH Seq(t,event,A,I)
AS
(
SELECT t, event, value,
RANK() OVER (ORDER BY t) I
FROM Event e1
WHERE (e1.event='type' OR e1.event='Z')
)
SELECT s2.t AS t1, s3.t AS t2
FROM Seq s1
INNER JOIN Seq s2 ON s1.I = s2.I AND s1.event = 'type' AND s2.event = 'Z'
INNER JOIN Seq s3 ON s1.I = s3.I-2
WHERE s1.A='G';
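The RANK() approach can be checked the same way with sqlite3. This sketch assumes SQLite 3.25 or newer, textual values, and the rule from the question that the "type" row and the first "Z" row share a timestamp; it spells out the join as s1.event = 'type' and s2.event = 'Z' and refers to the value column by its CTE name A. It uses only the two complete type/G triples from the question's sample data, because a "Z" row whose timestamp does not follow the rule shifts the ranks and breaks the I+2 lookup:

```python
import sqlite3

# Two complete type/G triples taken from the question's sample data.
TRIPLE_ROWS = [
    (1624838448.123, "type", "G"),
    (1624838448.123, "Z", "1"),
    (1624839543.215, "Z", "0"),
    (1624839999.385, "type", "G"),
    (1624839999.385, "Z", "1"),
    (1624840141.006, "Z", "0"),
]

RANK_QUERY = """
WITH Seq(t, event, A, I) AS (
  SELECT t, event, value, RANK() OVER (ORDER BY t)
  FROM event
  WHERE event = 'type' OR event = 'Z'
)
SELECT s2.t AS t1, s3.t AS t2
FROM Seq s1
-- same rank: the 'type' row and the first 'Z' row share a timestamp
JOIN Seq s2 ON s1.I = s2.I AND s1.event = 'type' AND s2.event = 'Z'
-- RANK() skips one value after a two-way tie, so the next row is rank I + 2
JOIN Seq s3 ON s3.I = s1.I + 2
WHERE s1.A = 'G'
ORDER BY t1
"""


def rank_bounds(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE event (t REAL, event TEXT, value)")
        con.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)
        return con.execute(RANK_QUERY).fetchall()
    finally:
        con.close()
```

Compared with the MIN() window version, this one depends on the exact rank layout, so it is more sensitive to irregular data.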

Kusto: How to convert table value to scalar and return from user defined function

I have the following user-defined functions with the intention of using a case conditional to output a table of 0s or 1s saying whether or not an account is active.
case needs scalar values as its arguments, i.e. pro_account_active(account) and basic_account_active(account) need to return scalar values.
I'm struggling to get around the limitation of toscalar:
User-defined functions can't pass into toscalar() invocation
information that depends on the row-context in which the function is
called.
I think if there was a function I can use in place of the "??????" that would convert active to a scalar and return it from the function it would work.
Any help greatly appreciated
let basic_account_active=(account:string) {
basic_check_1(account) // returns 0 or 1 row only
| union basic_check_2(account)
| summarize result_count = count()
| extend active = iff(result_count == 2, 1, 0)
| ??????
};
let pro_account_active=(account:string) {
pro_check_1(account) // returns 0 or 1 row only
| union pro_check_2(account)
| summarize result_count = count()
| extend active = iff(result_count == 2, 1, 0)
| ??????
};
let is_active=(account_type:string, account:string) {
case(
account_type == 'pro', pro_account_active(account),
account_type == 'basic', basic_account_active(account),
-1
)
};
datatable(account_type:string, account:string)
[
'pro', '89e5678a92',
'basic', '9d8263da45',
'pro', '0b975f2454a',
'basic', '112a3f4753',
]
| extend result = is_active(account_type, account)
You can convert the output of a query to a scalar by using the toscalar() function, i.e.
let basic_account_active=(account:string) {
toscalar(basic_check_1(account) // returns 0 or 1 row only
| union basic_check_2(account)
| summarize result_count = count()
| extend active = iff(result_count == 2, 1, 0)
| project active)};
From your example, it looks like you have two tables for each account type, and if both have entries for a specific account, the account is considered active. Is that correct? If so, I would use the "join" operator to find all the entries in the applicable tables and count them. Here is an example of one way to do it (there are other ways as well).
let basicAccounts1 = datatable(account_type:string, account:string)[ 'basic', '9d8263da45', 'basic', '111111'];
let basicAccounts2 = datatable(account_type:string, account:string)[ 'basic', '9d8263da45', 'basic', '222222'];
let proAccounts1 = datatable(account_type:string, account:string)[ 'pro', '89e5678a92', 'pro', '111111'];
let proAccounts2 = datatable(account_type:string, account:string)[ 'pro', '89e5678a92', 'pro', '222222'];
let AllAccounts = union basicAccounts1, basicAccounts2, proAccounts1, proAccounts2
| summarize count() by account, account_type;
datatable(account_type:string, account:string)
[
'pro', '89e5678a92',
'basic', '9d8263da45',
'pro', '0b975f2454a',
'basic', '112a3f4753',
]
| join kind=leftouter hint.strategy=broadcast (AllAccounts) on account, account_type
| extend IsActive = count_ >=2
| project-away count_, account1, account_type1
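The counting logic of that query can be sketched in plain Python to make each step concrete (this is an analogy, not Kusto; the function name `check_accounts` is hypothetical). The union plus `summarize count()` becomes a Counter over (account_type, account) pairs, and the left outer join becomes a lookup that yields None where Kusto would produce a null:

```python
from collections import Counter
from typing import Optional

# The four lookup tables from the answer, as (account_type, account) pairs.
basic_accounts_1 = [("basic", "9d8263da45"), ("basic", "111111")]
basic_accounts_2 = [("basic", "9d8263da45"), ("basic", "222222")]
pro_accounts_1 = [("pro", "89e5678a92"), ("pro", "111111")]
pro_accounts_2 = [("pro", "89e5678a92"), ("pro", "222222")]

# The accounts to check, as in the answer's datatable.
inputs = [
    ("pro", "89e5678a92"),
    ("basic", "9d8263da45"),
    ("pro", "0b975f2454a"),
    ("basic", "112a3f4753"),
]


def check_accounts(rows, *tables) -> list[tuple[str, str, Optional[bool]]]:
    # union + summarize count() by account, account_type
    counts = Counter(pair for table in tables for pair in table)
    result = []
    for account_type, account in rows:
        # leftouter join: an account with no match gets None (null in Kusto)
        n = counts.get((account_type, account))
        result.append((account_type, account, None if n is None else n >= 2))
    return result
```

With the sample tables, the two accounts present in both of their type's tables come out active, and the two unmatched accounts get no value at all rather than False.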

Can I use tabular parameters in Kusto user-defined functions

Basically I'd like to pass in a set of field values to a function so I can use in/!in operators. I'd prefer to be able to use the result of a previous query rather than having to construct a set manually.
As in:
let today = exception | where EventInfo_Time > ago(1d) | project exceptionMessage;
MyAnalyzeFunction(today)
What, then, is the signature of MyAnalyzeFunction?
See: https://learn.microsoft.com/en-us/azure/kusto/query/functions/user-defined-functions
For instance, the following will return a table with a single column (y) with the values 2 and 3:
let someTable = range x from 2 to 10 step 1
;
let F = (T:(x:long))
{
range y from 1 to 3 step 1
| where y in (T)
}
;
F(someTable)

Optimize left join query with multiple counts in SQLAlchemy?

I'm trying to optimize a query that has multiple counts for objects in a subordinate table (using aliases in SQLAlchemy). In Witch Academia terms, something like this:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(tried_witch.id) AS tried,
count(passed_witch.id) AS passed,
count(failed_witch.id) AS failed
FROM exam
LEFT OUTER JOIN witch AS tried_witch
ON tried_witch.exam_id = exam.id AND
tried_witch.is_failed = 0 AND
tried_witch.status != "passed"
LEFT OUTER JOIN witch AS passed_witch
ON passed_witch.exam_id = exam.id AND
passed_witch.is_failed = 0 AND
passed_witch.status = "passed"
LEFT OUTER JOIN witch AS failed_witch
ON failed_witch.exam_id = exam.id AND
failed_witch.is_failed = 1
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
Number of witches can be large (hundreds of thousands), number of exams is lower (hundreds), so the above query is quite slow. In a lot of similar questions I've found answers, which propose the above, but I feel like a totally different approach is needed here. I am stuck at coming up with alternative. NB, there is a need to order by calculated counts. It is also important to have zeros as counts, of course, where due. (do not pay attention to a somewhat funny model: witches can easily clone themselves to go to multiple exams, thus per exam identity)
With one EXISTS subquery, which is not reflected in the above and does not influence the outcome, the situation is:
# Query_time: 1.135747 Lock_time: 0.000209 Rows_sent: 20 Rows_examined: 98174
# Rows_affected: 0
# Full_scan: Yes Full_join: No Tmp_table: Yes Tmp_table_on_disk: Yes
# Filesort: Yes Filesort_on_disk: No Merge_passes: 0 Priority_queue: No
Updated query, which is still quite slow:
SELECT
exam.id AS exam_id,
exam.name AS exam_name,
count(CASE WHEN (witch.status != "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS tried,
count(CASE WHEN (witch.status = "passed" AND witch.is_failed = 0)
THEN witch.id
ELSE NULL END) AS passed,
count(CASE WHEN (witch.is_failed = 1)
THEN witch.id
ELSE NULL END) AS failed
FROM exam
LEFT OUTER JOIN witch ON witch.exam_id = exam.id
GROUP BY exam.id, exam.name
ORDER BY tried ASC
LIMIT 20
Indexing is the key to this query's performance.
I do not know MariaDB at all, so not sure what the possibilities are. But if it is anything like Microsoft SQL Server, then here is what I would try:
Create ONE composite index on the witch table covering ALL the required columns: exam_id, status and is_failed. If the query uses that index, that should be it. The order of the included columns might be very important here. Then profile the query to confirm that the index is used. See the MariaDB "Optimization and Indexes" documentation.
Consider Generated (Virtual and Persistent) Columns.
It looks like all the information needed to classify a witch into the tried, passed or failed bucket is contained in the witch row itself. Therefore, you can create those virtual columns directly on the table and use the PERSISTENT option, which allows creating an index on them. Then you can create an index specifically for this query containing exam_id and the three virtual columns tried, passed and failed. Make sure your query uses it, and that should be pretty good. The query will then look very simple:
SELECT exam.id,
exam.name,
sum(witch.tried) AS tried,
sum(witch.passed) AS passed,
sum(witch.failed) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(witch.tried)
LIMIT 20
Although the query only involves simple comparisons and AND/OR clauses, you are basically offloading the calculation of the three statuses to the database at INSERT/UPDATE time. Your SELECT should then be much faster.
Your example does not specify any result filtering (WHERE clause), but if you have one, it might also have an impact on the way one optimises indices for query performance.
Original answer: Below is the originally proposed change to the query.
Here I assume that the indexing part of the optimisation has already been done.
Could you try with SUM instead of COUNT?
SELECT exam.id,
exam.name,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END) AS tried,
sum(CASE
WHEN (witch.is_failed = 0
AND witch.status = 'passed') THEN 1
ELSE 0
END) AS passed,
sum(CASE
WHEN (witch.is_failed = 1) THEN 1
ELSE 0
END) AS failed
FROM exam
INNER JOIN witch ON exam.id = witch.exam_id
GROUP BY exam.id,
exam.name
ORDER BY sum(CASE
WHEN (witch.is_failed = 0
AND witch.status != 'passed') THEN 1
ELSE 0
END)
LIMIT 20
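The SUM(CASE ...) form can be tried end to end with sqlite3 as a stand-in for MariaDB. This sketch makes a few assumptions that are not in the answer: it swaps the INNER JOIN for a LEFT OUTER JOIN so that exams without witches still appear with zero counts (the question requires zeros), it adds an exam.id tie-breaker to ORDER BY for deterministic output, it creates an illustrative composite index on (exam_id, is_failed, status), and the exam names and witch rows are hypothetical toy data:

```python
import sqlite3

SCHEMA = """
CREATE TABLE exam (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE witch (
  id INTEGER PRIMARY KEY,
  exam_id INTEGER REFERENCES exam(id),
  is_failed INTEGER,
  status TEXT
);
-- illustrative composite index covering the join and the CASE conditions
CREATE INDEX witch_exam_status ON witch (exam_id, is_failed, status);
"""

COUNT_QUERY = """
SELECT exam.id, exam.name,
       SUM(CASE WHEN witch.is_failed = 0 AND witch.status != 'passed' THEN 1 ELSE 0 END) AS tried,
       SUM(CASE WHEN witch.is_failed = 0 AND witch.status = 'passed' THEN 1 ELSE 0 END) AS passed,
       SUM(CASE WHEN witch.is_failed = 1 THEN 1 ELSE 0 END) AS failed
FROM exam
LEFT OUTER JOIN witch ON exam.id = witch.exam_id  -- keeps exams with no witches
GROUP BY exam.id, exam.name
ORDER BY tried, exam.id
LIMIT 20
"""

# Hypothetical toy data: exam 3 has no witches at all.
EXAMS = [(1, "Potions"), (2, "Flying"), (3, "Charms")]
WITCHES = [
    (1, 1, 0, "trying"),  # tried
    (2, 1, 0, "passed"),  # passed
    (3, 1, 1, "trying"),  # failed
    (4, 2, 0, "passed"),  # passed
]


def exam_counts(exams, witches):
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO exam VALUES (?, ?)", exams)
        con.executemany("INSERT INTO witch VALUES (?, ?, ?, ?)", witches)
        return con.execute(COUNT_QUERY).fetchall()
    finally:
        con.close()
```

For the witch-less exam, the LEFT OUTER JOIN produces a single row of NULLs; every CASE then falls through to ELSE 0, so its sums are 0 rather than the exam being dropped.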
The rest:
Given you have tagged your question with sqlalchemy, here is the SQLAlchemy code I used to model and generate the query:
# model
from sqlalchemy import Column, ForeignKey, Integer, String, and_, case, func
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Exam(Base):
    __tablename__ = 'exam'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Witch(Base):
    __tablename__ = 'witch'
    id = Column(Integer, primary_key=True)
    exam_id = Column(Integer, ForeignKey('exam.id'))
    is_failed = Column(Integer)
    status = Column(String)
    exam = relationship(Exam, backref='witches')

    # computed fields: Python-side value on an instance
    @hybrid_property
    def tried(self):
        return self.is_failed == 0 and self.status != 'passed'

    @hybrid_property
    def passed(self):
        return self.is_failed == 0 and self.status == 'passed'

    @hybrid_property
    def failed(self):
        return self.is_failed == 1

    # computed fields: SQL expression used at class level (e.g. in queries);
    # each reuses the hybrid's name so the expression is attached to it
    @tried.expression
    def tried(cls):
        return case((and_(cls.is_failed == 0, cls.status != 'passed'), 1), else_=0)

    @passed.expression
    def passed(cls):
        return case((and_(cls.status == 'passed', cls.is_failed == 0), 1), else_=0)

    @failed.expression
    def failed(cls):
        return case((cls.is_failed == 1, 1), else_=0)
and:
# query
q = (
session.query(
Exam.id, Exam.name,
func.sum(Witch.tried).label("tried"),
func.sum(Witch.passed).label("passed"),
func.sum(Witch.failed).label("failed"),
)
.join(Witch)
.group_by(Exam.id, Exam.name)
.order_by(func.sum(Witch.tried))
.limit(20)
)
