Can I use tabular parameters in Kusto user-defined functions?

Basically I'd like to pass in a set of field values to a function so I can use in/!in operators. I'd prefer to be able to use the result of a previous query rather than having to construct a set manually.
As in:
let today = exception | where EventInfo_Time > ago(1d) | project exceptionMessage;
MyAnalyzeFunction(today)
What is then the signature of MyAnalyzeFunction?

See: https://learn.microsoft.com/en-us/azure/kusto/query/functions/user-defined-functions
For instance, the following will return a table with a single column (y) with the values 2 and 3:
let someTable = range x from 2 to 10 step 1;
let F = (T:(x:long))
{
    range y from 1 to 3 step 1
    | where y in (T)
};
F(someTable)
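Applied to the question's own example, the signature could look like the sketch below. The function body and the sample rows are hypothetical stand-ins, since the question does not show what MyAnalyzeFunction does; the part that matters is the parameter declaration T:(exceptionMessage:string), whose column name and type match the column that the caller projects.

```kusto
// Sketch only: the body and sample data are assumptions made for illustration.
let MyAnalyzeFunction = (T:(exceptionMessage:string))
{
    // Hypothetical table to analyze; in practice this would be a real table.
    datatable(exceptionMessage:string, severity:int)
    [
        "Timeout", 2,
        "NullReference", 3,
        "DiskFull", 4
    ]
    // Keep only the rows whose message appears in the tabular argument.
    | where exceptionMessage in (T)
};
// Stand-in for: exception | where EventInfo_Time > ago(1d) | project exceptionMessage
let today = datatable(exceptionMessage:string)["Timeout", "DiskFull"];
MyAnalyzeFunction(today)
```

With this sample data the call should return the Timeout and DiskFull rows. Projecting down to the single column before the call keeps the argument's schema in line with the declared one.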

Related

Kusto equivalent of SQL NOT IN

I am trying to identify what records exist in table 1 that are not in table 2 (so essentially using NOT IN)
let outliers =
Table 2
| project UniqueEventGuid;
Table 1
|where UniqueEventGuid !in (outliers)
|project UniqueEventGuid
but getting 0 records back even though I know there are orphans in table 1.
Is the !in not the right syntax?
Thanks in advance!
!in operator
"In tabular expressions, the first column of the result set is
selected."
In the following example I intentionally ordered the columns so that the query fails with an error due to mismatched data types.
In your case, the data types might match, so the query is valid, but the results are wrong.
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| where i !in (t2)
Relop semantic error: SEM0025: One of the values provided to the
'!in' operator does not match the left side expression type 'int',
consider using explicit cast
If that is indeed the case, you can reorder the columns or project only the relevant one.
Note the use of double brackets.
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| where i !in ((t2 | project i))
i x
- -
1 A
2 B
3 C
Another option is to use a leftanti join:
let t1 = datatable(i:int, x:string)[1,"A", 2,"B", 3,"C" ,4,"D" ,5,"E"];
let t2 = datatable(y:string, i:int)["d",4 ,"e",5 ,"f",6 ,"g",7];
t1
| join kind=leftanti t2 on i
i x
- -
2 B
3 C
1 A

Display a message under certain criteria instead of results in Kusto

In Kusto, I want to display a message to the user depending on certain criteria. For example
isempty(['_tenant'])
| print "Note: ", "You must select a tenant"
else???
Events
| where tenant == ['_tenant']
| ...
The criteria are different for each query, as is the message.
A different way to do it is to do a union where each leg of the union is mutually exclusive. The catch is that a function must return a consistent schema regardless of input. So you'll end up with both a Status column and an x column in this example.
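let myFunc = (y:long) {
    union
    (
        // This leg yields the message only when the input is invalid.
        print Status = "Y must be greater than 0"
        | where y <= 0
    ),
    (
        // This leg yields the real results only when the input is valid.
        range x from 1 to 10 step 1
        | where y > 0
    )
};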
myFunc(-1)
It sounds like you might be looking for the assert() function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/assert-function
let checkLength = (len:long, s:string)
{
    assert(len > 0, "Length must be greater than zero") and
    strlen(s) > len
};
datatable(input:string)
[
    '123',
    '4567'
]
| where checkLength(len=long(-1), input)

How can we fill a column with specific values in Kusto?

I have a table in Kusto with 13,000 rows. How can I add a new column to this table and fill it randomly with only two values (0 and 1)? Is it also possible to create a column containing three different values of type string?
You can extend a calculated column using the rand() function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/randfunction
For example:
0 or 1:
| extend y = toint(rand() > 0.5)
1 of 3 strings (first, second or third):
| extend r = rand(3)
| extend s = case(r <= 0, "first", r <= 1, "second", "third")
| project-away r
If you need to do this at ingestion time, you can use an update policy: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/updatepolicy
Or, if you want to do this for an existing table, you can use a .set-or-replace command: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/data-ingestion/ingest-from-query
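As a self-contained way to try this out, here is a minimal sketch that adds both kinds of random column to a small generated table. The range table, the id column, and the labels variable are assumptions made for illustration; the string column is picked by indexing into a dynamic array, which is a variation on the case() approach rather than the method shown in the answer.

```kusto
// Hypothetical labels; replace them with your own three values.
let labels = dynamic(["first", "second", "third"]);
range id from 1 to 5 step 1                       // stand-in for the real table
| extend y = toint(rand(2))                       // rand(2) yields 0 or 1
| extend s = tostring(labels[toint(rand(3))])     // rand(3) yields 0, 1 or 2
```

Because rand() is evaluated per row, every run produces different values, so no fixed output is shown here.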

Kusto Query Language: How to save column of results into a variable?

Let's say I have a query like:
cluster("cluster1").database("db2").Table3
| distinct * // distinct combinations of data
| take 5 // take 5
How do I save the values from a column of the results into a pack_array variable?
I want to use this pack_array variable in follow-on queries like:
cluster("cluster2").database("db3").Table1
| where ColumnofInterest in (pack_array_var from above)
| take 5 // take 5
Pass "*" as the argument to the pack_array() function and store the result with a "let" statement. Here is an example:
let ValuesFromTheOtherCluster = cluster('cluster1').database('db2').Table3
| extend tempArray = pack_array(*)
| summarize filters = make_set(tempArray);
cluster('cluster2').database("db3").Table1
| where ColumnofInterest in (ValuesFromTheOtherCluster)
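When only one column's values are needed, a narrower variation is to collect that column into a dynamic array with make_set() and toscalar(), then pass the array to the in operator. The sketch below uses small local datatables in place of the two clusters, and the names Source, Values, and Other are assumptions made for illustration; it is an alternative sketch, not the approach given in the answer.

```kusto
// Hypothetical stand-in for cluster('cluster1').database('db2').Table3.
let Source = datatable(ColumnofInterest:string, Other:int)
[
    "a", 1,
    "b", 2,
    "a", 3
];
// Collect the distinct values of one column into a single dynamic array.
let Values = toscalar(Source | summarize make_set(ColumnofInterest));
// Hypothetical stand-in for cluster('cluster2').database('db3').Table1.
datatable(ColumnofInterest:string)["a", "c"]
| where ColumnofInterest in (Values)
```

With this sample data only the row "a" should pass the filter, since "c" does not occur in Source.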

Kusto result column name, bin value from request_parameters

Using query_parameters, how can I:
1. specify a result column name (ex: summarize ResultColumnName = count())
2. specify the value of a bin, when the value is actually the name of a column in the table
This is easiest to summarize with an example:
let myTable = datatable (Timestamp:datetime)
[datetime(1910-06-11),
datetime(1930-01-01),
datetime(1997-06-25),
datetime(1997-06-25)];
let UntrustedUserInput_ColumnName = "MyCount"; // actually from query_parameters
let UntrustedUserInput_BinValue = "Timestamp"; // actually from query_parameters
let UntrustedUserInput_BinRoundTo = "365d"; // actually from query_parameters
// the query I really want to perform
myTable
| summarize MyCount=count() by bin(todatetime(Timestamp), totimespan(365d));
// what the query looks like if I use query_parameters
myTable
| summarize UntrustedUserInput_ColumnName=count() by bin(todatetime(UntrustedUserInput_BinValue), totimespan(UntrustedUserInput_BinRoundTo));
Results:
Timestamp MyCount
--------- -------
1909-09-26T00:00:00Z 1
1929-09-21T00:00:00Z 1
1996-09-04T00:00:00Z 2
Column1 UntrustedUserInput_ColumnName
------- -----------------------------
4
I can't find a solution to #1.
It appears #2 can almost be solved by using column_ifexists, but I don't have a "default" to fall back on; I'd rather just fail if the column doesn't exist.
Treating column names as variables is not possible, since column names are part of the result schema coming out of each operator (with the exception of the "evaluate" operator; see specifically the pivot plugin).
There actually is a way to give a column a name taken from a variable, using a hacky trick:
let VariableColumnName = "TestColumn"; // the new column name that you want
range i from 1 to 5 step 1 // this is just a sample query
| project pack(VariableColumnName, i) // this creates a property bag (JSON object) in Column1
| evaluate bag_unpack(Column1) // unpacking the property bag creates a column with a dynamic name
This will return a column named TestColumn, which is set in VariableColumnName.
