Kusto Query Language: How to save column of results into a variable? - azure-data-explorer

Lets say I have a query like:
cluster("cluster1").database("db2").Table3
| distinct * // distinct combinations of data
| take 5 // take 5
How do I save the values from a column in the results output to a pack_array variable.
I want to use this pack_array variable for follow on queries like:
cluster("cluster2").database("db3").Table1
| where ColumnofInterest in (pack_array_var from above)
| take 5 // take 5

Provide the "*" argument to the function and use the "let" statement. Here is an example:
let ValuesFromTheOtherCluster = cluster('cluster1').database('db2').Table3
| extend tempArray = pack_array(*)
| summarize filters = make_set(tempArray);
cluster('cluster2').database("db3").Table1
| where ColumnofInterest in (ValuesFromTheOtherCluster)

Related

Passing table list to "Find In" operator dynamically at run time in Kusto Query Language

I have a where condition which I want to run over a set of tables in my Azure Data Explorer DB. I found "Find in ()" operator in Kusto query quite useful, works fine when I pass list of tables as intended.
find withsource=DataType in (AppServiceFileAuditLogs,AzureDiagnostics)
where TimeGenerated > ago(31d)
project _ResourceId, _BilledSize, _IsBillable
| where _IsBillable == true
| summarize BillableDataBytes = sum(_BilledSize) by _ResourceId, DataType | sort by BillableDataBytes nulls last
However, in my scenario, I would like to decide the list of tables at run time using another query.
Usage
| where TimeGenerated > ago(32d)
| where StartTime >= startofday(ago(31d)) and EndTime < startofday(now())
| where IsBillable == true
| summarize BillableDataGB = sum(Quantity) / 1000 by DataType
| sort by BillableDataGB desc
|project DataType
find withsource=DataType in (<pass resulting table expression from above query here as comma separated list of tables>)
where TimeGenerated > ago(31d)
project _ResourceId, _BilledSize, _IsBillable
| where _IsBillable == true
| summarize BillableDataBytes = sum(_BilledSize) by _ResourceId, DataType | sort by BillableDataBytes nulls last
Found some examples of passing all tables in a database or cluster using wildcards but that does not fit my scenario. Can somebody help me here.
Here is one way to achieve this:
let Tables = toscalar(Usage
| where TimeGenerated > ago(32d)
| where StartTime >= startofday(ago(31d)) and EndTime < startofday(now())
| where IsBillable == true
| summarize by DataType);
union withsource=T *
| where T in (Tables)
| count
Note that there is a significance to the toscalar expression, it precalculates the list of tables and optimizes the filter on the union expression. I also updated your query to avoid unnecessary work.

Kusto result column name, bin value from request_parameters

Using query_parameters, how can I:
specify a result column name (ex: summarize ResultColumnName = count())
specify the value of a bin, when value is actually the name of a column in the table
This is easiest to summarize with an example:
let myTable = datatable (Timestamp:datetime)
[datetime(1910-06-11),
datetime(1930-01-01),
datetime(1997-06-25),
datetime(1997-06-25)];
let UntrustedUserInput_ColumnName = "MyCount"; // actually from query_parameters
let UntrustedUserInput_BinValue = "Timestamp"; // actually from query_parameters
let UntrustedUserInput_BinRoundTo = "365d"; // actually from query_parameters
// the query I really want to perform
myTable
| summarize MyCount=count() by bin(todatetime(Timestamp), totimespan(365d));
// what the query looks like if I use query_parameters
myTable
| summarize UntrustedUserInput_ColumnName=count() by bin(todatetime(UntrustedUserInput_BinValue), totimespan(UntrustedUserInput_BinRoundTo));
Results:
Timestamp MyCount
--------- -------
1909-09-26T00:00:00Z 1
1929-09-21T00:00:00Z 1
1996-09-04T00:00:00Z 2
Column1 UntrustedUserInput_ColumnName
------- -----------------------------
4
I can't find a solution to #1.
It appears #2 can almost be solved by using column_ifexists, but I don't have a "default" to fall back on, I'd rather just fail if the column doesn't exist.
Treating column names as variables is not possible since columns names are part of the result schema coming out of each operator (with the exception of the "evaluate" operator, see specifically the pivot plugin).
There actually is a way to set variable names to a column, using a hacky trick:
let VariableColumnName = "TestColumn"; // the new column name that you want
range i from 1 to 5 step 1 // this is just a sample query
| project pack(VariableColumnName, i) // this created a JSON
| evaluate bag_unpack(Column1) // unpacking the JSON creates a column with a dynamic name
This will return a column named TestColumn, which is set in VariableColumnName.

How to retreive custom property corresponding to another property in azure

I am trying to write a kusto query to retrieve a custom property as below.
I want to retrieve count of pkgName and corresponding organization. I could retrieve the count of pkgName and the code is attached below.
let mainTable = union customEvents
| extend name =replace("\n", "", name)
| where iif('*' in ("*"), 1 == 1, name in ("*"))
| where true;
let queryTable = mainTable;
let cohortedTable = queryTable
| extend dimension = customDimensions["pkgName"]
| extend dimension = iif(isempty(dimension), "<undefined>", dimension)
| summarize hll = hll(itemId) by tostring(dimension)
| extend Events = dcount_hll(hll)
| order by Events desc
| serialize rank = row_number()
| extend dimension = iff(rank > 10, 'Other', dimension)
| summarize merged = hll_merge(hll) by tostring(dimension)
| project ['pkgName'] = dimension, Counts = dcount_hll(merged);
cohortedTable
Please help me to get the organization along with each pkgName projected.
Please try this simple query:
customEvents
| summarize counts=count(tostring(customDimensions.pkgName)) by pkgName=tostring(customDimensions.pkgName),organization=tostring(customDimensions.organization)
Please feel free to modify it to meet your requirement.
If the above does not meet your requirement, please try to create another table which contains pkgName and organization relationship. Then use join operator to join these tables. For example:
//create a table which contains the relationship
let temptable = customEvents
| summarize by pkgName=tostring(customDimensions.pkgName),organization=tostring(customDimensions.organization);
//then use the join operator to join these tables on the keyword pkgName.

Can I use tabular parameters in Kusto user-defined functions

Basically I'd like to pass in a set of field values to a function so I can use in/!in operators. I'd prefer to be able to use the result of a previous query rather than having to construct a set manually.
As in:
let today = exception | where EventInfo_Time > ago(1d) | project exceptionMessage;
MyAnalyzeFunction(today)
What is then the signature of MyAnalyzeFunction?
See: https://learn.microsoft.com/en-us/azure/kusto/query/functions/user-defined-functions
For instance, the following will return a table with a single column (y) with the values 2 and 3:
let someTable = range x from 2 to 10 step 1
;
let F = (T:(x:long))
{
range y from 1 to 3 step 1
| where y in (T)
}
;
F(someTable)

Passing result variable to nested SELECT statement in Sqlite

I have the following query which works:
SELECT
SoftwareList,
Count (SoftwareList) as Count
FROM [assigned]
GROUP BY SoftwareList
This returns the following result set:
*SoftwareList* | *Count*
--------------------------
Office XP | 3
Adobe Reader | 3
Dreamewaver | 2
I can also run the following query:
SELECT
GROUP_CONCAT(LastSeen) as LastSeen
FROM [assigned]
WHERE SoftwareList = 'Dreamweaver';
Which would return the following result set:
*LastSeen*
----------
2007-9-23,2012-3-12
I wish to combine both of these queries into one, so that the following results are returned:
*SoftwareList* | *Count* | *LastSeen*
--------------------------------------------------------
Office XP | 3 | 2001-2-12,2008-3-19,2002-2-17
Adobe Reader | 3 | 2008-2-12,2009-3-20,2007-3-16
Dreamewaver | 2 | 2007-9-23,2012-3-12
I am trying this but don't know how to refer to the initial SoftwareList variable within the nested statement:
SELECT
SoftwareList,
Count (SoftwareList) as Count,
(SELECT
GROUP_CONCAT(LastSeen) FROM [assigned]
WHERE SoftwareList = SoftwareList
) as LastSeen
FROM [assigned]
GROUP BY SoftwareList;
How can I pass SoftwareList which is returned for each row, into the nested statement?
I think this is what you want:
SELECT SoftwareList, COUNT(SoftwareList) AS Count, GROUP_CONCAT(LastSeen)
FROM assigned GROUP BY SoftwareList

Resources