Kusto Custom Sort Order? - azure-data-explorer

How do I perform a custom sort order in Kusto?
Example query:
//==================================================//
// Assign variables
//==================================================//
let varStart = ago(2d);
let varEnd = now();
let varStorageAccount = 'stgacctname';
//==================================================//
// Filter table
//==================================================//
StorageBlobLogs
| where TimeGenerated between (varStart .. varEnd)
and AccountName == varStorageAccount
| sort by OperationName
Need:
I want to put the various OperationNames (GetBlob, AppendFile, etc.) into a custom order.
Something like:
| sort by OperationName['GetBlob'], OperationName['AppendFile'], OperationName asc
Ideally I'd like to specify the values to sort by first, then let Kusto order the remaining rows using asc/desc.
Is this possible?

Use an aux column, like this:
datatable(OperationName:string, SomethingElse:string)
[
"AppendFile", "3",
"GetBlob", "1",
"AppendFile", "4",
"GetBlob", "2"
]
| extend OrderPriority =
case(OperationName == "GetBlob", 1,
OperationName == "AppendFile", 2,
3)
| order by OrderPriority asc, SomethingElse asc
| project-away OrderPriority
Output:
OperationName | SomethingElse
------------- | -------------
GetBlob       | 1
GetBlob       | 2
AppendFile    | 3
AppendFile    | 4
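Applied to the StorageBlobLogs query from the question, the same approach might look like the sketch below (the priority values and the alphabetical tie-breaker are assumptions):
let varStart = ago(2d);
let varEnd = now();
let varStorageAccount = 'stgacctname';
StorageBlobLogs
| where TimeGenerated between (varStart .. varEnd)
    and AccountName == varStorageAccount
// Rank the operations of interest; everything else falls into the last bucket
| extend OrderPriority =
    case(OperationName == "GetBlob", 1,
         OperationName == "AppendFile", 2,
         3)
// Sort by the custom bucket first, then alphabetically within each bucket
| order by OrderPriority asc, OperationName asc
| project-away OrderPriority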

Related

Kusto Query Dynamic sort Order

I have started working with Azure Data Explorer (Kusto) recently.
My requirement is to make the sort order of a Kusto table dynamic.
// Variable declaration
let SortColumn ="run_date";
let OrderBy="desc";
// Actual Code
tblOleMeasurments
| take 10
|distinct column1,column2,column3,run_date
|order by SortColumn OrderBy
My code works fine up to [SortColumn], but when I try to add [OrderBy] after [SortColumn], Kusto gives me an error.
My requirement here is to pass the asc/desc value from the variable [OrderBy].
Kindly assist with workarounds and solutions.
The sort column and sort order cannot be expressions; the order must be a literal ("asc" or "desc"). If you want to pass the sort column and sort order as variables, create a union instead, where filtering on the variables produces the desired outcome. Here is an example:
let OrderBy = "desc";
let sortColumn = "run_date";
let Query = tblOleMeasurments | take 10 |distinct column1,column2,column3,run_date;
union
(Query | where OrderBy == "desc" and sortColumn == "run_date" | order by run_date desc),
(Query | where OrderBy == "asc" and sortColumn == "run_date" | order by run_date asc)
The number of union legs is the number of candidate sort columns times two (for the two sort-order options).
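For example (a sketch reusing the Query, OrderBy, and sortColumn variables from above; treating column1 as a second candidate sort column is an assumption), two candidate columns would need four legs:
union
(Query | where OrderBy == "desc" and sortColumn == "run_date" | order by run_date desc),
(Query | where OrderBy == "asc" and sortColumn == "run_date" | order by run_date asc),
(Query | where OrderBy == "desc" and sortColumn == "column1" | order by column1 desc),
(Query | where OrderBy == "asc" and sortColumn == "column1" | order by column1 asc)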
An alternative would be sorting by a calculated column based on your sort_order and sort_column. The example below works for numeric columns:
let T = range x from 1 to 5 step 1 | extend y = -10 * x;
let sort_order = "asc";
let sort_column = "y";
T
| order by column_ifexists(sort_column, "") * case(sort_order == "asc", -1, 1)
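As a side note on why the sign flip works: order by in Kusto sorts in descending order when no direction is specified, so negating a numeric column when sort_order is "asc" produces ascending output; the multiplication is also why this trick is limited to numeric columns.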

How to retrieve a custom property corresponding to another property in Azure

I am trying to write a Kusto query to retrieve a custom property, as below.
I want to retrieve the count of pkgName and the corresponding organization. I could retrieve the count of pkgName; the code is attached below.
let mainTable = union customEvents
| extend name =replace("\n", "", name)
| where iif('*' in ("*"), 1 == 1, name in ("*"))
| where true;
let queryTable = mainTable;
let cohortedTable = queryTable
| extend dimension = customDimensions["pkgName"]
| extend dimension = iif(isempty(dimension), "<undefined>", dimension)
| summarize hll = hll(itemId) by tostring(dimension)
| extend Events = dcount_hll(hll)
| order by Events desc
| serialize rank = row_number()
| extend dimension = iff(rank > 10, 'Other', dimension)
| summarize merged = hll_merge(hll) by tostring(dimension)
| project ['pkgName'] = dimension, Counts = dcount_hll(merged);
cohortedTable
Please help me to get the organization along with each pkgName projected.
Please try this simple query:
customEvents
| summarize counts = count() by pkgName = tostring(customDimensions.pkgName), organization = tostring(customDimensions.organization)
Please feel free to modify it to meet your requirement.
If the above does not meet your requirement, please try to create another table which contains the pkgName and organization relationship, then use the join operator to join these tables. For example:
//create a table which contains the relationship
let temptable = customEvents
| summarize by pkgName=tostring(customDimensions.pkgName),organization=tostring(customDimensions.organization);
//then use the join operator to join these tables on the keyword pkgName.
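A sketch of that join, reusing the cohortedTable from the question together with the temptable above (assuming both let statements are defined in the same query):
cohortedTable
| join kind=leftouter (temptable) on pkgName
| project pkgName, organization, Counts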

Kusto: How to convert table value to scalar and return from user defined function

I have the following user-defined functions with the intention of using a case conditional to output a table of 0s or 1s saying whether or not an account is active.
case needs scalar values as its arguments, i.e. pro_account_active(account) and basic_account_active(account) need to be scalar values.
I'm struggling to get around the limitation of toscalar:
User-defined functions can't pass into toscalar() invocation
information that depends on the row-context in which the function is
called.
I think that if there were a function I could use in place of the "??????" that converts active to a scalar and returns it from the function, it would work.
Any help greatly appreciated
let basic_account_active=(account:string) {
basic_check_1(account) // returns 0 or 1 row only
| union basic_check_2(account)
| summarize result_count = count()
| extend active = iff(result_count == 2, 1, 0)
| ??????
};
let pro_account_active=(account:string) {
pro_check_1(account) // returns 0 or 1 row only
| union pro_check_2(account)
| summarize result_count = count()
| extend active = iff(result_count == 2, 1, 0)
| ??????
};
let is_active=(account_type:string, account:string) {
case(
account_type == 'pro', pro_account_active(account),
account_type == 'basic', basic_account_active(account),
-1
)
};
datatable(account_type:string, account:string)
[
'pro', '89e5678a92',
'basic', '9d8263da45',
'pro', '0b975f2454a',
'basic', '112a3f4753',
]
| extend result = is_active(account_type, account)
You can convert the output of a query to a scalar by using the toscalar() function, i.e.
let basic_account_active=(account:string) {
toscalar(basic_check_1(account) // returns 0 or 1 row only
| union basic_check_2(account)
| summarize result_count = count()
| extend active = iff(result_count == 2, 1, 0))};
From your example it looks like you have two tables for each account type, and if both have entries for a specific account, then the account is considered active. Is that correct? If so, I would use the "join" operator to find all the entries in the applicable tables and count them. Here is an example of one way to do it (there are other ways as well).
let basicAccounts1 = datatable(account_type:string, account:string)[ 'basic', '9d8263da45', 'basic', '111111'];
let basicAccounts2 = datatable(account_type:string, account:string)[ 'basic', '9d8263da45', 'basic', '222222'];
let proAccounts1 = datatable(account_type:string, account:string)[ 'pro', '89e5678a92', 'pro', '111111'];
let proAccounts2 = datatable(account_type:string, account:string)[ 'pro', '89e5678a92', 'pro', '222222'];
let AllAccounts = union basicAccounts1, basicAccounts2, proAccounts1, proAccounts2
| summarize count() by account, account_type;
datatable(account_type:string, account:string)
[
'pro', '89e5678a92',
'basic', '9d8263da45',
'pro', '0b975f2454a',
'basic', '112a3f4753',
]
| join kind=leftouter hint.strategy=broadcast (AllAccounts) on account, account_type
| extend IsActive = count_ >=2
| project-away count_, account1, account_type1
In the results, IsActive is true for the two accounts that appear in both of their tables ('89e5678a92' and '9d8263da45').

Application Insights query to get time between 2 custom events

I am trying to write a query that will get me the average time between 2 custom events, sorted by user session. I have added custom tracking events throughout this application and I want to query the time it takes the user from 'Setup' event to 'Process' event.
let allEvents=customEvents
| where timestamp between (datetime(2019-09-25T15:57:18.327Z)..datetime(2019-09-25T16:57:18.327Z))
| extend SourceType = 5;
let allPageViews=pageViews
| take 0;
let all = allEvents
| union allPageViews;
let step1 = materialize(all
| where name == "Setup" and SourceType == 5
| summarize arg_min(timestamp, *) by user_Id
| project user_Id, step1_time = timestamp);
let step2 = materialize(step1
| join
hint.strategy=broadcast (all
| where name == "Process" and SourceType == 5
| project user_Id, step2_time=timestamp
)
on user_Id
| where step1_time < step2_time
| summarize arg_min(step2_time, *) by user_Id
| project user_Id, step1_time,step2_time);
let 1Id=step1_time;
let 2Id=step2_time;
1Id
| union 2Id
| summarize AverageTimeBetween=avg(step2_time - step1_time)
| project AverageTimeBetween
When I run this query it produces this error message:
'' operator: Failed to resolve table or column or scalar expression named 'step1_time'
I am relatively new to writing queries with AI and have not found many resources to assist with this problem. Thank you in advance for your help!
I'm not sure what the let 1Id=step1_time lines are intended to do.
Those lines are trying to declare a new value, but step1_time isn't a thing on its own; it was a column in another query.
I'm also not sure why you're doing that pageViews | take 0 and unioning it with events.
let allEvents=customEvents
| where timestamp between (datetime(2019-09-25T15:57:18.327Z)..datetime(2019-09-25T16:57:18.327Z))
| extend SourceType = 5;
let step1 = materialize(allEvents
| where name == "Setup" and SourceType == 5
| summarize arg_min(timestamp, *) by user_Id
| project user_Id, step1_time = timestamp);
let step2 = materialize(step1
| join
hint.strategy=broadcast (allEvents
| where name == "Process" and SourceType == 5
| project user_Id, step2_time=timestamp
)
on user_Id
| where step1_time < step2_time
| summarize arg_min(step2_time, *) by user_Id
| project user_Id, step1_time,step2_time);
step2
| summarize AverageTimeBetween=avg(step2_time - step1_time)
| project AverageTimeBetween
If I remove the things I don't understand (like the union with zero pageViews, and the lets), I get a result. I don't have your data, though, so I had to use values other than "Setup" and "Process", and I don't know whether it's what you expect.
You might want to look at the results of the step2 query without the summarize, just to check that what you're getting matches what you expect.
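For example, a quick way to inspect the per-user durations before averaging (a sketch building on the step2 variable above):
step2
| extend TimeBetween = step2_time - step1_time
| project user_Id, step1_time, step2_time, TimeBetween
| order by TimeBetween desc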

Select several event params in a single row for Firebase events stored in Google BigQuery

I'm trying to perform a very simple query for Firebase events stored in Google BigQuery but I'm not able to find a way to do it.
In the Android app, I'm logging an event like this:
Bundle params = new Bundle();
params.putInt("productID", productId);
params.putInt(FirebaseAnalytics.Param.VALUE, value);
firebaseAnalytics.logEvent("productEvent", params);
So, in BigQuery I have something like this:
___________________ _______________________ ____________________________
| event_dim.name | event_dim.params.key | event_dim.params.int_value |
|___________________|_______________________|____________________________|
| productEvent | productID | 25 |
| |_______________________|____________________________|
| | value | 1253 |
|___________________|_______________________|____________________________|
When I get the data from this table I get two rows:
___________________ _______________________ ____________________________
|event_dim.name | event_dim.params.key | event_dim.params.int_value |
|___________________|_______________________|____________________________|
| productEvent | productID | 25 |
| productEvent | value | 12353 |
But what I really need is a SELECT clause from this table to get the data as below:
___________________ _____________ _________
| name | productID | value |
|___________________|_____________|_________|
| productEvent | 25 | 12353 |
Any idea or suggestion?
You can pivot the values into columns like this:
SELECT
event_dim.name as name,
MAX(IF(event_dim.params.key = "productID", event_dim.params.int_value, NULL)) WITHIN RECORD productID,
MAX(IF(event_dim.params.key = "value", event_dim.params.int_value, NULL)) WITHIN RECORD value,
FROM [events]
In case you want to generate this command using SQL, see this solution: Pivot Repeated fields in BigQuery
Using standard SQL (uncheck "Use Legacy SQL" under "Show Options" in the UI), you can express the query as:
SELECT
event_dim.name as name,
(SELECT value.int_value FROM UNNEST(event_dim.params)
WHERE key = "productID") AS productID,
(SELECT value.int_value FROM UNNEST(event_dim.params)
WHERE key = "value") AS value
FROM `dataset.mytable` AS t,
t.event_dim AS event_dim;
Edit: updated example to include int_value as part of value based on the comment below. Here is a self-contained example that demonstrates the approach as well:
WITH T AS (
SELECT ARRAY_AGG(event_dim) AS event_dim
FROM (
SELECT STRUCT(
"foo" AS name,
ARRAY<STRUCT<key STRING, value STRUCT<int_value INT64, string_value STRING>>>[
("productID", (10, NULL)), ("value", (5, NULL))
] AS params) AS event_dim
UNION ALL
SELECT STRUCT(
"bar" AS name,
ARRAY<STRUCT<key STRING, value STRUCT<int_value INT64, string_value STRING>>>[
("productID", (13, NULL)), ("value", (42, NULL))
] AS params) AS event_dim
)
)
SELECT
event_dim.name as name,
(SELECT value.int_value FROM UNNEST(event_dim.params)
WHERE key = "productID") AS productID,
(SELECT value.int_value FROM UNNEST(event_dim.params)
WHERE key = "value") AS value
FROM T AS t,
t.event_dim AS event_dim;
