Kusto: compare each row in a resultset with another table - azure-data-explorer

I have two tables, EventsTable and a Subcategory table; both are defined as datatables in the query below.
I expect to mark all rows in EventsTable with the "dataflow" subcategory, because the keywords cpu, dataflow and cpupct belong to the subcategory dataflow.
I am looking for a query with logic like this:
let Subcategory = datatable(subcategory:string, keywords:dynamic )
[
'saturacion', dynamic(["saturation","infrastructure"]),
'slow disk',dynamic(["low","disk","space"]),
'saturacion',dynamic(["using","win","use"]),
'saturacion',dynamic(["used","win","utilization","percentage"]),
'swap memory',dynamic(["swap","memory","usage"]),
'disk full',dynamic(["disk","free","size","filesystemspace"]),
'dataflow',dynamic(["cpu","dataflow","cpupct"])
];
let EventsTable = datatable(ID:string, category:string, words:dynamic )
[
'mcsc1','cpu',dynamic(["swap","memory","usage"]),
'mcsc2','cpu',dynamic(["disk","free","size","filesystemspace"]),
'mcsc3','cpu',dynamic(["cpu","dataflow","cpupct"])
];
EventsTable
| mv-apply Subcategory on
(
    extend subcat = iff(
        array_length(set_intersect(words, Subcategory.keywords)) == array_length(Subcategory.keywords),
        Subcategory.subcategory, 'none')
)

You can try the following approach (though I'm not sure it's the optimal way to solve this):
let Subcategory = datatable(subcategory:string, keywords:dynamic )
[
'saturacion', dynamic(["saturation","infrastructure"]),
'slow disk',dynamic(["low","disk","space"]),
'saturacion',dynamic(["using","win","use"]),
'saturacion',dynamic(["used","win","utilization","percentage"]),
'swap memory',dynamic(["swap","memory","usage"]),
'disk full',dynamic(["disk","free","size","filesystemspace"]),
'dataflow',dynamic(["cpu","dataflow","cpupct"])
];
let EventsTable = datatable(ID:string, category:string, words:dynamic )
[
'mcsc1','cpu',dynamic(["swap","memory","usage"]),
'mcsc2','cpu',dynamic(["disk","free","size","filesystemspace"]),
'mcsc3','cpu',dynamic(["cpu","dataflow","cpupct"])
];
EventsTable | extend Temp=1
| join kind=inner (Subcategory | extend Temp=1) on Temp
| extend subcat = iff(array_length(set_intersect(words, keywords)) == array_length(keywords), subcategory, 'none')
| project-away Temp, Temp1
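If you only want the rows where a subcategory actually matched, one possible follow-up (appended to the end of the query above) is to filter out the 'none' rows, for example:
| where subcat != 'none'
| project ID, category, words, subcat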

Related

Grouping by Username in MS Sentinel

How do I group by Username in Sentinel? I get confused with the project, extend, join, and summarize operations. Sanitized KQL code and a picture are attached. This is for mass downloads of files, but the number has been lowered so it triggers for the MSP.
let threshold = 100;
let szSharePointFileOperation = "SharePointFileOperation";
let szOperations = dynamic(["FileDownloaded"]);
let starttime = 10m;
let endtime = 600m;
let historicalActivity =
OfficeActivity
| where TimeGenerated between(ago(starttime)..ago(endtime))
| where RecordType =~ szSharePointFileOperation
| where Operation in~ (szOperations)
| summarize historicalCount = count() by ClientIP, RecordType, Operation;
let recentActivity = OfficeActivity
| where TimeGenerated > ago(endtime)
| where RecordType =~ szSharePointFileOperation
| where Operation in~ (szOperations)
| summarize min(Start_Time), max(Start_Time), recentCount = count() by ClientIP, RecordType, Operation;
let RareIP = recentActivity | join kind= leftanti ( historicalActivity ) on ClientIP, RecordType, Operation
// More than 100 downloads/uploads from a new IP
| where recentCount > threshold;
OfficeActivity
| where TimeGenerated >= ago(endtime)
| where RecordType =~ szSharePointFileOperation
| where Operation in~ (szOperations)
| join kind= inner (RareIP) on ClientIP, RecordType, Operation
| where Start_Time between(min_Start_Time .. max_Start_Time)
| summarize StartTimeUtc = min(min_Start_Time), EndTimeUtc = max(max_Start_Time) by RecordType, Operation, UserType, UserId, ClientIP, OfficeWorkload, Site_Url, OfficeObjectId, UserAgent, IPSeenCount = recentCount
| extend timestamp = StartTimeUtc, AccountCustomEntity = UserId, IPCustomEntity = ClientIP, URLCustomEntity = Site_Url
| order by IPSeenCount desc, ClientIP asc, Operation asc, UserId asc
| where UserAgent <> "FileZip/1.0"
| where Site_Url contains "https://abc.sharepoint.com/sites/"
Results are in the image: https://ibb.co/QjW308q. I am trying to group by UserIds ("AccountCustomEntity") and not have separate rows for each. Many thanks for any help!
If you are not interested in seeing the UserIds, you can simply remove UserId from the "summarize" line (this is the applicable line without it):
| summarize StartTimeUtc = min(min_Start_Time), EndTimeUtc = max(max_Start_Time) by RecordType, Operation, UserType, ClientIP, OfficeWorkload, Site_Url, OfficeObjectId, UserAgent, IPSeenCount = recentCount
Once you do this, these lines will show an error under the UserId expressions:
| extend timestamp = StartTimeUtc, AccountCustomEntity = UserId, IPCustomEntity = ClientIP, URLCustomEntity = Site_Url
| order by IPSeenCount desc, ClientIP asc, Operation asc, UserId asc
You should remove those UserId expressions from these lines as well and then run the query; the UserIds will be gone.
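For reference, here is a sketch of those two lines with the UserId references removed (the rest of the query stays the same):
| extend timestamp = StartTimeUtc, IPCustomEntity = ClientIP, URLCustomEntity = Site_Url
| order by IPSeenCount desc, ClientIP asc, Operation asc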
As a side note, UserId in this query is duplicated and shows up twice in the results, once as UserId and once as AccountCustomEntity - this is strange.

Kusto Custom Sort Order?

How do I perform a custom sort order in Kusto?
Example query:
//==================================================//
// Assign variables
//==================================================//
let varStart = ago(2d);
let varEnd = now();
let varStorageAccount = 'stgacctname';
//==================================================//
// Filter table
//==================================================//
StorageBlobLogs
| where TimeGenerated between (varStart .. varEnd)
and AccountName == varStorageAccount
| sort by OperationName
Need:
I want to put the various OperationNames (GetBlob, AppendFile, etc.) into a custom order.
Something like:
| sort by OperationName['GetBlob'], OperationName['AppendFile'], OperationName asc
Ideally I'd like to specify the values to sort by first, and then let Kusto order the remaining rows using asc/desc.
Is this possible?
Use an auxiliary column, like this:
datatable(OperationName:string, SomethingElse:string)
[
"AppendFile", "3",
"GetBlob", "1",
"AppendFile", "4",
"GetBlob", "2"
]
| extend OrderPriority =
case(OperationName == "GetBlob", 1,
OperationName == "AppendFile", 2,
3)
| order by OrderPriority asc, SomethingElse asc
| project-away OrderPriority
Output:
OperationName    SomethingElse
GetBlob          1
GetBlob          2
AppendFile       3
AppendFile       4
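Applied to the original StorageBlobLogs query, the same pattern might look like the sketch below (any OperationName other than GetBlob and AppendFile falls through to the catch-all priority and is then ordered by name):
let varStart = ago(2d);
let varEnd = now();
let varStorageAccount = 'stgacctname';
StorageBlobLogs
| where TimeGenerated between (varStart .. varEnd)
    and AccountName == varStorageAccount
| extend OrderPriority = case(
    OperationName == "GetBlob", 1,
    OperationName == "AppendFile", 2,
    3)
| sort by OrderPriority asc, OperationName asc
| project-away OrderPriority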

Kusto: How to convert table value to scalar and return from user defined function

I have the following user-defined functions with the intention of using a case conditional to output a table of 0s or 1s saying whether or not an account is active.
case needs scalar values as its arguments, i.e. pro_account_active(account) and basic_account_active(account) need to return scalar values.
I'm struggling to get around the limitation of toscalar:
"User-defined functions can't pass into toscalar() invocation information that depends on the row-context in which the function is called."
I think it would work if there were a function I could use in place of the "??????" that converts active to a scalar and returns it from the function.
Any help is greatly appreciated.
let basic_account_active=(account:string) {
basic_check_1(account) // returns 0 or 1 row only
| union basic_check_2(account)
| summarize result_count = count()
| extend active = iff(result_count == 2, 1, 0)
| ??????
};
let pro_account_active=(account:string) {
pro_check_1(account) // returns 0 or 1 row only
| union pro_check_2(account)
| summarize result_count = count()
| extend active = iff(result_count == 2, 1, 0)
| ??????
};
let is_active=(account_type:string, account:string) {
case(
account_type == 'pro', pro_account_active(account),
account_type == 'basic', basic_account_active(account),
-1
)
};
datatable(account_type:string, account:string)
[
'pro', '89e5678a92',
'basic', '9d8263da45',
'pro', '0b975f2454a',
'basic', '112a3f4753',
]
| extend result = is_active(account_type, account)
You can convert the output of a query to a scalar by using the toscalar() function, i.e.
let basic_account_active = (account:string) {
    toscalar(
        basic_check_1(account) // returns 0 or 1 row only
        | union basic_check_2(account)
        | summarize result_count = count()
        | extend active = iff(result_count == 2, 1, 0)
    )
};
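A minimal sketch of calling such a function with a constant account value (assuming basic_check_1 and basic_check_2 exist in your database and the let statement above is part of the same query); note that, because of the limitation quoted in the question, this works only when the argument does not depend on row context:
print active = basic_account_active('9d8263da45')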
From your example it looks like you have two tables for each account type, and if both have entries for a specific account, then the account is considered active. Is that correct? If so, I would use the "join" operator to find all the entries in the applicable tables and count them. Here is an example of one way to do it (there are other ways as well).
let basicAccounts1 = datatable(account_type:string, account:string)[ 'basic', '9d8263da45', 'basic', '111111'];
let basicAccounts2 = datatable(account_type:string, account:string)[ 'basic', '9d8263da45', 'basic', '222222'];
let proAccounts1 = datatable(account_type:string, account:string)[ 'pro', '89e5678a92', 'pro', '111111'];
let proAccounts2 = datatable(account_type:string, account:string)[ 'pro', '89e5678a92', 'pro', '222222'];
let AllAccounts = union basicAccounts1, basicAccounts2, proAccounts1, proAccounts2
| summarize count() by account, account_type;
datatable(account_type:string, account:string)
[
'pro', '89e5678a92',
'basic', '9d8263da45',
'pro', '0b975f2454a',
'basic', '112a3f4753',
]
| join kind=leftouter hint.strategy=broadcast (AllAccounts) on account, account_type
| extend IsActive = count_ >= 2
| project-away count_, account1, account_type1

Select several event params in a single row for Firebase events stored in Google BigQuery

I'm trying to perform a very simple query for Firebase events stored in Google BigQuery, but I'm not able to find a way to do it.
In the Android app, I'm logging an event like this:
Bundle params = new Bundle();
params.putInt("productID", productId);
params.putInt(FirebaseAnalytics.Param.VALUE, value);
firebaseAnalytics.logEvent("productEvent", params);
So, in BigQuery I have something like this:
 ___________________ _______________________ ____________________________
| event_dim.name    | event_dim.params.key  | event_dim.params.int_value |
|___________________|_______________________|____________________________|
| productEvent      | productID             | 25                         |
|                   |_______________________|____________________________|
|                   | value                 | 12353                      |
|___________________|_______________________|____________________________|
When I get the data from this table I get two rows:
 ___________________ _______________________ ____________________________
| event_dim.name    | event_dim.params.key  | event_dim.params.int_value |
|___________________|_______________________|____________________________|
| productEvent      | productID             | 25                         |
| productEvent      | value                 | 12353                      |
But what I really need is a SELECT clause from this table to get the data as below:
 ___________________ _____________ _________
| name              | productID   | value   |
|___________________|_____________|_________|
| productEvent      | 25          | 12353   |
Any idea or suggestion?
You can pivot the values into columns like this
SELECT
event_dim.name as name,
MAX(IF(event_dim.params.key = "productID", event_dim.params.int_value, NULL)) WITHIN RECORD productID,
MAX(IF(event_dim.params.key = "value", event_dim.params.int_value, NULL)) WITHIN RECORD value,
FROM [events]
In case you want to generate this command using SQL, see this solution: Pivot Repeated fields in BigQuery
Using standard SQL (uncheck "Use Legacy SQL" under "Show Options" in the UI), you can express the query as:
SELECT
event_dim.name as name,
(SELECT value.int_value FROM UNNEST(event_dim.params)
WHERE key = "productID") AS productID,
(SELECT value.int_value FROM UNNEST(event_dim.params)
WHERE key = "value") AS value
FROM `dataset.mytable` AS t,
t.event_dim AS event_dim;
Edit: updated the example to include int_value as part of value, based on a comment. Here is a self-contained example that demonstrates the approach as well:
WITH T AS (
SELECT ARRAY_AGG(event_dim) AS event_dim
FROM (
SELECT STRUCT(
"foo" AS name,
ARRAY<STRUCT<key STRING, value STRUCT<int_value INT64, string_value STRING>>>[
("productID", (10, NULL)), ("value", (5, NULL))
] AS params) AS event_dim
UNION ALL
SELECT STRUCT(
"bar" AS name,
ARRAY<STRUCT<key STRING, value STRUCT<int_value INT64, string_value STRING>>>[
("productID", (13, NULL)), ("value", (42, NULL))
] AS params) AS event_dim
)
)
SELECT
event_dim.name as name,
(SELECT value.int_value FROM UNNEST(event_dim.params)
WHERE key = "productID") AS productID,
(SELECT value.int_value FROM UNNEST(event_dim.params)
WHERE key = "value") AS value
FROM T AS t,
t.event_dim AS event_dim;

Join twice a relation with doctrine

I have a couple of tables:
table items: id, title
table properties: id_item, name, value
So an item has multiple properties (is an EAV).
Now I need to find which items have certain properties, so I try to join the same relation multiple times:
$queryBuilder = $this->createQueryBuilder('i')
    ->join('i.properties', 'p');
$i = 0;
foreach ($properties as $name => $value) {
    $queryBuilder->join('i.properties', 'p'.$i)
        ->andWhere("p{$i}.name = :name".$i)
        ->setParameter(':name'.$i, $name)
        ->andWhere("p{$i}.value = :value".$i)
        ->setParameter(':value'.$i, $value);
    $i = $i + 1;
}
return $queryBuilder->getQuery()->getResult();
But this doesn't work because the join is not repeated by Doctrine; it always uses the same one.
UPDATE:
To be clearer: if I join the item and properties tables only once, I get:
id | title   | name  | value
1  | t shirt | color | red
1  | t shirt | size  | large
But if I need to search for the items that have both color=red and size=large, I need to join it twice so that I can put where conditions on different columns:
id | title   | name1 | value1 | name2 | value2
1  | t shirt | color | red    | size  | large
The SQL actually generated is something like:
SELECT m0_.id AS id0
FROM item m0_
INNER JOIN properties m1_ ON m0_.id = m1_.item_id
WHERE m1_.name = 'Color' AND m1_.value = 'red' AND m1_.name = 'Size' AND m1_.value = 'Large'
But obviously m1_.name cannot be Color and Size at the same time.
