Summarize 2 sets into 1 set per user KQL


What would be the proper way to summarize 2 sets into 1 set by user?
For example, in the picture below:
I want to create a new set (the column that has the question mark) combining the X_locations and Y_locations columns by User.
I tried strcat_array, but I am not sure those results will work. Is anyone aware of a proper way to do this? I envision something like this:
| summarize whateverSetUnionFunctionHere(X_Locations,Y_Locations) by User

You can use the make_set() function. It creates a distinct set from all the sets in the input.

It seems you are looking for a combination of the answers by @Avnera and @Yoni K.:
datatable(User:string, X_locations:dynamic, Y_locations:dynamic)
[
"user1", dynamic(["a"]), dynamic(["a"]),
"user2", dynamic(["b","c"]), dynamic(["c"]),
"user2", dynamic(["b"]), dynamic(["b","d"]),
]
| summarize make_set(set_union(X_locations, Y_locations)) by User
User   set_
user1  ["a"]
user2  ["b","c","d"]
P.S.
There are several variations on this, e.g. set_union could be replaced by array_concat, since make_set removes the duplicates anyway.
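To check the semantics outside Kusto, here is a minimal Python sketch of what the query above computes on the sample rows. The function name union_per_user is hypothetical, and the sorted output is only a convenience for comparison; it is not meant to reflect the order Kusto returns.

```python
from collections import defaultdict


def union_per_user(rows):
    """Model of: summarize make_set(set_union(X_locations, Y_locations)) by User."""
    merged = defaultdict(set)
    for user, x_locations, y_locations in rows:
        # Adding both arrays to one set per user removes duplicates,
        # both within a row and across rows of the same user.
        merged[user].update(x_locations, y_locations)
    return {user: sorted(values) for user, values in merged.items()}


rows = [
    ("user1", ["a"], ["a"]),
    ("user2", ["b", "c"], ["c"]),
    ("user2", ["b"], ["b", "d"]),
]
print(union_per_user(rows))
```

Because everything ends up in a set, concatenating the two arrays instead of unioning them first would give the same result, which is why array_concat works as a variation.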

You can use the set_union() function.
For example:
datatable(User:string, X_locations:dynamic, Y_locations:dynamic)
[
"user1", dynamic(["a"]), dynamic(["a"]),
"user2", dynamic(["b"]), dynamic(["c"]),
"user2", dynamic(["b"]), dynamic(["b"]),
]
| extend result = set_union(X_locations, Y_locations)
User   X_locations  Y_locations  result
user1  ["a"]        ["a"]        ["a"]
user2  ["b"]        ["c"]        ["b","c"]
user2  ["b"]        ["b"]        ["b"]

Related

How to summarize a dynamic object column?

Say I have an exceptions table which I know contains some data like the below, where details is a dynamic object
operation_id  details
1             {"cause": "sometext"}
1             {"other_info": 240}
1             {"message": "blabal" }
2             {"cause": "some other text"}
2             {"other_info": 88}
2             {"message": "blabal2" }
How can I query these results so that they are grouped by operation_id, while aggregating everything in the details column, perhaps something like
operation_id  details_1                     details_2            details_3
1             {"cause": "sometext"}         {"other_info": 240}  {"message": "blabal" }
2             {"cause": "some other text"}  {"other_info": 88}   {"message": "blabal2" }
or even just joining all details into a single column?
I tried doing it with summarize, but it just shows each entry on a separate line (since each details value is unique):
exceptions
| where timestamp > now() - 10m
| summarize by operation_Id, dynamic_to_json(['details'])
Does anyone have any advice about this?

You can use the make_bag() aggregation function.
For example:
datatable(operation_id:int, details:dynamic)
[
1, dynamic({"cause": "sometext"}),
1, dynamic({"other_info": 240}),
1, dynamic({"message": "blabal" }),
2, dynamic({"cause": "some other text"}),
2, dynamic({"other_info": 88}),
2, dynamic({"message": "blabal2" }),
]
| summarize details = make_bag(details) by operation_id
operation_id  details
1             { "cause": "sometext", "other_info": 240, "message": "blabal"}
2             { "cause": "some other text", "other_info": 88, "message": "blabal2"}

I also got it working like this (using make_set())
exceptions
| project
operation_Id,
details
| summarize Details=make_set(details) by operation_Id
Note, however, that this returns details as an array of objects rather than as a single merged object.
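The difference between the two aggregations can be sketched in Python. The names merge_bags and collect_distinct are hypothetical stand-ins for make_bag() and make_set(); the handling of a key that appears in several rows (last value wins here) is an assumption of this sketch and not necessarily what Kusto does.

```python
from collections import defaultdict


def merge_bags(rows):
    """Model of make_bag(): merge all objects of a group into one object."""
    bags = defaultdict(dict)
    for operation_id, details in rows:
        # Assumption: on a key collision the later row overwrites the earlier one.
        bags[operation_id].update(details)
    return dict(bags)


def collect_distinct(rows):
    """Model of make_set(): keep each distinct object as a separate array element."""
    groups = defaultdict(list)
    for operation_id, details in rows:
        if details not in groups[operation_id]:
            groups[operation_id].append(details)
    return dict(groups)


rows = [
    (1, {"cause": "sometext"}),
    (1, {"other_info": 240}),
    (1, {"message": "blabal"}),
    (2, {"cause": "some other text"}),
    (2, {"other_info": 88}),
    (2, {"message": "blabal2"}),
]
print(merge_bags(rows)[1])        # one merged object
print(collect_distinct(rows)[1])  # a list of three separate objects
```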

jq: list users belonging to a specific group in array

input json:
[
{
"user": "u1"
},
{
"user": "u2",
"groups": [
{
"id": "100001",
"name": "G1"
},
{
"id": "100002",
"name": "G2"
}
]
},
{
"user": "u3",
"groups": [
{
"id": "100001",
"name": "G1"
}
]
}
]
I want to find all users belonging to specific group (searching by group name or group id in the groups array)
$ jq -r '.[]|select(.groups[].name=="G1" | .user)' json
jq: error (at json:27): Cannot iterate over null (null)
The desired output when searching for, e.g., group G1 would be:
u2
u3
Additional question:
Is it possible to produce comma-separated output u2,u3 without using external utilities like tr?

It is better to pass your search data in as parameters using --arg, and to use any to avoid duplicate outputs if both inputs match:
jq -r --arg id "" --arg name "G1" '
.[] | select(.groups | map(.id == $id or .name == $name) | any)? | .user
'
u2
u3

Using ? (the error suppression operator, shorthand for try), you could write the select as below
map(select(.groups[].name == "G1")? | .user)
and unwrap the results from the array by adding [] at the end of the filter. To combine multiple selection conditions, use the boolean operators and/or inside the select expression.
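For comparison, the same selection logic can be sketched in Python. The function name users_in_group and its parameters are hypothetical; the point is that a user without a groups key is simply skipped instead of raising an error, and that any() reports each user at most once.

```python
import json


def users_in_group(records, name=None, group_id=None):
    """Return the users that have at least one group matching name or group_id."""
    matches = []
    for record in records:
        groups = record.get("groups") or []  # no "groups" key: nothing to match
        if any(
            (name is not None and group.get("name") == name)
            or (group_id is not None and group.get("id") == group_id)
            for group in groups
        ):
            matches.append(record["user"])
    return matches


records = json.loads("""
[
  {"user": "u1"},
  {"user": "u2", "groups": [{"id": "100001", "name": "G1"}, {"id": "100002", "name": "G2"}]},
  {"user": "u3", "groups": [{"id": "100001", "name": "G1"}]}
]
""")
print(",".join(users_in_group(records, name="G1")))
```

The last line also covers the additional question about comma-separated output. In jq itself, collecting the user names into an array and piping it to join(",") should give the same single line without an external tool.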

jq - find duplicates in a value which is nested array of strings

Assuming the input below, how can I detect the presence of duplicates in the replicas list ("replicas":[5,5,6])?
{"version":1,
"partitions":
[{"topic":"mytopic1","partition":3,"replicas":[4,5],"log_dirs":["any","any"]},
{"topic":"mytopic1","partition":1,"replicas":[5,5,6],"log_dirs":["any","any"]},
{"topic":"mytopic2","partition":2,"replicas":[6,5],"log_dirs":["any","any"]}]
}

This one will give you an array of just the partitions with duplicates in the replicas field:
jq '[.partitions[] | select((.replicas | length) != (.replicas | unique | length))]' input.json
Pretty-printed example output:
[
{
"topic": "mytopic1",
"partition": 1,
"replicas": [
5,
5,
6
],
"log_dirs": [
"any",
"any"
]
}
]
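The idea behind the filter (a list contains duplicates exactly when removing duplicates makes it shorter) can be sketched in Python as follows. The function name partitions_with_duplicate_replicas is hypothetical.

```python
def partitions_with_duplicate_replicas(assignment):
    """Return the partitions whose replicas list contains a repeated broker id."""
    return [
        partition
        for partition in assignment["partitions"]
        # A set drops repeats, so a length mismatch means at least one duplicate.
        if len(partition["replicas"]) != len(set(partition["replicas"]))
    ]


assignment = {
    "version": 1,
    "partitions": [
        {"topic": "mytopic1", "partition": 3, "replicas": [4, 5], "log_dirs": ["any", "any"]},
        {"topic": "mytopic1", "partition": 1, "replicas": [5, 5, 6], "log_dirs": ["any", "any"]},
        {"topic": "mytopic2", "partition": 2, "replicas": [6, 5], "log_dirs": ["any", "any"]},
    ],
}
for partition in partitions_with_duplicate_replicas(assignment):
    print(partition["topic"], partition["partition"], partition["replicas"])
```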

Set multiple threshold on a log based kusto query

I have set up a log-based alert in Microsoft Azure. The alerts are deployed via an ARM template,
where you can input your query and set a threshold like below.
"triggerThresholdOperator": {
"value": "GreaterThan"
},
"triggerThreshold": {
"value": 0
},
"frequencyInMinutes": {
"value":15
},
"timeWindowInMinutes": {
"value": 15
},
"severityLevel": {
"value": "0"
},
"appInsightsQuery": {
"value": "exceptions\r\n| where A_ != '2000' \r\n| where A_ != '4000' \r\n| where A_ != '3000' "
}
As far as I understand, we can only set a threshold once, on the entire query.
Question: my query has multiple statements that exclude values because they are just noise. Now I want to set a threshold of 5 on the value 3000 and also a time window of 30 minutes in the same query, that is, only exclude 3000 when it occurs 5 times in the last 30 minutes (at the time the query runs).
exceptions
| where A_ != '2000'
| where A_ != '4000'
| where A_ != '3000'
I am pretty sure that I can't set a threshold like this in the query, and that the only workaround is to create a new alert just for the value 3000 and set its threshold in the ARM template. I haven't found any advanced threshold/time filters in Azure. Is there any way to set multiple thresholds and time filters in a single query, which is then checked against different thresholds and time filters in the ARM template?
Thanks.

I don't fully understand your question.
But for your time window question you could do something like
exceptions
| summarize count() by A_, bin(TimeGenerated, 30m)
That way you will get a count of A_ in blocks of 30 minutes.
Another way would be to do:
let Materialized = materialize(
exceptions
| summarize Count=count() by bin(TimeGenerated, 30m)
); 
Materialized | where Count == 10
But then again, it all depends on what you would like to achieve.

You can easily set that in the query and fire based on the aggregate result.
exceptions
| where timestamp > ago(30m)
| summarize count2000 = countif(A_ == '2000'), count3000 = countif(A_ == '3000'), count4000 = countif(A_ == '4000')
| where count2000 > 5 or count3000 > 3 or count4000 > 4
If the number of results is greater than zero, then the aggregate condition applies.
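The logic of that query (count each value inside the time window, then compare each count with its own threshold) can be sketched in Python. The function name breached_codes and the sample events are hypothetical; the thresholds are the ones from the query above.

```python
from collections import Counter
from datetime import datetime, timedelta

THRESHOLDS = {"2000": 5, "3000": 3, "4000": 4}


def breached_codes(events, now, window=timedelta(minutes=30), thresholds=THRESHOLDS):
    """Return the codes whose count inside the window exceeds their threshold."""
    cutoff = now - window
    # Equivalent of: where timestamp > ago(30m) | summarize countif(...) per code
    counts = Counter(code for timestamp, code in events if timestamp > cutoff)
    return {code: counts[code] for code, limit in thresholds.items() if counts[code] > limit}


now = datetime(2024, 1, 1, 12, 0)
events = [(now - timedelta(minutes=m), "3000") for m in (1, 5, 10, 20)]
events.append((now - timedelta(minutes=45), "3000"))  # outside the window
events.append((now - timedelta(minutes=2), "2000"))   # below its threshold
print(breached_codes(events, now))
```

An empty result corresponds to the query returning no rows, so an alert configured to fire when the result count is greater than 0 stays silent.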

How to get the index path of found values using jq?

Say I have a JSON like this:
{
"json": [
"a",
[
"b",
"c",
[
"d",
"foo",
1
],
[
[
42,
"foo"
]
]
]
]
}
And I want an array of jq index paths that contain foo:
[
".json[1][2][1]",
".json[1][3][0][1]"
]
Can I achieve this using jq and how?
I tried recurse | .foo to get the matches first but I receive an error: Cannot index array with string "foo".

First of all, I'm not sure what is the purpose of obtaining an array of jq programs. While means of doing this exist, they are seldom necessary; jq does not provide any sort of eval command.
jq has the concept of a path, which is an array of strings and numbers representing the position of an element in a JSON; this is equivalent to the strings on your expected output. As an example, ".json[1][2][1]" would be represented as ["json", 1, 2, 1]. The standard library contains several functions that operate with this concept, such as getpath, setpath, paths and leaf_paths.
We can thus obtain all paths in the given JSON, iterate through them, select those whose value in the input JSON is "foo", and generate an array out of them:
jq '[paths as $path | select(getpath($path) == "foo") | $path]'
This will return, for your given input, the following output:
[
["json", 1, 2, 1],
["json", 1, 3, 0, 1]
]
Now, although it should not be necessary, and it is most likely a sign that you're approaching whatever problem you are facing in the wrong way, it is possible to convert these arrays to the jq path strings you seek by transforming each path through the following script:
".\(map("[\(tojson)]") | join(""))"
The full script would therefore be:
jq '[paths as $path | select(getpath($path) == "foo") | $path | ".\(map("[\(tojson)]") | join(""))"]'
And its output would be:
[
".[\"json\"][1][2][1]",
".[\"json\"][1][3][0][1]"
]

Santiago's excellent program can be further tweaked to produce output in the requested format:
def jqpath:
def t: test("^[A-Za-z_][A-Za-z0-9_]*$");
reduce .[] as $x
("";
if ($x|type) == "string"
then . + ($x | if t then ".\(.)" else ".[" + tojson + "]" end)
else . + "[\($x)]"
end);
[paths as $path | select( getpath($path) == "foo" ) | $path | jqpath]
jq -f wrangle.jq input.json
[
".json[1][2][1]",
".json[1][3][0][1]"
]
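The same two steps (collect the paths of all matching values, then render each path as a jq-style string) can be sketched in Python. The names find_paths and to_jq_path are hypothetical, and the identifier test mirrors the regular expression used in jqpath above.

```python
import json
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_paths(node, target, prefix=()):
    """Yield the path (tuple of keys and indexes) of every value equal to target."""
    if node == target:
        yield prefix
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_paths(value, target, prefix + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_paths(value, target, prefix + (index,))


def to_jq_path(path):
    """Render a path as a jq expression, e.g. ("json", 1) becomes .json[1]."""
    parts = []
    for key in path:
        if isinstance(key, str):
            # Simple keys use dot notation; anything else is quoted in brackets.
            parts.append("." + key if IDENTIFIER.fullmatch(key) else ".[" + json.dumps(key) + "]")
        else:
            parts.append("[%d]" % key)
    return "".join(parts)


document = {"json": ["a", ["b", "c", ["d", "foo", 1], [[42, "foo"]]]]}
print([to_jq_path(path) for path in find_paths(document, "foo")])
```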
