How to summarize a dynamic object column? - azure-application-insights

Say I have an exceptions table which I know contains some data like the below, where details is a dynamic object
operation_id | details
------------ | ----------------------------
1 | {"cause": "sometext"}
1 | {"other_info": 240}
1 | {"message": "blabal" }
2 | {"cause": "some other text"}
2 | {"other_info": 88}
2 | {"message": "blabal2" }
How can I query these results to be grouped by operation_id, but somehow aggregate everything in the details column, perhaps something like:
operation_id | details_1 | details_2 | details_3
------------ | ---------------------------- | ------------------- | -----------------------
1 | {"cause": "sometext"} | {"other_info": 240} | {"message": "blabal" }
2 | {"cause": "some other text"} | {"other_info": 88} | {"message": "blabal2" }
or even just join all details into a single column.
I tried doing it with summarize, but it just shows each entry on a separate line (since each details is unique):
exceptions
| where timestamp > now() - 10m
| summarize by operation_Id, dynamic_to_json(['details'])
Does anyone have any advice about this?

You can use the make_bag() aggregation function.
For example:
datatable(operation_id:int, details:dynamic)
[
1, dynamic({"cause": "sometext"}),
1, dynamic({"other_info": 240}),
1, dynamic({"message": "blabal" }),
2, dynamic({"cause": "some other text"}),
2, dynamic({"other_info": 88}),
2, dynamic({"message": "blabal2" }),
]
| summarize details = make_bag(details) by operation_id
operation_id | details
------------ | ----------------------------------------------------------------------
1 | { "cause": "sometext", "other_info": 240, "message": "blabal"}
2 | { "cause": "some other text", "other_info": 88, "message": "blabal2"}

I also got it working like this (using make_set()):
exceptions
| project
operation_Id,
details
| summarize Details=make_set(details) by operation_Id
Although it returns details as an array of objects rather than a merged object.
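If you need the merged object but want to keep the make_set() approach, a sketch that collapses the array back into a single bag with mv-apply (though at that point, make_bag() alone is simpler):
exceptions
| summarize Details = make_set(details) by operation_Id
// expand the set and re-aggregate each element into one property bag per operation
| mv-apply Details on (
    summarize Merged = make_bag(Details)
)
| project operation_Id, Merged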

Related

Kusto complex json with array

This is my source format:
{
  "message": [
    {"name": "sensorID", "value": "5"},
    {"name": "eventT", "value": "2021-04-16T19:11:26.149Z"},
    {"name": "pressure", "value": "150"}
  ]
}
Looking to flatten it out into a table:
sensorID | eventT | pressure
-------- | ------------------------ | --------
5 | 2021-04-16T19:11:26.149Z | 150
Cannot for the life of me figure it out.
Splitting the array just gets me a more nested array:
test
| project ray=array_split(message, 1)
And using mv-expand gets me two separate rows:
test
| mv-expand message
At my wits' end. Any help greatly appreciated.
If the schema is unknown in advance, you could try something like this (using mv-apply, summarize make_bag(), and bag_unpack()):
datatable(d:dynamic)
[
dynamic({
"message":[
{"name":"sensorID","value":"5"},
{"name":"eventT","value":"2021-04-16T19:11:26.149Z"},
{"name":"pressure","value":"150"}
]}),
dynamic({
"message":[
{"name":"sensorID","value":"55"},
{"name":"eventT","value":"2021-03-16T19:11:26.149Z"},
{"name":"pressure","value":"1515"}
]})
]
| mv-apply d.message on (
summarize b = make_bag(pack(tostring(d_message.name), d_message.value))
)
| project b
| evaluate bag_unpack(b)
eventT | pressure | sensorID
--------------------------- | -------- | --------
2021-03-16 19:11:26.1490000 | 1515 | 55
2021-04-16 19:11:26.1490000 | 150 | 5
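If the schema is known in advance, you can skip bag_unpack() and project the properties directly; a sketch using the test table and message column from the question (assuming the three property names are fixed):
test
| mv-apply m = message on (
    // build one bag per row, keyed by each element's "name"
    summarize b = make_bag(pack(tostring(m.name), m.value))
)
| project sensorID = tostring(b.sensorID), eventT = todatetime(b.eventT), pressure = toint(b.pressure)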

Set multiple thresholds on a log-based Kusto query

I have set up a log-based alert in Microsoft Azure. The deployment of the alerts is done via an ARM template,
where you can input your query and set the threshold like below.
"triggerThresholdOperator": {
"value": "GreaterThan"
},
"triggerThreshold": {
"value": 0
},
"frequencyInMinutes": {
"value":15
},
"timeWindowInMinutes": {
"value": 15
},
"severityLevel": {
"value": "0"
},
"appInsightsQuery": {
"value": "exceptions\r\n| where A_ != '2000' \r\n| where A_ != '4000' \r\n| where A_ != '3000' "
}
As far as I understand, we can only set a threshold once on an entire query.
Question: I have multiple statements in my query that exclude values which are just noise. But now I want to set a threshold of 5 on the value 3000 and also set a time window of 30 minutes in the same query, meaning only exclude 3000 when it occurs 5 times in the last 30 minutes (when the query runs).
exceptions
| where A_ != '2000'
| where A_ != '4000'
| where A_ != '3000'
I am pretty sure that I can't set a threshold like this in the query, and the only workaround is to create a new alert just for the value 3000 and set its threshold in the ARM template. I haven't found any such threshold/time filters in Azure. Is there any way I can set multiple thresholds and time filters in a single query, which is then checked against different thresholds and time filters in the ARM template?
Thanks.
I don't fully understand your question.
But for your time-window question you could do something like:
exceptions
| summarize count() by A_, bin(TimeGenerated, 30m)
That way you will get a count of A_ in blocks of 30 minutes.
Another way would be to do:
let Materialized = materialize(
exceptions
| summarize Count=count(A_) by bin(TimeGenerated, 30m)
); 
Materialized | where Count == 10
But then again, it all depends on what you would like to achieve.
You can easily set that in the query and fire based on the aggregate result.
exceptions
| where timestamp > ago(30m)
| summarize count2000 = countif(A_ == '2000'), count3000 = countif(A_ == '3000'), count4000 = countif(A_ == '4000')
| where count2000 > 5 or count3000 > 3 or count4000 > 4
If the number of results is greater than zero, then the aggregate condition applies.
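If the goal is specifically the asker's rule (exclude 3000 only when it is noisy), a sketch that folds the 5-occurrences-in-30-minutes condition into a single query (the column name A_ and the thresholds are taken from the question):
let count3000 = toscalar(
    exceptions
    | where timestamp > ago(30m)
    | summarize countif(A_ == '3000'));
exceptions
| where A_ != '2000' and A_ != '4000'
// keep 3000 rows unless they occurred 5 or more times in the last 30 minutes
| where A_ != '3000' or count3000 < 5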

Using both 'distinct' and 'project'

In Azure Data Explorer, I am trying to use both the 'project' and 'distinct' keywords.
The table records have 4 fields I want to use 'project' on:
CowName
CowType
CowNum
CowLabel
But there are many other fields in the table such as Date, Measurement, etc, that I do not want to return.
Cows
| project CowName, CowType, CowNum, CowLabel
However, I want to avoid duplicate records of CowName and CowNum, so I included
Cows
| project CowName, CowType, CowNum, CowLabel
| distinct CowName, CowNum
But when I do this, the only columns that are returned are CowName and CowNum. I am now missing CowType and CowLabel entirely.
Is there a way to use both 'project' and 'distinct' without them interfering with each other?
Is there a different approach I should take?
You can do:
Cows
| distinct CowName, CowType, CowNum
or, if you don't want to have distinct values of CowType - and just have any value of it:
Cows
| summarize any(CowType) by CowName, CowNum
References:
Summarize operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/summarizeoperator
Distinct operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/distinctoperator
any() aggregation function: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/any-aggfunction
You can use this:
| summarize any(CowType, CowLabel) by CowName, CowNum
To visualize how this will work take the following sample table/query:
let CowTable = datatable(CowNum:int, CowName:string, CowType:string, CowLabel:string, DontWantThis:int)
[
1, "Bob", "Bull", "label1", 99,
2, "Tipsy", "Heifer", "label1", 98,
3, "Milly", "Heifer", "label2", 99,
4, "Bob", "Bull", "label2", 87,
4, "Bob", "Bull", "label2", 77,
2, "Hanna", "Heifer", "label1", 98,
];
CowTable
| summarize any(CowType, CowLabel) by CowName, CowNum
Results:
CowName | CowNum | any_CowType | any_CowLabel
------- | ------ | ----------- | ------------
Bob | 1 | Bull | label1
Tipsy | 2 | Heifer | label1
Milly | 3 | Heifer | label2
Bob | 4 | Bull | label2
Hanna | 2 | Heifer | label1
Note that we do not see CowNum 4 listed twice, however we do see CowNum 2 listed twice; this is because those rows are unique in regard to the CowName & CowNum. We also see Bob listed twice (not 3 times); this is because 2 of the Bob entries are unique in regard to CowName/CowNum, but 2 of the Bob entries are not unique in regard to CowName/CowNum.
If you truly only want results where the CowName is unique and the CowNum is also distinct you can do this in a 2-step summarize:
CowTable
| summarize any(CowName, CowType, CowLabel) by CowNum
| summarize any(CowNum, any_CowType, any_CowLabel) by any_CowName
//normalize column names
| project CowNum = any_CowNum, CowName = any_CowName, CowType = any_any_CowType, CowLabel = any_any_CowLabel
Results: one row per distinct CowName, each with a single (arbitrarily chosen) CowNum, CowType, and CowLabel, so Bob now appears only once.

Transforming a list of objects into a table in Kusto

I am trying to get the JSON data (in the form of a list of key-value pairs) in one of my data table cells and convert that into a dynamic table of sorts.
T
| where id == "xyz"
| project telem_obj
The data in the telem_obj cell is of the format
[
  {
    "Value": "SomeKey01",
    "Key": "0"
  },
  {
    "Value": "SomeKey02",
    "Key": "1"
  }
]
My end objective is to get a table of the form:
| Key       | Value |
| SomeKey01 | 0     |
| SomeKey02 | 1     |
I have managed to do this by taking out the static data and creating a table out of it:
print EnumVals = dynamic(
[
{
"Value": "SomeKey01",
"Key": "0"
},
{
"Value": "SomeKey02",
"Key": "1"
}
]
)
| mvexpand EnumVals
| evaluate bag_unpack(EnumVals)
I am not sure how I can go about taking the result of my query, extracting this list of JSON objects from it, and converting it into a new dynamic table. I cannot find any example that works on a list of objects.
After a good night's sleep, I found how to do it:
T
| take 1
| mvexpand telem_obj
| evaluate bag_unpack(telem_obj)
| project Value, Key
My mistake was that I was trying to force the actual query inside dynamic(), which only accepts constant literals:
print EnumVals = dynamic(
T
| where id == "xyz"
| project telem_obj
)
| mvexpand EnumVals
| evaluate bag_unpack(EnumVals)

Alert on error rate exceeding threshold using Azure Insights and/or Analytics

I'm sending customEvents to Azure Application Insights that look like this:
timestamp | name | customDimensions
----------------------------------------------------------------------------
2017-06-22T14:10:07.391Z | StatusChange | {"Status":"3000","Id":"49315"}
2017-06-22T14:10:14.699Z | StatusChange | {"Status":"3000","Id":"49315"}
2017-06-22T14:10:15.716Z | StatusChange | {"Status":"2000","Id":"49315"}
2017-06-22T14:10:21.164Z | StatusChange | {"Status":"1000","Id":"41986"}
2017-06-22T14:10:24.994Z | StatusChange | {"Status":"3000","Id":"41986"}
2017-06-22T14:10:25.604Z | StatusChange | {"Status":"2000","Id":"41986"}
2017-06-22T14:10:29.964Z | StatusChange | {"Status":"3000","Id":"54234"}
2017-06-22T14:10:35.192Z | StatusChange | {"Status":"2000","Id":"54234"}
2017-06-22T14:10:35.809Z | StatusChange | {"Status":"3000","Id":"54234"}
2017-06-22T14:10:39.22Z | StatusChange | {"Status":"1000","Id":"74458"}
Assuming that status 3000 is an error status, I'd like to get an alert when a certain percentage of Ids end up in the error status during the past hour.
As far as I know, Insights cannot do this by default, so I would like to try the approach described here to write an Analytics query that could trigger the alert. This is the best I've been able to come up with:
customEvents
| where timestamp > ago(1h)
| extend isError = iff(toint(customDimensions.Status) == 3000, 1, 0)
| summarize failures = sum(isError), successes = sum(1 - isError) by timestamp bin = 1h
| extend ratio = todouble(failures) / todouble(failures+successes)
| extend failure_Percent = ratio * 100
| project iff(failure_Percent < 50, "PASSED", "FAILED")
However, for my alert to work properly, the query should:
Return "PASSED" even if there are no events within the hour (another alert will take care of the absence of events)
Only take into account the final status of each Id within the hour.
As the query is written, if there are no events, it returns neither "PASSED" nor "FAILED".
It also takes into account any records with Status == 3000, which means that the example above would return "FAILED" (5 out of 10 records have Status 3000), while in reality only 1 out of 4 Ids ended up in error state.
Can someone help me figure out the correct query?
(And optional secondary questions: Has anyone setup a similar alert using Insights? Is this a correct approach?)
As mentioned, since you're only querying on a single hour you don't need to bin the timestamp, or use it as part of your aggregation at all.
To answer your questions:
The way to overcome no data at all would be to inject a synthetic row into your table which will translate to a success result if no other result is found
If you want your pass/fail criteria to be based on the final status for each ID, then you need to use argmax() in your summarize - it will return the status corresponding to the maximal timestamp.
So to wrap it all up:
customEvents
| where timestamp > ago(1h)
| extend isError = iff(toint(customDimensions.Status) == 3000, 1, 0)
| summarize argmax(timestamp, isError) by tostring(customDimensions.Id)
| summarize failures = sum(max_timestamp_isError), successes = sum(1 - max_timestamp_isError)
| extend ratio = todouble(failures) / todouble(failures+successes)
| extend failure_Percent = ratio * 100
| project Result = iff(failure_Percent < 50, "PASSED", "FAILED"), IsSynthetic = 0
| union (datatable(Result:string, IsSynthetic:long) ["PASSED", 1])
| top 1 by IsSynthetic asc
| project Result
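To see why the synthetic row handles the empty case, a minimal sketch with zero real rows (the union supplies the only candidate, so top 1 returns "PASSED"):
// an empty input stands in for an hour with no events
datatable(Result:string, IsSynthetic:long) []
| union (datatable(Result:string, IsSynthetic:long) ["PASSED", 1])
| top 1 by IsSynthetic asc
| project Result
When real rows exist, their IsSynthetic = 0 sorts ahead of the synthetic row, so the computed result wins.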
Regarding the bonus question - you can set up alerting based on Analytics queries using Flow. See here for a related question/answer.
I'm presuming that the query returns no rows if you have no data in the hour, because the timestamp bin = 1h (aka bin(timestamp,1h)) doesn't return any bins?
But if you're only querying the last hour, I don't think you need the bin on timestamp at all?
Without having your data it's hard to repro exactly, but you could try something like (beware syntax errors):
customEvents
| where timestamp > ago(1h)
| extend isError = iff(toint(customDimensions.Status) == 3000, 1, 0)
| summarize totalCount = count(), failures = countif(isError == 1), successes = countif(isError ==0)
| extend ratio = iff(totalCount == 0, 0, todouble(failures) / todouble(failures+successes))
| extend failure_Percent = ratio * 100
| project iff(failure_Percent < 50, "PASSED", "FAILED")
Hypothetically, getting rid of the hour binning should just give you back a single row of
totalCount = 0, failures = 0, successes = 0, so the math for failure percent should give you back a 0 failure ratio, which should get you "PASSED".
Without being able to try it, I'm not sure if that works or still returns no row if there's no data.
For your second question, you could use something like:
let maxTimestamp = toscalar(customEvents
| where timestamp > ago(1h)
| summarize max(timestamp));
customEvents
| where timestamp == maxTimestamp
// ... more query here
to get just the row(s) that have a timestamp of the last event in the hour?
