Aggregate contents of array column in Azure Data Explorer - azure-application-insights

I have some data stored in the customEvents property bag that I want to perform aggregation and analytics on. I'm new to the analytics queries that you can run against AppInsights using the Azure Data Explorer query language and am stuck.
I have a handle on converting the contents of one of the property bag's key-value pairs into an array of numbers (in the example below, that output is represented by items).
let items = parse_json('{"operation_Id": "12345Z12", "days":[43, 21, 65]}');
print items.operation_Id, items.days;
However, when I need to calculate the average value of the items in the array for each operation_Id, I run into a documentation wall. I've looked at mvexpand, let (using a lambda expression), datatable, dynamic data types, etc. The blocking issue with mvexpand is that I want every row in the output to stay correlated with its operation_Id, and mvexpand only seems to persist that relationship in the first row. With a datatable, the type doesn't support pipeline input.
Another common error I've hit (including with the code sample below) is "Operator source expression should be table or column".
let items = parse_json('{"days":[43, 21, 65]}');
let arraySum = (T:(x: long))
{
T
| summarize sum(x)
};
items
| project days | invoke arraySum()
If necessary, I can perform the aggregation in JavaScript and pass only the calculated average in the property bag, but it feels like a waste to throw away the raw data values. Is there some obvious calculation or aggregation function that solves this problem?

Both of the following options should let you calculate the average you're interested in
(caveat: this is based on the example you've shown, which may be "dumbed down" relative to your real-life scenario, so please clarify in case this isn't helpful):
let items = dynamic({"operation_Id": "12345Z12", "days":[43, 21, 65]});
print operationId = tostring(items.operation_Id), days = items.days
| mvexpand days to typeof(int)
| summarize avg(days) by operationId
// or
let items = dynamic({"operation_Id": "12345Z12", "days":[43, 21, 65]});
print operationId = tostring(items.operation_Id), days = items.days
| project operationId, series_stats_dynamic(days)['avg']
Your second example is indeed invalid (scalars and tabular arguments are not born equal), but it could be rewritten as follows:
(same caveat as above)
let items = dynamic({"days":[43, 21, 65]});
let arraySum = (T:(x: long))
{
T
| summarize sum(x)
};
print items
| mvexpand x = items.days to typeof(long)
| invoke arraySum()
// or
let items = dynamic({"days":[43, 21, 65]});
print items
| project sum = series_stats_dynamic(items.days)["avg"] * array_length(items.days)
Updated examples, following up on the comments provided later:
datatable (Operation_id:string, customDimensions:dynamic)
[
"MTFfq", dynamic({"siteId": "1", "fileCount": "3", "pendingDays":[15,10,11]}),
"LXVjk", dynamic({"siteId": "2", "fileCount": "1", "pendingDays":[3]}),
"jnySt", dynamic({"siteId": "3", "fileCount": "2", "pendingDays":[7,11]}),
"NoxoX", dynamic({"siteId": "4", "fileCount": "4", "pendingDays":[1,4,3,11]})
]
| mvexpand days = customDimensions.pendingDays to typeof(int)
| summarize avg(days) by Operation_id
// or
datatable (Operation_id:string, customDimensions:dynamic)
[
"MTFfq", dynamic({"siteId": "1", "fileCount": "3", "pendingDays":[15,10,11]}),
"LXVjk", dynamic({"siteId": "2", "fileCount": "1", "pendingDays":[3]}),
"jnySt", dynamic({"siteId": "3", "fileCount": "2", "pendingDays":[7,11]}),
"NoxoX", dynamic({"siteId": "4", "fileCount": "4", "pendingDays":[1,4,3,11]})
]
| project Operation_id, series_stats_dynamic(customDimensions.pendingDays)['avg']
and:
let arraySum = (T:(x: long))
{
T
| summarize sum(x)
};
datatable (Operation_id:string, customDimensions:dynamic)
[
"MTFfq", dynamic({"siteId": "1", "fileCount": "3", "pendingDays":[15,10,11]}),
"LXVjk", dynamic({"siteId": "2", "fileCount": "1", "pendingDays":[3]}),
"jnySt", dynamic({"siteId": "3", "fileCount": "2", "pendingDays":[7,11]}),
"NoxoX", dynamic({"siteId": "4", "fileCount": "4", "pendingDays":[1,4,3,11]})
]
| mvexpand x = customDimensions.pendingDays to typeof(long)
| invoke arraySum()
// or
datatable (Operation_id:string, customDimensions:dynamic)
[
"MTFfq", dynamic({"siteId": "1", "fileCount": "3", "pendingDays":[15,10,11]}),
"LXVjk", dynamic({"siteId": "2", "fileCount": "1", "pendingDays":[3]}),
"jnySt", dynamic({"siteId": "3", "fileCount": "2", "pendingDays":[7,11]}),
"NoxoX", dynamic({"siteId": "4", "fileCount": "4", "pendingDays":[1,4,3,11]})
]
| project Operation_id, sum = series_stats_dynamic(customDimensions.pendingDays)["avg"] * array_length(customDimensions.pendingDays)
Some references for operators/functions used above:
series_stats_dynamic
array_length
mvexpand

How to sum values from object json jq

How can I sum values from a JSON object with jq?
Example input (a stream of JSON objects):
{
  "orderNumber": 2346999,
  "workStep": 110,
  "good": 8,
  "bad": 0,
  "type": "1",
  "date": "2022-11-08T07:17:09",
  "time": 0,
  "result": 1
}
{
  "orderNumber": 2346999,
  "workStep": 110,
  "good": 8,
  "bad": 0,
  "type": "1",
  "date": "2022-11-08T07:26:57",
  "time": 0,
  "result": 1
}
jq condition
. | select(.orderNumber==2346999 and .workStep==110) | .good
result
8
8
and I'd like to have
16
A simple approach uses add to get the sum of the numbers.
Use map() with --slurp (-s) to create an array of just the .good values, then apply add:
map(select(.orderNumber==2346999 and .workStep==110).good) | add
Gives: 16
An efficient approach that avoids the need to slurp is to use a general-purpose utility that sums a stream:
def sum(s): reduce s as $x (0; . + $x);
With this, you can run your filter using jq's -n option instead of the -s option:
sum(inputs | select(.orderNumber==2346999 and .workStep==110) | .good)
Notice that the leading . in your filter is unnecessary.
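For reference, a complete invocation along those lines might look like this (a sketch; input.json is a stand-in for your actual file or stream):
jq -n '
  # sum a stream of numbers
  def sum(s): reduce s as $x (0; . + $x);
  sum(inputs | select(.orderNumber==2346999 and .workStep==110) | .good)
' input.json   # input.json: placeholder for your actual input
This prints 16 for the sample input above.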

Limiting Azure data explorer update policy input

We have a use case where we save telemetry and statistics data from machines, but the update policy that is supposed to process the raw data is giving us trouble and running out of memory:
Aggregation over string column exceeded the memory budget of 8GB during evaluation
We have two tables: the 'ingest-table', where the data is initially ingested, and the 'main-table', where it should end up.
We are in the process of migrating from another solution to ADX and have to ingest a high volume of data.
The raw data is in a matrix format, which means that one message from a machine ends up as multiple rows/records in the ADX database. We use mv-expand for the breakdown, and the query is pretty much doing just that, along with some data formatting.
So, our update policy looks like the following:
['ingest-table']
| mv-expand Counter = Data.Counters
| mv-expand with_itemindex = r Row = Rows
| mv-expand Column = Rows[r].Data
| project ...
I don't see any way to improve the processing query itself, so I'm looking for a way to limit the number of records the update policy function receives.
I've tried playing around with the ingestion batching policy (MaximumNumberOfItems = 1000) and also the sharding policy (MaxRowCount = 1000) for the 'ingest-table', but neither has any effect on the number of records the update policy pulls in at once.
My idea is to let only 1000 items at a time be processed by the update policy function, because I've tested manually and it works fine up to about 5k records but fails slightly above that.
Any suggestions on what we could do in this case, and how can I achieve that?
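For reference, I applied the batching and sharding settings mentioned above with commands along these lines (a sketch of the standard .alter policy control commands, with the values from above):
// ingestion batching policy on the source table
.alter table ['ingest-table'] policy ingestionbatching '{"MaximumNumberOfItems": 1000}'
// sharding policy on the source table
.alter table ['ingest-table'] policy sharding '{"MaxRowCount": 1000}'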
EDIT:
Below is an example raw message that has to be processed by the update policy.
The number of rows the policy has to generate is COUNTERS * ROWS * COLUMNS. In this case, that means we end up with ~1200 rows after this single message is processed.
I do not see any way other than doing an mv-expand here.
{
  "Name": "StatisicName",
  "TimeInterval": {
    "StartUtc": 1654221156.285,
    "EndUtc": 1654221216.286
  },
  "Legend": {
    "RowLabels": [
      "0", "0.04", "0.08", "0.12", "0.16", "0.2", "0.24", "0.28", "0.32", "0.36",
      "0.4", "0.44", "0.48", "0.52", "0.56", "0.6", "0.64", "0.68", "0.72", "0.76",
      "0.8", "0.84", "0.88", "0.92", "0.96", "1", "1.04", "1.08", "1.12", "1.16",
      "1.2", "1.24", "1.28", "1.32", "1.36", "1.4", "1.44", "1.48", "1.52", "1.56",
      "1.6", "1.64", "1.68", "1.72", "1.76", "1.8", "1.84", "1.88", "1.92", "1.96"
    ],
    "ColumnLabels": [
      "Material1", "Material2", "Material3", "Material4", "Material5", "Material6",
      "Material7", "Material8", "Material9", "Material10", "Material11", "Material12"
    ]
  },
  "Counters": [
    {
      "Type": "Cumulative",
      "Matrix": {
        "Rows": [
          {
            "Data": [
              6.69771873292923, 0, 0, 0, 0.01994649920463562, 0.017650499296188355,
              0.007246749711036683, 0.003443999862670899, 0.1422802443265915, 0, 0,
              0.0008609999656677247
            ]
          }
          //,{...} ... for each row of the matrix
        ]
      }
    },
    {
      "Type": "Count",
      "Matrix": {
        "Rows": [
          {
            "Data": [
              0.0001434999942779541, 0, 0, 0, 0.0001434999942779541, 0.0001434999942779541,
              0.0001317590856552124, 0.0001434999942779541, 0.00014285165093031273, 0, 0,
              0.0001434999942779541
            ]
          }
          //,{...} ... for each row of the matrix
        ]
      }
    }
  ]
}
The main issue I see in your code is this:
| mv-expand with_itemindex = r Row = Rows
| mv-expand Column = Rows[r].Data
You explode Rows and get the exploded values in a new column called Row, but then instead of working with Row.Data, you keep using the original unexploded Rows, traversing through the elements using r.
This leads to unnecessary duplication of Rows and it is probably what creates the memory pressure.
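A minimal fix, keeping the shape of your original query, is to consume the exploded Row directly (a sketch; it assumes Rows comes from the expanded Counter, as in the full example below):
['ingest-table']
| mv-expand Counter = Data.Counters
| mv-expand Row = Counter.Matrix.Rows // work with the exploded Row; no with_itemindex needed
| mv-expand Column = Row.Data         // each value is read once, without duplicating Rows
| project ...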
Check out the following code.
You can use the whole snippet and get the data formatted as a table with columns Material1, Material2, etc., or exclude the last two lines and simply get the exploded values, each in a separate row.
// Data sample generation. Not part of the solution
let p_matrixes = 3;
let p_columns = 12;
let p_rows = 50;
let ['ingest-table'] =
range i from 1 to p_matrixes step 1
| extend StartUtc = floor((ago(28d + rand()*7d) - datetime(1970))/1ms/1000,0.001)
| extend EndUtc = floor((ago(rand()*7d) - datetime(1970))/1ms/1000,0.001)
| extend RowLabels = toscalar(range x from todecimal(0) to todecimal(0.04 * (p_rows - 1)) step todecimal(0.04) | summarize make_list(tostring(x)))
| extend ColumnLabels = toscalar(range x from 1 to p_columns step 1 | summarize make_list(strcat("Material",tostring(x))))
| extend Counters_Cumulative = toscalar(range x from 1 to p_rows step 1 | mv-apply range(1, p_columns) on (summarize Data = pack_dictionary("Data", make_list(rand()))) | summarize make_list(Data))
| extend Counters_Count = toscalar(range x from 1 to p_rows step 1 | mv-apply range(1, p_columns) on (summarize Data = pack_dictionary("Data", make_list(rand()))) | summarize make_list(Data))
| project i, Data = pack_dictionary("Name", "StatisicName", "TimeInterval", pack_dictionary("StartUtc", StartUtc, "EndUtc",EndUtc), "Legend", pack_dictionary("RowLabels", RowLabels, "ColumnLabels", ColumnLabels), "Counters", pack_array(pack_dictionary("Type", "Cumulative", "Matrix", pack_dictionary("Rows", Counters_Cumulative)), pack_dictionary("Type", "Count", "Matrix", pack_dictionary("Rows", Counters_Count))))
;
// Solution starts here
// Explode values
['ingest-table']
| project Name = tostring(Data.Name), StartUtc = todecimal(Data.TimeInterval.StartUtc), EndUtc = todecimal(Data.TimeInterval.EndUtc), RowLabels = Data.Legend.RowLabels, ColumnLabels = Data.Legend.ColumnLabels, Counters = Data.Counters
| mv-apply Counters on (project Type = tostring(Counters.Type), Rows = Counters.Matrix.Rows)
| mv-apply RowLabels to typeof(decimal), Rows on (project RowLabels, Data = Rows.Data)
| mv-expand ColumnLabels to typeof(string), Data to typeof(real)
// Format as table
| evaluate pivot(ColumnLabels, take_any(Data))
| project-reorder Name, StartUtc, EndUtc, RowLabels, Type, * granny-asc
"Explode values" sample
Name
StartUtc
EndUtc
ColumnLabels
RowLabels
Type
Data
StatisicName
1658601891.654
1660953273.898
Material4
0.88
Count
0.33479977032253788
StatisicName
1658601891.654
1660953273.898
Material7
0.6
Cumulative
0.58620965468565811
StatisicName
1658801257.201
1660941025.56
Material1
0.72
Count
0.23164306814350025
StatisicName
1658601891.654
1660953273.898
Material4
1.68
Cumulative
0.47149864409592157
StatisicName
1658601891.654
1660953273.898
Material12
1.08
Cumulative
0.777589612330022
"Format as table" Sample
Name
StartUtc
EndUtc
RowLabels
Type
Material1
Material2
Material3
Material4
Material5
Material6
Material7
Material8
Material9
Material10
Material11
Material12
StatisicName
1658581605.446
1660891617.665
0.52
Cumulative
0.80568785763966921
0.69112398516227513
0.45844947991605256
0.87975011678339887
0.19607303271777138
0.76728212781319993
0.27520162657976527
0.48612400400362971
0.23810927904958085
0.53986865017468966
0.31225384042818344
0.99380179164514848
StatisicName
1658581605.446
1660891617.665
0.72
Count
0.77601864161716061
0.351768361021601
0.59345888695494731
0.92329751241805491
0.80811999338933449
0.49117503870065837
0.97871902062153937
0.94241064167069055
0.52950523227349289
0.39281849330041424
0.080759530370922858
0.8995622227351241
StatisicName
1658345203.482
1660893443.968
1.92
Count
0.78327575542772387
0.16795871437570925
0.01201541525964204
0.96029371013283549
0.60248327254185241
0.019315208353334352
0.4828009899119266
0.75923221663483853
0.29630236707606555
0.23977292819044668
0.94531978804572625
0.54626985282267437
StatisicName
1658345203.482
1660893443.968
1
Count
0.65268575186841382
0.61471913013853441
0.80536656853846211
0.380104887115314
0.84979344481966745
0.68790819414895632
0.80862491082567767
0.083687871352600765
0.16707928827946666
0.4071460045501768
0.94115460659910444
0.25011225557898314
StatisicName
1658581605.446
1660891617.665
1.6
Count
0.75532393959433786
0.71081551001527776
0.9757484452705758
0.55510969429009
0.055800808878012885
0.74924458240427783
0.78706505608871058
0.18745675452118818
0.70192553697345517
0.39429935579653647
0.4048784200404818
0.14888395753558561
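For completeness, wiring the "explode values" query into the update policy would look roughly like this (a sketch; ExpandIngest is a hypothetical function name, and it assumes 'main-table' has the exploded schema shown above):
// wrap the processing query in a stored function
.create-or-alter function ExpandIngest() {
['ingest-table']
| project Name = tostring(Data.Name), StartUtc = todecimal(Data.TimeInterval.StartUtc), EndUtc = todecimal(Data.TimeInterval.EndUtc), RowLabels = Data.Legend.RowLabels, ColumnLabels = Data.Legend.ColumnLabels, Counters = Data.Counters
| mv-apply Counters on (project Type = tostring(Counters.Type), Rows = Counters.Matrix.Rows)
| mv-apply RowLabels to typeof(decimal), Rows on (project RowLabels, Data = Rows.Data)
| mv-expand ColumnLabels to typeof(string), Data to typeof(real)
}
// point the target table's update policy at that function
.alter table ['main-table'] policy update '[{"IsEnabled": true, "Source": "ingest-table", "Query": "ExpandIngest()", "IsTransactional": false}]'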

Unable to retrieve key:values, getting error --> jq: error (at <stdin>:0): Cannot index number with string

Unable to extract key:value pairs, tried indexing the block.
<nd.com.citrix.netscaler.json" -X GET https://abcd.com/nitro/v1/config/lbpersistentsessions?args=vserver:puppet-vip.ta10.sd | jq '[.[] ] | .[3] | [.srcip]'
Getting the following error:
jq: error (at <stdin>:0): Cannot index array with string "srcip"
I need to extract the key:values as srcip and destip (see below)
<ion/vnd.com.citrix.netscaler.json" -X GET https://abcd.com/nitro/v1/config/lbpersistentsessions?args=vserver:somevip | jq '[.[] ] | .[3]' | more
[
  {
    "vserver": "somevip",
    "type": "1",
    "typestring": "SOURCEIP",
    "srcip": "1.1.1.1",
    "srcipv6": "::/0",
    "destip": "2.2.2.2",
    "destipv6": "::/0",
    "flags": false,
    "destport": 0,
    "vservername": "somevip",
    "timeout": "0",
    "referencecount": "0",
    "persistenceparam": "1.1.1.1"
  },
I had to use .[3] to index, as the original output is:
<-Type:application/vnd.com.citrix.netscaler.json" -X GET https://abcd.com/nitro/v1/config/lbpersistentsessions?args=vserver:somevip | jq '[.[] ]' | more
[
  0,
  "Done",
  "NONE",
  [
    {
      "vserver": "somevip",
      "type": "1",
      "typestring": "SOURCEIP",
      "srcip": "1.1.1.1",
      "srcipv6": "::/0",
      "destip": "2.2.2.2",
      "destipv6": "::/0",
      "flags": false,
      "destport": 0,
      "vservername": "somevip",
      "timeout": "0",
      "referencecount": "0",
      "persistenceparam": "1.1.1.1"
    },
    {
      "vserver": "somevip",
      "type": "1",
      "typestring": "SOURCEIP",
      "srcip": "3.3.3.3",
      "srcipv6": "::/0",
      "destip": "4.4.4.4",
      "destipv6": "::/0",
      "flags": false,
      "destport": 0,
      "vservername": "somevip",
      "timeout": "0",
      "referencecount": "0",
      "persistenceparam": "1.1.1.1"
    },
I also tried this way and got the error:
<GET https://abcd.com/nitro/v1/config/lbpersistentsessions?args=vserver:somevip | jq -r '.[] | select(.vserver == "somevip") | .srcip'
jq: error (at <stdin>:0): Cannot index number with string "vserver"
After fixing some minor problems with the full JSON shown in the Q, the invocation:
jq '.[3][].srcip' input.json
yields:
"1.1.1.1"
"3.3.3.3"
Notes
.[3][].srcip is just a shortened form of: .[3] | .[] | .srcip
In your initial query, [.[]] effectively does nothing because the input is an array.
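Since you also need destip, the same idea extends naturally (a sketch; {srcip, destip} is jq shorthand for {srcip: .srcip, destip: .destip}):
jq '.[3][] | {srcip, destip}' input.json   # input.json: placeholder for your actual input
which emits one small object per session, e.g. {"srcip":"1.1.1.1","destip":"2.2.2.2"}.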

Is there an R library or function for formatting international currency strings?

Here's a snippet of the JSON data I'm working with:
{
  "item" = "Mexican Thing",
  ...
  "raised": "19",
  "currency": "MXN"
},
{
  "item" = "Canadian Thing",
  ...
  "raised": "42",
  "currency": "CDN"
},
{
  "item" = "American Thing",
  ...
  "raised": "1",
  "currency": "USD"
}
You get the idea.
I'm hoping there's a function out there that can take a standard currency abbreviation and a number and spit out the appropriate string. I could theoretically write this myself, except I can't pretend to know all the ins and outs of this stuff, and I'm bound to spend days or weeks being surprised by bugs or edge cases I didn't think of. I'm hoping there's a library (or at least a web API) already written that can handle this, but my Googling has yielded nothing useful so far.
Here's an example of the result I want (let's pretend "currency" is the function I'm looking for):
currency("USD", "32") --> "$32"
currency("GBP", "45") --> "£45"
currency("EUR", "19") --> "€19"
currency("MXN", "40") --> "MX$40"
Assuming your real JSON is valid, this should be relatively simple. I'll provide a valid JSON string, fixing the three invalid portions here: = should be :, ... is obviously a placeholder, and it should be a list wrapped in [ and ]:
js <- '[
  {
    "item": "Mexican Thing",
    "raised": "19",
    "currency": "MXN"
  },
  {
    "item": "Canadian Thing",
    "raised": "42",
    "currency": "CDN"
  },
  {
    "item": "American Thing",
    "raised": "1",
    "currency": "USD"
  }
]'
with(jsonlite::parse_json(js, simplifyVector = TRUE),
paste(raised, currency))
# [1] "19 MXN" "42 CDN" "1 USD"
Edit: in order to change to specific currency characters, don't make this too difficult: just instantiate lookup vectors where "USD" (for example) prepends "$" and appends "" (nothing) to the raised string. (I say both prepend/append because I believe some currencies always go after the digits ... I could be wrong.)
pre_currency <- Vectorize(function(curr) switch(curr, USD="$", GBP="£", EUR="€", CDN="$", "?"))
post_currency <- Vectorize(function(curr) switch(curr, USD="", GBP="", EUR="", CDN="", "?"))
with(jsonlite::parse_json(js, simplifyVector = TRUE),
paste0(pre_currency(currency), raised, post_currency(currency)))
# [1] "?19?" "$42" "$1"
I intentionally left "MXN" out of the lookup here to demonstrate that you need a default value, "?" (both pre and post) in this case. You may choose a different default for unknown currencies.
An alternative:
currency <- function(val, currency) {
pre <- sapply(currency, switch, USD="$", GBP="£", EUR="€", CDN="$", "?")
post <- sapply(currency, switch, USD="", GBP="", EUR="", CDN="", "?")
paste0(pre, val, post)
}
with(jsonlite::parse_json(js, simplifyVector = TRUE),
currency(raised, currency))
# [1] "?19?" "$42" "$1"

jq query with condition and format output/labels

I have a JSON file:
[
  {
    "platform": "p1",
    "id": "5",
    "pri": "0",
    "sec": "20"
  }
]
[
  {
    "platform": "p2",
    "id": "6",
    "pri": "10",
    "sec": "0"
  }
]
I can format it to this form:
$ jq -c '.[]|{PLATFORM: .platform, ID: .id, PRI: .pri, SEC: .sec}' test.json
{"PLATFORM":"p1","ID":"5","PRI":"0","SEC":"20"}
{"PLATFORM":"p2","ID":"6","PRI":"10","SEC":"0"}
$
but how do I ignore SEC/PRI entries with "0" and get output in the form:
PLATFORM:p1, ID:5, SEC:20
PLATFORM:p2, ID:6, PRI:10
I can process it with a bash/awk command, but maybe someone has a solution with jq directly.
thank you,
You can use conditional statements to remove the unwanted keys, e.g.:
if (.sec == "0") then del(.sec) else . end
The formatting could be done with @tsv by converting the data to an array, e.g.:
filter.jq
.[] |
if (.sec == "0") then del(.sec) else . end |
if (.pri == "0") then del(.pri) else . end |
to_entries |
map("\(.key | ascii_upcase):\(.value)") |
@tsv
Run it like this:
jq -crf filter.jq test.json
Output:
PLATFORM:p1 ID:5 SEC:20
PLATFORM:p2 ID:6 PRI:10
jq solution:
jq -c 'def del_empty($k): if (.[$k]|tonumber > 0) then . else del(.[$k]) end;
.[] | {PLATFORM: .platform, ID: .id, PRI: .pri, SEC: .sec}
| del_empty("PRI")
| del_empty("SEC")' test.json
The output:
{"PLATFORM":"p1","ID":"5","SEC":"20"}
{"PLATFORM":"p2","ID":"6","PRI":"10"}
