Kusto: Apply function on multiple column values during bag_unpack

Given a dynamic field, say milestones, with a value like {"ta": 1655859586546, "tb": 1655859586646}:
How do I produce a table with columns "ta", "tb", etc., and a single row containing unixtime_milliseconds_todatetime(tolong(taValue)), unixtime_milliseconds_todatetime(tolong(tbValue)), and so on?
I figured I'd need to write a function that I can call, so I created this:
let f = view(a:string) {
    unixtime_milliseconds_todatetime(tolong(a))
};
I can use this function with a normal column, e.g. project f(columnName).
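For instance, a minimal standalone check, with a hypothetical inline table standing in for a real column (and the function written as a plain lambda):
let f = (a:string) {
    unixtime_milliseconds_todatetime(tolong(a))
};
// hypothetical single-column table with one epoch-milliseconds value
datatable(ts:string) ["1655859586546"]
| project converted = f(ts)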
However, in this case, it's a dynamic field, and the number of items in the bag is large, so I do not want to enter the fields manually. This is what I have so far:
log_table
| take 1
| evaluate bag_unpack(milestones, "m_") // This gives me fields as columns
// | project-keep m_* // This would work if I just wanted the raw values; however, I want f(columnValue) for each column
| project-keep f(m_*) // This of course doesn't work, but explains the idea.

Based on the mv-apply operator
// Generate data sample. Not part of the solution.
let log_table = materialize(
    range record_id from 1 to 10 step 1
    | mv-apply range(1, 1 + rand(5), 1) on
    (
        summarize milestones = make_bag(pack_dictionary(
            strcat("t", make_string(to_utf8("a")[0] + toint(rand(26)))),
            1600000000000 + rand(60000000000)))
    )
);
// Solution Starts here.
log_table
| mv-apply kv = milestones on
(
    // each kv is a single-property bag, e.g. {"ta": 1655859586546}
    extend k = tostring(bag_keys(kv)[0])
    // convert the epoch-milliseconds value to a datetime
    | extend v = unixtime_milliseconds_todatetime(tolong(kv[k]))
    // re-assemble the converted key/value pairs into one bag per record
    | summarize milestones = make_bag(pack_dictionary(k, v))
)
| evaluate bag_unpack(milestones)
Output (abridged): one row per record_id, with one column per milestone key (ta, tb, ..., tz); each record's populated columns hold the converted datetime values (e.g. 2021-07-06T20:24:47.767Z).
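To see the per-key conversion in isolation, the same mv-apply pattern can be run over the exact bag from the question; a minimal sketch using print with an inline dynamic literal:
// the exact bag from the question, inlined
print milestones = dynamic({"ta": 1655859586546, "tb": 1655859586646})
| mv-apply kv = milestones on
(
    extend k = tostring(bag_keys(kv)[0])
    | extend v = unixtime_milliseconds_todatetime(tolong(kv[k]))
    | summarize milestones = make_bag(pack_dictionary(k, v))
)
| evaluate bag_unpack(milestones)
This should yield a single row with columns ta and tb holding the converted datetimes.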

Related

Parsing NSG Flowlogs in Azure Log Analytics Workspace to separate Public IP addresses

I have been updating a KQL query for reviewing NSG Flow Logs, to separate out the columns for public/external IP addresses. However, the data within each cell of the column contains additional information that needs to be parsed out so my Excel add-in can run NSLOOKUP against each cell and look for additional insights. Later I would like to use the parse operator (https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parseoperator) to separate this information and determine who each external IP address belongs to, via nslookup, resolve-dnsname, whois, or other means.
Currently I am attempting to parse out the column, but it is not comma-delimited; instead it uses a single space and multiple pipes. Below is my query. I would like to add a parse step that either produces a comma-delimited string in a single cell (for PublicIP, a combination of source and destination, as well as PublicSourceIP and PublicDestIP) or breaks the values out into multiple rows. How would parse best be used to separate this information, or is there a better operator to carry this out?
For example, the content could look like this:
"20.xx.xx.xx|1|0|0|0|0|0 78.xxx.xxx.xxx|1|0|0|0|0|0"
AzureNetworkAnalytics_CL
| where SubType_s == 'FlowLog' and (FASchemaVersion_s == '1' or FASchemaVersion_s == '2')
| extend NSG = NSGList_s, Rule = NSGRule_s, Protocol = L4Protocol_s, Hits = (AllowedInFlows_d + AllowedOutFlows_d + DeniedInFlows_d + DeniedOutFlows_d)
| project-away NSGList_s, NSGRule_s
| project TimeGenerated, NSG, Rule, SourceIP = SrcIP_s, DestinationIP = DestIP_s, DestinationPort = DestPort_d, FlowStatus = FlowStatus_s, FlowDirection = FlowDirection_s, Protocol = L4Protocol_s, PublicIP = PublicIPs_s, PublicSourceIP = SrcPublicIPs_s, PublicDestIP = DestPublicIPs_s
// ## IP Address Filtering ##
| where isnotempty(PublicIP)
| parse kind = regex PublicIP with * "|1|0|0|0|0|0" ipnfo ' ' *
| project ipnfo
// ## port filtering
| where DestinationPort == '443'
Based on extract_all() followed by strcat_array() or mv-expand
let AzureNetworkAnalytics_CL = datatable (RecordId:int, PublicIPs_s:string)
[
1 ,"51.105.236.244|2|0|0|0|0|0 51.124.32.246|12|0|0|0|0|0 51.124.57.242|1|0|0|0|0|0"
,2 ,"20.44.17.10|6|0|0|0|0|0 20.150.38.228|1|0|0|0|0|0 20.150.70.36|2|0|0|0|0|0 20.190.151.9|2|0|0|0|0|0 20.190.151.134|1|0|0|0|0|0 20.190.154.137|1|0|0|0|0|0 65.55.44.109|2|0|0|0|0|0"
,3 ,"20.150.70.36|1|0|0|0|0|0 52.183.220.149|1|0|0|0|0|0 52.239.152.234|2|0|0|0|0|0 52.239.169.68|1|0|0|0|0|0"
];
// Option 1: a single comma-delimited string per record.
// The regex captures each run of non-pipe characters that follows the start of the string or a space.
AzureNetworkAnalytics_CL
| project RecordId, PublicIPs = strcat_array(extract_all("(?:^| )([^|]+)", PublicIPs_s), ',');
// Option 2: one row per extracted IP, with its 1-based position.
AzureNetworkAnalytics_CL
| mv-expand with_itemindex=i PublicIP = extract_all("(?:^| )([^|]+)", PublicIPs_s) to typeof(string)
| project RecordId, i = i + 1, PublicIP
Option 1
RecordId | PublicIPs
---------|----------------------------------------------
1        | 51.105.236.244,51.124.32.246,51.124.57.242
2        | 20.44.17.10,20.150.38.228,20.150.70.36,20.190.151.9,20.190.151.134,20.190.154.137,65.55.44.109
3        | 20.150.70.36,52.183.220.149,52.239.152.234,52.239.169.68
Option 2
RecordId | i | PublicIP
---------|---|----------------
1        | 1 | 51.105.236.244
1        | 2 | 51.124.32.246
1        | 3 | 51.124.57.242
2        | 1 | 20.44.17.10
2        | 2 | 20.150.38.228
2        | 3 | 20.150.70.36
2        | 4 | 20.190.151.9
2        | 5 | 20.190.151.134
2        | 6 | 20.190.154.137
2        | 7 | 65.55.44.109
3        | 1 | 20.150.70.36
3        | 2 | 52.183.220.149
3        | 3 | 52.239.152.234
3        | 4 | 52.239.169.68
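To sanity-check the extraction regex on a single value, here is a quick sketch with an inline sample string (hypothetical IPs):
// hypothetical sample value in the same pipe-and-space format
print PublicIPs_s = "20.0.0.1|1|0|0|0|0|0 78.0.0.2|1|0|0|0|0|0"
| project ips = extract_all("(?:^| )([^|]+)", PublicIPs_s)
This returns a dynamic array of just the addresses (["20.0.0.1", "78.0.0.2"]): each capture starts at the beginning of the string or after a space and stops at the first pipe.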
David's answer covers your question. I would just like to add that I worked on the raw NSG Flow Logs and parsed them with KQL in this way:
The raw JSON:
{"records":[{"time":"2022-05-02T04:00:48.7788837Z","systemId":"x","macAddress":"x","category":"NetworkSecurityGroupFlowEvent","resourceId":"/SUBSCRIPTIONS/x/RESOURCEGROUPS/x/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/x","operationName":"NetworkSecurityGroupFlowEvents","properties":{"Version":2,"flows":[{"rule":"DefaultRule_DenyAllInBound","flows":[{"mac":"x","flowTuples":["1651463988,0.0.0.0,192.168.1.6,49944,8008,T,I,D,B,,,,"]}]}]}}]}
KQL parsing (starting from a column that already holds the record JSON as a dynamic value):
| mv-expand records
| evaluate bag_unpack(records)
| extend flows = properties.flows
| mv-expand flows
| evaluate bag_unpack(flows)
| mv-expand flows
| extend flowz = flows.flowTuples
| mv-expand flowz
// each flow tuple is a comma-separated string; split it into its positional fields
| extend result = split(tostring(flowz), ",")
| extend source_ip = tostring(result[1])
| extend destination_ip = tostring(result[2])
| extend source_port = tostring(result[3])
| extend destination_port = tostring(result[4])
| extend protocol = tostring(result[5])
| extend traffic_flow = tostring(result[6])
| extend traffic_decision = tostring(result[7])
| extend flow_state = tostring(result[8])
| extend packets_src_to_dst = tostring(result[9])
| extend bytes_src_to_dst = tostring(result[10])
| extend packets_dst_to_src = tostring(result[11])
| extend bytes_dst_to_src = tostring(result[12])
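For reference, a self-contained sketch that exercises the same parsing steps end to end; the sample record is inlined via datatable, and the names sample, raw, and innerFlows are assumptions (it also uses direct property paths instead of bag_unpack, which is a simplification of the pipeline above):
let sample = datatable(raw:string)
[
    '{"records":[{"time":"2022-05-02T04:00:48.7788837Z","properties":{"Version":2,"flows":[{"rule":"DefaultRule_DenyAllInBound","flows":[{"mac":"x","flowTuples":["1651463988,0.0.0.0,192.168.1.6,49944,8008,T,I,D,B,,,,"]}]}]}}]}'
];
sample
| mv-expand records = parse_json(raw).records       // one row per record
| mv-expand flows = records.properties.flows        // one row per NSG rule
| extend rule = tostring(flows.rule)
| mv-expand innerFlows = flows.flows                // one row per inner flow group
| mv-expand flowz = innerFlows.flowTuples           // one row per flow tuple
| extend result = split(tostring(flowz), ",")
| project rule,
          source_ip = tostring(result[1]),
          destination_ip = tostring(result[2]),
          source_port = tostring(result[3]),
          destination_port = tostring(result[4]),
          traffic_decision = tostring(result[7])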

Kusto query calculate 2 metric fields

I'm writing a Kusto query on Azure to get the memory fragmentation value of Redis. This value is obtained by dividing the RSS memory by the memory used. The problem is that I am not able to do the calculation using these two different fields, because I need to filter the Average field by MetricName ("usedmemoryRss" and "usedmemory"), and when I add the filter on the extend line the query returns no value. The code looks like this:
AzureMetrics
| extend m1 = Average | where MetricName == "usedmemoryRss"
| extend m2 = Average | where MetricName == "usedmemory"
| extend teste = m1 / m2
When I remove the "where" clause from the lines, it divides the value of each record by itself and returns 1. Is it possible to do that? Thank you in advance for your help.
Thanks for the answer, Justin. You gave me an idea, and I solved it this way:
let m1 = AzureMetrics | where MetricName == "usedmemoryRss" | where Average != 0 | project Average;
let m2 = AzureMetrics | where MetricName == "usedmemory" | where Average != 0 | project Average;
print memory_fragmentation=toscalar(m1) / toscalar(m2)
let Average = datatable(MetricName:string, Value:long)
[
    "usedmemoryRss", 10,
    "usedmemory", 5
];
let m1 = Average
| where MetricName == "usedmemoryRss" | project Value;
let m2 = Average
| where MetricName == "usedmemory" | project Value;
print teste = toscalar(m1) / toscalar(m2)
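Note that both snippets rely on toscalar taking a single value from each subquery, so they work cleanly when exactly one row matches per metric. As an alternative, here is a sketch that aligns the two metrics per time bin before dividing (it assumes the standard AzureMetrics columns TimeGenerated, MetricName, and Average):
AzureMetrics
| where MetricName in ("usedmemoryRss", "usedmemory")
// average each metric within the same 5-minute bin
| summarize rss = avgif(Average, MetricName == "usedmemoryRss"),
            used = avgif(Average, MetricName == "usedmemory")
    by bin(TimeGenerated, 5m)
| where used > 0   // avoid division by zero
| extend memory_fragmentation = rss / used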

KDB: Create Empty Table Using Dynamic Columns

I want to create an empty table with the following static columns:
date, security, active, horizon
and an undefined number of additional columns that are represented by the following variables:
outFactor, subFacCols
The columns represented by outFactor and subFacCols are float types. How can I create a dummy table with the aforementioned columns?
Example:
These are the first 5 columns, not including subFacCols
dummyTable:flip (`date`security`active`horizon,outFactor)!(`date$();`int$();`boolean$();`int$();`float$())
You need the keys and values of the dictionary to be of the same length; therefore, the following should work:
q)outFactor:`price`size
q)subFacCols:`bestBid
q)dummyTable:flip (`date`security`active`horizon,outFactor,subFacCols)!(`date$();`int$();`boolean$();`int$()),(count[outFactor]#`float$()),count[subFacCols]#`float$()
q)meta dummyTable
c | t f a
--------| -----
date | d
security| i
active | b
horizon | i
price | f
size | f
bestBid | f
Uses: https://code.kx.com/q/ref/lists/#take

SQLite find table row where a subset of columns satisfies a specified constraint

I have the following SQLite table
CREATE TABLE visits(urid INTEGER PRIMARY KEY AUTOINCREMENT,
hash TEXT, dX INTEGER, dY INTEGER, dZ INTEGER);
Typical content would be
# select * from visits;
urid | hash | dX | dY | dZ
------+-----------+-------+--------+------
1 | 'abcd' | 10 | 10 | 10
2 | 'abcd' | 11 | 11 | 11
3 | 'bcde' | 7 | 7 | 7
4 | 'abcd' | 13 | 13 | 13
5 | 'defg' | 20 | 21 | 17
What I need to do here is identify the urid for the table row which satisfies the constraint
hash = 'abcd' AND nearby >= (abs(dX - tX) + abs(dY - tY) + abs(dZ - tZ))
with the smallest deviation - in the sense of smallest sum of absolute distances
In the present instance with
nearby = 7
tX = tY = tZ = 12
there are three rows that meet the above constraint but with different deviations
urid | hash | dX | dY | dZ | deviation
------+-----------+-------+--------+--------+---------------
1 | 'abcd' | 10 | 10 | 10 | 6
2 | 'abcd' | 11 | 11 | 11 | 3
4 | 'abcd' | 13 | 13 | 13 | 3
in which case I would like to have reported urid = 2 or urid = 4 - I don't actually care which one gets reported.
Left to my own devices, I would fetch the full set of matching rows and then drill down to the one that matches my secondary constraint - smallest deviation - in my own Java code. However, I suspect that is not necessary and it can be done in SQL alone. My knowledge of SQL is sadly too limited here; I hope that someone can put me on the right path.
I now have managed to do the following
CREATE TEMP TABLE h1(v1 INTEGER,v2 INTEGER);
SELECT urid,(SELECT (abs(dX - 12) + abs(dY - 12) + abs(dZ - 12))) devi FROM visits WHERE hash = 'abcd';
which gives
--SELECT * FROM h1
urid | devi |
-------+-----------+
1 | 6 |
2 | 3 |
4 | 3 |
following which I issue
select urid from h1 order by v2 asc limit 1;
which yields urid = 2, the result I am after. Whilst this works, I would like to know if there is a better/simpler way of doing this.
You're so close! You have all of the components you need, you just have to put them together into a single query.
Consider:
SELECT urid
, (abs(dX - :tx) + abs(dY - :ty) + abs(dZ - :tz)) AS devi
FROM visits
WHERE hash=:hashval AND devi < :nearby
ORDER BY devi
LIMIT 1
Line by line: first you list the columns and computed values you want from the visits table (:tx, :ty, :tz, and :hashval are placeholders; in your code you prepare a statement and then bind values to the placeholders before executing it).
Then in the WHERE clause you restrict what rows get returned to those matching the particular hash (That column should have an index for best results... CREATE INDEX visits_idx_hash ON visits(hash) for example), and that have a devi that is less than the value of the :nearby placeholder. (I think devi < :nearby is clearer than :nearby >= devi).
Then you say that you want those results sorted in increasing order according to devi, and LIMIT the returned results to a single row because you don't care about any others (If there are no rows that meet the WHERE constraints, nothing is returned).

AdvancedDatagrid Iterating Through Each Row of the Open Leaf/Tree

I need to get the data for each row in an advanceddatagrid where the nodes are open.
For example, my ADG looks like this:
+ Science
  - Math
    - Passed
        John Doe | A+ | Section C
        Amy Rourke | B- | Section B
    - Failed
        Jane Doe | F | Section D
        Mike Cones | F | Section D
  - English
    + Passed
    + Failed
  - History
    + Passed
    - Failed
        Lori Pea | F | Section C
I tried using the following code to get the open nodes:
var o:Object = new Object();
o = IHierarchicalCollectionView(myADG.dataProvider).openNodes;
But doing the following code to inspect the object:
Alert.show(ObjectUtil.toString(o), 'object inspection');
Gives me:
(Object)#0
Math (2)
children = (mx.collections::ArrayCollection)#2
filterFunction = (null)
length = 2
list = (mx.collections::ArrayList)#3
length = 2
source = (Array)#4
[0] (Object)#5
children = (mx.collections::ArrayCollection)#6
filterFunction = (null)
length = 2
list = (mx.collections::ArrayList)#7
length = 2
source = (Array)#8
[0] <Table>
<Name>John Doe</Name>
<Grade>A+</Grade>
<Section>Section C</Section>
</Table>
[1] <Table>
<Name>Amy Rourke</Name>
<Grade>B-</Grade>
<Section>Section B</Section>
....
...
..
Basically, I just need to create an object or array or xmllist that would give me:
Math | Passed | John Doe | A+ | Section C
Math | Passed | Amy Rourke | B- | Section B
Math | Failed | Jane Doe | F | Section D
Math | Failed | Mike Cones | F | Section D
History | Failed | Lori Pea | F | Section C
Any suggestion would be highly appreciated. Thanks
You should be able to iterate across the openNodes object's properties, and for each one grab the collection and concat its values onto a new array, then use that array as the source of another collection type if necessary. Something like this:
var newArray:Array = [];
for (var property:String in o)
{
    newArray = newArray.concat(o[property][0].source); // Passed; property is the subject, e.g. Math
    newArray = newArray.concat(o[property][1].source); // Failed; property is the subject, e.g. Math
}
The only real problem with this is if you also want to keep the subject (Math) and the passed/failed status in the objects; otherwise the above should work. To get that part working, I think you need to break each of the statements above into its own loop that iterates across the source of the openNodes object and puts the right values into a new value object of your own that has the subject and the pass/fail status set on it. Then you could store those values as well. Also note I'm assuming the pass/fail nodes are always organized this way in the original data structure: within each subject you'll have two arrays, the first being pass, followed by fail.
