Kusto - extract key-value pairs from a Kusto table result - azure-data-explorer

How do I extract a set of key-value pairs from a Kusto table result? I have a Data field (a column in a Kusto table) that holds log details (15 lines, each with a timestamp). Out of these 15 lines, the last 3 lines contain key-value pairs that I need to use in the query to filter and display the results.
Should I use the method below? Can you give some examples?
Extract values from a column with strings sharing the same format or pattern.
Example values from the column (the last 3 lines of the Data section):
2021-09-05T06:42:19.2287304Z VMtype - C4
2021-09-05T06:42:19.2287304Z patchsizeMB - 2533
I am trying the query below; is this the right way of doing it?
| parse Data with * "Virtual machine =" VirtualMachine
| parse Data with * "VMtype=" VMtype
| parse Data with * "patchsizeMB=" patchsizeMB

The way you implemented it is fine, and it is the easiest to understand and maintain.
Here is a full example, where the results are returned in one row for each timestamp:
datatable(Data:string)["2021-09-05T06:42:19.2287304Z VMtype - C4",
"2021-09-05T06:42:19.2287304Z patchsizeMB - 2533",
"2021-09-05T06:42:19.2287304Z Virtual machine - VM_Name"]
| parse Data with Timestamp:datetime " " *
| parse Data with * "Virtual machine -" VirtualMachine
| parse Data with * "VMtype -" VMtype
| parse Data with * "patchsizeMB -" patchsizeMB
| summarize take_any(VirtualMachine), take_any(VMtype), take_any(patchsizeMB) by bin(Timestamp, 1d)
Results:
Timestamp  | VirtualMachine | VMtype | patchsizeMB
2021-09-05 | VM_Name        | C4     | 2533
You can also implement it by splitting the string into an array and working on the array to populate the applicable values, but that is much more code and probably more fragile; see the sketch below.
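For completeness, here is a minimal sketch of that array-based approach (same sample data as above; split(), bag_pack()/make_bag() and bag_unpack are used to rebuild the key-value pairs):
datatable(Data:string)["2021-09-05T06:42:19.2287304Z VMtype - C4",
"2021-09-05T06:42:19.2287304Z patchsizeMB - 2533",
"2021-09-05T06:42:19.2287304Z Virtual machine - VM_Name"]
| extend Parts = split(Data, " - ")
| extend Timestamp = todatetime(split(tostring(Parts[0]), " ")[0]), // leading timestamp token
         Key = trim_start(@"\S+\s+", tostring(Parts[0])),           // key name after the timestamp
         Value = tostring(Parts[1])
| summarize Bag = make_bag(bag_pack(Key, Value)) by bin(Timestamp, 1d)
| evaluate bag_unpack(Bag)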

Related

MariaDB - get string in binary format BE

If I run this query:
SELECT HEX(BINARY(CONVERT('ßÁÁÁÁÁȵ$€Łß' USING ucs2)));
I get:
00DF00C100C100C100C100C1010C00B5002420AC014100DF
and I suppose that sequence is BE, because a text file saved as UTF-16 BE contains the same byte sequence.
How can I get the sequence in UTF-16 LE?
Why do I want LE? Because this query on MS SQL Server:
SELECT CONVERT(varbinary(100), N'ßÁÁÁÁÁȵ$€Łß',0)
returns:
0xDF00C100C100C100C100C1000C01B5002400AC204101DF00
Thanks,
Jaroslav
You need to convert using a little-endian character set:
SELECT HEX(BINARY(CONVERT('ßÁÁÁÁÁȵ$€Łß' USING utf16le)));
+----------------------------------------------------------------+
| HEX(BINARY(CONVERT('ßÁÁÁÁÁȵ$€Łß' USING utf16le))) |
+----------------------------------------------------------------+
| DF00C100C100C100C100C1000C01B5002400AC204101DF00 |
+----------------------------------------------------------------+

In KQL how can I use bag_unpack to turn a serialized dictionary object in customDimensions into columns?

I'm trying to write a KQL query that will, among other things, display the contents of a serialized dictionary called Tags which has been added to the Application Insights traces table customDimensions column by application logging.
An example of the serialized Tags dictionary is:
{
  "Source": "SAP",
  "Destination": "TC",
  "SAPDeliveryNo": "0012345678",
  "PalletID": "(00)312340123456789012(02)21234987654(05)123456(06)1234567890"
}
I'd like to use evaluate bag_unpack(...) to evaluate the JSON and turn the keys into columns. We're likely to add more keys to the dictionary as the project develops and it would be handy not to have to explicitly list every column name in the query.
However, I'm already using project to reduce the number of other columns I display. How can I use both a project statement, to only display some of the other columns, and evaluate bag_unpack(...) to automatically unpack the Tags dictionary into columns?
Or is that not possible?
This is what I have so far, which doesn't work:
traces
| where datetime_part("dayOfYear", timestamp) == datetime_part("dayOfYear", now())
and message has "SendPalletData"
| extend TagsRaw = parse_json(customDimensions.["Tags"])
| evaluate bag_unpack(TagsRaw)
| project timestamp, message, ActionName = customDimensions.["ActionName"], TagsRaw
| order by timestamp desc
When it runs it displays only the columns listed in the project statement (including TagsRaw, so I know the Tags exist in customDimensions).
evaluate bag_unpack(TagsRaw) doesn't automatically add extra columns to the result set unpacked from the Tags in customDimensions.
EDIT: To clarify what I want to achieve, these are the columns I want to output:
timestamp
message
ActionName
TagsRaw
Source
Destination
SAPDeliveryNo
PalletID
EDIT 2: It turned out a major part of my problem was that double quotes within the Tags data are being escaped. The Tags as viewed in the Azure portal looked like normal JSON and copied out as normal JSON, but when I copied out the whole of a customDimensions record, the Tags looked like "Tags": "{\"Source\":\"SAP\",\"Destination\":\"TC\", ... with the double quotes escaped with backslashes.
The accepted answer from David Markovitz handles this situation in the line:
TagsRaw = todynamic(tostring(customDimensions["Tags"]))
A few comments:
When filtering on timestamp, it is better to use the timestamp column as is and do the manipulations on the other side of the comparison.
When using the has[...] operators, prefer the case-sensitive one (if feasible).
Everything extracted from a dynamic value is also dynamic, and when given a dynamic value, parse_json() (or its equivalent, todynamic()) simply returns it as is.
Therefore, we need to treat customDimensions.["Tags"] in two steps:
first convert it to a string, then convert the result to dynamic.
To reference a field within a dynamic type you can use X.Y, X["Y"], or X['Y'].
There is no need to combine them as you did with customDimensions.["Tags"].
As the bag_unpack plugin doc states:
"The specified input column (Column) is removed."
In other words, TagsRaw does not exist following the bag_unpack operation.
Please note that you can add a prefix to the columns generated by bag_unpack, which might make it easier to differentiate them from the rest of the columns.
While you can use project, using project-away is sometimes easier.
// Data sample generation. Not part of the solution.
let traces =
print c1 = "some columns"
,c2 = "we"
,c3 = "don't need"
,timestamp = ago(now()%1d * rand())
,message = "abc SendPalletData xyz"
,customDimensions = dynamic
(
{
"Tags":"{\"Source\":\"SAP\",\"Destination\":\"TC\",\"SAPDeliveryNo\":\"0012345678\",\"PalletID\":\"(00)312340123456789012(02)21234987654(05)123456(06)1234567890\"}"
,"ActionName":"Action1"
}
)
;
// Solution starts here
traces
| where timestamp >= startofday(now())
and message has_cs "SendPalletData"
| extend TagsRaw = todynamic(tostring(customDimensions["Tags"]))
,ActionName = customDimensions.["ActionName"]
| project-away c*
| evaluate bag_unpack(TagsRaw, "TR_")
| order by timestamp desc
timestamp | message | ActionName | TR_Destination | TR_PalletID | TR_SAPDeliveryNo | TR_Source
2022-08-27T04:15:07.9337681Z | abc SendPalletData xyz | Action1 | TC | (00)312340123456789012(02)21234987654(05)123456(06)1234567890 | 0012345678 | SAP
If I understand correctly, you want to use project to limit the number of columns that are displayed, but you also want to include all of the unpacked columns from TagsRaw, without naming all of the tags explicitly.
The easiest way to achieve this is to switch the order of your steps, so that you first do the project (including the TagsRaw column) and then you unpack the tags. If desired, you can then use project-away to specifically remove the TagsRaw column after you've unpacked it.
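For illustration, a minimal sketch of that ordering, reusing the filter and conversion from the accepted answer:
traces
| where timestamp >= startofday(now())
    and message has_cs "SendPalletData"
| extend TagsRaw = todynamic(tostring(customDimensions["Tags"]))
| project timestamp, message, ActionName = tostring(customDimensions["ActionName"]), TagsRaw
| evaluate bag_unpack(TagsRaw) // the unpacked Tags columns are appended; TagsRaw itself is removed
| order by timestamp desc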

Using Flatten to select where var1 (non-repeated) = "abc" from a bigquery table which contains multiple nested variables?

I'm fairly new to BigQuery (third day of using it with no training) and I'm just trying to get my head around nested fields etc.
I've looked at the following resources and used the personsdata example from the Google BigQuery docs link:
https://cloud.google.com/bigquery/docs/data
https://chartio.com/resources/tutorials/how-to-flatten-data-using-google-bigquerys-legacy-vs-standard-sql/
I'd like to run the below query:
select *
from [dataset.tableid]
where fullname = 'John Doe'
If I run this, I get the following error:
Error: Cannot output multiple independently repeated fields at the same time. Found children_age and citiesLived_place
From reading the above articles, this isn't possible because you need to flatten the results, which from what I can understand just duplicates all the non-repeated variables, i.e.
Fullname | age | gender | Children.name | children.age
John Doe | 22 | Male | John | 5
John Doe | 22 | Male | Jane | 7
One of the above articles suggests that you can still use WHERE clauses by using the FLATTEN function in BigQuery:
select fullname,
age,
gender,
citiesLived.place
FROM (FLATTEN([dataset.tableId], children))
WHERE
(citiesLived.yearLived > 1995) AND
(children.age > 3)
GROUP BY fullName, age, gender, citiesLived.place
If I change this to:
select *
FROM (FLATTEN([dataset.tableId], children))
WHERE fullname = 'John Doe'
Then this works fine and gives me what I need however if I change to this:
select *
FROM (FLATTEN([dataset.tableId], citieslived))
WHERE fullname = 'John Doe'
Then I get the following error:
Error: Cannot output multiple independently repeated fields at the same time. Found children_age and citiesLived_yearsLived
Can someone explain why this works when flattening based on "Children" but not "CitiesLived", and how to know which variables to use within FLATTEN for more complex datasets with multiple nested variables?
Thank you in advance
Can someone explain why this will work flattening based on "Children" but not "CitiesLived"
Check the schema of this table again:
Schema
-----------------------------------
|- kind: STRING
|- fullName: STRING (required)
|- age: INTEGER
|- gender: STRING
+- phoneNumber: RECORD
| |- areaCode: INTEGER
| |- number: INTEGER
+- children: RECORD (repeated)
| |- name: STRING
| |- gender: STRING
| |- age: INTEGER
+- citiesLived: RECORD (repeated)
| |- place: STRING
| +- yearsLived: INTEGER (repeated)
As you can see, when you flatten the children repeated record, the only repeated record left for output is citiesLived, and even though it contains yet another repeated field (yearsLived), they are not independent, so BigQuery Legacy SQL can output the result.
Now, when you flatten by citiesLived, what you get in the result are two independent repeated fields: children and yearsLived. Because they are independent, BigQuery Legacy SQL cannot output such a result.
how to know what variables to use within flatten with more complex datasets with multiple nested variables?
To make it work, you should add yet another flattening, with (for example) the yearsLived field. Something like below:
FROM (FLATTEN(FLATTEN([dataset.tableId], citieslived), yearsLived))
Adding all those multiple FLATTENs can become cumbersome, so using BigQuery Standard SQL is really the way to go!
See Migrating from Legacy SQL to BigQuery Standard SQL
If you run this query:
SELECT
*
FROM
(FLATTEN((FLATTEN(([project_id:dataset_id.table]), citiesLived.yearsLived)), citiesLived))
It will flatten as expected.
When using Legacy SQL, BQ tries to flatten the results for you automatically.
What I have noticed, though, is that if you try to flatten repeated fields that have other repeated fields inside them, you might sometimes run into these errors (notice that the fields citiesLived and citiesLived.yearsLived are both repeated).
So one way to solve that is to force the flatten operation on all the repeated fields you want to work with (in the example I showed you, I first flattened yearsLived and then citiesLived) rather than relying on the automatic flattening that Legacy SQL offers.
But what I strongly recommend and encourage you to do is to learn the Standard SQL version for BQ, as Elliot suggested in his comment. It might have a steeper learning curve at first, but it will totally pay off in the long run (and you won't run the risk of eventually having to migrate all your legacy queries to Standard SQL, as we had to do in our company).
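For illustration, a rough Standard SQL equivalent of the Legacy query above (a sketch only; the table reference is a placeholder, and each repeated field is expanded explicitly with UNNEST):
SELECT
  fullName,
  age,
  gender,
  cl.place
FROM `project_id.dataset_id.table` AS t,
  UNNEST(t.children) AS c,
  UNNEST(t.citiesLived) AS cl,
  UNNEST(cl.yearsLived) AS yl
WHERE yl > 1995
  AND c.age > 3
GROUP BY fullName, age, gender, cl.place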

How to update entries in a table within a nested dictionary?

I am trying to create an order book data structure where a top-level dictionary holds 3 basic order types; each of those types has a bid and an ask side, and each of the sides has a list of tables, one for each ticker. For example, if I want to retrieve all the ask orders of type1 for Google stock, I'd call book[`orderType1][`ask][`GOOG]. I implemented that using the following:
bookTemplate: ([]orderID:`int$();date:"d"$();time:`time$();sym:`$();side:`$();
orderType:`$();price:`float$();quantity:`int$());
bookDict:(1#`)!enlist`orderID xkey bookTemplate;
book: `orderType1`orderType2`orderType3 ! (3# enlist(`ask`bid!(2# enlist bookDict)));
Data retrieval using book[`orderType1][`ask][`ticker] seems to be working fine. The problem appears when I try to add a new order to a specific order book, e.g.:
testorder:`orderID`date`time`sym`side`orderType`price`quantity!(111111111;.z.D;.z.T;
`GOOG;`ask;`orderType1;100.0f;123);
book[`orderType1][`ask][`GOOG],:testorder;
Executing the last statement gives an 'assign error. What's the reason, and how can I solve it?
A couple of issues here. The first is that while you can look up into dictionaries using a series of in-line repeated keys, i.e.
q)book[`orderType1][`ask][`GOOG]
orderID| date time sym side orderType price quantity
-------| -------------------------------------------
you can't assign values like this (you can only assign one level deep). The better approach is to use dot-indexing (and dot-amend to reassign values). However, the problem is that the value of your book dictionary is getting flattened to a table because the list of dictionaries is uniform. So this fails:
q)book . `orderType1`ask`GOOG
'rank
You can see how it got flattened by inspecting the terminal
q)book
| ask
----------| -----------------------------------------------------------------
orderType1| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
orderType2| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
orderType3| (,`)!,(+(,`orderID)!,`int$())!+`date`time`sym`side`orderType`pric
To prevent this flattening, you can force the value to be a mixed list by adding a generic null:
q)book: ``orderType1`orderType2`orderType3 !(::),(3# enlist(`ask`bid!(2# enlist bookDict)));
Then it looks like this:
q)book
| ::
orderType1| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType2| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType3| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
Dot-indexing now works:
q)book . `orderType1`ask`GOOG
orderID| date time sym side orderType price quantity
-------| -------------------------------------------
which means that dot-amend will now work too
q).[`book;`orderType1`ask`GOOG;,;testorder]
`book
q)book
| ::
orderType1| `ask`bid!+``GOOG!(((+(,`orderID)!,`int$())!+`date`time`sym`side`o
orderType2| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
orderType3| `ask`bid!+(,`)!,((+(,`orderID)!,`int$())!+`date`time`sym`side`ord
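As a quick sanity check (a minimal sketch; the stored date and time are whatever .z.D and .z.T were when testorder was built), the same dot-index lookup now returns the appended order:
q)book . `orderType1`ask`GOOG          / now shows the single GOOG ask order
q)count book . `orderType1`ask`GOOG    / 1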
Finally, I would recommend reading this FD whitepaper on how to best store book data: http://www.firstderivatives.com/downloads/q_for_Gods_Nov_2012.pdf

Query a manual list of data items

I would like to run a query that joins a table to a manually generated list, but I am stuck trying to generate the manual list. Below is an example of what I am attempting to do:
SELECT
*
FROM
('29/12/2014', '30/12/2014', '31/12/2014') dates
;
Ideally I would want my output to look like:
29/12/2014
30/12/2014
31/12/2014
What's your Teradata release?
In TD14 there's STRTOK_SPLIT_TO_TABLE:
SELECT *
FROM TABLE (STRTOK_SPLIT_TO_TABLE(1 -- any dummy value
,'29/12/2014,30/12/2014,31/12/2014' -- any delimited string
,',' -- delimiter
)
RETURNS (outkey INTEGER
,tokennum INTEGER
,token VARCHAR(20) CHARACTER SET UNICODE) -- modify to match the actual size
) AS d
You can easily put this in a Derived Table and then join to it.
inkey (here the dummy value 1) is a numeric or string column, usually a key; it can be used for joining back to the original row.
outkey is the same as inkey.
tokennum is the ordinal position of the token in the input string.
token is the extracted substring.
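For example, a sketch of joining back to a table this way (MyTable and its date_col are hypothetical placeholders; the token is cast to a DATE with a matching format):
SELECT t.*
FROM MyTable AS t
JOIN
 (
   SELECT CAST(token AS DATE FORMAT 'DD/MM/YYYY') AS dt
   FROM TABLE (STRTOK_SPLIT_TO_TABLE(1
                                    ,'29/12/2014,30/12/2014,31/12/2014'
                                    ,','
                                    )
        RETURNS (outkey INTEGER
                ,tokennum INTEGER
                ,token VARCHAR(20) CHARACTER SET UNICODE)
        ) AS d
 ) AS dates
ON t.date_col = dates.dt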
Try this:
select '29/12/2014'
union
select '30/12/2014'
union
...
It should work in Teradata as well as in MySQL.
