I'm trying to write a KQL query that will, among other things, display the contents of a serialized dictionary called Tags which has been added to the Application Insights traces table customDimensions column by application logging.
An example of the serialized Tags dictionary is:
{
"Source": "SAP",
"Destination": "TC",
"SAPDeliveryNo": "0012345678",
"PalletID": "(00)312340123456789012(02)21234987654(05)123456(06)1234567890"
}
I'd like to use evaluate bag_unpack(...) to evaluate the JSON and turn the keys into columns. We're likely to add more keys to the dictionary as the project develops and it would be handy not to have to explicitly list every column name in the query.
However, I'm already using project to reduce the number of other columns I display. How can I use both a project statement, to only display some of the other columns, and evaluate bag_unpack(...) to automatically unpack the Tags dictionary into columns?
Or is that not possible?
This is what I have so far, which doesn't work:
traces
| where datetime_part("dayOfYear", timestamp) == datetime_part("dayOfYear", now())
and message has "SendPalletData"
| extend TagsRaw = parse_json(customDimensions.["Tags"])
| evaluate bag_unpack(TagsRaw)
| project timestamp, message, ActionName = customDimensions.["ActionName"], TagsRaw
| order by timestamp desc
When it runs it displays only the columns listed in the project statement (including TagsRaw, so I know the Tags exist in customDimensions).
evaluate bag_unpack(TagsRaw) doesn't automatically add extra columns to the result set unpacked from the Tags in customDimensions.
EDIT: To clarify what I want to achieve, these are the columns I want to output:
timestamp
message
ActionName
TagsRaw
Source
Destination
SAPDeliveryNo
PalletID
EDIT 2: It turned out a major part of my problem was that double quotes within the Tags data are being escaped. While the Tags as viewed in the Azure portal looked like normal JSON, and copied out as normal JSON, when I copied out the whole of a customDimensions record the Tags looked like "Tags": "{\"Source\":\"SAP\",\"Destination\":\"TC\", ... with the double quotes escaped with backslashes.
The accepted answer from David Markovitz handles this situation in the line:
TagsRaw = todynamic(tostring(customDimensions["Tags"]))
A few comments:
When filtering on timestamp, it is better to use the timestamp column as is and do the manipulations on the other side of the comparison.
When using the has[...] operators, prefer the case-sensitive variant (if feasible).
Everything extracted from a dynamic value is also dynamic, and when given a dynamic value, parse_json() (or its equivalent, todynamic()) simply returns it as is.
Therefore, we need to treat customDimensions.["Tags"] in 2 steps:
1st, convert it to string. 2nd, convert the result to dynamic.
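The same double encoding can be reproduced outside Kusto. The following is a minimal Python sketch of the idea, not KQL: the record is a shortened, hypothetical version of the sample in the question, and json.loads stands in for todynamic(). One parse of the outer document still leaves Tags as a string, so a second parse is needed to get a dictionary.

```python
import json

# Hypothetical record, shortened from the sample in the question.
# The "Tags" value is itself a JSON document stored as an escaped string.
raw_record = r'{"ActionName": "Action1", "Tags": "{\"Source\":\"SAP\",\"Destination\":\"TC\"}"}'

# Step 0: parse the outer document (roughly what customDimensions already is).
custom_dimensions = json.loads(raw_record)

# After one parse, Tags is still plain text, not a dictionary.
tags_value = custom_dimensions["Tags"]
print(type(tags_value).__name__)

# Step 1 (string) + step 2 (parse the string) yields the actual dictionary.
tags = json.loads(tags_value)
print(sorted(tags))
```

In KQL the equivalent two steps are the tostring(...) call followed by todynamic(...).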
To reference a field within a dynamic type you can use X.Y, X["Y"], or X['Y'].
No need to combine them as you did with customDimensions.["Tags"].
As the bag_unpack plugin doc states:
"The specified input column (Column) is removed."
In other words, TagsRaw does not exist following the bag_unpack operation.
Please note that you can add a prefix to the columns generated by bag_unpack. This might make it easier to differentiate them from the rest of the columns.
While you can use project, using project-away is sometimes easier.
// Data sample generation. Not part of the solution.
let traces =
print c1 = "some columns"
,c2 = "we"
,c3 = "don't need"
,timestamp = ago(now()%1d * rand())
,message = "abc SendPalletData xyz"
,customDimensions = dynamic
(
{
"Tags":"{\"Source\":\"SAP\",\"Destination\":\"TC\",\"SAPDeliveryNo\":\"0012345678\",\"PalletID\":\"(00)312340123456789012(02)21234987654(05)123456(06)1234567890\"}"
,"ActionName":"Action1"
}
)
;
// Solution starts here
traces
| where timestamp >= startofday(now())
and message has_cs "SendPalletData"
| extend TagsRaw = todynamic(tostring(customDimensions["Tags"]))
,ActionName = customDimensions["ActionName"]
| project-away c*
| evaluate bag_unpack(TagsRaw, "TR_")
| order by timestamp desc
timestamp | message | ActionName | TR_Destination | TR_PalletID | TR_SAPDeliveryNo | TR_Source
2022-08-27T04:15:07.9337681Z | abc SendPalletData xyz | Action1 | TC | (00)312340123456789012(02)21234987654(05)123456(06)1234567890 | 0012345678 | SAP
If I understand correctly, you want to use project to limit the number of columns that are displayed, but you also want to include all of the unpacked columns from TagsRaw, without naming all of the tags explicitly.
The easiest way to achieve this is to switch the order of your steps, so that you first do the project (including the TagsRaw column) and then you unpack the tags. If desired, you can then use project-away to specifically remove the TagsRaw column after you've unpacked it.
I have the SQLite table below:
panel_name | layer | asset_name
ls_ui_main | 2 | pfb_ls_ui_main
ss_ui_main | 2 | pfb_ss_ui_main
The primary key is "panel_name".
When I added a new row "| ms_ui_main | 2 | pfb_ms_ui_main |" to the table and wrote the change, the row order changed.
I ran "SELECT panel_name FROM ui_panel_info" and got this result:
panel_name
ls_ui_main
ms_ui_main
ss_ui_main
Then "SELECT asset_name FROM ui_panel_info" returned:
asset_name
pfb_ls_ui_main
pfb_ss_ui_main
pfb_ms_ui_main
It doesn't matter that ms appears above ss, but I need the other columns to come back in the same sequence.
I also found that after I made some changes I didn't want, such as modifying the table definition, "SELECT asset_name FROM ui_panel_info" later returned the correct sequence.
What happened? How can I fix it?
I managed to display a list of the table names of interest in my database with the following query:
SELECT name
FROM sqlite_master
WHERE type = 'table'
AND name LIKE '%#_1' ESCAPE '#';
(This is beside the point, but it returns a list of table names ending in "_1".)
Now I would like to display the content of all these tables with one command (just as if I were using cat *), and I would like to time this command.
So what should the command be?
Thank you for your help.
This is not possible with a single SQL command.
You have to generate a series of SELECT statements, one for each table, and execute all of them.
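One way to generate and run those statements is from a small script. The sketch below assumes Python's standard sqlite3 module and an in-memory database with made-up tables (alpha_1, beta_1, gamma_2 are hypothetical); the listing query is the one from the question, and dump_tables is an illustrative helper name, not part of SQLite.

```python
import sqlite3
import time

# The listing query from the question, with ORDER BY added for a stable order.
LIST_TABLES = """
SELECT name
FROM sqlite_master
WHERE type = 'table'
  AND name LIKE '%#_1' ESCAPE '#'
ORDER BY name
"""

def dump_tables(conn):
    """Run one SELECT * per matching table and return {table_name: rows}."""
    contents = {}
    for (name,) in conn.execute(LIST_TABLES).fetchall():
        # Quote the identifier, since table names cannot be bound as parameters.
        quoted = '"' + name.replace('"', '""') + '"'
        contents[name] = conn.execute(f"SELECT * FROM {quoted}").fetchall()
    return contents

# Hypothetical sample data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE alpha_1 (id INTEGER, label TEXT);
CREATE TABLE beta_1 (id INTEGER, label TEXT);
CREATE TABLE gamma_2 (id INTEGER, label TEXT);
INSERT INTO alpha_1 VALUES (1, 'a');
INSERT INTO beta_1 VALUES (2, 'b');
INSERT INTO gamma_2 VALUES (3, 'c');
""")

# Time the whole series of SELECT statements together.
start = time.perf_counter()
tables = dump_tables(conn)
elapsed = time.perf_counter() - start

for name, rows in tables.items():
    print(name, rows)
print(f"elapsed: {elapsed:.6f} s")
```

Timing the loop as a whole gives the closest equivalent to timing a single "cat *"-style command.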
I'm fairly new to BigQuery (third day of using it, with no training) and I'm just trying to get my head around nested fields and so on.
I've looked at the following resources and used the personsdata example on the google bigquery docs link
https://cloud.google.com/bigquery/docs/data
https://chartio.com/resources/tutorials/how-to-flatten-data-using-google-bigquerys-legacy-vs-standard-sql/
I'd like to run the below query:
select *
from [dataset.tableid]
where fullname = 'John Doe'
If I run this, I get the following error:
Error: Cannot output multiple independently repeated fields at the same time. Found children_age and citiesLived_place
From reading the above articles, this isn't possible because you need to flatten the results, which from what I can understand just duplicates all the non-repeated variables, i.e.
Fullname | age | gender | Children.name | children.age
John Doe | 22 | Male | John | 5
John Doe | 22 | Male | Jane | 7
One of the above articles suggests that you can still use the where statements by using the flatten function in bigquery:
select fullname,
age,
gender,
citiesLived.place
FROM (FLATTEN([dataset.tableId], children))
WHERE
(citiesLived.yearsLived > 1995) AND
(children.age > 3)
GROUP BY fullName, age, gender, citiesLived.place
If I change this to:
select *
FROM (FLATTEN([dataset.tableId], children))
WHERE fullname = 'John Doe'
Then this works fine and gives me what I need however if I change to this:
select *
FROM (FLATTEN([dataset.tableId], citieslived))
WHERE fullname = 'John Doe'
Then I get the following error:
Error: Cannot output multiple independently repeated fields at the same time. Found children_age and citiesLived_yearsLived
Can someone explain why this will work flattening based on "Children" but not "CitiesLived" and how to know what variables to use within flatten with more complex datasets with multiple nested variables?
Thank you in advance
Can someone explain why this will work flattening based on "Children" but not "CitiesLived"
Check the schema of this table again:
Schema
-----------------------------------
|- kind: STRING
|- fullName: STRING (required)
|- age: INTEGER
|- gender: STRING
+- phoneNumber: RECORD
| |- areaCode: INTEGER
| |- number: INTEGER
+- children: RECORD (repeated)
| |- name: STRING
| |- gender: STRING
| |- age: INTEGER
+- citiesLived: RECORD (repeated)
| |- place: STRING
| +- yearsLived: INTEGER (repeated)
As you can see, when you flatten the repeated record children, the only repeated record left in the output is citiesLived. Although it contains yet another repeated field, yearsLived, the two are not independent, so BigQuery Legacy SQL can output the result.
When you flatten by citiesLived instead, the result contains two repeated fields: children and yearsLived. Those two are independent, so BigQuery Legacy SQL cannot output such a result.
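The rule described above can be modelled in a few lines. This is only a toy Python sketch of the reasoning, not how BigQuery implements it: the list of repeated paths is taken from the schema shown, and remaining_repeated, is_chain and can_output are hypothetical helper names. Output is treated as possible when the repeated fields left after flattening form a single nesting chain.

```python
# Repeated fields from the schema above, written as full paths.
REPEATED = ["children", "citiesLived", "citiesLived.yearsLived"]

def remaining_repeated(flattened):
    """Repeated paths that are still repeated after the given FLATTENs."""
    return [path for path in REPEATED if path not in flattened]

def is_chain(paths):
    """True if every path is nested inside the previous one (no siblings)."""
    ordered = sorted(paths, key=len)
    return all(inner.startswith(outer + ".")
               for outer, inner in zip(ordered, ordered[1:]))

def can_output(flattened):
    """Toy version of the 'independently repeated fields' check."""
    return is_chain(remaining_repeated(flattened))

print(can_output({"children"}))       # citiesLived and its child yearsLived remain
print(can_output({"citiesLived"}))    # children and yearsLived remain, unrelated
print(can_output({"citiesLived", "citiesLived.yearsLived"}))  # only children remains
```

Under this model, flattening children succeeds, flattening only citiesLived fails, and flattening both citiesLived and its nested yearsLived succeeds, which matches the behaviour reported in the question.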
how to know what variables to use within flatten with more complex datasets with multiple nested variables?
To make it work, add yet another flattening, for example on the yearsLived field. Something like the following:
FROM (FLATTEN(FLATTEN([dataset.tableId], citieslived), citiesLived.yearsLived))
Adding all those multiple FLATTENs can become cumbersome so using BigQuery Standard SQL is really the way to go!
See Migrating from Legacy SQL to BigQuery Standard SQL
If you run this query:
SELECT
*
FROM
(FLATTEN((FLATTEN(([project_id:dataset_id.table]), citiesLived.yearsLived)), citiesLived))
It will flatten as expected.
When using Legacy SQL, BQ tries to flatten the results automatically for you.
What I have noticed, though, is that if you try to flatten repeated fields that contain other repeated fields, you can sometimes run into these errors (notice that the fields citiesLived and citiesLived.yearsLived are both repeated).
So one way to solve that is by forcing the flatten operation on all repeated fields you want to work with (in the example I showed you I first flattened the yearsLived and then citiesLived) and not relying on the automatic flattening operation that the Legacy SQL offers.
But what I strongly recommend and encourage you to do is to learn the Standard SQL version of BQ, as Elliot suggested in his comment. It might have a steeper learning curve at first, but it will pay off in the long run (and you won't risk eventually having to migrate all your legacy queries to standard, as we had to do in our company).
How can I search for a name like O'Neil in a table? When I use a query like
select * from table_name where name like 'O'Neil';
then it shows an error.
Escape it by doubling the single quote:
select * from table_name where name like 'O''Neil';
Since Oracle 10g there is also a quote-operator:
select * from table_name where name like q'(O'Neil)';
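Syntax: q'c text-to-be-quoted c'. c is a single character (called the quote delimiter); if the opening delimiter is one of (, [, { or <, the closing delimiter is the matching ), ], } or >. With the «quote operator» apostrophes don't have to be doubled.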
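The doubled-quote form can be tried without an Oracle instance. The sketch below uses Python's standard sqlite3 module as a stand-in, which is an assumption on my part: the doubled quote is standard SQL and works there too, while the q'...' operator is Oracle-specific and is not shown. The second query uses a bind parameter, a different technique from the two above, which avoids quoting the value in the SQL text at all.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (name TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?)",
                 [("O'Neil",), ("Smith",)])

# 1. Doubled single quote inside the SQL string literal.
doubled = conn.execute(
    "SELECT name FROM table_name WHERE name LIKE 'O''Neil'"
).fetchall()

# 2. Bind parameter: the driver passes the value separately, so no escaping.
bound = conn.execute(
    "SELECT name FROM table_name WHERE name LIKE ?", ("O'Neil",)
).fetchall()

print(doubled)
print(bound)
```

Both queries return the single O'Neil row; when the value comes from user input, the bind-parameter form is usually the safer choice.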