I use the following jq filter to extract data from a JSON file. Assume I just keep the first match if there is one (if there is no match, an error should be printed).
.[] | select(.[1] | objects."/Type".N == "/Catalog") | .[1]."/Dests"
Let's say the output is 16876. Then, I use the following jq code to extract the data.
.[] | select(.[0] == 16876) | .[1] | to_entries[] | [.key, .value[0], .value[2].F, .value[3].F] | @tsv
This involves multiple passes of the input json data. Can the two jq commands be combined into one, so that one pass of the input json data is sufficient?
Bind the result of your first query to the name $dests with an "as" expression, then refer back to it in the second part, all within a single invocation of jq:
(.[] | select(.[1] | objects."/Type".N == "/Catalog") | .[1]."/Dests") as $dests
| .[] | select(.[0] == $dests) | .[1] | to_entries[] | [.key, .value[0], .value[2].F, .value[3].F] | @tsv
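If you also want the behaviour described at the top (keep only the first match and print an error when there is none), a minimal sketch using jq's first and error builtins could look like this; the error message is a placeholder:
(first(.[] | select(.[1] | objects."/Type".N == "/Catalog") | .[1]."/Dests")
   // error("no /Catalog object found")) as $dests
| .[] | select(.[0] == $dests) | .[1] | to_entries[] | [.key, .value[0], .value[2].F, .value[3].F] | @tsv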
Related
How can I (generically) transform the input file below to the output file below, using jq. The record format of the output file is: array_index | key | value
Input file:
[{"a": 1, "b": 10},
{"a": 2, "d": "fred", "e": 30}]
Output File:
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Here's a solution using tostream, which converts the input into a stream of [path, value] pairs. Use select(has(1)) to keep only the events that actually carry a value, flatten to merge path and value into one array, and join for the output format:
jq -r 'tostream | select(has(1)) | flatten | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Or a very similar one using paths to get the paths, scalars for the filter, and getpath for the corresponding value:
jq -r 'paths(scalars) as $p | [$p[], getpath($p)] | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
< file.json jq -r 'to_entries
| .[]
| .key as $k
| ((.value | to_entries )[]
| [$k, .key, .value])
| @csv'
Output:
0,"a",1
0,"b",10
1,"a",2
1,"d","fred"
1,"e",30
You just need to remove the double quotes.
to_entries can be used to loop over the elements of arrays and objects in a way that gives both the key (index) and the value of the element.
jq -r '
to_entries[] |
.key as $id |
.value |
to_entries[] |
[ $id, .key, .value ] |
join("|")
'
Replace join("|") with @csv to get proper CSV.
I have a column 'Apples' in azure table that has this string: "Colour:red,Size:small".
Current situation:
|------------------------|
| Apples                 |
|------------------------|
| Colour:red,Size:small  |
|------------------------|
Desired Situation:
|--------|-------|
| Colour | Size  |
|--------|-------|
| Red    | small |
|--------|-------|
Please help
I'll answer the question in the title, as I noticed many people searched for a solution.
The key here is the mv-expand operator, which expands multi-value dynamic arrays or property bags into multiple records:
datatable (str:string)["aaa,bbb,ccc", "ddd,eee,fff"]
| project splitted=split(str, ',')
| mv-expand col1=splitted[0], col2=splitted[1], col3=splitted[2]
| project-away splitted
The project-away operator lets us select which columns from the input to exclude from the output.
Result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| aaa  | bbb  | ccc  |
| ddd  | eee  | fff  |
+------+------+------+
This query gave me the desired results:
| parse Apples with "Colour:" AppColour ", Size:" AppSize
Remember to include every delimiter that precedes each value you want to extract, e.g. ", Size:". Mind the space in between.
This helped me; I then used my intuition to customize the query to my needs:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parseoperator
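For reference, a minimal sketch against the sample value from the question; the table is mocked with datatable, the column name Apples comes from the question, and the delimiter has no space here because the sample string has none:
datatable (Apples:string)
[
    "Colour:red,Size:small"
]
| parse Apples with "Colour:" AppColour ",Size:" AppSize
| project AppColour, AppSize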
I am trying to parse the below data in Kusto. Need help.
[[ObjectCount][LinkCount][DurationInUs]]
[ChangeEnumeration][[88][9][346194]]
[ModifyTargetInLive][[3][6][595903]]
Need generic implementation without any hardcoding.
Ideally, you'd be able to change the component that produces the source data in that format to use a standard format (e.g. CSV, JSON, etc.) instead.
The following could work, but you should consider it very inefficient:
let T = datatable(s:string)
[
'[[ObjectCount][LinkCount][DurationInUs]]',
'[ChangeEnumeration][[88][9][346194]]',
'[ModifyTargetInLive][[3][6][595903]]',
];
let keys = toscalar(
T
| where s startswith "[["
| take 1
| project extract_all(@'\[([^\[\]]+)\]', s)
);
T
| where s !startswith "[["
| project values = extract_all(@'\[([^\[\]]+)\]', s)
| mv-apply with_itemindex = i keys on (
extend Category = tostring(values[0]), p = pack(tostring(keys[i]), values[i + 1])
| summarize b = make_bag(p) by Category
)
| project-away values
| evaluate bag_unpack(b)
Result:
| Category | ObjectCount | LinkCount | DurationInUs |
|--------------------|-------------|-----------|--------------|
| ChangeEnumeration | 88 | 9 | 346194 |
| ModifyTargetInLive | 3 | 6 | 595903 |
I'm looking to get the count of query param usage from the query string from page views stored in app insights using KQL. My query currently looks like:
pageViews
| project parsed=parseurl(url)
| project keys=bag_keys(parsed["Query Parameters"])
and the results are one row per page view, each row holding the list of query-parameter keys for that URL (result screenshots omitted).
I'm looking to get the count of each key across all URLs, in order to answer the question "How many times does page appear in the query string?". So the results might look like:
Page | From | ...
1000 | 67 | ...
Thanks in advance
you could try something along the following lines:
datatable(url:string)
[
"https://a.b.c/d?p1=hello&p2=world",
"https://a.b.c/d?p2=world&p3=foo&p4=bar"
]
| project parsed = parseurl(url)
| project keys = bag_keys(parsed["Query Parameters"])
| mv-expand key = ['keys'] to typeof(string)
| summarize count() by key
which returns:
| key | count_ |
|-----|--------|
| p1  | 1      |
| p2  | 2      |
| p3  | 1      |
| p4  | 1      |
I have a large dataset (a data.table with approx. 9 million rows) with a column that I would like to use to aggregate values (min, max, etc.). The column is a combination of various other columns and has a string-based format, like the one below:
string <- "318XXXX | VNSGN | BIER"
To gain some speed in performing tasks, I would like to recode this to a unique integer. Another application that I use on a regular basis has a built-in function that transforms a string like the one above into an integer (e.g. 73823). I was wondering whether there is a similar function in R? The idea is that a particular string always results in the same integer; this would allow it to be used in merging data.tables, etc.
Here is a little example of the data.table column that I would like to encode as simple integer values:
sample <- c("318XXXX | VNSGN | BIER", "462XXXX | TZZZH | 9905", "462XXXX | TZZZH | 9905",
"462XXXX | TZZZH | 9905", "511XXXX | FAWOR | 336H", "511XXXX | FAWOR | 336H",
"652XXXX | XXXXR | T136", "652XXXX | XXXXR | T136", "672XXXX | BQQSZ | 7777",
"672XXXX | BQQSZ | 7777")
I am hoping to encode the strings into an additional column of the table, like the one below; note that identical strings result in the same number.
String Number
318XXXX | VNSGN | BIER 19872
462XXXX | TZZZH | 9905 78392
462XXXX | TZZZH | 9905 78392
462XXXX | TZZZH | 9905 78392
511XXXX | FAWOR | 336H 23053
511XXXX | FAWOR | 336H 23053
652XXXX | XXXXR | T136 95832
652XXXX | XXXXR | T136 95832
672XXXX | BQQSZ | 7777 71829
672XXXX | BQQSZ | 7777 71829
The data.table package will create indexes for you without making you handle them explicitly, so it would be less work than the approach in the question. See the setkey function in data.table.
Also, the sqldf package can use the SQL CREATE INDEX statement, as per Examples 4h and 4i on the sqldf home page, as can just about any database package.
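For illustration, a minimal sketch of the setkey approach, together with one assumed way (data.table's frank with dense ranking) to materialise an explicit integer id; the object name dt and the resulting id values are illustrative, not the ones shown in the question:
library(data.table)

dt <- data.table(string = c("318XXXX | VNSGN | BIER", "462XXXX | TZZZH | 9905",
                            "462XXXX | TZZZH | 9905", "511XXXX | FAWOR | 336H"))

# Key the table on the string column; data.table builds an index internally,
# so joins and aggregations on 'string' are fast without a manual integer code.
setkey(dt, string)

# If an explicit integer id is still wanted, a dense rank assigns the same
# integer to identical strings.
dt[, id := frank(string, ties.method = "dense")]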