Concise notation for JQ - jq

According to jq tutorial, end of ยง4.1, there is a compact notation for jq where the | operator (and the spaces surrounding it) can just be skipped.
On the SQuAD 2.0 devset (download link), the following expressions work:
$> cat src/dev-v2.0.json | jq ".data" | jq ".[]" | jq ".paragraphs" | jq ".[]" | jq ".qas" | jq ".[]" | jq ".question" | head
$> cat src/dev-v2.0.json | jq ".data | .[] | .paragraphs | .[] | .qas | .[] | .question" | head
But the "concise" notation doesn't:
$> cat src/dev-v2.0.json | jq ".data.[].paragraphs.[].qas.[].question" | head
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.data.[].paragraphs.[].qas.[].question
jq: 1 compile error
As the error hints, this may be an encoding issue, but with the quotes around the expression, this seems unlikely. I thus follow the jq recommendation to ask that at StackOverflow: What am I missing here?
I am using Ubuntu 20.04.3 LTS, jq-1.6, bash 5.0.17(1).

.[] accesses the members of the array (or object) in the context ..
.data | .[] accesses a field called data of the object in the context .. By piping it into the next filter, the accessed field becomes the new context ., so .[] accesses the members of the array (or object) in that context.
.data[] accesses the members of the array (or object) in a field called data of the object in the context ..
jq ".data[].paragraphs[].qas[].question" dev-v2.0.json | head
"In what country is Normandy located?"
"When were the Normans in Normandy?"
"From which countries did the Norse originate?"
"Who was the Norse leader?"
"What century did the Normans first gain their separate identity?"
"Who gave their name to Normandy in the 1000's and 1100's"
"What is France a region of?"
"Who did King Charles III swear fealty to?"
"When did the Frankish identity emerge?"
"Who was the duke in the battle of Hastings?"

The valid abbreviated form of .x|.[]|.y is .x[].y

Related

Definition of boolean_expression in jq

According to the manual select needs a parameter boolean_expression.
I always wonder what exactly is meant by this in jq.
To take full advantage of the select filter, it would be nice to have a clear definition.
Can someone give this missing precise definition?
The following collection of unusual examples looks a bit strange and counterintuitive to me:
jq -n '1,2 | select(null)' outputs nothing
jq -n '1,2 | select(empty)' outputs nothing
jq -n '1,2 | select(42)' outputs 1 2
jq -n '1,2 | select(-1.23)' outputs 1 2
jq -n '1,2 | select({a:"strange"})' outputs 1 2
jq -n '1,0,-1,null,false,42 | select(.)' outputs 1 0 -1 42
It seems to me that everything that is not false and not null is considered true.
In the examples, the constants are to understand as placeholders for the result of an arbitrary expression.
Yes, null and false are indeed considered falsy, other values as truthy. This notion is (somewhat unfortunately) explained in the if-then-else section of the manual.
Therefore jq -n '1,2 | select(null)' will produce nothing, as would jq -n '1,2 | select(false)'
In the case of jq -n '1,2 | select(empty)', the empty just eats up all the results, so there is nothing to output.
All other cases are truthy, therefore the input is propagated.
Note that none of your examples considers the actual input for evaluation. All selects have a constant argument.
To filter based on the input, the argument of select has to somehow process it (as opposed to constants which simply ignore it), e.g. jq -n '1,2 | select(.%2 == 0)' outputs just 2.

how to out timestamp with jq

I'm trying to out data with jq request
"{\"#timestamp\":\"2019-03-13T00:11:03.123Y\"
I'm typing: cat myfile.txt | jq fromjson.timestamp
But it's incorrect
The pipeline you're looking for is
jq 'fromjson | .["#timestamp"]'
You cannot use the .foo syntax to access fields with special characters such as "#".

Jq using contain with multiple match conditions

I have json file with multiple object Id's and I need a query that excludes different ids based on naming conventions. These are essentially OR's. I thought I had it with this query but they are still appearing in the output.
If I run the query with them separately I can get it to work, but I need to add a large list.
Works
cat file.json | jq '.interface[] | select(.description | contains ("VLL") | not )'
Not working
cat file.json | jq '.interface[] | select(.description | contains ("VLL"|"2002089"|"otherstuff" ) | not )'
Ive tried a few different ways with commas and quoting but no luck.
Am I far off?
I also plan to run this in bash script if that help(probably makes worse)
Thanks
Am I far off?
If you use test/1 instead of contains, and make corresponding adjustments, no:
.interface[]
| select(.description | test ("VLL|2002089|otherstuff" ) | not )
The argument of test is interpreted as a regex. There are of course alternatives, but if using a regex is appropriate, then test would be suitable.
Blacklist of strings
If you have a blacklist of strings and want to use string equality as the criterion, consider:
["VLL","2002089","otherstuff"] as $blacklist
| .interface[]
| select(.description | IN($blacklist[]) | not)

How to determine the last part of URL with jq?

I have to distinguish between the following two paths.
shorter: https://www.example.com/
longer: https://www.example.com/foo/
In Bash script, using Bash built-in literals as follows returns only longer one.
$ url1=https://www.example.com/
$ url2=https://www.example.com/foo/
$ cut -d/ -f4 <<<${url1%/*} # this returns nothing
>$
$ cut -d/ -f4 <<<${url2%/*} # this returns last part of path
>$ foo
So it could be identified longer one in Bash script,
but now I have to define same filter for JSON value handled in jq.
If jq can write like the following, my goal can be achieved...
jq '. | select( .url | (cut -d/ -f4 <<< ${url2%/*})!=null) )'
But can not do that. How can do that?
jq has many string-handling functions -- one could do worse than checking the jq manual. For the task at hand, using a regex function would probably be best, but since you mentioned cut -d/ -f4, it might be of interest to note that much the same effect can be achieved by:
split("/")[3]
For the last non-trivial part you could consider:
sub("/ *$";"") | split("/")[-1]

unix awk repeated pattern

I have the following variable. i want to search with pattern "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"
export str='16/02/02 11:29:22 INFO mortbay.log: State being saved: {"#class":"com.paypal.fpti.hadoop.copy.FPTICopyState","timestamp":0,"state":"Running","name":"com.paypal.fpti.hadoop.copy.FPTICopyState","id":"99c7cba7-d211-4845-97a1-c34168a91b22","subStates":{"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-4_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"99034acb-cfad-41a0-89ed-e2731b1f82ec","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-4","sourceDir":"/fpti/v2/hdfs_writer_4//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"40325dec-0fe2-4025-8258-f896f957ddf0","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data","sourceDir":"/fpti/v2/hdfs_writer//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-1_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5216f8c1-2cfa-4eac-a390-f4d2bcd6584f","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-1","sourceDir":"/fpti/v2/hdfs_writer_1//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-2_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5fcd0b6e-3df9-4f82-a76f-bc8ff1493623","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-2","sourceDir":"/fpti/v2/hdfs_writer_2//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-3_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"6ec9223a-fcf0-447a-b9ae-2020e3232f6d","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-3","sourceDir":"/fpti/v2/hdfs_writer_3//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-5_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"d123742c-8a55-4e25-bfa0-0a97f6ed25d7","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-5","sourceDir":"/fpti/v2/hdfs_writer_5//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"}},"copystate":"CopyToLocalDone","start":"2016-02-02T11:21:24.678Z","end":null,"window":"2016-02-02T10:00:00.000Z","retryCount":0}'
I tried like below it gives the first occurence alone
[ggangadharan#phxbastion2 ~]$ echo $str | awk '{match($0, "/x/home[/,a-z,0-9,_]+*", a)}END{print a[0]}'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
but i want output like below.
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
Can somebody help me how to use awk for this scenario?
thanks in advance
I'm not sure how to hack this in awk, but you can safely use egrep here:
$ echo $str | egrep -o /x/home[/,a-z,0-9,_]+*
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
Using "significant splitting" in AWK:
$ awk -v RS="\"" '/\/x\/home\/pp_dt_fpti_batch\/stampy_copy_orchestration\//' <<< "$str"
which gives
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
You specified /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/ for your search pattern, so I used that. If you want something different, use something different.
This separates input into records by a quote " (set RS to ", escaped in the shell). Any record matching the regular expression is printed. Input is given from the shell with the string $str. Maybe this is more readable:
$ awk -v RS='"' '/regexp/' <<< "$str"
Here are two approaches using a JSON-aware command-line tool, here jq.
In both cases we assume that the string of interest is embedded in the
JSON object contained in $str
(1) In the following, we simply pretty-print the JSON object and grep for
the string of interest in case it appears in a surprising spot. Further trimming of the result can easily be done (e.g. using sed) as desired:
$ sed 's/^[^{]*//' <<< "$str" | jq '.[]' | fgrep /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"
(2) The following query is appropriate if we are only interested in a
match if it occurs in an object as a value corresponding to the key "localDir":
sed 's/^[^{]*//' <<< "$str" |
jq -r '..
| select(.localDir?)
| .localDir
| select(test("/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"))'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//

Resources