Definition of boolean_expression in jq

According to the manual select needs a parameter boolean_expression.
I always wonder what exactly is meant by this in jq.
To take full advantage of the select filter, it would be nice to have a clear definition.
Can someone give this missing precise definition?
The following collection of unusual examples looks a bit strange and counterintuitive to me:
jq -n '1,2 | select(null)' outputs nothing
jq -n '1,2 | select(empty)' outputs nothing
jq -n '1,2 | select(42)' outputs 1 2
jq -n '1,2 | select(-1.23)' outputs 1 2
jq -n '1,2 | select({a:"strange"})' outputs 1 2
jq -n '1,0,-1,null,false,42 | select(.)' outputs 1 0 -1 42
It seems to me that everything that is not false and not null is considered true.
In the examples, the constants are to be understood as placeholders for the result of an arbitrary expression.

Yes, null and false are indeed considered falsy; all other values are truthy. This notion is (somewhat unfortunately) explained in the if-then-else section of the manual.
Therefore jq -n '1,2 | select(null)' will produce nothing, as would jq -n '1,2 | select(false)'.
In the case of jq -n '1,2 | select(empty)', empty produces no result at all, so the condition never yields a value and there is nothing to output.
All other cases are truthy, therefore the input is propagated.
Note that none of your examples considers the actual input for evaluation. All selects have a constant argument.
To filter based on the input, the argument of select has to somehow process it (as opposed to constants which simply ignore it), e.g. jq -n '1,2 | select(.%2 == 0)' outputs just 2.
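To check the truthiness rule on your own installation, a minimal sketch could look like the following. It assumes jq 1.5 or later is on PATH, and the variable names are only for illustration. The second command shows a corollary of select(f) being defined as "if f then . else empty end": when the condition emits several results, the input is emitted once per truthy result.

```shell
# Assumption: jq 1.5 or later is available on PATH.
# Only null and false are falsy; 0, "", [] and {} all count as truthy.
truthy=$(jq -nc '[null, false, 0, "", [], {}][] | select(.)')
printf '%s\n' "$truthy"

# The condition emits three results, two of them truthy,
# so the input 1 is emitted twice.
repeated=$(jq -nc '1 | select(true, false, true)')
printf '%s\n' "$repeated"
```

So a "boolean_expression" is really any filter; select only looks at whether each of its results is something other than null or false.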

Related

Jq using contain with multiple match conditions

I have a JSON file with multiple object IDs and I need a query that excludes different IDs based on naming conventions. These are essentially ORs. I thought I had it with this query, but they are still appearing in the output.
If I run the query with them separately I can get it to work, but I need to add a large list.
Works
cat file.json | jq '.interface[] | select(.description | contains ("VLL") | not )'
Not working
cat file.json | jq '.interface[] | select(.description | contains ("VLL"|"2002089"|"otherstuff" ) | not )'
I've tried a few different ways with commas and quoting, but no luck.
Am I far off?
I also plan to run this in a bash script, if that helps (it probably makes things worse).
Thanks
"Am I far off?"
Not if you use test/1 instead of contains and make the corresponding adjustments:
.interface[]
| select(.description | test ("VLL|2002089|otherstuff" ) | not )
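The argument of test is interpreted as a regex, so | means alternation there. In the original query, the | inside contains(...) is jq's pipe operator, so "VLL"|"2002089"|"otherstuff" evaluates to just "otherstuff". There are of course alternatives, but if using a regex is appropriate, then test would be suitable.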
Blacklist of strings
If you have a blacklist of strings and want to use string equality as the criterion, consider:
["VLL","2002089","otherstuff"] as $blacklist
| .interface[]
| select(.description | IN($blacklist[]) | not)
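If you want substring matching (as with contains) rather than string equality or a regex, one possible sketch uses any/2 over the blacklist. The sample data below is hypothetical and merely stands in for file.json; jq 1.5 or later is assumed. The first command also shows what a pipe of string literals evaluates to.

```shell
# Assumption: jq 1.5 or later is available on PATH.
# A pipe of string literals yields only the last one.
piped=$(jq -nc '"VLL"|"2002089"|"otherstuff"')
printf '%s\n' "$piped"

# Hypothetical sample data standing in for file.json.
sample='{"interface":[{"description":"VLL-100"},{"description":"uplink 2002089"},{"description":"core"}]}'

# Keep an interface only if its description contains none of the blacklisted substrings.
kept=$(printf '%s' "$sample" | jq -c '
  ["VLL", "2002089", "otherstuff"] as $blacklist
  | .interface[]
  | select(.description as $d
           | any($blacklist[]; . as $s | $d | contains($s))
           | not)')
printf '%s\n' "$kept"
```

With this sample only the "core" interface should remain. The blacklist could also be passed in from a bash script with --argjson instead of being written inline.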

How to determine the last part of URL with jq?

I have to distinguish between the following two paths.
shorter: https://www.example.com/
longer: https://www.example.com/foo/
In a Bash script, using parameter expansion and cut as follows returns a value only for the longer one.
$ url1=https://www.example.com/
$ url2=https://www.example.com/foo/
$ cut -d/ -f4 <<<${url1%/*} # this returns nothing
>$
$ cut -d/ -f4 <<<${url2%/*} # this returns last part of path
>$ foo
So the longer one can be identified in a Bash script,
but now I have to define the same filter for a JSON value handled in jq.
If jq accepted something like the following, my goal would be achieved:
jq '. | select( .url | (cut -d/ -f4 <<< ${url2%/*})!=null) )'
But that is not possible. How can I do this?
jq has many string-handling functions -- one could do worse than checking the jq manual. For the task at hand, using a regex function would probably be best, but since you mentioned cut -d/ -f4, it might be of interest to note that much the same effect can be achieved by:
split("/")[3]
For the last non-trivial part you could consider:
sub("/ *$";"") | split("/")[-1]
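Putting this into a select, a minimal sketch might look like the following. The input objects with a "url" key are an assumption based on the question, and jq 1.5 or later with regex support is assumed. Note that split("/")[3] is an empty string for the shorter URL (because of the trailing slash) and would be null if there were no trailing slash at all, so the sketch treats both as "no path".

```shell
# Assumption: jq 1.5 or later (built with regex support) is available on PATH.
# Hypothetical input: one JSON object per line, each with a "url" key.
urls='{"url":"https://www.example.com/"}
{"url":"https://www.example.com/foo/"}'

# Keep only objects whose URL has a non-empty first path segment.
with_path=$(printf '%s\n' "$urls" |
  jq -c 'select((.url | split("/")[3] // "") != "")')
printf '%s\n' "$with_path"

# Last segment after dropping trailing slashes (a variation on the sub/split idea above).
last=$(printf '%s\n' "$urls" |
  jq -r '.url | sub("/+$"; "") | split("/")[-1]')
printf '%s\n' "$last"
```

For the shorter URL the last segment is the host name itself, so the first form is probably the better fit for telling the two apart.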

Divide the result of two grep and word count

I have a log file and I would like to divide the result of one grep and count by another grep and count.
$ echo $((cat log2.txt | grep timed\|error\|Error | wc -l)/(cat log2.txt | grep Duration | wc -l))
zsh: bad math expression: operator expected at `log2.txt |...'
It's ugly, doesn't work and I can probably do it in a better way but I don't know how.
Also, I would like to know if it is possible to do it incrementally on a log stream read by tail, for example.
First of all, note that grep | wc -l counts the number of matching lines, not the number of occurrences; I hope this is what you really want.
Regarding your requirement: your approach is indeed ugly (7 processes), apart from the mistakes. The job can be done with a single awk command:
awk '/timed|[Ee]rror/{a++}/Duration/{b++}END{printf "%.2f\n",a/b}' log2.txt
The above line calculates the result based on matched number of lines, same as your grep|wc -l.
You have several problems:
You are trying to run shell commands directly inside an arithmetic expression.
You aren't passing the correct regular expression to grep.
You need to make sure at least one of the operands is a floating-point value to trigger zsh's floating-point division.
Each pipeline can also be reduced to a single command; use input redirection instead of cat, and use the -c option to get the number of lines that match the regular expression.
echo $(( 1.0 * $(grep -c 'timed\|error\|Error' log2.txt) / $(grep -c Duration log2.txt) ))
Basic regular expressions treat unescaped | as a literal character, not an alternation operator.
$ echo foo | grep foo\|bar
$ echo foo | grep foo\\\|bar # Pass a literal backslash as part of the regex
foo
$ echo foo | grep 'foo\|bar' # Use '...' instead of explicitly escaping \ and |
foo
$ echo foo | grep -E 'foo|bar' # Use extended regular expressions instead
foo
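The floating-point arithmetic above is a zsh feature; bash's $(( )) is integer-only. A portable sketch, under the assumption that awk is acceptable for the division, could look like this. The sample log is made up and only stands in for log2.txt.

```shell
# Hypothetical sample log standing in for log2.txt.
log=$(mktemp)
printf '%s\n' \
  'request timed out' \
  'Duration: 5 ms' \
  'Error: boom' \
  'Duration: 7 ms' \
  'ok' \
  'Duration: 1 ms' \
  'Duration: 2 ms' > "$log"

# Count matching lines with grep -c; -E enables alternation with a plain |.
errors=$(grep -c -E 'timed|[Ee]rror' "$log")
durations=$(grep -c 'Duration' "$log")

# Let awk do the division so that no shell floating-point support is needed.
ratio=$(awk -v a="$errors" -v b="$durations" \
  'BEGIN { if (b == 0) print "n/a"; else printf "%.2f\n", a / b }')
echo "$errors / $durations = $ratio"
rm -f "$log"
```

The b == 0 check avoids a division-by-zero error when no Duration line is present.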

When is it allowed to omit the dot filter in jq?

I do not understand when it is allowed to omit the dot expression.
It is possible to convert every line of raw input into a JSON string:
$ echo -e "a\nb" | jq -Rc .
"a"
"b"
In that example, it makes no difference when the dot expression is missing:
$ echo -e "a\nb" | jq -Rc
"a"
"b"
Next I can read the output from the first jq and slurp it into an array:
$ echo -e "a\nb" | jq -Rc . | jq -sc .
["a","b"]
Here it also makes no difference when I omit the dot expression:
$ echo -e "a\nb" | jq -Rc . | jq -sc
["a","b"]
But when I omit both dot expressions, I get a usage error and an empty array as the result:
$ echo -e "a\nb" | jq -Rc | jq -sc
jq - commandline JSON processor [version 1.5]
Usage: jq [options] <jq filter> [file...]
...
[]
Why?
Before directly answering the question, I'd like to clarify that:
It is always acceptable to specify a filter explicitly.
Some versions of jq expect that a filter will be specified explicitly.
Different versions of jq behave differently in the absence of an explicit filter.
The main idea guiding jq's evolution with regard to interpreting the absence of a filter intelligently has been this: if there is something to read on STDIN, no filter has been specified explicitly, and it looks like you meant ., then jq assumes that you did mean the identity filter.
The answer to the question, then, is that the perplexing behavior noted in the question is a bug in a particular version of jq.
(Or if you like, the perplexing behavior reflects the difficulties that arise when developers seek to endow software with the ability to read your mind.)
By the way, the bug has been fixed:
$ jq --version
jq-1.5rc2-150-g1740fd0
$ echo -e "a\nb" | jq -Rc | jq -sc
["a","b"]
The answer is in the rest of the usage text:
Usage: jq [options] <jq filter> [file...]
So a filter is formally mandatory. A filter takes an input and produces an output, but often you do not need to transform anything and just want the input printed, so . was made the default (I believe this was introduced in 1.5; before that, you had to include the filter).
So omitting the filter should behave the same as writing ., but unfortunately the outcome depends on how jq checks stdin and stdout when they are pipes. You can read the details in the GitHub issue:
Maybe we should print the usage message only when the program is empty, and stdin and stdout are both terminals? That is, assume . when stdin is not a terminal or when stdout is not a terminal.
So the rule is:
If you want to be thorough, always specify a filter, even if . is the filter you want.
If you want the output of your command to be the input of another command in a pipeline, you must specify the filter, even if you just want the input passed through unchanged.
For the same reason,
echo -e "a\nb" | jq -Rc > test.txt will produce an error, but echo -e "a\nb" | jq -Rc . > test.txt will write the result of the command into the file.
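To get a feel for the terminal check proposed in the quoted issue, here is a rough shell sketch. The function default_filter_assumed is a hypothetical helper written for illustration; it is not part of jq and does not claim to mirror jq's actual implementation. It only applies the quoted rule with the shell's own [ -t ] test.

```shell
# Hypothetical helper, not part of jq: applies the rule quoted above
# ("assume . when stdin is not a terminal or when stdout is not a terminal").
default_filter_assumed() {
  if [ -t 0 ] && [ -t 1 ]; then
    # Both ends are terminals: nothing to read, so a usage message makes sense.
    echo "no: print usage"
  else
    echo "yes: assume ."
  fi
}

# stdin is a pipe here, so the rule would assume the identity filter.
printf 'a\n' | default_filter_assumed
```

Whatever a given jq version does internally, writing the . explicitly sidesteps the question.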

unix awk repeated pattern

I have the following variable. I want to search it for the pattern "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/".
export str='16/02/02 11:29:22 INFO mortbay.log: State being saved: {"#class":"com.paypal.fpti.hadoop.copy.FPTICopyState","timestamp":0,"state":"Running","name":"com.paypal.fpti.hadoop.copy.FPTICopyState","id":"99c7cba7-d211-4845-97a1-c34168a91b22","subStates":{"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-4_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"99034acb-cfad-41a0-89ed-e2731b1f82ec","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-4","sourceDir":"/fpti/v2/hdfs_writer_4//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"40325dec-0fe2-4025-8258-f896f957ddf0","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data","sourceDir":"/fpti/v2/hdfs_writer//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-1_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5216f8c1-2cfa-4eac-a390-f4d2bcd6584f","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-1","sourceDir":"/fpti/v2/hdfs_writer_1//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-2_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState
","id":"5fcd0b6e-3df9-4f82-a76f-bc8ff1493623","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-2","sourceDir":"/fpti/v2/hdfs_writer_2//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-3_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"6ec9223a-fcf0-447a-b9ae-2020e3232f6d","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-3","sourceDir":"/fpti/v2/hdfs_writer_3//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-5_2016/02/02/10/":{"#class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"d123742c-8a55-4e25-bfa0-0a97f6ed25d7","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-5","sourceDir":"/fpti/v2/hdfs_writer_5//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"}},"copystate":"CopyToLocalDone","start":"2016-02-02T11:21:24.678Z","end":null,"window":"2016-02-02T10:00:00.000Z","retryCount":0}'
I tried the following, but it returns only the first occurrence:
[ggangadharan#phxbastion2 ~]$ echo $str | awk '{match($0, "/x/home[/,a-z,0-9,_]+*", a)}END{print a[0]}'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
But I want output like this:
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
Can somebody help me use awk for this scenario?
Thanks in advance.
I'm not sure how to do this in awk, but you can use grep -E (formerly egrep) with -o here:
$ echo "$str" | grep -Eo '/x/home[/a-z0-9_]+'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
Using "significant splitting" in AWK:
$ awk -v RS="\"" '/\/x\/home\/pp_dt_fpti_batch\/stampy_copy_orchestration\//' <<< "$str"
which gives
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//
You specified /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/ for your search pattern, so I used that. If you want something different, use something different.
This separates input into records by a quote " (set RS to ", escaped in the shell). Any record matching the regular expression is printed. Input is given from the shell with the string $str. Maybe this is more readable:
$ awk -v RS='"' '/regexp/' <<< "$str"
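To see the record splitting on something smaller than the full log line, here is a minimal sketch with a made-up sample string (the keys and paths are placeholders, not taken from the question). It pipes the input with printf instead of a here-string so that it also runs in plain sh.

```shell
# Hypothetical, shortened sample in the same shape as $str.
sample='{"a":{"localDir":"/x/home/one//"},"b":{"sourceDir":"/fpti/v2/"},"c":{"localDir":"/x/home/two//"}}'

# With RS set to a double quote, every quoted JSON string becomes its own record;
# the pattern then prints only the records that start with /x/home/.
paths=$(printf '%s\n' "$sample" | awk -v RS='"' '/^\/x\/home\//')
printf '%s\n' "$paths"
```

Anchoring the pattern with ^ is a small tightening over the command above: it avoids printing a record that merely mentions the path somewhere in the middle.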
Here are two approaches using a JSON-aware command-line tool, in this case jq. In both cases we assume that the string of interest is embedded in the JSON object contained in $str.
(1) In the following, we simply pretty-print the JSON object and grep for
the string of interest in case it appears in a surprising spot. Further trimming of the result can easily be done (e.g. using sed) as desired:
$ sed 's/^[^{]*//' <<< "$str" | jq '.[]' | fgrep /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"
(2) The following query is appropriate if we are only interested in a
match if it occurs in an object as a value corresponding to the key "localDir":
sed 's/^[^{]*//' <<< "$str" |
jq -r '..
| select(.localDir?)
| .localDir
| select(test("/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"))'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//

Resources