I am using jq to search for specific results in a large file. I do not care for duplicate entries matching this specific condition, and it takes a while to process the whole file. What I would like to do is print some details about the first match and then terminate the jq command on the file to save time.
I.e.
jq '. | if ... then "print something; exit jq" else ... end'
I looked into http://stedolan.github.io/jq/manual/?#Breakingoutofcontrolstructures but this didn't quite seem to apply.
EDIT:
The file I am parsing contains multiple json objects, one after another. They are not in an array.
Here is an approach which uses a recent version of first/1 (currently in master)
def first(g): label $out | g | ., break $out;
first(inputs | if .=="100" then . else empty end)
Example:
$ seq 1000000000 | jq -M -Rn -f filter.jq
Output (followed by immediate termination)
"100"
Here I use seq in lieu of a large JSON dataset.
What is requested is possible using features that were added after the release of jq 1.4 (they are available in jq 1.5 and later). The following uses foreach and inputs:
label $top
| foreach inputs as $line
    # state: true means found; false means not yet found
    (false;
     if . then break $top
     else if $line | tostring | test("goodbye") then true else false end
     end;
     if . then $line else empty end
    )
Example:
$ cat << EOF | jq -n -f exit.jq
1
"goodbye"
3
4
EOF
Result:
"goodbye"
You can accomplish it using halt and the inputs builtin:
jq -n 'inputs | if ... then "something", halt else ... end'
This prints "something" and terminates gracefully when the condition matches.
For this to work (i.e. terminate when condition is true), jq needs the -n parameter. See this issue
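As a minimal sketch of the halt approach with the placeholders filled in: the function name first_goodbye, the condition (. == "goodbye") and the message are illustrative assumptions, and it presumes a jq that provides halt (jq 1.6 or later, as far as I know).

```shell
# Hypothetical helper: read JSON values from stdin, report the first
# value equal to "goodbye", and stop reading the remaining input.
# Assumes a jq version that provides the halt builtin.
first_goodbye() {
  jq -n 'inputs | if . == "goodbye" then "found goodbye", halt else empty end'
}

# Example use (left as a comment so the snippet itself does not require jq):
#   printf '1\n"goodbye"\n3\n' | first_goodbye
```

The else branch uses empty so that non-matching inputs produce no output at all.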
Related
I have written several reductions where I had an array to begin with. But when I read raw data and transform each line into an object, I have no luck reducing the objects together:
echo -e "1\n2\n\n\n3\n4\n5" | jq --raw-input '. | select (. != "") | {(.):123} | reduce . as $i ({}; . + $i)'
The reduction does nothing. Why? How can I correct it to produce a single object with the keys 1, 2, 3, 4, 5?
First, the initial .| is unnecessary.
Second, since your input is a stream, you will either need to use the -s option, or better, use the -n option with inputs.
So you could go with:
echo -e "1\n2\n\n\n3\n4\n5" |
jq -nR 'reduce (inputs|select(. != "")) as $i ({}; . + {($i): 123})'
though maybe {($i): null} might be more appropriate.
You were almost there. To convert from multiple results to a single object, you can run another jq in slurp mode:
echo -e "1\n2\n\n\n3\n4\n5" \
| jq --raw-input 'select (. != "") | {(.):123}' \
| jq --slurp 'reduce .[] as $o ({}; . + $o)'
I've defined a variable inside a shell script and I want to use it. For some reason, I cannot pass it into the command line that I need it in.
Here's my script, which fails at the last lines:
#! /usr//bin/tcsh -f
if ( $# != 2 ) then
echo "Usage: jump_sorter.sh <jump> <field to sort on>"
exit;
endif
set a = `cat $1 | tail -1` #prepares last row for check with loop
set b = $2 #this is the value last row will be checked for
set counter = 0
foreach i ($a)
if ($i == "$b") then
set bingo = $counter
echo "$bingo is the field to print from $a"
endif
set counter = `expr $counter + 1`
end
echo $bingo #this prints the correct value for using in the command below
cat $1 | awk '{print($bingo)}' | sort | uniq -c | sort -nr #but this doesn't work.
#when I use $9 instead of $bingo, it does work.
How can I pass $bingo into the final line correctly, please?
Update: following the accepted answer from Martin Tournoij, the correct way to handle the "$" sign in the command is:
cat $1 | awk "{print("\$"$bingo)}" | sort | uniq -c | sort -nr
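It doesn't work because variables are only substituted inside double quotes ("), not single quotes ('), and you're using single quotes:
cat $1 | awk '{print($bingo)}' | sort | uniq -c | sort -nr
With double quotes, the shell substitutes $bingo:
cat $1 | awk "{print($bingo)}" | sort | uniq -c | sort -nr
Note that awk then receives the bare number (e.g. print(9)). To print that field, awk also needs a literal $ in front of the number, which is what the "\$" in the update to the question adds.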
You also have an error here:
#! /usr//bin/tcsh -f
That should be:
#!/usr/bin/tcsh -f
Note that csh isn't usually recommended for scripting; it has many quirks and lacks some features like functions. Unless you really need to use csh, it's recommended to use a Bourne shell (/bin/sh, bash, zsh) or a scripting language (Python, Ruby, etc.) instead.
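As a sketch of the Bourne shell route, the quoting problem can be sidestepped by handing the field number to awk with its -v option, so the awk program stays in single quotes. The function name print_field and the awk variable col are illustrative assumptions, not part of the original script.

```shell
# Hypothetical helper: print one whitespace-separated field of every line.
#   $1 = file name
#   $2 = 1-based field number (plays the role of $bingo in the script above)
print_field() {
  # -v copies the shell value into the awk variable "col";
  # inside awk, $col then means "the field whose number is col".
  awk -v col="$2" '{ print $col }' "$1"
}

# Example use, mirroring the last line of the script:
#   print_field "$1" "$bingo" | sort | uniq -c | sort -nr
```

Because the awk program is single-quoted, the shell never touches the $ signs inside it.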
As the title says, I wonder when jq exits with code 1. Its manual says that -e sets the exit status of jq to 0 if the last output values was neither false nor null, 1 if the last output value was either false or null. It is not clear to me what "the last output value was false or null" means. And what happens if I don't use -e?
The following is intended to supplement the information given about the -e option elsewhere on this page.
Assuming -e has NOT been specified, the return code is:
5 if a call to error/1 or halt_error/0 causes program termination
an integral value (*) depending on N if halt_error(N) causes program termination, where N is a number; in particular, if N is non-negative, then the status is set to (N%256).
Otherwise, but still assuming -e has NOT been specified, the return code is:
2 if a parsing error reading input causes program termination
3 or 4 if a syntax error in the jq program causes program termination
0 on normal termination.
(*) Specifically:
N % 256 | if . < 0 then 256+. else . end
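The same arithmetic can be tried out in a Bourne shell. This is only a sketch of the formula quoted above, assuming N is an integer; the function name expected_halt_status is made up for illustration and does not call jq.

```shell
# Hypothetical helper: predict the exit status for halt_error(N) using
#   N % 256 | if . < 0 then 256+. else . end
# Assumes N is an integer.
expected_halt_status() {
  status=$(( $1 % 256 ))
  # Shell arithmetic keeps the sign of the dividend, so shift negative
  # remainders into the 0..255 range.
  if [ "$status" -lt 0 ]; then
    status=$(( status + 256 ))
  fi
  echo "$status"
}

expected_halt_status 3     # prints 3
expected_halt_status 257   # prints 1
expected_halt_status -1    # prints 255
```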
This answer is based on current jq 1.6 source code from https://github.com/stedolan/jq
With --exit-status (-e), there are 6 possible exit codes:
0: jq output something and last line was neither false nor null
1: last line output was false or null
2: usage problem or system error
3: jq program compile error
4: jq didn't output anything at all
5: unknown (unexpected) error: any error other than 2 and 3
Without --exit-status (-e), 0 just means that jq ran successfully. Additionally, exit status 1 and 4 disappear and 0 is returned instead.
Here are some ways (in a Unix Bourne shell) to get 1 as an exit value:
$ echo false | jq -e .
false
$ echo '{ "foo": false }' | jq -e .foo
false
$ echo null | jq -e .
null
$ echo '{ "foo": null }' | jq -e .foo
null
$ echo '{ }' | jq -e .foo
null
$ echo '{ "foo": false }' | jq -e '.bar?'
null
Here is how to get 4:
$ echo 'false' | jq -e '.foo?'
And (I'm sure you want to know) here is one way to get 5:
$ echo false | jq .foo
jq: error (at <stdin>:1): Cannot index boolean with string "foo"
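In a script, the status is usually inspected through $? right after the jq call. The following sketch maps the six codes listed in this answer (for jq run with -e) to short messages; the function name describe_jq_status and the message wording are illustrative assumptions.

```shell
# Hypothetical helper: translate an exit status of "jq -e" into a message,
# following the list of codes given in this answer.
describe_jq_status() {
  case "$1" in
    0) echo "last output was neither false nor null" ;;
    1) echo "last output was false or null" ;;
    2) echo "usage problem or system error" ;;
    3) echo "jq program compile error" ;;
    4) echo "no output at all" ;;
    5) echo "other error" ;;
    *) echo "unexpected status: $1" ;;
  esac
}

# Example use (left as a comment so the snippet itself does not require jq):
#   echo '{ }' | jq -e .foo; describe_jq_status "$?"
```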
I am trying to use jq 1.5 to develop a script that can take one or more user inputs that represent a key and recursively remove them from JSON input.
The JSON I am referencing is here:
https://github.com/EmersonElectricCo/fsf/blob/master/docs/Test.json
My script, which seems to work pretty well, is as follows.
def post_recurse(f):
def r:
(f | select(. != null) | r), .;
r;
def post_recurse:
post_recurse(.[]?);
(post_recurse | objects) |= del(.META_BASIC_INFO)
However, I would like to replace META_BASIC_INFO with one or more user inputs. How would I go about accomplishing this? I presume with --arg from the command line, but I am unclear on how to incorporate this into my .jq script.
I've tried replacing del(.META_BASIC_INFO) with del(.$module) and invoking with cat test.json | ./jq -f fsf_key_filter.jq --arg module META_BASIC_INFO to test but this does not work.
Any guidance on this is greatly appreciated!
ANSWER:
Based on a couple of suggestions, I was able to arrive at the following, which works and uses jq.
Invocation:
cat test.json | jq --argjson delete '["META_BASIC_INFO","SCAN_YARA"]' -f fsf_module_filter.jq
Code:
def post_recurse(f):
def r:
(f | select(. != null) | r), .;
r;
def post_recurse:
post_recurse(.[]?);
(post_recurse | objects) |= reduce $delete[] as $d (.; delpaths([[ $d ]]))
It seems the name module is a keyword in 1.5, so $module will result in a syntax error. You should use a different name. There are also builtins that do the recursion for you; consider using them instead of writing your own.
$ jq '(.. | objects | select(has($a))) |= del(.[$a])' --arg a "META_BASIC_INFO" Test.json
You could also use delpaths/1. For example:
$ jq -n '{"a":1, "b": 1} | delpaths([["a"]])'
{
"b": 1
}
That is, modifying your program so that the last line reads like this:
(post_recurse | objects) |= delpaths([[ $delete ]] )
you would invoke jq like so:
$ jq --arg delete "META_BASIC_INFO" -f delete.jq input.json
(One cannot use --arg module ... as "$module" has some kind of reserved status.)
Here's a "one-line" solution using walk/1:
jq --arg d "META_BASIC_INFO" 'walk(if type == "object" then del(.[$d]) else . end)' input.json
If walk/1 is not in your jq, here is its definition:
# Apply f to composite entities recursively, and to atoms
def walk(f):
. as $in
| if type == "object" then
reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | walk(f)) } ) | f
elif type == "array" then map( walk(f) ) | f
else f
end;
If you want to recursively delete a bunch of key-value pairs, then here's one approach using --argjson:
rdelete.jq:
def rdelete(key):
walk(if type == "object" then del(.[key]) else . end);
reduce $strings[] as $s (.; rdelete($s))
Invocation:
$ jq --argjson strings '["a","b"]' -f rdelete.jq input.json
I have a fixed-width-field file which I'm trying to sort using the UNIX (Cygwin, in my case) sort utility.
The problem is there is a two-line header at the top of the file which is being sorted to the bottom of the file (as each header line begins with a colon).
Is there a way to tell sort either "pass the first two lines across unsorted" or to specify an ordering which sorts the colon lines to the top? The remaining lines always start with a 6-digit number (which is actually the key I'm sorting on), if that helps.
Example:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00
should sort to:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00
(head -n 2 <file> && tail -n +3 <file> | sort) > newfile
The parentheses create a subshell, wrapping up the stdout so you can pipe it or redirect it as if it had come from a single command.
If you don't mind using awk, you can take advantage of awk's built-in pipe abilities, e.g.
extract_data | awk 'NR<3{print $0;next}{print $0| "sort -r"}'
This prints the first two lines verbatim and pipes the rest through sort.
Note that this has the very specific advantage of being able to selectively sort parts of a piped input. Methods that read the input twice (head followed by tail) only work on plain files, which can be read multiple times. This works on anything.
In simple cases, sed can do the job elegantly:
your_script | (sed -u 1q; sort)
or equivalently,
cat your_data | (sed -u 1q; sort)
The key is in the 1q -- print first line (header) and quit (leaving the rest of the input to sort).
For the example given, 2q will do the trick.
The -u switch (unbuffered) is required for those seds (notably, GNU's) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.
Here is a version that works on piped data:
(read -r; printf "%s\n" "$REPLY"; sort)
If your header has multiple lines:
(for i in $(seq $HEADER_ROWS); do read -r; printf "%s\n" "$REPLY"; done; sort)
This solution is from here
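The read-based idea can be wrapped in a small function that takes the number of header lines as its first argument and passes everything else to sort. This is a sketch under stated assumptions: the name sort_keep_header is made up, and IFS= is added so that leading whitespace in header lines is preserved.

```shell
# Hypothetical helper: copy the first N lines of stdin through unchanged,
# then sort the rest. Usage: sort_keep_header N [sort-options]
sort_keep_header() {
  n=$1
  shift
  i=0
  # The counter is tested before read, so exactly N lines are consumed
  # here and the rest of the input is left for sort.
  while [ "$i" -lt "$n" ] && IFS= read -r line; do
    printf '%s\n' "$line"
    i=$((i + 1))
  done
  sort "$@"
}

# Example with a two-line header taken from the question:
printf ':0:12345\n:1:6:2:3:8:4:2\n500123TSTMY_RADAR00\n010005TSTDOG_FOOD01\n' |
  sort_keep_header 2
```

Since the shell's read consumes only what it needs from a pipe, this works on piped data as well as on redirected files.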
You can use tail -n +3 <file> | sort ... (tail will output the file contents from the 3rd line).
head -2 <your_file> && nawk 'NR>2' <your_file> | sort
example:
> cat temp
10
8
1
2
3
4
5
> head -2 temp && nawk 'NR>2' temp | sort -r
10
8
5
4
3
2
1
It only takes 2 lines of code...
head -1 test.txt > a.tmp;
tail -n+2 test.txt | sort -n >> a.tmp;
For numeric data, -n is required. For an alphabetical sort, -n is not required.
Example file:
$ cat test.txt
header
8
5
100
1
-1
Result:
$ cat a.tmp
header
-1
1
5
8
100
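To see what -n changes, the data rows of the example can be sorted both ways. This is just an illustration; LC_ALL=C is an added assumption that pins the non-numeric case to plain byte order so the result does not depend on the locale.

```shell
# Without -n the lines are compared as text, so "100" sorts before "5".
printf '8\n5\n100\n1\n-1\n' | LC_ALL=C sort      # -1, 1, 100, 5, 8

# With -n the lines are compared by numeric value.
printf '8\n5\n100\n1\n-1\n' | LC_ALL=C sort -n   # -1, 1, 5, 8, 100
```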
So here's a bash function whose arguments are exactly like those of sort. It supports both files and pipes.
function skip_header_sort() {
    if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
        local file=${@: -1}
        set -- "${@:1:$(($#-1))}"
    fi
    awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
}
How it works: this line checks whether there is at least one argument and whether the last argument is a file.
if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
This saves the file name in a separate variable, since we're about to erase the last argument.
local file=${@: -1}
Here we remove the last argument, since we don't want to pass it as a sort argument.
set -- "${@:1:$(($#-1))}"
Finally, we do the awk part, passing the arguments (minus the last argument if it was the file) to sort in awk. This was originally suggested by Dave, and modified to take sort arguments. We rely on the fact that $file will be empty if we're piping, and thus ignored.
awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
Example usage with a comma separated file.
$ cat /tmp/test
A,B,C
0,1,2
1,2,0
2,0,1
# SORT NUMERICALLY SECOND COLUMN
$ skip_header_sort -t, -nk2 /tmp/test
A,B,C
2,0,1
0,1,2
1,2,0
# SORT REVERSE NUMERICALLY THIRD COLUMN
$ cat /tmp/test | skip_header_sort -t, -nrk3
A,B,C
0,1,2
2,0,1
1,2,0
Here's a bash shell function derived from the other answers. It handles both files and pipes. First argument is the file name or '-' for stdin. Remaining arguments are passed to sort. A couple examples:
$ hsort myfile.txt
$ head -n 100 myfile.txt | hsort -
$ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r
The shell function:
hsort ()
{
    if [ "$1" == "-h" ]; then
        echo "Sort a file or standard input, treating the first line as a header.";
        echo "The first argument is the file or '-' for standard input. Additional";
        echo "arguments to sort follow the first argument, including other files.";
        echo "File syntax : $ hsort file [sort-options] [file...]";
        echo "STDIN syntax: $ hsort - [sort-options] [file...]";
        return 0;
    elif [ -f "$1" ]; then
        local file=$1;
        shift;
        # Quote the file name and pass sort options through unchanged
        (head -n 1 "$file" && tail -n +2 "$file" | sort "$@");
    elif [ "$1" == "-" ]; then
        shift;
        (read -r; printf "%s\n" "$REPLY"; sort "$@");
    else
        >&2 echo "Error. File not found: $1";
        >&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'";
        return 1 ;
    fi
}
This is the same as Ian Sherbin's answer, but my implementation is:
cut -d'|' -f3,4,7 $arg1 | uniq > filetmp.tc
head -1 filetmp.tc > file.tc;
tail -n+2 filetmp.tc | sort -t"|" -k2,2 >> file.tc;
Another simple variation on all the others, reading the file only once:
HEADER_LINES=2
(head -n $HEADER_LINES; sort) < data-file.dat
With Python:
import sys
HEADER_ROWS=2
for _ in range(HEADER_ROWS):
    sys.stdout.write(next(sys.stdin))
for row in sorted(sys.stdin):
    sys.stdout.write(row)
cat file_name.txt | sed 1d | sort
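This sorts everything after the first line. Note that sed 1d deletes the header rather than passing it through, so the header does not appear in the output.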