Using jq how do I search against an array of strings? [duplicate]

Since an example is worth a thousand words, say I have the following JSON stream:
{"a": 0, "b": 1}
{"a": 2, "b": 2}
{"a": 7, "b": null}
{"a": 3, "b": 7}
How can I keep all the objects for which the .b property is one of [1, 7]? (In reality the list is much longer, so I don't want to write select(.b == 1 or .b == 7).) I'm looking for something like select(.b in [1, 7]), but I couldn't find it in the man page.

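Doing $value in $collection could be achieved using the pattern select($value == $collection[]). Note that this pattern emits the input once for every matching element of the collection. A more efficient alternative would be select(any($value == $collection[]; .)), which emits the input at most once and can stop at the first match. So your filter should be this: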
[1, 7] as $whitelist | select(any(.b == $whitelist[]; .))
Having the array in a variable has its benefits as it lets you change the whitelist easily using arguments.
$ jq --argjson whitelist '[2, 7]' 'select(any(.b == $whitelist[]; .))'
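As a quick check of the filter above, here is a minimal sketch that feeds the sample stream from the question through it. The shell variable names (stream, kept) are only illustrative, and -c is added just to get compact one-line output.

```shell
# Sample stream from the question, one JSON object per line.
stream='{"a": 0, "b": 1}
{"a": 2, "b": 2}
{"a": 7, "b": null}
{"a": 3, "b": 7}'

# Keep only the objects whose .b equals some element of $whitelist.
kept=$(printf '%s\n' "$stream" |
  jq -c --argjson whitelist '[1, 7]' 'select(any(.b == $whitelist[]; .))')

printf '%s\n' "$kept"
# {"a":0,"b":1}
# {"a":3,"b":7}
```

The object with "b": null is dropped because null is not equal to any number in the whitelist.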

The following approach using index/1 is similar to what was originally sought (".b in [1, 7]"), and might be noticeably faster than using .[] within select if the whitelist is large.
If your jq supports --argjson:
jq --argjson w '[1,7]' '. as $in | select($w | index($in.b))'
Otherwise:
jq --arg w '[1,7]' '. as $in | ($w|fromjson) as $w | select($w | index($in.b))'
or:
jq '. as $in | select([1, 7] | index($in.b))'
UPDATE
On Jan 30, 2017, a builtin named IN was added for efficiently testing whether a JSON entity is contained in a stream. It can also be used for efficiently testing membership in an array. For example, the above invocation with --argjson can be simplified to:
jq --argjson w '[1,7]' 'select( .b | IN($w[]) )'
If your jq does not have IN/1, then so long as your jq has first/1, you can use this equivalent definition:
def IN(s): . as $in | first(if (s == $in) then true else empty end) // false;
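To try the fallback definition without depending on whether your jq already ships IN, the sketch below uses the same body under a made-up name, my_in (a hypothetical name chosen here only to avoid any clash with a builtin). It assumes a jq that has first/1 and --argjson.

```shell
# Same body as the fallback definition above, under the illustrative name my_in.
filter='def my_in(s): . as $in | first(if (s == $in) then true else empty end) // false;
select(.b | my_in($w[]))'

matched=$(printf '%s\n' \
    '{"a": 0, "b": 1}' '{"a": 2, "b": 2}' '{"a": 7, "b": null}' '{"a": 3, "b": 7}' |
  jq -c --argjson w '[1,7]' "$filter")

printf '%s\n' "$matched"
# {"a":0,"b":1}
# {"a":3,"b":7}
```

When no element of the stream matches, first/1 produces nothing, so the // false alternative supplies the result and select drops the object.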

Related

Check if element is member of array [duplicate]

I have an array and I need to check whether an element exists in that array, or to get that element from the array, using jq.
fruit.json:
{
"fruit": [
"apple",
"orange",
"pomegranate",
"apricot",
"mango"
]
}
cat fruit.json | jq '.fruit .apple'
This does not work.
The semantics of 'contains' is not straightforward at all. In general, it would be better to use 'index' to test if an array has a specific value, e.g.
.fruit | index( "orange" )
However, if the item of interest is itself an array, the general form:
ARRAY | index( [ITEM] )
should be used, e.g.:
[1, [2], 3] | index( [[2]] ) #=> 1
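A small sketch of how index/1 behaves on the sample data, including the "not found" case. The variable names are illustrative only; "kiwi" is just a value that is not in the array.

```shell
fruit='{"fruit": ["apple", "orange", "pomegranate", "apricot", "mango"]}'

# index/1 yields the position of the first match, or null when there is none.
orange_pos=$(printf '%s\n' "$fruit" | jq '.fruit | index("orange")')
kiwi_pos=$(printf '%s\n' "$fruit" | jq '.fruit | index("kiwi")')

# A match at position 0 is still truthy in jq (only false and null are falsy),
# so the result of index/1 can be used directly as a condition.
apple_hit=$(printf '%s\n' "$fruit" |
  jq -r 'if (.fruit | index("apple")) then "yes" else "no" end')

echo "$orange_pos $kiwi_pos $apple_hit"
# 1 null yes
```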
IN/1
If your jq has IN/1 then a better solution is to use it:
.fruit as $f | "orange" | IN($f[])
If your jq has first/1 (as does jq 1.5), then here is a fast definition of IN/1 to use:
def IN(s): first((s == .) // empty) // false;
any(_;_)
Another efficient alternative that is sometimes more convenient is to use any/2, e.g.
any(.fruit[]; . == "orange")
or equivalently:
any(.fruit[] == "orange"; .)
To have jq exit with status 0 if the array fruit contains "apple", and with a non-zero status otherwise:
jq -e '.fruit|any(. == "apple")' fruit.json >/dev/null
To output the element(s) found, change to
jq -e '.fruit[]|select(. == "apple")' fruit.json
When searching for a fixed string this adds little, but it is useful when the select expression can match several different values, e.g. when it is a regexp.
To output only distinct values, pass the results to unique.
jq '[.fruit[]|select(match("^app"))]|unique' fruit.json
will search for all fruits starting with app, and output unique values. (Note that the original expression had to be wrapped in [] in order to be passed to unique.)
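The exit-status behaviour of -e mentioned above can be checked from a shell script. This is only a sketch: has_fruit is a hypothetical helper name, and the JSON is passed through a variable instead of fruit.json so that the example is self-contained.

```shell
fruit='{"fruit": ["apple", "orange", "pomegranate", "apricot", "mango"]}'

# has_fruit NAME: succeeds (exit status 0) only if NAME is an element of .fruit.
has_fruit() {
  printf '%s\n' "$fruit" |
    jq -e --arg name "$1" '.fruit | any(. == $name)' >/dev/null
}

if has_fruit apple; then apple_status=present; else apple_status=absent; fi
if has_fruit kiwi;  then kiwi_status=present;  else kiwi_status=absent;  fi

echo "apple: $apple_status, kiwi: $kiwi_status"
# apple: present, kiwi: absent
```

With -e, jq exits with status 1 when the last output is false or null, which is what makes the if tests work.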
[WARNING: SEE THE COMMENTS AND ALTERNATIVE ANSWERS.]
cat fruit.json | jq '.fruit | contains(["orange"])'
For future visitors, if you happen to have the array in a variable and want to check the input against it, and you have jq 1.5 (without IN), your best option is index but with a second variable:
.inputField as $inputValue | $storedArray|index($inputValue)
This is functionally equivalent to .inputField | IN($storedArray[]).
Expanding on the answers here, if you need to filter the array of fruit against another array of fruit, you could do something like this:
cat fruit.json | jq '[.fruit[] as $fruits | (["banana", "apple"] | contains([$fruits])) as $results | $fruits | select($results)]'
This will return an array containing only "apple" for the sample JSON above.
This modified sample worked for me:
jq -r '.fruit | index( "orange" )' fruit.json | tail -n 1
The tail -n 1 keeps only the last line of the output.
If the element exists, the command prints its index in the array (1 for "orange" in the sample above).
If it doesn't, the command prints null.

jq select elements with array not containing string

Now, this is somewhat similar to jq: select only an array which contains element A but not element B but it somehow doesn't work for me (which is likely my fault)... ;-)
So here's what we have:
[ {
"employeeType": "student",
"cn": "dc8aff1",
"uid": "dc8aff1",
"ou": [
"4210910",
"4210910 #Abg",
"4210910 Abgang",
"4240115",
"4240115 5",
"4240115 5\/5"
]
},
{
"employeeType": "student",
"cn": "160f656",
"uid": "160f656",
"ou": [
"4210910",
"4210910 3",
"4210910 3a"
] } ]
I'd like to select all elements where ou does not contain a specific string, say "4210910 3a" or - which would be even better - where ou does not contain any member of a given list of strings.
When a value may change between runs, you should pass it as a parameter to your filter rather than hardcoding it. Also, using contains might not work for you in general. It is applied recursively, so strings match on substrings, which might not be what you want.
For example:
["10", "20", "30", "40", "50"] | contains(["0"])
evaluates to true, because "0" is a substring of "10".
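To make the difference concrete, the sketch below runs the same membership question three ways on a shortened version of that array. The variable names are illustrative only.

```shell
# contains/1 compares strings by substring, so "0" is found inside "10".
by_contains=$(jq -n -c '["10", "20", "30"] | contains(["0"])')

# index/1 and any/2 compare whole elements, so "0" is not found.
by_index=$(jq -n -c '["10", "20", "30"] | index("0")')
by_any=$(jq -n -c '["10", "20", "30"] | any(.[]; . == "0")')

echo "$by_contains $by_index $by_any"
# true null false
```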
I would write it like this:
$ jq --argjson ex '["4210910 3a"]' 'map(select(all(.ou[]; $ex[]!=.)))' input.json
This response addresses the case where .ou is an array and we are given another array of forbidden strings.
For clarity, let's define a filter, intersectq(a;b), that will return true iff the arrays have an element in common:
def intersectq(a;b):
any(a[]; . as $x | any( b[]; . == $x) );
This is effectively a loop-within-a-loop, but because of the semantics of any/2, the computation will stop once a match has been found. (*)
Assuming $ex is the list of exceptions, then the filter we could use to solve the problem would be:
map(select(intersectq(.ou; $ex) | not))
For example, we could use an invocation along the lines suggested by Jeff:
$ jq --argjson ex '["4210910 3a"]' -f myfilter.jq input.json
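Here is a self-contained sketch of that invocation. To avoid needing myfilter.jq and input.json on disk, the filter is passed as a string and the input is an abbreviated version of the question's data; the trailing map(.uid) and the extra exception "9999999" are additions made here purely to keep the output short and to show a multi-item list.

```shell
# Abbreviated input: only uid and a few ou entries from the question's records.
input='[{"uid": "dc8aff1", "ou": ["4210910", "4210910 Abgang", "4240115"]},
        {"uid": "160f656", "ou": ["4210910", "4210910 3", "4210910 3a"]}]'

# intersectq as defined above, followed by the select that drops any element
# whose .ou shares a string with $ex.
filter='def intersectq(a;b): any(a[]; . as $x | any(b[]; . == $x));
map(select(intersectq(.ou; $ex) | not)) | map(.uid)'

survivors=$(printf '%s\n' "$input" |
  jq -c --argjson ex '["4210910 3a", "9999999"]' "$filter")

echo "$survivors"
# ["dc8aff1"]
```

Because the comparison is ==, "4210910 3" does not match "4210910 3a"; only exact members of $ex exclude an element.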
Now you might ask: why use the any-within-any double loop rather than .[]-within-all double loop? The answer is efficiency, as can be seen using debug:
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; ($b[] | debug) != .)'
["DEBUG:",1]
["DEBUG:",1]
false
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; . as $x | all( $b[]; debug | $x != .))'
["DEBUG:",1]
false
(*) Footnote
Of course intersectq/2 as defined here is still O(m*n) and thus inefficient, but the main point of this post is to highlight the drawback of the .[]-within-all double loop.
Here is a solution that checks the .ou member of each element of the input using foreach and contains.
["4210910 3a"] as $list # adjust as necessary
| .[]
| foreach $list[] as $e (
.; .; if .ou | contains([$e]) then . else empty end
)
EDIT: I now realize a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R so the above is really just
["4210910 3a"] as $list
| .[]
| $list[] as $e
| if .ou | contains([$e]) then . else empty end

How to get the index path of found values using jq?

Say I have a JSON like this:
{
"json": [
"a",
[
"b",
"c",
[
"d",
"foo",
1
],
[
[
42,
"foo"
]
]
]
]
}
And I want an array of jq index paths that contain foo:
[
".json[1][2][1]",
".json[1][3][0][1]"
]
Can I achieve this using jq and how?
I tried recurse | .foo to get the matches first but I receive an error: Cannot index array with string "foo".
First of all, I'm not sure what is the purpose of obtaining an array of jq programs. While means of doing this exist, they are seldom necessary; jq does not provide any sort of eval command.
jq has the concept of a path, which is an array of strings and numbers representing the position of an element in a JSON; this is equivalent to the strings on your expected output. As an example, ".json[1][2][1]" would be represented as ["json", 1, 2, 1]. The standard library contains several functions that operate with this concept, such as getpath, setpath, paths and leaf_paths.
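As a brief illustration of the path concept, the sketch below reads and replaces the value at one path array using getpath and setpath. The document is the question's JSON in compact form; "bar" is an arbitrary replacement value chosen for the example.

```shell
doc='{"json": ["a", ["b", "c", ["d", "foo", 1], [[42, "foo"]]]]}'

# getpath/1 reads the value found at a path array.
value=$(printf '%s\n' "$doc" | jq -c 'getpath(["json", 1, 2, 1])')

# setpath/2 returns a copy of the input with the value at that path replaced;
# only the affected sub-array is printed here.
updated=$(printf '%s\n' "$doc" |
  jq -c 'setpath(["json", 1, 2, 1]; "bar") | .json[1][2]')

echo "$value $updated"
# "foo" ["d","bar",1]
```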
We can thus obtain all paths in the given JSON and iterate through them, select those whose value in the input JSON is "foo", and generate an array out of them:
jq '[paths as $path | select(getpath($path) == "foo") | $path]'
This will return, for your given input, the following output:
[
["json", 1, 2, 1],
["json", 1, 3, 0, 1]
]
Now, although it should not be necessary, and it is most likely a sign that you're approaching whatever problem you are facing in the wrong way, it is possible to convert these arrays to the jq path strings you seek by transforming each path through the following script:
".\(map("[\(tojson)]") | join(""))"
The full script would therefore be:
jq '[paths as $path | select(getpath($path) == "foo") | $path | ".\(map("[\(tojson)]") | join(""))"]'
And its output would be:
[
".[\"json\"][1][2][1]",
".[\"json\"][1][3][0][1]"
]
Santiago's excellent program can be further tweaked to produce output in the requested format:
def jqpath:
  def t: test("^[A-Za-z_][A-Za-z0-9_]*$");
  reduce .[] as $x
    ("";
     if ($x|type) == "string"
     then . + ($x | if t then ".\(.)" else ".[" + tojson + "]" end)
     else . + "[\($x)]"
     end);
[paths as $path | select( getpath($path) == "foo" ) | $path | jqpath]
jq -f wrangle.jq input.json
[
".json[1][2][1]",
".json[1][3][0][1]"
]

How do i add an index in jq

I want to use jq to map my input
["a", "b"]
to output
[{name: "a", index: 0}, {name: "b", index: 1}]
I got as far as
0 as $i | def incr: $i = $i + 1; [.[] | {name:., index:incr}]'
which outputs:
[
{
"name": "a",
"index": 1
},
{
"name": "b",
"index": 1
}
]
But I'm missing something.
Any ideas?
It's easier than you think.
to_entries | map({name:.value, index:.key})
to_entries takes an object and returns an array of key/value pairs. In the case of arrays, it effectively makes index/value pairs. You can then map those pairs to the shape you want.
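To see the intermediate step, this sketch prints what to_entries produces for the array before the map is applied. The variable names are illustrative, and -n with an inline array is used only so that no input file is needed.

```shell
# to_entries on an array: each "key" is the element's position.
entries=$(jq -n -c '["a", "b"] | to_entries')

# Renaming the fields gives the requested shape.
mapped=$(jq -n -c '["a", "b"] | to_entries | map({name: .value, index: .key})')

printf '%s\n' "$entries" "$mapped"
# [{"key":0,"value":"a"},{"key":1,"value":"b"}]
# [{"name":"a","index":0},{"name":"b","index":1}]
```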
A more "hands-on" approach is to use reduce:
["a", "b"] | . as $in | reduce range(0;length) as $i ([]; . + [{"name": $in[$i], "index": $i}])
Here are a few more ways. Assuming input.json contains your data
["a", "b"]
and you invoke jq as
jq -M -c -f filter.jq input.json
then any of the following filter.jq filters will generate
{"name":"a","index":0}
{"name":"b","index":1}
1) using keys and foreach
foreach keys[] as $k (.;.;[$k,.[$k]])
| {name:.[1], index:.[0]}
EDIT: I now realize a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R so the above is really just
keys[] as $k
| [$k, .[$k]]
| {name:.[1], index:.[0]}
which can be simplified to
keys[] as $k
| {name:.[$k], index:$k}
2) using keys and transpose
[keys, .]
| transpose[]
| {name:.[1], index:.[0]}
3) using a function
def enumerate:
  def _enum(i):
    if length<1
    then empty
    else [i, .[0]], (.[1:] | _enum(i+1))
    end
  ;
  _enum(0)
;
enumerate
| {name:.[1], index:.[0]}
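The rewrite noted under approach 1, from foreach E as $X (.; .; R) to E as $X | R, can be checked on the sample input. This is only a sketch for this one case, not a proof of the general claim; the variable names are illustrative.

```shell
# The foreach form: the state is initialised to the input and never changes,
# so the extraction expression always sees the original array.
with_foreach=$(jq -n -c '["a", "b"] | foreach keys[] as $k (.; .; [$k, .[$k]])')

# The plain variable-binding form.
with_binding=$(jq -n -c '["a", "b"] | keys[] as $k | [$k, .[$k]]')

printf '%s\n' "$with_foreach"
# [0,"a"]
# [1,"b"]
[ "$with_foreach" = "$with_binding" ] && echo "same output"
# same output
```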
