jq select elements with array not containing string - jq

Now, this is somewhat similar to jq: select only an array which contains element A but not element B but it somehow doesn't work for me (which is likely my fault)... ;-)
So here's what we have:
[ {
"employeeType": "student",
"cn": "dc8aff1",
"uid": "dc8aff1",
"ou": [
"4210910",
"4210910 #Abg",
"4210910 Abgang",
"4240115",
"4240115 5",
"4240115 5\/5"
]
},
{
"employeeType": "student",
"cn": "160f656",
"uid": "160f656",
"ou": [
"4210910",
"4210910 3",
"4210910 3a"
] } ]
I'd like to select all elements where ou does not contain a specific string, say "4210910 3a" or - which would be even better - where ou does not contain any member of a given list of strings.

When it comes to possibly changing inputs, you should make it a parameter to your filter, rather than hardcoding it in. Also, using contains might not work for you in general. It runs the filter recursively so even substrings will match which might not be preferred.
For example:
["10", "20", "30", "40", "50"] | contains(["0"])
is true
I would write it like this:
$ jq --argjson ex '["4210910 3a"]' 'map(select(all(.ou[]; $ex[]!=.)))' input.json

This response addresses the case where .ou is an array and we are given another array of forbidden strings.
For clarity, let's define a filter, intersectq(a;b), that will return true iff the arrays have an element in common:
def intersectq(a;b):
any(a[]; . as $x | any( b[]; . == $x) );
This is effectively a loop-within-a-loop, but because of the semantics of any/2, the computation will stop once a match has been found.(*)
Assuming $ex is the list of exceptions, then the filter we could use to solve the problem would be:
map(select(intersectq(.ou; $ex) | not))
For example, we could use an invocation along the lines suggested by Jeff:
$ jq --argjson ex '["4210910 3a"]' -f myfilter.jq input.json
Now you might ask: why use the any-within-any double loop rather than .[]-within-all double loop? The answer is efficiency, as can be seen using debug:
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; ($b[] | debug) != .)'
["DEBUG:",1]
["DEBUG:",1]
false
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; . as $x | all( $b[]; debug | $x != .))'
["DEBUG:",1]
false
(*) Footnote
Of course intersectq/2 as defined here is still O(m*n) and thus inefficient, but the main point of this post is to highlight the drawback of the .[]-within-all double loop.

Here is a solution that checks the .ou member of each element of the input using foreach and contains.
["4210910 3a"] as $list # adjust as necessary
| .[]
| foreach $list[] as $e (
.; .; if .ou | contains([$e]) then . else empty end
)
EDIT: I now realize a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R so the above is really just
["4210910 3a"] as $list
| .[]
| $list[] as $e
| if .ou | contains([$e]) then . else empty end

Related

how to accomplish each_slice like ruby with jq

Sample Input
[1,2,3,4,5,6,7,8,9]
My Solution
$ echo '[1,2,3,4,5,6,7,8,9]' | jq --arg g 4 '. as $l|($g|tonumber) as $n |$l|length as $c|[range(0;$c;($g|tonumber))]|map($l[.:.+$n])' -c
Output
[[1,2,3,4],[5,6,7,8],[9]]
shorthand, handy method anything else?
Use a while loop to chop off the first 4 elements .[4:] until the array is empty []. Then, for each result array, consider only its first 4 items [:4]. Generalized to $n:
jq -c --argjson n 4 '[while(. != []; .[$n:])[:$n]]'
[[1,2,3,4],[5,6,7,8],[9]]
Demo
There's an undocumented builtin function, _nwise/1, which you would use like this:
jq -nc --argjson n 4 '[1,2,3,4,5,6,7,8,9] | [_nwise($n)]'
[[1,2,3,4],[5,6,7,8],[9]]
Notice that using --argjson allows you to avoid the call to tonumber.
One way using reduce operating on the whole list, forming only n entries (sub-arrays) at a time
jq -c --argjson g 4 '. as $input |
reduce range(0; ( $input | length ) ; $g) as $r ( []; . + [ $input[ $r: ( $r + $g ) ] ] )'
The three argument form of range(from: upto; by) generates numbers from to upto with an increment of by
E.g. range(0; 9; 4) from your original input produces a set of indices - 0, 4, 8 which is ranged over and the final list is formed by appending the slices, coming out of the array slice operation e.g. [0:4], [4:8] and [8:12]

Check if element is member of array [duplicate]

I have an array and I need to check if elements exists in that array or to get that element from the array using
jq, fruit.json:
{
"fruit": [
"apple",
"orange",
"pomegranate",
"apricot",
"mango"
]
}
cat fruit.json | jq '.fruit .apple'
does not work
The semantics of 'contains' is not straightforward at all. In general, it would be better to use 'index' to test if an array has a specific value, e.g.
.fruit | index( "orange" )
However, if the item of interest is itself an array, the general form:
ARRAY | index( [ITEM] )
should be used, e.g.:
[1, [2], 3] | index( [[2]] ) #=> 1
IN/1
If your jq has IN/1 then a better solution is to use it:
.fruit as $f | "orange" | IN($f[])
If your jq has first/1 (as does jq 1.5), then here is a fast definition of IN/1 to use:
def IN(s): first((s == .) // empty) // false;
any(_;_)
Another efficient alternative that is sometimes more convenient is to use any/2, e.g.
any(.fruit[]; . == "orange")
or equivalently:
any(.fruit[] == "orange"; .)
To have jq return success if the array fruit contains "apple", and error otherwise:
jq -e '.fruit|any(. == "apple")' fruit.json >/dev/null
To output the element(s) found, change to
jq -e '.fruit[]|select(. == "apple")' fruit.json
If searching for a fixed string, this isn't very relevant, but it might be if the select expression might match different values, e.g. if it's a regexp.
To output only distinct values, pass the results to unique.
jq '[.fruit[]|select(match("^app"))]|unique' fruit.json
will search for all fruits starting with app, and output unique values. (Note that the original expression had to be wrapped in [] in order to be passed to unique.)
[WARNING: SEE THE COMMENTS AND ALTERNATIVE ANSWERS.]
cat fruit.json | jq '.fruit | contains(["orange"])'
For future visitors, if you happen to have the array in a variable and want to check the input against it, and you have jq 1.5 (without IN), your best option is index but with a second variable:
.inputField as $inputValue | $storedArray|index($inputValue)
This is functionally equivalent to .inputField | IN($storedArray[]).
Expanding on the answers here, If you need to filter the array of fruit against another array of fruit, you could do something like this:
cat food.json | jq '[.fruit[] as $fruits | (["banana", "apple"] | contains([$fruits])) as $results | $fruits | select($results)]'
This will return an array only containing "apple" in the above sample json.
This modified sample did worked here:
jq -r '.fruit | index( "orange" )' fruit.json | tail -n 1
It gets only the last line of the output.
If it exist, it returns 0.
If don't, it returns null.

Using jq how do I search against an array of strings? [duplicate]

Since an example is worth a thousand words, say I have the following JSON stream:
{"a": 0, "b": 1}
{"a": 2, "b": 2}
{"a": 7, "b": null}
{"a": 3, "b": 7}
How can I keep all the objects for which the .b property is one of [1, 7] (in reality the list is much longer so I don't want to do select(.b == 1 or .b == 7)). I'm looking for something like this: select(.b in [1, 7]), but I couldn't find what I'm looking for in the man page.
Doing $value in $collection could be achieved using the pattern select($value == $collection[]). A more efficient alternative would be select(any($value == $collection[]; .)) So your filter should be this:
[1, 7] as $whitelist | select(any(.b == $whitelist[]; .))
Having the array in a variable has its benefits as it lets you change the whitelist easily using arguments.
$ jq --argjson whitelist '[2, 7]' 'select(any(.b == $whitelist[]; .))'
The following approach using index/1 is similar to what was originally sought (".b in [1, 7]"), and might be noticeably faster than using .[] within select if the whitelist is large.
If your jq supports --argjson:
jq --argjson w '[1,7]' '. as $in | select($w | index($in.b))'
Otherwise:
jq --arg w '[1,7]' '. as $in | ($w|fromjson) as $w | select($w | index($in.b))'
or:
jq '. as $in | select([1, 7] | index($in.b))'
UPDATE
On Jan 30, 2017, a builtin named IN was added for efficiently testing whether a JSON entity is contained in a stream. It can also be used for efficiently testing membership in an array. For example, the above invocation with --argjson can be simplified to:
jq --argjson w '[1,7]' 'select( .b | IN($w[]) )'
If your jq does not have IN/1, then so long as your jq has first/1, you can use this equivalent definition:
def IN(s): . as $in | first(if (s == $in) then true else empty end) // false;

jq - How to select objects based on a 'whitelist' of property values

Since an example is worth a thousand words, say I have the following JSON stream:
{"a": 0, "b": 1}
{"a": 2, "b": 2}
{"a": 7, "b": null}
{"a": 3, "b": 7}
How can I keep all the objects for which the .b property is one of [1, 7] (in reality the list is much longer so I don't want to do select(.b == 1 or .b == 7)). I'm looking for something like this: select(.b in [1, 7]), but I couldn't find what I'm looking for in the man page.
Doing $value in $collection could be achieved using the pattern select($value == $collection[]). A more efficient alternative would be select(any($value == $collection[]; .)) So your filter should be this:
[1, 7] as $whitelist | select(any(.b == $whitelist[]; .))
Having the array in a variable has its benefits as it lets you change the whitelist easily using arguments.
$ jq --argjson whitelist '[2, 7]' 'select(any(.b == $whitelist[]; .))'
The following approach using index/1 is similar to what was originally sought (".b in [1, 7]"), and might be noticeably faster than using .[] within select if the whitelist is large.
If your jq supports --argjson:
jq --argjson w '[1,7]' '. as $in | select($w | index($in.b))'
Otherwise:
jq --arg w '[1,7]' '. as $in | ($w|fromjson) as $w | select($w | index($in.b))'
or:
jq '. as $in | select([1, 7] | index($in.b))'
UPDATE
On Jan 30, 2017, a builtin named IN was added for efficiently testing whether a JSON entity is contained in a stream. It can also be used for efficiently testing membership in an array. For example, the above invocation with --argjson can be simplified to:
jq --argjson w '[1,7]' 'select( .b | IN($w[]) )'
If your jq does not have IN/1, then so long as your jq has first/1, you can use this equivalent definition:
def IN(s): . as $in | first(if (s == $in) then true else empty end) // false;

How to get the index path of found values using jq?

Say I have a JSON like this:
{
"json": [
"a",
[
"b",
"c",
[
"d",
"foo",
1
],
[
[
42,
"foo"
]
]
]
]
}
And I want an array of jq index paths that contain foo:
[
".json[1][2][1]",
".json[1][3][0][1]"
]
Can I achieve this using jq and how?
I tried recurse | .foo to get the matches first but I receive an error: Cannot index array with string "foo".
First of all, I'm not sure what is the purpose of obtaining an array of jq programs. While means of doing this exist, they are seldom necessary; jq does not provide any sort of eval command.
jq has the concept of a path, which is an array of strings and numbers representing the position of an element in a JSON; this is equivalent to the strings on your expected output. As an example, ".json[1][2][1]" would be represented as ["json", 1, 2, 1]. The standard library contains several functions that operate with this concept, such as getpath, setpath, paths and leaf_paths.
We can thus obtain all leaf paths in the given JSON and iterate through them, select those for which their value in the input JSON is "foo", and generate an array out of them:
jq '[paths as $path | select(getpath($path) == "foo") | $path]'
This will return, for your given input, the following output:
[
["json", 1, 2, 1],
["json", 1, 3, 0, 1]
]
Now, although it should not be necessary, and it is most likely a sign that you're approaching whatever problem you are facing in the wrong way, it is possible to convert these arrays to the jq path strings you seek by transforming each path through the following script:
".\(map("[\(tojson)]") | join(""))"
The full script would therefore be:
jq '[paths as $path | select(getpath($path) == "foo") | $path | ".\(map("[\(tojson)]") | join(""))"]'
And its output would be:
[
".[\"json\"][1][2][1]",
".[\"json\"][1][3][0][1]"
]
Santiago's excellent program can be further tweaked to produce output in the requested format:
def jqpath:
def t: test("^[A-Za-z_][A-Za-z0-9_]*$");
reduce .[] as $x
("";
if ($x|type) == "string"
then . + ($x | if t then ".\(.)" else ".[" + tojson + "]" end)
else . + "[\($x)]"
end);
[paths as $path | select( getpath($path) == "foo" ) | $path | jqpath]
jq -f wrangle.jq input.json
[
".json[1][2][1]",
".json[1][3][0][1]"
]

Resources