Check if element is member of array [duplicate] - jq

I have an array and I need to check if elements exists in that array or to get that element from the array using
jq, fruit.json:
{
"fruit": [
"apple",
"orange",
"pomegranate",
"apricot",
"mango"
]
}
cat fruit.json | jq '.fruit .apple'
does not work

The semantics of 'contains' is not straightforward at all. In general, it would be better to use 'index' to test if an array has a specific value, e.g.
.fruit | index( "orange" )
However, if the item of interest is itself an array, the general form:
ARRAY | index( [ITEM] )
should be used, e.g.:
[1, [2], 3] | index( [[2]] ) #=> 1
IN/1
If your jq has IN/1 then a better solution is to use it:
.fruit as $f | "orange" | IN($f[])
If your jq has first/1 (as does jq 1.5), then here is a fast definition of IN/1 to use:
def IN(s): first((s == .) // empty) // false;
any(_;_)
Another efficient alternative that is sometimes more convenient is to use any/2, e.g.
any(.fruit[]; . == "orange")
or equivalently:
any(.fruit[] == "orange"; .)

To have jq return success if the array fruit contains "apple", and error otherwise:
jq -e '.fruit|any(. == "apple")' fruit.json >/dev/null
To output the element(s) found, change to
jq -e '.fruit[]|select(. == "apple")' fruit.json
If searching for a fixed string, this isn't very relevant, but it might be if the select expression might match different values, e.g. if it's a regexp.
To output only distinct values, pass the results to unique.
jq '[.fruit[]|select(match("^app"))]|unique' fruit.json
will search for all fruits starting with app, and output unique values. (Note that the original expression had to be wrapped in [] in order to be passed to unique.)

[WARNING: SEE THE COMMENTS AND ALTERNATIVE ANSWERS.]
cat fruit.json | jq '.fruit | contains(["orange"])'

For future visitors, if you happen to have the array in a variable and want to check the input against it, and you have jq 1.5 (without IN), your best option is index but with a second variable:
.inputField as $inputValue | $storedArray|index($inputValue)
This is functionally equivalent to .inputField | IN($storedArray[]).

Expanding on the answers here, If you need to filter the array of fruit against another array of fruit, you could do something like this:
cat food.json | jq '[.fruit[] as $fruits | (["banana", "apple"] | contains([$fruits])) as $results | $fruits | select($results)]'
This will return an array only containing "apple" in the above sample json.

This modified sample did worked here:
jq -r '.fruit | index( "orange" )' fruit.json | tail -n 1
It gets only the last line of the output.
If it exist, it returns 0.
If don't, it returns null.

Related

jq: Why do two expressions which produce identical output produce different output when surrounded by an array operator?

I have been trying to understand jq, and the following piuzzle is giving me a headache: I can construct two expressions, A and B, which seem to produce the same output. And yet, when I surround them with [] array construction braces (as in [A] and [B]), they produce different output. In this case, the expressions are:
A := jq '. | add'
B := jq -s `.[] | add`
Concretely:
$ echo '[1,2] [3,4]' | jq '.'
[1,2]
[3,4]
$ echo '[1,2] [3,4]' | jq '. | add'
3
7
# Now surround with array construction and we get two values:
$ echo '[1,2] [3,4]' | jq '[. | add]'
[3]
[7]
$ echo '[1,2] [3,4]' | jq -s '.[]'
[1,2]
[3,4]
$ echo '[1,2] [3,4]' | jq -s '.[] | add'
3
7
# Now surround with array construction and we get only one value:
$ echo '[1,2] [3,4]' | jq -s '[.[] | add]'
[3,7]
What is going on here? Why is it that the B expression, which applies the --slurp setting but appears to produce identical intermediate output to the A expression, produces different output when surrounded with [] array construction brackets?
When jq is fed with a stream, just like [1,2] [3,4] with two inputs, it executes the filter independently for each. That's why jq '[. | add]' will produce two results; each addition will separately be wrapped into an array.
When jq is given the --slurp option, it combines the stream to an array, rendering it just one input. Therefore jq -s '[.[] | add]' will have one result only; the multiple additions will be caught by the array constructor, which is executed just once.

How to use jq to format array of objects to separated list of key values

How can I (generically) transform the input file below to the output file below, using jq. The record format of the output file is: array_index | key | value
Input file:
[{"a": 1, "b": 10},
{"a": 2, "d": "fred", "e": 30}]
Output File:
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Here's a solution using tostream, which creates a stream of paths and their values. Filter out those having values using select, flatten to align both, and join for the output format:
jq -r 'tostream | select(has(1)) | flatten | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Demo
Or a very similar one using paths to get the paths, scalars for the filter, and getpath for the corresponding value:
jq -r 'paths(scalars) as $p | [$p[], getpath($p)] | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Demo
< file.json jq -r 'to_entries
| .[]
| .key as $k
| ((.value | to_entries )[]
| [$k, .key, .value])
| #csv'
Output:
0,"a",1
0,"b",10
1,"a",2
1,"d","fred"
1,"e",30
You just need to remove the double quotes.
to_entries can be used to loop over the elements of arrays and objects in a way that gives both the key (index) and the value of the element.
jq -r '
to_entries[] |
.key as $id |
.value |
to_entries[] |
[ $id, .key, .value ] |
join("|")
'
Demo on jqplay
Replace join("|") with #csv to get proper CSV.

Using jq how do I search against an array of strings? [duplicate]

Since an example is worth a thousand words, say I have the following JSON stream:
{"a": 0, "b": 1}
{"a": 2, "b": 2}
{"a": 7, "b": null}
{"a": 3, "b": 7}
How can I keep all the objects for which the .b property is one of [1, 7] (in reality the list is much longer so I don't want to do select(.b == 1 or .b == 7)). I'm looking for something like this: select(.b in [1, 7]), but I couldn't find what I'm looking for in the man page.
Doing $value in $collection could be achieved using the pattern select($value == $collection[]). A more efficient alternative would be select(any($value == $collection[]; .)) So your filter should be this:
[1, 7] as $whitelist | select(any(.b == $whitelist[]; .))
Having the array in a variable has its benefits as it lets you change the whitelist easily using arguments.
$ jq --argjson whitelist '[2, 7]' 'select(any(.b == $whitelist[]; .))'
The following approach using index/1 is similar to what was originally sought (".b in [1, 7]"), and might be noticeably faster than using .[] within select if the whitelist is large.
If your jq supports --argjson:
jq --argjson w '[1,7]' '. as $in | select($w | index($in.b))'
Otherwise:
jq --arg w '[1,7]' '. as $in | ($w|fromjson) as $w | select($w | index($in.b))'
or:
jq '. as $in | select([1, 7] | index($in.b))'
UPDATE
On Jan 30, 2017, a builtin named IN was added for efficiently testing whether a JSON entity is contained in a stream. It can also be used for efficiently testing membership in an array. For example, the above invocation with --argjson can be simplified to:
jq --argjson w '[1,7]' 'select( .b | IN($w[]) )'
If your jq does not have IN/1, then so long as your jq has first/1, you can use this equivalent definition:
def IN(s): . as $in | first(if (s == $in) then true else empty end) // false;

jq select elements with array not containing string

Now, this is somewhat similar to jq: select only an array which contains element A but not element B but it somehow doesn't work for me (which is likely my fault)... ;-)
So here's what we have:
[ {
"employeeType": "student",
"cn": "dc8aff1",
"uid": "dc8aff1",
"ou": [
"4210910",
"4210910 #Abg",
"4210910 Abgang",
"4240115",
"4240115 5",
"4240115 5\/5"
]
},
{
"employeeType": "student",
"cn": "160f656",
"uid": "160f656",
"ou": [
"4210910",
"4210910 3",
"4210910 3a"
] } ]
I'd like to select all elements where ou does not contain a specific string, say "4210910 3a" or - which would be even better - where ou does not contain any member of a given list of strings.
When it comes to possibly changing inputs, you should make it a parameter to your filter, rather than hardcoding it in. Also, using contains might not work for you in general. It runs the filter recursively so even substrings will match which might not be preferred.
For example:
["10", "20", "30", "40", "50"] | contains(["0"])
is true
I would write it like this:
$ jq --argjson ex '["4210910 3a"]' 'map(select(all(.ou[]; $ex[]!=.)))' input.json
This response addresses the case where .ou is an array and we are given another array of forbidden strings.
For clarity, let's define a filter, intersectq(a;b), that will return true iff the arrays have an element in common:
def intersectq(a;b):
any(a[]; . as $x | any( b[]; . == $x) );
This is effectively a loop-within-a-loop, but because of the semantics of any/2, the computation will stop once a match has been found.(*)
Assuming $ex is the list of exceptions, then the filter we could use to solve the problem would be:
map(select(intersectq(.ou; $ex) | not))
For example, we could use an invocation along the lines suggested by Jeff:
$ jq --argjson ex '["4210910 3a"]' -f myfilter.jq input.json
Now you might ask: why use the any-within-any double loop rather than .[]-within-all double loop? The answer is efficiency, as can be seen using debug:
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; ($b[] | debug) != .)'
["DEBUG:",1]
["DEBUG:",1]
false
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; . as $x | all( $b[]; debug | $x != .))'
["DEBUG:",1]
false
(*) Footnote
Of course intersectq/2 as defined here is still O(m*n) and thus inefficient, but the main point of this post is to highlight the drawback of the .[]-within-all double loop.
Here is a solution that checks the .ou member of each element of the input using foreach and contains.
["4210910 3a"] as $list # adjust as necessary
| .[]
| foreach $list[] as $e (
.; .; if .ou | contains([$e]) then . else empty end
)
EDIT: I now realize a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R so the above is really just
["4210910 3a"] as $list
| .[]
| $list[] as $e
| if .ou | contains([$e]) then . else empty end

How to get the index path of found values using jq?

Say I have a JSON like this:
{
"json": [
"a",
[
"b",
"c",
[
"d",
"foo",
1
],
[
[
42,
"foo"
]
]
]
]
}
And I want an array of jq index paths that contain foo:
[
".json[1][2][1]",
".json[1][3][0][1]"
]
Can I achieve this using jq and how?
I tried recurse | .foo to get the matches first but I receive an error: Cannot index array with string "foo".
First of all, I'm not sure what is the purpose of obtaining an array of jq programs. While means of doing this exist, they are seldom necessary; jq does not provide any sort of eval command.
jq has the concept of a path, which is an array of strings and numbers representing the position of an element in a JSON; this is equivalent to the strings on your expected output. As an example, ".json[1][2][1]" would be represented as ["json", 1, 2, 1]. The standard library contains several functions that operate with this concept, such as getpath, setpath, paths and leaf_paths.
We can thus obtain all leaf paths in the given JSON and iterate through them, select those for which their value in the input JSON is "foo", and generate an array out of them:
jq '[paths as $path | select(getpath($path) == "foo") | $path]'
This will return, for your given input, the following output:
[
["json", 1, 2, 1],
["json", 1, 3, 0, 1]
]
Now, although it should not be necessary, and it is most likely a sign that you're approaching whatever problem you are facing in the wrong way, it is possible to convert these arrays to the jq path strings you seek by transforming each path through the following script:
".\(map("[\(tojson)]") | join(""))"
The full script would therefore be:
jq '[paths as $path | select(getpath($path) == "foo") | $path | ".\(map("[\(tojson)]") | join(""))"]'
And its output would be:
[
".[\"json\"][1][2][1]",
".[\"json\"][1][3][0][1]"
]
Santiago's excellent program can be further tweaked to produce output in the requested format:
def jqpath:
def t: test("^[A-Za-z_][A-Za-z0-9_]*$");
reduce .[] as $x
("";
if ($x|type) == "string"
then . + ($x | if t then ".\(.)" else ".[" + tojson + "]" end)
else . + "[\($x)]"
end);
[paths as $path | select( getpath($path) == "foo" ) | $path | jqpath]
jq -f wrangle.jq input.json
[
".json[1][2][1]",
".json[1][3][0][1]"
]

Resources