Creating a composite string sourced from multiple places in a JSON document - jq

Consider this JSON document
echo '
{
"alpha": {
"id": "id1",
"values": [
"one",
"two"
]
},
"beta": {
"id": "id2",
"values": [
"three"
]
}
}
' >data.json
check syntax
$ yq -p json -P -o j 'true ' data.json
true
I want to generate a series of strings that combines the id field with each of the values. So output I need should look like this
"id1-one"
"id1-two"
"id2-three"
This is what I've tried
$ yq -p json -P -o j '.[] | .id as $ID | .values[] | $ID + "-" + . ' data.json
"id1-one"
"id2-one"
"id1-two"
"id2-two"
"id1-three"
"id2-three"
There seems to be a multiplication factor kicking in with the $ID variable. Is this the correct approach to get attributes from a different scope, or is there a cleaner way to achieve this?
Note -- the real JSON document contains a lot more nesting, so there are multiple nested arrays/objects between the values and the id attributes.
One final point. I tried the same code with jq and it worked fine.
$ jq ' .[] | .id as $ID | .values[] | $ID + "-" + . ' data.json
"id1-one"
"id1-two"
"id2-three"

Do you need that variable elsewhere? Because it just works without:
yq -p json -P -o json '.[] | .id + "-" + .values[]' data.json
"id1-one"
"id1-two"
"id2-three"
Tested with mikefarah/yq version v4.30.5

Related

Pretty printing in jq

jq . has the side-effect of pretty printing the input.
$ echo '{"foo":"bar", "baz":[1,2,3]}' | jq .
{
"foo": "bar",
"baz": [
1,
2,
3
]
}
But if I want to use jq to incorporate the input with some surrounding text, the input renders compactly
$ echo '{"foo":"bar", "baz":[1,2,3]}' | jq -r '"My value is:\n\(.)\nSome other stuff"'
My value is:
{"foo":"bar","baz":[1,2,3]}
Some other stuff
Is there any way to force pretty printing here? I'd like the output to be
My value is:
{
"foo": "bar",
"baz": [
1,
2,
3
]
}
Some other stuff
I did find a solution while I was writing the question: don't put the input inside string interpolation, output a stream of things:
echo '{"foo":"bar", "baz":[1,2,3]}' | jq -r '"My value is:", . , "Some other stuff"'
# .........................................................^^^^^
outputs
My value is:
{
"foo": "bar",
"baz": [
1,
2,
3
]
}
Some other stuff
This might not fit your actual use case, but instead of creating a single JSON string in jq, just use a shell command group.
echo '{"foo":"bar", "baz":[1,2,3]}' | { echo "My value is:"; jq .; echo "Some other stuff"; }

How to use jq to format array of objects to separated list of key values

How can I (generically) transform the input file below to the output file below, using jq. The record format of the output file is: array_index | key | value
Input file:
[{"a": 1, "b": 10},
{"a": 2, "d": "fred", "e": 30}]
Output File:
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Here's a solution using tostream, which creates a stream of paths and their values. Filter out those having values using select, flatten to align both, and join for the output format:
jq -r 'tostream | select(has(1)) | flatten | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Demo
Or a very similar one using paths to get the paths, scalars for the filter, and getpath for the corresponding value:
jq -r 'paths(scalars) as $p | [$p[], getpath($p)] | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Demo
< file.json jq -r 'to_entries
| .[]
| .key as $k
| ((.value | to_entries )[]
| [$k, .key, .value])
| #csv'
Output:
0,"a",1
0,"b",10
1,"a",2
1,"d","fred"
1,"e",30
You just need to remove the double quotes.
to_entries can be used to loop over the elements of arrays and objects in a way that gives both the key (index) and the value of the element.
jq -r '
to_entries[] |
.key as $id |
.value |
to_entries[] |
[ $id, .key, .value ] |
join("|")
'
Demo on jqplay
Replace join("|") with #csv to get proper CSV.

jq - Get objects with latest date

Json looks like this:
cat test.json |jq -r ".nodes[].run_data"
{
"id": "1234",
"status": "PASSED",
"penultimate_status": "PASSED",
"end_time":"2022-02-28T09:50:05Z"
}
{
"id": "4321",
"status": "PASSED",
"penultimate_status": "UNKNOWN",
"end_time": "2020-10-14T13:52:57Z"
}
I want to get "status" and "end_time" of the newest run. Unfortunately the order is not fix. Meaning the newest run can be first in the list, but also last or in the middle...
Use sort_by to bring the items in order, then extract the last item:
jq '
[.nodes[].run_data]
| sort_by(.end_time) | last
| {status, end_time}
' test.json
{
"status": "PASSED",
"end_time": "2022-02-28T09:50:05Z"
}
To get the fields in another format, replace {status, end_time} with your format, e.g. "\(.end_time): Status \(.status)", and set the -r flag as this isn't JSON anymore but raw text.
You can use transpose to map each object with its end_time.
Here I have converted end_time to seconds since Unix epoch and outputted the object with largest seconds value (this is the newest).
[
[. | map(.end_time | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime), [.[0], .[1]]]
| transpose[]
| .[1] += {secs: .[0]} | .[1]
]
| sort_by(.secs) | last
| {status, end_time}
Output
{
"status": "PASSED",
"end_time": "2022-02-28T09:50:05Z"
}
Demo
https://jqplay.org/s/w1z2n2drc7

jq select elements with array not containing string

Now, this is somewhat similar to jq: select only an array which contains element A but not element B but it somehow doesn't work for me (which is likely my fault)... ;-)
So here's what we have:
[ {
"employeeType": "student",
"cn": "dc8aff1",
"uid": "dc8aff1",
"ou": [
"4210910",
"4210910 #Abg",
"4210910 Abgang",
"4240115",
"4240115 5",
"4240115 5\/5"
]
},
{
"employeeType": "student",
"cn": "160f656",
"uid": "160f656",
"ou": [
"4210910",
"4210910 3",
"4210910 3a"
] } ]
I'd like to select all elements where ou does not contain a specific string, say "4210910 3a" or - which would be even better - where ou does not contain any member of a given list of strings.
When it comes to possibly changing inputs, you should make it a parameter to your filter, rather than hardcoding it in. Also, using contains might not work for you in general. It runs the filter recursively so even substrings will match which might not be preferred.
For example:
["10", "20", "30", "40", "50"] | contains(["0"])
is true
I would write it like this:
$ jq --argjson ex '["4210910 3a"]' 'map(select(all(.ou[]; $ex[]!=.)))' input.json
This response addresses the case where .ou is an array and we are given another array of forbidden strings.
For clarity, let's define a filter, intersectq(a;b), that will return true iff the arrays have an element in common:
def intersectq(a;b):
any(a[]; . as $x | any( b[]; . == $x) );
This is effectively a loop-within-a-loop, but because of the semantics of any/2, the computation will stop once a match has been found.(*)
Assuming $ex is the list of exceptions, then the filter we could use to solve the problem would be:
map(select(intersectq(.ou; $ex) | not))
For example, we could use an invocation along the lines suggested by Jeff:
$ jq --argjson ex '["4210910 3a"]' -f myfilter.jq input.json
Now you might ask: why use the any-within-any double loop rather than .[]-within-all double loop? The answer is efficiency, as can be seen using debug:
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; ($b[] | debug) != .)'
["DEBUG:",1]
["DEBUG:",1]
false
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; . as $x | all( $b[]; debug | $x != .))'
["DEBUG:",1]
false
(*) Footnote
Of course intersectq/2 as defined here is still O(m*n) and thus inefficient, but the main point of this post is to highlight the drawback of the .[]-within-all double loop.
Here is a solution that checks the .ou member of each element of the input using foreach and contains.
["4210910 3a"] as $list # adjust as necessary
| .[]
| foreach $list[] as $e (
.; .; if .ou | contains([$e]) then . else empty end
)
EDIT: I now realize a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R so the above is really just
["4210910 3a"] as $list
| .[]
| $list[] as $e
| if .ou | contains([$e]) then . else empty end

How to get the index path of found values using jq?

Say I have a JSON like this:
{
"json": [
"a",
[
"b",
"c",
[
"d",
"foo",
1
],
[
[
42,
"foo"
]
]
]
]
}
And I want an array of jq index paths that contain foo:
[
".json[1][2][1]",
".json[1][3][0][1]"
]
Can I achieve this using jq and how?
I tried recurse | .foo to get the matches first but I receive an error: Cannot index array with string "foo".
First of all, I'm not sure what is the purpose of obtaining an array of jq programs. While means of doing this exist, they are seldom necessary; jq does not provide any sort of eval command.
jq has the concept of a path, which is an array of strings and numbers representing the position of an element in a JSON; this is equivalent to the strings on your expected output. As an example, ".json[1][2][1]" would be represented as ["json", 1, 2, 1]. The standard library contains several functions that operate with this concept, such as getpath, setpath, paths and leaf_paths.
We can thus obtain all leaf paths in the given JSON and iterate through them, select those for which their value in the input JSON is "foo", and generate an array out of them:
jq '[paths as $path | select(getpath($path) == "foo") | $path]'
This will return, for your given input, the following output:
[
["json", 1, 2, 1],
["json", 1, 3, 0, 1]
]
Now, although it should not be necessary, and it is most likely a sign that you're approaching whatever problem you are facing in the wrong way, it is possible to convert these arrays to the jq path strings you seek by transforming each path through the following script:
".\(map("[\(tojson)]") | join(""))"
The full script would therefore be:
jq '[paths as $path | select(getpath($path) == "foo") | $path | ".\(map("[\(tojson)]") | join(""))"]'
And its output would be:
[
".[\"json\"][1][2][1]",
".[\"json\"][1][3][0][1]"
]
Santiago's excellent program can be further tweaked to produce output in the requested format:
def jqpath:
def t: test("^[A-Za-z_][A-Za-z0-9_]*$");
reduce .[] as $x
("";
if ($x|type) == "string"
then . + ($x | if t then ".\(.)" else ".[" + tojson + "]" end)
else . + "[\($x)]"
end);
[paths as $path | select( getpath($path) == "foo" ) | $path | jqpath]
jq -f wrangle.jq input.json
[
".json[1][2][1]",
".json[1][3][0][1]"
]

Resources