jq: Why do two expressions which produce identical output produce different output when surrounded by an array operator?

I have been trying to understand jq, and the following puzzle is giving me a headache: I can construct two expressions, A and B, which seem to produce the same output. And yet, when I surround them with [] array construction brackets (as in [A] and [B]), they produce different output. In this case, the expressions are:
A := jq '. | add'
B := jq -s '.[] | add'
Concretely:
$ echo '[1,2] [3,4]' | jq '.'
[1,2]
[3,4]
$ echo '[1,2] [3,4]' | jq '. | add'
3
7
# Now surround with array construction and we get two values:
$ echo '[1,2] [3,4]' | jq '[. | add]'
[3]
[7]
$ echo '[1,2] [3,4]' | jq -s '.[]'
[1,2]
[3,4]
$ echo '[1,2] [3,4]' | jq -s '.[] | add'
3
7
# Now surround with array construction and we get only one value:
$ echo '[1,2] [3,4]' | jq -s '[.[] | add]'
[3,7]
What is going on here? Why is it that the B expression, which applies the --slurp setting but appears to produce identical intermediate output to the A expression, produces different output when surrounded with [] array construction brackets?

When jq is fed a stream, such as [1,2] [3,4] with its two inputs, it executes the filter independently for each input. That's why jq '[. | add]' produces two results; each sum is separately wrapped into an array.
When jq is given the --slurp option, it combines the stream into a single array, making it just one input. Therefore jq -s '[.[] | add]' produces only one result; the multiple sums are all caught by the array constructor, which is executed just once.
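A minimal side-by-side demonstration of the two behaviors (assuming jq is on your PATH):

```shell
# Without --slurp: two inputs, so the filter (and its array constructor) runs twice
echo '[1,2] [3,4]' | jq -c '[add]'        # → [3] and [7], on separate lines
# With --slurp: one input, [[1,2],[3,4]], so the array constructor runs only once
echo '[1,2] [3,4]' | jq -sc '[.[] | add]' # → [3,7]
```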

Related

How to use jq to format array of objects to separated list of key values

How can I (generically) transform the input file below to the output file below, using jq? The record format of the output file is: array_index | key | value
Input file:
[{"a": 1, "b": 10},
{"a": 2, "d": "fred", "e": 30}]
Output File:
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Here's a solution using tostream, which creates a stream of paths and their values. Keep only the entries that carry a value using select, merge path and value into a single array using flatten, and join for the output format:
jq -r 'tostream | select(has(1)) | flatten | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
Or a very similar one using paths to get the paths, scalars for the filter, and getpath for the corresponding value:
jq -r 'paths(scalars) as $p | [$p[], getpath($p)] | join("|")'
0|a|1
0|b|10
1|a|2
1|d|fred
1|e|30
< file.json jq -r 'to_entries
| .[]
| .key as $k
| ((.value | to_entries )[]
| [$k, .key, .value])
| @csv'
Output:
0,"a",1
0,"b",10
1,"a",2
1,"d","fred"
1,"e",30
You just need to remove the double quotes.
to_entries can be used to loop over the elements of arrays and objects in a way that gives both the key (index) and the value of the element.
jq -r '
to_entries[] |
.key as $id |
.value |
to_entries[] |
[ $id, .key, .value ] |
join("|")
'
Replace join("|") with @csv to get proper CSV.

how to accomplish each_slice like ruby with jq

Sample Input
[1,2,3,4,5,6,7,8,9]
My Solution
$ echo '[1,2,3,4,5,6,7,8,9]' | jq --arg g 4 '. as $l|($g|tonumber) as $n |$l|length as $c|[range(0;$c;($g|tonumber))]|map($l[.:.+$n])' -c
Output
[[1,2,3,4],[5,6,7,8],[9]]
Is there a shorter, handier method?
Use a while loop to chop off the first 4 elements .[4:] until the array is empty []. Then, for each result array, consider only its first 4 items [:4]. Generalized to $n:
jq -c --argjson n 4 '[while(. != []; .[$n:])[:$n]]'
[[1,2,3,4],[5,6,7,8],[9]]
There's an undocumented builtin function, _nwise/1, which you would use like this:
jq -nc --argjson n 4 '[1,2,3,4,5,6,7,8,9] | [_nwise($n)]'
[[1,2,3,4],[5,6,7,8],[9]]
Notice that using --argjson allows you to avoid the call to tonumber.
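A quick sketch of that difference (assuming jq is on your PATH): --arg always passes the value as a string, while --argjson parses it as JSON, so it arrives as a number and can be used in slicing arithmetic directly.

```shell
jq -n --arg n 4 '$n | type'      # → "string"
jq -n --argjson n 4 '$n | type'  # → "number"
```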
One way using reduce, operating on the whole list and forming one sub-array of (at most) $g entries at a time:
jq -c --argjson g 4 '. as $input |
reduce range(0; ( $input | length ) ; $g) as $r ( []; . + [ $input[ $r: ( $r + $g ) ] ] )'
The three-argument form range(from; upto; by) generates numbers from from up to (but not including) upto, with an increment of by.
E.g., range(0; 9; 4) for your original input produces the indices 0, 4 and 8, which are iterated over; the final list is formed by appending the slices coming out of the array slice operations [0:4], [4:8] and [8:12].
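To see the two steps in isolation (a minimal sketch, assuming jq is on your PATH), first inspect the indices range produces, then use each one as a slice start:

```shell
# range(0; 9; 4) yields the slice start indices
jq -nc '[range(0; 9; 4)]'   # → [0,4,8]
# each index selects the slice [index : index + $g]
echo '[1,2,3,4,5,6,7,8,9]' |
  jq -c --argjson g 4 '. as $input | [range(0; ($input | length); $g) | $input[.:.+$g]]'
```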

processing TSV embedded with JSON using jq?

$ jq --slurp '.[] | .a' <<< '{"a": 1}'$'\n''{"a": 2}'
1
2
I can process a one-column TSV file like above. When there are multiple columns and one column is JSON, how can I print the processing result of the JSON column along with the other columns output literally? In the following example, how can I print the first column and the JSON processing result of the second column?
$ jq --slurp '.[] | .a' <<< $'A\t{"a": 1}'$'\nB\t{"a": 2}'
parse error: Invalid numeric literal at line 1, column 2
Before piping your TSV file into jq you should extract the JSON column first. For instance, use cut from GNU coreutils to get the second field in a tab-separated line:
cut -f2 <<< $'A\t{"a": 1}'$'\nB\t{"a": 2}' | jq --slurp '.[] | .a'
In order to print the other columns as well, you may use paste to put the columns back together:
paste <(
cut -f1 <<< $'A\t{"a": 1}'$'\nB\t{"a": 2}'
) <(
cut -f2 <<< $'A\t{"a": 1}'$'\nB\t{"a": 2}' | jq --slurp '.[] | .a'
)
To solve this entirely in jq, you have to read the input as raw text first and interpret the second column as JSON using jq's fromjson:
jq -Rr './"\t" | .[1] |= (fromjson | .a) | @tsv' <<< $'A\t{"a": 1}'$'\nB\t{"a": 2}'
jq --raw-input --raw-output --slurp 'split("\n") | map(split("\t")) | map(select(length>0)) | .[] | {"p":.[0], "j":.[1] | fromjson} | [.p, .j.a] | @tsv' <<< $'A\t{"a": 1}'$'\nB\t{"a": 2}'
A 1
B 2
or process line by line for huge data:
while IFS= read -r line; do
echo "$line" | jq --raw-input --raw-output --slurp 'split("\t") | {"p":.[0], "j":.[1] | fromjson} | [.p, .j.a] | @tsv'
done < ./data.txt

How can I reverse sort letters by occurrence, but sort those with the same number of occurrences alphabetically?

So I am making exercises for practice, but I encountered something confusing. I need to count the number of times every letter in a text document occurs and then print them one after another, the letter that appears most often first.
The problem is that I need to sort letters with the same number of occurrences alphabetically. That's a problem because sort sorts alphabetically by default, so when I reverse-sort by occurrence it puts the letters with the same count in reverse alphabetical order.
I've tried sorting per column so it sorts the numbers first and the letters after, but that doesn't work.
So lets just work with a few letters now.
echo eeeeerrrbbbcccnN | tr a-z A-Z | grep -iE '[a-z]' -o | sort | uniq -c | sort -rn | tr -d 0-9'\n '
The output is ERCBN but it needs to be EBCRN.
You can specify multiple sort conditions: sort numerically in reverse on the count field only (-k1,1nr), then break ties alphabetically on the letter field (-k2,2):
$ # OP's attempt
$ echo eeeeerrrbbbcccnN | tr a-z A-Z | grep -iE '[a-z]' -o | sort | uniq -c | sort -rn
5 E
3 R
3 C
3 B
2 N
$ # multiple column sort
$ # also note the change in grep command
$ echo eeeeerrrbbbcccnN | tr a-z A-Z | grep -o '[A-Z]' | sort | uniq -c | sort -k1,1nr -k2,2
5 E
3 B
3 C
3 R
2 N

UNIX : Substitute value of variable inside script

I may not have gotten the title perfect for this question, but I am wondering if there is a way to do the following:
Basically, I have a text file with some key-value pairs and also a statement (in the same text file). A shell script extracts the statement and needs to simultaneously substitute the A, B, C placeholders inside the STATEMENT variable.
To make things simple, let me provide an example.
Here is my text file :
File : values.txt
A=1
B=2
C=3
STATEMENT=apple A orange B grape C
Also, I have a shell script which extracts these values and the statement from the text file and uses the STATEMENT variable as a parameter to another script it calls, something like:
Script : first_script.sh
A=`cat values.txt | grep -w '^A' | cut -d'=' -f2`
B=`cat values.txt | grep -w '^B' | cut -d'=' -f2`
C=`cat values.txt | grep -w '^C' | cut -d'=' -f2`
STATEMENT=`cat values.txt | grep -w 'STATEMENT' | cut -d'=' -f2`
second_script.sh $STATEMENT
As you can see, second_script is called from within first_script and it uses the STATEMENT variable, so what I expect to see with the second_script call is :
second_script.sh apple 1 orange 2 grape 3
Note that "A", "B" and "C" should get substituted to their values 1, 2 and 3.
However, what I get is still :
second_script.sh apple A orange B grape C
which is what I don't want.
How do I make sure that A, B and C get substituted to 1,2 and 3 respectively when second_script is called from first_script ?
Sorry if this is confusing.
You can substitute values in a string like this:
STATEMENT=${STATEMENT/A/$A} # replace the letter A with the value of $A
STATEMENT=${STATEMENT/B/$B}
STATEMENT=${STATEMENT/C/$C}
second_script.sh "$STATEMENT"
If you want to match word boundaries use sed:
STATEMENT=$(sed -e "s/\bA\b/$A/g" -e "s/\bB\b/$B/g" -e "s/\bC\b/$C/g" <<< "$STATEMENT")
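Note that ${STATEMENT/A/$A} replaces the first occurrence of the substring A anywhere in the string, not just the standalone word A. A small sketch of the difference (the statement "Add apple A orange" is made up here to trigger the substring match):

```shell
A=1
STATEMENT="Add apple A orange"
# Plain substitution hits the 'A' inside 'Add' first:
echo "${STATEMENT/A/$A}"                # → 1dd apple A orange
# GNU sed with \b word boundaries replaces only the whole word:
echo "$STATEMENT" | sed "s/\bA\b/$A/g"  # → Add apple 1 orange
```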
Also, you don't need to use cat in your script. You can do it like this:
A=`grep -w '^A' values.txt | cut -d'=' -f2`
Or, using GNU awk (\y matches a word boundary):
A=$(awk -F= '/^A\y/{print $2}' values.txt)
Alternatively:
Are you able to change the values.txt file?
It would be easier if you could change it to:
A=1
B=2
C=3
STATEMENT="apple $A orange $B grape $C"
Then in your script you could simply import the file like this:
. values.txt
second_script.sh "$STATEMENT"
How about this:
#!/bin/bash
tail -1 values.txt | {
IFS='=' read name statement
./second_script.sh $(echo "$statement" | sed -r $(
head --lines=-1 values.txt | while IFS='=' read name value
do
printf "%s s/\<%q\>/%q/g " "-e" "$name" "$value"
done))
}
I don't like hard-coded solutions, so this reads the values.txt file, evaluates all variables (every line but the last) in the statement (the last line), and passes the result to second_script.sh.
I'd be careful about names and values containing special characters like spaces. Run thorough tests if you plan on using this!
