how to accomplish each_slice like ruby with jq - jq

Sample Input
[1,2,3,4,5,6,7,8,9]
My Solution
$ echo '[1,2,3,4,5,6,7,8,9]' | jq --arg g 4 '. as $l|($g|tonumber) as $n |$l|length as $c|[range(0;$c;($g|tonumber))]|map($l[.:.+$n])' -c
Output
[[1,2,3,4],[5,6,7,8],[9]]
Is there a shorthand, a handier method, or anything else?

Use a while loop to chop off the first 4 elements .[4:] until the array is empty []. Then, for each result array, consider only its first 4 items [:4]. Generalized to $n:
jq -c --argjson n 4 '[while(. != []; .[$n:])[:$n]]'
[[1,2,3,4],[5,6,7,8],[9]]
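If you want this as a reusable filter, you could wrap the same logic in a def (the name chunks is just an illustration, not a builtin):
$ echo '[1,2,3,4,5,6,7,8,9]' | jq -c --argjson n 4 'def chunks($n): [while(. != []; .[$n:])[:$n]]; chunks($n)'
[[1,2,3,4],[5,6,7,8],[9]]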

There's an undocumented builtin function, _nwise/1, which you would use like this:
jq -nc --argjson n 4 '[1,2,3,4,5,6,7,8,9] | [_nwise($n)]'
[[1,2,3,4],[5,6,7,8],[9]]
Notice that using --argjson allows you to avoid the call to tonumber.
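If your jq doesn't ship _nwise, a similar filter can be defined by hand; this is a sketch, not necessarily identical to the builtin's definition:
$ jq -nc --argjson n 4 'def nwise($n): if length <= $n then . else .[0:$n], (.[$n:] | nwise($n)) end; [1,2,3,4,5,6,7,8,9] | [nwise($n)]'
[[1,2,3,4],[5,6,7,8],[9]]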

One way is to use reduce over the whole list, appending one sub-array of up to $g elements at a time:
jq -c --argjson g 4 '. as $input |
reduce range(0; ( $input | length ) ; $g) as $r ( []; . + [ $input[ $r: ( $r + $g ) ] ] )'
The three-argument form range(from; upto; by) generates numbers starting at from, up to (but not including) upto, in increments of by.
E.g. range(0; 9; 4) on your original input produces the indices 0, 4 and 8; these are iterated over, and the final list is formed by appending the slices produced by the slice operations [0:4], [4:8] and [8:12].
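You can see those indices for yourself by running the three-argument range on its own:
$ jq -nc '[range(0; 9; 4)]'
[0,4,8]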

Related

jq: Why do two expressions which produce identical output produce different output when surrounded by an array operator?

I have been trying to understand jq, and the following puzzle is giving me a headache: I can construct two expressions, A and B, which seem to produce the same output. And yet, when I surround them with [] array construction brackets (as in [A] and [B]), they produce different output. In this case, the expressions are:
A := jq '. | add'
B := jq -s '.[] | add'
Concretely:
$ echo '[1,2] [3,4]' | jq '.'
[1,2]
[3,4]
$ echo '[1,2] [3,4]' | jq '. | add'
3
7
# Now surround with array construction and we get two values:
$ echo '[1,2] [3,4]' | jq '[. | add]'
[3]
[7]
$ echo '[1,2] [3,4]' | jq -s '.[]'
[1,2]
[3,4]
$ echo '[1,2] [3,4]' | jq -s '.[] | add'
3
7
# Now surround with array construction and we get only one value:
$ echo '[1,2] [3,4]' | jq -s '[.[] | add]'
[3,7]
What is going on here? Why is it that the B expression, which applies the --slurp setting but appears to produce identical intermediate output to the A expression, produces different output when surrounded with [] array construction brackets?
When jq is fed a stream, like [1,2] [3,4] with two inputs, it executes the filter independently for each input. That's why jq '[. | add]' produces two results; each addition is separately wrapped into an array.
When jq is given the --slurp option, it combines the stream to an array, rendering it just one input. Therefore jq -s '[.[] | add]' will have one result only; the multiple additions will be caught by the array constructor, which is executed just once.
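You can see the effect of --slurp on its own: it turns the whole input stream into a single array, so the filter runs only once:
$ echo '[1,2] [3,4]' | jq -cs '.'
[[1,2],[3,4]]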

Check if element is member of array [duplicate]

I have an array and I need to check whether an element exists in that array, or to get that element from the array, using jq. fruit.json:
{
  "fruit": [
    "apple",
    "orange",
    "pomegranate",
    "apricot",
    "mango"
  ]
}
cat fruit.json | jq '.fruit .apple'
does not work
The semantics of 'contains' are not straightforward at all. In general, it is better to use 'index' to test whether an array has a specific value, e.g.
.fruit | index( "orange" )
However, if the item of interest is itself an array, the general form:
ARRAY | index( [ITEM] )
should be used, e.g.:
[1, [2], 3] | index( [[2]] ) #=> 1
IN/1
If your jq has IN/1 then a better solution is to use it:
.fruit as $f | "orange" | IN($f[])
If your jq has first/1 (as does jq 1.5), then here is a fast definition of IN/1 to use:
def IN(s): first((s == .) // empty) // false;
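Putting the definition and the test together against fruit.json (the inline def is only needed if your jq lacks IN):
$ jq 'def IN(s): first((s == .) // empty) // false; .fruit as $f | "orange" | IN($f[])' fruit.json
true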
any(_;_)
Another efficient alternative that is sometimes more convenient is to use any/2, e.g.
any(.fruit[]; . == "orange")
or equivalently:
any(.fruit[] == "orange"; .)
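For example, with the fruit.json above:
$ jq 'any(.fruit[]; . == "orange")' fruit.json
true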
To have jq return success if the array fruit contains "apple", and error otherwise:
jq -e '.fruit|any(. == "apple")' fruit.json >/dev/null
To output the element(s) found, change to
jq -e '.fruit[]|select(. == "apple")' fruit.json
If searching for a fixed string, this isn't very relevant, but it might be if the select expression might match different values, e.g. if it's a regexp.
To output only distinct values, pass the results to unique.
jq '[.fruit[]|select(match("^app"))]|unique' fruit.json
will search for all fruits starting with app, and output unique values. (Note that the original expression had to be wrapped in [] in order to be passed to unique.)
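With the sample fruit.json above (adding -c for compact output), only "apple" actually starts with "app", so this prints:
$ jq -c '[.fruit[]|select(match("^app"))]|unique' fruit.json
["apple"]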
[WARNING: SEE THE COMMENTS AND ALTERNATIVE ANSWERS.]
cat fruit.json | jq '.fruit | contains(["orange"])'
For future visitors, if you happen to have the array in a variable and want to check the input against it, and you have jq 1.5 (without IN), your best option is index but with a second variable:
.inputField as $inputValue | $storedArray|index($inputValue)
This is functionally equivalent to .inputField | IN($storedArray[]).
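A minimal illustration, assuming the stored array is passed in with --argjson:
$ echo '{"inputField":"orange"}' | jq --argjson storedArray '["apple","orange"]' '.inputField as $inputValue | $storedArray|index($inputValue)'
1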
Expanding on the answers here: if you need to filter the array of fruit against another array of fruit, you could do something like this:
cat food.json | jq '[.fruit[] as $fruits | (["banana", "apple"] | contains([$fruits])) as $results | $fruits | select($results)]'
This will return an array only containing "apple" in the above sample json.
This modified sample worked here:
jq -r '.fruit | index( "orange" )' fruit.json | tail -n 1
It gets only the last line of the output.
If the element exists, it prints its index (1 for "orange" in the sample above).
If it doesn't, it prints null.

jq select elements with array not containing string

Now, this is somewhat similar to jq: select only an array which contains element A but not element B but it somehow doesn't work for me (which is likely my fault)... ;-)
So here's what we have:
[
  {
    "employeeType": "student",
    "cn": "dc8aff1",
    "uid": "dc8aff1",
    "ou": [
      "4210910",
      "4210910 #Abg",
      "4210910 Abgang",
      "4240115",
      "4240115 5",
      "4240115 5\/5"
    ]
  },
  {
    "employeeType": "student",
    "cn": "160f656",
    "uid": "160f656",
    "ou": [
      "4210910",
      "4210910 3",
      "4210910 3a"
    ]
  }
]
I'd like to select all elements where ou does not contain a specific string, say "4210910 3a" or - which would be even better - where ou does not contain any member of a given list of strings.
When it comes to inputs that may change, you should make the value a parameter to your filter rather than hardcoding it. Also, using contains might not work for you in general: it applies recursively, so even substrings will match, which might not be what you want.
For example:
["10", "20", "30", "40", "50"] | contains(["0"])
is true
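You can confirm this pitfall directly:
$ jq -n '["10", "20", "30", "40", "50"] | contains(["0"])'
true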
I would write it like this:
$ jq --argjson ex '["4210910 3a"]' 'map(select(all(.ou[]; $ex[]!=.)))' input.json
This response addresses the case where .ou is an array and we are given another array of forbidden strings.
For clarity, let's define a filter, intersectq(a;b), that will return true iff the arrays have an element in common:
def intersectq(a;b):
any(a[]; . as $x | any( b[]; . == $x) );
This is effectively a loop-within-a-loop, but because of the semantics of any/2, the computation will stop once a match has been found.(*)
Assuming $ex is the list of exceptions, then the filter we could use to solve the problem would be:
map(select(intersectq(.ou; $ex) | not))
For example, we could use an invocation along the lines suggested by Jeff:
$ jq --argjson ex '["4210910 3a"]' -f myfilter.jq input.json
Now you might ask: why use the any-within-any double loop rather than the .[]-within-all double loop? The answer is efficiency, as can be seen using debug:
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; ($b[] | debug) != .)'
["DEBUG:",1]
["DEBUG:",1]
false
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; . as $x | all( $b[]; debug | $x != .))'
["DEBUG:",1]
false
(*) Footnote
Of course intersectq/2 as defined here is still O(m*n) and thus inefficient, but the main point of this post is to highlight the drawback of the .[]-within-all double loop.
Here is a solution that checks the .ou member of each element of the input using foreach and contains.
["4210910 3a"] as $list # adjust as necessary
| .[]
| foreach $list[] as $e (
.; .; if .ou | contains([$e]) then empty else . end
)
EDIT: I now realize a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R so the above is really just
["4210910 3a"] as $list
| .[]
| $list[] as $e
| if .ou | contains([$e]) then empty else . end

print duplicate entries without deleting unix/linux

Let's say I have a file like this with 2 columns
56-cde
67-cde
56-cao
67-cgh
78-xyz
456-hhh
456-jjjj
45678-nnmn
45677-abdc
45678-aief
I am trying to get an output like this:
56-cde
56-cao
67-cde
67-cgh
456-hhh
456-jjjj
45678-aief
45678-nnmn
So basically instead of printing out the unique values I need to print the duplicates:
I tried to accomplish this using awk like this:
cat input.txt | awk -F"-" '{print $1,$2}' | sort -n | uniq -w 2 -D
This does show me which values in column 1 are duplicated, and displays those duplicated column 1 values along with their respective column 2 values. But since I am hardcoding the comparison width to 2 bytes, it displays duplicates only for the 2-digit numbers in column one. Is there a way to do this using awk?
Thanks in advance.
See if your uniq has a -D option. My cygwin version does:
cat input.txt | sort | uniq -w 2 -D
Another awk solution without arrays (but with a pre-sort):
sort -n file | awk -F- '
NR==1{p=$1; a=$0; next}
p==$1{a=a RS $0; c++; next}
c{print a}
{a=$0; p=$1; c=0}
END{if(c) print a}'
This is what I came up with (just an awk program, no external sort, uniq etc.):
BEGIN { FS = "-" }
{ arr[$1] = arr[$1] "-" $2 }
END {
for (i in arr) {
if ((n = split(arr[i], a)) < 3) continue
for (j = 2; j <= n; ++j)
print i"-"a[j]
}
}
It collects all numbers along with the different strings attached
in arr (assuming the strings won't contain dashes -).
With gawk, you could use arrays of arrays in order to avoid the concatenation and splitting with dashes.
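A sketch of that idea, assuming gawk 4.0 or later for true arrays of arrays (note that the output order within a group is not guaranteed):
BEGIN { FS = "-" }
{ lines[$1][NR] = $0 }                         # group whole lines under their numeric key
END {
    for (key in lines) {
        if (length(lines[key]) < 2) continue   # keep only keys that occur more than once
        for (i in lines[key]) print lines[key][i]
    }
}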
I would handle the varying-number-of-digits case by pre-conditioning the data so that the number field is a fixed large width (and use that width in uniq):
cat input.txt | awk -F- '{printf "%12d-%s\n",$1,$2}'| sort | uniq -w 12 -D
If you need the output left-justified as well, just tack on this post-conditioning step:
| awk '{print $1}'
Using Perl
$ cat two_cols.txt
56-cde
67-cde
56-cao
67-cgh
78-xyz
456-hhh
456-jjjj
45678-nnmn
45677-abdc
45678-aief
$ perl -F"-" -lane ' @t=@{$kv{$F[0]}}; push(@t,$_); $kv{$F[0]}=[@t]; END { while(($x,$y)=each(%kv)){ print join("\n",@{$y}) if scalar @{$y}>1 }} ' two_cols.txt
67-cde
67-cgh
56-cde
56-cao
456-hhh
456-jjjj
45678-nnmn
45678-aief
$

bc arithmetic Error

I am trying to solve this bash exercise: the script reads an arithmetic expression from the user and echoes the result, rounded to 3 decimal places, to the output.
sample input
5+50*3/20 + (19*2)/7
sample output
17.929
my code is
read x
echo "scale = 3; $x" | bc -l
When there is an input of
5+50*3/20 + (19*2)/7
my output is
17.928
but the expected output is
17.929
and because of this my solution is marked wrong. Any idea?
The key here is to use printf with the format spec "%.3f"; printf will take care of the rounding as you wish, as long as bc is run with scale=4.
Here's a script that works:
echo -e "please enter math to calculate: \c"
read x
printf "%.3f\n" $(echo "scale=4;$x" | bc -l)
You can get an understanding of what is going on with the above solution, if you run this command at the commandline: echo "scale=4;5+50*3/20 + (19*2)/7" | bc the result will be 17.9285. When that result is provided to printf as an argument, the function takes into account the fourth decimal place and rounds up the value so that the formatted result displays with precisely three decimal places and with a value of 17.929.
Alternatively, this works without a pipe by redirecting a here string as input to bc, which avoids creating a subshell for the echo:
echo -e "please enter math to calculate: \c"
read x
printf "%.3f\n" $(bc -l <<< "scale=4;$x")
You are not rounding the number, you are truncating it.
$ echo "5+50*3/20 + (19*2)/7" | bc -l
17.92857142857142857142
$ echo "scale = 3; 5+50*3/20 + (19*2)/7" | bc -l
17.928
The only way I know to round a number is using awk:
$ awk 'BEGIN { rounded = sprintf("%.3f", 5+50*3/20 + (19*2)/7); print rounded }'
17.929
So, in your example:
read x
awk "BEGIN { rounded = sprintf(\"%.3f\", $x); print rounded }"
(Double quotes are used so the shell expands $x into the awk program, letting awk evaluate the expression.)
I entirely agree with jherran that you are not rounding the number, you are truncating it. I would go on to say that scale is probably not behaving the way you want it to at all, possibly in a way that no one would want it to behave.
> x="5+50*3/20 + (19*2)/7"
> echo "$x" | bc -l
17.92857142857142857142
> echo "scale = 3; $x" | bc -l
17.928
Furthermore, because of the behaviour of scale, you are truncating each multiplication/division separately from the additions. Let me prove my point with some examples:
> echo "scale=0; 5/2" | bc -l
2
> echo "scale=0; 5/2 + 7/2" | bc -l
5
> echo "5/2 + 7/2" | bc -l
6.00000000000000000000
However, scale without any operation doesn't work either. There is an ugly work-around:
> echo "scale=0; 5.5" | bc -l
5.5
> echo "scale=0; 5.5/1" | bc -l
5
So two things come out of this:
If you want to use bc's scale, apply it only to the final, already-computed result, and even then, beware.
Remember that rounding is the same as adding half of the desired precision to the number and then truncating.
Take the example of rounding to the nearest integer: if you add .5 to a number that should be rounded up, its integer part takes the next integer value, and truncation gives the desired result. If that number should have been rounded down, then adding .5 does not change its integer part, and truncation yields the same result as if nothing had been added.
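For instance, rounding to the nearest integer with bc's default scale of 0:
> echo "(2.4 + 0.5)/1" | bc
2
> echo "(2.6 + 0.5)/1" | bc
3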
Thus my solution follows:
> y=$(echo "$x" | bc -l)
> echo "scale=3; ($y+0.0005)/1" | bc -l # scale doesn't apply to the +, so we get the expected result
17.929
Again, note that the following doesn't work (as explained above), so breaking it up into two operations is really needed:
> echo "scale=3; ($x+0.0005)/1" | bc -l
17.928

Resources