How to duplicate input into outputs with jq?

I'm trying to adapt the following snippet:
echo '{"a":{"value":"b"}, "c":{"value":"d"}}' \
| jq -r '. as $in | keys[] | [$in[.].value | tostring + " 1"] | @tsv'
b 1
d 1
to output:
b 1
b 2
d 1
d 2

The following adaptation produces the desired output:
echo '{"a":{"value":"b"}, "c":{"value":"d"}}' |
jq -r '
def addindex(start;lessthan):
range(start;lessthan) as $i | "\(.) \($i)";
. as $in
| keys[]
| $in[.].value
| addindex(1;3)'
Note that keys emits the key names in sorted order, whereas keys_unsorted retains the order in which the keys appear in the input.
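To make the ordering difference visible, here is a minimal sketch. It assumes a jq version that provides keys_unsorted (1.5 or later), and the input is a hypothetical variant of the question's JSON with the keys deliberately out of alphabetical order; the function names are made up for illustration.

```shell
# Hypothetical input: same values as in the question, keys out of order.
json='{"c":{"value":"d"}, "a":{"value":"b"}}'

# keys sorts the key names, so the value under "a" is printed first.
sorted_values() {
  printf '%s\n' "$json" | jq -r '. as $in | keys[] | $in[.].value'
}

# keys_unsorted keeps the input order, so the value under "c" is printed first.
unsorted_values() {
  printf '%s\n' "$json" | jq -r '. as $in | keys_unsorted[] | $in[.].value'
}
```

Calling sorted_values prints b then d, while unsorted_values prints d then b.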

Related

how to print all key and one of the values from json using jq

I have a test.json file and I want to print:
prob1, 9
prob2, 10
prob3, 11
cat test.json | jq --raw-output '.[].abc'
returns 9, 10, 11, but I am not sure how to print the keys as well.
{
"prob1":{
"abc":9,
"abcd":2,
"Foo":3
},
"prob2":{
"abc":10,
"abcd":2,
"Foo":3
},
"prob3":{
"abc":11,
"abcd":2,
"Foo":3
}
}
Two approaches:
jq -r 'to_entries[] | "\( .key ): \( .value.abc )"'
jq -r 'keys[] as $key | .[$key] | "\( $key ): \( .abc )"'
jq -r 'to_entries[] | "\(.key): \(.value.abc)"'
prob1: 9
prob2: 10
prob3: 11

rdflib::as_rdf only recognizes some IRIs

I am trying to convert a dataframe to RDF. The dataframe contains literals as well as IRIs.
I'm doing
test_withangles.rdf <-
rdflib::as_rdf(x = test_withangles,
key = 'uuid')
rdf_serialize(rdf = test_withangles.rdf, doc = "test_withangles.ttl")
on
+--------------------------------------+-----------------------------------------------+------------------------------------------------------+
| uuid | source_ontology | source_term |
+--------------------------------------+-----------------------------------------------+------------------------------------------------------+
| 7ca250c1-0747-4db0-b613-a1bb45848711 | <http://192.168.0.233:8080/ontologies/RXNORM> | <http://purl.bioontology.org/ontology/RXNORM/706898> |
+--------------------------------------+-----------------------------------------------+------------------------------------------------------+
| e10acb95-e9d9-4804-a227-844f3e551c78 | <http://192.168.0.233:8080/ontologies/RXNORM> | <http://purl.bioontology.org/ontology/RXNORM/214081> |
+--------------------------------------+-----------------------------------------------+------------------------------------------------------+
and getting
@base <localhost://> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<df:7ca250c1-0747-4db0-b613-a1bb45848711>
<df:source_ontology> <http://192.168.0.233:8080/ontologies/RXNORM> ;
<df:source_term> "<http://purl.bioontology.org/ontology/RXNORM/706898>" .
<df:e10acb95-e9d9-4804-a227-844f3e551c78>
<df:source_ontology> <http://192.168.0.233:8080/ontologies/RXNORM> ;
<df:source_term> "<http://purl.bioontology.org/ontology/RXNORM/214081>" .
Why is it interpreting my source_term column as strings, instead of IRIs?

Unix awk - count of occurrences for each unique value

In Unix, I am printing the unique value for the first character in a field. I am also printing a count of the unique field lengths. Now I would like to do both together. Easy to do in SQL, but I'm not sure how to do this in Unix with awk (or grep, sed, ...).
PRINT FIRST UNIQ LEADING CHAR
awk -F'|' '{print substr($61,1,1)}' file_name.sqf | sort | uniq
PRINT COUNT OF FIELDS WITH LENGTHS 8, 10, 15
awk -F'|' 'NR>1 {count[length($61)]++} END {print count[8] ", " count[10] ", " count[15]}' file_name.sqf | sort | uniq
DESIRED OUTPUT
first char, length 8, length 10, length 15
a, 10, , 150
b, 50, 43, 31
A, 20, , 44
B, 60, 83, 22
The fields that start with an upper or lower 'a' are never length 10.
The input file is a | delimited .sqf with no header. The field is varChar 15.
sample input
56789 | someValue | aValue | otherValue | 712345
46789 | someValue | bValue | otherValue | 812345
36789 | someValue | AValue | otherValue | 912345
26789 | someValue | BValue | otherValue | 012345
56722 | someValue | aValue | otherValue | 712345
46722 | someValue | bValue | otherValue | 812345
desired output
a: , , 2
b: 1, , 1
A: , , 1
B: , 1,
'a' has two instances that are length 15
'b' has one instance each of length 8 and 15
'A' has one instance that is length 15
'B' has one instance that is length 10
Thank you.
I think you need a better sample input file, but I guess this is what you're looking for:
$ awk -F' \\| ' -v OFS=, '{k=substr($3,1,1); ks[k]; c[k,length($3)]++}
END {for(k in ks) print k": "c[k,6],c[k,10],c[k,15]}' file
A: 1,,
B: 1,,
a: 2,,
b: 2,,
Note that since all the lengths in the sample are 6, I printed that count instead of the count for 8. With the right data you should be able to get the output you expect. Note, however, that the order is not preserved, because for (k in ks) visits the keys in an unspecified order.
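If the input order matters, one option is to record each leading character the first time it appears and loop over that list in the END block. The following is only a sketch along the lines of the command above: the function name count_by_first_char and the arrays seen and order are made up for illustration, and the separator is written as ' [|] ', which should match the same " | " delimiter as ' \\| '.

```shell
# Count field-3 lengths per leading character, printing in first-seen order.
count_by_first_char() {
  awk -F' [|] ' -v OFS=, '
    {
      k = substr($3, 1, 1)
      # Remember the position at which each leading character first appears.
      if (!(k in seen)) { seen[k] = 1; order[++n] = k }
      c[k, length($3)]++
    }
    END {
      # Walk the recorded order instead of using for (k in ...).
      for (i = 1; i <= n; i++) {
        k = order[i]
        print k ": " c[k, 6], c[k, 10], c[k, 15]
      }
    }' "$@"
}
```

With the sample input from the question, count_by_first_char file prints the rows in the order a, b, A, B. As above, the length 6 stands in for 8 because every sample value is six characters long.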

jq: error: number and object cannot be subtracted

I'm using jq 1.4 and am confused about the following situation. I can calculate a number, but get an error when I try to construct an object with this number:
echo '{"aggregations":{"sent":{"value":25},"bounced":{"value":null},"incoming_act":{"value":25}}}' |
jq '.aggregations
| {"num_sent": .sent.value, "num_incoming_act": .incoming_act.value }
| .num_sent as $x
| .num_incoming_act as $y
| $y-$x as $d
| $d'
0
works fine. But
echo '{"aggregations":{"sent":{"value":25},"bounced":{"value":null},"incoming_act":{"value":25}}}' |
jq '.aggregations
| {"num_sent": .sent.value, "num_incoming_act": .incoming_act.value }
| .num_sent as $x
| .num_incoming_act as $y
| $y-$x as $d
| {diff: $d}'
jq: error: number and object cannot be subtracted
doesn't work. Same happens when I ask for objects in the last part:
echo '{"aggregations":{"sent":{"value":25},"bounced":{"value":null},"incoming_act":{"value":25}}}' |
jq '.aggregations
| {"num_sent": .sent.value, "num_incoming_act": .incoming_act.value }
| .num_sent as $x
| .num_incoming_act as $y
| $y-$x as $d
| objects'
jq: error: number and object cannot be subtracted
I love jq's pipe system. However, something seems to be going on here. What is the "0" that I get in the first example? It doesn't seem to be a normal number 0. This works again:
jq -n ' 0 as $x | {diff: $x} '
This
echo '{"aggregations":{"sent":{"value":25},"bounced":{"value":null},"incoming_act":{"value":12}}}' | jq '.aggregations | {"num_sent": .sent.value, "num_incoming_act": .incoming_act.value } | {diff:(.num_sent as $x | .num_incoming_act as $y | $y-$x as $d | $d)}'
will produce:
{
"diff": -13
}
The difference is where the variable bindings sit:
Previous: .num_sent as $x | .num_incoming_act as $y | $y-$x as $d | {diff: $d}
Now: {diff:(.num_sent as $x | .num_incoming_act as $y | $y-$x as $d | $d)}
Comparing the two shows where jq groups the expression: in the second form the whole calculation happens inside the parentheses, before the object is constructed.
In the examples where you get an error, write ($y-$x) as $d rather than just $y-$x as $d. The parentheses are sometimes necessary, and always advisable, when writing (COMPOUND INFIX EXPRESSION) as $variable.
Explanation:
The parser treats expressions of the form:
3-2 as $d | EXPR
as:
3-(2 as $d | EXPR)
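This means that 3-2 as $d|$d is parsed as 3-(2 as $d|$d), which evaluates to 3-2, that is, 1. Notice, though, that in this case $d itself has the value 2. By the same rule, $y-$x as $d | $d in the first example is $y-($x as $d | $d), which yields 25-25 = 0, an ordinary number, whereas $y-$x as $d | {diff: $d} is $y-($x as $d | {diff: $d}), which tries to subtract an object from a number and fails.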
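As a small illustration of the parenthesized form, here is a sketch that binds the whole difference before building the object. The input is a hypothetical reduced version of the question's data with the two numbers already extracted, and the function names are made up for this example.

```shell
# Hypothetical reduced input: just the two numbers from the question.
input='{"num_sent":25,"num_incoming_act":12}'

# Parenthesized: ($y-$x) is computed first, and the result is bound to $d.
diff_object() {
  printf '%s\n' "$input" |
    jq -c '.num_sent as $x | .num_incoming_act as $y | ($y-$x) as $d | {diff: $d}'
}

# The toy case from the explanation, parenthesized so that $d is 3-2.
toy() {
  jq -n '(3-2) as $d | $d'
}
```

Here diff_object prints {"diff":-13} and toy prints 1, and in both cases $d holds the full difference rather than only the right-hand operand.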

Is there a Unix style command like `column` that formats into a table?

Is there a command line tool that takes lines of delimiter-separated values and arranges them in a SQL-style table? E.g.,
id,name
1,apple
2,banana
3,yogurt
into
id | name
----+---------
1 | apple
2 | banana
3 | yogurt
With Perl and a format statement:
Input file:
$ cat file.csv
id,name
1,apple
2,banana
3,yogurt
Code:
$ cat ./format-STDIN.pl
#!/usr/bin/env perl
use strict; use warnings;
sep();
while (<>) {
$. == 2 and sep();
format STDOUT =
|@<< | @<<<<<<<<<<<|
split /,/
.
write;
}
sep();
sub sep{ print "+----+-------------+\n"; }
Output:
$ ./format-STDIN.pl file.csv
+----+-------------+
|id | name |
+----+-------------+
|1 | apple |
|2 | banana |
|3 | yogurt |
+----+-------------+
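The Perl format above hard-codes the column widths. As an alternative sketch (not part of the answer above), awk can compute the widths from the data and print a layout closer to the one in the question. The function name csv_to_table is hypothetical, and the script assumes plain comma-separated input with no quoted fields or embedded commas.

```shell
# Render simple CSV as a table, sizing each column to its widest cell.
csv_to_table() {
  awk -F, '
    {
      # Store every cell and track the maximum width per column.
      for (i = 1; i <= NF; i++) {
        cell[NR, i] = $i
        if (length($i) > w[i]) w[i] = length($i)
      }
      if (NF > cols) cols = NF
    }
    END {
      for (r = 1; r <= NR; r++) {
        line = ""
        for (i = 1; i <= cols; i++)
          line = line sprintf("%s %-" w[i] "s ", (i > 1 ? "|" : ""), cell[r, i])
        print line
        if (r == 1) {
          # Separator under the header row: runs of dashes joined by "+".
          sep = ""
          for (i = 1; i <= cols; i++) {
            dashes = sprintf("%" (w[i] + 2) "s", "")
            gsub(/ /, "-", dashes)
            sep = sep (i > 1 ? "+" : "") dashes
          }
          print sep
        }
      }
    }' "$@"
}
```

Running csv_to_table file.csv treats the first line as the header and pads each row with trailing spaces to the column width.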
