Parsing an array of objects, do some math with their index - jq

I have a large JSON file which is actually a concatenated array of objects from several configuration files. I would like to use them to build a menu in a bash script. To make the menu easier to read, the JSON array contains special objects that trigger a line break. In the end, the user picks the index of the array.
A simplified json looks like this:
[
{
"index" : 0,
"value" : "one a"
},
{
"index" : 3,
"value" : "two a"
},
{
"value" : ""
},
{
"index" : 2,
"value" : "three a"
},
{
"value" : ""
},
{
"index" : 1,
"value" : "one b"
},
{
"index" : 3,
"value" : "two b"
},
{
"index" : 2,
"value" : "three b"
}
]
All values ending in a come from the first file, all those ending in b from the second file. The entries with an empty value are the line breaks.
What I have got so far, after hours of research, is this:
jq --raw-output 'to_entries[] | "\(.key + 1). \(.value.value) (\(.value.index))"' test.json
Which produces this out of the above data:
1. one a (0)
2. two a (3)
3. (null)
4. three a (2)
5. (null)
6. one b (1)
7. two b (3)
8. three b (2)
Now the user would type 8 to work with "three b".
What I need, however, is this:
1. one a (0)
2. two a (3)
3. three a (2)
4. one b (1)
5. two b (3)
6. three b (2)
So the user would need to type 6 to do the same.
Any ideas welcome!

Using foreach to count would be one way:
foreach .[] as {$index, $value} (0;
if $value != "" then . + 1 else . end;
if $value != "" then "\(.). \($value) (\($index))" else "" end
)
1. one a (0)
2. two a (3)
3. three a (2)
4. one b (1)
5. two b (3)
6. three b (2)
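An equivalent approach, for comparison, is to drop the line-break objects first and only then number what is left with to_entries (a sketch against the sample data):
jq --raw-output '[.[] | select(.value != "")] | to_entries[] | "\(.key + 1). \(.value.value) (\(.value.index))"' test.json
From bash, a picked menu number can then be mapped back to the original index field; the pick variable name here is illustrative:
pick=6   # what the user typed
jq --raw-output --argjson pick "$pick" '[.[] | select(.value != "")][$pick - 1].index' test.json   # prints 2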

Related

extract value from JSON object using SQLite and the json_tree function

I have a table (named patrons) that contains a column (named json_patron_varfields) of JSON data: an array of objects that looks something like this:
[
{
"display_order": 1,
"field_content": "example 1",
"name": "Note",
"occ_num": 0,
"varfield_type_code": "x"
},
{
"display_order": 2,
"field_content": "example 2",
"name": "Note",
"occ_num": 1,
"varfield_type_code": "x"
},
{
"display_order": 3,
"field_content": "some field we do not want",
"occ_num": 0,
"varfield_type_code": "z"
}
]
What I'm trying to do is to target the objects that contain the key named varfield_type_code with the value x, which I've been able to do with the following query:
SELECT
patrons.patron_record_id,
json_extract(patrons.json_patron_varfields, json_tree.path)
FROM
patrons,
json_tree(patrons.json_patron_varfields)
WHERE
json_tree.key = 'varfield_type_code'
AND json_tree.value = 'x'
My question is: how do I extract (or even filter on) the values of the field_content keys from the objects I'm extracting?
I'm struggling with the syntax of how to do that. I was thinking it could be as simple as using json_extract(patrons.json_patron_varfields, json_tree.path."field_content"), but that doesn't appear to be correct.
You can concatenate to build the path string:
json_tree.path || '.field_content'
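Put together, the json_tree variant might look like this (a sketch, using the same tables and columns as above):
SELECT
patrons.patron_record_id,
json_extract(patrons.json_patron_varfields, json_tree.path || '.field_content')
FROM
patrons,
json_tree(patrons.json_patron_varfields)
WHERE
json_tree.key = 'varfield_type_code'
AND json_tree.value = 'x'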
With the structure you've given - you can also use json_each() instead of json_tree() which may simplify things.
extract:
SELECT
patrons.patron_record_id,
json_extract(value, '$.field_content')
FROM
patrons,
json_each(patrons.json_patron_varfields)
WHERE json_extract(value, '$.varfield_type_code') = 'x'
filter:
SELECT
patrons.patron_record_id,
value
FROM
patrons,
json_each(patrons.json_patron_varfields)
WHERE json_extract(value, '$.varfield_type_code') = 'x'
AND json_extract(value, '$.field_content') = 'example 2'

regex replacement for whole object tree / reverse operation to `tostring`

So I have a big JSON document, where I need to take some subtree and copy it to another place, but with some properties updated (a lot of them). So for example:
{
"items": [
{ "id": 1, "other": "abc"},
{ "id": 2, "other": "def"},
{ "id": 3, "other": "ghi"}
]
}
and say that I'd like to duplicate the record having id == 2, and replace the character e in the other field with the character x using a regex. That could go (I'm sure there is a better way, but I'm a beginner) something like:
jq '.items |= . + [.[] | select(.id == 2) as $orig | .id = 4 | .other = ($orig.other | sub("e";"x"))]' < sample.json
producing
{
"items": [
{
"id": 1,
"other": "abc"
},
{
"id": 2,
"other": "def"
},
{
"id": 3,
"other": "ghi"
},
{
"id": 4,
"other": "dxf"
}
]
}
Now that's great. But suppose that there isn't just one other field. There are a multitude of them, spread over a deep tree. Well, I can issue multiple sub operations, but assuming that the replacement pattern is sufficiently selective, maybe we can turn the whole JSON subtree into a string (trivial, the tostring method) and replace all occurrences with a single sub call. But how do I turn that substituted string back (is it called an object?) so that it can be added back to the items array?
Here's a program that might be a solution to the general problem you are describing; if not, it at least illustrates how problems of this type can be solved. Note in particular that there is no explicit reference to a field named "other", and that (thanks to walk) the update function is applied to all candidate JSON objects in the input.
def update($n):
if .items | length > 0
then ((.items[0]|keys_unsorted) - ["id"]) as $keys
| if ($keys | length) == 1
then $keys[0] as $key
| (.items|map(.id) | max + 1) as $newid
| .items |= . + [.[] | select(.id == $n) as $orig | .id=$newid | .[$key]=($orig[$key] | sub("e";"x"))]
else .
end
else .
end;
walk(if type == "object" and has("items") then update(2) else . end)
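As for the literal question of a reverse operation to tostring: that is fromjson, which parses a JSON-encoded string back into a value. A minimal sketch on the original sample, assuming the replacement pattern is selective enough not to hit key names (gsub rewrites every occurrence in the stringified subtree, keys included):
jq '.items += [ .items[] | select(.id == 2) | tostring | gsub("def"; "dxf") | fromjson | .id = 4 ]' sample.json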

Cassandra collection tombstones

I have created a table with a collection, inserted a record, and took an sstabledump of it, and I see there is a range tombstone for it in the sstable. Does this tombstone ever get removed? Also, when I run sstablemetadata on the only sstable, it shows "Estimated droppable tombstones" as 0.5. Similarly, it shows one record with the insert time as its epoch time under "Estimated tombstone drop times": 1548384720: 1. Does this mean that when I run sstablemetadata on a table having collections, the estimated droppable tombstone ratio and drop-time values are not true and dependable values, due to collection/list range tombstones?
CREATE TABLE ks.nmtest (
reservation_id text,
order_id text,
c1 int,
order_details map<text, text>,
PRIMARY KEY (reservation_id, order_id)
) WITH CLUSTERING ORDER BY (order_id ASC)
user@cqlsh:ks> insert into nmtest (reservation_id , order_id , c1, order_details ) values('3','3',3,{'key':'value'});
user@cqlsh:ks> select * from nmtest ;
reservation_id | order_id | c1 | order_details
----------------+----------+----+------------------
3 | 3 | 3 | {'key': 'value'}
(1 rows)
[root@localhost nmtest-e1302500201d11e983bb693c02c04c62]# sstabledump mc-5-big-Data.db
WARN 02:52:19,596 memtable_cleanup_threshold has been deprecated and should be removed from cassandra.yaml
[
{
"partition" : {
"key" : [ "3" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 41,
"clustering" : [ "3" ],
"liveness_info" : { "tstamp" : "2019-01-25T02:51:13.574409Z" },
"cells" : [
{ "name" : "c1", "value" : 3 },
{ "name" : "order_details", "deletion_info" : { "marked_deleted" : "2019-01-25T02:51:13.574408Z", "local_delete_time" : "2019-01-25T02:51:13Z" } },
{ "name" : "order_details", "path" : [ "key" ], "value" : "value" }
]
}
]
}
SSTable: /data/data/ks/nmtest-e1302500201d11e983bb693c02c04c62/mc-5-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1548384673574408
Maximum timestamp: 1548384673574409
SSTable min local deletion time: 1548384673
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 1.0714285714285714
TTL min: 0
TTL max: 0
First token: -155496620801056360 (key=3)
Last token: -155496620801056360 (key=3)
minClustringValues: [3]
maxClustringValues: [3]
Estimated droppable tombstones: 0.5
SSTable Level: 0
Repaired at: 0
Replay positions covered: {CommitLogPosition(segmentId=1548382769966, position=6243201)=CommitLogPosition(segmentId=1548382769966, position=6433666)}
totalColumnsSet: 2
totalRows: 1
Estimated tombstone drop times:
1548384720: 1
Another question concerned the nodetool tablestats output: what does "slice" refer to in Cassandra?
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0
sstablemetadata does not have any information about your table beyond what is held within the sstable itself, as it is not guaranteed to be run on a system that has Cassandra running; and even if it were, it is very complex to pull the schema information from a running instance.
Since gc_grace_seconds is a table parameter and not in the metadata, the tool defaults to assuming a gc grace of 0, so by default the droppable times listed in that histogram are more a histogram of the tombstone creation times. If you know your gc grace you can pass it with the -g parameter to your sstablemetadata call, like:
sstablemetadata -g 864000 mc-5-big-Data.db
See http://cassandra.apache.org/doc/latest/tools/sstable/sstablemetadata.html for information on the tool's output.
With collections it is just a normal range tombstone, with all that that entails. They are used to avoid requiring a read-before-write when overwriting the value of a multicell collection.
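To illustrate that last point, a sketch against the nmtest table above; only the first statement implies the collection range tombstone seen in the dump:
-- overwriting the whole map: the old collection is deleted first
-- via a range tombstone, then the new cells are written
UPDATE ks.nmtest SET order_details = {'key': 'value'}
WHERE reservation_id = '3' AND order_id = '3';
-- appending to the map: no prior delete, hence no tombstone
UPDATE ks.nmtest SET order_details = order_details + {'key2': 'value2'}
WHERE reservation_id = '3' AND order_id = '3';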

jq select elements with array not containing string

Now, this is somewhat similar to "jq: select only an array which contains element A but not element B", but it somehow doesn't work for me (which is likely my fault)... ;-)
So here's what we have:
[
{
"employeeType": "student",
"cn": "dc8aff1",
"uid": "dc8aff1",
"ou": [
"4210910",
"4210910 #Abg",
"4210910 Abgang",
"4240115",
"4240115 5",
"4240115 5\/5"
]
},
{
"employeeType": "student",
"cn": "160f656",
"uid": "160f656",
"ou": [
"4210910",
"4210910 3",
"4210910 3a"
]
}
]
I'd like to select all elements where ou does not contain a specific string, say "4210910 3a" or - which would be even better - where ou does not contain any member of a given list of strings.
When it comes to possibly changing inputs, you should make them a parameter to your filter rather than hardcoding them in. Also, using contains might not work for you in general: it runs the filter recursively, so even substrings will match, which might not be what you want.
For example:
["10", "20", "30", "40", "50"] | contains(["0"])
is true
I would write it like this:
$ jq --argjson ex '["4210910 3a"]' 'map(select(all(.ou[]; $ex[]!=.)))' input.json
This response addresses the case where .ou is an array and we are given another array of forbidden strings.
For clarity, let's define a filter, intersectq(a;b), that will return true iff the arrays have an element in common:
def intersectq(a;b):
any(a[]; . as $x | any( b[]; . == $x) );
This is effectively a loop-within-a-loop, but because of the semantics of any/2, the computation will stop once a match has been found.(*)
Assuming $ex is the list of exceptions, then the filter we could use to solve the problem would be:
map(select(intersectq(.ou; $ex) | not))
For example, we could use an invocation along the lines suggested by Jeff:
$ jq --argjson ex '["4210910 3a"]' -f myfilter.jq input.json
Now you might ask: why use the any-within-any double loop rather than .[]-within-all double loop? The answer is efficiency, as can be seen using debug:
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; ($b[] | debug) != .)'
["DEBUG:",1]
["DEBUG:",1]
false
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; . as $x | all( $b[]; debug | $x != .))'
["DEBUG:",1]
false
(*) Footnote
Of course intersectq/2 as defined here is still O(m*n) and thus inefficient, but the main point of this post is to highlight the drawback of the .[]-within-all double loop.
Here is a solution that checks the .ou member of each element of the input using foreach and contains, dropping the elements whose .ou contains the given string.
["4210910 3a"] as $list # adjust as necessary
| .[]
| foreach $list[] as $e (
.; .; if .ou | contains([$e]) then empty else . end
)
EDIT: I now realize a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R so the above is really just
["4210910 3a"] as $list
| .[]
| $list[] as $e
| if .ou | contains([$e]) then empty else . end
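For reference, the same selection can also be written with IN (available since jq 1.5), which is equivalent to the intersectq approach above:
jq --argjson ex '["4210910 3a"]' 'map(select(any(.ou[]; IN($ex[])) | not))' input.json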

How to use the function "table:get" (table extension) when 2 keys are required?

I have a file .txt with 3 columns: ID-polygon-1, ID-polygon-2 and distance.
When I import my file into NetLogo, I obtain 3 lists [[list1][list2][list3]] which correspond to the 3 columns.
I used table:from-list to create a table with the content of the 3 lists.
I obtain {{table: [[1 1] [67 518] [815 127]]}} (The table displays the first two lines of my dataset).
For example, I would like to get the value of distance (list3) between ID-polygon-1 = 1 (list1) and ID-polygon-2 = 67 (list2), that is, 815.
How can I use table:get table key when I need 2 keys (ID-polygon-1 and ID-polygon-2)?
Thanks very much for your help.
Using table:from-list will not help you there: it expects "a list of two element lists, or pairs" where the "the first element in the pair is the key and the second element is the value." That's not what you have in your original list.
Furthermore, NetLogo tables (and associative arrays in general) cannot have two keys. They are always just key-value pairs. Nothing prevents the value from being another table, however, and in your case, that is what you need: a table of tables!
There is no primitive to build that directly, however. You will need to build it yourself:
extensions [ table ]
globals [ t ]
to setup
let lists [
[ 1 1 ] ; ID-polygon-1 column
[ 67 518 ] ; ID-polygon-2 column
[ 815 127 ] ; distance column
]
set t table:make
foreach n-values length first lists [ ? ] [
let id1 item ? (item 0 lists)
let id2 item ? (item 1 lists)
let dist item ? (item 2 lists)
if not table:has-key? t id1 [
table:put t id1 table:make ; first time we see id1: create its inner table
]
table:put (table:get t id1) id2 dist ; store the distance keyed by id2
]
end
Here is what you get when you print the resulting table:
{{table: [[1 {{table: [[67 815] [518 127]]}}]]}}
And here is a small reporter to make it convenient to get a distance from the table:
to-report get-dist [ id1 id2 ]
report table:get (table:get t id1) id2
end
Using get-dist 1 67 will give the 815 result you were looking for.
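As an aside, the table extension also accepts lists as keys, so the two IDs could be combined into a single key instead of nesting tables. A sketch (the t2, setup2 and get-dist2 names are illustrative):
globals [ t2 ]
to setup2
set t2 table:make
table:put t2 (list 1 67) 815   ; key is the pair of IDs
table:put t2 (list 1 518) 127
end
to-report get-dist2 [ id1 id2 ]
report table:get t2 (list id1 id2)
end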
