jq: tidy up mapping expression

My jq extract expression is getting too large and I'm getting uncomfortable using it:
jq -r '[.id,.meta.lastUpdated,.identifier[0].use, .identifier[0].system, .identifier[0].value, .identifier[1].use, .identifier[1].system, .identifier[1].value, .identifier[2].use, .identifier[2].system, .identifier[2].value, .identifier[2].assigner.reference, .active, .name[0].use, .name[0].text, .name[0].given[0], .name[0].family, .name[0]._family.extension[0].valueString, .address[0].extension[0].valueString, .address[0].type, (.address[0].line[]? | select(. | contains("TV^")) | split("^")[1]) // null, (.address[0].line[]? | select(. | contains("NV^")) | split("^")[1]) // null, (.address[0].line[]? | select(. | contains("NVI^")) | split("^")[1]) // null, .address[0].city, .address[0].state, .address[0].postalCode, .address[0].country, .qualification[0].code.coding[0].system, .qualification[0].code.coding[0].code] | @csv' practitioners-pre.json > practitioners-pre.csv
Is there any way to tidy it up a bit?

First of all, you can add line breaks and remove the redundant . | stages:
[
.id,
.meta.lastUpdated,
.identifier[0].use,
.identifier[0].system,
.identifier[0].value,
.identifier[1].use,
.identifier[1].system,
.identifier[1].value,
.identifier[2].use,
.identifier[2].system,
.identifier[2].value,
.identifier[2].assigner.reference,
.active,
.name[0].use,
.name[0].text,
.name[0].given[0],
.name[0].family,
.name[0]._family.extension[0].valueString,
.address[0].extension[0].valueString,
.address[0].type,
( .address[0].line[]? | select( contains( "TV^" ) ) | split( "^" )[1] ) // null,
( .address[0].line[]? | select( contains( "NV^" ) ) | split( "^" )[1] ) // null,
( .address[0].line[]? | select( contains( "NVI^" ) ) | split( "^" )[1] ) // null,
.address[0].city,
.address[0].state,
.address[0].postalCode,
.address[0].country,
.qualification[0].code.coding[0].system,
.qualification[0].code.coding[0].code
] | @csv
We can also move the address-line searching logic into a function. With the code isolated, it's easier to improve. In the process, I changed it to split first, and to handle multiple matching lines better. (You may need to adjust the select.)
def addr_special_field($field):
[ .line[]? | split("^") | select( .[0] == $field ) ] | .[0][1]?;
[
.id,
.meta.lastUpdated,
.identifier[0].use,
.identifier[0].system,
.identifier[0].value,
.identifier[1].use,
.identifier[1].system,
.identifier[1].value,
.identifier[2].use,
.identifier[2].system,
.identifier[2].value,
.identifier[2].assigner.reference,
.active,
.name[0].use,
.name[0].text,
.name[0].given[0],
.name[0].family,
.name[0]._family.extension[0].valueString,
.address[0].extension[0].valueString,
.address[0].type,
( .address[0] | addr_special_field( "TV" ) ),
( .address[0] | addr_special_field( "NV" ) ),
( .address[0] | addr_special_field( "NVI" ) ),
.address[0].city,
.address[0].state,
.address[0].postalCode,
.address[0].country,
.qualification[0].code.coding[0].system,
.qualification[0].code.coding[0].code
] | @csv
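For readers who want to sanity-check the helper outside jq, here is a Python mirror of addr_special_field on made-up sample data (the field codes and values are invented). It also shows why splitting first is an improvement: the comparison becomes an exact match on the field code, whereas contains("NV^") could also match a line such as "XNV^...".

```python
# Python mirror of the jq addr_special_field helper; sample data is invented.
def addr_special_field(address, field):
    for line in address.get("line", []):
        parts = line.split("^")
        # Split first, then compare the field code exactly.
        if parts[0] == field:
            return parts[1] if len(parts) > 1 else None
    return None

address = {"line": ["TV^123", "NV^Main Street", "NVI^Springfield"]}
print(addr_special_field(address, "NV"))   # Main Street
print(addr_special_field(address, "XX"))   # None
```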
This is long, but at least it's readable. Another thing we could do is factor out common terms, though I'm not sure it truly helps.
def addr_special_field($field):
[ .line[]? | split("^") | select( .[0] == $field ) ] | .[0][1]?;
[
.id,
.meta.lastUpdated,
( .identifier[0] | .use, .system, .value ),
( .identifier[1] | .use, .system, .value ),
( .identifier[2] | .use, .system, .value, .assigner.reference ),
.active,
( .name[0] | .use, .text, .given[0], .family, ._family.extension[0].valueString ),
( .address[0] |
.extension[0].valueString,
.type,
addr_special_field( "TV" ),
addr_special_field( "NV" ),
addr_special_field( "NVI" ),
.city,
.state,
.postalCode,
.country
),
( .qualification[0].code.coding[0] | .system, .code )
] | @csv
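If the factored ( .identifier[0] | .use, .system, .value ) form looks unfamiliar: the pipe enters the sub-object once, and the commas emit several of its fields in order. A rough Python analogue (with made-up field values) is pulling several keys from one dict in a single step:

```python
from operator import itemgetter

# Hypothetical identifier entry; values are invented for illustration.
ident = {"use": "official", "system": "urn:example", "value": "12345"}

# Enter the sub-object once, emit three of its fields, like the jq pipe+comma.
use, system, value = itemgetter("use", "system", "value")(ident)
print(use, system, value)  # official urn:example 12345
```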

Related

jq Split value on base space and join to one string

I would like to ask for help. I need to split the values of the "Text" key on a space (" ") and join them into one line. My current code slices at fixed positions, so when Text contains S10, only S1 is shown.
My input
[
{
"PartNumber": "5SE32DFVLG002",
"ClassificationNo": "500001",
"StringValue": "R0050SWSW",
"Field": "95001",
"Text": "S1 W1 cr.sec+colour"
},
{
"PartNumber": "5SE32DFVLG002",
"ClassificationNo": "500001",
"StringValue": "R0050SWSW",
"Field": "95004",
"Text": "S1 W10 cr.sec+colour"
}
]
My current filter in jqplay:
[.Oslm[] | select(.ClassificationNo=="500001" and .StringValue!="") |
{PartNumber,ClassificationNo,StringValue,Field,Text}] |
sort_by(.Field) | .[] | [.PartNumber,.ClassificationNo,
.Field[3:5],.Text[0:2] + "-" + .Text[3:5] + .StringValue[0:1],
"Test", .StringValue[1:10]] | join(";")
Actual result
5SE32DFVLG002;500001;95001;S1-W1R;TEST;0050SWSW
5SE32DFVLG002;500001;95004;S1-W1R;TEST;0050SWSW
I would like to have this result
5SE32DFVLG002;500001;95001;S1-W1R;TEST;0050SWSW
5SE32DFVLG002;500001;95004;S1-W10R;TEST;0050SWSW
Modify the part that generates the .Text field to something simpler using jq's split() function, which can split on a single whitespace. This way, you are not reliant on the length of the sub-fields you want to extract:
( .Text | split(" ") | .[0] + "-" + .[1] ) + .StringValue[0:1]
i.e. with full code
.[] | [ select( .ClassificationNo =="500001" and .StringValue != "" ) |
{
PartNumber,
ClassificationNo,
StringValue,
Field,
Text
} ] |
sort_by(.Field) |
map(
.PartNumber,
.ClassificationNo,
.Field[3:5],
( .Text | split(" ") | .[0] + "-" + .[1] ) + .StringValue[0:1],
"Test", .StringValue[1:10]
) |
join(";")
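To see why the split-based version fixes the truncation, here is the same field-building logic as a small Python sketch, using the Text and StringValue values from the question:

```python
# Splitting on the space keeps "W10" whole, where the fixed slice
# .Text[3:5] would truncate it to "W1".
def make_field(text, string_value):
    parts = text.split(" ")
    return parts[0] + "-" + parts[1] + string_value[0:1]

print(make_field("S1 W1 cr.sec+colour", "R0050SWSW"))   # S1-W1R
print(make_field("S1 W10 cr.sec+colour", "R0050SWSW"))  # S1-W10R
```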

Double condition in array with JQ

My JSON is an array of one object like this:
[{
"id": 125650,
"status": "success",
"name": "build_job",
"artifacts": [
{
"file_type": "archive",
"size": 72720116,
"filename": "artifacts.zip",
"file_format": "zip"
},
{
"file_type": "metadata",
"size": 1406,
"filename": "metadata.gz",
"file_format": "gzip"
}
]
}]
I want to select only the object ID if the following conditions matches:
status == success
name == build_job
artifacts.size > 0 where file_type == archive
I'm stuck on the last condition, I can select artifacts with size > 0, OR artifacts where file_type = archive, but not both at the same time.
Here's my current query :
| jq '.[0] | select(.name == "build_job" and .status == "success" and .artifacts[].file_type == "archive") | .id'
Can you help me with that?
For the last condition, you presumably mean something like:
all(.artifacts[];
if .file_type == "archive" then .size > 0 else true end)
which can also be written as:
all(.artifacts[] | select(.file_type == "archive");
.size > 0)
I'd recommend using either all or any, depending on your requirements.
Try this:
.[0] | select(
.name == "build_job" and .status == "success" and (
.artifacts[] | select(.file_type == "archive") | length > 0
)
) | .id
This selects successful build_jobs containing one or more archive artifacts. Unfortunately, multiple ids are returned if there is more than one such artifact. Here's how to wrap the expression to fix that:
[
.[] | select(
.name == "build_job" and .status == "success" and (
.artifacts[] | select(.file_type == "archive") | length > 0
)
)
] | unique | .[].id
For the last condition, take the .artifacts array, reduce it to the elements matching your criteria with map(select(.file_type == "archive")), and test the resulting array's length with length > 0.
All together:
.[0] | select(
.name == "build_job" and
.status == "success" and (
(.artifacts | map(select(.file_type == "archive"))) | length > 0
)
)
| .id
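The two readings of the last condition (every archive artifact has positive size, versus at least one such artifact exists) can be sketched in Python, using the job object from the question (artifact fields trimmed for brevity):

```python
# Sample job from the question, reduced to the fields the conditions use.
job = {
    "id": 125650,
    "status": "success",
    "name": "build_job",
    "artifacts": [
        {"file_type": "archive", "size": 72720116},
        {"file_type": "metadata", "size": 1406},
    ],
}

# all-reading: every archive artifact must have size > 0
# (vacuously true when there are no archive artifacts at all).
archives_all_ok = all(a["size"] > 0 for a in job["artifacts"]
                      if a["file_type"] == "archive")

# exists-reading: at least one archive artifact with size > 0.
has_archive = any(a["size"] > 0 for a in job["artifacts"]
                  if a["file_type"] == "archive")

selected = (job["name"] == "build_job" and job["status"] == "success"
            and has_archive)
print(job["id"] if selected else None)  # 125650
```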

MariaDB JSON remove key and its values

I have a TABLE like
CREATE TABLE `saved_links` (
`link_entry_id` bigint(20) NOT NULL AUTO_INCREMENT,
`link_id` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL,
`user_data_json` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
PRIMARY KEY (`link_entry_id`),
UNIQUE KEY `link_id` (`link_id`)
) ENGINE=InnoDB AUTO_INCREMENT=19 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='saved Links'
and an INSERT
INSERT INTO `saved_links`(`link_id`, `user_data_json` )
VALUES (
'AABBCC',
'[{
"mama#gmail_DOT_com": {"u_email": "mama#gmail_DOT_com", "private": "no"}},
{
"papa#gmail_DOT_com": {"u_email": "papa#gmail_DOT_com", "private": "no"}},
{
"daughter#gmail_DOT_com": {"u_email": "daughter#gmail_DOT_com", "private": "no"}},
{
"son#gmail_DOT_com": {"u_email": "son#gmail_DOT_com", "private": "no"}
}]'
), (
'DDEEFF',
'[{
"mama#gmail_DOT_com": {"u_email": "mama#gmail_DOT_com", "private": "no"}},
{
"papa#gmail_DOT_com": {"u_email": "papa#gmail_DOT_com", "private": "no"}}
]'
) ;
SELECT * returns:
---------------------------------------------------
`link_id` | `user_data_json`
----------------------------------------------------
`AABBCC` | [{
| "mama#gmail_DOT_com": {"u_email": "mama#gmail_DOT_com", "private": "no"}},
| {
| "papa#gmail_DOT_com": {"u_email": "papa#gmail_DOT_com", "private": "no"}},
| {
| "daughter#gmail_DOT_com": {"u_email": "daughter#gmail_DOT_com", "private": "no"}},
| {
| "son#gmail_DOT_com": {"u_email": "son#gmail_DOT_com", "private": "no"}}]
---------------------------------------------------------------------------------------------
`DDEEFF` | [{
| "mama#gmail_DOT_com": {"u_email": "mama#gmail_DOT_com", "private": "no"}},
| {
| "papa#gmail_DOT_com": {"u_email": "papa#gmail_DOT_com", "private": "no"}}
| ]
---------------------------------------------------------------------------------------------
I would like to REMOVE "papa#gmail_DOT_com" and all his values from AABBCC
I have tried (I am using 10.4.15-MariaDB):
UPDATE `saved_links`
SET `user_data_json` = IFNULL(
JSON_REMOVE( `user_data_json`, JSON_UNQUOTE(
REPLACE( JSON_SEARCH(
`user_data_json`, 'all', 'papa#gmail_DOT_com', NULL, '$**.papa#gmail_DOT_com'), '.u_email', '' ) ) ), `user_data_json` )
where `link_id` = 'AABBCC'
This returns
---------------------------------------------------
`link_id` | `user_data_json`
----------------------------------------------------
`AABBCC` | [{
| "mama#gmail_DOT_com": {"u_email": "mama#gmail_DOT_com", "private": "no"}},
| {}, //-> Notice these empty braces that are left behind.
| {
| "daughter#gmail_DOT_com": {"u_email": "daughter#gmail_DOT_com", "private": "no"}},
| {
| "son#gmail_DOT_com": {"u_email": "son#gmail_DOT_com", "private": "no"}}]
Is there a way to avoid having the empty {} after removal?
UPDATE01- If you try:
UPDATE `saved_links` SET
`user_data_json` =
JSON_REMOVE(`user_data_json`, '$.papa#gmail_DOT_com')
WHERE `link_id`= 'AABBCC'
This deletes all data in the column user_data_json WHERE link_id = 'AABBCC'.
Thank you
select json_remove(user_data_json,'$[1]') from saved_links where link_entry_id=19;
will return:
[{"mama#gmail_DOT_com": {"private": "no", "u_email": "mama#gmail_DOT_com"}},
{"daughter#gmail_DOT_com": {"private": "no", "u_email": "daughter#gmail_DOT_com"}},
{"son#gmail_DOT_com": {"private": "no", "u_email": "son#gmail_DOT_com"}}]
I am not really using JSON, but got my inspiration from the second example here: https://mariadb.com/kb/en/json_remove/
EDIT:
You could optimize this:
with recursive abc as (
Select 0 as i
union all
select i+1 from abc where i<2)
select link_entry_id, link_id,i, json_keys(user_data_json,concat('$[',i,']'))
from saved_links,abc;
output:
+---------------+---------+------+----------------------------------------------+
| link_entry_id | link_id | i | json_keys(user_data_json,concat('$[',i,']')) |
+---------------+---------+------+----------------------------------------------+
| 19 | AABBCC | 0 | ["mama#gmail_DOT_com"] |
| 20 | DDEEFF | 0 | ["mama#gmail_DOT_com"] |
| 19 | AABBCC | 1 | ["papa#gmail_DOT_com"] |
| 20 | DDEEFF | 1 | ["papa#gmail_DOT_com"] |
| 19 | AABBCC | 2 | ["daughter#gmail_DOT_com"] |
| 20 | DDEEFF | 2 | NULL |
+---------------+---------+------+----------------------------------------------+
With this you could 'convert' "papa#gm...." to 1.
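What the recursive CTE computes can be sketched in Python: walk the array indices, inspect the keys at each one, and note the index holding the key to remove (keys are shortened here for brevity):

```python
import json

# Simplified stand-in for the user_data_json column value.
user_data_json = json.loads(
    '[{"mama": {"private": "no"}}, {"papa": {"private": "no"}},'
    ' {"daughter": {"private": "no"}}]')

# Find the array index whose object carries the key we want to remove;
# this index is what the CTE derives for JSON_REMOVE's path.
idx = next(i for i, obj in enumerate(user_data_json) if "papa" in obj)
print(f"$[{idx}]")  # $[1]
```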
EDIT2:
Combining different JSON functions from Mariadb or from MySQL can do a lot:
SELECT
j.person,
JSON_KEYS(j.person),
JSON_EXTRACT(JSON_KEYS(j.person),'$[0]'),
JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(j.person),'$[0]')),
JSON_VALUE(JSON_KEYS(j.person),'$[0]')
FROM
JSON_TABLE('[{
"mama#gmail_DOT_com": {"u_email": "mama#gmail_DOT_com", "private": "no"}},
{
"papa#gmail_DOT_com": {"u_email": "papa#gmail_DOT_com", "private": "no"}},
{
"daughter#gmail_DOT_com": {"u_email": "daughter#gmail_DOT_com", "private": "no"}},
{
"son#gmail_DOT_com": {"u_email": "son#gmail_DOT_com", "private": "no"}
}]',
'$[*]' COLUMNS(person JSON PATH '$[0]')) j
output (please scroll right, the last column is more interesting than the first column šŸ˜‰):
+ ----------- + ------------------------ + --------------------------------------------- + ----------------------------------------------------------- + ------------------------------------------- +
| person | JSON_KEYS(j.person) | JSON_EXTRACT(JSON_KEYS(j.person),'$[0]') | JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(j.person),'$[0]')) | JSON_VALUE(JSON_KEYS(j.person),'$[0]') |
+ ----------- + ------------------------ + --------------------------------------------- + ----------------------------------------------------------- + ------------------------------------------- +
| {"mama#gmail_DOT_com": {"private": "no", "u_email": "mama#gmail_DOT_com"}} | ["mama#gmail_DOT_com"] | "mama#gmail_DOT_com" | mama#gmail_DOT_com | mama#gmail_DOT_com |
| {"papa#gmail_DOT_com": {"private": "no", "u_email": "papa#gmail_DOT_com"}} | ["papa#gmail_DOT_com"] | "papa#gmail_DOT_com" | papa#gmail_DOT_com | papa#gmail_DOT_com |
| {"daughter#gmail_DOT_com": {"private": "no", "u_email": "daughter#gmail_DOT_com"}} | ["daughter#gmail_DOT_com"] | "daughter#gmail_DOT_com" | daughter#gmail_DOT_com | daughter#gmail_DOT_com |
| {"son#gmail_DOT_com": {"private": "no", "u_email": "son#gmail_DOT_com"}} | ["son#gmail_DOT_com"] | "son#gmail_DOT_com" | son#gmail_DOT_com | son#gmail_DOT_com |
+ ----------- + ------------------------ + --------------------------------------------- + ----------------------------------------------------------- + ------------------------------------------- +
EDIT (2020-12-26):
I did have a look at mariadb, and below is tested on version 10.5.8.
select json_extract(json_array(user_data_json,"papa#gmail_DOT_com"), '$[1]') from saved_links;
+-----------------------------------------------------------------------+
| json_extract(json_array(user_data_json,"papa#gmail_DOT_com"), '$[1]') |
+-----------------------------------------------------------------------+
| "papa#gmail_DOT_com" |
| "papa#gmail_DOT_com" |
+-----------------------------------------------------------------------+
But use of $[1] is not desired, so we have to determine the correct value for 1:
WITH RECURSIVE data AS (
SELECT
link_entry_id,
link_id,
0 as I,
JSON_KEYS(user_data_json, '$[0]') jk
FROM saved_links
UNION ALL
SELECT
sl.link_entry_id,
sl.link_id,
I+1,
JSON_KEYS(user_data_json, CONCAT('$[',i+1,']'))
FROM saved_links sl, (select max(i) as I from data) x
WHERE JSON_KEYS(user_data_json, CONCAT('$[',i+1,']'))<>'')
SELECT * FROM data
;
+---------------+---------+------+----------------------------+
| link_entry_id | link_id | I | jk |
+---------------+---------+------+----------------------------+
| 19 | AABBCC | 0 | ["mama#gmail_DOT_com"] |
| 20 | DDEEFF | 0 | ["mama#gmail_DOT_com"] |
| 19 | AABBCC | 1 | ["papa#gmail_DOT_com"] |
| 20 | DDEEFF | 1 | ["papa#gmail_DOT_com"] |
| 19 | AABBCC | 2 | ["daughter#gmail_DOT_com"] |
| 19 | AABBCC | 3 | ["son#gmail_DOT_com"] |
+---------------+---------+------+----------------------------+
I is the correct value for finding papa#gmail_DOT_com
WITH RECURSIVE data AS (
SELECT
link_entry_id,
link_id,
0 as I,
JSON_KEYS(user_data_json, '$[0]') jk
FROM saved_links
UNION ALL
SELECT
sl.link_entry_id,
sl.link_id,
I+1,
JSON_KEYS(user_data_json, CONCAT('$[',i+1,']'))
FROM saved_links sl, (select max(i) as I from data) x
WHERE JSON_KEYS(user_data_json, CONCAT('$[',i+1,']'))<>'')
SELECT
json_remove(user_data_json, concat('$[',I,']'))
FROM saved_links sl
INNER JOIN data d ON d.link_entry_id= sl.link_entry_id AND d.link_id=sl.link_id and d.I=1
;
[{"mama#gmail_DOT_com": {"u_email": "mama#gmail_DOT_com", "private": "no"}},
{"daughter#gmail_DOT_com": {"u_email": "daughter#gmail_DOT_com", "private": "no"}},
{"son#gmail_DOT_com": {"u_email": "son#gmail_DOT_com", "private": "no"}}]
[{"mama#gmail_DOT_com": {"u_email": "mama#gmail_DOT_com", "private": "no"}}]
I've played with this puzzle for some time and figured out another way to do it.
You can use json_search (plus a few other functions) and finally json_remove.
Since you are creating an array of JSON objects, I take it as your design decision to store the data this way.
So, this is my code:
UPDATE saved_links sl
SET user_data_json =
JSON_REMOVE(user_data_json,
SUBSTRING_INDEX(
JSON_UNQUOTE(
JSON_SEARCH(sl.user_data_json,'one','papa#gmail_DOT_com')
)
,'.', 1)
)
WHERE link_id='AABBCC'
json_search(sl.user_data_json,'one','papa#gmail_DOT_com')
Returns "$[1].papa#gmail_DOT_com.u_email"
JSON_UNQUOTE
Returns $[1].papa#gmail_DOT_com.u_email
SUBSTRING_INDEX(#JSON,'.',1)
Returns $[1]
And finally you will use this last return as JSON_REMOVE path.
I don't know whether your JSON key will always be u_email, but if it is, you can use this.
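The three steps above can be replayed outside SQL; here is a small Python sketch applying the same transformations to the literal string JSON_SEARCH returns:

```python
# The quoted path that JSON_SEARCH(..., 'one', 'papa#gmail_DOT_com') returns.
path = '"$[1].papa#gmail_DOT_com.u_email"'

unquoted = path.strip('"')          # JSON_UNQUOTE: drop the surrounding quotes
prefix = unquoted.split(".", 1)[0]  # SUBSTRING_INDEX(..., '.', 1): keep up to first '.'
print(prefix)  # $[1]  -- the path handed to JSON_REMOVE
```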

Generate new array from sparse array of objects in JQ

I have a JSON file that I want to process with JQ. It has an array of objects inside another object, with a key that I want to use to populate a new array.
In my real use-case this is nested among a lot of other fluff and there are lots more arrays, but take this as a simpler but representative example of the kind of thing:
{
"numbers": [
{
"numeral": 1,
"ordinal": "1st",
"word": "One"
},
{
"numeral": 2,
"ordinal": "2nd",
"word": "Two"
},
{
"numeral": 5,
"ordinal": "5th",
"word": "Five"
},
{
"some-other-fluff-i-want-to-ignore": true
}
]
}
I'd like to use JQ to get a new array based on the elements, ignoring some elements and handling the missing ones. e.g.
[
"The 1st word is One",
"The 2nd word is Two",
"Wot no number 3?",
"Wot no number 4?",
"The 5th word is Five"
]
Doing this in a loop for the elements that are there is simple, terse and elegant enough:
.numbers | map( . | select( .numeral) | [ "The", .ordinal, "word is", .word ] | join (" "))
But I can't find a way to cope with the missing entries. I have some code that sort-of works:
.numbers | [
( .[] | select(.numeral == 1) | ( [ "The", .ordinal, "word is", .word ] | join (" ")) ) // "Wot no number 1?",
( .[] | select(.numeral == 2) | ( [ "The", .ordinal, "word is", .word ] | join (" ")) ) // "Wot no number 2?",
( .[] | select(.numeral == 3) | ( [ "The", .ordinal, "word is", .word ] | join (" ")) ) // "Wot no number 3?",
( .[] | select(.numeral == 4) | ( [ "The", .ordinal, "word is", .word ] | join (" ")) ) // "Wot no number 4?",
( .[] | select(.numeral == 5) | ( [ "The", .ordinal, "word is", .word ] | join (" ")) ) // "Wot no number 5?"
]
It produces usable output, after a fashion:
richard@sophia:~$ jq -f make-array.jq < numbers.json
[
"The 1st word is One",
"The 2nd word is Two",
"Wot no number 3?",
"Wot no number 4?",
"The 5th word is Five"
]
richard@sophia:~$
However, whilst it produces the output, handles the missing elements, and ignores the bits I don't want, it's obviously extremely naff code that cries out for a for-loop or something similar. I can't see a way in JQ to do this, though. Any ideas?
jq solution:
jq 'def print(o): "The \(o.ordinal) word is \(o.word)";
.numbers | (reduce map(select(.numeral))[] as $o ({}; .["\($o.numeral)"] = $o)) as $o
| [range(0; ($o | [keys[] | tonumber] | max))
| "\(.+1)" as $i
| if ($o[$i]) then print($o[$i]) else "Wot no number \($i)?" end
]' input.json
The output:
[
"The 1st word is One",
"The 2nd word is Two",
"Wot no number 3?",
"Wot no number 4?",
"The 5th word is Five"
]
Another solution!
jq '[
range(1; ( .numbers | max_by(.numeral)|.numeral ) +1 ) as $range_do_diplay |
.numbers as $thedata | $range_do_diplay |
. as $i |
if ([$thedata[]|contains( { numeral: $i })]|any )
then
($thedata|map(select( .numeral == $i )))|.[0]| "The \(.ordinal) word is \(.word) "
else
"Wot no number \($i)?"
end
] ' numbers.json
This solution uses:
max_by to find the max value of numeral
range to generate a list of values
variables to store intermediate values
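The idea shared by both answers (index the entries by numeral, then walk a range and fall back with a placeholder where an index is missing) can be sketched in Python with the question's input:

```python
# The question's input array.
numbers = [
    {"numeral": 1, "ordinal": "1st", "word": "One"},
    {"numeral": 2, "ordinal": "2nd", "word": "Two"},
    {"numeral": 5, "ordinal": "5th", "word": "Five"},
    {"some-other-fluff-i-want-to-ignore": True},
]

# Index by numeral (the jq reduce step), dropping fluff without the key.
by_num = {o["numeral"]: o for o in numbers if "numeral" in o}

# Walk 1..max, filling gaps with the placeholder text.
out = [f"The {by_num[i]['ordinal']} word is {by_num[i]['word']}"
       if i in by_num else f"Wot no number {i}?"
       for i in range(1, max(by_num) + 1)]
print(out)
```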

Concatenating two columns and having only one carriage return

I am trying to concatenate two columns into one and managed to do it. But I don't want a carriage return between the two columns; instead I want a single row with a space between the two values.
SELECT t1.save_line || t2.save_line as save_line
FROM (
SELECT *
FROM save_output_table
WHERE save_type = 'R' and seq_id = '0' and execution_id = '292'
) t1
JOIN (
SELECT *
FROM save_output_table
WHERE save_type = 'R' and seq_id = '0' and execution_id = '286'
) t2 ON t1.line_id = t2.line_id
Output:
+------------+
| Save_line |
+------------+
| 18 |
| 12 |
|------------|
| 23 |
| 22 |
+------------+
Expected:
+------------+
| Save_line |
+------------+
| 18 12 |
| 23 22 |
+------------+
First, the query is more simply written as:
SELECT (sov.save_line || sov2.save_line) as save_line
FROM save_output_table sov JOIN
save_output_table sov2
on sov.line_id = sov2.line_id
WHERE sov.save_type = 'R' and sov.seq_id = '0' and sov.execution_id = '292' and
sov2.save_type = 'R' and sov2.seq_id = '0' and sov2.execution_id = '286' ;
Then, you can replace the newline character with a space. This might depend on your operating system, but something like this:
SELECT replace(sov.save_line || sov2.save_line, char(13), ' ') as save_line
In Windows, you may need to replace both CR and LF:
SELECT replace(replace(sov.save_line || sov2.save_line, char(13), ' '), char(10), '') as save_line
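As a quick sanity check of the nested replace, here is the same transformation in Python on a hypothetical concatenated value containing a CRLF:

```python
# CR becomes a space, LF is dropped, so "18\r\n12" collapses to one
# line with a single separating space.
concatenated = "18\r\n12"
result = concatenated.replace("\r", " ").replace("\n", "")
print(result)  # 18 12
```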
