How to avoid record duplication in jq - jq

I have the following JSON:
{
"hits": {
"hits": [
{
"_source": {
"offers_data": [
{
"base_price": 198.89,
"shop_id": 2002,
"shop_name": "TheOtherShop"
},
{
"base_price": 223,
"shop_id": 2247,
"shop_name": "MainShop"
},
{
"base_price": 225,
"shop_id": 2247,
"shop_name": "MainShop"
}
],
"search_result_data": {
"identifiers": {
"id": 32116
},
"shop": {
"id": 2247,
"name": "MainShop"
}
}
}
}
]
}
}
I'm writing the following command :
jq -c --raw-output '.hits.hits[]|{products_ids: ._source.search_result_data.identifiers.id,
best_shop_id: ._source.search_result_data.shop.id,
best_shop_name: (if ._source.search_result_data.shop.id>0 then ._source.search_result_data.shop.id as $shop_id|._source.offers_data[]|select(.shop_id==$shop_id).shop_name else "" end),
best_offer_base_price: (if ._source.search_result_data.shop.id>0 then ._source.search_result_data.shop.id as $shop_id|._source.offers_data[]|select(.shop_id==$shop_id).base_price else "" end)}'
and I get this result :
{"products_ids":32116,"best_shop_id":2247,"best_shop_name":"MainShop","best_offer_base_price":223}
{"products_ids":32116,"best_shop_id":2247,"best_shop_name":"MainShop","best_offer_base_price":225}
{"products_ids":32116,"best_shop_id":2247,"best_shop_name":"MainShop","best_offer_base_price":223}
{"products_ids":32116,"best_shop_id":2247,"best_shop_name":"MainShop","best_offer_base_price":225}
As you can see, I get duplicates. I have two offers from MainShop, so it's normal that I get 2 records, but when I also fetch the base prices, the result is duplicated again. In my real-world case I get 32 records instead of the 2 legitimate ones, because I'm fetching other fields. So I'd like to avoid this extra duplication each time I fetch a field.
The icing on the cake would be to get only one record: the one where, among the MainShop offers, the base_price is the minimum.
Thanks

Icing
... the one where the base_price is the minimum.
The following two interpretations of the problem both assume we can take, as the "minimum" item, any of the admissible items that has the minimal value.
First Interpretation of the original question
.hits.hits[]._source
| (.offers_data | min_by(.base_price)) as $min_offers_data
| .search_result_data
| {products_ids: .identifiers.id}
+ ($min_offers_data
| {best_shop_id: .shop_id,
best_shop_name: .shop_name,
best_offer_base_price: .base_price})
Output:
{
"products_ids": 32116,
"best_shop_id": 2002,
"best_shop_name": "TheOtherShop",
"best_offer_base_price": 198.89
}
Second Interpretation
Restrict consideration to the offers whose shop_id equals .search_result_data.shop.id:
.hits.hits[]._source
| (.search_result_data.shop.id) as $shop
| (.offers_data | map(select(.shop_id == $shop)) | min_by(.base_price)) as $min_offers_data
| .search_result_data
| {products_ids: .identifiers.id}
+ ($min_offers_data
| {best_shop_id: .shop_id,
best_shop_name: .shop_name,
best_offer_base_price: .base_price})
Output
{
"products_ids": 32116,
"best_shop_id": 2247,
"best_shop_name": "MainShop",
"best_offer_base_price": 223
}
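The duplication in the question itself comes from the object constructor: each `._source.offers_data[]` inside a field value produces its outputs independently, and jq emits one object for every combination of those outputs (2 x 2 = 4 here). A minimal sketch of one way to avoid it, assuming all offers of the preferred shop should be kept, is to iterate over the offers once and read every field from that single offer. The `$srd` variable name is only illustrative.

```shell
json='{"hits":{"hits":[{"_source":{"offers_data":[{"base_price":198.89,"shop_id":2002,"shop_name":"TheOtherShop"},{"base_price":223,"shop_id":2247,"shop_name":"MainShop"},{"base_price":225,"shop_id":2247,"shop_name":"MainShop"}],"search_result_data":{"identifiers":{"id":32116},"shop":{"id":2247,"name":"MainShop"}}}}]}}'

result=$(printf '%s\n' "$json" | jq -c '
  .hits.hits[]._source
  | .search_result_data as $srd          # keep the parent data reachable
  | .offers_data[]                       # iterate over the offers exactly once
  | select(.shop_id == $srd.shop.id)     # keep only offers of the preferred shop
  | {products_ids: $srd.identifiers.id,
     best_shop_id: .shop_id,
     best_shop_name: .shop_name,
     best_offer_base_price: .base_price}
')
printf '%s\n' "$result"
```

With the sample input this prints one record per MainShop offer (base prices 223 and 225), no matter how many fields are read from each offer.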

Related

Conditionally output a field?

In this example I only want isGreaterThanOne field to be shown if it's true. Here's what I started with (always shown)
echo '[{"a":5},{"a":1}]' | jq '[.[] | {value:.a, isGreaterThanOne:(.a>1)}]'
I inserted an if statement
echo '[{"a":5},{"a":1}]' | jq '[.[] | {value:.a, X:(if .a>1 then "Y" else "N" end) }]'
Then got stuck trying to move the field into the conditional. Also it seems like I must have an else with an if
echo '[{"a":5},{"a":1}]' | jq '[.[] | {value:.a, (if .a>1 then (K:"Y)" else (L:"N") end) }]'
I want the below as the result (doesn't need to be pretty printed)
[
{
"value": 5,
"X": "Y"
},
{
"value": 1,
}
]
Using if, make one branch provide an empty object {} which wouldn't contain the extra field:
map({value: .a} + if .a > 1 then {X: "Y"} else {} end)
Alternatively, equip only selected items with the extra field:
map({value: .a} | select(.value > 1).X = "Y")
Output:
[
{
"value": 5,
"X": "Y"
},
{
"value": 1
}
]

Group nested array objects to parent key in JQ

I have JSON coming from an external application, formatted like so:
{
"ticket_fields": [
{
"url": "https://example.com/1122334455.json",
"id": 1122334455,
"type": "tagger",
"custom_field_options": [
{
"id": 123456789,
"name": "I have a problem",
"raw_name": "I have a problem",
"value": "help_i_have_problem",
"default": false
},
{
"id": 456789123,
"name": "I have feedback",
"raw_name": "I have feedback",
"value": "help_i_have_feedback",
"default": false
}
]
},
{
"url": "https://example.com/6677889900.json",
"id": 6677889900,
"type": "tagger",
"custom_field_options": [
{
"id": 321654987,
"name": "United States",
"raw_name": "United States",
"value": "location_123_united_states",
"default": false
},
{
"id": 987456321,
"name": "Germany",
"raw_name": "Germany",
"value": "location_456_germany",
"default": false
}
]
}
]
}
The end goal is to be able to get the data into a TSV in the sense that each object in the custom_field_options array is grouped by the parent ID (ticket_fields.id), and then transposed such that each object would be represented on a single line, like so:
Ticket Field ID  Name              Value
1122334455       I have a problem  help_i_have_problem
1122334455       I have feedback   help_i_have_feedback
6677889900       United States     location_123_united_states
6677889900       Germany           location_456_germany
I have been able to export the data successfully to TSV already, but it reads per-line, and without preserving order, like so:
Using jq -r '.ticket_fields[] | select(.type=="tagger") | [.id, .custom_field_options[].name, .custom_field_options[].value] | @tsv'
Ticket Field ID  Name              Name             Value                       Value
1122334455       I have a problem  I have feedback  help_i_have_problem         help_i_have_feedback
6677889900       United States     Germany          location_123_united_states  location_456_germany
Each of the custom_field_options arrays in production may consist of any number of objects (not limited to 2 each). But I seem to be stuck on how to appropriately group or map these objects to their parent ticket_fields.id and to transpose the data in a clean manner. The select(.type=="tagger") is mentioned in the query as there are multiple values for ticket_fields.type which need to be filtered out.
Based on another answer on here, I did try variants of jq -r '.ticket_fields[] | select(.type=="tagger") | map(.custom_field_options |= from_entries) | group_by(.custom_field_options.ticket_fields) | map(map( .custom_field_options |= to_entries))' without success. Any assistance would be greatly appreciated!
You need two nested iterations, one in each array. Save the value of .id in a variable to access it later.
jq -r '
.ticket_fields[] | select(.type=="tagger") | .id as $id
| .custom_field_options[] | [$id, .name, .value]
| @tsv
'
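If the header row from the desired table is also wanted, one possible sketch is to emit it as an extra array ahead of the data rows and send everything through `@tsv` together. The input below is a trimmed copy of the question's sample (only the fields the filter reads), so treat it as an assumption about the real data.

```shell
json='{"ticket_fields":[{"id":1122334455,"type":"tagger","custom_field_options":[{"name":"I have a problem","value":"help_i_have_problem"},{"name":"I have feedback","value":"help_i_have_feedback"}]},{"id":6677889900,"type":"tagger","custom_field_options":[{"name":"United States","value":"location_123_united_states"},{"name":"Germany","value":"location_456_germany"}]}]}'

tsv=$(printf '%s\n' "$json" | jq -r '
  ["Ticket Field ID", "Name", "Value"],            # header row
  (.ticket_fields[] | select(.type == "tagger")
   | .id as $id                                    # remember the parent id
   | .custom_field_options[] | [$id, .name, .value])
  | @tsv                                           # applies to header and data rows alike
')
printf '%s\n' "$tsv"
```

The comma binds tighter than the pipe in jq, so the header array and every data row all reach `@tsv`.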

jq - Get objects with latest date

Json looks like this:
cat test.json |jq -r ".nodes[].run_data"
{
"id": "1234",
"status": "PASSED",
"penultimate_status": "PASSED",
"end_time":"2022-02-28T09:50:05Z"
}
{
"id": "4321",
"status": "PASSED",
"penultimate_status": "UNKNOWN",
"end_time": "2020-10-14T13:52:57Z"
}
I want to get the "status" and "end_time" of the newest run. Unfortunately, the order is not fixed: the newest run can be first in the list, but also last or somewhere in the middle.
Use sort_by to bring the items in order, then extract the last item:
jq '
[.nodes[].run_data]
| sort_by(.end_time) | last
| {status, end_time}
' test.json
{
"status": "PASSED",
"end_time": "2022-02-28T09:50:05Z"
}
To get the fields in another format, replace {status, end_time} with your format, e.g. "\(.end_time): Status \(.status)", and set the -r flag as this isn't JSON anymore but raw text.
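As a sketch of the raw-text variant just described, the following uses `max_by`, which picks the largest item directly and can stand in for `sort_by(...) | last`. Comparing the timestamps as strings works here only because they all share the same UTC ISO 8601 layout. The `.nodes[].run_data` shape of the input is inferred from the command in the question.

```shell
json='{"nodes":[{"run_data":{"id":"4321","status":"PASSED","penultimate_status":"UNKNOWN","end_time":"2020-10-14T13:52:57Z"}},{"run_data":{"id":"1234","status":"PASSED","penultimate_status":"PASSED","end_time":"2022-02-28T09:50:05Z"}}]}'

# -r prints the interpolated string without JSON quoting
line=$(printf '%s\n' "$json" | jq -r '
  [.nodes[].run_data]
  | max_by(.end_time)
  | "\(.end_time): Status \(.status)"
')
printf '%s\n' "$line"
```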
Alternatively, you can use transpose to pair each object with its end_time converted to seconds since the Unix epoch, then output the object with the largest value (the newest).
The filter below expects the run_data objects collected in an array and, as written, handles exactly two of them:
[
[. | map(.end_time | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime), [.[0], .[1]]]
| transpose[]
| .[1] += {secs: .[0]} | .[1]
]
| sort_by(.secs) | last
| {status, end_time}
Output
{
"status": "PASSED",
"end_time": "2022-02-28T09:50:05Z"
}
Demo
https://jqplay.org/s/w1z2n2drc7

Extract nested properties from an array of objects

I have the following JSON file :
{
"filter": [
{
"id": "id_1",
"criteria": {
"from": "mail#domain1.com",
"subject": "subject_1"
},
"action": {
"addLabelIds": [
"Label_id_1"
],
"removeLabelIds": [
"INBOX",
"SPAM"
]
}
},
{
"id": "id_2",
"criteria": {
"from": "mail#domain2.com",
"subject": "subject_1"
},
"action": {
"addLabelIds": [
"Label_id_2"
],
"removeLabelIds": [
"INBOX",
"SPAM"
]
}
}
]
}
And I would like to extract emails values : mail#domain1.com and mail#domain2.com
I have tried this command:
jq --raw-output '.filter[] | select(.criteria.from | test("mail"; "i")) | .id'
But it does not work. I get this error:
jq: error (at <stdin>:1206): null (null) cannot be matched, as it is not a string
exit status 5
Another point: how can I display the value of the "id" key where the "from" value is mail#domain1.com?
In my file that would be id = id_1.
Do you have an idea?
If you only need to extract the emails from .criteria.from then this filter is enough as far as I can tell:
jq --raw-output '.filter[].criteria.from' file.json
If some objects don't have a criteria object then you can filter out nulls with:
jq --raw-output '.filter[].criteria.from | select(. != null)' file.json
If you want to keep the emails equal to "mail#domain1.com":
jq --raw-output '.filter[].criteria.from | select(. == "mail#domain1.com")' file.json
If you want to keep the emails that start with "mail#":
jq --raw-output '.filter[].criteria.from | select(. != null) | select(startswith("mail#"))' file.json
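For the second part of the question (printing the id whose "from" matches), a possible sketch is to apply `select` to the whole filter object rather than to the address, so that `.id` is still available afterwards. The `id_3` entry without a "from" key is hypothetical, added only to reproduce the null case behind the reported error. Passing the value through `strings` drops nulls before `test` sees them.

```shell
json='{"filter":[{"id":"id_1","criteria":{"from":"mail#domain1.com","subject":"subject_1"}},{"id":"id_2","criteria":{"from":"mail#domain2.com","subject":"subject_1"}},{"id":"id_3","criteria":{"subject":"no sender"}}]}'

# id of the filter whose "from" equals one specific address
one_id=$(printf '%s\n' "$json" | jq --raw-output '
  .filter[] | select(.criteria.from == "mail#domain1.com") | .id')

# null-safe form of the regex test from the question
matching_ids=$(printf '%s\n' "$json" | jq --raw-output '
  .filter[] | select(.criteria.from | strings | test("mail"; "i")) | .id')

printf '%s\n' "$one_id" "$matching_ids"
```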
I would like to extract emails values
There is a wide spectrum of possible answers, with these amongst the least specific with respect to where in the JSON the email addresses occur:
.. | objects | .from | select(type=="string")
.. | strings | select(test("#([a-z0-9]+[.])+[a-z]+$"))

regex replacement for whole object tree / reverse operation to `tostring`

I have a big JSON document from which I need to take some subtree and copy it to another place, but with some properties updated (a lot of them). For example:
{
"items": [
{ "id": 1, "other": "abc"},
{ "id": 2, "other": "def"},
{ "id": 3, "other": "ghi"}
]
}
and say that I'd like to duplicate the record having id == 2, and replace the character e in the other field with the character x using a regex. That could go something like this (I'm sure there is a better way, but I'm a beginner):
jq '.items |= . + [.[]|select (.id == 2) as $orig | .id=4 | .other=($orig.other | sub("e";"x"))]'<sample.json
producing
{
"items": [
{
"id": 1,
"other": "abc"
},
{
"id": 2,
"other": "def"
},
{
"id": 3,
"other": "ghi"
},
{
"id": 4,
"other": "dxf"
}
]
}
Now that's great. But suppose that there isn't just one other field. There is a multitude of them, spread over a deep tree. I can issue multiple sub operations, but assuming that the replacement pattern is sufficiently selective, maybe we can turn the whole JSON subtree into a string (trivial, using tostring) and replace all occurrences using a single sub call. But how do I turn that substituted string back into an object (is that what it's called?) so that I can add it back to the items array?
Here's a program that might be a solution to the general problem you are describing, but if not at least illustrates how problems of this type can be solved. Note in particular that there is no explicit reference to a field named "other", and that (thanks to walk) the update function is applied to all candidate JSON objects in the input.
def update($n):
if .items | length > 0
then ((.items[0]|keys_unsorted) - ["id"]) as $keys
| if ($keys | length) == 1
then $keys[0] as $key
| (.items|map(.id) | max + 1) as $newid
| .items |= . + [.[] | select(.id == $n) as $orig | .id=$newid | .[$key]=($orig[$key] | sub("e";"x"))]
else .
end
else .
end;
walk(if type == "object" and has("items") then update(2) else . end)
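On the literal question of reversing `tostring`: jq's `fromjson` parses JSON text back into a value, so a `tojson | gsub(...) | fromjson` round trip is possible. The sketch below shows that round trip next to a `walk`-based variant that only rewrites string values. Note the assumption in the first form: the substitution runs over the serialized text, key names included, so the pattern `"def"` is used instead of `"e"`, which would also turn the key `other` into `othxr`.

```shell
json='{"items":[{"id":1,"other":"abc"},{"id":2,"other":"def"},{"id":3,"other":"ghi"}]}'

# Round trip: serialize the copied subtree, substitute in the text, parse it back
via_text=$(printf '%s\n' "$json" | jq -c '
  .items |= . + [.[] | select(.id == 2) | .id = 4
                 | tojson | gsub("def"; "dxf") | fromjson]')

# Structure-aware variant: visit every value, rewrite only the strings (keys are untouched)
via_walk=$(printf '%s\n' "$json" | jq -c '
  .items |= . + [.[] | select(.id == 2) | .id = 4
                 | walk(if type == "string" then gsub("e"; "x") else . end)]')

printf '%s\n' "$via_text" "$via_walk"
```

Both commands append the same copy, `{"id":4,"other":"dxf"}`, for this sample. The `walk` form is the safer choice when the pattern could also match key names or JSON punctuation.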

Resources