How to process output from match function in jq? - jq

I'm using the jq tool to parse some JSON strings. My minimal example is the following command:
echo '"foo foo"' | jq 'match("(foo)"; "g")'
Which results in the following output:
{
  "offset": 0,
  "length": 3,
  "string": "foo",
  "captures": [
    {
      "offset": 0,
      "length": 3,
      "string": "foo",
      "name": null
    }
  ]
}
{
  "offset": 4,
  "length": 3,
  "string": "foo",
  "captures": [
    {
      "offset": 4,
      "length": 3,
      "string": "foo",
      "name": null
    }
  ]
}
I want my final output for this example to be:
"foo,foo"
But in this case I get two separate objects instead of an array or something similar that I could call implode on. I guess either the API isn't made for my use case or my understanding of it is very wrong. Please advise.

The following script takes the string value from each of the separate objects with .string, wraps them in an array [...] and then joins the members of the array with commas using join.
I modified the regex because you didn't actually need a capture group for the given use case, but if you wanted to access the capture groups you could do .captures[].string instead of .string.
echo '"foo foo"' | jq '[match("foo"; "g").string] | join(",")'

Related

How to sum values from a JSON object with jq

How can I sum values from JSON objects with jq?
Example input JSON objects:
{
  "orderNumber": 2346999,
  "workStep": 110,
  "good": 8,
  "bad": 0,
  "type": "1",
  "date": "2022-11-08T07:17:09",
  "time": 0,
  "result": 1
}
{
  "orderNumber": 2346999,
  "workStep": 110,
  "good": 8,
  "bad": 0,
  "type": "1",
  "date": "2022-11-08T07:26:57",
  "time": 0,
  "result": 1
}
jq condition
. | select(.orderNumber==2346999 and .workStep==110) | .good
result
8
8
and I would like to have
16
A simple approach uses add to get the sum of the numbers.
Use map() with --slurp (-s) to create an array containing just the .good values and apply add:
map(select(.orderNumber==2346999 and .workStep==110).good) | add
Gives: 16
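For example, run against a file holding the two sample objects (a sketch; input.json is just a placeholder name):
jq -s 'map(select(.orderNumber==2346999 and .workStep==110).good) | add' input.json
16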
An efficient approach that avoids the need to slurp is to use a general-purpose utility that sums a stream:
def sum(s): reduce s as $x (0; . + $x);
With this, you can run your filter using jq's -n option instead of the -s option:
sum(inputs | select(.orderNumber==2346999 and .workStep==110) | .good)
Notice that the leading . in your filter is unnecessary.
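Putting it together, a complete invocation might look like this (a sketch; input.json is again a placeholder for the file containing the stream of objects):
jq -n 'def sum(s): reduce s as $x (0; . + $x);
sum(inputs | select(.orderNumber==2346999 and .workStep==110) | .good)' input.json
16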

How can I use jq to sort by datetime field and filter based on attribute?

I am trying to sort the following JSON response based on "startTime" and also want to filter based on "name" and fetch only the "dataCenter" of the matched records. Can you please help with a jq expression for doing this?
I tried something like jq '.[]|= sort_by(.startTime)' but it doesn't return the correct result.
[
  {
    "name": "JPCSKELT",
    "dataCenter": "mvsADM",
    "orderId": "G9HC8",
    "scheduleTable": "FD33515",
    "nodeGroup": null,
    "controlmApp": "P/C-DEVELOPMENT-LRSP",
    "groupName": "SCMTEST",
    "assignmentGroup": "HOST_CONFIG_MGMT",
    "owner": "PC00000",
    "description": null,
    "startTime": "2021-11-11 17:45:48.0",
    "endTime": "2021-11-11 17:45:51.0",
    "successCount": 1,
    "failureCount": 0,
    "dailyRunCount": 0,
    "scriptName": "JPCSKELT"
  },
  {
    "name": "JPCSKELT",
    "dataCenter": "mvsADM",
    "orderId": "FWX98",
    "scheduleTable": "JPCS1005",
    "nodeGroup": null,
    "controlmApp": "P/C-DEVELOPMENT-LRSP",
    "groupName": "SCMTEST",
    "assignmentGroup": "HOST_CONFIG_MGMT",
    "owner": "PC00000",
    "description": null,
    "startTime": "2021-07-13 10:49:47.0",
    "endTime": "2021-07-13 10:49:49.0",
    "successCount": 1,
    "failureCount": 0,
    "dailyRunCount": 0,
    "scriptName": "JPCSKELT"
  },
  {
    "name": "JPCSKELT",
    "dataCenter": "mvsADM",
    "orderId": "FWX98",
    "scheduleTable": "JPCS1005",
    "nodeGroup": null,
    "controlmApp": "P/C-DEVELOPMENT-LRSP",
    "groupName": "SCMTEST",
    "assignmentGroup": "HOST_CONFIG_MGMT",
    "owner": "PC00000",
    "description": null,
    "startTime": "2021-10-13 10:49:47.0",
    "endTime": "2021-10-13 10:49:49.0",
    "successCount": 1,
    "failureCount": 0,
    "dailyRunCount": 0,
    "scriptName": "JPCSKELT"
  }
]
You can use the following expression to sort the input -
sort_by(.startTime | sub("(?<time>.*)\\..*"; "\(.time)") | strptime("%Y-%m-%d %H:%M:%S") | mktime)
The sub("(?<time>.*)\\..*"; "\(.time)") expression removes the trailing decimal fraction.
I assume you can use the result from the above query to perform desired filtering.
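Combined with the filtering and extraction asked about, that might look like the following (a sketch, assuming you filter on the name JPCSKELT and the input is in a file named data.json):
jq --arg name JPCSKELT '
  map(select(.name == $name))
  | sort_by(.startTime | sub("(?<time>.*)\\..*"; "\(.time)") | strptime("%Y-%m-%d %H:%M:%S") | mktime)
  | .[].dataCenter
' data.json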
Welcome. From what I gather, you want to supply a value to filter the records on using the name property, sort the results by the startTime property, and then output just the value of the dataCenter property for those records. How about this:
jq --arg name JPCSKELT '
map(select(.name==$name))|sort_by(.startTime)[].dataCenter
' data.json
Based on your sample data, this produces:
"mvsADM"
"mvsADM"
"mvsADM"
So I'm wondering if this is what you're really asking?

regex replacement for whole object tree / reverse operation to `tostring`

So I have a big JSON document where I need to take some subtree and copy it to another place, but with some properties updated (a lot of them). So for example:
{
  "items": [
    { "id": 1, "other": "abc"},
    { "id": 2, "other": "def"},
    { "id": 3, "other": "ghi"}
  ]
}
and say that I'd like to duplicate the record having id == 2, and replace the char e in the other field with the char x using a regex. That could go (I'm sure there is a better way, but I'm a beginner) something like:
jq '.items |= . + [.[]|select (.id == 2) as $orig | .id=4 | .other=($orig.other | sub("e";"x"))]'<sample.json
producing
{
  "items": [
    {
      "id": 1,
      "other": "abc"
    },
    {
      "id": 2,
      "other": "def"
    },
    {
      "id": 3,
      "other": "ghi"
    },
    {
      "id": 4,
      "other": "dxf"
    }
  ]
}
Now that's great. But suppose that there isn't just one other field. There are a multitude of them, spread over a deep tree. Well, I can issue multiple sub operations, but assuming that the replacement pattern is sufficiently selective, maybe we can turn the whole JSON subtree into a string (trivial, the tostring method) and replace all occurrences using a single sub call. But how do I turn that substituted string back into — is it called an object? — so I can add it back to the items array?
Here's a program that might be a solution to the general problem you are describing, but if not at least illustrates how problems of this type can be solved. Note in particular that there is no explicit reference to a field named "other", and that (thanks to walk) the update function is applied to all candidate JSON objects in the input.
def update($n):
  if .items | length > 0
  then ((.items[0]|keys_unsorted) - ["id"]) as $keys
    | if ($keys | length) == 1
      then $keys[0] as $key
        | (.items|map(.id) | max + 1) as $newid
        | .items |= . + [.[] | select(.id == $n) as $orig | .id=$newid | .[$key]=($orig[$key] | sub("e";"x"))]
      else .
      end
  else .
  end;
walk(if type == "object" and has("items") then update(2) else . end)
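As for the "reverse operation to tostring" part of the question: a string containing JSON can be parsed back into a value with the fromjson builtin. A minimal round-trip sketch on the sample data (assuming the substitution pattern is selective enough not to touch key names):
jq '.items[1] | tostring | sub("def"; "dxf") | fromjson' sample.json
{
  "id": 2,
  "other": "dxf"
}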

Understanding fold() and its impact on gremlin query cost in Azure Cosmos DB

I am trying to understand query costs in Azure Cosmos DB.
I cannot figure out what the difference is between the following examples and why using fold() lowers the cost:
g.V().hasLabel('item').project('itemId', 'id').by('itemId').by('id')
which produces the following output:
[
  {
    "itemId": 14,
    "id": "186de1fb-eaaf-4cc2-b32b-de8d7be289bb"
  },
  {
    "itemId": 5,
    "id": "361753f5-7d18-4a43-bb1d-cea21c489f2e"
  },
  {
    "itemId": 6,
    "id": "1c0840ee-07eb-4a1e-86f3-abba28998cd1"
  },
  ....
  {
    "itemId": 5088,
    "id": "2ed1871d-c0e1-4b38-b5e0-78087a5a75fc"
  }
]
The cost is 15642 RUs x 0.00008 $/RU = 1.25$
g.V().hasLabel('item').project('itemId', 'id').by('itemId').by('id').fold()
which produces the following output:
[
  [
    {
      "itemId": 14,
      "id": "186de1fb-eaaf-4cc2-b32b-de8d7be289bb"
    },
    {
      "itemId": 5,
      "id": "361753f5-7d18-4a43-bb1d-cea21c489f2e"
    },
    {
      "itemId": 6,
      "id": "1c0840ee-07eb-4a1e-86f3-abba28998cd1"
    },
    ...
    {
      "itemId": 5088,
      "id": "2ed1871d-c0e1-4b38-b5e0-78087a5a75fc"
    }
  ]
]
The cost is 787 RUs x 0.00008$/RU = 0.06$
g.V().hasLabel('item').values('id', 'itemId')
with the following output:
[
  "186de1fb-eaaf-4cc2-b32b-de8d7be289bb",
  14,
  "361753f5-7d18-4a43-bb1d-cea21c489f2e",
  5,
  "1c0840ee-07eb-4a1e-86f3-abba28998cd1",
  6,
  ...
  "2ed1871d-c0e1-4b38-b5e0-78087a5a75fc",
  5088
]
cost: 10639 RUs x 0.00008 $/RU = 0.85$
g.V().hasLabel('item').values('id', 'itemId').fold()
with the following output:
[
  [
    "186de1fb-eaaf-4cc2-b32b-de8d7be289bb",
    14,
    "361753f5-7d18-4a43-bb1d-cea21c489f2e",
    5,
    "1c0840ee-07eb-4a1e-86f3-abba28998cd1",
    6,
    ...
    "2ed1871d-c0e1-4b38-b5e0-78087a5a75fc",
    5088
  ]
]
The cost is 724.27 RUs x 0.00008 $/RU = 0.057$
As you can see, the impact on the cost is tremendous, and this is just for approximately 3200 nodes with a few properties each.
I would like to understand why adding fold() changes the cost so much.
I was trying to reproduce your example, but unfortunately got the opposite result (500 vertices in Cosmos):
g.V().hasLabel('test').values('id')
or
g.V().hasLabel('test').project('id').by('id')
gave 86.08 and 91.44 RU respectively, while the same queries followed by a fold() step resulted in 585.06 and 590.43 RU.
This result seems reasonable, as the TinkerPop documentation states:
There are situations when the traversal stream needs a "barrier" to
aggregate all the objects and emit a computation that is a function of
the aggregate. The fold()-step (map) is one particular instance of
this.
Knowing that Cosmos DB charges RUs both for the number of accessed objects and for the computations performed on those objects (fold in this particular case), a higher cost for fold is expected.
You can try running the executionProfile() step for your traversal, which can help you investigate your case. When I tried:
g.V().hasLabel('test').values('id').executionProfile()
I got 2 additional steps for fold() (identical parts of the output are omitted for brevity), and the ProjectAggregation step is where the result set was mapped from 500 to 1:
...
{
  "name": "ProjectAggregation",
  "time": 165,
  "annotations": {
    "percentTime": 8.2
  },
  "counts": {
    "resultCount": 1
  }
},
{
  "name": "QueryDerivedTableOperator",
  "time": 1,
  "annotations": {
    "percentTime": 0.05
  },
  "counts": {
    "resultCount": 1
  }
}
...

How do I select multiple fields in jq?

My input file looks something like this:
{
  "login": "dmaxfield",
  "id": 7449977,
  ...
}
{
  "login": "dmaxfield",
  "id": 7449977,
  ...
}
I can get all the login names with this: cat members | jq '.[].login'
but I have not been able to crack the syntax to get both the login and the id.
You can use jq '.[] | .login, .id' to obtain each login followed by its id.
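For example, assuming the input is an array of such objects, this filter prints each login followed by its id (a small sketch reusing the sample values from the question):
echo '[{"login":"dmaxfield","id":7449977},{"login":"dmaxfield","id":7449977}]' | jq '.[] | .login, .id'
"dmaxfield"
7449977
"dmaxfield"
7449977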
This works for me:
> echo '{"a":1,"b":2,"c":3}{"a":1,"b":2,"c":3}' | jq '{a,b}'
{
  "a": 1,
  "b": 2
}
{
  "a": 1,
  "b": 2
}
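The same shorthand applied to the fields from the question would be (a sketch, reusing the echo style above):
echo '{"login":"dmaxfield","id":7449977}{"login":"dmaxfield","id":7449977}' | jq '{login, id}'
{
  "login": "dmaxfield",
  "id": 7449977
}
{
  "login": "dmaxfield",
  "id": 7449977
}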
To provide one more example (jq 1.6):
Walk through an array and select a field of each object element, as well as a field of an object nested inside that object.
echo '[{"id":1, "private_info": {"name": "Ivy", "age": 18}}, {"id":2, "private_info": {"name": "Tommy", "aga": 18}}]' | jq ".[] | {id: .id, name: .private_info.name}" -
{
  "id": 1,
  "name": "Ivy"
}
{
  "id": 2,
  "name": "Tommy"
}
Without the example data:
jq ".[] | {id, name: .private_info.name}" -
.[]: walk through the array
{id, name: .private_info.name}: take .id and .private_info.name and wrap them into an object with the field names "id" and "name" respectively
In order to select values which are nested at different levels (i.e. both first and second level), you might use the following:
echo '[{"a":{"aa":1,"ab":2},"b":3,"c":4},{"a":{"aa":5,"ab":6},"b":7,"c":8}]' \
| jq '.[]|[.a.aa,.a.ab,.b]'
[
  1,
  2,
  3
]
[
  5,
  6,
  7
]

Resources