collection findall in an array list - collections

I am using groovy and I have a collection :
person 1: age - 1, weight - 25
person 2: age - 2, weight - 20
person 3: age - 3, weight - 25
I need to find all persons whose age or weight is in the list of valid age/weight returned by a method called getValidAgeForSchool() or getValidWeightForSchool() ex. ages [2,3] or weight [20,25]
I know there is something like this (not working too)
persons.findAll{ it.age == 2 || it.weight == 20}
but how I can say (like the IN Clause)
persons.findAll {it.age in [2,3] || it.weight in [20,25]}.
I also tried this (ignoring the weight for now) but not returning the list when it is supposed to
persons.age.findAll{ it == 2 || it == 3}
thanks.

The code you have works:
def people = [
[ id: 1, age: 1, weight: 25 ],
[ id: 2, age: 2, weight: 20 ],
[ id: 3, age: 3, weight: 25 ]
]
// This will find everyone (as everyone matches your criteria)
assert people.findAll {
it.age in [ 2, 3 ] || it.weight in [ 20, 25 ]
}.id == [ 1, 2, 3 ]
It also works if you have a list of instances like so:
class Person {
int id
int age
int weight
}
def people = [
new Person( id: 1, age: 1, weight: 25 ),
new Person( id: 2, age: 2, weight: 20 ),
new Person( id: 3, age: 3, weight: 25 )
]
I'm assuming your problem is that you have weight as a double or something?
If weight is a double, you'd need to do:
people.findAll { it.age in [ 2, 3 ] || it.weight in [ 20d, 25d ] }.id
But beware, this is doing double equality comparisons, so if you are doing any arithmetic on the weight, you may fall victim to rounding and accuracy errors

Related

Incrementing a value progressively on each item

So i have this Grafana dashboard that i'm making up using jq and different files. The problem i end up with is that when you export the json produced by Grafana, it will export it the way it sees it currently. Example:
[
{
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 22
},
"panels": []
},
{
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 43
},
"panels": []
},
{
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 17
},
"panels": []
}
]
But the problem is that the grid positions need to be properly incremented (the Y's) so that when you reload the Grafana dashboards, the panels nested under row panels get set to their proper locations. If you have a sub panel that has a gridPos.y that is lower than the row panel's gridPos.y then it will appear in a weird location.
I tried using reduce and foreach but i'm not super good with these constructs yet. For example, i tried this:
[
1 as $currentY |
foreach .[] as $item (
[];
(. + [$item * {"gridPos": {"y": ($currentY + 1)}}]);
. | last
)
]
But i can't figure out how to increment $currentY within the loop to achieve proper incrementation. The objective would be to nest a second foreach/reduce to continue setting and incrementing $currentY in all panels and sub panels.
Can you help? Thanks!
Note: I know i should use reduce when using .|last, this was just the last try. Don't point that out, i want guidance on how to increment $currentY in the current approach.
With your existing approach as such, you need to reference the y field in each $item processed and increment its value, rather than the predefined value of $currentY, i.e.
[
1 as $currentY |
foreach .[] as $item (
[];
(. + [$item * {"gridPos": {"y": ($currentY + $item.gridPos.y )}}]);
last
)
]
which again could be written as
[
1 as $currentY |
foreach .[] as $item (
[];
(. + [ $item | .gridPos.y += $currentY ]);
last
)
]
which again could be written with a simple walk expression
1 as $currentY |
walk ( if type == "object" and has("gridPos") then .gridPos.y += $currentY else . end )

JQ: How to split array by values and find out length of each piece?

I need to find out lengths of user sessions given timestamps of individual visits.
New session starts every time a delay between adjacent timestamps is longer than limit.
For example, for this set of timestamps (consider it sort of seconds from epoch):
[
101,
102,
105,
116,
128,
129,
140,
145,
146,
152
]
...and for value of limit=10, I need the following output:
[
3,
1,
2,
4
]
Assuming the values will be in ascending order, loop through the values accumulating the groups based on your condition. reduce works well in this case.
10 as $limit # remove this so you can feed in your value as an argument
| reduce .[] as $i (
{prev:.[0], group:[], result:[]};
if ($i - .prev > $limit)
then {prev:$i, group:[$i], result:(.result + [.group])}
else {prev:$i, group:(.group + [$i]), result}
end
)
| [(.result[], .group) | length]
If the difference from the previous value exceeds the limit, take the current group of values and move it to the result. Otherwise, the current value belongs to the current group so add it. At the end, you could count the sizes of the groups to get your result.
Here's a slightly modified version that just counts the values up.
10 as $limit
| reduce .[] as $i (
{prev:.[0], count:0, result:[]};
if ($i - .prev > $limit)
then {prev:$i, count:1, result:(.result + [.count])}
else {prev:$i, count:(.count + 1), result}
end
)
| [.result[], .count]
Here's another approach using indices to calculate the breakpoint positions:
Producing the lengths of the segments:
10 as $limit
| [
[0, indices(while(. != []; .[1:]) | select(.[0] + $limit <= .[1]))[] + 1, length]
| .[range(length-1):] | .[1] - .[0]
]
[
3,
1,
2,
4
]
Demo
Producing the segments themselves:
10 as $limit
| [
(
[indices(while(. != []; .[1:]) | select(.[0] + $limit <= .[1]))[] + 1]
| [null, .[0]], .[range(length):]
)
as [$a,$b] | .[$a:$b]
]
[
[
101,
102,
105
],
[
116
],
[
128,
129
],
[
140,
145,
146,
152
]
]
Demo

jq - find duplicates in a value which is nested array of strings

Assuming the below input, how can I detect the presence of duplicates in the replicas list? (replicas":[5,5,6]")
{"version":1,
"partitions":
[{"topic":"mytopic1","partition":3,"replicas":[4,5],"log_dirs":["any","any"]},
{"topic":"mytopic1","partition":1,"replicas":[5,5,6],"log_dirs":["any","any"]},
{"topic":"mytopic2","partition":2,"replicas":[6,5],"log_dirs":["any","any"]}]
}
This one will give you an array of just the partitions with duplicates in the replicas field:
jq '[.partitions[] | select((.replicas | length) != (.replicas | unique | length))]' input.json
Pretty-printed example output:
[
{
"topic": "mytopic1",
"partition": 1,
"replicas": [
5,
5,
6
],
"log_dirs": [
"any",
"any"
]
}
]

MariaDB JSON_ARRAYAGG gives wrong result

I have 2 problems in MariaDB 15.1 when using JSON_ARRAYAGG
The brackets [] are omitted
Incorrect wrong result, values are duplicates or omitted
My database is the following:
user:
+----+------+
| id | name |
+----+------+
| 1 | Jhon |
| 2 | Bob |
+----+------+
car:
+----+---------+-------------+
| id | user_id | model |
+----+---------+-------------+
| 1 | 1 | Tesla |
| 2 | 1 | Ferrari |
| 3 | 2 | Lamborghini |
+----+---------+-------------+
phone:
+----+---------+----------+--------+
| id | user_id | company | number |
+----+---------+----------+--------+
| 1 | 1 | Verzion | 1 |
| 2 | 1 | AT&T | 2 |
| 3 | 1 | T-Mobile | 3 |
| 4 | 2 | Sprint | 4 |
| 5 | 1 | Sprint | 2 |
+----+---------+----------+--------+
1. The brackets [] are omitted
For example this query that gets users with their list of cars:
SELECT
user.id AS id,
user.name AS name,
JSON_ARRAYAGG(
JSON_OBJECT(
'id', car.id,
'model', car.model
)
) AS cars
FROM user
INNER JOIN car ON user.id = car.user_id
GROUP BY user.id;
Result: brackets [] were omitted in cars (JSON_ARRAYAGG has the behavior similar to GROUP_CONCAT)
+----+------+-----------------------------------------------------------+
| id | name | cars |
+----+------+-----------------------------------------------------------+
| 1 | Jhon | {"id": 1, "model": "Tesla"},{"id": 2, "model": "Ferrari"} |
| 2 | Bob | {"id": 3, "model": "Lamborghini"} |
+----+------+-----------------------------------------------------------+
However when adding the filter WHERE user.id = 1, the brackets [] are not omitted:
+----+------+-------------------------------------------------------------+
| id | name | cars |
+----+------+-------------------------------------------------------------+
| 1 | Jhon | [{"id": 1, "model": "Tesla"},{"id": 2, "model": "Ferrari"}] |
+----+------+-------------------------------------------------------------+
2. Incorrect wrong result, values are duplicates or omitted
This error is strange as the following conditions must be met:
Consult more than 2 tables
The DISTINCT option must be used
A user has at least 2 cars and at least 3 phones.
Duplicate values
for example, this query that gets users with their car list and their phone list:
SELECT
user.id AS id,
user.name AS name,
JSON_ARRAYAGG( DISTINCT
JSON_OBJECT(
'id', car.id,
'model', car.model
)
) AS cars,
JSON_ARRAYAGG( DISTINCT
JSON_OBJECT(
'id', phone.id,
'company', phone.company,
'number', phone.number
)
) AS phones
FROM user
INNER JOIN car ON user.id = car.user_id
INNER JOIN phone ON user.id = phone.user_id
GROUP BY user.id;
I will leave the output in json format and I will only leave the elements that interest.
Result: brackets [] were omitted and duplicate Verizon
{
"id": 1,
"name": "Jhon",
"phones": // [ Opening bracket expected
{
"id": 5,
"company": "Sprint",
"number": 2
},
{
"id": 1,
"company": "Verzion",
"number": 1
},
{
"id": 1,
"company": "Verzion",
"number": 1
}, // Duplicate object with the DISTINCT option
{
"id": 2,
"company": "AT&T",
"number": 2
},
{
"id": 3,
"company": "T-Mobile",
"number": 3
}
// ] Closing bracket expected
}
Omitted values
This error occurs when omit phone.id is omitted in the query
SELECT
user.id AS id,
user.name AS name,
JSON_ARRAYAGG( DISTINCT
JSON_OBJECT(
'id', car.id,
'model', car.model
)
) AS cars,
JSON_ARRAYAGG( DISTINCT
JSON_OBJECT(
--'id', phone.id,
'company', phone.company,
'number', phone.number
)
) AS phones
FROM user
INNER JOIN car ON user.id = car.user_id
INNER JOIN phone ON user.id = phone.user_id
GROUP BY user.id;
Result: brackets [] were omitted and Sprint was omitted.
Apparently this happens because it makes an OR type between the columns of the JSON_OBJECT, since the company exists in a different row and number in a other different row
{
"id": 1,
"name": "Jhon",
"phones": // [ Opening bracket expected
//{
// "company": "Sprint",
// "number": 2
//}, `Sprint` was omitted
{
"company": "Verzion",
"number": 1
},
{
"company": "AT&T",
"number": 2
},
{
"company": "T-Mobile",
"number": 3
}
// ] Closing bracket expected
}
GROUP_CONCAT instance of JSON_ARRAYAGG solves the problem of duplicate or omitted objects
However, by adding the filter WHERE user.id = 1, the brackets [] are not omitted and also the problem of duplicate or omitted objects is also solved:
{
"id": 1,
"name": "Jhon",
"phones": [
{
"id": 1,
"company": "Verzion",
"number": 1
},
{
"id": 2,
"company": "AT&T",
"number": 2
},
{
"id": 3,
"company": "T-Mobile",
"number": 3
},
{
"id": 5,
"company": "Sprint",
"number": 2
}
]
}
What am I doing wrong?
So far my solution is this, but I would like to use JSON_ARRAYAGG since the query is cleaner
-- 1
SELECT
user.id AS id,
user.name AS name,
CONCAT(
'[',
GROUP_CONCAT( DISTINCT
JSON_OBJECT(
'id', car.id,
'model', car.model
)
),
']'
) AS cars
FROM user
INNER JOIN car ON user.id = car.user_id
GROUP BY user.id;
-- 2
SELECT
user.id AS id,
user.name AS name,
CONCAT(
'[',
GROUP_CONCAT( DISTINCT
JSON_OBJECT(
'id', car.id,
'model', car.model
)
),
']'
) AS cars,
CONCAT(
'[',
GROUP_CONCAT( DISTINCT
JSON_OBJECT(
'id', phone.id,
'company', phone.company,
'number', phone.number
)
),
']'
) AS phones
FROM user
INNER JOIN car ON user.id = car.user_id
INNER JOIN phone ON user.id = phone.user_id
GROUP BY user.id;

Understanding fold() and its impact on gremlin query cost in Azure Cosmos DB

I am trying to understand query costs in Azure Cosmos DB
I cannot figure out what is the difference in the following examples and why using fold() lowers the cost:
g.V().hasLabel('item').project('itemId', 'id').by('itemId').by('id')
which produces the following output:
[
{
"itemId": 14,
"id": "186de1fb-eaaf-4cc2-b32b-de8d7be289bb"
},
{
"itemId": 5,
"id": "361753f5-7d18-4a43-bb1d-cea21c489f2e"
},
{
"itemId": 6,
"id": "1c0840ee-07eb-4a1e-86f3-abba28998cd1"
},
....
{
"itemId": 5088,
"id": "2ed1871d-c0e1-4b38-b5e0-78087a5a75fc"
}
]
The cost is 15642 RUs x 0.00008 $/RU = 1.25$
g.V().hasLabel('item').project('itemId', 'id').by('itemId').by('id').fold()
which produces the following output:
[
[
{
"itemId": 14,
"id": "186de1fb-eaaf-4cc2-b32b-de8d7be289bb"
},
{
"itemId": 5,
"id": "361753f5-7d18-4a43-bb1d-cea21c489f2e"
},
{
"itemId": 6,
"id": "1c0840ee-07eb-4a1e-86f3-abba28998cd1"
},
...
{
"itemId": 5088,
"id": "2ed1871d-c0e1-4b38-b5e0-78087a5a75fc"
}
]
]
The cost is 787 RUs x 0.00008$/RU = 0.06$
g.V().hasLabel('item').values('id', 'itemId')
with the following output:
[
"186de1fb-eaaf-4cc2-b32b-de8d7be289bb",
14,
"361753f5-7d18-4a43-bb1d-cea21c489f2e",
5,
"1c0840ee-07eb-4a1e-86f3-abba28998cd1",
6,
...
"2ed1871d-c0e1-4b38-b5e0-78087a5a75fc",
5088
]
cost: 10639 RUs x 0.00008 $/RU = 0.85$
g.V().hasLabel('item').values('id', 'itemId').fold()
with the following output:
[
[
"186de1fb-eaaf-4cc2-b32b-de8d7be289bb",
14,
"361753f5-7d18-4a43-bb1d-cea21c489f2e",
5,
"1c0840ee-07eb-4a1e-86f3-abba28998cd1",
6,
...
"2ed1871d-c0e1-4b38-b5e0-78087a5a75fc",
5088
]
]
The cost is 724.27 RUs x 0.00008 $/RU = 0.057$
As you see, the impact on the cost is tremendous.
This is just for approx. 3200 nodes with few properties.
I would like to understand why adding fold changes so much.
I was trying to reproduce your example, but unfortunately have opposite results (500 vertices in Cosmos):
g.V().hasLabel('test').values('id')
or
g.V().hasLabel('test').project('id').by('id')
gave respectively
86.08 and 91.44 RU, while same queries followed by fold() step resulted in 585.06 and
590.43 RU.
This result I got seems fine, as according to TinkerPop documentation:
There are situations when the traversal stream needs a "barrier" to
aggregate all the objects and emit a computation that is a function of
the aggregate. The fold()-step (map) is one particular instance of
this.
Knowing that Cosmos charge RUs for both number of accessed objects and computations that are done on those obtained objects (fold in this particular case), higher costs for fold is as expected.
You can try to run executionProfile() step for your traversal, which can help you to investigate your case. When I tried:
g.V().hasLabel('test').values('id').executionProfile()
I got 2 additional steps for fold() (same parts of output omitted for brevity), and this ProjectAggregation is where the result set was mapped from 500 to 1:
...
{
"name": "ProjectAggregation",
"time": 165,
"annotations": {
"percentTime": 8.2
},
"counts": {
"resultCount": 1
}
},
{
"name": "QueryDerivedTableOperator",
"time": 1,
"annotations": {
"percentTime": 0.05
},
"counts": {
"resultCount": 1
}
}
...

Resources