Aggregation/GroupCount on graphDatabase - gremlin

I have a graph database in Gremlin with the following shape:
Vertices: Person, Event
Edge: Attends
I need help building a query that returns every pair of "Persons", with an edge between them whose label is the count of all "Events" they have in common. The result should be something like this:
{
nodes: [
{id:"PersonA", label: "Person A"},
{id:"PersonB", label: "Person B"},
{id:"PersonC", label: "Person C"},
{id:"PersonD", label: "Person D"},
{id:"PersonE", label: "Person E"},
{id:"PersonF", label: "Person F"},
],
edges: [
{from: "PersonA", to: "PersonB", label: 1},
{from: "PersonA", to: "PersonC", label: 2},
{from: "PersonA", to: "PersonD", label: 2},
{from: "PersonA", to: "PersonE", label: 1},
{from: "PersonA", to: "PersonF", label: 1},
{from: "PersonB", to: "PersonC", label: 1},
{from: "PersonB", to: "PersonD", label: 1},
{from: "PersonC", to: "PersonD", label: 2},
{from: "PersonC", to: "PersonE", label: 1},
{from: "PersonC", to: "PersonF", label: 1},
{from: "PersonD", to: "PersonE", label: 1},
{from: "PersonD", to: "PersonF", label: 1},
{from: "PersonE", to: "PersonF", label: 1}
]
}
I've been struggling with this for a few hours and can't get anything close to what I'm looking for.

A picture is nice, but when asking questions about Gremlin it's best to provide a Gremlin script to create your data:
g.addV('person').property(id,'a').as('a').
addV('person').property(id,'b').as('b').
addV('person').property(id,'c').as('c').
addV('person').property(id,'d').as('d').
addV('person').property(id,'e').as('e').
addV('person').property(id,'f').as('f').
addV('event').property(id,'1').as('1').
addV('event').property(id,'2').as('2').
addE('attends').from('a').to('1').
addE('attends').from('a').to('2').
addE('attends').from('b').to('2').
addE('attends').from('c').to('1').
addE('attends').from('c').to('2').
addE('attends').from('d').to('1').
addE('attends').from('d').to('2').
addE('attends').from('e').to('1').
addE('attends').from('f').to('1').iterate()
I went with this approach to solve your problem:
g.V().hasLabel('person').as('s').
out().in().
where(neq('s')).
path().by(id).
groupCount().
by(union(limit(local,1),tail(local,1)).fold()).
unfold().
dedup().
by(select(keys).order(local)).
order().
by(select(keys).limit(local,1)).
by(select(keys).tail(local,1))
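which produces the output you're seeking (the console session below applies only the first order().by(), so pairs that share a first person are not sub-sorted):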
gremlin> g.V().hasLabel('person').as('s').
......1> out().in().
......2> where(neq('s')).
......3> path().by(id).
......4> groupCount().
......5> by(union(limit(local,1),tail(local,1)).fold()).
......6> unfold().
......7> dedup().
......8> by(select(keys).order(local)).
......9> order().by(select(keys).limit(local,1))
==>[a, b]=1
==>[a, e]=1
==>[a, c]=2
==>[a, d]=2
==>[a, f]=1
==>[b, c]=1
==>[b, d]=1
==>[c, d]=2
==>[c, e]=1
==>[c, f]=1
==>[d, e]=1
==>[d, f]=1
==>[e, f]=1
The approach above uses path() to gather the "person->event<-person" paths that Gremlin travels over, and where(neq('s')) to drop paths that lead back to the starting person. It then does a groupCount() keyed on the two "person" vertices at the ends of each path, which represent the person pairs. That gives a Map of person pairs to counts, as you want, but it needs a bit of post-processing, so we unfold() the Map into key-value pairs. The first step is to dedup() by the person pairs: the Map contains both "a->b" and "b->a" and we don't need both, so deduping by the ordered list of each pair gives the unique pairs. Finally, we add some order() to make the results look exactly like yours.
I suppose you could try to dedup() immediately after the path() and avoid some extra work in groupCount().
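To sanity-check the expected counts outside the graph, here is a minimal Python sketch of the same aggregation. It is not the Gremlin traversal itself: it swaps in a simpler technique (invert the attendance data to event -> attendees, then count every sorted pair of attendees), and the attends dictionary and the shared_event_counts name are assumptions made for illustration. Sorting each pair plays the role of the dedup() on ordered keys.

```python
from collections import Counter
from itertools import combinations

# Hypothetical in-memory copy of the sample graph: person id -> events attended.
attends = {
    "a": ["1", "2"],
    "b": ["2"],
    "c": ["1", "2"],
    "d": ["1", "2"],
    "e": ["1"],
    "f": ["1"],
}


def shared_event_counts(attends):
    """Count, for every pair of people, how many events both attended."""
    # Invert the mapping to event -> set of attendees.
    by_event = {}
    for person, events in attends.items():
        for event in events:
            by_event.setdefault(event, set()).add(person)

    # Each event contributes 1 to every pair of its attendees. Sorting the
    # attendees first means (a, b) and (b, a) collapse into a single key.
    counts = Counter()
    for people in by_event.values():
        for pair in combinations(sorted(people), 2):
            counts[pair] += 1

    # Order by first person, then second person.
    return dict(sorted(counts.items()))


pair_counts = shared_event_counts(attends)
for (first, second), count in pair_counts.items():
    print(f"[{first}, {second}]={count}")
```

Each key is a (from, to) tuple, so mapping the result onto the edges array from the question is a one-line comprehension on the client side.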

Related

Can Gremlin aggregate the values of edges connected to the same node?

Suppose you have one node with label, 'A'. This node is connected to many nodes with label, 'B', via edges with label 'e'. For a given B, there can be many edges between A and B with the same label, 'e'. On each edge, there is a property, 'p'.
We want to aggregate all the 'p' properties from edges connected from A, to the same B.
E.g. suppose we have a particular B. One edge between A and that B has a 'p' value of 'foo', and another edge connecting to the same B has a 'p' value of 'bar'. Their aggregation would be:
{'e': {'p': ['foo', 'bar']}}
How can this be achieved?
At the moment, I have this query:
g.V()
.hasLabel('A').as('A')
.outE().hasLabel('e').as('e')
.inV().hasLabel('B').as('B')
.select('A', 'e', 'B')
.by(valueMap())
It would produce an output like this:
[
{'A': {'name': ['john']}, 'e': {'p': ['foo']}, 'B': {'place': 'Qatar'}},
{'A': {'name': ['john']}, 'e': {'p': ['bar']}, 'B': {'place': 'Qatar'}},
{'A': {'name': ['john']}, 'e': {'p': ['hello']}, 'B': {'place': 'Argentina'}},
{'A': {'name': ['john']}, 'e': {'p': ['goodbye']}, 'B': {'place': 'Argentina'}}
]
Whereas, I would want this:
[
{'A': {'name': ['john']}, 'e': {'p': ['foo', 'bar']}, 'B': {'place': 'Qatar'}},
{'A': {'name': ['john']}, 'e': {'p': ['hello', 'goodbye']}, 'B': {'place': 'Argentina'}}
]
Using the data from the question, the following graph can be built:
g.addV('A').property('name','john').property(id,'J1').as('j').
addV('B').property('place','Qatar').property(id,'Q1').as('q').
addV('B').property('place','Argentina').property(id,'A1').as('a').
addE('e').from('j').to('q').property('p','foo').
addE('e').from('j').to('q').property('p','bar').
addE('e').from('j').to('a').property('p','hello').
addE('e').from('j').to('a').property('p','goodbye')
Using that graph, we can get close to what you are looking for using a nested group step. From these building blocks you should be able to construct other variations:
g.V().hasLabel('A').as('a').outE('e').as('e').inV().hasLabel('B').as('b').
group().
by(select('a').values('name')).
by(group().
by(select('b').values('place')).
by(select('e').values('p').fold()))
Which yields
{'john': {'Argentina': ['hello', 'goodbye'], 'Qatar': ['foo', 'bar']}}
Using valueMap we can add the keys to the result:
g.V().hasLabel('A').as('a').outE('e').as('e').inV().hasLabel('B').as('b').
group().
by(select('a').values('name')).
by(group().
by(select('b').valueMap('place')).
by(select('e').valueMap('p').unfold().group().by(keys).by(values)))
Which produces
{'john': {{'place': ('Argentina',)}: {'p': ['hello', 'goodbye']}, {'place': ('Qatar',)}: {'p': ['foo', 'bar']}}}
So, what we end up with, for each person (just "john" in this case), is a map containing each place they visited along with the "p" values for each edge that got them there. You can then select into that nested structure any way you need to in order to extract individual pieces.
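The shape of the nested group() can be easier to see outside Gremlin. The following Python sketch reproduces the same two-level grouping on plain tuples; the rows list and the nested_group name are assumptions made for illustration, standing in for the (a, e, b) values the traversal selects.

```python
from collections import defaultdict

# Hypothetical rows, one per A -e-> B hop: (A's name, B's place, edge property p).
rows = [
    ("john", "Qatar", "foo"),
    ("john", "Qatar", "bar"),
    ("john", "Argentina", "hello"),
    ("john", "Argentina", "goodbye"),
]


def nested_group(rows):
    """Group by name, then by place, folding the p values into a list."""
    grouped = defaultdict(lambda: defaultdict(list))
    for name, place, p in rows:
        # Outer key mirrors the first by(), inner key the nested group()'s
        # first by(), and append() mirrors the fold() of the p values.
        grouped[name][place].append(p)
    # Convert the defaultdicts to plain dicts for clean printing and comparison.
    return {name: dict(places) for name, places in grouped.items()}


result = nested_group(rows)
print(result)
```

Extracting one piece is then ordinary indexing, for example result["john"]["Qatar"].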

Getting properties of a vertex along with the properties of its child vertices [Gremlin]

I’m using Gremlin for querying a Graph DB and I’m having trouble figuring out how to retrieve all properties of a specific vertex along with all properties of specific child vertices. I know valueMap() is generally the operation to use to expose properties of nodes, but I'm not sure I am using it correctly.
Here is a visual representation of the Graph that I am working with. In this graph there are author nodes, which can be related to multiple book nodes connected by a wrote edge. And a book node can be connected to multiple chapter nodes by a hasChapter edge. A book node has a title and year as additional properties, while a chapter node has a name and a length as additional properties:
Here is the data that produces the above graph:
g.addV('author').property(id, 'author1').as('a1').
addV('book').
property(id,'book1').
property('title', 'Book 1').
property('year', '1999').
as('b1').
addV('book').
property(id,'book2').
property('title', 'Book 2').
property('year', '2000').
as('b2').
addV('book').
property(id,'book3').
property('title', 'Book 3').
property('year', '2002').
as('b3').
addE('wrote').from('a1').to('b1').
addE('wrote').from('a1').to('b2').
addE('wrote').from('a1').to('b3').
addV('chapter').
property(id,'b1chapter1').
property('name', 'The Start').
property('length', '350').
as('b1c1').
addV('chapter').
property(id,'b1chapter2').
property('name', 'Trees').
property('length', '500').
as('b1c2').
addV('chapter').
property(id,'b2chapter1').
property('name', 'Chapter 1').
property('length', '425').
as('b2c1').
addV('chapter').
property(id,'b2chapter2').
property('name', 'Chapter 2').
property('length', '650').
as('b2c2').
addV('chapter').
property(id,'b2chapter3').
property('name', 'Chapter 3').
property('length', '505').
as('b2c3').
addE('hasChapter').from('b1').to('b1c1').
addE('hasChapter').from('b1').to('b1c2').
addE('hasChapter').from('b2').to('b2c1').
addE('hasChapter').from('b2').to('b2c2').
addE('hasChapter').from('b2').to('b2c3').
iterate()
I would like to form a query that is able to return the properties of all books that are written by author1, along with all properties of each book’s chapters, ideally sorted by date (ascending). I'm wondering if it's possible to make a query that would return the results in the following fashion (or something similar enough that I can parse through on the client side):
{'year': ['1999'], 'title': ['Book 1'], 'chapters': [{'name': ['The Start'], 'length': ['350']}, {'name': ['Trees'], 'length': ['500']}]}
{'year': ['2000'], 'title': ['Book 2'], 'chapters': [{'name': ['Chapter 1'], 'length': ['425']}, {'name': ['Chapter 2'], 'length': ['650']}, {'name': ['Chapter 3'], 'length': ['505']}]}
{'year': ['2002'], 'title': ['Book 3'], 'chapters': []}
So far, I have attempted a few variations of this query with no luck:
g.V('author1').as('writer')
.out('wrote').as('written')
.order().by('year', asc)
.out('hasChapter').as('chapter')
.project('written', 'chapter')
.by(valueMap())
which returns:
{'written': {'name': ['The Start'], 'length': ['350']}, 'chapter': {'name': ['The Start'], 'length': ['350']}}
{'written': {'name': ['Trees'], 'length': ['500']}, 'chapter': {'name': ['Trees'], 'length': ['500']}}
{'written': {'name': ['Chapter 1'], 'length': ['425']}, 'chapter': {'name': ['Chapter 1'], 'length': ['425']}}
{'written': {'name': ['Chapter 2'], 'length': ['650']}, 'chapter': {'name': ['Chapter 2'], 'length': ['650']}}
{'written': {'name': ['Chapter 3'], 'length': ['505']}, 'chapter': {'name': ['Chapter 3'], 'length': ['505']}}
The above query only returns the chapter properties for all books with chapters, whereas I'm looking for a query that will give me all book properties (regardless of whether the book has chapters) and all chapter properties of each book. Does anyone have any advice on the use of valueMap() across multiple levels of traversal? Ideally I would like to avoid multiple queries to the graph DB, but I'm open to solutions that involve doing so.
The query below gives the result in the shape you need.
I think the key is projecting the values inside the by modulator.
gremlin> g.V('author1').
......1> out('wrote').
......2> order().by('year', asc).
......3> project('year', 'title', 'chapters').
......4> by('year').
......5> by('title').
......6> by(out('hasChapter').valueMap().fold())
==>[year:1999,title:Book 1,chapters:[[name:[Trees],length:[500]],[name:[The Start],length:[350]]]]
==>[year:2000,title:Book 2,chapters:[[name:[Chapter 1],length:[425]],[name:[Chapter 2],length:[650]],[name:[Chapter 3],length:[505]]]]
==>[year:2002,title:Book 3,chapters:[]]
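For readers who want to see why projecting inside the by() modulators keeps books that have no chapters, here is a minimal Python sketch of the same idea on plain dictionaries. The books and chapters structures and the project_books name are assumptions for illustration; the point is that the chapter list is computed per book (like out('hasChapter').valueMap().fold()), so an empty list is a valid result rather than a filtered-out row.

```python
# Hypothetical in-memory copy of the sample data.
books = [
    {"id": "book3", "title": "Book 3", "year": "2002"},
    {"id": "book1", "title": "Book 1", "year": "1999"},
    {"id": "book2", "title": "Book 2", "year": "2000"},
]
chapters = {
    "book1": [
        {"name": "The Start", "length": "350"},
        {"name": "Trees", "length": "500"},
    ],
    "book2": [
        {"name": "Chapter 1", "length": "425"},
        {"name": "Chapter 2", "length": "650"},
        {"name": "Chapter 3", "length": "505"},
    ],
}


def project_books(books, chapters):
    """One record per book, with its chapters folded into a list."""
    return [
        {
            "year": book["year"],
            "title": book["title"],
            # Looked up per book, so a book without chapters yields [].
            "chapters": chapters.get(book["id"], []),
        }
        # Years are four-digit strings here, so string order equals date order.
        for book in sorted(books, key=lambda book: book["year"])
    ]


projected = project_books(books, chapters)
for record in projected:
    print(record)
```

By contrast, stepping out('hasChapter') in the main traversal moves every traverser onto a chapter, which is why the original attempt lost both the book properties and Book 3.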

jq function to convert a stream of strings into a single array

I am trying to get an array of all possible paths in a JSON Document.
Given the document:
{
"a": "bar",
"b": [
{"c": 3}, {"d": 6},
{"c": 7}, {"d": 5}
]
}
I'd like the output to be:
["","a","b","b/0","b/0/c","b/1","b/1/d","b/2","b/2/c","b/3","b/3/d"]
I got pretty close, here is a snippet on the JQ Playground.
Try
jq '["", (paths | join("/"))]'
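If it helps to see what paths enumerates, this Python sketch walks the same document in the same pre-order and joins each path with "/". The paths function here is a hypothetical re-implementation written for illustration, not jq's own code.

```python
import json


def paths(node, prefix=()):
    """Yield the path to every nested value, parents before children."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return  # scalars have no children, so nothing to yield
    for key, child in items:
        current = prefix + (key,)
        yield current
        yield from paths(child, current)


doc = json.loads('{"a": "bar", "b": [{"c": 3}, {"d": 6}, {"c": 7}, {"d": 5}]}')

# The leading "" stands for the root, which the walk itself does not emit.
result = [""] + ["/".join(str(key) for key in path) for path in paths(doc)]
print(json.dumps(result))
```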

jq to filter only the value of id

The following is the JSON data. I need to get only the value of the id key.
{apps:[ {
"id": "/application1/4b693882-ffba-4c93-a0f2-cccafcb4d7dd",
"cmd": null,
"args": null,
"user": null,
"env": {},
"constraints": [
[
"hostname",
"GROUP_BY",
"5"
]
},
{
"id": "/application2/4b693882-ffba-4c93-a0f2-cccafcb4d7dd",
"cmd": null,
"args": null,
"user": null,
"env": {},
"constraints": [
[
"hostname",
"GROUP_BY",
"5"
]
]},
The expected output is:
/application1/4b693882-ffba-4c93-a0f2-cccafcb4d7dd
/application2/4b693882-ffba-4c93-a0f2-cccafcb4d7dd
Thanks in advance
After fixing the errors in your JSON, we can use the following jq filter to get the desired output:
.apps[] | .id
Result jq -r '.apps[] | .id':
/application1/4b693882-ffba-4c93-a0f2-cccafcb4d7dd
/application2/4b693882-ffba-4c93-a0f2-cccafcb4d7dd
You can use map() to create an array from the properties of the objects. Try this:
let data = {apps:[{"id":"/application1/4b693882-ffba-4c93-a0f2-cccafcb4d7dd","cmd":null,"args":null,"user":null,"env":{},"constraints":["hostname","GROUP_BY","5"]},{"id":"/application2/4b693882-ffba-4c93-a0f2-cccafcb4d7dd","cmd":null,"args":null,"user":null,"env":{},"constraints":["hostname","GROUP_BY","5"]}]}
let ids = data.apps.map(o => o.id);
console.log(ids);
Note that I corrected the invalid brace/bracket combinations in the data structure you posted in the question. I assume this is just a typo in that example, otherwise there would be parsing errors in the console.

GeoJSON MultiPolygon with multiple holes

Below I have what I'd expect is a way to create a GeoJSON MultiPolygon object with one polygon in it which has two "holes".
When I use the service http://geojson.io/ to validate this object, it returns the error "each element in a position must be a number" and does not render. However, if I remove the "holes" nesting, dropping one of the holes, it works.
I'm looking for a way to describe a MultiPolygon where the polygons can have multiple holes.
I'm not looking for a way in code to create a polygon with holes.
I'm looking for a way to use the GeoJSON spec to represent MultiPolygons with multiple holes.
{
"type": "MultiPolygon",
"coordinates": [
[
[
[
-73.98114904754641,
40.7470284264813
],
[
-73.98314135177611,
40.73416844413217
],
[
-74.00538969848634,
40.734314779027144
],
[
-74.00479214294432,
40.75027851544338
],
[
-73.98114904754641,
40.7470284264813
]
],
[
[
[
-73.99818643920906,
40.74550031602355
],
[
-74.00298643920905,
40.74550031602355
],
[
-74.00058643920897,
40.74810024102966
],
[
-73.99818643920906,
40.74550031602355
]
],
[
[
-73.98917421691903,
40.73646098717515
],
[
-73.99397421691901,
40.73646098717515
],
[
-73.99157421691893,
40.739061265535696
],
[
-73.98917421691903,
40.73646098717515
]
]
]
]
]
}
This is how it works:
{
"type": "MultiPolygon",
"coordinates": [
[
{polygon},
{hole},
{hole},
{hole}
]
]
}
Not like this:
{
"type": "MultiPolygon",
"coordinates": [
[
{polygon},
[
{hole},
{hole},
{hole}
]
]
]
}
Here's an example!
{
"type": "MultiPolygon",
"coordinates": [
[
[
[
-47.900390625,
-14.944784875088372
],
[
-51.591796875,
-19.91138351415555
],
[
-41.11083984375,
-21.309846141087192
],
[
-43.39599609375,
-15.390135715305204
],
[
-47.900390625,
-14.944784875088372
]
],
[
[
-46.6259765625,
-17.14079039331664
],
[
-47.548828125,
-16.804541076383455
],
[
-46.23046874999999,
-16.699340234594537
],
[
-45.3515625,
-19.31114335506464
],
[
-46.6259765625,
-17.14079039331664
]
],
[
[
-44.40673828125,
-18.375379094031825
],
[
-44.4287109375,
-20.097206227083888
],
[
-42.9345703125,
-18.979025953255267
],
[
-43.52783203125,
-17.602139123350838
],
[
-44.40673828125,
-18.375379094031825
]
]
]
]
}
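The nesting rule can also be checked mechanically. The sketch below builds a MultiPolygon with the holes as siblings of the outer ring and compares the depth of each ring against the incorrectly nested form; the coordinates and the polygon_coords and nesting_depth helpers are hypothetical, chosen only to illustrate the structure.

```python
import json


def polygon_coords(outer, holes=()):
    """Rings of one polygon: the outer ring first, then each hole beside it."""
    return [outer, *holes]


def nesting_depth(value):
    """Count how many list levels wrap the first number inside value."""
    depth = 0
    while isinstance(value, list) and value:
        depth += 1
        value = value[0]
    return depth


# Made-up coordinates: a square with two small triangular holes.
outer = [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]
hole1 = [[1, 1], [2, 2], [2, 1], [1, 1]]
hole2 = [[5, 5], [6, 6], [6, 5], [5, 5]]

multipolygon = {
    "type": "MultiPolygon",
    "coordinates": [polygon_coords(outer, [hole1, hole2])],
}

# A ring is a list of positions, so every ring should have depth 2.
good_depths = [nesting_depth(ring) for ring in multipolygon["coordinates"][0]]

# Wrapping the holes in an extra array makes the second "ring" one level too deep.
wrongly_nested = [outer, [hole1, hole2]]
bad_depths = [nesting_depth(ring) for ring in wrongly_nested]

print(json.dumps(multipolygon))
print(good_depths, bad_depths)
```

The extra level in the second case is what leads a validator to find an array where it expects a number inside a position.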
In fact, your example is not really a MultiPolygon (in the GeoJSON sense) but a simple Polygon (with a single outer ring and multiple inner rings for the holes).
Note the difference from multipolygons in OSM, which represents them as a relation containing ways whose first and last node are merged into the same "node" element. That concept does not exist in GeoJSON, where the first and last positions of a ring are unified only by having the same coordinates.
Note that when you import GeoJSON into OSM editors (such as JOSM), rings are imported with separate nodes for the first and last node, even if they have the same coordinates. You need to use the JOSM validator to detect superposed nodes and merge them after the import in JOSM but before submission to OSM.
In scripts or general use of GeoJSON, every ring (array of coordinate pairs) in a "type":"Polygon", or in a member of a "type":"MultiPolygon", should repeat the coordinates of the first node as the last node: RFC 7946 requires the first and last positions of a linear ring to be identical. Some tools close an unclosed ring implicitly, since these types represent surfaces, but that is not guaranteed. "LineString" and "MultiLineString" (GeoJSON has no "Polyline" type) represent curves, so there you must also include the first coordinates twice to get a closed curve.
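Since a ring that explicitly repeats its first position is the form every consumer accepts, a small helper can normalize rings before they are written out. This is a minimal sketch; close_ring is a hypothetical name, not part of any GeoJSON library.

```python
def close_ring(ring):
    """Return a copy of ring whose last position repeats the first."""
    if not ring:
        return []
    closed = [list(position) for position in ring]
    if closed[0] != closed[-1]:
        # Append a copy of the first position so the ring is explicitly closed.
        closed.append(list(closed[0]))
    return closed


open_ring = [[0, 0], [1, 0], [1, 1]]
closed_ring = close_ring(open_ring)
print(closed_ring)
```

The helper is idempotent, so applying it to a ring that is already closed leaves it unchanged.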
To represent an OSM "multipolygon" with multiple "outer" rings, you have to include several "[ {outer},{inner*} ]" in the main array of coordinates for the geoJSON "MultiPolygon" type, i.e.
{"type":"MultiPolygon", "coordinates":[
[
[[x0,y0], [x1,y1], ... [x0,y0]], /*outer1*/
[[x0,y0], [x1,y1], ... [x0,y0]], /*inner1, optional*/
[[x0,y0], [x1,y1], ... [x0,y0]], /*inner2, optional*/
],[
[[x0,y0], [x1,y1], ... [x0,y0]], /*outer2*/
],...,[
[[x0,y0], [x1,y1], ... [x0,y0]], /*outer3*/
],[
[[x0,y0], [x1,y1], ... [x0,y0]], /*outer4*/
]
]}
So for your example, the solution is:
{"type":"Polygon", "coordinates":[
[[x0,y0], [x1,y1], [x2,y2], [x3,y3], [x0,y0]], /*outer1*/
[[x4,y4], [x5,y5], [x6,y6], [x4,y4]], /*inner1*/
[[x7,y7], [x8,y8], [x9,y9], [x7,y7]] /*inner2*/
]}
If you had several outer rings only (possibly overlapping to create a union of surfaces, but this is not recommended), it would need to be a MultiPolygon, and here you would get no "holes":
{"type":"MultiPolygon", "coordinates":[
[[[x0,y0], [x1,y1], [x2,y2], [x3,y3], [x0,y0]]], /*outer1*/
[[[x4,y4], [x5,y5], [x6,y6], [x4,y4]]], /*outer2*/
[[[x7,y7], [x8,y8], [x9,y9], [x7,y7]]] /*outer3*/
]}
Note that the "Polygon" solution for your example has one fewer level of [square brackets] than a MultiPolygon, because we can use "Polygon" there instead of a MultiPolygon that would contain only one member.
As far as I know, you can use the SUBSTR(JSON_EXTRACT(ST_ASGEOJSON(WKT) function when converting from WKT to geography. That way you can show it on a map. What I found in BigQuery is that a multipolygon with holes seems to have the positions of its hole coordinates switched when you use ST_ASGEOJSON().
And check out this link:
https://dev.socrata.com/docs/datatypes/multipolygon.html#
