Calculate connection between consecutive vertices on the path - gremlin

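Consider the graph below (the original image is not reproduced here; a script that builds the same graph is given further down).
Find all the paths between person1 and person5, then calculate the connection between each pair of consecutive vertices on the path.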
To illustrate the definition of connection, take person1 and person2 as an example:
1. person1 created comment1, which replies to post2, created by person2.
2. person2 created comment3, which replies to post1, created by person1.
So the connection between person1 and person2 is 2, and the connection between person2 and person5 is 0.
In this graph, the only such path is [v[person1],v[person2],v[person5]]:
gremlin> g.V('person1').
......1> repeat(both('knows').simplePath()).
......2> until(hasId('person5')).path()
==>[v[person1],v[person2],v[person5]]
So far, I have only managed to write this traversal:
gremlin> g.V('person1').
......1> repeat(both('knows').simplePath()).
......2> until(hasId('person5').or().loops().is(eq(2))).hasId('person5').path().
......3> repeat(
......4> filter(count(local).is(gt(1))).
......5> sack(assign).by(
......6> sideEffect(range(local,1,2).aggregate('m')).
......7> range(local,0,1).
......8> in('hasCreator').hasLabel('comment').
......9> out('replyOf').hasLabel('post').
.....10> out('hasCreator').where(within('m')).count()
.....11> ).
.....12> sack(sum).by(
.....13> sideEffect(range(local,0,1).aggregate('n')).
.....14> range(local,1,2).
.....15> in('hasCreator').hasLabel('comment').
.....16> out('replyOf').hasLabel('post').
.....17> out('hasCreator').where(within('n')).count()
.....18> ).
.....19> skip(local, 1)
.....20> ).
.....21> emit().sack().fold()
==>[2,1]
But the result is wrong: it should be [2,0]. I know I should not use aggregate() to filter, but with my current knowledge I can't find a proper alternative.
The example graph can be generated with:
g.addV('person').property(id, 'person1')
g.addV('person').property(id, 'person2')
g.addV('person').property(id, 'person5')
g.addE('knows').from(V('person1')).to(V('person2'))
g.addE('knows').from(V('person2')).to(V('person5'))
g.addV('post').property(id, 'post1')
g.addV('post').property(id, 'post2')
g.addV('comment').property(id, 'comment1')
g.addV('comment').property(id, 'comment2')
g.addV('comment').property(id, 'comment3')
g.addE('hasCreator').from(V('post1')).to(V('person1'))
g.addE('hasCreator').from(V('post2')).to(V('person2'))
g.addE('hasCreator').from(V('comment1')).to(V('person1'))
g.addE('hasCreator').from(V('comment2')).to(V('person2'))
g.addE('hasCreator').from(V('comment3')).to(V('person2'))
g.addE('replyOf').from(V('comment1')).to(V('post2'))
g.addE('replyOf').from(V('comment2')).to(V('post2'))
g.addE('replyOf').from(V('comment3')).to(V('post1'))
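To check the expected [2,0] independently of Gremlin, here is a plain-Python model of the same computation over this data. It is a sketch under assumptions: the dictionaries are a hand-copied encoding of the script above, the names (created_by, reply_of, simple_paths, connection) are hypothetical, and simple_paths only loosely mirrors repeat(both('knows').simplePath()).until(hasId(...)).

```python
from collections import defaultdict

# Hand-copied encoding of the example graph (hypothetical names).
knows = [("person1", "person2"), ("person2", "person5")]
created_by = {  # post/comment -> creator (hasCreator edges)
    "post1": "person1", "post2": "person2",
    "comment1": "person1", "comment2": "person2", "comment3": "person2",
}
reply_of = {"comment1": "post2", "comment2": "post2", "comment3": "post1"}


def simple_paths(start, end):
    """Cycle-free paths over 'knows' in either direction, stopping at `end`."""
    neighbors = defaultdict(list)
    for a, b in knows:
        neighbors[a].append(b)
        neighbors[b].append(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for nxt in neighbors[path[-1]]:
            if nxt not in path:  # like simplePath()
                stack.append(path + [nxt])
    return paths


def connection(a, b):
    """Comments by a that reply to posts by b, plus comments by b replying to posts by a."""
    def count(x, y):
        return sum(1 for c, who in created_by.items()
                   if who == x and c in reply_of and created_by[reply_of[c]] == y)
    return count(a, b) + count(b, a)


def path_connections(path):
    """Connection for each pair of consecutive vertices on the path."""
    return [connection(a, b) for a, b in zip(path, path[1:])]
```

For example, path_connections(simple_paths("person1", "person5")[0]) gives [2, 0].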

After replacing aggregate() with select(), I now get the right answer:
g.V('person1').
repeat(both('knows').simplePath()).
until(hasId('person5').or().loops().is(eq(2))).hasId('person5').path().
repeat(
filter(count(local).is(gt(1))).
sack(assign).by(
__.as('orig').
range(local,1,2).as('v2').
select('orig').range(local,0,1).
in('hasCreator').hasLabel('comment').
out('replyOf').hasLabel('post').
out('hasCreator').where(eq('v2')).count()
).
sack(sum).by(
__.as('orig').
range(local,0,1).as('v1').
select('orig').range(local,1,2).
in('hasCreator').hasLabel('comment').
out('replyOf').hasLabel('post').
out('hasCreator').where(eq('v1')).count()
).
skip(local, 1)
).
emit().sack().fold()
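The likely reason the first version returned [2,1]: aggregate() collects into a side effect that belongs to the whole traversal, so 'm' and 'n' are not reset between repeat() iterations. On the second iteration 'm' still contains person2, and person2's reply to their own post (comment2 replying to post2) gets counted. The sketch below models both versions in plain Python; it is an illustration under that assumption, not Gremlin itself, and the function names are hypothetical.

```python
# Hand-copied encoding of the example graph (hypothetical names).
created_by = {
    "post1": "person1", "post2": "person2",
    "comment1": "person1", "comment2": "person2", "comment3": "person2",
}
reply_of = {"comment1": "post2", "comment2": "post2", "comment3": "post1"}


def reply_targets(person):
    """Creators of the posts `person` replied to, one entry per comment
    (like in('hasCreator').hasLabel('comment').out('replyOf').out('hasCreator'))."""
    return [created_by[reply_of[c]] for c, who in created_by.items()
            if who == person and c in reply_of]


def with_shared_sets(path):
    """Models the aggregate('m') / aggregate('n') version: the sets are never cleared."""
    m, n, result = set(), set(), []
    while len(path) > 1:
        a, b = path[0], path[1]
        m.add(b)
        total = sum(t in m for t in reply_targets(a))
        n.add(a)
        total += sum(t in n for t in reply_targets(b))
        result.append(total)
        path = path[1:]  # like skip(local, 1)
    return result


def per_pair(path):
    """Models the select() version: each pair is only compared with itself."""
    return [sum(t == b for t in reply_targets(a)) + sum(t == a for t in reply_targets(b))
            for a, b in zip(path, path[1:])]
```

With path = ["person1", "person2", "person5"], with_shared_sets(path) reproduces the wrong [2, 1] and per_pair(path) gives [2, 0].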

Related

How to merge values from different objects in gremlin query?

I have a query that returns output in the following format:
{
"Key": [
"Value1",
"Value2"
],
"Count": [
{
"Count1": 28,
"Count2": 28
},
{
"Count3": 16,
"Count4": 16
}
]
}
I want to display it in the following format:
[
{
"Key" : "Value1",
"Count1": 28,
"Count2": 28
},
{
"Key" : "Value2",
"Count3": 16,
"Count4": 16
}
]
Is it possible?
The Gremlin query that produces similar output:
g.V().
has('organizationId', 'b121672e-8049-40cc-9f28-c62dff4cc2d9').
hasLabel('employee').
group().
by('officeId').
by(project('Id', 'Status').
by(choose(has('officeId'), constant('Total'), constant(''))).
by(coalesce(out('hasStatus').
or(
has('release', is(false)),
has('autoRelease', is(true)).
has('release', is(true)).
has('endDate', gte(637250976000000000))
), values('status'), constant('Green'))).
select(values).
unfold().
groupCount()).
project('Id', 'Count').
by(select(keys)).
by(select(values))
The data consists of employee vertices and healthStatus vertices, with a hasStatus edge from an employee to its healthStatus.
Properties in employee vertex:
id, organizationId, officeId, name, createdOn
Properties in healthStatus vertex:
id, status, startDate, endDate, release, autoRelease, createdOn
Sample Data
g.addV('employee').
property('id',1).
property('organizationId',1).
property('officeId',1).
property('name','A').
property('createdOn', 637263231140000000).as('1').
addV('employee').
property('id',2).
property('organizationId',1).
property('officeId',2).
property('name','B').
property('createdOn', 637263231140000000).as('2').
addV('employee').
property('id',5).
property('organizationId',1).
property('officeId',3).
property('name','C').
property('createdOn', 637263231140000000).as('5').
addV('healthStatus').
property('id',3).
property('status','Red').
property('startDate',637262367140000000).
property('endDate',637264095140000000).
property('release',false).
property('createdOn',637262367140000000)as('3').
addV('healthStatus').
property('id',4).
property('status','Yellow').
property('startDate',637262367140000000).
property('endDate',637264095140000000).
property('release',false).
property('createdOn',637262367140000000)as('4').
addE('hasStatus').from('1').to('3').
addE('hasStatus').from('4').to('4')
Output:
[
{
"Id" : [
1,
2,
3
]
},
{
"Count": [
{
"Red" : 1
},
{
"Yellow" : 1
},
{
"Green" : 1
}
]
}
]
Expected Output
[
{
"Id" : 1,
"Red" : 1
},
{
"Id" : 2,
"Yellow" : 1
},
{
"Id" : 3,
"Green" : 1
}
]
Note: the Id in the projection is the officeId of the employee vertex.
I think I've captured what you wanted. There were some errors in your sample data script, and I wanted some extra data to make sure the counts made sense, so I added a bit:
g = TinkerGraph.open().traversal()
g.addV('employee').
property('id',1).
property('organizationId',1).
property('officeId',1).
property('name','A').
property('createdOn', 637263231140000000).as('1').
addV('employee').
property('id',2).
property('organizationId',1).
property('officeId',2).
property('name','B').
property('createdOn', 637263231140000000).as('2').
addV('employee').
property('id',5).
property('organizationId',1).
property('officeId',3).
property('name','C').
property('createdOn', 637263231140000000).as('5').
addV('employee').
property('id',6).
property('organizationId',1).
property('officeId',3).
property('name','D').
property('createdOn', 637263231140000000).as('6').
addV('healthStatus').
property('id',3).
property('status','Red').
property('startDate',637262367140000000).
property('endDate',637264095140000000).
property('release',false).
property('createdOn',637262367140000000).as('3').
addV('healthStatus').
property('id',4).
property('status','Yellow').
property('startDate',637262367140000000).
property('endDate',637264095140000000).
property('release',false).
property('createdOn',637262367140000000).as('4').
addE('hasStatus').from('1').to('3').
addE('hasStatus').from('2').to('4').
addE('hasStatus').from('6').to('4')
I've rewritten your traversal a bit to take a different approach that I think provides the data you expect, though in a slightly different form:
gremlin> g.V().has('employee','organizationId', 1).
......1> project('Id', 'Status').
......2> by('officeId').
......3> by(coalesce(out('hasStatus').
......4> or(has('release', false),
......5> has('autoRelease', true).has('release', true).has('endDate', gte(637250976000000000))).
......6> values('status'),
......7> constant('Green'))).
......8> group().
......9> by(select('Id')).
.....10> by(groupCount().
.....11> by('Status'))
==>[1:[Red:1],2:[Yellow:1],3:[Yellow:1,Green:1]]
I slightly prefer this form, but if you require the original format you asked about, you need another round of manipulation on the collection:
gremlin> g.V().has('employee','organizationId', 1).
......1> project('Id', 'Status').
......2> by('officeId').
......3> by(coalesce(out('hasStatus').
......4> or(has('release', false),
......5> has('autoRelease', true).has('release', true).has('endDate', gte(637250976000000000))).
......6> values('status'),
......7> constant('Green'))).
......8> group().
......9> by(select('Id')).
.....10> by(groupCount().
.....11> by('Status')).
.....12> unfold().
.....13> map(union(project('Id').by(select(keys)),
.....14> select(values)).
.....15> unfold().
.....16> group().by(keys).by(select(values)))
==>[Red:1,Id:1]
==>[Yellow:1,Id:2]
==>[Yellow:1,Id:3,Green:1]
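If you would rather do the final reshaping on the client, the logic is small. The following is a plain-Python sketch, assuming the driver hands back ordinary dicts and lists; the data is a hand-copied, hypothetical encoding of the sample graph above, and every function name is my own. merge_key_count reshapes the question's original {"Key": [...], "Count": [...]} result, assuming the two lists are aligned by position.

```python
from collections import Counter

# Hand-copied encoding of the answer's sample data (hypothetical names).
employees = [(1, 1), (2, 2), (5, 3), (6, 3)]  # (employee id, officeId)
statuses = {
    3: {"status": "Red", "release": False, "endDate": 637264095140000000},
    4: {"status": "Yellow", "release": False, "endDate": 637264095140000000},
}
has_status = {1: [3], 2: [4], 6: [4]}  # employee id -> healthStatus ids
CUTOFF = 637250976000000000


def active(s):
    """The or() filter: not released, or auto-released and still in range."""
    return s.get("release") is False or (
        s.get("autoRelease") is True and s.get("release") is True
        and s.get("endDate", 0) >= CUTOFF)


def status_of(emp_id):
    """coalesce(... values('status'), constant('Green')); project() keeps the first result."""
    matches = [statuses[sid]["status"] for sid in has_status.get(emp_id, [])
               if active(statuses[sid])]
    return matches[0] if matches else "Green"


def counts_by_office():
    """Like group().by(select('Id')).by(groupCount().by('Status'))."""
    grouped = {}
    for emp_id, office in employees:
        grouped.setdefault(office, Counter())[status_of(emp_id)] += 1
    return {office: dict(c) for office, c in grouped.items()}


def flatten(grouped):
    """One map per office with an Id key, like the unfold()/union()/group() step."""
    return [{"Id": office, **counts} for office, counts in grouped.items()]


def merge_key_count(result):
    """Reshape {"Key": [...], "Count": [...]} by position into one map per key."""
    return [{"Key": k, **c} for k, c in zip(result["Key"], result["Count"])]
```

Doing it server-side, as above, keeps the payload shape stable across clients; doing it client-side keeps the traversal simpler. Either is reasonable.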

Gremlin - Update or insert multiple vertices

I'm trying the following Gremlin query to replace vertices if they already exist, but the line V(__.select('id')).drop() keeps failing because __.select('id') does not return the id at that point.
vertices = [
{"id":1, "label": "person", "first_name":"bob","age":25,"height": 177},
{"id":2, "label": "person", "first_name":"joe","surname":"bloggs", "age": 32}
]
graph_traversal.inject(vertices).unfold().as_('entity'). \
V(__.select('id')).drop(). \
addV(__.select('label')).property(T.id, __.select('id')).as_('vertex'). \
sideEffect(__.select('entity').unfold().as_('kv').select('vertex'). \
property(
__.select('kv').by(Column.keys),
__.select('kv').by(Column.values)
)
)
Here is one way to approach it, but it requires a bit of client-side processing of your List of Map data to extract the "id" values so that they can be passed to g.V() directly.
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> data = [
......1> ["id":1, "label": "person", "first_name":"bob","age":25,"height": 177],
......2> ["id":2, "label": "person", "first_name":"joe","surname":"bloggs", "age": 32]
......3> ]
==>[id:1,label:person,first_name:bob,age:25,height:177]
==>[id:2,label:person,first_name:joe,surname:bloggs,age:32]
gremlin> g.V(data.collect{it.id}).
......1> sideEffect(drop()).
......2> fold().
......3> constant(data).
......4> unfold().as('properties').
......5> addV(select('label')).property(T.id, select('id')).as('vertex').
......6> sideEffect(select('properties').
......7> unfold().as('kv').
......8> where(select(keys).is(without('id','label'))).
......9> select('vertex').
.....10> property(select('kv').by(keys), select('kv').by(values)))
==>v[1]
==>v[2]
gremlin> g.V(1,2).elementMap()
==>[id:1,label:person,first_name:bob,age:25,height:177]
==>[id:2,label:person,surname:bloggs,first_name:joe,age:32]
Of particular note here is the use of fold() to reduce the stream of dropped vertex traversers to a single traverser (i.e. a List of those vertices), which then lets us replace it with a single instance of "data" to iterate over in the usual fashion for creating a Vertex from a Map. Note that I added a where() to ignore the "id" and "label" keys, since I figured you didn't want those values duplicated as vertex properties.
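The two ideas above, extracting the ids client-side and relying on fold(), can be modelled in plain Python. This is a sketch, not gremlin-python code: split_for_upsert and ids_added are hypothetical helpers. The ids would then be passed to the traversal source, e.g. g.V(*ids) in gremlin-python. ids_added counts how often the addV() part would run. Without fold(), it runs once per dropped vertex, and never if nothing matched. With fold(), there is always exactly one traverser, holding a possibly empty list, so the rows are added exactly once. That is also why the same traversal works for brand-new vertices.

```python
# Hypothetical input, mirroring the question's list of maps.
data = [
    {"id": 1, "label": "person", "first_name": "bob", "age": 25, "height": 177},
    {"id": 2, "label": "person", "first_name": "joe", "surname": "bloggs", "age": 32},
]
RESERVED = ("id", "label")


def split_for_upsert(rows):
    """Ids to pass to g.V(...), plus each row's remaining properties
    (mirrors the where(select(keys).is(without('id','label'))) filter)."""
    ids = [row["id"] for row in rows]
    props = [{k: v for k, v in row.items() if k not in RESERVED} for row in rows]
    return ids, props


def ids_added(found, rows, use_fold):
    """Model of constant(data).unfold().addV(...) running per incoming traverser."""
    traversers = [list(found)] if use_fold else list(found)
    added = []
    for _ in traversers:
        added.extend(row["id"] for row in rows)
    return added
```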

Gremlin query with nested vertices

My use case: a Bag vertex has a holds edge to a Box vertex, and a Box vertex has a contains edge to a Fruit vertex, so there is a parent-child relation across three vertices.
Is it possible to write a Gremlin query that returns all three related vertices? For example, I need to fetch all Bags by id, including their Box vertices and, further down, the Fruit vertices for that Bag id. In SQL-like syntax it's a simple select * from bag where id = 1.
Sample structure:
g.addV('bag').property('id',1).property('name','bag1').property('size','12').as('1').
addV('box').property('id',2).property('name','box1').property('width','12').as('2').
addV('fruit').property('id',3).property('name','apple').property('color','red').as('3').
addV('bag').property('id',4).property('name','bag2').property('size','44').as('4').
addV('box').property('id',5).property('name','box2').property('width','14').as('5').
addV('fruit').property('id',6).property('name','orange').property('color','yellow').as('6').
addE('holds').from('1').to('2').
addE('contains').from('2').to('3').
addE('holds').from('4').to('5').
addE('contains').from('5').to('6').iterate()
I want to get all properties of vertices 1, 2 and 3 when I query for vertex 1.
I want the response in the below format.
"bags" : [{
"id":"1",
"name":"bag1",
"size" :"12",
"boxes":[ {
"id" : "2",
"name":"box1",
"width" : "12",
"fruits": [{
"id":"3",
"name" : "apple",
"color" : "red"
}]
}]
},
{
"id":"4",
"name":"bag2",
"size" : "44",
"boxes":[ {
"id" : "5",
"name":"box2",
"width" : "14",
"fruits": [{
"id":"6",
"name" : "orange",
"color" : "yellow"
}]
}]
}]
But I'm not sure whether this is possible in Gremlin, since there are no implicit relations between vertices.
I would probably use project() to accomplish this:
gremlin> g.V().hasLabel('bag').
......1> project('id', 'name','boxes').
......2> by('id').
......3> by('name').
......4> by(out('holds').
......5> project('id','name','fruits').
......6> by('id').
......7> by('name').
......8> by(out('contains').
......9> project('id','name').
.....10> by('id').
.....11> by('name').
.....12> fold()).
.....13> fold())
==>[id:1,name:bag1,boxes:[[id:2,name:box1,fruits:[[id:3,name:apple]]]]]
==>[id:4,name:bag2,boxes:[[id:5,name:box2,fruits:[[id:6,name:orange]]]]]
I omitted the "bags" root level key as there were no other keys in the Map and it didn't seem useful to add that extra level.
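The same nesting can be sketched in plain Python to show the shape being built. This version copies every property, not just id and name, since the question asked for all of them. The dictionaries are a hand-copied, hypothetical encoding of the sample script, and nest/out are my own helper names. Each level's list of children plays the role of fold() in the traversal above.

```python
# Hand-copied encoding of the sample graph (hypothetical names).
labels = {1: "bag", 2: "box", 3: "fruit", 4: "bag", 5: "box", 6: "fruit"}
props = {
    1: {"name": "bag1", "size": "12"},
    2: {"name": "box1", "width": "12"},
    3: {"name": "apple", "color": "red"},
    4: {"name": "bag2", "size": "44"},
    5: {"name": "box2", "width": "14"},
    6: {"name": "orange", "color": "yellow"},
}
edges = [(1, "holds", 2), (2, "contains", 3), (4, "holds", 5), (5, "contains", 6)]


def out(v, edge_label):
    """Adjacent vertices along outgoing edges with the given label."""
    return [dst for src, lbl, dst in edges if src == v and lbl == edge_label]


def nest(v, levels):
    """Copy v's properties, then recurse one level per (edge_label, key) pair."""
    node = {"id": v, **props[v]}
    if levels:
        (edge_label, key), rest = levels[0], levels[1:]
        node[key] = [nest(child, rest) for child in out(v, edge_label)]
    return node


bags = [nest(v, [("holds", "boxes"), ("contains", "fruits")])
        for v in sorted(labels) if labels[v] == "bag"]
```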

Gremlin: how can I return vertex and their associated vertex?

I need to return some groups and the people in each group, like this:
Group A
-----Person A
-----Person B
-----Person C
Group B
-----Person D
-----Person E
-----Person F
How can I do that with Gremlin? The people are connected to their group with an edge.
It is always helpful to include a sample graph with your Gremlin questions, preferably as a script that can be pasted straight into the Gremlin Console, as follows:
g.addV('group').property('name','Group A').as('ga').
addV('group').property('name','Group B').as('gb').
addV('person').property('name','Person A').as('pa').
addV('person').property('name','Person B').as('pb').
addV('person').property('name','Person C').as('pc').
addV('person').property('name','Person D').as('pd').
addV('person').property('name','Person E').as('pe').
addV('person').property('name','Person F').as('pf').
addE('contains').from('ga').to('pa').
addE('contains').from('ga').to('pb').
addE('contains').from('ga').to('pc').
addE('contains').from('gb').to('pd').
addE('contains').from('gb').to('pe').
addE('contains').from('gb').to('pf').iterate()
One solution is to use the group() step:
gremlin> g.V().has('group', 'name', within('Group A','Group B')).
......1> group().
......2> by('name').
......3> by(out('contains').values('name').fold())
==>[Group B:[Person D,Person E,Person F],Group A:[Person A,Person B,Person C]]
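The group() result is a map from group name to a list of member names. To print it in the indented layout from the question, a little client-side formatting is needed. The plain-Python sketch below models both the grouping and the printing. The data is a hand-copied, hypothetical encoding of the sample graph, and members_by_group/render are my own names.

```python
# Hand-copied encoding of the sample graph (hypothetical names).
groups = {"ga": "Group A", "gb": "Group B"}
people = {"pa": "Person A", "pb": "Person B", "pc": "Person C",
          "pd": "Person D", "pe": "Person E", "pf": "Person F"}
contains = [("ga", "pa"), ("ga", "pb"), ("ga", "pc"),
            ("gb", "pd"), ("gb", "pe"), ("gb", "pf")]


def members_by_group(wanted):
    """Like group().by('name').by(out('contains').values('name').fold())."""
    result = {}
    for gid, gname in groups.items():
        if gname in wanted:
            result[gname] = [people[p] for g, p in contains if g == gid]
    return result


def render(grouped):
    """Indented text layout: group name, then one '-----' line per member."""
    lines = []
    for gname, members in grouped.items():
        lines.append(gname)
        lines.extend("-----" + m for m in members)
    return "\n".join(lines)
```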

Gremlin query uneven result issue

Suppose I have 3 students (A, B, C), each with a major subject and a mark, but when I query them the results come back misaligned.
Data
A -> Math -> 77
B -> History -> 70
C -> Science -> 97
Query
g.V('Class').has('name',within('A','B','C'))
Result
{"student_name":['A','B','C'], "major_subject":['Math','Science','History'], "marks":[70,77,97]}
The values are not in the same order as the student names, so the subjects and marks no longer line up with the students.
I assume that your graph looks something like this:
g = TinkerGraph.open().traversal()
g.addV('student').property('name', 'A').
addE('scored').to(addV('subject').property('name', 'Math')).
property('mark', 77).
addV('student').property('name', 'B').
addE('scored').to(addV('subject').property('name', 'History')).
property('mark', 70).
addV('student').property('name', 'C').
addE('scored').to(addV('subject').property('name', 'Science')).
property('mark', 97).iterate()
Now the easiest way to gather the data is this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('student').
outE('scored').as('mark').inV().as('major').
select('student','major','mark').
by('name').
by('name').
by('mark')
==>[student:A,major:Math,mark:77]
==>[student:B,major:History,mark:70]
==>[student:C,major:Science,mark:97]
But if you really depend on the format shown in your question, you can do this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).
store('student').by('name').
outE('scored').store('mark').by('mark').
inV().store('major').by('name').
cap('student','major','mark')
==>[major:[Math,History,Science],student:[A,B,C],mark:[77,70,97]]
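The columnar form is only meaningful if position i in every list refers to the same student. In the output above the lists do line up, assuming each student has exactly one scored edge. With more than one edge per student, store('student') would run once per student but store('mark') once per edge, and the lists would drift apart. The plain-Python sketch below shows how a client could zip the columns back into per-student records, refusing to guess when the lengths differ. The names are hypothetical.

```python
# The capped result from above, as a client would receive it.
capped = {"major": ["Math", "History", "Science"],
          "student": ["A", "B", "C"],
          "mark": [77, 70, 97]}


def to_records(columns, keys=("student", "major", "mark")):
    """Zip parallel lists back into one dict per student."""
    if len({len(columns[k]) for k in keys}) != 1:
        raise ValueError("columns have different lengths; cannot align by position")
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]
```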
If you want the capped result ordered by mark, you'll need a mix of the two queries:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('a').
outE('scored').as('b').
order().
by('mark').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[History,Math,Science],student:[B,A,C],mark:[70,77,97]]
To order by the order of the input list:
gremlin> input = ['C', 'B', 'A']; []
gremlin> g.V().has('student', 'name', within(input)).as('a').
order().
by {input.indexOf(it.value('name'))}.
outE('scored').as('b').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[Science,History,Math],student:[C,B,A],mark:[97,70,77]]
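Both ordered variants follow the same pattern: sort whole (student, major, mark) records first, and only then split them into columns, so the columns cannot fall out of step. A plain-Python sketch of that pattern, with hypothetical names. It uses a position lookup built once instead of calling input.indexOf() per element, which is my own choice rather than something the traversal requires.

```python
rows = [("A", "Math", 77), ("B", "History", 70), ("C", "Science", 97)]


def to_columns(records):
    """Split already-ordered (student, major, mark) records into the capped layout."""
    return {
        "student": [r[0] for r in records],
        "major": [r[1] for r in records],
        "mark": [r[2] for r in records],
    }


# Ordered by mark, like order().by('mark') on the scored edges.
by_mark = to_columns(sorted(rows, key=lambda r: r[2]))

# Ordered by the position of each name in the input list.
input_order = ["C", "B", "A"]
position = {name: i for i, name in enumerate(input_order)}
by_input = to_columns(sorted(rows, key=lambda r: position[r[0]]))
```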

Resources