I'm trying the following Gremlin query to replace vertices if they already exist, but the line V(__.select('id')).drop() keeps failing because __.select('id') does not return the id at that point.
vertices = [
{"id":1, "label": "person", "first_name":"bob","age":25,"height": 177},
{"id":2, "label": "person", "first_name":"joe","surname":"bloggs", "age": 32}
]
graph_traversal.inject(vertices).unfold().as_('entity'). \
V(__.select('id')).drop(). \
addV(__.select('label')).property(T.id, __.select('id')).as_('vertex'). \
sideEffect(__.select('entity').unfold().as_('kv').select('vertex'). \
property(
__.select('kv').by(Column.keys),
__.select('kv').by(Column.values)
)
)
Here is a way to approach it, but it requires that you do a bit of front end processing to your List of Map data to extract the "id" values so that they can be passed to g.V() directly.
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> data = [
......1> ["id":1, "label": "person", "first_name":"bob","age":25,"height": 177],
......2> ["id":2, "label": "person", "first_name":"joe","surname":"bloggs", "age": 32]
......3> ]
==>[id:1,label:person,first_name:bob,age:25,height:177]
==>[id:2,label:person,first_name:joe,surname:bloggs,age:32]
gremlin> g.V(data.collect{it.id}).
......1> sideEffect(drop()).
......2> fold().
......3> constant(data).
......4> unfold().as('properties').
......5> addV(select('label')).property(T.id, select('id')).as('vertex').
......6> sideEffect(select('properties').
......7> unfold().as('kv').
......8> where(select(keys).is(without('id','label'))).
......9> select('vertex').
.....10> property(select('kv').by(keys), select('kv').by(values)))
==>v[1]
==>v[2]
gremlin> g.V(1,2).elementMap()
==>[id:1,label:person,first_name:bob,age:25,height:177]
==>[id:2,label:person,surname:bloggs,first_name:joe,age:32]
Of particular note here is the use of fold() to reduce the stream of dropped vertex traversers to a single traverser (i.e. a List of those vertices), which then lets us replace that with a single instance of "data" to iterate over in the common fashion for using a Map to create a Vertex. Note that I added a where() to ignore the "id" and "label" keys since I figured you didn't want those values duplicated as vertex properties.
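Since the question is written against the Python client, the front-end processing mentioned above can be sketched in plain Python. This is only an illustration: split_vertex and RESERVED_KEYS are hypothetical names, not part of any Gremlin API, and the ids list plays the role of data.collect{it.id} in the console session.

```python
# Hypothetical pre-processing of the question's `vertices` list
# (plain Python, no Gremlin server needed).
vertices = [
    {"id": 1, "label": "person", "first_name": "bob", "age": 25, "height": 177},
    {"id": 2, "label": "person", "first_name": "joe", "surname": "bloggs", "age": 32},
]

# Keys that are consumed by T.id / addV() and should not become properties.
RESERVED_KEYS = {"id", "label"}


def split_vertex(entity):
    """Return (id, label, remaining properties) for one input map."""
    props = {k: v for k, v in entity.items() if k not in RESERVED_KEYS}
    return entity["id"], entity["label"], props


# The ids that would be handed to g.V(...) up front.
ids = [entity["id"] for entity in vertices]

for entity in vertices:
    vertex_id, label, props = split_vertex(entity)
    print(vertex_id, label, props)
```

Filtering the reserved keys on the client mirrors the where(select(keys).is(without('id','label'))) step, so either side can do that part of the work.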
I'm new to Gremlin and I need help writing the best query to select unique, filtered results.
Starting from a team, I would like to get the players (note: each player can play for more than one team) of each team connected to it by is_friends_with.
The result I would like to get:
[
{
"Player": "Icardi",
"Teams": ["Valladolid"]
},
{
"Player": "Kroll",
"Teams": ["Valladolid"]
},
{
"Player": "Baggio",
"Teams": ["Eagles"]
},
{
"Player": "Papin",
"Teams": ["Valladolid","Eagles"]
}
]
The graph
The schema:
g.addV('team').as('1').
property(single, 'name', 'Eagles').
addV('player').as('2').
property(single, 'name', 'Zico').addV('team').
as('3').
property(single, 'name', 'team A').
addV('team').as('4').
property(single, 'name', 'Horses').
addV('player').as('5').
property(single, 'name', 'Papin').
addV('player').as('6').
property(single, 'name', 'Ronaldo').
addV('player').as('7').
property(single, 'name', 'Visco').
addV('player').as('8').
property(single, 'name', 'Baggio').
addV('tournament').as('9').
addV('team').as('10').
property(single, 'name', 'Valladolid').
addV('player').as('11').
property(single, 'name', 'Kroll').
addV('player').as('12').
property(single, 'name', 'Icardi').
addE('owned').from('1').to('5').addE('owned').
from('1').to('6').addE('owned').from('1').
to('8').addE('owned').from('3').to('6').
addE('owned').from('3').to('7').
addE('created').from('3').to('9').
addE('is_friends_with').from('3').to('10').
addE('is_friends_with').from('3').to('1').
addE('owned').from('4').to('8').addE('owned').
from('4').to('2').addE('owned').from('4').
to('5').addE('owned').from('4').to('7').
addE('invited').from('9').to('1').
addE('invited').from('9').to('4').
addE('owned').from('10').to('11').
addE('owned').from('10').to('12').
addE('owned').from('10').to('5')
Here is one way to do it using group
gremlin> g.V().
......1> has('name','team A').
......2> out('is_friends_with').as('a').
......3> out('owned').
......4> group().
......5> by('name').
......6> by(select('a').values('name').fold()).
......7> unfold()
==>Papin=[Valladolid, Eagles]
==>Icardi=[Valladolid]
==>Baggio=[Eagles]
==>Ronaldo=[Eagles]
==>Kroll=[Valladolid]
To get the exact format that matches your JSON, we can just add a project step to the query.
gremlin> g.V().
......1> has('name','team A').
......2> out('is_friends_with').as('a').
......3> out('owned').
......4> group().
......5> by('name').
......6> by(select('a').values('name').fold()).
......7> unfold().
......8> project('player','teams').
......9> by(keys).
.....10> by(values)
==>[player:Papin,teams:[Valladolid,Eagles]]
==>[player:Icardi,teams:[Valladolid]]
==>[player:Baggio,teams:[Eagles]]
==>[player:Ronaldo,teams:[Eagles]]
==>[player:Kroll,teams:[Valladolid]]
gremlin> g.V().has("name", "team A").
......1> sideEffect(__.out("owned").hasLabel("player").aggregate("my_player").limit(1)).
......2> both("is_friends_with").hasLabel("team").as("team2").
......3> out("owned").hasLabel("player").as("friends_player").
......4> where(without("my_player")).as("friends_player2").
......5> select("team2", "friends_player2").
......6> group().
......7> by(select("friends_player2")).
......8> by(select("team2").fold()).
......9> unfold().
.....10> project("Player", "Teams").
.....11> by(select(keys).values("name")).
.....12> by(select(values).unfold().values("name").fold())
==>[Player:Baggio,Teams:[Eagles]]
==>[Player:Kroll,Teams:[Valladolid]]
==>[Player:Icardi,Teams:[Valladolid]]
==>[Player:Papin,Teams:[Valladolid,Eagles]]
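The difference between the two traversals above is easier to see outside of Gremlin. The following is a rough plain-Python sketch, not a translation: the dictionaries are copied by hand from the schema script, friends_players is a hypothetical helper, and only outgoing is_friends_with edges are followed (the second traversal uses both(), which makes no difference for this data).

```python
from collections import defaultdict

# Edges copied from the question's schema script, keyed by team name.
owned = {
    "Eagles": ["Papin", "Ronaldo", "Baggio"],
    "team A": ["Ronaldo", "Visco"],
    "Horses": ["Baggio", "Zico", "Papin", "Visco"],
    "Valladolid": ["Kroll", "Icardi", "Papin"],
}
is_friends_with = {"team A": ["Valladolid", "Eagles"]}


def friends_players(team, exclude_own=False):
    """Group the players of every friend team by player name."""
    own = set(owned[team]) if exclude_own else set()
    teams_by_player = defaultdict(list)
    for friend in is_friends_with.get(team, []):
        for player in owned[friend]:
            if player not in own:
                teams_by_player[player].append(friend)
    # Same shape as project('Player', 'Teams').
    return [{"Player": p, "Teams": t} for p, t in teams_by_player.items()]


grouped = friends_players("team A")                      # first approach
filtered = friends_players("team A", exclude_own=True)   # second approach
print(grouped)
print(filtered)
```

With exclude_own=True the players already owned by the starting team (Ronaldo here) are dropped, which is what where(without("my_player")) does and what the expected result in the question asks for.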
I have a query that returns the output in the following format,
{
"Key": [
"Value1",
"Value2"
],
"Count": [
{
"Count1": 28,
"Count2": 28
},
{
"Count3": 16,
"Count4": 16
}
]
}
I want to display it in the following format
[
{
"Key" : "Value1",
"Count1": 28,
"Count2": 28
},
{
"Key" : "Value2",
"Count3": 16,
"Count4": 16
}
]
Is it possible?
The gremlin that produces a similar output
g.V().
has('organizationId', 'b121672e-8049-40cc-9f28-c62dff4cc2d9').
hasLabel('employee').
group().
by('officeId').
by(project('Id', 'Status').
by(choose(has('officeId'), constant('Total'), constant(''))).
by(coalesce(out('hasStatus').
or(
has('release', is(false)),
has('autoRelease', is(true)).
has('release', is(true)).
has('endDate', gte(637250976000000000))
), values('status'), constant('Green'))).
select(values).
unfold().
groupCount()).
project('Id', 'Count').
by(select(keys)).
by(select(values))
The data I have consists of an employee vertex and a healthStatus vertex, with a hasStatus edge between employee and healthStatus.
Properties in employee vertex:
id, organizations, officeId, Name, createdOn
Properties in healthStatus vertex:
id, status, startDate, endDate, release, autoRelease, createdOn
Sample Data
g.addV('employee').
property('id',1).
property('organizationId',1).
property('officeId',1).
property('name','A').
property('createdOn', 637263231140000000).as('1').
addV('employee').
property('id',2).
property('organizationId',1).
property('officeId',2).
property('name','B').
property('createdOn', 637263231140000000).as('2').
addV('employee').
property('id',5).
property('organizationId',1).
property('officeId',3).
property('name','C').
property('createdOn', 637263231140000000).as('5').
addV('healthStatus').
property('id',3).
property('status','Red').
property('startDate',637262367140000000).
property('endDate',637264095140000000).
property('release',false).
property('createdOn',637262367140000000)as('3').
addV('healthStatus').
property('id',4).
property('status','Yellow').
property('startDate',637262367140000000).
property('endDate',637264095140000000).
property('release',false).
property('createdOn',637262367140000000)as('4').
addE('hasStatus').from('1').to('3').
addE('hasStatus').from('4').to('4')
Output:
[
{
"Id" : [
1,
2,
3
]
},
{
"Count": [
{
"Red" : 1
},
{
"Yellow" : 1
},
{
"Green" : 1
}
]
}
]
Expected Output
[
{
"Id" : 1,
"Red" : 1
},
{
"Id" : 2,
"Yellow" : 1
},
{
"Id" : 3,
"Green" : 1
}
]
Note: the Id in the projection is the officeId from the employee vertex.
I think I've captured what you wanted. There were some errors in your sample data script and I wanted some extra data to make sure counts were making sense so I added a bit:
g = TinkerGraph.open().traversal()
g.addV('employee').
property('id',1).
property('organizationId',1).
property('officeId',1).
property('name','A').
property('createdOn', 637263231140000000).as('1').
addV('employee').
property('id',2).
property('organizationId',1).
property('officeId',2).
property('name','B').
property('createdOn', 637263231140000000).as('2').
addV('employee').
property('id',5).
property('organizationId',1).
property('officeId',3).
property('name','C').
property('createdOn', 637263231140000000).as('5').
addV('employee').
property('id',6).
property('organizationId',1).
property('officeId',3).
property('name','D').
property('createdOn', 637263231140000000).as('6').
addV('healthStatus').
property('id',3).
property('status','Red').
property('startDate',637262367140000000).
property('endDate',637264095140000000).
property('release',false).
property('createdOn',637262367140000000).as('3').
addV('healthStatus').
property('id',4).
property('status','Yellow').
property('startDate',637262367140000000).
property('endDate',637264095140000000).
property('release',false).
property('createdOn',637262367140000000).as('4').
addE('hasStatus').from('1').to('3').
addE('hasStatus').from('2').to('4').
addE('hasStatus').from('6').to('4')
I've rewritten your traversal a bit to provide a different approach that I think returns the data you expect, though in a slightly different form:
gremlin> g.V().has('employee','organizationId', 1).
......1> project('Id', 'Status').
......2> by('officeId').
......3> by(coalesce(out('hasStatus').
......4> or(has('release', false),
......5> has('autoRelease', true).has('release', true).has('endDate', gte(637250976000000000))).
......6> values('status'),
......7> constant('Green'))).
......8> group().
......9> by(select('Id')).
.....10> by(groupCount().
.....11> by('Status'))
==>[1:[Red:1],2:[Yellow:1],3:[Yellow:1,Green:1]]
I prefer this form a bit, but perhaps you require the original format you inquired about, in which case you need another round of manipulation on the collection:
gremlin> g.V().has('employee','organizationId', 1).
......1> project('Id', 'Status').
......2> by('officeId').
......3> by(coalesce(out('hasStatus').
......4> or(has('release', false),
......5> has('autoRelease', true).has('release', true).has('endDate', gte(637250976000000000))).
......6> values('status'),
......7> constant('Green'))).
......8> group().
......9> by(select('Id')).
.....10> by(groupCount().
.....11> by('Status')).
.....12> unfold().
.....13> map(union(project('Id').by(select(keys)),
.....14> select(values)).
.....15> unfold().
.....16> group().by(keys).by(select(values)))
==>[Red:1,Id:1]
==>[Yellow:1,Id:2]
==>[Yellow:1,Id:3,Green:1]
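For comparison, the same two-stage reshaping can be sketched in plain Python. This is only an illustration under the assumption that the traversal has already been reduced to (officeId, status) pairs; the rows list below is written out by hand from the sample data above, with 'Green' standing in for employees that have no status.

```python
from collections import Counter, defaultdict

# (officeId, status) pairs, i.e. what project('Id', 'Status') yields
# for the sample data; employee C has no status and falls back to 'Green'.
rows = [(1, "Red"), (2, "Yellow"), (3, "Green"), (3, "Yellow")]

# group().by(select('Id')).by(groupCount().by('Status'))
counts_by_office = defaultdict(Counter)
for office_id, status in rows:
    counts_by_office[office_id][status] += 1

# unfold() plus the union/group step: one flat map per office.
flat = [{"Id": office_id, **counts} for office_id, counts in counts_by_office.items()]
print(flat)
```

Merging the id into the same map as the counts only works as long as no status is itself called "Id", which is one reason to prefer the nested form.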
Given the graph generated by the script further below:
Find all the paths between person1 and person5, then calculate the connection between consecutive vertices on each path.
To illustrate the definition of connection, take person1 and person2 as an example:
1. person1 created comment1 as a reply to post2, which was created by person2
2. person2 created comment3 as a reply to post1, which was created by person1
So the connection between person1 and person2 is 2, and the connection between person2 and person5 is 0.
The path in the graph given above is [v[person1],v[person2],v[person5]]:
gremlin> g.V('person1').
......1> repeat(both('knows').simplePath()).
......2> until(hasId('person5')).path()
==>[v[person1],v[person2],v[person5]]
So far, I have only managed to come up with this traversal:
gremlin> g.V('person1').
......1> repeat(both('knows').simplePath()).
......2> until(hasId('person5').or().loops().is(eq(2))).hasId('person5').path().
......3> repeat(
......4> filter(count(local).is(gt(1))).
......5> sack(assign).by(
......6> sideEffect(range(local,1,2).aggregate('m')).
......7> range(local,0,1).
......8> in('hasCreator').hasLabel('comment').
......9> out('replyOf').hasLabel('post').
.....10> out('hasCreator').where(within('m')).count()
.....11> ).
.....12> sack(sum).by(
.....13> sideEffect(range(local,0,1).aggregate('n')).
.....14> range(local,1,2).
.....15> in('hasCreator').hasLabel('comment').
.....16> out('replyOf').hasLabel('post').
.....17> out('hasCreator').where(within('n')).count()
.....18> ).
.....19> skip(local, 1)
.....20> ).
.....21> emit().sack().fold()
==>[2,1]
But the result is wrong; the expected result is [2,0]. I know that I should not use aggregate to filter, but I can't find a proper alternative with what I know.
The example graph can be generated by :
g.addV('person').property(id, 'person1')
g.addV('person').property(id, 'person2')
g.addV('person').property(id, 'person5')
g.addE('knows').from(V('person1')).to(V('person2'))
g.addE('knows').from(V('person2')).to(V('person5'))
g.addV('post').property(id, 'post1')
g.addV('post').property(id, 'post2')
g.addV('comment').property(id, 'comment1')
g.addV('comment').property(id, 'comment2')
g.addV('comment').property(id, 'comment3')
g.addE('hasCreator').from(V('post1')).to(V('person1'))
g.addE('hasCreator').from(V('post2')).to(V('person2'))
g.addE('hasCreator').from(V('comment1')).to(V('person1'))
g.addE('hasCreator').from(V('comment2')).to(V('person2'))
g.addE('hasCreator').from(V('comment3')).to(V('person2'))
g.addE('replyOf').from(V('comment1')).to(V('post2'))
g.addE('replyOf').from(V('comment2')).to(V('post2'))
g.addE('replyOf').from(V('comment3')).to(V('post1'))
After replacing the use of aggregate with select, I now get the right answer:
g.V('person1').
repeat(both('knows').simplePath()).
until(hasId('person5').or().loops().is(eq(2))).hasId('person5').path().
repeat(
filter(count(local).is(gt(1))).
sack(assign).by(
__.as('orig').
range(local,1,2).as('v2').
select('orig').range(local,0,1).
in('hasCreator').hasLabel('comment').
out('replyOf').hasLabel('post').
out('hasCreator').where(eq('v2')).count()
).
sack(sum).by(
__.as('orig').
range(local,0,1).as('v1').
select('orig').range(local,1,2).
in('hasCreator').hasLabel('comment').
out('replyOf').hasLabel('post').
out('hasCreator').where(eq('v1')).count()
).
skip(local, 1)
).
emit().sack().fold()
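As a cross-check of the expected [2,0], the connection definition can be written out in plain Python. This is a minimal sketch, not a Gremlin translation: the two dictionaries are copied by hand from the graph-generation script, and replies and connections are hypothetical helper names.

```python
# Edges copied from the graph-generation script above.
has_creator = {
    "post1": "person1", "post2": "person2",
    "comment1": "person1", "comment2": "person2", "comment3": "person2",
}
reply_of = {"comment1": "post2", "comment2": "post2", "comment3": "post1"}


def replies(author, target):
    """Count comments written by `author` that reply to a post by `target`."""
    return sum(
        1
        for comment, post in reply_of.items()
        if has_creator[comment] == author and has_creator[post] == target
    )


def connections(path):
    """Connection count for each pair of consecutive persons on the path."""
    return [replies(a, b) + replies(b, a) for a, b in zip(path, path[1:])]


print(connections(["person1", "person2", "person5"]))
```

Each pair is evaluated against its own two endpoints only, which is the property the select-based version has and the aggregate-based version lacked: aggregate('m') and aggregate('n') keep collecting vertices across loop iterations, so later pairs are compared against earlier persons as well.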
My use case is: a Bag vertex has a holds edge to a Box vertex, and a Box vertex has a contains edge to a Fruit vertex. So there is a parent-child relation across the 3 vertices.
Is it possible to write a Gremlin query which returns all 3 related vertices? For example, I need to fetch a Bag by id, including its Box vertices and, further down, the Fruit vertices for that Bag id. In SQL-like syntax it's a simple select * from bag where id = 1.
sample structure:
g.addV('bag').property('id',1).property('name','bag1').property('size','12').as('1').
addV('box').property('id',2).property('name','box1').property('width','12').as('2').
addV('fruit').property('id',3).property('name','apple').property('color','red').as('3').
addV('bag').property('id',4).property('name','bag2').property('size','44').as('4').
addV('box').property('id',5).property('name','box2').property('width','14').as('5').
addV('fruit').property('id',6).property('name','orange').property('color','yellow').as('6').
addE('holds').from('1').to('2').
addE('contains').from('2').to('3').
addE('holds').from('4').to('5').
addE('contains').from('5').to('6').iterate()
I want to get all properties of 1, 2 and 3 when I query for vertex 1.
I want the response in the format below.
"bags" : [{
"id":"1",
"name":"bag1",
"size" :"12",
"boxes":[ {
"id" : "2",
"name":"box1",
"width" : "12",
"fruits": [{
"id":"3",
"name" : "apple",
"color" : "red"
}]
}]
},
{
"id":"4",
"name":"bag2",
"size" : "44",
"boxes":[ {
"id" : "5",
"name":"box2",
"width" : "14",
"fruits": [{
"id":"6",
"name" : "orange",
"color" : "yellow"
}]
}]
}]
But I'm not sure whether something similar is possible in Gremlin, as there are no implicit relations between vertices.
I would probably use project() to accomplish this:
gremlin> g.V().hasLabel('bag').
......1> project('id', 'name','boxes').
......2> by('id').
......3> by('name').
......4> by(out('holds').
......5> project('id','name','fruits').
......6> by('id').
......7> by('name').
......8> by(out('contains').
......9> project('id','name').
.....10> by('id').
.....11> by('name').
.....12> fold()).
.....13> fold())
==>[id:1,name:bag1,boxes:[[id:2,name:box1,fruits:[[id:3,name:apple]]]]]
==>[id:4,name:bag2,boxes:[[id:5,name:box2,fruits:[[id:6,name:orange]]]]]
I omitted the "bags" root level key as there were no other keys in the Map and it didn't seem useful to add that extra level.
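The nesting produced by the stacked project() steps can be pictured with a small plain-Python sketch. It assumes the graph is held in ordinary dictionaries copied by hand from the sample script; project_bag, project_box and project_fruit are hypothetical helpers that play the role of the three project() levels.

```python
# Vertices and edges copied from the question's sample script.
names = {1: "bag1", 2: "box1", 3: "apple", 4: "bag2", 5: "box2", 6: "orange"}
holds = {1: [2], 4: [5]}       # bag -> boxes
contains = {2: [3], 5: [6]}    # box -> fruits


def project_fruit(fruit_id):
    return {"id": fruit_id, "name": names[fruit_id]}


def project_box(box_id):
    # The inner list corresponds to out('contains')...fold().
    fruits = [project_fruit(f) for f in contains.get(box_id, [])]
    return {"id": box_id, "name": names[box_id], "fruits": fruits}


def project_bag(bag_id):
    # The inner list corresponds to out('holds')...fold().
    boxes = [project_box(b) for b in holds.get(bag_id, [])]
    return {"id": bag_id, "name": names[bag_id], "boxes": boxes}


bags = [project_bag(bag_id) for bag_id in holds]
print(bags)
```

Each level only knows about its direct children, so further properties such as size, width or color would be added as extra keys at the level they belong to, in the same way an extra by() would be added to the matching project().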
Suppose I have 3 students (A, B, C), each with a major subject and marks, but when I query, the result comes back misaligned.
Data
A -> Math -> 77
B -> History -> 70
C -> Science -> 97
Query
g.V('Class').has('name',within('A','B','C'))
Result
{"student_name":['A','B','C'], "major_subject":['Math','Science','History'], "marks":[70,77,97]}
The data displayed by querying the database is not in order according to the name of the student.
I assume that your graph looks kinda like this:
g = TinkerGraph.open().traversal()
g.addV('student').property('name', 'A').
addE('scored').to(addV('subject').property('name', 'Math')).
property('mark', 77).
addV('student').property('name', 'B').
addE('scored').to(addV('subject').property('name', 'History')).
property('mark', 70).
addV('student').property('name', 'C').
addE('scored').to(addV('subject').property('name', 'Science')).
property('mark', 97).iterate()
Now the easiest way to gather the data is this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('student').
outE('scored').as('mark').inV().as('major').
select('student','major','mark').
by('name').
by('name').
by('mark')
==>[student:A,major:Math,mark:77]
==>[student:B,major:History,mark:70]
==>[student:C,major:Science,mark:97]
But if you really depend on the format shown in your question, you can do this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).
store('student').by('name').
outE('scored').store('mark').by('mark').
inV().store('major').by('name').
cap('student','major','mark')
==>[major:[Math,History,Science],student:[A,B,C],mark:[77,70,97]]
If you want to get the cap'ed result to be ordered by marks, you'll need a mix of the 2 queries:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('a').
outE('scored').as('b').
order().
by('mark').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[History,Math,Science],student:[B,A,C],mark:[70,77,97]]
To order by the order of inputs:
gremlin> input = ['C', 'B', 'A']; []
gremlin> g.V().has('student', 'name', within(input)).as('a').
order().
by {input.indexOf(it.value('name'))}.
outE('scored').as('b').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[Science,History,Math],student:[C,B,A],mark:[97,70,77]]
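The relationship between the row-oriented and the column-oriented results can also be sketched in plain Python. This is only an illustration, assuming the rows have already been fetched in the form returned by the select('student','major','mark') query; to_columns is a hypothetical helper.

```python
# Row-oriented result, as returned by the select('student','major','mark') query.
rows = [
    {"student": "A", "major": "Math", "mark": 77},
    {"student": "B", "major": "History", "mark": 70},
    {"student": "C", "major": "Science", "mark": 97},
]


def to_columns(rows):
    """Turn a list of row maps into one map of parallel lists."""
    return {key: [row[key] for row in rows] for key in ("student", "major", "mark")}


# Ordered by mark: sort the rows first, then split them into columns.
by_mark = to_columns(sorted(rows, key=lambda row: row["mark"]))

# Ordered by a given input order of student names.
wanted = ["C", "B", "A"]
by_input = to_columns(sorted(rows, key=lambda row: wanted.index(row["student"])))

print(by_mark)
print(by_input)
```

Sorting while the values still sit together in one row and only then splitting them into lists is what keeps the three lists aligned by position; sorting each list separately would break that.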