Suppose I have 3 students (A, B, C), each with a major subject and marks, but when I query them, the values in the result are not aligned with each other.
Data
A -> Math -> 77
B -> History -> 70
C -> Science -> 97
Query
g.V('Class').has('name',within('A','B','C'))
Result
{"student_name":['A','B','C'], "major_subject":['Math','Science','History'], "marks":[70,77,97]}
The data returned by the query is not ordered according to the student's name.
I assume your graph looks something like this:
g = TinkerGraph.open().traversal()
g.addV('student').property('name', 'A').
addE('scored').to(addV('subject').property('name', 'Math')).
property('mark', 77).
addV('student').property('name', 'B').
addE('scored').to(addV('subject').property('name', 'History')).
property('mark', 70).
addV('student').property('name', 'C').
addE('scored').to(addV('subject').property('name', 'Science')).
property('mark', 97).iterate()
Now the easiest way to gather the data is this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('student').
outE('scored').as('mark').inV().as('major').
select('student','major','mark').
by('name').
by('name').
by('mark')
==>[student:A,major:Math,mark:77]
==>[student:B,major:History,mark:70]
==>[student:C,major:Science,mark:97]
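If you fetch the row-per-student maps shown above, the column layout from the question can also be built on the client. The Python sketch below assumes the rows have already been converted to plain dicts (the `rows` variable and `to_columns` helper are hypothetical). Sorting whole rows before splitting them into columns is what keeps the three lists aligned.

```python
# Hypothetical client-side copy of the select() result above.
rows = [
    {"student": "A", "major": "Math", "mark": 77},
    {"student": "B", "major": "History", "mark": 70},
    {"student": "C", "major": "Science", "mark": 97},
]

def to_columns(rows, sort_key="student"):
    """Sort whole rows first, then split them into parallel lists,
    so index i in every list always refers to the same student."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    return {
        "student_name": [r["student"] for r in ordered],
        "major_subject": [r["major"] for r in ordered],
        "marks": [r["mark"] for r in ordered],
    }

print(to_columns(rows))               # ordered by name
print(to_columns(rows, sort_key="mark"))  # ordered by mark
```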
But if you really depend on the format shown in your question, you can do this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).
store('student').by('name').
outE('scored').store('mark').by('mark').
inV().store('major').by('name').
cap('student','major','mark')
==>[major:[Math,History,Science],student:[A,B,C],mark:[77,70,97]]
If you want the capped result ordered by marks, you'll need a mix of the two queries:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('a').
outE('scored').as('b').
order().
by('mark').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[History,Math,Science],student:[B,A,C],mark:[70,77,97]]
To order by the order of the input list:
gremlin> input = ['C', 'B', 'A']; []
gremlin> g.V().has('student', 'name', within(input)).as('a').
order().
by {input.indexOf(it.value('name'))}.
outE('scored').as('b').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[Science,History,Math],student:[C,B,A],mark:[97,70,77]]
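The closure passed to by() above sorts each student by the position of its name in input. The same idea in Python, as a sketch over a hypothetical list of already-fetched rows, precomputes the positions in a dict so the sort key is a constant-time lookup rather than a list.index() call per comparison:

```python
# The desired output order, as in the console example above.
input_order = ["C", "B", "A"]

# Hypothetical client-side rows.
rows = [
    {"student": "A", "major": "Math", "mark": 77},
    {"student": "B", "major": "History", "mark": 70},
    {"student": "C", "major": "Science", "mark": 97},
]

# Map each name to its position in the input list once.
position = {name: i for i, name in enumerate(input_order)}
ordered = sorted(rows, key=lambda r: position[r["student"]])

# Split the ordered rows into aligned columns.
columns = {
    "student": [r["student"] for r in ordered],
    "major": [r["major"] for r in ordered],
    "mark": [r["mark"] for r in ordered],
}
print(columns)
# {'student': ['C', 'B', 'A'], 'major': ['Science', 'History', 'Math'], 'mark': [97, 70, 77]}
```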
Given the air-routes graph, say I want to get all possible one-stopover routes, like so:
[home] --distance--> [stopover] --distance--> [destination]
where [home], [stopover] and [destination] are airport nodes that each have a property 'code' that represents an airport code, and distance is an integer weight given to each edge connecting two airport nodes.
How could I write a query that gets me the airport codes for [home], [stopover] and [destination] such that the results are sorted as follows:
1. [home] airport codes are sorted alphabetically.
2. For each [home] airport, the [stopover] airport codes are sorted by the distance between [home] and [stopover] (ascending).
3. After sorting by 1 and 2, the [destination] airport codes are sorted by the distance between [stopover] and [destination].
(Note: it doesn't matter if [home] and [destination] are the same airport)
One way to do this is with group() and nested by() modulation:
g.V().
group().
by('code').
by(
outE('route').
order().by('dist').
inV().
group().
by('code').
by(
outE('route').
order().by('dist').
inV().
values('code').fold())).
unfold()
The result is something like:
1. {'SHH': {'WAA': ['KTS', 'SHH', 'OME'], 'OME': ['TLA', 'WMO', 'KTS', 'GLV', 'ELI', 'TNC', 'WAA', 'WBB', 'SHH', 'SKK', 'KKA', 'UNK', 'SVA', 'OTZ', 'GAM', 'ANC']}}
2. {'KWN': {'BET': ['WNA', 'KWT', 'ATT', 'KUK', 'TLT', 'EEK', 'WTL', 'KKH', 'KWN', 'KLG', 'MLL', 'KWK', 'PQS', 'CYF', 'KPN', 'NME', 'OOK', 'GNU', 'VAK', 'SCM', 'HPB', 'EMK', 'ANC'], 'EEK': ['KWN', 'BET'], 'TOG': ['KWN']}}
3. {'NUI': {'SCC': ['NUI', 'BTI', 'BRW', 'FAI', 'ANC'], 'BRW': ['ATK', 'AIN', 'NUI', 'PIZ', 'SCC', 'FAI', 'ANC']}}
4. {'PSG': {'JNU': ['HNH', 'GST', 'HNS', 'SGY', 'SIT', 'KAE', 'PSG', 'YAK', 'KTN', 'ANC', 'SEA'], 'WRG': ['PSG', 'KTN']}}
5. {'PIP': {'UGB': ['PTH']}}
...
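Gremlin's group() collects into a map whose key order is not guaranteed (the sample output above is not alphabetical by home code), so it may be easier to enforce the three sorting rules on the client. A minimal Python sketch over a hypothetical in-memory list of routes (the codes and distances are made up), relying on Python 3.7+ dicts keeping insertion order:

```python
from collections import defaultdict

# Hypothetical route edges (codes and distances are made up): (from, to, dist).
routes = [
    ("AAA", "CCC", 300), ("AAA", "BBB", 100),
    ("BBB", "DDD", 250), ("BBB", "AAA", 100), ("BBB", "CCC", 50),
    ("CCC", "AAA", 300), ("CCC", "DDD", 20),
]

# Outgoing (destination, distance) pairs per airport.
out_edges = defaultdict(list)
for src, dst, dist in routes:
    out_edges[src].append((dst, dist))

def by_dist(edges):
    """Destination codes ordered by ascending distance."""
    return [dst for dst, _ in sorted(edges, key=lambda e: e[1])]

# Inserting keys in sorted order preserves all three rules in the nested dict.
result = {}
for home in sorted(out_edges):                # rule 1: home codes alphabetical
    result[home] = {}
    for stop in by_dist(out_edges[home]):     # rule 2: by home -> stopover distance
        # rule 3: destinations by stopover -> destination distance
        result[home][stop] = by_dist(out_edges.get(stop, []))

print(result["AAA"])  # {'BBB': ['CCC', 'AAA', 'DDD'], 'CCC': ['DDD', 'AAA']}
```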
I have the following vertices:
Person1 -> Device1 <- Person2
   |          ^
   v          |
Email1  <- Person3
Now I want to write a Gremlin query (JanusGraph) that gives me all persons who share only the device with Person1 (and no other vertex).
So, according to the graph above, the output should be [Person2].
Person3 is not in the output because Person3 is also connected to Person1's "Email1".
g.addV('person').property('name', 'Person1').as('p1').
addV('person').property('name', 'Person2').as('p2').
addV('person').property('name', 'Person3').as('p3').
addV('device').as('d1').
addV('email').as('e1').
addE('HAS_DEVICE').from('p1').to('d1').
addE('HAS_EMAIL').from('p1').to('e1').
addE('HAS_DEVICE').from('p2').to('d1').
addE('HAS_DEVICE').from('p3').to('d1').
addE('HAS_EMAIL').from('p3').to('e1')
The following traversal will give you the person vertices that are connected to "Person1" via one or more "device" vertices and not connected via any other type of vertices.
g.V().has('person', 'name', 'Person1').as('p1').
out().as('connector').
in().where(neq('p1')).
group().
by().
by(select('connector').label().fold()).
unfold().
where(
select(values).
unfold().dedup().fold(). // just in case the persons are connected by multiple devices
is(eq(['device']))
).
select(keys)
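The where() filter above keeps a person only when the deduplicated list of connector labels is exactly ['device']. The same set logic as a plain-Python sketch over the sample edges (the tuple layout and `device_only_neighbors` helper are illustrations, not a JanusGraph API):

```python
from collections import defaultdict

# Edges from the sample graph: (person, label_of_target, target).
edges = [
    ("Person1", "device", "Device1"),
    ("Person1", "email", "Email1"),
    ("Person2", "device", "Device1"),
    ("Person3", "device", "Device1"),
    ("Person3", "email", "Email1"),
]

def device_only_neighbors(edges, start):
    # Vertices the start person points at (the 'connector' step).
    connectors = {target for p, _, target in edges if p == start}
    # For every other person, collect the labels of shared connectors.
    shared = defaultdict(set)
    for person, label, target in edges:
        if person != start and target in connectors:
            shared[person].add(label)
    # Keep people whose only shared connector type is 'device'.
    return sorted(p for p, labels in shared.items() if labels == {"device"})

print(device_only_neighbors(edges, "Person1"))  # ['Person2']
```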
I'm trying the following Gremlin query to replace existing vertices if they exist, but the line V(__.select('id')).drop() keeps failing because __.select('id') does not return the id at that point.
vertices = [
{"id":1, "label": "person", "first_name":"bob","age":25,"height": 177},
{"id":2, "label": "person", "first_name":"joe","surname":"bloggs", "age": 32}
]
graph_traversal.inject(vertices).unfold().as_('entity'). \
V(__.select('id')).drop(). \
addV(__.select('label')).property(T.id, __.select('id')).as_('vertex'). \
sideEffect(__.select('entity').unfold().as_('kv').select('vertex'). \
property(
__.select('kv').by(Column.keys),
__.select('kv').by(Column.values)
)
)
Here is one way to approach it, but it requires a bit of front-end processing of your List of Map data to extract the "id" values so that they can be passed to g.V() directly.
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> data = [
......1> ["id":1, "label": "person", "first_name":"bob","age":25,"height": 177],
......2> ["id":2, "label": "person", "first_name":"joe","surname":"bloggs", "age": 32]
......3> ]
==>[id:1,label:person,first_name:bob,age:25,height:177]
==>[id:2,label:person,first_name:joe,surname:bloggs,age:32]
gremlin> g.V(data.collect{it.id}).
......1> sideEffect(drop()).
......2> fold().
......3> constant(data).
......4> unfold().as('properties').
......5> addV(select('label')).property(T.id, select('id')).as('vertex').
......6> sideEffect(select('properties').
......7> unfold().as('kv').
......8> where(select(keys).is(without('id','label'))).
......9> select('vertex').
.....10> property(select('kv').by(keys), select('kv').by(values)))
==>v[1]
==>v[2]
gremlin> g.V(1,2).elementMap()
==>[id:1,label:person,first_name:bob,age:25,height:177]
==>[id:2,label:person,surname:bloggs,first_name:joe,age:32]
Of particular note here is the use of fold() to reduce the stream of dropped vertex traversers to a single traverser (i.e. a List of those vertices), which then lets us replace it with a single instance of "data" to iterate over in the usual fashion for creating a Vertex from a Map. Because fold() emits that single List even when none of the ids exist (the List is simply empty), the addV() portion still runs when there is nothing to replace. Note that I added a where() to ignore the "id" and "label" keys, since I figured you didn't want those values duplicated as vertex properties.
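The two-phase shape of that traversal (collect the ids on the client, drop whichever already exist, then add every vertex from its Map while skipping "id" and "label") can be sketched in plain Python against a hypothetical dict-based graph. It only illustrates the data flow; `replace_vertices` and the dict layout are made up and are not gremlin_python calls:

```python
# Hypothetical in-memory "graph": vertex id -> {"label": ..., "props": {...}}.
graph = {
    1: {"label": "person", "props": {"first_name": "old"}},
    3: {"label": "software", "props": {"name": "lop"}},
}

data = [
    {"id": 1, "label": "person", "first_name": "bob", "age": 25, "height": 177},
    {"id": 2, "label": "person", "first_name": "joe", "surname": "bloggs", "age": 32},
]

def replace_vertices(graph, data):
    # Front-end step: collect the ids up front (like data.collect{it.id}).
    ids = [m["id"] for m in data]
    # Drop whichever of those ids already exist; missing ids are ignored.
    for vid in ids:
        graph.pop(vid, None)
    # Re-create each vertex, skipping 'id' and 'label' as properties.
    for m in data:
        props = {k: v for k, v in m.items() if k not in ("id", "label")}
        graph[m["id"]] = {"label": m["label"], "props": props}
    return graph

replace_vertices(graph, data)
print(graph[1])  # vertex 1 now holds bob's properties only
```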
My use case is: a Bag vertex has a holds edge to a Box vertex, and a Box vertex has a contains edge to a Fruit vertex, so it's a parent-child relation across 3 vertices.
Is it possible to write a Gremlin query that returns all 3 related vertices? For example, I need to fetch all Bags by id, including the Box vertex and, further down, the Fruit vertex for that Bag id. In SQL-like syntax it's a simple select * from bag where id = 1.
Sample structure:
g.addV('bag').property('id',1).property('name','bag1').property('size','12').as('1').
addV('box').property('id',2).property('name','box1').property('width','12').as('2').
addV('fruit').property('id',3).property('name','apple').property('color','red').as('3').
addV('bag').property('id',4).property('name','bag2').property('size','44').as('4').
addV('box').property('id',5).property('name','box2').property('width','14').as('5').
addV('fruit').property('id',6).property('name','orange').property('color','yellow').as('6').
addE('holds').from('1').to('2').
addE('contains').from('2').to('3').
addE('holds').from('4').to('5').
addE('contains').from('5').to('6').iterate()
I want to get all properties of vertices 1, 2 and 3 when I query for vertex 1.
I want the response in the following format:
"bags" : [{
"id":"1",
"name":"bag1",
"size" :"12",
"boxes":[ {
"id" : "2",
"name":"box1",
"width" : "12",
"fruits": [{
"id":"3",
"name" : "apple",
"color" : "red"
}]
}]
},
{
"id":"4",
"name":"bag2",
"size" : "44",
"boxes":[ {
"id" : "5",
"name":"box2",
"width" : "14",
"fruits": [{
"id":"6",
"name" : "orange",
"color" : "yellow"
}]
}]
}]
But I'm not sure whether something like this is possible in Gremlin, since there are no implicit relations between vertices.
I would probably use project() to accomplish this:
gremlin> g.V().hasLabel('bag').
......1> project('id', 'name','boxes').
......2> by('id').
......3> by('name').
......4> by(out('holds').
......5> project('id','name','fruits').
......6> by('id').
......7> by('name').
......8> by(out('contains').
......9> project('id','name').
.....10> by('id').
.....11> by('name').
.....12> fold()).
.....13> fold())
==>[id:1,name:bag1,boxes:[[id:2,name:box1,fruits:[[id:3,name:apple]]]]]
==>[id:4,name:bag2,boxes:[[id:5,name:box2,fruits:[[id:6,name:orange]]]]]
I omitted the "bags" root level key as there were no other keys in the Map and it didn't seem useful to add that extra level.
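The project() above returns only id and name at each level. If every property is wanted (size, width, color), one option is to fetch the properties and edges and assemble the nesting on the client. The Python sketch below covers only that assembly step, over hypothetical plain dicts that mirror the sample graph (`vertices`, `edges`, `children` and `build_bag` are illustrative names):

```python
import json

# Sample data mirroring the graph above: vertex id -> (label, properties).
vertices = {
    1: ("bag", {"id": 1, "name": "bag1", "size": "12"}),
    2: ("box", {"id": 2, "name": "box1", "width": "12"}),
    3: ("fruit", {"id": 3, "name": "apple", "color": "red"}),
    4: ("bag", {"id": 4, "name": "bag2", "size": "44"}),
    5: ("box", {"id": 5, "name": "box2", "width": "14"}),
    6: ("fruit", {"id": 6, "name": "orange", "color": "yellow"}),
}
# (out_vertex, edge_label, in_vertex)
edges = [(1, "holds", 2), (2, "contains", 3), (4, "holds", 5), (5, "contains", 6)]

def children(vid, label):
    """Ids of vertices reached from vid over edges with the given label."""
    return [dst for src, lbl, dst in edges if src == vid and lbl == label]

def build_bag(bag_id):
    bag = dict(vertices[bag_id][1])  # copy all bag properties
    bag["boxes"] = []
    for box_id in children(bag_id, "holds"):
        box = dict(vertices[box_id][1])
        box["fruits"] = [dict(vertices[f][1]) for f in children(box_id, "contains")]
        bag["boxes"].append(box)
    return bag

bags = [build_bag(vid) for vid, (label, _) in vertices.items() if label == "bag"]
print(json.dumps({"bags": bags}, indent=2))
```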
I need to return some groups and the people in each group, like this:
Group A
-----Person A
-----Person B
-----Person C
Group B
-----Person D
-----Person E
-----Person F
How can I do that with Gremlin? The people are connected to their group with an edge.
It is always helpful to include a sample graph with your Gremlin questions, preferably as something that can easily be pasted into the Gremlin Console, as follows:
g.addV('group').property('name','Group A').as('ga').
addV('group').property('name','Group B').as('gb').
addV('person').property('name','Person A').as('pa').
addV('person').property('name','Person B').as('pb').
addV('person').property('name','Person C').as('pc').
addV('person').property('name','Person D').as('pd').
addV('person').property('name','Person E').as('pe').
addV('person').property('name','Person F').as('pf').
addE('contains').from('ga').to('pa').
addE('contains').from('ga').to('pb').
addE('contains').from('ga').to('pc').
addE('contains').from('gb').to('pd').
addE('contains').from('gb').to('pe').
addE('contains').from('gb').to('pf').iterate()
A solution to your problem is to use the group() step:
gremlin> g.V().has('group', 'name', within('Group A','Group B')).
......1> group().
......2> by('name').
......3> by(out('contains').values('name').fold())
==>[Group B:[Person D,Person E,Person F],Group A:[Person A,Person B,Person C]]
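The map returned by group() is not guaranteed to be ordered (note that Group B prints first above), so if the indented layout from the question is needed, it can be rendered on the client. A small Python sketch, assuming the result has been received as a plain dict (the `groups` variable and `render` helper are hypothetical):

```python
# Hypothetical client-side copy of the group() result above.
groups = {
    "Group B": ["Person D", "Person E", "Person F"],
    "Group A": ["Person A", "Person B", "Person C"],
}

def render(groups):
    """Print each group name followed by its members, indented with dashes."""
    lines = []
    for group in sorted(groups):          # sort explicitly; don't rely on map order
        lines.append(group)
        for person in sorted(groups[group]):
            lines.append("-----" + person)
    return "\n".join(lines)

print(render(groups))
```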