I am using the Gremlin query language with Amazon Neptune, and I ran into some weird behavior when using select():
Let's say my graph is a single node
g.addV("test").property(id,"v1")
and I try this query:
g.V("v1").as("a")
.V().has("test","name","non-existing-name")
.fold().coalesce(unfold(),V("v1")).as("b")
.select("b")
The response is v[v1] as expected.
If I do the same with select("a") at the end:
g.V("v1").as("a")
.V().has("test","name","non-existing-name")
.fold().coalesce(unfold(),V("v1")).as("b")
.select("a")
I get the same result, again as expected.
The weird behavior shows up when I use select("a","b") at the end:
g.V("v1").as("a")
.V().has("test","name","non-existing-name")
.fold().coalesce(unfold(),V("v1")).as("b")
.select("a","b")
For some reason I get an empty response. Any idea why?
(I did find out that replacing the first as with store works, but I don't understand why)
I don't quite get the same results as you do for that second traversal and I would not expect to. Here is what I would expect to see:
gremlin> g.addV("test").property(id,"v1")
==>v[v1]
gremlin> g.V("v1").as("a").
......1> V().has("test","name","non-existing-name").
......2> fold().coalesce(unfold(),V("v1")).as("b").
......3> select("b")
==>v[v1]
gremlin> g.V("v1").as("a").
......1> V().has("test","name","non-existing-name").
......2> fold().coalesce(unfold(),V("v1")).as("b").
......3> select("a")
gremlin> g.V("v1").as("a").
......1> V().has("test","name","non-existing-name").
......2> fold().coalesce(unfold(),V("v1")).as("b").
......3> select("a","b")
gremlin>
Note that the last two traversals do not return results.
When you fold(), you lose the path history to "a", so the traversal can't select() that step label any more. In general, you can't reference back to step labels that are on the opposite side of a reducing barrier step (like fold()). Other examples of reducing barriers are steps like sum(), max(), and min(), where a number of traversers are reduced to a single one.
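The store() workaround mentioned in the question fits this explanation: a side effect is kept outside the path, so it is still reachable after the barrier. A minimal sketch, assuming the same single-vertex graph as above; note that "a" should then come back as the stored collection rather than as a single vertex:

```groovy
// Assumption: the graph contains only the vertex with id "v1".
// store("a") writes the vertex into a side effect instead of labeling a path step,
// so fold() does not discard it the way it discards the path history.
g.V("v1").store("a").
  V().has("test","name","non-existing-name").
  fold().
  coalesce(unfold(), V("v1")).as("b").
  select("a","b")
```

Here select() resolves "b" from the path (it was labeled after the barrier) and "a" from the side effect.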
Objective
I want to generate random walks in Gremlin, and already have the command to generate one: g.V(<start_id>).repeat(local(both().sample(1))).times(<depth>).path().
While this works, I have to generate <nb_rw_per_node> random walks per start node, and I'd like to handle that in a single query if possible.
Issue
I've tried using the repeat() step, in combination with select() to do this, as follows:
g.V(<start_id>).as("start").
repeat(
select("start").
repeat(
local(
both().sample(1)
)
).times(<depth>).path()
).emit().times(<nb_rw_per_node>)
This yields the following results, which I don't understand (here, <depth> = 2 and <nb_rw_per_node> = 2):
gremlin> g.V(6652128).as("start").repeat(select("start").repeat(local(both().sample(1))).times(2).path()).emit().times(2)
==>path[v[6652128], v[6652128], v[95670392], v[1044704]]
==>path[v[6652128], v[6652128], v[95670392], v[1044704], path[v[6652128], v[6652128], v[95670392], v[1044704]], v[6652128], v[94818432], v[245928]]
How can I avoid getting the first node doubled in the path?
Why is the second result the concatenation of the first result, the first result again as a nested path, and a random walk of the correct length? I expected to get another path of the same format as the first one.
Is this the correct way to generate multiple paths from the same initial node in a single query? If so, how can I correct my query?
Thanks to everyone reading and answering!
When you select you essentially add another copy of the thing selected to the path. If you need 2 random walks from the same start, why not just include the start twice at the very beginning? So the query becomes something like this (using a data set I have to hand):
gremlin> g.V(44,44).repeat(local(out().sample(1))).times(2).path()
==>[v[44],v[8],v[580]]
==>[v[44],v[20],v[34]]
To use nested repeat steps you will need something like this:
gremlin> g.V('44').as('s').
......1> repeat(select('s').as('start').
......2> repeat(local(out().sample(1))).
......3> times(4).path().from('start')).
......4> times(3).
......5> emit()
==>[v[44],v[31],v[271],v[149],v[4]]
==>[v[44],v[31],v[264],v[1],v[152]]
==>[v[44],v[8],v[38],v[4],v[190]]
This last option is a little gimmicky, but also works.
gremlin> g.V(44).
......1> repeat(store('x').identity()).times(3).
......2> cap('x').
......3> unfold().as('start').
......4> repeat(local(out().sample(1))).
......5> times(2).
......6> path().
......7> from('start')
==>[v[44],v[31],v[42]]
==>[v[44],v[8],v[407]]
==>[v[44],v[13],v[53]]
In each of the last two examples, the real key is the introduction of the from step, which keeps the redundant starting vertex entries out of the path. Try running the queries without the from to see the difference.
Good morning!
I have the following data model where actions follow a journey that can be uniquely identified by the connecting edges having a label that matches a Journey ID. See below for a sample.
Data Model
What I'm trying to achieve is that I can group each unique journey together and give them a count. For example, in the data above, if Jeremy woke up in the morning and ate eggs, and then in the evening ate toast, I would want to see:
Jeremy/Morn->Eats->Eggs->JourneyEnd, count: 1
Jeremy/Eve->Eats->Toast->JourneyEnd, count: 1
Instead I (understandably) get:
Jeremy/Morn->Eats->Eggs->JourneyEnd
Jeremy/Eve->Eats->Toast->JourneyEnd
Jeremy/Morn->Eats->Toast->JourneyEnd
Jeremy/Eve->Eats->Eggs->JourneyEnd
I've tried filtering using repeat, and statements like:
g.V().hasLabel('UserJourney').as('root').
out('firstStep').repeat(
outE().filter(
label().is(select('root').by(id())))).
until(hasLabel('JourneyEnd')).path()
but (I think) because of the way the traversal works, it is not viable as the root step contains all Journeys by the time I go back to read it.
Any suggestions on how to get to the output I'm looking for are most welcome. The setup script is below:
g.addV('UserJourney').property(id, 'Jeremy/Morn').
addV('UserJourney').property(id, 'Jeremy/Eve').
addV('JourneyStep').property(id, 'I Need').
addV('JourneyStep').property(id, 'Eats').
addV('JourneyStep').property(id, 'Eggs').
addV('JourneyStep').property(id, 'Toast').
addV('JourneyEnd').property(id, 'JourneyEnd').
addE('Jeremy/Morn').from(V('Eats')).to(V('Eggs')).
addE('Jeremy/Morn').from(V('Eggs')).to(V('JourneyEnd')).
addE('firstStep').from(V('Jeremy/Morn')).to(V('Eats')).
addE('Jeremy/Eve').from(V('Eats')).to(V('Toast')).
addE('Jeremy/Eve').from(V('Toast')).to(V('JourneyEnd')).
addE('firstStep').from(V('Jeremy/Eve')).to(V('Eats')).
iterate()
You can use the path, from and where...by steps to achieve what you need.
gremlin> g.V().hasLabel('UserJourney').as('a').out().
......1> repeat(outE().where(eq('a')).by(label).by(id).inV()).
......2> until(hasLabel('JourneyEnd')).
......3> path().
......4> from('a')
==>[v[Jeremy/Morn],v[Eats],e[3][Eats-Jeremy/Morn->Eggs],v[Eggs],e[4][Eggs-Jeremy/Morn->JourneyEnd],v[JourneyEnd]]
==>[v[Jeremy/Eve],v[Eats],e[6][Eats-Jeremy/Eve->Toast],v[Toast],e[7][Toast-Jeremy/Eve->JourneyEnd],v[JourneyEnd]]
To remove the edges from the result a flatMap can be used
gremlin> g.V().hasLabel('UserJourney').as('a').out().
......1> repeat(flatMap(outE().where(eq('a')).by(label).by(id).inV())).
......2> until(hasLabel('JourneyEnd')).
......3> path().
......4> from('a')
==>[v[Jeremy/Morn],v[Eats],v[Eggs],v[JourneyEnd]]
==>[v[Jeremy/Eve],v[Eats],v[Toast],v[JourneyEnd]]
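The question also asked for a count per journey. One possible way to get it, offered as a sketch that has not been run against the sample data, is to append groupCount() so that identical paths are counted together:

```groovy
// Same traversal as above; groupCount() is the only addition.
// It reduces the stream of paths to a single Map of path -> number of occurrences.
g.V().hasLabel('UserJourney').as('a').out().
  repeat(flatMap(outE().where(eq('a')).by(label).by(id).inV())).
  until(hasLabel('JourneyEnd')).
  path().
    from('a').
  groupCount()
```

With the setup script above, each journey occurs once, so each path would be expected to map to a count of 1.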
I'm using Amazon Neptune, which does not support variables. For complex queries, however, I need to use a variable in multiple places. How can I do this without querying twice for the same data?
Here's the problem I'm trying to tackle:
Given a start Person, find Persons that the start Person is connected to by at most 3 steps via the knows relationship. Return each Person's name and email, as well as the distance (1-3).
How would I write this query in Gremlin without variables, since variables are unsupported in Neptune?
I don't see any reason why you would need variables for your traversal and there are many ways you could get an answer. Assuming this graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','A').property('age',20).as('a').
addV('person').property('name','B').property('age',21).as('b').
addV('person').property('name','C').property('age',22).as('c').
addV('person').property('name','D').property('age',19).as('d').
addV('person').property('name','E').property('age',22).as('e').
addV('person').property('name','F').property('age',24).as('f').
addE('next').from('a').to('b').
addE('next').from('b').to('c').
addE('next').from('b').to('d').
addE('next').from('c').to('e').
addE('next').from('d').to('e').
addE('next').from('e').to('f').iterate()
You could do something like:
gremlin> g.V().has('person','name','A').
......1> repeat(out().
......2> group('m').
......3> by(loops()).
......4> by(valueMap('name','age').by(unfold()).fold())).
......5> times(3).
......6> cap('m')
==>[0:[[name:B,age:21]],1:[[name:C,age:22],[name:D,age:19]],2:[[name:E,age:22],[name:E,age:22]]]
Find a particular "person" vertex by their name, in this case "A", then repeatedly traverse out() and group those vertices you come across by loops() which is how deep you have traversed. I use valueMap() in this case to extract the properties you wanted. The times(3) is the limit to the depth of your search. Finally you cap() out the side-effect Map held in "m" from our group(). That approach was meant to just give you a bit of basic structure to how you would accomplish this. You could perhaps polish it further this way:
gremlin> g.V().has('person','name','A').
......1> repeat(out().
......2> group('m').
......3> by(loops())).
......4> times(3).
......5> cap('m').unfold().select(values).unfold().
......6> dedup().
......7> valueMap('name','age').by(unfold())
==>[name:B,age:21]
==>[name:C,age:22]
==>[name:D,age:19]
==>[name:E,age:22]
The above example extracts the values from the Map in "m", removes the duplicates with dedup() and then converts to the result you want. Maybe you don't need the Map in the first place (I just have it on my mind because of this answer actually) - you could simply store() your results as follows:
gremlin> g.V().has('person','name','A').
......1> repeat(out().store('m')).
......2> times(3).
......3> cap('m').unfold().
......4> dedup().
......5> valueMap('name','age').by(unfold())
==>[name:B,age:21]
==>[name:C,age:22]
==>[name:D,age:19]
==>[name:E,age:22]
You might look at using something like simplePath() as well to help avoid re-traversing the same paths over and over again. You can read about that step in the Reference Documentation.
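As a rough illustration of the simplePath() suggestion, assuming the same sample graph and the same 'name' and 'age' properties, a variant without any side effect might look like this:

```groovy
// simplePath() drops a traverser as soon as its own path revisits a vertex,
// which keeps the traversal from looping around cycles.
// emit() returns the vertices reached at every depth, not only those at depth 3.
g.V().has('person','name','A').
  repeat(out().simplePath()).
    emit().
    times(3).
  dedup().
  valueMap('name','age').by(unfold())
```

simplePath() only filters repeats within a single path, so a vertex that is reachable along two different routes (such as "E" via "C" and via "D") still arrives twice; that is why dedup() is kept.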
I have a sample graph which can be constructed with the following DSL:
g.addV('A').property(id, 'A1')
g.addV('B').property(id, 'B1').addE('B').from(V('A1'))
g.addV('B').property(id, 'B2').addE('B').from(V('A1'))
g.addV('C').property(id, 'C1').addE('C').from(V('B1'))
g.addV('C').property(id, 'C2').addE('C').from(V('B2'))
g.addV('BB').property(id, 'BB1').property('age', 2).addE('BB').from(V('B2'))
g.addV('BB').property(id, 'BB2').addE('BB').from(V('B2'))
g.addV('BB').property(id, 'BB3').addE('BB').from(V('B1'))
I want to traverse from vertices with label 'A' through edges with labels 'B' and 'C', and output every path together with the 'BB' vertices attached to each 'B' vertex. I can get that result with:
g.V().hasLabel('A').as('a').
out('B').as('b').
out('C').as('c').
project('shop', 'product', 'spec', 'device').
by(select('a').valueMap(true)).
by(select('b').valueMap(true)).
by(select('b').out('BB').valueMap(true).fold()).
by(select('c').valueMap(true))
Then I ran into another scenario: I have to filter the 'B' vertices by a condition on 'BB', which can be achieved by:
g.V().hasLabel('A').as('a').
out('B').where(out('BB').has('age', 2)).as('b').
out('C').as('c').
project('shop', 'product', 'spec', 'device').
by(select('a').valueMap(true)).
by(select('b').valueMap(true)).
by(select('b').out('BB').has('age', 2).valueMap(true).fold()).
by(select('c').valueMap(true))
My question is: can I reuse the result of the where() step instead of filtering 'BB' again in the projection?
Any help is appreciated.
In the context of your approach, no, you cannot simply re-use the results of the traversal within the where(). The reason is fairly straightforward in that the where() doesn't fully iterate the result - it does what amounts to a hasNext() to detect the first item in the Iterator.
So, depending on the selectivity of has('age',2) and the fact that where() is really just looking for one result, the cost of that traversal may not be terribly expensive and you could possibly live with it traversing twice. If it is "expensive" and your graph supports some sort of vertex-centric index you might denormalize "age" to the "BB" edge and then just do where(outE('BB').has('age',2)).
Another way to possibly look at it would be to simplify your traversal a bit. Since you use step labels, why not eliminate project() and directly traverse "BB":
gremlin> g.V().hasLabel('A').as('shop').
......1> out('B').as('product').
......2> out('BB').has('age', 2).as('spec').
......3> select('product').
......4> out('C').as('device').
......5> select('shop', 'product', 'spec', 'device').
......6> by(valueMap(true))
==>[shop:[id:A1,label:A],product:[id:B2,label:B],spec:[id:BB1,label:BB,age:[2]],device:[id:C2,label:C]]
That's a much more readable traversal, but makes some assumptions about your data and the shape of your result that may not quite match what you were doing with project(). I suppose that with a fair bit of Gremlin collection manipulation you could bring the grouping around "spec" back, but then the readability starts to fall apart.
The following approach sacrifices some readability to do the out('BB').has('age',2) just once:
gremlin> g.V().hasLabel('A').as('shop').
......1> out('B').as('product').
......2> project('s').
......3> by(out('BB').has('age', 2).valueMap(true).fold()).as('spec').
......4> where(select('s').unfold()).
......5> select('product').
......6> out('C').as('device').
......7> select('shop', 'product', 'spec', 'device').
......8> by(valueMap(true)).
......9> by(valueMap(true)).
.....10> by(select('s')).
.....11> by(valueMap(true))
==>[shop:[id:A1,label:A],product:[id:B2,label:B],spec:[[id:BB1,label:BB,age:[2]]],device:[id:C2,label:C]]
If I were looking at this for the first time, I'd immediately wonder what lines 2-4 were doing. It's not clear that the whole point of the Map produced by project('s') is to fully realize the results of out('BB').has('age', 2) so that they can be used at line 4 to filter those traversers away. I don't think we'd often recommend this approach except that in this case you need to realize the entire result no matter what. If there is even one result then you need all of them, so you may as well grab them all up front.
I'm new to Gremlin; please help me with a query for the graph data below.
Gremlin sample graph
graph = TinkerGraph.open()
g = graph.traversal()
v1 = g.addV('4630').property('loc','B_1_1').next()
v2 = g.addV('4630').property('loc','C_1_1').next()
e1 = g.addE('sp').from(v1).to(v2).property('dist',1).property('anglein',90).property('angleout',45).next()
e2 = g.addE('sp').from(v2).to(v1).property('dist',2).property('anglein',190).property('angleout',145).next()
Expected result:
source  destination  dist  anglein  angleout
B_1_1   C_1_1        1     90       45
C_1_1   B_1_1        2     190      145
Query that I'm trying is:
g.V().has('4630','loc',within('B_1_1','C_1_1')).
outE('sp').
inV().has('4630','loc',within('B_1_1','C_1_1')).
path().
by('loc').
by(valueMap().select(values)).
by('loc')
With the result below:
==>[B_1_1,[90,1,45],C_1_1]
==>[C_1_1,[190,2,145],B_1_1]
I want all the path edge properties in the result without any nested list. How can I achieve the expected result?
It sounds like you just want to flatten your result.
gremlin> g.V().has('4630','loc',within('B_1_1','C_1_1')).
......1> outE('sp').
......2> inV().has('4630','loc',within('B_1_1','C_1_1')).
......3> path().
......4> by('loc').
......5> by(valueMap().select(values)).
......6> by('loc').
......7> map(unfold().unfold().fold())
==>[B_1_1,90,1,45,C_1_1]
==>[C_1_1,190,2,145,B_1_1]
Each path will need to be flattened so you want to apply that operation with map(). To flatten you need to first unfold() the path and then unfold() each item in the path. Since the map() operation will only next() that child traversal you need to include a final fold() to convert that flattened stream of objects back to a List.
Adding to what Stephen already said, you could also get rid of the by() modulation in your path step and instead use the path elements to collect all the values you need afterward. This will save you a few traversers and thus it should be slightly faster.
g.V().has('4630','loc',within('B_1_1','C_1_1')).
outE('sp').inV().has('4630','loc',within('B_1_1','C_1_1')).
path().
map(unfold().values('loc','dist','anglein','angleout').fold())
Also, note that even if you prefer the other query, you shouldn't use valueMap. valueMap().select(values) is just a waste of resources in my opinion.
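For the by()-modulated variant, one way to drop valueMap() is to collect the edge property values directly. This is only a sketch against the sample graph from the question:

```groovy
// values().fold() gathers the edge's property values into a list,
// which is what valueMap().select(values) produced, without building the Map first.
g.V().has('4630','loc',within('B_1_1','C_1_1')).
  outE('sp').
  inV().has('4630','loc',within('B_1_1','C_1_1')).
  path().
    by('loc').
    by(values().fold()).
    by('loc').
  map(unfold().unfold().fold())
```

The order in which the property values come back is up to the graph implementation, so it is safer not to rely on a particular column order.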