I have a sample graph that can be constructed with the following DSL:
g.addV('A').property(id, 'A1')
g.addV('B').property(id, 'B1').addE('B').from(V('A1'))
g.addV('B').property(id, 'B2').addE('B').from(V('A1'))
g.addV('C').property(id, 'C1').addE('C').from(V('B1'))
g.addV('C').property(id, 'C2').addE('C').from(V('B2'))
g.addV('BB').property(id, 'BB1').property('age', 2).addE('BB').from(V('B2'))
g.addV('BB').property(id, 'BB2').addE('BB').from(V('B2'))
g.addV('BB').property(id, 'BB3').addE('BB').from(V('B1'))
I want to traverse from vertices with label 'A' through edges with labels 'B' and 'C', and output every path together with the 'BB' vertices attached to each 'B' vertex. I can get that result with:
g.V().hasLabel('A').as('a').
out('B').as('b').
out('C').as('c').
project('shop', 'product', 'spec', 'device').
by(select('a').valueMap(true)).
by(select('b').valueMap(true)).
by(select('b').out('BB').valueMap(true).fold()).
by(select('c').valueMap(true))
Then I ran into another scenario: I have to filter the 'B' vertices by a condition on 'BB', which can be achieved with:
g.V().hasLabel('A').as('a').
out('B').where(out('BB').has('age', 2)).as('b').
out('C').as('c').
project('shop', 'product', 'spec', 'device').
by(select('a').valueMap(true)).
by(select('b').valueMap(true)).
by(select('b').out('BB').has('age', 2).valueMap(true).fold()).
by(select('c').valueMap(true))
My question is: can I reuse the result of the where() step instead of filtering 'BB' again in the projection?
Any help is appreciated.
In the context of your approach, no, you cannot simply re-use the results of the traversal within the where(). The reason is fairly straightforward: where() doesn't fully iterate the result. It does what amounts to a hasNext() to detect the first item in the Iterator.
So, depending on the selectivity of has('age',2) and the fact that where() is really just looking for one result, the cost of that traversal may not be terribly expensive and you could possibly live with it traversing twice. If it is "expensive" and your graph supports some sort of vertex-centric index you might denormalize "age" to the "BB" edge and then just do where(outE('BB').has('age',2)).
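To make the hasNext() point concrete, here is a minimal plain-Python model (not Gremlin, and not how any graph engine is actually implemented). The helper name where_has_match, the probe_log list and the dict-based 'BB' vertices are assumptions made up for this sketch. It shows that a where()-style check stops at the first match, so it never holds the full list of matches that the projection needs later.

```python
def where_has_match(candidates, predicate, probe_log):
    """Model of a where()-style filter: stop at the first matching item."""
    for item in candidates:
        probe_log.append(item)  # record every item the filter had to inspect
        if predicate(item):
            return True  # short-circuit, comparable to hasNext() returning true
    return False


# Hypothetical 'BB' neighbours of vertex B2, written as plain dicts.
bb_of_b2 = [{"id": "BB1", "age": 2}, {"id": "BB2"}]

probes = []
keep = where_has_match(bb_of_b2, lambda v: v.get("age") == 2, probes)

# The projection needs every match, which requires a full pass over the neighbours.
all_matches = [v for v in bb_of_b2 if v.get("age") == 2]
```

In this model the filter inspects only BB1 before answering, which is why its result cannot stand in for the complete list.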
Another way to possibly look at it would be to simplify your traversal a bit. Since you use step labels, why not eliminate project() and directly traverse "BB":
gremlin> g.V().hasLabel('A').as('shop').
......1> out('B').as('product').
......2> out('BB').has('age', 2).as('spec').
......3> select('product').
......4> out('C').as('device').
......5> select('shop', 'product', 'spec', 'device').
......6> by(valueMap(true))
==>[shop:[id:A1,label:A],product:[id:B2,label:B],spec:[id:BB1,label:BB,age:[2]],device:[id:C2,label:C]]
That's a much more readable traversal, but makes some assumptions about your data and the shape of your result that may not quite match what you were doing with project(). I suppose that with a fair bit of Gremlin collection manipulation you could bring the grouping around "spec" back, but then the readability starts to fall apart.
The following approach sacrifices some readability to do the out('BB').has('age',2) just once:
gremlin> g.V().hasLabel('A').as('shop').
......1> out('B').as('product').
......2> project('s').
......3> by(out('BB').has('age', 2).valueMap(true).fold()).as('spec').
......4> where(select('s').unfold()).
......5> select('product').
......6> out('C').as('device').
......7> select('shop', 'product', 'spec', 'device').
......8> by(valueMap(true)).
......9> by(valueMap(true)).
.....10> by(select('s')).
.....11> by(valueMap(true))
==>[shop:[id:A1,label:A],product:[id:B2,label:B],spec:[[id:BB1,label:BB,age:[2]]],device:[id:C2,label:C]]
If I were looking at this for the first time, I'd immediately wonder what lines 2-4 were doing. It's not clear that the whole point of the Map produced by project('s') is to fully realize the results of out('BB').has('age', 2) so that they can be used at line 4 to filter those traversers away. I don't think we'd often recommend this approach except that in this case you need to realize the entire result no matter what. If there is even one result then you need all of them, so you may as well grab them all up front.
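The "realize once, then filter on the realized list" idea can be sketched in plain Python as follows. This is only a model of the logic, assuming the sample graph from the question is stored in two hypothetical dicts (bb_neighbours and c_neighbours); it is not a translation that a Gremlin server would run.

```python
# Hypothetical adjacency for the sample graph: each 'B' vertex and its 'BB' neighbours.
bb_neighbours = {
    "B1": [{"id": "BB3"}],
    "B2": [{"id": "BB1", "age": 2}, {"id": "BB2"}],
}
# Each 'B' vertex and its 'C' neighbours.
c_neighbours = {"B1": ["C1"], "B2": ["C2"]}

rows = []
for product, specs in bb_neighbours.items():
    # The 'BB' condition is evaluated once per 'B' vertex and fully realized.
    matching = [s for s in specs if s.get("age") == 2]
    if not matching:
        continue  # plays the role of where(select('s').unfold())
    for device in c_neighbours[product]:
        # The realized list is reused here instead of being computed again.
        rows.append({"shop": "A1", "product": product, "spec": matching, "device": device})
```

Only B2 survives the filter, and its single matching 'BB' vertex is carried into the output row without a second lookup.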
Objective
I want to generate random walks in Gremlin, and already have the command to generate one: g.V(<start_id>).repeat(local(both().sample(1))).times(<depth>).path().
While this is good, I do have to generate <nb_rw_per_node> random walks per start node, and I'd like to use a unique query to handle it if possible.
Issue
I've tried using the repeat() step, in combination with select() to do this, as follows:
g.V(<start_id>).as("start").
repeat(
select("start").
repeat(
local(
both().sample(1)
)
).times(<depth>).path()
).emit().times(<nb_rw_per_node>)
This yields the following results, which I don't understand (here, <depth> = 2 and <nb_rw_per_node> = 2):
gremlin> g.V(6652128).as("start").repeat(select("start").repeat(local(both().sample(1))).times(2).path()).emit().times(2)
==>path[v[6652128], v[6652128], v[95670392], v[1044704]]
==>path[v[6652128], v[6652128], v[95670392], v[1044704], path[v[6652128], v[6652128], v[95670392], v[1044704]], v[6652128], v[94818432], v[245928]]
How can I not get the first node doubled in the path?
Why does the second result contain the first result, then the first path again as a nested element, and only then a random walk of the correct length? I expected to get another path of the same format as the first one.
Is this the correct way to generate multiple paths from a same initial node in a single query? If so, how can I correct my query?
Thanks to everyone reading and answering!
When you select you essentially add another copy of the thing selected to the path. If you need 2 random walks from the same start, why not just include the start twice at the very beginning? So the query becomes something like this (using a data set I have to hand):
gremlin> g.V(44,44).repeat(local(out().sample(1))).times(2).path()
==>[v[44],v[8],v[580]]
==>[v[44],v[20],v[34]]
To use nested repeat steps you will need something like this:
gremlin> g.V('44').as('s').
......1> repeat(select('s').as('start').
......2> repeat(local(out().sample(1))).
......3> times(4).path().from('start')).
......4> times(3).
......5> emit()
==>[v[44],v[31],v[271],v[149],v[4]]
==>[v[44],v[31],v[264],v[1],v[152]]
==>[v[44],v[8],v[38],v[4],v[190]]
This last option is a little gimmicky, but also works.
gremlin> g.V(44).
......1> repeat(store('x').identity()).times(3).
......2> cap('x').
......3> unfold().as('start').
......4> repeat(local(out().sample(1))).
......5> times(2).
......6> path().
......7> from('start')
==>[v[44],v[31],v[42]]
==>[v[44],v[8],v[407]]
==>[v[44],v[13],v[53]]
In each of the last two examples, the real key is the introduction of the from step, which keeps the redundant starting vertex entries out of the path. Try running the queries without the from to see the difference.
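For readers who want to see the intended result shape outside of Gremlin, here is a small plain-Python sketch of "several random walks per start node, with the start appearing once per path". The function name random_walks, the tiny adjacency dict and the seeded generator are assumptions for illustration; a walker that hits a dead end is dropped, which is how I would expect a traverser with no outgoing neighbours to behave inside repeat().

```python
import random


def random_walks(adjacency, start, depth, walks_per_node, rng):
    """Return up to walks_per_node random walks of `depth` steps from `start`."""
    walks = []
    for _ in range(walks_per_node):
        path = [start]  # the start vertex is recorded exactly once
        current = start
        for _ in range(depth):
            neighbours = adjacency.get(current, [])
            if not neighbours:
                path = None  # dead end: this walker produces no result
                break
            current = rng.choice(neighbours)  # comparable to sample(1)
            path.append(current)
        if path is not None:
            walks.append(path)
    return walks


# Hypothetical graph in which every vertex has outgoing neighbours.
adjacency = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
walks = random_walks(adjacency, start=1, depth=2, walks_per_node=3, rng=random.Random(7))
```

Each walk has depth + 1 entries, which is the shape the from('start') variants above produce.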
I am using gremlin query language with Neptune gdb, and experienced a weird behavior when using select:
Let's say my graph is a single node
g.addV("test").property(id,"v1")
and I try this query:
g.V("v1").as("a")
.V().has("test","name","non-existing-name")
.fold().coalesce(unfold(),V("v1")).as("b")
.select("b")
The response is v[v1] as expected.
If I do the same with select("a") at the end:
g.V("v1").as("a")
.V().has("test","name","non-existing-name")
.fold().coalesce(unfold(),V("v1")).as("b")
.select("a")
I get the same result, again as expected.
The weird behavior appears when I use select("a","b") at the end:
g.V("v1").as("a")
.V().has("test","name","non-existing-name")
.fold().coalesce(unfold(),V("v1")).as("b")
.select("a","b")
For some reason I get an empty response. Any idea why?
(I did find out that replacing the first as with store works, but I don't understand why)
I don't quite get the same results as you do for that second traversal and I would not expect to. Here is what I would expect to see:
gremlin> g.addV("test").property(id,"v1")
==>v[v1]
gremlin> g.V("v1").as("a").
......1> V().has("test","name","non-existing-name").
......2> fold().coalesce(unfold(),V("v1")).as("b").
......3> select("b")
==>v[v1]
gremlin> g.V("v1").as("a").
......1> V().has("test","name","non-existing-name").
......2> fold().coalesce(unfold(),V("v1")).as("b").
......3> select("a")
gremlin> g.V("v1").as("a").
......1> V().has("test","name","non-existing-name").
......2> fold().coalesce(unfold(),V("v1")).as("b").
......3> select("a","b")
gremlin>
Note that the last two traversals do not return results.
When you fold() you lose the path history to "a" so the traversal can't select() that step label any more. In general, you can't reference back to step labels that are on the opposite side of a reducing barrier step (like fold()). Other examples of reducing barriers would be steps like sum(), max(), min(), etc - where you have a number of traversers that reduce to a single one.
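A toy model can make the "path history is lost across a reducing barrier" rule easier to picture. The Traverser class and the fold/select helpers below are hypothetical and written in plain Python; they only imitate the labelling behaviour described above and are not TinkerPop's implementation. Unlike Gremlin, this select always returns a dict, even for a single label.

```python
from dataclasses import dataclass, field


@dataclass
class Traverser:
    value: object
    labels: dict = field(default_factory=dict)  # step label -> labelled value

    def as_(self, label):
        # Labelling keeps the existing history and adds the current value.
        return Traverser(self.value, {**self.labels, label: self.value})


def fold(traversers):
    """Reducing barrier: many traversers in, one fresh traverser out."""
    return Traverser([t.value for t in traversers])  # labels are not carried over


def select(traverser, *names):
    if any(name not in traverser.labels for name in names):
        return None  # the traverser is filtered out
    return {name: traverser.labels[name] for name in names}


a = Traverser("v1").as_("a")
before_fold = select(a, "a")  # still resolvable before the barrier

matches = []  # the has(...) lookup finds nothing
folded = fold(matches)  # the label "a" is gone from here on
b = Traverser("v1", dict(folded.labels)).as_("b")  # coalesce falls back to v1
```

In this model select(b, "b") resolves, while select(b, "a") and select(b, "a", "b") both come back empty, matching the console session above.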
I'm using Amazon Neptune, which does not support variables. For complex queries, however, I need to use a variable in multiple places. How can I do this without querying twice for the same data?
Here's the problem I'm trying to tackle:
Given a start Person, find Persons that the start Person is connected to by at most 3 steps via the knows relationship. Return each Person's name and email, as well as the distance (1-3).
How would I write this query in Gremlin without variables, since variables are unsupported in Neptune?
I don't see any reason why you would need variables for your traversal and there are many ways you could get an answer. Assuming this graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','A').property('age',20).as('a').
addV('person').property('name','B').property('age',21).as('b').
addV('person').property('name','C').property('age',22).as('c').
addV('person').property('name','D').property('age',19).as('d').
addV('person').property('name','E').property('age',22).as('e').
addV('person').property('name','F').property('age',24).as('f').
addE('next').from('a').to('b').
addE('next').from('b').to('c').
addE('next').from('b').to('d').
addE('next').from('c').to('e').
addE('next').from('d').to('e').
addE('next').from('e').to('f').iterate()
You could do something like:
gremlin> g.V().has('person','name','A').
......1> repeat(out().
......2> group('m').
......3> by(loops()).
......4> by(valueMap('name','age').by(unfold()).fold())).
......5> times(3).
......6> cap('m')
==>[0:[[name:B,age:21]],1:[[name:C,age:22],[name:D,age:19]],2:[[name:E,age:22],[name:E,age:22]]]
Find a particular "person" vertex by their name, in this case "A", then repeatedly traverse out() and group those vertices you come across by loops() which is how deep you have traversed. I use valueMap() in this case to extract the properties you wanted. The times(3) is the limit to the depth of your search. Finally you cap() out the side-effect Map held in "m" from our group(). That approach was meant to just give you a bit of basic structure to how you would accomplish this. You could perhaps polish it further this way:
gremlin> g.V().has('person','name','A').
......1> repeat(out().
......2> group('m').
......3> by(loops())).
......4> times(3).
......5> cap('m').unfold().select(values).unfold().
......6> dedup().
......7> valueMap('name','age').by(unfold())
==>[name:B,age:21]
==>[name:C,age:22]
==>[name:D,age:19]
==>[name:E,age:22]
The above example extracts the values from the Map in "m", removes the duplicates with dedup() and then converts to the result you want. Maybe you don't need the Map in the first place (I just have it on my mind because of this answer actually). You could simply store() your results as follows:
gremlin> g.V().has('person','name','A').
......1> repeat(out().store('m')).
......2> times(3).
......3> cap('m').unfold().
......4> dedup().
......5> valueMap('name','age').by(unfold())
==>[name:B,age:21]
==>[name:C,age:22]
==>[name:D,age:19]
==>[name:E,age:22]
You might look at using something like simplePath() as well to help avoid re-traversing the same paths over and over again. You can read about that step in the Reference Documentation.
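The depth-grouping idea from this answer can also be followed in plain Python. The sketch below rebuilds the same six-person sample graph in two dicts and mirrors the group-by-loops() and dedup() results; the function names group_by_depth and dedup_rows are my own, and this is a model of the logic rather than something Neptune would execute.

```python
# The sample graph from this answer as plain dicts.
ages = {"A": 20, "B": 21, "C": 22, "D": 19, "E": 22, "F": 24}
out_edges = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["E"], "E": ["F"]}


def group_by_depth(start, max_depth):
    """Expand outward level by level, grouping what is found by loop counter."""
    groups = {}
    frontier = [start]
    for depth in range(max_depth):  # zero-based, like loops()
        frontier = [nxt for v in frontier for nxt in out_edges.get(v, [])]
        if not frontier:
            break
        # No de-duplication here: E is reached twice, once via C and once via D.
        groups[depth] = [{"name": n, "age": ages[n]} for n in frontier]
    return groups


def dedup_rows(groups):
    """Flatten the groups in depth order and drop repeated rows."""
    seen = []
    for depth in sorted(groups):
        for row in groups[depth]:
            if row not in seen:
                seen.append(row)
    return seen


by_depth = group_by_depth("A", 3)
unique_people = dedup_rows(by_depth)
```

Because the keys are zero-based, the distance asked for in the question would be the key plus one.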
I'm trying to put together a Gremlin query that returns results for 1 to n depth of a certain edge type - without having to resort to using multiple queries stitched together with .union().
I have some test data that simulates the structure of sales offices and people that work in them, including who manages which offices and which offices "roll up" under the jurisdiction of which higher level offices. The following screen shot (from Neo4j, actually) shows a subset of the graph that I'm going to reference.
The graph can be created with the following:
g.
addV('Office').as('O_111').property('code','111').
addV('Office').as('O_356').property('code','356').
addV('Office').as('O_279').property('code','279').
addV('Office').as('O_KC5').property('code','KC5').
addE('MERGES_INTO').from('O_356').to('O_111').
addE('MERGES_INTO').from('O_279').to('O_356').
addE('MERGES_INTO').from('O_KC5').to('O_279').
addV('Person').as('Bob').property('name','Bob').
addE('MANAGES').from('Bob').to('O_111').addE('WORKS_WITH').from('Bob').to('O_111').
addV('Person').as('Michael').property('name','Michael').addE('WORKS_WITH').from('Michael').to('O_111').
addV('Person').as('John').property('name','John').addE('WORKS_WITH').from('John').to('O_111').
addV('Person').as('Rich').property('name','Rich').addE('WORKS_WITH').from('Rich').to('O_111').
addV('Person').as('Matt').property('name','Matt').
addE('WORKS_WITH').from('Matt').to('O_279').addE('MANAGES').from('Matt').to('O_279').
addV('Person').as('Judy').property('name','Judy').addE('WORKS_WITH').from('Judy').to('O_279').
addV('Person').as('Joe').property('name','Joe'). addE('WORKS_WITH').from('Joe').to('O_279').
addV('Person').as('Ben').property('name','Ben').addE('WORKS_WITH').from('Ben').to('O_279').
addV('Person').as('Ron').property('name','Ron').addE('WORKS_WITH').from('Ron').to('O_KC5').
If I want to see which people (orange) that work with an office (pink) that Bob directly or indirectly manages (because, for example, offices KC5, 279, and 356 roll up to Bob's 111 office), I can use .union() and something like the following to get the proper results:
gremlin> g.V().has('Person','name','Bob').
......1> out('MANAGES').
......2> union(
......3> __.in('WORKS_WITH'),
......4> __.in('MERGES_INTO').in('WORKS_WITH'),
......5> __.in('MERGES_INTO').in('MERGES_INTO').in('WORKS_WITH'),
......6> __.in('MERGES_INTO').in('MERGES_INTO').in('MERGES_INTO').in('WORKS_WITH')
......7> ).
......8> values('name').fold()
==>[Bob, Michael, John, Rich, Matt, Judy, Joe, Ben, Ron]
That seems super verbose and awkward. Is that my only choice? Is there a better way that doesn't seem so redundant like .union()?
Coming from a Neo4j world, I'd just do something with a ranged depth of "0 or more" using *0.., like this:
MATCH (manager:Person {name:'Bob'})
OPTIONAL MATCH (manager)-[:MANAGES]->(:Office)<-[:MERGES_INTO*0..]-(:Office)<-[:WORKS_WITH]-(p:Person)
RETURN p
How do I achieve the same sort of thing in Gremlin? Even if I can't do open ended, but could do 1 to some arbitrary limit (say, 1 to 10), that would work. It probably wouldn't matter, but I will be using AWS Neptune for the actual Graph database.
When asking questions about Gremlin, a picture of your graph is nice, but a script that provides some sample data is even better - like this:
g.addV('person').property('name','michael').as('mi').
addV('person').property('name','john').as('jo').
addV('person').property('name','rich').as('ri').
addV('person').property('name','bob').as('bo').
addV('person').property('name','matt').as('ma').
addV('person').property('name','ron').as('ro').
addV('person').property('name','joe').as('joe').
addV('person').property('name','ben').as('be').
addV('person').property('name','judy').as('ju').
addV('office').property('name','111').as('111').
addV('office').property('name','356').as('356').
addV('office').property('name','279').as('279').
addV('office').property('name','kc5').as('kc5').
addE('mergesInto').from('kc5').to('279').
addE('mergesInto').from('279').to('356').
addE('mergesInto').from('356').to('111').
addE('worksWith').from('mi').to('111').
addE('worksWith').from('jo').to('111').
addE('worksWith').from('ri').to('111').
addE('worksWith').from('bo').to('111').
addE('manages').from('bo').to('111').
addE('worksWith').from('ma').to('279').
addE('manages').from('ma').to('279').
addE('worksWith').from('joe').to('279').
addE('worksWith').from('be').to('279').
addE('worksWith').from('ju').to('279').
addE('worksWith').from('ro').to('kc5').iterate()
Your instinct is correct that union() isn't quite right for what you want to do. I would prefer repeat():
gremlin> g.V().has('person','name','bob').
......1> out('manages').
......2> repeat(__.in('worksWith','mergesInto')).
......3> emit(hasLabel('person')).
......4> values('name')
==>bob
==>michael
==>john
==>rich
==>matt
==>joe
==>ben
==>judy
==>ron
In this way it traverses to arbitrary depth (though we tend to recommend setting some kind of sensible limit to avoid problems if you run into some unexpected cycle somewhere) and is much more succinct. Note the use of emit() which controls which types of vertices are returned from the repeat() - if you do not include that filter you will also return "office" vertices.
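As a rough illustration of what repeat() with a filtered emit() does, here is a plain-Python sketch over the same sample data. The incoming dict (who points at each vertex via worksWith or mergesInto), the offices set and the people_under function are assumptions made for this example; max_depth stands in for the "sensible limit" mentioned above.

```python
# Incoming 'worksWith' and 'mergesInto' neighbours for each office in the sample data.
incoming = {
    "111": ["356", "michael", "john", "rich", "bob"],
    "356": ["279"],
    "279": ["kc5", "matt", "joe", "ben", "judy"],
    "kc5": ["ron"],
}
offices = {"111", "356", "279", "kc5"}


def people_under(office, max_depth=10):
    """Walk incoming edges level by level and emit only person vertices."""
    found = []
    frontier = [office]
    for _ in range(max_depth):  # guard against unexpected cycles
        frontier = [src for v in frontier for src in incoming.get(v, [])]
        if not frontier:
            break
        # Comparable to emit(hasLabel('person')): offices are traversed but not returned.
        found.extend(v for v in frontier if v not in offices)
    return found


everyone = people_under("111")
two_levels = people_under("111", max_depth=2)
```

Starting from office 111, which bob manages, the unlimited walk reaches all nine people, while a limit of two levels stops at the people attached directly to 111.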
I'm quite new to Gremlin, I've been practicing a bit with this guide, but when it comes to writing more complex queries I clearly haven't got the hang of it yet. To put you in context, I'm trying to answer a question that in SQL can easily be cracked with a self-join.
Imagine the following simplified graph:
As you can see, there are two types of entities in the graph: Routes and Legs. A Route is made of 1+ Legs following a particular order (specified in the edge), and a Leg can be in several Routes.
The question I want to answer is: which routes travel from one country to another, and then back to the previous country?
In the case of the graph above, Route 1 goes from ES to FR in the first Leg, and from FR to ES in the third Leg, so the output of the query would look like:
=> Route id: 1
=> Leg1 order: 1
=> Leg1 id: 1
=> Leg2 order: 3
=> Leg2 id: 3
If I had the following relational table:
route_id leg_id order source_country destination_country
1 1 1 ES FR
1 2 2 FR FR
1 3 3 FR ES
I could get the desired output with the following query:
SELECT
a.route_id
,a.leg_id
,a.order
,b.leg_id
,b.order
FROM Routes a
JOIN Routes b
ON a.route_id = b.route_id
AND a.source_country = b.destination_country
AND a.destination_country = b.source_country
WHERE a.source_country <> a.destination_country;
When it comes to writing it in Gremlin, I'm really not quite sure how to start. My inexperience makes me want to perform a self-join as well, but even then I didn't get very far:
g.V().hasLabel('Route').as('a').V().hasLabel('Route').as('b').where('a', eq('b')).and(join 'a' edges&legs with 'b' edges&legs)...
And that's about it, because I don't know how to reference a again as an object that can be traversed to look for the edges and legs connected to the routes.
Any help/guidance would be greatly appreciated, it could definitely happen that this problem can be solved in a simpler way as well :)
Thanks,
Béntor
With graphs you should try to think in terms of "navigating connected things" rather than "joining disparate things" because with a graph the things are already joined explicitly. It also helps to think in terms of streams of things being lazily evaluated (i.e. objects going from one Gremlin step to the next).
First of all, the picture is nice but it's always more helpful to provide some sample data in the form of a Gremlin script like this:
g = TinkerGraph.open().traversal()
g.addV('route').property('rid',1).as('r1').
addV('route').property('rid',2).as('r2').
addV('route').property('rid',3).as('r3').
addV('leg').property('lid',1).property('source','ES').property('dest','FR').as('l1').
addV('leg').property('lid',2).property('source','FR').property('dest','FR').as('l2').
addV('leg').property('lid',3).property('source','FR').property('dest','ES').as('l3').
addV('leg').property('lid',4).property('source','ES').property('dest','FR').as('l4').
addV('leg').property('lid',5).property('source','FR').property('dest','FR').as('l5').
addV('leg').property('lid',6).property('source','FR').property('dest','US').as('l6').
addE('has_leg').from('r1').to('l1').property('order',1).
addE('has_leg').from('r1').to('l2').property('order',2).
addE('has_leg').from('r1').to('l3').property('order',3).
addE('has_leg').from('r3').to('l4').property('order',1).
addE('has_leg').from('r3').to('l5').property('order',2).
addE('has_leg').from('r3').to('l6').property('order',3).
addE('has_leg').from('r2').to('l2').property('order',1).iterate()
Your question was:
which routes travel from one country to another, and then back to the previous country?
Note that I added some extra data that didn't meet the requirements of that question to be sure my traversal was working properly. I assumed that you were open to getting routes that just stayed in one country, like a leg that went from "FR" to "FR", since it started in "FR" and ended in that "previous country". I could revise this further to change that if you really needed me to, but for now I will stick with that assumption since you're just learning.
After considering the data and reading that question, my first thought was to find the routes, which you did well enough, and then see what it takes to get the start leg and the end leg of the trip for each route:
gremlin> g.V().hasLabel('route').
......1> map(outE('has_leg').
......2> order().by('order').
......3> union(limit(1).inV().values('source'), tail().inV().values('dest')).
......4> fold())
==>[ES,ES]
==>[FR,FR]
==>[ES,US]
So, I find each "route" vertex with hasLabel('route') and then convert it into a List of the start and end country (i.e. a pair where the first item is the "source" country and the second item is the "dest" country). To do that I traverse the outgoing "has_leg" edges and order them. Once ordered, I grab the first edge in the stream (i.e. with limit(1)), traverse to the incoming "leg" vertex and grab its "source" value, then do the same for the last edge (i.e. with tail()) but this time grab the "dest" value of its incoming vertex. We then use fold() to push that two-item stream from union() into a List. Again, because this all happens inside of map() we are effectively doing it for each "route" vertex, so we get three pairs as a result.
With that output we just now need to compare the start/end values in the pairs to determine which represent a route starting and ending in the same country.
gremlin> g.V().hasLabel('route').
......1> filter(outE('has_leg').
......2> order().by('order').
......3> fold().
......4> project('start','end').
......5> by(unfold().limit(1).inV().values('source')).
......6> by(unfold().tail().inV().values('dest')).
......7> where('start', eq('end'))).
......8> elementMap()
==>[id:0,label:route,rid:1]
==>[id:2,label:route,rid:2]
At line 1, note that we changed map() to filter(). I only used map() initially so that I could see the results of what I was traversing before I worried about how to use those results to get rid of the data I didn't want. That's a common practice with Gremlin as you build more and more complexity in your traversals. So we are now ready to apply a filter() to each "route" vertex. I imagine that there are a number of ways to do this, but I chose to gather all the ordered edges into a List at line 3. I then project() that step at line 4 and transform the edge list for both "start" and "end" keys using the associated by() modulators. In both cases I must unfold() the edge list to a stream and then apply the same limit(1) and tail() sort of traversal that was explained earlier. The result is a Map with "start" and "end" keys which can be compared using where() step. As you can see from the result, the third route that started in "ES" and ended in "US" has been filtered away.
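The same start/end comparison can be checked by hand in plain Python. This sketch uses the sample routes and legs from above stored in two hypothetical dicts (legs and has_leg) and a helper named start_end; it models the logic of the traversal and is not generated from it.

```python
# leg id -> (source country, destination country), as in the sample script
legs = {
    1: ("ES", "FR"), 2: ("FR", "FR"), 3: ("FR", "ES"),
    4: ("ES", "FR"), 5: ("FR", "FR"), 6: ("FR", "US"),
}
# route id -> list of (order, leg id); route 1 is deliberately stored out of order
has_leg = {
    1: [(3, 3), (1, 1), (2, 2)],
    2: [(1, 2)],
    3: [(1, 4), (2, 5), (3, 6)],
}


def start_end(route_id):
    """Return (source of the first leg, destination of the last leg)."""
    ordered = sorted(has_leg[route_id])  # comparable to order().by('order')
    first_leg = ordered[0][1]  # limit(1)
    last_leg = ordered[-1][1]  # tail()
    return legs[first_leg][0], legs[last_leg][1]


pairs = {rid: start_end(rid) for rid in sorted(has_leg)}
round_trips = [rid for rid, (start, end) in pairs.items() if start == end]
```

Routes 1 and 2 start and end in the same country, and route 3 is filtered away, in line with the console output.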
I'll expand my answer based on your comment - Since all of my previous data seems to align with your more general case of wanting to find any route that returns to a country in any sense:
g = TinkerGraph.open().traversal()
g.addV('route').property('rid',1).as('r1').
addV('route').property('rid',2).as('r2').
addV('route').property('rid',3).as('r3').
addV('route').property('rid',4).as('r4').
addV('leg').property('lid',1).property('source','ES').property('dest','FR').as('l1').
addV('leg').property('lid',2).property('source','FR').property('dest','FR').as('l2').
addV('leg').property('lid',3).property('source','FR').property('dest','ES').as('l3').
addV('leg').property('lid',4).property('source','ES').property('dest','FR').as('l4').
addV('leg').property('lid',5).property('source','FR').property('dest','FR').as('l5').
addV('leg').property('lid',6).property('source','FR').property('dest','US').as('l6').
addV('leg').property('lid',7).property('source','ES').property('dest','FR').as('l7').
addV('leg').property('lid',8).property('source','FR').property('dest','CA').as('l8').
addV('leg').property('lid',9).property('source','CA').property('dest','US').as('l9').
addE('has_leg').from('r1').to('l1').property('order',1).
addE('has_leg').from('r1').to('l2').property('order',2).
addE('has_leg').from('r1').to('l3').property('order',3).
addE('has_leg').from('r3').to('l4').property('order',1).
addE('has_leg').from('r3').to('l5').property('order',2).
addE('has_leg').from('r3').to('l6').property('order',3).
addE('has_leg').from('r4').to('l7').property('order',1).
addE('has_leg').from('r4').to('l8').property('order',2).
addE('has_leg').from('r4').to('l9').property('order',3).
addE('has_leg').from('r2').to('l2').property('order',1).iterate()
If I have this right the newly added "rid=4" route should be filtered as its route never revisits the same country. I think this bit of Gremlin is even easier than what I suggested previously because now we just need to look for unique routes which means that if we satisfy one of these two situations then we've found a route we care about:
There is one leg and it starts/ends in the same country
There are multiple legs and some country appears more than 2 times in the route (because we are taking into account both "source" and "dest")
Here's the Gremlin:
gremlin> g.V().hasLabel('route').
......1> filter(out('has_leg').
......2> union(values('source'),
......3> values('dest')).
......4> groupCount().
......5> or(select(values).unfold().is(gt(2)),
......6> count(local).is(1))).
......7> elementMap()
==>[id:0,label:route,rid:1]
==>[id:2,label:route,rid:2]
==>[id:4,label:route,rid:3]
If you understood my earlier explanations of the code, then you likely follow everything up to line 5 where we take the Map produced by the groupCount() on country names and apply the two filter conditions I just described. At line 5, we apply the second condition which extracts the values from the Map (i.e. the counts of the number of times each country appears) and detects if any are greater than 2. On line 6, we count the entries in the Map which maps to the first condition. Note that we use local there because we aren't counting the Map-objects in the stream but the entries within the Map (i.e. local to the Map).
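The two conditions can be reproduced with a Counter in plain Python. The dicts legs and route_legs and the function revisits_country are names invented for this sketch; the data is the extended sample from above, including route 4.

```python
from collections import Counter

# leg id -> (source country, destination country), as in the extended sample script
legs = {
    1: ("ES", "FR"), 2: ("FR", "FR"), 3: ("FR", "ES"),
    4: ("ES", "FR"), 5: ("FR", "FR"), 6: ("FR", "US"),
    7: ("ES", "FR"), 8: ("FR", "CA"), 9: ("CA", "US"),
}
route_legs = {1: [1, 2, 3], 2: [2], 3: [4, 5, 6], 4: [7, 8, 9]}


def revisits_country(route_id):
    """Count country appearances across sources and destinations, then test both conditions."""
    counts = Counter()
    for leg_id in route_legs[route_id]:
        source, dest = legs[leg_id]
        counts[source] += 1
        counts[dest] += 1
    some_country_over_two = any(n > 2 for n in counts.values())  # is(gt(2))
    single_country_only = len(counts) == 1  # count(local).is(1)
    return some_country_over_two or single_country_only


kept = [rid for rid in sorted(route_legs) if revisits_country(rid)]
```

Route 4 touches ES and US once and FR and CA twice each, so neither condition holds and it is dropped.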
Just in case it's useful here is a similar example I was playing with before I saw Stephen had already answered. This uses the air-routes data set from the tutorial. The first example starts specifically at LHR. The second looks at all airports. I assumed a constant of 2 segments. You could change that by modifying the query, and, as Stephen mentioned, there are many ways you could approach this.
gremlin> g.V().has('code','LHR').as('a').
......1> out().
......2> where(neq('a')).by('country').
......3> repeat(out().simplePath()).times(1).
......4> where(eq('a')).by('country').
......5> path().
......6> by(values('country','code').fold()).
......7> limit(5)
==>[[UK,LHR],[MA,CMN],[UK,LGW]]
==>[[UK,LHR],[MA,CMN],[UK,MAN]]
==>[[UK,LHR],[MA,TNG],[UK,LGW]]
==>[[UK,LHR],[CN,CTU],[UK,LGW]]
==>[[UK,LHR],[PT,FAO],[UK,BHX]]
gremlin> g.V().hasLabel('airport').as('a').
......1> out().
......2> where(neq('a')).by('country').
......3> repeat(out().simplePath()).times(1).
......4> where(eq('a')).by('country').
......5> path().
......6> by(values('country','code').fold()).
......7> limit(5)
==>[[US,ATL],[CL,SCL],[US,DFW]]
==>[[US,ATL],[CL,SCL],[US,IAH]]
==>[[US,ATL],[CL,SCL],[US,JFK]]
==>[[US,ATL],[CL,SCL],[US,LAX]]
==>[[US,ATL],[CL,SCL],[US,MCO]]
For your specific example, the technique Stephen used, taking advantage of segments having an order number, is much nicer. The air-routes data set does not have a concept of a segment, but I thought this might be of some interest as you start exploring Gremlin more.
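For comparison, the "leave the country, then come back in one more hop" pattern looks like this in plain Python. The airports and routes below are a tiny made-up graph (the codes are real airport codes, but the connections are hypothetical and do not come from the air-routes data set), and out_and_back is a name chosen for this sketch.

```python
# Hypothetical mini network: airport code -> country, and outgoing routes.
country = {"LHR": "UK", "LGW": "UK", "CMN": "MA", "CDG": "FR"}
routes = {
    "LHR": ["CMN", "CDG", "LGW"],
    "CMN": ["LGW", "LHR", "CDG"],
    "CDG": ["LHR"],
    "LGW": [],
}


def out_and_back(start):
    """Two-hop paths that leave the start country and end in it again."""
    results = []
    for mid in routes[start]:
        if country[mid] == country[start]:
            continue  # where(neq('a')).by('country')
        for end in routes[mid]:
            if end in (start, mid):
                continue  # simplePath(): no airport may repeat
            if country[end] == country[start]:  # where(eq('a')).by('country')
                results.append([start, mid, end])
    return results


from_lhr = out_and_back("LHR")
from_cdg = out_and_back("CDG")
```

With this data the only qualifying path from LHR goes via CMN to LGW, and CDG has none.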