TinkerPop: Filter by Edge count - gremlin

Sample data: TinkerPop Modern
Summary: I want to find people who have created 2 pieces of software.
I started with the basics and got the counts correctly:
g.V().hasLabel("Person").as("from" ,"to1" )
.repeat(bothE().as("e1").otherV().as("to1").dedup("from", "to1")).times(1)
.emit(filter(hasLabel("Software"))).hasLabel("Software")
.group().by(select("from").by("name")).by(count()).as("c")
Result:
>> {'Marko': 1, 'Peter': 1, 'Josh': 2}
So I tried to apply a filter, but it's not working (i.e. the result is incorrect). What I tried:
g.V().hasLabel("Person").as("from")
.repeat(bothE().as("e1").otherV().as("to1").dedup("from", "to1")).times(1)
.filter(bothE().otherV().hasLabel("Software").count(local).is(eq(1)))
.dedup()
.values("name")
Any idea what I am doing wrong?

If you just need "person" vertices by edge count I don't really see why you need all that repeat() infrastructure. It's just:
gremlin> g.V().hasLabel('person').
......1> filter(outE('created').limit(2).count().is(2))
==>v[4]
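You only need to count outgoing edges because the schema is such that the "created" label only connects to "software", so you don't need to check the "software" vertex label. You limit(2) to exit the edge iteration as soon as possible but not before you have the 2 edges you are trying to count. Note that, as written, this matches any person with at least 2 "created" edges; if you need exactly 2, use limit(3) instead so that a third edge makes the count fail.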
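To illustrate why the limit() keeps the check cheap, here is a minimal plain-Python sketch (not Gremlin) that models the same filter. It assumes a hand-written copy of the "created" edges of the Modern graph; the names CREATED and has_at_least are hypothetical and exist only for this sketch.

```python
from itertools import islice

# Hand-written copy of the "created" out-edges in the TinkerPop Modern graph:
# person vertex id -> ids of the software vertices that person created.
CREATED = {
    1: [3],      # marko
    2: [],       # vadas
    4: [5, 3],   # josh
    6: [3],      # peter
}

def has_at_least(edges, n):
    """Model of limit(n).count().is(n): stop iterating after n edges."""
    return sum(1 for _ in islice(iter(edges), n)) == n

# Model of g.V().hasLabel('person').filter(outE('created').limit(2).count().is(2))
matches = [person for person, edges in CREATED.items() if has_at_least(edges, 2)]
print(matches)  # [4]
```

The islice() call plays the role of limit(2): iteration stops after the second edge, however many more the vertex has.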

Related

Tinkerpop Gremlin Get Edges that go to vertices within a list

I'm trying to query for edges that go to vertices within an aggregated list. It sounds quite simple, and it should be, but I seem to be writing my queries wrong, and I just can't figure out why. Anyway, I'll use the Modern Toy Graph to make an example, that won't necessarily make much sense in this context, but still illustrates what I wish to do:
graph = TinkerFactory.createModern()
g = graph.traversal()
g.V().
hasLabel('person').
aggregate('x').
outE().
where(inV().is(within('x')))
What I'm doing is traversing to all 'person' vertices, aggregating them, then trying to get all the outgoing edges that lead to another vertex within that aggregated list. I expect the above query to return the edge labelled "knows" that goes between vertex 1 and 2, and the one between 1 and 4; however, nothing is returned. If I simply want to get the vertices on the other end of those edges, rather than the edges themselves, the following works fine, returning vertex 2 and 4:
g.V().
hasLabel('person').
aggregate('x').
out().
where(within('x'))
So how can I get edges that lead to vertices already aggregated in a list?
(Once again, I'm aware this example doesn't make much sense within this particular graph, and I could easily query outE('knows'), but this query is relevant to a different graph.)
Thanks.
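You can't use is() quite that way. is() tests each traverser against the literal value given to the predicate, so within('x') is compared with the string 'x' rather than with the aggregated list, whereas where() resolves 'x' as the side-effect key. An easy fix would be to just combine your "working" traversal with the one that doesn't: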
gremlin> g.V().hasLabel('person').
......1> aggregate('x').
......2> outE().
......3> where(inV().where(within('x')))
==>e[7][1-knows->2]
==>e[8][1-knows->4]
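As a rough illustration of what the fixed traversal computes, here is a plain-Python sketch (not Gremlin) over a hand-written copy of the Modern graph. The names LABELS, EDGES and edges_into_x are assumptions made for this sketch only.

```python
# Hand-written copy of the Modern graph: vertex id -> label, and
# edges as (edge id, out-vertex id, label, in-vertex id).
LABELS = {1: "person", 2: "person", 3: "software",
          4: "person", 5: "software", 6: "person"}
EDGES = [
    (7, 1, "knows", 2),
    (8, 1, "knows", 4),
    (9, 1, "created", 3),
    (10, 4, "created", 5),
    (11, 4, "created", 3),
    (12, 6, "created", 3),
]

# aggregate('x'): collect every person vertex into a set named x.
x = {v for v, label in LABELS.items() if label == "person"}

# outE().where(inV().where(within('x'))): keep the out-edges of those
# vertices whose in-vertex is itself a member of x.
edges_into_x = [e for e in EDGES if e[1] in x and e[3] in x]
print([e[0] for e in edges_into_x])  # [7, 8]
```

The membership test `e[3] in x` is the part that is(within('x')) failed to express, because there 'x' was never looked up as the aggregated collection.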

Can someone explain what this graph traversal in Gremlin is doing?

I'm having a bit of trouble understanding these Gremlin queries:
from os import getenv
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
pmap = g.V().has(name, value) \
.union(__.hasLabel('UID'),
__.hasLabel('OID').outE('attached').inV()) \
.union(__.propertyMap(),
__.inE('attached').outV().hasLabel('OID') \
.propertyMap()).toList()
So I understand g.V().has(name, value) is looking for a vertex whose property name equals value. What is the union doing here? Is it unioning vertices that have the label "OID" with the outgoing edges labelled "attached"?
What is the inV() and why are there two arguments for union?
The union() step just merges the child traversal streams that are provided to it as arguments. Take a more simple example:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('person','name','marko').union(has('age',29),bothE())
==>v[1]
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> g.V().has('person','name','marko').union(has('age',30),bothE())
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
In the first example, union() takes the "marko" vertex as the starting point for both has('age',29) and bothE(). As v[1] has an "age" property with a value of "29", we see v[1] in the output. We also see all the edges of v[1] merged into that stream of output. In the second traversal, v[1] is filtered out as the "age" is not equal to "30", so all we get are the edges.
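The merging behaviour of the two console runs can be modelled with a small plain-Python sketch (not Gremlin). The helper names union, has_age and both_e are hypothetical, and the vertex and edge data are copied by hand from the output above.

```python
# Hand-written copy of the "marko" vertex and its incident edges.
MARKO = {"id": 1, "name": "marko", "age": 29}
MARKO_EDGES = ["e[9][1-created->3]", "e[7][1-knows->2]", "e[8][1-knows->4]"]

def has_age(vertex, age):
    """Child traversal has('age', age): yields the vertex only if it matches."""
    if vertex["age"] == age:
        yield f"v[{vertex['id']}]"

def both_e(vertex):
    """Child traversal bothE(): yields every incident edge."""
    yield from MARKO_EDGES

def union(vertex, *children):
    """Model of union(): run every child from the same start and merge results."""
    results = []
    for child in children:
        results.extend(child(vertex))
    return results

first = union(MARKO, lambda v: has_age(v, 29), both_e)   # vertex plus edges
second = union(MARKO, lambda v: has_age(v, 30), both_e)  # edges only
```

Every child starts from the same incoming vertex, and a child that filters everything out contributes nothing to the merged stream.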
With that explanation in mind, consider what the traversal you've included in your question is doing. It finds a vertex with a "name" and some value for that key. That becomes the start point for the first union(). If the vertex has a label of "UID" then it passes through. If the vertex has a label of "OID" then it traverses the outgoing "attached" edges to the adjacent vertex and returns those.
What's odd about that is the fact that a Vertex can only have one label (at least by TinkerPop's definition - some graphs support multiple element labels). So, assuming one label, you really only get one or the other stream. Personally, I don't think the use of union() is a good choice there. I think it would be more intuitive to use coalesce since only one stream can be returned, thus expanding my example from above:
gremlin> g.V().has('person','name','marko').coalesce(has('age',30),has('age',29).bothE())
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> g.V().has('person','name','marko').coalesce(has('age',29),has('age',29).bothE())
==>v[1]
The use of coalesce() makes the intent much clearer in my opinion. Following on further with the original code to the second union(): at this point, you have either the original Vertex or one or more "attached" vertices. For each of them, the traversal combines its own propertyMap() with the propertyMap() of any vertices labelled "OID" that reach it through an incoming "attached" edge.
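For comparison with union(), the first-match behaviour of the coalesce() runs above can be sketched in plain Python (not Gremlin). The helper names coalesce, has_age and has_age_then_both_e are hypothetical, and the data is a hand-written copy of the "marko" vertex and its edges.

```python
# Hand-written copy of the "marko" vertex and its incident edges.
MARKO = {"id": 1, "name": "marko", "age": 29}
MARKO_EDGES = ["e[9][1-created->3]", "e[7][1-knows->2]", "e[8][1-knows->4]"]

def has_age(age):
    """Child traversal has('age', age)."""
    def child(vertex):
        return [f"v[{vertex['id']}]"] if vertex["age"] == age else []
    return child

def has_age_then_both_e(age):
    """Child traversal has('age', age).bothE()."""
    def child(vertex):
        return list(MARKO_EDGES) if vertex["age"] == age else []
    return child

def coalesce(vertex, *children):
    """Model of coalesce(): return the results of the first child that yields anything."""
    for child in children:
        results = child(vertex)
        if results:
            return results
    return []

first = coalesce(MARKO, has_age(30), has_age_then_both_e(29))   # edges only
second = coalesce(MARKO, has_age(29), has_age_then_both_e(29))  # ['v[1]']
```

Unlike union(), later children are never evaluated once an earlier one has produced a result, which is why the second call returns only the vertex.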
It's really hard to say exactly what the intent of this traversal is given the information provided. Depending on what the data structure is and what the intent is, I imagine that things could be simplified. Hopefully, I've at least explained what union() is doing and clarified that for you as that seemed to be the core of your question.

Gremlin: how to identify which properties belong to which edges

I have a simple graph with two vertices, having ids 'a' and 'b'.
I have assigned two edges from 'a' to 'b' where each edge has the label = "foo"
[1] gremlin> g.V('a').outE()
==>e[f4b4b71d-ca98-5302-3eb1-7f99a7e74081][a-foo->b]
==>e[98b4b71d-c8c9-4ca2-9fbe-2f58e33d25e4][a-foo->b]
Each of the edges has a property with key = "committed".
[2] gremlin> g.E().properties()
==>p[committed->2]
==>p[committed->1]
My question: I want to enumerate the edges and return their respective properties as in step [2], but how can I match the edge-properties in the results back to their respective edges (ids)? All I get back are the property key-value assignments; nothing that relates to an edge id.
Thanks,
Joel Stevick
You should avoid returning graph elements like vertices and edges and instead transform your result to the specific form in which you need it. You could do that in a number of ways. In this case project() works nicely:
gremlin> g.V().outE().project('id','weight').by(id).by('weight')
==>[id:9,weight:0.4]
==>[id:7,weight:0.5]
==>[id:8,weight:1.0]
==>[id:10,weight:1.0]
==>[id:11,weight:0.4]
==>[id:12,weight:0.2]
Or you could use valueMap(); from 3.4.0 onwards you have the with() syntax:
gremlin> g.V().outE().valueMap('weight').with(WithOptions.tokens)
==>[id:9,label:created,weight:0.4]
==>[id:7,label:knows,weight:0.5]
==>[id:8,label:knows,weight:1.0]
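The idea behind both forms is to emit one record per edge that carries the id and the property together. A minimal plain-Python sketch of that shape (not Gremlin), using the ids and weights from the console output above; the Edge class and project function are hypothetical names for this sketch:

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    id: int
    properties: dict = field(default_factory=dict)

# Edge ids and weights copied from the project() output above (Modern graph).
EDGES = [
    Edge(9, {"weight": 0.4}),
    Edge(7, {"weight": 0.5}),
    Edge(8, {"weight": 1.0}),
    Edge(10, {"weight": 1.0}),
    Edge(11, {"weight": 0.4}),
    Edge(12, {"weight": 0.2}),
]

def project(edge, key):
    """Model of project('id', key).by(id).by(key): one map per edge."""
    return {"id": edge.id, key: edge.properties[key]}

rows = [project(e, "weight") for e in EDGES]
print(rows[0])  # {'id': 9, 'weight': 0.4}
```

For the graph in the question, the same shape would pair each edge id with its "committed" value instead of "weight".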

What is the Gremlin query that can get me all the vertices either directly or indirectly connected to one specific vertex

I need help with a Gremlin query that can output all the vertices related to one specific vertex A and their cascading related vertices (which means all the vertices related directly or indirectly to A).
For example, in a graph
A -> B -> C
D
Running this query on A will give me B and C.
The solution I have right now is an ugly one:
g.V('A').both(); g.V('A').both().both();
etc
Any help would be really appreciated.
Your solution isn't ugly; it only lacks a bit of iteration and an exit condition.
Do you require a maximum depth? Depending on the shape of your graph, the query you want to execute could be returning all vertices of that graph.
Assuming a toy modern TinkerGraph created in the Gremlin console:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
This query could be helpful:
gremlin> g.V(1).repeat(both().simplePath()).emit().times(3).dedup()
==>v[3]
==>v[2]
==>v[4]
==>v[6]
==>v[5]
"Starting from the vertex with id=1, traverse the graph in all directions up to a maximum depth of 3, discarding any path that revisits a vertex. The emit() step ensures that all vertices found along the way, and not just the leaves, are returned."
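The same idea can be sketched in plain Python (not Gremlin) as a depth-limited breadth-first search. This is only a model under stated assumptions: EDGES is a hand-written copy of the Modern graph's edges, the names neighbours and connected are hypothetical, and a visited set stands in for the combination of simplePath() and dedup(), which yields the same set of vertices but not necessarily in the same order.

```python
from collections import deque

# Hand-written copy of the Modern graph's edges as (out-vertex id, in-vertex id).
EDGES = [(1, 2), (1, 4), (1, 3), (4, 5), (4, 3), (6, 3)]

def neighbours(vertex):
    """Model of both(): adjacent vertices regardless of edge direction."""
    for out_v, in_v in EDGES:
        if out_v == vertex:
            yield in_v
        elif in_v == vertex:
            yield out_v

def connected(start, max_depth):
    """Vertices reachable from start within max_depth hops, excluding start."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            continue  # exit condition: do not expand beyond the depth limit
        for nxt in neighbours(vertex):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found

print(sorted(connected(1, 3)))  # [2, 3, 4, 5, 6]
```

Lowering max_depth to 1 returns only the direct neighbours, which corresponds to a single both() step.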
Chances are high that you want to figure out which vertices are linked to that vertex only via specific edges. In that case, you could pass one or more labels to the both() step, and/or chain a few filters.
When developing your query, feel free to chain the path() step to better understand the output.
gremlin> g.V(1).repeat(both().simplePath()).emit().times(3).dedup().path()
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[3],v[6]]
==>[v[1],v[4],v[5]]
There are other ways to solve this, but this query should get you started and help you get familiar with basic Gremlin steps and concepts.

Gremlin: Count connections ignoring edges with a parallel edge in the opposing direction

I'm currently working with a graph which indicates connections between vertices. The vertices can be connected in both directions. I am interested in knowing how many vertices are connected to each other, regardless of the direction of the connection or of whether connections exist in both directions.
So, for example, in the graph sketched below the total number of connected vertices would be 3 (whilst a simple edge count would tell us there are 4).
Due to the directionality of the edges, this isn’t the same problem as the one solved by the duplicate edge detection provided by the TinkerPop recipes. Is there a Gremlin query which could help with this count?
I’ve included some example data below:
vertex1 = graph.addVertex("example","vertex1")
vertex2 = graph.addVertex("example","vertex2")
vertex3 = graph.addVertex("example","vertex3")
vertex4 = graph.addVertex("example","vertex4")
vertex1.addEdge("Connected_to",vertex2)
vertex2.addEdge("Connected_to",vertex1)
vertex2.addEdge("Connected_to",vertex3)
vertex3.addEdge("Connected_to",vertex4)
I’m new to the Gremlin language and I’m having trouble creating a query which counts the number of connections between vertices. It would be great to get some help from you guys as I get to grips with the complexities of Graph queries!
You can dedup() by the ids of the two vertices. Just make sure to have a consistent order of the two vertices (e.g. order by their id), so that the edge direction has no impact.
gremlin> g.E()
==>e[8][0-Connected_to->2]
==>e[9][2-Connected_to->0]
==>e[10][2-Connected_to->4]
==>e[11][4-Connected_to->6]
gremlin> g.E().dedup().by(bothV().order().by(id).fold())
==>e[8][0-Connected_to->2]
==>e[10][2-Connected_to->4]
==>e[11][4-Connected_to->6]
gremlin> g.E().dedup().by(bothV().order().by(id).fold()).count()
==>3
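The ordering trick is easy to see in a plain-Python sketch (not Gremlin): sorting the two endpoint ids gives the same key for a->b and b->a. The edge ids and vertex ids are copied from the console output above; the names EDGES and dedup_undirected are hypothetical.

```python
# Edges copied from the g.E() output above as (edge id, out-vertex id, in-vertex id).
EDGES = [(8, 0, 2), (9, 2, 0), (10, 2, 4), (11, 4, 6)]

def dedup_undirected(edges):
    """Model of dedup().by(bothV().order().by(id).fold()): keep the first
    edge seen for each unordered pair of endpoint ids."""
    seen = set()
    kept = []
    for edge_id, out_v, in_v in edges:
        key = tuple(sorted((out_v, in_v)))  # same key for a->b and b->a
        if key not in seen:
            seen.add(key)
            kept.append(edge_id)
    return kept

kept = dedup_undirected(EDGES)
print(kept, len(kept))  # [8, 10, 11] 3
```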
