Gremlin find all connected Vertices over n levels - gremlin

I am trying to find all the vertices of a given type that are connected to each other. The Cypher version of the query gives me the expected result, but the Gremlin version does not. Is there anything I am doing incorrectly?
Visual representation of my data
Cypher query to fetch all the connections
MATCH p=shortestPath((n:Process)-[*]-(m:Process))
WHERE n <> m
RETURN ID(n), n, ID(m), m, length(p)
Gremlin version
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.path().by('title')
==>[Cash Processing,Accounting]
==>[Cash Processing,Sales]
==>[Sales,Marketing]
==>[Sales,Cash Processing]
==>[Marketing,Accounting]
==>[Marketing,Sales]
==>[Accounting,Cash Processing]
==>[Accounting,Marketing]
Any idea why Gremlin is not catching the 'Cash Processing'->'Sales'->'Marketing' connection?
I have a feeling something needs to change in the until() step, but I can't figure out what.

You don't mention the labels of your vertices, but it looks to me like the Sales vertex already fulfills the until(hasLabel('Process')) stop condition, so the traverser leaves the loop there and never continues on to Marketing.
A closer translation of your Cypher query would be something like this:
g.V().hasLabel('Process').as('n').
  repeat(both().simplePath()).
    emit(hasLabel('Process')).as('m').
  dedup('n','m').
  path().count(local).as('len').
  select('m','n','len')
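To make the until() versus emit() difference concrete, here is a minimal pure-Python sketch, not Gremlin itself. It assumes a four-vertex graph inferred from the output printed in the question, with every vertex labelled Process; the names GRAPH, LABELS and walk are made up for illustration.

```python
from collections import deque

# Assumed graph, inferred from the question's output: a 4-cycle in which
# every vertex carries the label 'Process'.
GRAPH = {
    "Cash Processing": ["Accounting", "Sales"],
    "Sales": ["Marketing", "Cash Processing"],
    "Marketing": ["Accounting", "Sales"],
    "Accounting": ["Cash Processing", "Marketing"],
}
LABELS = {name: "Process" for name in GRAPH}


def walk(start, stop_at_first_match):
    """Collect simple paths from start that end on a 'Process' vertex.

    stop_at_first_match=True mimics until(hasLabel('Process')): a traverser
    leaves the loop as soon as it lands on a matching vertex.
    stop_at_first_match=False mimics emit(hasLabel('Process')): the match is
    reported, but the traverser keeps walking.
    """
    results = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for nxt in GRAPH[path[-1]]:
            if nxt in path:  # simplePath(): never revisit a vertex
                continue
            new_path = path + [nxt]
            matched = LABELS[nxt] == "Process"
            if matched:
                results.append(new_path)
            if matched and stop_at_first_match:
                continue  # until(): this traverser is done
            queue.append(new_path)
    return results


until_paths = walk("Cash Processing", stop_at_first_match=True)
emit_paths = walk("Cash Processing", stop_at_first_match=False)
```

With until semantics the walk never gets past the direct neighbours, which matches the output in the question. With emit semantics the longer 'Cash Processing' -> 'Sales' -> 'Marketing' path shows up as well.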

Related

Gremlin - Showing the shortest (lowest cost) path in with meaningful information

I am trying to make Gremlin show me the shortest path (in terms of cost, not the number of vertices traveled) with meaningful information. There is a similar example in Gremlin's Recipes (http://tinkerpop.apache.org/docs/3.2.1-SNAPSHOT/recipes/#shortest-path) showing how to get all the paths and their respective costs from one vertex to another, but I cannot find a way to get Gremlin to display meaningful information such as the names or ages of vertices and the weights of edges. For example, one cannot tell who v[1] is from the result below.
gremlin> g.V(1).repeat(outE().inV().simplePath()).until(hasId(5)).
           path().as('p').
           map(unfold().coalesce(values('weight'),
                                 constant(0.0)).sum()).as('cost').
           select('cost','p')
==>[cost:3.00, p:[v[1], e[0][1-knows->2], v[2], e[1][2-knows->4], v[4], e[2][4-knows->5], v[5]]]
==>[cost:2.00, p:[v[1], e[0][1-knows->2], v[2], e[3][2-knows->3], v[3], e[4][3-knows->4], v[4], e[2][4-knows->5], v[5]]]
I know Gremlin supports a by()-step modulator for such task as in here:
gremlin> g.V().out().out().path().by('name').by('age')
==>[marko,32,ripple]
==>[marko,32,lop]
, but I could not figure out how to combine the two solutions. Ideally the result I am looking for should be like this:
==>[duration:2, path:[Chicago, supertrain, New York]]
Any suggestions? Many Thanks in advance!
You can add a by() modulator after the path() step and replace values() with select(), since each path element is then a map rather than a vertex or edge:
g.V().hasLabel('A').repeat(outE().inV().
simplePath()).
until(hasLabel('C')).path().
by(valueMap().with(WithOptions.tokens)).as('p').
map(unfold().
coalesce(
select('distance'),
constant(0.0)
).sum()).
as('cost').
select('cost', 'p')
example: https://gremlify.com/2wk6e3d03fe
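As a rough illustration of what that traversal computes, here is a small pure-Python sketch. It assumes the path has already been turned into plain maps (roughly what path().by(valueMap()) produces); the Chicago / supertrain / New York data and the summarize helper are hypothetical, borrowed from the output the question asks for.

```python
# Hypothetical path: vertices and edges alternate, each already reduced to
# a plain dict. Only the edge carries the weight key.
path = [
    {"name": "Chicago"},
    {"name": "supertrain", "duration": 2},
    {"name": "New York"},
]


def summarize(path, weight_key="duration"):
    # Mirrors coalesce(select(weight_key), constant(0.0)): elements that
    # lack the key (the vertices here) contribute 0.0 to the sum.
    cost = sum(element.get(weight_key, 0.0) for element in path)
    names = [element["name"] for element in path]
    return {"cost": cost, "path": names}


result = summarize(path)
print(result)
```

The point is that the readable names and the cost come from the same list of maps, which is why the by() modulator has to be applied to the path before the cost is summed.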

How do I produce output even when there is no edge and when using select for projection

Can someone please help me with this simple query? Many thanks in advance.
I am using the following Gremlin query, and it works well, giving me the original vertex (v) with id=12345, its edges (e), and the child vertex (id property). However, if the original vertex v (with id=12345) has no outgoing edges, the query returns nothing. I still want the properties of the original vertex v even if it has no outgoing edges and no child. How can I do that?
g.V().has('id', '12345').as('v').
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id)
There are a couple of things going on here, but the major change you need to make to the traversal is to use a project() step instead of a select().
select() and project() steps are similar in that they both allow you to format the results of a traversal; however, they differ in (at least) one significant way. select() steps let you access previously traversed and labeled elements (via as()). project() steps let you take the current traverser and branch it to shape the output moving forward.
In your original traversal, when the original v has no outgoing edges, all of the traversers are filtered out at the outE() step. Since no traversers remain after outE(), the rest of the traversal has no input and there is nothing to return. If you use a project() step after the original v, you can return the original vertex as well as its edges and incident vertices. This does lead to a slight complication when no out edges exist. Gremlin does not handle null values, such as the result of there being no out edges, so you need to return some constant value in those cases using a coalesce() step.
Here is a functioning version of this traversal:
g.V().hasId(3).
project('v', 'e', 'child_v').
by(valueMap()).
by(coalesce(outE().id(), constant(''))).
by(coalesce(out().id(), constant('')))
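The difference between the two shapes can be sketched in plain Python. This is only an illustration over a made-up in-memory graph (VERTICES, EDGES and both helper functions are hypothetical), not how a Gremlin engine is implemented.

```python
# Hypothetical in-memory graph: vertex id -> properties, plus a list of
# (edge_id, out_vertex, in_vertex) tuples.
VERTICES = {"12345": {"name": "parent"}, "67890": {"name": "lonely"}}
EDGES = [("e1", "12345", "c1"), ("e2", "12345", "c2")]


def select_style(vertex_id):
    # as()/select(): one row per outgoing edge, so a vertex without
    # edges produces no rows at all.
    return [
        {"v": VERTICES[vertex_id], "e": edge_id, "child_v": child}
        for edge_id, source, child in EDGES
        if source == vertex_id
    ]


def project_style(vertex_id, default=""):
    # project(): always exactly one row per vertex. Like a by() modulator,
    # only the first edge and child are kept, and a default stands in
    # when there are none (the coalesce(..., constant('')) part).
    out = [(edge_id, child) for edge_id, source, child in EDGES if source == vertex_id]
    first_edge, first_child = out[0] if out else (default, default)
    return {"v": VERTICES[vertex_id], "e": first_edge, "child_v": first_child}


rows_without_edges = select_style("67890")
row_without_edges = project_style("67890")
```

Note that the project-style row only carries the first edge and child. To keep all of them, the edge data would have to be collected into a list, which is what fold() does in Gremlin.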
With the original query you get a lot of duplicate data: the vertex properties are returned once per edge. It is probably better to use project():
g.V('12345').project('v', 'children').
by(valueMap()).
by(outE().as('e').
inV().as('child').
select('e', 'child').by(id).fold())
example: https://gremlify.com/a1
You can get the original data format if you do something like this:
g.V('12345').as('v').
  coalesce(
    outE().as('e').
      inV().
      as('child_v').
      select('v', 'e', 'child_v').
        by(valueMap()).by(id).by(id),
    project('v').by(valueMap())
  )
example: https://gremlify.com/a2

Gremlin: Find all nodes from one set with a connection to another

Given two Gremlin queries q1 and q2 and their results ri = qi.toSet(), I want to find all nodes in r1 that have a connection to a node in r2 - ignoring edge labels and direction.
My current approach included the calculation of shortest paths between the two result sets:
q1.shortestPath().with_(ShortestPath.target, q2).toList()
However, I found the shortest path calculation in TinkerPop unsuitable for this purpose, because the result will be empty if there are nodes in r1 without any connection to any node in r2.
Instead, I thought about connected components, but the connectedComponents() step will yield all connected components found and I would have to filter them to find the connected component that meets the above requirements.
Do you have suggestions on how I could tackle this problem in gremlin-python?
Here is one way of doing what I think you need in Gremlin Python. This may or may not be efficient depending on the size and shape of your graph. In my test graph only vertices 1,2 and 3 have a route to either 12 or 13. This example does not show you how you got there, just that at least one path exists (if any exist).
>>> ids = g.V('1','2','3','99999').id().toList()
>>> ids
['1', '2', '3', '99999']
>>> ids2 = g.V('12','13').id().toList()
>>> ids2
['12', '13']
>>> g.V(ids).filter(__.repeat(__.out().simplePath()).until(__.hasId(within(ids2))).limit(1)).toList()
[v[1], v[2], v[3]]
You can also use dedup() instead of simplePath() and limit() if you only care that any route exists.
g.V(ids).filter(__.repeat(__.out().dedup()).until(__.hasId(within(ids2)))).toList()
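For intuition, the filter above boils down to a reachability check per start vertex. The following pure-Python sketch shows that idea on a hypothetical adjacency list (OUT and reaches_any are made-up names, and the graph is only an assumption chosen so that vertices 1, 2 and 3 can reach 12).

```python
from collections import deque

# Hypothetical directed graph as an adjacency list (out-edges only).
OUT = {
    "1": ["2"],
    "2": ["3"],
    "3": ["12"],
    "12": [],
    "13": [],
    "99999": [],
}


def reaches_any(start, targets):
    """Return True if any target is reachable from start in one or more hops."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in OUT.get(current, []):
            if nxt in targets:
                return True  # like limit(1): stop at the first hit
            if nxt not in seen:  # like dedup(): expand each vertex once
                seen.add(nxt)
                queue.append(nxt)
    return False


r1 = ["1", "2", "3", "99999"]
r2 = {"12", "13"}
connected = [v for v in r1 if reaches_any(v, r2)]
```

To ignore edge direction, as the question asks, the adjacency list would need to contain both directions of every edge; the Gremlin counterpart would be both() instead of out().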

Add edge if not exist using gremlin

I'm using the Cosmos DB graph database in Azure.
Does anyone know if there is a way to add an edge between two vertices only if it doesn't already exist (using a Gremlin graph query)?
I can do that when adding a vertex, but not with edges. I took the code to do so from here:
g.inject(0).coalesce(__.V().has('id', 'idOne'), addV('User').property('id', 'idOne'))
Thanks!
It is possible to do with edges. The pattern is conceptually the same as vertices and centers around coalesce(). Using the "modern" TinkerPop toy graph to demonstrate:
gremlin> g.V().has('person','name','vadas').as('v').
V().has('software','name','ripple').
coalesce(__.inE('created').where(outV().as('v')),
addE('created').from('v').property('weight',0.5))
==>e[13][2-created->5]
Here we add an edge between "vadas" and "ripple", but only if it doesn't exist already. The key is the check in the first argument to coalesce().
The performance of the accepted answer isn't great since it uses inE(...), which is an expensive operation.
This query is what I use for my work in CosmosDB:
g.E(edgeId).
fold().
coalesce(
unfold(),
g.V(sourceId).
has('pk', sourcePk).
as('source').
V(destinationId).
has('pk', destinationPk).
addE(edgeLabel).
from('source').
property(T.id, edgeId)
)
This uses the id and partition keys of each vertex for cheap lookups.
I have been working on similar issues, trying to avoid duplication of vertices or edges. The first is a rough example of how I check to make sure I am not duplicating a vertex:
"g.V().has('word', 'name', '%s').fold()"
".coalesce(unfold(),"
"addV('word')"
".property('name', '%s')"
".property('pos', '%s')"
".property('pk', 'pk'))"
% (re.escape(category_),re.escape(category_), re.escape(pos_))
The second one is how I make sure there isn't already an edge in either direction. I make use of two coalesce statements, one nested inside the other:
"x = g.V().has('word', 'name', '%s').next()\n"
"y = g.V().has('word', 'name', '%s').next()\n"
"g.V(y).bothE('distance').has('weight', %f).fold()"
".coalesce("
"unfold(),"
"g.addE('distance').from(x).to(y).property('weight', %f)"
")"
% (word_1, word_2, weight, weight)
So, if the edge y -> x exists, it skips producing another one. If y -> x doesn't exist, it then tests whether x -> y exists. If not, it falls through to the final option of creating x -> y.
Let me know if anyone here knows of a more concise solution. I am still very new to Gremlin and would love a cleaner answer, though this one appears to suffice.
When I implemented the previous solutions and ran my code twice, it produced an edge on each run, because they only test one direction before creating a new edge.
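The one-direction versus either-direction check can be sketched without a graph database at all. This is a minimal in-memory illustration; add_edge_if_absent and its either_direction flag are hypothetical names, and edges are modelled as plain (source, label, target) tuples.

```python
def add_edge_if_absent(edges, source, target, label, either_direction=False):
    """Append (source, label, target) unless a matching edge already exists."""
    candidates = [(source, label, target)]
    if either_direction:
        # Also treat the reversed edge as "already there".
        candidates.append((target, label, source))
    for edge in candidates:
        if edge in edges:
            return edge  # like unfold(): reuse the existing edge
    new_edge = (source, label, target)
    edges.append(new_edge)  # like addE(): only reached when nothing matched
    return new_edge


# Checking one direction only: the reversed call creates a second edge.
one_way = []
add_edge_if_absent(one_way, "x", "y", "distance")
add_edge_if_absent(one_way, "y", "x", "distance")

# Checking both directions: the reversed call finds the existing edge.
both_ways = []
add_edge_if_absent(both_ways, "x", "y", "distance", either_direction=True)
add_edge_if_absent(both_ways, "y", "x", "distance", either_direction=True)
```

Here one_way ends up with two edges and both_ways with one, which is the duplication the bothE() check is meant to prevent.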

Use Gremlin to find the shortest path in a graph avoiding a given list of vertices?

I need to use Gremlin to find the shortest path between two nodes (vertices) while avoiding a list of given vertices.
I already have:
v.bothE.bothV.loop(2){!it.object.equals(y)}.paths>>1
To get my shortest path.
I was attempting something like:
v.bothE.bothV.filter{it.name!="ignored"}.loop(3){!it.object.equals(y)}.paths>>1
but it does not seem to work.
Please HELP!!!
The second solution you have looks correct. However, to be clear about what you are trying to accomplish: if x and y are the vertices that you want to find the shortest path between, and a vertex should be ignored during the traversal if it has the property name:"ignored", then the query is:
x.both.filter{it.name!="ignored"}.loop(2){!it.object.equals(y)}.paths>>1
If the "list of given vertices" you want filtered is actually a list, then the traversal is described as such:
list = [ ... ] // construct some list
x.both.except(list).loop(2){!it.object.equals(y)}.paths>>1
Moreover, I tend to use a range filter just to be safe as this will go into an infinite loop if you forget the >>1 :)
x.both.except(list).loop(2){!it.object.equals(y)}[1].paths>>1
Also, if there is a potential for no path, then to avoid an infinitely long search, you can do a loop limit (e.g. no more than 4 steps):
x.both.except(list).loop(2){!it.object.equals(y) & it.loops < 5}.filter{it.object.equals(y)}.paths>>1
Note why the last filter step before paths is needed: there are two ways to break out of the loop. You might not be at y when you leave it; you may instead have left because it.loops < 5 no longer held.
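The same idea (shortest path, skipping a list of vertices, with a hop limit) can be sketched as an ordinary breadth-first search in Python. This is not Gremlin; NEIGHBORS, shortest_path and the small graph are hypothetical and only meant to show the logic.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency list.
NEIGHBORS = {
    "x": ["a", "b"],
    "a": ["x", "y"],
    "b": ["x", "c"],
    "c": ["b", "y"],
    "y": ["a", "c"],
}


def shortest_path(start, goal, avoid=(), max_hops=4):
    """Breadth-first search from start to goal that never enters a vertex
    in avoid. Returns None when no path exists within max_hops edges."""
    avoid = set(avoid)
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # loop limit reached: do not expand this path further
        for nxt in NEIGHBORS.get(path[-1], []):
            if nxt in avoid or nxt in seen:
                continue
            if nxt == goal:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


direct = shortest_path("x", "y")
detour = shortest_path("x", "y", avoid=["a"])
too_far = shortest_path("x", "y", avoid=["a"], max_hops=2)
```

Because breadth-first search explores paths in order of length, the first path that reaches the goal is a shortest one, and returning None plays the role of the final filter step: running out of hops is not mistaken for having arrived at y.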
Here is your solution implemented over the Grateful Dead graph distributed with Gremlin. First, some setup code, where we load the graph and define two vertices x and y:
gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> g.loadGraphML('data/graph-example-2.xml')
==>null
gremlin> x = g.v(89)
==>v[89]
gremlin> y = g.v(100)
==>v[100]
gremlin> x.name
==>DARK STAR
gremlin> y.name
==>BROWN EYED WOMEN
Now your traversal. Note that there is no name:"ignored" property, so instead I altered it to filter on the number of performances of each song along the path. Thus, the shortest path through songs played more than 10 times in concert:
gremlin> x.both.filter{it.performances > 10}.loop(2){!it.object.equals(y)}.paths>>1
==>v[89]
==>v[26]
==>v[100]
If you use Gremlin 1.2+, then you can use a path closure to provide the names of those vertices (for example) instead of just the raw vertex objects:
gremlin> x.both.filter{it.performances > 10}.loop(2){!it.object.equals(y)}.paths{it.name}>>1
==>DARK STAR
==>PROMISED LAND
==>BROWN EYED WOMEN
I hope that helps.
Good luck!
Marko.

Resources