Gremlin DFS/BFS Search while avoiding Loops - azure-cosmosdb

In Gremlin I am trying to find all the connected nodes in my graph, using either BFS or DFS. I am not worried about the traversal method, as I will have a list of edges that shows the connections between the nodes. The output would be something like
[
Nodes : [ {id : 1, name: "abc"}, {id: 2, name : "pqr"} ],
Edges : [ {id : 100, label : ParentOf, from : 1, to : 2 }, {id : 101, label : ChildOf, from : 2, to : 1 }]
]
My Graph Looks something like this
My issue is with cycles. I am trying to emit only the nodes that are connected. Say I start with node 1:
g.V('a07771c3-8657-4535-8302-60bcdac5b753').repeat(out('knows')).until(__.not(outE('knows'))).path().
unfold().dedup().id().fold()
I end up with the error
Gremlin Query Execution Error: Exceeded maximum number of loops on a repeat() step. Cannot exceed 32 loops. Recommend limiting the number of loops using times(n) step or with a loops() condition
I am looking for a way for the query to skip nodes that have already been emitted. I am not exactly sure how to do that.

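The simplePath step can be used to prevent cycles. It drops any traverser whose path visits the same vertex twice, so repeat() stops going around a loop.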
g.V('a07771c3-8657-4535-8302-60bcdac5b753').
repeat(out('knows').simplePath()).
until(__.not(outE('knows'))).path().
unfold().
dedup().
id().
fold()
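As an illustration of the idea of skipping nodes that were already emitted, here is a minimal Python sketch of a breadth-first walk with a visited set. It is not how the Gremlin engine evaluates simplePath() (which filters per path rather than globally); the collect_connected helper and the sample adjacency list are hypothetical.

```python
from collections import deque

def collect_connected(adjacency, start):
    """Breadth-first walk that never re-enters a node it has already seen.

    adjacency maps a node id to the ids reachable over one outgoing edge.
    Returns (nodes in visit order, every edge followed as a (from, to) tuple).
    """
    visited = [start]
    seen = {start}
    edges = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            edges.append((node, neighbour))
            if neighbour not in seen:  # cycle guard: skip nodes already emitted
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited, edges

# Hypothetical graph with a cycle: 1 -> 2 -> 3 -> 1, plus 2 -> 4
graph = {1: [2], 2: [3, 4], 3: [1], 4: []}
nodes, edges = collect_connected(graph, 1)
```

The walk terminates even though 3 points back to 1, and the edge list still records that back edge, which matches the Nodes/Edges shape asked for in the question.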

Related

Multiple add if doesn't exist steps Gremlin

I have an injected array of values, and I want to add vertices if they don't exist. I use the fold and coalesce steps, but that doesn't work in this instance since I'm trying to do it for multiple vertices. Since one vertex exists, I can no longer get a null value, and the unfold inside the coalesce step returns a value from then on. As a result, vertices that don't exist yet are not added.
This is my current traversal:
const traversal = await g
?.inject([
{ twitterPostId: 'kay', like: true, retweet: false },
{ twitterPostId: 'fay', like: true, retweet: false },
{ twitterPostId: 'nay', like: true, retweet: false },
])
.unfold()
.as('a')
.aggregate('ta')
.V()
.as('b')
.where('b', p.eq('a'))
.by(__.id())
.by('twitterPostId')
.fold()
.coalesce(__.unfold(), __.addV().property(t.id, __.select('ta').unfold().select('twitterPostId')))
.toList();
Returns:
[Bn { id: 'kay', label: 'vertex', properties: undefined }]
Without using coalesce you can do conditional upserts using what we often refer to as "map injection". The Gremlin does get a little advanced, but here is an example
g.withSideEffect('ids',['3','4','xyz','abc']).
withSideEffect('p',['xyz': ['type':'dog'],'abc':['type':'cat']]).
V('3','4','xyz','abc').
id().fold().as('found').
select('ids').
unfold().
where(without('found')).as('missing').
addV('new-vertex').
property(id,select('missing')).
property('type',select('p').select(select('missing')).select('type'))
That query will look for a set of vertices, figure out which ones exist, and for the rest use the ID values and properties from the map called 'p' to create the new vertices. You can build on this pattern in a great many ways, and I find it very useful until mergeV and mergeE are more broadly available.
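The logic of that pattern can be sketched outside Gremlin. The following Python is only an analogy, assuming an in-memory dict stands in for the graph; upsert_missing and store are hypothetical names, and the comments point at the Gremlin steps they mirror.

```python
def upsert_missing(existing, wanted_ids, props):
    """Create only the wanted ids that are not stored yet; return what was created."""
    # Mirrors V('3','4','xyz','abc').id().fold().as('found')
    found = [vid for vid in wanted_ids if vid in existing]
    # Mirrors select('ids').unfold().where(without('found')).as('missing')
    missing = [vid for vid in wanted_ids if vid not in found]
    created = {}
    for vid in missing:
        # Mirrors addV('new-vertex') plus the property lookup in the 'p' map
        created[vid] = {"label": "new-vertex", **props.get(vid, {})}
    existing.update(created)
    return created

# Hypothetical starting state: vertices '3' and '4' already exist
store = {"3": {"label": "person"}, "4": {"label": "person"}}
ids = ["3", "4", "xyz", "abc"]
p = {"xyz": {"type": "dog"}, "abc": {"type": "cat"}}

created = upsert_missing(store, ids, p)
second_run = upsert_missing(store, ids, p)  # nothing left to create
```

Running it twice shows the property that matters for an upsert: the second call creates nothing.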
You can also use the list of IDs in the query to check which ones exist. However, this may lead to inefficient query plans depending on the given implementation:
g.withSideEffect('ids',['3','4','xyz','abc']).
withSideEffect('p',['xyz': ['type':'dog'],'abc':['type':'cat']]).
V().
where(within('ids')).
by(id).
by().
id().fold().as('found').
select('ids').
unfold().
where(without('found')).as('missing').
addV('new-vertex').
property(id,select('missing')).
property('type',select('p').select(select('missing')).select('type'))
This is trickier than the first query, as the V step cannot take a traversal. So you cannot do V(select('ids')) in Gremlin today.

Optimizing gremlin query to avoid multiple traversals of graph

I am a little new to the Gremlin query paradigm. I have the following Gremlin query to get all the nodes related to nodes of type foo.
g.V().hasLabel('foo').as('foo')
.coalesce(out('hasBar'), constant('')).as('bar')
.select('foo').coalesce(out('hasDelta'), constant('')).as('Delta')
.select('foo').coalesce(out('hasBar').out('hasGamma'), constant('')).as('Gamma')
.select('foo', 'bar', 'Delta', 'Gamma')
However, this is not optimal: it has to traverse the graph multiple times, which slows down query execution.
Edit
Sample Data -
g.addV('foo').property('id', '1').property('p1', '1234').property('pk', 1)
g.addV('bar').property('id', '2').property('p2', '12345').property('pk', 1)
g.addV('Gamma').property('id', '3').property('p3', '123').property('pk', 1)
g.addV('Delta').property('id', '4').property('p4', '12').property('pk', 1)
g.V('1').addE("hasBar").to(g.V('2'))
g.V('1').addE("hasGamma").to(g.V('3'))
g.V('2').addE("hasDelta").to(g.V('4'))
g.addV('foo').property('id', '5').property('p1', '12345').property('pk', 1)
g.V('5').addE("hasBar").to(g.V('2'))
g.V('5').addE("hasGamma").to(g.V('3'))
g.addV('foo').property('id', '6').property('p1', '1').property('pk', 1)
g.V('6').addE("hasBar").to(g.V('2'))
g.V('6').addE("hasGamma").to(g.V('3'))
g.addV('foo').property('id', '7').property('p1', '145').property('pk', 1)
g.V('7').addE("hasBar").to(g.V('2'))
g.V('7').addE("hasGamma").to(g.V('3'))
g.addV('foo').property('id', '8').property('p1', '15').property('pk', 1)
g.addV('bar').property('id', '9').property('p2', '78').property('pk', 1)
g.addV('Gamma').property('id', '10').property('p3', '1236').property('pk', 1)
g.addV('Delta').property('id', '11').property('p4', '1258').property('pk', 1)
g.V('8').addE("hasBar").to(g.V('9'))
g.V('8').addE("hasGamma").to(g.V('10'))
g.V('10').addE("hasDelta").to(g.V('11'))
Previously I was fetching all foo vertices and then querying the corresponding bar, gamma and delta for each one, which is very inefficient. I changed the query to fetch everything at once, but it still does the same work, only without the extra network calls.
Above query gives following response -
[
{
foo: {},
bar: {},
Delta: {},
Gamma: {}
},
{
foo: {},
bar: {},
Delta: {},
Gamma: {}
}
]
You could just take advantage of labels and use the path step:
g.V().hasLabel('foo').
outE('hasBar','hasDelta','hasGamma').
inV().
path().by(label)
If you want to identify the vertices by a property or their ID, adding a second by modulator after the path step will do that.
g.V().hasLabel('foo').
outE('hasBar','hasDelta','hasGamma').
inV().
path().
by(id).
by(label)
The paths returned will be of the form (I just assumed numeric IDs):
[1,hasBar,10]
[1,hasDelta,15]
[1,hasGamma,27]
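If one record per foo vertex is still wanted, the path rows can be regrouped on the client. This is a small Python sketch under that assumption; group_paths is a hypothetical helper and the rows are the example paths shown above.

```python
from collections import defaultdict

def group_paths(paths):
    """Regroup [source, edge_label, target] rows into {source: {edge_label: [targets]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for source, edge_label, target in paths:
        grouped[source][edge_label].append(target)
    # Convert the nested defaultdicts into plain dicts for a clean result
    return {source: dict(by_label) for source, by_label in grouped.items()}

paths = [[1, "hasBar", 10], [1, "hasDelta", 15], [1, "hasGamma", 27]]
result = group_paths(paths)
```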

Trouble simultaneously fetching filtered vertices and unfiltered vertices count

I'm trying to return a limited number of vertices matching a pattern, as well as the total (non-limited) count of vertices matching that pattern.
g.V()
.hasLabel("PersonPublic")
.has('partitionKey', "Q2r1NaG6KWdScX4RaeZs")
.has('docId', "Q2r1NaG6KWdScX4RaeZs")
.out("CONTACT_LIST")
.out("SUBSCRIBER")
.dedup()
.order()
.by("identifier")
.by("docId")
.fold()
.project('people','total')
.by(
unfold()
.has('docId', gt("23"))
.limit(2)
.project('type','id')
.by(label())
.by(values('docId'))
)
.by(unfold().count())
In plain English, I'm finding a person, finding all the contact lists of that person, finding all the subscribers to those contact lists, de-duplicating the subscribers, ordering the subscribers, pausing there to collect everything and then projecting the results in the form
{
people: [{type: string, id: string}],
total: number,
}
The "people" part of the projection is unfolded, filtered to only contain results with a "docId" greater than "23", limited to 2, and then projected again.
The "total" part of the projection is unfolded (no-limit) and counted.
My goal is to allow paging through a pattern while still retrieving the total number of vertices associated with the pattern.
Unfortunately, on cosmosdb this query is not working. Results are in the form
{
people: {type: string, id: string},
total: number,
}
And only the first person result is returned (rather than an array).
Any help would be greatly appreciated!
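You need to fold() the projected value again; otherwise a by() modulator only takes the first result of its traversal, so the list is always trimmed to one element. Also, for the total you don't need to unfold(); count(local) counts the folded list directly, and unfolding it is just a waste of resources.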
g.V()
.hasLabel("PersonPublic")
.has('partitionKey', "Q2r1NaG6KWdScX4RaeZs")
.has('docId', "Q2r1NaG6KWdScX4RaeZs")
.out("CONTACT_LIST")
.out("SUBSCRIBER")
.dedup()
.order()
.by("identifier")
.by("docId")
.fold()
.project('people','total')
.by(
unfold()
.has('docId', gt("23"))
.limit(2)
.project('type','id')
.by(label)
.by('docId')
.fold()
)
.by(count(local))
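The shape this query is meant to produce can be sketched in plain Python. This is only an illustration of "one page plus the unpaged total", assuming the subscribers are already deduplicated dicts; page_with_total and the sample rows are hypothetical.

```python
def page_with_total(people, after_doc_id, page_size):
    """Return one page of results together with the total, unpaged count."""
    # order().by("identifier").by("docId")
    ordered = sorted(people, key=lambda p: (p["identifier"], p["docId"]))
    # unfold().has('docId', gt(...)).limit(n).project('type','id')...fold()
    page = [
        {"type": p["label"], "id": p["docId"]}
        for p in ordered
        if p["docId"] > after_doc_id  # string comparison, like gt("23") on string values
    ][:page_size]
    # count(local) on the folded list
    return {"people": page, "total": len(ordered)}

subscribers = [
    {"label": "PersonPublic", "identifier": "a", "docId": "10"},
    {"label": "PersonPublic", "identifier": "b", "docId": "24"},
    {"label": "PersonPublic", "identifier": "c", "docId": "30"},
    {"label": "PersonPublic", "identifier": "d", "docId": "41"},
]
result = page_with_total(subscribers, "23", 2)
```

Note that "people" stays a list even when only one row matches, which is what the inner fold() guarantees in the Gremlin version.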

Get neighborhood of a starting node ArangoDB

I'm using ArangoDB 3.2.25. I want to extract neighbors from a starting node.
Here is what I tried:
FOR x IN 1..1
ANY "vert1/5001" Col_edge_L
RETURN x
but the vert2 vertices are missing from the result.
Here is the schema of the collection
{"_from":"vert1/560","_to":"vert2/5687768","id":771195,"score":218}
What you do in your query is to start at the vertex with key 5001 from the collection vert1 and follow all edges stored in collection Col_edge_L in any direction (so _from or _to equal to vert1/5001).
If there are edges in Col_edge_L like
{ "_from": "vert1/5001", "_to": "vert1/789" }
{ "_from": "vert2/44", "_to": "vert1/5001" }
then the result should be:
[
{ "_id": "vert2/44", ... },
{ "_id": "vert1/789", ... }
]
Exception: if the vertex collections exist, but not the vertices referenced in the _from and _to properties of the edges, the traversal will work but return null for the missing vertices (x variable).
The edge you posted in your question does not reference the starting vertex vert1/5001, so it wouldn't be followed and no vertex returned from this edge. If you miss vertices in the result, there might simply be no edges that link the starting vertex to another document.
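The behaviour described above can be mimicked with a few lines of Python. This is a sketch of what a depth-1 ANY traversal selects, assuming edges are plain dicts with _from and _to; neighbours_any is a hypothetical helper, not an ArangoDB API.

```python
def neighbours_any(edges, start):
    """Return the vertex on the other end of every edge that touches start."""
    result = []
    for edge in edges:
        if edge["_from"] == start:
            result.append(edge["_to"])    # followed in the outbound direction
        elif edge["_to"] == start:
            result.append(edge["_from"])  # followed in the inbound direction
    return result

edges = [
    {"_from": "vert1/5001", "_to": "vert1/789"},
    {"_from": "vert2/44", "_to": "vert1/5001"},
    # The edge from the question: it does not touch vert1/5001 at all
    {"_from": "vert1/560", "_to": "vert2/5687768"},
]
found = neighbours_any(edges, "vert1/5001")
```

The third edge contributes nothing, which is why its vert2 vertex never shows up for this starting vertex.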

Generic traversal of a directed tree with Neo4J

I modelled a directed tree structure using the graph database Neo4J. So I have something like this: http://ouwarovite.net/YAPC/220px-Binary_tree.svg.png (not necessarily binary)
Users of my database can add child nodes of existing nodes at will, so the height of the tree and the degree of the single nodes is unknown.
Now, I want to query my tree like this: Starting with node x, give me all leaves that are descendants of node x.
Can this kind of query be done with Gremlin or Cypher, and if so, how do I do it with the best possible performance? (I haven't found a way to run queries on 'generic' trees, because you always have to specify a maximum depth.)
I know that it's possible with the REST / JSON framework and the Java API like this:
POST /db/data/node/51/traverse/node
{
"return_filter" :
{
"body" : "position.endNode().hasProperty('leave')" ,
"language" : "javascript"
},
"relationships" : [ { "type" : "_default", "direction" : "out" } ] ,
"prune_evaluator" : { "name" : "none" , "language" : "builtin" }
}
(my leaves have the property 'leave', my edges have no type -> so _default)
Is there a simpler / better way to do this maybe with a better performance?
Cypher could look like this:
start root=node({rootId})
match root-[*]->child
where child.leave
return child
rootId being a parameter to be passed in.
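For comparison, the same question (all leaves below a given node, with no depth limit) can be sketched as an iterative depth-first walk in Python. This is an illustration only: it assumes the tree is available as a dict of child lists and treats any node without children as a leaf, whereas the query above relies on the 'leave' property; leaf_descendants is a hypothetical helper.

```python
def leaf_descendants(children, root):
    """Collect every childless descendant of root, at any depth."""
    leaves = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if not kids and node != root:
            leaves.append(node)  # no outgoing edges: this is a leaf
        # Push children in reverse so they are visited left to right
        stack.extend(reversed(kids))
    return leaves

# Hypothetical tree: x has children a and b; a has children c and d
tree = {"x": ["a", "b"], "a": ["c", "d"], "b": []}
leaves = leaf_descendants(tree, "x")
```

An explicit stack avoids recursion limits on deep trees, which matters when users can add children at will.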
