TinkerPop Gremlin Repeat until nodes with specified label - gremlin

In my graph there are two labels, a and b, and a boolean property travel_by.
I would like to perform a BFS (with a max depth of 5): start from a given node and, along each path, get the first node with label a.
I tried something like this:
g.V(<node_to_start_from>).repeat(__.both().has("travel_by", True).simplePath())
.times(5)
.until(__.hasLabel('a')).toList()
But this query hangs for a really long time (even if I change it to times(2)).

A common way to do this is to use loops():
g.V(<node_to_start_from>).
repeat(__.both().has("travel_by", True).simplePath()).
until(__.hasLabel('a').or().loops().is(5)).
hasLabel('a').
toList()
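The second hasLabel('a') prevents anything other than items with a label of 'a' from being part of the result in cases where loops() has reached 5. Note that in gremlin-python the reserved-word steps take a trailing underscore, so or() and is(5) are written or_() and is_(5).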
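To make the intended behaviour concrete, here is a minimal plain-Python sketch of the same idea: a breadth-first walk that stops expanding a branch at the first node labelled 'a' and gives up after five hops. The function name, the ADJ and LABELS sample data, and the assumption that ADJ already contains only neighbours with travel_by set to true are all hypothetical. It also uses a single global visited set, which is stricter than Gremlin's per-path simplePath().

```python
from collections import deque

# Hypothetical sample graph: undirected neighbours that passed the travel_by filter.
ADJ = {
    "s": ["x", "a1"],
    "x": ["s", "a2"],
    "a1": ["s", "a3"],
    "a2": ["x"],
    "a3": ["a1"],
}
LABELS = {"s": "b", "x": "b", "a1": "a", "a2": "a", "a3": "a"}


def first_labelled(adj, labels, start, target="a", max_depth=5):
    """Return the first `target`-labelled nodes met on a BFS from `start`."""
    hits = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, like loops().is(5)
        for nxt in adj.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if labels[nxt] == target:
                hits.append(nxt)  # stop this branch, like until(hasLabel('a'))
            else:
                queue.append((nxt, depth + 1))
    return hits


print(first_labelled(ADJ, LABELS, "s"))
```

In this sample, a3 is never reported because the walk stops at a1 and does not look past it.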

Related

How to query hops using PGQL?

We are trying to write a PGQL query that returns multiple hops from a selected node.
To get the nodes and edges one hop from the selected node:
SELECT n0.id as n0id, e0.id as e0id, n1.id as n1id FROM MATCH (n0)->[e0]->(n1) WHERE n0.id=12345
To extend the result to more nodes and edges, for example 2 hops:
... FROM MATCH (n0)->[e0]->(n1)->[e1]->(n3) ...
However, in this case nodes that are only 1 hop away will not be returned.
Is there a way to query a required range of hops from the selected node?
Any solution would be appreciated.
It seems you are looking for the PGQL syntax for variable-length paths:
https://pgql-lang.org/spec/1.4/#variable-length-paths
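Among these patterns, I suppose the reachability syntax is useful in your case. A reachability pattern does not bind an edge variable, so e0 is dropped from the SELECT; edge_label is a placeholder for your own edge label.
SELECT n0.id as n0id, n1.id as n1id
FROM MATCH (n0)-/:edge_label{1,3}/->(n1)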
WHERE n0.id=12345
https://pgql-lang.org/spec/1.4/#between-n-and-m
Thanks!
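As a rough illustration of what a {1,3} quantifier asks for, the following plain-Python sketch collects every node reachable from a start node in between min_hops and max_hops steps. The function name and the HOP_ADJ sample data are hypothetical, and this only mimics the reachability idea; it is not how a PGQL engine evaluates the query.

```python
# Hypothetical directed chain 1 -> 2 -> 3 -> 4 -> 5.
HOP_ADJ = {1: [2], 2: [3], 3: [4], 4: [5]}


def within_hops(adj, start, min_hops=1, max_hops=3):
    """Nodes reachable from `start` by a walk of min_hops..max_hops edges."""
    frontier = {start}
    reached = set()
    for hop in range(1, max_hops + 1):
        # Advance every node in the frontier by exactly one edge.
        frontier = {dst for src in frontier for dst in adj.get(src, ())}
        if hop >= min_hops:
            reached |= frontier
    return reached


print(sorted(within_hops(HOP_ADJ, 1)))
```

With min_hops=1 the 1-hop neighbours are included, which is what the fixed-length two-hop pattern in the question was missing.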

Repeat in gremlin

I have two questions related to Gremlin.
First, I want to stop the traversal when a condition is satisfied while repeating.
g.V().has('label_','A').emit().repeat(inE().outV()).until(has('stop',1)).project('depth','values').by(valueMap('label_','stop'))
I want the query to stop returning further values once a node with stop equal to 1 is encountered during the repeat. But the query doesn't stop and returns all the records.
Output required:
=>{label_='A',stop=0}
=>{label_='B',stop=0}
=>{label_='C',stop=1}
Second, I want a query that returns traversal values in the following format whenever an edge exists between two nodes. Given the graph A->E1->B->E2->C, the output must be as follows:
=> A,E1,B
=> B,E2,C
A, B, C, E1 and E2 represent the respective properties, where A is the starting node.
For the first part, it seems you are traversing the in edges rather than the out edges. Is this on purpose? If so, replace the out() in the repeat with in().
g.V().has(label, 'A').emit().
repeat(out()).until(has('stop', 1)).
project('label', 'stop').
by(label).
by(values('stop'))
example: https://gremlify.com/ma2xkkszkzr/1
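The output the question asks for can be sketched in plain Python as a walk that records each vertex and stops following a branch once stop equals 1. The names walk_until_stop, OUT and STOP are hypothetical, and the sketch only approximates the Gremlin semantics: it tracks visited vertices (which repeat() does not) and checks the start vertex too.

```python
# Hypothetical chain A -> B -> C -> D, with stop set on C.
OUT = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
STOP = {"A": 0, "B": 0, "C": 1, "D": 0}


def walk_until_stop(out_edges, stop, start):
    """Emit every vertex reached, but do not walk past a vertex with stop == 1."""
    results = []
    seen = set()
    frontier = [start]
    while frontier:
        next_frontier = []
        for v in frontier:
            if v in seen:
                continue
            seen.add(v)
            results.append({"label": v, "stop": stop[v]})  # like emit()
            if stop[v] != 1:  # like until(has('stop', 1))
                next_frontier.extend(out_edges.get(v, []))
        frontier = next_frontier
    return results


for row in walk_until_stop(OUT, STOP, "A"):
    print(row)
```

D never appears in the result because the walk ends at C.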
For the second part, I'm still not sure what you meant. If you just want to get all edges with their out and in vertices, you can use elementMap():
g.E().elementMap()
example: https://gremlify.com/ma2xkkszkzr/4
If elementMap() is not supported, you can do something like this instead:
g.E().local(union(
outV(),
identity(),
inV()
).label().fold())
example: https://gremlify.com/ma2xkkszkzr/2

How do I produce output even when there is no edge and when using select for projection

Can someone please help me with this simple query? Many thanks in advance.
I am using the following Gremlin query and it works well, giving me the original vertex (v) (with id=12345), its edges (e) and the child vertex (id property). However, if the original vertex 'v' (with id=12345) has no outgoing edges, the query returns nothing. I still want the properties of the original vertex ('v') even if it has no outgoing edges and no child. How can I do that?
g.V().has('id', '12345').as('v').
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id)
There are a couple of things going on here, but the major update you need to make to the traversal is to use a project() step instead of a select().
select() and project() steps are similar in that they both allow you to format the results of a traversal, but they differ in (at least) one significant way. select() steps let you access previously traversed and labeled elements (via as()). project() steps let you take the current traverser and branch it to shape the output moving forward.
In your original traversal, when the original v has no outgoing edges, all the traversers are filtered out at the outE() step. Since no traversers remain after outE(), the remainder of the traversal has no input stream, so there is no data to return. If you use a project() step after the original v, you can return the original traverser as well as the edges and incident vertices. This does lead to a slight complication when no out edges exist. Gremlin does not handle null values, such as the absence of out edges, so you need to return a constant value for those by() modulators using a coalesce() step.
Here is a functioning version of this traversal:
g.V().hasId(3).
project('v', 'e', 'child_v').
by(valueMap()).
by(coalesce(outE().id(), constant(''))).
by(coalesce(out().id(), constant('')))
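The effect of project() with coalesce() can be mimicked in plain Python: the record for the vertex is always built, and a default value fills in when there is no edge. The names project_vertex, VERTICES and OUT_EDGES are hypothetical sample data. Taking only the first edge is a deliberate choice here, on the understanding that a by() modulator given a traversal uses its first result unless the traversal is folded.

```python
# Hypothetical data: vertex "12345" has one out edge, vertex "67890" has none.
VERTICES = {"12345": {"name": ["root"]}, "67890": {"name": ["leaf"]}}
OUT_EDGES = {"12345": [{"id": "e1", "to": "67890"}]}


def project_vertex(vertices, out_edges, vid, default=""):
    """Build a v/e/child_v record even when the vertex has no out edges."""
    edges = out_edges.get(vid, [])
    first = edges[0] if edges else None  # by(...) keeps the first result only
    return {
        "v": vertices[vid],
        "e": first["id"] if first else default,  # coalesce(outE().id(), constant(''))
        "child_v": first["to"] if first else default,  # coalesce(out().id(), constant(''))
    }


print(project_vertex(VERTICES, OUT_EDGES, "12345"))
print(project_vertex(VERTICES, OUT_EDGES, "67890"))
```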
With the query in the question you will get a lot of duplicate data: the vertex properties are returned once per outgoing edge. It will probably be better to use project():
g.V('12345').project('v', 'children').
by(valueMap()).
by(outE().as('e').
inV().as('child').
select('e', 'child').by(id).fold())
example: https://gremlify.com/a1
You can get the original data format if you do something like this:
g.V('12345').as('v').
coalesce(
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id),
project('v').by(valueMap())
)
example: https://gremlify.com/a2

Traverse Graph With Directed Cycles using Relationship Properties as Filters

I have a Neo4j graph with directed cycles. I have had no issue finding all descendants of A assuming I don't care about loops using this Cypher query:
match (n:TEST{name:"A"})-[r:MOVEMENT*]->(m:TEST)
return n,m,last(r).movement_time
The relationships between my nodes have a timestamp property on them, movement_time. I've simulated that in my test data below using numbers that I've imported as floats. I would like to traverse the graph using the timestamp as a constraint: only follow relationships that have a greater movement_time than the movement_time of the relationship that brought us to this node.
Here is the CSV sample data:
from,to,movement_time
A,B,0
B,C,1
B,D,1
B,E,1
B,X,2
E,A,3
Z,B,5
C,X,6
X,A,7
D,A,7
Here is what the graph looks like:
I would like to calculate the descendants of every node in the graph and include the timestamp from the last relationship using Cypher; so I'd like my output data to look something like this:
Node:[{Descendant,Movement Time},...]
A:[{B,0},{C,1},{D,1},{E,1},{X,2}]
B:[{C,1},{D,1},{E,1},{X,2},{A,7}]
C:[{X,6},{A,7}]
D:[{A,7}]
E:[{A,3}]
X:[{A,7}]
Z:[{B,5}]
This non-Neo4J implementation looks similar to what I'm trying to do: Cycle enumeration of a directed graph with multi edges
This one is not 100% what you want, but very close:
MATCH (n:TEST)-[r:MOVEMENT*]->(m:TEST)
WITH n, m, r, [x IN range(0,size(r)-2) |
(r[x+1]).movement_time - (r[x]).movement_time] AS deltas
WHERE ALL (x IN deltas WHERE x>0)
RETURN n, collect(m), collect(last(r).movement_time)
ORDER BY n.name
We basically find all the paths between any of your nodes (beware: cartesian products get very expensive on non-trivial datasets). In the WITH we build a collection, deltas, that holds the differences between each pair of subsequent movement_time properties.
The WHERE applies an ALL predicate to filter out paths having any non-positive delta, which guarantees increasing values of movement_time along the path.
The RETURN then just assembles the results, not as a map but as one collection for the reachable nodes and one for the last values of movement_time.
The current issue is that we have duplicates since, for example, there are multiple paths from B to A.
As a general note: this problem can be solved more elegantly and with better performance using the Java traversal API (http://neo4j.com/docs/stable/tutorial-traversal.html). There you would have a PathExpander that skips paths with decreasing movement_time early, instead of collecting all paths and filtering afterwards (as the Cypher query does).
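The early-pruning idea can be sketched outside Neo4j in plain Python, assuming the CSV rows from the question are loaded as tuples. This is not the Neo4j traversal API; the function names are hypothetical. A relationship is followed only if its movement_time is strictly greater than the one that led to the current node, and each (node, time) pair is recorded once, which also removes the duplicates mentioned above.

```python
from collections import defaultdict

# The sample rows from the question: (from, to, movement_time).
EDGES = [
    ("A", "B", 0), ("B", "C", 1), ("B", "D", 1), ("B", "E", 1), ("B", "X", 2),
    ("E", "A", 3), ("Z", "B", 5), ("C", "X", 6), ("X", "A", 7), ("D", "A", 7),
]


def build_adjacency(edges):
    """Group outgoing (target, movement_time) pairs by source node."""
    adj = defaultdict(list)
    for src, dst, t in edges:
        adj[src].append((dst, t))
    return adj


def time_respecting_descendants(adj, start):
    """All (descendant, movement_time) pairs reachable with increasing times."""
    found = set()
    stack = [(start, float("-inf"))]
    while stack:
        node, arrived = stack.pop()
        for nxt, t in adj.get(node, []):
            # Prune early: skip relationships that do not move forward in time.
            if t > arrived and (nxt, t) not in found:
                found.add((nxt, t))
                stack.append((nxt, t))
    return sorted(found)


adj = build_adjacency(EDGES)
for name in sorted(adj):
    print(name, time_respecting_descendants(adj, name))
```

The sketch reports every distinct arrival time per descendant, so a node such as B lists A at both 3 and 7. Reducing that to a single time per descendant would need an extra rule that the question does not spell out.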

Depth First Search

Perform Depth-first Search on the graph shown starting with vertex a. When you traverse the neighbours, process them in alphabetical order.
The question is to find the DFI, Level and the Parent of each vertex.
Here is a picture of it:
I'm unsure how to get going with this; it is a practice question for an upcoming exam. I know that depth-first search uses a stack, and that it will start at vertex a and push neighbours in alphabetical order, but I'm not sure how I would get the values for each of the columns. Can someone explain further or help me with this?
You start at 'a' and must traverse the nodes in alphabetical order. From a you have the option of going to b or g, so you choose b because it comes first alphabetically. From b your only choice is g, and so on.
Now for your values. The parent of a is null since there is no previous node, the parent of b is a, the parent of g is b, and so on.
The DFS level is the level the node would end up on in a tree. Imagine that you do your traversal and then erase all lines that weren't part of the traversal. Then you take your root and 'shake it out': you rearrange it so that it looks like a tree (this particular graph is very uninteresting), and then you assign levels based on that tree.
The DFS index is simply the order in which you touched the nodes.
The following are for your graph but using g as the starting point. I think it makes it slightly more interesting.
The numbers are the order in which the edges were taken.
Here is what I was talking about when I said 'shake it out'. This is what your tree looks like, and in blue I show the level of each node (0-based). I hope the images make it a little more understandable.
The one I drew (the terrible freehand one) was formed by deleting all of the edges that weren't used and then rearranging the rest to look like a tree.
You can think of the depth as how many steps you had to take from the root to get to the current node. From g to b is 1 step, so b has a depth of 1. From g to i the depth is 3, because we go g->c->d->i, which is 3 steps. After you have made your traversal, you ignore the fact that you can get from g to i in two steps (g->h->i), because that route wasn't part of the traversal.
The index is simply the number in order that the node is visited. a is first, write 1 there. Knowing depth first search as you do, you should know what the second node is; so write 2 under that. Depth is how high a node is; every time you deepen the depth, it increases, and whenever you go shallower, it's less. So a is on depth 1; the next node and its sister will be on depth 2, etc. The parent is the letter identifying the node that you just came from; so a has no parent, and the node with index 2 will have a as parent.
If your class uses a zero-based numbering system, replace 2 in the above paragraph with 1, and 1 with 0. If you have no idea what "zero-based numbering system" is, ignore this paragraph.
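The bookkeeping described above can be sketched in Python. The exam graph is only available as an image, so DFS_GRAPH below is a small made-up graph, and dfs_table and its base parameter are hypothetical names. With base=1 the index and level both start at 1, as in the last answer; pass base=0 for zero-based numbering.

```python
# Hypothetical undirected graph (NOT the one from the exam question).
DFS_GRAPH = {
    "a": ["b", "d", "g"],
    "b": ["a", "g"],
    "c": ["g"],
    "d": ["a"],
    "g": ["a", "b", "c"],
}


def dfs_table(graph, start, base=1):
    """Return {vertex: {"index", "level", "parent"}} for a DFS from `start`."""
    table = {}

    def visit(node, parent, level):
        # The index is the order in which vertices are first touched.
        table[node] = {"index": len(table) + base, "level": level, "parent": parent}
        for nxt in sorted(graph[node]):  # neighbours in alphabetical order
            if nxt not in table:
                visit(nxt, node, level + 1)

    visit(start, None, base)
    return table


for vertex, row in dfs_table(DFS_GRAPH, "a").items():
    print(vertex, row)
```

In this sample, g is visited from b rather than directly from a, so its parent is b and its level is one deeper than b's, even though a and g are adjacent.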
