Gremlin: simultaneously limit traversal iterations and search for an edge property

I have a (broken) piece of Gremlin code to generate the shortest path from a given vertex to a vertex with an incident edge that has the property test_property. If that property is not found on an edge, no paths should be returned.
s.V(377524408).repeat(bothE().has('date', between(1554076800, 1556668800)).otherV()) /* date filter on edges */
.until(or(__.bothE().has('test_property', gt(0)),
loops().is(4))) /* broken logic! */
.path()
.local(unfold().filter(__.has('entity_id')).fold()) /* remove edges from output paths*/
The line that's broken is .until(or(__.bothE().has('test_property', gt(0)), loops().is(4))).
At present it returns all paths that are 4 hops from the starting vertex, and it makes sense why: loops().is(4) ends the repeat whether or not test_property was found.
I'm trying to adapt it so that if the traversal reaches 4 iterations and test_property has not been found, no paths are returned. If test_property is found, it should return only the path(s) to that vertex.
I've tried adding a times(4) constraint and removing the loops() condition, but I don't know how to combine times(4) with the .has('test_property', gt(0)) constraint.

Daniel's answer has a few issues (see comments).
This query returns the correct result:
g.V(377524408)
.repeat(bothE().has('date', between(1554076800, 1556668800)).otherV().simplePath().as("v"))
.until(or(bothE().has('test_property', gt(0)), loops().is(4)))
.filter(bothE().has('test_property', gt(0)))
.select(all, "v")
.limit(1)
The simplePath() is required so the traversal doesn't go back and forth or walk in circles.
The repeat loop stops as soon as the property condition is met OR the maximum of 4 hops is reached. The filter() then drops the traversers that stopped only because of the hop limit, so nothing is returned when test_property is not found within 4 hops.
limit(1) returns only the first (shortest) path. Omit it to get all paths.
Note that if the graph is directed, it is better to use outE() instead of bothE().
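To make the intended semantics concrete, here is a minimal plain-Python model of the same search. It is not Gremlin, and the edge-list format and function name are assumptions for illustration. It runs a breadth-first search that only follows edges whose date lies in [lo, hi). This mirrors between(), which includes the lower bound and excludes the upper one. The search stops after max_hops and returns the shortest path to a vertex that has any incident edge with test_property > 0, or None.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical edge format: (vertex_a, vertex_b, properties); treated as undirected, like bothE().
Edge = Tuple[int, int, Dict[str, int]]


def shortest_path_to_property(
    edges: List[Edge], start: int, lo: int, hi: int, max_hops: int = 4
) -> Optional[List[int]]:
    adj: Dict[int, List[int]] = {}
    flagged = set()  # vertices with at least one incident edge where test_property > 0
    for a, b, props in edges:
        if props.get("test_property", 0) > 0:
            flagged.update((a, b))
        date = props.get("date")
        # Like between(lo, hi): lower bound inclusive, upper bound exclusive.
        if date is not None and lo <= date < hi:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)

    # The start vertex is never tested itself, as with repeat(...).until(...).
    parent: Dict[int, Optional[int]] = {start: None}
    queue = deque([(start, 0)])
    while queue:
        vertex, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop limit reached on this branch, like loops().is(4)
        for nxt in adj.get(vertex, []):
            if nxt in parent:
                continue  # visited set; for shortest paths this plays the role of simplePath()
            parent[nxt] = vertex
            if nxt in flagged:
                # Follow parent links back to the start to rebuild the path.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append((nxt, hops + 1))
    return None  # nothing within max_hops, so no path is returned
```

Because BFS explores level by level, the first flagged vertex found is at minimal hop distance, which corresponds to what limit(1) is meant to pick.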

This should work:
s.V(377524408).
repeat(bothE().has('date', between(1554076800, 1556668800)).otherV().as('v')).
times(4).
filter(bothE().has('test_property', gt(0))).
select(all, 'v')
Also note that I replaced your local(unfold().filter(__.has('entity_id')).fold()) with something much simpler (assuming that its sole purpose was to remove the edges from the path).
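Note that times(4) followed by filter() only tests the vertex reached after exactly four hops. Without simplePath(), a walk can also double back on itself. The hypothetical Python model below (a plain adjacency dict, not Gremlin) makes both effects visible.

```python
from typing import Dict, List, Set


def times_then_filter(adj: Dict[int, List[int]], flagged: Set[int], start: int, k: int) -> List[List[int]]:
    """Model repeat(...).times(k).filter(...): only walks of exactly k hops are kept."""
    walks = [[start]]
    for _ in range(k):
        # Each iteration extends every walk by one hop; nothing prevents revisiting vertices.
        walks = [w + [n] for w in walks for n in adj.get(w[-1], [])]
    # The filter only sees the vertex reached after exactly k hops.
    return [w for w in walks if w[-1] in flagged]
```

For example, with adj = {1: [2], 2: [1, 3], 3: [2]} and k = 2, a flagged vertex 2 that is only one hop away is never returned. A flagged start vertex 1 is returned via the back-and-forth walk [1, 2, 1].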

Related

Repeat in gremlin

Two queries related to gremlin are as follows:
I want to stop the traversal when a condition is satisfied during the repeated condition check.
g.V().has('label_','A').emit().repeat(inE().outV()).until(has('stop',1)).project('depth','values').by(valueMap('label_','stop'))
I want the query to stop returning further values once stop equals 1 for a node encountered during the repeat step. But the query doesn't stop and returns all the records.
Output required:
=>{label_='A',stop=0}
=>{label_='B',stop=0}
=>{label_='C',stop=1}
A query to return traversal values in the following format, for every pair of vertices connected by an edge. Consider the graph A->E1->B->E2->C. The output must be as follows:
=> A,E1,B
=> B,E2,C
A, B, C, E1 and E2 represent the respective properties, where A is the starting node.
For the first part, it seems you are traversing the in edges rather than the out edges. Is this on purpose? If so, replace out() in the repeat with in().
g.V().has(label, 'A').emit().
repeat(out()).until(has('stop', 1)).
project('label', 'stop').
by(label).
by(values('stop'))
example: https://gremlify.com/ma2xkkszkzr/1
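The emit()/until() combination can be modelled in plain Python as follows. The dict-based graph and the function name are hypothetical. The sketch assumes no cycle is reachable before a stop vertex; otherwise, like the Gremlin query, it would never terminate.

```python
from typing import Dict, List, Tuple


def emit_repeat_until(out_adj: Dict[str, List[str]], stop: Dict[str, int], start: str) -> List[Tuple[str, int]]:
    results = [start]  # emit() placed before repeat() also emits the start vertex
    frontier = [start]
    while frontier:
        next_frontier = []
        for v in frontier:
            for w in out_adj.get(v, []):
                results.append(w)  # every vertex reached by out() is emitted
                if stop.get(w) != 1:
                    next_frontier.append(w)  # until(has('stop', 1)) ends this branch here
        frontier = next_frontier
    # Equivalent of project('label', 'stop').by(label).by(values('stop')).
    return [(v, stop.get(v, 0)) for v in results]
```

On A->B->C->D with stop = 1 only on C, this yields A, B and C, matching the output the question asks for.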
For the second part, I'm still not sure what you meant. If you just want to get all edges with their out and in vertices, you can use elementMap():
g.E().elementMap()
example: https://gremlify.com/ma2xkkszkzr/4
If elementMap() is not supported, you can do something like this:
g.E().local(union(
outV(),
identity(),
inV()
).label().fold())
example: https://gremlify.com/ma2xkkszkzr/2

How do I produce output even when there is no edge and when using select for projection

Can someone please help me with this simple query? Many thanks in advance.
I am using the following Gremlin query, and it works well. It gives me the original vertex (v) (with id 12345), its edges (e) and the child vertex (id property). However, if the original vertex 'v' (with id 12345) has no outgoing edges, the query returns nothing. I still want the properties of the original vertex ('v') even if it has no outgoing edges or children. How can I do that?
g.V().has('id', '12345').as('v').
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id)
There are a couple of things going on here, but the major update you need to make to the traversal is to use a project() step instead of a select().
select() and project() steps are similar in that they both allow you to format the results of a traversal. However, they differ in (at least) one significant way. select() steps let you access previously traversed and labeled elements (via as()). project() steps let you take the current traverser and branch it to manipulate the output moving forward.
In your original traversal, when the original v has no outgoing edges, all the traversers are filtered out during the outE() step. Since no traversers remain after the outE() step, the remainder of the traversal has no input stream, so there is no data to return. If you use a project() step after the original v, you can return the original traverser as well as the edges and incident vertex. This does lead to a slight complication when no out edges exist. Gremlin does not handle null values (such as the result for non-existent out edges), so you need to return some constant value in these cases using a coalesce() step.
Here is a functioning version of this traversal:
g.V().hasId(3).
project('v', 'e', 'child_v').
by(valueMap()).
by(coalesce(outE().id(), constant(''))).
by(coalesce(out().id(), constant('')))
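The difference in result shape can be modelled in plain Python. The dict data and function names are hypothetical, and the sketch ignores Gremlin's actual output types, such as valueMap() returning lists. One detail worth knowing is that each by() keeps only the first result of its child traversal. With several out edges, this project() version therefore reports just one of them.

```python
from typing import Dict, List, Tuple

Props = Dict[str, str]
OutEdges = Dict[str, List[Tuple[str, str]]]  # hypothetical: vertex id -> [(edge id, child id), ...]


def select_rows(vertices: Dict[str, Props], out_edges: OutEdges, v: str) -> List[dict]:
    # as('v').outE().as('e').inV().as('child_v').select(...): one row per outgoing edge,
    # so a vertex without out edges leaves no traverser and produces no rows.
    return [{"v": vertices[v], "e": e, "child_v": c} for e, c in out_edges.get(v, [])]


def project_row(vertices: Dict[str, Props], out_edges: OutEdges, v: str) -> dict:
    # project('v', 'e', 'child_v'): always exactly one row per start vertex.
    # by() keeps the first result; coalesce(..., constant('')) supplies '' when there is none.
    edges = out_edges.get(v, [])
    first_edge, first_child = edges[0] if edges else ("", "")
    return {"v": vertices[v], "e": first_edge, "child_v": first_child}
```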
Currently you will get a lot of duplicate data. Your original query returns the vertex properties once per outgoing edge (E times). It is probably better to use project():
g.V('12345').project('v', 'children').
by(valueMap()).
by(outE().as('e').
inV().as('child').
select('e', 'child').by(id).fold())
example: https://gremlify.com/a1
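The fold()-based version can be modelled the same way (hypothetical dict data and function name). fold() collects every (edge, child) pair into one list, and folding an empty stream yields an empty list. A vertex without out edges therefore still gets a row, and no coalesce() fallback is needed.

```python
from typing import Dict, List, Tuple

Props = Dict[str, str]
OutEdges = Dict[str, List[Tuple[str, str]]]  # hypothetical: vertex id -> [(edge id, child id), ...]


def project_children(vertices: Dict[str, Props], out_edges: OutEdges, v: str) -> dict:
    # by(outE().as('e').inV().as('child').select('e', 'child').by(id).fold())
    children = [{"e": e, "child": c} for e, c in out_edges.get(v, [])]
    return {"v": vertices[v], "children": children}
```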
You can get the original data format if you do something like this:
g.V('12345').as('v').
coalesce(
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id),
project('v').by(valueMap())
)
example: https://gremlify.com/a2
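The coalesce() variant keeps the original one-row-per-edge format and only falls back to a vertex-only row when the first branch produces nothing. A plain-Python model with hypothetical data and function name:

```python
from typing import Dict, List, Tuple

Props = Dict[str, str]
OutEdges = Dict[str, List[Tuple[str, str]]]  # hypothetical: vertex id -> [(edge id, child id), ...]


def coalesce_rows(vertices: Dict[str, Props], out_edges: OutEdges, v: str) -> List[dict]:
    # First branch: the original select('v', 'e', 'child_v') rows, one per outgoing edge.
    rows = [{"v": vertices[v], "e": e, "child_v": c} for e, c in out_edges.get(v, [])]
    # coalesce() uses project('v').by(valueMap()) only if the first branch yielded nothing.
    return rows if rows else [{"v": vertices[v]}]
```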

Gremlin strategy to find vertex peers sharing exactly the same neighbourhood

Using AWS Neptune, I need to find a traversal strategy that takes one reference vertex and, by traversing along one edge type, finds other vertices that have exactly the same neighbors, i.e. no more and no less.
g.V('1').as('ref_vertex').out('created').as('creations').in('created')
This finds vertices that also created the same things as "1", but it also includes vertices (a) that created something else too, and (b) that did not create everything that "1" created.
g.V('1').as('ref_vertex')
.out('created').as('creations').in('created')
.not(out('created').where(neq('creations')))
This helps only with problem (a), getting rid of persons who created something extra.
How do I continue this query to also exclude the (b) vertices from the result?
g.V('1').aggregate('ref_vertex').
out('created').
sideEffect(aggregate('neighbors')). /* get neighbors of 'ref_vertex' */
in('created').groupCount().unfold(). /* group count in('created') by its occurrence times */
as('candidate_shared_neighbor_cnt_pair').
where('candidate_shared_neighbor_cnt_pair', eq('neighbors')). /* select only the vertices have the same occurrence times as 'ref_vertex' */
by(select(values)).
by(unfold().count()).
select(keys).
where(without('ref_vertex'))
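To check the set logic, here is a small plain-Python model with hypothetical data (person mapped to a set of creations) and hypothetical function names. As far as I can tell, the groupCount() idea requires the number of shared creations to equal the reference vertex's count. That rules out (b) but still admits (a). Comparing whole sets rules out both, so the answer's query may still need the not(...) filter from the question.

```python
from collections import defaultdict
from typing import Dict, Set

Created = Dict[str, Set[str]]  # hypothetical: person -> set of things they created


def candidates(created: Created, ref: str) -> Set[str]:
    # Like out('created').in('created'): everyone sharing at least one creation with ref, minus ref.
    creators_of: Dict[str, Set[str]] = defaultdict(set)
    for person, things in created.items():
        for thing in things:
            creators_of[thing].add(person)
    found = set().union(*(creators_of[t] for t in created.get(ref, set())))
    found.discard(ref)
    return found


def shared_count_peers(created: Created, ref: str) -> Set[str]:
    # groupCount()-style check: shares every creation of ref, but may have created more (a).
    ref_set = created.get(ref, set())
    return {p for p in candidates(created, ref) if len(created[p] & ref_set) == len(ref_set)}


def exact_peers(created: Created, ref: str) -> Set[str]:
    # Set equality excludes both extra creations (a) and missing creations (b).
    ref_set = created.get(ref, set())
    return {p for p in candidates(created, ref) if created[p] == ref_set}
```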

Gremlin: Add edges to multiple vertices

I have vertices [song1, song2, song3, user].
I want to add edges listened from user to the songs.
I have the following:
g.V().is(within(song1, song2, song3)).addE('listened').from(user)
However I'm getting the following error:
No signature of method: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.from() is applicable for argument types: (org.janusgraph.graphdb.vertices.CacheVertex) values: [v[4344]]
Possible solutions: sort(), drop(int), sum(), find(), grep(), sort(groovy.lang.Closure)
Of course, I could iterate through them one at a time instead, but a single query would be nice:
user.addEdge('listened', song1)
user.addEdge('listened', song2)
user.addEdge('listened', song3)
The from() modulator accepts two things:
a step label or
a traversal
A single vertex or a list of vertices can easily be turned into a traversal by wrapping it in V(). Also, note that g.V().is(within(...)) will most likely end up being a full scan over all vertices; it pretty much depends on the provider implementation, but you should prefer to use g.V(<list of vertices>) instead. Thus your traversal should look more like any of these:
g.V().is(within(song1, song2, song3)).
addE('listened').from(V(user)) // actually bad, as it's potentially a full scan
g.V(song1, song2, song3).
addE('listened').from(V(user))
g.V(user).as('u').
V(song1, song2, song3).
addE('listened').from('u')

Time Complexity of Adding Edge to Graph using Adjacency List

I've been studying up on graphs using the Adjacency List implementation, and I am reading that adding an edge is an O(1) operation.
This makes sense if you are just tacking an edge on to the Vertex's Linked List of edges, but I don't understand how this could be the case so long as you care about removing the old edge if one already exists. Finding that edge would take O(V) time.
If you don't do this, and you add an edge that already exists, you would have duplicate entries for that edge, which means they could have different weights, etc.
Can anyone explain what I seem to be missing? Thanks!
Your complexity analysis is right: finding whether an edge already exists is truly O(V) (more precisely, proportional to the length of that vertex's edge list). But notice that adding the edge, even if it already exists, is still O(1).
You need to remember that two edges with the same source and destination are valid input for a graph, even with different weights (or rather, precisely because they can have different weights).
That is why adding an edge to an adjacency-list graph is O(1).
What people usually do to get both expected O(1) edge lookup and the other advantages of adjacency lists is to use an array of hash sets instead of an array of lists.
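A minimal Python sketch of the hash-based variant follows. It assumes a directed graph and that re-adding an edge should replace the old weight, which is one possible policy; a multigraph would keep both. Instead of an array of hash sets, it maps each vertex to a dict of neighbour to weight, the weighted analogue of a hash set. The class name is hypothetical.

```python
from typing import Dict, Hashable


class HashAdjacencyGraph:
    """Directed graph stored as vertex -> {neighbour: weight}."""

    def __init__(self) -> None:
        self.adj: Dict[Hashable, Dict[Hashable, float]] = {}

    def add_edge(self, u: Hashable, v: Hashable, weight: float = 1.0) -> None:
        # Expected O(1): re-adding (u, v) overwrites the weight instead of creating a duplicate entry.
        self.adj.setdefault(u, {})[v] = weight
        self.adj.setdefault(v, {})  # register v as a vertex even if it has no out edges

    def has_edge(self, u: Hashable, v: Hashable) -> bool:
        # Expected O(1) membership test instead of scanning a list.
        return v in self.adj.get(u, {})
```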
Alternatively:
If you want a worst-case optimal solution, use RadixSort to order the list of all edges in O(v+e) time, remove duplicates, and then build the adjacency list representation in the usual way.
source: https://www.quora.com/What-are-the-various-approaches-you-can-use-to-build-adjacency-list-representation-of-a-undirected-graph-having-time-complexity-better-than-O-V-*-E-and-avoiding-duplicate-edges
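The quoted radix-sort approach can be sketched in Python as below. The sketch assumes vertices are numbered 0..n-1 and edges are directed (u, v) pairs; for an undirected graph, put both (u, v) and (v, u) in the input. The function names are hypothetical. Two stable bucket (counting) sort passes, first on v and then on u, form an LSD radix sort with base n. That takes O(V + E), after which duplicates sit next to each other.

```python
from typing import List, Optional, Tuple

Edge = Tuple[int, int]


def bucket_sort(edges: List[Edge], key_index: int, n: int) -> List[Edge]:
    # Stable counting/bucket sort on one coordinate: O(n + len(edges)).
    buckets: List[List[Edge]] = [[] for _ in range(n)]
    for edge in edges:
        buckets[edge[key_index]].append(edge)
    return [edge for bucket in buckets for edge in bucket]


def build_adjacency_without_duplicates(n: int, edges: List[Edge]) -> List[List[int]]:
    # LSD radix sort: sort by destination first, then stably by source.
    ordered = bucket_sort(bucket_sort(edges, 1, n), 0, n)
    adj: List[List[int]] = [[] for _ in range(n)]
    previous: Optional[Edge] = None
    for u, v in ordered:
        if (u, v) != previous:  # duplicates are adjacent now, so one comparison drops them
            adj[u].append(v)
        previous = (u, v)
    return adj
```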
