Gremlin: Add edges to multiple vertices - gremlin

I have vertices [song1, song2, song3, user].
I want to add edges listened from user to the songs.
I have the following:
g.V().is(within(song1, song2, song3)).addE('listened').from(user)
However I'm getting the following error:
No signature of method: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.from() is applicable for argument types: (org.janusgraph.graphdb.vertices.CacheVertex) values: [v[4344]]
Possible solutions: sort(), drop(int), sum(), find(), grep(), sort(groovy.lang.Closure)
Of course, I can iterate through them one at a time instead but a single query would be nice:
user.addEdge('listened', song1)
user.addEdge('listened', song2)
user.addEdge('listened', song3)

The from() modulator accepts two things:
a step label or
a traversal
A single vertex or a list of vertices can easily be turned into a traversal by wrapping it in V(). Also, note that g.V().is(within(...)) will most likely end up being a full scan over all vertices; it pretty much depends on the provider implementation, but you should prefer to use g.V(<list of vertices>) instead. Thus your traversal should look more like any of these:
g.V().is(within(song1, song2, song3)).
addE('listened').from(V(user)) // actually bad, as it's potentially a full scan
g.V(song1, song2, song3).
addE('listened').from(V(user))
g.V(user).as('u').
V(within(song1, song2, song3)).
addE('listened').from('u')

Related

Gremlin: limit by vertex label

Hello dear gremlin jedi,
I have a bunch of nodes with different labels in my graph:
g.addV('book')
.addV('book')
.addV('book')
.addV('movie')
.addV('movie')
.addV('movie')
.addV('album')
.addV('album')
.addV('album').iterate()
There also may be vertices with other labels.
and a hash map describing what labels and how many vertices of each label I want to get:
LIMITS = {
"book": 2,
"movie": 2,
"album": 2,
}
I'd like to write a query that returns a list of vertices consisting of vertices with specified labels whete amount of vertices with each label is limited in according to the LIMITS hash map. In this case there should be 2 books, 2 movies and 2 albums in the result.
The limits and requested labels are calculated independently for every query so they cannot be hardcoded.
As far as I can see the limit step does not support passing traversals as an argument.
What trick can I use to write such query? The only option I see is to build the query using capabilities of the client side programming language (Ruby with grumlin as a gremlin client in my case):
nodes = LIMITS.map do |label, limit|
__.hasLabel(label).limit(limit)
end
g.V().union(*nodes).toList
But I believe there is a better solution.
Thank you!
The most direct way would be to use group() I think:
gremlin> g.V().group().by(label)
==>[software:[v[3],v[5]],person:[v[1],v[2],v[4],v[6]]]
gremlin> g.V().group().by(label).by(unfold().limit(2).fold())
==>[software:[v[3],v[5]],person:[v[1],v[2]]]
You can filter the vertices going to group() with hasLabel() if you need those sorts of restrictions. Depending upon how you use this, the traversal could be expensive in the sense that you have to traverse a fair bit of data to filter away all but two (in this case) vertices. If that is a concern, your approach to dynamically construct the traversal and the piecing it together with union() doesn't seem so bad. While I could probably think up a way to write that in just Gremlin, it probably wouldn't not be as readable as your approach.

How do I produce output even when there is no edge and when using select for projection

Can someone help me please with this simple query...Many thanks in advance...
I am using the following gremlin query and it works well giving me the original vertex (v) (with id-=12345), its edges (e) and the child vertex (id property). However, say if the original vertex 'v' (with id-12345) has no outgoing edges, the query returns nothing. I still want the properties of the original vertex ('v') even if it has no outgoing edges and a child. How can I do that?
g.V().has('id', '12345').as('v').
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id)
There are a couple of things going on here but the major update you need to the traversal is to use a project() step instead of a select().
select() and project() steps are similar in that they both allow you to format the results of a traversal however they differ in (at least) one significant way. select() steps function by allowing you to access previously traversed and labeled elements (via as). project() steps allow you take the current traverser and branch it to manipulate the output moving forward.
In your original traversal, when there are no outgoing edges from original v so all the traversers are filtered out during the outE() step. Since there are no further traversers after the outE() step then remainder of the traversal has no input stream so there is no data to return. If you use a project() step after the original v you're able to return the original traverser as well as return the edges and incident vertex. This does lead to a slight complication when handling cases where no out edges exist. Gremlin does not handle null values, such as no out edges existing, you need to return some constant value for these statements using a coalesce statement.
Here is functioning version of this traversal:
g.V().hasId(3).
project('v', 'e', 'child_v').
by(valueMap()).
by(coalesce(outE().id(), constant(''))).
by(coalesce(out().id(), constant('')))
Currently you will get a lot of duplicate data, in the above query you will get the vertex properties E times. probably will be better to use project:
g.V('12345').project('v', 'children').
by(valueMap()).
by(outE().as('e').
inV().as('child').
select('e', 'child').by(id).fold())
example: https://gremlify.com/a1
You can get the original data format if you do something like this:
g.V('12345').as('v').
coalesce(
outE().as('e').
inV().
as('child_v')
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id),
project('v').by(valueMap())
)
example: https://gremlify.com/a2

Gremlin - simultaneously limit traverse iterations and search for edge property

I have a (broken) piece of gremlin code to generate the shortest path from a given vertex to one which has the parameter test_parameter. If that parameter is not found on an edge, no paths should be returned.
s.V(377524408).repeat(bothE().has('date', between(1554076800, 1556668800)).otherV()) /* date filter on edges */
.until(or(__.bothE().has('test_property', gt(0)),
loops().is(4))) /* broken logic! */
.path()
.local(unfold().filter(__.has('entity_id')).fold()) /* remove edges from output paths*/
The line that's broken is .until(or(__.outE().has('test_property', gt(0)), loops().is(4))).
At present - and it makes sense as to why - it gives all paths that are 4 hops from the starting vertex.
I'm trying to adapt it so that if the traverse is at 4 iterations, and if the property test_property is not found, then it should not return any paths. If test_property is found, it should return only the path(s) to that vertex.
I've attempted to put a times(4) constraint in and removing the loops() condition, but don't know how to have both the times(4) this and the .has('test_property', gt(0)) constraint.
Daniel's answer has few issues (see comments).
This query returns the correct result:
g.V(377524408)
.repeat(bothE().has('date', between(1554076800, 1556668800)).otherV().simplePath().as("v"))
.until(and(bothE().has('tp', gt(0)), loops().is(lte(4))))
.select(all, "v")
.limit(1)
The simplePath() is required so we won't go back and forth and avoid circles.
The repeat loop is until the condition is met AND we have not reached max hop.
The limit(1) return only the first (shortest) path. Omit to get all paths.
Note that if the graph is directed it is better to use outE() and not bothE().
This should work:
s.V(377524408).
repeat(bothE().has('date', between(1554076800, 1556668800)).otherV().as('v')).
times(4).
filter(bothE().has('test_property', gt(0))).
select(all, 'v')
Also note, that I replaced your local(unfold().filter(__.has('entity_id')).fold()) with something much simpler (assuming that the sole purpose was the removal of edges from the path).

Traverse Graph With Directed Cycles using Relationship Properties as Filters

I have a Neo4j graph with directed cycles. I have had no issue finding all descendants of A assuming I don't care about loops using this Cypher query:
match (n:TEST{name:"A"})-[r:MOVEMENT*]->(m:TEST)
return n,m,last(r).movement_time
The relationships between my nodes have a timestamp property on them, movement_time. I've simulated that in my test data below using numbers that I've imported as floats. I would like to traverse the graph using the timestamp as a constraint. Only follow relationships that have a greater movement_time than the movement_time of the relationship that brought us to this node.
Here is the CSV sample data:
from,to,movement_time
A,B,0
B,C,1
B,D,1
B,E,1
B,X,2
E,A,3
Z,B,5
C,X,6
X,A,7
D,A,7
Here is what the graph looks like:
I would like to calculate the descendants of every node in the graph and include the timestamp from the last relationship using Cypher; so I'd like my output data to look something like this:
Node:[{Descendant,Movement Time},...]
A:[{B,0},{C,1},{D,1},{E,1},{X,2}]
B:[{C,1},{D,1},{E,1},{X,2},{A,7}]
C:[{X,6},{A,7}]
D:[{A,7}]
E:[{A,3}]
X:[{A,7}]
Z:[{B,5}]
This non-Neo4J implementation looks similar to what I'm trying to do: Cycle enumeration of a directed graph with multi edges
This one is not 100% what you want, but very close:
MATCH (n:TEST)-[r:MOVEMENT*]->(m:TEST)
WITH n, m, r, [x IN range(0,length(r)-2) |
(r[x+1]).movement_time - (r[x]).movement_time] AS deltas
WHERE ALL (x IN deltas WHERE x>0)
RETURN n, collect(m), collect(last(r).movement_time)
ORDER BY n.name
We basically find all the paths between any of your nodes (beware cartesian products get very expensive on non-trivial datasets). In the WITH we're building a collection delta's that holds the difference between two subsequent movement_time properties.
The WHERE applies an ALL predicate to filter out those having any non-positive value - aka we guarantee increasing values of movement_time along the path.
The RETURN then just assembles the results - but not as a map, instead one collection for the reachable nodes and the last value of movement_time.
The current issue is that we have duplicates since e.g. there are multiple paths from B to A.
As a general notice: this problem is much more elegantly and more performant solvable by using Java traversal API (http://neo4j.com/docs/stable/tutorial-traversal.html). Here you would have a PathExpander that skips paths with decreasing movement_time early instead of collection all and filter out (as Cypher does).

neo4j logic gate simulation, how to?

I would like to create a bunch of "and" and "or" and "not" gates in a directed graph.
And then traverse from the inputs to see what they results are.
I assume there is a ready made traversal that would do that but I don't see it.
I don't know what the name of such a traversal would be.
Certainly breadth first will not do the job.
I need to get ALL the leaves, and go up toward the root.
In other words
A = (B & (C & Z))
I need to resolve C # Z first.
I need to put this type of thing in a graph and to traverse up.
You would probably create each of the operations as a node which has N incoming and one outgoing connection. You can of course also have more complex operations encapsuled as a node.
With Neo4j 2.0 I would use Labels for the 3 types of operations.
I assume your leaves would then be boolean values? Actually I think you have many roots and just a single leaf (the result expression)
(input1)-->(:AND {id:1})-->(:OR {id:2})-->(output)
(input2)-->(:AND {id:1})
(input3)------------------>(:OR {id:2})
Then you can use CASE when for decisions on the label type and use the collection predicates (ALL, ANY) for the computation
See: http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html
Predicates: http://docs.neo4j.org/chunked/milestone/query-function.html
Labels: http://docs.neo4j.org/chunked/milestone/query-match.html#match-get-all-nodes-with-a-label

Resources