gremlin query using select() in coalesce() step - gremlin

I'm trying to access the previously saved traversal using .as() in the second traversal inside coalesce() as below:
query to upsert edge (update edge if present / create)
Java code:
g.V('x').as('start')
.V('y').as('stop')
.inE('label').where(outV().as('edge'))
.select('start','stop','edge').fold()
.coalesce(unfold(),
addE('label').from(select('start')).to(select('stop')))
.property('key','value')
.promise(Traversal::Next);
Throws error as below: (precised for brevity)
gremlin.driver.exception.ResponseException: The provided traverser does not map to a value [stop]
when i replace the last step as below its working fine (instead of alias querying the vertices again)
Replaced addE('label').from(select('start')).to(select('stop'))
with addE('label').from(V('x')).to(V('y'))
Is there anyway to refer the alias in the second traversal in coalesce?
Note: I'm collecting all data related to finding edges before coalesce in order to make the gremlin throw error when any of the vertex / vertices are missing while creating edge
Expected behaviour: True on successful transaction and error when any vertex missing while creating edge.
This works as expected without using as() alias. But, i'm trying with as(). which i couldn't make it.
Hope this is clear. Please comment if in need of more info. Thanks.

The reason you cannot select the labels 'start' and 'stop' is that you used fold() after defining them. fold() is a reducing barrier step that causes all the labels defined before it to be lost.
Before I explain the solution, here is the traversal to add the two test vertices.
g.addV().property(id, 'x').
addV().property(id, 'y')
The following traversal returns the string 'error' if any of the vertices 'x' or 'y' is missing. If both vertices exist, it upserts the edge (updates the edge if present or adds it if not present).
g.inject(1).
optional(V('x').as('start')).
choose(
select('start'),
optional(V('y').as('stop')).
choose(
select('stop'),
coalesce(
select('start').outE('label').as('e').inV().where(eq('stop')).select('e'),
addE('label').from('start').to('stop')).
property('key', 'value'),
constant('error')),
constant('error'))

Related

Gremlin: Unexpected result when using coalesce with group().by()

I'm trying to group vertices and for each, get the sum of a property value.
I'd like to get 0 if the value doesn't exist so I need to add a coalesce in the following query (from Modern graph):
g.V().group().by(__.label()).by(__.out("knows").values("age").sum())
==>[person:59]
But with the coalesce step, the result is not what I expect ([software:0,person:59]):
g.V().group().by(__.label()).by(__.coalesce(__.out("knows").values("age").sum(), __.constant(0)))
==>[software:0,person:0]
What am I missing ?
The problem is that you are grouping by label and a person has both knows and created edges. So The person is matching with the constant(0) first due to the created edges.
gremlin> g.V().outE().inV().path().by(label)
==>[person,created,software]
==>[person,knows,person]
==>[person,knows,person]
==>[person,created,software]
==>[person,created,software]
==>[person,created,software]

Gremlin select multiple vertices gives an output without the properties with null values

In order to get all data from two vertices a and b i used the following
g.V('xxx').out('hasA')..as('X').out('hasB').as('Y').select('X','Y').
I get values of X where the value of Y isnt null.I wanted to get all X where the value of Y can be or may not be null.
Any ideas as to how i can tweak the above query?
I'm not sure that this matters to you any more but to directly answer your question you need to deal with the chance that there are no "hasB" edges. You might do that with coalesce() in the following fashion:
g.V('xxx').out('hasA').as('X').
coalesce(out('hasB'),constant('n/a')).as('Y').
select('X','Y')

How, with Gremlin, to return properties from in-vertices the same as I do from out-vertices? (Not as arrays)

I'm trying to start traversing from one set of labelled vertices, then get all their in-vertices connected by a particular kind of edge, then from there, return a property of those in-vertices as objects. I can do this same thing with some out-vertices starting from the same set of labelled vertices with no problem, but get a "The provided traverser does not map to a value:" error when I attempt it with some in-vertices.
I have found a workaround, but it is not ideal, as it returns the desired property values as arrays of length one.
Here is how I do the very similar task successfully with out-vertices:
g.V().hasLabel('TestCenter').project('address').by(out('physical').project('street').by(values('street1')))
This returns things like
==>{address={street=561 PLACE DE CEDARE}}
==>{address={street=370 N BLACK STATION AVE}}
This is great!
Then I try the same sort of query with some in-vertices, like this:
g.V().hasLabel('TestCenter').project('host').by(__.in('hosts').project('aCode').by(values('code')))
and get the above mentioned error.
The workaround I've been able to find is to add a .fold() to the final "by" like this:
g.V().hasLabel('TestCenter').project('host').by(__.in('hosts').project('aCode').by(values('code')).fold())
but then my responses are like this
==>{host=[{aCode=7387}]}
==>{host=[{aCode=9160}]}
What I would like is a response looking like this:
==>{host={aCode=4325}}
==>{host={aCode=1234}}
(Note: I am not sure if this is relevant, but I am connecting Gremlin to a Neptune DB Instance)
It seems to me from the error above and your workaround that not all of your 'TestCenter' have an in edge from type 'hosts'. When using project the by have to map for a valid value.
you can do two things:
1) make sure a value will be returned in the project:
g.V().hasLabel('TestCenter').project('host')
.by(coalesce(__.in('hosts').project('aCode').by(values('code')), constant('empty')))
2) filter does values:
g.V().hasLabel('TestCenter').where(__.in('hosts'))
.project('host').by(__.in('hosts').project('aCode').by(values('code')))

How does gremlin step .dedup('from' , 'to'') works?

Gremlin step .dedup('from' , 'to') remove elements with the same from AND to values or with the same from OR to values?
I need AND so I made that in this way:
.select('from' , 'to').as('hash').dedup('hash')
The strings provided to dedup() refer to as() labels used earlier in the traversal. There is an example here. http://tinkerpop.apache.org/docs/current/reference/#dedup-step
It's not really 'from' , 'to' as much as parts of the path history. As the example shows you can think of it as being an AND where if the same path segment appears more than once dedup('a','b') will remove it.

Smart way to generate edges in Neo4J for big graphs

I want to generate a graph from a csv file. The rows are the vertices and the columns the attributes. I want to generate the edges by similarity on the vertices (not necessarily with weights) in a way, that when two vertices have the same value of some attribute, an edge between those two will have the same attribute with value 1 or true.
The simplest cypher query that occurs to me looks somewhat like this:
Match (a:LABEL), (b:LABEL)
WHERE a.attr = b.attr
CREATE (a)-[r:SIMILAR {attr : 1}]->(b)
The graph has about 148000 vertices and the Java Heap Sizeoption is: dynamically calculated based on available system resources.
The query I posted gives a Neo.DatabaseError.General.UnknownFailure with a hint to Java Heap Space above.
A problem I could think of, is that a huge cartesian product is build first to then look for matches to create edges. Is there a smarter, maybe a consecutive way to do that?
I think you need a little change model: no need to connect every node to each other by the value of a particular attribute. It is better to have a an intermediate node to which you will bind the nodes with the same value attribute.
This can be done at the export time or later.
For example:
Match (A:LABEL) Where A.attr Is Not Null
Merge (S:Similar {propName: 'attr', propValue: A.attr})
Merge (A)-[r:Similar]->(S)
Later with separate query you can remove similar node with only one connection (no other nodes with an equal value of this attribute):
Match (S:Similar)<-[r]-()
With S, count(r) As r Where r=1
Detach Delete S
If you need connect by all props, you can use next query:
Match (A:LABEL) Where A.attr Is Not Null
With A, Keys(A) As keys
Unwind keys as key
Merge (S:Similar {propName: key, propValue: A[key]})
Merge (A)-[:Similar]->(S)
You're right that a huuuge cartesian product will be produced.
You can iterate the a nodes in batches of 1000 for eg and run the query by incrementing the SKIP value on every iteration until it returns 0.
MATCH (a:Label)
WITH a LIMIT SKIP 0 LIMIT 1000
MATCH (b:Label)
WHERE b.attr = a.attr AND id(b) > id(a)
CREATE (a)-[:SIMILAR_TO {attr: 1}]->(b)
RETURN count(*) as c

Resources