Gremlin: Unexpected result when using coalesce with group().by()


I'm trying to group vertices and, for each group, get the sum of a property value.
I'd like to get 0 if the value doesn't exist, so I need to add a coalesce() to the following query (from the Modern graph):
g.V().group().by(__.label()).by(__.out("knows").values("age").sum())
==>[person:59]
But with the coalesce step, the result is not what I expect ([software:0,person:59]):
g.V().group().by(__.label()).by(__.coalesce(__.out("knows").values("age").sum(), __.constant(0)))
==>[software:0,person:0]
What am I missing?

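The problem is that you are grouping by label, and not every person has outgoing knows edges: in the Modern graph only marko does, while the other persons have only created edges or no outgoing edges at all. coalesce() is evaluated for each vertex separately, so for those persons the first branch produces nothing and they match constant(0), which is the value you end up seeing for person.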
gremlin> g.V().outE().inV().path().by(label)
==>[person,created,software]
==>[person,knows,person]
==>[person,knows,person]
==>[person,created,software]
==>[person,created,software]
==>[person,created,software]

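To see the per-vertex behaviour, here is a small plain-Python model of the Modern graph. It is not Gremlin: the dictionaries and helper names are hypothetical stand-ins. It evaluates the coalesce() expression once per vertex and contrasts that with summing over each label group first, which is the [software:0, person:59] result the question expected.

```python
# Plain-Python model of TinkerPop's "modern" toy graph (not a Gremlin API).
VERTICES = {
    1: ("person", {"name": "marko", "age": 29}),
    2: ("person", {"name": "vadas", "age": 27}),
    3: ("software", {"name": "lop"}),
    4: ("person", {"name": "josh", "age": 32}),
    5: ("software", {"name": "ripple"}),
    6: ("person", {"name": "peter", "age": 35}),
}
EDGES = [  # (out-vertex, label, in-vertex)
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]

def knows_ages(v):
    """Ages of the vertices that v points to via 'knows' edges."""
    return [VERTICES[i][1]["age"] for o, lbl, i in EDGES if o == v and lbl == "knows"]

def coalesce_per_vertex(v):
    """Mimics coalesce(out('knows').values('age').sum(), constant(0)) for ONE vertex."""
    ages = knows_ages(v)
    return sum(ages) if ages else 0

def per_vertex_by_label():
    """One coalesce() result per vertex, collected under each label."""
    result = {}
    for v, (label, _) in VERTICES.items():
        result.setdefault(label, []).append(coalesce_per_vertex(v))
    return result

def summed_by_label():
    """The intended result: sum over the whole label group, defaulting to 0."""
    result = {}
    for v, (label, _) in VERTICES.items():
        result[label] = result.get(label, 0) + sum(knows_ages(v))
    return result

print(per_vertex_by_label())  # {'person': [59, 0, 0, 0], 'software': [0, 0]}
print(summed_by_label())      # {'person': 59, 'software': 0}
```

Only marko contributes 59; vadas, josh and peter each fall back to 0, which is why a per-vertex fallback and a per-group default are not the same thing.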
Related

gremlin query using select() in coalesce() step

I'm trying to access steps labeled earlier with .as() from the second traversal inside coalesce(), as below.
Query to upsert an edge (update the edge if present, otherwise create it).
Java code:
g.V('x').as('start')
  .V('y').as('stop')
  .inE('label').where(outV().as('edge'))
  .select('start','stop','edge').fold()
  .coalesce(unfold(),
            addE('label').from(select('start')).to(select('stop')))
  .property('key','value')
  .promise(Traversal::next);
It throws the following error (trimmed for brevity):
gremlin.driver.exception.ResponseException: The provided traverser does not map to a value [stop]
When I replace the last step as below (querying the vertices again instead of using the aliases), it works fine:
Replaced addE('label').from(select('start')).to(select('stop'))
with addE('label').from(V('x')).to(V('y'))
Is there any way to refer to the aliases in the second traversal of coalesce()?
Note: I collect all the data needed to find the edge before coalesce() so that the traversal throws an error when either vertex is missing while creating the edge.
Expected behaviour: true on a successful transaction, and an error when a vertex is missing while creating the edge.
This works as expected without the as() aliases, but I couldn't make it work with as().
Hope this is clear. Please comment if you need more info. Thanks.

The reason you cannot select the labels 'start' and 'stop' is that you used fold() after defining them. fold() is a reducing barrier step that causes all the labels defined before it to be lost.
Before I explain the solution, here is the traversal to add the two test vertices.
g.addV().property(id, 'x').
addV().property(id, 'y')
The following traversal returns the string 'error' if any of the vertices 'x' or 'y' is missing. If both vertices exist, it upserts the edge (updates the edge if present or adds it if not present).
g.inject(1).
  optional(V('x').as('start')).
  choose(
    select('start'),
    optional(V('y').as('stop')).
      choose(
        select('stop'),
        coalesce(
          select('start').outE('label').as('e').inV().where(eq('stop')).select('e'),
          addE('label').from('start').to('stop')).
        property('key', 'value'),
        constant('error')),
    constant('error'))
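The branching in that traversal can be modelled in plain Python as a sketch. The in-memory `vertices` set, the `edges` list and the `upsert_edge` helper are hypothetical stand-ins for the graph, not a Gremlin or driver API. Like the coalesce() in the answer, it updates every existing matching edge and only adds a new one when none exists.

```python
def upsert_edge(vertices, edges, start_id, stop_id, label, key, value):
    """Sketch of the choose()/coalesce() logic: 'error' if either endpoint is
    missing, otherwise update matching edges or add one, then set key=value."""
    if start_id not in vertices:       # outer choose(select('start'), ...)
        return "error"
    if stop_id not in vertices:        # inner choose(select('stop'), ...)
        return "error"
    # First coalesce() branch: existing start->stop edges with this label.
    matches = [e for e in edges
               if e["out"] == start_id and e["in"] == stop_id and e["label"] == label]
    if not matches:
        # Second coalesce() branch: addE(label).from('start').to('stop').
        new_edge = {"out": start_id, "in": stop_id, "label": label, "props": {}}
        edges.append(new_edge)
        matches = [new_edge]
    for e in matches:                  # property(key, value)
        e["props"][key] = value
    return matches

vertices = {"x", "y"}
edges = []
upsert_edge(vertices, edges, "x", "y", "label", "key", "value")    # adds the edge
upsert_edge(vertices, edges, "x", "y", "label", "key", "value2")   # updates it
print(len(edges), edges[0]["props"])                                # 1 {'key': 'value2'}
print(upsert_edge(vertices, edges, "x", "z", "label", "key", "v"))  # error
```

Running the upsert twice leaves a single edge whose property reflects the second call, and a missing endpoint short-circuits to 'error' before anything is written.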

Gremlin select multiple vertices gives an output without the properties with null values

In order to get all data from two vertices a and b i used the following
g.V('xxx').out('hasA').as('X').out('hasB').as('Y').select('X','Y')
I only get the values of X for which Y is not null. I want all X, whether or not Y is null.
Any ideas on how I can tweak the above query?

I'm not sure this still matters to you, but to answer your question directly, you need to handle the case where there are no "hasB" edges. You could do that with coalesce() in the following fashion:
g.V('xxx').out('hasA').as('X').
coalesce(out('hasB'),constant('n/a')).as('Y').
select('X','Y')
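The fallback behaviour can be sketched in plain Python. The `HAS_A` and `HAS_B` adjacency maps and the vertex names are hypothetical. Each X keeps every 'hasB' neighbour it has (one row per neighbour, as out('hasB') would emit), and an X without any falls back to the 'n/a' constant instead of being dropped.

```python
# Hypothetical adjacency maps standing in for the 'hasA' and 'hasB' edges.
HAS_A = {"xxx": ["a1", "a2", "a3"]}
HAS_B = {"a1": ["b1"], "a3": ["b2", "b3"]}  # a2 has no 'hasB' edge

def select_x_y(start):
    """Mimics out('hasA').as('X').coalesce(out('hasB'), constant('n/a')).as('Y').select('X','Y')."""
    rows = []
    for x in HAS_A.get(start, []):
        ys = HAS_B.get(x) or ["n/a"]  # first branch if non-empty, else the constant
        rows.extend({"X": x, "Y": y} for y in ys)
    return rows

for row in select_x_y("xxx"):
    print(row)
```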

How to get count of intermediate vertices?

Let's say there is a tree-like structure.
Top level: Warehouse
Next level: Storage space
Last level: Stored item
I want to get the count of storage spaces and stored items per warehouse.
I've already managed to get the number of stored items; it's done pretty easily using groupCount():
g.V().
hasLabel('Warehouse').
as('w').
out('HAS_SPACE').
hasLabel('Space').
as('s').
out('HAS_ITEM').
hasLabel('Item').
groupCount().by(select('w')).
unfold().
order().by(values, desc).
limit(100).
project('WarehouseName', 'ItemsCount').
by(select(keys).values('Name')).
by(select(values))
However, I want to get the count of 's' as well, and I can't think of a fast way to achieve it. I've thought about counting with separate traversals, something like:
g.V().
hasLabel('Warehouse').
project('WarehouseName', 'SpaceCount', 'ItemCount').
by('Name').
by(out('HAS_SPACE').count()).
by(out('HAS_SPACE').out('HAS_ITEM').count())
but it is extremely slow on a large number of vertices (there are about 26M).
Is there any other way to get that count?

You can use groupCount() as a side effect by giving it a name:
g.V().hasLabel('Warehouse').as('w')
.out('HAS_SPACE').hasLabel('Space').as('s')
.groupCount('spaceCount').by(select('w'))
.out('HAS_ITEM').hasLabel('Item')
.groupCount('itemsCount').by(select('w'))
.count().select('spaceCount', 'itemsCount')
Note that this returns two maps, one for spaces and one for items.
If you need a single map, you can replace the last line with:
.count().union(select('spaceCount'),select('itemsCount')).unfold()
.group().by(keys).by(select(values).fold()).unfold()
The result is a map of lists in which the first value is the space count and the second is the item count.
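The idea behind the named side effects, one walk that feeds two counters keyed by warehouse, can be sketched in plain Python. The `HAS_SPACE` and `HAS_ITEM` adjacency lists and the warehouse/space/item ids are hypothetical, and this only illustrates the single-pass counting, not how a graph database executes it.

```python
from collections import Counter

# Hypothetical adjacency lists for the HAS_SPACE and HAS_ITEM edges.
HAS_SPACE = {"w1": ["s1", "s2"], "w2": ["s3"]}
HAS_ITEM = {"s1": ["i1", "i2"], "s2": ["i3"], "s3": []}

def count_spaces_and_items(warehouses):
    """One walk from each warehouse, feeding two counters keyed by warehouse,
    like the two named groupCount() side effects."""
    space_count, items_count = Counter(), Counter()
    for w in warehouses:
        for s in HAS_SPACE.get(w, []):
            space_count[w] += 1          # groupCount('spaceCount').by(select('w'))
            for _item in HAS_ITEM.get(s, []):
                items_count[w] += 1      # groupCount('itemsCount').by(select('w'))
    # Merge into one map of [spaces, items], like the group().by(keys) variant.
    return {w: [space_count[w], items_count[w]] for w in space_count}

print(count_spaces_and_items(["w1", "w2"]))  # {'w1': [2, 3], 'w2': [1, 0]}
```

One difference worth keeping in mind: because `Counter` returns 0 for missing keys, this sketch reports an explicit 0 for a warehouse whose spaces hold no items, whereas in the Gremlin merge such a warehouse never appears in itemsCount, so its list would contain only the space count.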

Smart way to generate edges in Neo4J for big graphs

I want to generate a graph from a CSV file. The rows are the vertices and the columns are the attributes. I want to generate the edges by similarity between vertices (not necessarily with weights), so that when two vertices have the same value for some attribute, the edge between them has that attribute with value 1 or true.
The simplest Cypher query that occurs to me looks something like this:
Match (a:LABEL), (b:LABEL)
WHERE a.attr = b.attr
CREATE (a)-[r:SIMILAR {attr : 1}]->(b)
The graph has about 148,000 vertices, and the Java heap size option is "dynamically calculated based on available system resources".
The query I posted fails with Neo.DatabaseError.General.UnknownFailure, with a hint about Java heap space.
One problem I can think of is that a huge Cartesian product is built first and then searched for matches to create edges. Is there a smarter, perhaps incremental, way to do this?

I think you need to change the model a little: there is no need to connect every node to every other node with the same value of a particular attribute. It is better to have an intermediate node to which you connect all nodes that share the same attribute value.
This can be done at export time or later.
For example:
Match (A:LABEL) Where A.attr Is Not Null
Merge (S:Similar {propName: 'attr', propValue: A.attr})
Merge (A)-[r:Similar]->(S)
Later, with a separate query, you can remove Similar nodes that have only one connection (no other node has the same value for this attribute):
Match (S:Similar)<-[r]-()
With S, count(r) As cnt Where cnt = 1
Detach Delete S
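The intermediate-node model amounts to bucketing nodes by attribute value, which can be sketched in plain Python. The `nodes` dictionary and the `build_similar_hubs` helper are hypothetical. Each node joins at most one hub per attribute, so the work is a single pass over the nodes instead of a comparison of every pair, and the final filter plays the role of the follow-up DETACH DELETE of single-member hubs.

```python
from collections import defaultdict

def build_similar_hubs(nodes, attr):
    """Bucket nodes by their value of attr: one hub per distinct value (like
    MERGE (S:Similar {propName, propValue})), then drop hubs with a single
    member (like the follow-up DETACH DELETE query)."""
    hubs = defaultdict(list)  # (propName, propValue) -> member node ids
    for node_id, props in nodes.items():
        if props.get(attr) is not None:      # WHERE A.attr IS NOT NULL
            hubs[(attr, props[attr])].append(node_id)
    return {key: members for key, members in hubs.items() if len(members) > 1}

nodes = {1: {"color": "red"}, 2: {"color": "blue"}, 3: {"color": "red"}, 4: {}}
print(build_similar_hubs(nodes, "color"))  # {('color', 'red'): [1, 3]}
```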
If you need to connect nodes by all their properties, you can use the following query:
Match (A:LABEL) Where A.attr Is Not Null
With A, Keys(A) As keys
Unwind keys as key
Merge (S:Similar {propName: key, propValue: A[key]})
Merge (A)-[:Similar]->(S)

You're right that a huuuge cartesian product will be produced.
You can iterate over the a nodes in batches of, say, 1000 and rerun the query, incrementing the SKIP value on every iteration until it returns 0.
MATCH (a:Label)
WITH a ORDER BY id(a) SKIP 0 LIMIT 1000
MATCH (b:Label)
WHERE b.attr = a.attr AND id(b) > id(a)
CREATE (a)-[:SIMILAR_TO {attr: 1}]->(b)
RETURN count(*) as c
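The batching scheme can be sketched in plain Python to show what each batch produces. The `nodes` dictionary and the `similar_pairs_in_batches` helper are hypothetical; the `by_value` lookup stands in for an index on attr (an assumption: without one, each batch would scan all b nodes). Nodes are taken in id order so consecutive SKIP/LIMIT windows do not overlap, and `b > a` mirrors `id(b) > id(a)`, which avoids self-loops and duplicate pairs.

```python
from collections import defaultdict

def similar_pairs_in_batches(nodes, attr, batch_size):
    """Yield the (a, b) pairs created for each batch of 'a' nodes, mirroring the
    batched MATCH ... WHERE b.attr = a.attr AND id(b) > id(a) query."""
    by_value = defaultdict(list)          # attr value -> node ids, built once
    for node_id, props in nodes.items():
        by_value[props.get(attr)].append(node_id)
    ordered = sorted(nodes)               # stable id order so batches don't overlap
    for skip in range(0, len(ordered), batch_size):
        batch = []
        for a in ordered[skip:skip + batch_size]:
            value = nodes[a].get(attr)
            if value is None:             # null = null is not true in Cypher
                continue
            batch.extend((a, b) for b in by_value[value] if b > a)  # id(b) > id(a)
        yield batch

nodes = {1: {"k": "a"}, 2: {"k": "a"}, 3: {"k": "b"}, 4: {"k": "a"}, 5: {"k": "b"}}
batches = list(similar_pairs_in_batches(nodes, "k", 2))
print(batches)  # [[(1, 2), (1, 4), (2, 4)], [(3, 5)], []]
```

The last batch contains node 5 but creates nothing, because no node with a higher id shares its value. On real data a batch in the middle can do the same, so stopping when the batch of a nodes itself is empty is a safer loop condition than waiting for count(*) to return 0.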

Gremlin: How to get duplicates in a graph?

Assume I have a graph whose vertices have a name property. What is a good way to get the IDs of all vertices that have the same name?
Extending this, if the vertices have day and month properties, how do I return the IDs of the vertices that share the same values?

Assuming you don't know the value of the duplicates and you just want to find all duplicates possible:
Here is a quick and dirty solution: use group().by(), for example:
g.V().has("name").limit(50).group().by("name");
I only use limit because doing this operation on the whole graph will be very time consuming. For the day and month properties you can do the same thing.
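Grouping by name returns every name, including the unique ones; the duplicates are the groups with more than one member. Here is a plain-Python sketch of that filtering step, using a tuple of values as the key so the same function also covers the day and month case. The `people` data and the `duplicate_ids` helper are hypothetical.

```python
from collections import defaultdict

def duplicate_ids(vertices, *keys):
    """Group vertex ids by the tuple of values for keys and keep only the
    groups with more than one member, i.e. the duplicates."""
    groups = defaultdict(list)
    for vid, props in vertices.items():
        if all(k in props for k in keys):   # like has('name')
            groups[tuple(props[k] for k in keys)].append(vid)
    return {value: ids for value, ids in groups.items() if len(ids) > 1}

people = {
    1: {"name": "David", "day": 3, "month": 5},
    2: {"name": "Anna", "day": 3, "month": 5},
    3: {"name": "David", "day": 9, "month": 1},
}
print(duplicate_ids(people, "name"))          # {('David',): [1, 3]}
print(duplicate_ids(people, "day", "month"))  # {(3, 5): [1, 2]}
```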

Assuming you created indexes on these properties before creating the vertices, you can write a Gremlin query like:
def g = graph.traversal()
def vertices = g.V().has("name", "David").id().toList()
This assumes you know the value you are searching for.
