Gremlin order().by() with coalesce() duplicates some values

In some cases, I get an inexplicable result when I use order().by(...) with coalesce(...).
Using the standard Modern graph:
gremlin> g.V()
.hasLabel("person")
.out("created")
.coalesce(values("name"), constant("x"))
.fold()
==>[lop,lop,ripple,lop]
But if I sort by name before the coalesce(), I get nine lop values instead of three:
gremlin> g.V()
.hasLabel("person")
.out("created")
.order().by("name")
.coalesce(values("name"), constant("x"))
.fold()
==>[lop,lop,lop,lop,lop,lop,lop,lop,lop,ripple]
Why does the number of elements differ between the two queries?

That looks like a bug, and I've created an issue in JIRA. There is a workaround, but first consider that your traversal isn't really going to work even with the bug set aside: order() will fail because the by() modulator references a key that possibly doesn't exist. So you need to account for that differently:
g.V().
hasLabel("person").
out("created").
order().by(coalesce(values('name'),constant('x')))
I then used choose() to do what coalesce() is supposed to do:
g.V().
hasLabel("person").
out("created").
order().by(coalesce(values('name'),constant('x'))).
choose(has("name"),values('name'),constant('x')).
fold()
and that seems to work fine.
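To see why the sort key needs a fallback, here is a plain-Python analogy. It is not Gremlin and says nothing about how TinkerPop implements order(); each vertex is modeled as a dict, and the hypothetical helper name_or_default plays the role of coalesce(values('name'), constant('x')). The vertex without a name is an assumption added for illustration, since every vertex in the Modern graph has a name.

```python
# Plain-Python analogy of order().by(coalesce(values('name'), constant('x'))).
# The dicts stand in for vertices; the nameless one is invented for illustration.

def name_or_default(vertex, default="x"):
    """Return the vertex's name, or the default when the key is missing."""
    return vertex.get("name", default)

created = [
    {"name": "lop"},
    {"name": "ripple"},
    {},  # hypothetical vertex with no 'name' property
    {"name": "lop"},
]

# Sorting on vertex["name"] directly would raise KeyError for the empty dict;
# sorting on the fallback key works for every element.
ordered = sorted(created, key=name_or_default)
names = [name_or_default(v) for v in ordered]
print(names)  # ['lop', 'lop', 'ripple', 'x']
```

Note that the number of elements is unchanged by the sort, which is the behavior the original query was expected to have.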

Related

Gremlin comparing two properties of a single node

Given the dataset below, how do I check:
The totalPay of the employee, where totalPay is variablePay + fixedPay
What percentage variablePay is of totalPay
Whether totalPay is less than 100,000
g.addV('employee').
property(id,'employeeId_1').
property('variablePay',20000.00).
property('fixedPay',70000.00)
The query below doesn't seem to work.
g.V('employeeId_1').has('variablePay', lt('fixedPay')).hasNext()
This is an area of Gremlin that is a bit confusing: today, predicates such as lt and gt can only take a literal value, not a traversal such as lt(values('fixedPay')).
You can work around this with a where() step that uses two by() modulators:
gremlin> g.V('employeeId_1').as('v').where(lt('v')).by('variablePay').by('fixedPay')
==>v[employeeId_1]
If we reverse the test, there is no result, as expected.
gremlin> g.V('employeeId_1').as('v').where(lt('v')).by('fixedPay').by('variablePay')
gremlin>
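The answer above covers comparing two properties; the three checks listed in the question are plain arithmetic on the same two values. As a sanity check of the expected numbers, here is a minimal Python sketch. It is not a Gremlin query (inside a traversal this kind of arithmetic is usually expressed with the math() step, which is not shown here), and the dict is just a stand-in for the employee vertex.

```python
# Plain-Python arithmetic for the three checks in the question.
# The dict mirrors the employee vertex created above; it is not a Gremlin API.

employee = {"variablePay": 20000.00, "fixedPay": 70000.00}

total_pay = employee["variablePay"] + employee["fixedPay"]
variable_pct = employee["variablePay"] / total_pay * 100
under_limit = total_pay < 100_000

print(total_pay)               # 90000.0
print(round(variable_pct, 2))  # 22.22
print(under_limit)             # True
```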

Gremlin: dedup() with groups of vertices not working

I have a query that returns groups of users like this:
==>[britney,ladygaga,aguilera]
==>[aguilera,ladygaga,britney]
These two example groups have the same items in a different order. The problem is that dedup() does not remove one of the groups in this case, because the different ordering makes them unequal as far as dedup() is concerned.
The only solution I can think of is to call order() in each group so they have the same order and dedup() works. But this solution means:
Extra computation just because dedup cannot handle this situation
An ugly comment I have to add like "This is here to make dedup work"
Is there another solution to this?
You can try my example above in the gremlin console with these lines:
g.addV("user").property("name", "britney")
g.addV("user").property("name", "aguilera")
g.addV("user").property("name", "ladygaga")
Dedup working:
g.V().hasLabel("user").values("name").fold().store("result").V().hasLabel("user").values("name").fold().store("result").select("result").unfold().dedup()
Dedup not working because the items are shuffled:
g.V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").select("result").unfold().dedup()
You have to order() the lists for them to have equality:
gremlin> g.V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").
......1> V().hasLabel("user").values("name").order().by(shuffle).fold().store("result").
......2> select("result").unfold().order(local).dedup()
==>[aguilera,britney,ladygaga]
which is standard list equality:
gremlin> [1,2,3] == [1,2,3]
==>true
gremlin> [1,2,3] == [3,2,1]
==>false
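The same idea can be modeled outside Gremlin. The sketch below is plain Python, not a TinkerPop API: the hypothetical helper dedup_groups sorts each group into a canonical form (the role order(local) plays above) and keeps only the first group seen for each canonical form.

```python
# Plain-Python model of unfold().order(local).dedup(): sort each group so that
# groups with the same members compare equal, then keep the first occurrence.

def dedup_groups(groups):
    seen = set()
    unique = []
    for group in groups:
        key = tuple(sorted(group))  # canonical form, like order(local)
        if key not in seen:
            seen.add(key)
            unique.append(list(key))
    return unique

groups = [
    ["britney", "ladygaga", "aguilera"],
    ["aguilera", "ladygaga", "britney"],
]
print(dedup_groups(groups))  # [['aguilera', 'britney', 'ladygaga']]
```

Some canonical form is needed whenever equality is order-sensitive, so the sort is part of defining what "duplicate" means here rather than a workaround.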

Gremlin. In a parent-child relation, filter by the higher version of the child

I have a parent-child structure. The child has a version and a group. I need to filter for the newest version, grouped by group and parent.
This query returns the values properly, but I need the vertex for each case:
g.V().hasLabel('Child')
.group()
.by(
__.group()
.by('tenant')
.by(__.in('Has').values('name'))
)
.by(__.values('version').max())
Any tips or suggestions?
Thanks for the help!
Data:
g.addV('Parent').property('name','child1').as('re1').
addV('Parent').property('name','child2').as('re2').
addV('Parent').property('name','child3').as('re3').
addV('Child').property('tenant','group1').property('version','0.0.1').as('dp1').
addE('Has').from('re1').to('dp1').
addV('Child').property('tenant','group1').property('version','0.0.2').as('dp4').
addE('Has').from('re1').to('dp4').
addV('Child').property('tenant','group2').property('version','0.1.2').as('dp5').
addE('Has').from('re1').to('dp5').
addV('Child').property('tenant','group1').property('version','0.1.2').as('dp2').
addE('Has').from('re2').to('dp2').
addV('Child').property('tenant','group1').property('version','3.0.3').as('dp3').
addE('Has').from('re3').to('dp3')
Output:
{{group1=child1}=0.0.2, {group2=child1}=0.1.2, {group1=child3}=3.0.3, {group1=child2}=0.1.2}
But I need the vertex for each case.
I assume that you mean the Child vertex. The following traversal will give you all the data:
gremlin> g.V().hasLabel("Child").
group().
by(union(values("tenant"), __.in("Has").values("name")).fold()).
unfold()
==>[group2, child1]=[v[14]]
==>[group1, child1]=[v[6], v[10]]
==>[group1, child2]=[v[18]]
==>[group1, child3]=[v[22]]
However, you probably want it to be in a slightly better structure:
gremlin> g.V().hasLabel("Child").
group().
by(union(values("tenant"), __.in("Has").values("name")).fold()).
unfold().
project('tenant','name','v').
by(select(keys).limit(local, 1)).
by(select(keys).tail(local, 1)).
by(select(values).unfold())
==>[tenant:group2,name:child1,v:v[14]]
==>[tenant:group1,name:child1,v:v[6]]
==>[tenant:group1,name:child2,v:v[18]]
==>[tenant:group1,name:child3,v:v[22]]
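Note that in the output above the [group1, child1] row shows only v[6], although that group holds both v[6] and v[10]. Picking the newest Child per group needs a comparison on version as well. The following is a plain-Python model of that selection, not a Gremlin traversal; the ids dp1 to dp5 are the step labels from the sample data, and comparing dotted versions numerically (via the hypothetical helper version_key) is an assumption about the intended ordering.

```python
# Plain-Python model of "newest Child per (tenant, parent name)".
# Rows mirror the sample data; the ids dp1..dp5 are the step labels used there.

def version_key(version):
    """Turn '0.1.2' into (0, 1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

children = [
    {"id": "dp1", "tenant": "group1", "parent": "child1", "version": "0.0.1"},
    {"id": "dp4", "tenant": "group1", "parent": "child1", "version": "0.0.2"},
    {"id": "dp5", "tenant": "group2", "parent": "child1", "version": "0.1.2"},
    {"id": "dp2", "tenant": "group1", "parent": "child2", "version": "0.1.2"},
    {"id": "dp3", "tenant": "group1", "parent": "child3", "version": "3.0.3"},
]

newest = {}
for child in children:
    key = (child["tenant"], child["parent"])
    best = newest.get(key)
    # Keep the child with the highest version seen so far for this key.
    if best is None or version_key(child["version"]) > version_key(best["version"]):
        newest[key] = child

for key, child in newest.items():
    print(key, child["id"], child["version"])
```

For the sample data this keeps dp4 (version 0.0.2) for group1/child1, which matches the maximum version reported in the question's own output.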

Traverse implied edge through property match?

I'm trying to create edges between vertices based on matching the value of a property in each vertex, making what is currently an implied relationship into an explicit relationship. I've been unsuccessful in writing a gremlin traversal that will match up related vertices.
Specifically, given the following graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','alice')
g.addV('person').property('name','bob').property('spouse','carol')
g.addV('person').property('name','carol')
g.addV('person').property('name','dave').property('spouse', 'alice')
I was hoping I could create a spouse_of relation using the following:
> g.V().has('spouse').as('x')
.V().has('name', select('x').by('spouse'))
.addE('spouse_of').from('x')
but instead of creating one edge from bob to carol and another edge from dave to alice, bob and dave each end up with spouse_of edges to all of the vertices (including themselves):
> g.V().out('spouse_of').path().by('name')
==>[bob,alice]
==>[bob,bob]
==>[bob,carol]
==>[bob,dave]
==>[dave,carol]
==>[dave,dave]
==>[dave,alice]
==>[dave,bob]
It almost seems as if the has filter isn't being applied, or, to use RDBMS terms, as if I'm ending up with an "outer join" instead of the "inner join" I'd intended.
Any suggestions? Am I overlooking something trivial or profound (local vs global scope, perhaps)? Is there any way of accomplishing this in a single traversal query, or do I have to iterate through g.V().has('spouse') and create edges individually?
You can make this happen in a single traversal, but has() is not meant to work quite that way. The pattern for this type of traversal is described in the Traversal Induced Values section of the Gremlin Recipes tutorial, but you can see it in action here:
gremlin> g.V().hasLabel('person').has('spouse').as('s').
......1> V().hasLabel('person').as('x').
......2> where('x', eq('s')).
......3> by('name').
......4> by('spouse').
......5> addE('spouse_of').from('s').to('x')
==>e[10][2-spouse_of->5]
==>e[11][7-spouse_of->0]
gremlin> g.E().project('x','y').by(outV().values('name')).by(inV().values('name'))
==>[x:bob,y:carol]
==>[x:dave,y:alice]
While this can be done in a single traversal, note that it could be expensive depending on the size of your data, as I'm not sure that either call to V() will be optimized by any graph. While it's neat to use this form, you may find it faster to take an approach that ensures an index is used, which might mean issuing multiple queries to solve the problem.
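To illustrate the cost argument, here is a plain-Python model of the same join. It is a sketch under assumptions, not a graph query: the list stands in for the person vertices, and the dict by_name stands in for an index on name, so each spouse value is resolved by lookup rather than by scanning every vertex for every person.

```python
# Plain-Python model of the spouse join: build a name index once, then look up
# each person's 'spouse' value instead of scanning every vertex per person.

people = [
    {"name": "alice"},
    {"name": "bob", "spouse": "carol"},
    {"name": "carol"},
    {"name": "dave", "spouse": "alice"},
]

by_name = {person["name"]: person for person in people}  # stands in for an index

# One (from, to) pair per person whose 'spouse' value matches an existing name.
spouse_edges = [
    (person["name"], by_name[person["spouse"]]["name"])
    for person in people
    if "spouse" in person and person["spouse"] in by_name
]
print(spouse_edges)  # [('bob', 'carol'), ('dave', 'alice')]
```

This is the "inner join" result from the question: two edges, and none for people without a spouse property.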

Create if not exist Vertex and Edge in 1 Gremlin Query

I found the following code to create an edge if it does not exist yet.
g.V().hasLabel("V1")
.has("userId", userId).as("a")
.V().hasLabel("V1").has("userId", userId2)
.coalesce(
bothE("link").where(outV().as("a")),
addE("link").from("a")
)
It works fine, but I want to create both the vertices and the edge in one query if they do not exist.
I tried the following code on a new graph; it creates the new vertices but no edge between them.
g.V().hasLabel("V1")
.has("userId", userId).fold()
.coalesce(
unfold(),
addV("V1").property("userId", userId1)
).as("a")
.V().hasLabel("V1").has("userId", userId2).fold()
.coalesce(
unfold(),
addV("V1").property("userId", userId2)
)
.coalesce(
bothE("link").where(outV().as("a")),
addE("link").from("a")
)
Thanks to Daniel Kuppitz in the JanusGraph Google group, I found the solution. I am re-posting it here for anyone who needs it.
There are two issues in your query. The first one is the reason why it doesn't work as expected: the fold() step. Using fold() will destroy the path history, but you can easily work around it by doing that part in a child traversal:
g.V().has("V1","userId", userId1).fold().
coalesce(unfold(),
addV("V1").property("userId", userId1)).as("a").
map(V().has("V1","userId", userId2).fold()).
coalesce(unfold(),
addV("V1").property("userId", userId2)).
coalesce(inE("link").where(outV().as("a")),
addE("link").from("a"))
The second issue is the combination of bothE and outV. You should rather use bothE/otherV, outE/inV or inE/outV.
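The intent of the fold()/coalesce(unfold(), addV(...)) pattern is get-or-create: running it twice should not create anything new the second time. The sketch below models that behavior in plain Python. It is not Gremlin and not a TinkerPop API; the dict, the set, and both function names are illustrative assumptions.

```python
# Plain-Python model of the get-or-create pattern (fold/coalesce/unfold/addV).
# The dict and set stand in for a graph; names here are illustrative only.

vertices = {}   # userId -> vertex properties
edges = set()   # (out userId, label, in userId)

def get_or_create_vertex(user_id):
    """Return the existing vertex for user_id, creating it if absent."""
    return vertices.setdefault(user_id, {"label": "V1", "userId": user_id})

def get_or_create_link(user_id1, user_id2):
    """Ensure both vertices and one 'link' edge from user_id1 to user_id2 exist."""
    get_or_create_vertex(user_id1)
    get_or_create_vertex(user_id2)
    edge = (user_id1, "link", user_id2)
    created = edge not in edges
    edges.add(edge)
    return created

print(get_or_create_link("u1", "u2"))  # True: vertices and edge were created
print(get_or_create_link("u1", "u2"))  # False: nothing new on the second call
print(len(vertices), len(edges))       # 2 1
```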
I used the approach suggested by #thangdc94 (thanks!) and found that the map() step takes a long time. This query worked much faster (about 20x) for me:
g.V().has("V1","userId", userId1).fold().
coalesce(unfold(),
addV("V1").property("userId", userId1)).as("a").iterate();
g.V().has("V1","userId", userId2).fold().
coalesce(unfold(),
addV("V1").property("userId", userId2)).as("b").
V().has("V1","userId", userId1).
coalesce(outE("link").where(inV().as("b")),
addE("link").to("b"))
comment: I used Neptune DB
