Given the dataset below, how do I check:
the total pay of the employee, where totalPay is variablePay + fixedPay
what percentage variablePay is of totalPay
whether totalPay is less than 100,000
g.addV('employee').
property(id,'employeeId_1').
property('variablePay',20000.00).
property('fixedPay',70000.00)
The query below doesn't seem to work.
g.V('employeeId_1').has('variablePay', lt('fixedPay')).hasNext()
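This is an area of Gremlin that is a bit confusing. Today, predicates such as lt and gt can only take a literal value, not a traversal such as lt(values('fixedPay')). In the query above, lt('fixedPay') therefore compares variablePay against the literal string 'fixedPay' rather than against the property's value.
You can work around this with a where() step that uses two by() modulators: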
gremlin> g.V('employeeId_1').as('v').where(lt('v')).by('variablePay').by('fixedPay')
==>v[employeeId_1]
If we reverse the test, there is no result as expected.
gremlin> g.V('employeeId_1').as('v').where(lt('v')).by('fixedPay').by('variablePay')
gremlin>
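The where()/by() comparison above covers only the "is variablePay less than fixedPay" test. To make the three checks from the question concrete, here is a minimal sketch in plain Python (not Gremlin) that models the employee vertex as a dict. The function name pay_summary and the threshold parameter are assumptions made for illustration. Recent TinkerPop versions have a math() step that may let you express the same arithmetic in the traversal itself, but check what your graph provider supports.

```python
# Plain-Python model of the employee vertex from the question (not Gremlin).
employee = {"id": "employeeId_1", "variablePay": 20000.00, "fixedPay": 70000.00}


def pay_summary(emp, threshold=100_000):
    """Return total pay, variable pay as a percentage of total, and a threshold check.

    pay_summary and threshold are hypothetical names used only for this sketch.
    """
    total = emp["variablePay"] + emp["fixedPay"]
    # Guard against division by zero for an employee with no pay recorded.
    pct = 100.0 * emp["variablePay"] / total if total else 0.0
    return {
        "totalPay": total,
        "variablePct": pct,
        "underThreshold": total < threshold,
    }


summary = pay_summary(employee)
print(summary)
```

Doing the arithmetic client side like this is often the simplest option when only a handful of vertices are involved.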
I'm trying to create edges between vertices based on matching the value of a property in each vertex, making what is currently an implied relationship into an explicit relationship. I've been unsuccessful in writing a gremlin traversal that will match up related vertices.
Specifically, given the following graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','alice')
g.addV('person').property('name','bob').property('spouse','carol')
g.addV('person').property('name','carol')
g.addV('person').property('name','dave').property('spouse', 'alice')
I was hoping I could create a spouse_of relation using the following
> g.V().has('spouse').as('x')
.V().has('name', select('x').by('spouse'))
.addE('spouse_of').from('x')
but instead of creating one edge from bob to carol and another edge from dave to alice, bob and dave each end up with spouse_of edges to all of the vertices (including themselves):
> g.V().out('spouse_of').path().by('name')
==>[bob,alice]
==>[bob,bob]
==>[bob,carol]
==>[bob,dave]
==>[dave,carol]
==>[dave,dave]
==>[dave,alice]
==>[dave,bob]
It almost seems as if the has filter isn't being applied, or, to use RDBMS terms, as if I'm ending up with an "outer join" instead of the "inner join" I'd intended.
Any suggestions? Am I overlooking something trivial or profound (local vs global scope, perhaps)? Is there any way of accomplishing this in a single traversal query, or do I have to iterate through g.has('spouse') and create edges individually?
You can make this happen in a single traversal, but has() is not meant to work quite that way. The pattern for this type of traversal is described in the Traversal Induced Values section of the Gremlin Recipes tutorial, but you can see it in action here:
gremlin> g.V().hasLabel('person').has('spouse').as('s').
......1> V().hasLabel('person').as('x').
......2> where('x', eq('s')).
......3> by('name').
......4> by('spouse').
......5> addE('spouse_of').from('s').to('x')
==>e[10][2-spouse_of->5]
==>e[11][7-spouse_of->0]
gremlin> g.E().project('x','y').by(outV().values('name')).by(inV().values('name'))
==>[x:bob,y:carol]
==>[x:dave,y:alice]
While this can be done in a single traversal, note that depending on the size of your data it could be expensive, as I'm not sure that either call to V() will be optimized by any graph. It's neat to use this form, but you may find it faster to take an approach that ensures an index is used, which might mean issuing multiple queries to solve the problem.
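To illustrate why an index-backed approach can be cheaper than the nested V() calls, here is a rough sketch in plain Python (not Gremlin). It builds a name lookup once and then resolves each spouse value with a single dictionary access, which is roughly what issuing one indexed query per person would do. The function name spouse_edges is hypothetical, and the sketch assumes names are unique.

```python
# The four person vertices from the question, modeled as plain dicts.
people = [
    {"name": "alice"},
    {"name": "bob", "spouse": "carol"},
    {"name": "carol"},
    {"name": "dave", "spouse": "alice"},
]


def spouse_edges(vertices):
    """Return (from, label, to) tuples, one per vertex with a resolvable spouse.

    Assumes 'name' is unique; with duplicates, later vertices overwrite earlier ones.
    """
    # Build the "index" once: name -> vertex.
    by_name = {v["name"]: v for v in vertices}
    edges = []
    for v in vertices:
        # Vertices without a 'spouse' property look up None and are skipped.
        target = by_name.get(v.get("spouse"))
        if target is not None:
            edges.append((v["name"], "spouse_of", target["name"]))
    return edges


edges = spouse_edges(people)
print(edges)
```

The single-traversal form compares every spouse value against every person, while this form does one lookup per person with a spouse.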
In some cases, I get an inexplicable result when I use order().by(...) with coalesce(...).
Using the standard Modern graph,
gremlin> g.V()
.hasLabel("person")
.out("created")
.coalesce(values("name"), constant("x"))
.fold()
==>[lop,lop,ripple,lop]
But if I sort by name before the coalesce I get 9 lop instead of 3:
gremlin> g.V()
.hasLabel("person")
.out("created")
.order().by("name")
.coalesce(values("name"), constant("x"))
.fold()
==>[lop,lop,lop,lop,lop,lop,lop,lop,lop,ripple]
Why does the number of elements differ between the two queries?
That looks like a bug - I've created an issue in JIRA. There is a workaround, but first consider that your traversal isn't really going to work even with the bug set aside: order() will fail because the by() modulator references a key that possibly doesn't exist. So you need to account for that differently:
g.V().
hasLabel("person").
out("created").
order().by(coalesce(values('name'),constant('x')))
I then used choose() to do what coalesce() is supposed to do:
g.V().
hasLabel("person").
out("created").
order().by(coalesce(values('name'),constant('x'))).
choose(has("name"),values('name'),constant('x')).
fold()
and that seems to work fine.
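The idea behind the workaround is "sort by the name if present, otherwise by a default, then emit the same value". Here is a small plain-Python sketch of that logic (not Gremlin). The sample list, including the vertex without a name, and the helper name_or_default are made up for illustration.

```python
# Hypothetical 'created' vertices; the empty dict stands for one with no 'name'.
software = [{"name": "ripple"}, {"name": "lop"}, {}, {"name": "lop"}]


def name_or_default(vertex, default="x"):
    """Mimic coalesce(values('name'), constant('x')) for a dict-based vertex."""
    return vertex.get("name", default)


# Sort with the same fallback used for the output, so a missing key never fails.
ordered = sorted(software, key=name_or_default)
names = [name_or_default(v) for v in ordered]
print(names)
```

Using one fallback for both the sort key and the emitted value keeps the two consistent, and the element count stays the same as the input.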
I am basically trying to find all the connected vertices for a node type. The Cypher version of the query gives me the expected result, but the Gremlin version does not. Is there anything I am doing incorrectly?
Visual Representation of my data
Cypher query to fetch all the connections
MATCH p=shortestPath((n:Process)-[*]-(m:Process))
WHERE n <> m
RETURN ID(n), n, ID(m), m, length(p)
Gremlin version
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.path().by('title')
==>[Cash Processing,Accounting]
==>[Cash Processing,Sales]
==>[Sales,Marketing]
==>[Sales,Cash Processing]
==>[Marketing,Accounting]
==>[Marketing,Sales]
==>[Accounting,Cash Processing]
==>[Accounting,Marketing]
Any idea why Gremlin is not catching the 'Cash Processing'->'Sales'->'Marketing' connection?
I have a feeling something needs to change in that until() function, but I can't figure out what.
You don't talk about the labels of your vertices, but to me it seems like the Sales vertex already fulfills the until(hasLabel('Process')) stop condition, so the traversal stops there and never continues on to Marketing.
The correct translation of your Cypher query would be something more like this:
g.V().hasLabel('Process').as('n').
repeat(both().simplePath()).
emit(hasLabel('Process')).as('m').
dedup('n','m').
path().count(local).as('len').
select('m','n','len')
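To see why continuing past the first Process vertex matters, here is a plain-Python breadth-first search (not Gremlin). The adjacency is an assumption inferred from the two-element paths in the question's output, treating every vertex as a Process; the real data may differ. The function name shortest_lengths is hypothetical. Distances here count edges, as Cypher's length(p) does, whereas path().count(local) counts the elements on the path.

```python
from collections import deque

# Assumed graph, inferred from the question's output (a four-vertex cycle).
graph = {
    "Cash Processing": ["Accounting", "Sales"],
    "Sales": ["Marketing", "Cash Processing"],
    "Marketing": ["Accounting", "Sales"],
    "Accounting": ["Cash Processing", "Marketing"],
}


def shortest_lengths(adj, start):
    """Return the shortest edge count from start to every other reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                # First visit in BFS order is the shortest distance.
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]  # mirror the Cypher condition WHERE n <> m
    return dist


lengths = shortest_lengths(graph, "Cash Processing")
print(lengths)
```

The search does not stop at Sales or Accounting; it records them and keeps going, which is the behavior emit() gives the Gremlin traversal.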
Using Titan w/ Cassandra v 0.3.1, I created a vertex key index via createKeyIndex as described in the Titan docs.
gremlin> g.createKeyIndex("my_key", Vertex.class)
==>null
I now have approximately 50k nodes and 186k edges in the graph, and I'm finding a significant performance difference between lookups using my_key and lookups by vertex ID. This query takes about 5 seconds to run:
gremlin> g.V.has("my_key", "abc")
==>v[12345]
whereas using the vertex ID takes less than 1 second:
gremlin> g.v(12345)
==>v[12345]
my_key does not have a unique constraint (I don't want one), but I'm wondering what is causing such a discrepancy in performance. How can I increase performance on lookups for a non-unique, indexed vertex key?
The issue here is the use of .has, which is a filter function and will not use any indexes. From GremlinDocs:
It is worth noting that the syntax of has is similar to g.V("name",
"marko"), which has the difference of being a key index lookup and as
such will perform faster. In contrast, this line, g.V.has("name",
"marko"), will iterate over all vertices checking the name property of
each vertex for a match and will be significantly slower than the key
index approach.
For the example above, this will use the index and perform the lookup very quickly (< 1 second):
gremlin> g.V("my_key", "abc")
==>v[12345]
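The difference between the two forms is essentially a full scan versus a keyed lookup. Here is a toy plain-Python sketch of that contrast (not Gremlin or Titan code). The extra vertices 12346 and 12347 and the helper names scan and build_index are invented for illustration, and the key is deliberately non-unique.

```python
from collections import defaultdict

# Hypothetical vertices; 'my_key' is non-unique, as in the question.
vertices = [
    {"id": 12345, "my_key": "abc"},
    {"id": 12346, "my_key": "xyz"},
    {"id": 12347, "my_key": "abc"},
]


def scan(vs, key, value):
    """Filter-style lookup: touches every vertex, like g.V.has(key, value)."""
    return [v["id"] for v in vs if v.get(key) == value]


def build_index(vs, key):
    """Build value -> list of ids once, so later lookups skip the scan."""
    index = defaultdict(list)
    for v in vs:
        if key in v:
            index[v[key]].append(v["id"])
    return index


index = build_index(vertices, "my_key")
print(scan(vertices, "my_key", "abc"))
print(index.get("abc", []))
```

Both calls return the same ids, but the scan costs time proportional to the number of vertices on every query, while the index pays that cost once.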
I need to use Gremlin find the shortest path between two nodes (vertices) while avoiding a list of given vertices.
I already have:
v.bothE.bothV.loop(2){!it.object.equals(y)}.paths>>1
To get my shortest path.
I was attempting something like:
v.bothE.bothV.filter{it.name!="ignored"}.loop(3){!it.object.equals(y)}.paths>>1
but it does not seem to work.
Please HELP!!!
The second solution you have looks correct. However, to be clear on what you are trying to accomplish: if x and y are the vertices that you want to find the shortest path between, and a vertex should be ignored during the traversal if it has the property name:"ignored", then the query is:
x.both.filter{it.name!="ignored"}.loop(2){!it.object.equals(y)}.paths>>1
If the "list of given vertices" you want filtered is actually a list, then the traversal is described as such:
list = [ ... ] // construct some list
x.both.except(list).loop(2){!it.object.equals(y)}.paths>>1
Moreover, I tend to use a range filter just to be safe as this will go into an infinite loop if you forget the >>1 :)
x.both.except(list).loop(2){!it.object.equals(y)}[1].paths>>1
Also, if there is a potential for no path, then to avoid an infinitely long search, you can do a loop limit (e.g. no more than 4 steps):
x.both.except(list).loop(2){!it.object.equals(y) & it.loops < 5}.filter{it.object.equals(y)}.paths>>1
Note why the last filter step before paths is needed: there are two reasons the loop can end. You might not be at y when you break out of the loop (instead, you broke out because it.loops < 5 no longer held).
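The same idea (shortest path, skip a list of vertices, give up after a depth limit) can be sketched as a breadth-first search in plain Python (not Gremlin). The graph, the vertex names, and the function shortest_path with its avoid and max_depth parameters are all hypothetical. Unlike the loop above, this sketch also remembers visited vertices, so it terminates even without a depth limit.

```python
from collections import deque


def shortest_path(adj, start, goal, avoid=(), max_depth=None):
    """Return the shortest path from start to goal as a list, or None.

    avoid: vertices that may not appear on the path.
    max_depth: maximum number of edges to follow (None means unlimited).
    """
    avoid = set(avoid)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Stop extending this path once the depth limit is reached.
        if max_depth is not None and len(path) - 1 >= max_depth:
            continue
        for nxt in adj.get(path[-1], ()):
            if nxt not in seen and nxt not in avoid:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path within the limits


# Hypothetical undirected graph with two routes from x to y.
graph = {
    "x": ["a", "b"],
    "a": ["x", "y"],
    "b": ["x", "c"],
    "c": ["b", "y"],
    "y": ["a", "c"],
}

print(shortest_path(graph, "x", "y"))
print(shortest_path(graph, "x", "y", avoid=["a"]))
print(shortest_path(graph, "x", "y", avoid=["a"], max_depth=2))
```

Returning None when the limit is hit plays the role of the final filter step: a search that stopped early is never mistaken for one that reached y.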
Here is your solution implemented over the Grateful Dead graph distributed with Gremlin. First some setup code, where we load the graph and define two vertices x and y:
gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> g.loadGraphML('data/graph-example-2.xml')
==>null
gremlin> x = g.v(89)
==>v[89]
gremlin> y = g.v(100)
==>v[100]
gremlin> x.name
==>DARK STAR
gremlin> y.name
==>BROWN EYED WOMEN
Now your traversal. Note that there is no name:"ignored" property, so instead I altered it to account for the number of performances of each song along the path. Thus, this finds the shortest path through songs played more than 10 times in concert:
gremlin> x.both.filter{it.performances > 10}.loop(2){!it.object.equals(y)}.paths>>1
==>v[89]
==>v[26]
==>v[100]
If you use Gremlin 1.2+, then you can use a path closure to provide the names of those vertices (for example) instead of just the raw vertex objects:
gremlin> x.both.filter{it.performances > 10}.loop(2){!it.object.equals(y)}.paths{it.name}>>1
==>DARK STAR
==>PROMISED LAND
==>BROWN EYED WOMEN
I hope that helps.
Good luck!
Marko.