I have a graph with one start node and two goal vertices. Two paths lead to the first goal, and another path leads to the second.
I look for all paths to the goal vertices and collect the edge weights in a sack (the query below uses sack(mult)). I combine the values of all paths leading to the same goal via group().by().
query so far:
g.withSack(1.0f).V('v0')
.repeat(
outE().sack(mult).by('weight')
.inV()
).until(hasLabel('goal'))
.group().by().by(sack())
.unfold()
.project('sack', 'map')
.by(select(values))
.by(select(keys).valueMap(true))
.order().by('sack', desc)
the result looks like this:
{'sack': 0.27999999999999997, 'map': {<T.id: 1>: 'v7', <T.label: 4>: 'goal', 'date': ['02-02-2022']}}
{'sack': 0.125, 'map': {<T.id: 1>: 'v3', <T.label: 4>: 'goal', 'date': ['02-02-2022']}}
Now I also want to sort by date. However, using by('date', desc) results in an error:
"java.util.LinkedHashMap cannot be cast to java.lang.Comparable"
Can I somehow access the date value inside the map, or sort by date in some other way?
(FYI: by(sack(), desc) works just as well as by('sack', desc).)
data:
g.addV('start').property(id, 'v0')
.addV('road').property(id, 'v1')
.addV('road').property(id, 'v2')
.addV('goal').property(id, 'v3').property('date', '02-02-2022')
.addV('road').property(id, 'v4')
.addV('road').property(id, 'v5')
.addV('road').property(id, 'v6')
.addV('goal').property(id, 'v7').property('date', '22-02-2022')
.addE('link').property('weight', 0.4).from(V('v0')).to(V('v1'))
.addE('link').property('weight', 0.4).from(V('v1')).to(V('v2'))
.addE('link').property('weight', 0.4).from(V('v2')).to(V('v3'))
.addE('link').property('weight', 0.5).from(V('v0')).to(V('v5'))
.addE('link').property('weight', 0.5).from(V('v5')).to(V('v4'))
.addE('link').property('weight', 0.5).from(V('v4')).to(V('v3'))
.addE('link').property('weight', 0.7).from(V('v0')).to(V('v6'))
.addE('link').property('weight', 0.4).from(V('v6')).to(V('v7'))
(my code runs on Neptune with 'gremlin': {'version': 'tinkerpop-3.4.11'})
You just need to select the date key from the map stored under the 'map' key.
gremlin> g.withSack(1.0f).
......1> V('v0').
......2> repeat(outE().sack(mult).by('weight').inV()).
......3> until(hasLabel('goal')).
......4> group().by().by(sack()).
......5> unfold().
......6> project('sack', 'map').
......7> by(select(values)).
......8> by(select(keys).valueMap(true)).
......9> order().by('sack', desc).by(select('map').select('date'),desc)
==>[sack:0.280,map:[id:v7,label:goal,date:[22-02-2022]]]
==>[sack:0.1250,map:[id:v3,label:goal,date:[02-02-2022]]]
Note however, for the dates to sort properly (if you encode them as strings), you should use something close to ISO 8601 form, such as YYYY-MM-DD or more concretely, 2022-02-22. It might be more efficient to store actual dates, or epoch offset integers.
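To illustrate the point about string-encoded dates, here is a minimal Python sketch (Python only because the question's output looks like it came from gremlinpython). The helper name to_iso and the date 22-01-2022 are made up for the illustration. It shows that DD-MM-YYYY strings are compared day first, while ISO 8601 strings sort chronologically.

```python
from datetime import datetime

def to_iso(ddmmyyyy: str) -> str:
    """Convert 'DD-MM-YYYY' to ISO 8601 'YYYY-MM-DD' (hypothetical helper)."""
    return datetime.strptime(ddmmyyyy, "%d-%m-%Y").strftime("%Y-%m-%d")

# Illustrative dates: 22 January 2022 is earlier than 2 February 2022.
raw = ["22-01-2022", "02-02-2022"]

# Plain string comparison looks at the day digits first, so the order is wrong.
print(sorted(raw))

# After conversion, lexicographic order and chronological order agree.
print(sorted(to_iso(d) for d in raw))
```

If the dates cannot be re-encoded in the graph, the same conversion could be applied on the client after fetching the results.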
UPDATED based on comment thread 2022-02-23
OK, I think I figured out what you are seeing. I was testing with the Gremlin Console at the 3.5.2 level, whereas you are using Amazon Neptune, which is currently at the 3.4.11 level (but should be moving up to the 3.5.2 level fairly soon). The modified query below, run at the TinkerPop 3.5.2 level, adds unfold() to pull the date out of the single-element list that valueMap() returns.
gremlin> g.withSack(1.0f).
......1> V('v0').
......2> repeat(outE().sack(mult).by('weight').inV()).
......3> until(hasLabel('goal')).
......4> group().by().by(sack()).
......5> unfold().
......6> project('sack', 'map').
......7> by(select(values)).
......8> by(select(keys).valueMap(true)).
......9> order().
.....10> by(select('map').select('date').unfold(),asc)
==>[sack:0.1250,map:[id:v3,label:goal,date:[2022-02-02]]]
==>[sack:0.280,map:[id:v7,label:goal,date:[2022-02-22]]]
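As a client-side fallback, the same unwrap-then-sort idea can be sketched in plain Python. This assumes the rows have already been fetched and are shaped like the output above; the keys are simplified to plain strings (gremlinpython reports id and label under T.id and T.label), and date_of is a hypothetical helper name.

```python
# Rows shaped like the project('sack', 'map') output; values copied from the answer.
rows = [
    {"sack": 0.28, "map": {"id": "v7", "label": "goal", "date": ["2022-02-22"]}},
    {"sack": 0.125, "map": {"id": "v3", "label": "goal", "date": ["2022-02-02"]}},
]

def date_of(row):
    """valueMap() wraps each property value in a list; take the single element."""
    return row["map"]["date"][0]

# Sort ascending by the unwrapped date string, mirroring unfold() in the query.
by_date = sorted(rows, key=date_of)
print([r["map"]["id"] for r in by_date])
```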
I have simplified a decision graph. It starts with a begin vertex and ends with a decision vertex. My aim is to calculate the sum of the scores (each vertex has a score) along the path taken to reach a decision vertex.
The input to Graph is JSON.
Edges between vertices contain variables and values which can be checked from the input JSON.
Example input JSON :{ "age":45,"income_source":"job" }
Output is the sum of the scores [10 + 15 + 22] = 47
In Neo4j a Cypher query allows you to pass JSON input as query parameters but I do not know how this can be done in Gremlin.
Graph link : https://gremlify.com/nwgxqs5h7zh/
g.addV('begin').as('beg').
addV('decision').property('score',0).property('decision_code',"minor").as('dec0').
addV('age').property('score',10).as('age10').
addV('age').property('score',20).as('age20').
addV('salary').property('score',15).as('sal15').
addV('salary').property('score',25).as('sal25').
addV('salary').property('score',18).as('sal18').
addV('salary').property('score',30).as('sal30').
addV('decision').property('score',22).property('decision_code',"decision_22").as('dec22').
addV('decision').property('score',45).property('decision_code',"decision_45").as('dec45').
addV('decision').property('score',18).property('decision_code',"decision_18").as('dec18').
addV('decision').property('score',30).property('decision_code',"decision_30").as('dec30').
addE('relation').property('var',"age").property('val',"").property('min',"10").property('max',"18").from('beg').to('dec0').
addE('relation').property('var',"age").property('val',"").property('min',"19").property('max',"48").from('beg').to('age10').
addE('relation').property('var',"age").property('val',"").property('min',"49").property('max',"80").from('beg').to('age20').
addE('relation').property('var',"income_source").property('val',"job").property('min',"-1").property('max',"-1").from('age10').to('sal15').
addE('relation').property('var',"income_source").property('val',"buisness").property('min',"-1").property('max',"-1").from('age10').to('sal25').
addE('relation').property('var',"income_source").property('val',"job").property('min',"-1").property('max',"-1").from('age20').to('sal18').
addE('relation').property('var',"income_source").property('val',"buisness").property('min',"-1").property('max',"-1").from('age20').to('sal30').
addE('relation').property('var',"").property('val',"").property('min',"-1").property('max',"-1").from('sal15').to('dec22').
addE('relation').property('var',"").property('val',"").property('min',"-1").property('max',"-1").from('sal25').to('dec45').
addE('relation').property('var',"").property('val',"").property('min',"-1").property('max',"-1").from('sal18').to('dec18').
addE('relation').property('var',"").property('val',"").property('min',"-1").property('max',"-1").from('sal30').to('dec30')
There is an issue with the lt, gt, inside, and between predicates: they only accept literal numbers, not a traversal that evaluates to a number.
g.inject(['val1':10,'val2':15]).as('data').V().
where(select('data').select('val1').is(lt(select('data').values('val2'))))
The query above fails with: Cannot compare '10' (Integer) and '[SelectOneStep(last,data), PropertiesStep([val2],value)]'... Because of this issue, the query below also fails.
g.withSack(0).inject(['age':45,'source':'job']).as('data').
V().hasLabel('begin').
repeat(outE().as('e').where(select('data').select(select('e').values('var')).is(eq(select('e').values('val')).or(inside(select('e').values('min'),select('e').values('max'))))).inV().sack(sum).by('score')).
until(hasLabel('decision')).project('final_score','path').by(sack()).by(path())
Please let me know if this problem can be modeled in a different way to achieve the same output score.
Thank you for your time.
I have converted the input JSON to a List. The ordering of the elements in this list is important: it decides which element of the list the traversal compares at each level.
g.withSack(0).
inject(["age", 45, "income_source", "job"]).as("input").
// initialized sack and input List
V().hasLabel("begin").
outE().as('a').local(and(
select("input").unfold().range(0, 1).as("temp").
select("a").values("var").where(eq("temp")), // FILTER property "var"
select("input").unfold().range(1, 2).as("temp").
select("a").values("max").where(gte("temp")).
select("a").values("min").where(lte("temp")))). // FILTER by age from input.
inV().sack(sum).by("score").
outE().as("b").local(and(
select("input").unfold().range(2, 3).as("temp").
select("b").values("var").where(eq("temp")), // FILTER property "var"
select("input").unfold().range(3, 4).as("temp").
select("b").values("val").where(eq("temp")))). // FILTER property val
inV().sack(sum).by("score").
out().sack(sum).by("score").
sack()
You can inject a map into a Gremlin query which essentially has the same shape as your JSON document. The basic building blocks for the first part of the query will look something like this, which I tested using your data and TinkerGraph.
gremlin> g.inject(['age':45,'source':'job']).as('data').
......1> V().hasLabel('begin').
......2> outE().as('e1').
......3> where(gte('e1')).
......4> by(select('data').select('age')).
......5> by('min').
......6> where(lte('e1')).
......7> by(select('data').select('age')).
......8> by('max').
......9> valueMap()
==>[min:19,max:48,var:age]
The next step is to find the edges that have the job tag.
gremlin> g.inject(['age':45,'source':'job']).as('data').
......1> V().hasLabel('begin').
......2> outE().as('e1').
......3> where(gte('e1')).
......4> by(select('data').select('age')).
......5> by('min').
......6> where(lte('e1')).
......7> by(select('data').select('age')).
......8> by('max').
......9> inV().
.....10> outE().as('e2').
.....11> where(eq('e2')).
.....12> by(select('data').select('source')).
.....13> by('val').valueMap()
==>[val:job,var:income_source]
All we need to do now is traverse to the final node and calculate the sum.
gremlin> g.withSack(0).
......1> inject(['age':45,'source':'job']).as('data').
......2> V().hasLabel('begin').
......3> outE().as('e1').
......4> where(gte('e1')).
......5> by(select('data').select('age')).
......6> by('min').
......7> where(lte('e1')).
......8> by(select('data').select('age')).
......9> by('max').
.....10> inV().
.....11> sack(sum).
.....12> by('score').
.....13> outE().as('e2').
.....14> where(eq('e2')).
.....15> by(select('data').select('source')).
.....16> by('val').
.....17> inV().
.....18> sack(sum).
.....19> by('score').
.....20> out().
.....21> sack(sum).
.....22> by('score').
.....23> sack()
==>47
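For readers who want to check the expected score outside the graph, the walk above can be sketched in plain Python. This is only an illustration under stated assumptions: the vertex names reuse the as() labels from the question's data, min and max are stored as integers rather than strings, and the names SCORES, EDGES, DECISIONS, edge_matches and total_score are made up for this sketch.

```python
# Score of each vertex, keyed by the as() label used in the question's data.
SCORES = {
    "dec0": 0, "age10": 10, "age20": 20,
    "sal15": 15, "sal25": 25, "sal18": 18, "sal30": 30,
    "dec22": 22, "dec45": 45, "dec18": 18, "dec30": 30,
}
DECISIONS = {"dec0", "dec22", "dec45", "dec18", "dec30"}

# (from, to, var, val, min, max); min/max are ints here (an assumption).
EDGES = [
    ("beg", "dec0", "age", "", 10, 18),
    ("beg", "age10", "age", "", 19, 48),
    ("beg", "age20", "age", "", 49, 80),
    ("age10", "sal15", "income_source", "job", -1, -1),
    ("age10", "sal25", "income_source", "buisness", -1, -1),
    ("age20", "sal18", "income_source", "job", -1, -1),
    ("age20", "sal30", "income_source", "buisness", -1, -1),
    ("sal15", "dec22", "", "", -1, -1),
    ("sal25", "dec45", "", "", -1, -1),
    ("sal18", "dec18", "", "", -1, -1),
    ("sal30", "dec30", "", "", -1, -1),
]

def edge_matches(edge, data):
    """An edge with no var always matches; numbers use min/max, strings use val."""
    _, _, var, val, low, high = edge
    if var == "":
        return True
    value = data.get(var)
    if isinstance(value, (int, float)):
        return low <= value <= high
    return value == val

def total_score(data, start="beg"):
    """Follow the first matching edge until a decision vertex; sum the scores."""
    total, current = 0, start
    while current not in DECISIONS:
        edge = next((e for e in EDGES if e[0] == current and edge_matches(e, data)), None)
        if edge is None:
            return None  # no edge accepts this input
        current = edge[1]
        total += SCORES[current]
    return total

print(total_score({"age": 45, "income_source": "job"}))
```

Unlike the Gremlin answer, the sketch looks each input value up by the edge's var property, so the input keys must match the var values (income_source rather than source).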
I am pretty new to Gremlin and I am trying to find within my graph the lightest paths from source_id vertex to target_id vertex. (On some edges I have weights and on others I don't)
To get the shortest paths I can use:
g.V()
.has("id", source_id)
.repeat(out().simplePath())
.until(has("id", target_id))
.path()
.limit(3)
.toList()
I have read this reference book
and it suggested to use something like:
g.V()
.has("id", source_id)
.repeat(out().simplePath())
.until(has("id", target_id))
.path()
.by(coalesce(values("weight"), constant(0.0)))
.limit(limit)
.toList()
This is not working and returning the weights of the paths.
How can I achieve this with Gremlin? Should I first get the paths, calculate their weights, and then sort them by weight? There must be an easier, more intuitive way to meet this basic need. (If it were Neo4j, I could just run Dijkstra's algorithm.)
Would appreciate some help here,
thanks
I created the following example graph to help illustrate this answer.
gremlin> g.addV('A').as('a').
......1> addV('B').as('b').
......2> addV('C').as('c').
......3> addV('D').as('d').
......4> addV('E').as('e').
......5> addV('F').as('f').
......6> addV('G').as('g').
......7> addV('H').as('h').
......8> addV('Z').as('z').
......9> addE('knows').from('a').to('b').property('weight',0.2).
.....10> addE('knows').from('a').to('c').property('weight',0.5).
.....11> addE('knows').from('a').to('f').property('weight',3.5).
.....12> addE('knows').from('b').to('c').property('weight',0.1).
.....13> addE('knows').from('c').to('d').property('weight',0.3).
.....14> addE('knows').from('c').to('e').property('weight',0.2).
.....15> addE('knows').from('c').to('f').
.....16> addE('knows').from('d').to('f').property('weight',0.1).
.....17> addE('knows').from('d').to('g').property('weight',2.0).
.....18> addE('knows').from('f').to('g').property('weight',0.9).
.....19> addE('knows').from('f').to('h').property('weight',0.3).
.....20> addE('knows').from('f').to('z').property('weight',0.1).
.....21> addE('knows').from('h').to('z').property('weight',0.2).iterate()
The following routes exist from A to Z (regardless of edge weight).
gremlin> g.V().hasLabel('A').
......1> repeat(outE().inV().simplePath()).until(hasLabel('Z')).path().by(label)
==>[A,knows,F,knows,Z]
==>[A,knows,F,knows,H,knows,Z]
==>[A,knows,C,knows,F,knows,Z]
==>[A,knows,B,knows,C,knows,F,knows,Z]
==>[A,knows,C,knows,D,knows,F,knows,Z]
==>[A,knows,C,knows,F,knows,H,knows,Z]
==>[A,knows,B,knows,C,knows,D,knows,F,knows,Z]
==>[A,knows,B,knows,C,knows,F,knows,H,knows,Z]
==>[A,knows,C,knows,D,knows,F,knows,H,knows,Z]
==>[A,knows,B,knows,C,knows,D,knows,F,knows,H,knows,Z]
Note that one of the edges does not have a weight. We can find the paths with the lightest weights (where no weight is treated as a zero value) as follows:
gremlin> g.withSack(0).V().
......1> hasLabel('A').
......2> repeat(outE().sack(sum).by(coalesce(values('weight'),constant(0))).inV()).
......3> until(hasLabel('Z')).
......4> order().by(sack(),asc).
......5> path().
......6> by(label)
==>[A,knows,B,knows,C,knows,F,knows,Z]
==>[A,knows,C,knows,F,knows,Z]
==>[A,knows,B,knows,C,knows,D,knows,F,knows,Z]
==>[A,knows,B,knows,C,knows,F,knows,H,knows,Z]
==>[A,knows,C,knows,D,knows,F,knows,Z]
==>[A,knows,C,knows,F,knows,H,knows,Z]
==>[A,knows,B,knows,C,knows,D,knows,F,knows,H,knows,Z]
==>[A,knows,C,knows,D,knows,F,knows,H,knows,Z]
==>[A,knows,F,knows,Z]
==>[A,knows,F,knows,H,knows,Z]
Just to prove that things are working as expected we can add the total weight value to each result.
gremlin> g.withSack(0).V().
......1> hasLabel('A').
......2> repeat(outE().sack(sum).by(coalesce(values('weight'),constant(0))).inV()).
......3> until(hasLabel('Z')).
......4> order().by(sack(),asc).
......5> local(
......6> union(
......7> path().
......8> by(label),
......9> sack()).
.....10> fold())
==>[[A,knows,B,knows,C,knows,F,knows,Z],0.4]
==>[[A,knows,C,knows,F,knows,Z],0.6]
==>[[A,knows,B,knows,C,knows,D,knows,F,knows,Z],0.8]
==>[[A,knows,B,knows,C,knows,F,knows,H,knows,Z],0.8]
==>[[A,knows,C,knows,D,knows,F,knows,Z],1.0]
==>[[A,knows,C,knows,F,knows,H,knows,Z],1.0]
==>[[A,knows,B,knows,C,knows,D,knows,F,knows,H,knows,Z],1.2]
==>[[A,knows,C,knows,D,knows,F,knows,H,knows,Z],1.4]
==>[[A,knows,F,knows,Z],3.6]
==>[[A,knows,F,knows,H,knows,Z],4.0]
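To cross-check the ordering outside Gremlin, the same idea (enumerate every simple path, treat a missing weight as zero, sort by total) can be sketched in plain Python. The names EDGES and paths are made up for this sketch, and it is not a replacement for the traversal, only a sanity check on the sample graph above.

```python
# Directed edges of the sample graph; None marks the edge without a weight.
EDGES = {
    ("A", "B"): 0.2, ("A", "C"): 0.5, ("A", "F"): 3.5,
    ("B", "C"): 0.1, ("C", "D"): 0.3, ("C", "E"): 0.2,
    ("C", "F"): None,
    ("D", "F"): 0.1, ("D", "G"): 2.0, ("F", "G"): 0.9,
    ("F", "H"): 0.3, ("F", "Z"): 0.1, ("H", "Z"): 0.2,
}

def paths(src, dst, seen=()):
    """Yield (path, total_weight) for every simple path from src to dst."""
    if src == dst:
        yield [src], 0.0
        return
    visited = seen + (src,)
    for (a, b), weight in EDGES.items():
        if a == src and b not in visited:
            for rest, total in paths(b, dst, visited):
                # A missing weight counts as zero, like coalesce(..., constant(0)).
                yield [src] + rest, (weight or 0.0) + total

# Lightest paths first, like order().by(sack(), asc).
ranked = sorted(paths("A", "Z"), key=lambda item: item[1])
for path, total in ranked:
    print(path, round(total, 2))
```

Exhaustive enumeration is fine for a graph this small; on a large graph the number of simple paths can grow very quickly, which is where an algorithm such as Dijkstra's would be the better fit.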
We have:
g = graph.traversal()
What I want to do is something like:
numV = g.V().count()
g.V().range(0,numV-1).addE('label').to(g.V().range(1,numV))
I want to add an out edge between each pair of adjacent vertices.
In my case, numV is computed by a long piece of code rather than a simple g.V().count(), and the g.V() in the second line will also be replaced by a long traversal, so there are two questions:
How can I avoid pre-computing numV, given that g.V() is already evaluated in the second line of the code?
I tried to simplify the code like this:
g.V().as('a').range(0,numV-1).addE('label').to(select('a').range(1,numV))
but it gives me an error:
The provided traverser does not map to a value
I am kind of new to Gremlin.
Let's start with a small sample graph that consists of foo and bar vertices and sequential IDs.
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.withSack(1).
......1> addV('foo').
......2> property(id, sack()).
......3> emit().
......4> repeat(sack(sum).by(constant(1)).
......5> addV().
......6> property(label, __.constant(['foo','bar']).sample(local, 1).unfold()).
......7> property(id, sack())).
......8> times(20).
......9> valueMap(true)
==>[label:foo,id:1]
==>[label:foo,id:2]
==>[label:bar,id:3]
==>[label:foo,id:4]
==>[label:bar,id:5]
==>[label:bar,id:6]
==>[label:foo,id:7]
==>[label:foo,id:8]
==>[label:foo,id:9]
==>[label:bar,id:10]
==>[label:foo,id:11]
==>[label:bar,id:12]
==>[label:foo,id:13]
==>[label:bar,id:14]
==>[label:bar,id:15]
==>[label:foo,id:16]
==>[label:bar,id:17]
==>[label:bar,id:18]
==>[label:foo,id:19]
==>[label:foo,id:20]
==>[label:bar,id:21]
Now let's say that we want to connect all foo vertices with a baz edge; the order criteria will be the vertex id.
gremlin> g.V().hasLabel('foo').
......1> order().
......2> by(id).as('a').
......3> aggregate('x').
......4> flatMap(select('x').unfold().
......5> where(gt('a')).by(id).
......6> limit(1)).
......7> addE('baz').
......8> from('a')
==>e[0][1-baz->2]
==>e[1][2-baz->4]
==>e[2][4-baz->7]
==>e[3][7-baz->8]
==>e[4][8-baz->9]
==>e[5][9-baz->11]
==>e[6][11-baz->13]
==>e[7][13-baz->16]
==>e[8][16-baz->19]
==>e[9][19-baz->20]
I think that's pretty much the scenario you're looking for, but it's hard to tell without a provided sample graph and an expected outcome.
Also note that you should always have an explicit order criterion (it doesn't have to be the id); otherwise TinkerPop won't guarantee a deterministic order.
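The pairing logic itself (each vertex linked to the next one in an explicit order, with no pre-computed count) can be sketched in plain Python. The function name link_adjacent is made up for this illustration, and the ids are the foo vertices from the sample output above.

```python
def link_adjacent(vertex_ids):
    """Return (from, to) pairs joining each id to the next one in sorted order."""
    ordered = sorted(vertex_ids)  # explicit order criterion, here the id
    # zip stops at the shorter input, so the last id simply gets no successor
    # and no vertex count is needed up front.
    return list(zip(ordered, ordered[1:]))

# The foo vertices from the sample graph.
foo_ids = [1, 2, 4, 7, 8, 9, 11, 13, 16, 19, 20]
pairs = link_adjacent(foo_ids)
print(pairs)
```

The resulting pairs correspond to the baz edges created by the traversal, which reaches the same effect inside the graph with aggregate('x') and where(gt('a')).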