Union step does not work with multiple elements - azure-cosmosdb

The following query returns a user map with an "injected" property called "questions". It works as expected when g.V().has() returns a single user, but not when it returns multiple users:
return g.V().has("user", "userId", 1)
.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(__.select(column.keys))
.by(__.select(column.values));
It works, but when I change the first line to return multiple users:
g.V().hasLabel("user").union(....
I finish the query by calling .toList(), so I was expecting a list of all the users, each in the same shape as the single-user result, but instead I still get a single user.
How can I get my query to work for both a single user and multiple users?

When using Gremlin, you have to think in terms of a stream. The stream contains traversers which travel through the steps you've written. In your case, with your initial test of:
g.V().has("user", "userId", 1)
.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(__.select(column.keys))
.by(__.select(column.values))
you have one traverser (i.e. V().has("user", "userId", 1) produces one user) that flows to union() and is split so that it goes to both valueMap() and project(), each of which produces a Map. You now have two traversers, which are unfolded to a stream of entries and grouped together into one final Map traverser.
So with that in mind, what changes when you use hasLabel("user")? You now have more than one starting traverser, which means that union() produces two traversers for each of those users. They are all flattened to one stream by unfold(), and then they overwrite one another in group() (because they have the same keys) to produce one final Map.
You really want to execute union() and the follow-on operations once per initial "user" vertex traverser. You can tell Gremlin to do that with map():
g.V().has("user", "userId", 1)
.map(
__.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(__.select(column.keys))
.by(__.select(column.values))
)
Finally, you can simplify your final by() modulators as:
g.V().has("user", "userId", 1)
.map(
__.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(keys)
.by(values)
)
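The difference between grouping the whole stream and grouping once per user can be modelled outside Gremlin. The following is a minimal plain-Python sketch, not Gremlin code: the user data and the helper names union_maps and group_entries are made up for illustration, and which user's values survive in the shared-stream case is an artifact of this model (here, the last one). The point is only that one entry per key remains.

```python
# Plain-Python model of the traverser stream. This is NOT Gremlin; the user
# data and helper names below are hypothetical and exist only for illustration.
users = [
    {"userId": 1, "name": "ann", "questions": ["q1", "q2"]},
    {"userId": 2, "name": "bob", "questions": ["q3"]},
]


def union_maps(user):
    """Model union(valueMap(), project('questions')): two maps per user."""
    value_map = {"userId": user["userId"], "name": user["name"]}
    questions = {"questions": user["questions"]}
    return [value_map, questions]


def group_entries(maps):
    """Model unfold().group(): only one entry per key remains."""
    grouped = {}
    for m in maps:
        for key, value in m.items():
            grouped[key] = value  # same key again: the earlier entry is lost
    return grouped


# Without map(): the maps of every user share one stream and one group().
shared_stream = [m for user in users for m in union_maps(user)]
without_map = [group_entries(shared_stream)]

# With map(): the union/unfold/group chain runs once per user.
with_map = [group_entries(union_maps(user)) for user in users]

print(len(without_map), len(with_map))
```

In this model the shared stream collapses to a single map, while the per-user version yields one complete map for each user, which is the behaviour map() is used for above.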

Related

Upsert list of maps (update if exists, insert if not exists) in Gremlin

I have a list of maps which correspond to records that I will receive in streaming, I need to take each record and upsert it.
I have seen many examples that work fine with just one vertex at a time, such as:
g.V('Amy').outE().where(inV().hasId('John')).
fold().
coalesce(
unfold(),
addE('manages').from(V('Amy')).to(V('John'))).
property('duration', '1year')
That works just fine; my issue lies in replicating those steps for each map inside the list.
I am currently using Gremlin on Amazon Neptune Notebooks, if that makes any difference.
At the moment I am able to insert a record if it does not exist, or just retrieve it if it exists: basically a "get or insert" functionality. How can I update each property of each record if it already exists?
My current query for the get or insert:
g.inject([['memshpnum':'13464406186','cmpcod':'LM','upddat':'2019-03-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'7531272','T.id': '7531272#ifl', 'T.label': 'ccp_node_customer', 'status': 'pend'],
['memshpnum':'00170674487','cmpcod':'LM','upddat':'2019-03-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3076059','T.id': '3076059#ifl', 'T.label': 'ccp_node_customer'],
['memshpnum':'20203784496','cmpcod':'LM','upddat':'2019-04-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3075659','T.id': '727745#ifl', 'T.label': 'ccp_node_customer'],
['memshpnum':'20203784498','cmpcod':'LM','upddat':'2019-04-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3076058','T.id': '727365#ifl', 'T.label': 'ccp_node_customer']
]).unfold().as("properties").
where(select("properties").unfold().filter(select(keys).is('T.label').or().is('T.cusnum')).select(values)).fold().
coalesce(unfold(),
addV(select('properties').unfold().filter(select(keys).is('T.label')).select(values)).as("vertex").
property(T.id, select('properties').unfold().filter(select(keys).is('T.id')).select(values)).
sideEffect(select("properties").
unfold().filter(select(keys).is(without('T.label','T.id'))).as("kv").
select("vertex").
property(select("kv").by(keys), select("kv").by(values))
)
)
And amongst the things I have tried is this query:
g.inject([
['memshpnum':'13464406186','cmpcod':'LM','upddat':'2019-03-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'7531272','T.id': '7531272#ifl', 'T.label': 'ccp_node_customer', 'status': 'pend'],
['memshpnum':'00170674487','cmpcod':'LM','upddat':'2019-03-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3076059','T.id': '3076059#ifl', 'T.label': 'ccp_node_customer'],
['memshpnum':'20203784496','cmpcod':'LM','upddat':'2019-04-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3075659','T.id': '727745#ifl', 'T.label': 'ccp_node_customer'],
['memshpnum':'20203784498','cmpcod':'LM','upddat':'2019-04-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3076058','T.id': '727365#ifl', 'T.label': 'ccp_node_customer']
]).unfold().as("properties").
where(select("properties").unfold().filter(select(keys).is('T.label').or().is('T.cusnum')).select(values)).fold().
coalesce(unfold()
.property(select('properties').unfold().filter(select(keys).is(without('T.label','T.id'))).select(keys), select('properties').unfold().filter(select(keys).is(without('T.label','T.id'))).select(values)),
addV(select('properties').unfold().filter(select(keys).is('T.label')).select(values)).as("vertex").
property(T.id, select('properties').unfold().filter(select(keys).is('T.id')).select(values)).
sideEffect(select("properties").
unfold().filter(select(keys).is(without('T.label','T.id'))).as("kv").
select("vertex").
property(select("kv").by(keys), select("kv").by(values))
)
).toList()
However, I get a "Failed to interpret Gremlin query: The provided traverser does not map to a value" error.

Iterate list of values from traversal A in traversal B (Gremlin)

This is my test data:
graph = TinkerGraph.open()
g= graph.traversal()
g.addV('Account').property('id',"0x0").as('a1').
addV('Account').property('id',"0x1").as('a2').
addV('Account').property('id',"0x2").as('a3').
addV('Token').property('address','1').as('tk1').
addV('Token').property('address','2').as('tk2').
addV('Token').property('address','3').as('tk3').
addV('Trx').property('address','1').as('Trx1').
addV('Trx').property('address','1').as('Trx2').
addV('Trx').property('address','3').as('Trx3').
addE('sent').from('a1').to('Trx1').
addE('sent').from('a2').to('Trx2').
addE('received_by').from('Trx1').to('a2').
addE('received_by').from('Trx2').to('a3').
addE('distributes').from('a1').to('tk1').
addE('distributes').from('a1').to('tk2').
addE('distributes').from('a1').to('tk3').
iterate()
I need to first get all the Token addresses using the distributes relationship and then, with those values, loop through a traversal. This is an example of what I need for a single token:
h = g.V().has('Account','id','0x0').next()
token = '1'
g.V(h).
out('sent').has('address',token).as('t1').
out('received_by').as('a2').
out('sent').has('address',token).as('t2').
out('received_by').as('a3').
select('a3','a2'). \
by('id').toList()
This is the output:
[a3:0x2,a2:0x1]
Instead of doing that has('address',token) on each hop I could omit it and just make sure the token address is the same by placing a where('t1',eq('t2')).by('address') at the end of the traversal, but this performs badly given my database design and indexes.
So what I do to iterate is:
tokens = g.V(h).out('distributes').values('address').toList()
finalList = []
for (token in tokens){
finalList.add(g.V(h).
out('sent').has('address',token).
out('received_by').as('a2').
out('sent').has('address',token).
out('received_by').as('a3').
select('a3','a2'). \
by('id').toList())
}
And this is what's stored in finalList at the end:
==>[[a3:0x2,a2:0x1]]
==>[]
==>[]
This works but I was wondering how can I iterate that token list this way without leaving Gremlin and without introducing that for loop. Also, my results contain empty results which is not optimal. The key here for me is to always be able to do that has('address',token) for each hop with the tokens that the Account node has ever sent. Thank you very much.
It is still not entirely clear what you are trying to achieve.
Nevertheless, I think this query does what you need:
g.V().has('Account', 'id', '0x0').as('a').
out('distributes').values('address').as('t').
select('a').
repeat(out('sent').where(values('address').
as('t')).
out('received_by')).
emit()
Example: https://gremlify.com/spwya4itlvd

How to traverse all vertex and get nested objects

I want to get nested objects in the form of
{ country :
{code:'IN',states:
{code:'TG',cities:
{code:'HYD',malls:
{[shopping-mall1],[shopping-mall2],.....}
},
{code:'PKL',malls:
{[shopping-mall1],[shopping-mall2],.....}
}
},
{code:'AP',cities:
{code:'VJY',malls:
{[shopping-mall1],[shopping-mall2],.....}
}
}
}
}
My graph has the following format:
vertex: country ---> states ---> cities ---> malls
edges: ('type','state') ('type','city')
e.g. inE('typeOf').outV().has('type','state') moves to the next vertex, "states".
Next, inE('typeOf').outV().has('type','city') likewise moves to the "city" vertex, and then on to the "malls" vertex.
I tried to write the code, but some vertices have no cities, and in that situation I get an error:
The provided traverser does not map to a value: v[8320]->[JanusGraphVertexStep(IN,[partOf],vertex), HasStep([type.eq(city)]), JanusGraphPropertiesStep([code],value)]
That's why I am using coalesce: some states have no 'inE('partOf').outV().has('type','city')' edge, meaning no city:
.by(coalesce(select('states').inE('partOf').outV().has('type','city'))
My query
g.V().hasLabel('Country').has('code','IN')
.project('country')
.by(project('code','states')
.by(values('code'))
.by(inE('partOf').outV().has('type','state').has('code').as('states').
project('code','cities')
.by(select('states').values('code'))
.by(coalesce(select('states').inE('partOf').outV().
has('type','city').has('code').as('cities').
project('code','malls')
.by(select('cities').values('code'))
.by(coalesce(select('cities').inE('partOf').outV().
has('type','malls').valueMap(),constant(0))),
constant(0)))))
But the result is
{country={code=IN, states={code=DD, cities=0}}}
Here I am getting one state, 'DD', and that state has no city, so it gives 'cities=0'.
Only one state comes back in the result above, but I want all states, with the cities and the malls in each city.
Please update or change the query.
In order to collect all the results you should use the fold() step, which returns a list of everything the traversal produced. Without fold() you get only the first result, as in your example.
In order to keep the types the same, I changed the constant to [] instead of 0.
It was also not clear if the "type" property is on the edge or the vertex. I find it more appropriate to have it on the edge, so I fixed it as well by moving the has('type',...) between the inE() and outV().
Last, you don't need to "store" the traversal using "as" and then "select" it.
This query should give you the required result:
g.V().hasLabel('Country').has('code','IN')
.project('country')
.by(project('code','states')
.by(values('code'))
.by(inE('partOf').has('type','state').outV().has('code')
.project('code','cities')
.by(values('code'))
.by(coalesce(inE('partOf').has('type','city').outV().has('code')
.project('code','malls')
.by(values('code'))
.by(coalesce(
inE('partOf').has('type','malls').outV().valueMap(),
constant([])).fold()),
constant([])).fold())
.fold()))
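The effect of fold() inside a by() modulator can be modelled in a few lines. This is a plain-Python sketch rather than Gremlin, with made-up state and city codes; first_or stands in for a by() that consumes only the first result (with constant(0) as the coalesce fallback), and the list-building version stands in for by(... .fold()).

```python
# Plain-Python model, NOT Gremlin. STATES and CITIES are hypothetical data
# chosen so that the first state ('DD') has no cities.
STATES = {"IN": ["DD", "TG", "AP"]}
CITIES = {"DD": [], "TG": ["HYD", "PKL"], "AP": ["VJY"]}


def first_or(results, fallback):
    """Model by(coalesce(..., constant(fallback))): only the first result is used."""
    return next(iter(results), fallback)


def nested_first(country):
    """Without fold(): each level keeps just its first child."""
    states = (
        {"code": s, "cities": first_or(CITIES[s], 0)} for s in STATES[country]
    )
    return {"code": country, "states": first_or(states, 0)}


def nested_folded(country):
    """With fold(): each level collects all of its children into a list."""
    states = [{"code": s, "cities": list(CITIES[s])} for s in STATES[country]]
    return {"code": country, "states": states}


print(nested_first("IN"))
print(nested_folded("IN"))
```

With this data the first variant stops at the single state 'DD' with cities set to 0, which mirrors the result in the question, while the folded variant returns all three states with an empty list for 'DD'.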

Sort paths based on Edge properties

Sorting traversal paths based on Edge property and Dedup
Hello,
I have an in-memory graph and I want to sort paths based on an edge property, and also dedup paths that lead to the same destination.
E.g.
String NAME = "name";
String id = "id";
g.addV().property(id, 1).property(NAME, "u1").as("u1")
.addV().property(id, 2).property(NAME, "u2").as("u2")
.addV().property(id, 3).property(NAME, "u3").as("u3")
.addV().property(id, 4).property(NAME, "u4").as("u4")
.addE(rel).from("u2").to("u1").property("order", 2)
.addE(rel).from("u3").to("u1").property("order", 1)
.addE(rel).from("u4").to("u2").property("order", 3)
.addE(rel).from("u4").to("u3").property("order", 4)
.iterate();
What I'm trying to achieve is a traversal which gives me only one path i.e.
vertices = [path[u1, u3, u4]].
I tried using below gremlin.
List<Path> maps = g.V()
.has("id", 1)
.repeat(in()
.simplePath())
.until(inE().count().is(0))
.order().by(outE("rel").values("order"),Order.asc)
.path().by("name")
.toList();
However, the sorting doesn't happen. It gives me two paths:
vertices = [path[u1, u2, u4], path[u1, u3, u4]]
But I'm looking for the output vertices = [path[u1, u3, u4]].
I'm new to Gremlin and have run out of options to try.
Can someone help?
g.V()
.has("id", 1)
.repeat(in("rel") .order() .by(outE().values("order"), Order.asc) .simplePath() )
.until(inE().count().is(0))
.dedup()
.path()
.by("name")
.toList() ;
Using toList() will give you all the possible results of the traversal. In your case you ordered the results but didn't take only the first one.
You should add a limit step:
...
.limit(1).toList()
Or you can use next() instead of toList().

How to filter certain nodes to discover paths

I've been playing around with the Movie Graph dataset and I would like to find the shortest path between two actors but omitting the movie nodes (nodes either can have the label Person or Movie).
This query returns the shortest path from Kevin Bacon to Meg Ryan:
MATCH p=shortestPath((bacon:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"}))
RETURN p
I now want to exclude the movie nodes, but how? This is what I've come up with, but it doesn't yield any results, unfortunately:
MATCH path=shortestPath( (bacon:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"}) )
WITH nodes(path) AS ns
WHERE
ALL(node IN ns
WHERE NOT node:Movie)
RETURN ns AS path_without_movies;
The query is executed, but only with this result:
"(no changes, no records)".
Any idea how I can improve the query?
To filter nodes on the path by label, you can modify your query as follows:
MATCH (bacon:Person {name:"Kevin Bacon"}), (meg:Person {name:"Meg Ryan"})
MATCH path=shortestPath( (bacon)-[*]-(meg) )
WHERE
ALL(node IN nodes(path)
WHERE NOT 'Movie' IN labels(node))
RETURN path AS path_without_movies;
OR
MATCH (bacon:Person {name:"Kevin Bacon"}), (meg:Person {name:"Meg Ryan"})
MATCH path=shortestPath( (bacon)-[*]-(meg) )
WHERE
ALL(node IN nodes(path)
WHERE NOT node:Movie)
RETURN path AS path_without_movies;
The problem with your second query is that the keyword WITH creates a logical partition in the query.
So if you removed the line WITH nodes(path) AS ns, the following WHERE would be applied during the match. With that line, Cypher first finds the results of the match and then removes results from that list. (Normally this difference isn't noticeable, but shortestPath reduces the results, which changes the final outcome.)
As Raj's answer points out, you can just move the extraction (nodes(path)) from the WITH to the ALL to avoid the partition.
MATCH path=shortestPath( (bacon:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"}) )
WHERE
ALL(node IN nodes(path)
WHERE NOT node:Movie)
RETURN nodes(path) AS path_without_movies;
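The difference between filtering after the match and filtering during the match can be illustrated with a small breadth-first search. This is a plain-Python sketch under stated assumptions, not Cypher and not how Neo4j is implemented: the toy graph, its node names, and the allowed parameter are all hypothetical, and it only models the behaviour described above.

```python
from collections import deque

# Hypothetical toy graph: one short route through a Movie node and one
# longer route through Person nodes only.
LABELS = {"bacon": "Person", "meg": "Person", "film": "Movie",
          "p1": "Person", "p2": "Person"}
EDGES = {
    "bacon": ["film", "p1"],
    "film": ["bacon", "meg"],
    "p1": ["bacon", "p2"],
    "p2": ["p1", "meg"],
    "meg": ["film", "p2"],
}


def shortest_path(start, goal, allowed=lambda node: True):
    """Breadth-first search that only steps onto nodes accepted by `allowed`."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in EDGES[path[-1]]:
            if nxt not in seen and allowed(nxt):
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def no_movie(node):
    return LABELS[node] != "Movie"


# WITH-style: find the overall shortest path first, then filter that one result.
path = shortest_path("bacon", "meg")
filtered_after = [path] if path and all(no_movie(n) for n in path) else []

# WHERE-on-match style: the predicate constrains the search itself.
filtered_during = shortest_path("bacon", "meg", allowed=no_movie)

print(filtered_after)
print(filtered_during)
```

In this model the filter-afterwards variant returns nothing, because the single shortest path runs through the Movie node, while the constrained search finds the longer Person-only route.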

Resources