Gremlin on Cosmos DB: querying for a list of nodes from a list of nodes and returning both

graph image
(Query to find owner)
.inE().hasLabel('OwnedBy').outV().not(inE().hasLabel('AssignedTo').has('Status', 'InUse'))
.not(
inE()
.hasLabel('AssignedTo')
.has('Status', 'InUse')
).as('cards')
.inE()
.hasLabel('AssignedTo')
.has('Status', 'FutureUse')
.as('OwnedByRequestEdges')
.outV()
.as('OwnedByRequests')
.Select('card', 'OwnedByRequests', 'OwnedByRequestEdges', 'Owner')
What I really want is for it to give me the list of cards and the list of requests.
A user can have multiple cards, and a card can have multiple future reservations.

In order to store all the values during traversal, you should use "store" and not "as".
Since you want the "select" to run once, you need to add fold() before it.
Your query also applied the same "not" filter twice, so the redundant copy is removed below.
(Query to find owner)
.inE().hasLabel('OwnedBy').outV()
.not(inE().hasLabel('AssignedTo').has('Status','InUse'))
.store('Cards')
.inE().hasLabel('AssignedTo').has('Status', 'FutureUse').outV()
.store('OwnedByRequests')
.fold()
.select('Cards', 'OwnedByRequests')

Related

Gremlin query - how to eliminate nested coalesce

I have person vertex, has_vehicle edge and vehicle vertex which models vehicle ownership use case. The graph path is person -> has_vehicle -> vehicle.
I want to implement a Gremlin query which associates a vehicle to a person only if
The person does not have a vehicle
AND
The input vehicle is not associated with a person yet.
I followed the fold-coalesce-unfold pattern and came up with the following Gremlin query, which uses a nested coalesce:
g.V().hasLabel('person').has('name', 'Tom').as('Tom').outE('has_vehicle').fold().coalesce(
__.unfold(), // check if Tom already have a vehicle
g.V().has('vehicle', 123).as('Vehicle').inE('has_vehicle').fold().coalesce(
__.unfold(), // check if vehicle 123 is already associated with a person
__.addE('has_vehicle').from('Tom').to('Vehicle') // associate the vehicle to Tom
)
)
Is there a way to eliminate the nested coalesce? If I have multiple criteria, it would be too complex to write the query.
This might be a case where a couple of where(not(...)) patterns work better than nesting coalesce steps. For example, we might change the query as shown below.
g.V().hasLabel('person').has('name', 'Tom').as('Tom').
where(not(outE('has_vehicle'))).
V().has('vehicle', 123).as('Vehicle').
where(not(inE('has_vehicle'))).
addE('has_vehicle').from('Tom').to('Vehicle')
So long as the V steps do not fan out and yield multiple Tom or Vehicle nodes that should work and is easy to extend by adding more to the where filters as needed.
As a side note, the not steps used above should work even if not wrapped by where steps, but I tend to find it just reads better as written.
This rewrite does assume that you can tolerate the case where Tom already has a car and the query just ends there. In that case no vertex or edge will be returned. If you ran the query with toList, you would get an empty list back, indicating that nothing was done.

How to generate recommendations for a User using Gremlin?

I am using Gremlin on an AWS Neptune database to generate recommendations for a user to try new food items. The problem I am facing is that the recommendations need to be in the same cuisine that the user likes.
We are given three types of nodes: "User", "the cuisine he likes" and "the category of the cuisine" that it belongs to.
In the picture above, the recommendations for "User 2" would be "Node 1" and "Node 2". However "Node 1" belongs to a different category which is why we cannot recommend that node to "User2". We can only recommend "Node 2" to the user since that is the only node that belongs to the same category as the user likes. How do I write a gremlin query to achieve the same?
Note: each user can have multiple nodes, and these nodes can belong to multiple categories.
Here's a sample dataset that we can use:
g.addV('user').property('name','ben').as('b')
.addV('user').property('name','sally').as('s')
.addV('food').property('foodname','chicken marsala').as('fvm')
.addV('food').property('foodname','shrimp diavolo').as('fsd')
.addV('food').property('foodname','kung pao chicken').as('fkpc')
.addV('food').property('foodname','mongolian beef').as('fmb')
.addV('cuisine').property('type','italian').as('ci')
.addV('cuisine').property('type','chinese').as('cc')
.addE('hasCuisine').from('fvm').to('ci')
.addE('hasCuisine').from('fsd').to('ci')
.addE('hasCuisine').from('fkpc').to('cc')
.addE('hasCuisine').from('fmb').to('cc')
.addE('eats').from('b').to('fvm')
.addE('eats').from('b').to('fsd')
.addE('eats').from('b').to('fkpc')
.addE('eats').from('b').to('fmb')
.addE('eats').from('s').to('fmb')
Let's start with the user Sally...
g.V().has('name','sally').
Then we want to find all food item nodes that Sally likes.
(Note: It is best to add edge labels to your edges here to help with navigation.)
Let's call the edge from a user to a food item, "eats". Let's also assume that the direction of the edge (they must have a direction) goes from a user to a food item. So let's traverse to all foods that they like. We'll save this to a temporary list called 'liked' that we'll use later in the query to filter out the foods that Sally already likes.
.out('eats').aggregate('liked').
From this point in the graph, we need to diverge and fetch two downstream pieces of data. First, we want to go fetch the cuisines related to food items that Sally likes. We want to "hold our place" in the graph while we go fetch these items, so we use the sideEffect() step which allows us to go do something but come back to where we currently are in the graph to continue our traversal.
sideEffect(
out('hasCuisine').
dedup().
aggregate('cuisineschosen')).
Inside the sideEffect() we want to traverse from food items to cuisines, deduplicate the list of related cuisines, and save the list of cuisines in a temporary list called 'cuisineschosen'.
Once we fetch the cuisines, we'll come back to where we were previously at the food items. We now want to go find the related users to Sally based on common food items. We also want to make sure we're not traversing back to Sally, so we'll use simplePath() here. simplePath() tells the query to ignore cycles.
in('eats').
simplePath().
From here we want to find all food items that our related users like and only return the ones with a cuisine that Sally already likes. We also remove the foods that Sally already likes.
out('eats').
where(without('liked')).
where(
out('hasCuisine').
where(
within('cuisineschosen'))).
values('foodname')
NOTE: You may also want to add a dedup() here after out('eats') to only return a distinct list of food items.
Putting it all together...
g.V().has('name','sally').
out('eats').aggregate('liked').
sideEffect(
out('hasCuisine').
dedup().
aggregate('cuisineschosen')).
in('eats').
simplePath().
out('eats').
where(without('liked')).
where(
out('hasCuisine').
where(
within('cuisineschosen'))).
values('foodname')
Results:
['kung pao chicken']
At scale, you may need to use the sample() or coin() steps in Gremlin when finding related users as this can fan out really fast. Query performance is going to be based on how many objects each query needs to traverse.
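To make the filtering logic easier to follow, here is a minimal sketch of the same steps in plain Python over the sample dataset. It is not Gremlin and does not talk to Neptune; the dictionaries `eats` and `has_cuisine` and the function `recommend` are hypothetical names used only for illustration. Unlike the query above, it also deduplicates the result.

```python
# Plain-Python model of the sample graph (illustrative only, not Gremlin).
# user -> foods reached via 'eats' edges
eats = {
    "ben": ["chicken marsala", "shrimp diavolo", "kung pao chicken", "mongolian beef"],
    "sally": ["mongolian beef"],
}
# food -> cuisine reached via 'hasCuisine' edges
has_cuisine = {
    "chicken marsala": "italian",
    "shrimp diavolo": "italian",
    "kung pao chicken": "chinese",
    "mongolian beef": "chinese",
}


def recommend(user):
    liked = set(eats[user])                              # aggregate('liked')
    cuisines_chosen = {has_cuisine[f] for f in liked}    # aggregate('cuisineschosen')
    # in('eats') + simplePath(): other users who share at least one liked food
    related = [u for u, foods in eats.items() if u != user and liked & set(foods)]
    recommendations = []
    for other in related:
        for food in eats[other]:                         # out('eats')
            if food in liked:                            # where(without('liked'))
                continue
            if has_cuisine[food] not in cuisines_chosen: # where(out('hasCuisine')...)
                continue
            if food not in recommendations:              # dedup()
                recommendations.append(food)
    return recommendations


print(recommend("sally"))
```

For Sally this yields the same single item as the Gremlin query, because Ben is the only related user and kung pao chicken is the only Chinese dish she has not already tried.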

Upsert fails when using as() and coalesce()

I'm trying to create an upsert traversal in Gremlin. Update an edge if it exists, otherwise add a new edge.
g.V("123")
.as("user")
.V("456")
.as("post")
.inE("like")
.fold()
.coalesce(
__.unfold()
.property("likeCount", 1),
__.addE("like")
.from("user")
.to("post")
)
This returns an error.
The provided traverser does not map to a value: []->[SelectOneStep(last,post)]
I've narrowed this down to the to("post") step. From within coalesce it can't see post from as("post"). It is also unable to see user.
This is strange to me because the following does work:
g.V("123")
.as("user")
.V("456")
.as("post")
.choose(
__.inE("like"),
__.inE("like")
.property("likeCount", 1),
__.addE("like")
.from("user")
.to("post")
)
From within the choose() step I do have access to user and post.
I'd like to use the more efficient upsert pattern but can't get past this issue. I could just look up the user and post from within coalesce like so:
g.V("123")
.as("user")
.V("456")
.as("post")
.inE("like")
.fold()
.coalesce(
__.unfold()
.property("likeCount", 1),
__.V("456")
.as("post")
.V("123")
.addE("like")
.to("post")
)
But repeating that traversal seems inefficient. I need post and user in the outer traversal for other reasons.
Why can't I access user and post from within a coalesce in my first example?
The issue you are running into is that as soon as you hit the fold() step in your code you lose the path history, which means the traversal no longer knows what user or post refer to. fold() is what is known as a ReducingBarrierStep, which means that many results are collected into a single result. The way I think about it is that because you have converted many results to one, anything like aliases that were added (e.g. user and post) no longer really have meaning, as they have all been collected into a single element.
However you can rewrite your query as shown here to achieve the desired result:
g.V("456")
.inE("like")
.fold()
.coalesce(
__.unfold()
.property("likeCount", 1),
__.addE("like")
.from(V("123"))
.to(V("456"))
)
I am also not sure whether you meant to set the like count only on an existing edge, or whether you wanted to set it on the edge in either case, which would look like this:
g.V("456")
.inE("like")
.fold()
.coalesce(
__.unfold(),
__.addE("like")
.from(V("123"))
.to(V("456"))
).property("likeCount", 1)
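The effect of fold() on step labels can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not how TinkerPop is implemented; `Traverser`, `as_`, `fold` and `select` are hypothetical helpers written for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Traverser:
    value: object
    labels: dict = field(default_factory=dict)  # step label -> labelled value


def as_(traversers, label):
    # Like as('label'): remember the current value under that label.
    return [Traverser(t.value, {**t.labels, label: t.value}) for t in traversers]


def fold(traversers):
    # Reducing barrier: many traversers collapse into ONE new traverser.
    # The new traverser starts with no labels, so the history is gone.
    return [Traverser([t.value for t in traversers])]


def select(traverser, label):
    # Like to('post') or from('user'): look up a previously labelled value.
    if label not in traverser.labels:
        raise KeyError(f"traverser does not map to a value for label {label!r}")
    return traverser.labels[label]


user = as_([Traverser("v123")], "user")
post = as_([Traverser("v456", user[0].labels)], "post")
print(select(post[0], "user"))   # labels are still visible before fold()

like_edges = []                  # inE('like') found nothing
folded = fold(like_edges)
try:
    select(folded[0], "post")
except KeyError as err:
    print("after fold():", err)
```

In this model the lookup fails after fold() for the same reason the original query does: the single folded traverser never carried the user and post labels.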

Order results by number of coincidences in edge properties

I'm working on a recommendation system that recommends other users. The first results should be the users most "similar" to the "searcher" user. Users respond to questions, and the number of questions two users answered in the same way is their degree of similarity.
The problem is that I don't know how to write the query.
In technical terms, I need to sort the users by the number of edges that have specific property values. I tried the following query, which I thought should work, but it doesn't:
let query = g.V().hasLabel('user');
let search = __;
for (const question of searcher.questions) {
search = search.outE('response')
.has('questionId', question.questionId)
.has('answerId', question.answerId)
.aggregate('x')
.cap('x')
}
query = query.order().by(search.unfold().count(), order.asc);
Throws this gremlin internal error:
org.apache.tinkerpop.gremlin.process.traversal.step.util.BulkSet cannot be cast to org.apache.tinkerpop.gremlin.structure.Vertex
I also tried with multiple .by() for each question, but the result was not ordered by the amount of coincidence.
How can I write this query?
When you cap() an aggregate() it returns a BulkSet, which is a Set that keeps a count of how many times each object occurs in it. When you iterate through it, it behaves like a List, repeating each object as many times as its count. You get your error because the output of cap('x') is a BulkSet, and since you are building search in a loop, the next iteration is basically calling outE('response') on that BulkSet. That is not valid, because outE() expects a Vertex, as indicated by the error.
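As a rough analogy only, Python's collections.Counter behaves much like the BulkSet described above: it stores one entry per distinct object plus a count, and can be unrolled back into a flat sequence. The edge names below are made up for the example.

```python
from collections import Counter

# A Counter keeps one entry per distinct object together with its count.
bulk = Counter()
for edge in ["e1", "e2", "e1"]:
    bulk[edge] += 1

distinct = len(bulk)               # number of distinct objects
total = sum(bulk.values())         # number of objects once counts are applied
unrolled = list(bulk.elements())   # each object repeated 'count' times

print(distinct, total, unrolled)
```

The point of the analogy is that the container is one object holding edges, not a vertex, so a step that navigates from a vertex has nothing to work with.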
I think you would prefer something more like:
let query = g.V().hasLabel('user').
outE('response');
let search = [];
for (const question of searcher.questions) {
search.push(has('questionId', question.questionId).
has('answerId', question.answerId));
}
query = query.or(...search).
groupCount().
by(outV()).
order(local).by(values, asc)
I may not have the javascript syntax exactly right (and I used spread syntax in my or() to just convey the idea quickly of what needs to happen) but basically the idea here is to filter edges that match your question criteria and then use groupCount() to count up those edges.
If you need to count users who have no connection then perhaps you could switch to project() - maybe like:
let query = g.V().hasLabel('user').
project('user','count').
by();
let search = [];
for (const question of searcher.questions) {
search.push(has('questionId', question.questionId).
has('answerId', question.answerId));
}
query = query.by(outE('response').or(...search).count()).
order().by('count', asc);
fwiw, I think you might consider a different schema for your data that might make this recommendation algorithm a bit more graph-like. A thought might be to make the question/answer a vertex (a "qa" label perhaps) and have edges go from the user vertex to the "qa" vertex. Then users directly link to the question/answers they gave. You can easily see by way of edges, a direct relationship, which users gave the same question/answer combination. That change allows the query to flow much more naturally when asking the question, "What users answered questions in the same way user 'A' did?"
g.V().has('person','name','A').
out('responds').
in('responds').
groupCount().
order(local).by(values)
With that change we can get rid of all those has() filters, because they are implied by the "responds" edges, which encode them in the graph data itself.
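The counting that the out('responds').in('responds').groupCount() traversal performs can be sketched in plain Python. The data and the names `responds` and `similar_users` are invented for illustration, and each (question, answer) tuple stands in for a "qa" vertex. As in the query above, the starting user is counted too.

```python
from collections import Counter

# user -> set of (question, answer) "qa" vertices the user responded with
responds = {
    "A": {("q1", "a1"), ("q2", "a2"), ("q3", "a1")},
    "B": {("q1", "a1"), ("q2", "a2")},
    "C": {("q1", "a1"), ("q2", "a3")},
    "D": {("q1", "a2")},
}


def similar_users(name):
    counts = Counter()
    for qa in responds[name]:                    # out('responds')
        for other, answers in responds.items():  # in('responds')
            if qa in answers:
                counts[other] += 1               # groupCount()
    # order(local).by(values): ascending by number of shared answers
    return sorted(counts.items(), key=lambda kv: kv[1])


print(similar_users("A"))
```

User D never appears in the result because D shares no question/answer vertex with A, which is the case the project() variant earlier is meant to cover.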

How to get a path from one node to another including all other nodes and relationships involved in between

I have designed a model in Neo4j in order to get paths from one station to another, including the platforms and legs involved. The model is depicted below. Basically, I need a query that takes me from NBW to RD and also shows the platforms and legs involved. I am struggling with the query and get no result. I would appreciate any help.
Here is my cypher statement:
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]->(c:Station)
WHERE (a.name='NBW')
AND c.name='RD'
RETURN p
Model:
As mentioned in the comments, in Cypher you can't use a directed variable-length relationship that uses differing directions for some of the relationships.
However, APOC Procedures just added the ability to expand based on sequences of relationships. You can give this a try:
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='THT'
CALL apoc.path.expandConfig(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>, can_board>, goto>, can_alight>, <has_platform'}) YIELD path
RETURN path
I added a limit so that only the first (and shortest) path to your end station will be returned. Removing the limit isn't advisable, since this will continue to repeat the relationships in the expansion, going from station to station, until it finds all possible ways to get to your end station, which could hang your query.
EDIT
Regarding the new model changes, the reason the above will not work is because relationship sequences can't contain a variable-length sequence within them. You have 2 goto> relationships to traverse, but only one is specified in the sequence.
Here's an alternative that doesn't use sequences, just a whitelisting of allowed relationships. The spanningTree() procedure uses NODE_GLOBAL uniqueness so there will only be a single unique path to each node found (paths will not backtrack or revisit previously-visited nodes).
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='RD'
CALL apoc.path.spanningTree(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>|can_board>|goto>|can_alight>|<has_platform'}) YIELD path
RETURN path
Your query is directed --> and not all of the relationships between your two stations run in the same direction. If you remove the relationship direction you will get a result.
Once you have a result, something like the query below could point you in the right direction for extracting particular details from the resulting path.
Essentially, I am assuming that everything you are interested in is in the returned path, and that you just need to filter out the different pieces.
As @InverseFalcon points out, this query should be limited in a larger graph or it could easily run away.
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]-(c:Station)
WHERE (a.name='NBW')
AND c.name='THT'
RETURN [n IN nodes(p) WHERE 'Platform' IN labels(n)] AS Platforms

Resources