How to get a path from one node to another including all other nodes and relationships involved in between - graph

I have designed a model in Neo4j in order to get paths from one station to another, including the platforms and legs involved. The model is depicted below. Basically, I need a query that takes me from NBW to RD and also shows the platforms and legs involved. I am struggling with the query: I get no result. I would appreciate any help.
Here is my cypher statement:
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]->(c:Station)
WHERE (a.name='NBW')
AND c.name='RD'
RETURN p
Model:

As mentioned in the comments, in Cypher you can't use a directed variable-length relationship pattern when some of the relationships on the path point in the opposite direction.
However, APOC Procedures just added the ability to expand based on sequences of relationships. You can give this a try:
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='THT'
CALL apoc.path.expandConfig(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>, can_board>, goto>, can_alight>, <has_platform'}) YIELD path
RETURN path
I added a limit so that only the first (and shortest) path to your end station will be returned. Removing the limit isn't advisable, since this will continue to repeat the relationships in the expansion, going from station to station, until it finds all possible ways to get to your end station, which could hang your query.
EDIT
Regarding the new model changes, the above will not work because relationship sequences can't contain a variable-length sequence within them. You have 2 goto> relationships to traverse, but only one is specified in the sequence.
Here's an alternative that doesn't use sequences, just a whitelisting of allowed relationships. The spanningTree() procedure uses NODE_GLOBAL uniqueness so there will only be a single unique path to each node found (paths will not backtrack or revisit previously-visited nodes).
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='RD'
CALL apoc.path.spanningTree(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>|can_board>|goto>|can_alight>|<has_platform'}) YIELD path
RETURN path
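
To make the effect of the direction-aware relationship filter and of NODE_GLOBAL uniqueness more concrete, here is a minimal plain-Python sketch. It does not call Neo4j or APOC. The node names (NBW_P1, Leg1, Leg2, RD_P1) and the edge list are hypothetical stand-ins for the model in the question, and the breadth-first search only approximates what spanningTree() does.

```python
from collections import deque

# Hypothetical toy data: (source, relationship type, target) triples.
EDGES = [
    ("NBW", "has_platform", "NBW_P1"),
    ("NBW_P1", "can_board", "Leg1"),
    ("Leg1", "goto", "Leg2"),
    ("Leg2", "can_alight", "RD_P1"),
    ("RD", "has_platform", "RD_P1"),
]

# Allowed (type, direction) pairs, mirroring the filter
# 'has_platform>|can_board>|goto>|can_alight>|<has_platform'.
BOTH_WAYS = {
    ("has_platform", ">"), ("can_board", ">"), ("goto", ">"),
    ("can_alight", ">"), ("has_platform", "<"),
}
# Same types, but outgoing only (like a purely directed pattern).
OUTGOING_ONLY = {pair for pair in BOTH_WAYS if pair[1] == ">"}


def neighbours(node, allowed):
    """Yield nodes reachable over one allowed relationship."""
    for src, rel, dst in EDGES:
        if src == node and (rel, ">") in allowed:
            yield dst
        if dst == node and (rel, "<") in allowed:
            yield src


def first_path(start, end, allowed=BOTH_WAYS):
    """Breadth-first search that visits every node at most once."""
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in neighbours(path[-1], allowed):
            if nxt not in visited:  # global node uniqueness
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(first_path("NBW", "RD"))
print(first_path("NBW", "RD", OUTGOING_ONLY))
```

With the incoming has_platform direction allowed, the search reaches RD through its platform. With outgoing relationships only, it stops at RD_P1 and finds no path, which is the situation the original directed query runs into.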

Your query is directed (-->), and not all of the relationships between your two stations run in the same direction. If you remove the relationship direction you will get a result.
Once you have a result, something like the query below could point you in the right direction for extracting particular details from the returned path.
Essentially, I am assuming that everything you are interested in is in the returned path, so you just need to filter out the pieces you want.
As @InverseFalcon points out, this query should be limited in a larger graph or it could easily run away.
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platform*0..]-(c:Station)
WHERE a.name = 'NBW'
  AND c.name = 'THT'
// List comprehension; the filter() function was removed in Neo4j 4.0.
RETURN [n IN nodes(p) WHERE 'Platform' IN labels(n)] AS Platforms

Related

Gremlin query - how to eliminate nested coalesce

I have person vertex, has_vehicle edge and vehicle vertex which models vehicle ownership use case. The graph path is person -> has_vehicle -> vehicle.
I want to implement a Gremlin query which associates a vehicle with a person only if:
The person does not have a vehicle
AND
The input vehicle is not associated with a person yet.
I followed the fold-coalesce-unfold pattern and came up with the following Gremlin query, which uses a nested coalesce:
g.V().hasLabel('person').has('name', 'Tom').as('Tom').outE('has_vehicle').fold().coalesce(
  __.unfold(), // check if Tom already has a vehicle
  g.V().has('vehicle', 123).as('Vehicle').inE('has_vehicle').fold().coalesce(
    __.unfold(), // check if vehicle 123 is already associated with a person
    __.addE('has_vehicle').from('Tom').to('Vehicle') // associate the vehicle to Tom
  )
)
Is there a way to eliminate the nested coalesce? If I have multiple criteria, it would be too complex to write the query.
This might be a case where a couple of where(not(...)) patterns work better than nested coalesce steps. For example, we might change the query as shown below.
g.V().hasLabel('person').has('name', 'Tom').as('Tom').
where(not(outE('has_vehicle'))).
V().has('vehicle', 123).as('Vehicle').
where(not(inE('has_vehicle'))).
addE('has_vehicle').from('Tom').to('Vehicle')
So long as the V steps do not fan out and yield multiple Tom or Vehicle nodes that should work and is easy to extend by adding more to the where filters as needed.
As a side note, the not steps used above should work even if not wrapped by where steps, but I find it reads better as written.
This rewrite assumes that you can tolerate the case where Tom already has a car and the query simply ends there. In that case no vertex or edge is returned; if you ran the query with toList, you would get an empty list back, indicating that nothing was done.
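
The behaviour of the chained where(not(...)) guards can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not Gremlin: the edge list and the associate() helper are hypothetical, and each early return stands for a traverser being filtered out.

```python
def associate(edges, person, vehicle):
    """Add a has_vehicle edge only if neither endpoint already has one.

    `edges` is a list of (person, vehicle) pairs. An empty list is
    returned when a guard fails, like an empty toList() result.
    """
    # where(not(outE('has_vehicle'))): the person owns nothing yet
    if any(p == person for p, _ in edges):
        return []
    # where(not(inE('has_vehicle'))): the vehicle has no owner yet
    if any(v == vehicle for _, v in edges):
        return []
    edge = (person, vehicle)
    edges.append(edge)
    return [edge]


edges = []
print(associate(edges, "Tom", 123))  # edge is created
print(associate(edges, "Tom", 456))  # Tom already has a vehicle
print(associate(edges, "Ann", 123))  # vehicle 123 is already taken
```

Adding a further criterion means adding one more guard to the chain rather than one more level of nesting.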

Neo4j: How to return a single path for each pair of nodes that have multiple relationships

Assuming a graph like this:
(Thanks to https://neo4j.com/blog/neo4j-2-0-ga-graphs-for-everyone/ )
(Not shown but assume all countries, all artists, and all recording contracts are in the graph)
What would the CYPHER be for:
Starting with United Kingdom, return one path for each country where there is at least one recording contract
It doesn't matter which path is returned, just that it's a single path
Should return (United Kingdom)<-[]-(Iron Maiden)-[]->(Epic)-[]->(United States), but not (United Kingdom)<-[]-(Hybrid Theory)-[]->(Mad Decent)-[]->(United States) or (United Kingdom)<-[]-(Iron Maiden)-[]->(Columbia)-[]->(United States), for example
Return a single path for each of any two countries that are connected
Should return one path for (United Kingdom)-[]-(United States), one for (Japan)-[]-(Canada), etc. Bonus points for LIMIT 20 limiting it to either 20 paths or 20 country nodes
Also does not matter which path is returned, just that it's a single path
Edit: I've tried various combinations of MATCH (c1:Country)-[]-(c2:Country), MATCH p=((c1:Country)-[]-(c2:Country)), WITH, and UNWIND. I've also tried to use FOREACH to return only one path, but can't quite get the formula right.
This is easier if you are using subqueries (Neo4j 4.1.x or higher). That's because the subquery can help scope the operations you need to perform (collect(), in this case) to expansions and work from a single country, per country, instead of having to perform it across all rows for the entirety of the query, which could stress the heap.
In reality, since the number of countries is low, it won't be a problem, but it's a good approach to use when dealing with larger sets of nodes.
MATCH (country:Country)
CALL {
  WITH country
  MATCH path = (country)<-[:FROM_AREA]-(:Artist)-[:RECORDING_CONTRACT]->(:Label)-[:FROM_AREA]->(other:Country)
  WHERE id(country) < id(other)
  RETURN other, collect(path)[0] as path
  LIMIT 20
}
RETURN country, path
LIMIT 20
Let's look at what this is doing.
We MATCH to :Country nodes.
Per country we will MATCH to the pattern you're looking for. If these are the only such paths and labels in the graph, then you can omit the labels in the pattern, as the relationship types should be enough to find the correct nodes.
The WHERE id(country) < id(other) is here to prevent mirrored results. For example, in the course of the query if we find a path from (United Kingdom)-[*]-(United States), and we also find a path the other direction, for (United States)-[*]-(United Kingdom), you probably don't want to return both. So we place a restriction on the graph ids so that only one of these will meet the restriction, and the mirrored result gets filtered out.
We use RETURN other, collect(path)[0] as path to get a single path per the country and other nodes. Remember that this is happening inside a subquery being called per country node, so even though country is not present here, this operation is being performed for a specific country node.
When we aggregate (such as with this collect(path)), the grouping key (usually the non-aggregation variables) becomes distinct. So for each pair of country and other, this collects all the paths between them and then takes the first of that list, which gives us a single path between two distinct countries.
We LIMIT the subquery results to 20, since we know in total we don't want more than 20 paths, so per country we don't want more than 20 paths either. This might be a bit redundant for this case, but when the query is more complex it is the right approach to make sure you're not doing more work than is needed.
We also have another LIMIT outside the subquery, so that if there are only a few countries processed, with a few paths per country, the total paths won't exceed 20.
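
The combination of the id comparison and collect(path)[0] can be sketched in plain Python. The rows, the numeric ids and the path strings below are hypothetical stand-ins for what the MATCH might produce, and the dictionary plays the role of the grouping key.

```python
# Hypothetical rows: (id(country), id(other), path description).
ROWS = [
    (1, 2, "UK - Iron Maiden - Epic - US"),
    (1, 2, "UK - Iron Maiden - Columbia - US"),
    (2, 1, "US - ... - UK"),       # mirrored pair, filtered out
    (3, 4, "Japan - ... - Canada"),
]


def one_path_per_pair(rows, limit=20):
    """Keep the first path seen for each (country, other) pair."""
    first_seen = {}
    for country, other, path in rows:
        if not country < other:    # WHERE id(country) < id(other)
            continue
        # collect(path)[0]: only the first path per grouping key survives
        first_seen.setdefault((country, other), path)
    return list(first_seen.items())[:limit]  # outer LIMIT


for pair, path in one_path_per_pair(ROWS):
    print(pair, path)
```

Unlike this sketch, Cypher does not guarantee which path comes first in the collected list, which is acceptable here because the question says any single path will do.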

Order results by number of coincidences in edge properties

I'm working on a recommendation system that recommends other users. The first results should be the users most "similar" to the "searcher" user. Users respond to questions, and the number of questions two users answered in the same way is their similarity.
The problem is that I don't know how to write the query.
In technical terms, I need to sort the users by the number of edges that have specific property values. I tried the following query, which I thought should work, but it doesn't:
let query = g.V().hasLabel('user');
let search = __;
for (const question of searcher.questions) {
search = search.outE('response')
.has('questionId', question.questionId)
.has('answerId', question.answerId)
.aggregate('x')
.cap('x')
}
query = query.order().by(search.unfold().count(), order.asc);
Throws this gremlin internal error:
org.apache.tinkerpop.gremlin.process.traversal.step.util.BulkSet cannot be cast to org.apache.tinkerpop.gremlin.structure.Vertex
I also tried with multiple .by() for each question, but the result was not ordered by the amount of coincidence.
How can I write this query?
When you cap() an aggregate() it returns a BulkSet, which is a Set that keeps a count of how many times each object occurs in it. When you iterate over it, it behaves like a List, repeating each object as many times as its count. You get your error because the output of cap('x') is a BulkSet, and because you are building search in a loop, the next iteration calls outE('response') on that BulkSet. That is not valid, because outE() expects a Vertex, as indicated by the error.
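
As a rough analogy only, Python's collections.Counter behaves much like the BulkSet described above: it stores each distinct object once with a count, and unrolls the counts when iterated through elements(). The edge ids below are hypothetical.

```python
from collections import Counter

# Each distinct object is stored once, together with its count.
bulk = Counter()
for edge_id in ["e1", "e2", "e1"]:  # hypothetical aggregated edges
    bulk[edge_id] += 1

# Iterating unrolls every object as many times as its count.
unrolled = sorted(bulk.elements())
print(len(bulk), unrolled)
```

The container as a whole is a single object, not a vertex, which is why a vertex step cannot be applied to it.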
I think you would prefer something more like:
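// Assumes __, scope, column and order are taken from gremlin.process
// (__ being gremlin.process.statics), as in the question's code.
let query = g.V().hasLabel('user').
  outE('response');
let search = [];
for (const question of searcher.questions) {
  search.push(__.has('questionId', question.questionId).
    has('answerId', question.answerId));
}
query = query.or(...search).
  groupCount().
    by(__.outV()).
  order(scope.local).by(column.values, order.asc);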
I may not have the javascript syntax exactly right (and I used spread syntax in my or() to just convey the idea quickly of what needs to happen) but basically the idea here is to filter edges that match your question criteria and then use groupCount() to count up those edges.
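
The filter-then-count idea can be checked on toy data in plain Python. This is a sketch of the logic only, not of the Gremlin API; the users, question ids and answer ids are hypothetical.

```python
from collections import Counter

# Hypothetical 'response' edges: (user, questionId, answerId).
RESPONSES = [
    ("alice", 1, "a"), ("alice", 2, "b"), ("alice", 3, "c"),
    ("bob", 1, "a"), ("bob", 2, "x"),
    ("carol", 1, "z"),
]
# The searcher's own (questionId, answerId) pairs.
SEARCHER_ANSWERS = {(1, "a"), (2, "b"), (3, "c")}


def similarity_ranking(responses, wanted):
    """Count matching edges per user, sorted by count ascending."""
    counts = Counter(
        user for user, question, answer in responses
        if (question, answer) in wanted   # the or(...has...) filter
    )                                     # groupCount().by(outV())
    return sorted(counts.items(), key=lambda item: item[1])


print(similarity_ranking(RESPONSES, SEARCHER_ANSWERS))
```

Here bob matches once and alice three times, while carol, who has no matching answer, does not appear in the result at all.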
If you need to count users who have no connection then perhaps you could switch to project() - maybe like:
let query = g.V().hasLabel('user').
  project('user','count').
    by();
let search = [];
for (const question of searcher.questions) {
  search.push(__.has('questionId', question.questionId).
    has('answerId', question.answerId));
}
query = query.by(__.outE('response').or(...search).count()).
  order().by('count', order.asc);
fwiw, I think you might consider a different schema for your data that might make this recommendation algorithm a bit more graph-like. A thought might be to make the question/answer a vertex (a "qa" label perhaps) and have edges go from the user vertex to the "qa" vertex. Then users directly link to the question/answers they gave. You can easily see by way of edges, a direct relationship, which users gave the same question/answer combination. That change allows the query to flow much more naturally when asking the question, "What users answered questions in the same way user 'A' did?"
g.V().has('person','name','A').
out('responds').
in('responds').
groupCount().
order(local).by(values)
With that change we can rid ourselves of all those has() filters, because they are implied by the "responds" edges, which encode them into the graph data itself.

From a given node get reachable node following unidirectional relationship and display that sub-graph as a tree

here is my problem:
I've got a graph where each :Item has one :CRAFTED_WITH relationship to one :RECIPE, and each :RECIPE has one or more :COMPOSED_OF{quantity} relationships to ingredients that are themselves :Item nodes.
As you can imagine, there can be several levels of relationships between a high-tier :Item and the most basic components.
I want to be able to find all nodes that are reachable from a specific node while following only one direction. That part was easy: I used the APOC procedure apoc.path.subgraphAll.
But my next step is to have the result displayed as a tree and not a graph. In a graph I will end up with multiple :COMPOSED_OF relationships pointing at the same :Item. I want each such :Item to be "duplicated" so that every copy is linked by a single :COMPOSED_OF relationship.
Is it even feasible in Cypher alone? Or will I have to use another language to turn the graph into that "tree" structure?
There is an APOC procedure for that, apoc.convert.toTree. The Cypher below illustrates what it does.
MATCH treePath=(root:Thing)-[:CHILD*0..]->(leaf:Thing)
WHERE NOT (leaf)-[:CHILD]->()
AND NOT ()-[:CHILD]->(root)
WITH COLLECT(treePath) AS treePaths
CALL apoc.convert.toTree(treePaths) yield value AS tree
RETURN tree
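
To see why merging root-to-leaf paths produces the "duplicated" items the question asks for, here is a simplified plain-Python model. It is not the output format of apoc.convert.toTree, and the item names are hypothetical.

```python
def paths_to_tree(paths):
    """Merge root-to-leaf paths (lists of node names) into nested dicts."""
    tree = {}
    for path in paths:
        level = tree
        for name in path:
            # Reuse the branch if this prefix was seen, else create it.
            level = level.setdefault(name, {})
    return tree


# Hypothetical crafting paths; "Iron" is shared by two sub-components.
PATHS = [
    ["Sword", "Blade", "Iron"],
    ["Sword", "Hilt", "Iron"],
    ["Sword", "Hilt", "Wood"],
]

print(paths_to_tree(PATHS))
```

Because every path is merged only along its shared prefix, "Iron" ends up once under "Blade" and once under "Hilt" instead of being a single shared node.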

How do I get all nodes in the graph on a certain relationship type

I have built a small graph where all the screens are connected, and the flow between screens varies based on the system/user. So the system/user is the relationship type.
I am looking to fetch all nodes that are linked by a certain relationship from a starting screen. I don't care about the depth, since I don't know the depth of the graph.
Something like the query below, but it takes forever to return a result, and it returns incorrect connections that don't match the attribute {path:'CC'}:
match (n:screen {isStart:true})-[r:NEXT*0..{path:'CC'}]-()
return r,n
A few suggestions:
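Make sure you have created an index for :screen(isStart) (the statement below uses the older syntax; Neo4j 4.x and later use CREATE INDEX FOR (s:screen) ON (s.isStart)):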
CREATE INDEX ON :screen(isStart);
Are you sure you want to include 0-length paths? If not, take out 0.. from your query.
You did not specify the directionality of the :NEXT relationships, so the DB has to look at both incoming and outgoing :NEXT relationships. If appropriate, specify the directionality.
To minimize the number of result rows, add a WHERE clause that ensures that the current path cannot be extended further.
Here is a proposed query that combines the last 3 suggestions (fix it up to suit your needs):
MATCH (n:screen {isStart:true})-[r:NEXT* {path:'CC'}]->(x)
WHERE NOT (x)-[:NEXT {path:'CC'}]->()
return r,n;
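
The intent of the proposed query (follow only outgoing NEXT relationships whose path property matches, and keep only paths that cannot be extended) can be sketched in plain Python. The screen names and edges are hypothetical. The cycle guard is a simplification: Cypher forbids reusing a relationship within a path, not revisiting a node.

```python
# Hypothetical NEXT relationships: (from_screen, to_screen, path property).
NEXT = [
    ("start", "login", "CC"),
    ("login", "home", "CC"),
    ("login", "admin", "SYS"),
    ("home", "checkout", "CC"),
]


def maximal_paths(start, flow):
    """Follow outgoing NEXT edges with path == flow; return dead-end paths."""
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        nxt = [dst for src, dst, prop in NEXT
               if src == path[-1] and prop == flow and dst not in path]
        if not nxt and len(path) > 1:  # cannot be extended, not 0-length
            results.append(path)
        stack.extend(path + [dst] for dst in nxt)
    return results


print(maximal_paths("start", "CC"))
```

Only the path that stays on 'CC' relationships all the way to a dead end is returned; the 'SYS' branch to admin is never followed.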
