Neo4j: How to return a single path for each pair of nodes that have multiple relationships - graph

Assuming a graph like this:
(Thanks to https://neo4j.com/blog/neo4j-2-0-ga-graphs-for-everyone/ )
(Not shown but assume all countries, all artists, and all recording contracts are in the graph)
What would the CYPHER be for:
Starting with United Kingdom, return one path for each country where there is at least one recording contract
It doesn't matter which path is returned, just that it's a single path
Should return (United Kingdom)<-[]-(Iron Maiden)-[]->(Epic)-[]->(United States), but not (United Kingdom)<-[]-(Hybrid Theory)-[]->(Mad Decent)-[]->(United States) or (United Kingdom)<-[]-(Iron Maiden)-[]->(Columbia)-[]->(United States), for example
Return a single path for each of any two countries that are connected
Should return one path for (United Kingdom)-[]-(United States), one for (Japan)-[]-(Canada), etc. Bonus points for LIMIT 20 limiting it to either 20 paths or 20 country nodes
Also does not matter which path is returned, just that it's a single path
Edit: I've tried various combinations of MATCH (c1:Country)-[]-(c2:Country), MATCH p=((c1:Country)-[]-(c2:Country)), WITH, and UNWIND. I've also tried to use FOREACH to return only one path, but can't quite get the formula right.

This is easier if you are using subqueries (Neo4j 4.1.x or higher). A subquery scopes the work you need to do (the collect(), in this case) to the expansions from a single country at a time, instead of performing it across all rows of the entire query, which could stress the heap.
In practice, since the number of countries is low, it won't be a problem, but it's a good approach to use when dealing with larger sets of nodes.
MATCH (country:Country)
CALL {
  WITH country
  MATCH path = (country)<-[:FROM_AREA]-(:Artist)-[:RECORDING_CONTRACT]->(:Label)-[:FROM_AREA]->(other:Country)
  WHERE id(country) < id(other)
  RETURN other, collect(path)[0] as path
  LIMIT 20
}
RETURN country, path
LIMIT 20
Let's look at what this is doing.
We MATCH to :Country nodes.
Per country we will MATCH to the pattern you're looking for. If these are the only such paths and labels in the graph, then you can omit the labels in the pattern, as the relationship types should be enough to find the correct nodes.
The WHERE id(country) < id(other) is here to prevent mirrored results. For example, if in the course of the query we find a path for (United Kingdom)-[*]-(United States), and we also find a path in the other direction, for (United States)-[*]-(United Kingdom), you probably don't want to return both. So we place a restriction on the graph ids so that only one of these will meet the restriction, and the mirrored result gets filtered out.
We use RETURN other, collect(path)[0] as path to get a single path per the country and other nodes. Remember that this is happening inside a subquery being called per country node, so even though country is not present here, this operation is being performed for a specific country node.
When we aggregate (such as with this collect(path)), the grouping key (usually the non-aggregation variables) becomes distinct. So for the country and the other country, this will collect all the paths between them and then take the first of that list of paths, which gives us our single path between two distinct countries.
We LIMIT the subquery results to 20, since we know in total we don't want more than 20 paths, so per country we don't want more than 20 paths either. This might be a bit redundant for this case, but when the query is more complex it is the right approach to make sure you're not doing more work than is needed.
We also have another LIMIT outside the subquery, so that if there are only a few countries processed, with a few paths per country, the total paths won't exceed 20.
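To make the grouping logic concrete, here is a minimal sketch in plain Python of what the subquery does: drop mirrored pairs, keep the first path per (country, other) pair, then apply the limit. The ids, the path strings, and the function name one_path_per_pair are made up for illustration; this is not how Neo4j implements it.

```python
from typing import Dict, List, Tuple

# Hypothetical rows as the inner MATCH might yield them: (country_id, other_id, path).
rows: List[Tuple[int, int, str]] = [
    (1, 2, "UK<-Iron Maiden->Epic->US"),
    (1, 2, "UK<-Iron Maiden->Columbia->US"),
    (2, 1, "US<-Epic<-Iron Maiden->UK"),  # mirrored pair, filtered out below
    (3, 4, "Japan<-Artist X->Label Y->Canada"),
]


def one_path_per_pair(rows, limit=20):
    """Keep one path per ordered country pair, at most `limit` pairs."""
    first_path: Dict[Tuple[int, int], str] = {}
    for country, other, path in rows:
        if not country < other:  # mirrors WHERE id(country) < id(other)
            continue
        # Only the first path seen for a pair is kept, like collect(path)[0].
        first_path.setdefault((country, other), path)
    return list(first_path.items())[:limit]  # mirrors the outer LIMIT


result = one_path_per_pair(rows)
```

Which path wins for a pair depends only on the order in which rows arrive, which matches the requirement that it doesn't matter which path is returned.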

Related

Gremlin query - how to eliminate nested coalesce

I have person vertex, has_vehicle edge and vehicle vertex which models vehicle ownership use case. The graph path is person -> has_vehicle -> vehicle.
I want to implement a Gremlin query which associates a vehicle to a person only if
The person does not have a vehicle
AND
The input vehicle is not associated with a person yet.
I followed the fold-coalesce-unfold pattern and came out with following Gremlin query with nested coalesce
g.V().hasLabel('person').has('name', 'Tom').as('Tom').outE('has_vehicle').fold().coalesce(
__.unfold(), // check if Tom already have a vehicle
g.V().has('vehicle', 123).as('Vehicle').inE('has_vehicle').fold().coalesce(
__.unfold(), // check if vehicle 123 is already associated with a person
__.addE('has_vehicle').from('Tom').to('Vehicle') // associate the vehicle to Tom
)
)
Is there a way to eliminate the nested coalesce? If I have multiple criteria, it would be too complex to write the query.
This might be a case where a couple of where(not(...)) patterns work better than nesting coalesce steps. For example, we might change the query as shown below.
g.V().hasLabel('person').has('name', 'Tom').as('Tom').
where(not(outE('has_vehicle'))).
V().has('vehicle', 123).as('Vehicle').
where(not(inE('has_vehicle'))).
addE('has_vehicle').from('Tom').to('Vehicle')
So long as the V steps do not fan out and yield multiple Tom or Vehicle nodes, that should work, and it is easy to extend by adding more to the where filters as needed.
As a side note, the not steps used above should work even if not wrapped by where steps, but I tend to find it just reads better as written.
This rewrite does assume that you can tolerate the case where Tom already has a car and the query just ends there. In that case no vertex or edge will be returned. If you ran the query with toList, however, you would get an empty list back, indicating that nothing was done.
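The shape of that rewrite (a flat list of guards, then the write) can be sketched outside Gremlin as well. The following is only an in-memory illustration in Python, assuming the graph is reduced to a set of (person, vehicle) ownership pairs; assign_vehicle is a hypothetical helper, not part of any Gremlin API.

```python
def assign_vehicle(edges, person, vehicle):
    """Add (person, vehicle) only if neither end already has a has_vehicle edge.

    `edges` is a set of (person, vehicle) tuples standing in for the graph.
    Returns a list holding the new edge, or an empty list if nothing was done.
    """
    guards = [
        # Plays the role of where(not(outE('has_vehicle'))) on the person.
        lambda: not any(p == person for p, _ in edges),
        # Plays the role of where(not(inE('has_vehicle'))) on the vehicle.
        lambda: not any(v == vehicle for _, v in edges),
    ]
    if all(guard() for guard in guards):
        edges.add((person, vehicle))
        return [(person, vehicle)]
    return []


edges = set()
first = assign_vehicle(edges, "Tom", 123)   # Tom has no vehicle yet, 123 is free
second = assign_vehicle(edges, "Tom", 456)  # Tom already owns 123, so nothing happens
```

Adding another criterion means appending one more guard to the list, which is the same reason the chained where filters stay readable as the conditions grow.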

Incorrect work of autocomplete with Cyrillic

When I send a request to https://autocomplete.geocode.ls.hereapi.com/6.2/suggest.json?query=Вильнюс with the query in Cyrillic, nothing comes back, but with Latin characters, https://autocomplete.geocode.ls.hereapi.com/6.2/suggest.json?query=Viln, all is well. What is the problem, or what am I doing wrong?
You're not doing anything wrong. Autocomplete is designed to give you addresses that contain (perfectly match) your input string, and the results are sorted by relevance.
When you make your query in Russian and provide only "Вильнюс" as input, the service finds a lot of results (street names) that it considers more relevant than the city. The city name is also found, but since the service doesn't think that this is what you're searching for, it puts the city much lower in the results list. You don't see it because you're limiting your query to the first 10 matches (with the maxresults=10 parameter), but if you change the maxresults parameter to 20, for example, you will see that Vilnius appears in 16th place in the API response.
If you want the service to better understand what you're querying for, you'll need to provide additional information. For example, if you continue typing and your input string is now "Вильнюс " (with a space at the end) or "Вильнюс Л" (a space and another letter), the service will understand what you mean and will return the result you want.
Another way of providing more information to change the way the service ranks the results is by adding a spatial filter, like the country, mapview, or prox parameters mentioned in the API Reference section of the documentation. Alternatively, the resultType parameter can help you filter out all the results with street names and return only city names, if that's what you want. These are just some options available, the one that is right for you will depend on your use case.
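As a rough illustration of passing those extra parameters, here is a small Python sketch that only builds the request URL (it does not call the service). The apiKey parameter name, the placeholder key, and the "LTU" country value are my assumptions and should be checked against the API Reference; build_suggest_url is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://autocomplete.geocode.ls.hereapi.com/6.2/suggest.json"


def build_suggest_url(query, api_key, **extra):
    """Return a suggest URL with the query percent-encoded as UTF-8."""
    # "apiKey" is an assumed parameter name; check the API Reference.
    params = {"query": query, "apiKey": api_key}
    params.update(extra)  # e.g. maxresults, country, mapview, prox, resultType
    return BASE_URL + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder and "LTU" is an assumed country code format.
url = build_suggest_url("Вильнюс", "YOUR_API_KEY", maxresults=20, country="LTU")
decoded = parse_qs(urlsplit(url).query)
```

urlencode takes care of encoding the Cyrillic text, so the same helper works for both the Latin and the Cyrillic input.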

How to get a path from one node to another including all other nodes and relationships involved in between

I have designed a model in Neo4j in order to get paths from one station to another, including the platforms/legs involved. The model is depicted below. Basically, I need a query that takes me from NBW to RD and also shows the platforms and legs involved. I am struggling with the query and get no result. I would appreciate any help.
Here is my cypher statement:
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]->(c:Station)
WHERE (a.name='NBW')
AND c.name='RD'
RETURN p
Model:
As mentioned in the comments, in Cypher you can't use a directed variable-length relationship that uses differing directions for some of the relationships.
However, APOC Procedures just added the ability to expand based on sequences of relationships. You can give this a try:
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='THT'
CALL apoc.path.expandConfig(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>, can_board>, goto>, can_alight>, <has_platform'}) YIELD path
RETURN path
I added a limit so that only the first (and shortest) path to your end station will be returned. Removing the limit isn't advisable, since this will continue to repeat the relationships in the expansion, going from station to station, until it finds all possible ways to get to your end station, which could hang your query.
EDIT
Regarding the new model changes, the reason the above will not work is because relationship sequences can't contain a variable-length sequence within them. You have 2 goto> relationships to traverse, but only one is specified in the sequence.
Here's an alternative that doesn't use sequences, just a whitelisting of allowed relationships. The spanningTree() procedure uses NODE_GLOBAL uniqueness so there will only be a single unique path to each node found (paths will not backtrack or revisit previously-visited nodes).
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='RD'
CALL apoc.path.spanningTree(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>|can_board>|goto>|can_alight>|<has_platform'}) YIELD path
RETURN path
Your query is directed (-->), and not all of the relationships between your two stations run in the same direction. If you remove the relationship direction you will get a result.
Once you have a result, something like the query below could point you in the right direction for extracting particular details from the returned path.
Essentially, I am assuming that everything you are interested in is in the returned path, and you just need to filter out the different pieces.
As @InverseFalcon points out, this query should be limited in a larger graph or it could easily run away.
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]-(c:Station)
WHERE (a.name='NBW')
AND c.name='THT'
RETURN [n IN nodes(p) WHERE 'Platform' IN labels(n)] AS Platforms
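To see why direction matters here, the following Python sketch runs a breadth-first search over a tiny made-up version of the model, once following relationship direction and once ignoring it. The node names and the exact edges are assumptions based on the relationship sequence used in the APOC answer above, where the last has_platform hop is traversed against its direction.

```python
from collections import deque

# Hypothetical edges (source, type, target); the real model may differ.
EDGES = [
    ("NBW", "has_platform", "NBW-P1"),
    ("NBW-P1", "can_board", "Leg1"),
    ("Leg1", "goto", "Leg2"),
    ("Leg2", "can_alight", "RD-P1"),
    ("RD", "has_platform", "RD-P1"),  # points from the station to its platform
]


def find_path(start, end, directed):
    """Breadth-first search; returns the list of nodes on a shortest path or None."""
    neighbours = {}
    for src, _type, dst in EDGES:
        neighbours.setdefault(src, []).append(dst)
        if not directed:  # also allow walking an edge backwards
            neighbours.setdefault(dst, []).append(src)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


directed_result = find_path("NBW", "RD", directed=True)
undirected_result = find_path("NBW", "RD", directed=False)
```

With direction enforced, the search gets stuck at the destination platform because the has_platform edge points the other way; ignoring direction lets it reach RD.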

How would you find the most specific "filter" that matches a document? (determining which market segment a user fits in)

Imagine you have actions setup for when a user is from a certain demographic/market segment. The filters work a bit like a graph, matching for country, region, platform, operating system, and browser.
By default, you will match any value (if you specify US, you match for all users from the US regardless of region, platform, OS, or browser)
If you specify multiple values for any property of the filter, it works like an OR (can be any of the values you specified). For the filter to match, all the properties must have at least one match or be empty (accept all), which is essentially an AND operation.
So we can have:
Segment #1:
Countries: United States, Canada
Segment #2:
Countries: United States
Regions: New York
Platform: Tablets
Segment #3
Countries: United States
Browser: Chrome
Segment #4
Countries: United States
Segment #5
Match all (all filters left empty)
Scenario #1
User from Canada on his Tablet
Result: Segment #1
Scenario #2
User from New York, United States visits from Google Chrome on his Tablet.
Result: Segment #2, because the filter more specifically matches the user (matches country, region, and platform)
Scenario #3
User from Texas visits from his desktop
Result: Segment #4, tie with segment #1 is resolved because Segment #4 only matches United States and is therefore more specific
Work so far
I was thinking I could take each segment and load it up into a graph database that looks something like this
Country -> Region -> Platform --> OS -> Browser -> Segment
Each node either has a value (ex: United States, Chrome, Firefox, etc) and relationships that link it to any node below it in the tree (Country -> Browser is okay, Browser -> Country is not) or is null ("match all").
Each relationship (represented by ->) would also store a weight used to resolve ties. Relationships from a catch-all node get the max weight as they will always lose to a more specific filter.
Example database (numbers on the lines are the weight, lower weight becomes the preferred path)
Potential query
So now I need a query (maybe neo4j can do this?) that does the following:
Find the top level country node with the same value as the user or null
Go through each relationship (sorted by weight in ascending order)
Find the longest path, ties go to the node connected by a relationship with the lowest weight (if the tie is between a relationship to a null/catch-all node, the null node loses)
Continue this loop until we find a segment #
I'm sorry for the long post, it's hard to explain what I'm getting at via text.
What I'm looking for
Am I on the right path to solving this problem?
Are there better ways to go about this?
What would be the best way to store these relationships (graph database?)
How can I build a query that does what I want?
tl;dr: Need a decent/performant way of finding the longest/most specific path in a graph-like data structure. Comments requesting clarification or with any related information/documentation/projects/reading are very welcome
With Neo4j, you can store properties in a relationship, example:
(u1:User{name:"foo"})-[:FRIEND_WITH{since : "2015/01/01"}]->(u2:User{name:"bar"})
I think you should store country nodes this way:
(usa:Country{name: "USA", other attributes...})
So you can find every single country by matching with Country label, and then filter with the name property to get the one you're looking for.
Same for the cities, you can do a simple relationship to store every city :
(usa:Country{ name: "USA"})-[:CONTAINS_CITY]->(n:City{name: "New York", other attributes...})
and then you can add platform etc after the city.
To match a segment related to a certain country, you can do it this way (example for Scenario #1):
Match (c:Country{name : "Canada"})-[*1..2]->(p:Platform{name : "Tablet"})-[*1..]->(s:Segment) return s
Then you can create your segments as nodes and create relationships between them. The only problem may be in this case:
User1 has a Tablet in Canada
User2 has a Tablet in Canada using Chrome
In this case, because of the depth match on the relationship ([*1..]), User1 can end up in the same segment as User2. The solution is to create intermediate nodes with default values for cases where you don't have browser information, for example.
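For comparison, the matching rules from the question can also be sketched without a graph at all. The Python below is only one possible reading of those rules: a segment matches if every property it specifies contains the user's value, more specified properties win, and a tie goes to the segment that lists fewer values overall. The property names, the dict layout, and best_segment are my assumptions.

```python
# Segments from the question; an empty dict means "match all".
SEGMENTS = {
    1: {"country": {"United States", "Canada"}},
    2: {"country": {"United States"}, "region": {"New York"}, "platform": {"Tablet"}},
    3: {"country": {"United States"}, "browser": {"Chrome"}},
    4: {"country": {"United States"}},
    5: {},
}


def best_segment(user):
    """Return the id of the most specific matching segment, or None."""
    candidates = []
    for seg_id, filters in SEGMENTS.items():
        # AND across properties, OR across the values of one property.
        if all(user.get(prop) in allowed for prop, allowed in filters.items()):
            specified = len(filters)                        # more properties is more specific
            values = sum(len(a) for a in filters.values())  # fewer listed values breaks ties
            candidates.append((-specified, values, seg_id))
    return min(candidates)[2] if candidates else None


scenario_1 = best_segment({"country": "Canada", "platform": "Tablet"})
scenario_2 = best_segment({"country": "United States", "region": "New York",
                           "platform": "Tablet", "browser": "Chrome"})
scenario_3 = best_segment({"country": "United States", "region": "Texas",
                           "platform": "Desktop"})
```

With a handful of segments a linear scan like this is likely enough; whether it stays fast enough at a larger scale is something you would have to measure.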

How do I get all nodes in the graph on a certain relationship type

I have built a small graph where all the screens are connected and the flow of the screens varies based on the system/user. So the system/user is the relationship type.
I am looking to fetch all nodes that are linked by a certain relationship from a starting screen. I don't care about the depth, since I don't know the depth of the graph.
Something like this, but the query below takes forever to return a result, and it returns incorrect connections that do not match the attribute {path:'CC'}
match (n:screen {isStart:true})-[r:NEXT*0..{path:'CC'}]-()
return r,n
A few suggestions:
Make sure you have created an index for :screen(isStart):
CREATE INDEX ON :screen(isStart);
Are you sure you want to include 0-length paths? If not, take out 0.. from your query.
You did not specify the directionality of the :NEXT relationships, so the DB has to look at both incoming and outgoing :NEXT relationships. If appropriate, specify the directionality.
To minimize the number of result rows, add a WHERE clause that ensures that the current path cannot be extended further.
Here is a proposed query that combines the last 3 suggestions (fix it up to suit your needs):
MATCH (n:screen {isStart:true})-[r:NEXT* {path:'CC'}]->(x)
WHERE NOT (x)-[:NEXT {path:'CC'}]->()
return r,n;
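The idea behind the proposed query (follow only outgoing NEXT relationships whose path property matches, and keep only paths that cannot be extended) can be sketched in Python as follows. The screen names and the NEXT list are invented for illustration. As a simplification, the sketch avoids revisiting nodes, whereas Cypher avoids reusing relationships.

```python
# Hypothetical NEXT relationships: (from_screen, to_screen, path property).
NEXT = [
    ("start", "login", "CC"),
    ("login", "home", "CC"),
    ("login", "admin", "SYS"),
    ("start", "help", "SYS"),
]


def maximal_paths(start, path_value):
    """Return every path from `start` along matching edges that ends at a dead end."""
    outgoing = {}
    for src, dst, value in NEXT:
        if value == path_value:  # mirrors the {path:'CC'} property filter
            outgoing.setdefault(src, []).append(dst)
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        all_next = outgoing.get(path[-1], [])
        # Mirrors WHERE NOT (x)-[:NEXT {path:'CC'}]->(): keep only non-extendable ends.
        if not all_next and len(path) > 1:
            results.append(path)
        # Simplification: skip nodes already on the path to avoid looping forever.
        stack.extend(path + [n] for n in all_next if n not in path)
    return results


cc_paths = maximal_paths("start", "CC")
```

Filtering the edges before traversing is what keeps the unrelated SYS connections out of the result.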
