Match nodes where all relations satisfy constraints - graph

I'm looking to find nodes where all of a node's relations satisfy a constraint. The exact example is: is every related node in a given list?
The graph is basically cocktails, with the relations being ingredients. Given a list of ingredients, I want to know what I can make.
with ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as list
...
should return Negroni, Boulevardier, ...
I've been finding this tricky because we want to make sure that all relations of a node satisfy the constraint, but a cocktail's ingredients could very easily be a subset of the list and not an exact match to it.
This is the best I've done so far, and it only works if you have all the ingredients, but nothing extra.
with ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as list
MATCH (n:Cocktail)-[h:HAS]-(x)
WITH list, count(list) AS lth, n, COLLECT(DISTINCT x.name) AS cx, collect(DISTINCT h) as hh
WHERE ALL (i IN list WHERE i IN cx)
RETURN n
I've looked at stackoverflow.com/a/62053139/974731. I don't think it solves my problem.
As you can see, the addition of Bourbon removes the Negroni, which shouldn't happen since all we've done is add an ingredient to our bar.

This should return all cocktails whose needed ingredients are in the have list.
WITH ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as have
MATCH (c:Cocktail)-[:HAS]->(x)
WITH have, c, COLLECT(x.name) AS needed
WHERE ALL(n IN needed WHERE n IN have)
RETURN c
Or, if you pass have as a parameter:
MATCH (c:Cocktail)-[:HAS]->(x)
WITH c, COLLECT(x.name) AS needed
WHERE ALL(n IN needed WHERE n IN $have)
RETURN c
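The same subset test can be sketched outside the database. This is a minimal Python illustration, assuming a hypothetical in-memory dictionary of recipes (the recipe contents and the name `makeable` are made up for the example, not taken from the graph in the question):

```python
# Hypothetical stand-in for the graph: cocktail name -> needed ingredients.
# These recipes are illustrative assumptions, not data from the question.
recipes = {
    "Negroni": {"Gin", "Campari", "Sweet Vermouth"},
    "Boulevardier": {"Bourbon", "Campari", "Sweet Vermouth"},
    "Martini": {"Gin", "Dry Vermouth"},
}


def makeable(recipes, have):
    """Return the cocktails whose needed ingredients are all in `have`."""
    have = set(have)
    # needed <= have is the set form of ALL(n IN needed WHERE n IN have)
    return sorted(name for name, needed in recipes.items() if needed <= have)


result = makeable(recipes, ["Sweet Vermouth", "Gin", "Campari", "Bourbon"])
print(result)  # ['Boulevardier', 'Negroni']
```

The attempt in the question tests the opposite direction (every item of the list must be an ingredient of the cocktail), which is why adding Bourbon to the list removed the Negroni.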

It's terribly hacky, but this is where I got
with ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as list
call {
match (ali:Cocktail)--(ii:Ingredient) //pull all nodes
return ali, count(ii) as needed // get count for needed ingredients
}
MATCH (ali)--(i:Ingredient)
WHERE i.name in list // get ingredients that are in the list
WITH distinct ali.name as name, count(ali.name) as available, needed
WHERE available = needed
RETURN name;


How to retrieve a node based on its top-most parent node in Neo4j

I have a hierarchy of node relationships like:
Organisation -> Department -> System -> Function -> Port -> Request -> Response -> Parameter
The query -
MATCH q=(p)-[*]->(b:checkoutby) WHERE p.name ="william" RETURN q
gives the entire network from the parent node (william) up to the last node mentioned (checkoutby).
However, I want only the two related nodes to appear.
I tried the query -
MATCH (n:william) WHERE n is null RETURN n
UNION MATCH n=(p)-[:Parameter]->(b) WHERE b.name ="checkoutBy" RETURN n
But here the effect of "william" node i.e. the first parent node is nullified and we get the output irrespective of the parent node.
For which, I even tried this query -
MATCH (n) WHERE none(node in nodes(n) WHERE node:william) RETURN n
UNION MATCH n=(p)--()-[:Parameter]->(b) WHERE b.name ="cabinet"
RETURN n
but I get error -
Neo.ClientError.Statement.SyntaxError: Type mismatch: expected Path but was Node (line 1, column 36 (offset: 35))
"MATCH (n) WHERE none(node in nodes(n) WHERE node: william ) RETURN n UNION MATCH n=(p)--()-[:Parameter]->(b) WHERE b.name ="cabinet" RETURN n"
I even tried the intersection query, but to no avail.
MATCH (n1:william), (n2),(q:cabinet)
WHERE (n1)<-[:Department]-() AND (n2)<-[:Parameter]-(q)
RETURN count(q), collect(q.name)
Warning Error-
This query builds a Cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this will build a Cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (n2))
EXPLAIN MATCH (n1:william), (n2),(ego:cabinet)
^
Even this query doesn't work -
MATCH (n:william) RETURN n UNION MATCH n=(p)-[:Parameter]->(b)
WHERE b.name ="checkoutBy"
call apoc.path.expandConfig(n, {labelFilter:'-william'}) yield path
return path
I want to retrieve the checkoutby / cabinet node only if it is from the topmost parent node (william).
I don't have enough reputation to comment, so I'm asking here:
It's not clear from your question whether william is a name property or a label.
You used it as a name in the first query and as a label in all other queries.
I am assuming it is a label; it looks like one from the screenshot you shared.
If you want to check if the checkoutby/cabinet node is related to William node and return only if it's related you can use the following query:
MATCH (w:william)-[*]-(c:checkoutby) return w,c
Please note: these types of queries consume a lot of memory.
If I've understood your problem, the (b:checkoutby) node does not have any outgoing relationships, so you could write:
MATCH (p)-[*]->(b:checkoutby) WHERE p.name ="william" AND NOT EXISTS ( (b)-[]->()) RETURN p, b
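The idea behind both answers (return only the two endpoints, and only when the target is reachable from the top-most parent) can be sketched in Python. This is an illustrative breadth-first search over a hypothetical adjacency list; the node names and the helper names are assumptions, not the asker's data:

```python
from collections import deque

# Hypothetical directed graph as adjacency lists; node names are illustrative.
edges = {
    "william": ["dept1"],
    "dept1": ["system1"],
    "system1": ["checkoutby"],
    "other_org": ["dept2"],
    "dept2": ["cabinet"],
}


def reachable(edges, start, target):
    """Breadth-first search: is `target` reachable from `start` along directed edges?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def endpoints_if_connected(edges, start, target):
    """Return only the two endpoints (like RETURN p, b), or None if unconnected."""
    return (start, target) if reachable(edges, start, target) else None


print(endpoints_if_connected(edges, "william", "checkoutby"))
print(endpoints_if_connected(edges, "william", "cabinet"))
```

The intermediate nodes are visited during the search but never returned, which mirrors returning p and b instead of the whole path q.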

Xquery - How to match two sequences within a quantifier expression

Like many, I'm tackling the Mondial database on XML. It would be a piece of cake, if XQuery syntax wasn't doing its best to sabotage.
let $inland := //province/@id
where every $sea in //sea satisfies
$sea/located/@province != $inland
return $inland
What I am trying to do in the above is find all "inland" provinces, i.e. the provinces that don't have a sea next to them. This, however, doesn't work, because $sea/located/@province is one big string containing every province that the sea borders.
So I tried to modify into.
let $inland := //province/@id
where every $sea in //sea satisfies
not(contains($sea/located/@province, $inland))
return $inland
Where I would like it to only find the provinces that are a part of the sea's bordering provinces. Simple and straightforward.
Error message:
Stopped at C:/Users/saffekaffe/Desktop/mondial/xml/country_without_island.xml, 2/1:
[XPTY0004] Item expected, sequence found: (attribute id {"prov-Greece-2"},....
How do I get around this?
Example of //sea/located/@province
province="prov-France-5 prov-France-20 prov-France-89 prov-France-99"
Example of //province/@id
id="prov-Greece-2"
There are multiple ways in which XQuery works in a different way than you seem to expect.
The comparison operators = and != have existential semantics if at least one of their arguments is a sequence instead of a single item. This means that $seq1 = $seq2 is equivalent to some $x in $seq1, $y in $seq2 satisfies $x = $y. The query ('foo', 'bar') = ('bar', 'baz', 'quuz') returns true because there is at least one common item.
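As a rough illustration of these existential semantics, here is a Python sketch. The function names `general_eq` and `general_ne` are made up for the example; they only mimic how = and != behave on sequences and are not an XQuery API:

```python
def general_eq(seq1, seq2):
    """Existential '=': true if at least one pair of items is equal."""
    return any(x == y for x in seq1 for y in seq2)


def general_ne(seq1, seq2):
    """Existential '!=': true if at least one pair of items differs."""
    return any(x != y for x in seq1 for y in seq2)


a = ["foo", "bar"]
b = ["bar", "baz", "quuz"]

print(general_eq(a, b))   # True: "bar" occurs in both
print(general_ne(a, b))   # True as well: "foo" differs from "bar"
print(general_ne([], b))  # False: no pair exists at all
```

Both comparisons can be true at the same time, so != on sequences is not the negation of =. An empty sequence on either side makes both false.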
An XQuery expression like //province/@id evaluates to a sequence of all matching nodes. In your case that would be a sequence of over 1000 province IDs: (id="prov-cid-cia-Greece-2", id="prov-cid-cia-Greece-3", id="prov-cid-cia-Greece-4", [...]). This sequence is then bound to the variable $inland in your let clause. Since you don't iterate over individual items in $inland (for example using a for clause), the where condition then works on the whole sequence of all provinces worldwide at once. So your condition every $sea in //sea satisfies $sea/located/@province != $inland now means:
"For every sea there is a province located next to it that has an @id that is not equal to at least one of all existing province IDs."
This returns false because there are seas with no located children, e.g. the Gulf of Aden.
contains($str, $sub) is not a good fit for checking if a substring is contained in a space-delimited string, because it also matches parts of entries: contains("foobar baz quux", "oob") returns true.
Instead you should either split the string into its parts using tokenize($str) and look through its parts, or use contains-token($str, $token).
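The difference between substring matching and token matching can be seen in a small Python sketch. The two helpers below are hypothetical stand-ins that only approximate contains() and contains-token(); the attribute value is the example from the question:

```python
provinces = "prov-France-5 prov-France-20 prov-France-89 prov-France-99"


def contains(s, sub):
    """Substring test, roughly what contains($str, $sub) does."""
    return sub in s


def contains_token(s, token):
    """Whole-token test on a whitespace-separated string."""
    return token in s.split()


print(contains(provinces, "prov-France-2"))         # True: matches inside prov-France-20
print(contains_token(provinces, "prov-France-2"))   # False: no such whole token
print(contains_token(provinces, "prov-France-20"))  # True
```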
Putting it all together, a correct query very similar to your original one is:
for $inland in //province/@id
where
every $sea in //sea
satisfies not(contains-token($sea/located/@province, $inland))
return $inland
Another approach would be to first gather all (unique) provinces that are next to seas and then return all provinces not in that sequence:
let $next-to-sea := distinct-values(//sea/located/@province/tokenize(.))
return //province/@id[not(. = $next-to-sea)]
Even more compact (but potentially less efficient):
//province/@id[not(. = //sea/located/@province/tokenize(.))]
On the other end of the spectrum you can use XQuery 3.0 maps to replace the potentially linear search through all seaside provinces by a single lookup:
let $seaside :=
map:merge(
for $id in //sea/located/@province/tokenize(.)
return map{ $id: () }
)
return //province/@id[not(map:contains($seaside, .))]
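The "collect the seaside provinces first, then filter" approach can be sketched in Python with a set, which plays the role of the map lookup. The data below is hypothetical and only shaped like the Mondial attributes shown in the question:

```python
# Hypothetical data shaped like the Mondial attributes in the question.
province_ids = ["prov-Greece-2", "prov-France-5", "prov-France-20", "prov-Austria-1"]
sea_located = [
    "prov-France-5 prov-France-20",  # one sea's located/@province value
    "prov-Greece-2",                 # another sea's value
]

# Tokenize every attribute value and gather the distinct province IDs.
next_to_sea = {token for attr in sea_located for token in attr.split()}

# Keep the provinces that never appear next to a sea; set lookup is O(1) on average.
inland = [pid for pid in province_ids if pid not in next_to_sea]
print(inland)  # ['prov-Austria-1']
```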

Neo4j Exclude nodes matching any given relations

My use case contains two types of node, Problem and Tag, where a Problem can have a one-to-many relation with Tag, i.e. there are multiple (Problem)-[:CONTAINS]->(Tag) relations for a single problem (ignore the syntax). Given an array of tags, I want a Cypher query that returns the Problems which do not contain any of those Tags.
Sample Nodes:
Problem {id:101, rank:2.389} ; Tag {name: "python"}
Consider this example dataset:
CREATE (p1:Problem {id:1}),
(p2:Problem {id:2}),
(p3:Problem {id:3}),
(t1:Tag {name:'python'}),
(t2:Tag {name:'cypher'}),
(t3:Tag {name:'neo4j'}),
(t4:Tag {name:'ruby'}),
(t1)-[:TAGS]->(p1),
(t2)-[:TAGS]->(p1),
(t2)-[:TAGS]->(p2),
(t3)-[:TAGS]->(p2),
(t3)-[:TAGS]->(p3),
(t4)-[:TAGS]->(p3)
If you want problems that aren't tagged by python or cypher, you only want problem 3 to be returned.
MATCH (t:Tag)
WHERE t.name IN ['python', 'cypher']
MATCH (p:Problem)
WITH p, sum(size((t)-[:TAGS]->(p))) AS matches
WHERE matches = 0
RETURN p;
This returns only problem 3.
In your case, you have to MATCH both nodes which don't have connections to Tag and nodes which are connected to other Tag nodes (but not the ones in your list). I would split this in two queries:
// unconnected nodes
MATCH (p:Problem)
WHERE NOT (p)-[:CONTAINS]-()
RETURN p
// combine queries with union (both have to return the same)
UNION
// nodes which fulfill your criterion
MATCH (p:Problem)-[:CONTAINS]->(t:Tag)
// collect distinct Problem nodes with a list of associated Tag names
WITH DISTINCT p, collect(t.name) AS tag_name
// use a none predicate to filter all Problems where none
// of the query tag names are found
WHERE none(x IN ['python', 'java', 'haskell'] WHERE x IN tag_name)
RETURN p
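For comparison, the same exclusion can be sketched in Python with set disjointness. The dictionary mirrors the example dataset given earlier in this thread; problem 4 is a hypothetical untagged problem added only to show that case, and `without_tags` is a made-up helper name:

```python
# Problem id -> tag names, mirroring the example dataset in this thread.
problem_tags = {
    1: {"python", "cypher"},
    2: {"cypher", "neo4j"},
    3: {"neo4j", "ruby"},
    4: set(),  # hypothetical untagged problem, added for illustration
}


def without_tags(problem_tags, excluded):
    """Return ids of problems that carry none of the excluded tags."""
    excluded = set(excluded)
    # isdisjoint is the set form of none(x IN excluded WHERE x IN tag_name)
    return sorted(pid for pid, tags in problem_tags.items() if tags.isdisjoint(excluded))


print(without_tags(problem_tags, ["python", "cypher"]))           # [3, 4]
print(without_tags(problem_tags, ["python", "java", "haskell"]))  # [2, 3, 4]
```

In this sketch the untagged problem passes the same test as the others, so no separate branch is needed. In Cypher, a MATCH on the relationship drops untagged problems, which is why the query above adds them back with UNION.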

Using Parameters in Neo4j Relationship Queries

I'm struggling to work around a small limitation of Neo4j in that I am unable to use a parameter in the Relationship section of a Cypher query.
Christophe Willemsen has already graciously assisted me in working my query to the following:
MATCH (n1:Point { name: {n1name} }),
(n2:Point { name: {n2name} }),
p = shortestPath((n1)-[r]->(n2))
WHERE type(r) = {relType}
RETURN p
Unfortunately as r is a Collection of relationships and not a single relationship, this fails with an error:
scala.collection.immutable.Stream$Cons cannot be cast to org.neo4j.graphdb.Relationship
Removing the use of shortestPath() allows the query to run successfully but returns no results.
Essentially my graph is a massive collection of "paths" that link "points" together. It is currently structured as such:
http://console.neo4j.org/r/rholp
I need to be able to provide a starting point (n1Name), an ending point (n2Name), and a single path to travel along (relType). I need a list of nodes to come out of the query (all the ones along the path).
Have I structured my graph incorrectly / not optimally? I am open to advice on whether the overall structure is not optimal as well as advice on how best to structure the query!
EDIT
Regarding your edit, the nodes() function returns you the nodes along the path :
MATCH p=allShortestPaths((n:Point { name:"Point5" })-[*]->(n2:Point { name:"Point8" }))
WHERE ALL (r IN rels(p) WHERE type(r)={relType})
RETURN nodes(p)
In the console link, it is returning nodes Points 5,6,7,8
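To make the "every relationship on the path has the given type" condition concrete, here is a Python sketch. It assumes a hypothetical edge list (not the data behind the console link) and takes the simpler route of filtering edges by type before running a breadth-first search:

```python
from collections import deque

# Hypothetical typed edges (source, relationship type, target); not the console data.
edges = [
    ("Point5", "PATH_TWO", "Point6"),
    ("Point6", "PATH_TWO", "Point7"),
    ("Point7", "PATH_TWO", "Point8"),
    ("Point5", "PATH_ONE", "Point8"),
]


def shortest_path_nodes(edges, start, end, rel_type):
    """Shortest path using only edges of rel_type; returns the node list or None."""
    adjacency = {}
    for src, typ, dst in edges:
        if typ == rel_type:  # keep only the requested relationship type
            adjacency.setdefault(src, []).append(dst)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path_nodes(edges, "Point5", "Point8", "PATH_TWO"))
# ['Point5', 'Point6', 'Point7', 'Point8']
```

With PATH_ONE the same call returns the direct two-node path, so the type filter decides which route counts as shortest.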
I guess in your case that using a common relationship type name for connecting your Point nodes would be more efficient.
If having a Path1, Path2, .. is for knowing the distance between two points, you can easily know the distance by asking for the length of the path, like this query related to your console link :
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=shortestPath((n)-[]->(n2))
RETURN length(p)
If you need to return only paths having a defined relationship length, you can use it without the shortestPath by specifying a strict depth :
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=(n)-[*3..3]->(n2)
RETURN length(p)
LIMIT 1
As you can see here, the need to specify the relationship is not mandatory, you can just omit it or add the :NEXT type if you have other relationship types in your graph
If you need to match on the type, for e.g. the path from point 5 to point 8 in your console link, and the path can only have a PATH_TWO relationship, then you can do this :
MATCH (n:Point { name:"Point5" })
WITH n
MATCH (n2:Point { name:"Point8" })
WITH n, n2
MATCH p=(n)-[r*]->(n2)
WHERE ALL(rel IN r WHERE type(rel) = 'PATH_TWO')
WITH p, length(p) AS l
ORDER BY l
RETURN p, l
LIMIT 1
If you really NEED to have the Path1, Path2 style, maybe a short explanation on the need could help us find the more appropriate query
MATCH p=shortestpath((n1:Point{name:{n1name}})-[:relType *]->(n2:Point {name:{n2name}}))
RETURN p
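Since the relationship type cannot be passed as a query parameter, one workaround is to build the query text in application code and keep parameters for the property values. The Python sketch below only constructs the query string; the whitelist `ALLOWED_TYPES` and the function name are assumptions, and it uses the `$name` parameter syntax of current Neo4j versions rather than the older `{name}` form shown in the question:

```python
import re

# Hypothetical whitelist of relationship types that callers may request.
ALLOWED_TYPES = {"PATH_ONE", "PATH_TWO"}


def build_path_query(rel_type):
    """Build a Cypher query with the relationship type spliced into the text.

    The type is checked against a whitelist and a strict identifier pattern
    first, because string-built queries are open to injection otherwise.
    """
    if rel_type not in ALLOWED_TYPES or not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", rel_type):
        raise ValueError(f"unsupported relationship type: {rel_type!r}")
    return (
        "MATCH p = shortestPath((n1:Point {name: $n1name})"
        f"-[:{rel_type}*]->(n2:Point {{name: $n2name}})) "
        "RETURN nodes(p)"
    )


query = build_path_query("PATH_TWO")
print(query)
```

The point names would still be sent as ordinary parameters (n1name, n2name) when the query is executed, so only the relationship type is ever spliced into the text.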

Python: get all values associated with key in a dictionary, where the values may be a list or a single item

I'm looking to get all values associated with a key in a dictionary. Sometimes the key holds a single dictionary, sometimes a list of dictionaries.
a = {
'shelf':{
'book':{'title':'the catcher in the rye', 'author':'j d salinger'}
}
}
b = {
'shelf':[
{'book':{'title':'kafka on the shore', 'author':'haruki murakami'}},
{'book':{'title':'atomised', 'author':'michel houellebecq'}}
]
}
Here's my method to read the titles of every book on the shelf.
def print_books(d):
    if(len(d['shelf']) == 1):
        print(d['shelf']['book']['title'])
    else:
        for book in d['shelf']:
            print(book['book']['title'])
It works, but doesn't look neat or pythonic. The for loop fails on the single value case, hence the if/else.
Can you improve on this?
Given that your code will break if you have a list with a single item (and this is how I think it should be), if you really can't change your data structure this is a bit more robust and logical:
def print_books(d):
    if isinstance(d['shelf'], dict):
        print(d['shelf']['book']['title'])
    else:
        for book in d['shelf']:
            print(book['book']['title'])
Why not always make 'shelf' map to a list of elements, but in the single element case it's a ... single element list? Then you'd always be able to treat each bookshelf the same.
def print_books(d):
    container = d['shelf']
    # wrap the single dict itself, so every entry still has a 'book' key
    books = container if isinstance(container, list) else [container]
    books = [ e['book'] for e in books ]
    for book in books:
        print(book['title'])
I would first get the input consistent, then loop through all the books even if only one.
def print_books(d):
    books = d['shelf'] if type(d['shelf']) is list else [ d['shelf'] ]
    for book in books:
        print(book['book']['title'])
I think this looks a little neater and more pythonic, although some might argue that creating a one-element list and looping through it is not as efficient as your original code.
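For reference, a Python 3 sketch of the same normalize-then-loop idea, returning the titles instead of printing them so the result is easy to check. The function name `book_titles` is made up for the example; the two dictionaries are the ones from the question:

```python
def book_titles(d):
    """Return every title on the shelf, whether 'shelf' is a dict or a list of dicts."""
    shelf = d["shelf"]
    # Normalize the single-dict case to a one-element list.
    entries = shelf if isinstance(shelf, list) else [shelf]
    return [entry["book"]["title"] for entry in entries]


a = {"shelf": {"book": {"title": "the catcher in the rye", "author": "j d salinger"}}}
b = {
    "shelf": [
        {"book": {"title": "kafka on the shore", "author": "haruki murakami"}},
        {"book": {"title": "atomised", "author": "michel houellebecq"}},
    ]
}

print(book_titles(a))  # ['the catcher in the rye']
print(book_titles(b))  # ['kafka on the shore', 'atomised']
```

Unlike the length check in the question, this also handles a list that happens to contain exactly one book.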
