Cypher query : Selecting branches without missing parts - collections

I'm new to Cypher and am struggling to do a basic filter.
Given the following model I would like to only retrieve "d2".
a1-
| \c1
| / \
b1- \d1
/
a2- /
\c2
/
b2-
a3-
| \c3--d2
| /
b3-
I saw things to identify the missing part :
(match (a)--(c)--(d) WHERE NOT (a)--(b)--(c) return d)
which gives me "d1".
How can I do WHERE ALL (a)--(b), or remove a set of nodes from above query (d1) from a given list (d1, d2)?

Found what I think is a dirty way (guess performance would be crappy) but does the trick :
match (a)--(c)--(b), (c)--(d)
where not (a)--(b)
with collect(d) as excluded
match (a)--(c)--(b)--(a), (c)--(d)
where NOT d IN excluded
return d;
Any better idea would be great :-)

Related

Does materialize function not work with tabular function argument in Kusto?

I created a simple function MaterializeRemoteInputTable below which accepts the output of a table as an input.
To test if the input is materialized, I am using it twice to calculate two scalar values - x and y:
.create-or-alter function MaterializeRemoteInputTable(inputTable: (State:string)) {
let materializedInput = materialize(inputTable);
let x = toscalar(materializedInput | count );
let y = toscalar(materializedInput | where State contains "S" | count);
print x, y;
}
I am using the sample kusto database's function output as an input to the above function:
cluster('https://help.kusto.windows.net').database('Samples').
GetStatesWithPopulationSmallerThan(1000000)
| invoke MaterializeRemoteInputTable()
On querying the help.kusto.windows.net cluster with my ClientActivityId, I see two queries are executed:
.show commands-and-queries
| where ClientActivityId contains "KE.RunQuery;<guid...>"
Above query outputs two rows:
GetStatesWithPopulationSmallerThan(long(1000000))|__executeAndCache|count as Count|limit long(1)|project ["b2fb..."]=["Count"]
GetStatesWithPopulationSmallerThan(long(1000000))|__executeAndCache|where (["State"] contains ("S"))|count as Count|limit long(1)|project ["5d0e2..."]=["Count"]
Since I have materialized the input to my MaterializeRemoteInputTable, why are two queries executed on the remote cluster, once each for x and y?
There is no issue with materialize & tabular argument to a function.
The issue is a known limitation in materialize with cross cluster query that in some cases the materialization is not performed.
Please note that this is only a performance issue and not a logical issue (query results are guaranteed to be correct).

Match nodes where all relations satisfy constraints

I'm looking to find nodes that have relations where all relations satisfy that constraint. the exact example is do you have a relation in a list.
the graph is bascially cocktails, with the relations being ingredients. given a list of ingredients i want to know what I can make.
with ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as list
...
should return Negroni, Boulevardier, ...
I've been finding this tricky because we want to make sure that all relations of a node satisfy the constraint, but the number of nodes could very easily be a subset of the list and not an exact match to the ingredient list.
this is the best I've done so far, and it only works if you have all the ingredients, but nothing extra.
with ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as list
MATCH (n:Cocktail)-[h:HAS]-(x)
WITH list, count(list) AS lth, n, COLLECT(DISTINCT x.name) AS cx, collect(DISTINCT h) as hh
WHERE ALL (i IN list WHERE i IN cx)
RETURN n
I'ved looked at stackoverflow.com/a/62053139/974731. I don't think it solves my problem
as you can see the addition of Bourbon removes the Negroni, which shouldn't happen since all we've done is add an ingredient to our bar.
This should return all cocktails whose needed ingredients are in the have list.
WITH ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as have
MATCH (c:Cocktail)-[:HAS]->(x)
WITH have, c, COLLECT(x.name) AS needed
WHERE ALL(n IN needed WHERE n IN have)
RETURN c
Or, if you pass have as a parameter:
MATCH (c:Cocktail)-[:HAS]->(x)
WITH c, COLLECT(x.name) AS needed
WHERE ALL(n IN needed WHERE n IN $have)
RETURN c
It's terribly hacky, but this is where I got
with ['Sweet Vermouth', 'Gin', 'Campari', 'Bourbon'] as list
call {
match (ali:Cocktail)--(ii:Ingredient) //pull all nodes
return ali, count(ii) as needed // get count for needed ingredients
}
MATCH (ali)--(i:Ingredient)
WHERE i.name in list // get ingredients that are in the list
WITH distinct ali.name as name, count(ali.name) as available, needed
WHERE available = needed
RETURN name;

Using Parameters in Neo4j Relationship Queries

I'm struggling to work around a small limitation of Neo4j in that I am unable to use a parameter in the Relationship section of a Cypher query.
Christophe Willemsen has already graciously assisted me in working my query to the following:
MATCH (n1:Point { name: {n1name} }),
(n2:Point { name: {n2name} }),
p = shortestPath((n1)-[r]->(n2))
WHERE type(r) = {relType}
RETURN p
Unfortunately as r is a Collection of relationships and not a single relationship, this fails with an error:
scala.collection.immutable.Stream$Cons cannot be cast to org.neo4j.graphdb.Relationship
Removing the use of shortestPath() allows the query to run successfully but returns no results.
Essentially my graph is a massive collection of "paths" that link "points" together. It is currently structured as such:
http://console.neo4j.org/r/rholp
I need to be able to provide a starting point (n1Name), an ending point (n2Name), and a single path to travel along (relType). I need a list of nodes to come out of the query (all the ones along the path).
Have I structured my graph incorrectly / not optimally? I am open to advice on whether the overall structure is not optimal as well as advice on how best to structure the query!
EDIT
Regarding your edit, the nodes() function returns you the nodes along the path :
MATCH p=allShortestPaths((n:Point { name:"Point5" })-[*]->(n2:Point { name:"Point8" }))
WHERE ALL (r IN rels(p) WHERE type(r)={relType})
RETURN nodes(p)
In the console link, it is returning nodes Points 5,6,7,8
I guess in your case that using a common relationship type name for connecting your Point nodes would be more efficient.
If having a Path1, Path2, .. is for knowing the distance between two points, you can easily know the distance by asking for the length of the path, like this query related to your console link :
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=shortestPath((n)-[]->(n2))
RETURN length(p)
If you need to return only paths having a defined relationship length, you can use it without the shortestPath by specifying a strict depth :
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=(n)-[*3..3]->(n2)
RETURN length(p)
LIMIT1
As you can see here, the need to specify the relationship is not mandatory, you can just omit it or add the :NEXT type if you have other relationship types in your graph
If you need to match on the type, for e.g. the path from point 5 to point 8 in your console link, and the path can only have a PATH_TWO relationship, then you can do this :
MATCH (n:Point { name:"Point5" })
WITH n
MATCH (n2:Point { name:"Point8" })
WITH n, n2
MATCH p=(n)-[r*]->(n2)
WHERE type(r[0])= 'PATH_TWO'
WITH p, length(p) AS l
ORDER BY l
RETURN p, l
LIMIT 1
If you really NEED to have the Path1, Path2 style, maybe a short explanation on the need could help us find the more appropriate query
MATCH p=shortestpath((n1:Point{name:{n1name}})-[:relType *]->(n2:Point {name:{n2name}}))
RETURN p

Gremlin: How to reference for loop variable within query?

reI'm stuck with a gremlin query to assign rank values to nodes based on a sorted list of keys passed to the query as a parameter.
Each node identified by "uniqueId" values should be assigned a rank based on order of occurrence in the reranked array.
This works:
reranked = [uniqueId1, uniqueId2, uniqueId3]
v.outE.as('e').inV.filter{it.key == reranked[2]}.back('e').sideEffect{it.rank = 2}
But this doesn't (replacing int with for-loop variable):
reranked = [uniqueId1, uniqueId2, uniqueId3]
for (i in 1..reranked.size())
v.outE.as('e').inV.filter{it.key == reranked[i]}.back('e').sideEffect{it.rank = i}
Do you know why this doesn't work? I'd also be happy for simpler ideas to reach the same goal.
You could do this using Groovy's eachWithIndex like:
reranked = [uniqueId1, uinqueId2, uniqueId3]
reranked.eachWithIndex{uniqueId, idx -> v.outE.as('e').inV.has('key', uniqueId).back('e').sideEffect{it.rank = idx} }
I have used Gremlin's has step above because it's much more efficient than filter in case of simple property look-up.
Well, it looks like I found a solution, maybe clumsy but hey, it works!
c=0
v.as('st').outE.as('e').inV.filter{it.key == reranked[c]}.back('e').sideEffect{
it.rank = reranked.size() - c }.outV.loop('st'){ c++ < so.size() }
I will still accept another answer if there's a fix for above situation and perhaps a more elgant approach to the solution.

pyparsing for querying a database of chemical elements

I would like to parse a query for a database of chemical elements.
The database is stored in a xml file. Parsing that file produces a nested dictionary that is stored in a singleton object that inherit from collections.OrderedDict.
Asking for an element will give me an ordered dictionary of its corresponding properties
(i.e. ELEMENTS['C'] --> {'name':'carbon','neutron' : 0,'proton':6, ...}).
Conversely, asking for a propery will give me an ordered dictionary of its values for all the elements (i.e. ELEMENTS['proton'] --> {'H' : 1, 'He' : 2} ...).
A typical query could be:
mass > 10 or (nucleon < 20 and atomic_radius < 5)
where each 'subquery' (i.e. mass > 10) will return the set of elements that matches it.
Then, the query will be converted and transformed internally to a string that will be evaluated further to produce a set of the indexes of the elements that matched it. In that context the operators and/or are not boolean operator but rather ensemble operator that acts upon python sets.
I recently sent a post for building such a query. Thanks to the useful answers I got, I think that I did more or less the job (I hope on a nice way !) but I still have some questions related to pyparsing.
Here is my code:
import numpy
from pyparsing import *
# This import a singleton object storing the datase dictionary as
# described earlier
from ElementsDatabase import ELEMENTS
and_operator = oneOf(['and','&'], caseless=True)
or_operator = oneOf(['or' ,'|'], caseless=True)
# ELEMENTS.properties is a property getter that returns the list of
# registered properties in the database
props = oneOf(ELEMENTS.properties, caseless=True)
# A property keyword can be quoted or not.
props = Suppress('"') + props + Suppress('"') | props
# When parsed, it must be replaced by the following expression that
# will be eval later.
props.setParseAction(lambda t : "numpy.array(ELEMENTS['%s'].values())" % t[0].lower())
quote = QuotedString('"')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?(\d+(\.\d*)?)?([eE][+-]?\d+)?').setParseAction(lambda t:float(t[0]))
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_expr = props + comparison_operator + (quote | float_ | integer)
comparison_expr.setParseAction(lambda t : "set(numpy.where(%s)%s%s)" % tuple(t))
grammar = Combine(operatorPrecedence(comparison_expr, [(and_operator, 2, opAssoc.LEFT) (or_operator, 2, opAssoc.LEFT)]))
# A test query
res = grammar.parseString('"mass " > 30 or (nucleon == 1)',parseAll=True)
print eval(' '.join(res._asStringList()))
My question are the following:
1 using 'transformString' instead of 'parseString' never triggers any
exception even when the string to be parsed does not match the grammar.
However, it is exactly the functionnality I need. Is there is a way to do so ?
2 I would like to reintroduce white spaces between my tokens in order
that my eval does not fail. The only way I found to do so it the one
implemented above. Would you see a better way using pyparsing ?
sorry for the long post but I wanted to introduce in deeper details its context. BTW, if you find this approach bad, do not hesitate to tell it me!
thank you very much for your help.
Eric
do not worry about my concern, I found a work around. I used the SimpleBool.py example shipped with pyparsing (thanks for the hint Paul).
Basically, I used the following approach:
1 for each subquery (i.e. mass > 10), using the setParseAction method,
I joined a function that returns the set of eleements that matched
the subquery
2 then, I joined the following functions for each logical operator (and,
or and not):
def not_operator(token):
_, s = token[0]
# ELEMENTS is the singleton described in my original post
return set(ELEMENTS.keys()).difference(s)
def and_operator(token):
s1, _, s2 = token[0]
return (s1 and s2)
def or_operator(token):
s1, _, s2 = token[0]
return (s1 or s2)
# Thanks for Paul for the hint.
grammar = operatorPrecedence(comparison_expr,
[(not_token, 1,opAssoc.RIGHT,not_operator),
(and_token, 2, opAssoc.LEFT,and_operator),
(or_token, 2, opAssoc.LEFT,or_operator)])
Please not that these operators acts upon python sets rather than
on booleans.
And that does the job.
I hope that this approach will help anyone of you.
Eric

Resources