Get all children of a particular node up to a particular depth - graph

I have this relationship in my neo4j:
Parent -> Childs
F -> D,E
D -> A,B,C
A -> X
Use case: I am trying to get all children of a particular node up to a given depth, say depth = 2.
Query to get all children of node F
MATCH (p:Person)-[:REPORTS_TO *]->(c:Person) WHERE p.name="F"
WITH COLLECT (c) + p AS all
UNWIND all as p MATCH (p)-[:REPORTS_TO]-(c)
RETURN p,c;
This returns all child nodes of F, with no depth limit.
But when I try to get all children up to depth 2:
Query to get all children of node F with depth = 2
MATCH (p:Person)-[:REPORTS_TO *2]->(c:Person) WHERE p.name="F"
WITH COLLECT (c) + p AS all
UNWIND all as p MATCH (p)-[:REPORTS_TO]->(c)
RETURN p,c;
When I set depth = 2, the result didn't include all children of D (it returned only A, and not B or C).
Expected response was:
All children of F, the children of those children (i.e. level 1), and the children of all level-1 nodes (i.e. level 2).
Am I missing something in my query, or is there another way to get the response I expect?
Adding dataset
CREATE (f:Person {name: "F"})
CREATE (e:Person {name: "E"})
CREATE (d:Person {name: "D"})
CREATE (c:Person {name: "C"})
CREATE (b:Person {name: "B"})
CREATE (a:Person {name: "A"})
CREATE (x:Person {name: "X"})
CREATE (a)-[:REPORTS_TO]->(x)
CREATE (d)-[:REPORTS_TO]->(a)
CREATE (d)-[:REPORTS_TO]->(b)
CREATE (d)-[:REPORTS_TO]->(c)
CREATE (f)-[:REPORTS_TO]->(d)
CREATE (f)-[:REPORTS_TO]->(e)

The problem is you're not querying for what you think you're querying for.
[:REPORTS_TO *2]
doesn't query up to depth 2, it queries nodes at exactly depth 2. The result is nodes B, C, A, and F (since you added it in).
Of those nodes, only nodes A and F have an outgoing :REPORTS_TO relationship, so your match eliminates B and C from the result set. The nodes returned are A and F and the nodes reachable by an outgoing :REPORTS_TO relationship (E, D, and X).
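To see why B and C drop out, here is a small Python model of that second MATCH. It is only a sketch: the edge list is copied from the question's CREATE statements, and the names REPORTS_TO and collected are made up for the illustration, not part of Neo4j.

```python
# Outgoing REPORTS_TO edges, copied from the question's CREATE statements.
REPORTS_TO = [("A", "X"), ("D", "A"), ("D", "B"), ("D", "C"), ("F", "D"), ("F", "E")]

# Nodes found by the first MATCH with *2 (A, B, C), plus F added by hand.
collected = ["A", "B", "C", "F"]

# Second MATCH: a collected node only survives if it has an outgoing edge.
pairs = sorted((p, c) for p, c in REPORTS_TO if p in collected)
print(pairs)  # [('A', 'X'), ('F', 'D'), ('F', 'E')]
```

B and C never appear as the first element of a pair, which is why they are missing from the result.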
If you want to alter your query so it's up to depth 2 instead of exactly depth 2, use a range on the variable-length relationship (omitting the lower-bound makes it default to 1):
[:REPORTS_TO *..2]
And if you want this to include F in the match itself (instead of manually adding it when you collect the nodes), use a lower bound of 0:
[:REPORTS_TO *0..2]
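As a rough illustration of the three bounds, the following Python sketch walks the question's dataset level by level. It only models the depth semantics on this small tree; EDGES and nodes_within are hypothetical names, not part of Cypher or any Neo4j driver.

```python
# Edges from the question's dataset: (source)-[:REPORTS_TO]->(target)
EDGES = {
    "F": ["D", "E"],
    "D": ["A", "B", "C"],
    "A": ["X"],
}

def nodes_within(start, min_depth, max_depth):
    """Return the nodes whose distance from start lies in [min_depth, max_depth]."""
    found = set()
    frontier = [start]
    for depth in range(max_depth + 1):
        if depth >= min_depth:
            found.update(frontier)
        # Move one level further along the outgoing edges.
        frontier = [child for node in frontier for child in EDGES.get(node, [])]
    return found

exactly_two = nodes_within("F", 2, 2)  # models [:REPORTS_TO *2]
up_to_two = nodes_within("F", 1, 2)    # models [:REPORTS_TO *..2]
from_zero = nodes_within("F", 0, 2)    # models [:REPORTS_TO *0..2]
print(sorted(exactly_two), sorted(up_to_two), sorted(from_zero))
```

With *2 only A, B and C match; with *..2 the level-1 nodes D and E are included as well; with *0..2 the start node F is part of the match too.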

Related

Detecting cycles in Topological sort using Kahn's algorithm (in degree / out degree)

I have been practicing graph questions lately.
https://leetcode.com/problems/course-schedule-ii/
https://leetcode.com/problems/alien-dictionary/
The current way I detect cycles is to use two hashsets. One for visiting nodes, and one for fully visited nodes. And I push the result onto a stack with DFS traversal.
If I ever visit a node that is currently in the visiting set, then it is a cycle.
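For reference, a minimal sketch of the two-set DFS approach described above. It assumes edges are given as [course, prerequisite] pairs as in Course Schedule II; dfs_toposort is a made-up name and this is not the actual code from the question.

```python
def dfs_toposort(n, edges):
    """Return a topological order of nodes 0..n-1, or None if there is a cycle."""
    graph = {i: [] for i in range(n)}
    for course, prereq in edges:
        graph[prereq].append(course)

    visiting, done, stack = set(), set(), []

    def visit(node):
        if node in done:
            return True
        if node in visiting:      # back edge: we re-entered a node still in progress
            return False
        visiting.add(node)
        for nxt in graph[node]:
            if not visit(nxt):
                return False
        visiting.remove(node)
        done.add(node)
        stack.append(node)        # pushed only after all descendants are finished
        return True

    for node in range(n):
        if not visit(node):
            return None
    return stack[::-1]

print(dfs_toposort(4, [[1, 0], [2, 0], [3, 1], [3, 2]]))
print(dfs_toposort(2, [[0, 1], [1, 0]]))
```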
The code is pretty verbose and the length is long.
Can anyone please explain how I can use a more standard top-sort algorithm (Kahn's) to detect cycles and generate the top sort sequence?
I just want my method to exit or set some global variable which flags that a cycle has been detected.
Many thanks.
Kahn's algorithm with cycle detection (summary)
Step 1: Compute in-degree: First we compute a lookup for the in-degree of every node. In this particular Leetcode problem, each node has a unique integer identifier, so we can simply store all the in-degree values in a list where indeg[i] tells us the in-degree of node i.
Step 2: Keep track of all nodes with in-degree of zero: If a node has an in-degree of zero, it is a course that we can take right now; there are no other courses that it depends on. We create a queue q of all the nodes that have an in-degree of zero. At any step of Kahn's algorithm, if a node is in q then it is guaranteed that it's "safe to take this course" because it does not depend on any courses that "we have not taken yet".
Step 3: Delete node and edges, then repeat: We take one of these safe courses x from the queue q and conceptually treat everything as if we had deleted the node x and all its outgoing edges from the graph g. In practice, we don't need to update the graph g; for Kahn's algorithm it is sufficient to update the in-degree values of x's neighbours to reflect that this node no longer exists.
This step is basically as if a person took and passed the exam for
course x, and now we want to update the other courses dependencies
to show that they don't need to worry about x anymore.
Step 4: Repeat: When we remove these edges from x, we decrease the in-degree of x's neighbours; this can introduce more nodes with an in-degree of zero. During this step, any node whose in-degree becomes zero is added to q. We repeat step 3 to process these nodes. Each time we remove a node from q we add it to the final topological sort list result.
Step 5: Detecting a cycle with Kahn's algorithm: If there is a cycle in the graph then result will not include all the nodes in the graph; it will contain only some of them. To check whether there is a cycle, you just need to check whether the length of result is equal to the number of nodes in the graph, n.
Why does this work?
Suppose there is a cycle in the graph: x1 -> x2 -> ... -> xn -> x1. Then none of these nodes will appear in result, because their in-degrees never reach 0 during Kahn's algorithm. Each node xi in the cycle can't be put into the queue q because there is always a predecessor x_(i-1) in the cycle (xn for x1) with an edge from x_(i-1) to xi that has not been removed.
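A small sketch that makes this visible: it runs the same in-degree bookkeeping and reports which nodes are left with a non-zero in-degree. The function name stuck_nodes and the (u, v) edge convention (meaning u -> v) are assumptions of this example only.

```python
def stuck_nodes(n, edges):
    """Run Kahn's algorithm and return the nodes whose in-degree never reaches 0."""
    graph = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:            # each pair is a directed edge u -> v
        graph[u].append(v)
        indeg[v] += 1

    ready = [i for i in range(n) if indeg[i] == 0]
    while ready:
        x = ready.pop()
        for y in graph[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                ready.append(y)

    return [i for i in range(n) if indeg[i] > 0]

# 1 -> 2 -> 1 is a cycle; node 0 feeds into it and node 3 hangs off it.
print(stuck_nodes(4, [(0, 1), (1, 2), (2, 1), (2, 3)]))  # [1, 2, 3]
```

Note that node 3 is stuck as well even though it is not on the cycle, because it can only be reached through it. That is why the check compares the length of result against n rather than looking for specific nodes.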
Full solution to Leetcode course-schedule-ii in Python 3:
from collections import defaultdict

def build_graph(edges, n):
    g = defaultdict(list)
    for i in range(n):
        g[i] = []
    for a, b in edges:
        g[b].append(a)
    return g

def topsort(g, n):
    # -- Step 1 --
    indeg = [0] * n
    for u in g:
        for v in g[u]:
            indeg[v] += 1
    # -- Step 2 --
    q = []
    for i in range(n):
        if indeg[i] == 0:
            q.append(i)
    # -- Step 3 and 4 --
    result = []
    while q:
        x = q.pop()
        result.append(x)
        for y in g[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                q.append(y)
    return result

def courses(n, edges):
    g = build_graph(edges, n)
    ordering = topsort(g, n)
    # -- Step 5 --
    has_cycle = len(ordering) < n
    return [] if has_cycle else ordering
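If you would rather get an explicit signal than an empty list, here is a compact variant of the same idea. It is a sketch, not a replacement for the solution above: it uses collections.deque as a FIFO queue and returns None when a cycle is detected, and find_order is a name chosen for this example.

```python
from collections import deque

def find_order(n, prerequisites):
    """Return a valid course order, or None if the prerequisites contain a cycle."""
    graph = [[] for _ in range(n)]
    indeg = [0] * n
    for course, prereq in prerequisites:
        graph[prereq].append(course)
        indeg[course] += 1

    queue = deque(i for i in range(n) if indeg[i] == 0)
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in graph[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                queue.append(y)

    # Fewer than n nodes processed means some in-degrees never reached zero.
    return order if len(order) == n else None

print(find_order(4, [[1, 0], [2, 0], [3, 1], [3, 2]]))  # [0, 1, 2, 3]
print(find_order(2, [[0, 1], [1, 0]]))                  # None
```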

Why is a sum (or discriminated union or disjoint union) the inverse of a product?

I'm trying to wrap my head around category theory and this question just came to my mind - why is the sum type the inverse of the product type? I mean, I see how arrows are changing directions in the opposite category, but I don't see why sum couldn't contain both components coming to it.
They are dual in the sense that one is defined by its mapping-in property, and the other by its mapping-out property. Every mapping into a product, c -> (a, b), is equivalent to a pair of functions c -> a and c -> b. Every mapping out of a coproduct, Either a b -> c, is equivalent to a pair of functions a -> c and b -> c (think of pattern matching on the Left a and Right b constructors).
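The two directions can be sketched in Python, with the caveat that Python has no built-in Either: Left, Right, pair_into and either_out are names invented for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Left:
    value: object

@dataclass
class Right:
    value: object

def pair_into(f, g):
    """Mapping in: build c -> (a, b) from the pair c -> a and c -> b."""
    return lambda c: (f(c), g(c))

def either_out(f, g):
    """Mapping out: build Either a b -> c from the pair a -> c and b -> c."""
    def h(e):
        if isinstance(e, Left):
            return f(e.value)
        return g(e.value)
    return h

into_pair = pair_into(len, str.upper)   # str -> (int, str)
out_of_sum = either_out(len, abs)       # Either str int -> int

print(into_pair("abc"))                                   # (3, 'ABC')
print(out_of_sum(Left("abcd")), out_of_sum(Right(-7)))    # 4 7
```

A product value carries both components because a function into it has to supply both. A sum value carries only one, because a function out of it has to be ready for either case but only ever receives one of them.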

Neo4j Creating relationship between nodes that already map to a relationship

Currently, I have a query in Neo4j that returns all the nodes in my graph that are pointed to by multiple nodes.
The query to return these nodes looks like this:
MATCH (n)-[r:CLINICAL_SIGNIFICANCE]->()
WITH n, count(r) as rel_cnt
WHERE rel_cnt > 1
MATCH (c)-[r:PROTEIN_CHANGE]->(n)
return c, n
Is there a way to loop through the nodes labeled as c (blue nodes) and if they point to the same node labeled as n (yellow nodes), create a relationship between the nodes labeled as c (blue nodes)?

Finding an edge by unique ID or name

It is possible to look up a node or a subgraph by its unique name:
n = agnode(g, "myUniqueNodeName", FALSE);
h = agsubg(g, "myUniqueSubgrahName", FALSE);
Likewise, is there a way to search edges in a strict directed graph by their unique names?
e = agedge (g, u, v, "e28", FALSE);
Documentation indicates that:
The 'name' of an edge (more correctly, identifier) is treated as a
unique indentifier for edges between a particular node pair. That is,
there can only be at most one edge with name e28 between any given u
and v, but there can be many other edges between other nodes.
It seems that there must be a list of edges that can be searched by name. Otherwise, a separate (ID -> edge) map will need to be maintained.

Random tree with specific branching factor in Mathematica

Do you know if it's possible to somehow generate a random tree graph with a specific branching factor? I don't want it to be a k-ary tree.
It would be also great if I could define both the branching factor and the maximum depth. I want to randomly generate a bunch of trees that would differ in branching factor and depth.
TreePlot with random integer input returns something that's almost what I want:
TreePlot[RandomInteger[#] -> # + 1 & /@ Range[0, 100]]
but I can't figure out a way to get a tree with a specific branching factor.
Thanks!
I guess I'm a bit late, but I like the question. Instead of creating a tree in the form
{0 -> 1, 0 -> 5, 1 -> 2, 1 -> 3, 1 -> 4}
I will use the following form of nested calls, where every argument is a child, which represents another node
0[1[2, 3, 4], 5]
Both forms are equivalent and can be transformed into each other.
Row[{
TreeForm[0[1[2, 3, 4], 5]],
TreePlot[{0 -> 1, 0 -> 5, 1 -> 2, 1 -> 3, 1 -> 4}]
}]
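The equivalence of the two forms can also be sketched outside Mathematica. In the Python below a tree is a nested (id, [children]) tuple, which is my own stand-in for the 0[1[2, 3, 4], 5] notation; to_edges and to_nested are illustrative names.

```python
def to_edges(tree):
    """Nested (id, [children]) form -> list of (parent, child) edges."""
    node_id, children = tree
    edges = [(node_id, child[0]) for child in children]
    for child in children:
        edges.extend(to_edges(child))
    return edges

def to_nested(edges, root):
    """List of (parent, child) edges -> nested (id, [children]) form."""
    return (root, [to_nested(edges, c) for p, c in edges if p == root])

nested = (0, [(1, [(2, []), (3, []), (4, [])]), (5, [])])  # 0[1[2, 3, 4], 5]
edges = to_edges(nested)
print(edges)  # [(0, 1), (0, 5), (1, 2), (1, 3), (1, 4)]
print(to_nested(edges, 0) == nested)  # True
```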
Here is how the algorithm works: As arguments we need a function f which gives a random number of children and is called when we create a node. Additionally, we have a depth d which defines the maximum depth a (sub-)tree can have.
1. [Choose branching] Define a branching function f which can be called like f[] and returns a random number of children. If you want a tree where every node has either 2 or 4 children, you could use e.g. f[] := RandomChoice[{2, 4}]. This function will be called for each created node in the tree.
2. [Choose tree-depth] Choose a maximum depth d of the tree. At this point, I'm not sure how you want the randomness to be incorporated into the generation of the tree. What I do here is that when a new node is created, the depth of the tree below it is randomly chosen between zero and the depth of its parent minus one.
3. [Create ID counter] Create a unique counter variable count and set it to zero. This will give us increasing node IDs. When creating a new node, it is increased by 1.
4. [Create a node] Increase count and use it as the node ID. If the current depth d is zero, give back a leaf with ID count; otherwise call f to decide how many children the node should get. For every new child, randomly choose the depth of its sub-tree, which can be 0,...,d-1, and repeat step 4 for each new child. When all recursive calls have returned, the tree is built.
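For readers who don't use Mathematica, the steps above can be sketched in Python. This is an approximation under my own assumptions: a tree is a nested (id, [children]) tuple, and generate_tree and branching are hypothetical names, not a translation of any library function.

```python
import random
from itertools import count

def generate_tree(branching, depth):
    """Build a random tree as nested (id, [children]) tuples.

    branching: zero-argument function returning the number of children.
    depth: maximum depth of the tree below the root.
    """
    ids = count()  # fresh ID counter for every generated tree

    def build(d):
        node_id = next(ids)
        if d == 0:
            return (node_id, [])  # leaf
        # Each child gets its own random sub-tree depth in 0..d-1.
        children = [build(random.randint(0, d - 1)) for _ in range(branching())]
        return (node_id, children)

    return build(depth)

def branching():
    return random.choice([2, 4])

tree = generate_tree(branching, 4)
print(tree)
```

Every node that is not a leaf gets 2 or 4 children here, and no path from the root is longer than the chosen depth.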
Fortunately, in Mathematica-code this procedure is not so verbose and consists only of a few lines. I hope you can find in the code what I have described above
With[{counter = Unique[]},
  generateTree[f_, d_] := (counter = 0; builder[f, d]);
  builder[f_, d_] := Block[
    {nodeID = counter++, childs = builder[f, #] & /@ RandomInteger[d - 1, f[]]},
    nodeID @@ childs
  ];
  builder[f_, 0] := (counter++);
]
Now you can create a random tree as follows
branching[] := RandomChoice[{2, 4}];
t = generateTree[branching, 6];
TreeForm[t]
Or if you like you can use the next function to convert the tree into what is accepted by TreePlot
transformTree[tree_] := Module[{transform},
  transform[(n_Integer)[childs__]] := (Sow[
    n -> # & /@ ({childs} /. h_Integer[__] :> h)];
    transform /@ {childs});
  Flatten@Last@Reap[transform[tree]]
]
and use it to create many random trees
trees = Table[generateTree[branching, depth], {depth, 3, 7}, {5}];
GraphicsGrid[Map[TreePlot[transformTree[#]] &, trees, {2}]]
