I have a graph shown here. Just to node that nodes B_0, B_1, belong to node of type B, C_0, C_1. C_2, C_3 belong to node of type C and so on.
Now, I want to find multiple subgraphs, which could satify criteria like defined by this example -
Criteria -
subgraph contains 1 node of type A, 1 node of type B, 1 node of type C, one node of type D.
subgraph has one edge from node of type A to one node of type B, one edge connecting type B and type C and one node connecting type C and type D.
subgraph contains one edge from type A going out of subgraph to type B node, one edge
from type B to type C node outside, one edge from type D to type E outside.
Now this description should give result as -
subgraph :: A_0, B_0, C_1, D_1
subgraph :: A_0, B_0, C_0, D_0
subgraph :: A_0, B_1, C_2, D_2
subgraph :: A_0, B_1, C_3, D_3
I want to know, if there is any algorithm, by which I can find such sub-graphs?
I tried to figure out an algorithm by making all possible combinations. However, this would be exponential to number of nodes in subgraph. Thus, I want to know if there exists an efficient way to calculate it. Or if there exists a problem of similar nature in Graph Theory?
You can start by visiting all nodes of type A. For each A node, look at all nodes connected to it which are of type B. From there look all nodes of type C and so on, keeping track of the nodes you've visited from the last A node. Then whenever you reach a subnode that completes the criteria of your search, you add all the list of nodes from the A node until the point where you are. Essentially you're doing a depth-first search where you keep traversing into the graph as long as a node meets the criteria of what should follow and backtrack whenever there's no more valid nodes (ie. which would produce a valid subgraph) going out from your current node.
Related
I have been practicing graph questions lately.
https://leetcode.com/problems/course-schedule-ii/
https://leetcode.com/problems/alien-dictionary/
The current way I detect cycles is to use two hashsets. One for visiting nodes, and one for fully visited nodes. And I push the result onto a stack with DFS traversal.
If I ever visit a node that is currently in the visiting set, then it is a cycle.
The code is pretty verbose and the length is long.
Can anyone please explain how I can use a more standard top-sort algorithm (Kahn's) to detect cycles and generate the top sort sequence?
I just want my method to exit or set some global variable which flags that a cycle has been detected.
Many thanks.
Khan's algorithm with cycle detection (summary)
Step 1: Compute In-degree: First we create compute a lookup for the in-degrees of every node. In this particular Leetcode problem, each node has a unique integer identifier, so we can simply store all the in-degrees values using a list where indegree[i] tells us the in-degree of node i.
Step 2: Keep track of all nodes with in-degree of zero: If a node has an in-degree of zero it means it is a course that we can take right now. There are no other courses that it depends on. We create a queue q of all these nodes that have in-degree of zero. At any step of Khan's algorithm, if a node is in q then it is guaranteed that it's "safe to take this course" because it does not depend on any courses that "we have not taken yet".
Step 3: Delete node and edges, then repeat: We take one of these special safe courses x from the queue q and conceptually treat everything as if we have deleted the node x and all its outgoing edges from the graph g. In practice, we don't need to update the graph g, for Khan's algorithm it is sufficient to just update the in-degree value of its neighbours to reflect that this node no longer exists.
This step is basically as if a person took and passed the exam for
course x, and now we want to update the other courses dependencies
to show that they don't need to worry about x anymore.
Step 4: Repeat: When we removing these edges from x, we are decreasing the in-degree of x's neighbours; this can introduce more nodes with an in-degree of zero. During this step, if any more nodes have their in-degree become zero then they are added to q. We repeat step 3 to process these nodes. Each time we remove a node from q we add it to the final topological sort list result.
Step 5. Detecting Cycle with Khan's Algorithm: If there is a cycle in the graph then result will not include all the nodes in the graph, result will return only some of the nodes. To check if there is a cycle, you just need to check whether the length of result is equal to the number of nodes in the graph, n.
Why does this work?:
Suppose there is a cycle in the graph: x1 -> x2 -> ... -> xn -> x1, then none of these nodes will appear in the list because their in-degree will not reach 0 during Khan's algorithm. Each node xi in the cycle can't be put into the queue q because there is always some other predecessor node x_(i-1) with an edge going from x_(i-1) to xi preventing this from happening.
Full solution to Leetcode course-schedule-ii in Python 3:
from collections import defaultdict
def build_graph(edges, n):
g = defaultdict(list)
for i in range(n):
g[i] = []
for a, b in edges:
g[b].append(a)
return g
def topsort(g, n):
# -- Step 1 --
indeg = [0] * n
for u in g:
for v in g[u]:
indeg[v] += 1
# -- Step 2 --
q = []
for i in range(n):
if indeg[i] == 0:
q.append(i)
# -- Step 3 and 4 --
result = []
while q:
x = q.pop()
result.append(x)
for y in g[x]:
indeg[y] -= 1
if indeg[y] == 0:
q.append(y)
return result
def courses(n, edges):
g = build_graph(edges, n)
ordering = topsort(g, n)
# -- Step 5 --
has_cycle = len(ordering) < n
return [] if has_cycle else ordering
Currently, I have a query in Neo4j that returns all the nodes in my graph that are pointed to by multiple nodes.
The query to return these nodes (in picture) looks like this:
MATCH (n)-[r:CLINICAL_SIGNIFICANCE]->()
WITH n, count(r) as rel_cnt
WHERE rel_cnt > 1
MATCH (c)-[r:PROTEIN_CHANGE]->(n)
return c, n
Is there a way to loop through the nodes labeled as c (blue nodes) and if they point to the same node labeled as n (yellow nodes), create a relationship between the nodes labeled as c (blue nodes)?
Suppose I have a neo4j graph with 3 different kind of nodes (say type A, type B, type C).
There are :
5 nodes of type A
40 nodes of type B
200 nodes of type C
Each node of type A are connecting to one or more of type B ((A -> B)) and each node of type B are connecting to one or more of type C ((B -> C)).
One node of type B can be shared by more than one type A nodes (A1 -> B1, A2 -> B1) and one node of type C can be shared by more than one type B nodes (B1 -> C1, B2 -> C1).
No node of type A connects with any node of type C. And the relationships are directional as described above.
For a given node of type A, can I find out all the nodes in the connected network, i.e., the entire tree emerging from that node, and not just the immediately connected nodes?
So basically I am looking for a py2neo function or cypher query that can give me a full tree or full network emerging from a given node.
Is this query respond to your needs ?
MATCH p=(a:A)-->(b:B)-->(c:C)
WHERE a.id = 'your id' // your condition to find your specific A node
RETURN p
It is possible to search a subgraph and node by their unique names;
n = agnode(g, "myUniqueNodeName", FALSE);
h = agsubg(g, "myUniqueSubgrahName", FALSE);
Likewise, is there a way to search edges in a strict directed graph by their unique names?
e = agedge (g, u, v, "e28", FALSE);
Documentation indicates that:
The 'name' of an edge (more correctly, identifier) is treated as a
unique indentifier for edges between a particular node pair. That is,
there can only be at most one edge with name e28 between any given u
and v, but there can be many other edges between other nodes.
It seems that there must be a list of edges that can be searched by name. Otherwise, a separate (ID -> edge) map will need to be maintained separately.
I have a directed, unweighted, possibly cyclic graph that can contain loops and multiple duplicate edges (i.e. two edges from node 1 to node 2).
I would now like to find the length of the longest trail in this graph, i.e. the longest path that:
- uses no edge twice (but if there are multiple edges from node 1 to node 2, it can use every one of them)
- possibly visits nodes several time (i.e. it does not have to be a simple path)
In particular, is this problem NP-hard? I know that the longest simple path is NP-hard (reducing Hamiltonian Path to it) and the longest trail with edge reusal is in P (Bellman ford with weight -1 on every edge). However, with this problem, I am not quite sure and I could not find good information on it.
Although I am not completely sure, I think that this problem is NP-hard. As I understand, your question arises due to multiple edges between nodes. The graphs that has multiple edges between same nodes can be expanded to larger graphs with no multiple edges between them. Thus, a graph with multiple edges between same nodes has no difference than a graph without multiple edges.
Let me walkthrough a simple example to explain:
Let there be a graph with 3 nodes (A,B,C) and 5 edges between them (A to B, A to B, B to A, B to C, C to A)
This graph can be expanded and shown with 5 nodes and 7 edges.
Lets expand the node A to 3 different nodes (A1, A2, A3). When we adjust the edges according to previous edges, there exists 7 edges(A1 to B, A2 to B, B to A3, B to C, C to A1, C to A2, C to A3)
As a result, now we have a graph without multiple edges and can be evaluated with the help of Hamiltonian and Bellman Ford.
Hope I've at least cleared the problem a bit.