Enumerating k number of paths passing through a node in Neo4j - graph

Currently, I am using the random walk algorithm in Neo4j to do some path computations. The random walk algorithm has the option of specifying source nodes from which the walks are constructed. However, I am wondering whether there is a way to enumerate k paths passing through a certain node, rather than making it a source node and combining the paths. Here k is a parameter, say, 2 paths or 3 paths.

If I'm reading your question correctly you want to find k random paths of length i that pass through node n and you don't want to combine the paths of two queries.
If so then the easiest way might be to find all paths containing n and take a random subset of that, i.e.:
MATCH (n) WHERE id(n) = $n_id // Find the node you want the paths to contain
MATCH path = (foo)-[:REL_TYPE*..i]->(bar) // Find all paths of up to i hops (write i as a literal integer)
WHERE n IN nodes(path) // Keep only the paths containing the node
AND NOT (foo = n OR bar = n) // Paths should not start or end with n
WITH collect(path) AS paths // Put all the paths in a list
RETURN apoc.coll.randomItems(paths, $k) // Return a random subset of size k
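
The same idea can be sketched outside the database. The following is a minimal in-memory illustration, not Neo4j code: it assumes the graph is a plain adjacency dict, and the names `paths_through` and `sample_paths` are hypothetical. Unlike Cypher, which only forbids repeating a relationship within a path, this sketch also forbids repeating a node.

```python
import random

def paths_through(graph, n, max_len):
    """All directed paths with at most max_len edges that never repeat a node
    and contain n strictly in the interior (n is neither first nor last)."""
    found = []

    def extend(path):
        # Record the path as soon as n appears as an interior node.
        if n in path[1:-1]:
            found.append(tuple(path))
        if len(path) - 1 == max_len:
            return
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:  # assumption: node-simple paths only
                path.append(nxt)
                extend(path)
                path.pop()

    for start in graph:
        if start != n:
            extend([start])
    return found

def sample_paths(graph, n, max_len, k, seed=None):
    """Pick up to k distinct candidate paths uniformly at random."""
    candidates = paths_through(graph, n, max_len)
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))
```

As with the Cypher query, every candidate path is materialised before sampling, so this is only practical when the number of paths through n is small.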

Related

Write a pseudo code for a Graph algorithm

Given a DAG and a function f which maps every vertex to a unique number from 1 to |V|, I need to write pseudocode for an algorithm that finds, for every vertex v, the minimal value of f(u) among all vertices u that are reachable from v, and saves it as an attribute of v. The time complexity of the algorithm needs to be (assuming that time complexity of f is ).
I thought about using DFS (or a variation of it) and/or topological sort, but I don't know how to use it in order to solve this problem.
In addition, I need to think about an algorithm that gets an undirected graph and the function f, and calculates the same thing for every vertex, and I don't know how to do that either.
Your idea of using DFS is right. Actually the function f(v) is only given for saying that each node can be uniquely identified from a number between 1 and |V|.
Just a hint for solving it: you would have to modify DFS so that it returns the minimum value of f(u) over the vertices u reachable from the current node, and save it in another array, let's say minReach, indexed by the function f(v). The visited array vis would similarly be indexed using f(v).
I am also giving the pseudocode in case you are not able to solve it, but try on your own first.
The pseudocode is similar to Python and assumes the graph and the function f(v) are available. 0-based indexing is assumed.
vis = [0, 0, 0, .. |V| times]   # visited array for DFS
minReach = [1, 2, 3, .. |V|]    # minReach[i] starts as the vertex's own value f = i+1
function dfs(node):
    vis[f(node)-1] = 1          # mark the unvisited node as visited
    for v in graph[node]:
        if vis[f(v)-1] != 1:    # adjacent node not visited yet: apply dfs to it
            minReach[f(node)-1] = min(minReach[f(node)-1], dfs(v))
        else:                   # already finished (the graph is a DAG): reuse its stored minimum
            minReach[f(node)-1] = min(minReach[f(node)-1], minReach[f(v)-1])
    return minReach[f(node)-1]
for vertex in graph:            # each vertex in the graph is checked
    if vis[f(vertex)-1] != 1:   # if the vertex is not visited, dfs is applied
        dfs(vertex)
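
A runnable version of the pseudocode might look like the sketch below. It assumes the DAG is an adjacency dict that lists every vertex as a key and that f is given as a dict; the name `min_reachable` is hypothetical. A vertex counts as reachable from itself, matching the initialisation of minReach above.

```python
def min_reachable(graph, f):
    """For each vertex v of a DAG, the minimum f[u] over all u reachable
    from v (v included). graph maps each vertex to a list of successors."""
    best = {}

    def dfs(v):
        if v in best:  # already finished: reuse the memoised value
            return best[v]
        result = f[v]
        for w in graph[v]:
            result = min(result, dfs(w))
        best[v] = result  # safe to store only on completion because a DAG has no cycles
        return result

    for v in graph:
        dfs(v)
    return best
```

Each vertex and edge is handled once. For the undirected variant, all vertices of a connected component reach exactly the same set of vertices, so the answer for each vertex is simply the minimum of f over its component.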

Cypher allShortestPaths just return one path?

Background statement:
I have a graph like the one below:
I want to find all the paths between Node A and Node F (something like how many ways I can reach F from A), so my Cypher looks like this:
MATCH (start:kg:test), (end:kg:test), p = allShortestPaths((start)-[*..8]-(end))
where start.value = 'A' and end.value = 'F'
RETURN start, end, p
I expected this query to return the whole graph, but it just returns A->F (the same result as using the shortestPath function), as shown below:
Problems
Why doesn't that query return all the different paths in the graph?
Am I misusing the allShortestPaths function?
How can I get all the paths from Node A to Node F?
thanks
shortestPath() returns the single shortest path between the nodes (and if there are multiple of the same size it just returns the first that it finds).
If there are multiple paths that could have been returned by shortestPath() (they will all have the same size), then allShortestPaths() will return all of them.
If you just want to find all possible paths between two nodes (the length of the path doesn't matter, and you don't care about shortest paths at all), then you don't need to use either of these functions.
MATCH p=(start:kg:test)-[*..8]-(end:kg:test)
where start.value = 'A' and end.value = 'F'
RETURN start, end, p
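
To illustrate the difference, here is a small in-memory sketch. The graph is a made-up example (the asker's graph is not shown), and the function names are hypothetical. It also forbids repeating a node within a path, which is stricter than Cypher's rule of not repeating a relationship.

```python
def all_paths(graph, start, end, max_hops):
    """Every path from start to end with at most max_hops edges
    that never repeats a node."""
    results = []

    def walk(path):
        node = path[-1]
        if node == end:
            results.append(list(path))
            return
        if len(path) - 1 == max_hops:
            return
        for nxt in graph[node]:
            if nxt not in path:
                path.append(nxt)
                walk(path)
                path.pop()

    walk([start])
    return results

def all_shortest(graph, start, end, max_hops):
    """Only the paths that tie for the minimum length."""
    paths = all_paths(graph, start, end, max_hops)
    if not paths:
        return []
    shortest = min(len(p) for p in paths)
    return [p for p in paths if len(p) == shortest]

# Hypothetical undirected graph: A-F, A-B-F and A-C-D-F.
example = {
    "A": ["F", "B", "C"],
    "B": ["A", "F"],
    "C": ["A", "D"],
    "D": ["C", "F"],
    "F": ["A", "B", "D"],
}
```

On this example `all_shortest` keeps only the direct A-F path, because nothing else ties with it, while `all_paths` also finds the two longer routes.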

Zero or more length of path in Cypher

For example I have a path:
1-[:A]->2-[:B]->3
And we can use the * operator to define if a particular edge can be repeated. I would like to use the * operator on the entire path, or both edges combined. I would like to follow: (A AND B) zero or more times.
Example:
1-[:A]->2-[:B]->3-[:A]->4-[:B]->5...
I am not sure how to apply the * operator for the entire path in Cypher. My intent is to express a pattern that allows a specific path to be repeated 0 or more times.
This is something variable-length patterns cannot do in Cypher. However, because of this, we added repeating sequences functionality to path expander procs in APOC Procedures.
As an example:
MATCH (n)
WHERE id(n) = 123
CALL apoc.path.expandConfig(n, {relationshipFilter:'A>, B>'}) YIELD path
RETURN path
This expands from a start node (n) expanding out only a repeating sequence of outgoing :A and :B relationships. No minLevel or maxLevel properties were provided, so this has a minimum of 0 length and no bounds on max length.
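
The pattern the question asks for, (A then B) repeated zero or more times, can be sketched in plain Python as follows. This is an illustration of the pattern only, not of how APOC works internally; the edge-map layout and the name `expand_sequence` are assumptions, and nodes are not revisited within a path so that the expansion terminates on cyclic graphs.

```python
def expand_sequence(edges, start, sequence):
    """Node paths from start that follow `sequence` of relationship types
    repeated zero or more whole times.
    edges maps (node, rel_type) -> list of target nodes."""
    results = [[start]]   # zero repetitions: the start node alone
    frontier = [[start]]
    while frontier:
        next_frontier = []
        for path in frontier:
            partials = [path]
            # Apply one full pass of the sequence, one relationship type at a time.
            for rel_type in sequence:
                partials = [
                    p + [target]
                    for p in partials
                    for target in edges.get((p[-1], rel_type), [])
                    if target not in p  # assumption: no node is revisited
                ]
            next_frontier.extend(partials)
        results.extend(next_frontier)
        frontier = next_frontier
    return results
```

Only paths that end after a complete A, B pass are returned, so a trailing :A relationship with no following :B does not produce a result.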

finding the total number of distinct shortest paths between 2 nodes in undirected weighted graph in linear time?

I was wondering: if there is a weighted graph G(V,E) and I need to find a single shortest path between two vertices S and T in it, I could use Dijkstra's algorithm. But I am not sure how this can be done when we need to find all the distinct shortest paths from S to T. Is it solvable in O(n) time? I had one more question: if we assume that the weights of the edges can only take values in a certain range, say 1 <= w(e) <= 2, will this affect the time complexity?
You can do it using Dijkstra. In Dijkstra's algorithm, you assign labels to each node based on the distance it has from your source and settle nodes according to this distance (smallest first). Once the target node is settled, you have the length of the shortest path. To retrieve the path (=the sequence of edges), you can keep track of the parent of each node you settle. To retrieve all possible paths, you have to account for multiple parents of each node.
A small example:
Suppose you have a graph which looks like this (all edges weight 1 for simplicity):
    B
   / \
  A   C - D
   \ /
    E
When you run Dijkstra to find the distance A->D, you would get 3. You would first settle node A with distance 0, then nodes B and E with distance 1, node C with distance 2 and finally node D with distance 3. To get the paths, you would remember the parents of the nodes when you settle them. In this case you would first set the parent of both B and E to node A. You then need to set the parents of node C to B and E, as they both give the same length. Node D would get node C as a parent.
When you need the paths, you could just follow the parent pointers, starting at D and branching each time a node has multiple parents. This would give you a branch at node C.
The post mentioned in the comment above uses the same principles, although it does not allow you to actually extract the paths, it only gives you the count.
One final remark: Dijkstra is not a linear-time algorithm, as you need to do a lot of operations to maintain your queue and node sets.
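
The multiple-parents idea described above can be sketched as follows. This is a minimal illustration that assumes strictly positive edge weights (a zero-weight edge could make two nodes each other's parent) and an adjacency dict of (neighbour, weight) pairs; the function name is hypothetical.

```python
import heapq

def dijkstra_all_shortest_paths(graph, source, target):
    """All shortest paths from source to target.
    graph maps node -> list of (neighbour, weight), weights > 0."""
    dist = {source: 0}
    parents = {source: []}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parents[v] = [u]      # strictly better: replace the parents
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                parents[v].append(u)  # tie: remember the extra parent
    if target not in dist:
        return []

    def build(node):
        # Follow parent pointers back to the source, branching on ties.
        if node == source:
            return [[source]]
        return [p + [node] for par in parents[node] for p in build(par)]

    return build(target)
```

On the five-node example above, node C ends up with parents B and E, so two paths are returned. Note that the number of shortest paths can itself be exponential in the size of the graph, so listing them all cannot be linear in general.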
(a) Run BFS from s, marking nodes as visited as usual, with the following changes:
(b) each node v keeps a record L(v), the layer that v is on, and a record f(v), the number of shortest paths from s that reach it; initially f(s) = 1 and f(v) = 0 for every other node.
(c) Every time a node v1 is found among the neighbors of a node v2 that is being processed:
(d) if v1 has not been added to the queue yet, add it to the queue and set L(v1) = L(v2) + 1, f(v1) += f(v2);
(e) if v1 has already been queued and L(v1) equals L(v2), do nothing;
(f) if v1 has already been queued and L(v1) = L(v2) + 1, f(v1) += f(v2).
(g) Continue this modified BFS until t is popped from the queue for the first time; at that point f(t) is final,
(h) and this f(t) is the number of different shortest paths from s to t.
This might solve your problem.
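
The steps above can be sketched in Python as follows. BFS measures distance in hops, so this sketch assumes every edge has the same weight; the adjacency-dict layout and the name `count_shortest_paths` are assumptions.

```python
from collections import deque

def count_shortest_paths(graph, s, t):
    """Number of distinct shortest (fewest-edge) paths from s to t.
    graph maps node -> list of neighbours."""
    level = {s: 0}   # L(v): BFS layer of v
    count = {s: 1}   # f(v): number of shortest paths from s to v
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            # Every node of the previous layer was popped before t, so f(t) is final.
            return count[t]
        for v in graph[u]:
            if v not in level:                 # first time v is seen
                level[v] = level[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif level[v] == level[u] + 1:     # another shortest way into v
                count[v] += count[u]
    return 0  # t is not reachable from s
```

This runs in time linear in the number of vertices and edges, which is what makes the unit-weight case easier than the general weighted one.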

Finding all paths in directed graph with specific cost

Suppose we have a directed, weighted graph. Our task is to find all paths between two vertices (source and destination) whose cost is less than or equal to N. We visit every vertex only once. In a later version I'd like to add a condition that the source can be the destination (we just make a loop).
I think it can be done with modified Dijkstra's algorithm, but I have no idea how implement such thing. Thanks for any help.
You could use recursive backtracking to solve this problem. Terminate your recursion when:
You get to the destination
You visit a node that was already visited
Your path length exceeds N.
Pseudocode:
list curpath := {}
int dest, maxlen
def findPaths (curNode, dist):
    if dist > maxlen:            # too long: give up on this branch
        return
    if curNode is marked:        # already on the current path
        return
    add curNode to curpath
    if curNode = dest:
        print curpath            # curpath now includes the destination
    else:
        mark curNode
        for nextNode, edgeDist adjacent to curNode:
            findPaths(nextNode, dist + edgeDist)
        unmark curNode           # allow curNode on other paths
    remove last element of curpath
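
A runnable sketch of this backtracking search is given below. It assumes non-negative edge costs (otherwise pruning on the cost so far would be wrong) and an adjacency dict of (neighbour, cost) pairs; the name `find_paths` and the data layout are illustrative. Because the destination test is applied to the next node before the visited test, it also covers the case where the source is the destination.

```python
def find_paths(graph, source, dest, max_cost):
    """All paths from source to dest with total cost <= max_cost that visit
    no intermediate node twice. Returns a list of (path, cost) pairs."""
    results = []
    path = [source]
    visited = {source}

    def walk(node, cost):
        for nxt, weight in graph.get(node, ()):
            new_cost = cost + weight
            if new_cost > max_cost:      # cost budget exceeded
                continue
            if nxt == dest:              # reached the destination (may equal source)
                results.append((path + [nxt], new_cost))
                continue
            if nxt in visited:           # would revisit a node
                continue
            visited.add(nxt)
            path.append(nxt)
            walk(nxt, new_cost)
            path.pop()
            visited.remove(nxt)

    walk(source, 0)
    return results
```

With source equal to dest, the search returns the loops that leave the source and come back within the budget; the empty zero-cost path is not reported.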
You want to find all the paths from point A to point B in a directed graph, such that the distance from A to B is smaller than N, while allowing the possibility that A = B.
Dijkstra's algorithm is tailored to find the shortest path from one point to another in a graph, and discards all the others along the way, so to speak. Because of this, it cannot be used to find all the paths, if we include paths which overlap.
You can achieve your goal by doing a breadth-first search in the graph, keeping each branch of the covering tree in its own stack (you will get an enormous number of them if the nodes are very well connected), and stopping at depth N. All the branches which have reached B are kept aside. Once depth N has been covered, you drop all the paths which didn't reach B. The remaining ones, together with the ones kept aside, become your solutions.
You may choose to add the restriction of not having cycles in your paths, in which case you would have to check at each step of the search whether the newly reached node is already in the path covered so far, and prune that path if it is.
Here is some pseudo code:
function find_paths(graph G, node A, node B, int N):
    list<path> L, L';
    L := empty list;
    push path(A) in L;
    for i = 2 to N begin
        L' := empty list;
        for each path P in L begin
            if last node of P = B then push P in L'
            else
                for each successor S of last node in P begin
                    if S not in P then
                        path P' := P;
                        push S in P';
                        push P' in L';
                    endif
                end
            endif
        end
        L := L';
    end
    for each path P in L begin
        if last node of P != B
            then remove P from L
        endif
    end
    return L;
I think a possible improvement (depending on the size of the problem and the maximum cost N) to the recursive backtracking algorithm suggested by jma127 would be to pre-compute the minimum distance of each node from the destination (shortest path tree), then append the following to the conditions tested to terminate your recursion:
You get to a node whose minimum distance from the destination is greater than the maximum cost N minus the distance travelled to reach the current node.
If one needs to run the algorithm several times for different sources and destinations, one could run, e.g., Johnson's algorithm at the beginning to create a matrix of the shortest paths between all pairs of nodes.
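
A sketch of this pruning, under the same assumptions as before (non-negative edge costs, adjacency dict of (neighbour, cost) pairs, hypothetical function names), could use Dijkstra on the reversed graph to get each node's minimum remaining distance to the destination:

```python
import heapq

def dist_to_dest(graph, dest):
    """Minimum cost from every node to dest, via Dijkstra on reversed edges."""
    reverse = {}
    for u, edges in graph.items():
        for v, w in edges:
            reverse.setdefault(v, []).append((u, w))
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in reverse.get(u, ()):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def find_paths_pruned(graph, source, dest, max_cost):
    """Backtracking search that abandons a branch as soon as even the
    cheapest continuation to dest would exceed max_cost."""
    remaining = dist_to_dest(graph, dest)
    results = []
    path = [source]

    def walk(node, cost):
        for nxt, w in graph.get(node, ()):
            new_cost = cost + w
            # Nodes that cannot reach dest have infinite remaining distance.
            if new_cost + remaining.get(nxt, float("inf")) > max_cost:
                continue
            if nxt == dest:
                results.append((path + [nxt], new_cost))
                continue
            if nxt in path:
                continue
            path.append(nxt)
            walk(nxt, new_cost)
            path.pop()

    walk(source, 0)
    return results
```

The pruning never removes a valid path, because the remaining distance is a lower bound on the cost of any continuation; it only avoids exploring branches that cannot succeed.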
