Neo4J - Extracting graph as a list based on relationship strength - graph

I have a typical friend-of-friend graph database, i.e. a social network database. The requirement is to extract all the nodes as a list in such a way that the least connected nodes appear together in the list and the most connected nodes are placed further apart in the list.
Basically, it's asking for a graph to be represented as a list, and I'm not sure we can really do that. For example, if A is related to B with strength 10, B is related to C with strength 80, and A to C is 20,
then how do we place this in a list?
A, B, C - no, because then A is further from C than from B, which is not the case
A, C, B - yes, because A and B are less related than A,C and C,B.
With 3 nodes it's very simple, but with a lot of nodes, is it possible to put them in a list based on relationship strength?

Ok, I think this is maybe what you want. An inverse of the shortestPath traversal with weights. If not, tell me how the output should be.
http://console.neo4j.org/r/n8npue
MATCH p=(n)-[*]-(m) // search all paths
WHERE n <> m
AND ALL (x IN nodes(p) WHERE length([x2 IN nodes(p) WHERE x2=x])=1) // this filters simple paths
RETURN [n IN nodes(p)| n.name] AS names, // get the names out
reduce(acc=0, r IN relationships(p)| acc + r.Strength) AS totalStrength // calculate total strength produced by following this path
ORDER BY length(p) DESC , totalStrength ASC // get the max length (hopefully a full traversal), and the minimum strength
LIMIT 1
This is not going to be efficient for a large graph, but I think it's definitely doable. If you need speed on a large graph, you will probably need the shortest path functionality of the traversal/graphalgo API.
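To make the idea behind the query concrete outside of Neo4j, here is a minimal brute-force sketch in plain Python. It assumes the strengths are given as a dict keyed by unordered node pairs, and it only considers orderings that visit every node; the function name weakest_chain and the data layout are illustrative, not part of any Neo4j API. Like the query, it is only practical for small graphs.

```python
from itertools import permutations

def weakest_chain(strength):
    """Return (total, order) for the ordering of all nodes whose adjacent
    pairs have the smallest summed strength, or None if no ordering exists.

    strength: dict mapping frozenset({a, b}) -> relationship strength.
    """
    nodes = sorted({n for pair in strength for n in pair})
    best = None
    for order in permutations(nodes):
        if order[0] > order[-1]:
            continue  # a list and its reverse are the same chain
        weights = [strength.get(frozenset(p)) for p in zip(order, order[1:])]
        if any(w is None for w in weights):
            continue  # two neighbours in the list are not related at all
        total = sum(weights)
        if best is None or total < best[0]:
            best = (total, list(order))
    return best

strengths = {
    frozenset("AB"): 10,
    frozenset("BC"): 80,
    frozenset("AC"): 20,
}
print(weakest_chain(strengths))
```

For the three-node example from the question this picks B, A, C with a total strength of 30, which matches ordering by totalStrength ASC in the query.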

Related

Using a recursive / fixed point / iterative structure in a Neo4j Cypher query

My Neo4j database contains relationships that may have a special property:
(a) -[{sustains:true}]-> (b)
This means that a sustains b: when the last node that sustains b is deleted, b itself should be deleted. I'm trying to write a Cypher statement that deletes a given node PLUS all nodes that now become unsustained as a result. This may set off a chain reaction, and I don't know how to encode this in Cypher. Is Cypher expressive enough?
In any other language, I could come up with a number of ways to implement this. A recursive algorithm for this would be something like:
delete(a) :=
MATCH (a) -[{sustains:true}]-> (b)
REMOVE a
WITH b
MATCH (aa) -[{sustains:true}]-> (b)
WHERE count(aa) = 0
delete(b)
Another way to describe the additional set of nodes to delete would be with a fixed point function:
setOfNodesToDelete(Set) :=
RETURN Set' ⊆ Set such that for all n ∈ Set'
there is no (m) -[{sustains:true}]-> (n) with m ∉ Set
We would start with the set of all z such that (a) -[{sustains:true}*1..]-> (z), then delete a, run setOfNodesToDelete on the set until it doesn't change anymore, then delete the nodes specified by the set. This requires an unspecified number of iterations.
Any way to accomplish my goal in Cypher?
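For reference, the fixed-point description above can be sketched outside of Cypher in a few lines of Python. This is only an illustration of the intended semantics under the assumption that the sustains relationships are available as a set of (a, b) pairs; the function name nodes_to_delete is hypothetical.

```python
def nodes_to_delete(sustains, start):
    """Return start plus every node that becomes unsustained once start is gone.

    sustains: iterable of (a, b) pairs meaning "a sustains b".
    """
    sustained, sustainers = {}, {}
    for a, b in sustains:
        sustained.setdefault(a, set()).add(b)
        sustainers.setdefault(b, set()).add(a)

    # Candidate set: every z with (start) -[sustains*1..]-> (z).
    candidates, stack = set(), [start]
    while stack:
        n = stack.pop()
        for b in sustained.get(n, ()):
            if b != start and b not in candidates:
                candidates.add(b)
                stack.append(b)

    # Shrink the set until it stops changing: drop every node that still
    # has a sustainer outside the set (start itself counts as deleted).
    while True:
        doomed = candidates | {start}
        kept = {n for n in candidates if sustainers[n] <= doomed}
        if kept == candidates:
            return doomed
        candidates = kept

edges = {("a", "b"), ("a", "c"), ("x", "c"), ("b", "d"), ("c", "d")}
print(sorted(nodes_to_delete(edges, "a")))
```

In this example c survives because x still sustains it, and d survives because c does. Note that with this definition a group of nodes that only sustain each other is deleted together once its last outside sustainer disappears.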

Enumerate all paths from a single source in a graph

I was wondering if you are aware of an algorithm to enumerate all possible simple paths in a graph from a single source, without repeating any of the vertices. Keep in mind that the graph will be very small (16 nodes) and relatively sparse (2-5 edges per node).
To make my question clear:
Vertices: A,B,C
A connects to B, C
B connects to A, C
C connects to A, B
Paths (from A):
A,B
A,C
A,B,C
A,C,B
Vertices: A,B,C,D
A connects to B, C
B connects to A, C, D
C connects to A, B, D
Paths (from A):
A,B
A,C
A,B,C
A,B,D
A,C,B
A,C,D
A,B,C,D
A,C,B,D
It is surely not BFS or DFS, although one of their possible variants might work. Most of the similar problems I saw on SO dealt with paths between a pair of nodes, so my problem is slightly different.
The question "Find all possible paths from one vertex in a directed cyclic graph in Erlang" is also related, but the answers are too Erlang-specific or it is not clear what exactly needs to be done. As I see it, the algorithm could alternatively be described as finding all possible simple paths for a given number of hops from a single source. Then, for each number of hops (1 to N), we could find all solutions.
I work with Java but even a pseudocode is more than enough help for me.
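In Python style, it is a DFS with backtracking: a vertex counts as visited only while it is on the current path, so it can be reused on other paths.
def multiple_path(path, vertex, neighbors):
    # neighbors: dict mapping each vertex to a list of adjacent vertices
    path.append(vertex)          # vertices on the current path are "visited"
    if len(path) > 1:
        print(path)              # every extension of the source is a simple path
    for nxt in neighbors[vertex]:
        if nxt not in path:
            multiple_path(path, nxt, neighbors)
    path.pop()                   # backtrack: un-visit the vertex

# usage: multiple_path([], 'A', neighbors)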
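The same recursive idea can be written as a generator that returns the paths instead of printing them, which makes it easy to check against the second example from the question. This is a sketch that assumes the graph is a dict of adjacency lists exactly as listed in the question (D has no listed connections); the name simple_paths is illustrative.

```python
def simple_paths(neighbors, source):
    """Yield every simple path (as a list) that starts at source
    and contains at least one edge."""
    def extend(path):
        for nxt in neighbors.get(path[-1], ()):
            if nxt not in path:          # never repeat a vertex
                new_path = path + [nxt]
                yield new_path
                yield from extend(new_path)
    yield from extend([source])

graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
}
for p in sorted(simple_paths(graph, "A"), key=lambda p: (len(p), p)):
    print(",".join(p))
```

Filtering the result by len(p) gives the "fixed number of hops" variant mentioned in the question.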

Shortest path between three points

In a graph I need to find a shortest path between two points that visits one checkpoint on the way. Also, I can visit each vertex only once. I suppose it has something to do with network flow, but I have no idea how to implement that.
You can model it entirely as a capacitated multicommodity minimum cost flow problem. You want to go from A to B via C without using a vertex twice. You can model it as a flow from A to C (commodity 1) and a flow from B to C (commodity 2). To avoid a node being used twice, you have to perform the following trick on all your nodes (in your model):
Given a node X with p incoming and q outgoing edges, you create a new node Y and rewire the links. The p incoming edges will all arrive in X, the q outgoing edges will all depart from Y. Add only 1 link (L) from X to Y. By setting the capacity of the L-link to 1, each node will only be used once.
You can then write it down as an (M)ILP and have it solved. The ILP will give you the correct solution if it exists. Depending on your application, it might be overkill. If you want a fast heuristic, just use 2 A* searches and hope they don't overlap.
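The node-splitting trick itself is easy to sketch. The snippet below only performs the rewiring on a list of directed arcs; it does not build or solve the ILP, and the (node, "in") / (node, "out") naming for X and Y is an assumption made for illustration.

```python
def split_nodes(arcs):
    """Rewire a directed graph so that every node can carry at most one unit.

    arcs: list of (tail, head, capacity) tuples.
    Each node x becomes (x, "in") and (x, "out"), joined by a capacity-1 arc;
    original arcs leave from the "out" half and arrive at the "in" half.
    """
    nodes = sorted({n for tail, head, _ in arcs for n in (tail, head)})
    rewired = [((x, "in"), (x, "out"), 1) for x in nodes]
    rewired += [((tail, "out"), (head, "in"), cap) for tail, head, cap in arcs]
    return rewired

for arc in split_nodes([("A", "X", 5), ("X", "B", 5)]):
    print(arc)
```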

finding the total number of distinct shortest paths between 2 nodes in undirected weighted graph in linear time?

I was wondering: if there is a weighted graph G(V,E) and I need to find a single shortest path between two vertices S and T in it, then I could use Dijkstra's algorithm. But I am not sure how this can be done when we need to find all the distinct shortest paths from S to T. Is it solvable in O(n) time? One more question: if we assume that the weights of the edges in the graph can only take values in a certain range, let's say 1 <= w(e) <= 2, will this affect the time complexity?
You can do it using Dijkstra. In Dijkstra's algorithm, you assign labels to each node based on the distance it has from your source and settle nodes according to this distance (smallest first). Once the target node is settled, you have the length of the shortest path. To retrieve the path (=the sequence of edges), you can keep track of the parent of each node you settle. To retrieve all possible paths, you have to account for multiple parents of each node.
A small example:
Suppose you have a graph which looks like this (all edges weight 1 for simplicity):
    B
   / \
  A   C - D
   \ /
    E
When you run Dijkstra to find the distance A->D, you would get 3. You would first settle node A with distance 0, then nodes B and E with distance 1, node C with distance 2 and finally node D with distance 3. To get the paths, you would remember the parents of the nodes when you settle them. In this case you would first set the parent of both B and E to node A. You then need to set the parents of node C to B and E, as they both supply the same length. Node D would get node C as a parent.
When you need the paths, you could just follow the parent pointers, starting at D and branching each time a node has multiple parents. This would give you a branch at node C.
The post mentioned in the comment above uses the same principles, although it does not allow you to actually extract the paths, it only gives you the count.
One final remark: Dijkstra is not a linear time algorithm, as you need to do a lot of operations to maintain your queue and node sets.
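A minimal sketch of this "multiple parents" idea, assuming strictly positive edge weights that can be compared exactly (e.g. integers) and a graph given as a dict of (neighbor, weight) lists. The function name all_shortest_paths is illustrative.

```python
import heapq
from math import inf

def all_shortest_paths(adj, source, target):
    """Return every shortest path from source to target as a list of nodes."""
    dist = {source: 0}
    parents = {source: []}
    settled = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue                      # stale queue entry
        settled.add(u)
        if u == target:
            break
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, inf):     # strictly better: replace parents
                dist[v] = nd
                parents[v] = [u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:           # equally good: remember extra parent
                parents[v].append(u)
    if target not in settled:
        return []

    def walk(node):                       # follow parent pointers, branching
        if node == source:
            return [[source]]
        return [p + [node] for par in parents[node] for p in walk(par)]

    return walk(target)

graph = {
    "A": [("B", 1), ("E", 1)],
    "B": [("A", 1), ("C", 1)],
    "E": [("A", 1), ("C", 1)],
    "C": [("B", 1), ("E", 1), ("D", 1)],
    "D": [("C", 1)],
}
print(all_shortest_paths(graph, "A", "D"))
```

On the example graph this yields the two paths through B and through E, branching at node C as described above.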
(a) Run BFS from s, marking nodes as visited as usual, with the following changes:
(b) each node v has a record L(v), the layer that v is on, and a record f(v), the number of shortest paths from s that reach it; initially L(s) = 0 and f(s) = 1
(c) every time a node v1 is found among the neighbors of another node v2:
(d) if v1 has not been added to the queue yet, add it to the queue and set L(v1) = L(v2) + 1, f(v1) += f(v2)
(e) if v1 is already in the queue and L(v1) equals L(v2), do nothing
(f) if v1 is already in the queue but L(v1) does not equal L(v2), f(v1) += f(v2)
(g) continue this modified BFS until t is popped from the queue for the first time; at that point we have f(t)
(h) this f(t) is the number of different shortest paths from s to t
This might solve your problem.
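The steps above could look like this in Python. Note that BFS layers only correspond to shortest paths when all edges count the same, so this sketch assumes an unweighted graph given as a dict of neighbor lists; the function name is illustrative.

```python
from collections import deque

def count_shortest_paths_bfs(adj, s, t):
    """Count the shortest (fewest-edge) paths from s to t."""
    layer = {s: 0}   # L(v)
    ways = {s: 1}    # f(v)
    queue = deque([s])
    while queue:
        v2 = queue.popleft()
        if v2 == t:
            return ways[t]                # t popped for the first time
        for v1 in adj[v2]:
            if v1 not in layer:           # first time v1 is seen
                layer[v1] = layer[v2] + 1
                ways[v1] = ways[v2]
                queue.append(v1)
            elif layer[v1] == layer[v2] + 1:
                ways[v1] += ways[v2]      # another shortest way into v1
    return 0                              # t is unreachable

graph = {
    "A": ["B", "E"],
    "B": ["A", "C"],
    "E": ["A", "C"],
    "C": ["B", "E", "D"],
    "D": ["C"],
}
print(count_shortest_paths_bfs(graph, "A", "D"))
```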

A Shortest Path Algorithm With Minimum Number Of Nodes Traversed

I am looking for a Dijkstra's algorithm implementation, that also takes into consideration the number of nodes traversed.
What I mean is, a typical Dijkstra's algorithm, takes into consideration the weight of the edges connecting the nodes, while calculating the shortest route from node A to node B. I want to insert another parameter into this. I want the algorithm to give some weightage to the number of nodes traversed, as well.
So that the shortest route computed from A to B, under certain values, may not necessarily be the Shortest Route, but the route with the least number of nodes traversed.
Any thoughts on this?
Cheers,
RD
Edit :
My apologies. I should have explained better. So, let's say the shortest route from
(A, B) is A -> C -> D -> E -> F -> B covering a total of 10 units
But I want the algorithm to come up with the route A -> M -> N -> B covering a total of 12 units.
So, what I want, is to be able to give some weightage to the number of nodes as well, not just the distance of the connected nodes.
Let me demonstrate that adding a constant value to all edges can change which route is "shortest" (least total weight of edges).
Here's the original graph (a triangle):
A-------B
 \  5  /
2 \   / 2
   \ /
    C
Shortest path from A to B is via C. Now add constant 2 to all edges. The shortest path becomes instead the single step from A directly to B (owing to "penalty" we've introduced for using additional edges).
Note that the number of edges used is (excluding the node you start from) the same as the number of nodes visited.
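Here is a small sketch of that effect: a plain Dijkstra search in Python where a constant hop_penalty is added to every edge as it is relaxed. The graph layout (dict of (neighbor, weight) lists) and the parameter name hop_penalty are assumptions for illustration; the returned cost includes the penalties.

```python
import heapq

def shortest_path(adj, source, target, hop_penalty=0):
    """Dijkstra where every edge costs its weight plus hop_penalty.
    Returns (cost, path) or None if target is unreachable."""
    heap = [(0, source, [source])]
    settled = set()
    while heap:
        cost, u, path = heapq.heappop(heap)
        if u == target:
            return cost, path
        if u in settled:
            continue
        settled.add(u)
        for v, w in adj[u]:
            if v not in settled:
                heapq.heappush(heap, (cost + w + hop_penalty, v, path + [v]))
    return None

triangle = {
    "A": [("B", 5), ("C", 2)],
    "B": [("A", 5), ("C", 2)],
    "C": [("A", 2), ("B", 2)],
}
print(shortest_path(triangle, "A", "B"))                 # goes via C
print(shortest_path(triangle, "A", "B", hop_penalty=2))  # goes directly
```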
The way you can do that is adapt the weights of the edges to always be 1, so that you traverse 5 nodes, and you've gone a distance of "5". The algorithm would be the same at that point, optimizing for number of nodes traversed rather than distance traveled.
If you want some sort of hybrid, you need to determine how much importance to give to traversing a node and the distance. The weight used in calculations should look something like:
weight = node_importance * 1 + (1 - node_importance) * distance
Where node_importance is a fraction between 0 and 1 that gauges how much the node count matters relative to the distance: 1 means only the number of nodes traversed counts, 0 means only the distance counts. You may have to normalize the distances to an average of 1 so that the two terms are comparable.
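As a rough sketch of that formula, the helper below rescales the distances so they average 1 and then blends them with the constant per-edge cost. The edge-list format and the function name blend_weights are assumptions for illustration; the resulting weights can be fed to any ordinary shortest path routine.

```python
def blend_weights(edges, node_importance):
    """edges: list of (u, v, distance). Returns (u, v, blended_weight) tuples
    using weight = node_importance * 1 + (1 - node_importance) * distance,
    with distances first normalized to an average of 1."""
    mean = sum(d for _, _, d in edges) / len(edges)
    return [
        (u, v, node_importance * 1 + (1 - node_importance) * (d / mean))
        for u, v, d in edges
    ]

triangle = [("A", "B", 5), ("A", "C", 2), ("C", "B", 2)]
for importance in (0.0, 0.5, 1.0):
    print(importance, blend_weights(triangle, importance))
```

With node_importance = 0.5 on the triangle A-B (5), A-C (2), C-B (2), the direct edge A-B becomes cheaper than the two-edge route via C, while with 0 the route via C is still the shortest.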
I'm going to go out on a limb here, but have you tried the A* algorithm? I may have understood your question wrong, but it seems like A* would be a good starting point for what you want to do.
Check out: http://en.wikipedia.org/wiki/A*_search_algorithm
Some pseudo code there to help you out too :)
If I understood the question correctly, the best analogy is the way the best path is chosen in a network.
In network communication, the shortest path is not necessarily selected if it has a high hop count (many nodes), since every node connection may add distortion, interference and noise.
So the best-path calculation minimizes a function of several variables, in your case distance and hop count (nodes).
You have to derive an equation that relates the distance and the node count to quality.
For example, suppose
1 hop = 5 units of distance (meaning one extra node has the same impact as 5 extra units of distance)
Then, to minimize the loss, you can use it in a linear equation:
minimize(distance + hopcount);
where hopcount is expressed in distance units.

Resources