ArangoDB - how to perform computations in graph traversals?

I have a simple graph for keeping track of the people I have lent money to.
So the graph looks like this:
userB -- owes to (amount: 200) --> userA
userC -- owes to (amount: 150) --> userA
and so on...
Let's say you need to find out how much money each user is owed, using a graph traversal. How do you implement this?

Let me explain this using the city example graph.
Vertices (cities) have a numeric attribute, population; edges (highways) have a numeric attribute, distance.
First, inspect what we expect to summarize:
FOR v, e IN 1..1 INBOUND "frenchCity/Lyon" GRAPH "routeplanner"
RETURN {city: v, highway: e}
Summing up the population of all traversed cities is easy:
RETURN SUM(FOR v IN 1..1 INBOUND "frenchCity/Lyon" GRAPH "routeplanner"
RETURN v.population)
This uses a sub-query, which means all values are returned first, and only then is the SUM operation executed on them.
It's better to use COLLECT AGGREGATE to sum up the attributes during the traversal.
So while it may not make much sense to sum up city populations and highway distances, let's do it anyway:
FOR v, e IN 1..1 INBOUND "frenchCity/Lyon" GRAPH "routeplanner"
COLLECT AGGREGATE populationSum = SUM(v.population), distanceSum = SUM(e.distance)
RETURN {population : populationSum, distances: distanceSum}
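Applied to the lending graph from the question, the same idea is a running sum per creditor over the "owes to" edges. The following is only a minimal in-memory Python sketch of that aggregation, not AQL; the edge list and the function name total_owed are assumptions made for illustration, using the two amounts given in the question.

```python
from collections import defaultdict

# Hypothetical edge list mirroring the question: (debtor, creditor, amount).
OWES_EDGES = [
    ("userB", "userA", 200),
    ("userC", "userA", 150),
]

def total_owed(edges):
    """Sum the amounts per creditor in one pass, without building
    an intermediate list of values first."""
    totals = defaultdict(int)
    for _debtor, creditor, amount in edges:
        totals[creditor] += amount
    return dict(totals)

print(total_owed(OWES_EDGES))
```

Like COLLECT AGGREGATE, this keeps only one accumulator per group instead of materializing every value before summing.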

Related

ArangoDB: Find last node in path

I'm pretty new to ArangoDB and I'm trying to get the last/leaf node (I guess vertex) in a graph. So given that I have the following graph:
Now I want to start the traversal with 6010142. The query should return 6010625 because it is the last node that can be reached via 6010145. But what does the query look like?
I already tried:
FOR v, e, p IN 1..5 OUTBOUND {_id: 'nodes/6010142'} GRAPH 'test' RETURN v
But it also returns 6010145. Furthermore, it is limited to a maxDepth of 5, but my graph can exceed that limit. So I also need a solution that works for any depth. Hopefully someone can help me :-)
I'm also just starting out with AQL but maybe this can help.
FOR v IN 1..5 OUTBOUND {_id: 'nodes/6010142'} GRAPH 'test' OPTIONS {uniqueVertices: 'global', bfs: true}
FILTER LENGTH(FOR vv IN OUTBOUND v GRAPH 'test' LIMIT 1 RETURN 1) == 0
RETURN v
This approach follows an older ArangoDB cookbook (p. 39) for finding leaf nodes. The FILTER line takes each connected node found by the first line and runs a second traversal from it to check whether it is actually a leaf, i.e. whether it has no outbound edges.
The OPTIONS {uniqueVertices: 'global', bfs: true} part is an optimization if you are only interested in unique leaf nodes and not all the different paths to those nodes.
Regarding maxDepth I would just use a sufficiently high number. The worst case would be the number of nodes in your graph.
(The graph you posted and your description seem to disagree about the direction of the edges. Maybe you need to use INBOUND.)
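The leaf check described above can be sketched outside the database as a breadth-first search that keeps only reachable nodes without outgoing edges. This is a minimal Python illustration, not the AQL query itself; the adjacency map is an assumption (the graph image is missing, so the chain 6010142 -> 6010145 -> 6010625 is inferred from the question text), and reachable_leaves is a hypothetical name.

```python
from collections import deque

# Assumed edges, inferred from the question text only.
ADJACENCY = {
    "6010142": ["6010145"],
    "6010145": ["6010625"],
}

def reachable_leaves(adjacency, start):
    """Breadth-first search from start; return every reachable node
    that has no outgoing edges. Each node is visited at most once."""
    seen = {start}
    queue = deque([start])
    leaves = []
    while queue:
        node = queue.popleft()
        children = adjacency.get(node, [])
        if not children and node != start:
            leaves.append(node)
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return leaves

print(reachable_leaves(ADJACENCY, "6010142"))
```

Visiting each node once plays the same role as uniqueVertices: 'global' with bfs: true, and no depth limit is needed because the search stops when the queue is empty.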

Traverse Graph With Directed Cycles using Relationship Properties as Filters

I have a Neo4j graph with directed cycles. I have had no issue finding all descendants of A assuming I don't care about loops using this Cypher query:
match (n:TEST{name:"A"})-[r:MOVEMENT*]->(m:TEST)
return n,m,last(r).movement_time
The relationships between my nodes have a timestamp property on them, movement_time. I've simulated that in my test data below using numbers that I've imported as floats. I would like to traverse the graph using the timestamp as a constraint. Only follow relationships that have a greater movement_time than the movement_time of the relationship that brought us to this node.
Here is the CSV sample data:
from,to,movement_time
A,B,0
B,C,1
B,D,1
B,E,1
B,X,2
E,A,3
Z,B,5
C,X,6
X,A,7
D,A,7
Here is what the graph looks like:
I would like to calculate the descendants of every node in the graph and include the timestamp from the last relationship using Cypher; so I'd like my output data to look something like this:
Node:[{Descendant,Movement Time},...]
A:[{B,0},{C,1},{D,1},{E,1},{X,2}]
B:[{C,1},{D,1},{E,1},{X,2},{A,7}]
C:[{X,6},{A,7}]
D:[{A,7}]
E:[{A,3}]
X:[{A,7}]
Z:[{B,5}]
This non-Neo4J implementation looks similar to what I'm trying to do: Cycle enumeration of a directed graph with multi edges
This one is not 100% what you want, but very close:
MATCH (n:TEST)-[r:MOVEMENT*]->(m:TEST)
WITH n, m, r, [x IN range(0,length(r)-2) |
(r[x+1]).movement_time - (r[x]).movement_time] AS deltas
WHERE ALL (x IN deltas WHERE x>0)
RETURN n, collect(m), collect(last(r).movement_time)
ORDER BY n.name
We basically find all the paths between any of your nodes (beware: cartesian products get very expensive on non-trivial datasets). In the WITH we build a collection, deltas, that holds the difference between each two subsequent movement_time properties.
The WHERE applies an ALL predicate to filter out paths with any non-positive delta, so we guarantee strictly increasing values of movement_time along the path.
The RETURN then just assembles the results, not as a map but as one collection of the reachable nodes and one collection of the last movement_time values.
The remaining issue is that we get duplicates, since e.g. there are multiple paths from B to A.
As a general note: this problem can be solved much more elegantly and with better performance using the Java traversal API (http://neo4j.com/docs/stable/tutorial-traversal.html). There you would have a PathExpander that skips paths with decreasing movement_time early, instead of collecting all paths and filtering them afterwards (as Cypher does).
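The early-pruning idea can be illustrated without Neo4j. The following Python sketch is not the Java traversal API; it is a plain depth-first search over the CSV sample data from the question that only follows an edge when its movement_time is greater than that of the edge used to arrive. The function names are hypothetical. It collects every distinct (descendant, last movement_time) pair, so a node reached at two different times appears twice, and a start node reachable through a cycle shows up among its own descendants.

```python
from collections import defaultdict

# CSV sample data from the question: (from, to, movement_time).
EDGES = [
    ("A", "B", 0), ("B", "C", 1), ("B", "D", 1), ("B", "E", 1),
    ("B", "X", 2), ("E", "A", 3), ("Z", "B", 5), ("C", "X", 6),
    ("X", "A", 7), ("D", "A", 7),
]

def build_adjacency(edges):
    """Map each source node to its outgoing (target, movement_time) pairs."""
    adjacency = defaultdict(list)
    for src, dst, time in edges:
        adjacency[src].append((dst, time))
    return adjacency

def increasing_descendants(adjacency, start):
    """Return all (node, last_time) pairs reachable from start along paths
    whose movement_time strictly increases from edge to edge."""
    found = set()
    stack = [(start, float("-inf"))]
    while stack:
        node, last_time = stack.pop()
        for nxt, time in adjacency.get(node, []):
            # Prune early: never follow an edge that does not increase the time.
            if time > last_time and (nxt, time) not in found:
                found.add((nxt, time))
                stack.append((nxt, time))
    return found

adjacency = build_adjacency(EDGES)
for name in sorted({n for edge in EDGES for n in edge[:2]}):
    print(name, sorted(increasing_descendants(adjacency, name)))
```

Storing (node, time) pairs in a set also removes the duplicates mentioned above, because the same pair reached via different paths is recorded only once.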

How to write this in Cypher

I have around 644 nodes in my graph database (Neo4j). I need to compute the distances between all of these 644 nodes and display them visually in the GUI. I want to pre-compute and store the distance between every pair of nodes in the database itself, rather than retrieving the nodes on the server, computing the distances between them on the fly, and then showing them in the GUI.
I want to understand how to write such a query in Cypher. Please let me know.
I think this can work:
// half cross product
match (a),(b)
where id(a) < id(b)
match p=shortestPath((a)-[*]-(b))
with a,b,length(p) as l
create (a)-[:DISTANCE {distance:l}]->(b)
Set 4950 properties, created 4950 relationships, returned 0 rows in 4328 ms
But the browser visualization will blow up with this, just so you know.
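For comparison, the same "half cross product" of hop-count distances can be sketched in plain Python with one breadth-first search per node. This is only an illustration on a small made-up undirected graph; the adjacency map and function names are assumptions, not part of the original answer.

```python
from collections import deque

# Hypothetical undirected sample graph: a - b - c - d.
ADJACENCY = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}

def bfs_distances(adjacency, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def all_pair_distances(adjacency):
    """Distance for each unordered pair (a, b) with a < b, like id(a) < id(b)."""
    result = {}
    for a in sorted(adjacency):
        dist = bfs_distances(adjacency, a)
        for b, hops in dist.items():
            if a < b:
                result[(a, b)] = hops
    return result

print(all_pair_distances(ADJACENCY))
```

Keeping only pairs with a < b halves the number of stored distances, which is the same reason the Cypher query filters on id(a) < id(b).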
Regarding your distance measure (it won't be that fast but should work):
MATCH (a:User)-[:READ]->(book)<-[:READ]-(b:User)
WITH a,b,count(*) as common,
length(a-[:READ]->()) as a_read,
length(b-[:READ]->()) as b_read
// toFloat avoids integer division, which would truncate the ratio to 0 or 1
CREATE UNIQUE (a)-[:DISTANCE {distance:toFloat(common)/(a_read+b_read-common)}]-(b)
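The ratio computed by this query is the number of books both users read divided by the number of books read by either of them. A minimal Python sketch of that arithmetic, with made-up reading lists (the function name and the book titles are assumptions):

```python
def read_overlap(a_books, b_books):
    """common / (a_read + b_read - common), as in the query above."""
    a_books, b_books = set(a_books), set(b_books)
    common = len(a_books & b_books)
    total = len(a_books) + len(b_books) - common
    # Guard against two users with no books at all.
    return common / total if total else 0.0

# Hypothetical reading lists: two shared books out of four distinct ones.
print(read_overlap({"Dune", "Emma", "Ulysses"}, {"Emma", "Ulysses", "Walden"}))
```

Note that this value grows as two users share more books, so it behaves like a similarity; if a true distance is needed, one option is to store 1 minus this ratio.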

Neo4J - Extracting graph as a list based on relationship strength

I have a typical friend of friend graph database i.e. a social network database. The requirement is to extract all the nodes as a list in such a way that the least connected nodes appear together in the list and the most connected nodes are placed further apart in the list.
Basically it's asking for a graph to be represented as a list, and I'm not sure if we can really do that. E.g. if A is related to B with strength 10, B is related to C with strength 80, and A to C with strength 20,
then how do we place this in a list?
A, B, C - no, because then A is relatively more distant from C than from B, which is not the case
A, C, B - yes, because A and B are less related than A,C and C,B.
With 3 nodes it's very simple, but with a lot of nodes, is it possible to put them in a list based on relationship strength?
Ok, I think this is maybe what you want. An inverse of the shortestPath traversal with weights. If not, tell me how the output should be.
http://console.neo4j.org/r/n8npue
MATCH p=(n)-[*]-(m) // search all paths
WHERE n <> m
AND ALL (x IN nodes(p) WHERE length([x2 IN nodes(p) WHERE x2=x])=1) // this filters simple paths
RETURN [n IN nodes(p)| n.name] AS names, // get the names out
reduce(acc=0, r IN relationships(p)| acc + r.Strength) AS totalStrength // calculate total strength produced by following this path
ORDER BY length(p) DESC , totalStrength ASC // get the max length (hopefully a full traversal), and the minimum strength
LIMIT 1
This is not going to be efficient for a large graph, but I think it's definitely doable. If you need speed on a large graph, you will probably have to use the shortest path functionality of the traversal/graphalgo API.
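What the query does (take the longest simple path, then the one with the lowest total strength) can be mimicked by brute force in Python for tiny graphs. The sketch below uses the three strengths from the question; the function name is hypothetical, and it tries every permutation, so it is only usable for a handful of nodes.

```python
from itertools import permutations

# Strengths from the question, keyed by unordered node pair.
STRENGTH = {
    frozenset(("A", "B")): 10,
    frozenset(("B", "C")): 80,
    frozenset(("A", "C")): 20,
}

def weakest_ordering(nodes, strength):
    """Return (ordering, total) for the ordering of all nodes whose
    neighbouring pairs have the lowest summed strength, or None if no
    ordering connects every neighbouring pair."""
    best = None
    for order in permutations(nodes):
        pairs = [frozenset(pair) for pair in zip(order, order[1:])]
        if any(pair not in strength for pair in pairs):
            continue  # neighbouring nodes must actually be related
        total = sum(strength[pair] for pair in pairs)
        if best is None or total < best[1]:
            best = (list(order), total)
    return best

print(weakest_ordering(["A", "B", "C"], STRENGTH))
```

Every ordering and its mirror image have the same total, so the result is one of two equally good lists.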

Directed Acyclical Graph Traversal... help?

I'm a little out of my depth here and need to phone a friend. I've got a directed acyclic graph I need to traverse, and I'm stumbling into graph theory for the first time. I've been reading a lot about it lately, but unfortunately I don't have time to figure this out academically. Can someone give me a kick with some help as to how to process this tree?
Here are the rules:
there are n root nodes (I call them "sources")
there are n end nodes
source nodes carry a numeric value
downstream nodes (I call them "worker" nodes) perform various operations on the incoming values like Add, Mult, etc.
As you can see from the graph below, nodes a, b, and c need to be processed before d, e, or f.
What's the proper order to walk this tree?
I would look into linearization of DAGs, which should be achievable through a topological sort.
Linearization, from what I remember, basically sorts the nodes in an order that holds to the following invariant: for every node NodeX that has an outgoing edge to some other node NodeA, NodeX appears before NodeA.
This would mean that, from your example, nodes a,b, and d would be processed first. Node c second. Nodes e and f, last.
http://en.wikipedia.org/wiki/Topological_sorting
You need to process the nodes in topological order. The order is not necessarily unique, so there might be more than one valid order (not that this should matter anyway).
The linked Wikipedia page has concrete algorithms to help you.
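One of those algorithms, Kahn's algorithm, can be sketched in a few lines of Python. The node names follow the question, but the edges are an assumption (the graph image is not available), chosen so that a, b and c are sources feeding d, e and f.

```python
from collections import deque

NODES = ["a", "b", "c", "d", "e", "f"]
# Hypothetical edges: (upstream, downstream).
EDGES = [("a", "d"), ("b", "d"), ("b", "e"), ("c", "e"), ("d", "f"), ("e", "f")]

def topological_order(nodes, edges):
    """Kahn's algorithm: repeatedly emit a node with no unprocessed inputs."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)  # the source nodes
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order

print(topological_order(NODES, EDGES))
```

Processing the worker nodes in the returned order guarantees that every input value has been computed before the node that consumes it.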
