ArangoDb: Storing all traversed paths of a Graph

ArangoDb: Storing all traversed paths of a Graph - graph

I would like to store all the distinct paths of a Graph till level 3 traversal and the code I have written below. However, I am getting syntax error near graph "G1".
let paths=(for d in Account
filter d.fraud==true
FOR v,e,p IN 1..3 OUTBOUND GRAPH "G1"
return p.e)
for d in Transaction
filter d._from in pathsand d._to in paths
return distinct {from:d._from, to:d._to, Amount:d.Amount, Time:d.Time}

Related

Cypher query - Combine 2 queries by pipe result of one to other

I am a beginner on cypher and want to create a query that find all nodes that connect to specific nodes that other node connect to them,
see the example
I need to get all the brown nodes that connect to the red nodes that the blue node connect to it.
for this example I want to get the brown nodes Ids: 2, 3 and 1 (because no red nodes needs to get it)
For now I did it on 2 different queries and use python to find it, but now I need to do this in 1 query.
Match (r:R)-[]-(a:A) return collect(a.Id)
and the second query:
Match (b:B) Optional Match (b)-[]-(a:A) return b.Id, collect(a.Id)
and in my script check if every record from the second query is a subset of the first list of all a.Id that connect to R
can I do it in 1 query?
Thank!

Improved answer:
Start with :B nodes and check if all their :A neighbours are have a link to :R
the ALL() function also returns true if :B does not have any neighbours
MATCH (b:B)
WHERE ALL(n IN [(b)--(a:A) | a] WHERE EXISTS ((n)--(:R)) )
RETURN b

ArangoDB: Find last node in path

I'm pretty new to Arangodb and I'm trying to get the last/leaf node (I guess vertex) in a graph. So given I've the following graph:
Now I want start the traversal with 6010142. The query should return 6010625 because it is the last node that can be reached via 6010145. But how does the query looks like?
I already tried:
FOR v, e, p IN 1..5 OUTBOUND {_id: 'nodes/6010142'} GRAPH 'test' RETURN v
But it also returns 6010145. Furthermore it is limited to a maxDepth of 5 but my graph can exceed the limit. So I also need a solution that works for any depth. Hopefully anyone can help me :-)

I'm also just starting out with AQL but maybe this can help.
FOR v IN 1..5 OUTBOUND {_id: 'nodes/6010142'} GRAPH 'test' OPTIONS {uniqueVertices: 'global', bfs: true}
FILTER LENGTH(FOR vv IN OUTBOUND v GRAPH 'test' LIMIT 1 RETURN 1) == 0
RETURN v
This approach follows an older ArangoDB cook book (p. 39) for finding leaf nodes. The filter line takes the connected nodes found by the first line and does a second traversal to check if this is actually a leaf.
The OPTIONS {uniqueVertices: 'global', bfs: true} part is an optimization if you are only interested in unique leaf nodes and not all the different paths to those nodes.
Regarding maxDepth I would just use a sufficiently high number. The worst case would be the number of nodes in your graph.
(The graph you posted and your description seem to disagree about the direction of the edges. Maybe you need to use INBOUND.)

Choose between 2 available edges during traversal

I'm relatively new to Gremlin, and the company I'm with is looking to implement a graph database with some temporary edges within it. Each vertex could have 1 or more edge, and each edge would have a property on it that is essentially isTemporary true/false.
When traversing the graph, if "isTemporary" = true we should follow that edge, otherwise we should follow the edge where "isTemporary" = false.
I.e.,
A-[isTemporary:true, value 1] -> B
A-[isTemporary:false, value 2] -> C
B-[isTemporary: false, value 3] -> D
Running a single gremlin query should return A->B->D in this case.
I've looked through TinkerPop3 documentation, and it seems like "choose" may be what I want to use here, but all the examples seem to return a value, when what I want is a traversal to come back so I can repeatedly act on the traversal.
Any help would be appreciated.

You could be looking for the coalesce step.
Considering this graph:
g.addV().as('a').property('name', 'A').
addV().as('b').property('name', 'B').
addV().as('c').property('name', 'C').
addV().as('d').property('name', 'D').
addE('someLink').from('a').to('b').
property('isTemporary', true).property('value', 1).
addE('someLink').from('a').to('c').
property('isTemporary', false).property('value', 2).
addE('someLink').from('b').to('d').
property('isTemporary', false).property('value', 3)
The following query will return all paths from A to D, attempting to traverse via isTemporary: true edges if present, or via isTemporary: false edges otherwise (coalesce step), iteratively.
g.V().has('name', 'A').
repeat(
coalesce(
outE().has('isTemporary', true).inV(),
outE().has('isTemporary', false).inV()
)
).
until(has('name', 'D')).
path().by('name')
Result:
==>[A,B,D]

How to return top n biggest cluster in Neo4j?

in my database, the graph looks somehow like this:
I want to find the top 3 biggest cluster in my data. A cluster is a collection of nodes connected to each other, the direction of the connection is not important. As can be seen from the picture, the expected result should have 3 clusters with size 3 2 2 respectively.
Here is what I came up with so far:
MATCH (n)
RETURN n, size((n)-[*]-()) AS cluster_size
ORDER BY cluster_size DESC
LIMIT 100
However, it has 2 problems:
I think the query is wrong because the size() function does not return the number of nodes in a cluster as I want, but the number of sub-graph matching the pattern instead.
The LIMIT clause limits the number of nodes to return, not taking the top result. That's why I put 100 there.
What should I do now? I'm stuck :( Thank you for your help.
UPDATE
Thanks to Bruno Peres' answer, I'm able to try algo.unionFind query in Neo4j Graph Algorithm. I can find the size of my connected components using this query:
CALL algo.unionFind.stream()
YIELD nodeId,setId
RETURN setId,count(*) as size_of_component
ORDER BY size_of_component DESC LIMIT 20;
And here is the result:
But that's all I know. I cannot get any information about the nodes in each component to visualize them. The collect(nodeId) takes forever because the top 2 components are too large. And I know it doesn't make sense to visualize those large components, but how about the third one? 235 nodes are fine to render.

I think you are looking for Connected Componentes. The section about connected components of Neo4j Graph Algorithms User Guide says:
Connected Components or UnionFind basically finds sets of connected
nodes where each node is reachable from any other node in the same
set. In graph theory, a connected component of an undirected graph is
a subgraph in which any two vertices are connected to each other by
paths, and which is connected to no additional vertices in the graph.
If this is your case you can install Neo4j Graph Algorithms and use algo.unionFind. I reproduced your scenario with this sample data set:
create (x), (y),
(a), (b), (c),
(d), (e),
(f), (g),
(a)-[:type]->(b), (b)-[:type]->(c), (c)-[:type]->(a),
(d)-[:type]->(e),
(f)-[:type]->(g)
Then running algo.unionFind:
// call unionFind procedure
CALL algo.unionFind.stream('', ':type', {})
YIELD nodeId,setId
// groupBy setId, storing all node ids of the same set id into a list
WITH setId, collect(nodeId) as nodes
// order by the size of nodes list descending
ORDER BY size(nodes) DESC
LIMIT 3 // limiting to 3
RETURN setId, nodes
The result will be:
╒═══════╤══════════╕
│"setId"│"nodes" │
╞═══════╪══════════╡
│2 │[11,12,13]│
├───────┼──────────┤
│5 │[14,15] │
├───────┼──────────┤
│7 │[16,17] │
└───────┴──────────┘
EDIT
From comments:
how can I get all nodeId of a specific setId? For example, from my
screenshot above, how can I get all nodeId of the setId 17506? That
setId has 235 nodes and I want to visualize them.
Run call CALL algo.unionFind('', ':type', {write:true, partitionProperty:"partition"}) YIELD nodes RETURN *. This statement will create apartition` property for each node, containing the partition ID the node is part of.
Run this statement to get the top 3 partitions: match (node)
with node.partition as partition, count(node) as ct order by ct desc
limit 3 return partition, ct.
Now you can get all nodes of each top 3 partitions individually with match (node {partition : 17506}) return node, using the partition ID returned in the second query.

ArangoDB - how to perform computations in graph traversals?

I have a simple graph for keeping a track of people whom I have lent money to.
So the graph looks like this:
userB -- owes to (amount: 200) --> userA
userC -- owes to (amount: 150) --> userA
and so on...
Lets say you need to find out how much money each user is owed, using a graph traversal. How do you implement this?

Let me explain this using the city example graph
Vertices (cities) have a numeric attribute, population; Edges (highways) have a numeric attribute distance.
Inspecting what we expect to sumarize:
FOR v, e IN 1..1 INBOUND "frenchCity/Lyon" GRAPH "routeplanner"
RETURN {city: v, highway: e}
Summing up the population of all traversed cities is easy:
RETURN SUM(FOR v IN 1..1 INBOUND "frenchCity/Lyon" GRAPH "routeplanner"
RETURN v.population)
This uses a sub-query, which means all values are returned, and then the SUM operation is executed on them.
Its better to use COLLECT AGGREGATE to sum up the attributes during the traversal.
So while in the context of population of cities and their distances it may not make sense to sumerize these numbers, lets do it anyways:
FOR v, e IN 1..1 INBOUND "frenchCity/Lyon" GRAPH "routeplanner"
COLLECT AGGREGATE populationSum = SUM(v.population), distanceSum = SUM(e.distance)
RETURN {population : populationSum, distances: distanceSum}

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

ArangoDb: Storing all traversed paths of a Graph - graph

Related

Cypher query - Combine 2 queries by pipe result of one to other

ArangoDB: Find last node in path

Choose between 2 available edges during traversal

How to return top n biggest cluster in Neo4j?

ArangoDB - how to perform computations in graph traversals?

Categories

Resources