In NebulaGraph Database, why is the number of hops of GetSubGraph in the returned result greater than step_count? - nebula-graph

The number of hops in the returned result is not the same as step_count. That is by design.

To show the completeness of the subgraph, an additional hop is made on all vertices that meet the conditions.
The returned paths of GET SUBGRAPH 1 STEPS FROM "A"; are A->B, B->A, and A->C. To show the completeness of the subgraph, an additional hop is made on all vertices that meet the conditions, namely B->C.
The returned path of GET SUBGRAPH 1 STEPS FROM "A" IN follow; is B->A. To show the completeness of the subgraph, an additional hop is made on all vertices that meet the conditions, namely A->B.
If you only need the paths or vertices that meet the conditions, use MATCH or GO instead, for example:
nebula> MATCH p= (v:player) -- (v2) WHERE id(v)=="A" RETURN p;
nebula> GO 1 STEPS FROM "A" OVER follow YIELD src(edge),dst(edge);
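One way to picture the extra hop is as "collect the vertices within step_count hops, then return every edge between those vertices". The following is only a toy model in Python, not NebulaGraph's implementation; the function name `get_subgraph`, the `direction` parameter, and the edge list are assumptions built from the A, B, C example above.

```python
def get_subgraph(edges, start, steps, direction="both"):
    """Toy model: vertices within `steps` hops of `start`, plus every edge between them."""
    reached = {start}
    frontier = {start}
    for _ in range(steps):
        found = set()
        for src, dst in edges:
            if direction in ("out", "both") and src in frontier:
                found.add(dst)
            if direction in ("in", "both") and dst in frontier:
                found.add(src)
        frontier = found - reached
        reached |= frontier
    # The "additional hop": keep every edge whose two endpoints were both reached.
    return sorted((s, d) for s, d in edges if s in reached and d in reached)


# Edges taken from the example in the text (all treated as `follow` edges).
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("B", "C")]
print(get_subgraph(edges, "A", 1))                  # includes B->C
print(get_subgraph(edges, "A", 1, direction="in"))  # includes A->B
```

In this model B->C shows up in the first call even though it is not within one hop of A as a path, which matches the behavior described above.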

Related

Write a pseudo code for a Graph algorithm

Given a DAG and a function f which maps every vertex to a unique number from 1 to |V|, I need to write pseudocode for an algorithm that finds, for every vertex v, the minimal value of f(u) among all vertices u that are reachable from v, and saves it as an attribute of v. The time complexity of the algorithm needs to be O(|V| + |E|) (assuming that the time complexity of f is O(1)).
I thought about using DFS (or a variation of it) and/or topological sort, but I don't know how to use it in order to solve this problem.
In addition, I need to think about an algorithm that gets an undirected graph and the function f, and calculates the same thing for every vertex, and I don't know how to do that either.
Your idea of using DFS is right. The function f(v) is only there to say that each node can be uniquely identified by a number between 1 and |V|.
Just a hint for solving it: you would have to modify DFS so that it returns the minimum value of f(u) over the vertices u reachable from the current node, and save it in another array, let's say minReach, indexed by f(v). The visited array vis would similarly be indexed by f(v).
I am also giving the pseudocode in case you get stuck, but try it on your own first.
The pseudocode is Python-like and assumes that the graph and the function f(v) are available. 0-based indexing is assumed.
# Assumes `graph` maps each vertex to the list of its successors and f(v) is in 1..|V|.
n = len(graph)
vis = [0] * n                     # visited array for DFS, indexed by f(v) - 1
minReach = list(range(1, n + 1))  # minReach[f(v) - 1] starts at f(v) itself

def dfs(node):
    vis[f(node) - 1] = 1          # mark the unvisited node as visited
    for v in graph[node]:
        if vis[f(v) - 1] != 1:    # adjacent node not visited yet: apply DFS to it
            minReach[f(node) - 1] = min(minReach[f(node) - 1], dfs(v))
        else:                     # in a DAG a visited neighbor is already finished
            minReach[f(node) - 1] = min(minReach[f(node) - 1], minReach[f(v) - 1])
    return minReach[f(node) - 1]

for vertex in graph:              # each vertex in the graph is checked
    if vis[f(vertex) - 1] != 1:   # if the vertex is not visited, DFS is applied
        dfs(vertex)
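As a self-contained check of the idea, here is a minimal sketch that uses a memoized DFS on a small DAG. The function name `min_reachable`, the sample graph, and the labels are assumptions made for illustration, and a vertex is treated as reachable from itself, as in the initialization of minReach above.

```python
def min_reachable(graph, f):
    """graph: dict vertex -> list of successors (must be a DAG).
    f: function mapping each vertex to a unique number in 1..|V|.
    Returns a dict vertex -> smallest f(u) over all u reachable from it."""
    min_reach = {}

    def dfs(v):
        if v in min_reach:          # already finished, value is final in a DAG
            return min_reach[v]
        best = f(v)                 # v counts as reachable from itself
        for u in graph[v]:
            best = min(best, dfs(u))
        min_reach[v] = best
        return best

    for v in graph:
        dfs(v)
    return min_reach


# Hypothetical example: a -> b -> d and a -> c.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
labels = {"a": 3, "b": 4, "c": 2, "d": 1}
result = min_reachable(graph, labels.get)
print(result)
```

Each vertex and edge is handled once, so the running time is linear in the size of the graph. For the undirected variant, every vertex in a connected component can reach every other one, so all vertices of a component share that component's minimum label. The recursion can hit Python's recursion limit on very deep graphs.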

ArangoDB: Find last node in path

I'm pretty new to ArangoDB and I'm trying to get the last/leaf node (I guess vertex) in a graph. So given I have the following graph:
Now I want to start the traversal at 6010142. The query should return 6010625 because it is the last node that can be reached via 6010145. But what does the query look like?
I already tried:
FOR v, e, p IN 1..5 OUTBOUND {_id: 'nodes/6010142'} GRAPH 'test' RETURN v
But it also returns 6010145. Furthermore, it is limited to a maxDepth of 5, but my graph can exceed that limit, so I also need a solution that works for any depth. Hopefully someone can help me :-)
I'm also just starting out with AQL but maybe this can help.
FOR v IN 1..5 OUTBOUND {_id: 'nodes/6010142'} GRAPH 'test' OPTIONS {uniqueVertices: 'global', bfs: true}
FILTER LENGTH(FOR vv IN OUTBOUND v GRAPH 'test' LIMIT 1 RETURN 1) == 0
RETURN v
This approach follows an older ArangoDB cookbook (p. 39) for finding leaf nodes. The FILTER line takes each connected node found by the first line and runs a second traversal from it to check whether it is actually a leaf, that is, whether it has no outbound edges.
The OPTIONS {uniqueVertices: 'global', bfs: true} part is an optimization if you are only interested in unique leaf nodes and not all the different paths to those nodes.
Regarding maxDepth I would just use a sufficiently high number. The worst case would be the number of nodes in your graph.
(The graph you posted and your description seem to disagree about the direction of the edges. Maybe you need to use INBOUND.)
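The same "reachable and has no outgoing edge" test can be sketched outside AQL. This is plain Python, not ArangoDB code; the function name `leaf_nodes` is made up, and the edge list is an assumption reconstructed from the description in the question, since the original graph picture is not available.

```python
from collections import deque


def leaf_nodes(edges, start):
    """Return the vertices reachable from `start` that have no outgoing edges."""
    out = {}
    for src, dst in edges:
        out.setdefault(src, []).append(dst)

    seen = {start}
    queue = deque([start])
    leaves = []
    while queue:
        v = queue.popleft()
        children = out.get(v, [])
        # Like the 1..N traversal, the start vertex itself is never reported.
        if not children and v != start:
            leaves.append(v)
        for c in children:
            if c not in seen:       # visit each vertex once (uniqueVertices: 'global')
                seen.add(c)
                queue.append(c)
    return leaves


# Assumed edges: 6010142 -> 6010145 -> 6010625.
edges = [("6010142", "6010145"), ("6010145", "6010625")]
print(leaf_nodes(edges, "6010142"))
```

Because the loop runs until the queue is empty, it has no depth limit, which is the effect a sufficiently high maxDepth gives in the query.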

How to return top n biggest cluster in Neo4j?

In my database, the graph looks something like this:
I want to find the top 3 biggest clusters in my data. A cluster is a collection of nodes connected to each other; the direction of the connection is not important. As can be seen from the picture, the expected result should have 3 clusters with sizes 3, 2, and 2 respectively.
Here is what I came up with so far:
MATCH (n)
RETURN n, size((n)-[*]-()) AS cluster_size
ORDER BY cluster_size DESC
LIMIT 100
However, it has 2 problems:
I think the query is wrong because the size() function does not return the number of nodes in a cluster as I want, but the number of subgraphs matching the pattern instead.
The LIMIT clause limits the number of nodes returned; it does not select the top results. That's why I put 100 there.
What should I do now? I'm stuck :( Thank you for your help.
UPDATE
Thanks to Bruno Peres' answer, I'm able to try algo.unionFind query in Neo4j Graph Algorithm. I can find the size of my connected components using this query:
CALL algo.unionFind.stream()
YIELD nodeId,setId
RETURN setId,count(*) as size_of_component
ORDER BY size_of_component DESC LIMIT 20;
And here is the result:
But that's all I know. I cannot get any information about the nodes in each component to visualize them. The collect(nodeId) takes forever because the top 2 components are too large. And I know it doesn't make sense to visualize those large components, but how about the third one? 235 nodes are fine to render.
I think you are looking for Connected Components. The section about connected components in the Neo4j Graph Algorithms User Guide says:
Connected Components or UnionFind basically finds sets of connected
nodes where each node is reachable from any other node in the same
set. In graph theory, a connected component of an undirected graph is
a subgraph in which any two vertices are connected to each other by
paths, and which is connected to no additional vertices in the graph.
If this is your case you can install Neo4j Graph Algorithms and use algo.unionFind. I reproduced your scenario with this sample data set:
create (x), (y),
(a), (b), (c),
(d), (e),
(f), (g),
(a)-[:type]->(b), (b)-[:type]->(c), (c)-[:type]->(a),
(d)-[:type]->(e),
(f)-[:type]->(g)
Then running algo.unionFind:
// call unionFind procedure
CALL algo.unionFind.stream('', ':type', {})
YIELD nodeId,setId
// groupBy setId, storing all node ids of the same set id into a list
WITH setId, collect(nodeId) as nodes
// order by the size of nodes list descending
ORDER BY size(nodes) DESC
LIMIT 3 // limiting to 3
RETURN setId, nodes
The result will be:
╒═══════╤══════════╕
│"setId"│"nodes" │
╞═══════╪══════════╡
│2 │[11,12,13]│
├───────┼──────────┤
│5 │[14,15] │
├───────┼──────────┤
│7 │[16,17] │
└───────┴──────────┘
EDIT
From comments:
how can I get all nodeId of a specific setId? For example, from my
screenshot above, how can I get all nodeId of the setId 17506? That
setId has 235 nodes and I want to visualize them.
Run CALL algo.unionFind('', ':type', {write:true, partitionProperty:"partition"}) YIELD nodes RETURN *. This statement will create a partition property for each node, containing the ID of the partition the node is part of.
Run this statement to get the top 3 partitions: match (node) with node.partition as partition, count(node) as ct order by ct desc limit 3 return partition, ct.
Now you can get all nodes of each of the top 3 partitions individually with match (node {partition : 17506}) return node, using a partition ID returned by the second query.
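To see what a union-find pass computes, here is a small Python sketch of the connected-components idea applied to the sample data set above. It is an illustration only and not how algo.unionFind is implemented; the function name `top_components` and the use of variable names as node identifiers are assumptions.

```python
from collections import defaultdict


def top_components(nodes, edges, k):
    """Group nodes into connected components (edge direction ignored)
    and return the k largest components as lists of nodes."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb                 # merge the two sets

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(groups.values(), key=len, reverse=True)[:k]


# Same shape as the sample data set: two isolated nodes, one triangle, two pairs.
nodes = ["x", "y", "a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("f", "g")]
print(top_components(nodes, edges, 3))
```

The three largest groups have sizes 3, 2, and 2, matching the shape of the result table above.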

finding the total number of distinct shortest paths between 2 nodes in undirected weighted graph in linear time?

I was wondering: if there is a weighted graph G(V,E) and I need to find a single shortest path between two vertices S and T in it, I could use Dijkstra's algorithm. But I am not sure how this can be done when we need to find all the distinct shortest paths from S to T. Is it solvable in O(n) time? I have one more question: if we assume that the weights of the edges in the graph can only take values in a certain range, let's say 1 <= w(e) <= 2, will this affect the time complexity?
You can do it using Dijkstra. In Dijkstra's algorithm, you assign labels to each node based on the distance it has from your source and settle nodes according to this distance (smallest first). Once the target node is settled, you have the length of the shortest path. To retrieve the path (=the sequence of edges), you can keep track of the parent of each node you settle. To retrieve all possible paths, you have to account for multiple parents of each node.
A small example:
Suppose you have a graph which looks like this (all edges weight 1 for simplicity):
  B
 / \
A   C - D
 \ /
  E
When you run Dijkstra to find the distance A->D, you would get 3. You would first settle node A with distance 0, then nodes B and E with distance 1, node C with distance 2 and finally node D with distance 3. To get the paths, you would remember the parents of the nodes when you settle them. In this case you would first set the parent of B and of E to node A. You then need to set the parents of node C to B and E, as they both supply the same length. Node D would get node C as a parent.
When you need the paths, you could just follow the parent pointers, starting at D and branching each time a node has multiple parents. This would give you a branch at node C.
The post mentioned in the comment above uses the same principles, although it does not allow you to actually extract the paths, it only gives you the count.
One final remark: Dijkstra is not a linear time algorithm, as you need to do a lot of operations to maintain your queue and node sets.
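The multiple-parents idea can be sketched as follows. This is a minimal illustration rather than a tuned implementation: the function name `all_shortest_paths` and the adjacency-list layout are assumptions, and it assumes strictly positive integer weights so that distances can be compared exactly.

```python
import heapq


def all_shortest_paths(graph, source, target):
    """graph: dict node -> list of (neighbor, weight) with positive weights.
    Returns (distance, list of all shortest paths from source to target)."""
    dist = {source: 0}
    parents = {source: []}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                  # stale queue entry
            continue
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parents[v] = [u]         # strictly better: reset the parents
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                parents[v].append(u)     # equally good: remember another parent
    if target not in dist:
        return None, []

    def build(v):
        # Follow the parent pointers back to the source, branching on ties.
        if v == source:
            return [[source]]
        return [path + [v] for u in parents[v] for path in build(u)]

    return dist[target], build(target)


# The example graph from the text, every edge with weight 1.
pairs = [("A", "B"), ("A", "E"), ("B", "C"), ("E", "C"), ("C", "D")]
graph = {n: [] for n in "ABCDE"}
for a, b in pairs:
    graph[a].append((b, 1))
    graph[b].append((a, 1))

distance, paths = all_shortest_paths(graph, "A", "D")
print(distance, paths)
```

On this graph the branch happens at node C, which has both B and E as parents. Note that the number of shortest paths can grow exponentially with the graph size, so listing them all can take much longer than counting them.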
(a) Run a BFS from s, marking nodes as visited as usual, with the following changes:
(b) each node v keeps a record L(v), the layer that v is on, and a record f(v), the number of incoming shortest paths (with f(s) = 1);
(c) every time a node v1 is found among the neighbors of another node v2:
(d) if v1 has not been discovered yet, add it to the queue and set L(v1) = L(v2) + 1, f(v1) += f(v2);
(e) if v1 is already in the queue and L(v1) equals L(v2), do nothing;
(f) if v1 is already in the queue but L(v1) does not equal L(v2), that is, L(v1) = L(v2) + 1, set f(v1) += f(v2).
(g) When t is popped from the queue for the first time in this modified BFS, we have f(t),
(h) and this f(t) is the number of different shortest paths from s to t.
This might solve your problem.
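The BFS steps above could look like this in Python. It is a sketch that counts paths by number of edges, so it applies when every edge has the same weight; the function name `count_shortest_paths` and the adjacency-list layout are assumptions.

```python
from collections import deque


def count_shortest_paths(adj, s, t):
    """adj: dict node -> list of neighbors (unweighted graph).
    Returns the number of distinct shortest paths from s to t."""
    level = {s: 0}      # L(v): BFS layer of v
    count = {s: 1}      # f(v): number of shortest paths from s to v
    queue = deque([s])
    while queue:
        v2 = queue.popleft()
        if v2 == t:                         # all of t's predecessors are done
            return count[t]
        for v1 in adj[v2]:
            if v1 not in level:             # first time v1 is discovered
                level[v1] = level[v2] + 1
                count[v1] = count[v2]
                queue.append(v1)
            elif level[v1] == level[v2] + 1:
                count[v1] += count[v2]      # another shortest way into v1
            # same or earlier layer: do nothing
    return 0                                # t is not reachable from s


# The example graph from the earlier answer, as an undirected adjacency list.
adj = {
    "A": ["B", "E"],
    "B": ["A", "C"],
    "E": ["A", "C"],
    "C": ["B", "E", "D"],
    "D": ["C"],
}
print(count_shortest_paths(adj, "A", "D"))
```

Each vertex and edge is examined a constant number of times, so this runs in O(|V| + |E|).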

Pre-Requisite for Graphs with Unique Topological Sort

Let's assume that a graph in question is a DAG (directed acyclic graph).
Question: can I conclude that such graph will have a unique topological sort if, and only if, only one of its vertices has no incoming edges?
In other words, is having only one vertex with no incoming edges necessary (but not sufficient) to generate a unique topological sort?
A topological sort will be unique if and only if there is a directed edge between each pair of consecutive vertices in the topological order (i.e., the digraph has a Hamiltonian path). Source
A Hamiltonian path just means that a path between two vertices will only visit each vertex once, it does not mean though that one vertex must have no incoming edges. You can have a Hamiltonian path that is in fact a cycle. This would still generate a unique topological sort (of course it would be a cycle as well if that is important to you).
Hope this helps
Haaaaa, ok. sorry for the misunderstanding.
In this case I assume that you are right, here is a sketch of proof:
We have a unique topological sort => We have only one vertex that it is legal to put in the first place => For every vertex, except one, it is not legal to put in the first place => For every vertex, except one, we have incoming edges.
Hope that now I answered your question....
No! The graph below has only one vertex with no incoming edges (vertex 3), and has 2 possible solutions.
1 -> 2
3 -> 4
3 -> 1
4 -> 2
The two solutions are:
3 1 4 2
3 4 1 2
Yes, you can state that as a necessary condition: if there are multiple nodes with in-degree 0, there cannot be a Hamiltonian path, hence no unique topological order. Only the starting node of the graph (in-degree 0) has no incoming edge; every other vertex MUST have an incoming edge from its topological predecessor, that is, the node just before it in the topological ordering. If some pair of consecutive nodes in the topological ordering is not joined by an edge, the DAG will NOT have a unique order.
Someone has already given the answer. Here I just want to give you a counterexample: G = {(1,2), (1,3)}. In this case there are 2 valid topological sorts: 1,2,3 and 1,3,2.
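These examples can be checked mechanically. One common approach is Kahn's algorithm: the topological order is unique exactly when there is never more than one vertex with in-degree 0 to choose from. The sketch below assumes the input is a DAG given as a vertex list and an edge list; the function name `topological_order_is_unique` is made up for illustration.

```python
from collections import deque


def topological_order_is_unique(vertices, edges):
    """Return (is_unique, one_valid_order) for a DAG."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1

    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    unique = True
    while ready:
        if len(ready) > 1:          # a free choice means more than one order
            unique = False
        v = ready.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(vertices):
        raise ValueError("graph contains a cycle")
    return unique, order


# Single source (vertex 3), yet two valid orders.
print(topological_order_is_unique([1, 2, 3, 4], [(1, 2), (3, 4), (3, 1), (4, 2)]))
# G = {(1,2), (1,3)}: single source, two valid orders.
print(topological_order_is_unique([1, 2, 3], [(1, 2), (1, 3)]))
# A simple chain has a Hamiltonian path, so the order is unique.
print(topological_order_is_unique([1, 2, 3], [(1, 2), (2, 3)]))
```

Both counterexamples have exactly one vertex without incoming edges and still report a non-unique order, which shows that the single-source condition is necessary but not sufficient.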
