Is there any efficient way to find all connected subgraphs in NebulaGraph - nebula-graph

I have several isolated subgraphs in a graph space like: [1-2, 2-3], [4-5,5-6], [7-8]. I want to get subsets of nodes of all connected subgraphs such as:
[1,2,3], [4,5,6], [7,8] # 3 subgraphs
Can I get these results efficiently with nGQL in NebulaGraph?

This is typically a Graph Computation job rather than a graph database query.
In theory, we could do this with LOOKUP xxx | GET SUBGRAPH (or, of course, with a complex multi-MATCH openCypher query), but in most cases that is not efficient (unless the dataset is extremely small or the graph is highly fragmented).
Instead, if we are talking about all isolated subgraphs, we should go for NebulaGraph Algorithm, which is Open Source, too.
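For a dataset small enough to export its edge list to the client, the grouping asked for in the question can be sketched outside the database. The following is plain Python using a union-find structure; it is not nGQL and does not use the NebulaGraph Algorithm API, and the function name and sample edges are assumptions made for illustration.

```python
def connected_components(edges):
    """Group vertices into connected components using union-find."""
    parent = {}

    def find(v):
        # Walk up to the root, shortening the path along the way.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Merge the two components joined by each edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect vertices under their final root.
    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(group) for group in groups.values())


edges = [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8)]
print(connected_components(edges))  # [[1, 2, 3], [4, 5, 6], [7, 8]]
```

This runs in near-linear time in the number of edges, but it requires all edges to fit in client memory, which is why a dedicated graph computation job is the better fit for large graphs.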

Related

Gremlin: limit by vertex label

Hello dear gremlin jedi,
I have a bunch of nodes with different labels in my graph:
g.addV('book')
.addV('book')
.addV('book')
.addV('movie')
.addV('movie')
.addV('movie')
.addV('album')
.addV('album')
.addV('album').iterate()
There may also be vertices with other labels.
I also have a hash map describing which labels I want and how many vertices of each label to return:
LIMITS = {
  "book": 2,
  "movie": 2,
  "album": 2,
}
I'd like to write a query that returns a list of vertices with the specified labels, where the number of vertices with each label is limited according to the LIMITS hash map. In this case there should be 2 books, 2 movies and 2 albums in the result.
The limits and requested labels are calculated independently for every query so they cannot be hardcoded.
As far as I can see the limit step does not support passing traversals as an argument.
What trick can I use to write such a query? The only option I see is to build the query using the client-side programming language (Ruby with grumlin as the Gremlin client in my case):
nodes = LIMITS.map do |label, limit|
  __.hasLabel(label).limit(limit)
end
g.V().union(*nodes).toList
But I believe there is a better solution.
Thank you!
The most direct way would be to use group() I think:
gremlin> g.V().group().by(label)
==>[software:[v[3],v[5]],person:[v[1],v[2],v[4],v[6]]]
gremlin> g.V().group().by(label).by(unfold().limit(2).fold())
==>[software:[v[3],v[5]],person:[v[1],v[2]]]
You can filter the vertices going to group() with hasLabel() if you need those sorts of restrictions. Depending upon how you use this, the traversal could be expensive in the sense that you have to traverse a fair bit of data to filter away all but two (in this case) vertices. If that is a concern, your approach of dynamically constructing the traversals and then piecing them together with union() doesn't seem so bad. While I could probably think up a way to write that in just Gremlin, it probably wouldn't be as readable as your approach.
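To make the intended result concrete, here is a minimal sketch of the same "group by label, then cap each group" idea in plain Python rather than Gremlin. The `(id, label)` tuples and the function name are assumptions made for illustration; they do not correspond to any Gremlin client API.

```python
from collections import defaultdict


def limit_by_label(vertices, limits):
    """Keep at most limits[label] vertex ids for each requested label."""
    picked = defaultdict(list)
    for vertex_id, label in vertices:
        # Labels that are not in `limits` are skipped entirely.
        if label in limits and len(picked[label]) < limits[label]:
            picked[label].append(vertex_id)
    return dict(picked)


vertices = [
    (1, "book"), (2, "book"), (3, "book"),
    (4, "movie"), (5, "movie"), (6, "movie"),
    (7, "album"), (8, "album"), (9, "album"),
    (10, "podcast"),
]
limits = {"book": 2, "movie": 2, "album": 2}
print(limit_by_label(vertices, limits))
```

Like the group() traversal, this scans every vertex, so it illustrates the result shape rather than an efficient access pattern.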

How to traverse all nodes of a Directed Acyclic Graph in a specific order?

I have a problem where I need to traverse ALL nodes of the Directed Acyclic graph in a specific order because some nodes/vertices are dependent on results of multiple other nodes/vertices.
In this case, DFS or BFS won't work.
What is the solution/algorithm/ threads for traversing a DAG like this?
Should I also be ordering the nodes beforehand? E.g., the node that does not depend on anything else comes first, then Node A, then Node B, then C (which depends on Node A and Node B).
The answer was topological sort, which can be implemented using Kahn's algorithm, depth-first search, or parallel algorithms.
Thanks #beaker
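As a rough illustration of the first option, Kahn's algorithm can be sketched as follows. The input format (a dict mapping each node to the nodes it depends on) and the function name are assumptions chosen for this example.

```python
from collections import deque


def topological_order(dependencies):
    """Return nodes so that every node comes after all of its dependencies.

    `dependencies` maps each node to the list of nodes it depends on.
    """
    indegree = {node: 0 for node in dependencies}
    dependents = {node: [] for node in dependencies}
    for node, deps in dependencies.items():
        for dep in deps:
            indegree.setdefault(dep, 0)
            dependents.setdefault(dep, []).append(node)
            indegree[node] += 1

    # Start with the nodes that depend on nothing.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Finishing `node` may unblock the nodes that depend on it.
        for nxt in dependents[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, so no valid order exists")
    return order


print(topological_order({"A": [], "B": [], "C": ["A", "B"]}))  # ['A', 'B', 'C']
```

A node is only emitted once all of its dependencies have been emitted, which is exactly the ordering the question asks for.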

TITAN : Identify and remove duplicate vertices in graph

I am using TITAN 0.4 over Cassandra. I have indexed my key ("ip_address" in my case), but as NON-UNIQUE, for performance and scalability.
The challenge now is that the graph allows duplicate vertices.
I am running a background task to clean up the duplicate vertices in the graph by iterating through all vertices.
What is the best approach to identify a duplicate vertex in a graph?
The estimated size of the graph in production is around 10M to 15M vertices, or even more.
Is there any feature in the TITAN index that helps to easily identify a duplicate?
Thanks in advance
Index creation Gremlin script
g.makeKey("ip_address").dataType(String.class).indexed("standard",Vertex.class).make();
I would start with a Titan/Hadoop job:
g.V().ip_address.groupCount()
Then use those IP addresses with a count > 1 to clean up / merge duplicated vertices in OLTP mode.
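The counting step can be illustrated in plain Python. This is only a sketch of the "group, count, keep counts above 1" logic on an in-memory list; it is not Titan/Hadoop code, and the function name and sample addresses are made up for the example.

```python
from collections import Counter


def duplicated_values(values):
    """Return each value that occurs more than once, with its count."""
    counts = Counter(values)
    return {value: count for value, count in counts.items() if count > 1}


ip_addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2", "10.0.0.1"]
print(duplicated_values(ip_addresses))  # {'10.0.0.1': 3, '10.0.0.2': 2}
```

Each key in the result is an address whose vertices would then be looked up through the index and merged.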

Reachable vertices from each other

Given a directed graph, I need to find all vertices v such that, if u is reachable from v, then v is also reachable from u. I know that the vertices can be found using BFS or DFS, but that seems inefficient. I was wondering whether there is a better solution for this problem. Any help would be appreciated.
Fundamentally, you're not going to do any better than some kind of search (as you alluded to). I wouldn't worry too much about efficiency: these algorithms are generally linear in the number of nodes + edges.
The problem is a bit underspecified, so I'll make some assumptions about your data structure:
1. You know vertex u (because you didn't ask to find it)
2. You can iterate both the inbound and outbound edges of a node (efficiently)
3. You can iterate all nodes in the graph
4. You can (efficiently) associate a couple bits of data along with each node
In this case, use a convenient search starting from vertex u (depth/breadth, doesn't matter) twice: once following the outbound edges (marking nodes as "reachable from u") and once following the inbound edges (marking nodes as "reaching u"). Finally, iterate through all nodes and compare the two bits according to your purpose.
Note: as worded, your result set includes all nodes that do not reach vertex u. If you intended the conjunction instead of the implication, then you can save a little time by incorporating the test in the second search, rather than scanning all nodes in the graph. This also relieves assumption 3.
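A minimal sketch of the two-search idea, implementing the conjunction reading (nodes that are both reachable from u and able to reach u). The edge-list input and the function names are assumptions for illustration, and breadth-first search is used here although depth-first search would work equally well.

```python
from collections import deque


def reachable(start, adjacency):
    """Return every node reachable from `start` by following `adjacency`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def mutually_reachable(u, edges):
    """Nodes v such that u reaches v and v reaches u."""
    outbound, inbound = {}, {}
    for a, b in edges:
        outbound.setdefault(a, []).append(b)
        inbound.setdefault(b, []).append(a)
    # First search follows outbound edges, second follows inbound edges.
    return reachable(u, outbound) & reachable(u, inbound)


edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
print(sorted(mutually_reachable(1, edges)))  # [1, 2, 3]
```

Both searches are linear in the number of nodes plus edges. For the implication reading, the final step would instead scan all nodes and keep those that either do not reach u or are also reachable from u.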

Cycle detection in graphs containing multiple cycles

I have the following graph:
Is there a way I can identify all cycles in this graph? I know that DFS can be used to detect cycles by simply doing DFS until a back edge is found, but I was wondering if there is a computationally efficient way to return the individual cycles, considering that there are actually 3 cycles in the graph (1-2-3-4-5-6, 4-5-7-8-9, 1-2-3-4-9-8-7-5-6). I am a bit stuck because it seems like the carbon atom belongs to multiple graphs and I can't think of any way other than brute-forcing all possible paths originating from every vertex.
You don't have to find all paths from EVERY vertex.
Only a vertex connected to 3 or more other vertices can belong to multiple cycles.
You only have to check 4, 5, 6 (and 9).
