How do I get a subgraph independent of the internal structure? - gremlin

I want to return a sub-graph that is attached to a certain vertex (Vertex 3). It should not matter how this sub-graph is structured in detail. Ultimately, all I want is to get the vertices and edges that make up the sub-graph.
Simple graph with sub-graph

Using the TinkerPop-modern sample graph you can do this with the following query:
graph = TinkerFactory.createModern()
g = graph.traversal()
g.E(8).drop() // Edge interferes with OP example graph
g.V(1).bothE().as('e').otherV().where(id().not(is(2))).repeat(
bothE().where(neq('e')).subgraph('subGraph').otherV().simplePath()
).emit().times(4).cap('subGraph').
next().traversal().V()
==>v[3]
==>v[4]
==>v[5]
==>v[6]
Explanation:
v(1) is the vertex connected to the subgraph to be found
v(2) is your start vertex, to be excluded from the subgraph
The edge between v(1) and the subgraph also needs to be excluded from the subgraph and gets a reference 'e'
The repeat(...).emit().times(4) does the looping starting from the subgraph's initial edge(s)
The where(neq('e')) makes sure v(1) is not included in the subgraph
cap('subGraph') makes the traversal hold a TinkerGraph object, created from all the edges referenced by subgraph('subGraph')
next().traversal().V() returns the subgraph, creates a GraphTraversalSource from it, and shows all vertices in the subgraph
It does not seem possible to start the traversal from v(2), your start vertex, because Gremlin's subgraph mechanism does not create separate subgraphs for different branches of the total graph but rather accumulates them in the global side effect referenced by 'subGraph'.
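To make the logic of the explanation easier to follow outside the Gremlin console, here is a minimal plain-Python sketch of the same idea (not Gremlin, and not how TinkerPop implements subgraph()). It assumes the modern graph's edges after dropping edge 8, written as (edge id, out vertex, in vertex) tuples. The function name attached_subgraph and its parameters are made up for this illustration. Unlike the query above, it has no depth limit and skips every edge that leads back to the anchor vertex, not only the one labelled 'e'.

```python
from collections import deque

# Edges of the TinkerPop "modern" toy graph after dropping edge 8 (v1 -> v4),
# written as (edge_id, out_vertex, in_vertex).
EDGES = [
    (7, 1, 2),
    (9, 1, 3),
    (10, 4, 5),
    (11, 4, 3),
    (12, 6, 3),
]


def attached_subgraph(edges, anchor, excluded):
    """Collect the vertices and edge ids hanging off `anchor`, ignoring direction.

    `excluded` holds vertices that must not be entered (the start vertex).
    Edges incident to `anchor` are crossed once to find the entry points but
    are never collected, so the anchor itself stays outside the result.
    """
    incident = {}
    for edge_id, out_v, in_v in edges:
        incident.setdefault(out_v, []).append((edge_id, in_v))
        incident.setdefault(in_v, []).append((edge_id, out_v))

    # Step over the connecting edges first (the role of as('e') / neq('e')).
    entry_points = [v for _, v in incident.get(anchor, []) if v not in excluded]
    blocked = set(excluded) | {anchor}
    seen_vertices = set(entry_points)
    seen_edges = set()
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for edge_id, other in incident.get(current, []):
            if other in blocked:
                continue  # never walk back to the anchor or the start vertex
            seen_edges.add(edge_id)
            if other not in seen_vertices:
                seen_vertices.add(other)
                queue.append(other)
    return seen_vertices, seen_edges


vertices, edge_ids = attached_subgraph(EDGES, anchor=1, excluded={2})
print(sorted(vertices))  # [3, 4, 5, 6]
print(sorted(edge_ids))  # [10, 11, 12]
```

The vertex set matches the v[3], v[4], v[5], v[6] result shown for the Gremlin query.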

Related

Project disconnected vertex with Gremlin/Tinkerpop

I am looking at a kryo file with the following vertices
# Tree Vertices
V(label=tree, properties={treeId:1, treeName:treeA})
V(label=tree, properties={treeId:2, treeName:treeB})
# Root Node Vertices
V(label=node, properties={treeId:1, nodeId:111, nodeType:root})
V(label=node, properties={treeId:2, nodeId:222, nodeType:root})
There are no edges between the vertices labeled as tree and the vertices labeled as node. There are further nodes connected to the root nodes by edges, but they are irrelevant to this question. I do not want to add any edges, as this graph file is vended to me and I am treating it as read-only.
Now I want to join/project the treeNames into a traversal over the root nodes.
g.V()
.hasLabel('node').has('nodeType', 'root')
.project('nodeId', 'treeId', 'treeName') # return nodeId, treeId, treeName for each root node
.by(values('nodeId'))
.by(values('treeId'))
.by(""" # pseudo-sqlish gremlin to clarify my intent
select treeName
from V().hasLabel('tree')
.where(values('treeId'), eq($thisNode.values('treeId')))
"""
)
In SQL terms I'd say: I want to run a subquery (fully independent sub traversal starting from scratch) and then join it with my outer traversal on a given property. And again: No edge between trees and roots.
WITH
trees as (SELECT treeId, treeName FROM vertices v WHERE v.label = 'tree'),
roots as (SELECT nodeId, treeId FROM vertices v where v.label = 'node')
SELECT roots.nodeId, roots.treeId, trees.treeName
FROM roots
JOIN trees ON roots.treeId = trees.treeId
So I am looking for a way to perform a projection based on another traversal + one of the returned vertex properties
How abusive is this?
How to do it?
You can do it by starting a new traversal inside the project like this:
g.V().hasLabel('node').
has('nodeType', 'root').as('root').
project('nodeId', 'treeId', 'treeName').
by(values('nodeId')).
by(values('treeId')).
by(coalesce(
V().hasLabel('tree').where(eq('root')).
by('treeId').
values('treeName'),
constant('tree not exist')
))
See the example here: https://gremlify.com/bybp7s9mdia
How abusive is this? Very.
Starting a sub-query for each node vertex can be very heavy performance-wise, and you lose the advantages of a graph database when the graph schema does not fit your requirements.
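The cost concern comes from running one tree lookup per root vertex. As a hedged illustration of an alternative (a client-side join, which is not what the answer's query does), the plain-Python sketch below assumes both vertex sets have already been fetched into dictionaries and joins them through a lookup table that is built once. The function name join_roots_with_trees and the third root without a matching tree are made up for the example.

```python
# Hypothetical in-memory copies of the vertices from the question.
trees = [
    {"label": "tree", "treeId": 1, "treeName": "treeA"},
    {"label": "tree", "treeId": 2, "treeName": "treeB"},
]
roots = [
    {"label": "node", "treeId": 1, "nodeId": 111, "nodeType": "root"},
    {"label": "node", "treeId": 2, "nodeId": 222, "nodeType": "root"},
    # Made-up extra root to show the fallback value.
    {"label": "node", "treeId": 3, "nodeId": 333, "nodeType": "root"},
]


def join_roots_with_trees(roots, trees, default="tree not exist"):
    """Join root nodes with tree names on treeId using one lookup table."""
    # Built once, so each root costs a single dictionary lookup
    # instead of a fresh scan over all tree vertices.
    name_by_tree_id = {t["treeId"]: t["treeName"] for t in trees}
    return [
        {
            "nodeId": r["nodeId"],
            "treeId": r["treeId"],
            "treeName": name_by_tree_id.get(r["treeId"], default),
        }
        for r in roots
        if r["nodeType"] == "root"
    ]


rows = join_roots_with_trees(roots, trees)
for row in rows:
    print(row)
```

The default value plays the same role as constant('tree not exist') inside coalesce() in the query above.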

How to get both the vertex and edge values in Neptune DB

I am new to Neptune DB. I have created vertices, connected pairs of vertices with edges, and given some properties to both the edges and the vertices.
I want to retrieve both the edge and vertex property values.
Can someone provide me a sample query for this?
Thanks in advance.
Eg:
Vertices: P1, P2, P3
Edges: E1 connecting P1 and P2, E2 connecting P2 and P3
Vertex property: name
Edge property: relation
Now I need to get the name and relation for all the vertices connected to P1.
The path step is what you are looking for. Using the by modulator you can select properties in a round-robin fashion, i.e. vertex-edge.
Start by locating the p1 vertex:
g.V().hasLabel("testV").has("name","p1")
Repeat traversal along edges with "relation" property:
.repeat(outE("testE").has("relation").inV()).until(__.not(outE("testE")))
Get the traversal path (or tree), and select "name" for vertices, and "relation" for edges using the by modulator:
.path().by("name").by("relation")
To see results in arrays of strings:
.local(unfold().fold())
Note that this traversal doesn't handle cycles, but that's another question.
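The round-robin behaviour of the by modulators can be pictured with a small plain-Python sketch (an illustration only, not Gremlin and not TinkerPop's implementation). It assumes a path is a list that alternates vertex and edge property maps. The relation values "knows" and "likes" and the function name apply_by_modulators are made up for the example.

```python
from itertools import cycle
from operator import itemgetter

# A vertex-edge-vertex-edge-vertex path, as plain property maps.
# The relation values are hypothetical; the question does not give them.
path = [
    {"name": "p1"},
    {"relation": "knows"},
    {"name": "p2"},
    {"relation": "likes"},
    {"name": "p3"},
]


def apply_by_modulators(path, modulators):
    """Apply the modulators to the path elements in round-robin order."""
    # cycle() restarts the modulator list once it is exhausted, so with two
    # modulators the first handles vertices and the second handles edges.
    return [modulate(element) for element, modulate in zip(path, cycle(modulators))]


result = apply_by_modulators(path, [itemgetter("name"), itemgetter("relation")])
print(result)  # ['p1', 'knows', 'p2', 'likes', 'p3']
```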
If you need only first level neighbors, you can take a different approach:
g.V().hasLabel("testV").has("name","p2").bothE()
.project("relation","name")
.by(values("relation"))
.by(otherV().values("name"))

What is the Gremlin query that can get me all the vertices either directly or indirectly connected to one specific vertex

I need help with a Gremlin query that can output all the vertices related to one specific vertex A and their cascading related vertices (which means all the vertices related directly or indirectly to A).
For example, in a graph
A -> B -> C
D
Running this query on A will give me B and C.
The solution I have right now is an ugly one:
g.V('A').both(); g.V('A').both().both();
etc
Any help would be really appreciated.
Your solution isn't ugly; it only lacks a bit of iteration and an exit condition.
Do you require a maximum depth? Depending on the shape of your graph, the query you want to execute could be returning all vertices of that graph.
Assuming a toy modern TinkerGraph created in the Gremlin console:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
This query could be helpful:
gremlin> g.V(1).repeat(both().simplePath()).emit().times(3).dedup()
==>v[3]
==>v[2]
==>v[4]
==>v[6]
==>v[5]
"Starting from vertex with id=1, traverse the graph in all directions up to a maximum depth of 3 while discarding previously visited paths. The emit() step ensures that all traversed vertices found along the way, and not just leaves, are returned."
Chances are high that you want to figure out which vertices are linked to that vertex only via specific edges. In that case, you could pass label(s) to the both() step and/or chain a few filters.
When developing your query, feel free to chain the path() step to better understand the output.
gremlin> g.V(1).repeat(both().simplePath()).emit().times(3).dedup().path()
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[3],v[6]]
==>[v[1],v[4],v[5]]
There are other ways to solve this, but this query should get you started and help you become familiar with basic Gremlin steps and concepts.
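For readers who want to see the iteration and the exit condition spelled out, here is a minimal plain-Python sketch of a depth-limited, direction-agnostic breadth-first search (not Gremlin). It assumes the six edges of the modern toy graph as undirected pairs. A visited set stands in for simplePath() plus dedup(): the mechanics differ, but the set of reachable vertices is the same. The function name connected_within is made up for the example.

```python
from collections import deque

# Edges of the modern toy graph as undirected (vertex, vertex) pairs.
MODERN_EDGES = [(1, 3), (1, 2), (1, 4), (4, 5), (4, 3), (6, 3)]


def connected_within(edges, start, max_depth):
    """Return all vertices reachable from `start` in at most `max_depth` hops."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    visited = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            continue  # exit condition: do not expand beyond the depth limit
        for other in neighbours.get(vertex, []):
            if other not in visited:
                visited.add(other)
                found.append(other)  # emitted on discovery, like emit()
                queue.append((other, depth + 1))
    return found


print(sorted(connected_within(MODERN_EDGES, 1, 3)))  # [2, 3, 4, 5, 6]
print(connected_within([("A", "B"), ("B", "C")], "A", 3))  # ['B', 'C']
```

The second call mirrors the A -> B -> C example from the question: the isolated vertex D never shows up because no edge leads to it.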

Finding all maximal transitively closed subgraphs in a given graph

I'm trying to solve the following problem:
I have a directed graph G = (V, E) with a low number of edges. I am trying to find all subgraphs of it that are transitively closed and maximal, meaning that no such subgraph is part of another one.
My first idea was to start a DFS at every node and check at every step whether all edges required for the closure exist, but the performance is horrible. Is there a faster algorithm?

How do I transform an undirected, very cyclic graph into a directed acyclic graph?

I'm working on a modified TopSort algorithm and am having trouble finding / creating large (more than 1000 nodes) directed acyclic graphs to use for testing. I have an undirected sample graph from another project that is of a good size, but has many cycles. Is there an algorithm I could use to direct the edges so that there are no longer cycles?
This provides a way to get acyclic graphs. Basically, a graph traversal produces a tree, which defines a partial order on the original nodes. Then, just direct all the edges so that they either point in a consistent direction according to the partial order, or are between 2 elements that are not ordered (these can point in any direction).
To guarantee that the new directed graph is connected, I would use breadth-first search as follows.
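old_undirected graph G
new_directed graph D
dequeue Q
set done                      // nodes already taken off the queue
v is any node in G
add v to D
Q.push_back(v)
while(Q is not empty):
    v = Q.pop_front()
    for all neighbors u of v:
        if u in done:
            continue          // edge u->v was added when u was processed
        if u not in D:
            add u to D
            Q.push_back(u)
        add edge v->u to D    // always from the earlier-dequeued node to the later one
    add v to done
return D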
This graph should contain all the edges of the original graph, but they should be directed so that there are no cycles.
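A runnable Python sketch of the breadth-first orientation idea above, under a few assumptions: the undirected graph is given as an adjacency dictionary, only the component containing the start node is processed, and each edge is oriented from the node dequeued earlier to the node dequeued later, which is one way to make sure no edge is added in both directions. The names orient_acyclic and is_acyclic and the sample graph are made up for the example.

```python
from collections import deque


def orient_acyclic(adjacency, start):
    """Orient the edges of the component containing `start` without cycles."""
    discovered = {start}
    done = set()    # nodes already taken off the queue
    directed = []   # resulting (source, target) edges
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adjacency[v]:
            if u == v or u in done:
                continue  # self-loop, or edge already added as u -> v
            if u not in discovered:
                discovered.add(u)
                queue.append(u)
            directed.append((v, u))
        done.add(v)
    return directed


def is_acyclic(directed):
    """Kahn's algorithm: True if every node can be removed in topological order."""
    nodes = {n for edge in directed for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, target in directed:
        indegree[target] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        n = ready.popleft()
        removed += 1
        for source, target in directed:
            if source == n:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)
    return removed == len(nodes)


# Small undirected sample with several cycles (a-b-c, b-c-d).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
edges = orient_acyclic(graph, "a")
print(edges)              # [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
print(is_acyclic(edges))  # True
```

For a graph with several components, the same function could be called once per component with a start node from each.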
You are looking to convert the graph into a forest of rooted trees. Make a breadth-first or depth-first graph traversal of each component of the graph. During the traversal, make a directed edge between parent-child vertices.
See http://en.wikipedia.org/wiki/Graph_traversal
