How to get all Graph Nodes in ArangoDB without Start-Node - graph

Like in OrientDB, for get the All Graph Only use 'Select From v'
So far, I have use AQL in ArangoDB with start node:
for v,e,p IN 2 ANY 'user/188802' graph 'a' return p
And now I want to get all graph nodes in ArangoDB without the start node?

Graphs are a grouping of Edge collections. Each Edge collection references _from and _to documents which are stored in Document collections.
The graph traversal queries expect you to have a starting position and it returns the results for that single starting position.
It is possible to identify all possible starting positions, and then run graph traversals over those positions.
You'll need to know the names of the document collections that make up your graph, you can insert them into an AQL query like this:
FOR vertex IN UNION(
(FOR v IN document_collection_1 RETURN v._id),
(FOR v IN document_collection_2 RETURN v._id),
(FOR v IN document_collection_3 RETURN v._id)
)
FOR v, e IN 1..5 OUTBOUND vertex GRAPH 'my_graph_name' OPTIONS { uniqueVertices: true }
RETURN DISTINCT [
{
_from: e._from,
_to: e._to
}
]
Remember that in ArangoDB it is possible for a document collection to be bound to more than one graph, so you'll need to ensure you identify all document collections that are part of the graph.
This query will then extract an array of objects that contain all links defined in the graph. This query focuses only on vertices with edges that are part of the graph. If the vertex has no edge on it, it won't appear in the output as it is not part of the graph.

Related

Partition graph into groups of neighbours having the same class

Using JGraphT, I would like partition a graph into groups where each group consists of a connected sub-graph of vertices that have the same "class" (denoted using colors below).
Example -- the desired groups are in red:
I think this is a rather simple demand and yet I can't find a (built-in) way to do this. I notice there is a PartitioningImpl class, that one constructs using a List<Set<V>> classes, yet I don't see a way to use this to partition a graph.
Ideally, I'd provide something with my graph and vertex classes (a map of V-->Integer for instance) and it would return something like a List<Set<V>> of partitioned vertex groups.
Sometimes you just cannot avoid writing some code
LOOP over classes
LOOP over nodes that are in class
Copy node to new graph
LOOP over edges that connect nodes in class
Copy edge to new graph
LOOP over connected components in new graph
Save component as a group graph for class
This is a fairly simple approach using JGraphT:
First remove edges linking neighbouring vertices that belong to different classes and then use ConnectivityInspector on the reduced graph to find connected components, which form the groups.
SimpleGraph<V, E> graph; // given by user
Map<V, Integer> classes; // given by user
List<E> toRemove = new ArrayList<>();
graph.edgeSet().forEach(e -> {
V a = graph.getEdgeSource(e);
V b = graph.getEdgeTarget(e);
if (!classes.get(a).equals(classes.get(b))) {
toRemove.add(e);
}
});
graph.removeAllEdges(toRemove);
ConnectivityInspector<V, E> ci = new ConnectivityInspector<>(graph);
List<Set<V>> groups = ci.connectedSets();

Project disconnected vertex with Gremlin/Tinkerpop

I am looking at a kryo file with the following vertices
# Tree Vertices
V(label=tree, properties={treeId:1, treeName:treeA})
V(label=tree, properties={treeId:2, treeName:treeB})
# Root Node Vertices
V(label=node, properties={treeId:1, nodeId:111, nodeType:root})
V(label=node, properties={treeId:2, nodeId:222, nodeType:root})
There are no edges between the vertices labeled as tree and the vertices labeled as node. There are further edges nodes connected to the root nodes but they are irrelevant to this question. I do not want to add any edges as this graph file gets vended to me and I am treating it as read-only.
Now I want to join/project the treeNames into a traversal over the root nodes.
g.V()
.hasLabel('node').has('nodeType', 'root')
.project('nodeId', 'treeId', 'treeName') # return nodeId, treeId, treeName for each root node
.by(values('nodeId'))
.by(values('treeId'))
.by(""" # pseudo-sqlish gremlin to clarify my intent
select treeName
from V().hasLabel('tree')
.where(values('treeid'), eq($thisNode.values('treeId'))
"""
)
In SQL terms I'd say: I want to run a subquery (fully independent sub traversal starting from scratch) and then join it with my outer traversal on a given property. And again: No edge between trees and roots.
WITH
trees as (SELECT treeId, treeName FROM vertices v WHERE v.label = 'tree'),
roots as (SELECT nodeId, treeId FROM vertices v where v.label = 'node')
SELECT roots.nodeId, roots.treeId, trees.treeName
FROM roots
JOIN trees ON (roots.treeId, trees.treeId)
So I am looking for a way to perform a projection based on another traversal + one of the returned vertex properties
How abusive is this?
How to do it?
You can do it by starting a new traversal inside the project like this:
g.V().hasLabel('node').
has('nodeType', 'root').as('root').
project('nodeId', 'treeId', 'treeName').
by(values('nodeId')).
by(values('treeId')).
by(coalesce(
V().hasLabel('tree').where(eq('root')).
by('treeId').
values('treeName'),
constant('tree not exist')
))
see the example here: https://gremlify.com/bybp7s9mdia
How abusive is this: Very.
starting a sub-query for each node vertex can be very 'heavy' performance-wise.
and it's missing all of the advantages of graph DB if your graph schema doesn't fit your requirement

Gremlin find highest match

I am planning to use a Graph Database (AWS Neptune) that can be queried with Gremlin as a sort of Knowledge base. The KB would be used as a classification tool on with entities with multiple features. For simplicity, I am using geometric shapes to code the properties of my entities in this example. Let's suppose I want to classify Points that can be related to Squares, Triangles and Circles. I have blueprint the different possible relationships of Points with the possibles Squares, Triangles and Circles in a graph as depicted in the picture below.
Created with:
g.addV('Square').property(id, 'S_A')
.addV('Square').property(id, 'S_B')
.addV('Circle').property(id, 'C_A')
.addV('Triangle').property(id, 'T_A')
.addV('Triangle').property(id, 'T_B')
.addV('Point').property(id, 'P1')
.addV('Point').property(id, 'P2')
.addV('Point').property(id, 'P3')
g.V('P1').addE('Has_Triangle').to(g.V('T_B'))
g.V('P2').addE('Has_Triangle').to(g.V('T_A'))
g.V('P1').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Square').to(g.V('S_B'))
The different entities are for example Points, Squares, Triangles, Circles.
So my ultimate goal is to find the Point that satisfies the highest number of conditions. E.g.
g.V().hasLabel('Point').where(and(
out('Has_Triangle').hasId('T_A'),
out('Has_Circle').hasId('C_A'),
out('Has_Square').hasId('S_A')
))
// ==>v[P2]
The query above works very well for classifying a Point (a) with properties (T_A,S_A,C_A) respectively as a Point 2 (P2) type for example. But if I would have to use the same query for classifying a Point with properties (C_A,S_B,T_X) for example:
g.V().hasLabel('Point').where(and(
out('Has_Triangle').hasId('T_X'),
out('Has_Circle').hasId('C_A'),
out('Has_Square').hasId('S_B')
))
The query would fail to classify this point as Point 3 (P3) as in the KB there is no known Triangle property for P3.
Is there a way I can express a query that returns the vertex with the highest match which in this case would be P3?
Thank you in advance.
EDIT
Best idea to solve this so far, is to put sentinel values for KB properties that do not exist. Then modify the query to match each exact property or the sentinel value. But this means that if I add a new "type" of property to a Point in the future e.g. a Point Has_Hexagon, than I need to add sentinel Hexagon to all Points of my graph.
EDIT 2
Added Gremlin script that creates sample data
You can use the choose() step to increment a counter (sack) for each match, then order by counter values (descending) and pick the first one (highest match).
gremlin> g.withSack(0).V().hasLabel('Point').
choose(out('Has_Triangle').hasId('T_A'), sack(sum).by(constant(1))).
choose(out('Has_Circle').hasId('T_A'), sack(sum).by(constant(1))).
choose(out('Has_Square').hasId('T_A'), sack(sum).by(constant(1))).
order().
by(sack(), decr).
limit(1)
==>v[P2]
gremlin> g.withSack(0).V().hasLabel('Point').
choose(out('Has_Triangle').hasId('T_X'), sack(sum).by(constant(1))).
choose(out('Has_Circle').hasId('T_A'), sack(sum).by(constant(1))).
choose(out('Has_Square').hasId('S_B'), sack(sum).by(constant(1))).
order().
by(sack(), decr).
limit(1)
==>v[P3]
Each choose() step in the queries above can be read as if (condition) increment-counter. In any case, whether the condition is met or not, the original vertex (Point) will be emitted by the choose-step.

Following edges who's properties match one on the source vertex with Gremlin

I have a graph, and I want to follow the edges that contain a property that matches one on the vertex. E.g.
vertex -------edge ---------> vertex
vertValue: 123 vert: 123 vertValue: 463
So in the above case, say I'm starting at the first vertex, I want to follow the edges that has the property 'vert' equal to it's 'vertValue'.
I've manage to make a query using where that manages to follow the edge:
g.V(id).as('verts')
.outE().values('vert')
.where(eq('verts'))
.by('vertValue')
Problem, is I want to return the actual edge object. However, as I need to do the 'where' on the 'vert' value of the edge, in the query I'm doing a values('vert') so the result I get is just the value I'm testing, not the edge object. I want to do something like:
g.V(id).as('verts')
.outE()
.where(eq('verts'))
.by('vertValue','vert')
To compare the 'vertValue' on the vertex, to the 'vert' value on the edge. But I can't seem to find a way to specify by on the 'input' values of the where.
Edit: Trying:
g.V("95c4a57a-9262-45b7-a905-8cca95d3ebb6").as('v').
outE().
where(eq('v')).
by('vert').
by('vertValue')
Returns an error:
Gremlin Query Execution Error: Select One: Get Projection: The provided traversal or property name of path() does not map to a value.
g.V("95c4a57a-9262-45b7-a905-8cca95d3ebb6").as('v').
outE().
where(eq('v')).
by('vertValue').
by('vert')
returns an empty array!
You got really close to the solution. It's:
g.V(id).as('v').
outE().
where(eq('v')).
by('vert').
by('vertValue')

How to randomly pick a vertex or edge from graph of jGraphT

I have created a Graph with a set of edges I have (4000K Edges and 4K nodes).
Now I want to take 10% of the edges from the corpus to create a train and test data set.
I want to pick an edge in random, verify if the vertices of this edge has an edge with a random vertex. If so, I will remove that edge in the graph and also write that edge in a test file. So, that later I will predict the edges of the test file using some similarity function.
Logic is I am trying to predict A->C, given A->B and B->C.
Now the problem is, I cannot get a way to randomly pick an edge and randomly pick a vertex in JGraphT. My vertex names are some strings with random numbers.
Any one has a solution for this ?
There is a possibility. See the example first:
DirectedGraph<String, DefaultEdge> graph = new DefaultDirectedGraph<String, DefaultEdge>(DefaultEdge.class);
Object[] vertexSet = graph.vertexSet().toArray();
Object[] edgeSet = graph.edgeSet().toArray();
String someRndNode = (String) vertexSet [ getSomeRandomNumberBetween(0, vertexSet.length)];
DefaultEdge someRndEdge = (DefaultEdge) edgeSet [ getSomeRandomNumberBetween(0, edgeSet.length)];
You simply get the set of edges and nodes of your graph. Determine a random number based on the arrays. Get the stuff you need out of it.

Resources