Gremlin find highest match - gremlin

I am planning to use a graph database (AWS Neptune) that can be queried with Gremlin as a sort of knowledge base (KB). The KB would be used as a classification tool on entities with multiple features. For simplicity, I am using geometric shapes to code the properties of my entities in this example. Let's suppose I want to classify Points that can be related to Squares, Triangles and Circles. I have blueprinted the possible relationships of Points with the possible Squares, Triangles and Circles in a graph, as depicted in the picture below.
Created with:
g.addV('Square').property(id, 'S_A')
.addV('Square').property(id, 'S_B')
.addV('Circle').property(id, 'C_A')
.addV('Triangle').property(id, 'T_A')
.addV('Triangle').property(id, 'T_B')
.addV('Point').property(id, 'P1')
.addV('Point').property(id, 'P2')
.addV('Point').property(id, 'P3')
g.V('P1').addE('Has_Triangle').to(g.V('T_B'))
g.V('P2').addE('Has_Triangle').to(g.V('T_A'))
g.V('P1').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Square').to(g.V('S_B'))
The different entities are for example Points, Squares, Triangles, Circles.
So my ultimate goal is to find the Point that satisfies the highest number of conditions. E.g.
g.V().hasLabel('Point').where(and(
out('Has_Triangle').hasId('T_A'),
out('Has_Circle').hasId('C_A'),
out('Has_Square').hasId('S_A')
))
// ==>v[P2]
The query above works very well for classifying a Point with properties (T_A, S_A, C_A) as a Point 2 (P2) type, for example. But suppose I use the same query to classify a Point with properties (C_A, S_B, T_X):
g.V().hasLabel('Point').where(and(
out('Has_Triangle').hasId('T_X'),
out('Has_Circle').hasId('C_A'),
out('Has_Square').hasId('S_B')
))
The query would fail to classify this point as Point 3 (P3) as in the KB there is no known Triangle property for P3.
Is there a way I can express a query that returns the vertex with the highest match which in this case would be P3?
Thank you in advance.
EDIT
The best idea I have so far is to add sentinel values for KB properties that do not exist, then modify the query to match either the exact property or the sentinel value. But this means that if I add a new "type" of property to a Point in the future, e.g. a Point Has_Hexagon, then I need to add a sentinel Hexagon to all Points of my graph.
EDIT 2
Added Gremlin script that creates sample data

You can use the choose() step to increment a counter (sack) for each match, then order by counter values (descending) and pick the first one (highest match).
gremlin> g.withSack(0).V().hasLabel('Point').
choose(out('Has_Triangle').hasId('T_A'), sack(sum).by(constant(1))).
choose(out('Has_Circle').hasId('C_A'), sack(sum).by(constant(1))).
choose(out('Has_Square').hasId('S_A'), sack(sum).by(constant(1))).
order().
by(sack(), decr).
limit(1)
==>v[P2]
gremlin> g.withSack(0).V().hasLabel('Point').
choose(out('Has_Triangle').hasId('T_X'), sack(sum).by(constant(1))).
choose(out('Has_Circle').hasId('C_A'), sack(sum).by(constant(1))).
choose(out('Has_Square').hasId('S_B'), sack(sum).by(constant(1))).
order().
by(sack(), decr).
limit(1)
==>v[P3]
Each choose() step in the queries above can be read as if (condition) increment-counter. In any case, whether the condition is met or not, the original vertex (Point) will be emitted by the choose-step.
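The "if (condition) increment-counter" reading can be mirrored outside Gremlin. The following is a minimal plain-Python sketch, not a Neptune or TinkerPop API: the `KB` dictionary and the `best_match` helper are hypothetical names that model the sample graph from the question, and ties are broken by insertion order, which is an assumption the Gremlin query does not make.

```python
# Plain-Python model of the sample graph: point -> {edge label -> target ids}.
# KB and best_match are illustrative names, not part of any Gremlin API.
KB = {
    "P1": {"Has_Triangle": {"T_B"}, "Has_Square": {"S_A"}},
    "P2": {"Has_Triangle": {"T_A"}, "Has_Square": {"S_A"}, "Has_Circle": {"C_A"}},
    "P3": {"Has_Circle": {"C_A"}, "Has_Square": {"S_B"}},
}


def best_match(kb, wanted):
    """Return (point, score) for the point satisfying the most conditions.

    `wanted` maps an edge label to the required target id.
    """
    scores = {}
    for point, edges in kb.items():
        sack = 0  # plays the role of withSack(0)
        for label, target in wanted.items():
            if target in edges.get(label, set()):  # choose(condition, ...)
                sack += 1  # sack(sum).by(constant(1))
        scores[point] = sack
    # Equivalent of order().by(sack(), decr).limit(1); ties keep the first point.
    best = max(scores, key=scores.get)
    return best, scores[best]


print(best_match(KB, {"Has_Triangle": "T_X", "Has_Circle": "C_A", "Has_Square": "S_B"}))
```

Returning the score alongside the point makes it possible to reject weak matches, for example when the best candidate satisfies only one condition.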

Related

Gremlin: How to obtain outgoing edges and their target vertices in a single query

Given a set of vertices (say, for simplicity, that I start with one: G.V().hasId("something")), I want to obtain all outgoing edges and their target vertices. I know that .out() will give me all target vertices, but without the information about the edges (which have properties on them, too). On the other hand, .outE() will give me the edges but not the target vertices. Can I obtain both in a single Gremlin query?
Gremlin is as much about transforming graph data as it is navigating graph data. Typically folks seem to understand the navigation first which got you to:
g.V().hasId("something").outE()
You then need to transform those edges into the result you want - one that includes the edge data and its adjacent vertex. One way to do that is with project():
g.V().hasId("something").outE().
project('e','v').
by().
by(inV())
Each by()-modulator supplied to project() aligns to the keys supplied as arguments. The first applies to "e" and the second to "v". The first by() is empty and is effectively by(identity()) which returns the same argument given to it (i.e. the current edge in the stream).
Never mind. Figured this out.
G.V().hasId("something").outE().as("E").otherV().as("V").select("E", "V")

Break up graph into smallest sub-components of 2-nodes or greater

I wish to be able to separate my graph into subcomponents such that the removal of any single node would create no further subcomponents (excluding single nodes). As an example see the two images below.
The first image shows the complete graph. The second image shows the subcomponents of the graph when it has been split into the smallest possible subcomponents. As can be seen from the second image, the vertex names have been maintained. I don't need the new structure to be a single graph; it can be a list of graphs, or even a list of the nodes in each component.
The component of nodes 4-5-6 remains as removing any of the three nodes will not create a new component as the node that was broken off will only be a single node.
At the moment I am trying to put together an iterative process, that removes nodes sequentially in ascending degree order and recurses into the resultant new components. However, it is difficult and I imagine someone else has done it better before.
You say you want the "smallest subcomponents of 2 nodes or greater", and that your example has the "smallest possible subcomponents". But what you actually meant is the largest possible subcomponents such that the removal of any single node would create no further subcomponents, right? Otherwise you could just separate the graph into a collection of all of the 2-graphs.
I believe, then, that your problem can be described as finding all "biconnected components" (aka maximal biconnected subgraphs of a graph): https://en.wikipedia.org/wiki/Biconnected_component
As you said in the comments, igraph has the function biconnected_components(g), which will solve your problem. :)
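For readers without igraph at hand, here is a minimal standard-library sketch of how biconnected components can be found with a depth-first search and an edge stack. It is an illustration, not igraph's implementation; the function name, the adjacency-dict input, and the example graph (two triangles joined by a bridge) are assumptions, since the original images are not available. It expects a simple undirected graph and uses recursion, so very deep graphs would need a higher recursion limit or an iterative rewrite.

```python
def biconnected_components(adj):
    """Return the biconnected components of an undirected graph as vertex sets.

    `adj` maps each vertex to a list of its neighbours (simple graph assumed).
    """
    index = {}       # discovery order of each vertex
    low = {}         # lowest discovery index reachable from the DFS subtree
    edge_stack = []  # edges of the components still being built
    components = []
    counter = 0

    def visit(v, parent):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        for w in adj[v]:
            if w not in index:
                edge_stack.append((v, w))
                visit(w, v)
                low[v] = min(low[v], low[w])
                if low[w] >= index[v]:
                    # v separates w's subtree: pop one finished component.
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (v, w):
                            break
                    components.append(comp)
            elif w != parent and index[w] < index[v]:
                # Back edge to an ancestor.
                edge_stack.append((v, w))
                low[v] = min(low[v], index[w])

    for v in adj:
        if v not in index:
            visit(v, None)
    return components


# Hypothetical example: triangles 1-2-3 and 4-5-6 joined by the edge 3-4.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
print(biconnected_components(graph))
```

A bridge such as 3-4 comes out as its own two-node component, which matches the "2 nodes or greater" requirement in the question.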

Searching for an atypical graph pathfinding algorithm

Graph structure
graph is oriented
each edge has an assigned value representing the cost of using this edge. The value can be positive or negative
Problem description
input: graph, starting node and initial value (I don't have a goal node)
both, nodes and edges can be used repeatedly
goal: change the initial value to zero by passing through the graph. The answer should be if it is possible to reach zero (exactly zero, without reaching a negative actual value in the process)
I don't need the final path as the result, just the information if it is possible is enough. I would be most interested in a name of algorithm that is designed for this problem.
It is clearly NP-Hard (subset-sum can be reduced to it by using an appropriate complete graph). Breadth-first search seems like a natural approach, though to get a decision-procedure out of it you would need to find an upper-bound on the length of the path.
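A bounded breadth-first search over (node, value) states could look like the sketch below. Several things here are assumptions rather than part of the question: the function name, the adjacency format, the reading that using an edge subtracts its cost from the current value, and the `max_steps` cap that stands in for the missing upper bound on path length. Because of that cap, a `False` result only means that no solution was found within the bound.

```python
from collections import deque


def can_reach_zero(graph, start, value, max_steps):
    """Bounded BFS over (node, value) states.

    `graph` maps a node to a list of (neighbour, cost) pairs.
    Assumption: using an edge subtracts its cost from the current value.
    Returns True if some walk of at most `max_steps` edges reaches exactly
    zero without the value ever going negative on the way.
    """
    if value == 0:
        return True
    seen = {(start, value)}
    queue = deque([(start, value, 0)])
    while queue:
        node, current, steps = queue.popleft()
        if steps == max_steps:
            continue  # do not expand beyond the bound
        for neighbour, cost in graph.get(node, []):
            remaining = current - cost
            if remaining == 0:
                return True
            if remaining < 0:
                continue  # negative intermediate values are not allowed
            state = (neighbour, remaining)
            if state not in seen:
                seen.add(state)
                queue.append((neighbour, remaining, steps + 1))
    return False


# Hypothetical graph with one negative-cost edge (b -> a).
example = {"a": [("b", 3)], "b": [("a", -1), ("b", 2)]}
print(can_reach_zero(example, "a", 7, max_steps=3))
```

Tracking visited (node, value) pairs is safe here because breadth-first order reaches each state with the fewest edges first, so a later visit can never help within the same bound.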

Coloring a graph using Depth first traversal

I know that for coloring graph nodes, backtracking/brute force is a common solution. But I was wondering whether I can also reach a solution using DFS.
Backtracking gives you the opportunity to go back and try other color possibilities in order to paint all the nodes with N colors.
DFS will start from one node and color it, then jump to its neighbor and color it in a different color than its neighbors, etc.
I searched for this method but didn't find an algorithm that uses it.
Question: Is it possible to use DFS for coloring graph nodes? If yes, is it more efficient than backtracking?
Thank you
I believe there is some confusion when comparing backtracking and DFS wrt vertex coloring. DFS traversal for a graph gives full enumeration of its vertices in a sequence related to its structure. It does not, however, constitute a full enumeration for the vertex coloring problem, which would require taking into account the possible colors of the vertices.
Thus, if I understand correctly, what you have implemented is a greedy heuristic coloring for a graph directed by DFS.
On the other hand, a backtracking/brute force solution as you name it (such as [Randall-Brown 72]) will provide an exact solution for the minimum coloring problem since it considers every possible vertex coloring. Note that DFS traversal could be used to sort the vertices initially (topological sort) and feed that order to the exact solver.
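The difference between the two approaches can be seen on a small graph. The sketch below is illustrative only: the function names and the five-vertex example are assumptions, and the backtracking routine is a plain exhaustive search rather than the [Randall-Brown 72] algorithm. On this particular graph and vertex order, the DFS-directed greedy coloring uses four colors while the exhaustive search finds three.

```python
def dfs_greedy_coloring(adj, start):
    """Colour vertices in DFS preorder, giving each the smallest free colour.

    Only the connected component containing `start` is coloured.
    """
    colors = {}
    stack = [start]
    while stack:
        v = stack.pop()
        if v in colors:
            continue
        used = {colors[w] for w in adj[v] if w in colors}
        colors[v] = next(c for c in range(len(adj)) if c not in used)
        # Push in reverse so the first-listed neighbour is visited first.
        stack.extend(w for w in reversed(adj[v]) if w not in colors)
    return colors


def exact_coloring(adj):
    """Exhaustive backtracking search for a colouring with the fewest colours."""
    vertices = list(adj)

    def extend(i, k, colors):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(k):
            if all(colors.get(w) != c for w in adj[v]):
                colors[v] = c
                if extend(i + 1, k, colors):
                    return True
                del colors[v]  # backtrack
        return False

    for k in range(1, len(vertices) + 1):
        colors = {}
        if extend(0, k, colors):
            return colors
    return {}


# Hypothetical example graph.
graph = {
    "a": ["b", "e"],
    "b": ["a", "c", "d", "e"],
    "c": ["b", "d"],
    "d": ["c", "b", "e"],
    "e": ["d", "b", "a"],
}
greedy = dfs_greedy_coloring(graph, "a")
exact = exact_coloring(graph)
print(len(set(greedy.values())), len(set(exact.values())))
```

The greedy pass touches each edge a constant number of times, whereas the exhaustive search can take exponential time, which is the trade-off behind the efficiency question.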

Best way to build color matrix with different color in contiguous items

I have an interesting case: I have a table in a database where I store colors in hex format.
I query this table and get a list of different colors from the database.
I need to show the query result to the user in "matrix style", for example 8 rows and 10 columns.
But the resulting matrix shouldn't have the same or a similar hue in contiguous matrix items.
What is the best way to do it?
You could start with a very simple algorithm:
Place all colours at random
Find a bad entry (i.e. one where the colour is not very different to a neighbour)
Swap this entry with another random location (but only if the swap makes things better)
Repeat several times
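The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: it swaps two randomly chosen cells instead of first locating a bad entry, it treats only horizontal and vertical neighbours as contiguous, and the function names, the hue threshold of 0.1, and the `#rrggbb` input format are all hypothetical choices. Local search of this kind can get stuck, so the result is not guaranteed to be conflict-free.

```python
import colorsys
import random


def hue(hex_color):
    """Hue in [0, 1) of a colour written as '#rrggbb'."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    return colorsys.rgb_to_hls(r, g, b)[0]


def hue_distance(a, b):
    d = abs(hue(a) - hue(b))
    return min(d, 1 - d)  # hue is circular


def conflicts(grid, threshold):
    """Count horizontally or vertically adjacent pairs with similar hue."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and hue_distance(grid[r][c], grid[r][c + 1]) < threshold:
                count += 1
            if r + 1 < rows and hue_distance(grid[r][c], grid[r + 1][c]) < threshold:
                count += 1
    return count


def arrange(colors, rows, cols, threshold=0.1, tries=5000, seed=0):
    """Place colours at random, then keep only swaps that reduce conflicts."""
    rng = random.Random(seed)
    cells = list(colors[:rows * cols])
    rng.shuffle(cells)
    grid = [cells[r * cols:(r + 1) * cols] for r in range(rows)]
    best = conflicts(grid, threshold)
    for _ in range(tries):
        if best == 0:
            break
        r1, c1 = rng.randrange(rows), rng.randrange(cols)
        r2, c2 = rng.randrange(rows), rng.randrange(cols)
        grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
        score = conflicts(grid, threshold)
        if score < best:
            best = score
        else:
            # Undo swaps that do not improve the layout.
            grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
    return grid


palette = ["#ff0000", "#ff1100", "#00ff00", "#00ff22", "#0000ff", "#1100ff"]
for row in arrange(palette, rows=2, cols=3):
    print(row)
```

Recomputing the full conflict count after every swap keeps the sketch short; for an 8 by 10 matrix that cost is negligible, though a larger grid would benefit from rescoring only the neighbours of the two swapped cells.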
You are doing a Vertex Coloring of a planar graph. The good news is that since your graph is planar you really only need four colors, the bad news is that finding the coloring is not always easy. If your graphs are always that size, though, a recursive solution with backtracking will probably be sufficient.
