traversal.asAdmin().addStep(step) for remote calls - gremlin

I am trying to build complex traversals using Java client to a remote JanusGraph server.
The following traversal returns the ReferenceVertex elements identified by the label "mylabel":
GraphTraversal<Vertex, Vertex> t = g.V().hasLabel("mylabel");
List<Vertex> r = t.toList();
For more complex queries I need to concatenate multiple traversals that form a part of the whole queries. The following code illustrates the concept:
GraphTraversal<Vertex, Vertex> addMe = __.hasLabel("mylabel");
GraphTraversal<Vertex, Vertex> t = g.V();
for (Step<?, ?> step : addMe.asAdmin().getSteps()) {
t.asAdmin().addStep(step);
}
List<Vertex> r = t.toList();
For local access this works. For remote access however, it returns all vertices available on the server, not the ones identified by the label.
In both cases, t.toString() returns
[GraphStep(vertex,[]), HasStep([~label.eq(mylabel)])]
What am I doing wrong?

I don't think you need to get into any of the asAdmin() methods. Rather than build anonymous traversals to try to append to a parent traversal, I think it would be better to just pass around the parent GraphTraversal instance and simply add steps to that as needed:
private static GraphTraversal addFilters(GraphTraversal t) {
return t.hasLabel("mylabel");
}
...
GraphTraversal<Vertex, Vertex> t = g.V();
t = addFilters(t);
List<Vertex> r = t.toList();
There is a reason why your approach doesn't work with remote traversals and it has to do with how Gremlin bytecode is constructed behind the scenes. Using the asAdmin() methods bypasses some inner workings and those portions of the traversal are not sent to the server - that's a simple way of explaining it anyway. If you absolutely must construct anonymous portions of a traversal that way and then append them in that exact fashion then I guess I would do:
GraphTraversal<Vertex, Vertex> addMe = __.hasLabel("mylabel");
GraphTraversal<Vertex, Vertex> t = g.V();
List<Vertex> r = t.filter(addMe).toList();
I don't particularly like that approach because depending on what you're doing you could trick out JanusGraph traversal strategies that optimize your traversals and you'll lose some performance optimizations. I also don't like the style as much - it just seems more natural to pass around the GraphTraversal to the functions that need to mutate it with new steps. You might also find this information about traversal re-use helpful.

Related

Create edges from one traversal to another in Gremlin

I use Gremlin API in Java.
Assume we have a traversal to persons and another traversal to locations that is quite long and dependent on the first:
GraphTraversal<?, Vertex> persons = g.V().has("prop", "value");
GraphTraversal<?, Vertex> locations = persons.out("place").has(..)..;
Now I want to link each person to the locations that correspond to that persons with a direct link, considering that some of these edges are already in place.
Which strategy would be good to do such links using Gremlin API in Java?
I couldn't find an easy way to link two streams of vertices with many to many relationship. But getting the set of objects and creating edge in loop as usually for one to many works for me:
Set<Object> personVertexIds = persons.id().toSet();
personVertexIds.forEach(id -> {
GraphTraversal<Vertex, Vertex> person = g.V(id).as("p");
GraphTraversal<?, Vertex> locations = persons.out("place").has(..)..;
locations.coalesce(inE("link").where(outV().where(P.eq("p"))),
addE("link").from("p")).property("prop", value);
});

Conditional has()

I am building a query based on passed parameters. For example, if pass p1='a' and p2='b', my query will look something like this:
g.V()
.has("p1","a")
.has("p2","b")
If let's say p2 is not passed, then I won't have a second check:
g.V()
.has("p1","a")
Is it possible to perform parameter check inside the query instead of creating conditional checks for parameters before creating query?
Edit: Use case is based RESTful web service where I have something like, /server/myEndpoint?p1=a and endpoint implementation would build gremlin query with .has() steps solely based on presence of p1 or p2, so if p2 is not passed, query would look like in second snippet, and if passed would look like the one in first.
One possible approach is to build GraphTraversal until it's not executed:
Map<String, String> map = new HashMap<>();
map.put("p1", "a");
map.put("p2", null);
final GraphTraversal<Vertex, Vertex> v = g.V();
map.forEach((k,v) -> {
if(v != null) {
v.has(k,v);
}
});
return v.toStream().collect(Collectors.toList());
As a suggested improvement where you are creating the search, the Gremlin-optimized way to approach this would be through a parameterized script. In theory this would mean that you are building the query in a way which is looking for each parameter.
However, I don't believe there is an inherent "if" statement, per-say, so if you wanted to use the query to optionally handle your "Has" cases, it could be done with an "Or Step", an "And Step", or a "Choose Step". I would discourage it as a pattern, though. It is similar to how MySQL has the options to handle cases in Select queries: it would almost always be more performant and a better separation of concerns to bind the parameters right in the first place than to have the query builder sort out the programmatic logic.

How to add a default value for empty traversals in gremlin?

I'm working on a gremlin query that navigates along several edges and eventually produces a String. Depending on the graph content, this traversal may be empty. In case that the traversal ends up being empty, I want to return a default value instead.
Here's what I am currently doing:
GraphTraversal<?, ?> traversal = g.traversal().V().
// ... fairly complex navigation here...
// eventually, we arrive at the target vertex and use its name
.values("name")
// as we don't know if the target vertex is present, lets add a default
.union(
identity(), // if we found something we want to keep it
constant("") // empty string is our default
)
// to make sure that we do not use the default if we have a value...
.order().by(s -> ((String)s).length(), Order.decr)
.limit(1)
This query works, but it is fairly convoluted - all I want is a default if the traversal ends up not finding anything.
Does anybody have a better proposal? My only restriction is that it has to be done within gremlin itself, i.e. the result must be of type GraphTraversal.
You can probably use coalesce() in some way:
gremlin> g.V().has('person','name','marko').coalesce(has('person','age',29),constant('nope'))
==>v[1]
gremlin> g.V().has('person','name','marko').coalesce(has('person','age',231),constant('nope'))
==>nope
If you have more complex logic in mind for determining if something is found or not then consider choose() step.

Dynamic gremlin traversal construction

I'm migrating from the deprecated gremlin-javascript to the new Tinkerpop gremlin.
gremlin-javascript supported an execute method that would take an arbitrary string as a traversal. We could dynamically create and pass this string, such as chaining an arbitrary number of property traversals on a vertex.
Is there a way to dynamically build traversals in the gremlin js client?
For all language variants of Gremlin (Java, JS, Python, etc), you write Gremlin by constructing a Traversal object. You have a g which is a GraphTraversalSource which spawns those Traversal objects and thus:
var t = g.V().values('names');
does not yield a result (i.e. a list of "name" values) in t but a Traversal object. To get the result you need to iterate the traversal as for example:
t.toList().then(names => console.log(names));
So, if you have a Traversal object that is not yet iterated you can continue to add to it:
var t = g.V().values('names');
t = t.limit(1);
t.next().then(...)

How to traverse graph created using ConfiguredPlanFactory in JanusGraph?

I have set up a Janusgraph Cluster with Cassandra + ES. The cluster has been set up to support ConfiguredGraphFactory. Also, I am connecting the gremlin cluster remotely. I have set up a client and am able to create a graph using :
client.submit(String.format("ConfiguredGraphFactory.create(\"%s\")", graphName));
However, I am not able to get the traversalSource of the graph created using the gremlin driver. Do I have to create raw gremlin queries and traverse the graph using client.submit or is there a way to get it through the gremlin driver using Emptygraph.Instance().
To get the remote traversal reference, you need to pass in a variable name that is bound to your graph traversal. This binding is usually done as part of the "globals" in your startup script when you start the remote server (the start up script is configured to run as part of the gremlin-server.yaml).
There is currently no inherent way to dynamically bind a variable to a graph or traversal reference, but I plan on fixing this at some point.
A short term fix is to bind your graph and traversal references to a method that will be variably defined, and then create some mechanism to change the variable dynamically.
To further explain a potential solution:
Update your server's startup script to bind g to something variable:
globals << [g : DynamicBindingTool.getBoundGraphTraversal()]
Create DynamicBindingTool, which has to do two things:
A. Provide a way to setBoundGraph() which may look something like:
setBoundGraph(graphName) {
this.boundGraph = ConfiguredGraphFactory.open(graphName);
}
B. Provide a way to getBoundGraphTraversal() which may look something like:
getBoundGraphTraversal() {
this.boundGraph.traversal();
}
You can include these sorts of functions in your start-up script or perhaps even create a separate jar that you attach to your Gremlin Server.
Finally, I would like to note that the proposed example solution does not take into account a multi-node JanusGraph cluster, i.e. your notion of the current bound graph would not be shared across the JG nodes. To make this a multi-node solution, you can update the functions to define the bound graph on an external database or even piggybacked on a JanusGraph graph.
For example, something like this would be a multi-node safe implementation:
setBoundGraph(graphName) {
def managementGraph = ConfiguredGraphFactory.open("managementGraph");
managementGraph.traversal().V().has("boundGraph", true).remove();
def v = managementGraph.addVertex();
v.property("boundGraph", true);
v.property("graph.graphname", graphName);
}
and:
getBoundGraphTraversal() {
def managementGraph = ConfiguredGraphFactory.open("managementGraph");
def graphName = managementGraph.traversal().V().has("boundGraph", true).values("graph.graphname");
return ConfiguredGraphFactory.open(graphName).traversal();
}
EDIT:
Unfortunately, the above "short-term trick" will not work as the global bindings are evaluated once and stored in a Map for the duration of the sever life cycle. Please see here for more information and updates on fixes: https://issues.apache.org/jira/browse/TINKERPOP-1839.

Resources