Anonymous traversal vs normal traversal in Gremlin

I have read the documentation about anonymous traversals. I understand they can be started with __ and that they can be used inside step modulators. However, I don't understand them conceptually. Why can't we use a normal traversal spawned from a graph traversal source inside step modulators? For example, take the following Gremlin code to create an edge:
this.g
  .V(fromId) // get the vertex with the given source id
  .as("fromVertex") // label it as fromVertex to be accessed later
  .V(toId) // get the vertex with the given destination id
  .coalesce( // evaluates the provided traversals in order and returns the first one that emits at least one element
    inE(label) // check incoming edges with the given label
      .where( // check whether the edge already exists
        outV() // get the source (out) vertex of the edge
          .as("fromVertex")), // and compare it against the labelled source vertex
    addE(label) // add the edge if not present
      .property(T.id, id) // with the given id
      .from("fromVertex")) // from the source vertex
  .next(); // end traversal to commit to graph
Why are __.inE() and __.addE() anonymous? Why can't we write this.g.inE() and this.g.addE() instead? Either way, the compiler is not complaining. So what special benefit does an anonymous traversal give us here?

tldr; Note that in 3.5.0, users are prevented from utilizing a traversal spawned from a GraphTraversalSource and must use __ so it is already something you can expect to see enforced in the latest release.
More historically speaking....
A GraphTraversalSource, your g, is meant to spawn new traversals from start steps with the configurations of the source assigned. An anonymous traversal is meant to take on the internal configurations of the parent traversal it is assigned to as it is spawned "blank". While a traversal spawned from g can have its internal configuration overwritten, when assigned to a parent, it's not something that is really part of the design for it to always work that way, so you take a chance in relying on that behavior.
Another point is that from the full list of Gremlin steps, only a few are actually "start steps" (i.e. addV(), addE(), inject(), V(), E()) so in building your child traversals you can really only ever use those options. As you often need access to the full list of Gremlin steps to start a child traversal argument, it is better to simply prefer __. By being consistent with this convention, it prevents confusion as to why child traversals "sometimes start with g and other times start with __" if they are used interchangeably throughout a single traversal.
There are perhaps other technical reasons why the __ is required. An easy one to see that doesn't require a ton of explanation can be demonstrated in the following Gremlin Console snippet:
gremlin> __.addV('person').steps[0].class
==>class org.apache.tinkerpop.gremlin.process.traversal.step.map.AddVertexStep
gremlin> g.addV('person').steps[0].class
==>class org.apache.tinkerpop.gremlin.process.traversal.step.map.AddVertexStartStep
The two traversals do not produce analogous steps. If using g in place of __ works today, it is by coincidence and not by design, which means it could break in the future.

Related

Condensing Gremlin queries into one

I have two queries that delete certain vertices in a graph for the same initial vertex
g.V(id).outV().drop().iterate()
g.V(id).drop().iterate()
Is it possible to combine these two queries into one?
My second question is how I can perform some terminal operation on the vertices before they are dropped. I tried with sideEffect(), but it needs to return a value:
g.V(id).outV().sideEffect(outV().forEachRemaining(x -> { /* do something */ })).drop()
For your initial question you can accomplish this via a sideEffect() like this:
g.V(id).sideEffect(out().drop()).drop()
For the second question, you can accomplish this by having the sideEffect() perform the drop and making the remaining operations part of the main traversal stream. Since sideEffect() passes its incoming traversers through to the output, you will be able to perform operations on them like this:
g.V(id).sideEffect(drop()).valueMap()
Just a note here, in your original traversals you went g.V(id).outV() which is not allowed as outV() only works from an edge, so I changed it to out() which takes you to the adjacent vertex.

Golang RWMutex on map content edit

I'm starting to use RWMutex in my Go project with map since now I have more than one routine running at the same time and while making all of the changes for that a doubt came to my mind.
The thing is that I know that we must use RLock when only reading to allow other routines to do the same task and Lock when writing to full-block the map. But what are we supposed to do when editing a previously created element in the map?
For example... Let's say I have a map[int]string where I do Lock, put inside "hello " and then Unlock. What if I want to add "world" to it? Should I do Lock or can I do RLock?
You should approach the problem from another angle.
A simple rule of thumb you seem to understand just fine is
You need to protect the map from concurrent accesses when at least one of them is a modification.
Now the real question is what constitutes a modification of a map.
To answer it properly, it helps to notice that values stored in maps are not addressable — by design.
This was engineered that way because maps have an intricate internal implementation which
might move the values they contain in memory
to provide (amortized) fast access time
when the map's structure changes due to insertions and/or deletions of its elements.
The fact map values are not addressable means you can not do
something like
m := make(map[int]string)
m[42] = "hello"
go mutate(&m[42]) // take a single element and go modifying it...
// ...while other parts of the program change _other_ values
m[123] = "blah blah"
The reason you are not allowed to do this is the
insertion operation m[123] = ... might trigger moving
the storage of the map's element around, and that might
involve moving the storage of the element keyed by 42
to some other place in memory — pulling the rug
from under the feet of the goroutine
running the mutate function.
So, in Go, maps really only support three operations:
Insert — or replace — an element;
Read an element;
Delete an element.
You cannot modify an element "in place" — you can only
go in three steps:
(1) Read the element;
(2) Modify the variable containing the (read) copy;
(3) Replace the element by the modified copy.
As you can now see, the steps (1) and (3) are mere map accesses,
and so the answer to your question is (hopefully) apparent:
the step (1) shall be done under at least a read lock,
and the step (3) shall be done under a write (exclusive) lock.
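The read-modify-replace sequence above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the SafeMap type and its Get, Set and Append methods are hypothetical names introduced here. One caveat worth stating: if the read lock is released before the write lock is taken, another goroutine may replace the value in between and its update would be lost, so this sketch simply holds the write lock across all three steps.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeMap is a hypothetical wrapper pairing a map with the RWMutex guarding it.
type SafeMap struct {
	mu sync.RWMutex
	m  map[int]string
}

func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[int]string)}
}

// Get only reads the map, so a shared (read) lock is enough.
func (s *SafeMap) Get(k int) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

// Set inserts or replaces an element, which is a modification of the map.
func (s *SafeMap) Set(k int, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

// Append reads the element, modifies the copy and stores it back.
// The exclusive lock is held for the whole sequence so that no other
// goroutine can replace the value between the read and the write.
func (s *SafeMap) Append(k int, suffix string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v := s.m[k]  // (1) read the element
	v += suffix  // (2) modify the copy
	s.m[k] = v   // (3) replace the element
}

func main() {
	sm := NewSafeMap()
	sm.Set(42, "hello ")
	sm.Append(42, "world")
	v, _ := sm.Get(42)
	fmt.Println(v) // hello world

	// Concurrent appends to the same key are serialized by the write lock.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sm.Append(1, "x")
		}()
	}
	wg.Wait()
	s, _ := sm.Get(1)
	fmt.Println(len(s)) // 100
}
```

Running the program with the race detector (go run -race) is a cheap way to check that no map access escaped the lock.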
In contrast, elements of other compound types —
arrays (and slices) and fields of struct types —
do not have the restriction maps have: provided the storage
of the "enclosing" variable is not relocated, it is fine to
change its different elements concurrently by different goroutines.
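As a small sketch of that contrast, the following assumes a slice that is allocated once and never appended to while the goroutines run, so its backing storage is not relocated. Each goroutine writes only its own element and no lock is needed. FillSquares is a hypothetical name used for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// FillSquares writes i*i into element i, using one goroutine per element.
// No mutex is required because every goroutine touches a different element
// and the slice is not resized while they run.
func FillSquares(n int) []int {
	out := make([]int, n) // allocated once, never appended to below
	var wg sync.WaitGroup
	for i := range out {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = i * i // distinct index per goroutine
		}(i)
	}
	wg.Wait() // all writes happen before the slice is returned
	return out
}

func main() {
	fmt.Println(FillSquares(5)) // [0 1 4 9 16]
}
```

The same pattern applied to a map (one goroutine per key, no lock) would be a data race, since any insertion may restructure the map.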
Since the only way to change the value associated with a key in the map is to reassign the changed value to the same key, that is a write (a modification), so you have to obtain the write lock; simply using the read lock will not be sufficient.

Limiting depth of shortest path query using Gremlin on JanusGraph

I have a fairly large graph (currently 3806702 vertices and 7774654 edges, all edges with the same label) in JanusGraph. I am interested in shortest path searches in it. Gremlin recipes mention this query:
g.V(startId).until(hasId(targetId)).repeat(out().simplePath()).path().limit(1)
This immediately returns a path that I know to be correct, but then hangs the console (top shows janusgraph and scylla processing furiously, so I guess it's working in the background, but it takes forever). It does the right thing and returns the first (correct) shortest path if used like this:
g.V(startId).until(hasId(targetId)).repeat(out().simplePath()).path().next()
I would like to limit this query so that gremlin/janusgraph stops searching for a path beyond, let's say, 100 hops (so I basically want a max depth of 100 edges). I have tried to use .times(100) in multiple positions, but if .until() is used with .times() in the same query it always crashes with a NullPointerException in the gremlin traversal classes, i.e.:
java.lang.NullPointerException
at org.apache.tinkerpop.gremlin.process.traversal.util.TraversalHelper.hasStepOfAssignableClassRecursively(TraversalHelper.java:351)
at org.apache.tinkerpop.gremlin.process.traversal.strategy.optimization.RepeatUnrollStrategy.apply(RepeatUnrollStrategy.java:61)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversalStrategies.applyStrategies(DefaultTraversalStrategies.java:86)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.applyStrategies(DefaultTraversal.java:119)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:198)
at java_util_Iterator$next.call(Unknown Source)
...
Does anyone have any idea how I can apply such a limit? I need this to return the first result or fail, fast.
Thanks!
Add another break condition in your until() and also make sure to limit() the result before you ask for paths:
g.V(startId).
until(__.hasId(targetId).or().loops().is(100)).
repeat(__.both().simplePath()).
hasId(targetId).limit(1).path()
Calling tryNext() on this traversal will give you an Optional<Path>. If it's empty, then no path was found within the given distance.

How to detect cycles in directed graph in iterative DFS?

My current project features a set of nodes with inputs and outputs. Each node can take its input values and generate some output values. Those outputs can be used as inputs for other nodes. To minimize the amount of computation needed, node dependencies are checked on application start. When updating the nodes, they are updated in the reverse order they depend on each other.
That said, the nodes resemble a directed graph. I am using iterative DFS (no recursion to avoid stack overflows in huge graphs) to work out the dependencies and create an order for updating the nodes.
I further want to avoid cycles in the graph because cyclic dependencies will break the updater algorithm and cause an endless loop.
There are recursive approaches to finding cycles with DFS by tracking nodes on the recursion stack, but is there a way to do it iteratively? I could then embed the cycle search in the main dependency resolver to speed things up.
There are plenty of cycle-detection algorithms available online. The simplest ones are augmented versions of Dijkstra's algorithm. You maintain a list of visited nodes and the costs to get there. In your design, replace the "cost" with the path to get there.
In each iteration of the algorithm, you grab the next node on the "active" list and look at each node that follows it in the graph (i.e. each of its dependencies). If that node is on the "visited" list, then you have a cycle. The path you maintained in getting here shows the loop path.
Is that enough to get you moving?
Try a timestamp. Add a meta timestamp and set it to zero on your nodes.
Previous Answer (non applicable):
When you start a search, increment or grab a time() stamp. Then, when
you visit a node, compare it to the current search timestamp. If it
is the same, then you have found a cycle. If not then set the stamp
to current.
Next search, increment again.
Ok, this is how I'm assuming you are performing your DFS search:
1. Add the root node to a stack (for searching) and a vector (for updating).
2. Pop the stack and add the children of the current node to the stack and to the vector.
3. Loop until the stack is empty.
4. Reverse-iterate the vector and update values (by referencing child nodes).
The problem: Cycles will cause the same set of nodes to be added to the stack.
Solution 1: Use a boolean/timestamp to see if the node has been visited before adding to the DFS search stack. This will eliminate cycles, but will not resolve them. You can spit out an error and quit.
Solution 2: Use a timestamp, but increment it each time you pop the stack. If a child node has a timestamp set, and it is less than the current stamp, you have found a cycle. Here's the kicker. When iterating over the values backwards, you can check the timestamps of the child nodes to see if they are greater than the current node. If less, then you've found a cycle, but you can use a default value.
In fact, I think Solution 1 can be resolved the same way by never following more than one child when updating the value and setting all nodes to a default value on start. Solution 2 will give you a warning while evaluating the graph whereas solution 1 only gives you a warning when creating the vector.
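For comparison, here is a minimal sketch of a different, standard technique rather than the timestamp scheme described above: iterative DFS with three-color marking. A node is gray while it sits on the explicit stack (the iterative counterpart of "on the recursion stack") and black once all its dependencies are processed. Reaching a gray node means a back edge, and therefore a cycle. The names TopoOrder and frame and the adjacency-map representation are assumptions made for illustration.

```go
package main

import "fmt"

const (
	white = iota // not visited yet
	gray         // on the current DFS path (still on the explicit stack)
	black        // fully processed
)

// frame remembers which child of a node is explored next, replacing the
// state a recursive call would keep on the call stack.
type frame struct {
	node int
	next int
}

// TopoOrder walks the graph with an explicit stack. adj maps each node to
// the nodes it depends on. It returns the nodes in post-order (dependencies
// before dependents) or hasCycle == true if a cycle is reachable.
func TopoOrder(adj map[int][]int) (order []int, hasCycle bool) {
	color := make(map[int]int, len(adj))
	for start := range adj {
		if color[start] != white {
			continue
		}
		color[start] = gray
		stack := []frame{{node: start}}
		for len(stack) > 0 {
			top := &stack[len(stack)-1]
			children := adj[top.node]
			if top.next < len(children) {
				child := children[top.next]
				top.next++
				switch color[child] {
				case gray: // back edge to a node on the current path
					return nil, true
				case white:
					color[child] = gray
					stack = append(stack, frame{node: child})
				}
				continue
			}
			// All children done: finish the node and pop it.
			color[top.node] = black
			order = append(order, top.node)
			stack = stack[:len(stack)-1]
		}
	}
	return order, false
}

func main() {
	acyclic := map[int][]int{1: {2, 3}, 2: {3}, 3: {}}
	fmt.Println(TopoOrder(acyclic))

	cyclic := map[int][]int{1: {2}, 2: {3}, 3: {1}}
	_, hasCycle := TopoOrder(cyclic)
	fmt.Println(hasCycle) // true
}
```

Because the returned order lists every node after the nodes it depends on, the same pass yields both the cycle check and an update order, so the check can be embedded in the dependency resolver instead of being run separately.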

Javascript library for graph operations

Are there any suggested JavaScript alternatives to Python's pygraph or NetworkX? Note that visualization is not necessary (I would even prefer not to have it).
The library should be able to parse a format capable of retaining labeling and attributes on nodes and edges (DOT, GraphML?). It should support operations such as:
Listing nodes and edges.
Given a node, the edges which point in/out to/from it.
Given a node or edge, return the attached attributes.
Given two nodes that are connected, determine the most complete path. When running this operation a predicate function should be provided to determine if a node should be included in the search or not.
To put it in context, the web browser based application will traverse the graph from a pre-determined start node. Each node holds an attribute 'userValue' which is compared against conditions (rules?) held as attributes on the nodes out-edges. For the traversal to continue the edge condition must evaluate to true against 'userValue'. The graph will always contain a predetermined start and end (or goal) node.
You could try
JSNetworkX
A port of the NetworkX graph library to JavaScript
http://felix-kling.de/JSNetworkX/
