Add edge if not exist using gremlin - graph

I'm using the Cosmos DB graph API in Azure.
Does anyone know if there is a way to add an edge between two vertices only if it doesn't already exist (using a Gremlin graph query)?
I can do that when adding a vertex, but not with edges. I took the code to do so from here:
g.Inject(0).coalesce(__.V().has('id', 'idOne'), addV('User').property('id', 'idOne'))
Thanks!

It is possible to do with edges. The pattern is conceptually the same as vertices and centers around coalesce(). Using the "modern" TinkerPop toy graph to demonstrate:
gremlin> g.V().has('person','name','vadas').as('v').
V().has('software','name','ripple').
coalesce(__.inE('created').where(outV().as('v')),
addE('created').from('v').property('weight',0.5))
==>e[13][2-created->5]
Here we add an edge between "vadas" and "ripple", but only if it doesn't already exist. The key is the check in the first argument to coalesce(): if it returns an existing edge, the addE() in the second argument is never evaluated.

The performance of the accepted answer isn't great since it uses inE(...), which is an expensive operation.
This query is what I use for my work in Cosmos DB:
g.E(edgeId).
fold().
coalesce(
unfold(),
g.V(sourceId).
has('pk', sourcePk).
as('source').
V(destinationId).
has('pk', destinationPk).
addE(edgeLabel).
from('source').
property(T.id, edgeId)
)
This uses the id and partition keys of each vertex for cheap lookups.

I have been working on similar issues, trying to avoid duplication of vertices or edges. The first is a rough example of how I check to make sure I am not duplicating a vertex:
"g.V().has('word', 'name', '%s').fold()"
".coalesce(unfold(),"
"addV('word')"
".property('name', '%s')"
".property('pos', '%s')"
".property('pk', 'pk'))"
% (re.escape(category_),re.escape(category_), re.escape(pos_))
The second one is how I make sure there isn't already an edge in either direction. I make use of two coalesce statements, one nested inside the other:
"x = g.V().has('word', 'name', '%s').next()\n"
"y = g.V().has('word', 'name', '%s').next()\n"
"g.V(y).bothE('distance').has('weight', %f).fold()"
".coalesce("
"unfold(),"
"g.addE('distance').from(x).to(y).property('weight', %f)"
")"
% (word_1, word_2, weight, weight)
So, if the edge y -> x exists, it skips producing another one. If y -> x doesn't exist, then it tests to see if x -> y exists. If not, then it goes to the final option of creating x -> y.
Let me know if anyone here knows of a more concise solution. I am still very new to Gremlin and would love a cleaner answer, though this one appears to suffice.
When I implemented the previous solutions and ran my code twice, it produced an edge on each run, because they test only one direction before creating a new edge.

Related

How to replace deprecated addInE and addOutE steps with addE?

I am completely new to Gremlin and have some really old code that is using addInE() and addOutE(). I understand that it is deprecated as of release 3.1.0 and - according to the javadocs - should be replaced with addE().
My problem is that I have very little knowledge of Gremlin in general and found almost no documentation for the addInE() and addOutE() steps.
In the reference documentation for version 3.0.0 there is exactly one example where it is used, but not explained.
Here is the example that is given:
gremlin> g.V(1).as('a').out('created').in('created').where(neq('a')).addOutE('co-developer','a','year',2009) //(1)
==>e[12][4-co-developer->1]
==>e[13][6-co-developer->1]
gremlin> g.withSideEffect('a',g.V(3,5).toList()).V(4).addInE('createdBy','a') //(2)
==>e[14][3-createdBy->4]
==>e[15][5-createdBy->4]
gremlin> g.V().as('a').out('created').as('b').select('a','b').addOutE('b','createdBy','a','acl','public') //(3)
==>e[16][3-createdBy->1]
==>e[17][5-createdBy->4]
==>e[18][3-createdBy->4]
==>e[19][3-createdBy->6]
gremlin> g.V(1).as('a').out('knows').addInE('livesNear','a','year',2009).inV().inE('livesNear').values('year') //(4)
==>2009
==>2009
My current interpretation of the first query:
g.V(1).as('a').out('created').in('created').where(neq('a')) selects elements from the graph.
addOutE('co-developer','a','year',2009) adds something to the selection.
I would appreciate it if someone could first elaborate on what is happening here and then point out how addInE() and addOutE() could be represented using addE().
This is a trip down memory lane!
Using one of the examples you found
gremlin> g.V(1).as('a').out('created').in('created').where(neq('a')).addOutE('co-developer','a','year',2009)
would, in current-day Gremlin, be written as
g.V(1).as('a').
out('created').
in('created').
where(neq('a')).
addE('co-developer').to('a').property('year',2009)
The way to read this is:
Start at the vertex with an ID of 1.
Find all the vertices connected to V(1) by outgoing 'created' edges.
Find all the people who also created the same thing.
Don't include where you started (i.e. ignore yourself).
Add a new 'co-developer' edge to V(1) from the people found, with an edge property of the year.
When replacing addInE, still use the addE step, but replace the to() with a from(). Note that addE can also have both from() and to() used with it at the same time.

Gremlin: limit by vertex label

Hello dear gremlin jedi,
I have a bunch of nodes with different labels in my graph:
g.addV('book')
.addV('book')
.addV('book')
.addV('movie')
.addV('movie')
.addV('movie')
.addV('album')
.addV('album')
.addV('album').iterate()
There also may be vertices with other labels.
and a hash map describing what labels and how many vertices of each label I want to get:
LIMITS = {
"book": 2,
"movie": 2,
"album": 2,
}
I'd like to write a query that returns a list of vertices with the specified labels, where the number of vertices with each label is limited according to the LIMITS hash map. In this case there should be 2 books, 2 movies and 2 albums in the result.
The limits and requested labels are calculated independently for every query so they cannot be hardcoded.
As far as I can see the limit step does not support passing traversals as an argument.
What trick can I use to write such a query? The only option I see is to build the query using the capabilities of the client-side programming language (Ruby with grumlin as a Gremlin client in my case):
nodes = LIMITS.map do |label, limit|
__.hasLabel(label).limit(limit)
end
g.V().union(*nodes).toList
But I believe there is a better solution.
Thank you!
The most direct way would be to use group() I think:
gremlin> g.V().group().by(label)
==>[software:[v[3],v[5]],person:[v[1],v[2],v[4],v[6]]]
gremlin> g.V().group().by(label).by(unfold().limit(2).fold())
==>[software:[v[3],v[5]],person:[v[1],v[2]]]
You can filter the vertices going to group() with hasLabel() if you need those sorts of restrictions. Depending upon how you use this, the traversal could be expensive in the sense that you have to traverse a fair bit of data to filter away all but two (in this case) vertices per label. If that is a concern, your approach of dynamically constructing the traversal and then piecing it together with union() doesn't seem so bad. While I could probably think up a way to write that in just Gremlin, it probably wouldn't be as readable as your approach.
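For clients that submit Gremlin as a script string, the union() construction from the question can be sketched in Python as follows. This is only an illustration: `build_limited_union` is a hypothetical helper name, and the label check is an assumed safeguard against injecting arbitrary text into the script, not something the original code does.

```python
import re

# Assumption: labels are simple identifiers; anything else is rejected
# rather than escaped, so no arbitrary text ends up in the script.
_LABEL = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_limited_union(limits):
    """Build 'g.V().union(...)' with one hasLabel().limit() branch per label."""
    if not limits:
        raise ValueError("at least one label is required")
    branches = []
    for label, limit in limits.items():
        if not _LABEL.match(label):
            raise ValueError("unsupported label: %r" % (label,))
        if not isinstance(limit, int) or limit < 0:
            raise ValueError("limit must be a non-negative integer")
        branches.append("__.hasLabel('%s').limit(%d)" % (label, limit))
    return "g.V().union(%s)" % ", ".join(branches)


query = build_limited_union({"book": 2, "movie": 2, "album": 2})
print(query)
```

Each branch stops as soon as its own limit is reached, which is the reason this form can be cheaper than grouping every vertex first.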

How do I produce output even when there is no edge and when using select for projection

Can someone please help me with this simple query? Many thanks in advance.
I am using the following Gremlin query and it works well, giving me the original vertex (v) (with id 12345), its edges (e) and the child vertex (id property). However, if the original vertex 'v' (with id 12345) has no outgoing edges, the query returns nothing. I still want the properties of the original vertex ('v') even if it has no outgoing edges and no child. How can I do that?
g.V().has('id', '12345').as('v').
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id)
There are a couple of things going on here, but the major update you need to make to the traversal is to use a project() step instead of a select().
select() and project() steps are similar in that they both allow you to format the results of a traversal; however, they differ in (at least) one significant way. select() steps function by allowing you to access previously traversed and labeled elements (via as). project() steps allow you to take the current traverser and branch it to manipulate the output moving forward.
In your original traversal, when there are no outgoing edges from the original v, all the traversers are filtered out during the outE() step. Since there are no further traversers after the outE() step, the remainder of the traversal has no input stream, so there is no data to return. If you use a project() step after the original v, you're able to return the original traverser as well as the edges and incident vertices. This does lead to a slight complication when handling cases where no out edges exist. Gremlin does not handle null values, such as no out edges existing, so you need to return some constant value for these by() modulators using a coalesce step.
Here is a functioning version of this traversal:
g.V().hasId(3).
project('v', 'e', 'child_v').
by(valueMap()).
by(coalesce(outE().id(), constant(''))).
by(coalesce(out().id(), constant('')))
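The difference between the two behaviours can be mimicked in plain Python as a rough analogy. Nothing below is Gremlin or a client API: the dictionaries, IDs and function names are made up for illustration. `select_style` behaves like the original traversal, which loses the vertex when outE() finds nothing, and `project_style` behaves like project() with a coalesce() fallback.

```python
# Toy data standing in for the graph (hypothetical IDs and properties).
vertex_props = {"12345": {"name": ["parent"]}, "99999": {"name": ["loner"]}}
# vertex id -> list of (edge id, child vertex id)
out_edges = {"12345": [("e1", "child-1"), ("e2", "child-2")]}


def select_style(vertex_id):
    """One row per outgoing edge; a vertex with no edges yields no rows at all."""
    return [
        {"v": vertex_props[vertex_id], "e": edge_id, "child_v": child_id}
        for edge_id, child_id in out_edges.get(vertex_id, [])
    ]


def project_style(vertex_id, default=""):
    """Always one row; missing edge data falls back to a constant value."""
    edge_id, child_id = next(
        iter(out_edges.get(vertex_id, [])), (default, default)
    )
    return {"v": vertex_props[vertex_id], "e": edge_id, "child_v": child_id}


rows_without_edges = select_style("99999")
row_without_edges = project_style("99999")
row_with_edges = project_style("12345")
```

Note that `project_style` reports only the first edge of a vertex with several edges. This mirrors a by() modulator, which takes only the first result of its traversal unless the results are folded into a list.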
Currently you will get a lot of duplicate data: in the query from the question, the vertex properties are returned once per outgoing edge. It will probably be better to use project() and fold the edges into a single list:
g.V('12345').project('v', 'children').
by(valueMap()).
by(outE().as('e').
inV().as('child').
select('e', 'child').by(id).fold())
example: https://gremlify.com/a1
You can get the original data format if you do something like this:
g.V('12345').as('v').
coalesce(
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id),
project('v').by(valueMap())
)
example: https://gremlify.com/a2

Time Complexity of Adding Edge to Graph using Adjacency List

I've been studying up on graphs using the Adjacency List implementation, and I am reading that adding an edge is an O(1) operation.
This makes sense if you are just tacking an edge on to the Vertex's Linked List of edges, but I don't understand how this could be the case so long as you care about removing the old edge if one already exists. Finding that edge would take O(V) time.
If you don't do this, and you add an edge that already exists, you would have duplicate entries for that edge, which means they could have different weights, etc.
Can anyone explain what I seem to be missing? Thanks!
You're right in your complexity analysis. Finding out whether an edge already exists is indeed O(V). But notice that adding the edge, even if it already exists, is still O(1).
Remember that two edges with the same source and destination are valid input to a graph, even with different weights (arguably, different weights are the very reason to allow them).
That is why adding an edge to an adjacency-list graph is O(1).
What people usually do to have both optimal search time complexity and the advantages of adjacency lists is to use an array of hashsets instead of an array of lists.
Alternatively,
If you want a worst-case optimal solution, use RadixSort to order the
list of all edges in O(v+e) time, remove duplicates, and then build
the adjacency list representation in the usual way.
source: https://www.quora.com/What-are-the-various-approaches-you-can-use-to-build-adjacency-list-representation-of-a-undirected-graph-having-time-complexity-better-than-O-V-*-E-and-avoiding-duplicate-edges
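A minimal sketch of the "array of hash sets" idea, assuming vertices are numbered 0 to V-1. It uses a hash map per vertex rather than a plain set so that each edge can carry a weight; the class and method names are illustrative only.

```python
class Graph:
    """Directed weighted graph stored as one hash map of neighbours per vertex."""

    def __init__(self, num_vertices):
        # adj[u] maps each neighbour v to the weight of the edge u -> v.
        self.adj = [dict() for _ in range(num_vertices)]

    def add_edge(self, u, v, weight):
        """Insert or overwrite the edge u -> v in O(1) average time.

        Returns True if a new edge was created and False if an existing
        edge had its weight replaced.
        """
        is_new = v not in self.adj[u]
        self.adj[u][v] = weight
        return is_new

    def has_edge(self, u, v):
        """Check for the edge u -> v in O(1) average time."""
        return v in self.adj[u]


g = Graph(3)
first = g.add_edge(0, 1, 2.5)   # creates the edge 0 -> 1
second = g.add_edge(0, 1, 4.0)  # same edge again: weight replaced, no duplicate
```

The trade-off is that parallel edges can no longer be represented, and the O(1) bound is an average-case bound for hashing rather than a worst-case guarantee.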

create multiple edges having vertex id number 0 to 49

This may be a small problem, but I have spent more time on it than I should have. How can I add an edge between two vertices? I have 50 vertices with IDs 0 to 49 and I can't find a way to add edges between them. So far I have tried:
gremlin> (0..<50)each{g.addEdge(V[it],V[it+1]).next()}
No such property: V for class: groovysh_evaluate
gremlin> (0..<=49)each{g.addEdge(g.getVertex([NodeID]),g.getVertex([NodeID+1]),'abc')}
groovysh_parse: 2: unexpected token: = # line 2, column 6.
(0..<=49)each{g.addEdge(g.getVertex([NodeID]),g.getVertex([NodeID+1]),'abc')}
^
1 error
It looks like you just want to iterate through the vertices and add an edge from one vertex to the next until they are all connected. First, I'll create the 50 vertices:
gremlin> g.inject((0..<50).toArray()).as('i').addV('myid',select('i')).iterate()
Then I'll add the edges:
gremlin> (0..<49).each { def v = g.V().has('myid',(long) it).next(); v.addEdge('knows',g.V().has('myid',(long)it+1).next()) }
I cast to "long" in my example above as I was using a TinkerGraph. That cast may not be necessary for dynamo. Note that you can combine all of this into a single line with:
gremlin> g.addV().repeat(__.as('a').addV().as('b').
select(last,'a','b').
addE('.').from('a').to('b').
inV().as('a')).
times(49)
The above will create both the vertices and the edges at the same time in an iterative fashion. Note that "49" represents the number of edges you'd like to have.
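To make the counting explicit, here is a small plain-Python sketch (not Gremlin; `chain_edges` is a made-up helper) of the vertex ID pairs such a chain connects. Fifty vertices need 49 edges, and the last source ID is 48, so a loop that runs the source ID up to 49 would look for a vertex 50 that does not exist.

```python
def chain_edges(num_vertices):
    """Return the (source, target) ID pairs linking vertices 0..num_vertices-1 in a line."""
    return [(i, i + 1) for i in range(num_vertices - 1)]


edges = chain_edges(50)
print(len(edges), edges[0], edges[-1])
```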
You have spread this same question across multiple tags in StackOverflow including: here and here. In all cases you have lots of basic syntax errors, and you are calling methods and referencing objects that don't exist. I suggest you focus more on the basics of Java and Groovy before digging too deeply into DynamoDB and TinkerPop. At a minimum, start with the TinkerPop tutorials (like the one mentioned in the comment to your question) to get a better feel for the APIs and how the programming syntax works.

Resources