Create if not exist Vertex and Edge in 1 Gremlin Query - graph

I found the following code, which creates an edge only if it doesn't already exist.
g.V().hasLabel("V1")
.has("userId", userId).as("a")
.V().hasLabel("V1").has("userId", userId2)
.coalesce(
bothE("link").where(outV().as("a")),
addE("link").from("a")
)
It works fine, but I want to create both the vertices and the edge in one query if they don't already exist.
I tried the following code on a new graph, but it only creates the new vertices, with no edge between them.
g.V().hasLabel("V1")
.has("userId", userId1).fold()
.coalesce(
unfold(),
addV("V1").property("userId", userId1)
).as("a")
.V().hasLabel("V1").has("userId", userId2).fold()
.coalesce(
unfold(),
addV("V1").property("userId", userId2)
)
.coalesce(
bothE("link").where(outV().as("a")),
addE("link").from("a")
)

Thanks to Daniel Kuppitz in the JanusGraph Google group, I found the solution. I'm re-posting it here for anyone who needs it.
There are two issues in your query. The first one is the reason why it doesn't work as expected: the fold() step. Using fold() will destroy the path history, but you can easily work around it by doing that part in a child traversal:
g.V().has("V1","userId", userId1).fold().
coalesce(unfold(),
addV("V1").property("userId", userId1)).as("a").
map(V().has("V1","userId", userId2).fold()).
coalesce(unfold(),
addV("V1").property("userId", userId2)).
coalesce(inE("link").where(outV().as("a")),
addE("link").from("a"))
The second issue is the combination of bothE and outV. You should rather use bothE/otherV, outE/inV or inE/outV.
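
As a rough illustration of the get-or-create idea behind the coalesce() pattern above, here is a toy sketch in plain Python. It is not Gremlin and not any graph database's API: the dictionaries and the function names get_or_create_vertex and get_or_create_edge are made up for this example.

```python
# Toy in-memory graph; every name here is hypothetical, not a Gremlin API.
vertices = {}   # userId -> vertex properties
edges = set()   # (out_user_id, label, in_user_id)

def get_or_create_vertex(user_id):
    # Plays the role of fold().coalesce(unfold(), addV(...)):
    # reuse the vertex if it is already there, otherwise add it.
    if user_id not in vertices:
        vertices[user_id] = {"label": "V1", "userId": user_id}
    return vertices[user_id]

def get_or_create_edge(out_id, in_id, label="link"):
    # Plays the role of coalesce(inE(label).where(outV().as("a")),
    #                            addE(label).from("a")).
    get_or_create_vertex(out_id)
    get_or_create_vertex(in_id)
    key = (out_id, label, in_id)
    created = key not in edges
    edges.add(key)
    return created

first = get_or_create_edge("u1", "u2")   # creates both vertices and the edge
second = get_or_create_edge("u1", "u2")  # finds everything, creates nothing
```

Running the same call twice leaves two vertices and one edge, which is the behaviour the single Gremlin query is aiming for.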

I used the approach suggested by @thangdc94 (thanks!) and found that the map() step takes a long time. Splitting the work into two queries ran much faster (about 20x) for me:
g.V().has("V1","userId", userId1).fold().
coalesce(unfold(),
addV("V1").property("userId", userId1)).as("a").iterate();
g.V().has("V1","userId", userId2).fold().
coalesce(unfold(),
addV("V1").property("userId", userId2)).as("b").
V().has("V1","userId", userId1).
coalesce(outE("link").where(inV().as("b")),
addE("link").to("b"))
comment: I used Neptune DB

Related

How do I produce output even when there is no edge and when using select for projection

Can someone help me please with this simple query...Many thanks in advance...
I am using the following Gremlin query and it works well, giving me the original vertex (v) (with id=12345), its edges (e) and the child vertex (id property). However, if the original vertex 'v' (with id=12345) has no outgoing edges, the query returns nothing. I still want the properties of the original vertex ('v') even if it has no outgoing edges and no child. How can I do that?
g.V().has('id', '12345').as('v').
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id)
There are a couple of things going on here but the major update you need to the traversal is to use a project() step instead of a select().
select() and project() steps are similar in that they both allow you to format the results of a traversal; however, they differ in (at least) one significant way. select() steps function by allowing you to access previously traversed and labeled elements (via as). project() steps allow you to take the current traverser and branch it to manipulate the output moving forward.
In your original traversal, when there are no outgoing edges from the original v, all the traversers are filtered out at the outE() step. Since no traversers remain after outE(), the remainder of the traversal has no input stream, so there is no data to return. If you use a project() step after the original v, you're able to return the original traverser as well as the edges and incident vertices. This does lead to a slight complication when no out edges exist: Gremlin does not handle null values (such as no out edges existing), so you need to return some constant value for these cases using a coalesce() step.
Here is a functioning version of this traversal:
g.V().hasId(3).
project('v', 'e', 'child_v').
by(valueMap()).
by(coalesce(outE().id(), constant(''))).
by(coalesce(out().id(), constant('')))
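
To make the select() versus project() difference concrete, here is a small plain-Python sketch. It only models the row-producing behaviour; the data and the function names select_rows and project_row are invented for illustration.

```python
# Hypothetical toy data: vertex properties and (edge_id, child_id) pairs.
props = {"12345": {"name": ["root"]}, "777": {"name": ["leaf"]}}
out_edges = {"12345": [("e1", "777")], "777": []}

def select_rows(vid):
    # Like as('v').outE().as('e').inV().as('child_v').select(...):
    # one row per out edge, so a vertex without out edges yields no rows.
    return [{"v": props[vid], "e": e, "child_v": c} for e, c in out_edges[vid]]

def project_row(vid):
    # Like project('v','e','child_v') with coalesce(..., constant('')):
    # always one row; the first edge/child if any, else the '' placeholder.
    first = out_edges[vid][0] if out_edges[vid] else ("", "")
    return {"v": props[vid], "e": first[0], "child_v": first[1]}

no_rows = select_rows("777")
placeholder_row = project_row("777")
```

Note that the sketch, like a by() modulator, only looks at the first edge, so a vertex with several out edges still yields a single row.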
With your current query you will get a lot of duplicate data: the vertex properties are returned E times, once per edge. It is probably better to use project():
g.V('12345').project('v', 'children').
by(valueMap()).
by(outE().as('e').
inV().as('child').
select('e', 'child').by(id).fold())
example: https://gremlify.com/a1
You can get the original data format if you do something like this:
g.V('12345').as('v').
coalesce(
outE().as('e').
inV().
as('child_v').
select('v', 'e', 'child_v').
by(valueMap()).by(id).by(id),
project('v').by(valueMap())
)
example: https://gremlify.com/a2

Gremlin query not working - 2 x vertex if doesn't exist and add edge

I'm using gremlin and have the statement below. I want to:
create a vertex if one does not exist
create another vertex if one does not exist
create an edge between the vertices
The edge doesn't get created. I'd really appreciate help with the logic/approach here.
g.V().has('User','refId','435').
fold().coalesce(unfold(),addV(label, 'User').
property('name','John Smith').property('refId','435'))
.as('user').
V().has('JobTitle','name','Advisor').
fold().coalesce(unfold(),addV(label,'JobTitle').
property('name','Advisor'))
.as('jobtitle').
addE('REGISTERED_AS').from('user').to('jobtitle')
Given your most recent comment on Kfir's answer that includes your latest code, I think that there are several problems to correct with your approach. First note that addV() does not take a long list of labels and properties. I'm surprised that didn't generate an error for you. addV() just takes the vertex label as an argument and then you use property() to provide the associated key/value pairs.
g.V().has('User', 'refId', '435').
fold().
coalesce(unfold(),
addV('User').
property('name', 'John Smith').property('refId', '435').property('firstName', 'John').
property('lastName', 'Smith').property('JobTitle', 'Chief Executive Officer')).as('user').
V().has('JobTitle', 'name', 'Advisor').
fold().
coalesce(unfold(), addV('JobTitle').property('name', 'Advisor')).as('jobtitle').
V().
addE('REGISTERED_AS').
from('user').
to('jobtitle')
There is an extra V() right before addE() which would basically call addE() for every vertex you have in your graph rather than just for the one vertex to which you want to add an edge.
g.V().has('User', 'refId', '435').
fold().
coalesce(unfold(),
addV('User').
property('name', 'John Smith').property('refId', '435').property('firstName', 'John').
property('lastName', 'Smith').property('JobTitle', 'Chief Executive Officer')).as('user').
V().has('JobTitle', 'name', 'Advisor').
fold().
coalesce(unfold(), addV('JobTitle').property('name', 'Advisor')).as('jobtitle').
addE('REGISTERED_AS').
from('user').
to('jobtitle')
So, now things look syntactically correct, but there is still a problem, and it stems from this:
gremlin> g.V(1).as('x').fold().unfold().addE('self').from('x').to('x')
The provided traverser does not map to a value: v[1]->[SelectOneStep(last,x)]
Type ':help' or ':h' for help.
Display stack trace? [yN]
The path information in the traversal is lost after a reducing step (i.e. fold()), so you can't select back to "x" after that. You need to rework your traversal a bit so that it does not require the second fold():
g.V().has('User', 'refId', '435').
fold().
coalesce(unfold(),
addV('User').
property('name', 'John Smith').property('refId', '435').property('firstName', 'John').
property('lastName', 'Smith').property('JobTitle', 'Chief Executive Officer')).as('user').
coalesce(V().has('JobTitle', 'name', 'Advisor'),
addV('JobTitle').property('name', 'Advisor')).as('jobtitle').
addE('REGISTERED_AS').
from('user').
to('jobtitle')
That really just means using coalesce() more directly without the fold() and unfold() pattern. You really only need that pattern at the start of a traversal to ensure that a traverser stays alive in the stream (i.e. if the user doesn't exist fold() produces an empty list which becomes the new traverser and the traversal will continue executing).
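
A toy model in plain Python may help show why fold() keeps the traversal alive. Here a stream of traversers is just a list, and coalesce, fold and unfold are simplified stand-ins written for this example, not TinkerPop code.

```python
def coalesce(stream, *branches):
    # For each incoming traverser, emit the results of the first branch
    # that produces anything. No incoming traversers means no output.
    out = []
    for traverser in stream:
        for branch in branches:
            results = branch(traverser)
            if results:
                out.extend(results)
                break
    return out

def fold(stream):
    # Reducing step: always emits exactly one traverser, a (possibly empty) list.
    return [list(stream)]

def unfold(folded):
    return list(folded)

def add_v(_):
    return ["new-vertex"]

missing = []  # has(...) matched nothing, so the stream is empty
without_fold = coalesce(missing, lambda t: [t], add_v)
with_fold = coalesce(fold(missing), unfold, add_v)
existing = coalesce(fold(["v[1]"]), unfold, add_v)
```

Without fold() the empty stream never reaches the add branch; with fold() the empty list is itself a traverser, so coalesce() gets a chance to run.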
The question code is partial. Some alignment would help.
Nevertheless, I think your problem is in the as('jobtitle'), which is inside the coalesce statement. That is, if the vertex exists, we don't get to the second traversal and the as statement is not executed. Same for the as('user').
To solve, just move the as statements outside the coalesce.

Traverse implied edge through property match?

I'm trying to create edges between vertices based on matching the value of a property in each vertex, making what is currently an implied relationship into an explicit relationship. I've been unsuccessful in writing a gremlin traversal that will match up related vertices.
Specifically, given the following graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','alice')
g.addV('person').property('name','bob').property('spouse','carol')
g.addV('person').property('name','carol')
g.addV('person').property('name','dave').property('spouse', 'alice')
I was hoping I could create a spouse_of relation using the following
> g.V().has('spouse').as('x')
.V().has('name', select('x').by('spouse'))
.addE('spouse_of').from('x')
but instead of creating one edge from bob to carol and another edge from dave to alice, bob and dave each end up with spouse_of edges to all of the vertices (including themselves):
> g.V().out('spouse_of').path().by('name')
==>[bob,alice]
==>[bob,bob]
==>[bob,carol]
==>[bob,dave]
==>[dave,carol]
==>[dave,dave]
==>[dave,alice]
==>[dave,bob]
It almost seems as if the has filter isn't being applied, or, to use RDBMS terms, as if I'm ending up with an "outer join" instead of the "inner join" I'd intended.
Any suggestions? Am I overlooking something trivial or profound (local vs global scope, perhaps)? Is there any way of accomplishing this in a single traversal query, or do I have to iterate through g.has('spouse') and create edges individually?
You can make this happen in a single traversal, but has() is not meant to work quite that way. The pattern for this type of traversal is described in the Traversal Induced Values section of the Gremlin Recipes tutorial, but you can see it in action here:
gremlin> g.V().hasLabel('person').has('spouse').as('s').
......1> V().hasLabel('person').as('x').
......2> where('x', eq('s')).
......3> by('name').
......4> by('spouse').
......5> addE('spouse_of').from('s').to('x')
==>e[10][2-spouse_of->5]
==>e[11][7-spouse_of->0]
gremlin> g.E().project('x','y').by(outV().values('name')).by(inV().values('name'))
==>[x:bob,y:carol]
==>[x:dave,y:alice]
While this can be done in a single traversal note that depending on the size of your data this could be an expensive traversal as I'm not sure that either call to V() will be optimized by any graph. While it's neat to use this form, you may find that it's faster to take approaches that ensure that a use of an index is in place which might mean issuing multiple queries to solve the problem.
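
For comparison, here is what the index-backed, multiple-lookup alternative might look like on the client side, sketched in plain Python. The people list mirrors the sample graph from the question; the by_name dictionary stands in for an indexed name lookup and is an assumption of this sketch.

```python
people = [
    {"name": "alice"},
    {"name": "bob", "spouse": "carol"},
    {"name": "carol"},
    {"name": "dave", "spouse": "alice"},
]

# Build the lookup once; this stands in for an indexed
# has('person', 'name', x) query against the graph.
by_name = {p["name"]: p for p in people}

# One (from, to) pair per person whose spouse value matches a name.
spouse_of = [
    (p["name"], by_name[p["spouse"]]["name"])
    for p in people
    if "spouse" in p and p["spouse"] in by_name
]
```

Each pair would then become one addE('spouse_of') call, with both endpoints found through the index rather than by scanning every vertex.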

gremlin order by with coalesce duplicates some values

In some cases, I get an inexplicable result when I use order().by(...) with coalesce(...).
Using the standard Modern graph,
gremlin> g.V()
.hasLabel("person")
.out("created")
.coalesce(values("name"), constant("x"))
.fold()
==>[lop,lop,ripple,lop]
But if I sort by name before the coalesce I get 9 lop instead of 3:
gremlin> g.V()
.hasLabel("person")
.out("created")
.order().by("name")
.coalesce(values("name"), constant("x"))
.fold()
==>[lop,lop,lop,lop,lop,lop,lop,lop,lop,ripple]
Why does the number of elements differ between the two queries?
That looks like a bug - I've created an issue in JIRA. There is a workaround, but first consider that your traversal isn't really going to work even with the bug set aside: order() will fail because you're referencing a key in the by() modulator that possibly doesn't exist. So you need to account for that differently:
g.V().
hasLabel("person").
out("created").
order().by(coalesce(values('name'),constant('x')))
I then used choose() to do what coalesce() is supposed to do:
g.V().
hasLabel("person").
out("created").
order().by(coalesce(values('name'),constant('x'))).
choose(has("name"),values('name'),constant('x')).
fold()
and that seems to work fine.
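
The intended result can be sketched in plain Python as a sort whose key falls back to a default. The created list is made-up data (including one element without a name), and name_or_default is a hypothetical helper standing in for coalesce(values('name'), constant('x')).

```python
# Hypothetical data: one element has no 'name' property at all.
created = [{"name": "lop"}, {"name": "ripple"}, {}, {"name": "lop"}]

def name_or_default(vertex, default="x"):
    # Plays the role of coalesce(values('name'), constant('x')).
    return vertex.get("name", default)

# Sort by the defaulted key, then map each element to its name or default.
ordered = sorted(created, key=name_or_default)
names = [name_or_default(v) for v in ordered]
```

Sorting and mapping should never change the number of elements, which is why the duplicated lop values in the question point to a bug rather than expected behaviour.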

Add edge if not exist using gremlin

I'm using cosmos graph db in azure.
Does anyone know if there is a way to add an edge between two vertices only if it doesn't exist (using a Gremlin graph query)?
I can do that when adding a vertex, but not with edges. I took the code to do so from here:
g.inject(0).coalesce(__.V().has('id', 'idOne'), addV('User').property('id', 'idOne'))
Thanks!
It is possible to do with edges. The pattern is conceptually the same as vertices and centers around coalesce(). Using the "modern" TinkerPop toy graph to demonstrate:
gremlin> g.V().has('person','name','vadas').as('v').
V().has('software','name','ripple').
coalesce(__.inE('created').where(outV().as('v')),
addE('created').from('v').property('weight',0.5))
==>e[13][2-created->5]
Here we add an edge between "vadas" and "ripple" but only if it doesn't exist already. The key here is the check in the first argument to coalesce().
The performance of the accepted answer isn't great since it uses inE(...), which is an expensive operation.
This query is what I use for my work in CosmosDB:
g.E(edgeId).
fold().
coalesce(
unfold(),
g.V(sourceId).
has('pk', sourcePk).
as('source').
V(destinationId).
has('pk', destinationPk).
addE(edgeLabel).
from('source').
property(T.id, edgeId)
)
This uses the id and partition keys of each vertex for cheap lookups.
I have been working on similar issues, trying to avoid duplication of vertices or edges. The first is a rough example of how I check to make sure I am not duplicating a vertex:
"g.V().has('word', 'name', '%s').fold()"
".coalesce(unfold(),"
"addV('word')"
".property('name', '%s')"
".property('pos', '%s')"
".property('pk', 'pk'))"
% (re.escape(category_),re.escape(category_), re.escape(pos_))
The second one is how I make sure that there isn't already an edge between the two vertices in either direction. I make use of a coalesce statement over bothE():
"x = g.V().has('word', 'name', '%s').next()\n"
"y = g.V().has('word', 'name', '%s').next()\n"
"g.V(y).bothE('distance').has('weight', %f).fold()"
".coalesce("
"unfold(),"
"g.addE('distance').from(x).to(y).property('weight', %f)"
")"
% (word_1, word_2, weight, weight)
So, if the edge exists y -> x, it skips producing another one. If y -> x doesn't exist, then it tests to see if x -> y exists. If not, then it goes to the final option of creating x -> y
Let me know if anyone here knows of a more concise solution. I am still very new to gremlin, and would love a cleaner answer. Though, this one appears to suffice.
When I implemented the previous solutions and ran my code twice, it produced an edge on each run, because they only test one direction before creating a new edge.
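
The either-direction check can be sketched in plain Python like this. The edges dictionary and add_distance_once are hypothetical names for this example, and unlike the Gremlin above the sketch ignores the weight when deciding whether an edge already exists.

```python
edges = {}  # (out_word, in_word) -> weight

def add_distance_once(x, y, weight):
    # Skip creation if an edge already connects the pair in either direction.
    if (x, y) in edges or (y, x) in edges:
        return False
    edges[(x, y)] = weight
    return True

results = [
    add_distance_once("cat", "dog", 0.5),  # creates cat -> dog
    add_distance_once("cat", "dog", 0.5),  # same direction, skipped
    add_distance_once("dog", "cat", 0.5),  # reverse direction, skipped
]
```

Only the first call creates an edge, so re-running the load does not duplicate anything.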

Resources