Upsert list of maps (update if exists, insert if not exists) in Gremlin - graph

I have a list of maps corresponding to records that I will receive as a stream, and I need to take each record and upsert it.
I have seen many examples that work fine with just one vertex at a time, such as:
g.V('Amy').outE().where(inV().hasId('John')).
fold().
coalesce(
unfold(),
addE('manages').from(V('Amy')).to(V('John'))).
property('duration', '1year')
That works just fine; my issue lies in replicating those steps for each map inside the list.
I am currently using gremlin on Amazon Neptune Notebooks, if that makes any difference.
At the moment I am able to insert a record if it does not exist, or just retrieve it if it exists: basically a "get or insert". How can I update each property of each record if it already exists?
My current query for the get or insert:
g.inject([['memshpnum':'13464406186','cmpcod':'LM','upddat':'2019-03-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'7531272','T.id': '7531272#ifl', 'T.label': 'ccp_node_customer', 'status': 'pend'],
['memshpnum':'00170674487','cmpcod':'LM','upddat':'2019-03-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3076059','T.id': '3076059#ifl', 'T.label': 'ccp_node_customer'],
['memshpnum':'20203784496','cmpcod':'LM','upddat':'2019-04-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3075659','T.id': '727745#ifl', 'T.label': 'ccp_node_customer'],
['memshpnum':'20203784498','cmpcod':'LM','upddat':'2019-04-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3076058','T.id': '727365#ifl', 'T.label': 'ccp_node_customer']
]).unfold().as("properties").
where(select("properties").unfold().filter(select(keys).is('T.label').or().is('T.cusnum')).select(values)).fold().
coalesce(unfold(),
addV(select('properties').unfold().filter(select(keys).is('T.label')).select(values)).as("vertex").
property(T.id, select('properties').unfold().filter(select(keys).is('T.id')).select(values)).
sideEffect(select("properties").
unfold().filter(select(keys).is(without('T.label','T.id'))).as("kv").
select("vertex").
property(select("kv").by(keys), select("kv").by(values))
)
)
And amongst the things I have tried is this query:
g.inject([
['memshpnum':'13464406186','cmpcod':'LM','upddat':'2019-03-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'7531272','T.id': '7531272#ifl', 'T.label': 'ccp_node_customer', 'status': 'pend'],
['memshpnum':'00170674487','cmpcod':'LM','upddat':'2019-03-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3076059','T.id': '3076059#ifl', 'T.label': 'ccp_node_customer'],
['memshpnum':'20203784496','cmpcod':'LM','upddat':'2019-04-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3075659','T.id': '727745#ifl', 'T.label': 'ccp_node_customer'],
['memshpnum':'20203784498','cmpcod':'LM','upddat':'2019-04-01 00:00:00','ccp_loaded_date':'2022-11-22T15:29:59.933Z','cusnum':'3076058','T.id': '727365#ifl', 'T.label': 'ccp_node_customer']
]).unfold().as("properties").
where(select("properties").unfold().filter(select(keys).is('T.label').or().is('T.cusnum')).select(values)).fold().
coalesce(unfold()
.property(select('properties').unfold().filter(select(keys).is(without('T.label','T.id'))).select(keys), select('properties').unfold().filter(select(keys).is(without('T.label','T.id'))).select(values)),
addV(select('properties').unfold().filter(select(keys).is('T.label')).select(values)).as("vertex").
property(T.id, select('properties').unfold().filter(select(keys).is('T.id')).select(values)).
sideEffect(select("properties").
unfold().filter(select(keys).is(without('T.label','T.id'))).as("kv").
select("vertex").
property(select("kv").by(keys), select("kv").by(values))
)
).toList()
However, I get a "Failed to interpret Gremlin query: The provided traverser does not map to a value" error.
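To pin down the behaviour being asked for, here is a minimal plain-Python sketch of "upsert" over a list of maps. It is only a model of the semantics, not a Neptune or Gremlin query: the `store` dict standing in for the graph, the `upsert_all` helper, and the pre-existing `'status': 'new'` value are all assumptions made for illustration. As in the queries above, 'T.id' and 'T.label' are treated as reserved keys and everything else as a property to write.

```python
RESERVED = ("T.id", "T.label")

def upsert_all(store, records):
    """Create a vertex entry for each unseen id, then write every property."""
    for record in records:
        props = {k: v for k, v in record.items() if k not in RESERVED}
        vertex = store.get(record["T.id"])
        if vertex is None:
            # "insert" branch: the vertex does not exist yet
            vertex = {"label": record["T.label"], "properties": {}}
            store[record["T.id"]] = vertex
        # runs for both branches, so existing vertices get updated too
        vertex["properties"].update(props)
    return store

# Hypothetical starting state: one of the two customers already exists.
store = {"7531272#ifl": {"label": "ccp_node_customer",
                         "properties": {"status": "new"}}}
records = [
    {"T.id": "7531272#ifl", "T.label": "ccp_node_customer", "status": "pend"},
    {"T.id": "3076059#ifl", "T.label": "ccp_node_customer", "cusnum": "3076059"},
]
upsert_all(store, records)
```

The point of the sketch is that the property write sits after the exists/create decision rather than inside only one branch of it, which is the difference between "get or insert" and a true upsert.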

Related

Iterate list of values from traversal A in traversal B (Gremlin)

This is my test data:
graph = TinkerGraph.open()
g= graph.traversal()
g.addV('Account').property('id',"0x0").as('a1').
addV('Account').property('id',"0x1").as('a2').
addV('Account').property('id',"0x2").as('a3').
addV('Token').property('address','1').as('tk1').
addV('Token').property('address','2').as('tk2').
addV('Token').property('address','3').as('tk3').
addV('Trx').property('address','1').as('Trx1').
addV('Trx').property('address','1').as('Trx2').
addV('Trx').property('address','3').as('Trx3').
addE('sent').from('a1').to('Trx1').
addE('sent').from('a2').to('Trx2').
addE('received_by').from('Trx1').to('a2').
addE('received_by').from('Trx2').to('a3').
addE('distributes').from('a1').to('tk1').
addE('distributes').from('a1').to('tk2').
addE('distributes').from('a1').to('tk3').
iterate()
I need to first get all the Token addresses using the distributes relationship and then with those values loop through a traversal. This is an example of what I need for one single token
h = g.V().has('Account','id','0x0').next()
token = '1'
g.V(h).
out('sent').has('address',token).as('t1').
out('received_by').as('a2').
out('sent').has('address',token).as('t2').
out('received_by').as('a3').
select('a3','a2'). \
by('id').toList()
This is the output:
[a3:0x2,a2:0x1]
Instead of doing that has('address',token) on each hop I could omit it and just make sure the token address is the same by placing a where('t1',eq('t2')).by('address') at the end of the traversal, but this performs badly given my database design and indexes.
So what I do to iterate is:
tokens = g.V(h).out('distributes').values('address').toList()
finalList = []
for (token in tokens){
finalList.add(g.V(h).
out('sent').has('address',token).
out('received_by').as('a2').
out('sent').has('address',token).
out('received_by').as('a3').
select('a3','a2'). \
by('id').toList())
}
And this is what's stored in finalList at the end:
==>[[a3:0x2,a2:0x1]]
==>[]
==>[]
This works, but I was wondering how I can iterate over that token list without leaving Gremlin and without introducing the for loop. Also, my results contain empty lists, which is not optimal. The key for me is to always be able to apply has('address',token) on each hop, using the tokens that the Account node has ever sent. Thank you very much.
There is still uncertainty about what you are trying to achieve.
Nevertheless, I think this query does what you need:
g.V().has('Account', 'id', '0x0').as('a').
out('distributes').values('address').as('t').
select('a').
repeat(out('sent').where(values('address').
as('t')).
out('received_by')).
emit()
Example: https://gremlify.com/spwya4itlvd
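As a cross-check on what the question expects, the same test data can be modelled in plain Python. This is only a sketch of the desired result (one flat list, no empty entries, the token filter applied on every hop); the dictionaries and the `hop` and `two_hop_paths` helpers are hypothetical names, not part of any Gremlin API.

```python
# The test data from the question, as plain lookups.
SENT = {"0x0": ["Trx1"], "0x1": ["Trx2"]}
RECEIVED_BY = {"Trx1": "0x1", "Trx2": "0x2"}
TRX_ADDRESS = {"Trx1": "1", "Trx2": "1", "Trx3": "3"}
DISTRIBUTES = {"0x0": ["1", "2", "3"]}  # token addresses per account

def hop(account, token):
    """Yield accounts reached via sent -> received_by for one token."""
    for trx in SENT.get(account, []):
        if TRX_ADDRESS[trx] == token and trx in RECEIVED_BY:
            yield RECEIVED_BY[trx]

def two_hop_paths(start):
    """Collect a2/a3 pairs for every token the start account distributes."""
    return [
        {"a3": a3, "a2": a2}
        for token in DISTRIBUTES.get(start, [])
        for a2 in hop(start, token)
        for a3 in hop(a2, token)
    ]

paths = two_hop_paths("0x0")
print(paths)
```

Tokens that produce no path simply contribute nothing to the list, which is the behaviour the for loop with toList() could not give.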

Union step does not work with multiple elements

The following query returns a user map with an "injected" property called "questions". It works as expected when g.V().has() returns a single user, but not when it returns multiple users:
return g.V().has("user", "userId", 1)
.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(__.select(column.keys))
.by(__.select(column.values));
It works, but when I change the first line to return multiple users:
g.V().hasLabel("user").union(....
I finish the query by calling .toList(), so I was expecting to get a list of all the users, in the same shape as with a single user, but instead I still get a single user.
How can I get my query to work for both, multiple users or a single user?
When using Gremlin, you have to think in terms of a stream. The stream contains traversers which travel through the steps you've written. In your case, with your initial test of:
g.V().has("user", "userId", 1)
.union(
__.valueMap().by(__.unfold()),
__.project('questions').by(
__.outE('response').valueMap().by(__.unfold()).fold()
)
)
.unfold()
.group()
.by(__.select(column.keys))
.by(__.select(column.values))
you have one traverser (i.e. V().has("user", "userId", 1) produces one user) that flows to the union() and is split so that it goes to both valueMap() and project() both producing Map instances. You now have two traversers which are unfolded to a stream and grouped together to one final Map traverser.
So with that in mind, what changes when you do hasLabel("user")? Well, you now have more than one starting traverser, which means you will produce two traversers for each of those users when you get to union(). They will each be flattened to a stream by unfold(), and then they will just overwrite one another (because they have the same keys) to produce one final Map.
You really want to execute your union() and follow on operations once per initial "user" vertex traverser. You can tell Gremlin to do that with map():
g.V().has("user", "userId", 1)
  .map(
    __.union(
      __.valueMap().by(__.unfold()),
      __.project('questions').by(
        __.outE('response').valueMap().by(__.unfold()).fold()
      )
    )
    .unfold()
    .group()
      .by(__.select(column.keys))
      .by(__.select(column.values))
  )
Finally, you can simplify your final by() modulators as:
g.V().has("user", "userId", 1)
  .map(
    __.union(
      __.valueMap().by(__.unfold()),
      __.project('questions').by(
        __.outE('response').valueMap().by(__.unfold()).fold()
      )
    )
    .unfold()
    .group()
      .by(keys)
      .by(values)
  )
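The difference between grouping the whole stream and grouping once per user can be illustrated with a loose plain-Python analogy. This is not how Gremlin is implemented: the `users` data and the `union` and `group` helpers are made up for illustration, and the sketch simply lets the last value win for a repeated key, as the explanation above describes.

```python
# Hypothetical users; "questions" plays the role of the injected property.
users = [
    {"userId": 1, "name": "ann", "questions": ["q1"]},
    {"userId": 2, "name": "bob", "questions": []},
]

def union(user):
    """Split one traverser into two maps, like the union() in the query."""
    props = {k: v for k, v in user.items() if k != "questions"}
    return [props, {"questions": user["questions"]}]

def group(maps):
    """Unfold the maps into entries and regroup them by key into one map."""
    merged = {}
    for m in maps:
        for key, value in m.items():
            merged[key] = value  # a repeated key ends up with a single value
    return merged

# Grouping the whole stream: every user's entries land in the same map.
collapsed = group(m for user in users for m in union(user))

# Grouping once per starting traverser, the role map() plays in the query.
per_user = [group(union(user)) for user in users]

print(collapsed)
print(per_user)
```

With a single user the two forms agree, which is why the original query looked correct until it was run against several users.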

How to traverse all vertex and get nested objects

I want to get nested objects in the form of
{ country :
{code:'IN',states:
{code:'TG',cities:
{code:'HYD',malls:
{[shopping-mall1],[shopping-mall2],.....}
},
{code:'PKL',malls:
{[shopping-mall1],[shopping-mall2],.....}
}
},
{code:'AP',cities:
{code:'VJY',malls:
{[shopping-mall1],[shopping-mall2],.....}
}
}
}
}
My graph has the following structure:
vertices: country ---> states ---> cities ---> malls
edges: ('type','state'), ('type','city')
e.g. inE('typeOf').outV().has('type','state') moves to the next vertex, "states".
Likewise, inE('typeOf').outV().has('type','city') moves to the "city" vertex, and then to the "malls" vertex.
I tried to write the query, but some vertices have no cities, and I get an error in that situation.
Error:
The provided traverser does not map to a value: v[8320]->[JanusGraphVertexStep(IN,[partOf],vertex), HasStep([type.eq(city)]), JanusGraphPropertiesStep([code],value)]
That is why I am using coalesce: some states have no 'inE('partOf').outV().has('type','city')' edge, meaning they have no city.
.by(coalesce(select('states').inE('partOf').outV().has('type','city'))
My query
g.V().hasLabel('Country').has('code','IN')
.project('country')
.by(project('code','states')
.by(values('code'))
.by(inE('partOf').outV().has('type','state').has('code').as('states').
project('code','cities')
.by(select('states').values('code'))
.by(coalesce(select('states').inE('partOf').outV().
has('type','city').has('code').as('cities').
project('code','malls')
.by(select('cities').values('code'))
.by(coalesce(select('cities').inE('partOf').outV().
has('type','malls').valueMap(),constant(0))),
constant(0)))))
But the result is
{country={code=IN, states={code=DD, cities=0}}}
Here I get a single state, 'DD', and that state has no city, so it gives 'cities=0'.
The result above contains only one state; I want all states, with the cities in each state and the malls in each city.
Please update or change the query.
To collect all the results, use the fold() step, which gathers every traverser into a single list. Without fold(), by() takes only the first result, as in your example.
To keep the types consistent, I changed the constant from 0 to [].
It was also not clear whether the "type" property is on the edge or the vertex. It seems more appropriate to have it on the edge, so I changed that as well by moving has('type',...) between inE() and outV().
Lastly, you don't need to label a step with as() and then select() it.
This query should give you the required result:
g.V().hasLabel('Country').has('code','IN')
  .project('country')
    .by(project('code','states')
      .by(values('code'))
      .by(inE('partOf').has('type','state').outV().has('code')
        .project('code','cities')
          .by(values('code'))
          .by(coalesce(inE('partOf').has('type','city').outV().has('code')
              .project('code','malls')
                .by(values('code'))
                .by(coalesce(
                    inE('partOf').has('type','malls').outV().valueMap(),
                    constant([])).fold()),
            constant([])).fold())
        .fold()))
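The shape this is aiming for (a list at every level, and an empty list where a state has no cities) can be sketched in plain Python. This only models the target structure, not the traversal: the `PART_OF` lookup, its sample codes, and the `children` and `country_tree` helpers are hypothetical.

```python
# Hypothetical child lookup: parent code -> list of (type, code) children.
PART_OF = {
    "IN": [("state", "TG"), ("state", "DD")],
    "TG": [("city", "HYD")],
    "HYD": [("malls", "mall-1"), ("malls", "mall-2")],
}

def children(parent, kind):
    """Return the codes of all children of `parent` with the given type."""
    return [code for t, code in PART_OF.get(parent, []) if t == kind]

def country_tree(code):
    """Build the nested country -> states -> cities -> malls structure."""
    return {"country": {"code": code, "states": [
        {"code": s, "cities": [
            {"code": c, "malls": children(c, "malls")}
            for c in children(s, "city")]}
        for s in children(code, "state")]}}

tree = country_tree("IN")
print(tree)
```

Each list comprehension plays the role of one fold(): it collects every child instead of stopping at the first, and it yields [] rather than an error when there are none.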

Gremlin traversal: output all edge details and also in/out vertex IDs

I'm having trouble constructing the Gremlin query to give me all of the edge details (label, properties) and also the IDs of the adjoining inV and outV vertices (I don't need any more info from the linked vertices, just the IDs).
All I have is the Edge ID as a starting point.
So my Edge is as follows:
Label: "CONTAINS"
id: c6b4f3cb-f96e-cc97-dedb-e405771cb4f2
keys:
key="ekey1", value="e1"
key="ekey2", value="e2"
inV has id 50b4f3cb-f907-c31c-6284-1a3463fd72b9
outV has id 7cb4f3cb-d9a2-1398-61d7-9339be34833b
What I want is a single query that will return me something like -
"CONTAINS", "c6b4f3cb-f96e-cc97-dedb-e405771cb4f2", {ekey1=e1, ekey2=e2, ...}, "50b4f3cb-f907-c31c-6284-1a3463fd72b9", "7cb4f3cb-d9a2-1398-61d7-9339be34833b"
I can get the info in separate queries i.e.
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").bothV()
==>v[50b4f3cb-f907-c31c-6284-1a3463fd72b9]
==>v[7cb4f3cb-d9a2-1398-61d7-9339be34833b]
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").valueMap()
==>{ekey1=e1, ekey2=e2}
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").label()
==>CONTAINS
But I can't for the life of me work out how to combine these into one.
You could use project() to get what you're looking for:
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").
project('ekey1', 'inV', 'outV', 'label').
by('ekey1').
by(inV().id()).
by(outV().id()).
by(label)

Gremlin Query to return multiple Result in a ResultSet

Maybe my understanding of Gremlin queries is wrong :). I am trying to execute a query from a Java client, and the query is: g.V().hasLabel('MYLABEL').
Multiple (say 20) vertices match the label, but the ResultSet has just one Result with the data of all twenty vertices included. I would like the ResultSet to have 20 Results. How do I need to rearrange the query? Please suggest.
A few more details:
From Console.
gremlin> client.submit("g.V().hasLabel('PERSON')")
==>result{object=v[11] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
==>result{object=v[13] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
==>result{object=v[15] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
From Java Client
Query -> g.V().hasLabel('PERSON')
The result -> result{object={@type=g:List, @value=[{@type=g:Vertex, @value={id={@type=g:Int64, @value=11}, label=PERSON, properties={AGE=[{@type=g:VertexProperty, @value={id={@type=g:Int64, @value=12}, value={@type=g:Int32, @value=11}, label=AGE}}]}}}, {@type=g:Vertex, @value={id={@type=g:Int64, @value=13}, label=PERSON, properties={AGE=[{@type=g:VertexProperty, @value={id={@type=g:Int64, @value=14}, value={@type=g:Int32, @value=12}, label=AGE}}]}}}, {@type=g:Vertex, @value={id={@type=g:Int64, @value=15}, label=PERSON, properties={AGE=[{@type=g:VertexProperty, @value={id={@type=g:Int64, @value=16}, value={@type=g:Int32, @value=13}, label=AGE}}]}}}]} class=java.util.LinkedHashMap}
Just use fold(); you can see my example here:
gremlin> cluster = Cluster.open()
==>localhost/127.0.0.1:8182
gremlin> client = cluster.connect()
==>org.apache.tinkerpop.gremlin.driver.Client$ClusteredClient@51efb731
gremlin> r = client.submit("g.V().hasLabel('person')").all().get()
==>result{object=v[1] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
==>result{object=v[2] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
==>result{object=v[4] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
==>result{object=v[6] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
gremlin> r = client.submit("g.V().hasLabel('person').fold()").all().get()
==>result{object=[v[1], v[2], v[4], v[6]] class=java.util.ArrayList}
Note that the downside to fold() in this example is that the result won't be streamed back to the client. You will build the entire list in memory on the server and then it will serialize that list as a single payload. If that list is sufficiently large and you generate enough of such lists you may hit memory/GC issues.
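The trade-off can be pictured with a rough plain-Python analogy, which is not driver code: the `stream` and `folded` generators and the sample ids are made up for illustration. One yields a result per vertex, the other builds the whole list first and yields it as a single result.

```python
def stream(vertices):
    """Yield one result per vertex, like the traversal without fold()."""
    for v in vertices:
        yield v

def folded(vertices):
    """Yield a single result holding every vertex, like fold()."""
    yield list(vertices)  # the full list is materialised before anything is sent

ids = [1, 2, 4, 6]
streamed_results = list(stream(ids))
folded_results = list(folded(ids))

print(len(streamed_results), len(folded_results))
```

The consumer of `stream` can start working after the first item, while the consumer of `folded` receives nothing until the entire list exists, which is where the memory pressure mentioned above comes from.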

Resources