How to select two vertex IDs in one gremlin-python query? - gremlin

`
g.V().hasLabel('node1') \
.as('prop1', 'prop2', 'prop3', 'prop4', 'prop5', 'prop6', 'prop7', 'node1_id_prop8', 'prop9') \
.inE().hasLabel('edge').outV().hasLabel('node2').id().as('node2_id_prop10') \
.select('prop1', 'prop2', 'prop3', 'prop4', 'prop5', 'prop6', 'prop7',
'node1_id_prop8', 'prop9', 'node2_id_prop10') \
.by('prop1').by('prop2').by('prop3').by('prop4').by('prop5') \
.by(values('prop6').fold()).by('prop7').by(T.id).by('prop9').by(T.id).toList()
Here I want the result to include both node IDs, but I am getting this error message:
'{'requestId': '7abb9f32-a228-44fb-9206-8b034eb67aa5', 'status': {'message': '{"code":"InternalFailureException","requestId":"7abb9f32-a228-44fb-9206-8b034eb67aa5","detailedMessage":"TokenTraversal support of java.lang.String does not allow selection by id"}', 'code': 500, 'attributes': {}}, 'result': {'data': None, 'meta': {}}}'
`
I am expecting to get a list of key-value pairs for the vertices that this traversal passes through.

With Gremlin, you typically want to follow a pattern where you're touching/fetching the higher level components in the graph first, and then applying filters and serialization techniques later in the query.
In your example, you want to traverse from node1 to node2 first and gather up those higher level components (the vertices) before starting to worry about the properties. You can do that using as() or aggregate()...
A side note here: You typically want to use labels as lower cardinality concepts. Think of them as groupings, or if coming from a relational database background, you can think of them as table names. You typically would not have a vertex/node with a label of "node1". It is more common that you would have a vertex/node with an ID of "node1".
g.V().hasLabel('node1').
aggregate('v').
inE().hasLabel('edge').outV(). # this can be simplified to in_('edge') in gremlin-python (in('edge') in Groovy)
hasLabel('node2').
aggregate('v').
The values in the aggregate then form a list that you can select and perform other operations upon. In this case, you want all of the available properties for each of those two vertices. The valueMap() step creates a map of all properties associated with a vertex. Passing true to valueMap() (True in gremlin-python) adds the ID and label to that map. The unfold() step unrolls the aggregated list and passes each vertex in it to valueMap():
select('v').
unfold().
valueMap(true)
That would get you a list of maps for each vertex in the following fashion:
[{
<T.id: 1>: <id>,
<T.label: 4>: 'node2',
'Prop1': ['prop1']
},{
<T.id: 1>: <id>,
<T.label: 4>: 'node1',
'Prop1': ['prop1'],
'Prop2': ['prop2'],
'Prop3': ['prop3'],
'Prop4': ['prop4'],
'Prop5': ['prop5'],
...
'Propn': ['propn']
}]
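If you then want both vertex IDs side by side in Python, the maps can be post-processed on the client. The sketch below is plain Python and does not talk to a server: the small `T` enum is only a stand-in for gremlin-python's `T` tokens, and `flatten`, `rows`, and the sample IDs are hypothetical names and values made up for illustration.

```python
from enum import Enum


class T(Enum):
    """Stand-in for gremlin-python's T tokens (an assumption for this sketch)."""
    id = 1
    label = 4


def flatten(value_map):
    """Turn one valueMap(True)-style map into a plain dict with string keys."""
    flat = {}
    for key, value in value_map.items():
        name = key.name if isinstance(key, T) else key
        # Property values arrive wrapped in lists; unwrap the single-valued ones.
        if isinstance(value, list) and len(value) == 1:
            value = value[0]
        flat[name] = value
    return flat


# Hypothetical rows shaped like the result shown above.
rows = [
    {T.id: "n2-id", T.label: "node2", "Prop1": ["prop1"]},
    {T.id: "n1-id", T.label: "node1", "Prop1": ["prop1"], "Prop6": ["a", "b"]},
]

# Index the flattened maps by vertex label so both IDs are easy to reach.
by_label = {row["label"]: row for row in map(flatten, rows)}
print(by_label["node1"]["id"], by_label["node2"]["id"])
```

Multi-valued properties such as the folded prop6 stay as lists, so only single-valued properties are unwrapped.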

Related

Sort vertices by presence of 2 properties

UPDATE 1
I've added the descLength and imageLength properties to allow for easier sorting. The idea is that constant(0) can be used to fill in the values for users who lack either property, and any length greater than 0 can be used to identify a user who actually has the property. The furthest this gets me is being able to order().by() only one property at a time, using a query such as:
g.V().
order().
by(coalesce(values('descLength'), constant(0)))
But this isn't the full solution to match what I need.
Original Post
In Amazon Neptune I want to sort vertices based on the presence of two properties, desc and image. The order of ranking should be:
vertices that have both properties
vertices that have desc but not image
vertices that have image but not desc
vertices that have neither property
Consider this graph of users and their properties:
g.addV('user').property('type','person').as('u1').
addV('user').property('type','person').property('desc', 'second person').property('descLength', 13).as('u2').
addV('user').property('type','person').property('desc', 'third person').property('descLength', 12).property('image', 'https://www.example.com/image-3.jpeg').property('imageLength', 36).as('u3').
addV('user').property('type','person').property('image', 'https://www.example.com/image-4.jpeg').property('imageLength', 36).as('u4')
Using the ranking order I outlined, the results should be:
u3 because it has both desc and image
u2 because it has desc but not image
u4 because it has image but not desc
u1 because it has neither desc nor image
The order().by() samples I've seen work with data like numbers and dates that can be ranked by increasing/decreasing values, but of course strings like URLs and text can't. What's the correct way to achieve this?
This first query is not exactly what you are looking for, as it gives 'image' and 'desc' the same weight, but with this foundation it should be possible to build out variations of the query that better meet your needs.
Given:
g.V().hasLabel('user').
project('id','data').
by(id).
by(values('desc','image').fold()).
order().
by(select('data').count(local),desc)
we get
{'id': '92c04ae3-5a7f-ea4c-e74f-e7f79b44ad3a', 'data': ['third person', 'https://www.example.com/image-3.jpeg']}
{'id': 'e8c04ae3-5a7f-2cfb-cc28-cd663bd58ef9', 'data': ['second person']}
{'id': 'c8c04ae3-5a80-5707-8ba6-56554de98f33', 'data': ['https://www.example.com/image-4.jpeg']}
{'id': 'a6c04ae3-5a7e-fd0f-1197-17f3ce44595f', 'data': []}
Building on this, we can go one step further and calculate a score based on how many of the properties exist in each case. The query below gives desc a higher score than image so in the cases where they do not both exist, desc will sort higher.
g.V().hasLabel('user').
project('id','data','score').
by(id).
by(values('desc','image').fold()).
by(union(
has('desc').constant(2),
has('image').constant(1),
constant(0)).
sum()).
order().
by(select('score'),desc)
which yields
{'id': '92c04ae3-5a7f-ea4c-e74f-e7f79b44ad3a', 'data': ['third person', 'https://www.example.com/image-3.jpeg'], 'score': 3}
{'id': 'e8c04ae3-5a7f-2cfb-cc28-cd663bd58ef9', 'data': ['second person'], 'score': 2}
{'id': 'c8c04ae3-5a80-5707-8ba6-56554de98f33', 'data': ['https://www.example.com/image-4.jpeg'], 'score': 1}
{'id': 'a6c04ae3-5a7e-fd0f-1197-17f3ce44595f', 'data': [], 'score': 0}
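The weighting idea can also be checked outside Gremlin. The sketch below is plain Python over hypothetical in-memory copies of the four sample users; `users`, `WEIGHTS`, and `score` are illustrative names, not part of Neptune or TinkerPop, and the code only mirrors the logic of the union()/sum() scoring.

```python
# Hypothetical in-memory copies of the four sample users.
users = [
    {"name": "u1"},
    {"name": "u2", "desc": "second person"},
    {"name": "u3", "desc": "third person",
     "image": "https://www.example.com/image-3.jpeg"},
    {"name": "u4", "image": "https://www.example.com/image-4.jpeg"},
]

# desc outweighs image, so "desc only" ranks above "image only".
WEIGHTS = {"desc": 2, "image": 1}


def score(user):
    # Mirrors union(has('desc').constant(2), has('image').constant(1),
    # constant(0)).sum(): add the weight of every property that is present.
    return sum(weight for prop, weight in WEIGHTS.items() if prop in user)


ranked = sorted(users, key=score, reverse=True)
print([(user["name"], score(user)) for user in ranked])
```

Because the weights are distinct powers of two, every combination of present properties gets its own score, which is what makes the four-way ranking unambiguous.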
UPDATED 2022-05-06 To show how to get just the ID
Taking the query above, getting just the ID from the results is as simple as adding a select('id') at the end of the query.
g.V().hasLabel('user').
project('id','data','score').
by(id).
by(values('desc','image').fold()).
by(union(
has('desc').constant(2),
has('image').constant(1),
constant(0)).
sum()).
order().
by(select('score'),desc).
select('id')
However, we can also remove some of the other work the query is doing to fetch the results. I mainly included those for demonstration purposes. So we can reduce the query to:
g.V().hasLabel('user').
project('id','score').
by(id).
by(union(
has('desc').constant(2),
has('image').constant(1),
constant(0)).
sum()).
order().
by(select('score'),desc).
select('id')

Iterate list of values from traversal A in traversal B (Gremlin)

This is my test data:
graph = TinkerGraph.open()
g= graph.traversal()
g.addV('Account').property('id',"0x0").as('a1').
addV('Account').property('id',"0x1").as('a2').
addV('Account').property('id',"0x2").as('a3').
addV('Token').property('address','1').as('tk1').
addV('Token').property('address','2').as('tk2').
addV('Token').property('address','3').as('tk3').
addV('Trx').property('address','1').as('Trx1').
addV('Trx').property('address','1').as('Trx2').
addV('Trx').property('address','3').as('Trx3').
addE('sent').from('a1').to('Trx1').
addE('sent').from('a2').to('Trx2').
addE('received_by').from('Trx1').to('a2').
addE('received_by').from('Trx2').to('a3').
addE('distributes').from('a1').to('tk1').
addE('distributes').from('a1').to('tk2').
addE('distributes').from('a1').to('tk3').
iterate()
I need to first get all the Token addresses using the distributes relationship and then with those values loop through a traversal. This is an example of what I need for one single token
h = g.V().has('Account','id','0x0').next()
token = '1'
g.V(h).
out('sent').has('address',token).as('t1').
out('received_by').as('a2').
out('sent').has('address',token).as('t2').
out('received_by').as('a3').
select('a3','a2'). \
by('id').toList()
This is the output:
[a3:0x2,a2:0x1]
Instead of doing that has('address',token) on each hop I could omit it and just make sure the token address is the same by placing a where('t1',eq('t2')).by('address') at the end of the traversal, but this performs badly given my database design and indexes.
So what I do to iterate is:
tokens = g.V(h).out('distributes').values('address').toList()
finalList = []
for (token in tokens){
finalList.add(g.V(h).
out('sent').has('address',token).
out('received_by').as('a2').
out('sent').has('address',token).
out('received_by').as('a3').
select('a3','a2'). \
by('id').toList())
}
And this is what's stored in finalList at the end:
==>[[a3:0x2,a2:0x1]]
==>[]
==>[]
This works but I was wondering how can I iterate that token list this way without leaving Gremlin and without introducing that for loop. Also, my results contain empty results which is not optimal. The key here for me is to always be able to do that has('address',token) for each hop with the tokens that the Account node has ever sent. Thank you very much.
There is still uncertainty about what you are trying to achieve.
Nevertheless, I think this query does what you need:
g.V().has('Account', 'id', '0x0').as('a').
out('distributes').values('address').as('t').
select('a').
repeat(out('sent').where(values('address').
as('t')).
out('received_by')).
emit()
Example: https://gremlify.com/spwya4itlvd
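To make the behavior of the repeat()/where()/emit() pattern easier to follow, here is a plain-Python model of it over the test data from the question. It is only a sketch of the logic, not Gremlin: the dictionaries and the helper names `out` and `receivers` are made up for illustration, and it assumes the sent/received_by chain has no cycles.

```python
# Edges from the test data, keyed by (source vertex, edge label).
edges = {
    ("a1", "sent"): ["Trx1"],
    ("a2", "sent"): ["Trx2"],
    ("Trx1", "received_by"): ["a2"],
    ("Trx2", "received_by"): ["a3"],
    ("a1", "distributes"): ["tk1", "tk2", "tk3"],
}
address = {"tk1": "1", "tk2": "2", "tk3": "3",
           "Trx1": "1", "Trx2": "1", "Trx3": "3"}
account_id = {"a1": "0x0", "a2": "0x1", "a3": "0x2"}


def out(vertex, label):
    """Return the vertices reached over outgoing edges with this label."""
    return edges.get((vertex, label), [])


def receivers(start):
    """Yield (token, account id) for every account reached via same-token hops."""
    for token_vertex in out(start, "distributes"):
        token = address[token_vertex]
        frontier = [start]
        while frontier:  # plays the role of repeat(...)
            next_frontier = []
            for account in frontier:
                for trx in out(account, "sent"):
                    if address[trx] != token:  # where(values('address').as('t'))
                        continue
                    for receiver in out(trx, "received_by"):
                        yield token, account_id[receiver]  # emit()
                        next_frontier.append(receiver)
            frontier = next_frontier


print(list(receivers("a1")))
```

Tokens that never match a transaction simply contribute nothing, so there are no empty result lists to filter out afterwards.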

How to traverse all vertices and get nested objects

I want to get nested objects in the form of
{ country :
{code:'IN',states:
{code:'TG',cities:
{code:'HYD',malls:
{[shopping-mall1],[shopping-mall2],.....}
},
{code:'PKL',malls:
{[shopping-mall1],[shopping-mall2],.....}
}
},
{code:'AP',cities:
{code:'VJY',malls:
{[shopping-mall1],[shopping-mall2],.....}
}
}
}
}
My graph has this format:
vertex: country ---> states ---> cities ---> malls
edges: ('type','state') ('type','city')
e.g. inE('partOf').outV().has('type','state') moves to the next vertex, "states".
Then the same pattern, inE('partOf').outV().has('type','city'), moves to the "city" vertex, and then on to the "malls" vertex.
I tried to write the query, but some vertices have no cities, and in that situation I get an error:
The provided traverser does not map to a value: v[8320]->[JanusGraphVertexStep(IN,[partOf],vertex), HasStep([type.eq(city)]), JanusGraphPropertiesStep([code],value)]
That's why I am using coalesce: some states have no inE('partOf').outV().has('type','city') path, which means no city.
.by(coalesce(select('states').inE('partOf').outV().has('type','city'))
My query
g.V().hasLabel('Country').has('code','IN')
.project('country')
.by(project('code','states')
.by(values('code'))
.by(inE('partOf').outV().has('type','state').has('code').as('states').
project('code','cities')
.by(select('states').values('code'))
.by(coalesce(select('states').inE('partOf').outV().
has('type','city').has('code').as('cities').
project('code','malls')
.by(select('cities').values('code'))
.by(coalesce(select('cities').inE('partOf').outV().
has('type','malls').valueMap(),constant(0))),
constant(0)))))
But the result is
{country={code=IN, states={code=DD, cities=0}}}
Here I am getting only one state, 'DD', and that state has no city, so it gives 'cities=0'.
The result contains only one state, but I want all states, with the cities in each state and the malls in each city.
Please suggest how to update or change the query.
To collect all the results, use the fold() step, which gathers every traverser into a single list. Without fold(), a by() modulator takes only the first result, as in your example.
To keep the types consistent, I changed the constant from 0 to [].
It was also not clear whether the "type" property is on the edge or the vertex. I find it more appropriate to have it on the edge, so I moved the has('type',...) between the inE() and the outV().
Lastly, you don't need to "store" the traversal using as() and then select() it.
This query should give you the required result:
g.V().hasLabel('Country').has('code','IN')
.project('country')
.by(project('code','states')
.by(values('code'))
.by(inE('partOf').has('type','state').outV().has('code')
.project('code','cities')
.by(values('code'))
.by(coalesce(inE('partOf').has('type','city').outV().has('code')
.project('code','malls')
.by(values('code'))
.by(coalesce(
inE('partOf').has('type','malls').outV().valueMap(),
constant([])).fold()),
constant([])).fold())
.fold()))
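The shape this nesting aims for can be modeled in plain Python. The sketch below is not Gremlin and makes several assumptions: `children` is a hypothetical stand-in for the incoming 'partOf' edges, malls are reduced to bare codes instead of full valueMap() results, and `nest` and `LEVELS` are illustrative names. Its point is that every level is a list and a parent without children gets an empty list.

```python
# Hypothetical child lists standing in for the incoming 'partOf' edges.
children = {
    "IN": ["TG", "DD"],          # country -> states
    "TG": ["HYD"],               # state -> cities ('DD' has none)
    "HYD": ["mall-1", "mall-2"], # city -> malls
}

# Key under which each level stores its children, from the country downwards.
LEVELS = ["states", "cities", "malls"]


def nest(code, depth=0):
    """Build the nested map for one vertex and everything below it."""
    if depth == len(LEVELS):
        return {"code": code}  # malls are the leaves in this sketch
    # The list comprehension plays the role of fold(): zero children
    # still produce a list, just an empty one.
    kids = [nest(child, depth + 1) for child in children.get(code, [])]
    return {"code": code, LEVELS[depth]: kids}


result = {"country": nest("IN")}
print(result)
```

Keeping every level a list, even when empty, means client code can iterate over states, cities, and malls without special-casing missing branches.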

How do I collect values from a vertex used in a traversal?

I want the details of a vertex along with details of vertices that are joined to it.
I have a group vertex with incoming 'member' edges from user vertices. I want the details of all of these vertices.
g.V(1).as('a').in('member').valueMap().as('b').select('a','b').unfold().dedup()
==>a=v[1]
==>b={image=[images/profile/friend9.jpg], name=[Thomas Thompson], email=[me@thomasthompson.co.uk]}
==>b={image=[images/profile/friend13.jpg], name=[Laura Tostevin], email=[me@lauratostevin.co.uk]}
==>b={image=[images/profile/friend5.jpg], name=[Alan Thompson], email=[me@alanthompson.co.uk]}
==>b={image=[images/profile/friend10.jpg], name=[Laura Bourne], email=[me@laurabourne.co.uk]}
Ideally what I'd want is:
{label: 'group', id=1, name='A Group', users=[{id=2, label="user",name=".."}, ... }]}
When I tried a project, it didn't like me using 'in'
gremlin> g.V('1').project('name','users').by('name').by(in('member').select())
groovysh_parse: 1: unexpected token: in @ line 1, column 83.
'name','users').by('name').by(in('member
To get your preferred output format, you have to join the group's valueMap() with the list of users. On TinkerPop's modern toy graph you would do something like this:
gremlin> g.V(3).union(valueMap(true).
by(unfold()),
project('users').
by(__.in('created').
valueMap(true).
by(unfold()).
fold())).
unfold().
group().
by(keys).
by(select(values))
==>[name:lop,id:3,lang:java,label:software,users:[[id:1,label:person,name:marko,...],...]]
Mapping this to your graph should be pretty straightforward; it is mostly a matter of changing labels.
Because in is a reserved keyword in Groovy, you must use the verbose syntax __.in().
Try:
g.V('1').project('name','users').by('name').by(__.in('member').valueMap(true).fold())

Gremlin traversal: output all edge details and also the in/out vertex IDs

I'm having trouble constructing the Gremlin query to give me all of the edge details (label, properties) and also the IDs of the adjoining inV and outV vertices (I don't need any more info from the linked vertices, just the IDs).
All I have is the Edge ID as a starting point.
So my Edge is as follows:
Label: "CONTAINS"
id: c6b4f3cb-f96e-cc97-dedb-e405771cb4f2
keys:
key="ekey1", value="e1"
key="ekey2", value="e2"
inV has id 50b4f3cb-f907-c31c-6284-1a3463fd72b9
outV has id 7cb4f3cb-d9a2-1398-61d7-9339be34833b
What I want is a single query that will return me something like -
"CONTAINS", "c6b4f3cb-f96e-cc97-dedb-e405771cb4f2", {ekey1=e1, ekey2=e2, ...}, "50b4f3cb-f907-c31c-6284-1a3463fd72b9", "7cb4f3cb-d9a2-1398-61d7-9339be34833b"
I can get the info in separate queries i.e.
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").bothV()
==>v[50b4f3cb-f907-c31c-6284-1a3463fd72b9]
==>v[7cb4f3cb-d9a2-1398-61d7-9339be34833b]
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").valueMap()
==>{ekey1=e1, ekey2=e2}
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").label()
==>CONTAINS
But I can't for the life of me work out how to combine these into one.
You could use project() to get what you're looking for:
g.E("c6b4f3cb-f96e-cc97-dedb-e405771cb4f2").
project('ekey1', 'inV', 'outV', 'label').
by('ekey1').
by(inV().id()).
by(outV().id()).
by(label)

Resources