I am relatively new to graphs.
I'm looking for help building a Gremlin query equivalent to the SQL below.
Select a.x1, a.x2, b.y1, b.y2 from table1 a, table2 b where a.x1 = b.y1 and a.x2 = b.y2.
Consider the tables as vertices and x1, x2, y1, y2 as their properties.
In JanusGraph there are no edges between these vertices, and the property labels are also different. Before getting the result, I need to check that the vertices have no edges.
If there are no edges, then this isn't a terribly "graphy" query, so the traversal might look a little clumsy. I think you would have to use some form of mid-traversal V(). Here's a demonstration with a little data:
gremlin> g.addV('a').property('x1',1).property('x2',2).
......1> addV('b').property('y1',1).property('y2',2).
......2> addV('b').property('y1',2).property('y2',3).iterate()
gremlin> g.V().hasLabel('a').as('a').
......1> V().hasLabel('b').as('b').
......2> where('a', eq('b')).
......3> by('x1').
......4> by('y1').
......5> where('a', eq('b')).
......6> by('x2').
......7> by('y2').
......8> select('a','b').
......9> by(valueMap(true))
==>[a:[label:a,id:0,x1:[1],x2:[2]],b:[label:b,id:3,y1:[1],y2:[2]]]
I'm not sure there's a nicer way to do this. Depending on how large your dataset is, this could be a tremendously expensive traversal, and it would probably be a better candidate for an OLAP traversal using SparkGraphComputer.
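Since the question mentions needing to check first that the vertices have no edges, one option (a sketch only, not tested against JanusGraph specifically) is to filter each side with not(bothE()) before matching:
g.V().hasLabel('a').not(bothE()).as('a').
  V().hasLabel('b').not(bothE()).as('b').
  where('a', eq('b')).
  by('x1').
  by('y1').
  where('a', eq('b')).
  by('x2').
  by('y2').
  select('a','b').
  by(valueMap(true))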
I've followed the Gremlin pagination recipe and found it works fine in all scenarios, except when you ask for 1 item against an empty traversal; then it fails with an error:
The provided start does not map to a value: []->[NeptuneMemoryTrackerStep, RangeLocalStep(0,1)]
Asking for one item from a non-empty traversal works fine. Asking for 0 items, 2 items or 10,000 items from an empty traversal is also fine!
Behaviour is as-expected with no errors except when asking for exactly 1 item from an empty traversal.
How can I make pagination resilient and work against queries that find no data when consumers are asking for exactly one item?
g.inject([])
.as('items','count')
.select('items','count')
.by(range(local, 0, 1))
.by(count(local))
.project('count', 'items')
.by(select('count'))
.by(select('items'))
BTW this is running on AWS Neptune.
Running the query in the Gremlin Console helps illustrate why this happens; the first example below uses a range of 2 and the second a range of 1.
gremlin> g.inject([]).
......1> as('items','count').
......2> select('items','count').
......3> by(range(local, 0, 2)).
......4> by(count(local)).
......5> project('count', 'items').
......6> by(select('count')).
......7> by(select('items'))
==>[count:0,items:[]]
gremlin> g.inject([]).
......1> as('items','count').
......2> select('items','count').
......3> by(range(local, 0, 1)).
......4> by(count(local)).
......5> project('count', 'items').
......6> by(select('count')).
......7> by(select('items'))
The provided start does not map to a value: []->[RangeLocalStep(0,1)]
The reason this happens is that any call to range() with a length greater than 1 returns a list. However, when at most 1 item can be returned, a single value is returned instead; in this case that single value is essentially "nothing", so there is no value to feed into the final project() result. You will have to change the query to allow for this if an empty result combined with a range of length 1 is something you expect to encounter.
To further illustrate the difference, see the example below:
gremlin> g.inject([1,2,3]).range(local,0,2)
==>[1,2]
gremlin> g.inject([1,2,3]).range(local,0,1)
==>1
Here is one way to adjust your query so that all of these cases still yield a result.
gremlin> g.inject([]).
......1> as('items','count').
......2> select('items','count').
......3> by(coalesce(range(local, 0, 1),constant([]))).
......4> by(count(local)).
......5> project('count', 'items').
......6> by(select('count')).
......7> by(select('items'))
==>[count:0,items:[]]
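For completeness, the same coalesce() pattern against a non-empty list (hypothetical data) with a page size of 1 should also still return a result; per the explanation above, expect the single item to come back as a bare value (something like [count:3,items:1]) rather than a one-element list:
g.inject([1,2,3]).
  as('items','count').
  select('items','count').
  by(coalesce(range(local, 0, 1),constant([]))).
  by(count(local)).
  project('count', 'items').
  by(select('count')).
  by(select('items'))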
I am trying to create a user vertex and a city vertex (if they do not already exist in the graph), and then add an edge between the two of them. When I execute the command in the same traversal, I run into this InternalFailureException from Neptune.
g.V("user12345").
fold().
coalesce(unfold(),addV("user").property(id, "user-12345")).as('user').
V("city-ATL").
fold().
coalesce(unfold(), addV("city").property(id, "city-ATL")).as("city").
addE("lives_in").
from("user").
to("city")
{"code":"InternalFailureException","detailedMessage":"An unexpected error has occurred in Neptune.","requestId":"xxx"}
(Note in the case above, both user-12345 and city-ATL do not exist in the graph).
However, when I create the city before executing the command, it works just fine:
gremlin> g.V("city-ATL").
fold().
coalesce(unfold(),
addV("city").property(id, "city-ATL"))
==>v[city-ATL]
gremlin> g.V("user-12345").
fold().
coalesce(unfold(),addV("user").property(id, "user-12345")).as('user').
V("city-ATL").
fold().
coalesce(unfold(), addV("city").property(id, "city-ATL")).as("city").
addE("lives_in").from("user").to("city")
==>e[1abd87d6-6f54-9e42-ae0a-47401c9dcfe6][user-12345-lives_in-city-ATL]
I am trying to build a traversal that can do them both together. Does anyone know why Neptune might be throwing this InternalFailureException when the city doesn't exist?
I will investigate further why you did not get a more useful error message, but I can see that the Gremlin query will need to change. After a fold() step, any prior as() labels are lost, because fold() reduces the traversers down to one (it is both a barrier and a map step). You should be able to use store() or aggregate(local) instead of as() in this case, where you have to use fold() for each coalesce().
gremlin> g.V('user-1234').
......1> fold().
......2> coalesce(unfold(),addV('person').property(id,'user-1234')).store('a').
......3> V('city-ATL').
......4> fold().
......5> coalesce(unfold(),addV('city').property(id,'city-ATL')).store('b').
......6> addE('lives_in').
......7> from(select('a').unfold()).
......8> to(select('b').unfold())
==>e[0][user-1234-lives_in->city-ATL]
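As a variation, here is a sketch of the same query using aggregate() with a local scope instead of store(), assuming a TinkerPop version recent enough to support aggregate(local) (it behaves like store(); whether it is available also depends on your Neptune engine version):
g.V('user-1234').
  fold().
  coalesce(unfold(),addV('person').property(id,'user-1234')).aggregate(local,'a').
  V('city-ATL').
  fold().
  coalesce(unfold(),addV('city').property(id,'city-ATL')).aggregate(local,'b').
  addE('lives_in').
  from(select('a').unfold()).
  to(select('b').unfold())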
I'm using Amazon Neptune, which does not support variables. For complex queries, however, I need to use a variable in multiple places. How can I do this without querying twice for the same data?
Here's the problem I'm trying to tackle:
Given a start Person, find Persons that the start Person is connected to by at most 3 steps via the knows relationship. Return each Person's name and email, as well as the distance (1-3).
How would I write this query in Gremlin without variables, since variables are unsupported in Neptune?
I don't see any reason why you would need variables for your traversal and there are many ways you could get an answer. Assuming this graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','A').property('age',20).as('a').
addV('person').property('name','B').property('age',21).as('b').
addV('person').property('name','C').property('age',22).as('c').
addV('person').property('name','D').property('age',19).as('d').
addV('person').property('name','E').property('age',22).as('e').
addV('person').property('name','F').property('age',24).as('f').
addE('next').from('a').to('b').
addE('next').from('b').to('c').
addE('next').from('b').to('d').
addE('next').from('c').to('e').
addE('next').from('d').to('e').
addE('next').from('e').to('f').iterate()
You could do something like:
gremlin> g.V().has('person','name','A').
......1> repeat(out().
......2> group('m').
......3> by(loops()).
......4> by(valueMap('name','age').by(unfold()).fold())).
......5> times(3).
......6> cap('m')
==>[0:[[name:B,age:21]],1:[[name:C,age:22],[name:D,age:19]],2:[[name:E,age:22],[name:E,age:22]]]
Find a particular "person" vertex by their name, in this case "A", then repeatedly traverse out() and group the vertices you come across by loops(), which is how deep you have traversed. I use valueMap() in this case to extract the properties you wanted. The times(3) is the limit on the depth of your search. Finally, you cap() out the side-effect Map held in "m" from the group(). That approach was meant to just give you a bit of basic structure for how you would accomplish this. You could perhaps polish it further this way:
gremlin> g.V().has('person','name','A').
......1> repeat(out().
......2> group('m').
......3> by(loops())).
......4> times(3).
......5> cap('m').unfold().select(values).unfold().
......6> dedup().
......7> valueMap('name','age').by(unfold())
==>[name:B,age:21]
==>[name:C,age:22]
==>[name:D,age:19]
==>[name:E,age:22]
The above example extracts the values from the Map in "m", removes the duplicates with dedup() and then converts them to the result you want. Maybe you don't need the Map in the first place - you could simply store() your results as follows:
gremlin> g.V().has('person','name','A').
......1> repeat(out().store('m')).
......2> times(3).
......3> cap('m').unfold().
......4> dedup().
......5> valueMap('name','age').by(unfold())
==>[name:B,age:21]
==>[name:C,age:22]
==>[name:D,age:19]
==>[name:E,age:22]
You might look at using something like simplePath() as well to help avoid re-traversing the same paths over and over again. You can read about that step in the Reference Documentation.
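For example, here is a sketch of the store() version with simplePath() added; on this small sample it returns the same result, but on graphs with cycles it avoids revisiting vertices along the same path:
g.V().has('person','name','A').
  repeat(out().simplePath().store('m')).
  times(3).
  cap('m').unfold().
  dedup().
  valueMap('name','age').by(unfold())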
g.addV('test').property('id','1').property('name','test 1')
g.addV('test').property('id','2').property('name','test 2')
g.V('1').addE('owns').to(g.addV('another').property('id','3'))
Is there any way I can clone this 'owns' edge and its target 'another' vertex from 'test 1', with all their properties, onto the 'test 2' vertex? This is just sample data; my real vertices have at least 10 properties.
NOTE: The query needs to work with the Cosmos DB Gremlin API.
The answer to this one is mostly presented in this other StackOverflow question, which explains how to clone a vertex and all its edges. Since this question is slightly different, I thought I'd adapt it a bit rather than suggest closing this as a duplicate.
gremlin> g.V().has('test','name','test 1').as('t1').
......1> outE('owns').as('e').inV().as('source').
......2> V().has('test','name','test 2').as('target').
......3> sideEffect(
......4> select('source').properties().as('p').
......5> select('target').
......6> property(select('p').key(), select('p').value())).
......7> sideEffect(
......8> select('t1').
......9> addE(select('e').label()).as('eclone').
.....10> to(select('target')).
.....11> select('e').properties().as('p').
.....12> select('eclone').
.....13> property(select('p').key(), select('p').value()))
==>v[3]
gremlin> g.E()
==>e[8][0-owns->6]
==>e[10][0-owns->3]
gremlin> g.V().valueMap(true)
==>[id:0,label:test,name:[test 1],id:[1]]
==>[id:3,label:test,name:[test 2],id:[3]]
==>[id:6,label:another,id:[3]]
Note that since labels are immutable, you are stuck with the vertex label being "another" given the way that you laid out your sample data. Also, I know it is just sample data, but note that overloading "id" isn't a good choice as it can lead to confusion with T.id.
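For instance, here is a sketch of how the sample data might look with a separate property key (here 'extId', a made-up name) instead of overloading 'id', assuming a TinkerPop version where addV() can be used as a child traversal of to():
g.addV('test').property('extId','1').property('name','test 1')
g.addV('test').property('extId','2').property('name','test 2')
g.V().has('test','extId','1').addE('owns').to(addV('another').property('extId','3'))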
Execute this query: g.V().has('name','test 1').id()
Then loop over the results in Java code and execute the add-edge query for each id:
g.V(<id of the vertex from the loop>).addE('owns').to(<the 'test 2' vertex>)
If there are multiple 'test 2' vertices, you could use a nested loop.
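If it helps, a rough sketch of that two-step idea in Gremlin Console / Groovy terms (the lookups are placeholders based on the sample data and are not verified against the Cosmos DB Gremlin API):
// step 1: collect the ids of the source vertices
sourceIds = g.V().has('name','test 1').id().toList()
// step 2: look up the target vertex and add an 'owns' edge from each source id
targetId = g.V().has('name','test 2').id().next()
sourceIds.each { sourceId ->
    g.V(sourceId).addE('owns').to(__.V(targetId)).iterate()
}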
I have the following Gremlin traversal, running on Azure CosmosDB, and I only want to return URLs with a count greater than 1. I'm not sure how to limit the return from the groupCount().
g.V().hasLabel('article').values('url').groupCount()
Here's an example from the modern toy graph:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().hasLabel('software').in().
......1> groupCount().
......2> by('name').
......3> unfold().
......4> filter(select(values).unfold().is(gt(1)))
==>josh=2
So you do the groupCount(), then unfold() the resulting Map and filter() on the individual values from the Map.
In your case you would probably have something like:
g.V().hasLabel('article').
groupCount().
by('url').
unfold().
filter(select(values).unfold().is(gt(1)))
Per my comment on the answer from Stephen Mallette: Azure Cosmos DB Graph (https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin-support) doesn't support the filter() step, so I used the where() step to achieve the desired results.
g.V().hasLabel('article').groupCount().by('url').unfold().where(select(values).is(gt(1)))
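If you only need the URLs themselves rather than their counts, a possible follow-on (assuming select(keys) is supported in your environment) is:
g.V().hasLabel('article').groupCount().by('url').unfold().where(select(values).is(gt(1))).select(keys)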