How to sort vertices by edge values in Gremlin - gremlin

Given the air-routes graph, say I want to get all possible one-stopover routes, like so:
[home] --distance--> [stopover] --distance--> [destination]
where [home], [stopover] and [destination] are airport nodes that each have a property 'code' that represents an airport code, and distance is an integer weight on each edge connecting two airport nodes.
How could I write a query that gets me the airport codes for [home], [stopover] and [destination] such that the results are sorted as follows:
1. [home] airport codes are sorted alphabetically.
2. Within each [home] airport, the [stopover] airport codes are sorted by the distance between [home] and [stopover] (ascending).
3. After sorting by 1 and 2, [destination] airport codes are sorted by the distance between [stopover] and [destination].
(Note: it doesn't matter if [home] and [destination] are the same airport)
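The three sort rules amount to one composite key: home code, then home-to-stopover distance, then stopover-to-destination distance. A minimal plain-Python sketch of that ordering, assuming the one-stop routes have already been flattened into tuples; the `Route` type, the `sort_routes` helper and the AAA/BBB codes and distances are hypothetical, not air-routes data:

```python
from typing import NamedTuple


class Route(NamedTuple):
    # One hypothetical one-stop route: home -> stopover -> destination.
    home: str
    stopover: str
    destination: str
    dist1: int  # distance home -> stopover
    dist2: int  # distance stopover -> destination


def sort_routes(routes: list[Route]) -> list[Route]:
    # Home code alphabetically, then first-leg distance, then second-leg
    # distance, all ascending; later keys only break ties of earlier ones.
    return sorted(routes, key=lambda r: (r.home, r.dist1, r.dist2))


# Hypothetical sample routes for illustration only.
routes = [
    Route("BBB", "DDD", "AAA", 300, 50),
    Route("AAA", "CCC", "EEE", 200, 90),
    Route("BBB", "CCC", "EEE", 100, 70),
    Route("BBB", "CCC", "AAA", 100, 40),
]

for r in sort_routes(routes):
    print(r.home, r.stopover, r.destination)
```

Because rules 2 and 3 only break ties left by the earlier rules, a single sort on a tuple key is enough here; nested maps, by contrast, may not keep this order unless they are explicitly ordered.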

One way you could do this is with group() and nested by() modulators.
g.V().
  group().
    by('code').
    by(
      outE('route').
      order().by('dist').
      inV().
      group().
        by('code').
        by(
          outE('route').
          order().by('dist').
          inV().
          values('code').fold())).
  unfold()
The result is something like:
1. {'SHH': {'WAA': ['KTS', 'SHH', 'OME'], 'OME': ['TLA', 'WMO', 'KTS', 'GLV', 'ELI', 'TNC', 'WAA', 'WBB', 'SHH', 'SKK', 'KKA', 'UNK', 'SVA', 'OTZ', 'GAM', 'ANC']}}
2. {'KWN': {'BET': ['WNA', 'KWT', 'ATT', 'KUK', 'TLT', 'EEK', 'WTL', 'KKH', 'KWN', 'KLG', 'MLL', 'KWK', 'PQS', 'CYF', 'KPN', 'NME', 'OOK', 'GNU', 'VAK', 'SCM', 'HPB', 'EMK', 'ANC'], 'EEK': ['KWN', 'BET'], 'TOG': ['KWN']}}
3. {'NUI': {'SCC': ['NUI', 'BTI', 'BRW', 'FAI', 'ANC'], 'BRW': ['ATK', 'AIN', 'NUI', 'PIZ', 'SCC', 'FAI', 'ANC']}}
4. {'PSG': {'JNU': ['HNH', 'GST', 'HNS', 'SGY', 'SIT', 'KAE', 'PSG', 'YAK', 'KTN', 'ANC', 'SEA'], 'WRG': ['PSG', 'KTN']}}
5. {'PIP': {'UGB': ['PTH']}}
...

Related

Query to find node which has only one vertex in common

I have the following vertices:
Person1 -> Device1 <- Person2
   |          ^
   v          |
Email1 <- Person3
Now I want to write a Gremlin query (JanusGraph) that gives me all persons connected to Person1 only through the device.
So, according to the graph above, the output should be [Person2].
Person3 is not in the output because Person3 is also connected to "Email1" of "Person1".
g.addV('person').property('name', 'Person1').as('p1').
addV('person').property('name', 'Person2').as('p2').
addV('person').property('name', 'Person3').as('p3').
addV('device').as('d1').
addV('email').as('e1').
addE('HAS_DEVICE').from('p1').to('d1').
addE('HAS_EMAIL').from('p1').to('e1').
addE('HAS_DEVICE').from('p2').to('d1').
addE('HAS_DEVICE').from('p3').to('d1').
addE('HAS_EMAIL').from('p3').to('e1')
The following traversal gives you the person vertices that are connected to "Person1" via one or more "device" vertices and not via any other type of vertex.
g.V().has('person', 'name', 'Person1').as('p1').
  out().as('connector').
  in().where(neq('p1')).
  group().
    by().
    by(select('connector').label().fold()).
  unfold().
  where(
    select(values).
    unfold().dedup().fold(). // just in case the persons are connected by multiple devices
    is(eq(['device']))
  ).
  select(keys)
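The same filter, sketched in plain Python over a hypothetical in-memory copy of the sample data (the `edges` list, `labels` map and `device_only_neighbours` helper are illustrative names, not JanusGraph API): for every other person, collect the set of connector labels shared with Person1, then keep only those whose set is exactly {'device'}. The set plays the role of the dedup().fold() step.

```python
from collections import defaultdict

# Hypothetical copy of the sample graph as (from, edge_label, to) triples.
edges = [
    ("Person1", "HAS_DEVICE", "Device1"),
    ("Person1", "HAS_EMAIL", "Email1"),
    ("Person2", "HAS_DEVICE", "Device1"),
    ("Person3", "HAS_DEVICE", "Device1"),
    ("Person3", "HAS_EMAIL", "Email1"),
]
# Labels of the connector vertices, mirroring addV('device') / addV('email').
labels = {"Device1": "device", "Email1": "email"}


def device_only_neighbours(start: str) -> list[str]:
    # Map each other person to the labels of the connectors shared with start.
    shared: dict[str, set[str]] = defaultdict(set)
    for src, _, connector in edges:
        if src != start:
            continue
        for other, _, target in edges:
            if target == connector and other != start:
                shared[other].add(labels[connector])
    # Keep persons whose only shared connector type is 'device'.
    return sorted(p for p, kinds in shared.items() if kinds == {"device"})


print(device_only_neighbours("Person1"))  # ['Person2']
```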

How to get in / out spanning tree of a node with readable labels in response with gremlin?

This question is about how to get a more readable response from Gremlin that can be used to process the data further. I can get the spanning tree with the simplePath() step and some arbitrary depth (10 here) as follows:
g.V().has(name, foo).repeat(both().simplePath()).emit().times(10).dedup()
or break it down further by in/out direction:
g.V().has(name, foo).repeat(in().simplePath()).emit().times(10).dedup()
g.V().has(name, foo).repeat(out().simplePath()).emit().times(10).dedup()
but this returns a flat array of vertices with no clear grouping; for example, from the response alone I can't tell which vertices are the in-vertices of a given in-vertex, at this or any other depth.
Is there a way to get a nested response back or clearly label each level and the in / out vertices at those levels in gremlin?
You can use loops() and group().by() to accomplish this. The example below uses the air-routes dataset from Practical Gremlin.
g.V().has('airport', 'code', 'ANC').
emit().repeat(group('x').by(loops()).by(__.fold()).out().dedup()).times(3).
cap('x').unfold()
This returns:
{0: [v[2]]}
{1: [v[3], v[1101], v[1102], v[1103], v[1104], v[1105], v[1106], v[1107], v[1108], v[1109], v[1110], v[1111], v[1112], v[1113], v[1114], v[8], v[11], v[13], v[17], v[18], v[20], v[22], v[23], v[27], v[30], v[31], v[37], v[884], v[886], v[149], v[159], v[1092], v[1093], v[1094], v[1095], v[1096], v[1097], v[1098], v[1099], v[1100], v[2658]]}
{2: [v[368], v[371], v[375], v[389], v[1], v[4], v[5], v[6], v[7], v[9], v[10], v[12], v[15], v[16], v[21], v[24], v[25], v[26], v[28], v[29], v[34], v[35], v[38], v[39], v[41], v[42], v[43], v[45], v[46], v[47], v[49], v[50], v[416], v[430], v[444], v[52], v[99], v[1274], v[136], v[147], v[150], v[549], v[929], v[151], v[178], v[180], v[182], v[183], v[184], v[185], v[186], v[187], v[188], v[189], v[190], v[193], v[194], v[227], v[239], v[240], v[244], v[265], v[268], v[269], v[273], v[278], v[280], v[281], v[2], v[2335], v[2336], v[2338], v[2355], v[2357], v[2365], v[2379], v[3208], v[1283], v[2358], v[2388], v[1771], v[3133], v[2983], v[2618], v[2337], v[2340], v[2350], v[1770], v[3008], v[2333], v[2334], v[2342], v[2346], v[2363], v[2364], v[2368], v[2369], v[2377], v[2381], v[2384], v[2387], v[2874], v[2321], v[2326], v[2331], v[2341], v[2359], v[2366], v[2373], v[2378], v[2386], v[3256], v[3257], v[2320], v[2361], v[2353], v[2383], v[2616], v[2351], v[301], v[305], v[306], v[314], v[356], v[357], v[358], v[359], v[360], v[361], v[362], v[363], v[364], v[365], v[366], v[367], v[369], v[370], v[372], v[373], v[374], v[376], v[377], v[379], v[380], v[381], v[382], v[383], v[384], v[385], v[386], v[387], v[388], v[390], v[391], v[392], v[393], v[394], v[395], v[396], v[397], v[398], v[399], v[400], v[1123], v[1127], v[14], v[19], v[33], v[36], v[40], v[44], v[48], v[401], v[402], v[403], v[404], v[405], v[406], v[407], v[408], v[409], v[410], v[411], v[412], v[413], v[414], v[415], v[417], v[418], v[419], v[420], v[421], v[422], v[423], v[424], v[425], v[426], v[427], v[428], v[429], v[431], v[432], v[441], v[443], v[445], v[51], v[54], v[55], v[58], v[60], v[61], v[64], v[67], v[68], v[70], v[74], v[80], v[83], v[85], v[86], v[865], v[872], v[877], v[883], v[890], v[895], v[2076], v[2082], v[3259], v[106], v[122], v[131], v[132], v[133], v[134], v[135], v[138], v[525], v[530], v[909], v[919], v[921], v[934], v[936], v[941], v[950], v[3306], v[164], v[181], 
v[195], v[196], v[198], v[563], v[571], v[573], v[575], v[576], v[577], v[580], v[969], v[208], v[211], v[217], v[225], v[237], v[238], v[241], v[243], v[245], v[246], v[603], v[605], v[625], v[629], v[630], v[640], v[3416], v[263], v[264], v[267], v[270], v[271], v[277], v[288], v[289], v[291], v[292], v[295], v[296], v[297], v[300], v[651], v[1082], v[312], v[378], v[1117], v[1118], v[1119], v[1120], v[1121], v[1122], v[1124], v[1125], v[1126], v[435], v[63], v[84], v[899], v[900], v[102], v[161], v[205], v[220], v[223], v[224], v[236], v[608], v[304], v[318], v[337], v[701], v[714], v[837], v[1241], v[53], v[56], v[57], v[66], v[73], v[75], v[76], v[92], v[93], v[100], v[864], v[868], v[870], v[874], v[876], v[878], v[882], v[885], v[887], v[888], v[889], v[891], v[892], v[893], v[894], v[896], v[897], v[898], v[3283], v[103], v[105], v[109], v[140], v[146], v[148], v[901], v[902], v[903], v[907], v[916], v[924], v[177], v[199], v[200], v[567], v[568], v[213], v[215], v[249], v[250], v[638], v[3046], v[3426], v[272], v[294], v[299], v[665], v[669], v[676], v[677], v[307], v[437], v[438], v[439], v[447], v[449], v[450], v[917], v[918], v[938], v[943], v[947], v[949], v[3314], v[3326], v[152], v[153], v[166], v[958], v[961], v[964], v[611], v[613], v[620], v[623], v[632], v[283], v[290], v[293], v[1069], v[1072], v[1073], v[1074], v[1075], v[1076], v[1077], v[1078], v[1079], v[1080], v[1151], v[32], v[434], v[436], v[446], v[448], v[59], v[78], v[79], v[90], v[482], v[1295], v[2074], v[2075], v[3288], v[110], v[112], v[145], v[526], v[527], v[529], v[2107], v[163], v[192], v[956], v[957], v[959], v[960], v[962], v[963], v[965], v[966], v[967], v[968], v[970], v[202], v[242], v[609], v[612], v[615], v[649], v[650], v[3019], v[3439], v[266], v[286], v[652], v[1128], v[1129], v[1534], v[3347], v[951], v[87], v[863], v[1272], v[1273], v[1275], v[1276], v[920], v[866], v[867], v[869], v[871], v[873], v[875], v[107], v[671], v[673], v[1294], v[2073], v[913], v[1050], 
v[1438], v[282], v[1070], v[1071], v[1661], v[3286], v[3287], v[911], v[912], v[915], v[923], v[926], v[930], v[931], v[932], v[933], v[935], v[937], v[939], v[940], v[944], v[945], v[946], v[948], v[3323], v[952], v[953], v[954], v[955], v[628], v[71], v[458], v[212], v[214], v[639], v[641], v[642], v[643], v[644], v[647], v[648], v[2100], v[2322], v[2323], v[2324], v[2332], v[2347], v[2352], v[2362], v[2367], v[2374], v[2380], v[2389], v[2048], v[2420], v[2934], v[2601], v[3028], v[2325], v[2327], v[2328], v[2329], v[2330], v[2339], v[2345], v[2354], v[2356], v[2360], v[2371], v[2382], v[2385], v[3122], v[2024], v[3258], v[1768], v[1772], v[2621], v[3006], v[3009], v[2370], v[2376], v[2396], v[2343], v[2372], v[3203], v[2313], v[2344], v[2349], v[3236], v[1769], v[710], v[657]]}
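For comparison, a plain-Python sketch of the same idea: a breadth-first search that buckets vertices by the depth at which they are first reached, roughly what group('x').by(loops()) collects on each iteration. The `adj` graph and the `vertices_by_depth` helper are hypothetical; build `adj` from in-neighbours instead of out-neighbours to get the in direction.

```python
def vertices_by_depth(adj: dict[str, list[str]], start: str, max_depth: int) -> dict[int, list[str]]:
    # Breadth-first search recording the depth at which each vertex is
    # first reached; a vertex already seen at a shallower depth is skipped.
    levels: dict[int, list[str]] = {0: [start]}
    seen = {start}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for v in frontier:
            for w in adj.get(v, []):
                if w not in seen:
                    seen.add(w)
                    next_frontier.append(w)
        if not next_frontier:
            break  # nothing new to expand
        levels[depth] = next_frontier
        frontier = next_frontier
    return levels


# Hypothetical directed graph given as out-neighbour lists.
adj = {"A": ["B", "C"], "B": ["D", "A"], "C": ["D"], "D": ["E"]}
print(vertices_by_depth(adj, "A", 3))  # {0: ['A'], 1: ['B', 'C'], 2: ['D'], 3: ['E']}
```

One difference: this sketch never lists the start vertex again, whereas the Gremlin output above shows v[2] at depth 2 as well as depth 0. That appears to be because the dedup() inside repeat() only sees vertices produced by out(), and the start vertex never passes through it.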
Another option is a combination of the project() and tree() steps.
Something like:
g.V().has(name, foo).project('in', 'out').
by(repeat(in().simplePath()).emit().times(10).dedup().
tree().by(values(name))).
by(repeat(out().simplePath()).emit().times(10).dedup().
tree().by(values(name)))
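tree() merges all the paths walked by repeat() into one nested map keyed by the by() values, so shared prefixes are stored only once. A plain-Python sketch of that merge, with hypothetical path lists standing in for the traversal's paths (the `paths_to_tree` helper is illustrative):

```python
def paths_to_tree(paths: list[list[str]]) -> dict:
    # Merge paths that share a prefix into nested dicts, similar in shape
    # to the nested map that tree() produces; leaves map to empty dicts.
    root: dict = {}
    for path in paths:
        node = root
        for name in path:
            node = node.setdefault(name, {})
    return root


# Hypothetical emitted paths (emit() yields the shorter prefixes too).
paths = [["foo", "a"], ["foo", "a", "b"], ["foo", "c"]]
print(paths_to_tree(paths))  # {'foo': {'a': {'b': {}}, 'c': {}}}
```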

Sort cosine similarity scores before adding edges to graph

I was trying to adapt the Jaccard similarity example found here to cosine similarity, but wanted to limit the number of created edges to the top 10 scores.
I reviewed https://gist.github.com/dkuppitz/79e0b009f0c9ae87db5a but couldn't figure out how to skip the edge-creation step so that I could sort the scores first and still get the same results as the link above.
Based on the Jaccard example above, this is what I have come up with so far:
g.V().
match(
__.as('v1').outE('RECOMMENDS').values('amount').fold().as('v1rec'),
__.as('v1').V().as('v2'),
__.as('v2').outE('RECOMMENDS').values('amount').fold().as('v2rec'),
__.as('v1').out().dedup().fold().as('v1n'),
__.as('v2').out().dedup().fold().as('v2n')
).
where('v1',lt('v2')).
by(id).
where('v1',neq('v2').and(without('v1n'))).
where('v2',without('v1n')).
project('v1','v2','n','d1','d2').
by(select('v1')).
by(select('v2')).
by(
select('v1rec','v2rec') // this does not work: can't get the dot product from this
).
by(coalesce(
select('v1rec').
unfold().
math('_ ^ 2').
sum(),
constant(0))).
by(coalesce(
select('v2rec').
unfold().
math('_ ^ 2').
sum(),
constant(0))).
filter(select('d1').is(gt(0))).
filter(select('d2').is(gt(0))).
project('v1','v2','cosine').
by(select('v1')).
by(select('v2')).
by(math('n/(sqrt(d1)*sqrt(d2))')).
sort{-it.cosine}.
toList()[0..9].
each {
r -> g.V(r['v2']).as('v2').
V(r['v1']).
addE('PREDICTED_COSINE').
to('v2').
property('score', r['cosine']).
toList()
}
but I can't figure out how to get the dot product in the third by() step with select('v1rec','v2rec'). Please help.
UPDATE:
I couldn't fit this in a comment, so I'm posting it here:
I tried another approach that gets me closer (I think), but I still have trouble iterating over each list of maps to extract the values from each one:
g.V().
match(
__.as('v1').outE().as('e1'),
__.as('v1').V().as('v2'),
__.as('v2').outE().as('e2'),
__.as('v1').out().dedup().fold().as('v1n'),
__.as('v2').out().dedup().fold().as('v2n')).
where('v1',neq('v2').
and(without('v1n'))).
where('v2',without('v1n')).
project('v1','v2','a1','a2').
by(select('v1')).
by(select('v2')).
by(select('e1').by('amount')).
by(select('e2').by('amount')).
project('v1','v2','n','d1','d2').
by(select('v1')).
by(select('v2')).
by(math('a1 * a2')).
by(math('a1 * a1')).
by(math('a2 * a2')).
group().
by(select('v1','v2')).
unfold()
One line of output:
==>{v1=v[4240], v2=v[8320]}=[{v1=v[4240], v2=v[8320], n=210.0, d1=196.0, d2=225.0}, {v1=v[4240], v2=v[8320], n=182.0, d1=196.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=182.0, d1=196.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=45.0, d1=9.0, d2=225.0}, {v1=v[4240], v2=v[8320], n=39.0, d1=9.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=39.0, d1=9.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=45.0, d1=9.0, d2=225.0}, {v1=v[4240], v2=v[8320], n=39.0, d1=9.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=39.0, d1=9.0, d2=169.0}]
My goal is to sum all the "n", "d1", and "d2" values in each list of maps so I can calculate the similarity as sum(n)/(sqrt(sum(d1))*sqrt(sum(d2))) for each key (such as {v1=v[4240], v2=v[8320]}, the key outside the list in the example above, so n would be 210 + 182 + 182 + 45 + 39 + 39 + 45 + 39 + 39 = 820). I want to do this for many graphs, so I can't hard-code it for this one. Does that make more sense now?
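The arithmetic described above, sketched in plain Python over the nine maps from the sample output line (only n, d1 and d2 are kept; `similarity` is an illustrative helper name, not part of Gremlin):

```python
import math

# The n/d1/d2 values of the nine maps in the sample output line.
rows = [
    {"n": 210.0, "d1": 196.0, "d2": 225.0},
    {"n": 182.0, "d1": 196.0, "d2": 169.0},
    {"n": 182.0, "d1": 196.0, "d2": 169.0},
    {"n": 45.0, "d1": 9.0, "d2": 225.0},
    {"n": 39.0, "d1": 9.0, "d2": 169.0},
    {"n": 39.0, "d1": 9.0, "d2": 169.0},
    {"n": 45.0, "d1": 9.0, "d2": 225.0},
    {"n": 39.0, "d1": 9.0, "d2": 169.0},
    {"n": 39.0, "d1": 9.0, "d2": 169.0},
]


def similarity(rows: list[dict]) -> tuple[float, float, float, float]:
    # Sum each field across the group, then apply n / (sqrt(d1) * sqrt(d2)).
    n = sum(r["n"] for r in rows)
    d1 = sum(r["d1"] for r in rows)
    d2 = sum(r["d2"] for r in rows)
    return n, d1, d2, n / (math.sqrt(d1) * math.sqrt(d2))


n, d1, d2, c = similarity(rows)
print(n, d1, d2, round(c, 4))  # 820.0 642.0 1689.0 0.7875
```

The final query below does the same per-field summing in Gremlin with select(values).local(unfold().select('n').sum()) and its d1/d2 counterparts.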
This is what I finally came up with:
g.V().
match(
__.as('v1').outE().as('e1'),
__.as('v1').V().as('v2'),
__.as('v2').outE().as('e2'),
__.as('v1').out().dedup().fold().as('v1n'),
__.as('v2').out().dedup().fold().as('v2n')
).
where('v1',neq('v2').
and(without('v1n'))).
where('v2',without('v1n')).
project('v1','v2','a1','a2').
by(select('v1')).
by(select('v2')).
by(select('e1').by('amount')).
by(select('e2').by('amount')).
project('v1','v2','n','d1','d2').
by(select('v1')).
by(select('v2')).
by(math('a1 * a2')).
by(math('a1 * a1')).
by(math('a2 * a2')).
group().
by(select('v1','v2')).
unfold().
project('v1','v2','n','d1','d2').
by(select(keys).select('v1')).
by(select(keys).select('v2')).
by(select(values).local(unfold().select('n').sum())).
by(select(values).local(unfold().select('d1').sum())).
by(select(values).local(unfold().select('d2').sum())).
project('v1','v2','c').
by(select('v1')).
by(select('v2')).
by(math('n / (sqrt(d1) * sqrt(d2))')).
sort{ -it.c }.
toList()[0..9]
Thanks for all the help.
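The last step keeps the ten highest scores on the client before any edges are written. A plain-Python sketch of that selection, where `top_k` and the sample result dicts are hypothetical stand-ins for the v1/v2/c maps:

```python
def top_k(results: list[dict], k: int = 10) -> list[dict]:
    # Highest score first (the same order as Groovy's sort { -it.c });
    # slicing returns fewer than k items if there are not enough results.
    return sorted(results, key=lambda r: r["c"], reverse=True)[:k]


# Hypothetical scored pairs standing in for the v1/v2/c maps.
results = [{"v1": f"v{i}", "v2": f"w{i}", "c": i / 20} for i in range(12)]
print([r["v1"] for r in top_k(results, 3)])  # ['v11', 'v10', 'v9']
```

Python slicing quietly returns fewer items when there are fewer than ten results; in Groovy, as far as I know, `toList()[0..9]` throws an IndexOutOfBoundsException in that case, while `take(10)` does not.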

Gremlin query with nested vertices

My use case: a Bag vertex has a 'holds' edge to a Box vertex, and a Box vertex has a 'contains' edge to a Fruit vertex, so it's a parent-child relation across three vertices.
Is it possible to write a Gremlin query that returns all three related vertices? For example, I need to fetch Bags by id, including their Box vertices and, further down, the Fruit vertices for that Bag id. In SQL-like syntax it's a simple select * from bag where id = 1.
Sample structure:
g.addV('bag').property('id',1).property('name','bag1').property('size','12').as('1').
addV('box').property('id',2).property('name','box1').property('width','12').as('2').
addV('fruit').property('id',3).property('name','apple').property('color','red').as('3').
addV('bag').property('id',4).property('name','bag2').property('size','44').as('4').
addV('box').property('id',5).property('name','box2').property('width','14').as('5').
addV('fruit').property('id',6).property('name','orange').property('color','yellow').as('6').
addE('holds').from('1').to('2').
addE('contains').from('2').to('3').
addE('holds').from('4').to('5').
addE('contains').from('5').to('6').iterate()
I want to get all properties of vertices 1, 2 and 3 when I query for vertex 1.
I want the response in the below format.
"bags" : [{
    "id": "1",
    "name": "bag1",
    "size": "12",
    "boxes": [{
        "id": "2",
        "name": "box1",
        "width": "12",
        "fruits": [{
            "id": "3",
            "name": "apple",
            "color": "red"
        }]
    }]
},
{
    "id": "4",
    "name": "bag2",
    "size": "44",
    "boxes": [{
        "id": "5",
        "name": "box2",
        "width": "14",
        "fruits": [{
            "id": "6",
            "name": "orange",
            "color": "yellow"
        }]
    }]
}]
But I'm not sure whether this is possible in Gremlin, as there are no implicit relations between vertices.
I would probably use project() to accomplish this:
gremlin> g.V().hasLabel('bag').
......1> project('id', 'name','boxes').
......2> by('id').
......3> by('name').
......4> by(out('holds').
......5> project('id','name','fruits').
......6> by('id').
......7> by('name').
......8> by(out('contains').
......9> project('id','name').
.....10> by('id').
.....11> by('name').
.....12> fold()).
.....13> fold())
==>[id:1,name:bag1,boxes:[[id:2,name:box1,fruits:[[id:3,name:apple]]]]]
==>[id:4,name:bag2,boxes:[[id:5,name:box2,fruits:[[id:6,name:orange]]]]]
I omitted the "bags" root level key as there were no other keys in the Map and it didn't seem useful to add that extra level.
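If the "bags" root key and the remaining properties (size, width, color) are needed, one option is to assemble the document on the client side. A plain-Python sketch of that shape, assuming the vertices and edges from the sample structure are already in memory (`vertices`, `holds`, `contains`, `props` and `bag_document` are hypothetical names):

```python
import json

# Hypothetical in-memory copy of the sample data: properties by vertex id,
# plus the 'holds' (bag -> box) and 'contains' (box -> fruit) edges.
vertices = {
    1: {"label": "bag", "id": "1", "name": "bag1", "size": "12"},
    2: {"label": "box", "id": "2", "name": "box1", "width": "12"},
    3: {"label": "fruit", "id": "3", "name": "apple", "color": "red"},
    4: {"label": "bag", "id": "4", "name": "bag2", "size": "44"},
    5: {"label": "box", "id": "5", "name": "box2", "width": "14"},
    6: {"label": "fruit", "id": "6", "name": "orange", "color": "yellow"},
}
holds = {1: [2], 4: [5]}
contains = {2: [3], 5: [6]}


def props(vid: int) -> dict:
    # Every property of the vertex except its label.
    return {k: v for k, v in vertices[vid].items() if k != "label"}


def bag_document(bag_ids: list[int]) -> dict:
    # Nest fruits inside boxes inside bags, under a "bags" root key.
    bags = []
    for b in bag_ids:
        boxes = []
        for x in holds.get(b, []):
            fruits = [props(f) for f in contains.get(x, [])]
            boxes.append({**props(x), "fruits": fruits})
        bags.append({**props(b), "boxes": boxes})
    return {"bags": bags}


print(json.dumps(bag_document([1, 4]), indent=2))
```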

Gremlin query uneven result issue

Suppose I have 3 students (A, B, C), each with a major subject and a mark, but when I query them the results come back misaligned.
Data
A -> Math -> 77
B -> History -> 70
C -> Science -> 97
Query
g.V('Class').has('name',within('A','B','C'))
Result
{"student_name":['A','B','C'], "major_subject":['Math','Science','History'], "marks":[70,77,97]}
The data returned by the query is not ordered according to the student names, so the values no longer line up.
I assume that your graph looks kinda like this:
g = TinkerGraph.open().traversal()
g.addV('student').property('name', 'A').
addE('scored').to(addV('subject').property('name', 'Math')).
property('mark', 77).
addV('student').property('name', 'B').
addE('scored').to(addV('subject').property('name', 'History')).
property('mark', 70).
addV('student').property('name', 'C').
addE('scored').to(addV('subject').property('name', 'Science')).
property('mark', 97).iterate()
Now the easiest way to gather the data is this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('student').
outE('scored').as('mark').inV().as('major').
select('student','major','mark').
by('name').
by('name').
by('mark')
==>[student:A,major:Math,mark:77]
==>[student:B,major:History,mark:70]
==>[student:C,major:Science,mark:97]
But if you really depend on the format shown in your question, you can do this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).
store('student').by('name').
outE('scored').store('mark').by('mark').
inV().store('major').by('name').
cap('student','major','mark')
==>[major:[Math,History,Science],student:[A,B,C],mark:[77,70,97]]
If you want the cap'ed result ordered by mark, you'll need a mix of the two queries:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('a').
outE('scored').as('b').
order().
by('mark').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[History,Math,Science],student:[B,A,C],mark:[70,77,97]]
To keep the order of the input names:
gremlin> input = ['C', 'B', 'A']; []
gremlin> g.V().has('student', 'name', within(input)).as('a').
order().
by {input.indexOf(it.value('name'))}.
outE('scored').as('b').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[Science,History,Math],student:[C,B,A],mark:[97,70,77]]
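The rule behind all of these variants is to sort complete rows first and only then split them into per-field lists, so index i in every list refers to the same student. A plain-Python sketch using the data from the question (`to_columns` is an illustrative helper, not a Gremlin step):

```python
# Rows taken from the question's data.
rows = [
    {"student": "A", "major": "Math", "mark": 77},
    {"student": "B", "major": "History", "mark": 70},
    {"student": "C", "major": "Science", "mark": 97},
]


def to_columns(rows: list[dict], key) -> dict[str, list]:
    # Sort whole rows first, then split them into parallel lists,
    # so position i in every list describes the same student.
    ordered = sorted(rows, key=key)
    return {field: [r[field] for r in ordered] for field in ("student", "major", "mark")}


print(to_columns(rows, key=lambda r: r["mark"]))
# {'student': ['B', 'A', 'C'], 'major': ['History', 'Math', 'Science'], 'mark': [70, 77, 97]}

order = ["C", "B", "A"]
print(to_columns(rows, key=lambda r: order.index(r["student"])))
# {'student': ['C', 'B', 'A'], 'major': ['Science', 'History', 'Math'], 'mark': [97, 70, 77]}
```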
