This question is about how to get a more readable response from Gremlin that can be processed further. I can get the spanning tree using the simplePath() step with some arbitrary depth (10 here) as follows:
g.V().has(name, foo).repeat(both().simplePath()).emit().times(10).dedup()
or break it down further by in / out:
g.V().has(name, foo).repeat(in().simplePath()).emit().times(10).dedup()
g.V().has(name, foo).repeat(out().simplePath()).emit().times(10).dedup()
But this returns a flat array of vertices without any clear grouping. For example, from the response alone I can't tell which vertices are the in vertices of a given in vertex, or which vertices belong to any other depth.
Is there a way in Gremlin to get a nested response back, or to clearly label each level and the in / out vertices at those levels?
You can use loops() and group().by() to accomplish this. The example below uses the air-routes dataset from Practical Gremlin.
g.V().has('airport', 'code', 'ANC').
emit().repeat(group('x').by(loops()).by(__.fold()).out().dedup()).times(3).
cap('x').unfold()
This returns:
{0: [v[2]]}
{1: [v[3], v[1101], v[1102], v[1103], v[1104], v[1105], v[1106], v[1107], v[1108], v[1109], v[1110], v[1111], v[1112], v[1113], v[1114], v[8], v[11], v[13], v[17], v[18], v[20], v[22], v[23], v[27], v[30], v[31], v[37], v[884], v[886], v[149], v[159], v[1092], v[1093], v[1094], v[1095], v[1096], v[1097], v[1098], v[1099], v[1100], v[2658]]}
{2: [v[368], v[371], v[375], v[389], v[1], v[4], v[5], v[6], v[7], v[9], v[10], v[12], v[15], v[16], v[21], v[24], v[25], v[26], v[28], v[29], v[34], v[35], v[38], v[39], v[41], v[42], v[43], v[45], v[46], v[47], v[49], v[50], v[416], v[430], v[444], v[52], v[99], v[1274], v[136], v[147], v[150], v[549], v[929], v[151], v[178], v[180], v[182], v[183], v[184], v[185], v[186], v[187], v[188], v[189], v[190], v[193], v[194], v[227], v[239], v[240], v[244], v[265], v[268], v[269], v[273], v[278], v[280], v[281], v[2], v[2335], v[2336], v[2338], v[2355], v[2357], v[2365], v[2379], v[3208], v[1283], v[2358], v[2388], v[1771], v[3133], v[2983], v[2618], v[2337], v[2340], v[2350], v[1770], v[3008], v[2333], v[2334], v[2342], v[2346], v[2363], v[2364], v[2368], v[2369], v[2377], v[2381], v[2384], v[2387], v[2874], v[2321], v[2326], v[2331], v[2341], v[2359], v[2366], v[2373], v[2378], v[2386], v[3256], v[3257], v[2320], v[2361], v[2353], v[2383], v[2616], v[2351], v[301], v[305], v[306], v[314], v[356], v[357], v[358], v[359], v[360], v[361], v[362], v[363], v[364], v[365], v[366], v[367], v[369], v[370], v[372], v[373], v[374], v[376], v[377], v[379], v[380], v[381], v[382], v[383], v[384], v[385], v[386], v[387], v[388], v[390], v[391], v[392], v[393], v[394], v[395], v[396], v[397], v[398], v[399], v[400], v[1123], v[1127], v[14], v[19], v[33], v[36], v[40], v[44], v[48], v[401], v[402], v[403], v[404], v[405], v[406], v[407], v[408], v[409], v[410], v[411], v[412], v[413], v[414], v[415], v[417], v[418], v[419], v[420], v[421], v[422], v[423], v[424], v[425], v[426], v[427], v[428], v[429], v[431], v[432], v[441], v[443], v[445], v[51], v[54], v[55], v[58], v[60], v[61], v[64], v[67], v[68], v[70], v[74], v[80], v[83], v[85], v[86], v[865], v[872], v[877], v[883], v[890], v[895], v[2076], v[2082], v[3259], v[106], v[122], v[131], v[132], v[133], v[134], v[135], v[138], v[525], v[530], v[909], v[919], v[921], v[934], v[936], v[941], v[950], v[3306], v[164], v[181], 
v[195], v[196], v[198], v[563], v[571], v[573], v[575], v[576], v[577], v[580], v[969], v[208], v[211], v[217], v[225], v[237], v[238], v[241], v[243], v[245], v[246], v[603], v[605], v[625], v[629], v[630], v[640], v[3416], v[263], v[264], v[267], v[270], v[271], v[277], v[288], v[289], v[291], v[292], v[295], v[296], v[297], v[300], v[651], v[1082], v[312], v[378], v[1117], v[1118], v[1119], v[1120], v[1121], v[1122], v[1124], v[1125], v[1126], v[435], v[63], v[84], v[899], v[900], v[102], v[161], v[205], v[220], v[223], v[224], v[236], v[608], v[304], v[318], v[337], v[701], v[714], v[837], v[1241], v[53], v[56], v[57], v[66], v[73], v[75], v[76], v[92], v[93], v[100], v[864], v[868], v[870], v[874], v[876], v[878], v[882], v[885], v[887], v[888], v[889], v[891], v[892], v[893], v[894], v[896], v[897], v[898], v[3283], v[103], v[105], v[109], v[140], v[146], v[148], v[901], v[902], v[903], v[907], v[916], v[924], v[177], v[199], v[200], v[567], v[568], v[213], v[215], v[249], v[250], v[638], v[3046], v[3426], v[272], v[294], v[299], v[665], v[669], v[676], v[677], v[307], v[437], v[438], v[439], v[447], v[449], v[450], v[917], v[918], v[938], v[943], v[947], v[949], v[3314], v[3326], v[152], v[153], v[166], v[958], v[961], v[964], v[611], v[613], v[620], v[623], v[632], v[283], v[290], v[293], v[1069], v[1072], v[1073], v[1074], v[1075], v[1076], v[1077], v[1078], v[1079], v[1080], v[1151], v[32], v[434], v[436], v[446], v[448], v[59], v[78], v[79], v[90], v[482], v[1295], v[2074], v[2075], v[3288], v[110], v[112], v[145], v[526], v[527], v[529], v[2107], v[163], v[192], v[956], v[957], v[959], v[960], v[962], v[963], v[965], v[966], v[967], v[968], v[970], v[202], v[242], v[609], v[612], v[615], v[649], v[650], v[3019], v[3439], v[266], v[286], v[652], v[1128], v[1129], v[1534], v[3347], v[951], v[87], v[863], v[1272], v[1273], v[1275], v[1276], v[920], v[866], v[867], v[869], v[871], v[873], v[875], v[107], v[671], v[673], v[1294], v[2073], v[913], v[1050], 
v[1438], v[282], v[1070], v[1071], v[1661], v[3286], v[3287], v[911], v[912], v[915], v[923], v[926], v[930], v[931], v[932], v[933], v[935], v[937], v[939], v[940], v[944], v[945], v[946], v[948], v[3323], v[952], v[953], v[954], v[955], v[628], v[71], v[458], v[212], v[214], v[639], v[641], v[642], v[643], v[644], v[647], v[648], v[2100], v[2322], v[2323], v[2324], v[2332], v[2347], v[2352], v[2362], v[2367], v[2374], v[2380], v[2389], v[2048], v[2420], v[2934], v[2601], v[3028], v[2325], v[2327], v[2328], v[2329], v[2330], v[2339], v[2345], v[2354], v[2356], v[2360], v[2371], v[2382], v[2385], v[3122], v[2024], v[3258], v[1768], v[1772], v[2621], v[3006], v[3009], v[2370], v[2376], v[2396], v[2343], v[2372], v[3203], v[2313], v[2344], v[2349], v[3236], v[1769], v[710], v[657]]}
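To make the shape of that result concrete, here is a minimal Python sketch of the same idea: group vertices by the depth at which a breadth-first walk first reaches them. It is not a model of how TinkerPop evaluates the traversal. The adjacency map `toy_routes` and the function `group_by_depth` are made up for illustration, and unlike the output above (where the start vertex v[2] shows up again at depth 2) this sketch marks the start vertex as seen.

```python
def group_by_depth(adjacency, start, max_depth):
    """Group vertices by the depth at which a breadth-first walk first reaches them."""
    levels = {0: [start]}
    seen = {start}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for vertex in frontier:
            for neighbor in adjacency.get(vertex, []):
                # Skip vertices already reached at a shallower depth.
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break
        levels[depth] = next_frontier
        frontier = next_frontier
    return levels


# Hypothetical outgoing routes; not taken from the air-routes dataset.
toy_routes = {
    "ANC": ["SEA", "FAI"],
    "SEA": ["ANC", "LAX", "FAI"],
    "FAI": ["SEA"],
    "LAX": ["JFK"],
}

levels = group_by_depth(toy_routes, "ANC", 2)
for depth, vertices in levels.items():
    print(depth, vertices)
```

Each key is a depth and each value is the list of vertices first reached at that depth, which mirrors the {depth: [vertices]} entries returned by cap('x').unfold().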
Another option is to use a combination of the project() and tree() steps.
Something like:
g.V().has(name, foo).project('in', 'out').
by(repeat(in().simplePath()).emit().times(10).dedup().
tree().by(values(name))).
by(repeat(out().simplePath()).emit().times(10).dedup().
tree().by(values(name)))
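The nested response that tree() produces can be pictured as a map of maps, where each key is a path element and each value is the subtree below it. The Python sketch below only illustrates that shape by folding a list of paths into nested dictionaries. The helper `build_tree` and the sample paths are hypothetical and are not part of the Gremlin API.

```python
def build_tree(paths):
    """Fold a list of paths into nested dicts, one nesting level per path position."""
    tree = {}
    for path in paths:
        node = tree
        for name in path:
            # Reuse the existing branch if this prefix was already seen.
            node = node.setdefault(name, {})
    return tree


def print_tree(tree, depth=0):
    """Print one name per line, indented by its depth in the tree."""
    for name, children in tree.items():
        print("  " * depth + name)
        print_tree(children, depth + 1)


# Made-up paths starting from a vertex named "foo".
paths = [["foo", "a"], ["foo", "a", "c"], ["foo", "b"]]
tree = build_tree(paths)
print_tree(tree)
```

With this shape, the children of any vertex at any depth can be read directly from the nested map, which is the grouping that the flat array of vertices was missing.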
I was trying to adapt the example for Jaccard similarity found here to compute cosine similarity instead, but I want to limit the number of created links to the top 10 scores.
I reviewed https://gist.github.com/dkuppitz/79e0b009f0c9ae87db5a but couldn't figure out how to hold off on the edge creation so that I can sort the scores first and still get the same results as in the link above.
Based on the Jaccard example above, this is what I have come up with so far:
g.V().
match(
__.as('v1').outE('RECOMMENDS').values('amount').fold().as('v1rec'),
__.as('v1').V().as('v2'),
__.as('v2').outE('RECOMMENDS').values('amount').fold().as('v2rec'),
__.as('v1').out().dedup().fold().as('v1n'),
__.as('v2').out().dedup().fold().as('v2n')
).
where('v1',lt('v2')).
by(id).
where('v1',neq('v2').and(without('v1n'))).
where('v2',without('v1n')).
project('v1','v2','n','d1','d2').
by(select('v1')).
by(select('v2')).
by(
select('v1rec','v2rec') // <-- this does not work; can't get the dot product from this
).
by(coalesce(
select('v1rec').
unfold().
math('_ ^ 2').
sum(),
constant(0))).
by(coalesce(
select('v2rec').
unfold().
math('_ ^ 2').
sum(),
constant(0))).
filter(select('d1').is(gt(0))).
filter(select('d2').is(gt(0))).
project('v1','v2','cosine').
by(select('v1')).
by(select('v2')).
by(math('n/(sqrt(d1)*sqrt(d2))')).
sort{-it.cosine}.
toList()[0..9].
each {
r -> g.V(r['v2']).as('v2').
V(r['v1']).
addE('PREDICTED_COSINE').
to('v2').
property('score', r['cosine']).
toList()
}
But I can't figure out how to compute the dot product in the third by() step from select('v1rec','v2rec'). Please help.
UPDATE:
I couldn't fit this in a comment, so I'm posting it here:
I tried another approach that gets me closer (I think), but I still have trouble iterating over each list of maps to extract the values:
g.V().
match(
__.as('v1').outE().as('e1'),
__.as('v1').V().as('v2'),
__.as('v2').outE().as('e2'),
__.as('v1').out().dedup().fold().as('v1n'),
__.as('v2').out().dedup().fold().as('v2n')).
where('v1',neq('v2').
and(without('v1n'))).
where('v2',without('v1n')).
project('v1','v2','a1','a2').
by(select('v1')).
by(select('v2')).
by(select('e1').by('amount')).
by(select('e2').by('amount')).
project('v1','v2','n','d1','d2').
by(select('v1')).
by(select('v2')).
by(math('a1 * a2')).
by(math('a1 * a1')).
by(math('a2 * a2')).
group().
by(select('v1','v2')).
unfold()
One line of output:
==>{v1=v[4240], v2=v[8320]}=[{v1=v[4240], v2=v[8320], n=210.0, d1=196.0, d2=225.0}, {v1=v[4240], v2=v[8320], n=182.0, d1=196.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=182.0, d1=196.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=45.0, d1=9.0, d2=225.0}, {v1=v[4240], v2=v[8320], n=39.0, d1=9.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=39.0, d1=9.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=45.0, d1=9.0, d2=225.0}, {v1=v[4240], v2=v[8320], n=39.0, d1=9.0, d2=169.0}, {v1=v[4240], v2=v[8320], n=39.0, d1=9.0, d2=169.0}]
My goal is to sum up all the "n", "d1", and "d2" values from the maps so that I can calculate the similarity as sum(n)/(sqrt(sum(d1))*sqrt(sum(d2))) for each key. The key is the map outside the list, such as {v1=v[4240], v2=v[8320]} in the example, so n would be 210 + 182 + 182 + 45 + 39 + 39 + 45 + 39 + 39 = 820. I want to do this for a bunch of graphs, so I don't have one specific graph for this. Does that make sense now?
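As a check on the arithmetic, here is the aggregation described above worked through in Python for the single output line shown. The values in `rows` are copied from that line. The function names `totals` and `similarity` are only illustrative, and returning None when a denominator is zero is an assumption of this sketch.

```python
from math import sqrt

# The nine maps grouped under the key {v1=v[4240], v2=v[8320]}.
rows = [
    {"n": 210.0, "d1": 196.0, "d2": 225.0},
    {"n": 182.0, "d1": 196.0, "d2": 169.0},
    {"n": 182.0, "d1": 196.0, "d2": 169.0},
    {"n": 45.0, "d1": 9.0, "d2": 225.0},
    {"n": 39.0, "d1": 9.0, "d2": 169.0},
    {"n": 39.0, "d1": 9.0, "d2": 169.0},
    {"n": 45.0, "d1": 9.0, "d2": 225.0},
    {"n": 39.0, "d1": 9.0, "d2": 169.0},
    {"n": 39.0, "d1": 9.0, "d2": 169.0},
]


def totals(rows):
    """Sum the n, d1 and d2 entries across all maps for one (v1, v2) key."""
    n = sum(row["n"] for row in rows)
    d1 = sum(row["d1"] for row in rows)
    d2 = sum(row["d2"] for row in rows)
    return n, d1, d2


def similarity(rows):
    """Compute sum(n) / (sqrt(sum(d1)) * sqrt(sum(d2))), or None if undefined."""
    n, d1, d2 = totals(rows)
    if d1 == 0 or d2 == 0:
        return None  # assumption: skip pairs with a zero denominator
    return n / (sqrt(d1) * sqrt(d2))


n, d1, d2 = totals(rows)
score = similarity(rows)
print(n, d1, d2, score)
```

For this key the sums are n = 820, d1 = 642 and d2 = 1689, so the score is 820 / (sqrt(642) * sqrt(1689)), roughly 0.79.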
This is what I finally came up with:
g.V().
match(
__.as('v1').outE().as('e1'),
__.as('v1').V().as('v2'),
__.as('v2').outE().as('e2'),
__.as('v1').out().dedup().fold().as('v1n'),
__.as('v2').out().dedup().fold().as('v2n')
).
where('v1',neq('v2').
and(without('v1n'))).
where('v2',without('v1n')).
project('v1','v2','a1','a2').
by(select('v1')).
by(select('v2')).
by(select('e1').by('amount')).
by(select('e2').by('amount')).
project('v1','v2','n','d1','d2').
by(select('v1')).
by(select('v2')).
by(math('a1 * a2')).
by(math('a1 * a1')).
by(math('a2 * a2')).
group().
by(select('v1','v2')).
unfold().
project('v1','v2','n','d1','d2').
by(select(keys).select('v1')).
by(select(keys).select('v2')).
by(select(values).local(unfold().select('n').sum())).
by(select(values).local(unfold().select('d1').sum())).
by(select(values).local(unfold().select('d2').sum())).
project('v1','v2','c').
by(select('v1')).
by(select('v2')).
by(math('n / (sqrt(d1) * sqrt(d2))')).
sort{ -it.c }.
toList()[0..9]
Thanks for all the help.