JanusGraph Gremlin addE returns different vertex ids after an index was created - gremlin

I am new to JanusGraph and Gremlin. When I create an edge between two existing vertices, I expect the returned edge to show the same source and target vertex ids that I used to create it. Instead, two new ids are returned. Moreover, when I look up the edges connected to one of the vertices ("Tom"), I find that "Tom" has an edge to a vertex that looks like itself but has a different id, even though the vertex count is just 2.
gremlin> g.V().count()
==>0
gremlin> tom = g.addV("party").property("name", "Tom").property("identity_number", "01234567")\
.property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[57402]
gremlin> mary = g.addV("party").property("name", "Mary").property("identity_number", "76543210")\
.property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[61626]
gremlin> g.V(tom).addE('relationship').to(mary)
==>e[3k4-18ci-80et-1bia][57474-relationship->61570]
gremlin> g.V(tom).bothE().otherV().path().by(__.valueMap().with(WithOptions.tokens))
==>path[{id=57402, label=party, identity_country=[USA], identity_number=[01234567],\
identity_type=[PASSPORT], name=[Tom]}, {id=3k4-18ci-80et-1bia, label=relationship},\
{id=57474, label=party, identity_country=[USA], identity_number=[01234567], identity_type=[PASSPORT],\
name=[Tom]}]
gremlin> g.V().count()
==>2
Could anyone tell me whether this is normal, or whether some configuration causes it?
Many thanks.
UPDATE:
I found that this started happening after I set up the JanusGraph indexes with the following code:
m = amlGraph.openManagement();
party = m.makeVertexLabel('party').partition().make();
relationship = m.makeEdgeLabel('relationship').make();
identity_country_key = m.makePropertyKey('identity_country').dataType(String.class).make();
identity_number_key = m.makePropertyKey('identity_number').dataType(String.class).make();
identity_type_key = m.makePropertyKey('identity_type').dataType(String.class).make();
name_key = m.makePropertyKey('name').dataType(String.class).make();
first_seen_datetime_key = m.makePropertyKey('first_seen_datetime').dataType(Date.class).make();
relationship_type_key = m.makePropertyKey('relationship_type').dataType(String.class).make();
party = m.getVertexLabel('party');
identity_country_key = m.getPropertyKey('identity_country');
identity_number_key = m.getPropertyKey('identity_number');
identity_type_key = m.getPropertyKey('identity_type');
name_key = m.getPropertyKey('name');
m.buildIndex('partyMixed', Vertex.class).addKey(identity_country_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('identity_country', 'identity_country')).addKey(identity_number_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('identity_number', 'identity_number')).addKey(identity_type_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('identity_type', 'identity_type')).addKey(name_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('name', 'name')).indexOnly(party).buildMixedIndex('search');
relationship = m.getEdgeLabel('relationship');
first_seen_datetime_key = m.getPropertyKey('first_seen_datetime');
relationship_type_key = m.getPropertyKey('relationship_type');
m.buildIndex('relationshipMixed', Edge.class).addKey(first_seen_datetime_key).addKey(relationship_type_key).indexOnly(relationship).buildMixedIndex('search');
m.commit()

Which version of JanusGraph are you using? If it is an older version, this may be a bug.
I used one of the latest versions (0.5.3), tried to reproduce the same scenario, and got the correct ids.
gremlin>
gremlin> tom = g.addV("party").property("name", "Tom").property("identity_number", "01234567").property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[4112]
gremlin>
gremlin> mary = g.addV("party").property("name", "Mary").property("identity_number", "76543210").property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[40964232]
gremlin>
gremlin> g.V(tom).addE('relationship').to(mary)
==>e[2rm-368-3ehh-oe07c][4112-relationship->40964232]
gremlin>

Related

Query to find node which has only one vertex in common

I have the following vertices -
Person1 -> Device1 <- Person2
   |          ^
   v          |
Email1  <- Person3
Now I want to write a Gremlin query (JanusGraph) that returns all persons connected to Person1 only through the device Person1 is connected to.
So according to the above graph, the output should be [Person2].
Person3 is not in the output because Person3 is also connected to "Email1" of "Person1".
g.addV('person').property('name', 'Person1').as('p1').
addV('person').property('name', 'Person2').as('p2').
addV('person').property('name', 'Person3').as('p3').
addV('device').as('d1').
addV('email').as('e1').
addE('HAS_DEVICE').from('p1').to('d1').
addE('HAS_EMAIL').from('p1').to('e1').
addE('HAS_DEVICE').from('p2').to('d1').
addE('HAS_DEVICE').from('p3').to('d1').
addE('HAS_EMAIL').from('p3').to('e1')
The following traversal will give you the person vertices that are connected to "Person1" via one or more "device" vertices and not connected via any other type of vertices.
g.V().has('person', 'name', 'Person1').as('p1').
out().as('connector').
in().where(neq('p1')).
group().
by().
by(select('connector').label().fold()).
unfold().
where(
select(values).
unfold().dedup().fold(). // just in case the persons are connected by multiple devices
is(eq(['device']))
).
select(keys)
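For readers who find the grouping hard to follow, the same idea can be sketched in plain Python over an in-memory edge list. This is only an illustration of the logic, not how JanusGraph evaluates the traversal; the `edges` and `labels` structures and the function name `connected_only_via` are made up for this example.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the sample graph: (person, connector) pairs.
edges = [
    ("Person1", "Device1"),
    ("Person1", "Email1"),
    ("Person2", "Device1"),
    ("Person3", "Device1"),
    ("Person3", "Email1"),
]
# Label of every connector vertex.
labels = {"Device1": "device", "Email1": "email"}


def connected_only_via(start, label):
    """Return persons sharing connectors with `start`, all of type `label`."""
    connectors = {dst for src, dst in edges if src == start}
    shared = defaultdict(set)  # other person -> labels of shared connectors
    for src, dst in edges:
        if src != start and dst in connectors:
            shared[src].add(labels[dst])
    # Keep a person only if every shared connector has the wanted label.
    return sorted(p for p, seen in shared.items() if seen == {label})


result = connected_only_via("Person1", "device")
```

Here `shared` plays the role of the `group()` step, and the final set comparison corresponds to `is(eq(['device']))` after `dedup()`.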

Gremlin-python: how to write a query which uses the properties of both joined vertices in the where condition?

I am new to Gremlin and am trying to convert a SQL query to Gremlin. I have two vertex types, labeled asset and repo, and here is the Gremlin script to create the vertices and the edges:
g.addV('asset').property(id, 'a1').property('ip', '127.4.8.51').property('scanDate', '2020-09-10').property('repoId', 1)
g.addV('asset').property(id, 'a2').property('ip', '127.4.8.55').property('scanDate', '2020-09-20').property('repoId', 1)
g.addV('asset').property(id, 'a3').property('ip', '127.4.8.57').property('scanDate', '2020-09-21').property('repoId', 1)
g.addV('asset').property(id, 'a4').property('ip', '127.4.10.36').property('scanDate', '2020-09-12').property('repoId', 2)
g.addV('asset').property(id, 'a5').property('ip', '127.4.10.75').property('scanDate', '2020-09-14').property('repoId', 2)
g.addV('repo').property(id, 'r1').property('repoName', 'repo1').property('assetAge', 10).property('repoId', 1)
g.addV('repo').property(id, 'r2').property('repoName', 'repo2').property('assetAge', 9).property('repoId', 2)
g.V('a1').addE('has').to(g.V('r1'))
g.V('a2').addE('has').to(g.V('r1'))
g.V('a3').addE('has').to(g.V('r1'))
g.V('a4').addE('has').to(g.V('r2'))
g.V('a5').addE('has').to(g.V('r2'))
I would like to write a Gremlin query that does the same thing as the SQL query below:
SELECT *
FROM asset a
JOIN repo r ON a.repoId = r.repoId
WHERE a.scanDate >= CURDATE() - INTERVAL (r.assetAge + 1) DAY
So far I have tried the code below in Python:
from datetime import datetime, timedelta
from gremlin_python.process.traversal import gte
d = datetime.today() - timedelta(days=10) # here I have hard coded the days
traversal = g.V().hasLabel("asset").has("scanDate", gte(d))
traversal.valueMap().toList()
But I do not know how I can pass the repo.assetAge from the mapped repo vertices to the days parameter in timedelta(). Any help is really appreciated. Thanks.
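No answer is recorded here, but one possible workaround, assuming it is acceptable to filter on the client, is to fetch each asset together with the assetAge of its repo and apply the date condition in Python. The sketch below shows only the filtering step on hypothetical rows shaped like the sample data; the function name `recently_scanned`, the row layout, and the fixed `today` value are assumptions for illustration.

```python
from datetime import date, timedelta


def recently_scanned(pairs, today):
    """Keep assets whose scanDate >= today - (assetAge + 1) days."""
    kept = []
    for asset, repo in pairs:
        # Per-repo cutoff, mirroring INTERVAL (r.assetAge + 1) DAY in the SQL.
        cutoff = today - timedelta(days=repo["assetAge"] + 1)
        if date.fromisoformat(asset["scanDate"]) >= cutoff:
            kept.append(asset["ip"])
    return kept


# Hypothetical (asset, repo) rows, as they might come back from a traversal
# that walks the 'has' edge from each asset to its repo.
pairs = [
    ({"ip": "127.4.8.51", "scanDate": "2020-09-10"}, {"assetAge": 10}),
    ({"ip": "127.4.8.55", "scanDate": "2020-09-20"}, {"assetAge": 10}),
    ({"ip": "127.4.10.36", "scanDate": "2020-09-12"}, {"assetAge": 9}),
]

# A fixed date is used instead of date.today() so the example is repeatable.
result = recently_scanned(pairs, today=date(2020, 9, 22))
```

This trades server-side filtering for simplicity, so it is probably only reasonable when the number of asset/repo pairs is small.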

How to get in / out spanning tree of a node with readable labels in response with gremlin?

This question is about how to get a more readable response from Gremlin that can be used for further processing. I can get the spanning tree with the simplePath() step and some arbitrary depth (10 here) as follows:
g.V().has('name', 'foo').repeat(both().simplePath()).emit().times(10).dedup()
or broken down further by in / out:
g.V().has('name', 'foo').repeat(in().simplePath()).emit().times(10).dedup()
g.V().has('name', 'foo').repeat(out().simplePath()).emit().times(10).dedup()
but this returns an array of vertices without clear grouping. For example, from the response alone I can't tell which vertices are the in vertices of an in vertex, or what sits at any other depth.
Is there a way to get a nested response back, or to clearly label each level and the in / out vertices at those levels, in Gremlin?
You can use loops() and group().by() to accomplish this. The example below uses the air-routes dataset from Practical Gremlin.
g.V().has('airport', 'code', 'ANC').
emit().repeat(group('x').by(loops()).by(__.fold()).out().dedup()).times(3).
cap('x').unfold()
This returns:
{0: [v[2]]}
{1: [v[3], v[1101], v[1102], v[1103], v[1104], v[1105], v[1106], v[1107], v[1108], v[1109], v[1110], v[1111], v[1112], v[1113], v[1114], v[8], v[11], v[13], v[17], v[18], v[20], v[22], v[23], v[27], v[30], v[31], v[37], v[884], v[886], v[149], v[159], v[1092], v[1093], v[1094], v[1095], v[1096], v[1097], v[1098], v[1099], v[1100], v[2658]]}
{2: [v[368], v[371], v[375], v[389], v[1], v[4], v[5], v[6], v[7], v[9], v[10], v[12], v[15], v[16], v[21], v[24], v[25], v[26], v[28], v[29], v[34], v[35], v[38], v[39], v[41], v[42], v[43], v[45], v[46], v[47], v[49], v[50], v[416], v[430], v[444], v[52], v[99], v[1274], v[136], v[147], v[150], v[549], v[929], v[151], v[178], v[180], v[182], v[183], v[184], v[185], v[186], v[187], v[188], v[189], v[190], v[193], v[194], v[227], v[239], v[240], v[244], v[265], v[268], v[269], v[273], v[278], v[280], v[281], v[2], v[2335], v[2336], v[2338], v[2355], v[2357], v[2365], v[2379], v[3208], v[1283], v[2358], v[2388], v[1771], v[3133], v[2983], v[2618], v[2337], v[2340], v[2350], v[1770], v[3008], v[2333], v[2334], v[2342], v[2346], v[2363], v[2364], v[2368], v[2369], v[2377], v[2381], v[2384], v[2387], v[2874], v[2321], v[2326], v[2331], v[2341], v[2359], v[2366], v[2373], v[2378], v[2386], v[3256], v[3257], v[2320], v[2361], v[2353], v[2383], v[2616], v[2351], v[301], v[305], v[306], v[314], v[356], v[357], v[358], v[359], v[360], v[361], v[362], v[363], v[364], v[365], v[366], v[367], v[369], v[370], v[372], v[373], v[374], v[376], v[377], v[379], v[380], v[381], v[382], v[383], v[384], v[385], v[386], v[387], v[388], v[390], v[391], v[392], v[393], v[394], v[395], v[396], v[397], v[398], v[399], v[400], v[1123], v[1127], v[14], v[19], v[33], v[36], v[40], v[44], v[48], v[401], v[402], v[403], v[404], v[405], v[406], v[407], v[408], v[409], v[410], v[411], v[412], v[413], v[414], v[415], v[417], v[418], v[419], v[420], v[421], v[422], v[423], v[424], v[425], v[426], v[427], v[428], v[429], v[431], v[432], v[441], v[443], v[445], v[51], v[54], v[55], v[58], v[60], v[61], v[64], v[67], v[68], v[70], v[74], v[80], v[83], v[85], v[86], v[865], v[872], v[877], v[883], v[890], v[895], v[2076], v[2082], v[3259], v[106], v[122], v[131], v[132], v[133], v[134], v[135], v[138], v[525], v[530], v[909], v[919], v[921], v[934], v[936], v[941], v[950], v[3306], v[164], v[181], 
v[195], v[196], v[198], v[563], v[571], v[573], v[575], v[576], v[577], v[580], v[969], v[208], v[211], v[217], v[225], v[237], v[238], v[241], v[243], v[245], v[246], v[603], v[605], v[625], v[629], v[630], v[640], v[3416], v[263], v[264], v[267], v[270], v[271], v[277], v[288], v[289], v[291], v[292], v[295], v[296], v[297], v[300], v[651], v[1082], v[312], v[378], v[1117], v[1118], v[1119], v[1120], v[1121], v[1122], v[1124], v[1125], v[1126], v[435], v[63], v[84], v[899], v[900], v[102], v[161], v[205], v[220], v[223], v[224], v[236], v[608], v[304], v[318], v[337], v[701], v[714], v[837], v[1241], v[53], v[56], v[57], v[66], v[73], v[75], v[76], v[92], v[93], v[100], v[864], v[868], v[870], v[874], v[876], v[878], v[882], v[885], v[887], v[888], v[889], v[891], v[892], v[893], v[894], v[896], v[897], v[898], v[3283], v[103], v[105], v[109], v[140], v[146], v[148], v[901], v[902], v[903], v[907], v[916], v[924], v[177], v[199], v[200], v[567], v[568], v[213], v[215], v[249], v[250], v[638], v[3046], v[3426], v[272], v[294], v[299], v[665], v[669], v[676], v[677], v[307], v[437], v[438], v[439], v[447], v[449], v[450], v[917], v[918], v[938], v[943], v[947], v[949], v[3314], v[3326], v[152], v[153], v[166], v[958], v[961], v[964], v[611], v[613], v[620], v[623], v[632], v[283], v[290], v[293], v[1069], v[1072], v[1073], v[1074], v[1075], v[1076], v[1077], v[1078], v[1079], v[1080], v[1151], v[32], v[434], v[436], v[446], v[448], v[59], v[78], v[79], v[90], v[482], v[1295], v[2074], v[2075], v[3288], v[110], v[112], v[145], v[526], v[527], v[529], v[2107], v[163], v[192], v[956], v[957], v[959], v[960], v[962], v[963], v[965], v[966], v[967], v[968], v[970], v[202], v[242], v[609], v[612], v[615], v[649], v[650], v[3019], v[3439], v[266], v[286], v[652], v[1128], v[1129], v[1534], v[3347], v[951], v[87], v[863], v[1272], v[1273], v[1275], v[1276], v[920], v[866], v[867], v[869], v[871], v[873], v[875], v[107], v[671], v[673], v[1294], v[2073], v[913], v[1050], 
v[1438], v[282], v[1070], v[1071], v[1661], v[3286], v[3287], v[911], v[912], v[915], v[923], v[926], v[930], v[931], v[932], v[933], v[935], v[937], v[939], v[940], v[944], v[945], v[946], v[948], v[3323], v[952], v[953], v[954], v[955], v[628], v[71], v[458], v[212], v[214], v[639], v[641], v[642], v[643], v[644], v[647], v[648], v[2100], v[2322], v[2323], v[2324], v[2332], v[2347], v[2352], v[2362], v[2367], v[2374], v[2380], v[2389], v[2048], v[2420], v[2934], v[2601], v[3028], v[2325], v[2327], v[2328], v[2329], v[2330], v[2339], v[2345], v[2354], v[2356], v[2360], v[2371], v[2382], v[2385], v[3122], v[2024], v[3258], v[1768], v[1772], v[2621], v[3006], v[3009], v[2370], v[2376], v[2396], v[2343], v[2372], v[3203], v[2313], v[2344], v[2349], v[3236], v[1769], v[710], v[657]]}
Another option is a combination of the project() and tree() steps.
Something like:
g.V().has('name', 'foo').project('in', 'out').
by(repeat(in().simplePath()).emit().times(10).dedup().
tree().by(values('name'))).
by(repeat(out().simplePath()).emit().times(10).dedup().
tree().by(values('name')))
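To picture the kind of nested in / out response the question asks for, here is a small plain-Python sketch over a made-up adjacency map. It is not what tree() returns from a Gremlin server; the graph, the function name `nested`, and the depth limit are assumptions chosen only to show the shape of the result.

```python
def nested(adjacency, node, max_depth, path=()):
    """Build a nested dict of neighbours, never revisiting a vertex on the
    current path (the same idea as simplePath())."""
    path = path + (node,)
    if max_depth == 0:
        return {}
    return {
        nxt: nested(adjacency, nxt, max_depth - 1, path)
        for nxt in adjacency.get(node, [])
        if nxt not in path
    }


# Hypothetical graph: foo -> a -> c -> foo, and foo -> b.
out_edges = {"foo": ["a", "b"], "a": ["c"], "c": ["foo"]}

# Reverse the edges to get the incoming direction.
in_edges = {}
for src, dsts in out_edges.items():
    for dst in dsts:
        in_edges.setdefault(dst, []).append(src)

result = {
    "in": nested(in_edges, "foo", 10),
    "out": nested(out_edges, "foo", 10),
}
```

Each key is a vertex name and each value holds the vertices one hop further away, so the depth of a vertex is simply its nesting level.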

Gremlin query uneven result issue

Suppose I have 3 students (A, B, C), each with a major subject and marks, but when I query them the result comes back in an uneven order.
Data
A -> Math -> 77
B -> History -> 70
C -> Science -> 97
Query
g.V('Class').has('name',within('A','B','C'))
Result
{"student_name":['A','B','C'], "major_subject":['Math','Science','History'], "marks":[70,77,97]}
The data displayed by querying the database is not in order according to the name of the student.
I assume that your graph looks kinda like this:
g = TinkerGraph.open().traversal()
g.addV('student').property('name', 'A').
addE('scored').to(addV('subject').property('name', 'Math')).
property('mark', 77).
addV('student').property('name', 'B').
addE('scored').to(addV('subject').property('name', 'History')).
property('mark', 70).
addV('student').property('name', 'C').
addE('scored').to(addV('subject').property('name', 'Science')).
property('mark', 97).iterate()
Now the easiest way to gather the data is this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('student').
outE('scored').as('mark').inV().as('major').
select('student','major','mark').
by('name').
by('name').
by('mark')
==>[student:A,major:Math,mark:77]
==>[student:B,major:History,mark:70]
==>[student:C,major:Science,mark:97]
But if you really depend on the format shown in your question, you can do this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).
store('student').by('name').
outE('scored').store('mark').by('mark').
inV().store('major').by('name').
cap('student','major','mark')
==>[major:[Math,History,Science],student:[A,B,C],mark:[77,70,97]]
If you want to get the cap'ed result to be ordered by marks, you'll need a mix of the 2 queries:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('a').
outE('scored').as('b').
order().
by('mark').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[History,Math,Science],student:[B,A,C],mark:[70,77,97]]
To order by the order of inputs:
gremlin> input = ['C', 'B', 'A']; []
gremlin> g.V().has('student', 'name', within(input)).as('a').
order().
by {input.indexOf(it.value('name'))}.
outE('scored').as('b').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[Science,History,Math],student:[C,B,A],mark:[97,70,77]]
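The underlying point of these queries is that the three lists only stay aligned if each student's row is ordered as a unit before it is split into columns. A plain-Python sketch of that idea, using the sample data from the question (the `rows` layout and the function name `columns` are assumptions for illustration):

```python
# One row per student: (student, major, mark), as in the question's data.
rows = [("A", "Math", 77), ("B", "History", 70), ("C", "Science", 97)]


def columns(rows, key):
    """Sort whole rows first, then split them into aligned column lists."""
    ordered = sorted(rows, key=key)
    return {
        "student": [r[0] for r in ordered],
        "major": [r[1] for r in ordered],
        "mark": [r[2] for r in ordered],
    }


# Ordered by mark, ascending.
by_mark = columns(rows, key=lambda r: r[2])

# Ordered by the position of the student in a given input list.
wanted = ["C", "B", "A"]
by_input = columns(rows, key=lambda r: wanted.index(r[0]))
```

Sorting the rows before splitting is what keeps index i of every list pointing at the same student.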

Data to plot graph in gremlin

I am new to Gremlin and need to provide the data to plot a graph.
Graph has
x axis -> timestamp
y-axis -> Sum of product
Below is the data in graph format
Nodes relation(properties) Nodes
userA likes(timestamp = 22/02/2013) productXY
userX likes(timestamp = 21/05/2013) productAA
userG likes(timestamp = 22/07/2014) productXB
userT likes(timestamp = 03/02/2013) productXR
userA likes(timestamp = 22/02/2013) productXT
userC likes(timestamp = 19/11/2014) productUY
userD likes(timestamp = 22/07/2014) productPY
userE likes(timestamp = 09/07/2013) productLY
userJ likes(timestamp = 09/07/2013) productXY
userP likes(timestamp = 09/07/2013) productKY
Output of the query should be like this.
[09/07/2013, 3]
[22/02/2013, 2]
[21/05/2013, 1]
[22/07/2014, 2]
[03/02/2013, 1]
[19/11/2014, 1]
Could somebody help me build the query?
Note: I am using the Rexster REST API to deliver the data to the application.
Thanks in advance.
Answered in the gremlin-users mailing list.
Basically, you can use the Gremlin extension and this query:
g.V().outE("likes").timestamp.groupCount().cap().next()
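What groupCount() computes here can be checked against the sample data with a short plain-Python sketch. The `likes` list is just the table from the question typed in by hand, and the variable names are assumptions for illustration.

```python
from collections import Counter

# (user, timestamp, product) rows copied from the question's table.
likes = [
    ("userA", "22/02/2013", "productXY"),
    ("userX", "21/05/2013", "productAA"),
    ("userG", "22/07/2014", "productXB"),
    ("userT", "03/02/2013", "productXR"),
    ("userA", "22/02/2013", "productXT"),
    ("userC", "19/11/2014", "productUY"),
    ("userD", "22/07/2014", "productPY"),
    ("userE", "09/07/2013", "productLY"),
    ("userJ", "09/07/2013", "productXY"),
    ("userP", "09/07/2013", "productKY"),
]

# Count how many 'likes' edges carry each timestamp.
counts = Counter(timestamp for _, timestamp, _ in likes)

# [timestamp, count] pairs, the shape requested for plotting.
points = [[timestamp, n] for timestamp, n in counts.items()]
```

The counts match the expected output listed in the question, with the timestamp on the x axis and the count on the y axis.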
