Query to find node which has only one vertex in common - graph

I have the following graph:
Person1 -> Device1 <- Person2
   |         ^
   v         |
Email1 <- Person3
Now I want to write a Gremlin query (JanusGraph) that returns all persons who are connected to Person1 only through the device that Person1 is connected to.
So for the graph above, the output should be [Person2].
Person3 is not in the output because Person3 is also connected to "Email1", which belongs to "Person1".

g.addV('person').property('name', 'Person1').as('p1').
addV('person').property('name', 'Person2').as('p2').
addV('person').property('name', 'Person3').as('p3').
addV('device').as('d1').
addV('email').as('e1').
addE('HAS_DEVICE').from('p1').to('d1').
addE('HAS_EMAIL').from('p1').to('e1').
addE('HAS_DEVICE').from('p2').to('d1').
addE('HAS_DEVICE').from('p3').to('d1').
addE('HAS_EMAIL').from('p3').to('e1')
The following traversal will give you the person vertices that are connected to "Person1" via one or more "device" vertices and not connected via any other type of vertex.
g.V().has('person', 'name', 'Person1').as('p1').
out().as('connector').
in().where(neq('p1')).
group().
by().
by(select('connector').label().fold()).
unfold().
where(
select(values).
unfold().dedup().fold(). // just in case the persons are connected by multiple devices
is(eq(['device']))
).
select(keys)
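To make the grouping logic easier to check by hand, here is a plain-Python model of what the traversal computes. It is only a sketch: the dictionary below is a hypothetical stand-in for the sample graph, and the function name is made up for illustration. None of it is JanusGraph or Gremlin API.

```python
# Hypothetical in-memory stand-in for the sample graph:
# each person maps to the (label, name) pairs of the vertices it points to.
out_edges = {
    "Person1": {("device", "Device1"), ("email", "Email1")},
    "Person2": {("device", "Device1")},
    "Person3": {("device", "Device1"), ("email", "Email1")},
}


def connected_only_via(start, label):
    """Return persons whose shared vertices with `start` all carry `label`."""
    result = []
    for person, targets in sorted(out_edges.items()):
        if person == start:
            continue  # mirrors where(neq('p1'))
        # Labels of the connector vertices shared with the start person;
        # mirrors group().by().by(select('connector').label().fold())
        # followed by dedup().
        shared_labels = {lbl for (lbl, _name) in targets & out_edges[start]}
        # Mirrors is(eq(['device'])): exactly one distinct label, the wanted one.
        if shared_labels == {label}:
            result.append(person)
    return result


print(connected_only_via("Person1", "device"))  # ['Person2']
```

Person3 is dropped because its set of shared labels is {'device', 'email'}, which is the same reason the Gremlin filter rejects it.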

Related

Janusgraph Gremlin addE created different vertices id after index implemented

I am new to JanusGraph and Gremlin. When I create an edge between two existing vertices, I expect the returned edge to show the same source and target vertex ids that I used to create it. Instead, two new ids are returned. Moreover, when I look up the edges connected to one of the vertices ("tom"), the edge appears to connect "tom" to itself under a different id, yet the vertex count is still 2.
gremlin> g.V().count()
==>0
gremlin> tom = g.addV("party").property("name", "Tom").property("identity_number", "01234567")\
.property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[57402]
gremlin> mary = g.addV("party").property("name", "Mary").property("identity_number", "76543210")\
.property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[61626]
gremlin> g.V(tom).addE('relationship').to(mary)
==>e[3k4-18ci-80et-1bia][57474-relationship->61570]
gremlin> g.V(tom).bothE().otherV().path().by(__.valueMap().with(WithOptions.tokens))
==>path[{id=57402, label=party, identity_country=[USA], identity_number=[01234567],\
identity_type=[PASSPORT], name=[Tom]}, {id=3k4-18ci-80et-1bia, label=relationship},\
{id=57474, label=party, identity_country=[USA], identity_number=[01234567], identity_type=[PASSPORT],\
name=[Tom]}]
gremlin> g.V().count()
==>2
Could anyone tell me whether this is normal, or whether some configuration causes it?
Many thanks.
UPDATE:
I found that this started happening after I created the JanusGraph indexes with the following code:
m = amlGraph.openManagement();
party = m.makeVertexLabel('party').partition().make();
relationship = m.makeEdgeLabel('relationship').make();
identity_country_key = m.makePropertyKey('identity_country').dataType(String.class).make();
identity_number_key = m.makePropertyKey('identity_number').dataType(String.class).make();
identity_type_key = m.makePropertyKey('identity_type').dataType(String.class).make();
name_key = m.makePropertyKey('name').dataType(String.class).make();
first_seen_datetime_key = m.makePropertyKey('first_seen_datetime').dataType(Date.class).make();
relationship_type_key = m.makePropertyKey('relationship_type').dataType(String.class).make();
party = m.getVertexLabel('party');
identity_country_key = m.getPropertyKey('identity_country');
identity_number_key = m.getPropertyKey('identity_number');
identity_type_key = m.getPropertyKey('identity_type');
name_key = m.getPropertyKey('name');
m.buildIndex('partyMixed', Vertex.class).
  addKey(identity_country_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('identity_country', 'identity_country')).
  addKey(identity_number_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('identity_number', 'identity_number')).
  addKey(identity_type_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('identity_type', 'identity_type')).
  addKey(name_key, Mapping.TEXTSTRING.asParameter(), Parameter.of('name', 'name')).
  indexOnly(party).
  buildMixedIndex('search');
relationship = m.getEdgeLabel('relationship');
first_seen_datetime_key = m.getPropertyKey('first_seen_datetime');
relationship_type_key = m.getPropertyKey('relationship_type');
m.buildIndex('relationshipMixed', Edge.class).
  addKey(first_seen_datetime_key).
  addKey(relationship_type_key).
  indexOnly(relationship).
  buildMixedIndex('search');
m.commit()
Which version of JanusGraph are you using? If it is an older version, this may be a bug.
I used one of the latest versions (0.5.3), tried to reproduce the same scenario, and got the correct ids.
gremlin>
gremlin> tom = g.addV("party").property("name", "Tom").property("identity_number", "01234567").property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[4112]
gremlin>
gremlin> mary = g.addV("party").property("name", "Mary").property("identity_number", "76543210").property("identity_type", "PASSPORT").property("identity_country", "USA").next()
==>v[40964232]
gremlin>
gremlin> g.V(tom).addE('relationship').to(mary)
==>e[2rm-368-3ehh-oe07c][4112-relationship->40964232]
gremlin>

How to sort vertices by edge values in Gremlin

Given the air-routes graph, say I want to get all possible one-stopover routes, like so:
[home] --distance--> [stopover] --distance--> [destination]
where [home], [stopover] and [destination] are airport nodes that each have a property 'code' holding the airport code, and distance is an integer weight on each edge connecting two airport nodes.
How could I write a query that gets me the airport codes for [home], [stopover] and [destination] such that the results are sorted as follows:
1. [home] airport codes are sorted alphabetically.
2. For each [home] airport, the [stopover] airport codes are sorted by the distance between [home] and [stopover] (ascending).
3. After sorting by 1 and 2, [destination] airport codes are sorted by the distance between [stopover] and [destination].
(Note: it doesn't matter if [home] and [destination] are the same airport)
One way you could do this is with group() and nested by() modulators.
g.V().
group().
by('code').
by(
outE('route').
order().by('dist').
inV().
group().
by('code').
by(
outE('route').
order().by('dist').
inV().
values('code').fold())).
unfold()
The result is something like:
1. {'SHH': {'WAA': ['KTS', 'SHH', 'OME'], 'OME': ['TLA', 'WMO', 'KTS', 'GLV', 'ELI', 'TNC', 'WAA', 'WBB', 'SHH', 'SKK', 'KKA', 'UNK', 'SVA', 'OTZ', 'GAM', 'ANC']}}
2. {'KWN': {'BET': ['WNA', 'KWT', 'ATT', 'KUK', 'TLT', 'EEK', 'WTL', 'KKH', 'KWN', 'KLG', 'MLL', 'KWK', 'PQS', 'CYF', 'KPN', 'NME', 'OOK', 'GNU', 'VAK', 'SCM', 'HPB', 'EMK', 'ANC'], 'EEK': ['KWN', 'BET'], 'TOG': ['KWN']}}
3. {'NUI': {'SCC': ['NUI', 'BTI', 'BRW', 'FAI', 'ANC'], 'BRW': ['ATK', 'AIN', 'NUI', 'PIZ', 'SCC', 'FAI', 'ANC']}}
4. {'PSG': {'JNU': ['HNH', 'GST', 'HNS', 'SGY', 'SIT', 'KAE', 'PSG', 'YAK', 'KTN', 'ANC', 'SEA'], 'WRG': ['PSG', 'KTN']}}
5. {'PIP': {'UGB': ['PTH']}}
.
.
.
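As a rough cross-check of the three sort levels asked for in the question, the same nesting can be modelled in plain Python. This is a sketch under assumptions: the routes list is made-up sample data (not taken from air-routes), the function name is hypothetical, and the home codes are sorted explicitly here.

```python
# Hypothetical routes: (origin, destination, distance). Not real air-routes data.
routes = [
    ("AAA", "CCC", 300),
    ("AAA", "BBB", 100),
    ("BBB", "AAA", 100),
    ("BBB", "CCC", 50),
    ("CCC", "AAA", 300),
]


def one_stop_routes(routes):
    """Return {home: {stopover: [destinations]}} with all three levels ordered."""
    # Outgoing legs per airport, sorted by distance (ascending).
    by_origin = {}
    for origin, dest, dist in routes:
        by_origin.setdefault(origin, []).append((dist, dest))
    for legs in by_origin.values():
        legs.sort()

    result = {}
    for home in sorted(by_origin):                    # 1. home codes, alphabetical
        result[home] = {
            stop: [dest for _d, dest in by_origin.get(stop, [])]  # 3. by 2nd leg
            for _dist, stop in by_origin[home]        # 2. stopovers by 1st leg
        }
    return result


for home, stops in one_stop_routes(routes).items():
    print(home, stops)
```

Python dictionaries keep insertion order, so iterating the result visits homes, stopovers and destinations in the order described in the question.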

Gremlin: how can I return vertex and their associated vertex?

I need to return some groups and the people in each group, like this:
Group A
-----Person A
-----Person B
-----Person C
Group B
-----Person D
-----Person E
-----Person F
How can I do that with Gremlin? The people are connected to their group by an edge.
It is always helpful to include a sample graph with your questions on Gremlin, preferably as a script that can be easily pasted into the Gremlin Console, as follows:
g.addV('group').property('name','Group A').as('ga').
addV('group').property('name','Group B').as('gb').
addV('person').property('name','Person A').as('pa').
addV('person').property('name','Person B').as('pb').
addV('person').property('name','Person C').as('pc').
addV('person').property('name','Person D').as('pd').
addV('person').property('name','Person E').as('pe').
addV('person').property('name','Person F').as('pf').
addE('contains').from('ga').to('pa').
addE('contains').from('ga').to('pb').
addE('contains').from('ga').to('pc').
addE('contains').from('gb').to('pd').
addE('contains').from('gb').to('pe').
addE('contains').from('gb').to('pf').iterate()
A solution to your problem is to use the group() step:
gremlin> g.V().has('group', 'name', within('Group A','Group B')).
......1> group().
......2> by('name').
......3> by(out('contains').values('name').fold())
==>[Group B:[Person D,Person E,Person F],Group A:[Person A,Person B,Person C]]

Gremlin query uneven result issue

Suppose I have 3 students (A, B, C), each with a major subject and a mark. When I query the data, the result comes back in an uneven order.
Data
A -> Math -> 77
B -> History -> 70
C -> Science -> 97
Query
g.V('Class').has('name',within('A','B','C'))
Result
{"student_name":['A','B','C'], "major_subject":['Math','Science','History'], "marks":[70,77,97]}
The data returned by the query is not ordered by student name.
I assume that your graph looks kinda like this:
g = TinkerGraph.open().traversal()
g.addV('student').property('name', 'A').
addE('scored').to(addV('subject').property('name', 'Math')).
property('mark', 77).
addV('student').property('name', 'B').
addE('scored').to(addV('subject').property('name', 'History')).
property('mark', 70).
addV('student').property('name', 'C').
addE('scored').to(addV('subject').property('name', 'Science')).
property('mark', 97).iterate()
Now the easiest way to gather the data is this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('student').
outE('scored').as('mark').inV().as('major').
select('student','major','mark').
by('name').
by('name').
by('mark')
==>[student:A,major:Math,mark:77]
==>[student:B,major:History,mark:70]
==>[student:C,major:Science,mark:97]
But if you really depend on the format shown in your question, you can do this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).
store('student').by('name').
outE('scored').store('mark').by('mark').
inV().store('major').by('name').
cap('student','major','mark')
==>[major:[Math,History,Science],student:[A,B,C],mark:[77,70,97]]
If you want to get the cap'ed result to be ordered by marks, you'll need a mix of the 2 queries:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('a').
outE('scored').as('b').
order().
by('mark').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[History,Math,Science],student:[B,A,C],mark:[70,77,97]]
To order by the order of inputs:
gremlin> input = ['C', 'B', 'A']; []
gremlin> g.V().has('student', 'name', within(input)).as('a').
order().
by {input.indexOf(it.value('name'))}.
outE('scored').as('b').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[Science,History,Math],student:[C,B,A],mark:[97,70,77]]
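The idea behind both ordered variants is the same: order the rows first, then collect each column, so the lists stay aligned with each other. A minimal plain-Python sketch of that idea follows, using the sample data from the question. The `to_columns` helper is a hypothetical name used only for illustration and is not part of Gremlin.

```python
# Sample data from the question, one row per student.
rows = [
    {"student": "A", "major": "Math", "mark": 77},
    {"student": "B", "major": "History", "mark": 70},
    {"student": "C", "major": "Science", "mark": 97},
]


def to_columns(rows, key):
    """Order the rows by `key`, then collect each column into its own list."""
    ordered = sorted(rows, key=key)
    return {col: [row[col] for row in ordered]
            for col in ("major", "student", "mark")}


# Ordered by mark, like order().by('mark') before the aggregate() steps.
by_mark = to_columns(rows, key=lambda r: r["mark"])
print(by_mark)

# Ordered by position in the input list, like the indexOf lambda.
input_order = ["C", "B", "A"]
by_input = to_columns(rows, key=lambda r: input_order.index(r["student"]))
print(by_input)
```

Because every column is read from the same ordered list of rows, position i in each list always refers to the same student.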

How would you model this non-relational database?

I am making a graph database using Neo4j and I'm wondering what's the best way to model this case:
Person1 > told > quote > to > Person2 > who told it to -> Person3 -> who told it to -> Person4 > Who told it to -> Person1
I've thought about making the quote an attribute of the link. But then maybe the quote also needs to be a node. In that case the edges would be "told" and "was_told", like:
Person1 -> created > quote
Quote attributes: id, text
Person attributes: id, name
Person2 > told: {to: Person 3} > quote
Person3 > was_told: {by: Person2} > quote
or:
Person3 > told:quote > Person1
What's the best approach to use to model this database?
I think you need the following model:
A fragment (talk) of a conversation (including time)
Who was the speaker of this fragment
Who was an audience of this fragment
Content (quote) of this fragment
For example, here's the code for creating the first fragment:
MERGE (P1:Person {name:'Person1'})
MERGE (P2:Person {name:'Person2'})
MERGE (Q:Quote {name:'Quote1', text:'Quote1 text'})
MERGE (P1)<-[:has_speaker]-(T1:Talk {name:'Talk1', time: 1})-[:has_audience]->(P2)
MERGE (T1)-[:talk_about]->(Q)
The query for the entire life cycle of a quote:
MATCH (Q:Quote {name:'Quote1', text:'Quote1 text'})<-[:talk_about]-(T:Talk)
WITH Q, T
MATCH (P1:Person)<-[:has_speaker]-(T)-[:has_audience]->(P2)
WITH Q, T, P1 as speaker, collect(P2.name) as audience ORDER BY T.time ASC
RETURN Q as quote,
collect( {time: T.time,
speaker: speaker.name,
audience: audience}
) as quoteTimeline

Resources