I have a graph similar to the one provided here. I have simplified it so that airports are vertices and each edge represents a person travelling through those airports. I want to find the number of people who have travelled between two airports, from b to f, without passing through airport d. I also want to order the results from highest to lowest traffic.
Sample Graph:
https://gremlify.com/bgdnijf9xs6
If the above question doesn't provide clarity, here's the simple form:
Find the paths between two vertices that avoid a given mid vertex (any vertex can be the excluded midpoint). Sort the paths from highest to lowest traffic based on an edge property (the property has a unique value and connects the vertices).
To identify a person we have a uniquename on the edge. If the uniquename is the same, we know it is the same person travelling to the destination. So edges with the same uniquename along a -> b -> c are essentially the same person travelling.
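The identity rule can be sketched in plain Python (not Gremlin); the edge list and names here are made up for illustration:

```python
# Edges that share a 'uniquename' belong to one person's trip.
edges = [
    ("a", "b", "p1"),
    ("b", "c", "p1"),
    ("a", "b", "p2"),
]

trips = {}
for src, dst, uniquename in edges:
    trips.setdefault(uniquename, []).append((src, dst))

# p1's edges a->b and b->c form a single trip a -> b -> c
assert trips["p1"] == [("a", "b"), ("b", "c")]
assert trips["p2"] == [("a", "b")]
```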
For the path query I have:
g.V()
 .has('name', 'b')
 .repeat(
   out('person').not(__.has('name', 'd'))
 )
 .until(has('name', 'f'))
 .path()
 .dedup()
 .fold()
The output should be the following:
b -> c -> c1 -> e -> f count(3) // 3 person travelled full path
b -> c -> b2 -> e -> f count(2) // 2 person travelled full path
b -> c -> b3 -> e -> f count(1) // 1 ...
Or if you want to go from a to g then
a -> b -> c -> c1 -> e -> f -> g count(3) // 3 person travelled full path
a -> b -> c -> b2 -> e -> f -> g count(2) // 2 person travelled full path
a -> b -> c -> b3 -> e -> f -> g count(1) // 1 ...
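The desired output above is essentially a group-count of full paths, ordered by traffic. A plain-Python sketch (not Gremlin) of that aggregation, with the paths and repetition counts copied from the example:

```python
from collections import Counter

# One entry per person per full path, mirroring the desired counts above.
traversed = (
    [("b", "c", "c1", "e", "f")] * 3 +
    [("b", "c", "b2", "e", "f")] * 2 +
    [("b", "c", "b3", "e", "f")] * 1
)

traffic = Counter(traversed)
ordered = sorted(traffic.items(), key=lambda kv: -kv[1])
assert [count for _, count in ordered] == [3, 2, 1]
assert ordered[0][0] == ("b", "c", "c1", "e", "f")
```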
Here is what I have tried up till now: https://gremlify.com/fz54u5jiszo
Edit: here is the latest query I have come up with:
g.V().has('name', 'c').as('c')
 .sideEffect(
   V().has('name', 'a').aggregate('a')
    .V().has('name', 'b').aggregate('b')
    .V().has('name', 'e').aggregate('e')
    .V().has('name', 'f').aggregate('f')
    .V().has('name', 'g').aggregate('g')
 )
 .barrier()
 // Get all users from start to finish
 .sideEffect(
   select('a').unfold().outE().where(inV().has('name', 'b')).dedup().aggregate('before_users')
 )
 .sideEffect(
   select('b').unfold().outE().where(inV().has('name', 'c')).dedup().aggregate('before_users')
 )
 .sideEffect(
   select('before_users').unfold().fold().unfold()
    .groupCount()
    .by(values('uniquename').fold())
    .unfold()
    .where(select(values).is(eq(2)))
    .select(keys)
    .unfold()
    .aggregate('unique_before_users')
 )
 .sideEffect(
   select('e').unfold().outE().where(inV().has('name', 'f')).dedup().aggregate('after_users')
 )
 .sideEffect(
   select('f').unfold().outE().where(inV().has('name', 'g')).dedup().aggregate('after_users')
 )
 .sideEffect(
   select('after_users').unfold().fold().unfold()
    .groupCount()
    .by(values('uniquename').fold())
    .unfold()
    .where(select(values).is(eq(2)))
    .select(keys)
    .unfold()
    .aggregate('unique_after_users')
 )
 .sideEffect(
   project('')
    .union(select('unique_after_users').unfold(), select('unique_before_users').unfold())
    .groupCount()
    .unfold()
    .where(select(values).is(eq(2)))
    .select(keys)
    .unfold()
    .aggregate('unique_users')
 )
 .barrier()
 // Start to analyze traffic based on our criteria:
 // not through d
 .sideEffect(
   identity()
    .repeat(
      outE()
       .where(within('unique_users')).by('uniquename').by()
       .inV()
       .not(__.has('name', 'd'))
    )
    .until(has('name', 'e'))
    .path()
    .aggregate('allpath')
 )
 .select('allpath')
 .unfold()
 .map(
   project('path', 'count')
    .by(identity())
    .by(identity().unfold().filter(where(hasLabel('airport'))).fold())
 )
 .groupCount()
 .by('count')
To replicate the sample graph:
g.addV('airport').as('1').property(single, 'name', 'a').
addV('airport').as('2').property(single, 'name', 'b').
addV('airport').as('3').property(single, 'name', 'c').
addV('airport').as('4').property(single, 'name', 'd').
addV('airport').as('5').property(single, 'name', 'e').
addV('airport').as('6').property(single, 'name', 'f').
addV('airport').as('7').property(single, 'name', 'g').
addV('airport').as('8').property(single, 'name', 'b1').
addV('airport').as('9').property(single, 'name', 'b2').
addV('airport').as('10').property(single, 'name', 'b3').
addE('person').from('1').to('2').property('uniquename', 'p1').
addE('person').from('1').to('2').property('uniquename', 'p2').
addE('person').from('2').to('3').property('uniquename', 'p3').
addE('person').from('2').to('3').property('uniquename', 'p1').
addE('person').from('2').to('3').property('uniquename', 'p4').
addE('person').from('2').to('3').property('uniquename', 'p21').
addE('person').from('2').to('3').property('uniquename', 'p2').
addE('person').from('2').to('3').property('uniquename', 'p22').
addE('person').from('2').to('3').property('uniquename', 'p31').
addE('person').from('3').to('4').property('uniquename', 'p1').
addE('person').from('3').to('8').property('uniquename', 'p21').
addE('person').from('3').to('8').property('uniquename', 'p2').
addE('person').from('3').to('8').property('uniquename', 'p22').
addE('person').from('3').to('9').property('uniquename', 'p3').
addE('person').from('3').to('10').property('uniquename', 'p4').
addE('person').from('3').to('9').property('uniquename', 'p31').
addE('person').from('4').to('5').property('uniquename', 'p1').
addE('person').from('5').to('6').property('uniquename', 'p1').
addE('person').from('5').to('6').property('uniquename', 'p21').
addE('person').from('5').to('6').property('uniquename', 'p2').
addE('person').from('5').to('6').property('uniquename', 'p22').
addE('person').from('6').to('7').property('uniquename', 'p1').
addE('person').from('6').to('7').property('uniquename', 'p21').
addE('person').from('6').to('7').property('uniquename', 'p2').
addE('person').from('6').to('7').property('uniquename', 'p22').
addE('person').from('8').to('5').property('uniquename', 'p21').
addE('person').from('8').to('5').property('uniquename', 'p2').
addE('person').from('8').to('5').property('uniquename', 'p22').
addE('person').from('9').to('5').property('uniquename', 'p3').
addE('person').from('10').to('5').property('uniquename', 'p4')
Using the Gremlin Console, here is a query that uses sack to collect the uniquename values as the query progresses. The sack manipulation looks a little odd because you cannot sack(sum) when dealing with strings.
gremlin> g.withSack([]).V().
......1> has("name", 'b').
......2> repeat(outE('person').
......3> sack(assign).
......4> by(union(sack().unfold(),values('uniquename')).fold()).
......5> inV().has('name', neq('d'))).
......6> until(has('name', 'f')).
......7> where(sack().unfold().dedup().count().is(1)).
......8> path().
......9> by('name').
.....10> by('uniquename')
When run, this yields:
==>[b,p21,c,p21,b1,p21,e,p21,f]
==>[b,p2,c,p2,b1,p2,e,p2,f]
==>[b,p22,c,p22,b1,p22,e,p22,f]
Related
I have a query that needs to get the complete path based on a property. There are locations and people; people can travel from one location to another, so I want a complete map of where they started from and ended at.
Suppose
P1 travels from a -> b -> c -> d -> e -> f
P2 travels from c -> d -> e -> f
P3 travels from a -> b -> c -> d
P4 travels from b -> c -> a -> d -> e -> f
P5 travels from e -> f -> a -> b -> c
P6 travels from d -> e -> a -> b -> c
P7 travels from a -> c -> e -> f
Those are the paths that I want from the graph, where p1, p2 ... pn is a property on the edges called name.
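Recovering each person's ordered path from the edge list amounts to grouping edges by the name property and then chaining them. A plain-Python sketch (not Gremlin; the edge data is illustrative, and like the query below it assumes a trip does not revisit its starting vertex):

```python
# Group edges by the person identifier, then walk the chain from the
# vertex that never appears as a destination.
edges = [
    ("a", "b", "p1"), ("b", "c", "p1"), ("c", "d", "p1"),
    ("c", "d", "p2"), ("d", "e", "p2"),
]

by_person = {}
for src, dst, name in edges:
    by_person.setdefault(name, {})[src] = dst

def trip(hops):
    # The start vertex is a source that never appears as a destination.
    start = (set(hops) - set(hops.values())).pop()
    path = [start]
    while path[-1] in hops:
        path.append(hops[path[-1]])
    return path

assert trip(by_person["p1"]) == ["a", "b", "c", "d"]
assert trip(by_person["p2"]) == ["c", "d", "e"]
```

Note that this breaks for circular trips (start == end), which is the same limitation described below.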
I already came up with a query, but I don't know how to optimize it. It also can't handle people whose travel starts and ends at the same vertex. I have a time on every session (but maybe Gremlin can't traverse by the previous time within the same session?).
g.withSack([])
 .V() // will eventually have some starting condition of about 10 unique people
 .repeat(
   choose(
     loops().is(0),
     outE().as('outgoing')
      .where(
        __.outV()
         .inE().values('name')
         .where(eq('outgoing'))
          .by()
          .by(values('name'))
         .count().is(0)
      )
      .sack(assign)
      .by(
        union(
          sack().unfold(),
          identity().values('name')
        )
        .fold()
      )
      .filter(
        sack().unfold().dedup().count().is(1)
      )
      .inV(),
     outE()
      .sack(assign)
      .by(
        union(
          sack().unfold(),
          identity().values('name')
        )
        .fold()
      )
      .filter(
        sack().unfold().dedup().count().is(1)
      )
      .inV()
   )
 )
 .until(
   outE().filter(sack().unfold().dedup().count().is(1)).count().is(1)
 )
 .filter(path().unfold().count().is(gt(5)))
 .path()
Right now there are several limitations. It gets every starting path from a provided vertex, but the query frequently fails with:
{
"detailedMessage": "A timeout occurred within the script during evaluation.",
"requestId": "f34358bd-9db9-488f-be66-613a34d29f9b",
"code": "TimeLimitExceededException"
}
Or with a memory exception. Is there some way to optimize this query? I can't exactly replicate this in Gremlify since I have about 50,000 unique sessions, and each of those travels through anywhere from 2 to 50 vertices.
I will eventually perform traffic analysis on it, but I still can't get this to complete within Neptune's default timeout, even with a limit of about 1000. Ideally I would like it to run within 10 seconds at most, or 30 at the upper limit.
Here's a relatively simple way to replicate this graph:
g.addV('place').as('1').
property(single, 'placename', 'a').
addV('place').as('2').
property(single, 'placename', 'b').
addV('place').as('3').
property(single, 'placename', 'c').
addV('place').as('4').
property(single, 'placename', 'd').
addV('place').as('5').
property(single, 'placename', 'e').
addV('place').as('6').
property(single, 'placename', 'f').
addV('place').as('7').
property(single, 'placename', 'g').
addV('place').as('8').
property(single, 'placename', 'h').
addV('place').as('9').
property(single, 'placename', 'i').
addE('person').from('1').to('2').
property('name', 'p1').addE('person').
from('2').to('3').property('name', 'p1').
addE('person').from('3').to('4').
property('name', 'p1').addE('person').
from('4').to('5').property('name', 'p1').
addE('person').from('2').to('3').
property('name', 'p2').addE('person').
from('3').to('4').property('name', 'p2').
addE('person').from('4').to('5').
property('name', 'p2').addE('person').
from('6').to('7').property('name', 'p3').
property('time', '2022-05-04 12:00:00').
addE('person').from('7').to('8').
property('name', 'p3').
property('time', '2022-05-05 12:00:00').
addE('person').from('8').to('9').
property('name', 'p3').
property('time', '2022-05-10 12:00:00').
addE('person').from('9').to('6').
property('name', 'p3').
property('time', '2022-05-03 12:00:00').
addE('person').from('5').to('6').
property('name', 'p4').addE('person').
from('6').to('7').property('name', 'p4').
addE('person').from('7').to('8').
property('name', 'p4').addE('person').
from('8').to('9').property('name', 'p4').
addE('person').from('3').to('4').
property('name', 'p5').addE('person').
from('4').to('4').property('name', 'p5').
addE('person').from('4').to('5').
property('name', 'p5').addE('person').
from('5').to('6').property('name', 'p5').
addE('person').from('6').to('7').
property('name', 'p5').addE('person').
from('1').to('2').property('name', 'p6').
addE('person').from('2').to('3').
property('name', 'p6').addE('person').
from('3').to('4').property('name', 'p6').
addE('person').from('4').to('5').
property('name', 'p6')
Also, I am starting with about 5,000 vertices in the graph, as I have set the condition that 5 people must start from a place for it to be considered a valid starting point.
I have the following vertices:
Person1 -> Device1 <- Person2
   |          ^
   v          |
Email1 <-- Person3
Now I want to write a Gremlin query (JanusGraph) which will give me all persons connected only via the device to which Person1 is connected.
So, according to the above graph, the output should be [Person2].
Person3 is not in the output because Person3 is also connected to Person1's "Email1".
g.addV('person').property('name', 'Person1').as('p1').
addV('person').property('name', 'Person2').as('p2').
addV('person').property('name', 'Person3').as('p3').
addV('device').as('d1').
addV('email').as('e1').
addE('HAS_DEVICE').from('p1').to('d1').
addE('HAS_EMAIL').from('p1').to('e1').
addE('HAS_DEVICE').from('p2').to('d1').
addE('HAS_DEVICE').from('p3').to('d1').
addE('HAS_EMAIL').from('p3').to('e1')
The following traversal will give you the person vertices that are connected to "Person1" via one or more "device" vertices and not via any other type of vertex:
g.V().has('person', 'name', 'Person1').as('p1').
out().as('connector').
in().where(neq('p1')).
group().
by().
by(select('connector').label().fold()).
unfold().
where(
select(values).
unfold().dedup().fold(). // just in case the persons are connected by multiple devices
is(eq(['device']))
).
select(keys)
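The grouping logic of that traversal can be sketched in plain Python (not Gremlin): collect each other person's connector-vertex labels and keep only people whose label set is exactly {'device'}. The tuples below mirror the sample graph's edges:

```python
# (person, connector vertex, connector label) — derived from the sample graph.
connections = [
    ("Person1", "d1", "device"),
    ("Person1", "e1", "email"),
    ("Person2", "d1", "device"),
    ("Person3", "d1", "device"),
    ("Person3", "e1", "email"),
]

# Vertices adjacent to Person1 (the out().as('connector') step).
p1_vertices = {v for p, v, _ in connections if p == "Person1"}

# Group each other person by the labels of the shared connector vertices.
labels = {}
for person, vertex, label in connections:
    if person != "Person1" and vertex in p1_vertices:
        labels.setdefault(person, set()).add(label)

# Keep only people connected exclusively through devices.
result = [p for p, ls in labels.items() if ls == {"device"}]
assert result == ["Person2"]
```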
I have a graph in neo4j with vertices of:
person:ID,name,value:int,:LABEL
1,Alice,1,Person
2,Bob,0,Person
3,Charlie,0,Person
4,David,0,Person
5,Esther,0,Person
6,Fanny,0,Person
7,Gabby,0,Person
8,XXXX,1,Person
and edges:
:START_ID,:END_ID,:TYPE
1,2,call
2,3,text
3,2,text
6,3,text
5,6,text
5,4,call
4,1,call
4,5,text
1,5,call
1,8,call
6,8,call
6,8,text
8,6,text
7,1,text
imported into neo4j like:
DATA_DIR_SAMPLE=/data_network/
$NEO4J_HOME/bin/neo4j-admin import --mode=csv \
--database=graph.db \
--nodes:Person ${DATA_DIR_SAMPLE}/vertices.csv \
--relationships ${DATA_DIR_SAMPLE}/edges.csv
Now when querying the graph like:
MATCH (source:Person)-[*1]-(destination:Person)
RETURN source.name, source.value, avg(destination.value), 'undir_1_any' as type
UNION ALL
MATCH (source:Person)-[*2]-(destination:Person)
RETURN source.name, source.value, avg(destination.value), 'undir_2_any' as type
one can see that the graph is traversed multiple times. Additionally, as I want to obtain a table like:
Vertex | value | type_undir_1_any | type_undir_2_any
Alice | 1 | 0.2 | 0
an additional aggregation step (pivot/reshape) would be required
In the future, I would like to add the following patterns, up to 3 levels into the graph, and all permutations of these:
undirected | directed
all relations | type of relation
Is there a better way to combine the queries?
You need to aggregate along the path length, with a custom computation of the average value:
MATCH p = (source:Person)-[*1..2]-(destination:Person)
WITH
length(p) as L, source, destination
RETURN
source.name as Vertex,
source.value as value,
1.0 *
sum(CASE WHEN L = 1 THEN destination.value ELSE 0 END) /
sum(CASE WHEN L = 1 THEN 1 ELSE 0 END) as type_undir_1_any,
1.0 *
sum(CASE WHEN L = 2 THEN destination.value ELSE 0 END) /
sum(CASE WHEN L = 2 THEN 1 ELSE 0 END) as type_undir_2_any
Or, a more elegant version using a function from the APOC library to calculate the average over a collection:
MATCH p = (source:Person)-[*1..2]-(destination:Person)
RETURN
source.name as Vertex,
source.value as value,
apoc.coll.avg(COLLECT(
CASE WHEN length(p) = 1 THEN destination.value ELSE NULL END
)) as type_undir_1_any,
apoc.coll.avg(COLLECT(
CASE WHEN length(p) = 2 THEN destination.value ELSE NULL END
)) as type_undir_2_any
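Both variants implement the same conditional average per path length. A plain-Python sketch of that computation (the rows and destination values are made up for illustration):

```python
# (source, path_length, destination_value) rows, as the MATCH would produce.
rows = [
    ("Alice", 1, 0), ("Alice", 1, 1),
    ("Alice", 2, 0), ("Alice", 2, 0),
]

def cond_avg(rows, source, length):
    # Average of destination values restricted to one path length,
    # i.e. sum(CASE ...) / sum(CASE ...) from the Cypher above.
    vals = [v for s, l, v in rows if s == source and l == length]
    return sum(vals) / len(vals)

assert cond_avg(rows, "Alice", 1) == 0.5
assert cond_avg(rows, "Alice", 2) == 0.0
```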
I need to return some groups and the people in each group, like this:
Group A
-----Person A
-----Person B
-----Person C
Group B
-----Person D
-----Person E
-----Person F
How can I do that with Gremlin? The people are connected to the group with an edge.
It is always helpful to include a sample graph with your questions on Gremlin, preferably as something that can easily be pasted into the Gremlin Console, as follows:
g.addV('group').property('name','Group A').as('ga').
addV('group').property('name','Group B').as('gb').
addV('person').property('name','Person A').as('pa').
addV('person').property('name','Person B').as('pb').
addV('person').property('name','Person C').as('pc').
addV('person').property('name','Person D').as('pd').
addV('person').property('name','Person E').as('pe').
addV('person').property('name','Person F').as('pf').
addE('contains').from('ga').to('pa').
addE('contains').from('ga').to('pb').
addE('contains').from('ga').to('pc').
addE('contains').from('gb').to('pd').
addE('contains').from('gb').to('pe').
addE('contains').from('gb').to('pf').iterate()
A solution to your problem is to use the group() step:
gremlin> g.V().has('group', 'name', within('Group A','Group B')).
......1> group().
......2> by('name').
......3> by(out('contains').values('name').fold())
==>[Group B:[Person D,Person E,Person F],Group A:[Person A,Person B,Person C]]
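The map that group().by('name').by(out('contains').values('name').fold()) produces has the same shape as this plain-Python sketch (not Gremlin):

```python
# Build a map from group name to the folded list of member names.
memberships = [
    ("Group A", "Person A"), ("Group A", "Person B"), ("Group A", "Person C"),
    ("Group B", "Person D"), ("Group B", "Person E"), ("Group B", "Person F"),
]

groups = {}
for group, person in memberships:
    groups.setdefault(group, []).append(person)

assert groups["Group A"] == ["Person A", "Person B", "Person C"]
assert groups["Group B"] == ["Person D", "Person E", "Person F"]
```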
Suppose I have 3 students (A, B, C), each having a major subject and marks respectively, but when I query, the result is shown in an uneven way.
Data
A -> Math -> 77
B -> History -> 70
C -> Science -> 97
Query
g.V('Class').has('name',within('A','B','C'))
Result
{"student_name":['A','B','C'], "major_subject":['Math','Science','History'], "marks":[70,77,97]}
The data returned by querying the database is not ordered by the student's name.
I assume that your graph looks kinda like this:
g = TinkerGraph.open().traversal()
g.addV('student').property('name', 'A').
addE('scored').to(addV('subject').property('name', 'Math')).
property('mark', 77).
addV('student').property('name', 'B').
addE('scored').to(addV('subject').property('name', 'History')).
property('mark', 70).
addV('student').property('name', 'C').
addE('scored').to(addV('subject').property('name', 'Science')).
property('mark', 97).iterate()
Now the easiest way to gather the data is this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('student').
outE('scored').as('mark').inV().as('major').
select('student','major','mark').
by('name').
by('name').
by('mark')
==>[student:A,major:Math,mark:77]
==>[student:B,major:History,mark:70]
==>[student:C,major:Science,mark:97]
But if you really depend on the format shown in your question, you can do this:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).
store('student').by('name').
outE('scored').store('mark').by('mark').
inV().store('major').by('name').
cap('student','major','mark')
==>[major:[Math,History,Science],student:[A,B,C],mark:[77,70,97]]
If you want to get the cap'ed result to be ordered by marks, you'll need a mix of the 2 queries:
gremlin> g.V().has('student', 'name', within('A', 'B', 'C')).as('a').
outE('scored').as('b').
order().
by('mark').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[History,Math,Science],student:[B,A,C],mark:[70,77,97]]
To order by the order of inputs:
gremlin> input = ['C', 'B', 'A']; []
gremlin> g.V().has('student', 'name', within(input)).as('a').
order().
by {input.indexOf(it.value('name'))}.
outE('scored').as('b').
inV().as('c').
select('a','c','b').
by('name').
by('name').
by('mark').
aggregate('student').by(select('a')).
aggregate('major').by(select('c')).
aggregate('mark').by(select('b')).
cap('student','major','mark')
==>[major:[Science,History,Math],student:[C,B,A],mark:[97,70,77]]
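The Groovy closure by {input.indexOf(it.value('name'))} sorts traversers by their position in the externally supplied input list. The same ordering in plain Python:

```python
# Sort values by their position in an input list,
# mirroring the by {input.indexOf(...)} closure above.
input_order = ['C', 'B', 'A']
students = ['A', 'B', 'C']

ordered = sorted(students, key=input_order.index)
assert ordered == ['C', 'B', 'A']
```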