Retrieve all IDs for duplicate vertices in Gremlin

Problem
I am running a query which finds duplicate vertices by the name property. I would like to know the IDs for all the corresponding vertices.
At the moment, only the ID of the vertex at the end of the traversal (the one labeled 'y' in the where() clause) is returned.
Example graph
Here is a toy example graph. There are two vertices with the same name, 'ex'.
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('X').property('name', 'ex')
==>v[0]
gremlin> g.addV('Y').property('name', 'why')
==>v[2]
gremlin> g.addV('Y').property('name', 'ex')
==>v[4]
gremlin> g.V().elementMap()
==>[id:0,label:X,name:ex]
==>[id:2,label:Y,name:why]
==>[id:4,label:Y,name:ex]
Detecting duplicates
When I find the duplicates and call elementMap(), only the vertex matched on the 'y' side of the where() clause is returned:
gremlin> g.V().hasLabel('X').as('x').V().hasLabel('Y').as('y').where('x', P.eq('y')).by('name').elementMap()
==>[id:4,label:Y,name:ex]
Instead, I would like to see the IDs of both vertices, i.e. id:0 and id:4.
I would like something like:
==>[[id:0,label:X,name:ex], [id:4,label:Y,name:ex]]

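You were actually very close. Use select('x','y') to bring back both labeled vertices, let by() turn each one into a map, and then select(values) to drop the 'x'/'y' keys: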
gremlin> g.V().hasLabel('X').as('x').
......1> V().hasLabel('Y').as('y').
......2> where(eq('x')).by('name').
......3> select('x','y').
......4> by(valueMap().by(unfold()).with(WithOptions.tokens)).
......5> select(values)
==>[[id:0,label:X,name:ex],[id:4,label:Y,name:ex]]
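To make explicit what the traversal above computes, here is a plain-Python sketch over the question's toy graph. The list of dicts and the `duplicate_pairs` function are hypothetical stand-ins, not a Gremlin API. The sketch pairs every X vertex with every Y vertex, keeps the pairs whose names match, and returns both maps together.

```python
from typing import Dict, List

# The three vertices of the question's toy graph, as elementMap()-like dicts.
VERTICES: List[Dict[str, object]] = [
    {"id": 0, "label": "X", "name": "ex"},
    {"id": 2, "label": "Y", "name": "why"},
    {"id": 4, "label": "Y", "name": "ex"},
]


def duplicate_pairs(vertices, left_label="X", right_label="Y", key="name"):
    """Return [x, y] pairs whose `key` values are equal."""
    xs = [v for v in vertices if v["label"] == left_label]   # V().hasLabel('X').as('x')
    ys = [v for v in vertices if v["label"] == right_label]  # V().hasLabel('Y').as('y')
    # The mid-traversal V() pairs every x with every y; where(eq('x')).by('name')
    # keeps only equal names, and select('x','y') + select(values) returns both maps.
    return [[x, y] for x in xs for y in ys if x[key] == y[key]]


if __name__ == "__main__":
    for pair in duplicate_pairs(VERTICES):
        print(pair)
```

Because every X is compared with every Y, the cost grows with the product of the two label counts. That is worth keeping in mind on a large graph.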

Related

Gremlin on Neptune: how to get "outV" from an edge that is returned from "select"

I have a graph with two vertices: 'a' and 'b'.
There is an edge between 'a' and 'b' labeled 'Y'.
gremlin> g.V('a').outE()
==>e[dcb543f5-2189-9ffe-e617-b928dc565c1a][a-Y->b]
The edge has a property 'foo':
gremlin> g.V('a').outE().valueMap(true)
==>{label=Y, foo=bar, id=dcb543f5-2189-9ffe-e617-b928dc565c1a}
My question: why is the following statement returning an edge? I expected a vertex.
gremlin> g.E('dcb543f5-2189-9ffe-e617-b928dc565c1a').as('e').properties('foo').as('foo').select('e').outV()
==>e[dcb543f5-2189-9ffe-e617-b928dc565c1a][a-Y->b]

It shouldn't behave that way. Note that TinkerGraph doesn't exhibit that behavior:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.E().as('e').properties('weight').as('w').select('e').outV()
==>v[1]
==>v[1]
==>v[1]
==>v[4]
==>v[4]
==>v[6]
Is there some sample data that will reproduce this problem? Perhaps it is a bug in the graph database you are using?
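The expected semantics can be shown with a toy Python model of a single traverser that carries its step labels. The `Edge` and `Traverser` classes and their method names are hypothetical, and this is not how TinkerPop or Neptune implement traversals. It only illustrates that once select('e') hands back the labeled edge, outV() should yield that edge's out-vertex ('a' in the question), not the edge itself.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Edge:
    id: str
    label: str
    out_v: str                    # id of the vertex the edge starts from
    in_v: str                     # id of the vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Traverser:
    current: Any                  # the element (or value) the traverser is on
    labels: Dict[str, Any] = field(default_factory=dict)  # as() labels seen so far

    def as_(self, name: str) -> "Traverser":
        # as('e'): remember the current object under a step label
        return Traverser(self.current, {**self.labels, name: self.current})

    def properties(self, key: str) -> "Traverser":
        # properties('foo'): move from the edge onto one of its properties
        return Traverser((key, self.current.properties[key]), self.labels)

    def select(self, name: str) -> "Traverser":
        # select('e'): jump back to whatever was stored under that label
        return Traverser(self.labels[name], self.labels)

    def out_v(self) -> "Traverser":
        # outV(): only defined for edges; yields the id of the outgoing vertex
        if not isinstance(self.current, Edge):
            raise TypeError("outV() can only be applied to an edge")
        return Traverser(self.current.out_v, self.labels)


if __name__ == "__main__":
    edge = Edge("dcb543f5-2189-9ffe-e617-b928dc565c1a", "Y", "a", "b", {"foo": "bar"})
    result = Traverser(edge).as_("e").properties("foo").as_("foo").select("e").out_v()
    print(result.current)  # prints: a
```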

Joining vertices with type indirection

I want to do a join between two vertex types using gremlin
select * from type1 inner join type2 on type2.id = type1.type2_id
The following works when using type1 and type2 as vertex labels:
g.V()
.hasLabel("type2").as("t2")
.inE("hasJoin")
.hasLabel("type1").as("t1")
.select("t1", "t2")
However, my graph does not use the vertex label to represent the type, but uses another vertex connected via the "hasType" edge instead.
g.V()//
.addV("instance1").as("instance1")//
.addV("instance2").as("instance2")//
.addV("type1").as("type1")//
.addV("type2").as("type2")//
.addE("hasType").from("instance1").to("type1")//
.addE("hasType").from("instance2").to("type2")//
.addE("hasJoin").from("instance1").to("instance2")//
.iterate();
I would need to do something like replacing
hasLabel("type2").as("t2")
with
hasLabel("type2").inE("hasType").outV().as("t2"):
which would result in
g.V()
.hasLabel("type2").inE("hasType").outV().as("t2")
.inE("hasJoin")
.hasLabel("type1").inE("hasType").outV().as("t1")
.select("t1", "t2")
This works for "t2", but not for "t1", as .inE("hasJoin").hasLabel("type1") is just wrong. What function do I need to use to join "t1" and "t2"?

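All you need is a filter that checks the adjacent type vertex. Here's your sample graph. Your script doesn't quite work, because it starts with g.V() on an empty graph, so the chained addV() steps never execute. I've also given the instances a "name" property: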
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV("instance1").property("name","instance1").as("instance1").
......1> addV("instance2").property("name","instance2").as("instance2").
......2> addV("type1").as("type1").
......3> addV("type2").as("type2").
......4> addE("hasType").from("instance1").to("type1").
......5> addE("hasType").from("instance2").to("type2").
......6> addE("hasJoin").from("instance1").to("instance2").
......7> iterate()
And the query you're looking for should be something like this:
gremlin> g.V().hasLabel("type2").in("hasType").as("t2").
......1> both("hasJoin").
......2> filter(out("hasType").hasLabel("type1")).as("t1").
......3> select("t1", "t2").
......4> by("name")
==>[t1:instance1,t2:instance2]
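The join logic of this traversal can be sketched in plain Python to show where the type indirection is handled. The dictionaries below are a hypothetical encoding of the sample graph, mapping each instance to the label of its "hasType" vertex. `join_by_type` is an illustrative name, not a Gremlin API.

```python
from typing import Dict, List, Tuple

# Sample graph: instance -> label of its "hasType" vertex, plus the "hasJoin" edges.
HAS_TYPE: Dict[str, str] = {"instance1": "type1", "instance2": "type2"}
HAS_JOIN: List[Tuple[str, str]] = [("instance1", "instance2")]  # (out vertex, in vertex)


def join_by_type(t1_type: str, t2_type: str) -> List[Dict[str, str]]:
    """Rows {t1, t2} where the instances have the given types and share a hasJoin edge."""
    rows: List[Dict[str, str]] = []
    # g.V().hasLabel(t2_type).in("hasType").as("t2")
    for t2 in (inst for inst, typ in HAS_TYPE.items() if typ == t2_type):
        # both("hasJoin"): follow the join edge in either direction
        neighbours = [b for a, b in HAS_JOIN if a == t2] + [a for a, b in HAS_JOIN if b == t2]
        for t1 in neighbours:
            # filter(out("hasType").hasLabel(t1_type)).as("t1")
            if HAS_TYPE.get(t1) == t1_type:
                rows.append({"t1": t1, "t2": t2})
    return rows


if __name__ == "__main__":
    print(join_by_type("type1", "type2"))
```

The type check happens inside filter() rather than on the walked path. The traversal therefore stays on instance vertices, and "t1" is labeled on the instance itself, not on its type vertex.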

How to build a subgraph from transitive edges?

I have a graph with reified relations, which hold useful information, but for visualization purposes I need to create a subgraph without these intermediary nodes.
Example:
[A:Person] <--AFFILIATE-- [B:Affiliation] --COMPANY--> [C:Org]
And I want to produce a subgraph like this:
[A:Person] --AFFILIATED_TO--> [C:Org]
Is there a simple way to do that with Gremlin?

I think your best option is probably to use the subgraph() step, as you normally would, to extract the edge-induced subgraph. Then run some Gremlin against that subgraph to add the visualization edges and remove the elements you don't want.
I can demonstrate with the modern toy graph packaged with TinkerPop:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> sg = g.V().outE('created').subgraph('sg').cap('sg').next() // subgraph creation
==>tinkergraph[vertices:5 edges:4]
gremlin> g = sg.traversal()
==>graphtraversalsource[tinkergraph[vertices:5 edges:4], standard]
gremlin> g.V().as('a'). // add special subgraph edge
......1> out('created').as('software').
......2> in('created').where(neq('a')).
......3> addE('co-developer').from('a').
......4> property('project',select('software').by('name'))
==>e[0][1-co-developer->4]
==>e[1][1-co-developer->6]
==>e[2][4-co-developer->1]
==>e[3][4-co-developer->6]
==>e[4][6-co-developer->1]
==>e[5][6-co-developer->4]
gremlin> g.V().hasLabel('software').drop() //remove junk from subgraph
gremlin> g.E()
==>e[0][1-co-developer->4]
==>e[1][1-co-developer->6]
==>e[2][4-co-developer->1]
==>e[3][4-co-developer->6]
==>e[4][6-co-developer->1]
==>e[5][6-co-developer->4]
gremlin> g.V().has('name','marko').outE('co-developer').valueMap(true)
==>[label:co-developer,project:lop,id:0]
==>[label:co-developer,project:lop,id:1]
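The same add-then-drop idea as the co-developer example can be sketched in plain Python for the Person/Affiliation/Org shape from the question. The edge-tuple representation and `collapse_affiliations` are hypothetical. The sketch only shows the data transformation: add one shortcut edge per person/org pair reachable through an Affiliation vertex, then remove the Affiliation vertices and their edges.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (out_vertex, edge_label, in_vertex)


def collapse_affiliations(vertex_labels: Dict[str, str], edges: List[Edge]) -> List[Edge]:
    """Replace Person <-AFFILIATE- Affiliation -COMPANY-> Org with Person -AFFILIATED_TO-> Org."""
    hubs = {v for v, label in vertex_labels.items() if label == "Affiliation"}
    shortcuts: List[Edge] = []
    for hub in sorted(hubs):
        persons = [dst for src, lbl, dst in edges if src == hub and lbl == "AFFILIATE"]
        orgs = [dst for src, lbl, dst in edges if src == hub and lbl == "COMPANY"]
        # One visualization edge per (person, org) pair joined through this hub
        shortcuts += [(p, "AFFILIATED_TO", o) for p in persons for o in orgs]
    # Dropping a hub vertex also drops its edges, like drop() on a vertex in Gremlin
    kept = [e for e in edges if e[0] not in hubs and e[2] not in hubs]
    return kept + shortcuts


if __name__ == "__main__":
    labels = {"A": "Person", "B": "Affiliation", "C": "Org"}
    print(collapse_affiliations(labels, [("B", "AFFILIATE", "A"), ("B", "COMPANY", "C")]))
```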

How to get vertex connected to several vertices

In the TinkerPop toy graph example, using Gremlin syntax, I want to get vertex 1 knowing only the attributes of vertices 3 and 4.
In words: which vertex is connected by a 'created' edge to the vertex with name=lop AND by a 'knows' edge to the vertex with name=josh?
I want to specify the edge labels exactly, so v.out.name.filter{it.matches('lop|josh')} is not good enough, as it follows all outgoing edges of vertex 1.

This works fine:
gremlin> Gremlin.version()
==>3.2.4
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('name', 'lop').in('created').as('x').
......1> out('knows').has('name', 'josh').select('x')
==>v[1]
The syntax you used in your question looks more like TinkerPop 2, which is out of support. You should be using TinkerPop 3.
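This traversal can be mirrored in plain Python over the toy graph's ids, names and labeled edges (properties omitted). The data layout and `created_and_knows` are hypothetical illustrations, not a Gremlin API. The function starts from the named vertex, walks 'created' backwards, and keeps only creators with a 'knows' edge to the second name.

```python
from typing import Dict, List, Tuple

# TinkerPop "modern" toy graph: vertex id -> name, plus (out, label, in) edges.
NAMES: Dict[int, str] = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}
EDGES: List[Tuple[int, str, int]] = [
    (1, "knows", 2), (1, "knows", 4), (1, "created", 3),
    (4, "created", 5), (4, "created", 3), (6, "created", 3),
]


def created_and_knows(created_name: str, known_name: str) -> List[int]:
    """Ids of vertices with a 'created' edge to created_name and a 'knows' edge to known_name."""
    # has('name', created_name)
    start = {v for v, name in NAMES.items() if name == created_name}
    # in('created').as('x'): walk the created edges backwards to their creators
    creators = [src for src, label, dst in EDGES if label == "created" and dst in start]
    result: List[int] = []
    for x in creators:
        # out('knows').has('name', known_name), then select('x') per surviving path
        for src, label, dst in EDGES:
            if src == x and label == "knows" and NAMES[dst] == known_name:
                result.append(x)
    return result


if __name__ == "__main__":
    print(created_and_knows("lop", "josh"))  # prints: [1]
```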

You could use match for this (TinkerPop 3.x):
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().
......1> match(
......2> __.as('a').out('created').has('name','lop'),
......3> __.as('a').out('knows').has('name','josh')).
......4> select('a')
==>v[1]
It reads quite similarly to your English version, though unlike Jason Plurad's answer it does not start with an index lookup. I guess it could be inverted a bit so that it starts from the lookup on 'name' and gets the same answer:
gremlin> g.V().
......1> has('name','lop').
......2> match(
......3> __.as('a').in('created').as('b'),
......4> __.as('b').out('knows').has('name','josh')).
......5> select('b')
==>v[1]

Counting total child nodes in a Titan graph through a Gremlin query

I have created a Titan graph of a hierarchical tree in Java.
How can I find the total number of child nodes in the hierarchy below a specified node with Gremlin?
Please suggest a Gremlin query for counting; it should be fast.

The basic pattern for traversing a tree is the repeat() step. As an example, I use the graph depicted in the Tree Recipes section of the TinkerPop documentation:
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV().property(id, 'A').as('a').
......1> addV().property(id, 'B').as('b').
......2> addV().property(id, 'C').as('c').
......3> addV().property(id, 'D').as('d').
......4> addV().property(id, 'E').as('e').
......5> addV().property(id, 'F').as('f').
......6> addV().property(id, 'G').as('g').
......7> addE('hasParent').from('a').to('b').
......8> addE('hasParent').from('b').to('c').
......9> addE('hasParent').from('d').to('c').
.....10> addE('hasParent').from('c').to('e').
.....11> addE('hasParent').from('e').to('f').
.....12> addE('hasParent').from('g').to('f').iterate()
gremlin> g.V('F').repeat(__.in('hasParent')).emit().count()
==>6
gremlin> g.V('C').repeat(__.in('hasParent')).emit().count()
==>3
gremlin> g.V('A').repeat(__.in('hasParent')).emit().count()
==>0
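The key to getting the count is emit(). It emits the traversers produced by every iteration of repeat(), so all descendants are counted, not just the traversers still alive when the loop ends (which, with no until(), would be none).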
Just to give a sense of the speed you can get with TinkerGraph (in-memory), I generated a tree that is 400,000 vertices deep:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> lastV = g.addV().next()
==>v[0]
gremlin> (0..<400000).each{lastV=g.V(lastV).as('f').addV().as('t').addE('next').from('f').to('t').select('t').next()}
==>0
==>1
==>2
==>3
...
gremlin> graph
==>tinkergraph[vertices:400001 edges:400000]
gremlin> clockWithResult{ g.V(0L).repeat(__.out('next')).emit().count().next() }
==>171.44102253
==>400000
That completed in about 171 ms. TinkerGraph is obviously faster, as it holds its data purely in memory; Titan/JanusGraph and other graphs have to read from disk.
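To show what repeat(in('hasParent')).emit().count() counts, here is a plain-Python sketch over the same tree. The edge list and function names are hypothetical. It uses an explicit queue instead of recursion, so a deep chain like the 400,000-vertex one cannot overflow the call stack. Like Gremlin's traversers, it keeps no visited set: a vertex reachable by two paths would be counted twice, and a cycle would never terminate.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Tuple

# (child, parent) pairs for the 'hasParent' edges of the tree example above.
HAS_PARENT: List[Tuple[str, str]] = [
    ("A", "B"), ("B", "C"), ("D", "C"), ("C", "E"), ("E", "F"), ("G", "F"),
]


def build_children(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, List[Hashable]]:
    """Index the edges by parent, i.e. what in('hasParent') would follow."""
    children: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)
    return children


def count_descendants(root: Hashable, children: Dict[Hashable, List[Hashable]]) -> int:
    """Like repeat(in('hasParent')).emit().count(): count every vertex reached below root."""
    count = 0
    queue = deque(children.get(root, []))
    while queue:               # explicit queue: deep chains cannot overflow the stack
        vertex = queue.popleft()
        count += 1             # emit(): each vertex reached by an iteration is counted
        queue.extend(children.get(vertex, []))
    return count


if __name__ == "__main__":
    tree = build_children(HAS_PARENT)
    print(count_descendants("F", tree), count_descendants("C", tree), count_descendants("A", tree))
    # prints: 6 3 0
```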
