Gremlin query combine vertices with unrelated vertices CosmosDB - azure-cosmosdb

I would like to combine several vertices, e.g. with the label "user", with vertices they are not yet related to, e.g. with the label "movie".
I know that the strength of Gremlin is traversing the graph, and that combining objects that are not related is not the best use case for a graph. I am using Azure CosmosDB for my application, so if there is a more performant way to do this, feel free to let me know. If it can be done with Gremlin, I need some help with the query. Here is an example:
There are 4 users: bob, jose, frank, peter
and 4 movies: movie1, movie2, movie3, movie4
Between the users and movies there can be an edge "watched"
My example data looks as follows:
watched:
[bob, [movie1,movie2]]
[jose, [movie3]]
[frank, []]
[peter, [movie4]]
The result and format I would like to get are as follows:
not watched:
[bob, movie3]
[bob, movie4]
[jose, movie1]
[jose, movie2]
[jose, movie4]
[frank, movie1]
[frank, movie2]
[frank, movie3]
[frank, movie4]
[peter, movie1]
[peter, movie2]
[peter, movie3]
The script to set up the graph (using /partition_key as partition key):
g.addV("user").property("partition_key", 1).property("id", "bob")
g.addV("user").property("partition_key", 1).property("id", "jose")
g.addV("user").property("partition_key", 1).property("id", "frank")
g.addV("user").property("partition_key", 1).property("id", "peter")
g.addV("movie").property("partition_key", 1).property("id", "movie1")
g.addV("movie").property("partition_key", 1).property("id", "movie2")
g.addV("movie").property("partition_key", 1).property("id", "movie3")
g.addV("movie").property("partition_key", 1).property("id", "movie4")
g.V("bob").addE("watched").to(g.V("movie1"))
g.V("bob").addE("watched").to(g.V("movie2"))
g.V("jose").addE("watched").to(g.V("movie3"))
g.V("peter").addE("watched").to(g.V("movie4"))
Please note that I cannot use lambdas, because Azure CosmosDB doesn't support them.

A join in Gremlin can be realized by repeating the V() step mid-traversal. With that in place, the Gremlin query reads almost like an ordinary SQL query, see below.
g.V().has("id", "bob").addE("watched").to(__.V().has("id", "movie1"))
g.V().has("id", "bob").addE("watched").to(__.V().has("id", "movie2"))
g.V().has("id", "jose").addE("watched").to(__.V().has("id", "movie3"))
g.V().has("id", "peter").addE("watched").to(__.V().has("id", "movie4"))
g.V().hasLabel("user").as("u").
V().hasLabel("movie").as("m").
not(__.in("watched").where(eq("u"))).  // keep the pair only if u has no "watched" edge to m
select("u", "m").by("id").
order().by("u").by("m")
==>[u:bob,m:movie3]
==>[u:bob,m:movie4]
==>[u:frank,m:movie1]
==>[u:frank,m:movie2]
==>[u:frank,m:movie3]
==>[u:frank,m:movie4]
==>[u:jose,m:movie1]
==>[u:jose,m:movie2]
==>[u:jose,m:movie4]
==>[u:peter,m:movie1]
==>[u:peter,m:movie2]
==>[u:peter,m:movie3]
You are right that this query does not perform well in Gremlin, and I would advise you to use the SQL API of CosmosDB instead.
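To make the expected result easier to check outside the graph, here is a minimal sketch of the same "not watched" logic in plain Python: the cross product of users and movies minus the existing "watched" edges. It does not use any Gremlin or CosmosDB API; the function name not_watched is made up for illustration, and the data is copied from the setup script in the question.

```python
from itertools import product

# Data copied from the question's setup script.
users = ["bob", "jose", "frank", "peter"]
movies = ["movie1", "movie2", "movie3", "movie4"]
watched = {
    ("bob", "movie1"),
    ("bob", "movie2"),
    ("jose", "movie3"),
    ("peter", "movie4"),
}


def not_watched(users, movies, watched):
    """Return every (user, movie) pair without a 'watched' edge, sorted."""
    return sorted(pair for pair in product(users, movies) if pair not in watched)


pairs = not_watched(users, movies, watched)
for user, movie in pairs:
    print(f"[{user}, {movie}]")
```

The pair count grows with users x movies, which is why this kind of query tends to be expensive regardless of which API runs it.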

Related

Gremlin - run additional traversal in same query in case not enough values are found

Is there a way to check how many records were found, and in case less than X records were returned do another query?
For example first run this query ->
g.V().
hasLabel('courseContent').
has('status', 'active').as('cc').
outE('ccBelongsToCourse').
has('status', 'active').
inV().
hasLabel('course').
has('externalId', ':courseId').
select('cc').by(valueMap('externalId')).
dedup().
range(:offSet, :limit);
And in case less than 10 records were found, run this query:
g.V().
hasLabel('educatorContent').
has('status', 'active').as('ec').
select('ec').by(valueMap('externalId')).
dedup().
range(:offSet, :limit);
but do it all inside the same .gremlin file?
(Sorry if the question is too basic, super new to Gremlin)
You could use choose() which provides if-then semantics:
range(:offSet, :limit).fold().
choose(count(local).is(gt(9)),
identity(),
V().has('educatorContent', 'status','active')....)
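The if-then logic that choose() expresses here can be sketched in plain Python as follows. This only models the control flow (keep the first result if it has more than 9 entries, otherwise run the second query); with_fallback and its parameters are hypothetical names, not part of any Gremlin driver.

```python
from typing import Callable, List


def with_fallback(primary: List, fallback: Callable[[], List], threshold: int = 9) -> List:
    """Mirror choose(count(local).is(gt(threshold)), identity(), <fallback>)."""
    if len(primary) > threshold:
        # Enough records: pass the folded result through unchanged (identity()).
        return primary
    # Too few records: only now evaluate the second query.
    return fallback()


enough = with_fallback(list(range(10)), lambda: ["educatorContent"])
too_few = with_fallback([1, 2], lambda: ["educatorContent"])
print(enough)
print(too_few)
```

Passing the fallback as a callable keeps it lazy, so the second query is only executed when the first one comes up short.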

How can we reach to Key/value pairs after group by function on Gremlin?

I have a simple Gremlin query: first I aggregate data, then I want to access the aggregated data and use it in other operations. But after the group-by operation, I can't access the results as key/value pairs.
GraphTraversal t = graph.V().hasLabel("App").as("a")
.inE("RANKS").as("r")
.outV().as("k")
.choose(__.select("k").by("countryCode").is(__.in(...)),
__.math("1.0 / r").by("rank1"),
__.math("1.0 / r").by("rank2"))
.as("score")
.select("a").aggregate("ap")
.select("k").by("countryCode").aggregate("country")
.select("a", "k").by("appId").by("countryCode")
.group().by("grp_res").by(__.select("score").sum().as("sum_score"))
.cap("ap", "country", "grp_res")
.V().hasLabel("App").where(P.within("ap")).as("app")
.select("app", "country", "ak").by("appId").by().by();
The last line, .select("app", "ak").by("appId").by(), cannot reach the grouped values as key/value pairs after the group-by. How can I reach them? Do you have any suggestions?
The current output looks like this:
{app=1, country=US, ak={{a=1, k=US}=363.0, {a=2, k=US}=544.5}}
{app=2, country=US, ak={{a=1, k=US}=363.0, {a=2, k=US}=544.5}}
But the expected output is:
{app=1, country=US, ak=363.0}
{app=2, country=US, ak=544.5}
Solved like:
group().by(...)
.select(Column.values).as("grp_result")
.select("grp_result").select("score").as("score")
...
I don't see where you define "ak" in your query, but the basic issue is that you have nested maps whose keys are themselves maps. So you will need to select from the map using that exact key, or group differently so that the keys are simpler. Then you would do something like select('ak').select('key').
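As a rough illustration of the "exact key" point, here is a plain-Python model of the grouped result from the question. A tuple (appId, countryCode) stands in for the map-valued key that Gremlin produced; the names grouped and score_rows are assumptions made for this sketch only.

```python
# Grouped scores from the question's output; a tuple replaces the {a=..., k=...} map key.
grouped = {(1, "US"): 363.0, (2, "US"): 544.5}


def score_rows(grouped):
    """Flatten the grouped map into one row per composite key."""
    return [
        {"app": app, "country": country, "ak": score}
        for (app, country), score in sorted(grouped.items())
    ]


rows = score_rows(grouped)
for row in rows:
    print(row)

# Looking up a single score requires the whole composite key, not just the app id.
print(grouped[(2, "US")])
```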

Neo4J and Cypher query

I am new to Neo4j and Cypher queries. In my model, each shop has 2 chillers, each chiller has 2 PLCs, and each PLC has 2 sensors.
The CREATE statements are as below:
Create(:SHOP{name:"Shop1"})-[:hasChiller]->(:CHILLER{name:"Chiller1"})
Create(:SHOP{name:"Shop1"})-[:hasChiller]->(:CHILLER{name:"Chiller2"})
Create(:SHOP{name:"Shop2"})-[:hasChiller]->(:CHILLER{name:"Chiller3"})
Create(:SHOP{name:"Shop2"})-[:hasChiller]->(:CHILLER{name:"Chiller4"})
Create(:CHILLER{name:"Chiller1"})-[:hasPLC]->(:PLC{name:"Plc1"})
Create(:CHILLER{name:"Chiller1"})-[:hasPLC]->(:PLC{name:"Plc2"})
Create(:CHILLER{name:"Chiller2"})-[:hasPLC]->(:PLC{name:"Plc3"})
Create(:CHILLER{name:"Chiller2"})-[:hasPLC]->(:PLC{name:"Plc4"})
Create(:CHILLER{name:"Chiller3"})-[:hasPLC]->(:PLC{name:"Plc5"})
Create(:CHILLER{name:"Chiller3"})-[:hasPLC]->(:PLC{name:"Plc6"})
Create(:CHILLER{name:"Chiller4"})-[:hasPLC]->(:PLC{name:"Plc7"})
Create(:CHILLER{name:"Chiller4"})-[:hasPLC]->(:PLC{name:"Plc8"})
Create(:PLC{name:"Plc1"})-[:hasSensor]->(:SENSOR{name:"Sensor1"})
Create(:PLC{name:"Plc1"})-[:hasSensor]->(:SENSOR{name:"Sensor2"})
Create(:PLC{name:"Plc2"})-[:hasSensor]->(:SENSOR{name:"Sensor3"})
Create(:PLC{name:"Plc2"})-[:hasSensor]->(:SENSOR{name:"Sensor4"})
Create(:PLC{name:"Plc3"})-[:hasSensor]->(:SENSOR{name:"Sensor5"})
Create(:PLC{name:"Plc3"})-[:hasSensor]->(:SENSOR{name:"Sensor6"})
Create(:PLC{name:"Plc4"})-[:hasSensor]->(:SENSOR{name:"Sensor7"})
Create(:PLC{name:"Plc4"})-[:hasSensor]->(:SENSOR{name:"Sensor8"})
Create(:PLC{name:"Plc5"})-[:hasSensor]->(:SENSOR{name:"Sensor9"})
Create(:PLC{name:"Plc5"})-[:hasSensor]->(:SENSOR{name:"Sensor10"})
Create(:PLC{name:"Plc6"})-[:hasSensor]->(:SENSOR{name:"Sensor11"})
Create(:PLC{name:"Plc6"})-[:hasSensor]->(:SENSOR{name:"Sensor12"})
Create(:PLC{name:"Plc7"})-[:hasSensor]->(:SENSOR{name:"Sensor13"})
Create(:PLC{name:"Plc7"})-[:hasSensor]->(:SENSOR{name:"Sensor14"})
Create(:PLC{name:"Plc8"})-[:hasSensor]->(:SENSOR{name:"Sensor15"})
Create(:PLC{name:"Plc8"})-[:hasSensor]->(:SENSOR{name:"Sensor16"})
However, the MATCH to get the sensors under Shop1
MATCH(s:SHOP{name:"Shop1"})-[:hasChiller]->(cc:CHILLER)-[:hasPLC]->(pp:PLC)-[:hasSensor]->(ss:SENSOR) return ss.name
returns nothing. It says no changes and no data.
I am trying this out in the Neo4j sandbox environment. I did this based on my understanding of the MATCH clause in SQL Server 2019 graph, where this works.
Can anyone point out where I am going wrong?
You are improperly creating multiple instances of the "same" node. You should create each node once, and then use its bound variable name later on when you need to create relationships involving that node.
Delete all your data and follow this pattern instead (you have to fill in the "..." parts):
CREATE
(sh1:SHOP{name:"Shop1"}), (sh2:SHOP{name:"Shop2"}),
(c1:CHILLER{name:"Chiller1"}), (c2:CHILLER{name:"Chiller2"}),(c3:CHILLER{name:"Chiller3"}), (c4:CHILLER{name:"Chiller4"}),
(p1:PLC{name:"Plc1"}), ..., (p8:PLC{name:"Plc8"}),
(se1:SENSOR{name:"Sensor1"}), ..., (se16:SENSOR{name:"Sensor16"}),
(sh1)-[:hasChiller]->(c1), (sh1)-[:hasChiller]->(c2),
... // create remaining relationships using bound variable names for nodes
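To see why the original statements leave the graph disconnected, here is a small plain-Python model (not Neo4j code). The Graph class and sensors_under function are hypothetical helpers written for this sketch: creating both endpoints in every statement yields separate nodes that merely share a name, while reusing a bound node yields a connected path.

```python
class Graph:
    """Tiny in-memory stand-in for a property graph."""

    def __init__(self):
        self.nodes = {}  # node id -> (label, name)
        self.edges = []  # (source id, relationship type, target id)

    def create_node(self, label, name):
        node_id = len(self.nodes)
        self.nodes[node_id] = (label, name)
        return node_id

    def create_rel(self, src, rel_type, dst):
        self.edges.append((src, rel_type, dst))

    def out(self, node_id, rel_type):
        return [d for s, t, d in self.edges if s == node_id and t == rel_type]


def sensors_under(graph, shop_name):
    """Follow SHOP -> CHILLER -> PLC -> SENSOR and collect sensor names."""
    names = []
    for node_id, (label, name) in graph.nodes.items():
        if label == "SHOP" and name == shop_name:
            for chiller in graph.out(node_id, "hasChiller"):
                for plc in graph.out(chiller, "hasPLC"):
                    for sensor in graph.out(plc, "hasSensor"):
                        names.append(graph.nodes[sensor][1])
    return names


# Question's pattern: every statement creates both of its endpoints anew.
dup = Graph()
for l1, n1, rel, l2, n2 in [
    ("SHOP", "Shop1", "hasChiller", "CHILLER", "Chiller1"),
    ("CHILLER", "Chiller1", "hasPLC", "PLC", "Plc1"),
    ("PLC", "Plc1", "hasSensor", "SENSOR", "Sensor1"),
]:
    dup.create_rel(dup.create_node(l1, n1), rel, dup.create_node(l2, n2))

# Answer's pattern: create each node once, then reuse its variable.
ok = Graph()
shop = ok.create_node("SHOP", "Shop1")
chiller = ok.create_node("CHILLER", "Chiller1")
plc = ok.create_node("PLC", "Plc1")
sensor = ok.create_node("SENSOR", "Sensor1")
ok.create_rel(shop, "hasChiller", chiller)
ok.create_rel(chiller, "hasPLC", plc)
ok.create_rel(plc, "hasSensor", sensor)

print(len(dup.nodes), sensors_under(dup, "Shop1"))
print(len(ok.nodes), sensors_under(ok, "Shop1"))
```

In the first graph there are two distinct "Chiller1" nodes, so the path from the shop stops after one hop, which matches the empty MATCH result in the question.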

Assign query using 'match()' to subgraph

I have a JanusGraph database with a graph structure as follows:
(Paper)<-[AuthorOf]-(Author)
I want to use Gremlin's match() step to query the data and assign the results to a subgraph. This is what I have so far:
g.V().match(
__.as('a').has('Paper','paperTitle', 'The name of my paper'),
__.as('a').in('AuthorOf').outV().as('b')).
select('b').values()
This query returns what I want: the authors of the paper I'm searching for. However, I want to assign the results to a subgraph so I can export it using:
sg.io(IoCore.graphml()).writeGraph("/home/ubuntu/myresults.graphml")
Previously, I've achieved this with a different query structure like this:
sg = g.V().has('paperTitle', 'The name of my paper').
inE('AuthorOf').subgraph('sg1').
outV().
cap('sg1').
next()
Is there a way to achieve the same results using the 'match()' statement?
After a little trial and error I was able to create a working solution:
sg = g.V().match(
__.as('a').has('Paper','paperTitle', 'ladle pouring guide'),
__.as('a').inE('AuthorOf').subgraph('sg').outV().as('b')).
cap('sg').next()
At first, I was trying to use the 'select' statement to isolate the subgraph. After reviewing the documentation on 'subgraph' and learning more about side-effects in gremlin I realized it wasn't necessary.

Printing/Fetching Vertex values from a path

Just getting started with gremlin.
Printing out all the Vertex values worked out fine
gremlin> g.V().values()
==>testing 2
==>Cash Processing
==>Sales
==>Marketing
==>Accounting
I was able to find all the directly connected path between my Vertices.
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.dedup().path()
==>[v[25],v[28]]
==>[v[25],v[26]]
==>[v[26],v[27]]
==>[v[26],v[25]]
Now I am trying to print out the values in the path, like ['Sales', 'Accounting'] instead of [v[25],v[28]].
I have not been able to figure out a way yet.
I already tried the following and failed:
Unfold: does not give me a 1-1 mapping
gremlin> g.V().hasLabel('Process').repeat(both().simplePath()).until(hasLabel('Process')).dedup().path().unfold().values()
==>Cash Processing
==>Accounting
==>Cash Processing
==>Sales
==>Sales
==>Marketing
==>Sales
==>Cash Processing
Path seems to be a different data type and does not support the .values() function
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.dedup().path().values()
org.apache.tinkerpop.gremlin.process.traversal.step.util.ImmutablePath cannot be cast to org.apache.tinkerpop.gremlin.structure.Element
I tried the following Google searches and didn't find the answer:
gremlin print a path
gremlin get values in a path
and a few more variations of the wording.
I found one answer that was for Java, but it didn't work for me:
l = []; g.V().....path().fill(l)
(but I can't create the list: Cannot set readonly property: list for class: org.apache.tinkerpop.gremlin.structure.VertexProperty$Cardinality)
I am running this in the Gremlin console (./gremlin.sh).
You can use the by step to modulate the elements inside the path. For example by supplying valueMap(true) to by you get the properties of the vertices, together with the vertex labels and their ids:
gremlin> g.V().repeat(both().simplePath()).times(1).dedup().path().by(valueMap(true))
==>[[id:1,name:[marko],label:person,age:[29]],[id:3,name:[lop],lang:[java],label:software]]
==>[[id:1,name:[marko],label:person,age:[29]],[id:2,name:[vadas],label:person,age:[27]]]
==>[[id:1,name:[marko],label:person,age:[29]],[id:4,name:[josh],label:person,age:[32]]]
==>[[id:2,name:[vadas],label:person,age:[27]],[id:1,name:[marko],label:person,age:[29]]]
==>[[id:3,name:[lop],lang:[java],label:software],[id:6,name:[peter],label:person,age:[35]]]
==>[[id:4,name:[josh],label:person,age:[32]],[id:5,name:[ripple],lang:[java],label:software]]
I used the modern graph, which is one of TinkerPop's toy graphs that are often used for such examples. Your output will look a bit different, and you may want to use something other than valueMap(true) for the by modulator. The TinkerPop documentation of the path step itself contains two more advanced examples for path().by() that you might want to check out.
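Conceptually, by() applies a function to each element of a path while keeping the path structure intact, which is what unfold() loses. The plain-Python sketch below models that idea only. The id-to-name mapping is an assumption inferred by lining up the path() and unfold() outputs in the question, and path_by is a hypothetical helper, not a TinkerPop call.

```python
# Assumed mapping, inferred from the question's path() and unfold() outputs.
names = {25: "Cash Processing", 26: "Sales", 27: "Marketing", 28: "Accounting"}

# Paths as vertex ids, as printed by path() in the question.
paths = [[25, 28], [25, 26], [26, 27], [26, 25]]


def path_by(paths, modulator):
    """Apply the modulator to each path element, keeping one list per path."""
    return [[modulator(vertex) for vertex in path] for path in paths]


named = path_by(paths, names.get)
for path in named:
    print(path)
```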
