Find all vertices with no out edges - azure-cosmosdb

I'm new to Gremlin and I can't figure out a simple query that returns all vertices of my graph that do not have any edges (i.e., orphaned vertices). Ideally I'd like those without any 'out' edge.
I've been reading and some questions/articles say I can interpret an out edge as a property, but that didn't work for me either. I've been looking at hasNot and filtering.
Any ideas?
Thanks
-John

You can simply do this:
g.V().not(outE())
Or if you want to find total orphans:
g.V().not(bothE())
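To make the difference between the two filters concrete, here is a small plain-Python model of a directed graph. It is only a sketch of the semantics, not Gremlin and not tied to Cosmos DB; the vertex names, the edge list, and both function names are made up for illustration.

```python
# Toy directed graph: each edge is (source, target). All names are illustrative.
vertices = {"a", "b", "c", "d"}
edges = [("a", "b"), ("b", "c")]

def no_out_edges(vertices, edges):
    """Vertices with no outgoing edge, comparable to g.V().not(outE())."""
    sources = {src for src, _ in edges}
    return vertices - sources

def total_orphans(vertices, edges):
    """Vertices with no edge in either direction, comparable to g.V().not(bothE())."""
    touched = {v for edge in edges for v in edge}
    return vertices - touched

print(sorted(no_out_edges(vertices, edges)))   # "c" has only an incoming edge, "d" has none
print(sorted(total_orphans(vertices, edges)))  # only "d" is fully disconnected
```

A vertex such as "c", which is only ever the target of an edge, is returned by the first filter but not by the second.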

Try this: g.V().as('a').where(out().count().is(0)).select('a')
But, depending on how many vertices you have, you can run into a "request rate too large" exception (HTTP 429).
To avoid that, you can run the query over ranges: by id if you know the id ranges of the vertices, or by ranges of some other property. An id-range-based example is below:
g.V().has('id', gt(0)).has('id', lt(100)).as('a').where(out().count().is(0)).select('a')
g.V().has('id', gt(99)).has('id', lt(200)).as('a').where(out().count().is(0)).select('a')
....
and so on
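Writing the windows by hand gets tedious, so the query strings could be generated instead. The sketch below assumes numeric, comparable ids, as the example above does; range_queries is a hypothetical helper, it uses gte() for the lower bound so that each window is half-open, and it only builds strings without sending anything to a server.

```python
def range_queries(start, stop, step):
    """Yield one Gremlin query string per half-open id window [lo, hi)."""
    for lo in range(start, stop, step):
        hi = min(lo + step, stop)  # the last window may be shorter than step
        yield (
            f"g.V().has('id', gte({lo})).has('id', lt({hi}))"
            ".not(outE())"
        )

for query in range_queries(0, 300, 100):
    print(query)
```

Each string can then be submitted on its own, with a pause or retry between submissions if the 429 responses persist.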

Related

Gremlin Emit Times Loses Lower Hop Solution (Gremlify Example Included)

Relatively new to gremlin and working with this query:
g.V().or(has('LOCATION', eq('IPLTINMYGT0')),
has('LOCATION', eq('IPLTINMYK01'))).
repeat(bothE().otherV()).emit().times(1).has('LOCATION',eq('FSHRIN01K00')).dedup().
path().by(valueMap('LOCATION')).dedup()
And this simple graph on gremlify:
https://gremlify.com/grrlq20ig57/1
When I vary the query from times(1) to times(2), the result that shows up in the times(1) query no longer shows up in the times(2) query. I'm guessing this can be read as 'at most 1 hop' or 'at most 2 hops' so I was expecting when I went to higher level hops the times(1) result would still be included. Any way to get the times(1) result to show up (in addition to the times(2) result) when issuing times(2) queries (or greater)? Does this behavior have anything to do with DFS vs BFS? Any help is appreciated. Thanks!
Your use of dedup() after has('LOCATION',eq('FSHRIN01K00')) is filtering away one of the vertices emitted when you do times(2). Therefore when you call path() it is only called once on the traverser that survives that filter. If you remove that dedup() you get both paths traversed.
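The effect can be modelled in plain Python. This is a rough sketch, not how TinkerPop is implemented: each traverser is represented by the list of vertices it has visited, the three-vertex graph is hypothetical, and the order in which traversers arrive at dedup() is an assumption (the real order depends on the engine).

```python
def walks(graph, starts, max_hops):
    """All walks of 1..max_hops hops from the start vertices, shortest first."""
    frontier = [[s] for s in starts]
    found = []
    for _ in range(max_hops):
        frontier = [p + [n] for p in frontier for n in graph[p[-1]]]
        found.extend(frontier)
    return found

def dedup_by_end(paths):
    """Keep only the first path seen for each end vertex, as dedup() on vertices would."""
    seen, kept = set(), []
    for p in paths:
        if p[-1] not in seen:
            seen.add(p[-1])
            kept.append(p)
    return kept

# Hypothetical graph: S reaches T directly and also through M.
graph = {"S": ["T", "M"], "M": ["S", "T"], "T": ["S", "M"]}
to_target = [p for p in walks(graph, ["S"], 2) if p[-1] == "T"]

print(to_target)                        # both the 1-hop and the 2-hop path end at T
print(dedup_by_end(to_target))          # only the first one to arrive survives
print(dedup_by_end(to_target[::-1]))    # reversed arrival order keeps the 2-hop path instead
```

Both paths end at the same vertex, so a dedup() placed before path() can keep only one of them; which one depends on arrival order, and in the question it was evidently the 2-hop one.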

Order results by number of coincidences in edge properties

I'm working on a recommendation system that recommends other users. The first results should be the users most "similar" to the "searcher" user. Users respond to questions, and the number of questions answered in the same way is the measure of similarity.
The problem is that I don't know how to write the query.
In technical terms, I need to sort the users by the number of edges that have specific property values. I tried this query, which I thought should work, but it doesn't:
let query = g.V().hasLabel('user');
let search = __;
for (const question of searcher.questions) {
search = search.outE('response')
.has('questionId', question.questionId)
.has('answerId', question.answerId)
.aggregate('x')
.cap('x')
}
query = query.order().by(search.unfold().count(), order.asc);
Throws this gremlin internal error:
org.apache.tinkerpop.gremlin.process.traversal.step.util.BulkSet cannot be cast to org.apache.tinkerpop.gremlin.structure.Vertex
I also tried with multiple .by() for each question, but the result was not ordered by the amount of coincidence.
How can I write this query?
When you cap() an aggregate() it returns a BulkSet, which is a Set that keeps a count of how many times each object occurs in it. When you iterate through it, it behaves like a List, repeating each object as many times as its count. You get your error because the output of cap('x') is a BulkSet, and because you are building search in a loop, the next iteration is basically calling outE('response') on that BulkSet. That is not valid, as outE() expects a Vertex, as indicated by the error.
I think you would prefer something more like:
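// assumes: const { order, scope, column, statics: __ } = gremlin.process;
let query = g.V().hasLabel('user').
  outE('response');
let search = [];
for (const question of searcher.questions) {
  // one anonymous traversal per question/answer pair to match
  search.push(__.has('questionId', question.questionId).
    has('answerId', question.answerId));
}
query = query.or(...search).
  groupCount().
    by(__.outV()).
  order(scope.local).by(column.values, order.asc);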
I may not have the javascript syntax exactly right (and I used spread syntax in my or() to just convey the idea quickly of what needs to happen) but basically the idea here is to filter edges that match your question criteria and then use groupCount() to count up those edges.
If you need to count users who have no connection then perhaps you could switch to project() - maybe like:
// assumes: const { order, statics: __ } = gremlin.process;
let query = g.V().hasLabel('user').
  project('user','count').
    by();
let search = [];
for (const question of searcher.questions) {
  search.push(__.has('questionId', question.questionId).
    has('answerId', question.answerId));
}
// count matching edges per user; users with no matches get a count of 0
query = query.by(__.outE('response').or(...search).count()).
  order().by('count', order.asc);
fwiw, I think you might consider a different schema for your data that might make this recommendation algorithm a bit more graph-like. A thought might be to make the question/answer a vertex (a "qa" label perhaps) and have edges go from the user vertex to the "qa" vertex. Then users directly link to the question/answers they gave. You can easily see by way of edges, a direct relationship, which users gave the same question/answer combination. That change allows the query to flow much more naturally when asking the question, "What users answered questions in the same way user 'A' did?"
g.V().has('person','name','A').
out('responds').
in('responds').
groupCount().
order(local).by(values)
With that change you can see that we can rid ourselves of all those has() filters because they are implied by the "responds" edges, which encode them into the graph data itself.
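The counting that the out()/in()/groupCount() traversal performs can be sketched outside the graph as well. The Python below is only an illustration under assumptions: the responses dictionary and its contents are invented, each question/answer pair stands in for a "qa" vertex, and unlike the traversal above it leaves out the searching user and sorts the most similar users first.

```python
from collections import Counter

# Hypothetical data: user -> set of (questionId, answerId) pairs they gave.
responses = {
    "A": {(1, "yes"), (2, "no"), (3, "yes")},
    "B": {(1, "yes"), (2, "no"), (3, "no")},
    "C": {(1, "no"), (2, "no")},
    "D": {(4, "yes")},
}

def similar_users(responses, searcher):
    """Count shared question/answer pairs per other user, most similar first."""
    mine = responses[searcher]
    counts = Counter()
    for user, answers in responses.items():
        if user != searcher:
            shared = len(mine & answers)  # pairs answered identically
            if shared:
                counts[user] = shared
    return counts.most_common()

print(similar_users(responses, "A"))
```

Users with nothing in common with the searcher (here "D") do not appear at all, which mirrors the groupCount() version rather than the project() one.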

How to get a path from one node to another including all other nodes and relationships involved in between

I have designed a model in Neo4j to get paths from one station to another, including the platforms and legs involved. The model is depicted below. Basically, I need a query that takes me from NBW to RD and also shows the platforms and legs involved. I am struggling with the query and get no result. I would appreciate any help.
Here is my cypher statement:
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]->(c:Station)
WHERE (a.name='NBW')
AND c.name='RD'
RETURN p
Model:
As mentioned in the comments, in Cypher you can't use a directed variable-length relationship that uses differing directions for some of the relationships.
However, APOC Procedures just added the ability to expand based on sequences of relationships. You can give this a try:
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='THT'
CALL apoc.path.expandConfig(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>, can_board>, goto>, can_alight>, <has_platform'}) YIELD path
RETURN path
I added a limit so that only the first (and shortest) path to your end station will be returned. Removing the limit isn't advisable, since this will continue to repeat the relationships in the expansion, going from station to station, until it finds all possible ways to get to your end station, which could hang your query.
EDIT
Regarding the new model changes, the reason the above will not work is because relationship sequences can't contain a variable-length sequence within them. You have 2 goto> relationships to traverse, but only one is specified in the sequence.
Here's an alternative that doesn't use sequences, just a whitelisting of allowed relationships. The spanningTree() procedure uses NODE_GLOBAL uniqueness so there will only be a single unique path to each node found (paths will not backtrack or revisit previously-visited nodes).
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='RD'
CALL apoc.path.spanningTree(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>|can_board>|goto>|can_alight>|<has_platform'}) YIELD path
RETURN path
Your query is directed (-->), and not all of the relationships between your two stations run in the same direction. If you remove the relationship direction you will get a result.
Once you have a result, something like the query below could point you in the right direction for extracting particular details from the resulting path.
Essentially, I am assuming that everything you are interested in is in the returned path, so you just need to filter out the different pieces.
As @InverseFalcon points out, this query should be limited in a larger graph or it could easily run away.
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]-(c:Station)
WHERE (a.name='NBW')
AND c.name='THT'
RETURN [n IN nodes(p) WHERE 'Platform' IN labels(n)] AS Platforms

How do I get all nodes in the graph connected by a certain relationship type

I have built a small graph where all the screens are connected and the flow of the screens varies based on the system/user. So the system/user is the relationship type.
I am looking to fetch all nodes that are linked by a certain relationship from a starting screen. I don't care about the depth, since I don't know the depth of the graph.
Something like the query below, but it takes forever to return and it returns incorrect connections that do not match the attribute {path:'CC'}:
match (n:screen {isStart:true})-[r:NEXT*0..{path:'CC'}]-()
return r,n
A few suggestions:
Make sure you have created an index for :screen(isStart):
CREATE INDEX ON :screen(isStart);
Are you sure you want to include 0-length paths? If not, take out 0.. from your query.
You did not specify the directionality of the :NEXT relationships, so the DB has to look at both incoming and outgoing :NEXT relationships. If appropriate, specify the directionality.
To minimize the number of result rows, add a WHERE clause that ensures that the current path cannot be extended further.
Here is a proposed query that combines the last 3 suggestions (fix it up to suit your needs):
MATCH (n:screen {isStart:true})-[r:NEXT* {path:'CC'}]->(x)
WHERE NOT (x)-[:NEXT {path:'CC'}]->()
return r,n;
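The idea behind the WHERE clause, keeping only chains that cannot be extended any further along a matching relationship, can be sketched in plain Python. The screen names and the reachable_ends helper are hypothetical, and unlike the Cypher query this returns only the end nodes rather than the relationships along the way.

```python
def reachable_ends(edges, start, path_value):
    """Follow only edges whose 'path' property matches; return nodes with no further match."""
    out = {}
    for src, dst, path in edges:
        if path == path_value:
            out.setdefault(src, []).append(dst)
    ends, stack, seen = set(), [start], {start}
    while stack:
        node = stack.pop()
        nexts = out.get(node, [])
        if not nexts and node != start:
            ends.add(node)  # nothing matching leaves this node, so the chain stops here
        for n in nexts:
            if n not in seen:  # guard against cycles between screens
                seen.add(n)
                stack.append(n)
    return ends

# Hypothetical screens: (from, to, path property on the NEXT relationship).
edges = [("home", "cart", "CC"), ("cart", "pay", "CC"), ("home", "help", "XX")]
print(reachable_ends(edges, "home", "CC"))
```

The "help" screen is never visited because its relationship carries a different path value, which is the behaviour the property filter on the relationship is meant to give.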

neo4j Cypher Query

I have the following graph in a Neo4j graph database and, using the Cypher query language, I want to retrieve all the data that is connected to the root node and its child nodes.
For example:
Please see the graph image below.
[As per the image, node 1 has two children, and those children have many children of their own, all with the same relationship. What I want is to query node 1 using Cypher and get back all the data of its child nodes, their child nodes, and so on. The relationship between nodes is "Parent_of".]
Can anyone help me with this?
start n=node(1) // use the id, or find it using an index
match n-[:parent_of*0..]->m
return m
will get you all the graph nodes in m. You could also take m.some_property instead of m if you don't want the node itself, but some property that is stored in your nodes.
Careful though, as the path has no limit, this query could become pretty huge in a large graph.
You can see an example of *0.. here: http://gist.neo4j.org/?6608600
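As a rough picture of what *0.. collects, and of how a depth cap keeps the result bounded, here is a small Python sketch. The children mapping and the descendants function are invented for illustration, and max_depth is an assumed parameter that plays the role of an upper bound on the variable-length pattern.

```python
from collections import deque

def descendants(children, root, max_depth=None):
    """Breadth-first walk over parent->children links; the root itself is included (depth 0)."""
    seen = [root]
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand below the depth cap
        for child in children.get(node, []):
            if child not in seen:
                seen.append(child)
                queue.append((child, depth + 1))
    return seen

# Hypothetical tree: node 1 has two children, which have children of their own.
children = {1: [2, 3], 2: [4, 5], 3: [6]}
print(descendants(children, 1))               # every node reachable from 1, including 1
print(descendants(children, 1, max_depth=1))  # only 1 and its direct children
```

The root appears in the result because a zero-length path matches the start node itself.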
