I have an injected array of values. I want to add vertices if they don't exist. I use the fold and coalesce steps, but that doesn't work in this instance because I'm trying to do it for multiple vertices. Once one vertex exists I can no longer get an empty result, so the unfold inside the coalesce step returns a value from then on. As a result, vertices that don't exist yet are not added.
This is my current traversal:
const traversal = await g
?.inject([
{ twitterPostId: 'kay', like: true, retweet: false },
{ twitterPostId: 'fay', like: true, retweet: false },
{ twitterPostId: 'nay', like: true, retweet: false },
])
.unfold()
.as('a')
.aggregate('ta')
.V()
.as('b')
.where('b', p.eq('a'))
.by(__.id())
.by('twitterPostId')
.fold()
.coalesce(__.unfold(), __.addV().property(t.id, __.select('ta').unfold().select('twitterPostId')))
.toList();
Returns:
[Bn { id: 'kay', label: 'vertex', properties: undefined }]
Without using coalesce, you can do conditional upserts using what we often refer to as "map injection". The Gremlin does get a little advanced, but here is an example:
g.withSideEffect('ids',['3','4','xyz','abc']).
withSideEffect('p',['xyz': ['type':'dog'],'abc':['type':'cat']]).
V('3','4','xyz','abc').
id().fold().as('found').
select('ids').
unfold().
where(without('found')).as('missing').
addV('new-vertex').
property(id,select('missing')).
property('type',select('p').select(select('missing')).select('type'))
That query will look for a set of vertices, figure out which ones exist, and for the rest use the ID values and properties from the map called 'p' to create the new vertices. You can build on this pattern in a great many ways, and I find it very useful until mergeV and mergeE are more broadly available.
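To make the data flow of that query easier to follow, here is a minimal plain-Python model of what it computes. It does not talk to a graph: `plan_upserts`, `existing` and the sample data are hypothetical names used only for illustration, and the comments map each line to the Gremlin step it imitates.

```python
def plan_upserts(existing_ids, wanted_ids, props_by_id):
    """Return (id, properties) pairs for the wanted IDs that are not stored yet."""
    found = set(existing_ids)                            # id().fold().as('found')
    missing = [i for i in wanted_ids if i not in found]  # where(without('found'))
    # One new vertex per missing ID, with its properties looked up in the map 'p'.
    return [(i, props_by_id.get(i, {})) for i in missing]


existing = ['3', '4']                                    # assumed to be in the graph already
ids = ['3', '4', 'xyz', 'abc']                           # the 'ids' side effect
p = {'xyz': {'type': 'dog'}, 'abc': {'type': 'cat'}}     # the 'p' side effect
to_create = plan_upserts(existing, ids, p)
print(to_create)
```

Because the existing IDs are collected into one list first, the check still works when some, all, or none of the vertices exist.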
You can also use the list of IDs in the query to check which ones exist. However, this may lead to inefficient query plans depending on the given implementation:
g.withSideEffect('ids',['3','4','xyz','abc']).
withSideEffect('p',['xyz': ['type':'dog'],'abc':['type':'cat']]).
V().
where(within('ids')).
by(id).
by().
id().fold().as('found').
select('ids').
unfold().
where(without('found')).as('missing').
addV('new-vertex').
property(id,select('missing')).
property('type',select('p').select(select('missing')).select('type'))
This is trickier than the first query, as the V step cannot take a traversal. So you cannot do V(select('ids')) in Gremlin today.
Using python gremlin on Neptune workbench, I have two functions:
The first adds a Vertex with a set of properties, and returns a reference to the traversal operation
The second adds to that traversal operation.
For some reason, the first function's operations are getting persisted to the DB, but the second operations do not. Why is this?
Here are the two functions:
def add_v(v_type, name):
tmp_id = get_id(f"{v_type}-{name}")
result = g.addV(v_type).property('id', tmp_id).property('name', name)
result.iterate()
return result
def process_records(features):
for i in features:
v_type = i[0]
name = i[1]
v = add_v(v_type, name)
if len(i) > 2:
%debug
props = i[2]
for r in props:
v.property(r[0], r[1]).iterate()
Your add_v method has already iterated the traversal. If you want to return the traversal from add_v so that you can add steps to it, remove the iterate() call there and iterate once, after all the properties have been added.
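The effect can be illustrated with a toy model. `ToyTraversal` below is a hypothetical stand-in, not the gremlin_python API; it only assumes that a traversal is submitted once, on its first iterate(), so steps chained afterwards never reach the server.

```python
class ToyTraversal:
    """Hypothetical stand-in for a lazily built traversal (not gremlin_python)."""

    def __init__(self, sent_log):
        self.steps = []
        self.sent_log = sent_log   # records what "reached the server"
        self.done = False

    def property(self, key, value):
        self.steps.append((key, value))
        return self

    def iterate(self):
        # Only the first iterate() submits; later steps stay on the client.
        if not self.done:
            self.sent_log.append(list(self.steps))
            self.done = True
        return self


def add_v_early(log, name):
    t = ToyTraversal(log).property('name', name)
    t.iterate()                    # submitted here, as in the question
    return t


def add_v_lazy(log, name):
    return ToyTraversal(log).property('name', name)   # caller iterates later


early_log, lazy_log = [], []
add_v_early(early_log, 'a').property('color', 'red').iterate()
add_v_lazy(lazy_log, 'a').property('color', 'red').iterate()
print(early_log)
print(lazy_log)
```

In the lazy variant every property is part of the single submission, which is the behaviour the second function in the question relies on.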
I'm trying to return a limited number of vertices matching a pattern, as well as the total (non-limited) count of vertices matching that pattern.
g.V()
.hasLabel("PersonPublic")
.has('partitionKey', "Q2r1NaG6KWdScX4RaeZs")
.has('docId', "Q2r1NaG6KWdScX4RaeZs")
.out("CONTACT_LIST")
.out("SUBSCRIBER")
.dedup()
.order()
.by("identifier")
.by("docId")
.fold()
.project('people','total')
.by(
unfold()
.has('docId', gt("23"))
.limit(2)
.project('type','id')
.by(label())
.by(values('docId'))
)
.by(unfold().count())
In plain English, I'm finding a person, finding all the contact lists of that person, finding all the subscribers to those contact lists, de-duplicating the subscribers, ordering the subscribers, pausing there to collect everything, and then projecting the results in the form:
{
people: [{type: string, id: string}],
total: number,
}
The "people" part of the projection is unfolded, filtered to only contain results with a "docId" greater than "23", limited to 2, and then projected again.
The "total" part of the projection is unfolded (no-limit) and counted.
My goal is to allow paging through a pattern while still retrieving the total number of vertices associated with the pattern.
Unfortunately, on Cosmos DB this query is not working. Results are in the form:
{
people: {type: string, id: string},
total: number,
}
And only the first person result is returned (rather than an array).
Any help would be greatly appreciated!
You need to fold() the projected values again; otherwise by() keeps only the first result of the inner traversal. Also, for the total you don't need to unfold(); count(local) counts the folded list directly and avoids the extra work.
g.V()
.hasLabel("PersonPublic")
.has('partitionKey', "Q2r1NaG6KWdScX4RaeZs")
.has('docId', "Q2r1NaG6KWdScX4RaeZs")
.out("CONTACT_LIST")
.out("SUBSCRIBER")
.dedup()
.order()
.by("identifier")
.by("docId")
.fold()
.project('people','total')
.by(
unfold()
.has('docId', gt("23"))
.limit(2)
.project('type','id')
.by(label)
.by('docId')
.fold()
)
.by(count(local))
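The result shape this traversal is meant to produce can be modelled in plain Python. This is only a sketch of the paging logic, not Cosmos DB code: `page_with_total` and the sample vertices are hypothetical, and the input is assumed to be de-duplicated already.

```python
def page_with_total(people, after_doc_id, page_size):
    """Mimic fold().project('people', 'total'): one page plus the unpaged count."""
    ordered = sorted(people, key=lambda v: (v['identifier'], v['docId']))
    page = [
        {'type': v['label'], 'id': v['docId']}
        for v in ordered
        if v['docId'] > after_doc_id       # string comparison, like gt("23")
    ][:page_size]                          # limit(2)
    return {'people': page, 'total': len(ordered)}   # count(local)


subscribers = [
    {'label': 'PersonPublic', 'identifier': 'd', 'docId': '41'},
    {'label': 'PersonPublic', 'identifier': 'a', 'docId': '10'},
    {'label': 'PersonPublic', 'identifier': 'c', 'docId': '30'},
    {'label': 'PersonPublic', 'identifier': 'b', 'docId': '24'},
]
result = page_with_total(subscribers, '23', 2)
print(result)
```

The page is a list even when it holds a single entry, and the total is taken from the full ordered list rather than from the page.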
I faced this issue during a migration of gremlin queries from v2 to v3.
V2-way: inE().has(some condition).outV().map().toList()[0] will return an object. This is wrapped in transform{label: it./etc/} step.
V3-way, still WIP: inE().has(some condition).outV().fold() will return an array. This is wrapped in project(...).by(...) step.
V3 works fine, I just have to unwrap the item from the array manually. I wonder if there is a saner approach (in any case, this feels like a non-graph-friendly step).
Environment: JanusGraph, TinkerPop3+. For v2: Titan graph db and TinkerPop2+.
Update: V3 query sample
inE('edge1').
has('cond1').outV(). // one vertex left
project('items', 'count'). // pagination
by(
order().
by('field1', decr).
project('vertex_itself', 'vertex2', 'vertices3').
by(identity()).
by(outE('edge2').has('type', 'type1').limit(1).inV().fold()). // now this is empty array or single-element array, can we return element itself?
by(inE('edge2').has('type', 'type2').outV().fold()).
fold()).
by(count())
Desired result shape:
[{
items: [
{vertex_itself: Object, vertex2: Object/null/empty, vertices3: Array},
{}...
],
count: Number,
}]
Problem: vertex2 property is always an array, empty or single-element.
Expected: vertex2 to be object or null/empty.
Update 2: it turns out my query is not finished yet. It returns many objects if there is not a single element at the has('cond1').outV() step, e.g. [{items, count}, {items, count}...]
It looks like your main issue is getting a single item from the traversal.
You can do this with next(), which retrieves the next element from the traversal:
inE().has(some condition).outV().next()
The structure of the returned value is, I think, implementation-specific. E.g. in JavaScript, you can access the item through the value property:
const result = await inE().has(some condition).outV().next();
const item = result.value;
I may not fully understand, but it sounds like from this:
inE().has(some condition).outV().fold()
you want to just grab the first vertex you come across. If that's right, then is there a reason to fold() at all? Maybe just do:
inE().has(some condition).outV().limit(1)
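If the fold() has to stay inside project() so that an empty result still yields a value, the manual unwrapping mentioned in the question can be a small client-side helper. A minimal sketch, assuming each result row arrives as a dict; `first_or_none` and `unwrap_vertex2` are hypothetical names.

```python
def first_or_none(items):
    """Return the only element of a zero-or-one element list, or None if empty."""
    return items[0] if items else None


def unwrap_vertex2(row):
    """Copy a projected row, replacing the folded 'vertex2' list by its element."""
    out = dict(row)
    out['vertex2'] = first_or_none(row['vertex2'])
    return out


rows = [
    {'vertex_itself': 'v1', 'vertex2': ['v9'], 'vertices3': ['v3', 'v4']},
    {'vertex_itself': 'v2', 'vertex2': [], 'vertices3': []},
]
unwrapped = [unwrap_vertex2(r) for r in rows]
print(unwrapped)
```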
I have a perfectly working query that looks as follows:
SELECT p.id FROM place p WHERE ST_DISTANCE(p.geometry, {'type': 'Point', 'coordinates':[52.0826443333333, 5.11771783333333]} ) > 6000
It returns a list of IDs of documents that are more than 6000 m from the geospatial point. Everything seems fine. However, if I flip the '>' (greater than) sign to '<' (smaller than), it does not give any result. Interestingly, it does return false/true values if I move the condition from the WHERE clause into the SELECT clause, as follows:
SELECT ST_DISTANCE(p.geometry, {'type': 'Point', 'coordinates':[52.0826443333333, 5.11771783333333]}) < 6000 AS result FROM place p
It generates both true and false values as expected. So the evaluation seems to work, but the filtered query does not return any output. Currently, I just use this latter workaround, and also select the computed distances. But now I have to compute the points that are within a certain distance somewhere else (like on the client side or in a stored procedure).
UPDATE
I tested with a specified index policy (thanks to this example):
'indexingPolicy': {'includedPaths': [{'path': '/"geometry"/?', 'indexes': [ {'kind': 'Spatial', 'dataType': 'LineString'}]}, {'path': '/'}]}
And that solved the problem. I still think it is odd that the spatial function did work on 'greater than' and not on 'smaller than', but I think it is solved with this.
You should specify a Spatial index on that field like this:
'indexingPolicy': {
'includedPaths': [
{
'path': '/"geometry"/?',
'indexes': [
{'kind': 'Spatial', 'dataType': 'LineString'}
]
},
{'path': '/'}
]
}
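For the client-side fallback mentioned in the question, a rough distance filter can be sketched in Python with the haversine formula. This is an approximation on a sphere, so its results may not match ST_DISTANCE exactly; `haversine_m`, `within` and the sample places are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (an approximation)


def haversine_m(a, b):
    """Great-circle distance in metres between two GeoJSON-style [lon, lat] pairs."""
    lon1, lat1 = map(math.radians, a)   # GeoJSON lists longitude first
    lon2, lat2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def within(places, center, max_m):
    """IDs of the places closer to center than max_m metres."""
    return [p['id'] for p in places if haversine_m(p['coordinates'], center) < max_m]


center = [0.0, 0.0]
places = [
    {'id': 'near', 'coordinates': [0.01, 0.0]},
    {'id': 'far', 'coordinates': [1.0, 0.0]},
]
nearby = within(places, center, 6000)
print(nearby)
```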
I'm struggling to work around a small limitation of Neo4j in that I am unable to use a parameter in the Relationship section of a Cypher query.
Christophe Willemsen has already graciously assisted me in working my query to the following:
MATCH (n1:Point { name: {n1name} }),
(n2:Point { name: {n2name} }),
p = shortestPath((n1)-[r*]->(n2))
WHERE type(r) = {relType}
RETURN p
Unfortunately, as r is a Collection of relationships and not a single relationship, this fails with an error:
scala.collection.immutable.Stream$Cons cannot be cast to org.neo4j.graphdb.Relationship
Removing the use of shortestPath() allows the query to run successfully but returns no results.
Essentially my graph is a massive collection of "paths" that link "points" together. It is currently structured as such:
http://console.neo4j.org/r/rholp
I need to be able to provide a starting point (n1name), an ending point (n2name), and a single path to travel along (relType). I need a list of nodes to come out of the query (all the ones along the path).
Have I structured my graph incorrectly / not optimally? I am open to advice on whether the overall structure is not optimal as well as advice on how best to structure the query!
EDIT
Regarding your edit, the nodes() function returns the nodes along the path:
MATCH p=allShortestPaths((n:Point { name:"Point5" })-[*]->(n2:Point { name:"Point8" }))
WHERE ALL (r IN rels(p) WHERE type(r)={relType})
RETURN nodes(p)
In the console link, it is returning nodes Points 5,6,7,8
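What that query asks for (the shortest path whose relationships all have one given type, returned as a list of nodes) can be sketched in plain Python as a breadth-first search. The edge list below is made up for illustration and is not the graph from the console link; `shortest_path_nodes` is a hypothetical helper.

```python
from collections import deque


def shortest_path_nodes(edges, start, end, rel_type):
    """Breadth-first search that only follows directed edges of one type."""
    adjacency = {}
    for src, typ, dst in edges:
        if typ == rel_type:                 # ALL (r IN rels(p) WHERE type(r) = relType)
            adjacency.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path                     # nodes(p)
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None                             # no path of that type exists


edges = [
    ('Point5', 'PATH_TWO', 'Point6'),
    ('Point6', 'PATH_TWO', 'Point7'),
    ('Point7', 'PATH_TWO', 'Point8'),
    ('Point5', 'PATH_ONE', 'Point8'),
]
path_two = shortest_path_nodes(edges, 'Point5', 'Point8', 'PATH_TWO')
print(path_two)
```

Filtering the edges before searching means a shorter route over a different relationship type is never considered.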
I guess in your case that using a common relationship type name for connecting your Point nodes would be more efficient.
If the purpose of having Path1, Path2, ... is to know the distance between two points, you can get it by asking for the length of the path, as in this query against your console link:
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=shortestPath((n)-[*]->(n2))
RETURN length(p)
If you need to return only paths with a fixed number of relationships, you can do it without shortestPath by specifying a strict depth:
MATCH (n:Point { name:"Point1" })
WITH n
MATCH (n2:Point { name:"Point4" })
WITH n, n2
MATCH p=(n)-[*3..3]->(n2)
RETURN length(p)
LIMIT 1
As you can see here, specifying the relationship type is not mandatory; you can omit it, or add the :NEXT type if you have other relationship types in your graph.
If you need to match on the type, e.g. for the path from Point5 to Point8 in your console link where the path may only contain PATH_TWO relationships, then you can do this:
MATCH (n:Point { name:"Point5" })
WITH n
MATCH (n2:Point { name:"Point8" })
WITH n, n2
MATCH p=(n)-[r*]->(n2)
WHERE ALL (x IN r WHERE type(x) = 'PATH_TWO')
WITH p, length(p) AS l
ORDER BY l
RETURN p, l
LIMIT 1
If you really NEED to have the Path1, Path2 style, a short explanation of why could help us find a more appropriate query.
MATCH p=shortestpath((n1:Point{name:{n1name}})-[:relType *]->(n2:Point {name:{n2name}}))
RETURN p