Upsert fails when using as() and coalesce() - gremlin

I'm trying to create an upsert traversal in Gremlin. Update an edge if it exists, otherwise add a new edge.
g.V("123")
.as("user")
.V("456")
.as("post")
.inE("like")
.fold()
.coalesce(
__.unfold()
.property("likeCount", 1),
__.addE("like")
.from("user")
.to("post")
)
This returns an error.
The provided traverser does not map to a value: []->[SelectOneStep(last,post)]
I've narrowed this down to the to("post") step. From within coalesce it can't see post from as("post"). It is also unable to see user.
This is strange to me because the following does work:
g.V("123")
.as("user")
.V("456")
.as("post")
.choose(
__.inE("like"),
__.inE("like")
.property("likeCount", 1),
__.addE("like")
.from("user")
.to("post")
)
From within the choose() step I do have access to user and post.
I'd like to use the more efficient upsert pattern but can't get past this issue. I could just look up the user and post from within coalesce like so:
g.V("123")
.as("user")
.V("456")
.as("post")
.inE("like")
.fold()
.coalesce(
__.unfold()
.property("likeCount", 1),
__.V("456")
.as("post")
.V("123")
.addE("like")
.to("post")
)
But repeating that traversal seems inefficient. I need post and user in the outer traversal for other reasons.
Why can't I access user and post from within a coalesce in my first example?

The issue you are running into is that as soon as you hit the fold() step in your code you lose the path history, which means the traversal no longer knows what user or post refer to. fold() is what is known as a ReducingBarrierStep, which means that many results are collected into a single result. The way I think about it is that because you have converted many results into one, any labels that were added (e.g. user and post) no longer really have meaning, as they have all been collected into a single element.
However you can rewrite your query as shown here to achieve the desired result:
g.V("456")
.inE("like")
.fold()
.coalesce(
__.unfold()
.property("likeCount", 1),
__.addE("like")
.from(__.V("123"))
.to(__.V("456"))
)
I am also not sure whether you meant to set the like count only on an existing edge, or whether you wanted to set it on the edge in either case, which would look like this:
g.V("456")
.inE("like")
.fold()
.coalesce(
__.unfold(),
__.addE("like")
.from(__.V("123"))
.to(__.V("456"))
).property("likeCount", 1)
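The loss of step labels at fold() can be illustrated with a toy model in plain JavaScript. This is only a sketch, not TinkerPop code: the functions as, fold and select below are hypothetical stand-ins that assume a traverser is just a current value plus the labels recorded on its path.

```javascript
// Toy model only: a "traverser" is a current value plus the labels on its path.
function as(traversers, label) {
  // Record the current value under `label` in each traverser's path.
  return traversers.map(t => ({ value: t.value, path: { ...t.path, [label]: t.value } }));
}

function fold(traversers) {
  // Reduce all incoming traversers to one list-valued traverser.
  // The new traverser starts with an empty path, so earlier labels are gone.
  return [{ value: traversers.map(t => t.value), path: {} }];
}

function select(traverser, label) {
  if (!(label in traverser.path)) {
    throw new Error(`no value for label '${label}'`);
  }
  return traverser.path[label];
}

// Vertex "456" labelled as "post", with no incoming "like" edges yet.
const labelled = as([{ value: "456", path: {} }], "post");
const likeEdges = [];            // models inE("like") finding nothing
const folded = fold(likeEdges);  // still yields exactly one traverser holding []

console.log(select(labelled[0], "post")); // the label is visible before fold()
console.log(folded[0].value, folded[0].path);

let lookupFailed = false;
try {
  select(folded[0], "post");     // models to("post") inside coalesce()
} catch (e) {
  lookupFailed = true;           // comparable to "does not map to a value"
}
console.log(lookupFailed);
```

In this model the single traverser that comes out of fold() holds an empty list and an empty path, which is why the lookup of "post" has nothing to resolve against.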

Related

Gremlin Emit Times Loses Lower Hop Solution (Gremlify Example Included)

Relatively new to gremlin and working with this query:
g.V().or(has('LOCATION', eq('IPLTINMYGT0')),
has('LOCATION', eq('IPLTINMYK01'))).
repeat(bothE().otherV()).emit().times(1).has('LOCATION',eq('FSHRIN01K00')).dedup().
path().by(valueMap('LOCATION')).dedup()
And this simple graph on gremlify:
https://gremlify.com/grrlq20ig57/1
When I vary the query from times(1) to times(2), the result that shows up in the times(1) query no longer shows up in the times(2) query. I'm guessing this can be read as 'at most 1 hop' or 'at most 2 hops' so I was expecting when I went to higher level hops the times(1) result would still be included. Any way to get the times(1) result to show up (in addition to the times(2) result) when issuing times(2) queries (or greater)? Does this behavior have anything to do with DFS vs BFS? Any help is appreciated. Thanks!
Your use of dedup() after has('LOCATION',eq('FSHRIN01K00')) is filtering away one of the vertices emitted when you do times(2). Therefore when you call path() it is only called once on the traverser that survives that filter. If you remove that dedup() you get both paths traversed.
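The effect of deduplicating on the vertex before calling path() can be sketched in plain JavaScript. This is a toy model under stated assumptions: the vertex names A, B and C are hypothetical, and a traverser is modelled as the vertex it sits on plus the path it took.

```javascript
// Toy model: each traverser carries its current vertex and the path it took.
const traversers = [
  { vertex: "C", path: ["A", "C"] },       // reached in one hop
  { vertex: "C", path: ["A", "B", "C"] },  // reached in two hops
];

// Keep only the first traverser seen for each key.
function dedupBy(ts, keyOf) {
  const seen = new Set();
  const kept = [];
  for (const t of ts) {
    const key = keyOf(t);
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(t);
    }
  }
  return kept;
}

// dedup() before path(): compares the current vertex, so one traverser survives.
const pathsAfterVertexDedup = dedupBy(traversers, t => t.vertex).map(t => t.path);

// dedup() after path(): compares whole paths, so both distinct paths survive.
const pathsAfterPathDedup = dedupBy(traversers, t => t.path.join(">")).map(t => t.path);

console.log(pathsAfterVertexDedup.length, pathsAfterPathDedup.length);
```

Both traversers end on the same vertex, so deduplicating on the vertex discards one of them before its path is ever read.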

Order results by number of coincidences in edge properties

I'm working on a recommendation system that recommends other users. The first results should be the users most "similar" to the "searcher" user. Users respond to questions, and the number of questions answered in the same way is the measure of similarity.
The problem is that I don't know how to write the query.
In technical terms, I need to sort the users by the number of edges that have specific property values. I tried this query, which I thought should work, but it doesn't:
let query = g.V().hasLabel('user');
let search = __;
for (const question of searcher.questions) {
search = search.outE('response')
.has('questionId', question.questionId)
.has('answerId', question.answerId)
.aggregate('x')
.cap('x')
}
query = query.order().by(search.unfold().count(), order.asc);
Throws this gremlin internal error:
org.apache.tinkerpop.gremlin.process.traversal.step.util.BulkSet cannot be cast to org.apache.tinkerpop.gremlin.structure.Vertex
I also tried using a separate .by() for each question, but the result was not ordered by the number of matches.
How can I write this query?
When you cap() an aggregate(), it returns a BulkSet, which is a Set that keeps a count of how many times each object occurs in it. When you iterate through it, it behaves like a List, repeating each object as many times as its count. You get your error because the output of cap('x') is a BulkSet, and since you are building search in a loop, you are effectively calling outE('response') on that BulkSet. That is not valid, because steps like outE() and has() expect a graph Element such as a Vertex, as the error indicates.
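The counting behaviour described above can be modelled with a few lines of plain JavaScript. This is a hypothetical stand-in named ToyBulkSet, written only to show the idea of "a set with counts that unrolls when iterated"; it is not the TinkerPop class and makes no claim about its real API.

```javascript
// Hypothetical model of a set that remembers how often each item was added.
class ToyBulkSet {
  constructor() {
    this.counts = new Map();
  }

  add(item) {
    this.counts.set(item, (this.counts.get(item) || 0) + 1);
  }

  // Number of distinct items, ignoring their counts.
  get uniqueSize() {
    return this.counts.size;
  }

  // Iterating repeats each item as many times as it was added.
  *[Symbol.iterator]() {
    for (const [item, count] of this.counts) {
      for (let i = 0; i < count; i++) {
        yield item;
      }
    }
  }
}

const bulk = new ToyBulkSet();
["e1", "e2", "e1"].forEach(e => bulk.add(e));

const unrolled = [...bulk];
console.log(bulk.uniqueSize, unrolled);
```

The container is a single object holding edges, not an edge or vertex itself, which is why a step that needs a vertex cannot be applied to it directly.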
I think you would prefer something more like:
// Assumes __ is gremlin.process.statics, and that order, scope and column come from gremlin.process.
let query = g.V().hasLabel('user').
  outE('response');
let search = [];
for (const question of searcher.questions) {
  search.push(__.has('questionId', question.questionId).
    has('answerId', question.answerId));
}
query = query.or(...search).
  groupCount().
    by(__.outV()).
  order(scope.local).by(column.values, order.asc);
I may not have the javascript syntax exactly right (and I used spread syntax in my or() to just convey the idea quickly of what needs to happen) but basically the idea here is to filter edges that match your question criteria and then use groupCount() to count up those edges.
If you need to count users who have no connection then perhaps you could switch to project() - maybe like:
// Assumes __ is gremlin.process.statics and order comes from gremlin.process.
let query = g.V().hasLabel('user').
  project('user','count').
    by();
let search = [];
for (const question of searcher.questions) {
  search.push(__.has('questionId', question.questionId).
    has('answerId', question.answerId));
}
query = query.by(__.outE('response').or(...search).count()).
  order().by('count', order.asc);
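The difference between the two approaches above can be checked without a graph at all. The following plain JavaScript sketch uses made-up sample data (the user names and ids are hypothetical) and models 'response' edges as records, to show that a groupCount() style result omits users with no matching edges while a project() style result keeps them with a count of zero.

```javascript
// Hypothetical sample data: 'response' edges modelled as plain records.
const users = ["alice", "bob", "carol"];
const responses = [
  { user: "alice", questionId: 1, answerId: 10 },
  { user: "alice", questionId: 2, answerId: 20 },
  { user: "bob", questionId: 1, answerId: 10 },
  { user: "bob", questionId: 2, answerId: 21 },
  { user: "carol", questionId: 1, answerId: 11 },
];
const searcherQuestions = [
  { questionId: 1, answerId: 10 },
  { questionId: 2, answerId: 20 },
];

// Models or(has(...).has(...), ...): the edge matches any of the searcher's answers.
const matches = r => searcherQuestions.some(
  q => q.questionId === r.questionId && q.answerId === r.answerId);

// groupCount().by(outV()) style: only users with at least one match appear.
const grouped = new Map();
for (const r of responses.filter(matches)) {
  grouped.set(r.user, (grouped.get(r.user) || 0) + 1);
}
const groupedAsc = [...grouped.entries()].sort((a, b) => a[1] - b[1]);

// project('user','count') style: every user appears, including zero matches.
const projected = users
  .map(user => ({
    user,
    count: responses.filter(r => r.user === user && matches(r)).length,
  }))
  .sort((a, b) => a.count - b.count);

console.log(groupedAsc);
console.log(projected);
```

For a recommendation list where the most similar users should come first, the sort direction would be reversed in either variant.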
fwiw, I think you might consider a different schema for your data that might make this recommendation algorithm a bit more graph-like. A thought might be to make the question/answer a vertex (a "qa" label perhaps) and have edges go from the user vertex to the "qa" vertex. Then users directly link to the question/answers they gave. You can easily see by way of edges, a direct relationship, which users gave the same question/answer combination. That change allows the query to flow much more naturally when asking the question, "What users answered questions in the same way user 'A' did?"
g.V().has('person','name','A').
out('responds').
in('responds').
groupCount().
order(local).by(values)
With that change, we can get rid of all those has() filters, because they are implied by the "responds" edges, which encode them in the graph data itself.
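The out/in/groupCount flow over the suggested schema can be traced in plain JavaScript. This is a sketch with hypothetical data: each "qa" vertex is modelled as a string such as "q1:yes", and "responds" edges as [user, qa] pairs.

```javascript
// Hypothetical 'responds' edges from users to question/answer vertices.
const responds = [
  ["A", "q1:yes"], ["A", "q2:no"],
  ["B", "q1:yes"], ["B", "q2:no"],
  ["C", "q1:yes"], ["C", "q2:yes"],
];

// Models out('responds') starting from user A.
const qaOfA = responds.filter(([user]) => user === "A").map(([, qa]) => qa);

// Models in('responds').groupCount(): count every user reached through a shared qa vertex.
const counts = new Map();
for (const qa of qaOfA) {
  for (const [user, target] of responds) {
    if (target === qa) {
      counts.set(user, (counts.get(user) || 0) + 1);
    }
  }
}

// Models order(local).by(values): ascending by count.
const sorted = [...counts.entries()].sort((a, b) => a[1] - b[1]);
console.log(sorted);
```

Note that user A shows up in its own result here, because walking out and back in over the same edges returns to the starting vertex; the same holds for the traversal as written, so a filter on the starting user may be wanted.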

Gremlin CosmosDB trying to query for a list of nodes from a list of nodes and give both back

(Query to find owner)
.inE().hasLabel('OwnedBy').outV().not(inE().hasLabel('AssignedTo').has('Status', 'InUse'))
.not(
inE()
.hasLabel('AssignedTo')
.has('Status', 'InUse')
).as('cards')
.inE()
.hasLabel('AssignedTo')
.has('Status', 'FutureUse')
.as('OwnedByRequestEdges')
.outV()
.as('OwnedByRequests')
.Select('card', 'OwnedByRequests', 'OwnedByRequestEdges', 'Owner')
I really want it to give me a list of the cards and the list of the requests.
A user can have multiple cards, and cards can have multiple future reservations.
In order to store all the values during traversal, you should use "store" and not "as".
Since you want the "select" to run once, you need to add fold() before it.
There was a redundant "not" filter (same filter).
(Query to find owner)
.inE().hasLabel('OwnedBy').outV()
.not(inE().hasLabel('AssignedTo').has('Status','InUse'))
.store('Cards')
.inE().hasLabel('AssignedTo').has('Status', 'FutureUse').outV()
.store('OwnedByRequests')
.fold()
.select('Cards', 'OwnedByRequests')

Why does SELECT then performing a step like hasId() change what was selected?

Am I not using select() properly in my code? When I re-select("pair") for some reason, what it contained originally has been updated after performing some step. Shouldn't what was labeled using as() preserve what was contained?
g.V()
.hasLabel("Project")
.hasId("parentId","childId").as("pair")
.select("pair")
.hasId("parentId").as("parent")
.select("pair") // no longer what it was originally set to
I think this is expected. You (presumably) find two vertices with hasId("parentId","childId"), so the first select("pair") shows each vertex. But then you filter again with hasId("parentId"), which kills the traverser that holds the vertex with the id "childId". That traverser is filtered away and never reaches the last select("pair") step, so only the one vertex with the id "parentId" is returned.
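A toy model in plain JavaScript makes the point that the label itself is not changed, only the set of surviving traversers. It assumes, for illustration only, that a traverser is an object with a current value and a map of labelled path entries.

```javascript
// Two traversers, each labelled "pair" with the vertex it currently holds.
const start = [{ id: "parentId" }, { id: "childId" }]
  .map(v => ({ value: v, path: { pair: v } }));

// Models the first select("pair"): both traversers are still alive.
const firstSelect = start.map(t => t.path.pair.id);

// Models hasId("parentId"): the traverser holding "childId" is dropped.
const afterFilter = start.filter(t => t.value.id === "parentId");

// Models the second select("pair"): only the survivor can answer.
const secondSelect = afterFilter.map(t => t.path.pair.id);

console.log(firstSelect, secondSelect);
```

The surviving traverser still reports the same value for "pair" as before; the other value is missing because its traverser no longer exists.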

Can we filter multiple labels simultaneously

I have a scenario where I have to check multiple vertices with different labels and match their properties under a parent vertex. And then return the parent vertex if everything matches fine.
I tried writing queries with an 'and' clause and a 'where' clause, but none of them work.
Here are my attempts:
g.V().hasLabel('schedule').inE().outV().hasLabel('url').as('a').outE().inV().aggregate('x').hasLabel('schedule').has('name', '3').as('b').select('x').hasLabel('states').has('name', 'federal').as('c').select('a')
g.V().hasLabel('schedule').inE().outV().hasLabel('url').as('a').outE().where(inV().hasLabel('schedule').has('name', '3')).where(inV().hasLabel('states').has('name', 'federal')).select('a')
g.V().hasLabel('schedule').inE().outV().hasLabel('url').as('a').outE().and(inV().hasLabel('schedule').has('name', '3'),inV().hasLabel('states').has('name', 'federal')).select('a')
g.V().hasLabel('schedule').inE().outV().hasLabel('url').as('a').outE().inV().aggregate('x').hasLabel('schedule').has('name', '3').as('b').select('x').unfold().hasLabel('states').has('name', 'federal').as('c').select('a')
Please guide me through the right path
You can definitely simplify your approach. I don't think you need the step labels and select() for what you are doing, which is good because they add cost to your traversal. I tried to rewrite the first traversal you supplied, and I'm hoping I have the logic right, but regardless, I think you will get the idea of what you need to do when you see the change:
g.V().hasLabel('schedule').in().hasLabel('url').
where(and(out().hasLabel('schedule').has('name', '3'),
out().hasLabel('states').has('name', 'federal')))
You already have the "parent" that you want to return on the first line, so just do a filter with where() and add your filtering logic there to traverse away from each of those "parents".
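The shape of that filter, "keep the parent only if every condition on its children holds", can be sketched in plain JavaScript over a small hypothetical graph. The vertex ids (u1, u2, s3, s4, fed) and the helper names are made up for illustration.

```javascript
// Hypothetical toy graph: two 'url' parents and their children.
const vertices = {
  u1: { label: "url" },
  u2: { label: "url" },
  s3: { label: "schedule", name: "3" },
  s4: { label: "schedule", name: "4" },
  fed: { label: "states", name: "federal" },
};
// Directed edges as [from, to] pairs.
const edges = [["u1", "s3"], ["u1", "fed"], ["u2", "s3"], ["u2", "s4"]];

const outOf = id => edges.filter(([from]) => from === id).map(([, to]) => to);
const inOf = id => edges.filter(([, to]) => to === id).map(([from]) => from);

// True if some outgoing neighbour has the given label and name.
const hasChild = (id, label, name) =>
  outOf(id).some(c => vertices[c].label === label && vertices[c].name === name);

// Models V().hasLabel('schedule').in().hasLabel('url').where(and(...)).
const schedules = Object.keys(vertices).filter(id => vertices[id].label === "schedule");
const parents = schedules
  .flatMap(id => inOf(id))
  .filter(id => vertices[id].label === "url")
  .filter(id => hasChild(id, "schedule", "3") && hasChild(id, "states", "federal"));

console.log(parents);
```

Only u1 passes, since u2 has a matching schedule child but no matching states child. A parent reachable from several schedule vertices can appear more than once before the filter, so a dedup() may be wanted in the real traversal.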

Resources