Gremlin/TinkerPop - is there a way to add metadata to a union step so I know which traversal each result came from?

This is a little strange, but I have a situation where it'd be beneficial for me to know which traversal an element came from.
For a simple example, something like this:
.union(
select('parent').out('contains'), //traversal 1
select('parent2').out('contains') //traversal 2
)
.dedup()
.project('id','traversal')
.by(id())
.by( //any way to determine which traversal it came from? or if it was in both? )
Edit: One thing I found is that I can use Map with Group/By to get partly there:
.union(
select('parent').out('contains')
.map(group().by(identity()).by(constant('t1'))),
select('parent2').out('contains')
.map(group().by(identity()).by(constant('t2')))
)
.dedup() //Dedup isn't gonna work here because each hashmap will be different.
.project('id','traversal')
.by( //here I can't figure out how to read a value from the hashmap inline )
The above query without the project/by piece returns this:
[{v[199272505353083909]: 't1'}, {v[199272515180338177]: 't2'}]
Or is there a better way to do this?
Thanks!

One simple approach might be to just fold the results. If you get back an empty list you will know you did not find any on that "branch":
g.V('44').
union(out('route').fold().as('a').project('res','branch').by().by(constant('b1')),
out('none').fold().as('b').project('res','branch').by().by(constant('b2')))
which yields
{'res': [v[8], v[13], v[20], v[31]], 'branch': 'b1'}
{'res': [], 'branch': 'b2'}
UPDATED after discussion in comments to include an alternative approach that uses nested union steps to avoid the project step inside the union. I still think I prefer the project approach unless the performance when measured is not good.
g.V('44').
union(local(union(out('route').fold(),constant('b1')).fold()),
local(union(out('none').fold(),constant('b2')).fold()))
which yields
[[v[8], v[13], v[20], v[31]], 'b1']
[[], 'b2']
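For the "was it in both?" part of the question, the tagged branch results can also be merged on the client once they come back. This is a minimal sketch in plain Python (not Gremlin), assuming each branch arrives as a list of element ids paired with its label; the ids and the `merge_branches` helper are made up for illustration:

```python
from collections import defaultdict

def merge_branches(tagged_results):
    """Map each element id to the sorted list of branch labels that produced it."""
    seen = defaultdict(set)
    for row in tagged_results:
        for element_id in row["res"]:
            seen[element_id].add(row["branch"])
    return {element_id: sorted(labels) for element_id, labels in seen.items()}

# Hypothetical rows shaped like the project('res','branch') output above.
rows = [
    {"res": [8, 13, 20], "branch": "b1"},
    {"res": [13, 31], "branch": "b2"},
]
merged = merge_branches(rows)
print(merged)
```

An id that maps to more than one label was reached by more than one branch, which also takes care of the dedup.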

Related

I am facing issues with group() in gremlin

I am trying to find the count of all workspaces grouped by Customer, and then sort the response by the count value.
t3 = g.withSideEffect("Neptune#repeatMode","BFS")
.V().has("Project","sid","A68FA527BB214F0E9D2287B455BEFE0A8AACFED0724B407F9EBF727ED439E8ED")
.both().hasLabel("Customer").group().by().by(
both().hasLabel("Workspace").count().order().by(Column.values,Order.desc)
)
.unfold()
.project("rowName","data")
.by(select(Column.keys).properties(MandatoryCustomerAttributes.firstName.name()).value())
.by(select(Column.values))
.by(fold().unfold());
I am facing the following error which I am not able to understand. If someone can help it will be great.
java.util.concurrent.CompletionException: org.apache.tinkerpop.gremlin.driver.exception.ResponseException: {"requestId":"c3246ac6-d306-41e6-a6d1-3adbacb928fb","code":"InvalidParameterException","detailedMessage":"The provided object does not have accessible keys: class java.lang.Long"}
at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
at org.apache.tinkerpop.gremlin.driver.ResultSet.one(ResultSet.java:123)
at org.apache.tinkerpop.gremlin.driver.ResultSet$1.hasNext(ResultSet.java:175)
at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:182)
at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:169)
at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal$TraverserIterator.next(DriverRemoteTraversal.java:146)
at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal$TraverserIterator.next(DriverRemoteTraversal.java:131)
at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal.nextTraverser(DriverRemoteTraversal.java:112)
at org.apache.tinkerpop.gremlin.process.remote.traversal.step.map.RemoteStep.processNextStart(RemoteStep.java:80)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:129)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:39)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
at org.apache.tinkerpop.gremlin.process.traversal.Traversal.forEachRemaining(Traversal.java:278)
at com.callcomm.eserve.enitty.trial.NeptuneTrialCode.main(NeptuneTrialCode.java:373)
Ok I think after staring at this for a while I spotted the issue.
This part of the query
hasLabel("Workspace").count().order().by(Column.values,Order.desc)
is going to try to apply Column.values against the count result (a Long, not a map) - hence the error message.
You will need to order the group after it is created. Here is a simple example, note the use of local.
g.V('44','3','8').
group().
by().
by(out().count()).
order(local).
by(values,desc)
==>[v[8]:251,v[3]:93,v[44]:4]
As a side note, I did notice that the project has only 2 items declared but has 3 by modulators. That should not be a problem but you may not be getting the results you wanted.
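The difference between the two orderings can be modelled in plain Python (not Gremlin). The `counts` dict reuses the numbers from the example output, keyed by vertex id for simplicity. Sorting has to be applied to the entries of the finished map, not to each individual count:

```python
# Result of group().by().by(out().count()), modelled as a dict.
counts = {44: 4, 3: 93, 8: 251}

# order(local).by(values, desc): sort the entries of the one map by value.
ordered = dict(sorted(counts.items(), key=lambda entry: entry[1], reverse=True))
print(ordered)  # {8: 251, 3: 93, 44: 4}

# Ordering inside the value traversal would instead see a bare number,
# which has no keys or values to select from.
single_count = counts[44]
print(hasattr(single_count, "values"))  # False
```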

Gremlin: How do you find vertices and edges when some edges do not exist?

I am new to Gremlin.
I am having trouble fetching vertices and edges when an edge from a vertex sometimes does not exist.
For example, the query below works fine when all the vertices and edges exist.
But for one use case the edge
`.outE("PRODUCES").`as`("produces")`
does not exist in the database.
In that case the query below does not return any results.
I need your help to resolve this issue.
When the edge does not exist, I want input_entity and processed_by in the result.
janusGraph.traversal().V()
.has("isActive", "true")
.hasLabel("ENTITY").`as`("input_entity")
.outE("PROCESSED_BY").`as`("processed_by")
.inV().`as`("job")
.outE("PRODUCES").`as`("produces")
.select<String>("job").outE("HAS_STATE")
.`as`("job_state_edge").inV().hasLabel("JOB_STATE").`as`("job_state")
.select<String>("input_entity").outE("HAS_STATE")
.`as`("input_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("input_entity_state")
.select<String>("input_entity").outE("CONNECTS_TO").`as`("connects_to")
.inV().hasLabel("ENTITY").has("entityName", TextP.startingWith(rootNamespace))
.`as`("output_entity").outE("HAS_STATE")
.`as`("output_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("output_entity_state")
.select<String>("input_entity","output_entity","processed_by","produces","job","job_state","input_entity_state","output_entity_state","input_entity_state_edge","output_entity_state_edge","job_state_edge","connects_to")
.by(elementMap<Element, Any>()).toList()
The same query using optional:
janusGraph.traversal().V()
.has("isActive", "true")
.hasLabel("ENTITY").`as`("input_entity")
.outE("HAS_STATE").`as`("input_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("input_entity_state")
.select<String>("input_entity").outE("PROCESSED_BY").`as`("processed_by")
.inV().`as`("job").outE("HAS_STATE").`as`("job_state_edge").inV().hasLabel("JOB_STATE").`as`("job_state")
.select<String>("job")
.optional(
outE("PRODUCES").`as`("produces")
.select<String>("input_entity").outE("CONNECTS_TO").`as`("connects_to")
.inV().hasLabel("ENTITY").has("entityName", TextP.startingWith(rootNamespace))
.`as`("output_entity").outE("HAS_STATE").`as`("output_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("output_entity_state"))
.select<String>("input_entity","output_entity","processed_by","produces","job","job_state","input_entity_state","output_entity_state","input_entity_state_edge","output_entity_state_edge","job_state_edge","connects_to")
.by(elementMap<Element, Any>()).toList()
There are two Gremlin steps that can help in cases like this. When you have a part of a query that may or may not exist, but you either want the results up to that point if it does not exist or the results afterwards if it does exist, you can wrap that part of the query in an optional step.
For example:
g.V('3').optional(out())
will return either vertex '3' itself or, if out() yields results, its adjacent vertices.
In cases where you want to select a value that may not exist, you can do something like this:
coalesce(select('a'),constant('No results'))
EDITED to add:
If you need to return multiple results, rather than just using select, try a project('a','b','c') type of approach where each by modulator for the project can contain its own coalesce step.
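The behaviour of the two steps can be modelled in plain Python (not Gremlin). This is only a sketch of the semantics described above; the `produces` data and the helper names are made up for illustration:

```python
def optional(traverser, step):
    """Return the step's results if there are any, else the incoming traverser."""
    results = step(traverser)
    return results if results else [traverser]

def coalesce(traverser, *steps):
    """Return the results of the first step that yields anything."""
    for step in steps:
        results = step(traverser)
        if results:
            return results
    return []

# Hypothetical adjacency data: job1 has a PRODUCES edge, job2 does not.
produces = {"job1": ["out1"], "job2": []}
out_produces = lambda job: produces.get(job, [])
constant = lambda value: (lambda _: [value])

print(optional("job1", out_produces))   # the step's results
print(optional("job2", out_produces))   # falls back to the traverser itself
print(coalesce("job2", out_produces, constant("No results")))
```

Note that anything labelled only inside an optional branch that did not match has nothing to select afterwards, which is why a per-key coalesce with a default is useful.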

Adobe AEM Querybuilder Debugger - Multiple Paths and Multiple Nodenames

I am using the QueryBuilder debugger and want to do a search where "nodename=*.pdf OR nodename=*.doc*" and "path=/content/dam/1 OR path=/content/dam/2".
I have been trying to find an example but no luck on the web. What I have below is not quite right - just wondering what I am missing.
The query does work but there is a huge difference in the amount of time that it runs when compared with when I just query using one nodename instead of 2.
Thanks in advance,
Jerry
type=dam:asset
mainasset=true
1_group.p.or=true
1_group.1.nodename=*.pdf
1_group.2.nodename=*.doc*
2_group.p.or=true
2_group.1_path=/content/dam/1
2_group.2_path=/content/dam/2
p.limit=-1
orderby=path
I thought maybe something as simple as this might work but no luck....
type=dam:asset
mainasset=true
group.p.or=true
group.1_nodename=*.doc*
group.1_path=/content/dam/1
group.2_nodename=*.doc*
group.2_path=/content/dam/2
group.3_nodename=*.pdf
group.3_path=/content/dam/1
group.4_nodename=*.pdf
group.4_path=/content/dam/2
p.limit=-1
orderby=path
Try splitting your query into two, one per path, if this won't affect the behaviour you're trying to achieve.
path=/content/dam/1
type=dam:asset
mainasset=true
group.1.nodename=*.pdf
group.2.nodename=*.doc*
p.limit=-1
orderby=path
path=/content/dam/2
type=dam:asset
mainasset=true
group.1.nodename=*.pdf
group.2.nodename=*.doc*
p.limit=-1
orderby=path
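If the split is driven from code, the two predicate sets can be generated from one template and the results merged afterwards. This is a rough sketch in plain Python: the predicates are copied as-is from the two queries above, while the `split_queries` and `merge_hits` helpers and the sample hit paths are hypothetical, and no request is actually sent to AEM:

```python
# Predicates shared by both queries, copied from the split queries above.
base = {
    "type": "dam:asset",
    "mainasset": "true",
    "group.1.nodename": "*.pdf",
    "group.2.nodename": "*.doc*",
    "p.limit": "-1",
    "orderby": "path",
}

def split_queries(paths):
    """Build one predicate map per path instead of one OR-ed path group."""
    return [{"path": path, **base} for path in paths]

def merge_hits(*hit_lists):
    """Combine per-query hit paths into one sorted list without duplicates."""
    return sorted(set().union(*hit_lists))

queries = split_queries(["/content/dam/1", "/content/dam/2"])
# Hypothetical hit paths, standing in for what each query would return.
merged = merge_hits(["/content/dam/1/b.pdf"], ["/content/dam/2/a.docx"])
print(len(queries), merged)
```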

Gremlin order on Map results

I have the below query:
g.V('1')
.union(
out('R1')
.project('timestamp', 'data')
.by('completionDate')
.by(valueMap().by(unfold())),
out('R2')
.project('timestamp', 'data')
.by('endDate')
.by(valueMap().by(unfold()))
)
How can I order the UNION results by timestamp?
I've tried using ".order().by('timestamp')" but this only works on traversals and UNION returns a MAP object.
Here are a couple of ways to approach it. First, you could just use your code as-is and then order() by the "timestamp":
g.V('1').
union(out('R1').
project('timestamp', 'data').
by('completionDate').
by(valueMap().by(unfold())),
out('R2').
project('timestamp', 'data').
by('endDate').
by(valueMap().by(unfold()))).
order().by(select('timestamp'))
Note the difference is to select() the key from the Map that you want to sort on. Versions after 3.4.5 will work more as you expect and you can simply do by('timestamp') for a Map as well as an Element.
I think a more readable approach, however, would be this:
g.V('1').
out('R1','R2').
project('timestamp', 'data').
by(coalesce(values('endDate'), values('completionDate'))).
by(valueMap().by(unfold())).
order().by(select('timestamp'))
You might need to enhance the by(coalesce(...)) depending on the nature of your schema, but hopefully you get the idea of what I'm trying to do there.
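What the second query does can be modelled in plain Python (not Gremlin): each vertex is projected into a map whose timestamp falls back from one property to the other, and the maps are then sorted on that key. The sample vertices and their property values are invented for illustration:

```python
# Hypothetical vertices reached via R1 (completionDate) and R2 (endDate).
vertices = [
    {"name": "a", "completionDate": 30},
    {"name": "b", "endDate": 10},
    {"name": "c", "completionDate": 20},
]

def project(vertex):
    # Models coalesce(values('endDate'), values('completionDate')).
    timestamp = vertex.get("endDate", vertex.get("completionDate"))
    return {"timestamp": timestamp, "data": vertex}

# Models order().by(select('timestamp')): sort the maps on one of their keys.
rows = sorted((project(v) for v in vertices), key=lambda row: row["timestamp"])
print([row["data"]["name"] for row in rows])  # ['b', 'c', 'a']
```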

sqlite-net-pcl full text rank function

I'm creating a Xamarin.Forms application, and we use sqlite-net-pcl by Frank A. Krueger. It is supposed to support full text searching, which I am trying to implement.
Now, full text search seems to work. I created a query like:
SELECT * FROM Document d JOIN(
SELECT document_id
FROM SearchDocument
WHERE SearchDocument MATCH 'test*'
) AS ranktable USING(document_id)
which seems to work fine. However, I'd like to return the results in order of their rank, otherwise the result is useless. According to the documentation (https://www.sqlite.org/fts3.html), the syntax should be:
SELECT * FROM Document d JOIN(
SELECT document_id, rank(matchinfo(SearchDocument)) AS rank
FROM SearchDocument
WHERE SearchDocument MATCH 'test*'
) AS ranktable USING(document_id)
ORDER BY ranktable.rank
However, the engine doesn't seem to know the "rank" function:
[ERROR] FATAL UNHANDLED EXCEPTION: SQLite.SQLiteException: no such function: rank
It does know the "matchinfo" function though.
Can anyone tell me what I'm doing wrong?
Edit: After some more searching it seems that the rank function is simply not implemented in the library. I'm confused. How can people use the fulltext search without caring about the order of the results? Is there some other way of ordering the results so that the most relevant results are at the top?
sqlite-net-pcl depends on SQLitePCLRaw.bundle_green, so it's worth looking into that package.
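As far as I can tell from the FTS3 documentation linked in the question, rank there is an application-defined function from the appendix rather than a built-in, so it has to be registered before a query can call it. Below is a rough sketch of the idea using Python's sqlite3 module, not sqlite-net-pcl. The scoring is a simplified, unweighted version of the documentation's example, and the `search` helper is hypothetical. Registering an equivalent function from C# would depend on what the underlying SQLitePCLRaw provider exposes, which is not shown here.

```python
import sqlite3
import struct

def rank(matchinfo_blob):
    """Score a row from matchinfo()'s default 'pcx' output.

    The blob is a sequence of 32-bit unsigned ints in native byte order:
    phrase count, column count, then three values per phrase/column pair
    (hits in this row, hits in all rows, rows with at least one hit).
    """
    ints = struct.unpack(f"{len(matchinfo_blob) // 4}I", matchinfo_blob)
    phrases, columns = ints[0], ints[1]
    score = 0.0
    for i in range(phrases * columns):
        hits_this_row, hits_all_rows = ints[2 + 3 * i], ints[3 + 3 * i]
        if hits_all_rows:
            score += hits_this_row / hits_all_rows
    return score

def search(conn, term):
    """Run an FTS query ordered by the custom rank (needs an FTS3/4 build)."""
    conn.create_function("rank", 1, rank)
    sql = ("SELECT document_id, rank(matchinfo(SearchDocument)) AS rank "
           "FROM SearchDocument WHERE SearchDocument MATCH ? "
           "ORDER BY rank DESC")
    return [row[0] for row in conn.execute(sql, (term,))]
```

With this scoring a higher value means a better match, so the ordering is descending.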

Resources