I am facing issues with group() in gremlin

I am facing issues with group() in gremlin - gremlin

I am trying to find the count of all workspaces group by Customer, and then sorting the response with the count value.
t3 = g.withSideEffect("Neptune#repeatMode","BFS")
.V().has("Project","sid","A68FA527BB214F0E9D2287B455BEFE0A8AACFED0724B407F9EBF727ED439E8ED")
.both().hasLabel("Customer").group().by().by(
both().hasLabel("Workspace").count().order().by(Column.values,Order.desc)
)
.unfold()
.project("rowName","data")
.by(select(Column.keys).properties(MandatoryCustomerAttributes.firstName.name()).value())
.by(select(Column.values))
.by(fold().unfold());
I am facing the following error which I am not able to understand. If someone can help it will be great.
java.util.concurrent.CompletionException: org.apache.tinkerpop.gremlin.driver.exception.ResponseException: {"requestId":"c3246ac6-d306-41e6-a6d1-3adbacb928fb","code":"InvalidParameterException","detailedMessage":"The provided object does not have accessible keys: class java.lang.Long"}
at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
at org.apache.tinkerpop.gremlin.driver.ResultSet.one(ResultSet.java:123)
at org.apache.tinkerpop.gremlin.driver.ResultSet$1.hasNext(ResultSet.java:175)
at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:182)
at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:169)
at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal$TraverserIterator.next(DriverRemoteTraversal.java:146)
at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal$TraverserIterator.next(DriverRemoteTraversal.java:131)
at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal.nextTraverser(DriverRemoteTraversal.java:112)
at org.apache.tinkerpop.gremlin.process.remote.traversal.step.map.RemoteStep.processNextStart(RemoteStep.java:80)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:129)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:39)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
at org.apache.tinkerpop.gremlin.process.traversal.Traversal.forEachRemaining(Traversal.java:278)
at com.callcomm.eserve.enitty.trial.NeptuneTrialCode.main(NeptuneTrialCode.java:373)

Ok I think after staring at this for a while I spotted the issue.
This part of the query
hasLabel("Workspace").count().order().by(Column.values,Order.desc
Is going to try to apply Column.values against the count result (an integer, not a map) - hence the error message.
You will need to order the group after it is created. Here is a simple example, note the use of local.
g.V('44','3','8').
group().
by().
by(out().count()).
order(local).
by(values,desc)
==>[v[8]:251,v[3]:93,v[44]:4]
As a side note, I did notice that the project only has only 2 items declared but has 3 by modulators. That should not be a problem but you may not be getting the results you wanted.

Related

Gremlin/Tinkerpop - is there a way to add metadata to a union step so I know which query the resulting traversal came from?

This is a little strange, but I have a situation where it'd be beneficial for me to know which traversal an element came from.
For a simple example, something like this:
.union(
select('parent').out('contains'), //traversal 1
select('parent2').out('contains') //traversal 2
)
.dedup()
.project('id','traversal')
.by(id())
.by( //any way to determine which traversal it came from? or if it was in both? )
Edit: One thing I found is that I can use Map with Group/By to get partly there:
.union(
select('parent').out('contains')
.map(group().by(identity()).by(constant('t1'))),
select('parent2').out('contains')
.map(group().by(identity()).by(constant('t2'))),
)
.dedup() //Dedup isn't gonna work here because each hashmap will be different.
.project('id','traversal')
.by( //here I can't figure out how to read a value from the hashmap inline )
The above query without the project/by piece returns this:
[{v[199272505353083909]: 't1'}, {v[199272515180338177]: 't2'}]
Or is there a better way to do this?
Thanks!

One simple approach might be to just fold the results. If you get back an empty list you will know you did not find any on that "branch":
g.V('44').
union(out('route').fold().as('a').project('res','branch').by().by(constant('b1')),
out('none').fold().as('b').project('res','branch').by().by(constant('b2')))
which yields
{'res': [v[8], v[13], v[20], v[31]], 'branch': 'b1'}
{'res': [], 'branch': 'b2'}
UPDATED after discussion in comments to include an alternative approach that uses nested union steps to avoid the project step inside the union. I still think I prefer the project approach unless the performance when measured is not good.
g.V('44').
union(local(union(out('route').fold(),constant('b1')).fold()),
local(union(out('none').fold(),constant('b2')).fold()))
which yields
[[v[8], v[13], v[20], v[31]], 'b1']
[[], 'b2']

Gremlin : How do you find vertex and edges when some edges does not exists

I am new to gremlin.
I am facing issue in fetching the vertex and edges when sometimes edge from a vertex does not exists.
for example bellow query works fine if it gets all the vertex and edges.
but for one use case edge
`.outE("PRODUCES").`as`("produces"))`
does not exists in db.
in that case bellow query doesnt return any result.
I need your help to resolve this issue.
when edges does not exit then i want input_entity and processed_by in result.
janusGraph.traversal().V()
.has("isActive", "true")
.hasLabel("ENTITY").`as`("input_entity")
.outE("PROCESSED_BY").`as`("processed_by")
.inV().`as`("job")
.outE("PRODUCES").`as`("produces")
.select<String>("job").outE("HAS_STATE")
.`as`("job_state_edge").inV().hasLabel("JOB_STATE").`as`("job_state")
.select<String>("input_entity").outE("HAS_STATE")
.`as`("input_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("input_entity_state")
.select<String>("input_entity").outE("CONNECTS_TO").`as`("connects_to")
.inV().hasLabel("ENTITY").has("entityName", TextP.startingWith(rootNamespace))
.`as`("output_entity").outE("HAS_STATE")
.`as`("output_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("output_entity_state")
.select<String>("input_entity","output_entity","processed_by","produces","job","job_state","input_entity_state","output_entity_state","input_entity_state_edge","output_entity_state_edge","job_state_edge","connects_to")
.by(elementMap<Element, Any>()).toList()
with optional
janusGraph.traversal().V()
.has("isActive", "true")
.hasLabel("ENTITY").`as`("input_entity")
.outE("HAS_STATE").`as`("input_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("input_entity_state")
.select<String>("input_entity").outE("PROCESSED_BY").`as`("processed_by")
.inV().`as`("job").outE("HAS_STATE").`as`("job_state_edge").inV().hasLabel("JOB_STATE").`as`("job_state")
.select<String>("job")
.optional(
outE("PRODUCES").`as`("produces")
.select<String>("input_entity").outE("CONNECTS_TO").`as`("connects_to")
.inV().hasLabel("ENTITY").has("entityName", TextP.startingWith(rootNamespace))
.`as`("output_entity").outE("HAS_STATE").`as`("output_entity_state_edge").inV().hasLabel("ENTITY_STATE").`as`("output_entity_state"))
.select<String>("input_entity","output_entity","processed_by","produces","job","job_state","input_entity_state","output_entity_state","input_entity_state_edge","output_entity_state_edge","job_state_edge","connects_to")
.by(elementMap<Element, Any>()).toList()

There are two Gremlin steps that can help in cases like this. When you have a part of a query that may or may not exist, but you either want the results up to that point if it does not exist or the results afterwards if it does exist, you can wrap that part of the query in an optional step.
For example :
g.V('3').optional(out())
Will either return V['3'] or the adjacent vertices if out yields results.
In cases where you want to select a value that may not exist, you can do something like this:
coalesce(select('a'),constant('No results'))
EDITED to add:
If you need to return multiple results, rather than just using select try a project('a','b,',c') type of approach where each by modulator for the project can contain its own coalesce step.

Printing/Fetching Vertex values from a path

Just getting started with gremlin.
Printing out all the Vertex values worked out fine
gremlin> g.V().values()
==>testing 2
==>Cash Processing
==>Sales
==>Marketing
==>Accounting
I was able to find all the directly connected path between my Vertices.
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.dedup().path()
==>[v[25],v[28]]
==>[v[25],v[26]]
==>[v[26],v[27]]
==>[v[26],v[25]]
Now am trying to print out the values in the path like ['Sales', 'Accounting'] instead of [v[25],v[28]]
Not been able to figure out a way yet
Already tried and failed with
Unfold: Does not get me 1-1 mapping
gremlin> g.V().hasLabel('Process').repeat(both().simplePath()).until(hasLabel('Process')).dedup().path().unfold().values()
==>Cash Processing
==>Accounting
==>Cash Processing
==>Sales
==>Sales
==>Marketing
==>Sales
==>Cash Processing
Path seems to be of a different data-type and does not support .values() function
gremlin> g.V().hasLabel('Process')
.repeat(both().simplePath())
.until(hasLabel('Process'))
.dedup().path().values()
org.apache.tinkerpop.gremlin.process.traversal.step.util.ImmutablePath cannot be cast to org.apache.tinkerpop.gremlin.structure.Element
Tried the following google searches and didnt get the answer
gremlin print a path
gremlin get values in a path
and few more word twisting
Found one at here that was for java but that didnt work for me
l = []; g.V().....path().fill(l)
(but cant create list, Cannot set readonly property: list for class: org.apache.tinkerpop.gremlin.structure.VertexProperty$Cardinality
)
I have running it on Gremlin console (running ./gremlin.sh)

You can use the by step to modulate the elements inside the path. For example by supplying valueMap(true) to by you get the properties of the vertices, together with the vertex labels and their ids:
gremlin> g.V().repeat(both().simplePath()).times(1).dedup().path().by(valueMap(true))
==>[[id:1,name:[marko],label:person,age:[29]],[id:3,name:[lop],lang:[java],label:software]]
==>[[id:1,name:[marko],label:person,age:[29]],[id:2,name:[vadas],label:person,age:[27]]]
==>[[id:1,name:[marko],label:person,age:[29]],[id:4,name:[josh],label:person,age:[32]]]
==>[[id:2,name:[vadas],label:person,age:[27]],[id:1,name:[marko],label:person,age:[29]]]
==>[[id:3,name:[lop],lang:[java],label:software],[id:6,name:[peter],label:person,age:[35]]]
==>[[id:4,name:[josh],label:person,age:[32]],[id:5,name:[ripple],lang:[java],label:software]]
I used the modern graph which is one of TinkerPop's toy graphs that are often used for such examples. Your output will look a bit different and you may want to use something else than valueMap(true) for the by modulator. The TinkerPop documentation of the path step itself contains two more advanced examples for path().by() that you might want to check out.

BigQuery Timeout Errors in R Loop Using bigrquery

I am running a query in a loop for each store in a dataframe. Typically there are 70 or so stores so the loop repeats that many times for each complete loop.
Maybe 75% of the time this loop works all the way through with no errors.
About 25% of the time I get the following error during any one of the loop iterations:
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached
Then I have to figure out which iteration bombed, and repeat the loop excluding iterations that completed successfully.
I can't find anything on the web to help me understand what is causing this seemingly random error. Perhaps it is a BQ technical issue? There does not seem to be any relation to the size of the result set it crashes on.
Here is the part of my code that does the loop...again it works all the way through most of the time. The cartesian product across IDs is intentional, as I want every combination of each Test ID with all possible Control IDs within store.
sql<-"SELECT pstore as store, max(pretrips) as pretrips FROM analytics.campaign_ids
group by 1 order by 1"
store_maxtrips<-query_exec(sql,project=project, max_pages = 1)
store_maxtrips
for (i in 1:length(store_maxtrips$store)) {
#pull back all ids shopping in same primary store as each test ID with their pre metrics
sql<-paste("SELECT a.pstore as pstore, a.id as test_id,
b.id as ctl_id,
(abs(a.zpbsales-b.zpbsales)*",wt_pb_sales,")+(abs(a.zcatsales-b.zcatsales)*",wt_cat_sales,")+
(abs(a.zsales-b.zsales)*",wt_retail_sales,")+(abs(a.ztrips-b.ztrips)*",wt_retail_trips,") as zscore
FROM analytics.campaign_ids a inner join analytics.pre_zscores b
on a.pstore=b.pstore
where a.id<>b.id and a.pstore=",store_maxtrips$store[i]," order by a.pstore, a.id, zscore")
print(paste("processing store",store_maxtrips$store[i]))
query_exec(sql,project=project,destination_table = "analytics.campaign_matches",
write_disposition = "WRITE_APPEND", max_pages = 1)
}

Solved!
It turns out I was using query_exec, but I should have been using insert_query_job since I do not want to retrieve any results. The errors were all happening in the course of R trying to retrieve results from BigQuery which I didn't want anyhow.
By using insert_query_job + wait_for(job) in my loop instead of the query_exec command, it eliminated all issues with the loop finishing.
I did also need to add a try() function to help circumvent some rare errors that still popped up with this approach. Thanks to MarkeD for this tip. So my final solution looked like this:
try(job<-insert_query_job(sql,project=project,destination_table = "analytics.campaign_matches",
write_disposition = "WRITE_APPEND"))
wait_for(job)
Thanks to everyone who commented and helped me research the issue.

[Titan Example Graph]: copySplit and back not working as expected

The following query of the Titan example graph does not produce what I expected:
g.V.has("age", T.lte,1000).as('young').out('battled').has("name","cerberus").copySplit(
_().back('young'),
_()
).exhaustMerge
it gives me twice the cerberus vertex, instead of hercules and cerberus
It appears that back is not working after a copySplit. Is there a way around this limitation?

Already answered on the Gremlin users mailing list, but here we go again:
These 2 alternatives would still work in Gremlin3 (with a slightly different syntax, but the concept is the same):
gremlin> g.V().has("age", T.lte, 1000).as("young").out("battled").has("name", "cerberus").as("monster").select()
==>[young:v[24], monster:v[44]]
Or:
gremlin> g.V().has("age", T.lte, 1000).out("battled").has("name", "cerberus").path()
==>[v[24], v[44]]

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

I am facing issues with group() in gremlin - gremlin

Related

Gremlin/Tinkerpop - is there a way to add metadata to a union step so I know which query the resulting traversal came from?

Gremlin : How do you find vertex and edges when some edges does not exists

Printing/Fetching Vertex values from a path

BigQuery Timeout Errors in R Loop Using bigrquery

[Titan Example Graph]: copySplit and back not working as expected

Categories

Resources