Queries executing slowly with the TinkerPop Java API - gremlin

I am using the TinkerPop Java API to run queries against Titan, with Cassandra as storage and Elasticsearch as the query index. The queries are slow through the Java API, but the same queries work fine from the Gremlin console. Any pointers on what to look for? (Neither the data size nor the query size matters; I see this with no data and a minimal query as well. Caching can be ruled out too.)

Related

Debugging gremlin problems with AWS Neptune

I've sent a gremlin statement which fails with the following error:
{"code":"InternalFailureException","requestId":"d924c667-2c48-4a7e-a288-6f23bac6283a","detailedMessage":"null: .property(Cardinality, , String)"} (599)
I've enabled audit logs on the cluster side, but they show no indication of any error, although I can see the request there.
Are there any techniques for debugging such problems with AWS Neptune?
Since the Gremlin is built dynamically and builds complicated graphs with thousands of operations, I'm looking for a way to better understand where the error is.
In one case it turned out the payload was too big, and in another the Gremlin bytecode failed although it worked fine on a local TinkerPop Server.
But with the generic internal failure exception it is very hard to pinpoint the issues.
I'm using the Node.js gremlin package (3.5.0).
Thanks for the clarifications. As a side note, Neptune is not yet at the TinkerPop 3.5.0 level, so there is always the possibility of a mismatch between client and server. The audit log only shows the query that was received; it does not reflect the result of running it.
When I am debugging issues like this one, I often create a text version of the query (the Node.js client has a translator that can do that for you) and use the Neptune REST API to check the /explain plan first. You can also generate that plan from the Neptune notebooks using %%gremlin explain. If nothing obvious shows up in the explain plan, I usually run the text version of the query to see whether anything shows up there, which rules the client driver in or out. I also sometimes run the query against a TinkerGraph, just to see whether the issue is as simple as Neptune not returning the best error message it could.
If the text query works, changing the Node.js Gremlin client driver to 3.4.11 is worth trying.
UPDATED to add a link to the documentation on Gremlin translators.
https://tinkerpop.apache.org/docs/current/reference/#translators
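To make the explain step above a bit more concrete, here is a minimal Python sketch (standard library only) that builds the POST request for Neptune's /gremlin/explain endpoint from a text query. The host name and the query are placeholders made up for illustration, and the sketch assumes IAM authentication is disabled on the cluster; with IAM enabled the request would also need SigV4 signing, which is not shown.

```python
import json
import urllib.request


def build_explain_request(host: str, port: int, gremlin: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Gremlin explain endpoint."""
    url = f"https://{host}:{port}/gremlin/explain"
    # The endpoint takes a JSON body with the text query under the "gremlin" key.
    body = json.dumps({"gremlin": gremlin}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_explain(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical endpoint and query, for illustration only.
request = build_explain_request(
    "my-cluster.cluster-example.us-east-1.neptune.amazonaws.com",
    8182,
    "g.V().limit(1)",
)
print(request.full_url)
```

Nothing is sent until fetch_explain(request) is called, which only makes sense from a machine that can reach the cluster. Pasting the translator's text output in place of the sample query is the intended use.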

Is there a way to run GQL queries without defining entity model in python

I have a Java, spring-data app that uses Datastore. I need a subset of this data to run analytics in a Python app. What I need in the Python app is essentially a join (yup, I can't shake the relational habit) between two "Kinds", queried by the key of one kind.
The NDB client requires recreating the same entity models in Python before the data can be queried, which is a drag. Why can't I simply run the console version of GQL (select * from kind) from Python? Maybe I am missing something, as this sort of querying is available in almost all relational and NoSQL databases.
Your observations are correct: a GQL query cannot perform a SQL-like "join" query. This is documented on the GQL Reference page of the Python NDB/DB documentation.
If you would like to see this implemented, you can open a feature request in the Public Issue Tracker.
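If the goal is only to read entities without declaring models, one option to consider is the google-cloud-datastore client instead of NDB; as far as I know, its queries take just a kind name and return dict-like entities. Since GQL has no join, the join then has to happen in application code. Below is a rough sketch under those assumptions: the kind and property names (Order, Customer, customer_id) are hypothetical, and fetch_kind assumes default credentials and project are already configured. The demo at the bottom uses in-memory rows so it runs without the library.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def fetch_kind(kind: str) -> List[Row]:
    """Fetch all entities of a kind as plain dicts, without model classes."""
    # Imported lazily so join_on below can be used without the library installed.
    from google.cloud import datastore

    client = datastore.Client()  # assumption: default credentials and project
    rows = []
    for entity in client.query(kind=kind).fetch():
        row = dict(entity)
        row["__key__"] = entity.key.id_or_name  # expose the key as a plain field
        rows.append(row)
    return rows


def join_on(left: List[Row], right: List[Row],
            left_field: str, right_field: str) -> List[Tuple[Row, Row]]:
    """Inner-join two lists of dicts in memory using a hash index on the right side."""
    index = defaultdict(list)
    for r in right:
        value = r.get(right_field)
        if value is not None:
            index[value].append(r)
    pairs = []
    for l in left:
        value = l.get(left_field)
        if value is None:
            continue  # rows without the join field never match
        for r in index.get(value, []):
            pairs.append((l, r))
    return pairs


# In-memory stand-ins for fetch_kind("Order") and fetch_kind("Customer").
orders = [{"__key__": 1, "customer_id": "c1"}, {"__key__": 2, "customer_id": "c2"}]
customers = [{"__key__": "c1", "name": "Ada"}]
pairs = join_on(orders, customers, "customer_id", "__key__")
print(pairs)
```

This loads both kinds fully into memory, so it only suits a subset of modest size; for larger data, filtering each query before fetching would be the first thing to add.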

How to access a query "Query Execution Metrics" in Cosmos db .NET Core SDK V3

I am running a query against an Azure Cosmos DB and I need to know the total number of retrieved documents regardless of pagination. Running a Count query for the actual query without pagination could be very heavy if the number of retrieved documents is huge.
The link below describes how to access a query's "Query Execution Metrics" in the Cosmos DB .NET SDK V2. I would appreciate it if someone could show me how to do the same with SDK V3.
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-query-metrics
Version 3.2.0 of the SDK that was released yesterday addresses this issue. Instead of asking for the metrics, they are included in every query. You can access them through ResponseMessage.Diagnostics.
The usage is probably easiest to see by looking at the SDK's tests:
((QueryOperationStatistics)responseMessage.Diagnostics)
    .queryMetrics
    .Values
    .First()
    .RetrievedDocumentCount
You can see the full list of properties in the QueryMetrics definition: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/2cdcde1b747db59721ede152fc9b5aa87fc62dd4/Microsoft.Azure.Cosmos/src/Query/Core/QueryMetrics/QueryMetrics.cs

How do I export a remote graph to json using tinkerpop gremlin and neptune?

I'm trying to export an entire remote graph to JSON. When I use the following code, it results in an empty file. I am using gremlin-driver 3.3.2, as this is the same version as the underlying graph database, AWS Neptune.
var traversal = EmptyGraph.instance().traversal().withRemote(DriverRemoteConnection.using(getCluster()));
traversal.getGraph().io(graphson()).writeGraph("my-graph.json");
How is one supposed to populate the graph with data such that it can be exported?
I also posted this to the Gremlin Users list.
Here's some code that will do it for you with Neptune and should work with most Gremlin Server implementations I would think.
https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-export
The results of the export can be used to load via Neptune's bulk loader if you choose to export as CSV.
Hope that is useful
If that is more than you needed hopefully it will at least give you some pointers that help.
With hosted graphs, including Neptune, it is not uncommon to find that they do not expose the Graph object or give access to the io() classes.
The Gremlin io() step is not supported in Neptune. Neptune's documentation describes the other differences between the Amazon Neptune implementation of Gremlin and the TinkerPop implementation.
Taking on board the valuable feedback from Ankit and Kelvin, I concentrated on using a local Gremlin server to handle the data wrangling.
Once I had the data in a locally running server (loaded by generating a Gremlin script from an in-memory entity model), I accessed it via a Gremlin console and ran the following:
~/apache-tinkerpop-gremlin-console-3.3.7/bin/gremlin.sh
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :> graph.io(graphson()).writeGraph("my-graph.json")
==>null
This put the my-graph.json file in /opt/gremlin-server/ on the docker container.
I extracted it using docker cp $(docker container ls -q):/opt/gremlin-server/my-graph.json .
I can then use this data to populate a gremlin-server testcontainer for running integration tests against a graph database.
neptune-export doesn't support direct export to S3. You'll have to export to the local file system, and then separately copy the files to S3.

How do you connect to a Cosmos Db (primarily updated via SQL API) using Gremlin.Net ? (can you?)

I'm working on a Cosmos DB app that stores both standard documents and graph documents. We are saving both types via the DocumentDB API, and I am able to run graph queries that return GraphSON using the DocumentClient.CreateGremlinQuery method. This GraphSON is to be read by a web app, which displays the graph for the user to view and so on.
My issue is that I cannot choose the version of the GraphSON format returned when using the Microsoft.Azure.Graphs method. So I looked into Gremlin.Net, which according to its documentation has a lot more options in this regard.
However, I am finding it difficult to connect to the Cosmos document DB using Gremlin.Net. The server variable, which you define like this:
var server = new GremlinServer("https://localhost/",8081 , enableSsl: true, username: $"/dbs/TheDatabase/colls/TheCOllection", password: "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==");
then results in a URI that contains "/gremlin", and it cannot locate the database endpoint.
Has anyone used Gremlin.Net to connect to a Cosmos DB that was set up as a document database rather than as a graph database? The documents in it are graph/Gremlin compatible in their format, with _isEdge, label, _sink and so on.
Cheers,
Mark (Document db/Gremlin/graph newbie)
