Use Traversal to query Azure Cosmos DB graph - azure-cosmosdb

I am trying to use a Traversal to query an Azure Cosmos DB graph as follows:
val cluster = Cluster.build(File("remote.yaml")).create()
val client = cluster.connect()
val graph = EmptyGraph.instance()
val g = graph.traversal().withRemote(DriverRemoteConnection.using(cluster))
val traversal = g.V().count()
val aliased = client.alias("g")
val result = aliased.submit(traversal)
val resultList = result.all().get()
resultList.forEach { println(it) }
The problem is that execution hangs at result.all().get() and I never get a response. This only happens when submitting a traversal; submitting a Gremlin query string directly works properly.

I'm in a similar boat, but according to the recent question "Does Cosmos DB support Gremlin.net c# GLV?", traversals are not possible just yet. However, for those using (or thinking about using) Gremlin.NET to connect to Cosmos DB, I'll share some of what I've been able to do.
Firstly, I have no trouble connecting to Cosmos DB from the Gremlin console; the problem only occurs when using Gremlin.NET as follows:
var gremlinServer = new GremlinServer(hostname, port, enableSsl: true,
username: "/dbs/" + database + "/colls/" + collection,
password: authKey);
var driver = new DriverRemoteConnection(new GremlinClient(gremlinServer));
//var driver = new DriverRemoteConnection(new GremlinClient(new GremlinServer("localhost", 8182)));
var graph = new Gremlin.Net.Structure.Graph();
var g = graph.Traversal().WithRemote(driver);
g.V().Drop().Next(); // throws NullReferenceException
When using Gremlin.NET against a locally hosted Gremlin Server (see the commented-out line), everything works fine.
The only way I can work with Cosmos DB using Gremlin.NET is to submit queries as string literals, e.g.
var task = gremlinClient.SubmitAsync<dynamic>("g.V().Drop()");
This works, but I want to be able to use fluent traversals.
I can also work with Cosmos DB quite easily using the Azure Graph API (DocumentClient etc.), but still only with string literals. That approach isn't very portable either, and is apparently slower too.

Related

How to get available Graph container list in my Cosmos Graph DB?

I am trying to get metadata about my Cosmos Graph database. A number of graphs have been created in this database and I want to list their names.
With the Gremlin API, we can connect to a graph container and submit a query, as in the code sample below. But the connection requires a {collection}, which is the graph name, so we are bound to one particular graph.
var gremlinServer = new GremlinServer(hostname, port, enableSsl: true,
    username: "/dbs/" + database + "/colls/" + collection,
    password: authKey);
using (var gremlinClient = new GremlinClient(gremlinServer, new GraphSON2Reader(), new GraphSON2Writer(), GremlinClient.GraphSON2MimeType))
{
    // Await the query so the client is not disposed before it completes.
    await gremlinClient.SubmitAsync<dynamic>(query);
}
Is there any way to connect at the database level only and get some metadata, such as, in my case, the list of available graphs?
It looks like the Gremlin client is implemented at the collection (i.e. graph) level, so it won't be possible to enumerate graphs in an account or database using the Gremlin connection.
You can always use the Cosmos DB SDK to connect to the account, enumerate the databases and collections, and then use Gremlin clients to connect to each of them separately.
Install-Package Microsoft.Azure.Cosmos
using (var client = new CosmosClient(endpoint, authKey))
{
    var dbIterator = client.GetDatabaseQueryIterator<DatabaseProperties>();
    while (dbIterator.HasMoreResults)
    {
        foreach (var database in await dbIterator.ReadNextAsync())
        {
            // DatabaseProperties only carries metadata, so get a Database
            // reference by id before listing its containers.
            var containerIterator = client.GetDatabase(database.Id)
                .GetContainerQueryIterator<ContainerProperties>();
            while (containerIterator.HasMoreResults)
            {
                foreach (var container in await containerIterator.ReadNextAsync())
                {
                    Console.WriteLine($"{database.Id} - {container.Id}");
                }
            }
        }
    }
}

Invalid index exception when using BulkExecutor in CosmosDb

I get an error when I try to use BulkExecutor to update one of the properties in Cosmos DB. The error message is "Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index".
An important point: I don't have a partition key defined on my collection.
Here is my code:
SetUpdateOperation<string> player1NameUpdateOperation = new SetUpdateOperation<string>("Player1Name", name);
var updateOperations = new List<UpdateOperation>();
updateOperations.Add(player1NameUpdateOperation);
var updateItems = new List<UpdateItem>();
foreach (var match in list)
{
    string id = match.id;
    updateItems.Add(new UpdateItem(id, null, updateOperations));
}
var executor = new Microsoft.Azure.CosmosDB.BulkExecutor.BulkExecutor(_client, _collection);
await executor.InitializeAsync();
var executeResult = await executor.BulkUpdateAsync(updateItems);
var count = executeResult.NumberOfDocumentsUpdated;
What am I missing?
If I run the bulk executor on a collection without a partition key, I get the same error. If I run it on a collection that does have one and I specify it, the bulk executor works fine.
I'm pretty sure the bulk executor API just doesn't support this right now. As a workaround, use the normal Cosmos DB API to update the documents.

How to get metrics for a request on CosmosDB graph collection?

I want to find out details about a Gremlin query, so I set the PopulateQueryMetrics property of the FeedOptions argument to true.
But the FeedResponse object I get back doesn't have the QueryMetrics property populated.
var queryString = $"g.addV('{d.type}').property('id', '{d.Id}')";
var query = client.CreateGremlinQuery<dynamic>(graphCollection, queryString,
    new FeedOptions {
        PopulateQueryMetrics = true
    });
while (query.HasMoreResults)
{
    FeedResponse<dynamic> response = await query.ExecuteNextAsync();
    // response.QueryMetrics is null
}
Am I missing something?
Following your description, I created an Azure Cosmos DB account with the Gremlin (graph) API and ran into the same issue. I found the tutorial "Monitoring and debugging with metrics in Azure Cosmos DB"; its "Debugging why queries are running slow" section says:
In the SQL API SDKs, Azure Cosmos DB provides query execution statistics.
IDocumentQuery<dynamic> query = client.CreateDocumentQuery(
    UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName),
    "SELECT * FROM c WHERE c.city = 'Seattle'",
    new FeedOptions
    {
        PopulateQueryMetrics = true,
        MaxItemCount = -1,
        MaxDegreeOfParallelism = -1,
        EnableCrossPartitionQuery = true
    }).AsDocumentQuery();
FeedResponse<dynamic> result = await query.ExecuteNextAsync();
// Returns metrics by partition key range Id
IReadOnlyDictionary<string, QueryMetrics> metrics = result.QueryMetrics;
I then queried my Cosmos DB Gremlin (graph) account through the SQL API as shown above, and was able to retrieve the QueryMetrics.
Note: I also checked that you can specify a SQL expression such as SELECT * FROM c where c.id='thomas' and c.label='person'. For adding a new vertex, I do not know how to construct the SQL expression. Moreover, the CreateDocumentAsync method does not accept a FeedOptions parameter.
As far as I understand, the PopulateQueryMetrics setting may only work when using the SQL API. You could add your feedback here.

How to fetch All records from azure cosmos db using query

I want to fetch more than 100 records from Azure Cosmos DB using a SELECT query.
I am writing a stored procedure that uses a SELECT query to fetch the records.
This is my stored procedure:
function getall() {
    var context = getContext();
    var response = context.getResponse();
    var collection = context.getCollection();
    var collectionLink = collection.getSelfLink();
    var filterQuery = 'SELECT * FROM c';
    collection.queryDocuments(collectionLink, filterQuery, { pageSize: -1 },
        function (err, documents) {
            response.setBody(response.getBody() + JSON.stringify(documents));
        }
    );
}
Initially, it worked when the database held a small amount of data.
But with a large amount of data, the stored procedure throws this exception:
Encountered exception while executing function. Exception = Error:
Resulting message would be too large because of "Body". Return from
script with current message and use continuation token to call the
script again or modify your script. Stack trace: Error: Resulting
message would be too large because of "Body". Return from script with
current message and use continuation token to call the script again or
modify your script.
DocumentDB imposes limits on the response page size.
This link summarizes some of those limits:
Azure DocumentDb Storage Limits - what exactly do they mean?
You can paginate your data using continuation tokens. The DocumentDB SDK supports reading paginated data seamlessly.
https://azure.microsoft.com/en-us/blog/documentdb-paging-support-with-top-and-more-query-improvements/
Are you using the .NET SDK to retrieve the data returned by your stored procedure? If so, take advantage of .HasMoreResults. Each fetch automatically returns only as much data as the size limit allows, which avoids the error you posted. Loop until there are no more results to fetch.
http://www.kevinkuszyk.com/2016/08/19/paging-through-query-results-in-azure-documentdb/
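The continuation-token approach can also be applied inside the stored procedure itself. The following is a minimal sketch, not a definitive implementation: the name getPage, the page size of 100, and the shape of the response body are illustrative assumptions. It returns one page per call together with a continuation token, which the caller passes back in to fetch the next page.

```javascript
// Stored procedure sketch: returns one page of documents plus a continuation
// token instead of trying to return the whole collection in one response.
// `getPage`, the page size, and the body shape are illustrative assumptions.
function getPage(continuationToken) {
    var context = getContext(); // provided by the Cosmos DB server-side runtime
    var collection = context.getCollection();
    var response = context.getResponse();

    var options = { pageSize: 100 };
    if (continuationToken) {
        // Resume where the previous call stopped.
        options.continuation = continuationToken;
    }

    var accepted = collection.queryDocuments(
        collection.getSelfLink(),
        'SELECT * FROM c',
        options,
        function (err, documents, responseOptions) {
            if (err) throw err;
            // A null continuation tells the caller there are no more pages.
            response.setBody({
                documents: documents,
                continuation: responseOptions.continuation || null
            });
        }
    );

    if (!accepted) {
        throw new Error('The query was not accepted by the server.');
    }
}
```

The client would call the procedure repeatedly, passing the continuation value from the previous response, until that value comes back as null.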

Query parameters in Cosmos DB Graph-API

Are query parameters supported in the new Cosmos DB graph API?
For example in the query:
IDocumentQuery<dynamic> query = client.CreateGremlinQuery<dynamic>(graph, "g.V().has('name', 'john')");
Can I replace the hard-coded value 'john' with a query parameter as we could do in DocumentDB:
IQueryable<Book> queryable = client.CreateDocumentQuery<Book>(
    collectionSelfLink,
    new SqlQuerySpec
    {
        QueryText = "SELECT * FROM books b WHERE (b.Author.Name = @name)",
        Parameters = new SqlParameterCollection()
        {
            new SqlParameter("@name", "Herman Melville")
        }
    });
I am asking with security in mind. Or might there be other ways to defend against injections in Gremlin?
TinkerPop in general has a notion of bindings, which allow you to define your data separately from your Gremlin scripts. An example using Java code can be found here: https://github.com/tinkerpop/gremlin/wiki/Using-Gremlin-through-Java
(search for bindings).
You can also use bindings through the HTTP endpoint, for example by doing something like:
curl http://localhost:8182 -d '{"gremlin": "g.V().has(key1, value1);", "bindings": {"key1": "name", "value1": "david"}}'
You need to find out whether the client in your query supports binding parameters, but it seems to me that what you are looking for is TinkerPop-compatible functionality.
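To illustrate why bindings help against injection, here is a minimal sketch that builds the same kind of request body as the curl example. The helper name buildGremlinRequest and the binding name nameParam are hypothetical, and whether the Cosmos DB client accepts bindings is not confirmed here. The user-supplied value travels as data in "bindings" and is never concatenated into the script text.

```javascript
// Builds the JSON body for a Gremlin Server HTTP request, keeping the
// user-supplied value in `bindings` rather than splicing it into the script.
// `buildGremlinRequest` and `nameParam` are hypothetical names.
function buildGremlinRequest(name) {
    return JSON.stringify({
        gremlin: "g.V().has('name', nameParam)",
        bindings: { nameParam: name }
    });
}

// The script text stays the same no matter what the input contains.
console.log(buildGremlinRequest("john"));
console.log(buildGremlinRequest("x'); g.V().drop(); //"));
```

The resulting string can be sent as the request body in place of the hand-written JSON in the curl command.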
