I have a simple query to fetch the vertex count in a Cosmos DB Gremlin database.
g.V('person1').out('know').hasLabel('person').count()
The output of this query is, say, 1000. The number of RUs consumed by this query is ~466. I just wanted to know if there is any way to optimize this query or the way vertices are stored in the graph.
Adding the partition key to the query would help here:
g.V([partitionkey],[vertex id]).out('know').hasLabel('person').count()
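For instance, here is a minimal sketch of submitting the partition-key-scoped traversal with gremlinpython; the endpoint, database, graph, key, and the 'person-pk' partition key value are placeholders, and the RU charge is read from the x-ms-total-request-charge status attribute Cosmos returns:

from gremlin_python.driver import client, serializer

# Placeholder connection details; substitute your account's endpoint, database, graph and key.
gremlin_client = client.Client(
    'wss://<your-account>.gremlin.cosmos.azure.com:443/', 'g',
    username='/dbs/<database>/colls/<graph>',
    password='<primary-key>',
    message_serializer=serializer.GraphSONSerializersV2d0())

# Supplying the partition key alongside the id turns V() into a single-partition lookup.
query = "g.V(['person-pk', 'person1']).out('know').hasLabel('person').count()"
result_set = gremlin_client.submit(query)
print(result_set.all().result())                                       # e.g. [1000]
print(result_set.status_attributes.get('x-ms-total-request-charge'))   # RU charge reported by Cosmos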
I use an Azure Cosmos DB API for MongoDB account, version 3.6. In queries using skip and limit I noticed higher throughput usage; the higher the skip, the more costly the query.
db.MyCollection.find({Property:"testtest"}).skip(12000).limit(10)
The above query costs around 3000 RU. The property in the find clause is my partition key. I have read that Cosmos DB is currently capable of queries with offset and limit, but I found that officially only the SQL API for Cosmos DB has an OFFSET LIMIT clause. Is it possible with the MongoDB API as well, or should I live with costly queries using skip?
The SQL API will yield the same result with OFFSET LIMIT: you'll see an almost linear increase in RU charge as you increase the offset, since each query still loops over all skipped documents.
If possible, use the continuation token in your context. You could also adjust your filter criteria using an indexed property to page over your data.
The RU charge of a query with OFFSET LIMIT will increase as the number of terms being offset increases. For queries that have multiple pages of results, we typically recommend using continuation tokens. Continuation tokens are a "bookmark" for the place where the query can later resume. If you use OFFSET LIMIT, there is no "bookmark". If you wanted to return the query's next page, you would have to start from the beginning.
Source
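As a rough sketch of the "indexed property" approach on the MongoDB API, the pymongo loop below pages by the last _id seen instead of skip(); the connection string and database name are placeholders:

from pymongo import MongoClient

# Placeholder connection details.
client = MongoClient("<cosmos-mongodb-connection-string>")
coll = client["<database>"]["MyCollection"]

# Page on an indexed field instead of skip(): remember the last _id returned and
# ask only for documents after it, so each page costs roughly the same RU.
last_id = None
while True:
    criteria = {"Property": "testtest"}
    if last_id is not None:
        criteria["_id"] = {"$gt": last_id}
    page = list(coll.find(criteria).sort("_id", 1).limit(10))
    if not page:
        break
    last_id = page[-1]["_id"]
    # process the page here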
I am evaluating Cosmos DB for a project and working through the documentation. I created a sample collection following the documentation on this page: https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-getting-started. When I run the first query from that page in the local emulator, the request charge is 2.89 RUs.
Why is the request charge 2.89 RUs? From all of the documentation I have read, this should be 1 RU. The collection is partitioned on the id field and is auto-indexed, and cross-partition queries are enabled. I have even tried putting both items in the same partition and I get the same result.
1 RU is the cost of a Point-Read operation, not a query. Reference: https://learn.microsoft.com/azure/cosmos-db/request-units:
The cost to read a 1 KB item is 1 Request Unit (or 1 RU).
Also from the same page:
Query patterns: The complexity of a query affects how many RUs are consumed for an operation. Factors that affect the cost of query operations include: …
If you want to read a single document and you know the id and partition key, just do a point read; it will always be cheaper than a query with an id = "something" filter. If you don't know the partition key, then yes, you need a cross-partition query, because you don't know which partition the document is stored in, and there could be multiple documents with the same id (as long as their partition keys are different; see https://learn.microsoft.com/azure/cosmos-db/partitioning-overview).
You can use any of the available SDKs or work with the REST API.
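As an illustration (a sketch with the azure-cosmos Python SDK; the endpoint, key, database, and container names are placeholders, and the AndersenFamily id comes from the getting-started sample partitioned on id), you can compare the charge of a point read with the equivalent id query by looking at the x-ms-request-charge response header:

from azure.cosmos import CosmosClient

# Placeholder endpoint/key and names.
client = CosmosClient("<account-endpoint>", "<account-key>")
container = client.get_database_client("<database>").get_container_client("<container>")

# Point read: id + partition key, ~1 RU for a 1 KB item.
container.read_item(item="AndersenFamily", partition_key="AndersenFamily")
print(container.client_connection.last_response_headers["x-ms-request-charge"])

# Equivalent query: goes through the query engine, so it charges more than 1 RU.
list(container.query_items(
    query="SELECT * FROM c WHERE c.id = @id",
    parameters=[{"name": "@id", "value": "AndersenFamily"}],
    enable_cross_partition_query=True))
print(container.client_connection.last_response_headers["x-ms-request-charge"])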
I am thinking of creating a reporting tool for a Cosmos DB instance in our system. I want to discover all the partition keys and the number of items stored for each. I think I should be using pkranges but do not seem to be able to find any examples of how this should work. Any suggestions?
You can try to get the list of pkranges using this API.
As of now, there is no API that returns the list of partition keys. There was a feature request at Azure Cosmos DB Improvement Ideas, but it was declined.
So, you can try SELECT DISTINCT c.partitionKey FROM c to get the list of distinct partition keys.
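If you also need the number of items stored for each key, a GROUP BY query can return the count per key value in one pass. A sketch with the azure-cosmos Python SDK, assuming placeholder account details and that the partition key property is named partitionKey:

from azure.cosmos import CosmosClient

# Placeholder endpoint/key and names; the partition key property is assumed to be "partitionKey".
client = CosmosClient("<account-endpoint>", "<account-key>")
container = client.get_database_client("<database>").get_container_client("<container>")

# One cross-partition query returns each distinct key value with its item count.
results = container.query_items(
    query="SELECT c.partitionKey, COUNT(1) AS itemCount FROM c GROUP BY c.partitionKey",
    enable_cross_partition_query=True)
for row in results:
    print(row["partitionKey"], row["itemCount"])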
I run queries like the following in Gremlin against Cosmos DB Graph:
g.V().hasLabel('vertex_label').limit(1)
This query is problematic in terms of the size of the data returned from the DB, since it returns all inE and outE edges of the selected vertex. The question is: how can I optimize this query with respect to the size of the query result?
As mentioned, the query returns a vertex with all its dependencies and connections. It can therefore be problematic with a high volume of data (when there are a lot of connections to the specified vertex). We can optimize such queries using steps like properties, propertyMap, values, and valueMap. In short, appending valueMap(true) to the relevant queries minimizes the size of the data transferred from Cosmos. For example:
g.V().hasLabel('play').limit(1).valueMap(true)
The boolean argument returns the id and label of the vertex in addition to its properties.
Also, for more on optimizing the structure of a query, see this link.
How are you using Cosmos DB Graph, via the Microsoft.Azure.Graphs SDK or the Gremlin server?
If you are using Microsoft.Azure.Graphs, the latest version (0.2.4-preview as of posting) supports specifying the GraphSONFormat as a parameter on DocumentClient.CreateGremlinRequest(..). You can choose either GraphSONFormat.Normal or GraphSONFormat.Compact; Compact is the default if it is not supplied.
For the CosmosDB Gremlin server, Compact is also the default behavior.
With GraphSONFormat.Compact, vertex results won't include edges and as a result, outE and inE fetches can be skipped when fetching the vertex. GraphSONFormat.Normal will return the full GraphSON response if this is desired.
Additional Note: There are optimizations on limit() that will be included in the next release of SDK/server, so I would expect additional perf gains on the traversal example that you provided when the release becomes available.
How can I get the past 30 days of data from DynamoDB with a GROUP BY clause (on power)?
I have a table named lightpowerinfo with fields id, lightport, sessionTime, and power.
Amazon DynamoDB is a NoSQL database, which means that it is not possible to write SQL queries against the data. Therefore, there is no concept of a GROUP BY statement.
Instead, you would need to write an application to retrieve the relevant raw data, and then calculate the results you seek.
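As a rough sketch with boto3, assuming lightpowerinfo uses id as the partition key and sessionTime (an ISO-8601 string) as the sort key (adjust to your actual key schema, the "light-001" id is a placeholder), you would query the last 30 days and group by power in application code:

from collections import defaultdict
from datetime import datetime, timedelta, timezone

import boto3
from boto3.dynamodb.conditions import Key

# Assumed key schema: id = partition key, sessionTime = sort key (ISO-8601 string).
table = boto3.resource("dynamodb").Table("lightpowerinfo")
cutoff = (datetime.now(timezone.utc) - timedelta(days=30)).isoformat()

response = table.query(
    KeyConditionExpression=Key("id").eq("light-001") & Key("sessionTime").gte(cutoff))
items = response["Items"]

# The "GROUP BY power" happens client-side on the retrieved items.
grouped = defaultdict(list)
for item in items:
    grouped[item["power"]].append(item)
for power, rows in grouped.items():
    print(power, len(rows))

For larger result sets you would also follow LastEvaluatedKey in the query response to paginate through all items before grouping.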