Alternative to group by for cosmos db - azure-cosmosdb

Given that Cosmos DB does not support GROUP BY, what is a good alternative for achieving similar functionality, for example:
Select sum(*) , groupterm from tble group by groupterm
Can I efficiently achieve this in a cosmos stored procedure?

As the Cosmos DB documentation states:
Aggregation capability in SQL limited to COUNT, SUM, MIN, MAX, AVG functions. No support for GROUP BY or other aggregation functionality found in database systems. However, stored procedures can be used to implement in-the-database aggregation capability.
Can I efficiently achieve this in a cosmos stored procedure?
For .NET and Node.js, Larry Maccherone has provided a great package, documentdb-lumenize, which supports aggregations (group-by, pivot table, and N-dimensional cube) and time series transformations as stored procedures in DocumentDB.
Additionally, for Python and Scala, you could refer to azure-cosmosdb-spark.
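Where neither a stored procedure nor a GROUP BY-capable SDK is an option, the grouping can also be done on the client after a plain SELECT. The following is only a minimal sketch of that idea, not part of the packages mentioned above. It assumes each document is a dict with a `groupterm` field and a numeric `amount` field; `amount` is a hypothetical name standing in for whatever column the question wants to sum.

```python
from collections import defaultdict

def sum_by_group(docs, group_field="groupterm", value_field="amount"):
    """Emulate SELECT SUM(value), group FROM ... GROUP BY group on the client."""
    totals = defaultdict(int)
    for doc in docs:
        # Accumulate the value under its group key.
        totals[doc[group_field]] += doc[value_field]
    return dict(totals)

# Hypothetical documents, as they might come back from a plain SELECT query.
docs = [
    {"groupterm": "a", "amount": 2},
    {"groupterm": "b", "amount": 5},
    {"groupterm": "a", "amount": 3},
]
totals = sum_by_group(docs)
```

Note that this approach transfers every matching document to the client, so it is likely only reasonable for modest result sets; a stored procedure keeps the aggregation inside the database.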

GROUP BY is now supported in the Cosmos DB SQL API. You will need .NET SDK version 3.3 or higher:
Azure Cosmos DB currently supports GROUP BY in .NET SDK 3.3 or later. Support for other language SDK's and the Azure Portal is not currently available but is planned.
https://learn.microsoft.com/en-gb/azure/cosmos-db/sql-query-group-by

Finally, Azure Cosmos DB currently supports GROUP BY in .NET SDK 3.3 or later. Support for other language SDK's and the Azure Portal is not currently available but is planned.
<group_by_clause> ::= GROUP BY <scalar_expression_list>
<scalar_expression_list> ::=
<scalar_expression>
| <scalar_expression_list>, <scalar_expression>

Related

Rolling joins with database backends

Version 1.1.0 of dplyr acquired features that allow it to express complex joins; for example one can now express rolling joins. I believe that at the moment (as of dbplyr 2.3.0) there is no translation of these constructs into SQL. I was curious about:
I assume that the plan is to provide backend translations for most of these new constructs. Is this correct?
If so, I was wondering what the likely translation of these constructs would be for the MS SQL Server backend? For example, what are possible T-SQL translations for the likes of join_by(company == id, closest(year >= since))?

CosmosDB Zone Redundancy using Azure Libraries for Net

I currently create a CosmosDB with the following properties:
cosmosDb = await azure.CosmosDBAccounts
    .Define(cosmosDbResource.Name)
    .WithRegion(cosmosDbResource.Region)
    .WithExistingResourceGroup(cosmosDbResource.ResourceGroup.Name)
    .WithKind(DatabaseAccountKind.GlobalDocumentDB)
    .WithStrongConsistency()
    .WithTags(cosmosDbResource.ResourceGroup.Tags)
    .CreateAsync();
The only place I have seen to be able to set Zone Redundancy on is the ReadReplication database, like so:
cosmosDb = await azure.CosmosDBAccounts
    .Define(cosmosDbResource.Name)
    .WithRegion(cosmosDbResource.Region)
    .WithExistingResourceGroup(cosmosDbResource.ResourceGroup.Name)
    .WithKind(DatabaseAccountKind.GlobalDocumentDB)
    .WithStrongConsistency()
    .WithReadReplication(Region.USEast, true)
    .WithTags(cosmosDbResource.ResourceGroup.Tags)
    .CreateAsync();
The problem is that I don't care about a Read Replication database. I want to set Zone Redundancy on the initial database I create. I noticed that when I create a CosmosDB manually in the Azure Portal, it gives me the option to set Zone Redundancy. Is this not possible via the Azure Libraries for .NET SDK?
To specify a write region with Zone Redundancy, use the following:
.WithWriteReplication(Region.USWest2, true)
PS: If at all possible I would recommend you use the Auto-rest generated version of this SDK. The fluent API is generally not as up to date as the Auto-rest generated APIs. The generated SDK is built directly off the Cosmos DB swagger spec, and everything downstream is built upon this, including ARM, PowerShell and CLI.
There is a repository with a fairly complete set of examples as well that you can use to help build your own management libraries. It also includes fluent samples, but these are also out of date. Cosmos DB Samples
This is the repo for the Auto-rest generated SDK. Cosmos DB Management SDK for .NET

Is there a PartitionKey input in upsertItem in the Cosmos SQL API Java SDK?

Is there any PartitionKey input in upsertItem in the Cosmos SQL API Java SDK?
I ask because CosmosContainer provides only two methods: 1) upsertItem(Item) and 2) upsertItem(Item, CosmosItemRequestOptions).
Sure, just clone and try the Java SDK samples here.

How to run aggregates across partitions

When I run the query below in the portal, it runs just fine. But when I run it from the Python SDK, I get "Cross partition query only supports 'VALUE ' for aggregates.". I want to run it across all partitions. Any suggestion on how to get it working from the SDK?
Thanks,
Peter
SELECT
c.data.video.id,
COUNT(1) as nTraces,
MAX(c.data.time) as lastReport
FROM c
WHERE c.data.time > "2020-06-02T17:40:25.593141+00:00"
GROUP BY c.data.video.id
According to my test, the Python SDK doesn't support GROUP BY.
Below is my test code:
query = "SELECT c.id FROM c group by c.id"
items = list(container.query_items(
    query=query,
    enable_cross_partition_query=True
))
Here is the error:
azure.cosmos.exceptions.CosmosHttpResponseError: (BadRequest) Gateway Failed to Retrieve Query Plan: Query contains 1 or more unsupported features. Upgrade your SDK to a version that does support the requested features:
Query contained GroupBy, which the calling client does not support.
According to this document, the .NET SDK and the JS SDK support GROUP BY, so you can do this by using either of them.
By the way, I searched the Python SDK release history and it doesn't mention supporting GROUP BY.
Hope this helps.
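If switching SDKs is not practical, one possible workaround is to fetch only the fields involved with a plain cross-partition query and do the grouping in Python. This is a sketch under assumptions, not something tested against the asker's container: the aliases `videoId` and `reportTime` and the helper `group_traces` are hypothetical names, and the timestamps are assumed to be ISO-8601 strings in a single format and offset (as in the question), so that string comparison matches chronological order.

```python
from collections import defaultdict

# Projection-only query: no aggregates, so the grouping happens in Python.
QUERY = (
    "SELECT c.data.video.id AS videoId, c.data.time AS reportTime "
    "FROM c WHERE c.data.time > @since"
)

def group_traces(items):
    """Emulate COUNT(1) and MAX(c.data.time) grouped by video id."""
    stats = defaultdict(lambda: {"nTraces": 0, "lastReport": None})
    for item in items:
        entry = stats[item["videoId"]]
        entry["nTraces"] += 1
        # Keep the latest timestamp seen so far for this video.
        if entry["lastReport"] is None or item["reportTime"] > entry["lastReport"]:
            entry["lastReport"] = item["reportTime"]
    return dict(stats)

# With the SDK, `items` would come from something like:
#   container.query_items(
#       query=QUERY,
#       parameters=[{"name": "@since", "value": "2020-06-02T17:40:25.593141+00:00"}],
#       enable_cross_partition_query=True)
# Here a hypothetical in-memory sample stands in for the query result.
sample = [
    {"videoId": "v1", "reportTime": "2020-06-03T10:00:00.000000+00:00"},
    {"videoId": "v2", "reportTime": "2020-06-03T11:00:00.000000+00:00"},
    {"videoId": "v1", "reportTime": "2020-06-04T09:30:00.000000+00:00"},
]
result = group_traces(sample)
```

The trade-off is that every matching row is sent to the client, so this is likely to cost more request units and bandwidth than a server-side GROUP BY would.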

How to utilize automatic indexing in CosmosDB/Cassandra API?

The Cosmos DB FAQ says in its Cassandra API section that Azure Cosmos DB provides automatic indexing of all attributes without any schema definition. https://learn.microsoft.com/en-us/azure/cosmos-db/faq#does-this-mean-i-dont-have-to-create-more-than-one-index-to-satisfy-the-queries-1
But when I try to add a WHERE column1 = 'x' filter to my CQL query, I get an exception from the DataStax Cassandra driver saying that data filtering is not supported. I tried to bypass the client driver by supplying ALLOW FILTERING, but this time I got an error from the Cosmos server saying this feature is not implemented.
So, if automatic indexing is implemented for Cosmos/Cassandra API, how can it be used?

Resources