How to write aggregations in Cosmos DB similar to MongoDB aggregations - azure-cosmosdb

I have a set of aggregations defined in MongoDB for a collection, and I want the same feature in Cosmos DB. How can I define an aggregation for a collection in Cosmos DB?
In the Azure portal I don't see a similar option. Please help.

Aggregations in Azure Cosmos DB for NoSQL work much like those in relational SQL: you combine a GROUP BY clause with one or more aggregate functions.
You can learn more from the docs and examples for GROUP BY and Aggregate Functions.
One thing to note is that Cosmos DB is not designed for large-scale, heavy analytical queries; it is optimized as an operational database. If you need heavy analytical queries, you may want to take a look at Synapse Link.
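As a rough illustration of how a MongoDB `$group` stage maps onto the NoSQL API, the sketch below puts a pipeline and a GROUP BY query side by side and reproduces the grouping in plain Python so the shape of the result is visible. The container alias `c`, the `category` and `price` fields, and the sample documents are assumptions invented for this example; nothing here is run against a real account.

```python
from collections import defaultdict

# MongoDB pipeline, shown for comparison: count and average price per category.
mongo_pipeline = [
    {"$group": {"_id": "$category", "total": {"$sum": 1}, "avgPrice": {"$avg": "$price"}}}
]

# Roughly equivalent Cosmos DB for NoSQL query text (hypothetical field names).
cosmos_query = (
    "SELECT c.category, COUNT(1) AS total, AVG(c.price) AS avgPrice "
    "FROM c GROUP BY c.category"
)

# Sample documents, invented for illustration only.
docs = [
    {"id": "1", "category": "books", "price": 10.0},
    {"id": "2", "category": "books", "price": 20.0},
    {"id": "3", "category": "toys", "price": 5.0},
]


def group_by_category(items):
    """Plain-Python equivalent of the GROUP BY query above."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item["category"]].append(item["price"])
    return [
        {"category": cat, "total": len(prices), "avgPrice": sum(prices) / len(prices)}
        for cat, prices in sorted(buckets.items())
    ]


result = group_by_category(docs)
```

The query text can be tried in the Data Explorer query editor in the Azure portal; unlike a stored MongoDB pipeline, it is just a query you send each time.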

Related

SQL Graph Database VS Cosmos Gremlin graph DB

I have been thinking about the pros and cons of SQL Graph Database and Cosmos Graph Database. As far as I understand, SQL Graph Database uses nodes and edges, but it still stores all of the information in tables.
So my question is: if graph data can be handled by a graph DB, what are the advantages of using SQL Graph Database? What value does it add compared with a native graph database?
SQL Graph Database and Cosmos Graph Database are much the same kind of service; they differ in how they structure and handle the data. Neither has an inherent advantage or disadvantage, so the key factor is choosing the right service for your use case.
Azure Cosmos DB's Gremlin API combines the power of graph database
algorithms with highly scalable, managed infrastructure to provide a
unique, flexible solution to most common data problems associated with
lack of flexibility and relational approaches.
So, by using the Azure Cosmos DB Gremlin API, you get more leverage over your datasets through additional features. On top of that, Cosmos DB takes care of all the prerequisites when you create the database with the Gremlin API.
In SQL Graph DB, nodes and edges are stored in tabular form, whereas in Cosmos DB they are stored in a JSON-like format.
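To make the tabular versus JSON-like distinction concrete, here is a deliberately simplified sketch of the same two-person graph in both shapes. These are not the actual storage formats of either product; the table names, field names, and helper functions are all hypothetical and only illustrate the difference in structure.

```python
# Tabular layout, in the spirit of SQL Graph: one table per node type and
# one table per edge type, related through ID columns.
person_table = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
knows_table = [
    {"from_id": 1, "to_id": 2, "since": 2020},
]

# JSON-like layout, in the spirit of a document-backed graph: every vertex
# and edge is its own self-describing document.
vertex_docs = [
    {"id": "1", "label": "person", "type": "vertex", "name": "Alice"},
    {"id": "2", "label": "person", "type": "vertex", "name": "Bob"},
]
edge_docs = [
    {"id": "e1", "label": "knows", "type": "edge", "outV": "1", "inV": "2", "since": 2020},
]


def neighbours_from_tables(person_id):
    """Join the edge table back to the node table, as a SQL query would."""
    names = {row["id"]: row["name"] for row in person_table}
    return [names[e["to_id"]] for e in knows_table if e["from_id"] == person_id]


def neighbours_from_docs(vertex_id):
    """Follow outgoing edge documents to their target vertex documents."""
    by_id = {v["id"]: v for v in vertex_docs}
    return [by_id[e["inV"]]["name"] for e in edge_docs if e["outV"] == vertex_id]
```

Both helpers answer the same question ("whom does Alice know?"); only the shape of the underlying data differs.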
I would strongly encourage you to look at how each database supports graph models, so that you can get the most out of each system for the right use cases.
Please refer to the articles below for a better understanding of both services.
SQL Graph Architecture
Introduction to Gremlin API in Azure Cosmos DB

BigQuery vs Cloud SQL autoscaling?

I should say up front that I am a beginner with Google Cloud Platform.
I am developing a web application in React using Firebase, so all data is saved in Firestore.
Now I need a relational database, and I am very confused as to which is better, Cloud SQL or BigQuery.
My idea was to keep one part of the data in Cloud SQL and the other part in Firestore.
When an event happens, the data from Cloud SQL and Firestore is merged and uploaded to BigQuery for analysis.
Example:
On Firestore I have a product that has an array field where IDs are
stored. These IDs are related to the Database saved on Cloud SQL. When
an order is placed it is added to a collection on Firestore and
appended to the database on BigQuery.
My problem is that, from what I have read, Cloud SQL cannot autoscale, while BigQuery can.
So my question is: can you autoscale Cloud SQL?
If it can't be done, is it correct to use BigQuery exclusively?
Is there another solution on GCP that gives you a relational database with autoscaling?
Edit 1
This is a very simplified model of part of the database on Cloud SQL / BigQuery.
I'll use a query with two or three inner joins to get all the values I need.
I don't know how to make it non-relational so that I can use Firestore without a lot of duplicated data. I am open to any kind of advice.
I am not sure I understood correctly, but I reckon you would like to get some data from one data source, combine or process it with data from a Firestore collection, and load or stream the result into BigQuery, all of it operationally, at run time. The question is which data source to choose: Cloud SQL or BigQuery.
Am I right that, from your point of view, the main drawback of Cloud SQL is the lack of scalability (autoscaling), and that you are considering BigQuery instead because it autoscales?
It is not clear what rate of requests or queries you expect, or where the data is located (for example, whether you need global access), so it is difficult to discuss the situation in detail. Anyway...
As for BigQuery: in my opinion it is a great "database" (the best, from my point of view), but mainly for analytical purposes. Each query has some initial latency (a query job won't finish faster than some threshold) that cannot be reduced significantly, and there are no binary indexes on BigQuery tables. That means a query will take a few seconds (let's assume 3 or more) every time you run it, unless the result is served from the cache. If the number of requests is significant, this may become expensive both in BigQuery itself and in the component that processes the task (e.g. a Cloud Function triggered by some event), because the latter has to wait and do nothing while the query runs.
In addition, BigQuery is very good at loading or streaming data in, but not very good at regular updates to data already stored in it; there are plenty of limitations. So, depending on your context, it may not be a good idea to keep operational data in BigQuery.
If I rule out BigQuery:
Can we sacrifice autoscaling and use Cloud SQL?
Can we use a Firestore collection instead of Cloud SQL (and sacrifice the relational property)?
Can we use Cloud SQL and manage the amount of data in the tables used for querying, so that there are no delays?
I am not sure whether I managed to help, but at least I have offered some thoughts on the problem.
'Now I need to have a relational database, and I am very confused as to which is the best between Cloud SQL and BigQuery.'
Please be aware that BigQuery cannot substitute for a relational database: it is oriented toward running analytical queries, not simple CRUD operations and queries (as in Cloud SQL). That doesn’t mean BigQuery can’t handle normalized data and joins. It absolutely can. It just performs better on denormalized data because BigQuery is essentially an OLAP engine. So, denormalize whenever possible (please read here).
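As a small sketch of what "denormalize" means for the order example in the question, the code below flattens orders that reference products by ID into self-contained rows, which is the shape you would typically append to an analytical table. All names (`products`, `orders`, `denormalize`) and the sample values are hypothetical; no BigQuery or Cloud SQL client is involved.

```python
# Normalized data: orders reference products by ID, as in a relational model.
products = {
    "p1": {"name": "Keyboard", "price": 30.0},
    "p2": {"name": "Mouse", "price": 15.0},
}
orders = [
    {"order_id": "o1", "product_ids": ["p1", "p2"]},
    {"order_id": "o2", "product_ids": ["p2"]},
]


def denormalize(order_list, product_lookup):
    """Resolve every product reference so each row stands on its own (no join needed later)."""
    rows = []
    for order in order_list:
        for product_id in order["product_ids"]:
            product = product_lookup[product_id]
            rows.append(
                {
                    "order_id": order["order_id"],
                    "product_id": product_id,
                    "product_name": product["name"],
                    "price": product["price"],
                }
            )
    return rows


rows = denormalize(orders, products)
```

The product name and price are duplicated across rows on purpose: the join is paid once at write time instead of on every analytical query.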
You can use read replicas to scale Cloud SQL. Read replica instances allow data from the primary instance to be replicated to one or more replicas, which can increase read throughput. Please see this.

Where do I set a partitionKey in CosmosDB deployed as a Gremlin instance?

I have several Vertices and Edges to create and think I might have "hot" sections of data. (as in Azure Table Storage)
Are my scalability and other knowledge from Azure Tables applicable to Gremlin on Azure? If so, how?
Namely, I want to have "subdivided slices" of sub-tenants (or user partitions) on the database. (If possible I might want to reference between them, or query both at the same time)
The scalability and performance of every Azure Cosmos DB API are based on partitioning, and the same concept applies to the Azure Cosmos DB Gremlin API. When you create a graph you define its partition key, and partitions are created based on that key.
In addition, the article below mentions a few more optimizations that can help with scalability and performance. As the article says, "Queries that obtain data from a single partition provide the best possible performance."
https://learn.microsoft.com/en-us/azure/cosmos-db/graph-partitioning
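For the per-tenant slicing described in the question, the sketch below builds Gremlin query strings that always carry the partition key. It assumes the graph was created with a partition key path of `/tenantId`; that property name, the helper functions, and the sample IDs are hypothetical. The strings are only built here, not sent to a server.

```python
PARTITION_KEY = "tenantId"  # assumption: the graph was created with partition key /tenantId


def q(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def add_vertex(label: str, vertex_id: str, tenant: str) -> str:
    """Every new vertex must include the partition key property."""
    return (
        f"g.addV({q(label)}).property('id', {q(vertex_id)})"
        f".property({q(PARTITION_KEY)}, {q(tenant)})"
    )


def vertex_by_id(tenant: str, vertex_id: str) -> str:
    """Point lookup using the [partition key value, id] pair form."""
    return f"g.V([{q(tenant)}, {q(vertex_id)}])"


def vertices_in_tenants(*tenants: str) -> str:
    """One tenant stays in a single partition; several tenants fan out across partitions."""
    if len(tenants) == 1:
        return f"g.V().has({q(PARTITION_KEY)}, {q(tenants[0])})"
    values = ", ".join(q(t) for t in tenants)
    return f"g.V().has({q(PARTITION_KEY)}, within({values}))"
```

For example, `vertices_in_tenants("tenantA")` keeps the query inside one partition, while `vertices_in_tenants("tenantA", "tenantB")` queries both slices at once at the cost of touching more partitions. In real code, parameter bindings are usually preferable to string building.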

Does Cosmos DB support different schemas in a collection?

I am trying to write a record with a different schema to an existing collection that already contains records. I don't get an exception, but I don't see the new record.
Do I need to use a different collection?
DocumentDBRepository<ScheduleViewModel>.CreateItemAsync(task).GetAwaiter();
Cosmos DB doesn't care what you put into it (like almost any other NoSQL database), so this is supported from the Cosmos DB perspective. From the code perspective, I suppose you need to create a connection that supports the model you are using and then create the document.
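A minimal sketch of the idea, using a plain Python list as a stand-in for the collection: two differently shaped documents live side by side, and a `type` field tells them apart when reading. The `type` discriminator is a common convention rather than something the question uses, and every name here (`Schedule`, `Note`, `create_item`, `items_of_type`) is hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class Schedule:
    id: str
    title: str
    start: str
    type: str = "schedule"  # discriminator field (an assumed convention)


@dataclass
class Note:
    id: str
    text: str
    type: str = "note"


container = []  # in-memory stand-in for a schema-less collection


def create_item(model) -> dict:
    """Store any dataclass instance as a document; only an 'id' is required."""
    doc = asdict(model)
    if not doc.get("id"):
        raise ValueError("every document needs an id")
    container.append(doc)
    return doc


def items_of_type(doc_type: str) -> list:
    """Read back only the documents of one shape."""
    return [doc for doc in container if doc.get("type") == doc_type]


create_item(Schedule(id="1", title="Stand-up", start="09:00"))
create_item(Note(id="2", text="Bring laptop"))
```

Reading with a model that only knows one shape can make documents of the other shape look missing, so filtering on a discriminator (or using a separate repository per model) keeps the two apart.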

What are the differences between Cosmos DB and DocumentDB

As far as I can work out, Cosmos DB has the ability to make graph queries using the Gremlin query language. Apart from that, the pricing, marketing, etc. all seem the same. It seems strange that they came up with a new product to add Gremlin when they didn't do the same to add MongoDB support. What are the discernible differences between these two products?
Azure Cosmos DB team member here.
Azure Cosmos DB started as “Project Florence” in 2010 to address developer pain-points faced by large scale applications inside Microsoft. Observing that the challenges of building globally distributed apps are not a problem unique to Microsoft, in 2015 we made the first generation of this technology available to Azure developers in the form of Azure DocumentDB. Since that time, we’ve added new features and introduced significant new capabilities. Azure Cosmos DB is the result. It is the next big leap in globally distributed, at scale, cloud databases. As a part of this release of Azure Cosmos DB, DocumentDB customers, with their data, are automatically Azure Cosmos DB customers. The transition is seamless and they now have access to the new breakthrough system and capabilities offered by Azure Cosmos DB.
In the evolution of Cosmos DB, we have added significant new capabilities since 2015 (when DocumentDB was made generally available), but only a subset of these capabilities was available in DocumentDB. These capabilities are in the areas of the core database engine as well as global distribution, elastic scalability and industry-leading, comprehensive SLAs. Specifically, we have evolved the Cosmos DB database engine to be able to efficiently map all popular data models, type systems and APIs to the underlying data model of Cosmos DB. Developers currently experience this work through support for the Gremlin and Table Storage APIs. And this is just the beginning… We will be adding other popular APIs and newer data models over time with more advances towards performance and storage at global scale.
We also have extended the foundation for global and elastic scalability of throughput and storage. One of the very first manifestations of it is the RU/m (https://learn.microsoft.com/en-us/azure/cosmos-db/request-units-per-minute) but we have more capabilities that we will be announcing in these areas. The new capabilities will help save cost for our customers for various workloads. We have made several foundational enhancements to the global distribution subsystem. One of the many developer facing manifestations of this work is the consistent prefix consistency model (making in total 5 well-defined consistency models). However, there are many more interesting capabilities we will release as they mature.
It is important to point out that we view Azure Cosmos DB as a constantly evolving database service. Typically, we first validate all new capabilities with the large scale applications inside Microsoft, subsequently expose them to key external customers, and finally, release them to the world.
It is also important to point out that DocumentDB’s SQL dialect has always been just one of the many APIs that the underlying Cosmos DB was capable of supporting. As a developer using a fully managed service like Cosmos DB, the only interface to the service is the APIs exposed by the service. To that end, nothing really changes for a DocumentDB customer. Cosmos DB offers exactly the same SQL API that DocumentDB did. However, now (and in the future) you can get access to other capabilities which were previously not accessible.
DocumentDB is one of the APIs for Cosmos DB. Others include Table Storage, MongoDB, and Gremlin.
Think of Cosmos DB as the database platform that handles scaling, throughput, consistency, etc., and of DocumentDB as one of the types of database that run on Cosmos DB.
Azure Cosmos DB natively supports multiple data models including documents, key-value, graph, and column-family. The core content-model of Cosmos DB’s database engine is based on atom-record-sequence (ARS). Atoms consist of a small set of primitive types like string, bool, and number. Records are structs composed of these types. Sequences are arrays consisting of atoms, records, or sequences.
The database engine can efficiently translate and project different data models onto the ARS-based data model. The core data model of Cosmos DB is natively accessible from dynamically typed programming languages and can be exposed as-is as JSON.
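The atom-record-sequence description above can be sketched as a small recursive check. This is only an illustration of the stated definitions, not Cosmos DB's implementation: atoms are limited to the three types the text names, records are modelled as dicts with string keys, and, as an assumption borrowed from JSON, record fields may themselves hold records or sequences. The function names and sample values are hypothetical.

```python
import json


def kind(value) -> str:
    """Classify a Python value as an ARS atom, record, or sequence."""
    if isinstance(value, (str, bool, int, float)):
        return "atom"
    if isinstance(value, dict):
        return "record"
    if isinstance(value, list):
        return "sequence"
    return "unsupported"


def is_ars(value) -> bool:
    """True if the value is built only from atoms, records, and sequences."""
    k = kind(value)
    if k == "atom":
        return True
    if k == "record":
        return all(isinstance(key, str) and is_ars(v) for key, v in value.items())
    if k == "sequence":
        return all(is_ars(v) for v in value)
    return False


# Different data models projected onto the same shape (sample data invented here).
document = {"id": "1", "tags": ["a", "b"], "address": {"city": "Oslo"}}
key_value_row = {"PartitionKey": "p", "RowKey": "r", "Count": 3}
graph_edge = {"label": "knows", "outV": "1", "inV": "2"}

as_json = json.dumps(document, sort_keys=True)  # ARS values serialize as-is to JSON
```

A document, a key-value row, and a graph edge all pass `is_ars`, which is the sense in which one underlying model can back several APIs.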
https://learn.microsoft.com/en-us/azure/cosmos-db/introduction
Cosmos DB is the successor to DocumentDB as a NoSQL solution.
As Cosmos DB architect Rimma mentioned:
The Azure Cosmos DB DocumentDB API or SQL (DocumentDB) API is now
known as Azure Cosmos DB SQL API. You don't need to change anything to
continue running your apps built with DocumentDB/DocumentDB API. The
functionality remains the same. Thanks.
DocumentDB is one of the APIs for Cosmos DB. As of now, if you go to the Azure portal and try to create an Azure Cosmos DB account, you have to select one of the 4 APIs available there:
Gremlin (Graph)
MongoDB
SQL (DocumentDB)
Table (key-value)

Resources