As far as I can work out, CosmoDB has the ability to make Graph queries using the Gremlin query language. Apart from that the pricing, marketing etc. all seem the same. It seems strange that they came up with a new product to add Gremlin when they didn't do the same to add MongoDB support. What are the discernable differences between these two products?
The Azure Cosmos DB team member here.
Azure Cosmos DB started as “Project Florence” in 2010 to address developer pain-points faced by large scale applications inside Microsoft. Observing that the challenges of building globally distributed apps are not a problem unique to Microsoft, in 2015 we made the first generation of this technology available to Azure developers in the form of Azure DocumentDB. Since that time, we’ve added new features and introduced significant new capabilities. Azure Cosmos DB is the result. It is the next big leap in globally distributed, at scale, cloud databases. As a part of this release of Azure Cosmos DB, DocumentDB customers, with their data, are automatically Azure Cosmos DB customers. The transition is seamless and they now have access to the new breakthrough system and capabilities offered by Azure Cosmos DB.
In the evolution of Cosmos DB, we have added significant new capabilities since 2015 (when DocumentDB was made generally available) but only a subset of these capabilities was available in DocumentDB. These capabilities are in the areas of the core database engine as well as, global distribution, elastic scalability and industry-leading, comprehensive SLAs. Specifically, we have evolved the Cosmos DB database engine to be able to efficiently map all popular data models, type systems and APIs to the underlying data model of Cosmos DB. The developer facing manifestation of this work currently will experience it via support for Gremlin and Table Storage APIs. And this is just the beginning… We will be adding other popular APIs and newer data models over time with more advances towards performance and storage at global scale.
We also have extended the foundation for global and elastic scalability of throughput and storage. One of the very first manifestations of it is the RU/m (https://learn.microsoft.com/en-us/azure/cosmos-db/request-units-per-minute) but we have more capabilities that we will be announcing in these areas. The new capabilities will help save cost for our customers for various workloads. We have made several foundational enhancements to the global distribution subsystem. One of the many developer facing manifestations of this work is the consistent prefix consistency model (making in total 5 well-defined consistency models). However, there are many more interesting capabilities we will release as they mature.
It is important to point out that we view Azure Cosmos DB as a constantly evolving database service. Typically, we first validate all new capabilities with the large scale applications inside Microsoft, subsequently expose them to key external customers, and finally, release them to the world.
It is also important to point out that DocumentDB’s SQL dialect has always been just one of the many APIs that the underlying Cosmos DB was capable of supporting. As a developer using a fully managed service like Cosmos DB, the only interface to the service is the APIs exposed by the service. To that end, nothing really changes for a DocumentDB customer. Cosmos DB offers the exactly the same SQL API that DocumentDB did. However, now (and in the future) you can get access to other capabilities which were previously not accessible.
DocumentDB is one of the APIs for CosmosDB. Others include Table Storage, MongoDB, Gremlin.
Think about CosmosDB as the database platform that handles scaling, throughput, consitency, etc and DocumentDB as one of the types of the databases than run on CosmosDB.
Azure Cosmos DB natively supports multiple data models including documents, key-value, graph, and column-family. The core content-model of Cosmos DB’s database engine is based on atom-record-sequence (ARS). Atoms consist of a small set of primitive types like string, bool, and number. Records are structs composed of these types. Sequences are arrays consisting of atoms, records, or sequences.
The database engine can efficiently translate and project different data models onto the ARS-based data model. The core data model of Cosmos DB is natively accessible from dynamically typed programming languages and can be exposed as-is as JSON.
https://learn.microsoft.com/en-us/azure/cosmos-db/introduction
CosmosDB is the new DocumentDB for NoSQL solution.
As Cosmosdb architect Rimma mentioned
The Azure Cosmos DB DocumentDB API or SQL (DocumentDB) API is now
known as Azure Cosmos DB SQL API. You don't need to change anything to
continue running your apps built with DocumentDB/DocumentDB API. The
functionality remains the same. Thanks.
DocumentDB is one of the APIs for CosmosDB.As of now, if you go to Azure portal and try to create an Azure Cosmos DB, you have to select one of the 4 APIs available there:
Gremlin (Graph)
MongoDB
SQL (DocumentDB)
Table (key-value)
Related
I have been thinking about what are the pros and cons of SQL Graph Database and Cosmos Graph Database, as far as I understand, SQL graph database is using nodes and vertex, but it still stores all of the information in tables.
So my question would be if the graph data can be handled by graph Db, what are the advantages of using SQL graph database? What is the added value of it compared with the original graph Database ?
SQL Graph Database and Cosmos Graph Database both are almost same kind of services, just the structure of handling the data is different. As such there are no advantages and disadvantages, but choosing the right service based on your use-case is the key factor.
Azure Cosmos DB's Gremlin API combines the power of graph database
algorithms with highly scalable, managed infrastructure to provide a
unique, flexible solution to most common data problems associated with
lack of flexibility and relational approaches.
So, by using Azure Cosmos DB Gremlin API, you will get more leverage on the datasets with additional features. On the top of that, all the prerequisites will be taken care by CosmosDB while creating the database using Gremlin API.
In SQL Graph DB, nodes and edges are in tabular form, whereas in Cosmos DB it is in JSON like format.
I would highly encourage you to analyze how these databases support graph database models and the mechanism to exploit the maximum potential of these database systems for the right use-cases.
Please refer below articles to get the better understanding of both the services.
SQL Graph Architecture
Introduction to Gremlin API in Azure Cosmos DB
I have several Vertices and Edges to create and think I might have "hot" sections of data. (as in Azure Table Storage)
Are my scalability and other knowledge from Azure Tables applicable to Gremlin on Azure? If so, how?
Namely, I want to have "subdivided slices" of sub-tenants (or user partitions) on the database. (If possible I might want to reference between them, or query both at the same time)
Scalability and performance of any Azure Cosmos DB API is based on partitioning. Same concept is applicable for Azure Cosmos Gremlin API. While creating a graph you need to define the partition key and partitions will be created based on that.
On top of it, you can go through below article that mentions few more optimization that can help with scalability and performance. As per the article, "Queries that obtain data from a single partition provide the best possible performance."
https://learn.microsoft.com/en-us/azure/cosmos-db/graph-partitioning
CosmosDb has a good feature of Globally Distributed which gives Faster Response of data. This will be useful for Mobile Applications directly accessing CosmosDb where Users are spread across the Globe.
However I am using ASP.NET Web Application hosted in Azure. Here my Application to Database communication will be of Fixed Distance always.
Can I benefit from CosmosDb in this case?
This is for Azure hosted ASP.NET Application
You can utilize CosmosDB when you know noSQL concept and so is your code, it has different implementation for read and write processes or you are planning to do microservices or you have other projects that depends/communicate on your Webapp project and your using the same database
There are some points you need to take into account before choosing CosmosDB as the database.
Pricing model! CosmosDB is not a cheep database and pricing model is based on the provisioned throughput. Requests that exceed the provisioned throughput will be rejected by the database. So first make sure you completely understand how things work.
Like other document based databases, if you wanna keep a graph of objects in a document, you should consider how to handle concurrent updates to the documents (if that is the case in your app). Hope you know well the difference between document based and relational databases.
But regarding the benefits:
It has a great a integration support with other PaaS services in Azure
It scales very well if you have a good partitioning strategy
I'm interested in using the Azure CosmosDB for it's Graph capability.
Looking through the docs i saw that it sores graph vertices and edges as JSON documents (with an agreed schema) and so it can be accessed as a plain old DocumentDB.
Taking this into consideration what is the meaning of the API selection you need to make when creating a new instance (link)?
eg :
what am i losing if i create the DB as SQL (DocumentDB) and
manipulating data via the graph part of the client (eg CreateGremlinQuery)
what am i losing if i create the DB as Graph and
manipulating data via the DocumentDB part of the client (eg CreateDocumentAsync)
UPDATE : I am aware of the portal difference (as described below by Jesse Carter). I am interested if this switch drives anything else under the hood in the specific scenario of choosing between SQL(Document DB) vs Graph
There is no functional difference from the perspective of interacting with your Cosmos Collection through either SQL or Graph APIs regardless of which API you choose at creation time.
HOWEVER, there is a difference from the perspective of the Azure portal when navigating your resources. Collections created specifically using the Graph API will get tagged as such and enable additional UI features in the portal for executing Gremlin queries and basic graph visualization.
If you don't care about those querying abilities in the Azure portal, then you're fine to create the collection with either option.
API selection is to avoid confusion for users who are only familier with gremlin and don’t want to learn documentDB.
If you are an advanced user, using both graph and documentDB will give you more power.
Note that we are committed to making the gremlin and documentDB SQL integration even more seamless.
Please drop us a note askcosmosdbgraphapi#microsoft.com, if you want to lean more or set up a time to talk to us.
Jayanta
I am starting to port one old desktop single tenant application into the cloud and wish to hear what would be your recommendation about the databases for my cloud-based multi-tenant application?
My basic requirement is simple:
For each tenant, its data is separate to any other tenants' data. I can easily backup, restore, export the data for one single tenant without affecting other tenants.
I don't really want to care about multi-tenancy in the business logic code. It should look like a single tenant application behind the security layer, no tenant ID pass around etc.
Easy to query using some mature technology like LINQ.
Availability and scalability, of course, easy to set up replicas, fail-over and scaling up and down etc.
I have gone through some investigations about multi-tenant application development. I have noticed SQL databases from Azure and AWS are both very expensive(the cost for just SQL database instance is close to the license fee of the original application), so I definitely can't use separate SQL database instances for tenants.
Now I'm reading this book Developing Multi-tenant Applications for the Cloud, 3rd Edition, and it uses Azure Storage Service to implement multi-tenancy. I haven't finished the book yet, it seems you still have to handle the multi-tenancy by yourself and the sample code is already out of date.
I have seen lots of SO questions compare Azure Table Storage with MongoDB. The MongoDB is very new to me, not sure whether it could be easily used to fulfill my requirements?
And I have seen RavenDB as well, it does support multi-tenancy out of box. But I didn't see some good sample code about how to use it in Azure app development.
Hope to hear some good advices from awesome SO guys.
I would better opt with RavenDB on top of MongoDB. Even Raven is a new comer in to the game, it supports most of the features which traditional SQL supports.
Also to make up a decisions the volume of data you are dealing is a also a key decision pointer. Also the amount of traffic you are expecting.
Also keep in mind that operational costs and development efforts. HA and DR scenarios can be problematic when you use Raven or Mongo because of the fact that you need to host them. But when it comes to Azure Storage, it by defaults protects you to a maximum extent by maintaining 3 copies of information.
So I would suggest you to carefully make the trade offs and opt wisely based on your business needs, cost optimization, development and operational effort.
Having a single instance of your application for each tenant is a very expensive way to implement an application, however I realise that if an application was developed with a single tenant in mind, then the costs of changing over can be high.
First can we start out with why you have a desktop application connecting to a database at another location. The latency can really slow down an application. Ideally you would want a locally installed database and have it sync with the cloud DB, or add in appropriate caching into your application.
However the DB would still need to differentiate the clients.
Why do you need this to go to a cloud database? Is it for backup purposes, not installing a DB locally on a clients machine, accessing the same data from many machines or something else?
Unless your application is extremely large, I would recommend rewriting it for multi-tenant to one SQL Azure database. The architecture chosen at the beginning of the project doesn't suit your requirements now. As you expand you will run into further issues.