Azure Data Explorer vs Azure Synapse Analytics (a.k.a SQL DW) - azure-data-explorer

I am designing a data management strategy for a big IoT company. Our use case is fairly typical: we ingest large quantities of data, analyze them, and produce datasets that customers can query to get the insights they need.
I am looking at both Azure Data Explorer and the Data Warehouse side of Azure Synapse Analytics (a.k.a Azure SQL Data Warehouse) and find many commonalities. Yes, they use different languages and a different query engine on the backend, but both serve as a "serving layer" that customers use to query read-only data at a large scale.
I could not find any clear guidance from Microsoft about how to choose between the two, or maybe it makes sense to use them together? In that case, what is the best use case or type of data for each of the services?
If you can enlighten me, please share your thoughts here. If you know of any guidance on the matter, please reply with a link.

Both the classic and the modern data warehouse patterns involve first designing a well-curated data model, with documented entities and their attributes, and then creating a scheduled ETL pipeline that transforms and aggregates the raw data, big and small, into that data model. Then you load and serve it. The curated data model provides stability, consistency and reliability when consuming these entities across an enterprise.
Azure Data Explorer was designed as an analytical data platform for telemetry. In this workload you do not aggregate the data first, but keep it close to the raw format, as you do not want to lose data. This lets you deal with the unexpected nature of security attacks, malfunctions, competitive behaviors, and the unknowns in general, because you can look at the fresh raw data from different angles, which provides a lot of flexibility.
This is why Azure Data Explorer is the storage for Microsoft Telemetry and also a growing set of analytical solutions like: Azure Monitor, Azure Security Center, Azure Sentinel, Azure Time Series Insights, IoT Central, PlayFab gaming analytics, Windows Intune Analytics, Customer Insights, Teams Education analytics and more.
It provides high-performance analytics on raw data, with schema-on-read capability on textual, semi-structured and structured data.
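To make the contrast concrete, here is a minimal sketch in plain Python (not ADX or Synapse code). The event fields, device names and helper functions are all hypothetical and exist only for illustration. It shows how a pre-aggregated model answers the question it was designed for, while keeping the raw events lets you ask a question nobody planned for.

```python
import json
from collections import defaultdict

# Hypothetical raw telemetry lines, kept as ingested (semi-structured JSON text).
RAW_EVENTS = [
    '{"device": "pump-1", "ts": "2024-01-01T10:00:00", "temp": 71.5}',
    '{"device": "pump-1", "ts": "2024-01-01T10:05:00", "temp": 98.2, "error": "E42"}',
    '{"device": "pump-2", "ts": "2024-01-01T10:01:00", "temp": 65.0}',
]


def average_temp_per_device(raw_lines):
    """Warehouse-style aggregate: compact, but fields outside the model are dropped."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in raw_lines:
        event = json.loads(line)
        totals[event["device"]][0] += event["temp"]
        totals[event["device"]][1] += 1
    return {device: total / count for device, (total, count) in totals.items()}


def query_raw(raw_lines, predicate):
    """Schema-on-read: parse at query time and filter on any field that happens to exist."""
    return [event for event in map(json.loads, raw_lines) if predicate(event)]


aggregated = average_temp_per_device(RAW_EVENTS)
# The "error" field was never part of the aggregate, yet the raw data still answers this.
errors = query_raw(RAW_EVENTS, lambda event: "error" in event)
```

The aggregate is cheaper to serve, but the error code on pump-1 is only recoverable because the raw event was kept.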
Quite a few of our partners and customers are adopting ADX for the same reasons.
Check out the overview webinar that describes these concepts in detail.
Azure Synapse Analytics packages SQL DW, ADF and Spark so that all the data warehouse pattern components are highly integrated and easier to work with and manage. As we announced at the Azure Data Explorer Virtual Event, Azure Data Explorer is being integrated into Azure Synapse Analytics alongside the SQL and Spark pools to cater for telemetry workloads: real-time analytics on high-velocity, high-volume, high-variety data.
Check out some of the IoT cases: Buhler, Daimler (video, story), Bosch and AGL. More leading IoT platforms are adopting Azure Data Explorer for this purpose. Reach out to us if you need additional help.

Related

Which API should be used for querying Application Insights trace logs?

Our ASP.NET Core app logs trace messages to App Insights. We need to be able to query them and filter by some customDimensions. However, I have found 3 APIs and am not sure which one to use:
App Insights REST API
Azure Log Analytics REST API
Azure Data Explorer .NET SDK (Preview)
Firstly, I don't understand the relationships between these options. I thought that App Insights persisted its data to Log Analytics; but if that's the case I would expect to only be able to query through Log Analytics.
Regardless, I just need to know which is the best to use and I wish that documentation were clearer. My instinct says to use the App Insights API, since we only need data from App Insights and not from other sources.
The difference between #1 and #2 is mostly historical and converging.
Application Insights existed as a product before Log Analytics, and the two were based on different underlying database technologies.
Both Application Insights and Log Analytics have since converged on the same underlying database, based on ADX (Azure Data Explorer), and on the same REST API service for querying either. So while your #1 and #2 links are different, they point to effectively the same service, backed by the same team; the paths and semantics differ subtly, and where the service looks depends on the inbound request.
Both AI and LA introduce the concept of multi-tenancy and a specific set of tables/schema on top of their Azure resources. They effectively hide the underlying database from you, and make it look like one giant database.
There is now the (suggested) option to have your Application Insights data placed in a Log Analytics workspace:
https://learn.microsoft.com/en-us/azure/azure-monitor/app/create-workspace-resource
This lets you put the data for multiple AI applications/components into the same Log Analytics workspace, to simplify querying across different apps, etc.
Think of ADX as any other kind of database offering. If you create an ADX cluster instance, you have to create a database, manage the schema, manage users, etc. AI and LA do all that for you. So in your question above, the third link, to the ADX SDK, would be used to talk to an ADX cluster/database directly. I don't believe you can use it to talk directly to any AI/LA resources, but there are ways to enable an ADX cluster to query AI/LA data:
https://learn.microsoft.com/en-us/azure/data-explorer/query-monitor-data
There are also ways to have an LA/AI query join with an ADX cluster, using the adx keyword in your query:
https://learn.microsoft.com/en-us/azure/azure-monitor/logs/azure-monitor-data-explorer-proxy
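As a rough illustration of how similar options #1 and #2 are from the caller's side, the sketch below builds (but does not send) a query request for either API. The endpoint shapes are how I understand the public REST documentation, so verify them before relying on this; the `Environment` dimension, the `my-app-id` value and both helper functions are hypothetical, and authentication is left out entirely.

```python
import json

# Endpoint shapes as I understand them from the public docs; verify before use.
ENDPOINTS = {
    "app_insights": "https://api.applicationinsights.io/v1/apps/{resource_id}/query",
    "log_analytics": "https://api.loganalytics.io/v1/workspaces/{resource_id}/query",
}


def traces_by_dimension(key, value):
    """Build a KQL query over the App Insights traces table, filtered on one custom dimension.

    Illustration only: key and value are inserted without any escaping.
    """
    return (
        "traces\n"
        f'| where tostring(customDimensions["{key}"]) == "{value}"\n'
        "| project timestamp, message, customDimensions"
    )


def build_query_request(api, resource_id, kql):
    """Return the (url, json_body) pair for a POST query; auth headers are omitted."""
    url = ENDPOINTS[api].format(resource_id=resource_id)
    body = json.dumps({"query": kql})
    return url, body


kql = traces_by_dimension("Environment", "Production")
url, body = build_query_request("app_insights", "my-app-id", kql)
```

Only the URL path differs between the two APIs in this sketch. Note that a workspace-based resource may expose the data under different table and column names, so the same query text will not necessarily work against both.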

Where to use CosmosDB?

CosmosDB has a useful global distribution feature, which gives faster data response times. This is useful for mobile applications that access CosmosDB directly and whose users are spread across the globe.
However, I am using an ASP.NET web application hosted in Azure, so the distance between my application and the database is always fixed.
Can I benefit from CosmosDB in this case?
This is for an Azure-hosted ASP.NET application.
CosmosDB makes sense when you are familiar with NoSQL concepts and your code is written for them, when your application has different implementations for its read and write paths, when you are planning to build microservices, or when other projects depend on or communicate with your web app project and use the same database.
There are some points you need to take into account before choosing CosmosDB as the database.
Pricing model! CosmosDB is not a cheap database, and its pricing model is based on provisioned throughput. Requests that exceed the provisioned throughput will be rejected by the database. So first make sure you completely understand how things work.
Like other document-based databases, if you want to keep a graph of objects in a document, you should consider how to handle concurrent updates to the documents (if that is the case in your app). Make sure you understand the difference between document-based and relational databases.
But regarding the benefits:
It has great integration support with other PaaS services in Azure
It scales very well if you have a good partitioning strategy
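To illustrate the provisioned throughput point above, here is a toy simulation in plain Python. It does not use the CosmosDB SDK; the class, the per-second budget window and the request costs are all assumptions made for the sketch. Throttled requests are shown with HTTP status 429, the code CosmosDB returns when the request rate is too large.

```python
class ProvisionedContainer:
    """Toy model of a container with a fixed request-unit budget per one-second window."""

    def __init__(self, provisioned_ru_per_second):
        self.provisioned = provisioned_ru_per_second
        self.window = 0
        self.used = 0

    def execute(self, now_seconds, cost_ru):
        """Return 200 if the request fits the current window's budget, else 429."""
        window = int(now_seconds)
        if window != self.window:
            # A new one-second window starts with a fresh budget.
            self.window, self.used = window, 0
        if self.used + cost_ru > self.provisioned:
            return 429
        self.used += cost_ru
        return 200


def run_with_retry(container, start, cost_ru, max_attempts=3, backoff_seconds=1.0):
    """Retry a throttled request after a fixed delay; returns (status, finish_time)."""
    now = start
    for _ in range(max_attempts):
        status = container.execute(now, cost_ru)
        if status == 200:
            return status, now
        now += backoff_seconds  # simulated wait, no real sleeping
    return 429, now


container = ProvisionedContainer(provisioned_ru_per_second=400)
statuses = [container.execute(0.0, 150) for _ in range(3)]
retried = run_with_retry(container, start=0.5, cost_ru=150)
```

In this model the third request in the same second is rejected, and the retry only succeeds once the next window begins. The practical consequence is that under-provisioning shows up as latency and failed requests rather than as a higher bill.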

Cloud database for Azure multi-tenant application?

I am starting to port an old single-tenant desktop application to the cloud and would like to hear your recommendations on databases for my cloud-based multi-tenant application.
My basic requirement is simple:
Each tenant's data is separate from every other tenant's data. I can easily back up, restore and export the data for a single tenant without affecting other tenants.
I don't really want to care about multi-tenancy in the business logic code. It should look like a single-tenant application behind the security layer, with no tenant ID passed around, etc.
Easy to query using some mature technology like LINQ.
Availability and scalability, of course, easy to set up replicas, fail-over and scaling up and down etc.
I have done some investigation into multi-tenant application development. I have noticed that SQL databases from Azure and AWS are both very expensive (the cost of just the SQL database instance is close to the license fee of the original application), so I definitely can't use separate SQL database instances for tenants.
Now I'm reading the book Developing Multi-tenant Applications for the Cloud, 3rd Edition, which uses Azure Storage Service to implement multi-tenancy. I haven't finished the book yet, but it seems you still have to handle the multi-tenancy yourself, and the sample code is already out of date.
I have seen lots of SO questions comparing Azure Table Storage with MongoDB. MongoDB is very new to me, and I'm not sure whether it could easily fulfill my requirements.
I have seen RavenDB as well, and it does support multi-tenancy out of the box. But I haven't seen any good sample code showing how to use it in Azure app development.
Hope to hear some good advice from the awesome SO community.
I would opt for RavenDB over MongoDB. Even though Raven is a newcomer to the game, it supports most of the features that traditional SQL supports.
The volume of data you are dealing with is also a key factor in the decision, as is the amount of traffic you are expecting.
Also keep in mind operational costs and development effort. HA and DR scenarios can be problematic when you use Raven or Mongo, because you need to host them yourself. Azure Storage, on the other hand, protects you by default by maintaining 3 copies of your data.
So I would suggest you weigh the trade-offs carefully and choose based on your business needs, cost optimization, and development and operational effort.
Having a single instance of your application for each tenant is a very expensive way to implement an application; however, I realise that if an application was developed with a single tenant in mind, then the costs of changing over can be high.
First, can we start with why you have a desktop application connecting to a database at another location? The latency can really slow down an application. Ideally you would want a locally installed database that syncs with the cloud DB, or appropriate caching added to your application.
However the DB would still need to differentiate the clients.
Why do you need this to go to a cloud database? Is it for backup purposes, to avoid installing a DB locally on a client's machine, to access the same data from many machines, or something else?
Unless your application is extremely large, I would recommend rewriting it as a multi-tenant application on a single SQL Azure database. The architecture chosen at the beginning of the project doesn't suit your requirements now. As you expand you will run into further issues.
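One way to reconcile a single shared database with the wish to keep tenant IDs out of the business logic is to bind the tenant once, behind the security layer, in the data access code. The sketch below is only an illustration of that idea: it uses Python's built-in sqlite3 as a stand-in for SQL Azure, and the table, class and tenant names are hypothetical.

```python
import sqlite3


class TenantScopedOrders:
    """Data access wrapper bound to one tenant; callers never pass a tenant ID."""

    def __init__(self, connection, tenant_id):
        self._db = connection
        self._tenant_id = tenant_id

    def add(self, item):
        # The tenant ID is attached here, not by the business logic.
        self._db.execute(
            "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
            (self._tenant_id, item),
        )

    def list_items(self):
        # Every read is filtered by the bound tenant.
        rows = self._db.execute(
            "SELECT item FROM orders WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, item TEXT NOT NULL)"
)

# In a real app the security layer would create one wrapper per authenticated request.
contoso = TenantScopedOrders(db, "contoso")
fabrikam = TenantScopedOrders(db, "fabrikam")
contoso.add("sensor")
contoso.add("gateway")
fabrikam.add("valve")
```

Because every row carries the tenant ID, per-tenant export or restore becomes a filtered query rather than a database-level operation, which is the trade-off of sharing one database.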

Is Firebase an all-purpose database?

I've been reading about Firebase and playing with it for a short while. The idea (BaaS) and implementation are impressive, and having programmed with JavaScript, it seems a viable choice. Not having to deal with scaling and other server-side concerns makes it even more attractive.
My question is: generally speaking, is Firebase a first class back-end candidate for any average data-based application? e.g. billing, CRM, e-commerce, social, location based, etc. I do not include super light or heavy extremes such as a basic chat, or a nuclear plant monitor...
The answer may not be a clear yes/no, but was it built to support the general application space, or just stand out as a real-time read/write data service?
Would appreciate answers based on experience and existing production applications.
Thanks
Yes, Firebase is intended to be a first class back-end for any data based Web, iOS or Android application. The service offers real-time data reads and writes, but also comes with a powerful and flexible security system that allows you to write secure client-only apps, without needing any server code to enforce data boundaries.
There are several apps in production listed on the front page as customers, and on the app showcase page at https://firebase.google.com/customers/
Firebase is now more capable and is considered a full stand-alone back-end, especially after the introduction of Cloud Functions. https://firebase.google.com/docs/functions/
Firebase may not have support for transactions spanning multiple business objects.
E.g. when a sales order is booked, it needs to update inventory for multiple items, update billing in receivables, give sales credit to multiple salespersons, etc.
The Firebase team is supposed to come up with a database trigger option which will make all of this possible.

reporting functionality in SOA services

I'm looking to design a SOA service which in addition to its main requirements has a requirement for a small number of reports.
When should SOA services (a) include their own reporting functionality, and when should (b) reports be made available as part of a separate reporting service?
I guess (a) makes the service more self-contained, but (b) should probably be preferred when the organisation already has reporting services deployed?
You can/should report on individual services if the report is self contained and relates to data that changes frequently.
It is OK to use centralised reporting when the data is immutable, i.e. a historic copy of the data. This way the services are still the owners of the data (and responsible for updating it), and the data is still available for cross-service reports. This is a pattern I call aggregated reporting (you can see a draft of it here).
You can also see an article I published on InfoQ called "Bridging the gap between BI & SOA".

Resources