What is a search unit in Azure Cognitive Search?

What is a search unit in Azure Cognitive Search?
I need more details about how a search unit works internally.
What is the advantage of having more than one search unit in Azure Cognitive Search?

Azure Cognitive Search allows you to add redundancy and partitioning to your service.
With redundancy, the data in your index exists in multiple copies. The main advantage of that is failure tolerance.
With partitioning, the data is split into shards. The main advantage of that is performance as requests can be routed and handled with more parallelism.
Search units are the product of the number of partitions and the number of redundant copies (replicas): roughly, how many virtual machines are needed to provide the specified numbers.
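As a minimal sketch of that arithmetic (the function name is hypothetical and not part of any Azure SDK; "replicas" is used here for the redundant copies):

```python
def search_units(replicas: int, partitions: int) -> int:
    """Return the search units implied by a replica/partition layout."""
    if replicas < 1 or partitions < 1:
        raise ValueError("replicas and partitions must both be at least 1")
    # Every replica holds a full copy of the index, and each copy is
    # split into the same number of partitions.
    return replicas * partitions


# Example: 3 copies of the index, each split into 2 shards.
print(search_units(replicas=3, partitions=2))  # 6
```

So adding one more replica to this layout would cost 2 additional search units, while adding one more partition would cost 3.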

Related

Best way to handle multi-container transaction operations in Cosmos DB NoSQL?

Currently I am trying to design an application where we have a Cosmos DB account representing a group of customers with:
One container used as an overall metadata store that contains all customers
Other containers that each contain data specific to one customer, partitioned according to different categories of customer history etc.
When we onboard a new customer (which will not happen often, and only once per customer), we'd like to make sure that we create a row in the overall customer metadata and then provision the customer-specific container, rolling back the transaction if either step fails. (In the future we'd like to remove customers as well.)
Unfortunately, Cosmos DB NoSQL only supports transactions within one container and the same logical partition, and does not support multi-container transactions. Our own POC indicates that the MongoDB API does support this, but unfortunately MongoDB does not fit our use case as we need support for Azure Functions.
The heart of the problem here isn't whether Cosmos DB supports distributed transactions. The core problem is that you can't enlist an Azure control plane action (in this case, creating a container resource) in a transaction.
Since you're building in the cloud, my recommendation would be to employ the outbox pattern to manage the provisioning state for your customers. There's an easy-to-understand example here you can read.
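A rough in-memory sketch of the outbox idea, assuming the customer record and its outbox message are written together and a separate worker provisions the container afterwards. All class, field, and state names here are hypothetical, and nothing in it calls the Azure SDK:

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """In-memory stand-in for the metadata container (hypothetical)."""
    customers: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)

    def onboard(self, customer_id: str) -> None:
        # Both writes happen together, standing in for a single
        # transaction within one container and logical partition.
        self.customers[customer_id] = {"id": customer_id, "state": "Pending"}
        self.outbox.append({"customer_id": customer_id, "processed": False})


def process_outbox(store: MetadataStore, containers: set) -> None:
    """Worker pass: provision containers for unprocessed outbox entries."""
    for message in store.outbox:
        if message["processed"]:
            continue
        customer_id = message["customer_id"]
        # Idempotent: adding an existing name to a set is a no-op, so a
        # retry after a crash does not create a duplicate container.
        containers.add(f"customer-{customer_id}")
        store.customers[customer_id]["state"] = "Provisioned"
        message["processed"] = True


store = MetadataStore()
containers: set = set()
store.onboard("contoso")
process_outbox(store, containers)
process_outbox(store, containers)  # safe to run again
print(store.customers["contoso"]["state"], sorted(containers))
```

The point of the design is that the customer is never "half created" from the application's view: it is either Pending or Provisioned, and the worker can be retried until the provisioning step succeeds.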
Given you are building a multi-tenant application for Cosmos DB and using containers as your tenant boundary, please note that the maximum number of databases and/or containers in an account is 500. Please see Service Quotas for more information.

Azure Data Explorer vs Azure Synapse Analytics (a.k.a SQL DW)

I am designing a data management strategy for a big IoT company. Our use case is fairly typical: we ingest large quantities of data, analyze them, and produce datasets that customers can query to learn about the insights they need.
I am looking at both Azure Data Explorer and the Data Warehouse side of Azure Synapse Analytics (a.k.a Azure SQL Data Warehouse) and find many commonalities. Yes, they use different languages and a different query engine on the backend, but both serve as a "serving layer" that customers use to query read-only data at a large scale.
I could not find any clear guidance from Microsoft about how to choose between the two, or maybe it makes sense to use them together? In that case, what is the best use case or type of data for each of the services?
If you can enlighten me, please share your thoughts here. If you know of any guidance on the matter, please reply with a link.
The classic and the modern data warehouse patterns both involve first designing a well-curated data model, with documented entities and their attributes, and then creating a scheduled ETL pipeline that transforms and aggregates the raw data, big and small, into that data model. Then you load and serve it. The curated data model provides stability, consistency and reliability when consuming these entities across an enterprise.
Azure Data Explorer was designed as an analytical data platform for telemetry. In this workload you do not aggregate the data first, but keep it close to the raw format, as you do not want to lose data. This lets you deal with the unexpected nature of security attacks, malfunctions, competitive behaviors, and unknowns in general, because you can look at the fresh raw data from different angles, which provides a lot of flexibility.
This is why Azure Data Explorer is the storage for Microsoft telemetry and also for a growing set of analytical solutions such as Azure Monitor, Azure Security Center, Azure Sentinel, Azure Time Series Insights, IoT Central, PlayFab gaming analytics, Windows Intune Analytics, Customer Insights, Teams Education analytics and more.
It provides high-performance analytics on raw data, with schema-on-read capability on textual, semi-structured and structured data.
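To illustrate what schema-on-read means in general terms, here is a small Python sketch. It is not Azure Data Explorer code or KQL; the device names, field names, and the `extract` helper are made up for illustration. The raw records are kept as ingested, and a field is only interpreted at query time:

```python
import json

# Raw telemetry kept as ingested; records do not share a fixed schema.
RAW_EVENTS = [
    '{"device": "pump-1", "temp_c": 71.5}',
    '{"device": "pump-2", "temp_c": 64.0, "vibration": 0.8}',
    '{"device": "pump-1", "error": "E42"}',
]


def extract(field_name: str, raw_events: list) -> list:
    """Parse each raw record at query time and pull out one field if present."""
    values = []
    for line in raw_events:
        record = json.loads(line)
        if field_name in record:
            values.append((record["device"], record[field_name]))
    return values


print(extract("temp_c", RAW_EVENTS))  # [('pump-1', 71.5), ('pump-2', 64.0)]
print(extract("error", RAW_EVENTS))   # [('pump-1', 'E42')]
```

Because nothing was aggregated or dropped at ingestion, a question nobody anticipated (here, "which devices reported errors?") can still be answered from the same raw data.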
Quite a few of our partners and customers are adopting ADX for the same reasons.
Check out the overview webinar that describes these concepts in detail.
Azure Synapse Analytics packages SQL DW, ADF and Spark so that all the data warehouse pattern components are highly integrated and easier to work with and manage. As we announced at the Azure Data Explorer Virtual Event, Azure Data Explorer is being integrated into Azure Synapse Analytics alongside the SQL and Spark pools to cater for telemetry workloads - real-time analytics on high-velocity, high-volume, high-variety data.
Check out some of the IoT cases: Buhler, Daimler (video, story), Bosch, and AGL. More leading IoT platforms are adopting Azure Data Explorer for this purpose. Reach out to us if you need additional help.

Where do I set a partitionKey in CosmosDB deployed as a Gremlin instance?

I have several vertices and edges to create and think I might have "hot" sections of data (as in Azure Table Storage).
Is my knowledge of scalability and related topics from Azure Tables applicable to Gremlin on Azure? If so, how?
Namely, I want to have "subdivided slices" of sub-tenants (or user partitions) on the database. (If possible I might want to reference between them, or query both at the same time)
The scalability and performance of any Azure Cosmos DB API are based on partitioning. The same concept applies to the Azure Cosmos DB Gremlin API. When creating a graph you need to define the partition key, and partitions will be created based on it.
On top of that, you can go through the article below, which mentions a few more optimizations that can help with scalability and performance. As per the article, "Queries that obtain data from a single partition provide the best possible performance."
https://learn.microsoft.com/en-us/azure/cosmos-db/graph-partitioning
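The following Python sketch shows the general idea of hash partitioning and why a query scoped to one partition key value is cheap. It is an illustration only: the hash function, the partition count, and the `tenant` property are assumptions, not the actual Cosmos DB implementation.

```python
import hashlib
from collections import defaultdict

PARTITION_COUNT = 4  # assumed number of partitions, for illustration


def partition_for(key: str) -> int:
    """Map a partition key value to a partition with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITION_COUNT


partitions = defaultdict(list)
vertices = [
    {"id": "u1", "tenant": "tenant-a"},
    {"id": "u2", "tenant": "tenant-a"},
    {"id": "u3", "tenant": "tenant-b"},
]
for vertex in vertices:
    # Vertices with the same partition key value always land together.
    partitions[partition_for(vertex["tenant"])].append(vertex)


def query_tenant(tenant: str) -> list:
    """Single-partition read: only the partition owning the key is scanned."""
    owner = partitions[partition_for(tenant)]
    return [v["id"] for v in owner if v["tenant"] == tenant]


print(query_tenant("tenant-a"))  # ['u1', 'u2']
```

With sub-tenants as the partition key, each "slice" stays in one partition. A query that spans several sub-tenants still works, but it has to touch every partition that owns one of the requested keys.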

Where to use CosmosDB?

Cosmos DB has a useful global distribution feature that gives faster data response times. This is useful for mobile applications that access Cosmos DB directly and whose users are spread across the globe.
However, I am using an ASP.NET web application hosted in Azure. Here, the distance between my application and the database is always fixed.
Can I benefit from Cosmos DB in this case?
This is for Azure hosted ASP.NET Application
You can use Cosmos DB when you are familiar with NoSQL concepts and your code is written for them, when your application has different implementations for its read and write paths, when you are planning to build microservices, or when other projects depend on or communicate with your web app project and use the same database.
There are some points you need to take into account before choosing Cosmos DB as the database.
Pricing model! Cosmos DB is not a cheap database, and the pricing model is based on provisioned throughput. Requests that exceed the provisioned throughput will be rejected by the database. So first make sure you completely understand how things work.
As with other document-based databases, if you want to keep a graph of objects in a document, you should consider how to handle concurrent updates to the documents (if that is the case in your app). Make sure you know the difference between document-based and relational databases well.
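One common way to handle concurrent updates to a document is optimistic concurrency: each write is accepted only if the document has not changed since the writer read it. Cosmos DB exposes an `_etag` on each item for this purpose; the sketch below does not use the SDK and models the idea with a hypothetical in-memory `DocumentStore`:

```python
import copy
import uuid


class ConflictError(Exception):
    """Raised when a document changed since the caller read it."""


class DocumentStore:
    """In-memory stand-in for a document container (hypothetical)."""

    def __init__(self) -> None:
        self._docs = {}

    def upsert(self, doc_id: str, body: dict) -> str:
        etag = uuid.uuid4().hex  # a new version tag on every write
        self._docs[doc_id] = (copy.deepcopy(body), etag)
        return etag

    def read(self, doc_id: str) -> tuple:
        body, etag = self._docs[doc_id]
        return copy.deepcopy(body), etag

    def replace(self, doc_id: str, body: dict, if_match: str) -> str:
        _, current_etag = self._docs[doc_id]
        if current_etag != if_match:
            raise ConflictError(f"{doc_id} was modified by another writer")
        return self.upsert(doc_id, body)


store = DocumentStore()
store.upsert("order-1", {"items": ["a"]})

# Two writers read the same version of the document.
doc_a, etag_a = store.read("order-1")
doc_b, etag_b = store.read("order-1")

doc_a["items"].append("b")
store.replace("order-1", doc_a, if_match=etag_a)  # first writer wins

doc_b["items"].append("c")
try:
    store.replace("order-1", doc_b, if_match=etag_b)
    second_write = "accepted"
except ConflictError:
    second_write = "rejected"  # stale tag: re-read, merge, and retry

print(second_write)  # rejected
```

Without the version check, the second write would silently overwrite the first writer's change to the same object graph.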
But regarding the benefits:
It has great integration support with other PaaS services in Azure
It scales very well if you have a good partitioning strategy
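To make the provisioned-throughput point from the pricing caveat concrete, here is a toy model of a per-second request budget. The class name, the fixed one-second window, and the request costs are simplifying assumptions and not how the service is actually implemented; in the real service a rejected request is returned as HTTP 429.

```python
class ProvisionedThroughput:
    """Toy per-second request-unit budget (not the real service logic)."""

    def __init__(self, ru_per_second: float) -> None:
        self.ru_per_second = ru_per_second
        self._window = None
        self._used = 0.0

    def try_consume(self, cost_ru: float, now: float) -> bool:
        window = int(now)
        if window != self._window:
            # A new one-second window starts with a full budget.
            self._window = window
            self._used = 0.0
        if self._used + cost_ru > self.ru_per_second:
            return False  # over budget: the request is rejected
        self._used += cost_ru
        return True


budget = ProvisionedThroughput(ru_per_second=400)
results = [budget.try_consume(150, now=10.0 + i * 0.1) for i in range(4)]
print(results)                            # [True, True, False, False]
print(budget.try_consume(150, now=11.0))  # True
```

The practical consequence is that the application needs retry logic for rejected requests, and that the provisioned figure (and therefore the bill) has to be sized for peak load rather than average load.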

How robust is the ASP.NET profile system? Is it ready for prime time?

I've used ASP.NET profiles (using the AspNetSqlProfileProvider) for holding small bits of information about my users. I started to wonder how it would handle a robust profile for a large number of users. Does anyone have experience using this on a large website with large numbers of simultaneous users? What are the performance implications? How about maintenance?
I have found that querying the profile data directly via SQL is a bit tricky, but I have worked with clients that have scaled it up to a few hundred properties and 10K+ users without difficulty. Granted, that is not a lot of users, but it is working thus far.
I think it really depends on the specific project and your exact needs when it comes to working with the profile information. Do you need to query it regularly via SQL? Do you need it for user display only? Answers to these kinds of questions might help provide a more solid answer for your needs.
The SQL provider's performance is closely correlated with big-iron throughput. Performance is more or less directly proportional to a single SQL Server's ability to handle the number of queries. Scale-up is the only option, so it's not really five-nines robust out of the box.
You'll have to figure out whether you need scale-out performance and availability, e.g. through partitioning, replication, redundancy etc., and at what cost to performance. Some of these capabilities are possible as is - the current implementation is more aimed at the middle market and enterprise.
The good thing is that you can plug in your own implementation of the profile provider and then attach it to services and systems with the capabilities outlined above.
We wrote a custom authn, authz and profile provider and strapped it to a large AD/LDS LDAP cluster across 3 datacenters. We're in the Comscore Top 10, so you could say that we deal with a good slice of the internet every day. Thousands of profile queries per second and hundreds of millions of profiles - it can scale with good planning, engineering and operations.
