ADX request throttling improvements - azure-data-explorer

I am getting {"code": "Too many requests", "message": "Request is denied due to throttling."} from ADX when I run some batch ADF pipelines. I have came across this document on workload group. I have a cluster where we did not configured work load groups. Now i assume all the queries will be managed by default workload group. I found that MaxConcurrentRequests property is 20. I have following doubts.
Does it mean that this is the maximum number of concurrent requests my cluster can handle?
If I create a REST API which serves data from ADX, will it support only 20 requests at a given time?
How do I find the maximum number of concurrent requests an ADX cluster can handle?

To understand why your command is throttled, the key element in the error message is this: Capacity: 6, Origin: 'CapacityPolicy/Ingestion'.
This means the number of concurrent ingestion operations your cluster can run is 6. This is calculated based on the cluster's ingestion capacity, which is part of the cluster's capacity policy.
It is affected by the total number of cores/nodes the cluster has. Generally, you could:
scale up/out in order to reach greater capacity, and/or
reduce the parallelism of your ingestion commands, so that only up to 6 are being run concurrently, and/or
add logic to the client application to retry on such throttling errors, after some backoff (see the sketch below).
Additional reference: Control commands throttling
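
For the retry option, here is a minimal sketch of client-side backoff, assuming only that your SDK or REST call surfaces the throttling error as an exception whose message contains the text shown above (run_ingest_command below is a hypothetical helper, not part of any SDK):

    import random
    import time

    def run_with_backoff(operation, max_attempts=5, base_delay_s=1.0):
        """Retry `operation` with exponential backoff plus jitter when it is throttled."""
        for attempt in range(1, max_attempts + 1):
            try:
                return operation()
            except Exception as ex:  # narrow this to the SDK's throttling exception if one is available
                if "throttl" not in str(ex).lower() or attempt == max_attempts:
                    raise
                # 1s, 2s, 4s, ... plus up to 1s of jitter between attempts.
                time.sleep(base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 1))

    # Hypothetical usage:
    # run_with_backoff(lambda: run_ingest_command(client, ".ingest into table T <| ..."))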

Related

Kusto ingestion limit and command throttle because of capacity policy

I use the Kusto ingest client kustoClient.IngestFromDataReader to ingest data, and it throws the exception: An error occurred for source: 'DataReader'. Error: 'Failed to ingest: State='Throttled', Status='The control command was aborted due to throttling. Retrying after some backoff might succeed. CommandType: 'DataIngestPull', Capacity: 18, Origin: 'CapacityPolicy/Ingestion'.'. I read the document here https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/capacitypolicy#ingestion-capacity and guess it may be because there are too many requests running concurrently and the cluster capacity is limited. Am I right?
I am still a bit confused about the document. What does the final number (Minimum(ClusterMaximumConcurrentOperations, Number of nodes in cluster * Maximum(1, Core count per node * CoreUtilizationCoefficient))) mean? Does it mean the total number of concurrent operations? And specifically, does one Kusto ingest client or one Kusto ingest command correspond to only one concurrent operation, or is that configurable?
Thanks a lot!
Effectively the document means that ingestion capacity (in terms of concurrent ingestion operations) is 3/4 of the overall number of cores in the cluster, but not higher than 512 (with the default policy values).
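As a worked example of that formula, here is a small sketch using the default policy values (CoreUtilizationCoefficient = 0.75, ClusterMaximumConcurrentOperations = 512); the node and core counts are made up:

    def ingestion_capacity(nodes, cores_per_node,
                           core_utilization_coefficient=0.75,
                           cluster_max_concurrent_operations=512):
        """Concurrent ingestion operations allowed by the default capacity policy."""
        per_cluster = nodes * max(1, cores_per_node * core_utilization_coefficient)
        return min(cluster_max_concurrent_operations, int(per_cluster))

    print(ingestion_capacity(nodes=2, cores_per_node=8))     # 12
    print(ingestion_capacity(nodes=3, cores_per_node=8))     # 18, e.g. the "Capacity: 18" in the error above
    print(ingestion_capacity(nodes=100, cores_per_node=16))  # 512 (capped)

These values should correspond to the Total reported for the ingestion resource by '.show cluster capacity'.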
You can view your cluster's capacity and its utilization by running the '.show cluster capacity' command.
If you do not want to handle the throttling yourself, you should use the KustoQueuedIngestClient class and pass it the ingestion service endpoint (https://ingest-..kusto.windows.net).
The ingestion service will take care of managing the load on your cluster.
See Ingestion Overview article for more details.
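KustoQueuedIngestClient is the .NET class; if you are on Python, the azure-kusto-ingest package exposes the same queued pattern. A sketch (class names and import paths reflect recent SDK versions and may differ in older ones; endpoint and credentials are placeholders):

    from azure.kusto.data import KustoConnectionStringBuilder
    from azure.kusto.data.data_format import DataFormat
    from azure.kusto.ingest import IngestionProperties, QueuedIngestClient

    # Note the "ingest-" prefix: this is the ingestion service endpoint, not the query endpoint.
    kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
        "https://ingest-<cluster>.<region>.kusto.windows.net",
        "<app-id>", "<app-key>", "<tenant-id>")

    client = QueuedIngestClient(kcsb)
    props = IngestionProperties(database="MyDatabase", table="MyTable",
                                data_format=DataFormat.CSV)

    # The ingestion service queues the data and loads it as cluster capacity allows,
    # so the client does not need its own handling for CapacityPolicy/Ingestion throttling.
    client.ingest_from_file("data.csv", ingestion_properties=props)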

Interpreting output of .show capacity command

I am trying to make sense of the output returned by the .show capacity command. I am finding that different clusters I have access to have the same capacity policy, yet when I run the .show capacity command I see different numbers in the 'Total' column of the result set. Isn't Total determined by the capacity policy?
Also, what does it mean when, say, the remaining capacity for the 'DataExport' resource is 30? Does it mean that 30 more export commands can be accommodated (each with its own unique OperationId) without getting queued up (if they queue up at all when more export commands are issued than there are 'Remaining' slots)?
Based on the scope property, the output of .show capacity may depend not only on the cluster's capacity policy, but also on the cluster's workload groups and their request rate limit policies.
Even if unaltered, both the former and the latter differ by default between clusters that have different SKUs (different numbers of cores per node) or a different number of nodes (total number of cores).
30 means that up to 30 export commands may run concurrently; the 31st will be throttled.
It doesn't necessarily mean the cluster can't physically handle more than 30; rather, it's a (configurable) threshold at which requests of this type will be throttled, to keep the export workload from consuming too many resources.
There's no queuing of such requests (unless your workload group definition and its request queuing policy specify that queuing is enabled for these kinds of requests).
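
If you want to check these numbers programmatically, here is a sketch using the Python azure-kusto-data client (cluster URI and database are placeholders; the columns are the Total and Remaining mentioned above plus a Resource column identifying the resource kind):

    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
        "https://<cluster>.<region>.kusto.windows.net")  # query endpoint, not the ingest- endpoint
    client = KustoClient(kcsb)

    # .show capacity is a management command, so it goes through execute_mgmt.
    response = client.execute_mgmt("MyDatabase", ".show capacity")

    for row in response.primary_results[0]:
        # One row per resource kind, e.g. DataExport, with its Total and Remaining slots.
        print(row["Resource"], row["Total"], row["Remaining"])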

When CosmosDb says 10 ms write latency, does it include the time it takes for the write request to reach the cosmosdb server?

This document on Cosmos DB consistency levels and latency says Cosmos DB writes have a latency of 10 ms at the 99th percentile. Does this include the time it takes for the write to reach Cosmos DB? I suspect not, since if I issue a request far away from my configured Azure regions, I don't see how it can take < 10 ms.
The SLA is for the latency involved in performing the operations and returning results. As you mention, it does not include time taken to reach the Cosmos endpoint, which depends on the client's distance.
As indicated in the performance guidance: "You can get the lowest possible latency by ensuring that the calling application is located within the same Azure region as the provisioned Azure Cosmos DB endpoint."
In my experience, latency of <10 ms is typical for an app located in the same region as the Cosmos endpoint it works against.
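
A quick way to see the distinction in practice is to time a point read from the client side; the wall-clock figure below includes the network round trip, so it will only approach the ~10 ms server-side number when the app runs in the same region as the account (account URI, key, and item identifiers are placeholders):

    import time
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("mydb").get_container_client("mycontainer")

    start = time.perf_counter()
    container.read_item(item="<item-id>", partition_key="<pk-value>")
    elapsed_ms = (time.perf_counter() - start) * 1000

    # elapsed_ms = network round trip + Cosmos DB server-side processing;
    # only the server-side part is what the <10 ms P99 figure refers to.
    print(f"client-observed read latency: {elapsed_ms:.1f} ms")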

Impala Resource Management

Admission control is embedded within each impalad daemon and communicates through the statestore service. The impalad daemon determines if a query runs immediately or if the query is queued.
However, if a sudden flow of requests causes more queries to run concurrently than expected, the overall Impala memory limit and the Linux cgroups mechanism at the cluster level serve as hard limits to prevent over-allocation of memory. When queries hit these limits, Impala cancels them.
Does this mean Impala resource limits are enforced at the individual Impala daemon level or at the cluster level?
The answer is both. Each impalad daemon has its own MEM_LIMIT; exceeding it causes the query to be canceled. The admission control pool works at the cluster level, but the gatekeeping (deciding whether a query runs or is queued) happens at each impalad, which makes the admission decision based on the pool's cluster-level resources. That's why, when a flood of queries is sent to different impalad instances, the daemons might admit more queries than they should, because they cannot get the most current cluster resource usage information at that time. The cgroup limit does not cause a query to be canceled; it determines the share of CPU the impalad gets when there is CPU contention.
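
As a small illustration of the per-daemon limit, MEM_LIMIT can be set as a query option for a session. A sketch using the impyla client (host, port, and table are placeholders, and whether the sample query actually hits the limit depends on your data):

    from impala.dbapi import connect

    # Connect to one impalad coordinator (placeholder host/port).
    conn = connect(host="impalad-host", port=21050)
    cur = conn.cursor()

    # MEM_LIMIT is enforced per impalad for this session's queries;
    # a query that needs more than this on any node is cancelled.
    cur.execute("SET MEM_LIMIT=2gb")
    cur.execute("SELECT COUNT(*) FROM big_table")
    print(cur.fetchall())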

DynamoDB: High SuccessfulRequestLatency

We had a period of latency in our application that was directly correlated with latency in DynamoDB and we are trying to figure out what caused that latency.
During that time, the consumed reads and consumed writes for the table were normal (much below the provisioned capacity) and the number of throttled requests was also 0 or 1. The only thing that increased was the SuccessfulRequestLatency.
The high latency occurred during a period when we were doing a lot of automatic writes. In our use case, writing to DynamoDB also includes some reading (to get any existing records). However, we often write the same quantity of data in the same period of time without causing any increased latency.
Is there any way to understand what contributes to an increase in SuccessfulRequestLatency when it seems that we have provisioned enough read capacity? Is there any way to diagnose the latency caused by this set of writes to DynamoDB?
You can dig deeper by checking the Get Latency and Put Latency in CloudWatch.
As you have already mentioned, there was no throttling, your writes involve some reading as well, and your writes at other periods of time don't cause any latency, so you should check what exactly in the read operation is causing this.
Check the SuccessfulRequestLatency metric while including the Operation dimension as well. Start with GetItem and BatchGetItem. If that doesn't help, include Scan and Query as well.
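
A sketch of pulling that per-operation breakdown with boto3 (table name and time window are placeholders):

    from datetime import datetime, timedelta, timezone

    import boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=3)  # cover the window of elevated latency

    for operation in ["GetItem", "BatchGetItem", "Query", "Scan", "PutItem"]:
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/DynamoDB",
            MetricName="SuccessfulRequestLatency",
            Dimensions=[
                {"Name": "TableName", "Value": "my-table"},
                {"Name": "Operation", "Value": operation},
            ],
            StartTime=start,
            EndTime=end,
            Period=300,  # 5-minute buckets
            Statistics=["Average", "Maximum"],
        )
        for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
            print(operation, point["Timestamp"], point["Average"], point["Maximum"])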
High request latency can sometimes happen when DynamoDB is doing an internal failover of one of its storage nodes.
Internally within Dynamo each storage partition has to be replicated across multiple nodes to provide a high level of fault tolerance. Occasionally one of those nodes will fail and a replacement node has to be introduced, and this can result in elevated latency for a subset of affected requests.
The advice I've had from AWS is to use a short timeout and a fast retry (e.g. 100ms) if your use-case is latency-sensitive. It's my understanding that only requests that hit the affected node experience increased latency, so within one or two retries you'll hit a different node and get a successful response, with minimal impact on your overall latency. Obviously it's hard to verify this, because it's not a scenario you can reproduce!
If you've got a support contract with AWS, it's well worth submitting a support ticket from the AWS console when events like this happen. They are usually able to provide an insight into what actually happened.
Note: If you're doing retries, remember to use exponential backoff to reduce the risk of throttling.
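
If you go the short-timeout/fast-retry route with boto3, the relevant knobs look roughly like this (the 100 ms figure just mirrors the suggestion above and the retry count is arbitrary; 'standard' retry mode already applies exponential backoff with jitter):

    import boto3
    from botocore.config import Config

    dynamodb = boto3.client(
        "dynamodb",
        config=Config(
            connect_timeout=0.1,  # seconds
            read_timeout=0.1,     # give up quickly on a slow storage node
            retries={"max_attempts": 3, "mode": "standard"},
        ),
    )

    response = dynamodb.get_item(
        TableName="my-table",
        Key={"pk": {"S": "some-key"}},
    )
    print(response.get("Item"))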
