DynamoDB Update/Put throttled despite high provisioned capacity

I am seeing some throttles on updates to my DynamoDB table. I know that throttling works on a per-second basis, and that peaks above provisioned capacity can sometimes be absorbed, but this is not guaranteed. I also know that one is supposed to distribute the load evenly, which I have not done.
BUT please look at the attached 1-minute average graphs from the metrics: the utilized capacity is well below the provisioned capacity. Where are these throttles coming from? Is it because all writes went to a particular shard?
There are no batch writes. The workload distribution is something I cannot easily control.

DynamoDB is built on the assumption that to get the full potential out of your provisioned throughput your reads and writes must be uniformly distributed over space (hash/range keys) and time (not all coming in at the exact same second).
Based on the allocated throughput on your graphs you are still most likely at one shard, but it is possible that there are two or more shards if you have previously raised the throughput above the current level and then lowered it back down to where it is now. While this is something to be mindful of, it is likely not what is causing this throttling behavior directly. If you have a lot of data in your table, more than 10 GB, then you will definitely have multiple shards. That would mean you likely have a lot of cold data in your table, which could be causing this issue, but that seems less likely.
The most likely issue is that you have some hot keys. Specifically, you have one or just a few records that are receiving a very high number of read or write requests, and this is resulting in throttling. Essentially, DynamoDB can support massive IOPS for both writes and reads, but you can't apply all of those IOPS to just a few records; ideally they should be distributed uniformly across all of the records.
Since the number of throttles you are showing is on the order of tens to hundreds, it may not be something to worry about. As long as you are using the official AWS SDK, it will automatically retry throttled requests with exponential backoff several times before giving up completely.
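For example, with boto3 (the official AWS SDK for Python) you can tune that built-in retry behaviour explicitly; this is only a minimal sketch, and the table, key, and attribute names are placeholders rather than anything from your setup:

```python
# Minimal sketch: tuning boto3's built-in retry/backoff behaviour.
import boto3
from botocore.config import Config

config = Config(
    retries={
        "max_attempts": 10,   # how many times throttled requests are retried
        "mode": "adaptive",   # client-side rate limiting plus exponential backoff
    }
)

dynamodb = boto3.resource("dynamodb", config=config)
table = dynamodb.Table("my-table")  # hypothetical table name

# A throttled UpdateItem is retried with backoff before an error ever surfaces.
table.update_item(
    Key={"pk": "user#123"},
    UpdateExpression="SET #c = if_not_exists(#c, :zero) + :inc",
    ExpressionAttributeNames={"#c": "counter"},
    ExpressionAttributeValues={":zero": 0, ":inc": 1},
)
```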
While it is difficult in many circumstances to control the distribution of reads and writes to a table, it may be worth taking another look at your hash/range key design to make sure it is really optimal for your pattern of reads and writes. Also, for reads you can employ caching through Memcached or Redis; even a cache that expires entries after just a few minutes or seconds helps reduce the impact of hot keys. For writes you would need to look at the logic in the application to make sure there are no unnecessary writes being performed that could be causing this issue.
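As a rough illustration of the caching idea, here is a minimal read-through sketch using redis-py in front of DynamoDB; the key scheme, TTL, and table name are assumptions for the example, not taken from the question:

```python
# Minimal sketch of a read-through cache for hot keys: Redis first, DynamoDB on miss.
import json
import boto3
import redis

cache = redis.Redis(host="localhost", port=6379)
table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

def get_item_cached(pk: str, ttl_seconds: int = 60):
    """Serve hot keys from Redis; only cache misses hit DynamoDB."""
    cached = cache.get(f"item:{pk}")
    if cached is not None:
        return json.loads(cached)
    resp = table.get_item(Key={"pk": pk})
    item = resp.get("Item")
    if item is not None:
        # default=str stringifies Decimal values for simplicity in this sketch
        cache.setex(f"item:{pk}", ttl_seconds, json.dumps(item, default=str))
    return item
```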
One last point related to batch writes: a batch operation in DynamoDB does not reduce the read or write capacity consumed by the individual child requests; it simply reduces the overhead of making multiple HTTP requests. So while batch requests generally help with throughput, they do not reduce the likelihood of throttling.
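To make that concrete, here is a minimal boto3 sketch (hypothetical table and items): batch_writer groups puts into BatchWriteItem calls, which saves round trips but not write capacity:

```python
# Batching saves HTTP requests, not WCUs: each item still consumes its own capacity.
import boto3

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

with table.batch_writer() as batch:
    for i in range(100):
        # 100 small items -> far fewer HTTP requests, but still ~100 writes' worth
        # of capacity consumed, so throttling is no less likely than with PutItem.
        batch.put_item(Item={"pk": f"user#{i}", "value": i})
```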

Related

Does the connection cost associated with writing to Firebase Realtime Database depend on the amount of data being written?

Note: There are similar SO questions related to costs associated with writing to Firebase Realtime Database (such as here), but I have a question that specifically relates to the relationship between write volume and the resulting connection cost.
So according to the Firebase documentation, it looks like the Realtime Database does not directly charge for writing data, although "writes can lead to connection costs on your bill".
But I would like to know: are these connection costs variable, depending on the amount of data being written? Or are they the same no matter the size?
For example, if I update 10 MB of data once every minute, will this result in the same cost as writing only 10 KB of data once every minute?
I'm trying to figure out what degree of "write-efficiency" is worth my time.
The documentation you linked suggests an answer. Under "Protocol overhead" it states:
Each time a connection is established, this overhead, combined with any SSL encryption overhead, contributes to the connection costs. Although this isn't a lot of bandwidth for a single request, it can be a substantial part of your bill if your payloads are tiny or you make frequent, short connections.
This is pretty clearly saying that, due to necessary protocol overhead, frequent connections incur more expense than a single connection. It doesn't say that the connection costs vary with the size of the payload being written, so it stands to reason that the connection cost is fairly static. Think of HTTP headers that are essentially the same for each connection.

Why does Cosmos DB return 429 for a portion of requests despite not exceeding my manually set throughput

My Cosmos DB is using Shared Throughput across several containers. I have manually scaled up my Cosmos DB to 70,000 RU/s and I am currently running a large number of requests.
Looking in Azure I can see that a portion of my requests are being throttled (returning 429).
To give an idea of the numbers, around 25k requests return 200 and around 5k requests return 429.
When I follow the warning in the Azure portal that says my collection is exceeding provisioned throughput, it shows the average throughput is 6.78k RU/s.
I don't understand why when I have 70,000 RU/s that my requests are being throttled when the average throughput is supposedly only 6,780 RU/s.
No other containers are being read or written to, all these requests are made against just one container.
As all these requests run a stored procedure, they all have a partition key supplied.
The most likely reason is you have a hot partition that is reaching its allocated throughput before the other partitions are.
For a horizontally scalable database, throughput is allocated across physical partitions (computers) and data is partitioned using a partition key that basically acts as an address to route it to a specific computer to be stored.
Assume I have a collection with three partitions 1, 2, 3 and 30K RU/s. Each one of those will get 10K RU/s allocated to it. If I then run an operation that does a ton of work on partition 2 and consumes all of its 10K, I'm going to get rate limited (429) even if I don't touch partition 1 or 3.
To avoid this you need to pick a partition key that BOTH distributes data as evenly as possible during writes and ideally can also be used to answer queries within one or a small (bounded) number of partitions, avoiding "fan out" queries that have to hit every partition.
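As a minimal sketch of where that choice is made, using the azure-cosmos Python SDK (the /userId path, account, database, and container names are purely illustrative): the partition key is fixed at container creation, so it is worth picking a high-cardinality value up front.

```python
# Minimal sketch: choosing a partition key at container creation time.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://my-account.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("my-database")

container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/userId"),  # high-cardinality key spreads writes
)

# Point operations carry the same key, so they route to a single logical partition.
container.upsert_item({"id": "order-1", "userId": "user-42", "total": 10})
```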
Now for small collections that only reside on a single physical partition none of this matters because your data is all on a single physical partition. However, as the collection grows larger this causes issues which will prevent the database from scaling fully.
You can learn more here

How to handle scaling when requests per minute go from 500 to 5000 instantly

I have an application that spikes from 500 rpm to 5000 and stays there for 20-30 minutes. I know that's not a ton of requests, but it's the magnitude of the jump that is killing me. AWS EC2 takes 5 minutes to scale up, so that's not helpful when things move this fast. Maybe multiple DBs that handle different pieces of the application?
How would you go about analyzing this and thinking about infrastructure if you will always go from 500 to 5,000 RPM or higher in one minute?
This is the graph from my AWS logs:
If you can predict that demand will increase at some point, you can automate provisioning of new instances. If you can't determine this, then you need to do proper capacity planning: for instance, how many servers/containers do you need running to sustain the load with an acceptable user experience? Determining this will be key.
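For example, if the spike arrives at a predictable time of day, a scheduled scaling action can pre-warm capacity before it hits; a minimal boto3 sketch, where the Auto Scaling group name, schedule, and sizes are hypothetical:

```python
# Minimal sketch: pre-warm an EC2 Auto Scaling group ahead of a known daily spike.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",            # hypothetical group name
    ScheduledActionName="pre-warm-before-spike",
    Recurrence="50 8 * * MON-FRI",             # cron (UTC): 10 minutes before a 9:00 spike
    MinSize=4,
    MaxSize=20,
    DesiredCapacity=10,
)
```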
You should also look at implementing asynchronous messaging patterns that absorb the spike, although this may come with some performance degradation.
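As a rough sketch of that pattern: the request handler enqueues work to SQS and a smaller worker pool drains the queue at a rate the backend can sustain. The queue URL and payload shape below are placeholders.

```python
# Minimal sketch: buffer a traffic spike behind an SQS queue.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"  # placeholder

def handle_request(payload: dict) -> None:
    # Acknowledge the user immediately; the heavy lifting happens asynchronously
    # in workers that poll the queue at a sustainable rate.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(payload))
```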
One additional consideration would be moving to a serverless architecture like AWS Lambda. This likely wouldn't fully solve the problem, but it would give you more ability to quickly provision infrastructure on demand.

DynamoDB: High SuccessfulRequestLatency

We had a period of latency in our application that was directly correlated with latency in DynamoDB and we are trying to figure out what caused that latency.
During that time, the consumed reads and consumed writes for the table were normal (much below the provisioned capacity) and the number of throttled requests was also 0 or 1. The only thing that increased was the SuccessfulRequestLatency.
The high latency occurred during a period where we were doing a lot of automatic writes. In our use case, writing to dynamo also includes some reading (to get any existing records). However, we often write the same quantity of data in the same period of time without causing any increased latency.
Is there any way to understand what contributes to an increase in SuccessfulRequestLatency when it seems that we have provisioned enough read capacity? Is there any way to diagnose the latency caused by this set of writes to DynamoDB?
You can dig deeper by checking the Get Latency and Put Latency in CloudWatch.
As you have already mentioned, there was no throttling, your writes involve some reading as well, and the same writes at other periods of time don't cause any latency, so you should check what exactly in the read operations is causing this.
Check the SuccessfulRequestLatency metric while including the Operation dimension as well. Start with GetItem and BatchGetItem. If that doesn't help, include Scan and Query as well.
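A minimal boto3 sketch of pulling that metric broken down by operation (the table name is a placeholder):

```python
# Minimal sketch: SuccessfulRequestLatency per DynamoDB operation over the last hour.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

for operation in ["GetItem", "BatchGetItem", "Query", "Scan"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/DynamoDB",
        MetricName="SuccessfulRequestLatency",
        Dimensions=[
            {"Name": "TableName", "Value": "my-table"},   # placeholder table name
            {"Name": "Operation", "Value": operation},
        ],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=60,
        Statistics=["Average", "Maximum"],
    )
    print(operation, sorted(stats["Datapoints"], key=lambda d: d["Timestamp"]))
```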
High request latency can sometimes happen when DynamoDB is doing an internal failover of one of its storage nodes.
Internally within Dynamo each storage partition has to be replicated across multiple nodes to provide a high level of fault tolerance. Occasionally one of those nodes will fail and a replacement node has to be introduced, and this can result in elevated latency for a subset of affected requests.
The advice I've had from AWS is to use a short timeout and a fast retry (e.g. 100ms) if your use-case is latency-sensitive. It's my understanding that only requests that hit the affected node experience increased latency, so within one or two retries you'll hit a different node and get a successful response, with minimal impact on your overall latency. Obviously it's hard to verify this, because it's not a scenario you can reproduce!
If you've got a support contract with AWS, it's well worth submitting a support ticket from the AWS console when events like this happen. They are usually able to provide an insight into what actually happened.
Note: If you're doing retries, remember to use exponential backoff to reduce the risk of throttling.
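As a minimal sketch of that short-timeout, fast-retry advice using boto3 (the timeout values are illustrative and should be tuned to your latency budget; the standard retry mode already applies exponential backoff):

```python
# Minimal sketch: fail fast on a slow storage node and retry instead of waiting.
import boto3
from botocore.config import Config

config = Config(
    connect_timeout=0.25,                              # seconds
    read_timeout=0.25,                                 # give up quickly on a slow node
    retries={"max_attempts": 3, "mode": "standard"},   # retries back off exponentially
)

table = boto3.resource("dynamodb", config=config).Table("my-table")  # placeholder name
item = table.get_item(Key={"pk": "user#123"}).get("Item")
```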

Client and Cache configuration for Oracle coherence

I have a specific scenario in which we want to use Coherence as a distributed cache, which I will describe here.
I have 20+ standalone processes which are going to put data into the cache continuously. Their frequencies differ, though that's not a concern.
And 2 processes which will be reading data from those caches.
I don't need any underlying DB beyond what Coherence provides. Data will be written to the cache and read from the cache.
I have a 4-node cluster at my disposal (cost constraint), the Coherence cluster will be on different boxes (infrastructure constraint), and the populating portion of the cache and the reading part will be on different machines.
The daily peak memory size of the cache will hover around 6 GB max, with a minimum of 2 GB.
The cache will hold daily data only, and I will have separate archiving processes simultaneously archiving it; the point is that, for now, the cache will stay around this size. Let's say I am going to keep the date out of the key equation.
I would still like to explore whether I can store more on those 4 nodes. Right now it's simple serialization; I could explore other binary formats. Or should I definitely do so at this cache size?
My read and write operations are fairly spread out over the day, meaning reads and writes keep happening from those 2 reading clients and 20+ writing clients; it's not as if one of them dominates. Each background process does have a startup batch phase that pushes more to the cache than the continuous pushing afterwards, but the continuous pushing moves a fair amount of data too.
Now, my questions regarding the points above (and some confusion I have):
The biggest one: somebody told me that I can have only a limited number of connections depending on the nodes we have bought, so if it's 4 nodes, you should ideally have at most 4 connections and should therefore develop a gatekeeper kind of application, even if we use TCP Extend. From my reading so far, I don't think that's true. Is it? The point is, I don't want to go that way if it really isn't a constraint.
In other words, is there a limit on connections through the Proxy Service depending on the number of nodes in the cluster?
Somewhat related to the above: at most, I am going to get some performance penalty when pushing to the cache if I go the Extend route, right?
Partitioned cache vs. near cache, as both read latency and having the most up-to-date data are extremely critical (the most important question I have).
I really want to see the benefit that can be obtained from going to POF instead of, say, Java serialization/Externalizable/protobuf. Can Coherence support protobuf out of the box? (Maybe for later on.)
There's no technical limitation to the number of connections a Coherence Extend proxy can support except normal network and hardware resource constraints. You will have to ask an Oracle sales person if there are licensing limitations.
There is some performance impact from using a proxy because you are adding an additional network hop (client to proxy to cluster). If you use POF serialization then the proxy does not have to serialize/deserialize values. It can just pass the object through in its serialized form. In most applications the performance impact of using a proxy is tiny because Coherence is highly optimized for network speed. You are not required to use a proxy unless your clients are .NET or C++, but there are advantages of isolating client performance from impacting the cache.
A near cache will improve retrieval performance dramatically if a client has a number of frequently retrieved items, since they will be found in-process.
POF offers performance improvements based on faster serialization/deserialization and more compact storage. It is always best to try with test data based on your real production data and measure the difference yourself. Coherence does not support protobuf out of the box.
