Can you control write speed to DynamoDB using Boto3? - amazon-dynamodb

I need to write to DynamoDB which is Autoscaling enabled, my goal is:
best utilize provisioned capacity according to its changing capacity (by Autoscaling) without or with little "throttled writes"
We are using batch_writer() at the moment, the problem with it is there's no response like what BatchWriteItem does, so you can adjust capacity based on response. But BatchWriteItem has its own problem -- it has a limit of 25 items per request, even I have many threads, it's probably not quick enough for my needs, I need about 10000 WCU / second at maximum.
What would you suggest?

Related

How to handle dynamodb WCU limitation ? is it queued when you go above the limit?

I need to feed the db with something like 10k items, I don't need to rush and can/want stay below the 25wcu free plan.
It will take something like 6.5minutes (10000requests/25requests/sec).
Here is the question. I will loop on a json to feed the base do I have to handle the number of request by second myself or can I push to the max and it will be queued ?
I read that I may just have an error message (400?) when I exceed the limit can I just brutally retry the failed requests (eventually making more fail) until my 10k items are put in the db ?
tldr; => best way/strategy to feed the base knowing there is a limit of calls/sec
ps: it's run from a lambda idk if it matters.
Your calculation is a little bit off unless you do this constantly, because AWS actually allows you to burst a little bit (docs):
DynamoDB provides some flexibility in your per-partition throughput
provisioning by providing burst capacity. Whenever you're not fully
using a partition's throughput, DynamoDB reserves a portion of that
unused capacity for later bursts of throughput to handle usage spikes.
DynamoDB currently retains up to 5 minutes (300 seconds) of unused
read and write capacity. During an occasional burst of read or write
activity, these extra capacity units can be consumed quickly—even
faster than the per-second provisioned throughput capacity that you've
defined for your table.
Since 300 (seconds) * 25 (WCU) = 7500 this leaves you with about 7.5k items until it will actually throttle.
Afterwards just responding to the ProvisionedThroughputExceeded error by retrying later is fine - but make sure to add a small delay between retries (e.g. 1 second) as you know that it takes time for the new WCU to flow into the tocken bucket. Immediately retrying and hammering the API is not a kind way to use the service and might look like a DoS attack.
You can also write the items in Batches to reduce the amount of network requests as network throughput is also limited in Lambda. This makes handling the ProvisionedThroughputExceeded error slightly more cumbersome, because you need to inspect the response which items failed to write, but will probably be a net positive.

Dynamo - Increased Read Latency During Writes

There is a DynamoDB table Entity which has a hash key on id and GSI on another attribute: cardId. The GSI only has range key and does not have any sort key.
Whenever, we get a batch of create/update requests, we first use the GSI to read existing data and then write the main table, which also updates the GSI table eventually. During this time, we may also serve some parallel read requests from the GSI.
We are seeing an issue where the latency of both main table and GSI table increases from 200ms to 10-15 seconds during this time (batch writes + reads). I am not able to establish a co-relation between consecutive reads and writes in the table. The table is set to use on-demand capacity and there is no throttling. "SuccessfulRequestLatency" is ~300-400 ms only.
It is the DDB client method that has latency in seconds. It does not do any data transformation, just return the DB data as is to upper layers. Anything else that I should be monitoring to get to the root cause for this?
Thanks!
I don't have a full answer, but do have some directions you might want to investigate.
First, I noticed in the past is that extremely long latencies may indicate that your client gave up and retried the request. Some clients hide this retry, and it just looks like a very slow request from the outside.
Second, you're right that on-demand billing mode doesn't throttle based on provisioned throughput, but it nevertheless can do throttling - see
https://aws.amazon.com/premiumsupport/knowledge-center/on-demand-table-throttling-dynamodb/. By default there are limits on the throughput that an on-demand table can have, as well as how quickly the throughput may grow. These limits are at least partially for your protection - you wouldn't want a run-away-train application to accidentally do billions of requests and cost you a million dollars :-)

Does DynamoDB latency depend on number of items per partition

Newbie to DDB here. I've been using a DDB table for a year now. Recently, I made improvements by compressing the payload using gzip (and representing it as a binary in DDB) and storing the new data in another newly created beta table. Overall compression was 3x. I expected the read latency(GetItem) to improve as well as it's less data to be transported over the wire. However, I'm seeing that the read latency has increased from ~ 50ms p99.9 to ~114 ms p99.9. I'm not sure how that happened and was wondering if because of the compression, now I have a lot of rows per partition (which I think is defined as <= 10 GB). I now have 3-4x more rows per partition. So, I'm wondering that once dynamoDb determines the right partition for a partition key, then within the partition how does it find the correct item? Gut feel is that this shouldn't lead to an increase in latency as a simplified representation of the partition can be a giant hashmap so it'd just be a simple lookup. I'd appreciate any help here.
My DDB schema:
partition-key - user-id,dataset-name
range-key - update-timestamp
payload - used to be string, now is compressed/binary.
In my GetItem requests, I specify both partition key and range key.
According to your description, your change included two unrelated parts: You compressed the payload, and increased the number of items per partition. The first change - the compression - probably has little effect on the p99 latency (it could have a more noticable effect on the mean latency - which, according to Little's Law is related to throughput, if your client has fixed concurrency - but I'd expect it to lower, not increase).
Some guesses as to what might have increased the p99 latency:
More items per partition means that DynamoDB (which uses a B-tree) needs to do more disk reads to find a specific item. Since each disk access has rare delays caused by queueing, this adds to the tail latency.
You said that the change caused each partition to hold more items, I guess this means you now have fewer partitions. If you have too few of them, you can start getting unbalanced load on the different DynamoDB partitions, and more contention and latency for specific "hot" partitions.
I don't know how you measure your latency. Your client now needs (I guess) to uncompress the returned result, maybe it is now busier, adding queening delays in the client? Can you lower your client's concurrency (how many client threads run in parallel) and see if the high tail latency is an artifact of the server design, or the client's design?

Amazon DynamoDB throttling

I've enabled auto-scaling for our dynamo-db table. It has a target utilization at 30% but it keeps throttling.
See this example screenshot where throttling is happening
As you can see it's exactly scaling up as you want it too. But I don't understand why it's still throttling. Its almost always below the provisioned throughput.
Can anyone explain what's going wrong and why it's still throttling?
Thanks,
Hendrik
Very hard to tell from the graph, and limited information.
Some thoughts:
AutoScaling can take 5 - 10 minutes to kick in. This is not fast enough if there is a sudden increase in usage. Perhaps you are seeing throttling in that 5 - 10 minute window before it scales up.
If you set CloudWatch Metrics to 1 min interval, you might see whats going on in a big more detail.
As mkobit mentioned, you might be hitting throughput on your Partition, depending on how your data is structured.
Your capacity units are evenly distributed between your partitions. So you may hit the capacity limit on your partition you have and which records you are trying to access, but not your table throughput.
This also depends on the amount of data you have stored, number of partitions etc.
HTH

AWS DynamoDB: read/write units estimation issue

I am creating an online crowd driven game. I expect the read/write requests to fluctuate (like, 50,50,50,1500,50,50,50)every second and I need to process all 100% requests with strong consistency.
I am planning to go with AWS's DynamoDB from GAE datastore for its strong consistency. I have the below doubts which I could not get clear answers in other discussions.
1. If the item size for a write action is just 4B, Will that be rounded to a 1KB and consume a write unit?
2. Financially it is not wise to set the Provisioned Throughput Capacity around the expected peak value. Alarms can warn us. But in the case of sudden rise, the requests could be throttled at the time we receive alarm. Is DynamoDB really designed to handle highly fluctuating read/write?
3. I read about Dynamc DynamoDB to update the read/write throughput capacity for us, When we add some read/write units, How long it will take to allocate them? If it takes too long, Whats the use of increasing the bar after the tide hits?
Google app engine bills just for the number of requests happen in that month. If I can make AWS work like, "Whatever the request count could be, I will expand and contract myself and charge you only for the used read/write units", I will go for AWS.
Please advise. Dont hesitate if I am not being clear at parts.
Thanks,
Karthick.
Yes. Item sizes are rounded up and the throughput is used. From the Provisioned Throughput in Amazon DynamoDB documentation:
The total number of read operations necessary is the item size, rounded up to the next multiple of 4 KB, divided by 4 KB.
It can handle some bursting, but it is generally intended to be used for uniform workloads. Here is a section from the Guidelines for Working with Tables documentation and some other helpful links about the best practices:
A temporary non-uniformity in a workload can generally be absorbed by the bursting allowance, as described in Use Burst Capacity Sparingly. However, if your application must accommodate non-uniform workloads on a regular basis, you should design your table with DynamoDB's partitioning behavior in mind (see Understand Partition Behavior), and be mindful when increasing and decreasing provisioned throughput on that table.
Query and Scan guidelines for avoiding bursts of read activity
The Table Best Practices section
Use Burst Capacity Sparingly
This one is going to depend on how much data your table has, because DynamoDB will have to repartition the data if you are scaling up. See the Consider Workload Uniformity When Adjusting Provisioned Throughput documentation for more information about the partitioning..

Resources