I've enabled auto scaling for our DynamoDB table. It has a target utilization of 30%, but it keeps throttling.
See this example screenshot where throttling is happening
As you can see, it's scaling up exactly as you would want it to. But I don't understand why it's still throttling. It's almost always below the provisioned throughput.
Can anyone explain what's going wrong and why it's still throttling?
Thanks,
Hendrik
Very hard to tell from the graph, and limited information.
Some thoughts:
AutoScaling can take 5 - 10 minutes to kick in. This is not fast enough if there is a sudden increase in usage. Perhaps you are seeing throttling in that 5 - 10 minute window before it scales up.
If you set CloudWatch Metrics to a 1 minute interval, you might see what's going on in a bit more detail.
As mkobit mentioned, you might be hitting the throughput limit of a single partition, depending on how your data is structured.
Your capacity units are evenly distributed across your partitions. So, depending on how many partitions you have and which records you are trying to access, you may hit the capacity limit of one partition without reaching your table throughput.
This also depends on the amount of data you have stored, number of partitions etc.
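To make the partition point concrete, here is a rough arithmetic sketch. It assumes the even split described above and ignores burst capacity; the function names and the numbers are made up for illustration, and DynamoDB does not expose the partition count directly.

```python
def per_partition_capacity(table_capacity_units: float, partition_count: int) -> float:
    """Capacity each partition gets, assuming an even split across partitions."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    return table_capacity_units / partition_count


def hot_partition_throttles(table_capacity_units: float,
                            partition_count: int,
                            hot_partition_load: float) -> bool:
    """True if the load on one partition exceeds that partition's share."""
    share = per_partition_capacity(table_capacity_units, partition_count)
    return hot_partition_load > share


# Illustrative numbers only: 1000 RCU spread over 4 partitions is 250 RCU each.
# A hot key drawing 400 reads/s is over its partition's share even though the
# table as a whole is only at 40% of its provisioned throughput.
share = per_partition_capacity(1000, 4)
throttled = hot_partition_throttles(1000, 4, 400)
```

So a graph that shows consumed capacity well below provisioned capacity at table level does not rule out throttling on one partition.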
HTH
I need to write to a DynamoDB table that has auto scaling enabled. My goal is to:
make the best use of the provisioned capacity as it changes (through auto scaling), with few or no "throttled writes"
We are using batch_writer() at the moment. The problem with it is that it gives no response the way BatchWriteItem does, so we cannot adjust our write rate based on the response. But BatchWriteItem has its own problem: it is limited to 25 items per request, so even with many threads it is probably not fast enough for my needs. I need about 10,000 WCU at maximum.
What would you suggest?
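For reference, a minimal sketch of the BatchWriteItem approach described above: split the items into batches of 25 and resend whatever comes back in UnprocessedItems, backing off between attempts. The helper names (chunked, write_with_backoff), the retry count, and the delays are assumptions for illustration. The client is expected to be a low-level boto3 DynamoDB client, so items must be in attribute-value format, e.g. {"pk": {"S": "1"}}.

```python
import time
from typing import Any, Callable, Dict, Iterator, List

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def chunked(items: List[Any], size: int = MAX_BATCH) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_with_backoff(client: Any,
                       table_name: str,
                       items: List[Dict[str, Any]],
                       max_retries: int = 5,
                       base_delay: float = 0.05,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Write all items and return how many times a batch had to be resent."""
    resends = 0
    for batch in chunked(items):
        request = {table_name: [{"PutRequest": {"Item": item}} for item in batch]}
        attempt = 0
        while request:
            response = client.batch_write_item(RequestItems=request)
            # Items that were not written (for example due to throttling)
            # are returned here and can be sent again as-is.
            request = response.get("UnprocessedItems") or {}
            if request:
                if attempt >= max_retries:
                    raise RuntimeError("gave up after repeated unprocessed items")
                sleep(base_delay * (2 ** attempt))  # exponential backoff
                attempt += 1
                resends += 1
    return resends


# Hypothetical usage (needs boto3 and AWS credentials, so it is not run here):
#   import boto3
#   write_with_backoff(boto3.client("dynamodb"), "MyTable", items)
```

The number of resends gives a crude signal of how close you are to the current write capacity, which is the feedback batch_writer() hides.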
I'm running a simple API that gets an item from a DynamoDB table on each call. I have auto scaling set to a minimum of 25 and a maximum of 10,000.
However, if I send 15,000 requests with a tool like wrk or hey, I get about 1,000 502s:
DynamoDB's metrics show that reads are throttled
the scaling activities log on the table shows that the RCUs were scaled to 99 but no higher
the Lambda logs show that the function starts to take longer: it usually takes about 20 ms to run, but it starts to run for 500, 1500, 3000 ms and then starts timing out (I'm assuming that's caused by the throttling)
Why isn't the auto scaling working better? It only scales up to 99 RCUs, but my max is 10,000.
We ran into the same problem when testing DynamoDB auto scaling over short periods, and it turns out the problem is that scaling events only happen after 5 minutes of elevated throughput (you can see this by inspecting the CloudWatch alarms that auto scaling sets up).
This excellent blog post helped us solve this by creating a Lambda that responds to the CloudWatch API events and improves the responsiveness of the alarms to one minute: https://hackernoon.com/the-problems-with-dynamodb-auto-scaling-and-how-it-might-be-improved-a92029c8c10b
What did you define as your "target utilization"? From http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AutoScaling.html:
Target utilization is the ratio of consumed capacity units to provisioned capacity units, expressed as a percentage. Application Auto Scaling uses its target tracking algorithm to ensure that the provisioned read capacity of ProductCatalog is adjusted as required so that utilization remains at or near 70 percent.
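As a quick illustration of that ratio, here is a small sketch. It only restates the definition quoted above; it is not the actual target tracking algorithm, and the function names are made up.

```python
import math


def utilization_percent(consumed_units: float, provisioned_units: float) -> float:
    """Consumed capacity as a percentage of provisioned capacity."""
    return 100.0 * consumed_units / provisioned_units


def provisioned_for_target(consumed_units: float, target_percent: float) -> int:
    """Provisioned capacity at which the given consumption sits at the target."""
    return math.ceil(consumed_units * 100 / target_percent)


# With a 70% target, a steady 140 consumed units calls for about 200 provisioned.
# With a 30% target, a steady 60 consumed units also calls for about 200.
at_70 = provisioned_for_target(140, 70)
at_30 = provisioned_for_target(60, 30)
```

A lower target leaves more headroom for spikes but means paying for more unused capacity.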
Also, I think the main reason that auto scaling does not work for you is that your workload might not stay elevated for long enough:
DynamoDB auto scaling modifies provisioned throughput settings only when the actual workload stays elevated (or depressed) for a sustained period of several minutes. The Application Auto Scaling target tracking algorithm seeks to keep the target utilization at or near your chosen value over the long term.
Sudden, short-duration spikes of activity are accommodated by the table's built-in burst capacity. For more information, see Use Burst Capacity Sparingly.
I have an application using DynamoDB and I noticed they just implemented autoscaling, which is awesome. I love the concept and the timing for my app is pretty perfect. However, I am still getting some issues that I wonder if I can tweak settings to remove.
My application gets definite spikes in usage, so I think this is an ideal thing to use. However, with autoscaling on I am still getting some throttling. Here are my read graphs for the last 12 hours:
As you can see, when it spikes the usage is set low, so it throttles for a minute or two until the update kicks in, then works. That's ok I guess and better than not scaling, but I would like it not to throttle at all...
Is there any way to tell DynamoDB to never throttle unless it goes over 100 (or 200 or whatever I set as the top limit)? Just if it gets a surge turn up the throughput for 15 minutes or whatever until the surge is over?
Autoscaling uses CloudWatch alarms. You can see these alarms by going to the CloudWatch dashboard and looking for the alarms that include your table name and "DO NOT EDIT OR DELETE" in the description.
Why am I telling you this?
Well, CloudWatch has a minimum period granularity. Currently it's 1 minute. This means that it will wait at least 1 minute before firing any event to its listeners. Therefore, it will take at least one minute after the load starts until the capacity is increased. In practice it will take even longer, since increasing the capacity also takes time. Bottom line: if you have a very large spike, some requests may be throttled, since the auto scaling will not have taken effect yet and the burst capacity can be exhausted.
The simple, but costly, solution is to increase the initial capacity.
If you know about the upcoming spike in advance (e.g. you have some job running periodically, or customer peaks at certain times), you can use the API to modify autoscaling programmatically.
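One way to do that, sketched under assumptions: Application Auto Scaling supports scheduled actions that raise the minimum capacity shortly before a known peak. The helper below only builds the parameters for the put_scheduled_action call; the table name, action name, capacities, and cron expression are made up for illustration.

```python
from typing import Any, Dict


def build_scheduled_action(table_name: str,
                           action_name: str,
                           schedule: str,
                           min_capacity: int,
                           max_capacity: int,
                           dimension: str = "dynamodb:table:ReadCapacityUnits"
                           ) -> Dict[str, Any]:
    """Build keyword arguments for Application Auto Scaling put_scheduled_action."""
    return {
        "ServiceNamespace": "dynamodb",
        "ScheduledActionName": action_name,
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": dimension,
        "Schedule": schedule,
        "ScalableTargetAction": {
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
        },
    }


# Hypothetical example: raise the read floor to 500 RCU at 08:45 UTC every day,
# ahead of a peak expected around 09:00.
params = build_scheduled_action(
    table_name="MyTable",
    action_name="pre-warm-morning-peak",
    schedule="cron(45 8 * * ? *)",
    min_capacity=500,
    max_capacity=10000,
)

# Applying it needs boto3 and AWS credentials, so it is left as a comment:
#   import boto3
#   boto3.client("application-autoscaling").put_scheduled_action(**params)
```

A second scheduled action can lower the minimum again after the peak, so the higher floor is only paid for while it is needed.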
On Thursday the 21st of April we saw a massive increase in the database bandwidth for all our apps, even though we didn't make any logic changes:
App 1, App 2
These 2 apps are not related, so the increase must come from the Firebase side!
For both, the bandwidth was multiplied by around 4, so this is not something we can overlook (remember that we are billed on the bandwidth limit!).
Is this a global change in Firebase's database bandwidth measuring logic?
We don't mind change, but a x4 increase in a potential billing metric is never nice.
The answer is yes, sort of. We didn't change the measurements, we just found that a bug was causing over half of the packets to be unaccounted for and we fixed that. So the bandwidth graph now more accurately represents your actual usage.
Typically it's best to reach out to firebase-support#google.com for this sort of question as there is no way for non-Firebasers to know the answer. That's also the best place to reach us if you're worried about unfair billing.
I am creating an online crowd-driven game. I expect the read/write requests to fluctuate every second (like 50, 50, 50, 1500, 50, 50, 50), and I need to process 100% of the requests with strong consistency.
I am planning to move from GAE Datastore to AWS's DynamoDB for its strong consistency. I have the doubts below, for which I could not find clear answers in other discussions.
1. If the item size for a write action is just 4 B, will that be rounded up to 1 KB and consume a write unit?
2. Financially, it is not wise to set the provisioned throughput capacity around the expected peak value. Alarms can warn us, but in the case of a sudden rise, requests could already be throttled by the time we receive the alarm. Is DynamoDB really designed to handle highly fluctuating reads/writes?
3. I read about Dynamic DynamoDB, which updates the read/write throughput capacity for us. When we add some read/write units, how long will it take to allocate them? If it takes too long, what's the use of raising the bar after the tide hits?
Google App Engine bills just for the number of requests that happen in a month. If I can make AWS work like, "Whatever the request count is, I will expand and contract myself and charge you only for the read/write units used", I will go for AWS.
Please advise. Don't hesitate to ask if I am not being clear in parts.
Thanks,
Karthick.
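Yes. Item sizes are rounded up and the throughput is consumed accordingly. For writes the unit is 1 KB, so a 4 B item consumes one write unit; reads are rounded the same way in 4 KB units. From the Provisioned Throughput in Amazon DynamoDB documentation: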
The total number of read operations necessary is the item size, rounded up to the next multiple of 4 KB, divided by 4 KB.
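A small sketch of that rounding, for both reads and writes. It assumes standard (non-transactional) requests, where a strongly consistent read costs one unit per 4 KB, an eventually consistent read costs half of that, and a write costs one unit per 1 KB. The function names are made up.

```python
import math


def write_units(item_size_bytes: int) -> int:
    """Write capacity units for one item: size rounded up to the next 1 KB."""
    return max(1, math.ceil(item_size_bytes / 1024))


def read_units(item_size_bytes: int, strongly_consistent: bool = True) -> float:
    """Read capacity units for one item: size rounded up to the next 4 KB.

    Eventually consistent reads cost half as much as strongly consistent ones.
    """
    units = max(1, math.ceil(item_size_bytes / 4096))
    return units if strongly_consistent else units / 2


# The 4-byte item from the question still costs a full write unit.
tiny_write = write_units(4)
# An item just over 4 KB needs two read units when read with strong consistency.
big_read = read_units(4097)
```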
It can handle some bursting, but it is generally intended to be used for uniform workloads. Here is a section from the Guidelines for Working with Tables documentation and some other helpful links about the best practices:
A temporary non-uniformity in a workload can generally be absorbed by the bursting allowance, as described in Use Burst Capacity Sparingly. However, if your application must accommodate non-uniform workloads on a regular basis, you should design your table with DynamoDB's partitioning behavior in mind (see Understand Partition Behavior), and be mindful when increasing and decreasing provisioned throughput on that table.
Query and Scan guidelines for avoiding bursts of read activity
The Table Best Practices section
Use Burst Capacity Sparingly
How long it takes is going to depend on how much data your table has, because DynamoDB will have to repartition the data if you are scaling up. See the Consider Workload Uniformity When Adjusting Provisioned Throughput documentation for more information about the partitioning.