Does DynamoDB latency depend on the number of items per partition?

Newbie to DDB here. I've been using a DDB table for a year now. Recently I made an improvement: I compressed the payload with gzip (representing it as a Binary attribute in DDB) and stored the new data in another, newly created beta table. Overall compression was 3x. I expected the read latency (GetItem) to improve as well, since less data is transported over the wire. However, I'm seeing that the read latency has increased from ~50 ms p99.9 to ~114 ms p99.9. I'm not sure how that happened, and I wonder whether it's because the compression left me with many more rows per partition (a partition, I believe, is capped at 10 GB). I now have 3-4x more rows per partition. So I'm wondering: once DynamoDB determines the right partition for a partition key, how does it find the correct item within that partition? My gut feeling is that this shouldn't increase latency - a simplified representation of a partition could be a giant hashmap, so it would just be a simple lookup. I'd appreciate any help here.
My DDB schema:
partition-key - user-id,dataset-name
range-key - update-timestamp
payload - used to be string, now is compressed/binary.
In my GetItem requests, I specify both partition key and range key.
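For concreteness, here is a minimal sketch of the read path being described, assuming boto3 and hypothetical names for the table ("beta-table") and attributes ("pk", "sk", "payload"); the composite partition key is assumed to be stored as a single comma-joined string:

    import gzip
    import boto3

    client = boto3.client("dynamodb")

    def get_payload(user_id, dataset_name, update_timestamp):
        resp = client.get_item(
            TableName="beta-table",  # hypothetical table name
            Key={
                "pk": {"S": f"{user_id},{dataset_name}"},  # partition key (assumed encoding)
                "sk": {"S": update_timestamp},             # range key
            },
        )
        # The payload is a DynamoDB Binary (B) attribute holding gzip-compressed
        # bytes; decompress it back into the original string.
        return gzip.decompress(resp["Item"]["payload"]["B"]).decode("utf-8")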

According to your description, your change included two unrelated parts: you compressed the payload, and you increased the number of items per partition. The first change - the compression - probably has little effect on the p99 latency. It could have a more noticeable effect on the mean latency (which, by Little's Law, is related to throughput if your client has fixed concurrency), but I'd expect it to lower the latency, not increase it.
Some guesses as to what might have increased the p99 latency:
More items per partition means that DynamoDB (which uses a B-tree) needs to do more disk reads to find a specific item. Since each disk access occasionally suffers queueing delays, this adds to the tail latency.
You said the change caused each partition to hold more items; I guess this means you now have fewer partitions. If you have too few of them, you can start getting unbalanced load across the different DynamoDB partitions, and more contention and latency on specific "hot" partitions.
I don't know how you measure your latency. Your client now needs (I guess) to uncompress the returned result, so maybe it is busier, adding queueing delays inside the client. Can you lower your client's concurrency (how many client threads run in parallel) and see whether the high tail latency is an artifact of the server's design or of the client's? The sketch below shows one way to separate the network time from the decompression time.
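A sketch of one way to check that last guess, assuming the boto3 client and attribute names from the sketch in the question above: time the network call and the client-side decompression separately, so CPU or queueing cost in the client isn't mistaken for server latency.

    import gzip
    import time

    def timed_get(client, table, key):
        t0 = time.perf_counter()
        resp = client.get_item(TableName=table, Key=key)  # network + server time
        t1 = time.perf_counter()
        body = gzip.decompress(resp["Item"]["payload"]["B"])  # client CPU time
        t2 = time.perf_counter()
        return body, t1 - t0, t2 - t1

If the second timing dominates at the tail, the extra latency lives in the client, not in DynamoDB.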

Related

Storage cost / supportability / performance tradeoffs using compact attributes in DynamoDB

I'm working on a large-scale component that generates unique/opaque tokens representing business entities. Over time there will be many billions of these records, but for the first year we're not expecting growth to exceed 2 billion individual items (probably less than 500 million).
The system itself is horizontally scaled but needs token generation to be idempotent; data integrity is maintained by using a contained but reasonably complex combination of transactional writes with embedded condition expressions AND standalone condition check write items.
The tokens themselves are UUIDs and, to be efficient, are persisted as Binary attribute values (16 bytes) rather than the string representation (36 bytes). The downside, however, is that the data doesn't visualise nicely in query consoles, making support hard if we encounter any bugs and/or broken data. Note there is no extra code complexity, since we implement the attributevalue.Marshaler interface to bind the language's UUID type to DynamoDB Binary attributes, and similarly do the same for any composite attributes.
My question relates (mostly) to data size/saving, since the tokens are the partition keys and some mapping columns are [token] -> [other token composite attributes] - for example, two UUIDs concatenated together into 32 bytes.
I wanted to keep really tight control over storage costs knowing that, over time, we will be spending ~$0.25/GB per month for this. My question is really three parts:
Are the PK/SK index sizes 'reserved' (i.e. padded), so that it would make no difference at all to storage cost if we compress the overall field sizes down to the minimum possible? (I read somewhere that 100 bytes is typically reserved.)
If they ARE padded, the cost savings on the data would be reasonably high, because each (tree) index node will be nearly as big as the data being mapped. (I assume a tree index is used once the hashed PK has routed the query to the right server node/disk etc.)
Is there any observable query-time performance benefit to compacting 36 bytes into 16 (beyond saving a few bytes across the network)? I.e. if DynamoDB has to read fewer pages it'll work faster, but in practice are we talking microseconds at best?
This is a secondary concern, but worth considering if there is a lot of concurrent access to the data. UUIDs will distribute well across partitions, but inevitably some partitions will be more active than others at times.
Are there any tools that can parse the bytes back into human-readable UUIDs (or that we can customise with injected behaviour to do this)?
This is a concern because making things small and efficient is fine, but supporting and resolving data issues will be difficult without significant tooling investment, and (unsurprisingly) the DynamoDB console, the DynamoDB IntelliJ plugin and AWS NoSQL Workbench all garble the binary into unreadable characters.
No, the PK/SK types are not padded. There's 100 bytes of overhead per item stored.
Sending less data certainly won't hurt your performance, but don't expect a noticeable improvement. If shorter values can keep your items at 1,024 bytes instead of 1,025 bytes, then you save yourself a write unit during the save.
For the "garbled" binary values, I assume you're looking at the base64-encoded form - a standard binary-to-text encoding that can be reversed with lots of tooling (now that you know its name); a small sketch follows.
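A minimal example of such a tool, using only the Python standard library; it turns a base64-encoded Binary value back into readable UUIDs (two of them for the 32-byte composites):

    import base64
    import uuid

    def decode_tokens(b64_value):
        raw = base64.b64decode(b64_value)
        # Each UUID occupies 16 bytes; a 32-byte composite holds two of them.
        return [str(uuid.UUID(bytes=raw[i:i + 16])) for i in range(0, len(raw), 16)]

    # Round-trip with a stand-in token:
    token = uuid.uuid4().bytes
    print(decode_tokens(base64.b64encode(token)))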

Dynamo - Increased Read Latency During Writes

There is a DynamoDB table Entity which has a hash key on id and a GSI on another attribute: cardId. The GSI only has a hash key and does not have a sort key.
Whenever we get a batch of create/update requests, we first use the GSI to read the existing data and then write to the main table, which in turn eventually updates the GSI. During this time, we may also serve some parallel read requests from the GSI.
We are seeing an issue where the latency of both the main table and the GSI increases from 200 ms to 10-15 seconds during this time (batch writes + reads). I am not able to establish a correlation between consecutive reads and writes in the table. The table is set to use on-demand capacity and there is no throttling; "SuccessfulRequestLatency" is only ~300-400 ms.
It is the DDB client method that has latency in the seconds. It does not do any data transformation; it just returns the DB data as-is to the upper layers. Anything else I should be monitoring to get to the root cause?
Thanks!
I don't have a full answer, but do have some directions you might want to investigate.
First, something I've noticed in the past is that extremely long latencies may indicate that your client gave up and retried the request. Some clients hide this retry, and from the outside it just looks like one very slow request.
Second, you're right that on-demand billing mode doesn't throttle based on provisioned throughput, but it nevertheless can do throttling - see https://aws.amazon.com/premiumsupport/knowledge-center/on-demand-table-throttling-dynamodb/. By default there are limits on the throughput an on-demand table can sustain, as well as on how quickly that throughput may grow. These limits are at least partly for your protection - you wouldn't want a runaway application to accidentally make billions of requests and cost you a million dollars :-)
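On the first point, a sketch of how to make hidden retries visible and bounded, assuming a boto3 client (the logging call prints every HTTP attempt, so one slow "request" that is really several attempts shows up as multiple log lines):

    import logging
    import boto3
    from botocore.config import Config

    # Log each HTTP attempt; a retried request appears as several attempts.
    boto3.set_stream_logger("botocore", logging.DEBUG)

    client = boto3.client(
        "dynamodb",
        config=Config(
            retries={"max_attempts": 1, "mode": "standard"},  # no retries while testing
            connect_timeout=2,  # seconds; fail fast instead of stalling
            read_timeout=2,
        ),
    )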

AWS DynamoDB: read/write units estimation issue

I am creating an online crowd-driven game. I expect the read/write requests to fluctuate (like 50, 50, 50, 1500, 50, 50, 50) every second, and I need to process 100% of the requests with strong consistency.
I am planning to move from the GAE datastore to AWS's DynamoDB for its strong consistency. I have the doubts below, which I could not get clear answers to in other discussions.
1. If the item size for a write action is just 4 B, will that be rounded up to 1 KB and consume a whole write unit?
2. Financially it is not wise to set the provisioned throughput capacity around the expected peak value. Alarms can warn us, but in the case of a sudden rise, the requests could already be throttled by the time we receive the alarm. Is DynamoDB really designed to handle highly fluctuating read/write loads?
3. I read about Dynamic DynamoDB, which updates the read/write throughput capacity for us. When we add some read/write units, how long does it take to allocate them? If it takes too long, what's the use of raising the bar after the tide hits?
Google App Engine bills just for the number of requests that happen in the month. If I can make AWS work like "whatever the request count may be, I will expand and contract myself and charge you only for the used read/write units", I will go with AWS.
Please advise. Don't hesitate to ask if I am not being clear in parts.
Thanks,
Karthick.
1. Yes. Item sizes are rounded up and the full unit of throughput is consumed: writes round up to the next 1 KB, reads to the next 4 KB. From the Provisioned Throughput in Amazon DynamoDB documentation:
The total number of read operations necessary is the item size, rounded up to the next multiple of 4 KB, divided by 4 KB.
2. It can handle some bursting, but it is generally intended for uniform workloads. Here is a section from the Guidelines for Working with Tables documentation, plus some other helpful links about the best practices:
A temporary non-uniformity in a workload can generally be absorbed by the bursting allowance, as described in Use Burst Capacity Sparingly. However, if your application must accommodate non-uniform workloads on a regular basis, you should design your table with DynamoDB's partitioning behavior in mind (see Understand Partition Behavior), and be mindful when increasing and decreasing provisioned throughput on that table.
Query and Scan guidelines for avoiding bursts of read activity
The Table Best Practices section
Use Burst Capacity Sparingly
3. This one is going to depend on how much data your table has, because DynamoDB will have to repartition the data if you are scaling up. See the Consider Workload Uniformity When Adjusting Provisioned Throughput documentation for more information about the partitioning.
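A worked version of the rounding rules quoted above, as a quick sanity check (writes round up to 1 KB units, strongly consistent reads to 4 KB units; eventually consistent reads cost half):

    import math

    def write_units(item_bytes):
        return math.ceil(item_bytes / 1024)

    def read_units(item_bytes, strongly_consistent=True):
        units = math.ceil(item_bytes / 4096)
        return units if strongly_consistent else units / 2

    print(write_units(4))     # 1 -- a 4-byte item still costs a full write unit
    print(read_units(4))      # 1 -- same for a strongly consistent read
    print(write_units(1025))  # 2 -- one byte over 1 KB doubles the write cost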

Is there an extensible open address hash table?

I'm implementing an in-memory key-value store used as a real-time service. It needs to be fast and low-latency. Because the number of elements is not known in advance, the table should grow gradually. I prefer open-address hash tables since they are significantly faster than chaining ones. However, open-address hash tables typically require occasional very slow rehashes, during which the service is unavailable. This is not acceptable. On the other hand, extensible hash tables are typically based on chaining, and are slower than open-address ones.
Are there any hash tables that are as fast as open-address ones (like Google's dense_hash_map) and do not have a large rehash overhead?
One simple way is to use an array of k small hash tables, so the rehash overhead is reduced to 1/k (sketched below). However, this doesn't make sense in my case, because I need to reduce the total unavailable time rather than the maximum unavailable time: although splitting into k small tables cuts the maximum pause to 1/k, rehashes occur k times more often.
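For reference, a toy sketch of that k-small-tables scheme, using Python dicts as stand-ins for the open-address tables; each key is routed to one of k sub-tables, so any single rehash touches only ~1/k of the data (shorter pauses, but k times as many of them):

    class ShardedMap:
        def __init__(self, k=16):
            self.k = k
            self.shards = [{} for _ in range(k)]  # stand-ins for open-address tables

        def _shard(self, key):
            # Route each key to a fixed sub-table; only that shard ever rehashes it.
            return self.shards[hash(key) % self.k]

        def put(self, key, value):
            self._shard(key)[key] = value

        def get(self, key, default=None):
            return self._shard(key).get(key, default)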

How would a "hot" hash key affect throughput in practice on Amazon DynamoDB?

First, here's a support document for DynamoDB giving guidance on how to avoid a "hot" hash key.
Conceptually, a hot hash key is simple, and hot keys are (typically) straightforward to avoid - the documents give good examples of how to do so. I am not asking what a hot hash key is.
What I do want to know is how much throughput performance would actually degrade, for a given level of provisioned read/write units, at the limit - that is, when all read/write activity is focused on only one (or very few) partition(s). For properly distributed hash-key activity (uniform across partitions), DynamoDB gives single-millisecond response times. So what would response times look like in the worst-case scenario?
Here's a post on AWS asking a related question which gives a specific use-case where knowledge of this answer matters.
DynamoDB will still guarantee you single-millisecond response times, even for your 'hot' hash key, BUT you will very likely see a lot of throttled requests - even when you seem to have plenty of unspent provisioned throughput. That is because your provisioned throughput effectively gets divided by the number of partitions, and since you don't know how many partitions there are at a given time, it varies how much of your provisioned throughput you can spend on a single hash key...
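A rough illustration of that division with made-up numbers (the real partition count is internal and not observable):

    provisioned_reads = 1000  # read units provisioned on the table
    partitions = 10           # hypothetical internal partition count

    per_partition_budget = provisioned_reads / partitions  # 100 units
    hot_key_rate = 400        # reads/sec all hitting one hash key

    # The hot key only sees its partition's slice, so it throttles even
    # though 600 units of the table's budget sit unused elsewhere.
    print(hot_key_rate > per_partition_budget)  # True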
