DynamoDB read and write - amazon-dynamodb

What constitutes an actual read in DynamoDB?
Is it every row that is read in a table, or only the data that is returned?
Is this why a scan is so expensive: you read the entire table and are charged for every row that is read?
Can you put ElastiCache (Memcached) in front of DynamoDB to keep the cost down?
Finally, are you charged for a query that yields no results?

See this link: http://aws.amazon.com/dynamodb/faqs/
1 unit of Write Capacity = 1 write per second for an item up to 1 KB in size.
1 unit of Read Capacity = 2 eventually consistent reads per second for an item up to 1 KB in size, or 1 read per second if you require fully consistent results.
For example, if your items are 512 bytes and you need to read 100
items per second from your table, then you need to provision 100 units
of Read Capacity.
If your items are larger than 1KB in size, then you should calculate
the number of units of Read Capacity and Write Capacity that you need.
For example, if your items are 1.5KB and you want to do 100
reads/second, then you would need to provision 100 (read per second) x
2 (1.5KB rounded up to the nearest whole number) = 200 units of Read
Capacity.
Note that the required number of units of Read Capacity is determined
by the number of items being read per second, not the number of API
calls. For example, if you need to read 500 items per second from your
table, and if your items are 1KB or less, then you need 500 units of
Read Capacity. It doesn’t matter if you do 500 individual GetItem
calls or 50 BatchGetItem calls that each return 10 items.
The above applies to the usual operations: GetItem, the batch operations, and Query.
Scan is a little different. The FAQ does not spell out exactly how the usage is calculated, but it does offer the following:
The Scan API will iterate through your entire dataset and apply the
filter conditions to every row. Since only 1MB of data can be scanned
at a time, you may need to do multiple round trips (using a
continuation token) to complete the scan. Further, using this API may
consume much of your provisioned read throughput. Hence, this method
has limited scaling characteristics and we do not recommend that you
use it as a part of your application’s regular behavior.
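So to answer your question directly: for GetItem, the batch operations and Query, the calculation is based on the items read to produce the result. For Scan, the passage above implies that you are charged for the data scanned (every row the filter is applied to), not only for what is returned, which is why a scan is so expensive. A query that yields no results is not free either: it still consumes the minimum charge for a read request.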
You can definitely set up a caching system in front of DynamoDB, and I recommend looking into that if you want to keep your reads down.
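As a rough illustration of the caching idea, here is a minimal read-through cache sketch in Python. It uses an in-process dictionary as a stand-in for Memcached, and the class name ReadThroughCache and the fetch callback are hypothetical names made up for this example, not part of any AWS library. The point is only that a cache hit never reaches the table, so it consumes no read capacity.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class ReadThroughCache:
    """In-process stand-in for a cache such as Memcached (hypothetical helper)."""

    def __init__(self, fetch: Callable[[Hashable], Optional[Any]], ttl_seconds: float = 60.0):
        self._fetch = fetch      # called only on a miss, e.g. a wrapper around GetItem
        self._ttl = ttl_seconds
        self._entries: Dict[Hashable, Tuple[float, Optional[Any]]] = {}
        self.backend_reads = 0   # number of reads that actually reached the table

    def get(self, key: Hashable) -> Optional[Any]:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # cache hit: no read capacity consumed
        value = self._fetch(key)  # miss or expired entry: one real read
        self.backend_reads += 1
        self._entries[key] = (now + self._ttl, value)
        return value
```

The trade-off is staleness: an item changed in the table is not seen until its cache entry expires, so the TTL should match how fresh your reads need to be.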
Hope that helps!

Related

What counts as one read in DynamoDB?

The AWS documentation states:
"For provisioned mode tables, you specify throughput capacity in terms of read capacity units (RCUs) and write capacity units (WCUs):
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size."
But what counts as one read? If I loop through different partitions to read from DynamoDB, will each iteration count as one read? Thank you.
Reference: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html
For GetItem and BatchGetItem operations, which read individual items, the size of the entire item is used to calculate the number of RCUs (read capacity units) consumed, even if you only ask to read specific attributes of the item. As you quoted, this size is then rounded up to a multiple of 4 KB: if the item is 3.9 KB you'll pay one RCU for a strongly consistent read (ConsistentRead=true), and two RCUs for a 4.1 KB item. Again, as you quoted, if you ask for an eventually consistent read (ConsistentRead=false), the number of RCUs is halved.
For transactions (TransactGetItems) the number of RCUs is double what it would have been with consistent reads.
For multi-item reads (Scan or Query) the cost is calculated the same way as for reading a single item, with one piece of good news: the rounding up happens once for the entire size read, not for each individual item. This matters a lot for small items. For example, suppose your items are 100 bytes each. Reading each one individually costs one RCU, even though it is only 100 bytes, not 4 KB. But if you Query a partition that holds 40 of these items, their total size is 4000 bytes, so you pay just one RCU to read all 40 items, not 40 RCUs. If the entire partition is 4 MB, you'll pay 1024 RCUs with ConsistentRead=true, or 512 RCUs with ConsistentRead=false, to read the whole partition, regardless of how many items it contains.
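To make the rounding rule concrete, here is a small Python sketch of the arithmetic described above. The function names are made up for this example, item sizes are assumed to be positive byte counts, and the sketch only models the rounding, not any other billing detail.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # reads are metered in 4 KB steps


def rcus_for_read(total_bytes, consistent=True):
    """RCUs for one read operation that touches total_bytes of item data."""
    blocks = math.ceil(total_bytes / RCU_BLOCK_BYTES)
    return blocks if consistent else blocks / 2


def rcus_for_get_items(item_sizes, consistent=True):
    """Items read individually are each rounded up on their own."""
    return sum(rcus_for_read(size, consistent) for size in item_sizes)


def rcus_for_query(item_sizes, consistent=True):
    """A Query rounds up once, over the combined size of the items it reads."""
    return rcus_for_read(sum(item_sizes), consistent)


small_items = [100] * 40                         # forty 100-byte items
print(rcus_for_get_items(small_items))           # 40
print(rcus_for_query(small_items))               # 1
print(rcus_for_query([4 * 1024 * 1024]))         # 1024
print(rcus_for_query([4 * 1024 * 1024], False))  # 512.0
```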

How many read capacity units do I need for DynamoDB?

My application will query DynamoDB at 500 queries/second, and the estimated response data for each query is 300 bytes. The application will sustain this rate continuously, i.e. it will keep making 500 queries every second. What is the right number of read capacity units to pick in my case? Thanks
From the docs...
Read capacity unit (RCU): Each API call to read data from your table is a read request. Read requests can be strongly consistent, eventually consistent, or transactional. For items up to 4 KB in size, one RCU can perform one strongly consistent read request per second. Items larger than 4 KB require additional RCUs. For items up to 4 KB in size, one RCU can perform two eventually consistent read requests per second.
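Each 300-byte response fits well within 4 KB, so each query costs one RCU if strongly consistent, or half of one if eventually consistent.
So if eventually consistent is good enough, then 250 RCUs is all that is needed.
If you need strongly consistent reads, then you'd need 500 RCUs.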
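The same sizing can be written as a short Python calculation. This is only a sketch of the arithmetic in the quoted documentation; required_rcus is a hypothetical helper name, and it assumes a steady request rate with every read roughly the same size.

```python
import math


def required_rcus(reads_per_second, bytes_per_read, consistent):
    """Provisioned RCUs for a steady rate of reads of a given size."""
    blocks_per_read = math.ceil(bytes_per_read / 4096)  # 4 KB metering step
    rcus = reads_per_second * blocks_per_read
    if not consistent:
        rcus = rcus / 2  # eventually consistent reads cost half
    return math.ceil(rcus)


print(required_rcus(500, 300, consistent=False))  # 250
print(required_rcus(500, 300, consistent=True))   # 500
```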

Does the Limit option in an AWS DynamoDB query limit the capacity units used?

I have a question.
Suppose I have 1000 items with the same partition key in a table, and I query for this partition key with a limit of 10. Does the query consume read capacity units for all 1000 items, or for just 10 items?
Please clarify.
I couldn't find the exact point in the DynamoDB documentation. In my experience, consumed capacity is based only on the items up to the limit, which is 10 here (not 1000).
You can also check this quickly yourself: specify the ReturnConsumedCapacity parameter in a Query request to obtain this information.
The limit option limits the number of results returned. The capacity consumed depends on the size of the items and on how many of them are accessed to produce the results. I say "accessed" because if you have filters in place, items that are read and then filtered out still consume capacity, so more capacity may be consumed than the returned items alone would account for.
The reason I mention this is that, for queries, each 4 KB of data read is equivalent to 1 read capacity unit.
Why is this important? Because if your items are small, each capacity unit consumed can cover multiple items.
For example, if each item is 200 bytes in size, you could be returning up to 20 items for each capacity unit.
According to the AWS documentation, Limit is:
The maximum number of items to evaluate (not necessarily the number of matching items). If DynamoDB processes the number of items up to the limit while processing the results, it stops the operation and returns the matching values up to that point, and a key in LastEvaluatedKey to apply in a subsequent operation, so that you can pick up where you left off.
As I read it, this means the query will not consume capacity units for all the items with the same partition key. In your example, the consumed capacity units will be for your 10 items.
However, since I did not test it I cannot be sure; that is just how I understand the documentation.
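One way to settle this empirically is to run the query with and without Limit and compare the reported capacity. The sketch below assumes a boto3 low-level DynamoDB client is passed in and that the partition key is a string attribute. The function name consumed_by_query and the table and key names in the usage comment are placeholders for this example.

```python
def consumed_by_query(client, table_name, key_name, key_value, limit=None):
    """Run one Query page and return (item_count, capacity_units)."""
    kwargs = {
        "TableName": table_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_name},
        "ExpressionAttributeValues": {":v": {"S": key_value}},  # assumes a string key
        "ReturnConsumedCapacity": "TOTAL",  # ask DynamoDB to report the cost
    }
    if limit is not None:
        kwargs["Limit"] = limit
    response = client.query(**kwargs)
    return len(response["Items"]), response["ConsumedCapacity"]["CapacityUnits"]


# Usage (requires boto3 and AWS credentials; names are placeholders):
# import boto3
# client = boto3.client("dynamodb")
# print(consumed_by_query(client, "MyTable", "pk", "some-key", limit=10))
# print(consumed_by_query(client, "MyTable", "pk", "some-key"))
```

Comparing the two capacity figures for the same partition key shows directly whether Limit reduced the units consumed.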

DynamoDB scan/query returning significantly less than 1MB in results

I am performing a scan operation on one of my tables, and in the request I specify a projectionExpression to reduce the amount of returned data.
I am setting no limit on the scan (although I have also tried setting the limit to 50, 100, etc.).
I am getting only about 20-30 results, totalling around 12KB-15KB of response data. I am using a JavaScript function to measure the size of the response.
I have also tried returning only the primary key in my projectionExpression to see whether this affects the number of results I get, but I still get the same number of results.
I know from the documentation that a scan operation returns at most 1MB of data, but it surprises me that I get so few results when the returned data is much smaller than 1MB and I did not specify a limit.
I do get a LastEvaluatedKey and am able to continue scanning, but the number of results per request seems very low.
The same happens with a query on an index.
So my question is: does the 1MB limit apply to the raw data or to the actual data returned in the response? (The latter is the impression I got from the documentation.)
Thank you,
Ilan
The 1MB limit applies to the underlying data. A projection only reduces the amount of data sent over the wire.
If your items are quite large and you only need a subset of the fields, you can use a GSI that projects just those attributes, which makes queries and scans less costly.
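Since each request stops once it has read 1MB of underlying data, collecting everything means following LastEvaluatedKey until it is absent. Here is a minimal sketch of that loop, assuming a boto3 low-level DynamoDB client is passed in. The function name scan_all is made up for this example.

```python
def scan_all(client, table_name, projection=None):
    """Yield every item from a Scan, following LastEvaluatedKey across pages."""
    kwargs = {"TableName": table_name}
    if projection:
        # Trims what is sent over the wire, not what is read from the table.
        kwargs["ProjectionExpression"] = projection
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # no LastEvaluatedKey means the scan is complete
        kwargs["ExclusiveStartKey"] = last_key  # resume where the last page stopped
```

Each page still consumes read capacity for the data it scanned, so a full pass over a large table remains expensive however small the projected result is.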

How consumed throughput is influenced by write into local secondary index with no change in data?

Consider a table A with index A-index. I write around 100 items into A in batches (using PutRequest within BatchWriteItem).
If I repeat the operation with the same set of items, they will just replace the existing items. But how does that affect the local secondary index? Since it's a complete replace, does the replace happen in the index too, thereby consuming throughput there as well? Or does DynamoDB figure out that the items are exactly the same and skip the operation, resulting in no additional consumed throughput for the index?
I found the answer by running a trial program and looking at the ConsumedCapacity attribute reported for the table and its indices.
During a replace, if nothing has changed, no throughput is consumed on the index, because DynamoDB figures out that the item is exactly the same. But if there are changes, throughput is consumed for each changed item.
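To repeat this kind of trial, you can request a per-index breakdown by passing ReturnConsumedCapacity="INDEXES" to BatchWriteItem and then summarising the ConsumedCapacity list in the response. The helper below is a sketch that works on a response dictionary of that shape. The name capacity_breakdown is hypothetical, and the response layout is written from my understanding of the API, so check it against what you actually get back.

```python
def capacity_breakdown(response):
    """Map table and 'table/index' names to the capacity units in a response."""
    breakdown = {}
    for entry in response.get("ConsumedCapacity", []):
        table = entry["TableName"]
        # Units consumed by the base table itself.
        breakdown[table] = entry.get("Table", {}).get("CapacityUnits", 0.0)
        # Units consumed by each secondary index, if any were reported.
        for kind in ("LocalSecondaryIndexes", "GlobalSecondaryIndexes"):
            for index_name, usage in entry.get(kind, {}).items():
                breakdown[f"{table}/{index_name}"] = usage.get("CapacityUnits", 0.0)
    return breakdown
```

Running the same batch twice and comparing the two breakdowns shows whether the second, unchanged write touched the index.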
