Deal with I/O capacities in dynamodb - amazon-dynamodb

We are using the dynogels library to query a DynamoDB table. Unfortunately, as DynamoDB does not have a pagination feature, for one specific need we retrieve all data from the table with a loadAll (18K items), and we are getting an error because we exceed the table's read capacity.
Apart from this query that retrieves the entire table, our read usage of the table is very small. We also tried to update the provisioned capacity dynamically, but we are limited to 4 changes per hour.
Can you suggest a solution? Do you know how to use pagination in DynamoDB? Is it possible to use DAX as a local DynamoDB cache?
Thank you

"Unfortunately, as DynamoDB does not have a pagination feature, for a specific need,"
DynamoDB does have a pagination feature. You can set a Limit on the number of items evaluated per request, and when more results remain, the Query / Scan response includes a LastEvaluatedKey. Pass that value as the ExclusiveStartKey of the next request, and repeat until the response no longer contains a LastEvaluatedKey, which indicates the end of the results: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.Pagination
Doesn't the dynogels library support pagination?
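A minimal sketch of that loop in Python, assuming a boto3-style low-level client whose `scan` call accepts `TableName`, `Limit`, and `ExclusiveStartKey` and returns `Items` plus an optional `LastEvaluatedKey`. The function name `scan_all_items`, the `page_size` default, and the `pause_seconds` throttle are illustrative choices, not part of dynogels or of the answer above.

```python
import time
from typing import Any, Dict, Iterator, Optional


def scan_all_items(client: Any, table_name: str, page_size: int = 100,
                   pause_seconds: float = 0.0) -> Iterator[Dict[str, Any]]:
    """Yield every item in a table, one Scan page at a time.

    `client` is assumed to behave like boto3.client("dynamodb").
    """
    start_key: Optional[Dict[str, Any]] = None
    while True:
        kwargs: Dict[str, Any] = {"TableName": table_name, "Limit": page_size}
        if start_key is not None:
            # Resume right after the last key the previous page evaluated.
            kwargs["ExclusiveStartKey"] = start_key
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            break  # no LastEvaluatedKey means the scan is complete
        if pause_seconds > 0:
            # Optional pause between pages to spread out read consumption.
            time.sleep(pause_seconds)
```

Reading in small pages with a pause between them may help keep a full-table read under the provisioned read capacity, at the price of a longer total run time.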

Related

query with pagination causing high throughput usage

I use an Azure Cosmos DB API for MongoDB account, version 3.6. In queries using skip and limit I noticed higher throughput usage. The higher the skip, the more costly the query.
db.MyCollection.find({Property:"testtest"}).skip(12000).limit(10)
The above query costs around 3000 RU. The property in the find clause is my partition key. I have read that Cosmos DB is currently capable of running queries with offset and limit, but officially I only found an OFFSET LIMIT clause in the SQL API for Cosmos DB. Is it possible with the MongoDB API as well, or should I live with costly queries using skip?
The SQL API will yield the same result with OFFSET LIMIT. You'll find an almost linear increase in RU as you increase the offset, as each query loops over all skipped documents.
If possible in your context, use a continuation token instead. You could also adjust your filter criteria using an indexed property to move through your data.
The RU charge of a query with OFFSET LIMIT will increase as the number of terms being offset increases. For queries that have multiple pages of results, we typically recommend using continuation tokens. Continuation tokens are a "bookmark" for the place where the query can later resume. If you use OFFSET LIMIT, there is no "bookmark". If you wanted to return the query's next page, you would have to start from the beginning.
Source
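To illustrate why the offset grows the cost while a "bookmark" does not, here is a toy in-memory model in Python. It is not Cosmos DB or MongoDB code: `DOCS`, `page_by_offset`, and `page_after_key` are hypothetical names, and the "documents examined" count is only a stand-in for the work that drives the RU charge, not an RU figure.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# In-memory stand-in for a collection sorted by an indexed key.
DOCS: List[Dict[str, int]] = [{"_id": i} for i in range(1, 1001)]
IDS: List[int] = [doc["_id"] for doc in DOCS]


def page_by_offset(skip: int, limit: int) -> Tuple[List[Dict[str, int]], int]:
    """Return (page, documents_examined) for skip/limit paging."""
    examined = 0
    page: List[Dict[str, int]] = []
    for doc in DOCS:
        examined += 1
        if examined <= skip:
            continue  # skipped documents are still walked over
        page.append(doc)
        if len(page) == limit:
            break
    return page, examined


def page_after_key(last_id: Optional[int], limit: int) -> Tuple[List[Dict[str, int]], int]:
    """Return (page, documents_examined) when resuming after a known key."""
    # bisect models an index seek straight to the first key greater than last_id.
    start = 0 if last_id is None else bisect.bisect_right(IDS, last_id)
    page = DOCS[start:start + limit]
    return page, len(page)
```

In this model, `page_by_offset(900, 10)` and `page_after_key(900, 10)` return the same ten documents, but the first walks over 910 documents and the second touches only 10.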

Azure Cosmos DB pagination using azure-documentdb-java

I am trying to understand how to implement pagination using azure-documentdb-java. As far as I can see, a continuation token only lets me retrieve the next page of a query executed earlier.
Is there a way to return a specific page, or the previous page, easily?
No, Cosmos DB doesn't support (efficient) offset-based pagination. You can use OFFSET LIMIT but it's not efficient. The only efficient pagination mode is token-based. You can get previously fetched pages by memorizing the previous continuation tokens (but if the underlying data has changed, the previous pages will also change).
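One way to "memorize" tokens can be sketched as follows, in Python rather than Java for brevity. `TokenPager` and the `fetch` callable are hypothetical: `fetch(token)` is assumed to wrap whatever SDK call returns one page plus the continuation token for the next page, with `None` meaning "first page" on input and "no more pages" on output.

```python
from typing import Callable, List, Optional, Tuple

# fetch(token) -> (items, next_token)
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


class TokenPager:
    """Remember the continuation token that opens each page already seen."""

    def __init__(self, fetch: FetchPage) -> None:
        self._fetch = fetch
        self._tokens: List[Optional[str]] = [None]  # _tokens[i] opens page i

    def page(self, index: int) -> List[str]:
        """Return page `index`, walking forward only as far as needed."""
        while len(self._tokens) <= index:
            # Page not reached yet: advance one page to learn its token.
            _, next_token = self._fetch(self._tokens[-1])
            if next_token is None:
                raise IndexError("page index past the end of the results")
            self._tokens.append(next_token)
        items, next_token = self._fetch(self._tokens[index])
        if next_token is not None and len(self._tokens) == index + 1:
            self._tokens.append(next_token)  # remember how to open the next page
        return items
```

Going back to a page already visited costs one fetch. Jumping ahead to a page not yet visited still has to walk through every page in between, and a re-fetched page reflects the data as it is now, not as it was on the first visit.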

Simple CosmosDb query high RU

I am evaluating Cosmos Db for a project and working through the documentation. I have created a sample collection following the documentation on this page https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-getting-started. When I run the first query on this page in the local emulator I get the following results:
Why is the Request Charge 2.89 RUs? From all of the documentation I have read, this should be 1 RU. The collection is partitioned on the id field, is auto-indexed, and has cross-partition queries enabled. I have even tried putting both items in the same partition and I get the same results.
1 RU is the cost of a Point-Read operation, not a query. Reference: https://learn.microsoft.com/azure/cosmos-db/request-units:
The cost to read a 1 KB item is 1 Request Unit (or 1 RU).
Also there:
Query patterns: The complexity of a query affects how many RUs are consumed for an operation. Factors that affect the cost of query operations include
If you want to read a single document and you know its id and partition key, just do a point read; it will always be cheaper than a query with an id="something" filter. If you don't know the partition key, then yes, you need a cross-partition query, because you don't know which partition the document is stored on, and there could be multiple documents with the same id (as long as their partition keys are different, see https://learn.microsoft.com/azure/cosmos-db/partitioning-overview).
You can use any of the available SDKs or work with the REST API.
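As a rough sketch of that choice, assuming a container object shaped like `ContainerProxy` from the azure-cosmos Python SDK (`read_item` for point reads, `query_items` for queries). The helper name `find_by_id` is hypothetical, and the exact RU charges are not modeled here.

```python
from typing import Any, Dict, List, Optional


def find_by_id(container: Any, doc_id: str,
               partition_key: Optional[str] = None) -> List[Dict[str, Any]]:
    """Use a point read when the partition key is known, else a query."""
    if partition_key is not None:
        # Point read: id plus partition key address exactly one document.
        return [container.read_item(item=doc_id, partition_key=partition_key)]
    # Without the partition key, fall back to a cross-partition query.
    # Several documents may share the id, so a list is returned.
    return list(container.query_items(
        query="SELECT * FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": doc_id}],
        enable_cross_partition_query=True,
    ))
```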

Elastic Cache vs DynamoDb DAX

I have a use case where I write data to two DynamoDB tables, say t1 and t2, in a transaction. My app needs to read data from these tables many times (1 write, at least 4 reads). I am considering DAX vs ElastiCache. Does anyone have any suggestions?
Thanks in advance
K
ElastiCache is not intended for use with DynamoDB.
DAX is good for read-heavy apps, like yours. But be aware that DAX is only good for eventually consistent reads, so don't use it with banking apps, etc. where the info always needs to be perfectly up to date. Without further info it's hard to tell more, these are just two general points to consider.
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache that can reduce Amazon DynamoDB response times from milliseconds to microseconds, even at millions of requests per second. While DynamoDB offers consistent single-digit millisecond latency, DynamoDB with DAX takes performance to the next level with response times in microseconds for millions of requests per second for read-heavy workloads. With DAX, your applications remain fast and responsive, even when a popular event or news story drives unprecedented request volumes your way. No tuning required. https://aws.amazon.com/dynamodb/dax/
AWS recommends that you use DAX as the solution for this requirement.
ElastiCache is an older approach; it is used to store session state in addition to cached data.
DAX is used extensively for read-intensive workloads served through eventually consistent reads, and for latency-sensitive applications. DAX also maintains two caches:
Item cache - populated with items based on GetItem results.
Query cache - keyed on the parameters used when calling Query or Scan.
Cheers!
I'd recommend using DAX with DynamoDB, provided most of your read calls use the item-level API (and NOT the query-level API), such as GetItem.
Why? DAX has one odd behavior. From AWS:
"Every write to DAX alters the state of the item cache. However, writes to the item cache don't affect the query cache. (The DAX item cache and query cache serve different purposes, and operate independently from one another.)"
To elaborate: if the result of a query is cached, and you then perform a write that affects that result before the cached entry has expired, the cached query result will be outdated.
This out-of-sync issue is also discussed here.
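The effect can be reproduced with a toy model in Python. This is not the DAX client: `ToyTwoCacheStore` and its methods are hypothetical, and TTL expiry and eviction are left out. It only shows a write-through item cache sitting next to an independent query cache.

```python
from typing import Dict, List, Tuple


class ToyTwoCacheStore:
    """Toy table fronted by an item cache and a separate query cache."""

    def __init__(self) -> None:
        self.table: Dict[str, int] = {}
        self.item_cache: Dict[str, int] = {}
        self.query_cache: Dict[int, List[Tuple[str, int]]] = {}

    def put_item(self, key: str, value: int) -> None:
        self.table[key] = value
        self.item_cache[key] = value  # the write updates the item cache
        # The query cache is deliberately left untouched.

    def get_item(self, key: str) -> int:
        if key not in self.item_cache:
            self.item_cache[key] = self.table[key]
        return self.item_cache[key]

    def query_at_least(self, minimum: int) -> List[Tuple[str, int]]:
        """Return (key, value) pairs with value >= minimum, cached per parameter."""
        if minimum not in self.query_cache:
            self.query_cache[minimum] = sorted(
                (k, v) for k, v in self.table.items() if v >= minimum
            )
        return self.query_cache[minimum]
```

After `put_item("a", 1)`, a call to `query_at_least(5)` caches an empty result. A later `put_item("a", 10)` makes `get_item("a")` return 10, yet `query_at_least(5)` still returns the cached empty list.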
I find DAX useful only for cached queries, PutItem, and GetItem. In general it is very difficult to find a use case for it.
DAX handles queries and scans separately from CRUD operations on individual items. That means that if you update an item and then run a query/scan, the result will not reflect the change.
You can't invalidate the cache yourself; entries are only dropped when their TTL is reached or when a node's memory is full and it evicts old items.
Takeaways:
Doing puts/updates and then queries: these use two separate caches, so they get out of sync.
Looking for a single item: you are left with only the primary key, the default index, and a GetItem request (no query with limit 1). You can't use any indexes for gets/updates/deletes.
Using the ConsistentRead option with a query to get the latest data works, but only for the primary index.
Writing through DAX is slower than writing directly to DynamoDB, since you have a hop in the middle.
X-Ray does not work with DAX.
Use case:
You have queries whose results don't need to be up to date.
You are doing few PutItem/UpdateItem calls and a lot of GetItem calls.

Can I use ZCatalogs Query Plan to Optimise Catalog Queries?

I'm wondering if I can make use of the information provided by the Query Report and Query Plan tabs on the portal catalog. Can I optimize ZCatalog queries based on the query report? How does ZCatalog's Query Plan differ from the query plan of an SQL database?
The query plan information is used to improve catalog performance, but you cannot optimize your own queries based on plan information.
The catalog only builds up that information as needed, based on your index sizes; unlike a SQL database the catalog does not plan each query based on such information but rather looks up pre-calculated plans from the structure reflected in the Query Plan tab.
The query report tab does give you information about what indexes are performing poorly for your code; you may want to rethink code that uses those combinations of indexes and/or look into why those indexes performed poorly; perhaps your query didn't limit the result quickly enough or the slow index is very large, indicating that perhaps your ZODB cache is too small to hold that large index or that other results keep pushing it out.
On the whole, for large applications it is a good idea to retain the query plan; in one project we dump cache information before stopping instances and reload that after starting again, and that includes the catalog query plan:
plan = site.portal_catalog.getCatalogPlan()
with open(PLAN_PATH, 'w') as out:
    out.write(plan)
and on load:
if os.path.exists(PLAN_PATH):
    from Products.ZCatalog.plan import PriorityMap
    try:
        PriorityMap.load_from_path(PLAN_PATH)
    except Exception:
        pass
