Throttling with on-demand DynamoDB

Our DynamoDB table is configured with on-demand capacity, but we're still seeing throttled read/write requests during high-traffic hours.
Any clues?

On-Demand does not mean "unlimited throughput". There is a limit, according to the docs:
If you recently switched an existing table to on-demand capacity mode for the first time, or if you created a new table with on-demand capacity mode enabled, the table has the following previous peak settings, even though the table has not served traffic previously using on-demand capacity mode:
Newly created table with on-demand capacity mode: The previous peak is 2,000 write request units or 6,000 read request units. You can drive up to double the previous peak immediately, which enables newly created on-demand tables to serve up to 4,000 write request units or 12,000 read request units, or any linear combination of the two.
Existing table switched to on-demand capacity mode: The previous peak is half the previous write capacity units and read capacity units provisioned for the table or the settings for a newly created table with on-demand capacity mode, whichever is higher.
I've also found an interesting article with some experiments and numbers: Understanding the scaling behaviour of DynamoDB OnDemand tables.
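As a rough sanity check of the limits quoted above, here is a back-of-the-envelope sketch (the provisioned figure is hypothetical):

    # Rough check of the "double the previous peak" rule for a table switched
    # from provisioned to on-demand mode (provisioned_wcu is a hypothetical number).
    provisioned_wcu = 10_000                          # WCU provisioned before the switch
    previous_peak = max(provisioned_wcu / 2, 2_000)   # half of provisioned, or the new-table default, whichever is higher
    burst_ceiling = 2 * previous_peak                 # on-demand can absorb up to double the previous peak
    print(burst_ceiling)                              # 10000.0 write request units before throttling is likely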

You could try analyzing how the partition keys of your write requests vary across individual writes. If too many write requests hit the same partition key, they can overwhelm a partition and cause throttling.
Consider mixing up the partition keys within BatchWriteItem calls, and also over time, so you don't hit the same partitions too frequently.
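For example, a minimal boto3 sketch along those lines (the table name and the load_pending_items helper are hypothetical):

    # Hedged sketch: spread writes across partition keys instead of writing
    # items for the same PK back-to-back (table/helper names are hypothetical).
    import random
    import boto3

    table = boto3.resource("dynamodb").Table("my-table")

    items = load_pending_items()   # hypothetical helper returning a list of item dicts
    random.shuffle(items)          # mix partition keys so batches don't target one partition

    # batch_writer() handles the 25-item batching and retries unprocessed items
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)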

Related

What is the storage limit in Azure Data Explorer and what does it depend on?

I could not find it in the official documentation, and the service limits page is also silent on this aspect.
What is the maximum size of a database in Azure Data Explorer?
What is the maximum total size of all databases in the cluster?
Where exactly is my data stored? Is it on the cluster nodes' HDDs?
There is no hard limit on the amount of data that can be ingested into a database. Ingested data is persisted to effectively limitless durable storage (Azure Blob Storage). Based on the database's effective caching policy, ingested data can be cached on the cluster nodes' local SSDs.
The total amount of data that can fit in a single cluster's hot cache depends on the number of nodes and the chosen SKU - if, for example, the maximum SSD size per VM is 4TB, and the maximum number of nodes in a cluster is 1000, then you can have up to 4000TB of (compressed) data available in the hot cache.
Data compression ratio varies based on the schema and data. For example, if the compression ratio is 10, in the example above you can cache up to 40PB of data, while still having additional data queryable from cold storage (not cached).
Whether or not it makes sense to store and cache so much data in a single, very large cluster greatly depends on the scenario and the workloads running against the cluster.
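To make the arithmetic above concrete, a quick sketch using the same illustrative figures (SSD size, node count, and compression ratio are just the example numbers, not guaranteed limits):

    # Rough hot-cache capacity estimate using the illustrative figures above.
    ssd_tb_per_node = 4        # example max SSD size per VM, in TB
    max_nodes = 1000           # example max nodes in a cluster
    compression_ratio = 10     # varies with schema and data

    compressed_tb = ssd_tb_per_node * max_nodes              # 4000 TB of compressed data in hot cache
    uncompressed_pb = compressed_tb * compression_ratio / 1000
    print(uncompressed_pb)                                   # -> 40.0 PB of original data cacheable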

Throttling requests in Cosmos DB SQL API

I'm running a simple ADF pipeline for copying data from a data lake to Cosmos DB (SQL API).
After setting the database throughput to Autopilot 4,000 RU/s, the run took ~11 min and I saw 207 throttled requests. After setting the database throughput to Autopilot 20,000 RU/s, the run took ~7 min and I saw 744 throttled requests. Why is that? Thank you!
Change the indexing policy from Consistent to None for the ADF copy activity, then change it back to Consistent when done.
Azure Cosmos DB supports two indexing modes:
Consistent: The index is updated synchronously as you create, update or delete items. This means that the consistency of your read queries will be the consistency configured for the account.
None: Indexing is disabled on the container. This is commonly used when a container is used as a pure key-value store without the need for secondary indexes. It can also be used to improve the performance of bulk operations. After the bulk operations are complete, the index mode can be set to Consistent and then monitored using the IndexTransformationProgress until complete.
How to modify the indexing policy:
Modifying the indexing policy
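If you prefer to do this programmatically rather than through the portal steps linked above, a sketch along these lines with the azure-cosmos Python SDK should work (the endpoint, key, database, container, and partition key path are all placeholders):

    # Hedged sketch: switch a container's indexing mode to "none" before a bulk
    # load and back to "consistent" afterwards (all names below are placeholders).
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
    database = client.get_database_client("mydb")
    container = database.get_container_client("mycontainer")

    def set_indexing_mode(mode: str) -> None:
        policy = {"indexingMode": mode}
        if mode == "none":
            policy["automatic"] = False   # required when indexing is disabled
        database.replace_container(
            container,
            partition_key=PartitionKey(path="/pk"),   # must match the existing partition key
            indexing_policy=policy,
        )

    set_indexing_mode("none")        # before the ADF copy activity
    # ... run the bulk load ...
    set_indexing_mode("consistent")  # re-index when the load is done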

HammerDB - how to do in-memory testing with MySQL in TPCC mode?

I was doing TPCC testing with HammerDB 3.1 with MySQL as the backend.
Looking at resource consumption during testing, I don't think it does in-memory testing, as there is a lot of I/O activity.
So my question is whether it is possible to do in-memory HammerDB testing with MySQL server, and how I can achieve that.
Otherwise I'm using the default MySQL configuration, based on the HammerDB documentation.
For TPCC testing on relational databases, the I/O activity is divided into two major areas: the data area and the redo/transaction log (or WAL). Both are buffered in memory, but with a key difference.
For the data area you have a buffer cache or pool into which you read data blocks or pages; for MySQL and the InnoDB storage engine this is set by, for example, innodb_buffer_pool_size=64000M. At a basic level, during the test rampup you read most of your data blocks from disk into this buffer pool, and from there all operations on the blocks take place in memory. Periodically the modified blocks are written out to disk.
To prevent data loss as a result of a failure, all changes are written to the redo log, which is flushed to disk on commit. There is an in-memory buffer where changes are queued and potentially flushed to disk together; however, this buffer is small, because all changes need to reach persistent media when they happen (so a log buffer larger than tens of MB will not fill before it is flushed). Therefore, for the TPCC test you will see a lot of write activity to the redo log. If the persistent media (HDD or SSD) cannot keep up with the writes, this becomes the bottleneck that prevents adding more virtual users from raising the transaction rate, and you therefore need less memory in the data area, since by default each virtual user works mostly on one warehouse. If you want to increase data area activity, the "use all warehouses" check-box increases the number of warehouses that each virtual user uses.
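If you want to check whether your TPCC data set actually fits in memory during a run, a quick sketch like this can help (it assumes mysql-connector-python, a local server, placeholder credentials, and a schema named tpcc):

    # Hedged sketch: compare the InnoDB buffer pool size against the tpcc schema
    # size and report the buffer pool read hit ratio (connection details are placeholders).
    import mysql.connector

    conn = mysql.connector.connect(host="127.0.0.1", user="root", password="secret")
    cur = conn.cursor()

    cur.execute("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
    _, pool_size = cur.fetchone()
    print(f"innodb_buffer_pool_size: {int(pool_size) / 1024**3:.1f} GiB")

    cur.execute(
        "SELECT SUM(data_length + index_length) FROM information_schema.tables "
        "WHERE table_schema = 'tpcc'"
    )
    print(f"tpcc schema size: {cur.fetchone()[0] / 1024**3:.1f} GiB")

    # A hit ratio close to 100% means reads are served from the buffer pool, not disk
    cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")
    status = {name: int(value) for name, value in cur.fetchall()}
    hit_ratio = 1 - status["Innodb_buffer_pool_reads"] / status["Innodb_buffer_pool_read_requests"]
    print(f"buffer pool hit ratio: {hit_ratio:.2%}")

Even with a buffer pool large enough to hold the whole schema, you will still see redo log writes on disk, for the reasons described above.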

Druid: how does it use its cache and the OS page cache?

I'm observing that Druid query performance can benefit from previous queries, and I'm trying to understand the reasons.
I know that Druid uses a cache (I'm using the cache on the Broker), but this cache just stores the results of queries per segment (right?). However, I have noticed that if subsequent queries use the same segments, performance improves.
Example:
Select sum(metric), dimteste2, dimteste3 from table x where dimteste='x' group by dimteste2, dimteste3 -> 2 seconds
Select sum(metric), dimteste2, dimteste3 from table x where dimteste3='y' group by dimteste2, dimteste3 -> 0.5 seconds
I searched and found that this behavior can be explained by the OS page cache. Based on my research, I think that Druid, during the first query to a datasource, loads the necessary segments into memory (the OS page cache), and those segments can then be read faster by subsequent queries.
Am I right?
I looked in the Druid documentation and I was unable to find anything helpful.
Can you please give me some help explaining this awesome behavior?
Druid does use caching to improve performance at various levels: at the segment level on Historicals and at the query level on the Broker. The more memory you give it, the faster it works.
Below is the documentation on caching:
Query Caching
Druid supports query result caching through an LRU cache. Results are stored on a per segment basis, along with the parameters of a given query. This allows Druid to return final results based partially on segment results in the cache and partially on segment results from scanning historical/real-time segments.
Segment results can be stored in a local heap cache or in an external distributed key/value store. Segment query caches can be enabled at either the Historical or Broker level (it is not recommended to enable caching on both).
Query caching on Brokers
Enabling caching on the broker can yield faster results than if query caches were enabled on Historicals for small clusters. This is the recommended setup for smaller production clusters (< 20 servers). Take note that when caching is enabled on the Broker, results from Historicals are returned on a per segment basis, and Historicals will not be able to do any local result merging.
Query caching on Historicals
Larger production clusters should enable caching only on the Historicals to avoid having to use Brokers to merge all query results. Enabling caching on the Historicals instead of the Brokers enables the Historicals to do their own local result merging and puts less strain on the Brokers.
The Druid Broker doesn't do anything directly with the OS page cache; if virtual memory is available from the OS, then the heaps are allocated based on memory requirements.
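If you want to separate the effect of Druid's own segment cache from everything else (memory-mapped segments and the OS page cache on the Historicals), one hedged approach is to run the same query with the cache bypassed via the query context. A sketch against the Broker's SQL endpoint (the URL and the query are placeholders, and this assumes the SQL endpoint is enabled):

    # Hedged sketch: time the same query with and without Druid's segment cache,
    # so any remaining speedup is attributable to warm memory-mapped segments /
    # the OS page cache on the Historicals (broker URL and query are placeholders).
    import time
    import requests

    BROKER = "http://localhost:8082/druid/v2/sql"
    QUERY = "SELECT dimteste2, dimteste3, SUM(metric) FROM x WHERE dimteste = 'x' GROUP BY dimteste2, dimteste3"

    def run(use_cache: bool) -> float:
        payload = {"query": QUERY, "context": {"useCache": use_cache, "populateCache": use_cache}}
        start = time.perf_counter()
        requests.post(BROKER, json=payload).raise_for_status()
        return time.perf_counter() - start

    print("cache bypassed:", run(False))
    print("cache allowed: ", run(True))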

DynamoDB table uses more Read/Write capacity than expected

Background: I have a DynamoDB table that I interact with exclusively through a DAO class. This DAO class logs metrics on the number of calls to insert/update/delete operations made through the boto library.
I noticed that the number of operations I logged in my code does correlate with the consumed read/write capacity in AWS monitoring, but the AWS consumption measurements are 2-15 times the number of operations I logged.
I know for a fact that the only other process interacting with the table is my manual queries on the AWS UI (which is insignificant in capacity consumption). I also know that the size of each item is < 1 KB, which would mean each call should only consume 1 read.
I use strongly consistent reads, so I do not get the 2x benefit of eventually consistent reads.
I am aware that boto auto-retries at most 10 times when throttled, but my throttling threshold is seldom reached, so retries shouldn't be triggering this.
With that said, I wonder if anyone knows of any factor that may cause such a discrepancy between the number of calls to boto and the actual consumed capacity.
While I'm not sure of the support in the boto AWS SDK, in other languages it is possible to ask DynamoDB to return the capacity that was consumed as part of each request. It sounds like you are logging the actual requests and not this metric from the API itself. The values returned by the API should accurately reflect what is consumed.
One possible source of this discrepancy is if you are doing query/scan requests with server-side filtering. DynamoDB consumes capacity for all of the records scanned, not just those returned.
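For reference, here is how that looks with boto3, which exposes this via ReturnConsumedCapacity (the table name, key schema, and attribute names are hypothetical):

    # Hedged boto3 sketch: ask DynamoDB to report consumed capacity per request
    # (table name, key schema, and attribute names are hypothetical).
    import boto3
    from boto3.dynamodb.conditions import Attr

    table = boto3.resource("dynamodb").Table("my-table")

    resp = table.get_item(
        Key={"pk": "user#123"},
        ConsistentRead=True,                 # strongly consistent, as in the question
        ReturnConsumedCapacity="TOTAL",
    )
    print(resp["ConsumedCapacity"])          # e.g. {'TableName': 'my-table', 'CapacityUnits': 1.0}

    # For scans/queries with a filter, capacity is charged for every item scanned,
    # not just the items that pass the filter:
    resp = table.scan(
        FilterExpression=Attr("status").eq("active"),
        ReturnConsumedCapacity="TOTAL",
    )
    print(resp["ConsumedCapacity"])

Comparing these per-request values against your DAO's call counts should show where the extra consumption comes from.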
Another possible cause of a discrepancy are the actual metrics you are viewing in the AWS console. If you are viewing the CloudWatch metrics directly make sure you are looking at the appropriate SUM or AVERAGE value depending on what metric you are interested in. If you are viewing the metrics in the DynamoDB console the interval you are looking at can dramatically affect the graph (ex: short spikes that appear in a 5 minute interval would be smoothed out in a 1 hour interval).
