peak read capacity units dynamo DB table - amazon-dynamodb

I need to find the peak read capacity units consumed in the last 20 seconds by one of my DynamoDB tables. I need to do this programmatically in Java and trigger an auto-scaling action based on the usage.
Could you please share a sample Java program that finds the peak read capacity units consumed in the last 20 seconds for a particular DynamoDB table?
Note: there are unusual spikes in requests to the table, hence the need for dynamic auto-scaling.
I've tried this:
result = DYNAMODB_CLIENT.describeTable(recomtableName);
readCapacityUnits = result.getTable()
.getProvisionedThroughput().getReadCapacityUnits();
but this gives the provisioned capacity, whereas I need the capacity consumed in the last 20 seconds.

You could use the CloudWatch API getMetricStatistics method to get a reading for the capacity metric you require. A hint for the kinds of parameters you need to set can be found here.

For that you have to use CloudWatch.
import com.amazonaws.services.cloudwatch.model.Dimension
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest

// Groovy, using the AWS SDK for Java (v1) model classes.
GetMetricStatisticsRequest metricStatisticsRequest = new GetMetricStatisticsRequest()
metricStatisticsRequest.setStartTime(startDate)
metricStatisticsRequest.setEndTime(endDate)
metricStatisticsRequest.setNamespace('AWS/DynamoDB')
// Read capacity, as asked; use 'ConsumedWriteCapacityUnits' for writes.
metricStatisticsRequest.setMetricName('ConsumedReadCapacityUnits')
// Period is in seconds.
metricStatisticsRequest.setPeriod(60)
metricStatisticsRequest.setStatistics([
    'SampleCount',
    'Average',
    'Sum',
    'Minimum',
    'Maximum'
])
// Restrict the metric to a single table.
List<Dimension> dimensions = []
Dimension dimension = new Dimension()
dimension.setName('TableName')
dimension.setValue(dynamoTableHelperService.campaignPkToTableName(campaignPk))
dimensions << dimension
metricStatisticsRequest.setDimensions(dimensions)
client.getMetricStatistics(metricStatisticsRequest)
But I bet you'd get results older than 5 minutes.
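As a rough sketch of what to do with the datapoints once they come back: ConsumedReadCapacityUnits is reported as a total per period, so dividing each datapoint's Sum by the period length gives the average units consumed per second. The datapoint shape (dicts with a `Sum` key) and the sample numbers below are assumptions for illustration, and the sketch is in Python rather than Java only to keep it self-contained.

```python
def peak_consumed_per_second(datapoints, period_seconds):
    """Return the highest average consumed units/second across datapoints.

    Each datapoint is assumed to be a dict with a "Sum" key holding the
    total capacity units consumed during one period.
    """
    if not datapoints:
        return 0.0
    return max(dp["Sum"] / period_seconds for dp in datapoints)


# Hypothetical one-minute datapoints (Sum of consumed read capacity units).
sample = [{"Sum": 600.0}, {"Sum": 4200.0}, {"Sum": 900.0}]
peak = peak_consumed_per_second(sample, period_seconds=60)
print(peak)  # 70.0
```

With a 60-second period this is a per-minute average, not a true 20-second peak, which is part of why the CloudWatch route may be too coarse for the use case in the question.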
Actually, the current off-the-shelf autoscaling uses CloudWatch. This has a drawback that is unacceptable for some applications.
When a spike hits your table, the table does not have enough capacity to respond. Capacity reserved with some headroom is not enough, and the table starts throttling. If records are kept in memory while waiting for the table to respond, memory usage can simply blow up. CloudWatch, on the other hand, takes a while to react, often after the spike is gone. Based on our tests it took at least 5 minutes, and then it raised capacity gradually when it needed to go straight up to the max.
Long story short: we created a custom solution with our own speedometers. It counts whatever it has to count and changes the table's capacity accordingly. There is still a delay because:
The app itself takes a bit of time to work out what to do.
The DynamoDB table takes ~30 sec to be updated with the new capacity.
On top of that we also have a throttling detector: if a write/read request gets throttled, we immediately raise the capacity accordingly. Sometimes the capacity level looks all right but requests are still throttled because of a hot key issue.
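The answer does not show its speedometer, so the following is only a minimal sketch of the idea, not the author's implementation: a sliding-window counter over the last 20 seconds plus a rule that jumps to the maximum when throttling is seen. The class and function names, the 1.5x headroom and the capacity bounds are all assumptions.

```python
import math
from collections import deque


class Speedometer:
    """Tracks capacity units consumed within a sliding time window."""

    def __init__(self, window_seconds=20):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, units) pairs, oldest first

    def record(self, timestamp, units):
        self._events.append((timestamp, units))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def units_per_second(self, now):
        self._evict(now)
        return sum(units for _, units in self._events) / self.window_seconds


def desired_capacity(observed_rate, throttled, headroom=1.5, minimum=25, maximum=10_000):
    """Pick a provisioned capacity; go straight to the max when throttled."""
    if throttled:
        return maximum
    return max(minimum, min(maximum, math.ceil(observed_rate * headroom)))


meter = Speedometer(window_seconds=20)
for second in range(20):
    meter.record(timestamp=second, units=50)  # 50 units consumed each second
rate = meter.units_per_second(now=19)
print(rate, desired_capacity(rate, throttled=False))  # 50.0 75
```

The application records the consumed capacity it sees for each request and periodically feeds `units_per_second` into whatever call updates the table's provisioned throughput.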

Related

How to download many separate JSON documents from CosmosDB without deserializing?

Context & goal
I need to periodically create snapshots of cosmosDB partitions. That is:
export all documents from a single CosmosDB partition (ca. 100-10k docs per partition, 1KB-200KB per doc, entire partition JSON usually <50MB)
each document must be handled separately, with id known.
Host the process in Azure function app, using consumption plan (so mem/CPU/duration matters).
And run this for thousands of partitions..
Using Microsoft.Azure.Cosmos v3 C# API.
What I've tried
I can skip deserialization by using the Container.*StreamAsync() tools in the API and avoid parsing the document contents. This should notably reduce the CPU/memory need, and it also avoids accidentally altering the exported documents through a serialization roundtrip. The tricky part is how to combine this with having 10k documents per partition.
Query individually x 10k
I could query item ids per partition using SQL and just send separate ReadItemStreamAsync(id) requests.
This skips deserialization, still have ids, I could control how many docs are in memory at given time, etc.
It would work, but it smells too chatty: sending 1+10k requests to CosmosDB per partition is a lot, adding up to millions of requests. Also, in my experience, SQL-querying large documents is usually cheaper RU-wise than loading those documents with point reads, and that would add up at this scale. It would be nice to be able to pull N documents with a single (SQL query) request.
Query all as stream x 1.
There is Container.GetItemQueryStreamIterator(), to which I could just pass select * from c where c.partition = #key. It would be simpler and use fewer RUs, I could control the batch size with MaxItemCount, and it sends just a minimal number of requests to Cosmos (query + continuations). All is good, except ..
.. it would return a single JSON array for all documents in batch and I would need to deserialize it all to split it into individual documents and mapping to their ids. Defeating the purpose of loading them as Stream.
Similarly, ReadManyItemsStreamAsync(..) would return the items as single response stream.
Question
Does the CosmosDB API provide a better way to download a lot of individual raw JSON documents without deserializing?
Preferably with having some control over how much data is being buffered in client.
I agree that designing the solution around streaming documents with change feed is promising and might scale better and cost less on the CosmosDB side, but to answer the original question ..
The chattiness of solution "Query individually x 10k" could be reduced with Bulk mode.
That is:
Prepare a bulk CosmosClient with AllowBulkExecution option
query document ids to export (select c.id from c where c.partition = #key)
(Optionally) split the ids to batches of desired size to limit the number of documents loaded in memory.
For each batch:
Load all documents in batch concurrently using ReadItemStreamAsync(id, partition), this avoids deserialization but retains link to id.
Write all documents to destination before starting next batch to release memory.
Since all reads are to the same partition, then bulk mode will internally merge the underlying requests to CosmosDB, reducing the network "chattiness" and trading this for some internal (hidden) complexity and slight increase in latency.
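The batching steps above can be sketched independently of the SDK. This is a minimal illustration in Python (not the C# API): `read_item` and `write_item` are hypothetical async callables standing in for ReadItemStreamAsync and the destination writer, and the fake implementations exist only to make the sketch runnable.

```python
import asyncio


def chunked(ids, size):
    """Split ids into consecutive batches of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


async def export_partition(ids, read_item, write_item, batch_size=100):
    """Read each batch concurrently, write it out, then start the next batch."""
    for batch in chunked(ids, batch_size):
        # Only one batch of raw document bodies is held in memory at a time.
        bodies = await asyncio.gather(*(read_item(doc_id) for doc_id in batch))
        for doc_id, body in zip(batch, bodies):
            await write_item(doc_id, body)


async def _demo():
    store = {}

    async def fake_read(doc_id):
        # Stand-in for a raw, undeserialized response body.
        return b'{"id": "%s"}' % doc_id.encode()

    async def fake_write(doc_id, body):
        store[doc_id] = body

    await export_partition([f"doc{i}" for i in range(250)], fake_read, fake_write)
    return store


exported = asyncio.run(_demo())
print(len(exported))  # 250
```

Because each body is paired with the id it was requested by, nothing has to be parsed to know which document is which.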
It's worth noting that:
It is still doing the 1+10k queries to cosmosDB + their RU cost. It's just compacted in network.
batching ids and waiting on batch completion is required as otherwise Bulk would send all internal batches concurrently (See: Bulk support improvements for Azure Cosmos DB .NET SDK). Or don't, if you prefer to max out throughput instead and don't care about memory footprint. In this case the partitions are smallish enough so it does not matter much.
Bulk has a separate internal batch size. Most likely it's best to use the same value. This seems to be 100, which is a rather good chunk of data to process anyway.
Bulk may add latency to requests if it waits for an internal batch to fill up before dispatching (100ms). IMHO this is largely negligible in this case and could be avoided by fully filling the internal Bulk batch bucket if possible.
This solution is not optimal, for example due to burst load put on CosmosDB, but the main benefit is simplicity of implementation, and the logic could be run in any host, on-demand, with no infra setup required..
There isn't anything out of the box that provides a means to doing on-demand, per-partition batch copying of data from Cosmos to blob storage.
However, before looking at other ways you can do this as a batch job, another approach you may consider, is to stream your data using Change Feed from Cosmos to blob storage. The reason is that, for a database like Cosmos, throughput (and cost) is measured on a per-second basis. The more you can amortize the cost of some set of operations over time, the less expensive it is. One other major thing I should point out too is, the fact that you want to do this on a per-partition basis means that the amount of throughput and cost required for the batch operation will be a combination of throughput * the number of physical partitions for your container. This is because when you increase throughput in a container, the throughput is allocated evenly across all physical partitions, so if I need 10k RU additional throughput to do some work on data in one container with 10 physical partitions, I will need to provision 100k RU/s to do the same work for the same amount of time.
Streaming data is often a less expensive solution when the amount of data involved is significant. Streaming effectively amortizes cost over time, reducing the amount of throughput required to move that data elsewhere. In scenarios where the data is being moved to blob storage, exactly when you need the data is often not important, because blob storage is very cheap (0.01 USD/GB) compared to Cosmos (0.25 USD/GB).
As an example, if I have 1M (1 KB) documents in a container with 10 physical partitions and I need to copy them from one container to another, the RU needed for the entire job will be 1M RU to read the documents (1 RU each), plus approximately 10 RU per document (with no indexes) to write them into the other container, so about 11M RU in total.
Here is the breakdown for how much incremental throughput I would need and the cost for that throughput (in USD), if I ran this as a batch job over that period of time. Keep in mind that Cosmos DB charges you for the maximum throughput per hour.
Complete in 1 second = 11M RU/s, $880 USD * 10 partitions = $8,800 USD
Complete in 1 minute = 183K RU/s, $14 USD * 10 partitions = $140 USD
Complete in 10 minutes = 18.3K RU/s, $1 USD * 10 partitions = $10 USD
However, if I streamed this job over the course of a month, the incremental throughput required would be only about 4 RU/s, which can be done without any additional RU at all. Another benefit is that it is usually less complex to stream data than to handle it as a batch. Handling exceptions and dead-letter queuing are easier to manage. However, because you are streaming, the data arrives over time, so you will need to first look up the document in blob storage and then replace it.
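The arithmetic behind these figures can be reproduced with a short sketch. The price of 0.008 USD per 100 RU/s per hour is an assumption inferred from the numbers above (11M RU/s for $880), and the multiplication by physical partitions follows this answer's reasoning. The exact results differ slightly from the list because the list rounds the hourly figure down before multiplying.

```python
TOTAL_RU = 1_000_000 * (1 + 10)   # 1 RU to read + ~10 RU to write, per document
PRICE_PER_100_RU_HOUR = 0.008     # USD; assumed rate, consistent with the figures above
PHYSICAL_PARTITIONS = 10


def batch_cost(duration_seconds):
    """Return (RU/s needed, USD for one billed hour across all partitions)."""
    ru_per_second = TOTAL_RU / duration_seconds
    hourly_usd = ru_per_second / 100 * PRICE_PER_100_RU_HOUR
    return ru_per_second, hourly_usd * PHYSICAL_PARTITIONS


for label, seconds in [("1 second", 1), ("1 minute", 60), ("10 minutes", 600)]:
    ru, usd = batch_cost(seconds)
    print(f"{label}: {ru:,.0f} RU/s, ${usd:,.2f}")

# Spreading the same work over a 30-day month:
monthly_ru_per_second = TOTAL_RU / (30 * 24 * 3600)
print(f"30 days: {monthly_ru_per_second:.1f} RU/s")
```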
There are two simple ways you can stream data from Cosmos DB to blob storage. The easiest is Azure Data Factory. However, it doesn't really give you the ability to capture costs on a per logical partition basis as you're looking to do.
To do this you'd need to write your own utility using change feed processor. Then within the utility, as you read in and write each item, you can capture the amount of throughput used to read the data (usually 1 RU per item) and can calculate the cost of writing it to blob storage based on the per-unit cost derived from your monthly hosting cost for the Azure Function that hosts the process.
As I prefaced, this is only a suggestion. But given the amount of data and the fact that it is on a per-partition basis, may be worth exploring.

AWS Neptune Query gremlins slowness on cold call

I'm currently running some queries with a big performance gap between the first call (up to 2 minutes) and the following ones (around 5 seconds).
This duration difference can be seen through the gremlin REST API in both execution and profile mode.
As the query loads a large amount of data, I expect the issue comes from Neptune's caching in its default configuration. I was not able to find any way to improve this behavior through configuration and would be glad to have some advice on how to reduce the duration of the first call.
Context :
The Neptune database is running on a db.r5.8xlarge instance, and during execution CPU always stays below 20%. I'm also the only user on this instance during the tests.
As we don't have differential inputs, the database is recreated on a weekly basis and switched to production once the loader has loaded everything. Our database therefore has a short lifetime.
The database contains slightly more than 1.000.000.000 nodes and far more edges (probably around 10.000.000.000). Those edges are split across 10 types of labels, and most of them are not used in the current query.
Query :
// recordIds is a table of 50 ids.
g.V(recordIds).hasLabel('record')
  // Convert local id to neptune id.
  .out('local_id')
  // Go to tree parent link. (either myself if edge comes back, or real parent)
  .bothE('tree_top_parent').inV()
  // Clean duplicates.
  .dedup()
  // Follow the tree parent link backward to get all children; this step loads a large number of nodes that are members of the same tree.
  .in('tree_top_parent')
  .not(values('some flag').is('Q'))
  // Limit not reached, result is between 80k and 100k nodes.
  .limit(200000)
  // Convert back to local id for the 80k to 100k selected nodes.
  .in('local_id')
  .id()
Neptune's architecture comprises a shared cluster "volume" (where all data is persisted and where this data is replicated 6 times across 3 availability zones) and a series of decoupled compute instances (one writer and up to 15 read replicas in a single cluster). No data is persisted on the instances; however, approximately 65% of the memory capacity on an instance is reserved for a buffer pool cache. As data is read from the underlying cluster volume, it is stored in the buffer pool cache until the cache fills. Once the cache fills, a least-recently-used (LRU) eviction policy will clear buffer pool cache space for any newer reads.
It is common to see first reads be slower due to the need to fetch objects from the underlying storage. One can improve this by writing and issuing "prefetch" queries that pull in objects that they think they might need in the near future.
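To make the cold versus warm behaviour concrete, here is a toy model of an LRU cache in front of a slower store. It is only an illustration of the eviction policy described above, not Neptune's implementation; the class name, the capacity of 3 and the page keys are made up.

```python
from collections import OrderedDict


class BufferPool:
    """Toy LRU cache in front of a slow backing store."""

    def __init__(self, capacity, fetch_from_storage):
        self.capacity = capacity
        self.fetch_from_storage = fetch_from_storage
        self.pages = OrderedDict()
        self.misses = 0

    def read(self, key):
        if key in self.pages:
            self.pages.move_to_end(key)      # mark as most recently used
            return self.pages[key]
        self.misses += 1                     # cold read: go to storage
        value = self.fetch_from_storage(key)
        self.pages[key] = value
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        return value


pool = BufferPool(capacity=3, fetch_from_storage=lambda key: f"page-{key}")
for key in ["a", "b", "c"]:                  # "prefetch" pass warms the cache
    pool.read(key)
cold_misses = pool.misses
for key in ["a", "b", "c"]:                  # the real query now hits the cache
    pool.read(key)
warm_misses = pool.misses - cold_misses
pool.read("d")                               # cache is full, so "a" is evicted
print(cold_misses, warm_misses, list(pool.pages))  # 3 0 ['b', 'c', 'd']
```

A prefetch query plays the role of the first loop: it pays the storage cost ahead of time so that the query users are waiting on finds its data already cached.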
If you have a use case that is filling buffer pool cache and constantly seeing buffer pool cache misses (a metric one can see in the CloudWatch metrics for Neptune), then you may also want to consider using one of the "d" instance types (ex: r5d.8xlarge) and enabling the Lookup Cache feature [1]. This feature specifically focuses on improving access to property values/literals at query time by keeping them in a directly attached NVMe store on the instance.
[1] https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-lookup-cache.html

How to handle dynamodb WCU limitation ? is it queued when you go above the limit?

I need to feed the db with something like 10k items. I don't need to rush and can/want to stay below the 25 WCU free tier.
It will take a little under 7 minutes (10000 requests / 25 requests/sec = 400 seconds).
Here is the question: I will loop over a JSON file to feed the database. Do I have to limit the number of requests per second myself, or can I push at full speed and have the requests queued?
I read that I may just get an error message (400?) when I exceed the limit. Can I just brutally retry the failed requests (possibly making more of them fail) until my 10k items are in the db?
tl;dr => best way/strategy to feed the database knowing there is a limit on calls/sec
PS: it's run from a Lambda, I don't know if it matters.
Your calculation is a little bit off unless you do this constantly, because AWS actually allows you to burst a little bit (docs):
DynamoDB provides some flexibility in your per-partition throughput
provisioning by providing burst capacity. Whenever you're not fully
using a partition's throughput, DynamoDB reserves a portion of that
unused capacity for later bursts of throughput to handle usage spikes.
DynamoDB currently retains up to 5 minutes (300 seconds) of unused
read and write capacity. During an occasional burst of read or write
activity, these extra capacity units can be consumed quickly—even
faster than the per-second provisioned throughput capacity that you've
defined for your table.
Since 300 (seconds) * 25 (WCU) = 7500 this leaves you with about 7.5k items until it will actually throttle.
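Spelled out as a small calculation, under simplifying assumptions: each item costs 1 WCU (items up to 1 KB), the burst pool is completely full when the load starts, and refill during the burst is ignored.

```python
PROVISIONED_WCU = 25
BURST_WINDOW_SECONDS = 300
ITEMS = 10_000  # assumes 1 WCU per item, i.e. items of at most 1 KB

burst_pool = PROVISIONED_WCU * BURST_WINDOW_SECONDS        # writes absorbed by burst
remaining_after_burst = max(0, ITEMS - burst_pool)         # writes left once it is used up
seconds_for_remainder = remaining_after_burst / PROVISIONED_WCU
seconds_without_burst = ITEMS / PROVISIONED_WCU

print(burst_pool, remaining_after_burst, seconds_for_remainder, seconds_without_burst)
# 7500 2500 100.0 400.0
```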
Afterwards, just responding to the ProvisionedThroughputExceeded error by retrying later is fine, but make sure to add a small delay between retries (e.g. 1 second), as it takes time for new WCU to flow into the token bucket. Immediately retrying and hammering the API is not a kind way to use the service and might look like a DoS attack.
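A minimal sketch of retrying with a pause, with the SDK call abstracted away: `put_item` is a hypothetical callable and `ThrottledError` is a stand-in for whatever throughput-exceeded error your client raises. Note that the AWS SDKs usually apply some retries of their own before the error reaches your code.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the SDK's throughput-exceeded error."""


def put_with_retry(put_item, item, delay_seconds=1.0, max_attempts=10, sleep=time.sleep):
    """Call put_item(item), pausing between attempts while it is throttled.

    Returns the number of attempts it took; re-raises after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            put_item(item)
            return attempt
        except ThrottledError:
            if attempt == max_attempts:
                raise
            sleep(delay_seconds)


# Demo: a fake writer that is throttled twice before succeeding.
calls = {"count": 0}


def flaky_put(item):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThrottledError()


sleeps = []  # record the pauses instead of actually sleeping
attempts = put_with_retry(flaky_put, {"id": 1}, sleep=sleeps.append)
print(attempts, sleeps)  # 3 [1.0, 1.0]
```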
You can also write the items in batches to reduce the number of network requests, as network throughput is also limited in Lambda. This makes handling the ProvisionedThroughputExceeded error slightly more cumbersome, because you need to inspect the response to see which items failed to write, but it will probably be a net positive.
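The bookkeeping for batches can be sketched like this. `batch_write` is a hypothetical callable that takes a list of items and returns the ones that were not written, mirroring how a batch write response reports unprocessed items. The chunk size of 25 is the BatchWriteItem per-request limit; the `max_rounds` guard is an assumption.

```python
def write_in_batches(items, batch_write, batch_size=25, max_rounds=20):
    """Write items in chunks, resubmitting whatever each call leaves unprocessed."""
    for start in range(0, len(items), batch_size):
        pending = items[start:start + batch_size]
        rounds = 0
        while pending:
            rounds += 1
            if rounds > max_rounds:
                raise RuntimeError("batch still has unprocessed items")
            # A real caller would pause before resubmitting leftovers.
            pending = batch_write(pending)


# Demo: a fake writer that accepts at most 10 items per call.
written = []


def fake_batch_write(batch):
    written.extend(batch[:10])
    return batch[10:]


write_in_batches(list(range(60)), fake_batch_write)
print(len(written))  # 60
```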

Dynamo - Increased Read Latency During Writes

There is a DynamoDB table Entity which has a hash key on id and a GSI on another attribute: cardId. The GSI only has a hash key and does not have any sort key.
Whenever we get a batch of create/update requests, we first use the GSI to read existing data and then write to the main table, which also updates the GSI eventually. During this time, we may also serve some parallel read requests from the GSI.
We are seeing an issue where the latency of both the main table and the GSI increases from 200ms to 10-15 seconds during this time (batch writes + reads). I am not able to establish a correlation between consecutive reads and writes in the table. The table is set to use on-demand capacity and there is no throttling. "SuccessfulRequestLatency" is only ~300-400 ms.
It is the DDB client method that has latency in seconds. It does not do any data transformation, just return the DB data as is to upper layers. Anything else that I should be monitoring to get to the root cause for this?
Thanks!
I don't have a full answer, but do have some directions you might want to investigate.
First, something I noticed in the past is that extremely long latencies may indicate that your client gave up and retried the request. Some clients hide this retry, and it just looks like a very slow request from the outside.
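To see how hidden retries could turn a ~0.3 s server-side latency into 10+ seconds at the caller, here is a back-of-the-envelope model. The per-attempt timeout, backoff and attempt count are hypothetical values, not settings taken from any particular client.

```python
def observed_latency(attempt_timeout_s, attempts, backoff_s, final_latency_s):
    """Latency seen by the caller when all but the last attempt time out silently."""
    failed_attempts = attempts - 1
    return failed_attempts * (attempt_timeout_s + backoff_s) + final_latency_s


# Hypothetical: 5 s per-attempt timeout, 0.1 s backoff, two timed-out
# attempts followed by a normal 0.3 s response.
total = observed_latency(5.0, attempts=3, backoff_s=0.1, final_latency_s=0.3)
print(round(total, 1))  # 10.5
```

If this is what is happening, comparing the client's configured timeout and retry settings with the observed latencies, and logging individual attempts, should confirm it.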
Second, you're right that on-demand billing mode doesn't throttle based on provisioned throughput, but it can nevertheless throttle; see https://aws.amazon.com/premiumsupport/knowledge-center/on-demand-table-throttling-dynamodb/. By default there are limits on the throughput that an on-demand table can have, as well as on how quickly the throughput may grow. These limits are at least partially for your protection: you wouldn't want a runaway application to accidentally do billions of requests and cost you a million dollars :-)

Dynamodb Autoscaling not working fast enough

I'm running a simple API that gets an item from a DynamoDB table on each call. I have auto scaling set to a minimum of 25 and a maximum of 10,000.
However, if I send 15,000 requests with a tool like wrk or hey, I get about 1000 502s:
DynamoDB's metrics show that reads are throttled
the scaling activities log on the table shows that the RCUs were scaled to 99 but not more than that
Lambda logs show that the function starts to take longer: it usually takes about 20ms to run, but it starts to run for 500, 1500, 3000 ms and then starts timing out (I'm assuming that's caused by the throttling)
Why isn't the autoscaling working better? It only scales up to 99 RCUs but my max is 10,000.
We ran into the same problem when testing DynamoDB autoscaling for short amounts of time, and it turns out the problem is that the scaling events only happen after 5 minutes of elevated throughput (you can see this by inspecting the CloudWatch alarms the autoscaling sets up)
This excellent blog post helped us solve this by creating a Lambda that responds to the CloudWatch API events and improves the responsiveness of the alarms to one minute: https://hackernoon.com/the-problems-with-dynamodb-auto-scaling-and-how-it-might-be-improved-a92029c8c10b
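A simplified model of why a short test never triggers scaling: treat the alarm as firing only after a number of consecutive one-minute datapoints exceed the threshold. The utilization series, the 70% threshold and the function itself are illustrative assumptions, not the actual alarm configuration.

```python
def first_alarm_minute(utilization_by_minute, threshold, evaluation_periods):
    """Return the 1-based minute at which the alarm would fire, or None.

    The alarm is modelled as firing once `evaluation_periods` consecutive
    one-minute datapoints exceed the threshold.
    """
    streak = 0
    for minute, value in enumerate(utilization_by_minute, start=1):
        streak = streak + 1 if value > threshold else 0
        if streak >= evaluation_periods:
            return minute
    return None


spike = [40, 95, 98, 97, 50, 45, 40]  # hypothetical % utilization per minute
print(first_alarm_minute(spike, threshold=70, evaluation_periods=5))  # None
print(first_alarm_minute(spike, threshold=70, evaluation_periods=1))  # 2
```

With five evaluation periods the three-minute spike passes unnoticed, while a one-minute alarm reacts during the spike, which is the improvement the Lambda approach aims for.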
from: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AutoScaling.html
What did you define as "target utilization"?
Target utilization is the ratio of consumed capacity units to provisioned capacity units, expressed as a percentage. Application Auto Scaling uses its target tracking algorithm to ensure that the provisioned read capacity of ProductCatalog is adjusted as required so that utilization remains at or near 70 percent.
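As a worked example of the ratio, here is a simplified model, not the actual Application Auto Scaling algorithm: given the consumed capacity, compute the utilization and the provisioned capacity that would bring utilization back to the target. The bounds of 25 and 10,000 come from the question; the function names are made up.

```python
import math


def utilization_percent(consumed, provisioned):
    """Consumed capacity as a percentage of provisioned capacity."""
    return consumed * 100 / provisioned


def target_provisioned(consumed, target_percent=70, minimum=25, maximum=10_000):
    """Provisioned capacity that would put utilization at the target, within bounds."""
    wanted = math.ceil(consumed * 100 / target_percent)
    return max(minimum, min(maximum, wanted))


print(utilization_percent(70, 100))   # 70.0
print(target_provisioned(500))        # 715
print(target_provisioned(10))         # 25 (clamped to the minimum)
print(target_provisioned(20_000))     # 10000 (clamped to the maximum)
```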
Also, I think the main reason autoscaling does not work for you is that your workload might not stay elevated for long enough:
DynamoDB auto scaling modifies provisioned throughput settings only when the actual workload stays elevated (or depressed) for a sustained period of several minutes. The Application Auto Scaling target tracking algorithm seeks to keep the target utilization at or near your chosen value over the long term.
Sudden, short-duration spikes of activity are accommodated by the table's built-in burst capacity. For more information, see Use Burst Capacity Sparingly.

Resources