DynamoDB tables uses more Read/Write capacity than expected - amazon-dynamodb

Background: I have a DynamoDB table which I interact exclusively with a DAO class. This DAO class logs metrics on the number of calls to insert/update/delete operations to the boto library.
I noticed that the # of operations I logged in my code do correlate with the consumed read/write capacity on AWS monitoring but the AWS measurements on consumption are 2 - 15 times the # of operations I logged in my code.
I know for a fact that the only other process interacting with the table is my manual queries on the AWS UI (which is insignificant in capacity consumption). I also know that the size of each item is < 1 KB, which would mean each call should only consume 1 read.
I use strong consistent reads so I do not enjoy the 2x benefit of eventual consistent reads.
I am aware that boto auto-retries at most 10 times when throttled but my throttling threshold is seldomly reached to trigger such a problem.
With that said, I wonder if anyone knows of any factor that may cause such a discrepency in # of calls to boto w.r.t. the actual consume capacities.

While I'm not sure of the support with the boto AWS SDK, in other languages it is possible to ask DynamoDB to return the capacity that was consumed as part of each request. It sounds like you are logging actual requests and not this metric from the API itself. The values returned by the API should accurately reflect what is consumed.
One possible source for this discrepancy is if you are doing query/scan requests where you are performing server side filtering. DynamoDB will consume the capacity for all of the records scanned and not just those returned.
Another possible cause of a discrepancy are the actual metrics you are viewing in the AWS console. If you are viewing the CloudWatch metrics directly make sure you are looking at the appropriate SUM or AVERAGE value depending on what metric you are interested in. If you are viewing the metrics in the DynamoDB console the interval you are looking at can dramatically affect the graph (ex: short spikes that appear in a 5 minute interval would be smoothed out in a 1 hour interval).

Related

How does azure cosmosdb change feed internal communication work?

Hi I am wandering how the internal mechanism of subscribing to an azure cosmosdb change feed actually works. Specifically if you are using azure-cosmosdb-js from node. Is there some sort of long polling mechanism that checks a change feed table or are events pushed to the subscriber using web-sockets?
Are there any limits on the number of subscriptions that you can have to any partition keys change feed?
Imagine the change feed as nothing other than an event source that keeps track of document changes.
All the actual change feed consuming logic is abstracted into the SDKs. The server just offers the change feed as something the SDK can consume. That is what the change feed processor libraries are using to operate.
We don't know much about the Change Feed Processor SDKs mainly because they are not open source. (Edit: Thanks to Matias for pointing out that they are actually open source now). However, from extensive personal usage I can tell you the following.
The Change Feed Processor will need a collection to store some documents. Those documents are just checkpoints for the processor to keep track of the consumption. Each main document in those lease collections corresponds to a physical partition. Each physical partition will be polled by the processor in a set interval. you can set that by setting the FeedPollDelay setting which "gets or sets the delay in between polling a partition for new changes on the feed, after all current changes are drained.".
The library is also capable of spreading the leases if multiple processors are running against a single collection. If a service fails, the running services will pick up the lease. Due to polling and delays you might end up reprocessing already processed documents. You can also choose to set the CheckpointFrequency of the change feed processor.
In terms of "subscriptions", you can have as many as you want. Keep in mind however that the change feed processor is writing the lease documents. They are smaller than 1kb so you will be paying the minimum charge of 10 RUs per change. However, if you end up with more than 40 physical partitions you might have to raise the throughput from the minimum 400 RU/s.

Monitor DocumentDb RU usage

Is there a way to programmatically monitor the Request Unit utilization of a DocumentDB database so we can manually increase the Request Units proactively?
There isn't currently a call you can execute to see the remaining RU, since they're replenished at every second. Chances are that the time it would take to request and process the current RU levels for a given second would return expired data.
To proactively increase RU/s, the best that can be done would be to use the data from the monitoring blade.
I think you could try below steps:
Use azure cosmos db Database - List Metrics rest api from here.
Create Azure Time Trigger Function to execute above code in schedule(maybe every 12 hours). If the metrics touch the threshold value,send a warning email to yourself.

Druid: how it uses cache and OS page cache?

I'm observing that Druid query performance can benefit from the previous queries. Thus, I'm trying to understand the reasons.
I Know that Druid uses cache (I'm using cache in the Broker), but this cache just stores the result of the queries per segment (right?). However, I have noticed that if the subsequent queries use the same segments, the performance improves.
Example:
Select sum(metric), dimteste2, dimteste3 from table x where dimteste='x' group by dimteste2, dimteste3 -> 2 seconds
Select sum(metric), dimteste2, dimteste3 from table x where dimteste3='y' group by dimteste2, dimteste3 -> 0.5 seconds
I searched and found that this behavior can be achieved by the OS page cache. Based on my research, I think that Druid, during the first query to the datasource, loads the necessary segments to memory (OS page cache). And the segments can be read faster in the next queries.
Am I right?
I looked in the Druid documentation and I was unable to find anything helpful.
Can you please give me some help explaining this awesome behavior?
Best regards,
José Correia
Druid does uses caching to improve the performance at various levels. At segment level on historicals and druid query level on broker. The more memory you give it the faster it works.
Below is the documentation on the caching -
Query Caching
Druid supports query result caching through an LRU cache. Results are stored on a per segment basis, along with the parameters of a given query. This allows Druid to return final results based partially on segment results in the cache and partially on segment results from scanning historical/real-time segments.
Segment results can be stored in a local heap cache or in an external distributed key/value store. Segment query caches can be enabled at either the Historical and Broker level (it is not recommended to enable caching on both).
Query caching on Brokers
Enabling caching on the broker can yield faster results than if query caches were enabled on Historicals for small clusters. This is the recommended setup for smaller production clusters (< 20 servers). Take note that when caching is enabled on the Broker, results from Historicals are returned on a per segment basis, and Historicals will not be able to do any local result merging.
Query caching on Historicals
Larger production clusters should enable caching only on the Historicals to avoid having to use Brokers to merge all query results. Enabling caching on the Historicals instead of the Brokers enables the Historicals to do their own local result merging and puts less strain on the Brokers.
Druid broker doesn't do anything directly at the OS page cache, if
there is virtual memory available by the os, than based on memory
requirements the heaps are allocated.

Integration testing a DynamoDB client which uses inconsistent reads?

Situation:
A web service with an API to read records from DynamoDB. It uses eventually consistent reads (GetItem default mode)
An integration test consisting of two steps:
create test data in DynamoDB
call the service to verify that it is returning the expected result
I worry that this test is bound to be fragile due to eventual consistency of the data.
If I attempt to verify the data immediately after writing using GetItem withConsistenRead=true it only guarantees that the data has been written to the majority of DB copies, but not all, so the service under test still has a chance to read from the non-updated copy on the next step.
Is there a way to ensure that the data has been written to all DynamoDB copies before proceeding?
The data usually reach all geographically distributed replicas in a second.
My suggestion is to wait (i.e. in Java terms sleep for few seconds) for couple of seconds before calling the web service should produce the desired result.
After inserting the data into DynamoDB table, wait for few seconds before calling the web service.
Eventually Consistent Reads (Default) – the eventual consistency
option maximizes your read throughput. However, an eventually
consistent read might not reflect the results of a recently completed
write. Consistency across all copies of data is usually reached within
a second. Repeating a read after a short time should return the
updated data.

Can I rely on riak as master datastore in e-commerce

In riak documentation, there are often examples that you could model your e-commerce datastore in certain way. But here is written:
In a production Riak cluster being hit by lots and lots of concurrent writes,
value conflicts are inevitable, and Riak Data Types
are not perfect, particularly in that they do not guarantee strong
consistency and in that you cannot specify the rules yourself.
From http://docs.basho.com/riak/latest/theory/concepts/crdts/#Riak-Data-Types-Under-the-Hood, last paragraph.
So, is it safe enough to user Riak as primary datastore in e-commerce app, or its better to use another database with stronger consistency?
Riak out of the box
In my opinion out of the box Riak is not safe enough to use as the primary datastore in an e-commerce app. This is because of the eventual consistency nature of Riak (and a lot of the NoSQL solutions).
In the CAP Theorem distributed datastores (Riak being one of them) can only guarentee at most 2 of:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response
about whether it succeeded or failed)
Partition tolerance (the system
continues to operate despite arbitrary partitioning due to network
failures)
Riak specifically errs on the side of Availability and Partition tolerance by having eventual consistency of data held in its datastore
What Riak can do for an e-commerce app
Using Riak out of the box, it would be a good source for the content about the items being sold in your e-commerce app (content that is generally written once and read lots is a great use case for Riak), however maintaining:
count of how many items left
money in a users' account
Need to be carefully handled in a distributed datastore.
Implementing consistency in an eventually consistent datastore
There are several methods you can use, they include:
Implement a serialization method when writing updates to values that you need to be consistent (ie: go through a single/controlled service that guarantees that it will only update a single item sequentially), this would need to be done outside of Riak in your API layer
Change the replication properties of your consistent buckets so that you can 'guarantee' you never retrieve out of date data
At the bucket level, you can choose how many copies of data you want
to store in your cluster (N, or n_val), how many copies you wish to
read from at one time (R, or r), and how many copies must be written
to be considered a success (W, or w).
The above method is similar to using the strong consistency model available in the latest versions of Riak.
Important note: In all of these data store systems (distributed or not) you in general will do:
Read the current data
Make a decision based on the current value
Change the data (decrement the Item count)
If all three of the above actions cannot be done in atomic way (either by locking or failing the 3rd if the value was changed by something else) an e-commerce app is open to abuse. This issue exists in traditional SQL storage solutions (which is why you have SQL Transactions).

Resources