How accurate is Cosmos DB's _ts timestamp? (time drift) - azure-cosmosdb

I would like to store events with timestamps, and I need clients to be able to retrieve them in the order in which they were stored. However, I'm wondering how much time drift may exist in the Cosmos DB storage layer: the machines that perform the writes may be off by a few seconds in their time synchronization, so an event that actually happened later could be registered as having happened earlier than another event. I can tolerate up to 1s of time drift, but beyond that clients may start to see odd effects.

The _ts property on Cosmos DB is accurate, and there is no clock skew between replicas, either during leader election or during regional failover.
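Since _ts is the server-assigned write time in Unix epoch seconds, you can read events back in commit order by sorting on it. A minimal sketch with the Python SDK (azure-cosmos), with made-up account, database and container names:

```python
# Minimal sketch (account, database and container names are placeholders):
# read events back ordered by the server-assigned _ts (Unix epoch seconds).
from azure.cosmos import CosmosClient

client = CosmosClient(
    url="https://<your-account>.documents.azure.com:443/", credential="<key>"
)
container = client.get_database_client("eventsdb").get_container_client("events")

query = "SELECT * FROM c ORDER BY c._ts ASC"
for item in container.query_items(query=query, enable_cross_partition_query=True):
    print(item["id"], item["_ts"])
```

Note that _ts has one-second resolution, so events written within the same second tie on _ts; if the order between those matters, add a secondary sort key of your own.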

Related

How can Cosmos DB guarantee availability similar to eventual consistency for a strongly consistent account?

Cosmos DB documentation seems to suggest that if we configure a strongly consistent Cosmos DB account with >= 3 regions, we get availability similar to eventual consistency (per the SLAs).
But according to the CAP theorem, how can this be the case? Suppose we have 3 regions, and a network partition isolates 1 read region from the remaining two (1 write and 1 read region). If a write request comes to the write region, there are two options:
Fail the request
Commit the write to the write region and the reachable read region. The region outside the partition cannot be reached.
If Cosmos DB goes with option 2, and a read request then comes to the region that could not be reached, Cosmos DB will return stale data because it uses a local quorum, which violates the consistency it guarantees.
Therefore, Cosmos DB must fail the write request in the face of network partitions.
This is accomplished by the use of a dynamic quorum over the regions when using 3+ regions. When one of the secondary read regions is impacted by a network partition, the service will remove it from the quorum, allowing writes to commit and replicate to the other online region for an RTO of 0.
The primary region periodically gets health signals from all regions that have been created for the account. It also keeps track of the commits that all regions have caught up to. Until the read region that was previously unavailable has caught up to commits it missed out on, it is not marked online. Once fully caught up, it starts accepting new writes from the primary region at steady state and is simultaneously marked available for serving read traffic.

Relational DB and kafka consumer sync

I have a requirement where I have to fetch messages from a topic, persist them to a DB, and poll at a 15-minute interval. Could somebody please suggest how to do this effectively using Spring-Kafka? Thanks.
If I understand the requirement correctly, you have pretty much described the solution: call poll every 15 minutes using a scheduled job. If you have a cluster of consumers, use a cron-based schedule so they all poll at the same time and retrieve all messages every 15 minutes; if it's just a single consumer, a simple fixed-rate schedule every 15 minutes is enough.
Each time you poll, persist the data from the records in your DB and commit the DB transaction. Use auto-commit in Kafka with auto.commit.interval.ms less than 15 minutes - that means each time you poll, the previous batch of records (which has definitely been persisted in the DB) will be committed in Kafka (note that auto-commit causes offsets to be committed synchronously inside the poll method).
The other important configuration to set is max.poll.interval.ms - make that greater than 15 minutes otherwise a rebalance would get triggered between polls.
Note that if a rebalance does occur (as is generally inevitable at some point) you will consume the same records again. Simplest approach here is to use a unique index in the database and just catch and ignore the exception if you try to store the same record.
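As a rough illustration of the above (using a plain kafka-python consumer and SQLite rather than Spring-Kafka; topic, group and table names are made up), the sketch below polls on a fixed schedule, persists with a unique key so redelivered records are ignored, and sets max.poll.interval.ms above the 15-minute cycle:

```python
# Sketch only, not Spring-Kafka: a plain kafka-python consumer polled every
# 15 minutes; topic, group, table and connection details are assumptions.
import sqlite3
import time

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                              # assumed topic name
    bootstrap_servers="localhost:9092",
    group_id="db-sync",
    enable_auto_commit=True,               # offsets committed automatically by the client
    auto_commit_interval_ms=60_000,        # well under the 15-minute cycle
    max_poll_interval_ms=20 * 60 * 1000,   # > 15 min so no rebalance between polls
    auto_offset_reset="earliest",
)

db = sqlite3.connect("events.db")
db.execute("CREATE TABLE IF NOT EXISTS events (key TEXT PRIMARY KEY, payload BLOB)")

while True:
    batch = consumer.poll(timeout_ms=5_000, max_records=500)
    for records in batch.values():
        for record in records:
            # The unique primary key makes redelivery after a rebalance harmless.
            key = record.key.decode() if record.key else f"{record.partition}-{record.offset}"
            db.execute(
                "INSERT OR IGNORE INTO events (key, payload) VALUES (?, ?)",
                (key, record.value),
            )
    db.commit()
    time.sleep(15 * 60)  # stand-in for a cron/scheduled job
```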
I would be interested in why you have the 15-minute requirement - the simplest way of consuming is just to poll continuously in a loop, or to use a framework like spring-kafka, which again polls frequently. If you are just persisting the data from the records, wouldn't it work regardless of the interval between batches of records? I guess you have some other constraint to deal with (or I have just misunderstood your requirements).
Note, as mentioned in my comments below, if you want to use spring-kafka there is an idle-between-polls property - https://docs.spring.io/spring-kafka/api/org/springframework/kafka/listener/ContainerProperties.html#setIdleBetweenPolls-long-
The other info above is still important if you take this approach rather than scheduling your own polling.

How does Azure Cosmos DB change feed internal communication work?

Hi, I am wondering how the internal mechanism of subscribing to an Azure Cosmos DB change feed actually works, specifically when using azure-cosmosdb-js from Node. Is there some sort of long-polling mechanism that checks a change feed table, or are events pushed to the subscriber over WebSockets?
Are there any limits on the number of subscriptions that you can have to any partition key's change feed?
Imagine the change feed as nothing other than an event source that keeps track of document changes.
All the actual change feed consuming logic is abstracted into the SDKs. The server just offers the change feed as something the SDK can consume. That is what the change feed processor libraries are using to operate.
We don't know much about the Change Feed Processor SDKs mainly because they are not open source. (Edit: Thanks to Matias for pointing out that they are actually open source now). However, from extensive personal usage I can tell you the following.
The Change Feed Processor needs a collection to store some documents. Those documents are just checkpoints for the processor to keep track of consumption. Each main document in those lease collections corresponds to a physical partition. Each physical partition is polled by the processor at a set interval; you can control that interval with the FeedPollDelay setting, which "gets or sets the delay in between polling a partition for new changes on the feed, after all current changes are drained."
The library is also capable of spreading the leases if multiple processors are running against a single collection. If a service fails, the running services will pick up the lease. Due to polling and delays you might end up reprocessing already processed documents. You can also choose to set the CheckpointFrequency of the change feed processor.
In terms of "subscriptions", you can have as many as you want. Keep in mind, however, that the change feed processor writes the lease documents. They are smaller than 1 KB, so you will be paying the minimum charge of 10 RUs per change. Also, if you end up with more than 40 physical partitions, you might have to raise the throughput above the minimum 400 RU/s.
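To make the polling model concrete, here is a minimal, hedged sketch using the azure-cosmos Python SDK (account, database and container names are placeholders, and continuation handling differs between SDK versions). It reads the change feed on an interval and keeps the continuation token as a crude checkpoint, which is roughly the job the processor's lease documents do for you:

```python
# Sketch of the polling model (not the Change Feed Processor itself); names and
# the continuation handling are assumptions and vary by SDK version.
import time

from azure.cosmos import CosmosClient

client = CosmosClient(
    url="https://<your-account>.documents.azure.com:443/", credential="<key>"
)
container = client.get_database_client("appdb").get_container_client("orders")

continuation = None
while True:
    if continuation is None:
        feed = container.query_items_change_feed(is_start_from_beginning=True)
    else:
        feed = container.query_items_change_feed(continuation=continuation)
    for change in feed:
        print("changed document:", change["id"])
    # The continuation token comes back on the response headers; persisting it
    # (e.g. into a lease document) is what the processor library automates.
    continuation = container.client_connection.last_response_headers.get("etag")
    time.sleep(5)  # crude stand-in for FeedPollDelay
```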

Monitor DocumentDb RU usage

Is there a way to programmatically monitor the Request Unit utilization of a DocumentDB database so we can manually increase the Request Units proactively?
There isn't currently a call you can execute to see the remaining RUs, since they're replenished every second. Chances are that by the time you had requested and processed the current RU level for a given second, the data would already be stale.
To proactively increase RU/s, the best that can be done would be to use the data from the monitoring blade.
I think you could try below steps:
Use the Azure Cosmos DB Database - List Metrics REST API from here (a minimal sketch of the call is shown below).
Create an Azure Timer Trigger Function to run that call on a schedule (maybe every 12 hours). If the metrics touch the threshold value, send a warning email to yourself.
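A minimal sketch of step 1, assuming an AAD bearer token and the documented Database - List Metrics endpoint; the IDs, the metric name and the filter are placeholders to fill in from the linked REST API reference:

```python
# Hedged sketch: call the "Database - List Metrics" management REST API and
# print the returned data points. All identifiers and the metric name in the
# $filter are assumptions; check the linked docs for the exact values (the
# filter may also require startTime/endTime clauses).
import requests

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
ACCOUNT = "<cosmos-account>"
DATABASE_RID = "<database-rid>"
TOKEN = "<aad-bearer-token>"

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DocumentDB"
    f"/databaseAccounts/{ACCOUNT}/databases/{DATABASE_RID}/metrics"
)
params = {
    "api-version": "2015-04-08",
    "$filter": "(name.value eq 'Total Request Units') and timeGrain eq duration'PT5M'",
}
resp = requests.get(url, params=params, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()

for metric in resp.json().get("value", []):
    for point in metric.get("metricValues", []):
        print(point.get("timestamp"), point.get("total"))
        # Compare against your provisioned RU/s threshold here and send the
        # warning email from the timer-triggered function when it is exceeded.
```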

DynamoDB table uses more read/write capacity than expected

Background: I have a DynamoDB table that I interact with exclusively through a DAO class. This DAO class logs metrics on the number of insert/update/delete calls it makes to the boto library.
I noticed that the number of operations I logged in my code does correlate with the consumed read/write capacity in AWS monitoring, but the measured consumption is 2-15 times the number of operations I logged.
I know for a fact that the only other process interacting with the table is my manual queries on the AWS UI (which is insignificant in capacity consumption). I also know that the size of each item is < 1 KB, which would mean each call should only consume 1 read.
I use strongly consistent reads, so I do not enjoy the 2x benefit of eventually consistent reads.
I am aware that boto auto-retries at most 10 times when throttled, but my throttling threshold is seldom reached, so that is unlikely to be the cause.
With that said, I wonder if anyone knows of any factor that may cause such a discrepancy between the number of calls to boto and the actual consumed capacity.
While I'm not sure of the support with the boto AWS SDK, in other languages it is possible to ask DynamoDB to return the capacity that was consumed as part of each request. It sounds like you are logging actual requests and not this metric from the API itself. The values returned by the API should accurately reflect what is consumed.
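boto3, for what it's worth, does expose this: pass ReturnConsumedCapacity on a request and the response includes a ConsumedCapacity block you can log instead of just counting calls. A minimal sketch (table and key names are made up):

```python
# Minimal boto3 sketch: ask DynamoDB to report the capacity each call consumed.
# Table name and key are placeholders.
import boto3

table = boto3.resource("dynamodb").Table("my-table")

response = table.get_item(
    Key={"id": "item-123"},
    ConsistentRead=True,               # strongly consistent, as in the question
    ReturnConsumedCapacity="TOTAL",
)
print(response["ConsumedCapacity"])    # e.g. {'TableName': 'my-table', 'CapacityUnits': 1.0}
```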
One possible source of the discrepancy is query/scan requests that perform server-side filtering. DynamoDB consumes capacity for all of the records scanned, not just those returned.
Another possible cause is the actual metrics you are viewing in the AWS console. If you are viewing the CloudWatch metrics directly, make sure you are looking at the appropriate SUM or AVERAGE value depending on which metric you are interested in. If you are viewing the metrics in the DynamoDB console, the interval you are looking at can dramatically affect the graph (e.g. short spikes that appear at a 5-minute interval would be smoothed out at a 1-hour interval).
