I have the following use case:
we have a single-write-region Azure Cosmos DB account
the DB will be replicated to other Azure regions (e.g. 5 additional Azure regions treated as read replicas)
we have a daily ETL job that cannot interrupt users querying the database. Because of that, we're rate limiting the requests we make to Cosmos in the application layer - e.g. we're consuming only 5K RU/s out of the 10K RU/s provisioned (to be precise, we're provisioning 1K RU/s with the Auto-Scale setting). Thanks to that, while the ETL job is running we're consuming 50% of the available RUs.
Question:
Is it possible that during replication we will hit 100% RU utilization in one of the read replicas because Cosmos DB will try to replicate everything as fast as possible?
It depends on (1) whether the ETL is reading from Cosmos DB as a source or writing to Cosmos DB as a target and (2) what the aggregate workload (ETL + app) looks like.
I'll explain -
The best way to think about RUs is as a proxy metric for the physical system resources it takes to perform a request (CPU, memory, IOPS).
Writes must be applied to all regions - and therefore consume RUs (CPU/memory/IOPS) in each of the replicated regions. Given an example 3-region setup consisting of West US + East US + North Europe, writing a record will result in RU consumption in West US, East US, and North Europe.
Reads can be served out of a single region independently of the other regions. Given the same 3-region setup, reading a record in West US has no impact on East US or North Europe.
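To make that concrete, here is a small back-of-the-envelope sketch in Python (no SDK involved; the 10 RU write charge, 3 RU read charge, and request rates are made-up numbers) showing how the same workload lands on each region's provisioned throughput:

    # Hypothetical per-operation charges - real charges depend on item size, indexing, etc.
    REGIONS = ["West US", "East US", "North Europe"]
    WRITE_CHARGE_RU = 10
    READ_CHARGE_RU = 3

    def ru_consumed_per_region(writes_per_sec, reads_per_sec_in_west_us):
        """Writes are replicated everywhere; reads only load the region that serves them."""
        usage = {region: writes_per_sec * WRITE_CHARGE_RU for region in REGIONS}
        usage["West US"] += reads_per_sec_in_west_us * READ_CHARGE_RU
        return usage

    print(ru_consumed_per_region(writes_per_sec=100, reads_per_sec_in_west_us=200))
    # {'West US': 1600, 'East US': 1000, 'North Europe': 1000}
    # 100 writes/s consume 1,000 RU/s in every region; the 200 reads/s only load West US.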
As you suggested -
Rate-limiting the ETL job is a good choice. Depending on what your ETL tool is, a few of them have easy-to-use client rate-limiting configuration options (notably, Azure Data Factory data flows and the Spark connector for Cosmos DB's Core (SQL) API have a "write throughput budget" concept). Alternatively, you can scale down the ETL job itself so that it becomes the natural bottleneck.
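As a rough illustration of application-layer rate limiting, here is a minimal sketch (Python, azure-cosmos 4.x) that keeps the ETL's average consumption under a self-imposed RU/s budget by reading the actual charge from the x-ms-request-charge response header. The etl_batch iterable and container client are placeholders, and last_response_headers is an SDK-internal convenience - adjust for your SDK version or tooling:

    import time

    RU_BUDGET_PER_SEC = 5000   # the self-imposed cap from the question (50% of 10K)

    class RuThrottle:
        """Keep the average RU/s consumed by this process at or below a budget."""
        def __init__(self, ru_per_sec):
            self.ru_per_sec = ru_per_sec
            self.start = time.monotonic()
            self.consumed = 0.0

        def record(self, request_charge):
            self.consumed += request_charge
            ahead = self.consumed / self.ru_per_sec - (time.monotonic() - self.start)
            if ahead > 0:
                time.sleep(ahead)   # we've spent RUs faster than the budget allows

    throttle = RuThrottle(RU_BUDGET_PER_SEC)
    for item in etl_batch:          # etl_batch: whatever your ETL job produces
        container.upsert_item(item)
        charge = float(container.client_connection.last_response_headers["x-ms-request-charge"])
        throttle.record(charge)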
Configuring the autoscale maximum to leave sufficient headroom for [RU/s needed for the rate-limited ETL] + [upper bound of expected RU/s needed for the application] is a good call as well - while also noting that Cosmos DB autoscale comes with a 10x scaling factor (e.g. configuring a 20K RU/s maximum results in automatic scaling between 2K and 20K RU/s).
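For reference, a minimal sketch of configuring that autoscale maximum with the azure-cosmos Python SDK (assuming a 4.x version that supports ThroughputProperties(auto_scale_max_throughput=...); account, key, database, and container names are placeholders):

    from azure.cosmos import CosmosClient, PartitionKey, ThroughputProperties

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    db = client.create_database_if_not_exists("appdb")

    # A 20K maximum means Cosmos DB autoscales the container between 2K and 20K RU/s (10x factor).
    container = db.create_container_if_not_exists(
        id="events",
        partition_key=PartitionKey(path="/pk"),
        offer_throughput=ThroughputProperties(auto_scale_max_throughput=20000),
    )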
One side note worth mentioning: depending on the use case for the ETL job - if this is a classic ETL from OLTP => OLAP, it may be worth looking at Cosmos DB's analytical store + Synapse Link feature set as an easier out-of-the-box solution.
Related
Currently, there are only two multi-region locations for Firestore, but if you select one of them, will access from other regions be quicker?
When I accessed an app that uses the Tokyo region from Germany, I was surprised at how slow Firestore was.
Will access from other regions be quicker?
Before you use Cloud Firestore, you must choose a location for your database. To reduce latency and increase availability, store your data close to the users and services that need it. Select a regional location for lower costs, for lower write latency if your application is sensitive to latency, or for co-location with other GCP resources.
Will selecting the Firestore multi-region speed up access from regions around the world?
Multi-region locations can withstand the loss of entire regions and maintain availability without losing data. Global apps can take advantage of Firestore's multi-region deployment. Having multiple servers distributed worldwide reduces latency for end users, increases performance, and means data will not be lost in the event of a catastrophic event in a single datacenter region.
Note:
When choosing the database location, it is important to know the best practices you can follow. That being said, when you create your database instance, select the database location closest to your users and compute resources. Far-reaching network hops are more error-prone and increase query latency.
This document on Cosmos DB consistency levels and latency says Cosmos DB writes have a latency of 10 ms at the 99th percentile. Does this include the time it takes for the write to reach Cosmos DB? I suspect not, since if I issue a request from far away from my configured Azure regions, I don't see how it can take < 10 ms.
The SLA is for the latency involved in performing the operations and returning results. As you mention, it does not include the time taken to reach the Cosmos DB endpoint, which depends on the client's distance.
As indicated in performance guidance:
You can get the lowest possible latency by ensuring that the calling application is located within the same Azure region as the provisioned Azure Cosmos DB endpoint.
In my experience, latency < 10 ms is typical for an app located in the same region as the Cosmos DB endpoint it works against.
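If you want to check this yourself, here is a minimal sketch (Python, azure-cosmos 4.x; account, key, database, container, and item names are placeholders, and preferred_locations is the option I'd expect to use for region routing - verify against your SDK version) that times a point read from a client pinned to a specific region:

    import time
    from azure.cosmos import CosmosClient

    # Prefer the region the app runs in; the SDK falls back to other regions if it is unavailable.
    client = CosmosClient(
        "https://<account>.documents.azure.com:443/",
        credential="<key>",
        preferred_locations=["West US"],
    )
    container = client.get_database_client("appdb").get_container_client("events")

    start = time.perf_counter()
    container.read_item(item="item-id", partition_key="pk-value")   # point read
    print(f"round trip: {(time.perf_counter() - start) * 1000:.1f} ms")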
I am using DynamoDB global tables for one of my services, and the table is provisioned with the same RCUs/WCUs (rWCUs) for all regions, as per the general recommendations. I am not using on-demand mode.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/globaltables_reqs_bestpractices.html#globaltables_reqs_bestpractices.tables
I understand that the WCUs should be kept consistent across regions to allow writes to replicate. However, the read traffic for my service varies quite a lot across regions, so I was wondering if it is OK to configure different RCUs per region? The documentation doesn't specifically mention anything about RCUs.
It is safe to keep different RCUs in different regions. A typical use case is an active-passive multi-region architecture.
But if failover from one region to another is automatic, you should make sure that the passive region will be able to handle the resulting burst of traffic.
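For reference, with global tables version 2019.11.21 the per-replica read capacity can be overridden via UpdateTable; a minimal boto3 sketch (the table name, regions, and the 100-RCU figure are placeholders):

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    # Give the eu-west-1 replica its own read capacity; write capacity stays symmetric.
    dynamodb.update_table(
        TableName="my-global-table",
        ReplicaUpdates=[
            {
                "Update": {
                    "RegionName": "eu-west-1",
                    "ProvisionedThroughputOverride": {"ReadCapacityUnits": 100},
                }
            }
        ],
    )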
What's your preferred strategy for dealing with DAX's maintenance windows?
DynamoDB itself has no maintenance windows and is very highly available. When DAX is introduced into the mix, if it is the clients' sole access point to DDB, then it becomes a SPOF. How do you then handle degradation gracefully during DAX's scheduled downtime?
My thinking was to not use the DAX Client directly but introduce some abstraction layer that allows it to fall back to direct DDB access when DAX is down. Is that a good approach?
A DAX maintenance window doesn't take the cluster offline, unless it is a one-node cluster. DAX provides availability through multiple nodes in the cluster. For a multi-node cluster, each node goes through maintenance in a specific order so that the cluster remains available. With retries configured on the DAX client, your workload shouldn't see an impact during maintenance windows.
Beyond the maintenance window, cluster nodes should be spread across multiple AZs, for availability in case an AZ goes down.
An abstraction layer that falls back to DDB is not a bad idea. But you need to make sure you have enough provisioned capacity configured to handle the resulting load spike.
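A minimal sketch of such a fallback layer (Python, assuming the amazondax package and boto3; the table name and cluster endpoint are placeholders, and the broad exception handling is deliberate for the sketch):

    import boto3
    from amazondax import AmazonDaxClient

    TABLE_NAME = "my-table"
    DAX_ENDPOINT = "my-cluster.xxxx.dax-clusters.us-east-1.amazonaws.com:8111"

    ddb_table = boto3.resource("dynamodb").Table(TABLE_NAME)
    dax_table = AmazonDaxClient.resource(endpoint_url=DAX_ENDPOINT).Table(TABLE_NAME)

    def get_item(key):
        """Try the DAX cluster first; on any DAX-side failure, read DynamoDB directly."""
        try:
            return dax_table.get_item(Key=key).get("Item")
        except Exception:   # broad on purpose: any DAX failure routes around the cache
            return ddb_table.get_item(Key=key).get("Item")

Reads that fall back go straight to the table, which is why the provisioned-capacity point above matters.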
I have never fully understood Google's pricing policy; it's a little confusing for me. I'm currently testing Google Compute Engine to try to understand how it all works.
In a simple example, using the Cloud Launcher WordPress deployment, there is an estimated cost of $4,49, using a VM instance with 1 shared vCPU + 0.6 GB of memory (f1-micro) and a standard 10 GB disk.
In less than 10 days of testing - I am the only user, the instance stayed up for the whole period, and my usage was very light - I began tracking the billing details.
Look at the numbers:
Generic Micro instance with burstable CPU, no scratch disk: 4.627 minutes = $0,62
Storage PD Capacity: 1,92 GB-month = $0,08
And my big surprise
Network Internet Egress from Americas to Americas: 12,82 GB = $1,54
I am aware that this amount is very small; that is very clear.
But imagine, for example, 100 people making the same use in the same period:
Network Internet Egress from Americas to Americas would jump to $154,00.
Is my reasoning correct?
Is there a way to lower this value?
Another question:
Which has the lower cost, Google Compute Engine or Google App Engine?
Don't buy a web server on a cloud platform unless you know the pricing strategy inside out.
Yes, GCP and other cloud platforms charge a hefty sum for egress/outgoing traffic if you are not careful - e.g. if your website gets hit by a DDoS, you will be doomed with a huge bill. As shown in the pricing table, GCP charges $0,12/GB for egress:
# Europe uses "," as the decimal separator
12,82 GB x $0,12/GB = $1,5384 ~ $1,54
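A tiny sketch of that projection (it assumes every user generates the same 12.82 GB of egress as your test did):

    EGRESS_PRICE_PER_GB = 0.12      # Americas-to-Americas internet egress, USD

    def egress_cost(gb_per_user, users):
        return gb_per_user * users * EGRESS_PRICE_PER_GB

    print(egress_cost(12.82, 1))     # ~1.54   (the bill above)
    print(egress_cost(12.82, 100))   # ~153.84 (roughly the $154 projected in the question)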
If you expect more traffic, then you should look into Google Cloud CDN, which charges a lower egress price. However, you need to be careful when using a CDN: it will also charge for ingress traffic (so only allow traffic between your storage repo and the CDN).
It is a good idea to set up alarms/event alerts to warn you about abnormal traffic.
Since you are in the cloud, you should compare the prices of different CDN services.
(Update)
Google App Engine (GAE) is just a Platform as a Service, for which Google gives you daily free resources, i.e. 1 GB of egress per day. The $0,12/GB price still applies if you go above that limit. In addition, you are limited to the offerings provided; there is no comparable web server offering on GAE at the moment.