Monitor DocumentDB RU usage

Is there a way to programmatically monitor the Request Unit utilization of a DocumentDB database so we can manually increase the Request Units proactively?

There isn't currently a call you can execute to see the remaining RUs, since they are replenished every second. Chances are that by the time you had requested and processed the current RU level for a given second, the data would already be stale.
To proactively increase RU/s, the best you can do is use the data from the monitoring blade.

I think you could try the following steps:
Use the Azure Cosmos DB Database - List Metrics REST API.
Create an Azure timer-triggered Function to run that check on a schedule (maybe every 12 hours). If the metrics cross your threshold value, send yourself a warning email.
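For illustration, here is a rough Python sketch of what that timer-triggered check might look like. The resource names are placeholders, and the metric name and $filter grammar are assumptions based on the Database - List Metrics endpoint, so verify them against the current REST documentation; send_warning_email is a hypothetical helper.

```python
import datetime
import requests

# Placeholder identifiers for the Cosmos DB (DocumentDB) account.
SUB, RG, ACCOUNT, DB_RID = "<subscription-id>", "<resource-group>", "<account>", "<database-rid>"
METRICS_URL = (
    f"https://management.azure.com/subscriptions/{SUB}/resourceGroups/{RG}"
    f"/providers/Microsoft.DocumentDB/databaseAccounts/{ACCOUNT}/databases/{DB_RID}/metrics"
)

def check_ru_usage(bearer_token, threshold):
    """Intended to run from a timer-triggered Function, e.g. every 12 hours."""
    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(hours=12)
    # Metric name and filter grammar are assumptions; check the REST docs.
    flt = (
        "(name.value eq 'Total Request Units') and timeGrain eq duration'PT5M' "
        f"and startTime eq '{start:%Y-%m-%dT%H:%M:%SZ}' "
        f"and endTime eq '{end:%Y-%m-%dT%H:%M:%SZ}'"
    )
    resp = requests.get(
        METRICS_URL,
        params={"api-version": "2015-04-08", "$filter": flt},
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    resp.raise_for_status()
    for metric in resp.json().get("value", []):
        for point in metric.get("metricValues", []):
            if point.get("total", 0) > threshold:
                send_warning_email(metric, point)  # hypothetical alerting helper
```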

Related

setTimeout() on realtime database trigger in Google cloud functions

I have a Firebase realtime database structure that looks like this:
rooms/
  room_id/
    user_1/
      name: 'hey',
      connected: true
connected is a Boolean indicating whether the user is connected, and it will be set to false using the onDisconnect() handler that Firebase provides.
Now my question is: if I trigger a Cloud Function every time the connected property of a user changes, can I run a setTimeout() for 45 seconds? If the connected property is still false at the end of the setTimeout() (for which I read that particular connected value from the DB), then I delete the node of the user (like the user_1 node above).
Will this setTimeout() pose a problem if there are many triggers fired simultaneously?
In short, Cloud Functions have a maximum time they can run.
If your timeout fires its callback after that limit has expired, the function will already have been terminated.
Consider a very simple and efficient way of running scheduled code in Cloud Functions: a cron job.
If you use setTimeout() to delay the execution of a function for 45 seconds, that's probably not enough time to cause a problem. The default timeout for a function is 1 minute, and if you exceed that, the function will be forced to terminate immediately. If you are concerned about this, you should simply increase the timeout.
Bear in mind that you are paying for the entire time that a function executes, so even if you pause the function, you will be billed for that time. If you want a more efficient way of delaying some work for later, consider using Cloud Tasks to schedule that.
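To illustrate the Cloud Tasks idea (not Firebase-specific code): a hedged Python sketch using the google-cloud-tasks client to enqueue an HTTP task with a schedule_time roughly 45 seconds in the future, so the "is the user still disconnected?" check runs later in a separate, short-lived request instead of keeping a function alive. The project, queue and handler URL are placeholders.

```python
import datetime
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2

client = tasks_v2.CloudTasksClient()
parent = client.queue_path("my-project", "us-central1", "presence-queue")  # placeholders

# Schedule the check for ~45 seconds from now.
when = timestamp_pb2.Timestamp()
when.FromDatetime(datetime.datetime.utcnow() + datetime.timedelta(seconds=45))

task = {
    "http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        "url": "https://example.com/check-disconnect",   # hypothetical handler endpoint
        "headers": {"Content-Type": "application/json"},
        "body": b'{"room": "room_id", "user": "user_1"}',
    },
    "schedule_time": when,
}
client.create_task(request={"parent": parent, "task": task})
```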
If I understand correctly, your intent is to monitor the users that are connected, and stay connected, to the Firebase Realtime Database from the Cloud Function. Is that correct?
For monitoring the Firebase Realtime Database, GCP provides tooling to monitor database performance and usage.
Or do you simply want to keep the connection alive?
If the requests to the Firebase Realtime DB are RESTful requests like GET and PUT, the connection is only kept per request, but with a high volume of requests this still costs more.
Normally, we suggest that clients use the native SDKs for your app's platform instead of the REST API. The SDKs maintain open connections, reducing the SSL encryption costs and database load that can add up with the REST API.
However, if you do use the REST API, consider using HTTP keep-alive to maintain an open connection, or use server-sent events and set keep-alive, which can reduce costs from SSL handshakes.
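To illustrate the keep-alive point for the REST case, a small Python sketch: requests.Session re-uses the underlying TCP/SSL connection across calls, so repeated GET/PUT requests against the Realtime Database REST endpoint do not pay a fresh handshake each time. The database URL and path are placeholders, and any auth parameter is omitted.

```python
import requests

# One Session = one pooled, keep-alive connection re-used across requests.
session = requests.Session()
BASE = "https://<your-db>.firebaseio.com"   # placeholder Realtime Database URL

def set_connected(room_id, user_id, value):
    # Repeated calls over the same session avoid a new SSL handshake per request.
    resp = session.put(f"{BASE}/rooms/{room_id}/{user_id}/connected.json", json=value)
    resp.raise_for_status()
```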

Relational DB and kafka consumer sync

I have a unique requirement where I have to fetch messages from a topic, persist them in a DB, and poll at a 15-minute interval. Could somebody please suggest how to do this effectively using Spring-Kafka? Thanks.
If I understand the requirement correctly, you pretty much describe the solution - call poll every 15 minutes using a scheduled job. If you have a cluster of consumers, use a cron-based schedule so they all poll at the same time and retrieve all messages every 15 minutes; if it's just a single consumer, a simple schedule that runs every 15 minutes is enough.
Each time you poll, persist the data from the records in your DB and commit the DB transaction. Use auto commit in Kafka with commit.interval.ms less than 15 minutes - that means each time you poll, the previous batch of records (which have definitely been persisted in the DB) will be committed in Kafka (note that auto-commit causes commit of offsets to happen synchronously in the poll method).
The other important configuration to set is max.poll.interval.ms - make that greater than 15 minutes otherwise a rebalance would get triggered between polls.
Note that if a rebalance does occur (as is generally inevitable at some point) you will consume the same records again. Simplest approach here is to use a unique index in the database and just catch and ignore the exception if you try to store the same record.
I would be interested in why you have the 15-minute requirement - the simplest way of consuming is just to poll continuously in a loop, or to use a framework like spring-kafka, which again polls frequently. If you are just persisting the data from the records, wouldn't it work regardless of the interval between batches of records? I guess you have some other constraint to deal with (or I have just misunderstood your requirements).
Note, as mentioned in my comments below if you want to use spring-kafka there is an idle between polls property - https://docs.spring.io/spring-kafka/api/org/springframework/kafka/listener/ContainerProperties.html#setIdleBetweenPolls-long-
The other info above is still important if you take this approach rather than scheduling your own polling.
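Since the question asks about Spring-Kafka but the pattern is easier to show stripped down, here is a hedged sketch using the Python confluent-kafka client instead; the broker, topic, group and the save_batch_to_db DAO call are placeholders. It illustrates the two configuration points from the answer: auto-commit with an interval below 15 minutes, and max.poll.interval.ms above it.

```python
import time
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker
    "group.id": "db-sync",                   # placeholder consumer group
    "auto.offset.reset": "earliest",
    "enable.auto.commit": True,
    "auto.commit.interval.ms": 600_000,      # commit offsets well inside the 15-minute window
    "max.poll.interval.ms": 20 * 60 * 1000,  # > 15 minutes, so no rebalance between cycles
})
consumer.subscribe(["my-topic"])             # placeholder topic

while True:
    # Drain whatever is currently available, persist it, then wait 15 minutes.
    records = consumer.consume(num_messages=500, timeout=5.0)
    if records:
        save_batch_to_db(records)            # hypothetical DAO call; commit the DB transaction here
    time.sleep(15 * 60)
```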

Can the Google Calendar API events watch be used without risking to exceed the usage quotas?

I am using the Google Calendar API to preprocess events that are being added (adjust their content depending on certain values they may contain). This means that theoretically I need to update any number of events at any given time, depending on how many are created.
The Google Calendar API has usage quotas, especially one stating a maximum of 500 operations per 100 seconds.
To tackle this I am using a time-based trigger (every 2 minutes) that does up to 500 operations (and only updates sync tokens when all events are processed). The downside of this approach is that I have to run a check every 2 minutes, whether or not anything has actually changed.
I would like to replace the time-based trigger with a watch. I'm not sure though if there is any way to limit the amount of watch calls so that I can ensure the 100 seconds quota is not exceeded.
My research so far shows me that it cannot be done. I'm hoping I'm wrong. Any ideas on how this can be solved?
AFAIK, that is one of the best practices suggested by Google. Using watch and push notifications allows you to eliminate the extra network and compute costs involved with polling resources to determine whether they have changed. Here are some tips from this blog to best manage working within the quota:
Use push notifications instead of polling.
If you cannot avoid polling, make sure you only poll when necessary (for example, poll very rarely at night).
Use incremental synchronization with sync tokens for all collections instead of repeatedly retrieving all the entries.
Increase page size to retrieve more data at once by using the maxResults parameter.
Update events when they change, avoid re-creating all the events on every sync.
Use exponential backoff for error retries.
Also, if you cannot avoid exceeding your current limit, you can always request additional quota.
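For reference, a minimal Python sketch of registering a watch channel with the google-api-python-client, assuming you already have authorized credentials (creds) and a verified HTTPS endpoint; the address and stored_sync_token below are placeholders. When a notification arrives you do an incremental events().list() with your stored sync token rather than re-reading everything.

```python
import uuid
from googleapiclient.discovery import build

service = build("calendar", "v3", credentials=creds)  # assumes authorized credentials

channel = service.events().watch(
    calendarId="primary",
    body={
        "id": str(uuid.uuid4()),                          # your channel id
        "type": "web_hook",
        "address": "https://example.com/notifications",   # placeholder: must be a verified HTTPS endpoint
    },
).execute()

# On each push notification, sync incrementally instead of polling everything:
events = service.events().list(calendarId="primary", syncToken=stored_sync_token).execute()
```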

Overcome Marketo's quota limits

As far as I know, Marketo limits the number of REST API requests to 10,000 per day. Is there a way to overcome this limit? Can I pay and get more of those?
I found out that the REST API requests and the SOAP API requests counts separately but I'm trying to find a solution that is limited to REST API.
Moreover, in order to get an access token I need to sacrifice a request. I need to know how long this access token stays alive in order to save as many requests as possible.
You can increase your limit just by asking your account manager. It costs about 15K per year to increase your limit by 10K API calls.
Here are the default limits in case you don't have them yet:
Default Daily API Quota: 10,000 API calls (counter resets daily at 12:00 AM CST)
Rate Limit: 100 API calls in a 20 second window
Documentation: REST API
You'll want to ask your Marketo account manager about this.
I thought I would update this with some more information since I get this question a lot:
http://developers.marketo.com/rest-api/
Daily Quota: Most subscriptions are allocated 10,000 API calls per day (which resets daily at 12:00AM CST).  You can increase your daily quota through your account manager.
Rate Limit: API access per instance limited to 100 calls per 20 seconds.
Concurrency Limit:  Maximum of 10 concurrent API calls.
For the Daily limit:
Option 1: Call your account manager. This will cost you $'s. For a client I work for we have negotiated a much higher limit.
Option 2: Store and batch your records. For example, you can send a batch of 300 leads in a single lead insert/update call, which means you can insert/update up to 3,000,000 leads per day (see the sketch after this list).
For the Rate limit:
Option 1 will probably not work. Your account manager will be reluctant to change this unless you are a very large company.
Option 2: You need to add some governance to your code. There are several ways to do this, including queues, timers with a counter, etc. If you make multi-threaded calls, you will need to take into account concurrency etc.
Concurrent call limit:
You have to limit your concurrent threads to 10.
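A hedged Python sketch of the batching idea from Option 2 above: push leads in chunks of up to 300 through the standard leads sync endpoint instead of one call per record. The instance URL and lookup field are placeholders; check the Marketo REST docs for the exact payload your subscription expects.

```python
import requests

BASE = "https://<munchkin-id>.mktorest.com"   # placeholder instance URL

def push_leads(access_token, leads):
    """Send leads in batches of 300 - one API call per 300 records instead of one per record."""
    for i in range(0, len(leads), 300):
        chunk = leads[i:i + 300]
        resp = requests.post(
            f"{BASE}/rest/v1/leads.json",
            params={"access_token": access_token},
            json={"action": "createOrUpdate", "lookupField": "email", "input": chunk},
        )
        resp.raise_for_status()
```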
There are multiple ways to handle API Quota limits.
If you want to avoid hitting the API limit altogether, try to achieve your functionality through Marketo webhooks. Marketo webhooks do not have API limits, but they have their own cons; please research this.
You may use the REST API, but design your strategy to batch the maximum number of records into a single payload instead of smaller chunks - e.g. instead of sending 10 different API calls with 20 records each, accumulate the maximum allowed payload and call the Marketo API once.
The access token is valid for 1 hour after authenticating.
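Since the token lives about an hour, caching it avoids spending a call on /identity/oauth/token for every request. A small sketch, where the base URL and credentials are placeholders and expires_in comes back in the token response:

```python
import time
import requests

_token = {"value": None, "expires_at": 0.0}

def get_token(base_url, client_id, client_secret):
    # Re-use the cached token until shortly before it expires (~1 hour).
    if time.time() < _token["expires_at"] - 60:
        return _token["value"]
    resp = requests.get(
        f"{base_url}/identity/oauth/token",
        params={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
    )
    resp.raise_for_status()
    data = resp.json()
    _token["value"] = data["access_token"]
    _token["expires_at"] = time.time() + data["expires_in"]
    return _token["value"]
```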
Marketo's Bulk API can be helpful with regard to rate limiting, since once you have the raw activities, the updates etc. on the lead object can be done without pinging Marketo for each lead: http://developers.marketo.com/rest-api/bulk-extract/. However, be aware of the export limits you may run into when bulk-exporting leads plus activities. Currently, Marketo only counts the size of an export against the limit once the job has completed, which means that as a workaround you can launch at most 2 concurrent export jobs (which together sum to more than the limit) at the same time. Marketo will not kill a running job when a limit is reached, as long as the job was launched before the limit was reached.
Marketo has recently upgraded the maximum limits:
Daily Quota: Subscriptions are allocated 50,000 API calls per day (which resets daily at 12:00AM CST). You can increase your daily quota through your account manager.
Rate Limit: API access per instance limited to 100 calls per 20 seconds.
Concurrency Limit: Maximum of 10 concurrent API calls.
https://developers.marketo.com/rest-api/

DynamoDB tables uses more Read/Write capacity than expected

Background: I have a DynamoDB table which I interact with exclusively through a DAO class. This DAO class logs metrics on the number of calls to insert/update/delete operations made through the boto library.
I noticed that the number of operations I logged in my code does correlate with the consumed read/write capacity shown in AWS monitoring, but the AWS consumption measurements are 2 - 15 times the number of operations I logged.
I know for a fact that the only other process interacting with the table is my manual queries on the AWS UI (which is insignificant in capacity consumption). I also know that the size of each item is < 1 KB, which would mean each call should only consume 1 read.
I use strongly consistent reads, so I do not enjoy the 2x benefit of eventually consistent reads.
I am aware that boto auto-retries at most 10 times when throttled, but my throttling threshold is seldom reached, so that shouldn't be triggering such a problem.
With that said, I wonder if anyone knows of any factor that may cause such a discrepancy between the number of calls to boto and the actual consumed capacity.
While I'm not sure of the support with the boto AWS SDK, in other languages it is possible to ask DynamoDB to return the capacity that was consumed as part of each request. It sounds like you are logging actual requests and not this metric from the API itself. The values returned by the API should accurately reflect what is consumed.
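For what it's worth, the newer boto3 SDK does expose this: pass ReturnConsumedCapacity on each call and read the ConsumedCapacity field from the response. A small sketch, with the table name and key as placeholders:

```python
import boto3

table = boto3.resource("dynamodb").Table("my-table")   # placeholder table name

resp = table.get_item(
    Key={"id": "123"},                 # placeholder key
    ConsistentRead=True,               # strongly consistent, as in the question
    ReturnConsumedCapacity="TOTAL",
)
item = resp.get("Item")
print(resp["ConsumedCapacity"])        # e.g. {'TableName': 'my-table', 'CapacityUnits': 1.0}
```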
One possible source for this discrepancy is if you are doing query/scan requests where you are performing server side filtering. DynamoDB will consume the capacity for all of the records scanned and not just those returned.
Another possible cause of a discrepancy are the actual metrics you are viewing in the AWS console. If you are viewing the CloudWatch metrics directly make sure you are looking at the appropriate SUM or AVERAGE value depending on what metric you are interested in. If you are viewing the metrics in the DynamoDB console the interval you are looking at can dramatically affect the graph (ex: short spikes that appear in a 5 minute interval would be smoothed out in a 1 hour interval).
