Azure Data Explorer batching policy modifications - azure-data-explorer

I have a huge amount of data flowing from Event Hub to Azure Data Explorer. Currently we have not modified the batching policy, so it batches every 5 minutes. But we need to reduce it to a lower value so that the end-to-end lag is reduced.
How can I calculate the ideal batching time for this setup? Is there any calculation based on the CPU of ADX and the data ingestion rate on Event Hub, so that I can figure out an ideal time without affecting the CPU usage of ADX?

There is no tool or other functionality that allows you to do this today; you will need to try the desired setting for "MaximumBatchingTimeSpan" and observe the impact on CPU usage.

Essentially, if you are ingesting huge volumes of data (per table), you are probably not using the full 5-minute batching window, or you can decrease it significantly without detrimental impact.
Please have a look at the latency and batching metrics for your cluster (https://learn.microsoft.com/en-us/azure/data-explorer/using-metrics#ingestion-metrics) and see a) whether your actual latency is below 5 minutes, which would indicate the batching is not driven by time, and b) which "Batching type" your cluster most often enacts: time, size, or number of items.
Based on these numbers you can tweak down the time component of your ingestion batching policy.
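For reference, the time trigger can be lowered per table with a control command. Below is a minimal sketch using the azure-kusto-data Python SDK; the cluster URL, database `MyDatabase`, and table `MyTable` are placeholders, and the size/count values shown are illustrative rather than your current settings. The same `.alter table ... policy ingestionbatching` command can also be run directly in the query editor.

```python
# Minimal sketch: shorten the time trigger of a table's ingestion batching policy.
# Cluster URL, database, and table names are placeholders; the size/count values
# are illustrative and should be kept in line with your actual workload.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://mycluster.westeurope.kusto.windows.net"  # placeholder
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(cluster)
client = KustoClient(kcsb)

command = """
.alter table MyTable policy ingestionbatching
'{"MaximumBatchingTimeSpan": "00:01:00", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'
"""
client.execute_mgmt("MyDatabase", command)
```

Lower the time in steps and watch the ingestion latency and CPU metrics after each change, as described above.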

Related

Why wouldn't a small Firebase Functions app just use a single Function to handle logic?

...aside from the benefit of separate performance monitoring and logging.
For logging, I am confident I can get granularity through manually adding the name of the "routine" to each call. This is how it is now with several discrete Functions for different parts of the system:
There are multiple automatic logs: start and finish of the routine, for example. It would be more challenging to find out how expensive certain routines are, but it would not be impossible.
The reason I want the entire logic of the application handled by a single handler function is to reduce cold starts: one function means only one container that can be kept persistently alive when there are very few users of the app.
If a month is ~2.6m seconds and we assume the system uses 1 GB RAM and 1 GHz CPU frequency at all times, that's:
2600000 * 0.0000025 + 2600000 * 0.000001042 = USD$9.21 a month
...for one minimum instance.
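(For reference, the same arithmetic as a quick check, using the per-GB-second and per-GHz-second prices quoted above; actual rates vary by region and tier:)

```python
# Quick check of the monthly figure above, using the unit prices from this post
# ($/GB-second and $/GHz-second); actual rates depend on region and tier.
seconds_per_month = 2_600_000                   # ~30 days
cost = seconds_per_month * 1.0 * 0.0000025      # 1 GB of RAM
cost += seconds_per_month * 1.0 * 0.000001042   # 1 GHz of CPU
print(f"${cost:.2f} per month")                 # -> $9.21
```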
I should also state that all of my functions have the bare minimum amount of global scope code; it just sets up Firebase assets (RTDB and Firestore).
From a billing, performance (based on user wait time), and user/developer experience perspective, is there any reason why it would be smart to keep all my functions discrete?
I'd also accept an answer saying "one single function for all logic is reasonable" as long as there's a reason for it.
Thanks!
If you have a very small app with ~5 endpoints and very low traffic, sure, you could do something like this. But here is why I would not do it:
Billing and performance
The important thing to realize is that each function instance handles only one request at a time, so under concurrent load new instances of your function are created, which means there could be tens of them running at the same time.
If you would like to have just one instance handling all the traffic, you should explore GCP Cloud Run, where one container handles multiple requests and scales out only when that is not sufficient.
Imagine you have several endpoints and every one of them has different performance requirements:
One might need only 128 MB of RAM.
Another might need 1 GB of RAM.
(FYI: You can control the CPU MHz of the function via the RAM settings too - which can speed up execution in some cases)
If you had only one function with 1 GB of RAM, every request would be served by that function, and in many cases most of the memory would go to waste.
But if you split it into multiple functions, some requests will require far fewer resources, which can save you money once you reach larger execution counts per month (tens of thousands and up).
Let's imagine a function with a 3-second execution time and 10k executions/month:
128MB would cost you $0.0693
1024MB would cost you $0.495
As you can see, with a small app the difference is negligible, but at scale it matters; the arithmetic is sketched below. (The cost can vary based on the datacenter.)
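The sketch below reproduces those two figures. It assumes 1st-gen Cloud Functions pricing of $0.0000025 per GB-second and $0.0000100 per GHz-second, with the 128 MB tier paired with a 200 MHz CPU and the 1024 MB tier with a 1.4 GHz CPU; treat those pairings and rates as assumptions and check the current price list.

```python
# Sketch of the cost arithmetic above. The per-unit prices and the RAM-to-CPU
# tier pairings (128 MB -> 200 MHz, 1024 MB -> 1.4 GHz) are assumptions based on
# 1st-gen Cloud Functions pricing; verify against the current price list.
def monthly_cost(mem_gb: float, cpu_ghz: float, seconds_per_call: float, calls: int) -> float:
    compute_seconds = seconds_per_call * calls
    return compute_seconds * (mem_gb * 0.0000025 + cpu_ghz * 0.0000100)

print(monthly_cost(0.125, 0.2, 3, 10_000))  # ~0.069 (the $0.0693 figure)
print(monthly_cost(1.0, 1.4, 3, 10_000))    # ~0.495 (the $0.495 figure)
```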
As for the logging, I don't think it matters. Usually in bigger systems there can be messages traveling through several functions, so you have to deal with that anyway.
As for the cold start, you just need a good UI to mask it. At first I was worried about it in our apps, but later on you get used to the fact that some actions can take ~2 s to execute (cold start). And you should have a "loading" state in the UI regardless, because you don't know whether the call will take ~100 ms or 3 s due to a bad connection.

DynamoDB autoscaling not working fast enough

I'm running a simple API that gets an item from a DynamoDB table on each call. I have auto scaling set to a minimum of 25 and a maximum of 10,000 RCUs.
However, if I send 15,000 requests with a tool like wrk or hey, I get about 1,000 502s:
DynamoDB's metrics show that reads are throttled.
The scaling activities log on the table shows that the RCUs were scaled to 99 but not more than that.
Lambda logs show that the function starts to take longer: it usually takes about 20 ms to run, but it starts to run for 500, 1500, 3000 ms and then times out (I'm assuming that's caused by the throttling).
Why isn't the autoscaling working better? It only scales up to 99 RCUs but my max is 10,000.
We ran into the same problem when testing DynamoDB autoscaling for short periods of time, and it turns out the problem is that the scaling events only happen after 5 minutes of elevated throughput (you can see this by inspecting the CloudWatch alarms that the autoscaling sets up).
This excellent blog post helped us solve this by creating a Lambda that responds to the CloudWatch API events and improves the responsiveness of the alarms to one minute: https://hackernoon.com/the-problems-with-dynamodb-auto-scaling-and-how-it-might-be-improved-a92029c8c10b
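If you want to confirm the five-consecutive-minutes behaviour on your own table first, a small boto3 sketch like this lists the alarms that auto scaling created and prints their period and evaluation count. The `TargetTracking-table/MyTable` name prefix is an assumption about how those alarms are named; adjust it for your table.

```python
# Sketch: inspect the CloudWatch alarms DynamoDB auto scaling created for a table
# and print their Period / EvaluationPeriods (typically 60 s x 5 for scale-up).
# The alarm-name prefix is an assumption; adjust it for your table name.
import boto3

cloudwatch = boto3.client("cloudwatch")
response = cloudwatch.describe_alarms(AlarmNamePrefix="TargetTracking-table/MyTable")

for alarm in response["MetricAlarms"]:
    print(alarm["AlarmName"], alarm["Period"], alarm["EvaluationPeriods"])
```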
from: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AutoScaling.html
What did you define as "target utilization"?
Target utilization is the ratio of consumed capacity units to provisioned capacity units, expressed as a percentage. Application Auto Scaling uses its target tracking algorithm to ensure that the provisioned read capacity of ProductCatalog is adjusted as required so that utilization remains at or near 70 percent.
Also, I think that the main reason autoscaling does not work for you is that your workload might not stay elevated for long enough:
"DynamoDB auto scaling modifies provisioned throughput settings only when the actual workload stays elevated (or depressed) for a sustained period of several minutes"
DynamoDB auto scaling modifies provisioned throughput settings only when the actual workload stays elevated (or depressed) for a sustained period of several minutes. The Application Auto Scaling target tracking algorithm seeks to keep the target utilization at or near your chosen value over the long term.
Sudden, short-duration spikes of activity are accommodated by the table's built-in burst capacity. For more information, see Use Burst Capacity Sparingly.
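For context, this is roughly how that target-tracking configuration looks when set up through the Application Auto Scaling API; a hedged boto3 sketch, with the table name and capacity bounds as placeholders:

```python
# Sketch: configure DynamoDB read-capacity auto scaling with a 70% target
# utilization via Application Auto Scaling. Table name and min/max are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")

autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/ProductCatalog",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=25,
    MaxCapacity=10000,
)

autoscaling.put_scaling_policy(
    PolicyName="ProductCatalogReadTracking",
    ServiceNamespace="dynamodb",
    ResourceId="table/ProductCatalog",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```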

Amazon DynamoDB throttling

I've enabled auto-scaling for our DynamoDB table. It has a target utilization of 30%, but it keeps throttling.
See this example screenshot where throttling is happening.
As you can see, it's scaling up exactly as you want it to. But I don't understand why it's still throttling. It's almost always below the provisioned throughput.
Can anyone explain what's going wrong and why it's still throttling?
Thanks,
Hendrik
Very hard to tell from the graph and the limited information.
Some thoughts:
AutoScaling can take 5 - 10 minutes to kick in. This is not fast enough if there is a sudden increase in usage. Perhaps you are seeing throttling in that 5 - 10 minute window before it scales up.
If you set the CloudWatch metrics to a 1-minute interval, you might see what's going on in a bit more detail.
As mkobit mentioned, you might be hitting the throughput limit on a partition, depending on how your data is structured.
Your capacity units are evenly distributed between your partitions, so depending on how many partitions you have and which records you are trying to access, you may hit the capacity limit of a single partition without exceeding your table's overall throughput.
This also depends on the amount of data you have stored, number of partitions etc.
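If you want a rough feel for that, the older DynamoDB documentation described the partition count approximately as below; treat this as an approximation only, since AWS does not expose the actual partition count and newer behaviour such as adaptive capacity changes the picture.

```python
# Rough estimate from the older DynamoDB guidance:
# partitions ~= max(ceil(RCU/3000 + WCU/1000), ceil(table_size_gb / 10)),
# and each partition then gets roughly table-RCU / partitions of the throughput.
# This is an approximation only; the real partition count is not exposed.
import math

def estimate_partitions(rcu: int, wcu: int, table_size_gb: float) -> int:
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(table_size_gb / 10)
    return max(by_throughput, by_size, 1)

partitions = estimate_partitions(rcu=1000, wcu=500, table_size_gb=25)
print(partitions, 1000 / partitions)  # e.g. 3 partitions, ~333 RCUs each
```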
HTH

DynamoDB AutoScaling still throttles

I have an application using DynamoDB and I noticed they just implemented autoscaling, which is awesome. I love the concept, and the timing for my app is pretty much perfect. However, I am still getting some issues that I wonder whether I can tweak settings to remove.
My application gets definite spikes in usage, so I think this is an ideal thing to use; however, with autoscaling on I am still getting some throttling. Here are my read graphs for the last 12 hours:
As you can see, when it spikes the provisioned capacity is set low, so it throttles for a minute or two until the update kicks in, and then it works. That's OK I guess, and better than not scaling, but I would like it not to throttle at all...
Is there any way to tell DynamoDB to never throttle unless it goes over 100 (or 200 or whatever I set as the top limit)? Just if it gets a surge turn up the throughput for 15 minutes or whatever until the surge is over?
Autoscaling uses CloudWatch. You can see these alarms by going to the CloudWatch dashboard and looking for the alarms that include your table name and "DO NOT EDIT OR DELETE" in the description.
Why am I telling you this?
Well, CloudWatch has a minimum period granularity, currently 1 minute. That means it will wait at least 1 minute before firing any event to its listeners. Therefore, it will take at least one minute after the load starts until the capacity is increased. Actually it will be even more, since increasing the capacity also takes time. Bottom line: if you have a very large spike, some requests may be throttled, since the auto scaling will not yet have taken effect and the burst capacity can be exhausted.
The simple, but costly, solution is to increase the initial (minimum) capacity.
If you know about the upcoming spike in advance (e.g. you have some job running periodically, or customer peaks at certain times), you can use the API to modify the autoscaling settings programmatically; a sketch follows below.
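A hedged sketch of what "modify autoscaling programmatically" could look like with boto3: before a known spike, re-register the scalable target with a higher minimum, then lower it again afterwards. The table name and capacity numbers are placeholders.

```python
# Sketch: temporarily raise the auto scaling floor before a known spike and
# restore it afterwards. Table name and capacities are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")

def set_read_floor(table: str, min_capacity: int, max_capacity: int) -> None:
    autoscaling.register_scalable_target(
        ServiceNamespace="dynamodb",
        ResourceId=f"table/{table}",
        ScalableDimension="dynamodb:table:ReadCapacityUnits",
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )

set_read_floor("MyTable", min_capacity=500, max_capacity=2000)  # before the spike
# ... run the periodic job / ride out the peak ...
set_read_floor("MyTable", min_capacity=50, max_capacity=2000)   # afterwards
```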

AWS DynamoDB: read/write units estimation issue

I am creating an online crowd-driven game. I expect the read/write requests to fluctuate (like 50, 50, 50, 1500, 50, 50, 50) every second, and I need to process 100% of the requests with strong consistency.
I am planning to move from the GAE Datastore to AWS's DynamoDB for its strong consistency. I have the doubts below, to which I could not get clear answers in other discussions.
1. If the item size for a write action is just 4 B, will that be rounded up to 1 KB and consume a whole write unit?
2. Financially it is not wise to set the provisioned throughput capacity around the expected peak value. Alarms can warn us, but in the case of a sudden rise, the requests could be throttled by the time we receive the alarm. Is DynamoDB really designed to handle highly fluctuating reads/writes?
3. I read about Dynamic DynamoDB, which updates the read/write throughput capacity for us. When we add some read/write units, how long does it take to allocate them? If it takes too long, what's the use of raising the bar after the tide hits?
Google App Engine bills just for the number of requests that happen in that month. If I can make AWS work like, "Whatever the request count may be, I will expand and contract myself and charge you only for the used read/write units", I will go for AWS.
Please advise. Don't hesitate to ask if I am not being clear in parts.
Thanks,
Karthick.
Yes. Item sizes are rounded up and the corresponding throughput is consumed (writes round up to the next 1 KB, reads to the next 4 KB). From the Provisioned Throughput in Amazon DynamoDB documentation:
The total number of read operations necessary is the item size, rounded up to the next multiple of 4 KB, divided by 4 KB.
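To make the rounding concrete (writes round up to the next 1 KB, strongly consistent reads to the next 4 KB), a small sketch:

```python
# Capacity units consumed per item, per the documented rounding rules:
# writes round up to the next 1 KB, strongly consistent reads to the next 4 KB,
# and an eventually consistent read costs half a strongly consistent one.
import math

def write_units(item_size_bytes: int) -> int:
    return max(1, math.ceil(item_size_bytes / 1024))

def read_units(item_size_bytes: int, strongly_consistent: bool = True) -> float:
    units = max(1, math.ceil(item_size_bytes / (4 * 1024)))
    return units if strongly_consistent else units / 2

print(write_units(4))  # a 4-byte item still consumes 1 WCU
print(read_units(4))   # and 1 RCU for a strongly consistent read
```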
It can handle some bursting, but it is generally intended to be used for uniform workloads. Here is a section from the Guidelines for Working with Tables documentation and some other helpful links about the best practices:
A temporary non-uniformity in a workload can generally be absorbed by the bursting allowance, as described in Use Burst Capacity Sparingly. However, if your application must accommodate non-uniform workloads on a regular basis, you should design your table with DynamoDB's partitioning behavior in mind (see Understand Partition Behavior), and be mindful when increasing and decreasing provisioned throughput on that table.
Query and Scan guidelines for avoiding bursts of read activity
The Table Best Practices section
Use Burst Capacity Sparingly
This one is going to depend on how much data your table has, because DynamoDB will have to repartition the data as you scale up. See the Consider Workload Uniformity When Adjusting Provisioned Throughput documentation for more information about partitioning.
