My application requires periodic updates in the background (Android system). To avoid having continuous connections (which would very quickly saturate the concurrent connections limit), I have my app call goOffline() after an update.
However, calculations based on the bandwidth usage reported in the analytics area show much more traffic than I would have expected for the few objects I have listeners registered on. Is the data usage being dominated by calling goOnline(), due to extra data being transferred when the active listeners are re-registered each time my app updates? If so, I'm going to have problems scaling this app.
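In case it helps, my update cycle looks roughly like this (a sketch; performUpdate() stands in for my real sync logic, and the scheduling lives elsewhere):

    import com.google.firebase.database.FirebaseDatabase;

    public class PeriodicUpdater {

        private final FirebaseDatabase db = FirebaseDatabase.getInstance();

        // Called by whatever schedules the background work (e.g. JobScheduler).
        public void runUpdateCycle() {
            db.goOnline();                        // reconnect; the SDK re-registers the active listeners
            performUpdate(() -> db.goOffline());  // drop the connection once the update is done
        }

        // Placeholder for the real sync logic; invokes the callback when finished.
        private void performUpdate(Runnable onComplete) {
            // ... read/write the few watched objects ...
            onComplete.run();
        }
    }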
Yes, but it's almost certainly negligible.
For example, there's going to be a little overhead in establishing the new TCP session, and even more involved in the SSL negotiation. There's a little more overhead that's specific to Firebase, but nothing beyond the order of magnitude involved in the lower-level networking stuff.
So, if you're counting individual bytes, yes, you're going to notice. In the vast majority of use cases, however, you aren't going to notice.
We had a period of latency in our application that was directly correlated with latency in DynamoDB and we are trying to figure out what caused that latency.
During that time, the consumed reads and consumed writes for the table were normal (much below the provisioned capacity) and the number of throttled requests was also 0 or 1. The only thing that increased was the SuccessfulRequestLatency.
The high latency occurred during a period where we were doing a lot of automatic writes. In our use case, writing to DynamoDB also includes some reading (to get any existing records). However, we often write the same quantity of data in the same period of time without causing any increased latency.
Is there any way to understand what contributes to an increase in SuccessfulRequestLatency when it seems that we have provisioned enough read capacity? Is there any way to diagnose the latency caused by this set of writes to DynamoDB?
You can dig deeper by checking the Get Latency and Put Latency metrics in CloudWatch.
As you have already mentioned, there was no throttling, your writes involve some reading as well, and the same writes at other periods of time don't cause any latency, so you should check what exactly in the read operations is causing this.
Check the SuccessfulRequestLatency metric with the Operation dimension included. Start with GetItem and BatchGetItem; if that doesn't help, include Scan and Query as well.
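If you'd rather pull those numbers programmatically than click through the console, here's a rough sketch using the AWS SDK for Java (v1); the table name is a placeholder, and you'd adjust the time range to cover your latency event:

    import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
    import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
    import com.amazonaws.services.cloudwatch.model.Dimension;
    import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;
    import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsResult;
    import java.util.Date;

    public class LatencyByOperation {
        public static void main(String[] args) {
            AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.defaultClient();
            long now = System.currentTimeMillis();
            for (String op : new String[]{"GetItem", "BatchGetItem", "Query", "Scan"}) {
                GetMetricStatisticsRequest req = new GetMetricStatisticsRequest()
                    .withNamespace("AWS/DynamoDB")
                    .withMetricName("SuccessfulRequestLatency")
                    .withDimensions(
                        new Dimension().withName("TableName").withValue("my-table"), // placeholder
                        new Dimension().withName("Operation").withValue(op))
                    .withStartTime(new Date(now - 3600_000))  // last hour; widen to cover the event
                    .withEndTime(new Date(now))
                    .withPeriod(60)                           // one data point per minute
                    .withStatistics("Average", "Maximum");
                GetMetricStatisticsResult res = cw.getMetricStatistics(req);
                System.out.println(op + ": " + res.getDatapoints());
            }
        }
    }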
High request latency can sometimes happen when DynamoDB is doing an internal failover of one of its storage nodes.
Internally within Dynamo each storage partition has to be replicated across multiple nodes to provide a high level of fault tolerance. Occasionally one of those nodes will fail and a replacement node has to be introduced, and this can result in elevated latency for a subset of affected requests.
The advice I've had from AWS is to use a short timeout and a fast retry (e.g. 100ms) if your use-case is latency-sensitive. It's my understanding that only requests that hit the affected node experience increased latency, so within one or two retries you'll hit a different node and get a successful response, with minimal impact on your overall latency. Obviously it's hard to verify this, because it's not a scenario you can reproduce!
If you've got a support contract with AWS, it's well worth submitting a support ticket from the AWS console when events like this happen. They are usually able to provide an insight into what actually happened.
Note: If you're doing retries, remember to use exponential backoff to reduce the risk of throttling.
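For illustration, a sketch of that client setup with the AWS SDK for Java (v1); the 100ms timeout and the retry count are just the example values from above, and the SDK's built-in DynamoDB retry policy already applies exponential backoff between attempts:

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.retry.PredefinedRetryPolicies;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

    public class LowLatencyDynamoClient {
        public static AmazonDynamoDB build() {
            ClientConfiguration cfg = new ClientConfiguration()
                .withRequestTimeout(100)  // abort a single HTTP attempt after ~100 ms
                .withRetryPolicy(
                    PredefinedRetryPolicies.getDynamoDBDefaultRetryPolicyWithCustomMaxRetries(3));
            return AmazonDynamoDBClientBuilder.standard()
                .withClientConfiguration(cfg)
                .build();
        }
    }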
We're currently having an issue on our production servers and would like to try to replicate it in dev. I'm currently awaiting access to our performance monitoring tool, and while waiting I'd like to play with it a little.
Since I suspect host throttling in prod, I'm thinking of forcing hosts to throttle in dev and seeing if that recreates the issue.
Is there a way to do this?
As others have mentioned, monitoring the throttling counters and other counters like memory and WIP messages is a must to see what is going on in your production server. I would also recommend setting up a SCOM alert on throttling states of 3+ (publishing and delivery states), if you have SCOM.
Message throughput can grind to a halt especially in the memory (4, 5) and Queue Size (6) states. States 1 and 2 are generally short-lived (e.g. the arrival of a large batch of messages), and BizTalk recovers within a few seconds.
Simulating the memory state in your dev environment should be straightforward by tweaking the throttling thresholds (obviously not something to be taken lightly in production!).
For example, to trigger the memory threshold states: AFAIK the lowest memory usage threshold you can set is 101MB. Running a load test in dev should then be able to reproduce the throttle.
There is also apparently a user-based throttling override to set states 10 and 11, although I haven't actually tried this.
Some other experience on avoiding throttling:
(Caveat - I don't have an active BizTalk 2006/R2 setup - this is for 2009 / 2010)
If you do a lot of asynchronous processing (e.g. queue receives), ensure that you have split functionality into separate Receive, Processing, and Send hosts. This way you can adjust the throttling for the asynchronous Receive hosts to trigger much earlier than the Processing and Send hosts; this should have the effect of constricting new incoming messages to the MessageBox while allowing existing messages to complete processing.
On 64-bit hosts, the default 25% memory host usage throttling level is usually an unnecessary liability; we increased this to 50% on a 4GB server, following Yossi Dahan's recommendation.
Note that suspended messages count toward throttling state 6, so ensure that you have a strategy for dealing with suspended messages (and obviously ensure that the SQL Agent jobs are running!).
I was running a benchmark on CouchDB when I noticed that even with large bulk inserts, running a few of them in parallel is almost twice as fast. I also know that web browsers use a number of parallel connections to speed up page loading.
What is the reason multiple connections are faster than one? They go over the same wire, or even to localhost.
How do I determine the ideal number of parallel requests? Is there a rule of thumb, like "threadpool size = # cores + 1"?
The gating factor is not the wire itself which, after all, runs pretty quick (ignoring router delays) but the software overhead at each end. Each physical transfer has to be set up, the data sent and stored, and then completely handled before anything can go the other way. So each connection is effectively synchronous, no matter what it claims to be at the socket level: one socket operating asynchronously is still moving data back and forth in a synchronous way because the software demands synchronicity.
A second connection can take advantage of the latency -- the dead time on the wire -- that arises from the software doing its thing for first connection. So, even though each connection is synchronous, multiple connections let things happen much faster. Things seem (but of course only seem) to happen in parallel.
You might want to take a look at RFC 2616, the HTTP spec. It will tell you about the interchanges that happen to get an HTTP connection going.
I can't say anything about optimal number of parallel requests, which is a matter between the browser and the server.
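One way to find a reasonable number empirically is simply to measure. Here's a rough benchmark sketch using Java 11's HttpClient (the CouchDB URL, database name, and payload are placeholders; each run sends the same total number of bulk inserts, only the concurrency changes):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    public class ParallelBulkInsert {
        public static void main(String[] args) {
            HttpClient client = HttpClient.newHttpClient();
            String bulk = "{\"docs\":[{\"v\":1},{\"v\":2}]}";  // tiny stand-in payload
            int total = 16;                                     // same total work per run

            for (int parallel : new int[]{1, 2, 4, 8}) {
                long start = System.nanoTime();
                for (int wave = 0; wave < total / parallel; wave++) {
                    List<CompletableFuture<HttpResponse<String>>> inFlight = new ArrayList<>();
                    for (int i = 0; i < parallel; i++) {
                        HttpRequest req = HttpRequest.newBuilder(
                                URI.create("http://localhost:5984/mydb/_bulk_docs")) // placeholder DB
                            .header("Content-Type", "application/json")
                            .POST(HttpRequest.BodyPublishers.ofString(bulk))
                            .build();
                        inFlight.add(client.sendAsync(req, HttpResponse.BodyHandlers.ofString()));
                    }
                    // Wait for this wave to finish before starting the next one.
                    CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
                }
                System.out.printf("%d connection(s): %.1f ms%n",
                        parallel, (System.nanoTime() - start) / 1e6);
            }
        }
    }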
Each connection consumes a thread of its own, and each thread gets a quantum in which to consume CPU, network, and other resources (mainly CPU). When you start parallel calls, the threads compete for CPU time and things run "at the same time".
That's a high-level overview. I suggest you read about asynchronous calls and thread programming to understand it better.
I'm optimizing a very popular website and since the user base is constantly growing I'm interested in what matters when it comes to scaling.
Currently I am scaling by adding more CPU power / RAM memory to the server. This works nicely - even though the site is quite popular, currently CPU usage is at 10%.
So, if possible, I'd keep doing that. What I am worried about is whether I could get to the point where CPU usage is low but users have problems connecting because of the number of HTTP connections. Is it better to scale horizontally, by adding more servers to the cluster?
Thanks!
Eventually just adding more memory won't be enough. There are concurrent connection limits at the TCP level rather than in IIS (though both factors come into play; IIS can handle about 3000 connections without strain).
You probably won't encounter the situation you describe, where CPU usage is low but the number of HTTP connections is high, unless it is a largely static site; the more connections are open, the higher the CPU usage.
But regardless of this, what you need for a popular site is redundancy, which is essential for a site which has a large user base. There is nothing more annoying to the user than the site being down as your sole server goes offline for some reason. If you have 2 servers behind a load balancer, you can grow the site, even take a server offline with less fear of your site going offline.
I've been tasked with creating a WCF service that will query a db and return a collection of composite types. Not a complex task in itself, but the service is going to be accessed by several web sites which in total average maybe 500,000 views a day.
Are there any special considerations I need to take into account when designing this?
Thanks!
No special problems for the development side.
Well-designed WCF services can serve thousands of requests per second. Here's a benchmark for WCF showing 22,000 requests per second, using a blade system with 4x HP ProLiant BL460c blades, each with a single quad-core Xeon E5450 CPU. I haven't looked at the complexity or size of the messages being sent, but it sure seems that on a mainstream server from HP, you're going to be able to get 1000 messages per second or more. And with good design, scale-out will just work. At that peak rate, 500k per day is not particularly stressful for the communications layer built on WCF.
At the message volume you are working with, you do have to consider operational aspects.
Logging
Most system ops people who oversee WCF systems (and other .NET systems) that I have spoken with use an approach where, in the morning, they want to look at the basic vital signs of the system:
moving averages of request volume: 1min, 1hr, 1day (see the sketch after this list)
comparison of those quantities with historical averages
error/exception rate: 1min, 1hr, 1day
comparison of those quantities with historical averages
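For the request-volume item above, the bookkeeping can be as simple as a sliding window of per-second buckets. A minimal sketch (deliberately simple; it tolerates the occasional race between threads, which is fine for monitoring):

    import java.util.concurrent.atomic.AtomicLongArray;

    // Sliding one-minute request counter: one bucket per second, 60 buckets.
    public class RequestRate {
        private final AtomicLongArray buckets = new AtomicLongArray(60);
        private final AtomicLongArray stamps = new AtomicLongArray(60);

        public void record() {
            long sec = System.currentTimeMillis() / 1000;
            int i = (int) (sec % 60);
            if (stamps.get(i) != sec) {  // bucket is from a previous minute: reset it
                stamps.set(i, sec);
                buckets.set(i, 0);
            }
            buckets.incrementAndGet(i);
        }

        public long lastMinute() {
            long sec = System.currentTimeMillis() / 1000, total = 0;
            for (int i = 0; i < 60; i++) {
                if (sec - stamps.get(i) < 60) total += buckets.get(i);
            }
            return total;
        }
    }

The 1-hour and 1-day averages can be built the same way with coarser buckets, and the historical comparison is then just a matter of persisting the totals.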
If your exceptions are low enough in volume (in most cases they should be), you may wish to log every one of them into a special application event log, or some other audit log. This requires some thought: planning for storage of the audits and so on. The reason it's tricky is that in some cases, highly exceptional conditions can lead to very high-volume logging, which exacerbates the exceptional conditions: a snowball effect. You definitely want some throttling on the exception logging to avoid this, a "pop-off valve" if you know what I mean.
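A sketch of such a valve (the System.err calls stand in for the real audit sink; the counters are deliberately simple rather than strictly race-free):

    import java.util.concurrent.atomic.AtomicLong;

    // Allow at most maxPerWindow audit entries per window; count what was dropped.
    public class ThrottledAuditLog {
        private final long windowMillis;
        private final int maxPerWindow;
        private final AtomicLong windowStart = new AtomicLong(System.currentTimeMillis());
        private final AtomicLong loggedInWindow = new AtomicLong(0);
        private final AtomicLong suppressed = new AtomicLong(0);

        public ThrottledAuditLog(long windowMillis, int maxPerWindow) {
            this.windowMillis = windowMillis;
            this.maxPerWindow = maxPerWindow;
        }

        public void log(Exception e) {
            long now = System.currentTimeMillis();
            long start = windowStart.get();
            // Roll the window over; CAS so only one thread does the reset.
            if (now - start > windowMillis && windowStart.compareAndSet(start, now)) {
                long dropped = suppressed.getAndSet(0);
                loggedInWindow.set(0);
                if (dropped > 0) {
                    System.err.println("[audit] suppressed " + dropped + " exceptions in previous window");
                }
            }
            if (loggedInWindow.incrementAndGet() <= maxPerWindow) {
                System.err.println("[audit] " + e);  // stand-in for the real audit sink
            } else {
                suppressed.incrementAndGet();        // over the limit: count it and move on
            }
        }
    }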
Data store
And of course you need to ensure that the data source, whatever it is, can support the volume of queries you are throwing at it. Just as a matter of good citizenship, you may want to implement caching on the service to relieve load on the data store.
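The service itself would be WCF/.NET, but the caching idea is language-neutral. A minimal TTL-cache sketch (concurrent misses may load the same key twice, which is usually acceptable for a read-mostly cache):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Minimal TTL cache to shield the data store from repeated identical queries.
    public class TtlCache<K, V> {
        private static final class Entry<V> {
            final V value;
            final long expiresAt;
            Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
        }

        private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
        private final long ttlMillis;

        public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

        public V get(K key, Function<K, V> loader) {
            long now = System.currentTimeMillis();
            Entry<V> e = map.get(key);
            if (e == null || e.expiresAt < now) {
                // Miss or stale entry: hit the data store and cache the result.
                e = new Entry<>(loader.apply(key), now + ttlMillis);
                map.put(key, e);
            }
            return e.value;
        }
    }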
Network
With the benchmark I cited, the network was a pretty wide open gigabit ethernet. In your environment, the network may be shared, and you'll have to check that the additional load is reasonable.