How to handle scaling when requests per minute go from 500 to 5000 instantly - scaling

I have an application that spikes from 500 rpm to 5000 rpm and stays there for 20-30 min. I know that's not a ton of requests, but it's the magnitude of the jump that is killing me. AWS EC2 takes 5 min to scale up, so that's not helpful when things move so fast. Maybe multiple DBs that handle different pieces of the application would help?
How would you go about analyzing this and thinking about infrastructure if you will always go from 500 to 5000 rpm or higher in one minute?
This is the graph from my AWS logs:

If you can predict that demand will increase at some point you can automate provisioning of new instances. If you can't determine this then you need to do proper capacity planning. For instance, how many servers/containers do you need running to sustain the load with an acceptable user experience? This will be key to determine.
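As a rough illustration of the capacity-planning question above, here is a minimal sketch of the arithmetic. The per-instance throughput (1200 rpm) and the 30% headroom are made-up assumptions; you would replace them with numbers from your own load tests.

```python
import math


def instances_needed(peak_rpm: float, rpm_per_instance: float, headroom: float = 0.3) -> int:
    """Instances required to serve peak_rpm while keeping `headroom` spare capacity.

    rpm_per_instance is the load one instance sustains with an acceptable
    user experience (measured, not guessed). headroom is the fraction of
    each instance deliberately left unused.
    """
    if rpm_per_instance <= 0:
        raise ValueError("rpm_per_instance must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_rpm = rpm_per_instance * (1 - headroom)
    return math.ceil(peak_rpm / usable_rpm)


# Hypothetical figures: one instance sustains 1200 rpm in a load test.
print(instances_needed(500, 1200))   # baseline load
print(instances_needed(5000, 1200))  # spike load
```

The gap between the two results is the capacity you either keep running all the time or have to provision before the spike arrives.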
You should also look at implementing asynchronous messaging patterns that offload the spike, although this may come with some performance degradation.
One additional consideration would be moving to a serverless architecture like AWS Lambda. This likely wouldn't fully solve the problem, but it would give you more ability to quickly provision on-demand infrastructure.
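To make the asynchronous messaging idea more concrete, here is a minimal in-process sketch using only the Python standard library. It is not a real message broker: `handle` and `run_with_buffer` are hypothetical names, and in production the queue would typically be an external service. The point is only that the burst is accepted immediately and drained by a fixed pool of workers.

```python
import queue
import threading
import time


def handle(job: int) -> int:
    """Stand-in for the slow part of a request (DB write, email, ...)."""
    time.sleep(0.001)
    return job * 2


def run_with_buffer(jobs, workers: int = 4):
    """Enqueue every job right away and let a fixed worker pool drain the queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:  # sentinel: no more work for this worker
                q.task_done()
                break
            out = handle(job)
            with lock:
                results.append(out)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:  # the spike: all jobs are accepted without waiting
        q.put(job)
    for _ in threads:  # one sentinel per worker
        q.put(None)
    q.join()
    for t in threads:
        t.join()
    return results


processed = run_with_buffer(range(50))
print(len(processed))
```

The trade-off mentioned above shows up here as latency: a job enqueued late in the burst waits until the workers reach it.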


Is it "okay" to host a small WordPress blog on one AWS EC2 instance without load balancers/Beanstalk?

This is a very simple question for those with the knowledge, but I'm a newbie.
In essence, I just need to know if it would be considered okay to run a small (approx. 700 visitors/day) Bitnami WordPress blog on just one t2.medium EC2 instance (without any auto-scaling or Beanstalk).
Am I at risk of it crashing? What stats should I monitor to be aware of potential dangers? Sorry for the basic nature of these questions, but this is new to me.
tl;dr: It might be "okay", but it's not ideal.
If your question is because of:
Initial setup time - Load-balancing and auto-scaling will be less expensive (more time-efficient) over time.
Cost - Auto-scaling spins down instances that aren't being used to reduce cost.
Minimal setup for a great user experience - The goal of a great AWS setup is to ensure that capacity matches demand
Am I at risk of it crashing?
Possibly, yes. If you average 700 visitors, then the risk is a traffic spike if all visitors hit at the same time. It also depends on what your maximum visitor count is, which could vary widely from the average (or not).
What stats should I monitor to be aware of potential dangers?
Monitor the usage on high-traffic days (e.g. public holiday sales)
Set up billing alerts
Set up the right metrics:
See John Rotenstein's SO answer:
CPU Utilization is not always the right measure to use -- your
application might only be able to handle a limited number of
connections, it might be squeezed on RAM and the types of requests
might vary too.
You can use normal monitoring tools, or you can write something that
pushes metrics to Amazon CloudWatch, so that you go beyond the basic
CPU and Network metrics that CloudWatch normally provides. You could
even use the Load Balancer's Latency metric to trigger scaling when
the application slows down (custom code required).
I'd start with:
Two or more instances - to deal with instance redundancy (an instance going down)
Several t2.small rather than one t2.medium can work out to be more cost-efficient, and more cost efficient than EC in some use cases.
Add auto-scaling - automatically spin up or down instances based on minimum and maximum counts
Load balancing - to re-route users from unhealthy to healthy instances, and also to keep all of the spun-up instances working as evenly as possible (rather than a single instance handling 80% of the workload while the others bludge).
You can always reduce your instances after time with monitoring.
In my opinion, with 700 visitors a day, the safer option would be to run a load-balanced/auto-scaling environment on Elastic Beanstalk with at least 2 instances. The problem with running just one instance is that you are at real risk of an outage if traffic increases or the instance goes down, because with only one running you have no fallback. You can easily set up CloudWatch monitoring on NetworkIn and NetworkOut to get a sense of the traffic your site is receiving and serving, and set up CPU usage monitoring as well. The trade-off with running a load-balanced environment over a single-instance environment is that the cost might increase significantly as you introduce other things into your environment, such as a load balancer. If you do introduce a load balancer, consider reducing the instance size to maybe a t2.small, which could help reduce the cost.
It actually depends. The question is broad, and you have multiple options here.
You can use a single EC2 instance for that many visitors, or even more if your application allows. You can also consider caching if your app needs it.
You may add the instance to an auto-scaling group, so that if you ever need more resources you can scale out horizontally.
You can also add load balancers later on. You just need to add user data to the launch configuration attached to the auto-scaling group, so that when an instance comes up it automatically registers itself with your load balancer.
For monitoring, you can check the request metrics in CloudWatch for the ELB. Keep an eye on your CPU and trigger the scale-out policy once it reaches a particular threshold.

application hosting in Amazon Cloud

Hosting a .NET application on Amazon EC2: what would be the optimum configuration for a group that has 525 employers and around 85,000 employees? I have been googling this for the past week but could not find a reliable solution.
You might want to consider hosting your application on AppHarbor. We'll seamlessly scale your application, and you won't have to worry about sizing your infrastructure up front.
(disclaimer, I'm co-founder of AppHarbor)
Perhaps you need to provide more information to get better answers. For example, what does your application do? How many users does it have? What is the relevance of "525 employers and around 85,000 employees": does it indicate the amount of data or the number of users? How many users will be concurrent at a time? What will be the average request time? What will be the usage pattern? How much memory does it need? Is your app CPU intensive or IO intensive? If it's IO intensive, where exactly is your data stored?
That said, you need not worry too much about provisioning/scaling. Amazon EC2 offers on-demand resourcing, so you can easily scale up your configuration as needed.
If you really want to find the optimal configuration, the only way is to load test your application (with typical usage patterns/scenarios). Decide on your parameters, such as average response time, and find out how many users can be served by, say, 1, 4 and 8 ECU (EC2 Compute Units). You can load test using, say, standard instances: small, large and extra large. You can then interpolate to project your actual ECU and memory needs, and choose the optimal configuration based on that.
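The interpolation step could look roughly like this. The measurements are invented purely for illustration (they are not real EC2 results), and `interpolate_ecu` is a hypothetical helper; it does plain linear interpolation between neighbouring load-test points and refuses to extrapolate.

```python
def interpolate_ecu(measurements, target_users):
    """Estimate the ECU needed for target_users from load-test results.

    measurements: iterable of (ecu, max_users_at_acceptable_response_time).
    Linear interpolation between the two neighbouring measurements.
    """
    points = sorted(measurements)
    for (ecu_lo, users_lo), (ecu_hi, users_hi) in zip(points, points[1:]):
        if users_hi == users_lo:
            continue  # flat segment, no information to interpolate on
        if users_lo <= target_users <= users_hi:
            fraction = (target_users - users_lo) / (users_hi - users_lo)
            return ecu_lo + (ecu_hi - ecu_lo) * fraction
    raise ValueError("target is outside the measured range; run a bigger test instead of extrapolating")


# Hypothetical load-test results: (ECU, users served within the response-time target)
results = [(1, 100), (4, 380), (8, 700)]
print(interpolate_ecu(results, 500))
```

If the target falls outside what you measured, it is safer to run another test at a larger instance size than to trust a straight line.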
You can try off-site load testing considering the fact that as per Amazon:
EC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides the
equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon
You can arrange hardware equivalent to, say, 1, 2 and 4 ECU and do your load testing while watching memory consumption with performance counters. That should give you some clue as to what is needed. IMO, you will be better off load testing in the actual EC2 environment.

How many requests per second should my ASP (classic) app handle

I'm profiling an ASP (classic) web service. The web service makes database calls, reads/writes to files, and processes XML. On a Windows Server 2003 box (2.7 GHz, 4 cores, 4 GB RAM), how many requests per second should I be able to handle before things start to fail?
I'm building a tool to test this, but I'm looking for a number of requests per second to shoot for.
I know this is fairly vague, but please give the best estimate you can. If you need more information, please ask.
95% of the performance of any data-driven app is dependent on the database: 1) the way you do your calls, 2) the indexes, 3) the hardware under the database (disk subsystem in particular).
I have seen a machine like the one you are describing handle 40 requests per second (2400/minute), but numbers like 10 per second (600/minute) are more common. I would expect even lower if you are running your DB on the same machine, and lower still if that DB is SQL Server Express or MS Access.
Also, at capacity, your app will probably not fail. Instead, IIS will queue requests once it is saturated, and may time out some of those requests if it can't service them before the timeout expires.
Btw, instead of building a tool to test your app, you may want to look into using a test tool such as Microsoft WCAT. It is pretty smooth and easy to use.
How fast should it be? Fast enough.
How fast is fast enough? That's a question that only you and your users can answer. If your service is horrifically inefficient and keeps up with demand, it's fast enough. If your service is assembly-optimized, lightning-fast, and overwhelmed with requests, it's not fast enough.
If the server is handling its actual workload, then don't worry about how fast it "should" be. When the server is having trouble, or when you anticipate that it soon will, then you should look at improving the code or upgrading the hardware. Remember Knuth's Law – premature optimization is the root of all evil. Any work you do now to make it faster may never pay off, and you may be forced to make compromises with flexibility or maintainability. Remember, too, an older adage – if it ain't broke, don't fix it.
Yes, I would also say 10 per second is a good benchmark. For a high-performance app you would want to get more than this, but if you have no specific goal you should generally be able to get at least 10 requests per second for a general web page with a bunch of database queries.

design considerations for a WCF service to be accessed 500k times/day

I've been tasked with creating a WCF service that will query a db and return a collection of composite types. Not a complex task in itself, but the service is going to be accessed by several web sites which in total average maybe 500,000 views a day.
Are there any special considerations I need to take into account when designing this?
No special problems for the development side.
Well-designed WCF services can serve thousands of requests per second. Here's a benchmark for WCF showing 22,000 requests per second, using a blade system with 4x HP ProLiant BL460c blades, each with a single quad-core Xeon E5450 CPU. I haven't looked at the complexity or size of the messages being sent, but it sure seems that on a mainstream server from HP, you're going to be able to get 1000 messages per second or more. And with good design, scale-out will just work. Against that kind of throughput, 500k per day (roughly 6 requests per second on average) is not particularly stressful for the communications layer built on WCF.
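For a sense of scale, here is the back-of-the-envelope arithmetic behind that comparison. The peak factor of 10 is purely an assumption for illustration (real traffic is not spread evenly over the day, and your peak-to-average ratio may differ).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(requests_per_day: float) -> float:
    """Average requests per second if traffic were spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day: float, peak_factor: float) -> float:
    """Estimated peak rate, given an assumed peak-to-average ratio."""
    return average_rps(requests_per_day) * peak_factor


daily_views = 500_000
print(f"average: {average_rps(daily_views):.2f} req/s")
# Assumption: the busiest moments run at 10x the daily average.
print(f"assumed peak: {peak_rps(daily_views, 10):.2f} req/s")
```

Even with that generous peak assumption, the result stays well under the 1000 messages per second mentioned above.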
At the message volume you are working with, you do have to consider operational aspects.
Most system ops people who oversee WCF systems (and other .NET systems) that I have spoken with use an approach where, in the morning, they look at basic vital signs of the system:
moving averages of request volume: 1min, 1hr, 1day.
comparison of those quantities with historical averages
error/exception rate: 1min, 1hr, 1day
comparison of those quantities with historical averages
If your exceptions are low enough in volume (in most cases they should be), you may wish to log every one of them into a special application event log, or some other audit log. This requires some thought, such as planning for storage of the audits. The reason it's tricky is that in some cases, highly exceptional conditions can lead to very high-volume logging, which exacerbates the exceptional conditions: a snowball effect. You definitely want some throttling on the exception logging to avoid this, a "pop-off valve" if you know what I mean.
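One possible shape for such a "pop-off valve" is a fixed-window throttle: log at most N entries per time window, count the rest, and emit a single summary line when the next window starts. This is only a sketch of the idea, written in Python rather than .NET for brevity; `ThrottledLogger` and its parameters are hypothetical names, not part of WCF or any logging library.

```python
import time


class ThrottledLogger:
    """Logs at most max_per_window messages per window; the rest are counted and dropped."""

    def __init__(self, max_per_window, window_seconds, sink=print, clock=time.monotonic):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.sink = sink      # where log lines go (print, a file writer, ...)
        self.clock = clock    # injectable so the behaviour can be tested
        self.window_start = clock()
        self.count = 0
        self.dropped = 0

    def log(self, message) -> bool:
        """Return True if the message was written, False if it was suppressed."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            if self.dropped:
                self.sink(f"suppressed {self.dropped} exception log entries")
            self.window_start = now
            self.count = 0
            self.dropped = 0
        if self.count < self.max_per_window:
            self.count += 1
            self.sink(message)
            return True
        self.dropped += 1
        return False


logger = ThrottledLogger(max_per_window=3, window_seconds=60)
for i in range(10):
    logger.log(f"exception #{i}")  # only the first three are written in this window
```

The summary line keeps the audit trail honest: you still know how many exceptions occurred, without the logging itself adding to the load.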
Data store
And of course you need to ensure that the data source, whatever it is, can support the volume of queries you are throwing at it. Just as a matter of good citizenship, you may want to implement caching on the service to relieve load from the data store.
With the benchmark I cited, the network was a pretty wide open gigabit ethernet. In your environment, the network may be shared, and you'll have to check that the additional load is reasonable.

Best way to determine the number of servers needed

How much traffic can one web server handle? What's the best way to see if we're beyond that?
I have an ASP.Net application that has a couple hundred users. Aspects of it are fairly processor intensive, but thus far we have done fine with only one server to run both SqlServer and the site. It's running Windows Server 2003, 3.4 GHz with 3.5 GB of RAM.
But lately I've started to notice slowdowns at various times, and I was wondering what's the best way to determine if the server is overloaded by the usage of the application or if I need to do something to fix the application (I don't really want to spend a lot of time hunting down little optimizations if I'm just expecting too much from the box).
What you need is some info on capacity planning.
Capacity planning is the process of planning for growth and forecasting peak usage periods in order to meet system and application capacity requirements. It involves extensive performance testing to establish the application's resource utilization and transaction throughput under load. First, you measure the number of visitors the site currently receives and how much demand each user places on the server, and then you calculate the computing resources (CPU, RAM, disk space, and network bandwidth) that are necessary to support current and future usage levels.
If you have access to some profiling tools (such as those in the Team Suite edition of Visual Studio), you can try to set up a testing server, run some synthetic requests against it, and see if there's any specific part of the code taking unreasonably long to run.
You should probably check some graphs of CPU and memory usage over time before doing this, to see if load can even be the cause. (A number akin to the UNIX "load average" could be a useful metric; I don't know if Windows has anything like it. Basically, the average number of threads that want CPU time in every time slice.)
Also check the obvious, that you aren't running out of bandwidth.
Measure, measure, measure. Rico Mariani always says this, and he's right.
Measure req/sec, RAM, CPU, Sessions, etc.
You may come up with a caching strategy (Output caching, data caching, caching dependencies, and so on.)
Also see how your SQL Server is doing. Indexes are a good place to start, but not the only thing to look at.
On that hardware, a .NET application should be able to serve about 200-400 requests per second. If you have only a few hundred users, I doubt you are seeing even 2 requests per second, so I think you have a lot of capacity on that box, even with SQL server running.
Without knowing all of the details, I would say no, you will not see any performance improvement by adding servers.
By the way, if you're not using the Output Cache, I would start there.
