How to load-balance multiple application servers against multiple VTGates - scaling

We are doing a POC to prove that Vitess can scale massively and meet our requirements. We are using around 40 application servers, 15 VTGates, and 30 shards (each shard contains a master, a replica, and an rdonly). However, we were only able to scale up to a point; above that point, throughput is a flat line.
The main pain point for us is connecting the application servers to multiple VTGates. We tried a load balancer (AWS NLB) between them and saw increased QPS but much lower TPS (~15,000 QPS, ~1,500-2,000 TPS). Then we tried having each application server use JDBC connection pooling to connect to a VTGate without a load balancer, with similar results. Finally we tried without connection pooling: we were able to increase TPS, but saw massive dips in QPS, which in turn dragged TPS down.
As you can see, we have hit a roadblock and need a few good ideas to overcome it. Any input is really appreciated.
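For reference, one option worth testing here is MySQL Connector/J's client-side load balancing, which spreads connections across several VTGates without putting an NLB in the path. A minimal sketch, assuming placeholder VTGate hostnames, keyspace, and credentials:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class VtgateClientSideLb {
    public static void main(String[] args) throws Exception {
        // Connector/J's "loadbalance" scheme rotates new connections across the
        // listed hosts; VTGate speaks the MySQL protocol, so no external load
        // balancer is needed. Hostnames and keyspace are placeholders.
        String url = "jdbc:mysql:loadbalance://vtgate1:3306,vtgate2:3306,vtgate3:3306/keyspace"
                + "?loadBalanceBlacklistTimeout=5000"; // skip an unresponsive VTGate for 5s
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            rs.next();
            System.out.println("SELECT 1 -> " + rs.getInt(1));
        }
    }
}

Whether this beats per-host pools depends on how evenly the long-lived pooled connections end up distributed across VTGates, so it is worth benchmarking against the NLB numbers above.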

Related

MariaDB max connections

We have a big application that uses 40 microservices (Spring Boot), each holding about 5 database connections to a MariaDB server. That causes "too many connections" errors on our MariaDB server. The default is 151, but I was thinking of just setting max connections to 1000 to be on the safe side. I can't find anywhere on the Internet whether this is possible or even wise. Our MariaDB runs standalone on a VPS with 8GB of memory; it is not in a Docker container or anything like that, it runs directly on the VPS.
What is the maximum number of connections advisable, taking into consideration that we might scale up our microservices?
You can scale up your max_connections just fine. Put a line like
max_connections=250
in your MariaDB my.cnf file. But don't just set it to a very high number; each potential connection consumes RAM, and with only 8GiB you need to be a bit careful about that.
If you run this command you'll get a bunch of data about your connections.
SHOW STATUS LIKE '%connect%';
The important ones to watch:
Connection_errors_max_connections - the number of connection attempts that failed because you ran out of connection slots.
Connections - the total number of connections ever handled.
Max_used_connections - the largest number of simultaneous connections used.
Max_used_connections_time - the date and time when the server had its largest number of connections.
The numbers shown are cumulative since the last server boot or the most recent FLUSH STATUS statement.
Keep an eye on these. If you run short you can always add more. If you have to add many more connections as you scale up, you probably will need to provision your VPS with more RAM. The last two are cool because you can figure out whether you're getting hammered at a particular time of day.
And, in your various microservices, be very careful to use connection pools with a reasonable maximum size. Don't let each microservice grab more than ten connections unless you run into throughput trouble. You didn't say much about your client tech (Node.js? .NET? PHP? Java?), so it's hard to give specific advice on how to do that.
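Since the question does mention Spring Boot, whose default pool is HikariCP, here is a minimal sketch of capping a service at ten connections; the JDBC URL and credentials are placeholders:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PooledDataSource {
    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        // Placeholder connection details -- substitute your own.
        config.setJdbcUrl("jdbc:mariadb://db.example.com:3306/appdb");
        config.setUsername("app_user");
        config.setPassword("secret");
        // 40 services x 10 connections = 400 peak, comfortably under max_connections=1000.
        config.setMaximumPoolSize(10);
        // Fail fast rather than queueing forever when the pool is exhausted.
        config.setConnectionTimeout(30_000);
        return new HikariDataSource(config);
    }
}

With Spring Boot you can get the same effect declaratively via spring.datasource.hikari.maximum-pool-size=10 in application.properties.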

Web load testing - what does a wavy response time graph indicate?

I am running a load test on an API using JMeter. When I host the API on the same pc as the test (the database is remote though) I get ok results.
However, when I tried running the load test through the same API but hosted on a different pc on the same network, I got this wavy pattern in my test results.
Each of the four grouped lines is the response time for a particular API endpoint, and the blue line is the active thread count.
The question is: does this wavy pattern mean anything? This pattern isn't visible when the API is hosted on the same machine as the test.
The results are very different and I am thinking this pattern might be correlated to the problem.
I used 200 active threads and no specific configuration which would produce the requests in this pattern.
You need to pay attention to the following points:
Check the Connect Time and Latency metrics. Elapsed Time is the sum of Connect Time, Latency, and the actual server response time, so these "waves" might be caused by networking issues.
The pattern might indicate that the application under test is doing e.g. garbage collection, or is using a swap file (much slower than RAM) because it is short on resources. Make sure it has enough headroom to operate in terms of CPU, RAM, network, and disk I/O; these metrics can be checked using e.g. the JMeter PerfMon Plugin. The same applies to JMeter itself: if JMeter cannot send requests fast enough, you will see throughput drops.
The most efficient way to get to the bottom of the issue is to run your application under a profiling tool; this will allow you to identify the heaviest functions, the largest objects on the heap, etc.
Check your database as well and look for slow queries, as the issue might be caused at the database level (including its networking layer).

Is it "okay" to host a small wordpress blog on one AWS EC2 Instance without load balancers/beanstalk?

This is a very simple question for those with the knowledge, but I'm a newbie.
In essence, I just need to know whether it would be considered okay to run a small Bitnami WordPress blog with approx. 700 visitors/day on just one t2.medium EC2 instance (without any auto-scaling or Beanstalk).
Am I at risk of it crashing? What stats should I monitor to be aware of potential dangers? Sorry for the basic nature of these questions, but this is new to me.
tl;dr: It might be "okay", but it's not ideal.
If your question is because of:
Initial setup time - Load-balancing and auto-scaling will be less expensive (more time-efficient) over time.
Cost - Auto-scaling spins down instances that aren't being used to reduce cost.
Minimal setup for a great user experience - The goal of a great AWS setup is to ensure that capacity matches demand.
Am I at risk of it crashing?
Possibly, yes. If you average 700 visitors, then the risk is traffic spikes when many visitors hit at the same time. It also depends on what your maximum visitor count is, which could vary widely from the average (or not).
What stats should I monitor to be aware of potential dangers?
Monitor the usage on high-traffic days (e.g. public holiday sales)
Set up billing alerts
Set up the right metrics:
See John Rotenstein's SO answer:
CPU Utilization is not always the right measure to use -- your application might only be able to handle a limited number of connections, it might be squeezed on RAM and the types of requests might vary too.
You can use normal monitoring tools, or you can write something that pushes metrics to Amazon CloudWatch, so that you go beyond the basic CPU and Network metrics that CloudWatch normally provides. You could even use the Load Balancer's Latency metric to trigger scaling when the application slows down (custom code required).
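As a sketch of what "pushes metrics to Amazon CloudWatch" can look like with the AWS SDK for Java v2 (the namespace, metric name, and sample value are made up for illustration):

import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.MetricDatum;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricDataRequest;
import software.amazon.awssdk.services.cloudwatch.model.StandardUnit;

public class PushCustomMetric {
    public static void main(String[] args) {
        try (CloudWatchClient cw = CloudWatchClient.create()) {
            // Hypothetical application-level metric: current active connections.
            MetricDatum datum = MetricDatum.builder()
                    .metricName("ActiveConnections") // made-up metric name
                    .unit(StandardUnit.COUNT)
                    .value(42.0)                     // sample value for illustration
                    .build();
            cw.putMetricData(PutMetricDataRequest.builder()
                    .namespace("MyBlog/Custom")      // made-up namespace
                    .metricData(datum)
                    .build());
        }
    }
}

A CloudWatch alarm on such a custom metric can then drive scaling or notifications, just like the built-in CPU and Network metrics.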
I'd start with:
Two or more instances - for redundancy when an instance goes down
Several t2.small instances rather than one t2.medium - can work out more cost-efficient in some use cases
Add auto-scaling - automatically spin instances up or down based on minimum and maximum counts
Load balancing - to re-route users from unhealthy to healthy instances, and to keep all of the spun-up instances working as evenly as possible (rather than a single instance handling 80% of the workload while the others bludge)
You can always reduce your instances over time, guided by monitoring.
In my opinion, with 700 visitors a day, the safer option would be to run a load-balanced/auto-scaling environment on Elastic Beanstalk with at least 2 instances. The problem with running just one instance is that, yes, you are at great risk of crashing if you get an increase in traffic or the instance goes down; with just one running you have no fallback. You can easily set up CloudWatch monitoring on NetworkIn and NetworkOut to get a sense of the number of requests your site is receiving and serving, and set up CPU usage monitoring as well.
The trade-off of a load-balanced environment over a single-instance environment is that the cost may increase significantly as you introduce other components, such as the load balancer itself. If you do introduce one, consider reducing the instance size to maybe a t2.small, which could help reduce the cost.
It actually depends; this question covers a wide range, and you have multiple options here.
You can serve that many visitors, or even more, from a single EC2 instance if your application allows. Also consider caching if your app needs it.
You can put the instance in an Auto Scaling group, so that if you need more resources you can scale horizontally.
You can add a load balancer later on as well. You just need to add user data to the launch configuration attached to the Auto Scaling group, so that when an instance comes up it automatically registers itself with your load balancer.
For monitoring, you can check the request metrics in CloudWatch for the ELB. Keep an eye on your CPU and trigger the scale-out policy once it reaches a particular threshold.
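A hedged sketch of that last step with the AWS SDK for Java v2 -- the alarm name, threshold, and scaling-policy ARN are placeholders -- creating a CloudWatch alarm on average CPU that fires a scale-out policy:

import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.ComparisonOperator;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricAlarmRequest;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

public class ScaleOutAlarm {
    public static void main(String[] args) {
        try (CloudWatchClient cw = CloudWatchClient.create()) {
            cw.putMetricAlarm(PutMetricAlarmRequest.builder()
                    .alarmName("high-cpu-scale-out")   // placeholder name
                    .namespace("AWS/EC2")
                    .metricName("CPUUtilization")
                    .statistic(Statistic.AVERAGE)
                    .period(300)                       // evaluate 5-minute averages
                    .evaluationPeriods(1)
                    .threshold(70.0)                   // example threshold: 70% CPU
                    .comparisonOperator(ComparisonOperator.GREATER_THAN_THRESHOLD)
                    // Placeholder ARN of an existing scale-out policy.
                    .alarmActions("arn:aws:autoscaling:region:account:scalingPolicy:placeholder")
                    .build());
        }
    }
}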

How to handle scaling when requests per minute go from 500 to 5000 instantly

I have an application that spikes from 500 RPM to 5000 and stays there for 20-30 minutes. I know that's not a ton of requests, but it's the magnitude of the jump that is killing me. AWS EC2 takes about 5 minutes to scale up, so that's not helpful when things move this fast. Maybe multiple DBs that handle different pieces of the application?
How would you go about analyzing this and thinking about infrastructure if you will always go from 500 to 5000 RPM or higher in one minute?
This is the graph from my AWS logs:
If you can predict that demand will increase at some point, you can automate the provisioning of new instances. If you can't, then you need to do proper capacity planning: how many servers/containers do you need running to sustain the load with an acceptable user experience? Determining this will be key.
You should also look at implementing asynchronous messaging patterns that absorb the spike, although this may come with some latency cost.
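As a sketch of that queue-based offload using Amazon SQS and the AWS SDK for Java v2 (the queue name and message body are placeholders): the web tier enqueues work immediately and returns, while a worker pool drains the queue at a sustainable rate.

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.GetQueueUrlRequest;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class SpikeOffload {
    public static void main(String[] args) {
        try (SqsClient sqs = SqsClient.create()) {
            // Placeholder queue name; the queue must already exist.
            String queueUrl = sqs.getQueueUrl(GetQueueUrlRequest.builder()
                    .queueName("incoming-work")
                    .build()).queueUrl();
            // The web tier returns as soon as the message is enqueued;
            // background workers consume it at their own pace.
            sqs.sendMessage(SendMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .messageBody("{\"orderId\": 123}") // placeholder payload
                    .build());
        }
    }
}

The spike then lands on the queue, which absorbs bursts far better than a fleet that needs 5 minutes to scale.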
One additional consideration would be moving to a serverless architecture like AWS Lambda. This likely wouldn't fully solve the problem, but it would let you provision on-demand capacity much more quickly.

High performance server for 1x1 pixels (500M GET requests per day)

I need to set up a tracking server that will only serve 1x1 pixels and log all requests.
I initially thought of using Amazon's S3 or CloudFront but their costs are prohibitively high for me. I need to serve 500M pixels a day, and S3 charges $0.4 per 1M GET requests, so even without the data transfer costs I'm at $6,000/month.
I am considering setting up nginx or lighttpd on an EC2 instance. What performance should I expect with those two (e.g. per one large EC2 instance)? Are there better free products for this task?
Nginx is indeed a good candidate for this and already has built-in support for empty GIFs (see http://wiki.nginx.org/HttpEmptyGifModule).
Disk I/O will probably be the biggest issue for this server because of the access logging. The only way to figure out the performance of the different EC2 instances is to test them.
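A minimal sketch of such an nginx configuration, combining the empty_gif module with buffered access logging to ease the disk I/O pressure (the log path and buffer/flush values are illustrative):

# Serve a transparent 1x1 GIF straight from memory and log each hit.
location = /pixel.gif {
    empty_gif;
    # Buffered logging batches writes; tune buffer/flush to your durability needs.
    access_log /var/log/nginx/pixel.log combined buffer=64k flush=5s;
}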
If one EC2 instance does not offer the performance you need, or if you need any redundancy for this service, you should also look into using a load balancer (either an AWS Elastic Load Balancer or your own custom one).
You could also set up multiple smaller servers in different geographical regions and use DNS latency based routing to route requests to them (use either AWS Route 53 latency based routing or another DNS solution). This would significantly reduce the connection time to your server and would distribute the load across several data centers.
