I have an HTTP application with standalone workers that perform well. The issue is that sometimes they need to purge and rebuild their caches, and while they do they stop responding for up to 30 seconds.
I have looked into a number of load balancers, but none of them seem to address this issue. I have tried Perlbal and some Apache modules (like fcgid) and they happily send requests to workers that are busy rebuilding their cache.
So my take is this: isn't there some kind of message-bus solution where all HTTP requests are queued up, leaving it to the workers to pull messages off the queue when they are able to?
Or, alternatively, a load balancer that can take into account that the workers are sometimes unable to respond?
Added later: I am aware that one strategy would be for the workers to use a management protocol to tell the load balancer when they are busy, but that solution seems kludgy and I worry that there will be edge cases that result in spurious errors.
If you use Amazon Web Services' Elastic Load Balancer you can achieve the result you want: you can mark an EC2 instance behind an Elastic Load Balancer (ELB) as unhealthy while it does its cache purge and rebuild.
What I would do is create an additional endpoint on each instance, called rebuild_cache for example. So if you have 5 instances behind your ELB, you can write a script that hits each individual instance (not through the load balancer) on that rebuild_cache endpoint. The endpoint would do 3 things (see the sketch after the list):
1. Mark the instance as unhealthy. The load balancer will notice after the configured number of failed health checks (the timing and thresholds of health checks are configurable from the AWS console).
2. Run your cache purge and rebuild.
3. Mark the instance as healthy. The load balancer will keep running health checks against the instance and only start sending it traffic again once it has passed the required number of consecutive healthy checks (again, this threshold is defined in the ELB health configuration).
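A minimal sketch of what such an endpoint could look like, in Python with Flask purely for illustration (the /health and /rebuild_cache routes and the rebuild_everything() helper are my assumptions, not from the answer): the instance deliberately fails its own health check while the rebuild runs, which is what takes it out of rotation.

    # Hypothetical sketch: fail the ELB health check while the cache rebuild runs.
    # Flask, the route names and rebuild_everything() are illustrative assumptions.
    import threading
    from flask import Flask

    app = Flask(__name__)
    rebuilding = threading.Event()

    def rebuild_everything():
        pass  # placeholder for the actual cache purge and rebuild (may take ~30 s)

    @app.route("/health")
    def health():
        # ELB health check target: answer 503 while rebuilding so the instance
        # is marked unhealthy and taken out of rotation.
        if rebuilding.is_set():
            return "rebuilding", 503
        return "ok", 200

    @app.route("/rebuild_cache", methods=["POST"])
    def rebuild_cache():
        def run():
            rebuilding.set()
            try:
                rebuild_everything()
            finally:
                rebuilding.clear()  # healthy again; the ELB adds the instance back
        threading.Thread(target=run).start()
        return "rebuild started", 202

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

Pointing the ELB health check at /health with a short interval and a low unhealthy threshold keeps the window in which traffic still reaches a rebuilding instance small.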
I see two strategies here: take a worker offline for the period, so the balancer will abandon it; or invert control, so that workers pull tasks from a queue instead of the balancer pushing tasks to the workers. The second strategy is easy to do with a message queue.
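A rough sketch of the second strategy, assuming Redis as the queue (the queue name, payload format and handle_request() are placeholders): a front end pushes each request onto a list, and a worker only pulls the next item when it is not rebuilding, so a busy worker simply stops taking work instead of failing requests.

    # Illustrative pull-model worker using Redis as the message queue.
    # The queue name, host and handle_request() are assumptions for this sketch.
    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def handle_request(payload: dict) -> None:
        print("processed", payload)  # placeholder for the real request handling

    def cache_needs_rebuild() -> bool:
        return False                 # placeholder for the real check

    def rebuild_cache() -> None:
        pass                         # placeholder: may take up to 30 seconds

    while True:
        if cache_needs_rebuild():
            rebuild_cache()          # no new work is pulled while this runs
        _, raw = r.blpop("http_requests")  # blocks until a job is available
        handle_request(json.loads(raw))

Note that turning synchronous HTTP request/response into queue messages also needs a reply channel of some kind, which is why this pattern is more natural for asynchronous work than for interactive requests.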
I am testing the auto-scaling feature of OpenStack. In my test setup, Java servlet applications are deployed on Tomcat web servers behind an HAProxy load balancer. I aim to stress test the application, to see how it scales and what the response time is, using JMeter as the stress tester. However, I observe that HAProxy (or something else) terminates the connection as soon as the onComplete signal is sent by one of the member instances. Consequently, the subsequent responses from the remaining servers are reported as failures (timeouts). I have configured the HAProxy server to use a round-robin algorithm with sticky sessions. See the attached JMeter results tree; I am not sure of the next step to take. The web applications are asynchronous, hence my expectation was that the client (HAProxy in this case) should wait until the last thread is submitted before sending the response.
Are there some issues with my approach or some flaws in the setup?
Docker Swarm mode achieves inner load balancing; as far as I know, nginx is what gets called hard load balancing and ZooKeeper is a kind of soft load balancing.
So what's the mechanism of the inner load balancing that comes with Docker v1.12?
Does it embed an nginx inside, or use a similar method like ZooKeeper?
"Inner" load balancing? Not exactly.
Commit ea4fef2 documents it (docs/swarm/key-concepts.md) as
Swarm uses ingress load balancing to expose the services you want to make available externally to the Swarm.
Swarm can automatically assign the service a PublishedPort or you can configure a PublishedPort for the service in the 30000-32767 range.
External components, such as cloud load balancers, can access the service on the PublishedPort of any node in the cluster, even if the node is not currently running the service.
Swarm has an internal DNS component that automatically assigns each service in the Swarm a DNS entry.
Swarm uses internal load balancing to distribute requests among services within the cluster based upon the services' DNS name.
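As a concrete illustration of that internal, DNS-based load balancing (a sketch only; the service name web and port 8080 are placeholders): from a container attached to the same overlay network, the service name resolves to the service's virtual IP, and connections to that VIP are spread across the service's tasks at L3/L4.

    # Run inside a container on the same overlay network as a hypothetical
    # "web" service: Swarm's internal DNS resolves the service name to a VIP.
    import socket
    import urllib.request

    vip = socket.gethostbyname("web")   # Swarm DNS entry -> virtual IP
    print("service VIP:", vip)

    # Repeated requests to the VIP are distributed across the service's tasks.
    for _ in range(3):
        print(urllib.request.urlopen("http://web:8080/").read())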
Right now (Docker 1.12, August 2016), that inner load balancing does not work consistently: see issue 25325
    ➜ ~ time curl http://10.218.3.5:30000
    I'm 272dd0310a95
    curl http://10.218.3.5:30000 0.01s user 0.01s system 6% cpu 0.217 total
    ➜ ~ time curl http://10.218.3.5:30000
    curl: (7) Failed to connect to 10.218.3.5 port 30000: Operation timed out
And swarmkit issue 1077 illustrates there is no plan yet to
provide capabilities for session stickiness (cookie-based etc.) in this router mesh.
As awesome as it would be, not all apps are stateless, and we need to route users to the proper container in certain cases
Because:
since we do load balancing at L3/L4 it cannot be based on things like a session cookie.
The best that can be done is to have Source IP based stickiness.
And source IP is not always good enough:
That wouldn't work for our case.
We would have an upstream load balancer (F5) which would make traffic appear to come from a single IP, the "SNAT pool" IP on the F5 since it is a full proxy.
Effectively, Source IP based stickiness would cause all requests to go to one container since all the source IPs would come from the same address.
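A toy model of why that breaks (a deliberate simplification; the real ingress uses IPVS scheduling, not this hash, and the task names are invented): if the backend is chosen from the source IP, then once every request arrives from the F5's single SNAT address they all land on the same task.

    # Simplified illustration of source-IP stickiness, not the actual IPVS logic.
    import hashlib

    tasks = ["task-1", "task-2", "task-3"]

    def pick_task(source_ip: str) -> str:
        digest = int(hashlib.md5(source_ip.encode()).hexdigest(), 16)
        return tasks[digest % len(tasks)]

    print(pick_task("203.0.113.10"))   # distinct client IPs spread across tasks
    print(pick_task("203.0.113.27"))
    print(pick_task("10.1.2.3"))       # a single SNAT IP...
    print(pick_task("10.1.2.3"))       # ...always maps to the same task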
So the internal load balancer remains quite "basic":
The main issue with adding "session stickyness" is that there are a hundred ways to do it.
It is also an L7 feature, whereas our load balancing operates at L3/4.
There are two high-level paths here:
Monitor events coming from the Docker API and modify F5 state to route directly to task slots.
Integrate with libnetwork and have the load balancer operate as an L7 LB would if it were running directly in the swarm.
The conclusion for now is:
If you want to handle all aspects of load balancing and not use IPVS, you can disable it by running services in DNSRR mode. You can run any load balancer inside the swarm to do load balancing, bypassing the service VIP and populating the backends with the DNSRR entries.
That is why the latest release 1.12 has, with PR 827, added support for DNSRR mode and disabling ingress.
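A minimal sketch of putting a service into DNSRR mode via the Docker SDK for Python (the image and service name are placeholders; the equivalent CLI flag is --endpoint-mode dnsrr): with no VIP allocated, the service name resolves to one DNS record per task, which an in-swarm L7 load balancer can use as its backend list.

    # Hedged sketch using docker-py: create a service with DNSRR endpoint mode
    # so no virtual IP is allocated and DNS returns one record per task.
    import docker
    from docker.types import EndpointSpec

    client = docker.from_env()
    service = client.services.create(
        "nginx:alpine",                       # placeholder image
        name="web-dnsrr",                     # placeholder service name
        endpoint_spec=EndpointSpec(mode="dnsrr"),
    )
    print(service.name, "created in DNSRR mode")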
I've found that Instagram shares their technology implementation with other developers through their blog. They have some great solutions for the problems they run into. One of those solutions is an Elastic Load Balancer on Amazon with 3 nginx instances behind it. What is the task of those nginx servers? And what is the task of the Elastic Load Balancer, and what is the relation between them?
Disclaimer: I am no expert on this in any way and am in the process of learning about AWS ecosystem myself.
The ELB (Elastic Load Balancer) has no functionality of its own except receiving requests and routing them to the right server. The servers can run Nginx, IIS, Apache, lighttpd, you name it.
I will give you a real use case.
I had one Nginx server running one WordPress blog. This server was, like I said, powered by Nginx serving static content and "upstreaming" .php requests to php-fpm running on the same server. Everything was going fine until, one day, the blog was featured on a TV show. I had a ton of users and the server could not keep up with that much traffic.
My first reaction was to just use the AMI (Amazon Machine Image) to spin up a copy of my server on a more powerful instance like m1.heavy. The problem was that I knew traffic would keep increasing over the next couple of days. Soon I would have to spin up an even more powerful machine, which would mean more downtime and trouble.
Instead, I launched an ELB (elastic load balancer) and updated my DNS to point website traffic to the ELB instead of directly to the server. The user doesn’t know server IP or anything, he only sees the ELB, everything else goes on inside amazon’s cloud.
The ELB decides which server the traffic goes to. You can have an ELB and only one server behind it (if your traffic is low at the moment), or hundreds. Servers can be created and added to the server array (server group) at any time, or you can configure auto scaling to spawn new servers and add them to the ELB server group using the Amazon command line, all automatically.
Amazon CloudWatch (another product and an important part of the AWS ecosystem) is always watching your servers' metrics, while the ELB's health checks determine which servers can receive traffic. CloudWatch also knows when the servers are becoming too loaded and, together with auto scaling, is the agent that gives the order to spawn another server (using your AMI). When the servers are no longer under heavy load, the extra ones are automatically destroyed (or stopped, I don't recall).
This way I was able to serve all users at all times, and when the load was light I would have the ELB and only one Nginx server. When the load was high I would let it decide how many servers I needed (according to server load). Minimal downtime. Of course, you can set limits on how many servers you can afford at the same time and so on, so you don't get billed for more than you can pay.
You see, the Instagram guys said the following: "we used to run 2 Nginx machines and DNS Round-Robin between them". This is inefficient IMO compared to ELB. DNS Round Robin has DNS hand out a different server for each lookup, so one client goes to server one, the next goes to server two, and so on.
ELB actually watches the servers' health (via its health checks) and decides which server the traffic goes to based on that. Do you see the difference?
And they say: "The downside of this approach is the time it takes for DNS to update in case one of the machines needs to get decommissioned."
DNS Round Robin is a form of load balancing. But if one server goes kaput and you need to update DNS to remove it from the server group, you will have downtime (DNS takes time to propagate around the world), and some users will still get routed to the bad server. With ELB this is automatic: if a server is in bad health, it does not receive any more traffic, unless of course the whole group of servers is in bad health and you do not have any kind of auto scaling set up.
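For a concrete picture of the health-check side, here is a hedged sketch using boto3 against a classic ELB (the load balancer name, path and thresholds are placeholder values, not from the post); these settings are exactly the knobs that decide how quickly a bad server stops receiving traffic.

    # Hypothetical example: configure a classic ELB health check with boto3.
    import boto3

    elb = boto3.client("elb")
    elb.configure_health_check(
        LoadBalancerName="my-load-balancer",
        HealthCheck={
            "Target": "HTTP:80/health",   # what the ELB probes on each instance
            "Interval": 10,               # seconds between probes
            "Timeout": 5,
            "UnhealthyThreshold": 2,      # failures before removal from rotation
            "HealthyThreshold": 3,        # successes before being added back
        },
    )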
And now the guys at Instagram: "Recently, we moved to using Amazon’s Elastic Load Balancer, with 3 NGINX instances behind it that can be swapped in and out (and are automatically taken out of rotation if they fail a health check).".
The scenario I illustrated is fictional. It is actually more complex than that, but nothing that cannot be solved. For instance, if users upload pictures to your application, how do you keep consistency between all the machines in the server group? You would need to store the images on an external service like Amazon S3. In another post on Instagram engineering: "The photos themselves go straight to Amazon S3, which currently stores several terabytes of photo data for us." If they have 3 Nginx servers behind the load balancer and all servers serve HTML pages whose image links point to S3, there is no problem. If the images were stored locally on each instance, there would be no way to do it.
All servers behind the ELB would also need an external database. For that Amazon has RDS: all machines can point to the same database and data consistency is guaranteed.
RDS also offers "Read Replicas", which are RDS's way of load balancing reads. I don't know much about that at this time, sorry.
Try and read this: http://awsadvent.tumblr.com/post/38043683444/using-elb-and-auto-scaling
Can you please point the blog entry out?
Load balancers balance load. They monitor the web servers' health (response time, etc.) and distribute the load between the web servers. In more complex implementations it is possible to have new servers spawn automatically if there is a traffic spike. Of course, you need to make sure there is consistency between the servers; they can share the same database, for instance.
So I believe the load balancer gets hit and decides to which server it will route the traffic according to server health.
Nginx is a Web server that is extremely good at serving a lot of static content for simultaneous users.
Requests for dynamic pages can be offloaded to a different server using CGI, or the same servers that run nginx can also run php-fpm.
A lot of possibilities. I am on my cell phone right now; tomorrow I can write a little more.
Best regards.
I am aware that I am late to the party, but I think the NGINX instances behind the ELB in the Instagram blog post are there to provide a highly available load balancer, as described here.
The NGINX instances do not seem to be used as web servers in the blog post.
For that role they mention:
Next up comes the application servers that handle our requests. We run Django on Amazon High-CPU Extra-Large machines
So the ELB is used just as a replacement for their older solution of DNS Round-Robin between NGINX instances, which did not provide high availability.
We have a BizTalk server that makes frequent calls to a web service that we also host.
The web service is hosted on 4 servers with a DNS load balancer sitting between them. The theory is that each subsequent call to the service will round robin the servers and balance the load.
However, this does not work, presumably because the result of the DNS lookup is cached for a small amount of time on the client. The result is that we get a flood of requests to each server before it moves on to the next.
Is that presumption correct and what are the alternative options here?
A bit more googling suggested that I can disable client-side caching for DNS: http://support.microsoft.com/kb/318803
...however, this states the default cache time is 1 day, which is not consistent with my experience.
You need to load balance at a lower level, with NLB clustering on Windows or LVS on Linux (or another equivalent piece of software). If you are letting clients of the web service keep an HTTP connection open for longer than a single request/response, you still might not get the granularity of load balancing you are looking for, so you might have to reconfigure your application servers if that is the case.
The solution we finally decided to go with was Application Request Routing which is an IIS extension. In tests this has shown to do what we want and is far easier for us (as developers) to get up and running as compared to a hardware load balancer.
http://www.iis.net/download/ApplicationRequestRouting
Say I have a web farm of six IIS 7 web servers, each running an identical ASP.NET application.
They are behind a hardware load balancer (say F5).
Can the load balancer detect when the ASP.NET application worker process is restarting and divert traffic away from that server until after the worker process has restarted?
What happens during an IIS restart is actually a roll-over process. A new IIS worker process starts that accepts new connections, while the old worker process continues to process existing connections.
This means that if you configure your load balancer to use a balancing algorithm other than simple round robin, it will tend to "naturally" divert some, but not all, connections away from the machine that is recycling. For example, with a "least connections" algorithm, connections tend to hang around longer on a recycling server, so fewer new connections are sent its way. Or, with a performance or latency algorithm, the recycling server will suddenly appear slower to the load balancer.
However, unless you write some code or scripts that explicitly detect or recognize a recycle, that's about the best you can do--and in most environments, it's also really all you need.
We use a Cisco CSS 11501 series load balancer. The keepalive "content" we are checking on each server is actually a PHP script.
    service ac11-192.168.1.11
      ip address 192.168.1.11
      keepalive type http
      keepalive method get
      keepalive uri "/LoadBalancer.php"
      keepalive hash "b026324c6904b2a9cb4b88d6d61c81d1"
      active
Because it is a dynamic script, we are able to tell it to check various aspects of the server, and return "1" if all is well, or "0" if not all is well.
In your case, you may be able to implement a similar check script that fails its check if the ASP.NET application worker process is restarting (or down).
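Purely as an illustration (the original check script is PHP and its internals are not shown; the checks, port and probe URL below are invented), such a script just returns "1" when everything it inspects is fine and "0" otherwise, so a hash or content match on the load balancer fails the keepalive as soon as anything is wrong:

    # Hypothetical health-check responder in Python; the original uses PHP.
    # The specific checks (free disk, a local HTTP probe) are assumptions.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import shutil
    import urllib.request

    def app_is_healthy() -> bool:
        try:
            free_gb = shutil.disk_usage("/").free / 1e9
            urllib.request.urlopen("http://127.0.0.1:80/", timeout=2)
            return free_gb > 1
        except Exception:
            return False

    class LoadBalancerCheck(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"1" if app_is_healthy() else b"0"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), LoadBalancerCheck).serve_forever()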
It depends a lot on the polling interval of the load balancer; a request from the balancer has to fail before it can decide to divert traffic.
IIS 6 and 7 restart application pools every 1740 minutes by default. It does this in an overlapped manner so that services are not impacted.
http://technet.microsoft.com/en-us/library/cc770764%28v=ws.10%29.aspx
http://forums.iis.net/t/1153736.aspx/1
On the other hand, in case of a fault, a good load balancer (I'm sure F5 is) can detect a fault with one of the web servers and send requests to the remaining, healthy web servers. That's a critical part of a high-availability web infrastructure.