Sticky session load balancer with nginx open source

What is the main difference between the sticky sessions available in NGINX Plus and hashing a cookie in the open source version?
According to the docs, nginx open source allows session persistence based on hashing different global variables available within nginx, including the $cookie_ variables.
With the following configuration:
upstream myserver {
    hash $cookie_sessionID;
    server localhost:8092;
    server localhost:8093;
    server localhost:8094 weight=3;
}
location / {
    proxy_pass http://myserver;
}
Assuming there is a centralized mechanism across the backends that generates a unique sessionID cookie for all new requests, what are the main disadvantages of this method compared to the NGINX Plus sticky session approach?
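
To make the behaviour of the configuration above concrete, here is a minimal Python sketch of weighted hashing on a session ID. It is not nginx's actual algorithm: the CRC32 hash, the slot list, and the helper names build_slots, pick_server and remapped_fraction are all assumptions made for illustration. It shows that a given sessionID always lands on the same server, and that removing one server from the group moves a share of the existing sessions elsewhere.

```python
import zlib

# Servers and weights copied from the upstream block in the question.
SERVERS = [("localhost:8092", 1), ("localhost:8093", 1), ("localhost:8094", 3)]


def build_slots(servers):
    """Expand (name, weight) pairs into a flat list; a weight of 3 yields 3 slots."""
    slots = []
    for name, weight in servers:
        slots.extend([name] * weight)
    return slots


def pick_server(session_id, slots):
    """Map a session ID to a slot with a stable hash (CRC32, for illustration only)."""
    return slots[zlib.crc32(session_id.encode("utf-8")) % len(slots)]


def remapped_fraction(session_ids, servers_before, servers_after):
    """Fraction of sessions that land on a different server after the group changes."""
    before, after = build_slots(servers_before), build_slots(servers_after)
    moved = sum(1 for s in session_ids if pick_server(s, before) != pick_server(s, after))
    return moved / len(session_ids)


if __name__ == "__main__":
    ids = [f"session-{i}" for i in range(1000)]
    print(pick_server("session-42", build_slots(SERVERS)))
    # Drop localhost:8093 and see how many existing sessions change backend.
    remaining = [s for s in SERVERS if s[0] != "localhost:8093"]
    print(remapped_fraction(ids, SERVERS, remaining))
```

A cookie that names the chosen server directly, which is roughly how cookie-based stickiness works, does not suffer from this remapping when the group changes. The hash directive's consistent parameter is intended to reduce it.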

IP Hash load balancing can work as "sticky sessions",
but keep in mind that this method balances relatively badly on its own, because many users and devices share the same external IP address nowadays.
We experimented with a rather heavily loaded application (thousands of parallel users) and observed an imbalance of tens of percent between servers when using IP Hash.
Theoretically, the situation should improve with increasing load and number of servers, but we did not see any significant difference between using 3 and 5 servers, for example.
So I would strongly advise against using IP Hash in a production environment.
As an open-source sticky sessions solution, HAProxy is not a bad idea, because it supports them out of the box.
Another option is an HAProxy + Nginx bundle, where HAProxy is responsible for the "sticky sessions".
(I know of one extremely loaded system that successfully uses such a bundle for this very purpose, so the idea works.)
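
The imbalance described above can be reproduced with a toy model. The sketch below is an illustration under stated assumptions, not nginx's ip_hash implementation: it hashes a made-up client key with CRC32, and the function names and NAT sizes are hypothetical. Each entry in nat_sizes is the number of users behind one shared external IP.

```python
import zlib


def simulate_ip_hash(num_servers, nat_sizes):
    """Return per-server user counts when every user behind one IP goes to one server."""
    load = [0] * num_servers
    for index, users in enumerate(nat_sizes):
        client_ip = f"client-ip-{index}"  # placeholder key standing in for an external IP
        server = zlib.crc32(client_ip.encode("utf-8")) % num_servers
        load[server] += users
    return load


def imbalance(load):
    """Spread between the busiest and the idlest server, relative to the mean load."""
    mean = sum(load) / len(load)
    return (max(load) - min(load)) / mean if mean else 0.0


if __name__ == "__main__":
    # Hypothetical population: many single-user IPs plus a few large shared NATs.
    nat_sizes = [1] * 2000 + [300, 250, 200]
    for servers in (3, 5):
        load = simulate_ip_hash(servers, nat_sizes)
        print(servers, load, round(imbalance(load), 2))
```

Because all users behind one address move together, a handful of large shared IPs can dominate the result regardless of how many servers are in the pool.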

Your approach will work. According to the official NGINX documentation (Configuring Basic Session Persistence):
"If your application requires basic session persistence (also known as sticky sessions), you can implement it in NGINX Open Source with the IP Hash load‑balancing algorithm."
NGINX Plus, by contrast, "offers a more sophisticated form of session persistence" through its sticky directive (the cookie, route, and learn methods). It also adds load-balancing methods such as "Least Time", which chooses for each request the server with the lowest average latency and the lowest number of active connections.

Related

What will happen if an SSL-configured Nginx reverse proxy passes to a web server without SSL?

I use Nginx to manage a lot of my web services. They listen on different ports, but all are accessed through the Nginx reverse proxy within one domain. For example, to access a RESTful API server I can use http://my-domain/api/, and to access a video server I can use http://my-domain/video.
I have generated an SSL certificate for my-domain and added it to my Nginx conf, so my Nginx server is HTTPS now, but the original servers are still using HTTP.
What will happen when I visit https://my-domain/<path>? Is this as safe as configuring SSL on the original servers?

One of the goals of serving sites over HTTPS is to prevent data transmitted between two endpoints from being intercepted by outside parties, whether to be modified, as in a man-in-the-middle attack, or to be stolen and used for bad purposes. On the public Internet, any data transmitted between two endpoints needs to be secured.
On private networks, this need isn't quite so great. Many services run on plain HTTP on private networks just fine. However, there are a couple of points to take into consideration:
Make sure unused ports are blocked:
While you may have an NGINX reverse proxy listening on port 443, is port 80 blocked, or can the sites still be accessed via HTTP?
Are the other ports to the services blocked as well? Let's say your web server runs on port 8080, and the NGINX reverse proxy forwards certain traffic to localhost:8080, can the site still be accessed at http://example.com:8080 or https://example.com:8080? One way to prevent this is to use a firewall and block all incoming traffic on any ports you don't intend to accept traffic on. You can always unblock them later, if you add a service that requires that port be opened.
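
One quick way to check which ports actually accept connections is to try connecting to them from another machine. The following is a minimal sketch using only the Python standard library; the function names is_port_open and report_ports are hypothetical, and the host and ports you pass in are up to you.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def report_ports(host, ports, timeout=2.0):
    """Map each port to 'open' or 'closed or filtered' as seen from this machine."""
    return {
        port: "open" if is_port_open(host, port, timeout) else "closed or filtered"
        for port in ports
    }
```

Calling report_ports with your public host name and a list such as [80, 443, 8080] from outside the server's network shows what a firewall is really letting through. A check run on the server itself says nothing about the firewall.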
Internal services are accessible by other services on the same server
The next consideration relates to other software that may be running on the server. While it's within a private ecosystem, any service running on the server can access localhost:8080. Since the traffic between the reverse proxy and the web server is not encrypted, that traffic can also be sniffed, even if authorisation is required in order to access localhost:8080. All a rogue service would need to do is monitor the port and wait for a user to log in. Then that service can capture everything between the two endpoints.
One strategy to mitigate the dangers created by spyware is to either use virtualisation to separate a single server into logical servers, or use different hardware for things that are not related. This at least keeps things separate so that the people responsible for application A don't think that service X might be something the team running application B is using. Anything out of place will more likely stand out.
For instance, a company website and an internal wiki probably don't belong on the same server.
The simpler we can keep the setup and configuration on the server by limiting what that server's job is, the more easily we can keep tabs on what's happening on the server and prevent data leaks.
Use good security practices
Follow security best practices on the server. For instance, don't run as root; use a non-root user for administrative tasks. Don't run long-lived services as root either.
For instance, NGINX is capable of running as the user www-data. With specific users for different services, we can create groups and assign the different users to them and then modify the file ownership and permissions, using chown and chmod, to ensure that those services only have access to what they need and nothing more. As an example, I've often wondered why NGINX needs read access to logs. It really should, in theory, only need write access to them. If this service were to somehow get compromised, the worst it could do is write a bunch of garbage to the logs, but an attacker might find their hands are tied when it comes to retrieving sensitive information from them.
localhost SSL certs are generally for development only
While I don't recommend this for production, there are ways to make localhost use HTTPS. One is with a self-signed certificate. The other uses a tool called mkcert, which lets you be your own CA (certificate authority) for issuing SSL certificates. The latter is a great solution, since the browser and other services will implicitly trust the generated certificates, but the general consensus, shared even by the author of mkcert, is that it is only recommended for development purposes, not production. I've yet to find a good solution for localhost in production. I don't think one exists, and in my experience, I've never seen anyone worry about it.

Achieve a "more than 10 connections, pass to next server" setup with NGINX or other

Idea
Gradually use a few small-scale dedicated servers in combination with an expensive cloud platform, where, under light traffic, the dedicated servers should first be filled up before the cloud kicks in. This hedges against occasional traffic spikes.
nginx
Is there an easy way (without nginx plus) to achieve a "waterfall like" set-up, where small servers should first be served up to a maximum number of concurrent connections, or better, current bandwidth before the cloud platform sees any traffic?
Nginx Config, Libraries, Tools?
Thanks

You can use the nginx upstream module.
If you want total control, mark your cloud servers with the backup parameter, so that they won't be used until your primary servers fail. Then use custom monitoring scripts to determine when those cloud servers should kick in, change the nginx config, and remove the backup keyword from them. Also monitor for the conditions under which you want to stop using the cloud servers, and alter the nginx config again.
A simpler solution (but without fine-tuning such as avoiding spikes) is to use the max_conns=number parameter. Nginx should start to use the backup server once all the others already have their maximum number of connections (I didn't test it).
NOTE: the max_conns parameter was only available in paid nginx between v1.5.9 and v1.11.5, so the only solution with those versions is your own monitoring plus reloading the nginx config whenever the upstream servers need to change. Thanks to Mickaël Le Baillif's comment for pointing out that this parameter is now available to all.
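
The waterfall behaviour this answer aims for can be modelled in a few lines of Python. This is a sketch of the intended selection logic only, not of nginx internals: the Backend class, choose_backend, and the server names are hypothetical, and whether nginx itself falls through to backup servers exactly like this is, as noted above, untested.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    max_conns: int        # 0 means "no limit", mirroring the nginx default
    backup: bool = False  # backup servers are only considered when primaries are full
    active: int = 0       # connections currently assigned in this model


def choose_backend(backends):
    """Pick the least busy primary with spare capacity, else a backup, else None."""
    primaries = [b for b in backends if not b.backup]
    backups = [b for b in backends if b.backup]
    for group in (primaries, backups):
        free = [b for b in group if b.max_conns == 0 or b.active < b.max_conns]
        if free:
            chosen = min(free, key=lambda b: b.active)
            chosen.active += 1
            return chosen
    return None  # every server is at its connection limit


if __name__ == "__main__":
    pool = [
        Backend("small-1", max_conns=10),
        Backend("small-2", max_conns=10),
        Backend("cloud", max_conns=0, backup=True),
    ]
    # The first 20 connections fill the small servers; later ones spill to the cloud.
    print([choose_backend(pool).name for _ in range(22)])
```

The model never releases connections, so it only shows the fill order. A real setup would also need to decide how quickly traffic drains back from the cloud.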

Understanding load balancing in ASP.NET

I'm writing a website that is going to start using a load balancer and I'm trying to wrap my head around it.
Does IIS just do all the balancing for you?
Do you have a separate web layer that sits on the distributed server that does some work before sending to the sub server, like auth or other work?
It seems like a lot of the articles I keep reading don't really give me a straight answer, or I'm just not understanding them correctly. I'd like to get my head around how true load balancing works from a technical side, and if anyone has any code to share, that would also be nice.
I understand caching is gonna be a problem but that's a different topic, session as well.

IIS does not have a load balancer by default, but you can use at least two Microsoft technologies:
Application Request Routing, which integrates with IIS; there you should ideally have a separate web layer to do the routing work,
Network Load Balancing, which is integrated with Microsoft Windows Server; there you can join existing servers into an NLB cluster.
Neither of those technologies requires any code per se; it is a matter of infrastructure. But you must of course keep the load-balanced environment in mind during development. For example, to make web sites truly balanced, they should be stateless. Otherwise you will have to provide so-called stickiness between the client and the server, so that the same client always connects to the same server.
To make a service stateless, do not persist any state (Session, for example, in the case of an ASP.NET website) on the server itself but on an external server shared between all servers in the farm. So it is common, for example, to use an external ASP.NET Session server (StateServer or SQLServer modes) for all sites in the cluster.
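
The idea of moving state out of the web servers can be sketched in Python (used here only as a neutral illustration language; it is not ASP.NET code). The SharedSessionStore and WebServer classes are hypothetical stand-ins for an external session server and for the web servers in the farm.

```python
class SharedSessionStore:
    """Stand-in for an external session server that every web server can reach."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, values):
        self._data[session_id] = dict(values)


class WebServer:
    """A web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, key=None, value=None):
        """Load the session, optionally update one key, and return the session."""
        session = self.store.load(session_id)
        if key is not None:
            session[key] = value
            self.store.save(session_id, session)
        return session


if __name__ == "__main__":
    store = SharedSessionStore()
    server_a, server_b = WebServer("A", store), WebServer("B", store)
    server_a.handle("abc123", "cart", ["book"])
    # The next request may hit a different server and still sees the same session.
    print(server_b.handle("abc123"))
```

Since neither server holds the session itself, the balancer is free to send each request anywhere, and no stickiness is needed.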
EDIT:
Just to clarify a few things, a few words about both mentioned technologies:
NLB works at the network level (as a networking driver, in fact), so without any knowledge of the applications used. You create so-called clusters consisting of a few machines/servers and expose them as a single IP address. Another machine can then use this IP like any other IP, but connections will be routed to one of the cluster's machines automatically. A cluster is configured on each server; there is no external, additional routing machine. Depending on the cluster's settings, as already mentioned, stickiness can be enabled or disabled (called Single or None affinity here). There is also a Load weight parameter, so you can set a weighted load distribution, sending more connections to the fastest machine, for example. But this parameter is static; it can't be dynamically based on network, CPU or any other usage. In fact, NLB does not care whether the target application is even running; it just routes network traffic to the selected machine. It does notice when servers go offline, though, so no traffic will be routed there. The advantage of NLB is that it is quite lightweight and requires no additional machines.
ARR is much more sophisticated; it is built as a module on top of IIS and is designed to make routing decisions at the application level. Load balancing is only one of its features, as it is a more complete routing solution. It has "rule-based routing, client and host name affinity, load balancing of HTTP server requests, and distributed disk caching" as Microsoft states. You create Server Farms there, with many options such as the load-balancing algorithm, load distribution and client stickiness. You can define health tests and routing rules to forward requests to other servers. The disadvantage of all this is that there should be a dedicated machine where ARR is installed, so it takes more resources (and costs).
NLB & ARR - since a single ARR machine can be a single point of failure, Microsoft states that it is worth considering creating an NLB cluster of ARR machines.

Does IIS just do all the balancing for you?
Yes, if you configure Application Request Routing.
Do you have a separate web layer that sits on the distributed server
Yes.
that does some work before sending to the sub server, like auth or other work?
No, ARR is pretty 'dumb':
IIS ARR doesn't provide any pre-authentication. If pre-auth is a requirement then you can look at Web Application Proxy (WAP) which is available in Windows Server 2012 R2.
It just acts as a transparent proxy that accepts and forwards requests, while adding some caching when configured.
For authentication you can look at Windows Server 2012's Web Application Proxy.

Some tips, and perhaps items to get yourself fully acquainted with:
ARR, as all the answers above state, is a "proxy" that handles the traffic from your users to your servers.
You can handle State as Konrad points out, or you can have ARR do "sticky" sessions (ensure that a client always goes to "this server" - presumably the server that maintains state for that specific client). See the discussion/comments on that answer - it's great.
I haven't worn an IT/server hat for so long, and frankly haven't touched clustering hands-on (it was always "handled for me automagically" by some provider), so I did ask our host this question: "what is replicated among our cluster/farm, and how?" The question covers things like:
I'm only working/setting things on 1 server, does that get replicated across X VMs in our cluster/farm? How long?
What about dynamically generated code and/or user-generated files (file system)? If it's on VM1's file system, and I have 10 load balanced VMs, and the client can hit any one of them at any time, then...?
What about encryption? E.g. if you use DPAPI to encrypt web.config stuff (e.g. db conn strings/sections), what is the impact of that (because it's based on the machine key, and well, the obvious thing is that now you have multiple machines or VMs)? RSA re-write....?
SSL: ARR can handle this for you as well, and that's great! But as with all power, there comes a "con": if you check/validate in your code, e.g. HttpRequest.IsSecureConnection, well, it'll always be false. Your servers/VMs don't have the cert, ARR does. The encrypted conn is between client and ARR. ARR to your servers/VMs isn't. As the link explains, if you prefer it the other way around (no offloading), you can...but that means all your servers/VMs should then have a cert (and how that pertains to "replication" above starts popping in your head).
Not meant to be comprehensive, just listing things out from memory...Hth
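
On the SSL offloading point above: when TLS is terminated at the proxy, a backend can only learn the original scheme from something the proxy passes along. The sketch below shows the general idea in Python with a WSGI-style environ dictionary. It is not ASP.NET code, and it assumes the proxy has been configured to add an X-Forwarded-Proto header, which is a common convention rather than something stated above about ARR. The function name is hypothetical.

```python
def is_secure_request(environ, trust_proxy=True):
    """Decide whether the client's original request used HTTPS.

    environ is a WSGI-style dict. With TLS offloading, wsgi.url_scheme is
    'http' on the backend, so the forwarded header is consulted instead.
    Only trust that header when requests can reach the backend solely
    through the proxy, since clients can otherwise forge it.
    """
    if environ.get("wsgi.url_scheme") == "https":
        return True
    if trust_proxy:
        forwarded = environ.get("HTTP_X_FORWARDED_PROTO", "")
        # A chain of proxies may yield a comma-separated list; the first entry is the client's.
        return forwarded.split(",")[0].strip().lower() == "https"
    return False


if __name__ == "__main__":
    offloaded = {"wsgi.url_scheme": "http", "HTTP_X_FORWARDED_PROTO": "https"}
    print(is_secure_request(offloaded))                     # header is honoured
    print(is_secure_request(offloaded, trust_proxy=False))  # header is ignored
```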

Hitting a specific jboss instance behind a load balancer

My webapp is deployed in a cluster of multiple JBoss instances. There is an admin page in the webapp to perform certain Jboss instance-specific operations.
The problem is that requests are sent to a load balancer instead of directly hitting specific individual instance.
Is there any way to direct request to a specific instance? Or at least when the admin page is up, all subsequent requests (Ajax) will stick to the original instance that serves the page at the beginning.
I don't think HttpSession is going to help here. I need to target specific instance and not maintaining the state of individual client.
Thanks.

You are looking for how to configure sticky sessions.
Sending all requests in a user session consistently to the same backend server is known as persistence or stickiness. A significant downside to this technique is its lack of automatic failover: if a backend server goes down, its per-session information becomes inaccessible, and any sessions depending on it are lost. The same problem usually applies to central database servers, even if the web servers are "stateless" and not "sticky".
Assignment to a particular server might be based on a username, the client IP address, or random assignment. Each approach has its advantages and disadvantages.
I would suggest going through the articles below on configuring JBoss in a cluster, rather than digging into the details, unless you really want to understand them in depth.
http://docs.jboss.org/jbossas/docs/Clustering_Guide/beta422/html/clustering-http-nodes.html
https://community.jboss.org/wiki/HTTPLoadbalancer
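
One common way for a balancer to pin requests to a particular instance is a route suffix on the session ID: with Tomcat-based containers, a configured jvmRoute is appended to JSESSIONID after a dot, and balancers such as mod_jk route on that suffix. The Python sketch below only illustrates that parsing step; the function names, the route-to-address map, and the node names are hypothetical, and your own JBoss and balancer setup may differ.

```python
def route_from_session_id(session_id):
    """Return the route suffix of 'ID.route', or None when there is no suffix."""
    _, dot, route = session_id.rpartition(".")
    return route if dot and route else None


def pick_instance(session_id, routes, default):
    """Send the request to the instance named by the route, else to a default."""
    route = route_from_session_id(session_id) if session_id else None
    return routes.get(route, default)


if __name__ == "__main__":
    # Hypothetical mapping from route names to instance addresses.
    routes = {"node1": "10.0.0.1:8080", "node2": "10.0.0.2:8080"}
    print(pick_instance("A1B2C3D4.node2", routes, default="10.0.0.1:8080"))
```

With routing of this kind, the admin page and its subsequent Ajax calls stay on one instance as long as they carry the same session cookie.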

Round robin load balancing options for a single client

We have a BizTalk server that makes frequent calls to a web service that we also host.
The web service is hosted on 4 servers with a DNS load balancer sitting between them. The theory is that each subsequent call to the service will round robin the servers and balance the load.
However this does not work presumably because the result of the DNS lookup is cached for a small amount of time on the client. The result is that we get a flood of requests to each server before it moves on to the next.
Is that presumption correct and what are the alternative options here?
A bit more googling has suggested that I can disable client-side caching for DNS: http://support.microsoft.com/kb/318803
...however, this states the default cache time is 1 day, which is not consistent with my experience.
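
The presumption in the question can be illustrated with a toy model of a round-robin DNS server and a single client that caches lookups. This is a simplified sketch, not a model of the Windows resolver: the class names, the simulate function, and the one-request-per-second timing are assumptions made for illustration.

```python
import itertools


class RoundRobinDns:
    """Hands out the next address from a fixed list on every lookup."""

    def __init__(self, addresses):
        self._cycle = itertools.cycle(addresses)

    def resolve(self):
        return next(self._cycle)


class CachingClient:
    """Reuses the last lookup result until ttl seconds have passed."""

    def __init__(self, dns, ttl):
        self.dns = dns
        self.ttl = ttl
        self._cached = None
        self._expires = 0

    def target(self, now):
        if self._cached is None or now >= self._expires:
            self._cached = self.dns.resolve()
            self._expires = now + self.ttl
        return self._cached


def simulate(ttl, requests, addresses):
    """Issue one request per second and return the server each request went to."""
    client = CachingClient(RoundRobinDns(addresses), ttl)
    return [client.target(now) for now in range(requests)]


if __name__ == "__main__":
    servers = ["s1", "s2", "s3", "s4"]
    print(simulate(0, 8, servers))  # no caching: requests rotate evenly
    print(simulate(3, 8, servers))  # cached lookups: bursts to one server at a time
```

With a single busy client, any non-zero cache time turns the rotation into bursts, which matches the flood-then-move-on pattern described in the question.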

You need to load balance at a lower level, with NLB clustering on Windows or LVS on Linux (or another equivalent piece of software). If you let clients of the web service keep an HTTP connection open for longer than a single request/response, you still might not get the granularity of load balancing you are looking for, so you might have to reconfigure your application servers if that is the case.

The solution we finally decided to go with was Application Request Routing, which is an IIS extension. In tests it has been shown to do what we want, and it is far easier for us (as developers) to get up and running than a hardware load balancer.
http://www.iis.net/download/ApplicationRequestRouting
