I'm using nginx as a reverse proxy (and load balancer) in front of some API backend servers, and I'm using the limit_req_zone directive to limit the maximum number of requests per IP and URI. That works fine.
Eventually we might need to scale out and add a couple more nginx instances. Each nginx instance uses a "shared memory zone" to temporarily store request state (in a kind of cache, I guess) so that it can check whether each request passes the limit defined by the limit_req_zone above. So how does nginx handle this when multiple nginx instances are running at the same time?
For example:
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
Combined with a limit_req zone=one; directive in the location being protected, this tells nginx to allow only 1 request per second from the same IP address. But what if the second request within the same second reaches another nginx instance? As I understand it, it will pass, because the instances don't share the memory zone where this state is stored.
I've tried researching this but couldn't find anything. Any help would be appreciated.
If by multiple nginx you mean multiple master processes, I'm not completely sure what the result is. To have multiple master processes running, they would need to have different configs / different ports to bind.
For worker processes with a single master instance, the shared memory is precisely that, shared, and all of the workers will limit the requests together. The nginx development guide says:
shared_memory — List of shared memory zones, each added by calling the ngx_shared_memory_add() function. Shared zones are mapped to the same address range in all nginx processes and are used to share common data, for example the HTTP cache in-memory tree.
http://nginx.org/en/docs/dev/development_guide.html#cycle
In addition, there's a blog entry about limit_req stating the following:
Zone – Defines the shared memory zone used to store the state of each IP address and how often it has accessed a request‑limited URL. Keeping the information in shared memory means it can be shared among the NGINX worker processes. The definition has two parts: the zone name identified by the zone= keyword, and the size following the colon. State information for about 16,000 IP addresses takes 1 megabyte, so our zone can store about 160,000 addresses.
Taken from https://www.nginx.com/blog/rate-limiting-nginx/
I'm using Go's net/http library to make HTTP requests.
I have a use case where I sometimes resolve a domain name that fronts an Amazon CloudFront distribution. Occasionally, for reasons of its own, CloudFront resolves the DNS to a point of presence (POP) on a different continent, resulting in very high latency.
When this happens, I want to trigger a "targeted reset" that makes my http.Client "forget" about that POP: close the open TCP connections to it and force the DNS for that hostname to be re-resolved.
I'm not able to find a way to force this "targeted reset". My options seem to be:
Close all idle connections in the http.Transport using Transport.CloseIdleConnections(). But this closes every idle TCP connection, not just those to the bad POP, and it does nothing about cached DNS results.
Construct a whole new http.Client. I've confirmed this does result in re-doing the DNS lookup and it does also create a new connection pool. The downside is it forces re-resolution of all other DNS and force-closes all other TCP connections, even ones I like.
Is there a way to achieve the "targeted reset" (DNS for specific hostnames, TCP connections for specific IP addresses) I've described? I'm happy to use reflection and traverse down into the http.Client implementation to some extent.
I have a server with nginx and one working app. I want to add several more apps to this server, and I'd like to understand a few things first.
What is the difference between load balancer and reverse proxy?
In which situations should I use the first, and in which situations should I use the second?
What should I use if my sites are static, and what if not static?
And additionally it would be a big plus to hear about containers in the context of several sites for nginx
Differences between load balancer and reverse proxy
A reverse proxy accepts a request from a client, forwards it to a server that can fulfill it, and returns the server’s response to the client.
A load balancer distributes incoming client requests among a group of servers, in each case returning the response from the selected server to the appropriate client.
Taken from nginx docs
TL;DR:
Reverse proxying is about routing requests to the correct server (for example, based on the domain name).
Load balancing is about distributing load across multiple instances.
What should I use if my sites are static, and what if not static?
You can combine HTTP reverse proxying and load balancing with both static and non-static web apps, so it depends.
And additionally it would be a big plus to hear about containers in the context of several sites for nginx
I recommend one nginx container per app/site, plus a dynamic reverse proxy, in particular Traefik (http://traefik.io).
You need a reverse proxy to route the incoming traffic to the proper application taking into account the content of the original request (and rules that you may define).
Once the target application(s) are determined, you need to load balance across their instances to distribute the work among them.
Both tasks can be handled by classic software such as nginx, Apache, or HAProxy, or by tools designed for the microservices world, such as Fabio, Traefik, and others.
I need to assign different IP addresses to different processes (mostly PHP & Ruby programs) running on my Linux server. They will be making queries to various servers, including the situation where processes connecting to the same external server should have different IPs.
How can this be achieved?
Any option (system wide, or PHP/Ruby-specific, using proxy servers etc) will suit me.
The processes bind sockets (both incoming and outgoing) to an interface (or multiple interfaces), addressable by IP address, with various ports. For them to be directly addressable by different IP addresses, you must have them bind their sockets to different local IP addresses, i.e. different NICs (virtual or hardware) or different addresses assigned to the same NIC.
You could point each process at a different proxy (configure each process to send its queries through its own proxy), in which case the external server will see the proxies' different IPs. Otherwise, if you can directly configure the processes to use different NICs for their communication, that would be ideal.
You may need to change the code to make this configurable (very often, programmers create outgoing TCP connections with convenience functions that don't specify the local address, because they typically don't care). In PHP, you can use socket_bind to bind the socket to a local address; see, e.g., the first example in the socket_bind documentation.
As per @LeonardoRick's request, here are the details of the solution I ended up with.
Say I have a server with the IP addresses 172.16.0.1 and 172.16.0.2.
I set up nginx (on the same machine) with a configuration that looked roughly like this:
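server {
    # NEVER EXPOSE THIS SERVER TO THE INTERNET, MAKE SURE PORT 10024 is not available from outside
    listen 127.0.0.1:10024;
    # block access from outside on nginx level as well
    allow 127.0.0.1;
    deny all;
    # keep the "//" in "https://" intact: by default nginx merges slashes before location matching
    merge_slashes off;
    # proxy_pass with variables resolves host names at request time, so a resolver is required
    # (127.0.0.53 is an assumption: use your own DNS server)
    resolver 127.0.0.53;
    # send SNI when the target URL is HTTPS
    proxy_ssl_server_name on;
    # actual proxy rules
    location ~* ^/from-172-16-0-1/http(s?)\:\/\/(.*) {
        proxy_bind 172.16.0.1;
        proxy_pass http$1://$2$is_args$args;
    }
    location ~* ^/from-172-16-0-2/http(s?)\:\/\/(.*) {
        proxy_bind 172.16.0.2;
        proxy_pass http$1://$2$is_args$args;
    }
}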
(I can't remember all the details now: this config was written from memory, not copied from a working setup, but it should capture the key ideas. Check the regexes before deploying.)
Double-check that port 10024 is firewalled and not reachable from outside, and add extra authentication if necessary, especially if you are running Docker.
This nginx setup makes it possible to send HTTP requests like http://127.0.0.1:10024/from-172-16-0-2/https://example.com/some-URN/object?argument1=something
On receiving such a request, nginx fetches the response from the requested URL, using the source IP given by the corresponding proxy_bind directive.
Then, since I was running in-house or open-source software, I simply configured it (or altered its code) so that it would send requests like the one above instead of the original https://example.com/some-URN/object?argument1=something.
All the management (which IP should be used at any given moment) was also done by 'my' software: it simply selected the appropriate /from-172-16-0-XXX/ endpoint according to its business logic.
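On the client side, the URL rewrite described above amounts to a string prefix. A Go sketch, assuming the loopback nginx server on port 10024 from the config; the helper name viaSource is hypothetical, and choosing which source IP to pass in is left to the application's own logic.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyBase is the loopback-only nginx server from the config above.
const proxyBase = "http://127.0.0.1:10024"

// viaSource rewrites target so that the local nginx fetches it using sourceIP
// as the outgoing address, following the /from-A-B-C-D/<original URL> scheme.
func viaSource(sourceIP, target string) string {
	return proxyBase + "/from-" + strings.ReplaceAll(sourceIP, ".", "-") + "/" + target
}

func main() {
	fmt.Println(viaSource("172.16.0.2", "https://example.com/some-URN/object?argument1=something"))
	// http://127.0.0.1:10024/from-172-16-0-2/https://example.com/some-URN/object?argument1=something
}
```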
That worked very well for my original task. However, it may not suit applications where the request URLs cannot be altered; for those, a similar approach using some kind of proxy may still work.
We use nginx with an application server as a backend.
We need to limit the number of simultaneous connections per IP to the backend. We used the limit_conn nginx directive for this purpose, but it doesn't work well in all cases.
If a user opens a lot of connections from one IP and quickly closes them, nginx still passes the requests to a backend, but because the client connection is already closed, it is no longer counted by limit_conn.
Is it possible to limit the number of simultaneous connections per IP to a backend server with nginx?
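You may want to set the following (note that off is already the default, so this only matters if it has been switched on elsewhere in your configuration):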
proxy_ignore_client_abort off;
Determines should the connection with a proxied server be closed if a client closes a connection without waiting for a response.
(from the documentation)
Another suggestion is to use limit_req to limit the request rate.
I'm afraid this facility is not yet available for nginx out of the box. According to the Nginx FAQ:
Many users have requested that Nginx implement a feature in the load balancer to limit the number of requests per backend (usually to one). While support for this is planned, it's worth mentioning that demand for this feature is rooted in misbehaviour on the part of the application being proxied
I've seen a third-party module for this, nginx-limit-upstream, but I've never tried it.
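If the limit has to be enforced outside nginx, for example in the application server itself, the core idea is a per-IP counter of requests in flight that is released only when the backend call returns, not when the client disconnects. A Go sketch; perIPLimiter and its methods are hypothetical names, and how a rejected request is answered (e.g. 429 or 503) is left to the caller.

```go
package main

import (
	"fmt"
	"sync"
)

// perIPLimiter caps how many requests from one client IP may be in flight to
// the backend at once. A slot is freed only when the backend work finishes,
// even if the client has already closed its connection.
type perIPLimiter struct {
	mu       sync.Mutex
	max      int
	inFlight map[string]int
}

func newPerIPLimiter(max int) *perIPLimiter {
	return &perIPLimiter{max: max, inFlight: make(map[string]int)}
}

// acquire reserves a slot for ip, or returns false if ip is already at max.
func (l *perIPLimiter) acquire(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight[ip] >= l.max {
		return false
	}
	l.inFlight[ip]++
	return true
}

// release frees a slot once the backend has finished with the request.
func (l *perIPLimiter) release(ip string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.inFlight[ip]--
	if l.inFlight[ip] <= 0 {
		delete(l.inFlight, ip) // keep the map from growing without bound
	}
}

// callBackend runs backend under the limit, holding the slot until it returns.
func (l *perIPLimiter) callBackend(ip string, backend func()) bool {
	if !l.acquire(ip) {
		return false // caller would reject the request, e.g. with 429 or 503
	}
	defer l.release(ip)
	backend()
	return true
}

func main() {
	l := newPerIPLimiter(1)
	ip := "198.51.100.4"
	fmt.Println(l.acquire(ip)) // true: first request reaches the backend
	fmt.Println(l.acquire(ip)) // false: client reconnected, but the backend is still busy
	l.release(ip)              // backend finished the first request
	fmt.Println(l.acquire(ip)) // true
}
```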
I'm trying to understand the best way to handle SOA on Heroku. I've got it into my head that making requests to custom domains will somehow be slower; or would all requests go "out" via the internet anyway?
On previous SOA projects we had dedicated hosting, so we could make requests like http://blogs/ (obviously on the internal network). I'm wondering whether Heroku treats *.herokuapp.com requests as "internal"... Or is it clever enough to know that myapp.com is actually myapp.herokuapp.com and route locally? Or am I missing the point completely, and in fact all requests are "external"?
What you are asking about is general knowledge of how internet requests work.
Whenever your application makes a request to, say, example.com, the domain name is first translated into an IP address using DNS servers.
So it does not matter whether you request myapp.com or myapp.heroku.com: you always request information from a specific IP address, and the domain name you requested is passed along in the Host request header.
The server that receives the request looks up this domain name in its own records and handles the request accordingly.
So the conclusion is that it does not matter whether you use myapp.com or myapp.heroku.com; the speed of the request will be the same.
PS: As Heroku load balances your requests across the instances of your running myapp.com, the speed will depend on several factors: how quickly your application responds, how many instances you have running and the load average per instance, and how loaded the load balancer is at the moment. But it certainly will not depend on which domain name you use.