Looking for possibilities or alternatives for backend HA in nginx. At the moment we are using lua-nginx, which does not support the HttpUpstream module, which would otherwise be my first choice. I know a bit about Pacemaker but have never used it, so I'm not sure whether it would be a good combination with nginx. Any hints or experience?
I'm using the nginx Prometheus exporter, but it exposes very few metrics. I also want information from access.log and error.log, such as how many 200s, 404s, and so on.
What do you suggest?
The richer metrics are only available with NGINX Plus, which comes at a premium. Unless you want to modify the source code, additional metrics are only available through the log files.
If you are already aggregating logs, say with Elasticsearch, you can use the related exporter to extract metrics.
If not, there are dedicated projects such as the nginxlog-exporter, or generic solutions such as mtail, where you write your own rules.
Finally, there is an intermediate solution, which is the official one on the Prometheus site: extracting metrics with Lua. This is maybe the most robust solution, but it comes at the cost of the setup.
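Whichever exporter you pick, the core of the log-file route is just counting status codes per line. A minimal sketch in Python, assuming the default "combined" log format (adjust the regex to your `log_format`; `count_statuses` is a hypothetical helper, not part of any existing exporter):

```python
import re
from collections import Counter

# Matches the default "combined" access.log format; the single capture
# group is the HTTP status code.
LINE_RE = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+')

def count_statuses(lines):
    """Count HTTP status codes (200, 404, ...) across access.log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

You would feed it `open("/var/log/nginx/access.log")` and expose the resulting counts as Prometheus counters; the dedicated exporters above do essentially this, plus tailing and label handling.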
It is hard to make a suggestion. It all comes down to your time/skill/money budget and the use you are making of nginx. If you are using it as a proxy, Envoy is gaining traction.
In fact, your question is a bit broad, but worth an answer, because the basic monitoring available is really poor given the widespread usage nginx enjoys (IMNSHO).
I have built a distributed HTTP scraper solution that uses different "exit addresses" by design in order to balance the network load.
The solution supports IPv4, IPv6 and HTTP proxy to route the traffic.
Each processor was responsible for defining the most efficient route to balance the traffic, which was temporarily implemented manually for prototyping. The solution is now growing, and as the number of processors increases, the load-balancing task becomes more complex; that's why I need a dedicated component for it.
I did some rather extensive research, but seem to have failed to find a solution for load balancing traffic between IPv6 addresses, IPv4 addresses (thousands of local addresses), and public HTTP proxies. The solution needs to support weights, app-level response checks, and cool-down periods.
Does anyone know of a solution that already solves this problem, before I start developing a custom one?
Thanks for your help!
If you search for load-balancing proxies you'll discover the Cache Array Routing Protocol (CARP). CARP might not be exactly what you're searching for, and there are servers dedicated solely to proxy caching, which I never knew until now.
Nevertheless, those servers have their own load balancers too, and perhaps that's a detail worth researching further.
I found a presentation that also mentions CARP as an outstanding solution: https://cs.nyu.edu/artg/internet/Spring2004/lectures/lec_8b.pdf
An example of proxy arrays in the Netra Proxy Cache Server: https://docs.oracle.com/cd/E19957-01/805-3512-10/6j3bg665f/index.html
There are also several concepts for load balancing (https://link.springer.com/article/10.1023/A:1020943021842):
The three proposed methods can broadly be divided into centralized and decentralized approaches. The centralized history (CH) method makes use of the transfer rate of each request to decide which proxy can provide the fastest turnaround time for the next job. The route transfer pattern (RTP) method learns from the past history to build a virtual map of traffic flow conditions of the major routes on the Internet at different times of the day. The map information is then used to predict the best path for a request at a particular time of the day. The two methods require a central executive to collate information and route requests to proxies. Experimental results show that self-organization can be achieved (Tsui et al., 2001). The drawback of the centralized approach is that a bottleneck and a single point of failure is created by the central executive. The decentralized approach—the decentralized history (DH) method—attempts to overcome this problem by removing the central executive and put a decision maker in every proxy (Kaiser et al., 2000b) regarding whether it should fetch a requested object or forward the request to another proxy.
As you use public proxy servers, you probably won't use the decentralized history (DH) method, but rather centralized history (CH) or the route transfer pattern (RTP).
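To make the centralized idea concrete against your stated requirements (weights, app-level response checks, cool-down periods), here is a rough sketch of a single central picker; `ProxyPool` and its method names are made up for illustration and not taken from any existing project:

```python
import random
import time

class ProxyPool:
    """Weighted selection over exit addresses/proxies with a cool-down
    applied to proxies that fail an app-level response check.
    One central object holds all state (the "central executive")."""

    def __init__(self, proxies, cooldown=30.0):
        # proxies: dict mapping address -> selection weight
        self.weights = dict(proxies)
        self.cooldown_until = {p: 0.0 for p in proxies}
        self.cooldown = cooldown

    def pick(self, now=None):
        """Weighted-random choice among proxies not currently cooling down."""
        now = time.monotonic() if now is None else now
        available = [p for p in self.weights if self.cooldown_until[p] <= now]
        if not available:
            raise RuntimeError("all proxies are cooling down")
        return random.choices(available,
                              [self.weights[p] for p in available])[0]

    def report_failure(self, proxy, now=None):
        """Called when an app-level check fails; benches the proxy."""
        now = time.monotonic() if now is None else now
        self.cooldown_until[proxy] = now + self.cooldown
```

A CH-style refinement would adjust the weights from observed transfer rates instead of keeping them static.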
Perhaps it would even be useful to replace your own solution completely, e.g. with something like this: https://github.blog/2018-08-08-glb-director-open-source-load-balancer/. I have no particular reason for this specific example; it's just a random search result I found.
As I'm not working with proxy servers, this post is just a collection of findings, but perhaps there is a usable detail in it for you. If not, never mind; you probably know most or all of this already. Also, I haven't named any one concrete solution as a recommendation.
Have you checked out this project? https://Traefik.io supports HTTP/2 and TCP load balancing. The project is open source, available on GitHub, and built using Go. I'm using it now as my reverse proxy with load balancing for almost everything.
I also wrote a small blog post on Docker and Go in which I showcase the usage of Traefik. That might also help you in your search: https://marcofranssen.nl/docker-tips-and-tricks-for-your-go-projects/
In the Traefik code base you might find your answer, or you might decide to use Traefik to achieve your goal instead of a home-grown solution.
See here for a nice explanation of the soon-to-arrive Traefik 2.0 with TCP support:
https://blog.containo.us/back-to-traefik-2-0-2f9aa17be305
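For a sense of what this looks like, a TCP load-balancing setup in Traefik 2.x dynamic configuration (file provider) is roughly the following; the router, service, and entry-point names and the backend addresses are all placeholders:

```yaml
# Traefik v2 dynamic configuration sketch (file provider)
tcp:
  routers:
    my-tcp-router:
      rule: "HostSNI(`*`)"     # match any SNI, i.e. plain TCP passthrough
      entryPoints:
        - my-entrypoint        # must be defined in the static configuration
      service: my-tcp-service
  services:
    my-tcp-service:
      loadBalancer:
        servers:
          - address: "backend1:9000"
          - address: "backend2:9000"
```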
Here's the situation. I have clients over a secured network (HTTPS) that talk to multiple backends. Now, I wanted to set up a reverse proxy, mainly for load balancing (based on header data or cookies) and a little caching. So I thought Varnish could be of use.
But Varnish does not support SSL connections. As I've read in many places, quoting: "Varnish does not support SSL termination natively". But I want every connection, i.e. client-to-Varnish and Varnish-to-backend, to be over HTTPS. I cannot have plaintext data anywhere on the network (there are restrictions), so nothing else can be used as an SSL terminator (or can it?).
So, here are the questions:
Firstly, what does it mean (if someone can explain in simple terms) that "Varnish does not support SSL termination natively"?
Secondly, is this scenario good to implement using Varnish?
And finally, if Varnish is not a good contender, should I switch to some other reverse proxy? If yes, which would be suitable for this scenario? (HAProxy, nginx, etc.)
what does it mean (if someone can explain in simple terms) that "Varnish does not support SSL termination natively"
It means Varnish has no built-in support for SSL. It can't operate in a path with SSL unless the SSL is handled by separate software.
This is an architectural decision by the author of Varnish, who discussed his contemplation of integrating SSL into Varnish back in 2011.
He based this on a number of factors, not the least of which was wanting to do it right if at all, while observing that the de facto standard library for SSL is openssl, which is a labyrinthine collection of over 300,000 lines of code, and he was neither confident in that code base, nor in the likelihood of a favorable cost/benefit ratio.
His conclusion at the time was, in a word, "no."
That is not one of the things I dreamt about doing as a kid and if I dream about it now I call it a nightmare.
https://www.varnish-cache.org/docs/trunk/phk/ssl.html
He revisited the concept in 2015.
His conclusion, again, was "no."
Code is hard, crypto code is double-plus-hard, if not double-squared-hard, and the world really don't need another piece of code that does an half-assed job at cryptography.
...
When I look at something like Willy Tarreau's HAProxy I have a hard time to see any significant opportunity for improvement.
No, Varnish still won't add SSL/TLS support.
Instead in Varnish 4.1 we have added support for Willys PROXY protocol which makes it possible to communicate the extra details from a SSL-terminating proxy, such as HAProxy, to Varnish.
https://www.varnish-cache.org/docs/trunk/phk/ssl_again.html
This enhancement could simplify integrating varnish into an environment with encryption requirements, because it provides another mechanism for preserving the original browser's identity in an offloaded SSL setup.
is this scenario good to implement using varnish?
If you need Varnish, use it, being aware that SSL must be handled separately. Note, though, that this does not necessarily mean that unencrypted traffic has to traverse your network... though keeping everything encrypted does make for a more complicated and CPU-hungry setup.
nothing else can be used as an SSL terminator (or can it?)
SSL can be offloaded on the front side of Varnish and re-established on the back side, all on the same machine running Varnish but by separate processes, using HAProxy, stunnel, nginx, or other solutions in front of and behind Varnish. Any traffic in the clear then operates within the confines of a single host, so it is arguably not a point of vulnerability if the host itself is secure, since it never leaves the machine.
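A sketch of such a same-host sandwich, with HAProxy terminating TLS and speaking the PROXY protocol to Varnish on the loopback interface (certificate path, ports, and names are placeholders; Varnish 4.1+ accepts the PROXY protocol on a listener marked with `,PROXY`):

```
# /etc/haproxy/haproxy.cfg (fragment)
frontend https-in
    bind :443 ssl crt /etc/haproxy/site.pem   # placeholder cert path
    mode tcp
    default_backend varnish

backend varnish
    mode tcp
    server varnish1 127.0.0.1:6086 send-proxy-v2

# Varnish listener accepting the PROXY protocol:
#   varnishd -a 127.0.0.1:6086,PROXY -f /etc/varnish/default.vcl
# Re-encryption toward the backends can be handled by another local
# process (e.g. stunnel in client mode) that Varnish uses as its backend.
```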
if varnish is not a good contender, should I switch to some other reverse proxy
This is entirely dependent on what you want and need in your stack, its cost/benefit to you, your level of expertise, the availability of resources, and other factors. Each option has its own set of capabilities and limitations, and it's certainly not unheard-of to use more than one in the same stack.
I have proxies set up via Squid. The proxies usually work great, but for the last couple of months they've been very slow from time to time. It's not consistent.
I really don't know where to start.
Which steps should I follow to troubleshoot?
Check to see if the system is IO-, CPU-, or memory-bound. If you're using it for caching, then you may be running out of disk space.
If you don't see any issues with the system itself, you can move on to better understanding your Squid configuration.
Understand the types of ACLs you are using (http://www.eu.squid-cache.org/Doc/config/acl/). Note that there are fast and slow ACLs.
Learn how to use the debug sections. Setting the appropriate debug levels on possible pain points could reveal problems:
http://wiki.squid-cache.org/KnowledgeBase/DebugSections
A general tip for troubleshooting something like this is to try to understand the pattern. Is it slow during peak times? Is this a virtual machine that could be sharing resources with another VM under high load? Good luck!
We had exhausted the nofile limits on our Squid proxy server. Upon doubling the limits, it was super fast again. We also added the net.ipv4.tcp_tw_reuse = 1 parameter in sysctl.conf.
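For reference, the file-descriptor fix can look roughly like this; the values are examples and should be sized to your peak concurrent connection count:

```
# /etc/security/limits.conf -- raise nofile for the user Squid runs as
squid  soft  nofile  65536
squid  hard  nofile  65536

# squid.conf -- Squid also has its own descriptor cap
max_filedescriptors 65536

# /etc/sysctl.conf -- reuse sockets stuck in TIME_WAIT
net.ipv4.tcp_tw_reuse = 1
```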
Maybe the problem is the proxies themselves. I say this because I used Squid proxies before and only had problems with them. Now I use proxies from Proxy-N-Vpn.com and they are working perfectly.
I am interested in writing my own internet ad server.
I want to serve billions of impressions with as little hardware as possible.
Which server-side technologies are best suited to this task? I am asking about the relative overhead of serving my ad pages as pages rendered by PHP, Java, or .NET, versus coding HTTP responses directly in C and writing some multi-socket IO monster to serve requests (I assume this one wins, but if my assumption is wrong, that would actually be most interesting).
Obviously all the most effective optimizations are done at the algorithm level, but I figure there have got to be some speed differences at the end of the day that make one method of serving ads better than another. How much overhead does something like Apache or IIS introduce? There's got to be a ton of extra junk in there I don't need.
At some point I guess this is more a question of which platform/language combo is best suited. Please excuse the awkwardly posed question; hopefully you understand what I am trying to get at.
You're going to have a very difficult time finding an objective answer to a question like this. There are simply too many variables:
Does your app talk to a database? If so, which one? How is the data modeled? Which strategy is used to fetch the data?
Does your app talk across a network to serve a request (web service, caching server, etc)? If so, what does that machine look like? What does the network look like?
Are any of your machines load balanced? If so, how?
Is there caching? What kind? Where does it live? How is cached data persisted?
How is your app designed? Are you sure it's performance-optimal? If so, how are you sure?
When does the cost of development outweigh the cost of adding a new server? Programmers are expensive. If your goal in reducing hardware is to reduce cost, you'll likely save more money by using a language in which your programmers feel productive.
Are you using 3rd party tools? Should you be? Are they fast? Won't some 3rd party tools reduce your cost?
If you want some kind of benchmark, Trustleap publishes challenge results comparing their G-Wan server using ANSI C scripts, IIS using C#, Apache with PHP, and GlassFish with Java. I include it only because it attempts to measure the exact technologies you mention. I would never settle on a technology without considering the variables above, and more.
Errata:
G-Wan uses ANSI C scripts (rather than "compiled ANSI C" as explained above)
And it transparently turns synchronous (connect/recv/send/close) system calls into asynchronous calls (this works even with shared libraries).
This can help a great deal to scale with database server requests, posts, etc.
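To make the synchronous-vs-asynchronous point concrete (this is not how G-Wan does it; it is just a minimal event-driven responder sketch, with the payload and port as placeholders): one process can multiplex many concurrent connections without a thread per request.

```python
import asyncio

AD_BODY = b"<html><body>ad goes here</body></html>"  # placeholder payload

async def handle(reader, writer):
    # Read and discard the request head up to the blank line;
    # a real server would parse it to choose the ad.
    while (await reader.readline()).strip():
        pass
    writer.write(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"Content-Length: " + str(len(AD_BODY)).encode() + b"\r\n"
        b"Connection: close\r\n\r\n" + AD_BODY
    )
    await writer.drain()
    writer.close()

async def serve(host="127.0.0.1", port=8081):
    # The event loop multiplexes every connection on a single thread.
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()

# To run: asyncio.run(serve())
```

The same pattern scales to slow upstream calls (databases, ad exchanges): while one request awaits IO, the loop services the others.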