Nginx rewrite and (later) load balancer together: is that possible? - nginx

I have an old site based on IIS that, for historical reasons, uses a lot of RewriteRule directives via Helicon APE. When we hit the server with multiple clients, Helicon APE crashes quite frequently. The set of IIS servers (currently 4) is expected to grow and the whole system to scale, and a lot of effort has gone into the webapp recently to support new features and user growth.
Someone suggested using NGINX as a load balancer in front of the IIS servers, since it handles increasing amounts of traffic much better, and applying the rewrites there, so the URLs would already be converted to the new formats before being load balanced to IIS.
Following that advice, we set up a proof of concept with nginx 1.13 on Linux, using rewrite rules ported from the APE ones and proxy_pass pointing at two of the servers. But we have noticed several issues this way:
Rewrite rules do not seem to work the way they should; we can verify that the regexes are valid (by putting them in locations), but the URL does not appear to be rewritten.
proxy_pass usually returns a 400 Bad Request or never hits the backend servers.
However, if we define several locations with some of the simpler regexes and put a proxy_pass to the backend server with the new URL pattern inside each one, the servers are hit with the right requests. This approach brings its own problems, though: some of our rewrites build on others, so a transformation may take three steps (one rule changes the first part of the URL, a second changes another part, and a third joins everything into the final valid URL with a break flag). That kind of chaining is impossible to express by mixing locations.
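For reference, this is roughly the shape of configuration we are trying to get working; the regexes, hostnames and upstream addresses below are placeholders, not our real rules:

upstream iis_backend {
    server 192.0.2.11;
    server 192.0.2.12;
}

server {
    listen 80;
    server_name example.com;

    # Server-level rewrites run in order, so later rules can build on the
    # result of earlier ones; the final rule uses "last" to stop rewriting
    # and hand the rewritten URI to the location below.
    rewrite ^/old-area/(.*)$ /new-area/$1;
    rewrite ^/new-area/(\d+)/(.*)$ /new-area/$2?id=$1;
    rewrite ^/new-area/(.*)$ /app/$1 last;

    location / {
        proxy_pass http://iis_backend;
        # IIS sites bound to a specific host name typically answer
        # 400 Bad Request when the Host header does not match.
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

(The Host header line is there because that mismatch may explain some of the 400s we are seeing, but that is a guess on our part.)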
We have done a lot of research through StackOverflow, blogs, support sites and mailing lists to find a solution to our problem, but the suggested solutions often do not work at all (or only partially), and to be honest, after a week of this we are concerned that the architecture we had in mind simply cannot be done.
We have tried this with HAProxy as well, with really odd behavior (e.g. error messages being attached to the request being load balanced).
As the title summarizes, after the long description above, the question is: can someone confirm that what we are trying to achieve can really be done with nginx? If not, what could we use instead?

Related

Can I make WordPress display wp-admin URLs prettier in the browser?

I want to prettify wp-admin URLs. For example, when I go to the Pages dashboard, the URL is normally:
http://localhost/lb/wp-admin/edit.php?post_type=page
and I would like it to be shown as
http://localhost/lb/backend/pages
in the browser address bar.
Can I achieve this with .htaccess?
I have been playing around with RewriteRule, but it seems to do the opposite of what I'm looking for:
it rewrites the URL received from the client, whereas I actually want to rewrite the URL before the server sends data back to the client.
Setting up “pretty URLs” usually requires two things:
The rewriting of the incoming requests for the “pretty” version to the internal URL,
and, of course, changing what URLs are output in the HTML code to begin with - because those obviously determine which URL your browser requests, when you click on any links, submit forms etc.
So for that second part, you'd have to modify (probably almost) every URL that gets created in the admin backend. I'm not sure WP has a hook for that, and not sure how well it would work if the odd 3rd-party plugin doesn't follow the conventions for adding itself to the admin area menu in the first place, etc.
If the URLs that are output can't be changed, you can still have the server externally redirect the incoming "ugly" request to the pretty version first - but that means a bit of overhead, and with POST requests you'd have to be really careful: a standard external redirect makes the browser issue a GET request next, so the POST data is lost.
The one URL example you gave is still a pretty trivial case - but in the backend you'll have to deal with URLs that include more parameters, things like /wp-admin/post.php?post=1234&action=edit, and certain plugins might send even more for their own specific functionality.
All things considered, I'd say what you want here doesn't make much sense to begin with. Pretty URLs are also called "vanity URLs", and that's what this would be: more or less pure vanity without any real benefit. The potential drawbacks and problems you're likely to run into are not worth the effort, IMHO.

Website - blocking visits from countries not on a whitelist

I am looking for the most reliable, accurate and quick means possible of adding some .htaccess code to block visits to a website from countries / IPs that are not on the whitelist of countries I want to allow access for. I have looked at https://www.ip2location.com/free/visitor-blocker, which seems to offer a solution - but for the 4 countries I want to allow access for, it has created a 4.1MB .htaccess file! Will this mean slow access when someone attempts to view the site? I guess using a free service like this means the data is likely nowhere near comprehensive?
Does anyone have any suggestions on a good way to allow only visitors from a few countries access to a website?
It sounds like the service you used basically tried to brute force the blacklist. If you look into the .htaccess file, I'm sure you will see a long list of hard-coded IP blocks.
In my opinion this is a terrible way to handle a geographic blacklist. To your original question - there is no "most reliable, most accurate, and quickest" method. Those are separate categories and you will need to prioritise one over the others.
For performance, you could consider blacklisting at the routing level / DNS server / proxy, although this obviously isn't going to be the quickest approach. There are also Apache modules that let you compare the incoming IP address against a local database of known IP blocks for the blacklisted countries. One of the main issues with this is that you need to constantly update the database to take in new IP blocks.
In my opinion the "best" method is a simple redirect at the application layer using server-side code. There are several geolocation APIs where you can send in the IP or hostname and get back a country of origin. An example (in PHP):
// Look up the visitor's country via a geolocation API (freegeoip.net here).
$xml = new SimpleXMLElement(file_get_contents('http://www.freegeoip.net/xml/' . $_SERVER['REMOTE_ADDR']));
// Redirect visitors from this country elsewhere (invert the check for a whitelist).
if ($xml->CountryCode == "US") {
    header('Location: http://www.google.com');
    exit;
}
There are two ways to block a visitor at the web server. One is using the firewall / web server layer (.htaccess etc.) and the other is server-side scripting (PHP etc.).
If you are concerned about the performance of the firewall option, you can download the IP2Location LITE database from http://lite.ip2location.com and host the database on your local server. For every connection, you look up the visitor's IP address to find their country, and then redirect or block them in PHP. You can find the complete steps at https://www.ip2location.com/tutorials/redirect-web-visitors-by-country-using-php-and-mysql-database
There is also the option of using a remote geolocation API. However, we do not suggest this method because of network latency; the API queries will slow down the experience for every user.

Sibling Cache Hierarchies on Nginx?

Can Nginx do cache hierarchies in a "sibling" manner like Squid?
e.g. http://wiki.squid-cache.org/Features/CacheHierarchy
I am trying to protect the origin server from too many requests; ideally it would only get pulled once. The system would first poll all of the other sibling and parent CDN nodes before it checked back with the origin server.
(Note: one Varnish-based CDN company that seems to do this well is Fastly. I am basically hoping for some pointers on how to build an Nginx-based version of "Origin Shield": http://www.fastly.com/products/origin-shield/ )
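As far as I can tell nginx has no equivalent of Squid's ICP sibling protocol, so the closest I have come so far is chaining two proxy_cache tiers, so that only a single "shield" node ever talks to the origin. A rough sketch of what I mean (hostnames, cache paths and sizes are placeholders):

# On each edge node: cache locally; on a miss, ask the shield node, never the origin.
proxy_cache_path /var/cache/nginx/edge levels=1:2 keys_zone=edge_cache:100m max_size=10g inactive=60m;

server {
    listen 80;

    location / {
        proxy_cache edge_cache;
        proxy_cache_valid 200 301 10m;
        proxy_cache_use_stale error timeout updating;
        proxy_cache_lock on;                 # collapse concurrent misses into one upstream fetch
        proxy_pass http://shield.internal;   # the single "origin shield" tier
    }
}

# On the shield node: same idea, but this is the only tier allowed to talk to the origin.
proxy_cache_path /var/cache/nginx/shield levels=1:2 keys_zone=shield_cache:100m max_size=50g inactive=24h;

server {
    listen 80;

    location / {
        proxy_cache shield_cache;
        proxy_cache_valid 200 301 60m;
        proxy_cache_lock on;
        proxy_pass http://origin.example.com;
    }
}

Is this the right direction, or is there a better way to approximate sibling checks?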

Using CloudFront to expose the Elasticsearch REST API read-only (GET/HEAD)

I want to let my clients speak directly to the Elasticsearch REST API, while obviously preventing them from making any data or configuration changes.
I had a look at the Elasticsearch REST interface and noticed a pattern: HTTP GET requests are pretty safe (harmless queries and cluster status).
So I thought I could use CloudFront as a CDN/proxy that only allows the GET/HEAD methods (you can impose such a restriction in the main configuration).
So far so good, everything is set up. But things don't work, because I would need to open my EC2 security group to the world in order to be reachable from CloudFront! I really don't want that!
When I use EC2 with RDS, I can simply allow access to my EC2 security group in RDS security groups. Why can't I do this with CloudFront? Or can I?
Ideas?
edit: It's not documented, but ES accepts facet queries, which involve a (JSON) body, not only with POST but also with GET. This breaks the HTTP specification (RFC 2616) by not ignoring the body of a GET request (source).
This is relevant because, as pointed out, exposing the ES REST interface directly can lead to easy DoS attacks using complex queries. I'm still convinced, though, that having one less proxy is worth it.
edit: Another option for me would be to skip CloudFront and add a security layer as an Elasticsearch plugin, as shown here
I ended up coding my own plugin. Surprisingly, there was nothing quite like this around.
No proxies, no Jetty, no Tomcat.
Just the original ES REST module and my RestFilter, using a minimum of reflection to obtain the remote address of the requests.
enjoy:
https://github.com/sscarduzio/elasticsearch-readonlyrest-plugin
Note that even a GET request can be harmful in Elasticsearch. A query that simply takes up too many resources to compute will bring down your cluster, and facets are an easy way to do that.
I'd recommend writing a simple REST API that you place in front of ES so you get much more control over what hits your search cluster. If that's not an option, you could consider running Nginx on your ES boxes to act as a local reverse proxy, which will give you the same control as CloudFront does (and a whole lot more). Then you'd only have to open up Nginx to the world instead of ES.
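As a rough sketch of that local reverse proxy idea, assuming Elasticsearch is listening on its default port 9200 (adjust the port and any path restrictions to your setup):

# Goes in the http {} context of nginx running on the ES box itself.
server {
    listen 80;

    location / {
        # Only let read requests through; allowing GET implicitly allows HEAD.
        limit_except GET {
            deny all;
        }
        proxy_pass http://127.0.0.1:9200;
        proxy_set_header Host $host;
    }
}

Keep in mind the caveat above: even GET-only access still lets clients run arbitrarily expensive queries, so you may also want to restrict which paths (e.g. only certain indices' _search endpoints) are reachable.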
A way to do this in AWS would be:
Set up an Application Load Balancer in front of your ES cluster. Create a TLS cert for the ALB and serve https. Open the ES security group to the ALB.
Set up CloudFront and use the ALB as origin. Pass a custom header with a secret value (for WAF, see next point).
Set up WAF on your ALB to only allow requests that contain the custom header with the secret value. Now all requests have to go through CloudFront.
Set up a Lambda@Edge function on your CloudFront distribution to either remove the body from GET requests, or deny such requests.
It's quite some work, but there are advantages over the plugin, e.g.:
CloudFront comes with free network DDoS protection.
CloudFront gives your users lower latency to ES because of the fast CloudFront network and global PoPs.
It opens up many options to use CloudFront, WAF and Lambda@Edge to further protect your ES cluster.
I’m working on sample code in CDK to set all of this up. Will report back when that’s ready.

How to prevent vulnerability scanning

I have a web site that emails me a report about every unexpected server-side error.
Quite often (once every 1-2 weeks) somebody launches automated tools that bombard the web site with a ton of different URLs:
sometimes they (hackers?) assume my site hosts phpMyAdmin and try to access (I believe) vulnerable PHP pages...
sometimes they try to access pages that don't exist on my site but belong to popular CMSs...
last time they tried to inject an invalid ViewState...
It is clearly not search engine spiders, as 100% of the requests that generated errors are requests to invalid pages.
So far they haven't done much harm; the only cost is that I have to delete a ton of server error emails (200-300)... But at some point they could probably find something.
I'm really tired of this and am looking for a solution that will block such 'spiders'.
Is there anything ready to use? Any tool, DLLs, etc.? Or should I implement something myself?
In the second case, could you please recommend an approach? Should I limit the number of requests per IP (say, no more than 5 requests per second and no more than 20 per minute)?
P.S. Right now my web site is written using ASP.NET 4.0.
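(To illustrate the kind of limit I have in mind: if a reverse proxy were an option in front of the site, I imagine it would be expressed roughly like this in, say, nginx's limit_req module; the zone names are arbitrary and the rates are just my numbers above.)

# Goes in the http {} context; track clients by IP address.
limit_req_zone $binary_remote_addr zone=per_ip_sec:10m rate=5r/s;
limit_req_zone $binary_remote_addr zone=per_ip_min:10m rate=20r/m;

server {
    listen 80;

    location / {
        limit_req zone=per_ip_sec burst=10 nodelay;
        limit_req zone=per_ip_min burst=20 nodelay;
        proxy_pass http://127.0.0.1:8080;   # the ASP.NET site behind the proxy
    }
}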
Such bots are not likely to find any vulnerabilities in your system, if you just keep the server and software updated. They are generally just looking for low hanging fruit, i.e. systems that are not updated to fix known vulnerabilities.
You could make a bot trap to minimise such traffic: as soon as someone requests one of those non-existent pages that you know about, block all further requests from that IP address with the same user-agent string for a while.
There are a couple of things you can consider...
You can use one of the available Web Application Firewalls. A WAF usually has a set of rules and an analytics engine that detect suspicious activity and react accordingly. For example, in your case it could automatically block attempts to scan your site, since it recognizes them as an attack pattern.
A simpler (but not 100% reliable) approach is to check the Referer URL (see the Referer description on Wikipedia) and reject the request if it did not originate from one of your own pages (you would probably create an HttpModule for that purpose).
And of course you want to be sure that your site addresses all known security issues from the OWASP Top 10 list. You can find a very comprehensive description of how to do this for ASP.NET here (the OWASP Top 10 for .NET book in PDF); I also recommend reading the blog of the author of that book: http://www.troyhunt.com/
There's nothing you can do (reliably) to prevent vulnerability scanning; the only real option is to stay on top of any vulnerabilities and prevent their exploitation.
If your site is only used by a select few from fixed locations, you could maybe use an IP restriction.
