Can Nginx cache hierarchies in a "sibling manner" like on Squid?
e.g. http://wiki.squid-cache.org/Features/CacheHierarchy
I am trying to protect the origin server from too many requests; ideally each resource is pulled from it only once. Ideally, the system would first poll all of the other sibling and parent CDN nodes before it checked back with the origin.
(Note: one Varnish-based CDN company that seems to do this well is Fastly. I'm basically hoping for some pointers on how to build an Nginx-based version of their "Origin Shield": http://www.fastly.com/products/origin-shield/ )
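To make the goal concrete, here is a rough, untested sketch of the kind of two-tier nginx layout I have in mind: edge caches forward their misses to a single "shield" cache, and only the shield talks to the origin, with proxy_cache_lock collapsing concurrent misses into one upstream request (all host names, cache zone names and timings below are placeholders):

# --- edge node (one of many) ---
proxy_cache_path /var/cache/nginx/edge keys_zone=edge_cache:50m max_size=10g inactive=60m;

server {
    listen 80;

    location / {
        proxy_cache       edge_cache;
        proxy_cache_valid 200 10m;
        proxy_cache_lock  on;   # collapse concurrent misses for the same key into one request
        proxy_pass        http://shield.internal.example.com;   # misses go to the shield tier, not the origin
    }
}

# --- shield node (the only tier allowed to reach the origin) ---
proxy_cache_path /var/cache/nginx/shield keys_zone=shield_cache:50m max_size=50g inactive=24h;

server {
    listen 80;

    location / {
        proxy_cache           shield_cache;
        proxy_cache_valid     200 10m;
        proxy_cache_lock      on;
        proxy_cache_use_stale updating error timeout;   # serve stale content while an entry is refreshed
        proxy_pass            http://origin.example.com;
    }
}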
I have an old IIS-based site that, for historical reasons, uses lots of RewriteRule directives via Helicon Ape. When we hit the server with multiple clients, Helicon Ape crashes quite frequently. The set of IIS servers (currently 4) is expected to grow and the whole system needs to scale; a lot of recent work has gone into the web app to support new features and user growth.
Someone suggested putting NGINX in front of the IIS servers as a load balancer, since it handles large amounts of traffic much better, and applying those rewrites there before the requests hit IIS, so the URLs would already be converted to the new format before being load balanced.
Following that advice, we set up a proof-of-concept nginx 1.13 on Linux with rewrite rules (ported from the ones used in Ape) and proxy_pass pointing at two of the servers. But we have noticed several issues with this approach:
The rewrite rules do not seem to work the way they should; we can verify that the regexes are valid (by putting them in location blocks), but the URL does not appear to be rewritten.
proxy_pass usually returns a 400 Bad Request or never reaches the backend servers.
However, if we define several locations with some of the simpler regexes and put a proxy_pass to the backend server with the new URL pattern inside each one, the servers are hit with the right requests. That approach brings its own problems, though: some of our rewrites build on others, so a transformation may take three steps (one rule changes the first part of the URL, the second changes another part, and the third joins everything into the final URL with a break flag). That kind of chaining is impossible when the rules are split across separate locations.
A lot of research across Stack Overflow, blogs, support sites and mailing lists has gone into finding a solution, but the suggested solutions often do not work at all (or only partially), and to be honest, after a week of this we are concerned that the architecture we had in mind simply isn't possible.
We have tried this with HAProxy as well, with really odd behavior (e.g. error messages being attached to the request that was load balanced).
As the title summarizes, after the long description above, the question is: can someone confirm whether what we are trying to achieve can really be done with nginx? If not, what could be used instead?
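For reference, here is a minimal sketch of the pattern we are trying to get working: rewrite rules declared at server level are evaluated in order (so they can be chained) before the location is matched, and the resulting URI is then proxied. The regexes, upstream addresses and URL formats below are placeholders, not our real rules:

upstream iis_backend {
    server 10.0.0.11:80;   # IIS servers (placeholder addresses)
    server 10.0.0.12:80;
}

server {
    listen 80;

    # Server-level rewrites run one after another, so a URL can be
    # transformed in several steps before the last rule ends the cycle.
    rewrite ^/legacy/(.*)$       /step1/$1;
    rewrite ^/step1/(.*)\.asp$   /step2/$1;
    rewrite ^/step2/(.*)$        /app/$1 break;

    location / {
        # IIS sites bound to a host name often reply 400 (Invalid Hostname)
        # when the Host header doesn't match a binding, so pass the original one.
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://iis_backend;
    }
}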
I need some advice before I start building. I have 15 WordPress websites on separate installs, and a remote server that pulls data from those websites 24/7.
I've reached the point where I want that server to modify the websites based on the data it has calculated.
The options are:
Should I let the server access the WP database remotely and modify things directly, keeping WordPress out of the loop?
Or should I use the WP REST API and provide some secured routes that expose data, accept data, and make those changes?
My instinct is to use the WP API, but in the end it is still PHP (behind nginx/Apache) with its limits (timeouts, for example), and I find it hard to run heavy, long-running processes on WordPress itself.
I can divide the work into different stages, for example:
fetching data (a simple GET)
processing it on the remote server
looping over the results and pushing changes back in small batches through another route
My concern is that this cycle requires a perfect match between the remote server and the WP API, and any change or fix on the WP side means a plugin update on all the websites, which is not much fun.
I'd appreciate any ideas or suggestions on how to move forward.
"use WP REST API and supply some secured routes which provide data and accept data and make those changes", indeed.
I don't see why timeouts or other limits should cause a problem; using the API is the best approach for this kind of case. You can avoid timeout problems with a few adjustments on the web server side.
Or you can raise the memory and execution time limits only for requests coming from your main server.
For example:
// Raise PHP limits only for requests coming from the trusted main server
if ($_SERVER['REMOTE_ADDR'] == 'YOUR_MAIN_SERVER_IP') {
    ini_set('max_execution_time', 1000);   // seconds
    ini_set('memory_limit', '1024M');
}
I am looking for the most reliable, accurate and fast way to add some .htaccess code that blocks visits to a website from countries/IPs which are not on the whitelist of countries I want to allow. I have looked at https://www.ip2location.com/free/visitor-blocker which seems to offer a solution: for the 4 countries I want to allow, it generated a 4.1 MB .htaccess file! Will this mean slow access when someone attempts to view the site? I guess using a free service like this also means the data is likely nowhere near comprehensive?
Does anyone have suggestions for a good way to allow only visitors from a few countries to access a website?
It sounds like the service you used basically tried to brute force the list. If you look into the .htaccess file, I'm sure you will see a long list of hard-coded IP blocks.
In my opinion this is a terrible way to handle geographic blocking. To your original question: there is no single "most reliable, most accurate and quickest" method. Those are separate criteria and you will need to prioritize one over the others.
For performance you could consider blocking at the routing level / DNS server / proxy, although that obviously isn't the quickest option to put in place. There are also Apache modules that let you compare the incoming IP address against a local database of known IP blocks for each country. One of the main issues with this is that you need to keep that database updated as new IP blocks are allocated.
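For example, with mod_geoip (or a similar module) and a local GeoIP country database, a country whitelist comes down to a handful of lines instead of megabytes of hard-coded ranges. This is only a sketch: the country codes and database path are placeholders, and depending on the build some of the GeoIP directives may need to live in the main server config rather than .htaccess:

GeoIPEnable On
GeoIPDBFile /usr/share/GeoIP/GeoIP.dat

# Mark requests coming from the allowed countries (placeholder codes)
SetEnvIf GEOIP_COUNTRY_CODE US AllowedCountry
SetEnvIf GEOIP_COUNTRY_CODE GB AllowedCountry
SetEnvIf GEOIP_COUNTRY_CODE DE AllowedCountry
SetEnvIf GEOIP_COUNTRY_CODE FR AllowedCountry

# Apache 2.2-style access control; on 2.4 use "Require env AllowedCountry"
Order Deny,Allow
Deny from all
Allow from env=AllowedCountry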
In my opinion the "best" way to do this is a simple redirect at the application layer using server-side code. There are several geolocation APIs where you can send the IP or hostname and get back a country of origin. An example:
// Ask a remote geolocation API (the freegeoip XML endpoint) for the visitor's country
$xml = new SimpleXMLElement(file_get_contents('http://www.freegeoip.net/xml/{IP_or_hostname}'));

// Redirect visitors from a given country somewhere else
if ($xml->CountryCode == "US") {
    header('Location: http://www.google.com');
    exit;
}
There are two ways to block a visitor at the web server: one is at the server/firewall level (.htaccess etc.) and the other is server-side scripting (PHP etc.).
If you are concerned about the performance of the .htaccess option, you can download the IP2Location LITE database from http://lite.ip2location.com and load it into your own server. For every connection, you look up the visitor's IP address to find their country, and then redirect or block them in PHP. You can find the complete steps at https://www.ip2location.com/tutorials/redirect-web-visitors-by-country-using-php-and-mysql-database
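A rough sketch of that lookup in PHP against a MySQL import of the LITE DB1 data (the table and column names follow the DB1 CSV layout; the connection details and country list are placeholders):

// Countries allowed to view the site (placeholder codes)
$allowed = array('US', 'GB', 'DE', 'FR');

// IP2Location DB1 stores ranges as unsigned integers (ip_from / ip_to)
$ip = sprintf('%u', ip2long($_SERVER['REMOTE_ADDR']));

$pdo  = new PDO('mysql:host=localhost;dbname=geoip', 'db_user', 'db_password');
$stmt = $pdo->prepare('SELECT country_code FROM ip2location_db1 WHERE ? BETWEEN ip_from AND ip_to LIMIT 1');
$stmt->execute(array($ip));
$country = $stmt->fetchColumn();

if (!in_array($country, $allowed, true)) {
    // Block (or redirect) visitors whose country is not on the whitelist
    header('HTTP/1.1 403 Forbidden');
    exit('Access denied');
}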
There is also the option of using a remote geolocation API. However, we do not suggest this method because of network latency: the extra API query on every request will slow things down for all users.
I want to let my clients talk directly to the Elasticsearch REST API, while obviously preventing them from performing any data or configuration changes.
I had a look at the Elasticsearch REST interface and noticed a pattern: HTTP GET requests are pretty safe (harmless queries and cluster status).
So I thought I could use CloudFront as a CDN/proxy that only allows GET/HEAD methods (you can impose such a restriction in the main configuration).
So far so good, everything is set up. But it doesn't work, because I would need to open my EC2 security group to the world in order to be reachable from CloudFront, and I really don't want that!
When I use EC2 with RDS, I can simply allow access from my EC2 security group in the RDS security group. Why can't I do this with CloudFront? Or can I?
Ideas?
edit: It's not documented, but ES accepts facet queries, which involve a (JSON) body, not only with POST but also with GET. This simply breaks the HTTP recommendation (RFC 2616) by not ignoring the body of a GET request (source).
This matters because, as pointed out, exposing the ES REST interface directly makes denial-of-service attacks via complex queries easy. I'm still convinced, though, that having one less proxy is worth it.
edit: Another option for me would be to skip CloudFront and add a security layer as an Elasticsearch plugin, as shown here.
I ended up coding my own plugin. Surprisingly, there was nothing quite like it around.
No proxies, no Jetty, no Tomcat.
Just the original ES REST module and my RestFilter, using a minimum of reflection to obtain the remote address of the requests.
enjoy:
https://github.com/sscarduzio/elasticsearch-readonlyrest-plugin
Note that even a GET request can be harmful in Elasticsearch: a query that simply takes too many resources to compute will bring down your cluster. Facets are a good way to do that.
I'd recommend writing a simple REST API that you place in front of ES, so you get much more control over what hits your search cluster. If that's not an option, you could consider running Nginx on your ES boxes as a local reverse proxy, which gives you the same control as CloudFront does (and a whole lot more). Then you'd only have to open up Nginx to the world instead of ES.
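A minimal sketch of that local reverse proxy, allowing only GET/HEAD and exposing only a couple of read endpoints (the paths, index name and port are assumptions):

server {
    listen 80;

    # Reject anything that is not a read
    if ($request_method !~ ^(GET|HEAD)$) {
        return 405;
    }

    # Expose only the endpoints the clients actually need (placeholder paths)
    location ~ ^/(myindex/_search|_cluster/health)$ {
        proxy_pass http://127.0.0.1:9200;
        proxy_set_header Host $host;
    }

    # Everything else is blocked
    location / {
        return 403;
    }
}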
A way to do this in AWS would be:
Set up an Application Load Balancer in front of your ES cluster. Create a TLS cert for the ALB and serve https. Open the ES security group to the ALB.
Set up CloudFront and use the ALB as origin. Pass a custom header with a secret value (for WAF, see next point).
Set up WAF on your ALB to only allow requests that contain the custom header with the secret value. Now all requests have to go through CloudFront.
Set up a Lambda@Edge function on your CloudFront distribution to either remove the body from GET requests or deny such requests.
It's quite some work, but there are advantages over the plugin, e.g.:
CloudFront comes with free network-level DDoS protection
CloudFront gives your users lower latency to ES because of the fast CloudFront network and its global PoPs.
It opens up many options to use CloudFront, WAF and Lambda@Edge to further protect your ES cluster.
I'm working on sample code in CDK to set all of this up; I will report back when that's ready.
Is it possible to configure nginx to route all traffic to the primary node, but duplicate the requests (and ignore the response) to a second node? This is for testing a performance update to a web server in production with minimal risk.
I found Shadow Proxy but was concerned about its impact on performance and stability of a production environment.
It seems there is no stable, high-performance and easy way to do this for a production environment.
Here are some alternative methods:
I found some blog posts saying that nginx-lua can do this; see the sketch after this list for one way to express request mirroring in nginx configuration.
In our environment, we split the traffic and route a certain portion of requests to our sandbox server, keeping the ratio under control. If there are problems, only a few users are affected, and that group of users can be internal, such as colleagues from your department or the whole company.
Replay the "GET" requests by recovering them from the access log; for POST requests we usually use automated test cases.
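As a sketch of the mirroring idea mentioned above: newer nginx versions (1.13.4+) ship ngx_http_mirror_module, which copies each request to an internal location and discards the mirror's response, so no Lua is strictly required. The upstream addresses are placeholders and this is not something we have battle-tested in production:

upstream primary_backend {
    server 10.0.0.10:8080;   # production node (placeholder)
}

upstream shadow_backend {
    server 10.0.0.20:8080;   # node under test (placeholder)
}

server {
    listen 80;

    location / {
        mirror /mirror;            # duplicate every request to the /mirror location
        mirror_request_body on;    # include the request body in the copy
        proxy_pass http://primary_backend;
    }

    location = /mirror {
        internal;
        proxy_pass http://shadow_backend$request_uri;
    }
}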