Using CloudFront to expose the Elasticsearch REST API read-only (GET/HEAD) over HTTP

I want to let my clients speak directly to the Elasticsearch REST API, while obviously preventing them from making any data or configuration changes.
I had a look at the Elasticsearch REST interface and noticed a pattern: HTTP GET requests are pretty safe (harmless queries and cluster status).
So I thought I could use CloudFront as a CDN/proxy that only allows the GET/HEAD methods (you can impose such a restriction in the distribution's configuration).
So far so good, all is set up. But things don't work, because I would need to open my EC2 security group to the world in order to be reachable from CloudFront! I really don't want this!
When I use EC2 with RDS, I can simply allow access to my EC2 security group in RDS security groups. Why can't I do this with CloudFront? Or can I?
Ideas?
edit: It's not documented, but ES accepts facet queries, which involve a (JSON) body, not only with POST but also with GET. This breaks the HTTP recommendation (RFC 2616) by not ignoring the body of a GET request (source).
This matters because, as pointed out below, exposing the ES REST interface directly makes it easy to mount DoS attacks with complex queries. I'm still convinced, though, that having one less proxy is worth it.
edit: Another option for me would be to skip CloudFront and add a security layer as an Elasticsearch plugin, as shown here

I ended up coding my own plugin. Surprisingly, there was nothing quite like this around.
No proxies, no Jetty, no Tomcat.
Just the original ES REST module and my RestFilter, using a minimum of reflection to obtain the remote address of the requests.
enjoy:
https://github.com/sscarduzio/elasticsearch-readonlyrest-plugin

Note that even a GET request can be harmful in Elasticsearch. A query that simply takes too many resources to compute will bring down your cluster. Facets are a good way to do this.
I'd recommend writing a simple REST API that you place in front of ES, so you get much more control over what hits your search cluster. If that's not an option, you could consider running Nginx on your ES boxes as a local reverse proxy, which will give you the same control (and a whole lot more) as CloudFront does. Then you'd only have to open up Nginx to the world instead of ES.
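If you go the Nginx route, a minimal sketch of such a read-only proxy could look like this (the hostname and certificate paths are made up; it assumes ES listens on localhost:9200):

server {
    listen 443 ssl;
    server_name search.example.com;                      # hypothetical
    ssl_certificate     /etc/nginx/certs/search.pem;     # hypothetical
    ssl_certificate_key /etc/nginx/certs/search.key;     # hypothetical

    location / {
        # Allowing GET also allows HEAD; every other method is denied.
        limit_except GET HEAD {
            deny all;
        }
        proxy_pass http://127.0.0.1:9200;
    }
}

Note this only locks down the HTTP methods; it does nothing about the expensive-query problem described above.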

A way to do this in AWS would be:
Set up an Application Load Balancer in front of your ES cluster. Create a TLS cert for the ALB and serve HTTPS. Open the ES security group only to the ALB.
Set up CloudFront and use the ALB as origin. Pass a custom header with a secret value (for WAF, see next point).
Set up WAF on your ALB to only allow requests that contain the custom header with the secret value. Now all requests have to go through CloudFront.
Set up a Lambda@Edge function on your CloudFront distribution to either strip the body from GET requests or deny such requests outright.
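As a sketch of that last step, a viewer-request handler in Node.js could look roughly like this (it assumes the trigger is configured with includeBody; this is illustrative, not a definitive implementation):

'use strict';

// Viewer-request Lambda@Edge handler: refuse GET/HEAD requests that
// carry a body. Requires includeBody on the trigger, otherwise
// request.body is not populated.
exports.handler = async (event) => {
    const request = event.Records[0].cf.request;

    if ((request.method === 'GET' || request.method === 'HEAD')
            && request.body && request.body.data) {
        // Short-circuit with a 403 instead of forwarding to the origin.
        return {
            status: '403',
            statusDescription: 'Forbidden',
            body: 'Request bodies are not allowed on read-only methods.',
        };
    }
    return request;
};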
It's quite some work, but there are advantages over the plugin, e.g.:
CloudFront comes with free network-level DDoS protection.
CloudFront gives your users lower latency to ES because of the fast CloudFront network and its global PoPs.
It opens up many options to use CloudFront, WAF and Lambda@Edge to further protect your ES cluster.
I'm working on sample code in CDK to set all of this up and will report back when that's ready.

Related

Is there a way to disable the Huckabuy PageSpeed product, or perhaps Cloudflare Workers in general, via a query string or HTTP header?

When working with the PageSpeed product from Huckabuy, which uses Cloudflare Workers to implement some page-speed boosters, I want to be able to bypass the boosters' behavior without having to reconfigure the settings. Is there any way to accomplish this that is exposed by Huckabuy, or perhaps a generic way, using a URL query string parameter or an HTTP header, to bypass any given Worker in Cloudflare?
Below is an example of what I'd like to be able to do.
https://www.example.com?huckabuy_pagespeed=false
If that's not possible then perhaps something specific to Cloudflare like the example below.
https://www.example.com?disable_cf_workers=true
Or potentially the following example HTTP header.
DISABLE_CF_WORKER: true
I don't know anything about Huckabuy specifically, but there is no general way to bypass Cloudflare Workers with special request properties. Workers are often used to implement security policies, so it's important that they cannot be bypassed.
Of course, any particular Worker is free to implement its own bypass mechanism if it so chooses.
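For illustration, a Worker that chooses to expose such a bypass might look like this (the parameter name is made up; nothing on Cloudflare's side enforces it):

// Hypothetical Worker that honors its own opt-out parameter.
addEventListener('fetch', (event) => {
    event.respondWith(handle(event.request));
});

async function handle(request) {
    const url = new URL(request.url);
    if (url.searchParams.get('disable_my_worker') === 'true') {
        return fetch(request); // pass the request through untouched
    }
    // ... otherwise apply the Worker's normal behavior ...
    return fetch(request);
}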

Accessing the WordPress DB from a remote server

I need some advice before I start developing. I have 15 WordPress websites on different installs, and a remote server which pulls data 24/7 from those websites.
I've reached the point where I want the server to modify the websites based on the data it has calculated.
The question is this:
Should I allow the server to access the WP database remotely and modify things directly, without WordPress in the loop?
Or should I use the WP REST API and supply some secured routes which provide data and accept data and make those changes?
My instinct is to use the WP API, but it's still PHP (behind nginx/Apache), which has its limits (timeouts, for example), and I find it hard to run heavy, long processes on WordPress itself.
I can divide the tasks into stages, for example:
fetch data (a simple GET)
process it on the remote server
loop and apply modifications in small batches through another route
My concern is that this cycle requires a perfect match between the remote server and the WP API, and any change or fix on the WP side means a plugin update across all the websites, which is not much fun.
I'd appreciate any ideas and suggestions on how to move forward.
"use WP REST API and supply some secured routes which provide data and accept data and make those changes", indeed.
I don't know why timeouts or other limits would cause a problem, but using the API is the best approach for this kind of case. You can avoid timeout problems with some adjustments on the web server side.
Or you can raise the timeout and memory limits exclusively for requests from your main server, e.g.:
if ($_SERVER['REMOTE_ADDR'] === 'YOUR_MAIN_SERVER_IP') {
    ini_set('max_execution_time', 1000); // seconds
    ini_set('memory_limit', '1024M');
}
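And if you go the secured-routes way, a minimal sketch of such a route on the WP side could look like this (the namespace, route and capability are illustrative):

<?php
// In a small plugin on each site: a route the remote server can POST
// batches of changes to. Requires authentication, e.g. via application
// passwords, so permission_callback sees a logged-in user.
add_action( 'rest_api_init', function () {
    register_rest_route( 'mysync/v1', '/batch', array(
        'methods'             => 'POST',
        'callback'            => function ( WP_REST_Request $request ) {
            $items = $request->get_json_params();
            // ... apply the precomputed changes in small batches ...
            return new WP_REST_Response( array( 'received' => count( $items ) ), 200 );
        },
        // Reject anonymous callers.
        'permission_callback' => function () {
            return current_user_can( 'edit_posts' );
        },
    ) );
} );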

How to secure a Symfony app from brute force and malicious traffic

I've been looking around, but I couldn't find anything useful. What is the best practice for securing a Symfony app against brute-force attacks? I looked into the SecurityBundle but couldn't find anything.
Something I do for this is keep a log, via event subscribers, of the IP addresses and/or usernames attempting to log in. Then, if an IP/user has accumulated too many failed attempts within a given window, I move that IP/user to a ban list, and from then on, any time that IP/user tries to log in, I deny it right away based on that ban list.
You can also play with the time between attempts and all those goodies inside the event subscriber.
Let me know if this makes sense.
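A rough sketch of such a subscriber (assuming Symfony 5.1+ with the security and cache components; the threshold, window and key names are just examples):

<?php

namespace App\EventSubscriber;

use Psr\Cache\CacheItemPoolInterface;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\Security\Http\Event\LoginFailureEvent;

// Counts failed logins per IP and flags repeat offenders. A separate
// request listener (not shown) checks the ban flag and denies early.
class LoginFailureBanSubscriber implements EventSubscriberInterface
{
    private const MAX_FAILURES = 5;

    public function __construct(private CacheItemPoolInterface $cache)
    {
    }

    public static function getSubscribedEvents(): array
    {
        return [LoginFailureEvent::class => 'onLoginFailure'];
    }

    public function onLoginFailure(LoginFailureEvent $event): void
    {
        $ip = $event->getRequest()->getClientIp() ?? 'unknown';

        $item = $this->cache->getItem('login_failures.' . md5($ip));
        $failures = ($item->get() ?? 0) + 1;
        $item->set($failures)->expiresAfter(3600); // one-hour window
        $this->cache->save($item);

        if ($failures >= self::MAX_FAILURES) {
            $ban = $this->cache->getItem('login_ban.' . md5($ip));
            $ban->set(true)->expiresAfter(86400); // 24-hour ban
            $this->cache->save($ban);
        }
    }
}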
Use Cloudflare for DDoS protection. However, it may be expensive.
You can prevent dictionary attacks using https://github.com/codeconsortium/CCDNUserSecurityBundle
Honestly, I do that at the web/cache-server level when I need to. I recently used Varnish Cache for this, with a module called vsthrottle (which is probably one of many things you can use at the server level). The advantage of doing it at the web-server level instead of in Symfony is that you don't even hit the PHP level and compile all the vendors just to end up rejecting a request, and you don't need a separate data store (be it MySQL or something fast like Memcached) to log every request and compare it on the next one. If the request reaches the PHP layer, it has already cost you some performance, and a DDoS of that type will still hurt you even if you return a rejection from Symfony, because it forces the server to compile PHP and part of the Symfony code.
If you insist on doing it in Symfony, register a listener that listens on all requests, parse the request headers for either the IP address or X-Forwarded-For (in case you are behind a load balancer, in which case a regular IP check will only ever see the load balancer's IP), and then find a suitable way to keep track of all requests up to a minute old (you could probably use Memcached for fast storage, with a smart way to increment counts for each IP). If an IP hits you more than, say, 100 times in the last minute, you return a Forbidden or Too Many Requests response instead of the usual one. But I don't recommend this, as off-the-shelf solutions (like the Varnish module I used) are usually better; in my case, I could throttle specific routes and not others, for example.
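For reference, the Varnish side of what I did is only a few lines of VCL, assuming the vsthrottle module from varnish-modules is installed (the backend address and the limits are illustrative):

vcl 4.1;

import vsthrottle;

backend default {
    .host = "127.0.0.1";
    .port = "8080";  # your app server
}

sub vcl_recv {
    # client.identity defaults to the client IP; deny anyone who makes
    # more than 100 requests in any 60-second window.
    if (vsthrottle.is_denied(client.identity, 100, 60s)) {
        return (synth(429, "Too Many Requests"));
    }
}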

Using proxy to cache expensive outgoing HTTP requests?

I am using a fairly expensive external API (there's a cost per request), which makes testing code that uses it impractical.
In an ideal world, I would have a proxy server to run my requests through, which would cache each request (keyed on URL + query string) indefinitely and only hit the actual API server when I explicitly invalidate the cache for a given request. Is such a server available off the shelf with minimal configuration?
My current stack is Node.js, Docker, Nginx, PostgreSQL and AWS S3 (for non-ephemeral state). I think Varnish might accomplish what I need, but I'm not sure.
Varnish can and will accomplish that, but only if you build a 'test' API that returns some similar data you can play with. Your best bet, if you have to save money, is to query the API a few times to get different typical responses. Once you know the ballpark of what to expect from it, create some sort of dummy API, or even some static JSON or XML files, that you can use to mimic it. At that point you can test Varnish and cache invalidation, and I'd be more than happy to help you with the syntax for that, given some examples of the code.
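To give you an idea, a minimal sketch of the "cache forever, invalidate by hand" behavior in VCL might be the following (the backend address and purge ACL are illustrative; note that Varnish speaks plain HTTP to its backend, so an HTTPS-only API needs a local TLS forwarder in front):

vcl 4.1;

backend default {
    .host = "127.0.0.1";
    .port = "8081";  # local forwarder to the expensive API
}

acl purgers {
    "127.0.0.1";
}

sub vcl_recv {
    # Explicit invalidation: send PURGE for a URL to force the next
    # request for it through to the real API.
    if (req.method == "PURGE") {
        if (client.ip !~ purgers) {
            return (synth(405, "Not allowed"));
        }
        return (purge);
    }
}

sub vcl_backend_response {
    # Cache "indefinitely" (ten years), ignoring the origin's headers.
    set beresp.ttl = 3650d;
    return (deliver);
}

The cache key is the host plus URL (including the query string) by default, which matches what you described.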

nginx to use a second node as a shadow

Is it possible to configure nginx to route all traffic to the primary node, but duplicate the requests (ignoring the responses) to a second node? This is for testing a performance update to a web server in production with minimal risk.
I found Shadow Proxy but was concerned about its impact on performance and stability of a production environment.
It seems there is no stable, high-performance, easy way to do this in a production environment.
Here are some alternative methods:
I found some blog posts saying this can be done with nginx-lua.
In our environment, we split the traffic and route certain requests to our sandbox server, keeping the ratio under control. If there are problems, only a few users are affected, and that group of users can be internal, such as colleagues in your department or the whole company.
Replay the requests from the access log for the GET requests; for POST requests, we usually use automated test cases.
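One more option worth checking: nginx 1.13.4 and later ship ngx_http_mirror_module, which duplicates each request as a fire-and-forget subrequest and ignores the mirrored response. A minimal sketch (upstream addresses are illustrative):

upstream primary { server 10.0.0.1:8080; }
upstream shadow  { server 10.0.0.2:8080; }

server {
    listen 80;

    location / {
        mirror /shadow;             # duplicate every request
        mirror_request_body on;     # mirror POST bodies too
        proxy_pass http://primary;  # clients only see this response
    }

    location = /shadow {
        internal;
        proxy_pass http://shadow$request_uri;
    }
}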
