HAProxy/Nginx partial-URL-based hash upstream

We know that HAProxy and nginx can do URL-hash-based upstream selection, but how can we hash only part of the URL?
We have 4 back-end original-image servers; each stores all of the original large-size image files. The image server resizes the file on the fly based on the user's request (a Tomcat/Java app loads the file into memory, resizes it, then responds).
The original file is:
http://imageserver.company.com/path/to/imageA.jpg
The end-user will request:
http://imageserver.company.com/path/to/imageA.jpg/crop/400x300.jpg
http://imageserver.company.com/path/to/imageA.jpg/400x224.jpg
http://imageserver.company.com/path/to/imageA.jpg/1280x720.jpg
I would like HAProxy or nginx to compute the hash on "/path/to/imageA.jpg", i.e. something like:
Hash(substring(url, 0, find(url, ".jpg/")))
Any idea how to configure this?

In nginx you can use the map and upstream::hash directives:
map $uri $image_hash {
    default $uri;
    "~(?<image_path>.+(?:jpg|png))/" $image_path;
}

upstream image_backends {
    hash $image_hash;
    server server1;
    server server2;
    server server3;
    server server4;
}

server {
    ...
    location / {
        # add debug header to view the hash
        add_header ImageHash $image_hash;
        proxy_pass http://image_backends;
    }
}
I'm not sure what the exact syntax would be for HAProxy, but its uri hash supports specifying the "depth" of the URI to hash, so if the original path in the URL has a fixed depth you could use that (though I'm guessing that's not the case)? A sketch follows the quoted documentation below.
The "depth" parameter indicates the maximum directory depth
to be used to compute the hash. One level is counted for each
slash in the request. If both parameters are specified, the
evaluation stops when either is reached.
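For reference, here's a minimal HAProxy sketch of that depth-based approach, assuming the original image path always has exactly three components (as in /path/to/imageA.jpg) and that depth 3 covers them; the backend and server names are placeholders:

backend image_servers
    # hash only the leading path components (e.g. /path/to/imageA.jpg) and
    # ignore the /crop/400x300.jpg style suffix; adjust the depth if HAProxy
    # counts levels differently than assumed here
    balance uri depth 3
    hash-type consistent
    server img1 imageserver1:8080 check
    server img2 imageserver2:8080 check
    server img3 imageserver3:8080 check
    server img4 imageserver4:8080 check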

Related

NGINX - different backend proxy based on query parameter

I've got a particular scenario where I need to route to a different backend based on a query parameter:
https://edge1.cdn.com/file.zip?{secure_link}&{tokens}&route=aws1
where aws1 would map to, say, http://edge1.amazonwebservices.com,
and if it's aws2 then the proxy backend would be http://edge2.amazonwebservices.com,
and so on... but I still haven't figured out how to do this.
You can use the map directive to get the proxy hostname from the $arg_route variable (which contains the value of the route query argument):
map $arg_route $aws {
    aws1    edge1.amazonwebservices.com;
    aws2    edge2.amazonwebservices.com;
    ...
    default <default_hostname>;
}
server {
    ...
    # if you want to proxy the request, you'd need a 'resolver' directive
    resolver <some_working_DNS_server_address>;

    location / {
        # if you want to proxy the request
        proxy_pass http://$aws;
        # or if you want to redirect the request
        rewrite ^ http://$aws$uri permanent;
    }
}
If you don't want to serve requests that lack the route query argument, you can omit the default line in the map block and add the following if block to your server configuration:
if ($aws = '') {
    return 403; # HTTP 403 denied
}
If you need to proxy the request you'd additionally need a resolver directive (you can read some technical details about it in this article).

Nginx auth_request handler accessing POST request body?

I'm using nginx (version 1.9.9) as a reverse proxy to my backend server. It needs to perform authentication/authorization based on the contents of POST requests, and I'm having trouble reading the POST request body in my auth_request handler. Here's what I've got.
Nginx configuration (relevant part):
server {
    location / {
        auth_request /auth-proxy;
        proxy_pass http://backend/;
    }

    location = /auth-proxy {
        internal;
        proxy_pass http://auth-server/;
        proxy_pass_request_body on;
        proxy_no_cache "1";
    }
}
And in my auth-server code (Python 2.7), I try to read the request body like this:
class AuthHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def get_request_body(self):
        content_len = int(self.headers.getheader('content-length', 0))
        content = self.rfile.read(content_len)
        return content
I printed out content_len and it had the correct value. However, self.rfile.read() simply hangs, and eventually it times out and returns "[Errno 32] Broken pipe".
This is how I posted test data to the server:
$ curl --data '12345678' localhost:1234
The above command hangs as well and eventually times out and prints "Closing connection 0".
Any obvious mistakes in what I'm doing?
Thanks much!
The code of the nginx-auth-request-module is annotated at nginx.com. The module always replaces the POST body with an empty buffer.
In one of the tutorials, they explain the reason, stating:
As the request body is discarded for authentication subrequests, you will
need to set the proxy_pass_request_body directive to off and also set the
Content-Length header to a null string
The reason for this is that auth subrequests are sent as HTTP GET requests, not POST. Since a GET has no body, the body is discarded. The only workaround with the existing module would be to pull the needed information from the request body and put it into an HTTP header that is passed to the auth service.
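For what it's worth, here's a minimal sketch of how the /auth-proxy location from the question could be adjusted to follow the quoted tutorial; treat the directive values as assumptions drawn from that quote rather than a verified fix:

location = /auth-proxy {
    internal;
    proxy_pass http://auth-server/;
    # per the quoted tutorial: the subrequest carries no body, so don't
    # forward one, and blank out the Content-Length header
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
}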

Proxy a request - get a parameter from URL, add a header and update request URL using Nginx

I am looking for a way to do the following using Nginx:
Intercept a request
Read URL, parse it and read a value from it.
Add that value as a new request header
Update the URL (remove a particular value)
Forward the request to another server
e.g
Request URL - http://<<nginx>>/test/001.xml/25
Final URL - http://<<server>>/test/001.xml with header (x-replica: 25)
I have an nginx server set up with an upstream for the actual server. How do I configure nginx to achieve this?
Since the data exists within the request URI itself (available via the $uri variable in nginx), you can parse it using the nginx Lua module. nginx will need to be compiled with Lua for this to work; see: openresty's nginx lua module.
From there you can use the set_by_lua_block or set_by_lua_file directive given $uri as a parameter.
In configuration this would look something like:
location / {
    ...
    set_by_lua_file $var_to_set /path/to/script.lua $uri;
    # $var_to_set would contain the result of the script from this point
    proxy_set_header X-Replica $var_to_set;
    ...
}
In script.lua we can access the $uri variable from the ngx.arg list (see these docs):
function parse_uri( uri )
    local parsed_uri = uri
    -- Parse logic here
    return parsed_uri
end

return parse_uri( ngx.arg[1] )
Similarly, you can modify this function or create another to make a variable with the updated $uri.
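If compiling nginx with Lua isn't an option, a plain nginx sketch can get close for the URL shape given in the question; the upstream name backend_upstream is a placeholder, and the regex assumes the replica number is always the final path segment:

location ~ ^(?<clean_uri>/test/.+\.xml)/(?<replica>\d+)$ {
    # send the trailing value as a header and proxy to the URL without it
    proxy_set_header X-Replica $replica;
    proxy_pass http://backend_upstream$clean_uri;
}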

how to use url pathname as upstream hash in nginx

I have an nginx server with a config that uses a query parameter as the upstream hash key. The URL looks like this:
http://www.my-server.com/xyz/WXYZ?abc=123
And the configuration is as below:
upstream test {
    hash $arg_abc;
    ....
}
Is there any possibility to use the WXYZ part of the URL as the upstream hash key?
WXYZ is a dynamic value, while xyz is always the same and will always be there.
This is what I tried:
location ~ ^/xyz/(.).*$ {
    hash $1
}
The deployment guide explicitly said it's possible:
The generic hash method: the server to which a request is sent is
determined from a user-defined key which may be a text, variable, or
their combination. For example, the key may be a source IP and port,
or URI:
upstream backend {
    hash $request_uri consistent;
    server backend1.example.com;
    server backend2.example.com;
}
The hash key here is $request_uri, which can be replaced with $arg_your_key; I'm not sure that works inside the upstream block, but it should work as a proxy_pass value:
location /xyz {
    proxy_pass http://localhost$uri$is_args$args;
}
I'm not sure of your requirements, but if you need to pick a certain backend based on the $arg_abc argument, you need the map function, like here:
map $arg_abc $backend_server {
    default 'serverdefault.domain.com:80';
    123     'server1.domain.com:80';
    234     'server2.domain.com:80';
    345     'server3.domain.com:80';
}

server {
    location / {
        proxy_pass http://$backend_server;
    }
}
Yes, as per the documentation for hash, you can only use it in the upstream context, so what you've tried indeed won't work.
However, why exactly do you need to use only a certain part of your URI, instead of the whole thing, if the other parts stay the same anyway? The idea is that the whole string gets hashed anyway, so even if all your URLs start with the same prefix, the hash function should still distribute requests evenly. So you can most likely just use $request_uri or $uri as your hash key.
Alternatively, if you still want to do it your way, you might try named pattern matching in your location (location ~ ^/xyz/(?<varForHash>.).*$ {…), and then use the variables from such matches ($varForHash) as your hash key (you could probably even use $1 from your example, too, just in the proper context, i.e. the upstream block).
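A minimal sketch of that named-capture approach, assuming the upstream from the question is called test (server names are placeholders) and adjusting the pattern so it captures the whole WXYZ segment rather than a single character:

upstream test {
    hash $varForHash;
    server backend1.example.com;
    server backend2.example.com;
}

server {
    location ~ ^/xyz/(?<varForHash>[^/]+) {
        # $varForHash holds the WXYZ segment and feeds the hash above
        proxy_pass http://test;
    }
}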
I had a similar task and solved it.
I created an upstream.conf and added it to nginx.conf.
The content of upstream.conf is below:
map $uri $myvar {
    default $uri;
    # pattern "Method1" + "/" + GUID + "/" + target parameter + "/" + HASH
    "~*/Method1/(.*)/(.*)/(.*)$" $2;
    # pattern "Method2" + "/" + GUID + "/" + target parameter
    "~*/Method2/(.*)/(.*)$" $2;
}

upstream backend {
    hash $myvar consistent;
    server s1:80;
    server s2:80;
}

How to avoid nginx replacing %20 with a whitespace when used as a proxy (proxy_pass)?

I am using nginx as a proxy for an Apache server.
Here is my config:
location ~ ^/subsite/(.*)$ {
    proxy_pass http://127.0.0.1/subsite/$1?$query_string;
}
The problem is that if I send a request containing %20, like mywebsite.com/subsite/variable/value/title/Access%20denied/another/example,
the %20 is replaced by a whitespace, and Apache ignores everything in the request after /title/Access.
Any idea?
I was able to solve a similar issue -- we have an API that requires the search terms to be part of the URL path. Passing the output directly to the proxy_pass directive caused it to throw a 502 even though the request was properly URL-encoded.
Here's the solution we came up with:
location ~ /api/search(/.*) {
    set $query $1;
    proxy_pass http://127.0.0.1:3003$query;
}
The "set" directive seems to keep the url encoding intact (or re-encodes from what the regex is passing back in $1).
