Fingerprinted proxy cache assets - nginx

Currently I have a proxy cache setup which is very vanilla:
proxy_cache_path /path/to/cache levels=1:2 keys_zone=my_cache:10m;

server {
    # ...
    location / {
        proxy_cache my_cache;
        proxy_pass http://my_upstream;
    }
}
Now I have a requirement to handle fingerprinted assets. Unfortunately, the fingerprint is in the first part of the URL.
Examples:
http://www.example.com/asd9f87asdf/assets/foobar.jpg
http://www.example.com/oihllk8asdf/assets/foobar.jpg
Both requests should request
/assets/foobar.jpg
from the upstream via proxy_pass, and the first part of the URL (asd9f87asdf or oihllk8asdf) should be added to the cache key.
Is it possible to extract that part of the URL and add it to the proxy cache key?

Could you just redirect any request for /*/assets to /assets? That way, you still receive the initial request (which does not need to be cached), but the redirect target would then be cached?
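Alternatively, the fingerprint can be captured directly in a regex location and used in the cache key. A minimal sketch (untested), assuming the fingerprint is always a single path segment followed by /assets/:

location ~ ^/(?<fingerprint>[^/]+)(?<asset_path>/assets/.+)$ {
    proxy_cache my_cache;
    # key the cache entry on both the fingerprint and the asset path
    proxy_cache_key "$fingerprint$asset_path";
    # when proxy_pass contains variables, the given URI is sent as-is,
    # so the upstream only ever sees /assets/foobar.jpg
    proxy_pass http://my_upstream$asset_path;
}

Note that $scheme and $host are not part of this key; add them if the same cache zone serves several virtual hosts.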

Related

Can a response from an http request alter the base address in the next client request?

I have an octoprint server running at http://192.168.1.205. I also have an nginx server hosting myDomain. I want to be able to use the nginx server to pass on a request for http://myDomain/octo to http://192.168.1.205 using a reverse proxy. Here is my nginx code...
server {
    server_name myDomain;
    location /octo/ {
        rewrite ^/octo/(.*) /$1 break; # strip /octo from url
        proxy_pass http://192.168.1.205;
    }
}
The first http://myDomain/octo request is passed on to http://192.168.1.205 correctly. But after the first response, the code in the client makes another request to http://myDomain/moreUri. Since this URI doesn't have /octo, nginx doesn't know to send it to http://192.168.1.205/moreUri. Is there a way to have nginx change something in the first response so that the client then makes the following requests to http://myDomain/octo/moreUri?
I was able to accomplish this for a case where the octoprint server responded with a redirect. I used ...
proxy_redirect http://192.168.1.205/ http://myDomain/octo/;
and it worked. But that only works on redirects and the following requests were wrong again.
Is there a way to have nginx change something in the first response so that the client then makes following requests to http://myDomain/octo/moreUri?
I am not aware that this is possible.
What about adjusting the nginx configuration accordingly? Using location / to process all requests within that block, and adding an additional redirect rule to address the "Since this URI doesn't have /octo nginx doesn't know to send it to http://192.168.1.205/moreUri" problem, should do the trick.
server {
    server_name myDomain;
    location / {
        rewrite ^/octo/(.*) /$1 break;      # strip /octo from the URI and proxy it
        rewrite ^/(.*)$ /octo/$1 redirect;  # redirect bare /moreUri to /octo/moreUri
        proxy_pass http://192.168.1.205;
    }
}
Whether or not the above nginx reconfiguration fixes your issue, I assume the root cause of needing to configure nginx as a reverse proxy in this way might be a misconfigured (or not optimally configured) application. Check the application's config file to see whether its base path can be configured. If so, set it to /octo/ (so the application itself prepends /octo/ to all the links it presents to the user and to all backend requests automatically) and adjust the rewrite rules accordingly.
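If the application's base path cannot be changed, another option is to rewrite the links inside the HTML responses themselves with ngx_http_sub_module. A rough sketch (untested, and assuming the application emits root-relative links such as href="/..."):

location /octo/ {
    rewrite ^/octo/(.*) /$1 break;
    proxy_pass http://192.168.1.205;
    proxy_redirect http://192.168.1.205/ http://myDomain/octo/;
    # ask the upstream for uncompressed responses so sub_filter can edit them
    proxy_set_header Accept-Encoding "";
    # prefix root-relative links with /octo/ in every response body
    sub_filter_once off;
    sub_filter 'href="/' 'href="/octo/';
    sub_filter 'src="/' 'src="/octo/';
}

This only patches links that appear literally in the HTML; URLs constructed in JavaScript will still miss the /octo/ prefix, which is why fixing the application's base path is the cleaner solution.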

NGINX caching upstream server when it shouldn't be

I want to set up an NGINX server which provides the following functionality:
1. When a request is made to NGINX for the page at /path/to/page, it fetches the page from the upstream server.
2. If the upstream server is down or NGINX can't connect to it for some reason, NGINX returns a cached version of the page if it has one.
3. If the cached file is over 6 hours old, don't use it; just return a 502.
4. If the upstream server is available, never use the cache.
I have an NGINX config here which I think should work based on my understanding of the docs, but it doesn't and I can't see why. The problem is with point (4): this NGINX server returns the cached version of the file even if the upstream server is online.
daemon off;
error_log /dev/stdout info;

events {
}

http {
    proxy_cache_path
        "/home/jack/Code/NGINX Caching/Codebase/cache" # Cache path
        keys_zone=cache:10m # Name of cache, max size for keys 10 megabytes
        levels=1:2          # Don't store all cached files in a single directory
        max_size=500m       # Max size of cache
        inactive=6h;        # Cached file deleted if not used within six hours

    proxy_cache_valid 6h;
    proxy_cache_key "$request_method$request_uri";
    access_log /dev/stdout;

    server {
        listen 8080;
        location ~ ^/(.+)$ {
            proxy_pass http://0.0.0.0:8000/$1;
            proxy_cache cache;
            proxy_cache_valid 6h;
            proxy_buffering on;
            proxy_cache_use_stale error timeout;
        }
    }
}
Replace the proxy_cache_path with a path to a directory on your machine, and run another webserver on your machine on port 8000. When I modify a file served by the server on port 8000, NGINX doesn't see the change until I erase the cache. The issue is with NGINX and not my client (Firefox): even if I turn off caching in the browser, NGINX returns a 200 with the old file contents.
Can you please check if these two directives might help you:
proxy_cache_revalidate:
http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_revalidate
and
proxy_cache_use_stale: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_use_stale
There is a video from nginx.conf '17 online describing all the cool things you can achieve with caching: https://www.youtube.com/watch?v=xZrOjmAkFC8. Maybe this is also of interest to you.
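For reference, here's a sketch (untested) of how those two directives could be combined in the existing location block, reusing the cache zone from your config:

location ~ ^/(.+)$ {
    proxy_pass http://0.0.0.0:8000/$1;
    proxy_cache cache;
    # revalidate expired entries with conditional requests
    # (If-Modified-Since / If-None-Match) instead of full fetches
    proxy_cache_revalidate on;
    # fall back to a stale entry only when the upstream is unreachable
    proxy_cache_use_stale error timeout;
}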
So, it seems I misunderstood the NGINX proxy cache directives. The docs are quite confusing on this subject, so I'll lay it out point by point.
This official help page gives a decent overview of the various directives, however it makes no mention of something which as it turns out is a very important conceptual building block in understanding how NGINX caching works: the notion of a cached file being stale.
NGINX's default behavior is to always use the cache if it's there, rather than querying the upstream server. With the following minimal caching config, NGINX will query the upstream server the first time a page is accessed, and then use the cached version forever after that:
events {
}

http {
    proxy_cache_path
        /path/to/cache
        keys_zone=my_cache:10m;
    proxy_cache_key "$request_method$request_uri";

    server {
        listen 8080;
        location ~ ^/(.+)$ {
            proxy_pass http://0.0.0.0:8000/$1;
            proxy_cache my_cache; # must match the keys_zone name defined above
        }
    }
}
You can use the proxy_cache_valid directive to tell NGINX when a cached file should be considered "stale". For example, if we set proxy_cache_valid 5m, then 5 minutes after a cache file is created NGINX will stop serving it and will query the upstream server again on the next request. If the upstream is down, NGINX will return a 502. However, during those five minutes NGINX will still use the cache even if the upstream server is available, so this is still not what we want.
NGINX has another directive, proxy_cache_use_stale, which gives NGINX conditions under which it may use cached files even if they're stale. We can combine these directives to get a server which caches pages, makes them stale immediately (or almost immediately), and then only uses them if the upstream is down:
events {
}

http {
    proxy_cache_path
        /path/to/cache
        keys_zone=my_cache:10m;
    proxy_cache_key "$request_method$request_uri";

    server {
        listen 8080;
        location ~ ^/(.+)$ {
            proxy_pass http://0.0.0.0:8000/$1;
            proxy_cache my_cache; # must match the keys_zone name defined above
            proxy_cache_valid 1s;
            proxy_cache_use_stale error timeout;
        }
    }
}
This config has almost the behavior we want, except that if the upstream server goes down for an extended period of time, NGINX will continue to use the cache indefinitely. As far as I know, there is no way to tell NGINX to totally invalidate/clear a cached file after a given amount of time. Normally that's what proxy_cache_valid is for, but we're already using that directive for a different purpose: making files stale after 1 second so they're only used when the upstream is down. We would need some next level after "stale" that means the file is completely invalidated, but I don't think that exists in NGINX.
So the simplest solution is to just clear the cache manually. It's sufficient to delete all files in the cache directory (or its subdirectories) which were last modified more than 6 hours ago, or whatever you want the expiry time to be. On a Linux system, you can run this command every 5 minutes, for example:
find /path/to/cache -type f -mmin +360 -delete
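For example, a crontab entry to run that cleanup every 5 minutes might look like this (assuming the cache path above):

*/5 * * * * find /path/to/cache -type f -mmin +360 -delete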

Redirect/Rewrite (in nginx) Requests to Subdirectory to S3-Compatible Bucket

I am attempting to get nginx to redirect/rewrite requests to a specific subdirectory so that they are served by an S3-compatible bucket instead of the server. Here is my current server block:
{snip} (See infra.)
Despite fiddling with this for some time now, I've only been able to get it to return 404s.
Additional Information
https://omnifora.com/t/redirect-rewrite-in-nginx-requests-to-subdirectory-to-s3-compatible-bucket/402
Attempted Solutions So Far
rewrite
rewrite ^/security-now/(.*) $scheme://s3.us-west-1.wasabisys.com/bits-podcasts/security-now/$1;
return
return 302 $scheme://s3.us-west-1.wasabisys.com/bits-podcasts/security-now/$1;
proxy_pass
proxy_set_header Host s3.us-west-1.wasabisys.com;
proxy_pass $scheme://s3.us-west-1.wasabisys.com/bits-podcasts/security-now/$1;
You can't use rewrite for cross-domain redirections; for this case you must use proxy_pass, for example:
location ~ ^/directory1/(.*) {
    proxy_set_header Host s3.us-west-1.wasabisys.com;
    proxy_pass $scheme://s3.us-west-1.wasabisys.com/target-bucket/security-now/$1;
}
Note that if you specify your server with a domain name instead of an IP address, you'll need to specify an additional resolver parameter in your server configuration block, for example:
server {
    ...
    resolver 8.8.8.8;
    ...
}
Update: It seems I was wrong in stating that you can't use rewrite for cross-domain redirections. You can, but in that case the user gets an HTTP 301 redirect instead of "transparent" content delivery. Maybe you got the 404 error because you missed a $ sign before the scheme variable?
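Putting the pieces together, a combined sketch of the proxy_pass approach (untested, using the bucket path from the question):

location ~ ^/security-now/(.*) {
    # a resolver is required here because proxy_pass contains variables,
    # so nginx must resolve the hostname at request time
    resolver 8.8.8.8;
    proxy_set_header Host s3.us-west-1.wasabisys.com;
    proxy_pass $scheme://s3.us-west-1.wasabisys.com/bits-podcasts/security-now/$1;
}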

FastCGI cache key while ignoring GET requests but allow certain GET requests to bypass cache

I have a WordPress installation running with WooCommerce, which requires the add_to_cart GET request to function in order for users to add items to the cart.
I have a FastCGI cache key similar to the one in
How to set fastcgi_cache_key using original $request_uri without $args?
# Map request_path var without query strings
map $request_uri $request_path {
    ~(?<captured_path>[^?]*) $captured_path;
}
# FastCGI Cache
fastcgi_cache_path /var/run/nginx-cache levels=1:2 keys_zone=WORDPRESS:100m inactive=60m;
fastcgi_cache_key "$scheme$request_method$host$request_path$cookie_aelia_cs_selected_currency";
Is there any way to bypass the FastCGI cache for these requests and have NGINX just serve the page?
I am pretty new to NGINX, so any help would be very much appreciated.
UPDATE
I have tried to use the following in the server block:
if ($arg_name ~* "(add-to-cart|remove-item)") {
    set $no_cache 1;
}
The server is still ignoring this and serving the cached file.
SOLVED NOW
Solved in the end by adding .*add-to-cart.* to the checks on the request URI.
# Don't cache URIs containing the following segments
if ($request_uri ~* "/wp-admin/|/xmlrpc.php|wp-.*.php|/feed/|index.php|sitemap(_index)?.xml|.*add-to-cart.*") {
    set $no_cache 1;
}
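One thing worth checking, since the rest of the configuration isn't shown here: a $no_cache flag only has an effect if it is wired into the cache directives, for example:

fastcgi_cache_bypass $no_cache;  # don't answer this request from the cache
fastcgi_no_cache $no_cache;      # don't store the response in the cache either

Note also that $arg_name only inspects a query argument literally named "name"; to match add-to-cart anywhere in the query string you would test $args instead (or $request_uri, as in the working solution above).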

best way to save nginx request as a file?

I am looking for a solution to save data sent via HTTP (e.g. as a POST) as quickly as possible (with the lowest overhead) via nginx (v1.2.9). I tried the following nginx configuration, but am not seeing any files written in the directory:
server {
    listen 9199;
    location /saveme {
        client_body_in_file_only on;
        client_body_temp_path /tmp/bodies;
    }
}
What am I doing wrong? And/or is there a better way to accomplish this? (The data that is written should ideally be one file per request, and it does not matter if it is fairly "raw" in nature. Post-processing of the files will be done by a separate process via a queue.)
Basically, you need to combine log_format and fastcgi_pass. You can then use the access_log directive, for example, to specify where the saved variable should be dumped:
# log_format is only valid in the http context, so declare it there:
log_format postdata $request_body;

location = /saveme {
    access_log /var/log/nginx/postdata.log postdata;
    fastcgi_pass php_cgi;  # php_cgi must be an upstream defined elsewhere
}
It could also work with your method, but I think you're missing client_body_buffer_size and client_max_body_size.
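For what it's worth, the original client_body_in_file_only approach can also work, but nginx only spools the request body to disk once something actually consumes the request; with no handler in the location, the body is never read. A sketch (untested, with a hypothetical backend on 127.0.0.1:9200 that picks the files up):

location /saveme {
    client_body_temp_path /tmp/bodies;
    client_body_in_file_only on;   # keep the spooled file instead of deleting it
    client_body_buffer_size 128K;
    client_max_body_size 50M;
    # forward only the path of the saved file, not the body itself
    proxy_pass_request_body off;
    proxy_set_header X-Body-File $request_body_file;
    proxy_pass http://127.0.0.1:9200;  # hypothetical consumer that queues the file
}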
Do you mean caching the HTTP POST when someone accesses and requests a file, storing it on the HDD rather than in memory?
I would suggest using proxy_cache_path and proxy_cache. The proxy_cache_path directive sets the path and configuration of the cache, and the proxy_cache directive activates it.
proxy_cache_path /path/to/cache levels=1:2 keys_zone=my_cache:10m max_size=10g
                 inactive=60m use_temp_path=off;

server {
    ...
    location / {
        proxy_cache my_cache;
        proxy_pass http://my_upstream;
    }
}
- The local disk directory for the cache is called /path/to/cache.
- levels sets up a two-level directory hierarchy under /path/to/cache/.
- keys_zone sets up a shared memory zone for storing the cache keys and metadata such as usage timers.
- max_size sets the upper limit of the size of the cache.
- inactive specifies how long an item can remain in the cache without being accessed.
- The proxy_cache directive activates caching of all content that matches the URL of the parent location block (in the example, /). You can also include the proxy_cache directive in a server block; it applies to all location blocks for the server that don't have their own proxy_cache directive, as sketched below.
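As a sketch of that last point (hypothetical locations, untested):

server {
    proxy_cache my_cache;  # applies to every location in this server block...
    location / {
        proxy_pass http://my_upstream;
    }
    location /uncached/ {
        proxy_cache off;   # ...unless a location overrides it
        proxy_pass http://my_upstream;
    }
}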
