I am trying to "leverage browser caching" in order to increase site speed. The webapp is hosted on pythonanywhere and I guess I need to configure the nginx.conf file to include:
location ~* \.(css|js|gif|jpe?g|png)$ {
    expires 168h;
    add_header Pragma public;
    add_header Cache-Control "public, must-revalidate, proxy-revalidate";
}
(from here: how to Leverage browser caching in django)
However I can't find the conf file anywhere. It is not in /etc/nginx, /usr/local/etc /usr/etc ...
Can this be done on PythonAnywhere?
PythonAnywhere dev here. Unfortunately you can't change the nginx settings on our system -- but the system-default settings are actually pretty much what you'd want. If you're using the "Static files" table on the "Web" tab to specify where they are, then:
When a browser requests a static file for the first time, it's sent back with a header saying when it was last modified (based on the file timestamp).
When the browser requests the static file after that, and it has a copy in its cache, it will normally send an "If-Modified-Since" header with the value of the Last-Modified header it got the first time around.
The server will check the file timestamp, and if the file hasn't changed, it will send back an HTTP 304 ("Not Modified") response with no content, so the browser knows it can just use the cached one. If the file has changed, then of course it sends back a normal 200 response with the new content and an updated Last-Modified timestamp for the browser to cache.
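To make the exchange above concrete, here is a minimal Python sketch of the decision a server makes for a static file. It is only an illustration of the If-Modified-Since logic, not PythonAnywhere's or nginx's actual code; the function name respond_to_static_request and its parameters are made up for this example.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def respond_to_static_request(file_mtime, if_modified_since=None):
    """Return (status, headers) for a request for one static file.

    file_mtime: modification time of the file, in seconds since the epoch.
    if_modified_since: raw value of the request header, or None if absent.
    """
    # HTTP dates have one-second resolution, so drop fractional seconds.
    mtime = int(file_mtime)
    headers = {"Last-Modified": formatdate(mtime, usegmt=True)}

    if if_modified_since is not None:
        try:
            parsed = parsedate_to_datetime(if_modified_since)
            if parsed.tzinfo is None:
                # A date without zone information is treated as UTC here.
                parsed = parsed.replace(tzinfo=timezone.utc)
            cached_at = parsed.timestamp()
        except (TypeError, ValueError):
            cached_at = None  # unparseable header: ignore it, send the file

        if cached_at is not None and mtime <= cached_at:
            return 304, headers  # unchanged: the browser reuses its copy

    return 200, headers  # first request, or the file has changed
```

A first request (no header) gets 200 plus Last-Modified; a repeat request that echoes that value back gets 304 until the file's timestamp moves forward.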
Definition of ETag header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag):
The ETag HTTP response header is an identifier for a specific version
of a resource. It allows caches to be more efficient, and saves
bandwidth, as a web server does not need to send a full response if
the content has not changed. On the other side, if the content has
changed, etags are useful to help prevent simultaneous updates of a
resource from overwriting each other ("mid-air collisions").
Definition of Cache-Control header (https://developer.mozilla.org/de/docs/Web/HTTP/Headers/Cache-Control):
The Cache-Control general-header field is used to specify directives
for caching mechanisms in both requests and responses.
So the ETag header lets the browser send a single HTTP request to the server for a resource and ask whether the file hash has changed. If it has, the browser downloads the new version. Great. So if the ETag header is set, why would I need Cache-Control any more (besides the Expires header, which may help to avoid this single request)?
So if I have to set the Cache-Control header anyway, it can only be harmful, right? I think the most appropriate value would be:
Cache-Control: must-revalidate
But I am not sure if this triggers unnecessary additional actions.
After some research, I found a great tutorial on Medium by Alex Barashkov: "Best practices for cache control settings for your website".
Alex writes:
I recommend you apply Cache-Control: no-cache to html files. Applying
“no-cache” does not mean that there is no cache at all, it simply
tells the browser to validate resources on the server before use it
from the cache. That’s why we need to use it with Etag, so browsers
will send a simple request and load the extra 80 bytes to verify the
state of the file.
The presence of an ETag header does not, by itself, tell the browser to do anything. The browser decides what to do based on the Cache-Control directives in the request and in the cached response. If it decides that the resource is stale or needs to be revalidated, it can use the ETag value to make a conditional request to the server and either get a new resource (status code 200) or a notification that nothing has changed (status code 304).
Both headers are necessary for your cache to work optimally.
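As a rough illustration of how the two headers cooperate, here is a small Python sketch of a server that sends Cache-Control: no-cache together with an ETag and answers conditional requests. The helper names make_etag and serve are hypothetical, and deriving the ETag from a truncated SHA-256 of the body is just one possible choice, not something the quoted tutorial prescribes.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Derive the tag from the content; the double quotes are part of the
    # ETag header syntax.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for one request."""
    etag = make_etag(body)
    # no-cache: the browser may store the response, but must revalidate
    # with the server before reusing it.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}

    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # Weak comparison: a leading W/ on the client's tag is ignored.
        stripped = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in stripped or etag in stripped:
            return 304, headers, b""  # cached copy is still valid

    return 200, headers, body
```

The browser's side of this is driven by Cache-Control: because of no-cache it sends If-None-Match with the stored ETag on every use, and the server answers with an empty 304 whenever the content is unchanged.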
I have a server which hosts resources for several users on the same hostname. For example:
http://example.com/alice/blog.html
http://example.com/bob/cat.jpg
http://example.com/carol/todo.txt
I would like to allow users to specify their own response headers for resources within their directories, similar to what is done on AWS S3. For example, Carol may want her TODO list readable from scripts on another domain, so she might want Access-Control-Allow-Origin: * set for todo.txt.
While I want this feature to be as flexible as possible, I cannot allow just any response headers to be specified, as some response headers have side effects for the entire origin or hostname. For example, Set-Cookie could be used for one person's directory, but the user agent could then make a request to someone else's directory with the cookie value. As another example, a user could set Strict-Transport-Security, potentially locking out other users from using normal HTTP.
What other HTTP response headers have the potential for side effects for the entire origin, rather than just the resource that was requested? My list so far:
Alt-Svc
Public-Key-Pins
Server
Set-Cookie
Strict-Transport-Security
Rather than blocking response headers that could affect the entire domain I would recommend a slightly different approach and specify a white list of response headers that are definitely okay to use. There could be new, experimental or browser-specific headers that are non-standard but potentially affect the entire domain for a user with a specific browser.
I would suggest that the following headers are safe to use and should be everything your user needs to modify:
Access-Control-Allow-Origin
Access-Control-Allow-Credentials
Access-Control-Expose-Headers
Access-Control-Max-Age
Access-Control-Allow-Methods
Access-Control-Allow-Headers
Age
Allow
Cache-Control
Content-Disposition
Content-Encoding
Content-Language
Content-Length
Content-Location
Content-Range
Content-Type
Date
ETag
Expires
Last-Modified
Link
Location
Pragma
Retry-After
Transfer-Encoding
For static content such as files and html pages I would not set Content-Range or Content-Length manually. The server should set many of these headers automatically. Nevertheless overriding them might make sense for some users. Transfer-Encoding can be used to add gzip or deflate during transfer if your server supports it, but must not be used with HTTP/2.
Also, Location, Allow and Retry-After only make sense for certain status codes. You might want to omit them.
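A whitelist like this could be enforced with a small filter, sketched here in Python. The names ALLOWED_HEADERS and filter_user_headers are made up for this example, and the set simply mirrors the list above; trim it to whatever you decide to permit.

```python
# Header names from the whitelist above, stored in lower case because
# HTTP header names are case-insensitive.
ALLOWED_HEADERS = frozenset(
    name.lower()
    for name in [
        "Access-Control-Allow-Origin", "Access-Control-Allow-Credentials",
        "Access-Control-Expose-Headers", "Access-Control-Max-Age",
        "Access-Control-Allow-Methods", "Access-Control-Allow-Headers",
        "Age", "Allow", "Cache-Control", "Content-Disposition",
        "Content-Encoding", "Content-Language", "Content-Length",
        "Content-Location", "Content-Range", "Content-Type", "Date",
        "ETag", "Expires", "Last-Modified", "Link", "Location", "Pragma",
        "Retry-After", "Transfer-Encoding",
    ]
)


def filter_user_headers(requested: dict) -> dict:
    """Keep only the user-supplied headers that are on the whitelist."""
    return {
        name: value
        for name, value in requested.items()
        if name.lower() in ALLOWED_HEADERS
    }
```

With this, Carol's Access-Control-Allow-Origin: * passes through, while Set-Cookie or Strict-Transport-Security (or any header nobody has thought of yet) is silently dropped.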
We have a downstream application which sets some custom headers on requests from the browser before they hit nginx. nginx serves only static content.
i.e. browser >> application A >> nginx
The requirement is that nginx should return all the headers it receives, as is, to the downstream server, which would pass them back to the browser. By default it returns only the generic headers (cookies, expiry, etc.) and not the custom ones sent by the downstream server.
For instance, there is a header with name appnumber which nginx receives with value app01. I tried to explicitly set it with the following rule to set it manually if it exist, but did not help as it throws error that variables are not allowed.
if ($appnumber) {
    add_header appnumber $appnumber;
}
Can someone please guide me here?
Request headers are available in variables prefixed with $http_. You could try something like
if ($http_appnumber) {
    add_header appnumber $http_appnumber;
}
Refer to http://nginx.org/en/docs/http/ngx_http_core_module.html and nginx - read custom header from upstream server
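The naming rule behind the $http_ variables is that the header name is lower-cased and dashes become underscores. A tiny Python helper (the function name is hypothetical, purely to show the mapping) makes that explicit:

```python
def nginx_request_header_variable(header_name: str) -> str:
    """Name of the nginx variable that holds the given request header."""
    # nginx lower-cases the field name and replaces dashes with underscores.
    return "$http_" + header_name.lower().replace("-", "_")
```

So a request header named appnumber is read as $http_appnumber, and one named X-App-Number would be read as $http_x_app_number.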
I am using nginx.
I have a few static files. When one of these files is modified, I would like to get that information. I was hoping to use the Last-Modified header like this.
add_header Last-Modified $time_stamp_of_uri
or something like that. But I cannot find a variable for $time_stamp_of_uri
I already looked at this list. http://nginx.org/en/docs/varindex.html
If you serve files from local disk, you do not need to worry about the Last-Modified header: nginx adds it automatically.
If you are proxying files from a backend, generating all the necessary headers is the backend's job, not nginx's.
If you have an unusual need to process the Last-Modified headers returned by the backend, you should use the built-in perl module or lua (available as a third-party module) for that transformation.
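If you want to see what value to expect for a given file, the following Python sketch formats a file's modification time the way an HTTP Last-Modified header is written. It assumes the header is derived from the file's mtime, as described above; the function name last_modified_for is made up for this example.

```python
import os
from email.utils import formatdate


def last_modified_for(path: str) -> str:
    """Format the file's modification time as an HTTP date (always GMT)."""
    # HTTP dates have one-second resolution, so fractional seconds are cut.
    return formatdate(int(os.path.getmtime(path)), usegmt=True)
```

Comparing this string with the Last-Modified header in the response (for example in the browser's network panel) is a quick way to confirm that the server is picking up file changes.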
I have a couple of queries related to Cache-Control.
If I specify Cache-Control: max-age=3600, must-revalidate for a static html/js/images/css file, with a Last-Modified header set in the HTTP response:
Does a browser or proxy cache (like Squid/Akamai) go all the way to the origin server to validate before max-age expires? Or will it serve content from the cache until max-age expires?
After max-age expiry (that is, expiry from the cache), is there an If-Modified-Since check, or is the content re-downloaded from the origin server without an If-Modified-Since check?
a) If the server includes this header:
Cache-Control "max-age=3600, must-revalidate"
it is telling both client caches and proxy caches that once the content is stale (older than 3600 seconds) they must revalidate at the origin server before they can serve the content. This should be the default behavior of caching systems, but the must-revalidate directive makes this requirement unambiguous.
b) The client should revalidate. It typically revalidates with a conditional request: If-None-Match with an ETag, or If-Modified-Since with a date.
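The behaviour described in a) and b) can be sketched as a small decision function. This is a simplified model of what a cache is supposed to do with max-age and must-revalidate, not the code of any particular browser or proxy; the names parse_cache_control and cache_action, and the returned labels, are invented for this example.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def cache_action(cache_control: str, age_seconds: int,
                 origin_reachable: bool = True) -> str:
    """Decide what a cache does with a stored response of the given age."""
    directives = parse_cache_control(cache_control)
    try:
        max_age = int(directives.get("max-age") or 0)
    except ValueError:
        max_age = 0  # malformed max-age: treat the response as stale

    if age_seconds < max_age:
        return "serve-from-cache"  # still fresh, no request to the origin
    if origin_reachable:
        return "revalidate"        # conditional request, expect 304 or 200
    if "must-revalidate" in directives:
        return "error-504"         # stale content must not be served
    return "may-serve-stale"       # allowed only without must-revalidate
```

The difference must-revalidate makes shows up only in the last two branches: when the origin cannot be reached, a cache that honours the directive has to fail rather than hand out the stale copy.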
a. Look at the ‘Stats’ tab on this page and see what happens.
b. After expiration the browser will check at the server if the file is updated. If not, the server will respond with a 304 Not Modified header and nothing is downloaded.
You can check this behaviour yourself by looking at the ‘Net’ panel in Firebug or similar tools. Just re-enter the URL in the address bar and compare the number of HTTP requests with the number of requests when your cache is empty.
The given answers are incorrect, at least for web browsers in 2019.
"After expiration the browser will check at the server if the file is updated" <- not true
I have a static file served with "Cache-Control: public,must-revalidate,max-age=864000" and both Chrome and Firefox do a request every time (and get a 304 Not Modified back every time).