Lets say I have a file on a server with cache-headers indicating it should be cached. Will the response of a HEAD request to that file be cached as well?
RFC 2616, 9.4 HEAD:
The response to a HEAD request MAY be cacheable in the sense that the
information contained in the response MAY be used to update a
previously cached entity from that resource.
It doesn't really make sense to cache the response to a HEAD request itself, as it contains no entity.
Related
If I send a GET xhr from the browser and the response comes with the correct headers, then the resource is stored in the cache. But I was wondering what would happen if I then send a PUT to the same url. Is the browser smart enough to know that this method will change the resource and so evict the entry from the cache?
I found this in the rfc:
A cache MUST invalidate the effective Request URI (Section 5.5 of
[RFC7230]) as well as the URI(s) in the Location and Content-Location
response header fields (if present) when a non-error status code is
received in response to an unsafe request method.
The unsafe methods include PUT/POST/DELETE, so browsers should evict the entries after requests with these methods are successful (status code 2xx or 3xx).
Though one should be aware that a resource can change even if no requests that modify it are made.
If you fetch an image to display a second or n+1 times, or fetch some JSON likewise, and nothing has changed, then the browser shouldn't actually download/fetch the content. This is how GET requests work with caching.
But I'm wondering, hypothetically, if instead of using GET you use PATCH to fetch the image or JSON. Wondering if the browser can still use its cached version if nothing has changed, or what needs to be done to make PATCH work like a GET so that it doesn't fetch cached content.
It's important to understand that PATCH is not for fetching anything. You're making a change on the server and the response may have information about how the change was applied.
HTTP requests other than GET sometimes can be cacheable. To find out if PATCH is, you can read the RFC's. The RFC has this to say:
A response to this method is only cacheable if it contains explicit freshness information (such as an Expires header or
"Cache-Control: max-age" directive) as well as the Content-Location
header matching the Request-URI, indicating that the PATCH response
body is a resource representation. A cached PATCH response can only
be used to respond to subsequent GET and HEAD requests; it MUST NOT
be used to respond to other methods (in particular, PATCH).
This already suggest 'no', doing a PATCH request twice will not result in the second to be skipped.
A second thing to look out for with HTTP methods is if they are idempotent or safe. PATCH is neither.
RFC7231 has this to say about cacheable methods:
In general, safe methods that
do not depend on a current or authoritative response are defined as
cacheable; this specification defines GET, HEAD, and POST as
cacheable, although the overwhelming majority of cache
implementations only support GET and HEAD.
Both of these suggest that 'no', PATCH is not cacheable, and there's no set of HTTP headers that will make it so.
I am looking for an appropriate HTTP status code that tells the receiver that just the meta-data is being sent, not the complete data.
For example, say you do an HTTP GET:
GET /foo?meta_data_only=yes
the server won't look up the complete data, just send some metadata back about the endpoint, for example. Is there an HTTP status code for the response that can represent this? I would guess it's in the 200s or 300s somewhere?
Since your metadata is being returned in the headers, I would send a status code of 204 No Content.
https://httpstatuses.com/204
The server has successfully fulfilled the request and that there is no
additional content to send in the response payload body.
Metadata in
the response header fields refer to the target resource and its
selected representation after the requested action was applied.
This sounds exactly like what you’re looking for: a successful response that contains no body, and metadata in the headers that provide additional about the resource.
Another thing worth noting is that it’s common practice to use the HTTP verb HEAD when you only want metadata. HEAD is very similar to GET, except that it specifies that you do not want a body back. For example if you do a HEAD to an image url, you will get a 204 No Content response and some metadata about the file such as Content-Type, Content-Size, maybe ETag, but you won’t be sent all of the file data. A lot of web servers (such as Nginx) support this behavior out of the box for static files. I would recommend that you stop using your querystring parameter, and instead implement HEAD versions of your endpoints. That would make the intention even more clear and intuitive.
I'm wondering how the browser determines whether a cached resource has expired or not.
Assume that I have set the max-age header to 300. I made a request at 14:00, 3 minutes later I made another request to the same resource. So how can the browser tell the resource haven't expired (the current age which is 180 is less than the max-age)? Does the browser hold a "expiry date" or "current age" for every requested resource? If so how can I inspect the "current age" at the time I made the request?
Check what browsers store in their cache
To have a better understanding on how the browser cache works, check what the browsers actually store in their cache:
Firefox: Navigate to about:cache.
Chrome: Navigate to chrome://cache.
Note that there's a key for each cache entry (requested URL). Associated with the key, you will find the whole response details (status codes, headers and content). With those details, the browser is able to determine the age of a requested resource and whether it's expired or not.
The reference for HTTP caching
The RFC 7234, the current reference for caching in HTTP/1.1, tells you a good part of the story about how cache is supposed to work:
2. Overview of Cache Operation
Proper cache operation preserves the semantics of HTTP transfers
while eliminating the transfer of information already
held in the cache. Although caching is an entirely OPTIONAL feature
of HTTP, it can be assumed that reusing a cached response is
desirable and that such reuse is the default behavior when no
requirement or local configuration prevents it. [...]
Each cache entry consists of a cache key and one or more HTTP
responses corresponding to prior requests that used the same key.
The most common form of cache entry is a successful result of a
retrieval request: i.e., a 200 (OK) response to a GET request, which
contains a representation of the resource identified by the request
target. However, it is also possible to
cache permanent redirects, negative results (e.g., 404 (Not Found)),
incomplete results (e.g., 206 (Partial Content)), and responses to
methods other than GET if the method's definition allows such caching
and defines something suitable for use as a cache key.
The primary cache key consists of the request method and target URI.
However, since HTTP caches in common use today are typically limited
to caching responses to GET, many caches simply decline other methods
and use only the URI as the primary cache key. [...]
Some rules are defined regarding storing responses in caches:
3. Storing Responses in Caches
A cache MUST NOT store a response to any request, unless:
The request method is understood by the cache and defined as being
cacheable, and
the response status code is understood by the cache, and
the no-store cache directive does not appear
in request or response header fields, and
the private response directive does not
appear in the response, if the cache is shared, and
the Authorization header field does
not appear in the request, if the cache is shared, unless the
response explicitly allows it, and
the response either:
contains an Expires header field, or
contains a max-age response directive, or
contains a s-maxage response directive
and the cache is shared, or
contains a Cache Control Extension that
allows it to be cached, or
has a status code that is defined as cacheable by default, or
contains a public response directive.
Note that any of the requirements listed above can be overridden by a cache-control extension. [...]
Usually (but not always) the server providing the resource will provide a Date header, indicating the time at which that resource was requested. Caching entities can use that Date and the current time to find the resource's age. If the Date response header does not appear, that the caching entity will probably mark the resource's request time in other metadata, and use that metadata for computing the age. Another possibly helpful response header to look for is the Last-Modified response header.
So first, you should check if the cached resource has the Date header for your own age calculation. If not present, it will then depend on which specific browser you are using, and how that browser handles caching for Date-less resources. More information on HTTP caching and the various factors involved, can be found in this caching tutorial.
Hope this helps!
I'm trying to understand what exactly is the difference between "status 304 not modified" and "200 (from cache)"
I'm getting 304 on javascript files that I changed last. Why does this happen?
Thanks for the assistance.
https://sookocheff.com/post/api/effective-caching/ is an excellent source to form the required understanding around these 2 HTTP status codes.
After reading this thoroughly, I had this understanding
In typical usage, when a URL is retrieved, the web server will return the resource's current representation along with its corresponding ETag value, which is placed in an HTTP response header "ETag" field. The client may then decide to cache the representation, along with its ETag. Later, if the client wants to retrieve the same URL resource again, it will first determine whether the local cached version of the URL has expired (through the Cache-Control and the Expire headers). If the URL has not expired, it will retrieve the local cached resource. If it determined that the URL has expired (is stale), then the client will contact the server and send its previously saved copy of the ETag along with the request in a "If-None-Match" field. (Source: https://en.wikipedia.org/wiki/HTTP_ETag)
But even when expires time for an asset is set in future, browser can still reach the server for a conditional GET using ETag as per the 'Vary' header. Details on 'vary' header: https://www.fastly.com/blog/best-practices-using-vary-header/
"200 (from cache)" means the browser found a cached response for the request made. So instead of making a network call to fetch that resource from the remote server, it simply made use of the cached response.
Now these cached responses have a life time associated with them. With time, the responses can become stale. And if these are not allowed to be served stale (see section 4.2.4 - RFC7234), the browser needs to contact the remote server and verify whether these cached responses are still valid or not. The 304 response status code is the servers' way of letting the browser know that the response hasn't changed and is still valid.
If the browser receives a 304 response, it can "freshen" the relevant stale response(s). And subsequent requests to the resource can again be served from the browser cache without checking with the remote server (until the response becomes stale again).
304 Modified
A 304 Not Modified means that the files have not been modified since the specified version by the "If-Modified-Since", or "If-None-Match".
200 OK
This is the response you'll get if an HTTP request works. GET requests will have something relating to the files. A POST request will have something holding the result of the action.
Happy coding!Lyfe