Is server code executed when server returns http status code 304 - http

Does/should server code run when the server return status code 304?
I understand that the server should not return anything (client should use the cache), but I cant find any info on whether the server will executed the code in an api endpoint for example.

RFC 2616 section 10.3.5 describes the 304 Not Modified response:
If the client has performed a conditional GET request and access is
allowed, but the document has not been modified, the server SHOULD
respond with this status code. The 304 response MUST NOT contain a message-body
The server will send Date, ETag and/or Content-Location (200 only), or Expires, Cache-Control, and/or Vary if the respond may vary.
What is a Conditional GET?
An cliet application, browser, or proxy with a retained Last-Modified or Etag value will issue a Conditional GET as an initial header only. This allows the client to determine if the resource has been updated.
How does the client know if the resource changed?
Well it depends on how the server is configured.
The Origin Server May:
Ignore caching, serve every request new.
Similar to ignoring, you may develop the application to change query strings. This prevents caching at Proxy Servers and invalidates the client cache.
If configured to do so, issue a Last-Modified or Etag value. Often done for static content. Proxy Servers and Client Caches use to invalidate their version.
A Web application could issue a Last-Modified far
into the future then change the URL to invalidate stale content. This requires the application be developed with this feature in mind.
Resources may also be issued version numbers. This allows them to preserve Proxy Caches but invalidate Client Caches.
Does server code run when the server return status code 304?
Almost Never. Disregarding that it is technically possible for an incorrectly configured application to respond with a 304 Not Modified code instead of a 200 code.
With a ETag and/or Content-Location value, a server (nginx for example) can confirm nothing has changed without issued a call to the application. This also neatly handles resources with version numbers the same way.
For query strings (image.jpg?version=12), the client cache will invalidate the content. A Proxy Server will also invalidate, and the query will be requested fresh.
I understand that the server should not return anything (client should use the cache), but I cant find any info on whether the server will executed the code in an api endpoint for example.
I'm a fan of nginx, here is a good resource on how caching applies to it.
In short, as much as you can do to support various caches between your client and the application the more requests you can support per day.

Related

Cache resource for some time and then validate it using HTTP caching

Considering that the server responds with the following headers:
Cache-Control: public
Expires: <EXPIRATION DATE>
ETag: <HASH VALUE>
Both <EXPIRATION DATE> and the <HASH VALUE> are not changed if the underlying resource is not actually updated.
I'm I correct to expect the following:
all intermediate proxy-servers (including the CDN) will consider this resource public and safe to cache.
all intermediate proxy-servers (including the CDN) as well as browsers will consider this resource fresh until the <EXPIRATION DATE> and will return it from the cache without accessing the network. However, after the <EXPIRATION DATE> they all will use HTTP validation mechanism with every request to check if the resource is outdated.
So, if the resource is updated after the <EXPIRATION DATE> I can safely expect that all the clients will receive the fresh version of the resource with the next request (because HTTP validation will fail due to the ETag's change)?
I'm interested both from the standards perspective (RFCs) as well as from the real life perspective (e.g. known browser and proxy quirks).
I would like my resource to be fresh for e.g. one day from the time the file is actually updated on the server and to be always returned from the cache. However, after one day, I would like all the client to receive the fresh copy only if the file was actually changed (using HTTP validation mechanism).
As Kevin's comment says:
in terms of the standard your analysis is correct
It's difficult to answer in terms of "known browser and proxy quirks" without knowing your engineering requirements. It sounds like you may be serving static content; consider services like S3 and CloudFront.
For this design, from your expectations:
browsers will consider this resource fresh until the and will return it from the cache without accessing the network
Most browsers will still reach out to the network when a resource is directly referenced, even when it's still fresh in their cache. That should be a conditional request, but it's still network traffic.(immutable may help.)
Any cache may evict the resource; for one CDN:
If a file in an edge location isn't frequently requested, CloudFront might evict the file
If your intention is to reduce load on your origin server, it's a good strategy. You're making correct use of Expires, Cache-Control: public and ETag, assuming you're also handling conditional requests correctly. In practice, you should:
be ready for browsers to make more than a single request in a 24 hour period
be ready to tune your CDN and confirm that it's respecting those headers, and that all requests lead to the same cache key
expect more than a single request per day to your origin server

Caching reverse proxy for dynamic content

I was thinking about asking on Software Recommendations, but then I've found out that it may be a too strange request and it needs some clarification first.
My points are:
Each response contains an etag
which is a hash of the content
and which is globally unique (with sufficient probability)
The content is (mostly) dynamic and may change anytime (expires and max-age headers are useless here).
The content is partly user-dependent, as given by the permissions (which itself change sometimes).
Basically, the proxy should contain a cache mapping the etag to the response content. The etag gets obtained from the server and in the most common case, the server does not deal with the response content at all.
It should go like follows: The proxy always sends a request to the server and then either
1 the server returns only the etag and the proxy makes a lookup based on it and
1.1 on cache hit,
it reads the response data from cache
and sends a response to the client
1.2 on cache miss,
it asks the server again and then
the server returns the response with content and etag,
the proxy stores it in its cache
and sends a response to the client
2 or the server returns the response with content and etag,
the proxy stores the data in its cache
and sends a response to the client
For simplicity, I left out the handling of the if-none-match header, which is rather obvious.
My reason for this is that the most common case 1.1 can be implemented very efficiently in the server (using its cache mapping requests to etags; the content isn't cached in the server), so that most requests can be handled without the server dealing with the response content. This should be better than first getting the content from a side cache and then serving it.
In case 1.2, there are two requests to the server, which sounds bad, but is no worse than the server asking a side cache and getting a miss.
Q1: I wonder, how to map the first request to HTTP. In case 1, it's like a HEAD request. In case 2, it's like GET. The decision between the two is up to the server: If it can serve the etag without computing the content, then it's case 1, otherwise, it's case 2.
Q2: Is there a reverse proxy doing something like this? I've read about nginx, HAProxy and Varnish and it doesn't seem to be the case. This leads me to Q3: Is this a bad idea? Why?
Q4: If not, then which existing proxy is easiest to adapt?
An Example
A GET request like /catalog/123/item/456 from user U1 was served with some content C1 and etag: 777777. The proxy stored C1 under the key 777777.
Now the same request comes from user U2. The proxy forwards it, the server returns just etag: 777777 and the proxy is lucky, finds C1 in its cache (case 1.1) and sends it to U2. In this example, neither the clients not the proxy knew the expected result.
The interesting part is how could the server know the etag without computing the answer. For example, it can have a rule stating that requests of this form return the same result for all users, assuming that the given user is allowed to see it. So when the request from U1 came, it computed C1 and stored the etag under the key /catalog/123/item/456. When the same request came from U2, it just verified that U2 is permitted to see the result.
Q1: It is a GET request. The server can answer with an "304 not modified" without body.
Q2: openresty (nginx with some additional modules) can do it, but you will need to implement some logic yourself (see more detailed description below).
Q3: This sounds like a reasonable idea given the information in your question. Just some food for thought:
You could also split the page in user-specific and generic parts which can be cached independently.
You shouldn't expect the cache to keep the calculated responses forever. So, if the server returns a 304 not modified with etag: 777777 (as per your example), but the cache doesn't know about it, you should have an option to force re-building the answer, e.g. with another request with a custom header X-Force-Recalculate: true.
Not exactly part of your question, but: Make sure to set a proper Vary header to prevent caching issues.
If this is only about permissions, you could maybe also work with permission infos in a signed cookie. The cache could derive the permission from the cookie without asking the server, and the cookie is tamper proof due to the signature.
Q4: I would use openresty for this, specifically the lua-resty-redis module. Put the cached content into a redis key-value-store with the etag as key. You'd need to code the lookup logic in Lua, but it shouldn't be more than a couple of lines.

Nginx: allow only certain cookies in http response

I'm using playframework and nginx. playframework may add following cookies to http response: PLAY_SESSION, PLAY_FLASH, PLAY_LANG.
I want to make sure that only above cookies (PLAY_*) are allowed in nginx level. If there are other cookies (let's say they're added accidentally) they should be removed by nginx.
How can I allow only predefined cookies in http response in nginx?
PS: If it's not possible to solve this issue in nginx, I need to fix by using playframework.
How cookies work?
First, let's establish what's cookies — they're little pieces of "sticky" hidden information that lets you keep state on your web-site for a given User-Agent. These cookies are often used for tracking users, keeping session and storing minor preference information for the site.
Set-Cookie HTTP response header (from server to client)
Cookies can be set by the server through the Set-Cookie response header (with a separate header for each cookie), or, after the page has already been transferred from the server to the client, through JavaScript.
Note that setting cookies is a pretty complex job — they have expiration dates, http/https settings, path etc — hence the apparent necessity to use a separate Set-Cookie header for each cookie.
This requirement to have a separate header is usually not an issue, since cookies aren't supposed to be modified all that often, as they usually store very minimal information, like a session identifier, with the heavy-duty information being stored in an associated database on the server.
Cookie HTTP request header (from client to server)
Regardless how they were first set, cookies would then included in eligible subsequent requests to the server by the client, using the Cookie request header, with a whole list of eligible cookies in one header.
Note that, as such, these cookies that are sent by the client back to the server is a simple list of name and attribute pairs, without any extra information about the underlying cookies that store these attributes on the client side (e.g., the expiration dates, http/https setting and paths are saved by the client internally, but without being revealed in subsequent requests to the server).
This conciseness of the Cookie request header field is important, because, once set, eligible cookies will be subsequently included in all forthcoming requests for all resources with the eligible scheme / domain / path combination.
Caching issues with cookies.
The normal issue of using cookies, especially in the context of acceleration and nginx, is that:
cookies invalidate the cache by default (e.g., unless you use proxy_ignore_headers Set-Cookie;),
or, if you do sloppy configuration, cookies could possibly spoil your cache
e.g., through the client being able to pass cookies to the upstream in the absence of proxy_set_header Cookie "";,
or, through the server insisting on setting a cookie through the absence of proxy_hide_header Set-Cookie;.
How nginx handles cookies?
Cookie from the client
Note that nginx does support looking through the cookies that the client sends to it (in the Cookie request header) through the $cookie_name scheme.
If you want to limit the client to only be sending certain cookies, you could easily re-construct the Cookie header based on these variables, and send only whichever ones you want to the upstream (using proxy_set_header as above).
Or, you could even make decisions based on the cookie to decide which upstream to send the request to, or to have a per-user/per-session proxy_cache_key, or make access control decisions based on the cookies.
Set-Cookie from the backend
As for the upstream sending back the cookies, you can, of course, decide to block it all to considerably improve the caching characteristics (if applicable to your application, or parts thereof), or fix up the domain and/or path with proxy_cookie_domain and/or proxy_cookie_path, respectively.
Otherwise, it's generally too late to make any other routing decision — the request has already been processed by the selected upstream server, and the response is ready to be served — so, naturally, there doesn't seem to be a way to look into these individual Set-Cookie cookies through normal means in nginx (unless you want to go third-party modules, or lua, or perl), since it'd already be too late to make any important routing decisions for a completed request.
Basically, these Set-Cookie cookies have more to do with the content than with the way it is served or routed, so, it doesn't seem appropriate to have integrated functionality to look into them through nginx.
(If you do need to make routing decisions after the completion of the request, then nginx does support X-Accel-Redirect, as well as some other special headers.)
If your issue is security, then, as I've pointed out above, the upstream developer can already use JavaScript to set ANY extra cookies however they want, so, effectively, trying to use nginx to limit some, but not all, Set-Cookie responses from the server is kind of a pointless endeavour in the real world (as there is hardly any difference between the cookies set through JavaScript compared to Set-Cookie).
In summary:
you can easily examine and reconstruct the Cookie header sent by the client to the server before passing it over to the backend, and only include the sanctioned cookies in the request to upstream backend,
but, unless you want to use lua/perl, or have your own nginx module (as well as possibly quarantine the JavaScript from the pages you serve), then you cannot pass only certain Set-Cookie headers back from the upstream backend to the client with a stock nginx.conf — with the Set-Cookie headers, it's an all-or-nothing situation, and there doesn't seem to be a good-enough use-case for a distinct approach.
For an Nginx solution it might be worth asking over at serverfault. Here is a potential solution via Play Framework.
package filters
import javax.inject._
import play.api.mvc._
import scala.concurrent.ExecutionContext
#Singleton
class ExampleFilter #Inject()(implicit ec: ExecutionContext) extends EssentialFilter {
override def apply(next: EssentialAction) = EssentialAction { request =>
next(request).map { result =>
val cookieWhitelist = List("PLAY_SESSION", "PLAY_FLASH", "PLAY_LANG")
val allCookies = result.newCookies.map(c => DiscardingCookie(c.name))
val onlyWhitelistedCookies = result.newCookies.filter(c => cookieWhitelist.contains(c.name))
result.discardingCookies(allCookies: _*).withCookies(onlyWhitelistedCookies: _*)
}
}
}
This solution utilizes Filters and Result manipulation. Do test for adverse effects on performance.

Recognize HTTP 304 in service worker / fetch()

I build a service worker which always responds with data from the cache and then, in the background, sends a request to the server. If the server responds with HTTP 304 - not modified everything is fine, if the server responds with HTTP 200, that means the data was changed and the new file is put into the cache, also the user is notified and asked for a page refresh.
I use the not-modified-since / last-modified headers to make sure the client gets the most up-to-date version. When a request is sent via fetch() the request passes the HTTP-cache on it's way to the network - also the response passes the HTTP cache when it arrives on the client. The problem is when the response has the status 304 - not modified the HTTP cache responds to the service worker with the cached version and replaces the status with 200 (as it is described in the fetch specification - HTTP-network-or-cache-fetch). In the service worker there is no possibility to find out whether the 200 response was initially sent by the server (the user needs to be updated) or it was sent by the cache and the server originally responded with 304 (most up-to-date version is already loaded).
There is the cache-mode flag which can be set to no-cache, but this bypasses the HTTP-cache also when the request is sent to the server, which means that the if-modified-since header is not set and the server has no chance to find out which version the client has. Also this flag is only supported by firefox nightly right now.
I think the best solution is to set a custom HTTP header like x-was-modified when the server responds with 200. This custom header can be accessed in the service worker and can be used to find out whether a resource was updated or not - even if the HTTP cache replaces the 304 status with 200.
Is this a legit solution/workaround? Are there any suggested approaches to solve this problem?
Should I even rely on HTTP headers which are supposed to handle the HTTP cache when implementing the service worker cache? Or should I rather use custom x-if-modified-since / x-last-modified headers and use indexedDB to store the information on the client and append it to each request?
Why does fetch() even replace the 304 code with 200 if there is a up-to-date version in the cache?
You can’t rely on the status code (304 vs. 200) to determine whether something has changed. What if some other part of your code requests the same resource, thus updating the browser’s cache?
Instead, simply compare the response’s Last-Modified header to what you sent in If-Modified-Since, or whatever you last saw in Last-Modified. If the values don’t match, something has changed.
For more precision (if the data can change several times in 1 second), consider using ETag instead of Last-Modified.
Why does fetch() even replace the 304 code with 200 if there is a up-to-date version in the cache?
Because usually people just want to get fresh content, regardless of where it comes from. A 304 response is only interesting to those who implement their own HTTP caches.

Should HTTP Client parse HTTP Headers in response with the error 404 Not Found

I cannot find any RFC or Standard of HTTP client behavior in case it gets HTTP response with an error 4xx. I know the 401, 407 are the examples when the HTTP headers are parsed, but...
I have the concrete problem for OPTIONS method (HTTP1.1). The server responses 401 Unauthorized, so client tries to authenticate and re-sends the request with an authentication. After that the response has the error 404 Not Found and HTTP header is filled with Set-Cookie HTTP Header. The client use Apache Java HTTPClient/HTTPComponents, which ignores HTTP headers in case of an error in the response.
Should this HTTP Header be accepted by the client? I believe it should not be, but I cannot find the supportive quotation in the RFC.
RFC 2616 does not specify that any headers should be ignored, not for 404 responses and not for 4xx responses in general either.
RFC 6265 allows clients to ignore Set-Cookie headers, but does not specify situations where that might happen; a single example is given, that does not cover your case:
the user agent might wish to block responses to "third-party" requests
from setting cookies
In your case, since your server seems to use HTTP basic access authentication, it does not seem to concern the Set-Cookie header. In HTTP basic authentication, the Authorization header is sent by the client with every request, so there should be no need to keep state in a cookie.
It is not clear from your question if you have a very specific HTTP server that you're talking to, or if you are implementing a general HTTP client that is supposed to work with whatever server you throw it at. If you have such a specific case that the HTTP server you work with sends state with 404 responses, and you're required to honor that state in order to communicate with the server, and you have no control over the server, then it does not matter what the standard says; you will honor the state sent, or you will not be able to talk to the server.
If, on the other hand, you're implementing a general client and need it to work regardless of the remote server, then your best bet is to stick to RFC 1958:
Be strict when sending and tolerant when receiving.
Implementations must follow specifications precisely when sending to
the network, and tolerate faulty input from the network. When in
doubt, discard faulty input silently, without returning an error
message unless this is required by the specification.
Which, to me, would mean that you should honor the full response received, regardless of the status code, unless you have an objective reason making it impossible for you to do so. I don't see a reason to ignore the state, even if it violates the standard (or in this case, your personal perception of the standard, since it does not say anything about accepting or ignoring the state).
Update: RFC 2617 (HTTP Authentication) states:
A client SHOULD assume that all paths at or deeper than the depth of
the last symbolic element in the path field of the Request-URI also
are within the protection space specified by the Basic realm value of
the current challenge. A client MAY preemptively send the
corresponding Authorization header with requests for resources in
that space without receipt of another challenge from the server.
It is highly inconsistent if the server expects HTTP authentication for one URL, but does not honor it for URLs beneath it, requiring a separate cookie-based authentication for them. If anything should be changed in your server implementation, it should be to harmonize the authentication scheme for all resources.

Resources