How do HTTP proxy caches decide between serving identity- vs. gzip-encoded resources? - http

An HTTP server uses content-negotiation to serve a single URL identity- or gzip-encoded based on the client's Accept-Encoding header.
Now say we have a proxy cache like squid between clients and the httpd.
If the proxy has cached both encodings of a URL, how does it determine which to serve?
The non-gzip instance (not originally served with Vary) can be served to any client, but the encoded instances (having Vary: Accept-Encoding) can only be sent to a clients with the identical Accept-Encoding header value as was used in the original request.
E.g. Opera sends "deflate, gzip, x-gzip, identity, *;q=0" but IE8 sends "gzip, deflate". According to the spec, then, caches shouldn't share content-encoded caches between the two browsers. Is this true?

First of all, it's IMHO incorrect not to send "Vary: Accept-Encoding" when the entity indeed varies by that header (or its absence).
That being said, the spec currently indeed disallows serving the cached response to Opera, because the Vary header does not match per the definitions in HTTPbis, Part 6, Section 2.6. Maybe this is an area where we should relax the requirements for caches (you may want to follow up on the IETF HTTP mailing list...
UPDATE: turns out that this was already marked as an open question; I just added an issue in our issue tracker for it, see Issue 147.

Julian is right, of course. Lesson: Always send Vary: Accept-Encoding when sniffing Accept-Encoding, no matter what the response encoding.
To answer my question, if you mistakenly leave Vary out, if a proxy receives a non-encoded response (without Vary), it can simply cache and return this for every subsequent request (ignoring Accept-Encoding). Squid does this.

The big problem with leaving out Vary is that If the cache receives an encoded variant without Vary then it MAY send this in response to other requests even if their Accept-Encoding indicates the client can not understand the content.

Related

HTTP GET with Transfer-Encoding Chunked

I recently saw a GET call which had the Transfer-Encoding header set to Chunked. What we noticed from this call is a delay then a 500 socket timeout.
Digging deeper, we showed this behavior of the 500 coming from ELB's and Apache web servers.
Of note, if we make the GET call with Transfer-Encoding as Chunked and include an empty payload, the ELB in our case allows the request to go through as normal.
Given this ELB's and Apache web servers exihibit this behaviour, my question is sending Transfer-Encoding as Chunked on a HTTP GET call is valid or not?
The Transfer-Encoding header is now under authority of RFC 7230, section 3.3.1 (see also the IANA Message Header Registry. You will find that this header is part of the HTTP message framing, so it is valid for both, requests and responses.
Keeping in mind that GET requests may carry bodies, the reaction of the server(s) is absolutely correct. What is causing the delay followed by the 500 is probably the following: The server would expect a body but fails to find one, since the chunk-encoded representation of an empty string is not an empty string but a zero-sized chunk. In consequence, the server is running into a timeout.

What HTTP client headers should I use to instruct proxies to refetch from origin, and cache the response?

I'm currently working on a system where a client makes HTTP 1.1 requests of an origin server. I control both the client and the server software, so have free reign over HTTP headers set. Between the client are multiple, hierarchical layers of web proxy / cache devices (think, Squid or similar).
The data served up by the origin is usually highly cacheable, and I intend to set HTTP response headers to indicate this. Specifically, I plan to use Cache-Control: public, max-age=<value>. I understand that this will mean that intermediate proxies will cache the response up to the specified max-age, at which point they will revalidate against the origin (presumably with a Last-Modified header, looking for a 304 response).
The problem I have is that the client might become aware that the data held by caches might now be invalid. In this case, I need the client to make a request which instructs the caches to either fetch or revalidate their response with the origin. If the origin response is now different, the cache should store this new response. In my mind, this would involve the client making the request, and each cache in the chain should revalidate its response with the next upstream device, all the way back to the origin. The new response can then be served from the closest cache which actually has it.
What's the correct HTTP headers that need to be set on the client request to achieve this? At first I thought that setting Cache-control: no-cache in the HTTP request would make this happen, but reading the RFC, it seems that this will instruct the intermediate caches to both go back to the origin (desired) but also not cache the new response (not desired). I then saw an article in which an HTTP request header of Cache-control: max-age=0 would perhaps do this, but I'm not sure.
Will max-age=0 do what I need here, or do I need some other combination of HTTP headers?
I asked a similar question here: How to make proxy revalidate resource from origin. I since learned that proxy revalidate wasn't supported by nginx at the time of writing. It is scheduled for the 1.5 release.
Sending max-age=0 from the client should trigger this revalidate mechanism in the proxy, if the original response from the origin contained the right cache control headers.
But whether your upstream server(s) will respect these headers and revalidate with their origin is clearly not something you can just assume. If you have control over your upstream servers I think it could work.
Also etag is preferred over modified since headers afaik.
I found these to be helpful articles on the subject:
caching tutorial
cache control directives
http specs on validation
section 14.9.4 on this spec
[UPDATE]
Nginx version 1.5.8 has been released since, and I can confirm that this mechanism is now working!

Is it legitimate http/rest to have requests that are compressed?

I asked this question a few days ago, and I didn't get a lot of activity on it. And it got me thinking that perhaps this was because my question was nonsensical.
My understanding of http is that a client (typical a browser) sends a request (get) to a server, in my case IIS. Part of this request is the accept-encoding header, which indicates to the server what type of encoding the client would like the resource returned in. Typically this could include gZip. And if the server is set up correctly it will return the resource requested in the requested encoding.
The response will include a Content-Encoding header indicating what compression has been applied to the resource. Also included in the response is the Content-Type header which indicates the mime type of the resource. So if the response includes both Content-Type : application/json and Content-Encoding: gzip, the client knows that resource is json that is has been compressed using gzip.
Now the scenario I am facing is that I am developing a web service for clients that are not browsers but mobile devices, and that instead of requesting resources, these devices will be posting data to the service to handle.
So i have implemented a Restfull service that accepts post request with json in the body. And my clients send their post requests with Content-Type:Application/json. But some of my clients have requested that they want to compress their request to speed up transmission. But my understanding is the there is no way to indicate in a request that the body of the request has been encoded using gZip.
That is to say there is no content-Encoding header for requests, only responses.
Is this the case?
Is it incorrect usage of http to attempt to compress requests?
According to another answer here on SO, it is within the HTTP standard to have a Content-Encoding header on the request and send the entity deflated.
It seems that no server automatically inflates the data for you, though, so you'll have to write the server-side code yourself (check the request header and act accordingly).

What are the consequences of not setting "cache-control" in http response header?

Say, my web application responds to a http request with a response that has no "cache-control" in its header. If the client-end submits the same request within a relatively short time, what would happen? Does a cached copy of the response get used and thus the request does not need to reach the server? Or does the request get sent to the server just like the first time?
If the answer is "it depends", please indicate what the dependencies are. Thanks.
There is no caching behavior defined in HTTP/1.1 protocol for a resource served with no cache-related headers, so it's really up to the HTTP client's implementation.
Here is the link to RFC.

What HTTP request headers are important/commonly used?

I'm writing a web server, and I'd like to know what HTTP request headers (sent by the client) are the most common and thus that I should focus on implementing.
Right now, I only support Accept and Host.
Not sure on your scope but since you are interested in serving web browsers, you should have a look into the RFC (HTTP 1.1)
Read about what the server MUST process
The Cookie header might be a good idea, as would the Content-Length header; without Content-Length you won't be able to handle POST and PUT requests properly.

Resources