The gzip_proxied directive allows for the following options (non-exhaustive):
expired
enables compression if a response header includes the “Expires” field with a value that disables caching;
no-cache
enables compression if a response header includes the “Cache-Control” field with the “no-cache” parameter;
no-store
enables compression if a response header includes the “Cache-Control” field with the “no-store” parameter;
private
enables compression if a response header includes the “Cache-Control” field with the “private” parameter;
no_last_modified
enables compression if a response header does not include the “Last-Modified” field;
no_etag
enables compression if a response header does not include the “ETag” field;
auth
enables compression if a request header includes the “Authorization” field;
I can't see a rational reason to use most of these options. For example, why would the presence of an Authorization header on a proxied request, or of Cache-Control: private on the response, affect whether I want to gzip it?
Given that old versions of Nginx strip ETags from responses when gzipping them, I can see a use case for no_etag: if you don't have Nginx configured to generate ETags for your gzipped responses, you may prefer to pass on an uncompressed response with an ETag rather than generate a compressed one without an ETag.
I can't figure out the others, though.
What are the intended use cases of each of these options?
From the admin guide:
The directive has a number of parameters specifying which kinds of proxied requests NGINX should compress. For example, it is reasonable to compress responses only to requests that will not be cached on the proxy server. For this purpose the gzip_proxied directive has parameters that instruct NGINX to check the Cache-Control header field in a response and compress the response if the value is no-cache, no-store, or private. In addition, you must include the expired parameter to check the value of the Expires header field. These parameters are set in the following example, along with the auth parameter, which checks for the presence of the Authorization header field (an authorized response is specific to the end user and is not typically cached).
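The configuration the guide describes amounts to something along these lines (a sketch based on the parameters named above, not a verbatim copy of the guide's example):

gzip on;
# compress proxied responses only when they will not be cached on the proxy
gzip_proxied no-cache no-store private expired auth;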
I'd agree that not compressing cacheable responses is reasonable. Consider that the primary benefit of caching at a proxy is to improve performance (response time) and reduce the time and bandwidth the proxy spends requesting the upstream resource. The tradeoff for these performance gains is the cost of cache storage. Here are some use cases where not compressing cacheable responses makes sense:
In the normal web traffic of many sites, non-personalized responses (which constitute the majority of cacheable responses) have already been optimized through techniques like script minification, image size optimization, etc., in a web build process. While these static resources might shrink slightly from compression, the CPU cost of trying to gzip them smaller is probably not an efficient use of the proxy layer machine resources. But dynamically generated pages, served to logged-in users, containing tons of application-generated content would very likely benefit from compression (and would typically not be cacheable).
You are setting up a proxy in front of a costly upstream service, but you are serving responses to another proxy that will be responsible for compression for each user agent. For example, a CDN may make multiple requests to the same costly upstream resource (from separate geographical edge locations), and you want to ensure that the costly response can be reused. If the CDN caches uncompressed versions (to serve both compressed and uncompressed user agents), compressing at your proxy only to have the CDN uncompress again is simply a waste of hardware and electricity on both sides, just to reduce bandwidth in the highest-bandwidth part of the chain. (Response gzip compression is most beneficial at the last mile, to get the response data to your user's phone that has dropped to one bar of signal as they enter the subway.)
For sizable response entities, byte-range requests may come in (often via downstream CDN intermediaries) from user agents that don't support compression. The CDN is likely to serve those byte-range requests from its own cache, provided that it already has an uncompressed version cached.
Can max-stale be set in the response header so that it is used by requests on the client side? The documentation here - Cache-Control: Syntax, specifically the "Cache response directives" section - makes it appear as if max-stale is not part of the response header. Is it only used by the client to decide how much longer it will use the stale resource, with the server/application having no say in it? If so, what can be set on the response to simulate the functionality of max-stale?
No, max-stale cannot be used in the response. It's meant to be used by the client to override the cache defaults: "Gimme this resource even if it's technically a bit past its expiry date". It would be used if there is a caching server between the client and the origin server.
To be honest, in my experience, request cache-control headers are rarely used, except to force refresh all the way back to origin server (max-age=0) for example when doing a “hard reload” with dev tools open. I’ve never seen a real world instance of max-stale as far as I can recall.
There is no equivalent on a response header. If a server is happy for a resource to be used for longer, then it should just increase the max-age amount.
There is the stale-while-revalidate response option, which allows a stale resource to be used for a limited period to allow a quick reload of the page, while the browser checks and downloads a new version in the background for the next time. However, support for it is limited at this time, as shown at the bottom of the page you linked.
Simulating the behaviour of max-stale on the response does not really make much sense. The server owns the resources and has an understanding of how those resources change over time. The server has to decide how critical a resource is and whether it is OK for it to be served stale, and also decide on some reasonable time limits for the resource to be revalidated so that clients get fresh data most of the time. It is a balancing act on the server side: if the setting is too strict, you overload your server with requests and your clients with network traffic; too loose, and your clients see an old representation.
A client can use max-stale to avoid any revalidation and take whatever is in the cache, when you don't want to generate network requests unless it is really necessary. For example, must-revalidate overrides max-stale, so if the response has that header you will hit the origin server no matter what, even with max-stale. Similarly with no-cache and no-store. So in that sense the server does have a say: it can identify resources that CANNOT be used stale, even with max-stale, and mark them with the appropriate header.
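For illustration, a hypothetical exchange: the client asks to accept a stale cached copy, while the server has marked the resource so that it must not be served stale.

GET /account/balance HTTP/1.1
Host: example.com
Cache-Control: max-stale=3600

HTTP/1.1 200 OK
Cache-Control: max-age=60, must-revalidate

Because the response carries must-revalidate, a cache has to revalidate with the origin once the 60 seconds are up, regardless of the client's max-stale.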
What is the current state of affairs when it comes to whether to do
Transfer-Encoding: gzip
or a
Content-Encoding: gzip
when I want to allow clients with e.g. limited bandwidth to signal their willingness to accept a compressed response, while the server has the final say on whether or not to compress?
The latter is what e.g. Apache's mod_deflate and IIS do, if you let them take care of compression. Depending on the size of the content to be compressed, they will additionally apply Transfer-Encoding: chunked.
They will also include a Vary: Accept-Encoding header, which already hints at the problem: Content-Encoding seems to be part of the entity, so changing the Content-Encoding amounts to a change of the entity, i.e. a different Accept-Encoding header means, for example, that a cache cannot use its cached version of the otherwise identical entity.
Is there a definite answer on this that I have missed (and that's not buried inside a message in a long thread in some apache newsgroup)?
My current impression is:
Transfer-Encoding would in fact be the right way to do what is mostly done with Content-Encoding by existing server and client implementations
Content-Encoding, because of its semantic implications, carries a couple of issues (what should the server do to the ETag when it transparently compresses a response?)
The reason is chicken'n'egg: Browsers don't support it because servers don't because browsers don't
So I am assuming the right way would be a Transfer-Encoding: gzip (or, if I additionally chunk the body, it would become Transfer-Encoding: gzip, chunked). And no reason to touch Vary or ETag or any other header in that case as it's a transport-level thing.
For now I don't care too much about the 'hop-by-hop'-ness of Transfer-Encoding, something that others seem to be concerned about first and foremost, because proxies might uncompress and forward the response uncompressed to the client. However, proxies might just as well forward it as-is (compressed) if the original request has the proper Accept-Encoding header, which in the case of all browsers that I know of is a given.
Btw, this issue is at least a decade old, see e.g.
https://bugzilla.mozilla.org/show_bug.cgi?id=68517 .
Any clarification on this will be appreciated. Both in terms of what is considered standards-compliant and what is considered practical. For example, HTTP client libraries only supporting transparent "Content-Encoding" would be an argument against practicality.
The correct usage, as defined in RFC 2616 and actually implemented in the wild, is for the client to send an Accept-Encoding request header (the client may specify multiple encodings). The server may then, and only then, encode the response according to the client's supported encodings (if the file data is not already stored in that encoding) and indicate in the Content-Encoding response header which encoding is being used. The client can then read data off the socket based on the Transfer-Encoding (i.e. chunked) and then decode it based on the Content-Encoding (i.e. gzip).
So, in your case, the client would send an Accept-Encoding: gzip request header, and then the server may decide to compress (if not already) and send a Content-Encoding: gzip and optionally Transfer-Encoding: chunked response header.
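As an illustration, such an exchange might look like this (hypothetical resource and values):

GET /data.json HTTP/1.1
Host: example.com
Accept-Encoding: gzip, deflate

HTTP/1.1 200 OK
Content-Type: application/json
Content-Encoding: gzip
Transfer-Encoding: chunked
Vary: Accept-Encoding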
And yes, the Transfer-Encoding header can be used in requests, but only for HTTP 1.1, which requires that both client and server implementations support the chunked encoding in both directions.
ETag uniquely identifies the resource data on the server, not the data actually being transmitted. If a given URL resource changes its ETag value, it means the server-side data for that resource has changed.
Quoting Roy T. Fielding, one of the authors of RFC 2616:
changing content-encoding on the fly in an inconsistent manner (neither "never" nor "always") makes it impossible for later requests regarding that content (e.g., PUT or conditional GET) to be handled correctly. This is, of course, why performing on-the-fly content-encoding is a stupid idea, and why I added Transfer-Encoding to HTTP as the proper way to do on-the-fly encoding without changing the resource.
Source: https://issues.apache.org/bugzilla/show_bug.cgi?id=39727#c31
In other words: Don't do on-the-fly Content-Encoding, use Transfer-Encoding instead!
Edit: That is, unless you want to serve gzipped content to clients that only understand Content-Encoding. Which, unfortunately, seems to be most of them. But be aware that you leave the realms of the spec and might run into issues such as the one mentioned by Fielding as well as others, e.g. when caching proxies are involved.
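For comparison, the hop-by-hop route Fielding advocates negotiates a transfer coding via the TE request header instead of Accept-Encoding; a hypothetical sketch:

GET /data.json HTTP/1.1
Host: example.com
TE: gzip

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: gzip, chunked

Here ETag, Vary and the rest of the entity headers stay untouched, because the entity itself is never changed; the catch remains that very few clients advertise TE: gzip or can decode a gzip transfer coding.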
HTTP 1.1 supports keep-alive connections; connections are not closed until "Connection: close" is sent.
So, if the browser (in this case Firefox) has network.http.pipelining enabled and network.http.pipelining.maxrequests increased, isn't the effect the same in the end?
I know that these settings are disabled by default because for some websites this could increase load, but I think a simple HTTP header flag could tell the browser that it is OK to use pipelining, and this problem could be solved more easily.
Wouldn't it be easier to change the default settings in browsers than to invent a new protocol that increases complexity, especially in HTTP servers?
SPDY has a number of advantages that go beyond what HTTP pipelining can offer, which are described in the SPDY whitepaper:
With pipelining, the server still has to return the responses one at a time in the order they were requested. This can be a problem if the client requests a resource that's dynamically generated before one that is static: the server cannot send any of the "easy" static responses until the dynamically generated one has been generated and sent. With SPDY, responses can be returned out of order or in parallel as they are generated, lowering the total time to receive all resources.
As you noted in your question, not all servers are able to deal with pipelining: it's not just load, some servers actually behave incorrectly when the client requests pipelining. Using a header to indicate that it's okay to do pipelining is too late to get the maximum benefit: you are already receiving the first response at that point, so while you can use it on future connections it's already too late for this one.
SPDY compresses headers using an algorithm which is specific to that task (stateful and with knowledge of what is normally in HTTP headers); while yes, SSL already includes compression, just compressing them with deflate is not as efficient. Most HTTP requests have no bodies and only a short GET line, so the headers make up virtually the entire request: any compression you can get is an improvement. Many responses are also small compared to their headers.
SPDY allows servers to send back additional responses without the client asking for them. For example, a server might start sending back the CSS for a page along with the original HTML, before the client has had a chance to receive and parse the HTML to determine the stylesheet URL. This can speed up page loads even further by eliminating the need for the client to actually parse the HTML before requesting other resources needed to render the page. It also supports a less bandwidth-heavy version of this feature where it can "hint" about which resources might be needed, and allow the client to decide: this allows, for example, clients that don't care about images to not bother to request them, but clients that want to display images can still request the images using the given URLs without needing to wait for the HTML.
Other things too: see William Chan's answer for even more.
HTTP pipelining is susceptible to head of line blocking (http://en.wikipedia.org/wiki/Head-of-line_blocking) at the HTTP transaction level whereas SPDY only has head of line blocking at the transport level, due to its use of multiplexing.
HTTP pipelining has deployability issues. See https://datatracker.ietf.org/doc/html/draft-nottingham-http-pipeline-01 which describes a number of different workarounds and heuristics to mitigate this. SPDY as deployed in the wild does not have this problem since it is generally deployed over SSL (port 443) using NPN (http://technotes.googlecode.com/git/nextprotoneg.html) to negotiate SPDY support. SSL is key, since it prevents intermediaries from interfering.
SPDY has header compression. See http://dev.chromium.org/spdy/spdy-whitepaper which discusses some benchmark results of the benefits of header compression. Now, it's useful to note that bandwidth is less and less of an issue (see http://www.belshe.com/2010/05/24/more-bandwidth-doesnt-matter-much/), but it's also useful to remember that bandwidth is still key for mobile. Check out https://developers.google.com/speed/articles/spdy-for-mobile which shows how beneficial SPDY is for mobile.
SPDY supports features like server push. See http://dev.chromium.org/spdy/spdy-best-practices for ways to use server push to improve cacheability of content and still reduce roundtrips.
HTTP pipelining has ill-defined failure semantics. When the server closes the connection, how do you know which requests have been successfully processed? This is a major reason why POST and other non-idempotent requests are not allowed over pipelined connections. SPDY provides semantics to cancel individual streams on the same connection, and also has a GOAWAY frame which indicates the last stream to be successfully processed.
HTTP pipelining has difficulty, often due to intermediaries, in allowing deep pipelines. This (in addition to many other reasons like HoL blocking) means that you still need to utilize multiple TCP connections to achieve maximal parallelization. Using multiple TCP connections means that congestion control information cannot be shared and compression contexts cannot be shared (as SPDY does with headers), and it is worse for the internet overall (more costly for intermediaries and servers).
I could go on and on about HTTP pipelining vs SPDY. But I'd recommend just reading up on SPDY. Check out http://dev.chromium.org/spdy and our tech talk on SPDY at http://www.youtube.com/watch?v=TNBkxA313kk&list=PLE0E03DF19D90B5F4&index=2&feature=plpp_video.
See Difference between HTTP pipelining and HTTP multiplexing with SPDY
Can I simply set the Transfer-Encoding header?
Will calling Response.Flush() at some point cause this to occur implicitly?
EDIT
No, I cannot call Response.Headers.Add("Transfer-Encoding", "anything"); that throws.
Any other suggestions?
Related:
Enable Chunked Transfer Encoding in ASP.NET
TL;DR: Specifying the content length is the best way to achieve a fast first byte; you'll allow chunking at the TCP rather than the HTTP level. If you don't know the content length, setting context.Response.BufferOutput to false will send output as it's written to the output stream, using chunked transfer encoding.
Why do you want to set Transfer-Encoding: chunked? Chunked transfers are essentially a work-around to permit sending documents whose content-length is not known in advance. ASP.NET, however, by default buffers the entire output and hence does know the overall content length.
Of course, HTTP is layered over TCP, and behind the scenes TCP is "chunking" anyhow by splitting even a monolithic HTTP response into packets - meaning that if you specify the content-length up front and disable output buffering, you'll get the best latency without requiring HTTP-level chunking. Thus, you don't need HTTP-level chunking to provide a fast first byte when you know the content-length.
Although I'm not an expert on HTTP, I have implemented a simple streaming media server with seeking support, dynamic compression, caching etc. and I do have a reasonable grasp of the relevance of a fast first byte - and chunking is generally an inferior option if you know the content-length - which is almost certainly why ASP.NET won't let you set it manually - it's just not necessary.
However, if you don't know the HTTP content length before transmission and buffering is too expensive, you turn off output buffering and presumably the server will use a chunked transfer encoding by necessity.
When does the server use chunked transfer encoding? I just tested, and indeed if context.Response.BufferOutput is set to false, and when the content length is not set, the response is chunked; such a response is 1-2% larger in my entirely non-scientific quick test of a 1.7MB content-encoding: gzip xml document. Since gzip relies on context to reduce redundancy, I'd expected the compression ratio to suffer more, but it seems that chunking doesn't necessarily greatly reduce compression ratios.
If you look at the framework code in reflector, it seems that the transfer encoding is indeed set automatically as needed - i.e. if buffering is off AND no content length is known AND the response is to an HTTP/1.1 request, chunked transfer encoding is used. However, if the server is IIS7 and this is a worker request (?integrated mode?), the code branches to a native method - probably with the same behavior, but I can't verify that.
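For reference, a minimal sketch of the unbuffered approach described above (the handler name is made up; with no content length set and an HTTP/1.1 client, the response goes out chunked):

using System.Web;

public class StreamingHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Turn off buffering so output is pushed to the client as it is written.
        // Since no Content-Length is set, ASP.NET falls back to chunked
        // transfer encoding for HTTP/1.1 requests.
        context.Response.BufferOutput = false;
        context.Response.ContentType = "text/plain";

        for (int i = 0; i < 10; i++)
        {
            context.Response.Write("chunk " + i + "\n");
            context.Response.Flush();  // send this piece immediately
        }
    }
}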
It looks like you need to set up IIS for this. IIS 6 has a property AspEnableChunkedEncoding in the metabase, and you can see the IIS 7 mappings for this on MSDN at http://msdn.microsoft.com/en-us/library/aa965021(VS.90).aspx.
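On IIS 6 that metabase property can be set with something along these lines (a hedged sketch; the AdminScripts path may differ per install):

cd %SystemDrive%\Inetpub\AdminScripts
cscript adsutil.vbs set /W3SVC/AspEnableChunkedEncoding "TRUE"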
This will enable you to set TRANSFER-ENCODING: chunked in your header. I hope this helps.
Even if you set Buffer to false and leave the content length empty, you need to make sure that the "Dynamic Content Compression" feature is disabled in IIS 7 for chunked responses to work. Also, the client browser should speak at least HTTP 1.1; chunked mode won't work for HTTP 1.0.
Response.Buffer = False
This will set the HTTP header "Transfer-Encoding: chunked" and send the response on each call to Response.Write.
Is there an accepted maximum allowed size for HTTP headers? If so, what is it? If not, is this something that's server specific or is the accepted standard to allow headers of any size?
No, HTTP does not define any limit. However, most web servers do limit the size of the headers they accept. For example, in Apache the default limit is 8KB, and in IIS it's 16K. The server will return a 413 Entity Too Large error if the header size exceeds that limit.
Related question: How big can a user agent string get?
As vartec says above, the HTTP spec does not define a limit; however, many servers do by default. This means, practically speaking, the lower limit is 8K. For most servers, this limit applies to the sum of the request line and ALL header fields (so keep your cookies short).
Apache 2.0, 2.2: 8K
nginx: 4K - 8K
IIS: varies by version, 8K - 16K
Tomcat: varies by version, 8K - 48K (?!)
It's worth noting that nginx uses the system page size by default, which is 4K on most systems. You can check with this tiny program:
pagesize.c:
#include <unistd.h>
#include <stdio.h>

int main() {
    int pageSize = getpagesize();
    printf("Page size on your system = %i bytes\n", pageSize);
    return 0;
}
Compile with gcc -o pagesize pagesize.c then run ./pagesize. My ubuntu server from Linode dutifully informs me the answer is 4k.
Here are the limits of the most popular web servers:
Apache - 8K
Nginx - 4K-8K
IIS - 8K-16K
Tomcat - 8K – 48K
Node (<13) - 8K; (>13) - 16K
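If you control the server and genuinely need larger headers, these limits can usually be raised; a hedged sketch of the relevant knobs (values are illustrative, defaults vary by version):

# nginx: allow up to four 16K buffers for large request headers
large_client_header_buffers 4 16k;

# Apache: raise the per-header-field limit (bytes)
LimitRequestFieldSize 16384

# Node.js (11.6+): raise the total header size limit (bytes)
node --max-http-header-size=32768 server.js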
HTTP does not place a predefined limit on the length of each header field or on the length of the header section as a whole, as described in Section 2.5. Various ad hoc limitations on individual header field length are found in practice, often depending on the specific field semantics.
HTTP header values are restricted by server implementations; the HTTP specification itself doesn't restrict header size.
A server that receives a request header field, or set of fields, larger than it wishes to process MUST respond with an appropriate 4xx (Client Error) status code. Ignoring such header fields would increase the server's vulnerability to request smuggling attacks (Section 9.5).
Most servers will return 413 Entity Too Large or an appropriate 4xx error when this happens.
A client MAY discard or truncate received header fields that are larger than the client wishes to process if the field semantics are such that the dropped value(s) can be safely ignored without changing the message framing or response semantics.
Uncapped HTTP header size keeps the server exposed to attacks and can bring down its capacity to serve organic traffic.
Source
RFC 6265 dated 2011 prescribes specific limits on cookies.
https://www.rfc-editor.org/rfc/rfc6265
6.1. Limits
Practical user agent implementations have limits on the number and size of cookies that they can store. General-use user agents SHOULD provide each of the following minimum capabilities:
o At least 4096 bytes per cookie (as measured by the sum of the length of the cookie's name, value, and attributes).
o At least 50 cookies per domain.
o At least 3000 cookies total.
Servers SHOULD use as few and as small cookies as possible to avoid reaching these implementation limits and to minimize network bandwidth due to the Cookie header being included in every request.
Servers SHOULD gracefully degrade if the user agent fails to return one or more cookies in the Cookie header because the user agent might evict any cookie at any time on orders from the user.
--
The RFC is addressed at what a user agent or a server must support. It appears that to tune your server to accept everything the browser is allowed to send, you would need to configure a limit of 4096 * 50, i.e. roughly 200 KB of cookies per domain. As the text that follows suggests, this appears to be far in excess of what a typical web application needs. It would be useful to take the current limit and the RFC's upper limit and compare the memory and IO consequences of the higher configuration.
I also found that in some cases the reason for 502/400 errors with many headers could be a large number of headers, without regard to their size.
From the HAProxy docs:
tune.http.maxhdr
Sets the maximum number of headers in a request. When a request comes with a number of headers greater than this value (including the first line), it is rejected with a "400 Bad Request" status code. Similarly, too large responses are blocked with "502 Bad Gateway". The default value is 101, which is enough for all usages, considering that the widely deployed Apache server uses the same limit. It can be useful to push this limit further to temporarily allow a buggy application to work by the time it gets fixed. Keep in mind that each new header consumes 32bits of memory for each session, so don't push this limit too high.
https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#3.2-tune.http.maxhdr
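If a legitimately header-heavy application hits that limit, it can be raised in HAProxy's global section; a minimal sketch (the value 128 is illustrative):

global
    tune.http.maxhdr 128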
If you are going to use a DDoS-protection provider or CDN like Akamai, they have a maximum limit of 8K on the response header size, so essentially try to keep your response headers below 8K.