Should HTTP proxy copy Content-Encoding header back to client?

It's said that Go's http.Transport handles Content-Encoding automatically (for example, it transparently decompresses gzip when you read from resp.Body).
It's also said that Content-Encoding is an end-to-end HTTP header, not a hop-by-hop one.
Therefore, if a proxy copies Content-Encoding back into the client's response headers, and this proxy also io.Copy()s the upstream response body (which may already have been decompressed, since io.Copy reads from resp.Body), won't the result be inconsistent for the client? (Content-Encoding copied from the upstream response, but a body that has been decompressed.)

In general, the Content-Encoding response header should not be altered by a proxy.
Different encodings of the same URI are deemed to be different representations and have different ETags. So, changing Content-Encoding would not play well with caching.
But if the proxy and the client are both part of your own ecosystem, you could do it, since you know what's going on. Note that if your proxy sends decompressed data back to the client, you need to strip the Content-Encoding header.
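For illustration, here is a minimal Go sketch of that last point, assuming a hand-rolled proxy handler (the upstream URL is a placeholder). When Go's Transport has transparently decompressed the body, resp.Uncompressed is true, and Content-Encoding / Content-Length must not be forwarded as they stand:

    package main

    import (
        "io"
        "log"
        "net/http"
    )

    // proxyHandler is a hypothetical, minimal proxy: it forwards the request
    // to a hard-coded upstream and copies the response back to the client.
    func proxyHandler(w http.ResponseWriter, r *http.Request) {
        // Placeholder upstream; a real proxy would derive this from configuration.
        upstream := "http://upstream.example.com" + r.URL.RequestURI()

        req, err := http.NewRequest(r.Method, upstream, r.Body)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        resp, err := http.DefaultTransport.RoundTrip(req)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        defer resp.Body.Close()

        for k, vv := range resp.Header {
            // If the Transport auto-decompressed the body, Content-Encoding and
            // Content-Length no longer describe what we are about to send, so
            // don't copy them. (The default Transport already deletes them in
            // that case and sets resp.Uncompressed = true; the check is kept
            // explicit here to match the point of the answer.)
            if resp.Uncompressed && (k == "Content-Encoding" || k == "Content-Length") {
                continue
            }
            for _, v := range vv {
                w.Header().Add(k, v)
            }
        }
        w.WriteHeader(resp.StatusCode)
        io.Copy(w, resp.Body) // body is already decompressed when resp.Uncompressed is true
    }

    func main() {
        http.HandleFunc("/", proxyHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }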

Related

Is there a way to distinguish the intermediate-system-related HTTP request and response headers?

In the Wikipedia article List of HTTP header fields:
Many HTTP header fields are listed, but I did not find a way to distinguish the headers that relate to intermediate systems (such as a CDN or an HTTP proxy).
Is there a way to distinguish the intermediate-system-related HTTP header fields, or is there a link that introduces them?
All HTTP headers can be sent to a proxy; they can only be distinguished by whether the proxy may modify them: end-to-end headers and hop-by-hop headers.
End-to-end headers
These headers must be transmitted to the final recipient of the message: the server for a request, or the client for a response. Intermediate proxies must retransmit these headers unmodified and caches must store them.
Hop-by-hop headers
These headers are meaningful only for a single transport-level connection, and must not be retransmitted by proxies or cached. Note that only hop-by-hop headers may be set using the Connection general header.
And if you want to find more proxy-related headers, from developer.mozilla.org - HTTP Headers:
Headers can also be grouped according to how proxies handle them:
Connection
Keep-Alive
Proxy-Authenticate
Proxy-Authorization
TE
Trailer
Transfer-Encoding
Upgrade (see also Protocol upgrade mechanism).
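As a rough sketch (not a complete proxy), a Go forwarding proxy might strip exactly the hop-by-hop fields listed above, plus anything named in the Connection header, before retransmitting a message, while leaving end-to-end headers untouched:

    package proxyheaders

    import (
        "net/http"
        "strings"
    )

    // The hop-by-hop header fields from the MDN list quoted above.
    var hopByHopHeaders = []string{
        "Connection",
        "Keep-Alive",
        "Proxy-Authenticate",
        "Proxy-Authorization",
        "TE",
        "Trailer",
        "Transfer-Encoding",
        "Upgrade",
    }

    // stripHopByHop removes hop-by-hop headers before a message is forwarded.
    // End-to-end headers are left untouched, as the definitions above require.
    func stripHopByHop(h http.Header) {
        // Any header named in the Connection header is also hop-by-hop.
        for _, field := range h.Values("Connection") {
            for _, name := range strings.Split(field, ",") {
                h.Del(strings.TrimSpace(name))
            }
        }
        for _, name := range hopByHopHeaders {
            h.Del(name)
        }
    }

A proxy would call stripHopByHop on both the inbound request headers and the upstream response headers before forwarding either.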

What HTTP client headers should I use to instruct proxies to refetch from origin, and cache the response?

I'm currently working on a system where a client makes HTTP 1.1 requests of an origin server. I control both the client and the server software, so I have free rein over the HTTP headers set. Between the client and the origin are multiple, hierarchical layers of web proxy / cache devices (think Squid or similar).
The data served up by the origin is usually highly cacheable, and I intend to set HTTP response headers to indicate this. Specifically, I plan to use Cache-Control: public, max-age=<value>. I understand that this will mean that intermediate proxies will cache the response up to the specified max-age, at which point they will revalidate against the origin (presumably with a Last-Modified header, looking for a 304 response).
The problem I have is that the client might become aware that the data held by caches might now be invalid. In this case, I need the client to make a request which instructs the caches to either fetch or revalidate their response with the origin. If the origin response is now different, the cache should store this new response. In my mind, this would involve the client making the request, and each cache in the chain should revalidate its response with the next upstream device, all the way back to the origin. The new response can then be served from the closest cache which actually has it.
What are the correct HTTP headers that need to be set on the client request to achieve this? At first I thought that setting Cache-Control: no-cache on the HTTP request would make this happen, but reading the RFC, it seems that this will instruct the intermediate caches to go back to the origin (desired) but also not to cache the new response (not desired). I then saw an article suggesting that an HTTP request header of Cache-Control: max-age=0 would perhaps do this, but I'm not sure.
Will max-age=0 do what I need here, or do I need some other combination of HTTP headers?
I asked a similar question here: How to make proxy revalidate resource from origin. I have since learned that proxy revalidation wasn't supported by nginx at the time of writing; it was scheduled for the 1.5 release.
Sending max-age=0 from the client should trigger this revalidate mechanism in the proxy, if the original response from the origin contained the right cache control headers.
But whether your upstream server(s) will respect these headers and revalidate with their origin is clearly not something you can just assume. If you have control over your upstream servers I think it could work.
Also, ETag is preferred over the Last-Modified / If-Modified-Since headers, as far as I know.
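For the client side, a minimal Go sketch (with a placeholder origin URL) of the request described above would just carry Cache-Control: max-age=0 and let each cache in the chain revalidate with its upstream:

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // Placeholder resource URL on the origin.
        req, err := http.NewRequest(http.MethodGet, "http://origin.example.com/resource", nil)
        if err != nil {
            log.Fatal(err)
        }
        // Ask every cache along the way to revalidate with its upstream before
        // answering, while still allowing the revalidated response to be cached.
        req.Header.Set("Cache-Control", "max-age=0")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        // The Age header (if the caches set it) hints at where the response
        // actually came from.
        fmt.Println(resp.Status, "Age:", resp.Header.Get("Age"))
    }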
I found these to be helpful articles on the subject:
caching tutorial
cache control directives
http specs on validation
section 14.9.4 on this spec
[UPDATE]
Nginx version 1.5.8 has been released since, and I can confirm that this mechanism is now working!

Is it legitimate HTTP/REST to have requests that are compressed?

I asked this question a few days ago, and I didn't get a lot of activity on it. And it got me thinking that perhaps this was because my question was nonsensical.
My understanding of HTTP is that a client (typically a browser) sends a request (GET) to a server, in my case IIS. Part of this request is the Accept-Encoding header, which indicates to the server what type of encoding the client would like the resource returned in. Typically this could include gzip. And if the server is set up correctly it will return the resource in the requested encoding.
The response will include a Content-Encoding header indicating what compression has been applied to the resource. Also included in the response is the Content-Type header, which indicates the MIME type of the resource. So if the response includes both Content-Type: application/json and Content-Encoding: gzip, the client knows that the resource is JSON that has been compressed using gzip.
Now the scenario I am facing is that I am developing a web service for clients that are not browsers but mobile devices, and that instead of requesting resources, these devices will be posting data to the service to handle.
So I have implemented a RESTful service that accepts POST requests with JSON in the body, and my clients send their POST requests with Content-Type: application/json. But some of my clients have asked to compress their requests to speed up transmission. My understanding, however, is that there is no way to indicate in a request that the body of the request has been encoded using gzip.
That is to say, there is no Content-Encoding header for requests, only for responses.
Is this the case?
Is it incorrect usage of http to attempt to compress requests?
According to another answer here on SO, it is within the HTTP standard to have a Content-Encoding header on the request and send the entity deflated.
It seems that no server automatically inflates the data for you, though, so you'll have to write the server-side code yourself (check the request header and act accordingly).
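As a rough, framework-agnostic Go sketch of both halves (the URL path and function names are made up): the client gzips the JSON body and labels it with Content-Encoding: gzip, and the handler checks that header and inflates the body itself, since the server won't do it automatically:

    package main

    import (
        "bytes"
        "compress/gzip"
        "io"
        "log"
        "net/http"
    )

    // Client side: compress the payload before POSTing it.
    func postCompressedJSON(url string, payload []byte) (*http.Response, error) {
        var buf bytes.Buffer
        zw := gzip.NewWriter(&buf)
        if _, err := zw.Write(payload); err != nil {
            return nil, err
        }
        if err := zw.Close(); err != nil {
            return nil, err
        }

        req, err := http.NewRequest(http.MethodPost, url, &buf)
        if err != nil {
            return nil, err
        }
        req.Header.Set("Content-Type", "application/json")
        req.Header.Set("Content-Encoding", "gzip")
        return http.DefaultClient.Do(req)
    }

    // Server side: check the request header and act accordingly.
    func ingestHandler(w http.ResponseWriter, r *http.Request) {
        body := io.Reader(r.Body)
        if r.Header.Get("Content-Encoding") == "gzip" {
            zr, err := gzip.NewReader(r.Body)
            if err != nil {
                http.Error(w, "bad gzip body", http.StatusBadRequest)
                return
            }
            defer zr.Close()
            body = zr
        }
        data, err := io.ReadAll(body)
        if err != nil {
            http.Error(w, "read error", http.StatusBadRequest)
            return
        }
        _ = data // decode the JSON here
        w.WriteHeader(http.StatusNoContent)
    }

    func main() {
        http.HandleFunc("/ingest", ingestHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }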

Do HTTP proxy servers modify request packets?

Does a proxy server add or modify any request headers in an HTTP request before forwarding it to the server?
If so, are the changes done to the same packets, or are the contents used to create new request packets with the modifications?
There are a few different types of proxy servers. Because you've mentioned request headers, I'm going to assume that you're talking about HTTP proxy servers, which forward HTTP requests, not packets.
NOTE: In the special case of HTTPS requests (TLS/SSL via CONNECT), proxy servers will just forward the content of the TCP packets (and are unable to inspect the packets unless acting as a man-in-the-middle proxy).
Of course it depends on the proxy software and its configuration, but HTTP proxies are expected to follow the W3C Guidelines for Web Content Transformation Proxies, which states many things, but most relevantly:
Other than to convert between HEAD and GET proxies must not alter request methods.
If the request contains a Cache-Control: no-transform directive, proxies must not alter the request other than to comply with transparent HTTP behavior defined in RFC 2616 HTTP sections 14.9.5 and 13.5.2 and to add header fields as described in 4.1.6 Additional HTTP Header Fields.
Other than the modifications required by RFC 2616 HTTP proxies should not modify the values of header fields other than the User-Agent, Accept, Accept-Charset, Accept-Encoding, and Accept-Language header fields and must not delete header fields.
Proxies should add the IP address of the initiator of the request to the end of a comma separated list in an X-Forwarded-For HTTP header field.
Proxies must (in accordance with RFC 2616) include a Via HTTP header field.
In summary, you can generally expect these HTTP headers to be changed/added by a standards-compliant proxy:
User-Agent
Accept
Accept-Charset
Accept-Encoding
Accept-Language
X-Forwarded-For
Via
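A minimal Go sketch of the last two items, assuming a hand-written forwarding proxy (the Via token "1.1 myproxy" is a placeholder):

    package proxy

    import (
        "net"
        "net/http"
    )

    // addProxyHeaders adds the standard proxy-added headers to an outbound
    // request before it is forwarded upstream.
    func addProxyHeaders(outbound *http.Request, clientAddr string) {
        // Use just the IP, not the port, in X-Forwarded-For.
        if host, _, err := net.SplitHostPort(clientAddr); err == nil {
            clientAddr = host
        }
        // Append the client's address to the end of any existing list.
        if prior := outbound.Header.Get("X-Forwarded-For"); prior != "" {
            outbound.Header.Set("X-Forwarded-For", prior+", "+clientAddr)
        } else {
            outbound.Header.Set("X-Forwarded-For", clientAddr)
        }
        // Record this hop, as RFC 2616 requires ("1.1 myproxy" is made up).
        outbound.Header.Add("Via", "1.1 myproxy")
    }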

X-Cache Header Explanation

I was going through the Firefox local cache folder and found a lot of files containing the X-Cache header. Can someone explain the purpose of this header?
thanks
A CDN (Content Delivery Network) adds the X-Cache header to the HTTP response. X-Cache: HIT means that your request was served by the CDN, not the origin servers. A CDN is a network specifically designed to cache content, so that user requests are served faster and the origin servers are offloaded.
The 'X' prefix in X-Cache indicates that the header is not a standard HTTP header field, and its meaning varies from one proxy implementation to another. A common place to find these header fields is Squid servers. Organizations and universities place proxy (Squid) servers between their internal network and the outside network. This serves two purposes: security, and caching of the most frequently requested web pages (in order to limit network traffic).
X-Cache indicates whether the proxy served the result from its cache (HIT for yes, MISS for no).
X-Cache-Lookup indicates whether the proxy has a cacheable response to the request (HIT for yes, MISS for no).
Two HITs mean that the client made a cacheable request, the proxy had a matching cacheable response, and that response was forwarded back to the client.
If X-Cache is MISS and X-Cache-Lookup is HIT, then the client made a request that had a cacheable response but forced the proxy to bypass its cache. This is a hard refresh, which can be simulated with Ctrl + F5 or by sending the headers Pragma: no-cache (in HTTP/1.0) and Cache-Control: no-cache (in HTTP/1.1).
If both are MISS, then the cache has no valid object corresponding to the client's request.
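For example, a small Go client can simulate the hard-refresh case and print the two headers, assuming the proxy or CDN in front of the (placeholder) URL actually sets them:

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // Placeholder URL; the proxy or CDN in front of it would set the headers.
        req, err := http.NewRequest(http.MethodGet, "http://example.com/", nil)
        if err != nil {
            log.Fatal(err)
        }
        // Simulate the hard-refresh case described above.
        req.Header.Set("Pragma", "no-cache")        // understood by HTTP/1.0 caches
        req.Header.Set("Cache-Control", "no-cache") // understood by HTTP/1.1 caches

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        fmt.Println("X-Cache:       ", resp.Header.Get("X-Cache"))
        fmt.Println("X-Cache-Lookup:", resp.Header.Get("X-Cache-Lookup"))
    }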
Some Useful Resources:
X-Cache and X-Cache-Lookup Headers
Understanding cache HIT and MISS headers with shielded services
X-Cache "is NOT a standard HTTP header field".
Also, check out X-Cache and X-Cache-Lookup headers explained.
For me, this was related to a FastCGI cache header set in an Nginx server block:
add_header X-Cache $upstream_cache_status;
Just commenting out (or removing) this line and restarting Nginx removed the header.

Resources