I am assuming that all web browsers send User-Agent, DNT, Accept, Accept-Language, Accept-Encoding, etc. automatically, and that the web developer does not have to do anything to set these headers. I am saying this because www.whatismybrowser.com used to show these header values.
If so then which headers are set by the web browser and sent automatically?
OP here. I got the answer from reddit.
One thing you could easily do is create a page like test.php and set it to just:
<?php
print_r($_SERVER);
Then visit that page in the different browser and OS combos you care about and note whatever you're looking for.
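If you only want the request headers themselves rather than every CGI variable, here's a slightly more focused sketch (assuming getallheaders() is available, i.e. Apache, or PHP-FPM on PHP 7.3+; the fallback works anywhere):

<?php
// Print just the HTTP request headers the browser sent.
if (function_exists('getallheaders')) {
    print_r(getallheaders());
} else {
    // Fallback: rebuild header names from the HTTP_* entries in $_SERVER,
    // e.g. HTTP_ACCEPT_LANGUAGE becomes Accept-Language.
    $headers = array();
    foreach ($_SERVER as $key => $value) {
        if (strpos($key, 'HTTP_') === 0) {
            $name = str_replace(' ', '-', ucwords(strtolower(str_replace('_', ' ', substr($key, 5)))));
            $headers[$name] = $value;
        }
    }
    print_r($headers);
}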
Related
The enable-cors.org nginx config suggests using the values below for Access-Control-Allow-Headers and Access-Control-Expose-Headers. But there isn't much explanation of why these are recommended beyond the comment "Custom headers and headers various browsers *should* be OK with but aren't." I'd rather not inflate the payload for every API request if some of these are not needed for my application.
I know I could remove them and wait for something to break but I'm hoping for some background on why/how they were selected so I can make a more educated decision on whether they are necessary for my application. i.e. were they recommended to support a browser that my application doesn't need to support?
Access-Control-Allow-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range
Access-Control-Expose-Headers: DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range
For the Allow-Headers, I can understand for most of them why a client would want to send them. X-CustomHeader stands out as an oddball, though. Also, I tested in Chrome and found that even if User-Agent isn't explicitly allowed, Chrome still sends it. This implies that these options were added for browser compatibility that my app might not need.
For the Expose-Headers, which headers a client needs to read seems very application specific. Why would a client need to read User-Agent, DNT, or X-Requested-With? They contain info meant for the server to consume, not the client. Additionally, Cache-Control and Content-Type are already exposed by default, so they seem redundant here.
I ended up going through each header and determining if it was necessary. I compiled a list of changes:
Changes for both Allow and Expose:

Removed from both, since they are non-standard headers:
    X-CustomHeader
Removed from both, since they are non-standard and semi-deprecated:
    Keep-Alive

Changes for Allow:

Removed, since it is a response-specific header (used only by the server to inform the client):
    Content-Range
Kept, because although it is allowed by default, the default only covers certain types of requests (as per MDN):
    Content-Type

Changes for Expose:

Removed, since they are already exposed by default (as per MDN):
    Cache-Control
    Content-Type
Removed, since they are request-specific headers (used only by the client to inform the server):
    DNT
    User-Agent
    X-Requested-With
    If-Modified-Since
    Range
Added, since it seems useful:
    Content-Length
This leaves me with the following:
Access-Control-Allow-Headers: DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range
Access-Control-Expose-Headers: Content-Length,Content-Range
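For reference, a rough nginx sketch in the enable-cors.org style with the trimmed lists plugged in (the location, wildcard origin, and methods below are placeholders, not a recommendation):

location /api/ {
    if ($request_method = 'OPTIONS') {
        add_header 'Access-Control-Allow-Origin' '*';
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
        add_header 'Access-Control-Allow-Headers' 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range';
        # Let the browser cache the preflight response.
        add_header 'Access-Control-Max-Age' 1728000;
        add_header 'Content-Length' 0;
        return 204;
    }
    add_header 'Access-Control-Allow-Origin' '*';
    add_header 'Access-Control-Expose-Headers' 'Content-Length,Content-Range';
}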
Any comments or corrections would be greatly appreciated.
Background
I'm attempting to help a colleague debug an issue that hadn't been a problem for the past six months. After the most recent deployment of an ASP.NET MVC 2 application, FileResult responses that push a PDF file at the user for opening or saving are having trouble surviving on the client machine long enough for the PDF reader to open them.
Earlier versions of IE (especially 6) are the only browsers affected. Firefox, Chrome, and newer versions of IE (>8) all behave as expected. With that in mind, the next section defines the actions necessary to recreate the issue.
Behavior
User clicks a link that points to an action method (a plain hyperlink with an href attribute).
The action method generates a PDF represented as a byte stream. The method always recreates the PDF.
In the action method, headers are set to instruct browsers how to cache the response. They are:
response.AddHeader("Cache-Control", "public, must-revalidate, post-check=0, pre-check=0");
response.AddHeader("Pragma", "no-cache");
response.AddHeader("Expires", "0");
For those unfamiliar with exactly what the headers do:
a. Cache-Control: public
Indicates that the response MAY be cached by any cache, even if it would normally be non-cacheable or cacheable only within a non-shared cache.
b. Cache-Control: must-revalidate
When the must-revalidate directive is present in a response received by a cache, that cache MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server.
c. Cache-Control: pre-check (introduced with IE5)
Defines an interval in seconds after which an entity must be checked for freshness prior to showing the user the resource.
d. Cache-Control: post-check (introduced with IE5)
Defines an interval in seconds after which an entity must be checked for freshness. The check may happen after the user is shown the resource but ensures that on the next roundtrip the cached copy will be up-to-date.
e. Pragma: no-cache (to ensure backwards compatibility with HTTP/1.0)
When the no-cache directive is present in a request message, an application SHOULD forward the request toward the origin server even if it has a cached copy of what is being requested
f. Expires
The Expires entity-header field gives the date/time after which the response is considered stale.
We return the file from the action
return File(file, "mime/type", fileName);
The user is presented with an Open/Save dialog box
Clicking "Save" works as expected, but clicking "Open" launches the PDF reader, but the temporary file IE stored has already been deleted by the time the reader tries to open the file, so it complains that the file is missing (and it is).
There are a half dozen other apps here that use the same headers to force Excel, CSV, PDF, Word, and a ton of other content at users and there's never been an issue.
The Question
Are the headers correct for what we're trying to do? We want the file to exist temporarily (get cached), but always be replaced by a new version even though the requests may be identical.
The response headers are set in the action method before returning a FileResult. I've asked my colleague to try creating a new class that inherits from FileResult and overrides the ExecuteResult method so that it modifies the headers and then calls base.ExecuteResult() -- no status on that.
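For what it's worth, a minimal sketch of what that subclass might look like (illustrative only -- it derives from FileContentResult rather than our actual code, and the header values are the ones listed above):

// Sketch only -- not the code from the deployed application.
public class NonCachedFileResult : FileContentResult
{
    public NonCachedFileResult(byte[] contents, string contentType, string fileName)
        : base(contents, contentType)
    {
        FileDownloadName = fileName;
    }

    public override void ExecuteResult(ControllerContext context)
    {
        var response = context.HttpContext.Response;

        // Set the caching headers as late as possible so nothing
        // downstream overwrites them before the file is written.
        response.AddHeader("Cache-Control", "public, must-revalidate, post-check=0, pre-check=0");
        response.AddHeader("Pragma", "no-cache");
        response.AddHeader("Expires", "0");

        base.ExecuteResult(context);
    }
}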
I have a hunch the "Expires" header of "0" is the culprit. According to this W3C article, setting it to "0" implies "already expired." I do want it to be expired, I just don't want IE to go removing it off of the filesystem before the application handling it gets a chance to open it.
As always, thanks!
Edit: The Solution
Upon further testing (using Fiddler to inspect the headers), we were seeing that the response headers we thought were getting set were not the ones being interpreted by the browser. Having not been familiar with the code myself, I was unaware of an underlying issue: the headers were getting stomped on outside of the action method.
Nonetheless, I'm going to leave this question open. Still outstanding is this: there seems to be some discrepancy between the Expires header having a value of 0 vs. -1. If anybody can speak to a difference by design, with regard to IE, I would still like to hear about it. As for a solution, though, the above headers do work as intended in all browsers with the Expires value set to -1.
Update 1
The post How to control web page caching, across all browsers? describes in detail how caching can be prevented in all browsers by setting Expires = 0. I'm still not sold on this 0 vs. -1 argument...
I think you should just use
HttpContext.Current.Response.Cache.SetMaxAge(new TimeSpan(0));
or
HttpContext.Current.Response.Headers.Set("Cache-Control", "private, max-age=0");
to set max-age=0, which means nothing more than that the cache must re-validate (see here). If you additionally set an ETag header containing some custom checksum or hash of the data, the ETag from the previous request will be sent back to the server. The server can then either return the data or, if the data are exactly the same as before, return an empty body with HttpStatusCode.NotModified as the status code. In that case the web browser will take the data from its local cache.
I recommend using Cache-Control: private, which does two important things: 1) it switches off caching of the data on proxies, which sometimes have very aggressive caching settings, and 2) it still allows the data to be cached, but does not permit sharing that cache with other users. This can solve privacy problems, because the data you return to one user may be data that other users are not allowed to read. By the way, HttpContext.Current.Response.Cache.SetMaxAge(new TimeSpan(0)) sets Cache-Control: private, max-age=0 in the HTTP header by default. If you do want to use Cache-Control: public, you can use SetCacheability(HttpCacheability.Public) to override that behavior, or use Headers.Set instead of Cache.SetMaxAge.
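Putting those pieces together, a rough sketch of the private, max-age=0 plus ETag approach inside an MVC action might look like the following (BuildPdf() is a hypothetical helper and the hash choice is arbitrary):

// Illustrative sketch of the ETag/304 round-trip described above.
public ActionResult Report()
{
    byte[] file = BuildPdf();   // hypothetical helper that produces the document

    string etag;
    using (var md5 = System.Security.Cryptography.MD5.Create())
        etag = "\"" + Convert.ToBase64String(md5.ComputeHash(file)) + "\"";

    Response.Cache.SetCacheability(HttpCacheability.Private);
    Response.Cache.SetMaxAge(TimeSpan.Zero);   // emits Cache-Control: private, max-age=0
    Response.Cache.SetETag(etag);

    // If the browser re-validated with the same ETag, answer 304 with no body;
    // the browser then re-uses its locally cached copy.
    if (Request.Headers["If-None-Match"] == etag)
    {
        Response.StatusCode = 304;   // HttpStatusCode.NotModified
        Response.SuppressContent = true;
        return new EmptyResult();
    }

    return File(file, "application/pdf", "report.pdf");
}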
If you are interested in studying more of the HTTP protocol's caching options, I recommend reading the caching tutorial.
UPDATED: I decided to write some more information to clarify my position. According to the information from Wikipedia, even web browsers as old as Mosaic 2.7, Netscape 2.0 and Internet Explorer 3.0 support the March 1996 pre-standard of HTTP/1.1 described in RFC 2068. So I suppose (but have not tested it) that those old web browsers support the max-age=0 HTTP header. In any case, Netscape 2.06 and Internet Explorer 4.0 definitively support HTTP 1.1.
So you should first ask yourself: which HTML standard do you use? Do you still use HTML 2.0 instead of the later HTML 3.2, published in January 1997? I suppose you use at least HTML 4.0, published in December 1997. So if you build your application for at least HTML 4.0, your site can be oriented toward web clients that support HTTP 1.1 and ignore (not support) the web clients that don't support HTTP 1.1.
Now about other "Cache-Control" headers as "private, max-age=0". Including of the headers is in my opinion is pure paranoia. As I have some caching problem myself I tried also to include different other headers, but later after reading carefully the section 14.9 of RFC2616 I use only "Cache-Control: private, max-age=0".
The only "Cache-Control" header which can be additionally discussed is "must-revalidate" described on the section 14.9.4 which I referenced before. Here is the quote:
The must-revalidate directive is necessary to support reliable operation for certain protocol features. In all circumstances an HTTP/1.1 cache MUST obey the must-revalidate directive; in particular, if the cache cannot reach the origin server for any reason, it MUST generate a 504 (Gateway Timeout) response.

Servers SHOULD send the must-revalidate directive if and only if failure to revalidate a request on the entity could result in incorrect operation, such as a silently unexecuted financial transaction. Recipients MUST NOT take any automated action that violates this directive, and MUST NOT automatically provide an unvalidated copy of the entity if revalidation fails.

Although this is not recommended, user agents operating under severe connectivity constraints MAY violate this directive but, if so, MUST explicitly warn the user that an unvalidated response has been provided. The warning MUST be provided on each unvalidated access, and SHOULD require explicit user confirmation.
Sometimes, when I have a problem with my Internet connection, I see an empty page with a "Gateway Timeout" message. It comes from the usage of the "must-revalidate" directive. I don't think the "Gateway Timeout" message really helps the user.
So the people who would prefer to start a self-destruct procedure when they hear a busy signal while calling their boss should additionally use the "must-revalidate" directive in the "Cache-Control" header. To everyone else I recommend just using "Cache-Control: private, max-age=0" and nothing more.
For IE, I remember having to set Expires: -1. How to prevent caching in Internet Explorer seems to confirm this with the following code snippet.
<% Response.CacheControl = "no-cache" %>
<% Response.AddHeader "Pragma", "no-cache" %>
<% Response.Expires = -1 %>
Looking back at the code, this is what I found. Also, I vaguely remember that if you set Cache-Control: private, it may not behave correctly with SSL.
Response.AddHeader("Cache-Control", "no-cache");
Response.AddHeader("Expires", "-1");
Also, So, You Don't Want To Cache, Huh? mentions -1, but uses methods on Response.Cache instead:
// Stop Caching in IE
Response.Cache.SetCacheability(System.Web.HttpCacheability.NoCache);
// Stop Caching in Firefox
Response.Cache.SetNoStore();
However, ASP Page caching issue (IE8) says this code doesn't work.
Can anyone break down what these two methods do at an HTTP level?
We are dealing with Akamai edge-caching and have been told that SetNoStore() will cause an exclusion so that (for example) form pages will always post back to the origin server. According to {guy} this sets the HTTP header:
Cache-Control: "no-cache, no-store"
As I was implementing this change to our forms I found SetNoServerCaching(). Well that seems to make a bit more sense semantically, and the documentation says "Explicitly denies caching of the document on the origin-server."
So I went down to the sea sea sea to see what I could see see see. I tried both of these methods and reviewed the headers in Firebug and Fiddler.
And from what I can tell, both of these methods set the exact same HTTP headers.
Can anyone explain whether there are actual differences between these methods and, if so, where they are hiding in the HTTP response?!
There are a few differences.

SetNoStore essentially stops the browser (and any network resource such as a CDN) from saving any part of the response or request; that includes saving to temp files. This will set the HTTP 1.1 no-store directive on the Cache-Control header.

SetNoServerCaching will essentially stop the server from caching the response. In ASP.NET there are several levels of caching that can happen: data only, partial requests, full pages, and SQL data. This call should stop the HTTP (full and partial) output from being saved on the server. This method should not set the Cache-Control header, no-store, or no-cache.
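To see the difference on the wire, a quick sketch (the headers in the comments reflect the behaviour described above; worth verifying in Fiddler for your own app):

// Client-side/proxy caching off:
// emits Cache-Control: no-cache, no-store (Pragma: no-cache and an immediate Expires typically come along with it).
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.Cache.SetNoStore();

// Server-side output caching off only: no extra response headers are emitted.
Response.Cache.SetNoServerCaching();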
There is also
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetMaxAge(new TimeSpan(1, 0, 0));
as a possible way of configuring caching; this will set a Cache-Control: public, max-age=3600 header (i.e., a one-hour expiry).
For a CDN you probably want to set an expiry (max-age/Expires) so that the CDN knows when to fetch new content if it gets a HIT. You probably don't want no-cache or no-store, as this would cause a refetch on every HIT, essentially nullifying any benefit the CDN brings you, except that it may have a faster backbone connection to the end user than your current ISP, and that would be marginal.
The difference between the two is:
HttpCachePolicy.SetNoStore() or Response.Cache.SetNoStore:
Prevents the browser from caching the ASPX page.
HttpCachePolicy.SetNoServerCaching or Response.Cache.SetNoServerCaching:
Stops all origin-server caching for the current response. Explicitly denies caching of the document on the origin-server. Once set, all requests for the document are fully processed.
When these methods are invoked, caching cannot be reenabled for the current response.
I was wondering how companies like DoubleClick include a cookie in their image responses to track users. Similarly, how do the images (e.g. smart pixels) send information back to their servers?
Please provide a scripting example if possible (any language is okay). [Note: if this is resolved by doing something server-side, please describe how it would be accomplished using Apache.]
Cheers,
Rob
How do they include a cookie? They configure the server, probably via a script, to send cookies with the responses. Images are fetched via ordinary HTTP requests that follow the HTTP protocol; there is nothing magical about them.
"Smart pixels" convey their information simply via the request the browser must send to the server in order to load the image. Information about the user/browser, can be gathered via javascript and embedded in the url.
To do this in PHP, you'd use the setcookie function.
<?php
$value = 'something from somewhere';
setcookie("TestCookie", $value);
setcookie("TestCookie", $value, time()+3600); /* expire in 1 hour */
setcookie("TestCookie", $value, time()+3600, "/~rasmus/", ".example.com", 1);
?>
That code was taken from the PHP doc I referenced above. Basically this adds a Set-Cookie header to the HTTP response, for example: Set-Cookie: UserID=JohnDoe; Max-Age=3600; Version=1
See http://en.wikipedia.org/wiki/List_of_HTTP_header_fields and search for Set-Cookie
ALSO, in scripting languages like PHP, make sure you set the header before you render any content. This is because the HTTP headers are the first thing sent in the response, so by the time you write content, the headers must already have been sent.
Another quote from the PHP:setcookie doc:
Like other headers, cookies must be sent before any output from your script (this is a protocol restriction). This requires that you place calls to this function prior to any output, including <html> and <head> tags as well as any whitespace.
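Putting the pieces together, here's a sketch of what a minimal tracking-pixel script could look like in PHP behind Apache (pixel.php, the cookie name, and the logging are purely illustrative):

<?php
// pixel.php -- illustrative tracking pixel sketch.
// Give the visitor an ID cookie if they don't already have one.
if (isset($_COOKIE['TrackingID'])) {
    $id = $_COOKIE['TrackingID'];
} else {
    $id = md5(uniqid('', true));   // not cryptographically strong; fine for a sketch
    setcookie('TrackingID', $id, time() + 60 * 60 * 24 * 365, '/');
}

// Anything appended to the image URL (e.g. <img src="pixel.php?page=home">)
// arrives in $_GET, alongside the usual request headers in $_SERVER.
$page = isset($_GET['page']) ? $_GET['page'] : '-';
$ua   = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '-';
error_log("$id $page $ua");

// Return a 1x1 transparent GIF so the <img> tag renders nothing visible.
header('Content-Type: image/gif');
header('Cache-Control: no-cache, no-store');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');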
I use PHP to generate dynamic Web pages. As stated in the following tutorial (see link below), the MIME type of XHTML documents should be "application/xhtml+xml" when $_SERVER['HTTP_ACCEPT'] allows it. Since you can serve the same page with two different MIME types ("application/xhtml+xml" and "text/html"), you should set the "Vary" HTTP header to "Accept". This helps caches on proxies.
Link:
http://keystonewebsites.com/articles/mime_type.php
Now I'm not sure of the implication of:
header('Vary: Accept');
I'm not really sure of what 'Vary: Accept' will precisely do...
The only explanation I found is:
After the Content-Type header, a Vary header is sent to (if I understand it correctly) tell intermediate caches, like proxy servers, that the content type of the document varies depending on the capabilities of the client which requests the document.
http://www.456bereastreet.com/archive/200408/content_negotiation/
Anyone can give me a "real" explanation of this header (with that value). I think I understand things like:
Vary: Accept-Encoding
where the cache on proxies could be based on the encoding of the page served, but I don't understand:
Vary: Accept
The Cache-Control header is the primary mechanism for an HTTP server to tell a caching proxy the "freshness" of a response (i.e., whether and for how long to store the response in the cache).
In some situations, Cache-Control directives are insufficient. A discussion from the HTTP working group is archived here, describing a page that changes only with language. This is not the correct use case for the Vary header, but the context is valuable for our discussion. (Although I believe the Vary header would solve the problem in that case, there is a Better Way.) From that page:
Vary is strictly for those cases where it's hopeless or excessively complicated for a proxy to replicate what the server would do.
RFC2616 "Header-Field Definitions" describes the header usage from the server perspective, RFC2616 "Caching Negotiated Responses" from a caching proxy perspective. It's intended to specify a set of HTTP request headers that determine uniqueness of a request.
A contrived example:
Your HTTP server has a large landing page. You have two slightly different pages with the same URL, depending if the user has been there before. You distinguish between requests and a user's "visit count" based on Cookies. But -- since your server's landing page is so large, you want intermediary proxies to cache the response if possible.
The URL, Last-Modified and Cache-Control headers are insufficient to give this insight to a caching proxy, but if you add Vary: Cookie, the cache engine will add the Cookie header to its caching decisions.
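In that contrived case, the landing page's response might carry headers along these lines (values are purely illustrative):

Cache-Control: public, max-age=3600
Vary: Cookie
Content-Type: text/html

A shared cache would then store one variant per distinct Cookie header value and only reuse an entry when the incoming request's Cookie header matches.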
Finally, for small traffic, dynamic web sites -- I have always found the simple Cache-Control: no-cache, no-store and Pragma: no-cache sufficient.
Edit -- to more precisely answer your question: the HTTP request header 'Accept' defines the Content-Types a client can process. If you have two copies of the same content at the same URL, differing only in Content-Type, then using Vary: Accept could be appropriate.
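For the XHTML/HTML case in the question, a minimal PHP sketch of that negotiation (the Accept parsing is deliberately simplified) might look like:

<?php
// Serve the same document as application/xhtml+xml when the client
// advertises support for it, falling back to text/html otherwise.
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';

if (strpos($accept, 'application/xhtml+xml') !== false) {
    header('Content-Type: application/xhtml+xml; charset=utf-8');
} else {
    header('Content-Type: text/html; charset=utf-8');
}

// Tell caches that this response depends on the Accept request header,
// so a cached XHTML copy is never handed to a text/html-only client.
header('Vary: Accept');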
Update 11 Sep 12:
I'm including a couple of links that have appeared in the comments since this answer was originally posted. They're both excellent resources for real-world examples (and problems) with Vary: Accept; if you're reading this answer you need to read those links as well.
The first, from the outstanding EricLaw, on Internet Explorer's behavior with the Vary header and some of the challenges it presents to developers: Vary Header Prevents Caching in IE. In short, IE (pre IE9) does not cache any content that uses the Vary header because the request cache does not include HTTP Request headers. EricLaw (Eric Lawrence in the real world) is a Program Manager on the IE team.
The second is from Eran Medan, and is an on-going discussion of Vary-related unexpected behavior in Chrome: Backing doesn't handle Vary header correctly. It's related to IE's behavior, except the Chrome devs took a different approach -- though it doesn't appear to have been a deliberate choice.
Vary: Accept simply says that the response was generated based on the Accept header in the request. A request with a different Accept header might get a different response.
(You can see that the linked PHP code looks at $HTTP_ACCEPT. That's the value of the Accept request header.)
To HTTP caches, this means that the response must be cached with extra care. It is only going to be a valid match for later requests with exactly the same Accept header.
Now this only matters if the page is cacheable in the first place. By default, PHP pages aren't. A PHP page can mark the output as cacheable by sending certain headers (Expires, for example). But whether and how to do that is a different question.
This Google webmaster video has a very good explanation of the HTTP Vary header.
There are actually a significant number of new features coming soon (and already in Chrome) that make the Vary header extremely useful. For example, consider Client Hints. When used with images, Client Hints allow a server to optimize resources such as images depending on:
Image Width
Viewport Width
Type of encoding supported by browser (think WebP)
Downlink (essentially network speed)
So a server which supports those features would set the Vary header to indicate that.
Chrome advertises WebP support by including "image/webp" in the Accept header of each request. A server might then serve the image as WebP if the browser supports it, so a proxy needs to check that header in order not to cache a WebP image and then serve it to a browser that doesn't support WebP. (Obviously, if your server doesn't do that, it doesn't matter.) Since the server's response varies on the Accept request header, the response must declare that so as not to confuse proxies:
Vary: Accept
Another example might be image width. On a mobile browser the Width header might be quite small for a responsive image, compared with what it would be when viewed from a desktop browser. In that case adding Width to the Vary header is essential so that a proxy does not cache the small mobile version and serve it to desktop browsers, or vice versa. In that case, the header might include:
Vary: Accept, Width
Or, if a server supported all of the Client Hints specs, the header would be something like:
Vary: Accept, DPR, Width, Save-Data, Downlink