Accept and Accept-Charset - Which is superior?

In HTTP you can specify in a request that your client can accept specific content in responses using the Accept header, with values such as application/xml. The content type specification allows you to include parameters, such as charset=utf-8, indicating that you can accept content in a specified character set.
There is also the Accept-Charset header, which specifies the character encodings accepted by the client.
If both headers are specified and the Accept header contains content types with the charset parameter, which should the server consider the superior header?
e.g.:
Accept: application/xml; q=1,
text/plain; charset=ISO-8859-1; q=0.8
Accept-Charset: UTF-8
I've sent a few example requests to various servers using Fiddler to test how they respond:
Examples
W3
Request
GET http://www.w3.org/ HTTP/1.1
Host: www.w3.org
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1
Response
Content-Type: text/html; charset=utf-8
Google
Request
GET http://www.google.co.uk/ HTTP/1.1
Host: www.google.co.uk
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1
Response
Content-Type: text/html; charset=ISO-8859-1
StackOverflow
Request
GET http://stackoverflow.com/ HTTP/1.1
Host: stackoverflow.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1
Response
Content-Type: text/html; charset=utf-8
Microsoft
Request
GET http://www.microsoft.com/ HTTP/1.1
Host: www.microsoft.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1
Response
Content-Type: text/html
There doesn't seem to be any consensus around what the expected behaviour is. I am trying to look surprised.
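For anyone who wants to reproduce the probe without Fiddler, here is a minimal sketch using Python's http.client, sending the same conflicting headers as the requests above. Note that most of these sites now redirect to HTTPS, so you may only see the redirect response's Content-Type.

# Send the same conflicting Accept / Accept-Charset headers as the Fiddler
# tests and print which charset each server declares in its response.
import http.client

HOSTS = ["www.w3.org", "www.google.co.uk", "stackoverflow.com", "www.microsoft.com"]

for host in HOSTS:
    conn = http.client.HTTPConnection(host, 80)
    conn.request("GET", "/", headers={
        "Host": host,
        "Accept": "text/html;charset=UTF-8",   # charset as a media type parameter
        "Accept-Charset": "ISO-8859-1",        # conflicting standalone header
    })
    response = conn.getresponse()
    print(host, "->", response.status, response.getheader("Content-Type"))
    conn.close()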

Although you can specify a media type in the Accept header, the charset parameter for that media type is not defined anywhere in RFC 2616 (though it is not forbidden either).
Therefore, if you are going to implement an HTTP/1.1-compliant server, you should look at the Accept-Charset header first, and only then search for charset parameters in the Accept header.
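As a rough illustration of that precedence: the helper below is illustrative rather than from any library, and the parsing is deliberately naive (no q-value handling).

# Consult Accept-Charset first, then fall back to any charset
# parameters embedded in the Accept header.
def charsets_from_headers(accept: str, accept_charset: str | None) -> list[str]:
    if accept_charset:                       # explicit header wins
        return [c.split(";")[0].strip() for c in accept_charset.split(",")]
    charsets = []                            # otherwise scrape Accept parameters
    for media_range in accept.split(","):
        for param in media_range.split(";")[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "charset":
                charsets.append(value.strip())
    return charsets

print(charsets_from_headers("text/html;charset=UTF-8", "ISO-8859-1"))  # ['ISO-8859-1']
print(charsets_from_headers("text/html;charset=UTF-8", None))          # ['UTF-8']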

Read RFC 2616 Sections 14.1 and 14.2. The Accept header does not allow you to specify a charset; you have to use the Accept-Charset header instead.

Firstly, the Accept header can carry parameters; see RFC 7231 section 5.3.2.
All text/* media types can take a charset parameter.
The Accept-Charset header allows a user-agent to specify the charsets it supports.
If the Accept-Charset header did not exist, a user-agent would have to specify each charset parameter for each text/* media type it accepted, e.g.
Accept: text/html;charset=US-ASCII, text/html;charset=UTF-8, text/plain;charset=US-ASCII, text/plain;charset=UTF-8
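A throwaway sketch of that combinatorial growth, using hypothetical type and charset lists:

# Without Accept-Charset, every supported charset must be repeated per
# media type, so the Accept value grows as types x charsets.
from itertools import product

types = ["text/html", "text/plain"]
charsets = ["US-ASCII", "UTF-8"]

accept = ", ".join(f"{t};charset={c}" for t, c in product(types, charsets))
print(accept)
# text/html;charset=US-ASCII, text/html;charset=UTF-8,
# text/plain;charset=US-ASCII, text/plain;charset=UTF-8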

RFC 7231 section 5.3.2 (Accept) clearly states:
Each media-range might be followed by zero or more applicable media
type parameters (e.g., charset)
So a charset parameter for each content-type is allowed. In theory a client could accept, for example, text/html only in UTF-8 and text/plain only in US-ASCII.
But it would usually make more sense to state possible charsets in the Accept-Charset header as that applies to all types mentioned in the Accept header.
If those headers’ charsets don’t overlap, the server could send status 406 Not Acceptable.
However, I wouldn’t expect fancy cross-matching from a server, for various reasons. It would make the server code more complicated (and therefore more error-prone), while in practice a client would rarely send such requests. Also, nowadays I would expect everything server-side to use UTF-8 and send it as-is, so there’s nothing to negotiate.
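For illustration only, a sketch of the cross-matching being argued against, assuming a server that can only produce UTF-8; the negotiate helper is hypothetical.

# Intersect the charsets a client will take (Accept-Charset takes
# precedence) with what the server can produce; answer 406 when empty.
SERVER_CHARSETS = {"utf-8"}   # assumption: this server only renders UTF-8

def negotiate(accept_param_charsets: set[str], accept_charset: set[str]) -> int:
    client = {c.lower() for c in (accept_charset or accept_param_charsets)}
    if client and not (client & SERVER_CHARSETS):
        return 406            # Not Acceptable: no common charset
    return 200

print(negotiate({"UTF-8"}, {"ISO-8859-1"}))  # 406: Accept-Charset wins, no overlap
print(negotiate({"UTF-8"}, set()))           # 200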

According to the Mozilla Developer Network (MDN), you should never use the Accept-Charset header. It's obsolete.

I don't think it matters. The client is doing something dumb; there doesn't need to be interoperability for that :-)

Related

BizTalk 2016 WCF-WebHttp Caching Headers

I've created a send pipeline with only a custom pipeline component, which creates a MIME message to POST to a REST API that requires multipart/form-data. It works, but fails on every second invocation; it alternates between success and failure. When it fails, the boundary I've written to the header appears to be overwritten by the WCF-WebHttp adapter with the boundary of the previously successful message.
I've made sure that I'm writing the correct boundary to the header.
Any streams I've used in the pipeline component have been added to the pipeline resource manager.
If I restart the host instance after the first successful message, the next message will be successful.
Waiting 10 minutes between processing each message has no change in the observed behaviour.
If I send a different file through when the failure is expected to occur, the Content-Length header is still the same as for the previous file. This suggests that the headers used are exactly the same as in the previous invocation.
The standard BizTalk mime component doesn't write the boundary to the header, so doesn't offer any clue.
Success
POST http://somehost/Record HTTP/1.1
Content-Type: multipart/form-data; boundary="9ccdeb0a-c407-490c-9cce-c5e3be639785"
Host: somehost
Content-Length: 11989
Expect: 100-continue
Accept-Encoding: gzip, deflate
--9ccdeb0a-c407-490c-9cce-c5e3be639785
Content-Type: text/plain; charset=utf-8
Content-Disposition: form-data; name=uri
6442
--9ccdeb0a-c407-490c-9cce-c5e3be639785
Failure (boundary in header does not match the one in the payload)
POST http://somehost/Record HTTP/1.1
Content-Type: multipart/form-data; boundary="9ccdeb0a-c407-490c-9cce-c5e3be639785"
Host: somehost
Content-Length: 11989
Expect: 100-continue
Accept-Encoding: gzip, deflate
--3fe3e969-8a41-451c-aae7-8458aee0c9f4
Content-Type: text/plain; charset=utf-8
Content-Disposition: form-data; name=uri
6442
--3fe3e969-8a41-451c-aae7-8458aee0c9f4
Content-Disposition: form-data; name=Files; filename=testdoc.docx; filename*=utf-8''testdoc.docx
My problem will be fixed if I can get the header to use the correct boundary. Any suggestions?
I'm more surprised you actually had some success with this approach. The thing is, the headers aren't officially message properties but port properties, and ports cache their settings. You have to make it a dynamic send port for it to work properly. Another way is setting the headers in a custom behavior, but I don't think that suits your scenario.

SignalR longPolling with GET method and application/json Content-Type causes security warnings

Our third-party security software is being triggered by an apparent mismatch between the GET method and a Content-Type of application/json:
Payload not allowed (Content-Type header not allowed for this method)
/signalr/poll
transport=longPolling&messageId=...&clientProtocol=1.4&etc
application/json; charset=UTF-8
Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
Is this a known issue or have I done something silly?
Thanks,
James
Although largely superfluous, it is the default behaviour of SignalR to send a Content-Type header with HTTP GET requests:
Content-Type: application/json; charset=UTF-8
I have confirmed this with a small SignalR test program and Fiddler.
As far as I can tell, our Third Party Security software is just being a little overeager.

How does my browser display a pdf when it didn't specify that's something it would accept?

I'm writing a simple HTTP server that will serve content from the file system.
I'm a little confused as to how the client and server negotiate content type.
After doing some research, I found that Content-Type specifies the content type of the HTTP message being sent, while the Accept header specifies what the program expects to receive as a response.
When I visit my server from my browser, and read the initial GET request (when visited with a null URI), I get the following:
GET / HTTP/1.1
Host: 127.0.0.1:1234
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
As you can see, the Accept header doesn't specify that it will accept PDFs, judging by the fact that I can't see the MIME type application/pdf in its value.
Yet, when I send a PDF's bytes along with a Content-Type of application/pdf, the browser magically displays it.
So, what am I missing? I originally thought the browser might be doing some basic inference on the URI to see if it ends in .pdf, and then accept the corresponding MIME type.
But when I visit a link to a PDF, the Accept header stays the same.
Any help would be really appreciated.
I'm writing a simple HTTP server
Then you should learn to find your way around the various RFCs that describe HTTP.
The relevant one here is RFC 7231, 5.3.2. Accept:
If the header field is
present in a request and none of the available representations for
the response have a media type that is listed as acceptable, the
origin server can either honor the header field by sending a 406 (Not
Acceptable) response or disregard the header field by treating the
response as if it is not subject to content negotiation.
A browser in principle wants to display HTML-formatted documents, for whatever variant of (X)HTML the server is willing to serve, so by default it sends the accept header you observed.
If the request is for another kind of resource however, the server is free to respond with that type of content.
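Note also that the browser's Accept header above ends with */*;q=0.8, which matches any media type, application/pdf included. A rough sketch of that media-range matching (ignoring q-values; the helper is illustrative):

# "*/*" makes application/pdf acceptable even though it is never named.
def is_acceptable(accept: str, media_type: str) -> bool:
    main, sub = media_type.split("/")
    for media_range in accept.split(","):
        range_type = media_range.split(";")[0].strip()   # drop q= and params
        r_main, _, r_sub = range_type.partition("/")
        if r_main in ("*", main) and r_sub in ("*", sub):
            return True
    return False

browser_accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(is_acceptable(browser_accept, "application/pdf"))  # True, via */*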

Browser encoding HTTP requests

What encoding does the browser use when sending HTTP requests?
I mean, when the browser sends the very first request, how can it be sure that the encoding it uses will be understood by the server?
Example:
GET /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
A browser can tell the server explicitly which encoding is used via the Content-Type header. Content-Type may contain a charset, but it's also possible to infer the encoding from the type alone. For example, for application/json:
Content-type: application/json; charset=utf-8 designates the content
to be in JSON format, encoded in the UTF-8 character encoding.
Designating the encoding is somewhat redundant for JSON, since the
default (only?) encoding for JSON is UTF-8. So in this case the
receiving server apparently is happy knowing that it's dealing with
JSON and assumes that the encoding is UTF-8 by default, that's why it
works with or without the header.
What about the situation where Content-Type is not defined in the request? RFC 7231 covers that case:
A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender. If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.
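A small sketch of the recipient's two options from that quote: default to application/octet-stream, or naively examine the payload bytes. The helper and its sniffing rules are illustrative, not from any library.

# Fall back per RFC 7231 when no Content-Type was sent.
def guess_media_type(content_type: str | None, body: bytes) -> str:
    if content_type:
        return content_type
    if body.startswith(b"%PDF-"):            # naive sniffing examples
        return "application/pdf"
    if body.lstrip().startswith((b"{", b"[")):
        return "application/json"
    return "application/octet-stream"        # the RFC's default assumption

print(guess_media_type(None, b"%PDF-1.7 ..."))  # application/pdf
print(guess_media_type(None, b"\x00\x01"))      # application/octet-stream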

Scope of "charset" in Content-Type HTTP header

HTTP responses generated by the Pyramid web framework append ; charset=UTF-8 to the Content-Type HTTP header. For example,
Content-Type: application/json; charset=UTF-8
Section 14.17 of RFC 2616 gives an example of this:
Content-Type: text/html; charset=ISO-8859-4
However, there's no description of the role of this charset "property". What scope does this have, and who interprets it?
It defines the character encoding of the entity being transferred, and it is interpreted by the receiving user agent. Pyramid is telling everyone that it only ever talks in UTF-8, rather than defaulting to ISO-8859-1.
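A minimal sketch of who interprets it: the recipient uses the declared charset to decode the body bytes back into text. The helper below is illustrative; ISO-8859-1 is used as the fallback because it was HTTP/1.1's historical default for text types.

# Decode a response body using whatever charset the server declared.
def decode_body(content_type: str, body: bytes) -> str:
    charset = "ISO-8859-1"                   # historical HTTP/1.1 default
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip()
    return body.decode(charset)

print(decode_body("application/json; charset=UTF-8", "héllo".encode("utf-8")))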
