Why does 'x-www-form-urlencoded' begin with 'x-www', when other standard content types do not? - http

I understand that in the past, it was standard for custom header names to use the prefix "X-" (I'm aware it is no longer considered standard to do this), but I've been unable to find out whether there is any relationship between this naming convention and the value ("application/x-www-form-urlencoded"). Did it start out as a custom content-type value that was later adopted or something?
I found this link here, which certainly was interesting, but have been unable to find the answer to my question.
Does anybody know the reason this prefix was chosen, and what it signifies?

it was standard for custom header names to use the prefix "X-"
Actually … no, not at all. To be precise: It has never been a standard, just a best practice. It allowed implementors to introduce new content types and codings without the need to write an entire RFC for it. Nowadays the IANA Media Type Registry is good for that. RFC 6648 put an end to this practice.
The reason application/x-www-form-urlencoded is prefixed in this way (it is listed as a proper MIME type in said registry, by the way) stems from the fact that it is a "custom" method of structuring the query string in a URL. That part has never seen proper regulation. The people behind HTML just went and did it, which fully justified the prefix.

As for the history: it has the x- prefix because it originated in a proposal from Mosaic, and since it was just a proposal, they used the x- extension prefix to initially define it. But then other browsers implemented it that way too, and nobody ever got around to properly standardizing an unprefixed alternative, so it just stuck that way, and here we are now.
It can be traced back to a 1993 thread on the www-talk mailing list titled “Submitting input-form data to server”, and in that thread, a September 1993 message from Marc Andreessen:
This is what we're doing in Mosaic 2.0… See
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
...for details on what we're up to
That link is broken now but the document, titled “Mosaic for X version 2.0 Fill-Out Form Support” is archived at archive.org. Here’s the relevant excerpt:
ENCTYPE specifies the encoding for the fill-out form contents. This attribute only applies if METHOD is set to POST -- and even then, there is only one possible value (the default, application/x-www-form-urlencoded) so far.
Anyway, application/x-www-form-urlencoded is now formally defined in the URL spec, with algorithms for parsing and serializing it—though the section it’s all defined in has this note:
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.
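For a concrete feel of what the format looks like on the wire, here is a minimal sketch using Python's standard urllib.parse module. The choice of Python and the field names are my own assumptions for illustration, not anything taken from the URL spec or from Mosaic:

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields, chosen only for illustration.
fields = [
    ("name", "John Doe"),
    ("nickname", "J&D"),
    ("city", "kent"),
    ("city", "miami"),
]

# Serialize: spaces become "+", reserved characters are percent-encoded.
body = urlencode(fields)
print(body)

# Parse: values are decoded again and repeated keys are collected into lists.
parsed = parse_qs(body)
print(parsed)
```

This prints name=John+Doe&nickname=J%26D&city=kent&city=miami, followed by a dict in which city maps to ['kent', 'miami'].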

Related

Is there some way to set character encoding in SCORM 2004?

I'm trying to record some text values (cmi.interactions.n.learner_response, and cmi.interactions.n.description) on the backend. I'm sending them in a post response from a JS object that uses JSON.stringify.
Inspecting the response in PHP, accented characters äöå (and spaces) are recorded as underscores in learner_response, and in description, they are omitted altogether. Inspecting the response string, it appears to be an ASCII encoded string.
Is it possible to set the encoding in SCORM 2004 so that I can see accented characters in the response? My client would like to record the interactions more thoroughly. The content was created in Adobe Captivate.
Thanks.
Essentially, no. SCORM's scope limits it to what is happening in the runtime layer that is implemented as the JavaScript API that the SCORM player (the thing launching the content) provides. So the transfer mechanism between that runtime environment and the storage layer (whether that is on a server, local, etc.) is outside the scope of the spec and is therefore implementation specific.
There is a reference to ISO-10646-1, which will take you down a path that likely does not lead to much more information. Essentially it is a character set that does not include specifics about how to handle those elements, which for this use case probably boils down to a JavaScript string.
Having said all of that you should seek support from the SCORM player to see if they have the ability to adjust that so that larger ranges of characters can be supported.

What RFC defines arrays transmitted over HTTP?

What RFC defines the passing of arrays over HTTP? Most web application platforms allow you to supply an array of arguments over GET or POST. The following URL is an example:
http://localhost/?var[1]=one&var[2]=two&var[3]=three
RFC1738 defines URLs; however, the bracket is missing from the Backus–Naur Form (BNF) definition of the URL. Also, this RFC doesn't cover POST. Ideally I would like to get the BNF for this feature as defined in the RFC.
According to Wikipedia, there is no single spec:
While there is no definitive standard, most web frameworks allow multiple values to be associated with a single field (eg. field1=value1&field1=value2&field2=value3)
That Wikipedia article links to the following Stack Overflow post, which covers a similar question: Authoritative position of duplicate HTTP GET query keys
The issue here is that form parameters can be whatever you want them to be. Some web frameworks have settled on key[number]=value for arrays, others haven't. Interestingly, RFC1866 section 8.2.4, page 48 (note: this RFC is historical and not current) shows an example with the same key used twice in a form POST:
name=John+Doe
&gender=male
&family=5
&city=kent
&city=miami
&other=abc%0D%0Adef
&nickname=J%26D
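As a rough illustration of how a generic parser treats that example, here is a sketch with Python's standard urllib.parse (my choice of language, and I am assuming the line breaks in the example above are only there for display, so the body is joined into one string). It also shows that, at this level, brackets in a key carry no meaning of their own:

```python
from urllib.parse import parse_qsl

# The RFC 1866 example, joined into a single line (assumption: the line
# breaks in the quoted example are for readability only).
body = (
    "name=John+Doe&gender=male&family=5&city=kent"
    "&city=miami&other=abc%0D%0Adef&nickname=J%26D"
)

# parse_qsl keeps every pair in order, so the duplicate "city" key survives.
pairs = parse_qsl(body)
cities = [value for key, value in pairs if key == "city"]
print(cities)

# Brackets are not interpreted here: "var[1]" is just a key name.
bracketed = parse_qsl("var[1]=one&var[2]=two")
print(bracketed)
```

This prints ['kent', 'miami'] and then [('var[1]', 'one'), ('var[2]', 'two')]. Turning the bracketed keys into an actual array is something a framework layers on top.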
On the W3C side of things, HTML 4.01 has some information about how to encode form parameters. Sadly this doesn't cover arrays.
At the time of writing, I don't think there is a correct answer to your question - no IETF RFC or W3C spec defines the behavior that you're interested in.
(As a side note, the W3C HTML JSON form submission draft spec covers posting arrays, thank goodness.)
URIs are defined by RFC 3986.
However, what you're asking about is encoding of form parameters. You need to look up the HTML spec for that.

Purpose of +xml in HTTP MIME type

What's the significance of the +xml in the following HTTP Accept Header:
application/vnd.google-earth.kml+xml
Is that just to denote it's an XML based format, or that it's suitable for XML editors, or something completely different?
http://www.iana.org/assignments/media-types/application/vnd.google-earth.kml+xml
Check the RFC:
Appendix A. Why Use the '+xml' Suffix for XML-Based MIME Types?
Although the use of a suffix was not considered as part of the
original MIME architecture, this choice is considered to provide the
most functionality with the least potential for interoperability
problems or lack of future extensibility. The alternatives to the
'+xml' suffix and the reason for its selection are described below.
There is a whole list of reasons listed underneath, but I don't think copying them verbatim falls under fair-use. So check the RFC for their full (long!) story of how this came to be.

Using duplicate parameters in a URL

We are building an API in-house and often are passing a parameter with multiple values.
They use: mysite.com?id=1&id=2&id=3
Instead of: mysite.com?id=1,2,3
I favor the second approach but I was curious if it was actually incorrect to do the first?
I'm not an HTTP guru, but from what I understand there's not a definitive standard on the query part of the URL regarding multiple values, it's typically up to the CGI that handles the request to parse the query string.
RFC 1738 section 3.3 mentions a searchpart and that it should go after the ? but doesn't seem to elaborate on its format.
http://<host>:<port>/<path>?<searchpart>
I did not (bother to) check which RFC standard defines it. (Anyone who knows about this, please leave a reference in the comments.) But in practice, the mysite.com?id=1&id=2&id=3 form is already what a browser produces when a form contains duplicated fields, typically checkboxes. See it in action in this w3schools example page. So there is a good chance that whatever programming language you are using already provides some helper function to parse an input like that, and it probably returns a list.
You could, of course, go with your own approach such as mysite.com?id=1,2,3, which is not bad at all in this particular case. But you will need to implement your own logic to produce and to consume such a format. You may also need to handle some corner cases yourself, such as: what if the input is not well-formed, like mysite.com?id=1,2,? Do you need to invent yet another separator if the comma itself can also be valid input, like mysite.com?name=Doe,John|Doe,Jane? Would you reach a point where you use a JSON string as the value, like mysite.com?name=["John Doe", "Jane Doe"]? And so on. Your mileage may vary.
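To compare the two styles side by side, here is a small sketch in Python using only the standard library. The helper names ids_from_repeated and ids_from_commas are hypothetical, and dropping empty items is just one possible policy for malformed input, not the only reasonable one:

```python
from urllib.parse import urlsplit, parse_qs


def ids_from_repeated(url: str) -> list:
    """Read ?id=1&id=2&id=3 style parameters; the parser does all the work."""
    query = urlsplit(url).query
    return parse_qs(query).get("id", [])


def ids_from_commas(url: str) -> list:
    """Read ?id=1,2,3 style parameters; splitting is our own responsibility."""
    query = urlsplit(url).query
    values = parse_qs(query).get("id", [])
    raw = values[0] if values else ""
    # Policy choice (an assumption): silently drop empty items such as "1,2,".
    return [item for item in raw.split(",") if item]


print(ids_from_repeated("http://mysite.com?id=1&id=2&id=3"))
print(ids_from_commas("http://mysite.com?id=1,2,3"))
print(ids_from_commas("http://mysite.com?id=1,2,"))
```

The first two calls both print ['1', '2', '3'], and the malformed one prints ['1', '2']. The repeated form needs no custom code, while the comma form needs both the split and a decision about bad input.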
Worth adding that inconsistent handling of duplicate parameters in the URL on the server may lead to vulnerabilities, specifically server-side HTTP parameter pollution, with a practical example - Client side Http Parameter Pollution - Yahoo! Classic Mail Video Poc.
In your first approach you will get an array of query string values, but in the second approach you will get a single string of query string values.
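To make that risk concrete, here is a minimal sketch in Python. The two "components" are hypothetical and stand in for, say, a validation layer and a backend that resolve a duplicated key differently:

```python
from urllib.parse import parse_qsl

query = "id=1&id=2"
pairs = parse_qsl(query)

# Hypothetical component A keeps the first value it sees for each key.
first_wins = {}
for key, value in pairs:
    first_wins.setdefault(key, value)

# Hypothetical component B lets later values overwrite earlier ones.
last_wins = dict(pairs)

print(first_wins["id"], last_wins["id"])
```

This prints 1 2: the same request is read as id=1 by one component and id=2 by the other, which is the kind of disagreement parameter pollution relies on.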
I guess it depends on the technology you use and which form is more convenient there. I am currently facing the same question: currency=USD,CHF or currency=USD&currency=CHF.
I am using Thymeleaf, and the second option makes it easy to work with: I can then write something like ${param.currency.contains(currency.value)}. When I try the first option, it seems to treat the "array" as a single string, so I need to split first and then do the contains check, which leads to messier code.
Just my 50 cents :-)

MIME type for msgpack?

msgpack seems to be an extremely fast, if extremely new format for data serialisation. Does it have a recognised MIME type yet? If not, what should be used in the interim?
From Wikipedia :
According to RFC 6838 (published in January 2013 : https://www.rfc-editor.org/rfc/rfc6838), any use of types in the "x." tree is strongly discouraged. Media types with names beginning with "x-" are no longer considered to be members of this tree since January 2013.
Then use directly application/msgpack
According to a quick Google search, the overwhelming answer is application/x-msgpack. However, I can't find an authoritative source.
application/x-msgpack is probably the correct MIME header, however a small caution to future readers: relying on a MIME type for anything beyond high-level information is dangerous (at best) because the structure and meaning of the message is dynamic in nature.
Application types:
application/msgpack
application/x-msgpack
application/*+msgpack

Resources