Purpose of +xml in HTTP MIME type

What's the significance of the +xml in the following HTTP Accept Header:
application/vnd.google-earth.kml+xml
Is that just to denote it's an XML based format, or that it's suitable for XML editors, or something completely different?
http://www.iana.org/assignments/media-types/application/vnd.google-earth.kml+xml

Check the RFC (RFC 3023, XML Media Types):
Appendix A. Why Use the '+xml' Suffix for XML-Based MIME Types?
Although the use of a suffix was not considered as part of the
original MIME architecture, this choice is considered to provide the
most functionality with the least potential for interoperability
problems or lack of future extensibility. The alternatives to the
'+xml' suffix and the reason for its selection are described below.
There is a whole list of reasons underneath, but I don't think copying them verbatim falls under fair use. So check the RFC for the full (long!) story of how this came to be.
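The practical payoff of the suffix is that software which has never heard of a specific vendor type can still recognise it as XML and fall back to generic XML handling. A minimal Python sketch of such a check; the helper names are hypothetical, and real media type parsing has more edge cases than this split-on-semicolon approach:

```python
import xml.etree.ElementTree as ET


def is_xml_media_type(content_type: str) -> bool:
    """Return True if the media type is XML-based, judged by its name alone."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("application/xml", "text/xml"):
        return True
    # The suffix convention: any subtype ending in "+xml" is XML underneath.
    subtype = media_type.partition("/")[2]
    return subtype.endswith("+xml")


def parse_if_xml(content_type: str, body: bytes):
    """Parse the body with a generic XML parser if the type says it is XML."""
    if not is_xml_media_type(content_type):
        return None
    # No KML-specific knowledge is needed to get a parse tree.
    return ET.fromstring(body)


kml = b'<kml xmlns="http://www.opengis.net/kml/2.2"><Document/></kml>'
root = parse_if_xml("application/vnd.google-earth.kml+xml", kml)
print(root.tag)  # {http://www.opengis.net/kml/2.2}kml
```

Without the suffix, the same client would need a hard-coded list of every XML-based vendor type to do this.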


MIME type for Flatbuffers?

I've searched to find the proper MIME type for flatbuffers but I can't seem to find any. No mention of it on their documentation either.
The project page: https://github.com/google/flatbuffers
There is a similar question for protocol buffers, with a useful answer here: https://stackoverflow.com/a/48051331/761177
For flatbuffers, something like application/x-flatbuffers;schema=x.y.z might be appropriate, where x.y.z is the namespace declared in your schema.
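Media type parameters are how such a schema hint would travel in a Content-Type header. A sketch, assuming the unregistered, hypothetical type application/x-flatbuffers with a schema parameter exactly as suggested above; Python's standard email.message.Message already knows how to read Content-Type parameters:

```python
from email.message import Message


def flatbuffer_schema(content_type: str):
    """Return the 'schema' parameter of a (hypothetical) FlatBuffers type.

    application/x-flatbuffers is NOT a registered media type; it is used
    here only to illustrate carrying the schema namespace as a parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "application/x-flatbuffers":
        return None
    # get_param() handles quoting and whitespace around parameters.
    return msg.get_param("schema")


print(flatbuffer_schema("application/x-flatbuffers; schema=com.example.monster"))
# com.example.monster
```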
There is none. The correct mime type to use is application/octet-stream.
I don't think creating one would make sense either, since a naked FlatBuffer (without knowledge of its schema) cannot be parsed (unlike JSON), it is an opaque binary file. application/flatbuffer (if it existed) is barely more useful than application/octet-stream.
You need the schema before the file becomes readable, and I don't think mime types have a way to specify the schema name, though I suppose flatbuffers/schema-name would be cool, if whatever standards body governs mime types would allow it :)

Why does 'x-www-form-urlencoded' begin with 'x-www', when other standard content types do not?

I understand that in the past, it was standard for custom header names to use the prefix "X-" (I'm aware it is no longer considered standard to do this), but I've been unable to find whether there is any relationship between this naming convention and the value ("application/x-www-form-urlencoded"). Did it start out as a custom content-type value that was later adopted or something?
I found this link here, which certainly was interesting, but have been unable to find the answer to my question.
Does anybody know the reason this prefix was chosen, and what it signifies?
it was standard for custom header names to use the prefix "X-"
Actually … no, not at all. To be precise: It has never been a standard, just a best practice. It allowed implementors to introduce new content types and codings without the need to write an entire RFC for it. Nowadays the IANA Media Type Registry is good for that. RFC 6648 put an end to this practice.
The reason application/x-www-form-urlencoded is prefixed in this way (it is listed as a proper MIME type in said registry, btw) stems from the fact that it is a "custom" method of structuring the query string in a URL. That part has never seen proper regulation. The people behind HTML just went and did it, which fully justified the prefix.
As far as the history: it has the x- prefix because it originated in a proposal from Mosaic, and since it was just a proposal, they used that x- extension prefix to initially define it. But then other browsers implemented it that way too, and nobody ever got around to taking the time to properly standardize an unprefixed alternative, so it just stuck that way, and here we are now.
It can be traced back to a 1993 thread on the www-talk mailing list titled “Submitting input-form data to server”, and in that thread, a September 1993 message from Marc Andreessen:
This is what we're doing in Mosaic 2.0… See
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
...for details on what we're up to
That link is broken now, but the document, titled “Mosaic for X version 2.0 Fill-Out Form Support”, is archived at archive.org. Here’s the relevant excerpt:
ENCTYPE specifies the encoding for the fill-out form contents. This attribute only applies if METHOD is set to POST -- and even then, there is only one possible value (the default, application/x-www-form-urlencoded) so far.
Anyway, application/x-www-form-urlencoded is now formally defined in the URL spec, with algorithms for parsing and serializing it—though the section it’s all defined in has this note:
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.
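For reference, this is what the format looks like on the wire. A minimal sketch using Python's standard urllib.parse, which follows the common form-encoding rules (spaces become +, reserved characters are percent-encoded, repeated names are allowed); it is not a line-by-line implementation of the URL spec's algorithms and may differ from them in edge cases:

```python
from urllib.parse import parse_qsl, urlencode

# Serialize name/value pairs the way an HTML form would submit them.
fields = [("name", "Jane Doe"), ("q", "a&b=c"), ("tag", "x"), ("tag", "y")]
body = urlencode(fields)
print(body)  # name=Jane+Doe&q=a%26b%3Dc&tag=x&tag=y

# Parse it back; parse_qsl keeps the original order and repeated names.
print(parse_qsl(body))
# [('name', 'Jane Doe'), ('q', 'a&b=c'), ('tag', 'x'), ('tag', 'y')]
```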

Proper way to include data with an HTTP PATCH request

When I'm putting together an HTTP PATCH request, what are my options to include data outside of URL parameters?
Will any of the following work, and what's the most common choice?
multipart/form-data
application/x-www-form-urlencoded
Raw JSON
...any others?
There are no restrictions on the entity bodies of HTTP PATCH requests as defined in RFC 5789. So in theory, your options in this area are unlimited.
In my opinion the only sensible choice is to use the same Content-Type used to originally create the resource. The most common choice is application/json simply because most modern APIs utilize JSON as their preferred data transfer format.
The only relevant statement RFC 5789 makes in regard to what should and shouldn't be part of your PATCH entity body is silent on the matter of Content-Type:
the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version.
In summary, how you choose to modify resources in your application is entirely up to you.
As rdlowrey writes, RFC 5789 does not mandate specific content types, so the choice of format is up to you.
However, using the general formats you listed or making up your own format is not interoperable, and developers could have a hard time figuring out the semantics you chose. An official erratum to the RFC states this in a more formal way:
The means of applying a PATCH request to a resource's state is
determined by the request's media type. If a server receives a PATCH
request with a media type whose specification does not define
semantics specific to PATCH, the server SHOULD reject the request by
returning the 415 Unsupported Media Type status code, unless a more
specific error status code takes priority.
In particular, servers SHOULD NOT assume PATCH semantics for generic
media types that don't define them, such as application/xml or
application/json. Doing so will cause interoperability issues,
because the semantics of PATCH become specific to that resource,
rather than general.
(Quote formatted for readability, but unchanged otherwise)
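Taken at face value, the erratum means a server picks PATCH semantics from the request's media type and refuses anything else. RFC 5789 also says such a 415 response SHOULD include an Accept-Patch header listing the supported patch formats. A framework-free Python sketch; handle_patch and the appliers mapping are hypothetical names, and it assumes every supported type is JSON-based:

```python
import json
from typing import Callable, Dict, Tuple

# A function that applies a parsed patch document to the current state.
Applier = Callable[[object, object], object]


def handle_patch(content_type: str, body: bytes, resource: object,
                 appliers: Dict[str, Applier]) -> Tuple[int, Dict[str, str], object]:
    """Return (status, headers, new_resource) for a PATCH request.

    `appliers` maps each media type whose specification defines PATCH
    semantics to a function implementing them. All of them are assumed
    to be JSON-based, so the body is parsed with json.loads.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    applier = appliers.get(media_type)
    if applier is None:
        # Generic types such as application/json define no PATCH semantics,
        # so they are rejected; Accept-Patch lists what is supported.
        return 415, {"Accept-Patch": ", ".join(sorted(appliers))}, resource
    try:
        patch = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return 400, {}, resource
    return 200, {}, applier(resource, patch)


# Example: only JSON Merge Patch is registered for this resource. The lambda
# just replaces the whole document; it is a placeholder, not merge patch.
appliers = {"application/merge-patch+json": lambda doc, patch: patch}
status, headers, _ = handle_patch("application/json", b'{"a": 1}', {"a": 0}, appliers)
print(status, headers)  # 415 {'Accept-Patch': 'application/merge-patch+json'}
```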
One media type whose specification defines PATCH semantics is application/json-patch+json, also called JSON Patch: RFC 6902. I suppose it could be considered the "standard" choice (at least) when dealing with data originally posted as JSON.
The PATCH method is defined in RFC 5789. This document, however, doesn't enforce any media type for the payload:
The PATCH method requests that a set of changes described in the request entity be applied to the resource identified by the Request-URI. The set of changes is represented in a format called a "patch document" identified by a media type.
Other RFCs, released years later, define media types for describing a set of changes to be applied to a resource, suitable for PATCHing:
application/json-patch+json
Defined in RFC 6902:
JSON Patch defines a JSON document structure for expressing a sequence of operations to apply to a JavaScript Object Notation (JSON) document; it is suitable for use with the HTTP PATCH method. The application/json-patch+json media type is used to identify such patch documents.
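To make the idea of a patch document concrete, here is a minimal Python sketch of applying an application/json-patch+json document. It covers only the add, remove, replace and test operations (move and copy are left out) and does only basic error checking, so treat it as an illustration of the RFC 6902 model rather than a conforming implementation; in practice you would use an existing JSON Patch library.

```python
import copy


def _parse_pointer(pointer: str):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    # "~1" must be decoded before "~0" so that "~01" becomes "~1", not "/".
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def _resolve_parent(doc, tokens):
    """Walk to the container that holds the last reference token."""
    target = doc
    for token in tokens[:-1]:
        target = target[int(token)] if isinstance(target, list) else target[token]
    return target, tokens[-1]


def apply_json_patch(doc, operations):
    """Apply add/remove/replace/test operations; the input is not mutated."""
    doc = copy.deepcopy(doc)
    for op in operations:
        kind = op["op"]
        if kind not in ("add", "remove", "replace", "test"):
            raise ValueError(f"unsupported op: {kind}")
        tokens = _parse_pointer(op["path"])
        value = copy.deepcopy(op.get("value"))
        if not tokens:  # the pointer "" refers to the whole document
            if kind in ("add", "replace"):
                doc = value
            elif kind == "test" and doc != value:
                raise ValueError("test operation failed")
            elif kind == "remove":
                raise ValueError("cannot remove the whole document")
            continue
        parent, key = _resolve_parent(doc, tokens)
        if isinstance(parent, list):
            # "-" means "past the end" and is only meaningful for add.
            key = len(parent) if (key == "-" and kind == "add") else int(key)
            if key < 0 or key > len(parent) or (kind != "add" and key == len(parent)):
                raise IndexError(f"index out of range: {op['path']}")
        if kind == "add":
            if isinstance(parent, list):
                parent.insert(key, value)
            else:
                parent[key] = value
        elif kind == "remove":
            del parent[key]
        elif kind == "replace":
            if isinstance(parent, dict) and key not in parent:
                raise KeyError(op["path"])  # replace requires an existing target
            parent[key] = value
        elif parent[key] != value:  # kind == "test"
            raise ValueError(f"test failed at {op['path']}")
    return doc


original = {"title": "Hello", "tags": ["a"], "author": {"name": "x"}}
patch = [
    {"op": "test", "path": "/title", "value": "Hello"},
    {"op": "replace", "path": "/title", "value": "Hi"},
    {"op": "add", "path": "/tags/-", "value": "b"},
    {"op": "remove", "path": "/author"},
]
print(apply_json_patch(original, patch))  # {'title': 'Hi', 'tags': ['a', 'b']}
```

Because the operations are explicit, JSON Patch can express things a plain partial document cannot, such as appending to an array or asserting a value before changing it.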
application/merge-patch+json
Defined in RFC 7396:
This specification defines the JSON merge patch format and processing rules. The merge patch format is primarily intended for use with the HTTP PATCH method as a means of describing a set of modifications to a target resource's content.
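Merge patch is simpler: the patch looks like a partial copy of the target, and null means "delete this member". A Python transcription of the MergePatch pseudocode given in RFC 7396, with JSON null represented as None; the example is a trimmed version of the one in the RFC:

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) to a parsed JSON value.

    Follows the RFC's MergePatch pseudocode; JSON null is Python's None.
    The inputs are not mutated.
    """
    if not isinstance(patch, dict):
        # A non-object patch simply replaces the target.
        return copy.deepcopy(patch)
    # If the target is not an object, start from an empty one.
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            # null means "remove this member" (if present).
            result.pop(name, None)
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


doc = {"title": "Goodbye!",
       "author": {"givenName": "John", "familyName": "Doe"},
       "tags": ["example", "sample"]}
patch = {"title": "Hello!", "author": {"familyName": None}, "tags": ["example"]}
print(merge_patch(doc, patch))
# {'title': 'Hello!', 'author': {'givenName': 'John'}, 'tags': ['example']}
```

The trade-off versus JSON Patch: merge patch cannot set a member to null and always replaces arrays wholesale.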

MIME type for msgpack?

msgpack seems to be an extremely fast, if extremely new format for data serialisation. Does it have a recognised MIME type yet? If not, what should be used in the interim?
From Wikipedia:
According to RFC 6838 (published in January 2013: https://www.rfc-editor.org/rfc/rfc6838), any use of types in the "x." tree is strongly discouraged. Since January 2013, media types with names beginning with "x-" are no longer considered members of this tree.
So use application/msgpack directly.
According to a quick Google search, the overwhelming answer is application/x-msgpack. However, I can't find an authoritative source.
application/x-msgpack is probably the correct MIME header, however a small caution to future readers: relying on a MIME type for anything beyond high-level information is dangerous (at best) because the structure and meaning of the message is dynamic in nature.
Application types:
application/msgpack
application/x-msgpack
application/*+msgpack
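Since no single spelling is authoritative, a pragmatic server might accept all three forms from the list above on input while emitting just one. A small Python sketch of that input check; the helper name is hypothetical, and choosing application/msgpack for responses is an assumption that follows the RFC 6838 advice quoted above, not a registration:

```python
# Hypothetical choice for outgoing responses.
RESPONSE_CONTENT_TYPE = "application/msgpack"


def is_msgpack_media_type(content_type: str) -> bool:
    """Accept application/msgpack, application/x-msgpack or application/*+msgpack."""
    # Ignore parameters and case, as media type names are case-insensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()
    top, _, subtype = media_type.partition("/")
    if top != "application":
        return False
    if subtype in ("msgpack", "x-msgpack"):
        return True
    # A suffixed type needs a real name before "+msgpack".
    return subtype.endswith("+msgpack") and len(subtype) > len("+msgpack")
```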

What is the best way to determine the mime type of an http file upload?

Assume you have an HTML form with an input tag of type 'file'. When the file is posted to the server it will be stored locally, along with relevant metadata.
I can think of three ways to determine the mime type:
Use the mime type supplied in the 'multipart/form-data' payload.
Use the file name supplied in the 'multipart/form-data' payload and look up the mime type based on the file extension.
Scan the raw file data and use a mime type guessing library.
None of these solutions are perfect.
Which is the most accurate solution?
Is there another, better option?
If you are using PHP then you can use
http://pecl.php.net/package/Fileinfo
Which will inspect many aspects of the file. For Python you can use
http://pypi.python.org/pypi/python-magic/0.1
Which provides bindings for libmagic on Linux/Unix (and possibly Windows) systems. See:
man magic
man libmagic
on Linux. It uses magic number tests to try to determine the mime type of a file.
I like the magic number method, because it can catch wrong extensions and a lot of trickery if you are handling uploaded files on a webserver. These tests are generally one-offs, so the performance hit of reading through the file is negligible.
I don't think you can rely on any one of these as being the definitive "I am mime type x". The problem with the first two is that the content type supplied may be incorrect, because of issues with the client (browser or otherwise) or a misleading request (various hack attempts, etc.) from various clients.
So you should probably try to combine information from each source and work out some sort of confidence level. If the file extension says .doc and the mime type is application/msword, then there's a pretty good chance it's a Word document, but run it through a mime type detection utility just to make sure.
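One way to turn that advice into code is a simple agreement vote between the three signals. A Python sketch, assuming a hypothetical best_guess helper, a content-based detector whose result is passed in as sniffed_type, and an arbitrary three-level confidence scale; the extension lookup uses the standard mimetypes module with only its built-in table:

```python
import mimetypes
from typing import Optional, Tuple

# Built-in extension table only (no system mime.types files are read).
_EXTENSIONS = mimetypes.MimeTypes()


def best_guess(client_type: Optional[str], filename: str,
               sniffed_type: Optional[str]) -> Tuple[str, str]:
    """Combine three signals into (mime_type, confidence).

    client_type:  Content-Type from the multipart/form-data part
    filename:     client-supplied file name (extension lookup)
    sniffed_type: result of a content-based (magic number) detector
    """
    ext_type, _ = _EXTENSIONS.guess_type(filename)
    declared = client_type.split(";", 1)[0].strip().lower() if client_type else None
    if sniffed_type:
        # Content is the hardest signal to fake, so it decides the type;
        # the client-controlled signals only raise or lower confidence.
        agree = sum(1 for t in (declared, ext_type) if t == sniffed_type)
        return sniffed_type, {2: "high", 1: "medium"}.get(agree, "low")
    if declared and declared == ext_type:
        return declared, "medium"
    return declared or ext_type or "application/octet-stream", "low"


print(best_guess("application/msword", "report.doc", "application/msword"))
# ('application/msword', 'high')
```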
There should be a solution available for mime magic detection with the language you're using - you didn't mention which one though. They all generally work by looking at the first few bytes/characters of the file and match them against a lookup table of mime types. Some also remove the BOM from the file to help with this. Often they fall back to plain text if the mime type can't be detected.
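The lookup-table approach described above fits in a few lines of Python. The signature table below is a tiny, assumed subset of well-known magic numbers (libmagic and the libraries linked here use far larger rule sets), and the text fallback only recognises UTF-8, so this shows the technique rather than replacing a real detector:

```python
import codecs

# A few well-known file signatures ("magic numbers") and their mime types.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # also matches .docx, .xlsx, .jar ...
]
_UTF8_BOM = b"\xef\xbb\xbf"


def sniff_mime_type(data: bytes, sample_size: int = 512) -> str:
    """Guess a mime type from the first bytes of a file."""
    head = data[:sample_size]
    for signature, mime in _SIGNATURES:
        if head.startswith(signature):
            return mime
    # Strip a UTF-8 byte order mark before deciding whether this is text.
    if head.startswith(_UTF8_BOM):
        head = head[len(_UTF8_BOM):]
    if b"\x00" not in head:
        try:
            # final=False tolerates a multi-byte character cut off by the sample.
            codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
            return "text/plain"
        except UnicodeDecodeError:
            pass
    return "application/octet-stream"


print(sniff_mime_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # image/png
print(sniff_mime_type("héllo".encode("utf-8")))  # text/plain
```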
If you want a platform independent approach to this then take a look at the various Java libraries that exist:
http://code.google.com/p/mimemagic/
http://sourceforge.net/projects/jmimemagic/
