What RFC defines arrays transmitted over HTTP? - http

What RFC defines passing arrays over HTTP? Most web application platforms allow you to supply an array of arguments over GET or POST. The following URL is an example:
http://localhost/?var[1]=one&var[2]=two&var[3]=three
RFC 1738 defines URLs, but the bracket characters are missing from the Backus–Naur Form (BNF) definition of the URL, and the RFC doesn't cover POST at all. Ideally I would like to find the BNF for this feature as defined in an RFC.

According to Wikipedia, there is no single spec:
While there is no definitive standard, most web frameworks allow multiple values to be associated with a single field (e.g. field1=value1&field1=value2&field2=value3)
That Wikipedia article links to the following Stack Overflow post, which covers a similar question: Authoritative position of duplicate HTTP GET query keys
The issue here is that form parameters can be whatever you want them to be. Some web frameworks have settled on key[number]=value for arrays; others haven't. Interestingly, RFC 1866 section 8.2.4, page 48 (note: this RFC is historical, not current) shows an example with the same key used twice in a form POST (a short parsing sketch follows the example):
name=John+Doe
&gender=male
&family=5
&city=kent
&city=miami
&other=abc%0D%0Adef
&nickname=J%26D
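To see how little of this is specified, here is a minimal sketch using Python's standard library (the key names are just illustrations): a generic query-string parser collects repeated keys into a list, while the bracketed key[number] keys are nothing more than ordinary, distinct key names that each framework has to interpret on its own.
from urllib.parse import parse_qs

# Repeated keys: a generic parser collects the values into a list.
print(parse_qs("city=kent&city=miami"))
# {'city': ['kent', 'miami']}

# Bracketed keys: to a generic parser these are three unrelated keys;
# turning them into an array is a framework convention, not a standard.
print(parse_qs("var[1]=one&var[2]=two&var[3]=three"))
# {'var[1]': ['one'], 'var[2]': ['two'], 'var[3]': ['three']}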
On the W3C side of things, HTML 4.01 has some information about how to encode form parameters. Sadly this doesn't cover arrays.
At the time of writing, I don't think there is a correct answer to your question - no IETF RFC or W3C spec defines the behavior that you're interested in.
(As a side note, the W3C HTML JSON form submission draft spec covers posting arrays, thank goodness.)

URIs are defined by RFC 3986.
However, what you're asking about is encoding of form parameters. You need to look up the HTML spec for that.

Related

itemtype with http or better https?

I currently use:
itemtype="http://schema.org/ImageObject"
but a request for http://schema.org/ImageObject is redirected to https://schema.org/ImageObject.
If I change to itemtype="https://schema.org/ImageObject", the Google SDTT shows no problem, but nearly all of Google's examples about structured data use http.
Which is best or recommended to use for itemtype: http://schema.org or https://schema.org?
From Schema.org’s FAQs:
Q: Should we write https://schema.org or http://schema.org in our markup?
There is a general trend towards using https more widely, and you can already write https://schema.org in your structured data. Over time we will migrate the schema.org site itself towards using https: as the default version of the site and our preferred form in examples. However http://schema.org -based URLs in structured data markup will remain widely understood for the foreseeable future and there should be no urgency about migrating existing data. This is a lengthy way of saying that both https://schema.org and http://schema.org are fine.
tl;dr: Both variants are possible.
The purpose of itemtype URIs
Note that the URIs used for itemtype are primarily identifiers; they typically don't get dereferenced:
If a Microdata consumer doesn’t know what the URI in itemtype="http://schema.org/ImageObject" stands for, this consumer "must not automatically dereference" it.
If a Microdata consumer does know what the URI stands for, this consumer has no need to dereference this URI in the first place.
So, there is no technical reason to prefer the HTTPS variant. User agents won’t dereference this URI (in contrast to URIs specified in href/src attributes), and users can’t click on it. I think there is only one case where the HTTPS variant is useful: if a visitor looks into the source code and copy-pastes the URI to check what the type is about.
I would recommend sticking with the HTTP variant until Schema.org has switched everything to HTTPS, most importantly the URI in RDFa's initial context.
The Schema.org specification for the type ImageObject indicates:
Canonical URL: http://schema.org/ImageObject
It is probably useful to refer to the canonical URL because it is the “preferred” version of the web page.

Why does 'x-www-form-urlencoded' begin with 'x-www', when other standard content types do not?

I understand that in the past, it was standard for custom header names to use the prefix "X-" (I'm aware it is no longer considered standard to do this), but I've been unable to find whether there is any relationship between this naming convention and the value ("application/x-www-form-urlencoded"). Did it start out as a custom content-type value that was later adopted, or something?
I found this link here, which certainly was interesting, but have been unable to find the answer to my question.
Does anybody know the reason this prefix was chosen, and what it signifies?
it was standard for custom header names to use the prefix "X-"
Actually … no, not at all. To be precise: It has never been a standard, just a best practice. It allowed implementors to introduce new content types and codings without the need to write an entire RFC for it. Nowadays the IANA Media Type Registry is good for that. RFC 6648 put an end to this practice.
The reason application/x-www-form-urlencoded is prefixed in this way (it is listed as a proper MIME type in said registry, by the way) stems from the fact that it is a "custom" method of structuring the query string in a URL. That part has never seen proper regulation. The people behind HTML just went and did it, which fully justified the prefix.
As for the history: it has the x- prefix because it originated in a proposal from Mosaic, and since it was just a proposal, they used the x- extension prefix to define it initially. But then other browsers implemented it that way too, nobody ever got around to properly standardizing an unprefixed alternative, so it just stuck, and here we are now.
It can be traced back to a 1993 thread on the www-talk mailing list titled “Submitting input-form data to server”, and in that thread, a September 1993 message from Marc Andreessen:
This is what we're doing in Mosaic 2.0… See
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
...for details on what we're up to
That link is broken now but the document, titled “Mosaic for X version 2.0 Fill-Out Form Support” is archived at archive.org. Here’s the relevant excerpt:
ENCTYPE specifies the encoding for the fill-out form contents. This attribute only applies if METHOD is set to POST -- and even then, there is only one possible value (the default, application/x-www-form-urlencoded) so far.
Anyway, application/x-www-form-urlencoded is now formally defined in the URL spec, with algorithms for parsing and serializing it—though the section it’s all defined in has this note:
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.
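As a small illustration of that encoding dance, here is a sketch using Python's standard library (one common implementation of the format, not the spec's own algorithm), reusing values from the RFC 1866 example above: spaces become +, while CR/LF and & are percent-encoded.
from urllib.parse import urlencode, parse_qs

# Serializing: spaces become '+', CR/LF and '&' are percent-encoded.
body = urlencode({"name": "John Doe", "other": "abc\r\ndef", "nickname": "J&D"})
print(body)  # name=John+Doe&other=abc%0D%0Adef&nickname=J%26D

# Parsing the body back recovers the original values.
print(parse_qs(body))
# {'name': ['John Doe'], 'other': ['abc\r\ndef'], 'nickname': ['J&D']}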

Proper way to include data with an HTTP PATCH request

When I'm putting together an HTTP PATCH request, what are my options to include data outside of URL parameters?
Will any of the following work, and what's the most common choice?
multipart/form-data
application/x-www-form-urlencoded
Raw JSON
...any others?
There are no restrictions on the entity bodies of HTTP PATCH requests as defined in RFC 5789. So in theory, your options in this area are unlimited.
In my opinion the only sensible choice is to use the same Content-Type used to originally create the resource. The most common choice is application/json simply because most modern APIs utilize JSON as their preferred data transfer format.
The only relevant statement RFC 5789 makes regarding what should and shouldn't be part of your PATCH entity body is silent on the matter of Content-Type:
the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version.
In summary, how you choose to modify resources in your application is entirely up to you.
As rdlowrey writes, RFC 5789 does not mandate specific content types, so the choice of format is up to you.
However, using the general formats you listed or making up your own format is not interoperable, and developers could have a hard time figuring out the semantics you chose. An official erratum to the RFC states this in a more formal way:
The means of applying a PATCH request to a resource's state is determined by the request's media type. If a server receives a PATCH request with a media type whose specification does not define semantics specific to PATCH, the server SHOULD reject the request by returning the 415 Unsupported Media Type status code, unless a more specific error status code takes priority.

In particular, servers SHOULD NOT assume PATCH semantics for generic media types that don't define them, such as application/xml or application/json. Doing so will cause interoperability issues, because the semantics of PATCH become specific to that resource, rather than general.
(Quote formatted for readability, but unchanged otherwise)
One media type whose specification defines PATCH semantics is application/json-patch+json, also called JSON Patch: RFC 6902. I suppose it could be considered the "standard" choice (at least) when dealing with data originally posted as JSON.
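For instance, a JSON Patch request could look like the following sketch (Python with the third-party requests library; the endpoint and field names are made up, and the server must actually support this media type):
import json
import requests  # third-party: pip install requests

# A JSON Patch (RFC 6902) document: an ordered list of operations.
operations = [
    {"op": "replace", "path": "/email", "value": "new@example.org"},
    {"op": "remove", "path": "/nickname"},
]

# Send it with the media type that actually defines PATCH semantics.
response = requests.patch(
    "https://api.example.org/people/1",  # hypothetical endpoint
    data=json.dumps(operations),
    headers={"Content-Type": "application/json-patch+json"},
)
print(response.status_code)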
The PATCH method is defined in RFC 5789. This document, however, doesn't enforce any media type for the payload:
The PATCH method requests that a set of changes described in the request entity be applied to the resource identified by the Request-URI. The set of changes is represented in a format called a "patch document" identified by a media type.
Other RFCs, released years later, define some media types for describing a set of changes to be applied to a resource, suitable for PATCHing:
application/json-patch+json
Defined in RFC 6902:
JSON Patch defines a JSON document structure for expressing a sequence of operations to apply to a JavaScript Object Notation (JSON) document; it is suitable for use with the HTTP PATCH method. The application/json-patch+json media type is used to identify such patch documents.
application/merge-patch+json
Defined in RFC 7396:
This specification defines the JSON merge patch format and processing rules. The merge patch format is primarily intended for use with the HTTP PATCH method as a means of describing a set of modifications to a target resource's content.
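To illustrate the difference, a merge patch is just a partial document in which null means "remove this member". The processing rules from RFC 7396 are short enough to sketch in plain Python (the sample data is made up):
def json_merge_patch(target, patch):
    """Apply an RFC 7396 JSON Merge Patch to a target value."""
    if not isinstance(patch, dict):
        return patch                  # a non-object patch replaces the target
    if not isinstance(target, dict):
        target = {}                   # patching a non-object starts from an empty object
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)     # null means: remove the member
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result

person = {"email": "old@example.org", "nickname": "J&D", "city": "kent"}
patch = {"email": "new@example.org", "nickname": None}
print(json_merge_patch(person, patch))
# {'email': 'new@example.org', 'city': 'kent'}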

Purpose of +xml in HTTP MIME type

What's the significance of the +xml in the following HTTP Accept Header:
application/vnd.google-earth.kml+xml
Is that just to denote it's an XML based format, or that it's suitable for XML editors, or something completely different?
http://www.iana.org/assignments/media-types/application/vnd.google-earth.kml+xml
Check the RFC:
Appendix A. Why Use the '+xml' Suffix for XML-Based MIME Types?
Although the use of a suffix was not considered as part of the original MIME architecture, this choice is considered to provide the most functionality with the least potential for interoperability problems or lack of future extensibility. The alternatives to the '+xml' suffix and the reason for its selection are described below.
There is a whole list of reasons given underneath, but I don't think copying them verbatim falls under fair use. So check the RFC for the full (long!) story of how this came to be.
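In practice the suffix lets generic software recognize an XML-based format without knowing every vendor-specific type. A minimal sketch of what a consumer might do (plain Python; the media types are just examples):
from xml.etree import ElementTree

def looks_like_xml(media_type: str) -> bool:
    """Treat the base XML types and any '+xml'-suffixed type as XML."""
    base = media_type.split(";", 1)[0].strip().lower()
    return base in ("text/xml", "application/xml") or base.endswith("+xml")

print(looks_like_xml("application/vnd.google-earth.kml+xml"))  # True
print(looks_like_xml("application/json"))                      # False

# Any '+xml' type can be handed to a generic XML parser.
if looks_like_xml("application/vnd.google-earth.kml+xml"):
    doc = ElementTree.fromstring("<kml><Placemark/></kml>")
    print(doc.tag)  # kml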

RESTful URLs and folders

On the Microformats spec for RESTful URLs:
GET /people/1
return the first record in HTML format
GET /people/1.html
return the first record in HTML format
and /people returns a list of people
So is /people.html the correct way to return a list of people in HTML format?
If you just refer to the URL path extension, then, yes, that scheme is the recommended behavior for content negotiation:
path without extension is a generic URL (e.g. /people for any accepted format)
path with extension is a specific URL (e.g. /people.json as a content-type-specific URL for the JSON data format)
With such a scheme the server can use content negotiation when the generic URL is requested and respond with a specific representation when a specific URL is requested.
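A minimal sketch of that scheme, using the third-party Flask framework (the routes and sample data are made up): the extensionless URL negotiates via the Accept header, while the suffixed URLs pin the format.
from flask import Flask, jsonify, request

app = Flask(__name__)
PEOPLE = [{"id": 1, "name": "John Doe"}]  # made-up sample data

@app.route("/people")        # generic URL: negotiate via the Accept header
@app.route("/people.<ext>")  # specific URL: format fixed by the extension
def people(ext=None):
    if ext is None:
        best = request.accept_mimetypes.best_match(["text/html", "application/json"])
        ext = "json" if best == "application/json" else "html"
    if ext == "json":
        return jsonify(PEOPLE)
    items = "".join(f"<li>{p['name']}</li>" for p in PEOPLE)
    return f"<ul>{items}</ul>"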
Documents that recommend this scheme are among others:
Cool URIs don't change
Cool URIs for the Semantic Web
Content Negotiation: why it is useful, and how to make it work
You have the right idea. Both /people and /people.html would return HTML-formatted lists of people, and /people.json would return a JSON-formatted list of people.
There should be no confusion about this with regard to applying data-type extensions to "folders" in the URLs. In the list of examples, /people/1 is itself used as a folder for various other queries.
It says that GET /people/1.json should return the first record in JSON format, which makes sense.
URIs and how you design them have nothing to do with being RESTful or not.
It is a common practice to do what you ask, since that's how the Apache web server works. Let's say you have foo.txt, foo.html, and foo.pdf, and request GET /foo with no preference (i.e. no Accept: header). A 300 MULTIPLE CHOICES response would be returned with a listing of the three files so the user could pick. Because browsers do such marvelous content negotiation, it's hard to link to an example, but here goes: an example shows what it looks like, except that the reason you see the page in the first place is the different case of the file name ("XSLT" vs "xslt").
This Apache behaviour is echoed in conventions and various tools, but it really isn't important. You could have people_html or people?format=html or people.html or sandwiches or 123qweazrfvbnhyrewsxc6yhn8uk as the URI which returns people in HTML format. The client doesn't know any of these URIs up front; it's supposed to learn them from other resources. A human could see the result of All People (HTML format) and understand what happens, while ignoring the strange-looking URI.
On a closing note, the microformats URL conventions page is absolutely not a spec for RESTful URLs; it's merely guidance on making URIs that are apparently easy to consume by various HTTP libraries for one reason or another, and it has nothing to do with REST at all. The guidelines are all perfectly OK, and following them makes your URIs look sane to other people who happen to glance at them (/sandwiches is admittedly odd). But even the cited AtomPub protocol doesn't require entries to live "within" the collection...
