Clarification regarding validity of using data-URIs in CSS url()

Clarification regarding validity of using data-URIs in CSS url() - css

I'm writing a pre-processing component (in PHP) which, in certain contexts, rewrites external image file requests in CSS such as:
background-image: url('/my-folder/my-image.png');
as CSS-inlined Data URIs, such as:
background-image: url('data:image/png;base64,[Base-64 Encoding Here]');
I've just read (with some surprise) over at MDN:
In CSS Level 1, the url() functional notation described only true
URLs. In CSS Level 2, the definition of url() was extended to describe
any URI, such as a data-uri. CSS Values and Units Level 3 returned to
the narrower, initial definition. Now, url() denotes only true <url>s.
Source: https://developer.mozilla.org/en-US/docs/Web/CSS/url()
Really? This would seem to suggest that Data-URIs constitute an invalid value for url() in CSS Stylesheets (?)
But I can find nothing in:
https://www.w3.org/TR/css-values-3/
that backs this up.
I was under the impression that a Data-URI is an entirely valid value for url() in CSS Stylesheets.
Can anyone clarify (ideally with an authoritative reference), please?
N.B. The tag below reads w3c-validation - I recognise it should probably read what-wg-validation.

data: URIs are actually valid URLs as per RFC 2397, don't worry, they are still allowed.
Not sure what this MDN article tried to imply when it says "such as a data-uri", but I did edit it out to URN since it's actually what happened in CSS 2:
The specs did indeed extend the <url> notation to all URIs, by allowing Uniform Resource Names to be part of it too... I can't tell why they did this change, but it seems very weird to say the least, as I can't see how an URN could be any useful in a stylesheet... According to the specs wording, it seems its authors didn't quite know yet what it would be.
URLs (Uniform Resource Locators, see [RFC1738] and [RFC1808]) provide the address of a resource on the Web. An expected new way of identifying resources is called URN (Uniform Resource Name). Together they are called URIs (Uniform Resource Identifiers, see [URI]). This specification uses the term URI.
Ps: Specs define it as "data: URLs" from the fetch API.

Related

Why does 'x-www-form-urlencoded' begin with 'x-www', when other standard content types do not?

I understand that in the past, it was standard for custom headers names to use the prefix "X-" (I'm aware it no longer is considered standard to do this), but I've been unable to find if there is any relationship between this naming convention and the value ("application/x-www-form-urlencoded"). Did it start out as a custom content-type value that was later adopted or something?
I found this link here, which certainly was interesting, but have been unable to find the answer to my question.
Does anybody know the reason this prefix was chosen, and what it signifies?

it was standard for custom headers names to use the prefix "X-"
Actually … no, not at all. To be precise: It has never been a standard, just a best practice. It allowed implementors to introduce new content types and codings without the need to write an entire RFC for it. Nowadays the IANA Media Type Registry is good for that. RFC 6648 put an end to this practice.
The reason application/x-www-form-urlencoded is prefixed in this way (it is listed as a proper MIME type in said registry, btw)) stems from the fact that it is a "custom" method of structuring the query string in a URL. That part has never seen proper regulation. The people behind HTML just went and did it, which fully justified the prefix.

As far as the history: it has the x- prefix because it originated in a proposal from Mosaic—and since it was just a proposal, they used that x- extension prefix to initially define it. But then other browsers implemented it that way too, and nobody ever got around to taking the time to properly standardize an unprefixed alternative, so it just stuck that way, and here were are now.
It can be traced back to a 1993 thread on the www-talk mailing list titled “Submitting input-form data to server”, and in that thread, a September 1993 message from Marc Andreessen:
This is what we're doing in Mosaic 2.0… See
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
...for details on what we're up to
That link is broken now but the document, titled “Mosaic for X version 2.0 Fill-Out Form Support” is archived at archive.org. Here’s the relevant excerpt:
ENCTYPE specifies the encoding for the fill-out form contents. This attribute only applies if METHOD is set to POST -- and even then, there is only one possible value (the default, application/x-www-form-urlencoded) so far.
Anyway, application/x-www-form-urlencoded is now formally defined in the URL spec, with algorithms for parsing and serializing it—though the section it’s all defined in has this note:
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.

What RFC defines arrays transmitted over HTTP?

What RFC defines the passing arrays over HTTP? Most web application platforms allow you to supply an array of arguments over GET or POST. The following URL is an example:
http://localhost/?var[1]=one&var[2]=two&var[3]=three
RFC1738 defines URLs, however the bracket is missing from the Backus–Naur Form(BNF) definition of the URL. Also this RFC doesn't cover POST. Ideally I would like to get the BNF for this feature as defined in the RFC.

According to Wikipedia, there is no single spec:
While there is no definitive standard, most web frameworks allow multiple values to be associated with a single field (eg. field1=value1&field1=value2&field2=value3)
That Wikipedia article links to the following Stack Overflow post, which covers a similar question: Authoritative position of duplicate HTTP GET query keys
The issue here is that form parameters can be whatever you want them to be. Some web frameworks have settled on key[number]=value for arrays, others haven't. Interestingly, RFC1866 section 8.2.4, page 48 (note: this RFC is historical and not current) shows an example with the same key used twice in a form POST:
name=John+Doe
&gender=male
&family=5
&city=kent
&city=miami
&other=abc%0D%0Adef
&nickname=J%26D
On the W3C side of things, HTML 4.01 has some information about how to encode form parameters. Sadly this doesn't cover arrays.
At the time of writing, I don't think there is a correct answer to your question - no IETF RFC or W3C spec defines the behavior that you're interested in.
(As a side note, the W3C HTML JSON form submission draft spec covers posting arrays, thank goodness.)

URIs are defined by RFC 3986.
However, what you're asking about is encoding of form parameters. You need to look up the HTML spec for that.

HTTP/HTML: Resolution of double dots (..) in the URI (request, Location header etc.)

Are HTTP requests URIs allowed to contain ".." segments?
According to RFC 2616, section 5.1.2, they can refer to absolute URIs or absolute paths (the other options in that section are not relevant for this question).
The meaning of absolute URIs and absolute paths is described in RFC 3986, which also describes an algorithm to normalize paths (that includes remove single and double dot elements).
However, I can't find the exact specification whether an RFC conforming request URI can contain ".." segments - are they allowed in an absolute path/URI, and does the server have to normalize such URIs? Or is that up to the client?
Is there any difference for "Location:" response headers? According to the spec, they can only contain absolute URIs, but does that include ".." parts? Will the client have to normalize those too before requesting the referred resource?
To clarify, I know that URIs like ../foo are illegal in those situations, but what about http://example.com/../foo? Is that a valid absolute URI?
I'm currently redirecting clients to such URIs and would like to know if that is conforming to the specifications.

If you want to "know if that is conforming to the specifications," why don't you simply refer to the relevant specification?
RFC 3986 Section 5.2 is very clear on how URI dot segments should be resolved:
This section describes an algorithm for converting a URI reference
that might be relative to a given base URI into the parsed components
of the reference's target. The components can then be recomposed, as
described in Section 5.3, to form the target URI. This algorithm
provides definitive results that can be used to test the output of
other implementations. Applications may implement relative reference
resolution by using some other algorithm, provided that the results
match what would be given by this one.
If you are, for example, following Location: headers, it's usually prudent to normalize and resolve invalid relative paths (Location: headers are supposed to be absolute URIs). In these cases you should absolutely follow the instruction of RFC 3986 to resolve those paths against your base URI.
Should you pass around dot segments in your URIs all over the place? Probably not if you can help it because you're relying on other people to have implemented the specification correctly. But does passing URIs with dot segments violate the URI specification? No.

Syntactically speaking, http://example.com/../foo is a valid URI.
How the server interprets that URI is a different matter. Servers have to be very careful about how then translate URIs to file paths, for obvious security reasons. Usually the server will either strip out .. segments, or do some kind of post-processing to make sure the file path is inside the document root.

(Thank you for the great, crisp question in a topic full of hopeless public confusion, fueled by cryptic specs and surprising subtleties!)
... what about http://example.com/../foo? Is that a valid absolute URI?
No. It's an invalid absolute URI, because it attempts to refer to a place beyond the naming authority's namespace (root).
(Accordingly, I've been rewarded with due "400 Bad request" responses by servers when trying to feed them stuff like that.)
But, assuming you really meant to ask about valid, but equally non-normalized absolute paths like /root/../foo: #rdlowrey's answer is correct: better normalize them out yourself, if you can.
(Again, as an example, my proxy failed on pages that worked fine when sent to the same server by browsers, which go the extra mile normalizing the dot-parts out, instead of relying on servers doing the same.)
However, I can't find the exact specification whether an RFC
conforming request URI can contain ".." segments - are they allowed in
an absolute path/URI, and does the server have to normalize such URIs?
Or is that up to the client?
Unfortunately, you didn't find it because it's not specified, even in HTTP 2, AFAICT :-/

RESTful URLs and folders

On the Microformats spec for RESTful URLs:
GET /people/1
return the first record in HTML format
GET /people/1.html
return the first record in HTML format
and /people returns a list of people
So is /people.html the correct way to return a list of people in HTML format?

If you just refer to the URL path extension, then, yes, that scheme is the recommended behavior for content negotiation:
path without extension is a generic URL (e.g. /people for any accepted format)
path with extension is a specific URL (e.g. /people.json as a content-type-specific URL for the JSON data format)
With such a scheme the server can use content negotiation when the generic URL is requested and respond with a specific representation when a specific URL is requested.
Documents that recommend this scheme are among others:
Cool URIs don't change
Cool URIs for the Semantic Web
Content Negotiation: why it is useful, and how to make it work

You have the right idea. Both /people and /people.html would return HTML-formatted lists of people, and /people.json would return a JSON-formatted list of people.
There should be no confusion about this with regard to applying data-type extensions to "folders" in the URLs. In the list of examples, /people/1 is itself used as a folder for various other queries.

It says that GET /people/1.json should return the first record in JSON format. - Which makes sense.

URIs and how you design them have nothing to do with being RESTful or not.
It is a common practice to do what you ask, since that's how the Apache web server works. Let's say you have foo.txt and foo.html and foo.pdf, and ask to GET /foo with no preference (i.e. no Accept: header). A 300 MULTIPLE CHOICES would be returned with a listing of the three files so the user could pick. Because browsers do such marvelous content negotiation, it's hard to link to an example, but here goes: An example shows what it looks like, except for that the reason you see the page in the first place is the different case of the file name ("XSLT" vs "xslt").
But this Apache behaviour is echoed in conventions and different tools, but really it isn't important. You could have people_html or people?format=html or people.html or sandwiches or 123qweazrfvbnhyrewsxc6yhn8uk as the URI which returns people in HTML format. The client doesn't know any of these URIs up front, it's supposed to learn that from other resources. A human could see the result of All People (HTML format) and understand what happens, while ignoring the strange looking URI.
On a closing note, the microformats URL conventions page is absolutely not a spec for RESTful URLs, it's merely guidance on making URIs that apparently are easy to consume by various HTTP libraries for some reason or another, and has nothing to do with REST at all. The guidelines are all perfectly OK, and following them makes your URIs look sane to other people that happen to glance on the URIs (/sandwiches is admittedly odd). But even the cited AtomPub protocol doesn't require entries to live "within" the collection...

standard for resolving a urn:uuid (and other)?

My application uses urn:uuid as URIs for entities. Of course, when I get, e.g. RDF information about a resource, the referred entities (subject or objects) will contain URIs in the urn:uuid schema. To fetch the representation of the new entity, possibly in a REST way, I need a "resolver", similar in some way to dx.doi.org for DOIs. Another case could be the resolution of a isbn: URI, so to obtain a sensible representation of this URI.
My question is relative to what's out there, in terms of proposed standards, for URI-to-representation-URL resolution.

The concluded URN Working Group of the IETF has also done some work on resolving URNs and published quite a few RFCs on this topic. A list of references is contained in the group charter. Maybe some of them help you.

An UUID is a universally unique identifier, so I don't see how you would be able to resolve a uuid I just generated (e.g. 3136aa1a-fec8-11de-a55f-00003925d394) to something useful.
Only if you manage a database of uuids somewhere, you can retrieve more from it. Or you would have to ask everyone/everything "Do you know this uuid?"
The urn:uuid definition defines a clear space of unique identifiers you can use for defining something truly unique. But as nobody else can guess its value, you can't derive information from it.

There is no standard (proposed or otherwise) for resolving a URN. It's just a name (Uniform Resource NAME) and may have arbitrary meaning.
XML/RDF creates some confusion by using URNs which do resolve because they happen to also be URLs (Uniform Resource Locators) which point to objects describing their meaning, but this is merely a convention. They merely have to be unique and always mean the same thing.
If you are developing an application, you might want to consider use URNs which are also resolvable URLs for items with fixed meaning, and randomly generated URN's in the urn:uuid namespace to identify instances of objects.
That sounded about as confusing as the RDF spec:-)
Quick example:
Tiger: http://www.example.com/animals/tiger
Instance of a Tiger: urn:uuid:9a652678-4616-475d-af12-aca21cfbe06d
There might be a HTML page at http://www.example.com/animals/tiger, but there doesn't have to be. It's merely a convention.
[Additional Clarification Added]
The distinction here is between URNs (Names) and URLs (Locations).
A URN just names something. It's not a location of anything.
URLs are valid URNs, so you can use a URL for a URN if you want to.
In the above example, I could use e.g. http://www.example.com/tigers/9a652678-4616-475d-af12-aca21cfbe06d as the name of my tiger. I could put something at that address. But what would I put there? You can't download an instance of a tiger using http!
The convention in RDF is that if a URN is also a URL, it will point at some documentation defining what the name means.
What RDF is trying to give you is a convention for naming things which ensures that when two people use the same name, they mean the same thing. The UUID specification allows you to generate a unique name for something which is not likely to be used by anything else. But it's just a name, and there's no way of turning it into a thing.
Hope this helps.

One reason URNs exist is to give people the opportunity to create identifiers without the (implicit) responsibility of maintaining a service that describes the underlying resources. You could say that for RDF this is an advantage, but not a necessity, but you'd also be less inclined to use a particular vocabulary for example if you discovered that those HTTP URLs are no longer dereferenceable.
That being said, some URNs can be traced back to their representation. Here are some examples:
The ietf namespace defines several identifier schemes, so URIs like urn:ietf:rfc:2648 can be resolved if you implement the specific patterns.
Some namespaces are defined in other IANA registries, for example urn:ietf:params:xml: with the corresponding files for the resources.
Other namespaces point to already-established identifier spaces, like urn:isbn: (some metadata can be retrieved, but I don't think there is anything that will allow you to download the book from its ISBN), urn:oid:. There is also urn:publicid:, some of whose identifiers may be found somewhere deep inside ISO.
There is no general mechanism for URN resolution, and indeed there cannot be (that is also true for other URI schemes, like tag:).
Talking specifically about UUIDs, in my opinion, the best way out of this is not to use a URN at all. If you want to use a web server for the resolution, a "standard" way is to use the genid well-known service, thus your primary URI would be something like this: http://example.org/.well-known/genid/b47df9f0-a9c5-4e8a-9762-844a33ba7a3e. If you host RDF at that location, there is nothing wrong with adding owl:sameAs <urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e> there if you have to.
To my knowledge, there is only one method that is in use today to create a link that conveys the question "Do you know this URN?", well, kind of: a magnet: link. There is nothing in principle that would require you to use a hash there like you usually find, so something like magnet:?xt=urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e could work, provided you have your own client that can handle that.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex