Meaning of XML namespaces - xhtml

I'm wondering about the meaning of the following lines:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:h="http://java.sun.com/jsf/html"
xmlns:f="http://java.sun.com/jsf/core">
What exactly are these lines doing?
Obviously they bind the corresponding tag library to a prefix: h: for jsf/html, f: for jsf/core, and no prefix (the default namespace) for XHTML.
It is said that these tag libraries are "namespaces", meaning a collection of elements and attributes that defines the tags we can access through the given prefixes. Is that right?
Does that mean that the URIs above redirect me to some sort of DTD where the tag elements are defined?

Yes: https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=xml%20namespaces
A namespace name is a uniform resource identifier (URI). Typically, the URI chosen for the namespace of a given XML vocabulary describes a resource under the control of the author or organization defining the vocabulary, such as a URL for the author's Web server. However, the namespace specification does not require nor suggest that the namespace URI be used to retrieve information; it is simply treated by an XML parser as a string. For example, the document at http://www.w3.org/1999/xhtml itself does not contain any code. It simply describes the XHTML namespace to human readers. Using a URI (such as "http://www.w3.org/1999/xhtml") to identify a namespace, rather than a simple string (such as "xhtml"), reduces the probability of different namespaces using duplicate identifiers.
Although the term namespace URI is widespread, the W3C Recommendation refers to it as the namespace name. The specification is not entirely prescriptive about the precise rules for namespace names (it does not explicitly say that parsers must reject documents where the namespace name is not a valid Uniform Resource Identifier), and many XML parsers allow any character string to be used. In version 1.1 of the recommendation, the namespace name becomes an Internationalized Resource Identifier, which licenses the use of non-ASCII characters that in practice were already accepted by nearly all XML software. The term namespace URI persists, however, not only in popular usage, but also in many other specifications from W3C and elsewhere.
Following publication of the Namespaces recommendation, there was an intensive debate about how a relative URI should be handled, with some intensely arguing that it should simply be treated as a character string, and others arguing with conviction that it should be turned into an absolute URI by resolving it against the base URI of the document.[3] The result of the debate was a ruling from W3C that relative URIs were deprecated.[4]
The use of URIs taking the form of URLs in the http scheme (such as http://www.w3.org/1999/xhtml) is common, despite the absence of any formal relationship with the HTTP protocol. The Namespaces specification does not say what should happen if such a URL is dereferenced (that is, if software attempts to retrieve a document from this location). One convention adopted by some users is to place an RDDL document at the location.[5] In general, however, users should assume that the namespace URI is simply a name, not the address of a document on the Web.
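The "simply treated as a string" point can be checked directly. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree (my choice of parser, not something the question uses). It parses the three declarations from the question around a couple of JSF tags, chosen only for illustration, and lists each element's expanded name. Nothing is fetched from the network, and binding the same URI to a different prefix yields the same name, because only the namespace name survives parsing.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
JSF_HTML = "http://java.sun.com/jsf/html"
JSF_CORE = "http://java.sun.com/jsf/core"

# The declarations from the question, wrapped around two illustrative tags.
PAGE = f"""<html xmlns="{XHTML}" xmlns:h="{JSF_HTML}" xmlns:f="{JSF_CORE}">
  <f:view>
    <h:outputText value="Hello"/>
  </f:view>
</html>"""

# Same vocabulary, bound to a different prefix this time.
RENAMED = f"""<page xmlns:jsf="{JSF_HTML}">
  <jsf:outputText value="Hello"/>
</page>"""


def expanded_tags(xml_text: str) -> list:
    """Return every element name in document order as {namespace-name}local-name."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    for tag in expanded_tags(PAGE):
        print(tag)
    # After parsing the prefix is gone; h:outputText and jsf:outputText are one name.
    print(expanded_tags(RENAMED)[1] == expanded_tags(PAGE)[2])
```

The parser never looks at what lives at those addresses. A tag library such as JSF's is located by the framework, which matches the same string.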

Related

Clarification regarding validity of using data-URIs in CSS url()

I'm writing a pre-processing component (in PHP) which, in certain contexts, rewrites external image file requests in CSS such as:
background-image: url('/my-folder/my-image.png');
as CSS-inlined Data URIs, such as:
background-image: url('data:image/png;base64,[Base-64 Encoding Here]');
I've just read (with some surprise) over at MDN:
In CSS Level 1, the url() functional notation described only true
URLs. In CSS Level 2, the definition of url() was extended to describe
any URI, such as a data-uri. CSS Values and Units Level 3 returned to
the narrower, initial definition. Now, url() denotes only true <url>s.
Source: https://developer.mozilla.org/en-US/docs/Web/CSS/url()
Really? This would seem to suggest that Data-URIs constitute an invalid value for url() in CSS Stylesheets (?)
But I can find nothing in:
https://www.w3.org/TR/css-values-3/
that backs this up.
I was under the impression that a Data-URI is an entirely valid value for url() in CSS Stylesheets.
Can anyone clarify (ideally with an authoritative reference), please?
N.B. The tag below reads w3c-validation - I recognise it should probably read what-wg-validation.
data: URIs are valid URLs as defined by RFC 2397, so don't worry: they are still allowed.
I'm not sure what the MDN article meant by "such as a data-uri", but I edited it to say URN instead, since that is what actually happened in CSS 2:
The specs did indeed extend the <url> notation to all URIs by allowing Uniform Resource Names as well. I can't tell why they made this change, and it seems very weird, to say the least, since I can't see how a URN could be of any use in a stylesheet. Judging from the spec's wording, its authors didn't quite know yet what URNs would turn out to be.
URLs (Uniform Resource Locators, see [RFC1738] and [RFC1808]) provide the address of a resource on the Web. An expected new way of identifying resources is called URN (Uniform Resource Name). Together they are called URIs (Uniform Resource Identifiers, see [URI]). This specification uses the term URI.
P.S. Current specs call them "data: URLs", the term used by the Fetch standard.
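For reference, the rewrite described in the question comes down to base64-encoding the file and prefixing it with an RFC 2397 header. The asker's component is PHP. What follows is a minimal Python sketch of the same idea, assuming the MIME type can be guessed from the file extension. `to_data_uri`, `css_background` and `inline_background` are hypothetical helper names.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as an RFC 2397 data: URL with a base64 payload."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def css_background(uri: str) -> str:
    # The base64 alphabet (A-Z, a-z, 0-9, +, /, =) never contains a quote,
    # so wrapping the value in single quotes inside url() is safe.
    return f"background-image: url('{uri}');"


def inline_background(image_path: str) -> str:
    """Read an image file from disk and return a CSS declaration with it inlined."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {image_path!r}")
    return css_background(to_data_uri(Path(image_path).read_bytes(), mime_type))
```

Mapping a URL path such as /my-folder/my-image.png to a file on disk is left out here. It depends entirely on how the pre-processor is deployed.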

Why do URIs of specifications/vocabularies contain date information?

Most example namespace URIs seem to contain some combination of year/month/day in their path:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
It's not obvious (to me) why it makes sense to include part of the created date in the URI when the concepts that are included in the vocabularies are not exactly temporal.
EDIT
There may be additional confusion because of old W3C (and potentially other organizations') notes lying around that still rank highly in search results for semantic web topics. For example, this note from a W3C users group recommends the use of dates in URIs.
Manageability.
Issue your URIs in a way that you can manage. One good practice is to include the current year in the URI path, so that you can change the URI-schema each year without breaking older URIs.
@cygri is still correct (the linked note predates 2010); I'm just pointing this out for people who come across conflicting information.
It's not obvious (to me) why it makes sense to include part of the created date in the URI when the concepts that are included in the vocabularies are not exactly temporal.
It's a bad idea and shouldn't be done.
These namespaces are from the very early days of RDF, when good practices around URI management for the Semantic Web were not yet understood. Today, W3C uses much shorter and undated namespaces like http://www.w3.org/ns/csvw# for new vocabularies, but changing the old namespaces is not really possible given the huge amount of data and tools that are already published with these namespaces baked in.
So why did W3C think it was a good idea to include the date back then?
Because W3C includes date information in pretty much all their URIs. It's the date when the URI was allocated. It's their way of ensuring that URIs are unique and don't accidentally clash. So, all URIs that were allocated in the year 2000 have a path that starts with /2000/, all from 2001 start with /2001/ and so on. For “high-value” documents, like W3C standards, they also allocate a short alias such as http://www.w3.org/TR/html.
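The allocation convention is regular enough to read back out of a URI. Here is a small, hypothetical Python helper (not a W3C tool) that extracts the year from the leading path segment of a www.w3.org URI. It assumes the /YYYY/ prefix described above and nothing more.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# A leading /YYYY/ path segment: W3C's allocation-year convention.
_YEAR_PREFIX = re.compile(r"/(\d{4})/")


def w3c_allocation_year(uri: str) -> Optional[int]:
    """Return the allocation year encoded at the start of a www.w3.org URI path, if any."""
    parts = urlsplit(uri)
    if parts.hostname != "www.w3.org":
        return None  # the convention is W3C's own, not a general rule
    match = _YEAR_PREFIX.match(parts.path)  # match() anchors at the start of the path
    return int(match.group(1)) if match else None
```

Newer undated namespaces such as http://www.w3.org/ns/csvw#, and non-W3C ones like http://schema.org/Person, simply return None.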
I suppose back then they thought that short aliases were not necessary for vocabularies, because only machines would see those URIs, and namespace prefixes would hide them from view.
Today, the general wisdom is to “leave out as much as possible” when allocating URIs. So, schema.org with class URIs like http://schema.org/Person is pretty much perfect.
Most namespace URIs seem to contain some combination of year/month/day in their path
That's not really true. If a namespace URI has a date in it, it's probably a W3C URI from before 2010. Most namespace URIs don't have dates in them.

Why is the namespace pattern for CustomData a full http address and index suffix

In UI5, customdata is used to hold additional parameters that are passed to event handlers via the eventControl.data() object. This is akin to jQuery's data functionality.
In an XML view definition for UI5 we have to declare namespaces. In all the documentation and examples I have come across on the web so far, the namespace for customdata references a specific http address AND ends with a specific index number, whereas none of the other namespaces I have met so far include either.
See example below where customdata is the last entry in the list.
<mvc:View
controllerName="sapui5.muSample.controller.Master"
xmlns:mvc="sap.ui.core.mvc"
xmlns:l="sap.ui.layout"
xmlns="sap.m"
xmlns:c="sap.ui.core"
xmlns:app="http://schemas.sap.com/sapui5/extension/sap.ui.core.CustomData/1"
>
If I want to run OpenUI5 entirely from my server's hard drive, without referring to any external sources, how can I change the reference to customdata? Or am I misunderstanding, and the http prefix does NOT trigger any communication with the given address? If so, I am even more confused.
I would like to understand the reason for the full http reference and index suffix AND know how to replace it with an entirely local reference. I tried to make a local reference but without success and I suspect there is more going on here than I grok.
EDIT: I have learned from the Internet and proven with Fiddler that the web address used in the namespace is not for use in communications. So I guess the clue might be the /1 suffix. Still confused though.
Answering for myself in case it helps someone else and to solicit corrections from community if needed.
I searched the debug source of UI5 for the string "http://schemas.sap.com/sapui5/extension/sap.ui.core.CustomData/1" and found a match in XMLTemplateProcessor-dbg.js wherein there is an if statement seeking this precise string.
My conclusions are:
This is nothing more than a namespace that does not match the general pattern in UI5 namespacing that I have experienced to date. Possibly a 'coded-by-a-separate-team' issue or some similar human cause.
The namespace text is arbitrary and has no meaning other than being a match to that needed in the UI5 code.
In a general namespace context, the /1 suffix would be an indication of version number of the namespace - a way of versioning the namespace over time. But this has no impact in this usage in UI5.
The http prefix, in the UI5 context, has no implication of communication and does not make the client reach out to the specified web address (proven by Fiddler observations).
End.
According to the documentation the difference between this namespace and the other namespaces is intentional. Whatever that means. In general the URI is used to define unique namespace names and is not used by the parser.
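The self-answer's conclusion (a plain string comparison, no network access) can be illustrated outside UI5. This Python sketch is not UI5's code. It only mimics the kind of exact match the self-answer found in XMLTemplateProcessor-dbg.js, collecting `app:*` attributes by their namespace name. The Button and its `itemId`/`mode` attributes are made up for the example, and `custom_data` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# The namespace name the self-answer found hard-coded in XMLTemplateProcessor-dbg.js.
CUSTOM_DATA_NS = "http://schemas.sap.com/sapui5/extension/sap.ui.core.CustomData/1"

# A cut-down view; the Button and its app:* attributes are illustrative only.
VIEW = f"""<mvc:View xmlns:mvc="sap.ui.core.mvc" xmlns="sap.m"
    xmlns:app="{CUSTOM_DATA_NS}">
  <Button text="Delete" app:itemId="42" app:mode="confirm"/>
</mvc:View>"""


def custom_data(element: ET.Element, ns: str = CUSTOM_DATA_NS) -> Dict[str, str]:
    """Collect attributes whose namespace name is exactly `ns` (a plain string match)."""
    prefix = "{" + ns + "}"
    return {
        name[len(prefix):]: value
        for name, value in element.attrib.items()
        if name.startswith(prefix)
    }


if __name__ == "__main__":
    button = ET.fromstring(VIEW).find("{sap.m}Button")
    print(custom_data(button))
    # Any other string, "local" or not, simply fails to match.
    print(custom_data(button, ns="file:///opt/openui5/CustomData"))
```

If UI5 really does compare the exact string, as the self-answer suggests, a "local" URI would not load anything locally. It would only stop the attributes from being recognized as custom data. The http address can stay as-is when running offline.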

HTTP/HTML: Resolution of double dots (..) in the URI (request, Location header etc.)

Are HTTP request URIs allowed to contain ".." segments?
According to RFC 2616, section 5.1.2, they can refer to absolute URIs or absolute paths (the other options in that section are not relevant for this question).
The meaning of absolute URIs and absolute paths is described in RFC 3986, which also describes an algorithm to normalize paths (including the removal of single- and double-dot segments).
However, I can't find the exact specification whether an RFC conforming request URI can contain ".." segments - are they allowed in an absolute path/URI, and does the server have to normalize such URIs? Or is that up to the client?
Is there any difference for "Location:" response headers? According to the spec, they can only contain absolute URIs, but does that include ".." parts? Will the client have to normalize those too before requesting the referred resource?
To clarify, I know that URIs like ../foo are illegal in those situations, but what about http://example.com/../foo? Is that a valid absolute URI?
I'm currently redirecting clients to such URIs and would like to know if that is conforming to the specifications.
If you want to "know if that is conforming to the specifications," why don't you simply refer to the relevant specification?
RFC 3986 Section 5.2 is very clear on how URI dot segments should be resolved:
This section describes an algorithm for converting a URI reference
that might be relative to a given base URI into the parsed components
of the reference's target. The components can then be recomposed, as
described in Section 5.3, to form the target URI. This algorithm
provides definitive results that can be used to test the output of
other implementations. Applications may implement relative reference
resolution by using some other algorithm, provided that the results
match what would be given by this one.
If you are, for example, following Location: headers, it's usually prudent to normalize and resolve invalid relative paths (Location: headers are supposed to be absolute URIs). In these cases you should absolutely follow the instruction of RFC 3986 to resolve those paths against your base URI.
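Following "the instruction of RFC 3986" in practice looks like this. Here is the remove_dot_segments algorithm from Section 5.2.4 transcribed into Python, plus a hypothetical `normalize_url` helper that applies it to the path of an absolute URL. This is a sketch for illustration and does not replace a full URI library.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """RFC 3986, Section 5.2.4: remove "." and ".." segments from a path."""
    inp, out = path, []  # `out` holds segments, each with its leading "/" if any
    while inp:
        if inp.startswith("../"):            # rule A
            inp = inp[3:]
        elif inp.startswith("./"):           # rule A
            inp = inp[2:]
        elif inp.startswith("/./"):          # rule B: "/./x" -> "/x"
            inp = inp[2:]
        elif inp == "/.":                    # rule B
            inp = "/"
        elif inp.startswith("/../"):         # rule C: "/../x" -> "/x", drop last segment
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":                   # rule C
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):             # rule D
            inp = ""
        else:                                # rule E: move one segment to the output
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def normalize_url(url: str) -> str:
    """Hypothetical helper: apply remove_dot_segments to an absolute URL's path."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))
```

Under this algorithm http://example.com/../foo normalizes to http://example.com/foo, because rule C has nothing left to remove and the surplus ".." is silently dropped. Whether a server accepts the un-normalized form is a separate question, which the other answers address.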
Should you pass around dot segments in your URIs all over the place? Probably not if you can help it because you're relying on other people to have implemented the specification correctly. But does passing URIs with dot segments violate the URI specification? No.
Syntactically speaking, http://example.com/../foo is a valid URI.
How the server interprets that URI is a different matter. Servers have to be very careful about how they translate URIs to file paths, for obvious security reasons. Usually the server will either strip out .. segments or do some kind of post-processing to make sure the file path is inside the document root.
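The "post-processing" option can be sketched as follows. The sketch assumes POSIX paths and a hypothetical document root of /srv/www. `map_to_docroot` is an illustrative name, not any particular server's API.

```python
import posixpath
from urllib.parse import unquote


def map_to_docroot(docroot: str, url_path: str) -> str:
    """Translate a request path to a file path that must stay inside `docroot`.

    Normalize first, then check containment. Symlinks are not resolved here;
    a real server would need to handle them as well.
    """
    root = posixpath.normpath(docroot)
    # Decode percent-escapes (e.g. %2e%2e) *before* normalizing, or they slip through.
    relative = unquote(url_path).lstrip("/")
    candidate = posixpath.normpath(posixpath.join(root, relative))
    # commonpath compares whole components, so /srv/wwwevil does not pass as /srv/www.
    if posixpath.commonpath([root, candidate]) != root:
        raise PermissionError(f"{url_path!r} escapes the document root")
    return candidate
```

A request for /a/../b.html maps to /srv/www/b.html. /../etc/passwd and its percent-encoded variants are rejected rather than served.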
(Thank you for the great, crisp question in a topic full of hopeless public confusion, fueled by cryptic specs and surprising subtleties!)
... what about http://example.com/../foo? Is that a valid absolute URI?
No. It's an invalid absolute URI, because it attempts to refer to a place beyond the naming authority's namespace (root).
(Accordingly, I've been rewarded with due "400 Bad request" responses by servers when trying to feed them stuff like that.)
But, assuming you really meant to ask about valid but equally non-normalized absolute paths like /root/../foo: @rdlowrey's answer is correct: better normalize them out yourself, if you can.
(Again, as an example, my proxy failed on pages that worked fine when sent to the same server by browsers, which go the extra mile normalizing the dot-parts out, instead of relying on servers doing the same.)
However, I can't find the exact specification whether an RFC
conforming request URI can contain ".." segments - are they allowed in
an absolute path/URI, and does the server have to normalize such URIs?
Or is that up to the client?
Unfortunately, you didn't find it because it's not specified, even in HTTP 2, AFAICT :-/

Standard for resolving a urn:uuid (and others)?

My application uses urn:uuid URIs for entities. Of course, when I get, e.g., RDF information about a resource, the referred entities (subjects or objects) will contain URIs in the urn:uuid scheme. To fetch the representation of such an entity, possibly in a REST way, I need a "resolver", similar in some way to dx.doi.org for DOIs. Another case would be resolving an isbn: URI in order to obtain a sensible representation of it.
My question is about what's out there, in terms of proposed standards, for URI-to-representation-URL resolution.
The concluded URN Working Group of the IETF has also done some work on resolving URNs and published quite a few RFCs on this topic. A list of references is contained in the group charter. Maybe some of them help you.
A UUID is a universally unique identifier, so I don't see how you would be able to resolve a UUID I just generated (e.g. 3136aa1a-fec8-11de-a55f-00003925d394) to something useful.
Only if you manage a database of uuids somewhere, you can retrieve more from it. Or you would have to ask everyone/everything "Do you know this uuid?"
The urn:uuid definition defines a clear space of unique identifiers you can use for defining something truly unique. But as nobody else can guess its value, you can't derive information from it.
There is no standard (proposed or otherwise) for resolving a URN. It's just a name (Uniform Resource NAME) and may have arbitrary meaning.
XML/RDF creates some confusion by using URNs which do resolve because they happen to also be URLs (Uniform Resource Locators) which point to objects describing their meaning, but this is merely a convention. They merely have to be unique and always mean the same thing.
If you are developing an application, you might want to consider using URNs that are also resolvable URLs for items with fixed meaning, and randomly generated URNs in the urn:uuid namespace to identify instances of objects.
That sounded about as confusing as the RDF spec:-)
Quick example:
Tiger: http://www.example.com/animals/tiger
Instance of a Tiger: urn:uuid:9a652678-4616-475d-af12-aca21cfbe06d
There might be a HTML page at http://www.example.com/animals/tiger, but there doesn't have to be. It's merely a convention.
[Additional Clarification Added]
The distinction here is between URNs (Names) and URLs (Locations).
A URN just names something. It's not a location of anything.
URLs are valid URNs, so you can use a URL for a URN if you want to.
In the above example, I could use e.g. http://www.example.com/tigers/9a652678-4616-475d-af12-aca21cfbe06d as the name of my tiger. I could put something at that address. But what would I put there? You can't download an instance of a tiger using http!
The convention in RDF is that if a URN is also a URL, it will point at some documentation defining what the name means.
What RDF is trying to give you is a convention for naming things which ensures that when two people use the same name, they mean the same thing. The UUID specification allows you to generate a unique name for something which is not likely to be used by anything else. But it's just a name, and there's no way of turning it into a thing.
Hope this helps.
One reason URNs exist is to give people the opportunity to create identifiers without the (implicit) responsibility of maintaining a service that describes the underlying resources. You could say that for RDF this is an advantage rather than a necessity; on the other hand, you would be less inclined to use a particular vocabulary if, for example, you discovered that its HTTP URLs were no longer dereferenceable.
That being said, some URNs can be traced back to their representation. Here are some examples:
The ietf namespace defines several identifier schemes, so URIs like urn:ietf:rfc:2648 can be resolved if you implement the specific patterns.
Some namespaces are defined in other IANA registries, for example urn:ietf:params:xml: with the corresponding files for the resources.
Other namespaces point to already-established identifier spaces, like urn:isbn: (some metadata can be retrieved, but I don't think there is anything that will allow you to download the book from its ISBN), urn:oid:. There is also urn:publicid:, some of whose identifiers may be found somewhere deep inside ISO.
There is no general mechanism for URN resolution, and indeed there cannot be (that is also true for other URI schemes, like tag:).
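Any "resolver" in this sense is a per-namespace lookup. The following Python sketch dispatches on the URN's namespace identifier (NID), and all names in it are hypothetical. urn:ietf:rfc: numbers map to a URL pattern on the RFC Editor site, a pattern I chose for illustration and not something the ietf namespace itself mandates. urn:uuid: values resolve only if you recorded them yourself, which matches the earlier answers about UUIDs. Everything else is unresolvable.

```python
import uuid
from typing import Callable, Dict, Optional

# Hypothetical local index: a urn:uuid resolves only if *you* stored it somewhere.
LOCAL_UUID_INDEX: Dict[uuid.UUID, str] = {}


def _resolve_ietf(nss: str) -> Optional[str]:
    # Handles only the "rfc:NNNN" form, e.g. urn:ietf:rfc:2648.
    kind, _, number = nss.partition(":")
    if kind.lower() == "rfc" and number.isascii() and number.isdigit():
        return f"https://www.rfc-editor.org/rfc/rfc{int(number)}"
    return None


def _resolve_uuid(nss: str) -> Optional[str]:
    try:
        return LOCAL_UUID_INDEX.get(uuid.UUID(nss))
    except ValueError:  # not a well-formed UUID
        return None


# Namespace identifier (NID) -> resolver for its namespace-specific string (NSS).
RESOLVERS: Dict[str, Callable[[str], Optional[str]]] = {
    "ietf": _resolve_ietf,
    "uuid": _resolve_uuid,
}


def resolve_urn(urn: str) -> Optional[str]:
    """Return a URL for a URN if some registered resolver knows it, else None."""
    scheme, _, rest = urn.partition(":")
    nid, sep, nss = rest.partition(":")
    if scheme.lower() != "urn" or not sep:
        return None
    resolver = RESOLVERS.get(nid.lower())
    return resolver(nss) if resolver else None
```

Supporting urn:isbn: or urn:oid: would mean registering yet another resolver backed by some external database. That is exactly why no general mechanism exists.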
Talking specifically about UUIDs, in my opinion, the best way out of this is not to use a URN at all. If you want to use a web server for the resolution, a "standard" way is to use the genid well-known service, thus your primary URI would be something like this: http://example.org/.well-known/genid/b47df9f0-a9c5-4e8a-9762-844a33ba7a3e. If you host RDF at that location, there is nothing wrong with adding owl:sameAs <urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e> there if you have to.
To my knowledge, there is only one method that is in use today to create a link that conveys the question "Do you know this URN?", well, kind of: a magnet: link. There is nothing in principle that would require you to use a hash there like you usually find, so something like magnet:?xt=urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e could work, provided you have your own client that can handle that.
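Both suggestions above are easy to produce mechanically. Here is a minimal Python sketch, assuming http://example.org as the base, as in the answer. `genid_uri`, `same_as_ntriple` and `magnet_link` are hypothetical helper names. The RDF output is a single N-Triples line, not a full document.

```python
import uuid

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def genid_uri(base: str, ident: uuid.UUID) -> str:
    """Primary, dereferenceable URI under the genid well-known path."""
    return f"{base.rstrip('/')}/.well-known/genid/{ident}"


def same_as_ntriple(base: str, ident: uuid.UUID) -> str:
    """One N-Triples statement linking the genid URI to the equivalent urn:uuid."""
    return f"<{genid_uri(base, ident)}> <{OWL_SAME_AS}> <{ident.urn}> ."


def magnet_link(ident: uuid.UUID) -> str:
    # Puts the URN in xt; only a client you write yourself would know what to do with it.
    return f"magnet:?xt={ident.urn}"


if __name__ == "__main__":
    ident = uuid.uuid4()
    print(genid_uri("http://example.org", ident))
    print(same_as_ntriple("http://example.org", ident))
    print(magnet_link(ident))
```

`uuid.UUID.urn` already yields the urn:uuid: form, so the same identifier is used consistently in all three places.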
