Why do URIs of specifications/vocabularies contain date information? - uri

Most example namespace URIs seem to contain some combination of year/month/day in their path:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#>
It's not obvious (to me) why it makes sense to include part of the created date in the URI when the concepts that are included in the vocabularies are not exactly temporal.
EDIT
There may be additional confusion because of old w3c (and potentially other org) notes lying around that are still high up on SEO for semantic web. For example, this note from a w3c users group recommends the use of dates in URIs.
Manageability.
Issue your URIs in a way that you can manage. One good practice is to include the current year in the URI path, so that you can change the URI-schema each year without breaking older URIs.
#cygri is still correct (the link is pre-2010), just pointing this out for people that come across conflicting information.

It's not obvious (to me) why it makes sense to include part of the created date in the URI when the concepts that are included in the vocabularies are not exactly temporal.
It's a bad idea and shouldn't be done.
These namespaces are from the very early days of RDF, when good practices around URI management for the Semantic Web were not yet understood. Today, W3C uses much shorter and undated namespaces like http://www.w3.org/ns/csvw# for new vocabularies, but changing the old namespaces is not really possible given the huge amount of data and tools that are already published with these namespaces baked in.
So why did W3C think it was a good idea to include the date back then?
Because W3C includes date information in pretty much all their URIs. It's the date when the URI was allocated. It's their way of ensuring that URIs are unique and don't accidentally clash. So, all URIs that were allocated in the year 2000 have a path that starts with /2000/, all from 2001 start with /2001/ and so on. For “high-value” documents, like W3C standards, they also allocate a short alias such as http://www.w3.org/TR/html.
I suppose back then they thought that short aliases are not necessary for vocabularies, because only machines would see those URIs, and namespace prefixes would be used to hide them from view.
Today, the general wisdom is to “leave out as much as possible” when allocating URIs. So, schema.org with class URIs like http://schema.org/Person is pretty much perfect.
Most namespace URIs seem to contain some combination of year/month/day in their path
That's not really true. If a namespace URI has a date in it, it's probably a W3C URI from before 2010. Most namespace URIs don't have dates in them.

Related

How do I write a two-word type in JSON:API?

I want to post an alien artifact to my server.
Do I write type="alienArtifact"
or type="alien-artifact"
or something completely different?
I looked here https://jsonapi.org/format/ but is only deals with simple types, "objects".
JSON:API specification is agnostic if you use kebab-case, snake_case or camelCase for field names but it comes with a recommendation:
Member names SHOULD be camel-cased (i.e., wordWordWord)
This recommendation was change in October 2018. It was kebab-case before. Therefor many articles about JSON:API specification and documentation for libraries are still using kebab-case. These arguments were given as main reasons for this change:
camelCased names can be used directly as identifiers in almost all programming languages, making json:api easier to get started with and to work with over time. Dasherized names are usable as identifiers pretty much only in Lisps.
camelCased names are the most common convention in the JS community, and JS is the biggest user of JSON:API
camelCased names are slightly shorter than dasherized (or snake-case) names, which could help with url character limits in
the case of complex filters or other types of complex queries
(e.g., some GraphQL like “deep querying” feature) that we might
serialize into urls in the future.
You could find more details about this change in the appropriate pull request.

itemtype with http or better https?

I use like:
itemtype="http://schema.org/ImageObject"
but the request http://schema.org/ImageObject will be forwarded to https://schema.org/ImageObject.
If I change to itemtype="https://schema.org/ImageObject", the Google SDTT shows no problem, but nearly all examples about structured data from Google are with http.
What is best or recommended to use http://schema.org or https://schema.org for itemtype?
From Schema.org’s FAQs:
Q: Should we write https://schema.org or http://schema.org in our markup?
There is a general trend towards using https more widely, and you can already write https://schema.org in your structured data. Over time we will migrate the schema.org site itself towards using https: as the default version of the site and our preferred form in examples. However http://schema.org -based URLs in structured data markup will remain widely understood for the forseeable future and there should be no urgency about migrating existing data. This is a lengthy way of saying that both https://schema.org and http://schema.org are fine.
tl;dr: Both variants are possible.
The purpose of itemtype URIs
Note that the URIs used for itemtype are primarily identifiers, they typically don’t get dereferenced:
If a Microdata consumer doesn’t know what the URI in itemtype="http://schema.org/ImageObject" stands for, this consumer "must not automatically dereference" it.
If a Microdata consumer does know what the URI stands for, this consumer has no need to dereference this URI in the first place.
So, there is no technical reason to prefer the HTTPS variant. User agents won’t dereference this URI (in contrast to URIs specified in href/src attributes), and users can’t click on it. I think there is only one case where the HTTPS variant is useful: if a visitor looks into the source code and copy-pastes the URI to check what the type is about.
I would recommend to stick with the HTTP variant until Schema.org switched everything to HTTPS, most importantly the URI in RDF’a initial context.
The specification of Schema for the type ImageObject indicated:
Canonical URL: http://schema.org/ImageObject
It is probably useful to refer to the canonical URL because it is the “preferred” version of the web page.

Why does 'x-www-form-urlencoded' begin with 'x-www', when other standard content types do not?

I understand that in the past, it was standard for custom headers names to use the prefix "X-" (I'm aware it no longer is considered standard to do this), but I've been unable to find if there is any relationship between this naming convention and the value ("application/x-www-form-urlencoded"). Did it start out as a custom content-type value that was later adopted or something?
I found this link here, which certainly was interesting, but have been unable to find the answer to my question.
Does anybody know the reason this prefix was chosen, and what it signifies?
it was standard for custom headers names to use the prefix "X-"
Actually … no, not at all. To be precise: It has never been a standard, just a best practice. It allowed implementors to introduce new content types and codings without the need to write an entire RFC for it. Nowadays the IANA Media Type Registry is good for that. RFC 6648 put an end to this practice.
The reason application/x-www-form-urlencoded is prefixed in this way (it is listed as a proper MIME type in said registry, btw)) stems from the fact that it is a "custom" method of structuring the query string in a URL. That part has never seen proper regulation. The people behind HTML just went and did it, which fully justified the prefix.
As far as the history: it has the x- prefix because it originated in a proposal from Mosaic—and since it was just a proposal, they used that x- extension prefix to initially define it. But then other browsers implemented it that way too, and nobody ever got around to taking the time to properly standardize an unprefixed alternative, so it just stuck that way, and here were are now.
It can be traced back to a 1993 thread on the www-talk mailing list titled “Submitting input-form data to server”, and in that thread, a September 1993 message from Marc Andreessen:
This is what we're doing in Mosaic 2.0… See
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html
...for details on what we're up to
That link is broken now but the document, titled “Mosaic for X version 2.0 Fill-Out Form Support” is archived at archive.org. Here’s the relevant excerpt:
ENCTYPE specifies the encoding for the fill-out form contents. This attribute only applies if METHOD is set to POST -- and even then, there is only one possible value (the default, application/x-www-form-urlencoded) so far.
Anyway, application/x-www-form-urlencoded is now formally defined in the URL spec, with algorithms for parsing and serializing it—though the section it’s all defined in has this note:
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.

RESTful URLs and folders

On the Microformats spec for RESTful URLs:
GET /people/1
return the first record in HTML format
GET /people/1.html
return the first record in HTML format
and /people returns a list of people
So is /people.html the correct way to return a list of people in HTML format?
If you just refer to the URL path extension, then, yes, that scheme is the recommended behavior for content negotiation:
path without extension is a generic URL (e.g. /people for any accepted format)
path with extension is a specific URL (e.g. /people.json as a content-type-specific URL for the JSON data format)
With such a scheme the server can use content negotiation when the generic URL is requested and respond with a specific representation when a specific URL is requested.
Documents that recommend this scheme are among others:
Cool URIs don't change
Cool URIs for the Semantic Web
Content Negotiation: why it is useful, and how to make it work
You have the right idea. Both /people and /people.html would return HTML-formatted lists of people, and /people.json would return a JSON-formatted list of people.
There should be no confusion about this with regard to applying data-type extensions to "folders" in the URLs. In the list of examples, /people/1 is itself used as a folder for various other queries.
It says that GET /people/1.json should return the first record in JSON format. - Which makes sense.
URIs and how you design them have nothing to do with being RESTful or not.
It is a common practice to do what you ask, since that's how the Apache web server works. Let's say you have foo.txt and foo.html and foo.pdf, and ask to GET /foo with no preference (i.e. no Accept: header). A 300 MULTIPLE CHOICES would be returned with a listing of the three files so the user could pick. Because browsers do such marvelous content negotiation, it's hard to link to an example, but here goes: An example shows what it looks like, except for that the reason you see the page in the first place is the different case of the file name ("XSLT" vs "xslt").
But this Apache behaviour is echoed in conventions and different tools, but really it isn't important. You could have people_html or people?format=html or people.html or sandwiches or 123qweazrfvbnhyrewsxc6yhn8uk as the URI which returns people in HTML format. The client doesn't know any of these URIs up front, it's supposed to learn that from other resources. A human could see the result of All People (HTML format) and understand what happens, while ignoring the strange looking URI.
On a closing note, the microformats URL conventions page is absolutely not a spec for RESTful URLs, it's merely guidance on making URIs that apparently are easy to consume by various HTTP libraries for some reason or another, and has nothing to do with REST at all. The guidelines are all perfectly OK, and following them makes your URIs look sane to other people that happen to glance on the URIs (/sandwiches is admittedly odd). But even the cited AtomPub protocol doesn't require entries to live "within" the collection...

standard for resolving a urn:uuid (and other)?

My application uses urn:uuid as URIs for entities. Of course, when I get, e.g. RDF information about a resource, the referred entities (subject or objects) will contain URIs in the urn:uuid schema. To fetch the representation of the new entity, possibly in a REST way, I need a "resolver", similar in some way to dx.doi.org for DOIs. Another case could be the resolution of a isbn: URI, so to obtain a sensible representation of this URI.
My question is relative to what's out there, in terms of proposed standards, for URI-to-representation-URL resolution.
The concluded URN Working Group of the IETF has also done some work on resolving URNs and published quite a few RFCs on this topic. A list of references is contained in the group charter. Maybe some of them help you.
An UUID is a universally unique identifier, so I don't see how you would be able to resolve a uuid I just generated (e.g. 3136aa1a-fec8-11de-a55f-00003925d394) to something useful.
Only if you manage a database of uuids somewhere, you can retrieve more from it. Or you would have to ask everyone/everything "Do you know this uuid?"
The urn:uuid definition defines a clear space of unique identifiers you can use for defining something truly unique. But as nobody else can guess its value, you can't derive information from it.
There is no standard (proposed or otherwise) for resolving a URN. It's just a name (Uniform Resource NAME) and may have arbitrary meaning.
XML/RDF creates some confusion by using URNs which do resolve because they happen to also be URLs (Uniform Resource Locators) which point to objects describing their meaning, but this is merely a convention. They merely have to be unique and always mean the same thing.
If you are developing an application, you might want to consider use URNs which are also resolvable URLs for items with fixed meaning, and randomly generated URN's in the urn:uuid namespace to identify instances of objects.
That sounded about as confusing as the RDF spec:-)
Quick example:
Tiger: http://www.example.com/animals/tiger
Instance of a Tiger: urn:uuid:9a652678-4616-475d-af12-aca21cfbe06d
There might be a HTML page at http://www.example.com/animals/tiger, but there doesn't have to be. It's merely a convention.
[Additional Clarification Added]
The distinction here is between URNs (Names) and URLs (Locations).
A URN just names something. It's not a location of anything.
URLs are valid URNs, so you can use a URL for a URN if you want to.
In the above example, I could use e.g. http://www.example.com/tigers/9a652678-4616-475d-af12-aca21cfbe06d as the name of my tiger. I could put something at that address. But what would I put there? You can't download an instance of a tiger using http!
The convention in RDF is that if a URN is also a URL, it will point at some documentation defining what the name means.
What RDF is trying to give you is a convention for naming things which ensures that when two people use the same name, they mean the same thing. The UUID specification allows you to generate a unique name for something which is not likely to be used by anything else. But it's just a name, and there's no way of turning it into a thing.
Hope this helps.
One reason URNs exist is to give people the opportunity to create identifiers without the (implicit) responsibility of maintaining a service that describes the underlying resources. You could say that for RDF this is an advantage, but not a necessity, but you'd also be less inclined to use a particular vocabulary for example if you discovered that those HTTP URLs are no longer dereferenceable.
That being said, some URNs can be traced back to their representation. Here are some examples:
The ietf namespace defines several identifier schemes, so URIs like urn:ietf:rfc:2648 can be resolved if you implement the specific patterns.
Some namespaces are defined in other IANA registries, for example urn:ietf:params:xml: with the corresponding files for the resources.
Other namespaces point to already-established identifier spaces, like urn:isbn: (some metadata can be retrieved, but I don't think there is anything that will allow you to download the book from its ISBN), urn:oid:. There is also urn:publicid:, some of whose identifiers may be found somewhere deep inside ISO.
There is no general mechanism for URN resolution, and indeed there cannot be (that is also true for other URI schemes, like tag:).
Talking specifically about UUIDs, in my opinion, the best way out of this is not to use a URN at all. If you want to use a web server for the resolution, a "standard" way is to use the genid well-known service, thus your primary URI would be something like this: http://example.org/.well-known/genid/b47df9f0-a9c5-4e8a-9762-844a33ba7a3e. If you host RDF at that location, there is nothing wrong with adding owl:sameAs <urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e> there if you have to.
To my knowledge, there is only one method that is in use today to create a link that conveys the question "Do you know this URN?", well, kind of: a magnet: link. There is nothing in principle that would require you to use a hash there like you usually find, so something like magnet:?xt=urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e could work, provided you have your own client that can handle that.

Resources