I found very little information on this matter. What is the difference between dereferenceable and non-dereferenceable URIs? What does it mean to dereference a URI? How does the URI change after it has been dereferenced?
When reading about linked data at Wikipedia, it is said:
Use HTTP URIs so that these things can be looked up (interpreted, "dereferenced").
This makes it sound like every individual that can be found with an HTTP URI (i.e. "can be looked up") can be dereferenced. But not all URIs are dereferenceable.
The simple answer is that if you can fetch a resource behind a URI by using exactly that URI, that URI is dereferenceable. This formulation means that only URLs are (potentially) dereferenceable and URNs aren't.
An extended definition is that all URIs you can map to a resource can be considered dereferenceable. For example, if you can map the URN urn:isbn:0451450523 to a book resource, then you may stretch the definition of dereferenceable URIs to include such URN (I wouldn't).
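As a rough illustration of the strict definition above, the following sketch only checks whether a URI uses a scheme that a generic web client can fetch. The function name and the set of schemes are assumptions made for this example; a True result only means the URI is potentially dereferenceable, not that a resource actually exists behind it.

```python
from urllib.parse import urlsplit

# Schemes a generic web client can fetch directly (an assumption for this sketch).
FETCHABLE_SCHEMES = {"http", "https"}


def is_potentially_dereferenceable(uri: str) -> bool:
    """Return True if the URI's scheme is one a web client could fetch."""
    return urlsplit(uri).scheme.lower() in FETCHABLE_SCHEMES


if __name__ == "__main__":
    print(is_potentially_dereferenceable("http://dbpedia.org/resource/Rome"))
    print(is_potentially_dereferenceable("urn:isbn:0451450523"))
```

Under the extended definition, a consumer would instead need a separate lookup table or resolver for each URN namespace.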
While on the topic, I think it's far better to mint URNs when your Linked Data resources are not dereferenceable (e.g. when using an OBDA tool like Ontop), so as not to confuse consumers.
If you are looking for a quick way to make Linked Data resources dereferenceable, take a look at Pubby: http://wifo5-03.informatik.uni-mannheim.de/pubby/
I use markup like:
itemtype="http://schema.org/ImageObject"
but a request for http://schema.org/ImageObject is redirected to https://schema.org/ImageObject.
If I change to itemtype="https://schema.org/ImageObject", the Google SDTT shows no problem, but nearly all examples about structured data from Google are with http.
Which is recommended for itemtype: http://schema.org or https://schema.org?
From Schema.org’s FAQs:
Q: Should we write https://schema.org or http://schema.org in our markup?
There is a general trend towards using https more widely, and you can already write https://schema.org in your structured data. Over time we will migrate the schema.org site itself towards using https: as the default version of the site and our preferred form in examples. However http://schema.org -based URLs in structured data markup will remain widely understood for the forseeable future and there should be no urgency about migrating existing data. This is a lengthy way of saying that both https://schema.org and http://schema.org are fine.
tl;dr: Both variants are possible.
The purpose of itemtype URIs
Note that the URIs used for itemtype are primarily identifiers, they typically don’t get dereferenced:
If a Microdata consumer doesn’t know what the URI in itemtype="http://schema.org/ImageObject" stands for, this consumer "must not automatically dereference" it.
If a Microdata consumer does know what the URI stands for, this consumer has no need to dereference this URI in the first place.
So, there is no technical reason to prefer the HTTPS variant. User agents won’t dereference this URI (in contrast to URIs specified in href/src attributes), and users can’t click on it. I think there is only one case where the HTTPS variant is useful: if a visitor looks into the source code and copy-pastes the URI to check what the type is about.
I would recommend sticking with the HTTP variant until Schema.org has switched everything to HTTPS, most importantly the URI in RDFa's initial context.
The Schema.org specification for the type ImageObject indicates:
Canonical URL: http://schema.org/ImageObject
It is probably useful to refer to the canonical URL because it is the “preferred” version of the web page.
While digging into DBpedia and trying to learn more about Linked Data, I noticed that DBpedia redirects from http://dbpedia.org/resource/Rome
to http://dbpedia.org/page/Rome.
I wasn't able to find any reason for that and I would like to learn why this is happening.
The /resource/ URI represents the thing.
The /page/ URI represents the human-readable document about the thing.
The /data/ URI represents the machine-readable document about the thing.
(This is the HTTP status code 303 approach. More details.)
So, if you want to say something about the city/comune Rome, you have to use the /resource/ URI.
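The three URI roles above can be sketched as a small function that picks the target of a 303 See Other redirect. This is only a simplified illustration, not DBpedia's actual implementation: the function name is hypothetical, and the Accept handling just looks for text/html and ignores q-values.

```python
def see_other_target(resource_path: str, accept: str) -> str:
    """Pick the document URI to redirect to for a /resource/ URI.

    Simplified assumption: browsers send text/html in Accept, and
    everything else is treated as a request for machine-readable data.
    """
    prefix = "/resource/"
    if not resource_path.startswith(prefix):
        raise ValueError("not a /resource/ URI: " + resource_path)
    name = resource_path[len(prefix):]
    wants_html = "text/html" in accept.lower()
    return ("/page/" if wants_html else "/data/") + name


if __name__ == "__main__":
    # A server would answer with status 303 and this value in Location.
    print(see_other_target("/resource/Rome", "text/html,application/xhtml+xml"))
    print(see_other_target("/resource/Rome", "application/rdf+xml"))
```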
Are HTTP request URIs allowed to contain ".." segments?
According to RFC 2616, section 5.1.2, they can refer to absolute URIs or absolute paths (the other options in that section are not relevant for this question).
The meaning of absolute URIs and absolute paths is described in RFC 3986, which also describes an algorithm to normalize paths (which includes removing single and double dot segments).
However, I can't find the exact specification whether an RFC conforming request URI can contain ".." segments - are they allowed in an absolute path/URI, and does the server have to normalize such URIs? Or is that up to the client?
Is there any difference for "Location:" response headers? According to the spec, they can only contain absolute URIs, but does that include ".." parts? Will the client have to normalize those too before requesting the referred resource?
To clarify, I know that URIs like ../foo are illegal in those situations, but what about http://example.com/../foo? Is that a valid absolute URI?
I'm currently redirecting clients to such URIs and would like to know if that is conforming to the specifications.
If you want to "know if that is conforming to the specifications," why don't you simply refer to the relevant specification?
RFC 3986 Section 5.2 is very clear on how URI dot segments should be resolved:
This section describes an algorithm for converting a URI reference
that might be relative to a given base URI into the parsed components
of the reference's target. The components can then be recomposed, as
described in Section 5.3, to form the target URI. This algorithm
provides definitive results that can be used to test the output of
other implementations. Applications may implement relative reference
resolution by using some other algorithm, provided that the results
match what would be given by this one.
If you are, for example, following Location: headers, it's usually prudent to normalize and resolve invalid relative paths (Location: headers are supposed to be absolute URIs). In these cases you should absolutely follow the instruction of RFC 3986 to resolve those paths against your base URI.
Should you pass around dot segments in your URIs all over the place? Probably not if you can help it because you're relying on other people to have implemented the specification correctly. But does passing URIs with dot segments violate the URI specification? No.
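For reference, here is a sketch of the "remove_dot_segments" routine from RFC 3986 Section 5.2.4, which is the part of the resolution algorithm that deals with "." and ".." segments. The function name follows the RFC, but the Python code itself is my own transcription, so treat it as illustrative rather than authoritative.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a path, following RFC 3986, 5.2.4."""
    output = []
    while path:
        # Step A: drop a leading "../" or "./".
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        # Step B: replace a leading "/./" or a final "/." with "/".
        elif path.startswith("/./"):
            path = "/" + path[3:]
        elif path == "/.":
            path = "/"
        # Step C: replace a leading "/../" or a final "/.." with "/"
        # and remove the last segment already written to the output.
        elif path.startswith("/../"):
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        # Step D: a path that is only "." or ".." is removed entirely.
        elif path in (".", ".."):
            path = ""
        # Step E: move the first segment (with its leading "/") to the output.
        else:
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


if __name__ == "__main__":
    print(remove_dot_segments("/a/b/c/./../../g"))
    print(remove_dot_segments("/../foo"))
```

Note that under this algorithm a ".." at the root is simply dropped, so the path of http://example.com/../foo normalizes to /foo.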
Syntactically speaking, http://example.com/../foo is a valid URI.
How the server interprets that URI is a different matter. Servers have to be very careful about how they translate URIs to file paths, for obvious security reasons. Usually the server will either strip out .. segments, or do some kind of post-processing to make sure the file path is inside the document root.
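The post-processing check mentioned above might look roughly like this. It is a minimal sketch with hypothetical names, assuming POSIX-style paths and a document root other than "/"; it does not handle symlinks or percent-decoding, which a real server must also deal with.

```python
import posixpath
from typing import Optional


def resolve_under_root(document_root: str, url_path: str) -> Optional[str]:
    """Map a URL path to a file path, or return None if it escapes the root."""
    root = posixpath.normpath(document_root)
    # Join the request path onto the root, then collapse "." and ".." segments.
    candidate = posixpath.normpath(posixpath.join(root, url_path.lstrip("/")))
    # Accept the result only if it is the root itself or lies below it.
    if candidate == root or candidate.startswith(root + "/"):
        return candidate
    return None


if __name__ == "__main__":
    print(resolve_under_root("/var/www", "/a/../index.html"))
    print(resolve_under_root("/var/www", "/../etc/passwd"))
```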
(Thank you for the great, crisp question in a topic full of hopeless public confusion, fueled by cryptic specs and surprising subtleties!)
... what about http://example.com/../foo? Is that a valid absolute URI?
No. It's an invalid absolute URI, because it attempts to refer to a place beyond the naming authority's namespace (root).
(Accordingly, I've been rewarded with due "400 Bad request" responses by servers when trying to feed them stuff like that.)
But, assuming you really meant to ask about valid but equally non-normalized absolute paths like /root/../foo: @rdlowrey's answer is correct. Better to normalize them out yourself, if you can.
(Again, as an example, my proxy failed on pages that worked fine when sent to the same server by browsers, which go the extra mile normalizing the dot-parts out, instead of relying on servers doing the same.)
However, I can't find the exact specification whether an RFC
conforming request URI can contain ".." segments - are they allowed in
an absolute path/URI, and does the server have to normalize such URIs?
Or is that up to the client?
Unfortunately, you didn't find it because it's not specified, even in HTTP 2, AFAICT :-/
On the Microformats spec for RESTful URLs:
GET /people/1
return the first record in HTML format
GET /people/1.html
return the first record in HTML format
and /people returns a list of people
So is /people.html the correct way to return a list of people in HTML format?
If you just refer to the URL path extension, then, yes, that scheme is the recommended behavior for content negotiation:
path without extension is a generic URL (e.g. /people for any accepted format)
path with extension is a specific URL (e.g. /people.json as a content-type-specific URL for the JSON data format)
With such a scheme the server can use content negotiation when the generic URL is requested and respond with a specific representation when a specific URL is requested.
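A sketch of that scheme, under some simplifying assumptions: the function name, the extension table, and the HTML default are invented for this example, and the Accept parsing ignores q-values and wildcards.

```python
import posixpath

# Hypothetical mapping of path extensions to media types.
EXTENSION_TYPES = {".html": "text/html", ".json": "application/json"}
DEFAULT_TYPE = "text/html"


def choose_media_type(path: str, accept: str = "") -> str:
    """Pick a representation: the extension wins, otherwise negotiate."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in EXTENSION_TYPES:
        # Specific URL: the format is fixed by the extension.
        return EXTENSION_TYPES[extension]
    # Generic URL: take the first supported type listed in Accept.
    for item in accept.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in EXTENSION_TYPES.values():
            return media_type
    return DEFAULT_TYPE


if __name__ == "__main__":
    print(choose_media_type("/people.json"))
    print(choose_media_type("/people", "application/json"))
    print(choose_media_type("/people/1"))
```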
Documents that recommend this scheme are among others:
Cool URIs don't change
Cool URIs for the Semantic Web
Content Negotiation: why it is useful, and how to make it work
You have the right idea. Both /people and /people.html would return HTML-formatted lists of people, and /people.json would return a JSON-formatted list of people.
There should be no confusion about this with regard to applying data-type extensions to "folders" in the URLs. In the list of examples, /people/1 is itself used as a folder for various other queries.
It says that GET /people/1.json should return the first record in JSON format, which makes sense.
URIs and how you design them have nothing to do with being RESTful or not.
It is common practice to do what you ask, since that's how the Apache web server works. Let's say you have foo.txt, foo.html and foo.pdf, and you GET /foo with no preference (i.e. no Accept: header). A 300 Multiple Choices response would be returned with a listing of the three files so the user could pick. Because browsers do such marvelous content negotiation, it's hard to link to an example, but here goes: An example shows what it looks like, except that the reason you see the page in the first place is the different case of the file name ("XSLT" vs "xslt").
This Apache behaviour is echoed in conventions and various tools, but really it isn't important. You could have people_html or people?format=html or people.html or sandwiches or 123qweazrfvbnhyrewsxc6yhn8uk as the URI which returns people in HTML format. The client doesn't know any of these URIs up front; it's supposed to learn them from other resources. A human could see the result of All People (HTML format) and understand what happens, while ignoring the strange-looking URI.
On a closing note, the microformats URL conventions page is absolutely not a spec for RESTful URLs. It's merely guidance on making URIs that are apparently easy for various HTTP libraries to consume, for some reason or another, and has nothing to do with REST at all. The guidelines are all perfectly OK, and following them makes your URIs look sane to other people who happen to glance at them (/sandwiches is admittedly odd). But even the cited AtomPub protocol doesn't require entries to live "within" the collection...
My application uses urn:uuid as URIs for entities. Of course, when I get, e.g., RDF information about a resource, the referred entities (subjects or objects) will contain URIs in the urn:uuid scheme. To fetch the representation of the new entity, possibly in a REST way, I need a "resolver", similar in some way to dx.doi.org for DOIs. Another case could be the resolution of an isbn: URI, so as to obtain a sensible representation of this URI.
My question is about what's out there, in terms of proposed standards, for URI-to-representation-URL resolution.
The concluded URN Working Group of the IETF has also done some work on resolving URNs and published quite a few RFCs on this topic. A list of references is contained in the group charter. Maybe some of them help you.
A UUID is a universally unique identifier, so I don't see how you would be able to resolve a UUID I just generated (e.g. 3136aa1a-fec8-11de-a55f-00003925d394) to something useful.
Only if you manage a database of UUIDs somewhere can you retrieve more from it. Otherwise you would have to ask everyone/everything "Do you know this UUID?"
The urn:uuid definition defines a clear space of unique identifiers you can use for defining something truly unique. But as nobody else can guess its value, you can't derive information from it.
There is no standard (proposed or otherwise) for resolving a URN. It's just a name (Uniform Resource NAME) and may have arbitrary meaning.
XML/RDF creates some confusion by using URNs which do resolve because they happen to also be URLs (Uniform Resource Locators) which point to objects describing their meaning, but this is merely a convention. They merely have to be unique and always mean the same thing.
If you are developing an application, you might want to consider using URNs which are also resolvable URLs for items with fixed meaning, and randomly generated URNs in the urn:uuid namespace to identify instances of objects.
That sounded about as confusing as the RDF spec:-)
Quick example:
Tiger: http://www.example.com/animals/tiger
Instance of a Tiger: urn:uuid:9a652678-4616-475d-af12-aca21cfbe06d
There might be an HTML page at http://www.example.com/animals/tiger, but there doesn't have to be. It's merely a convention.
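In Python, for instance, minting such an instance name takes one call to the standard uuid module. The constant and function names below are made up for this sketch.

```python
import uuid

# Name for the concept; by convention it may also be a resolvable URL.
TIGER_CLASS = "http://www.example.com/animals/tiger"


def mint_instance_name() -> str:
    """Return a fresh random name in the urn:uuid namespace."""
    return uuid.uuid4().urn


if __name__ == "__main__":
    # Prints the class name and a different random urn:uuid name on each run.
    print(TIGER_CLASS, mint_instance_name())
```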
[Additional Clarification Added]
The distinction here is between URNs (Names) and URLs (Locations).
A URN just names something. It's not a location of anything.
URLs are valid URNs, so you can use a URL for a URN if you want to.
In the above example, I could use e.g. http://www.example.com/tigers/9a652678-4616-475d-af12-aca21cfbe06d as the name of my tiger. I could put something at that address. But what would I put there? You can't download an instance of a tiger using http!
The convention in RDF is that if a URN is also a URL, it will point at some documentation defining what the name means.
What RDF is trying to give you is a convention for naming things which ensures that when two people use the same name, they mean the same thing. The UUID specification allows you to generate a unique name for something which is not likely to be used by anything else. But it's just a name, and there's no way of turning it into a thing.
Hope this helps.
One reason URNs exist is to give people the opportunity to create identifiers without the (implicit) responsibility of maintaining a service that describes the underlying resources. For RDF you could call this an advantage rather than a necessity, but you would also be less inclined to use a particular vocabulary, for example, if you discovered that its HTTP URLs were no longer dereferenceable.
That being said, some URNs can be traced back to their representation. Here are some examples:
The ietf namespace defines several identifier schemes, so URIs like urn:ietf:rfc:2648 can be resolved if you implement the specific patterns.
Some namespaces are defined in other IANA registries, for example urn:ietf:params:xml: with the corresponding files for the resources.
Other namespaces point to already-established identifier spaces, like urn:isbn: (some metadata can be retrieved, but I don't think there is anything that will allow you to download the book from its ISBN), urn:oid:. There is also urn:publicid:, some of whose identifiers may be found somewhere deep inside ISO.
There is no general mechanism for URN resolution, and indeed there cannot be (that is also true for other URI schemes, like tag:).
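In practice this means a resolver has to be written per namespace. The sketch below handles only the urn:ietf:rfc: pattern and returns None for everything else. The function name is hypothetical, and the target URL pattern on rfc-editor.org is an assumption about where RFCs are published, not something the URN itself defines.

```python
from typing import Optional


def resolve_urn(urn: str) -> Optional[str]:
    """Map a URN to a URL for the few patterns this sketch knows about."""
    parts = urn.split(":")
    is_rfc = (
        len(parts) == 4
        and [p.lower() for p in parts[:3]] == ["urn", "ietf", "rfc"]
        and parts[3].isdigit()
    )
    if is_rfc:
        # Assumed location of the RFC text; adjust to the mirror you use.
        return "https://www.rfc-editor.org/rfc/rfc" + parts[3]
    # No general mechanism exists, so unknown namespaces stay unresolved.
    return None


if __name__ == "__main__":
    print(resolve_urn("urn:ietf:rfc:2648"))
    print(resolve_urn("urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e"))
```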
Talking specifically about UUIDs, in my opinion, the best way out of this is not to use a URN at all. If you want to use a web server for the resolution, a "standard" way is to use the genid well-known service, thus your primary URI would be something like this: http://example.org/.well-known/genid/b47df9f0-a9c5-4e8a-9762-844a33ba7a3e. If you host RDF at that location, there is nothing wrong with adding owl:sameAs <urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e> there if you have to.
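Deriving such a genid URL from an existing urn:uuid name is mechanical. A small sketch, where the function name and the default base are placeholders taken from the example above:

```python
import uuid


def genid_url(urn: str, base: str = "http://example.org") -> str:
    """Build a /.well-known/genid/ URL from a urn:uuid name."""
    # uuid.UUID accepts the "urn:uuid:" form and raises ValueError otherwise.
    value = uuid.UUID(urn)
    return base + "/.well-known/genid/" + str(value)


if __name__ == "__main__":
    print(genid_url("urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e"))
```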
To my knowledge, there is only one method that is in use today to create a link that conveys the question "Do you know this URN?", well, kind of: a magnet: link. There is nothing in principle that would require you to use a hash there like you usually find, so something like magnet:?xt=urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e could work, provided you have your own client that can handle that.