Standard for resolving urn:uuid (and other) URIs?

My application uses urn:uuid as URIs for entities. Of course, when I get, e.g., RDF information about a resource, the referenced entities (subjects or objects) will contain URIs in the urn:uuid scheme. To fetch the representation of a new entity, possibly in a REST way, I need a "resolver", similar in some way to dx.doi.org for DOIs. Another case could be the resolution of an isbn: URI, so as to obtain a sensible representation of it.
My question is about what's out there, in terms of proposed standards, for URI-to-representation-URL resolution.

The concluded URN Working Group of the IETF has also done some work on resolving URNs and published quite a few RFCs on this topic. A list of references is contained in the group charter. Maybe some of them help you.

A UUID is a universally unique identifier, so I don't see how you would be able to resolve a UUID I just generated (e.g. 3136aa1a-fec8-11de-a55f-00003925d394) to something useful.
Only if you manage a database of UUIDs somewhere can you retrieve more about one. Otherwise you would have to ask everyone/everything "Do you know this UUID?"
The urn:uuid namespace gives you a clear space of identifiers for naming something truly unique. But since nobody else can guess its value, you can't derive any information from it.
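For illustration, here is how you might mint such a name with Python's standard uuid module (a urn:uuid URI is just a UUID with a urn:uuid: prefix):
import uuid
# Generate a random (version 4) UUID and render it as a urn:uuid URI.
name = uuid.uuid4()
print(name.urn)  # e.g. urn:uuid:9a652678-4616-475d-af12-aca21cfbe06d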

There is no standard (proposed or otherwise) for resolving a URN. It's just a name (Uniform Resource NAME) and may have arbitrary meaning.
XML/RDF creates some confusion by using URNs which do resolve, because they happen to also be URLs (Uniform Resource Locators) pointing to documents describing their meaning, but this is merely a convention. URNs merely have to be unique and always mean the same thing.
If you are developing an application, you might want to consider using URNs which are also resolvable URLs for items with fixed meaning, and randomly generated URNs in the urn:uuid namespace to identify instances of objects.
That sounded about as confusing as the RDF spec:-)
Quick example:
Tiger: http://www.example.com/animals/tiger
Instance of a Tiger: urn:uuid:9a652678-4616-475d-af12-aca21cfbe06d
There might be a HTML page at http://www.example.com/animals/tiger, but there doesn't have to be. It's merely a convention.
[Additional Clarification Added]
The distinction here is between URNs (Names) and URLs (Locations).
A URN just names something. It's not a location of anything.
URLs are valid URNs, so you can use a URL for a URN if you want to.
In the above example, I could use e.g. http://www.example.com/tigers/9a652678-4616-475d-af12-aca21cfbe06d as the name of my tiger. I could put something at that address. But what would I put there? You can't download an instance of a tiger over HTTP!
The convention in RDF is that if a URN is also a URL, it will point at some documentation defining what the name means.
What RDF is trying to give you is a convention for naming things which ensures that when two people use the same name, they mean the same thing. The UUID specification allows you to generate a unique name for something which is not likely to be used by anything else. But it's just a name, and there's no way of turning it into a thing.
Hope this helps.

One reason URNs exist is to give people the opportunity to create identifiers without the (implicit) responsibility of maintaining a service that describes the underlying resources. For RDF you could say this is an advantage but not a necessity; still, you'd be less inclined to use a particular vocabulary, for example, if you discovered that its HTTP URLs are no longer dereferenceable.
That being said, some URNs can be traced back to their representation. Here are some examples:
The ietf namespace defines several identifier schemes, so URIs like urn:ietf:rfc:2648 can be resolved if you implement the specific patterns (see the resolver sketch after this list).
Some namespaces are defined in other IANA registries, for example urn:ietf:params:xml: with the corresponding files for the resources.
Other namespaces point to already-established identifier spaces, like urn:isbn: (some metadata can be retrieved, but I don't think there is anything that will allow you to download the book from its ISBN), urn:oid:. There is also urn:publicid:, some of whose identifiers may be found somewhere deep inside ISO.
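To make this concrete, here is a minimal sketch of a per-namespace resolver in Python; the rfc-editor and Open Library URL patterns are conventions I'm assuming for illustration, not part of any URN standard:
import re
def resolve_urn(urn):
    # Map a few URN namespaces to conventional HTTP locations.
    m = re.match(r"urn:ietf:rfc:(\d+)$", urn)
    if m:
        return "https://www.rfc-editor.org/rfc/rfc" + m.group(1) + ".txt"
    m = re.match(r"urn:isbn:([0-9Xx-]+)$", urn)
    if m:
        # Metadata only; this won't let you download the book itself.
        return "https://openlibrary.org/isbn/" + m.group(1)
    return None  # no resolution rule known for this namespace
print(resolve_urn("urn:ietf:rfc:2648"))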
There is no general mechanism for URN resolution, and indeed there cannot be (that is also true for other URI schemes, like tag:).
Talking specifically about UUIDs, in my opinion, the best way out of this is not to use a URN at all. If you want to use a web server for the resolution, a "standard" way is to use the genid well-known service, thus your primary URI would be something like this: http://example.org/.well-known/genid/b47df9f0-a9c5-4e8a-9762-844a33ba7a3e. If you host RDF at that location, there is nothing wrong with adding owl:sameAs <urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e> there if you have to.
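As a minimal sketch with the rdflib library (assuming the example.org URI above), asserting that link looks like this:
from rdflib import Graph, URIRef
from rdflib.namespace import OWL
g = Graph()
genid = URIRef("http://example.org/.well-known/genid/b47df9f0-a9c5-4e8a-9762-844a33ba7a3e")
urn = URIRef("urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e")
g.add((genid, OWL.sameAs, urn))  # both URIs name the same resource
print(g.serialize(format="turtle"))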
To my knowledge, there is only one method in use today to create a link that conveys the question "Do you know this URN?", well, kind of: a magnet: link. Nothing in principle requires you to use a hash there, as you usually find, so something like magnet:?xt=urn:uuid:b47df9f0-a9c5-4e8a-9762-844a33ba7a3e could work, provided you have your own client that can handle it.

Related

What makes a URI dereferenceable?

I found very little information on this matter. What is the difference between dereferenceable and non-dereferenceable URIs? What does it mean to dereference a URI? How does the URI change after it has been dereferenced?
When reading about linked data at Wikipedia, it is said:
Use HTTP URIs so that these things can be looked up (interpreted, "dereferenced").
This makes it sound like every individual that can be found with an HTTP URI, i.e. that "can be looked up", can be dereferenced. But not all URIs are dereferenceable.
The simple answer is that if you can fetch a resource behind a URI by using exactly that URI, that URI is dereferenceable. This formulation means that only URLs are (potentially) dereferenceable and URNs aren't.
An extended definition is that any URI you can map to a resource can be considered dereferenceable. For example, if you can map the URN urn:isbn:0451450523 to a book resource, then you may stretch the definition of dereferenceable URIs to include such a URN (I wouldn't).
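A rough sketch of the simple definition, using Python's requests library (the Accept header is just one common Linked Data convention, not a requirement):
import requests
def is_dereferenceable(uri):
    # Dereferenceable: fetching exactly this URI yields a resource.
    # urn: URIs are rejected by the HTTP library outright.
    try:
        headers = {"Accept": "text/turtle, application/rdf+xml"}
        return requests.get(uri, headers=headers, timeout=10).ok
    except requests.exceptions.RequestException:
        return False
print(is_dereferenceable("http://www.w3.org/2000/01/rdf-schema"))  # expected True
print(is_dereferenceable("urn:isbn:0451450523"))                   # expected False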
While on the topic, I think it's far better to mint URNs when your Linked Data resources are not dereferenceable (e.g. when using an OBDA tool like Ontop), so as not to confuse consumers.
If you are looking at a quick way to make Linked Data resources dereferenceable, you can look at http://wifo5-03.informatik.uni-mannheim.de/pubby/

Why do URIs of specifications/vocabularies contain date information?

Most example namespace URIs seem to contain some combination of year/month/day in their path:
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
It's not obvious (to me) why it makes sense to include part of the created date in the URI when the concepts that are included in the vocabularies are not exactly temporal.
EDIT
There may be additional confusion because of old W3C (and potentially other organizations') notes lying around that still rank highly in search results for the semantic web. For example, this note from a W3C users group recommends the use of dates in URIs:
Manageability.
Issue your URIs in a way that you can manage. One good practice is to include the current year in the URI path, so that you can change the URI-schema each year without breaking older URIs.
#cygri is still correct (the linked note is pre-2010); just pointing this out for people who come across conflicting information.
It's not obvious (to me) why it makes sense to include part of the created date in the URI when the concepts that are included in the vocabularies are not exactly temporal.
It's a bad idea and shouldn't be done.
These namespaces are from the very early days of RDF, when good practices around URI management for the Semantic Web were not yet understood. Today, W3C uses much shorter and undated namespaces like http://www.w3.org/ns/csvw# for new vocabularies, but changing the old namespaces is not really possible given the huge amount of data and tools that are already published with these namespaces baked in.
So why did W3C think it was a good idea to include the date back then?
Because W3C includes date information in pretty much all their URIs. It's the date when the URI was allocated. It's their way of ensuring that URIs are unique and don't accidentally clash. So, all URIs that were allocated in the year 2000 have a path that starts with /2000/, all from 2001 start with /2001/ and so on. For “high-value” documents, like W3C standards, they also allocate a short alias such as http://www.w3.org/TR/html.
I suppose back then they thought that short aliases are not necessary for vocabularies, because only machines would see those URIs, and namespace prefixes would be used to hide them from view.
Today, the general wisdom is to “leave out as much as possible” when allocating URIs. So, schema.org with class URIs like http://schema.org/Person is pretty much perfect.
Most namespace URIs seem to contain some combination of year/month/day in their path
That's not really true. If a namespace URI has a date in it, it's probably a W3C URI from before 2010. Most namespace URIs don't have dates in them.

How do I write a two-word type in JSON:API?

I want to post an alien artifact to my server.
Do I write type="alienArtifact"
or type="alien-artifact"
or something completely different?
I looked here https://jsonapi.org/format/ but it only deals with simple types, "objects".
The JSON:API specification is agnostic about whether you use kebab-case, snake_case or camelCase for member names, but it comes with a recommendation:
Member names SHOULD be camel-cased (i.e., wordWordWord)
This recommendation was changed in October 2018; it was kebab-case before. Therefore many articles about the JSON:API specification, and documentation for libraries, still use kebab-case. These arguments were given as the main reasons for the change:
camelCased names can be used directly as identifiers in almost all programming languages, making json:api easier to get started with and to work with over time. Dasherized names are usable as identifiers pretty much only in Lisps.
camelCased names are the most common convention in the JS community, and JS is the biggest user of JSON:API
camelCased names are slightly shorter than dasherized (or snake-case) names, which could help with url character limits in the case of complex filters or other types of complex queries (e.g., some GraphQL-like "deep querying" feature) that we might serialize into urls in the future.
You can find more details about this change in the corresponding pull request.
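Applied to your example, a resource object would then look like this (a hypothetical payload; the attribute names are made up):
import json
payload = {
    "data": {
        "type": "alienArtifact",  # camelCased, per the current recommendation
        "attributes": {
            "originPlanet": "LV-426",
            "discoveredAt": "2018-10-01",
        },
    }
}
print(json.dumps(payload, indent=2))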

Can someone explain FHIR extensions?

I've been trying to wrap my head around authoring profiles in FHIR. The trouble I'm having is around the use of extensions.
The documentation talks about extensions as if they are simply there to extend existing elements of the resource that a profile applies to. This is kind of confirmed to me in Forge, because I can add new elements which don't have extensions.
It feels very foreign to me: in our proprietary storage system we have the equivalent of profiles, and they have properties (which I think are similar to elements in FHIR); however, a property is only designed to store one type of thing. E.g. you might have a patient profile that has the properties DOB, ethnicity, identifier, etc. I don't really understand what profiles are for in the context of FHIR. Are they similar to my properties? Can I use them to limit the datatype that a profile instance can have for a particular element?
Is there any better documentation than the spec? I'm finding it really hard to get to grips with.
FHIR extensions are used to capture extra data elements when there's no field for them in the standard definition. Mother's maiden name is an example of that for the Patient resource.
The use of an extension is a standard FHIR mechanism and will always look like this:
<extension>
<url value="http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName"/>
<valueString value="Williams"/>
</extension>
The url is the canonical url for the definition of the extension, which is a StructureDefinition resource defining the extension and the datatype(s) of the value.
You can have extensions on every level of a resource/datatype.
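For comparison, the same extension in FHIR's JSON representation, shown here as a Python dict inside a minimal, illustrative Patient resource:
patient = {
    "resourceType": "Patient",
    "extension": [
        {
            "url": "http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName",
            "valueString": "Williams",
        }
    ],
}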
Since profiling is a very overloaded term, it is hard for me to understand what you're saying about profiles and properties in your proprietary system, or how that relates to your question. But in general, FHIR profiling is needed and used to:
be able to add data when there's no data field for it in the specification (i.e. an extension of the specs)
constrain the specification in places where you need to be more strict, for example to make an optional field mandatory (i.e. a constraint on the specs, also called a profile)
I recommend browsing through some of the profiles and their descriptions on the Simplifier repository to get an idea of why people are creating profiles on FHIR.

REST design: what verb and resource name to use for a filtering service

I am developing a cleanup/filtering service that has a method that receives a list of objects serialized in XML and applies some filtering rules to return a subset of those objects.
In a RESTful service, what verb should I use for such a method? I thought GET was a natural choice, but I have to put the serialized XML in the body of the request, which works but feels incorrect. The other verbs don't seem to fit semantically.
What is a good way to define the service interface? Naming the resource /Cleanup or /Filter seems weird, mainly because in the examples I see online it is always a noun rather than a verb being used for a resource name.
Am I right to feel that REST services are better suited for CRUD operations, and that you start bending the rules in situations like this service? If yes, am I then making a wrong architectural choice?
I've pushed to develop this service in a RESTful style (as opposed to SOAP) for simplicity, but such awkward cases happen a lot and make me feel like I am missing something. Either I'm choosing REST where it shouldn't be used, or maybe I'm over-thinking stuff that doesn't really matter? In that case, what really matters?
REST is about using HTTP the way it was designed. To be RESTful, consider the following (the title did say REST design :):
URLs should be permalinks to a resource (caching benefits, storing/sharing endpoints etc...)
Because they are permalinks to a resource, having verbs in the URL is a hint that you're on the wrong path (filter is a verb).
A collection of resources can be an endpoint /foos.
If you want to filter the collection of resources, consider querystring params like ?filter= or something like ?ids=1,2,3,4,5.
A GET should not change resources. Note that 'cleanup' implies something getting deleted, so be cautious of changes to resources when you do a GET. REST says a GET shouldn't alter resources. Imagine a caching server taking your cleanup request as a GET and returning OK because it's cached. Caching servers know not to cache a POST, DELETE etc. (that's the way HTTP was designed).
Don't rule out multiple calls - for example, you may do a get to filter and get a set of resources to clean up and then could be followed by many or one DELETE verb calls to do the cleanup.
Sometimes there's a temporal resource like a transaction or a 'job' that could do work like a cleanup. Don't rule out a POST to such a resource with the body containing the items to clean up; it returns a job id. You can then query the job id for the cleanup progress or status (see the sketch below).
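For example, the job pattern from that last point, sketched with Python's requests (the endpoints are hypothetical):
import requests
BASE = "http://api.example.com"  # hypothetical service
# POST creates a temporal 'cleanup job' resource and returns its id...
r = requests.post(BASE + "/cleanup-jobs", json={"items": ["foo-1", "foo-2"]})
job_id = r.json()["id"]
# ...which you can then poll for progress or status.
status = requests.get(BASE + "/cleanup-jobs/" + job_id).json()["status"]
print(status)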
It's hard to give exact guidance because the question isn't clear, but hopefully the RESTful principles and thoughts above set you on the right track. If you clarify the exact calls, I'll try and recommend APIs.
So, let's say you wanted to cleanup duplicate foos.
[GET] /foos/duplicates (or /foos?filter=duplicates)
returns a body with the identifiers of the foos that are duplicates. Let's say that returns 1, 2, 5 (they could also be names).
Then you could issue:
[DELETE] /foos with the body being an array containing 1, 2, 5 (or names if unique). The GET call is the passive (safe) one, so even if it is cached, according to REST principles the flow is fine; the DELETE does the actual change. (A sketch of this two-step flow follows.)
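Sketched with Python's requests (same hypothetical host as above):
import requests
BASE = "http://api.example.com"
# GET is safe and cacheable: ask which foos are duplicates.
dupes = requests.get(BASE + "/foos/duplicates").json()  # e.g. [1, 2, 5]
# DELETE performs the actual cleanup; caches know not to cache it.
requests.delete(BASE + "/foos", json=dupes)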
It's also possible and valid not to go the REST route, e.g. POX or JSON-RPC over HTTP, but just realize at that point that it's not REST. And that's fine, but you're not getting the benefits of REST described in Fielding's thesis:
http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
Also, read this:
http://blog.steveklabnik.com/posts/2011-07-03-nobody-understands-rest-or-http
EDIT:
After reading the comment where you clarified that you're sending the server a set of objects (not persisted server side) and it returns the subset with the dupes filtered out (like a server-side helper function), some options are:
Do this client/browser side if possible. Why take the network round trip just to filter dupes out of a collection?
If for some reason only the server has the specific knowledge/data to determine that two items are functionally equivalent (even though their data is not exactly the same), then consider POSTing the data set to the server, with the response body containing the unique/filtered set. Even though the server isn't persisting the set, it falls into a 'temporal' object or set and the server is modifying it. It's not conceptually a GET of server resources, and caching offers no benefits in that scenario.
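A minimal server-side sketch of that second option using Flask; the route name and the equivalence rule are assumptions for illustration:
from flask import Flask, jsonify, request
app = Flask(__name__)

@app.route("/api/filter", methods=["POST"])
def filter_duplicates():
    # Nothing is persisted: the posted set is a temporal resource that
    # the server transforms and returns, so POST (not GET) is the fit.
    items = request.get_json()
    seen, unique = set(), []
    for item in items:
        key = item.lower().strip()  # stand-in for server-side equivalence logic
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return jsonify(unique)

if __name__ == "__main__":
    app.run()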
Last question first: What really matters is getting the job done in a way that is
Correct
As easy to use as practical
Easily maintained by future programmers (likely to include yourself)
REST is a natural fit for operations on resources where each URL matches some object that can be manipulated. It is a less natural fit for other uses, but these are more guidelines than actual rules. Others have pointed out the original dissertation on REST, but it is worth remembering that few implementations are pure.
If you have several URLs that perform these transformative kinds of functions, consider putting them in their own special URL space, like /api/filter and /api/transliterate, etc. That will help users and maintainers alike know that certain URLs aren't REST but are more like remote procedure calls. Posting data to these URLs results in you getting some kind of data back.
If you get stuck on specific names you should make a list of candidates, have a few beers, then choose one from the list. That's what I do when I get stuck on minutia.
SOAP is a neat protocol and has its uses, but it tends to be very heavy. Good documentation and consistency are probably more important to your budding API than using any specific technology.
