Why is URN one of the more popular formats used to uniquely identify a resource? - uri

I somewhat understand that URNs are used to provide a unique and location-independent name for a resource. Yet I fail to see their usefulness and how exactly they work:
a) In order for a URN to really be unique, there would have to be some central authority (similar to the authority for domain names) where we could register URNs and that way ensure they are unique.
Since there isn’t any such authority, how else do we make sure that our URNs are unique? And if we can’t, then what’s the point of having them?
b) Also, I don’t understand the reasoning behind URNs having the format urn:NID:NSS. What makes this format more efficient/logical than, for example, urn:NID:NID1:NSS?
c) And finally, how can a URN help us locate a resource on the internet?
EDIT:
I'm not sure what you mean. NID is the Namespace Identifier and NSS is the Namespace Specific String. Are you proposing a system of sub-namespaces?
I’m just trying to make sense of why the format URN uses is “superior” to other formats, such as urn:NID:NID1:NSS

a) In order for URN to really be unique, there would have to be some central authority... Since there isn’t any such authority, how else do we make sure that our URNs are unique?
There is a central authority, called IANA, to register namespaces (the NID part), and each namespace is responsible for ensuring uniqueness.
b) Also, I don’t understand the reasoning behind URNs having the format urn:NID:NSS. What makes this format more efficient/logical than for example urn:NID:NID1:NSS?
The "urn:NID:NSS" description states the interpretation of NSS depends on the value of NID. For example, if NID is "isbn", then we know to interpret the NSS as an ISBN number, as in "urn:isbn:0451450523".
The NSS part can contain colons, so "urn:example:other:more" is valid syntax (and is in fact a valid URN as of 2013-04-24). For example, given "urn:mpeg:mpeg7:schema:2001", the NSS part is "mpeg7:schema:2001" and we interpret that according to the rules for the "mpeg" namespace.
Had "urn:NID:NID1:NSS" been required, it would have been redundant (some namespaces don't need a nested NID1) and superfluous (the authority for a namespace can already divide the NSS part up, as in the above mpeg example).
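The split described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not a complete URN parser: the helper name split_urn is hypothetical, and no validation of the characters allowed in the NID or NSS is attempted.

```python
def split_urn(urn: str) -> tuple[str, str]:
    """Split a URN into (NID, NSS). Hypothetical helper, no character validation."""
    # Split on the first two colons only, so colons inside the NSS are preserved.
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN: {urn!r}")
    _scheme, nid, nss = parts
    # The NID is treated as case-insensitive, so normalise it for comparison.
    return nid.lower(), nss


print(split_urn("urn:isbn:0451450523"))
print(split_urn("urn:mpeg:mpeg7:schema:2001"))
```

Because only the first two colons are significant, a namespace that wants sub-namespaces can simply define its own structure inside the NSS, which is why a mandatory NID1 level is not needed.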
c) And finally, how can URN help us locate a resource on the internet?
URNs are not about location; that is what a URL is for.

a) In order for URN to really be unique, there would have to be some central authority (similar to authority for domain names) where we could register URNs and that way ensure they are unique. Since there isn’t any such authority, how else do we make sure that our URNs are unique? And if we can’t, then what’s the point of having them?
An ISBN is used as a URN, and is managed by an agency.
b) Also, I don’t understand the reasoning behind URNs having the format urn:NID:NSS. What makes this format more efficient/logical than for example urn:NID:NID1:NSS?
I'm not sure what you mean. NID is the Namespace Identifier and NSS is the Namespace Specific String. Are you proposing a system of sub-namespaces?
c) And finally, how can URN help us locate a resource on the internet?
A URN (Uniform Resource Name) doesn't help you locate something on the Internet. A URL (Uniform Resource Locator) does.
Also see What is the difference between URI and URL?

URNs
A URN (Uniform Resource Name) is supposed to be unique across both time and space.
A URL (or a URI in general) cannot guarantee its uniqueness, unlike a URN, which is a URI at the same time.
A resource (X) at a path (Y) may form a valid URL, because the path can act as a location, but the same whole identifier (Z) can be duplicated in many physical, logical or virtual locations in the world.
# Unique only in the same actual location
Z = [Y => X];
A = [B => Z];
C = [D => Z];
If we add a prefix U (a domain name, for example) at the beginning, the identifier becomes more flexible but is still not unique (domains can expire).
# Unique only in the same actual location
Z = [ U => Y => X ];
The same format can be extended again and again with other variables in an attempt to make it as unique as possible.
Because of this, we need a more sophisticated and truly unique format that can identify more types of resources across time and space.
URNs are not URLs (with the exception of a unique persistent URL used as a name), because they do not locate a resource. In fact they cover more than you might think: they can identify ideas, UUIDs, virtual or physical objects and more. Both of them, plus URCs and data URIs, can be URIs.
Note:
A simpler and clearer example of URNs can be found here:
https://stackoverflow.com/a/1984274/5405973
And here is a very informative link:
https://stackoverflow.com/a/28865728/5405973

Related

x509certificate CN supported characters

I want to know whether the X509Certificate CN (common name) supports i18n characters, and which character sets are supported.
I assume you are talking about the CN in the distinguished name of the issuer or subject of the X509 certificate in question.
RFC 5280 on "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile" contains a definition of the allowed value for a common name AttributeTypeAndValue in a distinguished name:
-- Naming attributes of type X520CommonName:
--   X520CommonName ::= DirectoryName (SIZE (1..ub-common-name))
--
-- Expanded to avoid parameterized type:
X520CommonName ::= CHOICE {
      teletexString     TeletexString   (SIZE (1..ub-common-name)),
      printableString   PrintableString (SIZE (1..ub-common-name)),
      universalString   UniversalString (SIZE (1..ub-common-name)),
      utf8String        UTF8String      (SIZE (1..ub-common-name)),
      bmpString         BMPString       (SIZE (1..ub-common-name)) }
At the same time, though, it says
CAs conforming to this profile MUST use either the PrintableString or
UTF8String encoding of DirectoryString
(DirectoryName in the ASN.1 comment above should actually be DirectoryString, cf. the errata.)
There are certain exceptions to this for the sake of backward compatibility but let's consider the general case.
Thus, the common name may be either a PrintableString or a UTF8String. The former allows only a small subset of the characters that the latter does, so you are effectively limited to what can be represented in UTF-8.
This does not mean, though, that you can go to a CA of your choice and insist on getting a certificate with a subject common name containing the wildest Unicode characters. CAs may have limited the set of characters they allow in the subjects of certificates they issue. This might be accidental (their software for some reason may be limited to that set), intentional to allow interoperability with other legacy software, or a deliberate security measure, e.g. to prevent misuse of similar looking Unicode characters.
Such restrictions may even be documented in their CA certificates by use of name constraint extensions; in that case the CA cannot circumvent the restrictions in any way.
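To make the PrintableString versus UTF8String distinction concrete, here is a rough Python sketch that decides which of the two encodings a given common name would need. The function name choose_cn_encoding is hypothetical, the character set is the ASN.1 PrintableString alphabet as I understand it, and the upper bound of 64 is the value of ub-common-name in RFC 5280 as far as I recall; a real CA may apply stricter rules of its own.

```python
import string

# ASN.1 PrintableString alphabet: letters, digits, space and ' ( ) + , - . / : = ?
PRINTABLE_STRING_CHARS = set(string.ascii_letters + string.digits + " '()+,-./:=?")

# Assumed value of ub-common-name (upper bound on the common name length).
UB_COMMON_NAME = 64


def choose_cn_encoding(common_name: str) -> str:
    """Return the narrowest of the two permitted encodings for a common name."""
    if not 1 <= len(common_name) <= UB_COMMON_NAME:
        raise ValueError("common name length out of range")
    # Use PrintableString only when every character is in its small alphabet.
    if set(common_name) <= PRINTABLE_STRING_CHARS:
        return "PrintableString"
    return "UTF8String"


print(choose_cn_encoding("example.com"))
print(choose_cn_encoding("münchen.example"))
```

This only answers what the profile permits; whether a particular CA will actually issue a certificate with a given name is a separate matter, as described above.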

Use of typed URI in sesame sail openrdf

My question is simple but may be nonsense (in which case, sorry to the people who are going to spend time explaining why).
I'd like to create a resource like this (I don't show the whole resource declaration here):
<owl:DatatypeProperty rdf:about="relation:isPartOf">
<rdfs:domain rdf:resource="http://www.w3.org/2004/02/skos/core#note"/>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#anyURI"/>
</owl:DatatypeProperty>
<rdf:Description rdf:about="resource:context:sc#c1">
<skos:note rdf:datatype="relation:isPartOf" rdf:resource="resource:context:sc#c2">
</skos:note>
</rdf:Description>
The important part is the triple about the skos:note relation:
Subject: c1, a URI. Predicate: skos:note. Object: a typed URI.
My URI is not a plain URI but a "relation:isPartOf" URI.
I created a custom TypedURI class to do that; I use a home-made triple store, so I can use my own class.
I changed the RDFXMLWriter a little to output this example, so "it works".
My question is more: can a URI be typed like this? Why does Sesame OpenRDF not provide a TypedURI class? I'm sure there is a good reason. Any help, ideas or answers would be nice.
I'm quite sure my idea of creating a TypedURI class is wrong somewhere, but where? :-)
Thank you.
EDIT: the TypedURI is not really a new kind of resource. The URI in my context is still a URI. I just declare, inside my skos:note statement, that for c1 the object of the statement is data of type "relation:isPartOf" and that the range of the data is anyURI.
... The TypedURI helps to implement the datatype with such a range.
First of all: no, a URI can not be typed like this in RDF. Which also answers your second question: OpenRDF Sesame does not provide this functionality because it is not part of the RDF model.
Typing of URIs (or more accurately, resources, which are identified using URIs) is done by using an rdf:type relation, linking the resource URI to a class URI. For example, to make the resource ex:p1 of type foaf:Person, we would say (using Turtle syntax for RDF):
ex:p1 rdf:type foaf:Person .
There's another kind of typing in RDF, namely datatyping. This only applies to literal values so it can not be used on a URI. It is used to make a literal value a string, an integer number, a date, etc.
Update: some confusion may arise because xsd:anyURI is a valid datatype in RDF, and it is (in XML Schema) defined to be a type for URIs. However, when using a datatype in RDF, its lexical space is always a literal (simply because the spec only allows for literals to actually have a datatype). So you could indeed do something like this (using Turtle syntax for literal notation):
"http://www.example.org/some/uri"^^xsd:anyURI
But from the point of view of the RDF model, this is not a URI, but a literal string (with datatype xsd:anyURI). So in a sense, yes, you can add types to URIs in RDF, but you can only do this by "converting" them to literals first.
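The difference between a URI and a literal typed as xsd:anyURI can be illustrated with a toy model of RDF terms in Python. The classes URIRef and Literal below are hypothetical stand-ins written for this sketch; they are not the Sesame API, and a real RDF library will have its own term classes.

```python
from dataclasses import dataclass

XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"


@dataclass(frozen=True)
class URIRef:
    """A resource identified by a URI; it carries no datatype."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A literal value: a lexical form plus a datatype URI."""
    lexical_form: str
    datatype: str


resource = URIRef("http://www.example.org/some/uri")
typed_literal = Literal("http://www.example.org/some/uri", XSD_ANY_URI)

# Same characters, but two different kinds of term in the model.
print(resource == typed_literal)
print(typed_literal.datatype)
```

Only the Literal has a place to put a datatype, which mirrors the point above: to attach xsd:anyURI you first have to turn the URI into a literal.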

Different RESTful representations of the same resource

My application has a resource at /foo. Normally, it is represented by an HTTP response payload like this:
{"a": "some text", "b": "some text", "c": "some text", "d": "some text"}
The client doesn't always need all four members of this object. What is the RESTfully semantic way for the client to tell the server what it needs in the representation? e.g. if it wants:
{"a": "some text", "b": "some text", "d": "some text"}
How should it GET it? Some possibilities (I'm looking for correction if I misunderstand REST):
GET /foo?sections=a,b,d.
The query string (called a query string after all) seems to mean "find resources matching this condition and tell me about them", not "represent this resource to me according to this customization".
GET /foo/a+b+d My favorite if REST semantics doesn't cover this issue, because of its simplicity.
Breaks URI opacity, violating HATEOAS.
Seems to break the distinction between resource (the sole meaning of a URI is to identify one resource) and representation. But that's debatable because it's consistent with /widgets representing a presentable list of /widget/<id> resources, which I've never had a problem with.
Loosen my constraints, respond to GET /foo/a, etc, and have the client make a request per component of /foo it wants.
Multiplies overhead, which can become a nightmare if /foo has hundreds of components and the client needs 100 of those.
If I want to support an HTML representation of /foo, I have to use Ajax, which is problematic if I just want a single HTML page that can be crawled, rendered by minimalist browsers, etc.
To maintain HATEOAS, it also requires links to those "sub-resources" to exist within other representations, probably in /foo: {"a": {"url": "/foo/a", "content": "some text"}, ...}
GET /foo, Content-Type: application/json and {"sections": ["a","b","d"]} in the request body.
Unbookmarkable and uncacheable.
HTTP does not define body semantics for GET. It's legal HTTP but how can I guarantee some user's proxy doesn't strip the body from a GET request?
My REST client won't let me put a body on a GET request so I can't use that for testing.
A custom HTTP header: Sections-Needed: a,b,d
I'd rather avoid custom headers if possible.
Unbookmarkable and uncacheable.
POST /foo/requests, Content-Type: application/json and {"sections": ["a","b","d"]} in the request body. Receive a 201 with Location: /foo/requests/1. Then GET /foo/requests/1 to receive the desired representation of /foo
Clunky; requires back-and-forth and some weird-looking code.
Unbookmarkable and uncacheable since /foo/requests/1 is just an alias that would only be used once and only kept until it is requested.
I would suggest the querystring solution (your first). Your arguments against the other alternatives are good arguments (and ones that I've run into in practise when trying to solve the same problem). In particular, the "loosen the constraints/respond to foo/a" solution can work in limited cases, but introduces a lot of complexity into an API from both implementation and consumption and hasn't, in my experience, been worth the effort.
I'll weakly counter your "seems to mean" argument with a common example: consider the resource that is a large list of objects (GET /Customers). It's perfectly reasonable to page these objects, and it's commonplace to use the querystring to do that: GET /Customers?offset=100&take=50 as an example. In this case, the querystring isn't filtering on any property of the listed object, it's providing parameters for a sub-view of the object.
More concretely, I'd say that you can maintain consistency and HATEOAS through these criteria for use of the querystring:
the object returned should be the same entity as that returned from the Url without the querystring.
the Uri without the querystring should return the complete object - a superset of any view available with a querystring at the same Uri. So, if you cache the result of the undecorated Uri, you know you have the full entity.
the result returned for a given querystring should be deterministic, so that Uris with querystrings are easily cacheable
However, what to return for these Uris can sometimes pose more complex questions:
returning a different entity type for Uris differing only by querystring could be undesirable (/foo is an entity but foo/a is a string); the alternative is to return a partially-populated entity
if you do use different entity types for sub-queries then, if your /foo doesn't have an a, a 404 status is misleading (/foo does exist!), but an empty response may be equally confusing
returning a partially-populated entity may be undesirable, but returning part of an entity may not be possible, or may be more confusing
returning a partially populated entity may not be possible if you have a strong schema (if a is mandatory but the client requests only b, you are forced to return either a junk value for a, or an invalid object)
In the past, I have tried to resolve this by defining specific named "views" of required entities, and allowing a querystring like ?view=summary or ?view=totalsOnly - limiting the number of permutations. This also allows for definition of a subset of the entity that "makes sense" to the consumer of the service, and can be documented.
Ultimately, I think that this comes down to an issue of consistency more than anything: you can meet HATEOAS guidance using the querystring relatively easily, but the choices you make need to be consistent across your API and, I'd say, well documented.
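As a rough illustration of the criteria above, the following Python sketch serves a sub-view of /foo selected by the query string. The parameter name sections comes from the question; the function name represent and the in-memory FOO dictionary are assumptions made for this example, and how an unknown member is reported is left to the API.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical full representation of /foo, taken from the question.
FOO = {"a": "some text", "b": "some text", "c": "some text", "d": "some text"}


def represent(url: str, entity: dict) -> dict:
    """Return the full entity, or the sub-view named by ?sections=..."""
    query = parse_qs(urlsplit(url).query)
    if "sections" not in query:
        # The undecorated URI returns the complete object (the superset).
        return dict(entity)
    wanted = query["sections"][0].split(",")
    unknown = [name for name in wanted if name not in entity]
    if unknown:
        raise LookupError(f"unknown sections: {unknown}")
    # Iterate over the entity, not the request, so the output is deterministic.
    return {name: value for name, value in entity.items() if name in wanted}


print(represent("/foo?sections=a,b,d", FOO))
print(represent("/foo", FOO))
```

Building the result in the entity's own key order means that ?sections=a,b,d and ?sections=d,b,a yield the same body, which helps keep responses deterministic.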
I've decided on the following:
Supporting few member combinations: I'll come up with a name for each combination. e.g. if an article has members for author, date, and body, /article/some-slug will return all of it and /article/some-slug/meta will just return the author and date.
Supporting many combinations: I'll separate member names by hyphens: /foo/a-b-c.
Either way, I'll return a 404 if the combination is unsupported.
Architectural constraint
REST
Identifying resources
From the definition of REST:
a resource R is a temporally varying membership function MR(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers.
A representation being an HTTP body and an identifier being a URL.
This is crucial. An identifier is just a value associated with other identifiers and representations. That's distinct from the identifier→representation mapping. The server can map whatever identifier it wants to any representation, as long as both are associated by the same resource.
It's up to the developer to come up with resource definitions that reasonably describe the business by thinking of categories of things like "users" and "posts".
HATEOAS
If I really care about perfect HATEOAS, I could put a hyperlink somewhere in the /foo representation to /foo/members, and that representation would just contain a hyperlink to every supported combination of members.
HTTP
From the definition of a URL:
The query component contains non-hierarchical data that, along with data in the path component, serves to identify a resource within the scope of the URI's scheme and naming authority (if any).
So /foo?sections=a,b,d and /foo?sections=b are distinct identifiers. But they can be associated within the same resource while being mapped to different representations.
HTTP's 404 code means that the server couldn't find anything to map the URL to, not that the URL is not associated with any resource.
Functionality
No browser or cache will ever have trouble with slashes or hyphens.
Actually it depends on the functionality of the resource.
If for example the resource represents an entity:
/customers/5
Here the '5' represents an id of the customer
Response:
{
"id": 5,
"name": "John",
"surename": "Doe",
"marital_status": "single",
"sex": "male",
...
}
If we examine it closely, each JSON property actually represents a field of the record in the customer resource instance.
Let's assume the consumer would like to get a partial response, meaning only some of the fields. We can look at it as the consumer wanting the ability to select, via the request, the fields that are of interest and no more (in order to save traffic, or to improve performance if some of the fields are hard to compute).
I think in this situation the most readable and correct API would be the following (for example, to get only name and surename):
/customers/5?fields=name,surename
Response:
{
"name": "John",
"surename": "Doe"
}
HTTP/1.1
if an illegal field name is requested, 404 (Not Found) is returned
if different field names are requested, different responses will be generated, which also fits well with caching.
Cons: if the same fields are requested but in a different order (say fields=id,name and fields=name,id), the responses will be cached separately although they are identical.
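One possible mitigation for the caching drawback, sketched here in Python, is to normalise the fields parameter before the URL is used as a cache key (or to redirect to the normalised form). The function name canonical_fields_url is hypothetical, and this is only one way of doing it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_fields_url(url: str) -> str:
    """Rewrite the URL so that the names in ?fields= are sorted and de-duplicated."""
    parts = urlsplit(url)
    pairs = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name == "fields":
            value = ",".join(sorted(set(value.split(","))))
        pairs.append((name, value))
    # Keep commas unescaped so the canonical URL stays readable.
    query = urlencode(pairs, safe=",")
    return urlunsplit(parts._replace(query=query))


print(canonical_fields_url("/customers/5?fields=name,id"))
print(canonical_fields_url("/customers/5?fields=id,name"))
```

Both calls print the same URL, so a cache keyed on the canonical form stores a single entry for the two orderings.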
HATEOAS
In my opinion pure HATEOAS is not suitable for solving this particular problem. Because in order to achieve that, you need a separate resource for every permutation of field combinations, which is overkill, as it is bloating the API extensively (say you have 8 fields in a resource, you will need permutations!).
If you model resources only for the individual fields and not for all the permutations, there are performance implications: the client has to make more round trips, whereas you want to keep the number of round trips to a minimum.
If a, b and c are properties of a resource (like admin for a role property), the right way is the first one you suggested, GET /foo?sections=a,b,d, because in this case you are applying a filter to the foo collection. Otherwise, if a, b and c are each a single resource of the foo collection, the way to go is a series of GET requests: /foo/a, /foo/b, /foo/c. This approach, as you said, has a high overhead per request, but it is the correct way to follow the RESTful approach. I would not use your second proposal, because the plus character in a URL has a special meaning.
Another proposal is to abandon GET, use POST, and create an action for the foo collection, such as /foo/filter or /foo/selection, or any verb that represents an action on the collection. That way, with a POST request body, you can pass a JSON list of the resources you want.
You could use a second vendor media type in the request header, application/vnd.com.mycompany.resource.rep2; you can't bookmark this, however. Query parameters are not cacheable (/foo?sections=a,b,c); you could take a look at matrix parameters instead, which should be cacheable. See: URL matrix parameters vs. request parameters

Clarification on URI path component?

According to RFC 3986 Section 3 - Syntax Components:
The scheme and path components are required, though the path may be
empty (no characters).
Can someone clarify how the path component can be required if it's able to be empty? Maybe I'm misunderstanding the definition of "required" in this context, but I assumed it to mean something along the lines of "must be non-empty," which obviously conflicts with the spec here.
Here, "required" means merely "always present": the scheme and path
components of an absolute URI are always present.
The scheme component can't be empty because the production
"scheme" requires at least one character.
The path component can be empty because the production
"path-empty" (part of "hier-part") consists of zero characters.
A common practical example of an empty - more precisely, an abempty - path is a URI like http://stackoverflow.com where the path is empty. The authority component (in this case it is stackoverflow.com) alone isn't enough information to identify a resource.
When the authority is empty, the path must begin with a / in order to distinguish the path from the authority - scheme:/// is a valid URI - hence an abempty path. Also take a look at this answer for further reading.
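The "always present, possibly empty" reading can be checked with Python's standard URL splitter, which is used here purely as an illustration; it is a general-purpose parser rather than a strict RFC 3986 validator.

```python
from urllib.parse import urlsplit

for uri in ("http://stackoverflow.com", "http://stackoverflow.com/", "scheme:///"):
    parts = urlsplit(uri)
    # The path attribute always exists; it is simply "" when no path is given.
    print(repr(uri), "-> authority:", repr(parts.netloc), "path:", repr(parts.path))
```

In the first case the path component is the empty string, in the second it is "/", and in the third the authority is empty while the path is "/".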

Authoritative position of duplicate HTTP GET query keys

I am having trouble finding authoritative information about the behavior of duplicate fields in an HTTP GET query string, like
http://example.com/page?field=foo&field=bar
and in particular whether the order is kept or not. Most web-oriented languages produce an array containing both foo and bar associated with the key "field", but I would like to know whether an authoritative statement exists (e.g. in an RFC) on this point. RFC 3986 has a section 3.4, Query, which refers to key=value pairs, but nothing is said on how to interpret order, duplicate fields and so on. This makes sense, since it's backend dependent and not in the scope of that RFC...
Although a de-facto standard exists, I'd like to see an authoritative source for it, just out of curiosity.
There is no spec on this. You may do what you like.
Typical approaches include: first-given, last-given, array-of-all, string-join-with-comma-of-all.
Suppose the raw request is:
GET /blog/posts?tag=ruby&tag=rails HTTP/1.1
Host: example.com
Then there are various options for what request.query['tag'] should yield, depending on the language or the framework:
request.query['tag'] => 'ruby'
request.query['tag'] => 'rails'
request.query['tag'] => ['ruby', 'rails']
request.query['tag'] => 'ruby,rails'
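The four options can be reproduced with Python's standard library, shown here only to make the alternatives concrete. parse_qsl returns the pairs as a list in the order they appear, and the variable names below are just labels for the four approaches.

```python
from urllib.parse import parse_qsl

# Parse the query string from the example request into (name, value) pairs.
pairs = parse_qsl("tag=ruby&tag=rails")
values = [value for name, value in pairs if name == "tag"]

first_given = values[0]          # 'ruby'
last_given = values[-1]          # 'rails'
array_of_all = values            # ['ruby', 'rails']
comma_joined = ",".join(values)  # 'ruby,rails'

print(first_given, last_given, array_of_all, comma_joined)
```

Which of these a framework exposes by default is a design decision of that framework, so code that relies on duplicates should read the raw pairs when it can.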
The situation seems to have changed since this question was asked and the accepted answer was written 12 years ago. I believe we now have an authoritative source: The WHATWG URL Standard describes the process of extracting and parsing a query string in detail in section 6.2 (https://url.spec.whatwg.org/#interface-urlsearchparams) and section 5.1 on x-www-form-urlencoded parsing (https://url.spec.whatwg.org/#urlencoded-parsing).
The parsing output is "an initially empty list of name-value tuples where both name and value hold a string", where a list is defined as a finite ordered sequence, and the key-value pairs are added to this list in the order they appear in the URL.
At first there is no mention of repeated keys, but some methods on the URLSearchParams class in section 6.2 (https://url.spec.whatwg.org/#interface-urlsearchparams) set clear expectations on ordering: "The getAll(name) method steps are to return the values of all name-value pairs whose name is name... in list order"; the sort() method specifies that "The relative order between name-value pairs with equal names must be preserved." (Emphasis mine).
Examining the Github issue referenced in the commit where the sort method was added, we see that the original proposal was to sort on values where keys were identical, but this was changed: "The reason for the default sort not affecting the value order is that ordering of the values can be significant. We should not assume that it's ok to move the order of the values around." (https://github.com/whatwg/url/issues/26#issuecomment-271600764)
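The ordering guarantees quoted above can be mimicked in Python as a rough analogy. This is not the WHATWG algorithm itself: parse_qsl is Python's own parser, and Python compares names by code point where the standard compares code units, but a stable sort on the name alone shows the same "relative order of equal names is preserved" behaviour.

```python
from urllib.parse import parse_qsl

# Pairs come back in the order they appear in the query string.
pairs = parse_qsl("b=2&a=z&b=1&a=y")

# Analogue of getAll("b"): all values for the name, in list order.
all_b = [value for name, value in pairs if name == "b"]

# Analogue of sort(): sort by name only. Python's sort is stable, so values
# that share a name keep their original relative order.
sorted_pairs = sorted(pairs, key=lambda pair: pair[0])

print(all_b)
print(sorted_pairs)
```

After sorting, the two values of a are still z then y, and the two values of b are still 2 then 1.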
I can confirm that for PHP (at least in version 4.4.4 and newer) it works like this:
GET /blog/posts?tag=ruby&tag=rails HTTP/1.1
Host: example.com
results in:
request.query['tag'] => 'rails'
But
GET /blog/posts?tag[]=ruby&tag[]=rails HTTP/1.1
Host: example.com
results in:
request.query['tag'] => ['ruby', 'rails']
This behavior is the same for GET and POST data.
yfeldblum's answer is perfect.
Just a note about a fifth behavior I noticed recently: on Windows Phone, opening an application with a URI that has a duplicate query key will result in NavigationFailed with:
System.ArgumentException: An item with the same key had already been added.
The culprit is System.Windows.Navigation.UriParsingHelper.InternalUriParseQueryStringToDictionary(Uri uri, Boolean decodeResults).
So the system won't even let you handle it the way you want; it will forbid it. The only solution left is to choose your own format (CSV, JSON, XML, ...) and URI-escape it.
Most (all?) of the frameworks offer no guarantees, so assume they will be returned in random order.
Always take the safest approach.
For example, java HttpServlet interface:
ServletRequest.html#getParameterValues
Even the getParameterMap method leaves out any mention about parameter order (the order of a java.util.Map iterator cannot be relied on either.)
Typically, duplicate parameter values like
http://example.com/page?field=foo&field=bar
result in a single queryString parameter that is an array:
field[0]=='foo'
field[1]=='bar'
I've seen this behavior in ASP, ASP.NET and PHP4.
The ?array[]=value1&array[]=value2 approach is certainly a very popular one.
supported by most Javascript frameworks
supported by Java Spring
supported by PHP

Resources