Why use quality values in the HTTP Accept-Language header?

In HTTP, the Accept-Language request header looks like this:
Accept-Language: da, en-gb;q=0.8, en;q=0.7
Why were the quality values (q=...) included in the HTTP specification? Couldn't one sort the languages by quality, pick an arbitrary order for languages with the same quality, and leave out any languages with q=0?

Interesting question.
The discussion of how this feature came to be is probably buried somewhere in the mailing list archives, for which I could not find a valid link. Your example is not the only problematic one. What is a server to do with "fr; q=1.0, en; q=1.0" if it supports both languages? Serve the French because it is listed first? What about "fr, en; q=1.0"?
Seems to me that an ordered list of language preferences would be a better fit for the problem than the current weighted (and maybe sorted) list. There are too many edge cases where the spec is mum about the expected behavior from an implementation.
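To make the ambiguity concrete, here is a minimal sketch (my own construction, not something the spec prescribes) of how a server might rank an Accept-Language header; for equal q-values the only option left is an arbitrary tie-break, such as the order the client sent:

# Minimal sketch of ranking Accept-Language values; the tie-breaking rule
# (client order) is an arbitrary choice, not mandated by the specification.
def rank_languages(header):
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        lang, q = parts[0], 1.0  # a missing qvalue defaults to 1 per RFC 7231
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if q > 0:  # q=0 means "not acceptable"
            prefs.append((lang, q, position))
    # Sort by q descending; equal q-values fall back to the order the client sent.
    prefs.sort(key=lambda t: (-t[1], t[2]))
    return [lang for lang, _, _ in prefs]

print(rank_languages("da, en-gb;q=0.8, en;q=0.7"))  # ['da', 'en-gb', 'en']
print(rank_languages("fr; q=1.0, en; q=1.0"))       # ['fr', 'en'] (tie broken arbitrarily)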
At least (some of) the contributors to the spec agree this feature is far from perfect (Key Differences between HTTP/1.0 and HTTP/1.1 - Paper Presented at The Eighth International World Wide Web Conference):
"Because the content-negotiation mechanism allows qvalues and wildcards, and expresses variation across many dimensions (language, character-set, content-type, and content-encoding) the automated choice of the ``best available'' variant can be complex and might generate unexpected outcomes. These choices can interact with caching in subtle ways; see the discussion in Section 3.4.
Content negotiation promises to be a fertile area for additional protocol evolution. For example, the HTTP working group recognized the utility of automatic negotiation regarding client implementation features, such as screen size, resolution, and color depth. The IETF has created the Content Negotiation working group to carry forward with work in the area."
In short, I have no real answer, but hopefully a participant in the specification process will chime in.

Keep in mind that quality values are used for many other (Accept-*) headers, so while they may not make much sense in the context of Accept-Language and may feel overcomplicated for selecting languages, the same universal concept is used for MIME types (including wildcards for whole groups) and much more. Don't judge it solely by how well it fits user language selection.

Related

Is there any 'correct' way to negotiate HTTP quality values / q-factors?

While there are many possible implementations of and suggestions for interpreting HTTP quality values / q-factors floating around on the internet, I was unable to find 'correct' interpretations for several cases, or explanations of why they should be interpreted that way. This would be mandatory for creating a "bulletproof" parsing mechanism.
For example, the MDN documentation lists the following example:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
would be read as follows:
value                               priority
text/html, application/xhtml+xml    1.0
application/xml                     0.9
*/*                                 0.8
It seems clear to me that, according to RFC 7231 (HTTP/1.1), the first argument is qualified as q=1.0 due to the default ('no value is the same as q=1'); the same appears to hold for the second argument, although it is not clearly stated that this is due to the defaulting.
Furthermore, it is totally unclear to me how the following constructed statement should be parsed:
text/html,application/xhtml+xml,application/xml;q=0.9,text/plain,image/png;q=0.95,text/html;q=0.5,image/jpeg;q=0.99,image*;q=1.5
Leaving aside the obvious pointlessness of this statement, it leads to several problems:
Should you consider it totally invalid, given its several imperfections, some of which outright break the standard (q>1 / q<0 is not allowed)?
For example, both MDN: Accept-Language and RFC 7231 5.3.5 Accept-Language state that it might be okay to reject the request with HTTP 406 Not Acceptable, but advise against it for usability reasons.
Should you expect text/plain to be q=1.0 because its qualifier is not specified, even though it sits between two arguments that are not defined as q=1.0? Or "should" you process it with some kind of state machine, carrying over the previous value, so that it becomes q=0.9?
How should you respond to conflicting information (text/html has both q=1.0 and q=0.5)? Should the earlier value be overwritten, or is it just "another" entry in the list, resulting in a duplicate?
How does the qualifier affect the order of preference when the server could fully satisfy every argument of the request, but they are provided in non-descending or even random order? Based on the resources given so far, I would assume I am expected to sort descending by q-value. However, the second example on the MDN page leaves this open to debate.
That example reads as follows:
text/html;q=0.8,text/*;q=0.8,*/*;q=0.8
which would expand to
value      priority
text/html  0.8
text/*     0.8
*/*        0.8
In this example, every value has the same q-factor, which raises the question of whether the application "should" sort such statements from the most specific to the least specific, or whether they "should" be kept in the order of declaration. If the order is kept, what purpose would and could the qualifier serve, especially as, according to the declaration, every content type would be "accepted" anyway? I assume that, since the example is stated on MDN, it makes sense in some way and is not purely hypothetical, but I am rather uncertain in this case. Most examples out there simply sort by the qualifier, which would definitely result in unexpected behaviour in this scenario.
And as MIME types consist of two components (according to RFC 2046), any reordering by specificity would have to consider at least two dimensions; this could lead to unexpected behaviour for permutations like
text/html;q=0.8,text/*;q=0.8,*/html;q=0.8,*/*;q=0.8
Also, it is unclear to me whether I should expect additional parameters on the arguments, for example
text/html;charset=utf-8;q=0.8,text/plain;charset=utf-8;q=0.8,text/html;q=0.5
as the MDN page on quality values states that some headers, such as Accept, handle additional specifiers, resulting in definitions like text/html;level=1, but it does not go into further detail. Considering the RFC 7231 ABNF definitions, such parameters seem at least possible, which suggests that an application developer shouldn't rely on matching solely for some kind of q=<potential floating point number [\.0-9]+> representation and treating everything that remains as a potential request argument.
There might be solutions for all these hypothetical problems that feel "natural" in some way; however, I am unable to find any reliable sources confirming them.
This all leads to the following questions:
Is there even a right way intended, or is this left open to the application developer?
If so, how should an application server react to the stated scenarios, and why / where is that documented?
As far as I can tell from my research so far, the topic is covered rather sparsely in the RFCs as well as in the browser documentation, which might suggest it is designed to be more of a tool than a ruleset. But I am unsure whether I am missing something "self-evident", as I am not aware of every RFC ever published.
As far as I can tell, the RFC and the MDN example are consistent, no?
Regarding your example:
text/html,application/xhtml+xml,application/xml;q=0.9,text/plain,image/png;q=0.95,text/html;q=0.5,image/jpeg;q=0.99,image*;q=1.5
This parses into
text/html
application/xhtml+xml
application/xml;q=0.9
text/plain
image/png;q=0.95
text/html;q=0.5
image/jpeg;q=0.99
image*;q=1.5
where the last element is an invalid media range. It's up to you whether you want to ignore just that element or the complete header field.
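If you go the lenient route, a rough sketch of that per-element handling could look like the following; the validation regex and the policy of silently dropping bad elements are my own choices, not requirements of RFC 7231:

import re

# Very loose check that an element starts with something shaped like type/subtype.
MEDIA_RANGE = re.compile(r"^[A-Za-z0-9*+.-]+/[A-Za-z0-9*+.-]+$")

def parse_accept(header):
    accepted = []
    for element in header.split(","):
        parts = [p.strip() for p in element.split(";")]
        media_range, q = parts[0], 1.0  # no qvalue means q=1
        if not MEDIA_RANGE.match(media_range):
            continue  # drop just the invalid element (e.g. "image*")
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if 0 <= q <= 1:  # qvalues outside this range are invalid as well
            accepted.append((media_range, q))
    return accepted

print(parse_accept("text/html,application/xml;q=0.9,image*;q=1.5,*/*;q=0.8"))
# [('text/html', 1.0), ('application/xml', 0.9), ('*/*', 0.8)]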

Etag: weak vs strong example

I have been reading about ETags, and I understand that there are two ways of generating an ETag: weak and strong. Weak ETags are computationally easier to generate than strong ones. I have also come to know that weak ETags are practically sufficient for most use cases.
from MDN
Weak validators are easy to generate but are far less useful for
comparisons. Strong validators are ideal for comparisons but can be
very difficult to generate efficiently.
another snippet:
Weak Etag values of two representations of the same resources might be
semantically equivalent, but not byte-for-byte identical.
I am finding it hard to understand what it means for a resource to be semantically similar but not byte-for-byte the same. It would be great to see some examples.
EDIT: I found an example here, but I don't get it:
Weak Validation: The two resource representations are semantically
equivalent, e.g. some of the content differences are not important
from the business logic perspective e.g. current date displayed on the
page might not be important for updating the entire resource for it.
Is it that, while generating the ETag, you can decide that certain changes in content are not important for the functionality (e.g. a CSS property change for font-size) and respond with 304? If so, when is the resource updated on the browser? I guess that as long as the ETag stays the same, the browser will not get the latest version. That would mean that only when a major change happens and a new ETag is created would the CSS property change be sent to the browser along with the major change.
My suggestion is to look at the specification, RFC 7232, section 2.1. It's only a couple of pages long and may answer all of your questions.
You asked for examples, here are some from the specification:
For example, the representation of a weather report that changes in
content every second, based on dynamic measurements, might be grouped
into sets of equivalent representations (from the origin server's
perspective) with the same weak validator in order to allow cached
representations to be valid for a reasonable period of time.
A representation's modification time, if defined with only
one-second resolution, might be a weak validator if it is possible
for the representation to be modified twice during a single second
and retrieved between those modifications.
If the origin server sends the same validator for a representation with
a gzip content coding applied as it does for a representation with no
content coding, then that validator is weak.
That last one represents what is probably the most common use of weak ETags: servers converting strong ETags into weak ones when they gzip the content. Nginx does this, for example.
The specification also explains when to change a weak ETag:
An origin server SHOULD change a weak entity-tag whenever it considers prior
representations to be unacceptable as a substitute for the current
representation.
In other words, it's up to you to decide if two representations of a resource are acceptable substitutions or not. If they are, you can improve caching performance by giving them the same weak ETag.
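As a rough illustration (my own construction, not from the RFC), a server could derive the strong validator from the exact bytes of the representation and the weak validator only from the fields it considers semantically significant:

import hashlib, json

def strong_etag(body_bytes):
    # Changes whenever any byte of the representation changes.
    return '"%s"' % hashlib.sha256(body_bytes).hexdigest()

def weak_etag(report):
    # Ignores fields the application treats as insignificant (here: the timestamp).
    significant = {k: v for k, v in report.items() if k != "generated_at"}
    payload = json.dumps(significant, sort_keys=True).encode()
    return 'W/"%s"' % hashlib.sha256(payload).hexdigest()

report_v1 = {"temperature": 21.4, "generated_at": "12:00:00"}
report_v2 = {"temperature": 21.4, "generated_at": "12:00:01"}

assert weak_etag(report_v1) == weak_etag(report_v2)   # same weak validator
assert strong_etag(json.dumps(report_v1).encode()) != strong_etag(json.dumps(report_v2).encode())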

Which tincan verbs to use

For data normalisation of standard tin can verbs, is it best to use verbs from the tincan registry https://registry.tincanapi.com/#home/verbs e.g.
completed http://activitystrea.ms/schema/1.0/complete
or to use the adl verbs like those defined:
in the 1.0 spec at https://github.com/adlnet/xAPI-Spec/blob/master/xAPI.md
this article http://tincanapi.com/2013/06/20/deep-dive-verb/
and listed at https://github.com/RusticiSoftware/tin-can-verbs/tree/master/verbs
e.g.
completed http://adlnet.gov/expapi/verbs/completed
I'm confused as to why those in the registry differ from every other example I can find. Is one of these out of date?
It really depends on which "profile" you want to target with your Statements. If you are trying to stick to e-learning practices that most closely resemble SCORM or some other standard, then the ADL verbs may be the most fitting. That is a very limited set, and really only the "voided" verb is provided for by the specification. The other verbs were related to those found in 0.9 and have become the de facto set, but they aren't any more "standard" than any other URI. If you are targeting statements to be used in an Activity Streams way, specifically with a social application, then you may want to stick with their set. Note that there are verbs in the Registry that are neither ADL-coined nor provided by the Activity Streams specification.
If you aren't targeting any specific profile (or existing profile) then you should use the terms that best capture the experiences which you are trying to record. And we ask that you either coin those terms at our Registry so that they are well formed and publicly available, or if you coin them under a different domain then at least get them catalogued in our Registry so others may find them. Registering a particular term in one or more registries will hopefully help keep the list of terms from exploding as people search for reusable items. This will ultimately make reporting tools more interoperable with different content providers.
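For reference, here is a hypothetical minimal statement using the ADL "completed" verb; the actor and object values are invented for illustration, and the verb id is where the choice discussed above ends up:

statement = {
    "actor": {"mbox": "mailto:learner@example.com", "name": "Example Learner"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",  # or the registry URI, depending on your profile
        "display": {"en-US": "completed"},
    },
    "object": {"id": "http://example.com/activities/intro-course"},
}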

Why isn't HTTP PUT allowed to do partial updates in a REST API?

Who says RESTful APIs must support partial updates separately via HTTP PATCH?
It seems to have no benefits. It adds more work to implement on the server side and more logic on the client side to decide which kind of update to request.
I am asking this question within the context of creating a REST API with HTTP that provides abstraction to known data models. Requiring PATCH for partial updates as opposed to PUT for full or partial feels like it has no benefit, but I could be persuaded.
Related
http://restcookbook.com/HTTP%20Methods/idempotency/ - this implies you don't have control over the server software that may cache requests.
What's the justification behind disallowing partial PUT? - no clear answer given, only a reference to what HTTP defines for PUT vs PATCH.
http://groups.yahoo.com/neo/groups/rest-discuss/conversations/topics/17415 - shows the divide of thoughts on this.
Who says? The guy who invented REST says:
@mnot Oy, yes, PATCH was something I created for the initial HTTP/1.1 proposal because partial PUT is never RESTful. ;-)
https://twitter.com/fielding/status/275471320685367296
First of all, REST is an architectural style, and one of its principles is to leverage the standardized behavior of the protocol underlying it, so if you want to implement a RESTful API over HTTP, you have to follow HTTP strictly for it to be RESTful. You're free not to do so if you think it's not adequate for your needs, and nobody will curse you for that, but then you're not doing REST. You'll have to document where and how you deviate from the standard, creating a strong coupling between client and server implementations, and the whole point of using REST is precisely to avoid that and focus on your media types.
So, based on RFC 7231, PUT should be used only for the complete replacement of a representation, in an idempotent operation. PATCH should be used for partial updates, which aren't required to be idempotent, but it's good practice to make them idempotent by requiring a precondition or validating the current state before applying the diff. If you need to do non-idempotent updates, partial or not, use POST. Simple. Everyone using your API who knows how PUT and PATCH work expects them to work that way, and you don't have to document or explain what the methods should do for a given resource. You're free to make PUT act in any other way you see fit, but then you'll have to document that for your clients, and you'll have to find another buzzword for your API, because that's not RESTful.
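As a hedged illustration of that split (the resource URI and fields are invented), the two requests might look like this, with the PATCH body carrying a merge patch (RFC 7396) that lists only the fields to change:

PUT /users/123 HTTP/1.1
Content-Type: application/json

{"name": "Alice", "email": "alice@example.com", "role": "admin"}

PATCH /users/123 HTTP/1.1
Content-Type: application/merge-patch+json

{"email": "alice@new-example.org"}

The PUT body is the complete replacement representation; the PATCH body carries only the delta.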
Keep in mind that REST is an architectural style focused on long term evolution of your API. To do it right will add more work now, but will make changes easier and less traumatic later. That doesn't mean REST is adequate for everything and everyone. If your focus is the ease of implementation and short term usage, just use the methods as you want. You can do everything through POST if you don't want to bother about clients choosing the right methods.
To expand on the existing answer, PUT is supposed to perform a complete update (overwrite) of the resource state simply because HTTP defines the method that way. The original RFC 2616 on HTTP/1.1 is not very explicit about this; RFC 7231 adds semantic clarifications:
4.3.4 PUT
The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload. A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.
As stated in the other answer, adhering to this convention simplifies the understanding and usage of APIs, and there is no need to explicitly document the behavior of the PUT method.
However, partial updates are not disallowed because of idempotency. I find this important to highlight, as these concepts are often confused, even on many StackOverflow answers (e.g. here).
Idempotent solely means that applying a request one or many times results in the same effect on the server. To quote RFC 7231 once more:
4.2.2 Idempotent methods
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
As long as a partial update contains only new values of the resource state and does not depend on previous values (i.e. those values are overwritten), the requirement of idempotency is fulfilled. Independently of how many times such a partial update is applied, the server's state will always hold the values specified in the request.
Whether an intermediate request from another client can change a different part of the resource is not relevant, because idempotency refers to the operation (i.e. the PUT method), not the state itself. And with respect to the operation of a partial overwriting update, its application yields the same effect after being applied once or many times.
On the contrary, an operation that is not idempotent depends on the current server state, therefore it leads to different results depending on how many times it is executed. The easiest example for this is incrementing a number (non-idempotent) vs. setting it to an absolute value (idempotent).
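In code terms (the handler names are invented for the sake of the example):

counter = {"value": 0}

def set_value(new_value):
    # Idempotent: repeating the same request leaves the state unchanged after the first time.
    counter["value"] = new_value

def increment():
    # Not idempotent: every repeat moves the state again.
    counter["value"] += 1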
For non-idempotent changes, HTTP provides the methods POST and PATCH. PATCH is explicitly designed to carry modifications to an existing resource, whereas POST can be interpreted much more freely regarding the relation between request URI, body content, and side effects on the server.
What does this mean in practice? REST is a paradigm for implementing APIs over the HTTP protocol, a convention that many people have considered reasonable and that is thus likely to be adopted or understood. Still, there are controversies regarding what is RESTful and what isn't, but even leaving those aside, REST is not the only correct or meaningful way to build HTTP APIs.
The HTTP protocol itself puts constraints on what you may and may not do, and many of them have actual practical impact. For example, disregarding idempotency may result in cache servers changing the number of requests actually issued by the client, and subsequently disrupt the logic expected by applications. It is thus crucial to be aware of the implications when deviating from the standard.
Staying strictly REST-conformant, there is no completely satisfying solution for partial updates (some even say this need alone goes against REST). The problem is that PATCH, which at first appears to be made just for this purpose, is not required to be idempotent. Thus, by using PATCH for idempotent partial updates, you lose the advantages of idempotency (an arbitrary number of automatic retries, simpler logic, potential for optimizations in client, server and network). As such, you may ask yourself whether using PUT is really the worst idea, as long as the behavior is clearly documented and doesn't break because users (and intermediate network nodes) rely on certain behavior.
Partial updates are allowed by PUT (according to RFC 7231 https://www.rfc-editor.org/rfc/rfc7231#section-4.3.4).
",... PUT request is defined as replacing the state of the target resource." - replacing part of object basically change state of it.
"Partial content updates are possible by targeting a separately identified resource with state that overlaps a portion of the larger resource, ..."
According to that RFC, the following request is valid: PUT /resource/123 {name: 'new name'}
It will change only the name of the specified resource. Specifying the id inside the request payload would be incorrect (PUT does not allow partial updates for unspecified resources).
PS: Below is an example of when PATCH is useful.
Consider an object that has an array inside. With PUT you can't update a specific value; you can only replace the whole list with a new one. With PATCH, you can replace one value with another. With maps and more complex objects, the benefit is even bigger.
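For instance, using JSON Patch (RFC 6902), with an invented resource shape and path, a single array element can be replaced without resending the rest:

PATCH /playlists/42 HTTP/1.1
Content-Type: application/json-patch+json

[{"op": "replace", "path": "/tracks/2", "value": "song-9876"}]

A PUT of the same change would have to send the entire playlist representation, the whole tracks array included.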

Is there a way to use AllegroGraph with a Lisp other than ACL?

So far I have only been reading the documentation, and it says that in order to use the Lisp client I have to use ACL. ACL Express edition has a 30-day expiration date. Since I'm too far from even considering any commercial use, I'm not likely to buy it in the foreseeable future.
Has anyone tried it with another Lisp? Is it at all permitted by the license? (My guess is "yes", because, for example, the Python client doesn't require any special purchases.)
Sure, actually. AllegroGraph supports a superset of the Sesame 2.0 HTTP protocol for graph stores. The key documentation you should have a look at is:
http://www.franz.com/agraph/support/documentation/current/http-protocol.html
As an example, to request a list of repositories in the root catalog, the HTTP interaction would be as follows:
GET /repositories HTTP/1.1
Accept: application/json
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
[{"uri": "<http://localhost:10035/repositories/test>",
"id": "\"test\"",
"title": "\"test\"",
"readable": true
"writeable": true}]
Note the Accept: header, which in this case specifies JSON as the desired response format. There are other formats available, N-Triples for example, but refer to the documentation for the most current list and the proper MIME type to use for each.
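For example, from any language with an HTTP client you can issue the same request; here is a sketch in Python using only the standard library (the host and port follow the localhost:10035 default shown above and may differ in your installation):

import json
import urllib.request

request = urllib.request.Request(
    "http://localhost:10035/repositories",
    headers={"Accept": "application/json"},  # ask for the JSON representation
)
with urllib.request.urlopen(request) as response:
    repositories = json.load(response)

for repo in repositories:
    print(repo["id"], repo["uri"])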
One thing to be aware of: since you will be getting back information that has no semantic definition on your remote Lisp instance, you will almost certainly want to define these yourself in order to build a useful library. So, among other things, you would probably want to define data structures (classes, for example) representing nodes, literals, triples, and so on. This is actually not the easiest thing to model effectively if you've never thought much about it before, but it is fairly straightforward and not too involved in practice. I'd recommend perhaps starting out with a library such as Ora Lassila's Wilbur, which I have used many times and always find a delight to read through. In fact, the original AllegroGraph, years ago, started out using Wilbur as a basis, so you will find that although there are many differences now, there is still a reasonable compatibility of ideas between the two projects. You can fetch the current sources for Wilbur from:
http://github.com/lisp/de.setf.wilbur
I hope this can at least help point you in the right direction to get started. Good luck!

Resources