Incrementing resource counter in a RESTful way: PUT vs POST - http

I have a resource that has a counter. For the sake of example, let's call the resource profile, and the counter is the number of views for that profile.
Per the REST wiki, PUT requests should be used for resource creation or modification, and should be idempotent. That combination is fine if I'm updating, say, the profile's name, because I can issue a PUT request which sets the name to something 1000 times and the result does not change.
For these standard PUT requests, I have browsers do something like:
PUT /profiles/123?property=value&property2=value2
For incrementing a counter, one calls the url like so:
PUT /profiles/123/?counter=views
Each call will result in the counter being incremented. Technically it's an update operation but it violates idempotency.
I'm looking for guidance/best practice. Are you just doing this as a POST?

I think the right answer is to use PATCH. I didn't see anyone else recommending it should be used to atomically increment a counter, but I believe RFC 2068 says it all very well:
The PATCH method is similar to PUT except that the entity contains a
list of differences between the original version of the resource
identified by the Request-URI and the desired content of the resource
after the PATCH action has been applied. The list of differences is
in a format defined by the media type of the entity (e.g.,
"application/diff") and MUST include sufficient information to allow
the server to recreate the changes necessary to convert the original
version of the resource to the desired version.
So, to update profile 123's view count, I would:
PATCH /profiles/123 HTTP/1.1
Host: www.example.com
Content-Type: application/x-counters
views + 1
Where the x-counters media type (which I just made up) is made of multiple lines of field operator scalar tuples. views = 500 or views - 1 or views + 3 are all valid syntactically (but may be forbidden semantically).
I can understand some frowning-upon making up yet another media type, but I humbly suggest it's more correct than the POST / PUT alternative. Making up a resource for a field, complete with its own URI and especially its own details (which I don't really keep, all I have is an integer) sounds wrong and cumbersome to me. What if I have 23 different counters to maintain?

An alternative might be to add another resource to the system to track the viewings of a profile. You might call it "Viewing".
To see all Viewings of a profile:
GET /profiles/123/viewings
To add a viewing to a profile:
POST /profiles/123/viewings #here, you'd submit the details using a custom media type in the request body.
To update an existing Viewing:
PUT /viewings/815 # submit revised attributes of the Viewing in the request body using the custom media type you created.
To drill down into the details of a viewing:
GET /viewings/815
To delete a Viewing:
DELETE /viewings/815
Also, because you're asking for best-practice, be sure your RESTful system is hypertext-driven.
For the most part, there's nothing wrong with using query parameters in URIs - just don't give your clients the idea that they can manipulate them.
Instead, create a media type that embodies the concepts the parameters are trying to model. Give this media type a concise, unambiguous, and descriptive name. Then document this media type. The real problem of exposing query parameters in REST is that the practice often leads out-of-band communication, and therefore increased coupling between client and server.
Then give your system a uniform interface. For example, adding a new resource is always a POST. Updating a resource is always a PUT. Deleting is DELETE, and getiing is GET.
The hardest part about REST is understanding how media types figure into system design (it's also the part that Fielding left out of his dissertation because he ran out of time). If you want a specific example of a hypertext-driven system that uses and doucuments media types, see the Sun Cloud API.

After evaluating the previous answers I decided PATCH was inappropriate and, for my purposes, fiddling around with Content-Type for a trivial task was a violation of the KISS principle. I only needed to increment n+1 so I just did this:
PUT /profiles/123$views
++
Where ++ is the message body and is interpreted by the controller as an instruction to increment the resource by one.
I chose $ to deliminate the field/property of the resource as it is a legal sub-delimiter and, for my purposes, seemed more intuitive than / which, in my opinion, has the vibe of traversability.

I think both approaches of Yanic and Rich are interresting. A PATCH does not need to be safe or indempotent but can be in order to be more robust against concurrency. Rich's solution is certainly easier to use in a "standard" REST API.
See RFC5789:
PATCH is neither safe nor idempotent as defined by [RFC2616], Section
9.1.
A PATCH request can be issued in such a way as to be idempotent,
which also helps prevent bad outcomes from collisions between two
PATCH requests on the same resource in a similar time frame.
Collisions from multiple PATCH requests may be more dangerous than
PUT collisions because some patch formats need to operate from a
known base-point or else they will corrupt the resource.

Related

What is the expected logic of a batched POST request to a subresource

I like the idea of vectorized/batched requests similar to what the StackExchange API offers and would like to implement something for my own API, i.e. GET /users/1;2;3;4;5 would return the selected user resources with id 1 to 5.
I think this is fairly simple when reading data, but what would be the expected behavior for i.e. a POST request to a subresource?
POST /1;2;3;4;5/subresource
Would this mean:
Creation of five new subresources, assigned to each id (1:1)
Creation of a single new subresource, but assigned to each resource id (1:n)
I have a couple of concerns regarding this approach. First, resources should be uniquely addressable via certain resource locators (URIs). Using your approach however bypasses this requirement in some way IMO. This approach may also lead to other issues later on, i.e. plenty of frameworks do not allow URIs that exceed a certain character size.
Furthermore, instead of consecutive resource IDs the resource should use UUIDs instead. This will first and foremost prevent guessing attacks and also prevent logical issues on moving resources or inserting some in between.
The POST method requests that the target resource process the
representation enclosed in the request according to the resource's
own specific semantics.
In regards to HTTP POST operations, the specification clearly states that the semantics of any body received via POST is up to the service developer. So you are basically allowed to do anything within a POST request. As the semantics is totally up to you, you have to document the behavior explicitely. Not documenting the applied logic will leave a large grey-zone for service users.

How should Transient Resources be retrieved in a RESTful API

For a while I was (wrongly) thinking that a RESTful API just exposed CRUD operation to persisted entities for a web application. When you code something up in "the real world" you soon find out that this is not enough. For example, a bank account transfer doesn't have to be a persisted entity. It could be a transient resource where you POST to /transfers/ and in the payload you specify the details:
{"accountToCredit":1234, "accountToDebit":5678, "amount":10}
Using POST here makes sense because it changes the state on the server ($10 moves from one account to another every time this POST occurs).
What should happen in the case where it doesn't affect the server? The simple first answer would be to use GET. For example, you want to get a list of savings and checking accounts that have less than $100. You would then call something like GET to /accounts/searchResults?minBalance=0&maxBalance=100. What happens though if your search parameter need to use complex objects that wouldn't fit in the maximum length of a GET request.
My first thought was to use POST, but after thinking about it some more it should probably be a PUT since it isn't changing the state of the server, but from my (limited) understanding I always though of PUT as updating a resource and POST as creating a resource (like creating this search results). So which should be used in this case?
I found the following links which provide some information but it wasn't clear to me what should be used in the different cases:
Transient REST Representations
How to design RESTful search/filtering?
RESTful URL design for search
I would agree with your approach, it seems reasonable to me to use GET when searching for resources, and as said in one of your provided links, the whole point of query strings is for doing things like search. I also agree that PUT fits better when you want to update some resource in an idempotent way (no matter how many times you hit the request, the result will be the same).
So generally, I would do it as you propose. Now, if you are limited by the maximum length of GET request, then you could use POST or PUT, passing your parameters in a JSON, in a URI like:
PUT /api/search
You could see this as a "search resource" where you send new parameters. I know it seems like a workaround and you may be worried that REST is about avoiding verbs in the URIs. Well, there are few cases that it's still acceptable and RESTful to use verbs, e.g. in cases where calculation or conversion is involved to generate the result (for more about this, check this reference).
PS. I think this workaround is still RESTful, but even if it wasn't, REST isn't an obsession and an ultimate goal. Being pragmatic and keeping a clean API design might be a better approach, even if in few cases you are not RESTful.

RESTful Alternatives to DELETE Request Body

While the HTTP 1.1 spec seems to allow message bodies on DELETE requests, it seems to indicate that servers should ignore it since there are no defined semantics for it.
4.3 Message Body
A server SHOULD read and forward a message-body on any request; if the
request method does not include defined semantics for an entity-body,
then the message-body SHOULD be ignored when handling the request.
I've already reviewed several related discussions on this topic on SO and beyond, such as:
Is an entity body allowed for an HTTP DELETE request?
Payloads of HTTP Request Methods
HTTP GET with request body
Most discussions seem to concur that providing a message body on a DELETE may be allowed, but is generally not recommended.
Further, I've noticed a trend in various HTTP client libraries where more and more enhancements seem to be getting logged for these libraries to support request bodies on DELETE. Most libraries seem to oblige, although occasionally with a little bit of initial resistance.
My use case calls for the addition of some required metadata on a DELETE (e.g. the "reason" for deletion, along with some other metadata required for deletion). I've considered the following options, none of which seem completely appropriate and inline with HTTP specs and/or REST best practices:
Message Body - The spec indicates that message bodies on DELETE have no semantic value; not fully supported by HTTP clients; not standard practice
Custom HTTP Headers - Requiring custom headers is generally against standard practices; using them is inconsistent with the rest of my API, none of which require custom headers; further, no good HTTP response available to indicate bad custom header values (probably a separate question altogether)
Standard HTTP Headers - No standard headers are appropriate
Query Parameters - Adding query params actually changes the Request-URI being deleted; against standard practices
POST Method - (e.g. POST /resourceToDelete { deletemetadata }) POST is not a semantic option for deleting; POST actually represents the opposite action desired (i.e. POST creates resource subordinates; but I need to delete the resource)
Multiple Methods - Splitting the DELETE request into two operations (e.g. PUT delete metadata, then DELETE) splits an atomic operation into two, potentially leaving an inconsistent state. The delete reason (and other related metadata) are not part of the resource representation itself.
My first preference would probably be to use the message body, second to custom HTTP headers; however, as indicated, there are some downsides to these approaches.
Are there any recommendations or best practices inline with REST/HTTP standards for including such required metadata on DELETE requests? Are there any other alternatives that I haven't considered?
Despite some recommendations not to use the message body for DELETE requests, this approach may be appropriate in certain use cases. This is the approach we ended up using after evaluating the other options mentioned in the question/answers, and after collaborating with consumers of the service.
While the use of the message body is not ideal, none of the other options were perfectly fitting either. The request body DELETE allowed us to easily and clearly add semantics around additional data/metadata that was needed to accompany the DELETE operation.
I'd still be open to other thoughts and discussions, but wanted to close the loop on this question. I appreciate everyone's thoughts and discussions on this topic!
Given the situation you have, I would take one of the following approaches:
Send a PUT or PATCH: I am deducing that the delete operation is virtual, by the nature of needing a delete reason. Therefore, I believe updating the record via a PUT/PATCH operation is a valid approach, even though it is not a DELETE operation per se.
Use the query parameters: The resource uri is not being changed. I actually think this is also a valid approach. The question you linked was talking about not allowing the delete if the query parameter was missing. In your case, I would just have a default reason if the reason is not specified in the query string. The resource will still be resource/:id. You can make it discoverable with Link headers on the resource for each reason (with a rel tag on each to identify the reason).
Use a separate endpoint per reason: Using a url like resource/:id/canceled. This does actually change the Request-URI and is definitely not RESTful. Again, link headers can make this discoverable.
Remember that REST is not law or dogma. Think of it more as guidance. So, when it makes sense to not follow the guidance for your problem domain, don't. Just make sure your API consumers are informed of the variance.
What you seem to want is one of two things, neither of which are a pure DELETE:
You have two operations, a PUT of the delete reason followed by a DELETE of the resource. Once deleted, the contents of the resource are no longer accessible to anyone. The 'reason' cannot contain a hyperlink to the deleted resource. Or,
You are trying to alter a resource from state=active to state=deleted by using the DELETE method. Resources with state=deleted are ignored by your main API but might still be readable to an admin or someone with database access. This is permitted - DELETE doesn't have to erase the backing data for a resource, only to remove the resource exposed at that URI.
Any operation which requires a message body on a DELETE request can be broken down into at it's most general, a POST to do all the necessary tasks with the message body, and a DELETE. I see no reason to break the semantics of HTTP.
I suggest you include the required metadata as part of the URI hierarchy itself. An example (Naive):
If you need to delete entries based on a date range, instead of passing the start date and end date in body or as query parameters, structure the URI such a way that you pass the required information as part of the URI.
e.g.
DELETE /entries/range/01012012/31122012 -- Delete all entries between 01 January 2012 to 31st December 2012
Hope this helps.
I would say that query parameters are part of the resource definition, thus you can use them to define the scope of your operation, then "apply" the operation.
My conclusion is that Query Parameters as you defined it is the best approach.

REST verbs - which convention is "correct"

I'm well into implementing a REST service (on a Windows CE platform if that matters) and I started out using IBM's general definitions of using POST for creating (INSERTs) and PUT for updating.
Now I've run across Sun's definitions which are exactly the opposite. So my question is, which is the "generally accepted" definition? Or is there even one?
The disadvantage of using PUT to create resources is that the client has to provide the
unique ID that represents the object it is creating. While it usually possible for the client
to generate this unique ID, most application designers prefer that their servers (usually
through their databases) create this ID. In most cases we want
our server to control the generation of resource IDs. So what do we do? We can switch
to using POST instead of PUT.
So:
Put = UPDATE
Post = INSERT
The reason to use POST as INSERT and PUT as UPDATE is that POST is the only nonidempotent and unsafe operation according to HTTP. Idempotent means that no matter how many times you apply the operation, the result is always the same. In SQL INSERT is the only nonidempotent operation so it should be mapped onto POST. UPDATE is idempotent so it can be mapped on PUT. It means that the same PUT/UPDATE operation may be applied more than one time but only first will change state of our system/database.
Thus using PUT for INSERT will broke HTTP semantic viz requirement that PUT operation must be idempotent.
The verbs are:
GET {path}: Retrieve the resource whose identifier is {path}.
PUT {path}: Create or update the resource whose identifier is {path}.
DELETE {path}: Delete the resource whose identifier is {path}.
POST {path}: Invoke an action which is identified by {path}.
When the intention is to create a new resource, we can use PUT if we know what the identifier of the resource should be, and we can use POST if we want the server to determine the identifier of the new resource.
PUT can be used for creation when the server grants the client control over a portion of its URI space. This is equivalent to file creation in a file system: when you save to a file that does not yet exist you create it and if that file exists the result is an update.
However, PUT is lacking the ability of an implicit intent of the client. Consider placing an order: if you PUT to /orders/my-new-order the meaning can only ever be update the resource identified by /orders/my-new-order whereas POST /orders/ can mean 'place a new order' if the POST accepting resource has the appropriate semantics.
IOW, if you want to achieve anything as a side effect of the creation of the new resource you must use POST.
Jan
We use POST= Create, PUT= Update.
Why? There's no good reason. We had to choose one, and that's that choice I made.
Edit. Looking at other answers, I realize that the key creation issue might make the decision.
We POST new entries and return a JSON object with the generated key. It seems like this is a better fit for generally accepted POST semantics.
We PUT to existing entries with a full URI that identifies the object.
Here http://www.w3.org/Protocols/rfc2616/rfc2616.html is the offical guide of how to implement the behaviour of the HTTP methods.
I've been studying the concepts and implementations of REST a lot lately and the general consensus seems to be: PUT is used for creating/updating depending on whether or not the resource already exists. POST is used to append a resource to a collection.
See HTTP/1.1 Method Definitions http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
POST is designed to allow a uniform method to cover the following
functions:
- Annotation of existing resources;
- Posting a message to a bulletin board, newsgroup, mailing
list, or similar group of articles;
- Providing a block of data, such as the result of submitting a
form, to a data-handling process;
- Extending a database through an append operation.
The PUT method requests that the enclosed entity be stored under the
supplied Request-URI. If the Request-URI refers to an already existing
resource, the enclosed entity SHOULD be considered as a modified
version of the one residing on the origin server.
Also see the accepted answer to the question at Understanding REST: Verbs, error codes, and authentication.

Paging in a Rest Collection

I'm interested in exposing a direct REST interface to collections of JSON documents (think CouchDB or Persevere). The problem I'm running into is how to handle the GET operation on the collection root if the collection is large.
As an example pretend I'm exposing StackOverflow's Questions table where each row is exposed as a document (not that there necessarily is such a table, just a concrete example of a sizable collection of 'documents'). The collection would be made available at /db/questions with the usual CRUD api GET /db/questions/XXX, PUT /db/questions/XXX, POST /db/questions is in play. The standard way to get the entire collection is to GET /db/questions but if that naively dumps each row as a JSON object, you'll get a rather sizeable download and a lot of work on the part of the server.
The solution is, of course, paging. Dojo has solved this problem in its JsonRestStore via a clever RFC2616-compliant extension of using the Range header with a custom range unit items. The result is a 206 Partial Content that returns only the requested range. The advantage of this approach over a query parameter is that it leaves the query string for...queries (e.g. GET /db/questions/?score>200 or somesuch, and yes that'd be encoded %3E).
This approach completely covers the behavior I want. The problem is that RFC 2616 specifies that on a 206 response (emphasis mine):
The request MUST have included a Range header field (section 14.35)
indicating the desired range, and MAY have included an If-Range
header field (section 14.27) to make the request conditional.
This makes sense in the context of the standard usage of the header but is a problem because I'd like the 206 response to be the default to handle naive clients/random people exploring.
I've gone over the RFC in detail looking for a solution but have been unhappy with my solutions and am interested in SO's take on the problem.
Ideas I've had:
Return 200 with a Content-Range header! - I don't think that this is wrong, but I'd prefer if a more obvious indicator that the response is only Partial Content.
Return 400 Range Required - There is not a special 400 response code for required headers, so the default error has to be used and read by hand. This also makes exploration via web browser (or some other client like Resty) more difficult.
Use a query parameter - The standard approach, but I'm hoping to allow queries a la Persevere and this cuts into the query namespace.
Just return 206! - I think most clients wouldn't freak out, but I'd rather not go against a MUST in the RFC
Extend the spec! Return 266 Partial Content - Behaves exactly like 206 but is in response to a request that MUST NOT contain the Range header. I figure that 266 is high enough that I shouldn't run into collision issues and it makes sense to me but I'm not clear on whether this is considered taboo or not.
I'd think this is a fairly common problem and I'd like to see this done in a sort of de facto fashion so I or someone else isn't reinventing the wheel.
What's the best way to expose a full collection via HTTP when the collection is large?
I don't really agree with some of you guys. I've been working for weeks on this features for my REST service. What I ended up doing is really simple. My solution only makes a sense for what REST people call a collection.
Client MUST include a "Range" header to indicate which part of the collection he needs, or otherwise be ready to handle a 413 REQUESTED ENTITY TOO LARGE error when the requested collection is too large to be retrieved in a single round-trip.
Server sends a 206 PARTIAL CONTENT response, with the Content-Range header specifying which part of the resource has been sent, and an ETag header to identify the current version of the collection. I usually use a Facebook-like ETag {last_modification_timestamp}-{resource_id}, and I consider that the ETag of a collection is that of the most recently modified resource it contains.
To request a specific part of a collection, the client MUST use the "Range" header, and fill the "If-Match" header with the ETag of the collection obtained from previously performed requests to acquire other parts of the same collection. The server can therefore verify that the collection hasn't changed before sending the requested portion. If a more recent version exists, a 412 PRECONDITION FAILED response is returned to invite the client to retrieve the collection from scratch. This is necessary because it could mean that some resources might have been added or removed before or after the currently requested part.
I use ETag/If-Match in tandem with Last-Modified/If-Unmodified-Since to optimize cache. Browsers and proxies might rely on one or both of them for their caching algorithms.
I think that a URL should be clean unless it's to include a search/filter query. If you think about it, a search is nothing more than a partial view of a collection. Instead of the cars/search?q=BMW type of URLs, we should see more cars?manufacturer=BMW.
My gut feeling is that the HTTP range extensions aren't designed for your use case, and thus you shouldn't try. A partial response implies 206, and 206 must only be sent if the client asked for it.
You may want to consider a different approach, such as the one use in Atom (where the representation by design may be partial, and is returned with a status 200, and potentially paging links). See RFC 4287 and RFC 5005.
You can still return Accept-Ranges and Content-Ranges with a 200 response code. These two response headers give you enough information to infer the same information that a 206 response code provides explicitly.
I would use Range for pagination, and have it simply return a 200 for a plain GET.
This feels 100% RESTful and doesn't make browsing any more difficult.
Edit:
I wrote a blog post about this: http://otac0n.com/blog/2012/11/21/range-header-i-choose-you.html
If there is more than one page of responses, and you don't want to offer the whole collection at once, does that mean there are multiple choices?
On a request to /db/questions, return 300 Multiple Choices with Link headers that specify how to get to each page as well as a JSON object or HTML page with a list of URLs.
Link: <>; rel="http://paged.collection.example/relation/paged"
Link: <>; rel="http://paged.collection.example/relation/paged"
...
You'd have one Link header for each page of results (an empty string means the current URL, and the URL is the same for each page, just accessed with different ranges), and the relationship is defined as a custom one per the upcoming Link spec. This relationship would explain your custom 266, or your violation of 206. These headers are your machine-readable version, since all of your examples require an understanding client anyway.
(If you stick with the "range" route, I believe your own 2xx return code, as you described it, would be the best behavior here. You're expected to do this for your applications and such ["HTTP status codes are extensible."], and you have good reasons.)
300 Multiple Choices says you SHOULD also provide a body with a way for the user agent to pick. If your client is understanding, it should use the Link headers. If it's a user manually browsing, perhaps an HTML page with links to a special "paged" root resource that can handle rendering that particular page based on the URL? /humanpage/1/db/questions or something hideous like that?
The comments on Richard Levasseur's post remind me of an additional option: the Accept header (section 14.1). Back when the oEmbed spec came out, I wondered why it hadn't been done entirely using HTTP, and wrote up an alternative using them.
Keep the 300 Multiple Choices, the Link headers and the HTML page for an initial naive HTTP GET, but rather than use ranges, have your new paging relationship define the use of the Accept header. Your subsequent HTTP request might look like this:
GET /db/questions HTTP/1.1
Host: paged.collection.example
Accept: application/json;PagingSpec=1.0;page=1
The Accept header allows you to define an acceptable content type (your JSON return), plus extensible parameters for that type (your page number). Riffing on my notes from my oEmbed writeup (can't link to it here, I'll list it in my profile), you could be very explicit and provide a spec/relation version here in case you need to redefine what the page parameter means in the future.
Edit:
After thinking about it a bit more, I'm inclined to agree that Range headers aren't appropriate for pagination. The logic being, the Range header is intended for the server's response, not the applications. If you served 100 megabytes of results, but the server (or client) could only process 1 megabyte at a time, well, thats what the Range header is for.
I'm also of the opinion that a subset of resources is its own resource (similar to relational algebra.), so it deserve representation in the URL.
So basically, I recant my original answer (below) about using a header.
I think you answered your own question, more or less - return 200 or 206 with content-range and optionally use a query parameter. I would sniff the user agent and content type and, depending on those, check for a query parameter. Otherwise, require the range headers.
You essentially have conflicting goals - let people use their browser to explore (which doesn't easily allow custom headers), or force people to use a special client that can set headers (which doesn't let them explore).
You could just provide them with the special client depending on the request - if it looks like a plain browser, send down a small ajax app that renders the page and sets the necessary headers.
Of course, there is also the debate about whether the URL should contain all the necessary state for this sort of thing. Specifying the range using headers can be considered "un-restful" by some.
As an aside, it would be nice if servers could respond with a "Can-Specify: Header1, header2" header, and web browsers would present a UI so users could fill in values, if they desired.
You might consider using a model something like the Atom Feed Protocol since it has a sane HTTP model of collections and how to manipulate them (where insane means WebDAV).
There's the Atom Publishing Protocol which defines the collection model and REST operations plus you can use RFC 5005 - Feed Paging and Archiving to page through big collections.
Switching from Atom XML to JSON content should not affect the idea.
I think the real problem here is that there is nothing in the spec that tells us how to do automatic redirects when faced with 413 - Requested Entity Too Large.
I was struggling with this same problem recently and I looked for inspiration in the RESTful Web Services book. Personally I don't think 206 is appropriate due to the header requirement. My thoughts also led me to 300, but I thought that was more for different mime-types, so I looked up what Richardson and Ruby had to say on the subject in Appendix B, page 377. They suggest that the server just pick the preferred representation and send it back with a 200, basically ignoring the notion that it should be a 300.
That also jibes with the notion of links to next resources that we have from atom. The solution I implemented was to add "next" and "previous" keys to the json map I was sending back and be done with it.
Later on I started thinking maybe the thing to do is send a 307 - Temporary Redirect to a link that would be something like /db/questions/1,25 - that leaves the original URI as the canonical resource name, but it gets you to an appropriately named subordinate resource. This is behavior I'd like to see out of a 413, but 307 seems a good compromise. Haven't actually tried this in code yet though. What would be even better is for the redirect to redirect to a URL containing the actual IDs of the most recently asked questions. For example if each question has an integer ID, and there are 100 questions in the system and you want to show the ten most recent, requests to /db/questions should be 307'd to /db/questions/100,91
This is a very good question, thanks for asking it. You confirmed for me that I'm not nuts for having spent days thinking about it.
One of the big problems with range headers is that a lot of corporate proxies filter them out. I'd advise to use a query parameter instead.
With the publication of rfc723x, unregistered range units do go against an explicit recommendation in the spec. Consider rfc7233 (deprecating rfc2616):
"New range units ought to be registered with IANA" (along with a reference to a HTTP Range Unit Registry).
You can detect the Range header, and mimic Dojo if it is present, and mimic Atom if it is not. It seems to me that this neatly divides the use cases. If you are responding to a REST query from your application, you expect it to be formatted with a Range header. If you are responding to a casual browser, then if you return paging links it will let the tool provide an easy way to explore the collection.
Seems to me that the best way to do this is to include range as query parameters. e.g., GET /db/questions/?date>mindate&date<maxdate. Upon a GET to the /db/questions/ with no query parameters, return 303 with Location: /db/questions/?query-parameters-to-retrieve-the-default-page. Then provide a different URL by which whomever is consuming your API to get statistics about the collection (e.g., what query parameters to use if s/he wants the entire collection);
While its possible to use the Range header for this purpose, I don't think that was the intent. It seems to have been designed for handling flaky connections as well as limiting the data (so the client can request part of the request if something was missing or the size was too large to process). You are hacking pagination into something that is likely used for other purposes at the communication layer.
The "proper" way to handle pagination is with the types you return. Rather than returning questions object, you should be returning a new type instead.
So if questions is like this:
<questions>
<question index=1></question>
<question index=2></question>
...
</questions>
The new type could be something like this:
<questionPage>
<startIndex>50</startIndex>
<returnedCount>10</returnedCount>
<totalCount>1203</totalCount>
<questions>
<question index=50></question>
<question index=51></question>
..
</questions>
<questionPage>
Of course you control your media types, so you can make your "pages" a format that suits your needs. If you make is something generic, you can have a single parser on the client to handle paging the same for all types. I think that is more in the spirit of the HTTP specification, rather than fudging the Range parameter for something else.

Resources