I am trying to implement a system which depends on the HTTP get/post parameter order.
I want the system to provide a remote function call mechanism, for example:
Suppose there is a function foo(int, int). It can be called remotely by an HTTP GET to http://ip:port/method=foo&paramType=int&param=1&paramType=int&param=2, or by an HTTP POST with the post data method=foo&paramType=int&param=1&paramType=int&param=2, which acts like calling foo(1,2) locally.
As you can see, it depends heavily on parameter order. If the parameter order goes wrong, foo(2,1) will be called unexpectedly.
But I am not sure whether this is reliable, since I think the W3C did not make a spec for the parameter order (tell me if I'm wrong).
I am not sure the parameter order will be as expected at three points:
Will the client (such as a browser or JMeter) post the parameters in the order you see?
Will the order be preserved during transmission?
Will the web container (such as Tomcat) or the web framework (such as Django) preserve the parameter order?
I did a few tests and found that Chrome, Firefox and JMeter send GET/POST parameters as expected, and that Tomcat preserves the parameter order. But it's hard work to find negative cases, and I can't be sure there are none. So I can't be sure the system I am trying to implement is reliable.
Does anyone have any experience with this problem? All suggestions are welcome.
You cannot enforce parameter order in either a URL query string or an application/x-www-form-urlencoded post. Although W3C defines HTML to transmit form values in the order they appear in the HTML, server-side scripts are free to access parameters by name in any order, and having multiple parameters with the same name is a recipe for disaster. You need to rename your parameters to make them unique and order-independent, e.g.:
method=foo&param1Type=int&param1=1&param2Type=int&param2=2
This way, foo() can read its 2 paramX parameters regardless of their ordering. For instance, this would also be perfectly valid and still be functional:
param2=2&param1=1&param1Type=int&param2Type=int&method=foo
Personally, I would suggest you eliminate the paramType parameters:
method=foo&param1=1&param2=2
Your API spec dictates the data types of the parameters. If a client sends a non-integer value to foo(), return an HTTP error, like 400 Bad Request. Always validate input before using it.
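As a rough illustration of the named-parameter approach, here is a minimal Python sketch. The `foo` body, the `call_foo` helper and the (status, body) return shape are assumptions made up for the example, not part of any framework; only the `method`, `param1` and `param2` names come from the query strings above.

```python
from urllib.parse import parse_qs


def foo(a: int, b: int) -> int:
    """Hypothetical remote-callable function; stands in for the real foo()."""
    return a - b


def call_foo(query_string: str) -> tuple[int, str]:
    """Return (HTTP status, body) for e.g. 'method=foo&param1=1&param2=2'."""
    params = parse_qs(query_string)
    if params.get("method") != ["foo"]:
        return 400, "unknown or missing method"
    try:
        # Each parameter must appear exactly once; the name, not the
        # position in the query string, decides which argument it is.
        (raw1,) = params["param1"]
        (raw2,) = params["param2"]
        a, b = int(raw1), int(raw2)
    except (KeyError, ValueError):
        return 400, "param1 and param2 must each be a single integer"
    return 200, str(foo(a, b))
```

With this, `method=foo&param1=1&param2=2` and `param2=2&param1=1&method=foo` both end up calling foo(1, 2), while a non-integer or duplicated parameter is answered with 400.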
If the order matters, I would design it the way #TGH said, where the parameters are part of the path, like http://someServer/param1/param2. This enforces ordering and won't allow requests to be made any other way. If you design it using query parameters and expect the browser to maintain the order, that opens up a possible security hole for someone to take advantage of.
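A small Python sketch of the path-based variant, assuming a hypothetical `/foo/` route prefix and integer arguments (neither is specified above). Path segments are inherently ordered, so no naming scheme is needed:

```python
from urllib.parse import unquote, urlsplit


def positional_args(url: str, prefix: str = "/foo/") -> list[int]:
    """Extract ordered integer arguments from a path such as /foo/1/2."""
    path = urlsplit(url).path
    if not path.startswith(prefix):
        raise ValueError("unexpected path")
    segments = [unquote(s) for s in path[len(prefix):].split("/") if s]
    # int() raises ValueError on anything that is not an integer.
    return [int(s) for s in segments]
```

A server would map a ValueError here to a 400 response.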
I know that typically, GET requests pass relevant parameters by appending query strings to the URI. POST requests pass parameters by inserting them into the request body.
I also know that it is considered bad practice to pass GET parameters into the request body, but that is not an explanation as to why this behavior was initially implemented. As far as I can tell, the main difference between GET and POST requests is more or less semantic. That is, I can generally assume GET is safe and POST is not, which helps me understand that I can cache GET results, and allow GET to call multiple times without worrying that I am going to mess something up server-side. This does not explain why there is a different implementation for passing parameters. Would it have been a bad idea to have GET be semantically equivalent to the way that it is used now but use the request body to pass parameters instead of the URI? Similarly, was there a compelling historical reason for the designers of HTTP to divide how GET and POST stored parameters (in such a seemingly arbitrary way)?
I think it might help to reframe this a bit: don't look at these as parameters, but take a step back.
Ultimately the purpose of the GET request is to ask a server for the representation of a given URI. The fact that the URI has things that are 'parameters' or 'variable' is kind of irrelevant here.
The idea of the GET request is, give me the thing at this URI. The fact that there are some mechanisms to dynamically build up these URIs is not important.
You can do the exact same thing with POST. Any URI that you can GET, you could send a POST request to. So these URI variables can also exist for POST.
So on a high level, this is what the requests are for:
GET - Return the representation at the given URI
PUT - Replace the representation at the given URI
DELETE - Remove the resource at the given URI
Looking at these operations as something you do at the given URI it starts to make sense that GET doesn't have a request body. If the purpose of the request is simply 'give me the thing at this uri', there's no real reason to also have a body.
Likewise, it doesn't make that much sense for a PUT request to respond with a body. The purpose of PUT is to replace whatever is at the given URI with the request body. The caller only really needs to know if the operation succeeded or not, so an HTTP status code is enough.
For DELETE neither the request nor the response body really makes sense. All you need to know is there is a URI, and you are calling the DELETE operation, which succeeds or fails.
So what about POST? POST is really a bit of a catch-all that is used for basically 'anything else'. You're not explicitly replacing, removing or retrieving a resource, but rather you are doing 'some arbitrary operation', like doing an RPC call or creating a new resource in a collection.
Unlike GET, PUT, and DELETE, the meaning of POST is much more loose.
POST - Do an arbitrary operation at the given URI.
Data in the URL can be linked to, bookmarked and easily shared.
Data in the request body can be big, and easily include things which are not text (like file attachments).
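To make the URL-versus-body distinction concrete, here is a hedged Python sketch that only builds the two requests (nothing is sent). The host `example.com`, the `/search` path and the parameter names are placeholders chosen for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "http methods", "page": "2"}
encoded = urlencode(params)  # "q=http+methods&page=2"

# GET: the data is part of the URL, so the whole request can be
# linked to, bookmarked or shared as a single string.
get_request = Request("http://example.com/search?" + encoded, method="GET")

# POST: the same data travels in the request body, which can be
# large and does not have to be text.
post_request = Request(
    "http://example.com/search",
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
```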
I'm designing a (more or less) RESTful internal web service running on ASP.NET and IIS. I want clients to be able to pass query details to the server when accessing large collections of entries, using JSON to describe the query in a known manner. The issue is that the queries sent to the server will be complex; they may include aggregation, filtering, mapping—essentially anything that is supported by the LINQ query operators. This will result in relatively large JSON objects representing the queries.
The conflict I'm facing is that, while a query is semantically a GET in the world of REST, there's no standardized way to pass a large block of data to a web server during a GET. I've come up with a few options to get around this issue.
Option 1: Send the query object in the body of the GET request.
GET /namespace/collection/ HTTP/1.1
Content-Length: 22
{ /* query object */ }
Obviously, this is non-standard, and some software may choke on a GET request that has a body. (Or worse, simply strip the body and handle the request without it, which would cause the server to return an incorrect result set.)
Option 2: Use a non-standard HTTP verb (perhaps QUERY) instead of GET.
QUERY /namespace/collection/ HTTP/1.1
Content-Length: 22
{ /* query object */ }
While this doesn't fit exactly with the REST pattern, it seems (to me) like a safe alternative because other software (such as anything that uses WebDAV) seems to use non-standard HTTP verbs with sufficient success.
Option 3: Put the query object in a non-standard HTTP header.
GET /namespace/collection/ HTTP/1.1
ProjectName-Query: { /* query object */ }
This option keeps the request as a GET, but requires stuffing what could potentially be a very large object in an HTTP header. I understand some software places arbitrary length limits on HTTP headers, so this may cause issues if the object gets too big.
Option 4: Use the POST verb and provide an alternate endpoint for querying.
POST /namespace/collection/query HTTP/1.1
Content-Length: 22
{ /* query object */ }
Because this uses a standard verb and no non-standard headers, this method should work in all scenarios. The only issue is that it strays from RESTful architecture, which I'm trying to stay aligned with as best I can.
None of these options is quite right. What I want to know is which way makes the most sense for the service I'm writing; it's an internal web service (it will never be exposed to the public) but it may be accessed through a variety of network security applications (firewalls, content filters, etc.) and I want to stick to known development styles, standards, and architecture as best I can.
I would think about "RESTful querying" as having two resources: Query and QueryResult.
You POST your Query to one endpoint (e.g. "POST /queries/") and receive a CREATED status back with the URI of your specific query (/queries/123) and a nice and RESTful hypertext body telling you the URL of your query result (e.g. /result/123). Then you access your query result with a GET /result/123. (Bonus points if you use hypertext to link back to /queries/123 so that the consumer of the query result can check and modify the query.)
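A minimal in-memory Python sketch of that two-resource flow. Everything concrete here is an assumption for illustration: the `ENTRIES` data, the `min_score` filter, and the `post_query`/`get_result` helpers that return status codes instead of talking HTTP.

```python
import itertools

# Hypothetical data set standing in for the large collection.
ENTRIES = [{"id": 1, "score": 10}, {"id": 2, "score": 250}, {"id": 3, "score": 300}]

_queries: dict[int, dict] = {}
_ids = itertools.count(1)


def post_query(query: dict) -> tuple[int, dict, dict]:
    """POST /queries/ -> (status, headers, body). Stores the query as a resource."""
    query_id = next(_ids)
    _queries[query_id] = query
    headers = {"Location": f"/queries/{query_id}"}
    body = {"query": f"/queries/{query_id}", "result": f"/result/{query_id}"}
    return 201, headers, body


def get_result(query_id: int) -> tuple[int, dict]:
    """GET /result/<id> -> (status, body). Evaluates the stored query."""
    query = _queries.get(query_id)
    if query is None:
        return 404, {"error": "no such query"}
    matches = [e for e in ENTRIES if e["score"] >= query.get("min_score", 0)]
    # Link back to the query so the consumer can inspect or modify it.
    return 200, {"query": f"/queries/{query_id}", "items": matches}
```

The result stays a plain GET on its own URI, so it can be cached and linked to, while the arbitrarily large query object travels in a POST body.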
To elaborate the point I'm trying to make:
If RESTful is basically reduced to "map business entities to URIs", then the obvious question arises: "how can I query a subset of my entities?" Often the solution is "adding a query string to the 'all entities of this type' URL". Why else would it be called a "query string"? But it starts to feel "wrong", as stated in the OP, if you want to have a full-fledged query interface.
The reason is that with this requirement the Query becomes a full business object itself and is no longer an addendum to a resource address. It's no longer secondary but primary. It becomes important enough to become a resource in its own right, with its own address (e.g. URL) and representation.
I would use Option 4. It is difficult to put the JSON query representation for a large search request into a URL, especially against a search server. I agree that in that case it does not fit a RESTful style, since the resources cannot be identified by the URI. REST is a guideline. If the scenario cannot be realized with REST, then I guess you should do something that solves the problem. Here using POST is not RESTful, but it seems to be the correct solution.
I'm not sure how much it would look "canonical" to you, but you could have a serious look at OData (open data protocol):
OData is a standardized protocol for creating and consuming data APIs.
OData builds on core protocols like HTTP and commonly accepted
methodologies like REST. The result is a uniform way to expose
full-featured data APIs.
Even if you don't implement it as is, there are ideas that could be reused.
Specifically, OData defines batch processing. It's used for executing multiple operations sent in a single HTTP request. So, with OData, you have two choices:
use the GET + query string operation for queries that are not too long
use a POST + multipart body operation for bigger things.
More on maximum uri length in an OData context: OData Url Length Limitations
Also, many security devices (routers, firewalls, etc.) will simply not let your options 1, 2 and 3 go through. GET + body is unusual, GET + a big header value may get killed, and a custom HTTP verb is also very unusual.
Overall, I think the POST + body seems the best choice (whether it's strictly multipart - like in OData - or not is up to you)
After thinking more about this, I am going to give another answer.
What do you mean, in estimated number of characters, when you state the JSON representations will be "relatively large"? IE can handle URLs over 2,000 characters. Will the queries ever get bigger than that? Because I think the querystring is the way to go. Right now I am working on a system that uses JSONP so we have no other option than to pass all data as a JSON package in the querystring and it works fine. Not only will using the GET verb be semantically correct, this will also include the feature of being able to bookmark URLs to the results. The users could easily share links to the data results through email or other electronic communication systems you use internally.
I'm not sure if this helps but even for all Quickbooks APIs, queries which return large resultsets like Read All, or a LINQ extender query which returns large JSON resultsets, we use GET with the relevant content type and encoding like ASCII. The request uses compressionFormat as None and response uses a GZIP compressionFormat.
https://developer.intuit.com/apiexplorer?apiName=V3QBO
The best way would be to serialize the search JSON object and pass it as a query parameter. Are you sure it will be too long for modern browsers and servers? Modern browsers and servers can handle pretty hefty GET query parameter lengths, thousands of characters.
Perhaps an extension header like X-Custom-Query-Parameters-JSON if objects are going to be more on the order of 8k characters.
How many characters would a serialized JSON object be in your particular case?
Some related questions about character limits:
What is the limit on QueryString / GET / URL parameters
Is there a practical HTTP Header length limit?
An interesting problem. I don't have the specifics on what you are trying to do, but I wonder if it is too much to gracefully handle with one resource. You may want to break it up into several different types depending on the main characteristics of the request. If you are just trying to expose what should be a SQL query through an HTTP request, then I don't think there is any way it can be implemented without a mess. Just pass the SQL query in the query string and stop trying to find a proper way to do it; it doesn't exist.
Use POST, and pass the queries/parameters as key-value pairs in the body as json. It also becomes easier in your asp.net code to translate the payload into a dictionary object.
Dictionary<string,object>
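The same idea, sketched in Python rather than ASP.NET (an analogue, not the answerer's code): the JSON body is decoded straight into a dict. The `parse_query_body` name and the rule that the top level must be an object are assumptions for the example.

```python
import json


def parse_query_body(body: bytes) -> dict:
    """Decode a JSON request body into a plain dict of query parameters."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        # Key-value pairs are expected, so reject arrays and scalars.
        raise ValueError("query body must be a JSON object")
    return payload
```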
Say I have a resource, customers/3, which returns the customer object, and I want to return this object with different fields or some other changes. For example, I need the customer object to also include the customer's latest purchase (for the sake of speed I don't want to do 2 different queries).
As i see it my options are:
customers/3/with-latest-purchase
customers/3?display=with-latest-purchase
In the first option there is a distinct URI for the new representation, but is this REALLY needed? Also, how do I tell the client that this URI exists?
In the second option there is GET parameter telling the server what kind of representation to return. The URI parameters can be explained through OPTIONS method and it is easier to tell client where to look for the data as all the representations are all in one place.
So my question is which of these is better (more RESTful) and/or is there some better way to do this that i do not know about?
I think what is best is to define atomic, indivisible service objects, e.g. customer and customer-latest-purchase, nice, clean, simple. Then if the client wants a customer with his latest purchases, they invoke both service calls, instead of jamming it all in one with funky parameters.
Different representations of an object is OK in Java through interfaces but I think it is a bad idea for REST because it compromises its simplicity.
There is a misconception that making query parameters look like file paths is more RESTful. The query portion of the address is included when determining a distinct URI so the second option is fine.
Is there much of a performance hit in including the latest purchase data in all customer GET requests? If not, the simplest thing would be to do that, so there would be neither weird URL params nor double requests. If getting the latest order is a significant hardship (which it probably shouldn't be), there is nothing wrong with adding a flag in the query string to include it.
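A sketch of the query-string flag in Python, reusing the `display=with-latest-purchase` parameter from the question. The in-memory `CUSTOMERS` and `LATEST_PURCHASE` tables and the `get_customer` helper are invented for the example.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-ins for the two database lookups.
CUSTOMERS = {3: {"id": 3, "name": "Ada"}}
LATEST_PURCHASE = {3: {"item": "book"}}


def get_customer(url: str) -> dict:
    """Return the customer, optionally enriched with the latest purchase."""
    parts = urlsplit(url)
    customer_id = int(parts.path.rsplit("/", 1)[1])
    customer = dict(CUSTOMERS[customer_id])  # copy, so the stored record is untouched
    if "with-latest-purchase" in parse_qs(parts.query).get("display", []):
        customer["latest_purchase"] = LATEST_PURCHASE.get(customer_id)
    return customer
```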
I've only recently been getting involved with PHP/AJAX/jQuery and it seems to me that an important part of these technologies is that of POST and GET.
First, what is the difference between POST and GET? Through experimenting, I know that GET appends the returning variables and their values to the URL string
website.example/directory/index.php?name=YourName&bday=YourBday
but POST doesn't.
So, is this the only difference or are there specific rules or conventions for using one or the other?
Second, I've also seen POST and GET outside of PHP: also in AJAX and jQuery. How do POST and GET differ between these 3? Are they the same idea, same functionality, just utilized differently?
GET and POST are two different types of HTTP requests.
According to Wikipedia:
GET requests a representation of the specified resource. Note that GET should not be used for operations that cause side-effects, such as using it for taking actions in web applications. One reason for this is that GET may be used arbitrarily by robots or crawlers, which should not need to consider the side effects that a request should cause.
and
POST submits data to be processed (e.g., from an HTML form) to the identified resource. The data is included in the body of the request. This may result in the creation of a new resource or the updates of existing resources or both.
So essentially GET is used to retrieve remote data, and POST is used to insert/update remote data.
HTTP/1.1 specification (RFC 2616) section 9 Method Definitions contains more information on GET and POST as well as the other HTTP methods, if you are interested.
In addition to explaining the intended uses of each method, the spec also provides at least one practical reason for why GET should only be used to retrieve data:
Authors of services which use the HTTP protocol SHOULD NOT use GET based forms for the submission of sensitive data, because this will cause this data to be encoded in the Request-URI. Many existing servers, proxies, and user agents will log the request URI in some place where it might be visible to third parties. Servers can use POST-based form submission instead
Finally, an important consideration when using GET for AJAX requests is that some browsers - IE in particular - will cache the results of a GET request. So if you, for example, poll using the same GET request you will always get back the same results, even if the data you are querying is being updated server-side. One way to alleviate this problem is to make the URL unique for each request by appending a timestamp.
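The timestamp trick can be sketched like this in Python. The parameter name `_` is just a common convention and an assumption here, as is the `cache_busted` helper itself.

```python
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, now_ms: Optional[int] = None) -> str:
    """Return url with a '_' timestamp parameter, replacing any existing one."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "_"]
    query.append(("_", str(now_ms)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Each poll then uses a URL the browser has not seen before, so a cached response cannot be reused.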
A POST, unlike a GET, typically has relevant information in the body of the request. (A GET should not have a body, so aside from cookies, the only place to pass info is in the URL.) Besides keeping the URL relatively cleaner, POST also lets you send much more information (as URLs are limited in length, for all practical purposes), and lets you send just about any type of data (file upload forms, for example, can't use GET -- they have to use POST plus a special content type/encoding).
Aside from that, a POST connotes that the request will change something, and shouldn't be redone willy-nilly. That's why you sometimes see your browser asking you if you want to resubmit form data when you hit the "back" button.
GET, on the other hand, should be idempotent -- meaning you could do it a million times and the server will do the same thing (and show basically the same result) each and every time.
Whilst not a description of the differences, below are a couple of things to think about when choosing the correct method.
GET requests can get cached by the browser which can be a problem (or benefit) when using ajax.
GET requests expose parameters to users (POST does as well but they are less visible).
POST can pass much more information to the server and can be of almost any length.
POST and GET are two HTTP request methods. GET is usually intended to retrieve some data, and is expected to be idempotent (repeating the query does not have any side-effects) and can only send limited amounts of parameter data to the server. GET requests are often cached by default by some browsers if you are not careful.
POST is intended for changing the server state. It carries more data, and repeating the query is allowed (and often expected) to have side-effects such as creating two messages instead of one.
If you are working RESTfully, GET should be used for requests where you are only getting data, and POST should be used for requests where you are making something happen.
Some examples:
GET the page showing a particular SO question
POST a comment
Send a POST request by clicking the "Add to cart" button.
With POST you can also do multipart mime encoding which means you can attach files as well. Also if you are using post variables across navigation of pages, the user will get a warning asking if they want to resubmit the post parameter. Typically they look the same in an HTTP request, but you should just stick to POST if you need to "POST" something TO a server and "GET" if you need to GET something FROM a server as that's the way they were intended.
The only "big" difference between POST and GET (when using them with AJAX) is that, since GET parameters are passed in the URL, they are limited in their length (URLs aren't infinite in length).
I'm interested in exposing a direct REST interface to collections of JSON documents (think CouchDB or Persevere). The problem I'm running into is how to handle the GET operation on the collection root if the collection is large.
As an example, pretend I'm exposing StackOverflow's Questions table where each row is exposed as a document (not that there necessarily is such a table, just a concrete example of a sizable collection of 'documents'). The collection would be made available at /db/questions, with the usual CRUD API (GET /db/questions/XXX, PUT /db/questions/XXX, POST /db/questions) in play. The standard way to get the entire collection is to GET /db/questions, but if that naively dumps each row as a JSON object, you'll get a rather sizeable download and a lot of work on the part of the server.
The solution is, of course, paging. Dojo has solved this problem in its JsonRestStore via a clever RFC2616-compliant extension of using the Range header with a custom range unit items. The result is a 206 Partial Content that returns only the requested range. The advantage of this approach over a query parameter is that it leaves the query string for...queries (e.g. GET /db/questions/?score>200 or somesuch, and yes that'd be encoded %3E).
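For readers who have not seen the custom range unit, here is a rough Python sketch of what the server side might do with a `Range: items=0-24` header. The exact header syntax is my assumption of how the `items` unit is written, and `slice_collection` is a made-up helper, not Dojo code.

```python
import re

_RANGE_RE = re.compile(r"^items=(\d+)-(\d+)$")


def slice_collection(items: list, range_header: str) -> tuple[int, dict, list]:
    """Apply 'items=first-last' (inclusive bounds) and return (status, headers, body)."""
    match = _RANGE_RE.match(range_header.strip())
    if not match:
        return 400, {}, []
    first, last = int(match.group(1)), int(match.group(2))
    if first > last:
        return 400, {}, []
    if first >= len(items):
        # 416: the requested range lies entirely outside the collection.
        return 416, {"Content-Range": f"items */{len(items)}"}, []
    last = min(last, len(items) - 1)
    headers = {"Content-Range": f"items {first}-{last}/{len(items)}"}
    return 206, headers, items[first:last + 1]
```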
This approach completely covers the behavior I want. The problem is that RFC 2616 specifies that on a 206 response (emphasis mine):
The request MUST have included a Range header field (section 14.35)
indicating the desired range, and MAY have included an If-Range
header field (section 14.27) to make the request conditional.
This makes sense in the context of the standard usage of the header but is a problem because I'd like the 206 response to be the default to handle naive clients/random people exploring.
I've gone over the RFC in detail looking for a solution but have been unhappy with my solutions and am interested in SO's take on the problem.
Ideas I've had:
Return 200 with a Content-Range header! - I don't think that this is wrong, but I'd prefer a more obvious indicator that the response is only Partial Content.
Return 400 Range Required - There is not a special 400 response code for required headers, so the default error has to be used and read by hand. This also makes exploration via web browser (or some other client like Resty) more difficult.
Use a query parameter - The standard approach, but I'm hoping to allow queries a la Persevere and this cuts into the query namespace.
Just return 206! - I think most clients wouldn't freak out, but I'd rather not go against a MUST in the RFC
Extend the spec! Return 266 Partial Content - Behaves exactly like 206 but is in response to a request that MUST NOT contain the Range header. I figure that 266 is high enough that I shouldn't run into collision issues and it makes sense to me but I'm not clear on whether this is considered taboo or not.
I'd think this is a fairly common problem and I'd like to see this done in a sort of de facto fashion so I or someone else isn't reinventing the wheel.
What's the best way to expose a full collection via HTTP when the collection is large?
I don't really agree with some of you guys. I've been working for weeks on this feature for my REST service. What I ended up doing is really simple. My solution only makes sense for what REST people call a collection.
Client MUST include a "Range" header to indicate which part of the collection he needs, or otherwise be ready to handle a 413 REQUEST ENTITY TOO LARGE error when the requested collection is too large to be retrieved in a single round-trip.
Server sends a 206 PARTIAL CONTENT response, with the Content-Range header specifying which part of the resource has been sent, and an ETag header to identify the current version of the collection. I usually use a Facebook-like ETag {last_modification_timestamp}-{resource_id}, and I consider that the ETag of a collection is that of the most recently modified resource it contains.
To request a specific part of a collection, the client MUST use the "Range" header, and fill the "If-Match" header with the ETag of the collection obtained from previously performed requests to acquire other parts of the same collection. The server can therefore verify that the collection hasn't changed before sending the requested portion. If a more recent version exists, a 412 PRECONDITION FAILED response is returned to invite the client to retrieve the collection from scratch. This is necessary because it could mean that some resources might have been added or removed before or after the currently requested part.
I use ETag/If-Match in tandem with Last-Modified/If-Unmodified-Since to optimize cache. Browsers and proxies might rely on one or both of them for their caching algorithms.
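The rules above could be sketched roughly as follows in Python. This is my reading of the scheme, not the answerer's code: `respond` takes an already-parsed item range instead of a raw header, `max_items` is an arbitrary threshold, the `items` unit in Content-Range is assumed, and the collection is assumed to be non-empty.

```python
def collection_etag(resources: list[dict]) -> str:
    """ETag of a collection: '{timestamp}-{id}' of its most recently modified resource."""
    newest = max(resources, key=lambda r: r["modified"])
    return f'"{newest["modified"]}-{newest["id"]}"'


def respond(resources, item_range=None, if_match=None, max_items=2):
    """Return (status, headers); building the body is omitted."""
    etag = collection_etag(resources)
    if item_range is None:
        if len(resources) > max_items:
            return 413, {}  # too large for one round-trip: client must send Range
        return 200, {"ETag": etag}
    if if_match is not None and if_match != etag:
        # The collection changed since the client fetched its earlier parts.
        return 412, {"ETag": etag}
    first, last = item_range
    last = min(last, len(resources) - 1)
    headers = {"ETag": etag, "Content-Range": f"items {first}-{last}/{len(resources)}"}
    return 206, headers
```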
I think that a URL should be clean unless it's to include a search/filter query. If you think about it, a search is nothing more than a partial view of a collection. Instead of the cars/search?q=BMW type of URLs, we should see more cars?manufacturer=BMW.
My gut feeling is that the HTTP range extensions aren't designed for your use case, and thus you shouldn't try. A partial response implies 206, and 206 must only be sent if the client asked for it.
You may want to consider a different approach, such as the one used in Atom (where the representation by design may be partial, and is returned with a status 200, and potentially paging links). See RFC 4287 and RFC 5005.
You can still return Accept-Ranges and Content-Range with a 200 response code. These two response headers give you enough information to infer the same information that a 206 response code provides explicitly.
I would use Range for pagination, and have it simply return a 200 for a plain GET.
This feels 100% RESTful and doesn't make browsing any more difficult.
Edit:
I wrote a blog post about this: http://otac0n.com/blog/2012/11/21/range-header-i-choose-you.html
If there is more than one page of responses, and you don't want to offer the whole collection at once, does that mean there are multiple choices?
On a request to /db/questions, return 300 Multiple Choices with Link headers that specify how to get to each page as well as a JSON object or HTML page with a list of URLs.
Link: <>; rel="http://paged.collection.example/relation/paged"
Link: <>; rel="http://paged.collection.example/relation/paged"
...
You'd have one Link header for each page of results (an empty string means the current URL, and the URL is the same for each page, just accessed with different ranges), and the relationship is defined as a custom one per the upcoming Link spec. This relationship would explain your custom 266, or your violation of 206. These headers are your machine-readable version, since all of your examples require an understanding client anyway.
(If you stick with the "range" route, I believe your own 2xx return code, as you described it, would be the best behavior here. You're expected to do this for your applications and such ["HTTP status codes are extensible."], and you have good reasons.)
300 Multiple Choices says you SHOULD also provide a body with a way for the user agent to pick. If your client is understanding, it should use the Link headers. If it's a user manually browsing, perhaps an HTML page with links to a special "paged" root resource that can handle rendering that particular page based on the URL? /humanpage/1/db/questions or something hideous like that?
The comments on Richard Levasseur's post remind me of an additional option: the Accept header (section 14.1). Back when the oEmbed spec came out, I wondered why it hadn't been done entirely using HTTP, and wrote up an alternative using them.
Keep the 300 Multiple Choices, the Link headers and the HTML page for an initial naive HTTP GET, but rather than use ranges, have your new paging relationship define the use of the Accept header. Your subsequent HTTP request might look like this:
GET /db/questions HTTP/1.1
Host: paged.collection.example
Accept: application/json;PagingSpec=1.0;page=1
The Accept header allows you to define an acceptable content type (your JSON return), plus extensible parameters for that type (your page number). Riffing on my notes from my oEmbed writeup (can't link to it here, I'll list it in my profile), you could be very explicit and provide a spec/relation version here in case you need to redefine what the page parameter means in the future.
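On the server side, pulling the page number out of such an Accept value could look something like this Python sketch. It handles a single media range only (no comma-separated alternatives, no quoted semicolons), and `parse_accept` is a hypothetical helper.

```python
def parse_accept(value: str) -> tuple[str, dict[str, str]]:
    """Split one media range such as 'application/json;PagingSpec=1.0;page=1'."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        name, _, val = raw.partition("=")
        # Parameter names are case-insensitive, so normalize them.
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params
```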
Edit:
After thinking about it a bit more, I'm inclined to agree that Range headers aren't appropriate for pagination. The logic being, the Range header is intended for the server's response, not the application's. If you served 100 megabytes of results, but the server (or client) could only process 1 megabyte at a time, well, that's what the Range header is for.
I'm also of the opinion that a subset of resources is its own resource (similar to relational algebra), so it deserves representation in the URL.
So basically, I recant my original answer (below) about using a header.
I think you answered your own question, more or less - return 200 or 206 with content-range and optionally use a query parameter. I would sniff the user agent and content type and, depending on those, check for a query parameter. Otherwise, require the range headers.
You essentially have conflicting goals - let people use their browser to explore (which doesn't easily allow custom headers), or force people to use a special client that can set headers (which doesn't let them explore).
You could just provide them with the special client depending on the request - if it looks like a plain browser, send down a small ajax app that renders the page and sets the necessary headers.
Of course, there is also the debate about whether the URL should contain all the necessary state for this sort of thing. Specifying the range using headers can be considered "un-restful" by some.
As an aside, it would be nice if servers could respond with a "Can-Specify: Header1, header2" header, and web browsers would present a UI so users could fill in values, if they desired.
You might consider using a model something like the Atom Feed Protocol since it has a sane HTTP model of collections and how to manipulate them (where insane means WebDAV).
There's the Atom Publishing Protocol which defines the collection model and REST operations plus you can use RFC 5005 - Feed Paging and Archiving to page through big collections.
Switching from Atom XML to JSON content should not affect the idea.
I think the real problem here is that there is nothing in the spec that tells us how to do automatic redirects when faced with 413 - Request Entity Too Large.
I was struggling with this same problem recently and I looked for inspiration in the RESTful Web Services book. Personally I don't think 206 is appropriate due to the header requirement. My thoughts also led me to 300, but I thought that was more for different mime-types, so I looked up what Richardson and Ruby had to say on the subject in Appendix B, page 377. They suggest that the server just pick the preferred representation and send it back with a 200, basically ignoring the notion that it should be a 300.
That also jibes with the notion of links to next resources that we have from atom. The solution I implemented was to add "next" and "previous" keys to the json map I was sending back and be done with it.
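Roughly what that looks like, as a Python sketch. The `start`/`size` query parameters and the `page_of` helper are assumptions for the example; only the "next" and "previous" keys come from the description above.

```python
def page_of(items: list, start: int, size: int, base_url: str = "/db/questions") -> dict:
    """Build a JSON-serializable page with 'next' and 'previous' links."""
    page = {"items": items[start:start + size]}
    if start + size < len(items):
        page["next"] = f"{base_url}?start={start + size}&size={size}"
    if start > 0:
        page["previous"] = f"{base_url}?start={max(start - size, 0)}&size={size}"
    return page
```

The response is an ordinary 200, and a client pages through the collection simply by following the links until "next" is absent.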
Later on I started thinking maybe the thing to do is send a 307 - Temporary Redirect to a link that would be something like /db/questions/1,25 - that leaves the original URI as the canonical resource name, but it gets you to an appropriately named subordinate resource. This is behavior I'd like to see out of a 413, but 307 seems a good compromise. Haven't actually tried this in code yet though. What would be even better is for the redirect to redirect to a URL containing the actual IDs of the most recently asked questions. For example if each question has an integer ID, and there are 100 questions in the system and you want to show the ten most recent, requests to /db/questions should be 307'd to /db/questions/100,91
This is a very good question, thanks for asking it. You confirmed for me that I'm not nuts for having spent days thinking about it.
One of the big problems with range headers is that a lot of corporate proxies filter them out. I'd advise to use a query parameter instead.
With the publication of RFC 723x, unregistered range units do go against an explicit recommendation in the spec. Consider RFC 7233 (which obsoletes the range-request parts of RFC 2616):
"New range units ought to be registered with IANA" (along with a reference to a HTTP Range Unit Registry).
You can detect the Range header, and mimic Dojo if it is present, and mimic Atom if it is not. It seems to me that this neatly divides the use cases. If you are responding to a REST query from your application, you expect it to be formatted with a Range header. If you are responding to a casual browser, then if you return paging links it will let the tool provide an easy way to explore the collection.
It seems to me that the best way to do this is to include the range as query parameters, e.g. GET /db/questions/?date>mindate&date<maxdate. Upon a GET to /db/questions/ with no query parameters, return 303 with Location: /db/questions/?query-parameters-to-retrieve-the-default-page. Then provide a different URL by which whoever is consuming your API can get statistics about the collection (e.g. what query parameters to use if they want the entire collection).
While it's possible to use the Range header for this purpose, I don't think that was the intent. It seems to have been designed for handling flaky connections as well as limiting the data (so the client can request part of the response if something was missing or the size was too large to process). You are hacking pagination into something that is likely used for other purposes at the communication layer.
The "proper" way to handle pagination is with the types you return. Rather than returning a questions object, you should be returning a new type instead.
So if questions is like this:
<questions>
<question index="1"></question>
<question index="2"></question>
...
</questions>
The new type could be something like this:
<questionPage>
<startIndex>50</startIndex>
<returnedCount>10</returnedCount>
<totalCount>1203</totalCount>
<questions>
<question index="50"></question>
<question index="51"></question>
...
</questions>
</questionPage>
Of course you control your media types, so you can make your "pages" a format that suits your needs. If you make it something generic, you can have a single parser on the client to handle paging the same way for all types. I think that is more in the spirit of the HTTP specification than fudging the Range header for something else.
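As a sketch of producing such a page type, here is one way to build the questionPage element in Python with the standard library. The element names follow the example above; the `question_page` helper and the use of plain strings as questions are assumptions.

```python
import xml.etree.ElementTree as ET


def question_page(questions: list[str], start_index: int, count: int) -> ET.Element:
    """Wrap one slice of `questions` in a <questionPage> element."""
    page = ET.Element("questionPage")
    selected = questions[start_index:start_index + count]
    ET.SubElement(page, "startIndex").text = str(start_index)
    ET.SubElement(page, "returnedCount").text = str(len(selected))
    ET.SubElement(page, "totalCount").text = str(len(questions))
    container = ET.SubElement(page, "questions")
    for offset, text in enumerate(selected):
        # The index attribute is the position in the full collection.
        question = ET.SubElement(container, "question", index=str(start_index + offset))
        question.text = text
    return page
```

`ET.tostring(page, encoding="unicode")` then gives the serialized page.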