Below is the statement written at http://hateinterview.com/java-script/methods-get-vs-post-in-html-forms/1854.html
By specification, GET is used basically for retrieving data where as POST is used for data storing, data updating, ordering a product or even e-mailing
Whenever i used get or post method, i used them to get the parameters with getparameter() method of httprequest. I did not get in above statement, how post method is used for datastoring or dataupdation which we cant achieve with get method. Looking for a very brief example.
EDIT: Thanks all for your answers but i am specifically looking for meaning of data storing, data updating in post method apart from file loading stuff.
GET can in theory also store and update data, but it's simply not safe. It's too easy to accidently store or update data by just bookmarking it, following a link or being indexed by a searchbot. POST requests are not bookmarkable/linkable and not indexed by searchbots. Also, the GET query string has a limited length, the safe limit is 255 characters. The POST request body can however be as large as 2GB. Also, uploading files is not possible by GET.
One difference is that the GET data (in the URL, as another answer says) show up at a *nix server as the contents of the environment variable QUERY_STRING, whereas POST data appear on stdin. Regardless of how they are packaged and sent, GET and POST data are identical in format, in my experience.
#Mohit edited his question to add: "Thanks all for your answers but i am specifically looking for meaning of data storing, data updating in post method apart from file loading stuff. "
Read rfc2616, Hypertext Transfer Protocol -- HTTP/1.1, specifically Sections 9.3 GET and 9.5 POST:
"The GET method means retrieve ... information."
"The POST method is used to request that the origin server accept" information.
To be strictly compliant with rfc2616, use the GET method to read data from the server. Use the POST method to write data to the server.
The "meaning of data storing, data updating" is exactly that. How could this possibly be any clearer or more explicit?
POST sends the data in the body while GET puts the data in the URL...
For example to upload a file you use POST... since GET puts the data into URL the data is visible to the user and the length is limited.
see for example http://www.cs.tut.fi/~jkorpela/forms/methods.html
there are things you can not do with GET !first of them is with post you can upload files!
See this: Article or this one
Related
I've got POST method which responds with a file. And right after file download I also want to show the message on the UI, that was generated during the file creation in the code (were there any errors, are all items included in the file etc.).
My question is how to implement such behaviour in most appropriate way?
now I've got 2 ideas:
sending another POST method right after download, which will return
JSON with message contents. Which seems to be quite complicated
since I should be storing on the server objects with messages for
each created file
sending a cookie with the same response, that will contain message
contents
I think both of your solutions should work in theory. I am assuming here that cookie headers sent back with an octet stream response will be honoured and instantly accessable via javascript. I've not tried this out myself.
I have two more ideas which might give you some inspiration:
Simple
In the POST handler before you output the file to the user, you could set a session variable containing the message you want to display. Then after the file is downloaded, make a 2nd request to GET that message from the session. Since it would get overwritten each time, it might get a bit unreliable if the user made lots of POSTs at once.
More complicated
For a totally asynchronous solution, in the past I have used services in this way:
POST /process_this -> "{'status':'ok', 'id':123}"
GET /status?id=123 -> "{'status':'processing', 'progress':'45%'}"
GET /status?id=123 -> "{'status':'processing', 'progress':'85%'}"
GET /status?id=123 -> "{'status':'ready', 'meta':{'data':'message'}}"
GET /download?id=123 -> Binary file stream
It gives you the ability to always respond instantly and do some cool progress bar on the frontend. Also you could have a queue system if the processing is intensive. But on the downside, you need to write the processor as a seperate module and have it talking to a database or something that saves state.
Other
Some other thoughts I had included:
Storing the meta data within the file itself. (What is the response file format?)
Using multipart responses (not sure how compatible browsers are on this yet)
Using a combination of your two ideas if cookies are not big enough to hold all of the message.
As stated in http://www.boutell.com/newfaq/misc/urllength.html, HTTP query string have limited length. It can be limited by the client (Firefox, IE, ...), the server (Apache, IIS, ...) or the network equipment (applicative firewall, ...).
Today I face this problem with a search form. We developed a search form with a lot of fields, and this form is sent to the server as a GET request, so I can bookmark the resulting page.
We have so many fields that our query string is 1100 bytes long, and we have a firewall that drops HTTP GET requests with more than 1024 bytes. Our system administrator recommends us to use POST instead so there will be no limitation.
Sure, POST will work, but I really feel a search as a GET and not a POST. So I think I will review our field names to ensure the query string is not too long, and if I can't I will be pragmatic and use POST.
But is there a flaw in the design of RESTful services? If we have limited length in GET request, how can I do to send large objects to a RESTful webservice? For example, if I have a program that makes calculations based on a file, and I want to provide a RESTful webservice like this: http://compute.com?content=<base64 file>. This won't work because the query string has not unlimited length.
I'm a little puzzled...
HTTP specification actually advises to use POST when sending data to a resource for computation.
Your search looks like a computation, not a resource itself. What you could do if you still want your search results to be a resource is create a token to identify that specific search result and redirect the user agent to that resource.
You could then delete search results tokens after some amount of time.
Example
POST /search
query=something&category=c1&category=c2&...
201 Created
Location: /search/01543164876
then
GET /search/01543164876
200 Ok
... your results here...
This way, browsers and proxies can still cache search results but you are submitting your query parameters using POST.
EDIT
For clarification, 01543164876 here represents a unique ID for the resource representing your search. Those 2 requests basically mean: create a new search object with these criteria, then retrieve the results associated with the created search object.
This ID can be a unique ID generated for each new request. This would mean that your server will leak "search" objects and you will have to clean them regularly with a caching strategy.
Or it can be a hash of all the search criteria actually representing the search asked by the user. This allows you to reuse IDs since recreating a search will return an existing ID that may (or may not) be already cached.
Based on your description, IMHO you should use a POST. POST is for putting data on the server and, in some cases, obtain an answer. In your case, you do a search (send a query to the server) and get the result of that search (retrieve the query result).
The definition of GET says that it must be used to retrieve an already existing resource. By definition, POST is to create a new resource. This is exactly what you are doing: creating a resource on the server and retrieving it! Even if you don't store the search result, you created an object on the server and retrieved it. As PeterMmm previsouly said, you could do this with a POST (create and store the query result) and then use a GET to retrive the query, but it's more pratical do only a POST and retrieve the result.
Hope this helps! :)
REST is a manner to do things, not a protocol. Even if you dislike to POST when it is really a GET, it will work.
If you will/must stay with the "standard" definition of GET, POST, etc. than maybe consider to POST a query, that query will be stored on the server with a query id and request the query later with GET by id.
Regarding your example:http://compute.com?content={base64file}, I would use POST because you are uploading "something" to be computed. For me this "something" feels more like a resource as a simple parameter.
In contrast to this in usual search I would start to stick with GET and parameters. You make it so much easier for api-clients to test and play around with your api. Make the read-only access (which in most cases is the majority of traffic) as simple as possible!
But the dilemma of large query strings is a valid limitation of GET. Here I would go pragmatic, as long as you don't hit this limit go with GET and url-params. This will work in 98% of search-cases. Only act if you hit this limit and then also introduce POST with payload (with mime-type Content-Type: application/x-www-form-urlencoded).
Have you got more real-world examples?
The confusion around GET is a browser limitation. If you are creating a RESTful interface for an A2A or P2P application then there is no limitation to the length of your GET.
Now, if you happen to want to use a browser to view your RESTful interface (aka during development/debugging) then you will run into this limit, but there are tools out there to get around this.
This is an easy one. Use POST. HTTP doesn't impose a limit on the URL length for GET but servers do. Be pragmatic and work around that with a POST.
You could also use a GET body (that is allowed) but that's a double-whammy in that it is not correct usage and probably going to have server problems.
I think if u develop the biz system, encounter this issue, u must think whether the api design reasonable, if u GET api param design a biz_ids, and it too long.
u should think about with UI or Usecase, whether use other_biz_id to find biz_ids and build target response instead of biz_ids directly or not.
if u old api be depended on, u can add a new api for this usecase, if u module design well u add this api may fast.
I think should use protocols in a standard way as developer.
hope help u.
This question already has answers here:
When should I use GET or POST method? What's the difference between them?
(15 answers)
Closed 9 years ago.
I've only recently been getting involved with PHP/AJAX/jQuery and it seems to me that an important part of these technologies is that of POST and GET.
First, what is the difference between POST and GET? Through experimenting, I know that GET appends the returning variables and their values to the URL string
website.example/directory/index.php?name=YourName&bday=YourBday
but POST doesn't.
So, is this the only difference or are there specific rules or conventions for using one or the other?
Second, I've also seen POST and GET outside of PHP: also in AJAX and jQuery. How do POST and GET differ between these 3? Are they the same idea, same functionality, just utilized differently?
GET and POST are two different types of HTTP requests.
According to Wikipedia:
GET requests a representation of the specified resource. Note that GET should not be used for operations that cause side-effects, such as using it for taking actions in web applications. One reason for this is that GET may be used arbitrarily by robots or crawlers, which should not need to consider the side effects that a request should cause.
and
POST submits data to be processed (e.g., from an HTML form) to the identified resource. The data is included in the body of the request. This may result in the creation of a new resource or the updates of existing resources or both.
So essentially GET is used to retrieve remote data, and POST is used to insert/update remote data.
HTTP/1.1 specification (RFC 2616) section 9 Method Definitions contains more information on GET and POST as well as the other HTTP methods, if you are interested.
In addition to explaining the intended uses of each method, the spec also provides at least one practical reason for why GET should only be used to retrieve data:
Authors of services which use the HTTP protocol SHOULD NOT use GET based forms for the submission of sensitive data, because this will cause this data to be encoded in the Request-URI. Many existing servers, proxies, and user agents will log the request URI in some place where it might be visible to third parties. Servers can use POST-based form submission instead
Finally, an important consideration when using GET for AJAX requests is that some browsers - IE in particular - will cache the results of a GET request. So if you, for example, poll using the same GET request you will always get back the same results, even if the data you are querying is being updated server-side. One way to alleviate this problem is to make the URL unique for each request by appending a timestamp.
A POST, unlike a GET, typically has relevant information in the body of the request. (A GET should not have a body, so aside from cookies, the only place to pass info is in the URL.) Besides keeping the URL relatively cleaner, POST also lets you send much more information (as URLs are limited in length, for all practical purposes), and lets you send just about any type of data (file upload forms, for example, can't use GET -- they have to use POST plus a special content type/encoding).
Aside from that, a POST connotes that the request will change something, and shouldn't be redone willy-nilly. That's why you sometimes see your browser asking you if you want to resubmit form data when you hit the "back" button.
GET, on the other hand, should be idempotent -- meaning you could do it a million times and the server will do the same thing (and show basically the same result) each and every time.
Whilst not a description of the differences, below are a couple of things to think about when choosing the correct method.
GET requests can get cached by the browser which can be a problem (or benefit) when using ajax.
GET requests expose parameters to users (POST does as well but they are less visible).
POST can pass much more information to the server and can be of almost any length.
POST and GET are two HTTP request methods. GET is usually intended to retrieve some data, and is expected to be idempotent (repeating the query does not have any side-effects) and can only send limited amounts of parameter data to the server. GET requests are often cached by default by some browsers if you are not careful.
POST is intended for changing the server state. It carries more data, and repeating the query is allowed (and often expected) to have side-effects such as creating two messages instead of one.
If you are working RESTfully, GET should be used for requests where you are only getting data, and POST should be used for requests where you are making something happen.
Some examples:
GET the page showing a particular SO question
POST a comment
Send a POST request by clicking the "Add to cart" button.
With POST you can also do multipart mime encoding which means you can attach files as well. Also if you are using post variables across navigation of pages, the user will get a warning asking if they want to resubmit the post parameter. Typically they look the same in an HTTP request, but you should just stick to POST if you need to "POST" something TO a server and "GET" if you need to GET something FROM a server as that's the way they were intended.
The only "big" difference between POST & GET (when using them with AJAX) is since GET is URL provided, they are limited in ther length (since URL arent infinite in length).
I have REST services which should receive really long queries via GET. Say for example I want to query a service with many geographical coordinates to find out something about all this coordinates.
1) My first thought was to use long URIs and increase the max URI length of the servlet container.
It would look like this:
GET http://some.test/myresource?query={really big JSON object}
But it seems that URIs longer than 2 KB are not reliable due to old proxy servers (is that right?).
2) My workaround is to create a temporary resource via POST first and use the URI of this resource as parameter in the actual GET request. That would look like this:
POST http://some.test/temp
Request Body: {really big JSON object}
201 Created Location: http://some.test/temp/12309871
GET http://some.test/myresource?query=http://some.test/temp/12309871
3) Use body of GET request. I've read the answers to the question whether it is a good idea to use the body of a GET request for the query, and the consensus is: no. Even Roy Fielding says that this is a bad idea.
4) Another approach could be to interpret POST as "create query result resource" and delete this resource after the request. But I consider that to be not RESTful and to be a bad idea.
Is there a better way to handle big queries with GET requests?
Use PUT.
Why? For the following reasons:
Just because the verb PUT 'may update' the resource, doesn't mean it will or must alter underlying state of the resource.
No new resource identifier (url) should be created by the API side of a PUT. Yes, technically a PUT with a client specified identifier is possible, but in this case you're hitting an existing resource.
PUT is like GET in the fact that it should be idempotent, meaning the results of the request will always be the same regardless of how often you call it and it has no side effects.
PUT means you're putting resource data to an existing resource. In terms of a article or post in the document / blog post worlds, it would be like uploading a new revision of some document to an existing resource URL. If you upload the same revision to the same URL, nothing should change in the resource you get back.
In your case, the geo data is some new resource data you're uploading and the result you get back should be the same every time you make the same request.
A more purist method to use the GET verb for the request might be:
Create an endpoint for a query resource type
POST the JSON set of query details to a query resource endpoint and get an identifier for the query resource (say it returns a query id of 123)
Submit to the get request a query identifier http://some.test/myresource?query_id=123
Delete the query resource 123
I see the pure method much more overhead than using PUT with query resource data in the body.
I thought that the whole point in REST was to work on "documents" (or something alike). The URI part of a request is there to identify uniquely the resource to work on. The body part in contrast is there for the "contents" part of the document.
Hence, use the "body" part of the request.
Also note that the semantics of a "GET" request isn't supposed to be used for "PUTTING" or "POSTING" documents (comment in relation to your "query" example above which seems to "create" an object).
In any case, as you have pointed out, the URI part is limited (for good reason I am sure).
If you are concerned with caching, then the use of ETag/Last-Modified fields (in conjunction with "conditional GET" helps for this purpose.
Here is a slight variation on your second option. Create yourself a processor resource called QueryMaker. POST your parameters to it and let it redirect you to a temporary query resource that will return your results.
POST /QueryMaker
Body: Big Json representation of parameters
303: See Other
Location: http://example.org/TemporaryQueries/123213
If you are using a GET request to send large objects, you are not using REST correctly.
GET should be used for retrieving
resources (via some sort of unique
identifier)
POST should be used for
creating resources (with the contents
in the body)
PUT should be used for
updating a resource (with the
contents in the body)
DELETE should be used for deleting a resource
If you follow these guidelines you will never have to have overly long URIs.
Some best practice REST guidelines are here: http://www.xml.com/pub/a/2004/08/11/rest.html
The biggest limitation of URL lengths on the open Web is actually IE, which constraints them to 2083 characters.
Some proxies (e.g., all but the most recent versions of Squid) will limit them to about 4k, although this is moving towards 8k slowly.
Your #2 workaround is a good approach, depending on your use case.
Sending bodies on GETs may be allowed by some implementations, and disallowed by others, so it's a bad idea for interoperability as well as theoretical reasons. Most importantly, how will a cache know what to use as a key?
Can't you just send the big JSON data with the GET request body, instead of creating the temp resource?
Although it's not 100% kosher, i've found it works nicely with firefox and IE and IMO, the querystring is inelegant and usually exposes implementation details that don't belong in the URI. Just make sure to add a cache buster querystring parameter if you need up-to-date data because the server will ignore the data when determining whether it can return a cached response.
See here for a discussion of pros and cons of stuffing data in the GET request body.
I'm interested in exposing a direct REST interface to collections of JSON documents (think CouchDB or Persevere). The problem I'm running into is how to handle the GET operation on the collection root if the collection is large.
As an example pretend I'm exposing StackOverflow's Questions table where each row is exposed as a document (not that there necessarily is such a table, just a concrete example of a sizable collection of 'documents'). The collection would be made available at /db/questions with the usual CRUD api GET /db/questions/XXX, PUT /db/questions/XXX, POST /db/questions is in play. The standard way to get the entire collection is to GET /db/questions but if that naively dumps each row as a JSON object, you'll get a rather sizeable download and a lot of work on the part of the server.
The solution is, of course, paging. Dojo has solved this problem in its JsonRestStore via a clever RFC2616-compliant extension of using the Range header with a custom range unit items. The result is a 206 Partial Content that returns only the requested range. The advantage of this approach over a query parameter is that it leaves the query string for...queries (e.g. GET /db/questions/?score>200 or somesuch, and yes that'd be encoded %3E).
This approach completely covers the behavior I want. The problem is that RFC 2616 specifies that on a 206 response (emphasis mine):
The request MUST have included a Range header field (section 14.35)
indicating the desired range, and MAY have included an If-Range
header field (section 14.27) to make the request conditional.
This makes sense in the context of the standard usage of the header but is a problem because I'd like the 206 response to be the default to handle naive clients/random people exploring.
I've gone over the RFC in detail looking for a solution but have been unhappy with my solutions and am interested in SO's take on the problem.
Ideas I've had:
Return 200 with a Content-Range header! - I don't think that this is wrong, but I'd prefer if a more obvious indicator that the response is only Partial Content.
Return 400 Range Required - There is not a special 400 response code for required headers, so the default error has to be used and read by hand. This also makes exploration via web browser (or some other client like Resty) more difficult.
Use a query parameter - The standard approach, but I'm hoping to allow queries a la Persevere and this cuts into the query namespace.
Just return 206! - I think most clients wouldn't freak out, but I'd rather not go against a MUST in the RFC
Extend the spec! Return 266 Partial Content - Behaves exactly like 206 but is in response to a request that MUST NOT contain the Range header. I figure that 266 is high enough that I shouldn't run into collision issues and it makes sense to me but I'm not clear on whether this is considered taboo or not.
I'd think this is a fairly common problem and I'd like to see this done in a sort of de facto fashion so I or someone else isn't reinventing the wheel.
What's the best way to expose a full collection via HTTP when the collection is large?
I don't really agree with some of you guys. I've been working for weeks on this features for my REST service. What I ended up doing is really simple. My solution only makes a sense for what REST people call a collection.
Client MUST include a "Range" header to indicate which part of the collection he needs, or otherwise be ready to handle a 413 REQUESTED ENTITY TOO LARGE error when the requested collection is too large to be retrieved in a single round-trip.
Server sends a 206 PARTIAL CONTENT response, with the Content-Range header specifying which part of the resource has been sent, and an ETag header to identify the current version of the collection. I usually use a Facebook-like ETag {last_modification_timestamp}-{resource_id}, and I consider that the ETag of a collection is that of the most recently modified resource it contains.
To request a specific part of a collection, the client MUST use the "Range" header, and fill the "If-Match" header with the ETag of the collection obtained from previously performed requests to acquire other parts of the same collection. The server can therefore verify that the collection hasn't changed before sending the requested portion. If a more recent version exists, a 412 PRECONDITION FAILED response is returned to invite the client to retrieve the collection from scratch. This is necessary because it could mean that some resources might have been added or removed before or after the currently requested part.
I use ETag/If-Match in tandem with Last-Modified/If-Unmodified-Since to optimize cache. Browsers and proxies might rely on one or both of them for their caching algorithms.
I think that a URL should be clean unless it's to include a search/filter query. If you think about it, a search is nothing more than a partial view of a collection. Instead of the cars/search?q=BMW type of URLs, we should see more cars?manufacturer=BMW.
My gut feeling is that the HTTP range extensions aren't designed for your use case, and thus you shouldn't try. A partial response implies 206, and 206 must only be sent if the client asked for it.
You may want to consider a different approach, such as the one use in Atom (where the representation by design may be partial, and is returned with a status 200, and potentially paging links). See RFC 4287 and RFC 5005.
You can still return Accept-Ranges and Content-Ranges with a 200 response code. These two response headers give you enough information to infer the same information that a 206 response code provides explicitly.
I would use Range for pagination, and have it simply return a 200 for a plain GET.
This feels 100% RESTful and doesn't make browsing any more difficult.
Edit:
I wrote a blog post about this: http://otac0n.com/blog/2012/11/21/range-header-i-choose-you.html
If there is more than one page of responses, and you don't want to offer the whole collection at once, does that mean there are multiple choices?
On a request to /db/questions, return 300 Multiple Choices with Link headers that specify how to get to each page as well as a JSON object or HTML page with a list of URLs.
Link: <>; rel="http://paged.collection.example/relation/paged"
Link: <>; rel="http://paged.collection.example/relation/paged"
...
You'd have one Link header for each page of results (an empty string means the current URL, and the URL is the same for each page, just accessed with different ranges), and the relationship is defined as a custom one per the upcoming Link spec. This relationship would explain your custom 266, or your violation of 206. These headers are your machine-readable version, since all of your examples require an understanding client anyway.
(If you stick with the "range" route, I believe your own 2xx return code, as you described it, would be the best behavior here. You're expected to do this for your applications and such ["HTTP status codes are extensible."], and you have good reasons.)
300 Multiple Choices says you SHOULD also provide a body with a way for the user agent to pick. If your client is understanding, it should use the Link headers. If it's a user manually browsing, perhaps an HTML page with links to a special "paged" root resource that can handle rendering that particular page based on the URL? /humanpage/1/db/questions or something hideous like that?
The comments on Richard Levasseur's post remind me of an additional option: the Accept header (section 14.1). Back when the oEmbed spec came out, I wondered why it hadn't been done entirely using HTTP, and wrote up an alternative using them.
Keep the 300 Multiple Choices, the Link headers and the HTML page for an initial naive HTTP GET, but rather than use ranges, have your new paging relationship define the use of the Accept header. Your subsequent HTTP request might look like this:
GET /db/questions HTTP/1.1
Host: paged.collection.example
Accept: application/json;PagingSpec=1.0;page=1
The Accept header allows you to define an acceptable content type (your JSON return), plus extensible parameters for that type (your page number). Riffing on my notes from my oEmbed writeup (can't link to it here, I'll list it in my profile), you could be very explicit and provide a spec/relation version here in case you need to redefine what the page parameter means in the future.
Edit:
After thinking about it a bit more, I'm inclined to agree that Range headers aren't appropriate for pagination. The logic being, the Range header is intended for the server's response, not the applications. If you served 100 megabytes of results, but the server (or client) could only process 1 megabyte at a time, well, thats what the Range header is for.
I'm also of the opinion that a subset of resources is its own resource (similar to relational algebra.), so it deserve representation in the URL.
So basically, I recant my original answer (below) about using a header.
I think you answered your own question, more or less - return 200 or 206 with content-range and optionally use a query parameter. I would sniff the user agent and content type and, depending on those, check for a query parameter. Otherwise, require the range headers.
You essentially have conflicting goals - let people use their browser to explore (which doesn't easily allow custom headers), or force people to use a special client that can set headers (which doesn't let them explore).
You could just provide them with the special client depending on the request - if it looks like a plain browser, send down a small ajax app that renders the page and sets the necessary headers.
Of course, there is also the debate about whether the URL should contain all the necessary state for this sort of thing. Specifying the range using headers can be considered "un-restful" by some.
As an aside, it would be nice if servers could respond with a "Can-Specify: Header1, header2" header, and web browsers would present a UI so users could fill in values, if they desired.
You might consider using a model something like the Atom Feed Protocol since it has a sane HTTP model of collections and how to manipulate them (where insane means WebDAV).
There's the Atom Publishing Protocol which defines the collection model and REST operations plus you can use RFC 5005 - Feed Paging and Archiving to page through big collections.
Switching from Atom XML to JSON content should not affect the idea.
I think the real problem here is that there is nothing in the spec that tells us how to do automatic redirects when faced with 413 - Requested Entity Too Large.
I was struggling with this same problem recently and I looked for inspiration in the RESTful Web Services book. Personally I don't think 206 is appropriate due to the header requirement. My thoughts also led me to 300, but I thought that was more for different mime-types, so I looked up what Richardson and Ruby had to say on the subject in Appendix B, page 377. They suggest that the server just pick the preferred representation and send it back with a 200, basically ignoring the notion that it should be a 300.
That also jibes with the notion of links to next resources that we have from atom. The solution I implemented was to add "next" and "previous" keys to the json map I was sending back and be done with it.
Later on I started thinking maybe the thing to do is send a 307 - Temporary Redirect to a link that would be something like /db/questions/1,25 - that leaves the original URI as the canonical resource name, but it gets you to an appropriately named subordinate resource. This is behavior I'd like to see out of a 413, but 307 seems a good compromise. Haven't actually tried this in code yet though. What would be even better is for the redirect to redirect to a URL containing the actual IDs of the most recently asked questions. For example if each question has an integer ID, and there are 100 questions in the system and you want to show the ten most recent, requests to /db/questions should be 307'd to /db/questions/100,91
This is a very good question, thanks for asking it. You confirmed for me that I'm not nuts for having spent days thinking about it.
One of the big problems with range headers is that a lot of corporate proxies filter them out. I'd advise to use a query parameter instead.
With the publication of rfc723x, unregistered range units do go against an explicit recommendation in the spec. Consider rfc7233 (deprecating rfc2616):
"New range units ought to be registered with IANA" (along with a reference to a HTTP Range Unit Registry).
You can detect the Range header, and mimic Dojo if it is present, and mimic Atom if it is not. It seems to me that this neatly divides the use cases. If you are responding to a REST query from your application, you expect it to be formatted with a Range header. If you are responding to a casual browser, then if you return paging links it will let the tool provide an easy way to explore the collection.
Seems to me that the best way to do this is to include range as query parameters. e.g., GET /db/questions/?date>mindate&date<maxdate. Upon a GET to the /db/questions/ with no query parameters, return 303 with Location: /db/questions/?query-parameters-to-retrieve-the-default-page. Then provide a different URL by which whomever is consuming your API to get statistics about the collection (e.g., what query parameters to use if s/he wants the entire collection);
While its possible to use the Range header for this purpose, I don't think that was the intent. It seems to have been designed for handling flaky connections as well as limiting the data (so the client can request part of the request if something was missing or the size was too large to process). You are hacking pagination into something that is likely used for other purposes at the communication layer.
The "proper" way to handle pagination is with the types you return. Rather than returning questions object, you should be returning a new type instead.
So if questions is like this:
<questions>
<question index=1></question>
<question index=2></question>
...
</questions>
The new type could be something like this:
<questionPage>
<startIndex>50</startIndex>
<returnedCount>10</returnedCount>
<totalCount>1203</totalCount>
<questions>
<question index=50></question>
<question index=51></question>
..
</questions>
<questionPage>
Of course you control your media types, so you can make your "pages" a format that suits your needs. If you make is something generic, you can have a single parser on the client to handle paging the same for all types. I think that is more in the spirit of the HTTP specification, rather than fudging the Range parameter for something else.