What is the way to do an idempotent HTTP request with data that does not fit in the URL? - http

So, basically the 255-char limitation of URLs is too short for me, and I don't want to rely on ignoring it.
My request will not change anything on the HTTP server end. I need to send some data with request, that is slightly larger than 1024 characters in size and is NOT of secure/secret character. The server will use the data for verification against a database and return a result of this verification, but nothing is changed. The request is thus said to be idempotent. In pseudecode, client calls RPC:
int verify_data(char[>1024] data)
Can I use POST or will this violate principles of REST and other good HTTP client/server design? I obviously cannot use GET?

I have a similar trouble
May be using PUT is a better alternative. I mean, according to REST principles PUT should also be an idempotent operation (so that servers might cache it) and you don't have the limitation size of GET

What is the maximum length of a URL in different browsers?
Do you control the server and the client? If you know that both will handle the length, then I would go ahead and run with it.

Related

Is POST the correct HTTP verb for resources that are produced by but not stored by the server?

Perhaps a partial duplicate of What is the correct http verb for a Download in a REST API? , though that is referring to a file download specifically.
In terms of CRUD operations, POST is the recommended HTTP verb for creating a resource. Does this still apply to creating resources, even if those resources are not persisted on the server? (i.e. stored in a database or similar) For example, generating some code based on code the client sent or converting a file and returning it to the client.
With such types of functionality, the server has indeed "created" something, though only in passing, returning or streaming it to the client, and not storing it in the traditional CRUD sense. I've always used POST for such functionality, but now I'm starting to double guess myself and think that those could have been GET endpoints the whole time.
Seems like a weird gray area in which neither HTTP verbs GET or POST completely match.
Is POST the correct HTTP verb for resources that are produced by but not stored by the server?
Yes - or alternatively, it's the least incorrect choice to use.
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.” -- Fielding, 2009
In 2020, the HTTP-WG adopted a proposal to define a method token for "GET with a body", which would give as an alternative to POST when we want to indicate to general purpose components that the request has safe semantics, so there should eventually be some registered method that is a better fit.
There is no gray area here.
If the state of the server changes, use an unsafe method like POST. If it does not, you can use a safe methof like GET, as long as you stay within the documented limitations of GET (no request body).
If you're looking for a safe method that does take a request body, you may want to look at https://datatracker.ietf.org/doc/draft-ietf-httpbis-safe-method-w-body/.

Consequences of POST not being idempotent (RESTful API)

I am wondering if my current approach makes sense or if there is a better way to do it.
I have multiple situations where I want to create new objects and let the server assign an ID to those objects. Sending a POST request appears to be the most appropriate way to do that.
However since POST is not idempotent the request may get lost and sending it again may create a second object. Also requests being lost might be quite common since the API is often accessed through mobile networks.
As a result I decided to split the whole thing into a two-step process:
First sending a POST request to create a new object which returns the URI of the new object in the Location header.
Secondly performing an idempotent PUT request to the supplied Location to populate the new object with data. If a new object is not populated within 24 hours the server may delete it through some kind of batch job.
Does that sound reasonable or is there a better approach?
The only advantage of POST-creation over PUT-creation is the server generation of IDs.
I don't think it worths the lack of idempotency (and then the need for removing duplicates or empty objets).
Instead, I would use a PUT with a UUID in the URL. Owing to UUID generators you are nearly sure that the ID you generate client-side will be unique server-side.
well it all depends, to start with you should talk more about URIs, resources and representations and not be concerned about objects.
The POST Method is designed for non-idempotent requests, or requests with side affects, but it can be used for idempotent requests.
on POST of form data to /some_collection/
normalize the natural key of your data (Eg. "lowercase" the Title field for a blog post)
calculate a suitable hash value (Eg. simplest case is your normalized field value)
lookup resource by hash value
if none then
generate a server identity, create resource
Respond => "201 Created", "Location": "/some_collection/<new_id>"
if found but no updates should be carried out due to app logic
Respond => 302 Found/Moved Temporarily or 303 See Other
(client will need to GET that resource which might include fields required for updates, like version_numbers)
if found but updates may occur
Respond => 307 Moved Temporarily, Location: /some_collection/<id>
(like a 302, but the client should use original http method and might do automatically)
A suitable hash function might be as simple as some concatenated fields, or for large fields or values a truncated md5 function could be used. See [hash function] for more details2.
I've assumed you:
need a different identity value than a hash value
data fields used
for identity can't be changed
Your method of generating ids at the server, in the application, in a dedicated request-response, is a very good one! Uniqueness is very important, but clients, like suitors, are going to keep repeating the request until they succeed, or until they get a failure they're willing to accept (unlikely). So you need to get uniqueness from somewhere, and you only have two options. Either the client, with a GUID as Aurélien suggests, or the server, as you suggest. I happen to like the server option. Seed columns in relational DBs are a readily available source of uniqueness with zero risk of collisions. Round 2000, I read an article advocating this solution called something like "Simple Reliable Messaging with HTTP", so this is an established approach to a real problem.
Reading REST stuff, you could be forgiven for thinking a bunch of teenagers had just inherited Elvis's mansion. They're excitedly discussing how to rearrange the furniture, and they're hysterical at the idea they might need to bring something from home. The use of POST is recommended because its there, without ever broaching the problems with non-idempotent requests.
In practice, you will likely want to make sure all unsafe requests to your api are idempotent, with the necessary exception of identity generation requests, which as you point out don't matter. Generating identities is cheap and unused ones are easily discarded. As a nod to REST, remember to get your new identity with a POST, so it's not cached and repeated all over the place.
Regarding the sterile debate about what idempotent means, I say it needs to be everything. Successive requests should generate no additional effects, and should receive the same response as the first processed request. To implement this, you will want to store all server responses so they can be replayed, and your ids will be identifying actions, not just resources. You'll be kicked out of Elvis's mansion, but you'll have a bombproof api.
But now you have two requests that can be lost? And the POST can still be repeated, creating another resource instance. Don't over-think stuff. Just have the batch process look for dupes. Possibly have some "access" count statistics on your resources to see which of the dupe candidates was the result of an abandoned post.
Another approach: screen incoming POST's against some log to see whether it is a repeat. Should be easy to find: if the body content of a request is the same as that of a request just x time ago, consider it a repeat. And you could check extra parameters like the originating IP, same authentication, ...
No matter what HTTP method you use, it is theoretically impossible to make an idempotent request without generating the unique identifier client-side, temporarily (as part of some request checking system) or as the permanent server id. An HTTP request being lost will not create a duplicate, though there is a concern that the request could succeed getting to the server but the response does not make it back to the client.
If the end client can easily delete duplicates and they don't cause inherent data conflicts it is probably not a big enough deal to develop an ad-hoc duplication prevention system. Use POST for the request and send the client back a 201 status in the HTTP header and the server-generated unique id in the body of the response. If you have data that shows duplications are a frequent occurrence or any duplicate causes significant problems, I would use PUT and create the unique id client-side. Use the client created id as the database id - there is no advantage to creating an additional unique id on the server.
I think you could also collapse creation and update request into only one request (upsert). In order to create a new resource, client POST a “factory” resource, located for example at /factory-url-name. And then the server returns the URI for the new resource.
Why don't you use a request Id on your originating point (your originating point should do two things, send a GET request on request_id=2 to see if it's request has been applied - like a response with person created and created as part of request_id=2
This will ensure your originating system knows what was the last request that was executed as the request id is stored in db.
Second thing, if your originating point finds that last request was still at 1 not yet 2, then it may try again with 3, to make sure if by any chance just the GET response has gotten lost but the request 2 was created in the db.
You can introduce number of tries for your GET request and time to wait before firing again a GET etc kinds of system.

HTTP GET and POST semantics and limitations

Earlier this week, I had to do something which feels like a semantics violation. Let me explain.
I was making a simple AJAX client application, which was to make a request to a service with a given number of parameters. Since the whole app is basically read-only, I thought that using HTTP GET was the way to go. Some of the parameters that I had to pass were simple (such as the sort order, or page number).
However, one of the required parameters could be of variable length, and this made me worry. Since I was encoding all of the parameters in the querystring of the GET request, it seemed to me that this placed an unnecessary upper limit of (roughly) 2000 characters for the request URL. And regardless, I didn't like seeing 500-character-long request URLs.
So, since a POST request doesn't have a limitation like that, I decided to switch. But this doesn't feel right. I am under the impression that a POST denotes modification of data - but I'm using it for a simple read-only request.
Is there a better way to do this? To perform a GET, with many parameters? I've heard of one method - where you perform a preliminary POST of the parameters themselves, and then perform a GET. But, this technique leaves much to be desired.
But looking past this specific case, what are the real semantics and limitations of HTTP request methods? And why does GET not support any kind of parameter payload? Using the querystring in the URL almost feels like a hack to me.
A few points on this issue:
The HTTP spec (RFC 2616) doesn't forbit GET requests to have parameters, so it's not a matter of the semantics of HTTP GET itself. However, many HTTP stacks (for clients, services, or proxies) forbid bodies in HTTP requests, the fact that you can't use them is mostly an implementation detail (quite prevalent) than a semantic issue with the HTTP GET requests
Similarly, the limitation of the URI (or query string) length isn't specified on the RFC either. It's mostly a security mitigation implemented by several HTTP server stacks to prevent a bad client from consuming server resources (for example, in IIS/ASP.NET the default limit is 2k but you can increase it via some elements in web.config). Again, it's not a semantic but a practical issue.
POST requests do indicate data modification if you're following the REST philosophy, but there are many examples of HTTP POST requests used for read-only operations. SOAP uses POST in all of its requests, regardless of whether the operation it is calling is a "safe" or a "modifying" one. So you can use POST for those operations as well. However, by deviating from the REST (and the "canonical" HTTP) usage, you'll lose some of the features of the protocol, such as caching which can be applied for GET requests, but not for POST.
Your example of using two requests (POST with parameters + GET to "get" the results) seems overkill. As I mentioned, POST requests don't necessarily mean modifying resources, so you don't have to create a new "protocol" (POST+GET) to access your operation when one request is enough.

Why is using a HTTP GET to update state on the server in a RESTful call incorrect?

OK, I know already all the reasons on paper why I should not use a HTTP GET when making a RESTful call to update the state of something on the server. Thus returning possibly different data each time. And I know this is wrong for the following 'on paper' reasons:
HTTP GET calls should be idempotent
N > 0 calls should always GET the same data back
Violates HTTP spec
HTTP GET call is typically read-only
And I am sure there are more reasons. But I need a concrete simple example for justification other than "Well, that violates the HTTP Spec!". ...or at least I am hoping for one. I have also already read the following which are more along the lines of the list above: Does it violate the RESTful when I write stuff to the server on a GET call? &
HTTP POST with URL query parameters -- good idea or not?
For example, can someone justify the above and why it is wrong/bad practice/incorrect to use a HTTP GET say with the following RESTful call
"MyRESTService/GetCurrentRecords?UpdateRecordID=5&AddToTotalAmount=10"
I know it's wrong, but hopefully it will help provide an example to answer my original question. So the above would update recordID = 5 with AddToTotalAmount = 10 and then return the updated records. I know a POST should be used, but let's say I did use a GET.
How exactly and to answer my question does or can this cause an actual problem? Other than all the violations from the above bullet list, how can using a HTTP GET to do the above cause some real issue? Too many times I come into a scenario where I can justify things with "Because the doc said so", but I need justification and a better understanding on this one.
Thanks!
The practical case where you will have a problem is that the HTTP GET is often retried in the event of a failure by the HTTP implementation. So you can in real life get situations where the same GET is received multiple times by the server. If your update is idempotent (which yours is), then there will be no problem, but if it's not idempotent (like adding some value to an amount for example), then you could get multiple (undesired) updates.
HTTP POST is never retried, so you would never have this problem.
If some form of search engine spiders your site it could change your data unintentionally.
This happened in the past with Google's Desktop Search that caused people to lose data because people had implemented delete operations as GETs.
Here is an important reason that GETs should be idempotent and not be used for updating state on the server in regards to Cross Site Request Forgery Attacks. From the book: Professional ASP.NET MVC 3
Idempotent GETs
Big word, for sure — but it’s a simple concept. If an
operation is idempotent, it can be executed multiple times without
changing the result. In general, a good rule of thumb is that you can
prevent a whole class of CSRF attacks by only changing things in your
DB or on your site by using POST. This means Registration, Logout,
Login, and so forth. At the very least, this limits the confused
deputy attacks somewhat.
One more problem is there. If GET method is used , data is sent in the URL itself . In web server's logs , this data gets saved somewhere in the server along with the request path. Now suppose that if someone has access to/reads those log files , your data (can be user id , passwords , key words , tokens etc. ) gets revealed . This is dangerous and has to be taken care of .
In server's log file, headers and body are not logged but request path is . So , in POST method where data is sent in body, not in request path, your data remains safe .
i think that reading this resource:
http://www.servicedesignpatterns.com/WebServiceAPIStyles could be helpful to you to make difference between message API and resource api ?

HTTP Verbs and Content Negotiation or GET Strings for REST service?

I am designing a REST service and am trying to weight the pros and cons of using the full array of http verbs and content negotiation vs GET string variables. Does my choice affect cacheability? Neither solution may be right for every area.
Which is best for the crud and queries (e.g. ?action=PUT)?
Which is best for api version picking (e.g. ?version=1.0)?
Which is best for return data type(e.g. ?type=json)?
CRUD/Queries are best represented with HTTP verbs. A create and update is usually a PUT or POST. A retrieve would be a GET. Deletes would be a DELETE. Thats the generally mapping. The main point is that a GET doesn't cause side effects, and that the verbs do what you'd expect them to do.
Putting the action in the URI is OK if thats the -only- way to pass it (e.g, the http client library doesn't allow you to send non-GET/POST requests). Most libraries do, though, so its strongly advised not to pass the verb via the URL.
The "best" way to version the API would be using HTTP headers on a per-request basis; this lets clients upgrade/downgrade specific requests instead of every single one. Of course, that granularity of versioning needs to be baked in at the start and could severely complicate the server-side code. Most people just use the URL used the access the servers. A longer explanation is in a blog post by Peter Williams, "Versioning Rest Web Services"
There is no best return data type; it depends on your app. JSON might be easier for Ajax websites, whereas XML might be easier for complicated structures you want to query with Xpath. Protocol Buffers are a third option. Its also debated whether its better to put the return protocol is best specified in the URL or in the HTTP headers.
For the most part, headers will have the largest affect on caching, since proxies are suppose to respect them when told, as are user agents (obviously UA's behave differently, though). Caching based on URL alone is very dependent on the layers. Some user agents don't cache anything with a query string (Safari, iirc), and proxies are free to cache or not cache as they feel appropriate.

Resources