Play ignore continue 100 requests - http

A Play controller with actions for POST requests may need to ignore automatic HTTP request retries to prevent the controller code from being run multiple times.
What is the best way to do this in Play?

I would recommend the following:
1. Add a unique ID to every POST request as part of the query string.
2. Extend the DefaultHttpRequestHandler, as explained in the Play documentation on request handlers.
3. In your extension, check whether this is a request for which you want to prevent retries, and read the request's ID using getQueryString on the RequestHeader (see the RequestHeader API docs).
4. Check whether you have already seen the ID by querying a datastore such as Redis. Save the ID to Redis if this is the first time you have seen it.
5. Drop the POST request if you have already seen the ID; otherwise, forward it to the router.
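The steps above can be sketched framework-agnostically. This is a minimal Python sketch, not Play code: the in-memory `SeenIdStore` stands in for Redis (where an atomic `SET key value NX EX ttl` would do the check-and-save in one step), and `handle_post`, `forward_to_router` and the `request_id` query parameter name are hypothetical.

```python
import time
from typing import Callable, Dict, Optional


class SeenIdStore:
    """In-memory stand-in for Redis: remembers request IDs for ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def add_if_absent(self, request_id: str) -> bool:
        """Return True if the ID was new (and is now saved), False if already seen."""
        now = self._clock()
        expiry = self._expires_at.get(request_id)
        if expiry is not None and expiry > now:
            return False
        self._expires_at[request_id] = now + self._ttl
        return True


def handle_post(query: Dict[str, str], store: SeenIdStore,
                forward_to_router: Callable[[], str]) -> Optional[str]:
    """Drop retried POSTs whose request_id was already seen; forward the rest."""
    request_id = query.get("request_id")  # hypothetical query-string parameter
    if request_id is None:
        return forward_to_router()  # no ID: this request is not protected
    if not store.add_if_absent(request_id):
        return None  # duplicate: drop it instead of running the controller again
    return forward_to_router()
```

Unlike Redis `SET ... NX`, this dictionary check is not atomic across threads or servers, which is one reason the answer suggests a shared datastore. Whether a dropped duplicate should get an error or a copy of the original response is left open here.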

Related

How can I communicate to the user's browser that a POST request it made is side-effect-free?

I have to add a page to my website that will be accessed via a POST request. The request is side-effect-free, hence it is safe for the user to use their browser's "Refresh" button on the page. The reason why it has to be POST and not GET is that the volume of data needed to characterise the request is large (it includes a collection of arbitrarily many GUIDs identifying resources to be operated upon at a later stage in the process).
When the user of a browser refreshes a page that was the result of a POST request, the browser will typically warn them that the form will be resubmitted and may cause an action to be repeated. This is not a concern in this case, because as I said, the action of requesting this page is side-effect-free. I therefore want to communicate to the user's browser that no such warning should be presented to the user if they use the "Refresh" function. How can I do this?
You cannot prevent the browser from warning the user about resubmitting a POST request.
References
Mozilla forums (Firefox's predecessor) discussed the feature extensively starting in 2002, with some discussion of other browsers as well. It is clear that the decision was made to enforce the feature; although workarounds were suggested, they were not taken up.
Google Chrome (2008) and other subsequent browsers also included the feature.
The reasons for this relate to the difference between GET and POST in RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1 (1999):
GET: "retrieve whatever information is identified by the Request-URI"
POST: "request that the origin server accept the entity enclosed in the request as a new subordinate of the resource"
This implies that whilst a GET request only retrieves data, a POST request modifies data in some way. As per the discussion on the Mozilla forum, the decision was that allowing the warning to be disabled created more risks for the user than the inconvenience of keeping it.
Solutions
Instead, one solution is to use sessions to store the data from the POST request and redirect the user with a GET request to a URL that looks in the session data to find the original request parameters.
This assumes the server-side application has session support and it's enabled.
1. User submits a POST request with data that generates a specific result: POST /results
2. Server stores that data in the session under a known key.
3. Server responds with a 302 redirect to a chosen URL (could be the same one).
4. The client requests the new page with a GET request: GET /results
5. Server identifies that the incoming GET request is asking for the results of the previous POST request and retrieves the data from the session using the known key.
If the user refreshes the page, steps 4 & 5 are repeated.
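The five steps can be simulated without a web framework. A minimal sketch, assuming a dictionary per session ID stands in for the framework's session store; `RESULTS_KEY`, `handle_post_results` and `handle_get_results` are hypothetical names, and the `(status, headers, body)` tuples stand in for real HTTP responses.

```python
from typing import Dict, Tuple

Response = Tuple[int, Dict[str, str], str]

RESULTS_KEY = "results_params"  # hypothetical "known key" (step 2)
sessions: Dict[str, Dict[str, Dict[str, str]]] = {}  # session ID -> session data


def handle_post_results(session_id: str, form: Dict[str, str]) -> Response:
    # Steps 1-2: store the POSTed data in the session under the known key.
    sessions.setdefault(session_id, {})[RESULTS_KEY] = dict(form)
    # Step 3: redirect; the browser follows it with a GET (step 4).
    return 302, {"Location": "/results"}, ""


def handle_get_results(session_id: str) -> Response:
    # Step 5: look up the original parameters in the session.
    params = sessions.get(session_id, {}).get(RESULTS_KEY)
    if params is None:
        return 404, {}, "No results in this session"
    body = "Results for " + ", ".join(f"{k}={v}" for k, v in sorted(params.items()))
    return 200, {}, body
```

A refresh only repeats `handle_get_results`, so no POST is re-sent. A 303 See Other status would make the "follow up with GET" intent explicit instead of relying on how clients treat 302.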
To make the solution more robust, the POST data could be assigned to a unique key that is passed as part of the path or query in the 302 redirect GET /results?set=1. This would enable multiple different pages to be viewed and refreshed, for example in different browser tabs. Consideration must be given to ensuring that the unique key is valid and does not allow access to other session data.
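A sketch of the per-tab variant, assuming the same kind of dictionary-based session store. Each POST gets a fresh random key from `secrets.token_urlsafe`, and keys are only looked up inside the caller's own session, so a guessed or tampered `set` value cannot reach another user's data. The function names are illustrative.

```python
import secrets
from typing import Dict, Optional

# session ID -> {result-set key -> POSTed parameters}
sessions: Dict[str, Dict[str, Dict[str, str]]] = {}


def store_result_set(session_id: str, form: Dict[str, str]) -> str:
    """Save the POSTed data under a fresh key and return the redirect URL."""
    key = secrets.token_urlsafe(16)  # random, so keys cannot be enumerated
    sessions.setdefault(session_id, {})[key] = dict(form)
    return f"/results?set={key}"


def load_result_set(session_id: str, key: str) -> Optional[Dict[str, str]]:
    """Only look in the caller's own session; unknown keys return None."""
    return sessions.get(session_id, {}).get(key)
```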
Some systems, such as Kibana, Grafana, pastebin.com and many others, go one step further. The POST request values are stored in a persistent data store and a unique short URL is provided to the user. The short URL can be used in GET requests and shared with other users to view the same result of what was originally a POST request.
You can solve this problem by implementing the Post/Redirect/Get pattern.
You typically get a browser warning when trying to re-send a POST request, for security reasons. Consider a form where you input personal data to register an account or order a product. If you double-sent your data, you might register twice or buy the same thing two times (of course, this is just a theoretical example). Thus, the user should be warned when trying to send the same POST request several times. This behaviour is intended and cannot be disabled, but it can be avoided by using the aforementioned PRG pattern.
Image from Wikipedia (published under LGPL).
In simple words, this pattern can be used to avoid double submissions of form data that could cause undesired results. You have to configure your server to redirect the affected incoming POST requests using the status code 303 ("See Other"). The user is then redirected (using a GET request) to a confirmation page showing that the request has been successful and will now be processed. If the user now reloads the page, the browser only repeats the GET request for the confirmation page; the POST request is not re-submitted.
However, this strategy might not always work. If the server has not yet received the first submission (because of heavy traffic, for instance) and the user re-submits the form, a second POST request can still be sent.
If you provide more information on your tech stack, I can expand my answer by adding specific code samples.
You can't prevent all browsers from showing this "Are you sure you want to re-submit this form?" popup when the user refreshes a page that is the result of a POST request. So you will have to turn this POST request into a GET request if you want to prevent this popup when your users hit F5 on that page.
And for a search form, which you kind of admitted this was for, turning a POST into a GET has its own problems.
For starters, are you sure you need POST to begin with? Is the data really too large to fit in the query string? Taking a reasonable limit of 1024 characters, that is around 30 GUIDs (give or take some space for the repeated &q=). And why do the search parameters need to be GUIDs to begin with? If you can map them or look them up somehow, you could perhaps limit each parameter to a handful of characters instead of the 32 of a non-dashed GUID; with 5 characters per key you could suddenly fit around 200 parameters in the query string (roughly 128 once the repeated &q= separators are counted).
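The back-of-the-envelope arithmetic can be checked in a few lines. The 1024-character budget is the answer's own working assumption, not a limit defined by HTTP, and each parameter is assumed to cost its value length plus a 3-character `&q=` (or `?q=`) prefix.

```python
def max_params(budget: int, value_len: int, overhead: int = 3) -> int:
    """How many `&q=<value>` parameters fit into a query-string budget."""
    return budget // (value_len + overhead)


print(max_params(1024, 32))    # 29  (non-dashed GUIDs)
print(max_params(1024, 5))     # 128 (5-character keys with &q= separators)
print(max_params(1024, 5, 0))  # 204 (5-character keys, ignoring separators)
```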
Still not enough? Then you need a POST indeed.
One approach, mentioned in comments, is using AJAX, so your search form doesn't actually submit, but instead it sends the query data in the background through a JavaScript HTTP POST request and updates the page with the results. This has the benefit that refreshing the page doesn't prompt, as there's only a GET as far as the browser is concerned, but there's one drawback: search results don't get a unique URL, so you can't cache, bookmark or share them.
If you don't care about caching or URL bookmarking, then AJAX definitely is the simplest option here and you need to read no further.
For all non-AJAX approaches, you need to persist the query parameters somewhere, enabling a Post/Redirect/Get pattern. This pattern ends up with a page that is the result of a GET request, which users can refresh without said popup. What the other answers are quite hand-wavy about is how to do this properly.
Options are:
Serverside session
When POSTing to the server, you can let the server persist the query parameters in the session (all major serverside frameworks enable you to use sessions), then redirect the user to a generic /search-results page, which on the server side reads the data from the session and presents the user with the results built from querying the database combined with the query parameters from the session.
Drawbacks:
Sessions generally time out, and they do so for good reasons. If your user hits F5 after, say, 20 minutes, their session data is gone, and so are their query parameters.
Sessions are shared between browser tabs. If your user is searching for thing A in tab 1 and for thing B in tab 2, the parameters of the most recently submitted tab will overwrite those of the earlier tabs when they are refreshed.
Sessions are per browser. There's generally no trivial way to share sessions (apart from putting the session ID in the URL, but see the first bullet), so you can't bookmark or share your search results.
Local storage / cookies
You might think "but cookies can contain more data than the query string", but no. Apart from having a limit as well, they're also shared between tabs, can't (easily) be shared between users, and can't be bookmarked.
Local storage also isn't an option, because while that can contain way more data - it doesn't get sent to the server. It's local storage.
Serverside persistent storage
If your search queries are actually so complex that you need multiple KB of query parameters, you could probably benefit from persisting those parameters in a database.
So for each search request, you create a new search_query database record that contains the appropriate parameters for the query to execute. Given that search results aren't private, you could even first look up whether the same parameter combination has been used before and reuse that record.
So you get a unique search_id that points to a set of parameters with which you can perform a query. Now you can redirect your user, so they perform a GET request to this page:
/search-results?search_id=Xxx
And there you render the results for the given query. Benefits:
You can cache, bookmark and share the URL /search-results?search_id=Xxx
You can refresh the page displaying the search results without an annoying popup
Each browser tab displays its own search results
Of course this approach also has drawbacks:
Unless you use an unguessable key for search_id, users can enumerate earlier searches by other users
Each search costs permanent serverside storage, unless you decide to evict earlier searches based on some criteria
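A minimal in-memory sketch of the serverside persistent storage option, assuming dictionaries stand in for the search_query table. It canonicalises the parameters (treating each parameter's values as an unordered set, an assumption) so a repeated combination reuses the existing record, and uses `secrets.token_urlsafe` for an unguessable search_id. Reusing records across users is only appropriate if, as the answer says, search results aren't private. `save_search` and `load_search` are hypothetical names.

```python
import json
import secrets
from typing import Dict, List, Optional

search_queries: Dict[str, Dict[str, List[str]]] = {}  # search_id -> parameters
ids_by_params: Dict[str, str] = {}  # canonical parameters -> search_id


def save_search(params: Dict[str, List[str]]) -> str:
    """Persist the parameters (or reuse an identical earlier search) and
    return the URL to redirect the browser to."""
    normalized = {k: sorted(v) for k, v in params.items()}
    canonical = json.dumps(normalized, sort_keys=True)
    search_id = ids_by_params.get(canonical)
    if search_id is None:
        search_id = secrets.token_urlsafe(16)  # unguessable, so not enumerable
        search_queries[search_id] = normalized
        ids_by_params[canonical] = search_id
    return f"/search-results?search_id={search_id}"


def load_search(search_id: str) -> Optional[Dict[str, List[str]]]:
    """Called by the GET /search-results handler to rebuild the query."""
    return search_queries.get(search_id)
```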

How can I cancel all accepted requests, or list them, in the HERE Batch API?

One of our batch requests is holding up the queue, so that all subsequent requests never transition past "accepted". We need to flush the queue, but the only API call I see in the documentation is for deleting a job request by specifying its job ID. Is there a way to delete all job requests, or to list job IDs so that we can then delete them one by one? If there isn't, is there a way for the HERE team to clear our queue?
For each job request you send, you get a request ID back. With this ID, you can cancel (terminate) or delete (remove) the job. Currently the batch geocoder API does not support cancelling or deleting all jobs in a single request. This has to be done one after the other for each request (by specifying the request ID).
Example request to cancel a job:
https://developer.here.com/documentation/batch-geocoder/topics/example-cancel.html
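Since jobs cannot be listed or cancelled in bulk, one workaround is to record every request ID when you submit a job and loop over them later. A sketch, assuming a hypothetical `cancel_job(request_id)` callable that wraps the documented cancel request from the example above and raises on failure; the actual URL, parameters and authentication are deliberately not reproduced here.

```python
from typing import Callable, Dict, List


class JobRegistry:
    """Client-side record of submitted batch job request IDs."""

    def __init__(self) -> None:
        self._ids: List[str] = []

    def record(self, request_id: str) -> None:
        # Call this with the request ID returned for every job you submit.
        self._ids.append(request_id)

    def ids(self) -> List[str]:
        return list(self._ids)


def cancel_all(registry: JobRegistry,
               cancel_job: Callable[[str], None]) -> Dict[str, str]:
    """Cancel jobs one by one; return the IDs that failed and why."""
    failures: Dict[str, str] = {}
    for request_id in registry.ids():
        try:
            cancel_job(request_id)  # hypothetical wrapper around one cancel call
        except Exception as exc:  # keep going so one bad job doesn't stop the rest
            failures[request_id] = str(exc)
    return failures
```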

Update database with GET requests?

I've read online that you shouldn't update your database with GET requests for the following reasons:
GET request is idempotent and safe
violates HTTP spec
should always read data from a server-database
Let's say that we have built a URL shortener service. When someone clicks on a link or pastes it into the browser's address bar, it will be a GET request.
So, if I want to update, on my database, the stats of a shortened link every time it's been clicked, how can I do this if GET requests are idempotent?
The only way I can think of is by calling a PUT request inside the server's side code that handles the GET request.
Is this a good practice or is there a better way to do this?
It seems you're mixing up a few things here.
While you shouldn't use GET requests to transfer sensitive data (as it is shown in the request URL and most likely logged somewhere along the way), there is nothing wrong with using them in your use case. You are only updating a variable on the server side. HTTP's notion of a safe method is about what the client requested, so server-side bookkeeping such as logging or hit counters is acceptable.
Just keep in mind that request parameters are included in the URL when using GET requests, and you should be OK.
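For the URL-shortener case, a sketch of a GET handler that records the click as an internal side effect and then redirects. The in-memory dictionaries, the example target URL and the `handle_short_link` name are illustrative assumptions; a real service would use an atomic increment in its database. Repeating the GET still yields the same redirect, and the counter is bookkeeping the client did not ask for.

```python
from collections import Counter
from typing import Dict, Tuple

links: Dict[str, str] = {"abc123": "https://example.com/some/long/path"}
clicks: Counter = Counter()


def handle_short_link(code: str) -> Tuple[int, Dict[str, str]]:
    target = links.get(code)
    if target is None:
        return 404, {}
    clicks[code] += 1  # server-side statistic, not a change the client requested
    return 302, {"Location": target}
```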

Consequences of POST not being idempotent (RESTful API)

I am wondering if my current approach makes sense or if there is a better way to do it.
I have multiple situations where I want to create new objects and let the server assign an ID to those objects. Sending a POST request appears to be the most appropriate way to do that.
However, since POST is not idempotent, a request may get lost and sending it again may create a second object. Lost requests might also be quite common, since the API is often accessed through mobile networks.
As a result I decided to split the whole thing into a two-step process:
First sending a POST request to create a new object which returns the URI of the new object in the Location header.
Secondly performing an idempotent PUT request to the supplied Location to populate the new object with data. If a new object is not populated within 24 hours the server may delete it through some kind of batch job.
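The two-step process in the question can be sketched as follows, assuming an in-memory store and an explicit `now` timestamp so the 24-hour cleanup is easy to test. `post_new`, `put_data` and `sweep` are hypothetical names. A repeated POST still reserves a second (empty) ID, which is exactly what the sweep cleans up.

```python
import itertools
from typing import Dict, Optional, Tuple

DAY = 24 * 60 * 60
_next_id = itertools.count(1)
objects: Dict[int, Tuple[float, Optional[dict]]] = {}  # id -> (created_at, data)


def post_new(now: float) -> Tuple[int, Dict[str, str]]:
    """Step 1: reserve a server-assigned ID and return its Location."""
    object_id = next(_next_id)
    objects[object_id] = (now, None)
    return 201, {"Location": f"/objects/{object_id}"}


def put_data(object_id: int, data: dict) -> int:
    """Step 2: idempotent; sending the same PUT twice leaves the same state."""
    if object_id not in objects:
        return 404
    created_at, _ = objects[object_id]
    objects[object_id] = (created_at, dict(data))
    return 204


def sweep(now: float) -> None:
    """Batch job: delete objects still unpopulated after 24 hours."""
    for object_id, (created_at, data) in list(objects.items()):
        if data is None and now - created_at > DAY:
            del objects[object_id]
```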
Does that sound reasonable or is there a better approach?
The only advantage of POST-creation over PUT-creation is the server generation of IDs.
I don't think it's worth the lack of idempotency (and the resulting need to remove duplicates or empty objects).
Instead, I would use a PUT with a UUID in the URL. Thanks to UUID generators, you can be nearly sure that the ID you generate client-side will be unique server-side.
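A sketch of the PUT-with-UUID alternative: the client generates the ID with `uuid.uuid4()` before the first attempt and reuses it on every retry, so a retry after a lost response overwrites rather than duplicates. The `put_resource` handler and its dict store are illustrative; returning 201 for a new resource and 200 for a replacement follows common PUT usage.

```python
import uuid
from typing import Dict

store: Dict[str, dict] = {}


def put_resource(resource_id: str, body: dict) -> int:
    """Create or replace /resources/<resource_id>; safe to retry."""
    created = resource_id not in store
    store[resource_id] = dict(body)
    return 201 if created else 200


# Client side: generate the ID once, then retry with the same ID.
client_id = str(uuid.uuid4())
first = put_resource(client_id, {"title": "Hello"})
retry = put_resource(client_id, {"title": "Hello"})  # e.g. the first response was lost
```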
Well, it all depends. To start with, you should talk more about URIs, resources and representations, and not be concerned about objects.
The POST method is designed for non-idempotent requests, or requests with side effects, but it can be used for idempotent requests.
On POST of form data to /some_collection/:
  normalize the natural key of your data (e.g. lowercase the Title field for a blog post)
  calculate a suitable hash value (e.g. in the simplest case, your normalized field value)
  look up the resource by hash value
  if none found:
    generate a server identity and create the resource
    respond => "201 Created", "Location: /some_collection/<new_id>"
  if found, but no updates should be carried out due to app logic:
    respond => "302 Found" (Moved Temporarily) or "303 See Other"
    (the client will need to GET that resource, which might include fields required for updates, like version numbers)
  if found, and updates may occur:
    respond => "307 Temporary Redirect", "Location: /some_collection/<id>"
    (like a 302, but the client should use the original HTTP method and might do so automatically)
A suitable hash function might be as simple as some concatenated fields, or, for large fields or values, a truncated MD5 hash could be used. See the article on hash functions for more details.
I've assumed that:
you need a different identity value than the hash value
the data fields used for identity can't be changed
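The lookup-by-hash procedure translated into a small Python sketch. Assumptions: resources are blog posts identified by a normalised `title`, a truncated MD5 of it is the lookup hash (one of the options mentioned), an in-memory dict replaces the database, and `updates_allowed` is a hypothetical flag standing in for the application logic that chooses between the 303 and 307 branches.

```python
import hashlib
import itertools
from typing import Dict, Tuple

_ids = itertools.count(1)
by_hash: Dict[str, int] = {}  # hash of natural key -> resource id
resources: Dict[int, dict] = {}


def natural_key_hash(form: Dict[str, str]) -> str:
    normalized = form["title"].strip().lower()  # normalise the natural key
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()[:16]  # truncated


def post_to_collection(form: Dict[str, str],
                       updates_allowed: bool) -> Tuple[int, Dict[str, str]]:
    key = natural_key_hash(form)
    existing = by_hash.get(key)
    if existing is None:
        new_id = next(_ids)  # server-generated identity
        by_hash[key] = new_id
        resources[new_id] = dict(form)
        return 201, {"Location": f"/some_collection/{new_id}"}
    location = {"Location": f"/some_collection/{existing}"}
    if not updates_allowed:
        return 303, location  # client should GET the existing resource
    return 307, location  # client should repeat the same method at this URL
```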
Your method of generating ids at the server, in the application, in a dedicated request-response, is a very good one! Uniqueness is very important, but clients, like suitors, are going to keep repeating the request until they succeed, or until they get a failure they're willing to accept (unlikely). So you need to get uniqueness from somewhere, and you only have two options: either the client, with a GUID as Aurélien suggests, or the server, as you suggest. I happen to like the server option. Seed columns in relational DBs are a readily available source of uniqueness with zero risk of collisions. Around 2000, I read an article advocating this solution called something like "Simple Reliable Messaging with HTTP", so this is an established approach to a real problem.
Reading REST stuff, you could be forgiven for thinking a bunch of teenagers had just inherited Elvis's mansion. They're excitedly discussing how to rearrange the furniture, and they're hysterical at the idea they might need to bring something from home. The use of POST is recommended because it's there, without ever broaching the problems with non-idempotent requests.
In practice, you will likely want to make sure all unsafe requests to your api are idempotent, with the necessary exception of identity generation requests, which as you point out don't matter. Generating identities is cheap and unused ones are easily discarded. As a nod to REST, remember to get your new identity with a POST, so it's not cached and repeated all over the place.
Regarding the sterile debate about what idempotent means, I say it needs to be everything. Successive requests should generate no additional effects, and should receive the same response as the first processed request. To implement this, you will want to store all server responses so they can be replayed, and your ids will be identifying actions, not just resources. You'll be kicked out of Elvis's mansion, but you'll have a bombproof api.
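A sketch of the "store all server responses so they can be replayed" idea: each unsafe request carries an identifier for the action (the `action_id` argument is a hypothetical name), the first processed response is saved, and any repeat gets the stored response back without re-running the side effect. The in-memory dict stands in for durable storage, and two copies arriving concurrently are not handled.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]


class ReplayingHandler:
    def __init__(self, action: Callable[[dict], Response]) -> None:
        self._action = action
        self._responses: Dict[str, Response] = {}  # action_id -> first response

    def handle(self, action_id: str, payload: dict) -> Response:
        if action_id in self._responses:
            return self._responses[action_id]  # repeat: replay, no new effects
        response = self._action(payload)
        self._responses[action_id] = response
        return response
```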
But now you have two requests that can be lost? And the POST can still be repeated, creating another resource instance. Don't over-think stuff. Just have the batch process look for dupes. Possibly have some "access" count statistics on your resources to see which of the dupe candidates was the result of an abandoned post.
Another approach: screen incoming POSTs against some log to see whether they are repeats. They should be easy to find: if the body content of a request is the same as that of a request just x time ago, consider it a repeat. You could also check extra parameters like the originating IP, the same authentication, ...
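A sketch of screening incoming POSTs against a recent log: fingerprint the body together with the authenticated user and originating IP, and treat a match within a time window of the last accepted request as a repeat. The 30-second default, the SHA-256 fingerprint and the `is_repeat` name are assumptions for illustration. This also rejects a deliberate identical resubmission inside the window, and old entries are never pruned here.

```python
import hashlib
from typing import Dict


class RepeatScreen:
    def __init__(self, window_seconds: float = 30.0) -> None:
        self._window = window_seconds
        self._accepted_at: Dict[str, float] = {}  # fingerprint -> last accepted time

    def is_repeat(self, body: bytes, user: str, ip: str, now: float) -> bool:
        fingerprint = hashlib.sha256(
            user.encode() + b"\0" + ip.encode() + b"\0" + body
        ).hexdigest()
        last = self._accepted_at.get(fingerprint)
        if last is not None and now - last <= self._window:
            return True  # same content from the same client, too soon
        self._accepted_at[fingerprint] = now  # accept and remember it
        return False
```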
No matter what HTTP method you use, it is theoretically impossible to make an idempotent request without generating the unique identifier client-side, temporarily (as part of some request checking system) or as the permanent server id. An HTTP request being lost will not create a duplicate, though there is a concern that the request could succeed getting to the server but the response does not make it back to the client.
If the end client can easily delete duplicates and they don't cause inherent data conflicts it is probably not a big enough deal to develop an ad-hoc duplication prevention system. Use POST for the request and send the client back a 201 status in the HTTP header and the server-generated unique id in the body of the response. If you have data that shows duplications are a frequent occurrence or any duplicate causes significant problems, I would use PUT and create the unique id client-side. Use the client created id as the database id - there is no advantage to creating an additional unique id on the server.
I think you could also collapse the creation and update requests into a single request (upsert). To create a new resource, the client POSTs to a “factory” resource, located for example at /factory-url-name, and the server returns the URI of the new resource.
Why don't you use a request ID at your originating point? Your originating point should do two things. First, send a GET request for request_id=2 to see whether its request has been applied (e.g. a response showing that the person was created as part of request_id=2).
This ensures your originating system knows which request was executed last, since the request ID is stored in the DB.
Second, if your originating point finds that the last request is still 1 and not yet 2, it may try again (as request 3), to cover the case where only the GET response got lost but request 2 was actually created in the DB.
You can add a limit on the number of tries for your GET request, a time to wait before firing another GET, and so on.

Which HTTP methods match up to which CRUD methods?

In RESTful style programming, we should use HTTP methods as our building blocks. I'm a little confused though which methods match up to the classic CRUD methods. GET/Read and DELETE/Delete are obvious enough.
However, what is the difference between PUT/POST? Do they match one to one with Create and Update?
Create = PUT with a new URI, or POST to a base URI returning a newly created URI
Read = GET
Update = PUT with an existing URI
Delete = DELETE
PUT can map to both Create and Update depending on the existence of the URI used with the PUT.
POST maps to Create.
Correction: POST can also map to Update although it's typically used for Create. POST can also be a partial update so we don't need the proposed PATCH method.
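A toy in-memory dispatcher can make this mapping concrete. A sketch, assuming a hypothetical `/items` collection and dict storage: POST to the collection creates an item under a server-chosen URI, PUT to a specific URI creates or replaces it (201 vs 200), GET reads and DELETE deletes.

```python
import itertools
from typing import Dict, Optional, Tuple

items: Dict[str, dict] = {}
_next = itertools.count(1)


def dispatch(method: str, path: str,
             body: Optional[dict] = None) -> Tuple[int, object]:
    if method == "POST" and path == "/items":  # Create; the server picks the URI
        uri = f"/items/{next(_next)}"
        items[uri] = dict(body or {})
        return 201, uri
    if method == "PUT":  # Create or Update at a URI chosen by the client
        created = path not in items
        items[path] = dict(body or {})
        return (201 if created else 200), path
    if method == "GET":  # Read
        return (200, items[path]) if path in items else (404, None)
    if method == "DELETE":  # Delete
        return (204, None) if items.pop(path, None) is not None else (404, None)
    return 405, None  # anything else is not supported here
```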
The whole key is whether you're doing an idempotent change or not. That is, if taking action on the message twice will result in “the same” thing being there as if it was only done once, you've got an idempotent change and it should be mapped to PUT. If not, it maps to POST. If you never permit the client to synthesize URLs, PUT is pretty close to Update and POST can handle Create just fine, but that's most certainly not the only way to do it; if the client knows that it wants to create /foo/abc and knows what content to put there, it works just fine as a PUT.
The canonical description of a POST is when you're committing to purchasing something: that's an action which nobody wants to repeat without knowing it. By contrast, setting the dispatch address for the order beforehand can be done with PUT just fine: it doesn't matter if you are told to send to 6 Anywhere Dr, Nowhereville once, twice or a hundred times: it's still the same address. Does that mean that it's an update? Could be… It all depends on how you want to write the back-end. (Note that the results might not be identical: you could report back to the user when they last did a PUT as part of the representation of the resource, which would ensure that repeated PUTs do not cause an identical result, but the result would still be “the same” in a functional sense.)
I was searching for the same answer; here is what IBM says:
POST Creates a new resource.
GET Retrieves a resource.
PUT Updates an existing resource.
DELETE Deletes a resource.
As of 2016, the most commonly used HTTP verbs are GET, POST, PATCH, PUT and DELETE
Overview
HTTP GET - SELECT/Request
HTTP PUT - UPDATE
HTTP POST - INSERT/Create
HTTP PATCH - when PUTting a complete resource representation is cumbersome and uses more bandwidth, e.g. when you only need to update a single column
HTTP DELETE - DELETE
Hope this helps!
If you are interested in designing REST APIs, this is an awesome read! It is available as a website, an online version and a GitHub repository.
There's a great YouTube talk by Stormpath which actually explains this; the link should skip to the correct part of the video.
It's worth watching: it's over an hour of talking, but very interesting if you're thinking of investing time in building a REST API.
It depends on the concrete situation, but in general:
PUT = update or change a concrete resource with a concrete URI of the resource.
POST = create a new resource under the resource identified by the given URI.
For example:
Edit a blog post: PUT /blog/entry/1
Create a new one: POST /blog/entry
PUT may create a new resource in some circumstances where the URI of the new resource is clear before the request.
POST can be used to implement several other use cases, too, which are not covered by the others (GET, PUT, DELETE, HEAD, OPTIONS)
The general understanding for CRUD systems is GET = request, POST = create, PUT = update, DELETE = delete
The building blocks of REST are mainly the resources (and URI) and the hypermedia. In this context, GET is the way to get a representation of the resource (which can indeed be mapped to a SELECT in CRUD terms).
However, you shouldn't necessarily expect a one-to-one mapping between CRUD operations and HTTP verbs.
The main difference between PUT and POST is about their idempotent property. POST is also more commonly used for partial updates, as PUT generally implies sending a full new representation of the resource.
I'd suggest reading this:
http://roy.gbiv.com/untangled/2009/it-is-okay-to-use-post
http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
The HTTP specification is also a useful reference:
The PUT method requests that the enclosed entity be stored under the supplied Request-URI.
[...]
The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity. That resource might be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server MUST NOT attempt to apply the request to some other resource. If the server desires that the request be applied to a different URI, [...]
Generally speaking, this is the pattern I use:
HTTP GET - SELECT/Request
HTTP PUT - UPDATE
HTTP POST - INSERT/Create
HTTP DELETE - DELETE
The Symfony project tries to keep its HTTP methods joined up with CRUD methods, and their list associates them as follows:
GET Retrieve the resource from the server
POST Create a resource on the server
PUT Update the resource on the server
DELETE Delete the resource from the server
It's worth noting that, as they say on that page, "In reality, many modern browsers don't support the PUT and DELETE methods."
From what I remember, Symfony "fakes" PUT and DELETE for those browsers that don't support them when generating its forms, in order to try to be as close to using the theoretically-correct HTTP method even when a browser doesn't support it.
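The "faking" works roughly like this: the HTML form is submitted as POST with a hidden field naming the intended method (for example `<input type="hidden" name="_method" value="DELETE">`), and the server treats the request as that method before routing. A minimal sketch; Symfony's convention for the hidden field name is, as far as I know, `_method`, but the `effective_method` function and the set of allowed overrides are illustrative assumptions, not Symfony's code.

```python
from typing import Dict

ALLOWED_OVERRIDES = {"PUT", "PATCH", "DELETE"}


def effective_method(http_method: str, form: Dict[str, str]) -> str:
    """Return the method to route on, honouring a hidden _method field on POSTs."""
    if http_method.upper() != "POST":
        return http_method.upper()  # only POST forms may be overridden
    override = form.get("_method", "").upper()
    return override if override in ALLOWED_OVERRIDES else "POST"
```

Restricting overrides to POST and to a known set of methods keeps a crafted GET link from triggering a DELETE.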

Resources