Database rollback on API response failure

A customer of ours insists that they expect this "from any API" (meaning they don't want to pay for the changes). I'm having trouble finding clear information on this, though.
Say we have an API that creates an appointment in a calendar. Server-side, everything was successful and the data is committed to the database. The API tries to send the HTTP 201 (Created) response, but something goes wrong there: the client ignores the response, the connection drops, ...
They want our API to undo the database changes in that particular situation.
The question is not how to do this, but whether this is something most APIs do. Is this standard behavior? Or is something similar, like refusing duplicate create requests, the norm?

The difficult part, of course, is actually knowing that the API failed to deliver the response, and as far as the crux of the question is concerned, this is not commonly implemented behavior. If the user willingly submits the data, you can go ahead and store it. If the response doesn't arrive due to a timeout (you are not responsible for the user "ignoring" the response), the client-side code can refresh on failure and load fresh data. And the user can delete the data they entered themselves (given that you provide an endpoint for that).
Depending on the database, it is possible to make all database changes of an API reversible. For example, with SQL you use transactions with commit, rollback, and savepoints. A similar mechanism is most likely available for NoSQL databases.
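To illustrate, here is a minimal sketch of that mechanism using node-postgres, with a hypothetical appointments table: if anything fails before the COMMIT, the transaction is rolled back and nothing is persisted. Note that this does not by itself solve the original problem; once COMMIT has succeeded, undoing the change after a failed response would require a separate compensating action (e.g. a DELETE).

```typescript
// Sketch only: table and column names are illustrative.
import { Pool } from "pg";

const pool = new Pool(); // connection settings taken from PG* environment variables

async function createAppointment(title: string, startsAt: Date): Promise<number> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const result = await client.query(
      "INSERT INTO appointments (title, starts_at) VALUES ($1, $2) RETURNING id",
      [title, startsAt]
    );
    await client.query("COMMIT"); // after this point, a rollback is no longer possible
    return result.rows[0].id;
  } catch (err) {
    await client.query("ROLLBACK"); // undo everything done since BEGIN
    throw err;
  } finally {
    client.release();
  }
}
```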

Related

Does Firebase Realtime Database guarantee FCFS order when serving requests?

This is a rather straightforward question.
Does Firebase Realtime Database guarantee to follow a 'first come, first served' rule when handling requests?
When there is a write request immediately followed by a read request, will the read request fetch the updated data?
In a write-read-write sequence of requests, is the read request guaranteed to fetch the data written by the first write?
Suppose there is a write request that could not be performed (due to connection issues). Since Firebase can work offline, that change will be saved locally. Now, from somewhere else, another write request is made and completes successfully. If the first device then comes online, does it overwrite the values (since its write arrived last)? Or not (since it was initiated before the latest changes)?
There are a lot of questions in your post, and many of them depend on how you implement the functionality. So it's not nearly as straightforward as you may think.
The best I can do is explain a bit of how the database works in the scenarios you mention. If you run into more questions from there, I recommend implementing the use-case and posting back with an MCVE for each specific question.
Writes from a single client are handled in the order in which that client makes them.
But writes from different clients are handled with a last-write-wins logic. If your use-case requires something else, include a client-side timestamp in the write and use security rules to reject writes that are older than the current state.
Firebase synchronizes state to the listeners, and not necessarily all (write) events that led to this state. So it is possible (and fairly common) for listeners to not get all state changes that happened, for example if multiple changes to the same state happened while they were offline.
A client reading data that it has itself changed will always see the state including its own changes.
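As a hedged sketch of the timestamp approach mentioned above (Firebase JS SDK, v8-style API; paths and field names are illustrative, and an initialized app is assumed):

```typescript
import firebase from "firebase/app";
import "firebase/database";

// Assumes firebase.initializeApp(...) has already been called elsewhere.
function saveScore(gameId: string, score: number) {
  return firebase.database().ref(`scores/${gameId}`).set({
    score,
    // Client-side timestamp; a security rule can reject stale writes, e.g.
    // ".write": "newData.child('updatedAt').val() > data.child('updatedAt').val()"
    updatedAt: Date.now(),
  });
}
```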

What is the best practice for a REST API to manage maximum characters for a field?

I have a Web API RESTful service that has a few POST endpoints to insert a new object into the database. We want to limit the maximum characters accepted for the object name to 20. Should the DB, API, or UI handle this?
Obviously, we could prevent more than 20 characters on any of those layers. However, if it gets past the UI then the form has been submitted. At that point, we would want the Service layer or the DB layer to return an informative explanation as to why it was not accepted. What would be the best practice to handle this?
Should the DB, API, or UI handle this?
At the very least, your API must handle data validation. Everyone has their own opinions on how REST should work, but one good approach would be to return HTTP 400 (Bad Request), with some content that includes information about why the request was bad.
Client-side checking is a definite bonus on top of this. Ideally you'll never even see this code get hit at your API layer. The client should also be capable of handling the potential "Bad Request" response in a graceful way. If there's ever a mismatch between the rules applied by the API and the client, you'll want the client to recognize that its action didn't succeed, and display an appropriate error to help the user respond to the issue.
Database-level checks are also a good idea if you ever allow data to bypass your API layer, through bulk imports or something. If not, the database might just be one more place you'll have to remember to change if you ever change your validation requirements.
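As a rough sketch of the API-level check (using Express; the endpoint and field names are made up):

```typescript
import express from "express";

const app = express();
app.use(express.json());

const MAX_NAME_LENGTH = 20; // single source of truth for the validation rule

app.post("/objects", (req, res) => {
  const name = req.body?.name;
  if (typeof name !== "string" || name.length === 0 || name.length > MAX_NAME_LENGTH) {
    // HTTP 400 with machine-readable details the client can surface to the user
    return res.status(400).json({
      error: "validation_failed",
      field: "name",
      message: `name must be between 1 and ${MAX_NAME_LENGTH} characters`,
    });
  }
  // ...insert the object, then respond with 201 Created...
  res.status(201).json({ name });
});
```

Keeping the limit in one constant (or, better, one shared schema) makes it easier to keep the UI, the API, and any database constraint in sync.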

Would a real-world Meteor application use server-side Methods almost exclusively?

I'm learning Meteor and fundamentally enjoy how fast I can build data-driven applications. However, as I went through the Creating Posts chapter in the Discover Meteor book, I learned about using server-side Methods. Specifically, the primary reason (and there are a number of very valid reasons to use these) was the timestamp: you wouldn't want to rely on the client date/time, you'd want to use the server date/time.
Makes sense, except that in almost every application I've ever built, we store the date/time of row create/update in a column. Effectively every single create or update to the database records a date/time, which in Meteor now looks like it would require server-side Methods to ensure data integrity.
If I'm understanding correctly, that pretty much eliminates the ease of use and real-time nature of a client-side Collection, because I'll need to use Methods for almost every single update and create to our databases.
Just wanted to check and see how everyone else is doing this in the real world. Are you querying a server-side Method that returns the date/time and then using a client-side Collection, or something else?
Thanks!
The short answer to this question is that yes, every operation that affects the server's database will go through a server-side method. The only difference is whether you are defining this method explicitly or not.
When you are just getting started with Meteor, you will probably do insert/update/remove operations directly on client collections using validators, which check whether the operation is allowed. This usage actually calls predefined methods on both the server and client (for a collection named foo you have /foo/insert, for example), which simply check the specified validators before doing the operation. As you become more familiar with Meteor, you will probably override these default methods, for the reasons you described (among others).
When using your own methods, you will typically want to define a method both on the server and the client, just as the default collection functions do for you. This is because of Meteor's latency compensation, which allows most client operations to be reflected immediately in the browser without any noticeable lag, as long as they are permitted. Meteor does this by first simulating the effect of a method call in the client, updating the client's cached data temporarily, then sending the actual method call to the server. If the server's method causes a different set of changes than the client's simulation, the client's cache will be updated to reflect this when the server method returns. This also means that if the client's method would have done the same as the server, we've basically allowed for an instant operation from the perspective of the client.
By defining your own methods on the server and client, you can extend this to fill your own needs. For example, if you want to insert timestamps on updates, have the client insert a provisional timestamp in the simulated method. The server will insert an authoritative timestamp, which will replace the client's timestamp when the method returns. From the client's perspective, the insert operation will be instant, except for an update to the timestamp if the client's clock happens to be way off. (By the way, you may want to check out my timesync package for displaying relative server time accurately on the client.)
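A minimal sketch of what such a shared method might look like (collection and method names are illustrative; the file is assumed to be loaded on both client and server):

```typescript
import { Meteor } from "meteor/meteor";
import { Mongo } from "meteor/mongo";

export const Posts = new Mongo.Collection("posts");

Meteor.methods({
  "posts.insert"(title: string) {
    Posts.insert({
      title,
      // In the client simulation this uses the client's clock, so the UI updates
      // instantly; when the server runs the same method, its authoritative
      // timestamp replaces the simulated value once the method returns.
      createdAt: new Date(),
    });
  },
});
```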
A final note: it's good to understand what scope you are doing collection operations in, as this was one of the things that originally confused me about Meteor. For example, if you have a client-side collection instance Foo, calling Foo.insert() in normal client code will invoke the default pair of client/server methods. However, Foo.insert() inside a client method runs only in a simulation and will never call server code, so you will need to define the same method on the server and make sure you do Foo.insert() there as well for the method to work properly.
A good rule of thumb moving forward is to replace groups of validated collection operations with your own methods that do the same operations, and then add specific extra features on the server and client respectively.
In short: yes!
Publications exist to send a 'live', dynamic subset of the database to the client: DDP added messages for existing records, followed by a ready message, and then added, changed, and removed messages to keep the client's cache consistent.
Methods exist to cause Mongo updates, directly or indirectly, and as Andrew mentioned, they are always in use.
But because of Meteor's publication architecture, any edit to a collection that is currently being published to at least one client will be pushed out via DDP, regardless of the source of the change to Mongo, even an outside process.
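For reference, a publication is just a server-side function returning a cursor that Meteor watches (a sketch; names and the imported module are illustrative):

```typescript
import { Meteor } from "meteor/meteor";
import { Posts } from "./posts"; // hypothetical module exporting the collection

if (Meteor.isServer) {
  Meteor.publish("posts", function () {
    // Meteor observes this cursor and sends DDP added/changed/removed messages
    // to every subscribed client, whatever caused the change in Mongo.
    return Posts.find({}, { limit: 100 });
  });
}
```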

Should POST data have a retry on a timeout condition or not?

When POSTing data, whether using AJAX, from a mobile device, or what have you, there is often a "retry" condition, so that should something like a timeout occur, the data is POSTed again.
Is this actually a good idea?
POST requests are not meant to be idempotent, so if you:
1. make a POST to the server,
2. the server receives the request,
3. takes time to execute, and
4. then sends the data back,
and the timeout is hit sometime after step 3, then the retry will resubmit an operation that is not idempotent, and it will be executed a second time.
The question, then, is: should a retry (when calling from the client side) be set up for POST data, or should the server be designed to always handle POST data appropriately (with tokens and so on), or am I missing something?
Update, as per the questions: this is for a mobile app. As it happens, during testing it was noticed that with too short a timeout, the app would retry. Meanwhile, the back-end server had in fact accepted and processed the initial request, and got very upset when the new (otherwise identical) request came in.
Nonces are a (partial) solution to this. The server generates a nonce and gives it to the client. The client sends the POST including the nonce; the server checks whether the nonce is valid and unused and, if so, acts on the POST and invalidates the nonce; if not, it reports back that the nonce has been used and discards the data. This is also very useful for avoiding the 'double post' problem of users clicking a submit button twice.
However, this moves the problem from the client to a different one on the server. If you invalidate the nonce before the action, the action might still fail or hang; if you invalidate it after, the nonce is still valid for requests that arrive during processing. So a possible flow on the server, on receiving a request, becomes:
1. Lock the nonce.
2. Perform the action.
3. On any processing error preventing completion of the action, roll back and release the lock on the nonce.
4. On no errors, invalidate/remove the nonce.
Semaphores are most helpful with this on the server side; most back-end languages have libraries for them.
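A minimal sketch of that flow, assuming an in-memory nonce store (a real deployment would use a shared store such as Redis so the check holds across server processes):

```typescript
import { randomUUID } from "node:crypto";

type NonceState = "issued" | "locked" | "used";
const nonces = new Map<string, NonceState>(); // illustrative in-memory store

function issueNonce(): string {
  const nonce = randomUUID();
  nonces.set(nonce, "issued");
  return nonce;
}

async function handlePost(nonce: string, action: () => Promise<void>): Promise<string> {
  const state = nonces.get(nonce);
  if (state === "used") return "already-processed"; // client may treat this as success
  if (state !== "issued") return "invalid-or-in-progress";

  nonces.set(nonce, "locked"); // lock before acting so concurrent retries are rejected
  try {
    await action();
    nonces.set(nonce, "used"); // invalidate only once the action has completed
    return "ok";
  } catch (err) {
    nonces.set(nonce, "issued"); // roll back: the client may retry with the same nonce
    throw err;
  }
}
```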
So, implementing all these:
It is safe to retry; if the action has already been performed, it won't be done again.
A reply that the nonce has already been used can be understood as a confirmation that the original POST was acted upon.
If you need the result of an action where the second request shows that the first came through, a short-lived server-side cache will be needed.
It's up to you to set a sane limit on subsequent tries (what if the 2nd fails? or the 3rd?).
The issue with automatic retries is that the server needs to know whether the prior POST was successfully processed, to avoid unintended consequences. For example, if each POST inserts a DB record, but the last POST timed out after the insert, an automatic re-POST will create a duplicate DB record, which is probably not what you want.
In a robust setup you may be able to detect the timeout and roll back any POSTed updates. That would safely allow an automatic re-POST. But many (most?) web systems aren't this elaborate, so you'll typically require human intervention to determine whether the POST should happen again.
If you can't avoid the long wait until the server responds, a better practice would be to return an immediate response (e.g. 200 OK with the message "processing") and have the client later send a new request to check whether the action was performed.
AJAX was not designed to be used in such a way ("heavy" actions).
By the way, the default HTTP timeout is 7200 seconds, so I don't think you'll reach it easily; but regardless, you should avoid having the user wait for long periods of time.
Providing more information about the process (like what exactly you're trying to do) would help in suggesting ways to avoid such obstacles.
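A sketch of that "respond immediately, let the client poll" pattern with Express (route and job names are made up; 202 Accepted is the conventional status for "received but still processing"):

```typescript
import express from "express";
import { randomUUID } from "node:crypto";

const app = express();
const jobs = new Map<string, "processing" | "done">(); // illustrative job registry

app.post("/reports", (_req, res) => {
  const jobId = randomUUID();
  jobs.set(jobId, "processing");
  runHeavyAction().then(() => jobs.set(jobId, "done")); // run in the background
  res.status(202).json({ jobId, status: "processing" }); // respond immediately
});

// The client polls this endpoint later to see whether the action finished.
app.get("/reports/:jobId", (req, res) => {
  const status = jobs.get(req.params.jobId);
  if (!status) return res.status(404).end();
  res.json({ jobId: req.params.jobId, status });
});

async function runHeavyAction(): Promise<void> {
  // stand-in for the actual long-running work
  return new Promise((resolve) => setTimeout(resolve, 5_000));
}
```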
If your requests are failing often enough to instigate something like this, you have major problems that have nothing to do with implementing a retry condition. I have never seen a web app that needed this type of functionality. Programming your app to automatically beat an overloaded server with the F5 hammer is not the solution to anything.
If this AJAX request is triggered by a button click, disable that button until the request returns, successful or not. If it failed, let the user click it again.

Why is using an HTTP GET to update state on the server in a RESTful call incorrect?

OK, I already know all the on-paper reasons why I should not use an HTTP GET to update the state of something on the server when making a RESTful call, thus possibly returning different data each time. I know this is wrong for the following 'on paper' reasons:
- HTTP GET calls should be idempotent
- N > 0 calls should always GET the same data back
- it violates the HTTP spec
- an HTTP GET call is typically read-only
And I am sure there are more reasons. But I need a concrete, simple example as justification other than "Well, that violates the HTTP spec!"... or at least I am hoping for one. I have also already read the following, which are more along the lines of the list above: Does it violate the RESTful when I write stuff to the server on a GET call? &
HTTP POST with URL query parameters -- good idea or not?
For example, can someone justify why it is wrong/bad practice/incorrect to use an HTTP GET with the following RESTful call:
"MyRESTService/GetCurrentRecords?UpdateRecordID=5&AddToTotalAmount=10"
I know it's wrong, but hopefully it will help provide an example to answer my original question. So the above would update record ID 5 with AddToTotalAmount = 10 and then return the updated records. I know a POST should be used, but let's say I did use a GET.
To answer my question: how exactly does or can this cause an actual problem, beyond all the violations in the bullet list above? Too many times I find myself justifying things with "because the doc said so", but here I need justification and a better understanding.
Thanks!
The practical case where you will have a problem is that HTTP GETs are often retried by the HTTP implementation in the event of a failure. So in real life you can get situations where the same GET is received multiple times by the server. If your update is idempotent, there will be no problem; but if it's not (like adding some value to an amount, as your AddToTotalAmount does), you could get multiple (undesired) updates.
HTTP POST is not automatically retried, so you would not have this problem.
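To make the failure mode concrete, here is a sketch of the endpoint from the question implemented as a GET (Express; the in-memory store is illustrative). Any silent retry by the HTTP stack, a proxy, or a prefetcher adds the amount again:

```typescript
import express from "express";

const app = express();
const totals = new Map<number, number>([[5, 100]]); // record 5 starts at 100

// Non-idempotent GET: every request mutates state. Two deliveries of the same
// request (one user action plus one retry) add the amount twice.
app.get("/MyRESTService/GetCurrentRecords", (req, res) => {
  const id = Number(req.query.UpdateRecordID);
  const delta = Number(req.query.AddToTotalAmount);
  totals.set(id, (totals.get(id) ?? 0) + delta);
  res.json({ id, total: totals.get(id) });
});
```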
If some form of search engine spiders your site, it could change your data unintentionally.
This happened in the past with Google's Desktop Search, which caused people to lose data because they had implemented delete operations as GETs.
Here is an important reason that GETs should be idempotent and not be used for updating state on the server: Cross-Site Request Forgery (CSRF) attacks. From the book Professional ASP.NET MVC 3:
Idempotent GETs
Big word, for sure, but it's a simple concept. If an operation is idempotent, it can be executed multiple times without changing the result. In general, a good rule of thumb is that you can prevent a whole class of CSRF attacks by only changing things in your DB or on your site by using POST. This means Registration, Logout, Login, and so forth. At the very least, this limits the confused deputy attacks somewhat.
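To see why GET makes CSRF easy, consider this sketch (Express; routes and helper functions are hypothetical). A malicious page only needs an image tag to fire the request, and the browser attaches the victim's session cookie automatically:

```typescript
import express from "express";

const app = express();

// Vulnerable: state change on a GET. Any page the victim visits can trigger it with
//   <img src="https://api.example.com/deleteAppointment?id=5">
app.get("/deleteAppointment", (req, res) => {
  // deleteAppointment(String(req.query.id)); // hypothetical helper
  res.send("deleted");
});

// Safer: require POST and verify an anti-forgery token; a browser won't send a
// cross-site POST with a valid token just by rendering an <img> tag.
app.post("/appointments/:id/delete", (req, res) => {
  // verifyCsrfToken(req); deleteAppointment(req.params.id); // hypothetical helpers
  res.status(204).end();
});
```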
One more problem: if the GET method is used, the data is sent in the URL itself. In a web server's logs, this data gets saved along with the request path. If someone has access to those log files, your data (user IDs, passwords, keywords, tokens, etc.) is revealed. This is dangerous and has to be taken care of.
In a server's log file, headers and body are not logged, but the request path is. So with the POST method, where data is sent in the body rather than the request path, your data remains safe.
I think reading this resource could help you distinguish between a message API and a resource API: http://www.servicedesignpatterns.com/WebServiceAPIStyles
