Can an HTTP request fail halfway? - http

I am talking about only one case here.
client sent a request to server -> server received it and returned a response -> unfortunately the response dropped.
I have only one question about this.
Is this case even possible? If it's possible then what should the response code be, or will the client simply see it as a read timeout?
As I want to sync status between client/server and want 100% accuracy no matter how poor the network is, the answer to this question can greatly affect the client's 'retry on failure' strategy.
Any comment is appreciated.

Yes, the situation you have described is possible and occurs regularly. It is called "packet loss". Since the packet is lost, the response never reaches the client, so no response code could possibly be received. Web browsers will display this as "Error connecting to server" or similar.
HTTP requests and responses are generally carried over TCP. If a TCP segment carrying part of the HTTP response is not acknowledged within the expected time window, the sender's TCP stack retransmits that segment (TCP does not resend the HTTP request itself). A segment is only retransmitted a certain number of times before a timeout error occurs and the connection is considered broken or dead. (The number of attempts before a TCP timeout can usually be configured on both the client and server sides.)

Is this case even possible?
Yes. It's easy to see why if you picture a physical cable between the client and the server. If I send a request down the cable to the server, and then, before the server has a chance to respond, unplug the cable, the server will receive the request, but the client will never "hear" the response.
If it's possible then what should the response code be, or will the client simply see it as a read timeout?
It will be a timeout. If we go back to our physical cable example, the client is sitting waiting for a response that will never come. Hopefully, it will eventually give up.
It depends on exactly what tool or library you're using how this is wrapped up, however - it might give you a specific error code for "timeout" or "network error"; it might wrap it up as some internal 5xx status code; it might raise an exception inside your code; etc.
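As an illustration of how one library surfaces this, here is a minimal sketch using Python's standard `http.client`. The function name `classify_request` and the three outcome labels are made up for this example; other libraries wrap the same situations differently.

```python
import http.client
import socket


def classify_request(host: str, port: int, path: str = "/", timeout: float = 2.0) -> str:
    """Send one GET and report how it ended from the client's point of view."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # read the whole body, not only the status line
        return f"status:{resp.status}"
    except (socket.timeout, TimeoutError):
        # Nothing arrived in time: the request may or may not have been processed.
        return "timeout"
    except (http.client.HTTPException, OSError):
        # Connection refused, reset, closed early, truncated body, ...
        return "network-error"
    finally:
        conn.close()
```

Note that only the first branch carries an HTTP status code. In the other two the client has no status at all, so it cannot tell whether the server processed the request.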

Related

Determine when HTTP(S) POST have reached receiver without waiting for full response

I want to invoke an HTTP POST with a request body and wait until it has reached the receiver, but NOT wait for any full response if the receiving server is slow to send the response.
Is this possible at all to do reliably? It's been years since I studied the internals of TCP/IP so I don't really remember the entire state machine here.
I guess that if I simply impose a timeout of, say, 1 second and then close the socket, there's no guarantee that the request has reached the remote server. Is there any signalling at all happening when the receiving server has received the entire request, but before it starts sending its response?
In practical terms I want to call a webhook URL without having to wait for a potentially slow server implementation of that webhook. I want to make the webhook request "fire and forget" and simply ignore the responses (even if they are intermediate errors in gateways etc. and the request actually didn't reach its final destination), but I'm hesitant to simply set a low timeout (and if I did, how low would be "sufficient"?).

Does HTTP response "200 OK" give a guarantee that the document has been received by the machine who generated the HTTP request?

I have two machines, A and B.
A sends an HTTP request to B and asks for some document.
B responds back and sends the requested document and gives a 200 OK message, but machine A is complaining that the document is not received because of a network failure.
Does HTTP code 200 also work as acknowledgment that the document is received?
Does the HTTP 200 code also work as an acknowledgment that document has been received?
No. Not at all.
It is not even a guarantee that the document was completely transmitted.
The response code is in the first line of the response stream. The server could fail, or be disconnected from the client anywhere between sending the first line and the last byte of the response. The server may not even know this has happened.
In fact, there is no way that the server can know if the client received a complete (or partial) HTTP response. There is no provision for an acknowledgment in the HTTP protocol.
Now you could implement an application protocol over the top of HTTP in which the client is required to send a second HTTP request to the server to say "yes, I got the document". But this would involve some "application logic" implemented in the user's browser; e.g. in Javascript.
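A minimal in-memory sketch of such an application-level acknowledgement, in Python, might look like this. The class `DocumentServer`, its method names, and the token/digest scheme are all assumptions made for illustration; in a real system each method would sit behind its own HTTP endpoint.

```python
import hashlib
import secrets


class DocumentServer:
    """In-memory stand-in for a server that tracks delivery confirmations."""

    def __init__(self, documents: dict[str, bytes]) -> None:
        self._documents = documents
        self._pending: dict[str, str] = {}  # delivery token -> expected SHA-256
        self.confirmed: set[str] = set()    # tokens the client has acknowledged

    def get_document(self, name: str) -> tuple[str, bytes]:
        """First request: return the body plus a token identifying this delivery."""
        body = self._documents[name]
        token = secrets.token_hex(8)
        self._pending[token] = hashlib.sha256(body).hexdigest()
        return token, body

    def acknowledge(self, token: str, digest: str) -> bool:
        """Second request: the client reports a digest of what it actually received."""
        if self._pending.get(token) != digest:
            return False  # unknown token, or the client got a different/partial body
        del self._pending[token]
        self.confirmed.add(token)
        return True
```

The acknowledgement request can itself be lost, so a token missing from `confirmed` only means "not known to be received", never "known not to be received".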
Absolutely not.
HTTP 200 is generated by the server, and only means that it understood the request and thinks it is able to fulfill it (e.g. the file is actually there).
All sorts of errors may occur during the transmission of the full response document (network connection breaking, packet loss, etc) which will not show up in the HTTP response, but need to be detected separately.
A pretty good guide to the HTTP protocol is found here: http://blog.catchpoint.com/2010/09/17/anatomyhttp/
You should make a distinction between the HTTP protocol and the underlying stream transport protocol, which should be reliable for HTTP purposes. The stream transport protocol will ACKnowledge all data transmission, including the response, so that both ends of exchange will affirm that the data is transmitted correctly. If the transport stream fails, then you will get a 'network failure' or similar error. When this happens, the HTTP protocol cannot continue; the data is no longer reliable or even complete.
What a 200 OK message means, at the HTTP level, is that the server has the document you're after and is about to transmit it to you. Normally you will get a content-length header as well, so you will be able to ascertain if/when the body is complete as an additional check on top of the stream protocol. From the HTTP protocol perspective, a response receives no acknowledgement, so once a response has been sent there is no verification.
However, as the stream transport is reliable, the act of sending the response will either succeed or eventually result in an error. A successful send shows at most that the data was accepted for delivery to the directly connected network peer; it does not show that the receiving application processed it (and, as noted by TripeHound, in the case of a non-direct connection, e.g. a proxy, it is not a guarantee of delivery to the final target).
It's very simple to see that the 200 OK response code can't be a guarantee of anything about the response document. It's sent before the document is transmitted, so only a violation of causality could allow it to be dependent on successful reception of the document. It only serves as an indicator that the request was received properly and the server believes that it's able to fulfill the request. If the request requires extra processing (e.g. running a script), rather than just returning a static document, the response code should generally be sent after this has been completed, so it's normally an indicator that this was successful (but there are situations where this is not feasible, such as requests with persistent connections and push notifications -- the script could fail later).
On a more general level, it's never possible to provide an absolute guarantee that all messages have been received in any protocol, due to the Two Generals Problem. No acknowledgement system can get around this, because at some point there has to be a last acknowledgement; there's no way to know if this is received successfully, because that would require another acknowledgement, contradicting the premise that it was the last one.
HTTP is designed with an awareness of the possibility of various sorts of "middleboxes" - proxies operating with or without the knowledge of the client.
If there is a proxy involved, then even knowing that the server had transmitted all the data and received a normal connection close would not tell you anything about whether the document has been received by the machine that generated the HTTP request.
A sends a request to B. There may be all kinds of obstacles in the way that prevent the request from reaching B. In the case of https, the request may be reaching B but be rejected and it counts as if it hadn't reached B. In all these cases, B will not send any status at all.
Once the request reaches B, and there are no bugs crashing B, and no hardware failure etc. B will examine the request and determine what to do and what status to report. If A requested a file that is there and A is allowed access, B will start sending a "status 200" together with the file data.
Again all kinds of things can go wrong. A may receive nothing, or the "status 200" with no data or incomplete data etc. (By "receive" I mean that data arrives on the Ethernet cable, or through WiFi).
Usually the user of A will use some library that handles the ugly bits. With some decent library, the user can expect that they either get some error, or a status complete with the corresponding data. If a status 200 arrives at A with only half the data, the user will (depending on the design of the library) receive an error, not a status, and definitely not a status 200.
Or you may have a library that reports the status 200 and tells you "here's the first 2,000 bytes", "here's the next 2,000 bytes" and so on, and at some point when things go wrong, you might be told "sorry, there was an error, the data is incomplete".
But in general, the case that the user gets a status 200, and no data, will not happen.
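To make the "status 200 but incomplete data" case concrete, here is a small sketch using Python's standard `http.client`, which raises `IncompleteRead` when the connection closes before `Content-Length` bytes have arrived. The helper name `fetch_body` and its return shape are made up for this example.

```python
import http.client


def fetch_body(host: str, port: int, path: str = "/", timeout: float = 2.0):
    """Return (status, body, complete) for one GET request."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()  # status line and headers have arrived
        try:
            return resp.status, resp.read(), True
        except http.client.IncompleteRead as exc:
            # Fewer bytes than Content-Length promised before the connection closed.
            return resp.status, exc.partial, False
    finally:
        conn.close()
```

A caller that sees `complete` as `False` should treat the request as failed even though the status was 200.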

Is an HTTP request 'atomic'

I understand an HTTP request will result in a response with a code and optional body.
If we call the originator of the request the 'client' and the recipient of the request the 'server', then the sequence is:
1. Client sends request
2. Server receives request
3. Server sends response
4. Client receives response
Is it possible for the Server to complete step 3 but for step 4 not to happen (due to a dropped connection, application error, etc.)?
In other words: is it possible for the Server to 'believe' the client should have received the response, but the client for some reason has not?
The network is inherently unreliable. You can only know for sure that a message arrived if the other party has acknowledged it, but you never know for sure that it did not.
Worse, with HTTP, the only acknowledgement for the request is the answer, and there is no acknowledgement for the answer. That means:
The client knows the server has processed the request if it got the response. If it does not, it does not know whether the request was processed.
The server never knows whether the client got the answer.
The TCP stack does normally acknowledge the answer when closing the socket, but that information is not propagated to the application layer, and it would not be useful there anyway. The stack can acknowledge receipt and the application might still fail to process the message because it crashes (or the power fails, or something similar). From the perspective of the application it does not matter whether the reason was in the TCP stack or above it: either way the message was not processed.
The easiest way to handle this is to use idempotent operations. If the server gets the same request again, it has no side-effects and the response is the same. That way the client, if it times out waiting for the response, simply sends the request again and it will eventually (unless the connection was torn out never to be fixed again) get a response and the request will be completed.
If all else fails, you need to record the executed requests and eliminate the duplicates in the server, because no network protocol can do that for you. It can eliminate many (as TCP does), but not all.
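The two ideas above (retrying an idempotent request, and de-duplicating on the server) can be sketched together in Python. Everything here is a simulation under stated assumptions: `PaymentServer`, `deposit_with_retry`, the `drop_first_n` parameter and the `LostResponse` exception are hypothetical names, and the "network" is just a function call that sometimes discards the response.

```python
import uuid


class LostResponse(Exception):
    """Raised by the simulated network when a response is dropped."""


class PaymentServer:
    """Remembers each request ID so a repeated request is not executed twice."""

    def __init__(self) -> None:
        self.balance = 0
        self._results: dict[str, int] = {}  # request ID -> response already produced

    def deposit(self, request_id: str, amount: int) -> int:
        if request_id in self._results:   # duplicate: replay the stored response
            return self._results[request_id]
        self.balance += amount            # the side effect happens exactly once
        self._results[request_id] = self.balance
        return self.balance


def deposit_with_retry(server: PaymentServer, amount: int,
                       drop_first_n: int = 0, max_attempts: int = 5) -> int:
    """Retry with the same request ID until a response gets through."""
    request_id = str(uuid.uuid4())        # chosen once, reused on every retry
    for attempt in range(max_attempts):
        try:
            response = server.deposit(request_id, amount)  # server processes it
            if attempt < drop_first_n:
                raise LostResponse()      # simulate the response being dropped
            return response
        except LostResponse:
            continue                      # client cannot tell what happened: retry
    raise TimeoutError("no response after retries")
```

For example, `deposit_with_retry(PaymentServer(), 100, drop_first_n=2)` sends the request three times, but the server applies the deposit only once because all three carry the same request ID.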
There is a specific section on that point in the HTTP RFC 7230, section 6.6 Tear-down:
(...)
If a server performs an immediate close of a TCP connection, there is
a significant risk that the client will not be able to read the last
HTTP response.
(...)
To avoid the TCP reset problem, servers typically close a connection
in stages. First, the server performs a half-close by closing only
the write side of the read/write connection. The server then
continues to read from the connection until it receives a
corresponding close by the client, or until the server is reasonably
certain that its own TCP stack has received the client's
acknowledgement of the packet(s) containing the server's last
response. Finally, the server fully closes the connection.
So yes, the "response sent" step is quite complex.
Check for example the Lingering close section on this Apache 2.4 document, or the complex FIN_WAIT/FIN_WAIT2 pages for Apache 2.0.
So, a good HTTP server should keep the socket open long enough to be reasonably certain that everything is OK on the client side. But if you really need to acknowledge something in a web application, you should use a callback (image callback, ajax callback) asserting that the response was fully loaded in the client browser (so another HTTP request). That means it's not atomic, as you said, or at least not transactional in the way you could expect from a relational database. You need to add another request from the client, which you may never get (because the server crashed before receiving the acknowledgement, etc.).
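The staged close described in the RFC excerpt can be sketched with plain sockets in Python. This is only an illustration of the half-close idea, not how Apache or any particular server implements it; the function name `send_and_linger` and the `linger_timeout` parameter are assumptions.

```python
import socket


def send_and_linger(conn: socket.socket, response: bytes,
                    linger_timeout: float = 2.0) -> bool:
    """Send the response, half-close, then wait for the peer to close.

    Returns True if the peer closed its side within the timeout.
    """
    conn.sendall(response)
    conn.shutdown(socket.SHUT_WR)  # half-close: we are done writing
    conn.settimeout(linger_timeout)
    try:
        while conn.recv(4096):
            pass                   # discard anything the client still sends
        return True                # recv returned b"": the peer closed its side
    except OSError:                # includes timeouts and connection resets
        return False
    finally:
        conn.close()               # only now fully close the connection
```

Even a `True` result only says the client's TCP stack saw the end of the stream; it says nothing about whether the client application processed the response.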

Detecting HTTP close using inet

In my mochiweb application, I am using a long held HTTP request. I wanted to detect when the connection with the user died, and I figured out how to do that by doing:
Socket = Req:get(socket),
inet:setopts(Socket, [{active, once}]),
receive
    {tcp_closed, Socket} ->
        % peer closed the connection: handle clean up
        ok;
    Data ->
        % any other message, e.g. {tcp, Socket, Bin}: do something
        Data
end.
This works when: user closes his tab/browser or refreshes the page. However, when the internet connection dies suddenly (say wifi signal lost all of a sudden), or when the browser crashes abnormally, I am not able to detect a tcp close.
Am I missing something, or is there any other way to achieve this?
There is a TCP keepalive protocol and it can be enabled with inet:setopts/2 under the option {keepalive, Boolean}.
I would suggest that you don't use it. The keep-alive timeout and max-retries settings tend to be system-wide, and keepalive is optional after all. Using timeouts on the protocol level is better.
The HTTP protocol has the status code 408 Request Timeout, which you can send to the client if it seems dead.
Check out the after clause in receive blocks that you can use to timeout waiting for data, or use the timer module, or use erlang:start_timer/3. They all have different performance characteristics and resource costs.
TCP has no "keep alive" enabled by default (it can be enabled where supported): if a connection fault occurs while no data is being exchanged, this translates to a "silent failure". You would need to account for this type of failure yourself, e.g. by implementing some form of connection probing.
How does this affect HTTP? HTTP is a stateless protocol - this means that every request is independent of every other. The "keep alive" functionality of HTTP doesn’t change that i.e. "silent failure" can still occur.
Only when data is exchanged can this condition be detected (or when TCP Keep Alive is enabled).
I would suggest sending application-level keep-alive messages over HTTP chunked encoding. Make your client and server smart enough to recognise the keep-alive messages: ignore them if they arrive on time, and otherwise close and re-establish the connection.
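The bookkeeping behind such application-level keep-alives is language independent; a minimal sketch in Python follows. The class name `HeartbeatMonitor`, the `b"PING"` marker and the injectable `clock` are assumptions for illustration, and the transport (for example reading chunks from a long-held response) is left out.

```python
import time


class HeartbeatMonitor:
    """Declares a peer dead if no message (data or keep-alive) arrives in time."""

    def __init__(self, timeout: float, clock=time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._last_seen = clock()

    def on_message(self, message: bytes) -> bool:
        """Record activity; return True if the message is real data to process."""
        self._last_seen = self._clock()
        return message != b"PING"  # keep-alive messages are noted, then ignored

    def is_alive(self) -> bool:
        """False once the peer has been silent for longer than the timeout."""
        return self._clock() - self._last_seen <= self._timeout
```

The side that reads the stream calls `on_message` for every chunk and checks `is_alive()` periodically; when it returns `False`, the connection is closed and re-established.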

Detecting missing responses to long running HTTP (SOAP) requests

I need a way to detect a missing response to a long running HTTP POST request. This problem arises when the network infrastructure (firewalls, proxies, unplugged cables, etc.) drops the response packets. The server may detect this failure, but the client cannot send additional bytes after the POST to probe the state of the TCP connection. The failure may be limited to a single TCP connection. For example I may be able to subsequently open a new TCP connection to the server.
I'm looking for a solution that still uses HTTP POST and does not change the duration of the server side processing.
Some solutions that I can think of are:
Provide a side channel interface to retrieve request & response history. If the history lists the response as having been sent (presumably resulting in a TCP error) but I have not yet received it within a reasonable time I can generate a local error.
Use an X header to request that the server deliver "spurious" 100 Continue provisional responses on a regular interval. If I fail to see an expected 100 Continue or a non-provisional response I can generate a local error.
Is there a state of the art solution for this problem?
It sounds to me like you are using SOAP for something that would be much better done using a stateful connection, or a server-side push technology.

Resources