Detecting missing responses to long running HTTP (SOAP) requests - http

I need a way to detect a missing response to a long running HTTP POST request. This problem arises when the network infrastructure (firewalls, proxies, unplugged cables, etc.) drops the response packets. The server may detect this failure, but the client cannot send additional bytes after the POST to probe the state of the TCP connection. The failure may be limited to a single TCP connection. For example I may be able to subsequently open a new TCP connection to the server.
I'm looking for a solution that still uses HTTP POST and does not change the duration of the server side processing.
Some solutions that I can think of are:
Provide a side channel interface to retrieve request & response history. If the history lists the response as having been send (presumably resulting in a TCP error) but I have not yet received it within a reasonable time I can generate a local error.
Use an X header to request that the server deliver "spurious" 100 Continue provisional responses on a regular interval. If I fail to see an expected 100 Continue or a non-provisional response I can generate a local error.
Is there a state of the art solution for this problem?

It sounds to me like you are using Soap for something that would be much better done using a stateful connection, or a server side push technology.

Related

Does HTTP response "200 OK" give a guarantee that the document has been received by the machine who generated the HTTP request?

I have two machines, A and B.
A sends an HTTP request to B and asks for some document.
B responds back and sends the requested document and gives a 200 OK message, but machine A is complaining that the document is not received because of a network failure.
Does HTTP code 200 also work as acknowledgment that the document is received?
Does the HTTP 200 code also work as an acknowledgment that document has been received?
No. Not at all.
It is not even a guarantee that the document was completely transmitted.
The response code is in the first line of the response stream. The server could fail, or be disconnected from the client anywhere between sending the first line and the last byte of the response. The server may not even know this has happened.
In fact, there is no way that the server can know if the client received a complete (or partial) HTTP response. There is no provision for an acknowledgment in the HTTP protocol.
Now you could implement an application protocol over the top of HTTP in which the client is required to send a second HTTP request to the server to say "yes, I got the document". But this would involve some "application logic" implemented in the user's browser; e.g. in Javascript.
Absolutely not.
HTTP 200 is generated by the server, and only means that it understood the request and thinks it is able to fulfill it (e.g. the file is actually there).
All sorts of errors may occur during the transmission of the full response document (network connection breaking, packet loss, etc) which will not show up in the HTTP response, but need to be detected separately.
A pretty good guide to the HTTP protocol is found here: http://blog.catchpoint.com/2010/09/17/anatomyhttp/
You should make a distinction between the HTTP protocol and the underlying stream transport protocol, which should be reliable for HTTP purposes. The stream transport protocol will ACKnowledge all data transmission, including the response, so that both ends of exchange will affirm that the data is transmitted correctly. If the transport stream fails, then you will get a 'network failure' or similar error. When this happens, the HTTP protocol cannot continue; the data is no longer reliable or even complete.
What a 200 OK message means, at the HTTP level, is that the server has the document you're after and is about to transmit it to you. Normally you will get a content-length header as well, so you will be able to ascertain if/when the body is complete as an additional check on top of the stream protocol. From the HTTP protocol perspective, a response receives no acknowledgement, so once a response has been sent there is no verification.
However, as the stream transport is reliable, the act of sending the response will either be successful or result in an error. This does verify whether the document has been received by the network target (as noted by TripeHound, in the case of non-direct connection, e.g. a proxy, this is not a guarantee of delivery to the final target).
It's very simple to see that the 200 OK response code can't be a guarantee of anything about the response document. It's sent before the document is transmitted, so only a violation of causality could allow it to be dependent on successful reception of the document. It only serves as an indicator that the request was received properly and the server believes that it's able to fulfill the request. If the request requires extra processing (e.g. running a script), rather than just returning a static document, the response code should generally be sent after this has been completed, so it's normally an indicator that this was successful (but there are situations where this is not feasible, such as requests with persistent connections and push notifications -- the script could fail later).
On a more general level, it's never possible to provide an absolute guarantee that all messages have been received in any protocol, due to the Two Generals Problem. No acknowledgement system can get around this, because at some point there has to be a last acknowledgement; there's no way to know if this is received successfully, because that would require another acknowledgement, contradicting the premise that it was the last one.
HTTP is designed with an awareness of the possibility of various sorts of "middleboxes" - proxies operating with or without the knowledge of the client.
If there is a proxy involved, then even knowing that the server had transmitted all the data and recieved an normal close connection would not tell you anything about whether the document has been received by the machine who generated the HTTP request.
A sends a request to B. There may be all kinds of obstacles in the way that prevent the request from reaching B. In the case of https, the request may be reaching B but be rejected and it counts as if it hadn't reached B. In all these cases, B will not send any status at all.
Once the request reaches B, and there are no bugs crashing B, and no hardware failure etc. B will examine the request and determine what to do and what status to report. If A requested a file that is there and A is allowed access, B will start sending a "status 200" together with the file data.
Again all kinds of things can go wrong. A may receive nothing, or the "status 200" with no data or incomplete data etc. (By "receive" I mean that data arrives on the Ethernet cable, or through WiFi).
Usually the user of A will use some library that handles the ugly bits. With some decent library, the user can expect that they either get some error, or a status complete with the corresponding data. If a status 200 arrives at A with only half the data, the user will (depending on the design of the library) receive an error, not a status, and definitely not a status 200.
Or you may have a library that reports the status 200 and tells you "here's the first 2,000 bytes", "here's the next 2,000 bytes" and so on, and at some point when things go wrong, you might be told "sorry, there was an error, the data is incomplete".
But in general, the case that the user gets a status 200, and no data, will not happen.

Can HTTP request fail half way?

I am talking about only one case here.
client sent a request to server -> server received it and returned a response -> unfortunately the response dropped.
I have only one question about this.
Is this case even possible? If it's possible then what should the response code be, or will client simply see it as read timeout?
As I want to sync status between client/server and want 100% accuracy no matter how poor the network is, the answer to this question can greatly affect the client's 'retry on failure' strategy.
Any comment is appreciated.
Yes, the situation you have described is possible and occurs regularly. It is called "packet loss". Since the packet is lost, the response never reaches the client, so no response code could possibly be received. Web browsers will display this as "Error connecting to server" or similar.
HTTP requests and responses are generally carried inside TCP packets. If a TCP packet carrying the HTTP response does not arrive in the expected time window, the request is retransmitted. The request will only be retransmitted a certain number of times before a timeout error will occur and the connection is considered broken or dead. (The number of attempts before TCP timeout can be configured on both the client and server sides.)
Is this case even possible?
Yes. It's easy to see why if you picture a physical cable between the client and the server. If I send a request down the cable to the server, and then, before the server has a chance to respond, unplug the cable, the server will receive the request, but the client will never "hear" the response.
If it's possible then what should the response code be, or will client simply see it as read timeout?
It will be a timeout. If we go back to our physical cable example, the client is sitting waiting for a response that will never come. Hopefully, it will eventually give up.
It depends on exactly what tool or library you're using how this is wrapped up, however - it might give you a specific error code for "timeout" or "network error"; it might wrap it up as some internal 5xx status code; it might raise an exception inside your code; etc.

Is an HTTP request 'atomic'

I understand an HTTP request will result in a response with a code and optional body.
If we call the originator of the request the 'client' and the recipient of the request the 'server'.
Then the sequence is
Client sends request
Server receives request
Server sends response
Client receive response
Is it possible for the Server to complete step 3 but step 4 does not happen (due to dropped connection, application error etc).
In other words: is it possible for the Server to 'believe' the client should have received the response, but the client for some reason has not?
Network is inherently unreliable. You can only know for sure a message arrived if the other party has acknowledged it, but you never know it did not.
Worse, with HTTP, the only acknowledge for the request is the answer and there is no acknowledge for the answer. That means:
The client knows the server has processed the request if it got the response. If it does not, it does not know whether the request was processed.
The server never knows whether the client got the answer.
The TCP stack does normally acknowledge the answer when closing the socket, but that information is not propagated to the application layer and it would not be useful there, because the stack can acknowledge receipt and then the application might not process the message anyway because it crashes (or power failed or something) and from perspective of the application it does not matter whether the reason was in the TCP stack or above it—either way the message was not processed.
The easiest way to handle this is to use idempotent operations. If the server gets the same request again, it has no side-effects and the response is the same. That way the client, if it times out waiting for the response, simply sends the request again and it will eventually (unless the connection was torn out never to be fixed again) get a response and the request will be completed.
If all else fails, you need to record the executed requests and eliminate the duplicates in the server. Because no network protocol can do that for you. It can eliminate many (as TCP does), but not all.
There is a specific section on that point on the HTTP RFC7230 6.6 Teardown (bold added):
(...)
If a server performs an immediate close of a TCP connection, there is
a significant risk that the client will not be able to read the last
HTTP response.
(...)
To avoid the TCP reset problem, servers typically close a connection
in stages. First, the server performs a half-close by closing only
the write side of the read/write connection. The server then
continues to read from the connection until it receives a
corresponding close by the client, or until the server is reasonably
certain that its own TCP stack has received the client's
acknowledgement of the packet(s) containing the server's last
response. Finally, the server fully closes the connection.
So yes, this response sent step is a quite complex stuff.
Check for example the Lingering close section on this Apache 2.4 document, or the complex FIN_WAIT/FIN_WAIT2 pages for Apache 2.0.
So, a good HTTP server should maintain the socket long enough to be reasonably certain that it's OK on the client side. But if you really need to acknowledge something in a web application, you should use a callback (image callback, ajax callback) asserting the response was fully loaded in the client browser (so another HTTP request). That means it's not atomic as you said, or at least not transactional like you could expect from a relational database. You need to add another request from the client, that maybe you'll never get (because the server had crash before receiving the acknowledgement), etc.

How does a http client associate an http response with a request (with Netty) or in general?

Is a http end point suppose to respond to requests from a particular client in order that they are received?
What about if it doesn't make sense to in the case of requests handled by cluster behind a proxy or in requests handled with NIO where one request is finished faster than the other?
Is there a standard way of associating a unique id with each http request to associate with the response? How is this handled in clients like http componenets httpclient or curl?
The question comes down to the following case:
Suppose, I am downloading a file from a server and the request is not finished. Is a client capable of completing other requests on the same keep-alive connection?
Whenever a TCP connection is opened, the connection is recognized by the source and destination ports and IP addresses. So if I connect to www.google.com on destination port 80 (default for HTTP), I need a free source port which the OS will generate.
The reply of the web server is then sent to the source port (and IP). This is also how NAT works, remembering which source port belongs to which internal IP address (and vice versa for incoming connections).
As for your edit: no, a single http connection can execute one command (GET/POST/etc) at the same time. If you send another command while you are retreiving data from a previously issued command, the results may vary per client and server implementation. I guess that Apache, for example, will transmit the result of the second request after the data of the first request is sent.
I won't re-write CodeCaster's answer because it is very well worded.
In response to your edit - no. It is not. A single persistent HTTP connection can only be used for one request at once, or it would get very confusing. Because HTTP does not define any form of request/response tracking mechanism, it simply would not be possible.
It should be noted that there are other protocols which use a similar message format (conforming to RFC822), which do allow for this (using mechanisms such as SIP's cSeq header), and it would be possible to implement this in a custom HTTP app, but HTTP does not define any standard mechanism for doing this, and therefore nothing can be done that could be assumed to work everywhere. It would also present a problem with the response for the second message - do you wait for the first response to finish before sending the second response, or try and pause the first response while you send the second response? How will you communicate this in a way that guarantees messages won't become corrupted?
Note also that SIP (usually) operates over UDP, which does not guarantee packet ordering, making the cSeq system more of a necessity.
If you want to send a request to a server while another transaction is still in progress, you will need to create a new connection to the server, and hence a new TCP stream.
Facebook did some research into this while they were building their CDN, and they concluded that you can efficiently have 2 or 3 open HTTP streams at any one time, but any more than that reduces overall transfer time because of the extra packet overhead cost. I would link to the blog entry if I could find the link...

Mapping HTTP requests to HTTP responses

If I make multiple HTTP Get Requests to the same server and get HTTP 200 OK responses to each one how do I tell which request maps to which response using Wireshark?
Currently it looks like an http request is made, and the next HTTP 200 OK response is quickly received so everything is in a the proper sequence. I have seen things to the contrary however. For example using the Google Maps API v2 I've made several requests for location information and then the information is received in an arbitrary order (closely resembling the order in which I requested it, but not necessarily perfect.)
So my intuition is I cannot assume that my responses will be received in a specific order, even though they may be in order most of the time. So I'm wondering how I can determine this order from the response.
Update: Clarification as to what I need. I just need to know that the server has received the request. It seems like I need to do this by looking at sequence numbers and perhaps even ACKS. The reasoning behind this approach is I'm basically observing a web app and checking it is sending the information and the information is being received.
Update: This has nothing to do with wireshark specifically. I believe it is confusing people so I removing it from the title. It has to do with the HTTP protocol on top of the TCP/IP protocol and how we map responses to requests.
Thanks.
After you have stopped capturing packets follow this steps:
position the cursor on a GET request
Open the Analyze menu
click "Follow TCP Stream"
You get a new window with requests and responses in sequence.
While I was googling for a complete different question, I saw this one and I think I can provide a more complete answer :
HTTP dictates that responses must arrive in the order they were requested, Therefore, if you are looking at a single TCP connection at a given time you should be seeing :
Request ; Response ; Request ; Response ...
Also in HTTP/1.1, there is support for "Pipeline" where the client doesn't have to wait for responses to arrive in order to issue the next request. What could be observed in such cases is :
Request ; Response ; Request ; Request ; Response ; Response ; Request ; Response
In the HTTP response itself, there is no reference to the specific request that triggered it.
Filipo's suggestion is classic when debugging / observing a single TCP connection, but, when observing multiple TCP connections, you can't click the follow TCP Stream because you'd have to do it for each connection.
If you have many TCP connections, and many requests/responses you will have to look at TCP Source port in the request packet, and the TCP dest port in the response packet to know which response is related to each tcp connection, and then apply the HTTP request/response order rules.
Also, Wireshark CAN decompress the response body, and it will do it automatically if all the response body has arrived, but it will do so NOT in the Follow TCP Stream.
I always use Wireshark to debug HTTP.
Seems like this ability is not provided by the HTTP protocol at the application layer so I must go down to the transportation layer to determine this. In my case the TCP/IP layer using sequence numbers.
HTTP only presumes a reliable
transport; any protocol that provides
such guarantees can be used; the
mapping of the HTTP/1.1 request and
response structures onto the
transport data units of the protocol
in question is outside the scope of
this specification.
Read more:
http://www.faqs.org/rfcs/rfc2616.html#ixzz0e20kxKcz
Don't use Wireshark to debug HTTP, use an HTTP debugger such as Fiddler2

Resources