Partial reading of HTTP requests - http

Suppose I have a server (REST) application which does not need to read incoming HTTP requests in full. Clients may send HTTP requests of any size, but I need only the first X kilobytes.
I would like to read only X kilobytes and immediately close the connection. Does it make sense? Is it legal in terms of HTTP? What are the alternatives?

I would like to read only X Kilobytes and immediately close the connection. Does it make sense?
Not for a REST-ful application.
Is it legal in terms of HTTP?
Yes, technically. But in the HTTP protocol a server response of some kind is always expected for a complete transaction, so closing early will be experienced by the client as a premature ending of the connection, i.e. an incomplete or aborted transaction.
What are alternatives?
What are you trying to accomplish?
If you just want to read the first X bytes of whatever is sent by any client who connects and then not bother to reply at all, then the HTTP protocol is not for you, never mind REST.
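If the goal is only to cap how much of the request is consumed, a rough sketch along these lines reads at most a fixed number of bytes and still sends a response before closing. The function name `read_prefix_and_reply`, the limit and the reply text are assumptions made for illustration, and the code treats the connection as a raw byte stream rather than parsing the HTTP headers.

```python
import socket


def read_prefix_and_reply(conn: socket.socket, limit: int) -> bytes:
    """Read at most `limit` bytes from `conn`, send a short reply, then close."""
    chunks = []
    remaining = limit
    while remaining > 0:
        data = conn.recv(min(4096, remaining))
        if not data:  # the peer stopped sending before the limit was reached
            break
        chunks.append(data)
        remaining -= len(data)
    prefix = b"".join(chunks)

    body = b"read %d bytes\n" % len(prefix)
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Connection: close\r\n"
        b"Content-Length: %d\r\n\r\n" % len(body) + body
    )
    conn.close()
    return prefix
```

Note that closing while the client is still sending can cause a TCP reset, in which case the client may never get to read this reply.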

Related

HTTP Request parsing

I would like to understand the usage of 'Transfer-encoding: Chunked' in case of HTTP requests.
Is it common for requests to be chunked?
My thinking is no since requests need to be completely read before processing, it does not make sense to be sending chunked requests.
It is not that common, but it can be very useful for large request bodies.
My thinking is no since requests need to be completely read before processing, it does not make sense to be sending chunked requests.
(1) No, they don't need to be read completely.
(2) ...and the main reason to compress it to save bytes on the wire anyway.
For an HTTP agent acting as a reverse proxy or a forward proxy, so taking a message from one side and sending it on the other side, using a chunked transmission means you can send the parts of the message you have without storing it locally. You avoid the 'buffering' problems, slowdown and storage.
You also have some optimizations based on each actor's preferred size of data blocks: one actor might like sending packets of 8000 bytes, because that is a good number for its own kernel settings (TCP windows, internal HTTP server buffer size, etc.), while another actor along the message path uses smaller chunks of 2048 bytes.
Finally, you do not need to compute the size of the message in advance; the message ends at the end-of-stream marker, that's all. This is also useful if you are sending something that is compressed on the fly, since you may not know the final size until everything is compressed.
Chunked transmission is used a lot. It is the default mode of most HTTP servers if you ask for HTTP/1.1 mode and not HTTP/1.0.
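To make the wire format concrete, here is a minimal sketch of chunked framing: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk marks the end of the stream. The helper names `encode_chunked` and `decode_chunked` are made up for this example, and the decoder ignores trailer fields.

```python
from typing import Iterable, Iterator


def encode_chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each non-empty part as one chunk, then emit the terminator."""
    for part in parts:
        if part:  # a zero-length chunk would end the body early
            yield b"%x\r\n" % len(part) + part + b"\r\n"
    yield b"0\r\n\r\n"  # last chunk followed by an empty trailer section


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a complete chunked message body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry chunk extensions after ';', ignored here.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


wire = b"".join(encode_chunked([b"hello ", b"world"]))
```

The sender never needs to know the total length: it just keeps emitting chunks as data becomes available.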

Is an HTTP request 'atomic'

I understand an HTTP request will result in a response with a code and optional body.
If we call the originator of the request the 'client' and the recipient of the request the 'server', then the sequence is:
1. Client sends request
2. Server receives request
3. Server sends response
4. Client receives response
Is it possible for the Server to complete step 3 but step 4 does not happen (due to a dropped connection, application error, etc.)?
In other words: is it possible for the Server to 'believe' the client should have received the response, but the client for some reason has not?
Network is inherently unreliable. You can only know for sure a message arrived if the other party has acknowledged it, but you never know it did not.
Worse, with HTTP, the only acknowledgement for the request is the answer, and there is no acknowledgement for the answer. That means:
The client knows the server has processed the request if it got the response. If it does not, it does not know whether the request was processed.
The server never knows whether the client got the answer.
The TCP stack does normally acknowledge the answer when closing the socket, but that information is not propagated to the application layer, and it would not be useful there anyway. The stack can acknowledge receipt and the application may still fail to process the message because it crashes (or the power fails, or something else goes wrong). From the application's perspective it does not matter whether the failure was in the TCP stack or above it: either way the message was not processed.
The easiest way to handle this is to use idempotent operations: if the server gets the same request again, it causes no additional side effects and the response is the same. That way the client, if it times out waiting for the response, simply sends the request again and it will eventually (unless the connection was torn out never to be fixed again) get a response and the request will be completed.
If all else fails, you need to record the executed requests and eliminate the duplicates in the server. Because no network protocol can do that for you. It can eliminate many (as TCP does), but not all.
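As a rough illustration of recording executed requests, the sketch below remembers the response for each request ID and replays it when the same ID arrives again. The class name `DeduplicatingHandler` and the idea that the client supplies a unique `request_id` are assumptions for this example, and a real server would also need persistence and expiry.

```python
from typing import Callable, Dict


class DeduplicatingHandler:
    """Run the wrapped handler at most once per request ID."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler
        self._completed: Dict[str, str] = {}  # request ID -> recorded response

    def handle(self, request_id: str, payload: str) -> str:
        if request_id in self._completed:
            # Duplicate (e.g. a client retry after a timeout): replay the
            # recorded response without repeating the side effects.
            return self._completed[request_id]
        response = self._handler(payload)
        self._completed[request_id] = response
        return response


calls = []


def charge(payload: str) -> str:
    calls.append(payload)  # stands in for a non-idempotent side effect
    return "charged " + payload


server = DeduplicatingHandler(charge)
first = server.handle("req-1", "10 EUR")
retry = server.handle("req-1", "10 EUR")  # retried after a lost response
```

With this in place the client can safely resend a request whose response never arrived.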
There is a specific section on that point in the HTTP specification, RFC 7230, section 6.6 (Tear-down):
(...)
If a server performs an immediate close of a TCP connection, there is
a significant risk that the client will not be able to read the last
HTTP response.
(...)
To avoid the TCP reset problem, servers typically close a connection
in stages. First, the server performs a half-close by closing only
the write side of the read/write connection. The server then
continues to read from the connection until it receives a
corresponding close by the client, or until the server is reasonably
certain that its own TCP stack has received the client's
acknowledgement of the packet(s) containing the server's last
response. Finally, the server fully closes the connection.
So yes, the "response sent" step is quite complex.
Check for example the Lingering close section on this Apache 2.4 document, or the complex FIN_WAIT/FIN_WAIT2 pages for Apache 2.0.
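The staged close described in the RFC excerpt above can be sketched with plain sockets as follows. This is only an illustration of the idea, not how Apache implements lingering close; the function name `send_and_linger` and the two-second timeout are assumptions.

```python
import socket


def send_and_linger(conn: socket.socket, response: bytes, timeout: float = 2.0) -> None:
    """Send the response, half-close, drain until the peer closes, then close."""
    conn.sendall(response)
    conn.shutdown(socket.SHUT_WR)  # half-close: we will not send anything more
    conn.settimeout(timeout)
    try:
        while conn.recv(4096):  # keep reading until the client closes its side
            pass
    except OSError:
        pass  # timed out or the connection failed; stop waiting
    finally:
        conn.close()
```

Draining the read side before the final close avoids closing a socket that still holds unread data, which is what can provoke the reset mentioned in the excerpt.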
So, a good HTTP server should maintain the socket long enough to be reasonably certain that everything is OK on the client side. But if you really need to acknowledge something in a web application, you should use a callback (image callback, ajax callback) asserting that the response was fully loaded in the client browser (so another HTTP request). That means it's not atomic, as you said, or at least not transactional in the way you could expect from a relational database. You need to add another request from the client, which you may never get (because the server has crashed before receiving the acknowledgement), etc.

How does a http client associate an http response with a request (with Netty) or in general?

Is an HTTP endpoint supposed to respond to requests from a particular client in the order that they are received?
What if that doesn't make sense, as with requests handled by a cluster behind a proxy, or requests handled with NIO where one request finishes faster than another?
Is there a standard way of associating a unique id with each HTTP request so that it can be matched with the response? How is this handled in clients like HttpComponents HttpClient or curl?
The question comes down to the following case:
Suppose, I am downloading a file from a server and the request is not finished. Is a client capable of completing other requests on the same keep-alive connection?
Whenever a TCP connection is opened, the connection is recognized by the source and destination ports and IP addresses. So if I connect to www.google.com on destination port 80 (default for HTTP), I need a free source port which the OS will generate.
The reply of the web server is then sent to the source port (and IP). This is also how NAT works, remembering which source port belongs to which internal IP address (and vice versa for incoming connections).
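A small local experiment can show the addresses and ports that identify a connection. This sketch uses a loopback listener instead of a real web server; the variable names are arbitrary.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
server_addr = listener.getsockname()

client = socket.create_connection(server_addr)
server_side, peer_seen_by_server = listener.accept()

client_local = client.getsockname()   # source IP and OS-chosen source port
client_remote = client.getpeername()  # destination IP and port

print("client local :", client_local)
print("client remote:", client_remote)
print("server sees  :", peer_seen_by_server)

for s in (client, server_side, listener):
    s.close()
```

The address the server sees as its peer is the client's local address, which is how replies find their way back to the right socket.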
As for your edit: no, a single HTTP connection can execute only one command (GET/POST/etc.) at a time. If you send another command while you are retrieving data from a previously issued command, the results may vary per client and server implementation. I guess that Apache, for example, will transmit the result of the second request after the data of the first request is sent.
I won't re-write CodeCaster's answer because it is very well worded.
In response to your edit - no. It is not. A single persistent HTTP connection can only be used for one request at once, or it would get very confusing. Because HTTP does not define any form of request/response tracking mechanism, it simply would not be possible.
It should be noted that there are other protocols which use a similar message format (conforming to RFC 822) and which do allow for this (using mechanisms such as SIP's CSeq header). It would be possible to implement this in a custom HTTP app, but HTTP does not define any standard mechanism for doing so, and therefore nothing can be done that could be assumed to work everywhere. It would also present a problem with the response for the second message: do you wait for the first response to finish before sending the second response, or try to pause the first response while you send the second? How will you communicate this in a way that guarantees messages won't become corrupted?
Note also that SIP (usually) operates over UDP, which does not guarantee packet ordering, making the CSeq system more of a necessity.
If you want to send a request to a server while another transaction is still in progress, you will need to create a new connection to the server, and hence a new TCP stream.
Facebook did some research into this while they were building their CDN, and they concluded that you can efficiently have 2 or 3 open HTTP streams at any one time, but any more than that increases overall transfer time because of the extra packet overhead cost. I would link to the blog entry if I could find the link...

Mapping HTTP requests to HTTP responses

If I make multiple HTTP Get Requests to the same server and get HTTP 200 OK responses to each one how do I tell which request maps to which response using Wireshark?
Currently it looks like an HTTP request is made, and the next HTTP 200 OK response is quickly received, so everything is in the proper sequence. I have seen things to the contrary, however. For example, using the Google Maps API v2 I've made several requests for location information and then the information is received in an arbitrary order (closely resembling the order in which I requested it, but not necessarily perfect).
So my intuition is I cannot assume that my responses will be received in a specific order, even though they may be in order most of the time. So I'm wondering how I can determine this order from the response.
Update: Clarification as to what I need. I just need to know that the server has received the request. It seems like I need to do this by looking at sequence numbers and perhaps even ACKS. The reasoning behind this approach is I'm basically observing a web app and checking it is sending the information and the information is being received.
Update: This has nothing to do with Wireshark specifically. I believe it is confusing people, so I am removing it from the title. It has to do with the HTTP protocol on top of the TCP/IP protocol and how we map responses to requests.
Thanks.
After you have stopped capturing packets, follow these steps:
1. Position the cursor on a GET request
2. Open the Analyze menu
3. Click "Follow TCP Stream"
You get a new window with requests and responses in sequence.
While I was googling for a completely different question, I saw this one and I think I can provide a more complete answer:
HTTP dictates that responses must arrive in the order they were requested. Therefore, if you are looking at a single TCP connection at a given time, you should be seeing:
Request ; Response ; Request ; Response ...
Also, HTTP/1.1 supports pipelining, where the client doesn't have to wait for a response to arrive before issuing the next request. What could be observed in such cases is:
Request ; Response ; Request ; Request ; Response ; Response ; Request ; Response
In the HTTP response itself, there is no reference to the specific request that triggered it.
Filipo's suggestion is classic when debugging / observing a single TCP connection, but, when observing multiple TCP connections, you can't click the follow TCP Stream because you'd have to do it for each connection.
If you have many TCP connections, and many requests/responses you will have to look at TCP Source port in the request packet, and the TCP dest port in the response packet to know which response is related to each tcp connection, and then apply the HTTP request/response order rules.
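The matching rule described above (group by TCP connection, then pair in order) can be sketched like this. The event format is invented for the example: each event is assumed to be already tagged with its connection as a (client IP, client port, server IP, server port) tuple, which is something you would have to derive from the capture yourself.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

Conn = Tuple[str, int, str, int]  # client IP, client port, server IP, server port
Event = Tuple[Conn, str, str]     # connection, "request" or "response", label


def pair_messages(events: Iterable[Event]) -> List[Tuple[str, str]]:
    """Pair each response with the oldest unanswered request on its connection."""
    pending: Dict[Conn, Deque[str]] = defaultdict(deque)
    pairs: List[Tuple[str, str]] = []
    for conn, kind, label in events:
        if kind == "request":
            pending[conn].append(label)
        elif pending[conn]:
            # Responses on one connection come back in request order.
            pairs.append((pending[conn].popleft(), label))
    return pairs


a = ("10.0.0.1", 50001, "10.0.0.9", 80)
b = ("10.0.0.1", 50002, "10.0.0.9", 80)
events = [
    (a, "request", "GET /1"),
    (b, "request", "GET /2"),
    (a, "request", "GET /3"),  # pipelined on connection a
    (b, "response", "200 for /2"),
    (a, "response", "200 for /1"),
    (a, "response", "200 for /3"),
]
pairs = pair_messages(events)
```

Responses may interleave freely across connections, but within each connection a first-in, first-out queue is enough to recover the mapping.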
Also, Wireshark CAN decompress the response body, and it will do so automatically once the whole response body has arrived, but NOT in the Follow TCP Stream view.
I always use Wireshark to debug HTTP.
It seems this ability is not provided by the HTTP protocol at the application layer, so I must go down to the transport layer to determine this; in my case, the TCP/IP layer, using sequence numbers.
HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used; the mapping of the HTTP/1.1 request and response structures onto the transport data units of the protocol in question is outside the scope of this specification.
Read more:
http://www.faqs.org/rfcs/rfc2616.html#ixzz0e20kxKcz
Don't use Wireshark to debug HTTP, use an HTTP debugger such as Fiddler2

Detecting missing responses to long running HTTP (SOAP) requests

I need a way to detect a missing response to a long running HTTP POST request. This problem arises when the network infrastructure (firewalls, proxies, unplugged cables, etc.) drops the response packets. The server may detect this failure, but the client cannot send additional bytes after the POST to probe the state of the TCP connection. The failure may be limited to a single TCP connection. For example I may be able to subsequently open a new TCP connection to the server.
I'm looking for a solution that still uses HTTP POST and does not change the duration of the server side processing.
Some solutions that I can think of are:
Provide a side channel interface to retrieve request & response history. If the history lists the response as having been sent (presumably resulting in a TCP error) but I have not received it within a reasonable time, I can generate a local error.
Use an X header to request that the server deliver "spurious" 100 Continue provisional responses on a regular interval. If I fail to see an expected 100 Continue or a non-provisional response I can generate a local error.
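The client-side bookkeeping for the second idea could look roughly like the sketch below. It only covers the timing logic: the class name `ResponseWatchdog`, the `grace` allowance and the way timestamps are passed in are all assumptions, and nothing here shows how a particular HTTP client library would surface provisional responses.

```python
class ResponseWatchdog:
    """Track provisional responses and report when one is overdue."""

    def __init__(self, interval: float, grace: float, started_at: float) -> None:
        self.max_gap = interval + grace  # longest silence tolerated, in seconds
        self.last_seen = started_at

    def saw_provisional(self, now: float) -> None:
        """Call whenever a 100 Continue (or any response data) arrives."""
        self.last_seen = now

    def is_stalled(self, now: float) -> bool:
        """True if nothing has been heard for longer than the allowed gap."""
        return now - self.last_seen > self.max_gap


watchdog = ResponseWatchdog(interval=10.0, grace=2.0, started_at=0.0)
ok_at_11 = watchdog.is_stalled(11.0)       # still within interval + grace
watchdog.saw_provisional(10.0)
ok_at_21 = watchdog.is_stalled(21.0)       # 11 s since the last heartbeat
stalled_at_23 = watchdog.is_stalled(23.0)  # 13 s of silence exceeds 12 s
```

When `is_stalled` turns true the client can raise a local error and, if the operation is safe to repeat, retry on a new connection.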
Is there a state of the art solution for this problem?
It sounds to me like you are using SOAP for something that would be much better done using a stateful connection or a server-side push technology.

Resources