Why can HTTP handle only one pending request per socket?

Being curious, I wonder why HTTP, by design, can only handle one pending request per socket.
I understand that this limitation exists because there is no 'Id' to associate a request with its response, so the only way to match a response with its request is to send the response on the same socket that sent the request. If there were more than one pending request on the socket, there would be no way to match a response to its request, because the responses might not arrive in the same order the requests were sent.
If the protocol had been designed with a matching 'Id' for requests and responses, there could be multiple pending requests on a single socket. This could greatly reduce the number of sockets used by web browsers and by applications that use web services.
Was HTTP designed like this for simplicity even if it's less efficient or am I missing something and this is the best approach?
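The ID-matching scheme the question proposes can be sketched in a few lines of Python. This is a simulation, not HTTP/1.1, which has no such field: the message dicts, the "id" key, and the MultiplexedClient and serve names are all hypothetical. The responses are deliberately handed back in reverse order, and each one is still matched to the right request by its ID.

```python
import itertools


class MultiplexedClient:
    """Hypothetical client that tags each request with an ID so that
    responses may come back in any order on a single connection."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # request id -> requested path

    def send(self, path):
        req_id = next(self._ids)
        self._pending[req_id] = path
        return {"id": req_id, "path": path}

    def receive(self, response):
        # Match the response to its request by ID, not by arrival order.
        path = self._pending.pop(response["id"])
        return path, response["body"]


def serve(request):
    # Toy server: the response echoes the request's ID and path.
    return {"id": request["id"], "body": f"contents of {request['path']}"}


client = MultiplexedClient()
requests = [client.send(p) for p in ["/a.css", "/b.js", "/c.png"]]
responses = [serve(r) for r in requests]
responses.reverse()  # simulate the server answering in a different order
matched = dict(client.receive(r) for r in responses)
print(matched["/b.js"])  # contents of /b.js
```

The cost of this design is that every message must carry an ID and the client must keep a table of pending requests; later multiplexed protocols such as SPDY carry a comparable stream identifier on each frame.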

Not true. Read about HTTP/1.1 pipelining. Apache implements it, and so does Firefox, although Firefox disables it by default.
To turn it on in Firefox, open about:config and type 'pipelining' in the filter.
It's basically for simplicity; various proposals have been made over the years that multiplex on the same connection (e.g. SPDY) but none have taken off yet.

One problem with sending multiple requests on a single socket is that it would cause inefficient queuing.
For instance, let's say you are in a store with 2 cashiers and 10 people waiting to check out. The ideal way to form the line is a single queue of 10 people, where the next person in line goes to whichever cashier becomes available. If you sent all the requests at once, however, you would probably send 5 people to cashier A and 5 to cashier B. And what if the 5 people with the largest shopping carts all went to the same cashier? That's bad queuing, and it's what could happen if you queued a bunch of requests on a single socket.
NOTE: I'm not saying that you couldn't queue well, but with no queuing on a single socket it is simple to get it right.
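The cashier analogy can be checked with a tiny Python simulation. The checkout times below are made up, and pre_split_makespan models committing customers to a cashier in fixed blocks up front, the way a batch of requests would be committed to one socket. The single shared line finishes at 20 time units; the fixed split finishes at 35 because all the big carts landed on cashier A.

```python
import heapq


def shared_queue_makespan(durations, cashiers):
    # One line: each customer goes to whichever cashier frees up first.
    free_at = [0] * cashiers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


def pre_split_makespan(durations, cashiers):
    # Customers are committed to a cashier up front, in contiguous blocks,
    # like queuing a batch of requests on each socket.
    size = -(-len(durations) // cashiers)  # ceiling division
    blocks = [durations[i:i + size] for i in range(0, len(durations), size)]
    return max(sum(b) for b in blocks)


carts = [9, 8, 7, 6, 5, 1, 1, 1, 1, 1]  # checkout times, largest carts first
print(shared_queue_makespan(carts, 2))  # 20
print(pre_split_makespan(carts, 2))     # 35
```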

There are a few considerations I would review.
The first is related to the nature of TCP itself. TCP suffers from a 'head-of-line' blocking issue: a connection delivers data strictly in order, so whatever is at the front (a lost segment or, with plain HTTP, the single outstanding request) holds up everything behind it. Given typical latencies, this can hurt page load times compared with the parallel-connection scheme browsers employ today. The higher the latency of the link, the larger the impact of this fundamental limitation.
There is also a concurrency issue, in that sometimes you really want to load multiple resources incrementally and in parallel. Back in the day, one of the greatest features Mozilla had over Mosaic was that it loaded images and objects incrementally, so you could begin to see what was going on and use a resource without waiting for it to finish loading. With fewer connections there is a risk: for example, loading a large image on the page before a style sheet can be catastrophic from a user-experience point of view. Expecting some kind of mitigating intelligence or explicit configuration to order requests optimally may not be a realistic or ideal solution.
There are proposals such as HTTP over SCTP that will more or less totally correct the issue you raise at the transport level.

Also realize that HTTP doesn't necessarily mandate a Content-Length header to serve data. Even if each HTTP response had an ID, how would you manage streaming binary content with no content length (HTTP/1.0 style), where the end of the body is signaled by closing the connection? Or a response sent with the Connection: close header because its length isn't known in advance?
To manage this, you would have to multiplex HTTP chunked transfer encoding (which already exists) across requests (I don't think anyone implements this), and add some non-trivial work to many programs.
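For reference, here is a minimal Python sketch of HTTP/1.1 chunked transfer encoding, which is how a body of unknown length is framed on a single connection today. Each chunk is its size in hex, CRLF, the data, and CRLF; a zero-size chunk ends the body. Trailers and strict error handling are omitted. A multiplexed variant would additionally need a stream ID on every chunk, which this sketch does not attempt.

```python
def encode_chunked(parts):
    out = b""
    for p in parts:
        if p:  # a zero-length chunk would prematurely signal the end
            out += f"{len(p):X}\r\n".encode() + p + b"\r\n"
    return out + b"0\r\n\r\n"


def decode_chunked(data):
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry ";name=value" extensions; ignore them.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


wire = encode_chunked([b"hello ", b"world"])
print(wire)                  # b'6\r\nhello \r\n5\r\nworld\r\n0\r\n\r\n'
print(decode_chunked(wire))  # b'hello world'
```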


Load balancing TCP traffic using Apache Camel with Netty leads to transaction failures

I am new to Apache Camel and Netty, and this is my first project. I am trying to use Camel with the Netty component to load balance heavy traffic in a back-end load test scenario. This is the setup I have right now:
The issue is unexpected buffer sizes in the responses received by the client system that sends TCP traffic to Camel. When I send multiple requests one after the other, I see no issues and the buffer size is as expected. But when I run multiple users sending similar requests to Camel on the same port, I intermittently see unexpected buffer sizes, ranging from 0 bytes to more than the expected number of bytes. I tried playing around with multiple options mentioned on the Camel-Netty page, like:
Increasing backlog
stream caching (did not work)
disabled useOriginalMessage for performance
System-level TCP parameters, among others.
I have yet to resolve the issue, and I am not sure whether I'm fundamentally missing something. I did look at the encoders/decoders and wondered whether they could be the issue. But I don't understand why a load balancer needs to encode/decode messages. Other load balancers I have worked with just require endpoint configuration, so I am assuming that Camel does not require this. Am I right? Note that the issue is not with my client or backend: I ran a 2000-user load test from my client directly to the backend with less than 1% failures, but I see a large number of failures with Camel (though there are also successes). I have the following questions:
1. Is this a valid use case for Apache Camel with Netty? Should I be looking at Mina or others?
2. Can I route the TCP traffic to JMS or other components first, and then finally to the TCP endpoint?
3. Do I need encoders/decoders, or should this configuration work?
4. Should I continue with this approach or try some other load balancer?
Please let me know if you have any other suggestions. TIA.
I also tried the same approach with the netty4 and mina components. The routes look similar to the netty one. The route with netty4 is as follows:
I read a few posts which had the same issue but did not find any solution relevant to my issue.
I increased the receive timeout on my client and immediately saw the buffer-length mismatches fall to less than 1%. However, the response time for each transaction is much higher with Camel than without it: almost 10 times higher. Can you help me reduce the response time for each transaction? The message received back at my client varies from 5000 to 20000 bytes. Here is my latest route:
I also used certain performance enhancements like:
Can you point me in the right direction about how I can reduce the individual transaction times?
For the netty4 component there is no parameter called defaultCodec; it is called allowDefaultCodec. http://camel.apache.org/netty4.html
Also, try something like this first.
The above assumes the data being sent is plain text. If you are sending bytes or something else, you will need to provide a decoder/encoder so that Netty can handle the data.
As a side note: before running the Camel route, test manually by sending test messages with a standard TCP tool like SocketTest to verify that everything works. Then implement the same thing via Camel. You can find SocketTest here: http://sockettest.sourceforge.net/
I finally solved the issue with the same route settings as above. The request and response delimiters were not configured properly, so the route was either closing the connection too early, leading to unexpected buffer sizes, or waiting too long even after the entire buffer had been received, leading to high response times.
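Why the delimiter matters: TCP is a byte stream, so a single read can contain part of a message or several messages at once. The receiver must reassemble messages using an agreed boundary, which is the job Netty's frame decoders (for example DelimiterBasedFrameDecoder) do inside Camel. The Python class below is an illustrative sketch of delimiter-based framing, not Camel or Netty code, and the \x03 end-of-message byte is a hypothetical choice.

```python
class DelimiterFramer:
    """Reassemble delimiter-terminated messages from arbitrary TCP reads."""

    def __init__(self, delimiter=b"\n"):
        self.delimiter = delimiter
        self._buffer = b""

    def feed(self, data):
        # Append one read from the socket and return every complete message.
        self._buffer += data
        messages = []
        while True:
            idx = self._buffer.find(self.delimiter)
            if idx < 0:
                return messages  # the remainder is a partial message; keep it
            messages.append(self._buffer[:idx])
            self._buffer = self._buffer[idx + len(self.delimiter):]


framer = DelimiterFramer(b"\x03")  # hypothetical end-of-message byte
print(framer.feed(b"REQ1-part"))          # []
print(framer.feed(b"-two\x03REQ2\x03"))   # [b'REQ1-part-two', b'REQ2']
```

If the configured delimiter does not match what the peer actually sends, the receiver either cuts messages short or keeps waiting for a boundary that never arrives, which matches the two symptoms described above.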

How can HTTP pipelining make performance worse?

It's a popular claim that HTTP pipelining can degrade the performance of downloading sites due to the head-of-line (HoL) blocking phenomenon. Is this performance compared to a single non-pipelined persistent HTTP connection, or to multiple TCP connections opened simultaneously to download the site's resources in parallel?
In the first case, I can't really see how a large response blocking subsequent smaller ones can result in a performance loss. Yes, this blocking will occur. But with a single non-pipelined persistent HTTP connection, HoL blocking occurs every time the client sends a request and every time the server sends a response. The only reasons I could think of why performance might be worse here are:
1) the time needed to properly queue/buffer requests/responses may be longer than the time saved by the server being able to start processing the n-th request without waiting for processing of the (n-1)-th request to complete. But that basically comes down to numbering the requests/responses correctly, so it seems more of a concern when many small requests have to be handled (queueing/buffering computations are unlikely to take longer than processing a large response, and people who say HoL blocking can be a problem refer to large responses, not small ones), and it is not directly related to HoL blocking;
2) if many clients had pipelining enabled, many large responses might have to be buffered, effectively leading to memory exhaustion on the server side. But this is a special situation, and clearly not what people have in mind when they say that enabling pipelining in a browser can make performance worse.
On the other hand, when pipelining is compared to multiple simultaneous TCP connections, it is easy to see that having to send a large response before subsequent smaller ones will slow things down.
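The three cases can be compared with a deliberately simplified Python model. Its assumptions are loud ones: connections are already open, each response costs a fixed amount of server time in milliseconds (the costs list), there is no bandwidth contention, loss, or slow start, and the numbers are made up. It returns the time at which each response has fully arrived.

```python
from itertools import accumulate


def sequential(rtt, costs):
    # One persistent connection, no pipelining: each request waits for the
    # previous response, so every resource pays a full round trip.
    return list(accumulate(rtt + c for c in costs))


def pipelined(rtt, costs):
    # One connection, all requests sent at once: one round trip overall,
    # but responses are produced and sent strictly in request order.
    return [rtt + done for done in accumulate(costs)]


def parallel(rtt, costs):
    # One already-open connection per resource, no bandwidth contention.
    return [rtt + c for c in costs]


costs = [500, 10, 10]  # ms of server time per response; the big one first
rtt = 100
print(sequential(rtt, costs))  # [600, 710, 820]
print(pipelined(rtt, costs))   # [600, 610, 620]
print(parallel(rtt, costs))    # [600, 110, 110]
```

Under these assumptions pipelining is never slower than a single non-pipelined connection, and the HoL penalty only shows up against parallel connections, where the two small responses finish at 110 ms instead of 610 and 620 ms.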
However, if the comparison is made to a single non-pipelined HTTP connection, and pipelining can indeed cause a performance loss, can you demonstrate that with some basic (perhaps simplified) calculations?
I tried to find the answer to my question on the Internet but was unable to.
Some of the resources that I tried:
What are the disadvantage(s) of using HTTP pipelining?

What is the recommended HTTP POST content length?

I have several clients that constantly post data to a REST service. The REST service sits behind a network load balancer. Each client sends 100 - 500 MB a day, and I need to support 500+ clients.
I can either POST very large packets, which reduces the overhead of TCP/IP session setup and HTTP headers but firmly ties a client to a particular server and limits my scalability options, or send small HTTP packets, which I can load balance well but which incur more overhead for TCP/IP session setup and HTTP headers.
What is the recommended packet size for HTTP POST? Or how can I calculate one for my environment?
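One way to reason about the trade-off is a back-of-the-envelope Python model. The per-POST header size (800 bytes) and the 50 ms round trip charged for a fresh TCP handshake on every POST are hypothetical numbers, so plug in your own measurements.

```python
import math


def post_overhead(bytes_per_day, post_size, header_bytes=800, rtt_s=0.05):
    """Rough daily cost of uploading bytes_per_day in POSTs of post_size bytes.

    Hypothetical assumptions: ~800 bytes of request+response headers per POST
    and one extra round trip per POST for a fresh TCP handshake.
    """
    requests = math.ceil(bytes_per_day / post_size)
    header_fraction = requests * header_bytes / bytes_per_day
    handshake_seconds = requests * rtt_s
    return requests, header_fraction, handshake_seconds


MB = 1_000_000
for size in (64_000, 1 * MB, 10 * MB):
    n, frac, secs = post_overhead(500 * MB, size)
    print(f"{size:>10,} B per POST: {n:>5} requests, "
          f"{frac:.3%} header overhead, ~{secs:.0f} s in handshakes")
```

Under these assumptions, header overhead is already well under 1% at around 1 MB per POST. With keep-alive connections the handshake term largely disappears, so POSTs need not be huge to keep overhead low while still spreading across servers.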
There is no recommended size.
While HTTP POST size is not constrained by the RFCs, HTTP is a commodity protocol implementing request/response messaging, so most of the infrastructure is configured around the idea that TCP connections are not particularly long-lasting and do not carry significant amounts of data. That is, there will be factors outside your control that may impact the service; and although HTTP supports range requests for responses, there is no equivalent for requests.
You can get around a lot of these (although not all) by using HTTPS. However you still need to think about how you detect/manage outages - are you happy to wait for a TCP timeout?
With 500+ clients presumably using the system quite heavily, the congestion avoidance limits shouldn't be a problem; whether TCP window scaling is likely to be an issue depends on how the system is used. HTTP handshakes should not be an issue unless you restrict the request size to something silly.
If the service is highly dependent on clients pushing lots of data to your server, then I'd encourage you to look at parsing the data on the client (given the volume, it is presumably coming from files, implying a signed Java applet or JavaScript with the UniversalBrowserRead privilege) and then sending it over a bidirectional communication channel (e.g. a WebSocket).
Leaving that aside for now, the only way to find out what the route between your clients and your server will support is to measure it, and to monitor it. I would expect a 2Mb upload size to work pretty much anywhere and a 10Mb size to work most of the time within the US or Europe, and you could probably increase this to 50Mb as long as there are no mobile clients.
But if you want to maintain the effectiveness of the service you'll need to monitor bandwidth, packet loss and lost connections.
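To keep each POST under a size cap like the ones estimated above while still moving hundreds of MB a day, the data can be split into numbered parts. This is a minimal Python sketch. The 2 MiB default stands in for the "2Mb" figure above (an assumption), and how each part is posted and reassembled on the server side is left hypothetical.

```python
import io


def split_upload(stream, part_size=2 * 1024 * 1024):
    """Yield (part_number, offset, data) tuples of at most part_size bytes."""
    offset = 0
    part = 0
    while True:
        data = stream.read(part_size)
        if not data:
            return
        yield part, offset, data
        part += 1
        offset += len(data)


parts = list(split_upload(io.BytesIO(b"x" * 5_000_000)))
print([(n, off, len(d)) for n, off, d in parts])
# [(0, 0, 2097152), (1, 2097152, 2097152), (2, 4194304, 805696)]
```

Because each part carries its number and offset, a load balancer is free to send parts to different servers (assuming they reassemble from shared storage), and a part lost to an outage can be retried on its own instead of restarting the whole upload.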

Non-serial pipelined HTTP possible?

RFC 2616 section states:
A client that supports persistent connections MAY "pipeline" its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received.
Serial responses often do more harm than good, since they actually require the server to do more processing and negate the performance benefits gained by pipelining.
For example, if an HTTP client requests the files 1.jpg, 2.jpg, 3.jpg, 4.jpg, and 5.jpg, it doesn't matter whether 3.jpg is returned before 1.jpg, or 4.jpg before 3.jpg. The client simply wants the responses as soon as they are available, in any order.
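A small Python sketch of what the in-order rule costs. It assumes responses take no time to transmit once ready, and the ready times are made-up numbers. Each response can leave only when it is ready and its predecessor has already left.

```python
def in_order_send_times(ready_times):
    """When each response can go out if the server must reply in request order."""
    sent = []
    last = 0
    for ready in ready_times:
        # Can't send before it's ready, nor before the previous response.
        last = max(last, ready)
        sent.append(last)
    return sent


ready = [400, 120, 50, 300, 80]    # ms until 1.jpg..5.jpg are ready
print(in_order_send_times(ready))  # [400, 400, 400, 400, 400]
```

Here 3.jpg is ready at 50 ms but cannot leave until 400 ms, and the server has to hold responses 2 to 5 in memory while it waits for 1.jpg. That buffering is the extra server work described above.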
How can an HTTP client gain the benefits of pipelining and, at the same time, not pay for the disadvantages of response queueing?
A client can't circumvent HOL-queueing as it's part of RFC 2616. The only benefit of pipelining (in my opinion) is in extremely specific and narrow cases. Consider:
R1cost = Request A processing cost.
R2cost = Request B processing cost.
TCPcost = Cost of negotiating new TCP connection.
Using pipelining would, therefore, be viable in specific cases where:
R1cost ≥ R2cost and R2cost ≤ TCPcost
How often is a request cheaper than the request ahead of it, and also cheaper than negotiating a new TCP connection? Not often. I would add that WebSockets are (by far) a more interesting and appropriate solution (as far as parallel back-end processing is concerned).
It can't (in HTTP/1.1). It might be in a future version of HTTP.
There is no default mechanism in the HTTP headers to identify which response would match which request. A response is known to be that to a specific request because of the order in which it's received. If you requested 1.jpg, 2.jpg, 3.jpg, 4.jpg, and 5.jpg and sent the responses in any order, you wouldn't know which one is which.
(You could implement your own markers in client and server headers, but you'd certainly not be compliant with the protocol and most implementations would not know how to deal with that. You would have to do some processing to map, which may negate the anticipated benefits of this parallel implementation too.)
The main benefits you get from the existing HTTP pipeline mechanism are:
Possible reduced communication latency. This may matter depending on your connection.
For requests that require longer server-side computation, the server could start this computation in the background as soon as it receives the request, while it is still sending a previous response, so that it can start sending the second result earlier. (This is also a form of latency, but in terms of response preparation.)
Some of these benefits can also be gained by more modern web-browser techniques, where multiple requests can be sent separately and parts of the page may be updated progressively (via AJAX).

How do CSS sprites speed up a web site?

I'm trying to understand how CSS sprites improve a site's performance.
Why is the downloading of several small images slower than the download of a single image holding the smaller images if the total size of the single image is the sum of the smaller images?
It's important to understand why the overhead of an HTTP request has such an impact.
In its simplest form, an HTTP request consists of opening a socket, sending the request on the open socket and reading the response.
To open a socket, the client's TCP/IP stack sends a TCP SYN packet to the server. The server responds with a SYN-ACK, and the client responds to that with an ACK.
So, before you send a single byte of application data, you have to wait a full round trip (SYN out, SYN-ACK back), and that first byte reaches the server no sooner than one and a half round trips after you started.
Then the client needs to send the request and wait for the server to parse it, find the requested data, and send it back. That's another round trip, plus some server-side overhead (hopefully small, although I've seen some slow servers), plus the time to transmit the actual data. And that's the best case, assuming no network congestion that would cause packets to be dropped and retransmitted.
Every chance you have to avoid this, you should.
Modern browsers issue multiple requests in parallel in an attempt to reduce some of this overhead. Multiple HTTP requests can also, in theory, be sent on the same socket, which makes things a little better. But in general, network round trips are bad for performance and should be avoided.
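The round-trip argument can be turned into a rough Python model. Its assumptions: 6 parallel connections per host (a common browser limit, but an assumption here), one round trip for the handshake, one round trip per "wave" of requests, and all payload sharing one link of fixed bandwidth. The 30 icons of 2 KB each, the 100 ms RTT, and the link speed are made-up numbers.

```python
import math


def load_time_ms(num_requests, rtt_ms, total_bytes, bytes_per_ms, connections=6):
    # Connections are opened in parallel: one handshake round trip, then one
    # round trip per wave of requests; the payload shares a single link.
    waves = math.ceil(num_requests / connections)
    handshake = rtt_ms  # SYN / SYN-ACK before the first request can go out
    return handshake + waves * rtt_ms + total_bytes / bytes_per_ms


# 30 icons of 2 KB each, 100 ms RTT, ~1000 bytes per ms (about 8 Mbit/s)
print(load_time_ms(30, 100, 60_000, 1000))  # 660.0 (separate images)
print(load_time_ms(1, 100, 60_000, 1000))   # 260.0 (one sprite)
```

The bytes transferred are the same in both cases; the whole difference comes from the extra round trips, which is why it grows with latency.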
Fewer round-trips to the server. Instead of 6 (say) requests for 6 different images, you get one request and 6 uses of the same image. If the server is going to respond "it hasn't changed since the last time you asked" most of the time, that can be a significant reduction in the amount of network traffic.
Because multiple images require multiple http requests. See Yahoo's performance rule #1: Minimize HTTP Requests.
In addition to minimizing the number of requests, depending on the images, you also might find that the file size is smaller combined than it would be if they are separated (due, I think, to the reduced amount of metadata, among other things). Another added benefit to using sprites is that you don't have the flicker effect when you first hover over an element that has a hover state, which can improve user perception of your page's performance. An interesting resource on image optimization you might want to read is this series of blog posts on the Yahoo User Interface Blog. On rereading Yahoo's recommended practices for performance, I was surprised to see that they also suggested that arranging your images horizontally rather than vertically can also reduce your file size.
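With a horizontal sprite, each icon is shown by shifting the background left by the combined width of the icons before it. This is a minimal Python sketch that generates those CSS rules. The class names, sizes, and sprite.png file name are hypothetical, and it assumes all icons sit top-aligned in a single row.

```python
def horizontal_sprite_css(images, sprite_url="sprite.png"):
    """images: list of (class_name, width, height) in left-to-right order."""
    rules = []
    x = 0
    for name, width, height in images:
        offset = f"-{x}px" if x else "0"
        rules.append(
            f".{name} {{ background: url({sprite_url}) {offset} 0; "
            f"width: {width}px; height: {height}px; }}"
        )
        x += width  # the next icon starts where this one ends
    return "\n".join(rules)


print(horizontal_sprite_css([("home", 16, 16), ("search", 24, 16), ("cart", 16, 16)]))
# .home { background: url(sprite.png) 0 0; width: 16px; height: 16px; }
# .search { background: url(sprite.png) -16px 0; width: 24px; height: 16px; }
# .cart { background: url(sprite.png) -40px 0; width: 16px; height: 16px; }
```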
Aside from the reasons above, I find them easier to work with. You only have one file you need to modify and upload, and one URL to change in your code, if you update the image.
