How can HTTP pipelining make performance worse?

It's a popular claim that HTTP pipelining can degrade the performance of downloading sites due to the head-of-line (HoL) blocking phenomenon. Is this performance being compared to a single non-pipelined persistent HTTP connection, or to multiple TCP connections opened simultaneously in order to download the site's resources in parallel?
In the first case I can't really see how a large response blocking the sending of subsequent smaller ones can result in a performance loss. Yes, this blocking will occur. But with a single non-pipelined persistent HTTP connection, HoL blocking occurs every time the client sends a request and every time the server sends a response. The only reasons I was able to think of for performance potentially being worse here are:
1) the time needed to properly queue/buffer requests/responses may be longer than the time saved by the server being able to start processing the n-th request without waiting for the processing of the (n-1)-th request to complete. But this basically comes down to numbering the requests/responses correctly, so it seems to be more of a concern when many small requests have to be dealt with (it's unlikely that queueing/buffering-related computations will take more time than processing a large response, and people who say HoL blocking can be a problem refer to large responses, not small ones), and it is not directly related to HoL blocking;
2) if many clients had pipelining enabled, then many large responses might have to be buffered, effectively leading to memory exhaustion on the server side. But this is a rather special situation, and clearly it is not what people have in mind when they say that enabling pipelining in a browser can make performance worse.
On the other hand, when pipelining is compared to multiple simultaneous TCP connections, it is readily seen that having to send a large response before the subsequent smaller ones will slow things down.
However, if the comparison is made to a single non-pipelined HTTP connection and pipelining can indeed result in performance loss - can you demonstrate some basic (perhaps simplified) calculations showing that?
I tried to search for an answer to my question on the Internet but was unable to find one.
Some of the resources that I tried:
https://devcentral.f5.com/articles/http-pipelining-a-security-risk-without-real-performance-benefits
What are the disadvantage(s) of using HTTP pipelining?
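One rough way to frame the comparison asked about above is a toy timing model. The sketch below is only an illustration under stated assumptions, not a measurement: `rtt` is a hypothetical round-trip time, each entry in `costs` is a hypothetical time the server needs to produce and send one response, and connection setup, bandwidth sharing, and packet loss are all ignored. The function names are made up for this example.

```python
from itertools import accumulate

def serial_times(rtt, costs):
    """One persistent connection, no pipelining:
    request i is sent only after response i-1 has fully arrived."""
    return list(accumulate(rtt + c for c in costs))

def pipelined_times(rtt, costs):
    """One connection, all requests sent up front;
    responses must come back in request order."""
    return [rtt + done for done in accumulate(costs)]

def parallel_times(rtt, costs):
    """One already-open connection per request (assumption:
    the connections do not slow each other down)."""
    return [rtt + c for c in costs]

rtt = 2                # hypothetical, in ms
costs = [100, 5, 5]    # one large response followed by two small ones

serial = serial_times(rtt, costs)
pipelined = pipelined_times(rtt, costs)
parallel = parallel_times(rtt, costs)

print("serial   ", serial)
print("pipelined", pipelined)
print("parallel ", parallel)
```

In this model the small responses finish far later with pipelining than with parallel connections, which is the HoL blocking effect, while pipelining is never slower than the non-pipelined single connection. Any loss against the non-pipelined case would have to come from effects the model leaves out, such as the buffering costs listed in the question.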

Related

Deciding whether to use ws or wss?

I have an account with a trading exchange and they have a websockets API which supports ws://... and wss://...
For the non-authenticated channels such as the current state of the orderbook, is it an easy decision to just use ws, mostly for the (however minimal) time savings? Obviously I would like to have my data as recent as possible.
I just want to check there's not some other factor which is more important than a few TLS encryption CPU cycles and ms in latency saved.
I see absolutely no reason not to use wss for all your connections, particularly with a webSocket. The normal webSocket use is to make a connection, then keep that connection for a long time and use it. While there is a bit of overhead on every transmission because of the encryption, the main wss overhead is when you first make the connection and that only happens once per connection.
I just want to check there's not some other factor which is more important than a few TLS encryption CPU cycles and ms in latency saved.
No, there is not some other factor. In fact, the opposite. There are more and more and more reasons these days to use TLS whenever possible to protect your privacy.
For the non-authenticated channels such as the current state of the orderbook, is it an easy decision to just use ws, mostly for the (however minimal) time savings?
Why? If wss is available, I'd be using it for everything. If you actually run into a CPU problem down the road, you could revisit whether using wss has anything to do with it, but that is unlikely to happen and, in my opinion, you have nothing to lose by starting with wss. While one wants to design code intelligently, you don't want to try to micro-optimize performance-related things before you even have a documented, measured performance issue to worry about.
General reasons to use TLS:
Privacy (nobody in the middle can snoop on what you're doing)
Security of data (nobody can read your data, not even proxies)
Security of endpoint (endpoint you're connecting to can't be hijacked without you knowing about it)
I just want to check there's not some other factor which is more important than a few TLS encryption CPU cycles and ms in latency saved.
Actually, there is.
This is also stated in the WebSocket RFC (RFC 6455):
At the time of writing of this specification, it should be noted that connections on ports 80 and 443 have significantly different success rates, with connections on port 443 being significantly more likely to succeed, though this may change with time.
Some network intermediaries (especially with some mobile providers) will fail on ws connections but work properly when using wss.
The reason seems to be that these intermediaries (proxies / routers) will attempt to read the WebSocket message as if it were HTTP and "fix" HTTP errors or resolve caching (which actually corrupts WebSocket data).
The encrypted wss protocol will trigger a pass-through mode, since these intermediaries won't be able to read the data or "fix" any HTTP errors.
The WebSocket protocol uses client-side frame masking for the same purpose, but sometimes with limited results. Using wss increases connectivity on some networks.

Http - How Are Parallel Connections Transmitted?

I'm taking a Google video course about the HTTP protocol. HTTP 1.1 introduced the so-called pipelining technique to reduce the time between requests and responses. Head-of-line blocking can occur, so browsers use parallel connections to avoid HOL blocking.
I wonder, how do browsers send parallel network packets? I have never thought about the possibility of multiple packets being sent simultaneously. Is it even possible to send parallel requests through a "cable"? How does it work?
Another thing is HTTP 2.0: do browsers implement parallel connections in this protocol? HTTP 2.0 uses streams, but I'm not sure how browsers handle them.
Nothing in HTTP is truly parallel. If multiple resources are to be transferred at once, clients have to establish multiple connections. For HTTP/1.1, it is not uncommon to see three to five of these per host.
HTTP/2 is a bit different in that it can interleave: the smallest unit in HTTP/1.x is a message, whereas in HTTP/2 it is a frame of a message. This allows HTTP/2 to transmit multiple messages "at once" (really: one frame of a given message at a time), while HTTP/1.1 could only start pipelining and possibly suffer from HOL blocking, as you mentioned.
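The frame interleaving described above can be sketched with a toy model. This is only an illustration, assuming simple round-robin scheduling; real HTTP/2 frames carry a binary header and are subject to flow control and prioritisation, none of which is modelled here. The function names and the `(stream_id, chunk)` tuple format are made up for the example.

```python
from itertools import zip_longest

def split_frames(stream_id, body, frame_size):
    """Cut one message body into (stream_id, chunk) frames."""
    return [(stream_id, body[i:i + frame_size])
            for i in range(0, len(body), frame_size)]

def interleave(streams, frame_size):
    """Put one frame per stream on the wire in turn (round robin)."""
    per_stream = [split_frames(sid, body, frame_size)
                  for sid, body in streams.items()]
    wire = []
    for round_ in zip_longest(*per_stream):
        wire.extend(frame for frame in round_ if frame is not None)
    return wire

def reassemble(wire):
    """Receiver side: group frames back into messages by stream id."""
    bodies = {}
    for sid, chunk in wire:
        bodies[sid] = bodies.get(sid, b"") + chunk
    return bodies

# A large response on stream 1 and a small one on stream 3.
streams = {1: b"A" * 10, 3: b"bb"}
wire = interleave(streams, frame_size=4)
print(wire)
```

The small message is complete after the second frame on the wire instead of waiting behind the whole large message, which is what a strictly ordered HTTP/1.1 pipeline cannot do.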
As for your question regarding multiple packets being sent simultaneously: Yes, that is possible and is also regularly done. That would concern wave physics, Fourier transforms, and electrical engineering and thus be a bit off-topic for SO ;)

Non-serial pipelined HTTP possible?

RFC 2616 section 8.1.2.2 states:
A client that supports persistent connections MAY "pipeline" its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received.
Serial responses often do more harm than good, since they actually require the server to do more processing and negate the performance benefits gained by pipelining.
For example, if an HTTP client requests files 1.jpg, 2.jpg, 3.jpg, 4.jpg, and 5.jpg, it doesn't matter if 3.jpg is returned before 1.jpg, or if 4.jpg is returned before 3.jpg. The client simply wants the responses as soon as they are available, in any order.
How can an HTTP client gain the benefits of pipelining, and at the same time not pay for the disadvantages of response queueing?
A client can't circumvent HOL-queueing as it's part of RFC 2616. The only benefit of pipelining (in my opinion) is in extremely specific and narrow cases. Consider:
R1cost = processing cost of the first request (request A).
R2cost = processing cost of the second request (request B).
TCPcost = cost of negotiating a new TCP connection.
Using pipelining would, therefore, be viable in specific cases where:
R1cost ≤ R2cost ≤ TCPcost
How often is a request more expensive than the previous request and less expensive than negotiating a new TCP connection? Not often. I would add that WebSockets are (by far) a more interesting and appropriate solution (as far as parallel back-end processing is concerned).
It can't (in HTTP/1.1). It might be possible in a future version of HTTP.
There is no default mechanism in the HTTP headers to identify which response would match which request. A response is known to be that to a specific request because of the order in which it's received. If you requested 1.jpg, 2.jpg, 3.jpg, 4.jpg, and 5.jpg and sent the responses in any order, you wouldn't know which one is which.
(You could implement your own markers in client and server headers, but you'd certainly not be compliant with the protocol and most implementations would not know how to deal with that. You would have to do some processing to map, which may negate the anticipated benefits of this parallel implementation too.)
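The order-based matching described above can be sketched as a FIFO queue of outstanding requests. This is a minimal illustration, not how any particular client is implemented; the class name `PipelinedClient` and its methods are hypothetical, and no real network I/O is performed.

```python
from collections import deque

class PipelinedClient:
    """Toy model: a response carries no request id, so it is matched
    to whichever request has been waiting the longest."""

    def __init__(self):
        self._pending = deque()

    def send(self, path):
        # Remember the order in which requests went out.
        self._pending.append(path)

    def on_response(self, body):
        # The only available rule: first response belongs to first request.
        return self._pending.popleft(), body

paths = ["/1.jpg", "/2.jpg", "/3.jpg"]

# Server answers in request order: every body lands on the right path.
ok = PipelinedClient()
for p in paths:
    ok.send(p)
in_order = [ok.on_response(b) for b in (b"one", b"two", b"three")]

# Server answers out of order: the client cannot tell, and mislabels them.
bad = PipelinedClient()
for p in paths:
    bad.send(p)
out_of_order = [bad.on_response(b) for b in (b"three", b"one", b"two")]

print(in_order)
print(out_of_order)
```

In the second run the body for 3.jpg is silently attributed to /1.jpg, which is why the in-order rule cannot simply be dropped without adding some identifier to the protocol.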
The main benefits you get from the existing HTTP pipeline mechanism are:
Possible reduced communication latency. This may matter depending on your connection.
For requests that require some longer server-side computation, the server could start this computation in the background, upon reception of the request, while it's sending a previous response, so as to be able to start sending the second result earlier. (This is also a form of latency, but in terms of response preparation.)
Some of these benefits can also be gained by more modern web-browser techniques, where multiple requests can be sent separately and parts of the page may be updated progressively (via AJAX).

Web Browser Parallel Downloads vs Pipelining

I always knew web browsers could do parallel downloads. But then the other day I heard about pipelining. I thought pipelining was just another name for parallel downloads, but then found out even Firefox has pipelining disabled by default. What is the difference between these things and how do they work together?
As I understand it, "parallel downloads" are requests going out on multiple sockets. They can be to totally unrelated servers but they don't have to be.
Pipelining is an HTTP/1.1 feature that lets you make multiple requests on the same socket before receiving a response. When connecting to the same server, this reduces the number of sockets, conserving resources.
I think this MDC article explains HTTP pipelining pretty darn well.
What is HTTP pipelining?
Normally, HTTP requests are issued sequentially, with the next request being issued only after the response to the current request has been completely received. Depending on network latencies and bandwidth limitations, this can result in a significant delay before the next request is seen by the server.
HTTP/1.1 allows multiple HTTP requests to be written out to a socket together without waiting for the corresponding responses. The requestor then waits for the responses to arrive in the order in which they were requested. The act of pipelining the requests can result in a dramatic improvement in page loading times, especially over high latency connections.
Pipelining can also dramatically reduce the number of TCP/IP packets. With a typical MSS (maximum segment size) in the range of 536 to 1460 bytes, it is possible to pack several HTTP requests into one TCP/IP packet. Reducing the number of packets required to load a page benefits the internet as a whole, as fewer packets naturally reduces the burden on IP routers and networks.
HTTP/1.1 conforming servers are required to support pipelining. This does not mean that servers are required to pipeline responses, but that they are required to not fail if a client chooses to pipeline requests. This obviously has the potential to introduce a new category of evangelism bugs, since no other popular web browsers implement pipelining.
I recommend reading the whole article since there's more than what I copied into my answer.
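The packet-count point in the quoted article can be checked with a rough calculation. The sketch below only builds the request bytes and compares their size to the MSS values quoted above; it opens no sockets. The host `example.com`, the paths, and the bare-minimum header set are assumptions for illustration, and real browsers send many more headers per request, so fewer requests would fit per segment in practice.

```python
def build_request(host, path):
    """Minimal HTTP/1.1 GET request (only the mandatory Host header)."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")

def pipeline(host, paths):
    """Concatenate requests so they can be written to the socket together."""
    return b"".join(build_request(host, p) for p in paths)

MSS_LOW, MSS_HIGH = 536, 1460   # range quoted in the article

paths = [f"/img/{n}.jpg" for n in range(1, 6)]
payload = pipeline("example.com", paths)

print(len(payload), "bytes for", len(paths), "pipelined requests")
print("fits in one small segment:", len(payload) <= MSS_LOW)
```

Without pipelining, each of these requests would be written only after the previous response arrived, so each would travel in its own packet.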

Why can HTTP handle only one pending request per socket?

Being curious, I wonder why HTTP, by design, can only handle one pending request per socket.
I understand that this limitation is because there is no 'Id' to associate a request to its response, so the only way to match a response with its request is to send the response on the same socket that sent the request. There would be no way to match a response to its request if there was more than one pending request on the socket because we may not receive the responses in the same order requests were sent.
If the protocol had been designed to have a matching 'Id' for requests and responses, there could be multiple pending requests on only one socket. This could greatly reduce the number of sockets used by internet browsers and applications using web services.
Was HTTP designed like this for simplicity even if it's less efficient or am I missing something and this is the best approach?
Thanks.
Not true. Read about HTTP/1.1 pipelining. Apache implements it, and Firefox implements it too, although Firefox disables it by default.
To turn it on in Firefox use about:config and write 'pipelining' in the filter.
see: http://www.mozilla.org/projects/netlib/http/pipelining-faq.html
It's basically for simplicity; various proposals have been made over the years that multiplex on the same connection (e.g. SPDY) but none have taken off yet.
One problem with sending multiple requests on a single socket is that it would cause inefficient queuing.
For instance, let's say you are in a store and there are 2 cashiers, and 10 people waiting to be checked out. The ideal way to make the line is to have a single queue of 10 people and the next person in line goes to a cashier when they become available. However, if you sent all the requests at once you would probably send 5 people to cashier A and 5 to cashier B. However, what if you sent the 5 people with the largest shopping carts to the same cashier? That's bad queuing and what could happen if you queued a bunch of requests on a single socket.
NOTE: I'm not saying that you couldn't use queuing well, but it keeps it simple to do it right if there is no queuing on a single socket.
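The cashier analogy can be put into numbers with a small simulation. This is a sketch under made-up assumptions: a large cart takes 10 time units, a small one takes 1, and the function names are hypothetical.

```python
import heapq

def shared_queue_finish(jobs, workers=2):
    """Single line: the next job goes to whichever cashier frees up first."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    finish = []
    for cost in jobs:
        start = heapq.heappop(free_at)   # earliest available cashier
        end = start + cost
        finish.append(end)
        heapq.heappush(free_at, end)
    return finish

def preassigned_finish(lines):
    """Each cashier gets a fixed line decided up front."""
    finish = []
    for line in lines:
        t = 0
        for cost in line:
            t += cost
            finish.append(t)
    return finish

big, small = 10, 1   # hypothetical checkout times

# One shared queue of 10 customers, large carts first.
shared = shared_queue_finish([big] * 5 + [small] * 5)

# Bad split: all five large carts are sent to the same cashier.
split = preassigned_finish([[big] * 5, [small] * 5])

print("shared queue, last customer done at", max(shared))
print("bad split,    last customer done at", max(split))
```

With these numbers the shared queue is finished at time 30, while the bad split leaves one cashier idle from time 5 and the other busy until time 50.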
There are a few considerations I would review.
The first is related to the nature of TCP itself. TCP delivers data strictly in order, so it suffers from a 'head-of-line' blocking issue at the connection level: a lost or delayed segment holds up everything queued behind it. Given traditional latencies, this can be a problem for load time and user experience compared to the parallel connection scheme browsers employ today. The higher the latency of the link, the larger the impact of this fundamental limitation.
There is also a concurrency issue, in that sometimes you really want to load multiple resources incrementally / in parallel. Back in the day, one of the greatest features Mozilla had over Mosaic was that it would load images and objects incrementally, so you could begin to see what was going on and use a resource without having to wait for it to finish loading. With fewer connections there is a risk that, for example, loading a large image on a page before a style sheet can be catastrophic from an experience point of view. Expecting some kind of mitigating intelligence or explicit configuration to optimally order requests may not be a realistic or ideal solution.
There are proposals such as HTTP over SCTP that will more or less totally correct the issue you raise at the transport level.
Also realize that HTTP doesn't necessarily mandate a Content-Length header to serve data. Even if each HTTP response was ID'd, how would you manage streaming binary content with no content length (HTTP/1.0 style)? Or what if the client sent the Connection: close header so that the connection is closed to mark the end of a response of unknown length?
To manage this you would have to multiplex HTTP chunks (chunked encoding is already present, but I don't think anyone implements multiplexing it) and add some non-trivial work to many programs.
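For reference, the existing chunked encoding mentioned above frames a body of unknown total length as a series of length-prefixed chunks ending with a zero-length chunk. The sketch below is a simplified illustration: it ignores trailers, discards chunk extensions, and does no error handling, and the function names are made up. It shows that chunks delimit a single response but carry no identifier that would say which response they belong to.

```python
def encode_chunked(parts):
    """Frame each non-empty part as '<hex length>\\r\\n<data>\\r\\n',
    then terminate with a zero-length chunk."""
    out = b""
    for part in parts:
        if part:
            out += f"{len(part):x}\r\n".encode("ascii") + part + b"\r\n"
    return out + b"0\r\n\r\n"

def decode_chunked(data):
    """Read chunks until the zero-length terminator."""
    body = b""
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The size line may carry ';extension' text, which is dropped here.
        size = int(data[pos:eol].split(b";")[0].decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2          # skip the chunk data and its trailing CRLF
    return body

encoded = encode_chunked([b"hello", b" world"])
print(encoded)
print(decode_chunked(encoded))
```

Multiplexing several responses over one socket would need each chunk to also name its response, which is the non-trivial extra work the answer refers to.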

Resources