HTTP/1.x has a problem called "head-of-line blocking", where effectively only one request can be outstanding on a connection at a time. HTTP/1.1 tried to fix this with pipelining, but it didn't fully address the problem.
Multiplexing addresses these problems by allowing multiple request and response messages to be in flight at the same time; it’s even possible to intermingle parts of one message with another on the wire
Does this reduce the importance of domain sharding, resource bundling, image spriting, etc.? If so, should I at least plan for a refactoring? And how does this work?
Multiplexing interleaves all requests and responses over a single connection, so optimisations aimed at reducing the number of requests are far less useful than with HTTP/1. I would suggest that you plan for refactoring your site/app only if you are migrating your server to HTTP/2. Modern browsers have adopted it, but server implementations vary. This was done to ensure that we as developers have the choice to upgrade to HTTP/2, in contrast to a 'forced upgrade'.
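If you want to see what protocol a page is actually being served over today, here is a minimal browser-side sketch (an assumption: the browser exposes the nextHopProtocol field of the Navigation Timing API):

```typescript
// Logs the negotiated protocol for the page itself: "h2" means HTTP/2,
// "http/1.1" means the old protocol.
const [nav] = performance.getEntriesByType("navigation") as PerformanceNavigationTiming[];
console.log(`Page loaded over: ${nav?.nextHopProtocol || "unknown"}`);
```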
Related
I read in bundling parts of systemjs documentation that bundling optimizations no longer needed in HTTP/2:
Over HTTP/2 this approach may be preferable as it allows files to be individually cached in the browser meaning bundle optimizations are no longer a concern.
My questions:
Does it mean we don't need to think about bundling scripts or other resources when using HTTP/2?
What is it in HTTP/2 that enables this?
The bundling optimization was introduced as a "best practice" when using HTTP/1.1 because browsers could only open a limited number of connections to a particular domain.
A typical web page has 30+ resources to download in order to be rendered.
With HTTP/1.1, a browser opens 6 connections to the server, requests 6 resources in parallel, waits for those to be downloaded, then requests another 6 resources, and so forth (of course, some resources will be downloaded faster than others, and their connections may be reused earlier than others for another request).
The point being that with HTTP/1.1 you can only have at most 6 outstanding requests.
To download 30 resources you would need 5 roundtrips, which adds a lot of latency to the page rendering.
In order to make the page rendering faster, with HTTP/1.1 the application developer had to reduce the number of requests for a single page.
This led to "best practices" such as domain sharding, resource inlining, image spriting, resource bundling, etc., but these are in fact just clever hacks to work around HTTP/1.1 protocol limitations.
With HTTP/2 things are different because HTTP/2 is multiplexed.
Even without HTTP/2 Push, the multiplexing feature of HTTP/2 renders all those hacks useless, because now you can request hundreds of resources in parallel using a single TCP connection.
With HTTP/2, the same 30 resources would require just 1 roundtrip to be downloaded, giving you a straight 5x performance increase in that operation (that typically dominates the page rendering time).
Given that the trend is for web content to become richer, pages will have more resources; and the more resources there are, the better HTTP/2 will perform with respect to HTTP/1.1.
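To make the multiplexing point concrete, here is a minimal sketch using Node's built-in http2 module (the host and resource paths are placeholders): all 30 requests go out immediately over one TCP connection instead of being queued 6 at a time.

```typescript
import { connect } from "node:http2";

// One TCP (and TLS) connection for everything.
const session = connect("https://example.com");
const paths = Array.from({ length: 30 }, (_, i) => `/assets/resource-${i}.js`);
let remaining = paths.length;

for (const path of paths) {
  // Each request becomes its own stream, multiplexed on the same connection.
  const stream = session.request({ ":path": path });
  stream.end();
  const chunks: Buffer[] = [];
  stream.on("data", (chunk: Buffer) => chunks.push(chunk));
  stream.on("end", () => {
    console.log(`${path}: ${Buffer.concat(chunks).length} bytes`);
    if (--remaining === 0) session.close();
  });
}
```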
On top of HTTP/2 multiplexing, you have HTTP/2 Push.
Without HTTP/2 Push, the browser has to request the primary resource (the *.html page), download it, parse it, and then arrange to download the 30 resources referenced by the primary resource.
HTTP/2 Push allows you to get the 30 resources while you are requesting the primary resource that references them, saving one more roundtrip, again thanks to the HTTP/2 multiplexing.
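As a sketch of what Push looks like on the server side, here is a minimal example using Node's built-in http2 module (the certificate paths and file contents are placeholder assumptions):

```typescript
import { createSecureServer } from "node:http2";
import { readFileSync } from "node:fs";

const server = createSecureServer({
  key: readFileSync("server-key.pem"),   // placeholder certificate files
  cert: readFileSync("server-cert.pem"),
});

server.on("stream", (stream, headers) => {
  if (headers[":path"] === "/") {
    // Push the stylesheet before the browser has even parsed the HTML.
    stream.pushStream({ ":path": "/style.css" }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ":status": 200, "content-type": "text/css" });
      pushStream.end("body { margin: 0; }");
    });
    stream.respond({ ":status": 200, "content-type": "text/html" });
    stream.end('<link rel="stylesheet" href="/style.css"><h1>Hello</h1>');
  } else {
    stream.respond({ ":status": 404 });
    stream.end();
  }
});

server.listen(8443);
```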
It is really the multiplexing feature of HTTP/2 that allows you to forget about resource bundling.
You can look at the slides of the HTTP/2 session that I gave at various conferences.
HTTP/2 supports "server push", which obsoletes bundling of resources. So, yes, if you are using HTTP/2, bundling would actually be an anti-pattern.
For more info check this: https://www.igvita.com/2013/06/12/innovating-with-http-2.0-server-push/
Bundling does a lot in a modern JavaScript build.
HTTP/2 only addresses the optimisation of minimising the number of requests between the client and server, by making additional requests much cheaper than with HTTP/1.
But bundling today is not only about minimising the number of requests between the client and the server. Two other relevant aspects are:
Tree Shaking: Modern bundlers like webpack and Rollup can eliminate unused code, even from 3rd-party libraries (see the sketch after this list).
Compression: Bigger JavaScript bundles can be better compressed (gzip, zopfli ...)
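A minimal sketch of the tree-shaking point (the module and file names are hypothetical): the bundler can drop the export that is never imported, which serving unbundled files over HTTP/2 cannot do.

```typescript
// mathUtils.ts - a hypothetical module with two named exports.
export function add(a: number, b: number): number {
  return a + b;
}

export function multiply(a: number, b: number): number {
  return a * b;
}

// main.ts - only `add` is imported, so a tree-shaking bundler
// (e.g. webpack or Rollup in production mode) can drop `multiply`
// from the final bundle entirely.
import { add } from "./mathUtils";
console.log(add(2, 3));
```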
Also, HTTP/2 server push can waste bandwidth by pushing resources that the browser does not need because it already has them in its cache.
Two good posts about the topic are:
http://engineering.khanacademy.org/posts/js-packaging-http2.htm
https://www.contentful.com/blog/2017/04/04/es6-modules-support-lands-in-browsers-is-it-time-to-rethink-bundling/
Both those posts come to the conclusion that "build processes are here to stay".
Bundling is still useful if your website is
Served over plain HTTP (browsers only support HTTP 2.0 over HTTPS)
Hosted by a server that does not support ALPN and HTTP 2 (see the sketch after this list for a quick ALPN check)
Required to support old browsers (Sensitive and Legacy Systems)
Required to support both HTTP 1 and 2 (Graceful Degradation)
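A quick way to check the ALPN point is to see which protocol a server negotiates; a minimal sketch using Node's built-in tls module (the host is a placeholder):

```typescript
import { connect } from "node:tls";

// Offer both protocols and see which one the server picks via ALPN.
const socket = connect(
  { host: "example.com", port: 443, servername: "example.com", ALPNProtocols: ["h2", "http/1.1"] },
  () => {
    // "h2" means the server will speak HTTP/2 on this connection.
    console.log(`Negotiated protocol: ${socket.alpnProtocol || "none"}`);
    socket.end();
  }
);
```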
There are two HTTP 2.0 features that make bundling obsolete:
HTTP 2.0 Multiplexing and Concurrency (allows multiple resources to be requested on a single TCP connection)
HTTP 2.0 Server Push (Server push allows the server to preemptively push the responses it thinks the client will need into the client's cache)
PS: Bundling is not the only optimization technique that would be eliminated by the arrival of HTTP 2.0 features. Features like image spriting, domain sharding and resource inlining (image embedding through data URIs) will be affected as well.
How HTTP 2.0 affects existing web optimization techniques
I have a question about SPDY/HTTP2:
Normally you concatenate multiple CSS and JS files into one file to save requests and to get a better performance. I heard that SPDY/HTTP2 combines multiple requests into a single response. Would that mean that I don't need to pre-concatenate CSS and JS files anymore, because this is handled by the protocol?
To say it in other words:
Can I use <script src="moduleA.js"></script> and <script src="moduleB.js"></script> with SPDY/HTTP2 in the same way as I would use <script src="allScripts.js"></script> with HTTP1? Is this the same from a response performance point of view, but with the benefit of caching each file on its own, so that I can change moduleB.js and keep moduleA.js cached?
HTTP/2.0 does not (AFAIK) exist yet - it's still a proposed standard. But it seems likely that it will use similar connection handling to SPDY.
SPDY doesn't concatenate them; it multiplexes the requests across the same connection - from the network's point of view the effect is the same.
Yes, you don't need to merge the content files by hand, and yes, they will be cached independently.
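A minimal sketch of the independent-caching point, using Node's built-in http2 module (certificate paths and module names are placeholders): each file gets its own validator, so editing moduleB.js leaves the cached moduleA.js untouched.

```typescript
import { createSecureServer } from "node:http2";
import { readFileSync } from "node:fs";
import { createHash } from "node:crypto";

const server = createSecureServer({
  key: readFileSync("server-key.pem"),   // placeholder certificate files
  cert: readFileSync("server-cert.pem"),
});

server.on("stream", (stream, headers) => {
  const path = headers[":path"] as string;
  if (path !== "/moduleA.js" && path !== "/moduleB.js") {
    stream.respond({ ":status": 404 });
    stream.end();
    return;
  }
  const body = readFileSync(`.${path}`);
  const etag = `"${createHash("sha1").update(body).digest("hex")}"`;
  // Only the file whose content changed fails validation and is re-sent.
  if (headers["if-none-match"] === etag) {
    stream.respond({ ":status": 304, "etag": etag });
    stream.end();
    return;
  }
  stream.respond({ ":status": 200, "content-type": "application/javascript", "etag": etag });
  stream.end(body);
});

server.listen(8443);
```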
SPDY3 and HTTP2 are multiplexing requests on the same physical connection.
But even multiplexed, requests may be sent sequentially for each resource, causing major slowdowns due to roundtrip time waits.
Both SPDY3 and HTTP2 have a feature called "Resource Push" (also known as "SPDY Push", not to be confused with "Server Push") that allows related resources to be pushed without the client requesting them, and the Jetty project - I am a committer - is the only one to my knowledge that implements that feature.
You can watch Resource Push in action in this video: http://webtide.intalio.com/2012/10/spdy-push-demo-from-javaone-2012/.
With Resource Push, you save additional roundtrips to get all the different JS files and still benefit from the browser cache for each individual file.
The whole point of resource concatenation is exactly to reduce the number of roundtrips necessary to get all the resources needed, and Resource Push helps to solve that problem.
HTTP/2.0 allows for multiplexing, where multiple request/response streams exchange data over the same TCP connection.
Because creating and starting TCP connections is expensive, HTTP/2.0's multiplexing will usually be faster than the semi-parallel downloading of HTTP/1.1, where a limited number of TCP connections is (re)used by the browser to perform a given number of requests for resources.
But your mileage may vary. Measure it.
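One minimal, browser-side way to do that measurement (an assumption: the Resource Timing API with nextHopProtocol is available) is to compare per-resource timings for the same page served over HTTP/1.1 and over HTTP/2:

```typescript
// Print how long each resource took and which protocol carried it.
const resources = performance.getEntriesByType("resource") as PerformanceResourceTiming[];
for (const r of resources) {
  const ms = (r.responseEnd - r.startTime).toFixed(1);
  console.log(`${r.name}: ${ms} ms over ${r.nextHopProtocol || "unknown"}`);
}
```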
As a sidenote, you might want to reference all your libraries separately when developing and debugging, but bundle and minify the JS/CSS into one file upon a deploy.
WWDC 2012 session 706 - Networking Best Practices explains HTTP Pipelining.
By default it's disabled on iOS.
In the talk it's described as a huge performance win.
Why might you not want to use it?
Implementation bugs
For pipelining to work, responses must come back in the order they were requested. A naive server implementation might just send each response as soon as it has been calculated. If multiple requests are sent in parallel and the first request takes longer to process (e.g. processing a larger image), then the responses will come back out of order.
This is a problem for the client: since HTTP is a stateless protocol, the client has no way to match the requests with the responses. It is reliant on the order in which the responses came back.
A server MUST send its responses to those requests in the same order that the requests were received.
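To make the ordering problem concrete, this is roughly what pipelining looks like on the wire: two requests written back-to-back on one TCP connection before any response has arrived. A minimal sketch using Node's built-in net module (the host and paths are placeholders, and many servers or proxies will mishandle this):

```typescript
import { connect } from "node:net";

const socket = connect(80, "example.com", () => {
  // Both requests go out immediately; no waiting for the first response.
  socket.write(
    "GET /a.css HTTP/1.1\r\nHost: example.com\r\n\r\n" +
    "GET /b.js HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
  );
});

// The only way to tell the responses apart is the order in which they arrive,
// which is why the server MUST answer in request order.
socket.on("data", (chunk) => process.stdout.write(chunk.toString()));
socket.on("end", () => socket.destroy());
```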
Safari apparently used HTTP pipelining, for images at least. This resulted in an issue where images would be swapped around.
AFNetworking used to use pipelining, but it was pulled after a reported issue.
All major browsers (other than Opera) have HTTP pipelining disabled or not implemented.
Performance issues
Even if the server does properly support pipelining, performance issues can arise because all subsequent requests have to wait for the first one to be complete (Head of Line blocking).
This article talks about performance loss in some circumstances and a potential denial-of-service attack.
This article also suggests that pipelining isn't a massive win.
WWDC 2015 - Networking with NSURLSession explains head-of-line blocking really well. (The solution is to switch to HTTP/2, which supports priorities.)
So in summary the issues with HTTP pipelining are:
Some servers & most proxies don't support it. (Perhaps due to security / reliability / or performance concerns)
Some servers support it incorrectly and this can lead to client bugs.
It is not necessarily a performance win.
It is susceptible to head-of-line blocking.
HTTP 1.1 supports keep-alive connections: connections are not closed until "Connection: close" is sent.
So, if the browser (in this case Firefox) has network.http.pipelining enabled and network.http.pipelining.maxrequests increased, isn't the effect the same in the end?
I know that these settings are disabled because for some websites this could increase load, but I think a simple HTTP header flag could tell the browser that it is OK to use multiplexing, and this problem could be solved more easily.
Wouldn't it be easier to change the default settings in browsers than to invent a new protocol that increases complexity, especially in HTTP servers?
SPDY has a number of advantages that go beyond what HTTP pipelining can offer, which are described in the SPDY whitepaper:
With pipelining, the server still has to return the responses one at a time in the order they were requested. This can be a problem if the client requests a resource that's dynamically generated before one that is static: the server cannot send any of the "easy" static responses until the dynamically generated one has been generated and sent. With SPDY, responses can be returned out of order or in parallel as they are generated, lowering the total time to receive all resources.
As you noted in your question, not all servers are able to deal with pipelining: it's not just load, some servers actually behave incorrectly when the client requests pipelining. Using a header to indicate that it's okay to do pipelining is too late to get the maximum benefit: you are already receiving the first response at that point, so while you can use it on future connections it's already too late for this one.
SPDY compresses headers using an algorithm which is specific to that task (stateful and with knowledge of what is normally in HTTP headers); while yes, SSL already includes compression, just compressing them with deflate is not as efficient. Most HTTP requests have no bodies and only a short GET line, so the headers make up virtually the entire request: any compression you can get is an improvement. Many responses are also small compared to their headers.
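As a rough illustration of how much redundancy typical headers carry, here is a sketch using Node's built-in zlib module with plain deflate (the header block is made up; SPDY's dedicated, stateful compressor does even better across requests):

```typescript
import { deflateSync } from "node:zlib";

const headers =
  "GET /index.html HTTP/1.1\r\n" +
  "Host: www.example.com\r\n" +
  "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36\r\n" +
  "Accept: text/html,application/xhtml+xml\r\n" +
  "Accept-Encoding: gzip, deflate\r\n" +
  "Cookie: session=abc123; theme=dark\r\n\r\n";

const compressed = deflateSync(Buffer.from(headers));
console.log(`raw: ${headers.length} bytes, deflated: ${compressed.length} bytes`);
```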
SPDY allows servers to send back additional responses without the client asking for them. For example, a server might start sending back the CSS for a page along with the original HTML, before the client has had a chance to receive and parse the HTML to determine the stylesheet URL. This can speed up page loads even further by eliminating the need for the client to actually parse the HTML before requesting other resources needed to render the page. It also supports a less bandwidth-heavy version of this feature where it can "hint" about which resources might be needed, and allow the client to decide: this allows, for example, clients that don't care about images to not bother to request them, but clients that want to display images can still request the images using the given URLs without needing to wait for the HTML.
Other things too: see William Chan's answer for even more.
HTTP pipelining is susceptible to head of line blocking (http://en.wikipedia.org/wiki/Head-of-line_blocking) at the HTTP transaction level whereas SPDY only has head of line blocking at the transport level, due to its use of multiplexing.
HTTP pipelining has deployability issues. See https://datatracker.ietf.org/doc/html/draft-nottingham-http-pipeline-01 which describes a number of different workarounds and heuristics to mitigate this. SPDY as deployed in the wild does not have this problem since it is generally deployed over SSL (port 443) using NPN (http://technotes.googlecode.com/git/nextprotoneg.html) to negotiate SPDY support. SSL is key, since it prevents intermediaries from interfering.
SPDY has header compression. See http://dev.chromium.org/spdy/spdy-whitepaper which discusses some benchmark results of the benefits of header compression. Now, it's useful to note that bandwidth is less and less of an issue (see http://www.belshe.com/2010/05/24/more-bandwidth-doesnt-matter-much/), but it's also useful to remember that bandwidth is still key for mobile. Check out https://developers.google.com/speed/articles/spdy-for-mobile which shows how beneficial SPDY is for mobile.
SPDY supports features like server push. See http://dev.chromium.org/spdy/spdy-best-practices for ways to use server push to improve cacheability of content and still reduce roundtrips.
HTTP pipelining has ill-defined failure semantics. When the server closes the connection, how do you know which requests have been successfully processed? This is a major reason why POST and other non-idempotent requests are not allowed over pipelined connections. SPDY provides semantics to cancel individual streams on the same connection, and also has a GOAWAY frame which indicates the last stream to be successfully processed.
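Node's built-in http2 module exposes the HTTP/2 descendant of that mechanism; a minimal sketch (the host is a placeholder) of observing which streams the server actually processed:

```typescript
import { connect } from "node:http2";

const session = connect("https://example.com");

session.on("goaway", (errorCode, lastStreamID) => {
  // Streams with an ID higher than lastStreamID were never processed
  // by the server and can safely be retried on a new connection.
  console.log(`GOAWAY: error ${errorCode}, last processed stream ${lastStreamID}`);
});

const req = session.request({ ":path": "/" });
req.end();
req.on("response", (headers) => console.log(`status: ${headers[":status"]}`));
req.resume(); // discard the body
req.on("close", () => session.close());
```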
HTTP pipelining has difficulty, often due to intermediaries, in allowing deep pipelines. This (in addition to many other reasons like HoL blocking) means that you still need to utilize multiple TCP connections to achieve maximal parallelization. Using multiple TCP connections means that congestion control information cannot be shared, that compression contexts cannot be shared (as SPDY does with headers), and that it is worse for the internet (more costly for intermediaries and servers).
I could go on and on about HTTP pipelining vs SPDY. But I'd recommend just reading up on SPDY. Check out http://dev.chromium.org/spdy and our tech talk on SPDY at http://www.youtube.com/watch?v=TNBkxA313kk&list=PLE0E03DF19D90B5F4&index=2&feature=plpp_video.
See Difference between HTTP pipeling and HTTP multiplexing with SPDY
Being curious, I wonder why HTTP, by design, can only handle one pending request per socket.
I understand that this limitation exists because there is no 'Id' to associate a request with its response, so the only way to match a response with its request is to send the responses back on the same socket, in the same order in which the requests were sent. There would be no way to match a response to its request if there were more than one pending request on the socket, because we may not receive the responses in the same order the requests were sent.
If the protocol had been designed with a matching 'Id' for requests and responses, there could be multiple pending requests on a single socket. This could greatly reduce the number of sockets used by internet browsers and applications using web services.
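A toy sketch of that id-based matching (everything here is made up for illustration; it is not how HTTP/1.x works, though HTTP/2 frames do carry exactly this kind of stream id):

```typescript
interface TaggedRequest { id: number; path: string; }
interface TaggedResponse { id: number; body: string; }

// Requests awaiting their responses, keyed by id.
const pending = new Map<number, (body: string) => void>();
let nextId = 1;

// Send a request over a shared socket; resolve when its response comes back.
function send(path: string, write: (req: TaggedRequest) => void): Promise<string> {
  const id = nextId++;
  write({ id, path });
  return new Promise((resolve) => pending.set(id, resolve));
}

// Called for every response read off the socket, in whatever order it arrives.
function onResponse(res: TaggedResponse): void {
  pending.get(res.id)?.(res.body);
  pending.delete(res.id);
}
```

With an id, responses can arrive in any order and still find their requests, so one socket could carry many pending requests.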
Was HTTP designed like this for simplicity even if it's less efficient or am I missing something and this is the best approach?
Thanks.
Not true. Read about HTTP/1.1 pipelining. Apache implements it and Firefox implements it, although Firefox disables it by default.
To turn it on in Firefox, use about:config and type 'pipelining' in the filter.
see: http://www.mozilla.org/projects/netlib/http/pipelining-faq.html
It's basically for simplicity; various proposals have been made over the years that multiplex on the same connection (e.g. SPDY) but none have taken off yet.
One problem with sending multiple requests on a single socket is that it would cause inefficient queuing.
For instance, let's say you are in a store with 2 cashiers and 10 people waiting to be checked out. The ideal way to organise the line is to have a single queue of 10 people, where the next person in line goes to a cashier as soon as one becomes available. However, if you sent all the requests at once, you would probably send 5 people to cashier A and 5 to cashier B. And what if you sent the 5 people with the largest shopping carts to the same cashier? That's bad queuing, and it's what could happen if you queued a bunch of requests on a single socket.
NOTE: I'm not saying that you couldn't use queuing well, but it keeps it simple to do it right if there is no queuing on a single socket.
There are a few considerations I would review.
The first is related to the nature of TCP itself. TCP suffers from a 'head-of-line blocking' issue, where there can only be a single outstanding (unacknowledged) request in flight at the connection/TCP level. Given traditional latencies, this can be a problem for load time and user experience compared to the parallel-connection scheme browsers employ today. The higher the latency of the link, the larger the impact of this fundamental limitation.
There is also a concurrency issue, in that sometimes you really want to load multiple resources incrementally / in parallel. Back in the day, one of the greatest features Mozilla had over Mosaic was that it would load images and objects incrementally, so you could begin to see what was going on and use a resource without having to wait for it to load. With fewer connections there is a risk that, for example, loading a large image on a page before a style sheet arrives could be catastrophic from an experience point of view. Expecting some kind of mitigating intelligence or explicit configuration to optimally order requests may not be a realistic or ideal solution.
There are proposals such as HTTP over SCTP that will more or less totally correct the issue you raise at the transport level.
Also realize that HTTP doesn't necessarily mandate a Content-Length header to serve data. Even if each HTTP response were ID'd, how would you manage streaming binary content with no content length (HTTP/1.0 style), or responses where the Connection: close header is used and the end of the body is signalled by closing the connection because the length is not known?
To manage this, you would have to use HTTP chunking (which already exists) for multiplexing (I don't think anyone implements this), and that adds non-trivial work to many programs.
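For reference, the "HTTP chunk" mechanism mentioned above is what HTTP/1.1 servers fall back to when no Content-Length is known; a minimal sketch with Node's built-in http module:

```typescript
import { createServer } from "node:http";

const server = createServer((req, res) => {
  // No Content-Length is set, so Node uses Transfer-Encoding: chunked.
  res.writeHead(200, { "content-type": "text/plain" });
  let count = 0;
  const timer = setInterval(() => {
    res.write(`chunk ${++count}\n`); // each write is sent as its own chunk
    if (count === 5) {
      clearInterval(timer);
      res.end(); // a zero-length chunk marks the end of the body
    }
  }, 100);
});

server.listen(8080);
```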