HTTP/2 and NGINX: when would I use a keepalive directive?

I am experimenting with migrating to HTTP/2. I have set up a WordPress website and configured NGINX to serve it over HTTP/2 (using SSL/TLS).
My understanding of HTTP/2 is that by default it uses one connection to transfer a number of files, rather than opening new connections and repeating SSL handshakes for each file.
To achieve a similar effect for HTTP/1.x, NGINX offered the keepalive directive.
So does using the keepalive directive make sense when using HTTP/2 exclusively? Checking my config files with "nginx -t" reports no errors when I include it. But does it have any effect? Benchmarking showed no difference.

You're misunderstanding how this works.
Keep-alives work between requests.
When you download a web page, the browser fetches the HTML and then discovers it needs, say, another 20 resources (CSS files, JavaScript files, images, fonts, etc.).
Under HTTP/1.1 you can only request one resource at a time per connection, so typically the web browser opens another 5 connections (giving 6 in total) and requests 6 of those 20 resources, then requests the remaining 14 as those connections free up. Yes, keep-alives help in between those requests, but that's not their only use, as we'll discuss below. The overhead of setting up those connections is small but noticeable, and there is a delay in only being able to request 6 of those 20 resources at a time. This is why HTTP/1.1 is inefficient for today's usage of the web, where a typical page is made up of 100 resources.
Under HTTP/2 we can fire off all 20 requests at once on the same connection, so there are some good gains there. And yes, technically you don't really benefit from keep-alives in between those requests, as the connection is still in use until they all arrive - though you do still benefit in the small gap between the first HTML request and the other 20.
However, after that initial load there are likely to be more requests, either because you are browsing around the site or because you interact with the page and it makes additional XHR API calls. Those will benefit from keep-alives, whether on HTTP/1.1 or HTTP/2.
So HTTP/2 doesn't negate the need for keep-alives. It negates the need for multiple connections (amongst other things).
So the answer is to always use keep-alives unless you have a very good reason not to. And what type of benchmarking are you doing to say it makes no difference?
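For what it's worth, rather than relying only on a benchmark, you can watch connection reuse directly. A minimal sketch, assuming Python with the httpx and h2 packages installed, and substituting your own HTTP/2-enabled site for the placeholder URL and paths:

```python
# Sketch: observe connection reuse over HTTP/2 (requires: pip install "httpx[http2]").
# The URL and paths are placeholders; point them at your own HTTP/2-enabled site.
import httpx

with httpx.Client(http2=True) as client:
    for path in ["/", "/style.css", "/app.js"]:
        r = client.get("https://example.com" + path)
        # All requests share one TCP+TLS connection; keep-alive keeps it
        # open between them instead of paying a new handshake each time.
        print(path, r.status_code, r.http_version)
```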

Related

Why bundle optimizations are no longer a concern in HTTP/2

I read in the bundling section of the SystemJS documentation that bundling optimizations are no longer needed in HTTP/2:
Over HTTP/2 this approach may be preferable as it allows files to be
individually cached in the browser meaning bundle optimizations are no
longer a concern.
My questions:
Does it mean we don't need to think about bundling scripts or other resources when using HTTP/2?
What is it in HTTP/2 that enables this?
The bundling optimization was introduced as a "best practice" when using HTTP/1.1 because browsers could only open a limited number of connections to a particular domain.
A typical web page has 30+ resources to download in order to be rendered.
With HTTP/1.1, a browser opens 6 connections to the server, requests 6 resources in parallel, waits for those to be downloaded, then requests another 6 resources, and so forth (of course, some resources will download faster than others, and those connections can be reused earlier for another request).
The point being that with HTTP/1.1 you can only have at most 6 outstanding requests.
To download 30 resources you would need 5 roundtrips, which adds a lot of latency to the page rendering.
In order to make the page rendering faster, with HTTP/1.1 the application developer had to reduce the number of requests for a single page.
This led to "best practices" such as domain sharding, resource inlining, image spriting, resource bundling, etc., but these are in fact just clever hacks to work around HTTP/1.1 protocol limitations.
With HTTP/2 things are different because HTTP/2 is multiplexed.
Even without HTTP/2 Push, the multiplexing feature of HTTP/2 renders all those hacks useless, because now you can request hundreds of resources in parallel using a single TCP connection.
With HTTP/2, the same 30 resources would require just 1 roundtrip to be downloaded, giving you a straight 5x performance increase in that operation (that typically dominates the page rendering time).
Given that the trend of web content is to be richer, it will have more resources; the more resources, the better HTTP/2 will perform with respect to HTTP/1.1.
On top of HTTP/2 multiplexing, you have HTTP/2 Push.
Without HTTP/2 Push, the browser has to request the primary resource (the *.html page), download it, parse it, and then arrange to download the 30 resources referenced by the primary resource.
HTTP/2 Push allows you to get the 30 resources while you are requesting the primary resource that references them, saving one more roundtrip, again thanks to the HTTP/2 multiplexing.
It is really the multiplexing feature of HTTP/2 that allows you to forget about resource bundling.
You can look at the slides of the HTTP/2 session that I gave at various conferences.
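As a small illustration of that multiplexing point, here is a sketch assuming Python with httpx/h2 installed and hypothetical asset paths: it issues 30 requests concurrently and lets a single HTTP/2 connection carry them as parallel streams.

```python
# Sketch: request many resources in parallel over one HTTP/2 connection.
# Requires: pip install "httpx[http2]". The URL and paths are placeholders.
import asyncio
import httpx

async def main():
    paths = [f"/asset-{i}.js" for i in range(30)]  # stand-ins for the 30 resources
    async with httpx.AsyncClient(http2=True, base_url="https://example.com") as client:
        # With HTTP/2 these 30 requests are multiplexed as streams on a
        # single connection instead of being queued 6 at a time.
        responses = await asyncio.gather(*(client.get(p) for p in paths))
    for r in responses:
        print(r.request.url.path, r.status_code, r.http_version)

asyncio.run(main())
```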
HTTP/2 supports "server push", which obsoletes bundling of resources. So, yes, if you are using HTTP/2, bundling would actually be an anti-pattern.
For more info check this: https://www.igvita.com/2013/06/12/innovating-with-http-2.0-server-push/
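One common way to trigger a push from application code is the Link preload response header, which push-capable front ends (for example nginx with http2_push_preload enabled) can turn into an actual HTTP/2 push. A minimal sketch, assuming Flask and a hypothetical /static/app.css asset:

```python
# Sketch: emit a preload hint that an HTTP/2-capable front end may turn into a push.
# Assumes Flask is installed; the asset path is hypothetical.
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/")
def index():
    resp = make_response("<html><head><link rel='stylesheet' href='/static/app.css'></head></html>")
    # "Link: </static/app.css>; rel=preload; as=style" tells a push-capable
    # proxy (e.g. nginx with http2_push_preload on) to push the CSS.
    resp.headers["Link"] = "</static/app.css>; rel=preload; as=style"
    return resp
```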
Bundling does a lot in a modern JavaScript build.
HTTP/2 only addresses the optimisation of minimising the number of requests between the client and server, by making the cost of additional requests much cheaper than with HTTP/1.
But bundling today is not only about minimising the count of requests between the client and the server. Two other relevant aspects are:
Tree Shaking: Modern bundlers like webpack and Rollup can eliminate unused code (even from 3rd-party libraries).
Compression: Bigger JavaScript bundles can be compressed more effectively (gzip, zopfli, ...); a quick sketch of this effect follows below.
Also, HTTP/2 server push can waste bandwidth by pushing resources that the browser does not need, because it still has them in its cache.
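To make that compression point concrete, here is a quick sketch using only the Python standard library; the two repeated snippets are made-up stand-ins for real JavaScript modules:

```python
# Sketch: one bigger bundle usually compresses better than separate small files.
# The two snippets are made-up stand-ins for real JavaScript modules.
import gzip

module_a = b"export function add(a, b) { return a + b; }\n" * 20
module_b = b"export function sub(a, b) { return a - b; }\n" * 20

separate = len(gzip.compress(module_a)) + len(gzip.compress(module_b))
bundled = len(gzip.compress(module_a + module_b))

print("compressed separately:   ", separate, "bytes")
print("compressed as one bundle:", bundled, "bytes")  # typically smaller
```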
Two good posts about the topic are:
http://engineering.khanacademy.org/posts/js-packaging-http2.htm
https://www.contentful.com/blog/2017/04/04/es6-modules-support-lands-in-browsers-is-it-time-to-rethink-bundling/
Both those posts come to the conclusion that "build processes are here to stay".
Bundling is still useful if your website is
Served over plain HTTP (browsers only support HTTP 2.0 over HTTPS)
Hosted by a server that does not support ALPN and HTTP 2.
Required to support old browsers (Sensitive and Legacy Systems)
Required to support both HTTP 1 and 2 (Graceful Degradation)
There are two HTTP 2.0 features that make bundling obsolete:
HTTP 2.0 Multiplexing and Concurrency (allows multiple resources to be requested on a single TCP connection)
HTTP 2.0 Server Push (Server push allows the server to preemptively push the responses it thinks the client will need into the client's cache)
PS: Bundling is not the only optimization technique made obsolete by the arrival of HTTP 2.0 features. Techniques like image spriting, domain sharding and resource inlining (image embedding through data URIs) will also be affected.
How HTTP 2.0 affects existing web optimization techniques

What is the use of HTTP non-persistent connection mode

It may seem to be a trivial question but still.. I have a confusion over it.
Almost every site I have read says that HTTP persistent (keep-alive) connections are better than non-persistent ones.
Ques: So why do non-persistent connections even exist?
Some say that persistent connections have a disadvantage if the server is serving many clients, as other users are deprived of connections.
Ques: All the popular websites serve millions of clients; does that mean they don't use persistent mode?
As per my understanding, search engines may not be using persistent connections.
Can someone please enlighten me on this topic.
Another doubt I have is regarding HTTP requests. I have read that if a page contains links to several objects, then the web browser makes that many requests to fetch all of them (this is why persistent connections are used). My doubt is: why aren't all the objects embedded in the page and sent as one object? If the argument is that it makes the page heavy and not bandwidth friendly, then the browser opens parallel connections to fetch multiple objects anyway, which puts the same load on the network.
OK, I understand that this cannot be done for something like image search, but if a page contains only a few objects, can we embed them into the page and send it?
These may seem like foolish questions, but I can't help it. I have a doubt and I need to clear it up, and you can help.
Thanks
The original HTTP specification always uses non-persistent connections; HTTP/1.1 added persistence because it is more efficient for web pages that embed a lot of external objects (which were rare when HTTP/1.0 was written.)
However, even though HTTP/1.1 allows persistent connections, there are implementations that don't support them, or which still only support HTTP/1.0. In HTTP/1.1 connections are persistent by default, and Connection: close is sent to disable that; with HTTP/1.0 peers, the Connection: keep-alive header has to be sent explicitly to enable the feature.
It is possible to include media directly in the HTML by base64 encoding the data and including it in a data: URL. This is not usually done because it slows down your web browser. With a standard HTML page, the browser can start rendering the structure of the page without waiting for the (rather large) inline data: links to download.
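As a concrete example of that inlining technique, here is a small sketch using the Python standard library; the logo.png file name is just a placeholder:

```python
# Sketch: inline an image into HTML as a base64 data: URL.
# "logo.png" is a hypothetical local file.
import base64

with open("logo.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

img_tag = f'<img src="data:image/png;base64,{encoded}" alt="logo">'
print(img_tag[:80] + "...")  # the markup grows with the size of the image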
As you say, most of the web pages hosted on the internet do not handle just a small amount of data, and nobody can estimate that in advance; the HTTP server should be generic, and it should have a mechanism to avoid repeating requests just to satisfy a page's dependencies. The claim that the non-persistent method avoids a single client blocking a port for a long time while the server has to serve many other clients, and that persistence would therefore put a lot of stress on the server, is not true: persistent connections actually reduce the load on a server by cutting down the number of connections it has to set up and tear down.
Hope this article on HTTP persistent connections helps you understand.

Is SPDY any different from HTTP multiplexing over keep-alive connections?

HTTP/1.1 supports keep-alive connections: connections are not closed until "Connection: close" is sent.
So, if the browser, in this case Firefox, has network.http.pipelining enabled and network.http.pipelining.maxrequests increased, isn't the effect the same in the end?
I know that these settings are disabled by default because for some websites this could increase load, but I think a simple HTTP header flag could tell the browser that it is OK to use multiplexing, and this problem could be solved more easily.
Wouldn't it be easier to change the default settings in browsers than to invent a new protocol that increases complexity, especially in HTTP servers?
SPDY has a number of advantages that go beyond what HTTP pipelining can offer, which are described in the SPDY whitepaper:
With pipelining, the server still has to return the responses one at a time in the order they were requested. This can be a problem if the client requests a resource that's dynamically generated before one that is static: the server cannot send any of the "easy" static responses until the dynamically generated one has been generated and sent. With SPDY, responses can be returned out of order or in parallel as they are generated, lowering the total time to receive all resources.
As you noted in your question, not all servers are able to deal with pipelining: it's not just load, some servers actually behave incorrectly when the client requests pipelining. Using a header to indicate that it's okay to do pipelining is too late to get the maximum benefit: you are already receiving the first response at that point, so while you can use it on future connections it's already too late for this one.
SPDY compresses headers using an algorithm which is specific to that task (stateful and with knowledge of what is normally in HTTP headers); while yes, SSL already includes compression, just compressing them with deflate is not as efficient. Most HTTP requests have no bodies and only a short GET line, so the headers make up virtually the entire request: any compression you can get is an improvement. Many responses are also small compared to their headers.
SPDY allows servers to send back additional responses without the client asking for them. For example, a server might start sending back the CSS for a page along with the original HTML, before the client has had a chance to receive and parse the HTML to determine the stylesheet URL. This can speed up page loads even further by eliminating the need for the client to actually parse the HTML before requesting other resources needed to render the page. It also supports a less bandwidth-heavy version of this feature where it can "hint" about which resources might be needed, and allow the client to decide: this allows, for example, clients that don't care about images to not bother to request them, but clients that want to display images can still request the images using the given URLs without needing to wait for the HTML.
Other things too: see William Chan's answer for even more.
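To see why in-order delivery hurts, here is a toy simulation in plain Python of the head-of-line blocking described above; the response generation times are made up, and transmission time is ignored for simplicity:

```python
# Toy simulation of head-of-line blocking: pipelining vs. multiplexing.
# Generation times are made up; transmission time is ignored for simplicity.
gen_times = {"page.php": 0.50, "style.css": 0.01, "app.js": 0.01}

# Pipelining: responses must be returned in request order, so the fast
# static files cannot be sent until the slow dynamic page is ready.
pipelined, finish = {}, 0.0
for name, t in gen_times.items():
    finish = max(finish, t)      # wait for this response and everything requested before it
    pipelined[name] = finish

# Multiplexing (SPDY/HTTP/2): each response goes out as soon as it is generated.
multiplexed = dict(gen_times)

print("pipelined completion times:  ", pipelined)
print("multiplexed completion times:", multiplexed)
```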
HTTP pipelining is susceptible to head of line blocking (http://en.wikipedia.org/wiki/Head-of-line_blocking) at the HTTP transaction level whereas SPDY only has head of line blocking at the transport level, due to its use of multiplexing.
HTTP pipelining has deployability issues. See https://datatracker.ietf.org/doc/html/draft-nottingham-http-pipeline-01 which describes a number of different workarounds and heuristics to mitigate this. SPDY as deployed in the wild does not have this problem since it is generally deployed over SSL (port 443) using NPN (http://technotes.googlecode.com/git/nextprotoneg.html) to negotiate SPDY support. SSL is key, since it prevents intermediaries from interfering.
SPDY has header compression. See http://dev.chromium.org/spdy/spdy-whitepaper which discusses some benchmark results of the benefits of header compression. Now, it's useful to note that bandwidth is less and less of an issue (see http://www.belshe.com/2010/05/24/more-bandwidth-doesnt-matter-much/), but it's also useful to remember that bandwidth is still key for mobile. Check out https://developers.google.com/speed/articles/spdy-for-mobile which shows how beneficial SPDY is for mobile.
SPDY supports features like server push. See http://dev.chromium.org/spdy/spdy-best-practices for ways to use server push to improve cacheability of content and still reduce roundtrips.
HTTP pipelining has ill-defined failure semantics. When the server closes the connection, how do you know which requests have been successfully processed? This is a major reason why POST and other non-idempotent requests are not allowed over pipelined connections. SPDY provides semantics to cancel individual streams on the same connection, and also has a GOAWAY frame which indicates the last stream to be successfully processed.
HTTP pipelining has difficulty, often due to intermediaries, in allowing deep pipelines. This (in addition to many other reasons like HoL blocking) means that you still need to utilize multiple TCP connections to achieve maximal parallelization. Using multiple TCP connections means that congestion control information cannot be shared and compression contexts cannot be shared (like SPDY does with headers), and it is worse for the internet overall (more costly for intermediaries and servers).
I could go on and on about HTTP pipelining vs SPDY. But I'd recommend just reading up on SPDY. Check out http://dev.chromium.org/spdy and our tech talk on SPDY at http://www.youtube.com/watch?v=TNBkxA313kk&list=PLE0E03DF19D90B5F4&index=2&feature=plpp_video.
See Difference between HTTP pipelining and HTTP multiplexing with SPDY

HTTP Keep-Alive for <1KB calls every 1 second

I am optimizing my web server settings to handle a large number of concurrent users, and one of the problems I'm running into is deciding whether or not to disable HTTP Keep-Alive.
I am using a CDN for all the images on the site, so when my HTML page is requested I download approximately 5 files (js, css, etc.) on first load... and then only the HTML on each successive load.
Other than that, the only thing I have is an HTTP POST update invoked every second (the resulting JSON is typically less than 1 KB).
So, with these constraints - would you think that disabling HTTP Keep-Alive on the server would be a good idea? Would that improve the number of concurrent users the server can handle?
(By the way, I've reduced KeepAliveTimeout/ConnectionTimeout to 15 seconds in the IIS 7.5 settings)
From what you are describing, you are making one call per client per second, so it all boils down to how long it takes to serve a request. Let's say it takes 100 ms. That means that with an HTTP Keep-Alive timeout of 15 seconds, 15 calls are accommodated without re-establishing the connection, but the connection is actually active (being used) for only 1.5 of those 15 seconds; the rest of the time you are effectively blocking some other client/connection (assuming there are other clients waiting). Without keep-alive, you can probably accommodate 8-9 times more concurrent clients.
However, all that said, you have to look at the actual parameters to make a decision: how many concurrent clients you are likely to have, what the response time is, and so on. The best way is to do simulation/load testing to measure the performance, because if your server can handle the anticipated maximum concurrent user load with keep-alive enabled, you might as well keep keep-alive on.
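To make that arithmetic explicit, here is a back-of-the-envelope sketch in Python; the numbers are the assumptions from above, not measurements:

```python
# Back-of-the-envelope: connection usage with vs. without keep-alive.
# The numbers are the assumptions from the answer above, not measurements.
service_time = 0.1        # seconds it takes to serve one request
call_interval = 1.0       # one POST per client per second
keepalive_timeout = 15.0  # seconds (the IIS setting from the question)

calls_per_window = keepalive_timeout / call_interval
busy_fraction = service_time / call_interval

print(f"{calls_per_window:.0f} calls reuse one connection within a 15-second keep-alive window")
print(f"but the connection is busy only {busy_fraction:.0%} of the time it is held open")
print(f"so dropping keep-alive could serve roughly {1 / busy_fraction:.0f}x more clients "
      f"per connection (8-9x once you pay for reconnecting)")
```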
BTW, also see this related question on SO: http keep-alive in the modern age

The number of HTTP requests that need to be made for downloading a webpage?

Are all assets (html files, js files, css files, images) in one webpage transmitted through a single HTTP request/response, or through multiple HTTP requests/responses, one for each asset?
Assume there is no XHR in that webpage.
All the digital assets on a web document are transmitted in separate HTTP requests. However, modern web servers and browsers are able to reuse the same TCP connection using HTTP keep-alive.
Conceptually, each asset is a separate request. In practice, most servers allow the browser to reuse the same physical socket connection for multiple requests (but they are still issued one after the other), and this can significantly improve performance (because you need extra round-trips to establish a connection, and subsequent requests can piggy-back on the ACKs for the previous request: you cut down on a lot of round-trips).
But yes, there is always one request/response per asset on the page.
On connections with high latency (e.g. Australia -> U.S.) the number of round-trips can be a significant bottleneck, and that's why things like CSS sprites are widely used.
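To watch that reuse from the client side, here is a small sketch using the Python requests library; the example.com paths are placeholders:

```python
# Sketch: several assets, several HTTP requests, one reused TCP connection.
# Requires: pip install requests. The example.com paths are placeholders.
import requests

assets = ["/", "/style.css", "/app.js", "/logo.png"]

with requests.Session() as session:          # a Session keeps the connection alive
    for path in assets:
        r = session.get("https://example.com" + path)
        # Each asset is still its own request/response pair; only the
        # underlying socket (and TLS handshake) is shared between them.
        print(path, r.status_code, len(r.content), "bytes")
```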
It's one request per asset, but you can use multiple TCP connections to send multiple HTTP requests in parallel. In fact, all browsers do exactly that.
I'd recommend downloading Firebug for Firefox, then watching its 'Net' tab while you browse some sites. It would answer this question and many more.
