HTTP/2 domain sharding without hurting performance (CDN)

Most articles treat domain sharding as something that hurts performance under HTTP/2, but that's not entirely true. A single connection can be reused for different domains under certain conditions:
they resolve to the same IP
for a secure connection, the same certificate must cover both domains
https://www.rfc-editor.org/rfc/rfc7540#section-9.1.1
Is that correct? Is anyone using it?
And what about CDNs? Can I get any guarantee that they direct a user to the same server (IP)?

Yup, that's one of the benefits of HTTP/2: in theory it allows you to keep sharding for HTTP/1.1 users and automatically unshard for HTTP/2 users.
The reality is a little more complicated, as always, due mostly to implementation issues and to servers resolving to different IP addresses, as you state. This blog post is a few years old now but describes some of the issues: https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/. Maybe it has improved since then, but I would imagine issues still exist. Also, new features like the ORIGIN frame should help, but they are not widely supported yet.
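To make the preconditions concrete, here is a minimal sketch of the server side (this assumes nginx, and the hostnames and certificate paths are made up for illustration; nothing here comes from the original question): both sharded hosts are served by the same server block, so they resolve to the same IP, and the certificate's Subject Alternative Names cover both names, which is what gives an HTTP/2 browser the chance to coalesce them onto one connection.

    # Hypothetical example: one server block answers for both sharded hosts.
    server {
        listen 443 ssl http2;   # on nginx 1.25.1+ you would use a separate "http2 on;" directive
        server_name www.example.com static.example.com;

        # The certificate must list both names in its SAN field,
        # otherwise browsers will open a second connection.
        ssl_certificate     /etc/ssl/example/fullchain.pem;
        ssl_certificate_key /etc/ssl/example/privkey.pem;

        root /var/www/example;
    }

Even then, whether a given browser actually coalesces still depends on DNS returning the same IP for both names, so this is a necessary rather than sufficient condition.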
I think, however, it's worth revisiting the assumption that sharding is actually good for HTTP/1.1 in the first place. The costs of setting up new connections (DNS lookup, TCP setup, TLS handshake and then actually sending the HTTP messages) are not immaterial, and studies have shown that the browser's six-connections-per-domain limit is rarely even fully used, never mind the extra connections added by sharding. Concatenation, spriting and inlining are usually much better options, and these can still be used for HTTP/2. Trying it on your site and measuring is the best way to be sure!
Incidentally, it is for these reasons (and security) that I'm less keen on loading common libraries (e.g. jQuery, Bootstrap, etc.) from their public CDNs rather than hosting them locally. In my opinion, the performance benefit of a user already having the exact version your site uses cached is overstated.
As with all these things, HTTP/1.1 will still work without sharded domains. It may (arguably) be slower, but it won't break. Most users are likely on HTTP/2 by now, so is it really worth adding the complexity for the minority of users? Is this not a case for progressively enhancing your site for people on modern browsers (and encouraging those who aren't to upgrade)? For larger sites (e.g. Google, Facebook, etc.) the minority may still represent a large number of users, and the complexity may be worth it (they also have the resources and expertise to deal with it). For the rest of us, my recommendation is not to shard: upgrade to new protocols like HTTP/2 when they become common (as they are now!), but otherwise keep complexity down.

Related

Can nginx reverse proxy isolate cache per endpoint?

I'm using nginx reverse proxy to cache content from two endpoints, one of which is very reliable; the other has frequent timeouts.
I've found that those timeouts can sometimes use up all available connections or cause other issues, degrading performance for the server as a whole and leading to increased latency for the reliable endpoint as well.
I've tweaked some settings (worker_rlimit_nofile, worker_connections), but what I'd really like to do is isolate the caching and connections for the two endpoints as much as possible: give each a share of the available cache, and a share of the available connections, and operate as if they're hitting two separate servers, to reduce the chances that issues with one endpoint affect the performance of the other.
If I were to create two location blocks, one for each endpoint, can I designate each block's share of the cache (e.g. number of files, or total size) and share of available connections?
Or is there a better way of achieving this goal of isolation to ensure reliable performance for the good endpoint, even if the bad endpoint is experiencing lots of timeouts?
Most of the proxy_cache_* directives can be set per location block, which will allow you to do just that.
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache
It may also help others answer if an example config is provided that reflects what you're currently doing.
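In the meantime, here is a rough sketch of the kind of isolation described above (the upstream names, cache paths, sizes and timeouts are all made-up placeholders, not taken from your setup): each endpoint gets its own cache zone, and the flaky upstream gets its own connection cap and tighter timeouts so its failures stay contained.

    # Hypothetical sketch, inside the http {} block: two separately sized cache zones.
    proxy_cache_path /var/cache/nginx/reliable keys_zone=reliable_cache:10m max_size=2g;
    proxy_cache_path /var/cache/nginx/flaky    keys_zone=flaky_cache:10m    max_size=500m;

    upstream reliable_backend {
        server reliable.internal:8080;
    }

    upstream flaky_backend {
        # Cap concurrent connections so timeouts can't exhaust the worker's connection budget.
        server flaky.internal:8080 max_conns=32;
    }

    server {
        listen 80;

        location /reliable/ {
            proxy_pass http://reliable_backend;
            proxy_cache reliable_cache;
        }

        location /flaky/ {
            proxy_pass http://flaky_backend;
            proxy_cache flaky_cache;

            # Fail fast and serve stale content rather than piling up connections.
            proxy_connect_timeout 2s;
            proxy_read_timeout    5s;
            proxy_cache_use_stale error timeout updating;
        }
    }

Note that max_conns requires a reasonably recent nginx (1.11.5+), and the sizes and timeouts are starting points to tune against your actual traffic.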

How to add HTTP/2 in G-WAN

I would like to know if it's possible to make G-WAN 100% compatible with HTTP/2 by using, for example, nghttp2 (https://nghttp2.org).
Sorry for the late answer: for some reason Stack Overflow did not notify us about this question, and I found it only because a more recent one was notified.
I have not looked at this library so I can't tell for sure if it can be used without modifications, but it could certainly be used as the basis of an event-based G-WAN protocol handler.
But, from a security point of view, there are severe issues with HTTP/2, and this is why we have not implemented it in G-WAN: HTTP/2 over TLS lets different hosts share the same TCP connection, even if they weren't listed in the original TLS certificate.
That may be handy for legitimate applications, but it's a problem for security: DoH (DNS over HTTPS) prevents users from blocking (or even detecting) unwanted hosts at the traditional DNS-request level (the "hosts" file in various operating systems).
In fact, this new HTTP standard defeats the purpose of SSL certificates, and defeats domain-name monitoring and blacklisting.
Is it purely a theoretical threat?
Google ads have been used in the past to inject malware designed to attack both the client and server sides.

HTTP/2 protocol impact on web development?

I would like to bring your attention to something I have been rethinking for days: the new features of the HTTP/2 protocol and their impact on web development. I would also like to ask some related questions, because my annual planning is getting less accurate because of HTTP/2.
Since HTTP/2 uses a single, multiplexed connection instead of the multiple connections of HTTP/1.x, domain sharding techniques will not be needed any more.
With HTTP/1.x you may have already put files on different domains to increase parallelism in file transfer to the web browser; content delivery networks (CDNs) do this automatically. But it doesn't help, and can hurt, performance under HTTP/2.
Q1: Will HTTP/2 minimize the need for CDNs?
Concatenating code files: code chunks that would normally be maintained and transferred as separate files are combined into one. The browser then finds and runs the needed code within the concatenated file as needed.
Q2. Will HTTP/2 eliminate the need to concatenate files with similar extensions (CSS, JavaScript) and the use of great tools like Grunt and Gulp to do so?
Q3. Also, in order to simplify and keep the question compact, I would ask quite generally: what other impacts of HTTP/2 on web development do you foresee?
Q1: Will HTTP/2 minimize the need for CDNs?
It will certainly shift the balance a bit, provided that you use the right software. I talk about balance because CDNs cost money and management time.
If you are using CDNs to offload traffic you still will need them to offload traffic.
If you are a smallish website (and most websites are, in numerical terms), you will have less of a reason to use a CDN, as latency can be hidden quite effectively with HTTP/2 (provided that you deploy it correctly). HTTP/2 is even better than SPDY; check this article for a use case regarding SPDY.
Also, most of the third-party content that we incorporate into our sites already uses CDNs.
Q2. Will HTTP/2 eliminate the need to concatenate files with similar extensions (CSS, JavaScript) and the use of great tools like Grunt and Gulp to do so?
Unfortunately not. Concatenating things won't be needed, unless the files you are delivering are extremely small, say a few hundred bytes. Everything else is still relevant, including minification and adding those ugly query strings for cache busting.
Q3. Also, in order to simplify and keep the question compact, I would ask quite generally: what other impacts of HTTP/2 on web development do you foresee?
This is a tricky question. On the one hand, HTTP/2 arrives at a moment when the web is mature and developers have whole stacks of things to take care of. HTTP/2 can be seen as one small piece that can be swapped in without the entire stack crumbling. Indeed, I can imagine many teams selling HTTP/2 to management this way ("It won't be a problem, we promise!").
But from a technical standpoint, HTTP/2 allows for better development workflows. For example, the multiplexing nature of HTTP/2 means that most of the contents of a site can be served over a single connection, allowing some servers to learn about the relationships between assets just by observing browser behaviour. That information can be used together with other features of HTTP/2 and the modern web (specifically, HTTP/2 PUSH and preload headers) to hide a lot of latency. Think about how much work that can save developers interested in performance.
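For illustration only, here is roughly what the preload-header side of this looks like (this sketch assumes nginx and a made-up asset path, neither of which comes from the answer above): the server announces a critical asset via a Link header that the browser, and push-capable servers, can act on early.

    # Hypothetical sketch: announce a critical asset via a preload header.
    location = /index.html {
        add_header Link "</css/main.css>; rel=preload; as=style";
        # On nginx builds that still support HTTP/2 push, this turns the
        # preload hint into an actual server push:
        # http2_push_preload on;
    }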
Q1: Will HTTP/2 minimize the need for CDNs?
No. CDNs exist primarily to locate content close to the user geographically. The closer you are to the server, the faster you will get the content.
Q2. Will HTTP/2 eliminate the need to concatenate files with similar extensions (CSS, JavaScript) and the use of great tools like Grunt and Gulp to do so?
Concatenation is only part of what a tool like Grunt or Gulp does. Linting, conversions and running tests are other things you would still need a tool for, so they will stay. In terms of concatenation, you would ideally move away from creating a single large concatenated file per type and towards creating smaller concatenated files per module.
Q3. Also, in order to simplify and keep the question compact, I would ask quite generally: what other impacts of HTTP/2 on web development do you foresee?
The general idea is that HTTP/2 will not make a huge change to the way we develop things, as it is a protocol-level change. Developers would ideally remove optimizations (such as bundling and sharding) that are no longer optimization techniques under HTTP/2.

Since HTTP 2.0 is rolling out, are tricks like asset bundling still necessary?

How can we know how many browsers support HTTP 2.0?
How can we know how many browsers support HTTP 2.0?
A simple Wikipedia search will tell you. They cover at least 60% of the market, and probably more once you pick apart the browsers with less than 10% share. That's pretty good for something that's only been a standard for a month.
This is a standard people have been waiting for for a long time. It's based on an existing protocol, SPDY, that's had some real world vetting. It gives some immediate performance boosts, and performance in browsers is king. Rapid adoption by browsers and servers is likely. Everyone wants this. Nobody wants to allow their competitors such a significant performance edge.
Since HTTP 2.0 is rolling out, are tricks like asset bundling still necessary?
HTTP/2 is designed to solve many of the existing performance problems of HTTP/1.1. There should be less need for tricks to bundle multiple assets together into one HTTP request.
With HTTP/2, multiple requests can be performed over a single connection. An HTTP/2 server can also push extra content to the client before the client requests it, allowing it to pre-load page assets in a single request, even before the HTML is downloaded and parsed.
This article has more details.
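As a small illustration of that push mechanism (this assumes nginx, a hypothetical stylesheet path, and a server version that still implements HTTP/2 push; none of this is from the answer above), the server can send an asset alongside the page it belongs to:

    # Hypothetical sketch: push the stylesheet as soon as the page is requested.
    location = / {
        http2_push /css/main.css;
    }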
When can we move on to the future of technologies and stop those dirty optimizations designed mainly for HTTP 1?
Three things have to happen.
Chrome has to turn on their support by default.
This will happen quickly. Then give a little time for the upgrade to trickle out to your users.
You have to use HTTPS everywhere.
Most browsers right now only support HTTP/2 over TLS. I think everyone was expecting HTTP/2 to only work encrypted to force everyone to secure their web sites. Sort of a carrot/stick, "you want better performance? Turn on basic security." I think the browser makers are going to stick with the "encrypted only" plan anyway. It's in their best interest to promote a secure web.
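Concretely, "HTTPS everywhere plus HTTP/2" usually just means terminating TLS and enabling the protocol there. A minimal sketch (assuming nginx; the domain and certificate paths are placeholders I've invented):

    # Hypothetical sketch: redirect plain HTTP and negotiate HTTP/2 over TLS only.
    server {
        listen 80;
        server_name example.com;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name example.com;
        ssl_certificate     /etc/ssl/example/fullchain.pem;
        ssl_certificate_key /etc/ssl/example/privkey.pem;
    }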
You have to decide what percentage of your users get degraded performance.
Unlike something like CSS support, HTTP/2 support does not affect your content; its benefits are mostly performance. You don't need HTTP/1.1 hacks. Your site will still look and act the same under HTTP/1.1 if you get rid of them. It's up to you when you want to stop putting in the extra work to maintain them.
Like any other hack, hopefully your web framework is doing it for you. If you're manually stitching together icons into a single image, you're doing it wrong. There are all sorts of frameworks which should make this all transparent to you.
It doesn't have to be an all-or-nothing thing either. As the percentage of HTTP/1.1 connections to your site drops, you can do a cost/benefit analysis and start removing the HTTP/1.1 optimizations which are the most hassle and the least benefit. The ones that are basically free, leave them in.
Like any other web protocol, the question is how fast will people upgrade? These days, most browsers update automatically. Mobile users, and desktop Firefox and Chrome users, will upgrade quickly. That's 60-80% of the market.
As always, IE is the problem. While the newest version of IE already supports HTTP/2, it's only available in Windows 10 which isn't even out yet. All those existing Windows users will likely never upgrade. It's not in Microsoft's best interest to backport support into old versions of Windows or IE. In fact, they just announced they're replacing IE. So that's probably 20% of the web population permanently left behind. The statistics for your site will vary.
Large institutional installations like governments, universities and corporations will also be slow to upgrade. Regardless of what browser they have standardized on, they often disable automatic updates in order to more tightly control their environment. If this is a large chunk of your users, you may not be willing to drop the HTTP/1.1 hacks for years.
It will be up to you to monitor how people are connecting to your web site, and how much effort you want to put into optimizing it for an increasingly shrinking portion of your users. The answer is "it depends on who your users are" and "whenever you decide you're ready".

HTTPS instead of HTTP?

I'm new to web security.
Why would I want to use HTTP and then switch to HTTPS for some connections?
Why not stick with HTTPS all the way?
There are interesting configuration improvements that can make SSL/TLS less expensive, as described in this document (apparently based on work from a team from Google: Adam Langley, Nagendra Modadugu and Wan-Teh Chang): http://www.imperialviolet.org/2010/06/25/overclocking-ssl.html
If there's one point that we want to communicate to the world, it's that SSL/TLS is not computationally expensive any more. Ten years ago it might have been true, but it's just not the case any more. You too can afford to enable HTTPS for your users.
In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy no additional machines and no special hardware. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. Many people believe that SSL takes a lot of CPU time and we hope the above numbers (public for the first time) will help to dispel that.
If you stop reading now you only need to remember one thing: SSL/TLS is not computationally expensive any more.
One false sense of security when using HTTPS only for login pages is that you leave the door open to session hijacking (admittedly, it's still better than sending the username/password in the clear); this has recently been made easier (or more popular) by tools such as Firesheep, for example (although the problem itself has existed for much longer).
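If you do go HTTPS-only, the usual hardening that closes this window is an HSTS header plus Secure (and HttpOnly) session cookies. A rough sketch, assuming an nginx front end proxying to a local application (the names, paths and the use of nginx are my assumptions, not part of the answer above):

    # Hypothetical sketch: force HTTPS on return visits and keep cookies off plain HTTP.
    server {
        listen 443 ssl;
        server_name example.com;
        ssl_certificate     /etc/ssl/example/fullchain.pem;
        ssl_certificate_key /etc/ssl/example/privkey.pem;

        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

        location / {
            proxy_pass http://127.0.0.1:8080;
            # Requires nginx 1.19.3+; marks all proxied cookies Secure and HttpOnly.
            proxy_cookie_flags ~ secure httponly;
        }
    }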
Another problem that can slow down HTTPS is the fact that some browsers might not cache the content they retrieve over HTTPS, so they would have to download it again (e.g. background images for the sites you visit frequently).
This being said, if you don't need the transport security (preventing attackers from seeing or altering the data that's exchanged, either way), plain HTTP is fine.
If you're not transmitting data that needs to be secure, the overhead of HTTPS isn't necessary.
Check this SO thread for a very detailed discussion of the differences.
HTTP vs HTTPS performance
Mostly performance reasons. SSL requires extra (server) CPU time.
Edit: However, this overhead is becoming less of a problem these days; some big sites have already switched to HTTPS by default (e.g. GMail; see Bruno's answer).
And one more, no less important, thing: the firewall. Don't forget that HTTPS is usually served on port 443.
In some organizations such ports are not opened in the firewall or on transparent proxies.
HTTPS can be very slow, and unnecessary for things like images.

Resources