With plain HTTP, cookieless domains are an optimization to avoid unnecessarily sending cookie headers for page resources.
However, the SPDY protocol compresses HTTP headers and in some cases eliminates unnecessary headers. My question then is, does SPDY make cookieless domains irrelevant?
Furthermore, should the page source and all of its resources be hosted at the same domain in order to optimize a SPDY implementation?
Does SPDY make cookieless domains irrelevant?
Sort of, mostly... But not entirely.
First off, there are at least two good reasons for using "cookieless domains": one is to avoid the extra headers and reduce the size of the request, the other is to avoid leaking any private or secure information about the user. Each is valid independently of the other. So, clearly, there is still a reason to have a "cookieless domain" under HTTP 2.0 for security and privacy.
Further, compression is not a magic bullet either. Establishing a compression / decompression context is not free, and depending on the compression scheme used, allocated buffer sizes, etc., a large cookie could completely destroy the performance of the compressor. Up to spdy/v3, a gzip compressor (sliding window) was used, and given a large enough cookie, you would see a negative impact on the compressor's performance (the degree varies by browser, based on implementation). In spdy/v4, the gzip compressor is out and an entirely new algorithm is being implemented from scratch -- since v4 is not out yet, it's too early to speculate about the specifics of its performance. Having said that, in most cases you should be fine... I'm just highlighting the edge cases.
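To get a rough feel for the effect, here's a minimal sketch (plain zlib standing in for the spdy/v3-era gzip header compressor; the header names and cookie values are made up) comparing the compressed size of a request header block with no cookie, a small cookie, and a multi-kilobyte high-entropy cookie:

```python
import base64
import os
import zlib

def compressed_size(headers: dict) -> int:
    """Serialize the headers and compress them with a fresh zlib (gzip-family) context."""
    block = "".join(f"{k}: {v}\r\n" for k, v in headers.items()).encode()
    return len(zlib.compress(block, 6))

base = {
    ":method": "GET",
    ":path": "/static/app.js",
    ":host": "assets.example.com",                 # hypothetical host
    "user-agent": "Mozilla/5.0 (example browser)",
    "accept": "*/*",
}
small = dict(base, cookie="sid=abc123")
# ~4 KB of high-entropy data, which compresses poorly -- much like a real tracking cookie.
large = dict(base, cookie="prefs=" + base64.b64encode(os.urandom(3072)).decode())

print("no cookie:   ", compressed_size(base), "bytes")
print("small cookie:", compressed_size(small), "bytes")
print("large cookie:", compressed_size(large), "bytes")
```

The absolute numbers don't matter; the point is how quickly a fat cookie comes to dominate the compressed request.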
Furthermore, should the page source and all of its resources be hosted at the same domain in order to optimize a SPDY implementation?
Yes, to the extent possible - that'll give you the best performance. There are caveats here as well: high packet loss to the origin server, or a high bandwidth-delay product (BDP) without window scaling. But chances are, if you're using a reasonable hosting provider with good connectivity, neither of these should be an issue.
Related
Most articles treat domain sharding as something that hurts performance under HTTP/2, but that's not entirely true. A single connection can be reused for different domains under certain conditions:
they resolve to the same IP
for secure connections, the same certificate must cover both domains
https://www.rfc-editor.org/rfc/rfc7540#section-9.1.1
Is that correct? Is anyone using it?
And what about CDNs? Can I have any guarantee that they will direct a user to the same server (IP)?
Yup, that's one of the benefits of HTTP/2: in theory it allows you to keep sharding for HTTP/1.1 users and automatically unshard for HTTP/2 users.
The reality is, as always, a little more complicated - due mostly to implementation issues and to servers resolving to different IP addresses, as you state. This blog post is a few years old now but describes some of the issues: https://daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing/. Maybe things have improved since then, but I would imagine issues still exist. Newer features like the ORIGIN frame should also help, but they are not widely supported yet.
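If you want to check whether your own shards actually meet the two coalescing conditions listed in the question, here is a rough standard-library sketch (a.example.com and b.example.com are placeholder shard names, and wildcard certificate matching is ignored for brevity):

```python
import socket
import ssl

def resolved_ips(host):
    """All addresses the name currently resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)}

def cert_sans(host):
    """DNS names listed in the server certificate's subjectAltName."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return {value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"}

a, b = "a.example.com", "b.example.com"   # placeholder shard names
share_ip = bool(resolved_ips(a) & resolved_ips(b))
cert_covers_both = b in cert_sans(a)      # ignores wildcard matching for brevity
print("share an IP:", share_ip, "| certificate covers both:", cert_covers_both)
```

Even when both checks pass, browsers differ in whether they actually coalesce, so treat this as a hint rather than a guarantee.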
I think, however, it's worth revisiting the assumption that sharding is actually good for HTTP/1.1. The costs of setting up new connections (DNS lookup, TCP setup, TLS handshake and then actually sending the HTTP messages) are not immaterial, and studies have shown the 6-connection browser limit is rarely even fully used, never mind adding more connections by sharding. Concatenation, spriting and inlining are usually much better options, and these can still be used for HTTP/2. Trying it on your site and measuring is the best way of being sure of this!
Incidentally, it is for these reasons (and security) that I'm less keen on using common libraries (e.g. jQuery, Bootstrap, etc.) from their CDNs instead of hosting them locally. In my opinion, the performance benefit of a user already having the exact version your site uses cached is overstated.
With all these things, HTTP/1.1 will still work without sharded domains. It may (arguably) be slower, but it won't break. But most users are likely on HTTP/2, so is it really worth adding the complexity for the minority of users? Is this not a way of progressively enhancing your site for people on modern browsers (and encouraging those who are not to upgrade)? For larger sites (e.g. Google, Facebook, etc.) the minority may still represent a large number of users, and the complexity is worth it (and they have the resources and expertise to deal with it). For the rest of us, my recommendation is not to shard, to upgrade to new protocols like HTTP/2 when they become common (like now!), but otherwise to keep complexity down.
Here's the situation. I have clients over a secured network (HTTPS) that talk to multiple backends. Now, I want to set up a reverse proxy, mainly for load balancing (based on header data or cookies) and a little caching. So I thought Varnish could be of use.
But Varnish does not support SSL connections. As I've read in many places: "Varnish does not support SSL termination natively". However, I want every connection, i.e. client-to-Varnish and Varnish-to-backend, to be over HTTPS. I cannot have plaintext data anywhere on the network (there are restrictions), so nothing else can be used as an SSL terminator (or can it?).
So, here are the questions:
Firstly, what does it mean (if someone can explain in simple terms) that "Varnish does not support SSL termination natively"?
Secondly, is this scenario good to implement using varnish?
And finally, if Varnish is not a good contender, should I switch to some other reverse proxy? If yes, which one would be suitable for the scenario (HAProxy, Nginx, etc.)?
what does this mean (if someone can explain in simple terms) that "Varnish does not support SSL termination natively"
It means Varnish has no built-in support for SSL. It can't operate in a path with SSL unless the SSL is handled by separate software.
This is an architectural decision by the author of Varnish, who discussed his contemplation of integrating SSL into Varnish back in 2011.
He based this on a number of factors, not the least of which was wanting to do it right if at all, while observing that the de facto standard library for SSL is openssl, which is a labyrinthine collection of over 300,000 lines of code, and he was neither confident in that code base, nor in the likelihood of a favorable cost/benefit ratio.
His conclusion at the time was, in a word, "no."
That is not one of the things I dreamt about doing as a kid and if I dream about it now I call it a nightmare.
https://www.varnish-cache.org/docs/trunk/phk/ssl.html
He revisited the concept in 2015.
His conclusion, again, was "no."
Code is hard, crypto code is double-plus-hard, if not double-squared-hard, and the world really don't need another piece of code that does an half-assed job at cryptography.
...
When I look at something like Willy Tarreau's HAProxy I have a hard time to see any significant opportunity for improvement.
No, Varnish still won't add SSL/TLS support.
Instead in Varnish 4.1 we have added support for Willys PROXY protocol which makes it possible to communicate the extra details from a SSL-terminating proxy, such as HAProxy, to Varnish.
https://www.varnish-cache.org/docs/trunk/phk/ssl_again.html
This enhancement could simplify integrating Varnish into an environment with encryption requirements, because it provides another mechanism for preserving the original browser's identity in an offloaded SSL setup.
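For illustration, the PROXY protocol (v1) is just a single text line prepended to the TCP stream ahead of the HTTP bytes, carrying the original client address. A minimal sketch of building and parsing such a line (the addresses are made-up examples):

```python
def build_proxy_v1(client_ip, client_port, proxy_ip, proxy_port):
    """The PROXY protocol v1 header line, as sent by e.g. HAProxy ahead of the HTTP request."""
    return f"PROXY TCP4 {client_ip} {proxy_ip} {client_port} {proxy_port}\r\n".encode()

def parse_proxy_v1(line):
    """Extract the original client address from a v1 header line."""
    parts = line.decode().rstrip("\r\n").split(" ")
    assert parts[0] == "PROXY" and parts[1] in ("TCP4", "TCP6")
    return {"client_ip": parts[2], "server_ip": parts[3],
            "client_port": int(parts[4]), "server_port": int(parts[5])}

header = build_proxy_v1("203.0.113.7", 54321, "192.0.2.10", 443)   # example addresses
print(header)                   # b'PROXY TCP4 203.0.113.7 192.0.2.10 54321 443\r\n'
print(parse_proxy_v1(header))   # the original client address survives the SSL offload hop
```

In practice you never write this yourself - the SSL-terminating proxy emits it and Varnish 4.1+ can parse it on a listener configured for PROXY - but it shows how little is actually added to the stream.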
is this scenario good to implement using varnish?
If you need Varnish, use it, being aware that SSL must be handled separately. Note, though, that this does not necessarily mean that unencrypted traffic has to traverse your network... though that does make for a more complicated and CPU-hungry setup.
nothing else can be used as SSL-Terminator (or can be?)
The SSL can be offloaded in front of Varnish and re-established behind it, all on the same machine running Varnish, but in separate processes, using HAProxy, stunnel, nginx, or other solutions in front of and behind Varnish. Any traffic in the clear then operates within the confines of one host, so it is arguably not a point of vulnerability if the host itself is secure, since it never leaves the machine.
if varnish is not a good contender, should I switch to some other reverse proxy
This is entirely dependent on what you want and need in your stack, its cost/benefit to you, your level of expertise, the availability of resources, and other factors. Each option has its own set of capabilities and limitations, and it's certainly not unheard-of to use more than one in the same stack.
So I'm running a static landing page for a product/service I'm selling, and we're advertising using AdWords & similar. Naturally, page load speed is a huge factor here to maximize conversions.
Pros of HTTP/2:
Headers are compressed, so less data is sent.
Server Push allows the server to send resources before they are requested, which has many benefits, such as replacing base64-inlined images, sprites, etc.
Multiplexing over a single connection significantly improves load time.
Cons of HTTP/2:
Mandatory TLS, which slows down load speed.
So I'm torn. On one side, HTTP/2 has many improvements. On the other, maybe it would be faster to keep avoiding unnecessary TLS and continue using base64/sprites to reduce requests.
The total page size is ~1MB.
Would it be worth it?
The performance impact of TLS on modern hardware is negligible. Transfer times will most likely be network-bound. It is true that additional network round-trips are required to establish a TLS session but compared to the time required to transfer 1MB, it is probably negligible (and TLS session tickets, which are widely supported, also save a round-trip).
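As a rough illustration of the resumption point, Python's ssl module exposes the session object, so a sketch like this (example.com is a placeholder; with TLS 1.3 the ticket may only arrive after some data has been exchanged) can show whether a second connection used the abbreviated handshake:

```python
import socket
import ssl

HOST = "example.com"   # placeholder; point it at your own server
ctx = ssl.create_default_context()

def connect(session=None):
    sock = socket.create_connection((HOST, 443), timeout=5)
    return ctx.wrap_socket(sock, server_hostname=HOST, session=session)

first = connect()
saved = first.session            # session/ticket captured from the full handshake
first.close()

second = connect(session=saved)
print("resumed:", second.session_reused)   # True when the extra round-trip was skipped
second.close()
```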
The evidence is that reducing load time is definitely worth the effort (see the business case for speed).
The TLS session setup is a pain, and it is unfortunate that the browser vendors are insisting on it, as there is nothing in HTTP/2 that prevents plain text. For a low-load system, where CPU costs are not the limiting factor, TLS essentially costs you one RTT (network round-trip time).
HTTP/2, and especially HTTP/2 push, can save you many RTTs and thus can be a big win even with the TLS cost. But the best way to determine this is to try it on your page. Make sure you use an HTTP/2 server that supports push (e.g. Jetty), otherwise you don't get all the benefits. Here is a good demo of push with SPDY (which is the same mechanism as in HTTP/2):
How many HTTP requests do these 1,000 KB require? With a page that large, I don't think it matters much for the end-user experience. TLS is here to stay though... I don't think you should avoid it just because it may slow your site down. If you do it right, it won't slow your site down.
Read more about SSL not being slow anymore: https://istlsfastyet.com/
Mandatory TLS doesn't slow down page load speed if it's SPDY/3.1 or HTTP/2 based, since both support multiplexing request streams. Only non-SPDY, non-HTTP/2 TLS would be slower than non-HTTPS.
Check out https://community.centminmod.com/threads/nginx-spdy-3-1-vs-h2o-http-2-vs-non-https-benchmarks-tests.2543/, which clearly illustrates why SPDY/3.1 and HTTP/2 over TLS are faster for overall page loads. HTTP/2 allows multiplexing over several hosts at the same time, while SPDY/3.1 only allows multiplexing per host.
The best thing to do is test both non-HTTPS and HTTP/2 or SPDY/3.1 over HTTPS and see which is best for you. Since you have a static landing page, testing is that much easier to do. You can do something similar to the page at https://h2ohttp2.centminmod.com/flags.html, where HTTP/2, SPDY and non-HTTPS are all set up on the same server, so you can test all the combinations and compare them.
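If you'd rather script the comparison than rely on an online tester, a rough sketch like this gives a quick per-protocol number (it uses the third-party httpx package, installed with pip install "httpx[http2]", and it only times the single document, not the full waterfall of subresources a browser would fetch):

```python
import time
import httpx   # pip install "httpx[http2]"

URL = "https://example.com/"   # point this at your own landing page

def timed_fetch(use_http2):
    start = time.perf_counter()
    with httpx.Client(http2=use_http2) as client:
        client.get(URL).raise_for_status()
    return time.perf_counter() - start

print("HTTP/1.1:", round(timed_fetch(False), 3), "s")
print("HTTP/2:  ", round(timed_fetch(True), 3), "s")
```

For anything beyond a smoke test, a full-page load in the browser dev tools will give a far more realistic picture.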
I'm new to web security.
Why would I want to use HTTP and then switch to HTTPS for some connections?
Why not stick with HTTPS all the way?
There are interesting configuration improvements that can make SSL/TLS less expensive, as described in this document (apparently based on work from a team from Google: Adam Langley, Nagendra Modadugu and Wan-Teh Chang): http://www.imperialviolet.org/2010/06/25/overclocking-ssl.html
If there's one point that we want to communicate to the world, it's that SSL/TLS is not computationally expensive any more. Ten years ago it might have been true, but it's just not the case any more. You too can afford to enable HTTPS for your users.

In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy no additional machines and no special hardware. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. Many people believe that SSL takes a lot of CPU time and we hope the above numbers (public for the first time) will help to dispel that.

If you stop reading now you only need to remember one thing: SSL/TLS is not computationally expensive any more.
One false sense of security when using HTTPS only for login pages is that it leaves the door open to session hijacking (admittedly, it's still better than sending the username/password in the clear); this has recently been made easier to do (or at least more popular) by Firesheep, for example (although the problem itself has been there for much longer).
Another problem that can slow down HTTPS is the fact that some browsers might not cache the content they retrieve over HTTPS, so they would have to download it again (e.g. background images for sites you visit frequently).
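Worth noting: that caching behaviour is something you can influence from the server side by marking HTTPS responses as explicitly cacheable. A minimal sketch with Python's standard-library server (the max-age value is arbitrary, and TLS is assumed to be terminated in front of it):

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CachingHandler(SimpleHTTPRequestHandler):
    """Serve static files with an explicit Cache-Control header, so browsers
    are told the responses may be cached even when delivered over HTTPS."""

    def end_headers(self):
        # 'public' plus a max-age signals that caching the response is fine.
        self.send_header("Cache-Control", "public, max-age=86400")
        super().end_headers()

if __name__ == "__main__":
    # Plain HTTP on localhost for brevity; in practice TLS sits in front of this.
    HTTPServer(("127.0.0.1", 8000), CachingHandler).serve_forever()
```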
This being said, if you don't need the transport security (preventing attackers from seeing or altering the data that's exchanged, either way), plain HTTP is fine.
If you're not transmitting data that needs to be secure, the overhead of HTTPS isn't necessary.
Check this SO thread for a very detailed discussion of the differences.
HTTP vs HTTPS performance
Mostly performance reasons. SSL requires extra (server) CPU time.
Edit: However, this overhead is becoming less of a problem these days, some big sites already switched to HTTPS-per-default (e.g. GMail - see Bruno's answer).
And one thing that is no less important: the firewall. Don't forget that HTTPS usually runs on port 443.
In some organizations, that port is not open in the firewall or the transparent proxies.
HTTPS can be very slow, and unnecessary for things like images.
I have a web application where the client will be running off a local server (i.e. requests will not be going out over the net). The site will be quite low traffic, so I am trying to figure out if the actual compression and decompression is expensive in this type of system. Performance is an issue, so I will have caching set up, but I was considering compression as well. I will not have bandwidth issues as the site is very low traffic. So, I am just trying to figure out if compression will do more harm than good in this type of system.
Here's a good article on the subject.
On pretty much any modern system with a solid web stack, compression will not be expensive, but it seems to me that you won't be gaining any positive effects from it whatsoever, no matter how minor the overhead. I wouldn't bother.
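If you want to put an actual number on the overhead for your own payloads, a quick sketch like this (the JSON payload is made up) measures both the time spent compressing and the size reduction:

```python
import gzip
import json
import time

# A made-up payload roughly resembling an API response.
payload = json.dumps(
    [{"id": i, "name": f"item-{i}", "active": i % 2 == 0} for i in range(5000)]
).encode()

start = time.perf_counter()
compressed = gzip.compress(payload, compresslevel=6)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"original:   {len(payload):,} bytes")
print(f"compressed: {len(compressed):,} bytes "
      f"({100 * len(compressed) / len(payload):.0f}% of original)")
print(f"compression took {elapsed_ms:.2f} ms")
```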
When you measured the performance, how did the numbers compare? Was it faster when you had compression enabled, or not?
I have used compression, but users were running over a wireless 3G network at various remote locations. Compression made a significant difference to the bandwidth usage in this case.
For users running locally, and with bandwidth not an issue, I don't think it is worth it.
For cacheable resources (.js, .html, .css files), I think it doesn't add much, since the browser caches these resources after the first load.
But for non-cacheable resources (e.g. JSON responses), I think it makes sense.