How do Download Managers download huge files on HTTP without multiple requests?

I was downloading a 200 MB file yesterday with FlashGet, and its statistics showed that it was using the HTTP/1.1 protocol.
I was under the impression that HTTP is a request-response protocol most commonly used for web pages weighing a few KiB. I don't quite understand how it can download MBs or GBs of data, and do so simultaneously through 5 (or more) different streams.

HTTP/1.1 has a "Range" header that can specify what part of a file to transfer over the connection. The download manager can make multiple connections, specifying different ranges to transfer. It would then combine the chunks together to build the full file.
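As a rough illustration of that technique (not how FlashGet itself is implemented), here is a minimal Python sketch using the third-party requests library; the URL, part count and output filename are invented for the example:

    import concurrent.futures
    import requests

    URL = "https://example.com/big-file.bin"   # invented URL for the example
    NUM_PARTS = 5

    def fetch_range(start, end):
        # Ask the server for just this byte range; a server that supports
        # ranges answers with "206 Partial Content".
        headers = {"Range": f"bytes={start}-{end}"}
        resp = requests.get(URL, headers=headers, timeout=60)
        resp.raise_for_status()
        return start, resp.content

    # Learn the total size first (assumes the server sends Content-Length).
    total = int(requests.head(URL, timeout=60).headers["Content-Length"])
    part_size = total // NUM_PARTS
    ranges = [(i * part_size,
               total - 1 if i == NUM_PARTS - 1 else (i + 1) * part_size - 1)
              for i in range(NUM_PARTS)]

    # Fetch the parts over separate connections in parallel, then stitch
    # them back together in offset order.
    with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_PARTS) as pool:
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))

    with open("big-file.bin", "wb") as out:
        for start, data in sorted(parts):
            out.write(data)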

There is no size limit in HTTP. It is used for web pages, but it also delivers the vast majority of content on the Internet. It is bandwidth that limits sizes, not the protocol itself. And of course, this was more of a limit in the early days (and, I suppose, for those still on dial-up).
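Even over a single connection, a client can stream an arbitrarily large response body to disk in small chunks rather than holding it in memory. A minimal Python sketch with the requests library, using a placeholder URL:

    import requests

    url = "https://example.com/huge-video.mkv"   # placeholder URL

    # stream=True keeps the body out of memory; read it piece by piece
    # and write it straight to disk.
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open("huge-video.mkv", "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):   # 1 MiB chunks
                out.write(chunk)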

These links might help:
HTTP
HTTP Persistent Connections
Chunked Transfer Encoding

Related

How does partial download work in HTTP/2?

HTTP/2 is good for downloading multiple resources, as it supports multiplexing.
We currently use HTTP/1.1 and do range-based downloads (multiple partial downloads) using the Range header.
We create multiple connections for the download.
We are planning to move to HTTP/2. Does multiplexing help here? Can we download all the partial chunks over a single connection?
In short, my question is:
In HTTP/2 I can get multiple resources like .html, .css, .js, etc. over one connection, whereas in HTTP/1.1 I need different connections for them. Now, when I do range-based downloads: HTTP/1.1 creates a connection for each part, while in HTTP/2 all the parts of a single file will be downloaded over one connection. Is that correct?
If you believe that multiple connections actually speed things up (which in general should not be the case, unless there's packet loss or throttling), then you'll have to do the same in HTTP/2 (HTTP/2 multiplexing uses a single TCP connection).
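To illustrate the single-connection case mentioned above, here is a hedged Python sketch using the third-party httpx library with its optional HTTP/2 support; the URL and byte ranges are invented, and whether the requests are actually multiplexed over one TCP connection depends on the server negotiating HTTP/2:

    import asyncio
    import httpx

    URL = "https://example.com/big-file.bin"   # invented URL

    async def fetch_range(client, start, end):
        # Each coroutine issues its own Range request; httpx sends them as
        # separate streams over the client's single HTTP/2 connection.
        resp = await client.get(URL, headers={"Range": f"bytes={start}-{end}"})
        resp.raise_for_status()
        return start, resp.content

    async def main():
        ranges = [(0, 999_999), (1_000_000, 1_999_999), (2_000_000, 2_999_999)]
        async with httpx.AsyncClient(http2=True) as client:
            parts = await asyncio.gather(
                *(fetch_range(client, s, e) for s, e in ranges))
        with open("big-file.bin", "wb") as out:
            for _, data in sorted(parts):
                out.write(data)

    asyncio.run(main())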

What is the use of HTTP non-persistent connection mode?

It may seem a trivial question, but I am still confused about it.
Almost every site I have read says that HTTP persistent (keep-alive) connections are better than non-persistent ones.
Question: So why do non-persistent connections even exist?
Some say that persistence has a disadvantage when a server is serving many clients, as other users are deprived of a connection.
Question: All the popular websites serve millions of clients; does that mean they don't use persistent mode?
As per my understanding, I can imagine search engines might not be using persistent connections.
Can someone please enlighten me on this topic?
Another doubt I have is regarding HTTP requests. I have read that if a page contains links to several objects, the web browser makes that many requests to fetch them all (this is why persistent connections are used). My doubt is: why aren't all the objects embedded in the page and sent as one object? If the argument is that this makes the page heavy and isn't bandwidth friendly, then the browser opens parallel connections anyway to fetch the multiple objects, which puts the same load on the network.
OK, I understand that this cannot be done for something like image search, but if a page contains only a few objects, can we embed them into the page and send it?
These may seem like foolish questions, but I have a doubt and I need to clear it up, and you can help.
Thanks
The original HTTP specification always used non-persistent connections; HTTP/1.1 added persistence because it is more efficient for web pages that embed a lot of external objects (which were rare when HTTP/1.0 was written).
However, even though HTTP/1.1 allows persistent connections, there are implementations that don't support them, or that still only support HTTP/1.0. In HTTP/1.1 persistence is the default, and a Connection: close header is sent to disable it; older HTTP/1.0 clients have to send Connection: keep-alive explicitly to opt in.
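To make the header handling concrete, here is a small sketch using Python's standard-library http.client module; the host and paths are placeholders. The first exchange asks the server to close the connection afterwards, while the second block relies on the HTTP/1.1 default and reuses one TCP connection for two requests.

    import http.client

    HOST = "example.com"   # placeholder host

    # Non-persistent style: explicitly ask the server to close afterwards.
    conn = http.client.HTTPConnection(HOST)
    conn.request("GET", "/", headers={"Connection": "close"})
    resp = conn.getresponse()
    resp.read()
    print("first response:", resp.status, resp.getheader("Connection"))
    conn.close()

    # Persistent style (the HTTP/1.1 default): reuse one TCP connection
    # for several requests, reading each body before sending the next.
    conn = http.client.HTTPConnection(HOST)
    for path in ("/", "/index.html"):
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        print(path, resp.status, len(body))
    conn.close()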
It is possible to include media directly in the HTML by base64 encoding the data and including it in a data: URL. This is not usually done because it slows down your web browser. With a standard HTML page, the browser can start rendering the structure of the page without waiting for the (rather large) inline data: links to download.
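For illustration, a short Python sketch that builds such a data: URL from an image file (the filename is a placeholder):

    import base64

    # Read a small image and wrap it in a data: URL that can be pasted
    # straight into an <img src="..."> attribute.
    with open("logo.png", "rb") as f:   # placeholder filename
        encoded = base64.b64encode(f.read()).decode("ascii")

    data_url = "data:image/png;base64," + encoded
    print('<img src="' + data_url + '" alt="logo">')

Note that base64 inflates the payload by roughly a third, which is another reason this is normally reserved for small icons and images.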
As you say, most web pages hosted on the Internet handle only a small amount of data, but nobody can count on that, so an HTTP server has to be generic and needs a mechanism that avoids a separate connection setup for every dependency a page pulls in. The idea that non-persistent mode exists to stop a single client from blocking a port for a long time while the server is under stress serving many other clients is not really true: persistent connections actually reduce the load on a server, because it no longer has to set up and tear down a TCP connection for every single request.
Hope this article on HTTP persistent connections will help you understand.

Does SPDY/HTTP2 concatenate responses?

I have a question about SPDY/HTTP2:
Normally you concatenate multiple CSS and JS files into one file to save requests and get better performance. I heard that SPDY/HTTP2 combines multiple requests into a single response. Does that mean I no longer need to pre-concatenate CSS and JS files, because this is handled by the protocol?
To say it in other words:
Can I use <script src="moduleA.js"></script> and <script src="moduleB.js"></script> with SPDY/HTTP2 in the same way as I would use <script src="allScripts.js"></script> with HTTP/1.1? Is this the same from a response-performance point of view, but with the benefit of caching each file on its own, so that I can change moduleB.js and keep moduleA.js cached?
HTTP/2.0 does not (AFAIK) exist yet - it's still a proposed standard. But it seems likely that it will use similar connection handling to SPDY.
SPDY doesn't concatenate responses; it multiplexes the requests across the same connection - from the network's point of view the effect is the same.
So yes, you don't need to merge the content files by hand, and yes, they will be cached independently.
SPDY/3 and HTTP/2 multiplex requests over the same physical connection.
But even multiplexed, requests may be sent sequentially for each resource, causing major slowdowns due to roundtrip time waits.
Both SPDY3 and HTTP2 have a feature called "Resource Push" (also known as "SPDY Push", not to be confused with "Server Push") that allows related resources to be pushed without the client requesting them, and the Jetty project - I am a committer - is the only one to my knowledge that implements that feature.
You can watch Resource Push in action in this video: http://webtide.intalio.com/2012/10/spdy-push-demo-from-javaone-2012/.
With Resource Push, you save the additional roundtrips needed to get all the different JS files, and you still benefit from the browser cache for each individual file.
The whole point of resource concatenation is exactly to reduce the number of roundtrips necessary to get all the resources needed, and Resource Push helps to solve that problem.
HTTP/2.0 allows for multiplexing, where multiple request/response streams exchange data over the same TCP connection.
Because creating and starting TCP connections is expensive, HTTP/2.0's multiplexing will usually be faster than the semi-parallel downloading of HTTP/1.1, where the browser (re)uses a limited number of TCP connections to perform a given number of requests for resources.
But your mileage may vary. Measure it.
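One hedged way to measure it, sketched with the third-party httpx library; the URLs are placeholders, and whether HTTP/2 is actually negotiated (and multiplexed over one connection) depends on the server:

    import asyncio
    import time
    import httpx

    # Placeholder URLs standing in for a page's many small resources.
    URLS = [f"https://example.com/static/module{i}.js" for i in range(20)]

    async def timed_fetch(http2):
        start = time.perf_counter()
        # With http2=True the concurrent requests can be multiplexed over a
        # single TCP connection, provided the server negotiates HTTP/2.
        async with httpx.AsyncClient(http2=http2) as client:
            responses = await asyncio.gather(*(client.get(u) for u in URLS))
            for r in responses:
                r.raise_for_status()
        return time.perf_counter() - start

    print("HTTP/1.1:", asyncio.run(timed_fetch(False)), "s")
    print("HTTP/2:  ", asyncio.run(timed_fetch(True)), "s")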
As a side note, you might want to reference all your libraries separately while developing and debugging, but bundle and minify the JS/CSS into one file when you deploy.

How many HTTP requests need to be made to download a webpage?

Are all assets (HTML files, JS files, CSS files, images) in one webpage transmitted through a single HTTP request/response, or through multiple HTTP requests/responses, one for each asset?
Assume there is no XHR in that webpage.
All the digital assets on a web document are transmitted in separate HTTP requests. However, modern web servers and browsers are able to reuse the same TCP connection with HTTP keep-alive.
Conceptually, each asset is a separate request. In practice, most servers allow the browser to reuse the same physical socket connection for multiple requests (though they are still issued one after the other), and this can significantly improve performance: establishing a connection costs extra round-trips, and subsequent requests can piggy-back on the ACKs for the previous request, so you cut out a lot of round-trips.
But yes, there is always one request/response per asset on the page.
On connections with high latency (e.g. Australia -> U.S.) the number of round-trips can be a significant bottleneck, and that's why things like CSS sprites are widely used.
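A small Python sketch of that behaviour with the third-party requests library (the site and asset paths are invented): each asset is still its own request/response, but a Session keeps the underlying keep-alive TCP connection open and reuses it.

    import requests

    BASE = "https://example.com"   # placeholder site
    ASSETS = ["/index.html", "/style.css", "/app.js", "/logo.png"]   # invented paths

    # A Session pools connections, so consecutive requests to the same host
    # reuse one keep-alive TCP connection instead of reconnecting each time.
    with requests.Session() as session:
        for path in ASSETS:
            resp = session.get(BASE + path, timeout=30)
            print(path, resp.status_code, len(resp.content), "bytes")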
It's one request per asset, but the browser can use multiple TCP connections to send several HTTP requests in parallel. In fact, all browsers do exactly that.
I'd recommend downloading Firebug for Firefox, then watching its 'Net' tab while you browse some sites. It will answer this question and many more.

How to analyse an HTTP dump?

I have a file that apparently contains some sort of dump of a keep-alive HTTP conversation, i.e. multiple GET requests and responses including headers, containing an HTML page and some images. However, there is some binary junk in between - maybe it's a dump at the TCP or even IP level (I'm not sure how to determine which).
Basically, I need to extract the files that were transferred. Are there any free tools I could use for this?
Use Wireshark.
Look into the file format for its dumps and convert your dump to it. It's very simple; it's called the pcap file format. Then you can open it in Wireshark with no problem and it should be able to recognize the contents. Wireshark supports many dozens if not hundreds of communication formats at various OSI layers (including TCP/IP/HTTP) and is great for this kind of debugging.
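If you do go the conversion route, the classic pcap container really is that simple: a 24-byte global header followed by a 16-byte record header in front of each packet. A hedged Python sketch, assuming you have already split your dump into individual raw IP packets (which is the hard part) - the "packets" argument below is an invented placeholder:

    import struct
    import time

    LINKTYPE_RAW = 101   # record contents are raw IP packets (no Ethernet header)

    def write_pcap(filename, packets):
        # packets: an iterable of bytes objects, one raw IP packet each.
        with open(filename, "wb") as f:
            # Global header: magic, version 2.4, timezone offset, sigfigs,
            # snaplen, link-layer type.
            f.write(struct.pack("<IHHiIII",
                                0xA1B2C3D4, 2, 4, 0, 0, 65535, LINKTYPE_RAW))
            for pkt in packets:
                ts = time.time()
                sec, usec = int(ts), int((ts % 1) * 1_000_000)
                # Record header: seconds, microseconds, captured length,
                # original length.
                f.write(struct.pack("<IIII", sec, usec, len(pkt), len(pkt)))
                f.write(pkt)

    # write_pcap("conversation.pcap", packets)   # "packets" is a placeholder

Once the file opens in Wireshark, File > Export Objects > HTTP can pull the transferred HTML and images back out.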
Wireshark will analyze on the packet level. If you want to analyze on the protocol level, I recommend Fiddler: http://www.fiddlertool.com/fiddler/
It will show you the headers sent, the responses, and will decrypt HTTPS sessions as well. And a ton more.
The Net tab in the Firebug plugin for Firefox might be of use.
