HTTP 1.0 vs 1.1 - http

Could somebody give me a brief overview of the differences between HTTP 1.0 and HTTP 1.1? I've spent some time with both of the RFCs, but haven't been able to pull out a lot of difference between them. Wikipedia says this:
HTTP/1.1 (1997-1999)
Current version; persistent connections enabled by default and works well with proxies. Also supports request pipelining, allowing multiple requests to be sent at the same time, allowing the server to prepare for the workload and potentially transfer the requested resources more quickly to the client.
But that doesn't mean a lot to me. I realize this is a somewhat complicated subject, so I'm not expecting a full answer, but can someone give me a brief overview of the differences at a bit lower level?
By this I mean that I'm looking for the info I would need to know to implement either an HTTP server or application. I'm mostly looking for a nudge in the right direction so that I can figure it out on my own.

Proxy support and the Host field:
HTTP 1.1 has a required Host header by spec.
HTTP 1.0 does not officially require a Host header, but it doesn't hurt to add one, and many applications (proxies) expect to see the Host header regardless of the protocol version.
Example:
GET / HTTP/1.1
Host: www.blahblahblahblah.com
This header is useful because it allows you to route a message through proxy servers, and also because your web server can distinguish between different sites on the same server.
So if you have blahblahblahblah.com and helohelohelo.com both pointing to the same IP address, your web server can use the Host field to determine which site the client wants.
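As a rough illustration of that routing step, here is a minimal sketch in Python. The site table, the document roots and the function names are made up for the example; a real server would also have to handle IPv6 literals and malformed requests.

```python
from typing import Optional

# Hypothetical mapping of host names to document roots (illustrative only).
SITES = {
    "www.blahblahblahblah.com": "/var/www/blah",
    "helohelohelo.com": "/var/www/helo",
}
DEFAULT_ROOT = "/var/www/default"


def parse_host(raw_request: bytes) -> Optional[str]:
    """Return the lower-cased Host value without any port, or None if absent."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            # Drop an optional ":port" suffix (IPv6 literals are not handled here).
            return value.strip().lower().split(":", 1)[0]
    return None


def select_root(raw_request: bytes) -> str:
    """Pick the document root for the site named in the Host header."""
    return SITES.get(parse_host(raw_request), DEFAULT_ROOT)
```

A request without a Host header (as an HTTP 1.0 client may send) simply falls through to the default site.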
Persistent connections:
HTTP 1.1 also allows you to have persistent connections which means that you can have more than one request/response on the same HTTP connection.
In HTTP 1.0 you had to open a new connection for each request/response pair, and the connection was closed after each response. This led to some big efficiency problems because of TCP Slow Start.
OPTIONS method:
HTTP/1.1 introduces the OPTIONS method. An HTTP client can use this method to determine the abilities of the HTTP server. It's mostly used for Cross Origin Resource Sharing in web applications.
Caching:
HTTP 1.0 had support for caching via the If-Modified-Since header.
HTTP 1.1 expands on the caching support a lot by using something called an 'entity tag' (the ETag header).
If two versions of a resource are the same, then they will have the same entity tag.
HTTP 1.1 also adds the If-Unmodified-Since, If-Match, If-None-Match conditional headers.
There are also further additions relating to caching like the Cache-Control header.
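To make the entity tag idea concrete, here is a small sketch of a conditional GET using If-None-Match. Deriving the tag from a hash of the body is just one possible scheme (servers are free to choose their own), the function names are invented for the example, and the header dictionary is assumed to use the exact key "If-None-Match". Weak validators (the W/ prefix) are ignored.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the body (an assumed scheme, not mandated by HTTP)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    candidates = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in candidates or "*" in candidates:
        # The client already holds this version: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body
```

The client stores the ETag from the first 200 response and sends it back in If-None-Match on the next request; as long as the body has not changed, the server can answer with an empty 304.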
100 Continue status:
HTTP/1.1 adds a new status code, 100 Continue. It prevents a client from sending a large request body when it is not even sure whether the server can process the request, or whether the request is authorized. In this case the client sends only the headers (including Expect: 100-continue), and the server replies with 100 Continue to tell the client to go ahead with the body.
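Seen from the server side, the decision could look roughly like the sketch below. The size limit and the function name are assumptions for the example, and the headers are assumed to have been parsed into a dictionary with lower-cased names.

```python
from typing import Dict

MAX_BODY = 1_000_000  # hypothetical upload limit in bytes


def early_reply(headers: Dict[str, str]) -> bytes:
    """Decide what to send once the headers have arrived, before reading any body.

    Returns b"" when no early reply is needed.
    """
    length = int(headers.get("content-length", "0"))
    if length > MAX_BODY:
        # Reject up front so the client never uploads the large body.
        return b"HTTP/1.1 413 Request Entity Too Large\r\nConnection: close\r\n\r\n"
    if headers.get("expect", "").lower() == "100-continue":
        # Tell the waiting client to go ahead and send the body.
        return b"HTTP/1.1 100 Continue\r\n\r\n"
    return b""
```

A client that sent Expect: 100-continue waits for one of these replies (or a timeout) before transmitting the body.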
Much more:
Digest authentication and proxy authentication
Extra new status codes
Chunked transfer encoding
Connection header
Enhanced compression support
Much much more.

HTTP 1.0 (1994)
It is still in use
Can be used by a client that cannot deal with chunked (or compressed) server replies
HTTP 1.1 (1996-2015)
Formalizes many extensions to version 1.0
Supports persistent and pipelined connections
Supports chunked transfers, compression/decompression
Supports virtual hosting (a server with a single IP Address hosting multiple domains)
Supports multiple languages
Supports byte-range transfers; useful for resuming interrupted data transfers
HTTP 1.1 is an enhancement of HTTP 1.0. The following lists the four major improvements:
Efficient use of IP addresses, by allowing multiple domains to be served from a single IP address.
Faster response, by allowing a web browser to send multiple requests over a single persistent connection.
Faster response for dynamically-generated pages, by support for chunked encoding, which allows a response to be sent before its total length is known.
Faster response and great bandwidth savings, by adding cache support.
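For the chunked encoding point, here is a minimal sketch of how a body is framed when its total length is not known in advance. The function names are made up for the example; chunk extensions are skipped and trailers are not supported.

```python
from typing import Iterable, Iterator


def encode_chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece as '<hex length>CRLF<data>CRLF' and end with a zero-length chunk."""
    for part in parts:
        if part:  # an empty piece would look like the terminating chunk
            yield b"%x\r\n" % len(part) + part + b"\r\n"
    yield b"0\r\n\r\n"


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a complete chunked body (no trailer support)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry ';extension' parameters, which are ignored here.
        size = int(data[pos:line_end].split(b";", 1)[0], 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
```

The server sends Transfer-Encoding: chunked instead of Content-Length and can start writing chunks as soon as the first piece of the page is ready.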

For trivial applications (e.g. sporadically retrieving a temperature value from a web-enabled thermometer) HTTP 1.0 is fine for both a client and a server. You can write a bare-bones socket-based HTTP 1.0 client or server in about 20 lines of code.
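To give a feel for how small such a client can be, here is a sketch of a socket-based HTTP 1.0 GET in Python. The function names are invented for the example, and error handling and header parsing are left out.

```python
import socket
from typing import Tuple


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request."""
    # Host is optional in HTTP/1.0 but widely expected, so it is sent anyway.
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def split_response(raw: bytes) -> Tuple[int, bytes]:
    """Split a complete raw response into (status code, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    return int(status_line.split(" ", 2)[1]), body


def http10_get(host: str, path: str = "/", port: int = 80) -> Tuple[int, bytes]:
    """Fetch one resource over a fresh connection and return (status, body)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # in HTTP/1.0 the server closing the connection ends the body
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

Usage would be something like http10_get("thermometer.local", "/temp"), where the host name and path are placeholders for whatever device you are polling.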
For more complicated scenarios HTTP 1.1 is the way to go. Expect a 3 to 5-fold increase in code size for dealing with the intricacies of the more complex HTTP 1.1 protocol. The complexity mainly comes from the need to create, parse, and respond to various headers. You can shield your application from this complexity by having a client use an HTTP library, or a server use a web application server.

A key compatibility issue is support for persistent connections. I recently worked on a server that "supported" HTTP/1.1, yet failed to close the connection when a client sent an HTTP/1.0 request. When writing a server that supports HTTP/1.1, be sure it also works well with HTTP/1.0-only clients.
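The rule a server needs here is short enough to sketch. The function name is made up, and the headers are assumed to be in a dictionary with lower-cased names.

```python
from typing import Dict


def keep_alive(version: str, headers: Dict[str, str]) -> bool:
    """Decide whether to keep the connection open after sending the response."""
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    if "close" in tokens:
        return False  # either version: the client asked for the connection to be closed
    if version == "HTTP/1.1":
        return True  # persistent by default
    # HTTP/1.0 (or anything older): close unless the client explicitly opted in.
    return "keep-alive" in tokens
```

The bug described above corresponds to returning True for an HTTP/1.0 request that carries no Connection: keep-alive header; such a client typically reads until the server closes the connection, so it would hang.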

The first differences I can recall off the top of my head are support for multiple domains running on the same server, and partial resource retrieval, which lets you fetch only part of a resource and so resume or speed up a download (it's what almost every download accelerator does).
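Partial retrieval is driven by the Range request header. Here is a sketch of how a server might resolve a single byte range against a resource of known size; the function name is invented, and multi-range requests are simply treated as unsupported.

```python
from typing import Optional, Tuple


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve 'bytes=first-last' to inclusive offsets, or None if it cannot be satisfied."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None  # unknown unit, or several ranges (not handled in this sketch)
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # Suffix form 'bytes=-N': the last N bytes of the resource.
            count = int(last)
            if count <= 0:
                return None
            start, end = max(size - count, 0), size - 1
        else:
            start = int(first)
            end = min(int(last), size - 1) if last else size - 1
    except ValueError:
        return None
    if start > end or start >= size:
        return None
    return start, end
```

With a result of (start, end) the server would answer 206 Partial Content, send data[start:end + 1], and describe the slice in a Content-Range header of the form "bytes start-end/size". A download accelerator issues several such requests for different slices in parallel.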
If you want to develop an application like a website or similar, you don't need to worry too much about the differences, but you should at least know the difference between the GET and POST verbs.
If you want to develop a browser or an HTTP server, then yes, you will have to know the complete protocol.
If you are only interested in learning the HTTP protocol, I would recommend starting with HTTP/1.1 instead of 1.0.

HTTP 1.1 is the version of the Hypertext Transfer Protocol that succeeded HTTP 1.0. HTTP is the World Wide Web application protocol that runs on top of the Internet's TCP/IP suite of protocols. Compared to HTTP 1.0, HTTP 1.1 provides faster delivery of Web pages and reduces Web traffic.
Web traffic example: if many users access a server for data at the same time, the server may become overloaded and hang. That load is Web traffic.

HTTP 1.1 includes the Host header in its specification, while HTTP 1.0 doesn't officially have a Host header, although nothing prevents a client from adding one.
The Host header is useful because it allows the client to route a message through a proxy server. The other major differences between HTTP 1.0 and 1.1 are:
HTTP 1.1 comes with persistent connections, which means we can have more than one request and response on the same HTTP connection, while in HTTP 1.0 you have to open a new connection for each request and response.
HTTP 1.0 has the Pragma header, while HTTP 1.1 has Cache-Control, which serves a similar purpose.

Related

What does this line mean in rfc2068

source
In addition, the proliferation of incompletely-implemented
applications calling themselves "HTTP/1.0" has necessitated a
protocol version change in order for two communicating applications
to determine each other's true capabilities.
From the RFC:
HTTP has been in use by the World-Wide Web global information initiative since 1990. The first version of HTTP, referred to as HTTP/0.9, was a simple protocol for raw data transfer across the Internet.
Rephrased:
Before HTTP was standardised there were differences in implementations that meant they couldn't always communicate with each other correctly (e.g. certain web-browsers couldn't work with certain web-servers). The RFC article refers to these pre-standardisation implementations as using HTTP/0.9.
HTTP/1.0, as defined by RFC 1945, improved the protocol by allowing messages to be in the format of MIME-like messages, containing metainformation about the data transferred and modifiers on the request/response semantics. However, HTTP/1.0 does not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, and virtual hosts. In addition, the proliferation of incompletely-implemented applications calling themselves "HTTP/1.0" has necessitated a protocol version change in order for two communicating applications to determine each other's true capabilities.
Rephrased:
After HTTP was standardised as HTTP/1.0, the interoperability and compatibility problems certainly improved, but version 1.0 of the protocol simply assumed all HTTP software would be able to use it for their existing applications. Once HTTP/1.0 had been in use for a while, the maintainers of the HTTP protocol specification saw that they needed to extend HTTP to support further use-cases (e.g. proxies, caches, persistent connections, virtual hosts). While these things could be done using the built-in extension mechanisms in HTTP/1.0, they felt a need to increment the version number to HTTP/1.1 so that an implementation would not have to guess whether the remote host supports a feature.
Example
A good example is the Host header in HTTP/1.1, which allows a web server serving from a single IP address and port number to serve up different websites based on the Host header (before HTTP/1.1 existed, web servers could only serve one website per IP address, which is a problem). HTTP/1.0 does allow clients and servers to add their own custom headers, such as Host; however, there is no way for the client or the server to know that the other end actually supports the Host header. But in HTTP/1.1 the Host header was formally added to the specification, so if both the client and server declare they use HTTP/1.1 then each end knows that the other will recognize the Host header and handle it correctly.
So in the HTTP/1.0 days, with custom headers, this is how it would play out if a browser requests www.example.com if it were served from a Shared Webhost:
Browser (to DNS server): "Please give me the IP address for 'www.example.com'"
DNS Server (to browser): "www.example.com is 198.51.100.7"
Browser (to 198.51.100.7): "Hello, I speak HTTP/1.0, please send me index.html for Host: www.example.com"
Server (to browser): "I also speak HTTP/1.0, here is index.html for 'not-actually-example.com'"
As you can see, the browser got not-actually-example.com even though it asked for www.example.com, because the Web-server was using HTTP/1.0 which does not recognize the Host header, even though the web-browser was sending the Host header (as an extension/experimental header). The browser software has no way of knowing if not-actually-example.com is what the user wanted or not.
In human terms, what they're saying is: so many people said they did HTTP 1.0 while they didn't, that nobody knew whether it really was HTTP 1.0 any more when someone said it.
To get out of that, they chose a new number.

Http 1.1 pipelining support

I enabled HTTP pipelining support in Google Chrome and observed some problems in how data is received, even when using big sites like amazon.com. What is the current support for pipelining among major servers? I wonder whether the issues could also be caused by our transparent proxy (Microsoft TMG), although http://technet.microsoft.com/en-us/library/cc302548.aspx mentions that
"ISA Server does not implement pipelining. Client request pipelining is supported, allowing a client to make multiple requests without waiting for each response. However, pipelining when sending requests to the Web server is not supported." I think this should not cause data from a pipelining-aware web server to be received incorrectly.

Varnish + Static HTML Pages

I've recently come across an HTTP web accelerator called Varnish. From what I've read, Varnish speeds up delivery of a website by optimizing every process of HTTP communication with the HTTP server using a reverse proxy configuration.
My question is that if you have a website that has its caching mechanism configured all the way down to static html files then how much more of an effect will Varnish have on this? Does a reverse proxy cut down the work that is performed by the HTTP server to process the request? If you have everything extensively cached on the server-side (HTTP headers, Etags, Expires Headers, Database Caching, Fragment and Page caching) then what more will an HTTP accelerator do to improve on this?
Firstly, we should differentiate between two different types of caching that go on in a normal web system: HTTP caching and server-side caching.
HTTP caching is controlled by HTTP headers, notably as you point out ETag and the various expiry mechanisms (including Expires and various aspects of Cache-Control). This is all covered in RFC 2616 (HTTP), section 13, and allows HTTP caches to return a response to an HTTP request from a client without having to go back to the origin server. In effect, the HTTP caching mechanism allows another machine between client and server to act as if it's the server, in certain cases. This is actually what varnish is doing, as we'll see in a minute; another common use that many people are familiar with is when ISPs provide an HTTP cache within their network, that can generally respond faster to their subscribers (and so improve perceived performance) than the origin servers outside their network.
Server-side caching includes database caching, and fragment and page caching, which are really all just ways of the web server avoiding doing some expensive operation (say, a database query, or rendering a particular piece of a template) by doing it once then keeping the result in a cache for a while.
I said earlier that varnish was an HTTP cache, which means that straight away it's able to be more efficient than a web server serving even a static file. Consider what a web server has to do:
1. parse the HTTP request
2. map the URI (and any relevant request headers, such as Accept-Encoding) onto a file
3. pull up information about the file to build the HTTP headers in the response; these are known as entity headers (RFC 2616 section 7.1, which include things such as Content-Length, Content-Type and the Expires and Last-Modified headers used in HTTP caching)
4. figure out what additional response headers (RFC 2616 section 6.2; these include ETag and Vary, both important parts of HTTP caching) and general header fields (RFC 2616 section 4.5) are needed
5. write the HTTP status line and headers out to the network
6. write the file's contents out to the network
By comparison, varnish is upstream of all of this, so all it has to do is:
parse the HTTP request
map the URI (and any relevant request headers) onto an entry in its internal cache
see if there's an entry; if there is, write it to the network; the HTTP headers will have been stored in the cache
If there isn't an entry, varnish has to do a little more work:
connect to a web server behind it that will run through all the steps 1-6 in the first list to generate a response
write the response to the network, including all the HTTP headers
store the response in its cache
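The hit and miss paths above can be sketched in a few lines of Python. This is only an illustration of the idea, not how varnish itself is implemented: the class name, the backend callable and the choice of Accept-Encoding as the only varying header are assumptions, and expiry is left out entirely.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)


class TinyCache:
    """Toy reverse-proxy cache: serve from memory on a hit, ask the backend on a miss."""

    def __init__(self, backend: Callable[[str, Dict[str, str]], Response]) -> None:
        self.backend = backend
        self.store: Dict[Tuple[str, str], Response] = {}

    def handle(self, uri: str, headers: Dict[str, str]) -> Response:
        # Map the URI plus the relevant request header onto a cache key.
        key = (uri, headers.get("accept-encoding", ""))
        if key in self.store:
            return self.store[key]  # hit: stored headers and body, no backend work
        response = self.backend(uri, headers)  # miss: the backend runs all of its steps
        if response[0] == 200:
            self.store[key] = response  # keep the whole response for next time
        return response
```

The second request for the same URI never reaches the backend, which is where the saving comes from.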
In particular because the HTTP headers and entity body (the entire response) can be cached by varnish, if it can serve out of its cache it has less work to do. When you start generating the response dynamically in your server, the difference can become even more pronounced: say you have a page that takes 5 seconds to generate, but is the same for everyone hitting your site, varnish should be able to serve that in at most milliseconds out of the cache (plus whatever time it takes to get the response across the network to the HTTP client), and has a neat mechanism (the grace period) so it can keep on doing it while hitting the backend server once to refresh the cached version of the page.
Of course, you can introduce server-side caching to improve the speed with which your web server can process a request, but if you have a response you can cache in varnish it's generally going to be faster to do that. (There are various things that are hard to cache in varnish, particularly if you're using cookies or have pages that change depending on which user is looking at them. While it's possible to continue using varnish in these cases, unless you need really incredible speed, as far as I'm aware most people start optimising those cases using server-side caching and other techniques before hitting up varnish.)
(Note that varnish can also edit headers and indeed data going in and out of the cache, which complicates things. But the main points still stand, and even while editing things on the fly varnish can be incredibly fast.)

CGI to Handle Multiple Requests on a Persistent HTTP Connection

CGI programs typically get a single HTTP request.
HTTP 1.1 supports persistent HTTP connections whereby multiple HTTP requests/responses are made without closing the connection.
Is there a way for a CGI program (or similar mechanism) to handle multiple HTTP requests/responses on the same connection?
I am using Apache httpd.
Keep-alives are one of the higher-level HTTP features that is wholly dealt with by the web server. They are out-of-scope for CGI applications themselves.
Accessing CGI scripts through Apache mod_cgi works with keep-alive for me. The browser re-uses the same TCP connection to fetch the page and then resources referred to by it, without the scripts in question having to do anything special.
If you mean you would like to have the same CGI process handle one request and then the next (instead of the process ending and a new one being spawned), then I'm afraid that's not possible. The web server will intercept keep-alives and make them look like single requests before your scripts can do anything about it. (If you want to do that to improve performance, consider a different gateway interface, such as FastCGI or language-specific options like WSGI.)
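To show what a long-lived process looks like with WSGI, here is a minimal sketch of an application whose module-level counter survives from one request to the next, which a plain CGI script cannot do. The counter and the response text are invented for the example.

```python
import itertools

# Module-level state: it lives as long as the server process does.
_request_counter = itertools.count(1)


def application(environ, start_response):
    """WSGI entry point, called once per request by the hosting server."""
    number = next(_request_counter)
    path = environ.get("PATH_INFO", "/")
    body = f"request {number} in this process: {path}\n".encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

For local experiments this can be served with wsgiref.simple_server.make_server from the standard library. The keep-alive handling still happens in the web server; the application only ever sees one request per call.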
SCGI sounds exactly like what you want. It is similar to FastCGI but a simpler solution to implement (the S stands for Simple :)).

Which Web Application Frameworks enforce HTTP's TCP connection limit of two per client /server?

The HTTP 1.1 RFC restricts a Client from using more than Two TCP connections between any Client and Server. I want to know which Web Application Frameworks enforce this restriction.
Regards
The HTTP 1.1 implementation is not a function of the web application framework, it's a function of the client or server HTTP agent. In other words, it's implemented by Safari, Chrome, Firefox and Internet Explorer on the client side, and Apache or IIS on the server side [*]. Of course, there are a lot more HTTP agents out there that also implement HTTP 1.1; I am just listing the most popular (as in "that I use" :-)) ones.
As far as I know, most of the web application frameworks listed on that Wikipedia article you linked to should happily run on top of Apache and/or IIS at least, so they should be able to benefit from HTTP 1.1. However, if the browser the user is using does not support HTTP 1.1, the default configuration for Apache and IIS will be to fall back to HTTP 1.0, and that will happen transparently to the web app framework of your choice in the most common case.
Update: Your question should be paraphrased (as per your comment) to "Which web application frameworks support only HTTP 1.1 as transport protocol".
There are no major web service frameworks that enforce endpoint configurations or client calls over HTTP 1.1 only. All of them allow the application code (service or client) to choose the transport. There are two main reasons for this:
the protocol choice depends on the deployment configuration of the actual service, so it's orthogonal to the framework used and is rarely made by the web service developer
limiting the transport protocol choice to HTTP 1.1 by the framework means increased adoption barrier, which no framework author would want.
The only frameworks that might enforce particular HTTP version would be ones that come with either their own implementation of the web server or with pre-configured deployment of a major web server (usually Apache). However, I am not aware of any that would enforce HTTP 1.1 only; if anything, they would enforce HTTP 1.0 only.
There is also one very practical reason that prevents HTTP 1.1 enforcement for web services in general: most deployments are expected to work across an unknown number of intermediate gateways (firewalls, caching servers, load balancers and so on) which might or might not support HTTP 1.1, so the protocol negotiation between the web service client and the web service endpoint would fail without fallback support for HTTP 1.0.
[*] Well, technically, it's implemented by WinHTTP and WinInet on Windows platform and is just reused by the applications. And I am sure there's a common library that is reused on Linux as well (probably called libhttp.so or something like it, but don't quote me on that :-)).

Resources