What is an http upgrade? - http

This is one of the Node http events. Did the obvious Google Searches, didn't find much. What is it exactly?

HTTP Upgrade is used to indicate a preference or requirement to switch to a different version of HTTP or to another protocol, if possible:
The Upgrade general-header allows the client to specify what
additional communication protocols it supports and would like to use
if the server finds it appropriate to switch protocols. The server
MUST use the Upgrade header field within a 101 (Switching Protocols)
response to indicate which protocol(s) are being switched.
Upgrade = "Upgrade" ":" 1#product
For example,
Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/x11
The Upgrade header field is intended to provide a simple mechanism
for transition from HTTP/1.1 to some other, incompatible protocol.
According to the IANA register, there are only 3 registered mentions of it (including one in the HTTP specification itself).
The other two are for:
Upgrading to TLS Within HTTP/1.1 (almost never used, not to be confused with HTTP over TLS, which defines HTTPS as widely used). This upgrade allows for a similar mechanism to STARTTLS in other protocols (e.g. LDAP, SMTP, ...) so as to be able to switch to TLS on the same port as the plain connection, after exchanging some of the application protocol messages, as opposed to having the entire HTTP exchange on top of SSL/TLS without it needing to know it's on top of TLS (the way HTTPS works).
Upgrading to WebSockets (still a draft).

Related

Does TLS imply HTTPS

If I understand right, for TLS handshake, client initiates the connection, server returns it's certificate, client verifies the signature on it and then a session key is generated which is used for further communication.
If above process is happening, does it mean that the initial call that the client made was HTTPS? Or is it possible to do TLS handshake for HTTP?
TLS and HTTP/HTTPS are different things, but in practice they are often deployed together, and so the boundaries between them can become a bit blured.
TLS itself can happily be used without HTTP, and is an integral part of many of the other protocols you'll use on a daily basis, such as your email connection (IMAPS/ESMTP etc). When it is used with HTTP (as HTTPS) though, then there are a few TLS extensions that the client can use when establishing the connection, to let the server know that it is expecting an HTTP response (such as the ALPN extension).
Likewise, HTTP can also happily be used without any thought of TLS. However, there exists a collection of HTTP features that have been introduced over the years to either force (or invite) an HTTP connection to migrate to HTTPS (such as the strict-transport-security or upgrade-insecure-requests headers)
HTTPS is simply HTTP inside TLS (or SSL if you're asking this question from a time when dinosaurs roamed the earth).

What does this line mean in rfc2068

source
In addition, the proliferation of incompletely-implemented
applications calling themselves "HTTP/1.0" has necessitated a
protocol version change in order for two communicating applications
to determine each other's true capabilities.
From the RFC:
HTTP has been in use by the World-Wide Web global information initiative since 1990. The first version of HTTP, referred to as HTTP/0.9, was a simple protocol for raw data transfer across the Internet.
Rephrased:
Before HTTP was standardised there were differences in implementations that meant they couldn't always communicate with each other correctly (e.g. certain web-browsers couldn't work with certain web-servers). The RFC article refers to these pre-standardisation implementations as using HTTP/0.9.
HTTP/1.0, as defined by RFC 1945, improved the protocol by allowing messages to be in the format of MIME-like messages, containing metainformation about the data transferred and modifiers on the request/response semantics. However, HTTP/1.0 does not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, and virtual hosts. In addition, the proliferation of incompletely-implemented applications calling themselves "HTTP/1.0" has necessitated a protocol version change in order for two communicating applications to determine each other's true capabilities.
Rephrased:
After HTTP was standardised as HTTP/1.0 it certainly helped the interopability and compatibility problems, but version 1.0 of the protocol simply assumed all HTTP software would be able to use it for their existing application, but now that HTTP/1.0 has been in-use for a while the maintainers of the HTTP protocol specification saw that they need to extend HTTP to support these use-cases (e.g. proxies, caches, persistent connections, virtual-hosts) and while these things could be done using the built-in extension mechanisms in HTTP/1.0 they felt a need to increment the version number to HTTP/1.1 in order to prevent an implementation simply assuming the remote host supports a feature or not.
Example
A good example is the Host header in HTTP/1.1 that allows for a web-server serving from a single IP address and port number to serve-up different websites based on the Host header (as before HTTP/1.1 existed webservers could only serve one website per IP address, which is a problem). HTTP/1.0 does allow clients and servers to add their own custom headers, such as Host, however there is no way for the client or the server to know that the other end actually supports the Host header. But in HTTP/1.1 the Host header was formerly added to the specification so if both the client and server declare they use HTTP/1.1 then the other end knows that they'll recognize the Host header and handle it correctly.
So in the HTTP/1.0 days, with custom headers, this is how it would play out if a browser requests www.example.com if it were served from a Shared Webhost:
Browser (to DNS server): "Please give me the IP address for 'www.example.com'"
DNS Server (to browser): "www.example.com is 198.51.100.7"
Browser (to 198.51.100.7): "Hello, I speak HTTP/1.0, please send me index.html for Host: www.example.com
Server (to browser): "I also speak HTTP/1.0, here is index.html for 'not-actually-example.com'"
As you can see, the browser got not-actually-example.com even though it asked for www.example.com, because the Web-server was using HTTP/1.0 which does not recognize the Host header, even though the web-browser was sending the Host header (as an extension/experimental header). The browser software has no way of knowing if not-actually-example.com is what the user wanted or not.
In human terms, what they're saying is: so many people said they did HTTP 1.0 while they didn't, that nobody knew whether it really was HTTP 1.0 any more when someone said it.
To get out of that, they chose a new number.

Why websocket needs an opening handshake using HTTP? Why can't it be an independent protocol?

Websocket is designed in such a way that its servers can share a port with HTTP servers, by having its handshake be a valid HTTP Upgrade request.
I have a doubt in this design philosophy.
Any ways the WebSocket Protocol is an independent TCP-based protocol.
Why would we need this HTTP handshake(upgrade request) and a protocol switching. Instead why can't we directly(& independently) follow a websocket like protocol?
To quote from the IETF 6455 WebSocket spec:
The WebSocket Protocol attempts to address the goals of existing
bidirectional HTTP technologies in the context of the existing HTTP
infrastructure; as such, it is designed to work over HTTP ports 80
and 443 as well as to support HTTP proxies and intermediaries, even
if this implies some complexity specific to the current environment.
However, the design does not limit WebSocket to HTTP, and future
implementations could use a simpler handshake over a dedicated port
without reinventing the entire protocol.
In other words, there is a vast infrastructure for HTTP and HTTPS that already exists (proxies, firewalls, caches, and other intermediaries). In order to increase the chances of being adopted widely, the WebSocket protocol was designed to allow adjustments and extensions to the existing infrastructure without having to recreate everything from scratch to support a new protocol on a dedicate port.
It's also important to note that even if WebSocket protocol were to get rid of the HTTP compatible handshake, it would still need a handshake of almost equivalent complexity to support security requirements of the modern web so the browser and server can validate each other and to support CORS (cross-origin request sharing) securely. Even "raw" Flash sockets do a handshake with the server via the security policy request prior to creating the actual socket.

what is the benefit of using http hijacker

Go http pkg provide a Hijacker interface, can anyone tell when should I use it.
I check the comment, after a Hijack call lets the caller take over the connection, the HTTP server library will not do anything else with the connection.
I understand it as it's used to support both http request and common tcp interactive within one port. Is it right? Does it has any other benefits.
It means that you take over the control of TCP connection.
TCP is a generic transport protocol, whereas HTTP is an application protocol on top of TCP. The OSI seven layer model describes TCP as layer 4 and HTTP is layer 7.
If you need to implement a different application protocol, this is one use-case for hijacking.
Or if you need to do something specialised with HTTP, like preventing keep-alive connections, that is another use-case.
An example for an alternative web application protocol is Google's SPDY. It's also a good reason why you might hijack an existing HTTP connection, rather than create a TCP connection directly. For SPDY, a browser would first make an HTTP request that included 'accept' headers indicating that it is also able to understand SPDY. So now you could hijack the connection and implement SPDY instead of HTTP.

HTTP 1.0 vs 1.1

Could somebody give me a brief overview of the differences between HTTP 1.0 and HTTP 1.1? I've spent some time with both of the RFCs, but haven't been able to pull out a lot of difference between them. Wikipedia says this:
HTTP/1.1 (1997-1999)
Current version; persistent connections enabled by default and works well with proxies. Also supports request pipelining, allowing multiple requests to be sent at the same time, allowing the server to prepare for the workload and potentially transfer the requested resources more quickly to the client.
But that doesn't mean a lot to me. I realize this is a somewhat complicated subject, so I'm not expecting a full answer, but can someone give me a brief overview of the differences at a bit lower level?
By this I mean that I'm looking for the info I would need to know to implement either an HTTP server or application. I'm mostly looking for a nudge in the right direction so that I can figure it out on my own.
Proxy support and the Host field:
HTTP 1.1 has a required Host header by spec.
HTTP 1.0 does not officially require a Host header, but it doesn't hurt to add one, and many applications (proxies) expect to see the Host header regardless of the protocol version.
Example:
GET / HTTP/1.1
Host: www.blahblahblahblah.com
This header is useful because it allows you to route a message through proxy servers, and also because your web server can distinguish between different sites on the same server.
So this means if you have blahblahlbah.com and helohelohelo.com both pointing to the same IP. Your web server can use the Host field to distinguish which site the client machine wants.
Persistent connections:
HTTP 1.1 also allows you to have persistent connections which means that you can have more than one request/response on the same HTTP connection.
In HTTP 1.0 you had to open a new connection for each request/response pair. And after each response the connection would be closed. This lead to some big efficiency problems because of TCP Slow Start.
OPTIONS method:
HTTP/1.1 introduces the OPTIONS method. An HTTP client can use this method to determine the abilities of the HTTP server. It's mostly used for Cross Origin Resource Sharing in web applications.
Caching:
HTTP 1.0 had support for caching via the header: If-Modified-Since.
HTTP 1.1 expands on the caching support a lot by using something called 'entity tag'.
If 2 resources are the same, then they will have the same entity tags.
HTTP 1.1 also adds the If-Unmodified-Since, If-Match, If-None-Match conditional headers.
There are also further additions relating to caching like the Cache-Control header.
100 Continue status:
There is a new return code in HTTP/1.1 100 Continue. This is to prevent a client from sending a large request when that client is not even sure if the server can process the request, or is authorized to process the request. In this case the client sends only the headers, and the server will tell the client 100 Continue, go ahead with the body.
Much more:
Digest authentication and proxy authentication
Extra new status codes
Chunked transfer encoding
Connection header
Enhanced compression support
Much much more.
 HTTP 1.0 (1994)
It is still in use
Can be used by a client that cannot deal with chunked
(or compressed) server replies
 HTTP 1.1 (1996- 2015)
Formalizes many extensions to version 1.0
Supports persistent and pipelined connections
Supports chunked transfers, compression/decompression
Supports virtual hosting (a server with a single IP Address hosting multiple domains)
Supports multiple languages
Supports byte-range transfers; useful for resuming interrupted data
transfers
HTTP 1.1 is an enhancement of HTTP 1.0. The following lists the
four major improvements:
Efficient use of IP addresses, by allowing multiple domains to be
served from a single IP address.
Faster response, by allowing a web browser to send multiple
requests over a single persistent connection.
Faster response for dynamically-generated pages, by support for
chunked encoding, which allows a response to be sent before its
total length is known.
Faster response and great bandwidth savings, by adding cache
support.
For trivial applications (e.g. sporadically retrieving a temperature value from a web-enabled thermometer) HTTP 1.0 is fine for both a client and a server. You can write a bare-bones socket-based HTTP 1.0 client or server in about 20 lines of code.
For more complicated scenarios HTTP 1.1 is the way to go. Expect a 3 to 5-fold increase in code size for dealing with the intricacies of the more complex HTTP 1.1 protocol. The complexity mainly comes, because in HTTP 1.1 you will need to create, parse, and respond to various headers. You can shield your application from this complexity by having a client use an HTTP library, or server use a web application server.
A key compatibility issue is support for persistent connections. I recently worked on a server that "supported" HTTP/1.1, yet failed to close the connection when a client sent an HTTP/1.0 request. When writing a server that supports HTTP/1.1, be sure it also works well with HTTP/1.0-only clients.
One of the first differences that I can recall from top of my head are multiple domains running in the same server, partial resource retrieval, this allows you to retrieve and speed up the download of a resource (it's what almost every download accelerator does).
If you want to develop an application like a website or similar, you don't need to worry too much about the differences but you should know the difference between GET and POST verbs at least.
Now if you want to develop a browser then yes, you will have to know the complete protocol as well as if you are trying to develop a HTTP server.
If you are only interested in knowing the HTTP protocol I would recommend you starting with HTTP/1.1 instead of 1.0.
HTTP 1.1 is the latest version of Hypertext Transfer Protocol, the World Wide Web application protocol that runs on top of the Internet's TCP/IP suite of protocols. compare to HTTP 1.0 , HTTP 1.1 provides faster delivery of Web pages than the original HTTP and reduces Web traffic.
Web traffic Example: For example, if you are accessing a server. At the same time so many users are accessing the server for the data, Then there is a chance for hanging the Server. This is Web traffic.
HTTP 1.1 comes with the host header in its specification while the HTTP 1.0 doesn't officially have a host header, but it doesn't refuse to add one.
The host header is useful because it allows the client to route a message throughout the proxy server, and the major difference between 1.0 and 1.1 versions HTTP are:
HTTP 1.1 comes with persistent connections which define that we can have more than one request or response on the same HTTP connection.
while in HTTP 1.0 you have to open a new connection for each request and response
In HTTP 1.0 it has a pragma while in HTTP 1.1 it has Cache-Control
this is similar to pragma

Resources