The essence of HTTP protocol - http

There's too much elaboration about the HTTP protocol. But to its essensce, it's nothing but a string of ASCII characters transmitted over the TCP protocol. And the string defines the semantic of the protocol. Am I right on this?
If so, 2 questions follows:
Can we devise any protocols as we want, cause it just looks like
passing strings over the internet.
Why don't we compress the HTTP strings before we pass it down to the TCP level?

That's right, HTTP is by no means a special, but because it underpins the web it receives a lot of attention. It's an application level protocol like SMTP or FTP or any other.
Yes, you could design any protocol you like. For fun, grab an RFC for SMTP, FTP or HTTP and connect to your own server and learn the protocol. RFC2324 is also required reading - http://www.faqs.org/rfcs/rfc2324.html
Lack of HTTP header compression has been talked about a lot in recent years. See Steve Souders blog/books, YSlow! and Google Page Speed sites. The SPDY protocol is probably going to be the front runner at addressing several of the current issues with HTTP connection management, performance and security - http://www.chromium.org/spdy/spdy-whitepaper

Sure. But you would have to get others to adopt your protocol (unless it is an internal/proprietary spec). And if you can coherently express your communique in the form of HTTP, why not use it? It's widely implemented in virtually every language and operating system, and is well understood and easily debugged. Don't just create protocols for the heck of it.
The HTTP specification provides for several common compression schemes. gzip and deflate are particularly widely used. See, for example, Apache's mod_gzip and mod_deflate. Clients and servers routinely negotiate compression on your behalf.

Related

Why achieving real time connection was difficult with HTTP protocol?

Why achieving real time connection was difficult with HTTP protocol.
As HTTP is implemented over TCP connection which is reliable and continuing the persistent connection should be easier . right?.
How Web Sockets solves this persistent connection problem?
Prior to HTTP/2, HTTP/0.9 and HTTP/1.x were just command+response protocols only. A client would send a command, the server would send a reply, done, nothing more. And prior to HTTP/1.1, keeping the TCP connection alive after each response was impossible in HTTP/0.9 and not common practice in HTTP/1.0. HTTP/1.1 standardized that practice.
So it was very difficult to implement any kind of persistent real-time bi-direction communications over HTTP until recent years. Sure, various technologies were devised to try to address this (long polling, server-side pushes, etc), but nothing was really widely accepted and put into common practice across all implementations.
Then WebSockets came along, which was specifically designed for persistent bi-directional communications, in such a way that was easy to add to existing HTTP infrastructures so server admins didn't have to re-invest in new technologies, while maintaining backwards compatibility with older HTTP systems that can't/won't support WebSockets.
Then HTTP/2 came along, which now has server pushes and multiplexing built right in to the HTTP protocol, solving many of the shortcomings of old HTTP versions, and covers some of the use-cases that WebSockets were being used for, but doesn't obsolete Websockets completely.
HTTP has a very long history of evolution to get to where it is today. Evolution takes time.

using UDP to parallelize HTTP reads

Apparently, I don't get true parallel reads of different URLs on the same server, even issuing truly contemporary requests, on multiple physical interfaces (NICs).
I think the problem could be that HTTP protocol is connection oriented, then requests are serialized at lower level into TCP/IP stack (is this correct wording?).
Does make sense to attempt to 'reimplement' an high level HTTP request with a connectionless schema, like UDP, and handle myself packet addressing, to speedup streaming ?
HTTP requests are independent. They can be issues over arbitrarily many independent connections. HTTP does not impose an limits regarding concurrency.
You hit some resource limit. Maybe your client library restricts the number of concurrent calls. Maybe the server does. Maybe the network is fully utilized. Maybe back-end resources that the server uses are maxed out.
Find the bottleneck and eliminate it. The transport protocol is not the problem. Changing it can't help.
different URLs
Whether the URL is different or not makes no difference, except if the server implements some special throttling. Highly unlikely.
on multiple physical interfaces (NICs).
You are probably not network-bound.
requests are serialized at lower level into TCP/IP stack
No. Connection management is not part of HTTP. The client decided how many connections to use. Reconfigure the client.
Does make sense to attempt to 'reimplement' an high level HTTP request with a connectionless schema, like UDP, and handle myself packet addressing, to speedup streaming ?
You will have to re-implement flow control, segment fragmentation, re-transmission and other features of TCP protocol yourself. And then your HTTP implementation will not be compatible with the standard one.
So no, it does not make much sense.
For streaming you may like to use protocols designed for streaming, like WebRTC.

What Necessitates a Different Protocol for Email?

In what way is HTTP inappropriate for E-mail? How (for example) does the statefulness of IMAP benefit client development?
What actually are the arguments for keeping them separate other then historical and backwards compatibility reasons?
SMTP, IMAP, and HTTP are specialized application-level protocols. If there was a generic application-level protocol which all of these could inherit from, you could usefully refactor things, but since that is not the case, wedging the other protocols into one of the existing protocols is hardly worth the effort, and would hardly simplify things.
As things are now, the history and backwards compatibility is not just a cultural heritage, it is also a long and complex process of defining application-specific features for each protocol. SMTP is store-and-forward, which introduces the need for audit headers (Received: et al.). IMAP was designed for concurrent access to a data store, which is what made it necessary to introduce state (who are you, where are you authorized to connect, which folder are you connected to, what have you already seen, read, or deleted). HTTP is fundamentally a pull protocol (pull down a web page) and the POST facility carries with it a lot of functionality specific to the CGI protocol and the overall content model of HTTP.
SMTP is a protocol that identifies the sender and the recipients to send individual mail messages, each mail server accepts (or not) mail to forward, eventually reaching the destination. HTTP is meant for anybody to connect to the server and look at (mostly the same) contents. They are quite fundamentally different, and so it makes a lot of sense to use different protocols.

Is SPDY any different than http multiplexing over keep alive connections

HTTP 1.1 supports keep alive connections, connections are not closed until "Connection: close" is sent.
So, if the browser, in this case firefox has network.http.pipelining enabled and network.http.pipelining.maxrequests increased isn't the same effect in the end?
I know that these settings are disabled because for some websites this could increase load but I think a simple http header flag could tell the browser that is ok tu use multiplexing and this problem can be solved easier.
Wouldn't be easier to change default settings in browsers than invent a new protocol that increases complexity especially in the http servers?
SPDY has a number of advantages that go beyond what HTTP pipelining can offer, which are described in the SPDY whitepaper:
With pipelining, the server still has to return the responses one at a time in the order they were requested. This can be a problem if the client requests a resource that's dynamically generated before one that is static: the server cannot send any of the "easy" static responses until the dynamically generated one has been generated and sent. With SPDY, responses can be returned out of order or in parallel as they are generated, lowering the total time to receive all resources.
As you noted in your question, not all servers are able to deal with pipelining: it's not just load, some servers actually behave incorrectly when the client requests pipelining. Using a header to indicate that it's okay to do pipelining is too late to get the maximum benefit: you are already receiving the first response at that point, so while you can use it on future connections it's already too late for this one.
SPDY compresses headers using an algorithm which is specific to that task (stateful and with knowledge of what is normally in HTTP headers); while yes, SSL already includes compression, just compressing them with deflate is not as efficient. Most HTTP requests have no bodies and only a short GET line, so the headers make up virtually the entire request: any compression you can get is an improvement. Many responses are also small compared to their headers.
SPDY allows servers to send back additional responses without the client asking for them. For example, a server might start sending back the CSS for a page along with the original HTML, before the client has had a chance to receive and parse the HTML to determine the stylesheet URL. This can speed up page loads even further by eliminating the need for the client to actually parse the HTML before requesting other resources needed to render the page. It also supports a less bandwidth-heavy version of this feature where it can "hint" about which resources might be needed, and allow the client to decide: this allows, for example, clients that don't care about images to not bother to request them, but clients that want to display images can still request the images using the given URLs without needing to wait for the HTML.
Other things too: see William Chan's answer for even more.
HTTP pipelining is susceptible to head of line blocking (http://en.wikipedia.org/wiki/Head-of-line_blocking) at the HTTP transaction level whereas SPDY only has head of line blocking at the transport level, due to its use of multiplexing.
HTTP pipelining has deployability issues. See https://datatracker.ietf.org/doc/html/draft-nottingham-http-pipeline-01 which describes a number of different workarounds and heuristics to mitigate this. SPDY as deployed in the wild does not have this problem since it is generally deployed over SSL (port 443) using NPN (http://technotes.googlecode.com/git/nextprotoneg.html) to negotiate SPDY support. SSL is key, since it prevents intermediaries from interfering.
SPDY has header compression. See http://dev.chromium.org/spdy/spdy-whitepaper which discusses some benchmark results of the benefits of header compression. Now, it's useful to note that bandwidth is less and less of an issue (see http://www.belshe.com/2010/05/24/more-bandwidth-doesnt-matter-much/), but it's also useful to remember that bandwidth is still key for mobile. Check out https://developers.google.com/speed/articles/spdy-for-mobile which shows how beneficial SPDY is for mobile.
SPDY supports features like server push. See http://dev.chromium.org/spdy/spdy-best-practices for ways to use server push to improve cacheability of content and still reduce roundtrips.
HTTP pipelining has ill-defined failure semantics. When the server closes the connection, how do you know which requests have been successfully processed? This is a major reason why POST and other non-idempotent requests are not allowed over pipelined connections. SPDY provides semantics to cancel individual streams on the same connection, and also has a GOAWAY frame which indicates the last stream to be successfully processed.
HTTP pipelining has difficulty, often due to intermediaries, in allowing deep pipelines. This (in addition to many other reasons like HoL blocking) means that you still need to utilize multiple TCP connections to achieve maximal parallelization. Using multiple TCP connections means that congestion control information cannot be shared, that compression contexts cannot be shared (like SPDY does with headers), is worse for the internet (more costly for intermediaries and servers).
I could go on and on about HTTP pipelining vs SPDY. But I'd recommend just reading up on SPDY. Check out http://dev.chromium.org/spdy and our tech talk on SPDY at http://www.youtube.com/watch?v=TNBkxA313kk&list=PLE0E03DF19D90B5F4&index=2&feature=plpp_video.
See Difference between HTTP pipeling and HTTP multiplexing with SPDY

When to use TCP and HTTP in node.js?

Stupid question, but just making sure here:
When should I use TCP over HTTP? Are there any examples where one is better than the other?
TCP is full-duplex 2-way communication. HTTP uses request/response model. Let's see if you are writing a chat or messaging application. TCP will work much better because you can notify the client immediately. While with HTTP, you have to do some tricks like long-polling.
However, TCP is just byte stream. You have to find another protocol over it to define your messages. You can use Google's ProtoBuffer for that.
Use HTTP if you need the services it provides -- e.g., message framing, caching, redirection, content metadata, partial responses, content negotiation -- as well as a large number of well-understood tools, implementations, documentation, etc.
Use TCP if you can't work within those constraints. However, if you use TCP you'll be creating a new application protocol, which has a number of pitfalls.

Resources