What Necessitates a Different Protocol for Email? - http

In what way is HTTP inappropriate for E-mail? How (for example) does the statefulness of IMAP benefit client development?
What actually are the arguments for keeping them separate other then historical and backwards compatibility reasons?

SMTP, IMAP, and HTTP are specialized application-level protocols. If there was a generic application-level protocol which all of these could inherit from, you could usefully refactor things, but since that is not the case, wedging the other protocols into one of the existing protocols is hardly worth the effort, and would hardly simplify things.
As things are now, the history and backwards compatibility is not just a cultural heritage, it is also a long and complex process of defining application-specific features for each protocol. SMTP is store-and-forward, which introduces the need for audit headers (Received: et al.). IMAP was designed for concurrent access to a data store, which is what made it necessary to introduce state (who are you, where are you authorized to connect, which folder are you connected to, what have you already seen, read, or deleted). HTTP is fundamentally a pull protocol (pull down a web page) and the POST facility carries with it a lot of functionality specific to the CGI protocol and the overall content model of HTTP.

SMTP is a protocol that identifies the sender and the recipients to send individual mail messages, each mail server accepts (or not) mail to forward, eventually reaching the destination. HTTP is meant for anybody to connect to the server and look at (mostly the same) contents. They are quite fundamentally different, and so it makes a lot of sense to use different protocols.

Related

Difference between ZeroMQ asynchronous http requests and Messages?

How is using asynchronous HTTP Requests different from using Messages when it comes to sending data in ZeroMQ?
A http request is simply the use of the hypertext transport protocol used over IP between two machines, client and server. It can be used for moving data in either direction. There's no particular restrictions as to what that data can be. An asynchronous request is simply one where the requester isn't bothering to wait for the reply having made the request; it'll use some mechanism to later rendezvous with the request, whenever that happens to come in.
Sending a message through ZeroMQ can be somewhat similar, specifically the REQ/REP pattern (request, reply). Similar to a http request, the requester will send some sort of message and the replier will reply in some way, and strictly in this pattern.
ZeroMQ uses its own protocol, zmtp, to move messages around. Again, there's nothing really limiting what data is in a message. ZeroMQ is inherently asynchronous - it's implementing the Actor programming model (though I notice that the way some implementations in some languages have eroded ZeroMQ's simplicity w.r.t. that, fitting into the language's own way of being asynchronous rather than use a poll funcion provided by ZeroMQ).
However, ZeroMQ builds many more data distribution patterns than req/rep on top of zmtp, like pub/sub, dealer/router, that http simply has no equivalent of. Further differences are that ZeroMQ can use IP, interprocess comms, or in-memory transports; this makes it highly suited for both in-application use, and for inter-machine distributed applications. I guess that a webserver could be contacted over ipc too, but I've never heard of anyone bothering to do that. Http is expected to be used over specific ports (e.g. port 80), whereas ZMQ gets used on whatever ports the developer wants (obeying the normal port allocation rules if they want a quiet life).

What is the difference between DEALER and ROUTER socket archetype in ZeroMQ?

What is the difference between the ROUTER and the DEALER socket archetypes in zmq?
And which should I use, if I have a server, which is receiving messages and a client, which is sending messages? The server will never send a message to a client.
EDIT: I forgot to say that there can be several instances of the client.
For details on ROUTER/DEALER Formal Communication Pattern, do not hesitate to consult the API documentation. There are many features important for ROUTER/DEALER ( XREQ/XREP ) that have nothing beneficial for your indicated use-case.
Many just send, just one just listens?
Given N-clients purely .send() messages to 1-server, which exclusively .recv() messages, but never sends any message back,
the design may benefit from a PUB/SUB Formal Communication Pattern.
In case some other preferences outweight the trivial approach, one may setup a more complex "wireing", using another one-way type of infrastructure, based on PUSH/PULL, and use a reverse setup PUB/SUB, where each new client, the PUB side, .connect()-s to the SUB-side, given a server-side .bind() access-point is on a known, static IP address and the client self-advertises on this signalling channel, that it is alive ( keep-alive with IP-address:port#, where the server-side ought initiate a new PUSHtoPULL.connect() setup onto the client-advertised, .bind()-ready PULL-side access point.
Complex? Rather a limitless tool, only our imagination is our limit.
After some time, one realises all the powers of multi-functional SIG/MSG-infrastructure, so do not hesitate to experiment and re-use the elementary archetypes in more complex, mutually-cooperating distributed systems computing.

How do XMPP/HTML/etc. *really* work?

This might be a dumb question, however, I have been continually frustrated by what seems to be a big gap in every explanation I've seen of protocols like XMPP or HTML. So basically, when I've read documentation on either, in general, it will describe the structure of the data sent back and forth through the protocol, but it does not explain exactly how this data is transferred. It's one thing to provide an example of, say, a generic HTTP request, but it is something else to explain how this text is actually sent to the server.
I guess posed another way, what resources are there out there for learning best practices for implementing text-based protocols? At their core, are all text-based protocols basically the exact same thing? How, for example, would it differ at the binary level, were I to say send the text content of an HTTP request over IRC vs however it is done natively by HTTP?
If I wanted to develop my own, simple textual protocol, what would be the best way to send the text to a client? Does the content itself even really matter? What I mean is that, obviously, HTTP and XMPP are rather different protocols, but do they differ in terms of how the text is transferred between computer to computer?
HTTP, IRC and XMPP are all sent on top of TCP, which is a protocol that provides a bidirectional stream between two endpoints (IP address + port). Under the hood, the data you send is split into separate packets, sent across the network, and reassembled on the other end, so that the recipient just sees a stream of incoming data - except when something goes wrong; there is a somewhat accessible description here.
What that means is that while the application protocol (HTTP, XMPP etc) is different, the underlying transport mechanism is exactly the same. It would be possible (perhaps even interesting) to implement HTTP on top of IRC: an HTTP/IRC client enters a channel, sends the HTTP request as messages to the channel, line by line, a server is present in the channel, reads the request and sends the response the same way - but transporting HTTP over IRC is fundamentally different from transporting HTTP over TCP. The former means layering an application protocol over another application protocol (and the IRC connection needs to go over TCP anyway), while the latter is an application protocol over a transport protocol, which is the way things usually are done (except for various kinds of proxies).
Hope that makes some sense...

The essence of HTTP protocol

There's too much elaboration about the HTTP protocol. But to its essensce, it's nothing but a string of ASCII characters transmitted over the TCP protocol. And the string defines the semantic of the protocol. Am I right on this?
If so, 2 questions follows:
Can we devise any protocols as we want, cause it just looks like
passing strings over the internet.
Why don't we compress the HTTP strings before we pass it down to the TCP level?
That's right, HTTP is by no means a special, but because it underpins the web it receives a lot of attention. It's an application level protocol like SMTP or FTP or any other.
Yes, you could design any protocol you like. For fun, grab an RFC for SMTP, FTP or HTTP and connect to your own server and learn the protocol. RFC2324 is also required reading - http://www.faqs.org/rfcs/rfc2324.html
Lack of HTTP header compression has been talked about a lot in recent years. See Steve Souders blog/books, YSlow! and Google Page Speed sites. The SPDY protocol is probably going to be the front runner at addressing several of the current issues with HTTP connection management, performance and security - http://www.chromium.org/spdy/spdy-whitepaper
Sure. But you would have to get others to adopt your protocol (unless it is an internal/proprietary spec). And if you can coherently express your communique in the form of HTTP, why not use it? It's widely implemented in virtually every language and operating system, and is well understood and easily debugged. Don't just create protocols for the heck of it.
The HTTP specification provides for several common compression schemes. gzip and deflate are particularly widely used. See, for example, Apache's mod_gzip and mod_deflate. Clients and servers routinely negotiate compression on your behalf.

Discussion: Chat server via node.js: HTTP or TCP?

I was considering doing a chat server using node.js/socket.io. Should I make it a tcp server or a http server? I'd imagine tcp server would be more efficient, but can you send other stuff to it like file attachments etc? If tcp is more efficient, how much more so? Also, just wondering how many concurrent connections can one node.js server handle? Is it more work to do TCP or HTTP?
You are talking about 2 totally different approaches here - TCP is a transport layer protocol and HTTP is an application layer protocol. HTTP (usually) operates over TCP, so whichever option you choose, it will still be operating over TCP.
The efficiency question is sort of a moot point, because you are talking about different OSI layers. If you went for raw TCP sockets, your solution would probably be more efficient - in bandwidth at least - since HTTP contains a whole bunch of extra data (the headers) that would likely be irrelevant to your purposes (depending on the scale of the chat program). What you are talking about developing there is your own application layer protocol.
You can send anything you like over TCP - after all HTTP can send attachments, and that operates over TCP. FTP also operates over TCP, and that is designed purely for transferring "attachments". In order to do this, you would need to write your protocol so that it was able to tell the remote party that the following data was a file, then send the file data, then tell the remote party that the transfer is complete. Implementations of this are many and varied (the HTTP approach is completely different from the FTP approach) and your options are pretty much infinite.
I don't know for sure about the node.js connection limit, but I can say with a fair amount of confidence that it is limited by the operating system. This might help you get to grips with the answer to that question.
It is debatable whether it is more work to do it with TCP or HTTP - it's a lot of work to do it in both. I would probably lean more toward the TCP option being your best bet. While TCP would require you to design a protocol rather than/as well as an application, HTTP is not particularly suited to live, 2-way applications like chat servers. There are many implementations of chat over HTTP that use AJAX, but I can tell you from painful experience that they are a complete pain in the rear-end.
I would say that you should only be looking at HTTP if you are intending the endpoint (i.e. the client) to be a browser. If you are going to write a desktop app for the endpoint, a direct TCP link would definitely be the way to go. The main reason for this is that HTTP works in a request-response manner, where the client sends a request to the server, and the server responds. Over TCP you can open a single TCP stream, that can be used for bi-directional communication. This means that the server can push an event to the client instantly, while over HTTP you have to wait for the client to send a request, so you can respond with an event. If you were intending to use a browser as the client, it will make the whole file transfer thing much more tricky (the sending at least).
There are ways to implement this over HTTP using long-polling and server push (read this) but it can be a real pain to implement.
If you are going to implement this on a LAN (or possibly even over the internet) it is worth considering UDP over TCP - in a chat application it is not usually absolutely mission critical that messages arrive in the right order, and even if it was, users would probably not be able to type faster than the variations in network latency (probably <100ms). Then for file transfers you could either negotiate a seperate TCP socket for the data exchange (like FTP), or implement some kind of UDP ACK system (like TFTP).
I feel there is a lot more to say on this subject but right now I can't put it into words - I may extend this answer at some point.
Chat servers are the Hello World program in node. Use http.
As far as the question of how many concurrent connections can it handle, that all depends on your system. Set up a simple chat server and then try benchmarking it.
Also, check out http://search.npmjs.org/ and search for chat for a few pointers.

Resources