Discussion: Chat server via node.js: HTTP or TCP?

I was considering doing a chat server using node.js/socket.io. Should I make it a TCP server or an HTTP server? I'd imagine a TCP server would be more efficient, but can you send other stuff to it, like file attachments? If TCP is more efficient, how much more so? Also, just wondering how many concurrent connections can one node.js server handle? Is it more work to do TCP or HTTP?

You are talking about 2 totally different approaches here - TCP is a transport layer protocol and HTTP is an application layer protocol. HTTP (usually) operates over TCP, so whichever option you choose, it will still be operating over TCP.
The efficiency question is sort of a moot point, because you are talking about different OSI layers. If you went for raw TCP sockets, your solution would probably be more efficient - in bandwidth at least - since HTTP contains a whole bunch of extra data (the headers) that would likely be irrelevant to your purposes (depending on the scale of the chat program). What you are talking about developing there is your own application layer protocol.
You can send anything you like over TCP - after all HTTP can send attachments, and that operates over TCP. FTP also operates over TCP, and that is designed purely for transferring "attachments". In order to do this, you would need to write your protocol so that it was able to tell the remote party that the following data was a file, then send the file data, then tell the remote party that the transfer is complete. Implementations of this are many and varied (the HTTP approach is completely different from the FTP approach) and your options are pretty much infinite.
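To make that concrete, here is a rough sketch of one way such a protocol could look in node. The frame layout, type codes and port are made up for illustration, not anything node.js prescribes:

```js
// A made-up framing scheme over a raw TCP socket: each frame is
// [1-byte type][4-byte big-endian length][payload], with type 0x01 for a
// chat message, 0x02 for a file chunk and 0x03 for "file complete".
const net = require('net');

function sendFrame(socket, type, payload) {
  const header = Buffer.alloc(5);
  header.writeUInt8(type, 0);
  header.writeUInt32BE(payload.length, 1);
  socket.write(Buffer.concat([header, payload]));
}

// Accumulate incoming bytes and call onFrame for each complete frame.
function makeFrameParser(onFrame) {
  let buffered = Buffer.alloc(0);
  return (chunk) => {
    buffered = Buffer.concat([buffered, chunk]);
    while (buffered.length >= 5) {
      const type = buffered.readUInt8(0);
      const length = buffered.readUInt32BE(1);
      if (buffered.length < 5 + length) break;   // wait for the rest of the frame
      onFrame(type, buffered.slice(5, 5 + length));
      buffered = buffered.slice(5 + length);
    }
  };
}

// Example receiver: log chat frames, count file bytes until "file complete".
const server = net.createServer((socket) => {
  let fileBytes = 0;
  socket.on('data', makeFrameParser((type, payload) => {
    if (type === 0x01) console.log('chat:', payload.toString('utf8'));
    if (type === 0x02) fileBytes += payload.length;
    if (type === 0x03) console.log('file received,', fileBytes, 'bytes');
  }));
});
server.listen(9000);
```

The sender would use sendFrame for each chunk; the point is simply that "this is a file, here are its bytes, now it's finished" is your own convention layered on top of the TCP stream.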
I don't know the exact node.js connection limit, but I can say with a fair amount of confidence that it is limited by the operating system rather than by node.js itself - chiefly by the number of file descriptors a process may hold open and by available memory.
It is debatable whether it is more work to do it with TCP or HTTP - it's a lot of work either way. I would probably lean toward the TCP option being your best bet. While TCP would require you to design a protocol rather than/as well as an application, HTTP is not particularly suited to live, 2-way applications like chat servers. There are many implementations of chat over HTTP that use AJAX, but I can tell you from painful experience that they are a complete pain in the rear-end.
I would say that you should only be looking at HTTP if you are intending the endpoint (i.e. the client) to be a browser. If you are going to write a desktop app for the endpoint, a direct TCP link would definitely be the way to go. The main reason for this is that HTTP works in a request-response manner, where the client sends a request to the server, and the server responds. Over TCP you can open a single TCP stream that can be used for bi-directional communication. This means that the server can push an event to the client instantly, while over HTTP you have to wait for the client to send a request before you can respond with an event. If you intend to use a browser as the client, it will also make the whole file transfer thing much trickier (the sending side, at least).
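As a rough illustration of that bi-directional point, here is a minimal TCP relay in node (the port and relay behaviour are arbitrary choices, not a prescribed design):

```js
// Minimal TCP relay sketch: every connected client receives each message
// the instant the server has it, with no request/response round trip.
const net = require('net');
const clients = new Set();

const server = net.createServer((socket) => {
  clients.add(socket);
  socket.on('data', (chunk) => {
    // Push the data to every other connected client immediately.
    for (const other of clients) {
      if (other !== socket) other.write(chunk);
    }
  });
  socket.on('close', () => clients.delete(socket));
  socket.on('error', () => clients.delete(socket));
});

server.listen(9001, () => console.log('TCP chat relay on :9001'));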
There are ways to implement this over HTTP using long-polling and server push, but they can be a real pain to implement.
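For what it's worth, a bare-bones long-polling setup with node's built-in http module might look something like this (the /poll and /send paths and the 25-second timeout are arbitrary illustration choices):

```js
// Rough sketch of long-polling: GET /poll parks the response until a
// message arrives or a timeout fires; POST /send wakes every parked poller.
const http = require('http');
let waiting = [];   // parked responses from clients waiting for a message

http.createServer((req, res) => {
  if (req.method === 'GET' && req.url === '/poll') {
    waiting.push(res);
    // Release the client after 25 seconds so it can reconnect.
    setTimeout(() => {
      if (waiting.includes(res)) {
        waiting = waiting.filter((r) => r !== res);
        res.writeHead(204);
        res.end();
      }
    }, 25000);
  } else if (req.method === 'POST' && req.url === '/send') {
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      // Wake up every parked poller with the new message.
      for (const r of waiting) {
        r.writeHead(200, { 'Content-Type': 'text/plain' });
        r.end(body);
      }
      waiting = [];
      res.writeHead(204);
      res.end();
    });
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);
```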
If you are going to implement this on a LAN (or possibly even over the internet) it is worth considering UDP instead of TCP - in a chat application it is not usually mission critical that messages arrive in the right order, and even if it were, users would probably not be able to type faster than the variations in network latency (probably <100ms). Then for file transfers you could either negotiate a separate TCP socket for the data exchange (like FTP), or implement some kind of UDP ACK system (like TFTP).
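A quick sketch of the UDP idea with node's dgram module (the port and relay behaviour are invented for illustration); note there are no delivery or ordering guarantees here, which is exactly the trade-off being accepted:

```js
// UDP chat relay sketch: remember every address heard from and relay each
// datagram to the others. Messages may be lost, duplicated or reordered.
const dgram = require('dgram');
const server = dgram.createSocket('udp4');
const peers = new Map();   // "ip:port" -> { address, port }

server.on('message', (msg, rinfo) => {
  const key = `${rinfo.address}:${rinfo.port}`;
  peers.set(key, rinfo);
  for (const [otherKey, peer] of peers) {
    if (otherKey !== key) server.send(msg, peer.port, peer.address);
  }
});

server.bind(9002);
```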
I feel there is a lot more to say on this subject but right now I can't put it into words - I may extend this answer at some point.

Chat servers are the Hello World program in node. Use http.
As for how many concurrent connections it can handle, that all depends on your system. Set up a simple chat server and then try benchmarking it.
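For reference, the "hello world" chat looks roughly like this with socket.io on top of node's http server (the event name and port are arbitrary, and the exact require style depends on the socket.io version you install):

```js
// A minimal socket.io chat sketch: every message a client emits is
// rebroadcast to every connected client.
const http = require('http');
const server = http.createServer();
const io = require('socket.io')(server);

io.on('connection', (socket) => {
  socket.on('chat message', (msg) => {
    io.emit('chat message', msg);   // rebroadcast to every connected client
  });
});

server.listen(3000, () => console.log('chat server listening on :3000'));
```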
Also, check out http://search.npmjs.org/ and search for chat for a few pointers.

Related

HTTP vs TCP for online games

I am wondering about the difference between HTTP and TCP data transfer protocols for online games.
I have heard many people using TCP or UDP to transfer data between client and server for online games.
But can you use HTTP at all? I know HTTP is mostly used for web browsing, but if I set up a web server and let my game applications use GET and POST methods, I could still send data back and forth, right? Or is this way of communicating too slow or simply unnecessary?
And just one thing about TCP: if I were to write a gaming application using TCP, is the data usually transferred using something called "sockets" (like the Socket classes in Java)? What about UDP?
Thanks very much!
Appreciate any answer!
HTTP is an additional layer on top of TCP that defines what a request looks like, what a response looks like, and how the connection is closed or maintained across requests. You can either use it or not use it, depending on what you actually need to transport. If your game consists of a series of requests that each get a reply, HTTP might make sense. If it's more like unsolicited messages in each direction, making HTTP work is like putting a square peg in a round hole.
Most platforms provide a socket interface that allows you to work with either TCP or UDP depending on the protocol specified when the socket is allocated. Some higher-level APIs look completely different for different protocols.
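As a sketch in node.js terms (the host, port and message are placeholders), the TCP and UDP client APIs differ, but both come down to "open a socket of the chosen protocol and write to it":

```js
// TCP vs UDP from a node.js client.
const net = require('net');
const dgram = require('dgram');

// TCP: connect first; afterwards the stream is reliable and ordered.
const tcp = net.createConnection({ host: 'game.example.com', port: 4000 }, () => {
  tcp.write('move player 1 north\n');
});

// UDP: no connection; each send is an independent datagram that may be
// lost, duplicated or reordered.
const udp = dgram.createSocket('udp4');
udp.send(Buffer.from('move player 1 north'), 4000, 'game.example.com');
```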

Using UDP to parallelize HTTP reads

Apparently, I don't get truly parallel reads of different URLs on the same server, even when issuing the requests simultaneously over multiple physical interfaces (NICs).
I think the problem could be that HTTP is connection-oriented, so requests are serialized at a lower level in the TCP/IP stack (is this the correct wording?).
Does it make sense to attempt to "reimplement" a high-level HTTP request over a connectionless scheme like UDP, handling packet addressing myself, to speed up streaming?
HTTP requests are independent. They can be issued over arbitrarily many independent connections. HTTP does not impose any limits on concurrency.
You hit some resource limit. Maybe your client library restricts the number of concurrent calls. Maybe the server does. Maybe the network is fully utilized. Maybe back-end resources that the server uses are maxed out.
Find the bottleneck and eliminate it. The transport protocol is not the problem. Changing it can't help.
different URLs
Whether the URL is different or not makes no difference, unless the server implements some special per-URL throttling - which is highly unlikely.
on multiple physical interfaces (NICs).
You are probably not network-bound.
requests are serialized at a lower level in the TCP/IP stack
No. Connection management is not part of HTTP. The client decides how many connections to use. Reconfigure the client.
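For example, in node.js the number of parallel connections per host is a client-side agent setting, not something HTTP itself limits (the host, paths and the value 16 below are arbitrary illustration choices):

```js
// Raise the client's per-host connection limit and fire requests in parallel.
const http = require('http');

const agent = new http.Agent({
  keepAlive: true,
  maxSockets: 16,          // allow up to 16 concurrent connections per host
});

for (let i = 0; i < 16; i++) {
  http.get({ host: 'example.com', path: `/item/${i}`, agent }, (res) => {
    res.resume();          // drain the body; only the parallelism matters here
    console.log(`/item/${i} -> ${res.statusCode}`);
  });
}
```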
Does it make sense to attempt to "reimplement" a high-level HTTP request over a connectionless scheme like UDP, handling packet addressing myself, to speed up streaming?
You would have to re-implement flow control, segment fragmentation, re-transmission and the other features of the TCP protocol yourself. And then your HTTP implementation would no longer be compatible with the standard one.
So no, it does not make much sense.
For streaming you may want to use a protocol designed for streaming, such as WebRTC.

Icecast transport layer protocol - TCP or UDP?

I can't seem to find an answer, so I'm asking you.
Does a stock Icecast2 server use TCP or UDP to broadcast the streaming data? I know that it uses a custom HTTP-based application layer protocol, so one might think it's TCP, but on the other hand it is a broadcast application, so UDP would seem more logical to me. If it uses TCP nonetheless, why does it do that?
Icecast and SHOUTcast both use TCP for both the source streams and streaming to end clients. There are many reasons this is beneficial:
The codecs used by most internet radio stations do not lend themselves well to losing chunks of data. If the stream is corrupted, whether by lost or out-of-order packets, the decoder may sometimes be able to re-sync and continue, but many decoders will simply fail.
Most internet radio stations have no real latency requirement. Nobody knows or cares if they get the audio delayed by a few seconds. It is actually typical to crank up the buffer size to allow clients to start playback quickly, causing delays of 10-30 seconds.
It is important to be compatible with HTTP. I suspect that when Nullsoft originally built SHOUTcast, their goal was to get up and running with it as simply as possible, so it makes sense that they mimicked HTTP. I suspect that the reason Icecast and SHOUTcast are so popular is that it is easy to write a client for them because it is essentially HTTP. Now that web-based players are a reality (with Flash and even HTML5), it is critical that the protocol be compatible with HTTP as many browsers do not support other streaming protocols. (Flash has its own protocol, but it is not nearly as simple as HTTP to implement.) If a client can play a file streamed from an HTTP server, it can stream from Icecast (and SHOUTcast if it is lenient in its HTTP implementation).
You mentioned broadcast... I don't know if you meant it in the sense of UDP broadcast packets, but those do not work well in practice over the internet. So the only real benefit of using UDP would be to reduce overhead, and for the reasons above, the few bytes saved per packet don't outweigh the benefits of TCP for this type of application.
In short, this is not a telephony application where latency matters and custom clients can be used.

How do XMPP/HTTP/etc. *really* work?

This might be a dumb question, but I have been continually frustrated by what seems to be a big gap in every explanation I've seen of protocols like XMPP or HTTP. When I read documentation on either, it will generally describe the structure of the data sent back and forth through the protocol, but it does not explain exactly how this data is transferred. It's one thing to provide an example of, say, a generic HTTP request, but it is something else to explain how this text is actually sent to the server.
Posed another way, what resources are out there for learning best practices for implementing text-based protocols? At their core, are all text-based protocols basically the same thing? How, for example, would it differ at the binary level were I to, say, send the text content of an HTTP request over IRC versus however it is done natively by HTTP?
If I wanted to develop my own simple textual protocol, what would be the best way to send the text to a client? Does the content itself even really matter? What I mean is that, obviously, HTTP and XMPP are rather different protocols, but do they differ in terms of how the text is transferred from computer to computer?
HTTP, IRC and XMPP are all sent on top of TCP, which is a protocol that provides a bidirectional stream between two endpoints (IP address + port). Under the hood, the data you send is split into separate packets, sent across the network, and reassembled on the other end, so that the recipient just sees a stream of incoming data, except when something goes wrong.
What that means is that while the application protocol (HTTP, XMPP etc) is different, the underlying transport mechanism is exactly the same. It would be possible (perhaps even interesting) to implement HTTP on top of IRC: an HTTP/IRC client enters a channel, sends the HTTP request as messages to the channel, line by line, a server is present in the channel, reads the request and sends the response the same way - but transporting HTTP over IRC is fundamentally different from transporting HTTP over TCP. The former means layering an application protocol over another application protocol (and the IRC connection needs to go over TCP anyway), while the latter is an application protocol over a transport protocol, which is the way things usually are done (except for various kinds of proxies).
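If it helps close the gap the question describes, here is a minimal node sketch of "how the text is actually sent": the literal HTTP request bytes are written onto a TCP stream and the raw response bytes are read back (example.com is just a placeholder host):

```js
// Write an HTTP request by hand over a plain TCP socket and dump the reply.
const net = require('net');

const socket = net.createConnection({ host: 'example.com', port: 80 }, () => {
  socket.write(
    'GET / HTTP/1.1\r\n' +
    'Host: example.com\r\n' +
    'Connection: close\r\n' +
    '\r\n'
  );
});

socket.on('data', (chunk) => process.stdout.write(chunk));   // raw response bytes
socket.on('end', () => console.log('\n-- connection closed by server --'));
```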
Hope that makes some sense...

HTTP push over 100,000 connections

I want to use a client-server protocol to push data to clients which will always remain connected, 24/7.
HTTP is a good general-purpose client-server protocol. I don't think the semantics could be very different for any other protocol, and many good HTTP servers exist.
The critical factor is the number of connections: the application will gradually scale up to a very large number of clients, say 100,000. They cannot be servers because they have dynamic IP addresses and may be behind firewalls. So, a socket link must be established and preserved, which leads us to HTTP push. Only rarely will data actually be pushed to a given client, so we want to minimize the connection overhead too.
The server should handle this by accepting the connection, inserting the remote IP and port into a table, and leaving it idle. We don't want 100,000 threads running, just so many table entries and file descriptors.
Is there any way to achieve this using an off-the-shelf HTTP server, without writing at the socket layer?
Use Push Framework: http://www.pushframework.com.
It was designed for that goal of managing a large number of long-lived asynchronous full-duplex connections.
LightStreamer (http://www.lightstreamer.com/) is a tool made specifically for push operations over HTTP.
It should solve this problem.
You could also look at Jetty + Continuations.
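To illustrate the "insert it into a table and leave it idle" idea from the question: with an event-driven server such as node's http module, each held connection is just an open response object in a map, not a thread (the port, ids and pushTo helper below are sketch details, not any framework's API):

```js
// Hold every incoming connection open and push to it later by id.
const http = require('http');
const clients = new Map();      // clientId -> held-open response
let nextId = 0;

const server = http.createServer((req, res) => {
  const id = nextId++;
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  clients.set(id, res);
  req.on('close', () => clients.delete(id));   // drop the entry when the client goes away
});

// Push data to one idle client whenever there is something for it.
function pushTo(id, data) {
  const res = clients.get(id);
  if (res) res.write(data + '\n');
}

server.listen(8081, () => console.log('holding connections on :8081'));
```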
