Using WebSocket over raw TCP even when no web browser is involved: good idea? - networking

After reviewing the differences between raw TCP and WebSocket, I am considering using WebSocket even though this will be a client/server system with no web browser in the picture. My motivation stems from:
WebSocket is message-oriented, so I do not have to design a protocol on top of the TCP layer to delimit messages myself.
The initial WebSocket handshake fits my use case well, since I can authenticate/authorize the user in that initial request-response exchange.
Performance does matter a lot here, though. I am wondering whether, excluding the WebSocket handshake, there would be any loss of performance with WebSocket messages compared to a custom protocol on raw TCP. If not, WebSocket is the most convenient choice for me, even if I don't use the benefits related to the "web" part.
Also, would using wss change the answer to the above question?

You are basically asking whether using an already implemented library that fits your requirements perfectly, and that even has an option for secure connections (wss), is better than designing and implementing your own message-based protocol on TCP, assuming that performance and overhead are not relevant for your use case.
If you rephrase your question this way, the answer should be obvious: using an existing implementation that fits your purpose saves you a lot of time and hassle in design, implementation and testing. It is also easier to train developers to use the protocol, and easier to debug problems, since common tools like Wireshark already understand it.
Apart from this, WebSockets have an established mechanism for working through proxies and use a common protocol, so they pass firewalls more easily, etc. You will therefore likely run into fewer problems when rolling out your application.
In other words: I can see no reason why you should not use WebSockets if they fit your purpose.
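To make concrete what the existing implementation buys you, here is a minimal sketch using the Python websockets library (a recent version, where the connection handler takes a single argument); the port, the token value and the message format are placeholders. The point is that send() and recv() work in whole messages, so there is no framing code to write, and the first exchange can carry authentication.

    import asyncio
    import websockets

    async def handler(websocket):
        # Authenticate on the first message (the original idea of checking
        # credentials during the HTTP upgrade handshake works too; how you read
        # the handshake headers depends on the library version).
        token = await websocket.recv()
        if token != "expected-token":              # placeholder check
            await websocket.close(code=4001, reason="unauthorized")
            return
        async for message in websocket:            # one iteration per complete message
            await websocket.send("ack: " + message)

    async def main():
        async with websockets.serve(handler, "127.0.0.1", 8765):
            await asyncio.Future()                 # run until cancelled

    asyncio.run(main())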

Deciding between TCP connection V/s web socket [closed]

We are developing a browser extension that sends all the URLs visited by a logged-in user to backend APIs, where they are persisted.
The number of requests sent to the backend API would be huge, so we are torn between maintaining a persistent connection via WebSocket or going over plain HTTP, i.e. using REST API calls.
The data posted to the backend API doesn't need to be real time, as we will only use it in our models, which don't demand real-time data.
We are inclined towards HTTP REST API calls for the reasons below:
Easy to implement
Easy to scale (using auto-scaling techniques)
Everyone in the team is already comfortable with the rest APIs
But at the same time the cons would be:
At the scale where we would have a lot of POST requests going to the server, we are not sure it would be optimal
It feels like WebSockets could give us a more optimised infrastructure :(
I would love to hear from the community about any pitfalls of going with the REST API option.
So first of all, TCP is the transport layer. It is not really possible to use raw TCP as such: you have to create some protocol on top of it and give meaning to the stream of data.
REST, HTTP, or even WebSockets will never be as efficient as a custom-designed protocol on top of raw TCP (or even UDP). However, the gain may not be as spectacular as one might think. I've actually done such a transition once and we saw only a few percent of performance gain, and it was neither easy to do correctly nor easy to maintain. Of course, YMMV.
Why is that? Well, the reason is that HTTP is already quite highly optimized. First of all, the "keep-alive" header keeps the connection open for reuse, so the default HTTP mechanisms already give you a persistent connection. Secondly, HTTP supports body compression out of the box, and HTTP/2 adds header compression as well. With HTTP/3 you even get more efficient TLS usage and better behaviour on unstable networks (e.g. mobile).
Another thing is that since you do not require real-time data, you can buffer. Instead of sending data each time it becomes available, you gather it for, say, a few seconds, minutes, or maybe even hours, and send it all in one go. With such an approach the difference between HTTP and a custom protocol becomes even less noticeable.
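As a rough sketch of that buffering idea (the endpoint URL, batch size and flush interval are made-up values, and requests is just one possible HTTP client):

    import time
    import requests  # any HTTP client with connection reuse will do

    BATCH_SIZE = 100          # made-up thresholds; tune for your traffic
    FLUSH_INTERVAL = 60.0     # seconds

    session = requests.Session()   # keep-alive: the TCP/TLS connection is reused
    buffer = []
    last_flush = time.monotonic()

    def record_visit(url: str) -> None:
        """Buffer one visited URL and flush the batch when it is big or old enough."""
        global last_flush
        buffer.append({"url": url, "ts": time.time()})
        if len(buffer) >= BATCH_SIZE or time.monotonic() - last_flush >= FLUSH_INTERVAL:
            # hypothetical endpoint; one request now carries many data points
            session.post("https://api.example.com/visits/batch", json=buffer, timeout=10)
            buffer.clear()
            last_flush = time.monotonic()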
All in all: I advise you to start with the simplest solution there is, which in your case seems to be REST. Design your code so that a transition to another protocol is as simple as possible. Optimize later if needed. Always measure.
By the way, there are lots of valid privacy and security concerns around your extension. For example, I'm quite surprised that you didn't mention TLS at all. It matters not only for security but also for performance: establishing TLS connections is not free (although once a connection is established, encryption does not affect performance much).
Putting my discomfort aside (privacy, anyone?)...
Assuming your extension collates the information, you might consider "pushing" to the server every time the browser starts or quits and then once again every hour or so (users hardly ever quit their browsers these days)... this would make REST much more logical.
If you aren't collating the information on the client side, you might prefer a WebSocket implementation that pushes data in real time.
However, whatever you decide, you would also want to decouple the API from the transmission layer.
This means that (ignoring authentication paradigms) the WebSockets and REST implementations would look largely the same and be routed to the same function that contains the actual business logic... a function you could also call from a script or from the terminal. The network layer details should be irrelevant as far as the API implementation is concerned.
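A sketch of that decoupling, with the business logic in one plain function and two thin transport adapters around it. Flask and the websockets library are arbitrary choices here, and the message fields are hypothetical:

    import json

    from flask import Flask, request          # REST adapter (arbitrary choice)

    app = Flask(__name__)

    def record_visit(user_id: str, url: str) -> dict:
        """The actual business logic; no knowledge of HTTP or WebSocket here."""
        # ... persist the visit somewhere ...
        return {"status": "ok"}

    @app.post("/visits")                      # REST entry point
    def rest_visits():
        body = request.get_json()
        return record_visit(body["user_id"], body["url"])

    async def ws_visits(websocket):           # WebSocket entry point,
        async for raw in websocket:           # e.g. passed to websockets.serve(...)
            msg = json.loads(raw)
            result = record_visit(msg["user_id"], msg["url"])
            await websocket.send(json.dumps(result))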
As a last note: I would never knowingly install an extension that collects so much data about me, especially since URLs often contain private information (used for REST API routing). Please reconsider whether you want to take part in creating such a product... they cannot violate our privacy if we don't build the tools that make it possible.

Implementing server push with Twisted framework

I am developing a group chat using the Python Twisted framework. The technique I am using is long polling with Ajax, returning SERVER_NOT_DONE_YET to keep the connection open. The code is non-blocking and allows other requests. How scalable is that?
However, I want to move beyond streaming over open connections and implement pure server push. How do I do that? Do I need to go in the direction of XMPP? If I open a socket on the server for each unique client, which web server would best suit the bridging, and how scalable would that be?
I want it to scale on the order of the C10K problem. I would like to stick with Twisted because it provides a lot of protocol implementations with little effort. Please point me in the right direction. Thanks.
Long-polling works, but isn't necessarily your best option. It starts getting really nasty in terms of integration with firewalls and flaky internet connections. For example, at work, a lot of our customers' firewalls kill off any HTTP connection that isn't active for 10-20 seconds.
We've solved a lot of problems by switching over to WebSocket over SSL. WebSocket gives you a full-duplex channel, which is perfect for server push. With SSL, firewalls are often less aggressive in their garbage collecting, and transparent proxies are often fooled by the TLS encryption. You will still need to manage the occasional disconnection at the application level, even if you're using WebSockets instead of long polling, but even that can be handled gracefully with a decent recovery protocol, regardless of the transport protocol you use.
This being said, instead of going directly for WebSockets, we've decided to use SockJS. The main reason for this choice was that SockJS can use WebSockets when available (rfc6455, hixie-76, hybi-10), but also fall back to xhr-streaming, xdr-streaming, etc, if the client's browser does not support it (or if the connection fails). When I say that it can "fall back", I mean that the code you use on the client side remains exactly the same, SockJS takes care of the dirty work.
On the server side, the same is true. We currently use Cyclone's SockJS implementation for Twisted (in production), but we're also aware of DesertBus' implementation, which we still have to check out. There's also some other stuff that we're hoping to check out, for example WAMP, and the accompanying Autobahn|Python.
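For reference, here is a minimal server-push sketch using Autobahn|Python on Twisted, one of the options mentioned above; the port, the push interval and the payload are placeholders.

    from autobahn.twisted.websocket import (WebSocketServerFactory,
                                            WebSocketServerProtocol, listenWS)
    from twisted.internet import reactor, task

    clients = set()

    class PushProtocol(WebSocketServerProtocol):
        def onOpen(self):
            clients.add(self)                      # track connected clients

        def onClose(self, wasClean, code, reason):
            clients.discard(self)

    def broadcast():
        for client in clients:
            client.sendMessage(b"tick", isBinary=False)   # push without any request

    factory = WebSocketServerFactory("ws://127.0.0.1:9000")
    factory.protocol = PushProtocol
    listenWS(factory)
    task.LoopingCall(broadcast).start(5.0)         # push to everyone every 5 seconds
    reactor.run()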
With regards to performance, we use HAProxy for SSL termination and load-balancing. HAProxy's performance is pretty amazing, on a multitude of levels.
We have migrated to WebSockets now. It works perfectly fine!

Other common protocols besides HTTP?

I usually pass data between my web servers (in different locations) using HTTP requests (sometimes over SSL if the data is sensitive). I was wondering whether there are any lighter protocols I could swap in for HTTP(S) that would also support public/private keys, like SSH does.
I used PHP sockets to build an SMTP client before, so I wouldn't mind doing that again if required.
There are lots and lots and lots of protocols. Lots. Start here for a list.
http://en.wikipedia.org/wiki/Internet_Protocol_Suite
SFTP is fun for passing data around. It works well. You'll find that it's not much better than HTTP, however, because HTTP is pretty simple.
http://en.wikipedia.org/wiki/SSH_file_transfer_protocol
SMTP would work. http://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol
SNMP can be made to work. http://en.wikipedia.org/wiki/Simple_Network_Management_Protocol You have to really push the envelope.
All of these, however, involve TCP/IP sockets, which involve a fair amount of overhead because of the negotiation for a connection and the acknowledgement of packets.
If you want real fun with very low overhead, use UDP.
http://en.wikipedia.org/wiki/User_Datagram_Protocol
You might want to use Reliable UDP if you're worried about messages getting dropped.
http://en.wikipedia.org/wiki/Reliable_User_Datagram_Protocol
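To make the low-overhead point concrete, this is essentially the entirety of a UDP exchange in Python (addresses are placeholders): there is no connection setup and no delivery guarantee, which is exactly why something like Reliable UDP exists.

    import socket

    # "Server": just bind and wait for datagrams.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 9999))

    # "Client": no connect/handshake, just fire a datagram at the address.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b'{"event": "ping"}', ("127.0.0.1", 9999))

    data, addr = receiver.recvfrom(65535)   # one datagram in, one message out
    print(data, "from", addr)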
I'd like to mention XMPP in addition to protocols already listed in other answers.
It's lightweight, and it is used in some "realtime" communication systems (for example, in GTalk).
WebSocket is a good option if you are interested in keeping a connection open to pass multiple messages back and forth. It's useful for issuing updates from the server to clients in real time, for example.
Why don't you simply use FTPS:
http://en.wikipedia.org/wiki/FTPS
or SFTP
http://en.wikipedia.org/wiki/SSH_file_transfer_protocol

TCP Vs. Http Benchmark

I have a web application sitting on IIS, talking to a [remote] service machine.
I am not sure whether to choose TCP or HTTP as the main protocol.
More details:
I will have more than one service/endpoint
some of them will be one-way
others will be two-way
the web pages will work in front of the services
we are talking about a high-scale website
I know the difference pretty well, but I am looking for a good benchmark that shows how much faster TCP is.
HTTP is a layer built on top of TCP to somewhat standardize data transmission, so naturally using TCP sockets directly will be less heavyweight than using HTTP. If performance is the only thing you care about, then plain TCP is the best solution for you.
You may want to consider HTTP because of its ease of use and simplicity, which ultimately reduces development time. If you are doing something that might be directly consumed by a browser (through an AJAX call), then you should use HTTP. For a non-modern browser to consume TCP connections directly without HTTP, you would have to use Flash or Silverlight, which normally happens for rich content such as video and/or audio. However, many modern browsers now (as of 2013) support APIs to access network, audio, and video resources directly via JavaScript. The only thing to consider is the usage rate of modern web browsers among your users; see caniuse.com for the latest info regarding browser compatibility.
As for benchmarks, this is the only thing I found. See page 5: it has the performance graph. Note that it doesn't really compare apples to apples, since it compares the TCP/binary option with the HTTP/XML option. Which raises the question: what kind of data are your services outputting, binary (video, audio, files) or text (JSON, XML, HTML)?
In general, performance-oriented systems like those in the military or financial sectors will probably use plain TCP connections, whereas general web-focused companies will opt for HTTP and host their services with IIS or Apache.
The question you really need an answer for is "will TCP or HTTP be faster for my application". The answer is that it depends on the nature of your application, and on the way that you use TCP and/or HTTP in your application. A generic HTTP vs TCP benchmark won't answer your question, because the chances are that the benchmark won't match your application behaviour.
In theory, an optimally designed / implemented solution using TCP will be faster than one that uses HTTP. But it may also be considerably more work to implement ... depending on the details of your application.
There are other issues that might affect your choice. For example, you are less likely to run into firewall issues if you use HTTP than if you use TCP on some random port. Another is that HTTP would make it easier to implement a load balancer between the IIS server and the backend systems.
Finally, at the end of the day it is probably more important that your system is secure, reliable, maintainable and (maybe) scalable than it is fast. A sensible strategy is to implement the simple version first, but have plans in your head for how to make it faster ... if the simple solution is too slow.
You could always benchmark it.
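A crude client-side harness for doing exactly that; it assumes an HTTP endpoint and a raw TCP echo server are already running at these hypothetical addresses, and it only measures round trips as seen by this one client.

    import socket
    import time

    import requests   # HTTP client; reusing a Session keeps the connection open

    N = 1000

    start = time.perf_counter()
    with requests.Session() as session:
        for _ in range(N):
            session.get("http://127.0.0.1:8080/ping")        # hypothetical endpoint
    http_secs = time.perf_counter() - start

    start = time.perf_counter()
    sock = socket.create_connection(("127.0.0.1", 9090))     # hypothetical echo server
    for _ in range(N):
        sock.sendall(b"ping\n")
        sock.recv(64)                                        # assumes one reply per line
    tcp_secs = time.perf_counter() - start
    sock.close()

    print(f"{N} round trips: HTTP {http_secs:.3f}s, raw TCP {tcp_secs:.3f}s")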
In general, if what you want to accomplish can be easily done over HTTP (i.e. the only reason you would otherwise think about using raw TCP is for a possible performance boost) you should probably just use HTTP. Sure, you can do socket programming, but why bother? Lots of people have spent a lot of time and effort building HTTP client libraries and servers, and they have spent waaaaaay more time optimizing and testing that code than you will ever be able to possibly spend on your TCP sockets. There are simply so many possible errors that you would have to handle, edge cases, and optimizations that can be done, that it is usually easier and safer to use a library function for HTTP.
Plus, the HTTP specs define all kinds of features (and clients/servers implement, which you get to use "for free", i.e. no extra implementation work) which makes any third-party interoperability that much easier. "Here is my URL, here are the rules for what you send, here are the rules for what I return..."
I have a self-hosted native C++ Windows server application built with the Casablanca C++ REST SDK. Any client that can send POST, GET, PUT or DEL messages (C#, JavaScript, C++, cURL, basically anything) can send requests to this self-hosted Windows app, and I can also use a plain browser address bar for GET-related requests with various parameters. Currently I only run this system on a private intranet, so it is very fast; I haven't benchmarked it against raw TCP, but on a private intranet I doubt there would be more than a few microseconds' difference. For the convenience, the ease of development and the ability to expand to a full-blown internet app, it's a dream come true. It is a dedicated system with a private protocol using small JSON packets, so I'm not certain whether that fits your application's needs. Another nice thing is that this native C++ Windows code could be ported fairly easily to Linux/macOS, as the Casablanca REST SDK is portable to those OSes.

Designing an application protocol [closed]

I have an existing standalone application which is going to be extended by a 3rd-party, using a network protocol. The capabilities are already implemented, all I need is to expose them to the outside.
Assuming the transport protocol is already chosen (UDP), are there any resources that will help me to design my application protocol?
There seems to be a lot of information about software design, but not on protocol design.
I've already looked at Application Protocol Design.
See the Jabber protocol design guidelines and RFC 4101. Although the latter is aimed at making RFCs easier for reviewers to understand, it provides some interesting advice.
Have you looked at Google Protocol Buffers? They seem like a good way to resolve this issue.
You can create an endpoint that communicates with your existing app and then responds to the outside using the protobuf wire format. It's binary, so it's tiny and fast, and you don't have to write your own protocol handling because you can use Google's. The downside is that it has to be implemented on both sides of the system (on your 'server' side and on the consumer/client side).
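A sketch of what that looks like from Python, assuming a hypothetical telemetry.proto has been compiled with protoc --python_out=. to produce a telemetry_pb2 module:

    # telemetry.proto (hypothetical schema):
    #
    #   syntax = "proto3";
    #   message Visit {
    #     string url = 1;
    #     int64  ts  = 2;
    #   }

    import telemetry_pb2   # module generated by protoc from the schema above

    msg = telemetry_pb2.Visit(url="https://example.com/page", ts=1700000000)
    wire = msg.SerializeToString()        # compact binary encoding for the network
    print(len(wire), "bytes on the wire")

    decoded = telemetry_pb2.Visit()
    decoded.ParseFromString(wire)         # the consumer side uses the same schema
    print(decoded.url, decoded.ts)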
Another recommendation for protocol buffers - nice tight binary with little effort. Note, however, that while the binary protocol is well defined, there isn't yet an agreed RPC standard (several are in progress, tending to lean towards TCP or HTTP).
The spec makes it very easy to have the client and server in different architectures, which is good - plus it is extensible.
Caveat: I'm the author of one of the .NET versions, so I may well be biased ;-p
First off, UDP is a connectionless, datagram-oriented transport with no delivery guarantees: it is potentially lossy, so you need to be able to handle missing and out-of-order packets. If you need any level of reliability from UDP, or require two-way connections, you will end up needing just about everything TCP already provides, so you might as well go with TCP to start with and let the network stack take care of it.
Next up, if your data can be larger than a single IP packet, you will need some way of identifying the start and end of each message, and a means of handling illegal or corrupt packets. I would recommend some kind of header carrying the payload length, some kind of footer, and maybe a checksum.
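One possible shape for such framing; the field sizes and the CRC32 footer are arbitrary choices for the sketch.

    import struct
    import zlib

    HEADER = struct.Struct("!I")   # 4-byte big-endian payload length
    FOOTER = struct.Struct("!I")   # 4-byte CRC32 of the payload

    def encode_frame(payload: bytes) -> bytes:
        return HEADER.pack(len(payload)) + payload + FOOTER.pack(zlib.crc32(payload))

    def decode_frame(buf: bytes) -> bytes:
        (length,) = HEADER.unpack_from(buf, 0)
        payload = buf[HEADER.size:HEADER.size + length]
        (crc,) = FOOTER.unpack_from(buf, HEADER.size + length)
        if zlib.crc32(payload) != crc:
            raise ValueError("corrupt frame")          # reject damaged packets
        return payload

    frame = encode_frame(b'{"cmd": "status"}')
    assert decode_frame(frame) == b'{"cmd": "status"}'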
Then you need some way of encoding the messages and responses. There are many RPC protocols around. You could look at SOAP, or design a custom XML-based protocol, or a binary one.
You should think hard about whether you really want to design, document and maintain your own protocol or use something that already exists. It is likely there is already a documented protocol that matches your needs. Depending on what you are doing it will probably look like overkill at first, and implementing the whole spec will seem tedious and a lot less fun than writing your own; but if you intend for your application to still be actively developed in a few years, using something that already exists and is known by third parties should save you a lot of time and money. Besides, if you can use an existing library for that protocol, the implementation part should be a lot faster.
Designing a new protocol is more fun than implementing one, but maintaining one is far less fun, as you have to live with all its defects. No protocol is perfect, but if you have never designed one, you can be sure you will make more mistakes designing it than the people who designed the existing, well-known protocol you could use instead.
In short, leverage what already exists whenever possible.
If you're choosing XML, keep in mind that you will have a huge markup overhead.
A simple binary protocol will also need far fewer resources to parse than XML.
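A quick illustration of the size difference for one made-up record:

    import struct

    # The same reading as XML markup and as a fixed binary layout (made-up fields).
    as_xml = b"<reading><sensor>42</sensor><value>21.5</value><ts>1700000000</ts></reading>"
    as_bin = struct.pack("!Hfq", 42, 21.5, 1700000000)   # sensor id, value, timestamp

    print(len(as_xml), "bytes as XML vs", len(as_bin), "bytes packed binary")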
If you do not want to build your protocol from the ground up, you should take a look at SOAP. Support varies across programming languages, but cross-language communication is explicitly encouraged.
Unfortunately, SOAP over UDP seems to have been stuck in its infancy; HTTP is the transport most commonly used.
I have an existing standalone application which is going to be extended by a 3rd-party, using a network protocol.
It would help to know a little more about what your program does and what the nature of these 3rd party extensions are. Maybe some rationale for using UDP?
