Designing an application protocol [closed] - networking

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I have an existing standalone application which is going to be extended by a 3rd-party, using a network protocol. The capabilities are already implemented, all I need is to expose them to the outside.
Assuming the transport protocol is already chosen (UDP), are there any resources that will help me to design my application protocol?
There seems to be a lot of information about software design, but not on protocol design.
I've already looked at Application Protocol Design.

See the Jabber protocol design guidelines and RFC 4101. Although RFC 4101 is aimed at making protocol descriptions easier for reviewers to understand, it provides some useful advice.

Have you looked at Google Protocol Buffers? They seem like a good way to solve this problem.
You can create an endpoint that communicates with your existing app and then responds to the 'outside' using Protocol Buffers. The format is binary, so messages are small and fast to process, and you don't have to write your own encoding/decoding layer, because you can use Google's libraries. The downside is that it has to be implemented on both sides of the system (on your 'server' side and on the consumer/client side).
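To give a feel for why the encoding is compact, here is a hand-rolled Python sketch of the base-128 "varint" encoding that the protobuf wire format uses for integer fields. It only illustrates the wire format; the function names are made up for this example, and in real code you would use the classes generated by protoc rather than anything like this.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte, LSB group first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F           # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: key = (field_number << 3) | wire type 0."""
    key = (field_number << 3) | 0     # wire type 0 means "varint"
    return encode_varint(key) + encode_varint(value)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:           # continuation bit clear: last byte
            return result, pos
        shift += 7
```

With field number 1 and value 150, the whole field takes three bytes (08 96 01): one key byte combining field number and wire type, plus two varint bytes. Small values fit in a single byte, and no field names travel on the wire.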

Another recommendation for protocol buffers - nice tight binary with little effort. Note, however, that while the binary protocol is well defined, there isn't yet an agreed RPC standard (several are in progress, tending to lean towards TCP or HTTP).
The spec makes it very easy to have the client and server in different architectures, which is good - plus it is extensible.
Caveat: I'm the author of one of the .NET versions, so I may well be biased ;-p

First off, UDP is a connectionless datagram transport: it is well suited to one-way and broadcast traffic, but it is lossy, so you need to be able to handle missing, duplicated and out-of-order packets. If you need any level of reliability from UDP, or require two-way connections, you will end up needing just about everything TCP provides, so you might as well go with TCP to start with and let the network stack take care of it.
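To illustrate how that bookkeeping starts to pile up, here is a minimal Python sketch of just the receiving side, assuming each datagram carries a sender-assigned sequence number starting at 0 (a hypothetical header field; UDP itself provides nothing of the kind). It reorders and de-duplicates; retransmission, timeouts, acknowledgements and congestion control, which TCP also gives you, are still missing.

```python
class InOrderReceiver:
    """Reorders datagrams by sequence number, drops duplicates and reports gaps.

    Assumes a hypothetical sender-assigned sequence number, starting at 0,
    in every datagram; plain UDP has no such field.
    """

    def __init__(self) -> None:
        self.next_seq = 0                    # next sequence number to deliver
        self.pending: dict[int, bytes] = {}  # out-of-order datagrams held back

    def receive(self, seq: int, payload: bytes) -> list[bytes]:
        """Accept one datagram; return the payloads that are now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []                        # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

    def missing(self) -> list[int]:
        """Sequence numbers still outstanding (candidates for a resend request)."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending)) if s not in self.pending]
```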
Next up, if your data is potentially larger than a single IP packet, you will need some way of identifying the start and end of each message, and a means of handling illegal or corrupt packets. I would recommend some kind of header with the message length, some kind of footer, and maybe a checksum.
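A minimal Python sketch of that framing, assuming a hypothetical layout: a 2-byte magic marker and a 4-byte big-endian length in the header, then the payload, then a CRC-32 footer. The decoder is written for a byte stream (such as TCP), where a single read may contain half a frame or several frames.

```python
import struct
import zlib

MAGIC = b"\xab\xcd"             # hypothetical start-of-frame marker
HEADER = struct.Struct(">2sI")  # magic, payload length (big-endian uint32)
FOOTER = struct.Struct(">I")    # CRC-32 of the payload
MAX_PAYLOAD = 1 << 20           # hypothetical sanity limit (1 MiB)


def encode_frame(payload: bytes) -> bytes:
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return HEADER.pack(MAGIC, len(payload)) + payload + FOOTER.pack(crc)


class FrameDecoder:
    """Accumulates bytes from a stream and returns complete, verified payloads."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, data: bytes) -> list[bytes]:
        self.buffer.extend(data)
        frames = []
        while len(self.buffer) >= HEADER.size:
            magic, length = HEADER.unpack_from(self.buffer)
            if magic != MAGIC or length > MAX_PAYLOAD:
                raise ValueError("corrupt frame header")
            total = HEADER.size + length + FOOTER.size
            if len(self.buffer) < total:
                break                   # wait for the rest of the frame
            payload = bytes(self.buffer[HEADER.size:HEADER.size + length])
            (crc,) = FOOTER.unpack_from(self.buffer, HEADER.size + length)
            if zlib.crc32(payload) & 0xFFFFFFFF != crc:
                raise ValueError("checksum mismatch")
            frames.append(payload)
            del self.buffer[:total]     # consume the frame we just decoded
        return frames
```

TCP and UDP already carry their own 16-bit checksums, so the CRC here is an extra application-level check. Whether to resynchronise on the magic marker or drop the connection after a bad frame is a design choice; this sketch simply raises.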
Then you need some way of encoding the messages and responses. There are many RPC protocols around. You could look at SOAP, or design a custom XML-based protocol, or a binary one.

You should really think hard about whether you want to design, document and maintain your own protocol, or use one that already exists. There is probably already a documented protocol that matches your needs. Depending on what you are doing, it may look like overkill at first, and implementing the whole spec will look tedious and a lot less fun than writing your own. But if you intend for your application to still be actively developed in a few years, using something that already exists and is known to third parties should save you a lot of time and money. Besides, if you can use an existing library for that protocol, the implementation part should be a lot faster.
Designing a new protocol is more fun than implementing an existing one, but maintaining your own is less fun, as you have to live with all its defects. No protocol is perfect, but if you have never designed one, you can be sure you will make more mistakes than the people who designed the well-known existing protocol you could use instead.
In short, leverage what already exists whenever possible.

If you're choosing XML, keep in mind that the markup adds a large overhead.
A simple binary protocol also needs far fewer resources to parse than XML.
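A quick Python illustration of the size difference, using a hypothetical three-field "position" message encoded once as XML and once as a fixed binary layout (an unsigned 32-bit integer and two 32-bit floats).

```python
import struct
import xml.etree.ElementTree as ET

# A hypothetical "position update" message with three fields.
player_id, x, y = 42, 1.5, -3.25

# XML encoding: every field carries an opening and a closing tag.
root = ET.Element("position")
ET.SubElement(root, "playerId").text = str(player_id)
ET.SubElement(root, "x").text = repr(x)
ET.SubElement(root, "y").text = repr(y)
xml_bytes = ET.tostring(root)

# Binary encoding: uint32 + two float32 in a fixed layout, no field names on the wire.
POSITION = struct.Struct(">Iff")
binary_bytes = POSITION.pack(player_id, x, y)

print(len(xml_bytes), len(binary_bytes))

# Decoding the binary form is a single fixed-offset unpack, no text parsing.
assert POSITION.unpack(binary_bytes) == (player_id, x, y)
```

The binary message is always 12 bytes; the XML one is several times larger and has to be tokenised and converted back from text on every read.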

If you do not want to build your protocol from the ground up, you should take a look at SOAP. Support varies between programming languages, but cross-language communication is explicitly encouraged.
Unfortunately, SOAP over UDP seems to have stalled in its infancy; HTTP is by far the most commonly used transport for SOAP.

I have an existing standalone application which is going to be extended by a 3rd-party, using a network protocol.
It would help to know a little more about what your program does and what these 3rd party extensions will be. And what is your rationale for using UDP?

Related

Deciding between TCP connection V/s web socket [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
We are developing a browser extension which would send all the URLs visited by a logged in user to backend APIs to be persisted.
Because the number of requests sent to the backend API will be huge, we are unsure whether to create a persistent connection via WebSocket or to use a TCP connection, i.e. HTTP REST API calls.
The data posted to the backend API doesn't need to be real time, since we will only use it in our models, which don't require real-time data.
We are leaning towards HTTP REST API calls for the following reasons:
Easy to implement
Easy to scale (using auto-scaling techniques)
Everyone on the team is already comfortable with REST APIs
But at the same time, the cons would be:
At a scale where a lot of POST requests go to the server, we're not sure it would be optimal
It feels like WebSockets could give us a more optimised infrastructure :(
I would love to hear from the community about any pitfalls of going with the REST API option.
So first of all, TCP is the transport layer. You can't use raw TCP on its own; you have to create some protocol on top of it that gives meaning to the stream of data.
REST, plain HTTP or even WebSockets will never be as efficient as a custom-designed protocol on top of raw TCP (or even UDP). However, the gain may not be as spectacular as one might think. I've actually done such a transition once, and we saw only a few percent of performance gain. It was neither easy to do correctly nor easy to maintain. Of course, YMMV.
Why is that? Well, the reason is that HTTP is already quite highly optimized. First of all, HTTP supports persistent ("keep-alive") connections that stay open while they are in use, and in HTTP/1.1 persistent connections are the default. Secondly, HTTP supports body compression (e.g. gzip via Content-Encoding), and HTTP/2 also compresses headers. With HTTP/3 you even get more efficient TLS usage and better behaviour on unstable networks (e.g. mobile).
Another thing is that since you do not require real-time data, you can do buffering. Rather than sending data each time it becomes available, you gather it for, say, a few seconds, minutes or maybe even hours, and send it all in one go. With that approach the difference between HTTP and a custom protocol becomes even less noticeable.
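A minimal Python sketch of that buffering, assuming a hypothetical `send` callable that performs the actual HTTP POST. Events are flushed as one gzip-compressed JSON body when either a count or an age threshold is reached; both thresholds are arbitrary example values, not recommendations.

```python
import gzip
import json
import time


class BatchingSender:
    """Buffers events and flushes them as one gzip-compressed JSON body.

    `send` is a hypothetical callable standing in for the real HTTP POST.
    """

    def __init__(self, send, max_events=500, max_age_seconds=60.0, clock=time.monotonic):
        self.send = send
        self.max_events = max_events
        self.max_age = max_age_seconds
        self.clock = clock
        self.events = []
        self.first_event_at = None

    def add(self, event: dict) -> None:
        if not self.events:
            self.first_event_at = self.clock()   # age of the batch starts here
        self.events.append(event)
        too_many = len(self.events) >= self.max_events
        too_old = self.clock() - self.first_event_at >= self.max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if not self.events:
            return
        body = gzip.compress(json.dumps(self.events).encode("utf-8"))
        # One request for the whole batch, e.g. sent with "Content-Encoding: gzip".
        self.send(body)
        self.events = []
        self.first_event_at = None
```

In this sketch the age limit is only checked when a new event arrives, so a real client would also need a timer (or a flush on shutdown) to send the last partial batch.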
All in all: I advise you to start with the simplest solution there is, which in your case seems to be REST. Design your code so that a transition to another protocol is as simple as possible. Optimize later if needed. Always measure.
Btw, there are lots of valid privacy and security concerns around your extension. For example, I'm quite surprised that you didn't mention TLS at all. It matters not only for security but also for performance: establishing TLS connections is not free (although once a connection is established, encryption does not affect performance much).
Putting my discomfort aside (privacy, anyone?)...
Assuming your extension collates the information, you might consider "pushing" to the server every time the browser starts or quits, and then once again every hour or so (users hardly ever quit their browsers these days)... this would make REST much more logical.
If you aren't collating the information on the client side, you might prefer a WebSocket implementation that pushes data in real time.
However, whatever you decide, you would also want to decouple the API from the transmission layer.
This means that (ignoring authentication paradigms) the WebSockets and REST implementations would look largely the same and be routed to the same function that contains the actual business logic... a function you could also call from a script or from the terminal. The network layer details should be irrelevant as far as the API implementation is concerned.
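A minimal Python sketch of that separation, with hypothetical names throughout: `record_visits` holds the business logic, and two thin adapters translate an HTTP request body or a WebSocket message into a call to it. It assumes authentication has already happened in the transport layer and produced a `user_id`, and it uses an in-memory dict in place of real storage.

```python
import json

# Hypothetical in-memory storage standing in for a database.
STORE: dict[str, list[str]] = {}


def record_visits(user_id: str, urls: list[str]) -> int:
    """Business logic: knows nothing about HTTP or WebSockets."""
    STORE.setdefault(user_id, []).extend(urls)
    return len(urls)


def handle_rest_post(user_id: str, body: bytes) -> tuple[int, bytes]:
    """Thin REST adapter: parse the request body, call the shared function."""
    payload = json.loads(body)
    accepted = record_visits(user_id, payload["urls"])
    return 200, json.dumps({"accepted": accepted}).encode()


def handle_ws_message(user_id: str, message: str) -> str:
    """Thin WebSocket adapter: same parsing, same shared function."""
    payload = json.loads(message)
    accepted = record_visits(user_id, payload["urls"])
    return json.dumps({"accepted": accepted})
```

Because neither adapter contains logic of its own, switching from REST to WebSockets (or supporting both) only means writing another adapter, and `record_visits` can still be called directly from a script.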
As a last note: I would never knowingly install an extension that collects so much data on me. Especially since URLs often contain private information (used for REST API routing). Please reconsider if you want to take part in creating such a product... they cannot violate our privacy if we don't build the tools that make it possible.

Using websocket over raw tcp even when no web browser is involved: good idea?

After reviewing the differences between raw TCP and WebSocket, I am thinking of using WebSocket, even though it will be a client/server system with no web browser in the picture. My motivation stems from:
WebSocket is message-oriented, so I do not have to write my own protocol on top of the TCP layer to delimit messages.
The initial WebSocket handshake fits my use case well, as I can authenticate/authorize the user in this initial request-response exchange.
Performance does matter a lot here, though. Excluding the WebSocket handshake, would there be a loss of performance with WebSocket messages compared to a custom protocol on raw TCP? If not, then WebSocket is the most convenient choice for me, even if I don't use the benefits related to the "web" part.
Also, would using wss change the answer to the above question?
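To put numbers on the per-message overhead the question asks about, here is a Python sketch of how a single unfragmented binary WebSocket frame is built according to RFC 6455 (section 5.2). The function names are illustrative, not a library API. The header is 2 bytes for payloads up to 125 bytes, 4 bytes up to 65,535 bytes and 10 bytes beyond that; frames sent from client to server must additionally carry a 4-byte masking key, and their payload is XORed with it.

```python
import os
import struct

OPCODE_BINARY = 0x2


def encode_ws_frame(payload: bytes, mask: bool = False) -> bytes:
    """Build one unfragmented binary WebSocket frame (RFC 6455, section 5.2)."""
    first = 0x80 | OPCODE_BINARY              # FIN bit set, binary opcode
    mask_bit = 0x80 if mask else 0x00
    n = len(payload)
    if n <= 125:
        header = struct.pack(">BB", first, mask_bit | n)
    elif n <= 0xFFFF:
        header = struct.pack(">BBH", first, mask_bit | 126, n)  # 16-bit extended length
    else:
        header = struct.pack(">BBQ", first, mask_bit | 127, n)  # 64-bit extended length
    if not mask:
        return header + payload               # server-to-client frames are unmasked
    key = os.urandom(4)                       # client-to-server frames must be masked
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return header + key + masked


def overhead(n: int, mask: bool) -> int:
    """Bytes added on top of an n-byte payload."""
    return len(encode_ws_frame(bytes(n), mask)) - n
```

For comparison, a custom protocol with a 4-byte length prefix costs 4 bytes per message, so the framing difference is a handful of bytes plus the client-side masking pass. With wss, TLS adds its own record overhead and encryption, but that cost would apply equally to a custom protocol running over TLS.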
You are basically asking whether using an already implemented library, which perfectly fits your requirements and even has the option for secure connections (wss), is better than designing and implementing your own message-based protocol on TCP, assuming that performance and overhead are not relevant for your use case.
If you rephrase your question this way the answer should be obvious: using an existing implementation which fits your purpose saves you a lot of time and hassle for design, implementation and testing. It is also easier to train developers to use this protocol. It is easier to debug problems since common tools like Wireshark understand the protocol already.
Apart from this, WebSockets have an established mechanism for using proxies, and they use a common protocol, so they can more easily pass firewalls, etc. So you will likely run into fewer problems when rolling out your application.
In other words: I see no reason why you should not use WebSockets if they fit your purpose.

How to author an Internet protocol?

We're all familiar with popular protocols like IMAP and POP, used for email messaging.
I have a plan for a new protocol, but I'm not sure how to go about implementing it.
Is the protocol a collection of C source code, for example, that accepts and sends data through ports? Or is a protocol just a thorough description of how data should be sent, which clients then implement?
I'm lost where to start here, and I'm not very familiar with how the protocol system works.
Edit:
Also, if I write a protocol and it isn't made official by the standards group, can people/clients still implement it?
The official way is to write an RFC - a Request for Comments. People will respond to that (that's why it's an RFC) and probably try to implement your protocol.
Roughly speaking, once two independent, interoperable implementations exist that completely support the protocol, it can advance to become a standard.
Of course, people aren't going to implement a new protocol for someone just for fun. So you should first find a group who is interested in listening to you. Maybe there already is a protocol which does what you want (or can easily be extended).
But you probably don't want to invent a new standard. Standards are a lot of work and - for some - overrated.
So you should describe how it works and create a library that can read and write the protocol, so developers can use it even though it's not an official standard.
Since you are interested in the Replace Email section of the Paul Graham article you linked, IMHO you will need to both develop a protocol definition and provide an example implementation. The protocol definition does not need to be published as an internet protocol standard in order to be useful.
You will need an implementation so that you can test, refine and improve the ideas. It is extremely unlikely the protocol will be right at the first attempt, and you'll need something to support the initial users.
You don't need a protocol definition to implement an improved email, but you will need one if you expect others to work with you and adopt it, though it very much depends on your 'business model'. I strongly recommend you have a protocol definition from the start, even if only to keep yourself sane when you try to produce the second implementation.
I recommend having a look at some examples of sneaky approaches to protocols and implementation. My favourite is described in the Viewpoints Research 2008 Progress report on a super-compact approach to TCP/IP.
They did not follow the traditional approach to developing the implementation of a protocol (the protocol stack). Instead, they wrote code which parsed the human-readable TCP/IP protocol specification and generated the code of a TCP/IP stack from that protocol document. The usual TCP/IP stack is about 40,000 lines of code or more. Their program, which read the protocol specification and generated the code for a TCP/IP stack 'automatically', was only 160 lines of code. They used extremely powerful programming tools.
If you had an approach like that, you could keep the protocol implementation synchronised with the specification, and potentially make it straightforward for others to adopt your protocol.
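The Viewpoints work is far more ambitious, but a very modest version of the same idea is easy to sketch in Python: describe a message layout once as data, and derive the encoder, the decoder and a human-readable field table from that single description, so the code and the documentation cannot drift apart. The message and its fields are hypothetical.

```python
import struct

# A hypothetical message specification, written once as data:
# (field name, struct format code, description)
LOGIN_SPEC = [
    ("version", "B", "protocol version, currently 1"),
    ("flags", "B", "bit 0 = compression requested"),
    ("user_id", "I", "unsigned 32-bit user identifier"),
    ("timestamp", "Q", "seconds since the Unix epoch"),
]


def make_codec(spec):
    """Derive an encoder, a decoder and a documentation table from the same spec."""
    layout = struct.Struct(">" + "".join(code for _, code, _ in spec))  # big-endian, no padding
    names = [name for name, _, _ in spec]

    def encode(**fields) -> bytes:
        return layout.pack(*(fields[name] for name in names))

    def decode(data: bytes) -> dict:
        return dict(zip(names, layout.unpack(data)))

    def describe() -> str:
        return "\n".join(f"{name:<10} {code:<2} {text}" for name, code, text in spec)

    return encode, decode, describe


encode_login, decode_login, describe_login = make_codec(LOGIN_SPEC)
```

Adding or reordering a field in `LOGIN_SPEC` changes the encoder, the decoder and the printed field table together.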
HTH
You are confusing a protocol standard with the implementation.
The two are independent of each other.
A protocol is described at a high level, but with enough information for someone to understand how it should be implemented.
The idea is that someone reading the document can understand how/what to implement in any language they prefer.
To give an example: the SIP RFC (RFC 3261) describes the various flows as well as the various messages and how they are supposed to be processed, i.e. the semantics are well defined.
You can implement a SIP UA or server in C++ or Java; that choice is irrelevant to the SIP protocol itself.
For this you don't need to provide any source code (you could though if you think it helps clarify some obscurity of the description).
The most important part is that your protocol is actually reviewed by stakeholders i.e. people that expect it to solve their problems.
This part is the most important not only because it could solve problems in your protocol but because they can actually verify that the concept is solid i.e. can be technically implemented
The only case where the specification might prescribe or imply something concrete about the implementation is when the protocol imposes specific constraints, e.g. hard real-time constraints, which could serve as a "hint" about which implementations/languages to avoid.
Also, if I write a protocol and it isn't made official by the standards group, can people/clients still implement it?
Strange question. What do you mean? How would anyone know your protocol exists?
If it is official, they can get it from the standards group and implement it.
Otherwise it is obvious that you have some sort of "proprietary" protocol (which is not uncommon e.g. a company can have an internal protocol for its own software) and people have to get the spec from you.

VB.Net - Networking method for client/server game

My first question so go easy on me :)
I've been developing for years and have written WAY too many apps (mostly web apps) using web services - I'm happy with SOAP/WSDL/etc... I also used to write TCP/IP client-server apps back in the day using good old winsock.
I'm a bit bored and looking for a new project to expand my skills so decided to have a go at doing either a game or some sort of server monitoring and remote control application
I haven't decided which and the answer to this question will hopefully inform my decision.
What I'd like is some advice as to which methods I should be looking at to handle the communication.
Let's assume I'm doing the game for the moment - I want 2-way communication with low latency and the ability to handle as many simultaneous connections as possible.
I've considered web services but it seems like a lot of overhead - especially as I'd need the client to expose one as well.
TCP/IP would do the job but seems a little low-level, and I lose a lot of the advantages like definitions. Presumably I'd need to formulate a new protocol for the communications etc... I'm also unsure how I'd have one client use multiple channels for concurrent information - eg a chat and updating location information. I could attempt to multiplex this in some way but my initial ideas re: the queuing seem quite messy.
.NET Remoting - I've not really touched this at all. It seems to have low overhead and more flexibility than web services, but I don't know enough to evaluate it properly.
I'd really appreciate any input you can provide (and a link to a tutorial would be fantastic)
Thanks in advance for your help
EDIT: I've had an answer which points me at a UDP library. Is UDP appropriate for this? For location information/similar which requires no history, I can see how this is advantageous but for a chat, a lost packet could be an issue - or do I manually send back an acknowledgment of receipt? If so, aren't I duplicating TCP/IP functionality for limited advantage?
Apologies if this is an incorrect way to expand on the question - guidance for that appreciated too :)
If you're up to date on .NET 3.5 SP1, then you should use WCF. You say you don't want to use web services, and I assume from that you mean you don't want to use SOAP over HTTP. WCF does a lot more than SOAP over HTTP. In particular, it can do binary over TCP/IP using the same infrastructure. It also has support for peer-to-peer.
Take a look at something like Lidgren and see how that works. It's written in C#, so it can be used from VB.NET.
Lidgren is a socket wrapper; I've used it in a few small-scale multiplayer games (mainly by using a header stating the packet type, i.e. the first byte represents the packet type).
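A Python sketch of that first-byte packet-type scheme (in Python rather than VB.NET; the packet types and layouts are hypothetical). It also touches on the multiplexing question from the original post: chat and location updates share one connection, and the type byte routes each packet to the right handler.

```python
import struct

# Hypothetical packet types; one byte on the wire.
PACKET_CHAT = 0x01
PACKET_POSITION = 0x02

POSITION = struct.Struct(">Iff")  # player id, x, y


def encode_chat(text: str) -> bytes:
    return bytes([PACKET_CHAT]) + text.encode("utf-8")


def encode_position(player_id: int, x: float, y: float) -> bytes:
    return bytes([PACKET_POSITION]) + POSITION.pack(player_id, x, y)


def handle_chat(body: bytes):
    return ("chat", body.decode("utf-8"))


def handle_position(body: bytes):
    return ("position", POSITION.unpack(body))


HANDLERS = {PACKET_CHAT: handle_chat, PACKET_POSITION: handle_position}


def dispatch(packet: bytes):
    """Route a packet by its first byte; unknown types are rejected."""
    if not packet:
        raise ValueError("empty packet")
    handler = HANDLERS.get(packet[0])
    if handler is None:
        raise ValueError(f"unknown packet type {packet[0]:#04x}")
    return handler(packet[1:])
```

Over UDP, the chat packets would still need reliable delivery (acknowledgement and resend), while lost position updates can simply be superseded by the next one.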

Why is the HTTP protocol designed as plain text?

Yesterday I had a discussion with my colleagues about HTTP, and the question came up of why HTTP is designed as plain text. Surely it could have been designed as a binary protocol, just like TCP, using flags to represent the different methods (POST, GET) and variables (HTTP headers). So why was HTTP designed this way? Are there any technical or historical reasons?
A reason that's both technical and historical is that text protocols are almost always preferred in the Unix world.
Well, this is not really a reason but a pattern. The rationale behind it is that text protocols allow you to see what's going on on the network by just dumping everything that goes through. You don't need a specialized analyzer as you do for TCP/IP. This makes them easier to debug and easier to maintain.
Not only HTTP, but many protocols are text based (e.g., FTP, POP3, SMTP, IMAP).
You might want to take a look at The Art of Unix Programming for a much more detailed explanation of this Unix thing.
With HTTP, the content of a request is almost always orders of magnitude larger than the protocol overhead. Converting the protocol into a binary one would save very little bandwidth, and the easy debugability that a text protocol offers easily trumps the minor bandwidth savings of a binary protocol.
Many Internet application protocols use more or less plain text for the protocol (see FTP, POP, SMTP, etc.).
It makes interoperability and troubleshooting much easier.
HTTP stands for "Hypertext Transfer Protocol".
It was initially devised as a way to serve text documents, hence the text based protocol.
What we do with HTTP now is far beyond its original intent.
As with RFC 2616 section 3.7.1 for HTTP/1.1, the key delimiter of a command or header line is the CRLF line break. Text-based application protocols make it easier to carry out a conversation (for troubleshooting) purely with a Telnet client. They also make it easier to program with ReadLine() calls and by matching text strings.
The CRLF line break also allows near-unlimited, arbitrary header extensions, unlike fixed-size TCP or IP headers, where fields are hard-coded at bit offsets.
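A small Python sketch of what line-based parsing looks like for an HTTP/1.1-style request head: read CRLF-terminated lines until the empty line, splitting each header at the first colon. It is not a conforming HTTP parser (no line folding, duplicate-header handling or size limits), but it shows why an unknown header such as a custom extension needs no special code.

```python
import io


def read_request_head(stream: io.BufferedIOBase):
    """Parse a request line and headers, one CRLF-terminated line at a time."""
    request_line = stream.readline().rstrip(b"\r\n").decode("ascii")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break                      # an empty line ends the header section
        name, _, value = line.decode("latin-1").partition(":")
        # Unknown header names need no special handling: extensions are just more lines.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers
```

Whatever follows the empty line (the body) is left unread in the stream for the caller.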
So it's easier to "read" the traffic or create a client or server?
You can debate whether it actually makes it easier, but surely that was the intent.
In the case of HTTP, some people have worked on a "binary" version of it, called Embedded Binary HTTP (EBHTTP):
https://datatracker.ietf.org/doc/html/draft-tolle-core-ebhttp-00
Historically, it all starts with RFC 822 (STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES), whose latest successor is RFC 5322 (Internet Message Format). SMTP (RFC 821) was one of the most popular protocols based on RFC 822. And HTTP was born out of this same mail tradition (SMTP).
I like the "...preferred in the Unix world" reason, but it doesn't explain why.
In order to understand why, you need to put yourself in the shoes of a designer who wants to make a usable product.
A) You can document the shit out of meaningless gibberish (binary).
B) Develop or hope others develop tools that portray your meaningless gibberish in a meaningful way.
or
A) You can document the shit out of meaningful text that takes advantage of language as a tool for a self-documenting protocol.
B) There is no immediate need for additional tools, and additional tools will be much easier to write and debug.
It creates staged delivery and creates something that is easier to comprehend & recall when doing future development. It also creates a situation where a higher level abstraction is no longer necessary.
Imagine a world where setting a header value isn't as simple as a dictionary/Map somewhere in your framework. When running into errors, you'd constantly have to question whether your framework is correct, because you couldn't easily see that it's doing the right thing without additional tools. That would be the world of HTTP if each framework needed to invent/implement its own higher-level abstraction (browsers come to mind).
Many protocol designers want efficiency; this design focuses on usability, which is paramount in the software development industry. Unusable, prematurely optimized tools create an unnecessary burden for software developers, and this burden manifests across the board.
Now HTTP/2 is binary, which, according to its FAQ, makes it more efficient to parse and much less error-prone:
https://http2.github.io/faq/#why-is-http2-binary
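For contrast with the CRLF-delimited text of HTTP/1.x, every HTTP/2 frame starts with a fixed 9-octet binary header: a 24-bit length, an 8-bit type, 8 bits of flags, one reserved bit and a 31-bit stream identifier (RFC 9113, section 4.1, which obsoletes RFC 7540). A Python sketch of packing and unpacking it; the function names are illustrative.

```python
import struct

FRAME_DATA = 0x0
FLAG_END_STREAM = 0x1


def encode_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Pack the fixed 9-octet HTTP/2 frame header."""
    if not 0 <= length < 1 << 24:
        raise ValueError("length must fit in 24 bits")
    if not 0 <= stream_id < 1 << 31:
        raise ValueError("stream id must fit in 31 bits")
    # The 24-bit length is split into a high byte and the low 16 bits;
    # the reserved bit in front of the stream id stays 0.
    return struct.pack(">BHBBI", length >> 16, length & 0xFFFF, frame_type, flags, stream_id)


def decode_frame_header(header: bytes) -> tuple[int, int, int, int]:
    """Return (length, type, flags, stream_id), ignoring the reserved bit."""
    length_hi, length_lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", header)
    return (length_hi << 16) | length_lo, frame_type, flags, stream_id & 0x7FFFFFFF
```

Every field sits at a fixed offset, so a parser never has to scan for delimiters or handle stray whitespace, which is the kind of ambiguity the FAQ has in mind.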
