Application protocol design - TCP

I'm exploring TCP communications and looking at how to structure data for client/server parsing.
Every request has a token, a command, and arguments.
What I'm thinking: AUTH_TOKEN: 123456 CMD: doSomething ARGS: {user_id: 1}\n
The ARGS can contain arrays and hashes.
Is this the right way, or are there design patterns for doing this?
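For illustration, a minimal Java sketch (an assumption, not an established pattern) of parsing one request line of the proposed format on the server side; the field names come from the example above, and the ARGS payload is left as an opaque string that a real implementation would parse as JSON or another well-defined encoding:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestLineParser {
    // One request per line: AUTH_TOKEN: <token> CMD: <command> ARGS: <payload>
    private static final Pattern REQUEST =
            Pattern.compile("AUTH_TOKEN: (\\S+) CMD: (\\S+) ARGS: (.*)");

    public static void main(String[] args) {
        String line = "AUTH_TOKEN: 123456 CMD: doSomething ARGS: {user_id: 1}";
        Matcher m = REQUEST.matcher(line);
        if (m.matches()) {
            System.out.println("token = " + m.group(1));
            System.out.println("cmd   = " + m.group(2));
            System.out.println("args  = " + m.group(3)); // parse with a real JSON library
        }
    }
}

A common alternative is to send one JSON object per line (newline-delimited JSON) or to length-prefix each message, which avoids ambiguity when the arguments themselves contain newlines.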

Related

Sending Raw TCP Packet Using NetCat to Erlang Server

I am trying to create a TCP server which will store incoming TCP packets as binary, for a key/value store. I already have an Erlang client which can send TCP packets to my Erlang server; however, for the sake of completeness, I want to allow the user to send TCP packets from a command line using clients such as NetCat. The user would adhere to a spec of how to format the data in the TCP packet such that the server will be able to understand it. For example:
$ nc localhost 8091
add:key:testKey
Key Saved!
add:value:testValue
Value Saved!
get:key:testKey
Value: testValue
The user interacts with the server using add:key:, add:value:, and get:key:. Whatever comes after that should be taken literally and passed to the server, meaning a situation like this could be possible, if the user so wanted:
$ nc localhost 8091
add:key:{"Foo","Bar"}
Key Saved!
add:value:["ferwe",324,{2,"this is a value"}]
Value Saved!
get:key:{"Foo","Bar"}
Value: ["ferwe",324,{2,"this is a value"}]
However, this doesn't seem possible, as what actually happens is as follows...
I will pre-fill the Erlang key/value store (using ETS) from my Erlang client with a key of {"Foo","Bar"} and a value of ["ferwe",324,{2,"this is a value"}] (a tuple and a list respectively, in this example), as this key/value store has to be able to accommodate ANY Erlang-compliant data type.
So in the example, currently there is one entry in the ETS table:
Key:   {"Foo","Bar"}
Value: ["ferwe",324,{2,"this is a value"}]
I then want to retrieve that entry using NetCat by giving the Key, so I type in NetCat...
$ nc localhost 8091
get:key:{"Foo","Bar"}
My Erlang Server receives this as <<"{\"Foo\",\"Bar\"}\n">>
My Erlang Server is set up to receive binary which is not an issue.
My question is therefore: can NetCat be used to send unencoded packets that don't escape the quote marks?
Such that my server is able to receive the key as just <<"{"Foo","Bar"}">>.
Thank you.
My question is therefore: can NetCat be used to send unencoded packets that don't escape the quote marks?
Yes, netcat sends exactly what you give it, so in this case it sends get:key:{"Foo","Bar"} without escaping the quote marks.
Such that my server is able to receive the key as just <<"{"Foo","Bar"}">>.
<<"{"Foo","Bar"}">> is not a syntactically correct Erlang term. Do you want to get the tuple {"Foo","Bar"} instead, in order to look it up in the ETS table? You can do it by parsing the binary:
Bin = <<"{\"Foo\",\"Bar\"}\n">>,
%% need to add a dot at the end for erl_parse
{ok, Tokens, _} = erl_scan:string(binary_to_list(Bin) ++ "."),
{ok, Term} = erl_parse:parse_term(Tokens),
ets:lookup(my_table, Term).

Is it necessary to use a queue to save messages received from clients and pending to be forwarded to the backend server?

I want to write a proxy server for SMB2 based on Asio. I am considering using a cumulative buffer to receive a full message before doing the business logic, and introducing a queue for multiple messages, which will force me to synchronize the following resource accesses:
the read and write operations on the queue, because the two upstream/downstream queues are shared between the frontend client and the backend server,
the backend connection state, because reads on the frontend won't wait for the completion of connect or writes on the backend server before the next read, and
the resource release when an error occurs or a connection is closed normally, because both read and write handlers registered with the EventLoop on the same socket may not yet have completed, and an asynchronous connect operation can be initiated in worker threads while its partner socket has been closed, and those may run concurrently.
If I don't use the two queues, only one (read, write, or connect) handler is registered with the EventLoop on the proxy flow for a request, so there is no need to synchronize.
From the application level, I think a cumulative buffer is generally a must in order to process a full message packet (e.g. a message in the format | length (4 bytes) | body (variable) |) across multiple related API calls (system APIs: recv or read, or library APIs: asio::read / asio::async_read).
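As a rough illustration of that framing, here is a minimal Java sketch using a blocking read rather than Asio's asynchronous reads; the 4-byte big-endian length prefix is an assumption matching the format described above:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FramedReader {
    // Blocks until one full message (| length (4 bytes) | body |) has arrived, then returns the body.
    public static byte[] readMessage(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        int length = din.readInt();   // 4-byte big-endian length prefix
        byte[] body = new byte[length];
        din.readFully(body);          // keep reading until the whole body has accumulated
        return body;
    }
}

With an asynchronous API the same idea applies, except that partial bytes are kept in a per-connection cumulative buffer between read callbacks.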
And then, is it necessary to use a queue to save messages received from clients and pending to be forwarded to the backend server?
I use the following diagram from http://www.partow.net/programming/tcpproxy/index.html; it turned out to reflect thoughts similar to mine (the upstream concept is as in NGINX upstream servers).
                 ---> upstream --->
 +-----------+              +-------------+              +---------------+
 |           |--->------>---|             |--->------>---|               |
 |  Client   |              |  TCP Proxy  |              | Remote Server |
 |           |---<------<---|   Server    |---<------<---|               |
 +-----------+              +-------------+              +---------------+
                 <--- downstream <---
      Frontend                                  Backend
For a request-response protocol without a message ID field (useful for matching each reply message to the corresponding request message), such as HTTP, I can use one single buffer per connection for the two downstream and upstream flows, and then continue processing the next request (note that for the first request a connection to the server must first be established, so it is slower than the subsequent ones), because clients always wait (they may block or be notified by an asynchronous callback function) for the response after sending a request.
However, for a protocol in which clients don't wait for the response before sending the next request, a message ID field can be used to uniquely identify or distinguish request-reply pairs, for example JSON-RPC 2.0, SMB2, etc. If I strictly complete the two flows above regardless of the next read (i.e. without calling read, letting TCP data accumulate in the kernel), the subsequent requests from the same connection cannot be processed in a timely manner. After reading What happens if one doesn't call POSIX's recv “fast enough”? I think it can be done.
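For illustration, a minimal Java sketch of correlating replies with pipelined requests via a message ID (the class and method names here are hypothetical; SMB2 and JSON-RPC 2.0 each define their own ID field):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class RequestCorrelator {
    private final AtomicLong nextId = new AtomicLong();
    private final ConcurrentMap<Long, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();

    // Called when sending a request: allocate a message ID and remember the pending reply slot.
    public long register(CompletableFuture<byte[]> future) {
        long id = nextId.incrementAndGet();
        pending.put(id, future);
        return id;
    }

    // Called when a reply arrives: complete whichever request carries this message ID.
    public void complete(long messageId, byte[] replyBody) {
        CompletableFuture<byte[]> future = pending.remove(messageId);
        if (future != null) {
            future.complete(replyBody);
        }
    }
}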
I also did an SMB2 proxy test using one single buffer for the two downstream and upstream flows on Windows and Linux, using the Asio networking library (also included in Boost.Asio). I used smbclient as a client on Linux to create 251 connections (see the following command):
ft=$(date '+%Y%m%d_%H%M%S.%N%z'); for ((i = 2000; i <= 2250; ++i)); do smbclient //10.23.57.158/fromw19 user_password -d 5 -U user$i -t 100 -c "get 1.96M.docx 1.96M-$i.docx" >>smbclient_${i}_${ft}_out.txt 2>>smbclient_${i}_${ft}_err.txt & done
Occasionally, it printed several errors, "Connection to 10.23.57.158 failed (Error NT_STATUS_IO_TIMEOUT)". If I increased the number of connections, the number of errors increased, so is there a threshold? In fact, those connections were completed within 30 seconds, and I also set the timeout for smbclient to 100. What's wrong?
Now, I know those problems need to be resolved. But here, I just want to know "Is it necessary to use a queue to save messages received from clients and pending to be forwarded to the backend server?" so I can settle on my design, because it makes a great deal of difference.
These examples may not care about the application message format; they simply request the next read after completing the write operation to the peer:
HexDumpProxyFrontendHandler.java, or tcpproxy based on C++ Asio.
Other References
[Computer Networks: A Systems Approach] 5.3 Remote Procedure Call - Overcoming Network Limitations
[Computer Networks: A Systems Approach] 5.3 Remote Procedure Call - Overcoming Network Limitations at github
JSON RPC at wikipedia

Difference between multiplex and multistream

What is the difference between multistream (yamux, multistream-select, ..) and multiplex (mplex)?
I'd like to utilize one TCP connection for RPC, HTTP, etc (one client is behind firewall) like this:
conn = tcp.connect("server.com:1111")
conn1, conn2 = conn.split()
stream1 = RPC(conn1)
stream2 = WebSocket(conn2)
..
// received packets tagged for conn1 is forwarded to stream1
// received packets tagged for conn2 is forwarded to stream2
// writing to stream1 tags the packets for conn1
// writing to stream2 tags the packets for conn2
Which one suits this case?
The short answer: mplex and yamux are both Stream Multiplexers (aka stream muxers), and they're responsible for interleaving multiple "logical streams" over a single "raw" connection (e.g. TCP). Multistream is used to identify what kind of protocol should be used when sending / receiving data over the stream, and multistream-select lets peers negotiate which protocols are supported by each end and hopefully agree on one to use.
Long answer:
Stream muxing is an interface with several implementations. The "baseline" stream muxer is called mplex - a libp2p-specific protocol with implementations in javascript, go and rust.
Stream multiplexers are "pluggable", meaning that you add support for them by pulling in a module and configuring your libp2p app to use them. A given libp2p application can support several multiplexers at the same time, so for example, you might use yamux as the default but also support mplex to communicate with peers that don't support yamux.
While having this kind of flexibility is great, it also means that we need a way to figure out what stream muxer to use for any specific connection. This is where multistream and multistream-select come in.
Multistream (despite the name) is not directly related to stream multiplexing. Instead, it acts as a "header" for a stream of binary data that contextualizes the stream with a protocol id. The closely-related multistream-select protocol uses multistream protocol ids to negotiate what protocols to use for the "next phase" of communication.
So, to agree upon what stream muxer to use, we use multistream-select.
Here's an example of the multistream-select back-and-forth:
/multistream/1.0.0 <- dialer says they'd like to use multistream 1.0.0
/multistream/1.0.0 -> listener echoes back to indicate agreement
/secio/1.0.0 <- dialer wants to use secio 1.0.0 for encryption
/secio/1.0.0 -> listener agrees
* secio handshake omitted. what follows is encrypted via secio: *
/mplex/6.7.0 <- dialer would like to use mplex 6.7.0 for stream multiplexing
/mplex/6.7.0 -> listener agrees
This is the simple case where both sides agree upon everything - if e.g. the listener didn't support /mplex/6.7.0, they could respond with na (not available), and the dialer could either try another protocol, ask for a list of supported protocols by sending ls, or give up.
In the example above, both sides agreed on mplex, so future communication over the open connection will be subject to the semantics of mplex.
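To make the interleaving idea concrete, here is a toy Java sketch of what a stream muxer does in general; this is not the mplex or yamux wire format, just an illustration of tagging each frame with a stream id so the other side can demultiplex:

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ToyMuxer {
    private final DataOutputStream out;

    public ToyMuxer(OutputStream rawConnection) {
        this.out = new DataOutputStream(rawConnection);
    }

    // Write one frame: | stream id (4 bytes) | length (4 bytes) | payload |
    // Several logical streams can share the same raw connection this way.
    public synchronized void write(int streamId, byte[] payload) throws IOException {
        out.writeInt(streamId);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }
}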
It's important to note that most of the details above will be "invisible" to you when opening individual connections in libp2p, since it's rare to use the multistream and stream muxing libraries directly.
Instead, a libp2p component called the "switch" (also called the "swarm" by some implementations) manages the dialing / listening state for the application. The switch handles the multistream negotiation process and "hides" the details of which specific stream muxer is in use from the rest of the libp2p stack.
As a libp2p developer, you generally dial other peers using the switch interface, which will give you a stream to read from and write to. Under the hood, the switch will find the appropriate transport (e.g. TCP / websockets) and use multistream-select to negotiate encryption & stream multiplexing. If you already have an open connection to the remote peer, the switch will just use the existing connection and open another muxed stream over it, instead of starting from scratch.
The same goes for listening for connections - you give the switch a protocol id and a stream handler function, and it will handle the muxing & negotiation process for you.
Our documentation is a work-in-progress, but there is some information at https://docs.libp2p.io that might help clarify, especially the concept doc on Transports and the glossary. You can also find links to example code.
Improving the docs for libp2p is my main quest at the moment, so please feel free to file issues at https://github.com/libp2p/docs to let me know what your most important missing pieces are.

protobuf vs gRPC

I'm trying to understand protobuf and gRPC and how I can use both. Could you help me understand the following:
Considering the OSI model what is where, for example is Protobuf at layer 4?
Thinking through a message transfer, what is the "flow", and what does gRPC do that protobuf misses?
If the sender uses protobuf, can the server use gRPC, or does gRPC add something that only a gRPC client can deliver?
If gRPC can make synchronous and asynchronous communication possible, Protobuf is just for the marshalling and therefore does not have anything to do with state - true or false?
Can I use gRPC in a frontend application to communicate, instead of REST or GraphQL?
I already know - or assume I do - that:
Protobuf
Binary protocol for data interchange
Designed by Google
Uses a generated "struct"-like description at client and server to marshal/unmarshal messages
gRPC
Uses protobuf (v3)
Again from Google
Framework for RPC calls
Makes use of HTTP/2 as well
Synchronous and asynchronous communication possible
I again assume it's an easy question for someone already using the technology. I would still thank you for being patient with me and helping me out. I would also be really thankful for any network deep dive into the technologies.
Protocol buffers is (are?) an Interface Definition Language and serialization library:
You define your data structures in its IDL i.e. describe the data objects you want to use
It provides routines to translate your data objects to and from binary, e.g. for writing/reading data from disk
gRPC uses the same IDL but adds syntax "rpc" which lets you define Remote Procedure Call method signatures using the Protobuf data structures as data types:
You define your data structures
You add your rpc method definitions
It provides code to serve up and call the method signatures over a network
You can still serialize the data objects manually with Protobuf if you need to
In answer to the questions:
gRPC works at layers 5, 6 and 7. Protobuf works at layer 6.
When you say "message transfer", Protobuf is not concerned with the transfer itself. It only works at either end of any data transfer, turning bytes into objects
Using gRPC by default means you are using Protobuf. You could write your own client that uses Protobuf but not gRPC to interoperate with gRPC, or plugin other serializers to gRPC - but using gRPC would be easier
True
Yes you can
Actually, gRPC and Protobuf are 2 completely different things. Let me simplify:
gRPC manages the way a client and a server can interact (just like a web client/server with a REST API)
protobuf is just a serialization/deserialization tool (just like JSON)
gRPC has two sides: a server side, and a client side that is able to dial a server. The server exposes RPCs (i.e. functions that you can call remotely). And you have plenty of options there: you can secure the communication (using TLS), add an authentication layer (using interceptors), ...
You can use protobuf inside any program; it has no need to be client/server. If you need to exchange data, and want it to be strongly typed, protobuf is a nice option (fast & reliable).
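For example, a minimal Java sketch of standalone protobuf use, assuming classes generated from the HelloRequest message shown further down in this thread (the generated package and build setup are omitted):

import java.nio.file.Files;
import java.nio.file.Path;

public class ProtoOnly {
    public static void main(String[] args) throws Exception {
        // Build a strongly typed message using the generated builder.
        HelloRequest request = HelloRequest.newBuilder()
                .setMyname("alice")
                .build();

        byte[] bytes = request.toByteArray();              // serialize to compact binary
        Files.write(Path.of("request.bin"), bytes);        // e.g. write it to disk, no server involved

        HelloRequest back = HelloRequest.parseFrom(bytes); // deserialize it again
        System.out.println(back.getMyname());
    }
}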
That being said, you can combine both to build a nice client/server system: gRPC will be your client/server code, and protobuf your data protocol.
PS: I wrote this paper to show how one can build a client/server with gRPC and protobuf using Go, step by step.
gRPC is a framework built by Google, and it is used in production projects by Google itself; Hyperledger Fabric is built with gRPC, and there are many open-source applications built with gRPC.
Protobuf is a data representation, like JSON, and it is also by Google; in fact, they have thousands of .proto files in their production projects.
gRPC
gRPC is an open-source framework developed by Google
It allows us to define the request and response for an RPC and have the framework handle the rest
REST is CRUD-oriented, but gRPC is API-oriented (no constraints)
Built on top of HTTP/2
Provides auth, load balancing, monitoring, and logging
[HTTP/2]
HTTP/1.1 was released in 1997, a long time ago
HTTP/1 opens a new TCP connection to the server for each request
It doesn't compress headers
No server push; it just works with request/response
HTTP/2 was released in 2015 (based on SPDY)
Supports multiplexing
Client and server can push messages in parallel over the same TCP connection
Greatly reduces latency
HTTP/2 supports header compression
HTTP/2 is binary
Protobuf is binary, so it is a great match for HTTP/2
[TYPES]
Unary
Client streaming
Server streaming
Bidirectional streaming
gRPC servers are async by default
gRPC clients can be sync or async
protobuf
Protocol buffers are language-agnostic
Parsing protocol buffers (binary format) is less CPU-intensive
[Naming]
Use CamelCase for message names
Use underscore_separated names for fields
Use CamelCase for enums and CAPITALS_WITH_UNDERSCORES for value names
[Comments]
Support //
Support /* */
[Advantages]
Data is fully typed
Data is compactly encoded in binary (less bandwidth usage)
The schema (message definitions) is needed to generate code and to read the data
Documentation can be embedded in the schema
Data can be read across any language
The schema can evolve over time in a safe manner
Faster than XML
Code is generated for you automatically
Google invented protobuf; they use some 48,000 protobuf messages and 12,000 .proto files
Lots of RPC frameworks, including gRPC, use protocol buffers to exchange data
gRPC is an instantiation of the RPC integration style that is based on the protobuf serialization library.
There are five integration styles: RPC, File Transfer, MOM, Distributed Objects, and Shared Database.
RMI is another instantiation of the RPC integration style. There are many others. MQ is an instantiation of the MOM integration style, and so is RabbitMQ. An Oracle database schema is an instantiation of the Shared Database integration style. CORBA is an instantiation of the Distributed Objects integration style. And so on.
Avro is an example of another (binary) serialization library.
gRPC (Google Remote Procedure Call) is a client-server structure.
Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.
service Greeter {
  rpc SayHello (HelloRequest) returns (HelloResponse) {}
}
message HelloRequest {
  string myname = 1;
}
message HelloResponse {
  string responseMsg = 1;
}
Protocol buffers are used to exchange data between the gRPC client and the gRPC server; they are the protocol between the two. In a gRPC project, the protocol buffer is implemented as a .proto file. It defines the interface (e.g. service) provided by the server side, the message formats exchanged between client and server, and the rpc methods that the client uses to access the server.
Both the client and server sides have the same proto files. (One real example: envoy xds grpc client side proto files, server side proto files.) It means that both the client and server know the interface, the message formats, and the way the client accesses services on the server side.
The proto files (i.e. the protocol buffer definitions) are compiled into a real language.
The generated code contains both stub code for clients to use and an abstract interface for servers to implement, both with the method defined in the service.
The service defined in the proto file (i.e. the protocol buffer) is translated into an abstract class xxxxImplBase (i.e. the interface on the server side).
A stub created with newBlockingStub() (synchronous) or newStub() (asynchronous) is the way the client invokes a remote procedure call (i.e. an rpc in the proto file).
And the methods which build request and response messages are also implemented in the generated files.
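For illustration, a rough Java sketch of what using the generated code for the Greeter service above looks like; the io.grpc runtime classes are real, while GreeterGrpc, HelloRequest and HelloResponse are generated by protoc plus the gRPC plugin, and their package depends on the .proto options:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.Server;
import io.grpc.ServerBuilder;
import io.grpc.stub.StreamObserver;

public class GreeterExample {
    // Server side: extend the generated abstract class (the xxxxImplBase mentioned above).
    static class GreeterImpl extends GreeterGrpc.GreeterImplBase {
        @Override
        public void sayHello(HelloRequest request, StreamObserver<HelloResponse> responseObserver) {
            HelloResponse reply = HelloResponse.newBuilder()
                    .setResponseMsg("Hello, " + request.getMyname())
                    .build();
            responseObserver.onNext(reply);
            responseObserver.onCompleted();
        }
    }

    public static void main(String[] args) throws Exception {
        Server server = ServerBuilder.forPort(50051).addService(new GreeterImpl()).build().start();

        // Client side: a blocking (synchronous) stub generated from the same proto file.
        ManagedChannel channel =
                ManagedChannelBuilder.forAddress("localhost", 50051).usePlaintext().build();
        GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
        HelloResponse response =
                stub.sayHello(HelloRequest.newBuilder().setMyname("world").build());
        System.out.println(response.getResponseMsg());

        channel.shutdownNow();
        server.shutdownNow();
    }
}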
I re-implemented simple client and server-side samples based on samples in the official doc. cpp client, cpp server, java client, java server, springboot client, springboot server
Recommended Useful Docs:
cpp/helloworld/README.md#generating-grpc-code,
cpp/basics/#generating-client-and-server-code,
cpp/basics/#defining-the-service,
generated-code/#client-stubs,
a blocking/synchronous stub
StreamObserver
how-to-use-grpc-with-spring-boot
Others: core-concepts,
gRPC can use protocol buffers as both its Interface Definition Language (IDL) and as its underlying message interchange format
In its simplest form, gRPC is like a public vehicle: it exchanges data between client and server.
The protocol buffer is the protocol, like your bus ticket, that decides where you should or shouldn't go.

Is servlet also a kind of RPC?

As far as I understand, RPC is a client-server model where the client sends requests to the server side and gets some results back. Then, is a Java servlet also a kind of RPC that uses the HTTP protocol? Am I right?
Here is the very first sentence of the wikipedia article on RPC:
In computer science, a remote procedure call (RPC) is an inter-process communication that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction. That is, the programmer writes essentially the same code whether the subroutine is local to the executing program, or remote.
So, Servlets would be an RPC mechanism if you could invoke a servlet from a client using
SomeResult r = someObject.doSomething();
That's not the case at all. To invoke a servlet, you need to explicitly send an HTTP request and encode parameters in the way the servlet expects them, then read and parse the response.
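For instance, calling a servlet from Java means constructing the HTTP request and parsing the response yourself; a minimal sketch with a hypothetical servlet URL and parameter:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ServletCall {
    public static void main(String[] args) throws Exception {
        // Hypothetical servlet endpoint and query parameter.
        URL url = new URL("http://localhost:8080/myapp/doSomething?userId=1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // you parse the body yourself; there is no typed SomeResult
            }
        }
    }
}

An RPC framework would instead generate a stub so that the call looks like SomeResult r = someObject.doSomething().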

Resources