End-to-end property - networking

I understand the end-to-end principle from the classic MIT paper (Saltzer, Reed, and Clark's "End-to-End Arguments in System Design"), which states that correctly executing a function between two remote nodes cannot depend on the state of the nodes in between.
But what is end-to-end encryption, end-to-end guarantees, end-to-end protocols, etc.? I couldn't find a precise definition of "end-to-end"; the term seems to be overused.
In other words, when one describes a system property X as end-to-end, what does it mean? What is the opposite of end-to-end?

I don't think end-to-end is overused. It simply means that the property holds from one end to the other, where an "end" can be a node or a layer in the computing stack.
Consider three nodes: A, B and C. Node A wants to talk with C. B sits between A and C and forwards messages between them. B is, for example, a load balancer or a gateway.
Encryption is end-to-end if B cannot read or tamper with messages sent from A to C. A concrete example: A is your laptop, C is a remote machine in your network at home or at work, and B is a VPN gateway. The encryption here is not end-to-end because only the link between A and B is actually encrypted; an attacker sitting between B and C would be able to read the cleartext. That might be fine in practice, but it is not end-to-end.
Another example: say we don't care about encryption but about reliable message transmission. The network might corrupt bits of a message, so TCP and other protocols include a checksum field that is verified whenever a message is received. But the guarantee these checksums provide is not necessarily end-to-end.
If A sends a message m to C relying on TCP's checksum, a node B sitting in the middle can corrupt the message in an undetectable way. Abstracting away most details, node B (1) receives m, (2) checks m's checksum, (3) finds the route to C and creates a new message with m's payload, (4) calculates a new checksum for m, and (5) sends m (with the new checksum) to C. If node B corrupts the message after step (2) but before step (4), the message arriving at C is corrupted, yet that cannot be detected by looking at m's checksum! Therefore, such a checksum is not end-to-end. Node B does not even have to be malicious: such corruption can be caused by hardware errors or, more likely, by bugs in node B. This has happened several times in Amazon's S3 service, for example.
The solution, obviously, is to use an application-level checksum, which is end-to-end: a checksum of m's payload is appended to the payload before the lower-layer checksum is calculated.
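A minimal sketch of such an end-to-end checksum (Python; the `wrap`/`unwrap` helper names are invented for illustration). Note that CRC32 only catches accidental corruption; against a malicious B you would use a keyed MAC instead:

```python
import struct
import zlib

def wrap(payload: bytes) -> bytes:
    """Append an application-level CRC32 to the payload.

    The transport (e.g. TCP) adds its own checksum on top of this,
    but that one only covers each hop; this one survives end to end.
    """
    return payload + struct.pack("!I", zlib.crc32(payload))

def unwrap(message: bytes) -> bytes:
    """Verify and strip the application-level checksum.

    Raises ValueError if the payload was corrupted anywhere along the
    path, even by a middlebox that recomputed the transport checksum.
    """
    payload = message[:-4]
    (received,) = struct.unpack("!I", message[-4:])
    if zlib.crc32(payload) != received:
        raise ValueError("end-to-end checksum mismatch")
    return payload

# A middlebox that corrupts a byte and recomputes the *transport*
# checksum cannot also fix the application checksum without knowing
# the application protocol:
msg = wrap(b"hello")
corrupted = b"jello" + msg[5:]   # flip the payload, keep the old CRC
```

At the receiver, `unwrap(corrupted)` fails even though every per-hop transport checksum along the way was valid.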


Understanding NATS clustering

Section NATS Server Clustering states that:
Note that NATS clustered servers have a forwarding limit of one hop.
This means that each gnatsd instance will only forward messages that
it has received from a client to the immediately adjacent gnatsd
instances to which it has routes. Messages received from a route will
only be distributed to local clients. Therefore a full mesh cluster,
or complete graph, is recommended for NATS to function as intended and
as described throughout the documentation.
Let's assume that I have a NATS cluster of 3 nodes: A -> B -> C (-> denotes a route). Would you please let me know what will happen with NATS clients in the following scenario:
A message is sent to node A
Node A suddenly terminates before delivering the message to node B
Thanks in advance
In the case you described, the message will be dropped.
Core NATS provides an "at most once" delivery guarantee, so if you cannot tolerate lost messages, your application needs to detect that the message never arrived at its destination and resend it. You might detect this via a timeout using the request/reply pattern, or implement your own remediation for lost messages.
Alternatively, you can use NATS Streaming, which provides log-based persistence and sits atop NATS. It guarantees that the message will be delivered "at least once".
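A hedged sketch of the timeout-and-resend idea, deliberately not tied to any particular NATS client API (`request` here is a hypothetical stand-in for one timed request/reply round trip):

```python
def send_with_retry(request, max_attempts=3):
    """Resend a message until an acknowledgement arrives.

    `request` models one timed request/reply round trip (e.g. a NATS
    request with a timeout): it returns True if a reply arrived in
    time and False if the attempt timed out.
    """
    for attempt in range(1, max_attempts + 1):
        if request():
            return attempt        # number of sends it took
    raise TimeoutError("message presumed lost; giving up")

# Simulate a cluster that drops the first two deliveries:
state = {"calls": 0}
def flaky_request():
    state["calls"] += 1
    return state["calls"] >= 3
```

With at-most-once delivery, this kind of application-level remediation (or a persistence layer like NATS Streaming) is what turns "maybe delivered" into "delivered or reported lost".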

How to guarantee the transmission of data over the network?

Imagine two machines, A and B. A wants to make sure B receives a packet P even if there are network failures. How can that be achieved?
Scenario 1:
1) A sends P over the network to B.
Problem: if the network fails at step 1, B won't receive P and A won't know about it.
Scenario 2 with acknowledgement:
1) A sends P over the network to B.
2) B sends back ACK if it receives P.
Problem: If the network fails at step 2, A won't receive the ACK, so A can't reliably know whether B received P or not.
Having ACKs of ACKs would in turn just push the problem one step further.
This is a famous problem (closely related to the Two Generals' Problem) that has been studied for years. Read about the TCP protocol, which guarantees that either the data will be delivered or the sender will (eventually) know that delivery may have failed.
You can start reading here: https://en.wikipedia.org/wiki/Transmission_Control_Protocol
but there are many more helpful and interesting pages about TCP on the web.
Or you could just use TCP and take advantage of the work that has already been done.
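The core of TCP's answer is retransmission plus sequence numbers: A resends P until an ACK arrives, and B uses the sequence number to discard duplicates, so retransmitting is always safe. A toy sketch of this (all names invented for illustration):

```python
class Receiver:
    """B: acknowledges every packet but delivers each sequence number
    only once, so retransmissions never cause duplicates."""
    def __init__(self):
        self.delivered = []
        self._seen = set()

    def on_packet(self, seq, payload):
        if seq not in self._seen:
            self._seen.add(seq)
            self.delivered.append(payload)
        return ("ACK", seq)       # re-ACK duplicates as well

def send_reliably(receiver, seq, payload, channel, max_tries=5):
    """A: retransmit P until the ACK for `seq` makes it back.

    `channel(rtt)` models the lossy network: it either performs the
    round trip rtt() and returns the ACK, or returns None when the
    packet or the ACK was lost.
    """
    for _ in range(max_tries):
        ack = channel(lambda: receiver.on_packet(seq, payload))
        if ack == ("ACK", seq):
            return True           # A knows B has P
    return False                  # A cannot tell whether B got P

# Lose the packet itself on the first two attempts:
drops = {"left": 2}
def lossy(rtt):
    if drops["left"] > 0:
        drops["left"] -= 1
        return None               # packet never reached B
    return rtt()
```

If instead only the ACKs are lost (`lambda rtt: (rtt(), None)[1]`), B still receives P exactly once thanks to the sequence number, yet A times out with `False` — exactly the residual uncertainty the question describes, which no protocol can fully eliminate.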

How is the shortest path in IRC ensured?

RFC 2810 says the following about one-to-one communication:
Communication on a one-to-one basis is usually performed by clients,
since most server-server traffic is not a result of servers talking
only to each other. To provide a means for clients to talk to each
other, it is REQUIRED that all servers be able to send a message in
exactly one direction along the spanning tree in order to reach any
client. Thus the path of a message being delivered is the shortest
path between any two points on the spanning tree.
(Emphasis mine.)
What does this "one direction" mean? To only one client? And how does this "reach any client" and find "the shortest path between any two points [hosts on the IRC network]"?
And why not simply cut the crap, store the IP addresses of clients, and let IP do its job? After all, IRC is built on top of TCP/IP.
Johannes alludes to the solution but doesn't fully answer your questions. He is however correct in that graph theory is a large part of the answer.
Because each child node in the server maps of EFnet and IRCnet has only one parent, the shortest path is the only path between two servers on the graph; the same vertex cannot be visited twice without backtracking. This is called a spanning tree: all nodes are connected, but no loops exist.
IRC is not necessarily unicast like TCP/IP; it communicates with multiple clients on different servers by broadcasting. The important thing to note is that the client says "send 'hi' to everyone on #coding", and the message travels from the client to its connected server. That server passes the message to any connected servers, and those servers pass it on to any clients subscribed to #coding and then on to any further connected servers.
There isn't really anything like client-to-client communication; one-to-one messaging is accomplished by sending a message to a user with the specified nickname, not IP address. NickServ services help prevent people from hijacking nicknames: they temporarily associate a nickname with an IP, refuse to authenticate other IP addresses, and protect the nickname with a password when the authentication expires.
In much the same way as sending a channel message, the user sends a message to the server ("send 'hi' to nicky"), and the server simply passes this message on until nicky is listed as a client connected to the server receiving the message. Bots provide a means for nicky to receive messages while offline; they sign in under the username.
EDIT: IRC actually opens an invite-only personal channel for client-to-client communications.
Essentially, the shortest-path guarantee is a result of IRC's broadcast policy: the moment a message propagates to the desired user's server, it is forwarded to the desired user. Timestamps presumably prevent echoed messages if there are loops in the graph of servers.
In the architecture section, we find evidence that "spanning tree" is being used in the proper sense. Servers are aware of each other so as to prevent loops (guaranteeing shortest paths) and connect efficiently:
6.1 Scalability
It is widely recognized that this protocol does not scale
sufficiently well when used in a large arena. The main problem comes
from the requirement that all servers know about all other servers,
clients and channels and that information regarding them be updated
as soon as it changes.
and this one below is a result of having no alternate paths/detours to take:
6.3 Network Congestion
Another problem related to the scalability and reliability issues, as
well as the spanning tree architecture, is that the protocol and
architecture for IRC are extremely vulnerable to network congestions.
IRC networks are designed to be IP-agnostic and follow the shortest path because messages propagate through the whole graph, stopping when they reach an endpoint. Clients and servers have enough information to discard duplicate broadcasts. IRC is a very simple but effective chat protocol that makes no assumptions about security, IP, or hardware. You could literally use a networked telegraph machine to connect to IRC.
Every IRC server is connected to one or more servers in the same network. A client connects to one of the servers. Let's suppose we have the following setup:
    A
   / \
  B   C
 /   / \
D   E   F
Let's suppose a client on server A wants to send a message to a user on server E. In that case, server A sends the message only to server C, which will forward it to server E, but not to F.
If a client on A sends a message to a channel with users on servers B and E, then A will send the message to servers B and C. B will deliver the message to the users in that channel connected to B; C will send the message to server E, which will deliver it to its clients in that channel.
Servers D and F will never see the message because nobody in that channel is connected to them, but C will see the message even though nobody in that channel is connected to C, because it has to relay the message to E.
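The forwarding rule both answers describe can be sketched as a walk over the spanning tree that only descends into branches containing a channel member (a toy model, assuming every server knows where every client is, as section 6.1 states; all names are invented):

```python
# The tree from the answer above, as an adjacency list:
#         A
#        / \
#       B   C
#      /   / \
#     D   E   F
TREE = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E", "F"],
        "D": ["B"], "E": ["C"], "F": ["C"]}

def route(tree, members, origin):
    """Return the set of servers that see a channel message.

    `members` is the set of servers hosting at least one channel
    member. Each server forwards only toward neighbours whose subtree
    contains a member; since the graph is a tree, the resulting path
    to each member is the unique (hence shortest) path.
    """
    def subtree_has_member(node, parent):
        if node in members:
            return True
        return any(subtree_has_member(n, node)
                   for n in tree[node] if n != parent)

    seen = {origin}
    def forward(node, parent):
        for n in tree[node]:
            if n != parent and subtree_has_member(n, node):
                seen.add(n)
                forward(n, node)
    forward(origin, None)
    return seen
```

For members on B and E with the sender on A, this yields {A, B, C, E}: C relays without delivering locally, and D and F never see the message, matching the example above.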

Are hashes and MACs vulnerable to bit-flipping attacks?

Suppose there is encrypted communication between A and B through an insecure medium, where A and B have shared a secret key using the DH protocol.
If A sends an encrypted message and the hash/MAC/HMAC of this message to B, wouldn't it be easy for an eavesdropper to just intercept the hash/MAC/HMAC, change some bits in it, and send it to B?
B wouldn't be able to verify the integrity of any message sent by A and would therefore discard every message it gets from A, right?
B would then effectively become unavailable?
Thank you
The process you describe is just a very specific form of corrupting the data. If an attacker can corrupt the data, then of course the attacker can prevent A from speaking to B. The attacker could just drop the packets on the ground. That would also prevent A from speaking to B.
Any data corruption, not just modifying the HMAC, will cause this same situation. If I modify the authenticated stream, then the (unmodified) HMAC won't match and it will be discarded.
The point of an HMAC is to ensure integrity. It has nothing to do with availability. Any Man-in-the-Middle can always trivially destroy availability in any system as long as the connection goes through them. (If they can't, they're not a MitM.)
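A small demonstration with Python's standard `hmac` module: flipping a single bit in either the message or the tag makes verification fail, so B simply discards that message; the key and message values are made up for illustration:

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time check; compare_digest avoids timing leaks."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"shared-dh-secret"          # stands in for the DH-derived key
msg = b"transfer 10 to Bob"
tag = make_tag(key, msg)

# The eavesdropper flips one bit of the tag in transit:
flipped_tag = bytes([tag[0] ^ 1]) + tag[1:]
```

Verification fails for the flipped tag (and for any modified message), so B rejects the forgery; without the key, the attacker cannot produce a tag that verifies. That is the integrity guarantee, and, as the answer notes, it says nothing about availability.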

Nagle-Like Problem

So I have this real-time game, with a C++ server (using the SFML library) with Nagle's algorithm disabled, and a client using AsyncSocket, which also disables Nagle. I'm sending 30 packets every second. There is no problem sending from the client to the server, but when sending from the server to the clients, some of the packets get merged. For example, if I send "a" and "b" in completely different packets, the client reads them as "ab". It happens only once in a while, but it causes a real problem in the game.
So what should I do? How can I solve that? Maybe it's something in the server? Maybe OS settings?
To be clear: I am NOT using Nagle, but I still have this problem. I disabled it in both client and server.
For example, if I send "a" and "b" in completely different packets, the client reads them as "ab". It happens only once in a while, but it causes a real problem in the game.
I think you have lost sight of the fundamental nature of TCP: it is a stream protocol, not a packet protocol. TCP neither respects nor preserves the sender's data boundaries. To put it another way, TCP is free to combine (or split!) the "packets" you send and present them to the receiver any way it wants. The only restriction TCP honors is this: if a byte is delivered, it will be delivered in the same order in which it was sent. (And nothing about Nagle changes this.)
So, if you invoke send (or write) on the server twice, sending these six bytes:
"packet" 1: A B C
"packet" 2: D E F
Your client side might recv (or read) any of these sequences of bytes:
ABC / DEF
ABCDEF
AB / CD / EF
If your application requires knowledge of the boundaries between the sender's writes, then it is your responsibility to preserve and transmit that information.
As others have said, there are many ways to go about that. You could, for example, send a newline after each quantum of information. This is (in part) how HTTP, FTP, and SMTP work.
You could send the packet length along with the data. The generalized form for this is called TLV, for "Type, Length, Value". Send a fixed-length type field, a fixed-length length field, and then an arbitrary-length value. This way you know when you have read the entire value and are ready for the next TLV.
You could arrange that every packet you send is identical in length.
I suppose there are other solutions, and I suppose that you can think of them on your own. But first you have to realize this: TCP can and will merge or break your application packets. You can rely upon the order of the bytes' delivery, but nothing else.
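You can observe this with a local stream socket pair: two separate sends, and the receiver sees only a byte stream whose chunk boundaries are unspecified; all that is guaranteed is the order of the bytes:

```python
import socket

a, b = socket.socketpair()         # a local connected stream pair
a.sendall(b"ABC")                  # "packet" 1
a.sendall(b"DEF")                  # "packet" 2
a.close()

chunks = []
while True:
    data = b.recv(1024)            # may return ABC, ABCDEF, AB, ...
    if not data:                   # empty read = peer closed
        break
    chunks.append(data)
b.close()

received = b"".join(chunks)        # always b"ABCDEF", in order
```

How many chunks arrive, and where they split, is up to the kernel; only the concatenation is guaranteed. Any code that assumes one `recv` per `send` is relying on luck.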
You have to disable Nagle in both peers. You might want to find a different protocol that's record-based such as SCTP.
EDIT2
Since you are asking for a protocol here's how I would do it:
Define a header for the message. Let's say I would pick a 32-bit header.
Header:
MSG Length: 16b
Version: 8b
Type: 8b
Then the real message follows, consisting of MSG Length bytes.
So now that I have a format, how would I handle things?
Server
When I write a message, I prepend the control information (the length is the most important, really) and send the whole thing. Having NODELAY enabled or not makes no difference.
Client
I continuously receive stuff from the server, right? So I have to do some sort of read loop:
Read bytes from the server. Any amount can arrive. Keep reading until you've got at least 4 bytes.
Once you have these 4 bytes, interpret them as the header and extract the MSG Length
Keep reading until you've got at least MSG Length bytes. Now you've got your message and can process it
This works regardless of TCP options (such as NODELAY), MTU restrictions, etc.
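A sketch of that client-side logic in Python (the header layout matches the one above: 16-bit length, 8-bit version, 8-bit type; names like `FrameDecoder` are invented):

```python
import struct

HEADER = struct.Struct("!HBB")     # MSG Length:16, Version:8, Type:8

def encode(payload: bytes, version=1, msg_type=0) -> bytes:
    """Prepend the 4-byte header to a payload."""
    return HEADER.pack(len(payload), version, msg_type) + payload

class FrameDecoder:
    """Feed arbitrary TCP chunks in; get whole messages out.

    Buffers until 4 header bytes are available, reads MSG Length from
    them, then buffers until the full payload has arrived, regardless
    of how TCP split or merged the sender's writes.
    """
    def __init__(self):
        self.buf = bytearray()

    def feed(self, data: bytes):
        """Append a chunk; return a list of complete messages."""
        self.buf += data
        out = []
        while len(self.buf) >= HEADER.size:
            length, version, msg_type = HEADER.unpack_from(self.buf)
            if len(self.buf) < HEADER.size + length:
                break              # wait for the rest of the payload
            payload = bytes(self.buf[HEADER.size:HEADER.size + length])
            del self.buf[:HEADER.size + length]
            out.append((version, msg_type, payload))
        return out
```

Even if the stream arrives one byte at a time, or several messages arrive in a single `recv`, the decoder emits exactly the messages the sender wrote.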
