STUN and TURN clarification - networking

I need to establish a connection between a server and clients which can both be behind any type of NAT. For this purpose I have a dedicated host on the internet with clean IP for hosting STUN/TURN server. I'm not going to use WebRTC, I just want to use STUN/TURN server for messaging between clients and a server. After reading RFC's, SO, etc I have some questions left unclear:
In which case STUN is used? In my understanding STUN is used only for Full-cone NAT. In all other cases TURN server must be used, because of host and/or port restriction. Is that correct?
It seems I need a signalling server to notify clients about server address and vice versa. But as soon as client/server sends a message to the signalling server, I know their outer host:port, so I let each side know other's side host:port, each side can send messages to this signalling server containing peer's host:port data, which the signalling server can use to detect which peer this message is for and forward it to corresponding peer. At first sight this logic seems to me pretty straight-forward and my signalling server becomes a TURN server - is that how TURN server is implemented? But if so, I don't understand, why would I need a TURN server like "coturn", "reTurn", etc? I know they implement ICE, but how this ICE will work, if my signalling server received message from concrete host:port of a peer, so that is the only candidate that can be used for connection with the peer?
In case of restricted NAT (port, address or symmetric), how long a client outer (public) port is open on router for receiving UDP datagrams? I read that TURN client sends refresh messages to server to keep channel open, is this how client also prevents ports from closing?

STUN can bridge P2P connections for most NATs except the symmetric variety, which have unpredictable port mapping. TURN is needed for the latter.
Signaling is typically done with TCP and a different socket. P2P media is typically UDP. So there's that distinction. You might discover the IP address with the signaling servers help, but you won't reliably discover the port. Even if both are TCP, you probably want a separate socket connection for the signaling service than the media.
From my experience: anywhere from 1-2 minutes. Sometimes longer. In the absence of data flowing in both directions, have keep alive messages that flow every 45 seconds to keep the session from getting dropped.

Related

Retrieve available clients via UDP broadcast

I'm currently developing a "node-based" system where a server will send out a UDP broadcast on the private network (with a custom protocol), which will be received by several different clients which supports the specified protocol. The server will after the request pick between some of the clients for a more steady TCP connection.
Request for client sequence
Server broadcasting a request-for-ip message to every device/node on the network.
All available clients that supports the protocol will answer with their unique IP to the server.
Server chooses among the clients via a request-for-connection message.
Client that got choosen by the server connects to the server via TCP for a reliable connection.
My question
I've got pretty good knowledge about both TCP and UDP, but I've never designed a system like this before. Do you think this system is built in the right way or is there a more "standard" way doing something similar to this? What are your thoughts?
Thanks!
--- Edit ---
Added a diagram of the program.
There is a standard protocol to advertise services on the network, which you may like to consider: Simple Service Discovery Protocol, based on periodic UDP multicast:
The Simple Service Discovery Protocol (SSDP) is a network protocol based on the Internet protocol suite for advertisement and discovery of network services and presence information. It accomplishes this without assistance of server-based configuration mechanisms, such as Dynamic Host Configuration Protocol (DHCP) or Domain Name System (DNS), and without special static configuration of a network host. SSDP is the basis of the discovery protocol of Universal Plug and Play (UPnP) and is intended for use in residential or small office environments.
In this protocol clients join that UDP multicast group to discover local network services and initiate connections to them, if they wish to. And this is pretty much the intended use case for the protocol, which is somewhat different from your use case.
One benefit of IP/UDP multicast is that multicast packets can be dropped in the network adapter if no process on the host has joined that multicast group. Another one is that IP/UDP multicast can be routed across networks.
From the diagram you posted:
The server is the mediator (design pattern) whose location must be known to every other process of the distributed system.
The clients need to connect/register with the server.
Your master client is a control application.
It makes sense for the server to advertise itself over UDP multi-cast.
Online clients would connect to the server using TCP on start or TCP connection loss. If a client terminates for any reason that breaks the TCP connection and the server becomes immediately aware of that, unless the client was powered off or its OS crashed. You may like to enable frequent TCP keep-alives for the server to detect dead clients as soon as possible, if no data is being transmitted from the server to the clients. Same applies to the clients.
All communications between the server and the clients happen over TCP. Otherwise you would need to implement reliable messaging over UDP or use PGM, which can be a lot of work. Multicast UDP should only be used for server discovery, not bi-directional communication that requires reliable delivery.
The master client also connects to the server, possibly on another port, for control. The master client can discover all available servers (if there is more than one) and allow the user to choose which one to connect to.

RPC & TCP Behavior

Can someone describe from a network point of view what RPC (SUN and/or DCE) is and why it deviates from standard TCP behavior?
The way that I understand it is a client reaches out to a server with a unique source port and then switches the source port after the TCP three way handshake finishes. I work with ASA firewalls so this behavior becomes very apparent when the inspection of DCE RPC is not enabled since the firewall will block it because it sees it as a threat. I have read a few MS TechNet articles and other website definitions to include watching about five Youtube videos which all seem to explain it from a programmers perspective but I have yet to fully understand this concept since I am not a programmer.
Note that there is nothing that deviates from standard TCP regarding the RPC protocols.
SunRPC or DCE RPC works on top of UDP(at least SunRPC can use UDP) or on top of TCP.
Typically in order for an RPC client to contact/call an RPCserver, it first contacts some sort of lookup server (called portmapper or rpcbind in the case of SunRPC), which replies with the location (IP address and port number) of where the actual server is running.
So from a networking perspective:
RPC Servers listens on a random port number, which may change each time that server program is (re)started.
At startup the RPC server connects to the portmapper, which runs on a well known port and register itself with which IP address and port number it's listening on.
Normally the portmapper service runs on the same machine as the RPC server programs.
When a client wants to connect to or call an RPC service it performs these actions:
Connects to the portmapper, on a well known/standard destination port and asks it where the particular service it wants to connect to is.
portmapper replies with the IP address and port number of the service the client asked for.
client tears down the connection to the portmapper
client establishes a new connection to the service using the IP address and port number that pormapper gave it.
client calls the RPC service over this new connection, which the client may use for multiple RPC calls.
These RPC calls are just application message exchanged on top of a TCP connection.
(In the case UDP is used instead of TCP, it works much the same, but there's no naturally no connection setup/teardown being performed over the network)
This presents a problem for firewalls, since the servers listens on randomly chosen port numbers, one cannot administratively allow access to a particular port number. Instead a firewall wanting to support this kind of setup would need to open up the portmapper port, catch the RPC messages going to that well known port of portmapper, inspect the message content exchanged with the portmapper to extract the IP address and port number from the RPC messages(the portmapper is itself implemented as an RPC server) in order to dynamically open a port between the RPC server and client.

How do WebRTC peers connect to each other if none have opened ports?

I know there needs to be a STUN/ICE/TURN server to find the IP addresses of the peers involved in a WebRTC communication. However, even after IPs are found, how do the peers actually talk to each other independently without having any ports opened?
If you build a website, you usually have to open the ports on your server to have others access your site. What's the magic that is happening in WebRTC that I'm not understanding?
There are several strategies to do this: one possibility is for the client to explicitly open a port via UPnP. I'm not sure if any current WebRTC client does so, but in general networking this is a possibility.
Failing that, the STUN server kicks in. There are several hole punching techniques it can try; read the aforelinked article for the gory details. In short though, a firewall will usually open a port for outgoing traffic (because it needs to receive responses), so by establishing an outgoing connection to a known target and then making note of the port that was opened it is possible to open a port.
Failing even that, a TURN server is necessary. This server is publicly accessible from both peers, even if both peers cannot see each other. The TURN server then will act as a relay between the two. This somewhat negates the point of a P2P protocol, but is necessary in a certain percentage of situations (estimates range around 10%-20%).
The original Question is "what/who creates the sockets?"
The browsers creates the socket and bind them to a local port for you
during the "ICE gathering".
Wether you use any stun/turn server or
not, each candidate generated during the ice gathering has a
corresponding port open.
Those ports are usually open only for 30 mn
after which they are revoked to avoid an attack by someone using old
and/or spoof candidates. These 30mns are not specified in any
specification and are an arbitrary choice by the browser vendor. -
The next question is "how does the remote peer know about which ports are open".
through the ICE mechanism, which for each media will generate potential candidates and send them to the remote peer through your preferred signaling channel.
ICE candidates (which are one line of SDP, really) have a "type". if this type is HOST, then your candidate is a local candidate generated without the use of any stun or turn server. is the type is SRFLX, then you have used a STUN server to add the mapping between your local IP:port and your public IP:port. if your type is RELAY, same thing with a TURN server.
of course, using the local IP:port HOST candidate will fail unless the remote peer is on the same local network.
From the browser and local system point of view, the socket is open on the local IP:PORT anyway. Hence, opening the sockets and finding out on which port a remote peer should connect to connect to the socket are separate problems handled separately.
The Final question is: "can it really work without a STUN server"
Most probably no, unless you are on the same sub network.
Stats shows (http://webrtcstats.com) that even with a STUN server, you still fail in 8% of the case, for the general public. It's much more in enterprise, where you'd better have advanced turn (supporting tunneling through TCP/80 and TLS/443) and even support for HTTP proxy's CONNECT method.

Client Server Architecture

I'm struggle with what technique to choose for a server client aspect of my application.
Defining design
Windows, C# on .net 2
On many machines there is a .net 2 service. I call that the Client.
Machines can be in different networks behind NAT's (or not) connected to Internet.
Server services are public.
Requirements
To communicate with the Clients on demand.
Client must listen for incoming connections.
The server can be or not online.
Port forwarding is not possible.
What are my choices to do something like that?
Now I'm looking in the UDP Hole punching technique. The difference between the UDP hole punching technique setup and my setup is that instead of having 2 clients behind a NAT and a mediation Server, I got only the client behind the NAT that must communicate with the server. That must be easier but I'm having hard time to understand and implement.
I'm on the right way with the this kind of NAT traversal or may be some other methods much easier to implement?
Other methods that I've taken in consideration:
When the service sees the server online, creates a connection to the server using TCP. The problem is that I have something around 200 clients, and the number is rising and I was afraid that this is a resource killer.
When the service sees the server online, checks a database table for commands then at every 30 seconds checks again. This is also a resource killer for my server.
Bottom of line is, if the UDP Hole Punching tehnique is the right way for this scenario, please provide some code ideas for de UDPServer that will run on the service behind NAT.
Thank you.
Hole punching and p2p
You might be interested in a high level discussion of UDP hole punching. Hole punching is needed if you want clients (who both might be behind a firewall) to communicate directly without an relaying server. This is how many peer 2 peer (p2p) communications work.
With p2p, typically NAT'ed clients must use some external server to determine each other's "server reflexive address." When NAT translation occurs, behind the firewall ports can be mapped to some arbitrary port to the public. A client can use a STUN server to determine its "server reflexive address." Clients then will, through an intermediary server, exchange server reflexive addresses and can initiate communication (with hole punching to initiate the session).
Often, a NAT will not behave in a manner to allow direct communication as described above. Sending packets to different destinations will cause a NAT to map ports to entirely different values depending on the destination. In this case, a TURN server is needed.
Links
Nat traversal and different types of NAT behaviors
STUN RFC
TURN RFC
Server-client communication
If your client only needs to communicate with a server, hole punching is not needed. As long as the client can communicate with the public Internet, then you can use any C# socket API (I'm not familiar with C#) to make connections to the server's public IP/port combination. Typically, clients making socket connections don't specify a source port and let the underlying socket API make that decision since it really doesn't matter.
Your server should be listening to a specific port (you make this determination), and when it receives a packet from the client, the source address of the packet will be some NAT'ed address. In other words, the source address will be the public IP of whatever firewall your client is behind. If the NAT changed the source port of the client's packet, the server will see this NAT'ed port as the source port. It really doesn't matter, since when the server sends back a response packet, the NAT on the client machine will translate the destination port (it stores translations internally) and correctly send the packet back to the correct private host (the client).

How does the socket API accept() function work?

The socket API is the de-facto standard for TCP/IP and UDP/IP communications (that is, networking code as we know it). However, one of its core functions, accept() is a bit magical.
To borrow a semi-formal definition:
accept() is used on the server side.
It accepts a received incoming attempt
to create a new TCP connection from
the remote client, and creates a new
socket associated with the socket
address pair of this connection.
In other words, accept returns a new socket through which the server can communicate with the newly connected client. The old socket (on which accept was called) stays open, on the same port, listening for new connections.
How does accept work? How is it implemented? There's a lot of confusion on this topic. Many people claim accept opens a new port and you communicate with the client through it. But this obviously isn't true, as no new port is opened. You actually can communicate through the same port with different clients, but how? When several threads call recv on the same port, how does the data know where to go?
I guess it's something along the lines of the client's address being associated with a socket descriptor, and whenever data comes through recv it's routed to the correct socket, but I'm not sure.
It'd be great to get a thorough explanation of the inner-workings of this mechanism.
Your confusion lies in thinking that a socket is identified by Server IP : Server Port. When in actuality, sockets are uniquely identified by a quartet of information:
Client IP : Client Port and Server IP : Server Port
So while the Server IP and Server Port are constant in all accepted connections, the client side information is what allows it to keep track of where everything is going.
Example to clarify things:
Say we have a server at 192.168.1.1:80 and two clients, 10.0.0.1 and 10.0.0.2.
10.0.0.1 opens a connection on local port 1234 and connects to the server. Now the server has one socket identified as follows:
10.0.0.1:1234 - 192.168.1.1:80
Now 10.0.0.2 opens a connection on local port 5678 and connects to the server. Now the server has two sockets identified as follows:
10.0.0.1:1234 - 192.168.1.1:80
10.0.0.2:5678 - 192.168.1.1:80
Just to add to the answer given by user "17 of 26"
The socket actually consists of 5 tuple - (source ip, source port, destination ip, destination port, protocol). Here the protocol could TCP or UDP or any transport layer protocol. This protocol is identified in the packet from the 'protocol' field in the IP datagram.
Thus it is possible to have to different applications on the server communicating to to the same client on exactly the same 4-tuples but different in protocol field. For example
Apache at server side talking on (server1.com:880-client1:1234 on TCP)
and
World of Warcraft talking on (server1.com:880-client1:1234 on UDP)
Both the client and server will handle this as protocol field in the IP packet in both cases is different even if all the other 4 fields are same.
What confused me when I was learning this, was that the terms socket and port suggest that they are something physical, when in fact they're just data structures the kernel uses to abstract the details of networking.
As such, the data structures are implemented to be able to distinguish connections from different clients. As to how they're implemented, the answer is either a.) it doesn't matter, the purpose of the sockets API is precisely that the implementation shouldn't matter or b.) just have a look. Apart from the highly recommended Stevens books providing a detailed description of one implementation, check out the source in Linux or Solaris or one of the BSD's.
As the other guy said, a socket is uniquely identified by a 4-tuple (Client IP, Client Port, Server IP, Server Port).
The server process running on the Server IP maintains a database (meaning I don't care what kind of table/list/tree/array/magic data structure it uses) of active sockets and listens on the Server Port. When it receives a message (via the server's TCP/IP stack), it checks the Client IP and Port against the database. If the Client IP and Client Port are found in a database entry, the message is handed off to an existing handler, else a new database entry is created and a new handler spawned to handle that socket.
In the early days of the ARPAnet, certain protocols (FTP for one) would listen to a specified port for connection requests, and reply with a handoff port. Further communications for that connection would go over the handoff port. This was done to improve per-packet performance: computers were several orders of magnitude slower in those days.

Resources