Connecting P2P over NAT? - networking

I started to explore the option of connecting with other using a p2p connection, so I coded a simple socket program in JAVA for android devices in which the users can share simple messages p2p (I didn't have any idea about NAT then). I got to know about NAT, so I now need to establish a TCP connection with another user which uses a server for discovery but payload is transferred p2p. I have also looked at XMPP(a very good and detailed explanation of how protocol works is here) and UPnP but I dont know how to implement them.
Another interesting question that arises is of BitTorrent because they can work on any device and even behind a NAT. I am not able to get any explanation of how BitTorrent works.
I have researched a lot but I am stuck.
My questions are:
A detailed explanation of BitTorrent(like here, not how torrents work) and how is it able to work around NAT ?
Is there a way to make a NAT entry programmatically ?
Is socket programming sufficient for p2p ?
How difficult is it to create your own protocol and how can I build one ?
If two devices D1 and D2 want to communicate p2p and they know each other's IP. D1 sends a request to D2 and that can't get through the D2's NAT, but there should be an entry created in D1's NAT. So when D2 tries to send something D1's NAT should discover an entry with D2's IP. Then why is the packet not allowed by it ?

Another interesting question that arises is of BitTorrent because they can work on any device and even behind a NAT. I am not able to get any explanation of how BitTorrent works.
This statement looks like you assume that bittorrent needs full connectivity to operate.
That is incorrect.
Behind a NAT device you will still be able to establish outgoing TCP connections. Which generally is sufficient for bittorrent as long as there are other, non-NATed (or NATed but properly port-forwarded) clients in the network that can accept incoming connnections.
NAT has no impact on the flow direction of the data because connections are bi-directional once they are established. It only is problematic for the initial connection setup.
This works perfectly fine for bittorrent because bittorent does not care from which specific node you get your data.
Although better connectivity generally does improve performance.
If the identity of the node matters or one-on-one transfers are an important use-case then other p2p protocols usually attempt NAT traversal first and if that fails rely on 3rd party nodes relaying traffic between those nodes who cannot connect to each other directly.
Additionally, IPv6 support will become essential in the future to maintain end-to-end connectivity because more and more ISPs are starting to roll out carrier-grade NAT for IPv4 while IPv6 will remain non-NATed

One thing need to be clear is that 100% P2P between all type of NAT is impossible right now. There is no practical way to establish P2P connectivity between **Symmetric and Symmetric/PRC NAT. In this scenario connection is established through a relay server called TURN.
I am answering from your 2nd question because I don't know much about the first one.
2) Yes. You can send a packet through your NAT and there will be a mapping between your internal IP:Port to your NAT's external IP:Port. You can know these external IP:Port by sending a stun request. Note that this technique doesn't work for Symmetric NAT.
3)Yes socket programming sufficient for p2p.
4)Why do you need a protocol when there already exists several. ICE protocol is the best today for NAT traversal and I don't think it was easy to create. UPnP and NAT-PMP is really vulnerable in terms of security.
5)I think what happens is usually NAT blocks unknown packets coming to it. So when D1 sends a packet to D2, its NAT blocks all packets incoming from D1s IP:Port. That is why connection establishment fails. You have to employ hole punching technique for D1 and D2 to successfully establish P2P connectivity.
**By symmetric NAT I mean symmetric NAT with random port allocation.

There is a paper on "Peer-to-Peer Communication Across Network Address Translators" which describes the UDP hole punching method and extends it to be used over TCP as well.
Of course, you will always need a relay server for the cases where hole punching is not supported.

Recent versions of BitTorrent use µTP, which is layered above UDP, not TCP. µTorrent uses a private extension (ut_holepunch) that performs UDP hole punching, most other implementations don't bother (with the notable exception of Tixati).
Some NAT routers accept port forwarding requests using either the uPNP or the PMP protocol. Whether this is supported depends on the particular brand of router and its configuration.
Yes, socket programming is enough for P2P.
Difficult to answer. I suggest that you read the wikified and annotated BitTorrent specification for a start.
Yes, this is the principle behind UDP hole punching.

Related

Writing client-server application in global network

I know, how to write a C# application that works through a local network.
I mean I know, how to make my client-side application access my server-side application in a single local network.
But I wonder: How do such apps, as Skype, TeamViewer, and many other connect via global network?
I apologise, if this question is simple or obvious, but I couldn't find any information about this stuff.
Please, help me, I'll be very grateful. Any information is accepted - articles, plain info, books,and so on...
Question is very wide and I try to do short overview.
Following major difference between LAN (Local Area Network) and WAN (Wide Area Network):
Network quality:
LAN is more or less stable, WAN can be with network issues like:
Packet loss (you need use loss-tolerant transport like TCP or UDP with retransmits or packet loss concealment)
Packet jitter (interpacket intervals may differ a lot from sending part). Most common thing is packets bursts.
Packet reordering
Packet duplication
Network connectivity
WAN is less stable than LAN. So you need properly handle all things like:
Connection stale
Connection loss
Errors in the middle of the connection (if you use UDP for example)
Addresses:
In WAN you deal with different network equipment between client and server (or peers in case of peer-to-peer communication). You need to take in account:
NATs - most of the clients are behind NAT and you need to pass them through. According technics are called "NAT traversal"
Firewalls - may ISP has own rules what client can do or can't. So if you do something specific like custom transport protocol you may bump into ISP firewalls.
Routing - especially multicast and broadcast communication. In common case multicast is not possible to route. Broadcasts are never routed. So you need to avail this type of communication if you want to use WAN.
May be I forgot something. But these points are major. You can read many articles about any of them.

What's so hard about p2p Hole Punching? [closed]

Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed last month.
Improve this question
I am trying to experiment with some p2p networking. Upon doing some research, one of the biggest obstacle I learnt is "What if a client is behind a NAT/Firewall", later on I discovered about Hole Punching but that it is not always guaranteed to work.
As far a I understand, I don't understand why it might fail, This is what I know so far:
Based on the diagram above, this is how I understand how a successful connection can be established.
Alice joins the network (1) by creating connection to a directory-server. When this happens, Alice's NAT creates a mapping from her public ip to her local ip.
The directory server receives the connection and store Alice's public ip:port in the directory
Bob does the same (2), Joins the network and publishes his ip:port in the directory
Alice wants to communicate with bob. So she looks up Bob's ip:port from the directory. (3)
Alice sends data on Bob's ip:port which she got from the server. (5)
Since Bob also has a mapping from is ip:port to his local ip:port, the NAT simply forwards any data received on Bob's public ip:port to his computer.
Same works for Alice
I hope I was clear in my explanation of what I understand. My question is, what is so hard or unreliable about this? i must be clearly missing something. Can you explain me what it is?
One problem is that the NAT mappings in Alice's NAT server may time out, either after a fixed time, or after a period of inactivity.
A second potential problem is that the NAT server could make the restriction that Alice's NAT mapping is only "good" for TCP connections established by Alice, or connections between Alice and the initial IP "she" connected to. (In other words, direct communication between Alice & Bob may be blocked.)
And so on.
The problem is that the behaviour of a NAT server is highly dependent on how the managing organization's configuration / policy decisions. Many of these decisions could mean that your particular P2P usage pattern won't work reliably ... or at all.
So then is my whole idea about hole punching wrong?
No. It just means that it won't always work.
Possibly the biggest problem in NAT holepunching is lack of port consistency. For your implementation to work, at least one of the two NATs must support it.
Port consistency is where the same (local ip, local port) is mapped to the same (external ip, external port) regardless of the target (destination ip, destination port). Without this, the port seen by the directory server is not helpful to the client since it will not be the same port the clients will need to talk to each other.
(Note that this is a weaker requirement than port preservation, where external port == local port.)
Unfortunately for P2P communication, most NATs are some flavor of Symmetric NAT and do not have consistent port mappings.
Firewalls are typically stateful. Bob (2) establishing communications with the outside directory server sets up a rule in his NAT server that allows Bob and the directory server to communicate. When the NAT server sees packets from Alice, it rejects/drops them because it hasn't seen Bob establish communications with Alice.
First of all there are 2 types of hole punching
1.UDP hole punching
2.TCP hole punching
UDP hole punching success rate is 82%
TCP hole punching success rate is 64%
I have done many UDP hole punching experiments and they were mostly all successful but not same in the case of TCP hole punching.
The reason behind the failure of TCP hole punching is only the router NAT table. I will try to explain my best:
Client 1 --> connect(client2) --Internet-- connect(client1)<-- Client 2
Now if Client1 **SYN Packet**** reaches to the client2 and **client2 **SYN packet wasn't released** , the ROUTER of client2 can do 2 things:
1. send RST packet back as connection refused to client1.
2. drop packet immediately and no reply send to client1.
If this happens no connection will be established.
I can only suggest a solution that time difference between connect call from both the client should be very less. The connect call difference should be in milli-seconds
TIP: if you are in local network , put disable your firewall
for ubuntu user : sudo ufw disable
I think understanding how the Hole Punching really works would assist to get into why it may fail.
It was first explored by Dan Kagel, read here. In this technique, both peers are generally
assumed to be behind two different NATs. Both peers must be connected to an
intermediate server called a Rendezvous/Signaling server; there are many well-known Rendezvous protocols and SIP (RFC 3261) is the most famous one. As they are
connected to the server, they get to know about each other’s public transport
addresses through it. The public transport addresses are allocated by the NATs
in front of them. The Hole Punching process in short:
Peer 1 and Peer 2 first discover their Public Transport Addresses using
STUN Bind Request as described in RFC 8589.
Using a signaling/messaging mechanism, they exchange their Public Transport Addresses. SDP(Session Description Protocol)[16] may be used to
complete this. The Public Transport Address of Peer 1 and Peer 2 will be
assigned by NAT 1 and NAT 2 respectively.
Then both peers attempt to connect to each other using the received Public
Transport addresses. With most NATs, the first message will be dropped
by the NAT except for Full Cone NAT. But the subsequent packets will
penetrate the NAT successfully as by this time the NAT will have a mapping.
NAT can be of any type. If the NAT is, let's say, Symmetric NAT RFC 8489, Hole Punching won’t be possible. Because only an external host that receives a packet from an internal host can send a packet back. If this is the case, then the only possible way is Relaying.
Learn more about the current state of P2P communication: read RFC 5128.

How bit torrent works in private network?

Can somebody please explain working of a bit torrent from the perspective of a host in private network as its IP address is not visible outside the private network. Is port forwarding necessary for bit torrent to work?
Not really. The basic protocol still works if it can not accept incoming connections, it can rely on just outgoing connections. Of course if several peers are not accepting incoming connections, none of them can directly connect, and that's a bad thing - for those peers and for the whole swarm. The number of unreachable (but active) peers is significant in practice, though very hard to measure precisely.
Also, consider that your client will be advertising itself as available, so other peers will be wasting connection attempts to your client, which will be rejected by the NAT device (or they won't even really go anywhere, if the client is silly enough to advertise its private IP address).
So in short, it will work, but it's not a good thing.
For the UDP based protocols (UDP tracker, DHT, µtp), hole-punching can be used (except from behind symmetric NAT), so typically no forwarding is required for those (as long as the client supports hole-punching).

Doubts Related to working of torrents?

I was trying to understand how torrents work?
And after reading a lot on web I now know the basics about it but I have a very
important question related to working of torrents!
In torrents how do peer-to-peer connections take place?
Almost all the peers have private-IP(for e.g 192.x.x.x) addresses then how does connections take place without a server(As I have read: There is no server involved in torrents) ?
Thanks a lot!
There are a few alternatives:
Peers behind NAT simply don't connect to other peers behind NATs. This creates two classes of peers, where the ones that are connectable will have an advantage when trading pieces, and typically achieve faster download rates.
Peers behind NAT use UPnP or NAT-PMP to set up port forwarding in order to be connectable by other peers
peers using uTP and Peer exchange can support a simple hole-punching mechanism (uTorrent and libtorrent supports this for instance). A peer can help in introducing two of its connections to each other, they try to connect to at the same time and of one of them have a full-cone NAT, they are very likely to succeed in establishing the connection.
Peers supporting DHT and uTP may use a relatively new feature where the port announced to the DHT is derived from their UDP packets. Using the same socket for DHT and uTP increases the chances that a peer behind a full-cone NAT can accept incoming connections without UPnP or NAT-PMP set up. Simply because the DHT traffic will keep a pinhole open on the NAT.
If you have a swarm of only peers behind symmetric NATs, nobody is going to be able to connect to anyone else, and bittorrent is not going to work. In practice (at least in moderately large swarms) there are always some peers that are connectable.

If TCP is connection oriented why do packets follow different paths?

According to my knowledge if an internet application has to be designed, we should use either a connection-oriented service or connection-less service, but not both.
Internet's connection oriented service is TCP and connection-less service is UDP, and both resides in the transport layer of Internet Protocol stack.
Internet's only network layer is IP, which is a connection-less service. So it means whatever application we design it eventually uses IP to transmit the packets.
Connection-oriented services use the same path to transmit all the packets, and connection-less does not.
Therefore my problem is
if a connection oriented application has been designed, it should transmit the packets using the same path. But IP breaks that rule by using different routes.So how do both TCP and IP work together in this sense? It totally confuses me.
You, my friend, are confusing the functionality of two different layers.
TCP is connection oriented in the sense that there's a connection establishment, between the two ends where they may negotiate different things like congestion-control mechanism among other things.
The transport layer protocols' general purpose is to provide process-to-process delivery meaning that it doesn't know anything about routes; how your packets reach the end system is beyond their scope, they're only concerned with how packets are being transmitted between the two end PROCESSES.
IP, on the other hand, the Network layer protocol for the Internet, is concerned with data-delivery between end-systems yet it's connection-less, it maintains no connection so each packet is handled independently of the other packets.
Leaving your system, each router will choose the path that it sees fit for EACH packet, and this path may change depending on availability/congestion.
How does that answer your question?
TCP will make sure packets reach the other process, it won't care HOW they got there.
IP, on the other hand, will not care if they reach the other end at all, it'll simply forward each different packet according to what it sees most fit for a particular packet.
Note:
Let's assume that IP was connection-oriented, would that mean packets would follow the same-path?
Not necessarily, it depends on what the word 'connection' at this layer means, if it means negotiating certain options related to security, for instance, you may still have all your packets being forwarded through different routes over the Internet.
EDIT:
Not to confuse you though, most connection-oriented services at the network-layer and below mean that the connection, when established, also establishes a virtual-path that all 'packets' must follow, for further information read about:
Virtual circuit and frame-relay networks
This link answers your question pretty well http://www.tcpipguide.com/free/t_ConnectionOrientedandConnectionlessProtocols-3.htm
Some people consider this (TCP) to be like a “simulation” of circuit-switching at higher network layers; this is perhaps a bit of a dubious analogy. Even though a TCP connection can be used to send data back and forth between devices, all that data is indeed still being sent as packets; there is no real circuit between the devices. This means that TCP must deal with all the potential pitfalls of packet-switched communication, such as the potential for data loss or receipt of data pieces in the incorrect order.
The TCP protocol deals with the problem of IP packets arriving out of order or being lost, to give you the feeling they arrive through a single FIFO channel. Yes, TCP is smart enough to do that, there's no need for a dedicated underlying channel.
The TCP protocal is implemented by the sending/receiving machines, once the packets leave the sending machine, the routers they travel along know nothing about TCP, they just use IP to get the packets from the source the to destination. Then, it is the destination machines job to, using TCP, make sure that all the packets arrive and that they arrive in the correct order. The internet itself doesn't know anything about TCP, it's just a layer (often software) that gives connection to a connectionless medium (the internet).
So onces a packet leaves a destination, it can go along any path (mostly) as long as it gets to the desintation, regardless of the higher level protocol (such as TCP or UDP).
I mean, it's a bit more complicated then that, but as far as I can remember that's the general Idea.
Refer my short points properly,
1) Connection oriented means ==> reserving resources(buffer,cpu,bandwidth etc.)..but "Where??".(where resources are reserved?? This where is reason of your confusion, so following is ans.).
2) Connection oriented at Transport Layer means ==> Reserving the resources at Both End processes/Ports.(Since TCP is a transport layer,then its responsibility is to reserve the resources at both end processes only,irrespective of whats happening in the intermediate path.)
3) Connection oriented at Network Layer means ==> Reserving the resources at Network Layers.(Now In the whole journey of a packet from source to destination, Network layer is found at all intermediate routers too(but not transport layer). Hence if any protocol at Network layer is connection oriented then,its responsibility is to reserve resources at all intermediate routeres too i.e. all packets will have to follow same intermediate path, But IP is connection less hence,intermediate resources will not be reserved. i.e journey of a packets may follow different paths etc.)
#CONCLUSION:==> Intermediate path is decided by Network Layer, hence if IP then paths may be different.(IP may contain TCP),But TCP is responsible for Resource reservation at both End processes ,irrespective of Intermediate path of packet.
router works on three layers only (physical , data link and Network layers) , so routers will take decision depending only on the info. of network layer (IP protocol ) hence there is no information available about its TCP or UDP at the router

Resources