I am learning the p2p architecture through a simulation within a single machine. For that I was told that I can use named pipes. I am in the design phase of the simulation. Here is how I plan to move forward:
Create a peer program that is able to join the p2p 'network'. The network is a collection of such peers. The peer is identified via the process id in the machine.
When a peer is created i.e., joins the network, it announces it's arrival through bootstrapping itself to a 'master-node' or tracker. When the peer announces its arrival, the master-node updates its list of peers (process ids), and returns the list of peers available in the network to the new peer.
Once in the network, the peer can now download a file from the network or upload a file to an incoming request for a file that s/he has. The files that the peer receives through the network automatically becomes available for upload.
To download a file, the peer calls a location algorithm that locates the peer with the file that the current peer is seeking to download.
As you can see, there is a gap in my understanding of the design. From my naive approach, I see #2 and #3 as different steps. However, I have a feeling that they must somehow be related. I guess my understanding lacks in knowing how a distributed hash table and algorithm like Chord or CAN works. I need help in rounding up these scattered ideas that will help me actually implement the simulation.
To begin, my first question is: where and how do I initiate the location algorithm? Or where is the distributed hash table created in the steps listed above?
You have a choice of either implementing a tracker server a la Napster. Or you can implement DHT-capable nodes - Bit torrent style.
In the former case, the tracker keeps track of all the nodes and the files they are hosting.
And in the latter, each node keeps track of the information (at least one peer in the network assumed in this case). A new peer will then contact one of the DHT - capable peer and get information about the node that is hosting the file that it requires.
Related
While building an application by python for implementing post-processing in Quantum Key Distribution, I require a server (say Alice) and a client(say Bob) at distant locations to interact and exchange some information while doing the required calculations simultaneously (for which threading is used).
At first, the open-source NGROK service was used as a hosting server but it makes the time required for post-processing very huge mainly due to network congestion.
So, is there a way to establish a direct peer-to-peer connection between Alice and Bob via the internet wherein Alice's system itself acts as a server thereby bypassing a third-party host server? If there is a way out please suggest one otherwise any leads or suggestions for open source services better than NGROK would be highly helpful.
PS: If you need any additional information for providing help, i would be eager to respond.
Does the Corda Enterprise have an Information Broadcast solution?
If it does not currently exist, will it be possible in the future?
The quick answer is, of course you can perform "information broadcast" and it can do so in exactly the same way that other DLT platforms do. In fact, it makes no sense that you can't broadcast with Corda!
This question comes up a lot, probably because there is some marketing material which says that Corda messaging happens on a peer to peer basis and that "there is no broadcast". What this actually means is that there is no gossiping of transactions with Corda. This is a good thing because it means peers have fine-grained control over which other peers can see their transactions.
To send a message to a peer on a network, you must know where the recipient can be reached. As such DLT/blockchain platforms maintain a list of peers. Platforms like Bitcoin, have a list of peers bundled with the software to bootstrap the network. This list can grow as more peers are discovered. With Corda, this is currently done through the network map service. Corda nodes can query their local cache of the network map to get a list of peers on the network.
If you want to broadcast a message to all peers on the network or a sub-set of peers on the network then it follows that you can iterate through the set of peers that you which to send a message to and send them the message. Easy. Note there is no gossiping here. It's simply just a bunch of unicast messages. You can do this asynchronously, as well.
It is also possible to facilitate gossiping of messages with Corda. In section 12 of the technical white paper, there is a concept mentioned called data distribution groups or clubs. You can think of a club as a directed minimum spanning tree of nodes on a network, which might look something like this:
As such, a node can start a club, then invite others to it, and so on. Members of the club can send a message to the club and it will be forwarded on to all the others. Referring to the image above, if node one publishes a message to the club, then all the other nodes will receive it.
I've implemented a prototype of this here. It's a feature that we plan to roll out in the near future.
Probably worth noting that most networks don't default to broadcasting or multicasting because it makes them a lot slower (if you look into the original history of the Internet, for example, you'll see that multicast didn't exist).
Broadcast platforms have a lot of problems in that senders typically don't know if the recipients have received those messages or not, so it's not at all unusual to find that when some systems talk about "broadcasting" they actually do multiple unicasts instead.
The Corda approach means that there's guaranteed delivery of the messages sent to all the relevant parties. As a point of comparison, even though Wi-Fi networks support multicast messages at L2, most access points will prefer to convert L3 (IP level) multicasts into a series of point-to-point L2 messages as these will be delivered reliably (the receiver ACKs the messages).
It's not that hard to build a gossipy sort of design on top of Corda's messaging. We did this for project Ubin phase 2a in 2017.
I've been digging about DHTs and especially kademlia for some time now already.
I'm trying to implement a p2p network working on a Kademlia DHT. I want to be able to gossip a message to the whole network.
from my research for that gossip protocols are used, but it seems odd to add another completely new protocol to spread messages when I already use the dht to store peers.
Is there a gossip protocol that works over or with a DHT topology like Kademlia ?
How concerned are you about efficiency? As a lower bound someone has to send a packet to all N nodes in the network to propagate an update to all nodes.
The most naive approach is to simply forward every message to all entries in your routing table. This will not do since it obviously leads to forwarding storms.
The second most naive approach is to forward updates, i.e. newer data. This will result in N * log(N) traffic.
If all your nodes are trusted and you don't care about the last quantum of efficiency you can already stop here.
If nodes are not trusted you will need a mechanism to limit who can send updates and to verify packets.
If you also care about efficiency you can add randomized backoff before forwarding and tracking which routing table entry already has which version to prune unnecessary forwarding attempts.
If you don't want to gossip with the whole network but only a subset thereof you can implement subnetworks which interested nodes can join, i.e. subscribe to. Bittorrent Enhancement Proposal 50 describes such an approach.
I want to create a P2P application on the internet. What is the best or if none exist a good enough way to do auto-discovery of other nodes in a decentralized network?
Grothoff and GauthierDickey from the GNUnet project (an anonymous censorship-resistant file-sharing network) researched on the question of bootstrapping a p2p network without any central hostlist.
They found that for the Gnutella (Limewire) network a random ip search needed on average 2500 connection attempts to find a peer.
In the paper they proposed a method which reduced the required connection attempts to 817 for Gnutella and 51 for the E2DK network.
Achieved was this through creating a statistical profile of p2p users for every DNS organization, this small (around 100kb) discovery database has to be created in advance and shipped with the p2p client.
This is the holy grail of P2P. There isn't a magic solution really - there's no way a node can discover other nodes without a good known point to act as a reference (well, you can do so on a LAN by using broadcasting, but not on the internet). P2P filesharing tends to work by having known websites distributing 'start points' for discovery, and then further discovery (I would expect) can come from asking nodes what other nodes they know about.
A good place to start on research would be Distributed Hash Tables.
As for security, that topic will be in the literature somewhere, I should think - again I would recommend Wikipedia. Non-existent ones are trivially dealt with: if you can't contact an IP/port, don't keep it on your list, and if a node regularly provides non-existent pointers, consider de-prioritising it or removing it from your list entirely.
For evil nodes, it depends on your use case, but let's say you are doing file sharing. If you request a section of a file, check with several nodes what the file section's hash should be, and then request by hash. If the evil node gives you a chunk that has a different hash, then you can again de-prioritise or forget that node.
Distributed processing systems work a little differently: they tend to ask several unrelated nodes to perform the same work, and then they use a voting system (probably using hashing again) to determine whether evilness is at hand. If a node provides consistently bad results, the administrator is contacted or the IP is removed from the known nodes list.
ok, for two peers to find each other they both have to know a common, lets say, mediator to exchange IPs once. You can use anything for this kind of the first handshake whilst being able to WRITE and READ from that "channel". i.e: DNS (your well known domains), e-Mail, IRC, Twitter, Facebook, dropbox, etc.
I'm developing a some sort like P2P system here.
The types of P2P network that I'm going to use is Hybrid P2P.
But, there's a problem here which I don't know how to solve.
It's the data directory.
How does the data directory works?
How does it knows that which nodes contain what files?
If I understand your question correctly, you are referring to indexing resources available on your P2P network. Basically, a central peer stores which electronic resource is available on which peer.
Instead of querying all peers, a given peer can query the central peer containing the index. Then, this information can be used to fetch the electronic resource (PDF file, music, video, etc....) from the correct peer. It reduces traffic.