I've been digging about DHTs and especially kademlia for some time now already.
I'm trying to implement a p2p network working on a Kademlia DHT. I want to be able to gossip a message to the whole network.
from my research for that gossip protocols are used, but it seems odd to add another completely new protocol to spread messages when I already use the dht to store peers.
Is there a gossip protocol that works over or with a DHT topology like Kademlia ?
How concerned are you about efficiency? As a lower bound someone has to send a packet to all N nodes in the network to propagate an update to all nodes.
The most naive approach is to simply forward every message to all entries in your routing table. This will not do since it obviously leads to forwarding storms.
The second most naive approach is to forward updates, i.e. newer data. This will result in N * log(N) traffic.
If all your nodes are trusted and you don't care about the last quantum of efficiency you can already stop here.
If nodes are not trusted you will need a mechanism to limit who can send updates and to verify packets.
If you also care about efficiency you can add randomized backoff before forwarding and tracking which routing table entry already has which version to prune unnecessary forwarding attempts.
If you don't want to gossip with the whole network but only a subset thereof you can implement subnetworks which interested nodes can join, i.e. subscribe to. Bittorrent Enhancement Proposal 50 describes such an approach.
Related
Does the Corda Enterprise have an Information Broadcast solution?
If it does not currently exist, will it be possible in the future?
The quick answer is, of course you can perform "information broadcast" and it can do so in exactly the same way that other DLT platforms do. In fact, it makes no sense that you can't broadcast with Corda!
This question comes up a lot, probably because there is some marketing material which says that Corda messaging happens on a peer to peer basis and that "there is no broadcast". What this actually means is that there is no gossiping of transactions with Corda. This is a good thing because it means peers have fine-grained control over which other peers can see their transactions.
To send a message to a peer on a network, you must know where the recipient can be reached. As such DLT/blockchain platforms maintain a list of peers. Platforms like Bitcoin, have a list of peers bundled with the software to bootstrap the network. This list can grow as more peers are discovered. With Corda, this is currently done through the network map service. Corda nodes can query their local cache of the network map to get a list of peers on the network.
If you want to broadcast a message to all peers on the network or a sub-set of peers on the network then it follows that you can iterate through the set of peers that you which to send a message to and send them the message. Easy. Note there is no gossiping here. It's simply just a bunch of unicast messages. You can do this asynchronously, as well.
It is also possible to facilitate gossiping of messages with Corda. In section 12 of the technical white paper, there is a concept mentioned called data distribution groups or clubs. You can think of a club as a directed minimum spanning tree of nodes on a network, which might look something like this:
As such, a node can start a club, then invite others to it, and so on. Members of the club can send a message to the club and it will be forwarded on to all the others. Referring to the image above, if node one publishes a message to the club, then all the other nodes will receive it.
I've implemented a prototype of this here. It's a feature that we plan to roll out in the near future.
Probably worth noting that most networks don't default to broadcasting or multicasting because it makes them a lot slower (if you look into the original history of the Internet, for example, you'll see that multicast didn't exist).
Broadcast platforms have a lot of problems in that senders typically don't know if the recipients have received those messages or not, so it's not at all unusual to find that when some systems talk about "broadcasting" they actually do multiple unicasts instead.
The Corda approach means that there's guaranteed delivery of the messages sent to all the relevant parties. As a point of comparison, even though Wi-Fi networks support multicast messages at L2, most access points will prefer to convert L3 (IP level) multicasts into a series of point-to-point L2 messages as these will be delivered reliably (the receiver ACKs the messages).
It's not that hard to build a gossipy sort of design on top of Corda's messaging. We did this for project Ubin phase 2a in 2017.
Nodes: Clients on DHT-network.
Peers: Clients trying to download a specific resource.
Suppose that the DHT-network is a connected graph, but NO nodes can access ALL other nodes (a consumption contrary to the common belief that the Internet, which DHT-network overlays on, is fully connected).
Is the Peer-network, which overlays on DHT-network, is still a connected graph? Why?
Kademlia is an abstract algorithm that assumes spherical cows in a vacuum. The only failure modes the paper discusses are churn and temporary graph partitions. Asymmetric reachability is not considered.
Kademlia as implemented in the real world makes no guarantees. Everything is done on a best-effort probabilities-are-good-enough basis.
The main concern in the real world are not nodes where interconnected cluster A cannot talk to a interconnected cluster B. NATs and firewalls do not introduce such clusters on a considerable scale. They create a set of second-class citizens which are not consistently reachable by anyone - absent NAT traversal measures - and thus can only connect to the first-class citizens which are the nodes where anyone can talk to anyone else. Of course a few edge cases exist, but they're largely irrelevant.
Anyway, since you're not even asking about kademlia but about bittorrent, which is not really an overlay over kademlia but a separate network which simply bootstraps its contact information from kademlia things get even more complicated. Bittorrent can be implemented over two different transport mechanisms, TCP and µTP, and clients may support different levels of nat traversal capabilities for TCP, µTP and Kademlia-via-UDP.
Kademlia nodes generally store contact information for bittorrent on several reachable nodes, since they - quite obviously - cannot reach unreachable nodes for the purpose of storage. They also do so with redundancy, which ensures a high likelihood that the stored contact information can be observed by anyone else.
Based on that contact information bittorrent clients can then attempt to connect to each other. As long as there are some reachable bittorrent clients they will be able to establish a direct connection and then may additionally be able to attempt some nat traversal measures between non-reachable nodes. Again, there are no guarantees, so small swarms may fail under some circumstances, but once a swarm becomes large enough the probabilities tip overwhelmingly in the favor of the graph becoming connected.
A small additional concern is IPv4 vs. IPv6. Generally IPv6 provides better connectivity (if firewalls don't get in the way) but not all clients implement the ipv6 extensions equally well, thus possibly preventing a few v6-edges from forming when they would in principle provide superior connectivity between the same nodes.
Note that ipv4 and ipv6 DHTs are in theory independent DHT networks, they just happen to have some significant overlap. It's basically outside the scope of kademlia how to coordinate multiple independent networks.
If StackOverflow is the wrong Exchange for this question, please help direct me to the correct one.
Short Version
What is the best design for a networking application in which one user transmits a constant, high-bandwidth stream of data to many other addresses? The solution must not require the uploader to duplicate the packets for each recipient and preferably will not transmit to users that have not been accepted by the transmitter.
Long Version
A friend and I have written an application that enables someone to transmit data in real time to one or more recipients that he wants to receive the data. I have designed the high-level application protocol to use UDP and to encode the data so that each packet can be lost without hurting the use of the rest. This solution requires managing sockets with each user and sending each packet to every user.
The problem here is that the stream can be very high bandwidth. The user can modify the settings for how high quality the data he is sending should be, and can end up sending 6 Mbps to each user. It is unfeasible to expect a user to pay his ISP enough to be allowed to upload such a stream to the preferred minimum of four other users at a time.
We need a way for the transmitter to send a packet exactly once and have each user receive a copy.
We have looked at multicasting. It may be what we need to use in the end, but we are concerned about the fact that anyone can join any group. It would be preferable to not allow users we do not want to see the data to not be allowed to join in. There is also the problem that if multiple transmitters happen to use the same group, viewers may find that they are receiving multiple streams' worth of data when they only want one.
My searching has revealed something IBM published over a decade ago called Explicit Multicast (Xcast) that looks perfect, but I have yet to find any information to determine whether this technology is commonly supported. Also, I have not yet seen whether it supports datagrams.
Does anyone know the best way to design an application that meets our needs?
Please keep in mind that we have no funds to support our project. Solutions need to be free.
Edit
In the summary above, I hinted at but failed to explicitly state that this is for a real-time application. The motivating drive behind the application is to keep the clients/recipients as close together in time as is possible. If packets are lost or arrive too late to be used in keeping the server and clients in phase, they need to be disregarded. That is why I designed the application protocol on top of UDP with independent data in each packet. Even if a client receives only one packet out of 300 for a given time step, it will use what it did get.
I think that I_am_Helpful's recommendation may be a good step in the right direction (or possibly the destination). I need to do some experimentation to determine whether using a system like Spread will work. However, I do not think I can budget more than additional 17 ms in transmission time.
If you can think of a system that enables sending unreliable datagrams to a specific group of users (like Spread) for a real-time application (unlike Spread, see p. 3), please let me know about it.
We need a way for the transmitter to send a packet exactly once and
have each user receive a copy.
In my limited knowledge, I would say that Reliable Multicasting appears to be one of the viable option for broadcasting in the group. I would like to mention that there are some of the possible Java API's* which could help you achieving the same :
JGroups Java API
The Spread Toolkit -> Spread consists of a library that user applications are linked with, a binary daemon which runs on each computer that is part of the processor group, and various utility and demonstration programs.
Appia
*NOTE : I have never worked with these API's.
It would be preferable to not allow users we do not want to see the
data to not be allowed to join in.
They do provide this feature, e.g., Spread supports thousands of groups with different sets of members. It also provides a range of reliability, ordering and stability guarantees for messages. JGroups can be used to create groups of processes whose members can send messages to each other. It also has facilities like group creation and deletion(Group members can be spread across LANs or WANs).
There is also the problem that if multiple transmitters happen to use
the same group, viewers may find that they are receiving multiple
streams' worth of data when they only want one.
When you could easily create multiple groups in the same network(using Spread,etc.), then, I believe that would no longer be an issue. It is your responsibility to declassify users into different groups.
I hope the given information helps. Good LUCK.
Via multicast you achieve exactly you want: sending each packet once, but authentication seems to be a concern for you.
One possible solution could be simetric cryptography, where the same key is used to encrypt and decrypt. Via TCP your clients connect to a server and fetch the multicast IP Address of the transmission and its associated key, then they join the multicast group and decrypt the incomming transmission.
If you accept a more flexible solution, you could have a server which sends a transmission in real time to a set of distributed servers. Your clients connect to one of these distributed servers via unicast, and after authentication is done, they are inluded in a list of receivers. Each distributed server sends each new transmission packet to each registered client via UDP. in ordinary situations your clients would have the same experience as if it was delivered in a multicast group, but the servers will spend far more bandwidth. Multiple transmission at a time will be allowed, so it could be good for you, and you can have more control, as clients can send signals to the servers, like PAUSE, and etc.
I want to create a proxy server which routes incoming packets from REQ type sockets to one of the REP sockets on one of the computers in a cluster. I have been reading the guide and I think the proper structure is a combination of ROUTER and DEALER on the proxy server. Where the ROUTER passes messages to the dealer to be distributed. However, I cannot figure out how to create this connection scheme. Is this the correct architecture? If so how to I bind a dealer to multiple addresses. The flow I envision is like this REQ->ROUTER|DEALER->[REP, REP, ...] where only one REP socket would handle a single request.
NB: forget about packets -- think in terms of "Behaviour", that's the key
ZeroMQ is rather an abstract layer for certain communication-behavioral patterns, so while terms alike socket do sound similar to what one has read/used previously, the ZeroMQ-world is by far different from many points of view.
This very formalism allows ZeroMQ Formal-Communication-Patterns to grow in scale, to get assembled in higher-order-patterns ( for load-balancing, for fault-tolerance, for performance-scaling ). Mastering this style of thinkign, you forget about packets, thread-sync-issues, I/O-polling and focus on your higher-abstraction-based design -- on Behaviour -- rather than on underlying details. This makes your design both free from re-inventing wheel & very powerful, as you re-use a highly professional tools right for your problem-domain tasks.
DEALER->[REP,REP,...] Segment
That said, your DEALER-node ( in fact a ZMQsocket-access-node, having The Behaviour called a "DEALER" to resemble it's queue/buffering-style, it's round-robin dispatcher, it's send-out&expect-answer-in model ) may .bind() to multiple localhost address:port-s and these "service-points" may also operate over different TransportClass-es -- one working over tcp://, another over inproc://, if that makes sense for your Design Architecture -- ZeroMQ empowers you to use this transparently abstracted from all the "awfull&dangerous" lower level gritty-nitties.
ZeroMQ also allows to reverse .connect() / .bind()
In principle, where helpfull, one may reverse the .bind() and .connect() from DEALER to a known target address of the respective REP entity.
You leave a couple details out that are important to determining the correct architecture.
When you say "from REQ type sockets to one of the REP sockets on one of the computers in a cluster", how do you determine which computer gets the message? Is it addressed to a specific computer? Does a computer announce its availability before it can receive a message? Does each message just get passed to the next one in line in a round-robin fashion? (if it's not the last one, you probably don't want a DEALER socket)
When you say "how do I bind a dealer to multiple addresses", it's not clear what you mean by "addresses"... Do you mean to say that the proxy has a unique IP address that it uses to communicate with each computer in the cluster? Or are you just wondering how to manage the connection to multiple different peers with the same socket? The former is a special case, the latter is simple.
I'm going to work with the following assumptions:
You want a worker computer from the cluster to announce its availability for work before it receives any work, and any computer in the cluster can handle any job. A faster worker, or a worker working on a smaller job, will not have to wait behind some slow worker to finish their job and get a new job first.
The proxy/broker uses a single ip interface to communicate with all workers.
If those are true, then what you want will be closer to this:
REQ->ROUTER|ROUTER->[REQ, REQ, ...]
A worker will create a request to the backend router socket to announce its availability, and await a reply with work. Once it is finished, it will create a new request with the finished work, which again announces its availability. The other half of the pattern you've already worked out.
This is the Simple Pirate Pattern from the ZMQ guide. It's a good place to start, but it's not very robust. This is in the Reliable Request-Reply Patterns section of the guide, and I suggest you read or reread that section carefully as it will guide you well. In particular, they keep refining this pattern into more and more reliable implementations and wind up with the Majordomo pattern, which is very robust and fault tolerant. You should see if you need all the features that provides or if you can scale it back a little. Either way, you should learn and understand what these patterns are doing and why before you make the choice to do something different.
We're implementing a SIP-based solution and have configured the setup to work with RTPProxy. Right now, we're routing everything through RTPProxy as we were having some issues with media transport relying on ICE. If we're not mistaken, a central relay server is necessary for relaying streaming data between two clients if they're behind symmetric NATs. In practice, is this a large percentage of all consumer users? How much bandwidth woudl we save if we implemented proper routing to skip the relay server when not necessary. Are there better solutions we're missing?
In falling order of usefulness:
There is a direct connection between the two endpoints in both directions. You just connect and you are essentially done.
There is a direct connection between the two endpoints in one direction. In that case you just connect via the right direction by trying both.
Both parties are behind NATs of some kind.
Luckily, UPnP works in one end, you can then upgrade the connection to the above scheme
UPnP doesn't work, but STUN does. Use it to punch a hole in the NAT. There are a couple of different protocols but the general trick is to negotiate via a middle man that coordinates the NAT-piercing.
You fall back to let another node on the network act as a relaying proxy.
If you implement the full list above, then you have to give up very few connections and don't have to spend much time on bandwidth utilization at proxies. The BitTorrent protocol, of which I am somewhat familiar, usually stops at UPnP, but provides a built-in test to test for connectivity through the NAT.
One really wonders why IPv6 did not get implemented earlier - this is a waste of programmers time.
Real world NAT types survey (not a huge dataset, though):
http://nattest.net.in.tum.de/results.php
According to Google, about 8% of the traffic has to be relayed: http://code.google.com/apis/talk/libjingle/important_concepts.html
A large percentage (if not the majority) of home users uses NAT, as that is what those xDSL/cable routers use to provide network access to the local network.
You can theoretically use UPnP to open ports and set-up forwarding rules on the router to go through the NAT transparently. Unfortunately (or fortunately, depending on who you are) many users disable UPnP as a matter of course on their router and may not appreciate having to add forwarding rules manually.
What you might be able to do (and what Skype does AFAIK) is to have some of the users that have clear network paths and enough bandwidth act as relay nodes. Apart from the routing and QoS issues, you would at least have to find some way to ensure the privacy of any relayed data from anyone, including the owner of the relay node. In addition, there might be legal issues to settle with this approach, apart from the technical ones.