Autodiscovery in P2P Applications

I want to create a P2P application on the internet. What is the best way, or if none exists a good enough way, to do auto-discovery of other nodes in a decentralized network?

Grothoff and GauthierDickey from the GNUnet project (an anonymous, censorship-resistant file-sharing network) researched the question of bootstrapping a P2P network without any central hostlist.
They found that for the Gnutella (LimeWire) network, a random IP search needed on average 2,500 connection attempts to find a peer.
In the paper they proposed a method which reduced the required connection attempts to 817 for Gnutella and 51 for the eD2k network.
This was achieved by creating a statistical profile of P2P users for every DNS organization; this small (around 100 KB) discovery database has to be created in advance and shipped with the P2P client.

This is the holy grail of P2P. There isn't really a magic solution - there's no way a node can discover other nodes without a well-known point to act as a reference (well, you can do so on a LAN by using broadcasting, but not on the internet). P2P filesharing tends to work by having known websites distribute 'start points' for discovery, and then further discovery (I would expect) comes from asking nodes what other nodes they know about.
A good place to start on research would be Distributed Hash Tables.
As for security, that topic will be covered in the literature somewhere, I should think - again, I would recommend Wikipedia. Non-existent nodes are trivially dealt with: if you can't contact an IP/port, don't keep it on your list, and if a node regularly provides pointers to non-existent peers, consider de-prioritising it or removing it from your list entirely.
For evil nodes, it depends on your use case, but let's say you are doing file sharing. If you request a section of a file, check with several nodes what the file section's hash should be, and then request by hash. If the evil node gives you a chunk that has a different hash, then you can again de-prioritise or forget that node.
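A minimal sketch of that check in Python; `ask_hash` and `download` are hypothetical stand-ins for whatever wire protocol you end up with:

```python
import hashlib
from collections import Counter

def fetch_verified_chunk(chunk_id, peers, ask_hash, download):
    """ask_hash(peer, chunk_id) -> hex digest a peer claims for the chunk;
    download(peer, chunk_id) -> raw bytes. Both stand in for your network layer."""
    # Ask several peers what the chunk's hash should be and take the majority view.
    claims = Counter(ask_hash(peer, chunk_id) for peer in peers)
    expected, votes = claims.most_common(1)[0]
    if votes <= len(peers) // 2:
        raise RuntimeError("no majority agreement on the chunk hash")

    suspects = []  # peers whose data didn't match; de-prioritise or drop them
    for peer in peers:
        data = download(peer, chunk_id)
        if hashlib.sha256(data).hexdigest() == expected:
            return data, suspects
        suspects.append(peer)
    raise RuntimeError("no peer returned a chunk matching the agreed hash")
```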
Distributed processing systems work a little differently: they tend to ask several unrelated nodes to perform the same work, and then they use a voting system (probably using hashing again) to determine whether evilness is at hand. If a node provides consistently bad results, the administrator is contacted or the IP is removed from the known nodes list.
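The same idea as a sketch, again with placeholder callables (`run_task`, `digest`) standing in for the job runner and hash function:

```python
from collections import Counter

def vote_on_result(task, nodes, run_task, digest):
    """Ask several unrelated nodes to perform the same task, then majority-vote
    on a digest of their results. run_task(node, task) and digest(result) are
    placeholders for your job runner and hash function."""
    results = {node: run_task(node, task) for node in nodes}
    tally = Counter(digest(result) for result in results.values())
    winning_digest, _ = tally.most_common(1)[0]
    # Nodes whose answers disagree with the majority are evilness candidates.
    bad_nodes = [n for n, r in results.items() if digest(r) != winning_digest]
    agreed = next(r for r in results.values() if digest(r) == winning_digest)
    return agreed, bad_nodes
```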

OK, for two peers to find each other, they both have to know a common mediator, so to speak, to exchange IPs once. You can use anything for this first handshake, as long as you are able to WRITE to and READ from that "channel", e.g. DNS (your well-known domains), e-mail, IRC, Twitter, Facebook, Dropbox, etc.
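As a concrete (if simplistic) illustration, here is a sketch in Python where the "channel" is a plain text file at a well-known URL; the URL is made up, and the same pattern works with DNS TXT records, an IRC topic, and so on:

```python
from urllib.request import urlopen

BOOTSTRAP_URL = "https://example.com/peers.txt"  # hypothetical well-known channel

def read_bootstrap_peers():
    # The channel holds one "host:port" entry per line.
    with urlopen(BOOTSTRAP_URL, timeout=10) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    peers = []
    for line in lines:
        host, _, port = line.strip().partition(":")
        if host and port.isdigit():
            peers.append((host, int(port)))
    return peers
```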

Related

Atomic swap in a cross compatibility zone setup

I am pretty new to Corda and I am curious whether it is possible to do a cross-compatibility-zone DvP. According to https://www.corda.net/2017/08/compatibility-and-upgrades/ it is possible to have different Corda networks in a global network.
My question addresses following use case:
Let's say I have two Corda networks (compatibility zones). Each network has its own notary, nodes, customers & KYC process, and supports a certain asset.
The first network provides, for example, a payment infrastructure and the second a securities network.
Is it possible to do that using R3 Corda, and if so, is there any example/tutorial?
Thanks in advance for any support!
The answer is yes but I think we're talking at cross-purposes :) Networks operated and governed by different entities are intended to form and operate WITHIN a compatibility zone.
The way I think it's most helpful to think of Compatibility Zones is to imagine the concept just doesn't exist... imagine there was just ONE Corda network (ie CZ) that everybody used (that was transparently/openly governed so no one firm/group of firms controlled it)... and then all the different apps and business networks existed within it... able to interoperate and transact across each other, because their nodes were compatible... they would understand and accept each other's transactions, etc.
Think about it from the perspective of a firm installing a blockchain node: getting onto any blockchain network (a Corda CZ or whatever the equivalent concept is for other platforms)... getting an identity, punching the right holes in the firewall, setting up the node infrastructure... it's analogous to the work needed to get a firm "on the internet" - setting up routers, getting IP addresses, etc, etc.
It's the kind of thing you want to do once and then reuse ruthlessly. The idea that you would have to connect to an entirely new communications network for each app your firm used would be ludicrous. And yet that's how some people seem to think blockchain deployments should be: ie for each app, you set up a separate blockchain network with its own nodes and settings and identity layer and consensus providers. But that's surely just nonsense, right?
You want to connect to a global network once and then reuse that infrastructure.
So the idea is that we try to have as few CZs as possible and encourage as many business networks as possible to form within that small number of CZs.
I know this can mess with your mind when you first hear about it because all the other enterprise blockchain platforms are going in totally the wrong direction (in my opinion..!) They seem to be encouraging the formation of a separate private network for each application. But that just seems crazy to me.
So maybe try this: even if you think I'm mad, play along with the idea for a day or so and see if it begins to grow on you :) If not, let's debate it again but I really do think this idea of multiple apps on the same overall shared network (ie multiple business networks in a single compatibility zone) is just so amazingly powerful as a concept.
So to your answer: can you do cross-app/cross-business-network DvP within a CZ? Yes! That is one of the key use-cases we invented Corda to solve... it's almost perfect for those sorts of scenarios.
Could you do it if the two apps were on different CZs? Well, yes... but it would be like asking if you could do DvP between assets managed in different databases or hosted on different blockchains... it's just messier, needing locking and 2PC and all the stuff that we can simply eliminate if we hold ourselves accountable for not creating needless balkanisation/siloed deployments by standing up standalone networks unless they're really, really needed.

Is it possible for one client to 'trust' another client in a network?

I'm considering a problem in which a node lives in a network of many nodes. Nodes come and go. Each new node is spawned from another (trusted) node (at least, I could arrange that). Data is transmitted between the nodes. No central authority exists.
Is it possible for a receiving node to know that the transmitting node is trusted?
I'm looking for all sorts of attack vectors. The server running a node could be compromised. Man-in-the-middle attacks? How does BitTorrent prevent malicious bytes entering the network (does it at all)? Could public/private-key encryption play a role?
Try looking into PGP, which uses the 'web of trust' concept. It sounds to me like that's what you're looking for.
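To sketch how public/private-key encryption could play a role here: each node that spawns a child could sign the child's public key, and a receiver walks that chain back to a key it already trusts - a simplified, linear version of a web of trust. This sketch assumes the third-party Python `cryptography` package:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)
from cryptography.exceptions import InvalidSignature

def endorse(parent_private: Ed25519PrivateKey, child_public_bytes: bytes) -> bytes:
    # The parent vouches for the child by signing its raw public key.
    return parent_private.sign(child_public_bytes)

def chain_is_trusted(root_public: Ed25519PublicKey, chain) -> bool:
    """chain: list of (public_key_bytes, signature_by_previous_key).
    Returns True if every link is endorsed by the key before it,
    starting from a root key you already trust."""
    signer = root_public
    for public_bytes, signature in chain:
        try:
            signer.verify(signature, public_bytes)
        except InvalidSignature:
            return False
        signer = Ed25519PublicKey.from_public_bytes(public_bytes)
    return True
```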

Peer-to-peer replication of a sqlite database

I am looking for a way to replicate a small and simple relational database (like SQLite) across peers. This should work in an environment with unstable network connections, hence the need for each peer to have a full copy of the database. This should allow a peer to continue working off-line in the event of network failure.
To keep things simple, replication should only have to support the replication of addition of data, i.e. only INSERTs, not DELETEs or UPDATEs.
Does anyone know of a good - and ideally cross-platform - technology or method for creating such a system? I am currently looking at JXTA and JXSE, but I am put off by its complexity and the apparent lack of life in its community after the takeover of Sun by Oracle.
Thanks!
Frans
rqlite uses the Raft consensus algorithm, so it should be fairly resilient to unstable network connections.
Also, it seems to be possible to configure rqlite to accept reads even in the case of a network failure.
A similar project, dqlite, exists as a library available in various languages, but it seems less explicit about what happens in the event of a network failure.
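For a feel of what using rqlite looks like, here is a small sketch against its HTTP data API (endpoints as documented by the rqlite project; host and port are the defaults, adjust to taste):

```python
import json
from urllib.request import Request, urlopen

RQLITE = "http://localhost:4001"  # default rqlite HTTP address

def execute(statements):
    # rqlite's /db/execute endpoint accepts a JSON array of SQL statements.
    body = json.dumps(statements).encode("utf-8")
    req = Request(RQLITE + "/db/execute", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())

execute(["CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"])
execute(["INSERT INTO notes(body) VALUES('hello from a peer')"])
```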
You may want to explore JGroups for the communication layer if you don't like JXTA. For the replication, I think you will have to implement your own code.
I am working on something similar (though the code is far from ready). I'll describe a little about my intended approach, but whether that is suitable for you depends on some key design points you'd need to consider. I am not aware of any ready-built projects that will do this, unfortunately.
In particular we'd need to know what language you wish to use, or which languages you'd rather avoid.
Also, consider how you intend to do peer discovery - can you set up trust between node pairs manually, or do you want them to auto-discover?
Presumably all peers may insert data?
If you are able to use PHP, and are happy manually peering node pairs, then my approach may be of interest. Set up an ORM such as Doctrine, Propel or NotORM, and get each node to regularly sync with an internet time source. For each new row in the database, grab the data (either in an array or an ORM object), serialise it, and push it out to all nodes that you have a trust relationship with. Where a push fails, keep a note of this and retry at periodic intervals (potentially giving up after a remote node fails to answer a large number of retries).
Pushes can either be kicked off by your application that creates the row, or can be called by whatever scheduler is available on each machine. A push message can be XML, or for simplicity can be just a POST message containing the new row and whatever metadata (e.g. timestamp of save, so as to resolve INSERT order from several nodes).
If your nodes do not have static IP addresses, they could be registered with a dynamic DNS addressing service so as to allow each node to stay in touch with peers even if their IP changes. You might also consider adding a message signing system, to ensure that messages between nodes are genuine.
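To make the push idea above concrete, here is a rough sketch (in Python rather than PHP, but the shape is the same); the peer URLs and table layout are made up:

```python
import json, sqlite3, time
from urllib.request import Request, urlopen

PEERS = ["http://peer-a.example:8080/ingest",
         "http://peer-b.example:8080/ingest"]  # manually peered, trusted nodes

def push_new_rows(db_path, last_rowid):
    """Serialise every row newer than last_rowid and POST it to each peer.
    Failed pushes are simply skipped here; a real version would queue them
    for periodic retry as described above."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT rowid, body, created_at FROM notes WHERE rowid > ?",
        (last_rowid,)).fetchall()
    for rowid, body, created_at in rows:
        message = json.dumps({"body": body, "created_at": created_at,
                              "pushed_at": time.time()}).encode("utf-8")
        for peer in PEERS:
            try:
                req = Request(peer, data=message,
                              headers={"Content-Type": "application/json"})
                urlopen(req, timeout=10)
            except OSError:
                pass  # note the failure and retry this peer later
        last_rowid = rowid
    conn.close()
    return last_rowid  # persist this so the next run resumes correctly
```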

I want to build a decentralized, reddit-like system using P2P. What existing p2p library should I base it on?

I want to build a decentralized, reddit-like system using P2P. Basically, I want to retain the basic capabilities of reddit, but make it decentralized, to make it more robust and immune to censorship. This will also allow people to develop different clients to match the way they want to browse it.
Could you recommend good p2p libraries to base my work on? They should be open-source, cross-platform, robust and easy to use. I don't care much about the language, I can adapt.
Disclaimer: warning, self-promotion here !!!
Have you considered JXTA's latest release? It is probably sufficient for what you want to do. Otherwise, we are working on a new P2P framework called Chaupal, but it is not operational yet.
EDIT
There is also what I call the quick-and-dirty UDP solution (which is not so dirty after all; I should call it minimal).
Just implement one server with a public address and start listening for UDP datagrams.
Peers located behind NATs contact the server, which can read from the received datagrams how their private IP addresses have been translated into public ones.
You send that information back to the peer, who can forward it to other peers. The server can also help exchange this information between peers.
Then peers can communicate directly (one-to-one) by sending datagrams to these translated addresses.
Simple and easy to implement, but it does not cover lost datagrams, replays, out-of-order delivery, etc. (i.e., the typical stuff that TCP solves for you).
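A minimal sketch of such a rendezvous server in Python - it reads each peer's NAT-translated address straight off the received datagram and echoes it back:

```python
import socket

def run_rendezvous(port=9999):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, addr = sock.recvfrom(1024)   # addr is the translated (public) ip:port
        reply = f"{addr[0]}:{addr[1]}".encode("utf-8")
        sock.sendto(reply, addr)           # tell the peer how the world sees it
```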
I haven't had a chance to use it, but Telehash seems to have been made for this kind of application. Peer2Peer apps have a particular challenge dealing with the restrictions of firewalls... since Telehash is based on UDP, it's well suited for hole-punching through firewalls.
EDIT for static_rtti's comment:
If code velocity is a requirement libjingle has a lot of effort going into it, but is primarily geared towards XMPP. You can port off parts of the ICE code and at least get hole-punching. See the libjingle architecture overview for details about their implementation.
Check out CouchDB. It's a decentralized web app platform that uses an HTTP API. People have used it to create "CouchApps", which are decentralized CouchDB-based applications that can spread virally to other CouchDB servers. All you need to know to write CouchApps is JavaScript and the CouchDB API. You can read this free online book to learn more: http://guide.couchdb.org
The secret sauce to CouchDB is a Master-to-Master replication protocol that lets information spread like a virus. When I attended the first CouchConf, they demonstrated how efficient this is by throwing a "Couch Party" (which is where you have a room full of people replicating to the person next to them simulating an ad hoc network).
Also, all the code that makes a CouchApp work is public by default in special entities known as Design Documents.
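To give a flavour of the replication API, here is a sketch that triggers CouchDB's master-to-master replication via its /_replicate HTTP endpoint; the remote hostname is made up:

```python
import json
from urllib.request import Request, urlopen

def replicate(couch_url, source, target):
    # POST a source/target pair to CouchDB's /_replicate endpoint.
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    req = Request(couch_url + "/_replicate", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())

# Push the local "posts" database to a friend's server, then pull theirs back.
replicate("http://localhost:5984", "posts", "http://friend.example:5984/posts")
replicate("http://localhost:5984", "http://friend.example:5984/posts", "posts")
```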
P.S. I've been thinking of doing a similar project, but I don't have a lot of time to devote to it at the moment. GOD SPEED MY BOY!

How to dispatch network requests to the (geographically) closest server

I'm a Java coder and not very familiar with how networks work (other than basic UDP/TCP connections).
Say I have servers running on machines in the US, Asia, Latin America and Europe. When a user requests a service, I want their request to go to the server closest to them.
Is it possible for me to have one address: mycompany.com, and somehow get requests routed to the appropriate server? Apparently when someone goes to cnn.com, they receive the pictures, videos, etc. from a server close to them. Frankly, I don't see how that works.
By the way, my servers don't serve web pages, they serve other services such as stock market data....just in case that is relevant.
Since I'm a programmer, I'm interested to know how one would do it in software. Since this is little more than an idle curiosity, pointers to commercial products or services won't be very helpful in understanding this problem :)
One simple approach would be to look at the first byte (the Class A octet) of the source IP address on the incoming UDP DNS request and then, based on that, deliver the right geo-located IP.
Another approach would be a little more complicated. Instead of using the server that is geographically closest to the user, you could use the server that has the lowest latency for that user.
The lower latency will provide faster transfer speeds, and it is easier to measure than geographic location.
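A rough sketch of latency-based selection: time a TCP connect to each candidate server and pick the lowest round trip. The hostnames here are made up:

```python
import socket, time

SERVERS = ["us.example.com", "eu.example.com", "asia.example.com"]

def fastest_server(port=443, attempts=3):
    best, best_rtt = None, float("inf")
    for host in SERVERS:
        rtts = []
        for _ in range(attempts):
            start = time.monotonic()
            try:
                with socket.create_connection((host, port), timeout=2):
                    rtts.append(time.monotonic() - start)
            except OSError:
                pass  # unreachable this round; skip the sample
        if rtts and min(rtts) < best_rtt:
            best, best_rtt = host, min(rtts)
    return best
```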
For a much more detailed look, check out this article on CDNs (pay attention to the Technology Section):
Content Delivery Network - Wikipedia
These are the kinds of networks that the large sites use to distribute their content over the net (Akamai is a popular example). As you can see, things can get pretty complicated pretty quickly with CDNs having their own proprietary protocols, etc...
Update: I didn't see the disclaimer about commercial solutions at the end of the original post. I'll leave this up for those who may find it of interest.
--
Take a look at http://ultradns.com/. A managed DNS service like that may be just what you need to accomplish what you are looking for.
Amazon.com, Forbes.com, Oracle, all use them...
Quote From http://ultradns.com/solutions/traffic.html:
UltraDNS Traffic Management solution provides a set of tools allowing IT administrators to define load balancing configurations for content servers residing in one or more geographic locations. The Traffic Management Solution manages traffic directed to the servers by dynamically changing the responses to DNS requests. Load balancing is performed based on dynamic metrics obtained from the host servers on a continual monitoring basis. The UltraDNS Traffic Management solution is not a single application, but combines the capabilities of several existing UltraDNS systems to control traffic, manage site failures, and optimize web content systems.
One approach is, as Jeff mentioned, using the IP address: http://en.wikipedia.org/wiki/Geolocation_software
In my experience, this is accurate to the nearest relatively large city (in the US at least). There are several open databases to aid in this (see the wiki link). Then you can generate image tags, download links and such based on this information.
As for locating the nearest server, I'm sure you can think of a few ways to do it. For instance, if the best resolution you can get is a major city, you can look that city up in a list of latitude/longitude coordinates and calculate the nearest server based on that.
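A sketch of that last step: given the user's geolocated coordinates, pick the nearest server by great-circle (haversine) distance. The server coordinates are illustrative:

```python
from math import radians, sin, cos, asin, sqrt

SERVERS = {  # (latitude, longitude), roughly one server per region
    "us-east": (40.7, -74.0), "europe": (50.1, 8.7),
    "asia": (1.35, 103.8), "latam": (-23.5, -46.6),
}

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points on Earth, in kilometres.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

def nearest_server(user_lat, user_lon):
    return min(SERVERS, key=lambda s: haversine_km(user_lat, user_lon, *SERVERS[s]))
```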
