Hybrid P2P's server data directory - dictionary

I'm developing a P2P system. The type of P2P network I'm going to use is hybrid P2P.
But there's a problem here which I don't know how to solve: the data directory.
How does the data directory work?
How does it know which nodes contain which files?

If I understand your question correctly, you are referring to indexing the resources available on your P2P network. Basically, a central peer stores a record of which resource is available on which peer.
Instead of querying all peers, a given peer can query the central peer that holds the index. This information can then be used to fetch the resource (PDF file, music, video, etc.) from the correct peer. This reduces traffic.
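As a minimal sketch of that idea (all names here are illustrative, not from any particular library), the central peer's directory can be little more than a mapping from filenames to the peers that hold them:

```python
# Sketch of a hybrid-P2P directory: the central peer maps each
# resource (here, a filename) to the set of peers that hold it.

class Directory:
    def __init__(self):
        self.index = {}  # filename -> set of peer addresses

    def register(self, peer, filenames):
        """A peer announces the files it can serve."""
        for name in filenames:
            self.index.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        """Remove a departed peer from every entry."""
        for peers in self.index.values():
            peers.discard(peer)

    def lookup(self, filename):
        """Ask the central peer instead of flooding the whole network."""
        return self.index.get(filename, set())

d = Directory()
d.register(("10.0.0.5", 9000), ["song.mp3", "paper.pdf"])
d.register(("10.0.0.7", 9000), ["song.mp3"])
```

A peer that wants `song.mp3` calls `lookup` once, then downloads directly from one of the returned peers.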

Related

How to establish a direct peer-peer connection between two computers in two distant locations through internet?

While building a Python application implementing post-processing for Quantum Key Distribution, I need a server (say, Alice) and a client (say, Bob) at distant locations to interact and exchange information while doing the required calculations simultaneously (for which threading is used).
At first, the open-source NGROK service was used as a hosting server, but it makes the time required for post-processing very long, mainly due to network congestion.
So, is there a way to establish a direct peer-to-peer connection between Alice and Bob via the internet, wherein Alice's system itself acts as the server, thereby bypassing a third-party host server? If there is a way, please suggest one; otherwise, any leads or suggestions for open-source services better than NGROK would be very helpful.
PS: If you need any additional information for providing help, I would be eager to respond.
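At its core, the direct connection asked about here is just a plain TCP socket with Alice listening and Bob connecting; the hard part in practice is making Alice reachable (port forwarding or NAT traversal), which this sketch ignores by running both ends on localhost:

```python
# Sketch of a direct TCP link: Alice listens, Bob connects, no
# third-party relay. In a real deployment Alice's router must
# forward the chosen port; NAT traversal is not handled here.
import socket
import threading

def alice(server_sock):
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)          # receive Bob's message
        conn.sendall(b"ack:" + data)    # reply directly

# For demonstration, both ends run on localhost in one process.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))           # port 0 -> OS picks a free port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=alice, args=(server,))
t.start()

bob = socket.create_connection(("127.0.0.1", port))
bob.sendall(b"hello")
reply = bob.recv(1024)
bob.close()
t.join()
server.close()
```

Over the real internet, Bob would connect to Alice's public IP and forwarded port instead of `127.0.0.1`.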

DICOM C-StoreSCP: Is there any way to know for sure that the study is received completely?

This question is a follow-up to my other question.
My SCP receives images from multiple clients.
Each client behaves differently.
Some clients send a complete study on a single association; in this case, when the association is closed, the SCP knows that the complete study has been received.
Some clients send multiple studies on the same association, which is DICOM-legal.
Some clients send one study over multiple associations, which is also DICOM-legal.
Data transfer happens over an unstable internet connection. If a study is being transferred and the connection drops for any reason, instances that were successfully stored will not be sent again; only failed/pending instances will be sent in the next attempt.
Considering all of the above, is there any DICOM way to know that a study has been received completely?
Storage Commitment is not a good solution in my understanding. Most users do not support it. Also, that feature is designed for the SCU to know whether an instance is stored on the SCP, not the other way around.
MPPS is also not reliable; please refer to the Conclusion section of my other question.
I read this post, which has a similar requirement. The timeout solution mentioned there is not reliable in my understanding.
The goal of the DICOM Storage Service is to allow simple transfer of information Instances (objects) such as images, waveforms, reports, etc. The concept of a Study is therefore not associated with this service.
Also, there is no clear-cut definition of what constitutes a "Study", and the standard does not constrain how a study is transferred from SCU to SCP (e.g., in a single association, in multiple associations, or within any time constraint).
The Storage Commitment Service also operates on Instances, and it is at the implementer's discretion when to send the Storage Commitment request: for example, each time the modality captures an image and stores it to PACS, once all images have been transferred to PACS, or when the modality needs to free up local drive space.
However, an SCU can send a study-level C-MOVE to an SCP to transfer all Instances sharing the same Study Instance UID (a study) to a destination SCP.
So, there is no definitive way to know whether a client has finished sending a study.
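Since DICOM gives no definitive completeness signal, the best an SCP can do is group what it receives by Study Instance UID and apply its own heuristic, such as the inactivity timeout mentioned (and acknowledged as unreliable) above. A minimal sketch, independent of any DICOM toolkit, with all names illustrative:

```python
# Sketch: an SCP-side tracker that groups received instances by
# Study Instance UID. Re-sent instances deduplicate by SOP Instance
# UID, matching the "only failed instances are re-sent" behaviour
# described in the question.
import time

class StudyTracker:
    def __init__(self):
        self.studies = {}  # study_uid -> {"instances": set, "last_seen": float}

    def on_instance(self, study_uid, sop_instance_uid, now=None):
        entry = self.studies.setdefault(
            study_uid, {"instances": set(), "last_seen": 0.0})
        entry["instances"].add(sop_instance_uid)  # duplicates collapse
        entry["last_seen"] = now if now is not None else time.time()

    def idle_studies(self, timeout, now=None):
        """Studies with no new instances for `timeout` seconds.
        A heuristic only -- not a DICOM-defined completeness signal."""
        now = now if now is not None else time.time()
        return [uid for uid, e in self.studies.items()
                if now - e["last_seen"] >= timeout]
```

This works across multiple associations and multiple studies per association, but "idle" can never be distinguished from "sender will reconnect later", which is exactly why the timeout approach is unreliable.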

Autodiscovery in P2P Applications

I want to create a P2P application on the internet. What is the best (or, if none exists, a good enough) way to do auto-discovery of other nodes in a decentralized network?
Grothoff and GauthierDickey from the GNUnet project (an anonymous, censorship-resistant file-sharing network) researched the question of bootstrapping a P2P network without any central hostlist.
They found that, for the Gnutella (LimeWire) network, a random IP search needed on average 2500 connection attempts to find a peer.
In the paper they proposed a method that reduced the required connection attempts to 817 for Gnutella and 51 for the eD2k network.
This was achieved by creating a statistical profile of P2P users for every DNS organization; this small (around 100 KB) discovery database has to be created in advance and shipped with the P2P client.
This is the holy grail of P2P. There isn't a magic solution really - there's no way a node can discover other nodes without a good known point to act as a reference (well, you can do so on a LAN by using broadcasting, but not on the internet). P2P filesharing tends to work by having known websites distributing 'start points' for discovery, and then further discovery (I would expect) can come from asking nodes what other nodes they know about.
A good place to start on research would be Distributed Hash Tables.
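The core idea behind a DHT can be sketched in a few lines: node IDs and keys are hashed onto one ring, and a key is owned by the first node clockwise from its hash. (This is plain consistent hashing, deliberately simpler than Chord's finger-table routing; all names are illustrative.)

```python
# Sketch of the hash-ring idea underlying DHTs: hash nodes and keys
# into the same space, assign each key to the next node clockwise.
import hashlib
from bisect import bisect

def ring_hash(value):
    """Hash a string into the ring [0, 2**32)."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:4], "big")

class Ring:
    def __init__(self, nodes):
        self.points = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key):
        """First node clockwise from the key's position on the ring."""
        hashes = [p for p, _ in self.points]
        i = bisect(hashes, ring_hash(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.node_for("song.mp3")
```

A real DHT adds routing (each node knows only a few others yet can reach any key in O(log n) hops) and replication, but the key-to-node mapping is the same idea.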
As for security, that topic will be in the literature somewhere, I should think - again I would recommend Wikipedia. Non-existent ones are trivially dealt with: if you can't contact an IP/port, don't keep it on your list, and if a node regularly provides non-existent pointers, consider de-prioritising it or removing it from your list entirely.
For evil nodes, it depends on your use case, but let's say you are doing file sharing. If you request a section of a file, check with several nodes what the file section's hash should be, and then request by hash. If the evil node gives you a chunk that has a different hash, then you can again de-prioritise or forget that node.
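The request-by-hash check described above is a one-liner with the standard library; everything else here is an illustrative name:

```python
# Sketch: verify a chunk received from a possibly evil peer against
# the hash agreed on beforehand with several other nodes.
import hashlib

def chunk_is_valid(chunk: bytes, expected_hash: str) -> bool:
    return hashlib.sha256(chunk).hexdigest() == expected_hash

# Hash obtained by asking several nodes and taking the consensus value.
expected = hashlib.sha256(b"real file data").hexdigest()

honest = chunk_is_valid(b"real file data", expected)  # keep this peer
evil = chunk_is_valid(b"poisoned data", expected)     # de-prioritise this peer
```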
Distributed processing systems work a little differently: they tend to ask several unrelated nodes to perform the same work, and then they use a voting system (probably using hashing again) to determine whether evilness is at hand. If a node provides consistently bad results, the administrator is contacted or the IP is removed from the known nodes list.
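The voting scheme for distributed processing can be sketched the same way (names are illustrative): the same job goes to several unrelated nodes, a strict majority decides the accepted result, and dissenting nodes become suspects.

```python
# Sketch: majority voting over results returned by several nodes.
from collections import Counter

def majority_result(results):
    """Return the result a strict majority agrees on, or None."""
    (value, count), = Counter(results).most_common(1)
    return value if count > len(results) // 2 else None

def suspect_nodes(results_by_node, accepted):
    """Nodes whose result disagrees with the accepted answer."""
    return [n for n, r in results_by_node.items() if r != accepted]

results = {"node-a": 42, "node-b": 42, "node-c": 99}
accepted = majority_result(list(results.values()))
bad = suspect_nodes(results, accepted)
```

A node that lands in `bad` consistently would then be reported or dropped from the known-nodes list, as described above.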
OK: for two peers to find each other, they both have to know a common mediator, so to speak, to exchange IPs once. You can use anything for this first handshake, as long as you are able to WRITE to and READ from that "channel", e.g. DNS (your well-known domains), e-mail, IRC, Twitter, Facebook, Dropbox, etc.

p2p simulation and distributed hash table

I am learning the p2p architecture through a simulation within a single machine. For that I was told that I can use named pipes. I am in the design phase of the simulation. Here is how I plan to move forward:
1. Create a peer program that is able to join the P2P 'network'. The network is a collection of such peers. Each peer is identified by its process ID on the machine.
2. When a peer is created, i.e., joins the network, it announces its arrival by bootstrapping itself to a 'master node' or tracker. When the peer announces its arrival, the master node updates its list of peers (process IDs) and returns the list of peers available in the network to the new peer.
3. Once in the network, the peer can download a file from the network or upload a file in response to an incoming request for a file that it has. Files that the peer receives through the network automatically become available for upload.
4. To download a file, the peer calls a location algorithm that locates the peer holding the file that the current peer is seeking to download.
As you can see, there is a gap in my understanding of the design. From my naive approach, I see #2 and #3 as different steps. However, I have a feeling they must somehow be related. I think my understanding is lacking in how a distributed hash table and an algorithm like Chord or CAN work. I need help rounding up these scattered ideas so I can actually implement the simulation.
To begin, my first question is: where and how do I initiate the location algorithm? Or where is the distributed hash table created in the steps listed above?
You have a choice of either implementing a tracker server, à la Napster, or implementing DHT-capable nodes, BitTorrent-style.
In the former case, the tracker keeps track of all the nodes and the files they are hosting.
In the latter, each node keeps track of the information (at least one known peer in the network is assumed in this case). A new peer then contacts one of the DHT-capable peers and gets information about the node hosting the file it requires.
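In the tracker case, the location algorithm of step 4 is simply a lookup on the tracker, and no DHT is needed. A minimal in-process sketch (hypothetical names, standing in for the named-pipe transport):

```python
# Sketch of the Napster-style tracker option: the tracker records
# which peer (identified here by a fake process id) hosts which files.
class Tracker:
    def __init__(self):
        self.peers = {}  # peer_id -> set of filenames it hosts

    def join(self, peer_id, files):
        """Step 2: a new peer bootstraps; tracker returns known peers."""
        existing = list(self.peers)
        self.peers[peer_id] = set(files)
        return existing

    def locate(self, filename):
        """Step 4: the 'location algorithm' is just a tracker lookup."""
        return [p for p, files in self.peers.items() if filename in files]

tracker = Tracker()
tracker.join(1001, ["a.txt"])
known = tracker.join(1002, ["b.txt"])  # the new peer learns about 1001
hosts = tracker.locate("a.txt")
```

In the DHT variant, `locate` would instead be a lookup routed through the peers themselves, with no central `Tracker` object.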

Is there a library that can perform packet analysis and block certain packets from being sent?

I found Jpcap; however, it only meets half of my requirements - it does not allow me to block packets, as stated in its FAQ. I would prefer a cross-platform (Windows, Mac, Linux) solution, but if one does not exist, OS-specific solutions would be acceptable.
My goal is to, under certain conditions, block access to certain Internet and network resources by finding out where the packets are going and blocking the ones that meet specific criteria, regardless of how the resource was accessed. Perhaps I'm going about this the wrong way, so any advice would be appreciated.
My goal is to ... block access to certain Internet and network resources by ... blocking [packets] that meet specific criteria, regardless of how the resource was accessed.
That's only doable in the kernel, and as such it is completely platform-specific.
There is also libpcap, but I'm not sure it will do exactly what you're looking for.
According to the SourceForge page:
libpcap is a system-independent interface for user-level packet capture. libpcap provides a portable framework for low-level network monitoring. Applications include network statistics collection, security monitoring, network debugging, etc.
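The filtering *decision* itself is easy to write at user level; it's the dropping that needs a kernel hook (netfilter, Windows Filtering Platform, pf, etc.), as noted above. A sketch of the decision, parsing a raw IPv4 header with only the standard library (addresses are documentation-range examples):

```python
# Sketch: parse an IPv4 header and decide whether the packet should
# be blocked by destination address. Actually dropping the packet
# still requires a kernel-level hook; this is only the check.
import ipaddress
import struct

BLOCKED = {ipaddress.ip_address("203.0.113.7")}

def should_block(packet: bytes) -> bool:
    # Destination address occupies bytes 16-20 of the IPv4 header.
    dst = ipaddress.ip_address(packet[16:20])
    return dst in BLOCKED

# Minimal 20-byte IPv4 header with destination 203.0.113.7.
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 20, 0, 0, 64, 6, 0,
    ipaddress.ip_address("198.51.100.1").packed,
    ipaddress.ip_address("203.0.113.7").packed,
)
```

With libpcap you would receive bytes like `header` for inspection, but to enforce the block you would feed the same criteria to the platform's firewall mechanism instead.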
