Graphs and nodes: how to determine connection weight and impact - networking

I have a network of objects that are connected to each other.
Each object, as usual, can have classes, structures, etc.
I am giving an example because it is difficult to explain otherwise.
A structure propagates from one object to another. For example, object-1 can be an FTP server that sends a file with a specific structure to a neighbor node, object-2. This connection is therefore characterized by a protocol (FTP) and a specific file structure.
Among other things, I am trying to determine the following:
1. How can I measure the weight of this connection? I mean, what characteristics should I choose and use, and how can I measure them?
2. When nodes are connected to other nodes, any change of a property, e.g. a file structure, impacts the operation of the next connected node. How can I determine the diffusion of a change within the network and the impact per node?
Thanks a lot
George
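One way to make the scenario concrete is to model the network as a weighted directed graph and propagate a change with a breadth-first search. This is only a sketch: the object names, the coupling weights, and the cutoff threshold below are all invented for illustration, and choosing meaningful weights is exactly the modeling question being asked.

```python
from collections import deque

# Hypothetical model: each directed edge carries a "coupling" weight in [0, 1]
# expressing how strongly a change in the source affects the target
# (e.g., how much the target depends on the shared file structure).
edges = {
    "obj1": [("obj2", 0.9)],  # obj1 sends an FTP file whose structure obj2 parses
    "obj2": [("obj3", 0.5), ("obj4", 0.2)],
    "obj3": [],
    "obj4": [],
}

def impact_of_change(source, threshold=0.05):
    """BFS from the changed node, multiplying coupling weights along paths.
    Keeps the strongest impact seen per node; prunes paths below threshold."""
    impact = {source: 1.0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor, weight in edges.get(node, []):
            propagated = impact[node] * weight
            if propagated > threshold and propagated > impact.get(neighbor, 0.0):
                impact[neighbor] = propagated
                queue.append(neighbor)
    return impact

print(impact_of_change("obj1"))
```

Here a change at obj1 reaches obj3 with strength 0.9 * 0.5 = 0.45, so the per-node impact falls off with the product of edge weights along the path.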


Is it possible to generate a fat tree from any number of nodes?

As my question title suggests, I have a confusion about the fat-tree structure.
I am trying to write a program, where I get a certain number of nodes as my input and I should generate an output that builds a fat-tree topology out of them.
For example, if my input is 4, my output must represent a fat-tree topology made of 4 nodes (n1, n2, n3, n4).
As far as I could read, a fat-tree topology depends only on the number of ports rather than the number of nodes. This is why I am confused about whether it is possible to create a fat-tree structure with the number of nodes as my only input at all.
I am very new to networking concepts, so I would appreciate any guidance.
If I understood the question, you have a certain number of nodes as input, and you want to build a FatTree topology with these nodes.
Unfortunately, you cannot create a complete FatTree topology with an arbitrary number of nodes.
If you are confused about the construction, I suggest having a look at this link.
For my master's thesis, I explored some data center topologies and their feasibility for network tomography-based monitoring applications. This resulted in a few Python models (FatTree included), implemented using the networkx library, which are available on GitHub. The code is not the prettiest, especially the visualisation parts, and could surely be improved, but I hope it can still be useful for gaining an intuition about how these topologies scale.
If you start playing around with the different scales of the FatTree, you will quickly see that Giuseppe is right. A fat-tree has a very strict structure that depends only on the port-number parameter. It is therefore indeed not possible to construct a fat-tree with an arbitrary number of nodes.
Although I'm late in answering this, and others have already given the correct answer, I'd still like to add some value with respect to FatTree topology design.
For a fat-tree topology based on k-port switches, you can derive these values using tree data structure properties and the topology requirements:
- number of core switches = (k/2)^2
- number of pods = k
- number of aggregation switches in each pod = k/2
- number of edge switches in each pod = k/2
- each aggregation switch connected to k/2 core switches and k/2 edge switches
- each edge switch connected to k/2 aggregation switches and k/2 nodes
- i.e., each pod consists of (k/2)^2 nodes
- number of nodes possible to be connected to the network = (k^3)/4
Since the number of servers that can be connected to this network is expressed in terms of k, you can now clearly see that you can't create a fat-tree topology with an arbitrary number of nodes. The node count can only take the form (k^3)/4 for even values of k (so that the counts are integers), e.g., 16 (k=4), 54 (k=6), and so on. So, in other words, you can't have a proper fat-tree topology with a random node count (one different from those listed above)!
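The formulas above can be restated as a small helper that computes the element counts for a given k. This is just a sketch restating the listed relations, not tied to any particular simulator:

```python
def fat_tree_counts(k):
    """Element counts for a fat-tree built from k-port switches (k even)."""
    assert k % 2 == 0, "a fat-tree requires an even port count k"
    return {
        "core_switches": (k // 2) ** 2,
        "pods": k,
        "aggregation_per_pod": k // 2,
        "edge_per_pod": k // 2,
        "hosts_per_pod": (k // 2) ** 2,
        "hosts_total": k ** 3 // 4,  # only these values are valid node counts
    }

for k in (4, 6, 8):
    print(k, "->", fat_tree_counts(k)["hosts_total"])
# 4 -> 16, 6 -> 54, 8 -> 128
```

Any input node count that is not of the form k^3/4 for some even k simply has no corresponding fat-tree.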

Using chord to implement a torrent network

From my understanding, in order to implement a torrent network, you have to maintain a map from the hash value of a file to the list of peers available in the network.
However, I think there are a few problems with this.
the list will change constantly, since it is updated whenever a node joins or leaves.
the list might become very long, but there can only be one version of the list in the DHT network (since it's a hash table)
You have to maintain a map from the hash value of the file to the list of peers available in the network.
That is correct, and it's the purpose of the DHT. The DHT keeps track of who is on the network and who has a certain piece of information.
The list will change constantly, since it is updated whenever a node joins or leaves.
This is also correct, but BitTorrent DHTs are built with this in mind, with protocols for joining and for removing offline peers.
The list might become very long, but there can only be one version of the list in the DHT network (since it's a hash table).
That's the point of a DHT: it's distributed, so you don't need to keep the entire table in one place. You only need a way to find the information you're looking for across the network.
To join a network you'll need a way to find some peers and to query the DHT. Usually this is done with bootstrap nodes. Once you've contacted a node, you will be able to share the information you have, discover other nodes, and get the information they have.
Each DHT has its own algorithms, you can find some details for the Chord DHT here.
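To illustrate the map being discussed, here is a toy sketch collapsed onto a single process for clarity. A real DHT partitions this table across nodes by key distance and expires stale peers; the function names and addresses are invented.

```python
import hashlib

# Toy version of the mapping a BitTorrent DHT maintains:
# infohash -> set of (ip, port) peers that have the file.
peers_by_infohash = {}

def infohash(data: bytes) -> str:
    """Key the content by its hash, as the DHT does."""
    return hashlib.sha1(data).hexdigest()

def announce(key: str, peer):
    """A peer announces that it has the file identified by `key`."""
    peers_by_infohash.setdefault(key, set()).add(peer)

def depart(key: str, peer):
    """Remove a peer that left the swarm (real DHTs also time out stale entries)."""
    peers_by_infohash.get(key, set()).discard(peer)

key = infohash(b"some torrent metainfo")
announce(key, ("10.0.0.1", 6881))
announce(key, ("10.0.0.2", 6881))
depart(key, ("10.0.0.1", 6881))
print(peers_by_infohash[key])  # {('10.0.0.2', 6881)}
```

The churn the question worries about is handled exactly by this announce/depart cycle: the list is never globally consistent, only eventually close to correct.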

Options to achieve consensus in an immutable distributed hash table

I'm implementing a completely decentralized database. Anyone at any moment can upload any type of data to it. One good solution that fits this problem is an immutable distributed hash table. Values are keyed by their hash. Immutability ensures this map always remains valid, simplifies data integrity checking, and avoids synchronization.
To provide some data retrieval facilities, a tag-based classification will be implemented. Any key (associated with a single unique value) can be tagged with an arbitrary tag (an arbitrary sequence of bytes). To keep things simple I want to use the same distributed hash table to store this tag-hash index.
To implement this database I need some way to maintain a decentralized consensus of what is the actual and valid tag-hash index. Immutability forces me to use some kind of linked data structure. How can I find the root? How to synchronize entry additions? How to make sure there is a single shared root for everybody?
In a distributed hash table you can have the nodes structured in a ring, where each node in the ring knows about at least one other node (to keep the ring connected). To make the ring more fault-tolerant, make sure that each node has knowledge of more than one other node, so that the ring can stay connected if some node crashes. In DHT terminology, this is called a "successor list". When the nodes are structured in the ring with unique IDs and some stabilization protocol, you can do key lookups by routing through the ring to find the node responsible for a certain key.
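As a sketch of the ring lookup just described (node names are invented; a real Chord node would route via successor lists and finger tables rather than seeing the whole ring at once):

```python
import bisect
import hashlib

def node_id(name: str, bits: int = 16) -> int:
    """Hash a name onto a 2**bits identifier circle, Chord-style."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)

# Global view of the ring for illustration only; in a real DHT each node
# knows just a few other nodes and routes lookups hop by hop.
ring = sorted(node_id(n) for n in ("alice", "bob", "carol", "dave"))

def successor(key: int) -> int:
    """The node responsible for `key`: the first node clockwise from it."""
    i = bisect.bisect_left(ring, key)
    return ring[i % len(ring)]  # wrap around the circle

key = node_id("some-value")
print("key", key, "is stored on node", successor(key))
```

The "node responsible for a certain key" is simply the first node ID clockwise from the key's position on the circle, which is why a stable, correctly linked ring is all that is needed for consistent lookups without replication.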
How to synchronize entry additions?
If you don't want replication, a weak version of decentralized consensus is enough: each node has a unique ID and knows about the ring structure. This can be achieved by a periodic stabilization protocol, like in Chord: http://nms.lcs.mit.edu/papers/chord.pdf
In the stabilization protocol, each node communicates with its successor periodically to see whether it is still the true successor in the ring, or whether a new node has joined in between, or the successor has crashed and the ring must be updated. Since no replication is used, consistent insertions only require that the ring is stable, so that peers can route an insertion to the correct node, which inserts it into its storage. Each item is held by only a single node in a DHT without replication.
This stabilization procedure gives you a very good probability that the ring will always be stable and minimizes inconsistency, but it cannot guarantee strong consistency: there may be windows where the ring is temporarily unstable as nodes join or leave. During these inconsistency periods, data loss, duplication, overwrites, etc. could happen.
If your application requires strong consistency, a DHT is not the best architecture; it would be very complex to implement that kind of consistency in a DHT. First of all you'll need replication, and you'll also need to add a lot of ACKs and synchronization to the stabilization protocol, for instance running a 2PC or Paxos protocol for each insertion to ensure that every replica got the new value.
How can I find the root?
How to make sure there is a single shared root for everybody?
Typically DHTs are associated with some (centralized) lookup service that contains the IPs/IDs of nodes, and new nodes register at this service. The service can then also ensure that each new node gets a unique ID. Since this service only manages IDs and simple lookups, it is not under any high load or risk of crashing, so it is "OK" to have it centralized without hurting fault tolerance. Of course, you could distribute the lookup service as well, synchronizing its replicas with a consensus protocol like Paxos.

How to define topology in Castalia-3.2 for WBAN

How can I define a topology in Castalia-3.2 for WBAN?
How can I import a topology from OMNeT++ into Castalia?
Where is the topology defined in the default WBAN scenario in Castalia?
With regards,
thanks
The topology of a network is an abstraction that shows the structure of the communication links in the network. It's an abstraction because the notion of a link is itself an abstraction. There are no "real" links in a wireless network. Communication happens in a broadcast medium, and many parameters dictate whether a packet is received or not, such as the transmission power, the path loss between transmitter and receiver, noise and interference, and also just luck. Still, the notion of a link can be useful in some circumstances, and some simulators use it to define simulation scenarios. You might be used to simulators where you can draw nodes and then simply draw lines between them to define their links. This is not how Castalia models a network.
Castalia does not model links between the nodes, it models the channel and radios to get a more realistic communication behaviour.
Topology is often confused with deployment (I confuse them myself sometimes). Deployment is just the placement of nodes on the field. There are multiple ways to define deployment in Castalia, if you wish, but it is not needed in all scenarios (more on this later). People can confuse deployment with topology, because under very simplistic assumptions certain deployments lead to certain topologies. Castalia does not make these assumptions. Study the manual (especially chapter 4) to get a better understanding of Castalia's modeling.
After you have understood the modeling in Castalia, and you still want a specific/custom topology for some reason then you could play with some parameters to achieve your topology at least in a statistical sense. Assuming all nodes use the same radios and the same transmission power, then the path loss between nodes becomes a defining factor of the "quality" of the link between the nodes. In Castalia, you can define the path losses for each and every pair of nodes, using a pathloss map file.
SN.wirelessChannel.pathLossMapFile = "../Parameters/WirelessChannel/BANmodels/pathLossMap.txt"
This tells Castalia to use the specific path losses found in the file instead of computing path losses based on a wireless channel model. The deployment does not matter in this case. At least it does not matter for communication purposes (it might matter for other aspects of the simulation, for example if we are sampling a physical process that depends on location).
In our own simulations with BAN, we have defined a pathloss map based on experimental data, because other available models are not very accurate for BAN. For example, the log-normal shadowing model, which is Castalia's default, is not a good fit for BAN simulations. We did not want to enforce a specific topology; we just wanted a realistic channel model, and defining a pathloss map based on experimental data was the best way.
I have the impression though that when you say topology, you are not only referring to which nodes could communicate with which nodes, but which nodes do communicate with which nodes. This is also a matter of the layers above the radio (MAC and routing). For example it's the MAC and Routing that allow for relay nodes or not.
Note that in Castalia's current implementations of 802.15.6 MAC and 802.15.4 MAC, relay nodes are not allowed. So you cannot create a mesh topology with these default implementations; only a star topology is supported. If you want something more, you'll have to implement it yourself.

Who can explain the 'Replication' section in the Dynamo paper?

In dynamo paper : http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
The Replication section said:
To account for node failures, preference list contains more than N nodes.
I want to know why? And does this 'node' mean a virtual node?
It is for increasing Dynamo's availability. If the top N nodes in the preference list are healthy, the other nodes will not be used. But if some of those N nodes are unavailable, the nodes further down the list will be used instead. For write operations, this is called hinted handoff.
The diagram makes sense both for physical nodes and virtual nodes.
I also don't understand the part you're talking about.
Background:
My understanding of the paper is that, since Dynamo's default replication factor is 3, each node N is responsible for the ring range from N-3 to N (while also being the coordinator for the ring range N-1 to N).
That explains why:
node B holds keys from F to B
node C holds keys from G to C
node D holds keys from A to D
And since range A-B falls within all those ranges, nodes B, C and D are the ones that have that range of key hashes.
The paper states in Section 4.3, Replication:
To address this, the preference list for a key is constructed by skipping positions in the ring to ensure that the list contains only distinct physical nodes.
How can the preference list contain more than N nodes if it is constructed by skipping virtual ones?
IMHO they should have stated something like this:
To account for node failures, the ring range N-3 to N may contain more than N nodes: N physical nodes plus x virtual nodes.
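To make the skipping rule concrete, here is a toy sketch of preference-list construction over a ring of virtual nodes (positions and names are invented). Walking clockwise from the key, duplicates of a physical node are skipped, and more than N distinct physical nodes are collected so the extras can serve as spares when nodes fail:

```python
# Each entry is (ring_position, virtual_node, physical_node), sorted by position.
# Physical nodes B and C each own two virtual nodes on the ring.
ring = [
    (10, "B1", "B"), (25, "C1", "C"), (40, "B2", "B"),
    (55, "D1", "D"), (70, "A1", "A"), (85, "C2", "C"),
]

def preference_list(key_pos, n=2, extra=1):
    """Collect n + extra distinct physical nodes clockwise from key_pos."""
    start = next((i for i, (pos, _, _) in enumerate(ring) if pos >= key_pos), 0)
    seen, result = set(), []
    for i in range(len(ring)):
        _, vnode, pnode = ring[(start + i) % len(ring)]
        if pnode not in seen:  # skip further virtual nodes of a physical node
            seen.add(pnode)
            result.append(pnode)
        if len(result) == n + extra:
            break
    return result

print(preference_list(30, n=2, extra=1))  # ['B', 'D', 'A']
```

A key hashing to position 30 meets virtual node B2 first, then D1 and A1; C's virtual nodes never enter the list, and the third entry ("extra") exists purely as a fallback if B or D fails.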
The distributed DBMS Dynamo DB falls into the class that sacrifices Consistency. Refer to the image below:
So, the system is inconsistent even though it is highly available. Because network partitions are a given in distributed systems, you cannot not pick Partition Tolerance.
Addressing your questions:
To account for node failures, the preference list contains more than N nodes. I want to know why?
One fact of large-scale distributed systems is that in a system of thousands of nodes, failure of nodes is the norm.
You are bound to have a few nodes failing in such a big system. You don't treat it as an exceptional condition; you prepare for such situations. How do you prepare?
For Data: You simply replicate your data on multiple nodes.
For Execution: You perform the same execution on multiple nodes. This is called speculative execution. As soon as you get the first result from the multiple executions you ran, you cancel the other executions.
That's the answer right there - you replicate your data to prepare for the case when node(s) may fail.
To account for node failures, the preference list contains more than N nodes. Does this 'node' mean a virtual node?
I wanted to ensure that I always have access to my house, so I copied my house keys and gave them to another family member of mine. He put those keys in a safe in our house. Now when we all go out, I'm under the illusion that we have another set of keys, so in case I lose mine, we can still get into the house. But... those keys are in the house itself. Losing my keys simply means I lose access to my house. This is what would happen if we replicated the data on virtual nodes instead of physical nodes.
A virtual node is not a separate physical node, so when the physical node that a virtual node is mapped to fails, the virtual node goes away as well.
This 'node' cannot mean a virtual node if the aim is high availability, which is the aim in Dynamo DB.
