Consistent hashing: Where is the data-structure of ring kept - hashtable

We have N cache-nodes with basic consistent-hashing in a ring.
Questions:
Is data-structure of this ring stored:
On each of these nodes?
Partly on each node with its ranges?
On a separate machine as a load balancer?
What happens to the ring when other nodes join it?
Thanks a lot.

I have found an answer to the question No 1.
Answer 1:
All the approaches are written in my blog:
http://ivoroshilin.com/2013/07/15/distributed-caching-under-consistent-hashing/
There are a few options on where to keep ring’s data structure:
Central point of coordination: A dedicated machine keeps a ring and works as a central load-balancer which routes request to appropriate nodes.
Pros: Very simple implementation. This would be a good fit for not a dynamic system having small number of nodes and/or data.
Cons: A big drawback of this approach is scalability and reliability. Stable distributed systems don’t have a single poing of failure.
No central point of coordination – full duplication: Each node keeps a full copy of the ring. Applicable for stable networks. This option is used e.g. in Amazon Dynamo.
Pros: Queries are routed in one hop directly to the appropriate cache-server.
Cons: Join/Leave of a server from the ring requires notification/amendment of all cache-servers in the ring.
No central point of coordination – partial duplication: Each node keeps a partial copy of the ring. This option is direct implementation of CHORD algorithm. In terms of DHT each cache-machine has its predessesor and successor and when receiving a query one checks if it has the key or not. If there’s no such a key on that machine, a mapping function is used to determine which of its neighbors (successor and predessesor) has the least distance to that key. Then it forwards the query to its neighbor thas has the least distance. The process continues until a current cache-machine finds the key and sends it back.
Pros: For highly dynamic changes the previous option is not a fit due to heavy overhead of gossiping among nodes. Thus this option is the choice in this case.
Cons: No direct routing of messages. The complexity of routing a message to the destination node in a ring is O(lg N).

Related

P2P Network Bootstrapping

For P2P networks, I know that some networks have initial bootstrap nodes. However, one would assume that with all new nodes learning of peers from said bootstrap nodes, the network would have difficulty adding new peers and would end up with many cliques - unbalanced, for lack of a better word.
Are there any methods to prevent this from occuring? I'm aware that some DHTs structure their routing tables to be a bit less susceptible to this, but I would think the problem would still hold.
To clarify, I'm asking about what sort of peer mixing algorithms exist/are commonly used for peer to peer networks.
However, one would assume that with all new nodes learning of peers from said bootstrap nodes, the network would have difficulty adding new peers and would end up with many cliques - unbalanced, for lack of a better word.
If the bootstrap nodes were the only source of peers and no further mixing occured that might be an issue. But in practice bootstrap nodes only exist for bootstrapping (possibly only ever once) and then other peer discovery mechanisms take over.
Natural mixing caused by connection churn should be enough to randomize graphs over time, but proactive measures such as a globally agreed-on mixing algorithm to drop certain neighbors in favor of others can speed up that process.
I'm aware that some DHTs structure their routing tables to be a bit less susceptible to this, but I would think the problem would still hold.
The local buckets in kademlia should provide an exhaustive view of the neighborhood, medium-distance buckets will cover different parts of the keyspace for different nodes and the farthest buckets will preferentially contain long-lived nodes which should have a good view of the network.
This doesn't leave much room for clique-formation.

Options to achieve consensus in an immutable distributed hash table

I'm implementing a completely decentralized database. Anyone at any moment can upload any type of data to it. One good solution that fits on this problem is an immutable distributed hash table. Values are keyed with their hash. Immutability ensures this map remains always valid, simplifies data integrity checking, and avoids synchronization.
To provide some data retrieval facilities a tag-based classification will be implemented. Any key (associated with a single unique value) can be tagged with arbitrary tag (an arbitrary sequence of bytes). To keep things simple I want to use same distributed hash table to store this tag-hash index.
To implement this database I need some way to maintain a decentralized consensus of what is the actual and valid tag-hash index. Immutability forces me to use some kind of linked data structure. How can I find the root? How to synchronize entry additions? How to make sure there is a single shared root for everybody?
In a distributed hash table you can have the nodes structured in a ring, where each node in the ring knows about at least one other node in the ring (to keep it connected). To make the ring more fault-tolerant make sure that each node has knowledge about more than one other node in the ring, so that it is able to still connect if some node crashes. In DHT terminology, this is called a "sucessor list". When the nodes are structured in the ring with unique IDs and some stabilization-protocol, you can do key lookups by routing through the ring to find the node responsible for a certain key.
How to synchronize entry additions?
If you don't want replication, a weak version of decentralized consensus is enough and that is that each node has its unique ID and that they know about the ring structure, this can be achieved by a periodic stabilization protocol, like in Chord: http://nms.lcs.mit.edu/papers/chord.pdf
The stabilization protocol has each node communicating with its successor periodically to see if it is the true successor in the ring or if a new node has joined in-between in the ring or the sucessor has crashed and the ring must be updated. Since no replication is used, to do consistent insertions it is enough that the ring is stable so that peers can route the insertion to the correct node that inserts it in its storage. Each item is only held by a single node in a DHT without replication.
This stabilization procedure can give you very good probability that the ring will always be stable and that you minimize inconsistency, but it cannot guarantee strong consistency, there might be gaps where the ring is temporary unstable when nodes joins or leaves. During the inconsistency periods, data loss, duplication, overwrites etc could happen.
If your application requires strong consistency, DHT is not the best architecture, it will be very complex to implement that kind of consistency in a DHT. First of all you'll need replication and you'll also need to add a lot of ACK and synchronity in the stabilization protocol, for instance using a 2PC protocol or paxos protocol for each insertion to ensure that each replica got the new value.
How can I find the root?
How to make sure there is a single shared root for everybody?
Typically DHTs are associated with some lookup-service (centralized) that contains IPs/IDs of nodes and new nodes registers at the service. This service can then also ensure that each new node gets a unique ID. Since this service only manages IDs and simple lookups it is not under any high load or risk of crashing so it is "OK" to have it centralized without hurting fault-tolerance, but of course you could distribute the lookup service as well, and sycnhronizing them with a consensus protocol like Paxos.

How to define topology in Castalia-3.2 for WBAN

How can defined topology in Castalia-3.2 for WBAN ?
How can import topology in omnet++ to casalia ?
where the topology defined in default WBAN scenario in Castalia?
with regard
thanks
Topology of a network is an abstraction that shows the structure of the communication links in the network. It's an abstraction because the notion of a link is an abstraction itself. There are no "real" links in a wireless network. The communication is happening in a broadcast medium and there are many parameters that dictate if a packet is received or not, such as the power of transmission, the path loss between transmitter and receiver, noise and interference, and also just luck. Still, the notion of a link could be useful in some circumstances, and some simulators are using it to define simulation scenarios. You might be used to simulators that you can draw nodes and then simply draw lines between them to define their links. This is not how Castalia models a network.
Castalia does not model links between the nodes, it models the channel and radios to get a more realistic communication behaviour.
Topology is often confused with deployment (I confuse them myself sometimes). Deployment is just the placement of nodes on the field. There are multiple ways to define deployment in Castalia, if you wish, but it is not needed in all scenarios (more on this later). People can confuse deployment with topology, because under very simplistic assumptions certain deployments lead to certain topologies. Castalia does not make these assumptions. Study the manual (especially chapter 4) to get a better understanding of Castalia's modeling.
After you have understood the modeling in Castalia, and you still want a specific/custom topology for some reason then you could play with some parameters to achieve your topology at least in a statistical sense. Assuming all nodes use the same radios and the same transmission power, then the path loss between nodes becomes a defining factor of the "quality" of the link between the nodes. In Castalia, you can define the path losses for each and every pair of nodes, using a pathloss map file.
SN.wirelessChannel.pathLossMapFile = "../Parameters/WirelessChannel/BANmodels/pathLossMap.txt"
This tells Castalia to use the specific path losses found in the file instead of computing path losses based on a wireless channel model. The deployment does not matter in this case. At least it does not matter for communication purposes (it might matter for other aspects of the simulation, for example if we are sampling a physical process that depends on location).
In our own simulations with BAN, we have defined a pathloss map based on experimental data, because other available models are not very accurate for BAN. For example the, lognormal shadowing model, which is Castalia's default, is not a good fit for BAN simulations. We did not want to enforce a specific topology, we just wanted a realistic channel model, and defining a pathloss map based on experimental data was the best way.
I have the impression though that when you say topology, you are not only referring to which nodes could communicate with which nodes, but which nodes do communicate with which nodes. This is also a matter of the layers above the radio (MAC and routing). For example it's the MAC and Routing that allow for relay nodes or not.
Note that in Castalia's current implementations of 802.15.6MAC and 802.15.4MAC, relay nodes are not allowed. So you can not create a mesh topology with these default implementations. Only a star topology is supported. If you want something more you'll have to implemented yourself.

Running Riak on Heterogeneous cluster

From what I've read, Riak treats all nodes in a cluster equal. However we'd like to have a heterogeneous cluster, where cpu/mem/hd is not always equal - in fact they can be very different. Each node would of course meet minimal requirements that's required for a node.
Questions:
1) What is the consequence of creating a cluster composed of machines with wildly varying specifications (cpu, diskspace, disk speed, amount and speed of memory, network speed) and
2) Can the cluster detect and compensate for such differences automatically? (assuming no here)
3) Are there ways to take care of this problem in other ways? Think of: prioritizing nodes in load balancer, based on the hardware. Something else?
I'll answer your questions, however, operating Riak in this manner is strongly discouraged as Riak assumes identical capabilities among nodes.
You could possibly have wildly varying performance characteristics
for operations against your node. In general, the "weakest node" in
the system could affect operations throughout the cluster. For
instance, during the PUT phase of an operation, a replica of the
data could be routed to the weakest node and the duration of that
operation could affect the entire PUT operation based on the PUT
operation's quorum value.
No, the cluster assumes identical hardware.
There really is no way to compensate for this.

What is an FFP machine?

In R. Kent Dybvig's paper "Three Implementation Models for Scheme" he speaks of "FFP languages" and "FFP machines". Apparently there is some correlation between FFP machines, and string-reduction on multiple processors.
Googling doesn't really uncover much in terms of explanations or examples.
Can anyone shed some light on this topic?
Thanks.
Kent Dybvig's advisor, Gyula A. Mago, published a detailed description in "The FFP Machine: Technical Report 87-014" in 1987 by Mago and Stanat.
As of this writing, the PDF is freely available at:
http://www.cs.unc.edu/techreports/87-014.pdf
The FFP Machine is a very fine-grained parallel computer architecture:
each processor holds a single symbol / atom / value.
It uses a string reduction model of computation in which
innermost function applications are found and replaced by their
equivalent result (eager evaluation).
Where a result is used in several places, it tends to be re-evaluated
instead of incurring the costs of accessing some global store
(but see Mago's paper on "Copying Operands vs Copying Results", or better yet Mago's "Data Sharing in an FFP Machine" in the 1982 Functional Programming Languages and Computer Architecture conference).
The L cells holding the FFP expression being reduced
communicate through a tree structured arrangement of T cells.
Note that IC's are basically two dimensional and with wiring,
circuits can move towards being three dimensional in physical space.
Interconnection networks that occupy higher dimensions
(such as the Hypercube, Omega, Banyan, Star, etc. networks)
will eventually be unable to perform near their theoretical limit.
This communication network is circuit-switched rather than being packet-switched.
Data packets contain no addresses and do not need routing.
Packets from distinct reductions cannot meet, cannot conflict
and cannot experience congestion with each other.
The configuring activity (called "Partitioning") is performed
in a single sweep upwards in the tree, using a handful
of logic operations on 3-bit messages, leaving "area machines" in its wake,
each created to advance at most a single reducible application.
While it is technically logarithmic in time,
the resulting area machines can begin communicating
in a pipelined fashion behind the partitioning wave,
practically costing a constant time penalty.
(The dismantling of area machines remains a logarithmic cost in time).
Packets within a single reduction should, and must, meet
and thus provide a often-useful synchronization.
Sequences of packets are sorted and combined as they rise
within an area, to be broadcast from the root of the area machine.
Parallel Prefix and Parallel Suffix operations are provided
to reduce area traffic, since there remains a potential bottleneck
within an individual reducible application.
This is accomplished without the need exhibited in
the Ultracomputer (Jack (Jacob?) Schwartz at NYU)
for a separate logarithmic-sized cache memory in each
communication node.
Each T cell (internal tree node) only needs a FIFO buffer
(for efficiency) of size greater than the pipeline path to
the top of the tree and back down.
(This latter is a conjecture of mine, but it seems reasonable).
Since the tree maintains the left-to-right order of data
(unlike some other combining networks), the system enables cells
to rotate their data in logarithmic rather than linear time,
avoiding the plausible congestion at the root of the area machine.
It's worth noting again, that the parallelism within an area
machine is independent of the simultaneous parallelism in other
area machines, and has available to it a number of processors
proportional to the quantity of data in the operand.
Have you come across this yet?: Compiling APL for parallel execution on an FFP machine
Formal FP. Similar to FP, but with regular sugarless syntax, for machine execution is all I can offer you.
See Wikis Fp page.

Resources