We have a cluster with 3 nodes and currently use the HA (High Availability) mirror policy shown below:
Since we specify ha-params as 2, does this mean mirroring to the 2 other nodes, 2 nodes in total, or all 3?
Is this the same as the following policy, where we just specify all, given that there are only 3 nodes?
HA / classic mirroring is thoroughly documented here - https://www.rabbitmq.com/ha.html
NOTE: you should be using quorum queues instead - https://www.rabbitmq.com/quorum-queues.html
Classic mirroring will be removed in RabbitMQ 4.0
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
A count value of 2 means 2 replicas: 1 queue leader and 1 queue mirror. In other words: NumberOfQueueMirrors = ha-params - 1.
For more details, see: https://www.rabbitmq.com/ha.html#mirroring-arguments
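For reference, here is a minimal sketch of setting such a policy through the RabbitMQ management HTTP API from Python. The host, credentials, policy name, and queue-name pattern are assumptions for the example; rabbitmqctl set_policy achieves the same thing.

import requests

# Assumptions for the example: management plugin on localhost:15672, default
# credentials, default vhost "/" (URL-encoded as %2F), made-up policy name.
url = "http://localhost:15672/api/policies/%2F/ha-exactly-two"
policy = {
    "pattern": "^mirrored\\.",            # queues whose names start with "mirrored."
    "apply-to": "queues",
    "definition": {
        "ha-mode": "exactly",
        "ha-params": 2,                   # 2 replicas in total: 1 leader + 1 mirror
        "ha-sync-mode": "automatic",
    },
}
resp = requests.put(url, json=policy, auth=("guest", "guest"))
resp.raise_for_status()                   # any 2xx status means the policy was accepted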
Delving deeper and with a bit more understanding, I can see that specifying ha-params as 2 means a total of 2 nodes will hold the queue, so one of the nodes won't! It's not the same as the all policy, which would mirror to all nodes and which currently makes more sense to me.
Studying Raft, there is one thing I can't understand. Say I have a cluster of 6 nodes and 3 partitions with a replication factor of 3. A network error occurs, and now 3 nodes cannot see the remaining 3, while both halves stay available to clients. A write such as SET 5 arrives at the first of the resulting sub-clusters. Will it go through, since the replication factor is 3 and the majority would be 2? Does that mean you can get split brain with the Raft protocol?
In the case of 6 nodes, the majority is 4. So if you have two partitions of three nodes, neither of those partitions will be able to elect a leader or commit new values.
When a Raft cluster is created, it is configured with a specific number of nodes, and a majority of those nodes is required either to elect a leader or to commit a log entry.
In a Raft cluster, every node has a replica of the data. I suppose we could say the replication factor is equal to the cluster size, but I don't think I've ever seen the term "replication factor" used in a consensus context.
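A tiny Python sketch of that point (illustrative only, not a real Raft implementation): the majority is computed over the configured cluster size, not over the nodes a partition can currently reach, so neither half of a 3/3 split in a 6-node cluster can commit the SET 5 write.

def majority(cluster_size):
    # votes needed to elect a leader or commit an entry
    return cluster_size // 2 + 1

CLUSTER_SIZE = 6
for reachable in (3, 3):                     # the two halves of the network split
    can_commit = reachable >= majority(CLUSTER_SIZE)
    print(f"partition of {reachable}/{CLUSTER_SIZE} nodes, "
          f"needs {majority(CLUSTER_SIZE)} votes -> can commit: {can_commit}")
# Both halves print "can commit: False": neither side accepts SET 5, so no split brain.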
A few notes on cluster size.
Traditionally, cluster size is 2*N+1, where N is the number of nodes the cluster can lose and still be operational, since the remaining nodes still form a majority able to elect a leader or commit log entries. Based on that, a cluster of 3 nodes may lose 1 node; a cluster of 5 may lose 2.
There is not much point (from a consensus point of view) in having a cluster of size 4 or 6. With 4 nodes in total, the cluster can survive only one node going offline; it cannot survive two, because the other two are not a majority and will not be able to elect a leader or agree on progress. The same logic applies to 6 nodes: that cluster can survive only two nodes going down. A cluster of 4 nodes is therefore just more expensive, since the same single-node outage can be supported with only 3 nodes, with no availability benefit.
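To see that sizing trade-off in numbers, here is a quick illustration (again just arithmetic, not Raft itself):

# A cluster of n nodes survives losing n - majority(n) = floor((n - 1) / 2) nodes.
for n in range(3, 8):
    majority = n // 2 + 1
    print(f"{n} nodes: majority {majority}, survives losing {n - majority}")
# 3 nodes: majority 2, survives losing 1
# 4 nodes: majority 3, survives losing 1   <- no better than 3 nodes
# 5 nodes: majority 3, survives losing 2
# 6 nodes: majority 4, survives losing 2   <- no better than 5 nodes
# 7 nodes: majority 4, survives losing 3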
There is a case where cluster designers do pick a cluster of size 4 or 6: when the system allows stale reads and those reads can be executed by any node in the cluster. To support a larger volume of potentially stale reads, a cluster owner adds more nodes to handle the load.
We have a Proxmox cluster with 3 nodes. Each node has 4 SSDs and 12 HDDs.
My plan is to create 2 CRUSH rules (one for SSD devices and another one for HDD devices).
With these 2 rules I will create 2 pools: one SSD pool and one HDD pool.
But in the Ceph documentation I found this: https://docs.ceph.com/en/latest/rados/operations/crush-map/#custom-crush-rules.
I am trying to understand this rule. Would this rule be more useful for my hardware?
Can somebody explain (with simple words), what this rule is doing?
Thank you so much.
The easiest way to use SSDs or HDDs in your crush rules would be these, assuming you're using replicated pools:
rule rule_ssd {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}
rule rule_hdd {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}
These rules select the desired device class (ssd or hdd) and then choose hosts within that selection; depending on your pool size (don't use size=2 except for testing purposes), they will choose that many hosts. So in this case the failure domain is "host".
The rule you refer to in the docs has its purpose in its name, "mixed_replicated_rule": it spreads the replicas across different device classes (by the way, the autoscaler doesn't work well with mixed device classes). I wouldn't really recommend it unless you have a good reason to. Stick to the easy ruleset and just use device classes, which are usually detected automatically when the drives are added.
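If it helps, here is a toy Python model of what the two rules above express. It is only an illustration under simplified assumptions (real CRUSH maps objects deterministically with pseudo-random hashing, it does not shuffle): restrict to one device class, then take at most one OSD per host, so the failure domain is the host.

import random

# Toy model of the rules above -- NOT real CRUSH. It only shows the intent.
osds = [
    {"id": 0, "host": "node1", "class": "ssd"},
    {"id": 1, "host": "node1", "class": "hdd"},
    {"id": 2, "host": "node2", "class": "ssd"},
    {"id": 3, "host": "node2", "class": "hdd"},
    {"id": 4, "host": "node3", "class": "ssd"},
    {"id": 5, "host": "node3", "class": "hdd"},
]

def place(device_class, size):
    """Pick `size` OSDs of the given class, each on a different host."""
    candidates = [o for o in osds if o["class"] == device_class]   # step take ... class X
    random.shuffle(candidates)
    by_host = {}
    for osd in candidates:
        by_host.setdefault(osd["host"], osd)        # chooseleaf ... type host
    chosen = list(by_host.values())[:size]
    if len(chosen) < size:
        raise RuntimeError("not enough hosts with that device class")
    return chosen                                   # step emit

print([o["id"] for o in place("ssd", 3)])           # 3 SSD OSDs on 3 different hosts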
I came across this interesting database the other day and have read some docs on its official site. I have some questions regarding the Raft Group in TiKV (here).
Suppose we have a cluster with, say, 100 nodes and a replication factor of 3. Does that mean we will end up with a lot of tiny Raft "bubbles", each containing only 3 members, which do leader election and log replication inside the "bubble"?
Or do we have one single fat Raft "bubble" which contains all 100 nodes?
Please help to shed some light here, thank you!
a lot of tiny Raft "bubbles", each of them contains only 3 members,
The tiny Raft bubble in your context is a Raft group in TiKV, composed of 3 replicas by default. Data is auto-sharded into Regions in TiKV, with each Region corresponding to a Raft group. To support large data volumes, Multi-Raft is implemented. So you can think of Multi-Raft as many tiny Raft "bubbles" distributed evenly across your nodes.
Check the image for Raft in TiKV here
we only have one single fat Raft "bubble" which contains 100 nodes?
No. A Raft group does not contain nodes; its replicas are contained in (hosted on) nodes.
For more details, see: What is Multi-raft in TiKV
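A rough sketch of the "many small bubbles" idea (illustrative only; in TiKV the Placement Driver decides the actual placement, and the Region count below is an arbitrary assumption):

import random

# Each Region is its own 3-member Raft group; the groups are spread across all
# nodes instead of forming one 100-node group.
NODES = [f"node-{i}" for i in range(100)]
REPLICAS_PER_REGION = 3
NUM_REGIONS = 10_000            # grows with data volume, not with node count

regions = {r: random.sample(NODES, REPLICAS_PER_REGION) for r in range(NUM_REGIONS)}

# Every node ends up hosting replicas belonging to many different Raft groups.
on_node0 = sum(1 for members in regions.values() if "node-0" in members)
print("Raft-group replicas hosted on node-0:", on_node0)   # ~ NUM_REGIONS * 3 / 100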
In this case it means you end up with many small shards ("bubbles") of 3 replicas each, spread across the 100 nodes, rather than one group spanning them all; the number of such Raft groups tracks the number of Regions (data shards), not the number of nodes.
A replication factor of 3 is quite common in distributed systems. In my experience, databases use a replication factor of 3 (in 3 different locations) as a sweet spot between durability and latency; 6 (in 3 locations) when they lean heavily towards durability; and 9 (in 3 locations) when they never, ever want to lose data. The 9-node databases are extremely stable (Paxos/Raft-based), and I have only seen them used as the configuration store for the 3-node and 6-node databases, which can use a more performant protocol (though Raft is pretty performant, too).
I have a dozen load balanced cloud servers all monitored by Munin.
I can track each one individually just fine. But I'm wondering if I can somehow bundle them up to see just how much collective CPU usage (for example) there is among the cloud cluster as a whole.
How can I do this?
The munin.conf file makes it easy enough to handle this for subdomains, but I'm not sure how to configure this for simple web nodes. Assume my web nodes are named, web_node_1 - web_node_10.
My conf looks something like this right now:
[web_node_1]
address 10.1.1.1
use_node_name yes
...
[web_node_10]
address 10.1.1.10
use_node_name yes
Your help is much appreciated.
You can achieve this with sum and stack.
I've just had to do the same thing, and I found this article pretty helpful.
Essentially you want to do something like the following:
[web_nodes;Aggregated]
update no
cpu_aggregate.update no
cpu_aggregate.graph_args --base 1000 -r --lower-limit 0 --upper-limit 200
cpu_aggregate.graph_category system
cpu_aggregate.graph_title Aggregated CPU usage
cpu_aggregate.graph_vlabel %
cpu_aggregate.graph_order system user nice idle
cpu_aggregate.graph_period second
cpu_aggregate.user.label user
cpu_aggregate.nice.label nice
cpu_aggregate.system.label system
cpu_aggregate.idle.label idle
cpu_aggregate.user.sum web_node_1:cpu.user web_node_2:cpu.user
cpu_aggregate.nice.sum web_node_1:cpu.nice web_node_2:cpu.nice
cpu_aggregate.system.sum web_node_1:cpu.system web_node_2:cpu.system
cpu_aggregate.idle.sum web_node_1:cpu.idle web_node_2:cpu.idle
There are a few other things to tweak so the graph gets the same scale, min/max, etc. as the main plugin; those can be copied from the "cpu" plugin file. The key thing here is the last four lines: that's where the summing of values from the other nodes' graphs comes in. Extend each sum line with the remaining nodes (web_node_3 through web_node_10) in the same way.
This is a question about a large, scalable P2P networking approach: a logical ring network overlay.
Consider the context of P2P networking. There are N computers, all connected to one another through a ring.
Every node has a routing table memorizing the predecessor and the successor node.
This is the simplest case when routing tables store only a predecessor and a successor.
Every node is provided with an id which is a number.
The ring is organized so that ascending numbers are assigned in the clockwise direction.
So we can have a situation like this: * - 12 - 13 - 45 - 55 - 180 - 255 - *
This network has 6 nodes and they are connected in circle.
When a node must send a message to another node, the routing tables are used: when a node receives an incoming message, it looks at the destination address and, if that address is not in its routing table, its successor or predecessor will have to route it.
Now let's consider this example.
In my simple network, node 13 wants to send a message to node 255.
Since every node can see only a predecessor and a successor, no single node can consider the global network; in P2P, in fact, a node can see only a part of the net. So node 13 has a decision to take: where should it route the message, given that the destination is not in its neighbourhood? Does the message have to be sent to 45 or to 12 (clockwise or counterclockwise)?
Well, obviously, sending to 12 is a better decision, but how is node 13 able to know this?
The simplest solution would be to always route clockwise, but in that case a node that is very close in the other direction would take a long time to reach, even though it was just around the corner.
How to handle this?
PS:
There are solutions, like finger tables, applied to clockwise-routing approaches.
A finger table puts additional addresses in the routing table in order to create jump links...
This is a solution that can be used, but with clockwise routing only...
http://en.wikipedia.org/wiki/File:Chord_route.png
I would like to know a good solution for finding the right routing direction... does one exist? How does Chord handle this?
Thank you.
If every node remembers a link to its successor, to the node 2 positions ahead, 4 ahead, 8 ahead, and so on (this is essentially Chord's finger table), then it takes only O(log n) hops to find any node. I believe this is fast enough that you don't need to worry about whether to go clockwise or counterclockwise.
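For illustration, here is a small Python sketch of that idea, a Chord-style clockwise finger lookup under simplifying assumptions: a 256-position identifier space and the six node IDs from the example above, routing node-to-node rather than to a key's successor as real Chord does.

N = 256                                          # size of the identifier space (assumption)
node_ids = [12, 13, 45, 55, 180, 255]            # the ring from the question

def cw_dist(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % N

def successor(position):
    """First node at or after `position`, going clockwise."""
    return min(node_ids, key=lambda n: cw_dist(position, n))

def fingers(node):
    """Finger i points at the successor of node + 2^i (i = 0, 1, 2, ...)."""
    return [successor((node + 2**i) % N) for i in range(N.bit_length() - 1)]

def route(src, dst):
    """Greedy clockwise routing: always jump to the finger that gets closest to
    dst without passing it. Needs O(log n) hops."""
    path, current = [src], src
    while current != dst:
        current = max(
            (f for f in fingers(current) if cw_dist(current, f) <= cw_dist(current, dst)),
            key=lambda f: cw_dist(current, f),
        )
        path.append(current)
    return path

print(route(13, 255))   # [13, 180, 255] -- no need to choose between cw and ccw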