Network Partition in RabbitMQ

I am trying to analyze how RabbitMQ's partition handling strategies (pause_minority, pause_if_all_down, autoheal) work. I reproduced a network partition in a three-node cluster on GCP, but I was unable to conclude which node will be stopped when there is network jitter between them.
I used PerfTest to create a production-like environment and iptables rules to create a partition between two nodes.
I created ten queues with a replication factor of 2 (one master and one mirror) and used the min-masters queue master locator for a uniform distribution of queue masters across nodes.
The publishing rate was 1000 msg/s (100 msg/s per queue).
The consumption rate was 1000 msg/s (100 msg/s per queue).
Please see the test results here for pause_minority.
Explanation: taking the first row, 8, 9, and 6 are the connections (before the partition) on Nodes A, B, and C respectively. I blocked traffic between Node A and Node B; as a result, Node A and Node B stopped running and their connections were transferred to Node C.
I got different results for rows 2 and 3 (please see the linked image).
Please see the test results here for pause_if_all_down.
Note: 0 connections means no publishing and no consumption.
Please see the test results here for autoheal.
For pause_minority I read this article, in which the author explains the master/slave architecture, but I was unable to reproduce the results described in the blog.
I am also attaching the link to the Google Sheet where I have shared the results of my tests in detail.
Other articles that I have read are listed below:
https://www.rabbitmq.com/partitions.html
https://docs.vmware.com/en/VMware-Tanzu-RabbitMQ-for-Kubernetes/1.2/tanzu-rmq/GUID-partitions.html
Can anyone explain to me how these partition handling strategies decide which node will be stopped in case of a partition? Is the decision based on the number of queues, the number of connections, or something else?
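For reference, the rabbitmq.com/partitions.html guide linked above describes pause_minority as a plain node-count quorum check, not a function of queues or connections: each node pauses itself as soon as it can no longer see a strict majority of the cluster members. (autoheal is the mode that does look at connections: the winning partition is the one with the most connected clients, with node count as the tie-breaker. pause_if_all_down pauses any node that cannot reach at least one node from a configured list.) A minimal sketch of the pause_minority rule, assuming a static three-node member list; this illustrates the rule only and is not RabbitMQ's implementation:

```python
# Illustrative sketch only (not RabbitMQ source): the quorum rule behind
# pause_minority. A node pauses itself when its side of the partition no
# longer contains a strict majority of the configured cluster members.

CLUSTER = ["rabbit@nodeA", "rabbit@nodeB", "rabbit@nodeC"]

def should_pause(self_node: str, reachable_nodes: set) -> bool:
    """True when this node's visible partition holds no strict majority."""
    visible = {self_node} | (reachable_nodes & set(CLUSTER))
    return len(visible) <= len(CLUSTER) // 2  # e.g. 1 of 3 -> pause

# Clean two-way split {A} vs {B, C}: A pauses, B and C keep running.
print(should_pause("rabbit@nodeA", set()))             # True
print(should_pause("rabbit@nodeB", {"rabbit@nodeC"}))  # False
```

One thing that may explain the inconsistent rows: blocking traffic only between A and B while both can still reach C is a partial partition, and the same guide warns that RabbitMQ clusters do not handle partial partitions well, so outcomes in that state are not well defined.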

Related

How to get high availability when downloading data on 2 servers?

I have data that needs to be downloaded to a local server every 24 hours. For high availability, we provisioned 2 servers to avoid failures and data loss.
My question is: What is the best approach to use the 2 servers?
My ideas are:
- Download on one server, and only if the download fails for any reason, continue the download on the other server.
- Download on both servers at the same time every day.
Any advice?
In terms of your high-level approach, break it down into manageable chunks, i.e. reliable data acquisition and highly available data dissemination. I would tackle the second part first, because that's the end state you want to reach.
Highly available data dissemination
Working backwards (i.e. this is the second part of your problem), when offering highly-available data to consumers you have two options:
Active-Passive
Active-Active
Active-Active means you have at least two nodes servicing requests for the data, with some kind of Load Balancer (LB) in front, which allocates the requests. Depending on the technology you are using there may be existing components/solutions in that tech stack, or reference models describing potential solutions.
Active-Passive means you have one node taking all the traffic, and when that becomes unavailable requests are directed at the stand-by / passive node.
The passive node can be "hot" (ready to go) or "cold", meaning it is not fully operational but is relatively fast and easy to stand up and start taking traffic.
In both cases, and if you have only 2 nodes, you ideally want both nodes to be capable of handling the entire load. That's obvious for Active-Passive, but it also applies to Active-Active, so that if one goes down the other can successfully handle all requests.
In both cases you need some kind of network component that routes the traffic. Ideally it will be able to operate autonomously (it will have to if you want Active-Active load sharing), but you could have a manual or alert-based process for switching from active to passive. Much of this will depend on your non-functional requirements.
Reliable data acquisition
Having figured out how you will disseminate the data, you know where you need to get it to.
E.g. if Active-Active, you need to get it to both at the same time (I don't know what tolerances you can have), since you want them to serve the same consistent data. One option to get around that issue is this:
Have the LB route all traffic to node A.
Node B performs the download.
The LB is informed that Node B successfully got the new data and is ready to serve it. LB then switches the traffic flow to just Node B.
Node A gets the updated data (perhaps from Node B, so the data is guaranteed to be the same).
The LB is informed that Node A successfully got the new data and is ready to serve it. LB then allows the traffic flow to Nodes A & B.
This pattern would also work for active-passive:
Node A is the active node, B is the passive node.
Node B downloads the new data, and is ready to serve it.
Node A gets updated with the new data (probably from node B), to ensure consistency.
Node A serves the new data.
You get the data on the passive node first so that if node A went down, node B would already have the new data. Admittedly the time-window for that to happen should be quite small.
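A minimal sketch of that hand-off sequence; `LoadBalancer`, `download`, and `copy_from` are hypothetical stand-ins for whatever routing and transfer mechanisms you actually use:

```python
# Sketch of the active-active refresh sequence described above.
# All names here are hypothetical placeholders, not a real LB API.

class LoadBalancer:
    def __init__(self, nodes):
        self.active = set(nodes)      # nodes currently receiving traffic

    def route_only_to(self, *nodes):
        self.active = set(nodes)

def download(node: str) -> None:
    """Placeholder: fetch the daily data set onto `node` and verify it."""
    print(f"{node}: downloading and verifying new data")

def copy_from(src: str, dst: str) -> None:
    """Placeholder: replicate the verified data set from src to dst."""
    print(f"{dst}: syncing data from {src}")

def refresh(lb: LoadBalancer, node_a: str, node_b: str) -> None:
    lb.route_only_to(node_a)          # 1. LB routes all traffic to A
    download(node_b)                  # 2. B performs the download
    lb.route_only_to(node_b)          # 3. B is ready -> serve from B only
    copy_from(node_b, node_a)         # 4. A syncs from B, so data is identical
    lb.route_only_to(node_a, node_b)  # 5. both consistent -> traffic to A & B

refresh(LoadBalancer(["node-a", "node-b"]), "node-a", "node-b")
```

The active-passive variant is the same sequence minus step 3's traffic switch: B stays passive but already holds the new data before A is updated.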

MariaDB Spider with Galera Clusters failover solutions

I am having problems building a database solution for an experiment that must ensure HA and performance (sharding).
Right now I have a Spider node and two Galera clusters (3 nodes in each cluster), as shown in the figure below, and this configuration works well in general cases:
However, as far as I know, when the Spider engine performs sharding, it must be pointed at one primary IP per shard to which it distributes SQL statements, i.e. a single node in each of the two Galera clusters.
So my first question here is:
Q1) When machine .12 shuts down due to a failure, how can I make .13 or .14 (one of them) automatically replace .12?
(Figure: the servers that the Spider engine knows about)
Q2) Are there any open-source tools (or technologies) that can help me deal with this situation? If so, please explain how they work. (Maybe MaxScale? But I don't know what it is or what it can do.)
Q3) The motivation for this experiment is as follows: an automated factory has many machines, and each machine generates data that must be recorded during the production process (perhaps hundreds or thousands of records per second) to observe the operation of the machine and maximize the quality of each batch of products.
So my question is: is this architecture (Figure 1) reasonable? Please provide your suggestions.
You could use MaxScale in front of the Galera cluster to make the individual nodes appear like a combined cluster. This way Spider will be able to seamlessly access the shard even if one of the nodes fails. You can take a look at the MaxScale tutorial for instructions on how to configure it for a Galera cluster.
Something like this should work:
This of course has the same limitation that a single database node has: if the MaxScale server goes down, you'll have to switch to a different MaxScale for that cluster. The benefit of using MaxScale is that it is in some sense stateless, which means it can be started and stopped almost instantly. A network load balancer (e.g. ELB) can already provide some protection from this problem.
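For concreteness, here is a minimal maxscale.cnf sketch along the lines of that tutorial, using the .12/.13/.14 nodes from the question; user names, passwords, and ports are placeholders:

```ini
# Minimal sketch of a per-cluster MaxScale config (placeholders throughout):
# one Galera monitor plus one routing service, as in the MaxScale tutorial.

[node1]
type=server
address=192.168.0.12
port=3306
protocol=MariaDBBackend

[node2]
type=server
address=192.168.0.13
port=3306
protocol=MariaDBBackend

[node3]
type=server
address=192.168.0.14
port=3306
protocol=MariaDBBackend

[Galera-Monitor]
type=monitor
module=galeramon           ; tracks which Galera nodes are healthy
servers=node1,node2,node3
user=maxmon
password=maxmon-pw

[Cluster-Service]
type=service
router=readwritesplit      ; routes writes to one node, reads to the rest
servers=node1,node2,node3
user=maxsvc
password=maxsvc-pw

[Cluster-Listener]
type=listener
service=Cluster-Service
protocol=MariaDBClient
port=4006                  ; point Spider's shard definition at this port
```

Spider then addresses the shard via the listener's address/port instead of .12 directly, so a failure of .12 is absorbed by MaxScale rerouting to .13 or .14.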

Mariadb galera cluster and cap theorem

Where does MariaDB Galera Cluster lie according to the CAP theorem, CP or AP, based on a brief explanation of how it works?
Consistency -- For handling the "critical read" problem, Galera needs a little help. See http://mysql.rjweb.org/doc.php/galera#critical_reads
Otherwise, one can state that Galera survives "any" single-point-of-failure.
Galera is normally deployed as 3 nodes, one in each of 3 geographic locations. That means that no single machine failure, data center failure, earthquake, tornado, network outage, etc., can take out more than one node at a time. The other two nodes (whichever two survive and still talk to each other) will declare that they "have a quorum" and continue to accept writes and deliver reads. Further, "split brain" is not possible; split brain is what keeps any dual-master attempt, even with monitoring, from surviving every SPOF.
If the third node or the network is repaired, the Cluster goes about patching up the data as needed, so that the 3 nodes again have identical data.
Granted, this is not quite the same as the definition of CAP, but it is a reasonable goal for a computer cluster.
How it works (in a tiny nutshell): each node talks to each other node, but only during the COMMIT of a transaction. (Hence, it is reasonably efficient even when spread across a WAN, as needed to survive natural disasters.) The COMMIT says to the other nodes, "I am about to do this write; is it OK?" Without actually doing the write, they check Galera's magic sauce to see if it will succeed. Once everyone says "yes", the COMMIT returns success to the client. (That gives you a hint of the "critical read" issue.)
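A toy illustration of that commit-time check (certification-based replication); this is the shape of the idea only, not Galera's actual protocol or data structures:

```python
# Toy sketch: a transaction is applied optimistically on its origin node,
# and at COMMIT its write set is certified against write sets that
# committed concurrently elsewhere. First committer wins.

from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

@dataclass
class WriteSet:
    keys: FrozenSet[str]     # rows the transaction modifies
    start_seqno: int         # cluster seqno when the transaction began

@dataclass
class Node:
    seqno: int = 0
    history: List[Tuple[int, FrozenSet[str]]] = field(default_factory=list)

    def certify(self, ws: WriteSet) -> bool:
        """OK unless a write set committed after ws began touches the same rows."""
        return not any(
            seqno > ws.start_seqno and keys & ws.keys
            for seqno, keys in self.history
        )

    def apply(self, ws: WriteSet) -> None:
        self.seqno += 1
        self.history.append((self.seqno, ws.keys))

def commit(cluster: List[Node], ws: WriteSet) -> bool:
    """COMMIT: broadcast the write set; apply everywhere only if all certify."""
    if all(node.certify(ws) for node in cluster):
        for node in cluster:
            node.apply(ws)
        return True    # success returned to the client
    return False       # conflict: the COMMIT fails, no node applies it

cluster = [Node(), Node(), Node()]
print(commit(cluster, WriteSet(frozenset({"row1"}), start_seqno=0)))  # True
print(commit(cluster, WriteSet(frozenset({"row1"}), start_seqno=0)))  # False: conflict
```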

Virtual Nodes in Dynamo

Recently I read the Dynamo paper, which describes Amazon's key/value storage system. Dynamo uses consistent hashing as its partitioning algorithm. To address load balancing and heterogeneity, it applies the "virtual node" mechanism. Here are my questions:
It is described that "the number of virtual nodes that a node is responsible for can be decided based on its capacity", but what capacity is it? Is it compute capacity, network bandwidth, or disk volume?
What is the technique used to partition a node into "virtual nodes"? Is a virtual node just a process? Or maybe a Docker container or a virtual machine?
Without going into specifics, for #1 the answer would be: all of the above. The capacity may be determined empirically for different node types after running some load testing and noting the results. A similar process to what you would use to determine the capacity of a web server.
And for your second question, the paper just says that you should think of nodes from a logical standpoint. In order to satisfy #1, positions on the ring are designated such that one or multiple virtual nodes map to the same physical hardware. So a virtual node is just a logical mapping; it is one more layer of abstraction on top of the physical layer. If you are familiar with file systems, think of a virtual node like an inode vs. a disk cylinder (a comparison that is perhaps slightly dated).
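To make the mapping concrete, here is a short sketch of consistent hashing with capacity-weighted virtual nodes; the node names and weights are made up for illustration:

```python
# Sketch of consistent hashing with capacity-weighted virtual nodes:
# a bigger machine gets more tokens on the ring, so it receives a
# proportionally larger share of keys.

import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes: dict):
        """nodes maps physical node name -> number of virtual nodes,
        chosen from measured capacity (CPU, bandwidth, disk, ...)."""
        self._tokens = sorted(
            (_hash(f"{name}#{i}"), name)
            for name, vnode_count in nodes.items()
            for i in range(vnode_count)
        )
        self._points = [t for t, _ in self._tokens]

    def owner(self, key: str) -> str:
        """First virtual node clockwise from the key's hash."""
        i = bisect.bisect(self._points, _hash(key)) % len(self._tokens)
        return self._tokens[i][1]

# A machine judged twice as capable gets twice the virtual nodes,
# and therefore roughly twice the keys.
ring = Ring({"small-node": 50, "big-node": 100})
print(ring.owner("user:42"))
```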

Which node should I push data to in a cluster?

I've setup a kafka cluster with 3 nodes.
kafka01.example.com
kafka02.example.com
kafka03.example.com
Kafka does replication, so any node in the cluster can be removed without losing data.
Normally I would send all data to kafka01, however that will break the entire cluster if that one node goes down.
What is industry best practice when dealing with clusters? I'm evaluating setting up an NGINX reverse proxy with round robin load balancing. Then I can point all data producers at the proxy and it will divvy up between the nodes.
I need to ensure that no data is lost if one of the nodes becomes unavailable.
Is an nginx reverse proxy an appropriate tool for this use case?
Is my assumption correct that a round robin reverse proxy will distribute the data and increase reliability without data loss?
Is there a different approach that I haven't considered?
Normally your producer takes care of distributing the data to all (or a selected set of) nodes that are up and running, using a partitioning function, either in round-robin mode or with semantics of your choice. The producer publishes to a partition of a topic, and different nodes are leaders for different partitions of one topic. If a broker node becomes unavailable, it falls out of the In-Sync Replicas (ISR), and new leaders are elected for the partitions it led. Through metadata requests/responses, your producer becomes aware of this and pushes messages to the nodes that are currently up.
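To make that concrete, here is a minimal producer sketch using the kafka-python client (the client choice, topic name, and ports are assumptions). The full broker list is used only for bootstrapping; after fetching metadata, the client itself routes each record to the current partition leader, which is why a reverse proxy in front of Kafka is generally unnecessary:

```python
# Minimal kafka-python producer sketch. The broker list is for
# bootstrapping only; the client routes records to partition leaders
# itself based on cluster metadata, and re-routes after failures.

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=[
        "kafka01.example.com:9092",
        "kafka02.example.com:9092",
        "kafka03.example.com:9092",
    ],
    acks="all",   # leader and in-sync replicas must confirm the write
    retries=5,    # re-send after leader elections / transient errors
)

future = producer.send("events", b"payload")   # "events" is a placeholder topic
future.get(timeout=30)                         # block until acknowledged
producer.flush()
```

With acks="all", a topic replication factor of at least 2, and an appropriate min.insync.replicas setting, an acknowledged write should survive the loss of a single broker.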
