ProxySQL: how to configure a failover? - mariadb

How can I configure ProxySQL for failover, independent of read-only status? The 'weight' parameter in mysql_servers does not work properly in this case. I have some nodes (MariaDB 10.3) with master-master replication, and if node1 goes offline, node2 should execute the statements. When node1 comes back, it should become the first server again.
With my setup (all three servers in one hostgroup, differing only in 'weight'), failover works, but with up to 9 seconds of delay for the first statements after failover, and when node1 comes back, traffic stays on node2.
What is the best way to configure failover with a fixed prioritization? Should it be done in the hostgroup or in the rules? Or is it perhaps not possible with ProxySQL?
PS: Maybe a similar question: ProxySQL active-standby setup
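For reference, a minimal sketch of the weight-based setup described above, entered through the ProxySQL admin interface (hostnames node1..node3 and the port are assumptions):

-- all three servers in one hostgroup, prioritized only by weight
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (0, 'node1', 3306, 1000000),
       (0, 'node2', 3306, 1000),
       (0, 'node3', 3306, 1);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

Note that weight biases a weighted-random choice among ONLINE servers rather than enforcing a strict primary/backup order, which is consistent with the behaviour described above.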

Related

Using endpoints of AWS ElastiCache for Redis

I am using AWS ElastiCache for Redis as the caching solution for my Spring Boot application, with spring-boot-starter-data-redis and the Jedis client to connect to the cache.
Imagine that my cache runs with cluster mode enabled and has 3 shards with 2 nodes each. I agree that the best way to connect is to use the configuration endpoint. Alternatively, I can list the endpoints of all nodes and get the job done.
However, even if I use a single node's endpoint from one of the shards, my caching solution works. That doesn't look right to me: even if it works, I feel it might cause problems in the cluster in the long run, when there are altogether 6 nodes partitioned into 3 shards but only one node's endpoint is used. I have the following questions.
Does using one node's endpoint create an imbalance in the cluster?
or
Is that handled automatically by AWS ElastiCache for Redis?
If I use only one node's endpoint, does that mean the other nodes will never be used?
Thank you!
To answer your questions:
Does using one node's endpoint create an imbalance in the cluster?
No.
Is that handled automatically by AWS ElastiCache for Redis?
Somewhat.
If I use only one node's endpoint, does that mean the other nodes will never be used?
No. All nodes are being used.
This is how Cluster Mode Enabled works. In your case, you have 3 shards, meaning all your hash slots (where key-value data is stored) are divided among 3 sub-clusters, i.e. shards.
This was explained in this answer as well - https://stackoverflow.com/a/72058580/6024431
So, essentially, your nodes are smart enough to redirect your requests to the node that holds the hash slot where your data needs to be stored. So, no imbalances: Redis handles the redirection for you.
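As an illustration (a sketch with a placeholder endpoint and key), this redirection can be observed with redis-cli:

# -c enables cluster mode, so redis-cli follows MOVED redirects to the owning shard
redis-cli -c -h <node-endpoint> -p 6379 SET somekey somevalue
# without -c, a node that does not own the key's hash slot replies with a MOVED error
redis-cli -h <node-endpoint> -p 6379 GET somekey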
Now, while using node endpoints, you're going to face other problems.
ElastiCache runs in the cloud (essentially on AWS hardware), and all hardware faces issues. You have 3 primaries (1p, 2p, 3p) and 3 replicas (1r, 2r, 3r).
So, if a primary (let's say 1p) goes down due to a hardware issue, its replica (1r) will be promoted to become the new primary for that shard.
The problem would then be that your application is connected directly to 1p, which has now been demoted to replica, so all WRITE operations will fail.
And you will have to change the application code manually whenever this happens.
Alternatively, if you were using the configuration endpoint (or other cluster-level endpoints) instead of node endpoints, this issue would be at most a blip for your application, perhaps for 1-2 seconds.
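As a sketch of the recommended setup with spring-boot-starter-data-redis (the endpoint is a placeholder; property names assume Spring Boot 2.x):

# application.properties: point the client at the cluster's configuration endpoint
spring.redis.cluster.nodes=<configuration-endpoint>:6379
# allow the cluster client to follow a few MOVED redirects
spring.redis.cluster.max-redirects=3

With this, the client discovers the full cluster topology at startup instead of being pinned to a single node.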
Cheers!

MariaDB Spider with Galera Clusters failover solutions

I am having problems building a database solution for an experiment that must ensure HA and performance (sharding).
Right now I have a Spider node and two Galera clusters (3 nodes in each cluster), as shown in the figure below, and this configuration works well in general cases:
However, as far as I know, when the Spider engine performs sharding, it must be configured with the IP of one primary node in each Galera cluster, to which it distributes the SQL statements.
So my first question here is:
Q1): When machine .12 goes down due to a failure, how can I make .13 or .14 (one of them) automatically take over for .12?
[Figure: the servers that the Spider engine knows about]
Q2): Are there any open-source tools (or technologies) that can help me deal with this situation? If so, please explain how they work. (Maybe MaxScale? But I don't know what it is or what it can do.)
Q3): The motivation for this experiment is as follows. An automated factory has many machines, and each machine generates data that must be recorded during the production process (perhaps hundreds or thousands of records per second) in order to monitor the operation of the machines and maximize the quality of each batch of products.
So my question is: is this architecture (Figure 1) suitable? Or please provide your suggestions.
You could use MaxScale in front of the Galera cluster to make the individual nodes appear like a combined cluster. This way Spider will be able to seamlessly access the shard even if one of the nodes fails. You can take a look at the MaxScale tutorial for instructions on how to configure it for a Galera cluster.
Something like this should work:
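A minimal sketch of such a maxscale.cnf for one of the Galera clusters (addresses, credentials, and the listener port are placeholders, following the layout of the MaxScale Galera tutorial):

[server1]
type=server
address=10.0.0.12
port=3306

[server2]
type=server
address=10.0.0.13
port=3306

[server3]
type=server
address=10.0.0.14
port=3306

# galeramon tracks cluster membership and designates one node as the master
[Galera-Monitor]
type=monitor
module=galeramon
servers=server1,server2,server3
user=maxuser
password=maxpwd

# route all connections to the currently designated node
[Galera-Service]
type=service
router=readconnroute
router_options=master
servers=server1,server2,server3
user=maxuser
password=maxpwd

# Spider connects to this listener instead of to an individual node
[Galera-Listener]
type=listener
service=Galera-Service
protocol=MariaDBClient
port=4006

Spider would then point at the MaxScale listener rather than directly at .12, so a failed node is replaced transparently.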
This of course has the same limitation that a single database node has: if the MaxScale server goes down, you'll have to switch to a different MaxScale for that cluster. The benefit of using MaxScale is that it is in some sense stateless, which means it can be started and stopped almost instantly. A network load balancer (e.g. ELB) can already provide some form of protection from this problem.

How to prioritize nodes in Pacemaker/Corosync?

I have followed a guide and created a Nginx HA Cluster with a floating IP.
(Nginx, corosync, pacemaker are being used)
The guide I followed:
https://dzone.com/articles/how-to-configure-nginx-high-availability-cluster-u
I successfully created a 2-node cluster and both nodes are working fine.
When Node1 goes offline, Node2 is used, and vice versa.
My problem is that in my case Node1 should be the primary, meaning it should always be used whenever it is online.
To describe it better:
Node1 and Node2 are online -> Node1 is being used
Node1 goes offline -> Node2 is being used automatically
(The problem) When Node1 comes back online, Node2 is still being used
I need to manually stop Node2 if I want Node1 to be used again.
What do I exactly need to configure to make it automatically switch to Node1 when it is online?
Thank you in advance!
This is easily done via a simple infinity location constraint. In crmsh syntax this would appear like so:
location l_webserver_on_node1 hakase_balancing inf: node1
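If the cluster is managed with pcs rather than crmsh, an equivalent constraint (keeping the resource name hakase_balancing from above; the node name is an assumption) would be:

pcs constraint location hakase_balancing prefers node1=INFINITY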
With that said, this does not adhere to best practices. In a well-designed HA cluster both nodes should be equal, and it should not matter where the services run.
I have seen situations where there are intermittent problems with node1. For example, say node1 seems to crash and reboot about once a day. This means that twice a day your service will encounter a brief interruption as it migrates to node2 and then back to node1 when it finishes rebooting. Ideally it should only migrate to node2 once when node1 crashes the first time. Then stay there while you troubleshoot and repair/replace node1.

Mariadb Galera cluster does not come up after killing the mysql process

I have a MariaDB Galera cluster with 2 nodes and it is up and running.
Before moving to production, I want to make sure that if a node crashes abruptly, it comes back up on its own.
I tried using systemd "restart", but after killing the mysql process the mariadb service does not come up. Is there any tool or method I can use to automatically bring the nodes back up after a crash?
Galera clusters need to have quorum (a minimum of 3 nodes).
In order to avoid a split-brain condition, the minimum recommended number of nodes in a cluster is 3. Blocking state transfer is yet another reason to require a minimum of 3 nodes in order to enjoy service availability in case one of the members fails and needs to be restarted. While two of the members will be engaged in state transfer, the remaining member(s) will be able to keep on serving client requests.
You can read more here.
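On the systemd side, a sketch of a drop-in override that restarts MariaDB after an abnormal exit (assuming the stock mariadb.service unit):

# /etc/systemd/system/mariadb.service.d/override.conf
[Service]
Restart=on-failure
RestartSec=5s

After running systemctl daemon-reload, systemd will restart the service when the process is killed. As noted above, though, the restarted node can only rejoin if the remaining nodes still hold quorum.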

How to use nServiceBus in a failover cluster

We're using nServiceBus in our development environment, where we have a frontend publishing messages to a service (subscriber). Life is good.
FrontendWebServer -> MiddlewareServer
In our production environment, we'll be running two frontends and two middleware servers for failover.
FrontendWebServer -> LoadBalancer(F5) -> MiddlewareServer
FrontendWebServer -> LoadBalancer(F5) -> MiddlewareServer
This works well for URLs, but because we need to use machine names for MSMQ we're stuck.
We don't want to specify a physical middleware machine name in each frontend config (because it makes managing configs harder, and if one middleware server goes down, it will also stop messages from its particular frontend).
We tried to use the nServiceBus distributor (installed on each frontend), but it seems that a subscriber can only listen to one distributor.
Any ideas how we can get around this problem without using separate configs?
I would push the F5 up in front of the web servers to balance that load. For the cluster, just reference the clustered server name and services, not the individual machines. For example, if you have Node1 and Node2, you might call the cluster NSBNode or something like that.
If you make that cluster the Distributor, you can then add multiple worker nodes behind it to load balance further. Again, in this case, reference the cluster queue names (queue@ClusterName).
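As a sketch, assuming the classic NServiceBus UnicastBusConfig section in app.config and the hypothetical cluster name NSBNode from above (MyMessages and middlewareService are placeholders):

<!-- point the endpoint mappings at the clustered queues, not at Node1/Node2 -->
<UnicastBusConfig DistributorControlAddress="distributorControlBus@NSBNode"
                  DistributorDataAddress="distributorDataBus@NSBNode">
  <MessageEndpointMappings>
    <add Messages="MyMessages" Endpoint="middlewareService@NSBNode" />
  </MessageEndpointMappings>
</UnicastBusConfig>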
