MariaDB Galera SST using mysqldump

Background:
I have a 3-server MariaDB Galera cluster that is happily working with the mariabackup SST method, but I recently purged a lot of legacy data from its tables (storage was 370GB+, now down to 75GB).
My understanding is that a mysqldump/restore cycle could be used to shrink the on-disk database size and so avoid such large SST transfers. My thought process was therefore to drop one of the servers from the cluster and bring it back in with the mysqldump SST method to shrink its disk usage. After that, I was planning to force SST on the other nodes with the shrunken server as donor, using the mariabackup SST method again.
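For context, the SST switch on the node being rebuilt amounts to a my.cnf change roughly like this (a sketch only; the credentials and the donor name below are placeholders, not my real values):
[mysqld]
wsrep_sst_method=mysqldump
wsrep_sst_auth=backupuser:backuppassword
# mysqldump SST connects to the joiner's mysqld over TCP, so it cannot listen only on 127.0.0.1
bind-address=0.0.0.0
# later, when forcing SST on the remaining nodes from the shrunken node:
# wsrep_sst_donor=node1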
If my logic is flawed on the above please feel free to pull me up.
Problem:
After dropping one node and switching the SST method to mysqldump, the server came up and made contact with the other nodes, but the SST would fail.
With Node 2 as the SST donor, I get the logs below on that node:
2023-02-16 21:02:40 9 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
ERROR 1045 (28000): Access denied for user 'backupuser'@'192.168.2.100' (using password: YES)
ERROR 1045 (28000): Access denied for user 'backupuser'@'192.168.2.100' (using password: YES)
ERROR 1045 (28000): Access denied for user 'backupuser'@'192.168.2.100' (using password: YES)
/usr//bin/wsrep_sst_mysqldump: line 128: [: -gt: unary operator expected
ERROR 1045 (28000): Access denied for user 'backupuser'@'192.168.2.100' (using password: YES)
2023-02-16 21:02:40 9 [ERROR] WSREP: Process completed with error: wsrep_sst_mysqldump --address '192.168.1.100:3306' --port '3306' --local-port '3306' --socket '/var/lib/mysql/mysql.sock' --gtid 'a98addba-189c-11ed-887e-4258cd7b8de4:687374620' --gtid-domain-id '0' --mysqld-args --basedir=/usr: 1 (Operation not permitted)
2023-02-16 21:02:40 9 [ERROR] WSREP: Try 1/3: 'wsrep_sst_mysqldump --address '192.168.1.100:3306' --port '3306' --local-port '3306' --socket '/var/lib/mysql/mysql.sock' --gtid 'a98addba-189c-11ed-887e-4258cd7b8de4:687374620' --gtid-domain-id '0' --mysqld-args --basedir=/usr' failed: 1 (Operation not permitted)
2023-02-16 21:02:41 0 [Note] WSREP: (a897af72, 'tcp://0.0.0.0:4567') turning message relay requesting off
With Node 3 as the SST donor, I get the logs below on that node:
2023-02-16 21:24:01 8 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
mysqldump: Couldn't execute 'show events': Access denied for user 'backupuser'@'localhost' to database 'main_db' (1044)
ERROR 1064 (42000) at line 24: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'SST failed to complete' at line 1
2023-02-16 21:24:02 8 [ERROR] WSREP: Process completed with error: wsrep_sst_mysqldump --address '192.168.1.100:3306' --port '3306' --local-port '3306' --socket '/var/lib/mysql/mysql.sock' --gtid 'a98addba-189c-11ed-887e-4258cd7b8de4:687381861' --gtid-domain-id '0' --mysqld-args --basedir=/usr: 1 (Operation not permitted)
2023-02-16 21:24:02 8 [ERROR] WSREP: Try 1/3: 'wsrep_sst_mysqldump --address '192.168.1.100:3306' --port '3306' --local-port '3306' --socket '/var/lib/mysql/mysql.sock' --gtid 'a98addba-189c-11ed-887e-4258cd7b8de4:687381861' --gtid-domain-id '0' --mysqld-args --basedir=/usr' failed: 1 (Operation not permitted)
mysqldump: Couldn't execute 'show events': Access denied for user 'backupuser'@'localhost' to database 'main_db' (1044)
ERROR 1064 (42000) at line 24: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'SST failed to complete' at line 1
2023-02-16 21:24:04 8 [ERROR] WSREP: Process completed with error: wsrep_sst_mysqldump --address '192.168.1.100:3306' --port '3306' --local-port '3306' --socket '/var/lib/mysql/mysql.sock' --gtid 'a98addba-189c-11ed-887e-4258cd7b8de4:687381861' --gtid-domain-id '0' --mysqld-args --basedir=/usr: 1 (Operation not permitted)
2023-02-16 21:24:04 8 [ERROR] WSREP: Try 2/3: 'wsrep_sst_mysqldump --address '192.168.1.100:3306' --port '3306' --local-port '3306' --socket '/var/lib/mysql/mysql.sock' --gtid 'a98addba-189c-11ed-887e-4258cd7b8de4:687381861' --gtid-domain-id '0' --mysqld-args --basedir=/usr' failed: 1 (Operation not permitted)
For identification in the above:
192.168.1.100 is Node 1
192.168.2.100 is Node 2
192.168.3.100 is Node 3 (though its IP isn't listed above)
Server version: 10.3.32-MariaDB-log MariaDB Server
I have confirmed that the backup user account and password in the wsrep_sst_auth parameter can be used from any node to connect to any other node via the console (including itself). I even went as far as to load a new database onto Node 1 and explicitly add that user with full permissions before I tried to join it to the cluster again, with the same results.
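For anyone checking the same thing, the grants the wsrep_sst_mysqldump script generally needs look roughly like this (a sketch; 'backupuser'/'backuppassword' stand in for the wsrep_sst_auth credentials, and the host patterns should cover both the remote nodes and localhost):
CREATE USER IF NOT EXISTS 'backupuser'@'%' IDENTIFIED BY 'backuppassword';
CREATE USER IF NOT EXISTS 'backupuser'@'localhost' IDENTIFIED BY 'backuppassword';
GRANT ALL PRIVILEGES ON *.* TO 'backupuser'@'%' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'backupuser'@'localhost' WITH GRANT OPTION;
FLUSH PRIVILEGES;
The ERROR 1044 on 'show events' in the Node 3 log at least suggests the donor-side localhost grants are worth a second look.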
Node 1 has since been rolled back to the mariabackup SST method and has successfully rejoined the cluster, but I would like to reattempt the above to shrink the disk usage. Can anyone offer guidance on why I might be seeing these errors, and any additional options to try?

Related

Error on Starting MySQL Cluster 8.0 Data Node on Ubuntu 22.04 LTS

When I start data node 1 (10.1.1.103) of MySQL Cluster 8.0 on Ubuntu 22.04 LTS, I get the following error:
# ndbd
Failed to open /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list: No such file or directory
2023-01-02 17:16:55 [ndbd] INFO -- Angel connected to '10.1.1.102:1186'
2023-01-02 17:16:55 [ndbd] INFO -- Angel allocated nodeid: 2
When I start data node 2 (10.1.1.105), I get the following error:
# ndbd
Failed to open /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list: No such file or directory
2023-01-02 11:10:04 [ndbd] INFO -- Angel connected to '10.1.1.102:1186'
2023-01-02 11:10:04 [ndbd] ERROR -- Failed to allocate nodeid, error: 'Error: Could not alloc node id at 10.1.1.102:1186: Connection done from wrong host ip 10.1.1.105.'
The management node log file (/var/lib/mysql-cluster/ndb_1_cluster.log) reports:
2023-01-02 11:28:47 [MgmtSrvr] INFO -- Node 2: Initial start, waiting for 3 to connect, nodes [ all: 2 and 3 connected: 2 no-wait: ]
What is the relevance of failing to open: /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list: No such file or directory?
Why is data node on 10.1.1.105 unable to allocate a nodeid?
I initially installed a single Management Node on 10.1.1.102:
wget https://dev.mysql.com/get/Downloads/MySQL-Cluster-8.0/mysql-cluster_8.0.31-1ubuntu22.04_amd64.deb-bundle.tar
tar -xf mysql-cluster_8.0.31-1ubuntu22.04_amd64.deb-bundle.tar
dpkg -i mysql-cluster-community-management-server_8.0.31-1ubuntu22.04_amd64.deb
mkdir /var/lib/mysql-cluster
vi /var/lib/mysql-cluster/config.ini
The configuration set up in config.ini:
[ndbd default]
# Options affecting ndbd processes on all data nodes:
NoOfReplicas=2 # Number of replicas
[ndb_mgmd]
# Management process options:
hostname=10.1.1.102 # Hostname of the manager
datadir=/var/lib/mysql-cluster # Directory for the log files
[ndbd]
hostname=10.1.1.103 # Hostname/IP of the first data node
NodeId=2 # Node ID for this data node
datadir=/usr/local/mysql/data # Remote directory for the data files
[ndbd]
hostname=10.1.1.105 # Hostname/IP of the second data node
NodeId=3 # Node ID for this data node
datadir=/usr/local/mysql/data # Remote directory for the data files
[mysqld]
# SQL node options:
hostname=10.1.1.102 # In our case the MySQL server/client is on the same Droplet as the cluster manager
I then started and killed the running server, and created a systemd unit file for the cluster manager:
ndb_mgmd -f /var/lib/mysql-cluster/config.ini
pkill -f ndb_mgmd
vi /etc/systemd/system/ndb_mgmd.service
Adding the following configuration:
[Unit]
Description=MySQL NDB Cluster Management Server
After=network.target auditd.service
[Service]
Type=forking
ExecStart=/usr/sbin/ndb_mgmd -f /var/lib/mysql-cluster/config.ini
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
I then reloaded the systemd daemon to apply the changes, started and enabled the Cluster Manager and checked its active status:
systemctl daemon-reload
systemctl start ndb_mgmd
systemctl enable ndb_mgmd
Here is the status of the Cluster Manager:
# systemctl status ndb_mgmd
● ndb_mgmd.service - MySQL NDB Cluster Management Server
Loaded: loaded (/etc/systemd/system/ndb_mgmd.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2023-01-01 08:25:07 CST; 27min ago
Main PID: 320972 (ndb_mgmd)
Tasks: 12 (limit: 9273)
Memory: 2.5M
CPU: 35.467s
CGroup: /system.slice/ndb_mgmd.service
└─320972 /usr/sbin/ndb_mgmd -f /var/lib/mysql-cluster/config.ini
Jan 01 08:25:07 nuc systemd[1]: Starting MySQL NDB Cluster Management Server...
Jan 01 08:25:07 nuc ndb_mgmd[320971]: MySQL Cluster Management Server mysql-8.0.31 ndb-8.0.31
Jan 01 08:25:07 nuc systemd[1]: Started MySQL NDB Cluster Management Server.
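At this point the cluster state can be checked from the management host; a minimal check, assuming the ndb_mgm client tool is available there, is:
# ndb_mgm -e "SHOW"
It lists the management, data and API node slots together with the host each one is expected to connect from.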
I then set up a data node on 10.1.1.103, installing dependencies, downloading the data node and setting up its config:
apt update && apt -y install libclass-methodmaker-perl
wget https://dev.mysql.com/get/Downloads/MySQL-Cluster-8.0/mysql-cluster_8.0.31-1ubuntu22.04_amd64.deb-bundle.tar
tar -xf mysql-cluster_8.0.31-1ubuntu22.04_amd64.deb-bundle.tar
dpkg -i mysql-cluster-community-data-node_8.0.31-1ubuntu22.04_amd64.deb
vi /etc/my.cnf
I entered the address of the Cluster Management Node in the configuration:
[mysql_cluster]
# Options for NDB Cluster processes:
ndb-connectstring=10.1.1.102 # location of cluster manager
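Before starting ndbd it can be worth confirming that the management server port is reachable from the data node; a quick check (nc may need to be installed) is:
# nc -vz 10.1.1.102 1186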
I then created a data directory and started the node:
mkdir -p /usr/local/mysql/data
ndbd
This is when I got the "Failed to open" error on data node 1 (10.1.1.103):
# ndbd
Failed to open /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list: No such file or directory
2023-01-02 17:16:55 [ndbd] INFO -- Angel connected to '10.1.1.102:1186'
2023-01-02 17:16:55 [ndbd] INFO -- Angel allocated nodeid: 2
UPDATED (2023-01-02)
Thank you @MauritzSundell. I corrected the (private) IP addresses above and no longer got:
# ndbd
Failed to open /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list: No such file or directory
ERROR: Unable to connect with connect string: nodeid=0,10.1.1.2:1186
Retrying every 5 seconds. Attempts left: 12 11 10 9 8 7 6 5 4 3 2 1, failed.
2023-01-01 14:41:57 [ndbd] ERROR -- Could not connect to management server, error: ''
Also @MauritzSundell, in order to use the ndbmtd process rather than the ndbd process, does any alteration need to be made to any of the configuration files (e.g. /etc/systemd/system/ndb_mgmd.service)?
What is the appropriate reference/tutorial documentation for MySQL Cluster 8.0? Is it the "MySQL NDB Cluster 8.0" excerpt at:
https://downloads.mysql.com/docs/mysql-cluster-excerpt-8.0-en.pdf
Or is it "MySQL InnoDB Cluster" on:
https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-cluster-introduction.html
Not sure I understand the difference.

openstack-nova-api conflicts with the httpd service over port 8774

I can't use httpd and nova-api at the same time. When I use the httpd service, nova-api is dead (or inactive).
# systemctl restart openstack-nova-api
OUTPUT:
Job for openstack-nova-api.service failed because the control process exited
with error code. See "systemctl status openstack-nova-api.service" and
"journalctl -xe" for details.
I checked the log and got the following error.
LOG:ERROR nova.wsgi [-] Could not bind to 0.0.0.0:8774: error: [Errno 98] Address already in use.
CRITICAL nova [-] Unhandled error: error: [Errno 98] Address already in use.
I then tried to find which process was using port 8774.
# netstat -tunlp | grep 8774
OUTPUT:
tcp 0 0 0.0.0.0:8774 0.0.0.0:* LISTEN 61690/httpd
When I do systemctl stop httpd, then systemctl restart openstack-nova-api, then systemctl restart httpd, I get a similar error (I used RDO to install the OpenStack Train release on CentOS 7).
They can't run together.
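The netstat output shows httpd itself bound to 8774, which usually means nova-api has been deployed as a WSGI application inside Apache. A quick way to see which vhost claims the port (path assumed for a CentOS/RDO layout):
# grep -rn 8774 /etc/httpd/conf.d/
If a nova API vhost shows up there, the standalone openstack-nova-api service will always collide with it, so only one of the two should be enabled.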

Innodb cluster - add instance error - the server is not configured properly

I am trying to add an instance to my InnoDB cluster. I turned off the firewall on both hosts (local VMs).
This is the output of the cluster.addInstance('root@innodb-2:3306') command:
ERROR: Unable to start Group Replication for instance 'innodb-2:3306'.
The MySQL error_log contains the following messages:
2021-10-15 19:13:31.922188 [System] [MY-013587] Plugin group_replication reported: 'Plugin 'group_replication' is starting.'
2021-10-15 19:13:47.112664 [System] [MY-010597] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2021-10-15 19:14:13.496929 [Error] [MY-011735] Plugin group_replication reported: '[GCS] Error on opening a connection to innodb-1:33061 on local port: 33061.'
2021-10-15 19:14:23.498488 [Error] [MY-011735] Plugin group_replication reported: '[GCS] Error on opening a connection to innodb-1:33061 on local port: 33061.'
2021-10-15 19:14:33.508595 [Error] [MY-011735] Plugin group_replication reported: '[GCS] Error on opening a connection to innodb-1:33061 on local port: 33061.'
2021-10-15 19:14:43.520173 [Error] [MY-011735] Plugin group_replication reported: '[GCS] Error on opening a connection to innodb-1:33061 on local port: 33061.'
2021-10-15 19:14:47.373813 [Error] [MY-011640] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2021-10-15 19:14:47.373884 [Error] [MY-011735] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'
Cluster.addInstance: Group Replication failed to start: MySQL Error 3092 (HY000): innodb-2:3306: The server is not configured properly to be an active member of the group. Please see more details on error log. (RuntimeError)
I've done dba.configureInstance('root@innodb-2:3306') beforehand.
What version of MySQL and MySQL Shell are you using?
This seems like a connectivity issue, please ensure the allowlist is set if needed: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-innodb-cluster-securing.html#create-allowlist-servers
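Following up on that suggestion, a quick way to test raw reachability of the group replication port from the joining host (a sketch; run from innodb-2, nc may need to be installed, and 33061 is the port from the error log):
nc -vz innodb-1 33061
If that succeeds with the firewalls off, the next thing to compare is whether the allowlist on innodb-1 (group_replication_ip_allowlist, or group_replication_ip_whitelist on older releases) covers innodb-2's address, as the linked page describes.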

ERROR InnoDB: InnoDB: Unable to allocate memory of size 18446744073709544120

I have three nodes on which to set up a MariaDB cluster.
When I use /usr/sbin/mysqld --wsrep-new-cluster --user=root & to start the cluster on node1, I get the error below:
2017-08-05 17:41:36 140123776886528 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-05 17:41:36 140123777161408 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.19-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
2017-08-05 17:41:41 140123698513664 [ERROR] InnoDB: InnoDB: Unable to allocate memory of size 18446744073709544120.
2017-08-05 17:41:41 7f7117464700 InnoDB: Assertion failure in thread 140123698513664 in file ha_innodb.cc line 22407
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
170805 17:41:41 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.1.19-MariaDB
key_buffer_size=0
read_buffer_size=131072
max_used_connections=3
max_threads=10002
thread_count=6
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 21969763 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0x7f6fe0b7e008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7117463cc0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f711ca3dcde]
/usr/sbin/mysqld(handle_fatal_signal+0x2d5)[0x7f711c561005]
/lib64/libpthread.so.0(+0xf100)[0x7f711bb7b100]
/lib64/libc.so.6(gsignal+0x37)[0x7f7119ed65f7]
/lib64/libc.so.6(abort+0x148)[0x7f7119ed7ce8]
/usr/sbin/mysqld(+0x73880f)[0x7f711c6e480f]
/usr/sbin/mysqld(+0x78fbb7)[0x7f711c73bbb7]
/usr/sbin/mysqld(+0x78fcfc)[0x7f711c73bcfc]
/usr/sbin/mysqld(+0x82da1c)[0x7f711c7d9a1c]
/usr/sbin/mysqld(+0x82dc9e)[0x7f711c7d9c9e]
/usr/sbin/mysqld(+0x80ab5f)[0x7f711c7b6b5f]
/usr/sbin/mysqld(+0x8002cc)[0x7f711c7ac2cc]
/usr/sbin/mysqld(+0x7360f2)[0x7f711c6e20f2]
/usr/sbin/mysqld(_ZN7handler18index_read_idx_mapEPhjPKhm16ha_rkey_function+0x85)[0x7f711c561965]
/usr/sbin/mysqld(_ZN7handler21ha_index_read_idx_mapEPhjPKhm16ha_rkey_function+0xa6)[0x7f711c565ca6]
/usr/sbin/mysqld(+0x479524)[0x7f711c425524]
/usr/sbin/mysqld(+0x4796a0)[0x7f711c4256a0]
/usr/sbin/mysqld(+0x47edd6)[0x7f711c42add6]
/usr/sbin/mysqld(_ZN4JOIN14optimize_innerEv+0x72f)[0x7f711c432c1f]
/usr/sbin/mysqld(_ZN4JOIN8optimizeEv+0x2f)[0x7f711c43551f]
/usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x8f)[0x7f711c43565f]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x245)[0x7f711c4361c5]
/usr/sbin/mysqld(+0x4291a1)[0x7f711c3d51a1]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x5f8f)[0x7f711c3e14cf]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x352)[0x7f711c3e4e62]
/usr/sbin/mysqld(+0x439689)[0x7f711c3e5689]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1fb0)[0x7f711c3e7d10]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f711c3e8bb9]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f711c4af71a]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x7f711c4af8f0]
/usr/sbin/mysqld(+0x96f37d)[0x7f711c91b37d]
/lib64/libpthread.so.0(+0x7dc5)[0x7f711bb73dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f7119f9721d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f6fd9420020): is an invalid pointer
Connection ID (thread ID): 16
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
We think the query pointer is invalid, but we will try to print it anyway.
Query: SELECT agents.id AS agents_id, agents.agent_type AS agents_agent_type, agents.`binary` AS agents_binary, agents.topic AS agents_topic, agents.host AS agents_host, agents.availability_zone AS agents_availability_zone, agents.admin_state_up AS agents_admin_state_up, agents.created_at AS agents_created_at, agents.started_at AS agents_started_at, agents.heartbeat_timestamp AS agents_heartbeat_timestamp, agents.description AS agents_description, agents.configurations AS agents_configurations, agents.resource_versions AS agents_resource_versions, agents.`load` AS agents_load
FROM agents
WHERE agents.agent_type = 'Linux bridge agent' AND agents.host = 'ha-node2'

Unable to create MariaDB Galera Cluster

I have built an image based on mariadb:10.1 which basically adds a new cluster.conf, but I am facing the following error on the second node after the first node has started successfully. Can somebody help me debug this?
Error log tail
2016-09-28 10:12:55 139799503415232 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():162
2016-09-28 10:12:55 139799503415232 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2016-09-28 10:12:55 139799503415232 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1380: Failed to open channel 'test_cluster' at 'gcomm://172.17.0.2,172.17.0.3,172.17.0.4': -110 (Connection timed out)
2016-09-28 10:12:55 139799503415232 [ERROR] WSREP: gcs connect failed: Connection timed out
2016-09-28 10:12:55 139799503415232 [ERROR] WSREP: wsrep::connect(gcomm://172.17.0.2,172.17.0.3,172.17.0.4) failed: 7
2016-09-28 10:12:55 139799503415232 [ERROR] Aborting
MySQL init process failed.
Debugging steps taken
NOTE: Container IP addresses were ensured to be the same as shown.
To ensure networking between containers is working, I tried creating another container, which could log in to the first container's mysql instance.
This is definitely not related to MYSQL_HOST
To see if the container was running out of memory, I used docker stats and saw that the failed container was using only a meagre 142MB throughout its lifecycle until it failed, which is far less than the total memory it was allowed (~4GB).
I am using Docker for Mac, but I tried running the same setup on a CentOS VirtualBox VM with the same results, so it doesn't look like Docker for Mac is the problem.
Config
[mysqld]
user=mysql
binlog_format=ROW
bind-address=0.0.0.0
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=122M
innodb_file_per_table=1
innodb_doublewrite=1
query_cache_size=0
query_cache_type=0
wsrep_on=ON
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_sst_method=rsync
Steps to start containers
# bootstrap node
docker run --rm -e MYSQL_ROOT_PASSWORD=123 \
activatedgeek/mariadb:devel \
--wsrep-cluster-name=test_cluster \
--wsrep-cluster-address=gcomm://172.17.0.2,172.17.0.3,172.17.0.4 \
--wsrep-new-cluster
# add node into cluster
docker run --rm -e MYSQL_ROOT_PASSWORD=123 \
activatedgeek/mariadb:devel \
--wsrep-cluster-name=test_cluster \
--wsrep-cluster-address=gcomm://172.17.0.2,172.17.0.3,172.17.0.4
# add node into cluster
docker run --rm -e MYSQL_ROOT_PASSWORD=123 \
activatedgeek/mariadb:devel \
--wsrep-cluster-name=test_cluster \
--wsrep-cluster-address=gcomm://172.17.0.2,172.17.0.3,172.17.0.4
This problem is caused by the hanging init process. The configuration and CLI arguments above are correct. The only thing to be done before the init process starts is to create an empty mysql directory in the data directory (/var/lib/mysql by default). This directory must only be created on the nodes other than the bootstrap node.
mkdir -p /var/lib/mysql/mysql
See the sample MariaDB Cluster for usage; it uses a custom MariaDB image and is a proof of concept for creating clusters.
I guess your containers should either expose the required ports:
-p 3306:3306 -p 4444:4444 -p 4567:4567 -p 4568:4568
or should be --link(ed) together.
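For example, the second docker run from the question with the ports published would look roughly like this (a sketch; image name and addresses unchanged):
docker run --rm -e MYSQL_ROOT_PASSWORD=123 \
-p 3306:3306 -p 4444:4444 -p 4567:4567 -p 4568:4568 \
activatedgeek/mariadb:devel \
--wsrep-cluster-name=test_cluster \
--wsrep-cluster-address=gcomm://172.17.0.2,172.17.0.3,172.17.0.4
Note that the same host ports can only be published by one container per host, so when all the nodes run on a single host a shared user-defined Docker network (on which the containers reach each other directly on 4567/4568/4444) is usually the simpler option.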
