Unable to bootstrap MariaDB 10.1 Galera Cluster on Centos 7 - mariadb

I followed these instructions
https://tunnelix.com/mariadb-galera-cluster-installation/
to install MariaDB 10.1 Galera Cluster on CentOS 7.
The following is my Galera configuration in /etc/my.cnf.d/server.cnf:
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address='gcomm://192.168.0.42,192.168.0.43'
wsrep_cluster_name='galera'
wsrep_node_address='192.168.0.42'
wsrep_node_name='galera1'
wsrep_sst_method=rsync
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
For testing purposes, I disabled SELinux and FirewallD. When I bring up the first node with galera_new_cluster, I can see that only port 3306 is listening:
[root@localhost ~]# netstat -ntpl | grep sql
tcp6 0 0 :::3306 :::* LISTEN 28673/mysqld
However, port 4567 should supposedly be listening too for the clustering, as shown in the example.
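The port check can be scripted so it does not depend on spotting a line by eye. This is only a minimal sketch: `port_listening` is a hypothetical helper name, and it assumes the column layout of the `netstat -ntpl` output shown above (local address in column 4, state in column 6).

```shell
#!/bin/sh
# port_listening: hypothetical helper; reads `netstat -ntl` (or `-ntpl`)
# output on stdin and succeeds if some socket is listening on the port.
port_listening() {
    port=$1
    awk -v port="$port" '
        # Column 4 is the local address, e.g. 0.0.0.0:4567 or :::3306;
        # column 6 is the socket state.
        $6 == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

For example, `netstat -ntpl | port_listening 4567 || echo "Galera replication port is not open"` would print the message for the node described here.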
The following is the startup log content:
[root@localhost ~]# systemctl status mysql.service
● mariadb.service - MariaDB database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Fri 2017-07-14 20:44:20 +08; 8min ago
Process: 28777 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 28736 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 28734 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 28749 (mysqld)
Status: "Taking your SQL requests now..."
CGroup: /system.slice/mariadb.service
└─28749 /usr/sbin/mysqld --wsrep-new-cluster
Jul 14 20:44:19 localhost.localdomain mysqld[28749]: 2017-07-14 20:44:19 140558527281408 [Note] InnoDB: Highest supported file format is Barracuda.
Jul 14 20:44:20 localhost.localdomain mysqld[28749]: 2017-07-14 20:44:20 140558527281408 [Note] InnoDB: 128 rollback segment(s) are active.
Jul 14 20:44:20 localhost.localdomain mysqld[28749]: 2017-07-14 20:44:20 140558527281408 [Note] InnoDB: Waiting for purge to start
Jul 14 20:44:20 localhost.localdomain mysqld[28749]: 2017-07-14 20:44:20 140558527281408 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.0 started; log sequ...ber 1617718
Jul 14 20:44:20 localhost.localdomain mysqld[28749]: 2017-07-14 20:44:20 140557741455104 [Note] InnoDB: Dumping buffer pool(s) not yet started
Jul 14 20:44:20 localhost.localdomain mysqld[28749]: 2017-07-14 20:44:20 140558527281408 [Note] Plugin 'FEEDBACK' is disabled.
Jul 14 20:44:20 localhost.localdomain mysqld[28749]: 2017-07-14 20:44:20 140558527281408 [Note] Server socket created on IP: '::'.
Jul 14 20:44:20 localhost.localdomain mysqld[28749]: 2017-07-14 20:44:20 140558527281408 [Note] /usr/sbin/mysqld: ready for connections.
Jul 14 20:44:20 localhost.localdomain mysqld[28749]: Version: '10.1.25-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
Jul 14 20:44:20 localhost.localdomain systemd[1]: Started MariaDB database server.
Hint: Some lines were ellipsized, use -l to show in full.
And checking the wsrep status:
[root@localhost ~]# mysql -u root -pMyPassword --execute="SHOW STATUS LIKE 'wsrep%';"
+--------------------------+----------------------+
| Variable_name | Value |
+--------------------------+----------------------+
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 0 |
| wsrep_cluster_state_uuid | |
| wsrep_cluster_status | Disconnected |
| wsrep_connected | OFF |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 18446744073709551615 |
| wsrep_provider_name | |
| wsrep_provider_vendor | |
| wsrep_provider_version | |
| wsrep_ready | OFF |
| wsrep_thread_count | 0 |
+--------------------------+----------------------+
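The empty `wsrep_provider_name` together with `wsrep_ready` being OFF is the part of this table that matters. A small sketch of how the same check could be automated; `wsrep_check` is a hypothetical helper, and it assumes the table-style output shown above (when piping, the `mysql` client needs `--table` to print it that way).

```shell
#!/bin/sh
# wsrep_check: hypothetical helper; reads the table output of
#   SHOW STATUS LIKE "wsrep%"
# on stdin and reports whether the Galera provider looks loaded and
# whether the node is part of a Primary component.
wsrep_check() {
    awk -F'|' '
        # Table rows look like: | name | value |
        NF >= 3 {
            name = $2; value = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            status[name] = value
        }
        END {
            if (status["wsrep_provider_name"] == "") {
                print "provider not loaded"
                exit 1
            }
            if (status["wsrep_cluster_status"] != "Primary" || status["wsrep_ready"] != "ON") {
                print "provider loaded, node not ready"
                exit 2
            }
            print "node ready, cluster size " status["wsrep_cluster_size"]
        }
    '
}
```

Usage, under the same assumptions: `mysql -u root -p --table --execute="SHOW STATUS LIKE 'wsrep%';" | wsrep_check`. For the output above it would report that the provider is not loaded.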

The first node of a Galera cluster must be started with the --wsrep-new-cluster option, like this:
service mysql start --wsrep-new-cluster

Related

MariaDB server is down after reboot

I needed to check my MariaDB version but mistyped the command as mysqld -version (I installed MariaDB via apt; it was working with CLI login, but not with mysqli). The command hung, and I force-rebooted the VPS from the panel. When it came back up, I could not start MariaDB.
● mariadb.service - MariaDB 10.3.34 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2022-03-12 16:43:54 CET; 23min ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 563 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
Process: 598 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 628 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 692 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
Main PID: 692 (code=exited, status=1/FAILURE)
Status: "MariaDB server is down"
Mar 12 16:43:53 myvps systemd[1]: Starting MariaDB 10.3.34 database server...
Mar 12 16:43:53 myvps mysqld[692]: 2022-03-12 16:43:53 0 [Note] /usr/sbin/mysqld (mysqld 10.3.34-MariaDB-0ubuntu0.20.04.1) starting as process 692 ...
Mar 12 16:43:54 myvps systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Mar 12 16:43:54 myvps systemd[1]: mariadb.service: Failed with result 'exit-code'.
Mar 12 16:43:54 myvps systemd[1]: Failed to start MariaDB 10.3.34 database server
I then found this error in the log:
[ERROR] InnoDB: Missing MLOG_CHECKPOINT at 1630019 between the checkpoint 1630019 and the end 1630028.
I read this question:
https://dba.stackexchange.com/questions/163445/innodb-ignoring-the-redo-log-due-to-missing-mlog-checkpoint
and removed the InnoDB redo log files:
rm /var/lib/mysql/ib_logfile*
That worked for me.
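Deleting the redo logs throws away any changes that had not been checkpointed yet, so a more cautious variant is to move them aside rather than remove them. A minimal sketch, assuming the server is stopped first and that the data directory is /var/lib/mysql as above; `stash_redo_logs` and the backup path are hypothetical names.

```shell
#!/bin/sh
# stash_redo_logs: hypothetical helper that moves InnoDB redo logs
# (ib_logfile*) out of the data directory instead of deleting them.
# Assumes the MariaDB server is stopped before it is called.
stash_redo_logs() {
    datadir=$1
    backup=$2
    mkdir -p "$backup" || return 1
    found=0
    for f in "$datadir"/ib_logfile*; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        mv -- "$f" "$backup"/ || return 1
        found=$((found + 1))
    done
    echo "moved $found redo log file(s) to $backup"
}
```

It could be used as `systemctl stop mariadb`, then `stash_redo_logs /var/lib/mysql /root/ib_logfile.bak`, then `systemctl start mariadb`. If the server still fails to start, the files can be moved back.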

A galera node went down and started reporting error innodb as unknown or unsupported engine

One of my nodes was down, and it was also the donor for node A, so I removed the down node as donor from node A's config. This was so that node A would not have to wait for that node to come up.
As the node that originally went down was on slow storage and a slow network connection, I decided to remove it. After I removed it and restarted the cluster, even node A didn't come up; on restart it started reporting this error:
● mariadb.service - MariaDB 10.1.47 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2020-12-13 22:06:50 GMT; 12h ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 32653 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
Process: 17439 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_STA
Process: 17436 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 17432 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
Main PID: 32653 (code=exited, status=1/FAILURE)
CPU: 801ms
Dec 13 22:06:50 rockpi sh[17439]: InnoDB: http://dev.mysql.com/doc/refman/5.6/en/error-creating-innodb.html
Dec 13 22:06:50 rockpi sh[17439]: 2020-12-13 22:06:50 548039303184 [ERROR] Plugin 'InnoDB' init function returned error.
Dec 13 22:06:50 rockpi sh[17439]: 2020-12-13 22:06:50 548039303184 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Dec 13 22:06:50 rockpi sh[17439]: 2020-12-13 22:06:50 548039303184 [Note] Plugin 'FEEDBACK' is disabled.
Dec 13 22:06:50 rockpi sh[17439]: 2020-12-13 22:06:50 548039303184 [ERROR] Unknown/unsupported storage engine: innodb
Dec 13 22:06:50 rockpi sh[17439]: 2020-12-13 22:06:50 548039303184 [ERROR] Aborting'
Dec 13 22:06:50 rockpi systemd[1]: mariadb.service: Control process exited, code=exited status=1
Dec 13 22:06:50 rockpi systemd[1]: mariadb.service: Failed with result 'exit-code'.
Dec 13 22:06:50 rockpi systemd[1]: Failed to start MariaDB 10.1.47 database server.
Why did it start reporting "Unknown/unsupported storage engine: innodb" now, when it had been working all this time?
Incidentally, I used another, third node to bootstrap (with galera_new_cluster).

Can't start MariaDB after switching data directory

I'm trying to switch the data directory of MariaDB to my HDD. But if I change the datadir variable in 50-server.cnf, MariaDB won't start.
I have already adjusted the new directory with chmod and chown.
After switching the directory I get this message:
pi@raspberrypi:/etc/mysql/mariadb.conf.d $ sudo systemctl restart mariadb
Job for mariadb.service failed because the control process exited with error code.
See "systemctl status mariadb.service" and "journalctl -xe" for details.
This is the error:
● mariadb.service - MariaDB 10.1.38 database server
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2019-08-10 13:10:25 CEST; 3min 26s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 5386 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
Process: 5309 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=/usr/bin/galera_recovery; [ $? -eq 0 ] &&
Process: 5305 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 5302 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
Main PID: 5386 (code=exited, status=1/FAILURE)
Status: "Starting Innodb crash recovery"
Aug 10 13:10:20 raspberrypi systemd[1]: Starting MariaDB 10.1.38 database server...
Aug 10 13:10:22 raspberrypi mysqld[5386]: 2019-08-10 13:10:22 1996119856 [Note] /usr/sbin/mysqld (mysqld 10.1.38-MariaDB-0+deb9u1) starti
Aug 10 13:10:25 raspberrypi systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Aug 10 13:10:25 raspberrypi systemd[1]: Failed to start MariaDB 10.1.38 database server.
Aug 10 13:10:25 raspberrypi systemd[1]: mariadb.service: Unit entered failed state.
Aug 10 13:10:25 raspberrypi systemd[1]: mariadb.service: Failed with result 'exit-code'.
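One thing that chmod and chown on the new directory alone do not cover is its parent directories: the mysql user needs execute (traverse) permission on every component of the path. Whether that is the cause here is not known from the log above, so this is only a sketch for ruling it out; `show_path_perms` is a hypothetical helper and any path passed to it is an assumption.

```shell
#!/bin/sh
# show_path_perms: hypothetical helper that prints owner, group and mode
# for a directory and each of its parents, so a component the mysql
# user cannot traverse (missing "x" bit) stands out.
show_path_perms() {
    p=$1
    while :; do
        ls -ld -- "$p" || return 1
        # Stop at the filesystem root (or "." for a relative path).
        if [ "$p" = "/" ] || [ "$p" = "." ]; then
            break
        fi
        p=$(dirname "$p")
    done
}
```

For example, `show_path_perms /mnt/hdd/mysql` (a made-up mount point) lists the data directory first and `/` last.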

502 Bad Gateway and failed to read PID from file /run/nginx.pid: Invalid argument using nginx and gunicorn

I have successfully deployed nginx and gunicorn on my CentOS 7 server, but I get a 502 Bad Gateway error message. I'm using nginx/1.12.2. I have already checked the status of both gunicorn and nginx.
gunicorn status
● deepagi.service - Gunicorn instance to serve deepagi
Loaded: loaded (/etc/systemd/system/deepagi.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2017-12-02 10:49:30 UTC; 41min ago
Main PID: 1829 (gunicorn)
CGroup: /system.slice/deepagi.service
├─1829 /root/deepagi/deepagienv/bin/python2 /root/deepagi/deepagienv/bin/gunicorn --workers 3 --bind unix:deepagi.sock -m 007 wsgi
├─1834 /root/deepagi/deepagienv/bin/python2 /root/deepagi/deepagienv/bin/gunicorn --workers 3 --bind unix:deepagi.sock -m 007 wsgi
├─1839 /root/deepagi/deepagienv/bin/python2 /root/deepagi/deepagienv/bin/gunicorn --workers 3 --bind unix:deepagi.sock -m 007 wsgi
└─1840 /root/deepagi/deepagienv/bin/python2 /root/deepagi/deepagienv/bin/gunicorn --workers 3 --bind unix:deepagi.sock -m 007 wsgi
Dec 02 10:49:30 DeepAGI systemd[1]: Started Gunicorn instance to serve deepagi.
Dec 02 10:49:30 DeepAGI systemd[1]: Starting Gunicorn instance to serve deepagi...
Dec 02 10:49:30 DeepAGI gunicorn[1829]: [2017-12-02 10:49:30 +0000] [1829] [INFO] Starting gunicorn 19.7.1
Dec 02 10:49:30 DeepAGI gunicorn[1829]: [2017-12-02 10:49:30 +0000] [1829] [INFO] Listening at: unix:deepagi.sock (1829)
Dec 02 10:49:30 DeepAGI gunicorn[1829]: [2017-12-02 10:49:30 +0000] [1829] [INFO] Using worker: sync
Dec 02 10:49:30 DeepAGI gunicorn[1829]: [2017-12-02 10:49:30 +0000] [1834] [INFO] Booting worker with pid: 1834
Dec 02 10:49:30 DeepAGI gunicorn[1829]: [2017-12-02 10:49:30 +0000] [1839] [INFO] Booting worker with pid: 1839
Dec 02 10:49:30 DeepAGI gunicorn[1829]: [2017-12-02 10:49:30 +0000] [1840] [INFO] Booting worker with pid: 1840
nginx status
● nginx.service - The nginx HTTP and reverse proxy server
Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2017-12-02 11:16:18 UTC; 14min ago
Main PID: 2317 (nginx)
CGroup: /system.slice/nginx.service
├─2317 nginx: master process /usr/sbin/nginx
└─2318 nginx: worker process
Dec 02 11:16:18 DeepAGI systemd[1]: Starting The nginx HTTP and reverse proxy server...
Dec 02 11:16:18 DeepAGI nginx[2312]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
Dec 02 11:16:18 DeepAGI nginx[2312]: nginx: configuration file /etc/nginx/nginx.conf test is successful
Dec 02 11:16:18 DeepAGI systemd[1]: Failed to read PID from file /run/nginx.pid: Invalid argument
Dec 02 11:16:18 DeepAGI systemd[1]: Started The nginx HTTP and reverse proxy server.
But in the nginx status I see this error message:
Dec 02 11:16:18 DeepAGI systemd[1]: Failed to read PID from file /run/nginx.pid: Invalid argument
How can I solve this?

nginx fails to start - troubleshoot

I have a simple nginx config that is syntactically correct. I install nginx using Chef, and the Chef script works fine.
But when I check the status of nginx, I see it is in a failed state. If I reload nginx, it goes into the failed state again. journalctl -xn doesn't show much of an error either, except:
[root@localhost vagrant]# journalctl -xn
-- Logs begin at Wed 2016-10-26 04:28:18 UTC, end at Wed 2016-10-26 04:45:00 UTC. --
Oct 26 04:45:00 localhost.localdomain kill[17003]: -s, --signal <sig> send specified signal
Oct 26 04:45:00 localhost.localdomain kill[17003]: -q, --queue <sig> use sigqueue(2) rather than kill(2)
Oct 26 04:45:00 localhost.localdomain kill[17003]: -p, --pid print pids without signaling them
Oct 26 04:45:00 localhost.localdomain kill[17003]: -l, --list [=<signal>] list signal names, or convert one to a name
Oct 26 04:45:00 localhost.localdomain kill[17003]: -L, --table list signal names and numbers
Oct 26 04:45:00 localhost.localdomain kill[17003]: -h, --help display this help and exit
Oct 26 04:45:00 localhost.localdomain kill[17003]: -V, --version output version information and exit
Oct 26 04:45:00 localhost.localdomain kill[17003]: For more details see kill(1).
Oct 26 04:45:00 localhost.localdomain systemd[1]: nginx.service: control process exited, code=exited status=1
Oct 26 04:45:00 localhost.localdomain systemd[1]: Unit nginx.service entered failed state.
[root@localhost vagrant]#
nginx -t is successful and I see nothing in /var/log/nginx/errors.log.
Is there any other way to troubleshoot exactly why this fails?
systemctl status nginx.service gives:
[root@localhost vagrant]# systemctl status nginx.service
nginx.service - The nginx HTTP and reverse proxy server
Loaded: loaded (/usr/lib/systemd/system/nginx.service; static)
Active: failed (Result: exit-code) since Wed 2016-10-26 04:45:00 UTC; 9h ago
Process: 17003 ExecStop=/bin/kill -s QUIT $MAINPID (code=exited, status=1/FAILURE)
Process: 16999 ExecStart=/opt/nginx-1.10.1/sbin/nginx (code=exited, status=0/SUCCESS)
Process: 16998 ExecStartPre=/opt/nginx-1.10.1/sbin/nginx -t (code=exited, status=0/SUCCESS)
Main PID: 16999 (code=exited, status=0/SUCCESS)
Oct 26 04:45:00 localhost.localdomain kill[17003]: -s, --signal <sig> send specified signal
Oct 26 04:45:00 localhost.localdomain kill[17003]: -q, --queue <sig> use sigqueue(2) rather than kill(2)
Oct 26 04:45:00 localhost.localdomain kill[17003]: -p, --pid print pids without signaling them
Oct 26 04:45:00 localhost.localdomain kill[17003]: -l, --list [=<signal>] list signal names, or convert one to a name
Oct 26 04:45:00 localhost.localdomain kill[17003]: -L, --table list signal names and numbers
Oct 26 04:45:00 localhost.localdomain kill[17003]: -h, --help display this help and exit
Oct 26 04:45:00 localhost.localdomain kill[17003]: -V, --version output version information and exit
Oct 26 04:45:00 localhost.localdomain kill[17003]: For more details see kill(1).
Oct 26 04:45:00 localhost.localdomain systemd[1]: nginx.service: control process exited, code=exited status=1
Oct 26 04:45:00 localhost.localdomain systemd[1]: Unit nginx.service entered failed state.
systemctl cat nginx.service gives:
[root@virsinplatformapi02 sysadmin]# systemctl cat nginx.service
Unknown operation 'cat'.
So I cd to /lib/systemd/system and run cat on nginx.service:
[root@virsinplatformapi02 system]# cat nginx.service
[Unit]
Description=The nginx HTTP and reverse proxy server
After=network.target remote-fs.target nss-lookup.target
[Service]
ExecStartPre=/opt/nginx-1.10.1/sbin/nginx -t
ExecStart=/opt/nginx-1.10.1/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
[Install]
If I run echo $MAINPID, I get nothing.
That is not a very good unit file. Type is not set and defaults to simple, whereas you want forking for nginx. That may be the reason for the wrong $MAINPID value. Try the official unit:
[Unit]
Description=The NGINX HTTP and reverse proxy server
After=syslog.target network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/opt/nginx-1.10.1/sbin/nginx -t
ExecStart=/opt/nginx-1.10.1/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
[Install]
WantedBy=multi-user.target
You should just add it as /etc/systemd/system/nginx.service: that directory is intended for administrator-created units and takes priority.
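Before restarting, it can help to confirm that the file actually in place carries the two settings the fix relies on. A minimal sketch; `check_forking_unit` is a hypothetical helper name, and it only looks for `Type=forking` and a non-empty `PIDFile=` line.

```shell
#!/bin/sh
# check_forking_unit: hypothetical helper that checks a unit file for the
# two settings discussed above: Type=forking and a PIDFile= line.
# Prints what is missing and returns non-zero if anything is.
check_forking_unit() {
    unit=$1
    missing=0
    grep -q '^Type=forking[[:space:]]*$' "$unit" || { echo "missing Type=forking"; missing=1; }
    grep -q '^PIDFile=.' "$unit" || { echo "missing PIDFile="; missing=1; }
    return "$missing"
}
```

A possible sequence is `check_forking_unit /etc/systemd/system/nginx.service && systemctl daemon-reload && systemctl restart nginx`. The `PIDFile=` path also has to match the `pid` directive in nginx.conf, otherwise systemd will not find the master process.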

Resources