Galera first node won't start - MariaDB

I've been trying to set up a Galera Cluster. Since I'm new to Linux I used the guide from MariaDB (Link). I set everything up as described there, but the first node just won't start when I use the command "service mysql start --wsrep-new-cluster". I always get the error:
Failed to open channel 'cluster1' at 'gcomm://10.1.0.11,10.1.0.12,10.1.0.13': -110 (Connection timed out)
My config file on all three nodes looks like this:
#mysql settings
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
#galera settings
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="cluster1"
wsrep_cluster_address="gcomm://10.1.0.11,10.1.0.12,10.1.0.13"
wsrep_sst_method=rsync

Change the MySQL config (remove the IP addresses from gcomm://) at start-up of the 1st cluster node, or start the cluster with --wsrep_cluster_address="gcomm://"; that should do the trick.
Then you can add those IP addresses back into the config, so that the current 1st node can rejoin the running cluster.
I haven't looked into it deeply, but it looks like the "--wsrep-new-cluster" option is not handled correctly: the 1st node is still looking for live nodes, so you must temporarily remove all cluster members (all IPs in the cluster_address field) on the 1st node.
Start all other nodes normally.
Newer OS versions use "bootstrap" instead of "--wsrep-new-cluster".
My versions: Debian 9.4.0, MariaDB 10.1.26, Galera 25.3.19-2.
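Putting that together, a minimal sketch of the sequence (commands as used in the question and this answer; adjust paths and service names for your distribution):

# Node 1 only: bootstrap with an empty member list, either by editing
# wsrep_cluster_address in the config or passing it on the command line
service mysql start --wsrep-new-cluster --wsrep_cluster_address="gcomm://"

# Nodes 2 and 3: start normally; they join via the addresses in their config
service mysql start

# Afterwards, restore the full list in node 1's config so it can rejoin
# the running cluster after a later restart:
#   wsrep_cluster_address="gcomm://10.1.0.11,10.1.0.12,10.1.0.13"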

Related

Galera /var/lib/mysql/grastate.dat missing after upgrade

I am upgrading my Galera MariaDB cluster from MariaDB 10.3 to MariaDB 10.6. I am trying to start a new cluster with sudo /usr/bin/galera_new_cluster and I see an error:
[ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1.
I have seen references to setting the variable safe_to_bootstrap to 1 in /var/lib/mysql/grastate.dat.
However, there is no grastate.dat file in that directory on any of my cluster nodes. This is one of my QA servers. Any pointers on whether I can create the file from scratch?
Thanks
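For reference, grastate.dat is a small plain-text file; a sketch of one with the flag set (the uuid below is a placeholder, and creating the file by hand on the node you believe is most up to date is my assumption, not a documented recovery procedure):

# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 1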

Configure slave replication for MariaDB Galera

MariaDB 10.2.10 + CentOS 7.
I have configured a MariaDB Galera Cluster with HAProxy, and tested it successfully.
For backup, I wanted to add an async replication slave to the Galera cluster, but it failed.
Here is what I did:
After all the Galera cluster setup was done, I added the configuration below under the [mysqld] section of each Galera node's /etc/my.cnf.d/server.cnf:
[mysqld]
log_bin
log_slave_updates
gtid_strict_mode
server_id=1
[galera]
wsrep_gtid_mode
and added the configuration below under the [mysqld] section of the slave node's /etc/my.cnf.d/server.cnf:
[mysqld]
binlog_format=ROW
log_bin
log_slave_updates
server_id=2
gtid_strict_mode
Later I created a user for replication, did a mysqldump on one Galera node, and imported it on the slave node.
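Those steps might have looked roughly like this (a sketch: repl_user/repl_password come from the question, while the '%' host pattern and the --master-data/--gtid mysqldump flags are my assumptions to adapt):

# On one Galera node: create the replication account
mysql -u root -p -e "CREATE USER 'repl_user'@'%' IDENTIFIED BY 'repl_password'; GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';"

# Dump from that node; MariaDB's --gtid makes --master-data record the
# GTID position, so the slave can start from current_pos
mysqldump -u root -p --all-databases --master-data=2 --gtid > galera.sql

# Import on the slave
mysql -u root -p < galera.sql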
Then I ran on the slave:
stop slave;
change master to
  master_host='one galera node name',
  master_port=3306,
  master_user='repl_user',
  master_password='repl_password',
  master_use_gtid=current_pos;
start slave;
but it failed.
The error message is:
Got fatal error 1236 from master when reading data from binary log:
'Error: connecting slave requested to start from GTID 0-2-11, which is
not in the master's binlog'
Any suggestions would be very much appreciated.
After researching, I modified the settings mentioned above:
On each node of the Galera Cluster, the nodes have the same domain id but different server ids:
[mysqld]
log_bin
log_slave_updates
gtid_strict_mode
gtid_domain_id=1
server_id=1
[galera]
wsrep_gtid_mode
On the slave node, which has a different domain id and server id:
[mysqld]
binlog_format=ROW
log_bin
log_slave_updates
gtid_domain_id=2
server_id=2
Then do the mysqldump export and mysql import again, and finally run:
change master to
  master_host='one galera node name',
  master_port=3306,
  master_user='repl_user',
  master_password='aa',
  master_use_gtid=current_pos;
start slave;
Everything goes well.
When I add a database or a table, or insert data into a table, it syncs to the slave node.
@Winson He
The explanation is wrong. It should be as follows:
Galera nodes 1, 2, 3 => same domain_id and a unique server_id for each node.
Slave node => different domain_id and a unique server_id.
So in fact, whether cluster, master, or slave, every server has a unique server_id; Galera cluster nodes share one domain_id, and slaves sit in a different domain_id.
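Concretely (values illustrative):

# Galera node 1: gtid_domain_id=1, server_id=1
# Galera node 2: gtid_domain_id=1, server_id=2
# Galera node 3: gtid_domain_id=1, server_id=3
# Async slave:   gtid_domain_id=2, server_id=4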
On the async slave node, we set master_host to one node of the Galera cluster. If that particular node goes down, slave replication stops. How do we ensure that, even if one node goes down, slave replication continues from the other surviving master nodes?
Kindly advise.
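One hedged option, given that wsrep_gtid_mode keeps the GTID position consistent across the cluster, is simply to re-point the slave at a surviving node (the host below is a placeholder; the HAProxy address already in front of the cluster could also serve as master_host):

stop slave;
change master to
  master_host='another surviving galera node',
  master_port=3306,
  master_user='repl_user',
  master_password='repl_password',
  master_use_gtid=current_pos;
start slave;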

nova-scheduler doesn't rpc.cast to nova-compute, no errors, but VM stuck in 'scheduling' state

OpenStack Juno + OpenContrail. Ubuntu 14.04.2 LTS. 2 node setup: control+compute.
Everything worked well.
I deleted and reinstalled the compute node.
Now, when I start a new VM, it gets stuck in the 'scheduling' state.
No errors in logs.
With debug logging I can see nova-scheduler doing the filtering; at that point it should pass an rpc.cast to nova-compute.
nova-compute shows nothing in debug.
P.S. RabbitMQ is OK; I see many control connections and 3 connections from the compute node.
If you run nova list, do you see the network interfaces of the new VM?

Impala 1.2.1 ERROR: Couldn't open transport for localhost:26000(connect() failed: Connection refused)

Using impala-shell, I can see the Hive metastore, use any database created by Hive, and query any table created by Hive. When I try to create a table in impala-shell or do an "invalidate metadata", I get
"ERROR: Couldn't open transport for localhost:26000(connect() failed: Connection refused)"
I have the following configuration. This is a multi-node cluster, built by hand, i.e. without using Cloudera Manager:
CentOS 6
CDH4.5
Impala 1.2.1
Hive MySQL Metastore
impalad is running on multiple nodes, alongside the data nodes
statestored and catalogd are running on a single node that is NOT an impalad node
In /etc/default/impala I have changed IMPALA_STATE_STORE_HOST to point to the IP of the statestored machine.
From /var/log/impala/catalogd.INFO, it seems port 26000 is used by the catalog service, as the file contains the line "--catalog_service_port=26000".
Just as /etc/default/impala has to tell impalad where the statestore is (via IMPALA_STATE_STORE_HOST), I am wondering whether for 1.2.1 (where catalogd was introduced) there has to be an additional entry for the catalogd location as well - just a guess ...
Any help is appreciated.
Thanks,
You have to start impalad with the option -catalog_service_host=fqdn_to_your_catalog_host.
Unfortunately this is not yet in the default configuration, so you have to add it yourself.
Change /etc/default/impala: set
CATALOG_SERVICE_HOST=fqdn_to_your_catalog_host
and add -catalog_service_host=${CATALOG_SERVICE_HOST} to IMPALA_SERVER_ARGS.
Restart impalad and it should work now :-)
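A sketch of the edit, assuming the stock CDH packaging where impalad reads its flags from /etc/default/impala and is managed by the impala-server init script (adjust names if your layout differs):

# /etc/default/impala
CATALOG_SERVICE_HOST=fqdn_to_your_catalog_host
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} -catalog_service_host=${CATALOG_SERVICE_HOST}"

# then restart the daemon on every impalad node
sudo service impala-server restart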

Node dev1@192.168.1.11 is not reachable

First, I followed "The Riak Fast Track" tutorial exactly to build a four-node Riak cluster.
Then I changed the 127.0.0.1 IP to 192.168.1.11 in the dev[1-4]/etc/app.config files and reinstalled the clusters (deleted dev[1-4], fresh install).
But Riak tells me:
Node dev1@192.168.1.11 is not reachable when I issue dev2/bin/riak-admin cluster join dev1@192.168.1.11
What's wrong?
+1 to what Brian Roach said in the comment.
Make sure to update the node name and IP address in both the app.config files AND the vm.args, before you start up the node.
Make sure node dev1 is up and reachable, before issuing a cluster join command to dev2.
Meaning, make sure dev1/bin/riak ping returns a 'pong', etc.
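In Fast Track terms, the sequence would be something like this (dev paths as in the tutorial; cluster plan/commit are the usual follow-up steps):

# Start dev1 and verify it is reachable
dev1/bin/riak start
dev1/bin/riak ping                    # should answer 'pong'

# Then join dev2 to it and commit the change
dev2/bin/riak start
dev2/bin/riak-admin cluster join dev1@192.168.1.11
dev2/bin/riak-admin cluster plan
dev2/bin/riak-admin cluster commit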
