Node stuck in LEAVING state after leaving the cluster in Riak

I have a 5-node Riak KV cluster in production. I made one node leave the cluster for some reason, but it has been stuck with the status "leaving" for the last 7 days. How do we clear this up?
I have locally tested force-remove, as well as force-replace, of a node from a cluster, using the command
sudo riak-admin force-remove -f riak@172.xx.xx.8
and for force-replace I followed this gist: https://gist.github.com/angrycub/4566736
but in that case I lost some data.
How do I fix this type of issue?

Don't use the force-remove command; use riak-admin cluster leave instead. See this answer for more details: Riak Force remove node from Riak KV cluster.
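For reference, the usual leave sequence, run from any remaining cluster member (the node name below is the one from the question), looks roughly like this:

riak-admin cluster leave riak@172.xx.xx.8
riak-admin cluster plan      # review the proposed ownership changes
riak-admin cluster commit    # apply them

While the node drains, riak-admin member-status and riak-admin transfers show whether partition handoff is still in progress; the node should stay in "leaving" only until those handoffs complete.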

Related

Galera /var/lib/mysql/grastate.dat missing after upgrade

I am upgrading my Galera MariaDB cluster from MariaDB 10.3 to MariaDB 10.6. When I try to start a new cluster with sudo /usr/bin/galera_new_cluster, I see an error:
[ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
I am seeing references to setting the variable safe_to_bootstrap to 1 in /var/lib/mysql/grastate.dat.
There is no grastate.dat file in that directory on any of my cluster nodes. This is one of my QA servers. Any pointers on whether I can create the file from scratch?
Thanks
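For reference, grastate.dat is a small plain-text file. A typical one looks roughly like the following; treat this as an illustrative sketch only, and note that the uuid is a placeholder (on a healthy node it matches the cluster's wsrep_cluster_state_uuid):

# GALERA saved state
version: 2.1
uuid:    <cluster-state-uuid>
seqno:   -1
safe_to_bootstrap: 1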

Backup Galera cluster using mysqldump

I have a 3-node Galera MariaDB cluster and I want to have a supplementary backup using mysqldump for restore of individual tables in the event of any user errors. Currently Node1 is being used by all applications while node2 and node3 are just kept in sync. I want to run mysqldump from idle Node3. Should I not use --flush-logs? Also should I use --master-data option?
I ran a mysqldump backup in a pre-prod cluster (same setup as production) from the idle node Node3 with these options:
mysqldump -u root -pPassword --host=localhost --all-databases --flush-logs --events --routines --single-transaction --master-data=2 --include-master-host-port
But as soon as I ran mysqldump, the data in a few tables (I checked only a few at random) was no longer in sync with the other nodes. Within a few minutes it came back in sync.
My question is:
a) Should I avoid using the --flush-logs option in my mysqldump? Is it the cause of the current node going out of sync?
b) Should I even include the --master-data option in the mysqldump command?
Take node3 out of the cluster (one way of doing this is sketched below).
Do whatever dump you like (mysqldump, disk copy, xtrabackup, etc.).
Put it back into the cluster -- it will repair itself to get back in sync.
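One way to take node3 "out" without shutting it down is to desync it for the duration of the dump. A rough sketch only; the dump file path and the exact mysqldump options are illustrative:

mysql -u root -p -e "SET GLOBAL wsrep_desync = ON;"
mysqldump -u root -pPassword --all-databases --single-transaction --events --routines > /backup/node3_dump.sql
mysql -u root -p -e "SET GLOBAL wsrep_desync = OFF;"

Alternatively, stop MariaDB on node3 and copy its data directory; when it is started again it rejoins and resyncs, as the answer above describes.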

How do I get back to the running instance of riak-shell?

I was in riak-shell when ssh lost its connection to the server. After reconnecting, I do the following:
sudo riak-shell
and get:
An instance of riak-shell is already running
So, I restarted the riak node in question. This did not seem to solve the problem. I do not see anything using ps -aux to kill. According to the docs, only one instance can run at a time. That makes sense, but when I run riak-shell from another node and try to connect to any node, I now get the following:
Error: invalid function call : connection_EXT:connect ["riak@<<<ip_address_elided>>>"]
You can connect to a specific node (whether in your riak_shell.config
or not) by typing 'connect "dev1@127.0.0.1";' substituting your
node name for dev1.
You may need to change the Erlang cookie to do this.
See also the 'reconnect' command.
Unhandled message received is {#Ref<0.0.0.135>,disconnected}
riak-shell(3)>
I have not changed the cookies during this process, and the cookie appears to be the same (at least in /etc/riak/riak_shell.config). (I am running the Riak TS AMI on AWS.)
riak-shell runs in its own Erlang VM - entirely separate from the riak node
(You don't need to run riak-shell from the machine your node is on - it uses the normal riak-erlang-client to talk to riak)
If you are on Linux, run ps aux | grep riak_shell_app; it will give you the process number you need to kill that instance:
08:30:45:~ $ ps aux | grep riak_shell_app
vagrant 4671 0.0 0.3 493260 34884 pts/4 Sl+ Aug17 0:03 /home/vagrant/riak_ee/dev/dev1/erts-5.10.3/bin/beam.smp -- -root /home/vagrant/riak_ee/dev/dev1 -progname erl -- -home /home/vagrant -- -boot /home/vagrant/riak_ee/dev/dev1/releases/2.1.1/start_clean -run riak_shell_app boot debug_off /home/vagrant/riak_ee/dev/dev1/bin/../log/riak_shell/riak_shell -noshell -config /home/vagrant/riak_ee/dev/dev1/bin/../etc/riak
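Then kill that process, substituting the PID from the second column of your own output (4671 in the example above):

kill 4671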
I wrote a good chunk of it so let me know how you got on:
https://github.com/basho/riak_shell/graphs/contributors

Installation of Riak under Ubuntu 14.04 LTS

I can't get Riak to work on Ubuntu 14.04 LTS using the instructions at
http://docs.basho.com/riak/latest/ops/building/installing/debian-ubuntu/.
When running riak start I get:
riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
When running riak console afterwards:
Exec: /usr/lib/riak/erts-5.10.3/bin/erlexec -boot /usr/lib/riak/releases/2.1.3/riak -config /var/lib/riak/generated.configs/app.2016.02.28.21.43.04.config -args_file /var/lib/riak/generated.configs/vm.2016.02.28.21.43.04.args -vm_args /var/lib/riak/generated.configs/vm.2016.02.28.21.43.04.args -pa /usr/lib/riak/lib/basho-patches -- console -x
Root: /usr/lib/riak
Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:2:2] [async-threads:64] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak#127.0.0.1',[{'riak#54.194.69.48',[{{riak_core,bucket_types},[true,false]},{{riak_core,fold_req_version},[v2,v1]},{{riak_core,net_ticktime},[true,false]},{{riak_core,resizable_ring},[true,false]},{{riak_core,security},[true,false]},{{riak_core,staged_joins},[true,false]},{{riak_core,vnode_routing},[proxy,legacy]},{{riak_pipe,trace_format},[ordsets,sets]}]}]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Any idea how to fix this? Installation has been done via apt-get. Default riak.conf. Riak version is 2.1.3.
This is a Riak error, not at all related to Ubuntu.
The error message indicates that the current name of the node does not match the name of any node in the ring file. This can happen if you start the node with a default configuration before configuring the node's name. See Note on changing the name value at http://docs.basho.com/riak/latest/ops/building/basic-cluster-setup/
If this is a singleton node, the simplest solution will be to delete the files in /var/lib/riak/ring (make a backup first). A new one will be created when you start the node.
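As a rough sketch of that suggestion, assuming the default data directory /var/lib/riak and a package install where the riak command is on the PATH:

sudo riak stop
sudo cp -a /var/lib/riak/ring /var/lib/riak/ring.bak   # keep the backup
sudo rm /var/lib/riak/ring/*
sudo riak start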

Node dev1@192.168.1.11 is not reachable

First, I followed "The Riak Fast Track" tutorial exactly to build a four-node Riak cluster.
Then I changed the 127.0.0.1 IP to 192.168.1.11 in the dev[1-4]/etc/app.config files and reinstalled the clusters (deleted dev[1-4], fresh install).
But Riak tells me:
Node dev1@192.168.1.11 is not reachable when I issue dev2/bin/riak-admin cluster join dev1@192.168.1.11
What's wrong?
+1 to what Brian Roach said in the comment.
Make sure to update the node name and IP address in both the app.config files AND the vm.args, before you start up the node.
Make sure node dev1 is up and reachable, before issuing a cluster join command to dev2.
Meaning, make sure dev1/bin/riak ping returns a 'pong', etc.
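Roughly, with the Fast Track dev layout and the addresses from the question, the check and the join look like:

dev1/bin/riak ping                                   # should return 'pong'
dev2/bin/riak-admin cluster join dev1@192.168.1.11
dev2/bin/riak-admin cluster plan
dev2/bin/riak-admin cluster commit

The node name dev1@192.168.1.11 has to match the -name setting in dev1's vm.args, as noted above.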
