How do I get back to the running instance of riak-shell? - riak

I was in riak-shell when ssh lost its connection to the server. After reconnecting, I do the following:
sudo riak-shell
and get:
An instance of riak-shell is already running
So, I restarted the riak node in question. This did not seem to solve the problem. I do not see anything using ps -aux to kill. According to the docs, only one instance can run at a time. That makes sense, but when I run riak-shell from another node and try to connect to any node, I now get the following:
Error: invalid function call : connection_EXT:connect ["riak#<<<ip_address_elided>>>"]
You can connect to a specific node (whether in your riak_shell.config
or not) by typing 'connect "dev1#127.0.0.1";' substituting your
node name for dev1.
You may need to change the Erlang cookie to do this.
See also the 'reconnect' command.
Unhandled message received is {#Ref<0.0.0.135>,disconnected}
riak-shell(3)>
I have not changed the cookies during this process, and the cookie appears to be the same (at least in /etc/riak/riak_shell.config). (I am running the Riak TS AMI on AWS.)

riak-shell runs in its own Erlang VM - entirely separate from the riak node
(You don't need to run riak-shell from the machine your node is on - it uses the normal riak-erlang-client to talk to riak)
If you you are on a Linux do ps aux | grep riak_shell_app it will give you the process number you need to kill that instance:
08:30:45:~ $ ps aux | grep riak_shell_app
vagrant 4671 0.0 0.3 493260 34884 pts/4 Sl+ Aug17 0:03 /home/vagrant/riak_ee/dev/dev1/erts-5.10.3/bin/beam.smp -- -root /home/vagrant/riak_ee/dev/dev1 -progname erl -- -home /home/vagrant -- -boot /home/vagrant/riak_ee/dev/dev1/releases/2.1.1/start_clean -run riak_shell_app boot debug_off /home/vagrant/riak_ee/dev/dev1/bin/../log/riak_shell/riak_shell -noshell -config /home/vagrant/riak_ee/dev/dev1/bin/../etc/riak
I wrote a good chunk of it so let me know how you got on:
https://github.com/basho/riak_shell/graphs/contributors

Related

Ubuntu (Oracle VM) - Mounted Samba shares hang indefinitely

I have a VM instance on Oracle Cloud (Ubuntu 22.04) set up with ZeroTier to act as a web server for some services that should work with my local Synology NAS.
For some of those services I also need to mount three SMB shares from my NAS with the ZeroTier tunnel, but I can't make it work.
I used mount and mount.cifs plenty of times with automounting too, this time it acts very strange:
running the mount command seems to succeed from the console, but /var/log/syslog reads
CIFS: VFS: \\XXX.XXX.XXX.XXX has not responded in 180 seconds.
Reconnecting...
if trying to access one of the shares (ls or lsof or cd or any other command), it succeeds for only one of the shares (always the same one), but only for the first time any command is given:
$ ls /temp
folder1 folder2 folder3
any other following command just "hangs" as if they system is working on something, but it stays like that indefinitely most of the times:
$ ls /temp
█
Just a few times it spits out this error
lsof: WARNING: can't stat() cifs file system /temp
Output information may be incomplete.
ls 1475 ubuntu 3r DIR 0,44 0 123207681 /temp
findmnt reads:
└─/temp //XXX.XXX.XXX.XXX/Downloads cifs rw,relatime,vers=2.0,cache=strict, username=[redacted],uid=1005,noforceuid,gid=0,noforcegid,addr=XXX.XXX.XXX.XXX,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=65536,wsize=65536,bsize=1048576,echo_interval=60,actimeo=1
for the remaining two "mounted" shares, none of them seems to respond to any command, not even the very first command, and they just hang like the one share that, at least, lets me browse for one time;
umount and umount -l take at least 2-3 minutes to successfully unmount the shares.
Same behavior when using smbclient and also with NFS shares from the same NAS.
What I have already tried:
update kernel and all packages;
remove, purge and reinstall cifs-utils, smbclient and so on...
tried mounting the same shares in another client / node within the ZeroTier network and it works just fine; also browsing from Windows and Android file manager apps with and without ZeroTier works flawlessly;
tried all SMB versions including SMBv3 and SMBv1 (CIFS);
tried different browsing or mounting methods / commands including mount, mount.cifs, autofs, smbclient;
tried to debug what happens behind the console, but didn't found anything that seems related to this in logs, htop or anything else. During the "hanging" sessions there is no spike in CPU, RAM or Network usage in either the Oracle VM or Synology NAS;
checked, reset and reconfigured all permissions on my NAS for shares, folders and files recursively and reconfigured users groups permissions.
What I haven't tried yet (I'll try as soon as possible):
reproduce this on another Oracle VM configured the same as the faulty one and another with a different base image (maybe Oracle Linux?);
It seems to me that the mount.cifs process doesn't really succeeds in mounting the share correctly, as it doesn't show as such anywhere. It also seems an issue not related to folder/file permissions, but rather something related to networking?
A note on something that may or may not be related to this: ZeroTier on my Synology NAS does not seems to work with IPv4 only - it remains OFFLINE. The node goes ONLINE only when IPv6 is enabled, but I must say that this is the only node in my ZT network that shows a IPv6 as public IP in the ZT web GUI - the other nodes show IPv4 public addresses.
If anyone has any clue on this, I'll be happy to support and reproduce any advice. Thank you!
I'm using YailScale, but I presume it will work the same.
You need to add the port 445 to /etc/iptables/rules.v4 just under the SSH setup like below:
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 445 -j ACCEPT (like this)
Then you need to edit the interfaces in /etc/samba/smb.conf to:
interfaces = lo tailscale0 100.0.0.0/24
Obviously, my interface is tailscale0, but yours will be different. Use ip link show to find yours. You may also need to change your IP range to suit ZeroTeirs, such as 100.0.0.0/24, which is what tailscale uses.
Then reboot!
I couldn't get it working without doing this.

Weird delay when using "tail -f" command

To monitor a log file I have to connect to an ssh connection and redirect the output of the log file(let's call it RemoteLog.txt) out to a local machine so it can be read by a java program and put on a GUI.
Right now I have the output redirected out of the ssh connection and onto the local machine with the command:
ssh remote#ip.address tail logs/RemoteLog.txt -f > ~/Log/LocalLog.txt
and everything works fine technically with one exception: for some reason LocalLog.txt only gets updated with the changes to RemoteLog.txt every 35 seconds to the millisecond.
It doesn't matter the number of changes to RemoteLog, the number of lines specified with the tail command, or using the >> operator vs the > operator; there is always a 35 second delay between updates of LocalLog.txt while RemoteLog is constantly updating.
Does anyone have any clue why this might be?

Installation of Riak under Ubuntu 14.04 LTS

I cant bring riak to work on Ubuntu 14.04. LTS using the bash instructions under
http://docs.basho.com/riak/latest/ops/building/installing/debian-ubuntu/.
When running riak start I get:
riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
When running riak console afterwards:
Exec: /usr/lib/riak/erts-5.10.3/bin/erlexec -boot /usr/lib/riak/releases/2.1.3/riak -config /var/lib/riak/generated.configs/app.2016.02.28.21.43.04.config -args_file /var/lib/riak/generated.configs/vm.2016.02.28.21.43.04.args -vm_args /var/lib/riak/generated.configs/vm.2016.02.28.21.43.04.args -pa /usr/lib/riak/lib/basho-patches -- console -x
Root: /usr/lib/riak
Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:2:2] [async-threads:64] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak#127.0.0.1',[{'riak#54.194.69.48',[{{riak_core,bucket_types},[true,false]},{{riak_core,fold_req_version},[v2,v1]},{{riak_core,net_ticktime},[true,false]},{{riak_core,resizable_ring},[true,false]},{{riak_core,security},[true,false]},{{riak_core,staged_joins},[true,false]},{{riak_core,vnode_routing},[proxy,legacy]},{{riak_pipe,trace_format},[ordsets,sets]}]}]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Any idea how to fix this? Installation has been done via apt-get. Default riak.conf. Riak version is 2.1.3.
This is a Riak error, not at all related to Ubuntu.
The error message indicates that the current name of the node does not match the name of any node in the ring file. This can happen if you start the node with a default configuration before configuring the node's name. See Note on changing the name value at http://docs.basho.com/riak/latest/ops/building/basic-cluster-setup/
If this is a singleton node, the simplest solution will be to delete the files in /var/lib/riak/ring (make a backup first). A new one will be created when you start the node.

openMPI/mpich2 doesn't run on multiple nodes

I am trying to use install openMPI and mpich2 on a multi-node cluster and I am having trouble running on multiple machines in both cases. Using mpich2 I am able to run on an specific host from the head node, but if I try to run something from the compute nodes to a different node I get:
HYDU_sock_connect (utils/sock/sock.c:172): unable to connect from "destination_node" to "parent_node" (No route to host)
[proxy:0:0#destination_node] main (pm/pmiserv/pmip.c:189): unable to connect to server parent_node at port 56411 (check for firewalls!)
If I try to use sge to set up a job I get similar errors.
On the other hand, if I try to use openMPI to run jobs, I am not able to run in any remote machine, even from the head node. I get:
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
The machines are connected to each other, I can ping, ssh passwordlessly etc from any of them to any other, MPI_LIB and the PATH are well set in all machines.
Usually this is caused because you didn't set up a hostfile or pass the list of hosts on the command line.
For MPICH, you do this by passing the flag -host on the command line, followed by a list of hosts (host1,host2,host3,etc.).
mpiexec -host host1,host2,host3 -n 3 <executable>
You can also put these in a file:
host1
host2
host3
Then you pass that file on the command line like so:
mpiexec -f <hostfile> -n 3 <executable>
Similarly, with Open MPI, you would use:
mpiexec --host host1,host2,host3 -n 3 <executable>
and
mpiexec --hostfile hostfile -n 3 <executable>
You can get more information at these links:
MPICH - https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks
Open MPI - http://www.open-mpi.org/faq/?category=running#mpirun-hostfile

mpi_comm_spawn on remote nodes

How does one use MPI_Comm_spawn to start worker processes on remote nodes?
Using OpenMPI 1.4.3, I've tried this code:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "host", "node2");
MPI_Comm intercom;
MPI_Comm_spawn("worker",
MPI_ARGV_NULL,
nprocs,
info,
0,
MPI_COMM_SELF,
&intercom,
MPI_ERRCODES_IGNORE);
But that fails with this error message:
--------------------------------------------------------------------------
There are no allocated resources for the application
worker
that match the requested mapping:
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
If I replace the "node2" with the name of my local machine, then it works fine. If I ssh into node2 and run the same thing there (with "node2" in the info dictionary) then it also works fine.
I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?
I don't want to start the parent
process with mpirun, so I'm just
looking for a way to dynamically spawn
processes on remote nodes. Is this
possible?
I'm not sure why you don't want to start it with mpirun? You're implicitly starting up the whole MPI machinery anyway as soon as you hit MPI_Init(), this way you just get to pass it options rather than relying on the default.
The issue here is simply that when the MPI library starts up (at MPI_Init()) it doesn't see any other hosts available, because you haven't given it any with the --host or --hostfile options to mpirun. It won't just launch processes elsewhere on your say-so (indeed, spawn doesn't require Info host, so in general it wouldn't even know where to go otherwise), so it fails.
So you'll need to do
mpirun --host myhost,host2 -np 1 ./parentjob
or, more generally, provide a hostfile, preferably with a number of slots available
myhost slots=1
host2 slots=8
host3 slots=8
and launch the jobs this way, mpirun --hostfile mpihosts.txt -np 1 ./parentjob This is a feature, not a bug; now it's MPIs job to figure out where the workers go, and if you don't specify a host explicitly in the info, it'll try to put it in the most underutilized place. It also means you don't have to recompile to change the hosts you'll spawn to.

Resources