openMPI/mpich2 doesn't run on multiple nodes - networking

I am trying to use install openMPI and mpich2 on a multi-node cluster and I am having trouble running on multiple machines in both cases. Using mpich2 I am able to run on an specific host from the head node, but if I try to run something from the compute nodes to a different node I get:
HYDU_sock_connect (utils/sock/sock.c:172): unable to connect from "destination_node" to "parent_node" (No route to host)
[proxy:0:0#destination_node] main (pm/pmiserv/pmip.c:189): unable to connect to server parent_node at port 56411 (check for firewalls!)
If I try to use sge to set up a job I get similar errors.
On the other hand, if I try to use openMPI to run jobs, I am not able to run in any remote machine, even from the head node. I get:
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
The machines are connected to each other, I can ping, ssh passwordlessly etc from any of them to any other, MPI_LIB and the PATH are well set in all machines.

Usually this is caused because you didn't set up a hostfile or pass the list of hosts on the command line.
For MPICH, you do this by passing the flag -host on the command line, followed by a list of hosts (host1,host2,host3,etc.).
mpiexec -host host1,host2,host3 -n 3 <executable>
You can also put these in a file:
host1
host2
host3
Then you pass that file on the command line like so:
mpiexec -f <hostfile> -n 3 <executable>
Similarly, with Open MPI, you would use:
mpiexec --host host1,host2,host3 -n 3 <executable>
and
mpiexec --hostfile hostfile -n 3 <executable>
You can get more information at these links:
MPICH - https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks
Open MPI - http://www.open-mpi.org/faq/?category=running#mpirun-hostfile

Related

Ubuntu (Oracle VM) - Mounted Samba shares hang indefinitely

I have a VM instance on Oracle Cloud (Ubuntu 22.04) set up with ZeroTier to act as a web server for some services that should work with my local Synology NAS.
For some of those services I also need to mount three SMB shares from my NAS with the ZeroTier tunnel, but I can't make it work.
I used mount and mount.cifs plenty of times with automounting too, this time it acts very strange:
running the mount command seems to succeed from the console, but /var/log/syslog reads
CIFS: VFS: \\XXX.XXX.XXX.XXX has not responded in 180 seconds.
Reconnecting...
if trying to access one of the shares (ls or lsof or cd or any other command), it succeeds for only one of the shares (always the same one), but only for the first time any command is given:
$ ls /temp
folder1 folder2 folder3
any other following command just "hangs" as if they system is working on something, but it stays like that indefinitely most of the times:
$ ls /temp
█
Just a few times it spits out this error
lsof: WARNING: can't stat() cifs file system /temp
Output information may be incomplete.
ls 1475 ubuntu 3r DIR 0,44 0 123207681 /temp
findmnt reads:
└─/temp //XXX.XXX.XXX.XXX/Downloads cifs rw,relatime,vers=2.0,cache=strict, username=[redacted],uid=1005,noforceuid,gid=0,noforcegid,addr=XXX.XXX.XXX.XXX,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=65536,wsize=65536,bsize=1048576,echo_interval=60,actimeo=1
for the remaining two "mounted" shares, none of them seems to respond to any command, not even the very first command, and they just hang like the one share that, at least, lets me browse for one time;
umount and umount -l take at least 2-3 minutes to successfully unmount the shares.
Same behavior when using smbclient and also with NFS shares from the same NAS.
What I have already tried:
update kernel and all packages;
remove, purge and reinstall cifs-utils, smbclient and so on...
tried mounting the same shares in another client / node within the ZeroTier network and it works just fine; also browsing from Windows and Android file manager apps with and without ZeroTier works flawlessly;
tried all SMB versions including SMBv3 and SMBv1 (CIFS);
tried different browsing or mounting methods / commands including mount, mount.cifs, autofs, smbclient;
tried to debug what happens behind the console, but didn't found anything that seems related to this in logs, htop or anything else. During the "hanging" sessions there is no spike in CPU, RAM or Network usage in either the Oracle VM or Synology NAS;
checked, reset and reconfigured all permissions on my NAS for shares, folders and files recursively and reconfigured users groups permissions.
What I haven't tried yet (I'll try as soon as possible):
reproduce this on another Oracle VM configured the same as the faulty one and another with a different base image (maybe Oracle Linux?);
It seems to me that the mount.cifs process doesn't really succeeds in mounting the share correctly, as it doesn't show as such anywhere. It also seems an issue not related to folder/file permissions, but rather something related to networking?
A note on something that may or may not be related to this: ZeroTier on my Synology NAS does not seems to work with IPv4 only - it remains OFFLINE. The node goes ONLINE only when IPv6 is enabled, but I must say that this is the only node in my ZT network that shows a IPv6 as public IP in the ZT web GUI - the other nodes show IPv4 public addresses.
If anyone has any clue on this, I'll be happy to support and reproduce any advice. Thank you!
I'm using YailScale, but I presume it will work the same.
You need to add the port 445 to /etc/iptables/rules.v4 just under the SSH setup like below:
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 445 -j ACCEPT (like this)
Then you need to edit the interfaces in /etc/samba/smb.conf to:
interfaces = lo tailscale0 100.0.0.0/24
Obviously, my interface is tailscale0, but yours will be different. Use ip link show to find yours. You may also need to change your IP range to suit ZeroTeirs, such as 100.0.0.0/24, which is what tailscale uses.
Then reboot!
I couldn't get it working without doing this.

MPI A process or daemon was unable to complete a TCP connection

Open MPI: 4.0.1a
HostFile:
34bb0519eAAA
a2935f150BBB
I am in machine 34bb0519eAAA. And I could use ssh a2935f150BBB to connect a2935f150BBB successfully. And also ssh 34bb0519eAAA In machine a2935f150BBB to connect 34bb0519eAAA successfully .
But when I mpiexec command . I get error message
****Warning: Permanently added '[XX.XX.XX.XX]:XX' (a2935f150BBB'IP address) to the list of known hosts.**
----------------------**--------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
Local host: a2935f150BBB
Remote host: 34bb0519eAAA
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
I am very confused that.Because I run ssh to each other successfully . How could fail that.
Here is ssh connection
ssh a2935f150BBB
Warning: Permanently added '[XX.XX.XX.XX]:XX to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (XXXXXXXXXXXXXXXXXX)
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
Last login:XXXXXXXXXXXXX from XXXXXXXXXX

Can MPI_Publish_name be used for two separately started applications?

I write an OpenMPI application which consists of a server and a client part which are launched separately:
me#server1:~> mpirun server
and
me#server2:~> mpirun client
server creates a port using MPI_Open_port. The question is: Does OpenMPI have a mechanism to communicate the port to client? I suppose that MPI_Publish_name and MPI_Lookup_name doesn't work here because server wouldn't know to which other computer the information should be sent.
To me, it looks like only processes which were started using a single mpirun can communicate with MPI_Publish_name.
I also found ompi-server, but the documentation is too minimalistic for me to understand this. Does anyone know how this is used?
Related: MPICH: How to publish_name such that a client application can lookup_name it? and https://stackoverflow.com/questions/9263458/client-server-example-using-ompi-does-not-work
MPI_Publish_name is supplied with an MPI info object, which could have an Open MPI specific boolean key ompi_global_scope. If this key is set to true, then the name would be published to the global scope, i.e. to an already running instance of ompi-server. MPI_Lookup_name by default first does a global name lookup if the URI of the ompi-server was provided.
With a dedicated Open MPI server
The process involves several steps:
1) Start the ompi-server somewhere in the cluster where it could be accessed from all nodes. For debugging purposes you may pass it the --no-daemonize -r + argument. It would start and print to the standard output an URI similar to this one:
$ ompi-server --no-daemonize -r +
1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351
2) In the server, build an MPI info object and set the ompi_global_scope key to true:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "ompi_global_scope", "true");
Then pass the info object to MPI_Publish_name:
MPI_Publish_name("server", info, port_name);
3) In the client, the call to MPI_Lookup_name would automatically do the lookup in the global context first (this could be changed by providing the proper key in the MPI info object, but in your case the default behaviour should suffice).
In order for both client and server code to know where the ompi-server is located, you have to give its URI to both mpirun commands with the --ompi-server 1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351 option.
Another option is to have ompi-server write the URI to a file, which can then be read on the node(s) where mpirun is to be run. For example, if you start the server on the same node where both mpirun commands are executed, then you could use a file in /tmp. If you start the ompi-server on a different node, then a shared file system (NFS, Lustre, etc.) would do. Either way, the set of commands would be:
$ ompi-server [--no-daemonize] -r file:/path/to/urifile
...
$ mpirun --ompi-server file:/path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Serverless method
If run both mpirun's on the same node, the --ompi-server could also specify the PID of an already running mpirun instance to be used as a name server. It allows you to use local name publishing in the server (i.e. skip the "run an ompi-server" and "make an info object" parts). The sequence of commands would be:
head-node$ mpirun --report-pid server
[ note the PID of this mpirun instance ]
...
head-node$ mpirun --ompi-server pid:12345 client
where 12345 should be replaced by the real PID of the server's mpirun.
You can also have the server's mpirun print its URI and pass that URI to the client's mpirun:
$ mpirun --report-uri + server
[ note the URI ]
...
$ mpirun --ompi-server URI client
You could also have the URI written to a file if you specify /path/to/file (note: no file: prefix here) instead of + after the --report-uri option:
$ mpirun --report-uri /path/to/urifile server
...
$ mpirun --ompi-server file:/path/to/urifile client
Note that the URI returned by mpirun has the same format as that of an ompi-server, i.e. it includes the host IP address, so it also works if the second mpirun is executed on a different node, which is able to talk to the first node via TCP/IP (and /path/to/urifile lives on a shared file system).
I tested all of the above with Open MPI 1.6.1. Some of the variant might not work with earlier versions.

mpi_comm_spawn on remote nodes

How does one use MPI_Comm_spawn to start worker processes on remote nodes?
Using OpenMPI 1.4.3, I've tried this code:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "host", "node2");
MPI_Comm intercom;
MPI_Comm_spawn("worker",
MPI_ARGV_NULL,
nprocs,
info,
0,
MPI_COMM_SELF,
&intercom,
MPI_ERRCODES_IGNORE);
But that fails with this error message:
--------------------------------------------------------------------------
There are no allocated resources for the application
worker
that match the requested mapping:
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
If I replace the "node2" with the name of my local machine, then it works fine. If I ssh into node2 and run the same thing there (with "node2" in the info dictionary) then it also works fine.
I don't want to start the parent process with mpirun, so I'm just looking for a way to dynamically spawn processes on remote nodes. Is this possible?
I don't want to start the parent
process with mpirun, so I'm just
looking for a way to dynamically spawn
processes on remote nodes. Is this
possible?
I'm not sure why you don't want to start it with mpirun? You're implicitly starting up the whole MPI machinery anyway as soon as you hit MPI_Init(), this way you just get to pass it options rather than relying on the default.
The issue here is simply that when the MPI library starts up (at MPI_Init()) it doesn't see any other hosts available, because you haven't given it any with the --host or --hostfile options to mpirun. It won't just launch processes elsewhere on your say-so (indeed, spawn doesn't require Info host, so in general it wouldn't even know where to go otherwise), so it fails.
So you'll need to do
mpirun --host myhost,host2 -np 1 ./parentjob
or, more generally, provide a hostfile, preferably with a number of slots available
myhost slots=1
host2 slots=8
host3 slots=8
and launch the jobs this way, mpirun --hostfile mpihosts.txt -np 1 ./parentjob This is a feature, not a bug; now it's MPIs job to figure out where the workers go, and if you don't specify a host explicitly in the info, it'll try to put it in the most underutilized place. It also means you don't have to recompile to change the hosts you'll spawn to.

Unable to rsync between my server and my Mac

I have a server where I store data from Mac A and Mac B.
I use rsync to keep the files updated between my Macs.
I run the following code unsuccessfully
#!/bin/zsh
# to copy files from my server to my folder
rsync -Pav $Masi:~/private/ ~/Dropbox/Courses/math/
# to copy files from my folder to my server
rsync -Pav ~/Dropbox/Courses/math $Masi:~/private/
I get the following error message
ssh: connect to host port 22: Connection refused
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: unexplained error (code 255) at io.c(600) [receiver=3.0.5]
ssh: connect to host port 22: Connection refused
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.5]
I have ssh keys in place so the connection should work, since I can use scp without problems.
How can you use rsync between my server and one of my Macs?
I used to do a lot of this. Just ran a test, a few suggestions.
Spell out your entire user#host pattern
Run the ssh connection sans the rsync first, you may need to first approve your fingerprint
You do not seem to pass a flag to protect extended attributes, this can yield broken files on OS X. If you do not need resource forks, you are OK, but most of the time you do need them.
My test case:
$ rsync -Pav ~/Desktop/ me#remote.example.com:~/rsyc-test
In that case, all the files within ~/Desktop were copied to the remote host, in my home dir. Since the directory 'rsyc-test' did not exist, it was made for me. I had a .app on my Desktop, it made it over, surprisingly, it works. Even some .webloc files made it and appear to work, though I do not trust it.
I would strongly suggest adding in the -E flag
-E, --extended-attributes
Apple specific option to copy extended attributes, resource
forks, and ACLs. Requires at least Mac OS X 10.4 or suitably
patched rsync.
I ran a new test, moved a Interarchy bookmark to my desktop, I know for a fact these break if they are copied sans resource forks. Running without the -E versus with the -E, there is a difference of 152 bytes in xfered data. The first file on the remote machine did not work, the second transfered file did work.
I can not help but notice in your example one of your paths is ~/Dropbox so this may all not matter, since DropBox, the app, does not at all support resource forks currently, though I hear there are plans to in the future.
You also are not sending in the --delete flag, if your end goal is a mirror of your data, you are not getting that, if your end goal is backups that continually grows, keeping everything that was ever on the source, the lack of --delete is good.
Other notes:
You can exclude those silly .DS_Store files
--exclude '.DS_Store'
You can also set rsync up in a way to be a true mirror, so you would not need to run your other command, see the man page for details.
My final working command to shove the Desktop of my laptop to a remote machine:
$ rsync -PEav --delete --exclude '.DS_Store' ~/Desktop/ me#remote.example.com:~/rsycn-test
Check "$Masi". Is that the hostname you are trying to reach?
Try the following command to debug it:
rsync -e 'ssh -v' -Pav $Masi:~/private/ ~/Dropbox/Courses/math/
The Connection refused usually happens when there is a connection issue to the remote (e.g. firewall).
In your case the problem is that $Masi variable is empty. If it's not variable, use Masi.
As per this error:
ssh: connect to host port 22: Connection refused
Notice the double space above after the host word.
the connect to host message doesn't say to which host, so you're trying to connect to empty host. So it sound like a typo in the host name.

Resources