I'm trying to run my Open MPI project on two hosts. If I run it on my own machine with
mpirun -np 2 ./test
It works, but when I run the command
mpirun -host hostname1,hostname2 -np 2 ./test
it doesn't work. It seems to hang (possibly deadlocked), but the shell prints no error. The two hosts are on the same LAN, connected by an Ethernet cable. Both run Ubuntu 14.04 x86_64; one boots from a USB drive and the other from its hard disk. The executable is present on both machines.
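For comparison, the same two-host launch can be written with an Open MPI hostfile, which keeps the host list explicit and reusable across runs. The Python sketch below prints the hostfile contents and the matching mpirun command. It assumes one slot per machine and a file named hosts; hostname1 and hostname2 are the placeholder names from the question. It only prepares the launch and does not by itself diagnose the hang.

```python
import shlex

def hostfile_text(hosts, slots=1):
    """Return Open MPI hostfile contents: one '<host> slots=<n>' line per host."""
    return "".join(f"{host} slots={slots}\n" for host in hosts)

def mpirun_argv(hostfile, nprocs, executable):
    """Build an mpirun argument list that reads hosts from a hostfile instead of -host."""
    return ["mpirun", "--hostfile", hostfile, "-np", str(nprocs), executable]

if __name__ == "__main__":
    hosts = ["hostname1", "hostname2"]  # placeholder names from the question
    print(hostfile_text(hosts), end="")
    print(shlex.join(mpirun_argv("hosts", len(hosts), "./test")))
```

Save the printed host lines as hosts in the working directory on the launching machine, then run the printed command.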
I wanted to profile an MPI program and found the following command in the pgprof manual:
mpirun -np 2 -host c0-0,c0-1 pgprof -o output.%h.%p.%q{OMPI_COMM_WORLD_RANK} a.out
I couldn't understand the -host c0-0,c0-1 part of the command. I checked the Open MPI manual and found the following:
-H, -host, --host <host1,host2,...,hostN>
List of hosts on which to invoke processes.
OK, so the question comes down to what c0-0 and c0-1 are. I suppose they are hosts, but where do those names come from, and what does it mean to set the host list to c0-0,c0-1?
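c0-0 and c0-1 are presumably just the hostnames of two compute nodes on the cluster the manual's example was written for; with -np 2 and each host listed once, Open MPI typically places one rank on each. Those hostnames then reappear in the profiler's output file names through %h. The Python sketch below is a hypothetical helper that mimics pgprof's output-name placeholders (%h hostname, %p process ID, %q{VAR} environment variable, %% literal percent sign), relying on Open MPI setting OMPI_COMM_WORLD_RANK in each process's environment. The PIDs are made up, and substituting an empty string for an unset variable is an assumption.

```python
import re

def expand_output_name(pattern, hostname, pid, env):
    """Mimic pgprof-style placeholders: %h host, %p PID, %q{VAR} env var, %% literal %."""
    def substitute(match):
        token = match.group(0)
        if token == "%%":
            return "%"
        if token == "%h":
            return hostname
        if token == "%p":
            return str(pid)
        # Remaining case is %q{VAR}; an unset VAR becomes "" (assumption).
        return env.get(match.group(1), "")
    return re.sub(r"%%|%h|%p|%q\{([^}]*)\}", substitute, pattern)

if __name__ == "__main__":
    pattern = "output.%h.%p.%q{OMPI_COMM_WORLD_RANK}"
    # Hypothetical PIDs; hostnames and ranks follow the manual's example.
    for rank, (host, pid) in enumerate([("c0-0", 4101), ("c0-1", 5202)]):
        env = {"OMPI_COMM_WORLD_RANK": str(rank)}
        print(expand_output_name(pattern, host, pid, env))
```

With these inputs, rank 0 on c0-0 would write output.c0-0.4101.0 and rank 1 on c0-1 would write output.c0-1.5202.1, so each process gets its own profile file.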
Normally I'd use mpiexec to run a process on multiple hosts like:
mpiexec -n 8 --hostfile hosts.txt python my_mpi_script.py
where my_mpi_script.py depends on mpi4py.
If I couldn't run mpiexec or mpirun, how could I run my_mpi_script.py on multiple hosts? Would this be possible by changing my script or the execution environment?
Edit: I'm working with a system that runs the same command on many hosts. In this system, processes would normally discover each other on the local network rather than all being spawned by MPI. My current workaround is to check which host I'm on and run mpiexec on exactly one of the hosts. This doesn't work well because of some networking limitations.
Thanks.
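The workaround described in the edit (every host runs the same command, and only one of them calls mpiexec) can be sketched in Python as follows. The rule that the first host in hosts.txt is the launcher, and the helper names, are hypothetical; the sketch also assumes hosts.txt lists hostnames as returned by socket.gethostname() rather than IP addresses. It does not work around the networking limitations, since mpiexec still has to reach every host.

```python
import socket
import subprocess

def parse_hosts(lines):
    """Host names from hostfile lines, ignoring blanks, comments and slot counts."""
    hosts = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            # Accepts 'host slots=4' (Open MPI style) and 'host:4' (MPICH style).
            hosts.append(line.split()[0].split(":")[0])
    return hosts

def is_launcher(this_host, hosts):
    """Hypothetical rule: only the first host in the list runs mpiexec."""
    if not hosts:
        return False
    # Compare short names so 'node1' matches 'node1.example.lan'.
    return this_host.split(".")[0] == hosts[0].split(".")[0]

def mpiexec_argv(nprocs, hostfile, script):
    """The command from the question: mpiexec -n N --hostfile FILE python SCRIPT."""
    return ["mpiexec", "-n", str(nprocs), "--hostfile", hostfile, "python", script]

if __name__ == "__main__":
    try:
        with open("hosts.txt") as f:
            hosts = parse_hosts(f)
    except FileNotFoundError:
        hosts = []
    if is_launcher(socket.gethostname(), hosts):
        # This host starts every rank; mpiexec contacts the other hosts itself.
        subprocess.run(mpiexec_argv(8, "hosts.txt", "my_mpi_script.py"), check=True)
    else:
        print("not the launcher host; leaving process start-up to mpiexec")
```

Every host runs this same script; on all but the first host it exits immediately, and the processes there are started remotely by the single mpiexec.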
I have been running MPI programs on my testbed over ssh without problems. But when I switched to rsh to avoid encryption, running a program with mpirun produces no output. I inspected the traffic with Wireshark and found a TCP packet with the PUSH flag set whose payload says:
bash: orted: command not found
Open MPI is installed in the same directory on both machines, and both run Ubuntu 16.04. I configured rsh so that no password is needed between machines in the testbed. I can run programs on the remote machine with rsh, but not with mpirun. Any idea what the problem could be?
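The message bash: orted: command not found suggests (an inference from the error, not confirmed in the question) that the non-interactive shell rsh starts on the remote machine does not have the Open MPI bin directory in its PATH, even if an interactive login does. mpirun's --prefix option tells the remote side where Open MPI is installed and sets PATH and LD_LIBRARY_PATH there before starting orted; --mca plm_rsh_agent rsh selects rsh as the launcher. The Python sketch below builds a remote check and the corresponding mpirun command. The prefix /opt/openmpi and the host names node1 and node2 are hypothetical placeholders.

```python
import shlex

def remote_orted_check(agent, host):
    """Ask the remote non-interactive shell whether orted is on its PATH."""
    return [agent, host, "command -v orted || echo 'orted not on PATH'"]

def mpirun_via_rsh(prefix, hosts, nprocs, executable):
    """mpirun with rsh as the launch agent and --prefix so remote daemons are found."""
    return ["mpirun", "--mca", "plm_rsh_agent", "rsh",
            "--prefix", prefix,
            "-host", ",".join(hosts),
            "-np", str(nprocs), executable]

if __name__ == "__main__":
    prefix = "/opt/openmpi"  # hypothetical install directory, same on both machines
    hosts = ["node1", "node2"]  # hypothetical host names
    print(shlex.join(remote_orted_check("rsh", hosts[1])))
    print(shlex.join(mpirun_via_rsh(prefix, hosts, 2, "./a.out")))
```

If the check prints "orted not on PATH", an alternative to --prefix is exporting PATH in a startup file the remote shell actually reads. Bash started by rshd or sshd with a command reads ~/.bashrc, but Ubuntu's default ~/.bashrc returns early for non-interactive shells, so the export has to come before that check.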
I set up MPICH3 (mpich-3.1.3) on my notebook (Intel Core i5) and on a slave node with an ARM Cortex-A15 processor. Both run Ubuntu 14.04, with SSH keys generated by ssh-keygen for passwordless communication.
I installed MPICH3 in a folder that is shared across the cluster through NFS.
I exported the path from my master server only.
The installation went well, and the following command runs fine on the master node alone:
mpiexec -n 2 ./cpi
Process 0 of 2 is on MingF
Process 1 of 2 is on MingF
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000182
But when I try running on both the slave and the master, I get this error and it hangs:
mpiexec -f hosts -n 2 ./cpi
bash: /mirror/mpich3/bin/hydra_pmi_proxy: cannot execute binary file: Exec format error
It hangs there until I press Ctrl+C to break out of it.
I'm guessing it's because of the different processor type, but I may be wrong. Could someone help me out?
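You cannot run the same executable on architectures as different as x86 and ARM. Compile it separately on each machine, and pay attention to the endianness of the ARM machine. The error message shows that this applies to the MPI installation as well: /mirror/mpich3/bin/hydra_pmi_proxy on the NFS share was presumably built on the x86 notebook, so the ARM node needs its own MPICH build.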
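One way to confirm the mismatch is to read the ELF header of the binaries involved (for example /mirror/mpich3/bin/hydra_pmi_proxy and ./cpi): the e_machine field names the target CPU, and the EI_DATA byte gives the endianness mentioned above. The Python sketch below decodes only those two fields and maps a handful of common machine codes; it is a diagnostic aid, not a full ELF parser. The standard file command reports the same information.

```python
import struct
import sys

# A few e_machine values from the ELF specification.
MACHINES = {3: "x86 (i386)", 40: "ARM (32-bit)", 62: "x86-64", 183: "AArch64"}

def elf_arch(header):
    """Return (architecture, endianness) decoded from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        order, endian = "<", "little-endian"
    elif header[5] == 2:
        order, endian = ">", "big-endian"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_machine is a 2-byte field at offset 18, stored in the file's byte order.
    (machine,) = struct.unpack_from(order + "H", header, 18)
    return MACHINES.get(machine, f"unknown (e_machine={machine})"), endian

def binary_arch(path):
    with open(path, "rb") as f:
        return elf_arch(f.read(20))

if __name__ == "__main__":
    for path in sys.argv[1:]:
        arch, endian = binary_arch(path)
        print(f"{path}: {arch}, {endian}")
```

Run on the ARM node against the shared binaries: if they report x86-64, they cannot execute there, which matches the "Exec format error".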
Here's the thing: I've installed Open MPI on two different computers. I have already compiled and run the hello_world example separately on each machine, and it works well. The problem appears when I launch this command:
mpirun -hostfile hosts -n 3 hello_c
with localhost and the IP address of my other machine in the hosts file. The program then asks for my ssh password, and after I enter it nothing happens, as if mpirun had simply crashed. My real problem is that I can't run an MPI process on two different computers through ssh.
I should add that all Open MPI binaries and libraries are correctly set in the path, as is hello_world.
Update
I've already set up passwordless ssh with an RSA key, but it doesn't work either. I launched mpirun in debug mode (-d) and got this:
[baptiste@baptiste RE51]$ mpirun -d -hostfile hosts hello_c
[baptiste.thinkFed:02666] procdir: /tmp/openmpi-sessions-baptiste@baptiste.thinkFed_0/53471/0/0
[baptiste.thinkFed:02666] jobdir: /tmp/openmpi-sessions-baptiste@baptiste.thinkFed_0/53471/0
[baptiste.thinkFed:02666] top: openmpi-sessions-baptiste@baptiste.thinkFed_0
[baptiste.thinkFed:02666] tmp: /tmp
[roommateServer:01102] procdir: /tmp/openmpi-sessions-baptiste@roommateServer_0/53471/0/1
[roommateServer:01102] jobdir: /tmp/openmpi-sessions-baptiste@roommateServer_0/53471/0
[roommateServer:01102] top: openmpi-sessions-baptiste@roommateServer_0
[roommateServer:01102] tmp: /tmp
Nothing else happens; it stays there and I have to kill mpirun.
For what it's worth, I tried launching mpirun hello_c through ssh on the remote node with this command:
ssh roomServer mpirun hello_c
This works well. I definitely can't understand why it doesn't work across all nodes.
Assuming your compiler and your hosts file are set up properly, the problem is that you need passwordless ssh between the two computers; otherwise you will get the behavior you described. This is because mpirun starts the remote processes over ssh without any interactive step, so a password prompt interrupts the launch, which can make the job stall and appear to crash.
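A quick way to verify that passwordless ssh works in the direction mpirun uses it (from the launching machine to each host in the hostfile) is to run a trivial remote command with ssh's BatchMode=yes option, which makes ssh fail instead of prompting for a password. The Python sketch below does this for every host in a hostfile. The hostfile parsing is simplified (first word of each non-comment line), and the 5-second ConnectTimeout is an arbitrary choice.

```python
import subprocess
import sys

def ssh_check_argv(host):
    """ssh command that fails instead of asking for a password (BatchMode=yes)."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]

def hosts_from_lines(lines):
    """First word of each non-blank, non-comment hostfile line."""
    stripped = (line.split("#", 1)[0] for line in lines)
    return [line.split()[0] for line in stripped if line.strip()]

def passwordless_ok(host):
    """True if 'ssh host true' succeeds without any password prompt."""
    result = subprocess.run(ssh_check_argv(host), capture_output=True, text=True)
    return result.returncode == 0

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            for host in hosts_from_lines(f):
                ok = passwordless_ok(host)
                print(f"{host}: {'ok' if ok else 'FAILED (password prompt or unreachable)'}")
```

Run it with the hosts file as its argument, as the same user, on the machine where mpirun is started; every listed host should report ok before retrying mpirun.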