Having problems running MPI programs with rsh (with Open MPI) - mpi

I have been running MPI programs on my testbed with ssh without problems. But when I wanted to switch to rsh to avoid encryption and run a program with mpirun, there is no output. I inspected the traffic with Wireshark and there is a TCP packet with PUSH flag, where the data says:
bash: orted: command not found
Open MPI is installed in the same directory in both machines, and they both have Ubuntu 16.04. I set it so there is no password for rsh needed within the testbed from the other machines. I can run programs on the remote machine with rsh, but not mpirun. Any idea what the problem could be?

Related

How to shut down a computer (host)

I know it sounds weird, but I have a case where Deno would need to shutdown its own host (and kill its own process therefore). Is this possible?
I am specifically needing this for linux (lubuntu), if that's relevant. I guess this requires sudo rights, which sucks but would be an option.
For those interested in details: I'm coding a minecraft server software and if the server has no player for 30 minutes, it will shut itself down to save some power. A raspberry PI that runs 24/7 anyways, has a wake on lan feature, so that it can boot again. After boot, the server manager software would automatically start as a linux service.
You can create a subprocess to do this:
await Deno.run({ cmd: ["shutdown", "-h", "now"] }).status();
Concepts
Deno is capable of spawning a subprocess via Deno.run.
--allow-run permission is required to spawn a subprocess.
Spawned subprocesses do not run in a security sandbox.
Communicate with the subprocess via the stdin, stdout and stderr streams.
Use a specific shell by providing its path/name and its string input switch, e.g. Deno.run({cmd: ["bash", "-c", "ls -la"]});
See also command line - Shutdown from terminal without entering password? - Ask Ubuntu for ideas on how to avoid needing sudo to call shutdown or alternative commands that you can invoke from Deno instead.
To extend on what #mfulton2 wrote, here is how I made it work, so that I did not need to start the program with sudo rights, but still was able to shut down the computer without the use of sudo outside or within the app.
Open or create the following file: sudo nano /etc/sudoer
Add the line username ALL = NOPASSWD: /sbin/shutdown
Add this line %admin ALL = NOPASSWD: /sbin/shutdown
In your deno script, write Deno.run({ cmd: ["shutdown", "-h", "now"]}).status();
Execute script!
Keep in mind that any experienced linux user would potentially tell you that this is very dangerous (it probably is) and that it might not be the very best way. But IMHO, the damage this can cause is minor enough, as it only affects the shutdown command.

Running mpi4py script without mpi

Normally I'd use mpiexec to run a process on multiple hosts like:
mpiexec -n 8 --hostfile hosts.txt python my_mpi_script.py
where my_mpi_script.py depends on mpi4py.
Supposing I couldn't run mpiexec or mpirun, how would I be able to run my_mpi_script.py on multiple hosts -- would this be possible by changing my script or execution environment?
Edit: I'm working with a system that runs the same command on many hosts. Normally, processes would discover each other on the local network rather than all be spawned by MPI. My current solution involves: checking which host I'm on and running mpiexec on exactly one of the hosts. This doesn't work well due to some networking limitations.
Thanks.

Gnu parallel - run jobs on localhost (cygwin) and remote machines

I am trying to get GNU parallel to split up a processing job between my machine (Win7 running Cygwin) and some remote machines (Linux).
But I can't seem to figure out the syntax to do both from the same command. I have tried using -S localhost, user#server1, user#server2 but (localhost) does not have sshd running, so this fails (the job does continue running on the remote hosts).
Thanks in advance.
From man parallel:
The sshlogin : is special, it means no ssh and will therefore run on the local computer.

R Parallel - connecting to remote cores

Working in R 2.14.1, on Windows 7
Using the package parallel in R, I'm trying to take advantage of cores outside of my local machine available on my network, where all remote hosts I am connecting to are identical Windows machines.
The basic form of the commands are as such to make the connection.
library(parallel)
#assume 8 cores per machine
cl<-makePSOCKcluster(c(rep("localhost", 8), rep("otherhost", 8)))
Of course, trying to debug these things can be pretty tricky, but here is where I'm at with it.
If I specify the manual = TRUE flag as below
cl<-makePSOCKcluster(c(rep("localhost", 8), rep("otherhost", 8)), manual=TRUE)
there are no problems connecting to the remote host, and running a parallel process. The computers have identical setups to the one that I am working on. Yet, when this manual flag is not set, the connection command hangs.
This seems to indicate to me that since the manual flag bypasses ssh to make the connection to the host, that ssh is the problem when manual=FALSE.
It is not guaranteed at the moment that the remote computers have ssh on them. The question is, given that I have all the pertinent windows login information for my remote hosts, and that I cannot change the settings on the remote computers, how would I connect to cores on remote machines with the package parallel in R without specifying manual = true?
Alternatively, if ssh must be installed for this to happen, let's assume all computers have ssh on them. How would I connect to cores on the remote machines without circumventing ssh?
If you need any more information please let me know, I appreciate the time.
UPDATE 1
8-26-14
Thanks to Steve Weston for his insights. I will provide an update with the exact tools and setup I use to get my system working when it's up and running.
Feel free to comment or post if you have anything else to add as to what may be the best route to go in remote connecting to a windows machine from a windows machine via makePSOCKcluster, where the manual flag is set to FALSE.
When creating a PSOCK cluster with manual=FALSE, the only way to start a worker on a remote machine is with "ssh", "rsh", or something command-line compatible, such as "plink" from PuTTY. The reason is that makePSOCKcluster starts the remote workers using the "system" function to execute commands of the form:
ssh -l user otherhost '/usr/lib/R/bin/Rscript' -e 'parallel:::.slaveRSOCK()' MASTER=myhost PORT=10187 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
You can confirm this by looking at the source code for the newPSOCKnode function in the file snowSOCK.R from the parallel package.
For this to work, the ssh-compatible command must be available on the local machine and a corresponding ssh daemon must be running on each of the remote machines, otherwise makePSOCKcluster will simply hang. I've found that installing a good, working ssh daemon is the difficult part on Windows.
Unfortunately, manual=TRUE is generally the easiest way to create a PSOCK cluster on multiple Windows machines.
Helle everyone, I had the same problem and I managed to solve it. It is June 2018 when I'm writing this answer, my OS is windows 10 and the R version is 3.2.2. It is surprising to see this problem still exists after 4 years. I hope it can be fixed in the following release.
Before you move on, please make sure you can access the server in cmd using ssh. I didn't put any password in my code because I have the private key, you don't need to do that and you will see the reason later.
Fixing The problem
File directory
Since the function makePSOCKcluster works when manually start the workers, my first trying is to let manual=TRUE, and see what's the output. Here is my result:
machineAddresses <-list(list(host='192.168.1.220',user='jeff'))
cl <- makePSOCKcluster(spec,manual = F)
> Manually start worker on 192.168.1.220 with
"C:/PROGRA~1/R/R-32~1.2/bin/x64/Rscript" -e
"parallel:::.slaveRSOCK()" MASTER=DESKTOP-U5JA32O PORT=11756
OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
Ok, Here is the first problem. The Rscript location is incorrect(The location of Rscript in the server). Generally, it locates in C:\Program Files. In my server is C:\Program Files\R\R-3.2.2\bin. So we need to correct them by adding more option to tell this stupid code where the Rscript is:
machineAddresses <-list(list(host='192.168.1.220',
user='jeff',rscript="C:/Program Files/R/R-3.3.2/bin/Rscript"))
CMD problem
Once you fix the directory problem, you will find that the code still hangs forever. Then we need to check if we can manually access the server in R, my code is:
system("ssh jeff#192.168.1.220")
> GetConsoleMode on STD_INPUT_HANDLE failed with 6
I honestly don't know what does this error mean, but we just need to fix that. Inspired by #Steve Weston, I decide to use PuTTY, so I install it, and change my code to:
machineAddresses <-list(list(host='192.168.1.220',user='jeff',rscript="C:/Program Files/R/R-3.3.2/bin/Rscript",rshcmd="plink -pw qwer"))
The option -pw means the password. Because I'm a newbie to PuTTY, I don't know how to let the private key automatically work in PuTTY. Therefore, I use the easiest way to deal with that: put your password! The above code is equivalent to the following in cmd:
plink -pw qwer jeff#192.168.1.220 Rscript -e parallel:::.slaveRSOCK() MASTER=DESKTOP-U5JA32O PORT=11063 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
And this is exactly what we will do if we manually create the workers. For those who are new like me, you need to add the PuTTY directory in PATH in your environmental variables to run plink. Here are my final codes:
machineAddresses <-list(list(host='192.168.1.220',user='jeff',rscript="C:/Program Files/R/R-3.3.2/bin/Rscript",rshcmd="plink -pw qwer"))
cl <- makePSOCKcluster(machineAddresses,manual = F)
I run it with no problem at all. In summary, the function makePSOCKcluster makes two mistakes:
Assuming a wrong R directory in the server(At least it should assume the same directory as my local computer, but it didn't! I don't know where that strange directory comes from)
Using ssh command to start the connection, which does not work in R. It works well in cmd, but not in R. I don't know the reason.
If you are still not able to use makePSOCKcluster, here is one trick: Try to connect to the server in R using system function first. It can give you some error code, that may instruct you where the problem is. Here is my debugging code:
system("plink -pw qwer jeff#192.168.1.220 Rscript -e parallel:::.slaveRSOCK() MASTER=DESKTOP-U5JA32O PORT=11063 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE")

Openmpi trouble with mpirun and ssh

Here's the thing. I've installed openmpi on two different computer, I already compile and run separetly the hello_world example on this machines and it's works well. But the problem is when I launched this command :
mpirun -hostfile hosts -n 3 hello_c
with in the hosts file : localhost and the ip of my other machine. Then, the program ask me my ssh password, and after I fill it nothing append like mpirun just crashed. My really problem is that I can't run an mpi process on two different computers trough ssh.
I want to precise that all openmpi binary and library are well set in path, even the hello_world.
update
I've already setup a pass_wordless ssh with rsa certificate, but it does'nt work too. I've launched mpirun in debug mode (-d) and I got this :
[baptiste#baptiste RE51]$ mpirun -d -hostfile hosts hello_c
[baptiste.thinkFed:02666] procdir: /tmp/openmpi-sessions-baptiste#baptiste.thinkFed_0/53471/0/0
[baptiste.thinkFed:02666] jobdir: /tmp/openmpi-sessions-baptiste#baptiste.thinkFed_0/53471/0
[baptiste.thinkFed:02666] top: openmpi-sessions-baptiste#baptiste.thinkFed_0
[baptiste.thinkFed:02666] tmp: /tmp
[roommateServer:01102] procdir: /tmp/openmpi-sessions-baptiste#roommateServer_0/53471/0/1
[roommateServer:01102] jobdir: /tmp/openmpi-sessions-baptiste#roommateServer_0/53471/0
[roommateServer:01102] top: openmpi-sessions-baptiste#roommateServer_0
[roommateServer:01102] tmp: /tmp
And nothing else, it stay here and I've to kill mpirun.
For information, I tried to lauchn mpirun hello_c trough ssh on the remote node with this command :
ssh roomServer mpirun hello_c
This work well... I definetly can't understand why it doesn't work on all nodes ..
Assuming your compiler is setup properly as well as your hosts file. Your problem is that you need to setup passwordless ssh between the two computers, otherwise you will get the error you described. This is because MPI needs to communicate quick and efficiently and not have messages be prompted for a password which would cause the messages to stall and the program to crash.

Resources