Why does OpenMPI use a different server given a different -n setting?

I am testing out OpenMPI, provided and compiled by another user (I am using soft links to his directories for all the mandatory directories: bin, include, etc.), but I ran into this weird behavior:
First of all, if I run mpirun with -n set to 10 or less, I can run the command below. testrunmpi.py simply prints out "run." from each core.
# I am on serverA.
bash-3.2$ /home/karl/bin/mpirun -n 10 ./testrunmpi.py
run.
run.
run.
run.
run.
run.
run.
run.
run.
run.
However, when I try -n greater than 10, I run into this:
bash-3.2$ /home/karl/bin/mpirun -n 24 ./testrunmpi.py
karl@serverB's password: Could not chdir to home directory /home/karl: No such file or directory
bash: /home/karl/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 19203) died unexpectedly with status 127 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
bash-3.2$
bash-3.2$
Permission denied, please try again.
karl@serverB's password:
Permission denied, please try again.
karl@serverB's password:
I see that the work is dispatched to serverB while I am on serverA. I don't have an account on serverB. But if I invoke mpirun with -n <= 10, the work stays on serverA.
This is strange, so I checked /home/karl/etc/openmpi-default-hostfile and tried setting the following:
serverA slots=24 max_slots=24
serverB slots=0 max_slots=32
But the problem persists and mpirun still prints the same error message as above. What must I do to make my program run on serverA only?

The default hostfile in Open MPI is system-wide, i.e. its location is determined while the library is being built and installed and there is no user-specific version of it. The actual location can be obtained by running the ompi_info command like this:
$ ompi_info --param orte orte | grep orte_default_hostfile
MCA orte: parameter "orte_default_hostfile" (current value: <LOOK HERE>, data source: default value)
You can override the list of hosts in several different ways. First, you can provide your own hostfile via the -hostfile option to mpirun. In that case you don't have to put hosts with zero slots in it; simply omit the machines that you have no access to. For example:
localhost slots=10 max_slots=10
serverA slots=24 max_slots=24
You can also change the path to the default hostfile by setting the orte_default_hostfile MCA parameter:
$ mpirun --mca orte_default_hostfile /path/to/your/hostfile -n 10 executable
Instead of passing the --mca option each time, you can set the value in an exported environment variable called OMPI_MCA_orte_default_hostfile. This can be set in your shell's dot-rc file, e.g. in .bashrc if using Bash.
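A one-line sketch for .bashrc (the hostfile path here is just an illustration, not part of the original setup):
export OMPI_MCA_orte_default_hostfile=$HOME/my_hostfile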
You can also specify the list of nodes directly via the -H (or -host) option.
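For example, to keep all ranks on serverA (a sketch; depending on the Open MPI version you may need to repeat the host once per slot, as shown here, or use the newer serverA:24 slot syntax):
$ mpirun -H serverA,serverA,serverA -n 3 ./testrunmpi.py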

Related

How to avoid duplicating workers with Symfony local server?

I found out that I have quite a lot of Symfony local web server workers registered (around ~35), and the number keeps growing. I usually just start the server with symfony serve and then kill it (Ctrl + \) when it is no longer needed. Apparently killing it leaves a worker behind, as seen in symfony server:status. Running symfony serve again just creates a new worker.
symfony server:status output:
Local Web Server
Not Running
Workers
PID 6327: /usr/bin/php7.2 -S 127.0.0.1:43653 -d variables_order=EGPCS /home/mindaugas/.symfony/php/83247c3521c3ac3990bf3f823ef473db0a9445e1-router.php
PID 24596: /usr/bin/php7.2 -S 127.0.0.1:37789 -d variables_order=EGPCS /home/mindaugas/.symfony/php/83247c3521c3ac3990bf3f823ef473db0a9445e1-router.php
PID 6575: /usr/bin/php7.4 -S 127.0.0.1:42505 -d variables_order=EGPCS /home/mindaugas/.symfony/php/83247c3521c3ac3990bf3f823ef473db0a9445e1-router.php
PID 41550: /usr/bin/php7.4 -S 127.0.0.1:36313 -d variables_order=EGPCS /home/mindaugas/.symfony/php/83247c3521c3ac3990bf3f823ef473db0a9445e1-router.php
...
Environment Variables
None
So my questions regarding this:
#1: Is it possible to quickly kill the server? I assume symfony server:stop is the more correct way, but that requires an additional console window and typing the command.
#2: How do I kill those workers that are registered from previous sessions? Trying e.g. kill 6327 says that there's no such process. They're also not gone after a system restart.
Those extra workers bother me because the server log output in the console is duplicated for each one of them. So right now each request to the server produces around 3k lines of log output in the console, which makes it pretty useless.
I have the same problem after upgrading to Symfony CLI version v4.19.0...
My (very) bad workaround:
rm /home/myusername/.symfony/var/83247c3521c3ac3990bf3f823ef473db0a9445e1/*
Edit: this answer is not accurate, as hinted at by @CrSrr's answer above.
The symfony command adds data to both the ./log and ./var directories. Deleting entries in only one of those does not remove the appearance of non-existent workers in the project directory. I was fooled by checking the status in a directory where server:start had never been run.
A bug report is on file with symfony here.
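If you still want the brute-force route, a sketch that clears both directories mentioned above (an assumption-laden workaround: run it only when no server is actually running, and note the hash directory names vary per project):
rm -rf ~/.symfony/var/* ~/.symfony/log/*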
I just faced a similar issue. The PIDs were nowhere to be found.
PS G:\workspace\joined> symfony server:status
Local Web Server
Not Running
Workers
PID 7732: C:\php\php-cgi.exe -b 63801 -d error_log=C:\Users\George\.symfony\log\e79ad2f4b30a2f0a35c3b5ab08772770b382a3d6.log
PID 19324: C:\php\php-cgi.exe -b 62927 -d error_log=C:\Users\George\.symfony\log\e79ad2f4b30a2f0a35c3b5ab08772770b382a3d6.log
PID 17968: C:\php\php-cgi.exe -b 50197 -d error_log=C:\Users\George\.symfony\log\e79ad2f4b30a2f0a35c3b5ab08772770b382a3d6.log
PID 14040: C:\php\php-cgi.exe -b 55075 -d error_log=C:\Users\George\.symfony\log\e79ad2f4b30a2f0a35c3b5ab08772770b382a3d6.log
Environment Variables
None
In Windows, the log files are kept in %USERPROFILE%\.symfony. There's most likely a similar location in your home directory on other platforms. Deleting all the contents of that directory allowed a new Windows Terminal app to show:
PS G:\workspace> symfony server:status
Local Web Server
Not Running
Workers
No Workers
Environment Variables
None
Run symfony server:stop to stop the server, then run symfony serve again to start it. Good luck!

Error happens when trying to umount the Lustre file system

When I umount the Lustre FS it displays:
[root@cn17663-ens4 mnt]# umount /mnt/lustre
umount: /mnt/lustre: target is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
and if I add the force option -f it gives the same result:
[root@cn17663-ens4 mnt]# umount /mnt/lustre -f
umount: /mnt/lustre: target is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
When I try to list the directory it gives me:
[root@cn17663-ens4 mnt]# ls
ls: cannot access lustre: Cannot send after transport endpoint shutdown
lustre
I cannot find the reason and cannot solve it.
Did you actually try running lsof /mnt/lustre (as the error message recommends) to see what is using the filesystem? This problem is not unique to Lustre, but true of any local filesystem as well - if there is a process using the filesystem (current working directory or open file) then it can't be unmounted until that process stops using it (cd out of /mnt/lustre or close the open file(s)).
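For example (both are standard diagnostics; fuser is an alternative if lsof is not installed):
lsof /mnt/lustre
fuser -vm /mnt/lustre
fuser lists the PIDs holding the mount busy; stopping those processes (or cd-ing out of the directory) should let the umount proceed.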
I find I can use umount -l /mnt/xx to solve this problem! The -l option performs a lazy unmount: it detaches the filesystem immediately and cleans up the remaining references once it is no longer busy.

Run multiple instances of RStudio in a web browser

I have RStudio Server installed on a remote AWS server (Ubuntu) and want to run several projects at the same time (one of which takes a long time to finish). On Windows there is a simple GUI solution like 'Open Project in New Window'. Is there something similar for RStudio Server?
A simple question, but I failed to find a solution except for this related question for Macs, which offers
Run multiple rstudio sessions using projects
but how?
While running batch scripts is certainly a good option, it's not the only solution. Sometimes you may still want interactive use in different sessions rather than having to do everything as batch scripts.
Nothing stops you from running multiple instances of RStudio Server on your Ubuntu server on different ports. (I find this particularly easy to do by launching RStudio through Docker, as outlined here.) Because an instance will keep running even when you close the browser window, you can easily launch several instances and switch between them. You'll just have to log in again when you switch.
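A minimal sketch with the rocker/rstudio Docker image (image name, ports, and passwords here are examples, not part of the original setup):
docker run -d -p 8787:8787 -e PASSWORD=secret1 --name rstudio1 rocker/rstudio
docker run -d -p 8788:8787 -e PASSWORD=secret2 --name rstudio2 rocker/rstudio
Each container then serves its own independent RStudio instance on ports 8787 and 8788 (the default login user is rstudio).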
Unfortunately, RStudio-server still prevents you having multiple instances open in the browser at the same time (see the help forum). This is not a big issue as you just have to log in again, but you can work around it by using different browsers.
EDIT: Multiple instances are fine as long as they are not in the same browser, for the same browser user, AND on the same IP address; e.g. a session on 127.0.0.1 and another on 0.0.0.0 would be fine. More importantly, the instances keep running even if they are not 'open', so this really isn't a problem. The only thing to note is that you would have to log back in to access the instance.
As for projects, you'll see you can switch between projects using the 'Projects' button at the top right, but while this will preserve your other sessions, I do not think it actually supports simultaneous code execution. You need multiple instances of the R environment running to actually do that.
UPDATE 2020: Okay, it's now 2020 and there are lots of ways to do this.
For running scripts or functions in a new R environment, check out:
the callr package
The RStudio jobs panel
Run new R sessions or scripts from one or more terminal sessions in the RStudio terminal panel
Log out and log in to the RStudio Server as a different user (this requires multiple users to be set up in the container; obviously not a good workflow for a single user, but it is worth noting that many different users can access the same RStudio Server instance without problems).
Of course, spinning up multiple docker sessions on different ports is still a good option as well. Note that many of the ways listed above still do not allow you to restart the main R session, which prevents you from reloading installed packages, switching between projects, etc, which is clearly not ideal. I think it would be fantastic if switching between projects in an RStudio (server) session would allow jobs in the previously active project to keep running in the background, but have no idea if that's in the cards for the open source version.
Often you don't need several instances of RStudio; in that case just save your code in a .R file and launch it from the Ubuntu command line (perhaps inside screen):
Rscript script.R
That will launch a separate R session which will do the work without freezing your Rstudio. You can pass arguments too, for example
# script.R -
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
  start = '2015-08-01'
} else {
  start = args[1]
}
console -
Rscript script.R 2015-11-01
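If the script is long-running and you want it to survive logging out of the server, a small sketch using nohup (screen or tmux work too; the log file name is an example):
nohup Rscript script.R 2015-11-01 > script.log 2>&1 &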
I think you need RStudio Server Pro to be able to log in with multiple users/sessions.
You can see the comparison table below for reference.
https://www.rstudio.com/products/rstudio-server-pro/
Installing another instance of rstudio server is less than ideal.
Linux server admins, fear not. You just need root access or a kind admin.
Create a group to use: groupadd Rwarrior
Create an additional user with the same home directory as your primary RStudio login:
useradd -d /home/user1 user2
Add primary and new user into Rwarrior group:
gpasswd -a user2 Rwarrior
gpasswd -a user1 Rwarrior
Take care of the permissions for your primary home directory:
cd /home
chown -R user1:Rwarrior /home/user1
chmod -R 770 /home/user1
chmod g+s /home/user1
Set password for the new user:
passwd user2
Open a new browser window in incognito/private browsing mode and login to Rstudio with the new user you created. Enjoy.
I run multiple RStudio servers by isolating them in Singularity instances. Download the Singularity image with the command singularity pull shub://nickjer/singularity-rstudio
I use two scripts:
run-rserver.sh:
Find a free port
#!/bin/env bash
set -ue
thisdir="$(dirname "${BASH_SOURCE[0]}")"

# Return 0 if the port $1 is free, else return 1
is_port_free(){
    port="$1"
    set +e
    netstat -an |
        grep --color=none "^tcp.*LISTEN\s*$" |
        awk '{gsub("^.*:","",$4);print $4}' |
        grep -q "^$port\$"
    r="$?"
    set -e
    if [ "$r" = 0 ]; then return 1; else return 0; fi
}

# Find a free port in the range $1..$2
find_free_port(){
    local lower_port="$1"
    local upper_port="$2"
    for ((port=lower_port; port <= upper_port; port++)); do
        if is_port_free "$port"; then r=free; else r=used; fi
        if [ "$r" = "used" -a "$port" = "$upper_port" ]; then
            echo "Ports $lower_port to $upper_port are all in use" >&2
            exit 1
        fi
        if [ "$r" = "free" ]; then break; fi
    done
    echo $port
}

port=$(find_free_port 8080 8200)
echo "Access RStudio Server on http://localhost:$port" >&2

"$thisdir/cexec" \
    rserver \
    --www-address 127.0.0.1 \
    --www-port $port
cexec:
Create a dedicated config directory for each instance
Create a dedicated temporary directory for each instance
Use the Singularity instance mechanism to prevent forked R sessions from being adopted by PID 1 and staying around after rserver has shut down. Instead, they become children of the Singularity instance and are killed when that shuts down.
Map the current directory to the directory /data inside the container and set that as the home folder (this step might not be necessary if you don't care about reproducible paths on every machine)
#!/usr/bin/env bash
# Execute a command in the container
set -ue

if [ "${1-}" = "--help" ]; then
    # Quoted heredoc so the backticks are printed literally, not executed
    cat <<'EOF'
Usage: cexec command [args...]
Execute `command` in the container. This script starts the Singularity
container and executes the given command therein. The project root is mapped
to the folder `/data` inside the container. Moreover, a temporary directory
is provided at `/tmp` that is removed after the end of the script.
EOF
    exit 0
fi

thisdir="$(dirname "${BASH_SOURCE[0]}")"
container="rserver_200403.sif"

# Create a temporary directory
tmpdir="$(mktemp -d -t cexec-XXXXXXXX)"

# We delete this directory afterwards, so it's important that $tmpdir
# really holds the path to an empty, temporary dir, and nothing else
# (for example an empty string or the home dir)!
if [[ ! "$tmpdir" || ! -d "$tmpdir" ]]; then
    echo "Error: Could not create temp dir $tmpdir"
    exit 1
fi

# Check if the temp dir is empty (this might be superfluous, see
# https://codereview.stackexchange.com/questions/238439)
tmpcontent="$(ls -A "$tmpdir")"
if [ ! -z "$tmpcontent" ]; then
    echo "Error: Temp dir '$tmpdir' is not empty"
    exit 1
fi

# Start the Singularity instance
instancename="$(basename "$tmpdir")"

# Maybe also superfluous (like above)
rundir="$(readlink -f "$thisdir/.run/$instancename")"
if [ -e "$rundir" ]; then
    echo "Error: Runtime directory '$rundir' exists already!" >&2
    exit 1
fi
mkdir -p "$rundir"

singularity instance start \
    --contain \
    -W "$tmpdir" \
    -H "$thisdir:/data" \
    -B "$rundir:/data/.rstudio" \
    -B "$thisdir/.rstudio/monitored/user-settings:/data/.rstudio/monitored/user-settings" \
    "$container" \
    "$instancename"

# Stop the instance and delete the temporary directory when the script ends
trap "singularity instance stop '$instancename'; rm -rf '$tmpdir'; rm -rf '$rundir'" EXIT

singularity exec \
    --pwd "/data" \
    "instance://$instancename" \
    "$@"

SSH Issue on AIX 6.1

I recently upgraded the OpenSSL version on an AIX 6.1 server.
The install went fine.
But now I'm unable to start new SSH sessions from PuTTY to the server, and I'm getting the error "Connection refused".
However, I still have one PuTTY terminal open which is active.
I tried the command startsrc -s sshd and it returns a new PID, but I'm not able to start new sessions.
I also tried the following command, and it gives this error:
root:stud -> $ /usr/sbin/sshd -de
exec(): 0509-036 Cannot load program /usr/sbin/sshd because of the following errors:
0509-150 Dependent module /opt/freeware/lib/libcrypto.a(libcrypto.so.0) could not be loaded.
0509-152 Member libcrypto.so.0 is not found in archive
And sshd is inoperative.
root:stud -> $ lssrc -s sshd
Subsystem Group PID Status
sshd ssh inoperative
How can I resolve this issue?
I'm not sure how it worked the first time. That is odd. The error says you need the libcrypto library. Is it installed? I.e., what does
ls /opt/freeware/lib/libcrypto.a
return? If it exists, you want to try:
ar t /opt/freeware/lib/libcrypto.a
and you should see libcrypto.so.0 inside. My guess is one of those two will not be true and you need to install it. But it might be that libcrypto.so.0 will not load for its own reasons.
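You can also check which libcrypto member sshd itself is asking for with dump -H, the AIX counterpart of ldd (a diagnostic sketch, not a fix):
dump -H /usr/sbin/sshd | grep -i crypto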
Is this the official SSH package for AIX, or is it something you got somewhere else? (I just compile mine from source, but that's not always easy.)

Openmpi trouble with mpirun and ssh

Here's the thing. I've installed Open MPI on two different computers, and I have already compiled and run the hello_world example separately on each machine; it works well. But the problem appears when I launch this command:
mpirun -hostfile hosts -n 3 hello_c
with the hosts file containing localhost and the IP of my other machine. The program then asks for my SSH password, and after I enter it nothing happens, as if mpirun had simply crashed. My real problem is that I can't run an MPI process on two different computers through SSH.
I want to point out that all the Open MPI binaries and libraries are correctly set in the PATH, as is hello_world.
Update
I've already set up passwordless SSH with an RSA key, but it doesn't work either. I launched mpirun in debug mode (-d) and got this:
[baptiste@baptiste RE51]$ mpirun -d -hostfile hosts hello_c
[baptiste.thinkFed:02666] procdir: /tmp/openmpi-sessions-baptiste@baptiste.thinkFed_0/53471/0/0
[baptiste.thinkFed:02666] jobdir: /tmp/openmpi-sessions-baptiste@baptiste.thinkFed_0/53471/0
[baptiste.thinkFed:02666] top: openmpi-sessions-baptiste@baptiste.thinkFed_0
[baptiste.thinkFed:02666] tmp: /tmp
[roommateServer:01102] procdir: /tmp/openmpi-sessions-baptiste@roommateServer_0/53471/0/1
[roommateServer:01102] jobdir: /tmp/openmpi-sessions-baptiste@roommateServer_0/53471/0
[roommateServer:01102] top: openmpi-sessions-baptiste@roommateServer_0
[roommateServer:01102] tmp: /tmp
And nothing else; it stays there and I have to kill mpirun.
For information, I tried to launch mpirun hello_c through SSH on the remote node with this command:
ssh roomServer mpirun hello_c
This works well... I definitely can't understand why it doesn't work across all the nodes.
Assuming your compiler and hosts file are set up properly, your problem is that you need to set up passwordless SSH between the two computers; otherwise you will get the behavior you described. This is because MPI needs to communicate quickly and efficiently, and it cannot have its launch blocked by a password prompt, which causes the processes to stall and the program to fail.
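For reference, a minimal passwordless SSH setup from the launching node to the remote node might look like this (user and host names are taken from the question as examples):
ssh-keygen -t rsa -b 4096       # accept the default path, empty passphrase
ssh-copy-id baptiste@roomServer # installs your public key in ~/.ssh/authorized_keys
ssh baptiste@roomServer true    # should now complete without a password prompt
Once that last command runs silently, mpirun should be able to start orted on the remote node without blocking.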
