Open MPI + Scalasca: cannot run mpirun with the --prefix option

I have installed Scalasca (Score-P, Cube, ...) with Open MPI for performance measurement. When I add the option --prefix /my-path to mpirun, scalasca -analyze cannot be executed (it aborts).
My command:
scalasca -analyze mpirun -np 1 --host localhost --prefix /home/as/lib/bin /home/as/Documents/a.out
"/home/as/lib" is the installed Open MPI directory.
And the error:
S=C=A=N: Abort: Target executable `/home/as/lib/bin' is a directory!
If without "--prefix" , it runs normally .But i need "-- prefix" option to run it on a cluster.
I have installed Open MPI on all of cluster machines with same path (/home/as/lib) .
So how to fix it???

Open MPI adds an implicit --prefix option if orterun (or any of its symlinks like mpirun, mpiexec, etc.) is called with its full path. In other words:
$ /home/as/lib/bin/mpirun ...
is equivalent to:
$ mpirun --prefix /home/as/lib ...
If you really need to pass that --prefix option, e.g. because Open MPI is installed on the cluster nodes in a different directory than on the front-end node, then quote the entire parameter:
$ scalasca -a mpirun -np 1 --host localhost "--prefix /home/as/lib" /path/to/executable
The same applies to any other parameter to mpirun. You could even quote them all, just to be on the safe side:
$ scalasca -a mpirun "-np 1" "--host localhost" "--prefix /home/as/lib" /path/to/execuable
Hint: when building your own version of Open MPI, configure it with --enable-orterun-prefix-by-default. That way --prefix is added automatically even when orterun/mpirun/mpiexec is not called with its full path. Also, --enable-wrapper-rpath is a good build option to add; it prevents clashes with other versions of the library.
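For reference, a minimal configure invocation with both options might look like this (a sketch only; the install prefix is the one from the question, adjust as needed):
$ ./configure --prefix=/home/as/lib --enable-orterun-prefix-by-default --enable-wrapper-rpath
$ make && make install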

Related

Running dotnet through cron on RedHat fails

We have a .NET Core script we use to index some files. We leverage Red Hat Software Collections so that items like dotnet tie into our RHEL setup.
To run the script, we do the following:
source scl_source enable rh-dotnet30
/opt/rh/rh-dotnet30/root/usr/bin/dotnet /d/h/fileprocessor.dll 1
We want to run this in cron, but we cannot get it to work. We have tried the following:
Adding the source command to the bash profile, but this does not seem to be reliable for us and is not run for the cron event
Running this directly in cron
Running this as a shell script in cron
We are at a loss; it seems we can never get the two commands to work together. If we don't include the source command, even if it is in our profile, the script will not run and gives us the error:
"It was not possible to find any installed .NET Core SDKs
Did you mean to run .NET Core SDK commands? Install a .NET Core SDK from:
https://aka.ms/dotnet-download"
The following works for me. I am using rh-dotnet31 (.NET Core 3.1) though, since rh-dotnet30 (.NET Core 3.0) is out of support:
Install packages:
$ sudo yum install rh-dotnet31 -y
Start at a known directory
$ cd ~
Create a directory for .NET Core source code to use
$ mkdir hello
$ cd hello
Create a simple application for testing
$ scl enable rh-dotnet31 bash
$ dotnet new console
$ dotnet publish
$ exit # this exits from the subshell started from scl enable command above
Copy the build over to a separate directory where we can run it
$ cp -a bin/Debug/netcoreapp3.1/publish ../hello-bin
Create the script that cron will invoke
$ cd ~
And put this in a ./test.sh file:
#!/bin/bash
echo "test.sh running now...."
source scl_source enable rh-dotnet31
dotnet $HOME/hello-bin/hello.dll 1
You could probably even combine the last two lines (source... and dotnet ...) into scl enable rh-dotnet31 -- dotnet $HOME/hello-bin/hello.dll 1
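In that case the whole script would look something like this (a sketch, using the same hypothetical paths as above):
#!/bin/bash
echo "test.sh running now...."
# scl enable runs the given command with the collection's environment set up
scl enable rh-dotnet31 -- dotnet $HOME/hello-bin/hello.dll 1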
Then make it executable:
$ chmod +x ./test.sh
Set up the crontab file
$ crontab -e
And then add the line below in this file. This one runs the script every minute.
* * * * * $HOME/test.sh >> $HOME/test.cron.log 2>&1
On my machine, cron is running, so I now see the output of the cron job in the log file after a few minutes:
$ tail -f test.cron.log
test.sh running now....
Hello World!
test.sh running now....
Hello World!
test.sh running now....
Hello World!
test.sh running now....
Hello World!
test.sh running now....
Hello World!
The issue we ran into was that only the runtime had been installed, not the SDK. Once the SDK was installed, which pulled in many other dependencies, it just worked.

Run .exe on remote machine with PSEXEC

I am trying to run an .exe that sits on a remote machine with PSEXEC, but it does not run properly.
The exe is a converted python script:
usage: WindowsUpdates3.exe [-h] [-v] [-m] [-i] [-u] [-r] [-s SKIP]
List, download and update Windows clients
optional arguments:
-h, --help show this help message and exit
-v, --version Show program version and exit
-m, --list-missing List missing updates
-i, --list-installed List installed updates
-u, --update List and install missing updates
-r, --reboot Reboot after installing updates if needed
-s SKIP, --skip SKIP Skips these KB numbers
I am able to run the .exe as intended with a PSEXEC console session:
PSEXEC \\<hostname> cmd
I navigate to the exe, run it (my.exe -i), and it works the same way as if I executed it on the machine locally.
When I try to execute the file directly, some functionality does not work:
PSEXEC.exe \\<hostname> -h "C:\WindowsUpdates\WindowsUpdates3.exe" -i
...
C:\WindowsUpdates\WindowsUpdates3.exe exited on <hostname> with error code 0.
I am able to get the help-menu (-h) and the version (-v) of the exe with the above command.
The other arguments return nothing but code 0, and the --update argument throws a com_error: -2147024891, which translates to access denied...
How can this be, given that I have the same privileges as when I spawn a cmd terminal?

Start tcsh in a specific directory

Does tcsh support launching itself in a remote directory via an argument?
The setup I am dealing with does not allow me to chdir to the remote directory before invoking tcsh, and I'd like to avoid having to create a .sh file for this workflow.
Here are the available arguments I see for v6.19:
> tcsh --help
tcsh 6.19.00 (Astron) 2015-05-21 (x86_64-unknown-Linux) options wide,nls,dl,al,kan,rh,color,filec
-b file batch mode, read and execute commands from 'file'
-c command run 'command' from next argument
-d load directory stack from '~/.cshdirs'
-Dname[=value] define environment variable `name' to `value' (DomainOS only)
-e exit on any error
-f start faster by ignoring the start-up file
-F use fork() instead of vfork() when spawning (ConvexOS only)
-i interactive, even when input is not from a terminal
-l act as a login shell, must be the only option specified
-m load the start-up file, whether or not owned by effective user
-n file no execute mode, just check syntax of the following `file'
-q accept SIGQUIT for running under a debugger
-s read commands from standard input
-t read one line from standard input
-v echo commands after history substitution
-V like -v but including commands read from the start-up file
-x echo commands immediately before execution
-X like -x but including commands read from the start-up file
--help print this message and exit
--version print the version shell variable and exit
This works, but is suboptimal because it launches two instances of tcsh:
tcsh -c 'cd /tmp && tcsh'

Why does Open MPI use a different server given a different -n setting?

I am testing out Open MPI, provided and compiled by another user (I am using soft links to his directories for bin, include, etc., all the mandatory directories), but I ran into this weird thing:
First of all, if I run mpirun with -n <= 10, it works as shown below. testrunmpi.py simply prints out "run." from each core.
# I am in serverA.
bash-3.2$ /home/karl/bin/mpirun -n 10 ./testrunmpi.py
run.
run.
run.
run.
run.
run.
run.
run.
run.
run.
However, when I try running with -n greater than 10, I run into this:
bash-3.2$ /home/karl/bin/mpirun -n 24 ./testrunmpi.py
karl@serverB's password: Could not chdir to home directory /home/karl: No such file or directory
bash: /home/karl/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 19203) died unexpectedly with status 127 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
bash-3.2$
bash-3.2$
Permission denied, please try again.
karl@serverB's password:
Permission denied, please try again.
karl@serverB's password:
I see that the work is dispatched to serverB while I am on serverA. I don't have an account on serverB. But if I invoke mpirun with -n <= 10, the work stays on serverA.
This is strange, so I checked /home/karl/etc/openmpi-default-hostfile and tried setting the following:
serverA slots=24 max_slots=24
serverB slots=0 max_slots=32
But the problem persists and gives the same error message as above. What must I do to have my program run on serverA only?
The default hostfile in Open MPI is system-wide, i.e. its location is determined while the library is being built and installed and there is no user-specific version of it. The actual location can be obtained by running the ompi_info command like this:
$ ompi_info --param orte orte | grep orte_default_hostfile
MCA orte: parameter "orte_default_hostfile" (current value: <LOOK HERE>, data source: default value)
You can override the list of hosts in several different ways. First, you can provide your own hostfile via the -hostfile option to mpirun. In it, you don't have to put hosts with zero slots; simply omit the machines that you have no access to. For example:
localhost slots=10 max_slots=10
serverA slots=24 max_slots=24
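Assuming that file is saved as, say, ~/my_hostfile (a hypothetical name), it would be passed like this:
$ mpirun -hostfile ~/my_hostfile -n 10 ./testrunmpi.py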
You can also change the path to the default hostfile by setting the orte_default_hostfile MCA parameter:
$ mpirun --mca orte_default_hostfile /path/to/your/hostfile -n 10 executable
Instead of passing the --mca option each time, you can set the value in an exported environment variable called OMPI_MCA_orte_default_hostfile. This could be set in your shell's dot-rc file, e.g. in .bashrc if using Bash.
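For example, reusing the hypothetical hostfile path from above:
$ export OMPI_MCA_orte_default_hostfile=$HOME/my_hostfile
$ mpirun -n 10 ./testrunmpi.py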
You can also specify the list of nodes directly via the -H (or -host) option.
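For the case in the question, pinning the job to serverA could then look like this (a sketch; depending on the Open MPI version, running more processes than listed slots may require explicitly allowing oversubscription):
$ mpirun -H serverA -n 10 ./testrunmpi.py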

Error when running an MPI job

I'm trying to run an MPI job on a cluster with Torque and Open MPI 1.3.2 installed, and I'm always getting the following error:
"mpirun was unable to launch the specified application as it could not find an executable:
Executable: -p
Node: compute-101-10.local
while attempting to start process rank 0."
I'm using the following script to do the qsub:
#PBS -N mphello
#PBS -l walltime=0:00:30
#PBS -l nodes=compute-101-10+compute-101-15
cd $PBS_O_WORKDIR
mpirun -npersocket 1 -H compute-101-10,compute-101-15 /home/username/mpi_teste/mphello
Any idea why this happens?
What I want is to run 1 process on each node (compute-101-10 and compute-101-15). What am I getting wrong here?
I've already tried several combinations of the mpirun command, but either the program runs on only one node or it gives me the above error...
Thanks in advance!
The -npersocket option did not exist in Open MPI 1.2.
The diagnostics that Open MPI reported,
mpirun was unable to launch the specified application as it could not
find an executable: Executable: -p
are exactly what mpirun in Open MPI 1.2 would print if called with this option.
Running mpirun --version on the compute nodes will show which version of Open MPI is the default there.
The problem is that the -npersocket flag is only supported from Open MPI 1.3.2 on, and the cluster where I'm running my code only has Open MPI 1.2, which doesn't support that flag.
A possible workaround is to use the -loadbalance flag and specify the nodes where I want the code to run with the flag -H node1,node2,node3,..., like this:
mpirun -loadbalance -H node1,node2,...,nodep -np number_of_processes program_name
That way each node will run number_of_processes/p processes, where p is the number of nodes on which the processes will run.
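For the original two-node case, one process per node would then be (host names and binary path taken from the question):
mpirun -loadbalance -H compute-101-10,compute-101-15 -np 2 /home/username/mpi_teste/mphello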
