Bespoke affinity maps (process bindings) in MPICH

I am implementing an application using MPICH (sudo apt-get install mpich) on Linux (Ubuntu).
My current solution looks like this:
HYDRA_TOPO_DEBUG=1 mpiexec.hydra -n 3 -bind-to core:1 MyApp
...
process 0 binding: 10001000
process 1 binding: 01000100
process 2 binding: 00100010
What I want, however, is to assign one process to 4 cores and each of the other two processes to 2 cores. That is, I want an affinity map that looks like this:
process 0 binding: 11001100
process 1 binding: 00100010
process 2 binding: 00010001
Using SMPD on Windows, I was able to obtain the required result with something like this:
mpiexec -n 1 -host localhost --bind-to core:2 MyApp : -n 2 -host localhost --bind-to core:1 MyApp
This, however, does not work with Hydra. I have read every manual I could find and would be grateful for any help, even if it's just a pointer to another Hydra manual that I have not read yet. Cheers!

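The "user" keyword of -bind-to can be used to assign logical cores manually: each comma-separated group belongs to one process, and the cores within a group are joined with +.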
Hence, one can write:
HYDRA_TOPO_DEBUG=1 mpiexec.hydra -n 3 -bind-to user:0+1+4+5,2+6,3+7 MyApp
Then, I obtain:
process 0 binding: 11001100
process 1 binding: 00100010
process 2 binding: 00010001
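For larger layouts, the binding argument can be generated rather than typed by hand. The following is a minimal Python sketch, not part of Hydra: the helper names user_binding and binding_mask are hypothetical, and the bit-string rendering assumes that the leftmost digit corresponds to logical core 0, which is consistent with the output shown above.

```python
def user_binding(core_sets):
    """Build a Hydra '-bind-to' argument of the form user:a+b,c+d,...

    core_sets holds one list of logical core ids per process, in rank order.
    """
    groups = ("+".join(str(core) for core in cores) for cores in core_sets)
    return "user:" + ",".join(groups)


def binding_mask(cores, total_cores):
    """Render a core list as a bit string (assumption: leftmost digit = core 0)."""
    selected = set(cores)
    return "".join("1" if i in selected else "0" for i in range(total_cores))


if __name__ == "__main__":
    layout = [[0, 1, 4, 5], [2, 6], [3, 7]]
    print(user_binding(layout))
    for rank, cores in enumerate(layout):
        print(f"process {rank} binding: {binding_mask(cores, 8)}")
```

The printed string can be passed to mpiexec.hydra after -bind-to, and the masks can be compared with what HYDRA_TOPO_DEBUG=1 reports.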

Related

eucalyptus-node-keygen.service is showing failed on node controller

I have installed the node controller on CentOS 7. Running systemctl shows that the eucalyptus-node service is active and running, but eucalyptus-node-keygen.service has failed. How do I fix this issue?
The eucalyptus-node-keygen.service generates keys that are used for instance migration. The service runs conditionally: it generates the keys only when required, so if the keys are already present they are not generated again.
# systemctl cat eucalyptus-node-keygen.service | grep Condition
ConditionPathExists=|!/etc/pki/libvirt/servercert.pem
#
# stat -t /etc/pki/libvirt/servercert.pem
/etc/pki/libvirt/servercert.pem 1298 8 81a4 0 0 fd00 833392 1 0 0 1582596904 1582596904 1582596904 0 4096 system_u:object_r:cert_t:s0
So typically this service will show "start condition failed", which is not an error, and no action is required.
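To see why the condition fails once the certificate exists, here is a minimal Python sketch of how a single ConditionPathExists= value is read. It is an illustration only, not systemd code: the function name condition_path_exists is hypothetical, and it handles just one condition. In systemd, a leading | marks the condition as a triggering condition and a leading ! negates the check, with | written first when both are used.

```python
import os


def condition_path_exists(value):
    """Evaluate one ConditionPathExists= value.

    Returns (triggering, satisfied):
      triggering -- True if the value starts with '|'
      satisfied  -- result of the (possibly negated) existence check
    """
    triggering = value.startswith("|")
    if triggering:
        value = value[1:]
    negated = value.startswith("!")
    if negated:
        value = value[1:]
    exists = os.path.exists(value)
    return triggering, (not exists) if negated else exists


if __name__ == "__main__":
    # With servercert.pem present, the negated check is not satisfied,
    # so the unit is skipped rather than started.
    print(condition_path_exists("|!/etc/pki/libvirt/servercert.pem"))
```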

MPI: Pin each instance to certain cores on each node

I want to execute several instances of my program with OpenMPI 2.11. Each instance runs on its own node (-N 1) on my cluster. This works fine. I now want to pin each program-instance to the first 2 cores of its node. To do that, it looks like I need to use rankfiles. Here is my rankfile:
rank 0=+n0 slot=0-1
rank 1=+n1 slot=0-1
This, in my opinion, should limit each program-instance to cores 0 and 1 of the local machine it runs on.
I execute mpirun like so:
mpirun -np 2 -N 1 -rf /my/rank/file my_program
But mpirun fails with this error without even executing my program:
Conflicting directives for mapping policy are causing the policy
to be redefined:
New policy: RANK_FILE
Prior policy: UNKNOWN
Please check that only one policy is defined.
What's this? Did I make a mistake in the rankfile?
Instead of using a rankfile, simply use a hostfile:
n0 slots=n max_slots=n
n1 slots=n max_slots=n
Then tell Open MPI to map one process per node with two cores per process using:
mpiexec --hostfile hostfile --map-by ppr:1:node:PE=2 --bind-to core ...
ppr:1:node:PE=2 reads as: 1 process per resource; resource type is node; 2 processing elements per process. You can check the actual binding by adding the --report-bindings option.
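Besides --report-bindings, the binding can also be checked from inside a launched process. The sketch below is one way to do that in Python, under a few assumptions: it relies on os.sched_getaffinity, which is available on Linux only, it reads the OMPI_COMM_WORLD_RANK environment variable that Open MPI sets for each rank, and the script name show_affinity.py used afterwards is hypothetical.

```python
import os
import socket


def format_cpu_set(cpus):
    """Return a sorted, comma-separated rendering of a set of CPU ids."""
    return ",".join(str(cpu) for cpu in sorted(cpus))


def current_affinity():
    """CPU ids this process may run on, or None where the call is unavailable."""
    if hasattr(os, "sched_getaffinity"):
        return os.sched_getaffinity(0)
    return None


if __name__ == "__main__":
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    cpus = current_affinity()
    shown = format_cpu_set(cpus) if cpus is not None else "unavailable"
    print(f"rank {rank} on {socket.gethostname()}: cpus {shown}")
```

Launched as mpiexec --hostfile hostfile --map-by ppr:1:node:PE=2 --bind-to core python3 show_affinity.py, each rank prints the CPU ids it is allowed to use. On nodes with hardware threads enabled, two cores may show up as more than two CPU ids.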

How do I get back to the running instance of riak-shell?

I was in riak-shell when ssh lost its connection to the server. After reconnecting, I do the following:
sudo riak-shell
and get:
An instance of riak-shell is already running
So, I restarted the riak node in question. This did not seem to solve the problem. I do not see anything using ps -aux to kill. According to the docs, only one instance can run at a time. That makes sense, but when I run riak-shell from another node and try to connect to any node, I now get the following:
Error: invalid function call : connection_EXT:connect ["riak@<<<ip_address_elided>>>"]
You can connect to a specific node (whether in your riak_shell.config
or not) by typing 'connect "dev1@127.0.0.1";' substituting your
node name for dev1.
You may need to change the Erlang cookie to do this.
See also the 'reconnect' command.
Unhandled message received is {#Ref<0.0.0.135>,disconnected}
riak-shell(3)>
I have not changed the cookies during this process, and the cookie appears to be the same (at least in /etc/riak/riak_shell.config). (I am running the Riak TS AMI on AWS.)
riak-shell runs in its own Erlang VM, entirely separate from the riak node.
(You don't need to run riak-shell from the machine your node is on; it uses the normal riak-erlang-client to talk to riak.)
If you are on Linux, run ps aux | grep riak_shell_app. It will give you the process ID you need to kill that instance:
08:30:45:~ $ ps aux | grep riak_shell_app
vagrant 4671 0.0 0.3 493260 34884 pts/4 Sl+ Aug17 0:03 /home/vagrant/riak_ee/dev/dev1/erts-5.10.3/bin/beam.smp -- -root /home/vagrant/riak_ee/dev/dev1 -progname erl -- -home /home/vagrant -- -boot /home/vagrant/riak_ee/dev/dev1/releases/2.1.1/start_clean -run riak_shell_app boot debug_off /home/vagrant/riak_ee/dev/dev1/bin/../log/riak_shell/riak_shell -noshell -config /home/vagrant/riak_ee/dev/dev1/bin/../etc/riak
I wrote a good chunk of it, so let me know how you get on:
https://github.com/basho/riak_shell/graphs/contributors
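The ps lookup above can also be scripted. The following Python sketch is not part of riak-shell: the function name find_pids is hypothetical, and it assumes the usual 11-column ps aux layout (USER, PID, ..., COMMAND). It only lists matching PIDs and leaves the kill to you.

```python
import shutil
import subprocess


def find_pids(ps_output, marker="riak_shell_app"):
    """Return PIDs of processes in `ps aux` output whose command contains marker."""
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the full command stays in one piece.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        command = fields[10]
        if marker in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    if shutil.which("ps"):
        result = subprocess.run(["ps", "aux"], capture_output=True, text=True)
        print(find_pids(result.stdout))
```

Each PID it prints can then be passed to kill, as in the manual approach.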

What does -2 do when using psftp.exe?

I've just come across this line of code in a .bat file:
psftp -2 -l XXXXX 195.2.37.69 -pw XXXXX -P 10022 -b c:\sftp\sendfile.bat -v -bc -be
The help tells me what all the parameters do except for the -2.
Can anybody tell me what the -2 does?
-2 or -1 forces the use of the corresponding SSH protocol version, so in your example it enforces the use of SSH2. You can also specify -4 or -6, which force the use of IPv4 or IPv6 respectively.
Quoting the psftp documentation:
3.8.3.16 -1 and -2: specify an SSH protocol version
The -1 and -2 options force PuTTY to use version 1 or version 2 of the
SSH protocol. These options are only meaningful if you are using SSH.
These options are equivalent to selecting your preferred SSH protocol
version as ‘1 only’ or ‘2 only’ in the SSH panel of the PuTTY
configuration box (see section 4.18.4).
So the -2 forces SSH version 2.
In older versions, psftp tried SSH version 2 and fell back to SSH version 1 if the server did not support version 2. With -2, the fallback to the insecure version 1 did not happen and the connection was abandoned instead. The latest versions no longer fall back by default. Nowadays, no serious SSH/SFTP server even supports version 1 anyway.

Running MPI benchmarks on multiple nodes?

I am trying to run MPI benchmarks on four nodes, but the run always uses only one node. The command I use is as below:
mpirun -genv I_MPI_DEBUG=4 -np 4 -host mac-snb19,mac-snb20,mac-snb21,mac-snb22 IMB-MPI1 PingPong
or
mpirun -genv I_MPI_DEBUG=4 -np 4 --hosts mac-snb19,mac-snb20,mac-snb21,mac-snb22 IMB-MPI1 PingPong
Here, mac-snb19, mac-snb20, mac-snb21 and mac-snb22 are the nodes. Am I doing something wrong? The output I get shows that only mac-snb19 is used. I also checked by logging into the nodes: MPI processes are running only on mac-snb19 and not on the others. This partial output shows the problem:
[0] MPI startup(): 0 2073 mac-snb19 {0,1,2,3,16,17,18,19}
[0] MPI startup(): 1 2074 mac-snb19 {4,5,6,7,20,21,22,23}
[0] MPI startup(): 2 2075 mac-snb19 {8,9,10,11,24,25,26,27}
[0] MPI startup(): 3 2077 mac-snb19 {12,13,14,15,28,29,30,31}
benchmarks to run PingPong
Could you advise me what mistake I am making here?
Thanks
With the Hydra process manager, you could either add -perhost 1 to force one process per host or create a machine file with the following content:
mac-snb19:1
mac-snb20:1
mac-snb21:1
mac-snb22:1
and then use it like:
mpirun -genv I_MPI_DEBUG=4 -machinefile mfname -np 4 IMB-MPI1 PingPong
where mfname is the name of the machine file. :1 instructs Hydra to provide only one slot per host.
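If the host list changes often, the machine file can be generated. This is a small Python sketch under the assumption that every host gets the same number of slots; the function name machinefile is hypothetical and the host names are the ones from the question.

```python
def machinefile(hosts, slots_per_host=1):
    """Return machine file content with one 'host:slots' line per host."""
    return "".join(f"{host}:{slots_per_host}\n" for host in hosts)


if __name__ == "__main__":
    nodes = ["mac-snb19", "mac-snb20", "mac-snb21", "mac-snb22"]
    # Print the content; redirect it to a file and pass that file to -machinefile.
    print(machinefile(nodes), end="")
```

Redirecting the output to a file named mfname gives the same four lines as above.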

Resources