I am pretty new to MPI and Intel Pin. I have already installed pin-2.13-62732-gcc.4.4.7-linux on my Linux environment, and I need to use this tool on MPI codes. For example, I want to get the number of instructions (such as with the inscount0 tool that already ships with Pin) executed by an MPI program (like imul.c). Could you tell me what I can do?
The least painful way I found is to use tau_pin. https://www.cs.uoregon.edu/research/tau/docs/old/re39.html
You can start analyzing your MPI application the following way:
mpirun -np $NPROCS pin -t $PIN_TOOL -- $APP
It's the same as in the case of Valgrind: Using valgrind to spot error in mpi code
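For example, to run the stock inscount0 tool on an MPI binary built from imul.c, the command would look roughly like this (the obj-intel64 path assumes a default 64-bit Pin kit layout; also note that all ranks started in the same directory will write to the same inscount.out unless you give each one its own -o path):
mpirun -np 4 pin -t $PIN_HOME/source/tools/ManualExamples/obj-intel64/inscount0.so -- ./imul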
I can't figure out which device name I should use for the ATtiny88. For example, the string for my old Nano was arduino:avr:nano:cpu=atmega328old
I installed the ATtiny cores in arduino-cli and tested the board in the Arduino IDE, where it works well, but I want to use arduino-cli.
(Sorry for my English)
Found the solution: this string is called an FQBN (fully qualified board name). I can get the FQBN for all my boards by running the command arduino-cli board listall
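For example (the exact ATtiny88 FQBN depends on which third-party core you installed, so the value below is only a placeholder):
arduino-cli board listall attiny
arduino-cli compile --fqbn <your-attiny88-fqbn> MySketch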
I am trying to interface a BLE module based on Nordic's nRF52840 to a Yocto-based SBC, to which all the BlueZ-related packages have been added.
I have flashed Zephyr's sample hci_uart program onto the module. The module runs perfectly on my Linux PC (BlueZ version 5.48), whereas on the SBC (BlueZ version 5.54) it fails to initialize. Here's the error I get:
root@rb-imx6:~# hciconfig hci0 up
Can't init device hci0: Cannot assign requested address (99)
Can anyone please help me out on this?
Thanks in advance.
The "Cannot assign requested address" error is caused by missing Linux kernel configuration options:
CONFIG_CRYPTO_USER
CONFIG_CRYPTO_USER_API
CONFIG_CRYPTO_USER_API_AEAD
CONFIG_CRYPTO_USER_API_HASH
CONFIG_CRYPTO_AES
CONFIG_CRYPTO_CCM
CONFIG_CRYPTO_AEAD
CONFIG_CRYPTO_CMAC
This is likely to happen with a self-built Buildroot or Yocto embedded Linux system. If you run into this error, you should enable the above options and recompile the kernel.
See the BlueZ requirements here: https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/README#n64
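On a Yocto build with a linux-yocto-based kernel recipe, one way to enable them is a configuration fragment added through a .bbappend. The file and recipe names below are only an illustration of the mechanism (and the ":" override syntax depends on your Yocto release; older releases use "_prepend"):
# recipes-kernel/linux/files/bluetooth-crypto.cfg
CONFIG_CRYPTO_USER=y
CONFIG_CRYPTO_USER_API=y
CONFIG_CRYPTO_USER_API_AEAD=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_CCM=y
CONFIG_CRYPTO_AEAD=y
CONFIG_CRYPTO_CMAC=y
# recipes-kernel/linux/linux-yocto_%.bbappend
FILESEXTRAPATHS:prepend := "${THISDIR}/files:"
SRC_URI += "file://bluetooth-crypto.cfg"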
To see detailed debug output from BlueZ, run it with the -d option:
bluetoothd -d
I am running OpenAI baselines, specifically the Hindsight Experience Replay code. (However, I think this question is independent of the code and is an MPI-related one, hence why I'm posting on StackOverflow.)
You can see the README there, but the point is that the command to run is:
python -m baselines.her.experiment.train --num_cpu 20
where the number of CPUs can vary and is for MPI.
I am successfully running the HER training script with 1-4 CPUs (i.e., --num_cpu x for x=1,2,3,4) on a single machine with:
Ubuntu 16.04
Python 3.5.2
TensorFlow 1.5.0
One TitanX GPU
The number of CPUs seems to be 8 as I have a quad-core i7 Intel processor with hyperthreading, and Python confirms that it sees 8 CPUs.
(py3-tensorflow) daniel@titan:~/baselines$ ipython
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import os, multiprocessing
In [2]: os.cpu_count()
Out[2]: 8
In [3]: multiprocessing.cpu_count()
Out[3]: 8
Unfortunately, when I run with 5 or more CPUs, I get this message blocking the code from running:
(py3-tensorflow) daniel@titan:~/baselines$ python -m baselines.her.experiment.train --num_cpu 5
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:
Bind to: CORE
Node: titan
#processes: 2
#cpus: 1
You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------
And here's where I get lost. There's no traceback or line of code that I need to fix, so I'm unsure where I would even add overload-allowed in the code.
The way this code works, at a high level, is that it takes in this argument and uses the Python subprocess module to run an mpirun command. However, checking mpirun --help on the command line doesn't reveal overload-allowed as a valid argument.
Googling this error message leads to questions in the openmpi repository, for instance:
https://github.com/open-mpi/ompi/issues/626 (seems to have died out without resolving the issue)
https://github.com/open-mpi/ompi/issues/2158 (not sure how it relates to my issue; there was no clear resolution)
But I'm not sure whether it's an Open MPI thing or an mpi4py thing.
Here's pip list in my virtual environment if it helps:
(py3.5-mpi-practice) daniel@titan:~$ pip list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
decorator (4.2.1)
ipython (6.2.1)
ipython-genutils (0.2.0)
jedi (0.11.1)
line-profiler (2.1.2)
mpi4py (3.0.0)
numpy (1.14.1)
parso (0.1.1)
pexpect (4.4.0)
pickleshare (0.7.4)
pip (9.0.1)
pkg-resources (0.0.0)
pprintpp (0.3.0)
prompt-toolkit (1.0.15)
ptyprocess (0.5.2)
Pygments (2.2.0)
setuptools (20.7.0)
simplegeneric (0.8.1)
six (1.11.0)
traitlets (4.3.2)
wcwidth (0.1.7)
So, TL;DR:
How do I fix this error in my code?
If I add the "overload-allowed" thing, what happens? Is it safe?
Thanks!
overload-allowed is a qualifier that is passed to the --bind-to parameter of mpirun (source).
mpirun ... --bind-to core:overload-allowed
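So, if you were invoking mpirun by hand, the full command would look roughly like this (the -np value and module path are simply taken from the question; the key part is the :overload-allowed suffix):
mpirun -np 5 --bind-to core:overload-allowed python -m baselines.her.experiment.train
In the baselines case, you would add that qualifier to the mpirun command line that the training script builds with the subprocess module.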
Beware that hyperthreading is more about marketing than about real performance gains.
Your i7 actually has four physical cores and four "logical" ones. The logical cores essentially try to use whatever resources of the physical cores are currently idle. The problem is that a good HPC program will use 100% of the CPU hardware, so hyperthreading won't have spare resources left to work with.
So it is safe to "overload" the "cores", but it's not your number-one candidate for a performance boost.
Regarding the advice the paper's authors give about reproducing the results: in the best case, fewer CPUs just means slower learning. However, if learning doesn't converge to the expected value no matter how the hyperparameters are tweaked, that is a reason to look more closely at the proposed algorithm.
While IEEE 754 computations do differ when done in a different order, this difference should not play a crucial role.
The error message suggests that mpi4py is built on top of Open MPI.
By default, a slot is a core, but if you want a slot to be a hyperthread, then you should
mpirun --use-hwthread-cpus ...
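With that flag, each of your 8 logical CPUs counts as a slot, so, for instance, the five workers from the question would fit without the overload warning (illustrative command; the training script normally constructs this itself):
mpirun --use-hwthread-cpus -np 5 python -m baselines.her.experiment.train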
I am discovering the STM32F4 Discovery board.
So far I'm able to use the LEDs and the button, and to communicate through the serial port.
I'm now trying to use the GMP library on this board.
I built the arm-none-eabi toolchain following these instructions: https://blog.tan-ce.com/gcc-bare-metal/
I configured GMP with the following options:
./configure CC=arm-none-eabi-gcc CFLAGS="-nostartfiles --specs=nosys.specs -g" --host=arm-none-eabi --disable-assembly
My project compiles and links without any issue, but when I try to initialize an mpz_t on the board with the following code:
mpz_t a;
mpz_init_set_str(a, "31", 10);
I fall into HardFault_Handler(), and arm-none-eabi-gdb gives me:
(gdb) bt
#0 HardFault_Handler () at ./src/stm32f4xx_it.c:34
#1 <signal handler called>
#2 0x08016ade in __gmpn_fft_best_k (n=134358201, sqr=134358201) at mul_fft.c:151
#3 0x0801816e in __gmpn_mul_fft (op=0x80006f5 <HardFault_Handler>, pl=134219497, n=0x8022471 <Reset_Handler>, nl=537001984, m=0x80224b9 <WWDG_IRQHandler>, ml=134358201, k=134358201) at mul_fft.c:870
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
If someone has already run GMP on a microcontroller, I am very interested in how you did it!
I finally found the solution: the Cortex core type must be specified.
For the STM32F4, adding -mcpu=cortex-m4 to the CFLAGS solves the problem.
I use the toolchain available here: arm-none-eabi toolchain
The whole configuration command is:
./configure CC=arm-none-eabi-gcc CFLAGS="-nostartfiles --specs=nosys.specs -mcpu=cortex-m4" --host=arm-none-eabi --disable-assembly --prefix=your-bare-metal-gmp-location
where "your-bare-metal-gmp-location" is the installation directory (you must not install a bare metal library in the classical /usr/local).
After running an MPI Fortran program, I am getting this error:
"Abort signaled by rank 2: No ACTIVE ports found
MPI process terminated unexpectedly
Abort signaled by rank 1: No ACTIVE ports found"
How do I solve it?
It looks like you are using an MPI implementation compiled for InfiniBand. See here: https://bugzilla.redhat.com/show_bug.cgi?id=467532 You probably need to find (or build) an MPI library that uses TCP.
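If the library turns out to be Open MPI (an assumption; the linked bug report is about an InfiniBand-enabled build), you may be able to force the TCP transport at run time instead of rebuilding:
mpirun --mca btl tcp,self -np 4 ./your_program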