MPI - one function for MPI_Init and MPI_Init_thread

Is it possible to have one function to wrap both MPI_Init and MPI_Init_thread? The purpose of this is to have a cleaner API while maintaining backward compatibility. What happens to a call to MPI_Init_thread when it is not supported by the MPI run time? How do I keep my wrapper function working for MPI implementations when MPI_Init_thread is not supported?

MPI_INIT_THREAD is part of the MPI-2.0 specification, which was released 15 years ago. Virtually all existing MPI implementations are MPI-2 compliant except for some really archaic ones. You might not get the desired level of thread support, but the function should be there and you should still be able to call it instead of MPI_INIT.
Your best and most portable option is to have a configure-like mechanism probe for MPI_Init_thread in the MPI library, e.g. by trying to compile a very simple MPI program and seeing if it fails with an unresolved symbol reference, or by directly examining the export table of the MPI library with nm (for archives) or objdump (for shared ELF objects). Once you've determined that the MPI library has MPI_Init_thread, you can have a preprocessor symbol defined, e.g. CONFIG_HAS_INITTHREAD. Then have your wrapper look similar to this one:
int init_mpi(int *pargc, char ***pargv, int desired, int *provided)
{
#if defined(CONFIG_HAS_INITTHREAD)
    /* MPI_Init_thread is available - ask for the desired thread level */
    return MPI_Init_thread(pargc, pargv, desired, provided);
#else
    /* Fall back to plain MPI_Init; only single-threaded support exists */
    *provided = MPI_THREAD_SINGLE;
    return MPI_Init(pargc, pargv);
#endif
}
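The configure-time probe mentioned above can be as simple as trying to build a one-line test program (a sketch; conftest.c is a hypothetical name, and mpicc stands in for whichever compiler wrapper your implementation ships):
/* conftest.c - if this compiles and links, MPI_Init_thread is available */
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Finalize();
    return 0;
}
If mpicc conftest.c fails (an unresolved reference to MPI_Init_thread, or an undeclared MPI_THREAD_MULTIPLE), leave CONFIG_HAS_INITTHREAD undefined.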
Of course, if the MPI library is missing MPI_INIT_THREAD, then MPI_THREAD_SINGLE and the other thread support level constants will also not be defined in mpi.h, so you might need to define them somewhere.
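For instance (a sketch; the standard only mandates the ordering SINGLE < FUNNELED < SERIALIZED < MULTIPLE, so the exact values below are placeholders):
/* Fallback thread support level constants for pre-MPI-2 headers.
   The values are placeholders; only their relative ordering matters. */
#if !defined(CONFIG_HAS_INITTHREAD)
#  define MPI_THREAD_SINGLE     0
#  define MPI_THREAD_FUNNELED   1
#  define MPI_THREAD_SERIALIZED 2
#  define MPI_THREAD_MULTIPLE   3
#endif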


What is the Ada equivalent of #define in C++?

I'm new to Ada and as far as I could explore on the internet, I was unable to find an analog to this C++ concept.
Say I have package_name.data_member (multiple variables in various packages). I'm hoping to shorten that to something more reader-friendly, like below (without using the use keyword), because those variables will be used multiple times in the same file -
#define A package_name.data_member
#define B package_name.data_member
...
Is there a way I can do the above in Ada?
In this case you need an object renaming declaration (ARM 8.5.1):
A : Data_Member_Type renames Package_Name.Data_Member;
If you’re using GNAT, it includes a tool gnatprep; the major differences from cpp are that
symbols to be substituted have to be marked in the source text, e.g. $foo ($ isn't in the Ada source character set), and
substitutions can only be defined in a separate definitions file (or e.g. -Dfoo=bar on the command line).
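A minimal gnatprep sketch (the file names are illustrative):
--  config.adb.gp : source containing a gnatprep symbol to substitute
procedure Config is
   Buffer : String (1 .. $Max_Size);
begin
   null;
end Config;
With a definitions file containing the line Max_Size := 100 (or -DMax_Size=100 on the command line), running gnatprep config.adb.gp config.adb prep.def produces plain Ada source with the value substituted.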
There is no exact analogue of #define (or any pre-processing) in standard Ada (although you could use a macro preprocessor if you need that), but for this use case a renaming declaration should suit:
A : Atype renames package_name.data_memberA;
B : Btype renames package_name.data_memberB;
This has the advantage over #define that the tokens A and B are not mistakenly replaced by their definitions in unintended places.

In GDB Python script, array indexing fails if frame’s language is Ada

I have a script to work out how much free stack space there is in each FreeRTOS task. GDB’s language is set to auto. The script works fine when the current language is c, but fails when the current language is ada.
I have, in the class Stacks,
tcb_t = gdb.lookup_type("TCB_t")
int_t = gdb.lookup_type("int")
used to:
find {Ada task control block}.Common.Thread,
thread = atcb["common"]["thread"]
convert to a pointer to the FreeRTOS task control block,
tcb = thread.cast(Stacks.tcb_t.pointer()).dereference()
find the logical top of the stack
stk = tcb["pxStack"].cast(Stacks.int_t.pointer())
Now I need to loop logically down the stack until I find an entry not equal to the initialised value,
free = 0
while stk[free] == 0xa5a5a5a5:
    free = free + 1
which works fine if the current frame’s language is c, but if it’s ada I get
Python Exception <class 'gdb.error'> not an array or string:
Error occurred in Python command: not an array or string
I’ve traced this to the expression stk[free], which is being interpreted using the rules of the current language (in Ada, array indexing uses parentheses, so it would be stk(free), which is of course illegal since Python treats it as a function call).
I’ve worked round this by
def invoke(self, arg, from_tty):
    gdb.execute("set language c")
    ...
    gdb.execute("set language auto")
but it seems wrong not to set the language back to what it was originally.
So,
is there a way of detecting the current GDB language setting from Python?
is there an alternate way of indexing that doesn’t depend on the current GDB language setting?
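For what it's worth, a sketch of both (assumptions, not from the original thread: gdb.parameter accepts the set/show parameter name, and stk is a pointer-typed gdb.Value):
# 1) Read the current language setting so it can be restored afterwards.
lang = gdb.parameter("language")   # e.g. "auto", "c" or "ada"

# 2) Sidestep subscripting altogether: pointer arithmetic on a gdb.Value
#    plus an explicit dereference never goes through the language parser.
free = 0
while (stk + free).dereference() == 0xa5a5a5a5:
    free = free + 1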

Julia ccall outb - Problems with libc

I run the following ccall's:
status = ccall((:ioperm, "libc"), Int32, (Uint, Uint, Int32), 0x378, 5, 1)
ccall((:outb, "libc"), Void, (Uint8, Uint16), 0x00, 0x378)
After the second ccall I receive the following Error message:
ERROR: ccall: could not find function outb in library libc
in anonymous at no file
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:285
After some research and messing around I found the following information:
ioperm is in libc, but outb is not
However, both ioperm and outb are defined in the same header file <sys/io.h>
An equivalent version of C code compiles and runs smoothly.
outb is supposedly in glibc; however, on the system glibc is the library named libc
The same problem occurs with the full path name /lib/x86_64-linux-gnu/libc.so.6
EDIT:
Thanks for the insight @Employed Russian! I did not look closely enough to realize the extern declaration. Now all of my above notes make total sense!
Great, we found that ioperm is a libc function that is declared in <sys/io.h>, and that outb is not in libc, but is defined in <sys/io.h> as an inline function wrapping a volatile assembly instruction.
Which library, or file path should I use?
However, both ioperm and outb are defined in the same header file <sys/io.h>
By "defined" you actually mean "declared". They are different. On my system:
extern int ioperm (unsigned long int __from, unsigned long int __num,
                   int __turn_on) __attribute__ ((__nothrow__ , __leaf__));

static __inline void
outb (unsigned char __value, unsigned short int __port)
{
  __asm__ __volatile__ ("outb %b0,%w1": :"a" (__value), "Nd" (__port));
}
It should now be obvious why you can call ioperm but not outb.
Update 1
I am still lost as to how to correct the error.
You can't import outb from libc. You would have to provide your own native library, e.g.
/* my_outb.c - wrap the inline outb from <sys/io.h> in a real, exported function */
#include <sys/io.h>

void my_outb(unsigned char value, unsigned short port)
{
    outb(value, port);
}
and import my_outb from it. For symmetry, you should probably implement my_ioperm the same way, so you are importing both functions from the same native library.
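For instance (hypothetical file and library names, keeping the question's Julia 0.3-era syntax):
$ gcc -shared -fPIC -O2 my_outb.c -o libmyio.so
Then, with libmyio.so somewhere the dynamic loader can find it:
ccall((:my_outb, "libmyio"), Void, (Uint8, Uint16), 0x00, 0x378)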
Update 2
Making a library worked, but in terms of performance it is horrible.
I guess that's why the original is implemented as an inline function: you are only executing a single outb instruction, so the overhead of a function call is significant.
Unoptimized Python does 5x better.
Probably by having that same outb instruction inlined into it.
Do you know if outb exists in some other library, not in libc?
That is not going to help: you will still have the function call overhead. I am guessing that when you call the imported function from Julia, you probably execute a dlopen and dlsym call, which would impose an additional overhead of several hundred instructions.
There is probably a way to "bind" the function dynamically once, and then use it repeatedly to make the call (thus avoiding repeated dlopen and dlsym). That should help.
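In Julia that one-time binding looks roughly like this (a sketch reusing the hypothetical libmyio.so from above):
lib = dlopen("libmyio")        # resolve the library once
fptr = dlsym(lib, :my_outb)    # resolve the symbol once
# Reuse the raw function pointer on every call, avoiding repeated lookups:
ccall(fptr, Void, (Uint8, Uint16), 0x00, 0x378)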

gperftools failing to identify files

Is there a way to avoid Google Performance Tools listing files as "??:?", that is, failing to locate which file contains the function it is reporting on? How can I work out which library contains the function being called?
$ env LD_PRELOAD="/usr/lib/libprofiler.so.0" \
CPUPROFILE=output.prof python script.py
$ google-pprof --text --files /usr/bin/python output.prof
Using local file /usr/bin/python.
Using local file output.prof.
Removing _L_unlock_13 from all stack traces.
Total: 433 samples
362 83.6% 83.6% 362 83.6% dtrsm_ ??:?
58 13.4% 97.0% 58 13.4% dgemm_ ??:?
1 0.2% 97.2% 1 0.2% PyDict_GetItem /.../Objects/dictobject.c
1 0.2% 97.5% 1 0.2% PyParser_AddToken /.../Parser/parser.c
...
I am aiming to be able to profile the C code in a python package that has many compiled C extension modules. In the toy example above, what would I do to track down where "dtrsm_" is defined? If there are multiple loaded libraries that contain functions with that same name, is there any way to tell which version is being called?
C/C++ won't compile if the same pre-processed source file (e.g. with #includes expanded) contains duplicate definitions for the same symbol. (Note that in the case of C++, symbols are mangled, according to compiler-specific schemes, to incorporate the argument signature so as to facilitate overloaded functions, which could not otherwise be differentiated.)
The linker is only concerned with unresolved symbols (so there ought to be nothing preventing multiple libraries from concurrently calling their own respective internally-defined functions with coincident names). If a file invokes a declared but undefined function, and multiple available libraries implement that symbol, then the linker is free to choose (say, by precedence in a search path) which version gets substituted in. (Incidentally, this is the same mechanism by which profilers such as gperftools or hpctoolkit are able to inject themselves and alter the normal behaviour of another application.)
Since different libraries are mapped to separate pages of memory, it ought to be possible to identify (from memory addresses) which library contains the executing version of a function. Indeed, the GNU debugger can identify the library containing a piece of code even when it fails to name the function.
$ gdb python
(gdb) run -c "from numpy import *; linalg.inv(random.random((1000,1000)))"
CTRL-C
(gdb) backtrace
#0 0x00007ffff5ba9df8 in dtrsm_ () from /usr/lib/libblas.so.3
...
#3 0x00007ffff420df83 in ?? () from /.../numpy/linalg/_umath_linalg.so
Linux (or rather the GNU C library) provides the "backtrace" call (for getting a list of pointers from the call stack), and the "backtrace_symbols" call for automatically converting each of those pointers to a descriptive string such as:
"/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fc429929ec5]"
Gperftools can (judging from a query on the github mirror) call the generic "backtrace", but instead of "backtrace_symbols" it "forks out to pprof to do the actual symbolizing". This is a fairly epic Perl script, and is likely where the "??" comes from.
Crucially, google-pprof is trying to report on the source-file (and line-number) which defines the function, not the binary-file containing the machine-code (that is typically quoted in stack traces). It invokes the "nm" utility. On my system it appears (by running "nm -l -D") that libblas, unlike libc and the python binary, has been stripped of such debugging symbols (presumably for optimisation), explaining the result.
To answer the original question: the call-stack samples should definitively and explicitly specify which version is being called. These can probably be dumped using an option added to google-pprof several months ago, or (for time-intensive functions) can be roughly ascertained by manual resampling using gdb. (It's even conceivable that google-pprof could be adjusted to explicitly identify the binary paths in its output summaries.) Alternatively, one can run nm (and grep) on the candidate binaries/libraries (a short-list of which can be obtained by running strings on the profiler's raw output, among other methods). If the source is accessible (to grep) or the libraries are popular (on the web), then of course (and per Mike Dunlavey) it may be easiest to just query for the function name. In theory the "??:?" may be addressed by carefully recompiling the offending objects.
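For example, to check whether a candidate library exports dtrsm_ (the paths are illustrative):
$ nm -D /usr/lib/libblas.so.3 | grep dtrsm
$ nm -l -D /usr/lib/libblas.so.3 | grep dtrsm   # -l adds file:line, if not stripped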
Just Google the offending function names. The ones you show above are BLAS routines (commonly shipped alongside LAPACK). dtrsm is for solving a matrix equation. dgemm is for multiplying matrices.
What you need to know is 1) why they are being called, and 2) how big the matrices are.
To find out why they are being called, what I do is just examine individual stack samples, as here.
The reason matrix size matters is that, if the matrices are small, these routines can actually spend a relatively large fraction of their time just classifying their inputs, such as by calling the function LSAME.

What are kernel blocks in OpenCL?

In the article "How to set up Xcode to run OpenCL code, and how to verify the kernels before building" NeXTCoder referred to some code as the "Short Answer", i.e. https://developer.apple.com/library/mac/#documentation/Performance/Conceptual/OpenCL_MacProgGuide/XCodeHelloWorld/XCodeHelloWorld.html.
In that code the author says "Wrap your kernel code into a kernel block:" without explaining what is a "kernel block". (The OpenCL Programmer Guide for Mac OS X by Apple makes no mention of kernel block.)
The host program calls "square_kernel" but the sample kernel is called "square", and the sample kernel block is labelled "kernelName" (in italics). Can you please tell me how to put the 3 pieces together: kernel, kernel block & host program, to run in Xcode 5.1? I only have one kernel. Thanks.
It's not really jargon; it's a closure-like entity.
OpenCL C 2.0 adds support for the clang block syntax. You use the ^ operator to declare a Block variable and to indicate the beginning of a Block literal. The body of the Block itself is contained within {}, as shown in the example (as usual with C, ; indicates the end of the statement). The Block is able to make use of variables from the same scope in which it was defined.
Example:
int multiplier = 7;
int (^myBlock)(int) = ^(int num) {
    return num * multiplier;
};
printf("%d\n", myBlock(3));
// prints 21
Source:
https://www.khronos.org/registry/cl/sdk/2.1/docs/man/xhtml/blocks.html
The term "kernel block" only seems to be a jargon to refer to the "part of the code that is the kernel". Particularly, the kernel block in this case is simply the function that is declared to be a kernel, by adding kernel before its declaration. Or, even simpler, and from the way how the term is used on this website, I would say that "kernel block" is the same as "kernel".
The kernelName (in italics) is a placeholder. The code there shows the general pattern of how to define any kernel (see the sketch after this list):
It is prefixed with kernel
It returns void
It has a name ... the kernelName, which may for example be square
It has several input- and output parameters
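A sketch of that pattern for the square example (a minimal kernel following the pattern above; the parameter names are illustrative):
kernel void square(global float *input, global float *output)
{
    size_t i = get_global_id(0);
    output[i] = input[i] * input[i];
}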
The reason why the kernel is called square but invoked with square_kernel seems to be some magic done by Xcode: it seems to read the .cl file and create a .h file that contains additional declarations derived from the .cl file (as can be seen in this question, where a kernel called rebound is defined, and GCL generated a rebound_kernel declaration).
