MPI_Scatterv with increasing segmentation order

I would like to partition a vector into vectors of different sizes with MPI_Scatterv. When I choose a partition in decreasing order the code runs fine, but when I choose an increasing order, it fails.
Is it possible that MPI_Scatterv can only be used for partitioning in decreasing order? I don't know where the error is. The working code and the failing variation follow.
program scatt
include 'mpif.h'
integer idproc, num, ierr, tag,namelen, status(MPI_STATUS_SIZE),comm
character *(MPI_MAX_PROCESSOR_NAME) processor_name
integer, allocatable :: myray(:),send_ray(:)
integer counts(3),displ(3)
integer siz,mysize,i,k,j,total
call MPI_INIT(ierror)
comm = mpi_comm_world
call MPI_COMM_SIZE(comm, num, ierror)
call MPI_COMM_RANK(comm, idproc, ierror)
siz=12
! create the segmentation in decreasing manner
counts(1)=5
counts(2)=4
counts(3)=3
displ(1)=0
displ(2)=5
displ(3)=9
allocate(myray(counts(idproc+1)))
myray=0
! create the data to be sent on the root
if(idproc == 0)then
!size=count*num
allocate(send_ray(0:siz-1))
do i=0,siz
send_ray(i)=i+1
enddo
write(*,*) send_ray
endif
! send different data to each processor
call MPI_Scatterv( send_ray, counts, displ, MPI_INTEGER, &
myray, counts, MPI_INTEGER, &
0,comm,ierr)
write(*,*)"myid= ",idproc," ray= ",myray
call MPI_FINALIZE(ierr)
end
The correct output is:
myid= 1 ray= 6 7 8 9
myid= 0 ray= 1 2 3 4 5
myid= 2 ray= 10 11 12
When I use the same code with an increasing segmentation order:
counts(1)=2
counts(2)=4
counts(3)=6
displ(1)=0
displ(2)=2
displ(3)=6
only the root receives its segment:
myid= 0 ray= 1 2
and the error message is:
Fatal error in PMPI_Scatterv: Message truncated, error stack:
PMPI_Scatterv(671)......................: MPI_Scatterv(sbuf=(nil), scnts=0x6b4da0, displs=0x6b4db0, MPI_INTEGER, rbuf=0x26024b0, rcount=2, MPI_INTEGER, root=0, MPI_COMM_WORLD) failed
MPIR_Scatterv_impl(211).................:
I_MPIR_Scatterv_intra(278)..............: Failure during collective
I_MPIR_Scatterv_intra(272)..............:
MPIR_Scatterv(147)......................:
MPIDI_CH3_PktHandler_EagerShortSend(441): Message from rank 0 and tag 6 truncated; 16 bytes received but buffer size is 8
Fatal error in PMPI_Scatterv: Message truncated, error stack:
PMPI_Scatterv(671)................: MPI_Scatterv(sbuf=(nil), scnts=0x6b4da0, displs=0x6b4db0, MPI_INTEGER, rbuf=0x251f4b0, rcount=2, MPI_INTEGER, root=0, MPI_COMM_WORLD) failed
MPIR_Scatterv_impl(211)...........:
I_MPIR_Scatterv_intra(278)........: Failure during collective
I_MPIR_Scatterv_intra(272)........:
MPIR_Scatterv(147)................:
MPIDI_CH3U_Receive_data_found(131): Message from rank 0 and tag 6 truncated; 24 bytes received but buffer size is 8
forrtl: error (69): process interrupted (SIGINT)

There are two problems in your code.
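First, the invocation of MPI_Scatterv is wrong. The receive count must be a scalar, not an array, and it gives the number of elements received by the calling rank only. Because the whole counts array is passed by reference, every rank currently uses counts(1) as its receive count. In the decreasing case that is 5, which is at least as large as every segment, so the code happened to work. In the increasing case it is 2 (note rcount=2 in the error stack), so ranks 1 and 2 receive 4 and 6 integers into a buffer announced as 2 integers, and MPI reports a truncated message. In your case you need to change the second occurrence of counts to counts(idproc+1):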
call MPI_Scatterv(send_ray, counts, displ, MPI_INTEGER, &
myray, counts(idproc+1), MPI_INTEGER, &
0, comm, ierr)
The same applies to the complementary operation MPI_Gatherv: there the send count of the local buffer is also a scalar.
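The counts/displacements bookkeeping can be checked without MPI at all. The following is a minimal plain-Python model (not mpi4py and not the MPI library); the helper names displacements and scatterv are hypothetical. It derives contiguous displacements from the counts and shows that passing counts(1) as every rank's receive count is harmless for [5, 4, 3] but fails for ranks 1 and 2 with [2, 4, 6]. Assuming 4-byte integers, 4 and 6 elements are the "16 bytes" and "24 bytes" from the error messages.

```python
# Plain-Python model of MPI_Scatterv's bookkeeping; no MPI library involved.
# Ranks are 0-based here, so Fortran's counts(idproc+1) corresponds to counts[rank].


def displacements(counts):
    """Contiguous, non-overlapping segments: each starts where the previous one ends."""
    displs = []
    offset = 0
    for count in counts:
        displs.append(offset)
        offset += count
    return displs


def scatterv(sendbuf, counts, displs, rank, recvcount):
    """Return the segment delivered to `rank`, failing like MPI on truncation."""
    segment = sendbuf[displs[rank]:displs[rank] + counts[rank]]
    if len(segment) > recvcount:
        # MPI reports "Message truncated" when more elements arrive than recvcount allows.
        raise ValueError(f"rank {rank}: {len(segment)} elements arrived, recvcount is {recvcount}")
    return segment


if __name__ == "__main__":
    send = list(range(1, 13))  # send_ray on the root: 1, 2, ..., 12
    for counts in ([5, 4, 3], [2, 4, 6]):
        displs = displacements(counts)
        print("counts", counts, "displs", displs)
        for rank in range(len(counts)):
            # Correct call: each rank passes its own count.
            print("  rank", rank, scatterv(send, counts, displs, rank, counts[rank]))
        # Buggy call: passing the whole counts array means every rank uses counts[0].
        for rank in range(len(counts)):
            try:
                scatterv(send, counts, displs, rank, counts[0])
            except ValueError as err:
                print("  buggy call:", err)
```

The displacement of each rank is simply the running sum of the previous counts, which gives [0, 5, 9] and [0, 2, 6], the values used in the question.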
Another problem is the out-of-bounds access in this initialisation loop:
allocate(send_ray(0:siz-1))
do i=0,siz
send_ray(i)=i+1
enddo
Here send_ray is allocated with bounds 0:siz-1, but the loop runs from 0 to siz, which is one element past the end of the array; the loop should be do i=0,siz-1. Some compilers have options that enable run-time bounds checking: with Intel Fortran the option is -check bounds, and with gfortran it is -fcheck=bounds. Accessing an array past its end can overwrite and thus alter the values in other arrays (worst case, hard to spot) or corrupt the heap bookkeeping and crash your program (best case, easy to spot).
As Gilles Gouaillardet has noted, do not use mpif.h. Use the mpi module instead or, even better, the mpi_f08 module in newly developed programs.

Related

How to interface Python with a C program that uses MPI

Currently I have a Python program (serial) that calls a C executable (parallel through MPI) through subprocess.run. However, this is a terribly clunky implementation as it means I have to pass some very large arrays back and forth from the Python to the C program using the file system. I would like to be able to directly pass the arrays from Python to C and back. I think ctypes is what I should use. As I understand it, I would create a dll instead of an executable from my C code to be able to use it with Python.
However, to use MPI you need to launch the program using mpirun/mpiexec. This is not possible if I am simply using the C functions from a dll, correct?
Is there a good way to enable MPI for the function called from the DLL? The two possibilities I've found are:
launch the python program in parallel using mpi4py, then pass MPI_COMM_WORLD to the C function (per this post How to pass MPI information to ctypes in python)
somehow initialize and spawn processes inside the function without using mpirun. I'm not sure if this is possible.
One possibility, if you are OK with passing everything through rank 0 of the C program, is to use subprocess.Popen() with stdin=subprocess.PIPE and the communicate() function on the Python side, and fread() on the C side.
This is obviously fragile, but it does keep everything in memory. Also, if your data size is large (which you said it is), you may have to write the data to the child process in chunks. Another option could be to use exe.stdin.write(x) rather than exe.communicate(x).
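For the chunked variant, a minimal sketch of the Python side might look like the following. The wire format (a native-endian int64 element count followed by the raw doubles) and the names encode_doubles and write_in_chunks are assumptions for illustration; the C child would have to be adapted to fread() an int64_t count and then that many doubles, since the example child below reads a single double.

```python
import io
import struct
from array import array


def encode_doubles(values):
    """Hypothetical wire format: an int64 element count followed by the raw
    doubles, both in native byte order (what fread() in the C child would read)."""
    data = array("d", values)
    return struct.pack("=q", len(data)) + data.tobytes()


def write_in_chunks(stream, payload, chunk_size=1 << 20):
    """Write `payload` to a binary stream (for example exe.stdin) piece by piece."""
    view = memoryview(payload)
    for start in range(0, len(view), chunk_size):
        stream.write(view[start:start + chunk_size])
    stream.flush()


if __name__ == "__main__":
    # Stand-in for exe.stdin so the sketch runs without mpirun.
    sink = io.BytesIO()
    write_in_chunks(sink, encode_doubles([3.141592, 101.1]), chunk_size=4)
    raw = sink.getvalue()
    count, = struct.unpack_from("=q", raw)
    print(count, list(array("d", raw[8:])))
```

With a real child you would pass exe.stdin (from Popen(..., stdin=subprocess.PIPE)) instead of the BytesIO object, then call exe.stdin.close() so the child sees end-of-file, and finally exe.wait().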
I created a small example program.
C code (program named child):
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[]){
    MPI_Init(&argc, &argv);
    int size, rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    double ans = 0.0;
    /* only rank 0 reads the raw double from stdin */
    if(rank == 0){
        if(fread(&ans, sizeof(ans), 1, stdin) != 1)
            fprintf(stderr, "rank 0: short read on stdin\n");
    }
    /* share the value with every other rank */
    MPI_Bcast(&ans, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    printf("rank %d of %d received %lf\n", rank, size, ans);
    MPI_Finalize();
    return 0;
}
Python code (named driver.py):
#!/usr/bin/env python
import ctypes as ct
import subprocess as sp
x = ct.c_double(3.141592)
exe = sp.Popen(['mpirun', '-n', '4', './child'], stdin=sp.PIPE)
exe.communicate(bytes(x))  # send the 8 raw bytes of the double to rank 0
x = ct.c_double(101.1)
exe = sp.Popen(['mpirun', '-n', '4', './child'], stdin=sp.PIPE)
exe.communicate(bytes(x))
Results:
> python ./driver.py
rank 0 of 4 received 3.141592
rank 1 of 4 received 3.141592
rank 2 of 4 received 3.141592
rank 3 of 4 received 3.141592
rank 0 of 4 received 101.100000
rank 2 of 4 received 101.100000
rank 3 of 4 received 101.100000
rank 1 of 4 received 101.100000
I tried using MPI_Comm_connect() and MPI_Comm_accept() through mpi4py, but I couldn't seem to get that working on the python side.
Since most of the time is spent in the C subroutine, which is invoked multiple times, and you are running within a resource manager, I would suggest the following approach.
Start all the MPI tasks at once via the following command (assuming you have allocated n+1 slots):
mpirun -np 1 python wrapper.py : -np <n> a.out
You likely want to start with an MPI_Comm_split() in order to create a communicator containing only the n tasks running the C program.
Then you will define a "protocol" so the python wrapper can pass parameters to the C tasks, and wait for the result or direct the C program to MPI_Finalize().
You might also consider using an intercommunicator (the first group for Python, the second group for C), but this is really up to you. Intercommunicator semantics can be non-intuitive, so make sure you understand how they work if you want to go in that direction.

Is this use of character string pointers safe?

While implementing a string utility function, I came across a couple of character pointer expressions that I think may be unsafe. I googled, searched on SO, read my Fortran 95 language guide (Gehrke 1996) as well as various excerpts on display in Google books. However, I could not find any sources discussing this particular usage.
Both ifort and gfortran compile the following program without warning:
PROGRAM test_pointer
IMPLICIT NONE
CHARACTER(LEN=100), TARGET :: string = "A string variable"
CHARACTER(LEN=0), TARGET :: empty = ""
CHARACTER(LEN=:), POINTER :: ptr
ptr => NULL()
IF(ptr == "") PRINT *, 'Nullified pointer is equal to ""'
ptr => string(-2:-3)
IF(ptr == "") PRINT *, 'ptr equals "", but the (empty) sub string was out of bounds.'
ptr => empty(1:0)
IF(ptr == "") PRINT *, 'ptr equals "", it was not possible to specify subarray within bonds'
END PROGRAM
The output of the program is:
Nullified pointer is equal to ""
ptr equals "", but the (empty) sub string was out of bounds.
ptr equals "", it was not possible to specify subarray within bonds
So apparently, the evaluations of the pointer make sense to the compiler and the outcome is what you would expect. Can somebody explain why the above code did not result in at least one segmentation fault? Does the standard really allow out-of-bounds substrings? What about the use of a nullified character pointer?
Edit: After reading Vladimir F's answer, I realized that I had forgotten to enable run-time checking. With it enabled, the nullified pointer does trigger a run-time error.
Why do they not result in a segfault? Dereferencing a nullified pointer is not conforming to the standard (in C terms it is undefined behaviour). The standard does not say what a non-conforming program should do. The standard only applies to programs which conform to it! Anything can happen for non-conforming programs!
I get this (sunf90):
****** FORTRAN RUN-TIME SYSTEM ******
Attempting to use an unassociated POINTER 'PTR'
Location: line 8 column 6 of 'charptr.f90'
Aborted
and with another compiler (ifort):
forrtl: severe (408): fort: (7): Attempt to use pointer PTR when it is not associated with a target
Image PC Routine Line Source
a.out 0000000000402EB8 Unknown Unknown Unknown
a.out 0000000000402DE6 Unknown Unknown Unknown
libc.so.6 00007FA0AE123A15 Unknown Unknown Unknown
a.out 0000000000402CD9 Unknown Unknown Unknown
For the other two cases you are not accessing anything: you are creating a substring of length 0, so there is no need to access the character variable and the result is just an empty string.
Specifically, the Fortran standard (F2008:6.4.1.3) says this about creating a substring:
Both the starting point and the ending point shall be within the
range 1, 2, ..., n unless the starting point exceeds the ending
point, in which case the substring has length zero.
For this reason the first part is not standard conforming, but the other ones are.
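The quoted rule is small enough to model directly. This is a minimal Python sketch with hypothetical function names; it checks only the substring bounds rule from F2008 6.4.1.3, not the separate requirement that a pointer be associated before it is referenced (which is what makes the first case non-conforming).

```python
def substring_is_conforming(start, end, n):
    """F2008 6.4.1.3: bounds rule for string(start:end) of a length-n string."""
    if start > end:
        return True  # zero-length substring: no bounds requirement applies
    return 1 <= start <= n and 1 <= end <= n


def substring_length(start, end):
    """Length of string(start:end); never negative."""
    return max(0, end - start + 1)


if __name__ == "__main__":
    cases = {
        "string(-2:-3), len 100": (-2, -3, 100),
        "empty(1:0), len 0": (1, 0, 0),
        "string(0:5), len 100": (0, 5, 100),
    }
    for label, (start, end, n) in cases.items():
        print(label, substring_is_conforming(start, end, n), substring_length(start, end))
```

Both substrings from the question, string(-2:-3) and empty(1:0), have start > end, so they are conforming and have length zero; string(0:5) would not be conforming because 0 lies outside 1..n.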

What does ACL2 exit code 137 mean?

What does ACL2 exit code 137 mean? The output reads like this:
Form: ( INCLUDE-BOOK "centaur/ubdds/param" ...)
Rules: NIL
Time: 0.00 seconds (prove: 0.00, print: 0.00, other: 0.00)
:REDUNDANT
Note: not introducing any A4VEC field bindings for A, since none of
its fields appear to be used.
Note: not introducing any MODSCOPE field bindings for SCOPE, since
none of its fields appear to be used.
;;; Starting full GC, 10,736,500,736 bytes allocated.
Exit code from ACL2 is 137
top.cert seems to be missing
Looks like "Linux OOM killer" killed your program.
Exit status 137 means the program was terminated with signal 9 (SIGKILL). The Bash manual says:
When a command terminates on a fatal signal whose number is N, Bash uses the value 128+N as the exit status.
128+9=137
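The 128+N convention can be decoded mechanically. Below is a minimal Python sketch (the function name describe_exit_status is hypothetical, and the demo assumes a POSIX system with sh). Treating every status above 128 as a signal is a heuristic, because a program may also choose such an exit code itself. Note also that Python's subprocess module reports death by signal as a negative return code (-9) rather than 137, unless a shell in between translates it.

```python
import signal
import subprocess


def describe_exit_status(status):
    """Interpret a shell-style exit status such as bash's $? (heuristic:
    a program may also exit with a value above 128 on its own)."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))
    # The inner shell kills itself with SIGKILL; the outer shell then exits
    # with the 128+N status that it saw in $?.
    outer = subprocess.run(["sh", "-c", "sh -c 'kill -9 $$'; exit $?"])
    print(outer.returncode, describe_exit_status(outer.returncode))
    # Without a shell in between, Python reports death by signal N as -N.
    direct = subprocess.run(["sh", "-c", "kill -9 $$"])
    print(direct.returncode)
```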
This message from the log tells us that your ACL2 proof had allocated about 10 GB of memory:
;;; Starting full GC, 10,736,500,736 bytes allocated.
Linux has a feature where it kills an offending process when the system is very low on memory. It is called OOM Killer: https://www.kernel.org/doc/gorman/html/understand/understand016.html
Such events are logged by the kernel, so you can check for them directly:
$ dmesg |grep -i "killed process"
Mar 7 02:43:11 myhost kernel: Killed process 3841 (acl2) total-vm:128024kB, anon-rss:0kB, file-rss:0kB
There are two ACL2 calls, set-max-mem and maybe-wash-memory, which you can use to control memory consumption:
(include-book "centaur/misc/memory-mgmt" :dir :system) ;; adds ttag
(value-triple (set-max-mem (* 4 (expt 2 30)))) ;; 4 GB
Unfortunately these two calls do not guarantee that memory will be freed. Consider using a more powerful computer for your proof.
An exit code of 137 suggests that the process was killed with signal 9 (as with kill -9).
Reference: http://www.tldp.org/LDP/abs/html/exitcodes.html

Segmentation faults in Fortran recursive tree implementation

I need to implement a tree structure in Fortran for a project, so I've read various guides online explaining how to do it. However, I keep getting errors or weird results.
Let's say I want to build a binary tree where each node stores an integer value. I also want to be able to insert new values into a tree and to print the nodes of the tree. So I wrote a type "tree" that contains an integer, two pointers towards the children sub-trees and a boolean which I set to .true. if there are no children sub-trees:
module class_tree
implicit none
type tree
logical :: isleaf
integer :: value
type (tree), pointer :: left,right
end type tree
interface new
module procedure newleaf
end interface
interface insert
module procedure inserttree
end interface
interface print
module procedure printtree
end interface
contains
subroutine newleaf(t,n)
implicit none
type (tree), intent (OUT) :: t
integer, intent (IN) :: n
t % isleaf = .true.
t % value = n
nullify (t % left)
nullify (t % right)
end subroutine newleaf
recursive subroutine inserttree(t,n)
implicit none
type (tree), intent (INOUT) :: t
integer, intent (IN) :: n
type (tree), target :: tleft,tright
if (t % isleaf) then
call newleaf(tleft,n)
call newleaf(tright,n)
t % isleaf = .false.
t % left => tleft
t % right => tright
else
call inserttree(t % left,n)
endif
end subroutine inserttree
recursive subroutine printtree(t)
implicit none
type (tree), intent (IN) :: t
if (t % isleaf) then
write(*,*) t % value
else
write(*,*) t % value
call printtree(t % left)
call printtree(t % right)
endif
end subroutine printtree
end module class_tree
The insertion is always done into the left sub-tree unless trying to insert into a leaf. In that case, the insertion is done into both sub-trees to make sure a node has always 0 or 2 children. The printing is done in prefix traversal.
Now if I try to run the following program:
program main
use class_tree
implicit none
type (tree) :: t
call new(t,0)
call insert(t,1)
call insert(t,2)
call print(t)
end program main
I get the desired output 0 1 2 2 1. But if I add "call insert(t,3)" after "call insert(t,2)" and run again, the output is 0 1 2 0 and then I get a segfault.
I tried to see whether the fault happened during insertion or printing so I tried to run:
program main
use class_tree
implicit none
type (tree) :: t
call new(t,0)
call insert(t,1)
call insert(t,2)
write(*,*) 'A'
call insert(t,3)
write(*,*) 'B'
call print(t)
end program main
It makes the segfault go away but I get a very weird output A B 0 1 2673568 6 1566250180.
When searching online for similar errors, I found results suggesting that it might be due to too many recursive calls. However, the call to insert(t,3) should only involve 3 recursive calls... I've also tried to compile using gfortran with -g -Wall -pedantic -fbounds-check and run with a debugger. It seems the fault happens at the "if (t % isleaf)" line in the printing subroutine, but I have no idea how to make sense of that.
Edit:
Following the comments, I have compiled with -g -fbacktrace -fcheck=all -Wall in gfortran and tried to check the state of the memory. I'm quite new to this so I'm not sure I'm using my debugger (gdb) correctly.
After the three insertions and before the call to print, it seems that everything went well: for example when I type p t % left % left % right % value in gdb I get the expected output (that is 3). If I just type p t, the output is (.FALSE.,0,x,y), where x and y are hexadecimal numbers (memory addresses, I guess). However, if I try p t % left, I get something like a "description" of the pointer:
PTR TO -> (Type tree
logical(kind=4) :: isleaf
integer(kind=4) :: value
which repeats itself a lot since each pointer points to a tree that contains two pointers. I would have expected an output similar to that of p t, but I have no idea whether that's normal.
I also tried to examine the memory: for example if I type x/4uw t % left, I get 4 words, the first 2 words seem to correspond to isleaf and value, the last 2 to memory addresses. By following the memory addresses like that, I managed to visit all the nodes and I didn't find anything wrong.
The segfault happens within the printing routine. If I type p t after the fault, it says I cannot access the 0x0 address. Does that mean my tree is somehow modified when I try to print it?
The reason for your problems is that local variables are no longer valid once they go out of scope. This is in contrast to languages like Python, where an object stays alive as long as references to it exist (reference counting).
In your particular case, the calls newleaf(tleft,n) and newleaf(tright,n) initialise the local variables tleft and tright of inserttree, and t%left and t%right are pointed at them. When inserttree returns, tleft and tright go out of scope, so t%left and t%right become dangling pointers to memory that later calls may reuse.
A better approach is to allocate each node as it is needed (except the root, which is declared in the main program and stays in scope until the end of the program).
recursive subroutine inserttree(t,n)
implicit none
type (tree), intent (INOUT) :: t
integer, intent (IN) :: n
if (t % isleaf) then
allocate(t%left)
allocate(t%right)
call newleaf(t%left,n)
call newleaf(t%right,n)
t % isleaf = .false.
else
call inserttree(t % left,n)
endif
end subroutine inserttree
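For comparison, here is the same insertion and prefix-printing logic as a minimal Python sketch. It is a model of the algorithm, not a translation of the Fortran module (the class name Tree is only illustrative). Every node is a heap object that stays alive while the tree references it, which is exactly what allocate gives you in the fixed Fortran version. It also shows what the corrected code should print: 0 1 2 2 1 after inserting 1 and 2, and 0 1 2 3 3 2 1 after also inserting 3.

```python
class Tree:
    """Binary tree node: a leaf has no children, an inner node always has two."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    @property
    def isleaf(self):
        return self.left is None

    def insert(self, n):
        if self.isleaf:
            # New children are heap objects kept alive by these references,
            # the counterpart of allocate(t%left) / allocate(t%right).
            self.left = Tree(n)
            self.right = Tree(n)
        else:
            self.left.insert(n)

    def prefix(self):
        """Values in prefix (node, left, right) order."""
        out = [self.value]
        if not self.isleaf:
            out += self.left.prefix()
            out += self.right.prefix()
        return out


if __name__ == "__main__":
    t = Tree(0)
    for n in (1, 2, 3):
        t.insert(n)
    print(*t.prefix())
```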

Fortran Unhandled Exception (msvcr100d.dll)

I'm getting this unhandled exception when I exit my program:
Unhandled exception at 0x102fe274 (msvcr100d.dll) in Parameters.exe: 0xC0000005: Access violation reading location 0x00000005.
The debugger stops in a module called crtdll.c on this line:
onexitbegin_new = (_PVFV *) DecodePointer(__onexitbegin);
The top line on the call stack reads:
msvcr100d.dll!__clean_type_info_names_internal(__type_info_node * p_type_info_root_node=0x04a6506c) Line 359 + 0x3 bytes C++
The program then remains in memory until I close down the IDE.
I'm more used to developing with managed languages so I expect I'm doing something wrong with my code maintenance. The code itself reads a memory mapped file and associates it with pointers:
SUBROUTINE READ_MMF ()
USE IFWIN
USE, INTRINSIC :: iso_c_binding
USE, INTRINSIC :: iso_fortran_env
INTEGER(HANDLE) file_mapping_handle
INTEGER(LPVOID) memory_location
TYPE(C_PTR) memory_location_cptr
INTEGER memory_size
INTEGER (HANDLE) file_map
CHARACTER(5) :: map_name
TYPE(C_PTR) :: cdata
integer :: n = 3
integer(4), POINTER :: A, C
real(8), POINTER :: B
TYPE STRUCT
integer(4) :: A
real(8) :: B
integer(4) :: C
END TYPE STRUCT
TYPE(STRUCT), pointer :: STRUCT_PTR
memory_size = 100000
map_name = 'myMMF'
file_map = CreateFileMapping(INVALID_HANDLE_VALUE,
+ NULL,
+ PAGE_READWRITE,
+ 0,
+ memory_size,
+ map_name // C_NULL_CHAR )
memory_location = MapViewOfFile(file_map,
+IOR(FILE_MAP_WRITE, FILE_MAP_READ),
+ 0, 0, 0 )
cdata = TRANSFER(memory_location, memory_location_cptr)
call c_f_pointer(cdata, STRUCT_PTR, [n])
A => STRUCT_PTR%A
B => STRUCT_PTR%B
C => STRUCT_PTR%C
RETURN
END
Am I supposed to deallocate the c-pointers when I'm finished with them? I looked into that but can't see how I do it in Fortran...
Thanks for any help!
The nature of the access violation (during runtime library cleanup) suggests that your program is corrupting memory in some way. There are a number of programming errors that can lead to that - and the error or errors responsible could be anywhere in your program. The usual "compile and run with all diagnostic and debugging options enabled" approach may help identify these.
That said, there is a programming error in the code example shown. The C_F_POINTER procedure from the ISO_C_BINDING intrinsic module can operate on either scalar or array Fortran pointers (the second argument). If the Fortran pointer is a scalar then the third "shape" argument must not be present (it must be present if the Fortran pointer is an array).
Your code breaks this requirement: the Fortran pointer STRUCT_PTR in your code is a scalar, yet you provide the third shape argument (as [n]). It is quite plausible that this error will result in memory corruption: typically the implementation of C_F_POINTER would try to populate a descriptor in memory for the Fortran pointer, and the descriptor for a pointer to an array may be very different from a pointer to a scalar.
Subsequent references to STRUCT_PTR may further the corruption.
While the standard does not require this situation to be diagnosed, I am a little surprised that the compiler does not issue a diagnostic (assuming your example code is what you are actually compiling). If you reported this to your compiler's vendor (Intel, presumably, given IFWIN etc.) I suspect they would regard it as a deficiency in their compiler.
To release the memory associated with the file mapping, use the UnmapViewOfFile and CloseHandle APIs. UnmapViewOfFile takes the base address returned by MapViewOfFile (memory_location, which can also be obtained by calling C_LOC on STRUCT_PTR once the problem above is fixed), and CloseHandle takes the handle returned by CreateFileMapping (file_map). Your program therefore needs to remember both values in some way until it has finished with the mapping.
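The same two ideas, overlaying a single scalar record on mapped memory (no shape argument) and releasing the view before closing the mapping, can be sketched in Python with the standard mmap and ctypes modules. This is an analogy, not the Fortran/IFWIN code: MappedStruct and demo are hypothetical names, and the field layout assumes C-compatible alignment, which a BIND(C) derived type would guarantee but a plain Fortran derived type does not. As I understand it, on Windows Python's mmap object is built on CreateFileMapping/MapViewOfFile, and close() releases the view and the handle.

```python
import ctypes
import mmap


class MappedStruct(ctypes.Structure):
    # Hypothetical mirror of the Fortran STRUCT type. Assumes C layout and
    # alignment, as a BIND(C) derived type would guarantee.
    _fields_ = [("A", ctypes.c_int32), ("B", ctypes.c_double), ("C", ctypes.c_int32)]


def demo(memory_size=100000):
    # Anonymous mapping; on Windows, mmap.mmap(-1, memory_size, tagname="myMMF")
    # would create a named mapping that other processes can open.
    view = mmap.mmap(-1, memory_size)
    try:
        # One scalar record overlaid at offset 0: no shape argument involved.
        record = MappedStruct.from_buffer(view)
        record.A, record.B, record.C = 1, 2.5, 3
        values = (record.A, record.B, record.C)
        del record  # drop the overlay first; close() refuses while it still exists
    finally:
        view.close()  # release the mapped view and the mapping
    return values


if __name__ == "__main__":
    print(demo())
```

The ordering mirrors the advice above: stop using the overlay (STRUCT_PTR, A, B, C) before unmapping, because after the view is released those pointers are no longer valid.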
I've only ever done this with Cray pointers, not with the ISO bindings, and I know it does work with Cray pointers.
What you don't say is whether this is happening the first time or second time the routine is being called. If it is called more than once, then there is a problem in the coding in that Create/OpenFileMapping should only be called once to get a handle.
You don't need to deallocate memory because the memory is not yours to deallocate: you need to call UnmapViewOfFile(memory_location). After you have called this, memory_location, memory_location_cptr and possibly cdata are no longer valid.
The way this works is with two or more programs:
One program calls CreateFileMapping and the others call OpenFileMapping to obtain a handle to the data. This only needs to be done once at the start of the program, not every time you need to access the file. Multiple calls to Create/OpenFileMapping without a corresponding close can cause crashes.
They then call MapViewOfFile to map the file into memory. Note that only one program can do this at a time. When the program is finished with the memory file, it calls UnmapViewOfFile. The other program can now get to the file. There is a blocking mechanism. If you do not call UnmapViewOfFile, other programs using MapViewOfFile will be blocked.
When all is done, call CloseHandle on the handle returned by Create/OpenFileMapping.

Resources