I am writing to the file with MPI on Fortran and while checking what was written I don't get the expected results - mpi

Each process is building some array and is writing this array in the "correct" place, using mpi_file_write_at().
After writing to the file I read from the same place and it is not what I wrote. The code is attached. I am just a beginner in MPI, so sorry if the question is not clever.
program output
use mpi
implicit none
integer :: ierr,i,proc_num,file,intsize
integer :: status(mpi_status_size)
integer,parameter :: count=10
integer,dimension(count) :: buf
integer,dimension(3*count) :: arr
integer(kind=mpi_offset_kind) :: disp
call mpi_init(ierr)
call mpi_comm_rank(MPI_COMM_WORLD,proc_num,ierr)
do i=1,count
buf(i) = proc_num*count+i
enddo
call mpi_file_open(MPI_COMM_WORLD,'out.txt',mpi_mode_wronly+mpi_mode_create,mpi_info_null,file,ierr)
call mpi_type_size(mpi_integer,intsize,ierr)
disp = proc_num*count*intsize
call mpi_file_write_at(file,disp,buf,count,mpi_integer,status,ierr)
if (proc_num==0) then
call mpi_file_read_at(file,0,arr,3*count,mpi_integer,status,ierr)
write(*,*),arr
endif
call mpi_file_close(file,ierr)
call mpi_finalize(ierr)
end program output
Thank you!

You are using mpi_mode_wronly to open the file. As stated here, it corresponds to "write only". Consequently, mpi_file_read_at() is likely to fail. It can be checked by looking at the output parameter ierr.
Could you try the mpi_mode_rdwr flag ? This should enable both read and write operations.
Moreover, MPI_File_write_at() is a noncollective operation. So process 0 can call mpi_file_read_at() before process 1 exited MPI_File_write_at(). A mpi_barrier() can be added to prevent that. Take a look at http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report/node305.htm . It features various examples using MPI_File_write_at(). It is likely that additional calls to MPI_File_sync() and MPI_File_set_view() are required as well.
Notice that the code you provided is equivalent to a call to the function MPI_Gather().

Related

MPI subroutines in Fortran

I have looked through all the posts on this topic I could find but they do not seem to solve my problem. I am thankful for any input/help/idea. So here it is:
I have my main program (main.f90):
program inv_main
use mod_communication
implicit none
include 'mpif.h'
...
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,id,ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,nproc,ierr)
...
call SENDRECEIVE(id, nproc, ierr, VVNP, VVN)
...
call MPI_FINALIZE(ierr)
end program inv_main
And here is the module that includes the subroutine (I am aware that allgather might be a better way to do the same but I could not figure it out yet for my 4D array):
Module mod_communication
implicit none
include 'mpif.h'
integer, dimension(MPI_STATUS_SIZE) :: STATUS ! MPI
CONTAINS
Subroutine SENDRECEIVE(id, nproc, ierr, INPUT, OUTPUT )
integer, intent (in) :: nproc, id, ierr
real (dp), intent(in) :: INPUT(n,m)
real (dp), intent(out) :: OUTPUT(n,m,nty,nty)
integer :: sndr
IF (id .eq. 0) THEN
OUTPUT(1:n,1:m,1,1)=INPUT
call MPI_RECV(INPUT,n*m,MPI_DOUBLE_PRECISION,MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,STATUS,ierr)
sndr=STATUS(MPI_SOURCE)
OUTPUT(1:n,1:m,int(sndr/nty)+1,sndr+1-nty*(int(sndr/nty))) = INPUT
END IF
IF (id .ne. 0) THEN
call MPI_SEND(INPUT,n*m,MPI_DOUBLE_PRECISION,0,id,MPI_COMM_WORLD,ierr)
ENDIF
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
call MPI_BCAST(OUTPUT,n*m*nty*nty,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierr)
end Subroutine
end Module mod_communication
This is the error message I got when compiling:
use mod_communication
2
Error: Symbol 'mpi_displacement_current' at (1) conflicts with symbol from module 'mod_communication', use-associated at (2)
mpif-mpi-io.h:71.36:
Included at mpif-config.h:65:
Included at mpif-common.h:70:
Included at mpif.h:59:
Included at main.f90:27:
integer MPI_MAX_DATAREP_STRING
1
main.f90:21.6:
use mod_communication
2
Error: Symbol 'mpi_max_datarep_string' at (1) conflicts with symbol from module 'mod_communication', use-associated at (2)
mpif-mpi-io.h:73.32:
Included at mpif-config.h:65:
Included at mpif-common.h:70:
Included at mpif.h:59:
Included at main.f90:27:
parameter (MPI_FILE_NULL=0)
These are just the first two errors, it keeps going like that... And I cannot find my mistake. Also, I have to use "include 'mpif.h'" and not "use mpi" because of the machine I am ultimately going to run it on. If I compile it with use mpi however on my own computer it gives me a different error, which is the following:
mod_MPI.f90:93.41:
call MPI_BARRIER(MPI_COMM_WORLD,ierr)
1
Error: There is no specific subroutine for the generic 'mpi_barrier' at (1)
mod_MPI.f90:52.41:
Your main program probably gets (or rather tries to get) two copies of all the stuff in mpif.h. By include-ing it in the module you effectively make all its contents module things (variables, routines, parameters, what-nots). Then, in main you both use the module and, thereby, use-associate the module things, and try to include mpif.h and redeclare all those things again.
Do what #Jonathan Dursi suggests too.

Fortran 90 function return pointer

I saw this question:
Fortran dynamic objects
and the accepted answer made me question if I wrote the following function safely (without allowing a memory leak)
function getValues3D(this) result(vals3D)
implicit none
type(allBCs),intent(in) :: this
real(dpn),dimension(:,:,:),pointer :: vals3D
integer,dimension(3) :: s
if (this%TF3D) then
s = shape(this%vals3D)
if (associated(this%vals3D)) then
stop "possible memory leak - p was associated"
endif
allocate(vals3D(s(1),s(2),s(3)))
vals3D = this%vals3D
else; call propertyNotAssigned('vals3D','getValues3D')
endif
end function
This warning shows up when I run my code, but shouldn't my this%vals3D be associated if it was previously (to this function) set? I'm currently running into memory errors, and they started showing up when I introduced a new module with this function in it.
Any help is greatly appreciated.
I think I wasn't specific enough. I would like to make the following class, and know how to implement the class, safely in terms of memory. That is:
module vectorField_mod
use constants_mod
implicit none
type vecField1D
private
real(dpn),dimension(:),pointer :: x
logical :: TFx = .false.
end type
contains
subroutine setX(this,x)
implicit none
type(vecField1D),intent(inout) :: this
real(dpn),dimension(:),target :: x
allocate(this%x(size(x)))
this%x = x
this%TFx = .true.
end subroutine
function getX(this) result(res)
implicit none
real(dpn),dimension(:),pointer :: res
type(vecField1D),intent(in) :: this
nullify(res)
allocate(res(size(this%x)))
if (this%TFx) then
res = this%x
endif
end function
end module
Where the following code tests this module
program testVectorField
use constants_mod
use vectorField_mod
implicit none
integer,parameter :: Nx = 150
real(dpn),parameter :: x_0 = 0.0
real(dpn),parameter :: x_N = 1.0
real(dpn),parameter :: dx = (x_N - x_0)/dble(Nx-1)
real(dpn),dimension(Nx) :: x = (/(x_0+dble(i)*dx,i=0,Nx-1)/)
real(dpn),dimension(Nx) :: f
real(dpn),dimension(:),pointer :: fp
type(vecField1D) :: f1
integer :: i
do i=1,Nx
f(i) = sin(x(i))
enddo
do i=1,10**5
call setX(f1,f) !
f = getX(f1) ! Should I use this?
fp = getX(f1) ! Or this?
fp => getX(f1) ! Or even this?
enddo
end program
Currently, I'm running on windows. When I CTR-ALT-DLT, and view performance, the "physical memory usage histery" increases with every loop iteration. This is why I assume that I have a memory leak.
So I would like to repose my question: Is this a memory leak? (The memory increases with every one of the above cases). If so, is there a way I avoid the memory leak while still using pointers? If not, then what is happening, should I be concerned and is there a way to reduce the severity of this behavior?
Sorry for the initial vague question. I hope this is more to the point.
Are you really restricted to Fortran 90? In Fortran 2003 you would use an allocatable function result for this. This is much safer. Using pointer function results, whether you have a memory leak with this code or not depends on how you reference the function, which you don't show. If you must return a pointer from a procedure, it is much safer to return it via a subroutine argument.
BUT...
This function is pointless. There's no point testing the association status of this%vals3D` after you've referenced it as the argument to SHAPE in the previous line. If the pointer component is disassocated (or has undefined pointer association status), then you are not permitted to reference it.
Further, if the pointer component is associated, all you do is call stop!
Perhaps you have transcribed the code to the question incorrectly?
If you simply delete the entire if construct starting with if (associated(this%vals3D))... then your code may make sense.
BUT...
if this%TF3D is true, then this%vals3D must be associated.
when you reference the function, you must use pointer assignment
array_ptr => getValues3D(foo)
! ^
! |
! + this little character is very important.
Forget that little character and you are using normal assignment. Syntactically valid, difficult to pick the difference when reading code and, in this case, potentially a source of memory corruption or leaks that might go undetected until the worst possible moment, in addition to the usual pitfalls of using pointers (e.g. you need to DEALLOCATE array_ptr before you reuse it or it goes out of scope). This is why functions returning pointer results are considered risky.
Your complete code shows several memory leaks. Every time you allocate something that is a POINTER - you need to pretty much guarantee that there will be a matching DEALLOCATE.
You have a loop in your test code. ALLOCATE gets called a lot - in both the setter and the getter. Where are the matching DEALLOCATE statements?
Every time setX is called, any previously allocated memory for the x component of your type will be leaked. Since you call the function 10^5 times, you will waste 100000-1 copies. If you know that the size of this%x will never change, simply check to see if a previous call had already allocated the memory by checking to see if ASSOCIATED(this%x) is true. If it is, skip the allocation and move directly to the assignment statement. If the size does change, then you will first have to deallocate the old copy before allocating new space.
Two other minor comments on setX: The TARGET attribute of the dummy argument x appears superfluous since you never take a pointer of that argument. Second, the TFx component of your type also seems superfluous since you can instead check if x is allocated.
For the function getX, why not skip the allocation completely, and merely set res => this%x? Admittedly, this will return a direct reference to the underlying data, which maybe you want to avoid.
In your loop,
do i=1,10**5
call setX(f1,f) !
f = getX(f1) ! Should I use this?
fp = getX(f1) ! Or this?
fp => getX(f1) ! Or even this?
enddo
fp => getX(f1) will allow you to obtain a pointer to the underlying x component of your type (if you adopt my change above). The other two use assignment operators and will copy data from the result of getX into either f, or (if it is previously allocated) fp. If fp is not allocated, the code will crash.
If you do not want to grant direct access to the underlying data, then I suggest that the return value of getX should be defined as an automatic array with the size determined by this%x. That is, you can write the function as
function getX(this) result(res)
implicit none
type(vecField1D),intent(in) :: this
real(dpn),dimension(size(this%x,1)) :: res
res = this%x
end function

How can one really create a process using Unix.create_process in OCaml?

I have tried
let _ = Unix.create_process "ls" [||] Unix.stdin Unix.stdout Unix.stderr
in utop, it will crash the whole thing.
If I write that into a .ml and compile and run, it will crash the terminal and my ubuntu will throw a system error.
But why?
The right way to call it is:
let pid = Unix.create_process "ls" [|"ls"|] Unix.stdin Unix.stdout Unix.stderr
The first element of the array must be the "command" name.
On some systems /bin/ls is a link to some bigger executable that will look at argv.(0) to know how to behave (c.f. Busybox); so you really need to provide that info.
(You see more often that with /usr/bin/vi which is now on many systems a sym-link to vim).
Unix.create_process actually calls fork and the does an execvpe, which itself calls the execv primitive (in the OCaml C implementation of the Unix module).
That function then calls cstringvect (a helper function in the C side of the module implementation), which translates the arg parameters into an array of C string, with last entry set to NULL. However, execve and the like expect by convention (see the execve(2) linux man page) the first entry of that array to be the name of the program:
argv is an array of argument strings passed to the new program. By
convention, the first of these strings should contain the filename
associated with the file being executed.
That first entry (or rather, the copy it receives) can actually be changed by the program receiving these args, and is displayed by ls, top, etc.

Fortran Unhandled Exception (msvcr100d.dll)

I'm getting this unhandled exception when I exit my program:
Unhandled exception at 0x102fe274 (msvcr100d.dll) in Parameters.exe: 0xC0000005: Access violation reading location 0x00000005.
The debugger stops in a module called crtdll.c on this line:
onexitbegin_new = (_PVFV *) DecodePointer(__onexitbegin);
The top line on the call stack reads:
msvcr100d.dll!__clean_type_info_names_internal(__type_info_node * p_type_info_root_node=0x04a6506c) Line 359 + 0x3 bytes C++
The program then remains in memory until I close down the IDE.
I'm more used to developing with managed languages so I expect I'm doing something wrong with my code maintenance. The code itself reads a memory mapped file and assoiciates it with pointers:
SUBROUTINE READ_MMF ()
USE IFWIN
USE, INTRINSIC :: iso_c_binding
USE, INTRINSIC :: iso_fortran_env
INTEGER(HANDLE) file_mapping_handle
INTEGER(LPVOID) memory_location
TYPE(C_PTR) memory_location_cptr
INTEGER memory_size
INTEGER (HANDLE) file_map
CHARACTER(5) :: map_name
TYPE(C_PTR) :: cdata
integer :: n = 3
integer(4), POINTER :: A, C
real(8), POINTER :: B
TYPE STRUCT
integer(4) :: A
real(8) :: B
integer(4) :: C
END TYPE STRUCT
TYPE(STRUCT), pointer :: STRUCT_PTR
memory_size = 100000
map_name = 'myMMF'
file_map = CreateFileMapping(INVALID_HANDLE_VALUE,
+ NULL,
+ PAGE_READWRITE,
+ 0,
+ memory_size,
+ map_name // C_NULL_CHAR )
memory_location = MapViewOfFile(file_map,
+IOR(FILE_MAP_WRITE, FILE_MAP_READ),
+ 0, 0, 0 )
cdata = TRANSFER(memory_location, memory_location_cptr)
call c_f_pointer(cdata, STRUCT_PTR, [n])
A => STRUCT_PTR%A
B => STRUCT_PTR%B
C => STRUCT_PTR%C
RETURN
END
Am I supposed to deallocate the c-pointers when I'm finished with them? I looked into that but can't see how I do it in Fortran...
Thanks for any help!
The nature of the access violation (during runtime library cleanup) suggests that your program is corrupting memory in some way. There are a number of programming errors that can lead to that - and the error or errors responsible could be anywhere in your program. The usual "compile and run with all diagnostic and debugging options enabled" approach may help identify these.
That said, there is a programming error in the code example shown. The C_F_POINTER procedure from the ISO_C_BINDING intrinsic module can operate on either scalar or array Fortran pointers (the second argument). If the Fortran pointer is a scalar then the third "shape" argument must not be present (it must be present if the Fortran pointer is an array).
Your code breaks this requirement - the Fortran pointer STRUCT_PTR in your code is a scalar, but yet you provide the third shape argument (as [n]). It is quite plausible that this error will result in memory corruption - typically the implementation of C_F_POINTER would try and populate a descriptor in memory for the Fortran pointer, and the descriptor for a pointer to an array may be very different from a pointer to a scalar.
Subsequent references to STRUCT_PTR may further the corruption.
While it is not required by the standard to diagnose this situation, I am a little surprised that the compiler does not issue a diagnostic (assuming you example code is what you actually are compiling). If you reported this to your compiler's vendor (Intel, presumably given IFWIN etc) I suspect they would regard it as a deficiency in their compiler.
To release the memory associated with the file mapping you use the UnmapViewOfFile and CloseHandle API's. To use these you should "store" (your program needs to remember in some way) the base address (memory_location, which can also be obtained by calling C_LOC on STRUCT_PTR once the problem above is fixed) returned by MapViewOfFile, and the handle to the mapping (file_map) returned by CreateFileMapping; respectively.
I've only ever done this with Cray Pointers: not with the ISO bindings and I know it does work with Cray Pointers.
What you don't say is whether this is happening the first time or second time the routine is being called. If it is called more than once, then there is a problem in the coding in that Create/OpenFileMapping should only be called once to get a handle.
You don't need to deallocate memory because the memory is not yours to deallocate: you need to call UnmapViewOfFile(memory_location). After you have called this, memory_location, memory_location_cptr and possibly cdata are no longer valid.
The way this works is with two or more programs:
One program calls CreateFileMapping, the others calls OpenFileMapping to obtain a handle to the data. This only needs to be called once at the start of the program: not every time you need to access the file. Multiple calls to Create/OpenFileMapping without a corresponding close can cause crashes.
They then call MapViewOfFile to map the file into memory. Note that only one program can do this at a time. When the program is finished with the memory file, it calls UnmapViewOfFile. The other program can now get to the file. There is a blocking mechanism. If you do not call UnmapViewOfFile, other programs using MapViewOfFile will be blocked.
When all is done, call close on the handle created by Create/OpenFileMapping.

MPI_COMM_WORLD handle loses value in a subroutine

my program is as follows:
module x
use mpi !x includes mpi module
implicit none
...
contains
subroutine do_something_with_mpicommworld
!use mpi !uncommenting this makes a difference (****)
call MPI_...(MPI_COMM_WORLD,...,ierr)
end subroutine
...
end module x
program main
use mpi
use x
MPI_INIT(...)
call do_something_with_mpicommworld
end program main
This program fails with the following error: MPI_Cart_create(199): Invalid communicator, unless
the line marked with (**) is uncommented.
Now, maybe my knowledge of Fortran 90 is incomplete, but i thought if you have a use clause in the module definition (see my module x), whichever global variable exists in the included module (in case of x : MPI_COMM_WORLD from include module mpi) will have the same value in any of the contained subroutines ( do_something_with_mpicommworld ) even when those subroutines do not explicitly include the module (e.g. when (**) is commented out). Or, to put it simply, if you include a module within another module, the subroutines contained in the second module will have access to the globals in the included module without a special use statement.
When I ran my programme, I saw a different behaviour. The sub contained in x was creating errors unless it had the 'use mpi' statement.
So what is the problem, do I have a wrong idea about Fortran 90, or is there something special about MPI module which induces such behaviour?
Its annoyingly hard to find exact details about what should and shouldn't happen in these cases, and my expectation was the same as yours -- the `use mpi' should work as above. So I tried the following:
module hellompi
use mpi
implicit none
contains
subroutine hello
integer :: ierr, nprocs, rank
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
print *, 'Hello world, from ', rank, ' of ', nprocs
print *, MPI_COMM_WORLD
call MPI_FINALIZE(ierr)
return
end subroutine hello
end module hellompi
and it works fine under both gfortran and ifort with OpenMPI. Adding a cart_create doesn't change anything.
What strikes me as weird with your case is that it isn't complaining that MPI_COMM_WORLD isn't defined -- so obviously some of the relevant information is being propagated to the subroutine. Can you post a simpler full example which still fails to work?
Thank you Johnatan for your answer. The problem was really, really simple. I added the subroutine in question after the "end module"
:-D, 'implicit none' did not apply to now external sub and compiler happily initialised a brand new variable MPI_COMM_WORLD to whatever it thought suitable following the standard implicit rules.
This is just a lesson to me to enforce 'implicit none' not only by keywords, but also via the compiler flag. Evil lurks after every end statement.
I'm sorry you went trough the trouble of making the test example, I'd buy you a beer if I could :-)

Resources