MPI_Exscan - How to use for parallel file write - mpi

I am working on some MPI code for Parallel IO.
Say I have an array [2, 5, 6, 9, 0, 4, 3, 1, 8, 7] written in a file, and I have 2 processes.
I define File views to read part of this file.
Process 0 sees: 2, 5, 6, 9, 0 (5 elements)
Process 1 sees: 4, 3, 1, 8, 7 (5 elements)
Both processes call a sort function. As a result of the sort function,
Process 0 has : 0,1,2,3 (4 elements)
Process 1 has : 4,5,6,7,8,9 (6 elements)
Now I need to write this output to a new output file.
Process 0 would write it correctly as it writes at offset 0. But how would process 1 know the offset at which to write? I know I need to define a file view for writing, but what would the new displacement be? I think MPI_Exscan can do it, but I am not sure how. Can someone help, please?
Thanks in advance

Sure; you use MPI_Exscan with MPI_SUM on the local number of elements to get the total number of elements to the "left" of you, and use that in your view (either as an offset, or in the type you create to define your view).
Here's a little Fortran program where each rank generates a "random" (well, 2*rank+1) sized array of characters ('0' for rank 0, etc), uses MPI_Exscan to find out what offset it should use when writing, and then writes:
program testexscan
    use mpi
    implicit none

    integer :: nelements, nleft
    character, allocatable, dimension(:) :: array
    integer :: rank, nprocs, ierr, fh
    integer(kind=MPI_OFFSET_KIND) :: disp

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    call generatedata(rank, array)
    nelements = size(array)

    ! MPI_Exscan leaves the receive buffer of rank 0 untouched, so set it explicitly
    nleft = 0
    call MPI_Exscan(nelements, nleft, 1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)
    print *, 'rank = ', rank, ' nelements = ', nelements, ' nleft= ', nleft

    call MPI_File_open(MPI_COMM_WORLD, 'output.txt', ior(MPI_MODE_WRONLY, MPI_MODE_CREATE), &
                       MPI_INFO_NULL, fh, ierr)

    disp = nleft * 1   ! offset in bytes; MPI_CHARACTER is one byte
    call MPI_File_set_view(fh, disp, MPI_CHARACTER, MPI_CHARACTER, "native", MPI_INFO_NULL, ierr)
    call MPI_File_write_all(fh, array, nelements, MPI_CHARACTER, MPI_STATUS_IGNORE, ierr)
    call MPI_File_close(fh, ierr)

    deallocate(array)
    call MPI_Finalize(ierr)

contains

    subroutine generatedata(rank, arr)
        integer, intent(in) :: rank
        character, dimension(:), allocatable, intent(inout) :: arr
        integer :: n

        n = rank * 2 + 1
        allocate(arr(n))
        arr = char(ichar("0") + rank)
    end subroutine generatedata

end program testexscan
Running this gives:
$ mpif90 -o testexscan testexscan.f90
$ mpirun -np 4 ./testexscan
rank = 0 nelements = 1 nleft= 0
rank = 1 nelements = 3 nleft= 1
rank = 2 nelements = 5 nleft= 4
rank = 3 nelements = 7 nleft= 9
$ cat output.txt
0111222223333333

Your description is a bit vague, but in any case, process 0 can just send a message to process 1 that contains the offset it should use.
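For the two-process example above, a minimal, self-contained sketch of that point-to-point approach (the element counts 4 and 6 come from the question; everything else, including the program and variable names, is illustrative):
program offset_by_message
    use mpi
    implicit none
    integer :: rank, ierr, my_count, left_count
    integer(kind=MPI_OFFSET_KIND) :: my_offset

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    my_count = merge(4, 6, rank == 0)   ! element counts from the question's example

    if (rank == 0) then
        my_offset = 0
        ! tell rank 1 how many elements rank 0 will write
        call MPI_Send(my_count, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
    else
        call MPI_Recv(left_count, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
        my_offset = left_count          ! offset in elements; scale by the element size for a byte offset
    end if

    print *, 'rank', rank, 'writes at element offset', my_offset
    call MPI_Finalize(ierr)
end program offset_by_message
This scales less gracefully than MPI_Exscan once there are many ranks, since each rank needs the sum of all counts to its left, which is exactly what the exclusive scan computes in one collective call.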

Related

How do I finish a recursive binary search in Fortran?

I am trying to find the smallest index containing the value i in a sorted array. If this value is not present I want -1 to be returned. I am using a recursive binary search subroutine. The problem is that I can't really stop this recursion and I get a lot of answers (one right and the rest wrong). And sometimes I get an error called "segmentation fault: 11" and I don't really get any results.
I've tried to delete the call random_number since I already have a sorted array in my main program, but it did not work.
program main
    implicit none
    integer, allocatable :: A(:)
    real :: MAX_VALUE
    integer :: i, j, n, s, low, high
    real :: x

    N = 10          ! size of table
    MAX_VALUE = 10
    allocate(A(n))
    s = 5           ! searched value
    low = 1         ! lower limit
    high = n        ! highest limit

    ! generate random table of numbers (from 0 to 1000)
    call Random_Seed
    do i = 1, N
        call Random_Number(x)   ! returns random x >= 0 and < 1
        A(i) = anint(MAX_VALUE*x)
    end do

    call bubble(n, a)

    print *, ' '
    write(*, 10) (a(i), i = 1, N)
10  format(10i6)

    call bsearch(A, n, s, low, high)

    deallocate(A)
end program main
The sort subroutine:
subroutine sort(p, q)
    implicit none
    integer(kind=4), intent(inout) :: p, q
    integer(kind=4) :: temp

    if (p > q) then
        temp = p
        p = q
        q = temp
    end if
    return
end subroutine sort
The bubble subroutine:
subroutine bubble(n, arr)
    implicit none
    integer(kind=4), intent(inout) :: n
    integer(kind=4), intent(inout) :: arr(n)
    integer(kind=4) :: sorted(n)
    integer :: i, j

    do i = 1, n
        do j = n, i+1, -1
            call sort(arr(j-1), arr(j))
        end do
    end do
    return
end subroutine bubble
recursive subroutine bsearch(b, n, i, low, high)
    implicit none
    integer(kind=4) :: b(n)
    integer(kind=4) :: low, high
    integer(kind=4) :: i, j, x, idx, n
    real(kind=4) :: r

    idx = -1
    call random_Number(r)
    x = low + anint((high - low)*r)

    if (b(x).lt.i) then
        low = x + 1
        call bsearch(b, n, i, low, high)
    else if (b(x).gt.i) then
        high = x - 1
        call bsearch(b, n, i, low, high)
    else
        do j = low, high
            if (b(j).eq.i) then
                idx = j
                exit
            end if
        end do
    end if

    ! Stop if high = low
    if (low.eq.high) then
        return
    end if

    print *, i, 'found at index ', idx
    return
end subroutine bsearch
The goal is to get the same results as my linear search. But I'm getting either of these answers:
Sorted table:
1 1 2 4 5 5 6 7 8 10
5 found at index 5
5 found at index -1
5 found at index -1
or if the value is not found
2 2 3 4 4 6 6 7 8 8
Segmentation fault: 11
There are two issues causing your recursive search routine bsearch to either stop with unwanted output or result in a segmentation fault. Simply following the execution logic of your program, using the examples you provided, elucidates the matter:
1) value present and found, unwanted output
First, consider the first example where array b contains the value i=5 you are searching for (value and index pointed out with || in the first two lines of the code block below). Using the notation Rn to indicate the n'th level of recursion, L and H for the lower and upper bounds and x for the current index estimate, a given run of your code could look something like this:
b(x): 1 1 2 4 |5| 5 6 7 8 10
x: 1 2 3 4 |5| 6 7 8 9 10
R0: L x H
R1: Lx H
R2: L x H
5 found at index 5
5 found at index -1
5 found at index -1
In R0 and R1, the tests b(x).lt.i and b(x).gt.i in bsearch work as intended to reduce the search interval. In R2 the do-loop in the else branch is executed, idx is assigned the correct value and this is printed - as intended. However, a return statement is now encountered which will return control to the calling program unit - in this case that is first R1(!) where execution will resume after the if-else if-else block, thus printing a message to screen with the initial value of idx=-1. The same happens upon returning from R0 to the main program. This explains the (unwanted) output you see.
2) value not present, segmentation fault
Secondly, consider the example resulting in a segmentation fault. Using the same notation as before, a possible run could look like this:
b(x): 2 2 3 4 4 6 6 7 8 8
x: 1 2 3 4 5 6 7 8 9 10
R0: L x H
R1: L x H
R2: L x H
R3: LxH
R4: H xL
.
.
.
Segmentation fault: 11
In R0 to R2 the search interval is again reduced as intended. However, in R3 the logic fails. Since the search value i is not present in array b, one of the .lt. or .gt. tests will always evaluate to .true., meaning that the test for low .eq. high to terminate a search is never reached. From this point onwards, the logic is no longer valid (e.g. high can be smaller than low) and the code will continue deepening the level of recursion until the call stack gets too big and a segmentation fault occurs.
Those are the main logical flaws in the code. A possible inefficiency is the use of a do-loop to find the lowest index containing a searched-for value. Consider a case where the value you are looking for is e.g. i=8, and it appears near the end of your array, as below. Assume further that by chance, the first guess for its position is x = high. This implies that your code will immediately branch to the do-loop, where in effect a linear search is done of very nearly the entire array, to find the final result idx=9. Although correct, the intended binary search rather becomes a linear search, which could result in reduced performance.
b(x): 2 2 3 4 4 6 6 7 |8| 8
x: 1 2 3 4 5 6 7 8 |9| 10
R0: L xH
8 found at index 9
Fixing the problems
At the very least, you should move the low .eq. high test to the start of the bsearch routine, so that recursion stops before invalid bounds can be defined (you then need an additional test to see if the search value was found or not). Also, notify about a successful search right after it occurs, i.e. after the equality test in your do-loop, or the additional test just mentioned. This still does not address the inefficiency of a possible linear search.
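For illustration, a minimal sketch of that reordering, keeping your random pivot and the same argument list (so the call in the main program stays unchanged); it only fixes termination and reporting, not the leftmost-index issue addressed next:
recursive subroutine bsearch(b, n, i, low, high)
    implicit none
    integer, intent(in) :: n
    integer, intent(in) :: b(n)
    integer, intent(in) :: i, low, high
    integer :: x
    real :: r

    ! 1) stop before the bounds can become invalid
    if (low > high) then                 ! empty interval: value not present
        print *, i, 'not found (index -1)'
        return
    end if
    if (low == high) then                ! one element left: check it and stop
        if (b(low) == i) then
            print *, i, 'found at index ', low
        else
            print *, i, 'not found (index -1)'
        end if
        return
    end if

    ! 2) otherwise pick a guess and recurse on a strictly smaller interval
    call random_number(r)
    x = low + anint((high - low)*r)

    if (b(x) < i) then
        call bsearch(b, n, i, x + 1, high)
    else if (b(x) > i) then
        call bsearch(b, n, i, low, x - 1)
    else
        print *, i, 'found at index ', x   ! not necessarily the leftmost occurrence
    end if
end subroutine bsearch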
All taken into account, you are probably better off reading up on algorithms for finding a "leftmost" index (e.g. on Wikipedia or look at a tried and tested implementation - both examples here use iteration instead of recursion, perhaps another improvement, but the same principles apply) and adapt that to Fortran, which could look something like this (only showing new code, ...refer to existing code in your examples):
module mod_search
    implicit none

contains

    ! Function that uses recursive binary search to look for `key` in an
    ! ordered `array`. Returns the array index of the leftmost occurrence
    ! of `key` if present in `array`, and -1 otherwise
    function search_ordered(array, key) result(idx)
        integer, intent(in) :: array(:)
        integer, intent(in) :: key
        integer :: idx

        ! find left most array index that could possibly hold `key`
        idx = binary_search_left(1, size(array))

        ! if `key` is not found, return -1
        if (array(idx) /= key) then
            idx = -1
        end if

    contains

        ! function for recursive reduction of search interval
        recursive function binary_search_left(low, high) result(idx)
            integer, intent(in) :: low, high
            integer :: idx
            real :: r

            if (high <= low) then
                ! found lowest possible index where target could be
                idx = low
            else
                ! new guess
                call random_number(r)
                idx = low + floor((high - low)*r)
                ! alternative: idx = low + (high - low) / 2
                if (array(idx) < key) then
                    ! continue looking to the right of current guess
                    idx = binary_search_left(idx + 1, high)
                else
                    ! continue looking to the left of current guess (inclusive)
                    idx = binary_search_left(low, idx)
                end if
            end if
        end function binary_search_left

    end function search_ordered

    ! Move your routines into a module
    subroutine sort(p, q)
        ...
    end subroutine sort

    subroutine bubble(n, arr)
        ...
    end subroutine bubble

end module mod_search
! your main program
program main
    use mod_search, only : search_ordered, sort, bubble ! <---- use routines from module like so
    implicit none
    ...
    ! Replace your call to bsearch() with the following:
    ! call bsearch(A,n,s,low,high)
    i = search_ordered(A, s)
    if (i /= -1) then
        print *, s, 'found at index ', i
    else
        print *, s, 'not found!'
    end if
    ...
end program main
Finally, depending on your actual use case, you could also just consider using the Fortran intrinsic procedure minloc, saving you the trouble of implementing all this functionality yourself. In this case, it can be done by making the following modification in your main program:
! i = search_ordered(a, s) ! <---- comment out this line
j = minloc(abs(a-s), dim=1) ! <---- replace with these two
i = merge(j, -1, a(j) == s)
where j returned from minloc will be the lowest index in the array a where s may be found, and merge is used to return j when a(j) == s and -1 otherwise.

How to write out complex numbers using a Fortran format specifier

I am new to Fortran and I have the following code; it basically solves a simple quadratic equation and outputs the solutions:
! solving ax^2 + bx + c = 0
program output
    implicit none
    real :: a, b, c
    character :: response

    do
        print *, 'Enter three coefficients a,b and c'
        read *, a, b, c
        call quad(a, b, c)
        print *, 'Press Y to continue. Anykey for otherwise'
        read *, response
        if (response /= 'y' .and. response /= 'Y') stop
    end do
end program output

! Function to calculate Xs
subroutine quad(a, b, c)
    implicit none
    real :: a, b, c, xplus, xminus

    xplus  = ((-b) + sqrt((b**2) - (4*a*c))) / (2*a)
    xminus = ((-b) - sqrt((b**2) - (4*a*c))) / (2*a)

    if (xplus == xminus) then
        print *, 'There exists 1 root only: '
        write(*, 12) xplus
12      format(2f10.2)
    else
        print *, 'Solutions of quadratic equation are'
        write(*, 10) xplus, xminus
10      format(1f10.5)
    end if
end subroutine quad
This works. However, how would I go about handling complex solutions? That is, how would this line change to make the format work for complex numbers?
10 format(1f10.5)
Thank you so much.
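As a rough sketch (illustrative, not from the original thread): a complex value occupies two real edit descriptors, one for the real part and one for the imaginary part, so the roots can be computed with complex arithmetic (which also avoids taking sqrt of a negative real when the discriminant is negative) and printed with a format along these lines, as a drop-in alternative to the quad subroutine above:
! Sketch: complex roots via a complex sqrt, printed as "a + bi".
! Names (quad_complex, disc, zplus, zminus) are illustrative, not from the original post.
subroutine quad_complex(a, b, c)
    implicit none
    real :: a, b, c
    complex :: disc, zplus, zminus

    disc   = cmplx(b**2 - 4*a*c, 0.0)
    zplus  = (-b + sqrt(disc)) / (2*a)
    zminus = (-b - sqrt(disc)) / (2*a)

    write(*, 20) zplus, zminus
20  format(2(f10.5, sp, f10.5, 'i', ss, 2x))   ! each complex value consumes two f descriptors
end subroutine quad_complex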

Using FFTW on tensor

Dear all, I tried to find an answer by googling but I haven't been able to find one.
I'm using FFTW in an MPI Fortran application and I need to compute the forward and backward transform of a 3D array of tensors, component by component, and compute some complex tensorial quantities while in Fourier space.
In order to keep the array used by FFTW usable and avoid spending a lot of time moving data from one array to another, the option that came to mind was to declare a 5-dimensional array, i.e.:
use, intrinsic :: iso_c_binding
include 'mpif.h'
include 'fftw3-mpi.f03'

integer(C_INTPTR_T), parameter :: FFTDIM = 3   ! fft dimension
integer(C_INTPTR_T) :: fft_L                   ! x direction
integer(C_INTPTR_T) :: fft_M                   ! y direction
integer(C_INTPTR_T) :: fft_N                   ! z direction
complex(C_DOUBLE_COMPLEX), pointer :: fft_in(:,:,:,:,:), fft_out(:,:,:,:,:)
type(C_PTR) :: fft_plan_fwd, fft_plan_bkw, fft_datapointer
integer(C_INTPTR_T) :: fft_alloc_local, fft_local_n0, fft_local_0_start

call MPI_INIT(mpi_err)
call MPI_COMM_RANK(MPI_COMM_WORLD, mpi_rank, mpi_err)
call MPI_COMM_SIZE(MPI_COMM_WORLD, mpi_size, mpi_err)
call fftw_mpi_init

fft_L = problem_dim(1)
fft_M = problem_dim(2)
fft_N = problem_dim(3)

! calculate the local size of the fft array for each component
fft_alloc_local = fftw_mpi_local_size_3d(fft_N, fft_M, fft_L, MPI_COMM_WORLD, &
                                         fft_local_n0, fft_local_0_start)

! allocate the data pointer (9 tensor components)
fft_datapointer = fftw_alloc_complex(9*int(fft_alloc_local, C_SIZE_T))

! link both pointers to the same array
call c_f_pointer(fft_datapointer, fft_in,  [FFTDIM, FFTDIM, fft_L, fft_M, fft_local_n0])
call c_f_pointer(fft_datapointer, fft_out, [FFTDIM, FFTDIM, fft_L, fft_M, fft_local_n0])

! create plans
fft_plan_fwd = fftw_mpi_plan_dft_3d(fft_N, fft_M, fft_L, &                    ! dimensions
                                    fft_in(1,1,:,:,:), fft_out(1,1,:,:,:), &  ! input, output
                                    MPI_COMM_WORLD, FFTW_FORWARD, FFTW_MEASURE)
fft_plan_bkw = fftw_mpi_plan_dft_3d(fft_N, fft_M, fft_L, &                    ! dimensions
                                    fft_in(1,1,:,:,:), fft_out(1,1,:,:,:), &  ! input, output
                                    MPI_COMM_WORLD, FFTW_BACKWARD, FFTW_MEASURE)
Now if I use this piece of code and the number of processors is a power of 2 (2, 4, 8, ...) everything works fine, but if I use for instance 6, the application gives an error. How could I solve this issue?
Do you have any better strategies instead of allocating a 5D array, and without moving too much data?
Thanks in advance
Andrea
I found the solution to this problem using the fftw_mpi_plan_many interface.
The code performing this computation follows here. It calculates a 3D (L x M x N) complex-to-complex transform of a tensor, component by component (11, 12, ...), using MPI capabilities. The extent of the third dimension (N) must be divisible by the number of cores used.
program test_fftw
    use, intrinsic :: iso_c_binding
    implicit none
    include 'mpif.h'
    include 'fftw3-mpi.f03'

    integer(C_INTPTR_T) :: L = 8            ! extent of x data
    integer(C_INTPTR_T) :: M = 8            ! extent of y data
    integer(C_INTPTR_T) :: N = 192          ! extent of z data
    integer(C_INTPTR_T) :: FFT_12_DIM = 3   ! tensor dimension
    integer(C_INTPTR_T) :: ll, mm, nn, i, j
    complex(C_DOUBLE_COMPLEX) :: fout

    ! "plan many" data variables
    integer(C_INTPTR_T) :: howmany = 9      ! number of elements of the tensor
    integer :: rank = 3                     ! rank of the transform
    integer(C_INTPTR_T), dimension(3) :: fft_dims   ! array containing the data extents
    integer(C_INTPTR_T) :: alloc_local_many, fft_local_n0, fft_local_0_start
    complex(C_DOUBLE_COMPLEX), pointer :: fft_data(:,:,:,:,:)
    type(C_PTR) :: fft_datapointer, plan_many
    integer :: ierr, myid, nproc

    ! Initialize
    call mpi_init(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
    call fftw_mpi_init()

    ! write the data dimensions in reversed order
    fft_dims(3) = L
    fft_dims(2) = M
    fft_dims(1) = N

    ! use of local_size_many
    alloc_local_many = fftw_mpi_local_size_many(rank, &  ! rank of the transform, in this case 3
        fft_dims, &                ! array containing the data dimensions in reversed order
        howmany, &                 ! number of transforms to compute, in this case 3x3 = 9
        FFTW_MPI_DEFAULT_BLOCK, &  ! default block size
        MPI_COMM_WORLD, &          ! MPI communicator
        fft_local_n0, &            ! local number of slices per core
        fft_local_0_start)         ! local shift on the last dimension

    fft_datapointer = fftw_alloc_complex(alloc_local_many)  ! allocate aligned memory for the data

    ! associate the data pointer with the allocated memory: note the natural order
    call c_f_pointer(fft_datapointer, fft_data, [FFT_12_DIM, FFT_12_DIM, L, M, fft_local_n0])

    ! create the plan for many in-place multidimensional transforms
    plan_many = fftw_mpi_plan_many_dft( &
        rank, fft_dims, howmany, &
        FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK, &
        fft_data, fft_data, &
        MPI_COMM_WORLD, FFTW_FORWARD, FFTW_ESTIMATE)

    ! initialize the data to some function my_function(i,j)
    do nn = 1, fft_local_n0
        do mm = 1, M
            do ll = 1, L
                do i = 1, FFT_12_DIM
                    do j = 1, FFT_12_DIM
                        fout = ll*mm*nn*i*j
                        fft_data(i, j, ll, mm, nn) = fout
                    end do
                end do
            end do
        end do
    end do

    call fftw_mpi_execute_dft(plan_many, fft_data, fft_data)

    call fftw_destroy_plan(plan_many)
    call fftw_mpi_cleanup()
    call fftw_free(fft_datapointer)
    call mpi_finalize(ierr)

end program test_fftw
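For reference, a rough sketch of how this example might be compiled and run (the exact wrapper name, include path and link flags depend on your MPI and FFTW installation, so treat these as placeholders):
$ mpif90 test_fftw.f90 -lfftw3_mpi -lfftw3 -o test_fftw
$ mpirun -np 6 ./test_fftw    # N = 192 is divisible by 6, so this decomposition is valid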
Thanks everyone for the help!

Fortran 90: expected bounds specification during pointer assignment

I am new to Fortran. I am writing a program in Fortran 90 to get the non-zero elements of an array and put them into a new array using a pointer function, as follows:
program prog
    implicit none
    integer, target :: a(5)
    integer :: i
    integer, pointer :: nz(:)

    a(1) = 1
    a(2) = 0
    a(3) = 0
    a(4) = 2
    a(5) = 3

    nz => non_zeros(a)
    do i = 1, size(nz)
        write(*,*) nz(i)
    end do

contains

    function non_zeros(a)
        integer, target :: a(:)
        integer, pointer :: non_zeros(:)
        integer :: n, i, j

        n = count(a .ne. 0)
        allocate(non_zeros(n))
        j = 0
        do i = 1, size(a)
            if (a(i) .ne. 0) then
                j = j + 1
                non_zeros(j) => a(i)
            end if
        end do
    end function non_zeros

end program prog
During compilation I got the error:
non_zeros(j) => a(i)
1
Error: Expected bounds specification for 'non_zeros' at (1)
Can you please tell me what I did wrong? Thank you in advance!
Update of my question: following High Performance Mark's explanation, I defined a derived type:
program prog
    implicit none
    integer, target :: a(5)
    type dt
        integer, pointer :: x
    end type
    type(dt), allocatable :: nz(:)

    a(1) = 1
    a(2) = 0
    a(3) = 0
    a(4) = 2
    a(5) = 3

    nz = non_zeros(a)

contains

    function non_zeros(a)
        integer, target :: a(:)
        type(dt), allocatable :: non_zeros(:)
        integer :: n, i, j

        n = count(a .ne. 0)
        allocate(non_zeros(n))
        j = 0
        do i = 1, size(a)
            if (a(i) .ne. 0) then
                j = j + 1
                non_zeros(j)%x => a(i)
            end if
        end do
    end function non_zeros

end program prog
Now the program works and gives the desired results. However, I did not use a pointer function in this case, since my function returns an allocatable array of pointers, not a pointer to an array. Is there any way to use a pointer function here? Thank you.
To get the non-zero elements of a into a new array you could simply declare
integer, dimension(:), allocatable :: non_zeros
and then populate that with the statement
non_zeros = pack(a,a/=0)
and avoid fiddling around with pointers entirely. This relies on a feature introduced in the 2003 standard, but it is implemented by all (I think) the current crop of Fortran compilers on the market.
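Put together with the data from your example, a minimal sketch of this pack-based approach (the program name and the inline initialisation are just illustrative):
! Minimal sketch of the pack-based approach using the question's data.
program prog_pack
    implicit none
    integer :: a(5) = [1, 0, 0, 2, 3]
    integer, dimension(:), allocatable :: non_zeros
    integer :: i

    ! automatic (re)allocation on assignment, the Fortran 2003 feature mentioned above
    non_zeros = pack(a, a /= 0)

    do i = 1, size(non_zeros)
        write(*,*) non_zeros(i)   ! prints 1, 2, 3
    end do
end program prog_pack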
The code that you have written looks to me as if you want nz to be an array of pointers, with each element in nz pointing to a non-zero element of a. If I'm right, you've misunderstood what a statement such as
integer, pointer :: nz(:)
declares. It does not declare an array of pointers to integers, it declares a pointer to an array of integers. When you write
non_zeros(j) => a(i)
you're making the mistake of trying to set an element of non_zeros to point to an element of a.
The error message is misleading here because the compiler interprets non_zeros(j) as a syntactically incorrect bounds-spec or bounds-remapping, but the real problem is semantic: the compiler doesn't understand your misunderstanding of Fortran.
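For context, the syntax the compiler was expecting on the left-hand side of => is a bounds specification or a bounds remapping, something along the lines of this small sketch (it only illustrates that syntax; it does not implement the array of pointers you were after):
! What a bounds-spec / bounds-remapping pointer assignment actually looks like.
program bounds_remap
    implicit none
    integer, target :: a(6) = [10, 20, 30, 40, 50, 60]
    integer, pointer :: p(:), q(:,:)

    p(0:) => a          ! bounds-spec: p spans all of a, but with lower bound 0
    q(1:2, 1:3) => a    ! bounds-remapping: view the 6 elements as a 2x3 array

    print *, p(0), p(5)   ! 10  60
    print *, q(2, 3)      ! 60
end program bounds_remap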

Synchronize array over MPI processes, if each thread changed a part of it?

I have a program I want to parallelize using MPI. I have not worked with MPI before.
The program calculates the behavior of a large number of objects over time. The data for these objects is stored in arrays, e.g. double precision :: body_x(10000) for the x coordinate. To calculate the behavior of an object, the information about all other objects is needed, so every thread needs to hold all the data but will only update a portion of it. But before the new timestep every thread needs to get the information from all other threads.
As I understand it, MPI_Allgather could be used for this, but it needs a send buffer and a receive buffer. How can I synchronize an array over different threads if each thread updated a different part of the array? Do I have to send the whole array from each thread to the master into a receive buffer, update the specific part of the master's array, and after all threads have sent their data re-broadcast from the master?
This is a pretty basic question, but I'm very new to MPI and all the examples I found are pretty simple and do not cover this. Thanks for any help.
Pseudo-example (assuming Fortran-style vectors with first index 1):
(Yes, the send/receive would better be done non-blocking; this is for the sake of simplicity.)
if (master) then
    readInputFile
end if

MPI_Bcast(numberOfObjects)

allocate body_arrays(numberOfObjects)
if (master) then
    fill body_arrays   ! with the data from the input file
end if

MPI_Bcast(body_arrays)

objectsPerThread = numberOfObjects / threadCount
myStart = threadID * objectsPerThread + 1
myEnd   = (threadID + 1) * objectsPerThread

do while (t < t_end)
    do i = myStart, myEnd
        do stuff for body_arrays(i)
    end do

    ! here is the question
    if (.not. master) then
        MPI_Send(body_arrays, toMaster)
    else
        do i = 1, threadCount - 1
            MPI_Receive(body_arrays_receive, senderID)
            body_arrays(senderID*objectsPerThread+1 : (senderID+1)*objectsPerThread) = &
                body_arrays_receive(senderID*objectsPerThread+1 : (senderID+1)*objectsPerThread)
        end do
    end if
    MPI_Bcast(body_arrays)
    ! ----

    t = t + dt
end do
It sounds like you want MPI_Allgather. To avoid needing a separate send buffer, you may be able to use the MPI_IN_PLACE value. That tells MPI to use the same buffer for both send and receive.
See http://mpi-forum.org/docs/mpi-2.2/mpi22-report/node99.htm#Node99
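A minimal, self-contained sketch of how that could look for this access pattern (assuming the number of objects divides evenly among the processes; names such as body_x and chunk are illustrative, not from the question):
! Sketch of in-place MPI_Allgather: each rank updates its own slice of body_x,
! then the collective fills in everyone else's slices without a separate send buffer.
program allgather_in_place
    use mpi
    implicit none
    integer, parameter :: nobj = 12
    double precision :: body_x(nobj)
    integer :: rank, nprocs, ierr, chunk, myStart, myEnd, i

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    chunk   = nobj / nprocs          ! assumes nobj is divisible by nprocs
    myStart = rank*chunk + 1
    myEnd   = (rank + 1)*chunk

    body_x = 0.0d0
    do i = myStart, myEnd
        body_x(i) = dble(rank)       ! each process updates only its own slice
    end do

    ! With MPI_IN_PLACE each process's contribution is read from its own slice
    ! of the receive buffer; sendcount and sendtype are ignored.
    call MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, &
                       body_x, chunk, MPI_DOUBLE_PRECISION, &
                       MPI_COMM_WORLD, ierr)

    if (rank == 0) write(*,*) body_x
    call MPI_Finalize(ierr)
end program allgather_in_place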
The array chunks from all processes can be combined using a call to MPI_Allgatherv. The following is a complete example in Fortran. It defines an array of 11 elements (indexed from -5 to 5). Then each process sets a chunk of that array to some complex numbers. Finally, the call to MPI_Allgatherv pulls all the chunks together. The calculations of the chunk sizes, and some of the parameters that need to be passed to MPI_Allgatherv, are encapsulated in the mpi_split routine.
program test
    use mpi
    implicit none

    integer, parameter :: idp = 8
    integer, parameter :: n_tasks = 11
    real(idp), parameter :: zero = 0.0d0
    complex(idp), parameter :: czero = cmplx(zero, zero, kind=idp)

    integer :: mpi_n_procs, mpi_proc_id, error
    integer :: i, i_from, i_to
    complex(idp) :: c(-5:5)
    real(idp) :: split_size
    integer, allocatable :: recvcount(:), displs(:)

    call MPI_Init(error)
    call MPI_Comm_size(MPI_COMM_WORLD, mpi_n_procs, error)
    call MPI_Comm_rank(MPI_COMM_WORLD, mpi_proc_id, error)

    allocate(recvcount(mpi_n_procs))
    allocate(displs(mpi_n_procs))

    i_from = -5
    i_to = 5

    ! each process covers only part of the array
    call mpi_split(i_from, i_to, counts=recvcount, displs=displs)
    write(*,*) "ID", mpi_proc_id, ":", i_from, "..", i_to
    if (mpi_proc_id == 0) then
        write(*,*) "Counts: ", recvcount
        write(*,*) "Displs: ", displs
    end if

    c(:) = czero
    do i = i_from, i_to
        c(i) = cmplx(real(i, idp), real(i+1, idp), kind=idp)
    end do

    call MPI_Allgatherv(c(i_from), i_to-i_from+1, MPI_DOUBLE_COMPLEX, c, &
    &                   recvcount, displs, MPI_DOUBLE_COMPLEX, MPI_COMM_WORLD, &
    &                   error)

    if (mpi_proc_id == 0) then
        do i = -5, 5
            write(*,*) i, ":", c(i)
        end do
    end if

    deallocate(recvcount, displs)
    call MPI_Finalize(error)

contains

    !! #description: split the range (a,b) into equal chunks, where each chunk is
    !!               handled by a different MPI process
    !! #param: a       On input, the lower bound of an array to be processed. On
    !!                 output, the lower index of the chunk that the MPI process
    !!                 `proc_id` should process
    !! #param: b       On input, the upper bound of an array. On output, the
    !!                 upper index of the chunk that process `proc_id` should
    !!                 process.
    !! #param: n_procs The total number of available processes. If not given,
    !!                 this is determined automatically from the MPI environment.
    !! #param: proc_id The (zero-based) process ID (`0 <= proc_id < n_procs`). If
    !!                 not given, the ID of the current MPI process
    !! #param: counts  If given, must be of size `n_procs`. On output, the chunk
    !!                 size for each MPI process
    !! #param: displs  If given, must be of size `n_procs`. On output, the offset
    !!                 of the first index processed by each MPI process, relative
    !!                 to the input value of `a`
    subroutine mpi_split(a, b, n_procs, proc_id, counts, displs)
        integer, intent(inout) :: a
        integer, intent(inout) :: b
        integer, optional, intent(in) :: n_procs
        integer, optional, intent(in) :: proc_id
        integer, optional, intent(inout) :: counts(:)
        integer, optional, intent(inout) :: displs(:)

        integer :: mpi_n_procs, n_tasks, mpi_proc_id, error
        integer :: aa, bb
        real(idp) :: split_size
        logical :: mpi_is_initialized

        mpi_n_procs = 1
        if (present(n_procs)) mpi_n_procs = n_procs
        mpi_proc_id = 0
        if (present(proc_id)) mpi_proc_id = proc_id
        if (.not. present(n_procs)) then
            call MPI_Comm_size(MPI_COMM_WORLD, mpi_n_procs, error)
        end if
        if (.not. present(proc_id)) then
            call MPI_Comm_rank(MPI_COMM_WORLD, mpi_proc_id, error)
        end if

        aa = a
        bb = b
        n_tasks = bb - aa + 1
        split_size = real(n_tasks, idp) / real(max(mpi_n_procs, 1), idp)
        a = nint(mpi_proc_id * split_size) + aa
        b = min(aa + nint((mpi_proc_id+1) * split_size) - 1, bb)

        if (present(counts)) then
            do mpi_proc_id = 0, mpi_n_procs-1
                counts(mpi_proc_id+1) = max(nint((mpi_proc_id+1) * split_size) &
                &                           - nint((mpi_proc_id) * split_size), 0)
            end do
        end if
        if (present(displs)) then
            do mpi_proc_id = 0, mpi_n_procs-1
                displs(mpi_proc_id+1) = min(nint(mpi_proc_id * split_size), bb-aa)
            end do
        end if
    end subroutine mpi_split

end program
