I know MPI_SEND will block until MPI_RECV is executed on the target node. What about MPI_Bcast? Does it need to wait until all other nodes have received the data?
MPI_Bcast is a collective operation, so all ranks in the communicator must call it. When it returns successfully on a given rank, that rank's part of the transfer has completed: the root may reuse its buffer, and a non-root rank's buffer contains the broadcast data. It does not guarantee that every other rank has already received the message, since a broadcast is not necessarily synchronising.
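As a minimal sketch (the variable names and the value 42 are just illustrative), every rank makes the same call with the same root:

int rank, data = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) data = 42;                        /* only the root fills the buffer */
MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD); /* every rank calls this */
/* from here on, data == 42 on every rank */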
I am learning MPI, but I do not know exactly how they differ.
Could someone explain to me what the differences between MPI_Send and MPI_Isend are, please?
MPI_Send() blocks until the send buffer can be modified.
From a pragmatic point of view, MPI_Send() of a large message blocks until it is received.
Though MPI_Send() of a short message might return immediately, a correctly written MPI application should never expect MPI_Send() to return before the message is received.
MPI_Isend() returns immediately, and the send buffer cannot be modified until the issued request completes.
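As a sketch (peer and tag are placeholders for a destination rank and a message tag), the buffer handed to MPI_Isend() must stay untouched until MPI_Wait() reports completion:

int buf[100] = {0};
MPI_Request req;
MPI_Isend(buf, 100, MPI_INT, peer, tag, MPI_COMM_WORLD, &req); /* returns at once */
/* ... overlap computation here, but do not write to buf ... */
MPI_Wait(&req, MPI_STATUS_IGNORE); /* after this, buf may be reused */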
When exactly does a signal start being handled in Unix? Is the signal processed when the system switches into kernel mode, or immediately when it is received? I assume it is processed immediately upon receipt.
A signal is the Unix mechanism for allowing a user space process to receive asynchronous notifications. As such, signals are always "delivered by" the kernel. And hence, it is impossible for a signal to be delivered without a transition into kernel mode. Therefore it doesn't make sense to talk of a process "receiving" a signal (or sending one) without the involvement of the kernel.
Signals can be generated in different ways.
They can be generated by a device driver within the kernel (for example, tty driver in response to the interrupt, kill, or stop keys or in response to input or output by a backgrounded process).
They can be generated by the kernel in response to an emergent out-of-memory condition.
They can be generated by a processor exception in response to something the process itself does during its execution (illegal instruction, divide by zero, reference an illegal address).
They can be generated directly by another process (or by the receiving process itself) via kill(2).
SIGPIPE can be generated as a result of writing to a pipe that has no reader.
But in every case, the signal is delivered to the receiving process by the kernel and hence through a kernel-mode transition.
The kernel might need to force that transition -- pre-empt the receiving process -- in order to deliver the signal (for example, in the case of a CPU-bound process running on processor A being sent a signal by a different process running on processor B).
In some cases, the signal may be handled for the process by the kernel itself (for example, with SIGKILL -- or several others when no signal handler is configured).
Actually invoking a process' signal handler is done by manipulating the process' user space stack so that the signal handler is invoked on return from kernel-mode and then, if/when the signal handler procedure returns, the originally executing code can be resumed.
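For concreteness, here is a minimal sketch of installing a custom handler with sigaction(2); the handler name is arbitrary:

#include <signal.h>
#include <unistd.h>

static void on_sigint(int signo)
{
    /* runs on return from kernel mode; only async-signal-safe calls allowed */
    write(STDOUT_FILENO, "caught SIGINT\n", 14);
}

int main(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
    pause(); /* sleep until a signal is delivered */
    return 0;
}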
As to when it is processed, that is subject to a number of different factors.
There are operating system (i.e. kernel) operations that are never interrupted by signals (these are generally relatively short duration operations), in which case the signal will be processed after their completion.
The process may have temporarily blocked signal delivery (see the sketch after this list), in which case the signal will be "pending" until it is unblocked.
The process could be swapped out or non-runnable for any of a number of reasons -- in which case, its signal handler cannot be invoked until the process is runnable again.
Resuming the process in order to deliver the signal might be delayed by interrupts and higher priority tasks.
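A sketch of that "blocked and pending" case with sigprocmask(2) (a fragment assumed to run inside an ordinary single-threaded process):

sigset_t set, old;
sigemptyset(&set);
sigaddset(&set, SIGTERM);
sigprocmask(SIG_BLOCK, &set, &old);   /* a SIGTERM sent from now on stays pending */
/* ... critical section ... */
sigprocmask(SIG_SETMASK, &old, NULL); /* a pending SIGTERM is delivered about here */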
A signal will generally be detected promptly by the process which receives it.
Depending on the signal type, the process might handle it with the default action, might ignore it, or might execute a custom handler. It depends a lot on what the process is and what signal it receives. The exception is the KILL signal (9), which is handled by the kernel itself and terminates the execution of the process it was sent to.
Suppose my MPI process is waiting for a very big message, and I am waiting for it with MPI_Probe. Is it correct to assume that the MPI_Probe call will return as soon as the process receives the first notice of the message from the network (something like a header with the size)?
I.e., will it return much faster than if I were waiting for the message with MPI_Recv, because it would not need to receive the full message?
The standard is fairly silent on this matter (MPI-3.0, section 3.8.1), but does offer this:
The MPI implementation of MPI_PROBE and MPI_IPROBE needs to guarantee progress: if a call to MPI_PROBE has been issued by a process, and a send that matches the probe has been initiated by some process, then the call to MPI_PROBE will return, unless the message is received by another concurrent receive operation (that is executed by another thread at the probing process).
Since both MPI_PROBE and MPI_RECV will engage the progress engine, I would doubt there is much difference between the two functions, aside from a memory copy. By engaging the progress engine, it's likely the message will be received (internally) by the MPI implementation. The last step of copying it into the user's buffer can be avoided in MPI_PROBE.
If you are worried about performance, then avoiding MPI_ANY_SOURCE and MPI_ANY_TAG if possible will help most implementations (certainly MPICH) take a faster path.
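The usual pattern built on MPI_Probe (src and tag are placeholders, and mpi.h plus stdlib.h are assumed to be included) probes first and sizes the receive buffer afterwards:

MPI_Status status;
int count;
MPI_Probe(src, tag, MPI_COMM_WORLD, &status); /* returns once a matching message is pending */
MPI_Get_count(&status, MPI_DOUBLE, &count);   /* number of doubles in that message */
double *buf = malloc(count * sizeof(double));
MPI_Recv(buf, count, MPI_DOUBLE, status.MPI_SOURCE, status.MPI_TAG,
         MPI_COMM_WORLD, MPI_STATUS_IGNORE);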
I got slightly mixed up regarding the concepts of synchronous and asynchronous in the context of blocking and non-blocking operations (in OpenMPI), from these sources:
link 1: MPI_Isend is not necessarily asynchronous (so it can be synchronous?)
link 2: The MPI_Isend() and MPI_Irecv() are the ASYNCHRONOUS communication primitives of MPI.
I have already gone through the previous sync/async/blocking/non-blocking questions on Stack Overflow (asynchronous vs non-blocking), but they were of no help to me.
As far as I know:
Immediate (MPI_Isend): the call returns and execution continues with the next line -> non-blocking
Standard/non-immediate (MPI_Send): for large messages it blocks until the transfer completes
A synchronous operation blocks (http://www.cs.unc.edu/~dewan/242/s07/notes/ipc/node9.html)
An asynchronous operation is non-blocking (http://www.cs.unc.edu/~dewan/242/s07/notes/ipc/node9.html)
So how and why can MPI_ISEND be blocking (link 1) as well as non-blocking (link 2)?
I.e., what is meant by an asynchronous and a synchronous MPI_Isend here?
Similar confusion arises regarding MPI_Ssend and MPI_Issend, since the S in MPI_SSEND means synchronous (or blocking):
MPI_Ssend: a synchronous send blocks until the data is received by the remote process and an acknowledgement is received by the sender,
MPI_Issend: means immediate synchronous send, and immediate means non-blocking.
So how can MPI_ISSEND be synchronous and yet return immediately?
I guess more clarity is needed on asynchronous and synchronous in the context of blocking and non-blocking OpenMPI communication.
A practical example or analogy in this regard would be very useful.
There is a distinction between when MPI function calls return (blocking vs. non-blocking) and when the corresponding operation completes (standard, synchronous, buffered, ready-mode).
Non-blocking calls MPI_I... return immediately, no matter if the operation has completed or not. The operation continues in the background, or asynchronously. Blocking calls do not return unless the operation has completed. Non-blocking operations are represented by their handle which could be used to perform a blocking wait (MPI_WAIT) or non-blocking test (MPI_TEST) for completion.
Completion of an operation means that the supplied data buffer is no longer being accessed by MPI and it could therefore be reused. Send buffers become free for reuse either after the message has been put in its entirety on the network (including the case where part of the message might still be buffered by the network equipment and/or the driver), or has been buffered somewhere by the MPI implementation. The buffered case does not require that the receiver has posted a matching receive operation and hence is not synchronising - the receive could be posted at a much later time. The blocking synchronous send MPI_SSEND does not return unless the receiver has posted a receive operation, thus it synchronises both ranks. The non-blocking synchronous send MPI_ISSEND returns immediately but the asynchronous (background) operation won't complete unless the receiver has posted a matching receive.
A blocking operation is equivalent to a non-blocking one immediately followed by a wait. For example:
MPI_Ssend(buf, len, MPI_TYPE, dest, tag, MPI_COMM_WORLD);
is equivalent to:
MPI_Request req;
MPI_Status status;
MPI_Issend(buf, len, MPI_TYPE, dest, tag, MPI_COMM_WORLD, &req);
MPI_Wait(&req, &status);
The standard send (MPI_SEND / MPI_ISEND) completes once the message has been constructed and the data buffer provided as its first argument might be reused. There is no synchronisation guarantee - the message might get buffered locally or remotely. With most implementations, there is usually some size threshold: messages up to that size get buffered while longer messages are sent synchronously. The threshold is implementation-dependent.
Buffered sends always buffer the messages into a user-supplied intermediate buffer, essentially performing a more complex memory copy operation. The difference between the blocking (MPI_BSEND) and the non-blocking version (MPI_IBSEND) is that the former does not return before all the message data has been buffered.
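A sketch of a buffered send (dest and tag are placeholders); the attached buffer must be large enough for the message data plus MPI_BSEND_OVERHEAD:

int msg[100];
int size = 100 * sizeof(int) + MPI_BSEND_OVERHEAD;
char *bsbuf = malloc(size);
MPI_Buffer_attach(bsbuf, size);
MPI_Bsend(msg, 100, MPI_INT, dest, tag, MPI_COMM_WORLD); /* copies msg into bsbuf */
MPI_Buffer_detach(&bsbuf, &size); /* blocks until all buffered messages are transmitted */
free(bsbuf);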
The ready send is a very special kind of operation. It only completes successfully if the destination rank has already posted the receive operation by the time the sender makes the send call. It might reduce the communication latency by eliminating the need to perform some kind of handshake.
I have a program with a master/slave setup, and I have some functions implemented for the master which send different kinds of data to the slaves. Some functions send to individual slaves, but some broadcast information to all the slaves via MPI_Bcast.
I want to have only one receive function in the slaves, so I want to know if I can probe for a message and tell whether it was broadcast or sent as a normal blocking message, since there are different methods to receive what was broadcast and what was sent normally.
No, you can't decide whether to call Bcast or Recv on the basis of a probe call.
An MPI_Bcast call is a collective operation -- all MPI tasks must participate. As a result, these are not like point-to-point communications; they make use of the fact that all processes are involved to make higher-order optimizations.
Because the collective operations imply so much synchronization, it just doesn't make sense to allow other tasks to check to see whether they should start participating in a collective; it's something which has to be built into the logic of a program.
The root process' role in a broadcast is not like a Send; it can't, in general, just call MPI_Bcast and then proceed. The implementation will almost certainly block until some number of the other processes have participated in the broadcast; and
The other process' role in a broadcast is not like receiving a message; in general it will be both receiving and sending information. So participating in a broadcast is different from making a simple Recv call.
So Probe won't work; the documentation for MPI_Probe is fairly clear that it returns information about what would happen upon the next MPI_Recv, and Recv is a different operation than Bcast.
You may be able to get some of what you want with MPI 3.0, which is being finalized now and allows for non-blocking collectives -- e.g., MPI_Ibcast. In that case you could start the broadcast and call MPI_Test to check on the status of the request. However, even here, everyone would need to call MPI_Ibcast first; this just allows easier interleaving of collective and point-to-point communication.
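A hedged sketch of what that would look like (data and count are placeholders); note that every rank still has to make the MPI_Ibcast call:

MPI_Request req;
int flag = 0;
MPI_Ibcast(data, count, MPI_INT, 0, MPI_COMM_WORLD, &req); /* all ranks call this */
while (!flag) {
    MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
    /* ... interleave point-to-point work here ... */
}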