MPI inter communication: What is the peer communicator?

MPI inter communication: What is the peer communicator? - mpi

I'm trying to understand how to use MPI_Intercomm_create to create a communication handle from one group to another. These two groups are also written in their own C files so there is no way of one group directly accessing the other's communication handle unless I use a global variable or the like. How do I get the "peer_comm" (3rd argument of the call) for the other group? Or am I just not understanding something?

MPI_Intercomm_create() operates on communicator (e.g. MPI_Comm) and not on groups (e.g. MPI_Group) so let's use the right semantic here.
If you launch several binaries with the same mpirun command line, then they are all in MPI_COMM_WORLD, and this is likely what you want to use for peer_comm.
If you use MPI_Comm_spawn() in order to launch "the other binaries", then it returns your inter communicator, so you likely do not even need MPI_Intercomm_create().
I strongly encourage you to write a Minimal, Complete, and Verifiable example. Not only it will help you clear some confusion, you will more likely get a precise answer once the issue is clearly stated.

Related

How to implement sampling in ns3?

I wonder how to implement sampling in ns3. What exactly I want to implement is to create a simple network of switches and hosts using p2p links. Then, setting a probability (lets say 0.1) for an specific switch and expecting that every packet passing the switch will be captured with probability that I defined earlier. (Pretty much like the sampling in sflow or netflow).
I browsed nsnam.org, and the only tool I found regarding my question is Flow Monitor which I think is not helpful for my purpose.

There isn't a direct way to implement the behavior you want, but there is a solution.
Set up a normal hook to get all packets going through one of the switches. Refer to the tutorial to learn how to use the tracing system.
Then, use a RandomVariable at the beginning of your function to determine whether you want ignore that packet or not. The RandomVariable will need to be in global scope or passed in as parameter to the function.

MPI: Local and non-local calls

I want to know what exactly are the two, and how they are different. What are the advantage or disadvantage of the two types of calls? Really appreciate using some small example code.

These are detailed, e.g., in here: http://www.netlib.org/utk/papers/mpi-book/node14.html
It's not really proper to talk about advantages of either. They are for different purposes. An example of a local call MPI_Comm_rank, since it doesn't need input from other processes and an example of a non-local call is MPI_Send, since it will have to communicate with some other process that will recieve the message.

Effective debugging techniques for unix pipes?

I'm very new to unix programming, so please bear with me. :)
I'd like to pass data between two processes. I was going to use named pipes, but read about these "half-duplex" pipes, and it was very intriguing, so I figured that I would give them a try first.
I have two issues with these pipes thus far:
I haven't figured out how to get execlp to run another application from my child process
Even if I could, debugging is tough because I've only been able to set breakpoints in the parent process
I'm sure there are reasons for these issues. I am starting to wonder if it makes sense to just forget about them and just use named pipes so I can debug each application in a separate instance of eclipse.
If there is any relevant information, please let me know. The code I am using is essentially what it found on tldp.org.
EDIT -- I renamed my question to be about unix pipes in general. I had assumed that for named pipes, I wouldn't have to use fork(), but all of the examples I have seen so far require it. So regardless of half-duplex or named pipes, I'm going to need to be able to debug the child process somehow!
EDIT #2 -- this example clearly shows that what I had seen before (on an IBM link) regarding named pipes wasn't necessarily true.

I recommend two tools:
strace -ff should give you a trace of all significant events, allowing you to examine in detail what's going on, namely, all reads and writes;
lsof allows you to dump file descriptors of involved processes, clearly showing what is connected to what else and, in particular, if you forgot to close() some descriptors and the whole thing deadlocks.

what are exactly MPI, MPICH, and OPENMPI? what does "implementation" mean in this context?

My question might seem silly to those who have been in the field for long time, but I appreciate your patience in elaborating it for me.
When they say MPICH is an "implementation" of MPI, what does it mean?
Is the following analogy true(?):
if we think of MPI as a set of standards for a FORTRAN compiler, then MPICH, and OPENMPI are different versions of FORTRAN compilers, like Intel.Fortran, Compaq.Fortran, GNU.Fortran, and so on.

MPI is a standard: it outlines a particular model for message passing in a distributed system. However, it only gives a series of requirements: it does not actually include any code, nor does it specify how exactly these requirements need to be fulfilled. For example, take a look at this excerpt from the official MPI 2.2 spec (as of today):
A valid MPI implementation guarantees certain general properties of
point-to-point communication, which are described in this section.
Order Messages are non-overtaking: If a sender sends two messages in succession to the same destination, and both match the same
receive, then this operation cannot receive the second message if the
first one is still pending.
It then goes on to explain the rationale behind this requirement and provide an example, but says nothing more about the requirement itself.
An MPI implementation is a library that fulfills every requirement - like the one above - in the MPI specification. However, the standard contains absolutely no requirements as to what language constructs, OS calls, 3rd party libraries, etc can/can't/should be used. Occasionally, it will give advice to implementors, like this:
Advice to implementors. The implementation may keep a reference count
of active communications that use the datatype, in order to decide
when to free it. Also, one may implement constructors of derived
datatypes so that they keep pointers to their datatype arguments,
rather then copying them. In this case, one needs to keep track of
active datatype definition references in order to know when a datatype
object can be freed. (End of advice to implementors.)
however, these are still vague, very language-agnostic, and only recommendations: an implementation can ignore every single one of these advices, and still conform to the standard.
So yes, in essence it's similar to various implementations of a compiler. If a program takes valid source code for a language, and produces binary code that does everything that the language specification says it should do given the original source code, it's a conforming compiler for that language. Similarly, if you can use a library to pass messages in a way that doesn't break any rules of the MPI spec, then that's a valid MPI implementation.

Multitasking using setjmp, longjmp

is there a way to implement multitasking using setjmp and longjmp functions

You can indeed. There are a couple of ways to accomplish it. The difficult part is initially getting the jmpbufs which point to other stacks. Longjmp is only defined for jmpbuf arguments which were created by setjmp, so there's no way to do this without either using assembly or exploiting undefined behavior. User level threads are inherently not portable, so portability isn't a strong argument for not doing it really.
step 1
You need a place to store the contexts of different threads, so make a queue of jmpbuf stuctures for however many threads you want.
Step 2
You need to malloc a stack for each of these threads.
Step 3
You need to get some jmpbuf contexts which have stack pointers in the memory locations you just allocated. You could inspect the jmpbuf structure on your machine, find out where it stores the stack pointer. Call setjmp and then modify its contents so that the stack pointer is in one of your allocated stacks. Stacks usually grow down, so you probably want your stack pointer somewhere near the highest memory location. If you write a basic C program and use a debugger to disassemble it, and then find instructions it executes when you return from a function, you can find out what the offset ought to be. For example, with system V calling conventions on x86, you'll see that it pops %ebp (the frame pointer) and then calls ret which pops the return address off the stack. So on entry into a function, it pushes the return address and frame pointer. Each push moves the stack pointer down by 4 bytes, so you want the stack pointer to start at the high address of the allocated region, -8 bytes (as if you just called a function to get there). We will fill the 8 bytes next.
The other thing you can do is write some very small (one line) inline assembly to manipulate the stack pointer, and then call setjmp. This is actually more portable, because in many systems the pointers in a jmpbuf are mangled for security, so you can't easily modify them.
I haven't tried it, but you might be able to avoid the asm by just deliberately overflowing the stack by declaring a very large array and thus moving the stack pointer.
Step 4
You need exiting threads to return the system to some safe state. If you don't do this, and one of the threads returns, it will take the address right above your allocated stack as a return address and jump to some garbage location and likely segfault. So first you need a safe place to return to. Get this by calling setjmp in the main thread and storing the jmpbuf in a globally accessible location. Define a function which takes no arguments and just calls longjmp with the saved global jmpbuf. Get the address of that function and copy it to your allocated stacks where you left room for the return address. You can leave the frame pointer empty. Now, when a thread returns, it will go to that function which calls longjmp, and jump right back into the main thread where you called setjmp, every time.
Step 5
Right after the main thread's setjmp, you want to have some code that determines which thread to jump to next, pulling the appropriate jmpbuf off the queue and calling longjmp to go there. When there are no threads left in that queue, the program is done.
Step 6
Write a context switch function which calls setjmp and stores the current state back on the queue, and then longjmp on another jmpbuf from the queue.
Conclusion
That's the basics. As long as threads keep calling context switch, the queue keeps getting repopulated, and different threads run. When a thread returns, if there are any left to run, one is chosen by the main thread, and if none are left, the process terminates. With relatively little code you can have a pretty basic cooperative multitasking setup. There are more things you probably want to do, like implement a cleanup function to free the stack of a dead thread, etc. You can also implement preemption using signals, but that is much more difficult because setjmp doesn't save the floating point register state or the flags registers, which are necessary when the program is interrupted asynchronously.

It may be bending the rules a little, but GNU pth does this. It's possible, but you probably shouldn't try it yourself except as an academic proof-of-concept exercise, use the pth implementation if you want to do it seriously and in a remotely portable fashion -- you'll understand why when you read the pth thread creation code.
(Essentially it uses a signal handler to trick the OS into creating a fresh stack, then longjmp's out of there and keeps the stack around. It works, evidently, but it's sketchy as hell.)
In production code, if your OS supports makecontext/swapcontext, use those instead. If it supports CreateFiber/SwitchToFiber, use those instead. And be aware of the disappointing truth that one of the most compelling use of coroutines -- that is, inverting control by yielding out of event handlers called by foreign code -- is unsafe because the calling module has to be reentrant, and you generally can't prove that. This is why fibers still aren't supported in .NET...

This is a form of what is known as userspace context switching.
It's possible but error-prone, especially if you use the default implementation of setjmp and longjmp. One problem with these functions is that in many operating systems they'll only save a subset of 64-bit registers, rather than the entire context. This is often not enough, e.g. when dealing with system libraries (my experience here is with a custom implementation for amd64/windows, which worked pretty stable all things considered).
That said, if you're not trying to work with complex external codebases or event handlers, and you know what you're doing, and (especially) if you write your own version in assembler that saves more of the current context (if you're using 32-bit windows or linux this might not be necessary, if you use some versions of BSD I imagine it almost definitely is), and you debug it paying careful attention to the disassembly output, then you may be able to achieve what you want.

I did something like this for studies.
https://github.com/Kraego/STM32L476_MiniOS/blob/main/Usercode/Concurrency/scheduler.c
The context/thread switching is done by setjmp/longjmp. The difficult part was to get the allocated stack correct (see allocateStack()) this depends on your platform.
This is just a demonstration how this could work, I would never use this in production.

As was already mentioned by Sean Ogden,
longjmp() is not good for multitasking, as
it can only move the stack upward and can't
jump between different stacks. No go with that.
As mentioned by user414736, you can use getcontext/makecontext/swapcontext
functions, but the problem with those is that
they are not fully in user-space. They actually
call the sigprocmask() syscall because they switch
the signal mask as part of the context switching.
This makes swapcontext() much slower than longjmp(),
and you likely don't want the slow co-routines.
To my knowledge there is no POSIX-standard solution to
this problem, so I compiled my own from different
available sources. You can find the context-manipulating
functions extracted from libtask here:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/mcontext
The functions are:
getmcontext(), setmcontext(), makemcontext() and swapmcontext().
They have the similar semantic to the standard functions with similar names,
but they also mimic the setjmp() semantic in that getmcontext()
returns 1 (instead of 0) when jumped to by setmcontext().
On top of that you can use a port of libpcl, the coroutine library:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/libpcl
With this, it is possible to implement the fast cooperative user-space
threading. It works on linux, on i386 and x86_64 arches.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex