mpirun with multiple executables, reset rank numbers per executable [duplicate]

I have two Open MPI programs, which I start like this:
mpirun -n 4 ./prog1 : -n 2 ./prog2
Now, how do I use MPI_Comm_size(MPI_COMM_WORLD, &size) so that I get the following size values?
prog1 size=4
prog2 size=2
As of now, I get "6" in both programs.

This is doable, albeit a bit cumbersome. The principle is to split MPI_COMM_WORLD into communicators based on the value of argv[0], which contains the executable's name.
It could look something like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <mpi.h>
int main( int argc, char *argv[] ) {
MPI_Init( &argc, &argv );
int wRank, wSize;
MPI_Comm_rank( MPI_COMM_WORLD, &wRank );
MPI_Comm_size( MPI_COMM_WORLD, &wSize );
int myLen = strlen( argv[0] ) + 1;
int maxLen;
// Gathering the maximum length of the executables' names
MPI_Allreduce( &myLen, &maxLen, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD );
// Allocating memory for all of them
char *names = malloc( wSize * maxLen );
// and copying my name to its slot in the array
strcpy( names + ( wRank * maxLen ), argv[0] );
// Now collecting all executables' names
MPI_Allgather( MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
names, maxLen, MPI_CHAR, MPI_COMM_WORLD );
// With that, I can sort out who is executing the same binary as me:
// binIdx ends up as the lowest world rank running the same executable name
int binIdx = 0;
while( strcmp( argv[0], names + binIdx * maxLen ) != 0 ) {
binIdx++;
}
free( names );
// Now, all processes with the same binIdx value are running the same binary
// I can split MPI_COMM_WORLD accordingly
MPI_Comm binComm;
MPI_Comm_split( MPI_COMM_WORLD, binIdx, wRank, &binComm );
int bRank, bSize;
MPI_Comm_rank( binComm, &bRank );
MPI_Comm_size( binComm, &bSize );
printf( "Hello from process WORLD %d/%d running %d/%d %s binary\n",
wRank, wSize, bRank, bSize, argv[0] );
MPI_Comm_free( &binComm );
MPI_Finalize();
return 0;
}
On my machine, I compiled and ran it as follows:
~> mpicc mpmd.c
~> cp a.out b.out
~> mpirun -n 3 ./a.out : -n 2 ./b.out
Hello from process WORLD 0/5 running 0/3 ./a.out binary
Hello from process WORLD 1/5 running 1/3 ./a.out binary
Hello from process WORLD 4/5 running 1/2 ./b.out binary
Hello from process WORLD 2/5 running 2/3 ./a.out binary
Hello from process WORLD 3/5 running 0/2 ./b.out binary
Ideally, this could be greatly simplified by using MPI_Comm_split_type() if a corresponding type for sorting processes by binary existed. Unfortunately, there is no such MPI_COMM_TYPE_ value predefined in the MPI 3.1 standard. The only predefined one is MPI_COMM_TYPE_SHARED, which sorts out processes running on the same shared-memory compute node... Too bad! Maybe something to consider for the next version of the standard?

I know the question is old, but I wanted to add to the answer by Hristo Iliev to make it work not just for Open MPI:
You can use the value of the predefined attribute MPI_APPNUM, which differs between executables (application contexts), as the "color" to split MPI_COMM_WORLD into separate communicators, and then query the size of those sub-communicators. Use MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &val, &flag); to get the value of MPI_APPNUM. Note that the attribute is returned as a pointer, so val must be declared as int *, and flag tells you whether the attribute is set at all.

Since you are using Open MPI, there is a very simple OMPI-specific solution:
#include <stdlib.h>
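MPI_Comm appcomm;
const char *app_num = getenv("OMPI_MCA_orte_app_num");
// The variable may be absent, e.g. when the program is not started via mpirun
int app_id = app_num ? atoi(app_num) : 0;
MPI_Comm_split(MPI_COMM_WORLD, app_id, 0, &appcomm);
There will now be as many different appcomm communicators as there are application contexts.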

Related

Is there any way to infer how many workers are on the same node using MPI?

How can I tell how many workers are set up on the same node? I can get the overall COMM_WORLD size and even, using PMI, which rank a process is on the node. How can I tell how many processes are spun up on each node?

Here you go. Use MPI_Comm_split_type to find subcommunicators corresponding to nodes, then count how many there are, and their sizes.
#include <stdio.h>
#include <mpi.h>
int main( int argc,char **argv ) {
  MPI_Init(&argc,&argv);
  MPI_Comm comm = MPI_COMM_WORLD;
  int procno,nprocs;
  MPI_Comm_size( comm,&nprocs );
  MPI_Comm_rank( comm,&procno );
  // One subcommunicator per shared-memory node
  MPI_Comm node_comm;
  MPI_Comm_split_type( comm,MPI_COMM_TYPE_SHARED,procno,MPI_INFO_NULL,&node_comm);
  int rank_on_node,size_of_node;
  MPI_Comm_rank( node_comm,&rank_on_node );
  MPI_Comm_size( node_comm,&size_of_node );
  // Rank 0 on each node acts as that node's "head"
  int head_node = (rank_on_node==0);
  int number_of_nodes;
  // Allreduce, so that every process (not just rank 0) knows the node count
  MPI_Allreduce( &head_node,&number_of_nodes,1,MPI_INT,MPI_SUM,comm);
  if (procno==0)
    printf("There are %d nodes\n",number_of_nodes);
  // Only the node heads join node_heads; everyone else gets MPI_COMM_NULL
  MPI_Comm node_heads;
  MPI_Comm_split( comm,head_node ? 0 : MPI_UNDEFINED,procno,&node_heads );
  if (head_node) {
    int node_sizes[number_of_nodes];
    MPI_Gather( &size_of_node,1,MPI_INT, node_sizes,1,MPI_INT, 0,node_heads );
    if (procno==0) {
      printf("Node sizes:");
      for (int inode=0; inode<number_of_nodes; inode++)
        printf(" %d",node_sizes[inode]);
      printf("\n");
    }
    MPI_Comm_free( &node_heads );
  }
  MPI_Comm_free( &node_comm );
  MPI_Finalize();
  return 0;
}
For instance, on my system, if I request 3 nodes with 10 processes in total, I get:
There are 3 nodes
Node sizes: 4 3 3
Nice. I'd sort of been expecting "4 4 2".

How to programmatically detect the number of cores and run an MPI program using all cores

I do not want to use mpiexec -n 4 ./a.out to run my program on my Core i7 processor (with 4 cores). Instead, I want to run ./a.out, have it detect the number of cores and fire up MPI to run a process per core.
This SO question and answer MPI Number of processors? led me to use mpiexec.
The reason I want to avoid mpiexec is that my code is destined to be a library inside a larger project I'm working on. The larger project has a GUI and the user will be starting long computations that will call my library, which will in turn use MPI. The integration between the UI and the computation code is not trivial... so launching an external process and communicating via a socket or some other means is not an option. It must be a library call.
Is this possible? How do I do it?

This is quite a nontrivial thing to achieve in general. Also, there is hardly any portable solution that does not depend on some MPI implementation specifics. What follows is a sample solution that works with Open MPI and possibly with other general MPI implementations (MPICH, Intel MPI, etc.). It involves either a second executable or a means for the original executable to call your library directly when given some special command-line argument. It goes like this.
Assume the original executable was started simply as ./a.out. When your library function is called, it calls MPI_Init(NULL, NULL), which initialises MPI. Since the executable was not started via mpiexec, it falls back to the so-called singleton MPI initialisation, i.e. it creates an MPI job that consists of a single process. To perform distributed computations, you have to start more MPI processes and that's where things get complicated in the general case.
MPI supports dynamic process management, in which one MPI job can start a second one and communicate with it using intercommunicators. This happens when the first job calls MPI_Comm_spawn or MPI_Comm_spawn_multiple. The first one is used to start simple MPI jobs that use the same executable for all MPI ranks, while the second one can start jobs that mix different executables. Both need information as to where and how to launch the processes. This comes from the so-called MPI universe, which provides information not only about the started processes, but also about the available slots for dynamically started ones. The universe is constructed by mpiexec or by some other launcher mechanism that takes, e.g., a host file with a list of nodes and the number of slots on each node. In the absence of such information, some MPI implementations (Open MPI included) will simply start the executables on the same node as the original process. MPI_Comm_spawn[_multiple] has an MPI_Info argument that can be used to supply a list of key-value pairs with implementation-specific information. Open MPI supports the add-hostfile key, which can be used to specify a hostfile to be used when spawning the child job. This is useful for, e.g., allowing the user to specify via the GUI a list of hosts to use for the MPI computation. But let's concentrate on the case where no such information is provided and Open MPI simply runs the child job on the same host.
Assume the worker executable is called worker. Or that the original executable can serve as worker if called with some special command-line option, -worker for example. If you want to perform computation with N processes in total, you need to launch N-1 workers. This is simple:
(separate executable)
MPI_Comm child_comm;
MPI_Comm_spawn("./worker", MPI_ARGV_NULL, N-1, MPI_INFO_NULL, 0,
MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
(same executable, with an option)
MPI_Comm child_comm;
char *argv[] = { "-worker", NULL };
MPI_Comm_spawn("./a.out", argv, N-1, MPI_INFO_NULL, 0,
MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
If everything goes well, child_comm will be set to the handle of an intercommunicator that can be used to communicate with the new job. As intercommunicators are kind of tricky to use and the parent-child job division requires complex program logic, one could simply merge the two sides of the intercommunicator into a "big world" communicator that replaces MPI_COMM_WORLD. On the parent's side:
MPI_Comm bigworld;
MPI_Intercomm_merge(child_comm, 0, &bigworld);
On the child's side:
MPI_Comm parent_comm, bigworld;
MPI_Comm_get_parent(&parent_comm);
MPI_Intercomm_merge(parent_comm, 1, &bigworld);
After the merge is complete, all processes can communicate using bigworld instead of MPI_COMM_WORLD. Note that child jobs do not share their MPI_COMM_WORLD with the parent job.
To put it all together, here is a complete functioning example with two separate program codes.
main.c
#include <stdio.h>
#include <mpi.h>
int main (void)
{
MPI_Init(NULL, NULL);
printf("[main] Spawning workers...\n");
MPI_Comm child_comm;
MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
MPI_Comm bigworld;
MPI_Intercomm_merge(child_comm, 0, &bigworld);
int size, rank;
MPI_Comm_rank(bigworld, &rank);
MPI_Comm_size(bigworld, &size);
printf("[main] Big world created with %d ranks\n", size);
// Perform some computation
int data = 1, result;
MPI_Bcast(&data, 1, MPI_INT, 0, bigworld);
data *= (1 + rank);
MPI_Reduce(&data, &result, 1, MPI_INT, MPI_SUM, 0, bigworld);
printf("[main] Result = %d\n", result);
MPI_Barrier(bigworld);
MPI_Comm_free(&bigworld);
MPI_Comm_free(&child_comm);
MPI_Finalize();
printf("[main] Shutting down\n");
return 0;
}
worker.c
#include <stdio.h>
#include <mpi.h>
int main (void)
{
MPI_Init(NULL, NULL);
MPI_Comm parent_comm;
MPI_Comm_get_parent(&parent_comm);
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
printf("[worker] %d of %d here\n", rank, size);
MPI_Comm bigworld;
MPI_Intercomm_merge(parent_comm, 1, &bigworld);
MPI_Comm_rank(bigworld, &rank);
MPI_Comm_size(bigworld, &size);
printf("[worker] %d of %d in big world\n", rank, size);
// Perform some computation
int data;
MPI_Bcast(&data, 1, MPI_INT, 0, bigworld);
data *= (1 + rank);
MPI_Reduce(&data, NULL, 1, MPI_INT, MPI_SUM, 0, bigworld);
printf("[worker] Done\n");
MPI_Barrier(bigworld);
MPI_Comm_free(&bigworld);
MPI_Comm_free(&parent_comm);
MPI_Finalize();
return 0;
}
Here is how it works:
$ mpicc -o main main.c
$ mpicc -o worker worker.c
$ ./main
[main] Spawning workers...
[worker] 0 of 2 here
[worker] 1 of 2 here
[worker] 1 of 3 in big world
[worker] 2 of 3 in big world
[main] Big world created with 3 ranks
[worker] Done
[worker] Done
[main] Result = 6
[main] Shutting down
The child job has to use MPI_Comm_get_parent to obtain the intercommunicator to the parent job. When a process is not part of such a child job, the returned value will be MPI_COMM_NULL. This allows for an easy way to implement both the main program and the worker in the same executable. Here is a hybrid example:
#include <stdio.h>
#include <mpi.h>
MPI_Comm bigworld_comm = MPI_COMM_NULL;
MPI_Comm other_comm = MPI_COMM_NULL;
int parlib_init (const char *argv0, int n)
{
MPI_Init(NULL, NULL);
MPI_Comm_get_parent(&other_comm);
if (other_comm == MPI_COMM_NULL)
{
printf("[main] Spawning workers...\n");
MPI_Comm_spawn(argv0, MPI_ARGV_NULL, n-1, MPI_INFO_NULL, 0,
MPI_COMM_SELF, &other_comm, MPI_ERRCODES_IGNORE);
MPI_Intercomm_merge(other_comm, 0, &bigworld_comm);
return 0;
}
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
printf("[worker] %d of %d here\n", rank, size);
MPI_Intercomm_merge(other_comm, 1, &bigworld_comm);
return 1;
}
int parlib_dowork (void)
{
int data = 1, result = -1, size, rank;
MPI_Comm_rank(bigworld_comm, &rank);
MPI_Comm_size(bigworld_comm, &size);
if (rank == 0)
{
printf("[main] Doing work with %d processes in total\n", size);
data = 1;
}
MPI_Bcast(&data, 1, MPI_INT, 0, bigworld_comm);
data *= (1 + rank);
MPI_Reduce(&data, &result, 1, MPI_INT, MPI_SUM, 0, bigworld_comm);
return result;
}
void parlib_finalize (void)
{
MPI_Comm_free(&bigworld_comm);
MPI_Comm_free(&other_comm);
MPI_Finalize();
}
int main (int argc, char **argv)
{
if (parlib_init(argv[0], 4))
{
// Worker process
(void)parlib_dowork();
printf("[worker] Done\n");
parlib_finalize();
return 0;
}
// Main process
// Show GUI, save the world, etc.
int result = parlib_dowork();
printf("[main] Result = %d\n", result);
parlib_finalize();
printf("[main] Shutting down\n");
return 0;
}
And here is an example output:
$ mpicc -o hybrid hybrid.c
$ ./hybrid
[main] Spawning workers...
[worker] 0 of 3 here
[worker] 2 of 3 here
[worker] 1 of 3 here
[main] Doing work with 4 processes in total
[worker] Done
[worker] Done
[main] Result = 10
[worker] Done
[main] Shutting down
Some things to keep in mind when designing such parallel libraries:
MPI can only be initialised once. If necessary, call MPI_Initialized to check whether MPI has already been initialised.
MPI can only be finalised once. Again, MPI_Finalized is your friend. It can be used in something like an atexit() handler to implement a universal MPI finalisation on program exit.
When used in threaded contexts (common when GUIs are involved), MPI must be initialised with support for threads. See MPI_Init_thread.

You can get the number of CPUs (for example, by using this solution) and then start the MPI processes by calling MPI_Comm_spawn. But you will need to have a separate executable file.

matrix vector multiplication mpi

I want to do matrix-vector multiplication. The code compiles but does not run. Can anyone please help me solve the problem? Thank you in advance.
#include "mpi.h"
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <time.h>
#define DIM 500
int main(int argc, char *argv[])
{
int i, j, n=10000;
int nlocal; /* Number of locally stored rows of A */
double *fb;
double a[DIM * DIM], b[DIM], x[DIM]; /* Will point to a buffer that stores the entire vector b */
int npes, myrank;
MPI_Status status;
MPI_Init(&argc, &argv);
/* Get information about the communicator */
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &npes);
/* Allocate the memory that will store the entire vector b */
fb = (double*)malloc(npes * sizeof(double));
nlocal = n / npes;
/* Gather the entire vector b on each processor using MPI's ALLGATHER operation */
MPI_Allgather(b, nlocal, MPI_DOUBLE, fb, nlocal, MPI_DOUBLE, MPI_COMM_WORLD);
/* Perform the matrix-vector multiplication involving the locally stored submatrix */
for (i = 0; i < nlocal; i++) {
x[i] = 0.0;
for (j = 0; j < n; j++)
x[i] += a[i * n + j] * fb[j];
}
free(fb);
MPI_Finalize();
} //end main
Please help me out in running the code. Thanks.

The issue may come from fb = (double*)malloc(npes * sizeof(double)); which should be fb = (double*)malloc(n * sizeof(double));. Indeed, npes is the number of processes and n is the total length of the vector.
Moreover, the array a is of size 500x500=250000. This is enough to store 25 rows if n=10000... Are you using 400 processes? If you are using fewer than 400 processes, a[i * n + j] is an attempt to read past the end of the array. This triggers undefined behavior, such as a segmentation fault.
Last, a is a large array and, since it is declared as double a[500*500], it is allocated on the stack. Read "Segmentation fault on large array sizes": the best way to go is to use malloc() for a as well, with the appropriate size (here nlocal*n).
double *a=malloc(nlocal*n*sizeof(double));
if(a==NULL){fprintf(stderr,"process %d : malloc failed\n",myrank);exit(1);}
...
free(a);
n=10000 is rather large. Consider using computed sizes such as nlocal*n for the array a, not fixed sizes such as DIM. That way, you will be able to debug your code with a smaller n, and memory will not be wasted.
The same comments apply to b and x, declared as double b[500] and double x[500], while much larger arrays are needed if n=10000. Once again, consider using malloc() with the appropriate size, not a defined value such as DIM=500!
double *b=malloc(n*sizeof(double));
if(b==NULL){fprintf(stderr,"process %d : malloc failed\n",myrank);exit(1);}
...
free(b);
double *x=malloc(nlocal*sizeof(double));
if(x==NULL){fprintf(stderr,"process %d : malloc failed\n",myrank);exit(1);}
...
free(x);
A memory checker such as Valgrind can detect such memory-management problems. Try it on your program using a single process!

How to Implement a single program in C that replicates the following Unix command(s): ps -ef | grep YOUR_USER_id | wc [duplicate]

This question already has answers here:
Connecting n commands with pipes in a shell?
(2 answers)
Learning pipes, exec, fork, and trying to chain three processes together
(1 answer)
Closed 8 years ago.
My teacher gave us a practice assignment for studying in my Operating Systems class. The assignment was to pipe three processes together and implement the commands in the title all at once. We are only allowed to use these commands when implementing it:
dup2()
one of the exec()
fork()
pipe()
close()
I can pipe two together but I don't know how to do three. Could someone either show me how to do it or at least point me in the right direction?
Here is my code so far:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main() {
int pfd[2];
int pfdb[2];
int pid;
if (pipe(pfd) == -1) {
perror("pipe failed");
exit(-1);
}
if ((pid = fork()) < 0) {
perror("fork failed");
exit(-2);
}
if (pid == 0) {
close(pfd[1]);
dup2(pfd[0], 0);
close(pfd[0]);
execlp("ps", "ps", "-ef", (char *) 0);
perror("ps failed");
exit(-3);
}
else {
close(pfd[0]);
dup2(pfd[1], 1);
close(pfd[1]);
execlp("grep", "grep", "darrowr", (char *) 0);
perror("grep failed");
exit(-4);
}
exit(0);
}
Any help would be appreciated. Heck a tutorial on how to complete it would be wondrous!

You're going to need 3 processes and 2 pipes to connect them together. You start with 1 process, so you are going to need 2 fork() calls, 2 pipe() calls, and 3 exec*() calls. You have to decide which of the processes the initial process will end up running; it is most likely either the ps or the wc. You can write the code either way, but decide before you start.
The middle process, the grep, is going to need a pipe for its input and a pipe for its output. You could create one pipe and one child process and have it run ps with its output going to a pipe; you then create another pipe and another child process and fix its pipes up before running grep; the original process would have both pipes open and would close most of the file descriptors before running wc.
The key thing with pipes is to make sure you close enough file descriptors. If you duplicate a pipe to standard input or standard output, you should almost always close both of the original file descriptors returned by the pipe() call; in your example, you should close both. And with two pipes, that means there are four descriptors to close.
Working code
Note the use of an error report and exit function; it simplifies error reporting enormously. I have a library of functions that do different error reports; this is a simple implementation of one of those functions. (It's overly simple: it doesn't include the program name in the messages.)
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
static void err_syserr(const char *fmt, ...);
int main(void)
{
int p1[2];
int p2[2];
pid_t pid1;
pid_t pid2;
if (pipe(p1) == -1)
err_syserr("failed to create first pipe");
if ((pid1 = fork()) < 0)
err_syserr("failed to fork first time");
if (pid1 == 0)
{
dup2(p1[1], STDOUT_FILENO);
close(p1[0]);
close(p1[1]);
execlp("ps", "ps", "-ef", (char *)0);
err_syserr("failed to exec 'ps'");
}
if (pipe(p2) == -1)
err_syserr("failed to create second pipe");
if ((pid2 = fork()) < 0)
err_syserr("failed to fork second time");
if (pid2 == 0)
{
dup2(p1[0], STDIN_FILENO);
close(p1[0]);
close(p1[1]);
dup2(p2[1], STDOUT_FILENO);
close(p2[0]);
close(p2[1]);
execlp("grep", "grep", "root", (char *)0);
err_syserr("failed to exec 'grep'");
}
else
{
close(p1[0]);
close(p1[1]);
dup2(p2[0], STDIN_FILENO);
close(p2[0]);
close(p2[1]);
execlp("wc", "wc", (char *)0);
err_syserr("failed to exec 'wc'");
}
/*NOTREACHED*/
}
#include <stdarg.h>
#include <errno.h>
#include <string.h>
static void err_syserr(const char *fmt, ...)
{
int errnum = errno;
va_list args;
va_start(args, fmt);
vfprintf(stderr, fmt, args);
va_end(args);
if (errnum != 0)
fprintf(stderr, " (%d: %s)", errnum, strerror(errnum));
putc('\n', stderr);
exit(EXIT_FAILURE);
}
Sample output:
234 2053 18213
My machine is rather busy running root-owned programs, it seems.

Difficulty in redirecting output in a dup2 and pipe code in Unix

I am new to Unix. In the following code, I pass three arguments from the command line, "~$ foo last sort more", in order to replicate "~$ last | sort | more". I am trying to create a program that will take three arguments (at least three for now). The parent will fork three processes. The first process will write to the pipe. The second process will read from and write to the pipe, and the third process will read from the pipe and write to stdout (the terminal). The first process will exec "last", the second will exec "sort" and the third will exec "more", and the processes will sleep for 1, 2 and 3 seconds in order to synchronize. I am pretty sure I am having trouble creating a pipe and redirecting the input and output. I don't get any output on the terminal, but I can see that the processes have been created. I would appreciate some help.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <dirent.h>
#include <unistd.h>
#include <signal.h>
#include <fcntl.h>
#include <errno.h>
#define FOUND 1
#define NOT_FOUND 0
#define FIRST_CHILD 1
#define LAST_CHILD numargc
#define PATH_1 "/usr/bin/"
#define PATH_2 "/bin/"
#define DUP_READ() \
if (dup2(fdes[READ], fileno(stdin)) == -1) \
{ \
perror("dup error"); \
exit(4); \
}
#define DUP_WRITE() \
if (dup2(fdes[WRITE], fileno(stdout)) == -1) \
{ \
perror("dup error"); \
exit(4); \
}
#define CLOSE_FDES_READ() \
close(fdes[READ]);
#define CLOSE_FDES_WRITE() \
close(fdes[WRITE]);
#define EXEC(x, y) \
if (execl(arraycmds[x], argv[y], (char*)NULL) == -1) \
{ \
perror("EXEC ERROR"); \
exit(5); \
}
#define PRINT \
printf("FD IN:%d\n", fileno(stdin)); \
printf("FD OUT:%d\n", fileno(stdout));
enum
{
READ, /* 0 */
WRITE,
MAX
};
int cmdfinder( char* cmd, char* path); /* 1 -> found, 0 -> not found */
int main (int argc, char* argv[])
{
int numargc=argc-1;
char arraycmds[numargc][150];
int i=1, m=0, sleeptimes=5, numfork;
int rc=NOT_FOUND;
pid_t pid;
int fdes[2];
if(pipe(fdes) == -1)
{
perror("PIPE ERROR");
exit(4);
}
while(i <= numargc)
{
memset(arraycmds[m], 0, 150);
rc=cmdfinder(argv[i], arraycmds[m]);
if (rc)
{
printf("Command found:%s\n", arraycmds[m]);
}
i++;
m++;
}
i=0; //array index
numfork=1; //fork number
while(numfork <= numargc)
{
if ((pid=fork()) == -1)
{
perror("FORK ERROR");
exit(3);
}
else if (pid == 0)
{
/* Child */
sleep(sleeptimes);
if (numfork == FIRST_CHILD)
{
DUP_WRITE();
EXEC(i, numfork);
}
else if (numfork == LAST_CHILD)
{
DUP_READ();
CLOSE_FDES_WRITE();
EXEC(i, numfork);
}
else
{
DUP_READ();
DUP_WRITE();
CLOSE_FDES_READ();
CLOSE_FDES_WRITE();
EXEC(i, numfork);
}
}
else
{
/* Parent */
printf("pid:%d\n", pid);
i++;
numfork++;
sleeptimes++;
}
}
PRINT;
printf("i:%d\n", i);
printf("numfork:%d\n", numfork);
printf("DONE\n");
return 0;
}
int cmdfinder(char* cmd, char* path)
{
DIR* dir;
struct dirent *direntry;
char *pathdir;
int searchtimes=2;
while (searchtimes)
{
pathdir = (char*)malloc(250);
memset(pathdir, 0, 250);
if (searchtimes==2)
{
pathdir=PATH_1;
}
else
{
pathdir=PATH_2;
}
if ((dir = opendir(pathdir)) == NULL)
{
perror("Directory not found");
exit (1);
}
else
{
while (direntry = readdir(dir))
{
if (strncmp( direntry->d_name, cmd, strlen(cmd)) == 0)
{
strcat(path, pathdir);
strcat(path, cmd);
//searchtimes--;
return FOUND;
}
}
}
closedir(dir);
searchtimes--;
}
printf("%s: Not Found\n", cmd);
return NOT_FOUND;
}

All your macros are making this harder to read than if you just wrote it straight. Especially when they refer to local variables. To find out what's going on with EXEC my eyes have to jump up from where it's used to where it's defined, find out which local arrays it uses, then jump back down to see how that access fits in the flow of main. It's a maze of macros.
And wow, cmdfinder? Your very own $PATH lookup, only it's hardcoded to /usr/bin:/bin? And double wow, readdir, just to find out whether a file exists whose name is already decided? Just stat it! Or don't do anything: just exec it and handle ENOENT by trying the next one. Or use execlp; that's what it's there for!
On to the main point... you don't have enough pipes, and you're not closing all the unused descriptors.
last | sort | more is a pipeline of 3 commands connected by 2 pipes. You can't do it with one pipe. The first command should write into the first pipe, the middle command should read the first pipe and write to the second pipe, and the last command should read the second pipe.
You could create both pipes first, then do all the forks, which makes things simple to follow, but requires a lot of closes in every child process since they'll all inherit all the pipe fds. Or you can use a more sophisticated loop, creating each pipe just before forking the first process that will use it, and closing each descriptor in the parent as soon as the relevant child process has been created. I'd hate to see how many macros you'd use for that.
Every successful dup should be followed by a close of the descriptor that was copied. dup is short for "duplicate", not "move". After it's done, you have an extra descriptor left over, so don't just dup2(fdes[1], fileno(stdout)); also close(fdes[1]) afterward. (To be perfectly robust, you should check whether fdes[1]==fileno(stdout) already, and in that case skip both the dup2 and the close.)
FOLLOWUP QUESTIONS
You can't use one pipe for 3 processes because there would be no way to distinguish which data should go to which destination. When the first process writes to the pipe, while both of the other processes are trying to read from it, one of them will get the data but you won't be able to predict which one. You need the middle process to read what the first process writes, and the last process to read what the middle process writes.
You're halfway right about file descriptors being shared after a fork. The actual pipe object is shared; that's what makes the whole system work. But the file descriptors (the endpoints designated by small integers like 1 for standard output, 0 for standard input, and so on) are not coupled the way you suggest. The same pipe object may be associated with the same file descriptor number in two processes, but the associations are independent. Closing fd 1 in one process does not cause fd 1 to become closed in any other process, even if they are related.
Sharing of the fd table, so that a close in one task has an effect in another task, is part of the "pthread" feature set, not the "fork" feature set.
