How to add more robotframework-pabot processes? - robotframework

I'm new to using pabot and I'm wondering how to run more test cases in parallel; currently I can only execute 4 cases at a time.
Does it have anything to do with the machine's specifications?
I have tried increasing the --processes value, but once it goes beyond 4, no additional processes are started.

It seems you have a 4-core processor. If I understand the official readme.md correctly, the number of parallel executors defaults to your CPU count (with a minimum of 2):
--processes [NUMBER OF PROCESSES]
How many parallel executors to use (default max of 2 and cpu count)
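For example, on a machine with more cores you could request the number of executors explicitly (the test-directory name below is just a placeholder); whether more than four executors actually helps on a 4-core box depends on the machine and on how many suites there are to split across them:
pabot --processes 8 tests/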

Related

number of workers and processes in Julia

In the distributed scheme, Julia distinguishes between workers and processes; as far as I understand, the number of processes is the number of workers plus one. If my machine has only 10 cores, should I use 10 workers (julia -p 10 file.jl) or reserve one for the first process? Is the first process lightweight in general? For computations, should I use the number of workers or the number of processes? Thanks.
[Adding to what @philipsgabler said.]
Assuming you will be using a @distributed for loop or pmap, process 1 will be responsible for controlling the computation and hence will mostly just be waiting for I/O from the workers. In some scenarios process 1 will also be responsible for aggregating the data delivered by the workers. Hence you can reasonably assume that process 1 is lightweight.
On the other hand, if you use @spawnat then you have strict control over what is going on in the Julia cluster. I would still, however, recommend using process 1 as the process controlling the cluster rather than putting a big workload on it.
Quoting the docs:
Each process has an associated identifier. The process providing the interactive Julia prompt always has an id equal to 1. The processes used by default for parallel operations are referred to as "workers". When there is only one process, process 1 is considered a worker. Otherwise, workers are considered to be all processes other than process 1. As a result, adding 2 or more processes is required to gain benefits from parallel processing methods like pmap. Adding a single process is beneficial if you just wish to do other things in the main process while a long computation is running on the worker.
The command line option -p auto "launches as many workers as the number of local CPU threads (logical cores)". These workers are in addition to process 1; on my machine, workers() with -p auto returns the eight ids 2 through 9.
All processes are in principle equal, but definitions will only be visible on process 1 unless explicitly made available elsewhere, e.g., with @everywhere (so, if anything, the workers are more lightweight).
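As a small illustration of that bookkeeping (a sketch using the standard Distributed library; f is just a toy function):
using Distributed
addprocs(4)              # start 4 worker processes alongside process 1
nprocs()                 # 5: process 1 plus the 4 workers
nworkers()               # 4: once workers exist, process 1 is not counted as one
@everywhere f(x) = x^2   # make the definition visible on every process
pmap(f, 1:8)             # the work runs on the workers; process 1 coordinates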

Slowdown at increased number of processes for HPC with fat-tree architecture

I've noticed something particularly odd about a simple program I've been running on an HPC system with a fat-tree architecture, and I'm not exactly sure why I'm getting the results I'm getting.
The program simply measures and prints its own runtime for a varying number of MPI processes. I varied the number of processes in powers of two from 2 to 256; while the execution time per process tends to decrease as the number of processes grows from 2 to 8, it jumps dramatically at 64 processes.
Could this be because of the architecture itself? I'd imagine that the execution time would decrease with the number of processes, but this doesn't seem to be the case past a certain threshold.
I figured out the issue a while ago after reading the documentation (go figure) and wanted to post the solution here in case anyone had a similar issue. On the HPC I was using (AFRL's Mustang), I was executing my programs with mpirun on the login node. The documentation clearly states that jobs need to be submitted via a batch script per section 6 of the user guide:
https://www.afrl.hpc.mil/docs/mustangQuickStartGuide.html#jobSubmit
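For anyone else who hits this, a rough sketch of what such a batch submission might look like, assuming a PBS-style scheduler; the project ID, queue name, and node/core counts below are placeholders, so consult the linked guide for the real values:
#!/bin/bash
#PBS -A PROJECT_ID                      # account/project ID (placeholder)
#PBS -l select=2:ncpus=32:mpiprocs=32   # node and core counts are illustrative
#PBS -l walltime=00:30:00
#PBS -q debug                           # queue name (placeholder)
cd $PBS_O_WORKDIR
mpirun -np 64 ./timing_program          # runs on compute nodes, not the login node
Submit the script with qsub instead of running mpirun directly on the login node.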

MPI Spawn: Not enough slots available / All nodes which are allocated for this job are already filled

I am trying to use MPI's Spawn functionality to run subprocesses that also use MPI. I am using MPI-2's dynamic process management.
I have a master process (maybe I should say "master program") that runs in python (via mpi4py) that uses MPI to communicate between cores. This master process/program runs on 16 cores, and it will also make MPI_Comm_spawn_multiple calls to C and Fortran programs (which also use MPI). While the C and Fortran processes run, the master python program waits until they are finished.
A little more explicitly, the master python program does two primary things:
Uses MPI to do preprocessing for the spawning in step (2). MPI_Barrier is called after this preprocessing to ensure that all ranks have finished their preprocessing before step (2) begins. Note that the preprocessing is distributed across all 16 cores, and at the end of the preprocessing the resulting information is passed back to the root rank (e.g. rank == 0).
After the preprocessing, the root rank spawns 4 workers, each of which uses 4 cores (i.e. all 16 cores are needed to run all 4 spawned processes at the same time). This is done via MPI_Comm_spawn_multiple, and these workers use MPI to communicate within their 4 cores. In the master python program, only rank == 0 spawns the C and Fortran subprocesses, and an MPI_Barrier is called after the spawn on all ranks so that all the rank != 0 cores wait until the spawned processes finish before they continue execution.
Repeat (1) and (2) many many times in a for loop.
The issue I am having is that if I use mpiexec -np 16 to start the master python program, all the cores are being taken up by the master program and I get the error:
All nodes which are allocated for this job are already filled.
when the program hits the MPI_Comm_spawn_multiple line.
If I use any other value less than 16 for -np, then only some of the cores are allocated and some are available (but I still need all 16), so I get a similar error:
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
/home/username/anaconda/envs/myenvironment/bin/python
Either request fewer slots for your application, or make more slots available
for use.
So it seems like even though I am going to run MPI_Barrier in step (2) to block until the spawned processes finish, MPI still thinks those cores are being used and won't allocate another process on top of them. Is there a way to fix this?
(If the answer is hostfiles, could you please explain them for me? I am not understanding the full idea and how they might be useful here.)
This is the poster of this question. I found out that I can use -oversubscribe as an argument to mpiexec to avoid these errors, but as Zulan mentioned in his comments, this could be a poor decision.
In addition, I don't know if the cores are being subscribed like I want them to be. For example, maybe all 4 C/Fortran processes are being run on the same 4 cores. I don't know how to tell.
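One way to check the actual placement, assuming an Open MPI mpiexec (the python command below just stands in for the real invocation), is to ask for a binding report:
mpiexec --report-bindings --oversubscribe -np 16 python master.py
Each rank's core binding is printed at launch, which at least shows whether ranks are being stacked onto the same cores.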
Most MPIs have a parameter -usize 123 for the mpiexec program that indicates the size of the "universe", which can be larger than the world communicator. In that case you can spawn extra processes up to the size of the universe. You can query the size of the universe:
int universe_size, *universe_size_attr, uflag;
/* MPI_UNIVERSE_SIZE is an attribute of MPI_COMM_WORLD; uflag reports whether it is set */
MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                  &universe_size_attr, &uflag);
if (uflag) universe_size = *universe_size_attr;
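On the hostfile question from the original post: with Open MPI, a hostfile simply lists the nodes and how many slots (processes) each one offers, and max_slots caps how far oversubscription may go. A hypothetical example (hostnames and counts are made up):
node01 slots=16 max_slots=32
node02 slots=16 max_slots=32
mpiexec -np 16 --hostfile myhosts python master.py
Declaring more slots than the master program itself consumes is one way to leave room for the spawned C/Fortran processes.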

How do I select the no. of processors/cores to run my MPI program on?

I am using mpich2 version 1.2.1p1, which has MPD as its default process manager.
When we run mpiexec, we can specify the number of processes we want to spawn, but I also want to specify/select the number of processors/cores to use. How do I do that?
Also, when we simply spawn n processes, how do we know how many processors/cores are being used?
Please help.
Any sensible operating system will use as many cores as possible on each machine, so you should not have to worry about that. When spawning 4 MPI processes on a quad-core machine, it is safe to assume that all 4 cores will be used; if not, there is something seriously wrong with the configuration. Anyway, if you really want to be sure, check the CPU usage with, for example, 'top'.
The number of processes is effectively the number of cores used, since MPI will place a process on each core. If you want to make sure you are always using the maximum number of cores on your machine, query the OS for the number of cores and pass that to the mpiexec call.
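For example, on Linux you could let the shell query the core count and pass it straight to the launcher (./my_program is a placeholder for the actual executable):
mpiexec -n $(nproc) ./my_program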

mpirun actual number of processors used

I am starting programming on an OpenMPI managed cluster.
I use the following command to run my executable:
mpirun -np 32 file
Now what I understand is that 32 specifies the number of processes that should be created. They may be created on the same processor. Am I right?
I am noticing that execution time increases as the number of processes increases. Could the above be a reason for this?
How do I find out the execution and scheduling policy of the cluster?
Is it correct to assume that, typically, the cluster I am working on will have many processes running on each node, just as they do on my PC?
I would expect your job management system (which one is it?) to allocate 1 MPI process per core. But that is a configuration matter, and your cluster may not be configured as I expect. Can you see what processes are running on the various cores of your cluster at run time? (See the example command at the end of this answer.)
There are many explanations for execution time increasing with the number of processes, several of them good ones that apply even with one process per core; but multiple processes per core is also a potential explanation.
You find out about the policies of your cluster by asking the cluster administrator.
No, I think it is atypical for cluster processors (or cores) to execute multiple MPI processes simultaneously.
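To make the first point concrete: with Open MPI (which the poster mentions) you can ask the launcher to report where each rank is bound, for example:
mpirun --report-bindings -np 32 file
Here file stands for the executable from the question; the binding report shows whether several ranks ended up on the same core.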
