How to start distinct MPI jobs from a single allocation

How to start distinct MPI jobs from a single allocation - mpi

Let's say I have started an MPI job with 256 cores on 16 nodes.
I have an MPI program, but it is not unfortunately parallel over one parameter. Fortunately, I can easily create my own MPI program, which could handle parallelization of that parameter, only if I obtain the output files.
So, how can I start an MPI job (from within an MPI job), which uses a particular subset of these cores, namely only a particular node? So basically I want to run 16 distinct MPI calculations all with 16 cores, from within a single 256 core MPI job. These calculations take about 10 minutes with 16 cores, and there are about 200 iterations in the outer loop. With 256 cores, that is a reasonable 32 hours. It is not plausible to either resubmit 200 times, or run these 16 calculations sequentially.
To be even more precise, here is some python-pseudo-code for what I want to do:
from ase.parallel import world, rank
from os import system, chdir
while 1:
node = rank // 16
subrank = rank % 16
chdir(mydir+"Calculation_%d" % node)
# This will not work, one needs to specify somehow that only ranks from node*16 to node*16+15 will be used
os.system("mpirun -n 16 nwchem input.nw > nwchem.out")
analyse_output(mydir+"Calculation_%d/nwchem.out" % node)
rewrite_input_files()
Basicly, with 16 cores and 4 jobs:
rank 0: start nwchem process in /calculation0/ as rank 0/4.
rank 1: start nwchem in /calculation0/ as rank 1/4.
rank 2: start nwchem in /calculation0/ as rank 2/4.
rank 3: start nwchem in /calculation0/ as rank 3/4.
rank 4: start nwchem in /calculation1/ as rank 0/4.
rank 5: start nwchem in /calculation1/ as rank 1/4.
rank 6: start nwchem in /calculation1/ as rank 2/4.
rank 7: start nwchem in /calculation1/ as rank 3/4.
rank 8: start nwchem in /calculation2/ as rank 0/4.
rank 9: start nwchem in /calculation2/ as rank 1/4.
rank 10: start nwchem in /calculation2/ as rank 2/4.
rank 11: start nwchem in /calculation2/ as rank 3/4.
rank 12: start nwchem in /calculation3/ as rank 0/4.
rank 13: start nwchem in /calculation3/ as rank 1/4.
rank 14: start nwchem in /calculation3/ as rank 2/4.
rank 15: start nwchem in /calculation3/ as rank 3/4.
Gather all the results.
Optimize all geometries (this requires knowledge of forces between the calculations).
Repeat until convergence (about 200 times).
Background: In case you are interested, I will elaborate details here. But the main question still is "How to instantiate N MPI calculations from a single MPI calculation of M cores, which each have their M / N cores.
NWChem does not have image-parallel nudged elastic band calculator. Here is an example of this process, with a different code: GPAW.
https://wiki.fysik.dtu.dk/gpaw/tutorials/neb/neb.html
Here it is smooth, because it is so easy to create a sub-communicator with GPAW and it's MPI interface. However, I only have the nwchem runtime MPI, and I wish to do the same thing: create many calculators (a band or a chain of geometries, which are all linked with 'springs', and optimize that chain.)

I suppose, you are trying to use Dynamic Process Management in MPI. We can spawn new processes for the smaller jobs and then after the calculations we can connect the spawned existing process.
Detailed Explanation on DPM
Example program using DPM

Related

Hash collision probability of decimal value when translated from a large binary number

I can solve basic problems of chance of dependent and independent events. The following portion of question is an excerpt of my algorithm which i am developing for an autonomous contention based queuing system to mitigate collisions in a communication system. I want to calculate the following in terms of probability. This will help me to see how the algorithm performs for varying number of 'n' in the following question.
A group of n people generate a 64-bit number e.g., (0 1 1 0 1 0 1 0 1 0 0 1 1 . . . ) independently of each other. The 64-bit number is randomly generated by every individual and it is assumed to have an avalanche effect. It means that the binary values of two persons are significantly different. Now the decimal equivalent of the binary 64-bit value is translated by every person to a number x in the range [1, 50] using the formula
x=[(old_value - old_min)/(old_max - old_min)]*(new_max - new_min) + new_min
Then what is the probability that the same x is calculated by least two people.

How to calculate network system downtime

Here are two systems, A and B. How to calculate the downtime of each.
For A, should it be: 0.01 * 10 * 6 * 12 = 7.2 hours/year?
A system has 10 physical nodes, if any of those nodes failed, the whole system go down. The probability of failure for a individual node is 1% per month, and the downtime is 6h for fixing. Then what is the downtime for the whole system per year.
B system has 10 physical nodes, if 9 out of 10 nodes is running the whole system can function as normal. The probability of failure for a individual node is 1% per month, and the downtime is 6h for fixing. Then what is the downtime for the whole system per year.

We are talking about expected downtimes here, so we'll have to take a probabalistic approach.
We can take a Poisson approach to this problem. The expected failure rate is 1% per month for a single node, or 120% (1.2) for 10 nodes in 12 months. So you are correct that 1.2 failures/year * 6 hours/failure = 7.2 hours/year for the expected value of A.
You can figure out how likely a given amount of downtime is by using 7.2 as the lambda value for the poisson distribution.
Using R: ppois(6, lambda=7.2) = 0.42, meaning there is a 42% chance that you will have less than 6 hours of downtime in a year.
For B, it's also a Poisson, but what's important is the probability that a second node will fail in the six hours after the first failure.
The failure rate (assuming a 30 day month, with 120 6 hour periods) is 0.0083% per 6 hour period per node.
So we look at the chances of two failures within six hours, times the number of six hour periods in a year.
Using R: dpois(2.0, lambda=(0.01/120)) * 365 * 4 = 0.000005069
0.000005069 * 3 expected hours/failure = 54.75 milliseconds expected downtime per year. (3 expected hours per failure because the second failure should occur on average half way through the first failure.)

1% failure rate / month / node has a probability of 0,00138889% to fail at any given hour. I used binomial distribution in Excel to model the probability of N node failures when there are 8760 h/y * 10 nodes = 87600 "trials". I got these results:
0 failure: 29.62134067 %
1 failure: 36.03979837 %
2 failure: 21.92426490 %
3 failure: 8.89142792 %
4 failure: 2.70442094 %
5 failure: 0.65805485 %
6 failure: 0.13343314 %
...and so forth
N failures would cause 6N hours of downtime (asusming they are independent). Then for each 6N hours of single-node downtime the probability of having none of other 9 nodes to fail is (100% - 0,00138889%) ^ (9 * 6N).
Thus expected two-node downtime is P(1 node down) * (1 - P(no other node down)) * 6 hours / 2 (divided by two because on average 2nd failure occurs in mid-point of other node being repaired). When summed over all N numbers of failures I got expected two-node downtime of 9.8 seconds / year, now idea how correct estimate this is but should give a rough idea. Quite brute-force solution :/

Cost Optimization across Different Suppliers for a Product

I've this following optimization problem. A company produces a product, say Big A. To produce this product, it requires 5 processes. (Please find the detail table below). For each process, there are number of supplier that supply raw material for that particular process. E.g. For process 1, there are 3 supplier 1,2 & 3.
The constrain for the CEO of this company,say C, is that for each process the CEO has to purchase supplies from Supplier 1 first, then for additional supplies from 2nd Supplier and so on.
The optimization problem is C wants 700 units for total material to produce for 1 unit of Big A then how will he do it at minimum cost. How the optimization will change if the amount of units require increases to 1500 units.
I'll be grateful if I get the solution of this answer. But if somebody can suggest me some reference regarding this problem it will be a great help too. I'm mainly using R software here.
Process Supplier Cost Units Cumm_Cost Cumm_Unit
1 1 10 100 10 100
1 2 20 110 30 210
1 3 10 200 40 410
2 1 20 100 20 100
2 2 30 150 50 250
2 3 10 150 60 400
3 1 40 130 40 130
3 2 30 140 70 270
3 3 50 120 120 390
4 1 20 120 20 120
4 2 40 120 60 240
4 3 20 180 80 420
5 1 30 180 30 180
5 2 10 160 40 320
5 3 30 140 70 460
Regards,

I will start by solving the specific problem that you have posted and then will demonsrate how to formulate the problem more abstractively. For simplicity, I will use Excel's Solver add-in to solve the problem, but any configuration of a modeling language (such as AIMMS, AMPL, LINGO, OPL, MOSEL and numerous others) with a solver (CPLEX, GUROBI, GLPK, CBC and numerous others) can be used. If you would like to use R, there exists an lpSolve package that calls the lpSolve solver (which is not the best one in the word to be honest, but it is free of charge).
Note that for "real" (large scale) integer problems, the commercial solvers CPLEX, GUROBI and XPRESS perform a lot better than others. The first completely free solver that performs decently in most tests (including Hans Mittelman's page) is CBC. CBC can be hooked up in excel and solve the built-in solver model without restrictions in the number of constraints or variables, by using this add-in. Therefore, assuming that most CPU is going to be spent by the optimization algorithm, using CBC/OpenSolver seems like an efficient choice.
SPREADSHEET SETUP
I follow some conventions for convenience:
Decision variable cells are marked Green.
Constraints are marked red.
Data are marked grey.
Objective function is marked blue.
First, lets augment the table you presented as follows:
The added columns explained briefly:
Selected?: equals 1 if the (Process, Supplier) combo is allowed to produced a positive quantity, zero otherwise.
Quantity: the quantity produced, defined for each (Process, Supplier) combo.
Max Quantity?: Equals 1 if the Suppliers produces the maximum amount of units for that particular Process.
Quantity UB: equals Units * Selected?. This makes the upper bound either equal to Units, when the Supplier is allowed to produce this Process, or zero otherwise.
Quantity LB: equals Units * Max Quantity?. This is to ensure that whenever the Max Quantity? column is 1, the produced quantity will be equal to Units.
Selection: For the 1st supplier, it equals 0. For the 2nd and 3rd suppliers, it equals the Max Quantity? of the previous supplier (row) minus the Selected? of the current supplier (row).
A screenshot with formulas:
There exist two more constraints:
There must be at least one item produced from each process and
The total number of items should be 700 (or later 1,500).
Here is their setup:
and here are the formulas:
In brief, we use SUMIF to sum the quantities that are specific to each supplier, which we are going to constrain to be more than 1 item for each process.
To finish the spreadsheet setup, we need to calculate the objective function, namely the cost of the allocation. This is easily done by taking the SUMPRODUCT of columns Quantity and Cost. Note that the cumulative quantities are derived data and not very useful in the current context.
After the above steps, the spreadsheet looks like below:
SOLVER MODEL
For the solver model we need to declare
The Objective
The Decisions
The Constraints
The Solver (and tweak some parameters if necessary).
For ease of exposition, I have given each range the name of its header. The solver model looks as follows:
It should all be explanatory, except possibly the Selected >= 0 part. The column selected equals the difference between the binary max Quantity of the previous supplier minus the Selected of the current supplier. Selected >= 0 => max Quantity of previous supplier >= Selected of current supplier. Therefore, if the previous supplier does not produce at max quantity (binary = 0), the current supplier cannot produce.
Then we need to make sure that the solver setting are OK:
and solve the problem.
Solution for req = 700 :
As we see, the model tries to avoid procedures 3 and 5 as much as possible, and satisfies the constraint "at least 1 item per process" by picking up exactly 1 item for processes 3 and 5. The objective function value is 11,710.
Solution for req = 1,500 :
Here we need more capacity, but yet process 3 seems expensive and the model tries to avoid it by allocating whatever is necessary (just 1 unit to supplier 1).
I hope this helps. The spreadsheet can be downloaded here. I include the definition of the mathematical model below, in case you would like to transfer it to another language.
MATHEMATICAL FORMULATION
A formal definition of your problem is as follows.
SETS:
PARAMETERS:
Decisions:
Objective:
Constraints:
Constraint explanation:
C1: A supplier cannot produce anything from a process if he has not been allocated to that process.
C2: If a supplier's maximum indicator is set to 1, then the production variable should be the maximum possible.
C3: We cannot select supplier s for process p if we have not produced the max quantity available from the previous supplier s_[-1].
C4: We need to produce at least 1 item from each process.
C5: the total production from all processes and suppliers should equal the required amount.

Looks like you should look at the simplex algorithm (or some existing implementation of it).
Wikipedia has a fairly nice description of the algorithm, http://en.wikipedia.org/wiki/Simplex_algorithm

Is it possible to reduce computing time by a factor higher than your number of cores using R package multicore?

I had a R routine that spent most of its time on a lapply call of the form:
lapply(X, FUN, ...)
where X is a list with 400 elements. The total time of execution was 11.88 sec.
Then I decided to use the multicore package and made the following change on my routine
mclapply(X, FUN, ...)
After that I was surprised to see that the computing time dropped to 0.66 sec. That is, only 5% of the original time. This was surprising to me since I was expecting something around 25% of the original time since the processor on my laptop is
Intel® Core™ i5 CPU M 560 # 2.67GHz × 4
Can someone explain me where this extra reduced time comes from? Is it that each core can itself parallelize computations?

Reusable Barrier solution has a deadlock?

I have been reading "The Little Book of Semaphores" and in page 41 there is a solution for the Reusable Barrier problem. The problem I have is why it won't generate a deadlock situation.
1 # rendezvous
2
3 mutex.wait()
4 count += 1
5 if count == n:
6 turnstile2.wait() # lock the second
7 turnstile.signal() # unlock the first
8 mutex.signal()
9
10 turnstile.wait() # first turnstile
11 turnstile.signal()
12
13 # critical point
14
15 mutex.wait()
16 count -= 1
17 if count == 0:
18 turnstile.wait() # lock the first
19 turnstile2.signal() # unlock the second
20 mutex.signal()
21
22 turnstile2.wait() # second turnstile
23 turnstile2.signal()
In this solution, between lines 15 and 20, isn't it a bad habit to call wait() on a semaphore (in line 18) while holding a mutex which causes a deadlock? Please explain. Thank you.

mutex protects the count variable. The first mutex lock is concerned with incrementing the counter to account for each thread, and the last thread to enter (if count == n) locks the second tunstile in preparation of leaving (see below) and releases the waiting (n-1) threads (that are waiting on line 10). Then each signals to the next.
The second mutex lock works similarly to the first, but decrements count (same mutext protects it). The last thread to enter the mutex block locks turnstile to prepare for the next batch entring (see above) and releases the (n-1) thread waiting on line 22. Then each thread signals to the next.
Thus turnstile coordinates the entries to the critical point, while turnstile2 coordinates the exit from it.
There could be no deadlock: by the time the (last) thread gets to line 18, turnstile is guarantted to be not held by any other thread (they are all waiting on line 22). Similarly with turnstile2

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex