MPI communicator for sub-range of MPI_COMM_WORLD

What is a simple way to create a (sub)communicator containing consecutive ranks [rStart, ..., last rank of MPI_COMM_WORLD] of MPI_COMM_WORLD?
rStart is >= 0; i.e., the first rStart ranks need to be excluded.

The simplest code is to have
MPI_Comm_split(MPI_COMM_WORLD, rank < rStart, rank, &new_comm);
run on all ranks of MPI_COMM_WORLD. It will create two communicators: all ranks starting at rStart will get the one you want, and the others can simply MPI_Comm_free theirs.
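For reference, a minimal sketch of that pattern as a complete program (rStart = 2 is just an example value):
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, rStart = 2;                    /* example value; usually known on all ranks */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ranks below rStart get color 1, the rest get color 0; reusing the old
       rank as the key keeps the relative order within each new communicator */
    MPI_Comm new_comm;
    MPI_Comm_split(MPI_COMM_WORLD, rank < rStart, rank, &new_comm);

    if (rank < rStart) {
        MPI_Comm_free(&new_comm);            /* the excluded ranks discard theirs */
    } else {
        /* ... use new_comm here: its rank 0 is world rank rStart ... */
    }

    MPI_Finalize();
    return 0;
}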
If you cannot easily have the excluded ranks run the same code, you can use MPI_Comm_create_group instead, but then you also have to create the group first.
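A sketch of that route, assuming an MPI-3 implementation; the helper name make_tail_comm and the tag value 0 are arbitrary choices for illustration:
#include <mpi.h>

/* Build a communicator containing ranks rStart .. size-1 of MPI_COMM_WORLD.
   Only the included ranks have to call this (MPI must already be initialized). */
static void make_tail_comm(int rStart, MPI_Comm *new_comm)
{
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Group world_group, sub_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    /* one (first, last, stride) triplet selecting ranks rStart .. size-1 */
    int range[1][3] = { { rStart, size - 1, 1 } };
    MPI_Group_range_incl(world_group, 1, range, &sub_group);

    /* the tag (0 here) only has to match across the participating ranks */
    MPI_Comm_create_group(MPI_COMM_WORLD, sub_group, 0, new_comm);

    MPI_Group_free(&sub_group);
    MPI_Group_free(&world_group);
}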


for and if loop operations

Hi! I have a question and I hope someone can help me, please. I have a data frame in R, and the code runs a double for loop with an if: the data frame has some values, and if the condition is TRUE it performs some operations. The problem is that I understand neither the loop nor the operation the code performs under the condition.
I have reproduced my code below in a simpler form, but the idea is the same. Could someone please explain the whole operation to me?
w<-c(2,5,4,3,5,6,8,2,4,6,8)
x<-c(2,5,6,7,1,1,4,9,8,8,2)
y<-c(2,5,6,3,2,4,5,6,7,3,5)
z<-c(2,5,4,5,6,3,2,5,6,4,6)
letras<-data.frame(w,x,y,z)
l=1
o=1
v=nrow(letras)
letras$op1<-c(1)
letras$op2<-c(0)
for (l in 1:v) {
  for (o in 1:v) {
    if(letras$x[o]==letras$y[l] & letras$z[l]==letras$z[o] & letras$w[l]){
      letras$op1<-letras$op1+1
      letras$op2<-letras$x*letras$y
    }
  }
}
The result is the following:
Thanks!
This segment of code is storing values into vectors labeled w,x,y,z.
w<-c(2,5,4,3,5,6,8,2,4,6,8)
x<-c(2,5,6,7,1,1,4,9,8,8,2)
y<-c(2,5,6,3,2,4,5,6,7,3,5)
z<-c(2,5,4,5,6,3,2,5,6,4,6)
It then transforms the 4 vectors into a data frame
letras<-data.frame(w,x,y,z)
This bit of code isn't doing anything as far as I can tell; l and o are immediately overwritten by the for loops below.
l=1 #???
o=1 #???
This counts how many rows are in the letras data frame and stores the result in v; in this case, 11 rows.
v=nrow(letras)
This creates two new columns in the letras data frame, with all ones in op1 and all zeros in op2 (the single values are recycled across all 11 rows).
letras$op1<-c(1)
letras$op2<-c(0)
Here each for loop acts as a counter and runs the code beneath it iteratively from 1 to v (11), so 11 iterations. Each iteration, the value of l increases by 1: on the first iteration l = 1, on the second l = 2, and so on.
for (l in 1:v) {
You then have a second counter running inside the first one. It iterates over 1 to 11 exactly the same way as above, but it must complete its full 1-to-11 cycle before the top-level counter can move on to the next number. So o effectively cycles from 1 to 11 for each single count of l, and with the two together the inner for loop counts from 1 to 11, 11 times.
for (o in 1:v) {
You then have a logical statement that runs the code beneath it if the column x value and the column y value are the same. Remember they are indexed differently, so it could be, say, the 1st x value versus the 2nd y value. There is an AND, so the two z values (at positions l and o) also need to be equal. The last part, letras$w[l], is treated as TRUE whenever it is non-zero, which is always the case in this particular example, so it could probably be removed.
if(letras$x[o]==letras$y[l] & letras$z[l]==letras$z[o] & letras$w[l]){
Lastly, this is the part that runs if the statement above is true.
op1 gets 1 added to every element (remember it started at 1 anyway), and op2 is set to the element-wise product of the x and y columns. This multiplication is a little inefficient, because x and y never change, so the same result is recalculated every time the if statement evaluates to TRUE.
letras$op1<-letras$op1+1
letras$op2<-letras$x*letras$y
}
}
}
Hope this helps.

Optimizing (minimizing) the number of lines in a file: an optimization problem involving permutations and agenda scheduling

I have a calendar, typically a csv file containing a number of lines. Each line corresponds to an individual and is a sequence of consecutive values '0' and '1', where '0' refers to an empty time slot and '1' to an occupied slot. There cannot be two separate runs of '1' in a line (e.g. two sequences of '1' separated by a '0', such as '1,1,1,0,1,1,1,1').
The problem is to minimize the number of lines by combining individuals while resolving collisions between time slots. Note that occupied time slots cannot overlap. For example, for 4 individuals, we have the following sequences:
id1:1,1,1,0,0,0,0,0,0,0
id2:0,0,0,0,0,0,1,1,1,1
id3:0,0,0,0,1,0,0,0,0,0
id4:1,1,1,1,0,0,0,0,0,0
One can arrange them to end up with two lines, while keeping track of which individuals were combined (for the record). In our example this yields:
1,1,1,0,1,0,1,1,1,1 (id1 + id2 + id3)
1,1,1,1,0,0,0,0,0,0 (id4)
The constraints are the following:
The number of individuals ranges from 500 to 1000,
The length of the sequence will never exceed 30
Each sequence in the file has the exact same length,
The algorithm needs to be reasonable in execution time because this task may be repeated up to 200 times.
We don't necessarily search for the optimal solution; a near-optimal solution would suffice.
We need to keep track of the combined individuals (as in the example above)
Genetic algorithms seem like a good option, but how do they scale (in terms of execution time) with the size of this problem?
A suggestion in Matlab or R would be (greatly) appreciated.
Here is a sample test:
id1:1,1,1,0,0,0,0,0,0,0
id2:0,0,0,0,0,0,1,1,1,1
id3:0,0,0,0,1,0,0,0,0,0
id4:1,1,1,1,1,0,0,0,0,0
id5:0,1,1,1,0,0,0,0,0,0
id6:0,0,0,0,0,0,0,1,1,1
id7:0,0,0,0,1,1,1,0,0,0
id8:1,1,1,1,0,0,0,0,0,0
id9:1,1,0,0,0,0,0,0,0,0
id10:0,0,0,0,0,0,1,1,0,0
id11:0,0,0,0,1,0,0,0,0,0
id12:0,1,1,1,0,0,0,0,0,0
id13:0,0,0,1,1,1,0,0,0,0
id14:0,0,0,0,0,0,0,0,0,1
id15:0,0,0,0,1,1,1,1,1,1
id16:1,1,1,1,1,1,1,1,0,0
Solution(s)
@Nuclearman provided a working O(NT) solution (where N is the number of individuals (ids) and T is the number of time slots (columns)) based on a greedy algorithm.
I study algorithms as a hobby and I agree with Caduchon on this one: greedy is the way to go. Though, to be more accurate, I believe this is actually the clique cover problem: https://en.wikipedia.org/wiki/Clique_cover
Some ideas on how to approach building cliques can be found here: https://en.wikipedia.org/wiki/Clique_problem
Clique problems are related to independence set problems.
Considering the constraints, and that I'm not familiar with matlab or R, I'd suggest this:
Step 1. Build the independence-set data per time slot. For each time slot, create a mapping (for fast lookup) of all records that have a 1 in that slot. None of those records can be merged into the same row (they all need to end up in different rows). E.g. for the first column (slot), the subset of the data looks like this:
id1 :1,1,1,0,0,0,0,0,0,0
id4 :1,1,1,1,1,0,0,0,0,0
id8 :1,1,1,1,0,0,0,0,0,0
id9 :1,1,0,0,0,0,0,0,0,0
id16:1,1,1,1,1,1,1,1,0,0
The data would be stored as something like 0: Set(id1, id4, id8, id9, id16) (zero-indexed, i.e. starting at 0 instead of 1, though that probably doesn't matter here). The idea is to have O(1) lookup, since you may need to quickly tell that, say, id2 is not in that set. You can also use nested hash tables for that, e.g. 0: { id1: true, id4: true, ... }. Sets also allow set operations, which may help quite a bit when determining unassigned columns/ids.
In any case, none of these 5 can be grouped together. That means, given that slot alone, the result must have at least 5 rows (and at best exactly 5, if all the other ids can be merged into those 5 without conflict).
Performance of this step is O(NT), where N is the number of individuals and T is the number of time slots.
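Not MATLAB or R, but for illustration, Step 1 might look roughly like this in C. The array names and size limits are assumptions for the sketch; since T <= 30 and N <= 1000, plain arrays already give the O(1) membership test described above:
#define MAX_IDS   1000
#define MAX_SLOTS 30

/* schedule[i][t] is 1 if individual i occupies time slot t (parsed from the csv);
   it already answers "is id i in slot t?" in O(1). */
static int schedule[MAX_IDS][MAX_SLOTS];
static int n_ids, n_slots;

/* members[t] lists the ids that occupy slot t; count[t] is how many there are.
   None of the ids listed for a given slot can end up in the same row. */
static int members[MAX_SLOTS][MAX_IDS];
static int count[MAX_SLOTS];

static void build_slot_index(void)      /* O(NT) */
{
    for (int t = 0; t < n_slots; ++t) {
        count[t] = 0;
        for (int i = 0; i < n_ids; ++i)
            if (schedule[i][t])
                members[t][count[t]++] = i;
    }
}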
Step 2. Now we have options of how to attack things. To start, we pick the time slot with the most individuals and use that as our starting point; that gives us the minimum possible number of rows. In this case we actually have a tie, where the 2nd and 5th columns both have 7. I'm going with the 2nd, which looks like:
id1 :1,1,1,0,0,0,0,0,0,0
id4 :1,1,1,1,1,0,0,0,0,0
id5 :0,1,1,1,0,0,0,0,0,0
id8 :1,1,1,1,0,0,0,0,0,0
id9 :1,1,0,0,0,0,0,0,0,0
id12:0,1,1,1,0,0,0,0,0,0
id16:1,1,1,1,1,1,1,1,0,0
Step 3. Now that we have our starting groups, we need to add to them while trying to avoid conflicts between new members and old group members (which would require an additional row). This is where we get into NP-complete territory, as there are a huge number of possible assignments (roughly 2^N, to be more accurate).
I think the best approach might be a randomized one, since you can in theory rerun it as many times as you have time for and keep the best result. So here is the randomized algorithm (a C sketch of the simpler first-fit variant follows the list below):
1. Given the starting column and ids (1, 4, 5, 8, 9, 12, 16 above), mark this column and those ids as assigned.
2. Randomly pick an unassigned column (time slot). If you want a perhaps "better" result, pick the one with the fewest (or most) unassigned ids. For a faster implementation, just loop over the columns in order.
3. Randomly pick an unassigned id. For a better result, perhaps pick the one with the most (or fewest) groups it could be assigned to. For a faster implementation, just pick the first unassigned id.
4. Find all groups that the id could be assigned to without creating a conflict.
5. Randomly assign it to one of them. For a faster implementation, just pick the first group that doesn't cause a conflict. If no group is conflict-free, create a new group and make the id its first member. The optimal outcome is that no new groups have to be created.
6. Update the data for that group's row (turn 0s into 1s as needed).
Repeat steps 3-6 until no unassigned ids for that column remain.
Repeat steps 2-6 until no unassigned columns remain.
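A rough C sketch of the faster (first-fit, non-random) variant of the steps above, reusing the arrays from the Step 1 sketch; for brevity it simply scans the columns left to right instead of starting from the densest column:
/* group_row[g][t] is the merged row of group g; assigned_to[i] is the group
   that id i was merged into (-1 while unassigned). */
static int group_row[MAX_IDS][MAX_SLOTS];
static int assigned_to[MAX_IDS];
static int n_groups;

static int conflicts(int g, int id)
{
    for (int t = 0; t < n_slots; ++t)
        if (group_row[g][t] && schedule[id][t])
            return 1;                              /* both occupy slot t */
    return 0;
}

static void assign_all(void)
{
    for (int i = 0; i < n_ids; ++i) assigned_to[i] = -1;
    n_groups = 0;

    for (int t = 0; t < n_slots; ++t) {            /* step 2: next column */
        for (int k = 0; k < count[t]; ++k) {       /* steps 3-6 for each unassigned id there */
            int id = members[t][k];
            if (assigned_to[id] != -1) continue;

            int g = 0;                             /* steps 4-5: first conflict-free group, */
            while (g < n_groups && conflicts(g, id)) g++;
            if (g == n_groups) n_groups++;         /* ...or open a new one */

            assigned_to[id] = g;
            for (int s = 0; s < n_slots; ++s)      /* step 6: merge the id's slots in */
                if (schedule[id][s]) group_row[g][s] = 1;
        }
    }
}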
Example with the faster implementation approach, which here happens to give an optimal result (there cannot be fewer than 7 rows, and there are exactly 7 rows in the result).
First 3 columns: no unassigned ids (all unassigned ids have a 0 in those slots). Skipped.
4th Column: assigned id13 to the id9 group (13=>9). id9 looks like this now, showing that the group that started with id9 now also includes id13:
id9 :1,1,0,1,1,1,0,0,0,0 (+id13)
5th Column: 3=>1, 7=>5, 11=>8, 15=>12
Now it looks like:
id1 :1,1,1,0,1,0,0,0,0,0 (+id3)
id4 :1,1,1,1,1,0,0,0,0,0
id5 :0,1,1,1,1,1,1,0,0,0 (+id7)
id8 :1,1,1,1,1,0,0,0,0,0 (+id11)
id9 :1,1,0,1,1,1,0,0,0,0 (+id13)
id12:0,1,1,1,1,1,1,1,1,1 (+id15)
id16:1,1,1,1,1,1,1,1,0,0
We'll just quickly go through the remaining columns and see the final result:
7th Column: 2=>1, 10=>4
8th column: 6=>5
Last column: 14=>4
So the final result is:
id1 :1,1,1,0,1,0,1,1,1,1 (+id3,id2)
id4 :1,1,1,1,1,0,1,1,0,1 (+id10,id14)
id5 :0,1,1,1,1,1,1,1,1,1 (+id7,id6)
id8 :1,1,1,1,1,0,0,0,0,0 (+id11)
id9 :1,1,0,1,1,1,0,0,0,0 (+id13)
id12:0,1,1,1,1,1,1,1,1,1 (+id15)
id16:1,1,1,1,1,1,1,1,0,0
Conveniently, even this "simple" approach allowed us to assign the remaining ids to the original 7 groups without conflict. This is unlikely to happen in practice with, as you say, 500-1000 ids and up to 30 columns, but it's far from impossible. Roughly speaking, 500 / 30 is about 17 and 1000 / 30 is about 34, so I would expect you to be able to get down to roughly 10-60 rows, with about 15-45 being likely, but it's highly dependent on the data and a bit of luck.
In theory, the performance of this method is O(NT), where N is the number of individuals (ids) and T is the number of time slots (columns). It takes O(NT) to build the data structure (basically converting the table into a graph). After that, for each column it requires checking and assigning at most O(N) individual ids, and they might be checked multiple times. In practice, since O(T) is roughly O(sqrt(N)) and performance increases as you go through the algorithm (similar to quicksort), it is likely O(N log N) or O(N sqrt(N)) on average, though it's probably more accurate to use O(E), where E is the number of 1s (edges) in the table. Each edge likely gets checked and iterated over a fixed number of times, so that is probably a better indicator.
The NP-hard part comes into play in working out which ids to assign to which groups such that no new groups (rows) are created, or the lowest possible number of new groups is created. I would run the "fast implementation" and the "random" approaches a few times and see how many extra rows (beyond the known minimum) you get; if it's only a small amount, that is likely good enough.
This problem, contrary to some comments, is not NP-complete due to the restriction that "There cannot be two separated sequences in a line". This restriction implies that each line can be considered to be representing a single interval. In this case, the problem reduces to a minimum coloring of an interval graph, which is known to be optimally solved via a greedy approach. Namely, sort the intervals in descending order according to their ending times, then process the intervals one at a time in that order always assigning each interval to the first color (i.e.: consolidated line) that it doesn't conflict with or assigning it to a new color if it conflicts with all previously assigned colors.
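Not MATLAB or R either, but here is a minimal C sketch of that greedy coloring, assuming each line has already been reduced to its first and last occupied slot; the names are invented for illustration:
#include <stdlib.h>

typedef struct { int id, start, end; } Interval;   /* first and last occupied slot of a line */

static int cmp_end_desc(const void *a, const void *b)
{
    return ((const Interval *)b)->end - ((const Interval *)a)->end;
}

/* Process intervals by decreasing end time and put each one into the first
   color (consolidated line) it does not overlap. Because of that order, an
   interval conflicts with a color iff its end >= the smallest start already
   placed on that color. Returns the number of colors (lines) used. */
static int color_intervals(Interval iv[], int n, int color_of[])
{
    int n_colors = 0;
    int *min_start = malloc(n * sizeof *min_start);

    qsort(iv, n, sizeof *iv, cmp_end_desc);
    for (int i = 0; i < n; ++i) {
        int c = 0;
        while (c < n_colors && iv[i].end >= min_start[c]) c++;
        if (c == n_colors) n_colors++;     /* no existing line fits: open a new one */
        min_start[c] = iv[i].start;        /* this interval now has the smallest start there */
        color_of[iv[i].id] = c;            /* records which consolidated line the id went to */
    }
    free(min_start);
    return n_colors;
}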
Consider a constraint programming approach. Here is a question very similar to yours: Constraint Programming: Scheduling with multiple workers.
A very simple MiniZinc model could also look like this (sorry, no MATLAB or R):
include "globals.mzn";
%int: jobs = 4;
int: jobs = 16;
set of int: JOB = 1..jobs;
%array[JOB] of var int: start = [0, 6, 4, 0];
%array[JOB] of var int: duration = [3, 4, 1, 4];
array[JOB] of var int: start = [0, 6, 4, 0, 1, 8, 4, 0, 0, 6, 4, 1, 3, 9, 4, 1];
array[JOB] of var int: duration = [3, 4, 1, 5, 3, 2, 3, 4, 2, 2, 1, 3, 3, 1, 6, 8];
var int: machines;
constraint cumulative(start, duration, [1 | j in JOB], machines);
solve minimize machines;
This model does not, however, tell which jobs are scheduled on which machines.
Edit:
Another option would be to transform the problem into a graph coloring problem. Let each line be a vertex in a graph and create edges between all pairs of lines whose 1-segments overlap. Find the chromatic number of the graph; the vertices of each color then represent a combined line in the original problem.
Graph coloring is a well-studied problem; for larger instances, consider a local search approach using tabu search or simulated annealing.

mpi4py create multiple groups and scatter from each group

Say I have a comm for 64 ranks. How can I create a group in mpi4py consisting of the first x ranks, a second group consisting of the remaining 64-x ranks, and comms for each group?
MPI_Comm_split creates new communicators by splitting a communicator into a group of sub-communicators based on the input values color and key.
All processes which pass in the same value for color are assigned to the same communicator. In your case, the first x processes should pass in a value for color and the rest should choose a different value.
key determines the ordering (rank) within each new communicator. The process which passes in the smallest value for key will be rank 0, the next smallest will be rank 1, and so on. If you don't need to change the original order of processes, you can use their rank as the key.
Combining these, here is an example in C:
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
int x = 10;
int color = rank < x;
MPI_Comm new_comm;
MPI_Comm_split(MPI_COMM_WORLD, color, rank, &new_comm);
Source and further information: http://mpitutorial.com/tutorials/introduction-to-groups-and-communicators/

Why is my function looping until Prolog crashes?

I can't seem to understand why my function is looping until Prolog crashes:
isTerminalRow(_,_,_,Count,10):-
    Count > 4.
isTerminalRow(B,A,Index,Count,Move):-
    checkValue(B,Index,A,V),
    C2 is V + Count,
    I2 is Index + 1,
    Move1 is Move + 1,
    isTerminalRow(B,A,I2,C2,Move1).

checkValue(B,Index,A,V):-
    getE(Index,B,Value),
    Value = A, V is 1
    ; V is 0.

getE(1,[H|_],H). % get nth element
getE(I,[_|T],L):-
    I1 is I - 1,
    getE(I1,T,L).
The call is
?- isTerminalRow([w,w,w,w,w,e,e,e,e,e],w,1,0,10).
From your uses of is/2, Move and Count are ground when you call isTerminalRow. For your first clause to fire when Count becomes larger than 4, Move must be exactly 10 at that point.
If not, the first clause does not fire; it doesn't even get a chance to consider the value of Count, and the execution continues with the second clause, which just loops (if checkValue/4 doesn't fail, that is).
Your termination conditions are too specific. Chances are, they are never met.
Update: from your comments, Move is already 10 in your query and Count is 0, so the first clause fails. After that, Move is always greater than 10, because you increment it with Move1 is Move + 1, so the first clause's head never matches again and Count > 4 is never even tested.

MPI: several broadcasts at the same time

I have a 2D processor grid (3*3):
P00, P01, P02 are in R0; P10, P11, P12 are in R1; P20, P21, P22 are in R2.
P*0 are on the same computer, and likewise for P*1 and P*2.
Now I would like R0, R1, R2 to call MPI_Bcast at the same time, broadcasting from P*0 to P*1 and P*2.
I find that when I use MPI_Bcast this way, it takes three times as long as broadcasting in only one row.
For example, if I only call MPI_Bcast in R0, it takes 1.00 s.
But if I call MPI_Bcast in all three rows R0, R1, R2, it takes 3.00 s in total.
This suggests that the MPI_Bcast calls are not running in parallel.
Is there any method to make the MPI_Bcast calls broadcast at the same time?
(One node broadcasting over three channels simultaneously.)
Thanks.
If I understand your question right, you would like to have simultaneous row-wise broadcasts:
P00 -> P01 & P02
P10 -> P11 & P12
P20 -> P21 & P22
This could be done using subcommunicators, e.g. one that only has processes from row 0 in it, another one that only has processes from row 1 in it and so on. Then you can issue simultaneous broadcasts in each subcommunicator by calling MPI_Bcast with the appropriate communicator argument.
Creating row-wise subcommunicators is extremely easy if you use a Cartesian communicator in the first place. MPI provides the MPI_CART_SUB operation for that. It works like this:
// Create a 3x3 non-periodic Cartesian communicator from MPI_COMM_WORLD
int dims[2] = { 3, 3 };
int periods[2] = { 0, 0 };
MPI_Comm comm_cart;
// We do not want MPI to reorder our processes
// That's why we set reorder = 0
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm_cart);
// Split the Cartesian communicator row-wise
int remaindims[2] = { 0, 1 };
MPI_Comm comm_row;
MPI_Cart_sub(comm_cart, remaindims, &comm_row);
Now comm_row will contain a handle to a new subcommunicator that spans only the row that the calling process is in. It now takes only a single call to MPI_Bcast to perform three simultaneous row-wise broadcasts:
MPI_Bcast(&data, data_count, MPI_DATATYPE, 0, comm_row);
This works because comm_row, as returned by MPI_Cart_sub, will be different in processes located in different rows. 0 here is the rank of the first process in the comm_row subcommunicator, which corresponds to P*0 because of the way the topology was constructed.
If you do not use a Cartesian communicator but operate on MPI_COMM_WORLD instead, you can use MPI_COMM_SPLIT to split the world communicator into three row-wise subcommunicators. MPI_COMM_SPLIT takes a color that is used to group processes into new subcommunicators - processes with the same color end up in the same subcommunicator. In your case, color should equal the number of the row that the calling process is in. The splitting operation also takes a key that is used to order processes in the new subcommunicator. It should equal the number of the column that the calling process is in, e.g.:
// Compute grid coordinates based on the rank
int proc_row = rank / 3;
int proc_col = rank % 3;
MPI_Comm comm_row;
MPI_Comm_split(MPI_COMM_WORLD, proc_row, proc_col, &comm_row);
Once again comm_row will contain the handle of a subcommunicator that only spans the same row as the calling process.
The MPI-3.0 draft includes a non-blocking MPI_Ibcast collective. While the non-blocking collectives aren't officially part of the standard yet, they are already available in MPICH2 and (I think) in OpenMPI.
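For illustration, a hedged sketch of the non-blocking form (assuming comm_row is one of the row subcommunicators created above, and data/count are placeholders):
/* Start the broadcast without blocking (MPI-3 style). */
MPI_Request req;
MPI_Ibcast(data, count, MPI_DOUBLE, 0 /* root */, comm_row, &req);

/* ... computation that does not touch data can overlap with the transfer ... */

MPI_Wait(&req, MPI_STATUS_IGNORE);

/* A process that is a member of several communicators could start one
   MPI_Ibcast per communicator and complete them all together with
   MPI_Waitall(n, requests, MPI_STATUSES_IGNORE). */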
Alternatively, you could start the blocking MPI_Bcast calls from separate threads (I'm assuming R0, R1 and R2 are different communicators).
A third possibility (which may or may not be feasible in your case) is to restructure the data so that only one broadcast is needed.
