Check equality of a value in all MPI ranks

Say I have some int x. I want to check if all MPI ranks get the same value for x. What's a good way to achieve this using MPI collectives?
The simplest I could think of is, broadcast rank0's x, do the comparison, and allreduce-logical-and the comparison result. This requires two collective operations.
...
x = ...
x_bcast = comm.bcast(x, root=0)
all_equal = comm.allreduce(x==x_bcast, op=MPI.LAND)
if not all_equal:
    raise Exception()
...
Is there a better way to do this?
UPDATE:
From the OpenMPI user list, I received the following response. And I think it's quite a nifty trick!
A pattern I have seen in several places is to allreduce the pair p =
{-x,x} with MPI_MIN or MPI_MAX. If in the resulting pair p[0] == -p[1],
then everyone has the same value. If not, at least one rank had a
different value. Example:
bool is_same(int x) {
    int p[2];
    p[0] = -x;
    p[1] = x;
    MPI_Allreduce(MPI_IN_PLACE, p, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
    return (p[0] == -p[1]);
}

Solutions based on logical operators assume that you can convert between integers and logicals without any data loss. I think that's dangerous. You could do a bitwise AND where you make sure you use all the bytes of your int/real/whatever.
You could do two reductions: one max and one min, and see if they give the same result.
You could also write your own reduction operator: operate on two ints, and do a max on the first, min on the second. Then test if the two are the same.
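The custom-operator idea can be sketched without MPI at all. In the snippet below a plain Python list stands in for the values held by the ranks, and functools.reduce stands in for the collective reduction; the names combine and all_ranks_equal are hypothetical and only serve the illustration. In a real program the combining function would have to be registered as a user-defined MPI reduction operator.

```python
from functools import reduce

def combine(a, b):
    # Reduction on pairs: max on the first slot, min on the second slot.
    return (max(a[0], b[0]), min(a[1], b[1]))

def all_ranks_equal(values_per_rank):
    # Each simulated rank contributes the pair (x, x).
    highest, lowest = reduce(combine, [(x, x) for x in values_per_rank])
    # If the maximum equals the minimum, every rank held the same value.
    return highest == lowest

print(all_ranks_equal([7, 7, 7]))  # True
print(all_ranks_equal([7, 8, 7]))  # False
```

Because the pair is reduced in one pass, this needs a single collective, like the {-x, x} trick, but it does not rely on negating x.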

Find which sum of any numbers in an array equals amount

I have a customer who sends electronic payments but doesn't bother to specify which invoices they cover. I'm left guessing which ones, and I would rather not try every single combination manually. I need some sort of pseudo-code to do it, which I can then adapt, but I'm not sure I can come up with a good algorithm myself. I'm familiar with PHP, bash, and Python, but I can adapt.
I would need an array with the following numbers: [357.15, 223.73, 106.99, 89.96, 312.39, 120.00]. Those are the amounts of the invoices. Then I would need to find a sum of any combination of two or more of those numbers that adds up to 596.57. Once found the program would need to tell me exactly which numbers it used to reach the sum so I can then know which invoices got paid.
This is very similar to the Subset Sum problem and can be solved using a similar approach to the typical brute-force method used for that problem. I have to do this often enough that I keep a simple template of this algorithm handy for when I need it. What is posted below is a slightly modified version [1].
This has no restrictions on whether the values are integer or float. The basic idea is to iterate over the list of input values and keep a running list of every subset that sums to less than the target value (since there might be a later value in the inputs that will yield the target). It could be modified to handle negative values as well by removing the rule that only keeps candidate subsets if they sum to less than the target. In that case, you'd keep all subsets, and then search through them at the end.
def find_subsets(base_values, target):
    possible_matches = [[0, []]]  # [[known_attainable_value, [list, of, components]], [...], ...]
    matches = []  # we'll return ALL subsets that sum to `target`
    for base_value in base_values:
        temp = list(possible_matches)  # Can't extend the list while looping over it, so use a copy
        for possible_match in possible_matches:
            new_val = possible_match[0] + base_value
            if new_val <= target:
                # Build a new component list so stored subsets are never mutated
                new_possible_match = [new_val, possible_match[1] + [base_value]]
                temp.append(new_possible_match)
                if new_val == target:
                    matches.append(new_possible_match[1])
        possible_matches = temp
    return matches

# Usage: find_subsets([list, of, input, values], target_sum)
This is a very inefficient algorithm and it will blow up quickly as the size of the input grows. The Subset Sum problem is NP-Complete, so you are not likely to find a generalized solution that will work in all cases and is efficient.
1: The way lists are being used here is kludgy. If the goal was to simply find any match, the nested lists could be replaced with a dictionary, and we could exit right away once a match is found. But doing that will cause intermediate subsets that sum to the same value to also map to the same dictionary slot, so only one subset with that sum is kept. Since we need to report all matching subsets (because the values represent checks and are presumably not fungible even if the dollar amounts are equal), a dictionary won't work.
You can use itertools.combinations(t,r) to list all combinations of r elements in array t.
So we loop on the possible values of r, then on the results of itertools.combinations:
import itertools

def find_sum(t, obj):
    t = [x for x in t if x <= obj]  # filter out elements which are too big
    for r in range(1, len(t)+1):  # loop on number of elements
        for subt in itertools.combinations(t, r):  # loop on combinations of r elements
            if sum(subt) == obj:
                return subt
    return None

find_sum([1,2,3,4], 6)
# (2, 4)
find_sum([1,2,3,4], 10)
# (1, 2, 3, 4)
find_sum([1,2,3,4], 11)
# None
find_sum([35715, 22373, 10699, 8996, 31239, 12000], 59657)
# None
Rounding errors:
The code above is meant to be used with integers, rather than floats.
To use with floats, replace the test sum(subt) == obj with the more forgiving test abs(sum(subt) - obj) < 0.01.
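Another way to sidestep rounding errors is to convert every amount to whole cents before comparing. The following is only a sketch of that idea: the function name find_sum_cents and the min_items parameter are made up for this example, and it assumes the amounts have at most two decimal places. It starts at two items because the question asks for combinations of two or more invoices.

```python
import itertools

def find_sum_cents(amounts, target, min_items=2):
    # Compare in integer cents so that float rounding cannot spoil the equality test.
    cents = [round(a * 100) for a in amounts]
    goal = round(target * 100)
    for r in range(min_items, len(amounts) + 1):
        # Combine positions rather than values so the original amounts can be reported.
        for idx in itertools.combinations(range(len(amounts)), r):
            if sum(cents[k] for k in idx) == goal:
                return tuple(amounts[k] for k in idx)
    return None

print(find_sum_cents([10.10, 20.20, 5.00], 30.30))
# (10.1, 20.2)
print(find_sum_cents([357.15, 223.73, 106.99, 89.96, 312.39, 120.00], 596.57))
# None
```

The second call uses the amounts from the question and agrees with the integer run above: no combination of those invoices adds up to 596.57.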
Relevant documentation:
itertools.combinations

Output of this strange loop related to matrices

Let us consider the following pseudocode:
int n=n;
int A[][]
scanf(A[][],%d);
for i=1:n;i++
{
x=A[i][i]
for j=1:n;j++
{
if x<A[i][j]
a=x;
x=A[i][j];
A[i][i]=x;
A[i][j]=a;
return A[][]
I am stuck on this pseudocode. The question, I think, is just that the diagonal entries are compared and exchanged for the greatest entries. But, will the output depend on the entries of the matrix or will be independent of it is my main question. Specifically, is there any general formula for the output? Is it dependent on the type of matrix A? I think it should be some power of A. Any hints? Thanks in advance.
You could just write your code in any language you like.
n = 3
A = [[1,2,3], [3,5,6], [7,8,9]]
for i in range(n):
    x=A[i][i]
    for j in range(n):
        a = None
        if x < A[i][j]:
            a = x
        x=A[i][j]
        A[i][i]=x
        A[i][j]=a
print (A)
Gives you:
[[3, 1, 2], [None, 6, 3], [None, 7, None]]
But, will the output depend on the entries of the matrix or will be
independent of it is my main question.
Of course it depends. You can see the initial data in the output. That means the output depends on the data.
Specifically, is there any general formula for the output?
I believe not, but I can't prove it mathematically. Just look at the Nones that appear in the output. I can hardly imagine such a formula.
Is it dependent on the type of matrix A? I think it should be some power
of A.
What is the 'type of matrix'?
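For comparison, here is a sketch of the other reading of the pseudocode, in which all four assignments sit inside the if, so that the diagonal entry is swapped with any larger entry in its row. This reading is an assumption, and the function name swap_row_max_to_diagonal is invented for the example.

```python
def swap_row_max_to_diagonal(A):
    # Work on a copy so the caller's matrix is left untouched.
    B = [row[:] for row in A]
    n = len(B)
    for i in range(n):
        for j in range(n):
            # Swap whenever the current diagonal entry is smaller than B[i][j].
            if B[i][i] < B[i][j]:
                B[i][i], B[i][j] = B[i][j], B[i][i]
    return B

print(swap_row_max_to_diagonal([[1, 2, 3], [3, 5, 6], [7, 8, 9]]))
# [[3, 1, 2], [3, 6, 5], [7, 8, 9]]
```

Under this reading each diagonal entry ends up as the maximum of its row and the remaining entries are only rearranged within the row, so the output still depends on the entries of A.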

Fixing a race condition in a TensorFlow run

I would like to maintain a variable on the GPU, and perform some operations on that variable in place. The following snippet is a minimalish example of this.
import numpy as np
import tensorflow as tf
with tf.Graph().as_default():
    i = tf.placeholder(tf.int32, [4], name='i')
    y = tf.placeholder(tf.float32, [4], name='y')
    _x = tf.get_variable('x', [4], initializer=tf.random_normal_initializer())
    x = _x + tf.reduce_sum(tf.mul(_x,y))
    assign_op = tf.assign(_x, x).op
    permute_op = tf.assign(_x, tf.gather(_x, i))
    ii = np.array([1,2,3,0])
    yy = np.random.randn(4)
    s = tf.Session()
    s.run(tf.initialize_all_variables())
    xxx0 = s.run(_x)
    s.run([permute_op, assign_op], feed_dict={i: ii, y: yy})
    xxx1 = s.run(_x)
    print('assigned then permuted', np.allclose((xxx0+np.dot(xxx0,yy))[ii], xxx1))
    print('permuted then assigned', np.allclose((xxx0[ii]+np.dot(xxx0[ii], yy)), xxx1))
The problem is that this program is ambiguous, in terms of the ordering of the assign_op and permute_op operations. Hence, one or the other of the final two print statements will be true, but which one that is varies randomly across multiple runs of the program. I could break this into two steps, the first running the permute_op and the second running the assign_op, but it seems this will be less efficient.
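To make the ambiguity concrete, the two possible orderings can be worked through in plain NumPy, outside TensorFlow. This is only an illustration with made-up numbers; the helper names assign and permute are hypothetical and mirror assign_op and permute_op.

```python
import numpy as np

x0 = np.array([1.0, 2.0, 3.0, 4.0])    # stand-in for the initial value of _x
y = np.array([1.0, 0.0, 0.0, 0.0])     # stand-in for the fed value of y
perm = np.array([1, 2, 3, 0])          # same permutation as ii

def assign(x):
    # Mirrors x = _x + reduce_sum(_x * y)
    return x + np.dot(x, y)

def permute(x):
    # Mirrors gather(_x, i)
    return x[perm]

assign_then_permute = permute(assign(x0))
permute_then_assign = assign(permute(x0))
print(assign_then_permute)   # [3. 4. 5. 2.]
print(permute_then_assign)   # [4. 5. 6. 3.]
```

The two results differ because the permutation changes which element of x is paired with each element of y in the sum, so the final value depends on which operation happens to run first.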
Is there an efficient way of breaking the race condition, and making the results predictable?
The easiest way to order the two assignments is to use the result of the first assignment as the variable input to the second one. This creates a data dependency between the assignments, which gives them a deterministic order. For example:
assigned = tf.assign(_x, x)
permuted = tf.assign(assigned, tf.gather(assigned, i))
sess.run(permuted.op) # Runs both assignments.
Note that I reversed the order of the permutation and assignment operations from what you said in your question, because doing the permutation first and then updating still has a race. Even if this isn't the semantics you wanted, the principle should hopefully be clear.
An alternative approach is to use with tf.control_dependencies(ops): blocks, where ops is a list of operations (such as assignments) that must run before the operations in the with block. This is slightly trickier to use, because you have to be careful about reading the updated value of a variable. (Like a non-volatile variable in C, the read may be cached.) The typical idiom to force a read is to use tf.identity(var.ref()), so the example would look something like:
assign_op = tf.assign(_x, x).op
with tf.control_dependencies([assign_op]):
    # Read updated value of `_x` after `assign_op`.
    new_perm = tf.gather(tf.identity(_x.ref()), i)
    permute_op = tf.assign(_x, new_perm).op
sess.run(permute_op) # Runs both assignments.

Numpy indexing using array

I'm trying to return a (square) section from an array, where the indices wrap around the edges. I need to juggle some indexing, but it works. However, I expected the last two lines of code to give the same result. Why don't they? How does numpy interpret the last line?
And as a bonus question: Am I being woefully inefficient with this approach? I'm using the product because I need to modulo the range so it wraps around, otherwise I'd use a[imin:imax, jmin:jmax, :], of course.
import numpy as np
from itertools import product
i = np.arange(-1, 2) % 3
j = np.arange(1, 4) % 3
a = np.random.randint(1,10,(3,3,2))
print a[i,j,:]
# Gives 3 entries [(i[0],j[0]), (i[1],j[1]), (i[2],j[2])]
# This is not what I want...
indices = list(product(i, j))
print indices
indices = zip(*indices)
print 'a[indices]\n', a[indices]
# This works, but when I'm explicit:
print 'a[indices, :]\n', a[indices, :]
# Huh?
The problem is that advanced indexing is triggered if:
the selection object, obj, is [...] a tuple with at least one sequence object or ndarray
The easiest fix in your case is to use repeated indexing:
a[i][:, j]
An alternative would be to use ndarray.take, which will perform the modulo operation for you if you specify mode='wrap':
a.take(np.arange(-1, 2), axis=0, mode='wrap').take(np.arange(1, 4), axis=1, mode='wrap')
Here is another method of advanced indexing, which in my opinion is better than the product solution.
If you have an integer array for every dimension, these are broadcast together and the output has the broadcast shape (you will see what I mean)...
i, j = np.ix_(i,j) # this adds extra empty axes
print i,j
print a[i,j]
# and now you will actually *not* be surprised:
print a[i,j,:]
Note that this is a 3x3x2 array, while you had a 9x2 array, but simple reshape will fix that and the 3x3x2 array is actually closer to what you want probably.
Actually the surprise is still hidden in a way, because in your examples a[indices] is the same as a[indices[0], indices[1]], but a[indices,:] is a[(indices[0], indices[1]),:], so it is not a big surprise that it is different. Note that a[indices[0], indices[1],:] does give the same result.
See : http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing
When you add :, you are mixing integer indexing and slicing. The rules are quite complicated and better explained than I could in the above link.
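As a quick check, the three approaches mentioned in the answers (repeated indexing, take with mode='wrap', and np.ix_) can be compared on a small array. This is a sketch only: it uses a deterministic np.arange array instead of the random one from the question, and the variable names are chosen just for this example.

```python
import numpy as np

a = np.arange(18).reshape(3, 3, 2)   # deterministic stand-in for the random array
rows = np.arange(-1, 2)              # unwrapped row indices: -1, 0, 1
cols = np.arange(1, 4)               # unwrapped column indices: 1, 2, 3
i, j = rows % 3, cols % 3            # wrapped indices, as in the question

by_repeated_indexing = a[i][:, j]
by_take = a.take(rows, axis=0, mode='wrap').take(cols, axis=1, mode='wrap')
by_ix = a[np.ix_(i, j)]

print(by_repeated_indexing.shape)                      # (3, 3, 2)
print(np.array_equal(by_repeated_indexing, by_take))   # True
print(np.array_equal(by_repeated_indexing, by_ix))     # True
```

All three return the wrapped 3x3x2 block, so which one to use is mostly a matter of readability.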

Fortran's do-loop over arbitrary indices like for-loop in R?

I have two p-times-n arrays x and missx, where x contains arbitrary numbers and missx is an array containing zeros and ones. I need to perform recursive calculations on those points where missx is zero. The obvious solution would be like this:
do i = 1, n
   do j = 1, p
      if(missx(j,i)==0) then
         z(j,i) = ... something depending on the previous computations and x(j,i)
      end if
   end do
end do
The problem with this approach is that missx is 0 most of the time, so there are quite a lot of if statements that are always true.
In R, I would do it like this:
for(i in 1:n)
  for(j in which(missx[,i]==0))
    z[j,i] <- ... something depending on the previous computations and x[j,i]
Is there a way to do the inner loop like that in Fortran? I did try a version like this:
do i = 1, n
   do j = 1, xlength(i) !xlength(i) gives the number of zero-elements in missx(:,i)
      j2=whichx(j,i) !whichx(1:xlength(i),i) contains the indices of zero-elements in missx(:,i)
      z(j2,i) = ... something depending on the previous computations and x(j2,i)
   end do
end do
This seemed slightly faster than the first solution (not counting the cost of building xlength and whichx), but is there a more clever way to do this, like the R version, so that I wouldn't need to store those xlength and whichx arrays?
I don't think you are going to get a dramatic speedup anyway. If you must do the iteration for most items, then storing just the list of those with the 0 value for the whole array is not an option. You can of course use the WHERE or FORALL construct.
forall(i = 1: n,j = 1: p,missx(j,i)==0) z(j,i) = ...
or just
where(missx==0) z = ..
But the usual limitations of these constructs apply.
