Julia distribute function: specifying distributed dimension

I'm interested in distributing an MxN integer array across p workers. Is there a way to specify which dimension gets distributed? In particular, I want to keep the number of rows M fixed and distribute over N columns. In my case M > N (I have a term-document matrix with vocabulary of size M and number of documents N).
By default, Julia appears to distribute over the dimension that has the largest size, which doesn't work for my application (I want to distribute over the documents and not the vocabulary). Is there a way to control which dimension gets distributed?

SharedArray constructor has a pids optional parameter which maps elements to processes (see documentation).
So, an MxN matrix can be initialized with the following code:
using Distributed, SharedArrays  # required on Julia 0.7 and later

# a helper function which might be useful in other contexts
function balancedfill(v, n, b)
    d, r = divrem(n, b)
    return v[[repeat(1:r, inner=d+1); repeat(r+1:b, inner=d)]]
end

# N,M = size(mat)
pidvec = repeat(balancedfill(1:nprocs(), N, nprocs()), inner=M)
sharedmat = SharedArray{Float64}((N,M); pids=pidvec)
This creates a Float64 shared array, with columns balanced between processes. Float64 can be replaced by the element-type needed. With a little change (switching inner with outer and N with M in pidvec creation) a row-wise distributed array can be created.
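To make the chunking done by balancedfill easier to follow, here is a minimal Python transliteration of that helper alone. It is only an illustration of the balancing logic (the first r values receive d+1 slots, the rest receive d); the name balanced_fill and the list-based interface are assumptions for this sketch and are not part of any Julia API.

```python
def balanced_fill(values, n, b):
    """Spread n slots over the first b entries of `values` as evenly as possible.

    Mirrors the Julia helper above: with d, r = divmod(n, b), the first r
    values are repeated d + 1 times and the remaining b - r values d times.
    """
    d, r = divmod(n, b)
    out = []
    for k in range(b):
        repeats = d + 1 if k < r else d
        out.extend([values[k]] * repeats)
    return out


if __name__ == "__main__":
    # 10 columns over 4 processes: sizes 3, 3, 2, 2
    print(balanced_fill([1, 2, 3, 4], 10, 4))
```

The result always has exactly n entries, and the chunk sizes differ by at most one.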

Related

Design an algorithm that minimises the load on the most heavily loaded server

While reading the book by Aziz & Prakash (2021), I got stuck on problem 3.7 and its solution, which I am trying to implement.
The problem says :
You have n users with unique hashes h1 through hn and
m servers, numbered 1 to m. User i has Bi bytes to store. You need to
find numbers K1 through Km such that all users with hashes between
Kj and Kj+1 get assigned to server j. Design an algorithm to find the
numbers K 1 through Km that minimizes the load on the most heavily
loaded server.
The solution says:
Let L(a,b) be the maximum load on a server when
users with hash h1 through ha are assigned to servers S1 through Sb in
an optimal way so that the max load is minimised. We observe the
following recurrence:
$$L(a, b) = \min_{1 \le x \le a} \max\left( L(x, b-1),\ \sum_{i=x+1}^{a} B_i \right)$$
In other words, we find the right value of x such that if we pack the
first x users in b - 1 servers and the remaining in the last server the max
load on a given server is minimized.
Using this relationship, we can tabulate the values of L till we get
L(n,m). While computing L(a,b) when the values of L are tabulated
for all lower values of a and b we need to find the right value of x to
minimize the load. As we increase x, L(x,b-1) in the above expression increases and the sum term decreases. We can do binary search for x to find x that minimises their max.
I know that we can probably use some sort of dynamic programming, but how could we possibly implement this idea into a code?
The dynamic programming algorithm is defined fairly well given that formula: Implementing a top-down DP algorithm just needs you to loop from x = 1 to a and record which one minimizes that max(L(x,b-1), sum(B_i)) expression.
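As a rough illustration of that tabulation, here is a bottom-up Python sketch. It assumes B is a list of non-negative integers in hash order and lets x start at 0 so that a server may stay empty; the function name min_max_load_dp is made up for this example. It uses the plain linear scan over x rather than the binary search mentioned in the book, so it runs in O(m * n^2).

```python
from itertools import accumulate


def min_max_load_dp(B, m):
    """Smallest possible maximum load when B is split into m contiguous groups."""
    n = len(B)
    prefix = [0] + list(accumulate(B))  # prefix[a] = B[0] + ... + B[a-1]
    INF = float("inf")
    # L[b][a] = best achievable max load for the first a users on b servers
    L = [[INF] * (n + 1) for _ in range(m + 1)]
    L[0][0] = 0  # zero users on zero servers
    for b in range(1, m + 1):
        for a in range(n + 1):
            # first x users go to b - 1 servers, users x+1..a to the last server
            L[b][a] = min(
                max(L[b - 1][x], prefix[a] - prefix[x]) for x in range(a + 1)
            )
    return L[m][n]


if __name__ == "__main__":
    print(min_max_load_dp([7, 2, 5, 10, 8], 3))
```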
There is, however, a simpler (and faster) greedy/binary search algorithm for this problem that you should consider, which goes like this:
1. Compute prefix sums for B.
2. Find the minimum value of L such that we can partition B into at most m contiguous subarrays, each with sum at most L. We know max(B) <= L <= sum(B). So, perform a binary search to find L, with a helper function canSplit(v) that tests whether we can split B into such subarrays of sum <= v. canSplit(v) works greedily: Remove as many elements from the start of B as possible so that our sum does not exceed v. Repeat this a total of m times; return True if we've used all of B. You can use the prefix sums to run canSplit in O(m log n) time, with an additional inner binary search.
3. Given L, use the same strategy as the canSplit function to determine the m-1 partition points; find the m partition boundaries from there.
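The steps above can be sketched in Python as follows. This is one possible reading of the description, assuming B is a non-empty list of non-negative integers and m >= 1; the names min_max_load and _greedy_cuts are invented for the example. The cut positions are returned as exclusive end indices into B, so server j holds B[cuts[j-1]:cuts[j]].

```python
from bisect import bisect_right
from itertools import accumulate


def _greedy_cuts(prefix, m, v):
    """Greedy canSplit: return the cut indices if at most m groups of sum <= v
    cover all of B, otherwise None."""
    n = len(prefix) - 1
    cuts, start = [], 0
    for _ in range(m):
        if start == n:
            break
        # inner binary search: furthest end with prefix[end] - prefix[start] <= v
        end = bisect_right(prefix, prefix[start] + v) - 1
        if end == start:  # the next element alone already exceeds v
            return None
        cuts.append(end)
        start = end
    return cuts if start == n else None


def min_max_load(B, m):
    """Return (L, cuts): the minimal maximum load and the greedy cut indices."""
    prefix = [0] + list(accumulate(B))
    lo, hi = max(B), prefix[-1]
    while lo < hi:  # outer binary search on the answer L
        mid = (lo + hi) // 2
        if _greedy_cuts(prefix, m, mid) is not None:
            hi = mid
        else:
            lo = mid + 1
    return lo, _greedy_cuts(prefix, m, lo)


if __name__ == "__main__":
    print(min_max_load([7, 2, 5, 10, 8], 3))
```

The greedy pass may use fewer than m servers; any unused servers simply stay empty.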

BLAS routine to compute diagonal elements only of a matrix product?

Say I have two matrices A and B. I want to compute the diagonal elements of the matrix product A * B and place them in a pre-allocated vector result.
Is there a BLAS (or similar) routine to do this as fast as possible?
There is no specific routine for that. However, you can use the following definition of matrix multiplication.
Consider C = AB, and let $a_{ij}$, $b_{ij}$, $c_{ij}$ denote the (i,j)th elements of the corresponding matrices. Without loss of generality, I will assume that A, B, and C are all N x N dense matrices.
Then,
$$c_{ij} = \sum_{k=0}^{N-1} a_{ik} b_{kj}$$
Since you are interested only in the diagonal entries:
$$c_{ii} = \sum_{k=0}^{N-1} a_{ik} b_{ki}, \quad i = 0, \dots, N-1$$
In other words, to calculate the ith diagonal element of matrix C you need the dot product of the ith row of matrix A and the ith column of matrix B. That can be computed with the BLAS level-1 dot product function ?dot.
res = ?dot(n, x, incx, y, incy)
Let's assume that matrices A and B are stored column-wise and are accessible via pointers *A and *B (which hold N*N values), while *C is a preallocated storage for diagonal entries of matrix C (which holds N values).
The following loop should give you the diagonal:
for (int i = 0; i < N; i++)
{
    C[i] = ?dot(N, &A[i], N, &B[i*N], 1);
}
Notice that we access the ith row of matrix A by passing the address of the first element of the ith row, &A[i], with an increment (incx) of N. In contrast, to access the ith column of matrix B we pass the address of the first element of the ith column, &B[i*N], with an increment of 1.
Notes:
if A,B, and C have different (but consistent with matrix multiplication) dimensions, only slight modifications will have to be applied.
if matrices are stored row-wise, the call to ?dot should be slightly changed
the pseudocode above uses a general ?dot function. In practice, it will be sdot or ddot for single- or double precision real numbers, and versions of ?dotu: cdotu and zdotu for complex numbers of single and double precision, respectively.
is it the most efficient, cache-friendly, etc-etc implementation? probably not, but it would surprise me if that becomes a bottleneck in an algorithm where NxN matrices A and B have been explicitly calculated anyway.
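To check the stride arithmetic without linking against BLAS, here is a small pure-Python sketch. strided_dot is a hypothetical stand-in that imitates the (n, x, incx, y, incy) calling convention of ?dot on flat lists, with explicit start offsets in place of pointers; it is not a BLAS binding. The matrices are assumed to be N x N and stored column-wise, as in the answer.

```python
def strided_dot(n, x, x0, incx, y, y0, incy):
    """Dot product of n elements, starting at x[x0] and y[y0] with the given strides."""
    return sum(x[x0 + k * incx] * y[y0 + k * incy] for k in range(n))


def diag_of_product(A, B, N):
    """Diagonal of A*B for column-major flat lists A and B of length N*N."""
    # row i of A: start at i, stride N; column i of B: start at i*N, stride 1
    return [strided_dot(N, A, i, N, B, i * N, 1) for i in range(N)]


if __name__ == "__main__":
    # A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], stored column by column
    A = [1, 3, 2, 4]
    B = [5, 7, 6, 8]
    print(diag_of_product(A, B, 2))
```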

Weighted randomization based on runtime data in System Verilog

Is there a way to do weighted randomization in System Verilog based on runtime data? Say I have a queue of integers and a queue of weights (unsigned integers), and I wish to select a random integer from the first queue according to the weights in the second queue.
int data[$] = '{10, 20, 30};
uint_t weights[$] = '{100, 200, 300};
Any random construct expects the weights hardcoded as in
constraint range { Var dist { [0:1] := 50 , [2:7] := 50 }; }
But in my case, I need to pick an element from an unknown number of elements.
PS: Assume the number of elements and weights will be the same always.
Unfortunately, the dist constraint only lets you choose from a fixed number of values.
Two approaches I can think of are
Push each data value into a queue using the weight as a repetition count. In your example, you wind up with a queue of 600 values. Randomly pick an index into the queue. The selected element has the distribution you want. An example is posted here.
Create an array of ranges for each weight. For your example the array would be uint_t ranges[][2] = '{'{0,99}, '{100,299}, '{300,599}}. Then you could do the following in a constraint
index inside {[0:weights.sum()-1]};
foreach (data[ii])
    index inside {[ranges[ii][0]:ranges[ii][1]]} -> value == data[ii];
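The range-based approach can be modelled outside a simulator to confirm the mapping from index to value. The Python sketch below is only a model of the idea, not SystemVerilog: it builds the cumulative upper bounds from the weights and looks up which range a given index falls into. The names value_for_index and weighted_pick are assumptions made for this example, and the weights are assumed to be positive integers.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def value_for_index(data, weights, index):
    """Map index in [0, sum(weights) - 1] to the data value owning that range."""
    upper = list(accumulate(weights))  # exclusive upper bounds, e.g. [100, 300, 600]
    return data[bisect_right(upper, index)]


def weighted_pick(data, weights, rng=random):
    """Draw one value from data with probability proportional to its weight."""
    index = rng.randrange(sum(weights))
    return value_for_index(data, weights, index)


if __name__ == "__main__":
    data = [10, 20, 30]
    weights = [100, 200, 300]
    print(weighted_pick(data, weights))
```

With the example weights, indices 0 to 99 map to 10, 100 to 299 map to 20, and 300 to 599 map to 30, which matches the ranges array in the answer.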

Generate Unique Combinations of Integers

I am looking for help with pseudo code (unless you are a user of Game Maker 8.0 by Mark Overmars and know the GML equivalent of what I need) for how to generate a list / array of unique combinations of a set of X number of integers which size is variable. It can be 1-5 or 1-1000.
For example:
IntegerList{1,2,3,4}
1,2
1,3
1,4
2,3
2,4
3,4
I feel like the math behind this is simple, but I just can't seem to wrap my head around it after checking multiple sources on how to do it in languages such as C++ and Java. Thanks everyone.
As there are not many details in the question, I assume:
Your input is a natural number n and the resulting array contains all natural numbers from 1 to n.
The expected output given by the combinations above resembles a symmetric relation, i.e. in your case [1, 2] is considered the same as [2, 1].
Combinations [x, x] are excluded.
There are only combinations with 2 elements.
There is no List<> datatype or dynamic array, so the array length has to be known before creating the array.
The number of elements in your result is therefore the binomial coefficient m = n over 2 = n! / (2! * (n - 2)!) (which is 4! / (2! * (4 - 2)!) = 24 / 4 = 6 in your example) with ! being the factorial.
First, initializing the array with the first n natural numbers is easy: each element equals its (1-based) index. For the same reason you do not actually need to store the input array at all; the loop counters already give you the values.
You need 2 nested loops. The outer loop ranges i from 1 to n - 1, the inner loop ranges j from i + 1 to n, which guarantees i < j and so skips both [x, x] and the mirrored duplicates. If your indexes start from 0 instead of 1, you have to take this into consideration for the loop limits. Now, you only need to fill your target array with the combinations [i, j]. To find the correct index in your target array, use a third counter variable, initialized with the first index and incremented at the end of each inner-loop iteration.
I agree, the math behind it is not that hard, and I think this explanation should suffice to develop the corresponding code yourself.
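For reference, the loop structure described above could look like this in Python; it should translate to GML almost line by line, though that translation is not attempted here. The sketch assumes the input is just n and preallocates the result with n * (n - 1) / 2 slots; the name unique_pairs is chosen for this example.

```python
def unique_pairs(n):
    """All 2-element combinations (i, j) with 1 <= i < j <= n, in order."""
    m = n * (n - 1) // 2          # binomial coefficient "n over 2"
    result = [None] * m           # fixed-size target array
    k = 0                         # next free slot in the target array
    for i in range(1, n):         # outer loop: 1 .. n-1
        for j in range(i + 1, n + 1):  # inner loop: i+1 .. n
            result[k] = (i, j)
            k += 1
    return result


if __name__ == "__main__":
    for pair in unique_pairs(4):
        print(pair)
```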

Calculating Cosine Similarity of two Vectors of Different Size

I have 2 questions,
I've made a vector from a document by finding out how many times each word appeared in a document. Is this the right way of making the vector? Or do I have to do something else also?
Using the above method I've created vectors for 16 documents, which are of different sizes. Now I want to apply cosine similarity to find out how similar the documents are to each other. The problem I'm having is getting the dot product of two vectors because they are of different sizes. How would I do this?
Sounds reasonable, as long as it means you have a list/map/dict/hash of (word, count) pairs as your vector representation.
You should pretend that you have zero values for the words that do not occur in some vector, without storing these zeros anywhere. Then, you can use the following algorithm to compute the dot product of these vectors (pseudocode):
algorithm dot_product(a : WordVector, b : WordVector):
    dot = 0
    for word, x in a do
        y = lookup(word, b)    # 0 if word does not occur in b
        dot += x * y
    return dot
The lookup part can be anything, but for speed, I'd use hashtables as the vector representation (e.g. Python's dict).
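A minimal Python sketch of the full cosine similarity on top of that dot product, assuming each document vector is a dict (here a collections.Counter) mapping words to counts. Words missing from a vector are treated as zero, so the two vectors never need to have the same size. The function names are illustrative, and returning 0.0 for an empty document is a choice made for this sketch.

```python
import math
from collections import Counter


def dot_product(a, b):
    """Sparse dot product of two word -> count mappings."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(x * b.get(word, 0) for word, x in a.items())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot_product(a, b) / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = Counter("a b b".split())  # {'a': 1, 'b': 2}
    doc2 = Counter("b c".split())    # {'b': 1, 'c': 1}
    print(cosine_similarity(doc1, doc2))
```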
