How to wrap a consecutive range of integers? - math

What is the shortest possible calculation f(i, n, len, offset) that wraps a range of integers starting from n (>=0) with length len, given a certain offset?
i     offset 0   offset 1   offset 2   offset 15
10    10         15         14         13
11    11         10         15         14
12    12         11         10         15
13    13         12         11         10
14    14         13         12         11
15    15         14         13         12
So f(10, 10, 5, 1) = 15, f(15, 10, 5, 1) = 14 and f(10, 10, 5, 2) = 14.
Bonus karma for negative numbers or negative offsets or ranges that cross 0.

I don't know about "shortest possible", but this seems to work:
int f(int n, int base, int len, int offset)
{
    int r = n - offset;   /* shift down by the offset */
    if (r < base)
        r += len;         /* wrap back into [base, base + len) */
    return r;
}
It does require adding the base argument, though, because otherwise you have no idea where n is with respect to the range (e.g. is 15 at the bottom of the 15-20 range or the top of 10-15?). So your examples would become f(10, 10, 6, 1), f(15, 10, 6, 1), etc...
Haven't checked whether that works for negative numbers and/or ranges spanning 0, and it also fails if offset > len, but that can be worked around by adding offset %= len to normalize the input parameters.
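A minimal Python sketch of the same idea that leans on the modulo operator, so that offsets larger than the length, negative offsets and ranges that cross 0 are covered as well. The name wrap and its parameter order are illustrative, not from the question; it assumes length > 0 and relies on Python's % returning a non-negative result for a positive modulus.

```python
def wrap(i, base, length, offset):
    """Shift i down by offset inside [base, base + length), wrapping around."""
    # (i - base) is the position inside the range; % keeps it in 0..length-1
    return base + (i - base - offset) % length


# Examples from the question, with length 6 for the range 10..15
print(wrap(10, 10, 6, 1))    # 15
print(wrap(15, 10, 6, 1))    # 14
print(wrap(10, 10, 6, 15))   # 13
print(wrap(15, 10, 6, -1))   # negative offset: 10
print(wrap(-2, -2, 5, 1))    # range -2..2 crossing 0: 2
```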


Matrix multiplication in Fixed Point for 16 bits

I need to perform the matrix multiplication between different layers in a neural network. That is: W0, W1, W2, ... Wn are the weights of the neural network and the input is data. The resulting matrices are:
Out1 = data * W0
Out2 = Out1 * W1
Out3 = Out2 * W2
.
.
.
OutN = Out(N-1) * Wn
I know the absolute max value in the weight matrices and I also know that the input data values range from 0 to 1 (the inputs are normalized). The matrix multiplication is in fixed point with 16 bits. The weights are scaled to the optimal fixed-point format. For example: if the absolute maximum value in W0 is 2.5, I know that the minimum number of bits in the integer part is 2 and the fractional part will have 14 bits. Because the input data is in the range [0,1], I also know that its integer and fractional bits are 1.15.
My question is: how can I know the minimum number of bits in the integer part of the resulting matrix to avoid overflow? Is there any way to study and infer the maximum value in a matrix multiplication? I know about the determinant and the norm of a matrix, but I think the problem lies in runs of consecutive negative or positive values in the matrix rows and columns. For example, if I have this row vector and this column vector, and the result is in 8-bit fixed point:
A = [1, 2, 3, 4, 5, 6, -7, -8]
B = [1, 2, 3, 4, 5, 6, 7, 8]
A * B = (1*1) + (2*2) + (3*3) + (4*4) + (5*5) + (6*6) + (7*-7) + (8*-8) = 91 - 49 - 64 = -22
The running sum reaches 91 before the negative terms are added, so the accumulator overflows although the final result lies within [-64,63].
Another example: if I have this row vector and this column vector, and the result is in 8-bit fixed point:
A = [1, -2, 3, -4, 5, -6, 7, -8]
B = [1, 2, 3, 4, 5, 6, 7, 8]
A * B = (1*1) - (2*2) + (3*3) - (4*4) + (5*5) - (6*6) + (7*7) - (8*8) = -36
The sum accumulator at no moment exceeds the maximum range for 8 bits.
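A small Python sketch of the two examples, tracking the running sum after every product. The [-64, 63] limits are the ones stated in the question; the function names are illustrative only.

```python
def running_sums(a, b):
    """Partial sums of the dot product a . b, one entry per accumulated term."""
    total, sums = 0, []
    for x, y in zip(a, b):
        total += x * y
        sums.append(total)
    return sums


def overflows(sums, lo=-64, hi=63):
    """True if any partial sum leaves the representable range [lo, hi]."""
    return any(s < lo or s > hi for s in sums)


B = [1, 2, 3, 4, 5, 6, 7, 8]
first = running_sums([1, 2, 3, 4, 5, 6, -7, -8], B)
second = running_sums([1, -2, 3, -4, 5, -6, 7, -8], B)

print(first, overflows(first))     # peaks at 91, ends at -22
print(second, overflows(second))   # stays within range, ends at -36
```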
To sum up: I'm looking for a way to analyze the weight matrices to avoid overflow in the sum accumulator. The way I do the matrix multiplication is (only an example, assuming matrices A and B have been scaled to 1.15 format):
A1 --> 1.15 bits
B1 --> 1.15 bits
A2 --> 1.15 bits
B2 --> 1.15 bits
mult_1 = (A1 * B1) >> 15; // Right shift by 15 bits to realign the operands
mult_2 = (A2 * B2) >> 15; // Right shift by 15 bits to realign the operands
sum_acc = mult_1 + mult_2; // Sum accumulator
Let's consider an n = 100 dimensional dot product (which is part of any matrix multiplication or convolution) in %4.13 fixed point format as an example.
Integer bits
The max value in %4.13 is slightly below 2^4, so let's take it to be 15.999999.
An n dimensional dot product has n multiplications and n-1 additions:
15.999999*15.999999 + 15.999999*15.999999 + .... + 15.999999*15.999999
Each multiplication adds up the integer bits of its operands:
15.999999*15.999999 = 255.99997 -> ceil(log2(255.99997)) = 8 = 2*(4) -> %8.13
The sum is made up of n = 100 such products, so it is bounded by:
255.99997*100 = 25599.997 -> ceil(log2(25599.997)) = 15 = ceil(8+log2(100)) -> %15.13
So if n is the number of dimensions and i is the number of integer bits, the result needs:
i' = ceil((i*2)+log2(n))
integer bits... so:
%1.? -> 100*( 1.999999^2) = 399.99 -> % 9.?
%2.? -> 100*( 3.999999^2) = 1599.99 -> %11.?
%3.? -> 100*( 7.999999^2) = 6399.99 -> %13.?
%4.? -> 100*(15.999999^2) = 25599.99 -> %15.?
i(1) = ceil((1*2)+log2(100)) = ceil(2+6.644) = 9
i(2) = ceil((2*2)+log2(100)) = ceil(4+6.644) = 11
i(3) = ceil((3*2)+log2(100)) = ceil(6+6.644) = 13
i(4) = ceil((4*2)+log2(100)) = ceil(8+6.644) = 15
Fractional bits
OK, let's see what happens with multiplication:
0.1b^2 = 0.01b -> %?.1 -> %?.2
0.01b^2 = 0.0001b -> %?.2 -> %?.4
0.001b^2 = 0.000001b -> %?.3 -> %?.6
so f' = 2*f, where f is the number of fractional bits. Addition does not change the bit width:
0.1b*2 = 1.0b -> %?.1 -> %?.1
0.01b*2 = 0.1b -> %?.2 -> %?.2
0.001b*2 = 0.01b -> %?.3 -> %?.3
as the result will not be smaller than the operands. So, once the fractional part is included, the dot product needs:
i' = ceil((i*2)+log2(n))
f' = 2*f
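As a cross-check, here is a hedged Python sketch that computes the worst case directly with exact integers instead of going through the closed-form expression. It assumes unsigned magnitudes (a sign bit, if any, is not counted) and that every operand can take the largest value of the %i.f format; the function name is made up for illustration.

```python
def dot_product_format(i, f, n):
    """Integer and fractional bits needed to hold an n-term dot product
    of %i.f operands without overflow (worst case, magnitudes only)."""
    max_raw = (1 << (i + f)) - 1          # largest raw value in %i.f
    worst_sum = n * max_raw * max_raw     # n worst-case products, 2*f fractional bits
    integer_part = worst_sum >> (2 * f)   # drop the fractional bits
    return integer_part.bit_length(), 2 * f


for i in range(1, 5):
    print(i, dot_product_format(i, 13, 100))
```

For n = 100 and f = 13 this gives 9, 11, 13 and 15 integer bits for i = 1..4, with 26 fractional bits in each case.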

Equalizing the array elements [closed]

You have three or more numbers and a division parameter, and you have to equalize the array elements in the minimum number of operations. The only allowed operation is dividing an array element by the division parameter.
Example 1:
Vector arr{64,32,16};
Division parameter = 2.
Minimum number of operations is 3.
Explanation: divide 64 by 2 two times and divide 32 by 2 one time. So the minimum number of operations is 2+1=3.
Example 2:
Vector arr{64,33,25};
Division parameter = 2.
Minimum number of operations is 15.
Explanation:
For the minimum number of operations you have to divide 64 (6 times), 33 (5 times) and 25 (4 times), so that all three elements become 1.
The division parameter is given by the user. The vector and its size are also given by the user.
Division is always integer division, e.g. 33/2=16.
Please help me solve this problem in an efficient way.
Taking GCD was my first thought until you corrected the question to clarify that division is integer division.
Now, I came up with 2 algorithms:
Algorithm 1:
Take the largest number and divide it until it either drops below the 2nd largest (so it is no longer the largest) or becomes equal to it, increasing the counter with each division.
If it dropped below the 2nd largest, repeat the above steps with the new largest number.
If it became equal to the 2nd largest, start comparing it with the 3rd largest, but now increase the counter by 2 with each division (as there are 2 equal largest numbers), and then repeat the above steps.
Ex -
[64,32,17,36], div factor = 2, counter(ctr) = 0
64 -> 32, [32,32,17,36] steps = 1, ctr = 1
36 -> 18, [32,32,17,18] steps = 1, ctr = 2
32 -> 16, [16,16,17,18] steps = 1*2(as 2 values = 32) = 2, ctr = 4
18 -> 9, [16,16,17,9] steps = 1, ctr = 5
17 -> 8, [16,16,8,9] steps = 1, ctr = 6
16 -> 8, [8,8,8,9] steps = 1*2(as 2 values = 16) = 2, ctr = 8
9 -> 4, [8,8,8,4] steps = 1, ctr = 9
8 -> 4, [4,4,4,4] steps = 1*3(as 3 values = 8) = 3, ctr = 12
So the minimum steps come out to be 12.
(divisions per element: 64 -> 4, 32 -> 3, 17 -> 2, 36 -> 3) = 4 + 3 + 2 + 3 = 12
Algorithm 2 (Better)
Start by equalizing pairs, moving from left to right.
With each division of the left number, increase the counter by the index of the right number (or the index of the left number + 1), because all elements to its left are already equal to it and have to be divided as well.
With each division of the right number, increase the counter by 1.
Continue until you reach the last pair.
Ex -
[64,32,17,36], div factor 2, counter (ctr) = 0
(64,32),17,36 -> (32,32),17,36 => steps = 1*1 = 1, ctr = 1
32,(32,17),36 -> 8,(8,8),36 => steps = 2*2 + 1 = 5, ctr = 6
8,8,(8,36) -> 4,4,(4,4) => steps = 1*3 + 3 = 6, ctr = 12
Ans = 12
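For checking results, here is a brute-force Python sketch. It is not either of the two algorithms above: it simply records every value each element can reach by repeated integer division, then picks the common value with the lowest total cost. It assumes non-negative integers and a division parameter of at least 2 (so every chain ends at 0 and a common value always exists); the function name is illustrative.

```python
def min_equalize_ops(arr, d):
    """Minimum number of integer divisions by d needed to make all elements equal."""
    total_cost = None                      # value -> total divisions for all elements so far
    for x in arr:
        steps, k = {}, 0
        while True:                        # record every value reachable from x
            steps[x] = k
            if x == 0:
                break
            x //= d
            k += 1
        if total_cost is None:
            total_cost = steps
        else:                              # keep only values every element can reach
            total_cost = {v: total_cost[v] + steps[v]
                          for v in total_cost if v in steps}
    return min(total_cost.values())


print(min_equalize_ops([64, 32, 16], 2))       # 3
print(min_equalize_ops([64, 33, 25], 2))       # 15
print(min_equalize_ops([64, 32, 17, 36], 2))   # 12
```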

Best way to find least standard deviation

I have a spreadsheet where I put numbers that represent number of verses on each paragraph of a book.
I manually distribute sequential paragraphs by number of verses, so in the spreadsheet I'll have something like this:
Verses Day
5 1
6 1
3 1
10 2
8 3
4 3
2 3
6 4
3 4
10 5
3 5
2 6
5 6
10 7
Standard deviation = 2.7080128015
By summing the total of verses for each day - in this case, 7 days - I get the standard deviation and try to reduce it for a better distribution of paragraphs.
The question is: what is the best way to find the least standard deviation?
I thought of using brute force to generate all possible combinations, but that is not a good idea as the number of paragraphs increases.
EDIT: The standard deviation is based on the total number of verses of each day, which are numbered sequentially. Day 1 has a total of 14 verses, day 2 has 10, and so on.
1 14
2 10
3 14
4 9
5 13
6 7
7 10
Standard deviation = 2.7080128015
Since the total number of verses and the number of days is constant, you want to minimize
$$\sum_{i} \left(\text{avg verse count} - \text{verse count of day } i\right)^2$$
avg verse count is a constant and simply the total number of verses divided by the number of days.
This problem can be solved with a dynamic program over the days. Let us build the partial solution function f(d, p) that gives us the minimal sum of squares for distributing paragraphs 0 through p over d days. We are interested in the last value of this function.
We can build the function incrementally. Calculating f(1, p) for any p is straightforward since we just need to calculate the difference to the average and square it. Then, for all other days, we can calculate
$$f(d, p) = \min_{i < p} \left[ f(d - 1, i) + \Big(\text{avg verse count} - \sum_{j=i+1}^{p} \text{verse count of paragraph } j\Big)^2 \right]$$
That means, we check the solutions for one day less and fill up the current day with the paragraphs between the previous day's end paragraph and p. While we calculate this function, we keep a pointer to the chosen minimum element (as usual for a dynamic program). When we are done calculating the entire function, we just follow the pointers back to the start, which will give us the partitioning.
The algorithm has a running time of O(d * p^2), where d is the number of days and p is the number of paragraphs.
Example Code
Here is some example C# code that implements the above algorithm:
using System;
using System.Linq;

struct Entry
{
    public double minCost;   // minimal sum of squared deviations
    public int predecessor;  // last paragraph of the previous day (-1 if none)
}

class Program
{
    public static void Main()
    {
        //input data
        int[] versesPerParagraph = { 5, 6, 3, 10, 8, 4, 2, 6, 3, 10, 3, 2, 5, 10 };
        int days = 7;

        //calculate constants
        double avgVerses = (double)versesPerParagraph.Sum() / days;

        //set up DP table (f(d,p))
        int paragraphs = versesPerParagraph.Length;
        Entry[,] dp = new Entry[days, paragraphs];

        //initialize table: first day takes paragraphs 0..p
        int verseCount = 0;
        for (int p = 0; p < paragraphs; ++p)
        {
            verseCount += versesPerParagraph[p];
            double diff = avgVerses - verseCount;
            dp[0, p].minCost = diff * diff;
            dp[0, p].predecessor = -1;
        }

        //run dynamic program
        for (int d = 1; d < days; ++d)
        {
            for (int p = d; p < paragraphs; ++p)
            {
                verseCount = 0;
                dp[d, p].minCost = double.MaxValue;
                //day d takes paragraphs i..p, the previous days end at i - 1
                for (int i = p; i >= d; --i)
                {
                    verseCount += versesPerParagraph[i];
                    double diff = avgVerses - verseCount;
                    double cost = dp[d - 1, i - 1].minCost + diff * diff;
                    if (cost < dp[d, p].minCost)
                    {
                        dp[d, p].minCost = cost;
                        dp[d, p].predecessor = i - 1;
                    }
                }
            }
        }

        //reconstruct the partitioning
        {
            int p = paragraphs - 1;
            for (int d = days - 1; d >= 0; --d)
            {
                int predecessor = dp[d, p].predecessor;
                //calculate number of verses, just to show them
                verseCount = 0;
                for (int i = predecessor + 1; i <= p; ++i)
                    verseCount += versesPerParagraph[i];
                Console.WriteLine($"Day {d} ranges from paragraph {predecessor + 1} to {p} and has {verseCount} verses.");
                p = predecessor;
            }
        }
    }
}
The output is:
Day 6 ranges from paragraph 13 to 13 and has 10 verses.
Day 5 ranges from paragraph 10 to 12 and has 10 verses.
Day 4 ranges from paragraph 9 to 9 and has 10 verses.
Day 3 ranges from paragraph 6 to 8 and has 11 verses.
Day 2 ranges from paragraph 4 to 5 and has 12 verses.
Day 1 ranges from paragraph 2 to 3 and has 13 verses.
Day 0 ranges from paragraph 0 to 1 and has 11 verses.
This partitioning gives a standard deviation of 1.15.
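For comparison, a compact Python sketch of the same dynamic program. It uses prefix sums and a table indexed by "number of days used" and "number of paragraphs consumed", which is a slightly different layout from the C# code; the function name and return shape are illustrative choices, not part of the answer.

```python
import statistics


def best_partition(verses, days):
    """Split verses into `days` consecutive groups, minimizing the sum of
    squared deviations of the group totals from the average total."""
    n = len(verses)
    avg = sum(verses) / days
    prefix = [0]
    for v in verses:
        prefix.append(prefix[-1] + v)

    inf = float("inf")
    # cost[d][p]: best cost for the first p paragraphs spread over d days
    cost = [[inf] * (n + 1) for _ in range(days + 1)]
    pred = [[-1] * (n + 1) for _ in range(days + 1)]
    cost[0][0] = 0.0
    for d in range(1, days + 1):
        for p in range(d, n + 1):
            for i in range(d - 1, p):      # day d covers paragraphs i..p-1
                if cost[d - 1][i] == inf:
                    continue
                c = cost[d - 1][i] + (avg - (prefix[p] - prefix[i])) ** 2
                if c < cost[d][p]:
                    cost[d][p], pred[d][p] = c, i

    # follow the predecessor pointers back to recover the day totals
    totals, p = [], n
    for d in range(days, 0, -1):
        i = pred[d][p]
        totals.append(prefix[p] - prefix[i])
        p = i
    totals.reverse()
    return cost[days][n], totals


best_cost, totals = best_partition([5, 6, 3, 10, 8, 4, 2, 6, 3, 10, 3, 2, 5, 10], 7)
print(totals, round(statistics.stdev(totals), 2))
```

On the data from the question this yields the day totals 11, 13, 12, 11, 10, 10, 10, matching the partitioning printed by the C# program.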

Torch - Query matrix with another matrix

I have an m x n tensor (Tensor 1) and another k x 2 tensor (Tensor 2) and I wish to extract all the values of Tensor 1 using indices based on Tensor 2. For example:
Tensor1
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
[torch.DoubleTensor of size 4x5]
Tensor2
2 1
3 5
1 1
4 3
[torch.DoubleTensor of size 4x2]
And the function would yield:
6
15
1
18
The first solution that comes to mind is to simply loop through the indices and pick the corresponding values:
function get_elems_simple(tensor, indices)
    local res = torch.Tensor(indices:size(1)):typeAs(tensor)
    local i = 0
    res:apply(
        function ()
            i = i + 1
            return tensor[indices[i]:clone():storage()]
        end)
    return res
end
Here tensor[indices[i]:clone():storage()] is just a generic way to pick an element from a multi-dimensional tensor. In k-dimensional case this is exactly analogous to tensor[{indices[i][1], ... , indices[i][k]}].
This method works fine if you don't have to extract lots of values (the bottleneck is :apply method which is not able to use many optimization techniques and SIMD instructions because the function it executes is a black box). The job can be done way more efficiently: the method :index does exactly what you need... with a one-dimensional tensor. Multi-dimensional target/index tensors need to be flattened:
function flatten_indices(sp_indices, shape)
    sp_indices = sp_indices - 1
    local n_elem, n_dim = sp_indices:size(1), sp_indices:size(2)
    local flat_ind = torch.LongTensor(n_elem):fill(1)

    local mult = 1
    for d = n_dim, 1, -1 do
        flat_ind:add(sp_indices[{{}, d}] * mult)
        mult = mult * shape[d]
    end
    return flat_ind
end

function get_elems_efficient(tensor, sp_indices)
    local flat_indices = flatten_indices(sp_indices, tensor:size())
    local flat_tensor = tensor:view(-1)
    return flat_tensor:index(1, flat_indices)
end
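The index arithmetic in flatten_indices can be illustrated without Torch. The following pure-Python sketch mirrors the same row-major computation with 1-based indices, applied to the 4x5 example from the question; the helper name flatten_index is an assumption made for this illustration.

```python
def flatten_index(idx, shape):
    """1-based row-major position of a 1-based multi-dimensional index."""
    flat, mult = 1, 1
    for d in range(len(shape) - 1, -1, -1):   # last dimension varies fastest
        flat += (idx[d] - 1) * mult
        mult *= shape[d]
    return flat


rows = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20]]
flat = [v for row in rows for v in row]       # flattened copy of the 4x5 tensor
queries = [(2, 1), (3, 5), (1, 1), (4, 3)]

# subtract 1 because Python lists are 0-based
picked = [flat[flatten_index(q, (4, 5)) - 1] for q in queries]
print(picked)   # [6, 15, 1, 18]
```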
The difference is drastic:
n = 500000
k = 100
a = torch.rand(n, k)
ind = torch.LongTensor(n, 2)
ind[{{}, 1}]:random(1, n)
ind[{{}, 2}]:random(1, k)
elems1 = get_elems_simple(a, ind) -- 4.53 sec
elems2 = get_elems_efficient(a, ind) -- 0.05 sec
print(torch.all(elems1:eq(elems2))) -- true

Calculating Total Number of Times of Loops

I'm trying to calculate the total number of times the innermost statement is executed.
count = 0;
for i = 1 to n
for j = 1 to n - i
count = count + 1
I figured that the most the loop can execute is O(n*n-i) = O(n^2). I wanted to prove this by using a double summation, but I'm getting lost: I'm having trouble setting up the equation because of the j = 1 lower bound.
Can someone help explain this to me?
Thanks
For each i, the inner loop executes n - i times (n is constant). Therefore (since i ranges from 1 to n), to determine the total number of times the innermost statement is executed, we must evaluate the sum
(n - 1) + (n - 2) + (n - 3) + ... + (n - n)
By rearranging the terms (grouping all the ns that appear first), we can see that this is equal to
n*n - (1 + 2 + 3 + ... + n) = n*n - n(n+1)/2 = n*(n-1)/2 = n*n/2 - n/2
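Since the question asks for a double summation, the same count can be written as one. This is only a restatement of the sum above, using the loop bounds from the pseudocode:

```latex
\sum_{i=1}^{n} \sum_{j=1}^{n-i} 1
  = \sum_{i=1}^{n} (n - i)
  = n \cdot n - \sum_{i=1}^{n} i
  = n^2 - \frac{n(n+1)}{2}
  = \frac{n(n-1)}{2}
```

The inner sum has no terms when n - i = 0, which matches the inner loop not running at all for i = n.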
Here's a simple implementation in Python to verify this:
def f(n):
    count = 0
    for i in range(1, n + 1):
        for _ in range(1, n - i + 1):
            count = count + 1
    return count

for n in range(1, 11):
    # (n*n - n) // 2 equals n*n/2 - n/2 and stays an integer in Python 3
    print(n, f(n), (n * n - n) // 2, sep='\t')
Output:
1 0 0
2 1 1
3 3 3
4 6 6
5 10 10
6 15 15
7 21 21
8 28 28
9 36 36
10 45 45
The first column is n, the second is the number of times that inner statement is executed, and the third is n*n/2 - n/2.
