Matrix multiplication in Fixed Point for 16 bits - math

I need to perform the matrix multiplication between different layers in a neural network. That is: W0, W1, W2, ... Wn are the weights of the neural network and the input is data. The resulting matrices are:
Out1 = data * W0
Out2 = Out1 * W1
Out3 = Out2 * W2
.
.
.
OutN = Out(N-1) * Wn
I know the absolute max value in the weight matrices and I also know that the input data values range from 0 to 1 (the inputs are normalized). The matrix multiplication is in fixed point with 16 bits. The weights are scaled to the optimal fixed-point format. For example: if the absolute maximum value in W0 is 2.5, I know that the minimum number of bits in the integer part is 2 and the fractional part will have 14 bits. Because the input data is in the range [0,1], I also know that its integer and fractional bits are 1.15.
My question is: How can I know the minimum number of bits in the integer part of the resulting matrix to avoid overflow? Is there any way to study and infer the maximum value in a matrix multiplication? I know about the determinant and norm of a matrix, but I think the problem is in the consecutive negative or positive values in the matrix rows and columns. For example, if I have this row vector and this column vector, and the result is in 8-bit fixed point:
A = [1, 2, 3, 4, 5, 6, -7, -8]
B = [1, 2, 3, 4, 5, 6, 7, 8]
A * B = (1*1) + (2*2) + (3*3) + (4*4) + (5*5) + (6*6) + (7*-7) + (8*-8) = 91 - 49 - 64 = -22
The sum accumulator reaches 91 after the sixth term, so overflow occurs, although the final result is contained in [-64,63].
Another example: If I have this row vector and this column vector, and the result is in 8-bit fixed point:
A = [1, -2, 3, -4, 5, -6, 7, -8]
B = [1, 2, 3, 4, 5, 6, 7, 8]
A * B = (1*1) - (2*2) + (3*3) - (4*4) + (5*5) - (6*6) + (7*7) - (8*8) = -36
Here the sum accumulator never exceeds the maximum range for 8 bits at any moment.
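The two examples can be checked by tracking every intermediate value of the accumulator. The following is a minimal sketch, not part of the original code; the function names `running_sums` and `overflows` are hypothetical, and the limits default to the [-64, 63] range used in the question.

```python
def running_sums(a, b):
    """Return the accumulator value after each multiply-accumulate step."""
    acc = 0
    sums = []
    for x, y in zip(a, b):
        acc += x * y
        sums.append(acc)
    return sums


def overflows(a, b, lo=-64, hi=63):
    """True if any intermediate accumulator value leaves [lo, hi]."""
    return any(s < lo or s > hi for s in running_sums(a, b))


B = [1, 2, 3, 4, 5, 6, 7, 8]
print(running_sums([1, 2, 3, 4, 5, 6, -7, -8], B))   # first example
print(running_sums([1, -2, 3, -4, 5, -6, 7, -8], B))  # second example
```

The first vector overflows on an intermediate sum even though its final result fits; the second never leaves the range.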
To sum up: I'm looking for a way to analyze the weight matrices to avoid overflow in the sum accumulator. The way I do the matrix multiplication is (only an example, assuming matrices A and B have been scaled to 1.15 format):
A1 --> 1.15 bits
B1 --> 1.15 bits
A2 --> 1.15 bits
B2 --> 1.15 bits
mult_1 = (A1 * B1) >> 15; // Right shift by 15 bits (divide by 2^15) to align the operands
mult_2 = (A2 * B2) >> 15; // Right shift by 15 bits (divide by 2^15) to align the operands
sum_acc = mult_1 + mult_2; // Sum accumulator
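The multiply-and-shift step above can be tried out with plain integers. This is only a sketch under stated assumptions: `to_q15`, `from_q15` and `q15_mul` are hypothetical helper names, and Python integers never overflow, so the sketch shows the scaling only and not the 16-bit wrap-around.

```python
FRAC_BITS = 15  # fractional bits of the 1.15 format


def to_q15(x):
    """Convert a real number to its 1.15 integer representation."""
    return int(round(x * (1 << FRAC_BITS)))


def from_q15(v):
    """Convert a 1.15 integer representation back to a real number."""
    return v / (1 << FRAC_BITS)


def q15_mul(a, b):
    """Multiply two 1.15 values; the raw product has 30 fractional bits,
    so shifting right by 15 brings it back to 15 fractional bits."""
    return (a * b) >> FRAC_BITS


mult_1 = q15_mul(to_q15(0.5), to_q15(0.5))
mult_2 = q15_mul(to_q15(0.25), to_q15(0.5))
sum_acc = mult_1 + mult_2
print(from_q15(sum_acc))
```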

Let's consider an n=100 dimensional dot product (which is part of any matrix multiplication or convolution) in %4.13 fixed point format as an example.
Integer bits
The max value in %4.13 is slightly below 2^4, so let's take it to be 15.999999.
An n dimensional dot product has n multiplications and n-1 additions:
15.999999*15.999999 + 15.999999*15.999999 + .... + 15.999999*15.999999
Each multiplication sums up the integer bits of its operands:
15.999999*15.999999 = 255.999999 -> ceil(log2(255)) = 8 = 2*(4) -> %8.13
Adding up all 100 of these products is, in the worst case, the same as:
255.999999*100 = 25599.999999 -> ceil(log2(25599)) = 15 = ceil(8+log2(100)) -> %15.13
So if n is the number of dimensions and i is the number of integer bits, the result needs:
i' = ceil((i*2)+log2(n))
integer bits... so:
%1.? -> 100*( 1.999999^2) = 399.99 -> % 9.?
%2.? -> 100*( 3.999999^2) = 1599.99 -> %11.?
%3.? -> 100*( 7.999999^2) = 6399.99 -> %13.?
%4.? -> 100*(15.999999^2) = 25599.99 -> %15.?
i(1) = ceil((1*2)+log2(100)) = ceil(2+6.644) = 9
i(2) = ceil((2*2)+log2(100)) = ceil(4+6.644) = 11
i(3) = ceil((3*2)+log2(100)) = ceil(6+6.644) = 13
i(4) = ceil((4*2)+log2(100)) = ceil(8+6.644) = 15
Fractional bits
OK, let's see what happens with multiplication:
0.1b^2 = 0.01b -> %?.1 -> %?.2
0.01b^2 = 0.0001b -> %?.2 -> %?.4
0.001b^2 = 0.000001b -> %?.3 -> %?.6
so f' = 2*f where f is the number of fractional bits. Addition does not change the fractional bit width:
0.1b*2 = 1.0b -> %?.1 -> %?.1
0.01b*2 = 0.1b -> %?.2 -> %?.2
0.001b*2 = 0.01b -> %?.3 -> %?.3
because the result will not be smaller than the operands. So when applying the fractional part to the dot product we will have:
i' = ceil((i*2)+log2(n))
f' = 2*f
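When the actual weights are known, the worst case can be tightened: for each output, every partial sum of the dot product is bounded in magnitude by the largest input magnitude times the sum of the absolute weights feeding that output. The sketch below is an illustration under assumptions, not a definitive implementation: the weights are taken as a list of rows (so one output uses one column, as in data * W), the function names are hypothetical, and the sign bit is not counted. For later layers, the bound of the previous layer would be passed as `max_abs_input`.

```python
import math


def integer_bits_needed(max_abs_value):
    """Smallest number of magnitude bits i such that max_abs_value < 2**i."""
    if max_abs_value < 1:
        return 0
    return math.floor(math.log2(max_abs_value)) + 1


def accumulator_bound(weights, max_abs_input=1.0):
    """Bound on |partial sum| over all outputs of data * W.

    weights: list of rows; output j uses column j.
    """
    n_cols = len(weights[0])
    column_l1 = [sum(abs(row[j]) for row in weights) for j in range(n_cols)]
    return max_abs_input * max(column_l1)


# Hypothetical 3x2 weight matrix, inputs normalized to [0, 1].
W0 = [[2.5, -1.0],
      [0.5, 1.5],
      [-2.0, 0.25]]
bound = accumulator_bound(W0, max_abs_input=1.0)
print(bound, integer_bits_needed(bound))
```

This bound also covers intermediate accumulator values, because a partial sum only uses a subset of the terms.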


Concatenation of binary representation of first n positive integers in O(logn) time complexity

I came across this question in a coding competition. Given a number n, concatenate the binary representation of first n positive integers and return the decimal value of the resultant number formed. Since the answer can be large return answer modulo 10^9+7.
N can be as large as 10^9.
Eg:- n=4. Number formed=11011100(1=1,10=2,11=3,100=4). Decimal value of 11011100=220.
I found a Stack Overflow answer to this question, but the problem is that it only contains an O(n) solution.
Link:- concatenate binary of first N integers and return decimal value
Since n can be up to 10^9 we need to come up with solution that is better than O(n).
Here's some Python code that provides a fast solution; it uses the same ideas as in Abhinav Mathur's post. It requires Python >= 3.8, but it doesn't use anything particularly fancy from Python, and could easily be translated into another language. You'd need to write algorithms for modular exponentiation and modular inverse if they're not already available in the target language.
First, for testing purposes, let's define the slow and obvious version:
# Modulus that results are reduced by.
M = 10 ** 9 + 7

def slow_binary_concat(n):
    """
    Concatenate binary representations of 1 through n (inclusive).
    Reinterpret the resulting binary string as an integer.
    """
    concatenation = "".join(format(k, "b") for k in range(1, n + 1))
    return int(concatenation, 2) % M
Checking that we get the expected result:
>>> slow_binary_concat(4)
220
>>> slow_binary_concat(10)
462911642
Now we'll write a faster version. First, we split the range [1, n) into subintervals such that within each subinterval, all numbers have the same length in binary. For example, the range [1, 10) would be split into four subintervals: [1, 2), [2, 4), [4, 8) and [8, 10). Here's a function to do that splitting:
def split_by_bit_length(n):
    """
    Split the numbers in [1, n) by bit-length.
    Produces triples (a, b, 2**k). Each triple represents a subinterval
    [a, b) of [1, n), with a < b, all of whose elements have bit-length k.
    """
    a = 1
    while n > a:
        b = 2 * a
        yield (a, min(n, b), b)
        a = b
Example output:
>>> list(split_by_bit_length(10))
[(1, 2, 2), (2, 4, 4), (4, 8, 8), (8, 10, 16)]
Now for each subinterval, the value of the concatenation of all numbers in that subinterval is represented by a fairly simple mathematical sum, which can be computed in exact form. Here's a function to compute that sum modulo M:
def subinterval_concat(a, b, l):
    """
    Concatenation of values in [a, b), all of which have the same bit-length k.
    l is 2**k.
    Equivalently, sum(i * l**(b - 1 - i) for i in range(a, b)) modulo M.
    """
    n = b - a
    inv = pow(l - 1, -1, M)
    q = (pow(l, n, M) - 1) * inv
    return (a * q + (q - n) * inv) % M
I won't go into the evaluation of the sum here: it's a bit off-topic for this site, and it's hard to express without a good way to render formulas. If you want the details, that's a topic for https://math.stackexchange.com, or a page of fairly simple algebra.
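Short of deriving the formula, the closed form can at least be checked numerically against the direct sum from the docstring. A minimal sketch, repeating the function so that it runs on its own; the name `subinterval_concat_direct` is hypothetical and only practical for small intervals.

```python
M = 10 ** 9 + 7


def subinterval_concat(a, b, l):
    """Closed form, as in the answer above."""
    n = b - a
    inv = pow(l - 1, -1, M)
    q = (pow(l, n, M) - 1) * inv
    return (a * q + (q - n) * inv) % M


def subinterval_concat_direct(a, b, l):
    """Direct evaluation of sum(i * l**(b - 1 - i)) for i in [a, b)."""
    return sum(i * l ** (b - 1 - i) for i in range(a, b)) % M


for a, b, l in [(1, 2, 2), (2, 4, 4), (4, 8, 8), (8, 10, 16)]:
    print(a, b, l, subinterval_concat(a, b, l), subinterval_concat_direct(a, b, l))
```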
Finally, we want to put all the intervals together. Here's a function to do that.
def fast_binary_concat(n):
    """
    Fast version of slow_binary_concat.
    """
    acc = 0
    for a, b, l in split_by_bit_length(n + 1):
        acc = (acc * pow(l, b - a, M) + subinterval_concat(a, b, l)) % M
    return acc
A comparison with the slow version shows that we get the same results:
>>> fast_binary_concat(4)
220
>>> fast_binary_concat(10)
462911642
But the fast version can easily be evaluated for much larger inputs, where using the slow version would be infeasible:
>>> fast_binary_concat(10**9)
827129560
>>> fast_binary_concat(10**18)
945204784
You just have to note a simple pattern. Taking up your example for n=4, let's gradually build the solution starting from n=1.
1 -> 1 #1
2 -> 2^2(1) + 2 #6
3 -> 2^2[2^2(1)+2] + 3 #27
4 -> 2^3{2^2[2^2(1)+2]+3} + 4 #220
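The nesting above is a simple recurrence: shift the value built so far left by the bit length of the next number, then add that number. As a minimal sketch (the function name `concat_recurrence` is hypothetical), this is still the O(n) approach and only illustrates the pattern, not the fast solution:

```python
M = 10 ** 9 + 7


def concat_recurrence(n):
    """Build the concatenated value step by step: acc = acc * 2^D(k) + k."""
    acc = 0
    for k in range(1, n + 1):
        acc = (acc * (1 << k.bit_length()) + k) % M
    return acc


for n in range(1, 5):
    print(n, concat_recurrence(n))
```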
If you expand the coefficients of each term for n=4, you'll get the coefficients as:
1 -> (2^3)*(2^2)*(2^2)
2 -> (2^3)*(2^2)
3 -> (2^3)
4 -> (2^0)
Let the N be total number of bits in the string representation of our required number, and D(x) be the number of bits in x. The coefficients can then be written as
1 -> 2^(N-D(1))
2 -> 2^(N-D(1)-D(2))
3 -> 2^(N-D(1)-D(2)-D(3))
... and so on
Since the value of D(x) is the same for all x in the range [2^t, 2^(t+1)-1] for a given t, you can break the problem into such ranges and solve each range using mathematics (not iteration). Since the number of such ranges is about log2(n) for the given n, this should work within the given time limit.
As an example, the various ranges become:
1. 1 (D(x) = 1)
2. 2-3 (D(x) = 2)
3. 4-7 (D(x) = 3)
4. 8-15 (D(x) = 4)

Continued fractions and Pell's equation - numerical issues

Mathematical background
Continued fractions are a way to represent numbers (rational or not), with a basic recursion formula to calculate it. Given a number r, we define r[0]=r and have:
for n in range(0..N):
    a[n] = floor(r[n])
    if r[n] == a[n]: break
    r[n+1] = 1 / (r[n]-a[n])
where a is the final representation. We can also define a series of convergents by
h[-2,-1] = [0, 1]
k[-2, -1] = [1, 0]
h[n] = a[n]*h[n-1]+h[n-2]
k[n] = a[n]*k[n-1]+k[n-2]
where h[n]/k[n] converge to r.
Pell's equation is a problem of the form x^2-D*y^2=1 where all numbers are integers and D is not a perfect square in our case. A solution for a given D that minimizes x is given by continued fractions. Basically, for the above equation, it is guaranteed that this (fundamental) solution is x=h[n] and y=k[n] for the lowest n found which solves the equation in the continued fraction expansion of sqrt(D).
Problem
I am failing to get this simple algorithm work for D=61. I first noticed it did not solve Pell's equation for 100 coefficients, so I compared it against Wolfram Alpha's convergents and continued fraction representation and noticed the 20th elements fail - the representation is 3 compared to 4 that I get, yielding different convergents - h[20]=335159612 on Wolfram compared to 425680601 for me.
I tested the code below, two languages (though to be fair, Python is C under the hood I guess), on two systems and get the same result - a diff on loop 20. I'll note that the convergents are still accurate and converge! Why am I getting different results compared to Wolfram Alpha, and is it possible to fix it?
For testing, here's a Python program to solve Pell's equation for D=61, printing first 20 convergents and the continued fraction representation cf (and some extra unneeded fluff):
from math import floor, sqrt # Can use mpmath here as well.

def continued_fraction(D, count=100, thresh=1E-12, verbose=False):
    cf = []
    h = (0, 1)
    k = (1, 0)
    r = start = sqrt(D)
    initial_count = count
    x = (1+thresh+start)*start
    y = start
    while abs(x/y - start) > thresh and count:
        i = int(floor(r))
        cf.append(i)
        f = r - i
        x, y = i*h[-1] + h[-2], i*k[-1] + k[-2]
        if verbose is True or verbose == initial_count-count:
            print(f'{x}\u00B2-{D}x{y}\u00B2 = {x**2-D*y**2}')
        if x**2 - D*y**2 == 1:
            print(f'{x}\u00B2-{D}x{y}\u00B2 = {x**2-D*y**2}')
            print(cf)
            return
        count -= 1
        r = 1/f
        h = (h[1], x)
        k = (k[1], y)
    print(cf)
    raise OverflowError(f"Converged on {x} {y} with count {count} and diff {abs(start-x/y)}!")

continued_fraction(61, count=20, verbose=True, thresh=-1) # We don't want to stop on account of thresh in this example
A C program doing the same:
#include<stdio.h>
#include<math.h>
#include<stdlib.h>
int main() {
    long D = 61;
    double start = sqrt(D);
    long h[] = {0, 1};
    long k[] = {1, 0};
    int count = 20;
    float thresh = 1E-12;
    double r = start;
    long x = (1+thresh+start)*start;
    long y = start;
    /* fabs, not abs: the argument is a double */
    while(fabs(x/(double)y-start) > -1 && count) {
        long i = floor(r);
        double f = r - i;
        x = i * h[1] + h[0];
        y = i * k[1] + k[0];
        /* the last value is a long, so it is printed with %ld */
        printf("%ld\u00B2-%ldx%ld\u00B2 = %ld\n", x, D, y, x*x-D*y*y);
        r = 1/f;
        --count;
        h[0] = h[1];
        h[1] = x;
        k[0] = k[1];
        k[1] = y;
    }
    return 0;
}
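The difference comes from double-precision rounding: every step computes r = 1/f, which magnifies the error already present in f, so after about 20 iterations the floor of r is off. mpmath, Python's multi-precision library, can be used to avoid this. Just be careful that all the important numbers are in mp format.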
In the code below, x, y and i are standard multi-precision integers. r and f are multi-precision real numbers. Note that the initial count is set higher than 20.
from mpmath import mp, mpf

mp.dps = 50 # precision in number of decimal digits

def continued_fraction(D, count=22, thresh=mpf(1E-12), verbose=False):
    cf = []
    h = (0, 1)
    k = (1, 0)
    r = start = mp.sqrt(D)
    initial_count = count
    x = 0 # some dummy starting values, they will be overwritten early in the while loop
    y = 1
    while abs(x/y - start) > thresh and count > 0:
        i = int(mp.floor(r))
        cf.append(i)
        x, y = i*h[-1] + h[-2], i*k[-1] + k[-2]
        if verbose or initial_count == count:
            print(f'{x}\u00B2-{D}x{y}\u00B2 = {x**2-D*y**2}')
        if x**2 - D*y**2 == 1:
            print(f'{x}\u00B2-{D}x{y}\u00B2 = {x**2-D*y**2}')
            print(cf)
            return
        count -= 1
        f = r - i
        r = 1/f
        h = (h[1], x)
        k = (k[1], y)
    print(cf)
    raise OverflowError(f"Converged on {x} {y} with count {count} and diff {abs(start-x/y)}!")

continued_fraction(61, count=22, verbose=True, thresh=mpf(1e-100))
Output is similar to wolfram's:
...
335159612²-61x42912791² = 3
1431159437²-61x183241189² = -12
1766319049²-61x226153980² = 1
[7, 1, 4, 3, 1, 2, 2, 1, 3, 4, 1, 14, 1, 4, 3, 1, 2, 2, 1, 3, 4, 1]
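An alternative that avoids floating point altogether is the standard integer recurrence for the continued fraction of sqrt(D), which tracks the complete quotient as (sqrt(D) + m) / d with integer m and d. This is a different technique from the code above, shown only as a minimal sketch; the function name `pell_fundamental` is hypothetical and `math.isqrt` requires Python 3.8 or later.

```python
from math import isqrt


def pell_fundamental(D):
    """Smallest positive (x, y) with x*x - D*y*y == 1, using integers only."""
    a0 = isqrt(D)
    if a0 * a0 == D:
        raise ValueError("D must not be a perfect square")
    m, d, a = 0, 1, a0        # complete quotient is (sqrt(D) + m) / d
    h_prev, h = 1, a0         # convergent numerators h[-1], h[0]
    k_prev, k = 0, 1          # convergent denominators k[-1], k[0]
    while h * h - D * k * k != 1:
        m = d * a - m
        d = (D - m * m) // d  # this division is always exact
        a = (a0 + m) // d
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k


print(pell_fundamental(61))
```

Because every quantity is an exact integer, the coefficients cannot drift no matter how long the expansion runs.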

How do I make 100 = 1? (explanation within)

Right now I have a code that can find the number of combinations of a sum of a value using numbers greater than zero and less than the value.
I need to alter the value in order to expand the combinations so that they include more than just the value.
For example:
The number 10 yields the results:
[1, 2, 3, 4], [1, 2, 7],
[1, 3, 6], [1, 4, 5],
[1, 9], [2, 3, 5], [2, 8],
[3, 7], [4, 6]
But I need to expand this to including any number that collapses to 1 as well. Because in essence, I need 100 = n in that the sum of the individual numbers within the digits = n. So in this case 100 = 1 because 100 --> 1+0+0 = 1
Therefore the number 1999 will also be a valid combination to list for value = 100 because 1999 = 1+9+9+9 = 28, and 28 = 2+8 = 10, and 10 = 1+0 = 1
Now I realize that this will yield an infinite series of combinations, so I will need to set limits to the range I want to acquire data for. This is the current code I am using to find my combinations.
def a(lst, target, with_replacement=False):
    def _a(idx, l, r, t, w):
        if t == sum(l): r.append(l)
        elif t < sum(l): return
        for u in range(idx, len(lst)):
            _a(u if w else (u + 1), l + [lst[u]], r, t, w)
        return r
    return _a(0, [], [], target, with_replacement)

for val in range(100,101):
    s = range(1, val)
    solutions = a(s, val)
    print(solutions)
    print('Value:', val, "Combinations", len(solutions))
You seem to have multiple issues.
To repeatedly add the decimal digits of an integer until you end with a single digit, you could use this code.
d = val
while d > 9:
    d = sum(int(c) for c in str(d))
This acts in just the way you describe. However, there is an easier way. Repeatedly adding the decimal digits of a number is called casting out nines and results in the digital root of the number. This almost equals the remainder of the number when divided by nine, except that you want to get a result of 9 rather than 0. So easier and faster code is
d = val % 9
if d == 0:
    d = 9
or perhaps the shorter but trickier
d = (val - 1) % 9 + 1
or the even-more-tricky
d = val % 9 or 9
To find all numbers that end up at 7 (for example, or any digit from 1 to 9) you just want all numbers with the remainder 7 when divided by 9. So start at 7 and keep adding 9 and you get all such values.
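The "start at the digit and keep adding 9" idea can be sketched as follows, with an upper limit to keep the list finite. The function names are hypothetical; `digital_root_slow` is included only to cross-check the shortcut.

```python
def digital_root(n):
    """Digital root of a positive integer via the mod-9 shortcut."""
    return (n - 1) % 9 + 1


def digital_root_slow(n):
    """Digital root by repeatedly summing decimal digits."""
    while n > 9:
        n = sum(int(c) for c in str(n))
    return n


def numbers_with_root(root, limit):
    """All integers in [1, limit] whose digital root is `root` (1 to 9)."""
    return list(range(root, limit + 1, 9))


print(numbers_with_root(1, 100))
print(digital_root(1999), digital_root_slow(1999))
```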
The approach you are using to find all partitions of 7 then arranging them into numbers is much more complicated and slower than necessary.
To find all numbers that end up at 16 (for example, or any integer greater than 9) your current approach may be best. It is difficult otherwise to avoid the numbers that directly add to 7 or to 25 without going through 16. If this is really what you mean, say so in your question and we can look at this situation further.

Find row of pyramid based on index?

Given a pyramid like:
0
1 2
3 4 5
6 7 8 9
...
and given the index of the pyramid i where i represents the ith number of the pyramid, is there a way to find the index of the row to which the ith element belongs? (e.g. if i = 6,7,8,9, it is in the 3rd row, starting from row 0)
There's a connection between the row numbers and the triangular numbers. The nth triangular number, denoted Tn, is given by Tn = n(n+1)/2. The first few triangular numbers are 0, 1, 3, 6, 10, 15, etc., and if you'll notice, the start of each row is given by the nth triangular number (the fact that they come from this triangle is where the name comes from).
So really, the goal here is to determine the largest n such that Tn ≤ i. Without doing any clever math, you could solve this in time O(√i) by just computing T0, T1, T2, etc. until you find something bigger than i. Even better, you could binary search for it in time O(log i) by computing T1, T2, T4, T8, etc. until you overshoot, then binary searching on the range you found.
Alternatively, we could try to solve for this directly. Suppose we want to find the choice of n such that
n(n + 1) / 2 = i
Expanding, we get
n^2 / 2 + n / 2 = i.
Equivalently,
n^2 / 2 + n / 2 - i = 0,
or, more easily:
n^2 + n - 2i = 0.
Now we use the quadratic formula:
n = (-1 ± √(1 + 8i)) / 2
The negative root we can ignore, so the value of n we want is
n = (-1 + √(1 + 8i)) / 2.
This number won't necessarily be an integer, so to find the row you want, we just round down:
row = ⌊(-1 + √(1 + 8i)) / 2⌋.
In code:
int row = int((-1 + sqrt(1 + 8 * i)) / 2);
Let's confirm that this works by testing it out a bit. Where does 9 go? Well, we have
(-1 + √(1 + 72)) / 2 = (-1 + √73) / 2 = 3.77
Rounding down, we see it goes in row 3 - which is correct!
Trying another one, where does 55 go? Well,
(-1 + √(1 + 440)) / 2 = (√441 - 1) / 2 = 10
So it should go in row 10. The tenth triangular number is T10 = 55, so in fact, 55 starts off that row. Looks like it works!
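For large indices the floating-point square root can round the wrong way near a row boundary. A minimal Python sketch of the same formula using an exact integer square root (the function name `pyramid_row` is hypothetical; `math.isqrt` requires Python 3.8 or later):

```python
from math import isqrt


def pyramid_row(i):
    """Row (starting from 0) that contains index i of the pyramid."""
    return (isqrt(8 * i + 1) - 1) // 2


print([pyramid_row(i) for i in range(10)])
print(pyramid_row(55))
```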
I get row = math.floor(√(2i + 0.25) - 0.5), where i is your number.
Essentially the same as the answer above, but I reduced n^2 + n to (n + 0.5)^2 - 0.25.
I think the ith element belongs to the nth row, where n is the number satisfying n(n+1)/2 <= i < (n+1)(n+2)/2.
For example, if i = 6, then n = 3 because 3*4/2 = 6 <= 6 < 10 = 4*5/2,
and if i = 8, then n = 3 because 6 <= 8 < 10.

Minimum number of element required to make a sequence that sums to a particular number

Suppose there is a number s=12. I want to make a sequence of elements with a1+a2+.....+an=12.
The criteria are as follows:
n must be minimum.
a1 and an must be 1.
ai can differ from a(i-1) only by 1, 0 or -1.
For s=12 the result is 6.
So how do I find the minimum value of n?
Algorithm for finding n from given s:
1.Find q = FLOOR( SQRT(s-1) )
2.Find r = q^2 + q
3.If s <= r then n = 2q, else n = 2q + 1
Example: s = 12
q = FLOOR( SQRT(12-1) ) = FLOOR( SQRT(11) ) = 3
r = 3^2 + 3 = 12
12 <= 12, therefore n = 2*3 = 6
Example: s = 160
q = FLOOR( SQRT(160-1) ) = FLOOR( SQRT(159) ) = 12
r = 12^2 + 12 = 156
160 > 156, therefore n = 2*12 + 1 = 25
and a 25-number sequence for
160: 1,2,3,4,5,6,7,8,9,10,10,10,10,10,10,10,9,8,7,6,5,4,3,2,1
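The three steps translate directly into code. The sketch below is an illustration with hypothetical function names; `max_sum` is the largest total reachable with n elements (1, 2, ..., peak, ..., 2, 1) and is used only to cross-check that the computed n is the smallest length whose maximum reaches s.

```python
from math import isqrt


def min_length(s):
    """Minimum n according to the three-step algorithm above (s >= 1)."""
    q = isqrt(s - 1)
    r = q * q + q
    return 2 * q if s <= r else 2 * q + 1


def max_sum(n):
    """Largest sum of a valid sequence with n elements."""
    if n % 2 == 0:
        return (n * n + 2 * n) // 4
    return (n + 1) ** 2 // 4


print(min_length(12), min_length(160))
```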
Here's a way to visualize the solution.
First, draw the smallest triangle (rows containing successive odd numbers of stars) that has at least s stars. In this case, we draw a 16-star triangle.
   *
  ***
 *****
*******
Then we have to remove 16 - 12 = 4 stars. We do this diagonally starting from the top.
   1
  **2
 ****3
******4
The result is:
  **
 ****
******
Finally, add up the column heights to get the final answer:
1, 2, 3, 3, 2, 1.
There are two cases: n even and n odd. When n is even, the largest sequence is:
1, 2, 3, ..., n/2, n/2, n/2-1, n/2-2, ..., 1
when n is odd it is:
1, 2, 3, ..., (n+1)/2, (n+1)/2-1, (n+1)/2-2, ..., 1
The maximum possible for any given series of length n is:
n is even => (n^2+2n)/4
n is odd => (n+1)^2/4
These two results are arrived at easily enough by looking at the simple arithmetic sum of the series: in the case of n even it is twice the sum of the series 1...n/2. In the case of n odd it is twice the sum of the series 1...(n-1)/2 plus (n+1)/2 (the middle element).
Clearly you can generate any positive number that is less than this max as long as n>3.
So the problem then becomes finding the smallest n with a max greater than or equal to your target.
Algorithmically I'd go for:
Find (sqrt(4*s)-1) and round up to the next odd number. Call this M. This is an easy value to work out and will represent the lowest odd n that will work.
Check M-1 to see if its max sum is at least s. If so, then your n is M-1. Otherwise your n is M.
Thanks to all of you who answered. I derived a simpler solution. The algorithm looks like this:
First find the maximum sum that can be made using n elements:
if n=1 -> 1 sum=1;
if n=2 -> 1,1 sum=2;
if n=3 -> 1,2,1 sum=4;
if n=4 -> 1,2,2,1 sum=6;
if n=5 -> 1,2,3,2,1 sum=9;
if n=6 -> 1,2,3,3,2,1 sum=12;
So from observation it is clear that any number s with 9<s<=12 can be
made using 6 elements; similarly, any number s with
6<s<=9 can be made using 5 elements.
So it requires only a binary search to find the number of
elements that make a particular number.
