Modular inverses and unsigned integers - math

Modular inverses can be computed as follows (from Rosetta Code):
#include <stdio.h>
int mul_inv(int a, int b)
{
int b0 = b, t, q;
int x0 = 0, x1 = 1;
if (b == 1) return 1;
while (a > 1) {
q = a / b;
t = b, b = a % b, a = t;
t = x0, x0 = x1 - q * x0, x1 = t;
}
if (x1 < 0) x1 += b0;
return x1;
}
However, the inputs are ints, as you can see. Would the above code work for unsigned integers (e.g. uint64_t) as well? I mean, would it be OK to replace every int with uint64_t? I could try a few inputs, but it is not feasible to try all 64-bit combinations.
I'm specifically interested in two aspects:
for values in [0, 2^64) of both a and b, would every calculation avoid overflow/underflow (or overflow harmlessly)?
what would (x1 < 0) look like in the unsigned case?

First of all, how does this algorithm work? It is based on the Extended Euclidean algorithm for computing the GCD. In short, the idea is the following: if we can find integer coefficients m and n such that
a*m + b*n = 1
then m will be the answer for the modular inverse problem. It is easy to see because
a*m + b*n = a*m (mod b)
Luckily the Extended Euclidean algorithm does exactly that: if a and b are co-prime, it finds such m and n. It works in the following way: for each iteration track two triplets (ai, xai, yai) and (bi, xbi, ybi) such that at every step
ai = a0*xai + b0*yai
bi = a0*xbi + b0*ybi
so when finally the algorithm stops at the state of ai = 0 and bi = GCD(a0,b0), then
1 = GCD(a0,b0) = a0*xbi + b0*ybi
Each step relies on writing the remainder explicitly: if
q = a / b
r = a % b
then
r = a - q * b
Another important point is that, for positive a and b, it can be proven that at every step |xai|,|xbi| <= b and |yai|,|ybi| <= a, so the magnitudes of those coefficients cannot overflow. Unfortunately, negative values are possible; moreover, on every step after the first, one coefficient in each equation is positive and the other is negative.
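As a minimal sketch of the invariant described above (not the code from the question), the following Python version tracks both full triplets and asserts the two equations on every iteration. The names extended_gcd and mod_inverse are illustrative only, the loop stops with the GCD in the first triplet rather than the second, and the modulus is assumed to be at least 1.

```python
def extended_gcd(a0, b0):
    """Return (g, x, y) with g = gcd(a0, b0) and a0*x + b0*y == g."""
    # Each triplet (r, x, y) satisfies r == a0*x + b0*y at every step.
    a, xa, ya = a0, 1, 0
    b, xb, yb = b0, 0, 1
    while b != 0:
        q = a // b
        # r = a - q*b, applied to the remainder and to both coefficients
        a, b = b, a - q * b
        xa, xb = xb, xa - q * xb
        ya, yb = yb, ya - q * yb
        assert a == a0 * xa + b0 * ya
        assert b == a0 * xb + b0 * yb
    return a, xa, ya


def mod_inverse(a, m):
    """Inverse of a modulo m (m >= 1 assumed), or None when gcd(a, m) != 1."""
    g, x, _ = extended_gcd(a % m, m)
    return x % m if g == 1 else None


print(mod_inverse(7, 45))
print(mod_inverse(6, 9))
```

Python integers are signed and unbounded, so the sign handling that the uint64_t version needs does not show up here; the sketch only illustrates the bookkeeping.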
What the code in your question does is a reduced version of the same algorithm: since all we are interested in is the x[a/b] coefficients, it tracks only them and ignores the y[a/b] ones. The simplest way to make that code work for uint64_t is to track the sign explicitly in a separate field like this:
#include <stdbool.h>
#include <stdint.h>
typedef struct tag_uint64AndSign {
uint64_t value;
bool isNegative;
} uint64AndSign;
uint64_t mul_inv(uint64_t a, uint64_t b)
{
if (b <= 1 || a == 0) // no inverse exists in these cases
return 0;
uint64_t b0 = b;
uint64AndSign x0 = { 0, false }; // b = 1*b + 0*a
uint64AndSign x1 = { 1, false }; // a = 0*b + 1*a
while (a > 1)
{
if (b == 0) // means original A and B were not co-prime so there is no answer
return 0;
uint64_t q = a / b;
// (b, a) := (a % b, b)
// which is the same as
// (b, a) := (a - q * b, b)
uint64_t t = b; b = a % b; a = t;
// (x0, x1) := (x1 - q * x0, x0)
uint64AndSign t2 = x0;
uint64_t qx0 = q * x0.value;
if (x0.isNegative != x1.isNegative)
{
x0.value = x1.value + qx0;
x0.isNegative = x1.isNegative;
}
else
{
x0.value = (x1.value > qx0) ? x1.value - qx0 : qx0 - x1.value;
x0.isNegative = (x1.value > qx0) ? x1.isNegative : !x0.isNegative;
}
x1 = t2;
}
return x1.isNegative ? (b0 - x1.value) : x1.value;
}
Note that if a and b are not co-prime or when b is 0 or 1, this problem has no solution. In all those cases my code returns 0 which is an impossible value for any real solution.
Note also that although the calculated value is really the modular inverse, simple multiplication will not always produce 1 because of the overflow at multiplication over uint64_t. For example for a = 688231346938900684 and b = 2499104367272547425 the result is inv = 1080632715106266389
a * inv = 688231346938900684 * 1080632715106266389 =
= 743725309063827045302080239318310076 =
= 2499104367272547425 * 297596738576991899 + 1 =
= b * 297596738576991899 + 1
But if you do a naive multiplication of those a and inv of type uint64_t, you'll get 4042520075082636476 so (a*inv)%b will be 1543415707810089051 rather than expected 1.
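The effect can be reproduced with a short Python sketch that uses the numbers quoted above. Python integers do not overflow, so the 64-bit wrap-around is simulated with a mask. The helper mulmod is a hypothetical illustration of one common workaround (double-and-add), and it assumes the modulus is below 2^63 so that every intermediate sum would fit in a uint64_t.

```python
MASK64 = (1 << 64) - 1  # keeps the low 64 bits, like uint64_t arithmetic

a = 688231346938900684
b = 2499104367272547425
inv = 1080632715106266389

exact = a * inv                # true product, no overflow in Python
wrapped = (a * inv) & MASK64   # what a uint64_t product would hold

print(exact % b)    # residue of the true product
print(wrapped % b)  # residue after 64-bit wrap-around


def mulmod(x, y, m):
    """(x * y) % m using only values below 2*m (assumes m < 2**63)."""
    result = 0
    x %= m
    while y > 0:
        if y & 1:
            result = (result + x) % m
        x = (x + x) % m
        y >>= 1
    return result


print(mulmod(a, inv, b))
```

In C the same idea can be written with uint64_t, or replaced by a 128-bit intermediate where the compiler offers one.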

The mod_inv C function:
returns a modular multiplicative inverse of n with respect to the modulus
returns 0 if the linear congruence has no solution
unsigned mod_inv(unsigned n, const unsigned mod) {
unsigned a = mod, b = a, c = 0, d = 0, e = 1, f, g;
for (n *= a > 1; n > 1 && (n *= a > 0); e = g, c = (c & 3) | (c & 1) << 2) {
g = d, d *= n / (f = a);
a = n % a, n = f;
c = (c & 6) | (c & 2) >> 1;
f = c > 1 && c < 6;
c = (c & 5) | (f || e > d ? (c & 4) >> 1 : ~c & 2);
d = f ? d + e : e > d ? e - d : d - e;
}
return n ? c & 4 ? b - e : e : 0;
}
Examples
n = 7 and mod = 45 then res = 13 so 1 == ( 13 * 7 ) % 45
n = 52 and mod = 107 then res = 35 so 1 == ( 35 * 52 ) % 107
n = 213 and mod = 155 then res = 147 so 1 == ( 147 * 213 ) % 155
n = 392 and mod = 45 then res = 38 so 1 == ( 38 * 392 ) % 45
n = 3708141711 and mod = 4280761040 it still works...

Related

canberra distance - inconsistent results

I'm trying to understand what's going on with my calculation of the Canberra distance. I wrote my own simple canberra.distance function, but its results are not consistent with the dist function. I added the option na.rm = T to my function so that the sum can still be calculated when a denominator is zero. From ?dist I understand that it uses a similar approach: Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing.
canberra.distance <- function(a, b){
sum( (abs(a - b)) / (abs(a) + abs(b)), na.rm = T )
}
a <- c(0, 1, 0, 0, 1)
b <- c(1, 0, 1, 0, 1)
canberra.distance(a, b)
> 3
# the result that I expected
dist(rbind(a, b), method = "canberra")
> 3.75
a <- c(0, 1, 0, 0)
b <- c(1, 0, 1, 0)
canberra.distance(a, b)
> 3
# the result that I expected
dist(rbind(a, b), method = "canberra")
> 4
a <- c(0, 1, 0)
b <- c(1, 0, 1)
canberra.distance(a, b)
> 3
dist(rbind(a, b), method = "canberra")
> 3
# now the results are the same
Pairs 0-0 and 1-1 seem to be problematic. In the first case (0-0) both numerator and denominator are equal to zero and this pair should be omitted. In the second case (1-1) numerator is 0 but denominator is not and the term is then also 0 and the sum should not change.
What am I missing here?
EDIT:
To be in line with R definition, function canberra.distance can be modified as follows:
canberra.distance <- function(a, b){
sum( abs(a - b) / abs(a + b), na.rm = T )
}
However, the results are the same as before.
This might shed some light on the difference. As far as I can see this is the actual code being run for computing the distance
static double R_canberra(double *x, int nr, int nc, int i1, int i2)
{
double dev, dist, sum, diff;
int count, j;
count = 0;
dist = 0;
for(j = 0 ; j < nc ; j++) {
if(both_non_NA(x[i1], x[i2])) {
sum = fabs(x[i1] + x[i2]);
diff = fabs(x[i1] - x[i2]);
if (sum > DBL_MIN || diff > DBL_MIN) {
dev = diff/sum;
if(!ISNAN(dev) ||
(!R_FINITE(diff) && diff == sum &&
/* use Inf = lim x -> oo */ (int) (dev = 1.))) {
dist += dev;
count++;
}
}
}
i1 += nr;
i2 += nr;
}
if(count == 0) return NA_REAL;
if(count != nc) dist /= ((double)count/nc);
return dist;
}
I think the culprit is this line
if(!ISNAN(dev) ||
(!R_FINITE(diff) && diff == sum &&
/* use Inf = lim x -> oo */ (int) (dev = 1.)))
which handles a special case and may not be documented.
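The rescaling step in the quoted function, if(count != nc) dist /= ((double)count/nc), also appears to account for the numbers in the question: when a 0-0 pair is skipped, the remaining sum is divided by count/nc. The Python sketch below mirrors that logic under simplifying assumptions (no NA or infinite values, and a plain "> 0" test instead of DBL_MIN); canberra_like_r is an illustrative name, not an R function.

```python
def canberra_like_r(a, b):
    """Canberra distance with the omitted-term rescaling seen in R_canberra."""
    total = 0.0
    count = 0
    for x, y in zip(a, b):
        s = abs(x + y)
        d = abs(x - y)
        if s > 0 or d > 0:      # skip terms where numerator and denominator are 0
            total += d / s
            count += 1
    if count == 0:
        return float("nan")
    if count != len(a):
        total /= count / len(a)  # scale up for the omitted terms
    return total


print(canberra_like_r([0, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
print(canberra_like_r([0, 1, 0, 0], [1, 0, 1, 0]))
print(canberra_like_r([0, 1, 0], [1, 0, 1]))
```

With the vectors from the question, the first case sums to 3 over 4 counted terms out of 5, so the result is 3 / (4/5); the second is 3 / (3/4); the third has no omitted term and stays at 3.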

Triple Modular Multiplication

I am calculating the following sum:
(a[x] + m*a[x-1] + 2*m*a[x-2] + 3*m*a[x-3] + ...) % MOD (MOD = 1e9+7)
For this, I am using this loop.
long long mulmod(long long a,long long b,long long c)
{
if (a == 0 || b == 0)
return 0;
if (a == 1)
return b;
if (b == 1)
return a;
long long a2 = mulmod(a, b / 2, c);
if ((b & 1) == 0)
{
return (a2 + a2) % c;
}
else
{
return ((a % c) + (a2 + a2)) % c;
}
}
res=a[x]%MOD;
for(i=x-1;i>=1;i--)
res=(res%MOD+mulmod(mulmod(x-i,m,MOD),a[i],MOD))%MOD;
However, this is still giving me overflow errors. The basic error, I guess, is in (a*b*c)%MOD.
Thank you.
You need to incorporate the following modular arithmetic identities into your program to avoid overflow:
(A + B + ...) mod C = (A mod C + B mod C + ... mod C) mod C
and
(A * B * ...) mod C = (A mod C * B mod C * ... mod C) mod C
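A minimal sketch of how these identities might be applied to the sum from the question, assuming a is indexed from 1 as in the original loop. Python integers cannot overflow, so the assertion only documents that every product multiplies two values already reduced below MOD and would therefore fit in a signed 64-bit integer. The function name weighted_sum is illustrative.

```python
MOD = 10**9 + 7
INT64_MAX = 2**63 - 1


def weighted_sum(a, x, m):
    """(a[x] + 1*m*a[x-1] + 2*m*a[x-2] + ...) % MOD, with a indexed from 1."""
    res = a[x] % MOD
    m %= MOD
    for i in range(x - 1, 0, -1):
        coeff = ((x - i) % MOD) * m % MOD   # reduce before and after multiplying
        ai = a[i] % MOD
        assert coeff * ai <= INT64_MAX      # both factors are below MOD
        res = (res + coeff * ai % MOD) % MOD
    return res


a = [0, 10**18, 10**18 - 1, 5]   # a[0] is an unused placeholder
print(weighted_sum(a, 3, 10**17))
```

Since MOD is about 10^9, a product of two reduced values stays below roughly 10^18, so a direct multiplication is enough and a recursive mulmod is not needed for this modulus.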

Is there an algorithm known for power towers modulo a number managing all cases?

I would like to have an implementation in PARI/GP
for the calculation of
a_1 ^ a_2 ^ ... ^ a_n (mod m)
which manages all cases, especially the cases where high powers appear in the phi-chain.
Does anyone know such an implementation ?
Here's a possibility using Chinese remainders to make sure the modulus is a prime power. This simplifies the computation of x^n mod m in the painful case where gcd(x,m) is not 1. The code assumes the a_i are > 1; most of the code checks whether p^a_1^a_2^...^a_n is 0 mod (p^e) for a prime number p, while avoiding overflow.
\\ x[1]^x[2]^ ...^ x[#x] mod m, assuming x[i] > 1 for all i
tower(x, m) =
{ my(f = factor(m), P = f[,1], E = f[,2]);
chinese(vector(#P, i, towerp(x, P[i], E[i])));
}
towerp(x, p, e) =
{ my(q = p^e, i, t, v);
if (#x == 0, return (Mod(1, q)));
if (#x == 1, return (Mod(x[1], q)));
if (v = valuation(x[1], p),
t = x[#x]; i = #x;
while (i > 1,
if (t >= e, return (Mod(0, q)));
t = x[i]^t; i--);
if (t * v >= e, return (Mod(0, q)));
return (Mod(x[1], q)^t);
);
Mod(x[1], q)^lift(tower(x[^1], (p-1)*p^e));
}
For instance
? 5^(4^(3^2)) % 163 \\ direct computation, wouldn't scale
%1 = 158
? tower([5,4,3,2], 163)
%2 = Mod(158, 163)
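For the easy case only, where the base is coprime to the modulus, the exponent can be reduced modulo phi(m) by Euler's theorem. The Python sketch below checks this on the example above; it is not a port of the PARI/GP code and deliberately ignores the hard case gcd(x, m) > 1 that towerp handles. The exponent tower is evaluated exactly, which is feasible only because it is tiny here.

```python
p = 163                      # prime modulus from the example
base, rest = 5, [4, 3, 2]

# Evaluate the exponent tower 4^(3^2) exactly, from the top down.
exponent = 1
for k in reversed(rest):
    exponent = k ** exponent

direct = pow(base, exponent, p)
reduced = pow(base, exponent % (p - 1), p)   # phi(p) = p - 1 for prime p
print(direct, reduced)
```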

Optimization of Fibonacci sequence generating algorithm

As we all know, the simplest algorithm to generate Fibonacci sequence is as follows:
if(n<=0) return 0;
else if(n==1) return 1;
f(n) = f(n-1) + f(n-2);
But this algorithm has some repetitive calculation. For example, if you calculate f(5), it will calculate f(4) and f(3). When you calculate f(4), it will again calculate both f(3) and f(2). Could someone give me a more time-efficient recursive algorithm?
I have read about several methods for calculating Fibonacci numbers with efficient time complexity. Here are some of them:
Method 1 - Dynamic Programming
The substructure is well known, so I'll jump straight to the solution -
static int fib(int n)
{
int f[] = new int[n+2]; // 1 extra to handle case, n = 0
int i;
f[0] = 0;
f[1] = 1;
for (i = 2; i <= n; i++)
{
f[i] = f[i-1] + f[i-2];
}
return f[n];
}
A space-optimized version of above can be done as follows -
static int fib(int n)
{
int a = 0, b = 1, c;
if (n == 0)
return a;
for (int i = 2; i <= n; i++)
{
c = a + b;
a = b;
b = c;
}
return b;
}
Method 2 - (Using power of the matrix {{1,1},{1,0}})
This is an O(n) method which relies on the fact that if we multiply the matrix M = {{1,1},{1,0}} by itself n times (in other words, calculate power(M, n)), then we get the (n+1)th Fibonacci number as the element at row and column (0, 0) of the resulting matrix.
The matrix representation gives the following closed expression for the Fibonacci numbers:
{{1,1},{1,0}}^n = {{F(n+1), F(n)}, {F(n), F(n-1)}}
static int fib(int n)
{
int F[][] = new int[][]{{1,1},{1,0}};
if (n == 0)
return 0;
power(F, n-1);
return F[0][0];
}
/*multiplies 2 matrices F and M of size 2*2, and
puts the multiplication result back to F[][] */
static void multiply(int F[][], int M[][])
{
int x = F[0][0]*M[0][0] + F[0][1]*M[1][0];
int y = F[0][0]*M[0][1] + F[0][1]*M[1][1];
int z = F[1][0]*M[0][0] + F[1][1]*M[1][0];
int w = F[1][0]*M[0][1] + F[1][1]*M[1][1];
F[0][0] = x;
F[0][1] = y;
F[1][0] = z;
F[1][1] = w;
}
/*function that calculates F[][] raise to the power n and puts the
result in F[][]*/
static void power(int F[][], int n)
{
int i;
int M[][] = new int[][]{{1,1},{1,0}};
// n - 1 times multiply the matrix to {{1,0},{0,1}}
for (i = 2; i <= n; i++)
multiply(F, M);
}
This can be optimized to work in O(Logn) time complexity. We can do recursive multiplication to get power(M, n) in the previous method.
static int fib(int n)
{
int F[][] = new int[][]{{1,1},{1,0}};
if (n == 0)
return 0;
power(F, n-1);
return F[0][0];
}
static void multiply(int F[][], int M[][])
{
int x = F[0][0]*M[0][0] + F[0][1]*M[1][0];
int y = F[0][0]*M[0][1] + F[0][1]*M[1][1];
int z = F[1][0]*M[0][0] + F[1][1]*M[1][0];
int w = F[1][0]*M[0][1] + F[1][1]*M[1][1];
F[0][0] = x;
F[0][1] = y;
F[1][0] = z;
F[1][1] = w;
}
static void power(int F[][], int n)
{
if( n == 0 || n == 1)
return;
int M[][] = new int[][]{{1,1},{1,0}};
power(F, n/2);
multiply(F, F);
if (n%2 != 0)
multiply(F, M);
}
Method 3 (O(log n) Time)
Below is one more interesting recurrence formula that can be used to find nth Fibonacci Number in O(log n) time.
If n is even then k = n/2:
F(n) = [2*F(k-1) + F(k)]*F(k)
If n is odd then k = (n + 1)/2
F(n) = F(k)*F(k) + F(k-1)*F(k-1)
How does this formula work?
The formula can be derived from the above matrix equation.
{{1,1},{1,0}}^n = {{F(n+1), F(n)}, {F(n), F(n-1)}}
Taking the determinant on both sides, we get
(-1)^n = F(n+1)*F(n-1) - F(n)^2
Moreover, since A^n * A^m = A^(n+m) for any square matrix A, the following identities can be derived (they are obtained from two different coefficients of the matrix product)
F(m)*F(n) + F(m-1)*F(n-1) = F(m+n-1)
By putting n = n+1,
F(m)*F(n+1) + F(m-1)*F(n) = F(m+n)
Putting m = n
F(2n-1) = F(n)^2 + F(n-1)^2
F(2n) = [F(n-1) + F(n+1)]*F(n) = [2*F(n-1) + F(n)]*F(n) (Source: Wiki)
To get the formula to be proved, we simply need to do the following
If n is even, we can put k = n/2
If n is odd, we can put k = (n+1)/2
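static int[] f = new int[47]; // memo table; the size is an assumption (int overflows past fib(46))
public static int fib(int n)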
{
if (n == 0)
return 0;
if (n == 1 || n == 2)
return (f[n] = 1);
// If fib(n) is already computed
if (f[n] != 0)
return f[n];
int k = (n & 1) == 1? (n + 1) / 2
: n / 2;
// Apply the formula above; (n & 1) is 1
// if n is odd, else 0.
f[n] = (n & 1) == 1? (fib(k) * fib(k) +
fib(k - 1) * fib(k - 1))
: (2 * fib(k - 1) + fib(k))
* fib(k);
return f[n];
}
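The same two identities can be sketched in Python, where integers do not overflow. This version uses functools.lru_cache for the memo table instead of a fixed array; that is a substitution made for illustration, not part of the Java code above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """F(n) via F(2k-1) = F(k)^2 + F(k-1)^2 and F(2k) = (2*F(k-1) + F(k))*F(k)."""
    if n == 0:
        return 0
    if n in (1, 2):
        return 1
    if n % 2 == 1:
        k = (n + 1) // 2
        return fib(k) ** 2 + fib(k - 1) ** 2
    k = n // 2
    return (2 * fib(k - 1) + fib(k)) * fib(k)


print([fib(i) for i in range(11)])
print(fib(100))
```

The recursion depth grows with log n, because each call halves the index.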
Method 4 - Using a formula
In this method, we directly implement the formula for the nth term in the Fibonacci series. Time O(1) Space O(1)
Fn = {[(√5 + 1)/2] ^ n} / √5
static int fib(int n) {
double phi = (1 + Math.sqrt(5)) / 2;
return (int) Math.round(Math.pow(phi, n)
/ Math.sqrt(5));
}
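Because the formula is evaluated in double precision, it stops matching the exact values once the numbers outgrow the 53-bit mantissa. The Python sketch below compares the rounded closed form with an exact iterative computation and reports the first index where they differ; the helper names are illustrative.

```python
import math


def fib_closed(n):
    """Rounded closed form, evaluated in double precision."""
    phi = (1 + math.sqrt(5)) / 2
    return round(phi ** n / math.sqrt(5))


def fib_iter(n):
    """Exact value using Python's arbitrary-precision integers."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# First index at which floating-point rounding gives the wrong integer.
first_bad = next(n for n in range(200) if fib_closed(n) != fib_iter(n))
print(first_bad)
```

The same limit applies to the Java version, with the additional constraint that an int already overflows past fib(46).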
Reference: http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/fibFormula.html
Look here for an implementation in Erlang which uses the matrix formula. It shows nicely linear behavior in practice because, in the O(M(n) log n) bound, the M(n) part is exponential for big numbers. It calculates fib of one million in 2s, where the result has 208988 digits. The trick is that you can compute exponentiation in O(log n) multiplications using a (tail) recursive formula (tail means O(1) space when a proper compiler is used, or when it is rewritten as a loop):
% compute X^N
power(X, N) when is_integer(N), N >= 0 ->
power(N, X, 1).
power(0, _, Acc) ->
Acc;
power(N, X, Acc) ->
if N rem 2 =:= 1 ->
power(N - 1, X, Acc * X);
true ->
power(N div 2, X * X, Acc)
end.
where you substitute matrices for X and Acc: X is initialized with {{1,1},{1,0}} and Acc with the identity matrix I = {{1,0},{0,1}}.
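The substitution can be sketched in Python as follows, assuming X is the Fibonacci matrix {{1,1},{1,0}} and Acc starts as the 2x2 identity. The loop mirrors the Erlang clauses (odd exponent: multiply into the accumulator; even exponent: square the base). The function names and the tuple representation are choices made for this sketch.

```python
def mat_mul(p, q):
    """Product of two 2x2 matrices stored as nested tuples."""
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def power(x, n, mul, identity):
    """x^n with O(log n) multiplications, written as a loop."""
    acc = identity
    while n > 0:
        if n % 2 == 1:
            acc = mul(acc, x)
            n -= 1
        else:
            x = mul(x, x)
            n //= 2
    return acc


def fib(n):
    m = power(((1, 1), (1, 0)), n, mat_mul, ((1, 0), (0, 1)))
    return m[0][1]   # the n-th power holds F(n) off the diagonal


print([fib(i) for i in range(11)])
```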
One simple way is to calculate it iteratively instead of recursively. This will calculate F(n) in linear time.
def fib(n):
    a, b = 0, 1
    for i in range(n):
        a, b = a + b, a
    return a
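Hint: One way to achieve faster results is by using Binet's formula:
F(n) = (φ^n - ψ^n)/√5, where φ = (1 + √5)/2 and ψ = (1 - √5)/2
Here is a way of doing it in Python (the constants are computed to a precision that grows with n, because truncated decimal constants give wrong results even for moderate n):
from decimal import Decimal, getcontext
def fib(n):
    getcontext().prec = n // 4 + 20  # digits of F(n) plus guard digits
    sqrt5 = Decimal(5).sqrt()
    phi, psi = (1 + sqrt5) / 2, (1 - sqrt5) / 2
    return int(((phi ** n - psi ** n) / sqrt5).to_integral_value())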
You can save your results and reuse them:
public static long[] fibs;
public long fib(int n) {
fibs = new long[n];
return internalFib(n);
}
public long internalFib(int n) {
if (n<=2) return 1;
fibs[n-1] = fibs[n-1]==0 ? internalFib(n-1) : fibs[n-1];
fibs[n-2] = fibs[n-2]==0 ? internalFib(n-2) : fibs[n-2];
return fibs[n-1]+fibs[n-2];
}
F(n) = (φ^n)/√5 and round to nearest integer, where φ is the golden ratio....
φ^n can be calculated in O(lg(n)) time hence F(n) can be calculated in O(lg(n)) time.
// D Programming Language
void vFibonacci ( const ulong X, const ulong Y, const int Limit ) {
// Equivalent : if ( Limit != 10 ). Former ( Limit ^ 0xA ) is More Efficient However.
if ( Limit ^ 0xA ) {
write ( Y, " " ) ;
vFibonacci ( Y, Y + X, Limit + 1 ) ;
} ;
} ;
// Call As
// By Default the Limit is 10 Numbers
vFibonacci ( 0, 1, 0 ) ;
EDIT: I actually think Hynek Vychodil's answer is superior to mine, but I'm leaving this here just in case someone is looking for an alternate method.
I think the other methods are all valid, but not optimal. Using Binet's formula should give you the right answer in principle, but rounding to the closest integer will give some problems for large values of n. The other solutions will unnecessarily recalculate the values up to n every time you call the function, and so the function is not optimized for repeated calling.
In my opinion the best thing to do is to define a global array and then to add new values to the array IF needed. In Python:
import numpy
fibo = numpy.array([1, 1])
last_index = fibo.size
def fib(n):
    global fibo, last_index
    if n > 0:
        if n > last_index:
            for i in range(last_index + 1, n + 1):
                fibo = numpy.concatenate((fibo, numpy.array([fibo[i-2] + fibo[i-3]])))
            last_index = fibo.size
        return fibo[n-1]
    else:
        print("fib called for index less than 1")
        quit()
Naturally, if you need to call fib for n>80 (approximately) then you will need to implement arbitrary precision integers, which is easy to do in python.
This will execute faster, O(n)
def fibo(n):
    a, b = 0, 1
    for i in range(n):
        if i == 0:
            print(i)
        elif i == 1:
            print(i)
        else:
            temp = a
            a = b
            b += temp
            print(b)
n = int(input())
fibo(n)

extract rotation, scale values from 2d transformation matrix

How can I extract rotation, scale and translation values from a 2D transformation matrix? I mean, I have a 2D transformation
matrix = [1, 0, 0, 1, 0, 0]
matrix.rotate(45 / 180 * PI)
matrix.scale(3, 4)
matrix.translate(50, 100)
matrix.rotate(30 / 180 * PI)
matrix.scale(-2, 4)
Now my matrix has the values [a, b, c, d, tx, ty].
Let's forget about the process above and imagine that we have only the values a, b, c, d, tx, ty.
How can I find the total rotation and scale values from a, b, c, d, tx, ty?
Sorry for my English.
Thanks in advance.
EDIT
I think it should be an answer somewhere...
i just tried in Flash Builder (AS3) like this
var m:Matrix = new Matrix;
m.rotate(.25 * Math.PI);
m.scale(4, 5);
m.translate(100, 50);
m.rotate(.33 * Math.PI);
m.scale(-3, 2.5);
var shape:Shape = new Shape;
shape.transform.matrix = m;
trace(shape.x, shape.y, shape.scaleX, shape.scaleY, shape.rotation);
and the output is:
x = -23.6
y = 278.8
scaleX = 11.627334873920528
scaleY = -13.54222263865791
rotation = 65.56274134518259 (in degrees)
Not all values of a, b, c, d, tx, ty will yield a valid rotation sequence. I assume the above values are part of a 3x3 homogeneous rotation matrix in 2D
    | a b tx |
A = | c d ty |
    | 0 0 1  |
which transforms the coordinates [x, y, 1] into:
                  |x|
[x', y', 1] = A * |y|
                  |1|
Thus set the translation to [dx, dy] = [tx, ty]
The scale is sx = sqrt(a² + b²) and sy = sqrt(c² + d²)
The rotation angle is t = atan(c/d) or t = atan(-b/a), as they should be the same.
Otherwise you don't have a valid rotation matrix.
The above transformation is expanded to:
x' = tx + sx (x Cos θ - y Sin θ)
y' = ty + sy (x Sin θ + y Cos θ)
when the order is rotation, followed by scale and then translation.
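Reading the coefficients off the expansion above gives a = sx Cos θ, b = -sx Sin θ, c = sy Sin θ, d = sy Cos θ. The Python sketch below builds a matrix from known values and recovers them again; it assumes this particular layout and order (rotation, then scale, then translation) and positive scale factors. The names compose and decompose are illustrative.

```python
import math


def compose(sx, sy, theta, tx, ty):
    """(a, b, c, d, tx, ty) for x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    return (sx * math.cos(theta), -sx * math.sin(theta),
            sy * math.sin(theta), sy * math.cos(theta), tx, ty)


def decompose(a, b, c, d, tx, ty):
    """Recover (sx, sy, theta, tx, ty), assuming sx > 0 and sy > 0."""
    sx = math.hypot(a, b)      # length of the first row
    sy = math.hypot(c, d)      # length of the second row
    theta = math.atan2(c, d)   # equals atan2(-b, a) for a valid matrix
    return sx, sy, theta, tx, ty


print(decompose(*compose(3.0, 4.0, 0.5, 50.0, 100.0)))
```

Using atan2 instead of atan keeps the correct quadrant and avoids a division by zero when d or a is 0.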
I ran into this problem today and found that the easiest solution is to transform a point using the matrix. This way, you can extract the translation first, then rotation and scaling.
This only works if x and y are always scaled the same (uniform scaling).
Given your matrix m which has undergone a series of transforms,
var translate:Point;
var rotate:Number;
var scale:Number;
// extract translation
var p:Point = new Point();
translate = m.transformPoint(p);
m.translate( -translate.x, -translate.y);
// extract (uniform) scale
p.x = 1.0;
p.y = 0.0;
p = m.transformPoint(p);
scale = p.length;
// and rotation
rotate = Math.atan2(p.y, p.x);
There you go!
The term for this is matrix decomposition. Here is a solution that includes skew as described by Frédéric Wang.
function decompose_2d_matrix(mat) {
var a = mat[0];
var b = mat[1];
var c = mat[2];
var d = mat[3];
var e = mat[4];
var f = mat[5];
var delta = a * d - b * c;
let result = {
translation: [e, f],
rotation: 0,
scale: [0, 0],
skew: [0, 0],
};
// Apply the QR-like decomposition.
if (a != 0 || b != 0) {
var r = Math.sqrt(a * a + b * b);
result.rotation = b > 0 ? Math.acos(a / r) : -Math.acos(a / r);
result.scale = [r, delta / r];
result.skew = [Math.atan((a * c + b * d) / (r * r)), 0];
} else if (c != 0 || d != 0) {
var s = Math.sqrt(c * c + d * d);
result.rotation =
Math.PI / 2 - (d > 0 ? Math.acos(-c / s) : -Math.acos(c / s));
result.scale = [delta / s, s];
result.skew = [0, Math.atan((a * c + b * d) / (s * s))];
} else {
// a = b = c = d = 0
}
return result;
}
If you had scaled by the same amount in x and in y, then the determinant of the matrix, i.e. ad-bc, which tells you the area multiplier, would tell you the linear change of scale too: it would be the square root of the determinant. atan( b/a ) or, better, atan2( b,a ) would tell you the total angle you have rotated through.
However, as your scaling isn't uniform, there is usually not going to be a way to condense your series of rotations and scalings into a single rotation followed by a single non-uniform scaling in x and y.
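For the uniform case, the two formulas can be checked with a small Python sketch. It assumes the convention x' = a*x + c*y + tx, y' = b*x + d*y + ty for [a, b, c, d, tx, ty], which is how the ActionScript Matrix class in the question lays out its fields; the function name is illustrative.

```python
import math


def uniform_scale_and_angle(a, b, c, d):
    """Scale and rotation of a matrix built from rotations and uniform scalings."""
    det = a * d - b * c              # area multiplier
    scale = math.sqrt(abs(det))      # linear scale factor
    angle = math.atan2(b, a)         # total rotation in radians
    return scale, angle


# Rotation by 0.7 rad combined with a uniform scale of 2.5.
s, t = 2.5, 0.7
a, b = s * math.cos(t), s * math.sin(t)
c, d = -s * math.sin(t), s * math.cos(t)
print(uniform_scale_and_angle(a, b, c, d))
```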
