Counting how many times an equation is true for items from an array - math

I have an array containing integers from 1 to 1000. I'm trying to count how many times this equation is true A + B + C + D = E where A <= B <= C <= D and A, B, C, D, E are all items from the array. Could you guys suggest any solutions?
The array contains all integers from 1 to 1000, so 1, 2, 3, 4, .. , 999, 1000. The numbers A - D can be the same number from the array.

You need to calculate the number of integer partitions into exactly 4 parts of every value E in the range 1..1000.
The Python function countparts calculates the number of such partitions:
def cp(n, k, m):
    if k == 0:
        if n == 0:
            return 1
        else:
            return 0
    res = 0
    for i in range(min(n + 1, m + 1)):
        res += cp(n - i, k - 1, i)
    return res

def countparts(n, k):
    return cp(n - k, k, n - k + 1)

print(countparts(8, 4))
>> 5 (1115, 1124, 1133, 1223, 2222)
But it is slow for large arguments.
Also, at this page I found a formula that gives the needed values fast:
P(i) = round((i**3 + 3*i*i - 9*i*(i % 2))/144)
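As a rough sketch of how the formula could be used for the original question, the snippet below sums the number of 4-part partitions over E = 1..1000 and cross-checks the closed form against a small dynamic-programming table. The names p4_closed and p4_table are made up for this illustration, and the rounding is done in integer arithmetic instead of round(); treat it as one possible way to do it, not as the method from the linked page.

```python
def p4_closed(n):
    """Partitions of n into exactly 4 positive parts, using the quoted formula."""
    num = n**3 + 3 * n * n - 9 * n * (n % 2)
    return (2 * num + 144) // 288  # nearest integer to num / 144


def p4_table(limit):
    """Return a list t where t[n] = partitions of n into exactly 4 parts."""
    parts = [[0] * (limit + 1) for _ in range(5)]
    parts[0][0] = 1
    for k in range(1, 5):
        for n in range(k, limit + 1):
            # Either the smallest part is 1 (drop it), or every part is >= 2
            # (subtract 1 from each of the k parts).
            parts[k][n] = parts[k - 1][n - 1] + parts[k][n - k]
    return parts[4]


table = p4_table(1000)
# Does the closed form agree with the table for every E?
print(all(p4_closed(n) == table[n] for n in range(1, 1001)))
# Number of solutions A <= B <= C <= D, A + B + C + D = E, with E in 1..1000
print(sum(table[1:]))
```

Because A + B + C + D = E <= 1000, every part is automatically inside the array, so no extra bound on the parts is needed.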

Related

Number of ways to sit in a 2XN grid

I had this question in an interview but I was not able to solve it.
We have a grid of 2 rows and n columns. We have to find the number of ways to sit M men and W women given no men can sit adjacent or in front of each other.
I thought of solving it by Dynamic Programming but I'm not sure how to get the recurrence relation.
I know that if I am at (0,i), I can go to (1,i+1) but I don't know how to keep track of counts of men and women so far. Can anybody help me with the recurrence relation or states of dp?
Recurrence relation
Here is one recurrence relation I could think of.
Let's call n_ways(N, W, M, k) the number of ways to seat:
    W women
    and M men
    in the remaining N columns
knowing there were k men in the previous column. k can only take values 0 and 1.
n_ways(N, 0, 0, k) = 1
n_ways(N, W, M, k) = 0 if W < 0 or M > N or W+M > 2*N
n_ways(N, W, 0, k) = ((2*N) choose W)
n_ways(N, W, M, 0)
    = n_ways(N-1, W, M, 0) // put nobody in first column
    + 2 * n_ways(N-1, W-1, M, 0) // put 1 woman in first column
    + n_ways(N-1, W-2, M, 0) // put 2 women in first column
    + 2 * n_ways(N-1, W-1, M-1, 1) // put 1 woman & 1 man in first column
    + 2 * n_ways(N-1, W, M-1, 1) // put 1 man in first column
n_ways(N, W, M, 1)
    = n_ways(N-1, W, M, 0) // put nobody in first column
    + 2 * n_ways(N-1, W-1, M, 0) // put 1 woman in first column
    + n_ways(N-1, W-2, M, 0) // put 2 women in first column
    + n_ways(N-1, W-1, M-1, 1) // put 1 woman & 1 man in first column
    + n_ways(N-1, W, M-1, 1) // put 1 man in first column
Testing with Python
Since I'm too lazy to implement dynamic programming myself, I instead opted for caching using Python's functools.cache.
I implemented the recurrence relation above as a cached recursive function, and got the following results:
Number of ways to seat w women and m men in n columns:
n, w, m --> ways
0, 0, 0 --> 1
0, 0, 1 --> 0
0, 1, 0 --> 0
1, 2, 0 --> 1
1, 0, 2 --> 0
1, 1, 1 --> 2
2, 2, 2 --> 2
3, 3, 3 --> 2
4, 6, 2 --> 18
10, 15, 5 --> 2364
10, 10, 10 --> 2
Here is the Python code:
from functools import cache
from math import comb

@cache
def n_ways(n, w, m, k):
    if w < 0:
        return 0
    if w == m == 0:
        return 1
    elif m > n or w + m > 2 * n:
        return 0
    elif m == 0:
        return comb(2*n, w)
    else:
        r_0, r_w, r_ww, r_wm, r_m = (
            n_ways(n-1, w, m, 0),
            n_ways(n-1, w-1, m, 0),
            n_ways(n-1, w-2, m, 0) if w > 1 else 0,
            n_ways(n-1, w-1, m-1, 1),
            n_ways(n-1, w, m-1, 1)
        )
        return (r_0 + r_w + r_w + r_ww + r_wm + r_m) + (1-k) * (r_wm + r_m)

if __name__=='__main__':
    print(f'Number of ways to seat w women and m men in n columns:')
    print(f' n, w, m --> ways')
    for w, m in [(0, 0), (0, 1), (1, 0), (2, 0), (0, 2),
                 (1, 1), (2, 2), (3, 3), (6, 2), (15, 5),
                 (10, 10)]:
        n = (w + m) // 2
        print(f'{n:2d}, {w:2d}, {m:2d} --> ', end='')
        print(n_ways(n, w, m, 0))
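For small grids the numbers in the table can be cross-checked by plain enumeration. The sketch below is only a sanity check under the same assumptions as the recurrence (people of the same gender are interchangeable and seats may stay empty); brute_n_ways is a name invented for this illustration.

```python
from itertools import combinations
from math import comb


def brute_n_ways(n, w, m):
    """Count seatings of w women and m men in a 2 x n grid by enumeration."""
    if w + m > 2 * n:
        return 0
    seats = [(row, col) for col in range(n) for row in range(2)]

    def conflict(s, t):
        # Two men clash if they share a column (facing each other) or sit
        # in the same row in neighbouring columns (adjacent).
        return s[1] == t[1] or (s[0] == t[0] and abs(s[1] - t[1]) == 1)

    valid_men = sum(
        1
        for chosen in combinations(seats, m)
        if not any(conflict(s, t) for s, t in combinations(chosen, 2))
    )
    # The women can take any w of the seats that the men left free.
    return valid_men * comb(2 * n - m, w)


for n, w, m in [(1, 1, 1), (2, 2, 2), (4, 6, 2), (10, 15, 5)]:
    print(n, w, m, brute_n_ways(n, w, m))
```

The enumeration grows as (2n choose m), so it is only practical for the small cases in the table.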

Continued fractions and Pell's equation - numerical issues

Mathematical background
Continued fractions are a way to represent numbers (rational or not), with a basic recursion formula to calculate it. Given a number r, we define r[0]=r and have:
for n in range(0..N):
    a[n] = floor(r[n])
    if r[n] == a[n]: break
    r[n+1] = 1 / (r[n]-a[n])
where a is the final representation. We can also define a series of convergents by
h[-2,-1] = [0, 1]
k[-2, -1] = [1, 0]
h[n] = a[n]*h[n-1]+h[n-2]
k[n] = a[n]*k[n-1]+k[n-2]
where h[n]/k[n] converge to r.
Pell's equation is a problem of the form x^2-D*y^2=1 where all numbers are integers and D is not a perfect square in our case. A solution for a given D that minimizes x is given by continued fractions. Basically, for the above equation, it is guaranteed that this (fundamental) solution is x=h[n] and y=k[n] for the lowest n found which solves the equation in the continued fraction expansion of sqrt(D).
Problem
I am failing to get this simple algorithm to work for D=61. I first noticed it did not solve Pell's equation within 100 coefficients, so I compared it against Wolfram Alpha's convergents and continued fraction representation and noticed that the 20th elements differ: the coefficient is 3 on Wolfram Alpha compared to the 4 that I get, yielding different convergents, h[20]=335159612 on Wolfram Alpha compared to 425680601 for me.
I tested the code below, two languages (though to be fair, Python is C under the hood I guess), on two systems and get the same result - a diff on loop 20. I'll note that the convergents are still accurate and converge! Why am I getting different results compared to Wolfram Alpha, and is it possible to fix it?
For testing, here's a Python program to solve Pell's equation for D=61, printing first 20 convergents and the continued fraction representation cf (and some extra unneeded fluff):
from math import floor, sqrt # Can use mpmath here as well.

def continued_fraction(D, count=100, thresh=1E-12, verbose=False):
    cf = []
    h = (0, 1)
    k = (1, 0)
    r = start = sqrt(D)
    initial_count = count
    x = (1+thresh+start)*start
    y = start
    while abs(x/y - start) > thresh and count:
        i = int(floor(r))
        cf.append(i)
        f = r - i
        x, y = i*h[-1] + h[-2], i*k[-1] + k[-2]
        if verbose is True or verbose == initial_count-count:
            print(f'{x}\u00B2-{D}x{y}\u00B2 = {x**2-D*y**2}')
        if x**2 - D*y**2 == 1:
            print(f'{x}\u00B2-{D}x{y}\u00B2 = {x**2-D*y**2}')
            print(cf)
            return
        count -= 1
        r = 1/f
        h = (h[1], x)
        k = (k[1], y)
    print(cf)
    raise OverflowError(f"Converged on {x} {y} with count {count} and diff {abs(start-x/y)}!")

continued_fraction(61, count=20, verbose=True, thresh=-1) # We don't want to stop on account of thresh in this example
A C program doing the same:
#include<stdio.h>
#include<math.h>
#include<stdlib.h>
int main() {
    long D = 61;
    double start = sqrt(D);
    long h[] = {0, 1};
    long k[] = {1, 0};
    int count = 20;
    float thresh = 1E-12;
    double r = start;
    long x = (1+thresh+start)*start;
    long y = start;
    /* fabs, not abs: abs() would truncate the double argument to int */
    while(fabs(x/(double)y-start) > -1 && count) {
        long i = floor(r);
        double f = r - i;
        x = i * h[1] + h[0];
        y = i * k[1] + k[0];
        /* x*x-D*y*y is a long, so it is printed with %ld */
        printf("%ld\u00B2-%ldx%ld\u00B2 = %ld\n", x, D, y, x*x-D*y*y);
        r = 1/f;
        --count;
        h[0] = h[1];
        h[1] = x;
        k[0] = k[1];
        k[1] = y;
    }
    return 0;
}
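The difference comes from double-precision rounding: every step r = 1/f amplifies the error already present in f, so after about 20 steps the integer part of r is no longer reliable. mpmath, Python's multi-precision library, can be used instead. Just be careful that all the important numbers are in mp format.
In the code below, x, y and i are standard multi-precision integers. r and f are multi-precision real numbers. Note that the initial count is set higher than 20.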
from mpmath import mp, mpf
mp.dps = 50 # precision in number of decimal digits

def continued_fraction(D, count=22, thresh=mpf(1E-12), verbose=False):
    cf = []
    h = (0, 1)
    k = (1, 0)
    r = start = mp.sqrt(D)
    initial_count = count
    x = 0 # some dummy starting values, they will be overwritten early in the while loop
    y = 1
    while abs(x/y - start) > thresh and count > 0:
        i = int(mp.floor(r))
        cf.append(i)
        x, y = i*h[-1] + h[-2], i*k[-1] + k[-2]
        if verbose or initial_count == count:
            print(f'{x}\u00B2-{D}x{y}\u00B2 = {x**2-D*y**2}')
        if x**2 - D*y**2 == 1:
            print(f'{x}\u00B2-{D}x{y}\u00B2 = {x**2-D*y**2}')
            print(cf)
            return
        count -= 1
        f = r - i
        r = 1/f
        h = (h[1], x)
        k = (k[1], y)
    print(cf)
    raise OverflowError(f"Converged on {x} {y} with count {count} and diff {abs(start-x/y)}!")

continued_fraction(61, count=22, verbose=True, thresh=mpf(1e-100))
Output is similar to wolfram's:
...
335159612²-61x42912791² = 3
1431159437²-61x183241189² = -12
1766319049²-61x226153980² = 1
[7, 1, 4, 3, 1, 2, 2, 1, 3, 4, 1, 14, 1, 4, 3, 1, 2, 2, 1, 3, 4, 1]
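An alternative that sidesteps the precision question entirely is the standard integer-only recurrence for the continued fraction of sqrt(D), which keeps each complete quotient in the form (m + sqrt(D)) / d. The sketch below is not the code from the question or the answer; the function name pell_fundamental and the variable names m, d, a are assumptions chosen for this illustration.

```python
from math import isqrt


def pell_fundamental(D):
    """Return (x, y, cf): the smallest x, y > 0 with x*x - D*y*y == 1 and
    the partial quotients of sqrt(D) used to reach that solution."""
    a0 = isqrt(D)
    if a0 * a0 == D:
        raise ValueError("D must not be a perfect square")
    m, d, a = 0, 1, a0        # complete quotient is (m + sqrt(D)) / d
    h_prev, h = 1, a0         # numerators   h[-1], h[0]
    k_prev, k = 0, 1          # denominators k[-1], k[0]
    cf = [a0]
    while h * h - D * k * k != 1:
        m = d * a - m
        d = (D - m * m) // d  # this division is always exact
        a = (a0 + m) // d     # next partial quotient, integers only
        cf.append(a)
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k, cf


x, y, cf = pell_fundamental(61)
print(f'{x}\u00B2-61x{y}\u00B2 = {x * x - 61 * y * y}')
print(cf)
```

Since only Python integers are involved, there is no precision setting to tune, whatever the size of the solution.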

How to approach this type of problem in permutation and combination?

Altitudes
Alice and Bob took a journey to the mountains. They have been climbing
up and down for N days and came home extremely tired.
Alice only remembers that they started their journey at an altitude of
H1 meters and they finished their wandering at an altitude of H2
meters. Bob only remembers that every day they changed their altitude
by A, B, or C meters. If their altitude on the ith day was x,
then their altitude on day i + 1 can be x + A, x + B, or x + C.
Now, Bob wonders in how many ways they could complete their journey.
Two journeys are considered different if and only if there exists a day
when the altitude that Alice and Bob covered that day during the first
journey differs from the altitude Alice and Bob covered that day during
the second journey.
Bob asks Alice to tell him the number of ways to complete the journey.
Bob needs your help to solve this problem.
Input format
The first and only line contains 6 integers N, H1, H2, A, B, C that
represents the number of days Alice and Bob have been wandering,
altitude on which they started their journey, altitude on which they
finished their journey, and three possible altitude changes,
respectively.
Output format
Print the answer modulo 10**9 + 7.
Constraints
1 <= N <= 10**5
-10**9 <= H1, H2 <= 10**9
-10**9 <= A, B, C <= 10**9
Sample Input
2 0 0 1 0 -1
Sample Output
3
Explanation
There are only 3 possible journeys-- (0, 0), (1, -1), (-1, 1).
Note
This problem comes originally from a hackerearth competition, now closed. The explanation for the sample input and output has been corrected.
Here is my solution in Python 3.
The question can be simplified from its 6 input parameters to only 4 parameters. There is no need for the beginning and ending altitudes--the difference of the two is enough. Also, we can change the daily altitude changes A, B, and C and get the same answer if we make a corresponding change to the total altitude change. For example, if we add 1 to each of A, B, and C, we could add N to the altitude change: 1 additional meter each day over N days means N additional meters total. We can "normalize" our daily altitude changes by sorting them so A is the smallest, then subtract A from each of the altitude changes and subtract N * A from the total altitude change. This means we now need to add a bunch of 0's and two other values (let's call them D and E). D is not larger than E.
We now have an easier problem: take N values, each of which is 0, D, or E, so they sum to a particular total (let's say H). This is the same as using up to N numbers equaling D or E, with the rest zeros.
We can use mathematics, in particular Bezout's identity, to see if this is possible. Some more mathematics can find all the ways of doing this. Once we know how many 0's, D's, and E's, we can use multinomial coefficients to find how many ways these values can be rearranged. Total all these up and we have the answer.
This code finds the total number of ways to complete the journey, and takes it modulo 10**9 + 7 only at the very end. This is possible since Python uses large integers. The largest result I found in my testing is for the input values 100000 0 100000 0 1 2 which results in a number with 47,710 digits before taking the modulus. This takes a little over 8 seconds on my machine.
This code is a little longer than necessary, since I made some of the routines more general than necessary for this problem. I did this so I can use them in other problems. I used many comments for clarity.
# Combinatorial routines -----------------------------------------------

def comb(n, k):
    """Compute the number of ways to choose k elements out of a pile of
    n, ignoring the order of the elements. This is also called
    combinations, or the binomial coefficient of n over k.
    """
    if k < 0 or k > n:
        return 0
    result = 1
    for i in range(min(k, n - k)):
        result = result * (n - i) // (i + 1)
    return result

def multcoeff(*args):
    """Return the multinomial coefficient
    (n1 + n2 + ...)! / n1! / n2! / ..."""
    if not args:  # no parameters
        return 1
    # Find and store the index of the largest parameter so we can skip
    #   it (for efficiency)
    skipndx = args.index(max(args))
    newargs = args[:skipndx] + args[skipndx + 1:]
    result = 1
    num = args[skipndx] + 1  # a factor in the numerator
    for n in newargs:
        for den in range(1, n + 1):  # a factor in the denominator
            result = result * num // den
            num += 1
    return result

def new_multcoeff(prev_multcoeff, x, y, z, ag, bg):
    """Given a multinomial coefficient prev_multcoeff =
    multcoeff(x-bg, y+ag, z+(bg-ag)), calculate multcoeff(x, y, z).
    NOTES:  1.  This uses bg multiplications and bg divisions,
                faster than doing multcoeff from scratch.
    """
    result = prev_multcoeff
    for d in range(1, ag + 1):
        result *= y + d
    for d in range(1, bg - ag + 1):
        result *= z + d
    for d in range(bg):
        result //= x - d
    return result

# Number theory routines -----------------------------------------------

def bezout(a, b):
    """For integers a and b, find an integral solution to
    a*x + b*y = gcd(a, b).
    RETURNS:    (x, y, gcd)
    NOTES:  1.  This routine uses the convergents of the continued
                fraction expansion of b / a, so it will be slightly
                faster if a <= b, i.e. the parameters are sorted.
            2.  This routine ensures the gcd is nonnegative.
            3.  If a and/or b is zero, the corresponding x or y
                will also be zero.
            4.  This routine is named after Bezout's identity, which
                guarantees the existence of the solution x, y.
    """
    if not a:
        return (0, (b > 0) - (b < 0), abs(b))  # 2nd is sign(b)
    p1, p = 0, 1  # numerators of the two previous convergents
    q1, q = 1, 0  # denominators of the two previous convergents
    negate_y = True  # flag if negate y=q (True) or x=p (False)
    quotient, remainder = divmod(b, a)
    while remainder:
        b, a = a, remainder
        p, p1 = p * quotient + p1, p
        q, q1 = q * quotient + q1, q
        negate_y = not negate_y
        quotient, remainder = divmod(b, a)
    if a < 0:
        p, q, a = -p, -q, -a  # ensure the gcd is nonnegative
    return (p, -q, a) if negate_y else (-p, q, a)

def byzantine_bball(a, b, s):
    """For nonnegative integers a, b, s, return information about
    integer solutions x, y to a*x + b*y = s. This is
    equivalent to finding a multiset containing only a and b that
    sums to s. The name comes from getting a given basketball score
    given scores for shots and free throws in a hypothetical game of
    "byzantine basketball."
    RETURNS:    None if there is no solution, or an 8-tuple containing
                x   the smallest possible nonnegative integer value of
                    x.
                y   the value of y corresponding to the smallest
                    possible integral value of x. If this is negative,
                    there is no solution for nonnegative x, y.
                g   the greatest common divisor (gcd) of a, b.
                u   the found solution to a*u + b*v = g
                v   "               "
                ag  a // g, or zero if g=0
                bg  b // g, or zero if g=0
                sg  s // g, or zero if g=0
    NOTES:  1.  If a and b are not both zero and one solution x, y is
                returned, then all integer solutions are given by
                x + t * bg, y - t * ag for any integer t.
            2.  This routine is slightly optimized for a <= b. In that
                case, the solution returned also has the smallest sum
                x + y among positive integer solutions.
    """
    # Handle edge cases of zero parameter(s).
    if 0 == a == b:  # the only score possible from 0, 0 is 0
        return (0, 0, 0, 0, 0, 0, 0, 0) if s == 0 else None
    if a == 0:
        sb = s // b
        return (0, sb, b, 0, 1, 0, 1, sb) if s % b == 0 else None
    if b == 0:
        sa = s // a
        return (sa, 0, a, 1, 0, 1, 0, sa) if s % a == 0 else None
    # Find if the score is possible, ignoring the signs of x and y.
    u, v, g = bezout(a, b)
    if s % g:
        return None  # only multiples of the gcd are possible scores
    # Find one way to get the score, ignoring the signs of x and y.
    ag, bg, sg = a // g, b // g, s // g  # we now have ag*u + bg*v = 1
    x, y = sg * u, sg * v  # we now have a*x + b*y = s
    # Find the solution where x is nonnegative and as small as possible.
    t = x // bg  # Python rounds toward minus infinity--what we want
    x, y = x - t * bg, y + t * ag
    # Return the information
    return (x, y, g, u, v, ag, bg, sg)

# Routines for this puzzle ---------------------------------------------

def altitude_reduced(n, h, d, e):
    """Return the number of distinct n-tuples containing only the
    values 0, d, and e that sum to h. Assume that all these
    numbers are integers and that 0 <= d <= e.
    """
    # Handle some impossible special cases
    if n < 0 or h < 0:
        return 0
    # Handle some other simple cases with zero values
    if n == 0:
        return 0 if h else 1
    if 0 == d == e:  # all step values are zero
        return 0 if h else 1
    if 0 == d or d == e:  # e is the only non-zero step value
        # If possible, return # of tuples with proper # of e's, the rest 0's
        return 0 if h % e else comb(n, h // e)
    # Handle the main case 0 < d < e
    # --Try to get the solution with the fewest possible non-zero days:
    #   x d's and y e's and the rest zeros: all solutions are given by
    #   x + t * bg, y - t * ag
    solutions_info = byzantine_bball(d, e, h)
    if not solutions_info:
        return 0  # no way at all to get h from d, e
    x, y, _, _, _, ag, bg, _ = solutions_info
    # --Loop over all solutions with nonnegative x, y, small enough x + y
    result = 0
    while y >= 0 and x + y <= n:  # at most n non-zero days
        # Find multcoeff(x, y, n - x - y), in a faster way
        if result == 0:  # 1st time through loop: no prev coeff available
            amultcoeff = multcoeff(x, y, n - x - y)
        else:  # use previous multinomial coefficient
            amultcoeff = new_multcoeff(amultcoeff, x, y, n - x - y, ag, bg)
        result += amultcoeff
        x, y = x + bg, y - ag  # x+y increases by bg-ag >= 0
    return result

def altitudes(input_str=None):
    # Get the input
    if input_str is None:
        input_str = input('Numbers N H1 H2 A B C? ')
    # input_str = '100000 0 100000 0 1 2'  # replace with prev line for input
    n, h1, h2, a, b, c = map(int, input_str.strip().split())
    # Reduce the number of parameters by normalizing the values
    h_diff = h2 - h1  # net altitude change
    a, b, c = sorted((a, b, c))  # a is now the smallest
    h, d, e = h_diff - n * a, b - a, c - a  # reduce a to zero
    # Solve the reduced problem
    print(altitude_reduced(n, h, d, e) % (10**9 + 7))

if __name__ == '__main__':
    altitudes()
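If the huge intermediate integers are a concern, the same normalization can be combined with arithmetic modulo 10**9 + 7 throughout. The sketch below is a different, simpler variant than the code above (it is not the Bezout-based routine): it loops directly over the number of days with the largest step and uses precomputed factorials with Fermat inverses. The name journeys_mod and its parameter mod are assumptions made for this illustration, and mod is assumed to be a prime larger than n.

```python
MOD = 10**9 + 7


def journeys_mod(n, h1, h2, a, b, c, mod=MOD):
    """Count the journeys modulo `mod` (assumed prime and larger than n)."""
    a, b, c = sorted((a, b, c))
    h, d, e = (h2 - h1) - n * a, b - a, c - a  # steps are now 0, d, e
    if h < 0:
        return 0
    # Factorials and inverse factorials modulo `mod`.
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i % mod
    inv_fact = [1] * (n + 1)
    inv_fact[n] = pow(fact[n], mod - 2, mod)  # Fermat's little theorem
    for i in range(n, 0, -1):
        inv_fact[i - 1] = inv_fact[i] * i % mod

    def multinomial(x, y):
        # n! / (x! * y! * (n - x - y)!) modulo `mod`
        rest = inv_fact[x] * inv_fact[y] % mod * inv_fact[n - x - y] % mod
        return fact[n] * rest % mod

    if e == 0:  # all three steps are equal
        return 1 if h == 0 else 0
    if d == 0 or d == e:  # only two distinct steps: 0 and e
        ok = h % e == 0 and h // e <= n
        return multinomial(h // e, 0) if ok else 0
    total = 0
    for y in range(min(n, h // e) + 1):  # y days with step e
        rest = h - y * e
        if rest % d == 0 and rest // d + y <= n:  # rest // d days with step d
            total += multinomial(rest // d, y)
    return total % mod


print(journeys_mod(2, 0, 0, 1, 0, -1))  # sample input from the problem
```

The loop runs at most n + 1 times, so the whole computation is linear in n apart from the single modular exponentiation.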
Here are some of my test routines for the main problem. These are suitable for pytest.
# Testing, some with pytest ---------------------------------------------------
import itertools  # for testing
import collections  # for testing

def brute(n, h, d, e):
    """Do altitude_reduced with brute force."""
    return sum(1 for v in itertools.product({0, d, e}, repeat=n)
               if sum(v) == h)

def brute_count(n, d, e):
    """Count achieved heights with brute force."""
    if n < 0:
        return collections.Counter()
    return collections.Counter(
        sum(v) for v in itertools.product({0, d, e}, repeat=n)
    )

def test_impossible():
    assert altitude_reduced(0, 6, 1, 2) == 0
    assert altitude_reduced(-1, 6, 1, 2) == 0
    assert altitude_reduced(3, -1, 1, 2) == 0

def test_simple():
    assert altitude_reduced(1, 0, 0, 0) == 1
    assert altitude_reduced(1, 1, 0, 0) == 0
    assert altitude_reduced(1, -1, 0, 0) == 0
    assert altitude_reduced(1, 1, 0, 1) == 1
    assert altitude_reduced(1, 1, 1, 1) == 1
    assert altitude_reduced(1, 2, 0, 1) == 0
    assert altitude_reduced(1, 2, 1, 1) == 0
    assert altitude_reduced(2, 4, 0, 3) == 0
    assert altitude_reduced(2, 4, 3, 3) == 0
    assert altitude_reduced(2, 4, 0, 2) == 1
    assert altitude_reduced(2, 4, 2, 2) == 1
    assert altitude_reduced(3, 4, 0, 2) == 3
    assert altitude_reduced(3, 4, 2, 2) == 3
    assert altitude_reduced(4, 4, 0, 2) == 6
    assert altitude_reduced(4, 4, 2, 2) == 6
    assert altitude_reduced(2, 6, 0, 2) == 0
    assert altitude_reduced(2, 6, 2, 2) == 0

def test_main():
    N = 12
    maxcnt = 0
    for n in range(-1, N):
        for d in range(N):  # must have 0 <= d
            for e in range(d, N):  # must have d <= e
                counts = brute_count(n, d, e)
                for h, cnt in counts.items():
                    if cnt == 25653:
                        print(n, h, d, e, cnt)
                    maxcnt = max(maxcnt, cnt)
                    assert cnt == altitude_reduced(n, h, d, e)
    print(maxcnt)  # got 25653 for N = 12, (n, h, d, e) = (11, 11, 1, 2) etc.

Implementing FFT over finite fields

I would like to implement multiplication of polynomials using NTT. I followed Number-theoretic transform (integer DFT) and it seems to work.
Now I would like to implement multiplication of polynomials over finite fields Z_p[x] where p is arbitrary prime number.
Does it change anything that the coefficients are now bounded by p, compared to the former unbounded case?
In particular, original NTT required to find prime number N as the working modulus that is larger than (magnitude of largest element of input vector)^2 * (length of input vector) + 1 so that the result never overflows. If the result is going to be bounded by that p prime anyway, how small can the modulus be? Note that p - 1 does not have to be of form (some positive integer) * (length of input vector).
Edit: I copy-pasted the source from the link above to illustrate the problem:
#
# Number-theoretic transform library (Python 2, 3)
#
# Copyright (c) 2017 Project Nayuki
# All rights reserved. Contact Nayuki for licensing.
# https://www.nayuki.io/page/number-theoretic-transform-integer-dft
#
import itertools, numbers

def find_params_and_transform(invec, minmod):
    check_int(minmod)
    mod = find_modulus(len(invec), minmod)
    root = find_primitive_root(len(invec), mod - 1, mod)
    return (transform(invec, root, mod), root, mod)

def check_int(n):
    if not isinstance(n, numbers.Integral):
        raise TypeError()

def find_modulus(veclen, minimum):
    check_int(veclen)
    check_int(minimum)
    if veclen < 1 or minimum < 1:
        raise ValueError()
    start = (minimum - 1 + veclen - 1) // veclen
    for i in itertools.count(max(start, 1)):
        n = i * veclen + 1
        assert n >= minimum
        if is_prime(n):
            return n

def is_prime(n):
    check_int(n)
    if n <= 1:
        raise ValueError()
    return all((n % i != 0) for i in range(2, sqrt(n) + 1))

def sqrt(n):
    check_int(n)
    if n < 0:
        raise ValueError()
    i = 1
    while i * i <= n:
        i *= 2
    result = 0
    while i > 0:
        if (result + i)**2 <= n:
            result += i
        i //= 2
    return result

def find_primitive_root(degree, totient, mod):
    check_int(degree)
    check_int(totient)
    check_int(mod)
    if not (1 <= degree <= totient < mod):
        raise ValueError()
    if totient % degree != 0:
        raise ValueError()
    gen = find_generator(totient, mod)
    root = pow(gen, totient // degree, mod)
    assert 0 <= root < mod
    return root

def find_generator(totient, mod):
    check_int(totient)
    check_int(mod)
    if not (1 <= totient < mod):
        raise ValueError()
    for i in range(1, mod):
        if is_generator(i, totient, mod):
            return i
    raise ValueError("No generator exists")

def is_generator(val, totient, mod):
    check_int(val)
    check_int(totient)
    check_int(mod)
    if not (0 <= val < mod):
        raise ValueError()
    if not (1 <= totient < mod):
        raise ValueError()
    pf = unique_prime_factors(totient)
    return pow(val, totient, mod) == 1 and all((pow(val, totient // p, mod) != 1) for p in pf)

def unique_prime_factors(n):
    check_int(n)
    if n < 1:
        raise ValueError()
    result = []
    i = 2
    end = sqrt(n)
    while i <= end:
        if n % i == 0:
            n //= i
            result.append(i)
            while n % i == 0:
                n //= i
            end = sqrt(n)
        i += 1
    if n > 1:
        result.append(n)
    return result

def transform(invec, root, mod):
    check_int(root)
    check_int(mod)
    if len(invec) >= mod:
        raise ValueError()
    if not all((0 <= val < mod) for val in invec):
        raise ValueError()
    if not (1 <= root < mod):
        raise ValueError()
    outvec = []
    for i in range(len(invec)):
        temp = 0
        for (j, val) in enumerate(invec):
            temp += val * pow(root, i * j, mod)
            temp %= mod
        outvec.append(temp)
    return outvec

def inverse_transform(invec, root, mod):
    outvec = transform(invec, reciprocal(root, mod), mod)
    scaler = reciprocal(len(invec), mod)
    return [(val * scaler % mod) for val in outvec]

def reciprocal(n, mod):
    check_int(n)
    check_int(mod)
    if not (0 <= n < mod):
        raise ValueError()
    x, y = mod, n
    a, b = 0, 1
    while y != 0:
        a, b = b, a - x // y * b
        x, y = y, x % y
    if x == 1:
        return a % mod
    else:
        raise ValueError("Reciprocal does not exist")

def circular_convolve(vec0, vec1):
    if not (0 < len(vec0) == len(vec1)):
        raise ValueError()
    if any((val < 0) for val in itertools.chain(vec0, vec1)):
        raise ValueError()
    maxval = max(val for val in itertools.chain(vec0, vec1))
    minmod = maxval**2 * len(vec0) + 1
    temp0, root, mod = find_params_and_transform(vec0, minmod)
    temp1 = transform(vec1, root, mod)
    temp2 = [(x * y % mod) for (x, y) in zip(temp0, temp1)]
    return inverse_transform(temp2, root, mod)

vec0 = [24, 12, 28, 8, 0, 0, 0, 0]
vec1 = [4, 26, 29, 23, 0, 0, 0, 0]
print(circular_convolve(vec0, vec1))

def modulo(vec, prime):
    return [x % prime for x in vec]

print(modulo(circular_convolve(vec0, vec1), 31))
Prints:
[96, 672, 1120, 1660, 1296, 876, 184, 0]
[3, 21, 4, 17, 25, 8, 29, 0]
However, when I change minmod = maxval**2 * len(vec0) + 1 to minmod = maxval + 1, it stops working:
[14, 16, 13, 20, 25, 15, 20, 0]
[14, 16, 13, 20, 25, 15, 20, 0]
What is the smallest minmod (N in the link above) for which it works as expected?
If each of your n input integers is bounded by some prime q (any modulus q behaves the same, not just a prime), you can use q as the maximum value + 1. But beware: you cannot use q as the prime p for the NTT, because the NTT prime p needs special properties. All of them are listed here:
Translation from Complex-FFT to Finite-Field-FFT
So the maximum value of each input is q-1, but during the computation (a convolution of 2 NTT results) the magnitude of the first-layer results can rise up to n.(q-1), and because we are convolving them, the magnitude of the input to the final iNTT can rise up to:
m = n.((q-1)^2)
If you are doing different operations on the NTTs than the m equation might change.
Now let us get back to the p so in a nutshell you can use any prime p that upholds these:
p mod n == 1
p > m
and there exist 1 <= r,L < p such that:
p mod (L-1) = 0
r^(L*i) mod p == 1 // i = { 0,n }
r^(L*i) mod p != 1 // i = { 1,2,3, ... n-1 }
If all this is satisfied then r^L is an n-th root of unity modulo p and p can be used for the NTT. To find such a prime and also r, L, look at the link above (there is C++ code that finds them).
For example during string multiplication we take 2 strings do NTT on them then convolute the result and iNTT back the result (that is sum of both input sizes). So for example:
99999999999999999999999999999999
*99999999999999999999999999999999
----------------------------------------------------------------
9999999999999999999999999999999800000000000000000000000000000001
Here q = 10 and both operands consist of 32 digits 9, so n = 32, hence m = 9*9*32 = 2592 and the found prime is p = 2689. As you can see the result matches, so no overflow occurs. However, if I use any smaller prime that still fits all the other conditions, the result will not match. I used this example specifically to stretch the NTT values as much as possible (all values are q-1 and the sizes are equal to the same power of 2).
In case your NTT is fast and n is not a power of 2 then you need to zero pad to nearest higher or equal power of 2 size for each NTT. But that should not affect the m value as zero pad should not increase the magnitude of values. My testing proves it so for convolution you can use:
m = (n1+n2).((q-1)^2)/2
where n1,n2 are the raw inputs sizes before zeropad.
For more info about implementing NTT you can check out mine in C++ (extensively optimized):
Modular arithmetics and NTT (finite field DFT) optimizations
So to answer your questions:
yes you can take advantage of the fact that input is mod q but you can not use q as p !!!
You can use minmod = n * (maxval + 1) only for single NTT (or first layer of NTTs) but as you are chaining them with convolution during your NTT usage you can not use that for the final INTT stage !!!
However as I mentioned in the comments easiest is to use max possible p that fits in the data type you are using and is usable for all power of 2 sizes of input supported.
Which basically renders your question irrelevant. The only case I can think of where this is not possible/desired is on arbitrary precision numbers, where there is "no" max limit. There are many performance issues tied to a variable p: the search for p is really slow (maybe even slower than the NTT itself), and a variable p also disables many performance optimizations of the modular arithmetic, making the NTT really slow.
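To make the bound concrete for the Z_p[x] case from the question, here is a minimal sketch that multiplies two polynomials modulo an arbitrary prime p: the coefficients are first reduced into 0..p-1, the integer product is computed with an NTT whose working prime P exceeds n.((p-1)^2), and only the final result is reduced modulo p. It uses a naive O(n^2) transform for brevity and is neither Nayuki's library nor the C++ code linked above; all function names (find_ntt_modulus, find_root_of_unity, ntt, poly_mul_mod_p) are invented for this illustration.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def find_ntt_modulus(n, bound):
    """Smallest prime P of the form k*n + 1 with P > bound."""
    k = max(1, bound // n)
    while True:
        P = k * n + 1
        if P > bound and is_prime(P):
            return P
        k += 1


def find_root_of_unity(n, P):
    """Return an element of multiplicative order exactly n modulo the prime P."""
    factors, m, f = set(), n, 2
    while f * f <= m:  # collect the prime factors of n
        while m % f == 0:
            factors.add(f)
            m //= f
        f += 1
    if m > 1:
        factors.add(m)
    for g in range(1, P):
        w = pow(g, (P - 1) // n, P)  # order of w divides n
        if all(pow(w, n // q, P) != 1 for q in factors):
            return w
    raise ValueError("no root of unity found")


def ntt(vec, w, P):
    """Naive O(n^2) number-theoretic transform."""
    return [sum(v * pow(w, i * j, P) for j, v in enumerate(vec)) % P
            for i in range(len(vec))]


def poly_mul_mod_p(a, b, p):
    """Product of the polynomials a and b with coefficients taken modulo p."""
    n = len(a) + len(b) - 1  # length of the product, so nothing wraps around
    fa = [x % p for x in a] + [0] * (n - len(a))
    fb = [x % p for x in b] + [0] * (n - len(b))
    P = find_ntt_modulus(n, n * (p - 1) ** 2)  # P > every integer coefficient
    w = find_root_of_unity(n, P)
    fc = [x * y % P for x, y in zip(ntt(fa, w, P), ntt(fb, w, P))]
    inv_n, inv_w = pow(n, P - 2, P), pow(w, P - 2, P)
    exact = [x * inv_n % P for x in ntt(fc, inv_w, P)]  # integer product
    return [x % p for x in exact]


print(poly_mul_mod_p([24, 12, 28, 8], [4, 26, 29, 23], 31))
# → [3, 21, 4, 17, 25, 8, 29]
```

The bound n.((p-1)^2) is the conservative one from the answer; the working prime P is unrelated to p apart from having to exceed that bound.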

When will this Recurrence Relation repeat

I have this recurrence formula:
P(n) = ( P(n-1) + 2^(n/2) ) % (X)
s.t. P(1) = 2;
where n/2 is computer integer division, i.e. the floor of n/2.
Since I am taking mod X, this relation should repeat at least within X outputs,
but it can start repeating before that.
How to find this value?
It needn't repeat within X terms, consider X = 3:
P(1) = 2
P(2) = (P(1) + 2^(2/2)) % 3 = 4 % 3 = 1
P(3) = (P(2) + 2^(3/2)) % 3 = (1 + 2) % 3 = 0
P(4) = (P(3) + 2^(4/2)) % 3 = 4 % 3 = 1
P(5) = (P(4) + 2^(5/2)) % 3 = (1 + 4) % 3 = 2
P(6) = (P(5) + 2^(6/2)) % 3 = (2 + 8) % 3 = 1
P(7) = (P(6) + 2^(7/2)) % 3 = (1 + 8) % 3 = 0
P(8) = (P(7) + 2^(8/2)) % 3 = 16 % 3 = 1
P(9) = (P(8) + 2^(9/2)) % 3 = (1 + 16) % 3 = 2
P(10) = (P(9) + 2^(10/2)) % 3 = (2 + 32) % 3 = 1
P(11) = (P(10) + 2^(11/2)) % 3 = (1 + 32) % 3 = 0
P(12) = (P(11) + 2^(12/2)) % 3 = (0 + 64) % 3 = 1
and you see that the period is 4.
Generally (suppose X is odd, it's a bit more involved for even X), let k be the period of 2 modulo X, i.e. k > 0, 2^k % X = 1, and k is minimal with these properties (see below).
Consider all arithmetic modulo X. Then
P(n) = 2 + ∑_{j=2}^{n} 2^(j/2)
It is easier to see when we separately consider odd and even n:
P(2*m+1) = 2 + 2 * ∑_{i=1}^{m} 2^i = 2 * ∑_{i=0}^{m} 2^i = 2*(2^(m+1) - 1) = 2^((n+2)/2) + 2^((n+1)/2) - 2
since each 2^j appears twice, for j = 2*i and j = 2*i+1. For even n = 2*m, there's one summand 2^m missing, so
P(2*m) = 2^(m+1) + 2^m - 2 = 2^((n+2)/2) + 2^((n+1)/2) - 2
and we see that the length of the period is 2*k, since the changing parts 2^((n+1)/2) and 2^((n+2)/2) have that period. The period immediately begins, there is no pre-period part (there can be a pre-period for even X).
Now k <= φ(X) by Euler's generalisation of Fermat's theorem, so the period is at most 2 * φ(X).
(φ is Euler's totient function, i.e. φ(n) is the number of integers 1 <= k <= n with gcd(n,k) = 1.)
What makes it possible that the period is longer than X is that P(n+1) is not completely determined by P(n), the value of n also plays a role in determining P(n+1), in this case the dependence is simple, each power of 2 being used twice in succession doubles the period of the pure powers of 2.
Consider the sequence a[k] = (2^k) % X for odd X > 1. It has the simple recurrence
a[0] = 1
a[k+1] = (2 * a[k]) % X
so each value completely determines the next, thus the entire following part of the sequence. (Since X is assumed odd, it also determines the previous value [if k > 0] and thus the entire previous part of the sequence. With H = (X+1)/2, we have a[k-1] = (H * a[k]) % X.)
Hence if the sequence assumes one value twice (and since there are only X possible values, that must happen within the first X+1 values), at indices i and j = i+p > i, say, the sequence repeats and we have a[k+p] = a[k] for all k >= i. For odd X, we can go back in the sequence, therefore a[k+p] = a[k] also holds for 0 <= k < i. Thus the first value that occurs twice in the sequence is a[0] = 1.
Let p be the smallest positive integer with a[p] = 1. Then p is the length of the smallest period of the sequence a, and a[k] = 1 if and only if k is a multiple of p, thus the set of periods of a is the set of multiples of p. Euler's theorem says that a[φ(X)] = 1, from that we can conclude that p is a divisor of φ(X), in particular p <= φ(X) < X.
Now back to the original sequence.
P(n) = 2 + a[1] + a[1] + a[2] + a[2] + ... + a[n/2]
= a[0] + a[0] + a[1] + a[1] + a[2] + a[2] + ... + a[n/2]
Since each a[k] is used twice in succession, it is natural to examine the subsequences for even and odd indices separately,
E[m] = P(2*m)
O[m] = P(2*m+1)
then the transition from one value to the next is more regular. For the even indices we find
E[m+1] = E[m] + a[m] + a[m+1] = E[m] + 3*a[m]
and for the odd indices
O[m+1] = O[m] + a[m+1] + a[m+1] = O[m] + 2*a[m+1]
Now if we ignore the modulus for the moment, both E and O are geometric sums, so there's an easy closed formula for the terms. They have been given above (in slightly different form),
E[m] = 3 * 2^m - 2 = 3 * a[m] - 2
O[m] = 2 * 2^(m+1) - 2 = 2 * a[m+1] - 2 = a[m+2] - 2
So we see that O has the same (minimal) period as a, namely p, and E also has that period. Unless maybe if X is divisible by 3, that is also the minimal (positive) period of E (if X is divisible by 3, the minimal positive period of E could be a proper divisor of p, for X = 3 e.g., E is constant).
Thus we see that 2*p is a period of the sequence P obtained by interlacing E and O.
It remains to be seen that 2*p is the minimal positive period of P. Let m be the minimal positive period. Then m is a divisor of 2*p.
Suppose m were odd, m = 2*j+1. Then
P(1) = P(m+1) = P(2*m+1)
P(2) = P(m+2) = P(2*m+2)
and consequently
P(2) - P(1) = P(m+2) - P(m+1) = P(2*m+2) - P(2*m+1)
But P(2) - P(1) = a[1] and
P(m+2) - P(m+1) = a[(m+2)/2] = a[j+1]
P(2*m+2) - P(2*m+1) = a[(2*m+2)/2] = a[m+1] = a[2*j+2]
So we must have a[1] = a[j+1], hence j is a period of a, and a[j+1] = a[2*j+2], hence j+1 is a period of a too. But that means that 1 is a period of a, which implies X = 1, a contradiction.
Therefore m is even, m = 2*j. But then j is a period of O (and of E), thus a multiple of p. On the other hand, m <= 2*p implies j <= p, and the only (positive) multiple of p satisfying that inequality is p itself, hence j = p, m = 2*p.
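The result can be checked numerically for small odd X. The sketch below computes the multiplicative order p of 2 modulo X, generates the sequence P, and searches the divisors of 2*p for the minimal period; it relies on 2*p being a period, as shown above. The function names are made up for this illustration.

```python
def order_of_two(X):
    """Smallest k > 0 with 2**k % X == 1 (X odd, X > 1)."""
    k, v = 1, 2 % X
    while v != 1:
        v = v * 2 % X
        k += 1
    return k


def p_sequence(X, count):
    """First `count` terms P(1), P(2), ... of the recurrence modulo X."""
    terms = [2 % X]
    for n in range(2, count + 1):
        terms.append((terms[-1] + pow(2, n // 2, X)) % X)
    return terms


def minimal_period(X):
    """Minimal period of P for odd X > 1, searched among divisors of 2*p."""
    full = 2 * order_of_two(X)
    terms = p_sequence(X, 2 * full)
    for d in range(1, full + 1):
        if full % d == 0 and all(terms[i] == terms[i + d] for i in range(full)):
            return d


for X in (3, 5, 7, 9, 15):
    print(X, order_of_two(X), minimal_period(X))
```

For X = 3 this reproduces the period 4 from the worked example.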
