Functionally idiomatic FFT - functional-programming

I've written the this radix-2 FFT with the goal of making it functionally idiomatic without sacrificing too much performance:
let reverse x bits =
let rec reverse' x bits y =
match bits with
| 0 -> y
| _ -> ((y <<< 1) ||| (x &&& 1))
|> reverse' (x >>> 1) (bits - 1)
reverse' x bits 0
let radix2 (vector: Complex[]) (direction: int) =
let z = vector.Length
let depth = floor(Math.Log(double z, 2.0)) |> int
if (1 <<< depth) <> z then failwith "Vector length is not a power of 2"
// Complex roots of unity; "twiddle factors"
let unity: Complex[] =
let xpn = float direction * Math.PI / double z
Array.Parallel.init<Complex> (z/2) (fun i ->
Complex.FromPolarCoordinates(1.0, (float i) * xpn))
// Permutes elements of input vector via bit-reversal permutation
let pvec = Array.Parallel.init z (fun i -> vector.[reverse i depth])
let outerLoop (vec: Complex[]) =
let rec recLoop size =
if size <= z then
let mid, step = size / 2, z / size
let rec inrecLoop i =
if i < z then
let rec bottomLoop idx k =
if idx < i + mid then
let temp = vec.[idx + mid] * unity.[k]
vec.[idx + mid] <- (vec.[idx] - temp)
vec.[idx] <- (vec.[idx] + temp)
bottomLoop (idx + 1) (k + step)
bottomLoop i 0
inrecLoop (i + size)
inrecLoop 0
recLoop (size * 2)
recLoop 2
vec
outerLoop pvec
The outerLoop segment is the biggest nested tail-recursive mess I have ever written. I replicated the algorithm in the Wikipedia article for the Cooley-Tukey algorithm, but the only functional constructs I could think to implement using higher-order functions result in massive hits to both performance and memory efficiency. Are there other solutions that would yield the same results without resulting in massive slow-downs, while still being idiomatic?

I'm not an expert on how the algorithm works, so there might be a nice functional implementation, but it is worth noting that using a localised mutation is perfectly idiomatic in F#.
Your radix2 function is functional from the outside - it takes vector array as an input, never mutates it, creates a new array pvec which it then initializes (using some mutation along the way) and then returns it. This is a similar pattern to what built-in functions like Array.map use (which initializes a new array, mutates it and then returns it). This is often a sensible way of doing things, because some algorithms are better written using mutation.
In this case, it's perfectly reasonable to also use local mutable variables and loops. Doing that will make your code more readable compared to the tail-recursive version. I have not tested this, but my naive translation of your outerLoop function would just be to use three nested loops - something like this:
let mutable size = 2
while size <= z do
let mid, step = size / 2, z / size
let mutable i = 0
while i < z do
for j in 0 .. mid - 1 do
let idx, k = i + j, step * j
let temp = pvec.[idx + mid] * unity.[k]
pvec.[idx + mid] <- (pvec.[idx] - temp)
pvec.[idx] <- (pvec.[idx] + temp)
i <- i + size
size <- size * 2
This might not be exactly right (I did this just be refactoring your code), but I think it's actually more idiomatic than using complex nested tail-recursive functions in this case.

Related

implementing an algorithm to transform a real number to a continued fraction in #F

i am trying to implement a recursive function which takes a float and returns a list of ints representing the continued fraction representation of the float (https://en.wikipedia.org/wiki/Continued_fraction) In general i think i understand how the algorithm is supposed to work. its fairly simply. What i have so far is this:
let rec float2cfrac (x : float) : int list =
let q = int x
let r = x - (float q)
if r = 0.0 then
[]
else
q :: (float2cfrac (1.0 / r ))
the problem is with the base case obviously. It seems the value r never does reduce to 0.0 instead the algorithm keeps on returning values which are the likes of 0.0.....[number]. I am just not sure how to perform the comparison. How exactly should i go about it. The algorithm the function is based on says the base case is 0, so i naturally interpret this as 0.0. I dont see any other way. Also, do note that this is for an assignment where i am explicitly asked to implement the algorithm recursively. Does anyone have some guidance for me? It would be much appreciated
It seems the value r never does reduce to 0.0 instead the algorithm keeps on returning values which are the likes of 0.0.....[number].
This is a classic issue with floating point comparisons. You need to use some epsilon tolerance value for comparisons, because r will never reach exactly 0.0:
let epsilon = 0.0000000001
let rec float2cfrac (x : float) : int list =
let q = int x
let r = x - (float q)
if r < epsilon then
[]
else
q :: (float2cfrac (1.0 / r))
> float2cfrac 4.23
val it : int list = [4; 4; 2; 1]
See this MSDN documentation for more.
You could define a helper function for this:
let withinTolerance (x: float) (y: float) e =
System.Math.Abs(x - y) < e
Also note your original solution isn't tail-recursive, so it consumes stack as it recurses and could overflow the stack. You could refactor it such that a float can be unfolded without recursion:
let float2cfrac (x: float) =
let q = int x
let r = x - (float q)
if withinTolerance r 0.0 epsilon then None
else Some (q, (1.0 / r))
4.23 |> Seq.unfold float2cfrac // seq [4; 4; 2; 1]

Is it safe to replace "a/(b*c)" with "a/b/c" when using integer-division?

Is it safe to replace a/(b*c) with a/b/c when using integer-division on positive integers a,b,c, or am I at risk losing information?
I did some random tests and couldn't find an example of a/(b*c) != a/b/c, so I'm pretty sure it's safe but not quite sure how to prove it.
Thank you.
Mathematics
As mathematical expressions, ⌊a/(bc)⌋ and ⌊⌊a/b⌋/c⌋ are equivalent whenever b is nonzero and c is a positive integer (and in particular for positive integers a, b, c). The standard reference for these sorts of things is the delightful book Concrete Mathematics: A Foundation for Computer Science by Graham, Knuth and Patashnik. In it, Chapter 3 is mostly on floors and ceilings, and this is proved on page 71 as a part of a far more general result:
In the 3.10 above, you can define x = a/b (mathematical, i.e. real division), and f(x) = x/c (exact division again), and plug those into the result on the left ⌊f(x)⌋ = ⌊f(⌊x⌋)⌋ (after verifying that the conditions on f hold here) to get ⌊a/(bc)⌋ on the LHS equal to ⌊⌊a/b⌋/c⌋ on the RHS.
If we don't want to rely on a reference in a book, we can prove ⌊a/(bc)⌋ = ⌊⌊a/b⌋/c⌋ directly using their methods. Note that with x = a/b (the real number), what we're trying to prove is that ⌊x/c⌋ = ⌊⌊x⌋/c⌋. So:
if x is an integer, then there is nothing to prove, as x = ⌊x⌋.
Otherwise, ⌊x⌋ < x, so ⌊x⌋/c < x/c which means that ⌊⌊x⌋/c⌋ ≤ ⌊x/c⌋. (We want to show it's equal.) Suppose, for the sake of contradiction, that ⌊⌊x⌋/c⌋ < ⌊x/c⌋ then there must be a number y such that ⌊x⌋ < y ≤ x and y/c = ⌊x/c⌋. (As we increase a number from ⌊x⌋ to x and consider division by c, somewhere we must hit the exact value ⌊x/c⌋.) But this means that y = c*⌊x/c⌋ is an integer between ⌊x⌋ and x, which is a contradiction!
This proves the result.
Programming
#include <stdio.h>
int main() {
unsigned int a = 142857;
unsigned int b = 65537;
unsigned int c = 65537;
printf("a/(b*c) = %d\n", a/(b*c));
printf("a/b/c = %d\n", a/b/c);
}
prints (with 32-bit integers),
a/(b*c) = 1
a/b/c = 0
(I used unsigned integers as overflow behaviour for them is well-defined, so the above output is guaranteed. With signed integers, overflow is undefined behaviour, so the program can in fact print (or do) anything, which only reinforces the point that the results can be different.)
But if you don't have overflow, then the values you get in your program are equal to their mathematical values (that is, a/(b*c) in your code is equal to the mathematical value ⌊a/(bc)⌋, and a/b/c in code is equal to the mathematical value ⌊⌊a/b⌋/c⌋), which we've proved are equal. So it is safe to replace a/(b*c) in code by a/b/c when b*c is small enough not to overflow.
While b*c could overflow (in C) for the original computation, a/b/c can't overflow, so we don't need to worry about overflow for the forward replacement a/(b*c) -> a/b/c. We would need to worry about it the other way around, though.
Let x = a/b/c. Then a/b == x*c + y for some y < c, and a == (x*c + y)*b + z for some z < b.
Thus, a == x*b*c + y*b + z. y*b + z is at most b*c-1, so x*b*c <= a <= (x+1)*b*c, and a/(b*c) == x.
Thus, a/b/c == a/(b*c), and replacing a/(b*c) by a/b/c is safe.
Nested floor division can be reordered as long as you keep track of your divisors and dividends.
#python3.x
x // m // n = x // (m * n)
#python2.x
x / m / n = x / (m * n)
Proof (sucks without LaTeX :( ) in python3.x:
Let k = x // m
then k - 1 < x / m <= k
and (k - 1) / n < x / (m * n) <= k / n
In addition, (x // m) // n = k // n
and because x // m <= x / m and (x // m) // n <= (x / m) // n
k // n <= x // (m * n)
Now, if k // n < x // (m * n)
then k / n < x / (m * n)
and this contradicts the above statement that x / (m * n) <= k / n
so if k // n <= x // (m * n) and k // n !< x // (m * n)
then k // n = x // (m * n)
and (x // m) // n = x // (m * n)
https://en.wikipedia.org/wiki/Floor_and_ceiling_functions#Nested_divisions

Dynamic programming problems using iteration

I have spent a lot of time to learn about implementing/visualizing dynamic programming problems using iteration but I find it very hard to understand, I can implement the same using recursion with memoization but it is slow when compared to iteration.
Can someone explain the same by a example of a hard problem or by using some basic concepts. Like the matrix chain multiplication, longest palindromic sub sequence and others. I can understand the recursion process and then memoize the overlapping sub problems for efficiency but I can't understand how to do the same using iteration.
Thanks!
Dynamic programming is all about solving the sub-problems in order to solve the bigger one. The difference between the recursive approach and the iterative approach is that the former is top-down, and the latter is bottom-up. In other words, using recursion, you start from the big problem you are trying to solve and chop it down to a bit smaller sub-problems, on which you repeat the process until you reach the sub-problem so small you can solve. This has an advantage that you only have to solve the sub-problems that are absolutely needed and using memoization to remember the results as you go. The bottom-up approach first solves all the sub-problems, using tabulation to remember the results. If we are not doing extra work of solving the sub-problems that are not needed, this is a better approach.
For a simpler example, let's look at the Fibonacci sequence. Say we'd like to compute F(101). When doing it recursively, we will start with our big problem - F(101). For that, we notice that we need to compute F(99) and F(100). Then, for F(99) we need F(97) and F(98). We continue until we reach the smallest solvable sub-problem, which is F(1), and memoize the results. When doing it iteratively, we start from the smallest sub-problem, F(1) and continue all the way up, keeping the results in a table (so essentially it's just a simple for loop from 1 to 101 in this case).
Let's take a look at the matrix chain multiplication problem, which you requested. We'll start with a naive recursive implementation, then recursive DP, and finally iterative DP. It's going to be implemented in a C/C++ soup, but you should be able to follow along even if you are not very familiar with them.
/* Solve the problem recursively (naive)
p - matrix dimensions
n - size of p
i..j - state (sub-problem): range of parenthesis */
int solve_rn(int p[], int n, int i, int j) {
// A matrix multiplied by itself needs no operations
if (i == j) return 0;
// A minimal solution for this sub-problem, we
// initialize it with the maximal possible value
int min = std::numeric_limits<int>::max();
// Recursively solve all the sub-problems
for (int k = i; k < j; ++k) {
int tmp = solve_rn(p, n, i, k) + solve_rn(p, n, k + 1, j) + p[i - 1] * p[k] * p[j];
if (tmp < min) min = tmp;
}
// Return solution for this sub-problem
return min;
}
To compute the result, we starts with the big problem:
solve_rn(p, n, 1, n - 1)
The key of DP is to remember all the solutions to the sub-problems instead of forgetting them, so we don't need to recompute them. It's trivial to make a few adjustments to the above code in order to achieve that:
/* Solve the problem recursively (DP)
p - matrix dimensions
n - size of p
i..j - state (sub-problem): range of parenthesis */
int solve_r(int p[], int n, int i, int j) {
/* We need to remember the results for state i..j.
This can be done in a matrix, which we call dp,
such that dp[i][j] is the best solution for the
state i..j. We initialize everything to 0 first.
static keyword here is just a C/C++ thing for keeping
the matrix between function calls, you can also either
make it global or pass it as a parameter each time.
MAXN is here too because the array size when doing it like
this has to be a constant in C/C++. I set it to 100 here.
But you can do it some other way if you don't like it. */
static int dp[MAXN][MAXN] = {{0}};
/* A matrix multiplied by itself has 0 operations, so we
can just return 0. Also, if we already computed the result
for this state, just return that. */
if (i == j) return 0;
else if (dp[i][j] != 0) return dp[i][j];
// A minimal solution for this sub-problem, we
// initialize it with the maximal possible value
dp[i][j] = std::numeric_limits<int>::max();
// Recursively solve all the sub-problems
for (int k = i; k < j; ++k) {
int tmp = solve_r(p, n, i, k) + solve_r(p, n, k + 1, j) + p[i - 1] * p[k] * p[j];
if (tmp < dp[i][j]) dp[i][j] = tmp;
}
// Return solution for this sub-problem
return dp[i][j];;
}
We start with the big problem as well:
solve_r(p, n, 1, n - 1)
Iterative solution is only to, well, iterate all the states, instead of starting from the top:
/* Solve the problem iteratively
p - matrix dimensions
n - size of p
We don't need to pass state, because we iterate the states. */
int solve_i(int p[], int n) {
// But we do need our table, just like before
static int dp[MAXN][MAXN];
// Multiplying a matrix by itself needs no operations
for (int i = 1; i < n; ++i)
dp[i][i] = 0;
// L represents the length of the chain. We go from smallest, to
// biggest. Made L capital to distinguish letter l from number 1
for (int L = 2; L < n; ++L) {
// This double loop goes through all the states in the current
// chain length.
for (int i = 1; i <= n - L + 1; ++i) {
int j = i + L - 1;
dp[i][j] = std::numeric_limits<int>::max();
for (int k = i; k <= j - 1; ++k) {
int tmp = dp[i][k] + dp[k+1][j] + p[i-1] * p[k] * p[j];
if (tmp < dp[i][j])
dp[i][j] = tmp;
}
}
}
// Return the result of the biggest problem
return dp[1][n-1];
}
To compute the result, just call it:
solve_i(p, n)
Explanation of the loop counters in the last example:
Let's say we need to optimize the multiplication of 4 matrices: A B C D. We are doing an iterative approach, so we will first compute the chains with the length of two: (A B) C D, A (B C) D, and A B (C D). And then chains of three: (A B C) D, and A (B C D). That is what L, i and j are for.
L represents the chain length, it goes from 2 to n - 1 (n is 4 in this case, so that is 3).
i and j represent the starting and ending position of the chain. In case L = 2, i goes from 1 to 3, and j goes from 2 to 4:
(A B) C D A (B C) D A B (C D)
^ ^ ^ ^ ^ ^
i j i j i j
In case L = 3, i goes from 1 to 2, and j goes from 3 to 4:
(A B C) D A (B C D)
^ ^ ^ ^
i j i j
So generally, i goes from 1 to n - L + 1, and j is i + L - 1.
Now, let's continue with the algorithm assuming that we are at the step where we have (A B C) D. We now need to take into account the sub-problems (which are already calculated): ((A B) C) D and (A (B C)) D. That is what k is for. It goes through all the positions between i and j and computes the sub problems.
I hope I helped.
The problem with recursion is the high number of stack frames that need to be pushed/popped. This can quickly become the bottle-neck.
The Fibonacci Series can be calculated with iterative DP or recursion with memoization. If we calculate F(100) in DP all we need is an array of length 100 e.g. int[100] and that's the guts of our used memory. We calculate all entries of the array pre-filling f[0] and f[1] as they are defined to be 1. and each value just depends on the previous two.
If we use a recursive solution we start at fib(100) and work down. Every method call from 100 down to 0 is pushed onto the stack, AND checked if it's memoized. These operations add up and iteration doesn't suffer from either of these. In iteration (bottom-up) we already know all of the previous answers are valid. The bigger impact is probably the stack frames; and given a larger input you may get a StackOverflowException for what was otherwise trivial with an iterative DP approach.

Reversing an int in OCaml

I'm teaching myself OCaml, and the main resources I'm using for practice are some problem sets Cornell has made available from their 3110 class. One of the problems is to write a function to reverse an int (i.e: 1234 -> 4321, -1234 -> -4321, 2 -> 2, -10 -> -1 etc).
I have a working solution, but I'm concerned that it isn't exactly idiomatic OCaml:
let rev_int (i : int) : int =
let rec power cnt value =
if value / 10 = 0 then cnt
else power (10 * cnt) (value/10) in
let rec aux pow temp value =
if value <> 0 then aux (pow/10) (temp + (value mod 10 * pow)) (value / 10)
else temp in
aux (power 1 i) 0 i
It works properly in all cases as far as I can tell, but it just seems seriously "un-OCaml" to me, particularly because I'm running through the length of the int twice with two inner-functions. So I'm just wondering whether there's a more "OCaml" way to do this.
I would say, that the following is idiomatic enough.
(* [rev x] returns such value [y] that its decimal representation
is a reverse of decimal representation of [x], e.g.,
[rev 12345 = 54321] *)
let rev n =
let rec loop acc n =
if n = 0 then acc
else loop (acc * 10 + n mod 10) (n / 10) in
loop 0 n
But as Jeffrey said in a comment, your solution is quite idiomatic, although not the nicest one.
Btw, my own style, would be to write like this:
let rev n =
let rec loop acc = function
| 0 -> acc
| n -> loop (acc * 10 + n mod 10) (n / 10) in
loop 0 n
As I prefer pattern matching to if/then/else. But this is a matter of mine personal taste.
I can propose you some way of doing it:
let decompose_int i =
let r = i / 10 in
i - (r * 10) , r
This function allows me to decompose the integer as if I had a list.
For instance 1234 is decomposed into 4 and 123.
Then we reverse it.
let rec rev_int i = match decompose_int i with
| x , 0 -> 10 , x
| h , t ->
let (m,r) = rev_int t in
(10 * m, h * m + r)
The idea here is to return 10, 100, 1000... and so on to know where to place the last digit.
What I wanted to do here is to treat them as I would treat lists, decompose_int being a List.hd and List.tl equivalent.

Memoization in OCaml?

It is possible to improve "raw" Fibonacci recursive procedure
Fib[n_] := If[n < 2, n, Fib[n - 1] + Fib[n - 2]]
with
Fib[n_] := Fib[n] = If[n < 2, n, Fib[n - 1] + Fib[n - 2]]
in Wolfram Mathematica.
First version will suffer from exponential explosion while second one will not since Mathematica will see repeating function calls in expression and memoize (reuse) them.
Is it possible to do the same in OCaml?
How to improve
let rec fib n = if n<2 then n else fib (n-1) + fib (n-2);;
in the same manner?
The solution provided by rgrinberg can be generalized so that we can memoize any function. I am going to use associative lists instead of hashtables. But it does not really matter, you can easily convert all my examples to use hashtables.
First, here is a function memo which takes another function and returns its memoized version. It is what nlucaroni suggested in one of the comments:
let memo f =
let m = ref [] in
fun x ->
try
List.assoc x !m
with
Not_found ->
let y = f x in
m := (x, y) :: !m ;
y
The function memo f keeps a list m of results computed so far. When asked to compute f x it first checks m to see if f x has been computed already. If yes, it returns the result, otherwise it actually computes f x, stores the result in m, and returns it.
There is a problem with the above memo in case f is recursive. Once memo calls f to compute f x, any recursive calls made by f will not be intercepted by memo. To solve this problem we need to do two things:
In the definition of such a recursive f we need to substitute recursive calls with calls to a function "to be provided later" (this will be the memoized version of f).
In memo f we need to provide f with the promised "function which you should call when you want to make a recursive call".
This leads to the following solution:
let memo_rec f =
let m = ref [] in
let rec g x =
try
List.assoc x !m
with
Not_found ->
let y = f g x in
m := (x, y) :: !m ;
y
in
g
To demonstrate how this works, let us memoize the naive Fibonacci function. We need to write it so that it accepts an extra argument, which I will call self. This argument is what the function should use instead of recursively calling itself:
let fib self = function
0 -> 1
| 1 -> 1
| n -> self (n - 1) + self (n - 2)
Now to get the memoized fib, we compute
let fib_memoized = memo_rec fib
You are welcome to try it out to see that fib_memoized 50 returns instantly. (This is not so for memo f where f is the usual naive recursive definition.)
You pretty much do what the mathematica version does but manually:
let rec fib =
let cache = Hashtbl.create 10 in
begin fun n ->
try Hashtbl.find cache n
with Not_found -> begin
if n < 2 then n
else
let f = fib (n-1) + fib (n-2) in
Hashtbl.add cache n f; f
end
end
Here I choose a hashtable to store already computed results instead of recomputing them.
Note that you should still beware of integer overflow since we are using a normal and not a big int.

Resources