How to shuffle a list in O(n) in OCaml?

It is not hard to shuffle an array in O(n) with in-place swapping.
How can the same be done for a list in OCaml, in O(n)?
Requirements:
No arrays and no in-place mutation
Consider this an interview question

Lists are immutable, and there's often a log n price to pay for working with immutable data. If you're willing to pay this cost, there's an obvious n log n approach: tag each list element with a random value, sort based on random value, remove random values. This is the way I shuffle lists in my production code.
Here is the shuffle code from the iOS apps that I sell:
let shuffle d =
    let nd = List.map (fun c -> (Random.bits (), c)) d in
    let sond = List.sort compare nd in
    List.map snd sond

You could mimic the riffle shuffle used for cards.
A riffle shuffle of a deck of cards means to:
cut the deck in two parts
interleave the two parts
It is actually easier to do the reverse permutation:
have two auxiliary lists A and B, iterate through your original list L and push each element randomly (with probability 1/2) onto the front of A or B.
L := List.rev A @ List.rev B (this can be made tail-recursive with a custom List.rev).
repeat k times.
According to "Mathematical developments from the analysis of riffle shuffling, by Persi Diaconis, 2002", choose k = 3/2 log_2(n) + c. Indeed, the total variation distance between uniformity and the result falls exponentially fast to 0: it is approximately halved each time you increment c. You could choose c=10.
Space is O(1) (if you destroy L) and time is O(n log n). However, this makes O(n log n) calls to the random generator, whereas Jeffrey Scofield's solution needs only O(n) random bits, at the cost of Θ(n) space.
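The inverse riffle steps above can be sketched as follows. This is an illustrative Python sketch rather than OCaml, and the names inverse_riffle and riffle_shuffle, the rng parameter and the default c = 10 are assumptions made for the example. Appending to A and B and then concatenating gives the same order as pushing to the front and reversing both lists.

```python
import math
import random

def inverse_riffle(items, rng):
    """One inverse riffle: send each element to A or B with probability 1/2."""
    a, b = [], []
    for x in items:
        (a if rng.random() < 0.5 else b).append(x)
    # Appending keeps the original relative order inside A and inside B,
    # which matches "push to the front, then reverse" in the description.
    return a + b

def riffle_shuffle(items, c=10, rng=None):
    """Repeat the inverse riffle k = ceil(3/2 * log2(n)) + c times."""
    rng = rng or random.Random()
    n = len(items)
    if n < 2:
        return list(items)
    k = math.ceil(1.5 * math.log2(n)) + c
    result = list(items)
    for _ in range(k):
        result = inverse_riffle(result, rng)
    return result
```

For example, riffle_shuffle(list(range(52))) returns a permutation of the 52 inputs after 19 passes.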

Related

Propositional logic, logical equivalent

a) Determine whether the following statement forms are logically equivalent:
p -> (q -> r) and (p -> q) -> r
b) Use the logical equivalence established in part (a) to rewrite the following sentence in two different ways. (Assume that n represents a fixed integer.)
If n is prime, then n is odd or n is 2.
Can someone help me with part (b)? It's really confusing.
If n is prime, then n is odd or n is 2.
The question is asking you to rewrite the sentence in two different ways in English:
If n is prime and n is not odd, then n is 2.
If n is prime and n is not 2, then n is odd.
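The two rewrites can be checked mechanically with a truth table. The sketch below is a Python illustration; it assumes the reading p = "n is prime", q = "n is odd", r = "n is 2", and the helper names are made up for the example. It also checks the pair of forms quoted in part (a), which comes out non-equivalent under this check.

```python
from itertools import product

def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b

def equivalent(f, g, arity=3):
    """True if f and g agree on every assignment of truth values."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=arity))

# Forms quoted in part (a)
def nested_right(p, q, r):
    return implies(p, implies(q, r))      # p -> (q -> r)

def nested_left(p, q, r):
    return implies(implies(p, q), r)      # (p -> q) -> r

# The sentence in part (b) and the two rewrites from the answer
def original(p, q, r):
    return implies(p, q or r)             # prime -> (odd or is 2)

def rewrite_1(p, q, r):
    return implies(p and not q, r)        # prime and not odd -> is 2

def rewrite_2(p, q, r):
    return implies(p and not r, q)        # prime and not 2 -> odd
```

Calling equivalent(original, rewrite_1) and equivalent(original, rewrite_2) compares the sentence with each rewrite over all eight assignments.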
The following links do a better job of explaining it:
Logically_Equivalent_Statements
Exercises on Logic of Compound Statements and Valid Arguments

Find a function between two arrays that minimises distance between pairs

I will explain my problem in a general setting (as I am interested in a general algorithm), then specialize it to my particular case.
Say we have two finite sets, A and B, both subsets of X and a distance function d that assigns a distance between any two points of X.
What is an algorithm to find two functions, f1 from A to B and f2 from B to A, such that f1(a) is the element in B that is closest to a, and vice versa for f2?
My special case is in the R language, where I have two sets of points on earth (lat, lon) and I need to pair them up (from A to B and vice versa) according to their distance.
For reference, I am using the Haversine distance from geosphere package.
Thanks in advance.
Just mentioning, this is an algorithmic solution for an algorithmic problem.
Let's begin with a solution with O(n^2) time and memory complexity. For each element in A, compute the distance to each element in B. Then iterate over this two-dimensional array and find the minimum of each row: these elements are the image of f1. f2 is obtained the same way from the column minima; note that it is not, in general, the inverse of f1, because nearest-neighbour relations are not symmetric.
Now we can create a similar solution with O(n log n) time complexity and O(n) memory complexity, using binary search.
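A minimal sketch of the quadratic approach, written in Python for illustration. The haversine_km function is a hand-written stand-in for the geosphere function mentioned in the question, the mean Earth radius of 6371 km is an assumption, and points are assumed to be (lat, lon) tuples in degrees.

```python
import math

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))

def nearest_map(src, dst, dist=haversine_km):
    """Map every point of src to its closest point in dst (O(|src| * |dst|))."""
    return {a: min(dst, key=lambda b: dist(a, b)) for a in src}

A = [(0.0, 0.0), (10.0, 10.0)]
B = [(1.0, 1.0), (9.0, 9.0), (50.0, 50.0)]
f1 = nearest_map(A, B)   # from A to B
f2 = nearest_map(B, A)   # from B to A, computed independently
```

Here (50.0, 50.0) maps to (10.0, 10.0) under f2 even though no point of A maps to it under f1, so both directions have to be computed separately.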
Sort the elements of B in such a way that the closest item to any given query can be found in O(log n). With numbers this is done just by sorting them; with lon and lat the suggestion is to sort first by lon, then by lat (note that a lexicographic sort alone does not guarantee the true nearest neighbour in two dimensions; a spatial index such as a k-d tree is the usual tool for that).
Now, for each element in A, find the closest item in B using binary search. This takes O(log n) per query, so finding the closest item for every element takes O(n log n) in total.
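For the one-dimensional case (plain numbers), the binary-search step might look like the sketch below. It is written in Python for illustration, the function name is hypothetical, and it assumes sorted_vals is non-empty and already sorted. It does not cover the two-dimensional lat/lon case.

```python
from bisect import bisect_left

def nearest_in_sorted(sorted_vals, x):
    """Return the element of sorted_vals closest to x in O(log n)."""
    i = bisect_left(sorted_vals, x)
    # The closest value is either just before or just at the insertion point.
    candidates = sorted_vals[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda v: abs(v - x))

def nearest_map_1d(src, dst):
    """Map each number in src to its closest number in dst, O(n log n) overall."""
    ordered = sorted(dst)
    return {a: nearest_in_sorted(ordered, a) for a in src}
```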

Find paths of length = 4, starting by an adjacency matrix of a directed graph, considering only distinct edges?

Given an EREW-PRAM model, which allows me to use an arbitrary number of processors in parallel without conflicts in either read or write access, I need to find the number of paths of length 4. The input is a node-node adjacency matrix A representing a directed graph, and I need to exclude paths that don't use distinct edges (e.g. (a,b),(b,a),(a,b),(b,a) is not a valid path).
I have a function that uses n^3 processors and calculates the matrix multiplication of two given matrices in time O(log n):
mult-matrix(A, A, n) => B --> gives me the paths of length 2.
mult-matrix(B, B, n) => C --> gives me the paths of length 4, but I think it considers paths that run across the same edges.
I tried subtracting 1 from elements of C that have a node u communicating with a node v in both directions, but I'm not sure it works.
How could I solve the problem considering that I just need to exclude some paths from the resulting matrix C?
Any working solution is appreciated, considering that the number of processors is constrained to n^3 and time must be O(logn) in the worst case. The exercises must be solved using a pseudo-pascal language, but given a working solution, I should be able to write the pseudocode by myself.
I think I found a solution in https://www.perlmonks.org/?node_id=522270
Given an input matrix A, I am able to calculate the adjacency matrix for paths of length 2, 3 and 4 with the provided function.
A2 is the adjacency matrix obtained by multiplying A*A and contains paths of length 2
A3 is obtained by multiplying A2*A and contains paths of length 3
A4 is obtained by multiplying A3*A and contains paths of length 4
In order to exclude the repeated edges, I have to compute the matrix C, obtained by doing an element-wise subtraction among the calculated matrices.
C[i,j] = A4[i,j] - A3[i,j] - A2[i,j] - A[i,j]
C contains the final result.
The following pseudocode solves the problem with an EREW-PRAM using O(n^3) processors and in time O(logn).
procedure paths_length_4(A, n) // Work = O(n^3 logn)
begin
    A2 := mult_matrix(A, A, n) // T=O(logn), P=O(n^3)
    A3 := mult_matrix(A2, A, n) // T=O(logn), P=O(n^3)
    A4 := mult_matrix(A3, A, n) // T=O(logn), P=O(n^3)
    for all i,j where 1 ≤ i ≤ n, 1 ≤ j ≤ n pardo // P=O(n^2)
        C[i,j] := A4[i,j] - A3[i,j] - A2[i,j] - A[i,j]
end
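One way to sanity-check a candidate formula such as the one above is to compare it against a brute-force count on small graphs. The sketch below is a sequential Python illustration, not a PRAM algorithm; the function name is hypothetical and the enumeration is exponential in the path length, so it is only suitable for tiny inputs.

```python
from itertools import product

def count_walks_distinct_edges(adj, length=4):
    """counts[i][j] = number of walks i -> j with `length` pairwise distinct edges."""
    n = len(adj)
    counts = [[0] * n for _ in range(n)]
    for nodes in product(range(n), repeat=length + 1):
        edges = [(nodes[k], nodes[k + 1]) for k in range(length)]
        # Every step must be an edge of the graph, and no edge may repeat.
        if all(adj[u][v] for u, v in edges) and len(set(edges)) == length:
            counts[nodes[0]][nodes[-1]] += 1
    return counts

# Two nodes joined in both directions: every length-4 walk reuses an edge.
two_cycle = [[0, 1],
             [1, 0]]
# Directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0.
four_cycle = [[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]]
```

Comparing the returned matrix with C for a handful of small adjacency matrices shows quickly whether a formula agrees with the definition of the problem.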

Time complexity of this recursive block

int recursiveFunc(int n) {
    if (n == 1) return 0;
    for (int i = 2; i < n; i++)
        if (n % i == 0) return i + recursiveFunc(n / i);
    return n;
}
I know that complexity = length of the tree from root node to leaf node * number of leaf nodes, but I am having a hard time coming up with an equation.
This one is tricky, because the runtime is highly dependent on what number you provide as input, in a way that is not true of most recursive functions.
For starters, notice that the way this recursion works, it takes in a number and then either
returns without making any further calls if the number is prime, or
recursively calls itself on the number divided by its smallest proper factor.
This means that in one case the function, called on a number n, will do Θ(n) work and make no calls (which happens if n is prime), and in the other case will do Θ(d) work and then make a recursive call on the number n / d, which happens if n is composite and d is the smallest proper divisor of n.
One useful fact we'll use to analyze this function is that, given a composite number n, the smallest factor d > 1 of n is never greater than √n. If it were, then we would have n = df for some other factor f, and since d is the smallest proper divisor, we'd have f ≥ d, so df > √n · √n = n, which would be impossible.
With that in mind, we can argue that the worst-case runtime of this function is O(n), and in fact that happens when n is prime. Here's how to see this. Imagine the worst-case amount of time this function can take if it ends up making a recursive call. In that case, the function will do at most O(√n) work (let's assume our smallest divisor is as large as possible), then recursively make a call on a number whose size is at most n / 2 (which is the absolute largest number we could get as part of the recursive call). In that case, we'd get this recurrence relation under the pessimistic assumption that we do the maximum work possible:
T(n) = T(n / 2) + √n
This solves, by the Master Theorem, to Θ(√n), which is less work than what we'd do if we had a prime number as an input.
But what happens if, instead, we do the maximum amount of work possible for some number of iterations, and then end up with a prime number and stop? In that case, using the iteration method, we'd see that the work done would be
n^(1/2) + n^(1/4) + ... + n / 2^k,
which would happen if we stopped after k iterations. In this case, notice that this expression is maximized when we pick k to be as small as possible - which would correspond to stopping as soon as possible, which happens if we pick a prime number for n.
So in this sense, the worst-case runtime of this function is Θ(n), which happens for n being a prime number, with composite numbers terminating much faster than this.
So how fast can this function be? Well, imagine, for example, that we have a number of the form p^k, where p is some prime number. In that case, this function will do Θ(p) work to discover p as a prime factor, then recursively call itself on the number p^(k-1). If you think about what this will look like, this function will end up doing Θ(p) work Θ(k) times for a total runtime of Θ(p·k). And since n = p^k, we'd have k = log_p n, so the runtime would be Θ(p log_p n). That's minimized at either p = 2 or p = 3, and in either case gives us a runtime of Θ(log n).
I strongly suspect that's the best case here, though I'm not entirely sure. But what this does mean is that
the worst-case runtime is definitely Θ(n), occurring at prime numbers, and
the best-case runtime is O(log n), which I'm fairly certain is a tight bound but I'm not 100% sure how to prove.
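The two extremes can be observed by counting loop iterations. The sketch below is a Python port of recursiveFunc written for illustration; the recursion is unrolled into a loop, and the function name and the returned step counter are additions for the experiment, not part of the original code.

```python
def recursive_func_steps(n):
    """Return (result, loop_iterations) for a port of recursiveFunc(n)."""
    steps = 0
    total = 0
    while n != 1:
        for i in range(2, n):
            steps += 1
            if n % i == 0:
                # Same as "return i + recursiveFunc(n / i)" in the original.
                total += i
                n //= i
                break
        else:
            # No divisor found: n is prime, the original returns n here.
            return total + n, steps
    return total, steps

prime_case = recursive_func_steps(1021)    # 1021 is prime
power_case = recursive_func_steps(1024)    # 1024 = 2^10
```

The prime input scans every candidate from 2 to n - 1, while the power of two needs only one iteration per halving.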

White-box and Black-box testing of recursive functions

I learned white-box and black-box testing in terms of iterative functions. Now I need to do white-box and black-box testing of several recursive functions (in F#). Take the following recursive algorithm for gcd:
gcd (m, n)
    if (m % n) = 0 then
        n
    else
        gcd n (m % n)
For the white-box test: how exactly do I go about covering the different branches of the algorithm? Naively one could say there are two branches, but when the function is called more than once the number of possible branches obviously increases. Should I test with arguments that result in different numbers of recursive calls, or how exactly do I determine which values to test with?
Black-box: I get the general idea of black-box testing. We should look at possible values we might want to call the function with, without having knowledge of its inner workings. In this case I am just not sure which values we might want to call it with. One way could be to start with two values m and n for which gcd = 1, then do the same for values m and n for which gcd = 2, and so on up to some arbitrary gcd. Is this how one is supposed to go about this?
First of all, I don't think there is one single established definition of how to do white-box and black-box testing of recursive functions, but here is how I interpret it.
White-box testing. We want to test the function based on its inner working. In case of recursive functions, I think this means that we want to test that the recursive calls it makes are the ones we would expect. One way to do this is to log all recursive calls. A simple implementation of gcd that does this adds a parameter to keep a log and returns it with the result:
let rec gcd log m n =
    let log = (m, n)::log
    if (m % n) = 0 then List.rev log, n
    else gcd log n (m % n)
Now, for some two parameters, say 54 and 22, you can do the calculation by hand, decide what the parameters of the recursive calls should be and write a test for that:
let log, res = gcd [] 54 22
log |> shouldEqual [ (54, 22); (22, 10); (10, 2) ]
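The same call-logging idea, ported to Python for illustration. The name gcd_logged and the optional log parameter are assumptions of this sketch; the expected call sequence is the one worked out by hand above.

```python
def gcd_logged(m, n, log=None):
    """Euclid's gcd that also returns the (m, n) arguments of every call."""
    log = [] if log is None else log
    log.append((m, n))
    if m % n == 0:
        return log, n
    return gcd_logged(n, m % n, log)

calls, result = gcd_logged(54, 22)
# White-box check: the recursion visits exactly the expected argument pairs.
assert calls == [(54, 22), (22, 10), (10, 2)]
assert result == 2
```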
Black-box testing. Here, we assume we do not know how exactly the function works, so we cannot test its internals. All we can do is to test it using a number of inputs. It is probably a good idea to think of corner-case or tricky inputs because those are the ones that could cause problems. Given a simple implementation:
let rec gcd m n =
    if (m % n) = 0 then n
    else gcd n (m % n)
I would probably write tests for the following:
// A random case where one of the numbers is the result
gcd 100 50 |> shouldEqual 50
gcd 50 100 |> shouldEqual 50
// A random case where the only divisor is 1
gcd 13 123 |> shouldEqual 1
gcd 123 13 |> shouldEqual 1
// The following are problematic and I'm not sure what the right behaviour is
gcd 0 0 // This probably should not be allowed
gcd 10 -5 // This returns -5, but I'm not sure that's what we want
Random testing.
You could also use random testing (which is a form of black box testing) to generate multiple test cases automatically. There are at least two random tests I can think of:
Generate two random numbers, a and b and check that gcd a b = gcd b a. This is testing only a very basic property, but it can cover quite a lot of cases.
Pick a random number a and a few distinct primes p1, p2, .... Then split the primes into two disjoint groups and produce, for example, a*p1*p3*p5 and a*p2*p4*p6. Write a test that checks that the GCD of the two numbers is a.
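Both random tests can be sketched without a property-testing library. The code below is a Python illustration; the fixed list of six primes, the range of a and the function names are assumptions of the example, and only positive inputs are generated.

```python
import math
import random

def gcd(m, n):
    """Python port of the simple recursive gcd under test."""
    return n if m % n == 0 else gcd(n, m % n)

def check_symmetry(trials, rng):
    """Property 1: gcd a b = gcd b a for random positive a and b."""
    pairs = ((rng.randint(1, 10**6), rng.randint(1, 10**6)) for _ in range(trials))
    return all(gcd(a, b) == gcd(b, a) for a, b in pairs)

def check_known_gcd(trials, rng, primes=(2, 3, 5, 7, 11, 13)):
    """Property 2: gcd(a * P, a * Q) = a when P and Q share no prime."""
    for _ in range(trials):
        a = rng.randint(1, 1000)
        shuffled = list(primes)
        rng.shuffle(shuffled)
        # Two disjoint groups of distinct primes, so their products are coprime.
        left, right = shuffled[:3], shuffled[3:]
        if gcd(a * math.prod(left), a * math.prod(right)) != a:
            return False
    return True
```

Passing a seeded random.Random instance as rng makes a failing case reproducible.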
