D-like slices of immutable data in OCaml

Does OCaml have slices (like D slices of immutable data)? It seems like it would fit really nicely into the OCaml paradigm (you could avoid constantly having to reverse a list every time you want to do any kind of processing with tail recursion, because you can access/slice the list from both ends). Would it be difficult to implement?
As an example, if OCaml lists behaved like slices, I could say
(* Hypothetical: here ^ is slice concatenation, not OCaml's string ^ *)
let rec merge_helper lhs rhs res =
  match lhs with
  | [] -> res ^ rhs
  | l_first :: l_rest ->
    match rhs with
    | [] -> res ^ lhs
    | r_first :: r_rest ->
      if l_first <= r_first then
        merge_helper l_rest rhs (res ^ [l_first])
      else
        merge_helper lhs r_rest (res ^ [r_first])
let merge lhs rhs =
  merge_helper lhs rhs []
Here lhs ^ rhs concatenates by copying rhs into the space next to lhs (if available); otherwise it copies both into a new block of memory at least twice as large as lhs.
EDIT: Perhaps I need to clarify
Concatenation such as let concatted = lhs ^ rhs is not a mutating operation. lhs will be the same as it was, and rhs will be the same as it was. concatted may or may not point to the same segment of memory as lhs (just with a larger length). The copying I was talking about is an "under-the-hood" operation. From the client's perspective all objects behave as if they were immutable and the construction lhs ^ rhs takes amortized O(|rhs|) time (amortized in the sense that if we keep on constructing longer slices by repeatedly concatenating things on the right, the number of internal re-allocations is small).
EDIT 2: Sorry, I was imagining that concatenating behaves like D appending. D doesn't do this because they also allow slices of mutable data, but in OCaml things default to immutable, so this wouldn't be a problem (at least, no more than it is for D lists).
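As a rough illustration of the kind of structure being asked about, here is a minimal sketch of a read-only slice over a shared array. The type and function names (slice, of_array, get, sub, to_list) are hypothetical, and the sketch assumes nobody mutates the underlying array after it is wrapped; it does not implement the amortized append described above.

```ocaml
(* A read-only view over a shared array: sub-slicing copies nothing.
   Assumption: the underlying array is never mutated after wrapping. *)
type 'a slice = { data : 'a array; off : int; len : int }

let of_array a = { data = a; off = 0; len = Array.length a }

(* Bounds-checked access relative to the start of the slice. *)
let get s i =
  if i < 0 || i >= s.len then invalid_arg "get";
  s.data.(s.off + i)

(* O(1): only the offset and length change, the data is shared. *)
let sub s pos len =
  if pos < 0 || len < 0 || pos + len > s.len then invalid_arg "sub";
  { s with off = s.off + pos; len }

let to_list s = List.init s.len (get s)
```

Both ends can be trimmed in constant time with sub, which is the property the question is after; appending would still need extra machinery.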

I think you misunderstand what lists are. You seem to think of a list as a C++ vector with more restrictions (I don't know D's slices, but from what I have found they look like a C++ vector).
It is not! A list is immutable and persistent and gives constant-time cons (::), which is impossible to achieve with an array (even with the smart array that you call a slice).
An example of something you can do with lists but not with arrays:
let l = [1; 2; 3; 4; 5]
let a = 0 :: l
let b = 1 :: l
The last two lines run in constant time (regardless of the size of l) and use constant additional space.
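A small sketch of why this works: both new lists point at the existing list l instead of copying it. The name shares_tail is an assumption added for illustration.

```ocaml
let l = [1; 2; 3; 4; 5]
let a = 0 :: l
let b = 1 :: l

(* (==) is physical equality: both tails are the very same list l,
   not copies of it. *)
let shares_tail = List.tl a == l && List.tl b == l
```

Because l is immutable, sharing it between a and b is safe; an array-backed structure would have to copy to give two different first elements.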

Related

How to make a tail-recursive function and test it?

I would like to make these functions tail-recursive but I don't know where to start.
let rec rlist r n =
  if n < 1 then []
  else Random.int r :: rlist r (n-1);;
let rec divide = function
    h1::h2::t -> let t1,t2 = divide t in
                 h1::t1, h2::t2
  | l -> l,[];;
let rec merge ord (l1,l2) = match l1,l2 with
    [],l | l,[] -> l
  | h1::t1,h2::t2 -> if ord h1 h2
                     then h1::merge ord (t1,l2)
                     else h2::merge ord (l1,t2);;
Is there any way to test whether a function is tail-recursive or not?

If you give a man a fish, you feed him for a day. But if you give him a fishing rod, you feed him for a lifetime.
Thus, instead of giving you the solution, I would rather teach you how to solve it yourself.
A tail-recursive function is a recursive function, where all recursive calls are in a tail position. A call position is called a tail position if it is the last call in a function, i.e., if the result of a called function will become a result of a caller.
Let's take the following simple function as our working example:
let rec sum n = if n = 0 then 0 else n + sum (n-1)
It is not a tail-recursive function: the call sum (n-1) is not in a tail position, because its result is then added to n. It is not always easy to translate a general recursive function into a tail-recursive form. Sometimes, there is a tradeoff between efficiency, readability, and tail-recursion.
The general techniques are:
use accumulator
use continuation-passing style
Using accumulator
Sometimes a function really needs to store intermediate results, because the result of the recursion must be combined in a non-trivial way. A recursive function gives us a free container for arbitrary data: the call stack, the place where the language runtime stores the parameters of the currently called functions. Unfortunately, the stack is bounded, and its size is unpredictable. So, sometimes, it is better to switch from the stack to the heap. The latter is slightly slower (because it introduces more work for the garbage collector), but it is bigger and more controllable. In our case, we need only one word to store the running sum, so we have a clear win. We use less space, and we don't introduce any garbage:
let sum n =
  let rec loop n acc = if n = 0 then acc else loop (n-1) (acc+n) in
  loop n 0
However, as you may see, this came with a tradeoff - the implementation became slightly bigger and less understandable.
We used here a general pattern. Since we need to introduce an accumulator, we need an extra parameter. Since we don't want or can't change the interface of our function, we introduce a new helper function, that is recursive and will carry the extra parameter. The trick here is that we apply the summation before we do the recursive call, not after.
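The same accumulator pattern carries over to lists. The following is a minimal sketch, with the hypothetical name map_tr, of a tail-recursive map: the helper carries the extra parameter and the work is done before the recursive call.

```ocaml
(* Tail-recursive map: the accumulator collects results in reverse order,
   so one final List.rev restores the original order. *)
let map_tr f l =
  let rec loop acc = function
    | [] -> List.rev acc
    | x :: rest -> loop (f x :: acc) rest
  in
  loop [] l
```

The price is the final reversal, which is the usual cost of accumulating into a list.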
Using continuation-passing style
It is not always possible to rewrite a recursive algorithm using an accumulator. In that case, a more general technique can be used: continuation-passing style. It is close to the previous technique, but we use a continuation in place of an accumulator. A continuation is a function that postpones the work that needs to be done after the recursion to a later time. Conventionally, we call this function return or simply k (for continuation). Mentally, the continuation is a way of throwing the result of a computation back into the future: "back" because you are returning the result to the caller, and "into the future" because the result will be used not now, but once everything is ready. But let's look at the implementation:
let sum n =
  let rec loop n k = if n = 0 then k 0 else loop (n-1) (fun x -> k (x+n)) in
  loop n (fun x -> x)
As you can see, we employed the same strategy, except that instead of an int accumulator we used a function k as the second parameter. In the base case, when n is zero, we return 0 (you can read k 0 as return 0). In the general case, we recurse in a tail position with a regular decrement of the inductive variable n, but we pack the work that should be done with the result of the recursive call into a function: fun x -> k (x+n). Basically, this function says: once x, the result of the recursive call, is ready, add it to the number n and return. (Again, if we used the name return instead of k it could be more readable: fun x -> return (x+n)).
There is no magic here, we still have the same tradeoff, as with accumulator, as we create a new closure (functional object) at every recursive call. And each newly created closure contains a reference to the previous one (that was passed via the parameter). For example, fun x -> k (x+n) is a function, that captures two free variables, the value n and function k, that was the previous continuation. Basically, these continuations form a linked list, where each node bears a computation and all arguments except one. So, the computation is delayed until the last one is known.
Of course, for our simple example there is no need to use CPS, since it creates unnecessary garbage and is much slower. This is only for demonstration. However, CPS becomes useful for more complex algorithms, in particular those that combine the results of two or more recursive calls in a non-trivial way, e.g., folding over a graph data structure.
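To illustrate the case of two recursive calls, here is a minimal sketch on a binary tree rather than a graph. The type tree and the function sum_tree are hypothetical names introduced for this example; a single accumulator is not enough here, but a continuation handles both subtrees with every call in tail position.

```ocaml
type tree = Leaf | Node of tree * int * tree

(* Sum all node values in CPS: the continuation for the left subtree
   starts the traversal of the right subtree, and only then returns. *)
let sum_tree t =
  let rec loop t k =
    match t with
    | Leaf -> k 0
    | Node (l, v, r) ->
      loop l (fun sl -> loop r (fun sr -> k (sl + v + sr)))
  in
  loop t (fun x -> x)
```

The pending work that would otherwise sit on the call stack is stored in heap-allocated closures instead.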
So now, armed with the new knowledge, I hope that you will be able to solve your problems as easy as pie.
Testing for the tail recursion
The tail call is a fairly well-defined syntactic notion, so it is usually clear whether a call is in a tail position. It is not always obvious, though: for example, a call on the right-hand side of a short-circuit logical operator is also a tail call. There are therefore a few ways to check whether a call is in a tail position. Recent versions of OCaml allow one to put an annotation at the call site, e.g.,
let rec sum n = if n = 0 then 0 else n + (sum [@tailcall]) (n-1)
If the call is not really a tail call, a warning is issued by a compiler:
Warning 51: expected tailcall
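For comparison, a sketch of the same attribute placed on the accumulator version from earlier. Here the annotated call really is in tail position, so the compiler is expected to accept it without Warning 51.

```ocaml
(* The recursive call is the last thing loop does, so the [@tailcall]
   annotation is satisfied. *)
let sum n =
  let rec loop n acc =
    if n = 0 then acc else (loop [@tailcall]) (n - 1) (acc + n)
  in
  loop n 0
```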
Another method is to compile with the -annot option. The annotation file will contain an annotation for each call. For example, if we put the above function into a file sum.ml and compile it with ocamlc -annot sum.ml, we can open the sum.annot file and look for all calls:
"sum.ml" 1 0 41 "sum.ml" 1 0 64
call(
stack
)
If, however, we compile our third implementation, we see that all calls are tail calls, e.g. with grep call -A1 sum.annot:
call(
tail
--
call(
tail
--
call(
tail
--
call(
tail
Finally, you can just test your program with some big input and see whether it fails with a stack overflow. You can even reduce the size of the stack; this is controlled with the environment variable OCAMLRUNPARAM. For example, to limit the stack to one thousand words:
export OCAMLRUNPARAM='l=1000'
ocaml sum.ml

You could do the following:
let rlist r n =
  let rec aux acc n =
    if n < 1 then acc
    else aux (Random.int r :: acc) (n-1)
  in aux [] n;;
let divide l =
  let rec aux acc1 acc2 = function
    | h1::h2::t ->
      aux (h1::acc1) (h2::acc2) t
    | [e] -> e::acc1, acc2
    | [] -> acc1, acc2
  in aux [] [] l;;
But for divide I prefer this solution:
let divide l =
  let rec aux acc1 acc2 = function
    | [] -> acc1, acc2
    | hd::tl -> aux acc2 (hd :: acc1) tl
  in aux [] [] l;;
let merge ord (l1,l2) =
  let rec aux acc l1 l2 =
    match l1,l2 with
    | [],l | l,[] -> List.rev_append acc l
    | h1::t1,h2::t2 -> if ord h1 h2
      then aux (h1 :: acc) t1 l2
      else aux (h2 :: acc) l1 t2
  in aux [] l1 l2;;
As for your question about testing whether a function is tail-recursive, a bit of searching would have found it here.
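One way to exercise these helpers on a large input is to wire them into a merge sort. The sketch below repeats the second divide and the merge from this answer so that it is self-contained; msort is a hypothetical name that does not appear in the question. It is not itself tail-recursive, but its recursion depth grows only logarithmically with the list length.

```ocaml
(* Alternate elements between two accumulators (order is not preserved,
   which does not matter for sorting). *)
let divide l =
  let rec aux acc1 acc2 = function
    | [] -> acc1, acc2
    | hd :: tl -> aux acc2 (hd :: acc1) tl
  in
  aux [] [] l

(* Merge two sorted lists, building the result in reverse in acc. *)
let merge ord (l1, l2) =
  let rec aux acc l1 l2 =
    match l1, l2 with
    | [], l | l, [] -> List.rev_append acc l
    | h1 :: t1, h2 :: t2 ->
      if ord h1 h2 then aux (h1 :: acc) t1 l2
      else aux (h2 :: acc) l1 t2
  in
  aux [] l1 l2

(* Hypothetical driver: split, sort both halves, merge. *)
let rec msort ord = function
  | ([] | [_]) as l -> l
  | l ->
    let l1, l2 = divide l in
    merge ord (msort ord l1, msort ord l2)
```

Sorting a list of a few hundred thousand elements with msort (<=) is a quick check that divide and merge do not overflow the stack.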

What is the simplest way to iterate over an array of arrays?

Let x::Vector{Vector{T}}. What is the best way to iterate over all the elements of each inner vector (that is, all elements of type T)? The best I can come up with is a double iteration using the single-line notation, ie:
for n in eachindex(x), m in eachindex(x[n])
    x[n][m]
end
but I'm wondering if there is a single iterator, perhaps in the Iterators package, designed specifically for this purpose, e.g. for i in some_iterator(x) ; x[i] ; end.
More generally, what about iterating over the inner-most elements of any array of arrays (that is, arrays of any dimension)?

Your way
for n in eachindex(x), m in eachindex(x[n])
    x[n][m]
end
is pretty fast. If you want best speed, use
for n in eachindex(x)
    y = x[n]
    for m in eachindex(y)
        y[m]
    end
end
which avoids dereferencing twice (the first dereference is hard to optimize out because arrays are mutable, and so getindex isn't pure). Alternatively, if you don't need m and n, you could just use
for y in x, z in y
    z
end
which is also fast.
Note that column-major storage is irrelevant, since all arrays here are one-dimensional.
To answer your general question:
If the number of dimensions is a compile-time constant, see Base.Cartesian
If the number of dimensions is not a compile-time constant, use recursion
And finally, as Dan Getz mentioned in a comment:
using Iterators
for z in chain(x...)
    z
end
also works. This however has a bit of a performance penalty.

I'm wondering if there is a single iterator, perhaps in the Iterators package, designed specifically for this purpose, e.g. for i in some_iterator(x) ; x[i] ; end
Today (in Julia 1.x versions), Iterators.flatten is exactly this.
help?> Iterators.flatten
flatten(iter)
Given an iterator that yields iterators, return an iterator that
yields the elements of those iterators. Put differently, the
elements of the argument iterator are concatenated.
julia> x = [1:5, [π, ℯ, 42], 'a':'e']
3-element Vector{AbstractVector}:
1:5
[3.141592653589793, 2.718281828459045, 42.0]
'a':1:'e'
julia> for el in Iterators.flatten(x)
           print(el, " ")
       end
1 2 3 4 5 3.141592653589793 2.718281828459045 42.0 a b c d e

Memoisation in OCaml and a Reference List

I am learning OCaml. I know that OCaml provides us with both imperative style of programming and functional programming.
I came across this code as part of my course to compute the n'th Fibonacci number in OCaml
let memoise f =
  let table = ref [] in
  let rec find tab n =
    match tab with
    | [] ->
      let v = f n in
      table := (n, v) :: !table;
      v
    | (n', v) :: t ->
      if n' = n then v else find t n
  in
  fun n -> find !table n
let fibonacci2 = memoise fibonacci1
Where the function fibonacci1 is implemented in the standard way as follows:
let rec fibonacci1 n =
  match n with
  | 0 | 1 -> 1
  | _ -> (fibonacci1 (n - 1)) + (fibonacci1 (n - 2))
Now my question is: how are we achieving memoisation in fibonacci2? table is defined inside the function fibonacci2, so my logic dictates that after the function finishes its computation the list table should be lost, and the table would be rebuilt on every call.
I ran a simple test where I called fibonacci2 35 twice in the OCaml REPL, and the second call returned the answer significantly faster than the first (contrary to my expectations).
I thought that this might be possible if declaring a variable using ref gives it a global scope by default.
So I tried this
let f y = let x = ref 5 in y;;
print_int !x;;
But this gave me an error saying that the value x is unbound.
Why does this behave this way?

The function memoise returns a value, call it f. (f happens to be a function). Part of that value is the table. Every time you call memoise you're going to get a different value (with a different table).
In the example, the returned value f is given the name fibonacci2. So, the thing named fibonacci2 has a table inside it that can be used by the function f.
There is no global scope by default, that would be a huge mess. At any rate, this is a question of lifetime not of scope. Lifetimes in OCaml last as long as an object can be reached somehow. In the case of the table, it can be reached through the returned function, and hence it lasts as long as the function does.
In your second example you are testing the scope (not the lifetime) of x, and indeed the scope of x is restricted to the subexpression of its let. (I.e., it is meaningful only in the expression y, where it's not used.) In the original code, all the uses of table are within its let, hence there's no problem.
Although references are a little tricky, the underlying semantics of OCaml come from lambda calculus, and are extremely clean. That's why it's such a delight to code in OCaml (IMHO).
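The lifetime point can be seen in isolation with a smaller sketch than memoise. The names make_counter, c1 and c2 are hypothetical; each call to make_counter creates a fresh reference that lives as long as the returned function can be reached, just as each call to memoise creates a fresh table.

```ocaml
(* count is local to one call of make_counter, but it stays alive
   because the returned closure still refers to it. *)
let make_counter () =
  let count = ref 0 in
  fun () ->
    incr count;
    !count

(* Two calls give two independent counters, each with its own ref. *)
let c1 = make_counter ()
let c2 = make_counter ()
```

Calling c1 () twice and then c2 () once yields 1, 2 and then 1: the state persists between calls of the same closure, but is not shared globally.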

Sublists of N length function in Erlang style

I've been learning Erlang and tried completing some practise functions. I struggled making one function in particular and think it might be due to me not thinking "Erlang" enough.
The function in question takes a list and a sublist size, then produces a list of tuples containing the number of elements before the sublist, the sublist itself, and the number of elements after the sublist. For example
sublists(1,[a,b,c])=:=[{0,[a],2}, {1,[b],1}, {2,[c],0}].
sublists(2,[a,b,c])=:=[{0,[a,b],1}, {1,[b,c],0}].
My working solution was
sublists(SubListSize, [H | T]) ->
    Length = length(1, T),
    sublists(SubListSize, Length, Length-SubListSize, [H|T], []).
sublists(_, _, -1, _, Acc) -> lists:reverse(Acc);
sublists(SubSize, Length, Count, [H|T], Acc) ->
    Sub = {Length-SubSize-Count, grab(SubSize, [H|T],[]),Count},
    sublists(SubSize, Length, Count-1, T, [Sub|Acc]).
length(N, []) -> N;
length(N, [_|T]) -> length(N+1, T).
grab(0, _, Acc) -> lists:reverse(Acc);
grab(N, [H|T], Acc) -> grab(N-1, T, [H|Acc]).
but it doesn't feel right and I wondered if there was a better way?
There was an extension that asked for the sublists function to be re-implemented using a list comprehension. My failed attempt was
sublist_lc(SubSize, L) ->
    Length = length(0, L),
    Indexed = lists:zip(L, lists:seq(0, Length-1)),
    [{I, X, Length-1-SubSize} || {X,I} <- Indexed, I =< Length-SubSize].
As I understand it, list comprehensions can't see ahead, so I was unable to use my grab function from earlier. This again makes me think there must be a better way of solving this problem.

I show a few approaches below. All protect against the case where the requested sublist length is greater than the list length. All use functions from the standard lists module.
The first one uses lists:split/2 to capture each sublist and the length of the remaining tail list, and uses a counter C to keep track of how many elements precede the sublist. The length of the remaining tail list, named Rest, gives the number of elements that follow each sublist.
sublists(N,L) when N =< length(L) ->
    sublists(N,L,[],0).
sublists(N,L,Acc,C) when N == length(L) ->
    lists:reverse([{C,L,0}|Acc]);
sublists(N,[_|T]=L,Acc,C) ->
    {SL,Rest} = lists:split(N,L),
    sublists(N,T,[{C,SL,length(Rest)}|Acc],C+1).
The next one uses two lists of counters, one indicating how many elements precede the sublist and the other indicating how many follow it. The first is easily calculated by simply counting from 0 to the length of the input list minus the length of each sublist, and the second list of counters is just the reverse of the first. These counter lists are also used to control recursion; we stop when each contains only a single element, indicating we've reached the final sublist and can end the recursion. This approach uses the lists:sublist/2 call to obtain all but the final sublist.
sublists(N,L) when N =< length(L) ->
    Up = lists:seq(0,length(L)-N),
    Down = lists:reverse(Up),
    sublists(N,L,[],{Up,Down}).
sublists(_,L,Acc,{[U],[D]}) ->
    lists:reverse([{U,L,D}|Acc]);
sublists(N,[_|T]=L,Acc,{[U|UT],[D|DT]}) ->
    sublists(N,T,[{U,lists:sublist(L,N),D}|Acc],{UT,DT}).
And finally, here's a solution based on a list comprehension. It's similar to the previous solution in that it uses two lists of counters to control iteration. It also makes use of lists:nthtail/2 and lists:sublist/2 to obtain each sublist, which admittedly isn't very efficient; no doubt it can be improved.
sublists(N,L) when N =< length(L) ->
    Up = lists:seq(0,length(L)-N),
    Down = lists:reverse(Up),
    [{U,lists:sublist(lists:nthtail(U,L),N),D} || {U,D} <- lists:zip(Up,Down)].
Oh, and a word of caution: your code implements a function named length/2, which is somewhat confusing because it has the same name as the standard length/1 function. I recommend avoiding naming your functions the same as such commonly-used standard functions.

Keeping a counter at each recursive call in OCaml

I am trying to write a function that returns the index of the passed value v in a given list x; -1 if not found. My attempt at the solution:
let rec index (x, v) =
  let i = 0 in
  match x with
    [] -> -1
  | (curr::rest) -> if(curr == v) then
                      i
                    else
                      succ i; (* i++ *)
                      index(rest, v)
;;
This is obviously wrong to me (it will return -1 every time) because it redefines i at each pass. I have some obscure ways of doing it with separate functions in my head, none which I can write down at the moment. I know this is a common pattern in all programming, so my question is, what's the best way to do this in OCaml?

Mutation is not a common way to solve problems in OCaml. For this task, you should use recursion and accumulate results by changing the index i on certain conditions:
let index(x, v) =
  let rec loop x i =
    match x with
    | [] -> -1
    | h::t when h = v -> i
    | _::t -> loop t (i+1)
  in loop x 0
Another thing is that using -1 to signal the exceptional case is not a good idea. You may forget this convention somewhere and treat it like any other index. In OCaml, it's better to handle this case with the option type, so that the compiler forces you to take care of None every time:
let index(x, v) =
  let rec loop x i =
    match x with
    | [] -> None
    | h::t when h = v -> Some i
    | _::t -> loop t (i+1)
  in loop x 0
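A short sketch of what "forced to take care of None" looks like at the call site. The function describe is a hypothetical caller added for illustration; index is the option-returning version from above, repeated so the example is self-contained.

```ocaml
let index (x, v) =
  let rec loop x i =
    match x with
    | [] -> None
    | h :: _ when h = v -> Some i
    | _ :: t -> loop t (i + 1)
  in
  loop x 0

(* The result is an int option, so it cannot be used as a plain index:
   the caller has to match on both cases. *)
let describe (x, v) =
  match index (x, v) with
  | Some i -> Printf.sprintf "found at index %d" i
  | None -> "not found"
```

Omitting the None branch in describe would produce a non-exhaustive match warning, which is exactly the safety net that -1 does not give.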

This is pretty clearly a homework problem, so I'll just make two comments.
First, values like i are immutable in OCaml. Their values don't change. So succ i doesn't do what your comment says. It doesn't change the value of i. It just returns a value that's one bigger than i. It's equivalent to i + 1, not to i++.
Second, the essence of recursion is to imagine how you would solve the problem if you already had a function that solves the problem! The only trick is that you're only allowed to pass this other function a smaller version of the problem. In your case, a smaller version of the problem is one where the list is shorter.
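One possible reading of this advice, as a sketch rather than the intended homework answer: assume the function already works on the shorter list, then adjust its result by one. The name index_opt is hypothetical, and the sketch assumes OCaml 4.08 or later for Option.map. Note that it is not tail-recursive.

```ocaml
(* If v is the head, the index is 0. Otherwise ask for the index in the
   tail (the smaller problem) and shift that answer by one. *)
let rec index_opt v = function
  | [] -> None
  | h :: t ->
    if h = v then Some 0
    else Option.map succ (index_opt v t)
```

No counter is needed at all here, because each level of the recursion only has to know how to correct the answer for the list that is one element shorter.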

You can't mutate variables in OCaml (well, there is a way, but you really shouldn't for simple things like this).
A basic trick you can do is create a helper function that receives extra arguments corresponding to the variables you want to "mutate". Note how I added an extra parameter for the i and also "mutate" the current list head in a similar way.
let rec index_helper (x, vs, i) =
  match vs with
    [] -> -1
  | (curr::rest) ->
    (* structural equality (=), not physical equality (==) *)
    if curr = x then
      i
    else
      index_helper (x, rest, i+1)
;;
let index (x, vs) = index_helper (x, vs, 0) ;;
This kind of tail-recursive transformation is a way to translate loops to functional programming but to be honest it is kind of low level (you have full power but the manual recursion looks like programming with gotos...).
For some particular patterns what you can instead try to do is take advantage of reusable higher order functions, such as map or folds.
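As a sketch of the fold-based alternative, the version below threads the counter and the result through List.fold_left. It keeps the (x, vs) argument order used above but, as an assumption of this example, returns an option instead of -1. Unlike the explicit recursion, the fold always walks the whole list, even after a match.

```ocaml
(* The fold state is (current position, first match found so far). *)
let index (x, vs) =
  let _, found =
    List.fold_left
      (fun (i, found) curr ->
         match found with
         | Some _ -> (i + 1, found)  (* keep the first match *)
         | None -> (i + 1, if curr = x then Some i else None))
      (0, None) vs
  in
  found
```

The counter is just part of the state passed from one step to the next, so there is still no mutation involved.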
