Standard ML : Calculating the average of a given set - functional-programming

I recently had the assignment to calculate the average of a set (given by input) in Standard ML.
The idea is to write a function that takes a list of real numbers and returns their average (also a real), so that the REPL reports the following type when you enter the definition:
average = fn : real list -> real
We discussed this in a tutorial as well but I wanted to know if there was some sort of trick when creating such functions in Standard ML.
Thanks in advance!

Sum the numbers and divide by the length. A simple recursive sum is typically one of the first examples that you would see in any SML tutorial. You would need to have the empty-list base case of sum evaluate to 0.0 rather than 0 to make sure that the return type is real. Once you define a sum function, you can define average in one line using sum and the built-in length function. A subtlety is that SML doesn't allow a real to be divided by an int. You could use the conversion function Real.fromInt on the length before dividing the sum by it. There is some inefficiency in passing over the same list twice, once to sum it and once to calculate its length, but there is little reason to worry about such things when you are first learning the language.
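For comparison, here is roughly what that two-pass recipe could look like, sketched in OCaml (a close relative of SML) rather than SML itself. The names sum and average are only illustrative; float_of_int plays the role that Real.fromInt plays in SML, and OCaml likewise refuses to mix floats and ints in one arithmetic expression.

```ocaml
(* Sum a list of floats; the empty list sums to 0.0 (not 0),
   which is what forces the result type to be float. *)
let rec sum = function
  | [] -> 0.0
  | x :: xs -> x +. sum xs

(* Two passes over the list: one for the sum, one for the length.
   The length is an int, so it is converted before dividing. *)
let average nums = sum nums /. float_of_int (List.length nums)
```

With these definitions, average [1.0; 2.0; 3.0; 6.0] evaluates to 3.0.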
On Edit: Since you have found a natural solution and shared it in the comments, here is a more idiomatic version which computes the average in one pass over the list:
fun average nums =
  let
    (* s is the running sum, n the running count *)
    fun av (s,n,[]) = s/Real.fromInt(n)
      | av (s,n,x::xs) = av (s+x,n+1,xs)
  in
    av (0.0, 0, nums)
  end;
It works by defining a helper function which does the heavy lifting. These are used extensively in functional programming. In the absence of mutable state, a common trick is to explicitly pass as parameters quantities which would be successively modified by a corresponding loop in an imperative language. Such parameters are often called accumulators since they typically accumulate growing lists, running sums, running products, etc. Here s and n are the accumulators, with s the sum of the elements and n the length of the list. In the basis case of (s,n,[]) there is nothing more to accumulate so the final answer is returned. In the non-basis case, (s,n,x::xs), s and n are modified appropriately and passed to the helper function along with the tail of the list. The definition of av is tail-recursive hence will run with the speed of a loop without growing the stack. The only thing that the overall average function needs to do is to invoke the helper function with the appropriate initial values. The let ... helper def ... in ... helper called with start-up values ...end is a common idiom used to prevent the top-level of a program from being cluttered with helper functions.

Since only non-empty lists can have averages, an alternative to John Coleman's answer is:
fun average [] = NONE
  | average nums =
    let
      fun av (s,n,[]) = s/Real.fromInt(n)
        | av (s,n,x::xs) = av (s+x,n+1,xs)
    in
      SOME (av (0.0, 0, nums))
    end;
Whether a function for calculating averages should take empty lists into account depends on whether you intend to export it or only use it within a scope in which you guarantee elsewhere that the input list is non-empty.
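To illustrate what the option-returning design means for callers, here is a small sketch in OCaml rather than SML. The function describe is a hypothetical caller, not part of either answer; the point is only that the type forces the empty-list case to be handled explicitly.

```ocaml
(* Average of a float list; None for the empty list. *)
let average = function
  | [] -> None
  | nums ->
    (* One pass, accumulating the sum and the count together. *)
    let s, n =
      List.fold_left (fun (s, n) x -> (s +. x, n + 1)) (0.0, 0) nums
    in
    Some (s /. float_of_int n)

(* Hypothetical caller: must say what happens when there is no average. *)
let describe nums =
  match average nums with
  | None -> "no average"
  | Some a -> Printf.sprintf "%.2f" a
```

If the function is only used internally on lists already known to be non-empty, this extra matching is the cost you pay for the safety.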

Related

Prime factorization in a functional language

I'm writing in OCaml, trying to give a (somewhat) efficient implementation of prime factorization. I figure the best representation of a number 2 or more is in a list of exponents. For simplicity with consing I'll do it in decreasing order of primes. So 2 would be [1] and 3 would be [1;0] and 4 would be [2], and 5 [1;0;0].
I was thinking of using the sieve idea to take a number n and look for all possible divisors between 2 and sqrt(n). Then divide by any divisor and recurse. However, every implementation that I can think of seems to involve repeatedly searching over a list and that seems just unnecessarily inefficient. The outline of my solution is best stated in this code
let rec pf n =
  if n = 2 then [1]
  else
    let sq = int_of_float ((float_of_int n) ** 0.5) in
    let primes = getPrimes sq in
    match earliestDiv n primes with
    | None -> n :: (zero_list (n - 1))
    | Some (x, i) ->
      let subproblem = pf (n / x) in
      increment subproblem i
The helper functions here would be:
getPrimes which takes an int and returns a list of all prime numbers less-than-or-equal to it.
earliestDiv which takes an int n and list of ints lst, returns an int*int option corresponding to the earliest number in lst which divides n. That will be the first coordinate of the tuple; the second coordinate will return the index of this prime x in the list of primes.
increment will take an int list and index, and increase by 1 the number located at the index.
All of these helper functions keep making lists, and passing through lists, and so on. And in fact, I often feel like I'm doing this in functional programming. I often have the sense that I'm unnecessarily iterating over lists whereas in imperative languages I would be writing code that is more efficient. Perhaps it's just in my head, and when writing in imperative languages I less often notice how many resources are going into some of the list operations I use. But if I'm missing some important technique that could prevent repeatedly scanning lists, I'd be curious to hear it.
The question: Is it necessary to repeatedly make and iterate over lists in order to write this function?
If you end up indexing a list, or filling a list with default elements up to a fixed size, lists are most probably the wrong data structure. For prime factorisation, you probably want an implementation of a sparse array. Maps would be a better (if not optimal) implementation than fixed-size lists.
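As a rough illustration of the Map suggestion, here is a minimal sketch that stores the factorization as a map from each prime to its exponent, so nothing is ever indexed or padded with zeros. The names IntMap, bump and factorize are made up for this example, and plain trial division is used instead of the sieve from the question.

```ocaml
(* A map keyed by int; only primes that actually divide n get an entry. *)
module IntMap = Map.Make (struct
  type t = int
  let compare (a : int) b = compare a b
end)

(* Record one more occurrence of prime p in the exponent map. *)
let bump p m =
  let e = match IntMap.find_opt p m with Some e -> e | None -> 0 in
  IntMap.add p (e + 1) m

(* Trial division, assuming n >= 1.
   Returns a map from each prime factor of n to its exponent. *)
let factorize n =
  let rec go n p acc =
    if n = 1 then acc
    else if p * p > n then bump n acc          (* what is left is prime *)
    else if n mod p = 0 then go (n / p) p (bump p acc)
    else go n (p + 1) acc
  in
  go n 2 IntMap.empty
```

For example, IntMap.bindings (factorize 360) gives [(2, 3); (3, 2); (5, 1)], i.e. 360 = 2^3 × 3^2 × 5.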

Divisibility function in SML

I've been struggling with the basics of functional programming lately. I started writing small functions in SML, so far so good. There is one problem I cannot solve, though. It's on Project Euler (https://projecteuler.net/problem=5) and it simply asks for the smallest natural number that is divisible by all the numbers from 1 to n (where n is the argument of the function I'm trying to build).
Searching for the solution, I've found that through prime factorization, you analyze all the numbers from 1 to 10, and then keep the numbers where the highest power on a prime number occurs (after performing the prime factorization). Then you multiply them and you have your result (eg for n = 10, that number is 2520).
Can you help me on implementing this to an SML function?
Thank you for your time!
Since coding is not a spectator sport, it wouldn't be helpful for me to give you a complete working program; you'd have no way to learn from it. Instead, I'll show you how to get started, and start breaking down the pieces a bit.
Now, Mark Dickinson is right in his comments above that your proposed approach is neither the simplest nor the most efficient; nonetheless, it's quite workable, and plenty efficient enough to solve the Project Euler problem. (I tried it; the resulting program completed instantly.) So, I'll go with it.
To start with, if we're going to be operating on the prime decompositions of positive integers (that is: the results of factorizing them), we need to figure out how we're going to represent these decompositions. This isn't difficult, but it's very helpful to lay out all the details explicitly, so that when we write the functions that use them, we know exactly what assumptions we can make, what requirements we need to satisfy, and so on. (I can't tell you how many times I've seen code-writing attempts where different parts of the program disagree about what the data should look like, because the exact easiest form for one function to work with was a bit different from the exact easiest form for a different function to work with, and it was all done in an ad hoc way without really planning.)
You seem to have in mind an approach where a prime decomposition is a product of primes to the power of exponents: for example, 12 = 2^2 × 3^1. The simplest way to represent that in Standard ML is as a list of pairs: [(2,2),(3,1)]. But we should be a bit more precise than this; for example, we don't want 12 to sometimes be [(2,2),(3,1)] and sometimes [(3,1),(2,2)] and sometimes [(3,1),(5,0),(2,2)]. So, we can say something like "The prime decomposition of a positive integer is represented as a list of prime–exponent pairs, with the primes all being positive primes (2,3,5,7,…), the exponents all being positive integers (1,2,3,…), and the primes all being distinct and arranged in increasing order." This ensures a unique, easy-to-work-with representation. (N.B. 1 is represented by the empty list, nil.)
By the way, I should mention — when I tried this out, I found that everything was a little bit simpler if instead of storing exponents explicitly, I just repeated each prime the appropriate number of times, e.g. [2,2,3] for 12 = 2 × 2 × 3. (There was no single big complication with storing exponents explicitly, it just made a lot of little things a bit more finicky.) But the below breakdown is at a high level, and applies equally to either representation.
So, the overall algorithm is as follows:
Generate a list of the integers from 1 to 10, or 1 to 20.
This part is optional; you can just write the list by hand, if you want, so as to jump into the meatier part faster. But since your goal is to learn the basics of functional programming, you might as well do this using List.tabulate.
Use this to generate a list of the prime decompositions of these integers.
Specifically: you'll want to write a factorize or decompose function that takes a positive integer and returns its prime decomposition. You can then use map, a.k.a. List.map, to apply this function to each element of your list of integers.
Note that this decompose function will need to keep track of the "next" prime as it's factoring the integer. In some languages, you would use a mutable local variable for this; but in Standard ML, the normal approach is to write a recursive helper function with a parameter for this purpose. Specifically, you can write a function helper such that, if n and p are positive integers, p ≥ 2, where n is not divisible by any prime less than p, then helper n p is the prime decomposition of n. Then you just write
local
fun helper n p = ...
in
fun decompose n = helper n 2
end
Use this to generate the prime decomposition of the least common multiple of these integers.
To start with, you'll probably want to write a lcmTwoDecompositions function that takes a pair of prime decompositions, and computes the least common multiple (still in prime-decomposition form). (Writing this pairwise function is much, much easier than trying to create a multi-way least-common-multiple function from scratch.)
Using lcmTwoDecompositions, you can then use foldl or foldr, a.k.a. List.foldl or List.foldr, to create a function that takes a list of zero or more prime decompositions instead of just a pair. This makes use of the fact that the least common multiple of { n1, n2, …, nN } is lcm(n1, lcm(n2, lcm(…, lcm(nN, 1)…))). (This is a variant of what Mark Dickinson mentions above.)
Use this to compute the least common multiple of these integers.
This just requires a recompose function that takes a prime decomposition and computes the corresponding integer.
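Without spoiling the decomposition-based exercise, the folding identity above can be seen in isolation by working on plain integers instead of prime decompositions. The sketch below is in OCaml rather than SML, uses a gcd-based lcm (a deliberately simpler technique than the one outlined in the steps), and all of its names are illustrative. List.init stands in for SML's List.tabulate and List.fold_right for List.foldr.

```ocaml
(* Euclid's algorithm; assumes non-negative arguments. *)
let rec gcd a b = if b = 0 then a else gcd b (a mod b)

(* Pairwise least common multiple of two positive integers. *)
let lcm a b = a / gcd a b * b

(* The list [1; 2; ...; n]. *)
let range_1_to n = List.init n (fun i -> i + 1)

(* lcm(n1, lcm(n2, ... lcm(nN, 1) ...)), written as a right fold. *)
let lcm_all ns = List.fold_right lcm ns 1
```

Here lcm_all (range_1_to 10) evaluates to 2520, matching the value quoted in the question. The decomposition-based version has the same shape, with lcmTwoDecompositions in place of lcm and the empty decomposition in place of 1.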

Count negative numbers in list using list comprehension

Working through the first edition of "Introduction to Functional Programming", by Bird & Wadler, which uses a theoretical lazy language with Haskell-ish syntax.
Exercise 3.2.3 asks:
Using a list comprehension, define a function for counting the number
of negative numbers in a list
Now, at this point we're still scratching the surface of lists. I would assume the intention is that only concepts that have been introduced at that point should be used, and the following have not been introduced yet:
A function for computing list length
List indexing
Pattern matching i.e. f (x:xs) = ...
Infinite lists
All the functions and operators that act on lists - with one exception - e.g. ++, head, tail, map, filter, zip, foldr, etc
What tools are available?
A maximum function that returns the maximal element of a numeric list
List comprehensions, with possibly multiple generator expressions and predicates
The notion that the output of the comprehension need not depend on the generator expression, implying the generator expression can be used for controlling the size of the generated list
Finite arithmetic sequence lists i.e. [a..b] or [a, a + step..b]
I'll admit, I'm stumped. Obviously one can extract the negative numbers from the original list fairly easily with a comprehension, but how does one then count them, with no notion of length or indexing?
The availability of the maximum function would suggest the end game is to construct a list whose maximal element is the number of negative numbers, with the final result of the function being the application of maximum to said list.
I'm either missing something blindingly obvious, or a smart trick, with a horrible feeling it may be the former. Tell me SO, how do you solve this?
My old -- and very yellowed -- copy of the first edition has a note attached to Exercise 3.2.3: "This question needs # (length), which appears only later". The moral of the story is to be more careful when setting exercises. I am currently finishing a third edition, which contains answers to every question.
By the way, did you answer Exercise 1.2.1, which asks you to write down all the ways that square (square (3 + 7)) can be reduced to normal form? It turns out that there are 547 ways!
I think you may be assuming too many restrictions - taking the length of the filtered list seems like the blindingly obvious solution to me.
A couple of alternatives, but both involve using some other function that you say wasn't introduced:
sum [1 | x <- xs, x < 0]
maximum (0:[index | (index, ()) <- zip [1..] [() | x <- xs, x < 0]])

Choosing unique items from a list, using recursion

As follow up to yesterday's question Erlang: choosing unique items from a list, using recursion
In Erlang, say I wanted to choose all unique items from a given list, e.g.
List = [foo, bar, buzz, foo].
and I had used your code examples resulting in
NewList = [bar, buzz].
How would I further manipulate NewList in Erlang?
For example, say I not only wanted to choose all unique items from List, but also count the total number of characters of all resulting items from NewList?
In functional programming we have patterns that occur so frequently they deserve their own names and support functions. Two of the most widely used ones are map and fold (sometimes reduce). These two form basic building blocks for list manipulation, often obviating the need to write dedicated recursive functions.
Map
The map function iterates over a list in order, generating a new list where each element is the result of applying a function to the corresponding element in the original list. Here's how a typical map might be implemented:
map(Fun, [H|T]) -> % recursive case
[Fun(H)|map(Fun, T)];
map(_Fun, []) -> % base case
[].
This is a perfect introductory example to recursive functions; roughly speaking, the function clauses are either recursive cases (resulting in a call to itself with a smaller problem instance) or base cases (no recursive calls made).
So how do you use map? Notice that the first argument, Fun, is supposed to be a function. In Erlang, it's possible to declare anonymous functions (sometimes called lambdas) inline. For example, to square each number in a list, generating a list of squares:
map(fun(X) -> X*X end, [1,2,3]). % => [1,4,9]
This is an example of higher-order programming.
Note that map is part of the Erlang standard library as lists:map/2.
Fold
Whereas map creates a 1:1 element mapping between one list and another, the purpose of fold is to apply some function to each element of a list while accumulating a single result, such as a sum. The left fold (it walks the list from left to right, passing the accumulator along) might look like so:
foldl(Fun, Acc, [H|T]) -> % recursive case
    foldl(Fun, Fun(H, Acc), T);
foldl(_Fun, Acc, []) -> % base case
    Acc.
Using this function, we can sum the elements of a list:
foldl(fun(X, Sum) -> Sum + X end, 0, [1,2,3,4,5]). %% => 15
Note that foldr and foldl are both part of the Erlang standard library, in the lists module.
While it may not be immediately obvious, a very large class of common list-manipulation problems can be solved using map and fold alone.
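As one small illustration of that claim, the character-counting task from the question can be expressed as a map followed by a fold. The sketch below is in OCaml rather than Erlang, represents the items as strings, and total_chars is a made-up name.

```ocaml
(* Map each word to its length, then fold the lengths into a sum. *)
let total_chars words =
  List.fold_left ( + ) 0 (List.map String.length words)
```

For the list ["bar"; "buzz"] this gives 3 + 4 = 7, with no hand-written recursion involved.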
Thinking recursively
Writing recursive algorithms might seem daunting at first, but as you get used to it, it turns out to be quite natural. When encountering a problem, you should identify two things:
1) How can I decompose the problem into smaller instances? In order for recursion to be useful, the recursive call must take a smaller problem as its argument, or the function will never terminate.
2) What's the base case, i.e. the termination criterion?
As for 1), consider the problem of counting the elements of a list. How could this possibly be decomposed into smaller subproblems? Well, think of it this way: Given a non-empty list whose first element (head) is X and whose remainder (tail) is Y, its length is 1 + the length of Y. Since Y is smaller than the list [X|Y], we've successfully reduced the problem.
Continuing the list example, when do we stop? Well, eventually, the tail will be empty. We fall back to the base case, which is the definition that the length of the empty list is zero. You'll find that writing function clauses for the various cases is very much like writing definitions for a dictionary:
%% Definition:
%% The length of a list whose head is H and whose tail is T is
%% 1 + the length of T.
length([H|T]) ->
1 + length(T);
%% Definition: The length of the empty list ([]) is zero.
length([]) ->
0.
You could use a fold to recurse over the resulting list. For simplicity I turned your atoms into strings (you could do this with atom_to_list/1):
1> NewList = ["bar", "buzz"].
["bar","buzz"]
2> L = lists:foldl(fun (W, Acc) -> [{W, length(W)}|Acc] end, [], NewList).
[{"buzz",4},{"bar",3}]
This returns a proplist you can access like so:
3> proplists:get_value("buzz", L).
4
If you want to build the recursion yourself for didactic purposes instead of using the lists module:
count_char_in_list([], Count) ->
Count;
count_char_in_list([Head | Tail], Count) ->
count_char_in_list(Tail, Count + length(Head)). % a string is just a list of numbers
And then:
1> test:count_char_in_list(["bar", "buzz"], 0).
7

New to OCaml: How would I go about implementing Gaussian Elimination?

I'm new to OCaml, and I'd like to implement Gaussian Elimination as an exercise. I can easily do it with a stateful algorithm, meaning keep a matrix in memory and recursively operating on it by passing around a reference to it.
This statefulness, however, smacks of imperative programming. I know there are capabilities in OCaml to do this, but I'd like to ask if there is some clever functional way I haven't thought of first.
OCaml arrays are mutable, and it's hard to avoid treating them just like arrays in an imperative language.
Haskell has immutable arrays, but from my (limited) experience with Haskell, you end up switching to monadic, mutable arrays in most cases. Immutable arrays are probably amazing for certain specific purposes. I've always imagined you could write a beautiful implementation of dynamic programming in Haskell, where the dependencies among array entries are defined entirely by the expressions in them. The key is that you really only need to specify the contents of each array entry one time. I don't think Gaussian elimination follows this pattern, and so it seems it might not be a good fit for immutable arrays. It would be interesting to see how it works out, however.
You can use a Map to emulate a matrix. The key would be a pair of integers referencing the row and column. You'll want to use your own get x y function to ensure x < n and y < n though, instead of accessing the Map directly. (edit) You can use the polymorphic compare function from the standard library directly (Stdlib.compare, called Pervasives.compare in older OCaml versions).
module OrderedPairs = struct
  type t = int * int
  let compare = Stdlib.compare
end
module Pairs = Map.Make (OrderedPairs)
(* Bounds-checked lookup of entry (x, y) in an n-by-n matrix *)
let get_ n set x y =
  assert (x < n && y < n);
  Pairs.find (x, y) set
(* Bounds-checked update; Map.add takes the key, then the value, then the map *)
let set_ n set x y v =
  assert (x < n && y < n);
  Pairs.add (x, y) v set
Actually, having a general set of functions (get x y and set x y at a minimum), without specifying the implementation, would be an even better option. The functions then can be passed to the function, or be implemented in a module through a functor (a better solution, but having a set of functions just doing what you need would be a first step since you're new to OCaml). In this way you can use a Map, Array, Hashtbl, or a set of functions to access a file on the hard-drive to implement the matrix if you wanted. This is the really important aspect of functional programming; that you trust the interface over exploiting the side-effects, and not worry about the underlying implementation --since it's presumed to be pure.
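A minimal sketch of that idea follows, assuming a hypothetical MATRIX signature with just create, get and set. The names MATRIX, MapMatrix and Trace are invented for illustration, and Trace is only a stand-in for "some code that uses the matrix"; none of this comes from an existing library.

```ocaml
(* Hypothetical interface: client code depends only on these operations. *)
module type MATRIX = sig
  type t
  val create : int -> t                      (* n-by-n matrix of zeros *)
  val get : t -> int -> int -> float
  val set : t -> int -> int -> float -> t    (* returns an updated matrix *)
end

(* One possible implementation, backed by an immutable Map keyed by (row, col). *)
module MapMatrix : MATRIX = struct
  module Pairs = Map.Make (struct
    type t = int * int
    let compare = compare
  end)

  type t = { n : int; cells : float Pairs.t }

  let create n = { n; cells = Pairs.empty }

  let check m x y = assert (0 <= x && x < m.n && 0 <= y && y < m.n)

  (* Entries that were never set read as 0.0. *)
  let get m x y =
    check m x y;
    match Pairs.find_opt (x, y) m.cells with Some v -> v | None -> 0.0

  let set m x y v =
    check m x y;
    { m with cells = Pairs.add (x, y) v m.cells }
end

(* A functor: code written against MATRIX works with any implementation. *)
module Trace (M : MATRIX) = struct
  (* Sum of the first n diagonal entries. *)
  let trace m n =
    let rec go i acc =
      if i = n then acc else go (i + 1) (acc +. M.get m i i)
    in
    go 0 0.0
end
```

Swapping MapMatrix for an Array- or Hashtbl-backed module would leave Trace untouched, which is the point of trusting the interface rather than the representation.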
The answers so far are using/emulating mutable data-types, but what does a functional approach look like?
To see, let's decompose the problem into some functional components:
Gaussian elimination involves a sequence of row operations, so it is useful first to define a function taking 2 rows and scaling factors, and returning the resultant row operation result.
The row operations we want should eliminate a variable (column) from a particular row, so let's define a function which takes a pair of rows and a column index and uses the previously defined row operation to return the modified row with that column entry zero.
Then we define two functions, one to convert a matrix into triangular form, and another to back-substitute a triangular matrix to the diagonal form (using the previously defined functions) by eliminating each column in turn. We could iterate or recurse over the columns, and the matrix could be defined as a list, vector or array of lists, vectors or arrays. The input is not changed, but a modified matrix is returned, so we can finally do:
let out_matrix = to_diagonal (to_triangular in_matrix)
What makes it functional is not whether the data-types (array or list) are mutable, but how they are used. This approach may not be particularly 'clever' or be the most efficient way to do Gaussian eliminations in OCaml, but using pure functions lets you express the algorithm cleanly.
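The first few components described above might be sketched as follows. This is only one possible reading: it assumes a matrix is a list of rows and a row is a float list, it does no pivoting (so it assumes every pivot entry it meets is non-zero), and it stops at to_triangular, leaving to_diagonal out. The names row_op and eliminate are illustrative.

```ocaml
(* Row operation: a * r1 + b * r2, element by element.
   Both rows must have the same length. *)
let row_op a r1 b r2 = List.map2 (fun x y -> a *. x +. b *. y) r1 r2

(* Return a copy of row whose entry in column col is zero, obtained by
   subtracting a multiple of pivot. Assumes the pivot entry is non-zero. *)
let eliminate pivot row col =
  let p = List.nth pivot col and r = List.nth row col in
  row_op 1.0 row (-. r /. p) pivot

(* Forward elimination without pivoting. The input is left unchanged;
   a new upper-triangular matrix is returned. *)
let to_triangular m =
  let rec go col = function
    | [] -> []
    | pivot :: rest ->
      pivot :: go (col + 1) (List.map (fun row -> eliminate pivot row col) rest)
  in
  go 0 m
```

For example, to_triangular [[2.; 1.]; [4.; 5.]] returns [[2.; 1.]; [0.; 3.]]. A more robust version would choose pivots and handle zero entries, which this sketch deliberately ignores.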

Resources