In which functional programming function would one grow a set of items?

Which of the following three (if any; if none, please suggest an alternative) would be used to add elements to a list of items?
Fold
Map
Filter
Also, how would items be added? (Appended to the end / inserted after the working item / other?)

A list in functional programming is usually defined as a recursive data structure that is either a special empty value, or is composed of a value (dubbed "head") and another list (dubbed "tail"). In Haskell:
-- A* = 1 + A x A*
-- there is a builtin list type:
data [a] = [] | (a : [a])
To add an element at the head, you can use "cons": the function that takes a head and a tail, and produces the corresponding list.
-- (:) is "cons" in Haskell
(:) :: a -> [a] -> [a]
x = [1,2,3] -- this is short for (1:(2:(3:[])))
y = 0 : x -- y = [0,1,2,3]
To add elements at the end, you need to recurse down the list to add it. You can do this easily with a fold.
consAtEnd :: a -> [a] -> [a]
consAtEnd x = foldr (:) [x]
-- this "rebuilds" the whole list with cons,
-- but uses [x] in the place of []
-- effectively adding to the end
To add elements in the middle, you need to use a similar strategy:
consAt :: Int -> a -> [a] -> [a]
consAt n x l = consAtEnd x (take n l) ++ drop n l
-- ++ is the concatenation operator: it joins two lists
-- into one.
-- take picks the first n elements of a list
-- drop picks all but the first n elements of a list
Notice that, except for insertions at the head, these operations traverse the whole list, which may become a performance issue.
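A common way around the cost of repeated appends, shown here as a small Clojure sketch (my addition; the idiom is language-independent): build the result front-to-back with O(1) cons, then reverse once at the end.
(defn squares-up-to
  "First n squares, built with O(1) cons plus a single final reverse,
   rather than an O(n) append per element."
  [n]
  (loop [i 1, acc ()]
    (if (> i n)
      (reverse acc)
      (recur (inc i) (cons (* i i) acc)))))

(squares-up-to 5) ;; => (1 4 9 16 25)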

"cons" is the low-level operation used in most functional programming languages to construct various data structure including lists. In lispy syntax it looks like this:
(cons 0 (cons 1 (cons 2 (cons 3 nil))))
Visually, this is a linked list:
0 -> 1 -> 2 -> 3 -> nil
Or, perhaps more accurately:
cons -- cons -- cons -- cons -- nil
 |       |       |       |
 0       1       2       3
Of course you could construct various "tree"-like data structures with cons as well.
A tree-like structure might look something like this:
(cons (cons 1 2) (cons 3 4))
I.e., visually:
      cons
     /    \
  cons    cons
  /  \    /  \
 1    2  3    4
However, most functional programming languages will provide many "higher-level" functions for manipulating lists.
For example, in Haskell there's
Append: (++) :: [a] -> [a] -> [a]
List comprehension: [foo c | c <- s]
Cons: (:) :: a -> [a] -> [a] (as Martinho already mentioned)
And many many more
Just to offer a concluding remark: you wouldn't often operate on individual elements of a list in the way that you're probably thinking; that is an imperative mindset. You're more likely to copy the entire structure using a recursive function or something along those lines. The compiler/virtual machine is responsible for recognizing when the memory can be modified in place, updating pointers, etc.
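For instance, in Clojure (a small illustration added here, not part of the original answer), "updating" a persistent collection returns a new value, leaves the original untouched, and shares structure between the two behind the scenes:
(def xs [1 2 3])
(def ys (conj xs 4)) ;; a "new" vector sharing structure with xs

xs ;; => [1 2 3]   (unchanged)
ys ;; => [1 2 3 4]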


Minimal `set cover` solution in Clojure

I've been trying ways to distill (a lot of) database index suggestions into a set of indexes that are applicable to most databases. To do that, it turns out I need to solve a pretty basic but NP-complete set theory problem: the minimum set cover problem.
This means: given a set of sets s, select a subset of s that covers a certain domain u; if u isn't given, take it to be the union of s. The optimal subset is the one reaching a certain minimum, usually the minimal number of sets, but it could also be a minimum total weight if the sets are weighted.
(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
(def u (apply set/union s))
(set-cover s u)
=> (#{7 1 4} #{6 2 5} #{3} #{6 8})
I implemented a naive version of this by using clojure.math.combinatorics, relying on it returning subsets in order of increasing amounts of sets.
;; assumes (:require [clojure.set :as set]
;;                   [clojure.math.combinatorics :as combo])
(defn set-cover
  ([s]
   (set-cover s (apply set/union s)))
  ([s u]
   (->> s
        (combo/subsets)
        (filter (fn [sub] (= u (apply set/union sub))))
        first)))
However this is very slow on larger s, because of the NP nature and the recurring unions (even optimized ones). For my use-case a version supporting weighted sets would also be preferable.
Looking into optimized versions, most trails ended up in thesis-land, which I'm regrettably not smart enough for. I found this small Python implementation on SO:
def setCover(setList, target=None):
    if not setList: return None
    if target is None: target = set.union(*setList)
    bestCover = []
    for i, values in enumerate(setList):
        remaining = target - values
        if remaining == target: continue
        if not remaining: return [values]
        subCover = setCover(setList[i+1:], remaining)
        if not subCover: continue
        if not bestCover or len(subCover) < len(bestCover) - 1:
            bestCover = [values] + subCover
    return bestCover
It ticks many boxes:
works recursively
compares partial results as an optimization
seems suitable for different minimum definitions: count or weight
has additional optimizations I can grok, which can be done outside of the basic algorithm:
sorting input sets on high minimum score (size, weight)
identifying sets with elements unique within u, i.e. found in no other set (see the sketch after this list)
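For illustration, a minimal sketch of that last optimization (my addition; mandatory-sets is a made-up helper name): a set containing an element that occurs in no other set must be part of every cover, so it can be selected up front.
(defn mandatory-sets
  "Sets containing an element that no other set covers;
   any cover must include all of them."
  [s]
  (let [times-covered (frequencies (mapcat seq s))]
    (set (filter (fn [x] (some #(= 1 (times-covered %)) x)) s))))

(mandatory-sets #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
;; => #{#{1 4 7} #{3} #{6 8}}  (printed order may vary)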
I have been trying to translate this into Clojure as a loop-recur function, but couldn't get the basic version of it to work, since there are niggling paradigm gaps between the two languages.
Does anyone have suggestions on how I could go about solving this problem in Clojure, either tips on how to convert the Python algorithm successfully, or other Clojure (or even Java) libraries I could use, and how?
Here's a Clojure version of the greedy set cover algorithm, i.e. it selects the set that covers the most uncovered elements at each iteration. Rather than use loop/recur to build the complete result, it lazily returns each result element using lazy-seq:
;; assumes (:require [clojure.set :as set])
(defn greedy-set-cover
  ([xs]
   (greedy-set-cover xs (apply set/union xs)))
  ([xs us]
   (lazy-seq
     (when (seq us)
       (let [x   (apply max-key #(count (set/intersection us %)) xs)
             xs' (disj xs x)
             us' (set/difference us x)]
         (cons x (greedy-set-cover xs' us')))))))
(def s #{#{1 4 7} #{1 2} #{2 5 6} #{2 5} #{3} #{8 6}})
(greedy-set-cover s) ;; = (#{7 1 4} #{6 2 5} #{6 8} #{3})
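If weighted sets are needed (as the question mentions), one possible tweak (an assumption on my part, not from the original answer) is to score candidates by newly covered elements per unit of weight, where weight is any function from a set to a positive number:
(require '[clojure.set :as set])

(defn greedy-weighted-cover
  "Greedy cover preferring sets that cover many still-uncovered
   elements per unit of weight."
  [xs weight us]
  (lazy-seq
    (when (seq us)
      (let [x (apply max-key
                     #(/ (count (set/intersection us %)) (weight %))
                     xs)]
        (cons x (greedy-weighted-cover (disj xs x)
                                       weight
                                       (set/difference us x)))))))
For example, (greedy-weighted-cover s count (apply set/union s)) weights each set by its size.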

Invoking a continuation vs a function call

Let us consider a simple example of a factorial function, written in a pseudo-Lisp CPS style (enumeration & ordering of intermediate results is omitted, as it would be very noisy):
(def (fact n k)
  (if (eq? n 0)
      (k 1)
      (fact (- n 1) (\result (k (* n result))))))
Is invoking a continuation, as in (k 1) (i.e. "returning" a value), somehow technically different from a "normal" function call, as in the else-branch? The one thing that comes to my mind is that in this case the continuation is the only call that does not take another continuation in its argument :)
Also, can you say that this computation resembles a DFS tree walk of a dynamically constructed tree, with the current computation being the currently explored node and the "other unexplored branches" being the call stack/continuation?
Well, you've answered your first question: the technical difference is that there's no more passing of continuations; this is the point where the tension built up so far gets released, i.e. where the accumulated computations will actually be performed [by a sequence of beta reductions].
And of course from the point of view of the language's semantics, this is an ordinary function application.
As for the second question, I may be understanding you wrong, but I'd say that every computation is a leaf-ward tree walk, except the tree is not dynamically constructed but rather is a static (most often infinite) object [uniquely] defined by the program. But then, the call stack [even if it's trivial, because of tail calls] is what you have already walked through (from root to current node), while the continuation is your future path, from the point of applying the continuation to something (i.e. while you're applying this fact function, the next node is fact again unless n is 0).
(fact _ id)
    /     \
(= _ 0)?   otherwise
    |          \
 (id 1)    (fact _ (λ (x) (id (* 2 x))))
    |          /       \
    1     (= _ 0)?   otherwise
              |          \
      (id (* 2 1))   (fact _ (λ (x) (id (* 2 (* 1 x)))))
              |          /       \
          (id 2)    (= _ 0)?   otherwise
              |         |          \
              2   (id (* 2 (* 1 1)))  ...
                        |
                  (id (* 2 1))
                        |
                    (id 2)
                        |
                        2
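For concreteness, here is the same CPS factorial in Clojure (a sketch added here, not part of the original answer); note that applying the continuation in the base case is, mechanically, just an ordinary call:
(defn fact-cps [n k]
  (if (zero? n)
    (k 1)                                          ;; "return": an ordinary call
    (recur (dec n) (fn [result] (k (* n result))))))

(fact-cps 5 identity) ;; => 120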
If you like thinking in these directions, you might like to read about process trees in e.g.
Hatcliff's "An Introduction to Online and Offline Partial Evaluation", http://repository.readscheme.org/ftp/papers/pe98-school/hatcliff-DIKU-PE-summerschool.pdf (btw, the topic of PE is damn interesting),
and perhaps you might like (at least the first 20 or so pages of) Scott's "The Lattice of Flow Diagrams" (https://www.cs.ox.ac.uk/files/3223/PRG03.pdf; actually, imho, this paper translates "more naturally" to applicative functional languages).
Hope that gives you some insights.

So: what's the point?

What is the intended purpose of the So type? Transliterating into Agda:
data So : Bool → Set where
  oh : So true
So lifts a Boolean proposition up to a logical one. Oury and Swierstra's introductory paper The Power of Pi gives an example of a relational algebra indexed by the tables' columns. Taking the product of two tables requires that they have different columns, for which they use So:
Schema = List (String × U) -- U is the universe of SQL types

-- false iff the schemas share any column names
disjoint : Schema -> Schema -> Bool
disjoint = ...

data RA : Schema → Set where
  -- ...
  Product : ∀ {s s'} → {So (disjoint s s')} → RA s → RA s' → RA (append s s')
I'm used to constructing evidence terms for the things I want to prove about my programs. It seems more natural to construct a logical relation on Schemas to ensure disjointness:
Disjoint : Rel Schema _
Disjoint s s' = All (λ x -> x ∉ cols s) (cols s')
  where cols = map proj₁
So seems to have serious disadvantages compared to a "proper" proof-term: pattern matching on oh doesn't give you any information with which you could make another term type-check (Does it?) - which would mean So values can't usefully participate in interactive proving. Contrast this with the computational usefulness of Disjoint, which is represented as a list of proofs that each column in s' doesn't appear in s.
I don't really believe that the specification So (disjoint s s') is simpler to write than Disjoint s s' - you have to define the Boolean disjoint function without help from the type-checker - and in any case Disjoint pays for itself when you want to manipulate the evidence contained therein.
I am also sceptical that So saves effort when you're constructing a Product. In order to give a value of So (disjoint s s'), you still have to do enough pattern matching on s and s' to satisfy the type checker that they are in fact disjoint. It seems like a waste to discard the evidence thus generated.
So seems unwieldy for both authors and users of code in which it's deployed. So, under what circumstances would I want to use So?
If you already have a b : Bool, you can turn it into a proposition: So b, which is a bit shorter than b ≡ true. Sometimes (I don't remember any actual case) there is no need to bother with a proper data type, and this quick solution is enough.
So seems to have serious disadvantages compared to a "proper"
proof-term: pattern matching on oh doesn't give you any information
with which you could make another term type-check. As a corollary,
So values can't usefully participate in interactive proving.
Contrast this with the computational usefulness of Disjoint, which
is represented as a list of proofs that each column in s' doesn't
appear in s.
So does give you the same information as Disjoint; you just need to extract it. Basically, if there is no inconsistency between disjoint and Disjoint, then you should be able to write a function So (disjoint s s') -> Disjoint s s' using pattern matching, recursion and elimination of impossible cases.
However, if you tweak the definition a bit:
So : Bool -> Set
So true = ⊤
So false = ⊥
So becomes a really useful type, because x : So true immediately reduces to tt due to the eta-rule for ⊤. This allows you to use So like a constraint: in pseudo-Haskell we could write
forall n. (n <=? 3) => Vec A n
and if n is in canonical form (i.e. suc (suc (suc ... zero))), then n <=? 3 can be checked by the compiler and no proofs are needed. In actual Agda it is
∀ {n} {_ : So (n <=? 3)} -> Vec A n
I used this trick in this answer (it is {_ : False (m ≟ 0)} there). And I guess it would be impossible to write a usable version of the machinery described here without this simple definition:
Is-just : ∀ {α} {A : Set α} -> Maybe A -> Set
Is-just = T ∘ isJust
where T is the name So goes by in Agda's standard library.
Also, in the presence of instance arguments So-as-a-data-type can be used as So-as-a-constraint:
open import Data.Bool.Base
open import Data.Nat.Base
open import Data.Vec

data So : Bool -> Set where
  oh : So true

instance
  oh-instance : So true
  oh-instance = oh

_<=_ : ℕ -> ℕ -> Bool
0     <= m     = true
suc n <= 0     = false
suc n <= suc m = n <= m

vec : ∀ {n} {{_ : So (n <= 3)}} -> Vec ℕ n
vec = replicate 0

ok : Vec ℕ 2
ok = vec

fail : Vec ℕ 4
fail = vec -- rejected: there is no instance of So (4 <= 3), i.e. So false

What determines when a collection is created?

If I understand correctly, Clojure code can return lists (as in other Lisps) but also vectors and sets.
What I don't really get is why a collection isn't always what is returned.
For example if I take the following code:
(loop [x 128]
  (when (> x 1)
    (println x)
    (recur (/ x 2))))
It does print 128 64 32 16 8 4 2. But that's only because println is called and println has the side-effect (?) of printing something.
So I tried replacing it with this (removing the println):
(loop [x 128]
  (when (> x 1)
    x
    (recur (/ x 2))))
And I was expecting to get some collection (supposedly a list), like this:
(128 64 32 16 8 4 2)
but instead I'm getting nil.
I don't understand what determines when a collection is created and when it isn't, and how you switch from one to the other. Also, seeing that Clojure somehow encourages a "functional" way of programming, aren't you supposed to nearly always return collections?
Why are there so many functions that apparently do not return any collection? And what would be an idiomatic way to make them return collections?
For example, how would I solve the above problem by first constructing a collection and then iterating (?) in an idiomatic way over the resulting list/vector?
First, I don't know how to transform the loop so that it produces something other than nil, and then I tried the following:
(reduce println '(1 2 3))
But it prints "1 2nil 3nil" instead of the "1 2 3nil" I was expecting.
I realize this is basic stuff but I'm just starting and I'm obviously missing basic stuff here.
(P.S.: retag appropriately, I don't know which terms I should use here)
A few other comments have pointed out that when doesn't really work like if - but I don't think that's really your question.
The loop and recur forms create an iteration - like a for loop in other languages. In this case, when you are printing, it is indeed just for the side effects. If you want to return a sequence, then you'll need to build one:
(loop [x 128
       acc []]
  (if (< x 1)
    acc
    (recur (/ x 2)
           (cons x acc))))
=> (1 2 4 8 16 32 64 128)
In this case, I replaced the spot where you were calling println with a recur, and added a form that conses x onto the front of an accumulator. In the case that x is less than 1, the code returns the accumulator - and thus a sequence. If you want to add to the end of the vector instead of the front, change it to conj:
(loop [x 128
       acc []]
  (if (< x 1)
    acc
    (recur (/ x 2)
           (conj acc x))))
=> [128 64 32 16 8 4 2 1]
You were getting nil because that was the value of your overall expression: once (> x 1) fails, the when form returns nil.
Does all this make sense?
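(As an aside, not part of the original answer: the same sequence can also be produced without an explicit loop, by generating the halvings lazily and cutting them off.)
(take-while #(> % 1) (iterate #(/ % 2) 128))
;; => (128 64 32 16 8 4 2)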
reduce is not quite the same thing -- it is used to reduce a list by repeatedly applying a binary function (a function that takes 2 arguments). On the first iteration the function is passed either an initial value and the first element of the sequence, or the first two elements of the sequence; subsequent iterations are passed the result of the previous iteration and the next value from the sequence. Some examples may help:
(reduce + [1 2 3 4])
10
This executes the following:
(+ 1 2) => 3
(+ 3 3) => 6
(+ 6 4) => 10
Reduce will result in whatever the final result is from the binary function being executed -- in this case we're reducing the numbers in the sequence into the sum of all the elements.
You can also supply an initial value:
(reduce + 5 [1 2 3 4])
15
Which executes the following:
(+ 5 1) => 6
(+ 6 2) => 8
(+ 8 3) => 11
(+ 11 4) => 15
HTH,
Kyle
The generalized abstraction over collections is called a sequence in Clojure, and many data structures implement this abstraction, so you can use all sequence-related operations on those data structures without thinking about which one is being passed to your function(s).
As far as the sample code is concerned, loop/recur is for recursion, so basically any problem that you would solve using recursion can be solved with it, the classic example being factorial. You can create a vector/list using loop, by using the accumulator as a vector, appending items to it, and returning the accumulated vector in the exit condition of the recursion. But you can also use the reductions and take-while functions to do so, as shown below. This returns a lazy sequence.
Ex:
(take-while #(> % 1) (reductions (fn [s _] (/ s 2)) 128 (range)))

Map and Reduce Monad for Clojure... What about a Juxt Monad?

Whilst learning Clojure, I've spent ages trying to make sense of monads - what they are and how we can use them - without too much success. However, I found an excellent 'Monads for Dummies' video series for Clojure by Brian Marick - http://vimeo.com/20717301
So far, my understanding is that a monad is sort of like a macro in that it allows a set of statements to be written in a form that is easy to read, but monads are much more formalised. My observations are limited to two examples:
1. The Identity Monad (or the 'let' monad) taken from http://onclojure.com/2009/03/05/a-monad-tutorial-for-clojure-programmers-part-1/
The form that we wish to write is:
(let [a 1
      b (inc a)]
  (* a b))
and the corresponding monad is
(domonad identity-m
  [a 1
   b (inc a)]
  (* a b))
2. The Sequence Monad (or the 'for' monad) taken from http://onclojure.com/2009/03/06/a-monad-tutorial-for-clojure-programmers-part-2/
The form we wish to write is:
(for [a (range 5)
      b (range a)]
  (* a b))
and the corresponding monad is
(domonad sequence-m
  [a (range 5)
   b (range a)]
  (* a b))
Monad Definitions in Clojure
Looking at the source, using the Clojure monads library - https://github.com/clojure/algo.monads:
user=>(use 'clojure.algo.monads)
nil
identity monad:
user=> (source identity-m)
(defmonad identity-m
  [m-result identity
   m-bind   (fn m-result-id [mv f]
              (f mv))
   ])
sequence monad:
user=> (source sequence-m)
(defmonad sequence-m
  [m-result (fn m-result-sequence [v]
              (list v))
   m-bind   (fn m-bind-sequence [mv f]
              (flatten* (map f mv)))
   m-zero   (list)
   m-plus   (fn m-plus-sequence [& mvs]
              (flatten* mvs))
   ])
So my conclusion is that a monad is some sort of a generalised higher-order function that takes in an input-function and input-values, adds its own control logic and spits out a 'thing' that can be used in a 'domonad' block.
Question 1
So finally, to the questions: I want to learn how to write a monad and say I want to write a 'map monad' that imitates the 'map' form in clojure:
(domonad map-m
  [a [1 2 3 4 5]
   b [5 6 7 8 9]]
  (+ a b))
Should be equivalent to
(map + [1 2 3 4 5] [5 6 7 8 9])
and return the values
[6 8 10 12 14]
If I look at the source, it should give me something similar to identity-m and sequence-m:
user=> (source map-m)
(defmonad map-m
  [m-result ...
   m-bind   ...
   m-zero   ...
   m-plus   ...
   ])
Question 2
I also want to be able to define 'reduce-m' such that I can write:
(domonad reduce-m
  [a [1 2 3 4 5]]
  (* a))
this could potentially give me 1 x 2 x 3 x 4 x 5 = 120 or
(domonad reduce-m
  [a [1 2 3 4 5]
   b [1 2 3 4 5]]
  (+ a b))
will give me (1+2+3+4+5) + (1+2+3+4+5) = 30
Finally
Would I also be able to write a 'juxt monad' that imitates the juxt function, but instead of passing in values for binding, I pass in a set of functions?
(domonad juxt-m
  [a #(+ % 1)
   b #(* % 2)]
  '([1 2 3 4 5] b a))
gives
[ [2 2] [4 3] [6 4] [8 5] [10 6] ]
Potentially, I could do all of those things with macros, so I don't really know how useful these 'monads' would be, or whether they are even considered 'monads'... With all the resources on the internet, it seems to me that if I wanted to learn monads properly, I would have to learn Haskell, and right now learning another syntax is just too hard. I think I found some links that may be relevant, but they are too cryptic for me.
Please can someone shed some light!
Your examples are not monads. A monad represents composable computational steps. In the trivial identity monad, the computational step is just an expression evaluation.
In the maybe monad, a step is an expression that may succeed or fail.
In the sequence monad, a step is an expression that produces a variable number of results (the elements of the sequence).
In the writer monad, a computational step is a combination of expression evaluation and log output. In the state monad, a computational step involves accessing and/or modifying a piece of mutable state.
In all these cases, the monad plumbing takes care of correctly combining steps. The m-result function packages a "plain" value to fit into the monadic computation scheme, and the m-bind function feeds the result of one computational step into the next computational step.
In (map + a b), there are no computational steps to be combined. There is no notion of order. It's just nested expression evaluation. Same for reduce.
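To see the difference, consider a sketch (added here, using clojure.algo.monads; not part of the original answer) of genuinely composable steps: in the maybe monad each step may fail, and a failed step short-circuits everything after it.
(require '[clojure.algo.monads :refer [domonad maybe-m]])

(defn safe-div [a b]
  (when-not (zero? b) (/ a b))) ;; nil signals failure

(domonad maybe-m
  [x (safe-div 10 2)  ;; x = 5
   y (safe-div x 0)]  ;; this step fails ...
  (+ x y))
;; => nil             ;; ... so the whole computation is nil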
What you are asking for is not a kind of monad. It looks more like syntactic sugar, which you could accomplish using macros, and I wouldn't even suggest you do that, because map, reduce, etc. are simple functions and there is no need to make their interface complex.
The reason these cases are not monads is that monads are, roughly, amplified values that wrap normal values. In the map and reduce cases, the vector you use doesn't need to be amplified for it to work with map or reduce.
It would be helpful for you to try macroexpand on the domonad expressions.
For example, the sequence monad example expands to (roughly) the following nesting:
(m-bind (range 5)
        (fn [a]
          (m-bind (range a)
                  (fn [b]
                    (m-result (* a b))))))
where m-bind and m-result are the functions defined in the sequence monad.
So basically the vector expressions after the domonad get nested, and the expression after the vector is used as-is: in the above case (* a b) is called as-is (just that the a and b values are provided by the monad). In your map monad example the vector expressions are supposed to stay as they are, and the last expression (+ a b) should somehow mean (map + a b), which is not what a monad is supposed to do.
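You can even write that nesting by hand (a sketch added here) with the sequence monad's operations in scope, and check that it matches the for comprehension:
(require '[clojure.algo.monads :refer [with-monad sequence-m]])

(with-monad sequence-m
  (m-bind (range 5)
          (fn [a]
            (m-bind (range a)
                    (fn [b]
                      (m-result (* a b)))))))
;; => (0 0 2 0 3 6 0 4 8 12)
;; same as (for [a (range 5), b (range a)] (* a b))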
I found some really good monads resources:
http://www.clojure.net/tags.html#monads-ref (Jim Duey's monads guide, which really goes into the nitty-gritty of monad definitions)
http://homepages.inf.ed.ac.uk/wadler/topics/monads.html#marktoberdorf (A whole load of papers on monads)
http://vimeo.com/17207564 (A talk on category theory, which I half followed)
So from Jim's guide - http://www.clojure.net/2012/02/06/Legalities/ - here are the three monad laws constraining the 'm-bind' and 'm-result' functions:
Identity
The first Monad Law can be written as
(m-bind (m-result x) f) is equal to (f x)
What this means is that whatever m-result does to x to make it into a monadic value, m-bind undoes before applying f to x. So with regards to m-bind, m-result is sort of like an identity function. In category theory terminology it's the unit, which is why you'll sometimes see it named 'unit'.
Reverse Identity
The second Monad Law can be written as
(m-bind mv m-result) is equal to mv where mv is a monadic value.
This law is something like a complement to the first law. It basically ensures that m-result is a monadic function, and that whatever m-bind does to a monadic value to extract a value, m-result undoes to create a monadic value.
Associativity
The third Monad Law can be written as
(m-bind (m-bind mv f) g) is equal to (m-bind mv (fn [x] (m-bind (f x) g)))
where f and g are monadic functions and mv is a monadic value.
What this law is saying is that it doesn't matter whether f is applied to mv and g is then applied to the result, or whether a new monadic function is created out of a composition of f and g and then applied to mv. Either way, the resulting value is the same monadic value. Stated another way, composition under m-bind is associative.
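These laws can be spot-checked directly against the sequence monad (a quick sketch added here):
(require '[clojure.algo.monads :refer [with-monad sequence-m]])

(with-monad sequence-m
  (let [f  (fn [x] (list x (inc x)))
        mv (list 1 2)]
    [(= (m-bind (m-result 3) f) (f 3))            ;; identity
     (= (m-bind mv m-result) mv)                  ;; reverse identity
     (= (m-bind (m-bind mv f) f)
        (m-bind mv (fn [x] (m-bind (f x) f))))])) ;; associativity
;; => [true true true]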
In http://www.clojure.net/2012/02/04/Sets-not-lists/ he gives a monad that takes sets as inputs instead of lists. I will work through all the examples...
