What would be the best way to implement a bi-directional map in clojure? (By bi-directional map, I mean an associative map which can provide both A->B and B->A access. So in effect, the values themselves would be keys for going in the opposite direction.)
I suppose I could set up two maps, one in each direction, but is there a more idiomatic way of doing this?
I'm interested both in cases where we want a bijection, implying that no two keys could map to the same value, and cases where that condition isn't imposed.
You could always use a Java library for this, like one of the collections in Apache commons. TreeBidiMap implements java.util.Map so it's even seq-able without any effort.
user> (def x (org.apache.commons.collections.bidimap.TreeBidiMap.))
#'user/x
user> (.put x :foo :bar)
nil
user> (keys x)
(:foo)
user> (.getKey x :bar)
:foo
user> (:foo x)
:bar
user> (map (fn [[k v]] (str k ", " v)) x)
(":foo, :bar")
Some things won't work though, like assoc and dissoc, since they expect persistent collections and TreeBidiMap is mutable.
If you really want to do this in native Clojure, you could use metadata to hold the reverse-direction hash. This is still going to double your memory requirements and double the time for every add and delete, but lookups will be fast enough and at least everything is bundled.
(defn make-bidi []
(with-meta {} {}))
(defn assoc-bidi [h k v]
(vary-meta (assoc h k v)
assoc v k))
(defn dissoc-bidi [h k]
(let [v (h k)]
(vary-meta (dissoc h k)
dissoc v)))
(defn getkey [h v]
((meta h) v))
You'd probably have to implement a bunch of other functions to get full functionality of course. Not sure how feasible this approach is.
user> (def x (assoc-bidi (make-bidi) :foo :bar))
#'user/x
user> (:foo x)
:bar
user> (getkey x :bar)
:foo
For most cases, I've found the following works fine:
(defn- bimap
[a-map]
(merge a-map (clojure.set/map-invert a-map)))
This will simply merge the original map with the inverted map into a new map, then you can use regular map operations with the result.
It's quick and dirty, but obviously you should avoid this implementation if any of the keys might be the same as any of the values or if the values in a-map aren't unique.
We can reframe this problem as follows:
Given a list of tuples ([a1 b1] [a2 b2] ...) we want to be able to quickly lookup tuples by both a and b. So we want mappings a -> [a b] and b -> [a b]. Which means that at least tuples [a b] could be shared. And if we think about implementing it then it quickly becomes apparent that this is an in-memory database table with columns A and B that is indexed by both columns.
So you can just use a lightweight in-memory database.
Sorry, I'm not familiar with Java Platform enough to recommend something specific. Of cause Datomic comes to mind, but it could be an overkill for this particular use case. On the other hand it has immutability baked in which seems desirable to you based on your comment to Brian's answer.
Related
I've looked at everything I can find about letrec, and I still don't understand what it brings to a language as a feature. It seems like everything expressible with letrec could just as easily be written as a recursive function. But are there any reasons to expose letrec as a feature of a programming language, if the language already supports recursive functions? Why do several languages expose both?
I get that letrec might be used to implement other features including recursive functions, but that's not relevant to why it should itself be a feature. I've also read that some people find it more readable than recursive functions in some lisps, but again this is not relevant, because the designer of the language can make an effort to make recursive functions readable enough to not need another feature. Finally, I've been told that letrec makes it possible to express some kinds of recursive values more succinctly, but I have yet to find a motivating example.
TL;DR: define is letrec. This is what enables us to write recursive defintions in the first place.
Consider
let fact = fun (n => (n==0 -> 1 ; n * fact (n-1)))
To what entity does the name fact inside the body of this definiton refer? With let foo = val, val is defined in terms of already known entities, so it can't refer to foo which is not defined yet. In terms of scope this can be said (and usually is) that the RHS of the let equation is defined in the outer scope.
The only way for the inner fact to actually point at the one being defined, is to use letrec, where the entity being defined is allowed to refer to the scope in which it is being defined. So while causing evaluation of an entity while its definition is in progress is an error, storing a reference to its (future, at this point in time) value is fine -- in the case of using letrec that is.
The define you refer to, is just letrec under another name. In Scheme as well.
Without the ability of an entity being defined to refer to itself, i.e. in languages with non-recursive let, to have recursion one has to resort to the use of arcane devices such as the y-combinator. Which is cumbersome and usually inefficient. Another way is the definitions like
let fact = (fun (f => f f)) (fun (r => n => (n==0 -> 1 ; n * r r (n-1))))
So letrec brings to the table the efficiency of implementation, and convenience for a programmer.
The quesion then becomes, why expose the non-recursive let? Haskell indeed does not. Scheme has both letrec and let. One reason might be for completeness. Another might be a simpler implementation for let, with less self-referential run-time structures in memory making it easier on the garbage collector.
You ask for a motivational example. Consider defining Fibonacci numbers as a self-referential lazy list:
letrec fibs = {0} + {1} + add fibs (tail fibs)
With non-recursive let another copy of the list fibs will be defined, to be used as the input to the element-wise addition function add. Which will cause the definition of another copy of fibs for this one to be defined in its terms. And so on; accessing the nth Fibonacci number will cause a chain of n-1 lists to be created and maintained at run-time! Not a pretty picture.
And that's assuming the same fibs was used for tail fibs as well. If not, all bets are off.
What is needed is that fibs uses itself, refers to itself, so only one copy of the list is maintained.
NB: Although this is not a Scheme specific problem I'm using Scheme to demonstrate the differences. Hope you can read a little lisp code
A letrec is just a special let where the bindings themselves are defined before the expressions that represent their values are evaluated. Imagine this:
(define (fib n)
(let ((fib (lambda (n a b)
(if (zero? n)
a
(fib (- n 1) b (+ a b))))))
(fib n))
This code fails since while fib does exist in the body of the let it does exist in the closure it defines since the binding didn't exist when the lambda was evaluated. To fix this letrec comes to the rescue:
(define (fib n)
(letrec ((fib (lambda (n a b)
(if (zero? n)
a
(fib (- n 1) b (+ a b))))))
(fib n))
That letrec is just syntax that does something like this:
(define (fib n)
(let ((fib 'undefined))
(let ((tmp (lambda (n a b)
(if (zero? n)
a
(fib (- n 1) b (+ a b))))))
(set! fib tmp))
(fib n)))
So here you clearly see fib exists when the lambda gets evaluated and the binding is later set to the closure itself. The binding is the same, only it's pointer has changed. It's circular reference 101..
So what happens when you make a global function? Clearly if it is to recurse it needs to exist before the lambda is evaluated or the environment has to be mutated. It needs to fix the same problem here too.
In a functional language implementation where mutation is not ok you can solve this problem with a Y (or Z) combinator.
If you are interested in how languages are implemented I suggest you start at Matt Mights articles.
I'm studying lambda calculus with the book "An Introduction to Functional Programming Through Lambda Calculus" by Greg Michaelson.
I implement examples in Clojure using only a subset of the language. I only allow :
symbols
one-arg lambda functions
function application
var definition for convenience.
So far I have those functions working :
(def identity (fn [x] x))
(def self-application (fn [s] (s s)))
(def select-first (fn [first] (fn [second] first)))
(def select-second (fn [first] (fn [second] second)))
(def make-pair (fn [first] (fn [second] (fn [func] ((func first) second))))) ;; def make-pair = λfirst.λsecond.λfunc.((func first) second)
(def cond make-pair)
(def True select-first)
(def False select-second)
(def zero identity)
(def succ (fn [n-1] (fn [s] ((s False) n-1))))
(def one (succ zero))
(def zero? (fn [n] (n select-first)))
(def pred (fn [n] (((zero? n) zero) (n select-second))))
But now I am stuck on recursive functions. More precisely on the implementation of add. The first attempt mentioned in the book is this one :
(def add-1
(fn [a]
(fn [b]
(((cond a) ((add-1 (succ a)) (pred b))) (zero? b)))))
((add zero) zero)
Lambda calculus rules of reduction force to replace the inner call to add-1 with the actual definition that contains the definition itself... endlessly.
In Clojure, wich is an application order language, add-1 is also elvaluated eagerly before any execution of any kind, and we got a StackOverflowError.
After some fumblings, the book propose a contraption that is used to avoid the infinite replacements of the previous example.
(def add2 (fn [f]
(fn [a]
(fn [b]
(((zero? b) a) (((f f) (succ a)) (pred b)))))))
(def add (add2 add2))
The definition of add expands to
(def add (fn [a]
(fn [b]
(((zero? b) a) (((add2 add2) (succ a)) (pred b))))))
Which is totally fine until we try it! This is what Clojure will do (referential transparency) :
((add zero) zero)
;; ~=>
(((zero? zero) zero) (((add2 add2) (succ zero)) (pred zero)))
;; ~=>
((select-first zero) (((add2 add2) (succ zero)) (pred zero)))
;; ~=>
((fn [second] zero) ((add (succ zero)) (pred zero)))
On the last line (fn [second] zero) is a lambda that expects one argument when applied. Here the argument is ((add (succ zero)) (pred zero)).
Clojure is an "applicative order" language so the argument is evaluated before function application, even if in that case the argument won't be used at all. Here we recur in add that will recur in add... until the stack blows up.
In a language like Haskell I think that would be fine because it's lazy (normal order), but I'm using Clojure.
After that, the book go in length presenting the tasty Y-combinator that avoid the boilerplate but I came to the same gruesome conclusion.
EDIT
As #amalloy suggests, I defined the Z combinator :
(def YC (fn [f] ((fn [x] (f (fn [z] ((x x) z)))) (fn [x] (f (fn [z] ((x x) z)))))))
I defined add2 like this :
(def add2 (fn [f]
(fn [a]
(fn [b]
(((zero? b) a) ((f (succ a)) (pred b)))))))
And I used it like this :
(((YC add2) zero) zero)
But I still get a StackOverflow.
I tried to expand the function "by hand" but after 5 rounds of beta reduction, it looks like it expands infinitely in a forest of parens.
So what is the trick to make Clojure "normal order" and not "applicative order" without macros. Is it even possible ? Is it even the solution to my question ?
This question is very close to this one : How to implement iteration of lambda calculus using scheme lisp? . Except that mine is about Clojure and not necessarily about Y-Combinator.
For strict languages, you need the Z combinator instead of the Y combinator. It's the same basic idea but replacing (x x) with (fn [v] (x x) v) so that the self-reference is wrapped in a lambda, meaning it is only evaluated if needed.
You also need to fix your definition of booleans in order to make them work in a strict language: you can't just pass it the two values you care about and select between them. Instead, you pass it thunks for computing the two values you care about, and call the appropriate function with a dummy argument. That is, just as you fix the Y combinator by eta-expanding the recursive call, you fix booleans by eta-expanding the two branches of the if and eta-reduce the boolean itself (I'm not 100% sure that eta-reducing is the right term here).
(def add2 (fn [f]
(fn [a]
(fn [b]
((((zero? b) (fn [_] a)) (fn [_] ((f (succ a)) (pred b)))) b)))))
Note that both branches of the if are now wrapped with (fn [_] ...), and the if itself is wrapped with (... b), where b is a value I chose arbitrarily to pass in; you could pass zero instead, or anything at all.
The problem I'm seeing is that you have too strong of a coupling between your Clojure program and your Lambda Calculus program
you're using Clojure lambdas to represent LC lambdas
you're using Clojure variables/definitions to represent LC variables/definitions
you're using Clojure's application mechanism (Clojure's evaluator) as LC's application mechanism
So you're actually writing a clojure program (not an LC program) that is subject to the effects of the clojure compiler/evaluator – which means strict evaluation and non-constant-space direction recursion. Let's look at:
(def add2 (fn [f]
(fn [a]
(fn [b]
(((zero? b) a) ((f (succ a)) (pred b)))))))
As a Clojure program, in a strictly evaluated environment, each time we call add2, we evaluate
(zero? b) as value1
(value1 a) as value2
(succ a) as value3
(f value2) as value4
(pred b) as value5
(value2 value4) as value6
(value6 value5)
We can now see that calling to add2 always results in call to the recursion mechanism f – of course the program never terminates and we get a stack overflow!
You have a few options
per #amalloy's suggestions, use thunks to delay the evaluation of certain expressions and then force (run) them when you're ready to continue the computation – tho I don't think this is going to teach you much
you can use Clojure's loop/recur or trampoline for constant-space recursions to implement your Y or Z combinator – there's a little hang-up here tho because you're only wishing to support single-parameter lambdas, and it's going to be a tricky (maybe impossible) to do so in a strict evaluator that doesn't optimise tail calls
I do a lot of this kind of work in JS because most JS machines suffer the same problem; if you're interested in seeing homebrew workarounds, check out: How do I replace while loops with a functional programming alternative without tail call optimization?
write an actual evaluator – this means you can decouple your the representation of your Lambda Calculus program from datatypes and behaviours of Clojure and Clojure's compiler/evaluator – you get to choose how those things work because you're the one writing the evaluator
I've never done this exercise in Clojure, but I've done it a couple times in JavaScript – the learning experience is transformative. Just last week, I wrote https://repl.it/Kluo which is uses a normal order substitution model of evaluation. The evaluator here is not stack-safe for large LC programs, but you can see that recursion is supported via Curry's Y on line 113 - it supports the same recursion depth in the LC program as the underlying JS machine supports. Here's another evaluator using memoisation and the more familiar environment model: https://repl.it/DHAT/2 – also inherits the recursion limit of the underlying JS machine
Making recursion stack-safe is really difficult in JavaScript, as I linked above, and (sometimes) considerable transformations need to take place in your code before you can make it stack-safe. It took me two months of many sleepless nights to adapt this to a stack-safe, normal-order, call-by-need evaluator: https://repl.it/DIfs/2 – this is like Haskell or Racket's #lang lazy
As for doing this in Clojure, the JavaScript code could be easily adapted, but I don't know enough Clojure to show you what a sensible evaluator program might look like – In the book, Structure and Interpretation of Computer Programs,
(chapter 4), the authors show you how to write an evaluator for Scheme (a Lisp) using Scheme itself. Of course this is 10x more complicated than primitive Lambda Calculus, so it stands to reason that if you can write a Scheme evaluator, you can write an LC one too. This might be more helpful to you because the code examples look much more like Clojure
a starting point
I studied a little Clojure for you and came up with this – it's only the beginning of a strict evaluator, but it should give you an idea of how little work it takes to get pretty close to a working solution.
Notice we use a fn when we evaluate a 'lambda but this detail is not revealed to the user of the program. The same is true for the env – ie, the env is just an implementation detail and should not be the user's concern.
To beat a dead horse, you can see that the substitution evaluator and the environment-based evaluator both arrive at the equivalent answers for same input program – I can't stress enough how these choices are up to you – in SICP, the authors even go on to change the evaluator to use a simple register-based model for binding variables and calling procs. The possibilities are endless because we've elected to control the evaluation; writing everything in Clojure (as you did originally) does not give us that kind of flexibility
;; lambda calculus expression constructors
(defn variable [identifier]
(list 'variable identifier))
(defn lambda [parameter body]
(list 'lambda parameter body))
(defn application [proc argument]
(list 'application proc argument))
;; environment abstraction
(defn empty-env []
(hash-map))
(defn env-get [env key]
;; implement
)
(defn env-set [env key value]
;; implement
)
;; meat & potatoes
(defn evaluate [env expr]
(case (first expr)
;; evaluate a variable
variable (let [[_ identifier] expr]
(env-get env identifier))
;; evaluate a lambda
lambda (let [[_ parameter body] expr]
(fn [argument] (evaluate (env-set env parameter argument) body)))
;; evaluate an application
;; this is strict because the argument is evaluated first before being given to the evaluated proc
application (let [[_ proc argument] expr]
((evaluate env proc) (evaluate env argument)))
;; bad expression given
(throw (ex-info "invalid expression" {:expr expr}))))
(evaluate (empty-env)
;; ((λx.x) y)
(application (lambda 'x (variable 'x)) (variable 'y))) ;; should be 'y
* or it could throw an error for unbound identifier 'y; your choice
This is probably straightforward, but I just can't get over it.
I have a data structure that is a nested map, like this:
(def m {:1 {:1 2 :2 5 :3 10} :2 {:1 2 :2 50 :3 25} :3 {:1 42 :2 23 :3 4}})
I need to set every m[i][i]=0. This is simple in non-functional languages, but I cant make it work on Clojure. How is the idiomatic way to do so, considering that I do have a vector with every possible value? (let's call it v)
doing (map #(def m (assoc-in m [% %] 0)) v) will work, but using def inside a function on map doesn't seems right.
Making m into an atomic version and using swap! seems better. But not much It also seems to be REALLY slow.
(def am (atom m))
(map #(swap! am assoc-in[% %] 0) v)
What is the best/right way to do that?
UPDATE
Some great answers over here. I've posted a follow-up question here Clojure: iterate over map of sets that is close-related, but no so much, to this one.
You're right that it's bad form to use def inside a function. It's also bad form to use functions with side-effects (such as swap) inside map. Furthemore, map is lazy, so your map/swap attempt won't actually do anything unless it is forced with, e.g., dorun.
Now that we've covered what not to do, let's take a look at how to do it ;-).
There are several approaches you could take. Perhaps the easiest for someone coming from an imperative paradigm to start with is loop:
(defn update-m [m v]
(loop [v' v
m' m]
(if (empty? v')
;; then (we're done => return result)
m'
;; else (more to go => process next element)
(let [i (first v')]
(recur (rest v') ;; iterate over v
(assoc-in m' [i i] 0)))))) ;; accumulate result in m'
However, loop is a relatively low-level construct within the functional paradigm. Here we can note a pattern: we are looping over the elements of v and accumulating changes in m'. This pattern is captured by the reduce function:
(defn update-m [m v]
(reduce (fn [m' i]
(assoc-in m' [i i] 0)) ;; accumulate changes
m ;; initial-value
v)) ;; collection to loop over
This form is quite a bit shorter, because it doesn't need the boiler-plate code the loop form requires. The reduce form might not be as easy to read at first, but once you get used to functional code, it will become more natural.
Now that we have update-m, we can use it to transform maps in the program at large. For example, we can use it to swap! an atom. Following your example above:
(swap! am update-m v)
Using def inside a function would certainly not be recommended. The OO way to look at the problem is to look inside the data structure and modify the value. The functional way to look at the problem would be to build up a new data structure that represents the old one with values changed. So we could do something like this
(into {} (map (fn [[k v]] [k (assoc v k 0)]) m))
What we're doing here is mapping over m in much the same way you did in your first example. But then instead of modifying m we return a [] tuple of key and value. This list of tuples we can then put back into a map.
The other answers are fine, but for completeness here's a slightly shorter version using a for comprehension. I find it more readable but it's a matter of taste:
(into {} (for [[k v] m] {k (assoc v k 0)}))
The classic book The Little Lisper (The Little Schemer) is founded on two big ideas
You can solve most problems in a recursive way (instead of using loops) (assuming you have Tail Call Optimisation)
Lisp is great because it is easy to implement in itself.
Now one might think this holds true for all Lispy languages (including Clojure). The trouble is, the book is an artefact of its time (1989), probably before Functional Programming with Higher Order Functions (HOFs) was what we have today.(Or was at least considered palatable for undergraduates).
The benefit of recursion (at least in part) is the ease of traversal of nested data structures like ('a 'b ('c ('d 'e))).
For example:
(def leftmost
(fn [l]
(println "(leftmost " l)
(println (non-atom? l))
(cond
(null? l) '()
(non-atom? (first l)) (leftmost (first l))
true (first l))))
Now with Functional Zippers - we have a non-recursive approach to traversing nested data structures, and can traverse them as we would any lazy data structure. For example:
(defn map-zipper [m]
(zip/zipper
(fn [x] (or (map? x) (map? (nth x 1))))
(fn [x] (seq (if (map? x) x (nth x 1))))
(fn [x children]
(if (map? x)
(into {} children)
(assoc x 1 (into {} children))))
m))
(def m {:a 3 :b {:x true :y false} :c 4})
(-> (map-zipper m) zip/down zip/right zip/node)
;;=> [:b {:y false, :x true}]
Now it seems you can solve any nested list traversal problem with either:
a zipper as above, or
a zipper that walks the structure and returns a set of keys that will let you modify the structure using assoc.
Assumptions:
I'm assuming of course data structures that fixed-size, and fully known prior to traversal
I'm excluding the streaming data source scenario.
My question is: Is recursion a smell (in idiomatic Clojure) because of of zippers and HOFs?
I would say that, yes, if you are doing manual recursion you should at least reconsider whether you need to. But I wouldn't say that zippers have anything to do with this. My experience with zippers has been that they are of theoretical use, and are very exciting to Clojure newcomers, but of little practical value once you get the hang of things, because the situations in which they are useful are vanishingly rare.
It's really because of higher-order functions that have already implemented the common recursive patterns for you that manual recursion is uncommon. However, it's certainly not the case that you should never use manual recursion: it's just a warning sign, suggesting you might be able to do something else. I can't even recall a situation in my four years of using Clojure that I've actually needed a zipper, but I end up using recursion fairly often.
Clojure idioms discourage explicit recursion because the call stack is limited: usually to about 10K deep. Amending the first of Halloway & Bedra's Six Rules of Clojure Functional Programming (Programming Clojure (p 89)),
Avoid unbounded recursion. The JVM cannot optimize recursive calls and
Clojure programs that recurse without bound will blow their stack.
There are a couple of palliatives:
recur deals with tail recursion.
Lazy sequences can turn a deep call stack into a shallow call stack
across an unfolding data structure. Many HOFs in the sequence
library, such as map and filter, do this.
Consider a function with the following signature:
(defn make-widget [& {:keys [x y] :or {x 10 y 20}}]
...)
What is the best way to pass a map to the function, e.g.:
(make-widget {:x 100})
or
(make-widget {:y 200 :x 0})
What I have currently thought of is via vec, flatten and apply e.g.:
(apply make-widget (flatten (vec ({:x 100}))
I strongly believe there is a better way to do this. Can you please consider one?
I can't think of a more elegant way, either, though it seems to me to that there should be one (like a map-specific variant of apply).
Using flatten has problems beyond not being very elegant, though. If the values of your map are collections, flatten will work recursively on those, too, so things could get totally mixed up. This alternative avoids that problem:
(apply make-widget (apply concat {:x 100}))
There is also a known (not invented by me at least), function "mapply":
(defn mapply [f & args] (apply f (apply concat (butlast args) (last args))))
which can be applied like
(mapply your-function {:your "map"})
As to why is this language-specific functionality absent from Clojure core, being implemented more natively and elegantly, no one could ever give me a clear answer.
UPDATE
After I have spent much time programming in Clojure, I personally tend to refrain from creating functions that accept a {} as a vararg. Although at first this can seem appealing, in reality experience proves that passing an explicit {} is always better for many reasons.
You can use:
(apply make-widget (mapcat identity {:x 200 :y 0}))