I'm trying to update values in a structure consisting of nested maps and sequences, but update-in won't work because I want to allow wildcards. My manual approach led me to ugly, big, nested for and into {} calls. I ended up making a function that takes the structure, a selector-like sequence, and an update function.
(defn update-each-in
([o [head & tail :as path] f]
(update-each-in o path f []))
([o [head & tail :as path] f current-path]
(cond
(empty? path) (f o current-path)
(identical? * head)
(cond
(map? o)
(into {} (for [[k v] o]
[k (update-each-in v tail f (conj current-path k))]))
:else (for [[i v] (map-indexed vector o)]
(update-each-in v tail f (conj current-path i))))
:else (assoc o head
(update-each-in (get o head) tail f (conj current-path head))))))
This allows me to simplify my updates to the following
(def sample {"TR" [{:geometry {:ID12 {:buffer 22}}}
{:geometry {:ID13 {:buffer 33}
:ID14 {:buffer 55}}}
{:geometry {:ID13 {:buffer 44}}}]
"BR" [{:geometry {:ID13 {:buffer 22}
:ID18 {:buffer 11}}}
{:geometry {:ID13 {:buffer 33}}}
{:geometry {:ID13 {:buffer 44}}}]})
(update-each-in sample [* * :geometry * :buffer]
(fn [buf path] (inc buf)))
Obviously this has a stack overflow problem with deeply nested structures; although I'm far from hitting that one, it'd be nice to have a robust solution. Can anyone suggest a simpler/faster/more elegant solution? Could this be done with reducers/transducers?
UPDATE It's a requirement that the updating function also gets the full path to the value it's updating.
update-in has exactly the same signature as the function you created, and it does almost exactly the same thing. There are two differences: it doesn't allow wildcards in the "path," and it doesn't pass intermediary paths to the update function.
Adding wildcards to update-in
I've adapted this from the source code for update-in.
(defn update-in-*
[m [k & ks] f & args]
(if (identical? k *)
(let [idx (if (map? m) (keys m) (range (count m)))]
(if ks
(reduce #(assoc % %2 (apply update-in-* (get % %2) ks f args))
m
idx)
(reduce #(assoc % %2 (apply f (get % %2) args))
m
idx)))
(if ks
(assoc m k (apply update-in-* (get m k) ks f args))
(assoc m k (apply f (get m k) args)))))
Now these two lines produce the same result:
(update-in-* sample [* * :geometry * :buffer] (fn [buf] (inc buf)))
(update-each-in sample [* * :geometry * :buffer] (fn [buf path] (inc buf)))
The change I made to update-in is just by branching on a check for the wildcard. If the wildcard is encountered, then every child-node at that level must be modified. I used reduce to keep the cumulative updates to the collection.
Also, another remark, in the interests of robustness: I'd try to use something other than * for the wildcard. It could possibly occur as the key in a map.
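As a rough sketch of that last remark, the wildcard could be a namespaced keyword instead of the * function. The names wildcard and update-in-wild below are hypothetical, made up for this illustration, and the helper is only a compact restatement of the function above:

```clojure
;; Hypothetical sentinel: a namespaced keyword is unlikely to clash with real map keys.
(def wildcard ::any)

(defn update-in-wild
  "Like update-in, but `wildcard` in the path matches every key or index."
  [m [k & ks] f & args]
  (let [step (fn [child]
               (if ks
                 (apply update-in-wild child ks f args) ; keep descending
                 (apply f child args)))]                ; end of path: update
    (if (= k wildcard)
      (reduce (fn [acc i] (assoc acc i (step (get acc i))))
              m
              (if (map? m) (keys m) (range (count m))))
      (assoc m k (step (get m k))))))

(update-in-wild {:a [{:n 1} {:n 2}]} [:a wildcard :n] inc)
;; => {:a [{:n 2} {:n 3}]}
```

Like the version above, this assumes sequential levels are vectors, since assoc by index does not work on lists or lazy seqs.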
Adding path-tracking to update-in
If it is required that the updating function receive the full path, then I would just modify update-in one more time. The function signature changes and (conj p k) gets added, but that's about it.
;; The helper is defined first so that update-in-* can refer to it.
(defn- update-in-*-with-path
  [p m [k & ks] f & args]
  (if (identical? k *)
    (let [idx (if (map? m) (keys m) (range (count m)))]
      (if ks
        ;; record the actual key/index (%2) in the path, not the wildcard
        (reduce #(assoc % %2 (apply update-in-*-with-path (conj p %2) (get % %2) ks f args))
                m
                idx)
        (reduce #(assoc % %2 (apply f (conj p %2) (get % %2) args))
                m
                idx)))
    (if ks
      (assoc m k (apply update-in-*-with-path (conj p k) (get m k) ks f args))
      (assoc m k (apply f (conj p k) (get m k) args)))))
(defn update-in-*
  [m ks f & args]
  (apply update-in-*-with-path [] m ks f args))
Now these two lines produce the same result:
(update-in-* sample [* * :geometry * :buffer] (fn [path val] (inc val)))
(update-each-in sample [* * :geometry * :buffer] (fn [buf path] (inc buf)))
Is this better than your original solution? I don't know. I like it because it is modeled after update-in, and other people have probably put more careful thought into update-in than I care to myself.
Related
Consider this simple-minded recursive implementation of comp in Clojure:
(defn my-comp
([f]
(fn [& args]
(apply f args)))
([f & funcs]
(fn [& args]
(f (apply (apply my-comp funcs) args)))))
The right way to do this, I am told, is using recur, but I am unsure how recur works. In particular: is there a way to coax the code above into being recurable?
evaluation 1
First let's visualize the problem. my-comp as it is written in the question will create a deep stack of function calls, each waiting on the stack to resolve, blocked until the deepest call returns -
((my-comp inc inc inc) 1)
((fn [& args]
(inc (apply (apply my-comp '(inc inc)) args))) 1)
(inc (apply (fn [& args]
(inc (apply (apply my-comp '(inc)) args))) '(1)))
(inc (inc (apply (apply my-comp '(inc)) '(1))))
(inc (inc (apply (fn [& args]
(apply inc args)) '(1))))
(inc (inc (apply inc '(1)))) ; ⚠️ deep in the hole we go...
(inc (inc 2))
(inc 3)
4
tail-recursive my-comp
Rather than creating a long sequence of functions, this my-comp is refactored to return a single function, which when called, runs a loop over the supplied input functions -
(defn my-comp [& fs]
(fn [init]
(loop [acc init [f & more] fs]
(if (nil? f)
acc
(recur (f acc) more))))) ; 🐍 tail recursion
((my-comp inc inc inc) 1)
;; 4
((apply my-comp (repeat 1000000 inc)) 1)
;; 1000001
evaluation 2
With my-comp rewritten to use loop and recur, we can see linear iterative evaluation of the composition -
((my-comp inc inc inc) 1)
(loop 1 (list inc inc inc))
(loop 2 (list inc inc))
(loop 3 (list inc))
(loop 4 nil)
4
multiple input args
Did you notice ten (10) apply calls at the beginning of this post? This is all in service to support multiple arguments for the first function in the my-comp sequence. It is a mistake to tangle this complexity with my-comp itself. The caller has control to do this if it is the desired behavior.
Without any additional changes to the refactored my-comp -
((my-comp #(apply * %) inc inc inc) '(3 4)) ; ✅ multiple input args
Which evaluates as -
(loop '(3 4) (list #(apply * %) inc inc inc))
(loop 12 (list inc inc inc))
(loop 13 (list inc inc))
(loop 14 (list inc))
(loop 15 nil)
15
right-to-left order
Above (my-comp a b c) will apply a first, then b, and finally c. If you want to reverse that order, a naive solution would be to call reverse at the loop call site -
(defn my-comp [& fs]
(fn [init]
(loop [acc init [f & more] (reverse fs)] ; ⚠️ naive
(if (nil? f)
acc
(recur (f acc) more)))))
Each time the returned function is called, (reverse fs) will be recomputed. To avoid this, use a let binding to compute the reversal just once -
(defn my-comp [& fs]
(let [fs (reverse fs)] ; ✅ reverse once
(fn [init]
(loop [acc init [f & more] fs]
(if (nil? f)
acc
(recur (f acc) more))))))
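As a quick sanity check of the right-to-left variant, here is a minimal sketch that compares it with clojure.core/comp. The name my-comp-rtl is hypothetical, and reduce is swapped in for the explicit loop purely for brevity:

```clojure
;; Hypothetical name; same idea as above: reverse once, then fold left.
(defn my-comp-rtl [& fs]
  (let [fs (reverse fs)]
    (fn [init]
      (reduce (fn [acc f] (f acc)) init fs))))

((my-comp-rtl str inc) 1)
;; => "2"   (inc runs first, then str)

(= ((my-comp-rtl str inc) 1)
   ((comp str inc) 1))
;; => true
```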
a way to do this is to rearrange this code so that recur passes an intermediate function back up to the definition.
the model would be something like this:
(my-comp #(* 10 %) - +)
(my-comp (fn [& args] (#(* 10 %) (apply - args)))
+)
(my-comp (fn [& args]
((fn [& args] (#(* 10 %) (apply - args)))
(apply + args))))
the last my-comp would use the first my-comp overload (which is (my-comp [f]))
here's how it could look:
(defn my-comp
([f] f)
([f & funcs]
(if (seq funcs)
(recur (fn [& args]
(f (apply (first funcs) args)))
(rest funcs))
(my-comp f))))
notice that although recur can't be used with apply, in a variadic arity it still accepts the rest params passed as a single sequence.
user> ((my-comp (partial repeat 3) #(* 10 %) - +) 1 2 3)
;;=> (-60 -60 -60)
notice, though, that in practice this implementation isn't really better than yours: while recur saves you from stack overflow on function creation, it would still overflow on application (somebody, correct me if i'm wrong):
(apply my-comp (repeat 1000000 inc)) ;; ok
((apply my-comp (repeat 1000000 inc)) 1) ;; stack overflow
so it would probably be better to use reduce or something else:
(defn my-comp-reduce [f & fs]
(let [[f & fs] (reverse (cons f fs))]
(fn [& args]
(reduce (fn [acc curr-f] (curr-f acc))
(apply f args)
fs))))
user> ((my-comp-reduce (partial repeat 3) #(* 10 %) - +) 1 2 3)
;;=> (-60 -60 -60)
user> ((apply my-comp-reduce (repeat 1000000 inc)) 1)
;;=> 1000001
There is already a good answer above, but I think the original suggestion to use recur may have been thinking of a more manual accumulation of the result. In case you haven't seen it, reduce is just a very specific usage of loop/recur:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(defn my-reduce
[step-fn init-val data-vec]
(loop [accum init-val
data data-vec]
(if (empty? data)
accum
(let [accum-next (step-fn accum (first data))
data-next (rest data)]
(recur accum-next data-next)))))
(dotest
(is= 10 (my-reduce + 0 (range 5))) ; 0..4
(is= 120 (my-reduce * 1 (range 1 6)))) ; 1..5
In general, there can be any number of loop variables (not just 2 like for reduce). Using loop/recur gives you a more "functional" way of looping with accumulated state instead of using an atom and a doseq or something. As the name suggests, from the outside the effect is quite similar to a normal recursion w/o any stack size limits (i.e. tail-call optimization).
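To illustrate the point about more than two loop variables, here is a minimal sketch with three. The function name fib-iter is made up for this example:

```clojure
;; Hypothetical example: three loop variables (counter plus two accumulators).
(defn fib-iter [n]
  (loop [i 0
         a 0    ; fib(i)
         b 1]   ; fib(i+1)
    (if (= i n)
      a
      (recur (inc i) b (+' a b)))))

(fib-iter 10)
;; => 55
```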
P.S. As this example shows, I like to use a let form to very explicitly name the values being generated for the next iteration.
P.P.S. While the compiler will allow you to type the following w/o confusion:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(defn my-reduce
[step-fn accum data]
(loop [accum accum
data data]
...))
it can be a bit confusing and/or sloppy to re-use variable names (esp. for people new to Clojure or your particular program).
Also
I would be remiss if I didn't point out that the function definition itself can be a recur target (i.e. you don't need to use loop). Consider this version of the factorial:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(defn fact-impl
[cum x]
(if (= x 1)
cum
(let [cum-next (* cum x)
x-next (dec x)]
(recur cum-next x-next))))
(defn fact [x] (fact-impl 1 x))
(dotest
(is= 6 (fact 3))
(is= 120 (fact 5)))
In clojure, I would like to write a tail-recursive function that memoizes its intermediate results for subsequent calls.
[EDIT: this question has been rewritten using gcd as an example instead of factorial.]
The memoized gcd (greatest common divisor) could be implemented like this:
(def gcd (memoize (fn [a b]
                    (if (zero? b)
                      a
                      (recur b (mod a b))))))
In this implementation, intermediate results are not memoized for subsequent calls. For example, in order to calculate gcd(9,6), gcd(6,3) is called as an intermediate result. However, gcd(6,3) is not stored in the cache of the memoized function because the recursion point of recur is the anonymous function that is not memoized.
Therefore, if after having called gcd(9,6), we call gcd(6,3) we won't benefit from the memoization.
The only solution I can think of is to use mundane recursion (explicitly call gcd instead of recur), but then we will not benefit from Tail Call Optimization.
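For reference, a minimal sketch of that mundane-recursion variant. The call counter and the name gcd-memo are hypothetical, added only to make the caching visible:

```clojure
;; Hypothetical counter, used only to observe how often the body runs.
(def calls (atom 0))

(def gcd-memo
  (memoize (fn [a b]
             (swap! calls inc)
             (if (zero? b)
               a
               (gcd-memo b (mod a b)))))) ; goes through the memoized var, no TCO

(gcd-memo 9 6) ;; => 3, body runs for (9 6), (6 3) and (3 0)
(gcd-memo 6 3) ;; => 3, served from the cache
@calls         ;; => 3
```

Each intermediate call is cached, but every level of the recursion consumes a stack frame.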
Bottom Line
Is there a way to achieve both:
Tail call optimization
Memoization of intermediate results for subsequent calls
Remarks
This question is similar to Combine memoization and tail-recursion. But all the answers there are related to F#. Here, I am looking for an answer in clojure.
This question has been left as an exercise for the reader by The Joy of Clojure (chap 12.4). You can consult the relevant page of the book at http://bit.ly/HkQrio.
in your case it's hard to show memoize do anything with factorial because the intermediate calls are unique, so I'll rewrite a somewhat contrived example assuming the point is to explore ways to avoid blowing the stack:
(defn stack-popper [n i]
(if (< i n) (* i (stack-popper n (inc i))) 1))
which can then get something out of a memoize:
(def stack-popper
(memoize (fn [n i] (if (< i n) (* i (stack-popper n (inc i))) 1))))
the general approaches to not blowing the stack are:
use tail calls
(def stack-popper
(memoize (fn [n acc] (if (> n 1) (recur (dec n) (* acc (dec n))) acc))))
use trampolines
(def stack-popper
(memoize (fn [n acc]
(if (> n 1) #(stack-popper (dec n) (* acc (dec n))) acc))))
(trampoline (stack-popper 4 1))
use a lazy sequence
(reduce * (range 1 4))
None of these work all the time, though I have yet to hit a case where none of them work. I almost always go for the lazy ones first because I find them to be most clojure like, then I head for tail calling with recur or trampolines.
(defmacro memofn
  [name args & body]
  `(let [cache# (atom {})]
     (fn ~name [& args#]
       (let [update-cache!# (fn update-cache!# [state# args#]
                              (if-not (contains? state# args#)
                                (assoc state# args#
                                       (delay
                                         (let [~args args#]
                                           ~@body)))
                                state#))]
         (let [state# (swap! cache# update-cache!# args#)]
           (-> state# (get args#) deref))))))
This will allow a recursive definition of a memoized function, which also caches intermediate results. Usage:
(def fib (memofn fib [n]
(case n
1 1
0 1
(+ (fib (dec n)) (fib (- n 2))))))
(def gcd
  (let [cache (atom {})]
    (fn [a b]
      @(or (@cache [a b])
           (let [p (promise)]
             (deliver p
               (loop [a a b b]
                 (if-let [p2 (@cache [a b])]
                   @p2
                   (do
                     (swap! cache assoc [a b] p)
                     (if (zero? b)
                       a
                       (recur b (mod a b))))))))))))
There are some concurrency issues (double evaluation, the same problem as with memoize, but worse because of the promises) which may be fixed using @kotarak's advice.
Turning the above code into a macro is left as an exercise to the reader. (Fogus's note was imo tongue-in-cheek.)
Turning this into a macro is really a simple exercise in macrology; note that the body (the last 3 lines) remains unchanged.
Using Clojure's recur you can write factorial using an accumulator that has no stack growth, and just memoize it:
(defn fact
([n]
(fact n 1))
([n acc]
(if (= 1 n)
acc
(recur (dec n)
(* n acc)))))
This is a factorial function implemented with anonymous recursion, with tail calls and memoization of intermediate results. The memoization is integrated with the function, and a reference to a shared buffer (implemented using the Atom reference type) is passed via a lexical closure.
Since the factorial function operates on natural numbers and the arguments for successive results are incremental, a Vector seems a better-suited data structure for storing buffered results.
Instead of passing the result of a previous computation as an argument (accumulator) we're getting it from the buffer.
(def !                                          ; global variable referring to a function
  (let [m (atom [1 1 2 6 24])]                  ; buffer of results
    (fn [n]                                     ; factorial function definition
      (let [m-count (count @m)]                 ; number of results in a buffer
        (if (< n m-count)                       ; do we have buffered result for n?
          (nth @m n)                            ; · yes: return it
          (loop [cur m-count]                   ; · no: compute it recursively
            (let [r (*' (nth @m (dec cur)) cur)] ; new result
              (swap! m assoc cur r)             ; store the result
              (if (= n cur)                     ; termination condition:
                r                               ; · base case
                (recur (inc cur))))))))))       ; · recursive case
(time (do (! 8000) nil)) ; => "Elapsed time: 154.280516 msecs"
(time (do (! 8001) nil)) ; => "Elapsed time: 0.100222 msecs"
(time (do (! 7999) nil)) ; => "Elapsed time: 0.090444 msecs"
(time (do (! 7999) nil)) ; => "Elapsed time: 0.055873 msecs"
I'm trying to write a function that returns a memoized recursive function in Clojure, but I'm having trouble making the recursive function see its own memoized bindings. Is this because there is no var created? Also, why can't I use memoize on the local binding created with let?
This slightly unusual Fibonacci sequence maker that starts at a particular number is an example of what I wish I could do:
(defn make-fibo [y]
(memoize (fn fib [x] (if (< x 2)
y
(+ (fib (- x 1))
(fib (- x 2)))))))
(let [f (make-fibo 1)]
(f 35)) ;; SLOW, not actually memoized
Using with-local-vars seems like the right approach, but it doesn't work for me either. I guess I can't close over vars?
(defn make-fibo [y]
(with-local-vars [fib (fn [x] (if (< x 2)
y
(+ (@fib (- x 1))
(@fib (- x 2)))))]
(memoize fib)))
(let [f (make-fibo 1)]
(f 35)) ;; Var null/null is unbound!?!
I could of course manually write a macro that creates a closed-over atom and manage the memoization myself, but I was hoping to do this without such hackery.
There is an interesting way to do it that relies neither on rebinding nor on the behavior of def. The main trick is to get around the limitations of recursion by passing a function as an argument to itself:
(defn make-fibo [y]
(let
[fib
(fn [mem-fib x]
(let [fib (fn [a] (mem-fib mem-fib a))]
(if (<= x 2)
y
(+ (fib (- x 1)) (fib (- x 2))))))
mem-fib (memoize fib)]
(partial mem-fib mem-fib)))
Then:
> ((make-fibo 1) 50)
12586269025
What happens here:
The fib recursive function got a new argument mem-fib. This will be the memoized version of fib itself, once it gets defined.
The fib body is wrapped in a let form that redefines calls to fib so that they pass the mem-fib down to next levels of recursion.
mem-fib is defined as memoized fib
... and will be passed by partial as the first argument to itself to start the above mechanism.
This trick is similar to the one used by the Y combinator to calculate a function's fixed point in the absence of a built-in recursion mechanism.
Given that def "sees" the symbol being defined, there is little practical reason to go this way, except maybe for creating anonymous in-place recursive memoized functions.
This seems to work:
(defn make-fibo [y]
(with-local-vars
[fib (memoize
(fn [x]
(if (< x 2)
y
(+ (fib (- x 2)) (fib (dec x))))))]
(.bindRoot fib @fib)
@fib))
with-local-vars only provides thread-local bindings for the newly created Vars, which are popped once execution leaves the with-local-vars form; hence the need for .bindRoot.
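For comparison, here is a minimal sketch of the closed-over atom approach that the question mentions as a fallback. It avoids Vars entirely; the name make-fibo-atom is hypothetical:

```clojure
;; Hypothetical alternative: a named fn that consults a cache held in a closure.
(defn make-fibo-atom [y]
  (let [cache (atom {})]
    (fn fib [x]
      (if-let [e (find @cache x)]
        (val e)                                ; cached result
        (let [r (if (< x 2)
                  y
                  (+ (fib (- x 1)) (fib (- x 2))))]
          (swap! cache assoc x r)              ; remember intermediate results too
          r)))))

((make-fibo-atom 1) 35)
;; => 14930352
```

Each function returned by make-fibo-atom has its own cache, and the recursive calls go through that cache because fib is the caching function itself.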
(def fib (memoize (fn [x] (if (< x 2)
x
(+ (fib (- x 1))
(fib (- x 2)))))))
(time (fib 35))
Here is the simplest solution:
(def fibo
(memoize (fn [n]
(if (< n 2)
n
(+ (fibo (dec n))
(fibo (dec (dec n))))))))
You can encapsulate the recursive memoized function pattern in a macro if you plan to use it several times.
(defmacro defmemo
[name & fdecl]
`(def ~name
(memoize (fn ~fdecl))))
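A usage sketch for the macro, repeated here so the snippet stands alone. The function name fibo-m is made up for the example:

```clojure
(defmacro defmemo
  [name & fdecl]
  `(def ~name
     (memoize (fn ~fdecl))))

;; Hypothetical usage: the recursive calls resolve to the memoized var.
(defmemo fibo-m [n]
  (if (< n 2)
    n
    (+ (fibo-m (dec n))
       (fibo-m (- n 2)))))

(fibo-m 50)
;; => 12586269025
```

Because (fn ~fdecl) wraps the whole declaration in a single arity form, the expansion is (fn ([n] ...)), which Clojure accepts.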
Here's a cross between the Y-combinator and Clojure's memoize:
(defn Y-mem [f]
(let [mem (atom {})]
(#(% %)
(fn [x]
(f #(if-let [e (find @mem %&)]
(val e)
(let [ret (apply (x x) %&)]
(swap! mem assoc %& ret)
ret))))))))
You can macrosugar this up:
(defmacro defrecfn [name args & body]
`(def ~name
(Y-mem (fn [foo#]
(fn ~args (let [~name foo#] ~@body))))))
Now for using it:
(defrecfn fib [n]
(if (<= n 1)
n
(+' (fib (- n 1))
(fib (- n 2)))))
user=> (time (fib 200))
"Elapsed time: 0.839868 msecs"
280571172992510140037611932413038677189525N
Or the Levenshtein distance:
(defrecfn edit-dist [s1 s2]
(cond (empty? s1) (count s2)
(empty? s2) (count s1)
:else (min (inc (edit-dist s1 (butlast s2)))
(inc (edit-dist (butlast s1) s2))
((if (= (last s1) (last s2)) identity inc)
(edit-dist (butlast s1) (butlast s2))))))
Your first version actually works, but you're not getting all the benefits of memoization because you're only running through the algorithm once.
Try this:
user> (time (let [f (make-fibo 1)]
(f 35)))
"Elapsed time: 1317.64842 msecs"
14930352
user> (time (let [f (make-fibo 1)]
[(f 35) (f 35)]))
"Elapsed time: 1345.585041 msecs"
[14930352 14930352]
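The same effect can be observed without timing, by counting how often the body runs. This is only a sketch: the counter and the name make-counted-fibo are hypothetical additions to the question's first version:

```clojure
;; Hypothetical counter to observe how often the function body executes.
(def body-runs (atom 0))

(defn make-counted-fibo [y]
  (memoize (fn fib [x]
             (swap! body-runs inc)
             (if (< x 2)
               y
               (+ (fib (- x 1))     ; calls the raw fn, bypassing the cache
                  (fib (- x 2)))))))

(let [f (make-counted-fibo 1)]
  (f 15)
  (let [after-first @body-runs]
    (f 15)                          ; top-level call is cached
    (= after-first @body-runs)))
;; => true
```

Only the outermost call goes through memoize; the inner calls to fib refer to the unmemoized function, which is why the first call is still slow.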
You can generate memoized recursive functions in Clojure with a variant of the Y combinator. For instance, the code for factorial would be:
(def Ywrap
(fn [wrapper-func f]
((fn [x]
(x x))
(fn [x]
(f (wrapper-func (fn [y]
((x x) y))))))))
(defn memo-wrapper-generator []
(let [hist (atom {})]
(fn [f]
(fn [y]
(if (find @hist y)
(@hist y)
(let [res (f y)]
(swap! hist assoc y res)
res))))))
(def Ymemo
(fn [f]
(Ywrap (memo-wrapper-generator) f)))
(def factorial-gen
(fn [func]
(fn [n]
(println n)
(if (zero? n)
1
(* n (func (dec n)))))))
(def factorial-memo (Ymemo factorial-gen))
This is explained in detail in this article about Y combinator real life application: recursive memoization in clojure.
Doing the Y-Combinator for a single argument function such as factorial or fibonacci in Clojure is well documented:
http://rosettacode.org/wiki/Y_combinator#Clojure
My question is - how do you do it for a two argument function such as this getter for example?
(Assumption here is that I want to solve this problem recursively and this non-idiomatic clojure code is there deliberately for another reason)
[non y-combinator version]
(defn get_ [n lat]
(cond
(empty? lat) ()
(= 0 (- n 1)) (first lat)
true (get_ (- n 1) (rest lat))))
(get_ 3 '(a b c d e f g h i j))
The number of args doesn't change anything since the args are apply'd. You just need to change the structure of get_:
(defn get_ [f]
(fn [n lat]
(cond
(empty? lat) ()
(= 1 n) (first lat)
:else (f (dec n) (next lat)))))
(defn Y [f]
((fn [x] (x x))
(fn [x]
(f (fn [& args]
(apply (x x) args))))))
user=> ((Y get_) 3 '(a b c d e f g h i j))
c
It'd be pretty straightforward.
Say you've got a function H:
(def H
(fn [x]
(fn [x y]
(stuff happens))))
Then you apply the same ol' Y-Combinator:
((Y H) 4 5)
Where 4 and 5 are arguments you want to pass to H.
The combinator is essentially "dealing with" the top-level function in H, not the one that's doing the hard work (the one with arity 2, here).
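To make that concrete, here is a sketch with an actual two-argument H. The variadic Y is the one from the previous answer, and the Euclid-style body is just a stand-in for "stuff happens":

```clojure
(defn Y [f]
  ((fn [x] (x x))
   (fn [x]
     (f (fn [& args]
          (apply (x x) args))))))

;; Hypothetical H: the outer fn receives the recursive reference (here `self`),
;; the inner two-argument fn does the actual work.
(def H
  (fn [self]
    (fn [a b]
      (if (zero? b)
        a
        (self b (mod a b))))))

((Y H) 12 18)
;; => 6
```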
What function can I put as FOO here to yield true at the end? I played with hash-set (only correct for first 2 values), conj, and concat but I know I'm not handling the single-element vs set condition properly with just any of those.
(defn mergeMatches [propertyMapList]
"Take a list of maps and merges them combining values into a set"
(reduce #(merge-with FOO %1 %2) {} propertyMapList))
(def in
(list
{:a 1}
{:a 2}
{:a 3}
{:b 4}
{:b 5}
{:b 6} ))
(def out
{ :a #{ 1 2 3}
:b #{ 4 5 6} })
; this should return true
(= (mergeMatches in) out)
What is the most idiomatic way to handle this?
This'll do:
(let [set #(if (set? %) % #{%})]
#(clojure.set/union (set %) (set %2)))
Rewritten more directly for the example (Alex):
(defn to-set [s]
(if (set? s) s #{s}))
(defn set-union [s1 s2]
(clojure.set/union (to-set s1) (to-set s2)))
(defn mergeMatches [propertyMapList]
(reduce #(merge-with set-union %1 %2) {} propertyMapList))
I didn't write this but it was contributed by @amitrathore on Twitter:
(defn kv [bag [k v]]
(update-in bag [k] conj v))
(defn mergeMatches [propertyMapList]
(reduce #(reduce kv %1 %2) {} propertyMapList))
I wouldn't use merge-with for this,
(defn fnil [f not-found]
(fn [x y] (f (if (nil? x) not-found x) y)))
(defn conj-in [m map-entry]
(update-in m [(key map-entry)] (fnil conj #{}) (val map-entry)))
(defn merge-matches [property-map-list]
(reduce conj-in {} (apply concat property-map-list)))
user=> (merge-matches in)
{:b #{4 5 6}, :a #{1 2 3}}
fnil has since been added to clojure.core (in Clojure 1.2), so you can ignore the implementation... but it just creates a version of another function that can handle a nil first argument. In this case conj receives #{} in place of nil.
So the reduction conjoins every key/value pair in the supplied list of maps into a set for its key.
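A few REPL-style lines showing the nil substitution with clojure.core/fnil (so the hand-written fnil above is not needed here):

```clojure
;; fnil wraps conj so that a nil first argument is replaced by #{}.
((fnil conj #{}) nil 1)
;; => #{1}

((fnil conj #{}) #{1} 2)
;; => #{1 2}

;; update-in passes nil for a missing key, which is where fnil helps.
(update-in {} [:a] (fnil conj #{}) 1)
;; => {:a #{1}}
```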
Another solution contributed by @wmacgyver on Twitter based on multimaps:
(defn add
"Adds key-value pairs to the multimap."
([mm k v]
(assoc mm k (conj (get mm k #{}) v)))
([mm k v & kvs]
(apply add (add mm k v) kvs)))
(defn mm-merge
"Merges the multimaps, taking the union of values."
[& mms]
(apply (partial merge-with clojure.set/union) mms))
(defn mergeMatches [property-map-list]
(reduce mm-merge (map #(add {} (key (first %)) (val (first %))) property-map-list)))
This seems to work:
(defn FOO [v1 v2]
(if (set? v1)
(apply hash-set v2 v1)
(hash-set v1 v2)))
Not super pretty but it works.
(defn mergeMatches [propertyMapList]
(for [k (set (for [pp propertyMapList] (key (first pp))))]
{k (set (remove nil? (for [pp propertyMapList] (k pp))))}))