Clojure - Recursively Flatten Nested Maps - recursion

Given a map with the key :content, where content is a list of strings or other maps, how can I flatten out the values to receive only the strings?
(flattener {:content '("b" {:content ("c" {:content ("d")})} "e")})
> '("b" "c" "d" "e")
I'm stumbling through very hacky loop recur attempts and now my brain is burnt out. Is there a nice idiomatic way to do this in Clojure?
Thanks.
What I've got is below, and although it works, it's quite ugly
(defn flatten-content
[coll]
(loop [acc '(), l coll]
(let [fst (first l), rst (rest l)]
(cond
(empty? l) (reverse acc)
(seq? fst) (recur acc (concat fst rst))
(associative? fst) (recur acc (concat (:content fst) rst))
:else (recur (conj acc fst) rst)))))

The tree-seq function helps walk, and since your map
(def m {:content '("b" {:content ("c" {:content ("d")})} "e")})
always has a list of "children" keyed by :content, this works
(filter string? (tree-seq associative? :content m))
;=> ("b" "c" "d" "e")

The following recursive function works (and is about 25% faster than a filtered tree-seq approach):
(defn flatten-content [node]
(lazy-seq
(if (string? node)
(list node)
(mapcat flatten-content (:content node)))))

Related

Can I use `recur` in this implementation of function composition in Clojure?

Consider this simple-minded recursive implementation of comp in Clojure:
(defn my-comp
([f]
(fn [& args]
(apply f args)))
([f & funcs]
(fn [& args]
(f (apply (apply my-comp funcs) args)))))
The right way to do this, I am told, is using recur, but I am unsure how recur works. In particular: is there a way to coax the code above into being recurable?
evaluation 1
First let's visualize the problem. my-comp as it is written in the question will create a deep stack of function calls, each waiting on the stack to resolve, blocked until the the deepest call returns -
((my-comp inc inc inc) 1)
((fn [& args]
(inc (apply (apply my-comp '(inc inc)) args))) 1)
(inc (apply (fn [& args]
(inc (apply (apply my-comp '(inc)) args))) '(1)))
(inc (inc (apply (apply my-comp '(inc)) '(1))))
(inc (inc (apply (fn [& args]
(apply inc args)) '(1))))
(inc (inc (apply inc '(1)))) ; ⚠️ deep in the hole we go...
(inc (inc 2))
(inc 3)
4
tail-recursive my-comp
Rather than creating a long sequence of functions, this my-comp is refactored to return a single function, which when called, runs a loop over the supplied input functions -
(defn my-comp [& fs]
(fn [init]
(loop [acc init [f & more] fs]
(if (nil? f)
acc
(recur (f acc) more))))) ; 🐍 tail recursion
((my-comp inc inc inc) 1)
;; 4
((apply my-comp (repeat 1000000 inc)) 1)
;; 1000001
evaluation 2
With my-comp rewritten to use loop and recur, we can see linear iterative evaluation of the composition -
((my-comp inc inc inc) 1)
(loop 1 (list inc inc inc))
(loop 2 (list inc inc))
(loop 3 (list inc))
(loop 4 nil)
4
multiple input args
Did you notice ten (10) apply calls at the beginning of this post? This is all in service to support multiple arguments for the first function in the my-comp sequence. It is a mistake to tangle this complexity with my-comp itself. The caller has control to do this if it is the desired behavior.
Without any additional changes to the refactored my-comp -
((my-comp #(apply * %) inc inc inc) '(3 4)) ; ✅ multiple input args
Which evaluates as -
(loop '(3 4) (list #(apply * %) inc inc inc))
(loop 12 (list inc inc inc))
(loop 13 (list inc inc))
(loop 14 (list inc))
(loop 15 nil)
15
right-to-left order
Above (my-comp a b c) will apply a first, then b, and finally c. If you want to reverse that order, a naive solution would be to call reverse at the loop call site -
(defn my-comp [& fs]
(fn [init]
(loop [acc init [f & more] (reverse fs)] ; ⚠️ naive
(if (nil? f)
acc
(recur (f acc) more)))))
Each time the returned function is called, (reverse fs) will be recomputed. To avoid this, use a let binding to compute the reversal just once -
(defn my-comp [& fs]
(let [fs (reverse fs)] ; ✅ reverse once
(fn [init]
(loop [acc init [f & more] fs]
(if (nil? f)
acc
(recur (f acc) more))))))
a way to do this, is to rearrange this code to pass some intermediate function back up to the definition with recur.
the model would be something like this:
(my-comp #(* 10 %) - +)
(my-comp (fn [& args] (#(* 10 %) (apply - args)))
+)
(my-comp (fn [& args]
((fn [& args] (#(* 10 %) (apply - args)))
(apply + args))))
the last my-comp would use the first my-comp overload (which is (my-comp [f])
here's how it could look like:
(defn my-comp
([f] f)
([f & funcs]
(if (seq funcs)
(recur (fn [& args]
(f (apply (first funcs) args)))
(rest funcs))
(my-comp f))))
notice that despite of not being the possible apply target, the recur form can still accept variadic params being passed as a sequence.
user> ((my-comp (partial repeat 3) #(* 10 %) - +) 1 2 3)
;;=> (-60 -60 -60)
notice, though, that in practice this implementation isn't really better than yours: while recur saves you from stack overflow on function creation, it would still overflow on application (somebody, correct me if i'm wrong):
(apply my-comp (repeat 1000000 inc)) ;; ok
((apply my-comp (repeat 1000000 inc)) 1) ;; stack overflow
so it would probably be better to use reduce or something else:
(defn my-comp-reduce [f & fs]
(let [[f & fs] (reverse (cons f fs))]
(fn [& args]
(reduce (fn [acc curr-f] (curr-f acc))
(apply f args)
fs))))
user> ((my-comp-reduce (partial repeat 3) #(* 10 %) - +) 1 2 3)
;;=> (-60 -60 -60)
user> ((apply my-comp-reduce (repeat 1000000 inc)) 1)
;;=> 1000001
There is already a good answer above, but I think the original suggestion to use recur may have been thinking of a more manual accumulation of the result. In case you haven't seen it, reduce is just a very specific usage of loop/recur:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(defn my-reduce
[step-fn init-val data-vec]
(loop [accum init-val
data data-vec]
(if (empty? data)
accum
(let [accum-next (step-fn accum (first data))
data-next (rest data)]
(recur accum-next data-next)))))
(dotest
(is= 10 (my-reduce + 0 (range 5))) ; 0..4
(is= 120 (my-reduce * 1 (range 1 6))) ; 1..5 )
In general, there can be any number of loop variables (not just 2 like for reduce). Using loop/recur gives you a more "functional" way of looping with accumulated state instead of using and atom and a doseq or something. As the name suggests, from the outside the effect is quite similar to a normal recursion w/o any stack size limits (i.e. tail-call optimization).
P.S. As this example shows, I like to use a let form to very explicitly name the values being generated for the next iteration.
P.P.S. While the compiler will allow you to type the following w/o confusion:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(defn my-reduce
[step-fn accum data]
(loop [accum accum
data data]
...))
it can be a bit confusing and/or sloppy to re-use variable names (esp. for people new to Clojure or your particular program).
Also
I would be remiss if I didn't point out that the function definition itself can be a recur target (i.e. you don't need to use loop). Consider this version of the factorial:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(defn fact-impl
[cum x]
(if (= x 1)
cum
(let [cum-next (* cum x)
x-next (dec x)]
(recur cum-next x-next))))
(defn fact [x] (fact-impl 1 x))
(dotest
(is= 6 (fact 3))
(is= 120 (fact 5)))

depth first tree traversal accumulation in clojure

I'd like to take a tree-like structure like this:
{"foo" {"bar" "1" "baz" "2"}}
and recursively traverse while remembering the path from the root in order to produce something like this:
["foo/bar/1", "foo/baz/2"]
Any suggestions on how this can be done without zippers or clojure.walk?
As nberger does, we separate enumerating the paths from presenting them as strings.
Enumeration
The function
(defn paths [x]
(if (map? x)
(mapcat (fn [[k v]] (map #(cons k %) (paths v))) x)
[[x]]))
... returns the sequence of path-sequences of a nested map. For example,
(paths {"foo" {"bar" "1", "baz" "2"}})
;(("foo" "bar" "1") ("foo" "baz" "2"))
Presentation
The function
#(clojure.string/join \/ %)
... joins strings together with "/"s. For example,
(#(clojure.string/join \/ %) (list "foo" "bar" "1"))
;"foo/bar/1"
Compose these to get the function you want:
(def traverse (comp (partial map #(clojure.string/join \/ %)) paths))
... or simply
(defn traverse [x]
(->> x
paths
(map #(clojure.string/join \/ %))))
For example,
(traverse {"foo" {"bar" "1", "baz" "2"}})
;("foo/bar/1" "foo/baz/2")
You could entwine these as a single function: clearer and more useful to
separate them, I think.
The enumeration is not lazy, so it will run out of
stack space on deeply enough nested maps.
This is my attempt using tree-seq clojure core function.
(def tree {"foo" {"bar" "1" "baz" "2"}})
(defn paths [t]
(let [cf (fn [[k v]]
(if (map? v)
(->> v
(map (fn [[kv vv]]
[(str k "/" kv) vv]))
(into {}))
(str k "/" v)))]
(->> t
(tree-seq map? #(map cf %))
(remove map?)
vec)))
(paths tree) ; => ["foo/bar/1" "foo/baz/2"]
Map keys are used to accumulate paths.
I did something real quick using accumulator, but it isn't depth first.
(defn paths [separator tree]
(let [finished? (fn [[_ v]] ((complement map?) v))]
(loop [finished-paths nil
path-trees (seq tree)]
(let [new-paths (mapcat
(fn [[path children]]
(map
(fn [[k v]]
(vector (str path separator k) v))
children))
path-trees)
finished (->> (filter finished? new-paths)
(map
(fn [[k v]]
(str k separator v)))
(concat finished-paths))
remaining-paths (remove finished? new-paths)]
(if (seq remaining-paths)
(recur finished remaining-paths)
finished)))))
In the repl
(clojure-scratch.core/paths "/" {"foo" {"bar" {"bosh" "1" "bash" "3"} "baz" "2"}})
=> ("foo/baz/2" "foo/bar/bash/3" "foo/bar/bosh/1")
The following uses recursive depth first traversal:
(defn combine [k coll]
(mapv #(str k "/" %) coll))
(defn f-map [m]
(into []
(flatten
(mapv (fn [[k v]]
(if (map? v)
(combine k (f-map v))
(str k "/" v)))
m))))
(f-map {"foo" {"bar" "1" "baz" "2"}})
=> ["foo/bar/1" "foo/baz/2"]
Here's my take:
(defn traverse [t]
(letfn [(traverse- [path t]
(when (seq t)
(let [[x & xs] (seq t)
[k v] x]
(lazy-cat
(if (map? v)
(traverse- (conj path k) v)
[[(conj path k) v]])
(traverse- path xs)))))]
(traverse- [] t)))
(traverse {"foo" {"bar" "1" "baz" "2"}})
;=> [[["foo" "bar"] "1"] [["foo" "baz"] "2"]]
Traverse returns a lazy seq of path-leaf pairs. You can then apply any transformation to each path-leaf, for example to the "/path/to/leaf" fullpath form:
(def ->full-path #(->> (apply conj %) (clojure.string/join "/")))
(->> (traverse {"foo" {"bar" "1" "baz" "2"}})
(map ->full-path))
;=> ("foo/bar/1" "foo/baz/2")
(->> (traverse {"foo" {"bar" {"buzz" 4 "fizz" "fuzz"} "baz" "2"} "faa" "fee"})
(map ->full-path))
;=> ("foo/bar/buzz/4" "foo/bar/fizz/fuzz" "foo/baz/2" "faa/fee")

implementing foreach (doseq) in clojure

i'm working through SICP - one exercise is to implement foreach (doseq). This is an academic exercise. In clojure, this is what I came up with:
(defn for-each [proc, items]
(if (empty? items) nil
(do
(proc (first items))
(recur proc (rest items)))))
but, i'm a little murky about if do is cheating, because do is a special form in clojure and i don't think anything like that has been introduced yet in SICP. is there a more minimalist answer?
Here's another attempt which only executes proc on the last element:
(defn for-each-2 [proc, items]
(let [f (first items)
r (rest items)]
(if (empty? r)
(proc f)
(recur proc r))))
Use doseq and you're all set. For example:
(doseq [e '(1 2 3)]
(prn e))
Will print:
1
2
3
nil
EDIT :
If you want to implement for-each by hand and using as few special forms as possible, here's another alternative, although it ends up being almost as short as yours:
(defn for-each [f l]
(cond (empty? l) nil
:else (do (f (first l))
(recur f (rest l)))))
Interestingly, the same procedure could have been written more succinctly in Scheme, the Lisp dialect used in SICP:
(define (for-each f l)
(cond ((null? l) null)
(else (f (first l))
(for-each f (rest l)))))
Here is my attempt. It just carries function execution in an inner loop.
(defn for-each [fun, xs]
(loop [fun fun
xs xs
action nil]
(if (first xs)
(recur fun (rest xs) (fun (first xs)))
xs)))

Recursion over a list of s-expressions in Clojure

To set some context, I'm in the process of learning Clojure, and Lisp development more generally. On my path to Lisp, I'm currently working through the "Little" series in an effort to solidify a foundation in functional programming and recursive-based solution solving. In "The Little Schemer," I've worked through many of the exercises, however, I'm struggling a bit to convert some of them to Clojure. More specifically, I'm struggling to convert them to use "recur" so as to enable TCO. For example, here is a Clojure-based implementation to the "occurs*" function (from Little Schemer) which counts the number of occurrences of an atom appearing within a list of S-expressions:
(defn atom? [l]
(not (list? l)))
(defn occurs [a lst]
(cond
(empty? lst) 0
(atom? (first lst))
(cond
(= a (first lst)) (inc (occurs a (rest lst)))
true (occurs a (rest lst)))
true (+ (occurs a (first lst))
(occurs a (rest lst)))))
Basically, (occurs 'abc '(abc (def abc) (abc (abc def) (def (((((abc))))))))) will evaluate to 5. The obvious problem is that this definition consumes stack frames and will blow the stack if given a list of S-expressions too deep.
Now, I understand the option of refactoring recursive functions to use an accumulator parameter to enable putting the recursive call into the tail position (to allow for TCO), but I'm struggling if this option is even applicable to situations such as this one.
Here's how far I get if I try to refactor this using "recur" along with using an accumulator parameter:
(defn recur-occurs [a lst]
(letfn [(myoccurs [a lst count]
(cond
(empty? lst) 0
(atom? (first lst))
(cond
(= a (first lst)) (recur a (rest lst) (inc count))
true (recur a (rest lst) count))
true (+ (recur a (first lst) count)
(recur a (rest lst) count))))]
(myoccurs a lst 0)))
So, I feel like I'm almost there, but not quite. The obvious problem is my "else" clause in which the head of the list is not an atom. Conceptually, I want to sum the result of recurring over the first element in the list with the result of recurring over the rest of the list. I'm struggling in my head on how to refactor this such that the recurs can be moved to the tail position.
Are there additional techniques to the "accumulator" pattern to achieving getting your recursive calls put into the tail position that I should be applying here, or, is the issue simply more "fundamental" and that there isn't a clean Clojure-based solution due to the JVM's lack of TCO? If the latter, generally speaking, what should be the general pattern for Clojure programs to use that need to recur over a list of S-expressions? For what it's worth, I've seen the multi method w/lazy-seq technique used (page 151 of Halloway's "Programming Clojure" for reference) to "Replace Recursion with Laziness" - but I'm not sure how to apply that pattern to this example in which I'm not attempting to build a list, but to compute a single integer value.
Thank you in advance for any guidance on this.
Firstly, I must advise you to not worry much about implementation snags like stack overflows as you make your way through The Little Schemer. It is good to be conscientious of issues like the lack of tail call optimization when you're programming in anger, but the main point of the book is to teach you to think recursively. Converting the examples accumulator-passing style is certainly good practice, but it's essentially ditching recursion in favor of iteration.
However, and I must preface this with a spoiler warning, there is a way to keep the same recursive algorithm without being subject to the whims of the JVM stack. We can use continuation-passing style to make our own stack in the form of an extra anonymous function argument k:
(defn occurs-cps [a lst k]
(cond
(empty? lst) (k 0)
(atom? (first lst))
(cond
(= a (first lst)) (occurs-cps a (rest lst)
(fn [v] (k (inc v))))
:else (occurs-cps a (rest lst) k))
:else (occurs-cps a (first lst)
(fn [fst]
(occurs-cps a (rest lst)
(fn [rst] (k (+ fst rst))))))))
Instead of the stack being created implicitly by our non-tail function calls, we bundle up "what's left to do" after each call to occurs, and pass it along as the next continuation k. When we invoke it, we start off with a k that represents nothing left to do, the identity function:
scratch.core=> (occurs-cps 'abc
'(abc (def abc) (abc (abc def) (def (((((abc))))))))
(fn [v] v))
5
I won't go further into the details of how to do CPS, as that's for a later chapter of TLS. However, I will note that this of course doesn't yet work completely:
scratch.core=> (def ls (repeat 20000 'foo))
#'scratch.core/ls
scratch.core=> (occurs-cps 'foo ls (fn [v] v))
java.lang.StackOverflowError (NO_SOURCE_FILE:0)
CPS lets us move all of our non-trivial, stack-building calls to tail position, but in Clojure we need to take the extra step of replacing them with recur:
(defn occurs-cps-recur [a lst k]
(cond
(empty? lst) (k 0)
(atom? (first lst))
(cond
(= a (first lst)) (recur a (rest lst)
(fn [v] (k (inc v))))
:else (recur a (rest lst) k))
:else (recur a (first lst)
(fn [fst]
(recur a (rest lst) ;; Problem
(fn [rst] (k (+ fst rst))))))))
Alas, this goes wrong: java.lang.IllegalArgumentException: Mismatched argument count to recur, expected: 1 args, got: 3 (core.clj:39). The very last recur actually refers to the fn right above it, the one we're using to represent our continuations! We can get good behavior most of the time by changing just that recur to a call to occurs-cps-recur, but pathologically-nested input will still overflow the stack:
scratch.core=> (occurs-cps-recur 'foo ls (fn [v] v))
20000
scratch.core=> (def nested (reduce (fn [onion _] (list onion))
'foo (range 20000)))
#'scratch.core/nested
scratch.core=> (occurs-cps-recur 'foo nested (fn [v] v))
Java.lang.StackOverflowError (NO_SOURCE_FILE:0)
Instead of making the call to occurs-* and expecting it to give back an answer, we can have it return a thunk immediately. When we invoke that thunk, it'll go off and do some work right up until it does a recursive call, which in turn will return another thunk. This is trampolined style, and the function that "bounces" our thunks is trampoline. Returning a thunk each time we make a recursive call bounds our stack size to one call at a time, so our only limit is the heap:
(defn occurs-cps-tramp [a lst k]
(fn []
(cond
(empty? lst) (k 0)
(atom? (first lst))
(cond
(= a (first lst)) (occurs-cps-tramp a (rest lst)
(fn [v] (k (inc v))))
:else (occurs-cps-tramp a (rest lst) k))
:else (occurs-cps-tramp a (first lst)
(fn [fst]
(occurs-cps-tramp a (rest lst)
(fn [rst] (k (+ fst rst)))))))))
(declare done answer)
(defn my-trampoline [th]
(if done
answer
(recur (th))))
(defn empty-k [v]
(set! answer v)
(set! done true))
(defn run []
(binding [done false answer 'whocares]
(my-trampoline (occurs-cps-tramp 'foo nested empty-k))))
;; scratch.core=> (run)
;; 1
Note that Clojure has a built-in trampoline (with some limitations on the return type). Using that instead, we don't need a specialized empty-k:
scratch.core=> (trampoline (occurs-cps-tramp 'foo nested (fn [v] v)))
1
Trampolining is certainly a cool technique, but the prerequisite to trampoline a program is that it must contain only tail calls; CPS is the real star here. It lets you define your algorithm with the clarity of natural recursion, and through correctness-preserving transformations, express it efficiently on any host that has a single loop and a heap.
You can't do this with a fixed amount of memory. You can consume stack, or heap; that's the decision you get to make. If I were writing this in Clojure I would do it with map and reduce rather than with manual recursion:
(defn occurs [x coll]
(if (coll? coll)
(reduce + (map #(occurs x %) coll))
(if (= x coll)
1, 0)))
Note that shorter solutions exist if you use tree-seq or flatten, but at that point most of the problem is gone so there's not much to learn.
Edit
Here's a version that doesn't use any stack, instead letting its queue get larger and larger (using up heap).
(defn heap-occurs [item coll]
(loop [count 0, queue coll]
(if-let [[x & xs] (seq queue)]
(if (coll? x)
(recur count (concat x xs))
(recur (+ (if (= item x) 1, 0)
count)
xs))
count)))

Merge list of maps and combine values to sets in Clojure

What function can I put as FOO here to yield true at the end? I played with hash-set (only correct for first 2 values), conj, and concat but I know I'm not handling the single-element vs set condition properly with just any of those.
(defn mergeMatches [propertyMapList]
"Take a list of maps and merges them combining values into a set"
(reduce #(merge-with FOO %1 %2) {} propertyMapList))
(def in
(list
{:a 1}
{:a 2}
{:a 3}
{:b 4}
{:b 5}
{:b 6} ))
(def out
{ :a #{ 1 2 3}
:b #{ 4 5 6} })
; this should return true
(= (mergeMatches in) out)
What is the most idiomatic way to handle this?
This'll do:
(let [set #(if (set? %) % #{%})]
#(clojure.set/union (set %) (set %2)))
Rewritten more directly for the example (Alex):
(defn to-set [s]
(if (set? s) s #{s}))
(defn set-union [s1 s2]
(clojure.set/union (to-set s1) (to-set s2)))
(defn mergeMatches [propertyMapList]
(reduce #(merge-with set-union %1 %2) {} propertyMapList))
I didn't write this but it was contributed by #amitrathore on Twitter:
(defn kv [bag [k v]]
(update-in bag [k] conj v))
(defn mergeMatches [propertyMapList]
(reduce #(reduce kv %1 %2) {} propertyMapList))
I wouldn't use merge-with for this,
(defn fnil [f not-found]
(fn [x y] (f (if (nil? x) not-found x) y)))
(defn conj-in [m map-entry]
(update-in m [(key map-entry)] (fnil conj #{}) (val map-entry)))
(defn merge-matches [property-map-list]
(reduce conj-in {} (apply concat property-map-list)))
user=> (merge-matches in)
{:b #{4 5 6}, :a #{1 2 3}}
fnil will be part of core soon so you can ignore the implementation... but it just creates a version of another function that can handle nil arguments. In this case conj will substitute #{} for nil.
So the reduction conjoining to a set for every key/value in the list of maps supplied.
Another solution contributed by #wmacgyver on Twitter based on multimaps:
(defn add
"Adds key-value pairs the multimap."
([mm k v]
(assoc mm k (conj (get mm k #{}) v)))
([mm k v & kvs]
(apply add (add mm k v) kvs)))
(defn mm-merge
"Merges the multimaps, taking the union of values."
[& mms]
(apply (partial merge-with union) mms))
(defn mergeMatches [property-map-list]
(reduce mm-merge (map #(add {} (key (first %)) (val (first %))) property-map-list)))
This seems to work:
(defn FOO [v1 v2]
(if (set? v1)
(apply hash-set v2 v1)
(hash-set v1 v2)))
Not super pretty but it works.
(defn mergeMatches [propertyMapList]
(for [k (set (for [pp propertyMapList] (key (first pp))))]
{k (set (remove nil? (for [pp propertyMapList] (k pp))))}))

Resources