How should I monitor the progress of a mapped function in clojure?
When processing records in an imperative language I often print a message every so often to indicate how far things have gone, e.g. reporting every 1000 records. Essentially this is counting loop repetitions.
I was wondering what approaches I could take to this in Clojure, where I am mapping a function over my sequence of records. In this case printing the message (and even keeping count of the progress) seems to be essentially a side effect.
What I have come up with so far looks like:
(defn report
[report-every val cnt]
(if (= 0 (mod cnt report-every))
(println "Done" cnt))
val)
(defn report-progress
[report-every aseq]
(map (fn [val cnt]
(report report-every val cnt))
aseq
(iterate inc 1)))
For example:
user> (doall (report-progress 2 (range 10)))
Done 2
Done 4
Done 6
Done 8
Done 10
(0 1 2 3 4 5 6 7 8 9)
Are there other (better) ways of achieving this effect?
Are there any pitfalls in what I am doing? (I think I am preserving laziness and not holding the head for example.)
The great thing about clojure is you can attach the reporting to the data itself instead of the code that does the computing. This allows you to separate these logically distinct parts. Here is a chunk from my misc.clj that I find I use in just about every project:
(defn seq-counter
"calls callback after every n'th entry in sequence is evaluated.
Optionally takes another callback to call once the seq is fully evaluated."
([sequence n callback]
(map #(do (if (= (rem %1 n) 0) (callback)) %2) (iterate inc 1) sequence))
([sequence n callback finished-callback]
(drop-last (lazy-cat (seq-counter sequence n callback)
(lazy-seq (cons (finished-callback) ()))))))
Then wrap the reporter around your data and pass the result to the processing function (n is the reporting interval):
(map process-data (seq-counter input n inc-progress))
I would probably perform the reporting in an agent. Something like this:
(defn report [s]
  (println "Done " s)
  (+ 1 s))
(let [reports (agent 0)]
  (map #(do (send reports report)
            (process-data %))
       data-to-process))
I don't know of any existing way of doing that; it might be a good idea to browse the clojure.contrib documentation to see if there's already something. In the meantime, I've looked at your example and cleaned it up a little bit.
(defn report [cnt]
(when (even? cnt)
(println "Done" cnt)))
(defn report-progress []
(let [aseq (range 10)]
(doall (map report (take (count aseq) (iterate inc 1))))
aseq))
You're heading in the right direction, even though this example is too simple. This gave me an idea about a more generalized version of your report-progress function. This function would take a map-like function, the function to be mapped, a report function and a set of collections (or a seed value and a collection for testing reduce).
(defn report-progress [m f r & colls]
(let [result (apply m
(fn [& args]
(let [v (apply f args)]
(apply r v args) v))
colls)]
(if (seq? result)
(doall result)
result)))
The seq? part is there only for use with reduce, which doesn't necessarily return a sequence. With this function, we can rewrite your example like this:
user>
(report-progress
map
(fn [_ v] v)
(fn [result cnt _]
(when (even? cnt)
(println "Done" cnt)))
(iterate inc 1)
(range 10))
Done 2
Done 4
Done 6
Done 8
Done 10
(0 1 2 3 4 5 6 7 8 9)
Test the filter function:
user>
(report-progress
filter
odd?
(fn [result cnt]
(when (even? cnt)
(println "Done" cnt)))
(range 10))
Done 0
Done 2
Done 4
Done 6
Done 8
(1 3 5 7 9)
And even the reduce function:
user>
(report-progress
reduce
+
(fn [result s v]
(when (even? s)
(println "Done" s)))
2
(repeat 10 1))
Done 2
Done 4
Done 6
Done 8
Done 10
12
I have had this problem with some slow-running apps (e.g. database ETL, etc). I solved it by adding the function (tupelo.misc/dot ...) to the tupelo library. Sample:
(ns xxx.core
(:require [tupelo.misc :as tm]))
(tm/dots-config! {:decimation 10} )
(tm/with-dots
(doseq [ii (range 2345)]
(tm/dot)
(Thread/sleep 5)))
Output:
0 ....................................................................................................
1000 ....................................................................................................
2000 ...................................
2345 total
API docs for the tupelo.misc namespace can be found here.
I'm dealing with recursion in Clojure, which I don't really understand.
I made a small program, taken from here, that tries to find the smallest number that can be divided by all the numbers from 1 to 20. This is the code I wrote, but there must be something I'm missing because it does not work.
Could you give me a hand? Thanks!
(defn smallest [nume index]
(while(not ( = index 0))
(do
(cond
(zero?(mod nume index))(let [dec' index] (smallest nume index))
:else (let [inc' nume] (smallest nume index))))))
EDIT:
It looks like loop/recur is the better choice, so I tried it:
(loop [nume 20
index 20]
(if (= index 0)
(println nume)
(if (zero?(mod nume index))
(recur nume (dec index))
(recur (inc nume) 20))))
It works. If you are curious about the result: 232792560
while does not do what you think it does.
In Clojure, (almost) everything is immutable, meaning that index keeps the same value for as long as its binding is in scope: if it is not 0 when the while starts, it will never become 0. Looping until it changes therefore makes little sense.
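A minimal sketch of that point, using a hypothetical local named index rather than the asker's code: dec returns a new value and leaves the binding untouched, so a test on that binding can never change its answer.

```clojure
;; Illustrative only: `index` and `result` are made-up names.
(def result
  (let [index 3]
    (dec index)   ; computes 2, but the value is thrown away
    index))       ; index is still bound to 3

(println result)  ; prints 3
```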
There are a number of ways to achieve what you are trying to do, the first, and most trivial (I think!) to newcomers, is to understand loop/recur. So for example:
(loop [counter 0]
(when (< counter 10)
(println counter)
(recur (inc counter))))
In here, counter is defined to be 0, and it never changes in the usual way. When you hit recur, you submit a new value, in this case the increment of the previous counter, into a brand new iteration starting at loop, only now counter will be bound to 1.
Edit: Notice, however, that this example will always return nil. It is only used for the side effect of println. Why does it return nil? Because in the last iteration the when test fails, and when returns nil in that case. If you want to return something else, use if and specify what you would like returned at the last iteration.
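As a sketch of that suggestion, the same loop can be written with if so that the final iteration yields a value. The function name count-up and the return value :done are assumptions made for this example.

```clojure
;; Hypothetical helper: prints 0 up to (but not including) limit,
;; then returns :done instead of nil.
(defn count-up [limit]
  (loop [counter 0]
    (if (< counter limit)
      (do (println counter)
          (recur (inc counter)))   ; recur is still in tail position here
      :done)))

(count-up 3) ; prints 0, 1 and 2, then returns :done
```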
You should read a little more about this paradigm, and perhaps do exercises like those on 4clojure to get a better grasp of it. Once you do, it will become MUCH simpler for you to think in this way, and the tremendous benefits of this style will begin to emerge.
Good luck!
Here's a brute-force implementation that tests every number for divisibility by all numbers from 1 to 10 (note the (range 2 11) in the code; dividing by 1 needs no check):
(first
(filter #(second %)
(map (fn[x] [x (every? identity
(map #(= 0 (mod x %))
(range 2 11)))])
(range 1 Integer/MAX_VALUE))))
Its output is
[2520 true]
Unfortunately, this is not a good approach for bigger numbers. With (range 2 21) it doesn't finish after a few minutes of waiting on my MacBook. Let's try this:
user=> (defn gcd [a b] (if (zero? b) a (recur b (mod a b))))
#'user/gcd
user=> (reduce (fn[acc n] (if (not= 0 (mod acc n)) (* acc (/ n (gcd n acc))) acc)) 1 (range 1 11))
2520
user=> (reduce (fn[acc n] (if (not= 0 (mod acc n)) (* acc (/ n (gcd n acc))) acc)) 1 (range 1 21))
232792560
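The reduce above folds each n into a running least common multiple. The same idea can be sketched with a named helper; lcm and smallest-multiple are names introduced here for illustration, not part of the original answer.

```clojure
(defn gcd [a b]
  (if (zero? b) a (recur b (mod a b))))

;; Hypothetical helper: least common multiple via the gcd identity.
(defn lcm [a b]
  (* a (quot b (gcd a b))))

;; Smallest number divisible by every integer from 1 to n.
(defn smallest-multiple [n]
  (reduce lcm 1 (range 1 (inc n))))

(smallest-multiple 10) ; => 2520
(smallest-multiple 20) ; => 232792560
```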
Here is an example:
;; Helper function for marking multiples of a number as 0
(def mark (fn [[x & xs] k m]
(if (= k m)
(cons 0 (mark xs 1 m))
(cons x (mark xs (inc k) m))
)))
;; Sieve of Eratosthenes
(defn sieve
[x & xs]
(if (= x 0)
(sieve xs)
(cons x (sieve (mark xs 1 x)))
))
(take 10 (lazy-seq (sieve (iterate inc 2))))
It produces a StackOverflowError.
There are a couple of issues here. First, as pointed out in the other answer, your mark and sieve functions don't have terminating conditions. It looks like they are designed to work with infinite sequences, but if you passed a finite-length sequence they'd keep going off the end.
The deeper problem here is that it looks like you're trying to have a function create a lazy infinite sequence by recursively calling itself. However, cons is not lazy in any way; it is a pure function call, so the recursive calls to mark and sieve are invoked immediately. Wrapping the outer-most call to sieve in lazy-seq only serves to defer the initial call; it does not make the entire sequence lazy. Instead, each call to cons must be wrapped in its own lazy sequence.
For instance:
(defn eager-iterate [f x]
(cons x (eager-iterate f (f x))))
(take 3 (eager-iterate inc 0)) ; => StackOverflowError
(take 3 (lazy-seq (eager-iterate inc 0))) ; => Still a StackOverflowError
Compare this with the actual source code of iterate:
(defn iterate
"Returns a lazy sequence of x, (f x), (f (f x)) etc. f must be free of side-effects"
{:added "1.0"
:static true}
[f x] (cons x (lazy-seq (iterate f (f x)))))
Putting it together, here's an implementation of mark that works correctly for finite sequences and preserves laziness for infinite sequences. Fixing sieve is left as an exercise for the reader.
(defn mark [[x :as xs] k m]
(lazy-seq
(when (seq xs)
(if (= k m)
(cons 0 (mark (next xs) 1 m))
(cons x (mark (next xs) (inc k) m))))))
(mark (range 4 14) 1 3)
; => (4 5 0 7 8 0 10 11 0 13)
(take 10 (mark (iterate inc 4) 1 3))
; => (4 5 0 7 8 0 10 11 0 13)
Need terminating conditions
The problem here is both your mark and sieve functions have no terminating conditions. There must be some set of inputs for which each function does not call itself, but returns an answer. Additionally, every set of (valid) inputs to these functions should eventually resolve to a non-recursive return value.
But even if you get it right...
I'll add that even if you succeed in creating the correct terminating conditions, there is still the possibility of a stack overflow if the depth of the recursion is too large. This can be mitigated to some extent by increasing the JVM stack size, but this has its limits.
A way around this for some functions is to use tail call optimization. Some recursive functions are tail recursive, meaning that all recursive calls to the function within its definition are in the tail call position (they are the final function called in the definition body). For example, in your sieve function's (= x 0) case, sieve is the tail call, since the result of sieve doesn't get passed into any other function. However, in the case that (not (= x 0)), the result of calling sieve gets passed to cons, so this is not a tail call. When a function is fully tail recursive, it is possible to transform the function definition behind the scenes into a looping construct that avoids consuming the stack. In Clojure this is done by using recur in the function definition instead of the function name (there is also a loop construct which can sometimes be helpful). Again, because not all recursive functions are tail recursive, this isn't a panacea. But when they are, it's good to know that you can do this.
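To make the distinction concrete, here is a small sketch with two hypothetical summing functions (sum-naive and sum-recur are made-up names). The first passes the recursive result to +, so it is not a tail call; the second carries the running total in an accumulator so that recur can be used.

```clojure
;; Not a tail call: the result of the recursive call is handed to +,
;; so each element costs one stack frame.
(defn sum-naive [xs]
  (if (empty? xs)
    0
    (+ (first xs) (sum-naive (rest xs)))))

;; Tail call: the running total travels in acc, so recur loops
;; without growing the stack.
(defn sum-recur [xs]
  (loop [xs xs, acc 0]
    (if (empty? xs)
      acc
      (recur (rest xs) (+ acc (first xs))))))

(sum-naive [1 2 3])         ; => 6
(sum-recur (range 1000000)) ; => 499999500000
```

With a large enough input, sum-naive is the one that risks a StackOverflowError; sum-recur is not affected by input length.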
Thanks to @Alex's answer I managed to come up with a working lazy solution:
;; Helper function for marking multiples of a number as 0
(defn mark [[x :as xs] k m]
(lazy-seq
(when-not (empty? xs)
(if (= k m)
(cons 0 (mark (rest xs) 1 m))
(cons x (mark (rest xs) (inc k) m))))))
;; Sieve of Eratosthenes
(defn sieve
[[x :as xs]]
(lazy-seq
(when-not (empty? xs)
(if (= x 0)
(sieve (rest xs))
(cons x (sieve (mark (rest xs) 1 x)))))))
I was advised by someone else to use rest instead of next.
The ClojureDocs page for lazy-seq gives an example of generating a lazy-seq of all positive numbers:
(defn positive-numbers
([] (positive-numbers 1))
([n] (cons n (lazy-seq (positive-numbers (inc n))))))
This lazy-seq can be evaluated for pretty large indexes without throwing a StackOverflowError (unlike the sieve example on the same page):
user=> (nth (positive-numbers) 99999999)
100000000
If only recur can be used to avoid consuming stack frames in a recursive function, how is it possible this lazy-seq example can seemingly call itself without overflowing the stack?
A lazy sequence has the rest of the sequence generating calculation in a thunk. It is not immediately called. As each element (or chunk of elements as the case may be) is requested, a call to the next thunk is made to retrieve the value(s). That thunk may create another thunk to represent the tail of the sequence if it continues. The magic is that (1) these special thunks implement the sequence interface and can transparently be used as such and (2) each thunk is only called once -- its value is cached -- so the realized portion is a sequence of values.
Here is the general idea without the magic, just good ol' functions:
(defn my-thunk-seq
([] (my-thunk-seq 1))
([n] (list n #(my-thunk-seq (inc n)))))
(defn my-next [s] ((second s)))
(defn my-realize [s n]
(loop [a [], s s, n n]
(if (pos? n)
(recur (conj a (first s)) (my-next s) (dec n))
a)))
user=> (-> (my-thunk-seq) first)
1
user=> (-> (my-thunk-seq) my-next first)
2
user=> (my-realize (my-thunk-seq) 10)
[1 2 3 4 5 6 7 8 9 10]
user=> (count (my-realize (my-thunk-seq) 100000))
100000 ; Level stack consumption
The magic bits happen inside of clojure.lang.LazySeq defined in Java, but we can actually do the magic directly in Clojure (implementation that follows for example purposes), by implementing the interfaces on a type and using an atom to cache.
(deftype MyLazySeq [thunk-mem]
  clojure.lang.Seqable
  (seq [_]
    (if (fn? @thunk-mem)
      (swap! thunk-mem (fn [f] (seq (f)))))
    @thunk-mem)
  ;Implementing ISeq is necessary because cons calls seq
  ;on anyone who does not, which would force realization.
  clojure.lang.ISeq
  (first [this] (first (seq this)))
  (next [this] (next (seq this)))
  (more [this] (rest (seq this)))
  (cons [this x] (cons x (seq this))))
(defmacro my-lazy-seq [& body]
  `(MyLazySeq. (atom (fn [] ~@body))))
Now this already works with take, etc., but as take calls lazy-seq we'll make a my-take that uses my-lazy-seq instead to eliminate any confusion.
(defn my-take
[n coll]
(my-lazy-seq
(when (pos? n)
(when-let [s (seq coll)]
(cons (first s) (my-take (dec n) (rest s)))))))
Now let's make a slow infinite sequence to test the caching behavior.
(defn slow-inc [n] (Thread/sleep 1000) (inc n))
(defn slow-pos-nums
([] (slow-pos-nums 1))
([n] (cons n (my-lazy-seq (slow-pos-nums (slow-inc n))))))
And the REPL test
user=> (def nums (slow-pos-nums))
#'user/nums
user=> (time (doall (my-take 10 nums)))
"Elapsed time: 9000.384616 msecs"
(1 2 3 4 5 6 7 8 9 10)
user=> (time (doall (my-take 10 nums)))
"Elapsed time: 0.043146 msecs"
(1 2 3 4 5 6 7 8 9 10)
Keep in mind that lazy-seq is a macro, and therefore does not evaluate its body when your positive-numbers function is called. In that sense, positive-numbers isn't truly recursive. It returns immediately, and the inner "recursive" call to positive-numbers doesn't happen until the seq is consumed.
user=> (source lazy-seq)
(defmacro lazy-seq
"Takes a body of expressions that returns an ISeq or nil, and yields
a Seqable object that will invoke the body only the first time seq
is called, and will cache the result and return it on all subsequent
seq calls. See also - realized?"
{:added "1.0"}
[& body]
(list 'new 'clojure.lang.LazySeq (list* '^{:once true} fn* [] body)))
I think the trick is that the producer function (positive-numbers) isn't getting called recursively, it doesn't accumulate stack frames as if it was called with basic recursion Little-Schemer style, because LazySeq is invoking it as needed for the individual entries in the sequence. Once a closure gets evaluated for an entry then it can be discarded. So stack frames from previous invocations of the function can get garbage-collected as the code churns through the sequence.
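The deferral can be observed directly at the REPL. In this sketch, body-runs is a hypothetical counter added only to show when the lazy-seq body actually executes.

```clojure
;; Hypothetical counter: incremented each time the lazy-seq body runs.
(def body-runs (atom 0))

;; Creating the lazy seq does not evaluate its body.
(def s (lazy-seq (swap! body-runs inc) (list 1 2 3)))

(println @body-runs)     ; prints 0: nothing has run yet
(println (realized? s))  ; prints false
(println (first s))      ; prints 1: asking for an element forces the body
(println @body-runs)     ; prints 1
(println (first s))      ; prints 1 again, served from the cache
(println @body-runs)     ; still prints 1: the body ran only once
```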
I want to reverse a sequence in Clojure without using the reverse function, and do so recursively.
Here is what I came up with:
(defn reverse-recursively [coll]
(loop [r (rest coll)
acc (conj () (first coll))]
(if (= (count r) 0)
acc
(recur (rest r) (conj acc (first r))))))
Sample output:
user> (reverse-recursively '(1 2 3 4 5 6))
(6 5 4 3 2 1)
user> (reverse-recursively [1 2 3 4 5 6])
(6 5 4 3 2 1)
user> (reverse-recursively {:a 1 :b 2 :c 3})
([:c 3] [:b 2] [:a 1])
Questions:
Is there a more concise way of doing this, i.e. without loop/recur?
Is there a way to do this without using an "accumulator" parameter in the loop?
References:
Whats the best way to recursively reverse a string in Java?
http://groups.google.com/group/clojure/browse_thread/thread/4e7a4bfb0d71a508?pli=1
You don't need to count. Just stop when the remaining sequence is empty.
You shouldn't pre-populate the acc, since the original input may be empty (and it's more code).
Destructuring is cool.
(defn reverse-recursively [coll]
(loop [[r & more :as all] (seq coll)
acc '()]
(if all
(recur more (cons r acc))
acc)))
As for loop/recur and the acc, you need some way of passing around the working reversed list. It's either loop, or add another param to the function (which is really what loop is doing anyway).
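For comparison, here is a sketch of the "another param" variant: a second arity carries the accumulator that loop would otherwise hold. The name reverse-with-acc is made up for this example.

```clojure
;; Hypothetical two-arity version: the one-argument arity seeds the
;; accumulator, the two-argument arity does the work with recur.
(defn reverse-with-acc
  ([coll] (reverse-with-acc (seq coll) '()))
  ([remaining acc]
   (if remaining
     (recur (next remaining) (cons (first remaining) acc))
     acc)))

(reverse-with-acc [1 2 3 4]) ; => (4 3 2 1)
(reverse-with-acc [])        ; => ()
```

The trade-off is that the accumulator arity becomes part of the function's public signature, which loop avoids.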
Or use a higher-order function:
user=> (reduce conj '() [1 2 3 4])
(4 3 2 1)
For the sake of exhaustiveness, there is one more method using into. Since into internally uses conj, it can be used as follows:
(defn reverse-list
"Reverse the element of alist."
[lst]
(into '() lst))
Yes to question 1, this is what I came up with for my answer to the recursion koan (I couldn't tell you whether it was good clojure practice or not).
(defn recursive-reverse [coll]
(if (empty? coll)
[]
(conj (recursive-reverse (rest coll)) (first coll) )))
Current versions of Clojure have a built-in function called rseq, which returns a reversed seq in constant time for vectors and sorted maps. For anyone who passes by.
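A short usage sketch. Because rseq only accepts reversible collections, other collections (a list here) are assumed to be converted with vec first.

```clojure
(println (rseq [1 2 3 4]))                      ; prints (4 3 2 1)
(println (rseq (sorted-map :a 1 :b 2)))         ; prints ([:b 2] [:a 1])
(println (rseq (vec '(1 2 3 4))))               ; a list must be turned into a vector first
```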
(defn my-rev [col]
(loop [ col col
result []]
(if (empty? col)
result
(recur (rest col) (cons (first col) result)))))
Q1.
The JVM does not optimize recursion, so a function that calls itself directly will overflow the stack on deep input. That is why Clojure provides loop/recur. Deep recursion cannot be defined without recur (or a function such as trampoline, which uses recur internally).
Q2.
A function that recurses via recur must be tail-recursive. Converting an ordinary recursive function into a tail-recursive one means the intermediate result has to be carried along in an extra variable, the accumulator.
(defn reverse-seq [sss]
  (if (empty? sss)
    []
    (conj (reverse-seq (rest sss)) (first sss))))
(defn recursive-reverse [coll]
(if (empty? coll)
()
(concat (vector (peek coll)) (recursive-reverse (pop coll )))
)
)
and test:
user=> (recursive-reverse [1])
(1)
user=> (recursive-reverse [1 2 3 4 5])
(5 4 3 2 1)
Say I have a vector:
(def data ["Hello" "World" "Test" "This"])
And I want to populate a table somewhere that has an api:
(defn setCell
[row col value]
(some code here))
Then what is the best way to get the following calls to happen:
(setCell 0 0 "Hello")
(setCell 0 1 "World")
(setCell 0 2 "Test")
(setCell 0 3 "This")
I found that the following will work:
(let [idv (map vector (iterate inc 0) data)]
(doseq [[index value] idv] (setCell 0 index value)))
But is there a faster way that does not require a new temporary datastructure idv?
You can get the same effect in a very clojure-idiomatic way by just mapping the indexes along with the data.
(map #(setCell 0 %1 %2) (iterate inc 0) data)
You may want to wrap this in doall or dorun (or rewrite it with doseq) to make the calls happen now. It's fine to map an infinite seq along with a finite one because map stops when the shortest seq runs out.
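A minimal sketch of both points, assuming a stand-in set-cell! that just records its arguments in an atom (the real setCell API from the question is not shown). dorun forces the side effects without holding on to the results, and the infinite index sequence causes no trouble.

```clojure
;; Hypothetical stand-in for setCell: records each call in `recorded`.
(def recorded (atom []))
(defn set-cell! [row col value]
  (swap! recorded conj [row col value]))

(def data ["Hello" "World" "Test" "This"])

;; map stops after four calls because data is the shorter sequence.
(dorun (map #(set-cell! 0 %1 %2) (iterate inc 0) data))

(println @recorded)
```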
A bit late in the game but for people accessing this page: there is now (since clojure 1.2) a map-indexed function available in clojure.core.
One issue (unless I'm mistaken): there's no "pmap" equivalent, meaning that map-indexed computations cannot easily be parallelized. In that case, I'd refer to solutions offered above.
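For reference, a sketch of the map-indexed approach. As before, set-cell! here is a hypothetical stand-in that stores values in an atom keyed by [row col], not the asker's real API.

```clojure
;; Hypothetical stand-in for setCell: stores values keyed by [row col].
(def cells (atom {}))
(defn set-cell! [row col value]
  (swap! cells assoc [row col] value))

(def data ["Hello" "World" "Test" "This"])

;; map-indexed pairs each element with its position; doseq runs the
;; side effects eagerly.
(doseq [[idx value] (map-indexed vector data)]
  (set-cell! 0 idx value))

(println (get @cells [0 2])) ; prints Test
```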
The way you're doing it is idiomatic (and identical to clojure.contrib.seq-utils/indexed in fact). If you really want to avoid the extra data structure, you can do this:
(loop [data data, index 0]
(when (seq data)
(setCell 0 index (first data))
(recur (rest data) (inc index))))
I'd use your version unless there was a good reason not to though.
The nicest way would be to use clojure.contrib.seq-utils/indexed, which will look like this (using destructuring):
(doseq [[idx val] (indexed ["Hello" "World" "Test" "This"])]
(setCell 0 idx val))
I did a short comparison of the performance of the options so far:
; just some function that sums stuff
(defn testThis
[i value]
(def total (+ total i value)))
; our test dataset. Make it non-lazy with doall
(def testD (doall (range 100000)))
; time using Arthur's suggestion
(def total 0.0)
(time (doall (map #(testThis %1 %2) (iterate inc 0) testD)))
(println "Total: " total)
; time using Brian's recursive version
(def total 0.0)
(time (loop [d testD i 0]
(when (seq d)
(testThis i (first d))
(recur (rest d) (inc i)))))
(println "Total: " total)
; with the idiomatic indexed version
(def total 0.0)
(time (let [idv (map vector (iterate inc 0) testD)]
(doseq [[i value] idv] (testThis i value))))
(println "Total: " total)
Results on my 1 core laptop:
"Elapsed time: 598.224635 msecs"
Total: 9.9999E9
"Elapsed time: 241.573161 msecs"
Total: 9.9999E9
"Elapsed time: 959.050662 msecs"
Total: 9.9999E9
Preliminary Conclusion:
Use the loop/recur solution.