Idiomatic approach to stopping producer/consumer go-loops? - asynchronous

In an effort to get some good concurrent programming practice I'm trying to implement the producer/consumer pattern using Clojure's core.async library. All is working well but I wanted to be able to stop both the producer and consumer at some point in time.
My current code looks something like this...
(def c (a/chan 5))
(def alive (atom true))

(def producer (a/go-loop []
                (Thread/sleep 1000)
                (when @alive
                  (a/>! c 1)
                  (recur))))

(def consumer (a/go-loop []
                (Thread/sleep 3000)
                (when @alive
                  (println (a/<! c))
                  (recur))))

(do
  (reset! alive false)
  (a/<!! producer)
  (a/<!! consumer))
Unfortunately it appears that the 'do' block occasionally blocks indefinitely. I essentially want to be able to stop both go-loops from continuing and block until both loops have exited. The Thread/sleep code is there to simulate performing some unit of work.
I suspect that stopping the producer causes the consumer to park, hence the hanging, though I'm not sure of an alternative approach. Any ideas?

You can stop both loops by closing the channel with close!: once a channel is closed, puts return false and takes return nil after the buffered values are drained. Please see ClojureDocs for details. Example:
(let [c (chan 2)]
  (>!! c 1)
  (>!! c 2)
  (close! c)
  (println (<!! c)) ; 1
  (println (<!! c)) ; 2
  ;; since we closed the channel, this returns false (we can no longer add values)
  (>!! c 1))
For your problem, something like this:
(let [c (a/chan 5)
      producer (a/go-loop [cnt 0]
                 (Thread/sleep 1000)
                 (let [put-result (a/>! c cnt)]
                   (println "put: " cnt put-result)
                   (when put-result
                     (recur (inc cnt)))))
      consumer (a/go-loop []
                 (Thread/sleep 3000)
                 (let [result (a/<! c)]
                   (when result
                     (println "take: " result)
                     (recur))))]
  (Thread/sleep 5000)
  (println "closing chan...")
  (a/close! c))
with this result:
put: 0 true
put: 1 true
take: 0
put: 2 true
put: 3 true
closing chan...
put: 4 false
take: 1
take: 2
take: 3
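If you also need to block until both loops have exited (as in the original do block), you can take from the channels that go-loop returns after closing c. A minimal sketch along the same lines, assuming the a alias for clojure.core.async used in the question:
(let [c (a/chan 5)
      producer (a/go-loop [cnt 0]
                 (Thread/sleep 100)
                 (when (a/>! c cnt)       ; >! returns false once c is closed
                   (recur (inc cnt))))
      consumer (a/go-loop []
                 (when-let [v (a/<! c)]   ; <! returns nil once c is closed and drained
                   (println "take:" v)
                   (recur)))]
  (Thread/sleep 1000)
  (a/close! c)
  (a/<!! producer)   ; blocks until the producer go-loop has finished
  (a/<!! consumer)   ; blocks until the consumer go-loop has finished
  (println "both loops stopped"))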

Related

Why doesn't this clojure code using go blocks work?

(defn ff [t]
  (let [ch (chan 5)]
    (map (fn [i]
           (println i))
         t)
    (go (>! ch 0))))

(ff [1 2 3 4 5])
The mapping function body isn't being executed. If I remove the go block in the last line, it works as expected.
This function gives the same problem:
(defn ff [t]
  (let [ch (chan 5)]
    (map (fn [i]
           (println i))
         t)
    (>!! ch 0)))
map runs lazily.
When it's not the last form in the let block, the result isn't being evaluated, so the mapping function doesn't get executed.
This would happen even without the go blocks.
If you explicitly want to evaluate a sequence for side effects (like println), use doseq. If you need a lazy sequence to be evaluated eagerly (e.g. it depends on a network connection that will close), wrap it in doall.
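For example, a minimal sketch of both fixes (ff-doseq and ff-doall are hypothetical names):
(require '[clojure.core.async :refer [chan go >!]])

(defn ff-doseq [t]
  (let [ch (chan 5)]
    (doseq [i t]                 ; doseq is eager and runs for side effects
      (println i))
    (go (>! ch 0))))

(defn ff-doall [t]
  (let [ch (chan 5)]
    (doall (map println t))      ; doall forces the lazy seq before moving on
    (go (>! ch 0))))

(ff-doseq [1 2 3 4 5]) ; prints 1..5, returns the go block's channel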

Why can't Clojure's async library handle the Go prime sieve?

To try out the async library in Clojure, I translated the prime sieve example from Go. Running in the REPL, it successfully printed out the prime numbers up to 227 and then stopped. I hit Ctrl-C and tried running it again but it wouldn't print out any more numbers. Is there a way to get Clojure to handle this, or is the async library just not ready for it yet?
;; A concurrent prime sieve translated from
;; https://golang.org/doc/play/sieve.go
(require '[clojure.core.async :as async :refer [<!! >!! chan go]])

(defn generate
  [ch]
  "Sends the sequence 2, 3, 4, ... to channel 'ch'."
  (doseq [i (drop 2 (range))]
    (>!! ch i)))

(defn filter-multiples
  [in-chan out-chan prime]
  "Copies the values from 'in-chan' to 'out-chan', removing
  multiples of 'prime'."
  (while true
    ;; Receive value from 'in-chan'.
    (let [i (<!! in-chan)]
      (if (not= 0 (mod i prime))
        ;; Send 'i' to 'out-chan'.
        (>!! out-chan i)))))

(defn main
  []
  "The prime sieve: Daisy-chain filter-multiples processes."
  (let [ch (chan)]
    (go (generate ch))
    (loop [ch ch]
      (let [prime (<!! ch)]
        (println prime)
        (let [ch1 (chan)]
          (go (filter-multiples ch ch1 prime))
          (recur ch1))))))
go is a macro. If you want to take advantage of goroutine-like behaviour in go blocks you must use <! and >!, and they must be visible to the go macro (that is you mustn't extract these operations into separate functions).
This literal translation of the program at https://golang.org/doc/play/sieve.go seems to work fine, also with a larger i in the main loop:
(require '[clojure.core.async :refer [<! <!! >! chan go]])

(defn go-generate [ch]
  (go (doseq [i (iterate inc 2)]
        (>! ch i))))

(defn go-filter [in out prime]
  (go (while true
        (let [i (<! in)]
          (if-not (zero? (rem i prime))
            (>! out i))))))

(defn main []
  (let [ch (chan)]
    (go-generate ch)
    (loop [i 10 ch ch]
      (if (pos? i)
        (let [prime (<!! ch)]
          (println prime)
          (let [ch1 (chan)]
            (go-filter ch ch1 prime)
            (recur (dec i) ch1)))))))

If the only non-stack-consuming looping construct in Clojure is "recur", how does this lazy-seq work?

The ClojureDocs page for lazy-seq gives an example of generating a lazy-seq of all positive numbers:
(defn positive-numbers
  ([] (positive-numbers 1))
  ([n] (cons n (lazy-seq (positive-numbers (inc n))))))
This lazy-seq can be evaluated for pretty large indexes without throwing a StackOverflowError (unlike the sieve example on the same page):
user=> (nth (positive-numbers) 99999999)
100000000
If only recur can be used to avoid consuming stack frames in a recursive function, how is it possible this lazy-seq example can seemingly call itself without overflowing the stack?
A lazy sequence has the rest of the sequence generating calculation in a thunk. It is not immediately called. As each element (or chunk of elements as the case may be) is requested, a call to the next thunk is made to retrieve the value(s). That thunk may create another thunk to represent the tail of the sequence if it continues. The magic is that (1) these special thunks implement the sequence interface and can transparently be used as such and (2) each thunk is only called once -- its value is cached -- so the realized portion is a sequence of values.
Here it is the general idea without the magic, just good ol' functions:
(defn my-thunk-seq
  ([] (my-thunk-seq 1))
  ([n] (list n #(my-thunk-seq (inc n)))))

(defn my-next [s] ((second s)))

(defn my-realize [s n]
  (loop [a [], s s, n n]
    (if (pos? n)
      (recur (conj a (first s)) (my-next s) (dec n))
      a)))
user=> (-> (my-thunk-seq) first)
1
user=> (-> (my-thunk-seq) my-next first)
2
user=> (my-realize (my-thunk-seq) 10)
[1 2 3 4 5 6 7 8 9 10]
user=> (count (my-realize (my-thunk-seq) 100000))
100000 ; Level stack consumption
The magic bits happen inside of clojure.lang.LazySeq defined in Java, but we can actually do the magic directly in Clojure (implementation that follows for example purposes), by implementing the interfaces on a type and using an atom to cache.
(deftype MyLazySeq [thunk-mem]
  clojure.lang.Seqable
  (seq [_]
    (if (fn? @thunk-mem)
      (swap! thunk-mem (fn [f] (seq (f)))))
    @thunk-mem)
  ; Implementing ISeq is necessary because cons calls seq
  ; on anyone who does not, which would force realization.
  clojure.lang.ISeq
  (first [this] (first (seq this)))
  (next [this] (next (seq this)))
  (more [this] (rest (seq this)))
  (cons [this x] (cons x (seq this))))

(defmacro my-lazy-seq [& body]
  `(MyLazySeq. (atom (fn [] ~@body))))
Now this already works with take, etc., but as take calls lazy-seq we'll make a my-take that uses my-lazy-seq instead to eliminate any confusion.
(defn my-take
  [n coll]
  (my-lazy-seq
    (when (pos? n)
      (when-let [s (seq coll)]
        (cons (first s) (my-take (dec n) (rest s)))))))
Now let's make a slow infinite sequence to test the caching behavior.
(defn slow-inc [n] (Thread/sleep 1000) (inc n))

(defn slow-pos-nums
  ([] (slow-pos-nums 1))
  ([n] (cons n (my-lazy-seq (slow-pos-nums (slow-inc n))))))
And the REPL test
user=> (def nums (slow-pos-nums))
#'user/nums
user=> (time (doall (my-take 10 nums)))
"Elapsed time: 9000.384616 msecs"
(1 2 3 4 5 6 7 8 9 10)
user=> (time (doall (my-take 10 nums)))
"Elapsed time: 0.043146 msecs"
(1 2 3 4 5 6 7 8 9 10)
Keep in mind that lazy-seq is a macro, and therefore does not evaluate its body when your positive-numbers function is called. In that sense, positive-numbers isn't truly recursive. It returns immediately, and the inner "recursive" call to positive-numbers doesn't happen until the seq is consumed.
user=> (source lazy-seq)
(defmacro lazy-seq
  "Takes a body of expressions that returns an ISeq or nil, and yields
  a Seqable object that will invoke the body only the first time seq
  is called, and will cache the result and return it on all subsequent
  seq calls. See also - realized?"
  {:added "1.0"}
  [& body]
  (list 'new 'clojure.lang.LazySeq (list* '^{:once true} fn* [] body)))
I think the trick is that the producer function (positive-numbers) isn't actually called recursively: it doesn't accumulate stack frames the way a basic, Little-Schemer-style recursive call would, because LazySeq invokes it only as needed for the individual entries in the sequence. Once the closure for an entry has been evaluated it can be discarded, so the closures from previous invocations can be garbage-collected as the code churns through the sequence.
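For a quick REPL illustration of the same point: the lazy-seq body runs only when the sequence is first consumed, and the result is cached afterwards.
user=> (def s (lazy-seq (println "body evaluated") (list 1 2 3)))
#'user/s
user=> (first s)
body evaluated
1
user=> (first s) ; cached; the body does not run again
1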

Clojure: Call a function for each element in a vector with its index

Say I have a vector:
(def data ["Hello" "World" "Test" "This"])
And I want to populate a table somewhere that has an api:
(defn setCell
  [row col value]
  (some code here))
Then what is the best way to get the following calls to happen:
(setCell 0 0 "Hello")
(setCell 0 1 "World")
(setCell 0 2 "Test")
(setCell 0 3 "This")
I found that the following will work:
(let [idv (map vector (iterate inc 0) data)]
  (doseq [[index value] idv] (setCell 0 index value)))
But is there a faster way that does not require a new temporary data structure idv?
You can get the same effect in a very clojure-idiomatic way by just mapping the indexes along with the data.
(map #(setCell 0 %1 %2) (iterate inc 0) data)
You may want to wrap this in a doall (or use doseq) to make the calls happen now. It's just fine to map an infinite seq along with the finite one, because map stops when the shortest seq runs out.
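For example, forced eagerly with dorun (since only the side effects matter here):
(dorun (map #(setCell 0 %1 %2) (iterate inc 0) data))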
A bit late in the game but for people accessing this page: there is now (since clojure 1.2) a map-indexed function available in clojure.core.
One issue (unless I'm mistaken): there's no "pmap" equivalent, meaning that map-indexed computations cannot easily be parallelized. In that case, I'd refer to solutions offered above.
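For example, a sketch with map-indexed, reusing setCell and data from the question (note the index is the first argument to the mapping function):
(dorun (map-indexed (fn [idx value] (setCell 0 idx value)) data))

;; or with doseq and destructuring:
(doseq [[idx value] (map-indexed vector data)]
  (setCell 0 idx value))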
The way you're doing it is idiomatic (and identical to clojure.contrib.seq-utils/indexed in fact). If you really want to avoid the extra data structure, you can do this:
(loop [data data, index 0]
  (when (seq data)
    (setCell 0 index (first data))
    (recur (rest data) (inc index))))
I'd use your version unless there was a good reason not to though.
The nicest way would be to use clojure.contrib.seq-utils/indexed, which will look like this (using destructuring):
(doseq [[idx val] (indexed ["Hello" "World" "Test" "This"])]
  (setCell 0 idx val))
I did a short comparison of the performance of the options so far:
; just some function that sums stuff
(defn testThis
  [i value]
  (def total (+ total i value)))

; our test dataset. Make it non-lazy with doall
(def testD (doall (range 100000)))

; time using Arthur's suggestion
(def total 0.0)
(time (doall (map #(testThis %1 %2) (iterate inc 0) testD)))
(println "Total: " total)

; time using Brian's recursive version
(def total 0.0)
(time (loop [d testD i 0]
        (when (seq d)
          (testThis i (first d))
          (recur (rest d) (inc i)))))
(println "Total: " total)

; with the idiomatic indexed version
(def total 0.0)
(time (let [idv (map vector (iterate inc 0) testD)]
        (doseq [[i value] idv] (testThis i value))))
(println "Total: " total)
Results on my 1 core laptop:
"Elapsed time: 598.224635 msecs"
Total: 9.9999E9
"Elapsed time: 241.573161 msecs"
Total: 9.9999E9
"Elapsed time: 959.050662 msecs"
Total: 9.9999E9
Preliminary Conclusion:
Use the loop/recur solution.

Idiomatic clojure for progress reporting?

How should I monitor the progress of a mapped function in clojure?
When processing records in an imperative language I often print a message every so often to indicate how far things have gone, e.g. reporting every 1000 records. Essentially this is counting loop repetitions.
I was wondering what approaches I could take to this in clojure where I am mapping a function over my sequence of records. In this case printing the message (and even keeping count of the progress) seem to be essentially side-effects.
What I have come up with so far looks like:
(defn report
  [report-every val cnt]
  (if (= 0 (mod cnt report-every))
    (println "Done" cnt))
  val)

(defn report-progress
  [report-every aseq]
  (map (fn [val cnt]
         (report report-every val cnt))
       aseq
       (iterate inc 1)))
For example:
user> (doall (report-progress 2 (range 10)))
Done 2
Done 4
Done 6
Done 8
Done 10
(0 1 2 3 4 5 6 7 8 9)
Are there other (better) ways of achieving this effect?
Are there any pitfalls in what I am doing? (I think I am preserving laziness and not holding onto the head, for example.)
The great thing about clojure is you can attach the reporting to the data itself instead of the code that does the computing. This allows you to separate these logically distinct parts. Here is a chunk from my misc.clj that I find I use in just about every project:
(defn seq-counter
  "calls callback after every n'th entry in sequence is evaluated.
  Optionally takes another callback to call once the seq is fully evaluated."
  ([sequence n callback]
     (map #(do (if (= (rem %1 n) 0) (callback)) %2) (iterate inc 1) sequence))
  ([sequence n callback finished-callback]
     (drop-last (lazy-cat (seq-counter sequence n callback)
                          (lazy-seq (cons (finished-callback) ()))))))
then wrap the reporter around your data and pass the result to the processing function, for example (reporting every 1000 entries, say):
(map process-data (seq-counter input 1000 inc-progress))
I would probably perform the reporting in an agent. Something like this:
(defn report [s]
  (println "Done " s)
  (+ 1 s))

(let [reports (agent 0)]
  (map #(do (send reports report)
            (process-data %))
       data-to-process))
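Since map is lazy, in practice you would force it and then await the agent so that the queued report actions actually run; a sketch, using the same (hypothetical) process-data and data-to-process:
(let [reports (agent 0)
      results (doall (map #(do (send reports report)
                               (process-data %))
                          data-to-process))]
  (await reports)   ; wait for the queued report actions to finish printing
  results)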
I don't know of any existing way of doing that; maybe it would be a good idea to browse the clojure.contrib documentation to see if there's already something. In the meantime, I've looked at your example and cleaned it up a little bit.
(defn report [cnt]
  (when (even? cnt)
    (println "Done" cnt)))

(defn report-progress []
  (let [aseq (range 10)]
    (doall (map report (take (count aseq) (iterate inc 1))))
    aseq))
You're heading in the right direction, even though this example is too simple. This gave me an idea about a more generalized version of your report-progress function. This function would take a map-like function, the function to be mapped, a report function and a set of collections (or a seed value and a collection for testing reduce).
(defn report-progress [m f r & colls]
  (let [result (apply m
                      (fn [& args]
                        (let [v (apply f args)]
                          (apply r v args)
                          v))
                      colls)]
    (if (seq? result)
      (doall result)
      result)))
The seq? part is there only for use with reduce, which doesn't necessarily return a sequence. With this function, we can rewrite your example like this:
user>
(report-progress
  map
  (fn [_ v] v)
  (fn [result cnt _]
    (when (even? cnt)
      (println "Done" cnt)))
  (iterate inc 1)
  (range 10))
Done 2
Done 4
Done 6
Done 8
Done 10
(0 1 2 3 4 5 6 7 8 9)
Test the filter function:
user>
(report-progress
  filter
  odd?
  (fn [result cnt]
    (when (even? cnt)
      (println "Done" cnt)))
  (range 10))
Done 0
Done 2
Done 4
Done 6
Done 8
(1 3 5 7 9)
And even the reduce function:
user>
(report-progress
  reduce
  +
  (fn [result s v]
    (when (even? s)
      (println "Done" s)))
  2
  (repeat 10 1))
Done 2
Done 4
Done 6
Done 8
Done 10
12
I have had this problem with some slow-running apps (e.g. database ETL). I solved it by adding the function (tupelo.misc/dot ...) to the tupelo library. Sample:
(ns xxx.core
  (:require [tupelo.misc :as tm]))

(tm/dots-config! {:decimation 10})

(tm/with-dots
  (doseq [ii (range 2345)]
    (tm/dot)
    (Thread/sleep 5)))
Output:
0 ....................................................................................................
1000 ....................................................................................................
2000 ...................................
2345 total
API docs for the tupelo.misc namespace can be found here.
