How to batch process values on a channel - asynchronous

I'm trying to figure out how to batch incoming requests, do an action with the values in those requests, and then return the result of that action to each request. A slightly simplified version of my problem looks like the following:
Incoming requests make calls to
(defn process
  [values]
  ;; put values on the queue and wait for the result, then return the result
  ...)
Periodically, another function is called
(defn batch-process
  []
  ;; take up to 10 of the values from the queue, sum those values,
  ;; then return the result to their process requests
  ...)
I think I am lacking the vocabulary to figure out how I should be doing this. Any advice or pointers would be appreciated!

I think I figured it out. The key was passing an out-channel along with each value, so that batch-process has somewhere to send the result back:
(require '[clojure.core.async :as async :refer [chan go <! >! >!! <!!]])

(defn batch-process
  []
  (let [trigger (chan)
        in-chan (chan 100)]
    (go (loop []
          (let [trigger-val (<! trigger)]
            (when trigger-val
              ;; async/take and async/into return channels, unlike their
              ;; clojure.core namesakes; (.count (.buf in-chan)) peeks at the
              ;; channel's internal buffer to see how many values are waiting
              (let [temp-chan (async/take (min 10 (.count (.buf in-chan))) in-chan)
                    chan-vals (<! (async/into [] temp-chan))
                    sum-vals  (reduce (fn [cur-sum [num out-chan]] (+ cur-sum num))
                                      0
                                      chan-vals)]
                (doseq [[num out-chan] chan-vals]
                  (>! out-chan [num sum-vals]))
                (recur))))))
    [trigger in-chan]))

(defn process
  [value in-chan]
  (let [out-chan (chan)]
    (>!! in-chan [value out-chan])
    (<!! out-chan)))
Then keep track of trigger and in-chan after calling batch-process, and pass in-chan to process. Putting a truthy value on trigger kicks off one batch.
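A minimal wiring sketch of how the two pieces could be used together (the concrete values and the use of futures are just for illustration):
(let [[trigger in-chan] (batch-process)]
  ;; two callers block in process until their batch has been summed
  (future (println (process 1 in-chan)))
  (future (println (process 2 in-chan)))
  (Thread/sleep 100)    ; give both puts time to land on in-chan
  (>!! trigger true))   ; fire one batch; should print [1 3] and [2 3] in some order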

I would propose a different approach: simply accumulate data and flush when the desired count is reached, providing one more channel to force a flush:
(require '[clojure.core.async :as a])

(defn batch-consume [n in]
  (let [flush-chan (a/chan)
        out-chan (a/chan)]
    (a/go-loop [data []]
      (a/alt! in ([v] (let [data (conj data v)]
                        (if (= n (count data))
                          (do (a/>! out-chan data)
                              (recur []))
                          (recur data))))
              flush-chan (do (a/>! out-chan data)
                             (recur []))))
    {:out out-chan
     :flush flush-chan}))
It could then be used something like this:
(let [ch (a/chan)
      {:keys [out flush]} (batch-consume 3 ch)]
  (a/go-loop []
    (let [data (a/<! out)]
      ;; processing batch
      (println data (apply + data)))
    (recur))
  (a/go (dotimes [i 10]        ;; automatic flush demo
          (a/>! ch i))
        (a/>! flush :flush)    ;; flushing the pending 10th item
        (dotimes [i 3]         ;; force flushing by 2 items
          (dotimes [j 2]
            (a/>! ch (+ (* 10 i) j)))
          (a/>! flush :flush))))
output:
;; [0 1 2] 3
;; [3 4 5] 12
;; [6 7 8] 21
;; [9] 9
;; [0 1] 1
;; [10 11] 21
;; [20 21] 41
Notice that if you pass a non-positive n to the batch-consume function, you're left with only a force flush (which could also be useful in some cases):
(let [ch (a/chan)
      {:keys [out flush]} (batch-consume -1 ch)]
  (a/go-loop []
    (let [data (a/<! out)]
      (println data (apply + data)))
    (recur))
  (a/go (dotimes [i 10]
          (a/>! ch i))
        (a/>! flush :flush)
        (dotimes [i 3]
          (dotimes [j 2]
            (a/>! ch (+ (* 10 i) j)))
          (a/>! flush :flush))))
;; [0 1 2 3 4 5 6 7 8 9] 45
;; [0 1] 1
;; [10 11] 21
;; [20 21] 41
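Since the original question mentions a periodic trigger, the same pattern also extends to flushing on a timeout. Here is a sketch along those lines (the name batch-consume-interval and the timeout handling are my own additions, and closing of the in channel is not handled, to stay close to the code above):
(defn batch-consume-interval
  "Like batch-consume, but also flushes whatever has accumulated
  every interval-ms milliseconds."
  [n interval-ms in]
  (let [out-chan (a/chan)]
    (a/go-loop [data []
                t (a/timeout interval-ms)]
      (a/alt! in ([v] (let [data (conj data v)]
                        (if (= n (count data))
                          (do (a/>! out-chan data)
                              (recur [] (a/timeout interval-ms)))
                          (recur data t))))
              t (do (when (seq data)   ;; skip empty batches on a quiet interval
                      (a/>! out-chan data))
                    (recur [] (a/timeout interval-ms)))))
    out-chan))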

Related

deref an atom after recursive function completes

I have an atom fs that I'm updating inside a recursive function freq-seq; it holds the results of my computation. I have another function mine-freq-seqs that starts freq-seq, and when mine-freq-seqs is done I would like to get the final value of the atom. So I thought I would do it like so:
(ns freq-seq-enum)

(def fs (atom #{}))

(defn locally-frequents
  [sdb min-sup]
  (let [uniq-sdb (map (comp frequencies set) sdb)
        freqs (apply merge-with + uniq-sdb)]
    (->> freqs
         (filter #(<= min-sup (second %)))
         (map #(vector (str (first %)) (second %))))))

(defn project-sdb
  [sdb prefix]
  (if (empty? prefix) sdb
      (into [] (->> sdb
                    (filter #(re-find (re-pattern (str (last prefix))) %))
                    (map #(subs % (inc (.indexOf % (str (last prefix))))))
                    (remove empty?)))))

(defn freq-seq
  [sdb prefix prefix-support min-sup frequent-seqs]
  (if ((complement empty?) prefix) (swap! fs conj [prefix prefix-support]))
  (let [lf (locally-frequents sdb min-sup)]
    (if (empty? lf) nil
        (for [[item sup] lf] (freq-seq (project-sdb sdb (str prefix item)) (str prefix item) sup min-sup @fs)))))

(defn mine-freq-seqs
  [sdb min-sup]
  (freq-seq sdb "" 0 min-sup @fs))
running it first
(mine-freq-seqs ["CAABC" "ABCB" "CABC" "ABBCA"] 2)
then deref-ing the atom
(deref fs)
yields
#{["B" 4]
["BC" 4]
["AB" 4]
["CA" 3]
["CAC" 2]
["AC" 4]
["ABC" 4]
["CAB" 2]
["A" 4]
["CABC" 2]
["ABB" 2]
["CC" 2]
["CB" 3]
["C" 4]
["BB" 2]
["CBC" 2]
["AA" 2]}
however (doall (mine-freq-seqs ["CAABC" "ABCB" "CABC" "ABBCA"] 2) (deref fs))
just gives #{}
What I want is to let freq-seq recurse to completion and then get the value of the atom fs, so I can call mine-freq-seqs and have my result returned in the REPL instead of having to manually deref the atom there.
First, some alternate code without the atom; then a look at why you get the empty return.
A more compact version where the sequences in a string are derived with a reduce rather than the recursion with regex and substr.
Then just do a frequencies on those results.
(defn local-seqs
  [s]
  (->> s
       (reduce (fn [acc a] (into acc (map #(conj % a) acc))) #{[]})
       (map #(apply str %))
       (remove empty?)))

(defn freq-seqs
  [sdb min-sup]
  (->> (mapcat local-seqs sdb)
       frequencies
       (filter #(>= (second %) min-sup))
       set))
That's the whole thing!
I haven't involved an atom because I didn't see a need, but you can add one at the end of freq-seqs if you like.
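For a quick sanity check, the compact version can be called directly on the sample data from the question; it should produce the same [sequence support] pairs the question's code collects in fs:
(freq-seqs ["CAABC" "ABCB" "CABC" "ABBCA"] 2)
;; should include pairs like ["AB" 4], ["CA" 3] and ["CBC" 2]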
For your original question: why the return that you see?
You are calling doall with 2 args: the result of your call and a collection. doall is a function, not a macro, so both arguments are evaluated before doall runs; since your lazy result has not been forced at that point, (deref fs) is still #{}.
(defn doall
  ;; <snip>
  ([n coll]        ;; you have passed #{} as coll
   (dorun n coll)  ;; and this line evals to nil
   coll)           ;; and #{} is returned
You have passed your result as the n arg and an empty set as the coll (from (deref fs))
Now when doall calls dorun, it encounters the following:
(defn dorun
  ;; <snip>
  ([n coll]
   (when (and (seq coll) (pos? n))   ;; coll is #{} so (seq coll) is falsey
     (recur (dec n) (next coll)))))  ;; and nil is returned
Since the empty set from fs is the second arg (coll), and the and form short-circuits on (seq coll), dorun returns nil, and doall then returns the empty set that was its second arg.
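A quick REPL check makes the argument mix-up concrete (the arguments here are just illustrative):
user=> (doall (range 5) #{})   ; (range 5) lands in the n slot, #{} in the coll slot
#{}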
Final note:
So that is something that works and why yours failed. As to how to make yours work, to fix the call above I tried:
(do (doall (mine-freq-seqs ["CAABC" "ABCB" "CABC" "ABBCA"] 2))
    (deref fs))
That is closer to working, but with the recursion in your process it only forces the eval one level deep. So you could push the doall deeper into your funcs, but I have proposed a completely different internal structure, so I will leave the rest to you if you really need that structure.
I changed it a bit to remove all of the lazy bits (laziness gets forced silently in the REPL, but can be confusing when the behaviour changes outside of it). Note the changes with vec, mapv, and doall. At least now I get your result:
(def fs (atom #{}))

(defn locally-frequents
  [sdb min-sup]
  (let [uniq-sdb (map (comp frequencies set) sdb)
        freqs (apply merge-with + uniq-sdb)]
    (->> freqs
         (filter #(<= min-sup (second %)))
         (mapv #(vector (str (first %)) (second %))))))

(defn project-sdb
  [sdb prefix]
  (if (empty? prefix)
    sdb
    (into [] (->> sdb
                  (filter #(re-find (re-pattern (str (last prefix))) %))
                  (map #(subs % (inc (.indexOf % (str (last prefix))))))
                  (remove empty?)))))

(defn freq-seq
  [sdb prefix prefix-support min-sup frequent-seqs]
  (if ((complement empty?) prefix) (swap! fs conj [prefix prefix-support]))
  (let [lf (locally-frequents sdb min-sup)]
    (if (empty? lf)
      nil
      (vec (for [[item sup] lf] (freq-seq (project-sdb sdb (str prefix item)) (str prefix item) sup min-sup @fs))))))

(defn mine-freq-seqs
  [sdb min-sup]
  (freq-seq sdb "" 0 min-sup @fs))

(doall (mine-freq-seqs ["CAABC" "ABCB" "CABC" "ABBCA"] 2))
(deref fs) => #{["B" 4] ["BC" 4] ["AB" 4] ["CA" 3]
["CAC" 2] ["AC" 4] ["ABC" 4] ["CAB" 2]
["A" 4] ["CABC" 2] ["ABB" 2] ["CC" 2] ["CB" 3]
["C" 4] ["BB" 2] ["CBC" 2] ["AA" 2]}
I'm still not really sure what the goal is or how/why you get entries like "CABC".

Clojure loop and recur or reduce in state space model

I am trying to write up a simple Markovian state space model that, as the name suggests, iteratively looks back one step to predict the next state.
Here is what is supposed to be an MWE, though it is not, because I cannot quite figure out how I am supposed to place (recur ...) in the code below.
;; helper function
(defn dur-call
  [S D]
  (if (< 1 D)
    (- D 1)
    (rand-int S)))

;; helper function
(defn trans-call
  [S D]
  (if (< 1 D)
    S
    (rand-int 3)))
;; state space model
(defn test-func
  [t]
  (loop [S (rand-int 3)]
    (if (<= t 0)
      [S (rand-int (+ S 1))]
      (let [pastS (first (test-func (- t 1)))
            pastD (second (test-func (- t 1)))
            S (trans-call pastS pastD)]
        (recur ...?)
        [S (dur-call S pastD)]))))
My target is to calculate a state at, say, time t=5, in which case the model needs to look back and calculate states t=[0 1 2 3 4] as well. This should, in my mind, be done well with loop/recur, but could also perhaps be done with reduce (not sure how, still new to Clojure). My problem is really that it seems I would have to use recur inside let, but that should not work given how loop/recur are designed.
Your task is really to generate the next item based on the previous one, starting with some seed. In Clojure this can be done with the iterate function:
user> (take 10 (iterate #(+ 2 %) 1))
(1 3 5 7 9 11 13 15 17 19)
You just have to define the function that produces the next value. It could look like this (I'm not sure about the correctness of the computation algorithm; this is just based on what is in the question):
(defn next-item [[prev-s prev-d :as prev-item]]
  (let [s (trans-call prev-s prev-d)]
    [s (dur-call s prev-d)]))
and now let's iterate with it, starting from some value:
user> (take 5 (iterate next-item [3 4]))
([3 4] [3 3] [3 2] [3 1] [0 0])
now your test function could be implemented this way:
(defn test-fn [t]
  (when (not (neg? t))
    (nth (iterate next-item
                  (let [s (rand-int 3)]
                    [s (rand-int (inc s))]))
         t)))
you can also do it with loop (but it is still less idiomatic):
(defn test-fn-2 [t]
  (when (not (neg? t))
    (let [s (rand-int 3)
          d (rand-int (inc s))]
      (loop [results [[s d]]]
        (if (< t (count results))
          (peek results)
          (recur (conj results (next-item (peek results)))))))))
here we pass all the accumulated results to the next iteration of the loop.
also you can introduce the loop's iteration index and just pass around the last result together with it:
(defn test-fn-3 [t]
  (when (not (neg? t))
    (let [s (rand-int 3)
          d (rand-int (inc s))]
      (loop [result [s d] i 0]
        (if (= i t)
          result
          (recur (next-item result) (inc i)))))))
and one more example with reduce:
(defn test-fn-4 [t]
  (when (not (neg? t))
    (reduce (fn [prev _] (next-item prev))
            (let [s (rand-int 3)
                  d (rand-int (inc s))]
              [s d])
            (range t))))

Clojure reducer/map not working

I've an algorithm as follows -
(defn max-of
  [args]
  (into [] (apply map #(apply max %&) args)))
which works fine.
(max-of [[1 7] [3 5] [7 9] [2 2]]) returns [7 9]
It basically finds the maximum element at each position: 7 is the largest first element in the collection and 9 is the largest second element. However, when trying to use reducer/map from core.reducers, I get
CompilerException clojure.lang.ArityException: Wrong number of args (21) passed to: reducers/map
So this does not work -
(defn max-of
  [args]
  (into [] (apply r/map #(apply max %&) args)))
Why?
UPDATE
my final code is
(defn max-of [[tuple & tuples]]
  (into [] (r/fold (fn
                     ([] tuple)
                     ([t1 t2] (map max t1 t2)))
                   (vec tuples))))
Running a quick bench on it gives Execution time mean : 626.125215 ms
I've got this other algorithm that I wrote before -
(defn max-fold
  [seq-arg]
  (loop [acc (transient []) t seq-arg]
    (if (empty? (first t))
      (rseq (persistent! acc))
      (recur (conj! acc (apply max (map peek t))) (map pop t)))))
which does the same thing. For this I got Execution time mean : 308.200310 ms, which is twice as fast as the r/fold parallel thingy. Any ideas why?
Btw, if I remove into [] from the r/fold stuff, then I get Execution time mean : 13.101313 ms.
r/map takes [f] or [f coll] - so your apply approach won't work here
user=> (doc r/map)
-------------------------
clojure.core.reducers/map
([f] [f coll])
vs.
user=> (doc map)
-------------------------
clojure.core/map
([f] [f coll] [f c1 c2] [f c1 c2 c3] [f c1 c2 c3 & colls])
The answer to the question why has already been given. So let's answer the next question: "what are you trying to do?"
As I've understood your goal (find the maximum elements by position in the tuples), and since you want to do it potentially in parallel (as you are trying to use reducers), this is what you have to do:
(defn max-of [tuples]
  (r/fold (fn
            ([] (first tuples))
            ([t1 t2] (map max t1 t2)))
          (rest tuples)))
user> (max-of [[1 7] [3 5] [7 9] [2 2]])
(7 9)
(max-of [[1 2 3] [3 2 1] [4 0 4]])
(4 2 4)
user> (max-of [])
nil
user> (max-of [[1 2 3]])
[1 2 3]
or even better with destructuring:
(defn max-of [[tuple & tuples]]
  (r/fold (fn
            ([] tuple)
            ([t1 t2] (map max t1 t2)))
          tuples))
update:
For large data you should optimize it and switch to vectors (r/fold only runs in parallel over vectors and maps; for other seqs it falls back to a sequential reduce):
(defn max-of [[tuple & tuples]]
  (r/fold (fn
            ([] tuple)
            ([t1 t2] (map max t1 t2)))
          (vec tuples)))
user> (max-of (repeat 1000000 [1 2 3 4 5 6 7 8 9 10]))
(1 2 3 4 5 6 7 8 9 10)

Does Clojure recursion work backwards?

I'm currently going through the 4clojure Problem 23
My current solution uses recursion to go through the list and append each element to the end of the result of the same function:
(fn self [x]
  (if (= x [])
    x
    (conj (self (rest x)) (first x))))
But when I run it against [1 2 3] it gives me (1 2 3)
What I think it should be doing through recursion is:
(conj (conj (conj (conj (conj [] 5) 4) 3) 2) 1)
which does return
[5 4 3 2 1]
But it is exactly the opposite, so I must be missing something. Also, I don't understand why one returns a vector and the other returns a list.
When you do (rest v) you're getting a list (not a vector), and conj then adds to the front each time (not the back):
user=> (defn self [v] (if (empty? v) v (conj (self (rest v)) (first v))))
#'user/self
user=> (self [1 2 3])
(1 2 3)
user=> (defn self [v] (if (empty? v) [] (conj (self (rest v)) (first v))))
#'user/self
user=> (self [1 2 3])
[3 2 1]
user=>
user=> (rest [1])
()
user=> (conj '() 2)
(2)
user=> (conj '(2) 1)
(1 2)
user=>
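As an aside, this front-adding behaviour of conj on lists is exactly what the usual non-recursive reversal idiom relies on (a small illustrative check):
user=> (reduce conj () [1 2 3 4 5])
(5 4 3 2 1)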

Clojure Remove item from Vector at a Specified Location

Is there a way to remove an item from a vector based on index? As of now I am using subvec to split the vector and recreate it again. I am looking for the reverse of assoc for vectors.
subvec is probably the best way. The Clojure docs say subvec is "O(1) and very fast, as the resulting vector shares structure with the original and no trimming is done". The alternative would be walking the vector and building a new one while skipping certain elements, which would be slower.
Removing elements from the middle of a vector isn't something vectors are necessarily good at. If you have to do this often, consider using a hash-map so you can use dissoc.
See:
subvec at clojuredocs.org
subvec at clojure.github.io, which the official website points to.
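If removal by index really is frequent, the hash-map alternative mentioned above could look something like this sketch (the index-to-element map representation is just one possible setup):
(def m (into {} (map-indexed vector [:a :b :c :d])))  ; {0 :a, 1 :b, 2 :c, 3 :d}
(dissoc m 2)                                          ; => {0 :a, 1 :b, 3 :d}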
(defn vec-remove
  "remove elem in coll"
  [pos coll]
  (into (subvec coll 0 pos) (subvec coll (inc pos))))
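A quick usage example (the values are arbitrary):
user=> (vec-remove 2 [1 2 3 4 5])
[1 2 4 5]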
user=> (def a [1 2 3 4 5])
user=> (time (dotimes [n 100000] (vec (concat (take 2 a) (drop 3 a)))))
"Elapsed time: 1185.539413 msecs"
user=> (time (dotimes [n 100000] (vec (concat (subvec a 0 2) (subvec a 3 5)))))
"Elapsed time: 760.072048 msecs"
Yup - subvec is fastest
The vector library clojure.core.rrb-vector provides logarithmic time concatenation and slicing. Assuming you need persistence, and considering what you're asking for, a logarithmic time solution is as fast as theoretically possible. In particular, it is much faster than any solution using clojure's native subvec, as the concat step puts any such solution into linear time.
(require '[clojure.core.rrb-vector :as fv])
(let [s (vec [0 1 2 3 4])]
  (fv/catvec (fv/subvec s 0 2) (fv/subvec s 3 5)))
; => [0 1 3 4]
Here is a solution I've found to be nice:
(defn index-exclude
  "Take all indices except those in ex"
  [r ex]
  (filter #(not (ex %)) (range r)))

(defn dissoc-idx [v & ds]
  (map v (index-exclude (count v) (into #{} ds))))
(dissoc-idx [1 2 3] 1 2)
'(1)
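Note that dissoc-idx returns a lazy seq rather than a vector; wrap it in vec if a vector is needed (an illustrative call):
(vec (dissoc-idx [1 2 3 4 5] 2))
;; => [1 2 4 5]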
subvec is fast; combined with transients it gives even better results.
Using criterium to benchmark:
user=> (def len 5)
user=> (def v (vec (range 0 5)))
user=> (def i (quot len 2))
user=> (def j (inc i))
; using take/drop
user=> (bench
        (vec (concat (take i v) (drop j v))))
; Execution time mean : 817,618757 ns
; Execution time std-deviation : 9,371922 ns
; using subvec
user=> (bench
        (vec (concat (subvec v 0 i) (subvec v j len))))
; Execution time mean : 604,501041 ns
; Execution time std-deviation : 8,163552 ns
; using subvec and transients
user=> (bench
        (persistent!
         (reduce conj! (transient (vec (subvec v 0 i))) (subvec v j len))))
; Execution time mean : 307,819500 ns
; Execution time std-deviation : 4,359432 ns
The speedup is even greater at greater lengths; the same bench with a len equal to 10000 gives means of 1,368250 ms, 953,565863 µs, and 314,387437 µs.
Yet another possibility which ought to work with any sequence and not bomb if the index was out of range...
(defn drop-index [col idx]
  (filter identity (map-indexed #(if (not= %1 idx) %2) col)))
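For example (note that because of the filter identity step, this version would also drop any nil or false elements that were already in the collection):
user=> (drop-index [1 2 3 4 5] 2)
(1 2 4 5)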
It may be faster to just pick out the indexes you want to keep.
(def a [1 2 3 4 5])
(def indexes [0 1 3 4])
(time (dotimes [n 100000] (vec (concat (subvec a 0 2) (subvec a 3 5)))))
"Elapsed time: 69.401787 msecs"
(time (dotimes [n 100000] (mapv #(a %) indexes)))
"Elapsed time: 28.18766 msecs"
