Clojure reducer/map not working

I've an algorithm as follows -
(defn max-of
[args]
(into [] (apply map #(apply max %&) args)))
which works fine.
(max-of [[1 7] [3 5] [7 9] [2 2]]) returns [7 9]
It basically finds the maximum element at each position: 7 is the largest first element in the collection and 9 is the largest second element. However, when trying to use r/map from clojure.core.reducers, I get
CompilerException clojure.lang.ArityException: Wrong number of args (21) passed to: reducers/map
So this does not work -
(defn max-of
[args]
(into [] (apply r/map #(apply max %&) args)))
Why?
UPDATE
My final code is
(defn max-of [[tuple & tuples]]
(into [] (r/fold (fn
([] tuple)
([t1 t2] (map max t1 t2)))
(vec tuples))))
Running a quick bench on it gives Execution time mean : 626.125215 ms
I've got this other algorithm that I wrote before -
(defn max-fold
[seq-arg]
(loop [acc (transient []) t seq-arg]
(if (empty? (first t))
(rseq (persistent! acc))
(recur (conj! acc (apply max (map peek t))) (map pop t)))))
which does the same thing. For this I got Execution time mean : 308.200310 ms, which is twice as fast as the r/fold parallel version. Any ideas why?
Btw, if I remove into [] from the r/fold stuff, then I get Execution time mean : 13.101313 ms.

r/map takes [f] or [f coll] - so your apply approach won't work here
user=> (doc r/map)
-------------------------
clojure.core.reducers/map
([f] [f coll])
vs.
user=> (doc map)
-------------------------
clojure.core/map
([f] [f coll] [f c1 c2] [f c1 c2 c3] [f c1 c2 c3 & colls])
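For completeness, here is one way (a sketch, not part of the answer above and not using reducers) to get the position-wise max with single-collection functions only: transpose once with plain core map, then reduce each column.
(defn max-of [args]
  ;; (apply map vector args) transposes [[1 7] [3 5] [7 9] [2 2]] into ([1 3 7 2] [7 5 9 2])
  (mapv #(reduce max %) (apply map vector args)))
(max-of [[1 7] [3 5] [7 9] [2 2]]) ;; => [7 9]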

The answer to the question why has already been given. So let's answer the next question: "what are you trying to do?"
As I've understood your goal (find the maximum elements by position in the tuples) and the intent to do it potentially in parallel (as you are trying to use reducers), this is what you have to do:
(defn max-of [tuples]
(r/fold (fn
([] (first tuples))
([t1 t2] (map max t1 t2)))
(rest tuples)))
user> (max-of [[1 7] [3 5] [7 9] [2 2]])
(7 9)
user> (max-of [[1 2 3] [3 2 1] [4 0 4]])
(4 2 4)
user> (max-of [])
nil
user> (max-of [[1 2 3]])
[1 2 3]
or even better with destructuring:
(defn max-of [[tuple & tuples]]
(r/fold (fn
([] tuple)
([t1 t2] (map max t1 t2)))
tuples))
Update:
For large data you should optimize it and switch to vectors, since r/fold only runs in parallel over vectors (and maps) and falls back to a sequential reduce for other collections:
(defn max-of [[tuple & tuples]]
(r/fold (fn
([] tuple)
([t1 t2] (map max t1 t2)))
(vec tuples)))
user> (max-of (repeat 1000000 [1 2 3 4 5 6 7 8 9 10]))
(1 2 3 4 5 6 7 8 9 10)

Related

How to batch process values on a channel

I'm trying to figure out how to batch incoming requests, do an action with the values in those requests, and then return the result of that action to each request. A slightly simplified version of my problem looks like the following:
Incoming requests make calls to
(defn process
[values]
;; put values on the queue and wait for result, then return the result
...)
Periodically, another function is called
(defn batch-process
[]
;; take up to 10 of the values from the queue, sum those values,
;; then return the result to their process requests
...)
I think I am lacking the vocabulary to figure out how I should be doing this. Any advice or pointers would be appreciated!
I think I figured it out. The key was passing an out-channel along with each value, so batch-process can send the result back on it:
;; assumes something like
;; (require '[clojure.core.async :refer [chan go <! >! >!! <!! take into]])
;; so that take and into below resolve to core.async's channel-returning versions
(defn batch-process
[]
(let [trigger (chan)
in-chan (chan 100)]
(go (loop []
(let [trigger-val (<! trigger)]
(if trigger-val
(let [temp-chan (take (min 10 (.count (.buf in-chan))) in-chan)
chan-vals (<! (into [] temp-chan))
sum-vals (reduce (fn [cur-sum [num out-chan]] (+ cur-sum num))
0
chan-vals)]
(do (doseq [[num out-chan] chan-vals]
(>! out-chan [num sum-vals]))
(recur)))))))
[trigger in-chan]))
(defn process
[value in-chan]
(let [out-chan (chan)]
(>!! in-chan [value out-chan])
(<!! out-chan)))
Then keep track of trigger and in-chan after calling batch-process and pass in-chan to process. Putting a "true" value on trigger will trigger a batch-process.
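A minimal sketch of how the pieces fit together (the let/future/sleep scaffolding here is illustrative only, and it assumes the requires noted in the comment above):
(let [[trigger in-chan] (batch-process)]
  ;; two requests block in process until a batch runs
  (future (println "got" (process 2 in-chan)))
  (future (println "got" (process 5 in-chan)))
  (Thread/sleep 100)
  ;; the periodic task fires a batch by putting a value on trigger
  (>!! trigger true))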
I would propose a different approach: simply accumulate data and flush once the desired count is reached, providing one more channel to force a flush:
(require '[clojure.core.async :as a])
(defn batch-consume [n in]
(let [flush-chan (a/chan)
out-chan (a/chan)]
(a/go-loop [data []]
(a/alt! in ([v] (let [data (conj data v)]
(if (= n (count data))
(do (a/>! out-chan data)
(recur []))
(recur data))))
flush-chan (do (a/>! out-chan data)
(recur []))))
{:out out-chan
:flush flush-chan}))
It could be used like this:
(let [ch (a/chan)
{:keys [out flush]} (batch-consume 3 ch)]
(a/go-loop []
(let [data (a/<! out)]
;; processing batch
(println data (apply + data)))
(recur))
(a/go (dotimes [i 10] ;; automatic flush demo
(a/>! ch i))
(a/>! flush :flush) ;; flushing the pending 10th item
(dotimes [i 3] ;; force flushing by 2 items
(dotimes [j 2]
(a/>! ch (+ (* 10 i) j)))
(a/>! flush :flush))))
output:
;; [0 1 2] 3
;; [3 4 5] 12
;; [6 7 8] 21
;; [9] 9
;; [0 1] 1
;; [10 11] 21
;; [20 21] 41
Notice that if you pass a non-positive n to the batch-consume function, you're left with only the force flush (which could also be useful in some cases):
(let [ch (a/chan)
{:keys [out flush]} (batch-consume -1 ch)]
(a/go-loop []
(let [data (a/<! out)]
(println data (apply + data)))
(recur))
(a/go (dotimes [i 10]
(a/>! ch i))
(a/>! flush :flush)
(dotimes [i 3]
(dotimes [j 2]
(a/>! ch (+ (* 10 i) j)))
(a/>! flush :flush))))
;; [0 1 2 3 4 5 6 7 8 9] 45
;; [0 1] 1
;; [10 11] 21
;; [20 21] 41

Clojure, comparing vectors of integers: why "longer" vector is always "greater"? Is there a remedy?

It works like this:
pcc.core=> (compare [4] [2 2])
-1
pcc.core=> (compare [4 0] [2 2])
1
I want a vector comparator with "string semantics":
pcc.core=> (compare-like-strings [4] [2 2])
1 ;; or 2, for that matter
pcc.core=> (compare-like-strings [4 0] [2 2])
1
Is there a lightweight, nice way to get what I want?
How about:
(defn compare-like-strings [[x & xs] [y & ys]]
(let [c (compare x y)]
(if (and (zero? c) (or xs ys))
(recur xs ys)
c)))
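Quick checks against the cases from the question, plus one with a shared prefix:
(compare-like-strings [4] [2 2])   ;; => 1
(compare-like-strings [4 0] [2 2]) ;; => 1
(compare-like-strings [2] [2 2])   ;; => -1, because nil compares as less once the shorter vector runs out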
So far it's
(defn cmpv-int
  "Compare vectors of integers using 'string semantics'"
  [vx vy]
  (let [res (first (drop-while zero? (map compare vx vy)))
        difference (- (count vx) (count vy))]
    (if res res difference)))
based on Fabian's approach.
Why not use subvec?
(defn compare-like-strings
[vec1 vec2]
(let [len (min (count vec1) (count vec2))]
(compare (subvec vec1 0 len)
(subvec vec2 0 len))))
Comparison seems to work if both vectors are the same length, so let me offer this:
(defn compare-vectors
[a b]
(compare
(reduce conj a (map #{} b))
(reduce conj b (map #{} a))))
This is basically padding the inputs with as many nils as necessary before running the comparison. I like how it looks (and it should fit your requirements perfectly) but I'm not particularly sure I'd recommend it to anyone. ;)
(compare-vectors [2 2] [2 2]) ;; => 0
(compare-vectors [4 2] [2 2]) ;; => 1
(compare-vectors [2 2] [4 2]) ;; => -1
(compare-vectors [4] [2 2]) ;; => 1
EDIT: I probably wouldn't - it's terribly inefficient.
As I said in the comments on Diego's answer, I think the least creative approach is best here: just write a loop, enumerate all the cases, and slog through it. As a bonus, this approach also works for arbitrary sequences, possibly lazy, because we don't need to rely on any vector-specific tricks.
(defn lexicographic-compare
([xs ys]
(lexicographic-compare compare xs ys))
([compare xs ys]
(loop [xs (seq xs) ys (seq ys)]
(if xs
(if ys
(let [c (compare (first xs) (first ys))]
(if (not (zero? c))
c
(recur (next xs), (next ys))))
1)
(if ys
-1
0)))))
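Quick checks of the behaviour described (my examples, not from the original answer):
(lexicographic-compare [4] [2 2])     ;; => 1, the first elements already differ
(lexicographic-compare [4 0] [2 2])   ;; => 1
(lexicographic-compare [2 2] [2 2 1]) ;; => -1, the shorter sequence is a prefix of the longer one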
Maybe like this?
(defn compare-like-strings [a b]
(let [res (first (drop-while zero? (map compare a b)))]
(if (nil? res)
0
res)))
The idea would be to do a pairwise comparison, returning a seq of -1, 0, or 1s and then drop all leading 0s. The first non-zero element is the first element that differs.

Does Clojure recursion work backwards?

I'm currently going through the 4clojure Problem 23
My current solution uses recursion to go through the list and append each element to the end of the result of the same function:
(fn self [x] (if (= x [])
x
(conj (self (rest x)) (first x))
))
But when I run it against [1 2 3] it gives me (1 2 3)
What I think it should be doing through recursion is:
(conj (conj (conj (conj (conj [] 5) 4) 3) 2) 1)
which does return
[5 4 3 2 1]
But it is exactly the opposite, so I must be missing something. Also, I don't understand why one returns a vector and the other returns a list.
When you do (rest v) you're getting a seq (not a vector), and then conj is adding to the front each time (not the back):
user=> (defn self [v] (if (empty? v) v (conj (self (rest v)) (first v))))
#'user/self
user=> (self [1 2 3])
(1 2 3)
user=> (defn self [v] (if (empty? v) [] (conj (self (rest v)) (first v))))
#'user/self
user=> (self [1 2 3])
[3 2 1]
user=>
user=> (rest [1])
()
user=> (conj '() 2)
(2)
user=> (conj '(2) 1)
(1 2)
user=>
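For contrast, conj on a vector adds at the end, which is why the second definition above returns [3 2 1]:
user=> (conj [] 3)
[3]
user=> (conj [3] 2)
[3 2]
user=> (conj [3 2] 1)
[3 2 1]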

How do I map a vector to a map, pushing into it repeated key values?

This is my input data:
[[:a 1 2] [:a 3 4] [:a 5 6] [:b \a \b] [:b \c \d] [:b \e \f]]
I would like to map this into the following:
{:a [[1 2] [3 4] [5 6]] :b [[\a \b] [\c \d] [\e \f]]}
This is what I have so far:
(defn- build-annotation-map [annotation & m]
(let [gff (first annotation)
remaining (rest annotation)
seqname (first gff)
current {seqname [(nth gff 3) (nth gff 4)]}]
(if (not (seq remaining))
m
(let [new-m (merge-maps current m)]
(apply build-annotation-map remaining new-m)))))
(defn- merge-maps [m & ms]
(apply merge-with conj
(when (first ms)
(reduce conj ;this is to avoid [1 2 [3 4 ... etc.
(map (fn [k] {k []}) (keys m))))
m ms))
The above produces:
{:a [[1 2] [[3 4] [5 6]]] :b [[\a \b] [[\c \d] [\e \f]]]}
It seems clear to me that the problem is in merge-maps, specifically with the function passed to merge-with (conj), but after banging my head for a while now, I'm about ready for someone to help me out.
I'm new to Lisp in general and Clojure in particular, so I'd also appreciate comments not specifically addressing the problem: style, brain-dead constructs on my part, etc. Thanks!
Solution (close enough, anyway):
(group-by first [[:a 1 2] [:a 3 4] [:a 5 6] [:b \a \b] [:b \c \d] [:b \e \f]])
=> {:a [[:a 1 2] [:a 3 4] [:a 5 6]], :b [[:b \a \b] [:b \c \d] [:b \e \f]]}
(defn build-annotations [coll]
(reduce (fn [m [k & vs]]
(assoc m k (conj (m k []) (vec vs))))
{} coll))
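A quick check against the input from the question:
(build-annotations [[:a 1 2] [:a 3 4] [:a 5 6] [:b \a \b] [:b \c \d] [:b \e \f]])
;; => {:a [[1 2] [3 4] [5 6]], :b [[\a \b] [\c \d] [\e \f]]}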
Concerning your code, the most significant problem is naming. Firstly, I wouldn't, especially without first understanding your code, have any idea what is meant by annotation, gff, and seqname. current is pretty ambiguous too. In Clojure, remaining would generally be called more, depending on the context, and whether a more specific name should be used.
Within your let bindings gff (first annotation) and remaining (rest annotation), I'd probably take advantage of destructuring, like this:
(let [[first & more] annotation] ...)
If you would rather use (rest annotation) then I'd suggest using next instead, as it will return nil if it's empty, and allow you to write (if-not remaining ...) rather than (if-not (seq remaining) ...).
user> (next [])
nil
user> (rest [])
()
In Clojure, unlike other lisps, the empty list is truthy.
This article shows the standard for idiomatic naming.
Works at least on the given data set.
(defn build-annotations [coll]
(reduce
(fn [result vec]
(let [key (first vec)
val (subvec vec 1)
old-val (get result key [])
conjoined-val (conj old-val val)]
(assoc
result
key
conjoined-val)))
{}
coll))
(build-annotations [[:a 1 2] [:a 3 4] [:a 5 6] [:b \a \b] [:b \c \d] [:b \e \f]])
I am sorry for not offering improvements on your code. I am just learning Clojure and it is easier to solve problems piece by piece instead of understanding a bigger piece of code and finding the problems in it.
Although I have no comments on your code yet, I tried it on my own and came up with this solution:
(defn build-annotations [coll]
(let [anmap (group-by first coll)]
(zipmap (keys anmap) (map #(vec (map (comp vec rest) %)) (vals anmap)))))
Here's my entry leveraging group-by, although several steps in here are really concerned with returning vectors rather than lists. If you drop that requirement, it gets a bit simpler:
(defn f [s]
(let [g (group-by first s)
k (keys g)
v (vals g)
cleaned-v (for [group v]
(into [] (map (comp #(into [] %) rest) group)))]
(zipmap k cleaned-v)))
Depending on what you actually want, you might even be able to get by with just doing group-by.
(defn build-annotations [coll]
(apply merge-with concat
(map (fn [[k & vals]] {k [vals]})
coll)))
So,
(map (fn [[k & vals]] {k [vals]})
coll)
takes a collection of [keys & values] and returns a list of {key [values]}
(apply merge-with concat ...list of maps...)
takes a list of maps, merges them together, and concats the values if a key already exists.
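For the input from the question this yields the grouped values as seqs rather than vectors:
(build-annotations [[:a 1 2] [:a 3 4] [:a 5 6] [:b \a \b] [:b \c \d] [:b \e \f]])
;; => {:a ((1 2) (3 4) (5 6)), :b ((\a \b) (\c \d) (\e \f))}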

Clojure Remove item from Vector at a Specified Location

Is there a way to remove an item from a vector based on its index? As of now I am using subvec to split the vector and recreate it again. I am looking for the reverse of assoc for vectors.
subvec is probably the best way. The Clojure docs say subvec is "O(1) and very fast, as the resulting vector shares structure with the original and no trimming is done". The alternative would be walking the vector and building a new one while skipping certain elements, which would be slower.
Removing elements from the middle of a vector isn't something vectors are necessarily good at. If you have to do this often, consider using a hash-map so you can use dissoc.
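A small sketch of that hash-map route (using a sorted-map keyed by index; this is illustrative, not code from the question):
(def m (sorted-map 0 :a, 1 :b, 2 :c))
(dissoc m 1)        ;; => {0 :a, 2 :c}
(vals (dissoc m 1)) ;; => (:a :c)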
See:
subvec at clojuredocs.org
subvec at clojure.github.io, where the official website points to.
(defn vec-remove
"remove elem in coll"
[pos coll]
(into (subvec coll 0 pos) (subvec coll (inc pos))))
user=> (def a [1 2 3 4 5])
user=> (time (dotimes [n 100000] (vec (concat (take 2 a) (drop 3 a)))))
"Elapsed time: 1185.539413 msecs"
user=> (time (dotimes [n 100000] (vec (concat (subvec a 0 2) (subvec a 3 5)))))
"Elapsed time: 760.072048 msecs"
Yup - subvec is fastest
The vector library clojure.core.rrb-vector provides logarithmic time concatenation and slicing. Assuming you need persistence, and considering what you're asking for, a logarithmic time solution is as fast as theoretically possible. In particular, it is much faster than any solution using clojure's native subvec, as the concat step puts any such solution into linear time.
(require '[clojure.core.rrb-vector :as fv])
(let [s (vec [0 1 2 3 4])]
(fv/catvec (fv/subvec s 0 2) (fv/subvec s 3 5)))
; => [0 1 3 4]
Here is a solution I've found to be nice:
(defn index-exclude
  "Take all indices except those in ex"
  [r ex]
  (filter #(not (ex %)) (range r)))
(defn dissoc-idx [v & ds]
(map v (index-exclude (count v) (into #{} ds))))
(dissoc-idx [1 2 3] 1 2)
'(1)
subvec is fast ; combined with transients it gives even better results.
Using criterium to benchmark:
user=> (def len 5)
user=> (def v (vec (range 0 len)))
user=> (def i (quot len 2))
user=> (def j (inc i))
; using take/drop
user=> (bench
(vec (concat (take i v) (drop j v))))
; Execution time mean : 817,618757 ns
; Execution time std-deviation : 9,371922 ns
; using subvec
user=> (bench
(vec (concat (subvec v 0 i) (subvec v j len))))
; Execution time mean : 604,501041 ns
; Execution time std-deviation : 8,163552 ns
; using subvec and transients
user=> (bench
(persistent!
(reduce conj! (transient (vec (subvec v 0 i))) (subvec v j len))))
; Execution time mean : 307,819500 ns
; Execution time std-deviation : 4,359432 ns
The speedup is even greater at greater lengths ; the same bench with a len equal to 10000 gives means: 1,368250 ms, 953,565863 µs, 314,387437 µs.
Yet another possibility which ought to work with any sequence and not bomb if the index was out of range...
(defn drop-index [col idx]
(filter identity (map-indexed #(if (not= %1 idx) %2) col)))
It may be faster to map the vector over just the indexes you want to keep.
(def a [1 2 3 4 5])
(def indexes [0 1 3 4])
(time (dotimes [n 100000] (vec (concat (subvec a 0 2) (subvec a 3 5)))))
"Elapsed time: 69.401787 msecs"
(time (dotimes [n 100000] (mapv #(a %) indexes)))
"Elapsed time: 28.18766 msecs"

Resources