I'm a bit lost with the usage of transients in Clojure. Any help would be appreciated.
The sample code:
(defn test-transient [v]
(let [b (transient [])]
(for [x v] (conj! b x))
(persistent! b)))
user> (test-transient [1 2 3])
[]
I tried to make it persistent before returning, and the result is:
(defn test-transient2 [v]
(let [b (transient [])]
(for [x v] (conj! b x))
(persistent! b)
b))
user> (test-transient2 [1 2 3])
#<TransientVector clojure.lang.PersistentVector$TransientVector#1dfde20>
But if I use conj! separately it seems to work OK:
(defn test-transient3 [v]
(let [b (transient [])]
(conj! b 0)
(conj! b 1)
(conj! b 2)
(persistent! b)))
user> (test-transient3 [1 2 3])
[0 1 2]
Does for have some constraint? If so, how can I copy values from a persistent vector to a transient?
Thank you.
Transients aren't supposed to be bashed in-place like that. Your last example only works due to implementation details which you shouldn't rely on.
The reason why for doesn't work is that it is lazy and the conj! calls are never executed, but that is beside the point, as you shouldn't work with transients that way anyway.
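To see the laziness in isolation, here is a small sketch that swaps the transient for an atom (the atom and the name log are only for illustration, they are not part of the question): the body of for never runs when its result is discarded, while doseq runs it eagerly.

```clojure
;; Illustration only: `log` is a hypothetical atom used to observe side effects.
(def log (atom []))

;; `for` builds a lazy sequence; nothing consumes it, so the body never runs.
(let [_ (for [x [1 2 3]] (swap! log conj x))]
  @log)
;; => []

;; `doseq` is eager and exists for side effects.
(doseq [x [1 2 3]] (swap! log conj x))
@log
;; => [1 2 3]
```

Even with doseq, ignoring the value returned by conj! would still be wrong for transients.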
You should use conj! the same way as you would use the "regular" conj with immutable vectors - by using the return value.
What you are trying to do could be accomplished like this:
(defn test-transient [v]
(let [t (transient [])]
(persistent! (reduce conj! t v))))
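If an explicit loop reads more clearly than reduce, the same idea can be sketched with loop/recur. The function name copy-via-transient is made up for this example; the point is only that every conj! result is passed on to the next step.

```clojure
;; Sketch: thread the value returned by conj! through each iteration.
(defn copy-via-transient [v]
  (loop [t  (transient [])
         xs (seq v)]
    (if xs
      (recur (conj! t (first xs)) (next xs)) ; always use the returned transient
      (persistent! t))))

(copy-via-transient [1 2 3])
;; => [1 2 3]
```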
I know this is a recurring question (here, here, and more), and I know that the problem is related to creating lazy sequences, but I can't see why it fails.
The problem: I had written a (not very nice) quicksort algorithm to sort strings that uses loop/recur. But applied to 10000 elements, I get a StackOverflowError:
(defn qsort [list]
(loop [[current & todo :as all] [list] sorted []]
(cond
(nil? current) sorted
(or (nil? (seq current)) (= (count current) 1)) (recur todo (concat sorted current))
:else (let [[pivot & rest] current
pred #(> (compare pivot %) 0)
lt (filter pred rest)
gte (remove pred rest)
work (list* lt [pivot] gte todo)]
(recur work sorted)))))
I used in this way:
(defn tlfnum [] (str/join (repeatedly 10 #(rand-int 10))))
(defn tlfbook [n] (repeatedly n #(tlfnum)))
(time (count (qsort (tlfbook 10000))))
And this is part of the stack trace:
[clojure.lang.LazySeq seq "LazySeq.java" 49]
[clojure.lang.RT seq "RT.java" 521]
[clojure.core$seq__4357 invokeStatic "core.clj" 137]
[clojure.core$concat$fn__4446 invoke "core.clj" 706]
[clojure.lang.LazySeq sval "LazySeq.java" 40]
[clojure.lang.LazySeq seq "LazySeq.java" 49]
[clojure.lang.RT seq "RT.java" 521]
[clojure.core$seq__4357 invokeStatic "core.clj" 137]]}
As far as I know, loop/recur performs tail call optimization, so no stack is used (it is, in fact, an iterative process written using recursive syntax).
Reading other answers, and because of the stack trace, I see there's a problem with concat and adding a doall before concat solves the stack overflow problem. But... why?
Here's part of the code for the two-arity version of concat.
(defn concat [x y]
(lazy-seq
(let [s (seq x)]
,,,))
)
Notice that it uses two other functions, lazy-seq, and seq. lazy-seq is a bit like a lambda, it wraps some code without executing it yet. The code inside the lazy-seq block has to result in some kind of sequence value. When you call any sequence operation on the lazy-seq, then it will first evaluate the code ("realize" the lazy seq), and then perform the operation on the result.
(def lz (lazy-seq
(println "Realizing!")
'(1 2 3)))
(first lz)
;; prints "Realizing!"
;; => 1
Now try this:
(defn lazy-conj [xs x]
(lazy-seq
(println "Realizing" x)
(conj (seq xs) x)))
Notice that it's similar to concat: it calls seq on its first argument and returns a lazy-seq.
(def up-to-hundred
(reduce lazy-conj () (range 100)))
(first up-to-hundred)
;; prints "Realizing 99"
;; prints "Realizing 98"
;; prints "Realizing 97"
;; ...
;; => 99
Even though you asked for only the first element, it still ended up realizing the whole sequence. That's because realizing the outer "layer" results in calling seq on the next "layer", which realizes another lazy-seq, which again calls seq, etc. So it's a chain reaction that realizes everything, and each step consumes a stack frame.
(def up-to-ten-thousand
(reduce lazy-conj () (range 10000)))
(first up-to-ten-thousand)
;;=> java.lang.StackOverflowError
You get the same problem when stacking concat calls. That's why, for instance, (reduce concat ,,,) is always a smell; instead you can use (apply concat ,,,) or (into [] cat ,,,).
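As a rough sketch of the two alternatives, assuming 10000 small vectors as input (the name chunks is made up for this example, and a vector is used as the target of into so the element order is preserved):

```clojure
;; Hypothetical input: 10000 small collections to join.
(def chunks (repeat 10000 [1 2 3]))

;; (first (reduce concat chunks)) would nest thousands of lazy seqs and is
;; likely to throw StackOverflowError with a default JVM stack size,
;; so it is left commented out here.

;; One flat lazy concatenation instead of nested ones:
(count (apply concat chunks))
;; => 30000

;; Eager concatenation with the `cat` transducer:
(count (into [] cat chunks))
;; => 30000
```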
Other lazy operators like filter and map can exhibit the exact same problem. If you really have a lot of transformation steps over a sequence consider using transducers instead.
;; without transducers: many intermediate lazy seqs and deep call stacks
(->> my-seq
(map foo)
(filter bar)
(map baz)
,,,)
;; with transducers: seq processed in a single pass
(sequence (comp
(map foo)
(filter bar)
(map baz))
my-seq)
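A concrete version of the comparison above, with inc, even? and squaring standing in for the placeholder functions foo, bar and baz (these stand-ins are assumptions chosen only to make the snippet runnable):

```clojure
;; The transformation is described once, independent of any input.
(def xform
  (comp (map inc)
        (filter even?)
        (map #(* % %))))

;; Eagerly pour the result into a vector in a single pass.
(into [] xform (range 10))
;; => [4 16 36 64 100]

;; Same elements as the lazy ->> pipeline.
(= (sequence xform (range 10))
   (->> (range 10) (map inc) (filter even?) (map #(* % %))))
;; => true
```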
Arne had a good answer (and, in fact, I'd never noticed cat before!). If you want a simpler solution, you can use the glue function from the Tupelo library:
Gluing Together Like Collections
The concat function can sometimes have rather surprising results:
(concat {:a 1} {:b 2} {:c 3} )
;=> ( [:a 1] [:b 2] [:c 3] )
In this example, the user probably meant to merge the 3 maps into one. Instead, the three maps were mysteriously converted into length-2 vectors, which were then nested inside another sequence.
The conj function can also surprise the user:
(conj [1 2] [3 4] )
;=> [1 2 [3 4] ]
Here the user probably wanted to get [1 2 3 4] back, but instead got a nested vector by mistake.
Instead of having to wonder if the items to be combined will be merged, nested, or converted into another data type, we provide the glue function to always combine like collections together into a result collection of the same type:
; Glue together like collections:
(is (= (glue [ 1 2] '(3 4) [ 5 6] ) [ 1 2 3 4 5 6 ] )) ; all sequential (vectors & lists)
(is (= (glue {:a 1} {:b 2} {:c 3} ) {:a 1 :c 3 :b 2} )) ; all maps
(is (= (glue #{1 2} #{3 4} #{6 5} ) #{ 1 2 6 5 3 4 } )) ; all sets
(is (= (glue "I" " like " \a " nap!" ) "I like a nap!" )) ; all text (strings & chars)
; If you want to convert to a sorted set or map, just put an empty one first:
(is (= (glue (sorted-map) {:a 1} {:b 2} {:c 3}) {:a 1 :b 2 :c 3} ))
(is (= (glue (sorted-set) #{1 2} #{3 4} #{6 5}) #{ 1 2 3 4 5 6 } ))
An Exception will be thrown if the collections to be 'glued' are not all of the same type. The allowable input types are:
all sequential: any mix of lists & vectors (vector result)
all maps (sorted or not)
all sets (sorted or not)
all text: any mix of strings & characters (string result)
I put glue into your code instead of concat and still got a StackOverflowError. So, I also replaced the lazy filter and remove with eager versions keep-if and drop-if to get this result:
(defn qsort [list]
(loop [[current & todo :as all] [list] sorted []]
(cond
(nil? current) sorted
(or (nil? (seq current)) (= (count current) 1))
(recur todo (glue sorted current))
:else (let [[pivot & rest] current
pred #(> (compare pivot %) 0)
lt (keep-if pred rest)
gte (drop-if pred rest)
work (list* lt [pivot] gte todo)]
(recur work sorted)))))
(defn tlfnum [] (str/join (repeatedly 10 #(rand-int 10))))
(defn tlfbook [n] (repeatedly n #(tlfnum)))
(def result
(time (count (qsort (tlfbook 10000)))))
-------------------------------------
Clojure 1.8.0 Java 1.8.0_111
-------------------------------------
"Elapsed time: 1377.321118 msecs"
result => 10000
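If pulling in a library is not an option, a similar effect can probably be had with clojure.core alone. The following is only a sketch under that assumption: qsort-eager is a hypothetical name, filterv replaces the lazy filter/remove, and into replaces concat, so no lazy layers pile up.

```clojure
;; Sketch: same work-list algorithm, but every intermediate result is an
;; eagerly built vector, so nothing is deferred onto the stack.
(defn qsort-eager [coll]
  (loop [[current & todo] [coll]
         sorted []]
    (cond
      (nil? current) sorted
      (<= (count current) 1) (recur todo (into sorted current))
      :else (let [[pivot & more] current
                  pred #(pos? (compare pivot %))
                  lt   (filterv pred more)               ; elements below the pivot
                  gte  (filterv (complement pred) more)] ; everything else
              (recur (list* lt [pivot] gte todo) sorted)))))

(qsort-eager ["31" "07" "19"])
;; => ["07" "19" "31"]
```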
It works like this:
pcc.core=> (compare [4] [2 2])
-1
pcc.core=> (compare [4 0] [2 2])
1
I want a vector comparator with "string semantics":
pcc.core=> (compare-like-strings [4] [2 2])
1 ;; or 2, for that matter
pcc.core=> (compare-like-strings [4 0] [2 2])
1
Is there a lightweight, nice way to get what I want?
How about:
(defn compare-like-strings [[x & xs] [y & ys]]
(let [c (compare x y)]
(if (and (zero? c) (or xs ys))
(recur xs ys)
c)))
So far it's
(defn cmpv-int
  "Compare vectors of integers using 'string semantics'"
  [vx vy]
  (let [res (first (drop-while zero? (map compare vx vy)))
        difference (- (count vx) (count vy))]
    (if res res difference)))
based on Fabian's approach.
Why not use subvec?
(defn compare-like-strings
[vec1 vec2]
(let [len (min (count vec1) (count vec2))]
(compare (subvec vec1 0 len)
(subvec vec2 0 len))))
Comparison seems to work if both vectors are the same length, so let me offer this:
(defn compare-vectors
[a b]
(compare
(reduce conj a (map #{} b))
(reduce conj b (map #{} a))))
This is basically padding the inputs with as many nils as necessary before running the comparison. I like how it looks (and it should fit your requirements perfectly) but I'm not particularly sure I'd recommend it to anyone. ;)
(compare-vectors [2 2] [2 2]) ;; => 0
(compare-vectors [4 2] [2 2]) ;; => 1
(compare-vectors [2 2] [4 2]) ;; => -1
(compare-vectors [4] [2 2]) ;; => 1
EDIT: I probably wouldn't - it's terribly inefficient.
As I said in the comments on Diego's answer, I think the least creative approach is best here: just write a loop, enumerate all the cases, and slog through it. As a bonus, this approach also works for arbitrary sequences, possibly lazy, because we don't need to rely on any vector-specific tricks.
(defn lexicographic-compare
([xs ys]
(lexicographic-compare compare xs ys))
([compare xs ys]
(loop [xs (seq xs) ys (seq ys)]
(if xs
(if ys
(let [c (compare (first xs) (first ys))]
(if (not (zero? c))
c
(recur (next xs), (next ys))))
1)
(if ys
-1
0)))))
Maybe like this?
(defn compare-like-strings [a b]
(let [res (first (drop-while zero? (map compare a b)))]
(if (nil? res)
0
res)))
The idea would be to do a pairwise comparison, returning a seq of -1, 0, or 1s and then drop all leading 0s. The first non-zero element is the first element that differs.
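The intermediate values can be inspected at the REPL. The sample vectors below are made up for illustration; note that map stops at the shorter input, so a strict prefix yields no non-zero element.

```clojure
;; Pairwise comparison of corresponding elements.
(map compare [2 2 5] [2 3 1])
;; => (0 -1 1)

;; Dropping the leading zeros leaves the first difference in front.
(first (drop-while zero? (map compare [2 2 5] [2 3 1])))
;; => -1

;; [1] is a prefix of [1 2]: only one pair is compared and it is equal,
;; so nothing remains and the function above falls back to 0.
(first (drop-while zero? (map compare [1] [1 2])))
;; => nil
```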
In clojure, I would like to write a tail-recursive function that memoizes its intermediate results for subsequent calls.
[EDIT: this question has been rewritten using gcd as an example instead of factorial.]
The memoized gcd (greatest common divisor) could be implemented like this:
(def gcd (memoize (fn [a b]
                    (if (zero? b)
                      a
                      (recur b (mod a b))))))
In this implementation, intermediate results are not memoized for subsequent calls. For example, in order to calculate gcd(9,6), gcd(6,3) is called as an intermediate result. However, gcd(6,3) is not stored in the cache of the memoized function because the recursion point of recur is the anonymous function that is not memoized.
Therefore, if after having called gcd(9,6), we call gcd(6,3) we won't benefit from the memoization.
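The missing cache entries can be made visible with a call counter. This is only an illustrative sketch: the names calls and gcd-counted are not part of the question, and the counter is bumped once per iteration of the inner function.

```clojure
;; Hypothetical instrumentation: count how often the inner body runs.
(def calls (atom 0))

(def gcd-counted
  (memoize (fn [a b]
             (swap! calls inc)
             (if (zero? b)
               a
               (recur b (mod a b))))))

(gcd-counted 9 6) ; body runs for (9 6), (6 3), (3 0)
@calls
;; => 3

(gcd-counted 6 3) ; not cached: body runs again for (6 3), (3 0)
@calls
;; => 5

(gcd-counted 9 6) ; cached: body does not run
@calls
;; => 5
```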
The only solution I can think of would be to use mundane recursion (explicitly call gcd instead of recur), but then we would not benefit from Tail Call Optimization.
Bottom Line
Is there a way to achieve both:
Tail call optimization
Memoization of intermediate results for subsequent calls
Remarks
This question is similar to Combine memoization and tail-recursion. But all the answers there are related to F#. Here, I am looking for an answer in clojure.
This question has been left as an exercise for the reader by The Joy of Clojure (chap 12.4). You can consult the relevant page of the book at http://bit.ly/HkQrio.
In your case it's hard to show memoize doing anything with factorial because the intermediate calls are unique, so I'll write a somewhat contrived example instead, assuming the point is to explore ways to avoid blowing the stack:
(defn stack-popper [n i]
(if (< i n) (* i (stack-popper n (inc i))) 1))
which can then get something out of a memoize:
(def stack-popper
(memoize (fn [n i] (if (< i n) (* i (stack-popper n (inc i))) 1))))
the general approaches to not blowing the stack are:
use tail calls
(def stack-popper
(memoize (fn [n acc] (if (> n 1) (recur (dec n) (* acc (dec n))) acc))))
use trampolines
(def stack-popper
(memoize (fn [n acc]
(if (> n 1) #(stack-popper (dec n) (* acc (dec n))) acc))))
(trampoline (stack-popper 4 1))
use a lazy sequence
(reduce * (range 1 4))
None of these work all the time, though I have yet to hit a case where none of them works. I almost always go for the lazy ones first because I find them to be the most Clojure-like; then I head for tail calling with recur or trampolines.
(defmacro memofn
[name args & body]
`(let [cache# (atom {})]
(fn ~name [& args#]
(let [update-cache!# (fn update-cache!# [state# args#]
(if-not (contains? state# args#)
(assoc state# args#
(delay
(let [~args args#]
~@body)))
state#))]
(let [state# (swap! cache# update-cache!# args#)]
(-> state# (get args#) deref))))))
This will allow a recursive definition of a memoized function, which also caches intermediate results. Usage:
(def fib (memofn fib [n]
(case n
1 1
0 1
(+ (fib (dec n)) (fib (- n 2))))))
(def gcd
(let [cache (atom {})]
(fn [a b]
@(or (@cache [a b])
(let [p (promise)]
(deliver p
(loop [a a b b]
(if-let [p2 (@cache [a b])]
@p2
(do
(swap! cache assoc [a b] p)
(if (zero? b)
a
(recur b (mod a b))))))))))))
There are some concurrency issues (double evaluation, the same problem as with memoize, but worse because of the promises) which may be fixed using @kotarak's advice.
Turning the above code into a macro is left as an exercise to the reader. (Fogus's note was imo tongue-in-cheek.)
Turning this into a macro is really a simple exercise in macrology; note that the body (the last 3 lines) remains unchanged.
Using Clojure's recur you can write factorial using an accumulator that has no stack growth, and just memoize it:
(defn fact
([n]
(fact n 1))
([n acc]
(if (= 1 n)
acc
(recur (dec n)
(* n acc)))))
This is a factorial function implemented with anonymous recursion, a tail call, and memoization of intermediate results. The memoization is integrated with the function, and a reference to the shared buffer (implemented using the Atom reference type) is passed by a lexical closure.
Since the factorial function operates on natural numbers and the arguments for successive results are incremental, a vector seems the better-suited data structure for storing buffered results.
Instead of passing the result of a previous computation as an argument (accumulator) we're getting it from the buffer.
(def ! ; global variable referring to a function
(let [m (atom [1 1 2 6 24])] ; buffer of results
(fn [n] ; factorial function definition
(let [m-count (count @m)] ; number of results in a buffer
(if (< n m-count) ; do we have buffered result for n?
(nth @m n) ; · yes: return it
(loop [cur m-count] ; · no: compute it recursively
(let [r (*' (nth @m (dec cur)) cur)] ; new result
(swap! m assoc cur r) ; store the result
(if (= n cur) ; termination condition:
r ; · base case
(recur (inc cur)))))))))) ; · recursive case
(time (do (! 8000) nil)) ; => "Elapsed time: 154.280516 msecs"
(time (do (! 8001) nil)) ; => "Elapsed time: 0.100222 msecs"
(time (do (! 7999) nil)) ; => "Elapsed time: 0.090444 msecs"
(time (do (! 7999) nil)) ; => "Elapsed time: 0.055873 msecs"
This is my input data:
[[:a 1 2] [:a 3 4] [:a 5 6] [:b \a \b] [:b \c \d] [:b \e \f]]
I would like to map this into the following:
{:a [[1 2] [3 4] [5 6]] :b [[\a \b] [\c \d] [\e \f]]}
This is what I have so far:
(defn- build-annotation-map [annotation & m]
(let [gff (first annotation)
remaining (rest annotation)
seqname (first gff)
current {seqname [(nth gff 3) (nth gff 4)]}]
(if (not (seq remaining))
m
(let [new-m (merge-maps current m)]
(apply build-annotation-map remaining new-m)))))
(defn- merge-maps [m & ms]
(apply merge-with conj
(when (first ms)
(reduce conj ;this is to avoid [1 2 [3 4 ... etc.
(map (fn [k] {k []}) (keys m))))
m ms))
The above produces:
{:a [[1 2] [[3 4] [5 6]]] :b [[\a \b] [[\c \d] [\e \f]]]}
It seems clear to me that the problem is in merge-maps, specifically with the function passed to merge-with (conj), but after banging my head for a while now, I'm about ready for someone to help me out.
I'm new to lisp in general, and clojure in particular, so I also appreciate comments not specifically addressing the problem, but also style, brain-dead constructs on my part, etc. Thanks!
Solution (close enough, anyway):
(group-by first [[:a 1 2] [:a 3 4] [:a 5 6] [:b \a \b] [:b \c \d] [:b \e \f]])
=> {:a [[:a 1 2] [:a 3 4] [:a 5 6]], :b [[:b \a \b] [:b \c \d] [:b \e \f]]}
(defn build-annotations [coll]
(reduce (fn [m [k & vs]]
(assoc m k (conj (m k []) (vec vs))))
{} coll))
Concerning your code, the most significant problem is naming. Firstly, I wouldn't, especially without first understanding your code, have any idea what is meant by annotation, gff, and seqname. current is pretty ambiguous too. In Clojure, remaining would generally be called more, depending on the context, and whether a more specific name should be used.
Within your let statement, gff (first annotation)
remaining (rest annotation), I'd probably take advantage of destructuring, like this:
(let [[first & more] annotation] ...)
If you would rather use (rest annotation) then I'd suggest using next instead, as it will return nil if it's empty, and allow you to write (if-not remaining ...) rather than (if-not (seq remaining) ...).
user> (next [])
nil
user> (rest [])
()
In Clojure, unlike other lisps, the empty list is truthy.
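A quick REPL sketch of what that means in a conditional (the keywords are just labels for illustration):

```clojure
;; rest returns an empty sequence, which is truthy.
(if (rest []) :truthy :falsey)
;; => :truthy

;; next returns nil for an empty remainder, which is falsey.
(if (next []) :truthy :falsey)
;; => :falsey

;; seq turns an empty collection into nil, which is why (seq remaining) works.
(if (seq (rest [])) :truthy :falsey)
;; => :falsey
```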
This article shows the standard for idiomatic naming.
Works at least on the given data set.
(defn build-annotations [coll]
(reduce
(fn [result vec]
(let [key (first vec)
val (subvec vec 1)
old-val (get result key [])
conjoined-val (conj old-val val)]
(assoc
result
key
conjoined-val)))
{}
coll))
(build-annotations [[:a 1 2] [:a 3 4] [:a 5 6] [:b \a \b] [:b \c \d] [:b \e \f]])
I am sorry for not offering improvements on your code. I am just learning Clojure and it is easier to solve problems piece by piece instead of understanding a bigger piece of code and finding the problems in it.
Although I have no comments on your code yet, I tried it on my own and came up with this solution:
(defn build-annotations [coll]
(let [anmap (group-by first coll)]
(zipmap (keys anmap) (map #(vec (map (comp vec rest) %)) (vals anmap)))))
Here's my entry leveraging group-by, although several steps in here are really concerned with returning vectors rather than lists. If you drop that requirement, it gets a bit simpler:
(defn f [s]
(let [g (group-by first s)
k (keys g)
v (vals g)
cleaned-v (for [group v]
(into [] (map (comp #(into [] %) rest) group)))]
(zipmap k cleaned-v)))
Depending what you actually want, you might even be able to get by with just doing group-by.
(defn build-annotations [coll]
  (apply merge-with concat
         (map (fn [[k & vals]] {k [vals]})
              coll)))
So,
(map (fn [[k & vals]] {k [vals]})
     coll)
takes a collection of [keys & values] and returns a list of {key [values]}
(apply merge-with concat ...list of maps...)
takes a list of maps, merges them together, and concats the values if a key already exists.
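If vectors rather than lazy sequences are wanted in the result, one possible variation is a single reduce with update and fnil. This is a sketch assuming Clojure 1.7 or later (for update); the function name build-annotation-vectors is made up for the example.

```clojure
;; Sketch: group rows by their first element, keeping the rest as vectors.
(defn build-annotation-vectors [rows]
  (reduce (fn [m [k & vs]]
            ;; (fnil conj []) starts a fresh vector the first time a key is seen
            (update m k (fnil conj []) (vec vs)))
          {}
          rows))

(build-annotation-vectors
 [[:a 1 2] [:a 3 4] [:a 5 6] [:b \a \b] [:b \c \d] [:b \e \f]])
;; => {:a [[1 2] [3 4] [5 6]], :b [[\a \b] [\c \d] [\e \f]]}
```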
What function can I put as FOO here to yield true at the end? I played with hash-set (only correct for first 2 values), conj, and concat but I know I'm not handling the single-element vs set condition properly with just any of those.
(defn mergeMatches [propertyMapList]
"Take a list of maps and merges them combining values into a set"
(reduce #(merge-with FOO %1 %2) {} propertyMapList))
(def in
(list
{:a 1}
{:a 2}
{:a 3}
{:b 4}
{:b 5}
{:b 6} ))
(def out
{ :a #{ 1 2 3}
:b #{ 4 5 6} })
; this should return true
(= (mergeMatches in) out)
What is the most idiomatic way to handle this?
This'll do:
(let [set #(if (set? %) % #{%})]
#(clojure.set/union (set %) (set %2)))
Rewritten more directly for the example (Alex):
(defn to-set [s]
(if (set? s) s #{s}))
(defn set-union [s1 s2]
(clojure.set/union (to-set s1) (to-set s2)))
(defn mergeMatches [propertyMapList]
(reduce #(merge-with set-union %1 %2) {} propertyMapList))
I didn't write this but it was contributed by @amitrathore on Twitter:
(defn kv [bag [k v]]
(update-in bag [k] conj v))
(defn mergeMatches [propertyMapList]
(reduce #(reduce kv %1 %2) {} propertyMapList))
I wouldn't use merge-with for this,
(defn fnil [f not-found]
(fn [x y] (f (if (nil? x) not-found x) y)))
(defn conj-in [m map-entry]
(update-in m [(key map-entry)] (fnil conj #{}) (val map-entry)))
(defn merge-matches [property-map-list]
(reduce conj-in {} (apply concat property-map-list)))
user=> (merge-matches in)
{:b #{4 5 6}, :a #{1 2 3}}
fnil will be part of core soon so you can ignore the implementation... but it just creates a version of another function that can handle nil arguments. In this case conj will substitute #{} for nil.
So the reduction conjoins the value of every key/value pair in the supplied list of maps onto a set.
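With the fnil that ships in clojure.core (and update, assuming Clojure 1.7 or later), the same idea can be sketched without the hand-written helper. The name merge-matches-core is hypothetical, chosen to avoid clashing with the definitions above.

```clojure
;; Sketch: flatten the maps into entries, then conj each value into a set.
(defn merge-matches-core [property-map-list]
  (reduce (fn [acc [k v]]
            (update acc k (fnil conj #{}) v)) ; nil becomes #{} on first sight
          {}
          (apply concat property-map-list)))

(merge-matches-core [{:a 1} {:a 2} {:a 3} {:b 4} {:b 5} {:b 6}])
;; => {:a #{1 2 3}, :b #{4 5 6}}
```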
Another solution contributed by @wmacgyver on Twitter based on multimaps:
(defn add
"Adds key-value pairs to the multimap."
([mm k v]
(assoc mm k (conj (get mm k #{}) v)))
([mm k v & kvs]
(apply add (add mm k v) kvs)))
(defn mm-merge
"Merges the multimaps, taking the union of values."
[& mms]
(apply (partial merge-with clojure.set/union) mms))
(defn mergeMatches [property-map-list]
(reduce mm-merge (map #(add {} (key (first %)) (val (first %))) property-map-list)))
This seems to work:
(defn FOO [v1 v2]
(if (set? v1)
(apply hash-set v2 v1)
(hash-set v1 v2)))
Not super pretty but it works.
(defn mergeMatches [propertyMapList]
(for [k (set (for [pp propertyMapList] (key (first pp))))]
{k (set (remove nil? (for [pp propertyMapList] (k pp))))}))