I'm trying to concatenate a Seq of Seqs.
I can do it with apply concat.
user=> (count (apply concat (repeat 3000 (repeat 3000 true))))
9000000
However, from my limited knowledge, I would assume that the use of apply forces the lazy Seq to be realised, and that doesn't seem right for very large inputs. I'd rather do this lazily if I can.
So I thought that using reduce would do the job.
user=> (count (reduce concat (repeat 3000 (repeat 3000 true))))
But this results in
StackOverflowError clojure.lang.RT.seq (RT.java:484)
I'm surprised because I would have thought that the semantics of reduce would mean it was tail-call recursive.
Two questions:
Is apply the best way to do this?
Is reduce generally inappropriate for large inputs?
Use apply. When the function argument is lazy, so is apply.
Let's check that with a counting side effect on the underlying sub-sequences:
(def counter (atom 0))
(def ss (repeatedly 3000
(fn [] (repeatedly 3000
(fn [] (do (swap! counter inc) true))))))
(def foo (apply concat ss))
so.core=> #counter
0
so.core=> (dorun (take 1 foo))
nil
so.core=> #counter
1
so.core=> (dorun (take 3001 foo))
nil
so.core=> #counter
3001
reduce with a large number of concats overflows due to thunk composition
Lazy sequences, such as those produced by concat are implemented with thunks, delayed function calls. When you concat the result of a concat you have nested a thunk within another thunk. In your function, the nesting goes 3000 deep and thus the stack is overflowed as soon as the first item is requested and the 3000 nested thunks are unwound.
so.core=> (def bar (reduce concat (repeat 3000 (repeat 3000 true))))
#'so.core/bar
so.core=> (first bar)
StackOverflowError clojure.lang.LazySeq.seq (LazySeq.java:49)
The implementation of lazy-sequences will in general unwind nested thunks trampoline style when seqed and not blow the stack:
so.core=> (loop [lz [1], n 0]
(if (< n 3000) (recur (lazy-seq lz) (inc n)) lz))
(1)
However, if you call seq within the lazy-sequence on the unrealized portion while realizing it...
so.core=> (loop [lz [1], n 0]
(if (< n 3000) (recur (lazy-seq (seq lz)) (inc n)) lz))
StackOverflowError so.core/eval1405/fn--1406 (form-init584039696026177116.clj:1)
so.core=> (pst 3000)
StackOverflowError
so.core/eval1619/fn--1620 (form-init584039696026177116.clj:2)
clojure.lang.LazySeq.sval (LazySeq.java:40)
clojure.lang.LazySeq.seq (LazySeq.java:49)
clojure.lang.RT.seq (RT.java:484)
clojure.core/seq (core.clj:133)
so.core/eval1619/fn--1620 (form-init584039696026177116.clj:2)
clojure.lang.LazySeq.sval (LazySeq.java:40)
clojure.lang.LazySeq.seq (LazySeq.java:49)
clojure.lang.RT.seq (RT.java:484)
clojure.core/seq (core.clj:133)
... (repeatedly)
Then you end up building seq stack frames. The implementation of concat is such. Examine the stack trace for your StackOverflowError with concat and you will see similar.
I can suggest a way to avoid the problem. The reduce function isn't the problem here; concat is.
Take a look at:
https://stuartsierra.com/2015/04/26/clojure-donts-concat
Instead of using concat use into
(count (reduce into (repeat 3000 (repeat 3000 true))))
9000000
Related
Here is an example:
;; Helper function for marking multiples of a number as 0
(def mark (fn [[x & xs] k m]
(if (= k m)
(cons 0 (mark xs 1 m))
(cons x (mark xs (inc k) m))
)))
;; Sieve of Eratosthenes
(defn sieve
[x & xs]
(if (= x 0)
(sieve xs)
(cons x (sieve (mark xs 1 x)))
))
(take 10 (lazy-seq (sieve (iterate inc 2))))
It produces a StackOverflowError.
There are a couple of issues here. First, as pointed out in the other answer, your mark and sieve functions don't have terminating conditions. It looks like they are designed to work with infinite sequences, but if you passed a finite-length sequence they'd keep going off the end.
The deeper problem here is that it looks like you're trying to have a function create a lazy infinite sequence by recursively calling itself. However, cons is not lazy in any way; it is a pure function call, so the recursive calls to mark and sieve are invoked immediately. Wrapping the outer-most call to sieve in lazy-seq only serves to defer the initial call; it does not make the entire sequence lazy. Instead, each call to cons must be wrapped in its own lazy sequence.
For instance:
(defn eager-iterate [f x]
(cons x (eager-iterate f (f x))))
(take 3 (eager-iterate inc 0)) ; => StackOverflowError
(take 3 (lazy-seq (eager-iterate inc 0))) ; => Still a StackOverflowError
Compare this with the actual source code of iterate:
(defn iterate
"Returns a lazy sequence of x, (f x), (f (f x)) etc. f must be free of side-effects"
{:added "1.0"
:static true}
[f x] (cons x (lazy-seq (iterate f (f x)))))
Putting it together, here's an implementation of mark that works correctly for finite sequences and preserves laziness for infinite sequences. Fixing sieve is left as an exercise for the reader.
(defn mark [[x :as xs] k m]
(lazy-seq
(when (seq xs)
(if (= k m)
(cons 0 (mark (next xs) 1 m))
(cons x (mark (next xs) (inc k) m))))))
(mark (range 4 14) 1 3)
; => (4 5 0 7 8 0 10 11 0 13)
(take 10 (mark (iterate inc 4) 1 3))
; => (4 5 0 7 8 0 10 11 0 13)
Need terminating conditions
The problem here is both your mark and sieve functions have no terminating conditions. There must be some set of inputs for which each function does not call itself, but returns an answer. Additionally, every set of (valid) inputs to these functions should eventually resolve to a non-recursive return value.
But even if you get it right...
I'll add that even if you succeed in creating the correct terminating conditions, there is still the possibility of having a stack overflow if the depth of the recursion in too large. This can be mitigated to some extent by increasing the JVM stack size, but this has it's limits.
A way around this for some functions is to use tail call optimization. Some recursive functions are tail recursive, meaning that all recursive calls to the function being defined within it's definition are in the tail call position (are the final function called in the definition body). For example, in your sieve function's (= x 0) case, sieve is the tail call, since the result of sieve doesn't get passed into any other function. However, in the case that (not (= x 0)), the result of calling sieve gets passed to cons, so this is not a tail call. When a function is fully tail recursive, it is possible to behind the scenes transform the function definition into a looping construct which avoids consuming the stack. In clojure this is possible by using recur in the function definition instead of the function name (there is also a loop construct which can sometimes be helpful). Again, because not all recursive functions are tail recursive, this isn't a panacea. But when they are it's good to know that you can do this.
Thanks to #Alex's answer I managed to come up with a working lazy solution:
;; Helper function for marking mutiples of a number as 0
(defn mark [[x :as xs] k m]
(lazy-seq
(when-not (empty? xs)
(if (= k m)
(cons 0 (mark (rest xs) 1 m))
(cons x (mark (rest xs) (inc k) m))))))
;; Sieve of Eratosthenes
(defn sieve
[[x :as xs]]
(lazy-seq
(when-not (empty? xs)
(if (= x 0)
(sieve (rest xs))
(cons x (sieve (mark (rest xs) 1 x)))))))
I was adviced by someone else to use rest instead of next.
The ClojureDocs page for lazy-seq gives an example of generating a lazy-seq of all positive numbers:
(defn positive-numbers
([] (positive-numbers 1))
([n] (cons n (lazy-seq (positive-numbers (inc n))))))
This lazy-seq can be evaluated for pretty large indexes without throwing a StackOverflowError (unlike the sieve example on the same page):
user=> (nth (positive-numbers) 99999999)
100000000
If only recur can be used to avoid consuming stack frames in a recursive function, how is it possible this lazy-seq example can seemingly call itself without overflowing the stack?
A lazy sequence has the rest of the sequence generating calculation in a thunk. It is not immediately called. As each element (or chunk of elements as the case may be) is requested, a call to the next thunk is made to retrieve the value(s). That thunk may create another thunk to represent the tail of the sequence if it continues. The magic is that (1) these special thunks implement the sequence interface and can transparently be used as such and (2) each thunk is only called once -- its value is cached -- so the realized portion is a sequence of values.
Here it is the general idea without the magic, just good ol' functions:
(defn my-thunk-seq
([] (my-thunk-seq 1))
([n] (list n #(my-thunk-seq (inc n)))))
(defn my-next [s] ((second s)))
(defn my-realize [s n]
(loop [a [], s s, n n]
(if (pos? n)
(recur (conj a (first s)) (my-next s) (dec n))
a)))
user=> (-> (my-thunk-seq) first)
1
user=> (-> (my-thunk-seq) my-next first)
2
user=> (my-realize (my-thunk-seq) 10)
[1 2 3 4 5 6 7 8 9 10]
user=> (count (my-realize (my-thunk-seq) 100000))
100000 ; Level stack consumption
The magic bits happen inside of clojure.lang.LazySeq defined in Java, but we can actually do the magic directly in Clojure (implementation that follows for example purposes), by implementing the interfaces on a type and using an atom to cache.
(deftype MyLazySeq [thunk-mem]
clojure.lang.Seqable
(seq [_]
(if (fn? #thunk-mem)
(swap! thunk-mem (fn [f] (seq (f)))))
#thunk-mem)
;Implementing ISeq is necessary because cons calls seq
;on anyone who does not, which would force realization.
clojure.lang.ISeq
(first [this] (first (seq this)))
(next [this] (next (seq this)))
(more [this] (rest (seq this)))
(cons [this x] (cons x (seq this))))
(defmacro my-lazy-seq [& body]
`(MyLazySeq. (atom (fn [] ~#body))))
Now this already works with take, etc., but as take calls lazy-seq we'll make a my-take that uses my-lazy-seq instead to eliminate any confusion.
(defn my-take
[n coll]
(my-lazy-seq
(when (pos? n)
(when-let [s (seq coll)]
(cons (first s) (my-take (dec n) (rest s)))))))
Now let's make a slow infinite sequence to test the caching behavior.
(defn slow-inc [n] (Thread/sleep 1000) (inc n))
(defn slow-pos-nums
([] (slow-pos-nums 1))
([n] (cons n (my-lazy-seq (slow-pos-nums (slow-inc n))))))
And the REPL test
user=> (def nums (slow-pos-nums))
#'user/nums
user=> (time (doall (my-take 10 nums)))
"Elapsed time: 9000.384616 msecs"
(1 2 3 4 5 6 7 8 9 10)
user=> (time (doall (my-take 10 nums)))
"Elapsed time: 0.043146 msecs"
(1 2 3 4 5 6 7 8 9 10)
Keep in mind that lazy-seq is a macro, and therefore does not evaluate its body when your positive-numbers function is called. In that sense, positive-numbers isn't truly recursive. It returns immediately, and the inner "recursive" call to positive-numbers doesn't happen until the seq is consumed.
user=> (source lazy-seq)
(defmacro lazy-seq
"Takes a body of expressions that returns an ISeq or nil, and yields
a Seqable object that will invoke the body only the first time seq
is called, and will cache the result and return it on all subsequent
seq calls. See also - realized?"
{:added "1.0"}
[& body]
(list 'new 'clojure.lang.LazySeq (list* '^{:once true} fn* [] body)))
I think the trick is that the producer function (positive-numbers) isn't getting called recursively, it doesn't accumulate stack frames as if it was called with basic recursion Little-Schemer style, because LazySeq is invoking it as needed for the individual entries in the sequence. Once a closure gets evaluated for an entry then it can be discarded. So stack frames from previous invocations of the function can get garbage-collected as the code churns through the sequence.
I would like to reduce the following seq:
({0 "Billie Verpooten"}
{1 "10:00"}
{2 "17:00"}
{11 "11:10"}
{12 "19:20"})
to
{:name "Billie Verpooten"
:work {:1 ["10:00" "17:00"]
:11 ["11:10" "19:20"]}}
but I have no idea to do this.
I was think about a recursive function that uses deconstruction.
There's a function for reducing a sequence to something in the standard library, and it's called reduce. Though in your specific case, it seems appropriate to remove the special case key 0 first and partition the rest into the pairs of entries that they're meant to be.
The following function gives the result described in your question:
(defn build-map [maps]
(let [entries (map first maps)
key-zero? (comp zero? key)]
{:name (val (first (filter key-zero? entries)))
:work (reduce (fn [acc [[k1 v1] [k2 v2]]]
(assoc acc (keyword (str k1)) [v1 v2]))
{}
(partition 2 (remove key-zero? entries)))}))
Just for variety here is a different way of expressing an answer by threading sequence manipulation functions:
user> (def data '({0 "Billie Verpooten"}
{1 "10:00"}
{2 "17:00"}
{11 "11:10"}
{12 "19:20"}))
user> {:name (-> data first first val)
:work (as-> data x
(rest x)
(into {} x)
(zipmap (map first (partition 1 2 (keys x)))
(partition 2 (vals x))))}
teh as-> threading macro is new to Clojure 1.5 and makes expressing this sort of function a bit more concise.
How do I do recursion in an anonymous function, without using tail recursion?
For example (from Vanderhart 2010, p 38):
(defn power
[number exponent]
(if (zero? exponent)
1
(* number (power number (- exponent 1)))))
Let's say I wanted to do this as an anonymous function. And for some reason I didn't want to use tail recursion. How would I do it? For example:
( (fn [number exponent] ......))))) 5 3)
125
Can I use loop for this, or can loop only be used with recur?
The fn special form gives you the option to provide a name that can be used internally for recursion.
(doc fn)
;=> (fn name? [params*] exprs*)
So, add "power" as the name to complete your example.
(fn power [n e]
(if (zero? e)
1
(* n (power n (dec e)))))
Even if the recursion happened in the tail position, it will not be optimized to replace the current stack frame. Clojure enforces you to be explicit about it with loop/recur and trampoline.
I know that in Clojure there's syntactic support for "naming" an anonymous function, as other answers have pointed out. However, I want to show a first-principles approach to solve the question, one that does not depend on the existence of special syntax on the programming language and that would work on any language with first-order procedures (lambdas).
In principle, if you want to do a recursive function call, you need to refer to the name of the function so "anonymous" (i.e. nameless functions) can not be used for performing a recursion ... unless you use the Y-Combinator. Here's an explanation of how it works in Clojure.
Let me show you how it's used with an example. First, a Y-Combinator that works for functions with a variable number of arguments:
(defn Y [f]
((fn [x] (x x))
(fn [x]
(f (fn [& args]
(apply (x x) args))))))
Now, the anonymous function that implements the power procedure as defined in the question. Clearly, it doesn't have a name, power is only a parameter to the outermost function:
(fn [power]
(fn [number exponent]
(if (zero? exponent)
1
(* number (power number (- exponent 1))))))
Finally, here's how to apply the Y-Combinator to the anonymous power procedure, passing as parameters number=5 and exponent=3 (it's not tail-recursive BTW):
((Y
(fn [power]
(fn [number exponent]
(if (zero? exponent)
1
(* number (power number (- exponent 1)))))))
5 3)
> 125
fn takes an optional name argument that can be used to call the function recursively.
E.g.:
user> ((fn fact [x]
(if (= x 0)
1
(* x (fact (dec x)))))
5)
;; ==> 120
Yes you can use loop for this. recur works in both loops and fns
user> (loop [result 5 x 1] (if (= x 3) result (recur (* result 5) (inc x))))
125
an idomatic clojure solution looks like this:
user> (reduce * (take 3 (repeat 5)))
125
or uses Math.pow() ;-)
user> (java.lang.Math/pow 5 3)
125.0
loop can be a recur target, so you could do it with that too.
What am I doing wrong? Simple recursion a few thousand calls deep throws a StackOverflowError.
If the limit of Clojure recursions is so low, how can I rely on it?
(defn fact[x]
(if (<= x 1) 1 (* x (fact (- x 1)) )))
user=> (fact 2)
2
user=> (fact 4)
24
user=> (fact 4000)
java.lang.StackOverflowError (NO_SOURCE_FILE:0)
Here's another way:
(defn factorial [n]
(reduce * (range 1 (inc n))))
This won't blow the stack because range returns a lazy seq, and reduce walks across the seq without holding onto the head.
reduce makes use of chunked seqs if it can, so this can actually perform better than using recur yourself. Using Siddhartha Reddy's recur-based version and this reduce-based version:
user> (time (do (factorial-recur 20000) nil))
"Elapsed time: 2905.910426 msecs"
nil
user> (time (do (factorial-reduce 20000) nil))
"Elapsed time: 2647.277182 msecs"
nil
Just a slight difference. I like to leave my recurring to map and reduce and friends, which are more readable and explicit, and use recur internally a bit more elegantly than I'm likely to do by hand. There are times when you need to recur manually, but not that many in my experience.
The stack size, I understand, depends on the JVM you are using as well as the platform. If you are using the Sun JVM, you can use the -Xss and -XThreadStackSize parameters to set the stack size.
The preferred way to do recursion in Clojure though is to use loop/recur:
(defn fact [x]
(loop [n x f 1]
(if (= n 1)
f
(recur (dec n) (* f n)))))
Clojure will do tail-call optimization for this; that ensures that you’ll never run into StackOverflowErrors.
And due defn implies a loop binding, you could omit the loop expression, and use its arguments as the function argument. And to make it a 1 argument function, use the multiary caracteristic of functions:
(defn fact
([n] (fact n 1))
([n f]
(if (<= n 1)
f
(recur (dec n) (* f n)))))
Edit: For the record, here is a Clojure function that returns a lazy sequence of all the factorials:
(defn factorials []
(letfn [(factorial-seq [n fact]
(lazy-seq
(cons fact (factorial-seq (inc n) (* (inc n) fact)))))]
(factorial-seq 1 1)))
(take 5 (factorials)) ; will return (1 2 6 24 120)
Clojure has several ways of busting recursion
explicit tail calls with recur. (they must be truely tail calls so this wont work)
Lazy sequences as mentioned above.
trampolining where you return a function that does the work instead of doing it directly and then call a trampoline function that repeatedly calls its result until it turnes into a real value instead of a function.
(defn fact ([x] (trampoline (fact (dec x) x)))
([x a] (if (<= x 1) a #(fact (dec x) (*' x a)))))
(fact 42)
620448401733239439360000N
memoizing the the case of fact this can really shorten the stack depth, though it is not generally applicable.
ps: I dont have a repl on me so would someone kindly test-fix the trapoline fact function?
As I was about to post the following, I see that it's almost the same as the Scheme example posted by JasonTrue... Anyway, here's an implementation in Clojure:
user=> (defn fact[x]
((fn [n so_far]
(if (<= n 1)
so_far
(recur (dec n) (* so_far n)))) x 1))
#'user/fact
user=> (fact 0)
1
user=> (fact 1)
1
user=> (fact 2)
2
user=> (fact 3)
6
user=> (fact 4)
24
user=> (fact 5)
120
etc.
As l0st3d suggested, consider using recur or lazy-seq.
Also, try to make your sequence lazy by building it using the built-in sequence forms as a opposed to doing it directly.
Here's an example of using the built-in sequence forms to create a lazy Fibonacci sequence (from the Programming Clojure book):
(defn fibo []
(map first (iterate (fn [[a b]] [b (+ a b)]) [0 1])))
=> (take 5 (fibo))
(0 1 1 2 3)
The stack depth is a small annoyance (yet configurable), but even in a language with tail recursion like Scheme or F# you'd eventually run out of stack space with your code.
As far as I can tell, your code is unlikely to be tail recursion optimized even in an environment that supports tail recursion transparently. You would want to look at a continuation-passing style to minimize stack depth.
Here's a canonical example in Scheme from Wikipedia, which could be translated to Clojure, F# or another functional language without much trouble:
(define factorial
(lambda (n)
(let fact ([i n] [acc 1])
(if (zero? i)
acc
(fact (- i 1) (* acc i))))))
Another, simple recursive implementation simple could be this:
(defn fac [x]
"Returns the factorial of x"
(if-not (zero? x) (* x (fac (- x 1))) 1))
To add to Siddhartha Reddy's answer, you can also borrow the Factorial function form Structure And Interpretation of Computer Programs, with some Clojure-specific tweaks. This gave me pretty good performance even for very large factorial calculations.
(defn fac [n]
((fn [product counter max-count]
(if (> counter max-count)
product
(recur (apply *' [counter product])
(inc counter)
max-count)))
1 1 n))
Factorial numbers are by their nature very big. I'm not sure how Clojure deals with this (but I do see it works with java), but any implementation that does not use big numbers will overflow very fast.
Edit: This is without taking into consideration the fact that you are using recursion for this, which is also likely to use up resources.
Edit x2: If the implementation is using big numbers, which, as far as I know, are usually arrays, coupled with recursion (one big number copy per function entry, always saved on the stack due to the function calls) would explain a stack overflow. Try doing it in a for loop to see if that is the problem.