How does Clojure's laziness interact with calls to Java/impure code?

We stumbled upon an issue in our code today, and couldn't answer this Clojure question:
Does Clojure evaluate impure code (or calls to Java code) strictly or lazily?
It seems that side-effects + lazy sequences can lead to strange behavior.
Here's what we know that led to the question:
Clojure has lazy sequences:
user=> (take 5 (range)) ; (range) returns an infinite list
(0 1 2 3 4)
And Clojure has side-effects and impure functions:
user=> (def value (println 5))
5 ; 5 is printed out to screen
user=> value
nil ; 'value' is assigned nil
Also, Clojure can make calls to Java objects, which may include side-effects.
However, side-effects may interact poorly with lazy evaluation:
user=> (def my-seq (map #(do (println %) %) (range)))
#'user/my-seq
user=> (take 5 my-seq)
(0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
0 1 2 3 4)
So it returned the first 5 elements, but printed the first 32 (0 through 31)!
I assume the same kinds of problems could occur if calling side-effecting methods on Java objects. This could make it really hard to reason about code and figure out what's going to happen.
Ancillary questions:
Is it up to the programmer to watch out for and prevent such situations? (Yes?)
Besides sequences, does Clojure perform strict evaluation? (Yes?)

Many of Clojure's lazy seqs (for example those built from ranges and vectors) are realized in chunks of 32 items, which further reduces the already small per-element overhead. It's not the purist's choice but a practical one. Consult "The Joy of Clojure" for a way to realize one element at a time.
Lazy seqs aren't a perfect match for impure functions for the reason you encountered.
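For contrast, here is what one-element-at-a-time realization looks like, sketched in Python purely as a neutral illustration (this is not Clojure code). Python's built-in map and itertools.islice are lazy and unchunked, so taking five elements runs the side-effecting function exactly five times, which is the behavior the "Joy of Clojure" technique restores for Clojure seqs. The names noisy_identity and calls are illustrative only.

```python
from itertools import count, islice

calls = []  # records every element the mapped function actually touched


def noisy_identity(x):
    """Side-effecting identity, like #(do (println %) %) in the question."""
    calls.append(x)
    print(x)
    return x


# An infinite, lazy, unchunked mapped sequence (analogous to (map f (range))).
my_seq = map(noisy_identity, count())
print(calls)  # nothing realized yet: []

first_five = list(islice(my_seq, 5))  # like (take 5 my-seq), but forced
print(first_five)  # [0, 1, 2, 3, 4]
print(calls)       # [0, 1, 2, 3, 4]: five side effects, no chunk of 32
```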
Otherwise, Clojure evaluates strictly, but macros and special forms are a bit different: built-ins such as if naturally hold off evaluating the branch that isn't taken.

Lazy constructs are evaluated more or less whenever it is convenient for the implementation, no matter what's referenced in them. So, yes, it's up to the programmer to be careful and force realization of lazy seqs when needed.
I have no idea what you mean by strict evaluation.
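A minimal Python sketch of the "force realization" advice, offered only as an analogy: side effects in a lazy pipeline happen when it is consumed, not when it is defined, so forcing it at a point you choose (as doall or dorun would in Clojure) keeps their timing predictable. The log list and record function are illustrative stand-ins for real side effects such as I/O or Java calls.

```python
log = []  # stands in for an external side effect (I/O, a Java call, ...)


def record(x):
    log.append(x)
    return x * 10


lazy = map(record, [1, 2, 3])  # defining the pipeline does no work yet
assert log == []

forced = list(lazy)  # force full realization here, like (doall ...)
# All side effects have now happened, in order, at a point we chose.
print(log)     # [1, 2, 3]
print(forced)  # [10, 20, 30]
```

One difference worth keeping in mind: a Clojure lazy seq caches its realized elements, whereas a Python map object is a one-shot iterator, so the analogy only covers when the effects run.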

Related

Confused about evaluation of lazy sequences

I am experimenting with Clojure's lazy sequences. In order to see when the evaluation of an item would occur, I created a function called square that prints the result before returning it. I then apply this function to a vector using map.
(defn square [x]
(let [result (* x x)]
(println "printing " result)
result))
(def s (map square [1 2 3 4 5])) ; outputs nothing
Here in my declaration of s, the REPL does not output anything. This signals that the computation has not started yet. This appears to be correct. I then do:
(first s)
The function "first" takes only the first item. So I expect that only 1 will be evaluated. My expectation is the REPL will output the following:
printing 1
1
However, the REPL outputted the following instead.
printing 1
printing 4
printing 9
printing 16
printing 25
1
So rather than evaluating only the first item, it seems it evaluates all items, even though I am accessing just the first item.
If the state of a lazy sequence can only be either all values computed or no values computed, then how can it gain the advantages of lazy evaluation? I come from a Scheme background and was expecting behavior more like that of streams. Looks like I am mistaken. Can anyone explain what is going on?
Laziness isn't all-or-nothing, but some implementations of seq operate on 'chunks' of the input sequence. This is the case for vectors, which you can test for with chunked-seq?:
(chunked-seq? (seq [1 2 3 4 5]))
When given a collection, map checks whether the underlying sequence is chunked, and if so evaluates the result on a per-chunk basis instead of an element at a time.
The chunk size is usually 32, so you can see this behaviour by evaluating
(first (map square (vec (range 35))))
This should only display a message for the first 32 items and not the entire sequence.
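The chunking rule above can be modeled in a few lines of Python. This is a simplified sketch, not Clojure's implementation: chunked_map is a hypothetical helper that pulls up to 32 source elements at a time and applies the function to the whole chunk as soon as any element of it is requested. It reproduces the 32 calls for the first element of a 35-element input.

```python
from itertools import islice

CHUNK_SIZE = 32  # the chunk size Clojure uses for vectors and ranges


def chunked_map(f, source, chunk_size=CHUNK_SIZE):
    """Lazily map f over source, realizing one whole chunk at a time."""
    it = iter(source)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        # The entire chunk is computed before its first element is handed out.
        yield from [f(x) for x in chunk]


squared = []  # which inputs square() was actually called on


def square(x):
    squared.append(x)
    return x * x


s = chunked_map(square, range(35))
first = next(s)       # like (first (map square (vec (range 35))))
print(first)          # 0
print(len(squared))   # 32: the whole first chunk was realized
```

Asking for the 33rd element would trigger the second (three-element) chunk, which is why the behaviour looks "all or nothing" for a 5-element vector: the whole vector fits in one chunk.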

Evaluate the expression CLISP

(+ '(1 2 3 4 5) '(3 4 5 6 7))
Evaluate this expression. I don't know much about CLISP.
It returns an error when I run it on CLISP.
Can you explain the reason for the error?
Thanks in advance
It's possible you were expecting this to concatenate
(+ '(1 2 3 4 5) '(3 4 5 6 7)) ==> '(1 2 3 4 5 3 4 5 6 7)
Common Lisp doesn't do this because it wouldn't make sense. In some languages like Python, there is a limited number of infix operators, so overloading + makes a certain amount of sense. However, in Common Lisp, there are infinitely many function names and + is but one of them, so we have different functions, such as append, to do this.
It's also possible you were expecting this to add pointwise.
(+ '(1 2 3 4 5) '(3 4 5 6 7)) ==> '(4 6 8 10 12)
It doesn't do this as it's also against Lisp's philosophy. Adding elements pointwise like this is a feature of tacit languages such as APL or J. These languages go to a lot of trouble to get features like this to work in the most general possible cases. As such, they tend not to focus so much on certain other features, such as the object system or metaprogramming.
This is where Lisp shines: Lisp is a metaprogramming language, so rather than spend all their time developing corner cases for mathematical functions, the designers made + a function that simply adds numbers together and does nothing more, and then spent the bulk of their development time making a good macro system. Common Lisp in particular adds to this by having one of the best object systems (with fully generic dispatch) that I've seen. Languages aren't good at everything, so the best languages have to define a philosophy and stick to it. Accepting all kinds of input simply isn't Lisp's philosophy; metaprogramming is.
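The two readings are easy to contrast in Python, which the answer mentions as a language that overloads + on sequences. In Common Lisp the corresponding operations are (append xs ys) for concatenation and (mapcar #'+ xs ys) for pointwise addition; the Python lines below are only an analogy to make the distinction concrete.

```python
from operator import add

xs = [1, 2, 3, 4, 5]
ys = [3, 4, 5, 6, 7]

# Python overloads + on lists as concatenation, like CL's (append xs ys).
concatenated = xs + ys
print(concatenated)  # [1, 2, 3, 4, 5, 3, 4, 5, 6, 7]

# Pointwise addition needs an explicit mapping step, like (mapcar #'+ xs ys).
pointwise = list(map(add, xs, ys))
print(pointwise)  # [4, 6, 8, 10, 12]
```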
CLISP itself will give you part of the answer:
[1]> (+ '(1 2 3 4 5) '(3 4 5 6 7))
*** - +: (1 2 3 4 5) is not a number
+ is a function that is only defined on numbers, so when the interpreter sees that a list of numbers has been offered as an argument to +, it can't go any further.
The interpreter sees (1 2 3 4 5) as the first argument to + after evaluating '(1 2 3 4 5), i.e. (quote (1 2 3 4 5)), which returns (1 2 3 4 5).
You may be thinking that + is more flexible, sort of like the way that it works in JavaScript. No: it's just the math plus function.
(Why doesn't the interpreter complain about (3 4 5 6 7)? Because it stopped when it saw (1 2 3 4 5), and went to the debugger prompt.)
By the way, that error message is CLISP-specific, but any Common Lisp will give you an error on that input.
EDIT: Per @RainerJoswig's comment, it may be more accurate to say that it is not the interpreter that responds to (1 2 3 4 5) being passed to +, but the code of the + function itself, which then doesn't bother with the second argument, etc. What I wrote was my best guess about the right way to describe the situation, and it partially addresses the OP's question, but I am not a Lisp internals expert.

Destructive sorting in lisp

I'm reading Practical Common Lisp. In chapter 11, it says this about sorting:
Typically you won't care about the unsorted version of a sequence after you've sorted it, so it makes sense to allow SORT and STABLE-SORT to destroy the sequence in the course of sorting it. But it does mean you need to remember to write the following:
(setf my-sequence (sort my-sequence #'string<))
I tried the following code:
CL-USER> (defparameter *a* #( 8 4 3 9 5 9 2 3 9 2 9 4 3))
*A*
CL-USER> *a*
#(8 4 3 9 5 9 2 3 9 2 9 4 3)
CL-USER> (sort *a* #'<)
#(2 2 3 3 3 4 4 5 8 9 9 9 9)
CL-USER> *a*
#(2 2 3 3 3 4 4 5 8 9 9 9 9)
In this code we can see that the variable *a* has been changed by the sort function.
Then why does the book say that it is necessary to do an assignment?
I'm using SBCL + Ubuntu 14.04 + Emacs + Slime
EDIT:
Following the comment of @Sylwester, I added the evaluation of *a* so it's clear that the value has been changed.
It's necessary to do the assignment if you want your variable to contain the proper value of the sorted sequence afterwards. If you don't care about that and only want the return value of sort, you don't need an assignment.
There are two reasons for this. First, an implementation is allowed to use non-destructive copying to implement destructive operations. Secondly, destructive operations on lists can permute the conses such that the value passed into the operation no longer points to the first cons of the sequence.
Here's an example of the second problem (run under SBCL):
(let ((xs (list 4 3 2 1)))
(sort xs '<)
xs)
=> (4)
If we add the assignment:
(let ((xs (list 4 3 2 1)))
(setf xs (sort xs '<))
xs)
=> (1 2 3 4)
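Why can xs end up as (4)? A small Python model shows the mechanism. The Cons class and destructive_sort helper are hypothetical and this is not SBCL's actual algorithm: a destructive list sort relinks the existing cells, so the cell that used to be first (holding 4) ends up last, and a variable still pointing at it sees a one-element list.

```python
class Cons:
    """A mutable cons cell: car holds a value, cdr the next cell or None."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


def from_list(values):
    head = None
    for v in reversed(values):
        head = Cons(v, head)
    return head


def to_list(cell):
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


def destructive_sort(head):
    """Insertion sort that reuses (relinks) the original cells; returns the new head."""
    sorted_head = None
    while head is not None:
        cell, head = head, head.cdr
        if sorted_head is None or cell.car < sorted_head.car:
            cell.cdr = sorted_head
            sorted_head = cell
        else:
            prev = sorted_head
            while prev.cdr is not None and prev.cdr.car <= cell.car:
                prev = prev.cdr
            cell.cdr = prev.cdr
            prev.cdr = cell
    return sorted_head


xs = from_list([4, 3, 2, 1])
result = destructive_sort(xs)  # like (sort xs '<) without setf
print(to_list(xs))      # [4]: xs still points at the old first cell, now last
print(to_list(result))  # [1, 2, 3, 4]: only the return value is the sorted list
```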
The variable can't be changed by the sort function, since the sort function does not know about the variable at all.
All the sort function gets is a vector or a list, but not variables.
In Common Lisp the sort function can be destructive. When it gets a vector for sorting, it can return the same vector or a new one. This is up to the implementation. In one implementation it might return the same vector and in another one it may return a new one. But in any case they will be sorted.
If a variable points to a sequence and you expect it to point to a sorted sequence after sorting, set the variable to the result of the sort operation. Otherwise, after a potentially destructive sort, the variable might not point to the SORT result, but still to an unsorted, or otherwise changed, sequence. In the case of a vector, this CAN be the old, unsorted vector.
REMEMBER
The only thing you can be sure of: the SORT function returns a sorted sequence as its value.

Trying to create a date from a string in Racket - find-seconds VERY slow, week-day year-day required?

I'm trying to parse dates from a large csv file in Racket.
The most straightforward way to do this would be to create a new date struct. But it requires the week-day and year-day parameters. Of course I don't have these, and this seems like a real weakness of the date module that I don't understand.
So, as an alternative, I decided to use find-seconds to convert the raw date vals into seconds and then pass that to seconds->date. This works, but is brutally slow.
(time
(let loop ([n 10000])
(apply find-seconds '(0 0 12 1 1 2012)) ; this takes 3 seconds for 10000
;(date 0 0 12 1 1 2012 0 0 #f 0) ; this is instant
(if (zero? n)
'done
(loop (sub1 n)))))
find-seconds takes 3 seconds to do 10000 values, and I have several million. Creating the date struct is of course instant, but I don't have the week-day, year-day values.
My questions are:
1.) Why is week-day/year-day required for creating date structs?
2.) Is find-seconds supposed to be this slow (ie, bug)? Or am I doing something wrong?
3.) Are there any fast alternatives for parsing dates? I know srfi/19 has a string->date function, but I'd then have to change everything to use that module's struct instead of Racket's built-in one. And it may suffer the same performance hit as find-seconds; I'm not sure.
Although not documented as such, it appears that week-day and year-day are "no-ops" when using the date struct with date->seconds. If I set them both to 0, a date->seconds doesn't complain. I suspect it ignores them:
#lang racket
(require racket/date)
(define d (date 1 ;sc
2 ;mn
3 ;hr
20 ;day
8 ;month
2012 ;year
0 ;weekday <<<
0 ;year-day <<<
#f ;dst?
0 ;time-zone-offset
))
(displayln (seconds->date (date->seconds d)))
;; =>
;; #(struct:date* 1 2 3 20 8 2012 1 232 #t -14400 0 EDT)
;; note the recomputed week-day (1) and year-day (232)
My guess is that the date struct was defined for use with seconds->date, where week-day and year-day are interesting information to provide. Then, rather than define another struct for date->seconds with those fields missing (they're "redundant" for determining the date, which is why you're understandably annoyed :)), the same struct was reused.
Does that help? It's not clear to me from your question what you're trying to do with the date information from the CSV. If you want to convert it to an integer seconds value, I think the above should work for you. If you have something else in mind, perhaps you could explain.
I would say this is an oversight in racket/date.
The call to find-seconds is expensive because it needs to search to find the number of seconds. And since you only need to know the week-day, it is an unnecessary computation.
Consider writing to the Racket mailing list for advice.
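Since the struct only needs week-day and year-day filled in, they can be computed with plain integer arithmetic instead of a search. Here is a sketch in Python used as a neutral illustration; the function names are hypothetical and not part of racket/date. It follows the conventions visible in the date* output above (week-day 0 = Sunday, 0-based year-day) and uses Sakamoto's day-of-week method for the Gregorian calendar.

```python
def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Days before the first of each month in a non-leap year.
_CUMULATIVE_DAYS = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def year_day(day, month, year):
    """0-based day of the year (Jan 1 -> 0)."""
    yd = _CUMULATIVE_DAYS[month - 1] + day - 1
    if month > 2 and is_leap(year):
        yd += 1
    return yd


# Sakamoto's month offsets for the Gregorian calendar.
_OFFSETS = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]


def week_day(day, month, year):
    """Day of the week with 0 = Sunday, 1 = Monday, ..., 6 = Saturday."""
    if month < 3:
        year -= 1
    return (year + year // 4 - year // 100 + year // 400
            + _OFFSETS[month - 1] + day) % 7


print(week_day(20, 8, 2012), year_day(20, 8, 2012))  # 1 232
print(week_day(1, 1, 2012), year_day(1, 1, 2012))    # 0 0
```

Ported to Racket, values like these could be passed straight to the date constructor, which should be far cheaper than find-seconds's search, though this has not been benchmarked here.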

lazy sequence depending on previous elements

I'm learning Clojure and trying to create a lazy infinite sequence of all prime numbers.
I'm aware that there are more efficient algorithms; I'm doing the following more as a POC/lesson than as the ideal solution.
I have a function that, given a sequence of primes, tells me what the next prime is:
(next-prime [2 3 5]) ; result: 7
My lazy sequence must therefore pass itself to this function, then take the result and add that to itself.
My first attempt:
(def lazy-primes
(lazy-cat [2 3 5] (find-next-prime lazy-primes)))
...which results in an IllegalArgumentException: Don't know how to create ISeq from: java.lang.Integer
My second attempt:
(def lazy-primes
(lazy-cat [2 3 5] [(find-next-prime lazy-primes)]))
...which gives me [2 3 5 7] when asked for 10 elements.
Attempt 3:
(def lazy-primes
(lazy-cat [2 3 5]
(conj lazy-primes (find-next-prime lazy-primes))))
(take 10 lazy-primes) ; result: (2 3 5 7 2 3 5 7 2 3)
All of these seem like they should work (or at least, should work given that the preceding didn't work). Why am I getting the bogus output for each case?
Reasons why your initial attempts don't work:
(find-next-prime lazy-primes) returns an integer but lazy-cat needs a sequence
[(find-next-prime lazy-primes)] creates a vector (and is hence seqable) but it only gets evaluated once when it is first accessed
conj is adding new primes to the start of the sequence (since lazy-cat, and hence lazy-primes, returns a sequence), which is probably not what you want! It may also confuse find-next-prime, depending on how that is implemented, and there might be a few subtle issues around chunked sequences as well.
You might instead want to use something like:
(defn seq-fn [builder-fn num ss]
(cons
(builder-fn (take num ss))
(lazy-seq (seq-fn builder-fn (inc num) ss))))
(def lazy-primes
(lazy-cat [2 3 5] (seq-fn next-prime 3 lazy-primes)))
A bit complicated, but basically what I'm doing is using the higher-order helper function to provide a closure over a set of parameters that includes the number of primes created so far, so that it can generate the next prime incrementally at each step.
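The same self-referential idea can be sketched with a Python generator, as an analogy only. next_prime here is a hypothetical trial-division stand-in for the question's function; each step hands it the primes produced so far, much as seq-fn passes (take num ss) back into builder-fn.

```python
from itertools import islice


def next_prime(primes):
    """Hypothetical helper: smallest prime greater than primes[-1],
    found by trial division against the known primes."""
    candidate = primes[-1] + 1
    while any(candidate % p == 0 for p in primes if p * p <= candidate):
        candidate += 1
    return candidate


def lazy_primes():
    """Infinite lazy sequence whose next element depends on all earlier ones."""
    found = [2, 3, 5]  # seed, like the [2 3 5] in lazy-cat
    yield from found
    while True:
        p = next_prime(found)  # feed the sequence so far back into the builder
        found.append(p)
        yield p


print(list(islice(lazy_primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```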
p.s. as I'm sure you are aware there are faster algorithms for generating primes! I'm assuming that this is intended primarily as an exercise in Clojure and the use of lazy sequences, in which case all well and good! But if you really care about generating lots of primes I'd recommend taking a look at the Sieve of Atkin
Alternatively, you could use iterate, the built-in function that lazily feeds the output of a function back into that function:
clojure.core/iterate
([f x])
Returns a lazy sequence of x, (f x), (f (f x)) etc.
f must be free of side-effects
In order to make it work, the next-prime function should concatenate its result to its input and return the concatenation.
Then you can call (nth (iterate list-primes [2]) 99), where list-primes is that modified function, to get a vector of the first 100 primes.
With your next-prime function you can generate a lazy sequence of all primes with the following snippet of code:
(def primes (map peek (iterate #(conj % (next-prime %)) [2])))
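The iterate/peek one-liner translates almost directly into Python. In this sketch, iterate is a hypothetical helper mirroring clojure.core/iterate, and next_prime is again a trial-division stand-in for the question's function. Each state is the collection of primes found so far, and taking the last element of each state plays the role of (map peek ...).

```python
from itertools import islice


def iterate(f, x):
    """Lazily yield x, f(x), f(f(x)), ... like clojure.core/iterate."""
    while True:
        yield x
        x = f(x)


def next_prime(primes):
    """Stand-in: smallest prime above primes[-1], by trial division."""
    candidate = primes[-1] + 1
    while any(candidate % p == 0 for p in primes if p * p <= candidate):
        candidate += 1
    return candidate


# Each state is a new tuple, one prime longer (like #(conj % (next-prime %))).
states = iterate(lambda ps: ps + (next_prime(ps),), (2,))
primes = (ps[-1] for ps in states)  # like (map peek ...)

print(list(islice(primes, 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Unlike conj on a Clojure vector, ps + (...) copies the whole tuple at every step, so this sketch is quadratic; it is meant to show the shape of the solution, not its performance.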
The combination you are looking for is concat + lazy-seq + local fn.
Take a look at the implementation of Eratosthenes' Sieve in the Clojure Contrib libraries: https://github.com/richhickey/clojure-contrib/blob/78ee9b3e64c5ac6082fb223fc79292175e8e4f0c/src/main/clojure/clojure/contrib/lazy_seqs.clj#L66
One more word, though: this implementation uses a more sophisticated algorithm for the Sieve in a functional language.
Another implementation for Clojure can be found on Rosetta Code. However, I don't like that one, as it uses atoms, which you don't need to solve this problem in Clojure.
