What determines when a collection is created?

If I understand correctly, Clojure can return lists (as in other Lisps) but also vectors and sets.
What I don't really get is why a collection is not always returned.
For example, take the following code:
(loop [x 128]
  (when (> x 1)
    (println x)
    (recur (/ x 2))))
It does print 128 64 32 16 8 4 2, but only because println is called, and println has the side effect of printing something.
So I tried replacing it with this (removing the println):
(loop [x 128]
  (when (> x 1)
    x
    (recur (/ x 2))))
And I was expecting to get some collection (presumably a list), like this:
(128 64 32 16 8 4 2)
but instead I'm getting nil.
I don't understand what determines whether a collection is created, or how you switch from one case to the other. Also, seeing that Clojure encourages a "functional" way of programming, aren't you supposed to return collections nearly all the time?
Why do so many functions apparently not return any collection? And what would be an idiomatic way to make them return collections?
For example, how would I solve the above problem by first constructing a collection and then iterating, in an idiomatic way, over the resulting list/vector?
First, I don't know how to transform the loop so that it produces something other than nil, and then I tried the following:
(reduce println '(1 2 3))
But instead of the "1 2 3" (followed by the nil result) that I was expecting, it prints "1 2" on one line, "nil 3" on the next, and then returns nil.
I realize this is basic stuff but I'm just starting and I'm obviously missing basic stuff here.

A few other comments have pointed out that when doesn't really work like if, but I don't think that's really your question.
The loop and recur forms create an iteration, like a for loop in other languages. In this case, when you are printing, it is indeed just for the side effects. If you want to return a sequence, you'll need to build one:
(loop [x 128
       acc []]
  (if (< x 1)
    acc
    (recur (/ x 2)
           (cons x acc))))
=> (1 2 4 8 16 32 64 128)
In this case, I replaced the spot where you were calling println with a recur and a form that adds x to the front of an accumulator. When x is less than 1, the code returns the accumulator, and thus a sequence. If you want to add to the end of the vector instead of the front, change cons to conj:
(loop [x 128
       acc []]
  (if (< x 1)
    acc
    (recur (/ x 2)
           (conj acc x))))
=> [128 64 32 16 8 4 2 1]
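You were getting nil because that is what your expression evaluates to: on the last iteration (> x 1) is false, and when returns nil. The bare x in your second version is evaluated and its value thrown away; nothing ever collects it.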
Does all this make sense?
reduce is not quite the same thing. It reduces a sequence by repeatedly applying a binary function (a function that takes 2 arguments). On the first call, the function receives either an initial value and the first element of the sequence, or, if no initial value is given, the first two elements. Each subsequent call receives the result of the previous call and the next element of the sequence. Some examples may help:
(reduce + [1 2 3 4])
10
This executes the following:
(+ 1 2) => 3
(+ 3 3) => 6
(+ 6 4) => 10
reduce returns whatever the last call to the binary function returns; in this case we're reducing the numbers in the sequence into the sum of all the elements.
You can also supply an initial value:
(reduce + 5 [1 2 3 4])
15
Which executes the following:
(+ 5 1) => 6
(+ 6 2) => 8
(+ 8 3) => 11
(+ 11 4) => 15
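Applying this to the (reduce println '(1 2 3)) from the question: here is a small sketch that captures what gets printed, using with-out-str. println returns nil, so after the first step reduce calls (println nil 3). For comparison it also shows reductions, which returns every intermediate result, and apply println, which passes all the elements as separate arguments and prints them on one line, as the question expected.

```clojure
(require '[clojure.string :as str])

;; reduce first calls (println 1 2): prints "1 2" and returns nil.
;; It then calls (println nil 3): prints "nil 3" and returns nil.
(def printed
  (with-out-str (reduce println '(1 2 3))))

(str/split-lines printed)
;; => ["1 2" "nil 3"]

;; reductions returns every intermediate value of the reduce.
(reductions + [1 2 3 4])
;; => (1 3 6 10)

;; apply spreads the list into separate arguments: (println 1 2 3)
(str/split-lines (with-out-str (apply println '(1 2 3))))
;; => ["1 2 3"]

;; doseq walks a collection purely for side effects and returns nil.
(doseq [x '(1 2 3)]
  (println x))
```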
HTH,
Kyle

The general abstraction over collections in Clojure is called a sequence. Many data structures implement this abstraction, so you can use all the sequence operations on them without thinking about which data structure is being passed to your function(s).
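A small REPL sketch of that idea: the same sequence functions accept lists, vectors, sets, maps and strings, and they hand back a seq, so into (or vec, set) is the usual way to get a particular collection type back.

```clojure
;; The same sequence functions work on many collection types.
(map inc '(1 2 3))        ; => (2 3 4)
(map inc [1 2 3])         ; => (2 3 4)
;; Set order is unspecified, so collect the result back into a set to compare.
(set (map inc #{1 2 3}))  ; => #{2 3 4}
;; A map is seen as a seq of key/value entries.
(first {:a 1})            ; => [:a 1]
;; A string is seen as a seq of characters.
(seq "abc")               ; => (\a \b \c)

;; Sequence functions return seqs whatever went in;
;; into chooses the concrete collection type of the result.
(into [] (map inc '(1 2 3)))  ; => [2 3 4]
```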
As far as the sample code is concerned, loop/recur is for recursion, so basically any problem that you want to solve using recursion can be solved with it, the classic example being factorial. You can create a vector/list using loop by using a vector as the accumulator, appending items to it, and returning the accumulated vector in the exit condition of the recursion. But you can also use the reductions and take-while functions, as shown below. This returns a lazy sequence.
Ex:
(take-while #(> % 1) (reductions (fn [s _] (/ s 2)) 128 (range)))
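A possibly simpler variant of the same lazy sequence uses iterate, which produces x, (f x), (f (f x)), and so on. The doseq afterwards covers the other half of the question: first build the collection, then walk over it just for the side effect of printing. halvings is only an illustrative name.

```clojure
;; Build the collection: 128, 64, 32, ... keeping only values greater than 1.
(def halvings
  (take-while #(> % 1) (iterate #(/ % 2) 128)))

halvings
;; => (128 64 32 16 8 4 2)

;; Then iterate over it purely for the side effect of printing.
(doseq [x halvings]
  (println x))
```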

Related

Deriving a (generalized) Sequence from a Proper Sequence

A number of the Common Lisp sequence functions take a proper sequence as an input and return a sequence as output. Starting with a proper sequence, how could the function not return another proper sequence? Example?
(mapcan #'rest (list (list 0 1 2) (cons :a :b)))
=> (1 2 . :b)
... but it is true that most of the time you can expect to have proper sequences as a result; functions might be underspecified for various reasons (cost to implementers, etc).
By the way, notice that NCONC is specified to return a list (at least in the HyperSpec), but the formal definition given on the same page allows non-lists as a result, e.g. (nconc nil 2) is 2. This incomplete over-approximation of the result type (in the signature, not in the actual description of the function) contaminates all the other results:
(mapcan #'rest (list (list) (cons 1 2)))
=> 2
See also Proposed ANSI Changes and ANSI Clarifications and Errata.

Removing last two elements from a list in Lisp

I need to remove the last two elements from a list in Common Lisp, but I can only remove one. What's the way?
(defun my-butlast (list)
  (loop for l on list
        while (rest l)
        collect (first l)))
Simple: reverse, pop, pop, reverse ;-) [1]
More efficiently, the following works too:
(let ((list '(a b c d)))
  (loop
    for x in list
    for y in (cddr list)
    collect x))
This can also be written, for some arbitrary L and N:
(mapcar #'values L (nthcdr N L))
It works because iteration over multiple lists is bounded by the shortest one. What matters here is the length of the second list (we don't care about its values), which is the length of the original list minus N, which must be a non-negative integer. Notice that NTHCDR conveniently works with sizes greater than the length of the list given in argument.
With the second example, I use the VALUES function as a generalized identity function; MAPCAR only uses the primary value of the computed values, so this works as desired.
The behavior is consistent with the actual BUTLAST function [2], which returns nil for N larger than the number of elements in the list. The actual BUTLAST function can also deal with improper (dotted) lists, but the above version cannot.
[1] (alexandria:compose #'nreverse #'cddr #'reverse)
[2] BUTLAST is specified as being equivalent to (ldiff list (last list n)). I completely forgot about the existence of LDIFF!
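For readers arriving from the Clojure question at the top of this page, the same shortest-sequence trick carries over, since Clojure's map also stops at its shortest argument. This is only a sketch; drop-last-n is a hypothetical name, and clojure.core's own drop-last does the same job.

```clojure
(defn drop-last-n
  "Returns all but the last n elements of xs. As in the MAPCAR/NTHCDR
  version, the second sequence only bounds the length; its values are ignored."
  [n xs]
  (map (fn [x _] x) xs (drop n xs)))

(drop-last-n 2 [1 2 3 4])  ; => (1 2)
(drop-last-n 10 [1 2 3])   ; => () when n exceeds the length
(drop-last 2 [1 2 3 4])    ; => (1 2), the built-in equivalent
```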
There's a function in the standard for this: butlast, or if you're willing to modify the input list, nbutlast.
butlast returns a copy of list from which the last n conses have been omitted. If n is not supplied, its value is 1. If there are fewer than n conses in list, nil is returned and, in the case of nbutlast, list is not modified.
nbutlast is like butlast, but nbutlast may modify list. It changes the cdr of the cons n+1 from the end of the list to nil.
Examples:
CL-USER> (butlast '(1 2 3 4 5) 2)
(1 2 3)
CL-USER> (nbutlast (list 6 7 8 9 10) 2)
(6 7 8)
The fact that you called your function my-butlast suggests that you might know about this function, but you didn't mention wanting to avoid it, so I assume it's still fair game. Wrapping it up is easy:
CL-USER> (defun my-butlast (list)
           (butlast list 2))
MY-BUTLAST
CL-USER> (my-butlast (list 1 2 3 4))
(1 2)

Learning Clojure: recursion for Hidden Markov Model

I'm learning Clojure and started by copying the functionality of a Python program that would create genomic sequences by following an (extremely simple) Hidden Markov model.
In the beginning I stuck with my familiar way of serial programming and used the def keyword a lot, thus solving the problem with tons of side effects, kicking almost every concept of Clojure right in the butt (although it worked as intended).
Then I tried to convert it to a more functional style, using loop, recur, atom and so on. Now when I run it I get an ArityException, but I can't read the error message in a way that even shows me which function throws it.
(defn create-model [name pA pC pG pT pSwitch]
  ; converts probabilities to cumulative probabilities and returns a char
  (with-meta
    (fn []
      (let [x (rand)]
        (if (<= x pA)
          \A
          (if (<= x (+ pA pC))
            \C
            (if (<= x (+ pA pC pG))
              \G
              \T)))))                          ; the function object
    {:p-switch pSwitch :name name}))           ; the metadata, used to change model
(defn create-genome [n]
  ; adds random chars from a model to a string and switches models randomly
  (let [models [(create-model "std" 0.25 0.25 0.25 0.25 0.02)  ; standard model, equal probabilities
                (create-model "cpg" 0.1 0.4 0.4 0.1 0.04)]      ; cpg model
        islands (atom 0)                                        ; island counter
        m (atom 0)]                                             ; model index
    (loop [result (str)]
      (let [model (nth models @m)]
        (when (< (rand) (:p-switch (meta model)))  ; random says "switch model!"
          ; (swap! m #(mod (inc @m) 2))            ; swap model used in next iteration
          (swap! m #(mod (inc %) 2))               ; EDIT: correction
          (if (= @m 1)                             ; on switch to model 1, increase island counter
            ; (swap! islands #(inc @islands))      ; EDIT: my try, with error
            (swap! islands inc)))                  ; EDIT: correction
        (if (< (count result) n)                   ; repeat until result reaches length n
          (recur (format "%s%c" result (model)))
          result)))))
Evaluating the definitions works, but calling (create-genome 1000) leads to
ArityException Wrong number of args (1) passed to: user/create-genome/fn--772 clojure.lang.AFn.throwArity (AFn.java:429)
My questions:
(obviously) what am I doing wrong?
how exactly do I have to understand the error message?
Information I'd be glad to receive
how can the code be improved (in a way a Clojure newbie can understand)? Different paradigms are welcome too; I'm grateful for suggestions.
Why do I need to put pound signs # before the forms I use to change the atoms' states? I saw this in an example and the function wouldn't evaluate without it, but I don't understand why :)
Since you asked for ways to improve, here's one approach I often find myself reaching for: can I abstract this loop into a higher-order pattern?
In this case, your loop is picking characters at random - this can be modelled as calling a fn of no arguments that returns a character - and then accumulating them together until it has enough of them. This fits very naturally into repeatedly, which takes functions like that and makes lazy sequences of their results to whatever length you want.
Then, because you have the entire sequence of characters together, you can join them into a string a little more efficiently than with repeated formats: clojure.string/join should fit nicely, or you could apply str over it.
Here's my attempt at such a code shape - I tried to also make it fairly data-driven and that may have resulted in it being a bit arcane, so bear with me:
(require 'clojure.string)
(defn make-generator
  "Takes a probability distribution, in the form of a map
  from values to the desired likelihood of that value appearing in the output.
  Normalizes the probabilities and returns a nullary producer fn with that distribution."
  [p-distribution]
  (let [sum-probs (reduce + (vals p-distribution))
        normalized (reduce #(update-in %1 [%2] / sum-probs) p-distribution (keys p-distribution))]
    (fn [] (reduce
             #(if (< %1 (val %2)) (reduced (key %2)) (- %1 (val %2)))
             (rand)
             normalized))))
(defn markov-chain
  "Takes a series of states, returns a producer fn.
  Each call, the process changes to the next state in the series with probability :p-switch,
  and produces a value from the :producer of the current state."
  [states]
  (let [cur-state (atom (first states))
        ; rest, so that the first switch really moves on to the next state
        next-states (atom (rest (cycle states)))]
    (fn []
      (when (< (rand) (:p-switch @cur-state))
        (reset! cur-state (first @next-states))
        (swap! next-states rest))
      ((:producer @cur-state)))))
(def my-states [{:p-switch 0.02 :producer (make-generator {\A 1 \C 1 \G 1 \T 1}) :name "std"}
                {:p-switch 0.04 :producer (make-generator {\A 1 \C 4 \G 4 \T 1}) :name "cpg"}])
(defn create-genome [n]
  (->> my-states
       markov-chain
       (repeatedly n)
       clojure.string/join))
To hopefully explain a little of the complexity:
The let in make-generator is just making sure the probabilities sum to 1.
make-generator makes heavy use of another higher-order looping pattern, namely reduce. This essentially takes a function of 2 values and threads a collection through it: (reduce + [4 5 2 9]) is like (((4 + 5) + 2) + 9). I chiefly use it to do the same thing as your nested ifs in create-model, but without hard-coding how many values are in the probability distribution.
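To make the reduce/reduced step in make-generator easier to follow, here is a deterministic sketch that takes the random number as an argument instead of calling rand. pick and uniform are hypothetical names, and the distribution is a vector of [value probability] pairs so that the order is explicit. Like make-generator, it would return the leftover number if r were not below the total probability.

```clojure
(defn pick
  "Walks the distribution, subtracting each probability from r until r falls
  inside one value's slice; reduced then stops the reduce early with that value."
  [r distribution]
  (reduce (fn [r [value p]]
            (if (< r p)
              (reduced value)
              (- r p)))
          r
          distribution))

(def uniform [[\A 0.25] [\C 0.25] [\G 0.25] [\T 0.25]])

(pick 0.10 uniform) ; => \A, since 0.10 < 0.25
(pick 0.30 uniform) ; => \C, since 0.30 - 0.25 = 0.05 < 0.25
(pick 0.99 uniform) ; => \T
```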
markov-chain makes two atoms, cur-state to hold the current state and next-states, which holds an infinite sequence (from cycle) of the next states to switch to. This is to work like your m and models, but for arbitrary numbers of states.
I then use when to check whether the random state switch should occur, and if it does, perform the two side effects needed to keep the state atoms up to date. Then I just call the :producer of @cur-state with no arguments and return the result.
Now obviously, you don't have to do this exactly this way, but looking for those patterns certainly does tend to help me.
If you want to go even more functional, you could also consider moving to a design where your generators take a state (with seeded random number generator) and return a value plus a new state. This "state monad" approach would make it possible to be fully declarative, which this design isn't.
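A minimal sketch of that state-passing idea, under some loudly stated assumptions: next-random is a hypothetical, hand-rolled linear congruential generator (not a library function), and pick-base, step and genome are illustrative names. Every function takes the whole state and returns a value plus the next state, so there are no atoms and the same seed always reproduces the same genome.

```clojure
;; Hypothetical pure random source (a textbook linear congruential generator,
;; not a library function): takes a seed in [0, 2^31) and returns
;; [a double in [0, 1), the next seed].
(defn next-random [seed]
  (let [s (mod (+ (* 1103515245 seed) 12345) 2147483648)]
    [(/ (double s) 2147483648.0) s]))

(defn pick-base
  "Chooses a value from [[value probability] ...] using r in [0, 1)."
  [r distribution]
  (let [v (reduce (fn [r [value p]] (if (< r p) (reduced value) (- r p)))
                  r
                  distribution)]
    ;; If rounding leaves r just above the total, fall back to the last value.
    (if (number? v) (first (peek distribution)) v)))

(def models
  [{:p-switch 0.02 :dist [[\A 0.25] [\C 0.25] [\G 0.25] [\T 0.25]]}  ; std
   {:p-switch 0.04 :dist [[\A 0.1] [\C 0.4] [\G 0.4] [\T 0.1]]}])    ; cpg

(defn step
  "Pure transition: takes {:seed s :model i} and returns [base new-state].
  Everything the next step needs is in the returned state."
  [{:keys [seed model]}]
  (let [[r1 s1] (next-random seed)
        model   (if (< r1 (:p-switch (models model))) (- 1 model) model)
        [r2 s2] (next-random s1)]
    [(pick-base r2 (:dist (models model))) {:seed s2 :model model}]))

(defn genome
  "Returns [genome-string final-state]; the same input state always
  produces the same output."
  [n state]
  (loop [i 0, state state, acc []]
    (if (= i n)
      [(apply str acc) state]
      (let [[base next-state] (step state)]
        (recur (inc i) next-state (conj acc base))))))

(first (genome 20 {:seed 42 :model 0}))
```

Because the final state is returned too, generation can be resumed later by passing that state back into genome.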
Ok, it's a long shot, but it looks like your atom-updating functions:
#(mod (inc @m) 2)
and
#(inc @islands)
are of 0-arity, and they should be of arity at least 1.
This leads to the answer to your last question: the #(body) form is a shortcut for (fn [...] (body)). So it creates an anonymous function.
Now the trick is that if the body contains % or %x, where x is a number, each occurrence is replaced by a reference to the created function's argument number x (or to the first argument if it's just %).
In your case that body doesn't contain references to the function arguments, so #() creates an anonymous function that takes no arguments, which is not what swap! expects.
So swap! tries to pass an argument to something that doesn't expect one and, boom, you get an ArityException.
What you really needed in those cases was:
(swap! m #(mod (inc %) 2)) ; will swap value of m to (mod (inc current-value-of-m) 2) internally
and
(swap! islands inc) ; will swap value of islands to (inc current-value-of-islands) internally
respectively.
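The behaviour described above can be checked at the REPL. This sketch re-creates the two atoms from the question (m and islands) in isolation; the exact function name in the error message will differ from the question's fn--772.

```clojure
;; #(...) with no % in its body is a function of zero arguments.
(#(+ 1 1))        ; => 2
;; %1 and %2 refer to the first and second arguments.
(#(+ %1 %2) 3 4)  ; => 7

(def m (atom 0))
(def islands (atom 0))

;; swap! calls the function with the atom's current value.
(swap! m #(mod (inc %) 2))  ; => 1
(swap! m #(mod (inc %) 2))  ; => 0
(swap! islands inc)         ; => 1

;; A zero-argument function cannot accept that value, hence the error.
;; The atom is left unchanged when the function throws.
(try
  (swap! m #(mod (inc @m) 2))
  (catch clojure.lang.ArityException e
    (.getMessage e)))
;; => "Wrong number of args (1) passed to: ..." (the fn name varies)
```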
Your mistake has to do with what you asked about the # reader macro.
#(+ %1 %2) is shorthand for (fn [x y] (+ x y)). It can be used without arguments too, as in #(+ 1 1), which is how you are using it. The error you are getting is because swap! needs a function that accepts a parameter: it passes the atom's current value to your function. If you don't care about the current value, use reset!: (reset! an-atom (+ 1 1)). That will fix your error.
Correction:
I just took a second look at your code and realised that you are actually working on the atoms' states. So what you want to do is this:
(swap! m #(mod (inc %) 2)) instead of (swap! m #(mod (inc @m) 2)).
As far as style goes, you are doing good. I write my functions differently every day of the week, so maybe I'm not one to give advice on that.

Count amount of odd numbers in a sentence

I am fairly new to Lisp and this is one of the practice problems.
First of all, this problem is from Simply Scheme. I am not sure how to answer it.
The purpose of this question is to write the function count-odd, which takes a sentence as its input and counts how many odd digits it contains, as shown below:
(count-odd '(234 556 4 10 97))
6
or
(count-odd '(24680 42 88))
0
If possible, how would you be able to do it, using higher order functions, or recursion or both - whatever gets the job done.
I'll give you a few pointers, not a full solution:
First of all, I see 2 distinct ways of doing this, recursion or higher order functions + recursion. For this case, I think straight recursion is easier to grok.
So we'll want a function which takes in a list and does stuff, so
(define count-odd
  (lambda (ls) SOMETHING))
So this is recursive, so we'd want to split the list
(define count-odd
  (lambda (ls)
    (let ((head (car ls)) (rest (cdr ls)))
      SOMETHING)))
Now this has a problem: it's an error for an empty list (e.g. (count-odd '())), but I'll let you figure out how to fix that. Hint: check out Scheme's case expression; it makes it easy to check for and deal with an empty list.
Now SOMETHING is our recursion, so for SOMETHING use something like:
(+ (if (is-odd head) 1 0) (Figure out how many odds are in rest))
That should give you something to start on. If you have any specific questions later, feel free to post more questions.
Please first consider the guidance in the other answer, so that you try to do it by yourself. The following is a different way of solving it. Here is a tested full solution:
(define (count-odd num_list)
  (if (null? num_list)
      0
      (+ (num_odds (car num_list)) (count-odd (cdr num_list)))))
(define (num_odds number)
  (if (zero? number)
      0
      (+ (if (odd? number) 1 0) (num_odds (quotient number 10)))))
Both procedures are recursive.
count-odd keeps getting the first element of a list and passing it to num_odds until there is no element left in the list (that is the base case, a null list).
num_odds gets the number of odd digits in a number. To do so, it checks whether the number is odd, in which case it adds 1, otherwise 0. Then the number is divided by 10 to remove the least significant digit (which determines whether the number is odd or even) and passed as the argument to a new call. The process repeats until the number is zero (the base case).
Try to solve the problem by hand using only recursion before jumping to a higher-order solution; for that, I'd suggest taking a look at the other answers. After you have done that, aim for a practical solution using the tools at your disposal. I would divide the problem in two parts.
First, how to split a positive integer into a list of its digits; this is a recursive procedure over the input number. There are several ways to do this, for instance by first converting the number to a string, or by using arithmetic operations to extract the digits. I'll use the latter, with a tail-recursive implementation:
(define (split-digits n)
  (let loop ((n n)
             (acc '()))
    (if (< n 10)
        (cons n acc)
        (loop (quotient n 10)
              (cons (remainder n 10) acc)))))
With this, we can solve the problem in terms of higher-order functions; the structure of the solution mirrors the mental process used to solve the problem by hand:
First, we iterate over all the numbers in the input list (using map)
Split each number into the digits that compose it (using split-digits)
Count how many of those digits are odd; this gives a partial solution for just one number (using count, here the SRFI-1 version that takes a predicate and a list)
Add all the partial solutions in the list returned by map (using apply)
This is how it looks:
(define (count-odd lst)
  (apply +
         (map (lambda (x)
                (count odd? (split-digits x)))
              lst)))
Don't be confused if some of the other solutions look strange. Simply Scheme uses non-standard definitions for first and butfirst. Here is a solution that I hope is Simply Scheme friendly.
Here is one strategy to solve the problem:
turn the number into a list of digits
transform into a list of zero and ones (zero=even, one=odd)
add the numbers in the list
Example: 123 -> '(1 2 3) -> '(1 0 1) -> 2
(define (digit? x)
  (<= 0 x 9))
(define (number->digits x)
  (if (digit? x)
      (list x)
      (cons (remainder x 10)
            (number->digits (quotient x 10)))))
(define (digit->zero/one d)
  (if (even? d) 0 1))
(define (digits->zero/ones ds)
  (map digit->zero/one ds))
(define (add-numbers xs)
  (if (null? xs)
      0
      (+ (first xs)
         (add-numbers (butfirst xs)))))
(define (count-odds-in-number x)
  (add-numbers
   (digits->zero/ones
    (number->digits x))))
(define (count-odds sentence)
  ; count the odd digits of every number in the sentence and add them up
  (add-numbers (map count-odds-in-number sentence)))
The above is untested, so you might need to fix a few typos.
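I think this is a good way, too, once each number has been split into its digits:
(define (digits n)
  (if (< n 10)
      (list n)
      (cons (remainder n 10) (digits (quotient n 10)))))
(define (count-odd sequence)
  (length (filter odd? (apply append (map digits sequence)))))
(count-odd '(234 556 4 10 97))
; => 6
Hope this will help~
(length sequence) returns the sequence's length, and (filter proc sequence) returns a sequence that contains all the elements that satisfy proc.
odd? is already a standard Scheme procedure, so it does not need to be defined again. Applying it directly to the numbers in the sentence would count odd numbers rather than odd digits, which is why each number is split into its digits first.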

What to return in a collection when using map

I read a lot of documentation about Clojure (and will need to read it again) and several Clojure questions here on SO to get a "feel" for the language. Besides a few tiny functions in elisp, I've never written anything in a Lisp language before. I wrote my first Project Euler solution in Clojure, and before going further I'd like to better understand something about map and reduce.
Using a lambda, I ended up with the following (to sum all the multiples of 3 or 5 below 1000):
(reduce + (map #(if (or (= 0 (mod %1 3)) (= 0 (mod %1 5))) %1 0) (range 1 1000)))
I put it on one line because I wrote it on the REPL (and it gives the correct solution).
Without the lambda, I wrote this:
(defn val [x] (if (or (= 0 (mod x 3)) (= 0 (mod x 5))) x 0))
And then I compute the solution doing this:
(reduce + (map val (range 1 1000)))
In both cases, my question concerns what the map should return, before doing the reduce. After doing the map I noticed I ended up with a list looking like this: (0 0 3 0 5 6 ...).
I tried removing the '0' at the end of the val definition, but then I got a list made of (nil nil 3 nil 5 6 etc.). I don't know if the nils are an issue or not. I figured that I was going to sum with a fold-left anyway, so the zeros weren't really an issue.
But still: what's a sensible map to return? (0 0 3 0 5 6 ...) or (nil nil 3 nil 5 6...) or (3 5 6 ...) (how would I go about this last one?) or something else?
Should I "filter out" the zeroes / nils and if so how?
I know I'm asking a basic question but map/reduce is obviously something I'll be using a lot so any help is welcome.
It sounds like you already have an intuitive understanding of the need to separate the mapping concerns from the reducing ones. It's perfectly natural to have data produced by map that is not used by reduce. In fact, zero being the identity value for addition makes this even more elegant.
map's job is to produce the new data (in this case 3, 5, or "ignore").
reduce's job is to decide what to include and to produce the final result.
What you started with is idiomatic Clojure and there is no need to complicate it any more,
so this next example is just to illustrate the point of having reduce decide what to include:
(reduce #(if-not (zero? %1) (+ %1 %2) %2) (map val (range 10)))
In this contrived example, the reduce function ignores the zeros. In typical real-world code, if the idea is as simple as filtering out some value, people tend to just use the filter function:
(reduce + (filter #(not (zero? %)) (map val (range 10))))
You can also just start with filter and skip the map:
(reduce + (filter #(or (zero? (rem % 3)) (zero? (rem % 5))) (range 10)))
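To make the comparison concrete, here is what each stage produces on the small (range 10) used above. The question's val is renamed multiple-or-zero in this sketch, because defining val at the REPL shadows clojure.core/val and prints a warning.

```clojure
(defn multiple-or-zero [x]
  (if (or (zero? (mod x 3)) (zero? (mod x 5))) x 0))

;; map keeps one output per input, so "ignored" numbers become 0.
(map multiple-or-zero (range 10))
;; => (0 0 0 3 0 5 6 0 0 9)

;; filter drops the non-matching inputs instead (0 itself is a multiple of 3).
(filter #(or (zero? (mod % 3)) (zero? (mod % 5))) (range 10))
;; => (0 3 5 6 9)

;; Both sum to the same total, because 0 is the identity for +.
(reduce + (map multiple-or-zero (range 10)))  ; => 23
```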
The watchword is clarity.
Use filter, not map. Then you don't have to choose a null value that you later have to decide not to act on.
Naming the filtering/mapping function can help. Do so with let or letfn, not defn, unless you have use for the function elsewhere.
Acting on this advice brings us to ...
(let [divides-by-3-or-5? (fn [n] (or (zero? (mod n 3)) (zero? (mod n 5))))]
(reduce + (filter divides-by-3-or-5? (range 1 1000))))
You may want to stop here for now.
This reads well, but the divides-by-3-or-5? function sticks in the throat. Change the factors and we need a completely new function. And that repeated phrase (zero? (mod n ...)) jars. So ...
We want a function, that - given a list (or other collection) of possible factors - tells us whether any of them apply to a given number. In other words, we want
a function of a collection of numbers - the possible factors - ...
that returns a function of one number - the candidate - ...
that tells us whether the candidate is divisible by any of the possible factors.
One such function is
(fn [ns] (fn [n] (some (fn [x] (zero? (mod n x))) ns)))
... which we can employ thus
(let [divides-by-any? (fn [ns] (fn [n] (some (fn [x] (zero? (mod n x))) ns)))]
(reduce + (filter (divides-by-any? [3 5]) (range 1 1000))))
Notes
This "improvement" has made the program a little slower.
divides-by-any? might prove useful enough to be promoted to a defn.
If the operation were critical, you could consider stripping out redundant factors. For example, [2 3 6] could be reduced to [2 3], since every multiple of 6 is already a multiple of 2 (and of 3).
If the operation were really critical, and the factors were supplied as constants, you could consider creating the filter function with a macro that went back to using or.
This is a bit of a shaggy-dog story, but it recounts the thoughts prompted by the problem you refer to.
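A sketch of the macro idea from the last note, assuming the factors are literal constants at the call site; divides-by-any-of is a hypothetical name. It expands into a plain fn with an or of zero? tests, so no collection of factors is walked at run time.

```clojure
(defmacro divides-by-any-of
  "Expands (divides-by-any-of 3 5) into roughly
  (fn [n] (or (zero? (mod n 3)) (zero? (mod n 5)))) at compile time."
  [& factors]
  (let [n (gensym "n")]
    `(fn [~n]
       (or ~@(map (fn [f] `(zero? (mod ~n ~f))) factors)))))

(reduce + (filter (divides-by-any-of 3 5) (range 1 1000)))
;; => 233168
```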
In your case I would use keep instead of map. It is similar to map, except that it keeps only the non-nil results.
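A sketch of the keep version: the mapping function returns nil for the numbers to skip, and keep drops those nil results. keep only drops nil (a false result would be kept), which is why when, returning nil when its test fails, fits well here.

```clojure
;; when returns nil for numbers that are not multiples of 3 or 5.
(keep #(when (or (zero? (mod % 3)) (zero? (mod % 5))) %) (range 1 16))
;; => (3 5 6 9 10 12 15)

(reduce + (keep #(when (or (zero? (mod % 3)) (zero? (mod % 5))) %)
                (range 1 1000)))
;; => 233168
```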
