merging a series of vectors into one sequentially in clojure

I have a program which performs a search against Amazon and returns information on the specified book. Once all the searches are performed I want to be able to sort the books by SalesRank. The problem I'm having is combining the vectors into one large collection. Right now I can get them to print one by one, but each iteration overwrites the previous. I'm a functional programming NOOB so any help is appreciated. Below is a snippet:
(defn get_title_and_rank_for_one_isbn [isbn]
  (def book_title (get-in (amazon_search isbn) [:items 0 :item-atributes :title]))
  (def sales_rank (get-in (amazon_search isbn) [:items 0 :SalesRank]))
  (def book_isbn (get-in (amazon_search isbn) [:items 0 :asin])))

(defn get_title_and_rank_for_all_isbns [list_of_isbns]
  (doseq [isbn list_of_isbns]
    (Thread/sleep 3000)
    (get_title_and_rank_for_one_isbn isbn)
    (def combine_attributes (reduce into [[book_title] [book_isbn] [sales_rank]]))
    (println combine_attributes)))

(defn- get-books [data]
  (letfn [(one-book [book]
            (let [title (get-in book [:title])
                  rank  (get-in book [:rank])
                  isbn  (get-in book [:isbn])]
              {:title title
               :rank  rank
               :isbn  isbn}))]
    (map one-book data)))
There are probably several ways to do this, but this is one take with some simplified code. You might call the function like this: (println (get-books data)), where data is your JSON structure.
So what's happening in get-books? letfn allows you to define functions that are local to the enclosing function. A Clojure function returns its last expression; in this case, that's the map call. In fact, it's pretty much the only thing that "runs" here. It maps your data over the one-book function, which uses let to create bindings that are re-bound as each book passes through the function. This takes the place of your thinking about using def.
Again, one-book returns its last expression, which is a key-value structure. You map over all the data, then print it out, pass it to another function, or whatever you need to do with it.
This is less an exact solution than it is a suggestion for how to think about processing your data and how to return values from a function.
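To see the shape of the result, here is a self-contained variant of get-books (condensed to keyword lookups) run against hypothetical sample data; the book titles and ISBNs below are made up purely for illustration:

```clojure
;; Condensed, self-contained variant of get-books; the sample data just
;; matches the shape the function expects.
(defn- get-books [data]
  (letfn [(one-book [book]
            {:title (:title book)
             :rank  (:rank book)
             :isbn  (:isbn book)})]
    (map one-book data)))

;; Hypothetical sample data:
(def data [{:title "SICP" :rank 42 :isbn "0262510871" :extra "ignored"}
           {:title "PAIP" :rank 7  :isbn "1558601910"}])

(get-books data)
;; => ({:title "SICP", :rank 42, :isbn "0262510871"}
;;     {:title "PAIP", :rank 7, :isbn "1558601910"})
```

Note that keys not picked out by one-book (like :extra) simply drop away, which is often exactly what you want when shaping API results.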

First observe that (amazon_search isbn) should only run once per isbn.
Then map over each search result independently and extract the desired data into new maps that you finally sort by sales rank.
(defn get-book [isbn]
  (let [itm (get-in (amazon_search isbn) [:items 0])]
    {:book/title      (get-in itm [:item-attributes :title])
     :book/sales-rank (:SalesRank itm)
     :book/isbn       (:asin itm)}))

(defn get-books-sorted-by-sales-rank [isbns]
  (->> isbns (map get-book) (sort-by :book/sales-rank)))
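One small performance tweak, sketched here with clojure.core/memoize: caching the search result means each ISBN triggers at most one slow lookup, even if it is processed more than once. The amazon-search-stub below is a made-up stand-in for the real amazon_search, just so the sketch is runnable:

```clojure
;; Hypothetical stub standing in for the real amazon_search.
(defn amazon-search-stub [isbn]
  {:items [{:item-attributes {:title (str "Book " isbn)}
            :SalesRank (count isbn)
            :asin isbn}]})

;; memoize caches results per argument, so repeated lookups of the same
;; ISBN hit the cache instead of performing the (slow) search again.
(def amazon-search-cached (memoize amazon-search-stub))

(defn get-book [isbn]
  (let [itm (get-in (amazon-search-cached isbn) [:items 0])]
    {:book/title      (get-in itm [:item-attributes :title])
     :book/sales-rank (:SalesRank itm)
     :book/isbn       (:asin itm)}))

(get-book "12345")
;; => {:book/title "Book 12345", :book/sales-rank 5, :book/isbn "12345"}
```

Memoization is only appropriate here if amazon_search is effectively pure per ISBN; if results can change between calls, a TTL cache would be a better fit.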


How to understand clojure's lazy-seq

I'm trying to understand clojure's lazy-seq operator, and the concept of lazy evaluation in general. I know the basic idea behind the concept: Evaluation of an expression is delayed until the value is needed.
In general, this is achievable in two ways:
at compile time using macros or special forms;
at runtime using lambda functions
With lazy evaluation techniques, it is possible to construct infinite data structures that are evaluated as consumed. These infinite sequences utilize lambdas, closures and recursion. In Clojure, these infinite data structures are generated using the lazy-seq and cons forms.
I want to understand how lazy-seq does its magic. I know it is actually a macro. Consider the following example.
(defn rep [n]
  (lazy-seq (cons n (rep n))))
Here, the rep function returns a lazily-evaluated sequence of type LazySeq, which now can be transformed and consumed (thus evaluated) using the sequence API. This API provides functions take, map, filter and reduce.
In the expanded form, we can see how lambda is utilized to store the recipe for the cell without evaluating it immediately.
(defn rep [n]
  (new clojure.lang.LazySeq (fn* [] (cons n (rep n)))))
But how does the sequence API actually work with LazySeq?
What actually happens in the following expression?
(reduce + (take 3 (map inc (rep 5))))
How is the intermediate operation map applied to the sequence,
how does take limit the sequence and
how does terminal operation reduce evaluate the sequence?
Also, how do these functions work with either a Vector or a LazySeq?
Also, is it possible to generate nested infinite data structures: lists containing lists, containing lists, containing lists... going infinitely wide and deep, evaluated as consumed with the sequence API?
And last question, is there any practical difference between this
(defn rep [n]
  (lazy-seq (cons n (rep n))))
and this?
(defn rep [n]
  (cons n (lazy-seq (rep n))))
That's a lot of questions!
How does the seq API actually work with LazySeq?
If you take a look at the LazySeq class source code you will notice that it implements the ISeq interface, providing methods like first, more and next.
Functions like map, take and filter are built using lazy-seq (they produce lazy sequences) and first and rest (which in turn uses more), and that's how they can work with a lazy seq as their input collection - by using the first and more implementations of the LazySeq class.
What actually happens in the following expression?
(reduce + (take 3 (map inc (rep 5))))
The key is to look how LazySeq.first works. It will invoke the wrapped function to obtain and memoize the result. In your case it will be the following code:
(cons n (rep n))
Thus it will be a cons cell with n as its value and another LazySeq instance (result of a recursive call to rep) as its rest part. It will become the realised value of this LazySeq object and first will return the value of the cached cons cell.
When you call more on it, it will in the same way ensure that the value of the particular LazySeq object is realised (or reuse the memoized value) and then call more on it (in this case more on the cons cell containing another LazySeq object).
Once you obtain another instance of LazySeq object with more the story repeats when you call first on it.
map and take will create another lazy-seq that will call first and more of the collection passed as their argument (just another lazy seq), so it will be a similar story. The difference will be only in how the values passed to cons are generated (e.g. applying f to a value obtained by first invoked on the LazySeq being mapped over in map, instead of a raw value like n in your rep function).
With reduce it's a bit simpler as it will use loop with first and more to iterate over the input lazy seq and apply the reducing function to produce the final result.
As for what the actual implementations of map and take look like, I encourage you to check their source code - it's quite easy to follow.
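To make that concrete, here is a simplified sketch of how map and take can be built from lazy-seq, first and rest. This is not the real clojure.core source (the real versions additionally handle chunking and multiple collections), just the shape of the idea:

```clojure
;; Simplified, unchunked sketches in the spirit of clojure.core/map and take.
(defn my-map [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s)) (my-map f (rest s))))))

(defn my-take [n coll]
  (lazy-seq
    (when (pos? n)
      (when-let [s (seq coll)]
        (cons (first s) (my-take (dec n) (rest s)))))))

;; Works on an infinite input because nothing is realised until consumed:
(reduce + (my-take 3 (my-map inc (repeat 5))))
;; => 18
```

Note that both functions immediately return a LazySeq wrapping a thunk; first/rest calls from downstream consumers are what actually drive the computation, one cons cell at a time.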
How can the seq API work with different collection types (e.g. lazy seq and persistent vector)?
As mentioned above, map, take and other functions work in terms of first and rest (reminder - rest is implemented on top of more). Thus we need to explain how first and rest/more can work with different collection types: they check if the collection implements ISeq (and then it implements those functions directly), or they create a seq view of the collection and call its implementation of first and more.
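For example, the same seq functions work unchanged on a persistent vector, because a seq view of the vector is created first:

```clojure
(seq [1 2 3])     ;; => (1 2 3)  - a seq view over the vector
(first [1 2 3])   ;; => 1        - goes through that seq view
(rest [1 2 3])    ;; => (2 3)    - a seq, not a vector
(take 2 [1 2 3])  ;; => (1 2)    - a lazy seq over the vector's seq view
```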
Is it possible to generate nested infinite data structures?
It's definitely possible, but I am not sure what exact data shape you would like to get. Do you mean getting a lazy seq which generates another sequence as its value (instead of a single value like n in your rep) but returns it as a flat sequence?
(defn nested-cons [n]
  (lazy-seq (cons (repeat n n) (nested-cons (inc n)))))

(take 3 (nested-cons 1))
;; => ((1) (2 2) (3 3 3))
Or one that would rather return (1 2 2 3 3 3)?
For such cases you might use concat instead of cons which creates a lazy sequence of two or more sequences:
(defn nested-concat [n]
  (lazy-seq (concat (repeat n n) (nested-concat (inc n)))))

(take 6 (nested-concat 1))
;; => (1 2 2 3 3 3)
Is there any practical difference between this
(defn rep [n]
  (lazy-seq (cons n (rep n))))
and this?
(defn rep [n]
  (cons n (lazy-seq (rep n))))
In this particular case not really. But in the case where a cons cell doesn't wrap a raw value but a result of a function call to calculate it, the latter form is not fully lazy. For example:
(defn calculate-sth [n]
  (println "Calculating" n)
  n)

(defn rep1 [n]
  (lazy-seq (cons (calculate-sth n) (rep1 (inc n)))))

(defn rep2 [n]
  (cons (calculate-sth n) (lazy-seq (rep2 (inc n)))))

(take 0 (rep1 1))
;; => ()

(take 0 (rep2 1))
;; Prints: Calculating 1
;; => ()
Thus the latter form will evaluate its first element even if you might not need it.

Clojure: map function isn't returning something I can eval

I'm writing a little "secret santa" program to get my hands dirty with Clojure, and I'm stumbling with my output.
The program takes a list of sets (Santas), extracts their emails into another list, then randomly assigns recipients to Santas. I think I've mostly got it, but when I try to output the results of my map, I'm getting #<Fn@dc32d15 clojure.core/map$fn__4549>.
(ns secret-santas-helper.core
  (:require [clojure.pprint :as pprint])
  (:gen-class))

(def santas [{:name "Foo" :email "foo@gmail.com"}
             {:name "Bar" :email "bar@gmail.com"}
             {:name "Baz" :email "baz@gmail.com"}])

(defn pluck
  "Pull out the value of a given key from a seq"
  [arr k]
  (map #(get % k) arr))

(defn find-first
  "Find the first matching value"
  [f coll]
  (first (filter f coll)))

(defn assign-santas
  "Iterate over a list of santas and assign a recipient"
  [recipients santas]
  (let [r (atom recipients)])
  (map (fn [santa]
         (let [recipient (find-first #(= % (get santa :email)) @recipients)]
           (assoc santa :recipient recipient)
           (swap! recipients (remove #(= % recipient) recipients))))))

(defn -main []
  (let [recipients (shuffle (pluck santas :email))
        pairs (assign-santas recipients santas)]
    (pprint/pprint pairs)))
Also, be careful with how you use map. You are returning the result of your swap!, which I don't believe is what you are aiming at.
Keep working on getting your version compiling and functioning correctly. I wanted to give an alternative solution to your problem that works less with mutation and instead is focused on combining collections.
(def rand-santas
  "Randomize the current santa list"
  (shuffle santas))

(def paired-santas
  "Use partition with overlap to pair up all random santas"
  (partition 2 1 rand-santas))

(def final-pairs
  "Add the first in the list as santa to the last to ensure everyone is paired"
  (conj paired-santas (list (last rand-santas) (first rand-santas))))

(defn inject-santas
  "Loop through all pairs and assoc the second pair into first as the secret santa"
  [pairs]
  (map
    (fn [[recipient santa]]
      (assoc recipient :santa santa))
    pairs))

(defn -main []
  (pprint/pprint (inject-santas final-pairs)))
Your assign-santas function is returning a map transducer. When you apply map to a single argument, it returns a transducer that will perform that transform in a transducing context. Most likely you intended to provide a third arg, santas, to map over.
Inside the assign-santas function, you are using @ to deref a value that is not an atom. Perhaps you meant @r instead of @recipients, but your let block stops too soon and doesn't provide the r binding to the rest of the function body.
Lisp (in general) and Clojure (specific case) are different, and require a different way of approaching a problem. Part of learning how to use Clojure to solve problems seems to be unlearning a lot of habits we've acquired when doing imperative programming. In particular, when doing something in an imperative language we often think "How can I start with an empty collection and then add elements to it as I iterate through my data so I end up with the results I want?". This is not good thinking, Clojure-wise. In Clojure the thought process needs to be more along the lines of, "I have one or more collections which contain my data. How can I apply functions to those collections, very likely creating intermediate (and perhaps throw-away) collections along the way, to finally get the collection of results I want?".
OK, let's cut to the chase, and then we'll go back and see why we did what we did. Here's how I modified the original code:
(def santas [{:name "Foo" :email "foo@gmail.com"}
             {:name "Bar" :email "bar@gmail.com"}
             {:name "Baz" :email "baz@gmail.com"}])

(def kids [{:name "Tommy" :email "tommy@gmail.com"}
           {:name "Jimmy" :email "jimmy@gmail.com"}
           {:name "Jerry" :email "jerry@gmail.com"}
           {:name "Johny" :email "johny@gmail.com"}
           {:name "Juney" :email "juney@gmail.com"}])
(defn pluck
  "Pull out the value of a given key from a seq"
  [arr k]
  (map #(get % k) arr))

(defn assign-santas [recipients santas]
  ;; Assign kids to santas randomly:
  ;; recipients is a shuffled/randomized vector of kids,
  ;; santas is a vector of santas.
  (let [santa-reps (inc (int (/ (count recipients) (count santas)))) ; how many repetitions of santas we need to cover the kids
        many-santas (flatten (repeat santa-reps santas))]            ; the santas collection repeated santa-reps times
    (map #(hash-map :santa %1 :kid %2) many-santas recipients)))

(defn assign-santas-main []
  (let [recipients (shuffle (pluck kids :email))
        pairs (assign-santas recipients (map #(%1 :name) santas))]
    ;; (pprint/pprint pairs)
    pairs))
I created a separate collection of kids who are supposed to be assigned randomly to a santa. I also changed it so it creates an assign-santas-main function instead of -main, just for testing purposes.
The only function changed is assign-santas. Instead of starting with an empty collection and then trying to mutate that collection to accumulate the associations we need I did the following:
Determine how many repetitions of the santas collection are needed so we have at least as many santas as kids (wait - we'll get to it... :-). This is just
TRUNC(#_of_kids / #_of_santas) + 1
or, in Clojure-speak
`(inc (int (/ (count recipients) (count santas))))`
Create a collection which the santas collection repeated as many times as needed (from step 1). This is done with
(flatten (repeat santa-reps santas))
This duplicates (repeat) the santas collection santa-reps times (santa-reps was computed by step 1) and then flattens it - i.e. takes the elements from all the sub-collections (try executing (repeat 3 santas) and see what you get) and makes one big flat collection of all the sub-collections' elements.
We then do
(map #(hash-map :santa %1 :kid %2) many-santas recipients)
This says "Take the first element from each of the many-santas and recipients collections, pass them in to the anonymous function given, and then accumulate the results returned by the function into a new collection". (New collection, again - we do that a lot in Clojure). Our little anonymous function says "Create an association (hash-map function), assigning a key of :santa to the first argument I'm given, and a key of :kid to the second argument". The map function then returns that collection of associations.
If you run the assign-santas-main function you get a result which looks like
({:kid "jimmy@gmail.com", :santa "Foo"}
 {:kid "tommy@gmail.com", :santa "Bar"}
 {:kid "jerry@gmail.com", :santa "Baz"}
 {:kid "johny@gmail.com", :santa "Foo"}
 {:kid "juney@gmail.com", :santa "Bar"})
(I put each association on a separate line - Clojure isn't so gracious when it prints it out - but you get the idea). If you run it again you get something different:
({:kid "juney@gmail.com", :santa "Foo"}
 {:kid "tommy@gmail.com", :santa "Bar"}
 {:kid "jimmy@gmail.com", :santa "Baz"}
 {:kid "johny@gmail.com", :santa "Foo"}
 {:kid "jerry@gmail.com", :santa "Bar"})
And so on with each different run.
Note that in the rewritten version of assign-santas the entire function could have been written on a single line. I only used a let here to break the calculation of santa-reps and the creation of many-santas out so it was easy to see and explain.
For me, one of the things I find difficult with Clojure (and this is because I'm still very much climbing the learning curve - and for me, with 40+ years of imperative programming experience and habits behind me, this is a pretty steep curve) is just learning the basic functions and how to use them. Some that I find handy on a regular basis are:
map
apply
reduce
(I have great difficulty remembering the difference between apply and reduce. In practice, if one doesn't do what I want I use the other.)
repeat
flatten
interleave
partition
hash-map
mapcat
and of course all the "usual" things like +, -, etc.
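On the apply/reduce confusion mentioned in that list, a small illustration may help. For associative variadic functions like + the two agree, which is exactly why they are easy to mix up; a non-associative function exposes the difference:

```clojure
;; apply makes ONE call, splicing the collection in as arguments:
(apply + [1 2 3])   ;; => 6, evaluated as (+ 1 2 3)

;; reduce makes REPEATED two-argument calls, threading the result:
(reduce + [1 2 3])  ;; => 6, evaluated as (+ (+ 1 2) 3)

;; The difference shows when the function is not associative:
(apply list [1 2 3])   ;; => (1 2 3)
(reduce list [1 2 3])  ;; => ((1 2) 3)
```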
I'm pretty sure that someone who's more expert than I am at Clojure (not much of a challenge :-) could come up with a way to do this faster/better/cooler, but this might at least give you a different perspective on how to approach this.
Best of luck.

Put an element to the tail of a collection

I find myself doing a lot of:
(concat coll [e]) where coll is a collection and e a single element.
Is there a function for doing this in Clojure? I know conj does the job best for vectors but I don't know up front which coll will be used. It could be a vector, list or sorted-set for example.
Some types of collections can add cheaply to the front (lists, seqs), while others can add cheaply to the back (vectors, queues, kinda-sorta lazy-seqs). Rather than using concat, if possible you should arrange to be working with one of those types (vector is most common) and just conj to it: (conj [1 2 3] 4) yields [1 2 3 4], while (conj '(1 2 3) 4) yields (4 1 2 3).
concat does not add an element to the tail of a collection, nor does it really concatenate two collections.
concat returns a seq made of the concatenation of two other seqs. The original types of the collections (from which those seqs can be made) are lost in concat's return type.
Now, Clojure collections have different properties one must know about in order to write efficient code; that's why there isn't a universal function available in core to concatenate collections of any kind together.
By contrast, lists and vectors do have "natural insertion positions" which conj knows about, and it does what is right for each kind of collection.
This is a very small addendum to @amalloy's answer in order to address OP's request for a function that always adds to the tail of whatever kind of collection. This is an alternative to (concat coll [x]). Just create a vector version of the original collection:
(defn conj*
  [s x]
  (conj (vec s) x))
Caveats:
If you started with a lazy sequence, you've now destroyed the laziness--i.e. the output is not lazy. This may be either a good thing or a bad thing, depending on your needs.
There's some cost to creating the vector. If you need to call this function a lot, and you find (e.g. by benchmarking with Criterium) that this cost is significant for your purposes, then follow the other answers' advice to try to use vectors in the first place.
To distill the best of what amalloy and Laurent Petit have already said: use the conj function.
One of the great abstractions that Clojure provides is the sequence API, which includes the conj function. If at all possible, your code should be as collection-type agnostic as it can be, using the seq API to handle operations on collections and picking a particular collection type only when you need to be specific.
If vectors are a good match, then yes, conj will add items onto the end. If you use lists instead, then conj will add things to the front of your collection. But if you then use the standard seq API functions for pulling items from the "top" of a collection (the back of a vector, the front of a list), it doesn't matter which implementation you use: each collection works at the position with the best performance, so adding and removing items will be consistent.
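To illustrate that collection-agnostic style: conj, peek and pop all operate at each collection's natural end, so code written against them behaves consistently whichever concrete type you pick:

```clojure
(conj [1 2 3] 4)   ;; => [1 2 3 4]  - vectors grow at the back
(conj '(1 2 3) 4)  ;; => (4 1 2 3)  - lists grow at the front
(peek [1 2 3])     ;; => 3          - the "top" of a vector is its back
(peek '(1 2 3))    ;; => 1          - the "top" of a list is its front
(pop [1 2 3])      ;; => [1 2]
(pop '(1 2 3))     ;; => (2 3)
```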
If you are working with lazy sequences, you can also use lazy-cat:
(take 5 (lazy-cat (range) [1])) ; (0 1 2 3 4)
Or you could make it a utility method:
(defn append [coll & items] (lazy-cat coll items))
Then use it like this:
(take 5 (append (range) 1)) ; (0 1 2 3 4)

lazy-seq for recursive function

Sorry for the vague title, I guess I just don't understand my problem well enough to ask it yet but here goes. I want to write a recursive function which takes a sequence of functions to evaluate and then calls itself with their results & so on. The recursion stops at some function which returns a number.
However, I would like the function being evaluated at any point in the recursion, f, to be wrapped in a function, s, which returns an initial value (say 0, or the result of another function i) the first time it is evaluated, followed by the result of evaluating f (so that the next time it is evaluated it returns the previously evaluated result, and computes the next value). The aim is to decouple the recursion so that it can proceed without causing this.
I think I'm asking for a lazy-seq. It's a pipe that's filling up with evaluations of a function at one end, with historical results coming out of the other.
Your description reminds me somewhat of reductions, which performs a reduce and returns all the intermediate results.
user> (reductions + (range 10))
(0 1 3 6 10 15 21 28 36 45)
Here (range 10) creates a seq of 0 to 9. Reductions applies + repeatedly, passing the previous result of + and the next item in the sequence. All of the intermediate results are returned. You might find it instructive to look at the source of reductions.
If you need to build a test (check for value) into this, that's easy to do with an if in your function (although it won't stop traversing the seq). If you want early exit on a condition being true, then you'll need to write your own loop/recur which amalloy has already done well.
I hate to say it, but I suspect this might also be a case for the State Monad but IANAMG (I Am Not A Monad Guy).
I don't understand your entire goal: a lot of the terms you use are vague. Like, what do you mean you want to evaluate a sequence of functions and then recur on their results? These functions must be no-arg functions (thunks), then, I suppose? But having a thunk which first returns x, and then returns y the next time you call it, is pretty vile and stateful. Perhaps trampoline will solve part of your problem?
You also linked to something you want to avoid, but seem to have pasted the wrong link - it's just a link back to this page. If what you want to avoid is stack overflow, then trampoline is likely to be an okay way to go about it, although it should be possible with just loop/recur. This notion of thunks returning x unless they return y is madness, if avoiding stack overflow is your primary goal. Do not do that.
I've gone ahead and taken a guess at the most plausible end goal you might have, and here's my implementation:
(defn call-until-number [& fs]
  (let [numeric (fn [x] (when (number? x) x))]
    (loop [fs fs]
      (let [result (map #(%) fs)]
        (or (some numeric result)
            (recur result))))))

(call-until-number (fn [] (fn [] 1)))  ; yields 1
(call-until-number (fn [] (fn [] 1))
                   (fn [] 2))          ; yields 2
(call-until-number (fn f [] f))        ; never returns, no overflow

Permuting output of a tree of closures

This is a conceptual question on how one would implement the following in Lisp (assuming Common Lisp in my case, but any dialect would work). Assume you have a function that creates closures that sequentially iterate over an arbitrary collection of data (or otherwise return different values) and return nil when exhausted, i.e.
(defun make-counter (up-to)
  (let ((cnt 0))
    (lambda ()
      (if (< cnt up-to)
          (incf cnt)
          nil))))
CL-USER> (defvar gen (make-counter 3))
GEN
CL-USER> (funcall gen)
1
CL-USER> (funcall gen)
2
CL-USER> (funcall gen)
3
CL-USER> (funcall gen)
NIL
CL-USER> (funcall gen)
NIL
Now, assume you are trying to permute a combination of one or more of these closures. How would you implement a function that returns a new closure that subsequently creates a permutation of all closures contained within it? i.e.:
(defun permute-closures (counters)
......)
such that the following holds true:
CL-USER> (defvar collection (permute-closures (list
                                               (make-counter 3)
                                               (make-counter 3))))
CL-USER> (funcall collection)
(1 1)
CL-USER> (funcall collection)
(1 2)
CL-USER> (funcall collection)
(1 3)
CL-USER> (funcall collection)
(2 1)
...
and so on.
The way I had it designed originally was to add a 'pause' parameter to the initial counting lambda such that when iterating you can still call it and receive the old cached value if passed ":pause t", in hopes of making the permutation slightly cleaner. Also, while the example above is a simple list of two identical closures, the list can be an arbitrarily-complicated tree (which can be permuted in depth-first order, and the resulting permutation set would have the shape of the tree.).
I had this implemented, but my solution wasn't very clean and am trying to poll how others would approach the problem.
Thanks in advance.
edit Thank you for all the answers. What I ended up doing was adding a 'continue' argument to the generator and flattening my structure by replacing any nested list with a closure that permuted that list. The generators did not advance and always returned the last cached value unless 'continue' was passed. Then I just recursively called each generator until I got to either the last cdr or a NIL. If I got to the last cdr, I just bumped it. If I got to a NIL, I bumped the one before it and reset every closure following it.
You'll clearly need some way of using each value returned by a generator more than once.
In addition to Rainer Joswig's suggestions, three approaches come to mind.
Caching values
permute-closures could, of course, remember every value returned by each generator by storing it in a list, and reuse that over and over. This approach obviously implies some memory overhead, and it won't work very well if the generated sequences can be infinite.
Creating new generators on each iteration
In this approach, you would change the signature of permute-closures to take as arguments not ready-to-use generators but thunks that create them. Your example would then look like this:
(permute-closures (list (lambda () (make-counter 3))
                        (lambda () (make-counter 3))))
This way, permute-closures is able to reset a generator by simply recreating it.
Making generator states copyable
You could provide a way of making copies of generators along with their states. This is kind of like approach #2 in that permute-closures would reset the generators as needed except the resetting would be done by reverting to a copy of the original state. Also, you would be able to do partial resets (i.e., backtrack to an arbitrary point rather than just the beginning), which may or may not make the code of permute-closures significantly simpler.
Copying generator states might be a bit easier in a language with first-class continuations (like Scheme), but if all generators follow some predefined structure, abstracting it away with a define-generator macro or some such should be possible in Common Lisp as well.
I would add to the counter either one of these:
being able to reset the counter to the start
letting the counter return NIL when the count is done and then starting from the first value again on the next call
