Delete an entry from a collection in Clojure

I'm new to Clojure and I'm wondering how I remove an element from a collection.
Say I have:
(def example ["a" "b" "c"])
I want to be able to remove say "b" and when I call
(println example)
have it print a collection with only "a" and "c".
I know using (remove (partial = "b") example)
will return what I want, but then how do I update the example variable with this?
Thanks!

(filter (fn [x] (not (= x "b"))) example)
Will get you '("a" "c"). Couple of points:
You shouldn't be thinking in terms of mutation. The whole point of using functional programming in general, and Clojure with its persistent data structures in particular, is to avoid the problems associated with mutability.
If you do really, really need something to be mutable you can use atoms, but if you're not sure you need it to be mutable, you don't.
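A minimal sketch of the non-mutating approach described above: rather than changing example, bind the filtered result to a new name. The name without-b is made up for illustration, and filterv is used here (instead of filter) only so that the result stays a vector.

```clojure
(def example ["a" "b" "c"])

;; filterv is the eager, vector-returning variant of filter.
(def without-b (filterv #(not= % "b") example))

(println without-b) ; prints [a c]
(println example)   ; prints [a b c]; the original is untouched
```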

First, check if you really need mutation in the first place. Clojure is designed around working with immutable data - there's a chance that what you ultimately want to do can be achieved without changing values in place.
If you need mutable data, you can create an atom for that - changing a var's value is generally a bad practice.
(def example (atom ["a" "b" "c"]))
(println @example) ;; prints [a b c]
(swap! example #(remove (partial = "b") %))
(println @example) ;; prints (a c), because remove returns a lazy seq rather than a vector

Clojure's default data structures are immutable. Hence you cannot change a vector in place, but you can create a new one with the desired elements.
In a function context, this is how you could use remove:
(defn my-func [col]
  (let [without-b (remove #(= "b" %) col)]
    (println without-b)
    ; do something else w/ without-b
    ))
...
=> (my-func ["a" "b" "c"])
(a c)
This is the idiomatic Clojure way to work with collections: you create a new collection from an old one. This doesn't have "significant" memory or performance implications, because the persistent data structures are implemented as trees (tries) that share structure between the old and the new version. You can learn more about this here:
https://hypirion.com/musings/understanding-persistent-vector-pt-1
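To make the "new collection from an old one" idea concrete, here is a small sketch. The names v1, v2 and v3 are illustrative only. It shows that conj hands back a new vector while the old one stays as it was, and that the lazy seq returned by remove can be poured back into a vector when a vector is needed.

```clojure
(def v1 [1 2 3])
(def v2 (conj v1 4))

v1 ;=> [1 2 3]
v2 ;=> [1 2 3 4]

;; remove returns a lazy seq; vec turns it back into a vector.
(def v3 (vec (remove odd? v2)))

v3 ;=> [2 4]
v2 ;=> [1 2 3 4]
```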

Related

Why exactly is filtering using a set more performant than filtering using a vector?

After some researching, I was recently able to dramatically improve the performance of some code by using a set to compare rather than a vector. Here is a simple example of the initial code:
(def target-ids ["a" "b" "c"])
(def maps-to-search-through
  [{:id "a" :value "example"}
   {:id "e" :value "example-2"}])
(filter (fn [i] (some #(= (:id i) %) target-ids)) maps-to-search-through)
And here is the optimised code:
(def target-ids #{"a" "b" "c"})
(def maps-to-search-through
  [{:id "a" :value "example"}
   {:id "e" :value "example-2"}])
(filter (comp target-ids :id) maps-to-search-through)
For reference, target-ids and maps-to-search-through are both generated dynamically, and can contain thousands of values each -- although maps-to-search-through will always be at least 5x larger than target-ids.
All advice and documentation I found online suggested this improvement, specifically using a set instead of a vector, would be significantly faster, but didn't elaborate on why that is. I understand that in the initial case, filter is doing a lot of work - iterating through both vectors on every step. But I don't understand how that isn't the case in the improved code.
Can anyone help explain?
Sets are data structures that are designed to only contain unique values. You can also use them as functions to check whether a given value is a member of the set, which is exactly how your target-ids set is used. On the JVM side this boils down to a hash-based lookup, so the cost of one membership check stays roughly constant no matter how many ids the set holds.
Your first solution loops through the vector using some for every map, so it is similar to a nested for loop: the work grows with the product of the two collection sizes instead of with the size of maps-to-search-through alone.
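The difference can be sketched as follows. This assumes the maps use keyword keys (:id, :value); the helper name in-vector? is hypothetical and exists only to label the linear-scan variant.

```clojure
(def target-ids #{"a" "b" "c"})

;; A set is a function of its elements: it returns the element if present, else nil.
(target-ids "a") ;=> "a"
(target-ids "z") ;=> nil

(def maps-to-search-through
  [{:id "a" :value "example"}
   {:id "e" :value "example-2"}])

;; Linear scan: walks the vector of ids again for every map.
(defn in-vector? [ids m]
  (some #(= (:id m) %) ids))

(filter (partial in-vector? ["a" "b" "c"]) maps-to-search-through)
;=> ({:id "a", :value "example"})

;; Hash lookup: one set lookup per map.
(filter (comp target-ids :id) maps-to-search-through)
;=> ({:id "a", :value "example"})
```

Both filters return the same maps; only the cost of each membership check differs.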

What's the difference between a sequence and a collection in Clojure

I am a Java programmer and am new to Clojure. From different places, I saw sequence and collection are used in different cases. However, I have no idea what the exact difference is between them.
For some examples:
1) In Clojure's documentation for Sequence:
The Seq interface
(first coll)
Returns the first item in the collection.
Calls seq on its argument. If coll is nil, returns nil.
(rest coll)
Returns a sequence of the items after the first. Calls seq on its argument.
If there are no more items, returns a logical sequence for which seq returns nil.
(cons item seq)
Returns a new seq where item is the first element and seq is the rest.
As you can see, when describing the Seq interface, the first two functions (first/rest) use coll, which seems to indicate this is a collection, while the cons function uses seq, which seems to indicate this is a sequence.
2) There are functions called coll? and seq? that can be used to test if a value is a collection or a sequence. Clearly, collection and sequence are different.
3) In Clojure's documentation about 'Collections', it is said:
Because collections support the seq function, all of the sequence
functions can be used with any collection
Does this mean all collections are sequences?
(coll? [1 2 3]) ; => true
(seq? [1 2 3]) ; => false
The code above tells me that is not the case, because [1 2 3] is a collection but is not a sequence.
I think this is a pretty basic question for Clojure, but I am not able to find a place that clearly explains what the difference is and which one I should use in different cases. Any comment is appreciated.
Any object supporting the core first and rest functions is a sequence.
Many objects satisfy this interface and every Clojure collection provides at least one kind of seq object for walking through its contents using the seq function.
So:
user> (seq [1 2 3])
(1 2 3)
And you can create a sequence object from a map too
user> (seq {:a 1 :b 2})
([:a 1] [:b 2])
That's why you can use filter, map, for, etc. on maps, sets and so on.
So you can treat many collection-like objects as sequences.
That's also why many sequence handling functions such as filter call seq on the input:
(defn filter
"Returns a lazy sequence of the items in coll for which
(pred item) returns true. pred must be free of side-effects."
{:added "1.0"
:static true}
([pred coll]
(lazy-seq
(when-let [s (seq coll)]
If you call (filter pred 5) and realize the result, you get:
Don't know how to create ISeq from: java.lang.Long
RT.java:505 clojure.lang.RT.seqFrom
RT.java:486 clojure.lang.RT.seq
core.clj:133 clojure.core/seq
core.clj:2523 clojure.core/filter[fn]
You can see that the seq call acts as the "can this object be treated as a sequence?" validation.
Most of this stuff is in Joy of Clojure chapter 5 if you want to go deeper.
Here are a few points that will help you understand the difference between collection and sequence.
"Collection" and "Sequence" are abstractions, not a property that can be determined from a given value.
Collections are bags of values.
A sequence is a kind of collection (a subset of collections) that is expected to be accessed in a sequential (linear) manner.
The figure below best describes the relation between them:
You can read more about it here.
Every sequence is a collection, but not every collection is a sequence.
The seq function makes it possible to convert a collection into a sequence. E.g. for a map you get a list of its entries. That list of entries is different from the map itself, though.
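The points above can be checked at a REPL. This is a small sketch using only core predicates; the name m is illustrative.

```clojure
(coll? [1 2 3])        ;=> true
(seq? [1 2 3])         ;=> false, a vector is not itself a seq
(seq? (seq [1 2 3]))   ;=> true
(coll? (seq [1 2 3]))  ;=> true, a seq is still a collection

(def m {:a 1 :b 2})

(seq? m)       ;=> false
(seq m)        ;=> ([:a 1] [:b 2])
(= m (seq m))  ;=> false, the seq of entries is not the map itself
(first m)      ;=> [:a 1], first calls seq on its argument
```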
In Clojure for the brave and true the author sums it up in a really understandable way:
The collection abstraction is closely related to the sequence
abstraction. All of Clojure's core data structures — vectors, maps,
lists and sets — take part in both abstractions.
The abstractions differ in that the sequence abstraction is "about"
operating on members individually while the collection abstraction is
"about" the data structure as a whole. For example, the collection
functions count, empty?, and every? aren't about any individual
element; they're about the whole.
I have just been through Chapter 5 - "Collection Types" of "The Joy of Clojure", which is a bit confusing (i.e. the next version of that book needs a review). In Chapter 5, on page 86, there is a table which I am not fully happy with:
So here's my take (fully updated after coming back to this after a month of reflection).
collection
It's a "thing", a collection of other things.
This is based on the function coll?.
The function coll? can be used to test for this.
Conversely, anything for which coll? returns true is a collection.
The coll? docstring says:
Returns true if x implements IPersistentCollection
Things that are collections are grouped into three separate classes. Things in different classes are never equal.
Maps. Test using (map? foo)
  Map (two actual implementations with slightly differing behaviours)
  Sorted map. Note: (sequential? (sorted-map :a 1)) ;=> false
Sets. Test using (set? foo)
  Set
  Sorted set. Note: (sequential? (sorted-set :a :b)) ;=> false
Sequential collections. Test using (sequential? foo)
  List
  Vector
  Queue
  Seq: (sequential? (seq [1 2 3])) ;=> true
  Lazy-Seq: (sequential? (lazy-seq (seq [1 2 3]))) ;=> true
The Java interop stuff is outside of this:
(coll? (to-array [1 2 3])) ;=> false
(map? (doto (new java.util.HashMap) (.put "a" 1) (.put "b" 2))) ;=> false
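The claim that things in different classes are never equal, while things within one class can be, can be sketched with a few comparisons. Nothing here goes beyond clojure.core and the built-in empty queue.

```clojure
;; Within the sequential class, equality ignores the concrete type.
(= [1 2 3] '(1 2 3))        ;=> true
(= [1 2 3] (seq [1 2 3]))   ;=> true

;; Across classes, equality is always false.
(= [1 2 3] #{1 2 3})        ;=> false, sequential vs set
(= {:a 1} [[:a 1]])         ;=> false, map vs sequential

;; Within the set class, sorted and hash sets can be equal.
(= (sorted-set 1 2) #{1 2}) ;=> true

;; A queue counts as sequential; a string does not.
(sequential? clojure.lang.PersistentQueue/EMPTY) ;=> true
(sequential? "abc")                              ;=> false
```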
sequential collection (a "chain")
It's a "thing", a collection holding other things according to a specific, stable ordering.
This is based on the function sequential?.
The function sequential? can be used to test for this.
Conversely, anything for which sequential? returns true is a sequential collection.
The sequential? docstring says:
Returns true if coll implements Sequential
Note: "sequential" is an adjective! In "The Joy of Clojure", the adjective is used as a noun and this is really, really, really confusing:
"Clojure classifies each collection data type into one of three
logical categories or partitions: sequentials, maps, and sets."
Instead of "sequential" one should use a "sequential thing" or a "sequential collection" (as used above). On the other hand, in mathematics the following words already exist: "chain", "totally ordered set", "simply ordered set", "linearly ordered set". "chain" sounds excellent but no-one uses that word. Shame!
"Joy of Clojure" also has this to say:
Beware type-based predicates!
Clojure includes a few predicates with names like the words just
defined. Although they’re not frequently used, it seems worth
mentioning that they may not mean exactly what the definitions here
might suggest. For example, every object for which sequential? returns
true is a sequential collection, but it returns false for some that
are also sequential [better: "that can be considered sequential
collections"]. This is because of implementation details that may be
improved in a future version of Clojure [and maybe this has already been
done?]
sequence (also "sequence abstraction")
This is more a concept than a thing: a series of values (thus ordered) which may or may not exist yet (i.e. a stream). If you say that a thing is a sequence, is that thing also necessarily a Clojure collection, even a sequential collection? I suppose so.
That sequential collection may have been completely computed and be completely available. Or it may be a "machine" to generate values on need (by computation - likely in a "pure" fashion - or by querying external "impure", "oracular" sources: keyboard, databases)
seq
This is a thing: something that can be processed by the functions
first, rest, next, cons (and possibly others?), i.e. something that implements the interface clojure.lang.ISeq. ISeq is an ordinary Java interface, so this really is "providing an implementation for an interface" in the Java sense.
This is based on the function seq?.
The function seq? can be used to test for this
Conversely, a seq is anything for which seq? returns true.
Docstring for seq?:
Return true if x implements ISeq
Docstring for first:
Returns the first item in the collection. Calls seq on its argument.
If coll is nil, returns nil.
Docstring for rest:
Returns a possibly empty seq of the items after the first. Calls seq
on its argument.
Docstring for next:
Returns a seq of the items after the first. Calls seq on its argument.
If there are no more items, returns nil.
You call next on the seq to generate the next element and a new seq. Repeat until nil is obtained.
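The "repeat until nil" walk can be sketched as a loop. The helper name seq->vec is hypothetical and chosen only for this illustration; in real code (vec coll) does the same job.

```clojure
(defn seq->vec
  "Hypothetical helper: copies the items of coll into a vector by walking its seq."
  [coll]
  (loop [s (seq coll)   ; nil if coll is empty
         acc []]
    (if s
      (recur (next s) (conj acc (first s)))   ; next returns nil at the end
      acc)))

(seq->vec '(1 2 3)) ;=> [1 2 3]
(seq->vec #{})      ;=> []
(seq->vec "abc")    ;=> [\a \b \c]

;; The difference between next and rest at the end of a seq:
(next [1]) ;=> nil
(rest [1]) ;=> ()
```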
Joy of Clojure calls this a "simple API for navigating collections" and says "a seq is any object that implements the seq API" - which is correct if "the API" is the ensemble of the "thing" (of a certain type) and the functions which work on that thing. It depends on a suitable shift in the concept of API.
A note on the special case of the empty seq:
(def empty-seq (rest (seq [:x])))
(type empty-seq) ;=> clojure.lang.PersistentList$EmptyList
(nil? empty-seq) ;=> false ... empty seq is not nil
(some? empty-seq) ;=> true ("true if x is not nil, false otherwise.")
(first empty-seq) ;=> nil ... first of empty seq is nil ("does not exist"); beware confusing this with a nil in a nonempty list!
(next empty-seq) ;=> nil ... "next" of empty seq is nil
(rest empty-seq) ;=> () ... "rest" of empty seq is the empty seq
(type (rest empty-seq)) ;=> clojure.lang.PersistentList$EmptyList
(seq? (rest empty-seq)) ;=> true
(= (rest empty-seq) empty-seq) ;=> true
(count empty-seq) ;=> 0
(empty? empty-seq) ;=> true
Addenda
The function seq
If you apply the function seq to a thing for which that makes sense (generally a sequential collection), you get a seq representing/generating the members of that collection.
The docstring says:
Returns a seq on the collection. If the collection is empty, returns
nil. (seq nil) returns nil. seq also works on Strings, native Java
arrays (of reference types) and any objects that implement Iterable.
Note that seqs cache values, thus seq should not be used on any
Iterable whose iterator repeatedly returns the same mutable object.
After applying seq, you may get objects of various actual classes:
clojure.lang.Cons - try (class (seq (map #(* % 2) '( 1 2 3))))
clojure.lang.PersistentList
clojure.lang.APersistentMap$KeySeq
clojure.lang.PersistentList$EmptyList
clojure.lang.PersistentHashMap$NodeSeq
clojure.lang.PersistentQueue$Seq
clojure.lang.PersistentVector$ChunkedSeq
If you apply seq to a sequence, the actual class of the thing returned may be different from the actual class of the thing passed in. It will still be a sequence.
What the "elements" in the sequence are depends. For example, for maps, they are key-value pairs which look like 2-element vector (but their actual class is not really a vector).
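A short sketch of what a map's seq elements look like in practice. The name e is illustrative; the checks use only core functions and the standard java.util.Map$Entry interface.

```clojure
(def e (first (seq {:a 1})))

(key e)      ;=> :a
(val e)      ;=> 1
(= e [:a 1]) ;=> true, it compares equal to a 2-element vector

;; It is a map entry, not a plain PersistentVector.
(instance? java.util.Map$Entry e)           ;=> true
(instance? clojure.lang.PersistentVector e) ;=> false

;; Destructuring works as it would for a vector.
(let [[k v] e] [k v]) ;=> [:a 1]
```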
The function lazy-seq
Creates a thing to generate more things lazily (a suspended machine, a suspended stream, a thunk)
The docstring says:
Takes a body of expressions that returns an ISeq or nil, and yields a
Seqable object that will invoke the body only the first time seq is
called, and will cache the result and return it on all subsequent seq
calls. See also - realized?
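The "invoke the body only the first time" behaviour can be observed with a counter. This is a sketch: the names calls and s are made up, and the swap! inside the body is there only to count invocations (side effects in a lazy-seq body are otherwise best avoided).

```clojure
(def calls (atom 0))

(def s
  (lazy-seq
    (swap! calls inc)   ; side effect, only here to count invocations
    (list 1 2 3)))

(realized? s) ;=> false, nothing has run yet
@calls        ;=> 0

(first s)     ;=> 1, forces the body
(first s)     ;=> 1, served from the cache

(realized? s) ;=> true
@calls        ;=> 1, the body ran exactly once
```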
A note on "functions" and "things" ... and "objects"
In the Clojure Universe, I like to talk about "functions" and "things", but not about "objects", which is a term heavily laden with Java-ness and other badness. Mention of objects feels like shards poking up from the underlying Java universe.
What is the difference between function and thing?
It's fluid! Some stuff is pure function, some stuff is pure thing, some is in between (can be used as function and has attributes of a thing)
In particular, Clojure allows contexts where one considers keywords (things) as functions (to look up values in maps) or where one interprets maps (things) as functions, or shorthand for functions (which take a key and return the value associated with that key in the map).
Evidently, functions are things as they are "first-class citizens".
It's also contextual! In some contexts, a function becomes a thing, or a thing becomes a function.
There are nasty mentions of objects ... these are shards poking up from the underlying Java universe.
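The "thing used as a function" cases mentioned above look like this in code. A small sketch; the name m is illustrative.

```clojure
(def m {:a 1 :b 2})

;; A keyword used as a function looks itself up in a map.
(:a m)       ;=> 1
(:c m :none) ;=> :none, optional default when the key is absent

;; A map used as a function looks up its argument.
(m :b)       ;=> 2

;; Both can be passed wherever a function is expected.
(map :a [{:a 1} {:a 2}]) ;=> (1 2)
(map m [:a :b])          ;=> (1 2)

;; Keywords are invokable, but they are not fn objects.
(ifn? :a) ;=> true
(fn? :a)  ;=> false
```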
For presentation purposes, a diagram of Collections
For seq?:
Return true if x implements ISeq
For coll?:
Returns true if x implements IPersistentCollection
And I found that the ISeq interface extends IPersistentCollection in the Clojure source code, so as Rörd said, every sequence is a collection.
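The interface relationship can also be checked from the REPL without opening the source. A minimal sketch using isa? and instance? on the clojure.lang interfaces:

```clojure
;; ISeq is a sub-interface of IPersistentCollection, not the other way round.
(isa? clojure.lang.ISeq clojure.lang.IPersistentCollection) ;=> true
(isa? clojure.lang.IPersistentCollection clojure.lang.ISeq) ;=> false

;; So every seq passes coll?, but a vector does not pass seq?.
(instance? clojure.lang.IPersistentCollection (seq [1 2 3])) ;=> true
(instance? clojure.lang.ISeq [1 2 3])                        ;=> false
```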

Checking the reverse of a list is the same as the list unchanged?

I am learning Scheme and one of the things I have to do is use recursion to figure out if a list is reflective, i.e. the list looks the same when it is reversed. I have to do it with primitives, so I can't use the built-in reverse procedure for lists. I must also use recursion. The problem is that in Scheme it's very hard to access or shorten the list using the very basic constructs we have learned, since these are kind of like linked lists. I also want to do it without using indexing.
With that said, I have a few ideas and was wondering if any of these are sufficient, and whether you think I could do it better with the basics of Scheme.
1. Reverse the list using recursion (my own implementation) and compare the original with the reversed list.
2. Compare the first and last element by recursing on the rest of the list to find the last element and comparing it to the first. Keep track of how many times I have recursed, then recurse one step less to reach the second-to-last element and compare it to the second element, and so on. (This is very complicated to do; I tried it and wound up failing, but I want to see whether you would have done the same.)
3. Shorten the list by trimming off the first and last element each time and compare. I'm not sure if this can be done using the basics of Scheme.
Any suggestions or hints are welcome. I'm very new to Scheme. Thanks for reading. I know it's long.
Checking if a list is a palindrome without reversing it is one of the examples of a technique explained in "There and Back Again" by Danvy and Goldberg. They do it in (ceil (/ (length lst) 2)) recursive calls, but there's a simpler version that does it in (length lst) calls.
Here's the skeleton of the solution:
(define (pal? orig-lst)
;; pal* : list -> list/#f
;; If lst contains the first N elements of orig-lst in reverse order,
;; where N = (length lst), returns the Nth tail of orig-lst.
;; Otherwise, returns #f to indicate orig-lst cannot be a palindrome.
(define (pal* lst)
....)
.... (pal* orig-lst) ....)
This sounds like it might be homework, so I don't want to fill in all of the blanks.
I agree that #1 seems the way to go here. It's simple, and I can't imagine it failing. Maybe I don't have a strong enough imagination. :)
The other options you're considering seem awkward because we're talking about linked lists, which directly support sequential access but not random access. As you note, "indexing" into a linked list is verboten. It ends up marching dutifully down the list structure. Because of this, the other options are certainly doable, but they are expensive.
That expense isn't because we're in Scheme: it's because we're dealing with linked lists. Just to make sure it's clear: Scheme has vectors (arrays) which do support fast random-access. Testing palindrome-ness on a vector is as easy as you'd expect:
#lang racket
;; In Professional-level Racket
(define (palindrome? vec)
  (for/and ([i (in-range 0 (vector-length vec))]
            [j (in-range (sub1 (vector-length vec)) -1 -1)])
    (equal? (vector-ref vec i)
            (vector-ref vec j))))
;; Examples
(palindrome? (vector "b" "a" "n" "a" "n" "a"))
(palindrome? (vector "a" "b" "b" "a"))
So one point of the problem you're tackling, I think, is to show that the data structure you choose (the representation of the problem) can have a strong effect on the problem solution.
(Aside: #2 is certainly doable, though it goes against the grain of the singly linked list data structure. The approach to #3 requires a radical change to the representation: at first glance, I think you'd need mutable, doubly linked lists to do the solution any justice, since you need to be able to march backward.)
To answer dyoo's question, you don't have to know that you're at the halfway point; you just have to make a certain comparison. If that comparison works, then a) the string must be reversible and b) you must be at the midway point.
So that more efficient solution is there, if you can just reach for it...

Tying the knot in Clojure: circular references without (explicit, ugly) mutation?

In my answer at Clojure For Comprehension example I have a function that processes its own output:
(defn stream [seed]
  (defn helper [slow]
    (concat (map #(str (first slow) %) seed) (lazy-seq (helper (rest slow)))))
  (declare delayed)
  (let [slow (cons "" (lazy-seq delayed))]
    (def delayed (helper slow))
    delayed))
(take 25 (stream ["a" "b" "c"]))
("a" "b" "c" "aa" "ab" "ac" "ba" "bb" "bc" "ca" "cb" "cc" "aaa" "aab" "aac"
"aba" "abb" "abc" "aca" "acb" "acc" "baa" "bab" "bac" "bba")
It works by creating a forward reference (delayed) which is used as the second entry in a lazy sequence (slow). That sequence is passed to the function, which is lazy, and the output from that function (the very first part of the lazy sequence, which does not require the evaluation of delayed) is then used to set the value of delayed.
In this way I "tie the knot". But this is done much more elegantly in Haskell (eg. Explanation of “tying the knot”). Given that Clojure has delay and force, I wondered if there was a better way to do the above?
The question then: can the (ugly, explicit) mutation (of delayed) somehow be avoided in the code above? Obviously(?) you still need mutation, but can it be hidden by "lazy" constructs?
[I had a question last night with a similar title when I was still trying to understand how to do this; no one replied before the code above worked, so I deleted it, but I am not really happy with this approach so am trying again.]
See also: Must Clojure circular data structures involve constructs like ref? (kinda frustrating that people are duplicating questions).
I'm not sure I can answer the question for the general case, but this function seems to solve the particular case.
(defn stream
  [seed]
  (let [step (fn [prev] (for [p prev s seed] (str p s)))]
    (for [x (iterate step seed) y x] y)))
Although I ran into an out-of-memory exception for a large (dorun (take ...)), so there is probably an issue with this function.

Put an element to the tail of a collection

I find myself doing a lot of:
(concat coll [e]) where coll is a collection and e a single element.
Is there a function for doing this in Clojure? I know conj does the job best for vectors but I don't know up front which coll will be used. It could be a vector, list or sorted-set for example.
Some types of collections can add cheaply to the front (lists, seqs), while others can add cheaply to the back (vectors, queues, kinda-sorta lazy-seqs). Rather than using concat, if possible you should arrange to be working with one of those types (vector is most common) and just conj to it: (conj [1 2 3] 4) yields [1 2 3 4], while (conj '(1 2 3) 4) yields (4 1 2 3).
concat does not add an element to the tail of a collection, nor does it concatenate two collections.
concat returns a seq made of the concatenation of two other seqs. The original types of the collections from which those seqs were derived are lost in the return type of concat.
Clojure collections have different properties that one must know about in order to write efficient code. That's why there isn't a universal function available in core to concatenate collections of any kind together.
By contrast, lists and vectors do have "natural insertion positions", which conj knows about, so it does what is right for the kind of collection.
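A short sketch of both points: conj keeps the collection type and picks the insertion position, while concat always returns a plain seq.

```clojure
;; conj adds at the natural insertion position and keeps the type.
(conj [1 2 3] 4)          ;=> [1 2 3 4]
(conj '(1 2 3) 4)         ;=> (4 1 2 3)
(conj (sorted-set 1 3) 2) ;=> #{1 2 3}

;; concat returns a lazy seq whatever the inputs were.
(concat [1 2 3] [4])           ;=> (1 2 3 4)
(vector? (concat [1 2 3] [4])) ;=> false
(set? (concat #{1 2} [3]))     ;=> false
```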
This is a very small addendum to @amalloy's answer in order to address OP's request for a function that always adds to the tail of whatever kind of collection. This is an alternative to (concat coll [x]). Just create a vector version of the original collection:
(defn conj*
  [s x]
  (conj (vec s) x))
Caveats:
If you started with a lazy sequence, you've now destroyed the laziness--i.e. the output is not lazy. This may be either a good thing or a bad thing, depending on your needs.
There's some cost to creating the vector. If you need to call this function a lot, and you find (e.g. by benchmarking with Criterium) that this cost is significant for your purposes, then follow the other answers' advice to try to use vectors in the first place.
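A few sample calls show what the caveats mean in practice. The definition of conj* is repeated from the answer so the snippet stands on its own; the inputs are made up for illustration.

```clojure
(defn conj*
  [s x]
  (conj (vec s) x))

(conj* '(1 2 3) 4)         ;=> [1 2 3 4], a list comes back as a vector
(conj* (sorted-set 2 1) 3) ;=> [1 2 3], the set's order is kept but it is no longer a set
(conj* (take 3 (range)) 3) ;=> [0 1 2 3], the lazy seq is fully realized
(conj* nil 1)              ;=> [1]

;; Caution: (conj* (range) 0) would never return, because vec
;; tries to realize the whole infinite sequence.
```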
To distill the best of what amalloy and Laurent Petit have already said: use the conj function.
One of the great abstractions that Clojure provides is the Sequence API, which includes the conj function. If at all possible, your code should be as collection-type agnostic as it can be, instead using the seq API to handle operations on collections and picking a particular collection type only when you need to be specific.
If vectors are a good match, then yes, conj will be adding items onto the end. If you use lists instead, then conj will be adding things to the front of your collection. But if you then use the standard API functions for pulling items from the "top" of a collection (the back of a vector, the front of a list), it doesn't matter which implementation you use, because it will always use the one with the best performance, and thus adding and removing items will be consistent.
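As a sketch of that consistency, and assuming the functions the answer has in mind are peek and pop (the core functions that read and remove at the same end conj adds to), a vector and a list behave the same way as a stack. The names stack-v and stack-l are illustrative.

```clojure
(def stack-v (conj [1 2] 3))  ; value is [1 2 3], added at the back
(def stack-l (conj '(1 2) 3)) ; value is (3 1 2), added at the front

;; peek and pop work at the end where conj added the item.
(peek stack-v) ;=> 3
(pop stack-v)  ;=> [1 2]

(peek stack-l) ;=> 3
(pop stack-l)  ;=> (1 2)
```

Note that first and rest would not give this symmetry: (first stack-v) is 1, not 3.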
If you are working with lazy sequences, you can also use lazy-cat:
(take 5 (lazy-cat (range) [1])) ; (0 1 2 3 4)
Or you could make it a utility function:
(defn append [coll & items] (lazy-cat coll items))
Then use it like this:
(take 5 (append (range) 1)) ; (0 1 2 3 4)

Resources