Convert multidimensional array to nested vector in Clojure?

to-array-2d is a handy function for converting a collection of collections into a 2D Java array. Is there a function to go the other way?
I would like to get a vector of vectors from a 2D Java array.

You could do:
(mapv vec the-array)
In that case, though, take into account what the documentation of vec says:
clojure.core/vec
([coll])
Creates a new vector containing the contents of coll. Java arrays
will be aliased and should not be modified.
If you prefer to make a copy (less efficient but safer), copy each row with into, as leeor suggests in the comments. A shorter version:
(mapv #(into [] %) the-array)
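The aliasing caveat is easier to see with a small experiment. The sketch below is only a loose analogy in Python, not Clojure semantics: Python lists are mutable, and the names rows, aliased and copied are made up for illustration. It contrasts sharing the row objects with copying each row.

```python
# Hypothetical illustration of aliasing versus copying; not Clojure semantics.
rows = [[1, 2], [3, 4]]           # stands in for the 2D Java array

aliased = list(rows)              # new outer list, but the same inner row objects
copied = [list(r) for r in rows]  # new outer list and a fresh copy of every row

rows[0][0] = 99                   # mutate the original "array" afterwards

# aliased[0][0] now reflects the mutation; copied[0][0] still holds the old value.
```

The copy costs one extra allocation per row, which is the "less efficient but safer" trade-off mentioned above.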

Related

Best way to reduce/fold over 2D array in Common Lisp

Emacs Lisp has reduce-vec. What's the proper way to do this in Common Lisp, without using loop or reinventing the wheel?
You should be able to use something like the following. It works for arrays of any dimensions.
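(defun reduce-multidimensional-array (fn arr &rest args)
  ;; Reduce over a one-dimensional array displaced onto ARR; ARGS are passed on to REDUCE.
  (apply #'reduce
         fn
         (make-array (array-total-size arr)
                     :element-type (array-element-type arr) ; must match for specialized arrays
                     :displaced-to arr)
         args))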
In short, this works by creating a one-dimensional array that shares elements with the array passed in. Since reduce works on sequences, and a one-dimensional array (a vector) is a sequence, the new array can be reduced directly.
The function array-total-size returns the total number of elements in the array and the :displaced-to keyword argument causes the new array to share elements with the array passed in (even if they have different dimensions).
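For comparison, here is a rough sketch of the same idea in Python. It assumes nested lists stand in for a multidimensional array, and the helper names flat_view and reduce_multidimensional are hypothetical. Python has no displaced arrays, so a generator plays the role of the one-dimensional view: it walks the elements in row-major order without copying them.

```python
from functools import reduce


def flat_view(arr):
    """Lazily yield the leaves of a nested list in row-major order, without copying."""
    for item in arr:
        if isinstance(item, list):
            yield from flat_view(item)
        else:
            yield item


def reduce_multidimensional(fn, arr, *args):
    """Reduce over every element; extra args (e.g. an initial value) go to reduce."""
    return reduce(fn, flat_view(arr), *args)


grid = [[1, 2, 3], [4, 5, 6]]
total = reduce_multidimensional(lambda a, b: a + b, grid)
largest = reduce_multidimensional(max, [[[1, 7], [3, 2]], [[9, 0], [4, 4]]])
```

As in the Lisp version, any extra arguments are simply forwarded to the underlying reduce.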

How do you iterate in functional languages?

One reason that pushes me away from functional languages like Lisp is that I have no idea how to do a 'raw' array iteration. Say I have an array in C that represents the screen pixels' RGB values. Changing colors is trivial with a for loop in C, but how do you do this elegantly in Lisp?
EDIT:
Sorry, I haven't phrased my question correctly.
In C, when I want to change color on the screen, I simply write a for loop over a part of the array.
But in Scheme, Clojure or Haskell all data is immutable. So when I change a part of a matrix, I get back a brand new matrix. That's a bit awkward. Is there a 'clean' way to change the color of a part of a matrix without recursing over the whole array and making copies?
In a functional language, you would use recursion.
The recursion scheme can be named.
For example, to recurse over an array of data, applying a function to each pixel, you can manually recurse over the structure of the array:
map f [] = []
-- the empty list
map f (x:xs) = f x : map f xs
-- apply f to the head of the list, and recurse on the tail.
(in Haskell).
This recursive form is so common it is called map in most libraries.
To "iterate" through an array in some language like Lisp is a simple map.
The structure is (map f x) where f is a function you want applied to every element of the list/array x.
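To make the correspondence concrete, here is a minimal sketch in Python of the same recursion, next to the built-in map. The names my_map and brighten, and the pixel values, are invented for the example.

```python
def my_map(f, xs):
    """Recursive map over a list, mirroring the two Haskell equations."""
    if not xs:                  # map f [] = []
        return []
    head, *tail = xs            # the (x:xs) pattern
    return [f(head)] + my_map(f, tail)


def brighten(pixel):
    """Hypothetical pixel transform: add 10 to each RGB channel, capped at 255."""
    r, g, b = pixel
    return (min(r + 10, 255), min(g + 10, 255), min(b + 10, 255))


pixels = [(0, 0, 0), (250, 128, 64)]
brighter = my_map(brighten, pixels)    # a new list; pixels itself is unchanged
same = list(map(brighten, pixels))     # the built-in map does the same job
```

Python does not optimize deep recursion, so for real data the built-in map is the practical choice there. The recursive version is only meant to show what map stands for.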

Generating a bitvector in scheme

I am trying to implement a spellchecker that takes a hash function and a dictionary, and then maps the hash values of the words to a bitvector. More specifically, I am trying to write a function called gen-checker that takes as input a list of hash functions and a dictionary of words and returns a spellchecker. The spellchecker must generate a bitvector representation for the input of the dictionary, which contains #t or #f indicating the correct or incorrect spelling of the word.
I have already defined the hash functions and have a dictionary to use, but I can't seem to get the bitvector set up.
I have tried implementing (make-bitvector 8 #f) found here:
http://www.gnu.org/software/guile/manual/html_node/Bit-Vectors.html
But for some reason DrRacket does not recognize it. What am I doing wrong? How can I implement the bitvector representation?
It may seem like this answer is joking, but it is not:
(define make-bitvector make-vector)
(define bitvector-ref vector-ref)
;; ...
After everything is working, and only then, would one need to optimize storage by bit packing.
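The "plain vector first" approach might look roughly like the sketch below, written in Python for illustration. Everything here is an assumption layered on the question: the size parameter, the toy hash functions h1 and h2, and the decision to accept a word only when every hash lands on a set slot.

```python
def gen_checker(hash_functions, dictionary, size=64):
    """Return a spell checker backed by a plain list of booleans.

    `size` is a made-up parameter: the number of slots in the "bit vector".
    """
    bits = [False] * size                    # like (make-vector size #f)
    for word in dictionary:
        for h in hash_functions:
            bits[h(word) % size] = True      # like (vector-set! bits i #t)

    def checker(word):
        # Accept a word only if every hash function lands on a set slot.
        return all(bits[h(word) % size] for h in hash_functions)

    return checker


# Two toy hash functions, chosen only to keep the example deterministic.
def h1(word):
    return sum(ord(c) for c in word)


def h2(word):
    return len(word) * 31 + ord(word[0])


check = gen_checker([h1, h2], ["hello", "world"])
```

With this scheme a misspelled word can still be accepted if its hashes happen to collide with set slots, so the table size and the hash functions matter. Packing the booleans into real bits can come later without changing the interface.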

The complexity of Scheme vectors

The 6.3.6 Vectors section in the Scheme R5RS standard states the following about vectors:
Vectors are heterogenous structures whose elements are indexed by integers. A vector typically occupies less space than a list of the same length, and the average time required to access a randomly chosen element is typically less for the vector than for the list.
This description of vectors is a bit diffuse.
I'd like to know what this actually means in terms of the vector-ref and list-ref operations and their complexity. Both procedures return the k-th element, of a vector and of a list respectively. Is the vector operation O(1) and is the list operation O(n)? How are vectors different from lists? Where can I find more information about this?
Right now I'm using association lists as a data structure for storing key/value pairs for easy lookup. If the keys are integers it would perhaps be better to use vectors to store the values.
The precise details of vector-ref and list-ref are implementation-dependent: each Scheme implementation can realize the specification as it sees fit, so the answer cannot be generalized to every implementation conforming to R5RS. It depends on the one you are actually using.
But yes, in any decent implementation it is a safe bet that vector-ref is O(1) and that list-ref is O(n). Why? Because a vector, under the hood, should be implemented using a data structure native to the implementation language that allows O(1) access to an element given its index (say, a primitive array), which makes the implementation of vector-ref straightforward. Lists in Lisp, by contrast, are created by linking cons cells, and finding the element at a given index entails traversing all the elements before it in the list, hence O(n) complexity.
As a side note: yes, using vectors would be faster than using association lists of key/value pairs, as long as the keys are integers and the number of elements to be indexed is known beforehand (a Scheme vector cannot grow after its creation). For the general case (keys other than integers, variable size), check whether your interpreter supports hash tables, or use an external library that provides them (say, SRFI 69).
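The three lookup strategies mentioned above can be put side by side in a small sketch. It is written in Python purely for illustration; the data and the name assoc_lookup are hypothetical, and a Python list and dict stand in for a Scheme vector and hash table.

```python
# Hypothetical data: integer keys 0..3 mapped to values.
alist = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]   # association list of pairs


def assoc_lookup(key, pairs):
    """Linear scan over the pairs: O(n) in the number of entries."""
    for k, v in pairs:
        if k == key:
            return v
    return None


# Vector-style storage: the key is the index, so a lookup is one indexing step.
vector = [None] * len(alist)    # size fixed up front
for k, v in alist:
    vector[k] = v

# Hash-table fallback for keys that are not small integers.
table = dict(alist)
```

The vector version only works because the keys are dense, non-negative integers known in advance, which is exactly the condition stated above.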
A list is constructed from cons cells. From the R5RS list section:
The objects in the car fields of successive pairs of a list are the elements of the list. For example, a two-element list is a pair whose car is the first element and whose cdr is a pair whose car is the second element and whose cdr is the empty list. The length of a list is the number of elements, which is the same as the number of pairs.
For example, the list (a b c) is equivalent to the following series of pairs: (a . (b . (c . ())))
And could be represented in memory by the following "nodes":
[p] --> [p] --> [p] --> null
 |       |       |
 |==> a  |==> b  |==> c
With each node [] containing a pointer p to the value (its car), and another pointer to the next element (its cdr).
This allows the list to grow to an unlimited length, but requires a ref operation to start at the front of the list and traverse k elements in order to find the requested one. As you stated, this is O(n).
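A minimal sketch of that traversal, in Python for illustration only: the Pair class and the step counter are assumptions added to make the cost visible, not part of any Scheme implementation.

```python
class Pair:
    """A cons cell: `car` holds the value, `cdr` points to the next pair (or None)."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


def list_ref(pair, k):
    """Return (element, steps): the k-th element (zero-based) and the cdr hops taken."""
    steps = 0
    while k > 0:
        pair = pair.cdr     # follow the pointer to the next cell
        k -= 1
        steps += 1
    return pair.car, steps


# The list (a b c), i.e. (a . (b . (c . ())))
abc = Pair("a", Pair("b", Pair("c")))
```

Fetching element k always costs k hops, so the work grows linearly with the index.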
By contrast, a vector is basically an array of values which could be internally represented as an array of pointers. For example, the vector #(a b c) might be represented as:
[p p p]
 | | |
 | | |==> c
 | |
 | |==> b
 |
 |==> a
Where the array [] contains a series of three pointers, and each pointer is assigned to a value in the vector. So internally you could reference the third element of the vector v using the notation v[2] (counting from zero, as vector-ref does). Since you do not need to traverse the previous elements, vector-ref is an O(1) operation.
The main disadvantage is that vectors are of fixed size, so if you need to add more elements than the vector can hold, you have to allocate a new vector and copy the old elements to this new vector. This can potentially be an expensive operation if your application does this on a regular basis.
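The reallocation step can be sketched as follows, in Python for illustration. The function name grow and its fill parameter are hypothetical, and a Python list is treated as if it were a fixed-size vector.

```python
def grow(vector, new_size, fill=None):
    """Allocate a larger fixed-size vector and copy the old elements into it (O(n))."""
    if new_size < len(vector):
        raise ValueError("new_size must not be smaller than the current size")
    bigger = [fill] * new_size           # allocate the new vector
    for i, item in enumerate(vector):    # copy every old element across
        bigger[i] = item
    return bigger


v = ["a", "b", "c"]
v2 = grow(v, 6)
```

Every call copies all existing elements, which is why growing one slot at a time is costly. Growing in larger steps keeps the number of copies down.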
There are many resources online - this article on Scheme Data Structures goes into more detail and provides some examples, although it is much more focused on lists.
All that said, if your keys are (or can become) integers and you either have a fixed number of elements or can manage with a reasonable amount of vector reallocations - for example, you load the vector at startup and then perform mostly reads - a vector may be an attractive alternative to an association list.

Into or vec: converting sequence back to vector in Clojure

I have the following code which increments the first element of every pair in a vector:
(vec (map (fn [[key value]] [(inc key) value]) [[0 :a] [1 :b]]))
However, I fear this code is inelegant, as it first creates a sequence using map and then casts it back to a vector.
Consider this analog:
(into [] (map (fn [[key value]] [(inc key) value]) [[0 :a] [1 :b]]))
On #clojure at irc.freenode.net I was told that using the code above is bad, because into expands into (reduce conj [] (map-indexed ...)), which produces many intermediate objects in the process. Then I was told that into actually doesn't expand into (reduce conj ...) and uses transients when it can. Measuring elapsed time also showed that into is actually faster than vec.
So my questions are:
What is the proper way to use map over vectors?
What happens underneath when I use vec and into with vectors?
Related but not duplicate questions:
Clojure: sequence back to vector
How To Turn a Reduce-Realized Sequence Back Into Lazy Vector Sequence
Actually as of Clojure 1.4.0 the preferred way of doing this is to use mapv, which is like map except its return value is a vector. It is by far the most efficient approach, with no unnecessary intermediate allocations at all.
Clojure 1.5.0 will bring a new reducers library which will provide a generic way to map, filter, take, drop etc. while creating vectors, usable with into []. You can play with it in the 1.5.0 alphas and in the recent tagged releases of ClojureScript.
As for (vec some-seq) and (into [] some-seq), the first ultimately delegates to a Java loop which pours some-seq into an empty transient vector, while the second does the same thing in very efficient Clojure code. In both cases there are some initial checks involved to determine which approach to take when constructing the final return value.
vec and into [] are significantly different for Java arrays of small length (up to 32) -- the first will alias the array (use it as the tail of the newly created vector) and demands that the array not be modified subsequently, lest the contents of the vector change (see the docstring); the latter creates a new vector with a new tail and doesn't care about future changes to the array.
