What is the correct definition of destructive and non-destructive constructs in Lisp (or in general)? I have tried to search for the actual meaning, but I have only found lots of usages of these terms without an actual explanation.
As I understand it, a destructive function is one that changes the meaning of the construct (or variable): when I pass a list as a parameter to a function that changes it, the operation is called destructive because it changes the initial list instead of returning a brand new one. Is this right, or are there exceptions?
So is, for example, set a destructive function (because it changes the value of x)? I think not, but I do not know how I would justify that.
(set 'x 1)
Sorry for what is probably a very basic question... Thanks for any answers!
I would not read too much into the word 'destructive'.
In list processing, a destructive operation is one that potentially changes one or more of the input lists as a visible side effect.
Now, you can widen the meaning to operations over arrays, structures, CLOS objects, etc. You can also call variable assignment 'destructive' and so on.
In Common Lisp, it makes sense to talk about destructive operations over sequences (which are lists, strings, and vectors in general) and multi-dimensional arrays.
Practical Common Lisp distinguishes two kinds of destructive operations: for-side-effect operations and recycling operations.
set is destructive and for-side-effect: it always modifies its first argument. Beware that it changes the value binding of the symbol, not the object currently bound to that symbol. setf can change either bindings or objects in-place.
By contrast, nreverse is recycling: it is allowed to modify its argument list, although there's no guarantee that it will, so it should be used just like reverse (take the return value), except that the input argument may be "destroyed" and should no longer be used. [Scheme programmers may call this a "linear update" function.]
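A short REPL sketch of the difference (implementation-dependent: nreverse is allowed to reuse the input's cons cells, not required to):

;; REVERSE never touches its input:
(defparameter *xs* (list 1 2 3))
(reverse *xs*)     ; => (3 2 1)
*xs*               ; => (1 2 3), unchanged

;; NREVERSE may recycle the input's cons cells:
(defparameter *ys* (list 1 2 3))
(nreverse *ys*)    ; => (3 2 1)
*ys*               ; => often (1) now; treat it as destroyed
;; The safe idiom is to capture the result: (setf *ys* (nreverse *ys*))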
I am reading about functional languages and I can't understand this particular thing. Suppose a function takes an array of numbers and has to square each number. What if we need to remove or insert some elements? Do we have to return a copy of the mutated array for every operation? If so, how are arrays of hundreds of millions of objects manipulated at reasonable cost?
There are several ways that functional languages handle array arguments.
Don't actually use arrays.
Instead of using arrays, one should almost always use some other data structure. Lists, binary search trees, finger trees, functional queues, and other data structures are commonly employed in functional code instead of arrays. It often takes some thought to pick the best data structure.
Have a "special escape hatch" for using mutation.
In Haskell, there is a magical thing known as the ST monad. This allows you to write code in Haskell which manipulates mutable arrays in an imperative style while still guaranteeing that the mutation can't "leak out" the escape hatch. For example, if I have a function f :: Int -> Int and I call f 3 twice, I am guaranteed to get the same results each time even if the function internally uses a mutable array. This is not the case in a language like Java, since calling f(3) might read from and write to mutable state, but in Haskell, you can use mutation fairly freely without compromising purity using ST.
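A minimal sketch of this using the standard array package (squareAll is a name I made up; runSTUArray, thaw, and the read/write operations come from the library):

import Data.Array.ST (runSTUArray, thaw, getBounds, readArray, writeArray)
import Data.Array.Unboxed (UArray, listArray, elems)
import Control.Monad (forM_)

-- Squares every element using in-place mutation internally.
-- From the outside this is a pure function: the mutable copy
-- never escapes runSTUArray (a convenience wrapper over runST).
squareAll :: UArray Int Int -> UArray Int Int
squareAll arr = runSTUArray $ do
  marr <- thaw arr
  (lo, hi) <- getBounds marr
  forM_ [lo .. hi] $ \i -> do
    v <- readArray marr i
    writeArray marr i (v * v)
  return marr

-- elems (squareAll (listArray (0, 4) [1, 2, 3, 4, 5])) == [1, 4, 9, 16, 25]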
Use linear types.
This is a relatively recent addition to Haskell. Consider a function modify :: Int -> a -> Array a -> Array a, where modify idx new_val original_array should return a new array which is a copy of original_array, except that position idx has been overwritten with value new_val. If we never read from the array original_array after we call the modify function on it, then it's ok for the compiler to secretly modify original_array rather than creating a new array without breaking the abstraction of the code. Linear types basically enforce this restriction within the language. It's rather sophisticated and takes some getting used to, but it allows you to use an underlying mutable data structure safely with functional abstractions. It's more limited than ST but doesn't involve any "imperative thinking".
Use immutable arrays.
You might just bite the bullet and use arrays that must be copied on modification. This is very rarely optimal, but the language may offer some abstractions that make this more bearable, and even asymptotically efficient, in certain circumstances.
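For instance, Haskell's standard immutable arrays support pure updates via (//), which returns a fresh array and leaves the original untouched; a minimal sketch:

import Data.Array (Array, listArray, elems, (//))

a, b :: Array Int Int
a = listArray (0, 2) [1, 2, 3]
b = a // [(1, 20)]   -- a brand new array; a itself is unchanged

-- elems a == [1, 2, 3]
-- elems b == [1, 20, 3]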
Coming from Wolfram Mathematica, I like the idea that whenever I pass a variable to a function I am effectively creating a copy of that variable. On the other hand, I am learning that in Julia there are the notions of mutable and immutable types, with the former passed by reference and the latter passed by value. Can somebody explain to me the advantage of such a distinction? Why are arrays passed by reference? Naively I see this as a bad aspect, since it creates side effects and ruins the possibility of writing purely functional code. Where am I wrong in my reasoning? Is there a way to make an array immutable, such that when it is passed to a function it is effectively passed by value?
Here is an example:
# x is an Int and so is immutable: it is passed by value
x = 10
function change_value(x)
    x = 17
end
change_value(x)
println(x)

# arrays are mutable: they are passed by reference
arr = [1, 2, 3]
function change_array!(A)
    A[1] = 20
end
change_array!(arr)
println(arr)
which indeed modifies the array arr, while the first call leaves x unchanged (it prints 10).
There is a fair bit to respond to here.
First, Julia is neither pass-by-reference nor pass-by-value. Rather, it employs a paradigm known as pass-by-sharing. Quoting the docs:
Function arguments themselves act as new variable bindings (new locations that can refer to values), but the values they refer to are identical to the passed values.
Second, you appear to be asking why Julia does not copy arrays when passing them into functions. This is a simple one to answer: performance. Julia is a performance-oriented language. Making a copy every time you pass an array into a function is bad for performance; every copy operation takes time.
This has some interesting side effects. For example, you'll notice that a lot of the mature Julia packages (as well as the Base code) consist of many short functions. This code structure is a direct consequence of the near-zero overhead of function calls. Languages like Mathematica and MATLAB, on the other hand, tend towards long functions. I have no desire to start a flame war here, so I'll merely state that personally I prefer the Julia style of many short functions.
Third, you are wondering about the potential negative implications of pass-by-sharing. In theory you are correct that this can cause problems when users are unsure whether a function will modify its inputs. There were long discussions about this in the early days of the language, and, based on your question, you appear to have worked out that the convention is that functions which modify their arguments have a trailing ! in the function name. This standard is not compulsory, so yes, it is in theory possible to end up with a wild-west scenario where users live in a constant state of uncertainty. In practice this has never been a problem (to my knowledge). The ! convention is followed throughout Base Julia, and I have never encountered a package that does not adhere to it. In summary: yes, it is possible to run into issues with pass-by-sharing, but in practice it never seems to happen, and the performance benefits far outweigh the cost.
Fourth (and finally), you ask whether there is a way to make an array immutable. First things first, I would strongly recommend against hacks to attempt to make native arrays immutable. For example, you could attempt to disable the setindex! function for arrays... but please don't do this. It will break so many things.
As was mentioned in the comments on the question, you could use StaticArrays. However, as Simeon notes in the comments on this answer, there are performance penalties for using static arrays for really big datasets. More than 100 elements and you can run into compilation issues. The main benefit of static arrays really is the optimizations that can be implemented for smaller static arrays.
Another package-based option suggested by phipsgabler in the comments below is FunctionalCollections. This appears to do what you want, although it looks to be only sporadically maintained. Of course, that isn't always a bad thing.
A simpler approach is just to copy arrays in your own code whenever you want to implement pass-by-value. For example:
f!(copy(x))
Just be sure you understand the difference between copy and deepcopy, and when you may need to use the latter. If you're only working with arrays of numbers, you'll never need the latter, and in fact using it will probably drastically slow down your code.
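A quick sketch of the difference: copy duplicates only the outer array, so nested arrays are still shared, while deepcopy shares nothing.

a = [[1, 2], [3, 4]]

b = copy(a)        # shallow copy: new outer array, same inner arrays
b[1][1] = 99
a[1][1]            # 99 -- the inner array is shared with b

c = deepcopy(a)    # recursive copy: nothing is shared
c[1][1] = 0
a[1][1]            # still 99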
If you wanted to do a bit of work, then you could also build your own array type in the spirit of static arrays, but without all the bells and whistles that static arrays entail. For example:
struct MyImmutableArray{T,N}
    x::Array{T,N}
end

Base.getindex(y::MyImmutableArray, inds...) = getindex(y.x, inds...)
and similarly you could add any other functions you wanted to this type, while excluding functions like setindex!.
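Usage would then look something like this; writes fail simply because no setindex! method exists for the wrapper:

y = MyImmutableArray([1, 2, 3])
y[2]           # 2 -- reads are delegated to the wrapped array
# y[2] = 10    # throws a MethodError: no setindex! for MyImmutableArray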
I think that in most implementations of Common Lisp, cons cells are generally (if not always) heap-allocated (see Why is consing in Lisp slow?).
Common Lisp does provide a facility for returning multiple values from a function (using values when returning and multiple-value-bind at the call-site). I'm speculating a bit here, but I think the motivation for this construction is two-fold: 1) make functions like truncate easier to use in the typical case where you don't care about the discarded value and 2) make it possible to return multiple values without using a heap-allocated data structure at all and (depending on the implementation (?)) avoiding the heap entirely (and GC overhead later down the road).
Does Common Lisp (or a specific implementation like SBCL maybe) give you the ability to use stack-allocated data (maybe in conjunction with something like weak references) or create composite/large-ish value types (something like structs in C)?
Common Lisp has a DYNAMIC-EXTENT declaration. Implementations can use this information to stack-allocate some data structures; they are also free to ignore the declaration (see the sketch after the list below).
See the respective documentation for how some implementations support it:
Allegro CL: Stack consing
LispWorks: Stack allocation of objects with dynamic extent
SBCL: Dynamic-extent allocation
Other implementations support it also, but they may lack explicit documentation about it.
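As a sketch of what the declaration looks like (whether v actually ends up on the stack is up to the implementation):

(defun sum-ones (n)
  (let ((v (make-list n :initial-element 1)))
    ;; Promise that V does not escape; the list may be stack-allocated.
    (declare (dynamic-extent v))
    (reduce #'+ v)))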
The main motivation for explicit support of returning multiple values was to get rid of consing / destructuring lists of return values or even putting some results in global variables. Thus one may now be able to return multiple values in registers or via a stack.
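For example, TRUNCATE already returns two values, and VALUES with MULTIPLE-VALUE-BIND lets your own functions do the same, with no list or other heap structure involved:

(truncate 10 3)                  ; => 3, 1 (two return values, nothing consed)

(multiple-value-bind (q r)
    (truncate 10 3)
  (list q r))                    ; => (3 1)

;; Callers that only want the primary value can simply ignore the rest:
(+ (truncate 10 3) 1)            ; => 4

;; Returning multiple values from your own function:
(defun div-mod (a b)
  (values (floor a b) (mod a b)))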
The Racket FFI's documentation has types for _ptr, _cpointer, and _pointer.[1]
However, the documentation (as of writing this question) does not seem to compare the three different types. Obviously the first two are functions that produce ctype?s, whereas the last one is a ctype? itself. But when would I use one type over the other?
[1] It also has other types such as _box, _list, _gcpointer, and _cpointer/null. These are all variants of those three functions.
_ptr is a macro that is used to create types that are suitable for function types in which you need to pass data via a pointer passed as an argument (a very common idiom in C).
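A sketch of that idiom around the C library function modf (double modf(double x, double *intpart)); this assumes modf can be found through the default #f library lookup:

#lang racket
(require ffi/unsafe)

;; (_ptr o _double) allocates a double, passes its address to C, and
;; lets the wrapper read the stored value back after the call.
(define modf
  (get-ffi-obj "modf" #f
               (_fun _double (ip : (_ptr o _double))
                     -> (frac : _double)
                     -> (values frac ip))))

(modf 3.75)   ; => 0.75 3.0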
_pointer is a generic pointer ctype that can be used pretty much wherever a pointer is expected or returned. On the Racket side, it becomes an opaque value that you can't manipulate very easily (you can use ptr-ref if you need it). Note the docs have some caveats about interactions with GC when using this.
_cpointer constructs safer variants of _pointer that use tags to ensure that you don't mix up pointers of different types. It's generally more convenient to use define-cpointer-type instead of manually constructing these. In other words, these help you build abstractions represented by Racket's C pointers. You can do it manually with cpointer-push-tag! and _pointer but that's less convenient.
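For example, a tagged pointer type for a hypothetical C library (libmydb and db_open are made-up names):

#lang racket
(require ffi/unsafe)

;; Binds _db, _db/null, and a db? predicate; pointers returned as _db
;; carry a tag, so they can't be passed where another tagged type is expected.
(define-cpointer-type _db)

(define my-lib (ffi-lib "libmydb"))   ; hypothetical shared library
(define db-open
  (get-ffi-obj "db_open" my-lib (_fun _string -> _db)))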
There's also a blog post I wrote that goes into more detail about some of these pointer issues: http://prl.ccs.neu.edu/blog/2016/06/27/tutorial-using-racket-s-ffi/
For a while I was thinking that you just need a map to a monoid, and then reduce would do reduction according to the monoid's multiplication.
First, this is not exactly how monoids work, and second, this is not exactly how map/reduce works in practice.
Namely, take the ubiquitous "count" example. If there's nothing to count, any map/reduce engine will return an empty dataset, not a neutral element. Bummer.
Besides, in a monoid, an operation is defined for two elements. We can easily extend it to finite sequences, or, due to associativity, to finite ordered sets. But there's no way to extend it to arbitrary "collections" unless we actually have a σ-algebra.
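For concreteness, the "map into a monoid, then reduce with its multiplication" picture is essentially Haskell's foldMap; a minimal sketch:

import Data.Monoid (Sum(..))

-- Map each element into a monoid, then reduce with its multiplication.
count :: [a] -> Int
count = getSum . foldMap (const (Sum 1))

-- count ""    == 0   -- mempty, the neutral element
-- count "abc" == 3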
So, what's the theory? I tried to figure it out, but I could not; and I tried to go Google it but found nothing.
I think the right way to think about map-reduce is not as a computational paradigm in its own right, but rather as a control-flow construct similar to a while loop. You can view while as a program constructor with two arguments: a predicate function and an arbitrary program. Similarly, the map-reduce construct has two function arguments, named map and reduce. So, analogously to while, the useful questions to ask are about proving correctness of constructed programs relative to given preconditions and postconditions. And as usual, those questions involve (a) termination and run-time performance and (b) maintenance of invariants.