I think in most implementations of Common Lisp cons cells are generally/always heap allocated (see Why is consing in Lisp slow?)
Common Lisp does provide a facility for returning multiple values from a function (using values when returning and multiple-value-bind at the call-site). I'm speculating a bit here, but I think the motivation for this construction is two-fold: 1) make functions like truncate easier to use in the typical case where you don't care about the discarded value and 2) make it possible to return multiple values without using a heap-allocated data structure at all and (depending on the implementation (?)) avoiding the heap entirely (and GC overhead later down the road).
Does Common Lisp (or a specific implementation like SBCL maybe) give you the ability to use stack-allocated data (maybe in conjunction with something like weak references) or create composite/large-ish value types (something like structs in C)?
Common Lisp has a DYNAMIC-EXTENT declaration. Implementations can use this information to stack allocate some data structures - they can also ignore this declaration.
See the respective documentation how some implementations support it:
Allegro CL: Stack consing
LispWorks: Stack allocation of objects with dynamic extent
SBCL: Dynamic-extent allocation
Other implementations support it also, but they may lack explicit documentation about it.
The main motivation for explicit support of returning multiple values was to get rid of consing / destructuring lists of return values or even putting some results in global variables. Thus one may now be able to return multiple values in registers or via a stack.
Related
The information about Ada.Containers.Functional_Maps in the GNAT documentation is quite—let's say—abstruse.
First, it says this:
…these containers can still be used safely.
In the second paragraph, it seems to me that you cannot free the memory allocated for those objects once the program exits the context where they are created. I am understanding that you could run into a memory leak. Am I right?
They are also memory consuming, as the allocated memory is not reclaimed when the container is no longer referenced.
Read the next two sentences in the doc:
Thus, they should in general be used in ghost code and annotations, so that they can be removed from the final executable. The specification of this unit is compatible with SPARK 2014.
Because the specification of Ada.Containers.Functional_Maps is compatible with SPARK, it may help to examine it in the context of related SPARK Libraries with regard to proof, testing and annotation. In particular,
The functional maps, sets and vectors are unbounded collections of indefinite elements that are neither controlled nor limited. While they are inefficient with regard to memory, they are simple, immutable and useful "to model user defined data structures."
The functional containers can be used in Ghost Code, "parts of the code that are only meant for specification and verification", as suggested here. This related example illustrates a ghost function.
it seems to me that you cannot free the memory allocated for those
objects once the program exits the context where they are created. I
am understanding that you could run into a memory leak. Am I right?
There are some things that you can do in Ada to manage memory, I would be surprised if (for example) the usage of an instance inside a declare-block were not cleaned-up on the block's exit. — This is, in fact, how some surprisingly robust applications can get away without "dynamically-allocated" memory/values (it's actually heap-allocated, but that's pedantic).
This sort of granular control is really nice, as you can constrain things/usages to specific points. Combined with Ada's good facilities for presenting interfaces, this means that changing some structure to another can be less-painful than it otherwise might be.
As an example of the above, I had a nested key-value map (a JSON object) that was being used to pass parameters around; the method for doing this changed and so I had a string of values (with common-rooted keys) coming in and a procedure that took JSON as input. Obviously what was needed was a "keys&values-to-JSON function, so inside the function I used the multiway-tree container where the leafs represented values and the internal-nodes the keys, the second step was to traverse the tree and create the JSON-object as needed - simple recursion and data-structure selection used to address the problem of adapting the textual key-value pairs of these nested parameters to JSON. — And because the usage of multi-way trees was exclusive to this function, I can be confident that the memory used by the intermediate tree-object I used is released on the function's exit.
Coming from Wolfram Mathematica, I like the idea that whenever I pass a variable to a function I am effectively creating a copy of that variable. On the other hand, I am learning that in Julia there are the notions of mutable and immutable types, with the former passed by reference and the latter passed by value. Can somebody explain me the advantage of such a distinction? why arrays are passed by reference? Naively I see this as a bad aspect, since it creates side effects and ruins the possibility to write purely functional code. Where I am wrong in my reasoning? is there a way to make immutable an array, such that when it is passed to a function it is effectively passed by value?
here an example of code
#x is an in INT and so is immutable: it is passed by value
x = 10
function change_value(x)
x = 17
end
change_value(x)
println(x)
#arrays are mutable: they are passed by reference
arr = [1, 2, 3]
function change_array!(A)
A[1] = 20
end
change_array!(arr)
println(arr)
which indeed modifies the array arr
There is a fair bit to respond to here.
First, Julia does not pass-by-reference or pass-by-value. Rather it employs a paradigm known as pass-by-sharing. Quoting the docs:
Function arguments themselves act as new variable bindings (new
locations that can refer to values), but the values they refer to are
identical to the passed values.
Second, you appear to be asking why Julia does not copy arrays when passing them into functions. This is a simple one to answer: Performance. Julia is a performance oriented language. Making a copy every time you pass an array into a function is bad for performance. Every copy operation takes time.
This has some interesting side-effects. For example, you'll notice that a lot of the mature Julia packages (as well as the Base code) consists of many short functions. This code structure is a direct consequence of near-zero overhead to function calls. Languages like Mathematica and MatLab on the other hand tend towards long functions. I have no desire to start a flame war here, so I'll merely state that personally I prefer the Julia style of many short functions.
Third, you are wondering about the potential negative implications of pass-by-sharing. In theory you are correct that this can result in problems when users are unsure whether a function will modify its inputs. There were long discussions about this in the early days of the language, and based on your question, you appear to have worked out that the convention is that functions that modify their arguments have a trailing ! in the function name. Interestingly, this standard is not compulsory so yes, it is in theory possible to end up with a wild-west type scenario where users live in a constant state of uncertainty. In practice this has never been a problem (to my knowledge). The convention of using ! is enforced in Base Julia, and in fact I have never encountered a package that does not adhere to this convention. In summary, yes, it is possible to run into issues when pass-by-sharing, but in practice it has never been a problem, and the performance benefits far outweigh the cost.
Fourth (and finally), you ask whether there is a way to make an array immutable. First things first, I would strongly recommend against hacks to attempt to make native arrays immutable. For example, you could attempt to disable the setindex! function for arrays... but please don't do this. It will break so many things.
As was mentioned in the comments on the question, you could use StaticArrays. However, as Simeon notes in the comments on this answer, there are performance penalties for using static arrays for really big datasets. More than 100 elements and you can run into compilation issues. The main benefit of static arrays really is the optimizations that can be implemented for smaller static arrays.
Another package-based options suggested by phipsgabler in the comments below is FunctionalCollections. This appears to do what you want, although it looks to be only sporadically maintained. Of course, that isn't always a bad thing.
A simpler approach is just to copy arrays in your own code whenever you want to implement pass-by-value. For example:
f!(copy(x))
Just be sure you understand the difference between copy and deepcopy, and when you may need to use the latter. If you're only working with arrays of numbers, you'll never need the latter, and in fact using it will probably drastically slow down your code.
If you wanted to do a bit of work then you could also build your own array type in the spirit of static arrays, but without all the bells and whistles that static arrays entails. For example:
struct MyImmutableArray{T,N}
x::Array{T,N}
end
Base.getindex(y::MyImmutableArray, inds...) = getindex(y.x, inds...)
and similarly you could add any other functions you wanted to this type, while excluding functions like setindex!.
I am looking for an Array-like type with the following properties:
stores elements on disk
elements can have composite type
elements are read into memory, not the whole array
it is possible to write individual elements without writing the whole array
supports setindex!, getindex, push!, pop!, shift!, unshift! and maybe vcat
is reasonably efficient
So far I have found the following leads:
https://docs.julialang.org/en/latest/stdlib/SharedArrays/
http://juliadb.org
https://github.com/JuliaIO/JLD.jl
The first one seems promising, but it seems the type of the elements has to be isbits (meaning a simple number, some structs but not, e.g., an Array{Float64,1}). And it's not clear if the whole array contents are loaded into memory.
If it does not exist yet, I will of course try to construct it myself.
NCDatasets.jl addresses part of the requirements:
stores elements on disk: yes
elements can have composite type: no (although some support for composite type is in NetCDF4, but not yet in NCDatasets.jl). Currently you can have only Arrays of basic types and Arrays of Vectors (of basic types).
elements are read into memory, not the whole array: yes
it is possible to write individual elements without writing the whole array supports setindex!, getindex, push!, pop!, shift!, unshift! and maybe vcat: just setindex!, getindex
is reasonably efficient: the efficency is reasonable for me :-)
The project making it yourself sounds very interesting. I think it would server certainly a gap in the current ecosystem.
Some storage technologies that might be good to have a look at are:
HDF5 (for storage, cross-platform and cross-language)
JLD2 (successor of JLD) https://github.com/simonster/JLD2.jl
rasdaman (a "database" for arrays) http://www.rasdaman.org/
possibly also BSON http://bsonspec.org/
Maybe you can also reach out to the JuliaIO group.
What is the correct definition of destructive and non-destructive constructs in LISP (or in general). I have tried to search for the actual meaning but I have only found a lot of usage of these terms without actually explaining them.
It came to my understanding, that by destructive function is meant a function, that changes the meaning of the construct (or variable) - so when I pass a list as a parameter to a function, which changes it, it is called a destructive operation, because it changes the initial list and return a brand new one. Is this right or are there some exceptions?
So is for example set a destructive function (because it changes the value of x)? I think not but I do not how, how would I justify this.
(set 'x 1)
Sorry for probably a very basic question.... Thanks for any answers!
I would not interpret too much into the word 'destructive'.
In list processing, a destructive operation is one that potentially changes one or more of the input lists as a visible side effect.
Now, you can widen the meaning to operations over arrays, structures, CLOS objects, etc. You can also call variable assignment 'destructive' and so on.
In Common Lisp, it makes sense to talk about destructive operations over sequences (which are lists, strings, and vectors in general) and multi-dimensional arrays.
Practical Common Lisp distinguishes two kinds of destructive operations: for-side-effect operations and recycling operations.
set is destructive and for-side-effect: it always modifies its first argument. Beware, that it changes the binding for a symbol, but not the thing currently bound to that symbol. setf can change either bindings or objects in-place.
By contrast, nreverse is recycling: it is allowed to modify its argument list, although there's no guarantee that it will, so it should be used just like reverse (take the return value), except that the input argument may be "destroyed" and should no longer be used. [Scheme programmers may call this a "linear update" function.]
Does containers in D have value or reference semantics by default? If they have reference semantics doesn't that fundamentally hinder the use of functional programming style in D (compared to C++11's Move Semantics) such as in the following (academic) example:
auto Y = reverse(sort(X));
where X is a container.
Whether containers have value semantics or reference semantics depends entirely on the container. The only built-in containers are dynamic arrays, static arrays, and associative arrays. Static arrays have strict value semantics, because they sit on the stack. Associative arrays have strict reference semantics. And dynamic arrays mostly have reference semantics. They're elements don't get copied, but they do, so they end up with semantics which are a bit particular. I'd advise reading this article on D arrays for more details.
As for containers which are official but not built-in, the containers in std.container all have reference semantics, and in general, that's how containers should be, because it's highly inefficient to do otherwise. But since anyone can implement their own containers, anyone can create containers which are value types if they want to.
However, like C++, D does not take the route of having algorithms operate on containers, so as far as algorithms go, whether containers have reference or value semantics is pretty much irrelevant. In C++, algorithms operate on iterators, so if you wanted to sort a container, you'd do something like sort(container.begin(), container.end()). In D, they operate on ranges, so you'd do sort(container[]). In neither language would you actually sort a container directly. Whether containers themselves have value or references semantics is therefore irrelevant to your typical algorithm.
However, D does better at functional programming with algorithms than C++ does, because ranges are better suited for it. Iterators have to be passed around in pairs, which doesn't work very well for chaining functions. Ranges, on the other hand, chain quite well, and Phobos takes advantage of this. It's one of its primary design principles that most of its functions operate on ranges to allow you to do in code what you typically end up doing on the unix command line with pipes, where you have a lot of generic tools/functions which generate output which you can pipe/pass to other tools/functions to operate on, allowing you to chain independent operations to do something specific to your needs rather than relying on someone to have written a program/function which did exactly what you want directly. Walter Bright discussed it recently in this article.
So, in D, it's easy to do something like:
auto stuff = sort(array(take(map!"a % 1000"(rndGen()), 100)));
or if you prefer UFCS (Universal Function Call Syntax):
auto stuff = rndGen().map!"a % 1000"().take(100).array().sort();
In either case, it generates a sorted list of 100 random numbers between 0 and 1000, and the code is in a functional style, which C++ would have a much harder time doing, and libraries which operate on containers rather than iterators or ranges would have an even harder time doing.
In-Built Containers
The only in-built containers in D are slices (also called arrays/dynamic arrays) and static arrays. The latter have value semantics (unlike in C and C++) - the entire array is (shallow) copied when passed around.
As for slices, they are value types with indirection, so you could say they have both value and reference semantics.
Imagine T[] as a struct like this:
struct Slice(T)
{
size_t length;
T* ptr;
}
Where ptr is a pointer to the first element of the slice, and length is the number of elements within the bounds of the slice. You can access the .ptr and .length fields of a slice, but while the data structure is identical to the above, it's actually a compiler built-in and thus not defined anywhere (the name Slice is just for demonstrative purposes).
Knowing this, you can see that copying a slice (assign to another variable, pass to a function etc.) just copies a length (no indrection - value semantics) and a pointer (has indirection - reference semantics).
In other words, a slice is a view into an array (located anywhere in memory), and there can be multiple views into the same array.
Algorithms
sort and reverse from std.algorithm work in-place to cater to as many users as possible. If the user wanted to put the result in a GC-allocated copy of the slice and leave the original unchanged, that can easily be done (X.dup). If the user wanted to put the result in a custom-allocated buffer, that can be done too. Finally, if the user wanted to sort in-place, this is an option. At any rate, any extra overhead is made explicit.
However, it's important to note that most algorithms in the standard library don't require mutation, instead returning lazily-evaluated range results, which is characteristic of functional programming.
User-Defined Containers
When it comes to user-defined containers, they can have whatever semantics they want - any configuration is possible in D.
The containers in std.container are reference types with .dup methods for making copies, thus slightly emulating slices.