Raku provides many types that are immutable and thus cannot be modified after they are created. Until I started looking into this area recently, my understanding was that these Types were not persistent data structures – that is, unlike the core types in Clojure or Haskell, my belief was that Raku's immutable types did not take advantage of structural sharing to allow for inexpensive copies. I thought that statement my List $new = (|$old-list, 42); literally copied the values in $old-list, without the data-sharing features of persistent data structures.
That description of my understanding is in the past tense, however, due to the following code:
my Array $a = do {
$_ = [rand xx 10_000_000];
say "Initialized an Array in $((now - ENTER now).round: .001) seconds"; $_}
my List $l = do {
$_ = |(rand xx 10_000_000);
say "Initialized the List in $((now - ENTER now).round: .001) seconds"; $_}
do { $a.push: rand;
say "Pushed the element to the Array in $((now - ENTER now).round: .000001) seconds" }
do { my $nl = (|$l, rand);
say "Appended an element to the List in $((now - ENTER now).round: .000001) seconds" }
do { my #na = |$l;
say "Copied List \$l into a new Array in $((now - ENTER now).round: .001) seconds" }
which produced this output in one run:
Initialized an Array in 5.938 seconds
Initialized the List in 5.639 seconds
Pushed the element to the Array in 0.000109 seconds
Appended an element to the List in 0.000109 seconds
Copied List $l into a new Array in 11.495 seconds
That is, creating a new List with the old values + one more is just as fast as pushing to a mutable Array, and dramatically faster than copying the List into a new Array – exactly the performance characteristics that you'd expect to see from a persistent List (copying to an Array is still slow because it can't take advantage of structural sharing without breaking the immutability of the List). The fast copying of $l into $nl is not due to either List being lazy; neither are.
All of the above leads me to believe that Lists in Rakudo actually are persistent data structures, with all the performance benefits that implies. That leaves me with several questions:
Am I right about Lists being persistent data structures?
Are all other immutable Types also persistent data structures? Or are any?
Is any of this part of Raku, or just an implementation choice Rakudo has made?
Are any of these performance characteristics documented/guaranteed anywhere?
I have to say, I am both extremely impressed and more than a bit baffled to discover evidence that at least some of Raku(do)'s types are persistent. It's the sort of feature that other languages list as a key selling point or that leads to the creation of libraries with 30k+ stars on GitHub. Have we really had it in Raku without even mentioning it?
I remember implementing these semantics, and I certainly don't recall thinking about them giving rise to a persistent data structure at the time - although it does seems fair to attach that label to the result!
I don't think you'll find anywhere that explicitly spells out this exact behavior, however the most natural implementation of things that are required by the language quite naturally leads to it. Taking the ingredients:
The infix:<,> operator is the List constructor in Raku
When a List is created, it is non-committal with regards to laziness and flattening (these arise from how we use the List, which we don't - in general - know at the point of its construction)
When we write (|$x, 1), the prefix:<|> operator constructs a Slip, which is a kind of List that should melt into its surrounding List. Thus what infix:<,> sees is a Slip and an Int.
Making the Slip melt into the result List immediately would mean making a commitment about eagerness, which List construction alone should not do. Thus the Slip and everything after it is placed into the lazily evaluated ("non-reified") portion of the List.
This last of these is what gives rise to the observed persistent data structure style behavior.
I expect it would be possible to have a implementation that inspects the Slip and chooses to eagerly copy things that are known not to be lazy, and still be in compliance with the specification test suite. That would change the time complexity of your example. If you want to be defensive against that, then:
do { my $nl = (|$l.lazy, rand);
say "Appended an element to the List in $((now - ENTER now).round: .000001) seconds" }
Should be sufficient to force the issue even if the implementation changed.
Of other cases that immediately come to mind that are related to persistent data structures or at least tail sharing:
The MoarVM implementation of strings, which is behind str and thus Str, implements string concatenation by creating a new string that refers to the two that are being concatenated instead of copying the data in the two strings (and does similar tricks for substr and repetition). This is strictly an optimization, not a language requirement, and in some delicate cases (the last grapheme of one string and the first grapheme of the next will form a single grapheme in the resulting string), it gives up and takes the copying path.
Outside of the core, modules like Concurrent::Stack, Concurrent::Queue, and Concurrent::Trie use tail sharing as a technique to implement relatively efficient lock-free data structures.
From the Golang source code, they seem to follow a pretty standard implementation of hash tables (ie array of buckets). Based on this it seems that iteration should be deterministic for an unchanged map (ie iterate the array in order, then iterate within the buckets in order). Why do they make the iteration random?
TL;DR; They intentionally made it random starting with Go 1 to make developers not rely on it (to not rely on a specific iteration order which order may change from release-to-relase, from platform-to-platform, or may even change during a single runtime of an app when map internals change due to accommodating more elements).
The Go Blog: Go maps in action: Iteration order:
When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next. Since the release of Go 1.0, the runtime has randomized map iteration order. Programmers had begun to rely on the stable iteration order of early versions of Go, which varied between implementations, leading to portability bugs. If you require a stable iteration order you must maintain a separate data structure that specifies that order.
Also Go 1 Release Notes: Iterating in maps:
The old language specification did not define the order of iteration for maps, and in practice it differed across hardware platforms. This caused tests that iterated over maps to be fragile and non-portable, with the unpleasant property that a test might always pass on one machine but break on another.
In Go 1, the order in which elements are visited when iterating over a map using a for range statement is defined to be unpredictable, even if the same loop is run multiple times with the same map. Code should not assume that the elements are visited in any particular order.
This change means that code that depends on iteration order is very likely to break early and be fixed long before it becomes a problem. Just as important, it allows the map implementation to ensure better map balancing even when programs are using range loops to select an element from a map.
Notable exceptions
Please note that the "random" order applies when ranging over the map using for range.
For reproducible outputs (for easy testing and other conveniences it brings) the standard lib sorts map keys in numerous places:
1. encoding/json
The json package marshals maps using sorted keys. Quoting from json.Marshal():
Map values encode as JSON objects. The map's key type must either be a string, an integer type, or implement encoding.TextMarshaler. The map keys are sorted and used as JSON object keys by applying the following rules, subject to the UTF-8 coercion described for string values above:
keys of any string type are used directly
encoding.TextMarshalers are marshaled
integer keys are converted to strings
2. fmt package
Starting with Go 1.12 the fmt package prints maps using sorted keys. Quoting from the release notes:
Maps are now printed in key-sorted order to ease testing. The ordering rules are:
When applicable, nil compares low
ints, floats, and strings order by <
NaN compares less than non-NaN floats
bool compares false before true
Complex compares real, then imaginary
Pointers compare by machine address
Channel values compare by machine address
Structs compare each field in turn
Arrays compare each element in turn
Interface values compare first by reflect.Type describing the concrete > - type and then by concrete value as described in the previous rules.
3. Go templates
The {{range}} action of text/template and html/template packages also visit elements in sorted keys order. Quoting from package doc of text/template:
{{range pipeline}} T1 {{end}}
The value of the pipeline must be an array, slice, map, or channel.
If the value of the pipeline has length zero, nothing is output;
otherwise, dot is set to the successive elements of the array,
slice, or map and T1 is executed. If the value is a map and the
keys are of basic type with a defined order, the elements will be
visited in sorted key order.
This is important for security, among other things.
There are lots of resources talking about this online -- see this post for example
The CLtL2 reference clearly distinguishes between nondestructive and destructive common-lisp operations. But, within the destructive camp, it seems a little less clear in marking the difference between those which simply return the result, and those which additionally modify a place (given as argument) to contain the result. The usual convention of annexing "f" to such place modifying operations (eg, setf, incf, alexandria:deletef) is somewhat sporadic, and also applies to many place accessors (eg, aref, getf). In an ideal functional programming style (based only on returned values) such confusion is probably not an issue, but it seems like it could lead to programming errors in some practical applications that do use place modification. Since different implementations can handle the place results differently, couldn't portability be affected? It even seems difficult to test a particular implementation's approach.
To better understand the above distinction, I've divided the destructive common-lisp sequence operations into two categories corresponding to "argument returning" and "operation returning". Could someone validate or invalidate these categories for me? I'm assuming these categories could apply to other kinds of destructive operations (for lists, hash-tables, arrays, numbers, etc) too.
Argument returning: fill, replace, map-into
Operation returning: delete, delete-if, delete-if-not, delete-duplicates, nsubstitute, nsubstitute-if, nsubstitute-not-if, nreverse, sort, stable-sort, merge
But, within the destructive camp, it seems a little less clear in marking the difference between those which simply return the result.
There are no easy syntactic marker about which operation is destructive or not, even though there are useful conventions like the n prefix. Remember that CL is a standard inspired by different Lisps, which does not help enforcing a consistent terminology.
The usual convention of annexing "f" to such place modifying operations (eg, setf, incf, alexandria:deletef) is somewhat sporadic, and also applies to many place accessors (eg, aref, getf).
All setf expanders should ends with f, but not everything that ends with f is a setf expander. For example, aref takes its name from array and reference and isn't a macro.
... but it seems like it could lead to programming errors in some practical applications that do use place modification.
Most data is mutable (see comments); once you code in CL with that in mind, you take care not to modify data you did not create yourself. As for using a destructive operation in place of a non-destructive one inadvertently, I don't know: I guess it can happen, with sort or delete, maybe the first times you use them. In my mind delete is stronger, more destructive than simply remove, but maybe that's because I already know the difference.
Since different implementations can handle the place results differently, couldn't portability be affected?
If you want portability, you follow the specification, which does not offer much guarantee w.r.t. which destructive operations are applied. Take for example DELETE (emphasis mine):
Sequence may be destroyed and used to construct the result; however, the result might or might not be identical to sequence.
It is wrong to assume anything about how the list is being modified, or even if it is being modified. You could actually implement delete as an alias of remove in a minimal implementation. In all cases, you use the return value of your function (both delete and remove have the same signature).
Categories
I've divided the destructive common-lisp sequence operations into two categories corresponding to "argument returning" and "operation returning".
It is not clear at all what those categories are supposed to represent. Are those definition the one you have in mind?
an argument returning operation is one which returns one of its argument as a return value, possibly modified.
an operation returning operation is one where the result is based on one of its argument, and might be identical to that argument, but needs not be.
The definition of operation returning is quite vague and encompass both destructive and non-destructive operations. I would classify cons as such because it does not return one of its argument; OTOH, it is a purely functional operation.
I don't really get what those categories offer in addition to destructive or non-destructive.
Setf composition gotcha
Suppose you write a function (remote host key) which gets a value from a remote key/value datastore. Suppose also that you define (setf remote) so that it updates the remote value.
You might expect (setf (first (remote host key)) value) to:
Fetch a list from host, indexed by key,
Replace its first element by value,
Push the changes back to the remote host.
However, step 3 does generally not happen: the local list is modified in place (this is the most efficient alternative, but it makes setf expansions somewhat lazy about updates). You could define a new set of macros such as the whole round-trip is always implemented, with DEFINE-SETF-EXPANDER, though.
Let me try to address your question by introducing some concepts.
I hope it helps you to consolidate your knowledge and to find your remaining answers about this subject.
The first concept is that of non-destructive versus destructive behavior.
A function that is non-destructive won't change the data passed to it.
A function that is destructive may change the data passed to it.
You can apply the (non-)destructive nature to something other than a single function. For instance, if a function stores the data passed to it somewhere, say in a object's slot, then the destructiveness depends on that object's behavior, its other operations, events, etc.
The convention for functions that immediately modify its arguments is to (usually) prefix with n.
The convention doesn't work the other way around, there are many functions that start with n (e.g. not/null, nth, ninth, notany, notevery, numberp etc.) There are also notable exceptions, such as delete, merge, sort and stable-sort. The only way to naturally grasp them is with time/experience. For instance, always refer to the HyperSpec whenever you see a function you don't know yet.
Moreover, you usually need to store the result of some destructive functions, such as delete and sort, because they may choose to skip the head of the list or to not be destructive at all. delete may actually return nil, the empty list, which is not possible to obtain from a modified cons.
The second concept is that of generalized reference.
A generalized reference is anything that can hold data, such as a variable, the car and cdr of a cons, the element locations of an array or hash table, the slots of an object, etc.
For each container data structure, you need to know the specific modifying function. However, for some generalized references, there might not be a function to modify it, such as a local variable, in which case there are still special forms to modify it.
As such, in order to modify any generalized reference, you need to know its modifying form.
Another concept closely related to generalized references is the place. A form that identifies a generalized reference is called a place. Or in other words, a place is the written way (form) that represents a generalized reference.
For each kind of place, you have a reader form and a writer form.
Some of these forms are documented, such as using the symbol of a variable to read it and setq a variable to write to it, or car/cdr to read from and rplaca/rplacd to write to a cons. Others are only documented to be accessors, such as aref to read from arrays; its writer form is not actually documented.
To get these forms, you have get-setf-expansion. You actually also get a set of variables and their initializing forms (to be used as through let*) that will be used by the reader form and/or the writer form, and a set of variables (to be bound to the new values) that will be used by the writer form.
If you've used Lisp before, you've probably used setf. setf is a macro that generates code that runs within the scope (environment) of its expansion.
Essentially, it behaves as if by using get-setf-expansion, generating a let* form for the variables and initializing forms, generating extra bindings for the writer variables with the result of the value(s) form and invoking the writer form within all this environment.
For instance, let's define a my-setf1 macro which takes only a single place and a single newvalue form:
(defmacro my-setf1 (place newvalue &environment env)
(multiple-value-bind (vars vals store-vars writer-form reader-form)
(get-setf-expansion place env)
`(let* (,#(mapcar #'(lambda (var val)
`(,var ,val))
vars vals))
;; In case some vars are used only by reader-form
(declare (ignorable ,#vars))
(multiple-value-bind (,#store-vars)
,newvalue
,writer-form
;; Uncomment the next line to mitigate buggy writer-forms
;;(values ,#store-vars)
))))
You could then define my-setf as:
(defmacro my-setf (&rest pairs)
`(progn
,#(loop
for (place newvalue) on pairs by #'cddr
collect `(my-setf1 ,place ,newvalue))))
There is a convention for such macros, which is to suffix with f, such as setf itself, psetf, shiftf, rotatef, incf, decf, getf and remf.
Again, the convention doesn't work the other way around, there are operators that end with f, such as aref, svref and find-if, which are functions, and if, which is a conditional execution special operator. And yet again, there are notable exceptions, such as push, pushnew, pop, ldb, mask-field, assert and check-type.
Depending on your point-of-view, many more operators are implicitly destructive, even if not effectively tagged as such.
For instance, every defining operator (e.g. the macros defun, defpackage, defclass, defgeneric, defmethod, the function load) changes either the global environment or a temporary one, such as the compilation environment.
Others, like compile-file, compile and eval, depend on the forms they'll execute. For compile-file, it also depends on how much it isolates the compilation environment from the startup environment.
Other operators, like makunbound, fmakunbound, intern, export, shadow, use-package, rename-package, adjust-array, vector-push, vector-push-extend, vector-pop, remhash, clrhash, shared-initialize, change-class, slot-makunbound, add-method and remove-method, are (more or less) clearly intended to have side-effects.
And it's this last concept that can be the widest. Usually, a side-effect is regarded as any observable variation in one environment. As such, functions that don't change data are usually considered free of side-effects.
However, this is ill-defined. You may consider that all code execution implies side-effects, depending on what you define to be your environment or on what you can measure (e.g. consumed quotas, CPU time and real time, used memory, GC overhead, resource contention, system temperature, energy consumption, battery drain).
NOTE: None of the example lists are exhaustive.
I'm trying to integrate two Fortran 9x codes which contain data arrays with opposite array ordering. One code (I'll call it the old code) has an established library of subroutines and I am trying to take advantage of these with the other (new) code as efficiently as possible (i.e. not having to create temporary arrays just to reorder an array and pass it to a subroutine and then have to replace the old array with the new reordered result). For example,
Old code:
oldarray(1:n,1) -> variable 1 for n elements
oldarray(1:n,2) -> variable 2 for n elements
.. and so on
new code:
newarray(1,1:n) -> variable 1 for n elements
newarray(1,1:n) -> variable 2 for n elements
.. and so on
The variable indices do not necessarily relate between the two codes. If I only need one variable to pass to a procedure, I just pass newarray(1,1:n) and the procedure doesn't know the difference. However, if a procedure from the old code requires variables 1-6 of oldarray which might correspond to variables 2,6,8,1,4,3 (I just picked arbitrary numbers) of newarray, is it possible to create a pointer that I could pass to the procedure?
On a simpler side, would it be possible to just create pointer for the tranpose of the new array? As an example, pointer(1000,6) points to newarray(6,1000).
Note: It is not possible to rewrite the new code to use the same array ordering because both codes use an array ordering that best suits its loop structures which cannot be changed.
Also, I have very little experience with pointers. I know I can create a derived datatype which consists of an array of pointers but I don't think I would be able to pass that to a procedure in the manner required (I could be wrong as I also have very little experience with derived datatypes). The reference book I have (Fortran 95/2003 for Scientists and Engineers) only explores advanced applications of pointers in terms of linked lists and trees. I have also found little Fortran pointer information outside of what is covered in this book on the internet.
Thank you for your help.
I think the answer is no, you can't do this, and it wouldn't help anyway.
You can do all sorts of super-cool things with array pointers, with strides across arrays, etc, but I don't see on the face of it how you can change the order of the data.
So I could be wrong on this and it is possible, but then the question is: how would it help you? Presumably you want to user pointers to re-arrange the data without copying; but when you're passing such a thing around, the compiler is allowed to do copy-in, copy-out; eg, create a temporary array, copy the data in, pass it to the subroutine, and copy the data out upon return. And in fact that would almost certainly be the right thing to do in this case, performance-wise; that way the old code could be accessing memory in the fast order, and the transpose-copy could be done in a fast way, as well.
So I suspect the right way to treat this problem is to do the copy-in/copy-out approach yourself explicitly.
I'm working in Ada95, and I'm having difficulty figuring out pointers.
I have code that looks like the following:
type vector is array (1 .. 3) of integer;
type vector_access is access vector;
my_vec : vector;
procedure test is
pointer : vector_access := my_vec'access;
begin
...
end;
This fails compilation on the definition of pointer, saying
"The prefix to 'ACCESS must be either an aliased view of an object or denote a subprogram with a non-intrinsic calling convention"
If I then change the definition of the vector itself to:
my_vec : aliased vector
it now returns the compiler error:
"The expected type for X'ACCESS, where X denotes and aliased view of an object, must be a general acces type"
At the end of the day what I really need is a pointer to a specific item within the array, the position being dynamic based on input parameters. Can anyone point me in the right direction?
If you're using GNAT, the error message after the "must be a general access type" should have given you the solution:
add "all" to type "vector_access"
defined at line ...
so that you would end up with:
type Vector_Access is access all Vector;
The use of "all" to designate a general access type has to do with dynamic memory allocation pools in Ada. If you don't care about dynamic memory allocation pools, then don't worry about it, and just include "all" in your access type definitions.
I'm not sure if this is part of what you're looking for at the end of the day, but you are aware that in most circumstances Ada's access (pointer) types are used to handle dynamically allocated memory, right?
So instead of pointing my_vec at an aliased variable, you would dynamically allocate it:
Pointer_2_Dynamic : vector_access := new Vector;
This way you can dynamically allocate at runtime the objects you need, and easily handle variably sized ones (though you'd need a different vector definition to accomplish that:
type Dynamic_Vector is array (Natural range <>) of Integer;
type Dynamic_Vector_Access is access Dynamic_Vector;
N : Natural := 10; -- Variable here, but could be a subprogram parameter.
Dyn_Vector : Dynamic_Vector_Access := new Dynamic_Vector(1..N);
OK. Lesson one on Ada for expert Cish coders: Ada parameters are different than Cish parameters. In C, (pre-reference parameters) every single parameter is the equivalent of an Ada 'in' parameter, with an extra proviso that the C compiler must always stupidly pass the entire thing on the stack, no matter how huge it is. So you poor C coders get it nailed into your brains that you never pass large objects into subroutines directly, but always use pointers.
Ada is different. You tell the compiler how you want to access your parameters (read only - 'in', write only - 'out', or read write - 'in out'). However, that has nothing to do with how parameters are passed. The parameter passing mechanisim is up to the compiler, and the compiler will chose the most efficient way to do it. In practice, on nearly all platforms, that means than anything too big for a register will be passed by reference. But this is an implementation detail, and is the compiler's business, not yours. You shouldn't even have to think about it, except in really rare cases.
So grit your teeth and pass that array naked as an in out parameter. Trust me, you'll get to like it.
type vector is array (natural range <>) of integer;
my_vec : vector(1..3);
procedure test (subject : in out vector) is
begin
...
end;
Ada is designed to be quite usable in nearly all cases without needing pointers, and usable in all but a very few very rare cases without needing pointers to stack-allocated objects.
The former is fairly unsafe (dangers from unallocated pointers and memory leaks), and the latter is even more unsafe (stack objects may go out of scope before your pointer does, and even if they don't one little size error can corrupt your entire program). You can still do both in Ada, but unlike many languages it is designed to make unsafe things require a bit more work on your part to do, and make very unsafe things a major PITA to write.
For example, if you'd just dynamically allocate the entire array, you wouldn't have to fool with this aliased and all business. Furthermore, if you just want to pass the array into a subroutine, you could simply pass it as a parameter and you don't even have to fool with the dynamic allocations and deallocations. Again, Ada compilers are smart enough to pass large objects by reference (yes, even if you specified in). This takes an attitude adjustment from C/C++ coders, who are used to having to tell their dumbass compiler not to pass 10meg objects on the stack. You have to learn to let the Ada compiler worry about how to pass your parameters around efficiently, and you can just worry about how to write great code.
Think I found it, for anyone else running into the same issue.
The answer has to do with what specifically is being aliased. The array declaration needs to be:
type vector is array (1 .. 3) of aliased integer;
in order to make sure the integers are stored in memory, and not registers.