Why were these common operations on collections renamed in Julia? - julia

Many common operations on collections in Julia such as deleting an item from a Set were renamed recently, with the old functions deprecated.
For example,
del(IntSet(1,2,3), 1)
now pops up a warning
WARNING: del is deprecated, use delete! instead.
Some of the renamed functions:
@deprecate push push!
@deprecate pop pop!
@deprecate grow grow!
@deprecate enqueue unshift!
@deprecate unshift unshift!
@deprecate shift shift!
@deprecate insert insert!
@deprecate del delete!
@deprecate del_all empty!
Why were these renamed? Is appending a ! to functions that change the state of a collection now a convention?

You can read the julia-dev thread here. Basically, it's simply a change to respect the rule described in the arrays documentation:
The last function, fill!, is different in that it modifies an existing array instead of constructing a new one. As a convention, functions with this property have names ending with an exclamation point. These functions are sometimes called "mutating" functions, or "in-place" functions.
FWIW I think this is a good idea, at least for Base.

The use of ! was always an explicit convention to indicate mutation, it was just not properly enforced until recently. For push, one could easily argue that it's not essential. But using push! instead of push makes it clear that mutation is happening in this case in precisely the same way that mutation would happen when using sort!, which is very different from sort.
This exclamation-mark convention exists in Scheme and Ruby and probably several other languages. It doesn't exist in a language like R which wouldn't allow one to perform the mutation without recourse to the underlying guts of the language.
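As a quick illustration of the convention (a minimal sketch using current Julia syntax; the variable names are purely illustrative):
v = [3, 1, 2]
sorted = sort(v)            # returns a new sorted array; v is unchanged
sort!(v)                    # sorts v in place
push!(v, 4)                 # appends 4 to v, mutating it
delete!(Set([1, 2, 3]), 1)  # removes 1 from the set in place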

Related

Difference between the internal procedures and functions in Progress4gl?

Both internal procedures and functions accept parameters and can produce output. So what is the use of internal procedures instead of functions?
A user-defined function is used when you want to perform some calculation and return a single value. In this respect it is the same as a built-in ABL function, like the SUBSTRING or EXP functions. Putting this calculation code in a FUNCTION block instead of inline in your code allows you to put it in one place and reference it multiple times without code duplication.
An internal procedure is also an encapsulated piece of code that does some work, but it is more general-purpose. While a function must return exactly one value, an internal procedure returns no value and may have any combination of input and output parameters, or none at all.
https://docs.progress.com/category/openedge-archives
Also, function (like method) parameters and return value types are checked at compile time, which removes some potential problems at run time later.
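To make the contrast concrete, here is a minimal ABL sketch (the names AddTwo and AddTwoProc are made up for illustration):
/* A user-defined function: returns a single value, usable in expressions. */
FUNCTION AddTwo RETURNS INTEGER (INPUT a AS INTEGER, INPUT b AS INTEGER):
    RETURN a + b.
END FUNCTION.

/* An internal procedure: no return value, but any mix of parameters. */
PROCEDURE AddTwoProc:
    DEFINE INPUT  PARAMETER a AS INTEGER NO-UNDO.
    DEFINE INPUT  PARAMETER b AS INTEGER NO-UNDO.
    DEFINE OUTPUT PARAMETER c AS INTEGER NO-UNDO.
    c = a + b.
END PROCEDURE.

DEFINE VARIABLE total AS INTEGER NO-UNDO.
total = AddTwo(1, 2).                            /* called inside an expression */
RUN AddTwoProc (INPUT 1, INPUT 2, OUTPUT total). /* invoked with RUN */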
The question acknowledges that both functions and internal procedures allow OUTPUT parameters and asks "what is the use" of internal procedures instead of functions.
To me, this implies that the poster is contemplating always using functions and deprecating internal procedures and is asking: "what would I lose if I do that?"
Two things spring to mind:
Sort of the opposite of Jean-Christophe Cardot's point: you would lose some automatic type conversions and syntactic flexibility about the parameter lists. Some people see that flexibility in a negative light. Others see it as a positive.
You need to "forward declare" your functions or use dynamic invocations. With an internal procedure you can RUN it without providing a declaration earlier in the code.
If you tend to think that strict type checking is useful then these are probably not benefits that you think of as being lost. If you prefer more flexible behaviors, then you may regret choosing functions rather than internal procedures.

Julia functions: making mutable types immutable

Coming from Wolfram Mathematica, I like the idea that whenever I pass a variable to a function I am effectively creating a copy of that variable. On the other hand, I am learning that in Julia there are the notions of mutable and immutable types, with the former passed by reference and the latter passed by value. Can somebody explain to me the advantage of such a distinction? Why are arrays passed by reference? Naively I see this as a bad aspect, since it creates side effects and ruins the possibility of writing purely functional code. Where am I wrong in my reasoning? Is there a way to make an array immutable, such that when it is passed to a function it is effectively passed by value?
Here is an example:
# x is an Int and so is immutable: it is passed by value
x = 10
function change_value(x)
    x = 17
end
change_value(x)
println(x)
# arrays are mutable: they are passed by reference
arr = [1, 2, 3]
function change_array!(A)
    A[1] = 20
end
change_array!(arr)
println(arr)
which indeed modifies the array arr
There is a fair bit to respond to here.
First, Julia is neither pass-by-reference nor pass-by-value. Rather, it employs a paradigm known as pass-by-sharing. Quoting the docs:
Function arguments themselves act as new variable bindings (new locations that can refer to values), but the values they refer to are identical to the passed values.
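In practice this means the argument and the caller's variable refer to the same object, but rebinding the argument inside the function has no effect outside it. A small sketch (the names are illustrative only):
a = [1, 2, 3]
shares(v) = v === a    # identity check: same object, no copy was made
function rebind(v)
    v = [0, 0, 0]      # rebinding the local name does not affect the caller
    return nothing
end
println(shares(a))     # true
rebind(a); println(a)  # [1, 2, 3] -- the caller's binding is unchanged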
Second, you appear to be asking why Julia does not copy arrays when passing them into functions. This is a simple one to answer: Performance. Julia is a performance oriented language. Making a copy every time you pass an array into a function is bad for performance. Every copy operation takes time.
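A rough way to see the cost of copying (exact figures depend on your machine; this is just a sketch):
xs = rand(10^7)
no_copy(v)   = sum(v)        # works on the shared array
with_copy(v) = sum(copy(v))  # forces a full copy first
@time no_copy(xs)            # negligible allocation after the first (compiling) call
@time with_copy(xs)          # allocates roughly 80 MB just for the copy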
This has some interesting side-effects. For example, you'll notice that a lot of the mature Julia packages (as well as the Base code) consists of many short functions. This code structure is a direct consequence of near-zero overhead to function calls. Languages like Mathematica and MatLab on the other hand tend towards long functions. I have no desire to start a flame war here, so I'll merely state that personally I prefer the Julia style of many short functions.
Third, you are wondering about the potential negative implications of pass-by-sharing. In theory you are correct that this can result in problems when users are unsure whether a function will modify its inputs. There were long discussions about this in the early days of the language, and based on your question, you appear to have worked out that the convention is that functions that modify their arguments have a trailing ! in the function name. Interestingly, this standard is not compulsory so yes, it is in theory possible to end up with a wild-west type scenario where users live in a constant state of uncertainty. In practice this has never been a problem (to my knowledge). The convention of using ! is enforced in Base Julia, and in fact I have never encountered a package that does not adhere to this convention. In summary, yes, it is possible to run into issues when pass-by-sharing, but in practice it has never been a problem, and the performance benefits far outweigh the cost.
Fourth (and finally), you ask whether there is a way to make an array immutable. First things first, I would strongly recommend against hacks to attempt to make native arrays immutable. For example, you could attempt to disable the setindex! function for arrays... but please don't do this. It will break so many things.
As was mentioned in the comments on the question, you could use StaticArrays. However, as Simeon notes in the comments on this answer, there are performance penalties for using static arrays for really big datasets. More than 100 elements and you can run into compilation issues. The main benefit of static arrays really is the optimizations that can be implemented for smaller static arrays.
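For small fixed-size arrays, that might look like this (a sketch, assuming the StaticArrays package is installed):
using StaticArrays
v = SVector(1, 2, 3)  # an immutable, fixed-size 3-element vector
sum(v)                # reads work as usual
# v[1] = 10           # errors: an SVector cannot be mutated in place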
Another package-based option suggested by phipsgabler in the comments below is FunctionalCollections. This appears to do what you want, although it looks to be only sporadically maintained. Of course, that isn't always a bad thing.
A simpler approach is just to copy arrays in your own code whenever you want to implement pass-by-value. For example:
f!(copy(x))
Just be sure you understand the difference between copy and deepcopy, and when you may need to use the latter. If you're only working with arrays of numbers, you'll never need the latter, and in fact using it will probably drastically slow down your code.
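The distinction only matters for nested containers; a minimal sketch:
a = [[1, 2], [3, 4]]  # a vector of vectors
b = copy(a)           # shallow copy: new outer vector, but the inner vectors are shared
b[1][1] = 99
println(a[1][1])      # prints 99 -- the inner vector was shared
c = deepcopy(a)       # recursively copies the inner vectors too
c[1][1] = -1
println(a[1][1])      # still 99 -- a is unaffected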
If you wanted to do a bit of work then you could also build your own array type in the spirit of static arrays, but without all the bells and whistles that static arrays entails. For example:
struct MyImmutableArray{T,N}
    x::Array{T,N}
end
Base.getindex(y::MyImmutableArray, inds...) = getindex(y.x, inds...)
and similarly you could add any other functions you wanted to this type, while excluding functions like setindex!.
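Usage would then look something like this (a sketch):
y = MyImmutableArray(rand(3))
y[2]         # works: forwarded to the wrapped array via getindex
# y[2] = 0.0 # MethodError: setindex! is not defined for MyImmutableArray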

How to pass an object by reference and value in Julia?

I know that from here:
Julia function arguments follow a convention sometimes called "pass-by-sharing", which means that values are not copied when they are passed to functions. Function arguments themselves act as new variable bindings (new locations that can refer to values), but the values they refer to are identical to the passed values. Modifications to mutable values (such as Arrays) made within a function will be visible to the caller. This is the same behavior found in Scheme, most Lisps, Python, Ruby and Perl, among other dynamic languages.
Given this, it's clear to me that to pass by reference, all you need to do is have a mutable type that you pass into a function and edit.
My question then becomes, how can I clearly distinguish between pass by value and pass by reference? Does anyone have an example that shows a function being called twice; once with pass by reference, and once with pass by value?
I saw this post which alludes to some similar ideas, but it did not fully answer my question.
In Julia, functions always have pass-by-sharing argument-passing behavior:
https://docs.julialang.org/en/v1/manual/functions/
This argument-passing convention is also used in most general purpose dynamic programming languages, including various Lisps, Python, Perl and Ruby. A good and useful description can be found here:
https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_sharing
In short, pass-by-sharing works like pass-by-reference, but you cannot change which value a binding in the calling scope refers to by reassigning to an argument in the function being called: if you reassign an argument, the binding in the caller is unchanged. This means that in general you cannot use functions to change bindings, such as, for example, to swap two variables. (Macros can, however, modify bindings in the caller.) In particular, if a variable in the caller refers to an immutable value like an integer or a floating-point number, its value cannot be changed by a function call, since which object the variable refers to cannot be changed by a function call and the value itself cannot be modified as it is immutable.
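For example, a function cannot swap two of the caller's variables (a sketch):
function try_swap(a, b)
    a, b = b, a   # only rebinds the local names a and b
    return nothing
end
x, y = 1, 2
try_swap(x, y)
println((x, y))   # still (1, 2) -- the caller's bindings are untouched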
If you want something like R or Matlab pass-by-value behavior, you need to explicitly create a copy of the argument before modifying it. This is precisely what R and Matlab do when a passed argument is modified while an external reference to it remains. In Julia it must be done explicitly by the programmer rather than being done automatically by the system. A downside is that the system can sometimes know that no copy is required (no external references remain) when the programmer cannot generally know this. That ability, however, is deeply tied to the reference-counting garbage-collection technique, which is not used by Julia due to performance considerations.
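Making the copy inside the function is one way to get that value-semantics feel explicitly (a sketch; the function name is made up):
function zero_out(v)
    w = copy(v)   # work on a private copy
    w .= 0
    return w
end
v = [1, 2, 3]
zero_out(v)
println(v)        # [1, 2, 3] -- untouched, as it would be in R or Matlab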
By convention, functions which mutate the contents of an argument have a ! postfix (e.g., sort vs. sort!).

Binding vs Linking in Ada

I wonder what is the fundamental difference between binding and linking when working with Ada code? I couldn't find a good explanation on google and this is why I ask the question.
For the binding process what is the input and what is the output?
What is the relation between binding and linking? I assume binding needs to be done first.
Thanks,
Bogdan.
With GNAT, there are two jobs which the binder performs: first, checking that all the necessary compilations have been done, so that the program’s closure is consistent, and secondly arranging for elaboration to happen (these jobs are needed for any Ada build system, but they may be implemented differently).
When using gnatmake, the first of these jobs is usually superfluous, because gnatmake has already organised all the necessary compilations. It is possible to get this wrong (by, for example, moving a unit to a different library and not deleting its compilation products from the original place) but quite hard!
Elaboration is a feature of Ada that isn’t present in many other languages. There’s explanation at gcc.gnu.org and other places, but for a simple example,
with Foo;
package Bar is
   Int : Integer := Foo.Value;
   [...]
end Bar;

package Foo is
   function Value return Integer;
   [...]
end Foo;
we don’t know what Foo.Value is going to return at compile time, and we may not know until run time (what if it reads a value from the command line?), so Foo.Value must be in a fit state to be called before Bar’s initialisation happens.
Bar’s initialisation happens when Bar is elaborated, and likewise for Foo, so it’s gnatbind’s job to recognise this and arrange that Foo is elaborated before Bar.
It does this by emitting calls to the packages' elaboration code in a function (usually called adainit), and a main(), which is to be called by the operating system and calls adainit and then the Ada main program, say program.adb.
gnatmake then calls gnatlink, which takes the gnatbind-generated code, in Ada in files called b-program.ad[sb] or b__program.ad[sb] or b~program.ad[sb] depending on the vintage of the compiler, compiles it, and links it with the program’s closure to produce the final executable.
See the four points listed here: https://docs.adacore.com/gnat_ugn-docs/html/gnat_ugn/gnat_ugn/building_executable_programs_with_gnat.html#binding-with-gnatbind
You could think of it as a built-in make but without the recompilation: it ensures objects are consistent, generates a correct initialization order, compiles it, and passes everything to the linker.
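For a program whose main procedure is in program.adb, the steps that gnatmake automates look roughly like this (the unit names are illustrative; exact binder file names vary with the GNAT version):
gcc -c foo.adb        # compile the units in the program's closure
gcc -c program.adb
gnatbind program.ali  # check consistency and generate the elaboration code
gnatlink program.ali  # compile the binder file and link the final executable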
As pointed out, in Ada the program entry point is not your main procedure, but one that performs a safe initialization and then calls your main procedure.

When to use Pragma Pure/Preelaborate

Is there a set of general rules/guidelines that can help to understand when to prefer pragma Pure, pragma Preelaborate, or something else entirely? The rules and definitions presented in the standard (Ada 2012), are a little heavy-going and I'd be grateful to read something that's a little more clear and geared towards the average case.
If I wanted to be thorough without fully understanding the "why" of it, can I simply try:
Mark the package spec with pragma Pure;
If it doesn't compile, try pragma Preelaborate;
If that fails, then I've done something tricky and either need to pragma Elaborate units on a with-by-with basis, or rethink the package layout.
While this might work (does it?), and it's recommended to mark a package as Pure whenever possible (likewise with Preelaborate), it seems a bit brain-damaged and I'd prefer to understand the process a bit better.
pragma Pure
You should use this on any package which does not have an internal state. It tells the user of the package that calls to any subprograms cannot have side effects, because there is no internal state they could change. So a function declared at library level inside a pure package will always return the same result when called with the same parameters.
The Ada implementation is allowed to cache return values of functions of a pure package, and to omit calls to subroutines if their return values won't be used because of these requirements. However, you can violate the constraints by calling imported subroutines (e.g. from a C library) inside your pure package (these may change some internal state which the Ada compiler doesn't know of). If you're evil, you can even import Ada subroutines from other parts of the software with pragma Import to bypass the requirements of pragma Pure. Needless to say: If you're doing anything like this, don't use pragma Pure.
Edit: To clarify the circumstances when calls may be omitted, let me quote the ARM:
If a library unit is declared pure, then the implementation is permitted to omit a call on a library-level subprogram of the library unit if the results are not needed after the call. Similarly, it may omit such a call and simply reuse the results produced by an earlier call on the same subprogram, provided that none of the parameters are of a limited type, and the addresses and values of all by-reference actual parameters, and the values of all by-copy-in actual parameters, are the same as they were at the earlier call. This permission applies even if the subprogram produces other side effects when called.
GNAT, for example, additionally defines that any subroutines that take a parameter of type System.Address or a type derived from it are not considered pure even if they are defined in a pure package, because the location the address points to may be altered, but GNAT does not know what kind of structure the address points to and therefore cannot run any checks about whether the referenced value of the parameter has been changed.
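A typical pure package spec might look like this (a sketch; the names are made up):
package Math_Utils is
   pragma Pure;
   --  No package state: results depend only on the parameters.
   function Square (X : Integer) return Integer;
end Math_Utils;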
pragma Preelaborate
This tells the compiler that the package won't execute any code at elaboration time (i.e. before the main procedure starts executing). At elaboration time, the following constructs will execute:
Initialization of library-level variables (this can be a function call)
Initialization of tasks declared at library level (they may start executing before the main procedure does)
Statements in a begin ... end block at library level
You generally should avoid these things if you don't need them. Use pragma Preelaborate wherever possible; it tells the caller that the package can be used safely without executing anything at elaboration time.
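For example, a preelaborable spec avoids all of those (a sketch; the names are made up):
package Settings is
   pragma Preelaborate;
   Max_Retries : constant Integer := 3;  --  static initialization is fine
   --  Start_Time : Ada.Calendar.Time := Ada.Calendar.Clock;
   --  would be rejected: it executes a function call at elaboration time
end Settings;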
If something doesn't compile with one of these pragmas when you think it should, look into why it doesn't compile. It may help you discover problems with your package implementation or structure. Don't just drop the pragma when it doesn't compile. Because your choice constrains which pragmas any packages that depend on yours can use, you should always choose the strictest applicable pragma.
Elaboration Order Handling in GNAT is a helpful guide. Ideally, the standard rules will suffice for most programs. The pragmas tell the compiler to substitute your elaboration order. They should be applied to solve specific problems, rather than used empirically.
Addendum: @ajb underscores an important distinction among the pragmas. The article cited agrees with the approach outlined in the question (bullets one and two): "Consequently a good rule is to mark units as Pure or Preelaborate if possible, and if this is not possible, mark them as Elaborate_Body if possible." It goes on to discuss situations (bullet three) "where neither of these three pragmas can be used."
