Julia convention for optional arguments - julia

Say I have a function such as f(x,y) but the y parameter is optional. What is the preferred way to set y as an optional argument? One option that works for me:
function f(x, y=nothing)
# do stuff
if y == nothing
# do stuff
else
# do stuff
end
# do stuff
end
But is this the preferred way? I can't set y to a single default value to use in calculation, as there are a number of calculations that are done differently when y is nothing. I could also just have separate functions f(x) and f(x,y) but that seems like too much code duplication.

This is fine. Note that optional arguments cause dispatch. This means that the if y == nothing (or equivalently, if typeof(y) <: Void), will actually compile away. You'll get two different functions which depend on whether the user gives a value or not. So in the end, the if statement compiles away and it's perfectly efficient to do this.
I will note that the same is not currently true for keyword arguments.
Another way to do this is to have two methods:
f(x)
and
f(x,y)
Whether two methods is nicer than 1 method with an if depends on the problem. Since the if will compile away using type information, there's no difference between these two approaches other than code organization.

Related

Type declaration in Julia for function argument that can be both array and scalar

What type should I specify in Julia for function arguments that can be either scalars or arrays? For instance, in the function below x and y could be e.g. Float64 or Array{Float64}.
function myfun(x, y)
return x .+ y
end
Is there an appropriate type declaration for such variables? Or should I just refrain from declaring the type there (or writing functions that are that generic)?
You can safely refrain from specifying types. This will not have an impact on performance of your code.
However, if you want to explicitly specify the type restriction you provided do (this is mostly useful to make sure your function is called with proper arguments and fail fast if it is not):
function myfun(x::Union{Float64, Array{Float64}},
y::Union{Float64, Array{Float64}})
return x .+ y
end
However, most likely you will rather want the following signature:
function myfun(x::Union{AbstractFloat, AbstractArray{<:AbstractFloat}},
y::Union{AbstractFloat, AbstractArray{<:AbstractFloat}})
return x .+ y
end
which says you accept any scalar float or any array of floats (not necessarily only Float64 and Array). This is more flexible, as e.g. you can accept views then or other floats (BigFloat or Float32) if you prefer to switch precision of your computations. Such a signature clearly signals your users what types of inputs you expect them to pass to myfun while remaining flexible.
I recommend this as being overly restrictive (Union{Float64, Array{Float64}}), while accepted by the compiler, usually leads to problems later when you start using your function with various input types.

Parameters of function in Julia

Does anyone know the reasons why Julia chose a design of functions where the parameters given as inputs cannot be modified?  This requires, if we want to use it anyway, to go through a very artificial process, by representing these data in the form of a ridiculous single element table.
Ada, which had the same kind of limitation, abandoned it in its 2012 redesign to the great satisfaction of its users. A small keyword (like out in Ada) could very well indicate that the possibility of keeping the modifications of a parameter at the output is required.
From my experience in Julia it is useful to understand the difference between a value and a binding.
Values
Each value in Julia has a concrete type and location in memory. Value can be mutable or immutable. In particular when you define your own composite type you can decide if objects of this type should be mutable (mutable struct) or immutable (struct).
Of course Julia has in-built types and some of them are mutable (e.g. arrays) and other are immutable (e.g. numbers, strings). Of course there are design trade-offs between them. From my perspective two major benefits of immutable values are:
if a compiler works with immutable values it can perform many optimizations to speed up code;
a user is can be sure that passing an immutable to a function will not change it and such encapsulation can simplify code analysis.
However, in particular, if you want to wrap an immutable value in a mutable wrapper a standard way to do it is to use Ref like this:
julia> x = Ref(1)
Base.RefValue{Int64}(1)
julia> x[]
1
julia> x[] = 10
10
julia> x
Base.RefValue{Int64}(10)
julia> x[]
10
You can pass such values to a function and modify them inside. Of course Ref introduces a different type so method implementation has to be a bit different.
Variables
A variable is a name bound to a value. In general, except for some special cases like:
rebinding a variable from module A in module B;
redefining some constants, e.g. trying to reassign a function name with a non-function value;
rebinding a variable that has a specified type of allowed values with a value that cannot be converted to this type;
you can rebind a variable to point to any value you wish. Rebinding is performed most of the time using = or some special constructs (like in for, let or catch statements).
Now - getting to the point - function is passed a value not a binding. You can modify a binding of a function parameter (in other words: you can rebind a value that a parameter is pointing to), but this parameter is a fresh variable whose scope lies inside a function.
If, for instance, we wanted a call like:
x = 10
f(x)
change a binding of variable x it is impossible because f does not even know of existence of x. It only gets passed its value. In particular - as I have noted above - adding such a functionality would break the rule that module A cannot rebind variables form module B, as f might be defined in a module different than where x is defined.
What to do
Actually it is easy enough to work without this feature from my experience:
What I typically do is simply return a value from a function that I assign to a variable. In Julia it is very easy because of tuple unpacking syntax like e.g. x,y,z = f(x,y,z), where f can be defined e.g. as f(x,y,z) = 2x,3y,4z;
You can use macros which get expanded before code execution and thus can have an effect modifying a binding of a variable, e.g. macro plusone(x) return esc(:($x = $x+1)) end and now writing y=100; #plusone(y) will change the binding of y;
Finally you can use Ref as discussed above (or any other mutable wrapper - as you have noted in your question).
"Does anyone know the reasons why Julia chose a design of functions where the parameters given as inputs cannot be modified?" asked by Schemer
Your question is wrong because you assume the wrong things.
Parameters are variables
When you pass things to a function, often those things are values and not variables.
for example:
function double(x::Int64)
2 * x
end
Now what happens when you call it using
double(4)
What is the point of the function modifying it's parameter x , it's pointless. Furthermore the function has no idea how it is called.
Furthermore, Julia is built for speed.
A function that modifies its parameter will be hard to optimise because it causes side effects. A side effect is when a procedure/function changes objects/things outside of it's scope.
If a function does not modifies a variable that is part of its calling parameter then you can be safe knowing.
the variable will not have its value changed
the result of the function can be optimised to a constant
not calling the function will not break the program's behaviour
Those above three factors are what makes FUNCTIONAL language fast and NON FUNCTIONAL language slow.
Furthermore when you move into Parallel programming or Multi Threaded programming, you absolutely DO NOT WANT a variable having it's value changed without you (The programmer) knowing about it.
"How would you implement with your proposed macro, the function F(x) which returns a boolean value and modifies c by c:= c + 1. F can be used in the following piece of Ada code : c:= 0; While F(c) Loop ... End Loop;" asked by Schemer
I would write
function F(x)
boolean_result = perform_some_logic()
return (boolean_result,x+1)
end
flag = true
c = 0
(flag,c) = F(c)
while flag
do_stuff()
(flag,c) = F(c)
end
"Unfortunately no, because, and I should have said that, c has to take again the value 0 when F return the value False (c increases as long the Loop lives and return to 0 when it dies). " said Schemer
Then I would write
function F(x)
boolean_result = perform_some_logic()
if boolean_result == true
return (true,x+1)
else
return (false,0)
end
end
flag = true
c = 0
(flag,c) = F(c)
while flag
do_stuff()
(flag,c) = F(c)
end

Puzzling results for Julia typeof

I am puzzled by the following results of typeof in the Julia 1.0.0 REPL:
# This makes sense.
julia> typeof(10)
Int64
# This surprised me.
julia> typeof(function)
ERROR: syntax: unexpected ")"
# No answer at all for return example and no error either.
julia> typeof(return)
# In the next two examples the REPL returns the input code.
julia> typeof(in)
typeof(in)
julia> typeof(typeof)
typeof(typeof)
# The "for" word returns an error like the "function" word.
julia> typeof(for)
ERROR: syntax: unexpected ")"
The Julia 1.0.0 documentation says for typeof
"Get the concrete type of x."
The typeof(function) example is the one that really surprised me. I expected a function to be a first-class object in Julia and have a type. I guess I need to understand types in Julia.
Any suggestions?
Edit
Per some comment questions below, here is an example based on a small function:
julia> function test() return "test"; end
test (generic function with 1 method)
julia> test()
"test"
julia> typeof(test)
typeof(test)
Based on this example, I would have expected typeof(test) to return generic function, not typeof(test).
To be clear, I am not a hardcore user of the Julia internals. What follows is an answer designed to be (hopefully) an intuitive explanation of what functions are in Julia for the non-hardcore user. I do think this (very good) question could also benefit from a more technical answer provided by one of the more core developers of the language. Also, this answer is longer than I'd like, but I've used multiple examples to try and make things as intuitive as possible.
As has been pointed out in the comments, function itself is a reserved keyword, and is not an actual function istself per se, and so is orthogonal to the actual question. This answer is intended to address your edit to the question.
Since Julia v0.6+, Function is an abstract supertype, much in the same way that Number is an abstract supertype. All functions, e.g. mean, user-defined functions, and anonymous functions, are subtypes of Function, in the same way that Float64 and Int are subtypes of Number.
This structure is deliberate and has several advantages.
Firstly, for reasons I don't fully understand, structuring functions in this way was the key to allowing anonymous functions in Julia to run just as fast as in-built functions from Base. See here and here as starting points if you want to learn more about this.
Secondly, because each function is its own subtype, you can now dispatch on specific functions. For example:
f1(f::T, x) where {T<:typeof(mean)} = f(x)
and:
f1(f::T, x) where {T<:typeof(sum)} = f(x) + 1
are different dispatch methods for the function f1
So, given all this, why does, e.g. typeof(sum) return typeof(sum), especially given that typeof(Float64) returns DataType? The issue here is that, roughly speaking, from a syntactical perspective, sum needs to serves two purposes simultaneously. It needs to be both a value, like e.g. 1.0, albeit one that is used to call the sum function on some input. But, it is also needs to be a type name, like Float64.
Obviously, it can't do both at the same time. So sum on its own behaves like a value. You can write f = sum ; f(randn(5)) to see how it behaves like a value. But we also need some way of representing the type of sum that will work not just for sum, but for any user-defined function, and any anonymous function. The developers decided to go with the (arguably) simplest option and have the type of sum print literally as typeof(sum), hence the behaviour you observe. Similarly if I write f1(x) = x ; typeof(f1), that will also return typeof(f1).
Anonymous functions are a bit more tricky, since they are not named as such. What should we do for typeof(x -> x^2)? What actually happens is that when you build an anonymous function, it is stored as a temporary global variable in the module Main, and given a number that serves as its type for lookup purposes. So if you write f = (x -> x^2), you'll get something back like #3 (generic function with 1 method), and typeof(f) will return something like getfield(Main, Symbol("##3#4")), where you can see that Symbol("##3#4") is the temporary type of this anonymous function stored in Main. (a side effect of this is that if you write code that keeps arbitrarily generating the same anonymous function over and over you will eventually overflow memory, since they are all actually being stored as separate global variables of their own type - however, this does not prevent you from doing something like this for n = 1:largenumber ; findall(y -> y > 1.0, x) ; end inside a function, since in this case the anonymous function is only compiled once at compile-time).
Relating all of this back to the Function supertype, you'll note that typeof(sum) <: Function returns true, showing that the type of sum, aka typeof(sum) is indeed a subtype of Function. And note also that typeof(typeof(sum)) returns DataType, in much the same way that typeof(typeof(1.0)) returns DataType, which shows how sum actually behaves like a value.
Now, given everything I've said, all the examples in your question now make sense. typeof(function) and typeof(for) return errors as they should, since function and for are reserved syntax. typeof(typeof) and typeof(in) correctly return (respectively) typeof(typeof), and typeof(in), since typeof and in are both functions. Note of course that typeof(typeof(typeof)) returns DataType.

Evaluation in definition

I am sorry about the title, but I couldn't find a better one.
Let's define
function test(n)
print("test executed")
return n
end
f(n) = test(n)
Every time we call f we get
f(5)
test executed
5
Is there a way to tell julia to evaluate test once in the definition of f?
I expect that this is probably not going to be possible, in which case I have a slightly different question. If ar=[1,2,:x,-2,2*:x] is there any way to define f(x) to be the sum of ar, i.e. f(x) = 3*x+1?
If you want to compile based on type information, you can use #generated functions. But it seems like you want to compile based on the runtime values of the input. In this case, you might want to do memoization. There is a library Memoize that provides a macro for memoizing functions.

Define a new method with only a few changes

I want to write a version that accepts a supplementary argument. The difference with the initial version only resides in a few lines of codes, potentially within loops. A typical example is to user a vector of weight w.
One solution is to completely rewrite a new function
function f(Vector::a)
...
for x in a
...
s += x[i]
...
end
...
end
function f(a::Vector, w::Vector)
...
for x in a
...
s += x[i] * w[i]
...
end
...
end
This solution duplicates code and therefore makes the program harder to maintain.
I could split ... into different helper functions, which are called by both functions, but the resulting code would be hard to follow
Another solution is to write only one function and use a ? : structure for each line that should be changed
function f(a, w::Union(Nothing, Vector) = nothing)
....
for x in a
...
s += (w == nothing)? x[i] : x[i] * w[i]
...
end
....
end
This code requires to check a condition at every step in a loop, which does not sound efficient, compared to the first version.
I'm sure there is a better solution, maybe using macros. What would be a good way to deal with this?
There are lots of ways to do this sort of thing, ranging from optional arguments to custom types to metaprogramming with #eval'ed code generation (this would splice in the changes for each new method as you loop over a list of possibilities).
I think in this case I'd use a combination of the approaches suggested by #ColinTBowers and #GnimucKey.
It's fairly simple to define a custom array type that is all ones:
immutable Ones{N} <: AbstractArray{Int,N}
dims::NTuple{N, Int}
end
Base.size(O::Ones) = O.dims
Base.getindex(O::Ones, I::Int...) = (checkbounds(O, I...); 1)
I've chosen to use an Int as the element type since it tends to promote well. Now all you need is to be a bit more flexible in your argument list and you're good to go:
function f(a::Vector, w::AbstractVector=Ones(size(a))
…
This should have a lower overhead than either of the other proposed solutions; getindex should inline nicely as a bounds check and the number 1, there's no type instability, and you don't need to rewrite your algorithm. If you're sure that all your accesses are in-bounds, you could even remove the bounds checking as an additional optimization. Or on a recent 0.4, you could define and use Base.unsafe_getindex(O::Ones, I::Int...) = 1 (that won't quite work on 0.3 since it's not guaranteed to be defined for all AbstractArrays).
In this case, using Optional Arguments may play the trick.
Just make the w argument default to ones().
I've come up against this problem a few times. If you want to avoid the conditional if statement inside the loop, one possibility is to use multiple dispatch over some dummy types. For example:
abstract MyFuncTypes
type FuncWithNoWeight <: MyFuncTypes; end
evaluate(x::Vector, i::Int, ::FuncWithNoWeight) = x[i]
type FuncWithWeight{T} <: MyFuncTypes
w::Vector{T}
end
evaluate(x::Vector, i::Int, wT::FuncWithWeight) = x[i] * wT.w[i]
function f(a, w::MyFuncTypes=FuncWithNoWeight())
....
for x in a
...
s += evaluate(x, i, w)
...
end
....
end
I extend the evaluate method over FuncWithNoWeight and FuncWithWeight in order to get the appropriate behaviour. I also nest these types within an abstract type MyFuncTypes, which is the second input to f (with default value of FuncWithNoWeight). From here, multiple dispatch and Julia's type system takes care of the rest.
One neat thing about this approach is that if you decide later on you want to add a third type of behaviour inside the loop (not necessarily even weighting, pretty much any type of transformation will be possible), it is as simple as defining a new type, nesting it under MyFuncTypes, and extending the evaluate method to the new type.
UPDATE: As Matt B. has pointed out, the first version of my answer accidentally introduced type instability into the function with my solution. As a general rule I typically find that if Matt posts something it is worth paying close attention (hint, hint, check out his answer). I'm still learning a lot about Julia (and am answering questions on StackOverflow to facilitate that learning). I've updated my answer to remove the type instability pointed out by Matt.

Resources