dropping singleton dimensions in julia

dropping singleton dimensions in julia - julia

Just playing around with Julia (1.0) and one thing that I need to use a lot in Python/numpy/matlab is the squeeze function to drop the singleton dimensions.
I found out that one way to do this in Julia is:
a = rand(3, 3, 1);
a = dropdims(a, dims = tuple(findall(size(a) .== 1)...))
The second line seems a bit cumbersome and not easy to read and parse instantly (this could also be my bias that I bring from other languages). However, I wonder if this is the canonical way to do this in Julia?

The actual answer to this question surprised me. What you are asking could be rephrased as:
why doesn't dropdims(a) remove all singleton dimensions?
I'm going to quote Tim Holy from the relevant issue here:
it's not possible to have squeeze(A) return a type that the compiler
can infer---the sizes of the input matrix are a runtime variable, so
there's no way for the compiler to know how many dimensions the output
will have. So it can't possibly give you the type stability you seek.
Type stability aside, there are also some other surprising implications of what you have written. For example, note that:
julia> f(a) = dropdims(a, dims = tuple(findall(size(a) .== 1)...))
f (generic function with 1 method)
julia> f(rand(1,1,1))
0-dimensional Array{Float64,0}:
0.9939103383167442
In summary, including such a method in Base Julia would encourage users to use it, resulting in potentially type-unstable code that, under some circumstances, will not be fast (something the core developers are strenuously trying to avoid). In languages like Python, rigorous type-stability is not enforced, and so you will find such functions.
Of course, nothing stops you from defining your own method as you have. And I don't think you'll find a significantly simpler way of writing it. For example, the proposition for Base that was not implemented was the method:
function squeeze(A::AbstractArray)
singleton_dims = tuple((d for d in 1:ndims(A) if size(A, d) == 1)...)
return squeeze(A, singleton_dims)
end
Just be aware of the potential implications of using it.

Let me simply add that "uncontrolled" dropdims (drop any singleton dimension) is a frequent source of bugs. For example, suppose you have some loop that asks for a data array A from some external source, and you run R = sum(A, dims=2) on it and then get rid of all singleton dimensions. But then suppose that one time out of 10000, your external source returns A for which size(A, 1) happens to be 1: boom, suddenly you're dropping more dimensions than you intended and perhaps at risk for grossly misinterpreting your data.
If you specify those dimensions manually instead (e.g., dropdims(R, dims=2)) then you are immune from bugs like these.

You can get rid of tuple in favor of a comma ,:
dropdims(a, dims = (findall(size(a) .== 1)...,))

I'm a bit surprised at Colin's revelation; surely something relying on 'reshape' is type stable? (plus, as a bonus, returns a view rather than a copy).
julia> function squeeze( A :: AbstractArray )
keepdims = Tuple(i for i in size(A) if i != 1);
return reshape( A, keepdims );
end;
julia> a = randn(2,1,3,1,4,1,5,1,6,1,7);
julia> size( squeeze(a) )
(2, 3, 4, 5, 6, 7)
No?

Related

Puzzling results for Julia typeof

I am puzzled by the following results of typeof in the Julia 1.0.0 REPL:
# This makes sense.
julia> typeof(10)
Int64
# This surprised me.
julia> typeof(function)
ERROR: syntax: unexpected ")"
# No answer at all for return example and no error either.
julia> typeof(return)
# In the next two examples the REPL returns the input code.
julia> typeof(in)
typeof(in)
julia> typeof(typeof)
typeof(typeof)
# The "for" word returns an error like the "function" word.
julia> typeof(for)
ERROR: syntax: unexpected ")"
The Julia 1.0.0 documentation says for typeof
"Get the concrete type of x."
The typeof(function) example is the one that really surprised me. I expected a function to be a first-class object in Julia and have a type. I guess I need to understand types in Julia.
Any suggestions?
Edit
Per some comment questions below, here is an example based on a small function:
julia> function test() return "test"; end
test (generic function with 1 method)
julia> test()
"test"
julia> typeof(test)
typeof(test)
Based on this example, I would have expected typeof(test) to return generic function, not typeof(test).

To be clear, I am not a hardcore user of the Julia internals. What follows is an answer designed to be (hopefully) an intuitive explanation of what functions are in Julia for the non-hardcore user. I do think this (very good) question could also benefit from a more technical answer provided by one of the more core developers of the language. Also, this answer is longer than I'd like, but I've used multiple examples to try and make things as intuitive as possible.
As has been pointed out in the comments, function itself is a reserved keyword, and is not an actual function istself per se, and so is orthogonal to the actual question. This answer is intended to address your edit to the question.
Since Julia v0.6+, Function is an abstract supertype, much in the same way that Number is an abstract supertype. All functions, e.g. mean, user-defined functions, and anonymous functions, are subtypes of Function, in the same way that Float64 and Int are subtypes of Number.
This structure is deliberate and has several advantages.
Firstly, for reasons I don't fully understand, structuring functions in this way was the key to allowing anonymous functions in Julia to run just as fast as in-built functions from Base. See here and here as starting points if you want to learn more about this.
Secondly, because each function is its own subtype, you can now dispatch on specific functions. For example:
f1(f::T, x) where {T<:typeof(mean)} = f(x)
and:
f1(f::T, x) where {T<:typeof(sum)} = f(x) + 1
are different dispatch methods for the function f1
So, given all this, why does, e.g. typeof(sum) return typeof(sum), especially given that typeof(Float64) returns DataType? The issue here is that, roughly speaking, from a syntactical perspective, sum needs to serves two purposes simultaneously. It needs to be both a value, like e.g. 1.0, albeit one that is used to call the sum function on some input. But, it is also needs to be a type name, like Float64.
Obviously, it can't do both at the same time. So sum on its own behaves like a value. You can write f = sum ; f(randn(5)) to see how it behaves like a value. But we also need some way of representing the type of sum that will work not just for sum, but for any user-defined function, and any anonymous function. The developers decided to go with the (arguably) simplest option and have the type of sum print literally as typeof(sum), hence the behaviour you observe. Similarly if I write f1(x) = x ; typeof(f1), that will also return typeof(f1).
Anonymous functions are a bit more tricky, since they are not named as such. What should we do for typeof(x -> x^2)? What actually happens is that when you build an anonymous function, it is stored as a temporary global variable in the module Main, and given a number that serves as its type for lookup purposes. So if you write f = (x -> x^2), you'll get something back like #3 (generic function with 1 method), and typeof(f) will return something like getfield(Main, Symbol("##3#4")), where you can see that Symbol("##3#4") is the temporary type of this anonymous function stored in Main. (a side effect of this is that if you write code that keeps arbitrarily generating the same anonymous function over and over you will eventually overflow memory, since they are all actually being stored as separate global variables of their own type - however, this does not prevent you from doing something like this for n = 1:largenumber ; findall(y -> y > 1.0, x) ; end inside a function, since in this case the anonymous function is only compiled once at compile-time).
Relating all of this back to the Function supertype, you'll note that typeof(sum) <: Function returns true, showing that the type of sum, aka typeof(sum) is indeed a subtype of Function. And note also that typeof(typeof(sum)) returns DataType, in much the same way that typeof(typeof(1.0)) returns DataType, which shows how sum actually behaves like a value.
Now, given everything I've said, all the examples in your question now make sense. typeof(function) and typeof(for) return errors as they should, since function and for are reserved syntax. typeof(typeof) and typeof(in) correctly return (respectively) typeof(typeof), and typeof(in), since typeof and in are both functions. Note of course that typeof(typeof(typeof)) returns DataType.

dealing with types in kwargs in Julia

How can I use kwargs in a Julia function and declare their types for speed?
function f(x::Float64; kwargs...)
kwargs = Dict(kwargs)
if haskey(kwargs, :c)
c::Float64 = kwargs[:c]
else
c::Float64 = 1.0
end
return x^2 + c
end
f(0.0, c=10.0)
yields:
ERROR: LoadError: syntax: multiple type declarations for "c"
Of course I can define the function as f(x::Float64, c::Float64=1.0) to achieve the result, but I have MANY optional arguments with default values to pass, so I'd prefer to use kwargs.
Thanks.
Related post

As noted in another answer, this really only matters if you're going to have a type instability. If you do, the answer is to layer your functions. Have a top layer which does type checking and all sorts of setup, and then call a function which uses dispatch to be fast. For example,
function f(x::Float64; kwargs...)
kwargs = Dict(kwargs)
if haskey(kwargs, :c)
c = kwargs[:c]
else
c = 1.0
end
return _f(x,c)
end
_f(x,c) = x^2 + c
If most of your time is spent in the inner function, then this will be faster (it might not be for very simple functions). This allows for very general usage too, where you have have a keyword argument be by default nothing and do and if nothing ... which could setup a complicated default, and not have to worry about the type stability since it will be shielded from the inner function.
This kind of high-level type-checking wrapper above a performance sensitive inner function is used a lot in DifferentialEquations.jl. Check out the high-level wrapper for the SDE solvers which led to nice speedups by insuring type stability (the inner function is sde_solve) (or check out the solve for ODEProblem, it's much more complex since it handles conversions to different pacakges but it's the same idea).
A simpler answer for small examples like yours may be possible after this PR merges.
To fix some confusion, here's a declaration form:
function f(x::Float64; kwargs...)
local c::Float64 # Ensures the type of `c` will be `Float64`
kwargs = Dict(kwargs)
if haskey(kwargs, :c)
c = float(kwargs[:c])
else
c = 1.0
end
return x^2 + c
end
This will force anything that saves to c to convert to a Float64 or error, resulting in a type-stability, but is not as general of a solution. What form you use really depends on what you're doing.
Lastly, there's also the type assert, as #TotalVerb showed:
function f(x::Float64; c::Float64=1.0, kwargs...)
return x^2 + c
end
That's clean, or you could assert in the function:
function f(x::Float64; kwargs...)
kwargs = Dict(kwargs)
if haskey(kwargs, :c)
c = float(kwargs[:c])::Float64
else
c = 1.0
end
return x^2 + c
end
which will cause convertions only on the lines where the assertion occurs (i.e. the #TotalVerb form won't dispatch, so you can't make another function with c::Int, and it will only assert (convert) when the keyword arg is first read in).
Summary
The first solution will dispatch to be type stable in _f no matter what type the user makes c, and so if _f is a long calculation, this will get pretty much optimal performance, but for really quick calls it will have dispatch overhead.
The second solution will fix any type stability by forcing anything you set c to be a Float64 (it will try to convert, and if it can't, error). Thus this gets speed by forcing type stability, or erroring.
The assert in the keyword spot (#TotalVerb's answer) is the cleanest, but won't auto-convert later (so you could get a type-instability. But if you don't accidentally convert it later, then you have type stability, types can be inferred, and so you'll get optimal performance) and you can't extend it to cases where the function has c passed in as other types (no dispatch).
The last solution is pretty much the same as 3, except not as nice. I wouldn't recommend it. If you're doing something complicated with asserts, you likely are designing something wrong or really want to do something like the first (dispatch in a longer function call which is type stable).
But note that dispatch with version 3 may be fixed in the near future, which would allow you to have a different function with c::Float64 and c::Int (if necessary). Hopefully your solution is in here somewhere.

Note that declaring types does not give you increased performance; you may wish to relax the type constraints on x and c for your code to be more generic. Anyway, this is probably what you want:
function f(x::Float64; c::Float64=1.0, kwargs...)
return x^2 + c
end
See the keyword arguments section of the manual.

How to declare array type that can have int and floats

I am a newbie to Julia and still trying to figure out everything.
I want to restrict input variable type to array that can contain int and floats.
I would really appreciate any help.
function foo(array::??)

As I mentioned in the comment, you don't want to mix them for performance reasons. However, if your array can be either Floats or Ints, but you don't know which it will be, then the best approach is to make it dispatch on the parametric type:
function foo{T<:Number,N}(array::Array{T,N})
This will make it compile a separate function for arrays of each number type (only when needed), and since the type will be known for the compiler, it will run an optimized version of the function whether you give it foo([0.1,0.3,0.4]), foo([1 2 3]), foo([1//2 3//4]), etc.
Updated syntax in Julia 0.6+
function foo(array::Array{T,N}) where {T<:Number,N}
For more generality, you can use Array{Union{Int64,Float64},N} as a type. This will allow Floats and Ints, and you can use its constructor like
arr = Array{Union{Int64,Float64},2}(4,4) # The 2 is the dimension, (4,4) is the size
and you can allow dispatching onto weird things like this as well by doing
function foo{T,N}(array::Array{T,N})
i.e. just remove the restriction on T. However, since the compiler cannot know in advance whether any element of the array is an Int or a Float, it cannot optimize it very well. So in general you should not do this...
But let me explain one way you can work with this and still get something with decent performance. It also works by multiple dispatch. Essentially, if you encase your inner loops with a function call which is a strictly typed dispatch, then when doing all of the hard calculations it can know exactly what type it is and optimize the code anyways. This is best explained by an example. Let's say we want to do:
function foo{T,N}(array::Array{T,N})
for i in eachindex(array)
val = array[i]
# do algorithm X on val
end
end
You can check using #code_warntype that val will not compile as an Int64 or Float64 because it won't know until runtime what type it will be for each i. If you check #code_llvm (or #code_native for the assembly) you see that there is a really long code that is generated in order to handle this. What we can instead do is define
function inner_foo{T<:Number}(val::T)
# Do algorithm X on val
end
and then instead define foo as
function foo2{T,N}(array::Array{T,N})
for i in eachindex(array)
inner_foo(array[i])
end
end
While this looks the same to you, it is very different to the compiler. Note that inner_foo(array[i]) dispatches a specially-compiled function for whatever number type it sees, so in foo2 algorithm X is calculated efficiently, and the only non-efficient part is the wrapping above inner_foo (so if all your time is spent in inner_foo, you will get basically maximal performance).
This is why Julia is built around multiple-dispatch: it's a design which allows you to push things out to optimized functions whenever possible. Julia is fast because of it. Use it.

This should be a comment to Chris' answer, but I don't have enough points to comment.
As Chris points out, using function barriers can be quite useful to generate optimal code. However be aware that dynamic dispatch has some overhead. This may or may not be important depending on the complexity of the inner function.
function foo1{T,N}(array::Array{T,N})
s = 0.0
for i in eachindex(array)
val = array[i]
s += val*val
end
s
end
function foo2{T,N}(array::Array{T,N})
s = 0.0
for i in eachindex(array)
s += inner_foo(array[i])
end
s
end
function foo3{T,N}(array::Array{T,N})
s = 0.0
for i in eachindex(array)
val = array[i]
if isa(val, Float64)
s += inner_foo(val::Float64)
else
s += inner_foo(val::Int64)
end
end
s
end
function inner_foo{T<:Number}(val::T)
val*val
end
For A = Array{Union{Int64,Float64},N}, foo2 doesn't provide much speedup over foo1 since benefit of the optimised inner_foo is countered by the cost of dynamic dispatch.
foo3 is much faster (~7 times) and could be used if possible types are limited and known ahead of time (as in above example where elements are either Int64 or Float64)
See https://groups.google.com/forum/#!topic/julia-users/OBs0fmNmjCU for further discussion.

fast apply_along_axis equivalent in Julia

Is there an equivalent to numpy's apply_along_axis() (or R's apply())in Julia? I've got a 3D array and I would like to apply a custom function to each pair of co-ordinates of dimensions 1 and 2. The results should be in a 2D array.
Obviously, I could do two nested for loops iterating over the first and second dimension and then reshape, but I'm worried about performance.
This Example produces the output I desire (I am aware this is slightly pointless for sum(). It's just a dummy here:
test = reshape(collect(1:250), 5, 10, 5)
a=[]
for(i in 1:5)
for(j in 1:10)
push!(a,sum(test[i,j,:]))
end
end
println(reshape(a, 5,10))
Any suggestions for a faster version?
Cheers

Julia has the mapslices function which should do exactly what you want. But keep in mind that Julia is different from other languages you might know: library functions are not necessarily faster than your own code, because they may be written to a level of generality higher than what you actually need, and in Julia loops are fast. So it's quite likely that just writing out the loops will be faster.
That said, a couple of tips:
Read the performance tips section of the manual. From that you'd learn to put everything in a function, and to not use untyped arrays like a = [].
The slice or sub function can avoid making a copy of the data.

How about
f = sum # your function here
Int[f(test[i, j, :]) for i in 1:5, j in 1:10]
The last line is a two-dimensional array comprehension.
The Int in front is to guarantee the type of the elements; this should not be necessary if the comprehension is inside a function.
Note that you should (almost) never use untyped (Any) arrays, like your a = [], since this will be slow. You can write a = Int[] instead to create an empty array of Ints.
EDIT: Note that in Julia, loops are fast. The need for creating functions like that in Python and R comes from the inherent slowness of loops in those languages. In Julia it's much more common to just write out the loop.

Define a new method with only a few changes

I want to write a version that accepts a supplementary argument. The difference with the initial version only resides in a few lines of codes, potentially within loops. A typical example is to user a vector of weight w.
One solution is to completely rewrite a new function
function f(Vector::a)
...
for x in a
...
s += x[i]
...
end
...
end
function f(a::Vector, w::Vector)
...
for x in a
...
s += x[i] * w[i]
...
end
...
end
This solution duplicates code and therefore makes the program harder to maintain.
I could split ... into different helper functions, which are called by both functions, but the resulting code would be hard to follow
Another solution is to write only one function and use a ? : structure for each line that should be changed
function f(a, w::Union(Nothing, Vector) = nothing)
....
for x in a
...
s += (w == nothing)? x[i] : x[i] * w[i]
...
end
....
end
This code requires to check a condition at every step in a loop, which does not sound efficient, compared to the first version.
I'm sure there is a better solution, maybe using macros. What would be a good way to deal with this?

There are lots of ways to do this sort of thing, ranging from optional arguments to custom types to metaprogramming with #eval'ed code generation (this would splice in the changes for each new method as you loop over a list of possibilities).
I think in this case I'd use a combination of the approaches suggested by #ColinTBowers and #GnimucKey.
It's fairly simple to define a custom array type that is all ones:
immutable Ones{N} <: AbstractArray{Int,N}
dims::NTuple{N, Int}
end
Base.size(O::Ones) = O.dims
Base.getindex(O::Ones, I::Int...) = (checkbounds(O, I...); 1)
I've chosen to use an Int as the element type since it tends to promote well. Now all you need is to be a bit more flexible in your argument list and you're good to go:
function f(a::Vector, w::AbstractVector=Ones(size(a))
…
This should have a lower overhead than either of the other proposed solutions; getindex should inline nicely as a bounds check and the number 1, there's no type instability, and you don't need to rewrite your algorithm. If you're sure that all your accesses are in-bounds, you could even remove the bounds checking as an additional optimization. Or on a recent 0.4, you could define and use Base.unsafe_getindex(O::Ones, I::Int...) = 1 (that won't quite work on 0.3 since it's not guaranteed to be defined for all AbstractArrays).

In this case, using Optional Arguments may play the trick.
Just make the w argument default to ones().

I've come up against this problem a few times. If you want to avoid the conditional if statement inside the loop, one possibility is to use multiple dispatch over some dummy types. For example:
abstract MyFuncTypes
type FuncWithNoWeight <: MyFuncTypes; end
evaluate(x::Vector, i::Int, ::FuncWithNoWeight) = x[i]
type FuncWithWeight{T} <: MyFuncTypes
w::Vector{T}
end
evaluate(x::Vector, i::Int, wT::FuncWithWeight) = x[i] * wT.w[i]
function f(a, w::MyFuncTypes=FuncWithNoWeight())
....
for x in a
...
s += evaluate(x, i, w)
...
end
....
end
I extend the evaluate method over FuncWithNoWeight and FuncWithWeight in order to get the appropriate behaviour. I also nest these types within an abstract type MyFuncTypes, which is the second input to f (with default value of FuncWithNoWeight). From here, multiple dispatch and Julia's type system takes care of the rest.
One neat thing about this approach is that if you decide later on you want to add a third type of behaviour inside the loop (not necessarily even weighting, pretty much any type of transformation will be possible), it is as simple as defining a new type, nesting it under MyFuncTypes, and extending the evaluate method to the new type.
UPDATE: As Matt B. has pointed out, the first version of my answer accidentally introduced type instability into the function with my solution. As a general rule I typically find that if Matt posts something it is worth paying close attention (hint, hint, check out his answer). I'm still learning a lot about Julia (and am answering questions on StackOverflow to facilitate that learning). I've updated my answer to remove the type instability pointed out by Matt.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

dropping singleton dimensions in julia - julia

You can get rid of tuple in favor of a comma ,: dropdims(a, dims = (findall(size(a) .== 1)...,))

Related

Puzzling results for Julia typeof

dealing with types in kwargs in Julia

How to declare array type that can have int and floats

fast apply_along_axis equivalent in Julia

Define a new method with only a few changes

Categories

Resources