getindex for specific dims in Julia - julia

Suppose I want to write a dynamic function that takes an object whose type is a subtype of AbstractMatrix and shuffles the values along a specified dimension. There are surely various ways to do this, but suppose I go about it like this:
import Random.shuffle
function shuffle(data::AbstractMatrix; dims=1)
    n = size(data, dims)
    shuffled_idx = shuffle(1:n)
    data[shuffled_idx, :] # This line is wrong: it always indexes along dimension 1, whatever dims is
end
One wrong way is to chain an open-ended series of if-else statements, like if dims==1 ... elseif dims==2 ..., but that isn't how this kind of thing should be done. If I declared data::AbstractArray instead, the input could have any number of dimensions. So it occurred to me that this would be possible with something like getindex(data, [idxs]; dims). I checked the methods of getindex for a dims keyword (or even positional) argument, but there is no such definition. So how can I get the values at specified indices along a given dimension?

You are looking for selectdim:
help?> selectdim
search: selectdim
selectdim(A, d::Integer, i)
Return a view of all the data of A where the index for dimension d equals i.
Equivalent to view(A,:,:,...,i,:,:,...) where i is in position d.
Here's a code example:
using Random: shuffle
function myshuffle(data::AbstractMatrix; dim=1)
    # Shuffle the actual axis of the array rather than assuming 1:n
    inds = shuffle(axes(data, dim))
    return selectdim(data, dim, inds)
end
Make sure not to use 1:n as indices for AbstractArrays, as they may have non-standard indices. Use axes instead.
BTW, selectdim apparently returns a view, so you may or may not need to use collect on it.

Related

Function taking Vectors and Scalars

I have a function that takes a vector
function foo(x::Vector{Int64})
x
end
How can I make it work also for scalars (i.e. turn them into one-element vectors)?
I know that I can do this:
foo(x::Int64) = foo([x])
but this stops being cool when there are more arguments, because you're writing multiple methods to achieve only one thing.
I think I need something like foo(x::Union{Int64, Vector{Int64}}), but I don't know where or how it works or if it is the right thing to do.
Can anyone help?
You can make a helper function which either converts or does nothing. Then the main function can accept any combination:
_vec(x::Number) = [x]
_vec(x::AbstractVector) = x
function f(x, y, z) # could specify ::Union{Number, AbstractVector}
xv = _vec(x)
yv = _vec(y)
...
end
The ... could do the actual work, or could call f(xv, yv, zv) where another method f(x::AbstractVector, y::AbstractVector, z::AbstractVector) does the work --- whichever seems cleaner.
This mainly comes up when the vector version of your function does the same thing to each of its elements. In that case, what you want to do is define f(x::Int) and use broadcasting, f.([1,2,3]), for the vector case.

Define a new method with only a few changes

I want to write a version of a function that accepts a supplementary argument. It differs from the initial version in only a few lines of code, potentially within loops. A typical example is to use a vector of weights w.
One solution is to completely rewrite a new function
function f(a::Vector)
...
for x in a
...
s += x[i]
...
end
...
end
function f(a::Vector, w::Vector)
...
for x in a
...
s += x[i] * w[i]
...
end
...
end
This solution duplicates code and therefore makes the program harder to maintain.
I could split ... into different helper functions that are called by both functions, but the resulting code would be hard to follow.
Another solution is to write only one function and use a ? : structure for each line that should be changed
function f(a, w::Union(Nothing, Vector) = nothing)
....
for x in a
...
s += (w == nothing)? x[i] : x[i] * w[i]
...
end
....
end
This code has to check a condition at every step of the loop, which does not sound efficient compared to the first version.
I'm sure there is a better solution, maybe using macros. What would be a good way to deal with this?
There are lots of ways to do this sort of thing, ranging from optional arguments to custom types to metaprogramming with @eval'ed code generation (this would splice in the changes for each new method as you loop over a list of possibilities).
I think in this case I'd use a combination of the approaches suggested by @ColinTBowers and @GnimucKey.
It's fairly simple to define a custom array type that is all ones:
immutable Ones{N} <: AbstractArray{Int,N}
dims::NTuple{N, Int}
end
Base.size(O::Ones) = O.dims
Base.getindex(O::Ones, I::Int...) = (checkbounds(O, I...); 1)
I've chosen to use an Int as the element type since it tends to promote well. Now all you need is to be a bit more flexible in your argument list and you're good to go:
function f(a::Vector, w::AbstractVector=Ones(size(a)))
…
This should have a lower overhead than either of the other proposed solutions; getindex should inline nicely as a bounds check and the number 1, there's no type instability, and you don't need to rewrite your algorithm. If you're sure that all your accesses are in-bounds, you could even remove the bounds checking as an additional optimization. Or on a recent 0.4, you could define and use Base.unsafe_getindex(O::Ones, I::Int...) = 1 (that won't quite work on 0.3 since it's not guaranteed to be defined for all AbstractArrays).
In this case, using optional arguments may do the trick.
Just make the w argument default to ones().
I've come up against this problem a few times. If you want to avoid the conditional if statement inside the loop, one possibility is to use multiple dispatch over some dummy types. For example:
abstract MyFuncTypes
type FuncWithNoWeight <: MyFuncTypes; end
evaluate(x::Vector, i::Int, ::FuncWithNoWeight) = x[i]
type FuncWithWeight{T} <: MyFuncTypes
w::Vector{T}
end
evaluate(x::Vector, i::Int, wT::FuncWithWeight) = x[i] * wT.w[i]
function f(a, w::MyFuncTypes=FuncWithNoWeight())
....
for x in a
...
s += evaluate(x, i, w)
...
end
....
end
I extend the evaluate method over FuncWithNoWeight and FuncWithWeight in order to get the appropriate behaviour. I also nest these types within an abstract type MyFuncTypes, which is the second input to f (with a default value of FuncWithNoWeight). From here, multiple dispatch and Julia's type system take care of the rest.
One neat thing about this approach is that if you decide later on you want to add a third type of behaviour inside the loop (not necessarily even weighting, pretty much any type of transformation will be possible), it is as simple as defining a new type, nesting it under MyFuncTypes, and extending the evaluate method to the new type.
UPDATE: As Matt B. has pointed out, the first version of my answer accidentally introduced type instability into the function with my solution. As a general rule I typically find that if Matt posts something it is worth paying close attention (hint, hint, check out his answer). I'm still learning a lot about Julia (and am answering questions on StackOverflow to facilitate that learning). I've updated my answer to remove the type instability pointed out by Matt.

Can I use a subtype of a function parameter in the function definition?

I would like to use a subtype of a function parameter in my function definition. Is this possible? For example, I would like to write something like:
g{T1, T2<:T1}(x::T1, y::T2) = x + y
So that g will be defined for any x::T1 and any y that is a subtype of T1. Obviously, if I knew, for example, that T1 would always be Number, then I could write g{T<:Number}(x::Number, y::T) = x + y and this would work fine. But this question is for cases where T1 is not known until run-time.
Read on if you're wondering why I would want to do this:
A full description of what I'm trying to do would be a bit cumbersome, but what follows is a simplified example.
I have a parameterised type, and a simple method defined over that type:
type MyVectorType{T}
x::Vector{T}
end
f1!{T}(m::MyVectorType{T}, xNew::T) = (m.x[1] = xNew)
I also have another type, with an abstract super-type defined as follows
abstract MyAbstract
type MyType <: MyAbstract ; end
I create an instance of MyVectorType with vector element type set to MyAbstract using:
m1 = MyVectorType(Array(MyAbstract, 1))
I now want to place an instance of MyType in MyVectorType. I can do this, since MyType <: MyAbstract. However, I can't do this with f1!, since the function definition means that xNew must be of type T, and T will be MyAbstract, not MyType.
The two solutions I can think of to this problem are:
f2!(m::MyVectorType, xNew) = (m.x[1] = xNew)
f3!{T1, T2}(m::MyVectorType{T1}, xNew::T2) = T2 <: T1 ? (m.x[1] = xNew) : error("Oh dear!")
The first is essentially a duck-typing solution. The second performs the appropriate error check in the first step.
Which is preferred? Or is there a third, better solution I am not aware of?
The ability to define a function g{T, S<:T}(::Vector{T}, ::S) has been referred to as "triangular dispatch" as an analogy to diagonal dispatch: f{T}(::Vector{T}, ::T). (Imagine a table with a type hierarchy labelling the rows and columns, arranged such that the super types are to the top and left. The rows represent the element type of the first argument, and the columns the type of the second. Diagonal dispatch will only match the cells along the diagonal of the table, whereas triangular dispatch matches the diagonal and everything below it, forming a triangle.)
This simply isn't implemented yet. It's a complicated problem, especially once you start considering the scoping of T and S outside of function definitions and in the context of invariance. See issues #3766 and #6984 for more details.
So, practically, in this case, I think duck-typing is just fine. You're relying upon the implementation of MyVectorType to do the error checking when it assigns its elements, which it should be doing in any case.
The solution in base julia for setting elements of an array is something like this:
f!{T}(A::Vector{T}, x::T) = (A[1] = x)
f!{T}(A::Vector{T}, x) = f!(A, convert(T, x))
Note that it doesn't worry about the type hierarchy or the subtype "triangle." It just tries to convert x to T… which is a no-op if x::S, S<:T. And convert will throw an error if it cannot do the conversion or doesn't know how.
UPDATE: This is now implemented on the latest development version (0.6-dev)! In this case I think I'd still recommend using convert like I originally answered, but you can now define restrictions within the static method parameters in a left-to-right manner.
julia> f!{T1, T2<:T1}(A::Vector{T1}, x::T2) = "success!"
julia> f!(Any[1,2,3], 4.)
"success!"
julia> f!(Integer[1,2,3], 4.)
ERROR: MethodError: no method matching f!(::Array{Integer,1}, ::Float64)
Closest candidates are:
f!{T1,T2<:T1}(::Array{T1,1}, ::T2<:T1) at REPL[1]:1
julia> f!([1.,2.,3.], 4.)
"success!"

Using outer() with a multivariable function

Suppose you have a function f<- function(x,y,z) { ... }. How would you go about passing a constant to one argument, but letting the other ones vary? In other words, I would like to do something like this:
output <- outer(x,y,f(x,y,z=2))
This code doesn't evaluate, but is there a way to do this?
outer(x, y, f, z=2)
The arguments after the function are additional arguments to it, see ... in ?outer. This syntax is very common in R, the whole apply family works the same for instance.
Update:
I can't tell exactly what you want to accomplish in your follow-up question, but I think a solution of this form is probably what you should use.
outer(sigma_int, theta_int, function(s,t)
dmvnorm(y, rep(0, n), y_mat(n, lambda, t, s)))
This calculates a variance matrix for each combination of the values in sigma_int and theta_int, uses that matrix to define a density, and evaluates it at the point(s) defined in y. I haven't been able to test it, though, since I don't know the types and dimensions of the variables involved.
outer (along with the apply family of functions and others) will pass along extra arguments to the functions which they call. However, if you are dealing with a case where this is not supported (optim being one example), then you can use the more general approach of currying. To curry a function is to create a new function which has (some of) the variables fixed and therefore has fewer parameters.
library("functional")
output <- outer(x,y,Curry(f,z=2))

in python what is the difference between map(func,list) and [func(x) for x in list]

As far as I can tell, the only difference is speed, and you have to be a bit trickier in how you define lambda functions.
For instance:
map(lambda x: x + 1, range(4)) == [(lambda x: x + 1)(y) for y in range(4)]
It seems to me like the second way is more pythonic, but I am not sure why.
EDIT:
Yes I understand that the lambda would be excluded in the second example, I was just trying to show as equivalent code as possible.
The right way to do this would be
[y + 1 for y in range(4)]
No need to construct a lambda function here. Your code would unnecessarily build a new function object in every single iteration of the list comprehension.
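The point about rebuilding the function can be checked directly. This is only a small sketch, not part of the original answer: it collects the function objects produced by repeated evaluation of the lambda expression (the name funcs is made up for illustration) and confirms that the plain comprehension yields the same values.

```python
# Each evaluation of a lambda expression creates a fresh function object.
# funcs is a hypothetical name used only for this illustration.
funcs = [(lambda x: x + 1) for _ in range(3)]
distinct = len(set(funcs))  # function objects hash by identity
print(distinct)  # 3

# The plain comprehension gives the same values without building any function.
with_lambda = [(lambda x: x + 1)(y) for y in range(4)]
plain = [y + 1 for y in range(4)]
print(with_lambda == plain)  # True
```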
That said, you can write any call to map() as an equivalent list comprehension. If the first argument to map() is a lambda function, the list comprehension is usually preferred. If the first argument to map() is a function name, both variants are fine. Some people (including me) prefer, say,
map(str, my_list)
while others prefer
[str(x) for x in my_list]
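One caveat worth knowing on Python 3: map() returns a lazy iterator rather than a list, so the two spellings only compare equal once the map result has been materialised. A minimal sketch, assuming Python 3 and a made-up sample my_list:

```python
# Assumes Python 3; my_list is a hypothetical sample input.
my_list = [1, 2.5, None]

mapped = map(str, my_list)                  # lazy iterator, nothing computed yet
as_list = list(mapped)                      # materialise the results
comprehension = [str(x) for x in my_list]   # builds the list eagerly

print(as_list == comprehension)             # True
print(isinstance(map(str, my_list), list))  # False
```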
There is no difference, but the pythonic way would be to omit the lambda completely:
[y + 1 for y in range(4)]
Note also that if your mapping function is a "built-in" (written in C) function, rather than a python function or a lambda, map will be faster.
Another pythonic, but uncommon, way (avoids unnecessary lambda) would be:
map(1 .__add__, range(4)) # thanks to SvenMarnach for this
It is usually preferable to avoid lambdas in mapping forms, because a list comprehension is generally both more efficient and clearer. By contrast, using multi-line functions is perfectly acceptable: there is no way to write them inline, and even if you could, it would likely be less clear.
Another difference is that because map can take multiple sequences to map against, and passes them as positional parameters to the mapping function, one can avoid the zipping that would be required in a list comprehension:
[x+y for x,y in zip(range(4), range(2,6))]
#vs
from operator import add
map(add, range(4), range(2,6))
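The two forms above can be checked against each other. A small sketch, assuming Python 3, where the map result has to be wrapped in list() before it can be compared (the names zipped and mapped are only for illustration):

```python
from operator import add

# Explicit zip inside a list comprehension.
zipped = [x + y for x, y in zip(range(4), range(2, 6))]

# map() takes both sequences directly and passes one element of each to add().
mapped = list(map(add, range(4), range(2, 6)))

print(zipped)            # [2, 4, 6, 8]
print(mapped == zipped)  # True
```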
