I recently started learning Julia by coding a simple implementation of Self Organizing Maps. I want the size and dimensions of the map to be specified by the user, which means I can't really use for loops to work on the map arrays because I don't know in advance how many layers of loops I will need. So I absolutely need broadcasting and slicing functions that work on arrays of arbitrary dimensions.
Right now, I need to construct an array of indices of the map. Say my map is defined by an array of size mapsize = (5, 10, 15), I need to construct an array indices of size (3, 5, 10, 15) where indices[:, a, b, c] should return [a, b, c].
I come from a Python/NumPy background, in which the solution is already given by a specific "function", mgrid :
indices = numpy.mgrid[:5, :10, :15]
print(indices.shape)  # gives (3, 5, 10, 15)
print(indices[:, 1, 2, 3])  # gives [1, 2, 3]
I didn't expect Julia to have such a function from the get-go, so I turned to broadcasting. In NumPy, broadcasting is based on a set of rules that I find quite clear and logical. You can use arrays of different dimensions in the same expression as long as the sizes in each dimension match or one of them is 1:
(5, 10, 15) and (10, 1) broadcast to (5, 10, 15)
(5, 1, 15) and (1, 10, 1) also broadcast to (5, 10, 15)
To help with this, you can also use numpy.newaxis or None to easily add new dimensions to your array :
array = numpy.zeros((5, 15))
array[:,None,:].shape  # (5, 1, 15)
This helps broadcast arrays easily :
A = numpy.arange(5)
B = numpy.arange(10)
C = numpy.arange(15)
bA, bB, bC = numpy.broadcast_arrays(A[:,None,None], B[None,:,None], C[None,None,:])
bA.shape == bB.shape == bC.shape == (5, 10, 15)  # True
Using this, creating the indices array is rather straightforward :
indices = numpy.array(numpy.broadcast_arrays(A[:,None,None], B[None,:,None], C[None,None,:]))
(indices == numpy.mgrid[:5,:10,:15]).all()  # returns True
The general case is of course a bit more complicated, but can be worked around using list comprehension and slices :
arrays = [ numpy.arange(i)[tuple([None if m!=n else slice(None) for m in range(len(mapsize))])] for n, i in enumerate(mapsize) ]
indices = numpy.array(numpy.broadcast_arrays(*arrays))
So back to Julia. I tried to apply the same kind of rationale and ended up achieving the equivalent of the arrays list of the code above. This ended up being rather simpler than the NumPy counterpart thanks to the compound expression syntax :
arrays = [ (idx = ones(Int, length(mapsize)); idx[n] = i;reshape([1:i], tuple(idx...))) for (n,i)=enumerate(mapsize) ]
Now I'm stuck here, as I don't really know how to apply the broadcasting to my list of generating arrays here... The broadcast[!] functions ask for a function f to apply, and I don't have any. I tried using a for loop to try forcing the broadcasting:
indices = Array(Int, tuple(unshift!([i for i=mapsize], length(mapsize))...))
for i=1:length(mapsize)
indices[i] = arrays[i]
end
But this gives me an error : ERROR: convert has no method matching convert(::Type{Int64}, ::Array{Int64,3})
Am I doing this the right way? Did I overlook something important? Any help is appreciated.
If you're running Julia 0.4, you can do this:
julia> function mgrid(mapsize)
T = typeof(CartesianIndex(mapsize))
indices = Array(T, mapsize)
for I in eachindex(indices)
indices[I] = I
end
indices
end
It would be even nicer if one could just say
indices = [I for I in CartesianRange(CartesianIndex(mapsize))]
I'll look into that :-).
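For readers on current Julia (1.x), here is a minimal sketch of the same idea. It assumes `CartesianIndices`, which replaced `CartesianRange` in Julia 0.7/1.0, and an example `mapsize`. The comprehension over a component index `d` is just one possible way to unpack the grid into the `(3, 5, 10, 15)` layout from the question. Note that the values are 1-based, unlike `numpy.mgrid`.

```julia
mapsize = (5, 10, 15)            # example shape, as in the question
N = length(mapsize)

# An array-like grid whose entry at (a, b, c) is CartesianIndex(a, b, c)
grid = CartesianIndices(mapsize)

# Unpack into an Int array of size (N, mapsize...),
# so that indices[:, a, b, c] == [a, b, c]
indices = [I[d] for d in 1:N, I in grid]
```

In many cases `grid` itself may be all that is needed, since it can be indexed and iterated without allocating the full array.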
Broadcasting in Julia has been modelled pretty much on broadcasting in NumPy, so you should hopefully find that it obeys more or less the same simple rules (not sure if the way to pad dimensions when not all inputs have the same number of dimensions is the same though, since Julia arrays are column-major).
A number of useful things like newaxis indexing and broadcast_arrays have not been implemented (yet) however. (I hope they will.) Also note that indexing works a bit differently in Julia compared to NumPy: when you leave off indices for trailing dimensions in NumPy, the remaining indices default to colons. In Julia they could be said to default to ones instead.
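As a rough illustration on current Julia (1.x), `reshape` can stand in for `newaxis`, and dotted operators then broadcast singleton dimensions much as NumPy does. The variable names below are only for the example. The last lines show the padding difference hinted at above: Julia treats missing dimensions as trailing singletons, whereas NumPy pads on the leading side.

```julia
A = reshape(1:5, 5, 1, 1)       # plays the role of A[:, None, None]
B = reshape(1:10, 1, 10, 1)     # plays the role of B[None, :, None]
C = reshape(1:15, 1, 1, 15)     # plays the role of C[None, None, :]

# Singleton dimensions are expanded to the common shape (5, 10, 15)
S = A .+ 10 .* B .+ 100 .* C
S[2, 3, 4]                      # 2 + 30 + 400

# A plain vector lines up with the FIRST dimension of a matrix
v = 1:5
M = zeros(Int, 5, 10)
size(v .+ M)                    # (5, 10)
```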
I'm not sure you actually need a meshgrid function. Most things you would want to use it for can be done by using the original entries of your arrays array with broadcasting operations. The main reason meshgrid is useful in MATLAB is that MATLAB is terrible at broadcasting.
But it is quite straightforward to accomplish what you want to do using the broadcast! function:
# assume mapsize is a vector with the desired shape, e.g. mapsize = [2,3,4]
N = length(mapsize)
# Your line to create arrays below, with an extra initial dimension on each array
arrays = [ (idx = ones(Int, N+1); idx[n+1] = i;reshape([1:i], tuple(idx...))) for (n,i) in enumerate(mapsize) ]
# Create indices and fill it one coordinate at a time
indices = zeros(Int, tuple(N, mapsize...))
for (i,arr) in enumerate(arrays)
dest = sub(indices, i, [Colon() for j=1:N]...)
broadcast!(identity, dest, arr)
end
I had to add an initial singleton dimension on the entries of arrays to line up with the axes of indices (newaxis would have been useful here...).
Then I go through each coordinate, create a subarray (a view) on the relevant part of indices, and fill it. (Indexing will default to returning subarrays in Julia 0.4, but for now we have to use sub explicitly).
The call to broadcast! just evaluates the identity function identity(x)=x on the input arr=arrays[i], broadcasts to the shape of the output. There's no efficiency lost in using the identity function for this; broadcast! generates a specialized function based on the given function, number of arguments, and number of dimensions of the result.
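On current Julia (1.x) the same coordinate-by-coordinate fill can be sketched without `sub`, which was later replaced by `view`. The version below assumes `selectdim` (available since Julia 1.0) and an example `mapsize`. Because `selectdim` drops the leading axis, the extra singleton dimension is not needed here.

```julia
mapsize = (2, 3, 4)              # example shape
N = length(mapsize)
indices = zeros(Int, N, mapsize...)

for (n, len) in enumerate(mapsize)
    # Shape with `len` along dimension n and 1 elsewhere, e.g. (1, 3, 1) for n = 2
    shape = ntuple(d -> d == n ? len : 1, N)
    # selectdim returns a view of indices[n, :, :, ...];
    # `.=` broadcasts the reshaped range into that view in place
    selectdim(indices, 1, n) .= reshape(1:len, shape)
end
```

The dotted assignment plays the role of `broadcast!(identity, dest, arr)`.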
I guess this is the same as the MATLAB meshgrid functionality. I've never really thought about the generalization to more than two dimensions, so it's a bit harder to get my head around.
First, here is my completely general version, which is kinda crazy but I can't think of a better way to do it without generating code for common dimensions (e.g. 2, 3)
function numpy_mgridN(dims...)
X = Any[zeros(Int,dims...) for d in 1:length(dims)]
for d in 1:length(dims)
base_idx = Any[1:nd for nd in dims]
for i in 1:dims[d]
cur_idx = copy(base_idx)
cur_idx[d] = i
X[d][cur_idx...] = i
end
end
@show X
end
X = numpy_mgridN(3,4,5)
@show X[1][1,2,3] # 1
@show X[2][1,2,3] # 2
@show X[3][1,2,3] # 3
Now, what I mean by code generation is that, for the 2D case, you can simply do
function numpy_mgrid(dim1,dim2)
X = [i for i in 1:dim1, j in 1:dim2]
Y = [j for i in 1:dim1, j in 1:dim2]
return X,Y
end
and for the 3D case:
function numpy_mgrid(dim1,dim2,dim3)
X = [i for i in 1:dim1, j in 1:dim2, k in 1:dim3]
Y = [j for i in 1:dim1, j in 1:dim2, k in 1:dim3]
Z = [k for i in 1:dim1, j in 1:dim2, k in 1:dim3]
return X,Y,Z
end
Test with, e.g.
X,Y,Z=numpy_mgrid(3,4,5)
@show X
@show Y
@show Z
I guess mgrid shoves them all into one tensor, so you could do that like this
all = cat(4,X,Y,Z)
which is still slightly different:
julia> all[1,2,3,:]
1x1x1x3 Array{Int64,4}:
[:, :, 1, 1] =
1
[:, :, 1, 2] =
2
[:, :, 1, 3] =
3
julia> vec(all[1,2,3,:])
3-element Array{Int64,1}:
1
2
3
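On current Julia (1.x) two details differ from the output above: the concatenation dimension is passed as a keyword, `cat(X, Y, Z; dims = 4)` instead of `cat(4, X, Y, Z)`, and indexing with scalars drops those dimensions, so the `vec` call is no longer needed. A small sketch, assuming `CartesianIndices` to build the three coordinate arrays and the name `stacked` (chosen to avoid shadowing `Base.all`):

```julia
dims = (3, 4, 5)
grid = CartesianIndices(dims)

# One coordinate array per dimension, each of size (3, 4, 5)
X, Y, Z = ntuple(d -> [I[d] for I in grid], 3)

# Stack along a new fourth dimension
stacked = cat(X, Y, Z; dims = 4)

stacked[1, 2, 3, :]     # a plain 3-element Vector
```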
I know that in Julia, array indices begin at 1. For example:
b = Array{Float64, 1}(undef, 10)
This array b is a 1d array with 10 elements. The index of b begins from 1.
But I want an array whose indices start from 0 or any other integer. How can I do that in Julia?
Say, I want the index ranges from 0 to 9, and I tried to do things like
b = Array{Float64, 1}(undef, 0:9)
But obviously it does not work in Julia.
Can Julia easily define an array with an arbitrary index range, like Fortran?
I googled a little and it seems this is not easy to do in Julia. Am I missing something?
Is there a generic way in Julia to define an arbitrarily indexed array? Or do I have to install packages like OffsetArrays?
It doesn't seem great that Julia cannot generically define arbitrarily indexed arrays.
Thanks!
In Julia, this is provided by the OffsetArrays package. Try, for example
using OffsetArrays
A = rand(10)
OA = OffsetArray(A, 0:9)
OA[0]
then
julia> OA[0]
0.26079620656304203
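The closest equivalent to the `Array{Float64, 1}(undef, 0:9)` attempt in the question is probably the `undef` constructor of the same package. This is a small sketch that assumes OffsetArrays is installed (it is a third-party package, not part of Base):

```julia
using OffsetArrays   # third-party package; install with `] add OffsetArrays`

# Uninitialized Float64 array whose valid indices are 0 through 9
b = OffsetArray{Float64}(undef, 0:9)
fill!(b, 0.0)
b[0] = 1.5

firstindex(b), lastindex(b)   # (0, 9)

# Generic code should iterate with eachindex rather than 1:length(b)
for i in eachindex(b)
    b[i] += 1.0
end
```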
In Julia it is possible to create arrays of any size using the functions zeros(.) or ones(.). Is there a similar function to create an array that is filled with nothing at initialization but also accepts floats? I mean a function like in this example:
a = array_of_nothing(3)
# a = [nothing,nothing,nothing]
a[1] = 3.14
# a = [3.14,nothing,nothing]
I tried to find information on internet, but without success... Sorry, I am a beginner in Julia.
The fill function can be used to create arrays of arbitrary values, but it's not so easy to use here, since you want a Vector{Union{Float64, Nothing}}. Two options come to mind:
A comprehension:
a = Union{Float64, Nothing}[nothing for _ in 1:3];
a[2] = 3.14;
>> a
3-element Array{Union{Nothing, Float64},1}:
nothing
3.14
nothing
Or ordinary array initialization:
a = Vector{Union{Float64, Nothing}}(undef, 3)
fill!(a, nothing)
a[2] = 3.14
It seems that when you do Vector{Union{Float64, Nothing}}(undef, 3) the vector automatically contains nothing, but I wouldn't rely on that, so fill! may be necessary.
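A third option, assuming Julia 1.0 or later, is the `Array{T}(nothing, dims)` constructor, which initializes every slot with `nothing` as long as `T` can hold it. The sketch also shows why a plain `fill(nothing, 3)` is not enough: its element type is `Nothing`, so a float cannot be stored in it.

```julia
# Every entry is initialized to `nothing`; the element type still admits Float64
a = Vector{Union{Float64, Nothing}}(nothing, 3)
a[1] = 3.14

# By contrast, fill(nothing, 3) produces a Vector{Nothing}
b = fill(nothing, 3)
eltype(b)            # Nothing
```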
I think you are looking for the Base.fill — Function.
fill(x, dims)
This creates an array filled with value x.
println(fill("nothing", (1,3)))
You can also pass a function Foo() like fill(Foo(), dims) which will return an array filled with the result of evaluating Foo() once.
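The "evaluated once" part matters when the value is mutable: every slot then refers to the same object. A short sketch of the difference, with made-up variable names, using a comprehension to get independent values:

```julia
# Int[] is evaluated once, so all three slots hold the same vector
shared = fill(Int[], 3)
push!(shared[1], 42)
shared[2]                     # also contains 42

# The comprehension evaluates Int[] once per slot
separate = [Int[] for _ in 1:3]
push!(separate[1], 42)
isempty(separate[2])          # true
```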
I have the following structure in Julia and I create an array with it.
julia> struct myStruct
a::Int
b::Int
c::String
end
julia> myArray = myStruct.(1:10,11:20,"ABC")
10-element Array{myStruct,1}:
myStruct(1, 11, "ABC")
myStruct(2, 12, "ABC")
myStruct(3, 13, "ABC")
myStruct(4, 14, "ABC")
myStruct(5, 15, "ABC")
myStruct(6, 16, "ABC")
myStruct(7, 17, "ABC")
myStruct(8, 18, "ABC")
myStruct(9, 19, "ABC")
myStruct(10, 20, "ABC")
What shall I do in Julia to get the maximum value of a?
Is it recommended to first get a 2-column array with the first two values of the struct and then use findmax(my2colArray[:,1]) to find the maximum value?
I have three questions to understand how shall I do this:
If getting the array first is needed, how do I get efficiently that 2 column array?
If it is not needed, how would I get the maximum value of a directly from the array of structs?
The string will contain a maximum of 50 characters, and they will be ASCII (no UTF-8). Shall I fix the length of the string somehow to improve performance?
You can use the maximum function. maximum also accepts a function as its first argument, which in this case you can use to extract the a field:
julia> struct myStruct
a::Int
b::Int
c::String
end
julia> myArray = myStruct.(21:30,11:20,"ABC");
julia> val = maximum(x -> x.a, myArray)
30
(Slightly modified your example to make the maximum value and the index different).
The easiest way to get the max value of a is, as @fredrikekre writes:
maxval = maximum(x->x.a, arr)
Unfortunately, this does not give you the index of that value, which you also asked for in a comment.
Ideally, you could use argmax or findmax instead:
(maxval, maxind) = findmax(x->x.a, arr) # <= This does not work!!
Currently, at version 1.2 of Julia, this does not work.
There may be some other clever solution, but my advice is to just write a loop yourself, it's easy and educational!
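For readers on newer versions: the two-argument form `findmax(f, domain)` was added in Julia 1.7 and returns the maximal value of `f` together with the index of the element that produced it. The sketch below assumes Julia 1.7 or later for that call, and also shows a fallback that works on any 1.x version at the cost of a temporary array. The struct is re-declared under the name `MyStruct` so the snippet is self-contained.

```julia
struct MyStruct
    a::Int
    b::Int
    c::String
end

arr = MyStruct.(21:30, 11:20, "ABC")

# Julia 1.7+: value of x.a at its maximum, plus the index into arr
maxval, maxind = findmax(x -> x.a, arr)

# Any Julia 1.x: build the vector of a-values first, then call findmax on it
maxval2, maxind2 = findmax([x.a for x in arr])
```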
To address your questions:
0: (This was not a question) Remember to always name your types with UpperCamelCase: so MyStruct, not myStruct.
1: No, you don't need this, and it's not a good solution. (Also, I don't know why you want a 2-column array when you are only looking for the max of a.) But if you really want it anyway:
v = getproperty.(arr, [:a :b])
2: For the max value, see the answer by @fredrikekre; for the max index, see below.
3: No, I don't think so.
Write your own loop to get the max index and value. It's easy and fun, and you learn to write your own fast Julia code:
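function find_amax(arr::AbstractArray{MyStruct})
    # Raise an error for empty input, as maximum does
    isempty(arr) && throw(ArgumentError("reducing over an empty collection is not allowed"))
    maxind, maxval = firstindex(arr), first(arr).a
    # pairs yields the array's own linear indices, consistent with firstindex
    for (i, x) in pairs(IndexLinear(), arr)
        if x.a > maxval
            maxind, maxval = i, x.a
        end
    end
    return maxval, maxind
end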
There is a small inefficiency in the code above, the first value and index of x is read twice. If you want even faster performance, you can figure out a way to avoid that.
As for performance, this loop is about as fast as maximum(x->x.a, arr), and more than 60x as fast as building the 2-column matrix you asked for in question 1.
The main lesson is: You don't need to look for some clever "built-in" solution that you can plug your problem into. If you cannot quickly find one, just make your own, it will most likely be faster.
using ShiftedArrays
struct CircularMatrix{T} <: AbstractArray{T,2}
data::Array{T,2}
view::CircShiftedArray
currentIndex::Int
function CircularMatrix{T}(dims...) where T
data = zeros(T, dims...)
CircularMatrix(data, ShiftedArrays.circshift(data, (0, -1)), 1)
end
end
Base.size(M::CircularMatrix) = size(M.data)
Base.eltype(::Type{CircularMatrix{T}}) where {T} = T
function shift_forward!(M::CircularMatrix)
M.shift_forward!(1)
end
function shift_forward!(M::CircularMatrix, n)
# replace the view with a view shifted forwards.
M.currentIndex += n
M.view = ShiftedArrays.circshift(M.data, (n, M.currentIndex))
end
@inline Base.@propagate_inbounds Base.getindex(M::CircularMatrix, i) = M.view[i]
@inline Base.@propagate_inbounds Base.setindex!(M::CircularMatrix, data, i) = M.view[i] = data
How can I make CircularMatrix act just like a regular matrix.
So that I can access it like
m = CircularMatrix{Int}(4,4)
m[1, 1] = 5
x = view(m, 1, :)
Your matrix type is defined as a subtype of AbstractArray{T, 2}. To make the functions and features that work on AbstractArray{T, 2} also work on your custom type, you need to implement a few methods from Julia's informal array interface. That makes your CircularMatrix an iterable, indexable, fully functioning matrix.
The methods to implement are
size(M::CircularMatrix)
getindex(M::CircularMatrix, i::Int)
getindex(M::CircularMatrix, I::Vararg{Int, N})
setindex!(M::CircularMatrix, v, i::Int)
setindex!(M::CircularMatrix, v, I::Vararg{Int, N})
You already implement 1, 2 and 4 but have not yet set your indexing style. You might not need 3 and 5 if you choose linear indexing style. You only need to set IndexStyle to be IndexLinear() and maybe a few modifications, then everything should just work for your matrix.
1. size(M::CircularMatrix)
The first one is size. size(A::CircularMatrix) returns a Tuple of the dimensions of A. For your matrix it is probably something like the following:
Base.size(M::CircularMatrix) = size(M.data)
2. getindex(M::CircularMatrix, i::Int)
This method is needed if you choose linear indexing style. getindex(M, i::Int) should give you the value at linear index i. You already implement it in your code. If you choose linear indexing, you need to set IndexStyle for your type and then you simply skip 3 and 5. Julia will automatically convert multiple index accesses, e.g. a[3, 5], to a linear index access.
Base.IndexStyle(::Type{<:CircularMatrix}) = IndexLinear()
Base.@propagate_inbounds function Base.getindex(M::CircularMatrix, i::Int)
    @boundscheck checkbounds(M, i)
    @inbounds M.view[i]
end
It might be better to use @inbounds here on the second line. If the caller doesn't use @inbounds, we check the bounds first and this hopefully makes the subsequent bounds check unnecessary. You might want to omit this during development, though.
3. getindex(M::CircularMatrix, I::Vararg{Int, N})
The third one is for Cartesian indexing style. If you choose this style you need to implement this method. Vararg{Int, N} in the signature stands for "exactly N Int arguments". Here N should be equal to the dimensionality of CircularMatrix. Since this is a matrix, N should be two. If you choose this style, you need to define something like the following
Base.@propagate_inbounds function Base.getindex(A::CircularMatrix, I::Vararg{Int, 2})
    @boundscheck checkbounds(A, I...)
    @inbounds A.view[#= convert `I[1]` and `I[2]` to a linear index in `view` =#]
end
or since your dimensionality is not parametric and a matrix is 2D, simply
Base.@propagate_inbounds function Base.getindex(A::CircularMatrix, i::Int, j::Int)
    @boundscheck checkbounds(A, i, j)
    @inbounds A.view[#= convert `i` and `j` to a linear index in `view` =#]
end
4. setindex!(M::CircularMatrix, v, i::Int)
The fourth one is similar to the second. This method should set the value at linear index i, if you choose linear indexing style.
5. setindex!(M::CircularMatrix, v, I::Vararg{Int, N})
The fifth one should be similar to the third, if you choose Cartesian indexing style.
After the implementations for 1, 2, and 4 and setting IndexStyle, you should have a custom matrix type that just works.
m[1, 1] = 5
x = view(m, 1, :)
for e in m
...
end
for i in eachindex(m)
...
end
display(m)
println(m)
length(m)
ndims(m)
map(f, A)
....
These should all work.
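Putting steps 1, 2 and 4 together, here is a minimal self-contained sketch of a circularly shifted matrix with linear indexing. It does not use ShiftedArrays: the type `RingMatrix`, the helper `physical`, and the exact meaning of a "shift" (rotating the logical view by whole columns) are assumptions made for illustration, not the asker's actual design.

```julia
mutable struct RingMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}
    shift::Int      # number of columns the logical view is rotated by
    RingMatrix{T}(nrows::Int, ncols::Int) where {T} = new{T}(zeros(T, nrows, ncols), 0)
end

# 1. size
Base.size(M::RingMatrix) = size(M.data)

# Declare linear indexing, so only the single-Int methods are needed
Base.IndexStyle(::Type{<:RingMatrix}) = IndexLinear()

# Hypothetical helper: map a logical linear index to a position in `data`.
# Shifting by one column equals an offset of one column length in linear terms.
physical(M::RingMatrix, i::Int) = mod1(i + M.shift * size(M.data, 1), length(M.data))

# 2. getindex with a linear index
Base.@propagate_inbounds function Base.getindex(M::RingMatrix, i::Int)
    @boundscheck checkbounds(M, i)
    @inbounds M.data[physical(M, i)]
end

# 4. setindex! with a linear index
Base.@propagate_inbounds function Base.setindex!(M::RingMatrix, v, i::Int)
    @boundscheck checkbounds(M, i)
    @inbounds M.data[physical(M, i)] = v
    return M
end

function shift_forward!(M::RingMatrix, n::Int = 1)
    M.shift = mod(M.shift + n, size(M.data, 2))
    return M
end

m = RingMatrix{Int}(4, 4)
m[1, 1] = 5              # Cartesian access is converted to a linear index
x = view(m, 1, :)        # views work through the generic AbstractArray fallbacks
shift_forward!(m)        # the stored 5 now appears in the last logical column
```

Because the struct is declared `mutable`, `shift_forward!` can update the offset in place; an immutable struct like the one in the question would not allow that assignment.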
A few notes
The Julia manual documents the AbstractArray interface, with a few examples, in its Interfaces chapter. It also lists the optional methods you can implement.
There is a JuliaArrays organization on GitHub that provides lots of useful custom array implementations, including StaticArrays, OffsetArrays, etc., and also a JuliaMatrices organization that provides custom matrix types. You might want to take a look at their implementations.
@inline is redundant if you use Base.@propagate_inbounds. From its documentation:
@propagate_inbounds
Tells the compiler to inline a function while retaining the caller's
inbounds context.
You do not need to define eltype for your matrix, since there is already a definition for AbstractArray{T, N} which returns T.
Here's some toy code:
type MyType
x::Int
end
vec = [MyType(1), MyType(2), MyType(3), MyType(4)]
ids = [2, 1, 3, 1]
vec = vec[ids]
julia> vec
4-element Array{MyType,1}:
MyType(2)
MyType(1)
MyType(3)
MyType(1)
That looks fine, except for this behavior:
julia> vec[2].x = 60
60
julia> vec
4-element Array{MyType,1}:
MyType(2)
MyType(60)
MyType(3)
MyType(60)
I want to be able to rearrange the contents of a vector, with the possibility that I eliminate some values and duplicate others. But when I duplicate values, I don't want this copy behavior. Is there an "elegant" way to do this? Something like this works, but yeesh:
vec = [deepcopy(vec[ids[i]]) for i in 1:4]
The issue is that you're creating mutable types, and your vector therefore contains references to the instantiated data - so when you create a vector based on ids, you're creating what amounts to a vector of pointers to the structures. This further means that the elements in the vector with the same id are actually pointers to the same object.
There's no good way to do this without ensuring that your references are different. That either means 1) immutable types, which means you can't reassign x, or 2) copy/deepcopy.
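Both options can be sketched as follows on current Julia, where the old `type` keyword is spelled `mutable struct`. The type names `Item` and `FrozenItem` are made up for the example. Note that the copy has to be made element by element: `deepcopy` applied to the whole reindexed vector keeps track of object identity, so duplicated entries would still share one object.

```julia
mutable struct Item          # modern spelling of `type`
    x::Int
end

v = [Item(1), Item(2), Item(3), Item(4)]
ids = [2, 1, 3, 1]

aliased = v[ids]
aliased[2] === aliased[4]    # true: both slots refer to the same object

# Option 2: copy each element separately
independent = [deepcopy(v[i]) for i in ids]
independent[2].x = 60
independent[4].x             # still 1

# Option 1: an immutable type; replace elements instead of mutating them
struct FrozenItem
    x::Int
end

w = [FrozenItem(1), FrozenItem(2), FrozenItem(3), FrozenItem(4)][ids]
w[2] = FrozenItem(60)
w[4].x                       # still 1
```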