In-place modification/reassignment of vector in Julia without getting copies - vector

Here's some toy code:
type MyType
x::Int
end
vec = [MyType(1), MyType(2), MyType(3), MyType(4)]
ids = [2, 1, 3, 1]
vec = vec[ids]
julia> vec
4-element Array{MyType,1}:
MyType(2)
MyType(1)
MyType(3)
MyType(1)
That looks fine, except for this behavior:
julia> vec[2].x = 60
60
julia> vec
4-element Array{MyType,1}:
MyType(2)
MyType(60)
MyType(3)
MyType(60)
I want to be able to rearrange the contents of a vector, with the possibility that I eliminate some values and duplicate others. But when I duplicate values, I don't want this copy behavior. Is there an "elegant" way to do this? Something like this works, but yeesh:
vec = [deepcopy(vec[ids[i]]) for i in 1:4]

The issue is that you're creating mutable types, and your vector therefore contains references to the instantiated data - so when you create a vector based on ids, you're creating what amounts to a vector of pointers to the structures. This further means that the elements in the vector with the same id are actually pointers to the same object.
There's no good way to do this without ensuring that your references are different. That either means 1) immutable types, which means you can't reassign x, or 2) copy/deepcopy.

Related

How to define non-1 or arbitrary index array in Julia?

I know in Julia, the index of an array begin from 1. Like
b = Array{Float64, 1}(undef, 10)
This array b is a 1d array with 10 elements. The index of b begins from 1.
But, I want an array whose index is from 0 or any integer, how to do that in Julia?
Say, I want the index ranges from 0 to 9, and I tried to do things like
b = Array{Float64, 1}(undef, 0:9)
But obviously it does not work in Julia.
Can Julia easily define an array with arbitrary index range like Fortran?
I googled a little and it seems not easy to do this in Julia, am I missing something?
Is there a generic way in Julia to define arbitrary indexed array? Or do I have to install packages like OffsetArrays?
It seems just not so great that Julia cannot generically define arbitrary indexed array.
Thanks!
In Julia, this is provided by the OffsetArrays package. Try, for example
using OffsetArrays
A = rand(10)
OA = OffsetArray(A, 0:9)
OA[0]
then
julia> OA[0]
0.26079620656304203

How to concatenate two vectors in Julia?

Given two vectors a = [1, 2] and b = [3, 4], how do I obtain the concatenated vector c = [1, 2, 3, 4]? It seems that hcat or vcat could be used as they work on arrays, but when using vectors to store collections of elements it seems unfitting to first think about the orientation of the data; it's just supposed to be a list of values.
You can write
[a; b]
Under the hood this is the same as vcat, but it's terser, looks better, and is easier to remember, as it's also consistent with literal matrix construction syntax.
An alternative for concatenating multiple vectors is
reduce(vcat, (a, b))
Most Array methods treat arrays as general "tensors" of arbitrary ranks ("data cubes"), so you do need to think about the orientation. In the general case, there's cat(a, b; dims), of which hcat and vcat are special cases.
There is another class of methods treating Vectors as list like. From those, append! is the method that, well, appends a vector to another. The problem is that it is mutable. So you can, for example, append!(copy(a), b), or use something like BangBang.NoBang.append (which just selects the right method internally, though).
For the case of more than two vectors to be concatenated, I like the pattern of
reduce(append!, (a, b), init=Int[])

The use of map when the function that map is used on has arrays of inputs

Julia's "higher-order" function "map" looks very useful. But while it is easy to understand how it can be used on functions that have one input, it is not obvious how map can be used when the function has multiple inputs, and when each these may be arrays. I would like discover how map is used in that situation.
Suppose I have the following function:
function randomSample(items, weights)
sample(items, Weights(weights))
end
Example:
Pkg.add("StatsBase")
using StatsBase
randomSample([1,0],[0.5, 0.5])
How can map be used here? I have tried something like:
items = [1 0;1 0;1 0]
weights = [1 0;0.5 0.5;0.75 0.25]
map(randomSample(items,weights))
In the example above, I would expect Julia to output a 3 by 1 array of integers (from the items), each row being either 0 or 1 depending on the corresponding weights.
In your case when items and weights are Matrix you can use the eachrow function like this:
map(randomSample, eachrow(items), eachrow(weights))
If you are on Julia version earlier than 1.1 you can write:
map(i -> randomSample(items[i, :], weights[i, :]), axes(items, 1))
or
map(i -> randomSample(view(items,i, :), view(weights, i, :)), axes(items, 1))
(the latter avoids allocations)
However, in practice I would probably define items and weights as vectors of vectors:
items = [[1, 0],[1, 0],[1, 0]]
weights = [[1, 0], [0.5, 0.5], [0.75, 0.25]]
and then you can simply write:
map(randomSample, items, weights)
or
randomSample.(items, weights)
The reason for my preference is the following:
it is conceptually clearer what is the structure of your data
vector of vectors is easier to mutate (e.g. you can push! a new entry at the end)
vector of vectors can be ragged if needed
in some cases it might be a bit faster (iterating by rows in Julia is not optimal as it uses column-major indexing; of course you can fix it in your Matrix approach by assuming that you store your data columnwise not colwise as you currently do)
(this is not a very strong preference and you can probably choose whatever is more convenient to you)

select string array values based on another array

I can select data that is equal to a value with
data = rand(1:3, 10)
value = 2
data .== value
or equal to a list of values with
values = [1, 2]
in.(data, (values,))
The last one is generic and also works for a scalar: in.(data, (value, )) .
However, this works for Int, but the generic does not work for String values:
data = rand(["A", "B", "C"], 10)
value = "B"
data .== value
values = ["A","B"]
in.(data, (values, ))
in.(data, (value, ))
ERROR: use occursin(x, y) for string containment
Is there a generic way for Strings?
For a generic val input I'm now writing the following, but I feel there must be a better solution.
isa(val, AbstractArray) ? in.(data, (val,)) : data .== val
Background: I'm creating a function to select rows from a dataframe (and do something with them) but I want to allow for both a list of values as well as a single value.
Here is a trick that is worth knowing:
[x;]
Now - if x is an array it will remain an array. If x is a scalar it will become a 1-element array. And this is exactly what you need.
So you can write
in.(data, ([val;],))
The drawback is that it allocates a new array, but I guess that val is small and it is not used in performance critical code? If the code is performance critical I think it is better to treat scalars and arrays by separate branches.

Slicing and broadcasting multidimensional arrays in Julia : meshgrid example

I recently started learning Julia by coding a simple implementation of Self Organizing Maps. I want the size and dimensions of the map to be specified by the user, which means I can't really use for loops to work on the map arrays because I don't know in advance how many layers of loops I will need. So I absolutely need broadcasting and slicing functions that work on arrays of arbitrary dimensions.
Right now, I need to construct an array of indices of the map. Say my map is defined by an array of size mapsize = (5, 10, 15), I need to construct an array indices of size (3, 5, 10, 15) where indices[:, a, b, c] should return [a, b, c].
I come from a Python/NumPy background, in which the solution is already given by a specific "function", mgrid :
indices = numpy.mgrid[:5, :10, :15]
print indices.shape # gives (3, 5, 10, 15)
print indices[:, 1, 2, 3] gives [1, 2, 3]
I didn't expect Julia to have such a function on the get-go, so I turned to broadcasting. In NumPy, broadcasting is based on a set of rules that I find quite clear and logical. You can use arrays of different dimensions in the same expression as long as the sizes in each dimension match or one of it is 1 :
(5, 10, 15) broadcasts to (5, 10, 15)
(10, 1)
(5, 1, 15) also broadcasts to (5, 10, 15)
(1, 10, 1)
To help with this, you can also use numpy.newaxis or None to easily add new dimensions to your array :
array = numpy.zeros((5, 15))
array[:,None,:] has shape (5, 1, 15)
This helps broadcast arrays easily :
A = numpy.arange(5)
B = numpy.arange(10)
C = numpy.arange(15)
bA, bB, bC = numpy.broadcast_arrays(A[:,None,None], B[None,:,None], C[None,None,:])
bA.shape == bB.shape == bC.shape = (5, 10, 15)
Using this, creating the indices array is rather straightforward :
indices = numpy.array(numpy.broadcast_arrays(A[:,None,None], B[None,:,None], C[None,None,:]))
(indices == numpy.mgrid[:5,:10,:15]).all() returns True
The general case is of course a bit more complicated, but can be worked around using list comprehension and slices :
arrays = [ numpy.arange(i)[tuple([None if m!=n else slice(None) for m in range(len(mapsize))])] for n, i in enumerate(mapsize) ]
indices = numpy.array(numpy.broadcast_arrays(*arrays))
So back to Julia. I tried to apply the same kind of rationale and ended up achieving the equivalent of the arrays list of the code above. This ended up being rather simpler than the NumPy counterpart thanks to the compound expression syntax :
arrays = [ (idx = ones(Int, length(mapsize)); idx[n] = i;reshape([1:i], tuple(idx...))) for (n,i)=enumerate(mapsize) ]
Now I'm stuck here, as I don't really know how to apply the broadcasting to my list of generating arrays here... The broadcast[!] functions ask for a function f to apply, and I don't have any. I tried using a for loop to try forcing the broadcasting:
indices = Array(Int, tuple(unshift!([i for i=mapsize], length(mapsize))...))
for i=1:length(mapsize)
A[i] = arrays[i]
end
But this gives me an error : ERROR: convert has no method matching convert(::Type{Int64}, ::Array{Int64,3})
Am I doing this the right way? Did I overlook something important? Any help is appreciated.
If you're running julia 0.4, you can do this:
julia> function mgrid(mapsize)
T = typeof(CartesianIndex(mapsize))
indices = Array(T, mapsize)
for I in eachindex(indices)
indices[I] = I
end
indices
end
It would be even nicer if one could just say
indices = [I for I in CartesianRange(CartesianIndex(mapsize))]
I'll look into that :-).
Broadcasting in Julia has been modelled pretty much on broadcasting in NumPy, so you should hopefully find that it obeys more or less the same simple rules (not sure if the way to pad dimensions when not all inputs have the same number of dimensions is the same though, since Julia arrays are column-major).
A number of useful things like newaxis indexing and broadcast_arrays have not been implemented (yet) however. (I hope they will.) Also note that indexing works a bit differently in Julia compared to NumPy: when you leave off indices for trailing dimensions in NumPy, the remaining indices default to colons. In Julia they could be said to default to ones instead.
I'm not sure if you actually need a meshgrid function, most things that you would want to use it for could be done by using the original entries of your arrays array with broadcasting operations. The major reason that meshgrid is useful in matlab is because it is terrible at broadcasting.
But it is quite straightforward to accomplish what you want to do using the broadcast! function:
# assume mapsize is a vector with the desired shape, e.g. mapsize = [2,3,4]
N = length(mapsize)
# Your line to create arrays below, with an extra initial dimension on each array
arrays = [ (idx = ones(Int, N+1); idx[n+1] = i;reshape([1:i], tuple(idx...))) for (n,i) in enumerate(mapsize) ]
# Create indices and fill it one coordinate at a time
indices = zeros(Int, tuple(N, mapsize...))
for (i,arr) in enumerate(arrays)
dest = sub(indices, i, [Colon() for j=1:N]...)
broadcast!(identity, dest, arr)
end
I had to add an initial singleton dimension on the entries of arrays to line up with the axes of indices (newaxis had been useful here...).
Then I go through each coordinate, create a subarray (a view) on the relevant part of indices, and fill it. (Indexing will default to returning subarrays in Julia 0.4, but for now we have to use sub explicitly).
The call to broadcast! just evaluates the identity function identity(x)=x on the input arr=arrays[i], broadcasts to the shape of the output. There's no efficiency lost in using the identity function for this; broadcast! generates a specialized function based on the given function, number of arguments, and number of dimensions of the result.
I guess this is the same as the MATLAB meshgrid functionality. I've never really thought about the generalization to more than two dimensions, so its a bit harder to get my head around.
First, here is my completely general version, which is kinda crazy but I can't think of a better way to do it without generating code for common dimensions (e.g. 2, 3)
function numpy_mgridN(dims...)
X = Any[zeros(Int,dims...) for d in 1:length(dims)]
for d in 1:length(dims)
base_idx = Any[1:nd for nd in dims]
for i in 1:dims[d]
cur_idx = copy(base_idx)
cur_idx[d] = i
X[d][cur_idx...] = i
end
end
#show X
end
X = numpy_mgridN(3,4,5)
#show X[1][1,2,3] # 1
#show X[2][1,2,3] # 2
#show X[3][1,2,3] # 3
Now, what I mean by code generation is that, for the 2D case, you can simply do
function numpy_mgrid(dim1,dim2)
X = [i for i in 1:dim1, j in 1:dim2]
Y = [j for i in 1:dim1, j in 1:dim2]
return X,Y
end
and for the 3D case:
function numpy_mgrid(dim1,dim2,dim3)
X = [i for i in 1:dim1, j in 1:dim2, k in 1:dim3]
Y = [j for i in 1:dim1, j in 1:dim2, k in 1:dim3]
Z = [k for i in 1:dim1, j in 1:dim2, k in 1:dim3]
return X,Y,Z
end
Test with, e.g.
X,Y,Z=numpy_mgrid(3,4,5)
#show X
#show Y
#show Z
I guess mgrid shoves them all into one tensor, so you could do that like this
all = cat(4,X,Y,Z)
which is still slightly different:
julia> all[1,2,3,:]
1x1x1x3 Array{Int64,4}:
[:, :, 1, 1] =
1
[:, :, 1, 2] =
2
[:, :, 1, 3] =
3
julia> vec(all[1,2,3,:])
3-element Array{Int64,1}:
1
2
3

Resources