Can broadcast be applied to subarrays/slices of an array in Julia?

I would like to broadcast over subarrays (i.e. broadcast over slices of an array). This is useful in GPU programming. For example, I'd like to write:
X,Y,Z = (rand(3,3,3) for _=1:3)
@.[1,2] X = f(2X^2 + 6X^3 - sqrt(X)) + Y*Z
where @.[1,2] means broadcasting along dim 3, i.e. apply colons to dims 1 and 2 in the expression.
Is there a way to support this "sub-broadcast"?
Edit: add an example
julia> a = reshape(1:8, (2,2,2))
2×2×2 Base.ReshapedArray{Int64,3,UnitRange{Int64},Tuple{}}:
[:, :, 1] =
1 3
2 4
[:, :, 2] =
5 7
6 8
julia> broadcast(*, a, a)
2×2×2 Array{Int64,3}:
[:, :, 1] =
1 9
4 16
[:, :, 2] =
25 49
36 64
julia> broadcast(*, a, a, dim=3) # I would like to broadcast the matrix multiplication (batch mode) instead of elementwise multiplication.
2×2×2 Array{Int64,3}:
[:, :, 1] =
7 15
10 22
[:, :, 2] =
67 91
78 106
Edit 2: I am trying the different vectorization methods described at https://arrayfire.com/introduction-to-vectorization/ via the ArrayFire.jl package (a wrapper of ArrayFire), including vectorization, parallel for-loops, batching, and advanced vectorization. ArrayFire has the gfor construct (http://arrayfire.org/docs/page_gfor.htm) for running parallel computations on slices of matrices, and it is implemented via broadcast in ArrayFire.jl. Currently, Julia's broadcast acts element-wise. I wonder whether it could act "slice-wise", which would allow pure-Julia 3D and 4D support for linear algebra functions (https://github.com/arrayfire/arrayfire/issues/483).
Of course, ordinary nested for loops will get the job done. I am just excited about the broadcast . syntax and wonder whether it can be extended.

I think you're looking for mapslices.
mapslices(x->x*x, a, (1,2))
2×2×2 Array{Int64,3}:
[:, :, 1] =
7 15
10 22
[:, :, 2] =
67 91
78 106
mapslices(f, A, dims)
Transform the given dimensions of array A using function f. f is
called on each slice of A of the form A[...,:,...,:,...]. dims is an
integer vector specifying where the colons go in this expression.
The results are concatenated along the remaining dimensions. For
example, if dims is [1,2] and A is 4-dimensional, f is called on
A[:,:,i,j] for all i and j.
Use setdiff if you want to specify which dimension to concatenate along instead of on which to apply the function.
(If you need a multi-argument version check out this gist https://gist.github.com/alexmorley/e585df0d8d857d7c9e4a5af75df43d00)
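For readers on Julia 1.x, here is a minimal sketch of the same idea. It assumes a recent Julia version, where dims is passed to mapslices as a keyword argument rather than positionally. The helper batched_mul is a hypothetical name introduced only for illustration (it is not the code from the gist above), and it shows one plain-loop way to handle the two-argument case.

```julia
# Same data as in the question, materialized as a plain Array.
a = collect(reshape(1:8, 2, 2, 2))

# Julia 1.x syntax: dims is a keyword argument.
sq = mapslices(x -> x * x, a; dims=(1, 2))

# Hypothetical helper: multiply matching 2-D slices of A and B along dim 3.
function batched_mul(A, B)
    C = similar(A, size(A, 1), size(B, 2), size(A, 3))
    for k in axes(A, 3)
        C[:, :, k] = A[:, :, k] * B[:, :, k]
    end
    return C
end

batched_mul(a, a) == sq   # both compute the slice-wise matrix product
```

The loop version allocates a temporary for each slice, so it is meant as a readable baseline and not as a tuned implementation.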

Related

Get only elements of one array that are in another array

I'm learning Julia coming from Python. I want to get the elements of an array b such that each element is in array a. My attempt in Julia is shown after doing what I need in python. My question is this: is there a better/faster way to do this in Julia? I'm suspicious about the simplicity of what I've written in Julia, and I worry that such a naive looking solution might have suboptimal performance (again coming from Python).
Python:
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([7, 8, 2, 3, 5])
indices_b_in_a = np.nonzero(np.isin(b, a))
b_in_a = b[indices_b_in_a]
# array([2, 3])
Julia:
a = [1, 2, 3, 4];
b = [7, 8, 2, 3, 5];
indices_b_in_a = findall(ele -> ele in a, b);
b_in_a = b[indices_b_in_a];
#2-element Vector{Int64}:
# 2
# 3
Maybe this would be a helpful answer:
julia> intersect(Set(a), Set(b))
Set{Int64} with 2 elements:
2
3
# Or even
julia> intersect(a, b)
2-element Vector{Int64}:
2
3
Note that if you have repeated values, this method does not exactly replicate your expected behavior, since intersect works on unique values. With repeated elements you need an element-by-element search instead; in that case, a binary search (on a sorted copy of a) would be a good choice.
Another approach is using broadcasting in Julia:
julia> a = rand(1:100, 1000);
b = rand(1:3000, 5000);
julia> b[in.(b, Ref(a))]
161-element Vector{Int64}:
8
5
70
73
⋮
# Exactly the same approach with a slightly different syntax
julia> b[b.∈Ref(a)]
161-element Vector{Int64}:
8
5
70
73
30
63
73
⋮
Q: What is the role of Ref in the above code block?
Ans: Wrapping a in Ref creates a reference to a, so broadcasting treats a as a single value instead of iterating over it. Otherwise, broadcast would try to iterate over the elements of a and b simultaneously, which is not the right solution (even if both objects have the same length).
Julia's syntax has its own conventions, but it's not that complicated. I say this because you mentioned:
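As a small illustration of the answer above, the following sketch contrasts broadcasting with and without Ref. The variable names mask and paired are just for this example.

```julia
a = [1, 2, 3, 4]
b = [7, 8, 2, 3, 5]

# With Ref, broadcast treats a as one value: each element of b is tested
# against the whole of a.
mask = in.(b, Ref(a))
b_in_a = b[mask]

# Without Ref, broadcast pairs the arguments element by element.
# Here it evaluates in(1, 1) and in(2, 5).
paired = in.([1, 2], [1, 5])

# in.(b, a) would throw a DimensionMismatch, because b has 5 elements
# and a has 4.
```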
I worry that such a naive looking solution...
Last but not least, do not forget to wrap your code in a function if you want to obtain a good performance in Julia.
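Putting the advice about functions and repeated values together, one possible sketch is shown below. The function name elements_in is hypothetical, and using a Set for the lookup is a swapped-in technique (hash lookup rather than the binary search mentioned earlier). Unlike intersect, it keeps the order and the duplicates of b.

```julia
# Hypothetical helper: elements of b that also occur in a,
# in the order (and with the multiplicity) in which they appear in b.
function elements_in(b, a)
    lookup = Set(a)                      # hash-based membership test
    return filter(x -> x in lookup, b)
end

a = [1, 2, 3, 4]
b = [7, 8, 2, 3, 5, 2]
elements_in(b, a)   # keeps the second 2, unlike intersect(a, b)
```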
Another approach using array comprehensions.
julia> [i for i in a for j in b if i == j]
2-element Vector{Int64}:
2
3

What is the correct way to select rows from matrix by a boolean array?

I have a boolean array (from previous computations) and I would like to select the corresponding rows from several matrices. That is why I need the proper index array (to be reused later). This is easy in Matlab and Python, but I do not grok the correct Julian way of doing it...
I am aware of DataFrames, but would like to find an orthodox matrix and array way of doing this.
In Matlab I would say:
n= 9; temp= 1:n; A= 1.0 + temp;
someTest= mod(temp,2) == 0; % just a substitute of a more complex case
% now I have both someTest and A!
inds= find(someTest); Anew= A(inds,:);
% I got inds (which I need)!
What I have got working is this:
n= 10; data= Array(1:n); A= 1.0 .+ data;
someTest= rem.(data,2) .== 0;
inds= [xy[2] for xy in zip(someTest,1:length(someTest)) if xy[1]]; # (*)
Anew= A[inds,:];
I assume there is some shorter way to express the line marked (*) above. In v0.6 there was a find() function, but I have not yet gotten a good sense of the Julia documentation (I am very much a newbie in this).
You can use the BitArray just directly to select the elements:
julia> A[someTest]
5-element Array{Float64,1}:
3.0
5.0
7.0
9.0
11.0
For your case:
julia> A[someTest,:] == A[inds,:]
true
find in 0.6 was renamed to findall in Julia 1.0.
To get inds, you can simply do the following:
inds = findall(someTest)
You do not have to compute the intermediate someTest first, which would allocate an array you do not intend to use. Instead, you can do the test with findall directly passing a predicate function.
inds = findall(x -> rem(x,2) == 0, data)
This will return indices of data for which the predicate rem(x,2) == 0 returns true. This will not allocate an intermediate array to find the indices, and should be faster.
As a side note, most of the time you do not need to materialize a range in Julia. Ranges are already iterable and indexable. They will automatically be converted to an Array when there is a need. Array(1:n) and collect(1:n) are usually redundant and allocate more memory.
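The points above can be sketched together as follows. The matrix M is an assumption added for illustration (the question only mentions "several matrices"); the other names mirror the question.

```julia
data = 1:10                       # a range; no Array(1:n) needed
A = 1.0 .+ data

inds = findall(iseven, data)      # integer indices, no intermediate mask
mask = iseven.(data)              # logical mask, for comparison

A[inds]                           # same elements as A[mask]

# The same indices can be reused on a matrix (M is a made-up example).
M = data .* (1:4)'
M[inds, :] == M[mask, :]
```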
Your Matlab code doesn't work. A is just a row-vector (1x9 matrix), so when you try to do A(inds, :) you get an error:
>> Anew= A(inds,:)
Index in position 1 exceeds array bounds
(must not exceed 1).
But if you just fix that, you can solve the problem in exactly the same way in both Matlab and Julia, using either logical indices or regular ones:
Matlab (I'm making sure it's a matrix this time):
n = 9;
temp = (1:n).';
A = temp * (1:4);
inds = mod(temp,2) == 0;
>> A(inds, :) % using logical indices
ans =
2 4 6 8
4 8 12 16
6 12 18 24
8 16 24 32
>> A(find(inds), :) % using regular indices
ans =
2 4 6 8
4 8 12 16
6 12 18 24
8 16 24 32
And now, Julia:
n = 9;
temp = 1:n;
A = temp .* (1:4)'; # notice that we're transposing the opposite vector from Matlab
inds = mod.(temp, 2) .== 0; # you can use iseven.(temp) instead
julia> A[inds, :] # logical indices (BitArray)
4×4 Array{Int64,2}:
2 4 6 8
4 8 12 16
6 12 18 24
8 16 24 32
julia> A[findall(inds), :] # regular integer indices
4×4 Array{Int64,2}:
2 4 6 8
4 8 12 16
6 12 18 24
8 16 24 32
In this case, I would use the logical indices in both Julia and Matlab. In fact, the Matlab linter (in the editor) will tell you that you should use logical indices here because it's faster. In Julia, however, there might be cases where it's more efficient to use inds = findall(iseven, temp) and skip the logical BitArray altogether, as @hckr says.

Utilizing ndgrid/meshgrid functionality in Julia

I'm trying to find functionality in Julia similar to MATLAB's meshgrid or ndgrid. I know Julia has defined ndgrid in the examples but when I try to use it I get the following error.
UndefVarError: ndgrid not defined
Anyone know either how to get the builtin ndgrid function to work or possibly another function I haven't found or library that provides these methods (the builtin function would be preferred)? I'd rather not write my own in this case.
Thanks!
We prefer to avoid these functions, since they allocate arrays that usually aren't necessary. The values in these arrays have such a regular structure that they don't need to be stored; they can just be computed during iteration. For example, one alternative approach is to write an array comprehension:
julia> [ 10i + j for i=1:5, j=1:5 ]
5×5 Array{Int64,2}:
11 12 13 14 15
21 22 23 24 25
31 32 33 34 35
41 42 43 44 45
51 52 53 54 55
Or, you can write for loops, or iterate over a product iterator:
julia> collect(Iterators.product(1:2, 3:4))
2×2 Array{Tuple{Int64,Int64},2}:
(1, 3) (1, 4)
(2, 3) (2, 4)
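As a minimal sketch of the "compute during iteration" idea, the loop below visits every grid point through a product iterator without storing any grid array. The function name grid_sum is hypothetical.

```julia
# Accumulate a value over all grid points without materializing the grid.
function grid_sum(xs, ys)
    total = 0
    for (i, j) in Iterators.product(xs, ys)
        total += 10i + j
    end
    return total
end

grid_sum(1:5, 1:5)   # same total as summing the 5×5 comprehension above
```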
I do find sometimes it's convenient to use some function like meshgrid in numpy. It's easy to do it with list comprehension:
function meshgrid(x, y)
X = [i for i in x, j in 1:length(y)]
Y = [j for i in 1:length(x), j in y]
return X, Y
end
e.g.
x = 1:4
y = 1:3
X, Y = meshgrid(x, y)
now
julia> X
4×3 Array{Int64,2}:
1 1 1
2 2 2
3 3 3
4 4 4
julia> Y
4×3 Array{Int64,2}:
1 2 3
1 2 3
1 2 3
1 2 3
However, I did not find this makes the code run faster than using iteration. Here's what I mean:
After defining
x = 1:1000
y = x
X, Y = meshgrid(x, y)
I benchmarked the following two functions:
using BenchmarkTools # provides @btime
using Statistics
function fun1()
return mean(sqrt.(X.*X + Y.*Y))
end
function fun2()
sum = 0.0
for i in 1:1000
for j in 1:1000
sum += sqrt(i*i + j*j)
end
end
return sum / (1000*1000)
end
Here are the benchmark results:
julia> @btime fun1()
8.310 ms (19 allocations: 30.52 MiB)
julia> @btime fun2()
1.671 ms (0 allocations: 0 bytes)
The meshgrid method is both significantly slower and uses more memory. Does any Julia expert know why? I understand that Julia is a compiled language, unlike Python, so iteration won't be slower than vectorization, but I don't understand why the vector (array) calculation is many times slower than iteration. (For bigger N this difference is even larger.)
Edit: After reading this post, I have the following updated version of the 'meshgrid' method. The idea is to not create a meshgrid beforehand, but to do it in the calculation via Julia's powerful elementwise array operation:
x = collect(1:1000)
y = x'
function fun1v2()
mean(sqrt.(x .* x .+ y .* y))
end
The trick here is the .+ between a size-M column array and a size-N row array, which returns an M-by-N array. It does the 'meshgrid' for you. This function is nearly 3 times faster than fun1, albeit not as fast as fun2.
julia> @btime fun1v2()
3.189 ms (24 allocations: 7.63 MiB)
765.8435104896155
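One further variant, offered only as a sketch: pass the ranges as function arguments (so the function does not read global variables) and take the mean over a generator, so that no M-by-N array is allocated. The name mean_norm is hypothetical, and no timing is claimed here.

```julia
using Statistics

# Mean of sqrt(i^2 + j^2) over all (i, j) pairs, computed lazily.
mean_norm(x, y) = mean(sqrt(i * i + j * j) for i in x, j in y)

mean_norm(1:1000, 1:1000)
```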
Above, @ChrisRackauckas suggests that the "proper way" to do this is with a lazy operator, but he hadn't gotten around to it.
There is now a registered package with a lazy ndgrid in it:
https://github.com/JuliaArrays/LazyGrids.jl
It is more general than the version in
VectorizedRoutines.jl
because it can handle vectors with different types, e.g.,
ndgrid(1:3, Float16.(0:2), ["x", "y", "z"]).
There are Literate.jl examples in the docs that show the lazy performance is pretty good.
Of course lazy meshgrid is just one step away:
meshgrid(y,x) = (ndgrid_lazy(x,y)[[2,1]]...,)

How do I add a dimension to an array? (opposite of `squeeze`)

I can never remember how to do this.
How can I go
from a Vector (size (n1)) to a Column Matrix (size (n1,1))?
or from a Matrix (size (n1,n2)) to a Array{T,3} (size (n1,n2,1))?
or from a Array{T,3} (size (n1,n2,n3)) to a Array{T,4} (size (n1,n2,n3, 1))?
and so forth.
I want to know how to take an Array and use it to define a new Array with an extra trailing singleton dimension.
I.e. the opposite of squeeze
You can do this with reshape.
You could define a method for this:
add_dim(x::Array) = reshape(x, (size(x)...,1))
julia> add_dim([3;4])
2×1 Array{Int64,2}:
3
4
julia> add_dim([3 30;4 40])
2×2×1 Array{Int64,3}:
[:, :, 1] =
3 30
4 40
julia> add_dim(rand(4,3,2))
4×3×2×1 Array{Float64,4}:
[:, :, 1, 1] =
0.483307 0.826342 0.570934
0.134225 0.596728 0.332433
0.597895 0.298937 0.897801
0.926638 0.0872589 0.454238
[:, :, 2, 1] =
0.531954 0.239571 0.381628
0.589884 0.666565 0.676586
0.842381 0.474274 0.366049
0.409838 0.567561 0.509187
Another easy way other than reshaping to an exact shape, is to use cat and ndims together. This has the added benefit that you can specify "how many extra (singleton) dimensions you would like to add". e.g.
a = [1 2 3; 2 3 4];
cat(ndims(a) + 0, a) # add zero singleton dimensions (i.e. stays the same)
cat(ndims(a) + 1, a) # add one singleton dimension
cat(ndims(a) + 2, a) # add two singleton dimensions
etc.
UPDATE (Julia 1.x). The syntax for cat has changed from cat(dims, A...) to cat(A...; dims=dims); the positional form was deprecated in Julia 0.7.
Therefore the above example would become:
a = [1 2 3; 2 3 4];
cat(a; dims = ndims(a) + 0 )
cat(a; dims = ndims(a) + 1 )
cat(a; dims = ndims(a) + 2 )
etc.
Obviously, like Dan points out below, this has the advantage that it's nice and clean, but it comes at the cost of allocation, so if speed is your top priority and you know what you're doing, then in-place reshape operations will be faster and are to be preferred.
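The allocation difference can be seen in a small sketch (variable names r and c are just for this example): the reshaped array shares memory with the original, while the result of cat is an independent copy.

```julia
a = [1 2 3; 2 3 4]

r = reshape(a, size(a)..., 1)      # shares memory with a
c = cat(a; dims = ndims(a) + 1)    # allocates a new array

r[1, 1, 1] = 100                   # also changes a[1, 1]

a[1, 1]      # now 100, because r and a share data
c[1, 1, 1]   # still 1, because c is a copy
```

Which behavior is preferable depends on whether you want later writes to show up in the original array.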
Some time before the Julia 1.0 release, a reshape(x, Val(N)) method was added which, for N > ndims(x), adds rightmost singleton dimensions.
So the following works:
julia> add_dim(x::Array{T, N}) where {T,N} = reshape(x, Val(N+1))
add_dim (generic function with 1 method)
julia> add_dim([3;4])
2×1 Array{Int64,2}:
3
4
julia> add_dim([3 30;4 40])
2×2×1 Array{Int64,3}:
[:, :, 1] =
3 30
4 40
julia> add_dim(rand(4,3,2))
4×3×2×1 Array{Float64,4}:
[:, :, 1, 1] =
0.0737563 0.224937 0.6996
0.523615 0.181508 0.903252
0.224004 0.583018 0.400629
0.882174 0.30746 0.176758
[:, :, 2, 1] =
0.694545 0.164272 0.537413
0.221654 0.202876 0.219014
0.418148 0.0637024 0.951688
0.254818 0.624516 0.935076
Try this
function extend_dims(A,which_dim)
s = [size(A)...]
insert!(s,which_dim,1)
return reshape(A, s...)
end
The argument which_dim specifies the position at which the new singleton dimension is inserted.
Thus
extend_dims(randn(3,3),1)
will produce a 1 x 3 x 3 array and so on.
I find this utility helpful when passing data into convolutional neural networks.
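A variant of the same idea, as a sketch only: it builds the new shape from tuple slices instead of a mutable vector and accepts any AbstractArray. The name insert_dim is hypothetical.

```julia
# Hypothetical helper: insert a singleton dimension at position d.
function insert_dim(A::AbstractArray, d::Integer)
    sz = size(A)
    return reshape(A, sz[1:d-1]..., 1, sz[d:end]...)
end

A = randn(3, 3)
size(insert_dim(A, 1))   # (1, 3, 3)
size(insert_dim(A, 3))   # (3, 3, 1)
```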

Element wise multiplication of a matrix and a vector?

Is there a built-in function in Octave to multiply each column of an m x n matrix element-wise with a column vector of size m that is more efficient than using a loop?
You can replicate the vector as many times as you need to turn it into an m x n matrix as well, and then use the built-in element-wise multiplication operator .*:
>> A = [1 2; 3 4; 5 6];
>> B = [1; 2; 3];
>> A .* repmat(B, 1, columns(A))
ans =
1 2
6 8
15 18
I haven't tried Anna Lear's answer, but as nobar commented on that answer, Octave now does broadcasting, so you can just write A.*B. You will get a warning saying that automatic broadcasting is being applied:
>> A.*B
warning: product: automatic broadcasting operation applied
ans =
1 2
6 8
15 18
