Julia - combining vectors into a matrix

Let's assume I have two vectors x = [1, 2] and y = [3, 4]. What is the best way to combine them into a matrix m = [1 2; 3 4] in the Julia programming language? Thanks in advance for your support.

Note that in vcat(x', y') the operation x' is the adjoint, so it should not be used if you are working with complex numbers (it conjugates them) or with vector elements that do not have an adjoint defined (e.g. strings). In those cases permutedims should be used instead, but it will be slower as it allocates. A third way to do it (admittedly more cumbersome to type) is:
julia> [reshape(x, 1, :); reshape(y, 1, :)]
2×2 Array{Int64,2}:
1 2
3 4
Like [x'; y'], the reshape calls do not copy the data of x and y, but they do not apply a recursive adjoint.
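To make the difference concrete, here is a small sketch (the vectors z, w, s and t are made up for illustration) showing that the adjoint conjugates complex entries, while permutedims and reshape leave them alone, and that permutedims also works for strings:

```julia
# Illustrative data, not from the original question.
z = [1 + 2im, 3 - 1im]
w = [0 + 1im, 2 + 0im]

# Adjoint: each entry is conjugated when the rows are stacked.
conj_rows = vcat(z', w')

# permutedims and reshape only change the shape, not the values.
plain_rows = vcat(permutedims(z), permutedims(w))
reshaped_rows = [reshape(z, 1, :); reshape(w, 1, :)]

# Strings have no adjoint, but permutedims does not touch the elements.
s = ["a", "b"]
t = ["c", "d"]
str_rows = vcat(permutedims(s), permutedims(t))
```

For real-valued numeric vectors all three approaches give the same matrix, so the choice only matters for complex or non-numeric element types.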
EDIT:
Note for Cameron:
julia> x = repeat(string.('a':'z'), 10^6);
julia> @btime $x';
1.199 ns (0 allocations: 0 bytes)
julia> @btime reshape($x, 1, :);
36.455 ns (2 allocations: 96 bytes)
so reshape allocates, but only minimally (it needs to create an array object, while x' creates an immutable struct, which does not require allocation).
Also, I think it was a design decision to allocate. For isbitsunion element types reshape actually returns a struct, so it does not allocate (similarly to ranges):
julia> @btime reshape($x, 1, :)
12.211 ns (0 allocations: 0 bytes)
1×2 reshape(::Array{Union{Missing, Int64},1}, 1, 2) with eltype Union{Missing, Int64}:
1 missing

Two ways I know of:
julia> x = [1,2];
julia> y = [3,4];
julia> vcat(x', y')
2×2 Array{Int64,2}:
1 2
3 4
julia> permutedims(hcat(x, y))
2×2 Array{Int64,2}:
1 2
3 4

One more option; this one works both with numbers and with other objects such as Strings:
julia> rotl90([y x])
2×2 Array{Int64,2}:
1 2
3 4

What about
vcat(transpose(x), transpose(y))
or
[transpose(x); transpose(y)]

Related

How do you access multi-dimension array by N array of index element-wise?

Suppose we have
A = [1 2; 3 4]
In numpy, the following syntax picks the elements pairwise and will produce
A[[1,2],[1,2]] = [1,4]
But in Julia, the same indexing selects every combination of the given rows and columns, which outputs
A[[1,2],[1,2]] = [1 2; 3 4]
Is there a concise way to achieve the same thing as numpy without using for loops?
To get what you want I would use CartesianIndex like this:
julia> A[CartesianIndex.([(1,1), (2,2)])]
2-element Vector{Int64}:
1
4
or
julia> A[[CartesianIndex(1,1), CartesianIndex(2,2)]]
2-element Vector{Int64}:
1
4
Like Bogumil said, you probably want to use CartesianIndex. But if you want to get your result from supplying the vectors of indices for each dimension, as in your Python [1,2],[1,2] example, you need to zip these indices first:
julia> A[CartesianIndex.(zip([1,2], [1,2]))]
2-element Vector{Int64}:
1
4
How does this work? zip traverses both vectors of indices at the same time (like a zipper) and returns an iterator over the tuples of indices:
julia> zip([1,2],[1,2]) # is a lazy iterator
zip([1, 2], [1, 2])
julia> collect(zip([1,2],[1,2])) # collect to show all the tuples
2-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2)
and then CartesianIndex turns them into cartesian indices, which can then be used to get the corresponding values in A:
julia> CartesianIndex.(zip([1,2],[1,2]))
2-element Vector{CartesianIndex{2}}:
CartesianIndex(1, 1)
CartesianIndex(2, 2)
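As a minimal sketch of the same idea, the CartesianIndex constructor can also be broadcast directly over the two index vectors, which pairs them up without an explicit zip. The helper name pairwise_index is hypothetical and only used for illustration:

```julia
A = [1 2; 3 4]
rows = [1, 2]
cols = [1, 2]

# Broadcasting the constructor pairs rows[k] with cols[k].
idx = CartesianIndex.(rows, cols)
picked = A[idx]

# Hypothetical helper wrapping the same idea.
pairwise_index(A, rows, cols) = A[CartesianIndex.(rows, cols)]

# A comprehension over zip gives the same values.
picked2 = [A[i, j] for (i, j) in zip(rows, cols)]
```

The broadcast form requires both index vectors to have the same length (or one of them to be a scalar), which matches the pairwise semantics of the numpy expression.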

Utilizing ndgrid/meshgrid functionality in Julia

I'm trying to find functionality in Julia similar to MATLAB's meshgrid or ndgrid. I know Julia has defined ndgrid in the examples but when I try to use it I get the following error.
UndefVarError: ndgrid not defined
Anyone know either how to get the builtin ndgrid function to work or possibly another function I haven't found or library that provides these methods (the builtin function would be preferred)? I'd rather not write my own in this case.
Thanks!
We prefer to avoid these functions, since they allocate arrays that usually aren't necessary. The values in these arrays have such a regular structure that they don't need to be stored; they can just be computed during iteration. For example, one alternative approach is to write an array comprehension:
julia> [ 10i + j for i=1:5, j=1:5 ]
5×5 Array{Int64,2}:
11 12 13 14 15
21 22 23 24 25
31 32 33 34 35
41 42 43 44 45
51 52 53 54 55
Or, you can write for loops, or iterate over a product iterator:
julia> collect(Iterators.product(1:2, 3:4))
2×2 Array{Tuple{Int64,Int64},2}:
(1, 3) (1, 4)
(2, 3) (2, 4)
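A short sketch of how these alternatives are typically used in place of a grid (the ranges xs and ys are arbitrary example values): the comprehension builds the result array directly, and a generator over the product iterator lets a reduction run without storing any grid at all.

```julia
xs = 1:3
ys = 1:2

# Build a 3×2 result directly; no coordinate arrays are created.
table = [x + 10y for x in xs, y in ys]

# Reduce over all (x, y) pairs lazily; nothing is materialized.
total = sum(x * y for (x, y) in Iterators.product(xs, ys))
```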
I do find sometimes it's convenient to use some function like meshgrid in numpy. It's easy to do it with list comprehension:
function meshgrid(x, y)
    X = [i for i in x, j in 1:length(y)]
    Y = [j for i in 1:length(x), j in y]
    return X, Y
end
e.g.
x = 1:4
y = 1:3
X, Y = meshgrid(x, y)
now
julia> X
4×3 Array{Int64,2}:
1 1 1
2 2 2
3 3 3
4 4 4
julia> Y
4×3 Array{Int64,2}:
1 2 3
1 2 3
1 2 3
1 2 3
However, I did not find this makes the code run faster than using iteration. Here's what I mean:
After defining
x = 1:1000
y = x
X, Y = meshgrid(x, y)
I did benchmark on the following two functions
using Statistics
function fun1()
    return mean(sqrt.(X.*X + Y.*Y))
end
function fun2()
    sum = 0.0
    for i in 1:1000
        for j in 1:1000
            sum += sqrt(i*i + j*j)
        end
    end
    return sum / (1000*1000)
end
Here are the benchmark results:
julia> @btime fun1()
8.310 ms (19 allocations: 30.52 MiB)
julia> @btime fun2()
1.671 ms (0 allocations: 0 bytes)
The meshgrid method is both significantly slower and uses more memory. Does any Julia expert know why? I understand Julia is a compiled language, unlike Python, so iteration won't be slower than vectorization, but I don't understand why the vectorized (array) calculation is many times slower than iteration. (For bigger N this difference is even larger.)
Edit: After reading this post, I have the following updated version of the 'meshgrid' method. The idea is to not create a meshgrid beforehand, but to do it in the calculation via Julia's powerful elementwise array operation:
x = collect(1:1000)
y = x'
function fun1v2()
    mean(sqrt.(x .* x .+ y .* y))
end
The trick here is the .+ between a size-M column array and a size-N row array, which returns an M-by-N array. It does the 'meshgrid' for you. This function is nearly 3 times faster than fun1, albeit not as fast as fun2.
julia> @btime fun1v2()
3.189 ms (24 allocations: 7.63 MiB)
765.8435104896155
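To illustrate the broadcasting rule on a small case, here is a sketch with made-up sizes (M = 4, N = 3). It checks that combining a column with a row reproduces what meshgrid would have built, and it also shows a generator-based variant (an addition for illustration, not something benchmarked in this thread) that avoids the intermediate M-by-N array entirely:

```julia
using Statistics

x = collect(1:4)   # column, length M = 4
y = collect(1:3)   # will be used as a row via y'

# Column .op row broadcasts to an M×N array.
X = x .+ 0 .* y'   # X[i, j] == x[i]
Y = 0 .* x .+ y'   # Y[i, j] == y[j]

# Same computation as fun1v2, on the small inputs.
r = sqrt.(x .^ 2 .+ (y') .^ 2)
m_broadcast = mean(r)

# Lazy variant: no M×N temporary is allocated.
m_lazy = mean(sqrt(i^2 + j^2) for i in x, j in y)
```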
Above, @ChrisRackauckas suggests that the "proper way" to do this is with a lazy operator but he hadn't gotten around to it.
There is now a registered package with lazy ndgrid in it:
https://github.com/JuliaArrays/LazyGrids.jl
It is more general than the version in
VectorizedRoutines.jl
because it can handle vectors with different types, e.g.,
ndgrid(1:3, Float16[0:2;], ["x", "y", "z"]).
There are Literate.jl examples in the docs that show the lazy performance is pretty good.
Of course lazy meshgrid is just one step away:
meshgrid(y,x) = (ndgrid_lazy(x,y)[[2,1]]...,)

How do I add a dimension to an array? (opposite of `squeeze`)

I can never remember how to do this.
How can I go
from a Vector (size (n1,)) to a column Matrix (size (n1,1))?
or from a Matrix (size (n1,n2)) to an Array{T,3} (size (n1,n2,1))?
or from an Array{T,3} (size (n1,n2,n3)) to an Array{T,4} (size (n1,n2,n3,1))?
and so forth.
I want to know how to take an Array and use it to define a new Array with an extra singleton trailing dimension,
i.e. the opposite of squeeze.
You can do this with reshape.
You could define a method for this:
add_dim(x::Array) = reshape(x, (size(x)...,1))
julia> add_dim([3;4])
2×1 Array{Int64,2}:
3
4
julia> add_dim([3 30;4 40])
2×2×1 Array{Int64,3}:
[:, :, 1] =
3 30
4 40
julia> add_dim(rand(4,3,2))
4×3×2×1 Array{Float64,4}:
[:, :, 1, 1] =
0.483307 0.826342 0.570934
0.134225 0.596728 0.332433
0.597895 0.298937 0.897801
0.926638 0.0872589 0.454238
[:, :, 2, 1] =
0.531954 0.239571 0.381628
0.589884 0.666565 0.676586
0.842381 0.474274 0.366049
0.409838 0.567561 0.509187
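As a small generalization of the reshape approach, the sketch below appends n trailing singleton dimensions and accepts any AbstractArray (for example a range). The name add_dims and the default n = 1 are choices made for this illustration, not part of the answer above:

```julia
# Append `n` trailing singleton dimensions (n = 1 by default).
add_dims(x::AbstractArray, n::Integer = 1) =
    reshape(x, size(x)..., ntuple(_ -> 1, n)...)

v = [3, 4]
m = [3 30; 4 40]

col = add_dims(v)          # size (2, 1)
m4 = add_dims(m, 2)        # size (2, 2, 1, 1)
rng = add_dims(1:6)        # works for ranges too, size (6, 1)
```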
Another easy way other than reshaping to an exact shape, is to use cat and ndims together. This has the added benefit that you can specify "how many extra (singleton) dimensions you would like to add". e.g.
a = [1 2 3; 2 3 4];
cat(ndims(a) + 0, a) # add zero singleton dimensions (i.e. stays the same)
cat(ndims(a) + 1, a) # add one singleton dimension
cat(ndims(a) + 2, a) # add two singleton dimensions
etc.
UPDATE (julia 1.3). The syntax for cat changed (in Julia 0.7, so it also applies to 1.3) from cat(dims, A...) to cat(A...; dims=dims).
Therefore the above example would become:
a = [1 2 3; 2 3 4];
cat(a; dims = ndims(a) + 0 )
cat(a; dims = ndims(a) + 1 )
cat(a; dims = ndims(a) + 2 )
etc.
Obviously, like Dan points out below, this has the advantage that it's nice and clean, but it comes at the cost of allocation, so if speed is your top priority and you know what you're doing, then in-place reshape operations will be faster and are to be preferred.
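The allocation difference can be seen in a small sketch: cat produces an independent copy, while reshape of an Array returns an array that shares memory with the original, so later writes to the original show up in the reshaped array but not in the cat result.

```julia
a = [1 2 3; 2 3 4]

c = cat(a; dims = ndims(a) + 1)   # 2×3×1, new array (copy)
r = reshape(a, size(a)..., 1)     # 2×3×1, shares data with `a`

a[1, 1] = 100                     # mutate the original afterwards

copied_value = c[1, 1, 1]         # unaffected by the mutation
shared_value = r[1, 1, 1]         # reflects the mutation
```

Whether sharing is desirable depends on the use case; if the result must be independent of the input, call copy on the reshaped array or use cat.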
Some time before the Julia 1.0 release, a reshape(x, Val(N)) method was added which, for N > ndims(x), adds rightmost singleton dimensions.
So the following works:
julia> add_dim(x::Array{T, N}) where {T,N} = reshape(x, Val(N+1))
add_dim (generic function with 1 method)
julia> add_dim([3;4])
2×1 Array{Int64,2}:
3
4
julia> add_dim([3 30;4 40])
2×2×1 Array{Int64,3}:
[:, :, 1] =
3 30
4 40
julia> add_dim(rand(4,3,2))
4×3×2×1 Array{Float64,4}:
[:, :, 1, 1] =
0.0737563 0.224937 0.6996
0.523615 0.181508 0.903252
0.224004 0.583018 0.400629
0.882174 0.30746 0.176758
[:, :, 2, 1] =
0.694545 0.164272 0.537413
0.221654 0.202876 0.219014
0.418148 0.0637024 0.951688
0.254818 0.624516 0.935076
Try this
function extend_dims(A,which_dim)
    s = [size(A)...]
    insert!(s,which_dim,1)
    return reshape(A, s...)
end
the argument which_dim specifies the position at which the new singleton dimension is inserted
Thus
extend_dims(randn(3,3),1)
will produce a 1 x 3 x 3 array and so on.
I find this utility helpful when passing data into convolutional neural networks.
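A possible variant of the same idea, sketched here under the hypothetical name insert_dim, builds the new size as a tuple instead of a mutable vector and checks that the requested position is valid:

```julia
# Insert a singleton dimension at position `d` (1 <= d <= ndims(A) + 1).
function insert_dim(A::AbstractArray, d::Integer)
    1 <= d <= ndims(A) + 1 ||
        throw(ArgumentError("d must be between 1 and ndims(A) + 1"))
    sz = size(A)
    # Splice a 1 between the leading and trailing parts of the size tuple.
    return reshape(A, (sz[1:d-1]..., 1, sz[d:end]...))
end

A = randn(3, 4)
front = insert_dim(A, 1)    # size (1, 3, 4)
middle = insert_dim(A, 2)   # size (3, 1, 4)
back = insert_dim(A, 3)     # size (3, 4, 1)
```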

How do you select a subset of an array based on a condition in Julia

How do you simply select a subset of an array based on a condition? I know Julia doesn't use vectorization, but there must be a simple way of doing the following without an ugly looking multi-line for loop
julia> map([1,2,3,4]) do x
           return (x % 2 == 0) ? x : nothing
       end
4-element Array{Any,1}:
nothing
2
nothing
4
Desired output:
[2, 4]
Observed output:
[nothing, 2, nothing, 4]
You are looking for filter
http://docs.julialang.org/en/release-0.4/stdlib/collections/#Base.filter
Here is an example
filter(x->x%2==0,[1,2,3,5]) # answers with [2]
There are element-wise operators (beginning with a "."):
julia> [1,2,3,4] % 2 .== 0
4-element BitArray{1}:
false
true
false
true
julia> x = [1,2,3,4]
4-element Array{Int64,1}:
1
2
3
4
julia> x % 2 .== 0
4-element BitArray{1}:
false
true
false
true
julia> x[x % 2 .== 0]
2-element Array{Int64,1}:
2
4
julia> x .% 2
4-element Array{Int64,1}:
1
0
1
0
You can use the find() function (or the .== syntax) to accomplish this. E.g.:
julia> x = collect(1:4)
4-element Array{Int64,1}:
1
2
3
4
julia> y = x[find(x%2.==0)]
2-element Array{Int64,1}:
2
4
julia> y = x[x%2.==0] ## more concise and slightly quicker
2-element Array{Int64,1}:
2
4
Note the .== syntax for the element-wise operation. Also, note that find() returns the indices that match the criteria. In this case, the indices matching the criteria are the same as the array elements that match the criteria. For the more general case though, we want to put the find() function in brackets to denote that we are using it to select indices from the original array x.
Update: Good point @Lutfullah Tomak about the filter() function. I believe, though, that find() can be quicker and more memory efficient (though I understand that anonymous functions are supposed to get better in version 0.5, so perhaps this might change?). At least in my trial, I got:
x = collect(1:100000000);
@time y1 = filter(x->x%2==0,x);
# 9.526485 seconds (100.00 M allocations: 1.554 GB, 2.76% gc time)
@time y2 = x[find(x%2.==0)];
# 3.187476 seconds (48.85 k allocations: 1.504 GB, 4.89% gc time)
@time y3 = x[x%2.==0];
# 2.570451 seconds (57.98 k allocations: 1.131 GB, 4.17% gc time)
Update2: Good points in comments to this post that x[x%2.==0] is faster than x[find(x%2.==0)].
Another updated version:
v[v .% 2 .== 0]
In newer versions of Julia (1.0 and later), a broadcasting dot is needed before both % and ==.
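For reference, here is a sketch of how the approaches from this thread look on Julia 1.x, where find was renamed to findall and % must be broadcast explicitly. All four variants below should give the same result for the example vector:

```julia
x = [1, 2, 3, 4]

# filter with a predicate function
by_filter = filter(iseven, x)

# logical (mask) indexing with broadcast operators
by_mask = x[x .% 2 .== 0]

# findall returns the matching indices, which then index into x
by_findall = x[findall(iseven, x)]

# comprehension with a filter clause
by_comprehension = [v for v in x if v % 2 == 0]
```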

Check size in bytes of variable using Julia

Question: How do I check the size in bytes of a variable using Julia?
What I've tried: In Matlab, the whos() function provided this information, but in Julia that just provides the variable names and module. Browsing the standard library in the Julia manual, sizeof() looked promising, but it only appears to provide the size of the canonical binary representation, rather than the current variable.
sizeof works on variables too
sizeof(a::Array{T,N})
returns the size of the array times the element size.
julia> x = [1 2 3 4]
1x4 Array{Int64,2}:
1 2 3 4
julia> sizeof(x)
32
julia> x = Int8[1 2 3 4]
1x4 Array{Int8,2}:
1 2 3 4
julia> sizeof(x)
4
sizeof(B::BitArray{N})
returns chunks; each chunk is 8 bytes so can represent up to 64 bits
julia> x = BitArray(36);
julia> sizeof(x)
8
julia> x = BitArray(65);
julia> sizeof(x)
16
sizeof(s::ASCIIString) and sizeof(s::UTF8String)
return the number of bytes in the string (1 byte/char for ASCII characters).
julia> sizeof("hello world")
11
sizeof(s::UTF16String) and sizeof(s::UTF32String)
Same as above but with 2 and 4 bytes/character respectively.
julia> x = utf32("abcd");
julia> sizeof(x)
16
Accordingly other strings
sizeof(s::SubString{ASCIIString}) at string.jl:590
sizeof(s::SubString{UTF8String}) at string.jl:591
sizeof(s::RepString) at string.jl:690
sizeof(s::RevString{T<:AbstractString}) at string.jl:737
sizeof(s::RopeString) at string.jl:802
sizeof(s::AbstractString) at string.jl:71
core values
returns the number of bytes each variable uses
julia> x = Int64(0);
julia> sizeof(x)
8
julia> x = Int8(0);
julia> sizeof(x)
1
julia> x = Float16(0);
julia> sizeof(x)
2
julia> x = sizeof(Float64)
8
All as one would expect, but note that Julia characters are wide characters (4 bytes each):
julia> sizeof('a')
4
getBytes
For cases where the layout is more complex and/or not contiguous. Here's a function that will iterate over the fields of a variable (if any) and return the sum of all of the sizeof results, which should be the total number of bytes allocated.
getBytes(x::DataType) = sizeof(x);
function getBytes(x)
    total = 0;
    fieldNames = fieldnames(typeof(x));
    # isempty works whether fieldnames returns an array or a tuple
    if isempty(fieldNames)
        return sizeof(x);
    else
        for fieldName in fieldNames
            total += getBytes(getfield(x,fieldName));
        end
        return total;
    end
end
using it
create an instance of a random-ish type...
julia> type X a::Vector{Int64}; b::Date end
julia> x = X([i for i = 1:50],now())
X([1,2,3,4,5,6,7,8,9,10 … 41,42,43,44,45,46,47,48,49,50],2015-02-09)
julia> getBytes(x)
408
The function Base.summarysize provides exactly that
It also includes the overhead from the struct as seen in the examples.
julia> struct Foo a; b end
julia> Base.summarysize(ones(10000))
80040
julia> Base.summarysize(Foo(ones(10000), 1))
80064
julia> Base.summarysize(Foo(ones(10000), Foo(ones(10, 10), 1)))
80920
However, care should be taken, as the function is not exported and might not be future-proof.
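To show why sizeof alone is not enough for composite values, here is a small sketch comparing it with Base.summarysize. The struct Holder is made up for this example, and since Base.summarysize is not exported, its exact results may differ between Julia versions:

```julia
# Hypothetical container type, used only for illustration.
struct Holder
    data::Vector{Float64}
    tag::Int
end

h = Holder(ones(1000), 1)

# sizeof only counts the struct's own fields (a reference and an Int),
# not the memory of the vector that `data` points to.
shallow = sizeof(h)

# summarysize follows references and includes the vector's storage.
deep = Base.summarysize(h)

# For a plain array of bits types, sizeof does count the element storage.
payload = sizeof(h.data)
```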
In julia 1.6, varinfo() shows sizes:
julia> a = 1;
julia> v = ones(10000);
julia> varinfo()
name size summary
–––––––––––––––– ––––––––––– –––––––––––––––––––––––––––––
Base Module
Core Module
InteractiveUtils 250.022 KiB Module
Main Module
ans 78.164 KiB 10000-element Vector{Float64}
v 78.164 KiB 10000-element Vector{Float64}
a 8 bytes Int64
For specific variables, either use pattern matching (r"..." is a regular expression):
julia> varinfo(r"^v$")
name size summary
–––– –––––––––– –––––––––––––––––––––––––––––
v 78.164 KiB 10000-element Vector{Float64}
or combine Base.summarysize from Korbinian's answer with Base.format_bytes:
julia> pretty_summarysize(x) = Base.format_bytes(Base.summarysize(x))
pretty_summarysize (generic function with 1 method)
julia> pretty_summarysize(v)
"78.164 KiB"
Edit: beware that summarysize had a bug, at least in 1.5.3 and 1.6.1. varinfo was affected as well. It is fixed (tested with 1.7.3).
