In MATLAB, pre-allocating arrays that would otherwise change size during iteration is recommended. On the assumption that the recommendation also goes for Julia, I would like to know how to do that.
In MATLAB, the following code pre-allocates a 5 by 10 array:
A = nan(5,10)
How would the same be obtained in Julia?
A = nan(5,10) does not just allocate an array of doubles, but also initializes the entries of the array with NaNs (although MATLAB may not really fill the array under the hood).
The short answer is that A = nan(5, 10) in MATLAB is semantically equivalent to A = fill(NaN, 5, 10) in Julia.
The long answer is that you have many options, and more control, over array allocation and initialization in Julia.
Array allocation without initialization
In Julia, it is possible to allocate an array or a matrix (which is a 2D array) and leave the entries uninitialized.
# Allocate an "uninitialized" m-by-n `Float64` (`double`) matrix
A = Array{Float64, 2}(undef, m, n)
# or equivalently
A = Matrix{Float64}(undef, m, n) # `Matrix{T}` is equivalent to `Array{T, 2}`
# you do not need to type dimensionality even with `Array`,
# the dimensionality will be inferred from the number of parameters
A = Array{Float64}(undef, m, n)
# You can do the same for arrays of different dimensions or other types
A = Array{Float64, 3}(undef, m, n, k) # 3D `Float64` array of size m*n*k
A = Array{Int64}(undef, m) # 1D `Int64` array
A = Vector{Float32}(undef, m) # 1D `Float32` (i.e. `single`) array. `Vector{T} === Array{T, 1}`
Array allocation without initialization using another array
In Julia, you can use the function similar to allocate an array using the type, element type, and dimensionality information of another array, leaving the entries uninitialized.
A = zeros(UInt8, m, n)
B = similar(A) # allocates the same type of array (dense, sparse, etc.) with the same element type, and the same dimensions as `A`
C = similar(A, Float64) # allocates the same type of array with the same dimensions as `A` but with the element type of `Float64`
Allocate an empty array
You can use the array construction syntax above, passing 0 as a dimension, or simply T[], to create an empty array with element type T.
A = Float64[]
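The constructor form mentioned above does the same with a zero-length dimension:
B = Array{Float64}(undef, 0) # also an empty `Float64` vector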
Array allocations with initialization
# Allocate a `Float64` array and fill it with 0s
A = zeros(m, n) # m-by-n Float64 matrix filled with zeros
A = zeros(m, n, k, l) # a 4D Float64 array filled with zeros
# similarly to fill with `Float64` 1s
A = ones(m, n)
A = ones(m) # a 1D array of size `m`
A = ones(m, 1) # an `m`-by-1 2D array
# you can use these functions with other types as well
A = zeros(Float32, m, n)
A = ones(UInt8, m, n, k)
# you can allocate an array/matrix and fill it with any value you like using `fill`
# the element type is inferred from the value entered
A = fill(4.0, (m, n)) # m-by-n matrix filled with `4.0`
A = fill(0.5f0, m, n, k) # a 3D `Float32` array filled with `0.5`s (note the `f0` literal suffix)
# so to fill with `NaN`s you can use
A = fill(NaN, m, n)
# random initialization
A = rand(m, n) # m-by-n Float64 matrix with uniformly distributed values in `[0,1)`
A = rand(Float32, m) # an array of size `m` with uniformly distributed values in `[0,1)`
A = randn(m, n) # the same as `rand` but with normally distributed values
# you can initialize the array with values randomly (uniform) picked from a collection
A = rand([1, 5, 7], m, n) # values will be picked from the array `[1,5,7]`
You can use fill!(A, value) or simply A .= value to fill an already allocated array with the same value. If you import the Random module, you can use rand! or randn! to fill an already allocated array with random values. This can give you significant performance benefits, as allocations are avoided, as in the sketch below.
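For example, a small sketch of reusing one allocation across iterations (illustrative only; m and n as above):
using Random

A = Matrix{Float64}(undef, m, n) # allocate once
for i in 1:100
    rand!(A) # refill in place with uniform values, no new allocation
    A .= A .* 2 # broadcast assignment also reuses the existing array
end
fill!(A, NaN) # reset every entry to NaN in place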
You may take a look at the Multi-dimensional Arrays section of Julia documentation to learn more about arrays in Julia.
Notes
In Julia, you cannot change the size of a multi-dimensional (not 1D) built-in Array.
A = zeros(5,5)
A[6,5] = 2 # bounds error
But you can push! values into a one-dimensional Array. This will efficiently resize the array.
julia> A = Int[];
julia> push!(A, 1);
julia> push!(A, 2, 3);
julia> A
3-element Array{Int64,1}:
1
2
3
In MATLAB, nan() allocates an array of NaNs stored as floating-point values. So in Julia
A = fill(NaN, (5,10))
does this.
Related
In Julia, I would like to randomly generate a discrete Fourier transform matrix of size n by n. I am currently not sure how to do this. Does anyone perhaps know a way to do this in Julia?
As said in the comments, you can use the FFTW.jl package for this purpose:
julia> using FFTW
julia> n = 5;
julia> rnd = rand(1:100, n, n);
julia> fft(rnd)
5×5 Matrix{ComplexF64}:
1216.0+0.0im 65.8754+10.3181im 106.125+119.409im 106.125-119.409im 65.8754-10.3181im
160.529-95.3957im 177.376-31.8946im -28.6976+150.325im 52.8237+139.038im 82.2517-165.542im
-91.0288-22.1566im 136.676+28.1im -42.8763-97.2573im -97.7517+4.15021im 8.19756-13.5548im
-91.0288+22.1566im 8.19756+13.5548im -97.7517-4.15021im -42.8763+97.2573im 136.676-28.1im
160.529+95.3957im 82.2517+165.542im 52.8237-139.038im -28.6976-150.325im 177.376+31.8946im
And for a Real datatype, you can use the rfft function:
julia> let n = 5
           rnd = rand(n, n)
           rfft(rnd)
       end
3×5 Matrix{ComplexF64}:
10.54+0.0im 1.15104+0.522166im -0.449373-0.686863im -0.449373+0.686863im 1.15104-0.522166im
-1.2319+0.3485im -0.622914-0.649385im 1.39743-0.733653im 1.66696+0.694317im -1.59092-0.578805im
0.501205+0.962713im 0.056338-0.207403im 0.0156042-0.181913im -1.87067-1.66951im -0.672603-0.969665im
This might raise the question of why the result is a 3×5 matrix:
According to the official doc about the rfft function:
"Multidimensional FFT of a real array A, exploiting the fact that the transform has conjugate symmetry in order to save roughly half the computational time and storage costs compared with fft. If A has size (n_1, ..., n_d), the result has size (div(n_1,2)+1, ..., n_d)."
It's also possible to first create a random n×n matrix with an eltype of ComplexF64 and perform the fft on it; for this, create the rnd variable as rand(ComplexF64, n, n) in the above let block and replace rfft with fft, as sketched below.
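A minimal sketch of that variant (only the result size is shown here):
julia> let n = 5
           rnd = rand(ComplexF64, n, n) # complex input, so the full `fft` applies
           size(fft(rnd))
       end
(5, 5)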
I want to construct a bijective function f(k, n, seed) from [1,n] to [1,n], where 1<=k<=n and 1<=f(k, n, seed)<=n for each given seed and n. The function should return a value from a random permutation of 1,2,...,n. The randomness is decided by the seed; different seeds may correspond to different permutations. I want the function f(k, n, seed)'s time complexity to be O(1) for each 1<=k<=n and any given seed.
Anyone knows how can I construct such a function? The randomness is allowed to be pseudo-randomness. n can be very large (e.g. >= 1e8).
No matter how you do it, you will always have to store a list of numbers still available or numbers already used ... A simple possibility would be the following
const avail = [1,2,3, ..., n];
let random = new Random(seed)
function f(k, n) {
    let index = random.next(n - k);
    let result = avail[index];
    avail[index] = avail[n - k - 1]; // move the last still-available element into the gap
    return result;
}
The assumptions for this are the following
the array avail is 0-indexed
random.next(x) creates an random integer i with 0 <= i < x
the first k to call the function f with is 0
f is called for contiguous k = 0, 1, 2, 3, ..., n-1
The principle works as follows:
avail holds all numbers still available for the permutation. When you take a random index, the element at that index is the next element of the permutation. Then, instead of slicing that element out of the array, which is quite expensive, you just replace the currently selected element with the last element in the avail array. In the next iteration you (virtually) decrease the size of the avail array by 1 by decreasing the upper limit for the random index by one.
I'm not sure how secure this random permutation is in terms of the distribution of the values, i.e. it may for instance happen that a certain range of numbers is more likely to appear at the beginning of the permutation or at the end.
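For completeness, here is a Julia translation of the same idea (my own sketch; LazyPerm and next! are hypothetical names). It hands out the elements of a seeded permutation of 1:n one at a time:
using Random

mutable struct LazyPerm
    avail::Vector{Int} # numbers still available for the permutation
    rng::MersenneTwister
    used::Int # how many elements have been handed out so far
end
LazyPerm(n, seed) = LazyPerm(collect(1:n), MersenneTwister(seed), 0)

function next!(p::LazyPerm)
    n = length(p.avail)
    idx = rand(p.rng, 1:n - p.used) # pick among the still-available slots
    result = p.avail[idx]
    p.avail[idx] = p.avail[n - p.used] # move the last available entry into the gap
    p.used += 1
    return result
end

# usage: p = LazyPerm(10, 42); [next!(p) for _ in 1:10] is a permutation of 1:10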
A simple, but not very 'random', approach would be to use the fact that, if a is relatively prime to n (i.e. they have no common factors), then
x -> (a*x + b) % n
is a permutation of {0,..,n-1} to {0,..,n-1}. To find the inverse of this, you can use the extended Euclidean algorithm to find k and l so that
1 = gcd(a,n) = k*a + l*n
for then the inverse of the map above is
y -> (k*y + c) mod n
where c = -k*b mod n.
So you could choose a to be a 'random' number in {0,..,n-1} that is relatively prime to n, and b to be any number in {0,..,n-1}.
Note that you'll need to do this in 64 bit arithmetic to avoid overflow in computing a*x.
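In Julia, this idea could look like the following sketch (my own illustration; affine_perm is a hypothetical name, and k is 0-based here). Base's invmod performs the extended Euclidean step, and widemul keeps a*k in wide arithmetic, per the overflow note above:
using Random

function affine_perm(n::Integer, seed::Integer)
    rng = MersenneTwister(seed)
    a = rand(rng, 1:n-1)
    while gcd(a, n) != 1 # retry until a is relatively prime to n
        a = rand(rng, 1:n-1)
    end
    b = rand(rng, 0:n-1)
    f(k) = Int(mod(widemul(a, k) + b, n)) # O(1) forward map on {0, ..., n-1}
    ainv = invmod(a, n) # modular inverse via the extended Euclidean algorithm
    finv(y) = Int(mod(widemul(ainv, mod(y - b, n)), n)) # O(1) inverse map
    return f, finv
end

# usage: f, finv = affine_perm(10^8, 1234); finv(f(7)) == 7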
I cannot for the life of me figure out how to use Cumulants.jl to get moments or cumulants from some data. I find the docs (https://juliahub.com/docs/Cumulants/Vrq25/1.0.4/) completely over my head.
Suppose I have a vector of some data e.g.:
using Distributions
d = rand(Exponential(1), 1000)
The documentation suggests, so far as I can understand it, that cumulants(d, 3) should return the first three cumulants. The function is defined like so:
cumulants(data::Matrix{T}, m::Int = 4, b::Int = 2) where T<: AbstractFloat
A Matrix in Julia is, so far as I understand, a 2D array, so I convert my data to one:
dm = reshape(d, length(d), 1)
But I get:
julia> cumulants(dm,3)
ERROR: DimensionMismatch("bad block size 2 > 1")
My question concisely: how do I use Cumulants.jl to get the first m cumulants and the first m moments from some simulated data?
Thanks!
EDIT: In the above example, c = cumulants(dm,3,1) as suggested in a comment will give, for c:
3-element Array{SymmetricTensors.SymmetricTensor{Float64,N} where N,1}:
SymmetricTensors.SymmetricTensor{Float64,1}(Union{Nothing, Array{Float64,1}}[[1.0122452678071678]], 1, 1, 1, true)
SymmetricTensors.SymmetricTensor{Float64,2}(Union{Nothing, Array{Float64,2}}[[1.0336298356976195]], 1, 1, 1, true)
SymmetricTensors.SymmetricTensor{Float64,3}(Union{Nothing, Array{Float64,3}}[[2.5438037582591146]], 1, 1, 1, true)
I find that I can access the first, second, and third cumulants by:
c[1][1]
c[2][1,1]
c[3][1,1,1]
Which I arrived at essentially by guessing. I have no idea why this nutty output format exists. I still cannot figure out how to get the first m cumulants as a vector easily.
As I wrote in the comments, if you have a univariate problem you should use cumulants(dm, 3, 1), as the cumulants are calculated using tensors, and the tensors are saved in a block structure where the blocks are of size b×b, with b being the third argument in the function call. However, if you have only one column, the size of the tensors will be 1, so it doesn't make sense to save them in 2×2 blocks.
To access the cumulants in Array form, you have to convert them first. This is done by Array(cumulants(data, nc, b)[c]), where nc is the number of cumulants you want to calculate, b is the block size (for efficient storage of the tensors), and c is the index of the cumulant you need.
Summing up:
using Cumulants
# univariate data
unidata = rand(1000,1)
uc = cumulants(unidata, 3, 1)
Array(uc[1])
#1-element Array{Float64,1}:
# 0.48772026299259374
Array(uc[2])
#1×1 Array{Float64,2}:
# 0.0811428357438324
Array(uc[3])
#[:, :, 1] =
# 0.0008653019738796724
# multivariate data
multidata = rand(1000,3)
mc = cumulants(multidata, 3, 2)
Array(mc[1])
#3-element Array{Float64,1}:
# 0.5024511157116442
# 0.4904838734508787
# 0.48286680648519215
Array(mc[2])
#3×3 Array{Float64,2}:
# 0.0834021 -0.00368562 -0.00151614
# -0.00368562 0.0835084 0.00233202
# -0.00151614 0.00233202 0.0808521
Array(mc[3])
# [:, :, 1] =
# -0.000506926 -0.000763061 -0.00183751
# -0.000763061 -0.00104804 -0.00117227
# -0.00183751 -0.00117227 0.00112968
#
# [:, :, 2] =
# -0.000763061 -0.00104804 -0.00117227
# -0.00104804 0.000889305 -0.00116559
# -0.00117227 -0.00116559 -0.000106866
#
# [:, :, 3] =
# -0.00183751 -0.00117227 0.00112968
# -0.00117227 -0.00116559 -0.000106866
# 0.00112968 -0.000106866 0.00131965
The optimal size of the blocks can be found in their software paper (https://arxiv.org/pdf/1701.05420.pdf), where they write (for proper LaTeX formatting, have a look at the paper):
5.2.1. The optimal size of blocks.
The number of coefficients required to store a super-symmetric tensor of order d and n dimensions is equal to (n+d−1 over d). The storage of the tensor disregarding the super-symmetry requires n^d coefficients. The block structure introduced in [49] uses more than the minimal amount of memory but allows for easier further processing of super-symmetric tensors. If we store the super-symmetric tensor in the block structure, the block size parameter b appears. In our implementation, in order to store a super-symmetric tensor in the block structure we need, assuming b|n, an array of (n over b)^d pointers to blocks and an array of the same size of flags that contain the information whether a pointer points to a valid block. Recall that diagonal blocks contain redundant information. Therefore, on the one hand, the smaller the value of b, the fewer redundant elements on the diagonals of the block structure. On the other hand, the larger the value of b, the smaller the number of blocks, the smaller the blocks' operation overhead, and the fewer pointers pointing to empty blocks. For a detailed discussion of memory usage see [49]. The analysis of the influence of the parameter b on the computational time of cumulants for some parameters is presented in Fig. 2. We obtain the shortest computation time for b = 2 in almost all test cases, and this value will be set as default and used in all efficiency tests. Note that for b = 1 we lose all the memory savings.
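To make those counts concrete (my own illustration, not from the paper): for a symmetric tensor of order d = 3 in n = 10 dimensions,
julia> d, n = 3, 10;

julia> binomial(n + d - 1, d) # distinct coefficients under super-symmetry
220

julia> n^d # coefficients stored without exploiting the symmetry
1000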
Using Oskar's helpful answer, I thought I'd provide my wrapper function which accomplishes the goal of returning a vector of the first m cumulants, given an input of a 1D array of data.
using Cumulants
function mycumulants(d, m) # given a 1D array of data d, return a vector of the first m cumulants
    res = zeros(m)
    dm = reshape(d, length(d), 1) # convert the 1D array to 2D
    c = cumulants(dm, m, 1)       # the block size 1 is needed here, or else it errors
    for i in 1:m
        res[i] = Array(c[i])[1]
    end
    return res
end
But it turns out this is really, really slow compared to just directly calculating raw moments and converting them to cumulants, e.g. by k[5] = u[5] - 5*u[4]*u[1] - 10*u[3]*u[2] + 20*u[3]*u[1]^2 + 30*u[2]^2*u[1] - 60*u[2]*u[1]^3 + 24*u[1]^5, so I think I won't be using Cumulants.jl after all for my purposes, which only involve univariate data at this time.
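For reference, a minimal sketch of that direct approach for the first few cumulants (illustrative only, using the standard raw-moment-to-cumulant formulas):
using Statistics

function raw_cumulants(x, m) # first m cumulants (m <= 4 here) from raw moments
    u = [mean(x .^ j) for j in 1:m]
    k = zeros(m)
    k[1] = u[1]
    m >= 2 && (k[2] = u[2] - u[1]^2)
    m >= 3 && (k[3] = u[3] - 3u[2]*u[1] + 2u[1]^3)
    m >= 4 && (k[4] = u[4] - 4u[3]*u[1] - 3u[2]^2 + 12u[2]*u[1]^2 - 6u[1]^4)
    return k
end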
Example of time difference for calculating the first six cumulants from some simulated data:
----Data set 2----
Direct calculation:
1.997 ms (14 allocations: 469.47 KiB)
Cumulants.jl:
152.798 ms (318435 allocations: 17.59 MiB)
I need to perform calculations on random batches of very large integers. I have a function that compares the numbers for certain properties and returns a value based on those properties. Since the batches and the numbers themselves can be very large, I want to speed up the process by utilizing the GPU.
Here is a short version of what I have running purely on the CPU now.
using Statistics
function check(M)
    val = 0
    # some code that calculates val based on M, e.g. the mean
    val = mean(M)
    return val
end

function distribution(N, n, exp) # N = batch size, n = number of batches, exp = exponent of the upper limit of the integers
    avg = 0
    M = zeros(BigInt, N)
    for i = 1:n
        M = rand(1:BigInt(10)^exp, N)
        avg += check(M)
    end
    avg /= n
    println(avg, ":", N)
end
#example
distribution(10 ^ 3, 10 ^ 6, 100)
I have briefly used CUDAnative in Julia but I don't know how to implement the BigInt calculations. That package would be preferred but others are fine as well. Any help is appreciated.
BigInts are CPU-only: they are not implemented in Julia itself but wrap the GMP C library, so they cannot be compiled for the GPU.
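One way to see the obstacle (a side note, not part of the original answer): GPU kernels require isbits element types, and BigInt is a heap-allocated wrapper around GMP:
julia> isbitstype(BigInt)
false

julia> isbitstype(Int128) # a fixed-width alternative that can live on the GPU
true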
For a given vector I would like to find an orthogonal basis around it, i.e. the given vector normalized, together with a randomly chosen basis of the orthogonal subspace.
Is there a convenient function for this in Julia?
The function you are looking for is called nullspace.
julia> x = randn(5);
julia> x⊥ = nullspace(x');
julia> x'x⊥
1×4 Array{Float64,2}:
7.69373e-16 -5.45785e-16 -4.27252e-17 1.26778e-16
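To assemble the full basis the question asks for, you can normalize x and concatenate it with the nullspace basis (a follow-up sketch, not part of the original answer):
julia> using LinearAlgebra

julia> B = [x / norm(x) x⊥]; # unit vector first, orthogonal complement after

julia> B'B ≈ I # 5×5 orthonormal basis
true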
You could define a function orth (if someone hasn't already done this):
orth(M) = qr(M)[1]
See here:
https://groups.google.com/forum/#!topic/julia-users/eG6a4tj7LGg and http://docs.julialang.org/en/release-0.4/stdlib/linalg/
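Note that on Julia ≥ 0.7, qr returns a factorization object rather than a tuple, so a modern equivalent (a sketch) would be:
using LinearAlgebra
orth(M) = Matrix(qr(M).Q) # materialize the thin Q factor as a plain matrix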
Or from IterativeSolvers.jl:
orthogonalize{T}(v::Vector{T}, K::KrylovSubspace{T})
See:
https://github.com/JuliaMath/IterativeSolvers.jl
The following will calculate an orthogonal basis for matrix M
function orth(M::Matrix)
    matrixRank = rank(M)
    Ufactor = svdfact(M)[:U]
    return Ufactor[:, 1:matrixRank]
end
With a Julia docstring:
"""
orth(M)
Compute an orthogonal basis for matrix `M`.
Returns a matrix whose columns are the orthogonal vectors that constitute a basis for the range of `M`.
If the matrix is square/invertible, returns the `U` factor of `svdfact(M)`, otherwise the first *r* columns of `U`, where *r* is the rank of the matrix.
# Examples
```julia
julia> orth([1 8 12; 5 0 7])
2×2 Array{Float64,2}:
-0.895625 -0.44481
-0.44481 0.895625
```
```
julia> orth([1 8 12; 5 0 7 ; 6 4 1])
3×3 Array{Float64,2}:
-0.856421 0.468442 0.217036
-0.439069 -0.439714 -0.783498
-0.27159 -0.766298 0.582259
```
"""
function orth(M::Matrix)
    matrixRank = rank(M)
    Ufactor = svdfact(M)[:U]
    return Ufactor[:, 1:matrixRank]
end