In many machine learning use cases, you need to create an array of zeros with specific dimensions. In Python, I would use np.zeros((2, 1)) to create a 2x1 array. What is the analogous way to do this in Julia?
In Julia, many of the operations from packages like numpy are built into the standard library. In the case of creating an array of zeros, you can do the following:
julia> zeros(2, 1)
2×1 Matrix{Float64}:
0.0
0.0
You can read more about the zeros function in the Julia docs.
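A few related forms worth knowing alongside zeros(2, 1):

```julia
# zeros accepts an optional element type as the first argument
zeros(Int, 2, 1)       # 2×1 Matrix{Int64} of zeros
zeros(Float32, 3)      # 3-element Vector{Float32}

# ones and fill follow the same pattern
ones(2, 1)             # 2×1 Matrix{Float64} of ones
fill(0.0, 2, 1)        # equivalent to zeros(2, 1)
```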
Related
Long story short, A\b works well; it just takes too much memory. Is there an in-place option?
I need to solve A\b many times, and the number of allocations is giving me memory problems. I've tried gmres and similar iterative solvers, but I'm not getting solutions that are as accurate. I've tried playing with the relative tolerance, but my solutions still aren't good enough. Note that A is a linear operator; it doesn't get too ill-conditioned, and it is fairly sparse.
LinearSolve.jl is an interface over the linear solvers of the Julia ecosystem. Internally it uses the mutating forms (not just ldiv!, but also lu! and friends, some of which are not compatible with sparse matrices) for performance wherever possible. It connects with multiple sparse solvers: not just the default UMFPACK, but also others like KLU, as well as Krylov methods. It also applies other tricks that matter when you're solving many systems, like caching the symbolic factorization, which are not necessarily well documented. It does all of this for you by default if you use the caching interface; the details of what this entails in all of the sparse scenarios, with maximum performance, are best described by just reading the source code. So use it, or look at the code.
Using LinearSolve.jl in this manner is fairly straightforward. For example you just define and solve a LinearProblem:
using LinearSolve
n = 4
A = rand(n,n)
b1 = rand(n); b2 = rand(n)
prob = LinearProblem(A, b1)
linsolve = init(prob)
sol1 = solve(linsolve)
#=
4-element Vector{Float64}:
-0.9247817429364165
-0.0972021708185121
0.6839050402960025
1.8385599677530706
=#
and then you can replace b:
linsolve = LinearSolve.set_b(sol1.cache,b2)
sol2 = solve(linsolve)
sol2.u
#=
4-element Vector{Float64}:
1.0321556637762768
0.49724400693338083
-1.1696540870182406
-0.4998342686003478
=#
or replace A and solve:
A2 = rand(n,n)
linsolve = LinearSolve.set_A(sol2.cache,A2)
sol3 = solve(linsolve)
sol3.u
#=
4-element Vector{Float64}:
-6.793605395935224
2.8673042300837466
1.1665136934977371
-0.4097250749016653
=#
and it will do the right thing: in solving those three systems it performs only two factorizations (it refactorizes only after A changes). Arguments like alias_A and alias_b can be passed to ensure zero extra memory is allocated (see the common solver options). With sparse matrices, this example would have performed only one symbolic factorization, two numerical factorizations, and three solves, provided A retained the same sparsity pattern. You get the point.
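A minimal sketch of that aliasing (keyword names as listed in LinearSolve's common solver options; check the documentation for your installed version):

```julia
using LinearSolve

A = rand(4, 4); b = rand(4)
prob = LinearProblem(A, b)
# alias_A/alias_b tell the solver it may reuse (and overwrite) A's and b's
# memory directly instead of making internal copies
linsolve = init(prob; alias_A = true, alias_b = true)
sol = solve(linsolve)
```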
Note that the structs used for the interface are immutable, and so within functions Julia will typically use interprocedural optimizations and escape analysis to determine that they are not needed and entirely eliminate them from the generated code.
Finally found it. LinearAlgebra package: ldiv!
One would think that would show up more readily in a Google search.
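For reference, a minimal in-place pattern with the LinearAlgebra standard library (factorize once, then reuse the factorization for each right-hand side):

```julia
using LinearAlgebra

A = rand(4, 4)
b = rand(4)
x = similar(b)

F = lu(A)        # factorize once
ldiv!(x, F, b)   # solve F \ b in place, writing the result into x
```

For repeated solves with new right-hand sides, keep F and call ldiv!(x, F, b2) again; only a change to A requires a new factorization.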
This is a constructor for arrays:
Array{T}(undef, dims)
I am new to Julia, and don't have a good background in programming. In this syntax, why is undef used for creating the array?
What is a constructor in Julia, in what situation do we use a constructor?
If we don't write a constructor, Julia will automatically create one. Why, then, would we define our own constructor?
First, you want to understand what a constructor is. For that, I suggest the Julia docs: Constructors in Julia
Now that you have the theory, let's break apart this expression:
a = Array{Int}(undef, (2, 2))
What this expression says is: "I want a to be a 2×2 Array of Int". So Julia will reserve some memory space for it. When I write it in the Julia REPL:
julia> a = Array{Int}(undef, (2, 2))
2×2 Array{Int64,2}:
0 0
0 0
Now, Array{T}(undef, dims) is the generalization of that: "construct an array with element type T and dimensions dims".
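A couple of concrete instances of that general form:

```julia
v = Array{Float64}(undef, 3)     # uninitialized length-3 vector
m = Array{Int}(undef, 2, 3)      # uninitialized 2×3 matrix
# Vector{T} and Matrix{T} are aliases for 1- and 2-dimensional Array{T}
v2 = Vector{Float64}(undef, 3)
m2 = Matrix{Int}(undef, 2, 3)
```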
So far, I haven't explained what undef is. undef is a shortcut for UndefInitializer(). In this example, we wanted an uninitialized array. What does that mean? You have to understand that variables are not created ex nihilo on your terminal. They occupy a specific place in your computer's memory, and sometimes that same memory space was previously occupied by another variable. So the space my new variable takes might not be empty:
julia> a = Array{Float64}(undef, (2, 2))
2×2 Array{Float64,2}:
6.94339e-310 6.94339e-310
6.94339e-310 0.0
Here, I never asked for these values to be there. I could zero it out to get a clean variable, but that would mean writing a new value into every cell, and it is much more expensive for the computer to overwrite each value than to simply declare "here is the new variable".
So basically, undef and uninitialized arrays are used for performance. If you want a well-initialized array, use fill.
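For example, fill writes the same value into every element, so the array starts clean:

```julia
a = fill(0.0, (2, 2))   # 2×2 Matrix{Float64}, all zeros
b = fill(7, 3)          # [7, 7, 7]
c = zeros(2, 2)         # shorthand for the all-zeros case
```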
Edited for Clarity!
There are a couple of ways to build/generate an array in Julia.
I have been using the single quote or apostrophe approach for column vectors because it is quicker than multiple commas within the []'s:
julia> a = [1 2 3 4]'
4×1 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
1
2
3
4
This is generating what I believe to be a more complicated data type: "LinearAlgebra.Adjoint{Int64,Array{Int64,2}}"
In comparison to comma separated elements:
julia> a = [1,2,3,4]
4-element Array{Int64,1}:
1
2
3
4
Which generates an Array{Int64,1} type.
The Question(s):
Is the LinearAlgebra.Adjoint{...} type more computationally expensive than the base array? Should I avoid generating this type in general (i.e., outside of linear algebra modeling)?
It's possible there is a small difference that wouldn't matter at a smaller scale, but I plan to eventually perform operations on large data sets. Should I keep generating them as Array{Int64,1} types for those purposes?
Original
I've been learning Julia and I would like to develop good habits early, focusing on computational efficiency. I have been working with arrays and have gotten comfortable with the single-quote notation at the end to convert into a column vector. From what I'm understanding about the type system, this isn't just a quicker way to write the comma version. Is using the quote computationally more expensive or semantically undesirable in general? It seems it wouldn't matter with smaller data sets, but what about larger ones (e.g. 10k computations)?
Deleted original code example to avoid confusion.
Here's a performance example:
julia> a = rand(10^6);
julia> b = rand(1, 10^6)';
julia> typeof(a)
Array{Float64,1}
julia> typeof(b)
Adjoint{Float64,Array{Float64,2}}
julia> using BenchmarkTools
julia> @btime sum($a)
270.137 μs (0 allocations: 0 bytes)
500428.44363296847
julia> @btime sum($b)
1.710 ms (0 allocations: 0 bytes)
500254.2267732659
As you can see, the performance of the sum over the Vector is much better than the sum over the Adjoint (I'm actually a bit surprised about how big the difference is).
But to me the bigger reason to use Vector is that it just seems weird and unnatural to use the complicated and convoluted Adjoint type. It is also a much bigger risk that some code will not accept an Adjoint, and then you've just made extra trouble for yourself.
But, really, why do you want to use the Adjoint? Is it just to avoid writing in commas? How long are these vectors you are typing in? If vector-typing is a very big nuisance to you, you could consider writing [1 2 3 4][:] which will return a Vector. It will also trigger an extra allocation and copy, and it looks strange, but if it's a very big deal to you, maybe it's worth it.
My advice: type the commas.
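A side note not in the original answer: vec reshapes a row matrix into a Vector while sharing its memory, avoiding the extra allocation and copy that [:] makes:

```julia
b = [1 2 3 4]        # 1×4 Matrix{Int64}
a = vec(b)           # 4-element Vector{Int64} sharing b's memory
a[1] = 99            # writes through: b is now [99 2 3 4]
```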
I want to create a linearly spaced array of 10 elements between 0 and 1 in Julia. I tried the linspace command.
julia> linspace(0.0,1.0,10)
This is the output I got.
linspace(0.0,1.0,10)
I thought I was supposed to get an array as the output. I can't figure out what I'm doing wrong.
I use Julia v0.4.3 from the command line. I tried the same thing from Juno IDE and it worked fine there.
Actually, that is an array-like object! It just displays itself a little strangely because it generates its values on-the-fly as you ask for them. This is similar to ranges, wherein 1:1000000 will simply spit 1:1000000 right back at you without allocating and computing all million elements.
julia> v = linspace(0,1,10)
linspace(0.0,1.0,10)
julia> for elt in v
println(elt)
end
0.0
0.1111111111111111
0.2222222222222222
0.3333333333333333
0.4444444444444444
0.5555555555555556
0.6666666666666666
0.7777777777777778
0.8888888888888888
1.0
julia> v[3]
0.2222222222222222
The display of linspace objects has changed in the development version 0.5, precisely because others have had this same reaction. It now shows you a preview of the elements it will generate:
julia-0.5> linspace(0,1,10)
10-element LinSpace{Float64}:
0.0,0.111111,0.222222,0.333333,0.444444,0.555556,0.666667,0.777778,0.888889,1.0
julia-0.5> linspace(0,1,101)
101-element LinSpace{Float64}:
0.0,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,…,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99,1.0
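For readers on current Julia (1.x): linspace was removed; its replacements are range and LinRange, which are likewise lazy and can be materialized with collect:

```julia
v = range(0.0, 1.0, length = 10)   # lazy, like the old linspace
w = LinRange(0.0, 1.0, 10)         # a simpler lazy linear-spacing type
collect(v)                          # 10-element Vector{Float64}
```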
I am using Julia v0.3.5, which comes with WinPython 3.4.2.5 build 4. I am new to Julia. I am testing how fast Julia is compared to using SciPy's BLAS wrapper for ddot(), which has the following arguments: x, y, n, offx, incx, offy, incy. Julia's OpenBLAS library does not have the offset arguments, so I am trying to figure out how to emulate them while maximizing speed. I am passing 100MB subarrays of a 1GB array (vector) multiple times, so I don't want Julia to create a copy of each subarray, which would reduce the speed. Python's SciPy function is taking a couple of hours to execute, so I would like to optimize Julia's speed. I have been reading about how Julia 0.4 will offer array views that avoid the unnecessary copy, but I am unclear about how Julia 0.3.5 handles this.
So far, I learned using REPL that the BLAS dot() function conflicts with the method in linalg/matmul.jl. Therefore, I learned to access it this way:
import Base.LinAlg.BLAS
methods(Base.LinAlg.BLAS.dot)
From the method display, I see that I can pass pointers to x and y subarrays and thus avoid a copy. For example:
x = [1., 2., 3.]
y = [4., 5., 6.]
Base.LinAlg.BLAS.dot(2, pointer(x), 1, pointer(y), 1)
However, when I add an integer offset to a pointer (to access a subarray), REPL crashes.
How can I pass a pointer to a subarray or a subarray to Base.LinAlg.BLAS.dot without the slowdown of a copy of that subarray?
Anything else I missed?
It segfaults because pointer arithmetic doesn't work like you probably think it does (i.e. the C way). pointer(x)+1 is one byte after pointer(x), but you probably want pointer(x)+8, e.g.
Base.LinAlg.BLAS.dot(2, pointer(x)+1*sizeof(Float64), 1, pointer(y)+1*sizeof(Float64), 1)
or, more user friendly and recommended:
Base.LinAlg.dot(x,2:3,y,2:3)
which is defined here.
I'd say using pointers like that in Julia is really not recommended, but I imagine if you are doing this at all then it is a special circumstance.
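On later Julia versions, the copy-free idiom no longer needs raw pointers: view (or @view) creates a lightweight window into an array without copying, and for contiguous Float64 data dot still dispatches to BLAS. A sketch:

```julia
using LinearAlgebra

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]

# view creates a no-copy window into the arrays; dot over the views
# computes the subarray dot product directly
dot(view(x, 2:3), view(y, 2:3))   # 2*5 + 3*6 = 28.0
```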