linspace() not giving any proper result in Julia

I want to create a linearly spaced array of 10 elements between 0 and 1 in Julia. I tried the linspace command.
julia> linspace(0.0,1.0,10)
This is the output I got.
linspace(0.0,1.0,10)
I thought I was supposed to get an array as the output. I can't figure out what I'm doing wrong.
I use Julia v0.4.3 from the command line. I tried the same thing from Juno IDE and it worked fine there.

Actually, that is an array-like object! It just displays itself a little strangely because it generates its values on-the-fly as you ask for them. This is similar to ranges, wherein 1:1000000 will simply spit 1:1000000 right back at you without allocating and computing all million elements.
julia> v = linspace(0,1,10)
linspace(0.0,1.0,10)
julia> for elt in v
           println(elt)
       end
0.0
0.1111111111111111
0.2222222222222222
0.3333333333333333
0.4444444444444444
0.5555555555555556
0.6666666666666666
0.7777777777777778
0.8888888888888888
1.0
julia> v[3]
0.2222222222222222
The display of linspace objects has changed in the developmental version 0.5 precisely because others have had this same reaction, too. It now shows you a preview of the elements it will generate:
julia-0.5> linspace(0,1,10)
10-element LinSpace{Float64}:
0.0,0.111111,0.222222,0.333333,0.444444,0.555556,0.666667,0.777778,0.888889,1.0
julia-0.5> linspace(0,1,101)
101-element LinSpace{Float64}:
0.0,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,…,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99,1.0
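For readers on Julia 1.x, where linspace no longer exists, here is a minimal sketch of the equivalent. It assumes the replacement is range with the stop and length keywords; the names r and v are only illustrative.

```julia
# Assumes Julia 1.x: linspace was removed in favor of range.
r = range(0.0, stop=1.0, length=10)  # lazy range object, values are computed on demand
v = collect(r)                        # materialize into a Vector{Float64} if a real array is needed

println(typeof(r))  # some range type, not an Array
println(typeof(v))
println(r[3])       # indexing works on the lazy range too
```

As in the answer above, the range behaves like an array for iteration and indexing, so collect is only needed when code insists on a concrete Vector.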

Related

Failure to report number that is too small

I did the following calculations in Julia
z = LinRange(-0.09025000000000001,0.19025000000000003,5)
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* (similar(z) .*0 .+1))
minimum(cdf.(d, (z[3]+z[2])/2))
The problem is that the last line sometimes gives the correct result 4.418051841202834e-239 and sometimes throws the error DomainError with NaN: Normal: the condition σ >= zero(σ) is not satisfied. I think this is because 4.418051841202834e-239 is too small, but I was wondering why the same code can give different results.
In addition to points mentioned by others, here are a few more:
Firstly, don't use LinRange when numerical accuracy is of importance. This is what the range function is for. LinRange can be used when numerical precision is of lesser importance, since it is faster. From the docstring of range:
Special care is taken to ensure intermediate values are computed rationally. To avoid this induced overhead, see the LinRange constructor.
Example:
julia> LinRange(-0.09025000000000001,0.19025000000000003,5) .- range(-0.09025000000000001,0.19025000000000003,5)
0.0:-3.469446951953614e-18:-1.3877787807814457e-17
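A small sketch for checking the size of that discrepancy yourself, assuming Julia 1.x. The names lr, r and maxdiff are illustrative, and range is called with the stop and length keywords so that it also runs on older 1.x releases.

```julia
a, b, n = -0.09025000000000001, 0.19025000000000003, 5

lr = LinRange(a, b, n)             # plain linear interpolation between the endpoints
r  = range(a, stop=b, length=n)    # takes extra care with intermediate values

# Largest elementwise difference between the two constructions.
maxdiff = maximum(abs.(collect(lr) .- collect(r)))
println(maxdiff)
```

The difference is tiny, so it only matters when the last few bits of the grid points are important.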
Secondly, this is a pretty terrible way to create a vector of a certain value:
0.0051 .* (similar(z) .*0 .+1)
Others have mentioned ones, etc., but I think it's better to use fill:
fill(0.0051, size(z))
which directly fills the array with the right value. Perhaps one should use convert(eltype(z), 0.0051) inside fill.
Thirdly, don't create this vector at all! You use broadcasting, so just use the scalar value:
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051) # look! just a scalar!
This is how broadcasting works, it expands singleton dimensions implicitly to match other arguments (without actually wasting that memory).
Much of the point of broadcasting is that you don't need to create that sort of 'dummy arrays' anymore. If you find yourself doing that, give it another think; constant-valued arrays are inherently wasteful, and you shouldn't need to create them.
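To see that the scalar really is enough, here is a minimal sketch that avoids the Distributions dependency. The helper shifted_mean is hypothetical and just mirrors the arithmetic of the mean in the question; the second argument is broadcast either as a scalar or as a filled vector.

```julia
z = collect(range(-0.09025, stop=0.19025, length=5))

# Hypothetical helper mirroring the mean expression from the question.
shifted_mean(zi, sigma) = 0.05 * (1 - 0.95) + 0.95 * zi - sigma^2 / 2

with_scalar = shifted_mean.(z, 0.0051)                 # scalar is reused for every element
with_vector = shifted_mean.(z, fill(0.0051, size(z)))  # same values, but allocates a dummy array

println(with_scalar == with_vector)
```

Both calls evaluate the same function on the same pairs of numbers; the scalar form just skips the extra allocation.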
There are two problems:
Noted by @Dan Getz: similar does not initialize the values, and unused areas of memory quite often hold bit patterns corresponding to NaN. In that case multiplication by 0 does not help, since NaN * 0 is NaN. Instead you want ones(eltype(z),size(z)).
You need to use higher precision than Float64. BigFloat is one way to go; just remember to call setprecision(BigFloat, 128) so that you actually control how many bits you use. However, a much more time-efficient solution (if you run computations at scale) is to use a dedicated package such as DoubleFloats.
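For the BigFloat route, a minimal sketch assuming Julia 1.x. It uses the do-block form of setprecision so the precision change stays local; the name third is illustrative.

```julia
# The precision is set only for the duration of the do-block.
third = setprecision(BigFloat, 128) do
    BigFloat(1) / 3
end

println(precision(third))  # number of significand bits carried by this value
println(Float64(third))    # rounded back to Float64 for comparison
```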
Sample corrected code using DoubleFloats below:
julia> z = LinRange(df64"-0.09025000000000001",df64"0.19025000000000003",5)
5-element LinRange{Double64, Int64}:
-0.09025000000000001,-0.020125,0.05000000000000001,0.12012500000000002,0.19025000000000003
julia> d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* ones(eltype(z),size(z)))
5-element Vector{Normal{Double64}}:
Normal{Double64}(μ=-0.083250505, σ=0.0051)
Normal{Double64}(μ=-0.016631754999999998, σ=0.0051)
Normal{Double64}(μ=0.049986995000000006, σ=0.0051)
Normal{Double64}(μ=0.11660574500000001, σ=0.0051)
Normal{Double64}(μ=0.18322449500000001, σ=0.0051)
julia> minimum(cdf.(d, (z[3]+z[2])/2))
4.418051841203009e-239
The problem in the code is similar(z) which produces a vector with undefined entries and is used without initialization. Use ones(length(z)) instead.
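The failure mode can be sketched without Distributions, assuming Julia 1.x. The names nan_times_zero, sigma_ones and sigma_fill are illustrative.

```julia
z = LinRange(-0.09025, 0.19025, 5)

# NaN survives multiplication by zero, so "times zero, plus one" cannot
# clean up whatever garbage similar(z) happens to contain.
nan_times_zero = NaN * 0
println(isnan(nan_times_zero))

# Alternatives that always produce initialized values:
sigma_ones = 0.0051 .* ones(eltype(z), size(z))
sigma_fill = fill(0.0051, size(z))
println(sigma_ones == sigma_fill)
```

Because the contents of similar(z) depend on whatever was in memory, the original code can succeed on one run and throw on the next.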

What is the inplace version of A\b in Julia?

Long story short, A\b works well, it just takes too much memory. What's the inplace option? Is there an inplace option?
I need to solve A\b many times, and the number of allocations is giving me memory problems. I've tried gmres and similar solvers, but I'm not getting solutions that are as accurate. I've tried playing with the relative tolerance, but my solutions aren't working well. Note that A is a linear operator that doesn't get too ill-conditioned, and it is fairly sparse.
LinearSolve.jl is an interface over the linear solvers of the Julia ecosystem. Its interface internally uses the mutating forms (which are not just ldiv!, but also lu! etc. as well, which are not compatible with sparse matrices, etc.) for performance as much as possible. It connects with multiple sparse solvers, so not just the default UMFPACK, but also others like KLU and Krylov methods. It will also do other tricks that are required if you're solving a lot, like caching the symbolic factorizations, which are not necessarily well-documented. It does all of this for you by default if you use the caching interface, and the details for what this entails in all of the sparse scenarios with maximum performance is basically best described by just looking at the source code. So just use it, or look at the code.
Using LinearSolve.jl in this manner is fairly straightforward. For example you just define and solve a LinearProblem:
using LinearSolve
n = 4
A = rand(n,n)
b1 = rand(n); b2 = rand(n)
prob = LinearProblem(A, b1)
linsolve = init(prob)
sol1 = solve(linsolve)
#=
4-element Vector{Float64}:
-0.9247817429364165
-0.0972021708185121
0.6839050402960025
1.8385599677530706
=#
and then you can replace b:
linsolve = LinearSolve.set_b(sol1.cache,b2)
sol2 = solve(linsolve)
sol2.u
#=
4-element Vector{Float64}:
1.0321556637762768
0.49724400693338083
-1.1696540870182406
-0.4998342686003478
=#
or replace A and solve:
A2 = rand(n,n)
linsolve = LinearSolve.set_A(sol2.cache,A2)
sol3 = solve(linsolve)
sol3.u
#=
4-element Vector{Float64}:
-6.793605395935224
2.8673042300837466
1.1665136934977371
-0.4097250749016653
=#
and it will do the right thing: in solving those 3 systems it performs only two factorizations (it refactorizes only after A is changed). Arguments like alias_A and alias_b can be passed to ensure no extra memory is allocated (see the common solver options). With sparse matrices, this example would have performed only one symbolic factorization, 2 numerical factorizations, and 3 solves, provided A retained the same sparsity pattern.
Note that the structs used for the interface are immutable, and so within functions Julia will typically use interprocedural optimizations and escape analysis to determine that they are not needed and entirely eliminate them from the generated code.
Finally found it. LinearAlgebra package: ldiv!
One would think that would show up more readily in a google search.
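A minimal sketch of what using ldiv! looks like, assuming Julia 1.x and a small dense matrix (sparse factorizations may behave differently). The idea is to factorize once and reuse the factorization for every right-hand side; the names F, x and b2 are illustrative.

```julia
using LinearAlgebra

A = [4.0 1.0; 2.0 3.0]
b = [1.0, 2.0]

F = lu(A)        # factorize once, reuse for every right-hand side
x = similar(b)   # preallocated output buffer
ldiv!(x, F, b)   # solve A*x = b, writing the solution into x
println(x)

b2 = [0.0, 1.0]
rhs = copy(b2)
ldiv!(F, rhs)    # two-argument form overwrites rhs with the solution
println(rhs)
```

The three-argument form keeps b intact, while the two-argument form reuses the right-hand side as the output.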

How to increase Julia code performance by preventing memory allocation?

I am reading Julia performance tips,
https://docs.julialang.org/en/v1/manual/performance-tips/
At the beginning, it mentions two examples.
Example 1,
julia> x = rand(1000);
julia> function sum_global()
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;
julia> @time sum_global()
0.009639 seconds (7.36 k allocations: 300.310 KiB, 98.32% compilation time)
496.84883432553846
julia> @time sum_global()
0.000140 seconds (3.49 k allocations: 70.313 KiB)
496.84883432553846
We see a lot of memory allocations.
Now example 2,
julia> x = rand(1000);
julia> function sum_arg(x)
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;
julia> @time sum_arg(x)
0.006202 seconds (4.18 k allocations: 217.860 KiB, 99.72% compilation time)
496.84883432553846
julia> @time sum_arg(x)
0.000005 seconds (1 allocation: 16 bytes)
496.84883432553846
We see that by passing x as an argument to the function, the memory allocations almost disappear and the code runs much faster.
My questions are:
why example 1 needs so many allocations, and why example 2 does not need as many allocations as example 1?
I am a little confused.
In the two examples, we see that the second time we run the function, it is always faster than the first time.
Does that mean we need to run Julia twice? If Julia is only fast on the second run, then what is the point? Why not Julia just do a compiling first, then do a run, just like Fortran?
Is there any general rule to preventing memory allocations? Or do we just always have to do a @time to identify the issue?
Thanks!
why example 1 needs so many allocations, and why example 2 does not need as many allocations as example 1?
Example 1 needs so many allocations because x is a global variable (defined outside the scope of the function sum_global). Therefore the type of variable x can potentially change at any time, i.e. it is possible that:
1. you define x and sum_global
2. you compile sum_global
3. you redefine x (change its type) and run sum_global
In particular, as Julia supports multithreading, both actions in step 3 could in general even happen in parallel (i.e. you could change the type of x in one thread while sum_global is running in another thread).
Because the type of x can change after sum_global is compiled, Julia has to ensure, when compiling sum_global, that the compiled code does not rely on the type x had when compilation took place. Instead, Julia allows the type of x to change dynamically. This means the type of x has to be checked at run time (not compile time), and this dynamic checking causes performance degradation and memory allocations.
You could have fixed this by declaring x to be a const (as const ensures that the type of x may not change):
julia> const x = rand(1000);
julia> function sum_global()
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;
julia> @time sum_global() # this is now fast
0.000002 seconds
498.9290555615045
Why not Julia just do a compiling first, then do a run, just like Fortran?
This is exactly what Julia does. However, the benefit of Julia is that it compiles automatically when needed. This allows for a smooth interactive development process.
If you wanted you could compile the function before it is run with the precompile function, and then run it separately. However, normally people just run the function without doing it explicitly.
The consequence is that if you use @time:
The first time you run a function, it reports both execution time and compilation time (and, as you can see in the examples you pasted, it tells you what percentage of the time was spent on compilation).
In the consecutive runs the function is already compiled so only execution time is returned.
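The precompile route mentioned above can be sketched as follows, assuming Julia 1.x. The function is the sum_arg from the question; compiled, first_call and second_call are illustrative names, and the timings will vary from machine to machine.

```julia
function sum_arg(x)
    s = 0.0
    for i in x
        s += i
    end
    return s
end

# Ask Julia to compile the method for Vector{Float64} arguments before the first call.
compiled = precompile(sum_arg, (Vector{Float64},))

x = rand(1000)
first_call  = @elapsed sum_arg(x)   # should no longer include compiling sum_arg itself
second_call = @elapsed sum_arg(x)
println((compiled, first_call, second_call))
```

In interactive use this is rarely necessary, since calling the function once has the same effect.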
Is there any general rule to preventing memory allocations?
These rules are given in the Performance Tips section of the manual that you quote in your question. The tip on using @time is a diagnostic tip there. All the other tips are rules recommended for getting fast code. However, I understand that the list is long, so a shorter list that in my experience is good enough to start with is:
Avoid global variables
Avoid containers with abstract type parameters
Write type stable functions
Avoid changing the type of a variable
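Two of these rules are easy to illustrate with a small sketch, assuming Julia 1.x. The names abstract_vec, concrete_vec, unstable_relu and stable_relu are hypothetical and chosen only for this example.

```julia
# Containers: an abstract element type forces boxing, a concrete one does not.
abstract_vec = Real[1.0, 2.0, 3.0]
concrete_vec = Float64[1.0, 2.0, 3.0]
println(isconcretetype(eltype(abstract_vec)))
println(isconcretetype(eltype(concrete_vec)))

# Type stability: the return type should depend only on the argument types.
unstable_relu(x) = x > 0 ? x : 0         # returns Int for non-positive Float64 input
stable_relu(x)   = x > 0 ? x : zero(x)   # always returns the type of x

println(typeof(unstable_relu(-1.5)))
println(typeof(stable_relu(-1.5)))
```

In practice, @code_warntype is the usual tool for spotting the unstable case in real functions.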

Type Efficiency: Array{Int64, 1} VERSUS LinearAlgebra.Adjoint{ Int64, Array{Int64, 1}}

Edited for Clarity!
There are a couple of ways to build/generate an array in Julia.
I have been using the single quote or apostrophe approach for column vectors because it is quicker than multiple commas within the []'s:
julia> a = [1 2 3 4]'
4×1 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
1
2
3
4
This generates what I believe to be a more complicated data type: "LinearAlgebra.Adjoint{Int64,Array{Int64,2}}"
In comparison to comma separated elements:
julia> a = [1,2,3,4]
4-element Array{Int64,1}:
1
2
3
4
Which generates an Array{Int64,1} type.
The Question(s):
Is the LinearAlgebra.Adjoint{...} type more computationally expensive than the base array? Should I avoid generating this array in a general sense? (i.e. outside modeling linear algebra)
It's possible there is a small difference that wouldn't matter on a smaller scope, but I plan to eventually perform operations on large data sets. Should I try to keep consistent with generating them as Array{Int64,1} types for these purposes?
Original
I've been learning Julia and I would like to develop good habits early, focusing on computational efficiency. I have been working with arrays and have gotten comfortable with the single quote notation at the end to convert into a column vector. From what I understand about the type system, this isn't just a quicker alternative to typing commas. Is using the comma computationally more expensive or semantically undesirable in general? It seems it wouldn't matter with smaller data sets, but what about larger data sets? (e.g. 10k computations)
Deleted original code example to avoid confusion.
Here's a performance example:
julia> a = rand(10^6);
julia> b = rand(1, 10^6)';
julia> typeof(a)
Array{Float64,1}
julia> typeof(b)
Adjoint{Float64,Array{Float64,2}}
julia> using BenchmarkTools
julia> @btime sum($a)
270.137 μs (0 allocations: 0 bytes)
500428.44363296847
julia> @btime sum($b)
1.710 ms (0 allocations: 0 bytes)
500254.2267732659
As you can see, the performance of the sum over the Vector is much better than the sum over the Adjoint (I'm actually a bit surprised about how big the difference is).
But to me the bigger reason to use Vector is that it just seems weird and unnatural to use the complicated and convoluted Adjoint type. It is also a much bigger risk that some code will not accept an Adjoint, and then you've just made extra trouble for yourself.
But, really, why do you want to use the Adjoint? Is it just to avoid writing in commas? How long are these vectors you are typing in? If vector-typing is a very big nuisance to you, you could consider writing [1 2 3 4][:] which will return a Vector. It will also trigger an extra allocation and copy, and it looks strange, but if it's a very big deal to you, maybe it's worth it.
My advice: type the commas.
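If a row-style literal is already in hand, one more option is sketched below, assuming Julia 1.x. It uses vec, which reshapes the matrix into a plain Vector without going through Adjoint; the names row, col_lazy and col_vec are illustrative.

```julia
using LinearAlgebra

row = [1 2 3 4]        # 1×4 Matrix{Int}

col_lazy = row'        # lazy Adjoint wrapper around the matrix
col_vec  = vec(row)    # plain Vector{Int} with the same elements

println(typeof(col_lazy))
println(typeof(col_vec))
println(col_vec == [1, 2, 3, 4])
```

This still builds the 1×4 matrix first, so typing the commas remains the most direct route.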

setindex! error in Julia

I have been writing scripts in Julia recently, and have run across a problem using the setindex! function that I cannot find an answer to in any documentation (I have also searched stackoverflow, but could not find an answer - my apologies if my search was not good enough and I am repeating a question).
I am getting a MethodError relating to set index with code similar to the following (the error also appears in this code, which is altered simply to make it simpler):
a = 0:0.01:1
a = 2 * pi * (a - 0.4)
a[abs(a) .> pi] += - sign(a[a .> pi]) * 2 * pi
I realize that in the above code I could achieve a similar effect by simply changing the initial expression used to generate a so that it is never greater than pi in magnitude, but in the original code this would be much less readable due to intermediate steps that are not included. Additionally, regardless of whether that is possible with this particular problem, there will be other instances using setindex! similarly which I would like to have a solution to.
I have tried using integer indexes instead of logical indexes and have tried storing the logical or integer index as another value. Neither has worked. I would guess this is coming from a fairly basic misunderstanding on my part, but thought this would be a good resource for help.
If this post is non-standard for stackoverflow in any way, I apologize; this is my first (and I did read the guidelines, but may not have perfectly implemented them).
Thanks in advance
You haven't materialized the FloatRange into an Array, so there aren't really any indices to play with yet. It's just a rangelike object:
julia> a = 0:0.01:1
0.0:0.01:1.0
julia> a = 2 * pi * (a - 0.4)
-2.5132741228718345:0.06283185307179587:3.769911184307752
julia> dump(a)
FloatRange{Float64}
start: Float64 -251.32741228718345
step: Float64 6.283185307179586
len: Float64 101.0
divisor: Float64 100.0
Compare with:
julia> a = [a]
101-element Array{Float64,1}:
-2.51327
-2.45044
-2.38761
[...]
3.64425
3.70708
3.76991
after which
julia> maximum(a)
3.769911184307752
julia> a[abs(a) .> pi] += - sign(a[a .> pi]) * 2 * pi;
julia> maximum(a)
3.141592653589793
It's the difference between
julia> 1:2:9
1:2:9
julia> [1:2:9]
5-element Array{Int32,1}:
1
3
5
7
9
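The transcript above is from an old Julia version. A rough sketch of the same fix for Julia 1.x, under the assumptions that collect replaces the [a] idiom (which now builds a one-element vector holding the range) and that elementwise operations need explicit dots. The name mask is illustrative; a single mask is used on both sides of the update so the shapes always match.

```julia
a = collect(0:0.01:1)        # materialize the range into a Vector{Float64}
a = 2pi .* (a .- 0.4)

mask = abs.(a) .> pi                   # elements outside [-pi, pi]
a[mask] .-= sign.(a[mask]) .* 2pi      # wrap them back by one full turn

println(maximum(a))
println(minimum(a))
```

Since a is now a real Vector, the indexed assignment has memory to write into, which is exactly what the range object lacked.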
