Customizing Julia (as in the language) Set

Customizing Julia (as in the language) Set - julia

I've got an application where Set semantics would be useful for me, but I'm dealing with floating point numbers. I'd like to have a criterion for inclusion in the set that isn't exact equality - that is, if my Set already contains 0.5, I'd like it to ignore 0.50000000000000000018 if there's an attempt to add it. Is there existing machinery that allows this?

As long as you don't need to keep the precision up in the Set, you could use a Set{Float16} or Set{Float32}:
julia> set = Set{Float16}()
Set{Float16}()
julia> push!(set, Float16(2.3))
Set{Float16} with 1 element:
Float16(2.3)
julia> push!(set, Float16(2.300000001))
Set{Float16} with 1 element:
Float16(2.3)

Related

Failure to report number that is too small

I did the following calculations in Julia
z = LinRange(-0.09025000000000001,0.19025000000000003,5)
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* (similar(z) .*0 .+1))
minimum(cdf.(d, (z[3]+z[2])/2))
The problem I have is that the last code sometimes gives me the correct result 4.418051841202834e-239, sometimes reports the error DomainError with NaN: Normal: the condition σ >= zero(σ) is not satisfied. I think this is because 4.418051841202834e-239 is too small. But I was wondering why my code can give me different results.

In addition to points mentioned by others, here are a few more:
Firstly, don't use LinRange when numerical accuracy is of importance. This is what the range function is for. LinRange can be used when numerical precision is of lesser importance, since it is faster. From the docstring of range:
Special care is taken to ensure intermediate values are computed rationally. To avoid this induced overhead, see the LinRange constructor.
Example:
julia> LinRange(-0.09025000000000001,0.19025000000000003,5) .- range(-0.09025000000000001,0.19025000000000003,5)
0.0:-3.469446951953614e-18:-1.3877787807814457e-17
Secondly, this is a pretty terrible way to create a vector of a certain value:
0.0051 .* (similar(z) .*0 .+1)
Other's have mentioned ones, etc. but I think it's better to use fill
fill(0.0051, size(z))
which directly fills the array with the right value. Perhaps one should use convert(eltype(z), 0.0051) inside fill.
Thirdly, don't create this vector at all! You use broadcasting, so just use the scalar value:
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051) # look! just a scalar!
This is how broadcasting works, it expands singleton dimensions implicitly to match other arguments (without actually wasting that memory).
Much of the point of broadcasting is that you don't need to create that sort of 'dummy arrays' anymore. If you find yourself doing that, give it another think; constant-valued arrays are inherently wasteful, and you shouldn't need to create them.

There are two problems:
Noted by #Dan Getz: similar does no initialize the values and quite often unused areas of memory have values corresponding to NaN. In that case multiplication by 0 does not help since NaN * 0 == NaN. Instead you want to have ones(eltype(z),size(z))
you need to use higher precision than Float64. BigFloat is one way to go - just you need to remember to call setprecision(BigFloat, 128) so you actually control how many bits you use. However, much more time-efficient solution (if you run computations at scale) will be to use a dedicated package such as DoubleFloats.
Sample corrected code using DoubleFloats below:
julia> z = LinRange(df64"-0.09025000000000001",df64"0.19025000000000003",5)
5-element LinRange{Double64, Int64}:
-0.09025000000000001,-0.020125,0.05000000000000001,0.12012500000000002,0.19025000000000003
julia> d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* ones(eltype(z),size(z)))
5-element Vector{Normal{Double64}}:
Normal{Double64}(μ=-0.083250505, σ=0.0051)
Normal{Double64}(μ=-0.016631754999999998, σ=0.0051)
Normal{Double64}(μ=0.049986995000000006, σ=0.0051)
Normal{Double64}(μ=0.11660574500000001, σ=0.0051)
Normal{Double64}(μ=0.18322449500000001, σ=0.0051)
julia> minimum(cdf.(d, (z[3]+z[2])/2))
4.418051841203009e-239

The problem in the code is similar(z) which produces a vector with undefined entries and is used without initialization. Use ones(length(z)) instead.

Can I define variable in Julia just like in Fortran

I am new to Julia. Have a quick question.
In Fortran, we can do
implicit none
Therefore whenever we use a variable, we need to define it first. Otherwise it will give error.
In Julia, I want to do the same time thing. I want to define the type of each variable first, like Float64, Int64, etc. So that I wish that Julia no longer need to automatically do type conversion, which may slow down the code.
Because I know if Julia code looks like Fortran it usually runs fast.
So, in short, is there a way in Julia such that, I can force it to only use variables whose types has been defined, and otherwise give me an error message? I just want to be strict.
I just wanted to define all the variables first, then use them. Just like Fortran does.
Thanks!
[Edit]
As suggested by the experts who answers the questions, thank you very much! I know that perhaps in Julia there is no need to manually define each variables one by one. Because for one thing, if I do that, the code will become just like Fortran, and it can be very very long especially if I have many variables.
But if I do not define the type of the each variables, is Julia smart enough to know the type? Will it do some conversions which may slow down the code?
Also, even if there is no need to define variables one by one, is there in some situations we may have to manually define the type manually?

No, this is not possible* as such. You can, however, add type annotations at any point in your code, which will raise an error if the variable is not of the expected type:
julia> i = 1
1
julia> i::Int64
1
julia> i = 1.0
1.0
julia> i::Int64
ERROR: TypeError: in typeassert, expected Int64, got a value of type Float64
Stacktrace:
[1] top-level scope
# REPL[4]:1
julia> i=0x01::UInt8
0x01
*Julia is a dynamically-typed language, though there are packages such as https://github.com/aviatesk/JET.jl for static type-checking (i.e., no type annotations required) and https://github.com/JuliaDebug/Cthulhu.jl for exploring Julia's type inference system.

Strict type declaration is not how you achieve performance in Julia.
You achieve the performance by type stability.
Moreover declaring the types is usually not recommended because it decreases interoperability.
Consider the following function:
f(x::Int) = rand() < 4+x ? 1 : "0"
The types of everything seem to be known. However, this is a terrible (from performance point of view) function because the type of output can not be calcuated by looking types of input. And this is exactly how you write the performant code with regard to types.
So how to check your code? There is a special macro #code_warntype to detect such cases so you can correct your code:
Another type related issue are the containers that should not be specified by abstract elements. So you never want to have neither Vector{Any} nor Vector{Real} - rather than that you want Vector{Int} or Vector{Float64}.
See also https://docs.julialang.org/en/v1/manual/performance-tips/ for further discussion.

How do I trim a slice and get the indices (indexes) of the result?

This seems like a simple problem:
let slice = " some wacky text. ";
let trimmed = slice.trim();
// how do I get the index of the start and end within the original slice?
Attempt 1
Look for an alternative API. trim wraps trim_matches which deals with indices internally anyway: so lets copy this code! But this uses std::str::pattern::Pattern which is unstable, thus can't be used outside std in stable Rust.
Attempt 2
Just use trim and calculate the slice indices from the pointers. There's a nice as_ptr_range method, but its also unstable; luckily as the PR says there's an easy work-around.
let slice_ptr = slice.as_ptr();
let trimmed_ptr = trimmed.as_ptr();
// don't bother about the end (we can use trimmed.len())
Now that we've got some pointers, we need their difference. sub is not the right method for this. offset_from is, but it's unstable (as noted in the design, it's only valid use is to compare two pointers into the same slice, which is exactly what we want to do, unfortunately it's yet another thing delayed by the details).
Now, there are hackier ways of solving this problem. We could transmute the pointers to usize (we know the element size is 1 byte, so no need to multiply). But this is most likely the Undefined Behaviour type of unsafe, so lets not go there.
Attempt 3
Edit: the source problem is easy to solve directly, so probably the answer in this case is roll-my-own. Possibly I should just close this.

Julia JuMP returns Binary variable at 0.99

I cannot put the whole code here, and was not able to reproduce the problem with a small code, but here is the beginning of the code:
using JuMP, Cbc, StatsBase
n = 3;
V = 1:(2n+1);
model = Model(with_optimizer(Cbc.Optimizer, seconds=120));
#variable(model, x[V], Bin);
...
#objective(model, Min, total_blah);
JuMP.optimize!(model)
result = termination_status(model)
JuMP.objective_value(model)
xsol = JuMP.value.(x);
The problem I have is that the solver returns a solution where some of the xsol have values like 0.99995, where I am expecting Binary, ie either 0 or 1.
Can someone explain this behavior?

I looked this up and CBC has an option called integerTolerance (or integerT) that helps CBC to decide whether a variable is integer valued. Using CBC.exe, I see:
Coin:integerTolerance
integerTolerance has value 1e-006
Indeed the default is 1e-6. You cannot set it to zero but you can make it smaller (valid range is 1e-020 to 0.5). (The only solver I know of that allows this tolerance to be set to zero is Cplex; usually doing that leads to longer solution times).
In general I would advice to keep it as it is. If small deviations from integer values irritate you, I would round integer variables in the solution before printing. This gives better looking solutions (but this rounding step may make the solution slightly infeasible).

Converting a Gray-Scale Array to a FloatingPoint-Array

I am trying to read a .tif-file in julia as a Floating Point Array. With the FileIO & ImageMagick-Package I am able to do this, but the Array that I get is of the Type Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8,8}},2}.
I can convert this FixedPoint-Array to Float32-Array by multiplying it with 255 (because UInt8), but I am looking for a function to do this for any type of FixedPointNumber (i.e. reinterpret() or convert()).
using FileIO
# Load the tif
obj = load("test.tif");
typeof(obj)
# Convert to Float32-Array
objNew = real.(obj) .* 255
typeof(objNew)
The output is
julia> using FileIO
julia> obj = load("test.tif");
julia> typeof(obj)
Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8,8}},2}
julia> objNew = real.(obj) .* 255;
julia> typeof(objNew)
Array{Float32,2}
I have been looking in the docs quite a while and have not found the function with which to convert a given FixedPoint-Array to a FloatingPont-Array without multiplying it with the maximum value of the Integer type.
Thanks for any help.
edit:
I made a small gist to see if the solution by Michael works, and it does. Thanks!
Note:I don't know why, but the real.(obj) .* 255-code does not work (see the gist).

Why not just Float32.()?
using ColorTypes
a = Gray.(convert.(Normed{UInt8,8}, rand(5,6)));
typeof(a)
#Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8,8}},2}
Float32.(a)

The short answer is indeed the one given by Michael, just use Float32.(a) (for grayscale). Another alternative is channelview(a), which generally performs channel separation thus also stripping the color information from the array. In the latter case you won't get a Float32 array, because your image is stored with 8 bits per pixel, instead you'll get an N0f8 (= FixedPointNumbers.Normed{UInt8,8}). You can read about those numbers here.
Your instinct to multiply by 255 is natural, given how other image-processing frameworks work, but Julia has made some effort to be consistent about "meaning" in ways that are worth taking a moment to think about. For example, in another programming language just changing the numerical precision of an array:
img = uint8(255*rand(10, 10, 3)); % an 8-bit per color channel image
figure; image(img)
imgd = double(img); % convert to double-precision, but don't change the values
figure; image(imgd)
produces the following surprising result:
That second "all white" image represents saturation. In this other language, "5" means two completely different things depending on whether it's stored in memory as a UInt8 vs a Float64. I think it's fair to say that under any normal circumstances, a user of a numerical library would call this a bug, and a very serious one at that, yet somehow many of us have grown to accept this in the context of image processing.
These new types arise because in Julia we've gone to the effort to implement new numerical types (FixedPointNumbers) that act like fractional values (e.g., between 0 and 1) but are stored internally with the same bit pattern as the "corresponding" UInt8 (the one you get by multiplying by 255). This allows us to work with 8-bit data and yet allow values to always be interpreted on a consistent scale (0.0=black, 1.0=white).

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Customizing Julia (as in the language) Set - julia

Related

Failure to report number that is too small

Can I define variable in Julia just like in Fortran

How do I trim a slice and get the indices (indexes) of the result?

Julia JuMP returns Binary variable at 0.99

Converting a Gray-Scale Array to a FloatingPoint-Array

Categories

Resources