Converting a Gray-Scale Array to a FloatingPoint-Array - julia

I am trying to read a .tif-file in julia as a Floating Point Array. With the FileIO & ImageMagick-Package I am able to do this, but the Array that I get is of the Type Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8,8}},2}.
I can convert this FixedPoint-Array to a Float32-Array by multiplying it by 255 (because of UInt8), but I am looking for a function that does this for any type of FixedPointNumber (e.g. reinterpret() or convert()).
using FileIO
# Load the tif
obj = load("test.tif");
typeof(obj)
# Convert to Float32-Array
objNew = real.(obj) .* 255
typeof(objNew)
The output is
julia> using FileIO
julia> obj = load("test.tif");
julia> typeof(obj)
Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8,8}},2}
julia> objNew = real.(obj) .* 255;
julia> typeof(objNew)
Array{Float32,2}
I have been looking in the docs for quite a while and have not found a function that converts a given FixedPoint-Array to a FloatingPoint-Array without multiplying it by the maximum value of the integer type.
Thanks for any help.
edit:
I made a small gist to see if the solution by Michael works, and it does. Thanks!
Note: I don't know why, but the real.(obj) .* 255 code does not work (see the gist).

Why not just Float32.()?
using ColorTypes, FixedPointNumbers  # Normed is defined in FixedPointNumbers
a = Gray.(convert.(Normed{UInt8,8}, rand(5,6)));
typeof(a)
#Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8,8}},2}
Float32.(a)

The short answer is indeed the one given by Michael, just use Float32.(a) (for grayscale). Another alternative is channelview(a), which generally performs channel separation thus also stripping the color information from the array. In the latter case you won't get a Float32 array, because your image is stored with 8 bits per pixel, instead you'll get an N0f8 (= FixedPointNumbers.Normed{UInt8,8}). You can read about those numbers here.
Your instinct to multiply by 255 is natural, given how other image-processing frameworks work, but Julia has made some effort to be consistent about "meaning" in ways that are worth taking a moment to think about. For example, in another programming language just changing the numerical precision of an array:
img = uint8(255*rand(10, 10, 3)); % an 8-bit per color channel image
figure; image(img)
imgd = double(img); % convert to double-precision, but don't change the values
figure; image(imgd)
produces the following surprising result:
That second "all white" image represents saturation. In this other language, "5" means two completely different things depending on whether it's stored in memory as a UInt8 vs a Float64. I think it's fair to say that under any normal circumstances, a user of a numerical library would call this a bug, and a very serious one at that, yet somehow many of us have grown to accept this in the context of image processing.
These new types arise because in Julia we've gone to the effort to implement new numerical types (FixedPointNumbers) that act like fractional values (e.g., between 0 and 1) but are stored internally with the same bit pattern as the "corresponding" UInt8 (the one you get by multiplying by 255). This allows us to work with 8-bit data and yet allow values to always be interpreted on a consistent scale (0.0=black, 1.0=white).
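The relationship described above between the stored byte and the value it represents can be sketched outside Julia. The following Python is only an illustration of the idea, not the FixedPointNumbers implementation; the helper name normed_to_float is made up for this example, and it assumes an unsigned integer whose bits are all fraction bits (as in Normed{UInt8,8}).

```python
def normed_to_float(raw: int, bits: int = 8) -> float:
    """Interpret an unsigned `bits`-bit integer as a fraction in [0, 1].

    Hypothetical helper: it mirrors the idea behind Normed{UInt8,8}, where
    the raw value 255 stands for 1.0, but it is not the library's code.
    """
    max_raw = (1 << bits) - 1          # 255 for 8 bits, 65535 for 16 bits
    if not 0 <= raw <= max_raw:
        raise ValueError(f"raw value {raw} does not fit in {bits} bits")
    return raw / max_raw


# The same "white" pixel stored at two bit depths maps to the same value.
white_8 = normed_to_float(255, bits=8)
white_16 = normed_to_float(65535, bits=16)
mid_gray = normed_to_float(128, bits=8)
```

Dividing by the maximum raw value, rather than hard-coding 255, is what lets one conversion serve every bit depth, which is the property the question was asking for.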

Related

Failure to report number that is too small

I did the following calculations in Julia
z = LinRange(-0.09025000000000001,0.19025000000000003,5)
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* (similar(z) .*0 .+1))
minimum(cdf.(d, (z[3]+z[2])/2))
The problem I have is that the last code sometimes gives me the correct result 4.418051841202834e-239, sometimes reports the error DomainError with NaN: Normal: the condition σ >= zero(σ) is not satisfied. I think this is because 4.418051841202834e-239 is too small. But I was wondering why my code can give me different results.
In addition to points mentioned by others, here are a few more:
Firstly, don't use LinRange when numerical accuracy is of importance. This is what the range function is for. LinRange can be used when numerical precision is of lesser importance, since it is faster. From the docstring of range:
Special care is taken to ensure intermediate values are computed rationally. To avoid this induced overhead, see the LinRange constructor.
Example:
julia> LinRange(-0.09025000000000001,0.19025000000000003,5) .- range(-0.09025000000000001,0.19025000000000003,5)
0.0:-3.469446951953614e-18:-1.3877787807814457e-17
Secondly, this is a pretty terrible way to create a vector of a certain value:
0.0051 .* (similar(z) .*0 .+1)
Others have mentioned ones, etc., but I think it's better to use fill:
fill(0.0051, size(z))
which directly fills the array with the right value. Perhaps one should use convert(eltype(z), 0.0051) inside fill.
Thirdly, don't create this vector at all! You use broadcasting, so just use the scalar value:
d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051) # look! just a scalar!
This is how broadcasting works, it expands singleton dimensions implicitly to match other arguments (without actually wasting that memory).
Much of the point of broadcasting is that you don't need to create that sort of 'dummy arrays' anymore. If you find yourself doing that, give it another think; constant-valued arrays are inherently wasteful, and you shouldn't need to create them.
There are two problems:
Noted by @Dan Getz: similar does not initialize the values, and quite often unused areas of memory hold bit patterns that correspond to NaN. In that case multiplying by 0 does not help, since NaN * 0 is still NaN. Instead you want ones(eltype(z), size(z)).
You need to use higher precision than Float64. BigFloat is one way to go; just remember to call setprecision(BigFloat, 128) so that you actually control how many bits you use. However, a much more time-efficient solution (if you run computations at scale) is to use a dedicated package such as DoubleFloats.
Sample corrected code using DoubleFloats below:
julia> z = LinRange(df64"-0.09025000000000001",df64"0.19025000000000003",5)
5-element LinRange{Double64, Int64}:
-0.09025000000000001,-0.020125,0.05000000000000001,0.12012500000000002,0.19025000000000003
julia> d = Normal.(0.05*(1-0.95) .+ 0.95.*z .- 0.0051^2/2, 0.0051 .* ones(eltype(z),size(z)))
5-element Vector{Normal{Double64}}:
Normal{Double64}(μ=-0.083250505, σ=0.0051)
Normal{Double64}(μ=-0.016631754999999998, σ=0.0051)
Normal{Double64}(μ=0.049986995000000006, σ=0.0051)
Normal{Double64}(μ=0.11660574500000001, σ=0.0051)
Normal{Double64}(μ=0.18322449500000001, σ=0.0051)
julia> minimum(cdf.(d, (z[3]+z[2])/2))
4.418051841203009e-239
The problem in the code is similar(z) which produces a vector with undefined entries and is used without initialization. Use ones(length(z)) instead.
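Two of the points made in these answers can be checked in any language with IEEE 754 doubles. The Python below is a rough sketch, not a port of Distributions.jl: normal_cdf is a hypothetical helper written with math.erfc, and the numbers are copied from the question and from the output shown above. It suggests that NaN survives multiplication by zero, and that a result near 4.4e-239 is still well above the smallest normal double, so the intermittent error is consistent with uninitialized memory rather than with the value being too small.

```python
import math
import sys


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of Normal(mu, sigma), using erfc so the lower tail keeps precision."""
    if not sigma > 0.0:                 # this comparison also rejects NaN
        raise ValueError("sigma must be positive and not NaN")
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))


nan = float("nan")
nan_times_zero = nan * 0.0              # still NaN, so "garbage * 0 + 1" can stay NaN

# Values taken from the question: z[2], z[3] and the largest mean in d.
z2, z3 = -0.020125, 0.05000000000000001
tail = normal_cdf((z3 + z2) / 2, mu=0.18322449500000001, sigma=0.0051)

smallest_normal = sys.float_info.min    # about 2.2e-308
```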

Strange pair of type declarations

Okay, so it may not be that strange, but I'm really new to Ada. In my job, I am translating legacy Ada to C, and have come across something that I haven't seen yet. I searched around, but couldn't really find it; here it is.
type Discrete_Names is ( ENUM_POS_4, --label names in an enum
ENUM_POS_5, --that evaluate to 4, 5, and 6
ENUM_POS_6); --respectively
type Discrete_Array_Type is Array (Discrete_Names) of Discrete.Does_Not_Matter
Side note—the Discrete.Does_Not_Matter just references another type in a different library.
It would be great if someone could just help me get my bearings and just figure out what is going on here.
Well, it is quite simple. In Ada arrays can be indexed by any discrete type, that is, integers, characters or enumeration types (your case). The line
type Discrete_Array_Type is Array (Discrete_Names) of Does_Not_Matter
declares Discrete_Array_Type as the type of an array that contains values of type Does_Not_Matter and it is indexed by values of type Discrete_Names.
If your doubt stems from the fact that ENUM_POS_4 has Pos equal to 4 -- so that it seems that the first index of the array is 4 and not 0 -- my suggestion is... forget about it. The compiler will take care of that. In Ada arrays can start from any index. For example, if you say
type Array_Foo is array (Positive range <>) of Character;
Bar : Array_Foo(10..15);
Bar will be just 6 entries long (not 16), and when you access Bar(12) the compiler, behind the scenes, will subtract the lower bound 10 from 12, so that you access the third memory location reserved for Bar. (Actually, I think that for the sake of efficiency it will index from a base address equal to the address of Bar diminished by 10 times the element size, but this is a detail...)
My personal experience is that in cases like this you should not think of the enumeration type as an "integer in disguise" (although it will be represented internally by an integer), but as a type of its own that can be used to index an array. Let the compiler worry about the internal low-level details.
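The index arithmetic described in this answer can be made explicit in a language without custom array bounds. The following Python is a sketch only; OffsetArray is an invented name, and it shows the bookkeeping that the Ada compiler does on your behalf (and that a translation to a zero-based language would have to do by hand).

```python
class OffsetArray:
    """Fixed-size array whose valid indices run from `first` to `last` inclusive."""

    def __init__(self, first: int, last: int, fill=None):
        self.first = first
        self.last = last
        self._items = [fill] * (last - first + 1)

    def _position(self, index: int) -> int:
        if not self.first <= index <= self.last:
            raise IndexError(f"index {index} outside {self.first}..{self.last}")
        return index - self.first      # the offset the Ada compiler removes for you

    def __getitem__(self, index: int):
        return self._items[self._position(index)]

    def __setitem__(self, index: int, value) -> None:
        self._items[self._position(index)] = value

    def __len__(self) -> int:
        return len(self._items)


bar = OffsetArray(10, 15, fill=" ")    # like Bar : Array_Foo(10..15)
bar[12] = "x"                          # lands in the third storage slot
```

An array indexed by an enumeration works the same way, with the position of each enumeration literal playing the role of `index - first`.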

Convert RGBA{U8}(0.384,0.0,0.0,1.0) to Integer

I am using Images.jl in Julia. I am trying to convert an image into a graph-like data structure (v,w,c) where
v is a node
w is a neighbor and
c is a cost function
I want to assign a high cost to neighbors that do not have the same color. However, when I load an image, each pixel has a type like RGBA{U8}(1.0,1.0,1.0,1.0). Is there any way to convert this into a number such as an Int64 or a Float?
If all you want to do is penalize adjacent pairs that have different color values (no matter how small the difference), I think img[i,j] != img[i+1,j] should be sufficient, and infinitely more performant than calling colordiff.
Images.jl also contains methods, raw and separate, that allow you to "convert" that image into a higher-dimensional array of UInt8. However, for your apparent application this will likely be more of a pain, because you'll have to choose between using a syntax like A[:, i, j] != A[:, i+1, j] (which will allocate memory and have much worse performance) or write out loops and check each color channel manually. Then there's always the slight annoyance of having to special case your code for grayscale and color, wondering what a 3d array really means (is it 3d grayscale or 2d with a color channel?), and wondering whether the color channel is stored as the first or last dimension.
None of these annoyances arise if you just work with the data directly in RGBA format. For a little more background, they are examples of Julia's "immutable" objects, which have at least two advantages. First, they allow you to clearly specify the "meaning" of a certain collection of numbers (in this case, that these 4 numbers represent a color, in a particular colorspace, rather than, say, pressure readings from a sensor)---that means you can write code that isn't forced to make assumptions that it can't enforce. Second, once you learn how to use them, they make your code much prettier all while providing fantastic performance.
The color types are documented here.
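The suggestion to compare whole pixels instead of unpacking channels can be sketched in a language-neutral way. The Python below is an illustration and not Images.jl code: pixels are modelled as plain (r, g, b, a) tuples, and edge_costs and its two cost values are made up for this example.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int, int]          # (r, g, b, a), one byte per channel
Node = Tuple[int, int]                     # (row, column)


def edge_costs(img: List[List[Pixel]], same: float = 1.0,
               different: float = 1000.0) -> Dict[Tuple[Node, Node], float]:
    """Cost for each right/down neighbour pair; unequal colours are expensive."""
    costs = {}
    rows, cols = len(img), len(img[0])
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((0, 1), (1, 0)):            # right and down neighbours
                ni, nj = i + di, j + dj
                if ni < rows and nj < cols:
                    equal = img[i][j] == img[ni][nj]   # whole-pixel comparison
                    costs[((i, j), (ni, nj))] = same if equal else different
    return costs


white = (255, 255, 255, 255)
dark_red = (98, 0, 0, 255)                 # roughly RGBA{U8}(0.384, 0.0, 0.0, 1.0)
costs = edge_costs([[white, white],
                    [white, dark_red]])
```

Each (v, w) key together with its value gives the (v, w, c) triple from the question, and no pixel is ever converted to a number.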
Might I recommend converting each pixel to greyscale if all you want is a magnitude difference.
See this answer for a how-to:
Converting RGB to grayscale/intensity
This will give you a single value for intensity that you can then use to compare.
Following @daycaster's suggestion, colordiff from Colors.jl can be used.
colordiff takes two colors as arguments. To use it, you should extract the color part of the pixel with color, i.e. colordiff(color(v), color(w)), where v would be a value such as RGBA{U8}(0.384,0.0,0.0,1.0).

SML fibonacci large numbers

I tried to write my own fib function that works for large numbers (over 50) and I had no luck. First I tried the obvious solution, but that overflows way too quickly. My next solution was this:
fun fib (a : int, b : int, index : int) =
  if index = 1 then a + b
  else fib (b, a + b, index - 1);
Unfortunately this also overflows.
You need to take a look at the IntInf module, which provides access to arbitrary precision integers.
You can convert from Int.int to IntInf.int using IntInf.fromInt.
Note that for any operations you do on them, you have to use IntInf.<operation> instead of the Int counterpart. This includes things like addition and the like.
Note that in Poly/ML, both structure Int and IntInf offer unbounded (big) integers by default. Since the implementation uses the GNU MP library at the bottom of it, and small machine integers in the range where this is still possible, it is also quite fast.
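To see what the accumulator version produces once the integer type no longer overflows, here is the same recursion written as a loop in Python, whose built-in int is arbitrary precision (comparable in spirit to IntInf.int). This is a sketch for comparison, not SML; the starting values 0 and 1 are an assumption, since the question does not show how fib is called.

```python
def fib(a: int, b: int, index: int) -> int:
    """Iterative form of the question's accumulator recursion (index >= 1)."""
    while index > 1:
        a, b = b, a + b
        index -= 1
    return a + b


fib_100 = fib(0, 1, 99)        # the 100th Fibonacci number, far beyond 64 bits
```

With these starting values, fib(0, 1, n) returns the (n+1)-th Fibonacci number, so the result first exceeds a signed 64-bit integer at n = 92.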

Stackoverflow with specialized Hashtbl (via Hashtbl.make)

The following piece of code triggers a stack overflow. If I use Extlib's Hashtbl instead, the error does not occur. Any hints on how to use a specialized Hashtbl without the stack overflow?
module ColorIdxHash = Hashtbl.Make(
  struct
    type t = Img_types.rgb_t
    let equal = (==)
    let hash = Hashtbl.hash
  end
)
(* .. *)
let (ctable: int ColorIdxHash.t) = ColorIdxHash.create 256 in
for x = 0 to width - 1 do
  for y = 0 to height - 1 do
    let c = Img.get img x y in
    let rgb = Color.rgb_of_color c in
    if not (ColorIdxHash.mem ctable rgb) then ColorIdxHash.add ctable rgb (ColorIdxHash.length ctable)
  done
done;
(* .. *)
The backtrace points to hashtbl.ml:
Fatal error: exception Stack_overflow Raised at file "hashtbl.ml",
line 54, characters 16-40 Called from file "img/write_bmp.ml", line
150, characters 52-108 ...
Any hints?
Well, you're using physical equality (==) to compare the colors in your hash table. If the colors are structured values (I can't tell from this code), none of them will be physically equal to each other. If all the colors are distinct objects, they will all go into the table, which could really be quite a large number of objects. On the other hand, the hash function is going to be based on the actual color R,G,B values, so there may well be a large number of duplicates. This will mean that your hash buckets will have very long chains. Perhaps some internal function isn't tail recursive, and so is overflowing the stack.
Normally the length of the longest chain will be 2 or 3, so it wouldn't be surprising that this error doesn't come up often.
Looking at my copy of hashtbl.ml (OCaml 3.12.1), I don't see anything non-tail-recursive on line 54. So my guess might be wrong. On line 54 a new internal array is allocated for the hash table. So another idea is just that your hashtable is just getting too big (perhaps due to the unwanted duplicates).
One thing to try is to use structural equality (=) and see if the problem goes away.
One reason you may have non-termination or stack overflows is if your type contains cyclic values. (==) will terminate on cyclic values (while (=) may not), but Hashtbl.hash is probably not cycle-safe. So if you manipulate cyclic values of type Img_types.rgb_t, you have to devise your own cycle-safe hash function, typically by calling Hashtbl.hash on only one of the non-cyclic subfields/subcomponents of your values.
I've already been bitten by precisely this issue in the past. Not a fun bug to track down.
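The mismatch described in the first answer (hashing by value while comparing by identity) is easy to reproduce in other languages. The Python below is a sketch of that effect only; it says nothing about the internals of OCaml's Hashtbl, and IdentityColor and index_colors are invented names.

```python
class IdentityColor:
    """Colour that hashes by value but compares by identity, like equal = (==)."""

    def __init__(self, r: int, g: int, b: int):
        self.rgb = (r, g, b)

    def __hash__(self) -> int:
        return hash(self.rgb)              # equal colours share one hash value

    def __eq__(self, other) -> bool:
        return self is other               # physical equality only


def index_colors(pixels) -> dict:
    """Assign each 'new' colour the next index, as the loop in the question does."""
    table = {}
    for color in pixels:
        if color not in table:
            table[color] = len(table)
    return table


# 1000 pixels that all hold the same colour, each a freshly allocated object.
by_identity = index_colors(IdentityColor(10, 20, 30) for _ in range(1000))
by_value = index_colors((10, 20, 30) for _ in range(1000))
```

With identity comparison every pixel becomes its own entry, and all of them collide on a single hash value; with value comparison the table holds one entry. That is the kind of table growth and chain length the answer suspects.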
