Mean function incorrect value - julia

I have an 80-element array whose entries are all the same value: 176.01977965813853.
If I use the mean function I get the value 176.01977965813842.
Why is that?
Here is a minimal working example:
using Statistics
arr = fill(176.01977965813853, 80)
julia> mean(arr)
176.01977965813842
I expected this to return 176.01977965813853.

These are just expected floating point errors. But if you need very precise summations, you can use a bit more elaborate (and costly) summation scheme:
julia> using KahanSummation
julia> sum_kbn(fill(176.01977965813853, 80))/80
176.01977965813853
Ref: Wikipedia
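For the curious, the compensated trick can be sketched in a few lines. This is plain Kahan summation for illustration; the `sum_kbn` above uses the improved Kahan-Babuska-Neumaier variant, but the idea is the same:

```julia
# Minimal sketch of Kahan compensated summation (illustrative, not the
# exact algorithm inside KahanSummation.jl).
function kahan_sum(xs)
    s = 0.0   # running sum
    c = 0.0   # compensation: low-order bits lost to rounding so far
    for x in xs
        y = x - c           # correct the next term by the lost bits
        t = s + y           # this addition rounds
        c = (t - s) - y     # algebraically zero; numerically the rounding error
        s = t
    end
    return s
end

kahan_sum(fill(176.01977965813853, 80)) / 80
```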

The problem as I understand it can be reproduced as follows:
using Statistics
arr = fill(176.01977965813853, 80)
julia> mean(arr)
176.01977965813842
The reason for this is that Julia does all floating point arithmetic with 64 bits of precision by default (i.e. the Float64 type). Float64s cannot represent every real number: there is a finite step between adjacent floating point numbers, and rounding errors are incurred when you do arithmetic on them. These rounding errors are usually fine, but if you're not careful, they can be catastrophic. For instance:
julia> 1e100 + 1.0 - 1e100
0.0
That says that if I do 10^100 + 1 - 10^100 I get zero! If we want an upper bound on the errors caused by floating point arithmetic, we can use IntervalArithmetic.jl:
using IntervalArithmetic
julia> 1e100 + interval(1.0) - 1e100
[0, 1.94267e+84]
That says that the operation 1e100 + 1.0 - 1e100 is at least equal to 0.0 and at most 1.94*10^84, so the error bounds are huge!
We can do the same for the operation you were interested in,
arr = fill(interval(176.01977965813853), 80);
julia> mean(arr)
[176.019, 176.02]
julia> mean(arr).lo
176.019779658138
julia> mean(arr).hi
176.0197796581391
which says that the actual mean could be as small as 176.019779658138 or as large as 176.0197796581391, but one can't be any more certain due to floating point error. So here, Float64 gave the answer with at most 10^-13 percent error, which is actually quite small.
What if those are unacceptable error bounds? Use more precision! You can use the big string macro to get arbitrary precision number literals:
arr = fill(interval(big"176.01977965813853"), 80);
julia> mean(arr).lo
176.0197796581385299999999999999999999999999999999999999999999999999999999999546
julia> mean(arr).hi
176.019779658138530000000000000000000000000000000000000000000000000000000000043
That calculation was done using 256 bits of precision, but you can get even more precision using the setprecision function:
setprecision(1000)
arr = fill(interval(big"176.01977965813853"), 80);
julia> mean(arr).lo
176.019779658138529999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999599
julia> mean(arr).hi
176.019779658138530000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000579
Note that arbitrary precision arithmetic is slow compared to Float64, so it's usually best to use it only to validate your results, i.e. to make sure you're converging to a good answer within your desired accuracy.
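As a concrete validation sketch for the example above: compute the mean in fast Float64, recompute it with a BigFloat reference, and check how far apart they are.

```julia
using Statistics

arr = fill(176.01977965813853, 80)
fast = mean(arr)               # plain Float64 computation
slow = mean(BigFloat.(arr))    # arbitrary-precision reference (much slower)
Float64(abs(slow - fast))      # on the order of 1e-13: the Float64 result is fine
```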

Related

Computing harmonic series for very large N (arbitrary precision problems)

This is a followup question to a previous one I made.
I'm trying to compute the Harmonic series to very large terms, however when comparing to log(n)+γ I'm not getting the expected error.
I suspect the main problem is with the BigFloat julia type.
harmonic_bf = function(n::Int64)
    x = BigFloat(0)
    for i in n:-1:1
        x += BigFloat(1/i)
    end
    x
end
For example it is well known that the lower bound for the formula: H_n - log(n) - γ is 1/2/(n+1).
However, while this holds for n=10^7, it fails for n=10^8.
n=10^8
γ = big"0.57721566490153286060651209008240243104215933593992"
lower_bound(n) = 1/2/(n+1)
julia> harmonic_bf(n)-log(n)-γ > lower_bound(BigFloat(n))
false
It's driving me crazy, I can't seem to understand what is missing... BigFloat supposedly should get arithmetic precision problems out of the way, but that doesn't seem to be the case here.
Note: I tried BigFloat both with the default precision and with 256 bits of precision.
You have to make sure that you use BigFloat everywhere. First in your function (notice that BigFloat(1/i) is not the same as 1/BigFloat(i)):
function harmonic_bf(n::Int64)
    x = BigFloat(0)
    for i in n:-1:1
        x += 1/BigFloat(i)
    end
    x
end
and then in the test (notice BigFloat under log):
julia> harmonic_bf(n)-log(BigFloat(n))-γ > lower_bound(BigFloat(n))
true
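To see why the placement of the conversion matters, compare the two forms directly (a quick sketch):

```julia
i = 3
a = BigFloat(1/i)   # 1/i computed and rounded in Float64 first, then widened:
                    # the Float64 rounding error is baked into the BigFloat
b = 1/BigFloat(i)   # the division itself is carried out in BigFloat precision
a == b              # false: they differ beyond the 53rd bit
abs(a - b)          # roughly 1e-17, the inherited Float64 rounding error
```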

Is there standard way to get numerical consistency in Julia

└─╼ julia
Version 0.6.0 (2017-06-19 13:05 UTC)
julia> 1.0 + 0.1 - 1.0 - 0.1 == 0
false
julia> 1.0 + 0.1 - 1.0 - 0.1
8.326672684688674e-17
I understand that decimals like 0.1 cannot be represented exactly in binary floating point without some additional effort, e.g.
julia> 1//10
1//10
julia> 1 + 1//10
11//10
julia> 1 + 1//10 - 1
1//10
julia> 1 + 1//10 - 1 - 1//10
0//1
julia> 1 + 1//10 - 1 - 1//10 == 0
true
or going purely symbolic.
there are several rounding options:
julia> Round
RoundDown RoundNearest RoundNearestTiesUp RoundUp
RoundFromZero RoundNearestTiesAway RoundToZero RoundingMode
Without launching into a protracted discussion of numerical stability, does Julia have a recommended style?
thx
This really isn't a question about Julia. This will show up in any language using IEEE floating point arithmetic, since Julia just uses the standard. So the standard rules apply.
Don't expect floating point calculations to be exact. Instead, test floating point sameness using isapprox (or ≈, typed as \approx then Tab) with an appropriately chosen tolerance.
If you need true decimals, you should use rationals like in that example you have.
Another helpful thing may be DecFP.jl which uses IEEE decimal arithmetic and thus is more precise in this kind of example.
If you need to be more precise, use higher precision. BigFloats have their purpose.
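A minimal sketch of the first two approaches applied to the example from the question:

```julia
x = 1.0 + 0.1 - 1.0 - 0.1       # 8.326672684688674e-17, not exactly zero

# Approximate comparison. Note that comparing against zero needs an explicit
# absolute tolerance: the default relative tolerance scales with the
# arguments and can never be satisfied when one of them is exactly 0.
isapprox(x, 0.0; atol = 1e-12)  # true

# Exact decimal arithmetic with rationals:
1 + 1//10 - 1 - 1//10 == 0      # true
```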
Also there is sum_kbn which may be all you need for your application:
julia> sum([1.0, 0.1, - 1.0, - 0.1])
8.326672684688674e-17
julia> sum_kbn([1.0, 0.1, - 1.0, - 0.1])
0.0
help?> sum_kbn
search: sum_kbn cumsum_kbn
sum_kbn(A)
Returns the sum of all elements of A, using the Kahan-Babuska-Neumaier compensated summation algorithm for additional accuracy.
Generally speaking, it is useless to worry about this "inconsistency". Your number is wrong in the sixteenth decimal, which is much less than the size of an atom when you measure the circumference of the Earth.
In practice the quantities you are dealing with are measured with some resolution and accuracy, are described by approximate models and computed with truncated methods.
It matters to take care of the real sources of errors and the numerical processes that amplify them. In other words, have a feeling for error calculus. Moving to exact arithmetic is often a nonsense.

julia floating point compare for zero

julia> r
3×3 Array{Float64,2}:
 -1.77951   -0.79521   -2.57472
  0.0        0.630793   0.630793
  0.0        0.0       -1.66533e-16
julia> sort(abs(diag(r)))[1]
1.6653345369377348e-16
julia> isequal(floor(sort(abs(diag(r)))[1]),0)
true
But this is not right
julia> isequal(sort(abs(diag(r)))[1],convert(AbstractFloat,0.0))
false
Is there a function in Julia to check for floating point equivalent to zero?
-1.66533e-16 is not equivalent to zero. It is, however, approximately equal to zero (with respect to a particular tolerance), and Julia provides just such a function:
isapprox(1e-16, 0.0; atol=1e-15, rtol=0)
edit: or as Chris pointed out, a good choice for atol is eps() which corresponds to machine precision for that particular type:
julia> isapprox(1e-20, 0.0; atol=eps(Float64), rtol=0)
true
Do read the description for isapprox to see what the default arguments mean, and to see if you prefer an "absolute" or "relative" tolerance approach (or a mixed approach). Though for a comparison to zero specifically, using an absolute tolerance is fine and probably more intuitive.

Machine Arithmetic and Smearing: addition of a large and small number

So to 10000 one adds the value 1/10000, 10000 times. Mathematically this gives 10001.
However, due to smearing, which stems from storage limitations, this does not occur: the result is 10000.999999992928.
I have located where the smearing occurs, which is in the second addition:
1: 10000.0001
2: 10000.000199999999
3: 10000.000299999998
4: 10000.000399999997
etc...
However, grasping why the smearing occurred is where the struggle lies.
I wrote code to generate floating point binary numbers to see whether smearing occurred here
So 10000 = 10011100010000 in binary, i.e. 1.001110001*2**13, while
0.0001 = 0.00000000000001101001... in binary, i.e.
1.1010001101101110001011101011000111000100001100101101*2**(-14),
so that 10000.0001 = 10011100010000.00000000000001101001...
Now the smearing occurs in the next addition. Does it have to do with mantissa size? Why does it only occur in this step? Just interested to know. I am going to add up all the 1/10000 terms first and then add the total to the 10000 to avoid smearing.
The small "smearing" error for a single addition can be computed exactly as
a=10000; b=0.0001
err = ((a+b)-a)-b
print "err=",err
>>> err= -7.07223084891e-13
The rounding error of an addition is of size (abs(a)+abs(b))*mu/2 or around 1e4 * 1e-16 = 1e-12, which nicely fits the computed result.
In general you also have to test the expression ((a+b)-b)-a, but one of the two is always zero; here it is the latter.
And indeed this single-step error, accumulated over all the steps, already gives the observed result; secondary errors, related to the slow growth of the first term in each addition, have a much lower impact.
print err*10000
>>> -7.072230848908026e-09
print 10001+err*10000
>>> 10000.999999992928
The main problem is that 1/10000, i.e. 0.0001, cannot be encoded exactly as a machine float value (see the IEEE 754 standard), since 10000 is not a power of 2. Likewise 1/10 = 0.1 cannot be encoded as a machine float, so you will experience phenomena like 0.1 + 0.1 + 0.1 > 0.3.
When computing with double precision (64 bit) the following holds:
1.0001 - 1 < 0.0001
10000.0001 + 9999*0.0001 == 10001
So I assume you are computing with single precision (32 bit)?
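The reordering the question proposes can be sketched as follows (the examples in this answer are Python, but the effect is the same in any IEEE double arithmetic; this sketch is in Julia): accumulate the 10000 small terms first, then add the large number once.

```julia
terms = fill(0.0001, 10000)

# Naive left-to-right accumulation onto the large number: every one of the
# 10000 additions rounds at the coarse float spacing near 10000.
naive = foldl(+, terms; init = 10000.0)   # 10000.999999992928

# Small terms first: the partial sums stay near 1, where the float spacing
# is fine, and only the final addition rounds near 10000.
better = 10000.0 + sum(terms)
```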

exp function in Julia evaluating to 0

I want to calculate and plot the probability density of a wave function in Julia. I wrote a small snippet of Julia code for evaluating the following function:
The Julia (incomplete) code is:
set_bigfloat_precision(100)
A = 10
C = 5
m = BigFloat(9.10938356e-31)
ℏ = BigFloat(1.054571800e-34)
t = exp(-(sqrt(C * m) / ℏ))
The last line where I evaluate t gives 0.000000000000.... I tried to set the precision of the BigFloat as well. No luck! What am I doing wrong? Help appreciated.
While Chris Rackauckas has pointed out in the comments that you entered the formula wrong, I figured it was interesting enough to answer the question anyway.
Let's break it down so we can see what we are raising:
A = 10
C = 5
m = BigFloat(9.10938356e-31)
h = BigFloat(1.054571800e-34)
z = -sqrt(C * m)/h
t = exp(z)
So
z =-2.0237336022083455711032042949257e+19
so very roughly z = -2e19,
and thus roughly t = exp(-2e19), i.e. t = 1/e^(2*10^19).
That is a very small number.
Consider that
exp(big"-1e+10") = 9.278...e-4342944820
and
exp(big"-1e+18") = 2.233...e-434294481903251828
and yes, julia says:
exp(big"-2e+19") = 0.0000
exp(big"-2e+19") is a very small number.
That puts us in context I hope. Very small number.
So, Julia depends on MPFR for BigFloats.
You can try MPFR online. At precision 8192, exp(-2e10)=0
So same result.
Now, it is not the precision that we care about, but rather the range of the exponent.
MPFR uses something akin to IEEE-style floats, where precision is the length of the mantissa and then you have an exponent: 2^exponent * mantissa.
So there is a limit on the range of the exponent.
See: MPFR docs:
Function: mpfr_exp_t mpfr_get_emin (void)
Function: mpfr_exp_t mpfr_get_emax (void)
Return the (current) smallest and largest exponents allowed for a floating-point variable. The smallest positive value of a floating-point variable is one half times 2 raised to the smallest exponent and the largest value has the form (1 - epsilon) times 2 raised to the largest exponent, where epsilon depends on the precision of the considered variable.
Now Julia sets these to the maximum range that a fairly default MPFR build allows. I've been digging around the MPFR source trying to find where this is set, but can't find it. I believe it is related to the maximum value an Int64 can hold.
Base.MPFR.get_emin() = -4611686018427387903 = typemin(Int64)>>1 + 1
You can adjust this but only up.
So anyway
0.5*big"2.0"^(Base.MPFR.get_emin()) = 8.5096913117408361391297879096205e-1388255822130839284
but
0.5*big"2.0"^(Base.MPFR.get_emin()-1) = 0.00000000000...
Now we know that
exp(x) = 2^(log(2,e)*x)
So we can write exp(z) = 2^(log(2,e)*z):
log(2,e)*z = -29196304319863382016
Base.MPFR.get_emin() = -4611686018427387903
So, since the exponent (roughly -2.9e19) is less than the minimum allowed exponent (roughly -4.6e18), an underflow occurs.
That is why you get zero.
It may (or may not) be possible to recompile MPFR with Int128 exponents, but Julia hasn't.
Perhaps Julia should throw an Underflow exception.
Feel encouraged to report that as an issue on the Julia bug tracker.
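A quick sketch of the underflow check described above: compare the base-2 exponent of the would-be result against MPFR's minimum exponent, using exp(z) = 2^(z/log(2)).

```julia
setprecision(100)
m = BigFloat("9.10938356e-31")
ℏ = BigFloat("1.054571800e-34")
C = 5
z = -sqrt(C * m) / ℏ              # roughly -2.02e19

log2_of_result = z / log(big(2))  # base-2 exponent of exp(z), roughly -2.92e19
emin = Base.MPFR.get_emin()       # smallest allowed exponent, roughly -4.6e18

log2_of_result < emin             # true: exp(z) underflows to zero
```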
