IEEE Double Precision Standard
What are the largest and smallest numbers that can be represented in the standard, and how are they stored?
The largest "number" you can store in IEEE 754 double precision is Infinity; the smallest is -Infinity.
If you mean finite representations, then that would be roughly
±1.7976931348623157 x 10^308.
See here for an excellent answer regarding IEEE754 formats. See here for a wikipedia article showing the representation.
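These limits are easy to poke at from Python, whose float is an IEEE 754 double; a quick sketch:

```python
import math
import sys

# Largest finite double
print(sys.float_info.max)             # 1.7976931348623157e+308

# Going past it overflows to Infinity
print(sys.float_info.max * 2)         # inf
print(-sys.float_info.max * 2)        # -inf
print(math.inf > sys.float_info.max)  # True
```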
I'm using the R package MAST and it produces some impressively small P-values -- so small I didn't think they could be stored as regular floating point values. Quadruple precision reaches only $10^{-34}$ (source). How is this possible?
This isn't just R; computers in general can store tiny numbers because floating point numbers are represented with a sign bit, a fraction, and an exponent. The space reserved for the exponent permits very large and very small numbers. See the R documentation on machine precision (noting e.g. the difference between double.eps and double.xmin), and the Wikipedia page on IEEE 754-1985 which describes the original standard for representing floating-point numbers (updated in 2008).
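The same boundary is visible from Python (whose float is the same IEEE 754 double): below the analogue of double.xmin, subnormal numbers extend the range down to about 5e-324 at the cost of precision. A small sketch (math.ulp needs Python 3.9+):

```python
import math
import sys

# Smallest positive *normal* double, 2**-1022 (R's double.xmin)
print(sys.float_info.min)   # 2.2250738585072014e-308

# Subnormals go further down; the smallest positive one is 2**-1074,
# which math.ulp(0.0) returns directly
tiny = math.ulp(0.0)
print(tiny)                 # 5e-324
print(tiny > 0)             # True

# Halving it underflows to zero
print(tiny / 2)             # 0.0
```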
In the Scilab documentation %pi is described as:
%pi returns the floating-point number nearest the value of π.
But what exactly is that number? Is it processor dependent?
By using "format()" in the Scilab console you can only show up to 25 digits.
As the Scilab article on %eps indicates, the floating point relative accuracy is not processor dependent: it is 2^(-52), because Scilab uses IEEE 754 double-precision binary floating-point format. According to Exploring Binary, the double-precision approximation to pi is
1.1001001000011111101101010100010001000010110100011 x 2^1
which is exactly
3.141592653589793115997963468544185161590576171875
Most of these digits are useless, as the actual decimal expansion of pi begins with
3.14159265358979323846...
The relative error is about 3.9e-17, within the promised 2^(-52) = 2.2e-16.
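The exact stored value is easy to recover in Python, where float is the same IEEE 754 double and Decimal converts a float without rounding; a quick check:

```python
import math
from decimal import Decimal

# Decimal(float) is exact: it shows every digit of the stored double
print(Decimal(math.pi))
# 3.141592653589793115997963468544185161590576171875

# Relative error against 50 digits of pi
pi_50 = Decimal("3.14159265358979323846264338327950288419716939937511")
rel_err = abs((Decimal(math.pi) - pi_50) / pi_50)
print(rel_err)   # about 3.9e-17, within 2**-52 ~ 2.2e-16
```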
Can I always trust that when I divide two IEEE 754 doubles that exactly represent integers, I get the exact value?
For example:
80.0/4.0
=> 20.0
The result is exactly 20.0.
Are there some potential values where the result is inaccurate like 19.99999999?
If both integers can be exactly represented by the floating point format (i.e. for doubles, they are less than 2^53), and the quotient itself is exactly representable, then yes, you will get the exact value.
Floating point is not some crazy fuzzy random algorithm; it has well-defined deterministic behaviour. Operations that are exact will always give exact answers, and operations that are not exact will round to the nearest representable value (assuming you haven't done anything funny with your rounding modes). "Problems" arise when (1) the inputs are not exactly representable (such as 0.1, or 10^20), or (2) you're performing multiple operations, some of which may cause intermediate rounding.
According to Wikipedia,
[floating point] division is accomplished by subtracting the divisor's exponent from the dividend's exponent, and dividing the dividend's significand by the divisor's significand.
If one number evenly divides another, then the former's significand should evenly divide the latter's significand. This will give you an exact result.
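Both behaviours can be checked from Python, whose float is an IEEE 754 double; a small sketch:

```python
# Exact inputs with an exactly representable quotient give an exact result
print(80.0 / 4.0 == 20.0)   # True
print(1e15 / 5.0 == 2e14)   # True (all values are exact integers < 2**53)

# When the true quotient is not representable, you get the correctly
# rounded nearest double, not an arbitrary fuzzy value
print(1.0 / 3.0)            # 0.3333333333333333
```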
I ask because I am computing matrix multiplications where all the matrix values are integers.
I'd like to use LAPACK so that I get fast code that is correct. Will two large integers (whose product is less than 2^53), stored as doubles, when multiplied, yield a double containing the exact integer result?
Your analysis is correct:
- All integers between -2^53 and 2^53 are exactly representable in double precision.
- The IEEE 754 standard requires basic operations to be computed exactly and then rounded to the nearest representable number.

Hence a product of two such values that equals an integer in that range will be represented exactly.
Reference: What every computer scientist should know about floating-point arithmetic. The key section is the discussion of the IEEE standard as pertaining to operations. That contains the statement of the second bullet point above. You already knew the first bullet point and it's the second point that completes the argument.
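A quick Python sanity check of the argument (Python's float is an IEEE 754 double): products below 2^53 survive exactly, and the first failure appears just above it:

```python
a = 2**26 + 1   # a*a = 2**52 + 2**27 + 1, which is below 2**53
print(float(a) * float(a) == a * a)   # True: exact

b = 2**27 + 1   # b*b = 2**54 + 2**28 + 1, which exceeds 2**53
print(float(b) * float(b) == b * b)   # False: the product was rounded
```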
Yes! A double is split into a sign bit, an exponent, and a fraction (significand).
Wikipedia has an article explaining the ranges of representable numbers.
All integers between -2^53 and 2^53 are representable in double precision.
Between 2^52=4,503,599,627,370,496 and 2^53=9,007,199,254,740,992 the representable numbers are exactly the integers. For the next range, from 2^53 to 2^54, everything is multiplied by 2, so the representable numbers are the even ones, etc. Conversely, for the previous range from 2^51 to 2^52, the spacing is 0.5, etc.
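The changing spacing can be observed directly in Python via math.ulp (Python 3.9+), which returns the gap from a double to the next representable one:

```python
import math

print(math.ulp(float(2**51)))   # 0.5
print(math.ulp(float(2**52)))   # 1.0
print(math.ulp(float(2**53)))   # 2.0

# Above 2**53 the spacing is 2, so adding 1 is lost to rounding
big = float(2**53)
print(big + 1 == big)   # True
print(big + 2 == big)   # False
```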
Is there a function that returns the highest and lowest possible numeric values?
help(numeric) sends you to help(double) which has
Double-precision values:
All R platforms are required to work with values conforming to the
IEC 60559 (also known as IEEE 754) standard. This basically works
with a precision of 53 bits, and represents to that precision a
range of absolute values from about 2e-308 to 2e+308. It also has
special values ‘NaN’ (many of them), plus and minus infinity and
plus and minus zero (although R acts as if these are the same).
There are also _denormal(ized)_ (or _subnormal_) numbers with
absolute values above or below the range given above but
represented to less precision.
See ‘.Machine’ for precise information on these limits. Note that
ultimately how double precision numbers are handled is down to the
CPU/FPU and compiler.
So you want to look at .Machine which on my 64-bit box has
$double.xmin
[1] 2.22507e-308
$double.xmax
[1] 1.79769e+308
help("numeric")
will ask you to do
help("double")
which will give the answer: range of absolute values from about 2e-308 to 2e+308.
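Those two limits follow directly from the format, and can be reproduced in Python (whose float is the same IEEE 754 double) to match the double.xmax and double.xmin values above:

```python
import math
import sys

# Largest finite double: a full 53-bit significand at the top exponent, 1023
xmax = (2 - math.ldexp(1.0, -52)) * math.ldexp(1.0, 1023)
print(xmax == sys.float_info.max)   # True (1.7976931348623157e+308)

# Smallest positive normal double: 2**-1022
xmin = math.ldexp(1.0, -1022)
print(xmin == sys.float_info.min)   # True (2.2250738585072014e-308)
```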