Unexpected, exact integer results of floating point operations - r

R produces an exact decimal answer of 1 for 1/3*3. This shouldn't be the case for floating point operations with unavoidable roundoff error:
1/3 ~ 0.(01)
3 ~ 11
1/3 * 3 ~ 0.(1)
Looking at the output from R beyond the limits of double precision, R computes 1/3*3 as exactly (base-10) 1. How is this result possible when using floating-point arithmetic?
> sprintf("%.60f",1/3)
[1] "0.333333333333333314829616256247390992939472198486328125000000"
> sprintf("%.60f",3)
[1] "3.000000000000000000000000000000000000000000000000000000000000"
> sprintf("%.60f",1/3*3)
[1] "1.000000000000000000000000000000000000000000000000000000000000"


How many Binary Digits needed for X Decimal Digits of Square Root of 2

I've been playing around with calculating the square root of 2 and the like. It's easy to come up with an algorithm that will produce n correct binary digits. What I'd like help with is determining how many binary digits I need to get m correct decimal digits. m binary digits will get me m decimal digits, but those m decimal digits may not all be correct yet.
EDIT:
I've determined that the lower bound on the binary precision = ceil(log2(10^m)).
Thinking about it, there might not be a strict upper bound, since a carry from any lower power of 2 (when converting to base 10) could potentially affect any higher base-10 digit.
This may thus be a dynamic problem that requires evaluating the fractional expansion at m binary digits and determining which additional binary digits could potentially cause a carry in base 10.
Edit 2: I was probably overthinking this. After the initial calculation I can keep adding (1x10^(-precision)) and squaring the result until I exceed 2 - and then subtract (1x10^(-precision)) and I'll have my answer. Nevertheless I am still interested in finding/developing such an algorithm :)
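For what it's worth, here is a rough R sketch of that Edit 2 refinement idea; it works in plain double precision, so it only illustrates the logic (the function name and the starting guess are mine):
refine_sqrt2 <- function(x, precision) {
  step <- 10^(-precision)
  # walk upward while squaring would still stay at or below 2,
  # then return the last value whose square does not exceed 2
  while ((x + step)^2 <= 2) x <- x + step
  x
}
refine_sqrt2(1.4, 6)   # roughly 1.414213, i.e. correct to 6 decimal places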
Let x be a real and y be its approximation.
Let RE be the relative error of y with respect to x:
RE(x, y) = abs(x - y) / abs(x)
Let b be an integer greater than 1 (the base). The Log-Relative Error in base b is defined as:
LREb(x, y) = -logb(RE(x, y))
where logb is the base-b logarithm:
logb(z) = log(z) / log(b)
for any positive z.
The LRE in base b represents the number of digits that x and y have in common. Here, the "number of correct digits" is not an integer but a real number: this simplifies the calculations below by avoiding ceil and floor functions, provided we accept statements such as "y has 2.3 correct digits with respect to x". More precisely, if x and y have q common base-b digits, then:
LREb(x, y) >= q - 1
With this definition, if the relative error has an upper bound, then the LREb has a lower bound. More precisely, if:
RE(x, y) <= epsilon
then:
LREb(x, y) >= -logb(epsilon)
Also, if the number of correct digits in base 10 is LRE10 = p, then RE = 10^-p, which implies that the number of correct digits in base 2 is:
LRE2 = -log2(10^-p) = p * log2(10) ≈ 3.32 * p
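A small R illustration of the LRE idea (the helper name lre and the sample values are my own, not from the answer):
lre <- function(x, y, b = 10) -log(abs(x - y) / abs(x)) / log(b)
x <- sqrt(2)        # reference value, itself only correct to double precision
y <- 1.4142         # an approximation
lre(x, y, b = 10)   # roughly 5: about 5 correct decimal digits
lre(x, y, b = 2)    # roughly 16.7: the equivalent number of correct binary digits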
What method are you using?
I am assuming a binary search for x in y = x^2.
The integer part is limited by the result sqrt(y) and cannot be truncated, otherwise the result would be wrong. However, x is limited to half the bits of y, so:
ni2 = log2(|y|) / 2
The fractional part is tricky; see:
the relation between binary and decimal digits
After the nonlinear behavior of the first few digits, the dependence stabilizes. Here is the reversed formula from the linked answer:
nf2 = (((nf10-7.810)/9.6366363636363636363636)+1.0)<<5;
ni2 is the number of integer-part binary bits/digits
nf2 is the number of fractional-part binary bits/digits
nf10 is the number of fractional-part decimal digits
By the way, I used 32-bit aligned values, as that is what my arithmetic uses, so:
9.6366363636363636363636 = 32/0.30102999566398119521373889472449
0.30102999566398119521373889472449 = log10(2)
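As a rule of thumb consistent with the lower bound ceil(log2(10^m)) quoted in the question (the small examples below are mine): m correct decimal digits require roughly m * log2(10) ≈ 3.32 * m binary digits, plus a small safety margin for carries.
bits_for_decimal_digits <- function(m) ceiling(m * log2(10))
bits_for_decimal_digits(10)   # 34 bits for 10 decimal digits
bits_for_decimal_digits(16)   # 54 bits, just above the 53-bit double significand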

Getting 0 in R instead of a precise result

How can I get the actual precise result instead of the rounded ones?
result = ((0.61)**(10435)*(0.39)**(6565))/((0.63)**(5023)*(0.60)**(5412)*(0.37)**(2977)*(0.40)**(3588))
Output:
NaN
This happens because both the numerator and the denominator underflow to 0, and 0/0 gives NaN.
I think the logarithm is a powerful tool for dealing with exponentials that have large powers (see its properties at https://mathworld.wolfram.com/Logarithm.html).
You can take the log of your expression first and then apply exp to the result, i.e.,
result <- exp((10435*log(0.61)+6565*log(0.39)) - (5023*log(0.63)+5412*log(0.60)+ 2977*log(0.37)+3588*log(0.40)))
which gives
> result
[1] 0.001219116
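To see why the direct computation fails, you can also check the base-10 exponents of the numerator and denominator (a side check I added, not part of the original answer):
log10_num <- 10435*log10(0.61) + 6565*log10(0.39)
log10_den <- 5023*log10(0.63) + 5412*log10(0.60) + 2977*log10(0.37) + 3588*log10(0.40)
c(log10_num, log10_den)   # both around -4.9e3, far below the ~1e-308 double range
log10_num - log10_den     # about -2.91, i.e. a result near 1.2e-3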
R cannot handle such large exponents in ordinary double precision: the intermediate powers underflow to 0 because precision and range are finite. For what you want, you need an arbitrary-precision package, such as Rmpfr.
library(Rmpfr)
precision <- 120
result <- (mpfr(0.61, precision)**10435 * mpfr(0.39, precision)**6565) /
(mpfr(0.63, precision)**5023 * mpfr(0.60, precision)**5412 * mpfr(0.37, precision)**2977 * mpfr(0.40, precision)**3588)
print(result)
Output:
1 'mpfr' number of precision 120 bits
[1] 0.0012191160601483692718001967190171336975

R: approximating `e = exp(1)` using `(1 + 1 / n) ^ n` gives absurd result when `n` is large

So, I was just playing around with manually calculating the value of e in R and I noticed something that was a bit disturbing to me.
The value of e using R's exp() command...
exp(1)
#[1] 2.718282
Now, I'll try to manually calculate it using x = 10000
x <- 10000
y <- (1 + (1 / x)) ^ x
y
#[1] 2.718146
Not quite but we'll try to get closer using x = 100000
x <- 100000
y <- (1 + (1 / x)) ^ x
y
#[1] 2.718268
Warmer but still a bit off...
x <- 1000000
y <- (1 + (1 / x)) ^ x
y
#[1] 2.71828
Now, let's try it with a huge one
x <- 5000000000000000
y <- (1 + (1 / x)) ^ x
y
#[1] 3.035035
Well, that's not right. What's going on here? Am I overflowing the data type and need to use a certain package instead? If so, are there no warnings when you overflow a data type?
You've got a problem with machine precision. As soon as 1 / x drops below about 1.1e-16 (half of the machine epsilon, 2.22e-16), 1 + (1 / x) is just 1. The mathematical limit breaks down in finite-precision numerical computations. Your final x in the question, 5e+15, is already very close to this brink; try x <- x * 10, and your y would be 1.
This is neither "overflow" nor "underflow" as there is no difficulty in representing a number as small as 1e-308. It is the problem of the loss of significant digits during floating-point arithmetic. When you do 1 + (1 / x), the bigger x is, the fewer significant digits in the (1 / x) part can be preserved when you add it to 1, and eventually you lose that (1 / x) term altogether.
## valid 16 significant digits
1 + 1.23e-01 = 1.123000000000000|
1 + 1.23e-02 = 1.012300000000000|
... ...
1 + 1.23e-15 = 1.000000000000001|
1 + 1.23e-16 = 1.000000000000000|
Any numerical analysis book would tell you the following.
Avoid adding a large number and a small one. In floating-point addition a + b = a * (1 + b / a); if b / a is smaller than about 1.1e-16, then a + b = a. This implies that when adding up a set of positive numbers, it is more stable to accumulate them from the smallest to the largest (a short demonstration follows after these notes).
Avoid subtracting one number from another of nearly the same magnitude, or you may suffer catastrophic cancellation; the classic example is the naive quadratic formula.
You are also advised to have a read on Approximation to constant "pi" does not get any better after 50 iterations, a question asked a few days after your question. Using a series to approximate an irrational number is numerically stable as you won't get the absurd behavior seen in your question. But the finite number of valid significant digits imposes a different problem: numerical convergence, that is, you can only approximate the target value up to a certain number of significant digits. MichaelChirico's answer using Taylor series would converge after 19 terms, since 1 / factorial(19) is already numerically 0 when added to 1.
Multiplication and division between floating-point numbers do not cause problems with significant digits; they may cause "overflow" or "underflow". However, given the wide range of representable floating-point values (roughly 1e-308 to 1e+308), "overflow" and "underflow" should be rare. The real difficulty is with addition and subtraction, where significant digits can easily be lost. See Can I stably invert a Vandermonde matrix with many small values in R? for an example of this in matrix computations. It is not impossible to get higher precision, but the work is usually more involved; for example, the OP of that matrix example eventually used GMP (the GNU Multiple Precision Arithmetic Library) and the associated R packages to proceed: How to put Rmpfr values into a function in R?
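Here is the demonstration of the accumulation-order advice promised above (a toy example; the vector and the use of Reduce are my own choices): adding many tiny terms after a large one loses them, while accumulating the tiny terms first keeps their contribution. Reduce is used instead of sum() because sum() accumulates in extended precision on many platforms, which would mask the effect.
x <- c(1, rep(1e-16, 1e4))
large_first <- Reduce(`+`, x)        # left-to-right: the 1 comes first
small_first <- Reduce(`+`, rev(x))   # the tiny terms are accumulated first
large_first - 1   # 0: every 1e-16 was absorbed by the 1
small_first - 1   # about 1e-12: the tiny terms survived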
You might also try the Taylor series approximation to exp(1), namely
e^x = \sum_{k = 0}^{\infty} x^k / k!
Thus we can approximate e = e^1 by truncating this sum; in R:
sprintf('%.20f', exp(1))
# [1] "2.71828182845904509080"
sprintf('%.20f', sum(1/factorial(0:10)))
# [1] "2.71828180114638451315"
sprintf('%.20f', sum(1/factorial(0:100)))
# [1] "2.71828182845904509080"

R small pvalues

I am calculating z-scores to see if a value is far from the mean/median of the distribution.
I had originally done it using the mean, then turned these into two-sided p-values. But now, using the median, I noticed that there are some NAs in the p-values.
I determined this is occurring for values that are very far from the median, and it looks to be related to the pnorm calculation.
"
'qnorm' is based on Wichura's algorithm AS 241 which provides
precise results up to about 16 digits. "
Does anyone know a way around this, as I would like the very small p-values?
Thanks,
> z<- -12.5
> 2-2*pnorm(abs(z))
[1] 0
> z<- -10
> 2-2*pnorm(abs(z))
[1] 0
> z<- -8
> 2-2*pnorm(abs(z))
[1] 1.332268e-15
As an intermediate step, you are actually calculating p-values very close to 1:
options(digits=22)
z <- c(-12.5,-10,-8)
pnorm(abs(z))
# [1] 1.0000000000000000000000 1.0000000000000000000000 0.9999999999999993338662
2-2*pnorm(abs(z))
# [1] 0.000000000000000000000e+00 0.000000000000000000000e+00 1.332267629550187848508e-15
I think you will be better off using the low p-values (close to zero) but I am not good enough at math to know whether the error at close-to-one p-values is in the AS241 algorithm or the floating point storage. Look how nicely the low values show up:
pnorm(z)
# [1] 3.732564298877713761239e-36 7.619853024160526919908e-24 6.220960574271784860433e-16
Keep in mind that 1 - pnorm(x) is equivalent to pnorm(-x). So 2 - 2*pnorm(abs(z)) is equivalent to 2*(1 - pnorm(abs(z))), which is equivalent to 2*pnorm(-abs(z)), so just go with:
2 * pnorm(-abs(z))
# [1] 7.465128597755427522478e-36 1.523970604832105383982e-23 1.244192114854356972087e-15
which should get more precisely what you are looking for.
One thought: converting back would need an exp() with higher precision, but you might be able to use log(p) to get slightly more precision in the tails; otherwise you are effectively at 0 for the non-log p-values, in terms of the range that can be calculated:
> z<- -12.5
> pnorm(abs(z),log.p=T)
[1] -7.619853e-24
Converting back to the p value doesn't work well, but you could compare on log(p)...
> exp(pnorm(abs(z),log.p=T))
[1] 1
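Building on the log.p idea (a sketch I added, not from the original answers): for extremely large |z| you can keep the whole two-sided p-value on the log scale by combining lower.tail = FALSE with log.p = TRUE and adding log(2) for the two-sided adjustment.
z <- -40
log_p <- log(2) + pnorm(abs(z), lower.tail = FALSE, log.p = TRUE)
log_p             # about -804: exp() of this would underflow to 0
log_p / log(10)   # about -349: the base-10 exponent of the two-sided p-value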
pnorm is a function that gives the p-value for a given x. If you do not specify more arguments, the default distribution is Normal with mean 0 and standard deviation 1.
By symmetry, pnorm(a) = 1 - pnorm(-a).
In R, computing 1 - pnorm(a) rounds to 1 when pnorm(a) is extremely close to 1, so the tiny tail is lost, whereas pnorm(-a) evaluates the far tail directly with full relative precision. So, using this symmetry with negative arguments, you can calculate the values you need.
> pnorm(0.25)
[1] 0.5987063
> 1-pnorm(-0.25)
[1] 0.5987063
> pnorm(20)
[1] 1
> pnorm(-20)
[1] 2.753624e-89

more significant digits

How can I get more significant digits in R? Specifically, I have the following example:
> dpois(50, lambda= 5)
[1] 1.967673e-32
However when I get the p-value:
> 1-ppois(50, lambda= 5)
[1] 0
Obviously, the p-value is not 0. In fact, it should be greater than 1.967673e-32, since I'm summing a bunch of probabilities. How do I get the extra precision?
Use lower.tail=FALSE:
ppois(50, lambda= 5, lower.tail=FALSE)
## [1] 2.133862e-33
Asking R to compute the upper tail is much more accurate than computing the lower tail and subtracting it from 1: given the inherent limitations of floating-point precision, R can't distinguish (1 - eps) from 1 for values of eps less than .Machine$double.neg.eps, typically around 1e-16 (see ?.Machine).
This issue is discussed in ?ppois:
Setting ‘lower.tail = FALSE’ allows to get much more precise
results when the default, ‘lower.tail = TRUE’ would return 1, see
the example below.
Note also that your comment about the value needing to be greater than dpois(50, lambda=5) is not quite right; ppois(x, ..., lower.tail=FALSE) gives the probability that the random variable is strictly greater than x, as you can check, for example, by noting that ppois(0, ..., lower.tail=FALSE) is not exactly 1, or:
dpois(50,lambda=5) + ppois(50,lambda=5,lower.tail=FALSE)
## [1] 2.181059e-32
ppois(49,lambda=5,lower.tail=FALSE)
## [1] 2.181059e-32
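If you ever need tails so extreme that even the upper-tail probability underflows, lower.tail = FALSE can be combined with log.p = TRUE to stay on the log scale; applied to the example above (a hedged illustration):
ppois(50, lambda = 5, lower.tail = FALSE, log.p = TRUE)
## roughly -75.2, i.e. exp(-75.2), which is about 2.1e-33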

Resources