Dealing with very small numbers in R - r

I need to calculate a list of very small numbers such as
(0.1)^1000, 0.2^(1200),
and then normalize them so they will sum up to one
a1 = 0.1^1000,
a2 = 0.2^1200
And I want to calculate
a1' = a1/(a1+a2),
I'm running into underflow problems, as I get a1=0. How can I get around this?
Theoretically I could work with logs: log(a1) = 1000*log(0.1) would be a way to represent a1 without underflow problems. But in order to normalize I would need
log(a1+a2), which I can't compute since I can't represent a1 directly.
I'm programming in R. As far as I can tell there is no data type such as Decimal in C# that
gives you better than double-precision values.
Any suggestions will be appreciated, thanks

Mathematically speaking, one of those numbers will be approximately zero and the other approximately one. The difference between your numbers is huge, so I'm even wondering if this makes sense.
But to do that in general, you can use the idea from the logspace_add C-function that's underneath the hood of R. One can define logxpy ( =log(x+y) ) when lx = log(x) and ly = log(y) as :
logxpy <- function(lx,ly) max(lx,ly) + log1p(exp(-abs(lx-ly)))
Which means that we can use :
> la1 <- 1000*log(0.1)
> la2 <- 1200*log(0.2)
> exp(la1 - logxpy(la1,la2))
[1] 5.807714e-162
> exp(la2 - logxpy(la1,la2))
[1] 1
This function can be called recursively as well if you have more numbers. Mind you, 1 is still 1, and not 1 minus 5.807...e-162 . If you really need more precision and your platform supports long double types, you could code everything in eg C or C++, and return the results later on. But if I'm right, R can - for the moment - only deal with normal doubles, so ultimately you'll lose the precision again when the result is shown.
To do the math for you:
log(x+y) = log(exp(lx)+exp(ly))
= log( exp(lx) * (1 + exp(ly-lx)) )
= lx + log( 1 + exp(ly - lx) )
Now you just take the largest as lx, and then you arrive at the expression in logxpy().
EDIT 2 : Why take the maximum then? Easy: to ensure that you use a negative number in exp(lx-ly). If lx-ly gets too big, then exp(lx-ly) would return Inf. That's not a correct result. exp(ly-lx) would return 0, which allows for a far better result:
Say lx=1 and ly=1000, then :
> 1+log1p(exp(1000-1))
[1] Inf
> 1000+log1p(exp(1-1000))
[1] 1000
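For more than two numbers, the same trick can be applied to a whole vector at once instead of calling logxpy() repeatedly. A minimal sketch in base R, assuming a helper called logsumexp (a hypothetical name, not part of R or of the answer above):

```r
# Hypothetical helper: log(sum(exp(lx))) without leaving log space.
# Subtracting the maximum keeps every exp() argument <= 0, so nothing overflows.
logsumexp <- function(lx) {
  m <- max(lx)
  m + log(sum(exp(lx - m)))
}

# Logs of the two numbers from the question
la <- c(1000 * log(0.1), 1200 * log(0.2))

# Normalized weights a_i / sum(a); they sum to 1 up to double precision
w <- exp(la - logsumexp(la))
w
```

The smaller weight is still representable as a double, while the larger one rounds to exactly 1, as in the two-number case.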

The Brobdingnag package deals with very large or small numbers, essentially wrapping Joris's answer into a convenient form.
library(Brobdingnag)
a1 <- as.brob(0.1)^1000
a2 <- as.brob(0.2)^1200
a1_dash <- a1 / (a1 + a2)
a2_dash <- a2 / (a1 + a2)

Try the arbitrary precision packages:
Rmpfr "R MPFR - Multiple Precision Floating-Point Reliable"
Ryacas "R Interface to the 'Yacas' Computer Algebra System" - may also be able to do arbitrary precision.

Maybe you can treat a1 and a2 as fractions. In your example, with
a1 = (a1num/a1denom)^1000 # 1/10
a2 = (a2num/a2denom)^1200 # 1/5
you would arrive at
a1' = (a1num^1000 * a2denom^1200)/(a1num^1000 * a2denom^1200 + a1denom^1000 * a2num^1200)
a2' = (a1denom^1000 * a2num^1200)/(a1num^1000 * a2denom^1200 + a1denom^1000 * a2num^1200)
which can be computed using the gmp package:
library(gmp)
a1 <- as.double(pow.bigz(5,1200) / (pow.bigz(5,1200)+ pow.bigz(10,1000)))


Numerical blowup problem in a fractional function in R

I am working with this function in R:
betaFun = function(x){
if(x == 0){
return(0.5)
}
return( ( 1+exp(x)*(x-1) )/( x*(exp(x)-1) ) )
}
The function is smooth and well defined for every x (at least from a theoretical point of view), and at 0 its limit is 0.5 (you can convince yourself of this using L'Hôpital's rule).
I have the following problem:
close to 0, R computes the values wrongly and I get a blowup.
Here I report the numerical issue:
x = c(1e-4, 1e-6, 1e-8, 1e-10, 1e-12, 1e-13)
sapply(x, betaFun)
[1] 5.000083e-01 5.000442e-01 2.220446e+00 0.000000e+00 0.000000e+00 1.111111e+10
As you can see, the evaluation is pretty weird, in particular the last one.
I thought that I could solve this problem by defining the value at 0 explicitly (as you can see from the code), but that does not help.
Do you know how I can solve this numerical blowup problem?
I need high precision for this function since I have to invert it around 0. I will do that using the nleqslv function from the nleqslv library. Of course the inversion will return wrong solutions if the function has numerical problems.
I think that you are losing accuracy in the evaluation of exp(x)-1 for x close to 0. In C if I evaluate your function as
#include <math.h>
double f2( double x)
{ return (x==0) ? 0.5
: (x*exp(x) - expm1(x))/( x*expm1(x));
}
The problem goes away. Here expm1 is a math library function that computes exp(x) - 1, without losing accuracy for small x. I'm afraid I don't know if R has this, but you'd hope it would.
I think, though, that you would do better to test whether |x| is sufficiently small, rather than whether it is exactly 0.0. The point is that for small enough x both x*exp(x) and expm1(x) will be, as doubles, x, so their difference will be 0. To keep maximum accuracy you may need to add a linear term to the 0.5 you return. I've not worked out precisely what 'sufficiently small' should be, but it's somewhere around 1e-16 I think.
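For what it's worth, base R does provide expm1() (and log1p()), so the C function above can be ported directly. A minimal sketch, where the name betaFun2, the cutoff 1e-5 and the linear coefficient 1/12 (from expanding the function around 0) are assumptions of this sketch rather than tuned values:

```r
# betaFun2 is a hypothetical name; the 1e-5 cutoff is an untuned assumption
betaFun2 <- function(x) {
  small <- abs(x) < 1e-5
  out <- numeric(length(x))
  # very close to zero: the limit 0.5 plus a linear correction term
  out[small] <- 0.5 + x[small] / 12
  # elsewhere: same algebra as f2, with expm1() replacing exp(x) - 1
  xs <- x[!small]
  out[!small] <- (xs * exp(xs) - expm1(xs)) / (xs * expm1(xs))
  out
}

x <- c(1e-4, 1e-6, 1e-8, 1e-10, 1e-12, 1e-13, 0)
betaFun2(x)
```

Unlike the original betaFun, this version is vectorized, so it does not need sapply().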
Your problem is that you take the quotient of two numbers with very small absolute values. Such numbers are only represented to floating point precision.
You don't specify why you need these function values for x values close to zero. One easy option would be coercion to high precision numbers:
library(Rmpfr)
betaFun = function(x){
x <- mpfr(as.character(x), precBits = 256)
#if x is calculated, you should switch to high precision numbers for its calculation
#this step could be removed then
#do calculation with high precision,
#then coerce to normal precision (assuming that is necessary)
ifelse(x == 0, 0.5, as((1 + exp(x) * (x - 1)) / (x * (exp(x) - 1)), "numeric"))
}
x = c(1e-4, 1e-6, 1e-8, 1e-10, 1e-12, 1e-13, 0)
betaFun(x)
#[1] 0.5000083 0.5000001 0.5000000 0.5000000 0.5000000 0.5000000 0.5000000
As you notice, you are encountering the problem near zero. Both the numerator and the denominator have a root at zero. And as the OP mentioned, using L'Hôpital you find that the limit of f(x) there is 1/2.
From a numerical point of view, things go slightly differently. Floating point numbers will always have an error, as not every real number can be represented as a floating point number. For example:
exp(1E-3) -1 = 0.0010005001667083845973138522822409868 # numeric
exp(1/1000)-1 = 0.001000500166708341668055753993058311563076200580... # true
The problem in evaluating numerically exp(1E-3)-1 already starts at the beginning, i.e. 1E-3
1E-3 = x = 0.0010000000000000000208166817117216851
exp(x) = 1.0010005001667083845973138522822409868
exp(x) - 1 = 0.0010005001667083845973138522822409868
1. 1E-3 cannot be represented exactly as a floating point number, and is accurate up to 17 digits.
2. IEEE will give the closest floating point value possible to the true value of x, which already has an error due to (1). Still, exp(x) is only accurate up to 17 digits.
3. By subtracting 1, we get a bunch of zeros in the beginning, and now our result is only accurate up to 14 digits.
So now that we know that we cannot represent everything exactly as a floating point number, you should realize that near zero it becomes a bit awkward: both numerator and denominator become less and less accurate, especially near 1E-13.
numerator_numeric(1E-13) = 1.1102230246251565E-16
numerator_true(1E-13) = 5.00000000000033333333333...E-27
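The loss of accuracy in the numerator is easy to reproduce. A small sketch in base R, where series is just the first two terms of the expansion of the numerator, x^2/2 + x^3/3 (an assumption of this sketch, obtained by expanding 1 + exp(x)*(x - 1) around 0):

```r
x <- 1e-13

# Direct evaluation: 1 and exp(x) * (x - 1) nearly cancel, so only rounding noise is left
naive <- 1 + exp(x) * (x - 1)

# Leading terms of the exact numerator near 0
series <- x^2 / 2 + x^3 / 3

c(naive = naive, series = series)
```

The naive value is off by many orders of magnitude, while the series value agrees with the true numerator quoted above.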
Generally, what you do near such a point is use a Taylor expansion around zero, and the normal function everywhere else:
betaFun = function(x){
if(-1E-1 < x && x < 1E-1){
return(0.5 + x/12. - x^3/720. + x^5/30240.)
}
return( ( 1+exp(x)*(x-1) )/( x*(exp(x)-1) ) )
}
The above expansion is accurate up to 13 digits for x in the small region.
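One way to sanity-check the switch-over point is to compare the two branches where both are still reasonably accurate, for instance at the edge of the small region. A sketch, with taylor and direct as hypothetical helper names:

```r
# The two branches of betaFun as separate, vectorized helpers
taylor <- function(x) 0.5 + x / 12 - x^3 / 720 + x^5 / 30240
direct <- function(x) (1 + exp(x) * (x - 1)) / (x * (exp(x) - 1))

# Points at and inside the edge of the region |x| < 0.1
x <- c(-0.1, -0.05, 0.05, 0.1)

# Largest disagreement between the two branches
max(abs(taylor(x) - direct(x)))
```

If the disagreement at the boundary is too large for the inversion you have in mind, the region can be shrunk or another term of the expansion added.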

Optimize within for loop cannot find function

I've got a function, KozakTaper, that returns the diameter of a tree trunk at a given height (DHT). There's no algebraic way to rearrange the original taper equation to return DHT at a given diameter (4 inches, for my purposes)...enter R! (using 3.4.3 on Windows 10)
My approach was to use a for loop to iterate likely values of DHT (25-100% of total tree height, HT), and then use optimize to choose the one that returns a diameter closest to 4". Too bad I get the error message Error in f(arg, ...) : could not find function "f".
Here's a shortened definition of KozakTaper along with my best attempt so far.
if(Bark=='ob' & SPP=='AB'){
else if(Bark=='ob' & SPP=='RS'){
p = 1.3/HT
z = DHT/HT
Xi = (1 - z^(1/3))/(1 - p^(1/3))
Qi = 1 - z^(1/3)
y = (a0_tap * (DBH^a1_tap) * (HT^a2_tap)) * Xi^(b1_tap * z^4 + b2_tap * (exp(-DBH/HT)) +
b3_tap * Xi^0.1 + b4_tap * (1/DBH) + b5_tap * HT^Qi + b6_tap * Xi + b7_tap*Planted)
HT <- .3048*85 #converting from english to metric (sorry, it's forestry)
for (i in c((HT*.25):(HT+1))) {
d <- KozakTaper(Bark='ob',SPP='RS',DHT=i,DBH=2.54*19,HT=.3048*85,Planted=0)
frame <- na.omit(d)
optimize(f=abs(10.16-d), interval=frame, lower=1, upper=90,
maximum = FALSE,
tol = .Machine$double.eps^0.25)
Eventually I would like this code to iterate through a csv and return i for the best d, which will require some rearranging, but I figured I should make it work for one tree first.
When I print d I get multiple values, so it is iterating through i, but it gets held up at the optimize function.
Defining frame was my most recent tactic, because d returns one NaN at the end, but it may not be the best input for interval. I've tried interval=c((HT*.25):(HT+1)), defining KozakTaper within the for loop, and defining f prior to the optimize, but I get the same error. Suggestions for what part I should target (or other approaches) are appreciated!
**Edit with a follow-up question:
I'm now trying to run this script for each row of a csv, "Input." The row contains the values for KozakTaper, and I've called them with this:
o <- optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
lower=Input$Ht*.25, upper=Input$Ht+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
Input$Opt <- o$minimum
Input$Mht <- Input$Opt/.3048 # converting back to English
Input$Ht and Input$DBH are numeric; Input$Species is factor.
However, I get the error invalid function value in 'optimize'. I get it whether I define "o" or just run optimize. Oddly, when I don't call values from the row but instead use the code from the answer, it tells me object 'HT' not found. I have the awful feeling this is due to some obvious/careless error on my part, but I'm not finding posts about this error with optimize. If you notice what I've done wrong, your explanation will be appreciated!
I'm not an expert on optimize, but I see three issues: 1) your call to KozakTaper does not iterate through the range you specify in the loop. 2) KozakTaper returns a single number, not a vector. 3) You haven't given optimize a function but an expression.
So what is happening is that you are not giving optimize anything to iterate over.
All you should need is this:
optimize(f = function(x) abs(10.16 - KozakTaper(Bark='ob',
SPP='RS', DHT=x, DBH=2.54*19, HT=.3048*85, Planted=0)),
lower=HT*.25, upper=HT+1,
maximum = FALSE, tol = .Machine$double.eps^0.25)
$minimum
[1] 22.67713 ##Hopefully this is the right answer
$objective
[1] 0
optimize will now substitute values for x between lower and upper, trying to minimize the difference.
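Regarding the follow-up: optimize expects scalar bounds and an objective that returns a single number, so passing whole columns such as Input$Ht is one likely cause of the "invalid function value" error. A minimal sketch of a row-by-row version, where toy_taper is a hypothetical stand-in for KozakTaper (its coefficients are not shown here) and height_at_diameter and trees are made-up names:

```r
# Hypothetical stand-in for KozakTaper: diameter shrinks linearly with height
toy_taper <- function(DHT, DBH, HT) DBH * (1 - DHT / HT)

# Height at which one tree reaches the target diameter (scalar in, scalar out)
height_at_diameter <- function(target, DBH, HT) {
  optimize(function(h) abs(target - toy_taper(h, DBH, HT)),
           lower = 0.25 * HT, upper = HT)$minimum
}

# Made-up input data: one row per tree, metric units
trees <- data.frame(DBH = c(48.26, 30), HT = c(25.908, 20))

# mapply() calls optimize once per row, so every call sees scalars only
trees$Opt <- mapply(height_at_diameter, target = 10.16,
                    DBH = trees$DBH, HT = trees$HT)
trees
```

With the real KozakTaper, the extra arguments (Bark, SPP, Planted) would be passed through in the same way.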

R: approximating `e = exp(1)` using `(1 + 1 / n) ^ n` gives absurd result when `n` is large

So, I was just playing around with manually calculating the value of e in R and I noticed something that was a bit disturbing to me.
The value of e using R's exp() command...
exp(1)
#[1] 2.718282
Now, I'll try to manually calculate it using x = 10000
x <- 10000
y <- (1 + (1 / x)) ^ x
#[1] 2.718146
Not quite but we'll try to get closer using x = 100000
x <- 100000
y <- (1 + (1 / x)) ^ x
#[1] 2.718268
Warmer but still a bit off...
x <- 1000000
y <- (1 + (1 / x)) ^ x
#[1] 2.71828
Now, let's try it with a huge one
x <- 5000000000000000
y <- (1 + (1 / x)) ^ x
#[1] 3.035035
Well, that's not right. What's going on here? Am I overflowing the data type and need to use a certain package instead? If so, are there no warnings when you overflow a data type?
You've got a problem with machine precision. As soon as (1 / x) drops below about 1.11e-16 (half of .Machine$double.eps, which is 2.22e-16), 1 + (1 / x) is just 1. The mathematical limit breaks down in finite-precision numerical computations. Your final x in the question is already 5e+15, very close to this brink. Try x <- x * 10, and your y would be 1.
This is neither "overflow" nor "underflow" as there is no difficulty in representing a number as small as 1e-308. It is the problem of the loss of significant digits during floating-point arithmetic. When you do 1 + (1 / x), the bigger x is, the fewer significant digits in the (1 / x) part can be preserved when you add it to 1, and eventually you lose that (1 / x) term altogether.
## valid 16 significant digits
1 + 1.23e-01 = 1.123000000000000|
1 + 1.23e-02 = 1.012300000000000|
... ...
1 + 1.23e-15 = 1.000000000000001|
1 + 1.23e-16 = 1.000000000000000|
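The rounding can be checked directly, and the limit itself can be evaluated more stably by keeping 1 / x away from the addition. A short sketch in base R; using log1p() here is a suggestion of this sketch, not something from the question:

```r
eps <- .Machine$double.eps   # about 2.22e-16, the gap between 1 and the next double

(1 + eps) == 1               # FALSE: eps is just large enough to register
(1 + eps / 2) == 1           # TRUE: anything this small is rounded away

x <- 5e15
naive  <- (1 + 1 / x)^x          # 1 + 1/x is rounded before the power is taken
stable <- exp(x * log1p(1 / x))  # log1p(u) computes log(1 + u) without forming 1 + u

c(naive = naive, stable = stable, e = exp(1))
```

The stable version stays close to exp(1) for this x, while the naive one reproduces the roughly 3.035 seen in the question.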
Any numerical analysis book would tell you the following.
Avoid adding a large number and a small number. In floating-point addition a + b = a * (1 + b / a), if b / a is below about 1.11e-16 (half the machine epsilon of 2.22e-16), then a + b = a. This implies that when adding up a number of positive numbers, it is more stable to accumulate them from the smallest to the largest.
Avoid subtracting one number from another of the same magnitude, or you may get cancellation error. The web page has a classic example of using the quadratic formula.
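The first rule can be illustrated with a short experiment. This sketch uses Reduce() to force one double-precision addition at a time, since R's built-in sum() may accumulate in extended precision on some platforms; the vector x is made up for the illustration:

```r
# One large number followed by many tiny ones
x <- c(1, rep(1e-16, 1e5))

# Largest first: every 1e-16 is rounded away when added to 1
large_first <- Reduce(`+`, x)

# Smallest first: the tiny terms add up to about 1e-11 before meeting the 1
small_first <- Reduce(`+`, rev(x))

large_first == 1   # TRUE
small_first - 1    # about 1e-11
```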
You are also advised to have a read on Approximation to constant "pi" does not get any better after 50 iterations, a question asked a few days after your question. Using a series to approximate an irrational number is numerically stable as you won't get the absurd behavior seen in your question. But the finite number of valid significant digits imposes a different problem: numerical convergence, that is, you can only approximate the target value up to a certain number of significant digits. MichaelChirico's answer using Taylor series would converge after 19 terms, since 1 / factorial(19) is already numerically 0 when added to 1.
Multiplication and division of floating-point numbers do not lose significant digits in this way; they may cause "overflow" or "underflow". However, given the wide range of representable floating-point values (roughly 1e-308 to 1e+308), "overflow" and "underflow" should be rare. The real difficulty is with addition / subtraction, where significant digits can easily be lost. See Can I stably invert a Vandermonde matrix with many small values in R? for an example on matrix computations. It is not impossible to get higher precision, but the work is probably more involved. For example, the OP of the matrix example eventually used GMP (the GNU Multiple Precision Arithmetic Library) and associated R packages to proceed: How to put Rmpfr values into a function in R?
You might also try the Taylor series approximation to exp(1), namely
e^x = \sum_{k = 0}^{\infty} x^k / k!
Thus we can approximate e = e^1 by truncating this sum; in R:
sprintf('%.20f', exp(1))
# [1] "2.71828182845904509080"
sprintf('%.20f', sum(1/factorial(0:10)))
# [1] "2.71828180114638451315"
sprintf('%.20f', sum(1/factorial(0:100)))
# [1] "2.71828182845904509080"

Difference between Error functions

In Mathematica there is an option (see this question) to calculate the difference between two error functions. However, I have not yet found anything similar in R.
I need to calculate things like Erf(1604.041) - Erf(3117.127) and get a non zero value...
You can come close to the result of 4e-1117421 given in the comment by @James.
First, the error function can be computed like this in R:
1 - 2 * pnorm(-sqrt(2) * x)
However, this will give you numerical zeros due to floating point precision. Fortunately, pnorm can return the log of the p-values. You can then exponentiate it using arbitrary precision numbers:
2 * exp(mpfr(pnorm(-sqrt(2) * 1604.041, log.p = TRUE), precBits = 32)) -
2 * exp(mpfr(pnorm(-sqrt(2) * 3117.127, log.p = TRUE), precBits = 32))
#1 'mpfr' number of precision 32 bits
#[1] 4.2826176801e-1117421
(Note that you get only floating point precision for the log-p-values.)
However, I wonder in which kind of application such a precision is necessary. It's essentially a zero value.
Edit: And I've just found out that Rmpfr offers an implementation of the complementary error function. You can simply do this:
erfc(mpfr(3117.127, precBits = 32)) - erfc(mpfr(1604.041, precBits = 32))
#1 'mpfr' number of precision 32 bits
#[1] -4.2854514871e-1117421
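If Rmpfr is not available, the size of the result can still be read off in base R by staying on the log scale. A rough sketch, assuming the difference is dominated by the erfc(1604.041) term (the other term is smaller by millions of orders of magnitude), with log_erfc as a hypothetical helper name:

```r
# log(erfc(x)), using erfc(x) = 2 * pnorm(-sqrt(2) * x) and pnorm's log.p option
log_erfc <- function(x) log(2) + pnorm(-sqrt(2) * x, log.p = TRUE)

# Convert the natural log to base 10, then split into exponent and mantissa
l10  <- log_erfc(1604.041) / log(10)
expo <- floor(l10)
mant <- 10^(l10 - expo)

# Magnitude of Erf(3117.127) - Erf(1604.041) in scientific notation
sprintf("%.3fe%d", mant, expo)
```

Only the first few digits of the mantissa should be trusted, since the log value is itself only known to double precision.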

Calculating log(sum of exp(terms) ) when "terms" are very small

I would like to compute log( exp(A1) + exp(A2) ).
The formula below
log(exp(A1) + exp(A2) ) = log[exp(A1)(1 + exp(A2)/exp(A1))] = A1 + log(1+exp(A2-A1))
is useful when A1 and A2 are large and numerically exp(A1)=Inf (or exp(A2)=Inf).
(this formula is discussed in this thread ->
How to calculate log(sum of terms) from its component log-terms). The formula also holds when the roles of A1 and A2 are swapped.
My concern of this formula is when A1 and A2 are very small. For example, when A1 and A2 are:
A1 <- -40000
A2 <- -45000
then the direct calculation of log(exp(A1) + exp(A2) ) is:
log(exp(A1) + exp(A2))
[1] -Inf
Using the formula above gives:
A1 + log(1 + exp(A2-A1))
[1] -40000
which is the value of A1.
Using the formula above with the roles of A1 and A2 flipped gives:
A2 + log(1 + exp(A1-A2))
[1] Inf
Which of the three values is the closest to the true value of log(exp(A1) + exp(A2))? Is there a robust way to compute log(exp(A1) + exp(A2)) that can be used both when A1, A2 are small and when A1, A2 are large?
Thank you in advance
You should use something with more accuracy to do the direct calculation.
It’s not “useful when [they’re] large”. It’s useful when the difference is very negative.
When x is near 0, then log(1+x) is approximately x. So if A1>A2, we can take your first formula:
log(exp(A1) + exp(A2)) = A1 + log(1+exp(A2-A1))
and approximate it by A1 + exp(A2-A1) (and the approximation will get better as A2-A1 is more negative). Since A2-A1=-5000, this is more than negative enough to make the approximation sufficient.
Regardless, if y is too far from zero (either way) exp(y) will (over|under)flow a double and result in 0 or infinity (this is a double, right? what language are you using?). This explains your answers. But since exp(A2-A1)=exp(-5000) is close to zero, your answer is approximately -40000+exp(-5000), which is indistinguishable from -40000, so that one is correct.
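Putting this together, always factoring out the larger of the two values keeps the argument of exp() non-positive, so the same expression works at both extremes. A minimal sketch in R, assuming a helper named log_add_exp (a hypothetical name):

```r
# log(exp(a) + exp(b)) with the larger argument factored out.
# -abs(a - b) is never positive, so exp() can underflow to 0 but never overflow.
log_add_exp <- function(a, b) pmax(a, b) + log1p(exp(-abs(a - b)))

log_add_exp(-40000, -45000)  # -40000
log_add_exp(40000, 45000)    # 45000
log_add_exp(0, 0)            # log(2)
```

Using log1p() rather than log(1 + ...) also keeps the small correction term accurate when the two arguments are close enough for it to matter.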
With such huge exponent differences, the safest thing you can do without arbitrary precision is:
choose the biggest exponent, call it Am = max(A1,A2)
so: log(exp(A1)+exp(A2)) -> log(exp(Am)) = Am
That is the closest you can get in such a case,
so in your example the result is -40000+delta
where delta is something very small.
If you want to use the second formula, then it all comes down to computing log(1+exp(A)):
if A is large and positive, exp(A) overflows and the result is far from the real thing
if A is very negative, exp(A) underflows and the result truncates to log(1)=0, so you get the same result as above
Your exponent difference is base^5000:
single precision 32bit float can store numbers up to (+/-)2^(+/-128)
double precision 64bit float can store numbers up to (+/-)2^(+/-1024)
so when your base is 10 or e, this is nowhere near enough for what you need.
If you have quadruple precision, that should be enough, but when you start changing the exponent difference again you will quickly get to the same point as now.
[PS] If you need more precision without arbitrary precision,
you can try to create your own number class
that stores numbers internally as number=a^b
where a,b are floats,
but for that you would need to code all the basic functions:
*,/ is easy
+,- is a nightmare, but there could be some approaches/algorithms out there even for this
