Calculating log(sum of exp(terms)) when "terms" are very small

I would like to compute log( exp(A1) + exp(A2) ).
The formula below
log(exp(A1) + exp(A2) ) = log[exp(A1)(1 + exp(A2)/exp(A1))] = A1 + log(1+exp(A2-A1))
is useful when A1 and A2 are large and numerically exp(A1)=Inf (or exp(A2)=Inf).
(This formula is discussed in this thread:
How to calculate log(sum of terms) from its component log-terms.) The formula also holds with the roles of A1 and A2 swapped.
My concern with this formula is the case when A1 and A2 are very small. For example, when A1 and A2 are:
A1 <- -40000
A2 <- -45000
then the direct calculation of log(exp(A1) + exp(A2) ) is:
log(exp(A1) + exp(A2))
[1] -Inf
Using the formula above gives:
A1 + log(1 + exp(A2-A1))
[1] -40000
which is the value of A1.
Using the formula above with the roles of A1 and A2 flipped gives:
A2 + log(1 + exp(A1-A2))
[1] Inf
Which of the three values is the closest to the true value of log(exp(A1) + exp(A2))? Is there a robust way to compute log(exp(A1) + exp(A2)) that can be used both when A1, A2 are small and when A1, A2 are large?
Thank you in advance

You should use something with more accuracy to do the direct calculation.
It’s not “useful when [they’re] large”. It’s useful when the difference is very negative.
When x is near 0, then log(1+x) is approximately x. So if A1>A2, we can take your first formula:
log(exp(A1) + exp(A2)) = A1 + log(1+exp(A2-A1))
and approximate it by A1 + exp(A2-A1) (and the approximation will get better as A2-A1 is more negative). Since A2-A1=-5000, this is more than negative enough to make the approximation sufficient.
Regardless, if y is too far from zero (either way) exp(y) will (over|under)flow a double and result in 0 or infinity (this is a double, right? what language are you using?). This explains your answers. But since exp(A2-A1)=exp(-5000) is close to zero, your answer is approximately -40000+exp(-5000), which is indistinguishable from -40000, so that one is correct.
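Putting this together, a robust helper is easy to write. Here is a minimal R sketch (my own helper, not from the linked thread): always peel off the larger argument so exp() only ever sees non-positive values, and use log1p() for accuracy near zero.
logsumexp2 <- function(a1, a2) {
  m <- max(a1, a2)
  m + log1p(exp(-abs(a1 - a2)))   # exp() argument is always <= 0
}
logsumexp2(-40000, -45000)   # -40000
logsumexp2( 40000,  45000)   #  45000
logsumexp2(0, 0)             #  log(2) = 0.6931472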

With such huge exponent differences, the safest thing you can do without arbitrary precision is:
choose the bigger exponent, Am = max(A1, A2)
so: log(exp(A1)+exp(A2)) -> log(exp(Am)) = Am
that is the closest you can get in such a case
so in your example the result is -40000 + delta
where delta is something very small
If you want to use the second formula, then it all breaks down to computing log(1+exp(A)):
if A is positive, then the result is far from the real thing
if A is negative, then exp(A) truncates to zero, so you get log(1) = 0 and the same result as above
[Notes]
your exponent difference is 5000, so the ratio of the two terms is base^5000
a single precision 32-bit float can store numbers up to about (+/-)2^(+/-128)
a double precision 64-bit float can store numbers up to about (+/-)2^(+/-1024)
so when your base is 10 or e, this is nowhere near enough for what you need
if you have quadruple precision, that should be enough, but once you start increasing the exponent difference again you will quickly get to the same point as now
[PS] if you need more precision without arbitrary precision
you can try to create your own number class
with an internal representation like number = a^b
where a, b are floats
but for that you would need to code all the basic functions
*, / are easy
+, - are a nightmare, but there may be approaches/algorithms out there even for this
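A hedged sketch of that idea in R, with e as the fixed base so each value is stored as its natural log (names like log_mul are mine): * and / become + and - on the logs, while + needs the log1p trick. Subtraction would additionally need sign tracking, which is the "nightmare" part.
log_mul <- function(lx, ly) lx + ly   # log(x*y)
log_div <- function(lx, ly) lx - ly   # log(x/y)
log_add <- function(lx, ly) {         # log(x+y)
  m <- max(lx, ly)
  m + log1p(exp(-abs(lx - ly)))
}
log_add(-40000, -45000)   # -40000, matching Am = max(A1, A2) above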

Related

R: approximating `e = exp(1)` using `(1 + 1 / n) ^ n` gives absurd result when `n` is large

So, I was just playing around with manually calculating the value of e in R and I noticed something that was a bit disturbing to me.
The value of e using R's exp() command...
exp(1)
#[1] 2.718282
Now, I'll try to manually calculate it using x = 10000
x <- 10000
y <- (1 + (1 / x)) ^ x
y
#[1] 2.718146
Not quite but we'll try to get closer using x = 100000
x <- 100000
y <- (1 + (1 / x)) ^ x
y
#[1] 2.718268
Warmer but still a bit off...
x <- 1000000
y <- (1 + (1 / x)) ^ x
y
#[1] 2.71828
Now, let's try it with a huge one
x <- 5000000000000000
y <- (1 + (1 / x)) ^ x
y
#[1] 3.035035
Well, that's not right. What's going on here? Am I overflowing the data type and need to use a certain package instead? If so, are there no warnings when you overflow a data type?
You've got a problem with machine precision. As soon as (1 / x) < 2.22e-16, 1 + (1 / x) is just 1. The mathematical limit breaks down in finite-precision numerical computations. Your final x in the question is already 5e+15, very close to this brink. Try x <- x * 10, and your y would be 1.
This is neither "overflow" nor "underflow" as there is no difficulty in representing a number as small as 1e-308. It is the problem of the loss of significant digits during floating-point arithmetic. When you do 1 + (1 / x), the bigger x is, the fewer significant digits in the (1 / x) part can be preserved when you add it to 1, and eventually you lose that (1 / x) term altogether.
## valid 16 significant digits
1 + 1.23e-01 = 1.123000000000000|
1 + 1.23e-02 = 1.012300000000000|
... ...
1 + 1.23e-15 = 1.000000000000001|
1 + 1.23e-16 = 1.000000000000000|
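A quick way to see this threshold in an R session (assuming standard IEEE doubles):
(1 + 1.23e-15) - 1    # about 1.3e-15: the small term survives, though rounded
(1 + 1e-16) - 1       # 0: anything below about half of 2.22e-16 is absorbed
.Machine$double.eps   # 2.220446e-16, the threshold quoted above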
Any numerical analysis book would tell you the following.
Avoid adding a large number and a small number. In floating-point addition a + b = a * (1 + b / a), so if b / a < 2.22e-16, then a + b = a. This implies that when adding up a number of positive numbers, it is more stable to accumulate them from the smallest to the largest (see the small R illustration after these two rules).
Avoid subtracting one number from another of the same magnitude, or you may get cancellation error. The web page has a classic example of using the quadratic formula.
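For instance, here is a small R illustration of the first rule; the values are arbitrary, and Reduce() is used to force term-by-term double-precision accumulation (sum() may use extended precision internally):
x <- c(1, rep(1e-16, 1e5))
large_first <- Reduce(`+`, x)        # each 1e-16 is absorbed by the leading 1
small_first <- Reduce(`+`, rev(x))   # small terms accumulate before the 1
print(large_first, digits = 15)      # 1
print(small_first, digits = 15)      # 1.00000000001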
You are also advised to have a read on Approximation to constant "pi" does not get any better after 50 iterations, a question asked a few days after your question. Using a series to approximate an irrational number is numerically stable as you won't get the absurd behavior seen in your question. But the finite number of valid significant digits imposes a different problem: numerical convergence, that is, you can only approximate the target value up to a certain number of significant digits. MichaelChirico's answer using Taylor series would converge after 19 terms, since 1 / factorial(19) is already numerically 0 when added to 1.
Multiplication / division between floating-point numbers doesn't cause problems with significant digits; it may cause "overflow" or "underflow". However, given the wide range of representable floating-point values (roughly 1e-308 to 1e+308), "overflow" and "underflow" should be rare. The real difficulty is with addition / subtraction, where significant digits can easily be lost. See Can I stably invert a Vandermonde matrix with many small values in R? for an example on matrix computations. It is not impossible to get higher precision, but the work is probably more involved. For example, the OP of the matrix example eventually used GMP (the GNU Multiple Precision Arithmetic Library) and associated R packages to proceed: How to put Rmpfr values into a function in R?
You might also try the Taylor series approximation to exp(1), namely
e^x = \sum_{k=0}^{\infty} x^k / k!
Thus we can approximate e = e^1 by truncating this sum; in R:
sprintf('%.20f', exp(1))
# [1] "2.71828182845904509080"
sprintf('%.20f', sum(1/factorial(0:10)))
# [1] "2.71828180114638451315"
sprintf('%.20f', sum(1/factorial(0:100)))
# [1] "2.71828182845904509080"

Simplify boolean algebra expression

Starting from the truth table of a one-bit subtractor with three inputs and two outputs, I've obtained the following boolean formula for the second output z2 (the "borrow" passed to the bit to the left):
z2 = x0'x1'x2 + x0'x1x2' + x0'x1x2 + x0x1x2
(where x0' means NOT x0).
Simplifying it : z2 = x2(x0 ⊕ x1)' + x1(x0 ⊕ x2)' + x1x2
which means : z2 = x2(x0 XNOR x1) + x1(x0 XNOR x2) + x1x2
Can I simplify more?
I've tried with z2 = (x0 XNOR x1) + (x0 XNOR x2) + x1x2 but it doesn't do the trick.
Your original expression has 12 variable references and 16 operators. Your simplification has 8 variable references and 7 operators. Here is an expression with 6 variable references and 6 operators:
z2 = x0x1x2 + x0'(x1+x2)
I do not know if that is minimal in any sense.
You ask how I found that expression. I did not start from your simplification, I started from the truth table that you referenced in a comment. I reproduce it here:
x0 x1 x2 | z2
 0  0  0 |  0
 0  0  1 |  1
 0  1  0 |  1
 0  1  1 |  1
 1  0  0 |  0
 1  0  1 |  0
 1  1  0 |  0
 1  1  1 |  1
As I looked at the table searching for patterns, I saw that it looks like a chiasm or a skew-symmetric matrix: if I flip the last column upside down and then take the complements of all items, I end up with the original column. (I don't know the proper term for this kind of symmetry; those are the terms that came to mind.) I tried to encapsulate that symmetry in a logical expression but failed.
That led me to look at the upper and lower halves of that last column. The upper half is mostly ones and the lower half is mostly zeros. Then it struck me that the upper half looks like the truth table for the binary OR operation and the lower half looks like the binary AND operation. The upper half is for x0' and the lower half is for x0, of course. Putting those facts together gave me my expression.
I confirmed that expression by seeing if I could manipulate the original expression into mine. I could, by doing
z2 = x0'x1'x2 + x0'x1x2' + x0'x1x2 + x0x1x2
= x0'(x1'x2 + x1x2' + x1x2) + x0x1x2
= x0'(x1 + x2) + x0x1x2
= x0x1x2 + x0'(x1 + x2)
The transition from the second to third line is, of course, equivalent to recognizing the truth table for binary OR, so this is not much different from my actual discovery method.
That latter method may be more transferable to other problems: factor out a common factor from multiple terms. My actual method was more fun but less transferable. My favorite definition of mathematics is "the study of patterns" which explains why the method was fun.
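If you want to double-check such a simplification mechanically, a brute-force comparison over all eight rows is cheap. A small R sketch, encoding NOT x as 1-x, AND as *, and OR as a+b-ab:
g <- expand.grid(x0 = 0:1, x1 = 0:1, x2 = 0:1)
orig  <- with(g, (1-x0)*(1-x1)*x2 + (1-x0)*x1*(1-x2) + (1-x0)*x1*x2 + x0*x1*x2)
simpl <- with(g, x0*x1*x2 + (1-x0)*(x1 + x2 - x1*x2))
all(orig == simpl)   # TRUE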

How to compute limiting value

I have to compute the value of this expression
(e1*cos(θ) - e2*sin(θ)) / ((cos(θ))^2 - (sin(θ))^2)
Here e1 and e2 are some complex expressions.
When θ approaches PI/4, the denominator approaches zero. But in that case e1 and e2 will also approach the same value. So at PI/4 the value of the expression will be E/sqrt(2), where e1 = e2 = E.
I can do special handling for θ = PI/4, but what about values of θ very, very near PI/4? What is the general strategy for computing such expressions?
This is a bit tricky. You need to know more about how e1 and e2 behave.
First, some notation: I'll use the variable a = theta - pi/4, so that we are interested in a -> 0, and write e1 = E + d1, e2 = E + d2. Let F be your expression times sqrt(2).
In terms of these we have
F = ((E+d1)*(cos(a) - sin(a)) - (E+d2)*(cos(a) + sin(a))) / (-sin(2*a))
  = (-(2*E+d1+d2)*sin(a) + (d1-d2)*cos(a)) / (-2*sin(a)*cos(a))
  = (E+(d1+d2)/2)/cos(a) - (d1-d2)/(2*sin(a))
Assuming that d1->0 and d2->0 as a->0 the first term will tend to E.
However, the second term could tend to anything, or blow up -- for example, if d1 - d2 = sqrt(a), the term behaves like 1/(2*sqrt(a)) and diverges.
We need to assume more, for example that d1-d2 has derivative D at a=0.
In this case we will have
F-> E - D/2 as a->0
To be able to compute F for values of a close to 0 we need to know even more.
One approach is to have code like this:
if (fabs(a) < small) { F = E - D/2 + C*a; } else { /* normal code */ }
So we need to figure out what 'small' and C should be. In part this depends on what (relative) accuracy you require. The most stringent requirement would be that at a = ±small, the difference between the approximation and the normal code should be too small to represent in a double (if that's what you're using). But note that we mustn't make 'small' too small, or there is a danger that, as doubles, the normal code will evaluate to 0/0 at a = ±small.
One approach would be to expand the numerator and denominator of F (or just the second term) as power series (to, say, second order), divide each by a, and then divide these series, keeping terms up to second order. The first term of the result gives you C above, and the second term lets you estimate the error in this approximation, and hence estimate 'small'.
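To make the switchover concrete, here is a hedged R sketch for an invented special case with e1 = E + a and e2 = E (so d1 - d2 = a and D = 1); the cutoff 'small' is chosen loosely here, not by the error analysis just described:
E <- 3
f_direct <- function(theta) {
  e1 <- E + (theta - pi/4); e2 <- E
  (e1*cos(theta) - e2*sin(theta)) / (cos(theta)^2 - sin(theta)^2)
}
f_guarded <- function(theta, small = 1e-6) {
  a <- theta - pi/4
  if (abs(a) < small) (E - 1/2)/sqrt(2)   # the limit F/sqrt(2) with D = 1
  else f_direct(theta)
}
f_direct(pi/4 + 1e-12)    # only a few digits correct: cancellation top and bottom
f_guarded(pi/4 + 1e-12)   # 1.767767, stable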

Mathematical (Arithmetic) representation of XOR

I have spent the last 5 hours searching for an answer. Even though I have found many answers they have not helped in any way.
What I am basically looking for is a mathematical, arithmetic only representation of the bitwise XOR operator for any 32bit unsigned integers.
Even though this sounds really simple, nobody (at least it seems so) has managed to find an answer to this question.
I hope we can brainstorm, and find a solution together.
Thanks.
XOR for any numerical input between 0 and 1, including both ends:
a + b - ab(1 + a + b - ab)
XOR for binary input:
a + b - 2ab or (a-b)²
Derivation
Basic Logical Operators
NOT = (1-x)
AND = x*y
From those operators we can get...
OR = (1-(1-a)(1-b)) = a + b - ab
Note: If a and b are mutually exclusive, then their AND condition will always be zero - from a Venn diagram perspective, this means there is no overlap. In that case, we could write OR = a + b, since a*b = 0 for all values of a and b.
2-Factor XOR
Defining XOR as (a OR b) AND (NOT (a AND b)):
(a OR b) --> (a + b - ab)
(NOT (a AND b)) --> (1 - ab)
AND these conditions together to get...
(a + b - ab)(1 - ab) = a + b - ab(1 + a + b - ab)
Computational Alternatives
If the input values are binary, then the power terms can be ignored to arrive at simplified, computationally equivalent forms.
a + b - ab(1 + a + b - ab) = a + b - ab - a²b - ab² + a²b²
If a variable is binary (either 1 or 0), then we can disregard its powers since 1² = 1 and 0² = 0...
a + b - ab - a²b - ab² + a²b² -- remove powers --> a + b - 2ab
XOR (binary) = a + b - 2ab
Binary also allows other equations to be computationally equivalent to the one above. For instance...
Given (a-b)² = a² + b² - 2ab
If input is binary we can ignore powers, so...
a² + b² - 2ab -- remove powers --> a + b - 2ab
Allowing us to write...
XOR (binary) = (a-b)²
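These binary equivalences are easy to verify exhaustively, for example in R:
g <- expand.grid(a = 0:1, b = 0:1)
with(g, all(a + b - a*b*(1 + a + b - a*b) == a + b - 2*a*b,
            a + b - 2*a*b == (a - b)^2))   # TRUE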
Multi-Factor XOR
XOR = (1 - A*B*C...)(1 - (1-A)(1-B)(1-C)...)
Excel VBA example...
Function ArithmeticXOR(R As Range, Optional EvaluateEquation = True)
    Dim c As Range
    Dim AndOfNots As String
    Dim AndGate As String
    For Each c In R
        AndOfNots = AndOfNots & "*(1-" & c.Address & ")"
        AndGate = AndGate & "*" & c.Address
    Next
    AndOfNots = Mid(AndOfNots, 2)   'drop the leading "*"
    AndGate = Mid(AndGate, 2)
    'Now all we want is (Not(AndGate) AND Not(AndOfNots))
    ArithmeticXOR = "(1 - " & AndOfNots & ")*(1 - " & AndGate & ")"
    If EvaluateEquation Then
        ArithmeticXOR = Application.Evaluate(ArithmeticXOR)
    End If
End Function
Any n of k
These same methods can be extended to allow for any n number out of k conditions to qualify as true.
For instance, out of three variables a, b, and c, if you're willing to accept any two conditions, then you want a&b or a&c or b&c. This can be arithmetically modeled from the composite logic...
(a && b) || (a && c) || (b && c) ...
and applying our translations...
1 - (1-ab)(1-ac)(1-bc)...
This can be extended to any n number out of k conditions. There is a pattern of variable and exponent combinations, but this gets very long; however, you can simplify by ignoring powers for a binary context. The exact pattern is dependent on how n relates to k. For n = k-1, where k is the total number of conditions being tested, the result is as follows:
c1 + c2 + c3 + ... + ck - n*∏
where c1 through ck are all the n-variable combinations and ∏ is the product of all k variables.
For instance, with four conditions a, b, c, and e, "true if any 3 of the 4 are met" would be
abc + abe + ace + bce - 3abce
This makes perfect logical sense since what we have is the additive OR of AND conditions minus the overlapping AND condition.
If you begin looking at n = k-2, k-3, etc., the pattern becomes more complicated because we have more overlaps to subtract out. If this is fully extended to the smallest value, n = 1, then we arrive at nothing more than a regular OR condition.
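As a sanity check, here is a small R verification of the 3-of-4 example over all sixteen binary combinations:
g <- expand.grid(a = 0:1, b = 0:1, c = 0:1, e = 0:1)
f <- with(g, a*b*c + a*b*e + a*c*e + b*c*e - 3*a*b*c*e)
all(f == as.integer(with(g, a + b + c + e) >= 3))   # TRUE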
Thinking about Non-Binary Values and Fuzzy Region
The actual algebraic XOR equation a + b - ab(1 + a + b - ab) is much more complicated than the computationally equivalent binary equations like x + y - 2xy and (x-y)². Does this mean anything, and is there any value to this added complexity?
Obviously, for this to matter, you'd have to care about the decimal values outside of the discrete points (0,0), (0,1), (1,0), and (1,1). Why would this ever matter? Sometimes you want to relax the integer constraint for a discrete problem. In that case, you have to look at the premises used to convert logical operators to equations.
When it comes to translating Boolean logic into arithmetic, your basic building blocks are the AND and NOT operators, with which you can build both OR and XOR.
OR = (1-(1-a)(1-b)(1-c)...)
XOR = (1 - a*b*c...)(1 - (1-a)(1-b)(1-c)...)
So if you're thinking about the decimal region, then it's worth thinking about how we defined these operators and how they behave in that region.
Non-Binary Meaning of NOT
We expressed NOT as 1-x. Obviously, this simple equation works for binary values of 0 and 1, but the thing that's really cool about it is that it also provides the fractional or percent-wise complement for values between 0 and 1. This is useful since NOT is also known as the complement in Boolean logic, and when it comes to sets, NOT refers to everything outside of the current set.
Non-Binary Meaning of AND
We expressed AND as x*y. Once again, it obviously works for 0 and 1, but its effect is a little more arbitrary for values between 0 and 1, where multiplication results in partial truths (decimal values) diminishing each other. It's possible to imagine that you would want to model truth as being averaged or accumulative in this region. For instance, if two conditions are hypothetically half true, is the AND condition only a quarter true (0.5 * 0.5), or is it entirely true (0.5 + 0.5 = 1), or does it remain half true ((0.5 + 0.5) / 2)?

As it turns out, the quarter truth is actually right for conditions that are entirely discrete, where the partial truth represents probability. For instance, will you flip tails (binary condition, 50% probability) both now AND again a second time? The answer is 0.5 * 0.5 = 0.25, or 25% true.

Accumulation doesn't really make sense because it's basically modeling an OR condition (remember OR can be modeled by + when the AND condition is not present, so summation is characteristically OR). The average makes sense if you're looking at agreement and measurements, but it's really modeling a hybrid of AND and OR. For instance, ask two people to say, on a scale of 1 to 10, how much they agree with the statement "It is cold outside". If they both say 5, then the truth of the statement "It is cold outside" is 50%.
Non-Binary Values in Summary
The take away from this look at non-binary values is that we can capture actual logic in our choice of operators and construct equations from the ground up, but we have to keep in mind numerical behavior. We are used to thinking about logic as discrete (binary) and computer processing as discrete, but non-binary logic is becoming more and more common and can help make problems that are difficult with discrete logic easier/possible to solve. You'll need to give thought to how values interact in this region and how to translate them into something meaningful.
"mathematical, arithmetic only representation" are not correct terms anyway. What you are looking for is a function which goes from IxI to I (domain of integer numbers).
Which restrictions would you like to have on this function? Only linear algebra? (+ , - , * , /) then it's impossible to emulate the XOR operator.
If instead you accept some non-linear operators like Max() Sgn() etc, you can emulate the XOR operator with some "simpler" operators.
Given that (a-b)(a-b) quite obviously computes XOR for a single bit, you could construct a function with the floor or mod arithmetic operators to split the bits out, XOR them, and then sum to recombine. Since (a-b)(a-b) = a² - 2·a·b + b², one bit of XOR gives a polynomial with 3 terms.
Without floor or mod, the different bits interfere with each other, so you're stuck with looking for a solution which is a polynomial interpolation treating the input a, b as a single value: a xor b = g(a · 2^32 + b)
The polynomial has 2^64 - 1 terms, though it will be symmetric in a and b as XOR is commutative, so you only have to calculate half of the coefficients. I don't have the space to write it out for you.
I wasn't able to find any solution for 32-bit unsigned integers but I've found some solutions for 2-bit integers which I was trying to use in my Prolog program.
One of my solutions (which uses exponentiation and modulo) is described in this StackOverflow question and the others (some without exponentiation, pure algebra) can be found in this code repository on Github: see different xor0 and o_xor0 implementations.
The nicest XOR representation for 2-bit uints seems to be: xor(A,B) = (A + B*((-1)^A)) mod 4.
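A quick R check of that representation against the built-in bitwXor, over all sixteen 2-bit pairs:
g <- expand.grid(A = 0:3, B = 0:3)
all(with(g, (A + B*(-1)^A) %% 4 == bitwXor(A, B)))   # TRUE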
A solution with +, -, *, / expressed as an Excel formula (where cells A2 to A5 and cells B1 to E1 contain the numbers 0-3), to be inserted in cells B2 to E5:
(1-$A2)*(2-$A2)*(3-$A2)*($A2+B$1)/6 - $A2*(1-$A2)*(3-$A2)*($A2+B$1)/2 + $A2*(1-$A2)*(2-$A2)*($A2-B$1)/6 + $A2*(2-$A2)*(3-$A2)*($A2-B$1)/2 - B$1*(1-B$1)*(3-B$1)*$A2*(3-$A2)*(6-4*$A2)/2 + B$1*(1-B$1)*(2-B$1)*$A2*($A2-3)*(6-4*$A2)/6
It may be possible to adapt and optimize this solution for 32-bit unsigned integers. It's complicated, and it uses logarithms, but it seems to be the most universal one, as it can be used for any integer. Additionally, you'll have to check whether it really works for all number combinations.
I do realize that this is sort of an old topic, but the question is worth answering and yes, this is possible using an algorithm. And rather than go into great detail about how it works, I'll just demonstrate with a simple example (written in C):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

typedef unsigned long number;

number XOR(number a, number b)
{
    number
        result = 0,
        /*
            The following calculation just gives us the highest power of
            two (and thus the most significant bit) for this data type.
        */
        power = pow(2, (sizeof(number) * 8) - 1);
    /*
        Loop until no more bits are left to test...
    */
    while(power != 0)
    {
        result *= 2;
        /*
            The != comparison works just like the XOR operation.
        */
        if((power > a) != (power > b))
            result += 1;
        a %= power;
        b %= power;
        power /= 2;
    }
    return result;
}

int main()
{
    srand(time(0));
    for(;;)
    {
        number
            a = rand(),
            b = rand();
        printf("a = %lu\n", a);
        printf("b = %lu\n", b);
        printf("a ^ b = %lu\n", a ^ b);
        printf("XOR(a, b) = %lu\n", XOR(a, b));
        getchar();
    }
}
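For readers following along in R rather than C, the same bit-peeling idea can be sketched as follows (assuming 32-bit non-negative inputs, which R doubles hold exactly):
xor_arith <- function(a, b) {
  result <- 0
  power <- 2^31                       # highest bit of a 32-bit value
  while (power >= 1) {
    result <- result * 2
    if ((power > a) != (power > b))   # bits differ, just like the C version
      result <- result + 1
    a <- a %% power
    b <- b %% power
    power <- power / 2
  }
  result
}
xor_arith(123456, 987654) == bitwXor(123456, 987654)   # TRUE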
I think this relation might help in answering your question:
A + B = (A XOR B) + 2*(A AND B)
(a-b)*(a-b) is the right answer (for a single bit, as noted above). The only one? I guess so!

Dealing with very small numbers in R

I need to calculate a list of very small numbers such as
(0.1)^1000, 0.2^(1200),
and then normalize them so they will sum up to one
i.e.
a1 = 0.1^1000,
a2 = 0.2^1200
And I want to calculate
a1' = a1/(a1+a2),
a2' = a2/(a1+a2).
I'm running into underflow problems, as I get a1=0. How can I get around this?
Theoretically I could deal with logs, and then log(a1) = 1000*log(0.1) would be a way to represent a1 without underflow problems - but in order to normalize I would need to get
log(a1+a2) - which I can't compute since I can't represent a1 directly.
I'm programming in R - as far as I can tell there is no data type such as Decimal in C# which allows you to get better than double-precision values.
Any suggestions will be appreciated, thanks
Mathematically speaking, one of those numbers will be approximately zero, and the other approximately one. The difference between your numbers is huge, so I'm even wondering if this makes sense.
But to do that in general, you can use the idea from the logspace_add C-function that's underneath the hood of R. One can define logxpy ( =log(x+y) ) when lx = log(x) and ly = log(y) as :
logxpy <- function(lx,ly) max(lx,ly) + log1p(exp(-abs(lx-ly)))
Which means that we can use :
> la1 <- 1000*log(0.1)
> la2 <- 1200*log(0.2)
> exp(la1 - logxpy(la1,la2))
[1] 5.807714e-162
> exp(la2 - logxpy(la1,la2))
[1] 1
This function can be called recursively as well if you have more numbers. Mind you, 1 is still 1, and not 1 minus 5.807...e-162 . If you really need more precision and your platform supports long double types, you could code everything in eg C or C++, and return the results later on. But if I'm right, R can - for the moment - only deal with normal doubles, so ultimately you'll lose the precision again when the result is shown.
EDIT :
to do the math for you :
log(x+y) = log(exp(lx)+exp(ly))
= log( exp(lx) * (1 + exp(ly-lx) )
= lx + log ( 1 + exp(ly - lx) )
Now you just take the largest as lx, and then you come at the expression in logxpy().
EDIT 2: Why take the maximum? Easy: to ensure that the number passed to exp() is negative. If lx-ly gets too big, then exp(lx-ly) would return Inf. That's not a correct result. exp(ly-lx) would return 0 instead, which allows for a far better result:
Say lx=1 and ly=1000, then :
> 1+log1p(exp(1000-1))
[1] Inf
> 1000+log1p(exp(1-1000))
[1] 1000
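To handle more than two numbers, you can fold logxpy over a vector of log-values, e.g. with base R's Reduce:
logsumexp_all <- function(lv) Reduce(logxpy, lv)
la <- c(1000*log(0.1), 1200*log(0.2))
exp(la - logsumexp_all(la))   # 5.807714e-162 1.000000e+00, as above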
The Brobdingnag package deals with very large or small numbers, essentially wrapping Joris's answer into a convenient form.
library(Brobdingnag)
a1 <- as.brob(0.1)^1000
a2 <- as.brob(0.2)^1200
a1_dash <- a1 / (a1 + a2)
a2_dash <- a2 / (a1 + a2)
as.numeric(a1_dash)   # 5.807714e-162, matching the log-space result above
as.numeric(a2_dash)   # 1
Try the arbitrary precision packages:
Rmpfr "R MPFR - Multiple Precision Floating-Point Reliable"
Ryacas "R Interface to the 'Yacas' Computer Algebra System" - may also be able to do arbitrary precision.
Maybe you can treat a1 and a2 as fractions. In your example, with
a1 = (a1num/a1denom)^1000 # 1/10
a2 = (a2num/a2denom)^1200 # 1/5
you would arrive at
a1' = (a1num^1000 * a2denom^1200)/(a1num^1000 * a2denom^1200 + a1denom^1000 * a2num^1200)
a2' = (a1denom^1000 * a2num^1200)/(a1num^1000 * a2denom^1200 + a1denom^1000 * a2num^1200)
which can be computed using the gmp package:
library(gmp)
a1_dash <- as.double(pow.bigz(5,1200) / (pow.bigz(5,1200) + pow.bigz(10,1000)))   # a1' = a1/(a1+a2)
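For completeness, the companion quantity a2' (which equals 1 - a1') under the same approach:
a2_dash <- as.double(pow.bigz(10,1000) / (pow.bigz(5,1200) + pow.bigz(10,1000)))
a2_dash   # 1, since a1' is ~5.8e-162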
