How to refine the result of a floating-point division?

I have an algorithm for calculating the floating-point square root and divide using the Newton-Raphson method. My results are not fully accurate and are sometimes off by 1 ulp.
I was wondering if there is a refinement algorithm for floating-point division to get the final bits of accuracy. I use the Tuckerman test for square root, but is there a similar algorithm for division? Or can the Tuckerman test be adapted for division?
I tried using this algorithm too, but didn't get fully accurate results:
z = divisor
r_temp = divisor * q
r = dividend - r_temp
result_temp = r * z
q = q + result_temp

One practical way of correctly rounding the result of iterative division is to produce a preliminary quotient to within one ulp of the mathematical result, then use the exactly-computed residual to compute the final result.
The tool of choice for the exact computation of residuals is the fused multiply-add (FMA) operation. Much of the foundational work of this approach (both in terms of the mathematics and of practical implementations) is due to Peter Markstein and was later refined by other researchers. Markstein's results are nicely summarized in his book:
Peter Markstein, IA-64 and Elementary Functions: Speed and Precision. Prentice-Hall 2000.
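For concreteness, the exact residual computation that FMA makes possible can be written as follows (a minimal sketch of my own, not from the book; the helper name is illustrative):

#include <math.h>

/* residual of a preliminary quotient q ~= a/b: fmaf computes a - b*q with a
   single rounding, and for q within one ulp of a/b the residual is in fact
   exactly representable, so it is obtained without any rounding error */
static float residual (float a, float b, float q)
{
    return fmaf (-b, q, a);
}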
A straightforward way to achieve correctly-rounded division with Markstein's approach is to first compute a correctly-rounded reciprocal, then compute the correctly-rounded quotient by multiplying it with the dividend, followed by the final residual-based rounding step.
The residual can be used to compute the final rounded result directly, which is the technique used by Markstein. (I originally used that direct computation for the quotient rounding in the code below, but noticed that it resulted in an incorrectly rounded result in about one out of 10¹¹ divisions, and replaced it with another instance of the comparison-and-select idiom.) Alternatively, the residual may be used as part of a two-sided comparison-and-select process somewhat akin to Tuckerman rounding; this is the variant shown for both the reciprocal rounding and the quotient rounding in the code below.
There is one caveat with regard to the reciprocal computation. Many commonly used iterative approaches (including the one I used below), when combined with Markstein's rounding technique, deliver an incorrect result if the mantissa of the divisor consists entirely of 1-bits.
One way of getting around this is to treat this case specially. In the code below I instead opted for a two-sided comparison-and-select approach, which also allows errors slightly larger than one ulp prior to rounding and thus eliminates the need to use FMA in the reciprocal iteration itself.
Please note that I omitted the handling of sub-normal results in the C code below to keep the code concise and easy to follow. I have limited myself to standard C library functions for tasks like extracting parts of floating-point operands, assembling floating-point numbers, and applying one-ulp increments and decrements. Most platforms will offer machine-specific options with higher performance for these.
#include <math.h>

float my_divf (float a, float b)
{
    float q, r, ma, mb, e, s, t;
    int ia, ib;
    if (!isnanf (a+b) && !isinff (a) && !isinff (b) && (b != 0.0f)) {
        /* normal cases: remove sign, split args into exponent and mantissa */
        ma = frexpf (fabsf (a), &ia);
        mb = frexpf (fabsf (b), &ib);
        /* minimax polynomial approximation to 1/mb for mb in [0.5,1) */
        r = -3.54939341e+0f;
        r = r * mb + 1.06481802e+1f;
        r = r * mb - 1.17573657e+1f;
        r = r * mb + 5.65684575e+0f;
        /* apply one iteration with cubic convergence */
        e = 1.0f - mb * r;
        e = e * e + e;
        r = e * r + r;
        /* round reciprocal to nearest-or-even */
        e = fmaf (-mb, r, 1.0f);                 // residual of 1st candidate
        s = nextafterf (r, copysignf (2.0f, e)); // bump or dent
        t = fmaf (-mb, s, 1.0f);                 // residual of 2nd candidate
        r = (fabsf (e) < fabsf (t)) ? r : s;     // candidate with smaller residual
        /* compute preliminary quotient from correctly-rounded reciprocal */
        q = ma * r;
        /* round quotient to nearest-or-even */
        e = fmaf (-mb, q, ma);                   // residual of 1st candidate
        s = nextafterf (q, copysignf (2.0f, e)); // bump or dent
        t = fmaf (-mb, s, ma);                   // residual of 2nd candidate
        q = (fabsf (e) < fabsf (t)) ? q : s;     // candidate with smaller residual
        /* scale back into result range */
        r = ldexpf (q, ia - ib);
        if (r < 1.17549435e-38f) {
            /* sub-normal result, left as an exercise for the reader */
        }
        /* merge in sign of quotient */
        r = copysignf (r, a * b);
    } else {
        /* handle special cases */
        if (isnanf (a) || isnanf (b)) {
            r = a + b;
        } else if (b == 0.0f) {
            r = (a == 0.0f) ? (0.0f / 0.0f) : copysignf (1.0f / 0.0f, a * b);
        } else if (isinff (b)) {
            r = (isinff (a)) ? (0.0f / 0.0f) : copysignf (0.0f, a * b);
        } else {
            r = a * b;
        }
    }
    return r;
}
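One way to gain confidence in a routine like this is to compare it against the hardware division, which IEEE-754 requires to be correctly rounded. Below is a minimal test harness of my own (not part of the original answer); it assumes my_divf as defined above and compares bit patterns so the check is exact:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

float my_divf (float a, float b);  /* defined above */

int main (void)
{
    srand (12345);
    for (long i = 0; i < 10000000; i++) {
        /* random positive, normal operands; specials are exercised separately */
        float a = (float)(rand () % 1000000 + 1) / (float)(rand () % 1000000 + 1);
        float b = (float)(rand () % 1000000 + 1) / (float)(rand () % 1000000 + 1);
        float ref = a / b;            /* correctly-rounded reference */
        float res = my_divf (a, b);
        unsigned int ur, us;
        memcpy (&ur, &ref, sizeof ur);  /* compare bit patterns */
        memcpy (&us, &res, sizeof us);
        if (ur != us) {
            printf ("mismatch: a=%15.8e b=%15.8e res=%15.8e ref=%15.8e\n",
                    a, b, res, ref);
        }
    }
    return 0;
}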

Related

Constraining the solutions for linear equations

I'm searching for a way to solve a system of linear equations. Specifically, 8 equations with a total of 16 unknown values.
Each unknown value (w[0...15]) is a 32-bit binary value which corresponds to 4 ASCII characters written over 8 bits each.
I've tried writing this system of linear equations as a single matrix equation.
Right now, using the Eigen linear algebra library, I get my 16 solutions (w[0...15]), but all of them are either decimal or null values, which is not what I need. All 16 solutions need to be the equivalent of 4 ASCII characters under their binary representation, meaning integers between 48 and 57 (ASCII for '0' to '9'), 65 and 90 (ASCII for 'A' to 'Z'), or 97 and 122 (ASCII for 'a' to 'z').
I've found a solution to this problem using something called box constraints; an example using Python's lsq_linear function, which lets the user specify bounds, shows the idea. It seems Eigen does not let the user specify bounds in its decomposition methods.
Therefore, my question is: how do you get a similar result in C++ using a linear algebra library? Or is there a better way to solve such systems of equations without writing them as a single matrix equation?
Thanks in advance.
Since you're working with linear equations over Z/2³²Z, integer linear programming (as you tagged the question) may be a solution, and algorithms that are inherently floating-point are not appropriate. Box constraints are not enough; they won't force the variables to take on integer values. Also, the model shown in the question does not take into account that multiplying and adding in Z/2³²Z can wrap, which excludes many potential solutions (or perhaps that is intended?) and may make the instance accidentally infeasible when it was intended to be solvable.
ILP can model equations over Z/2³²Z relatively directly (using integer variables between 0 and 2³² and some unconstrained additional variables scaled by 2³² to "absorb" the wraparound), but it tends to really struggle with that kind of formulation - I would say it's one of the worst cases for an ILP solver without getting into the "intentionally difficult" cases. A more indirect model with 32 boolean variables per unknown is also possible, but this leads to constraints with very large constants, and ILP solvers tend to struggle with them too. Overall I do not recommend using ILP for this problem.
What I would recommend for this is an SMT solver that offers the bitvector theory, or alternatively a pseudo-boolean solver or plain SAT solver (which would leave the grunt work of implementing boolean circuits and converting them to CNF to you, instead of having it built into the solver).
If you have more unknowns than equations, your system is guaranteed to be underdetermined: the rank of an 8 x 16 matrix is at most 8, so you have at least 8 degrees of freedom.
Furthermore, if you have bounds on your variables, i.e. mixed equalities and inequalities, then your problem is better posed as a linear program. You can set a dummy objective function c[i] = 0. You could use GLPK, but that is a very generic solution. If you want a small code snippet, you can probably find a toy implementation of the simplex method that will satisfy your needs.
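For illustration, here is a toy sketch of a box-constrained LP in GLPK (my own example with made-up numbers, not the asker's actual system): two variables x and y with x + y = 100, both bounded to [48, 122], and a zero objective. Note this demonstrates bounds only; as discussed above, an LP will not force integer values.

#include <stdio.h>
#include <glpk.h>

int main (void)
{
    glp_prob *lp = glp_create_prob();
    glp_set_obj_dir(lp, GLP_MIN);                   /* dummy objective (all zero coefficients) */
    glp_add_rows(lp, 1);
    glp_set_row_bnds(lp, 1, GLP_FX, 100.0, 100.0);  /* x + y = 100 */
    glp_add_cols(lp, 2);
    glp_set_col_bnds(lp, 1, GLP_DB, 48.0, 122.0);   /* 48 <= x <= 122 */
    glp_set_col_bnds(lp, 2, GLP_DB, 48.0, 122.0);   /* 48 <= y <= 122 */
    int    ia[] = {0, 1, 1};                        /* GLPK arrays are 1-based */
    int    ja[] = {0, 1, 2};
    double ar[] = {0, 1.0, 1.0};                    /* row 1 = [1 1] */
    glp_load_matrix(lp, 2, ia, ja, ar);
    glp_simplex(lp, NULL);
    printf("x = %g, y = %g\n", glp_get_col_prim(lp, 1), glp_get_col_prim(lp, 2));
    glp_delete_prob(lp);
    return 0;
}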
I went for an SMT solver as suggested by @harold, specifically the CVC4 SMT solver. Here is the code I've written in C++, answering my question about finding the 16 solutions (w[0...15]) for a system of 8 equations, constrained to be ASCII characters. I have one last question though: what are pushing and popping for? (slv.push() and slv.pop())
#include <iostream>
#include <cvc4/cvc4.h>
using namespace std;
using namespace CVC4;

// bit-vector widths used below (defined here so the snippet is self-contained)
const unsigned size_8 = 8;
const unsigned size_32 = 32;

// NOTE: m_unknowns (the per-equation coefficients) and globalSums (the
// right-hand side, a std::bitset in my code) come from the surrounding
// project and are assumed to be defined elsewhere.

int main() {
    // 1. initialize a CVC4 BitVector SMT solver
    ExprManager em;
    SmtEngine slv(&em);
    slv.setOption("incremental", true);    // enable incremental solving
    slv.setOption("produce-models", true); // enable models
    slv.setLogic("QF_BV");                 // set the bitvector theory logic
    Type bitvector8 = em.mkBitVectorType(size_8); // create an 8-bit wide bit-vector type (4 x 8-bit = 32-bit)
    // 2. create the SMT solver variables
    Expr w[16][4]; // w[0...15] where each w corresponds to 4 ascii characters
    for (int i = 0; i < 16; ++i) {
        for (int j = 0; j < 4; ++j) {
            // a. define w[i] (four ascii characters per w[i])
            w[i][j] = em.mkVar("w" + to_string(i) + to_string(j), bitvector8);
            // b. constrain w[i][0...3] to be an ascii character
            // - digit (0-9) constraint
            // ascii lower bound digit constraint (bit-vector unsigned greater than or equal)
            Expr digit_lower = em.mkExpr(kind::BITVECTOR_UGE, w[i][j], em.mkConst(BitVector(size_8, Integer(48))));
            // ascii upper bound digit constraint (bit-vector unsigned less than or equal; '9' is 57)
            Expr digit_upper = em.mkExpr(kind::BITVECTOR_ULE, w[i][j], em.mkConst(BitVector(size_8, Integer(57))));
            Expr digit_constraint = em.mkExpr(kind::AND, digit_lower, digit_upper);
            // - uppercase alphabetic character (A-Z) constraint
            //   (added here to match the ranges stated in the question)
            Expr upper_lower = em.mkExpr(kind::BITVECTOR_UGE, w[i][j], em.mkConst(BitVector(size_8, Integer(65))));
            Expr upper_upper = em.mkExpr(kind::BITVECTOR_ULE, w[i][j], em.mkConst(BitVector(size_8, Integer(90))));
            Expr upper_constraint = em.mkExpr(kind::AND, upper_lower, upper_upper);
            // - lowercase alphabetic character (a-z) constraint
            // ascii lower bound alpha constraint (bit-vector unsigned greater than or equal)
            Expr alpha_lower = em.mkExpr(kind::BITVECTOR_UGE, w[i][j], em.mkConst(BitVector(size_8, Integer(97))));
            // ascii upper bound alpha constraint (bit-vector unsigned less than or equal)
            Expr alpha_upper = em.mkExpr(kind::BITVECTOR_ULE, w[i][j], em.mkConst(BitVector(size_8, Integer(122))));
            Expr alpha_constraint = em.mkExpr(kind::AND, alpha_lower, alpha_upper);
            Expr ascii_constraint = em.mkExpr(kind::OR, digit_constraint, upper_constraint, alpha_constraint);
            slv.assertFormula(ascii_constraint);
        }
    }
    // 3. encode the 8 equations
    for (int i = 0; i < 8; ++i) {
        // a. build the multiplication part (index * w[i])
        vector<Expr> left_mult_hand;
        for (int j = 0; j < 16; ++j) {
            vector<Expr> inner_wj;
            for (int k = 0; k < 4; ++k) inner_wj.push_back(w[j][k]);
            Expr wj = em.mkExpr(kind::BITVECTOR_CONCAT, inner_wj);
            Expr index = em.mkConst(BitVector(size_32, Integer(m_unknowns[j])));
            left_mult_hand.push_back(em.mkExpr(kind::BITVECTOR_MULT, index, wj));
        }
        // b. sum each index * w[i]
        slv.push();
        Expr left_hand = em.mkExpr(kind::BITVECTOR_PLUS, left_mult_hand);
        Expr result = em.mkConst(BitVector(size_32, Integer(globalSums.to_ulong())));
        Expr assumption = em.mkExpr(kind::EQUAL, left_hand, result);
        slv.assertFormula(assumption);
        // c. check for satisfiability
        cout << "Result from CVC4 is: " << slv.checkSat(em.mkConst(true)) << endl << endl;
        slv.pop();
    }
    return 0;
}

What is the runtime of this recursive code?

I am wondering what the runtime for the following recursive function would be:
int f(int n) {
    if (n <= 1) {
        return 1;
    }
    return f(n-1) + f(n-1);
}
If you think of it as a call tree, each node would have 2 branches. The number of nodes in that call tree would be 2^0 + 2^1 + 2^2 + 2^3 + ... + 2^n, which is equivalent to 2^(n+1) - 1. So the time complexity of this function should be O(2^(n+1)-1), assuming that each call does a constant amount of work, O(1) - am I correct? According to the book where I have this example from, the time complexity is O(2^n). I am confused - what am I missing?
Big-O Notation ignores constant factors and lower order terms. So O(2^(n+1)-1) is equivalent to O(2^n).
O(2^(n+1)-1) = O(2^n * 2^1 - 1)
We drop the constant factor of 2^1, and then we drop the lower order term of -1 as 2^n grows asymptotically faster.
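A quick empirical check makes this concrete. The following sketch (my own illustration, not from the book) instruments f with a call counter; the count roughly doubles each time n grows by one, which is exactly Θ(2^n) growth:

#include <stdio.h>

static long calls = 0;   /* counts every invocation of f */

int f(int n) {
    calls++;
    if (n <= 1) {
        return 1;
    }
    return f(n-1) + f(n-1);
}

int main(void) {
    long prev = 0;
    for (int n = 1; n <= 12; n++) {
        calls = 0;
        f(n);
        /* the ratio to the previous count approaches 2, as expected for O(2^n) */
        printf("n=%2d calls=%6ld ratio=%.2f\n", n, calls,
               prev ? (double)calls / prev : 0.0);
        prev = calls;
    }
    return 0;
}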

Julia on Float versus Octave on Float

Version: v"0.5.0-dev+1259"
Context: the goal is to calculate the Rademacher penalty bound for a given number of data points n, with respect to the VC dimension dvc and a probability expressed by delta.
Please consider Julia code:
#Growth function on any n points with respect to VC-dimension
function mh(n, dvc)
    if n <= dvc
        2^n #A
    else
        n^dvc #B
    end
end

#Rademacher penalty bound
function rademacher_penalty_bound(n::Int, dvc::Int, delta::Float64)
    sqrt((2.0*log(2.0*n*mh(n,dvc)))/n) + sqrt((2.0/n)*log(1.0/delta)) + 1.0/n
end
and the equivalent code in Octave/Matlab:
%Growth function on n points for a given VC dimension (dvc)
function md = mh(n, dvc)
    if n <= dvc
        md = 2^n;
    else
        md = n^dvc;
    end
end

%Rademacher penalty bound
function epsilon = rademacher_penalty_bound (n, dvc, delta)
    epsilon = sqrt ((2*log(2*n*mh(n,dvc)))/n) + sqrt((2/n)*log(1/delta)) + 1/n;
end
Problem:
When I start testing it I receive the following results:
Julia first:
julia> rademacher_penalty_bound(50, 50, 0.05) #50 points
1.619360057204432
julia> rademacher_penalty_bound(500, 50, 0.05) #500 points
ERROR: DomainError:
[inlined code] from math.jl:137
in rademacher_penalty_bound at none:2
in eval at ./boot.jl:264
Now Octave:
octave:17> rademacher_penalty_bound(50, 50, 0.05)
ans = 1.6194
octave:18> rademacher_penalty_bound(500, 50, 0.05)
ans = 1.2387
Question: According to Noteworthy differences from MATLAB, I think I followed the rule of thumb ("literal numbers without a decimal point (such as 42) create integers instead of floating point numbers..."). The code crashes when the number of points exceeds 51 (line #B in mh). Can someone with more experience look at the code and say what I should improve/change?
While BigInt and BigFloat will work here, they're serious overkill. The real issue is that you're doing integer exponentiation in Julia and floating-point exponentiation in Octave/Matlab. So you just need to change mh to use floats instead of integers for exponents:
mh(n, dvc) = n <= dvc ? 2^float(n) : n^float(dvc)
rademacher_penalty_bound(n, dvc, δ) =
    √((2log(2n*mh(n,dvc)))/n) + √(2log(1/δ)/n) + 1/n
With these definitions, you get the same results as Octave/Matlab:
julia> rademacher_penalty_bound(50, 50, 0.05)
1.619360057204432
julia> rademacher_penalty_bound(500, 50, 0.05)
1.2386545010981596
In Octave/Matlab, even when you input a literal without a decimal point, you still get a float - you have to do an explicit cast to get an integer type. Also, exponentiation in Octave/Matlab always converts to float first. In Julia, an integer raised to an integer power stays an integer (x^2 is computed like x*x), so no conversion to floating point takes place.
Although BigInt and BigFloat are excellent tools when they are necessary, they should usually be avoided, since they are overkill and slow.
In this case, the problem is indeed the difference between Octave, that treats everything as a floating-point number, and Julia, that treats e.g. 2 as an integer.
So the first thing to do is to use floating-point numbers in Julia too:
function mh(n, dvc)
    if n <= dvc
        2.0 ^ n
    else
        Float64(n) ^ dvc
    end
end
This already helps: e.g. mh(500, 50) now works.
However, the correct solution for this problem is to look at the code more carefully, and realise that the function mh only occurs inside a log:
log(2.0*n*mh(n,dvc))
We can use the laws of logarithms to rewrite this as
log(2.0*n) + log_mh(n, dvc)
where log_mh is a new function, which returns the logarithm of the result of mh. Of course, this should not be written directly as log(mh(n, dvc)), but is rather a new function:
function log_mh(n, dvc)
    if n <= dvc
        n * log(2.0)
    else
        dvc * log(n)
    end
end
In this way, you will be able to use huge numbers without overflow.
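Putting the pieces together, the bound can then be computed via log_mh. This rewritten version is my own sketch along the lines described above, not code from the original answer:

function rademacher_penalty_bound(n, dvc, delta)
    sqrt(2.0*(log(2.0*n) + log_mh(n, dvc))/n) + sqrt((2.0/n)*log(1.0/delta)) + 1.0/n
end

rademacher_penalty_bound(500, 50, 0.05)  # ≈ 1.2387, matching the Octave result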
I don't know whether it is acceptable to get results as BigFloat, but in the Julia part you can use BigInt anyway:
#Growth function on any n points with respect to VC-dimension
function mh(n, dvc)
    if n <= dvc
        (BigInt(2))^n #A
    else
        n^dvc #B
    end
end

#Rademacher penalty bound
function rademacher_penalty_bound(n::BigInt, dvc::BigInt, delta::Float64)
    sqrt((2.0*log(2.0*n*mh(n,dvc)))/n) + sqrt((2.0/n)*log(1.0/delta)) + 1.0/n
end
rademacher_penalty_bound(BigInt(500), BigInt(500), 0.05)
# => 1.30055251010957621105182244420.....
Because by default a Julia Int is a "machine-size" integer, a 64-bit integer for the common x86-64 platform, whereas Octave uses floating point. So in Julia mh(500,50) overflows. You can fix it by replacing mh() as follows:
function mh(n, dvc)
    n2 = BigInt(n) # Or n2 = Float64(n)
    if n <= dvc
        2^n2 #A
    else
        n2^dvc #B
    end
end

A numerical library which uses a parallelized algorithm to do one-dimensional integration?

Is there a numerical library which can use a parallelized algorithm to do one-dimensional integration (global adaptive method)? The infrastructure of my code means that I cannot run multiple numerical integrations in parallel, so I have to use a parallelized algorithm to speed things up.
Thanks!
The NAG C numerical library does have a parallel version of adaptive quadrature. Their trick is to require from the user the following function:
void (*f)(const double x[], Integer nx, double fv[], Integer *iflag, Nag_Comm *comm)
Here the function f evaluates the integrand at nx abscissa points given by the vector x[]. This is where parallelization comes in, because you can use a parallel_for (implemented with OpenMP, for example) to evaluate f at those points concurrently. The integrator itself is single-threaded.
NAG is a very expensive library, but if you code the integrator yourself using, for example, Numerical Recipes, it is not difficult to modify a serial implementation to create a parallel adaptive integrator using NAG's idea.
I can't reproduce the Numerical Recipes code to show where the modifications are necessary, due to licensing restrictions, so let's take the simplest example of the trapezoidal rule, where the implementation is quite simple and well known. The simplest way to create an adaptive method using the trapezoidal rule is to calculate the integral on a grid of points, then double the number of abscissa points and compare the results. If the result changes by less than the requested accuracy, then there is convergence.
At each step, the trapezoidal rule can be computed using the following generic implementation
double trapezoidal( double (*f)(double x), double a, double b, int n)
{
    double h = (b - a)/n;
    double s = 0.5 * h * (f(a) + f(b));
    for( int i = 1; i < n; ++i ) s += h * f(a + i*h);
    return s;
}
Now you can make the following changes to implement NAG's idea:
double trapezoidal( void (*f)( double x[], int nx, double fv[] ), double a, double b, int n)
{
    double h = (b - a)/n;
    double x[n+1];
    double fv[n+1];
    for( int i = 0; i < n; ++i ) x[i] = a + i * h;  /* x[0] = a */
    x[n] = b;
    f(x, n+1, fv); // inside f, use parallel_for to evaluate the integrand at x[i], i=0..n
    double s = 0.5 * h * ( fv[0] + fv[n] );
    for( int i = 1; i < n; ++i ) s += h * fv[i];
    return s;
}
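For completeness, the doubling strategy described above can be driven by a loop like the following (a minimal sketch of my own; the function name, starting grid size, and convergence test are illustrative, and it calls the NAG-style trapezoidal defined above):

#include <math.h>

double trapezoidal( void (*f)( double x[], int nx, double fv[] ), double a, double b, int n);

double adaptive_trapezoidal( void (*f)( double x[], int nx, double fv[] ),
                             double a, double b, double tol )
{
    int n = 16;                                  /* initial number of panels */
    double prev = trapezoidal(f, a, b, n);
    for (;;) {
        n *= 2;                                  /* double the abscissa count */
        double cur = trapezoidal(f, a, b, n);
        if (fabs(cur - prev) < tol) return cur;  /* converged */
        prev = cur;
    }
}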
This procedure, however, will only speed up your code if the integrand is very expensive to compute. Otherwise, you should parallelize your code at a higher level and not inside the integrator.
Why not simply implement a wrapper around a single-threaded algorithm that dispatches integrals of subdivisions of the bounds to different threads and then adds them together at the end? E.g.
thread 0: i0 = integral(x0, (x0+x1)/2)
thread 1: i1 = integral((x0+x1)/2, x1)
i = i0 + i1
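In OpenMP terms, such a wrapper could look roughly like the sketch below. This is my own illustration: integral() stands for some existing single-threaded integration routine, and, as the question points out, a fixed subdivision like this can defeat a globally adaptive strategy:

#include <omp.h>

/* hypothetical single-threaded integrator, assumed to exist elsewhere */
double integral(double (*f)(double), double a, double b);

double parallel_integral(double (*f)(double), double a, double b, int nparts)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < nparts; i++) {
        double lo = a + (b - a) *  i      / nparts;
        double hi = a + (b - a) * (i + 1) / nparts;
        sum += integral(f, lo, hi);   /* each thread integrates one slice */
    }
    return sum;
}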

Mathematical (Arithmetic) representation of XOR

I have spent the last 5 hours searching for an answer. Even though I have found many answers, they have not helped in any way.
What I am basically looking for is a mathematical, arithmetic-only representation of the bitwise XOR operator for any 32-bit unsigned integers.
Even though this sounds really simple, nobody (at least it seems so) has managed to find an answer to this question.
I hope we can brainstorm, and find a solution together.
Thanks.
XOR any numerical input between 0 and 1 including both ends
a + b - ab(1 + a + b - ab)
XOR binary input
a + b - 2ab or (a-b)²
Derivation
Basic Logical Operators
NOT = (1-x)
AND = x*y
From those operators we can get...
OR = (1-(1-a)(1-b)) = a + b - ab
Note: If a and b are mutually exclusive, then their AND condition will always be zero - from a Venn diagram perspective, this means there is no overlap. In that case, we could write OR = a + b, since a*b = 0 for all values of a and b.
2-Factor XOR
Defining XOR as (a OR b) AND (NOT (a AND b)):
(a OR b) --> (a + b - ab)
(NOT (a AND b)) --> (1 - ab)
AND these conditions together to get...
(a + b - ab)(1 - ab) = a + b - ab(1 + a + b - ab)
Computational Alternatives
If the input values are binary, then the power terms can be ignored to arrive at simplified, computationally equivalent forms.
a + b - ab(1 + a + b - ab) = a + b - ab - a²b - ab² + a²b²
If x is binary (either 1 or 0), then we can disregard powers since 1² = 1 and 0² = 0...
a + b - ab - a²b - ab² + a²b² -- remove powers --> a + b - 2ab
XOR (binary) = a + b - 2ab
Binary also allows other equations to be computationally equivalent to the one above. For instance...
Given (a-b)² = a² + b² - 2ab
If input is binary we can ignore powers, so...
a² + b² - 2ab -- remove powers --> a + b - 2ab
Allowing us to write...
XOR (binary) = (a-b)²
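As a quick sanity check, a few lines of C (my own addition) confirm that both binary forms agree with the ^ operator on all four input combinations:

#include <assert.h>

int main(void)
{
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            assert(a + b - 2*a*b == (a ^ b));      /* a + b - 2ab */
            assert((a - b) * (a - b) == (a ^ b));  /* (a - b)²    */
        }
    }
    return 0;
}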
Multi-Factor XOR
XOR = (1 - A*B*C...)(1 - (1-A)(1-B)(1-C)...)
Excel VBA example...
Function ArithmeticXOR(R As Range, Optional EvaluateEquation = True)
    Dim AndOfNots As String
    Dim AndGate As String
    Dim c As Range
    For Each c In R
        AndOfNots = AndOfNots & "*(1-" & c.Address & ")"
        AndGate = AndGate & "*" & c.Address
    Next
    AndOfNots = Mid(AndOfNots, 2)
    AndGate = Mid(AndGate, 2)
    'Now all we want is (Not(AndGate) AND Not(AndOfNots))
    ArithmeticXOR = "(1 - " & AndOfNots & ")*(1 - " & AndGate & ")"
    If EvaluateEquation Then
        ArithmeticXOR = Application.Evaluate(ArithmeticXOR)
    End If
End Function
Any n of k
These same methods can be extended to allow for any n number out of k conditions to qualify as true.
For instance, out of three variables a, b, and c, if you're willing to accept any two conditions, then you want a&b or a&c or b&c. This can be arithmetically modeled from the composite logic...
(a && b) || (a && c) || (b && c) ...
and applying our translations...
1 - (1-ab)(1-ac)(1-bc)...
This can be extended to any n number out of k conditions. There is a pattern of variable and exponent combinations, but this gets very long; however, you can simplify by ignoring powers for a binary context. The exact pattern is dependent on how n relates to k. For n = k-1, where k is the total number of conditions being tested, the result is as follows:
c1 + c2 + c3 + ... + cm - n*∏
where c1 through cm are all of the distinct n-variable AND combinations and ∏ is the product of all k variables.
For instance, "true if 3 of 4 conditions a, b, c, d are met" would be
abc + abd + acd + bcd - 3abcd
This makes perfect logical sense since what we have is the additive OR of AND conditions minus the overlapping AND condition.
If you begin looking at n = k-2, k-3, etc., the pattern becomes more complicated because we have more overlaps to subtract out. If this is fully extended to the smallest value of n = 1, then we arrive at nothing more than a regular OR condition.
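To see the n = k-1 pattern in action, here is a small brute-force check in C (my own sketch) of the 3-of-4 case against a direct count of true conditions:

#include <assert.h>

int main(void)
{
    for (int m = 0; m < 16; m++) {  /* all 16 assignments of four binary conditions */
        int a = m & 1, b = (m >> 1) & 1, c = (m >> 2) & 1, d = (m >> 3) & 1;
        int arith = a*b*c + a*b*d + a*c*d + b*c*d - 3*a*b*c*d;
        int logic = (a + b + c + d >= 3);  /* at least 3 of the 4 conditions hold */
        assert(arith == logic);
    }
    return 0;
}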
Thinking about Non-Binary Values and Fuzzy Region
The actual algebraic XOR equation a + b - ab(1 + a + b - ab) is much more complicated than the computationally equivalent binary equations like x + y - 2xy and (x-y)². Does this mean anything, and is there any value to this added complexity?
Obviously, for this to matter, you'd have to care about the decimal values outside of the discrete points (0,0), (0,1), (1,0), and (1,1). Why would this ever matter? Sometimes you want to relax the integer constraint for a discrete problem. In that case, you have to look at the premises used to convert logical operators to equations.
When it comes to translating Boolean logic into arithmetic, your basic building blocks are the AND and NOT operators, with which you can build both OR and XOR.
OR = (1-(1-a)(1-b)(1-c)...)
XOR = (1 - a*b*c...)(1 - (1-a)(1-b)(1-c)...)
So if you're thinking about the decimal region, then it's worth thinking about how we defined these operators and how they behave in that region.
Non-Binary Meaning of NOT
We expressed NOT as 1-x. Obviously, this simple equation works for binary values of 0 and 1, but the thing that's really cool about it is that it also provides the fractional or percent-wise complement for values between 0 and 1. This is useful since NOT is also known as the complement in Boolean logic, and when it comes to sets, NOT refers to everything outside of the current set.
Non-Binary Meaning of AND
We expressed AND as x*y. Once again, it obviously works for 0 and 1, but its effect is a little more arbitrary for values between 0 and 1, where multiplication results in partial truths (decimal values) diminishing each other. It's possible to imagine that you would want to model truth as being averaged or accumulative in this region. For instance, if two conditions are hypothetically half true, is the AND condition only a quarter true (0.5 * 0.5), or is it entirely true (0.5 + 0.5 = 1), or does it remain half true ((0.5 + 0.5) / 2)? As it turns out, the quarter truth is actually correct for conditions that are entirely discrete, with the partial truth representing probability. For instance, will you flip tails (binary condition, 50% probability) both now AND again a second time? The answer is 0.5 * 0.5 = 0.25, or 25% true. Accumulation doesn't really make sense because it's basically modeling an OR condition (remember OR can be modeled by + when the AND condition is not present, so summation is characteristically OR). The average makes sense if you're looking at agreement and measurements, but it's really modeling a hybrid of AND and OR. For instance, ask 2 people to rate, on a scale of 1 to 10, how much they agree with the statement "It is cold outside". If they both say 5, then the truth of the statement "It is cold outside" is 50%.
Non-Binary Values in Summary
The take away from this look at non-binary values is that we can capture actual logic in our choice of operators and construct equations from the ground up, but we have to keep in mind numerical behavior. We are used to thinking about logic as discrete (binary) and computer processing as discrete, but non-binary logic is becoming more and more common and can help make problems that are difficult with discrete logic easier/possible to solve. You'll need to give thought to how values interact in this region and how to translate them into something meaningful.
"mathematical, arithmetic only representation" are not correct terms anyway. What you are looking for is a function which goes from IxI to I (domain of integer numbers).
Which restrictions would you like to have on this function? Only linear algebra? (+ , - , * , /) then it's impossible to emulate the XOR operator.
If instead you accept some non-linear operators like Max() Sgn() etc, you can emulate the XOR operator with some "simpler" operators.
Given that (a-b)(a-b) quite obviously computes xor for a single bit, you could construct a function with the floor or mod arithmetic operators to split the bits out, then xor them, then sum to recombine. (a-b)(a-b) = a² - 2·a·b + b², so one bit of xor gives a polynomial with 3 terms.
Without floor or mod, the different bits interfere with each other, so you're stuck with looking at a solution which is a polynomial interpolation treating the input a,b as a single value: a xor b = g(a · 2³² + b)
The polynomial has 2⁶⁴ - 1 terms, though it will be symmetric in a and b (as xor is commutative), so you only have to calculate half of the coefficients. I don't have the space to write it out for you.
I wasn't able to find any solution for 32-bit unsigned integers, but I've found some solutions for 2-bit integers which I was trying to use in my Prolog program.
One of my solutions (which uses exponentiation and modulo) is described in this StackOverflow question, and the others (some without exponentiation, pure algebra) can be found in this code repository on Github: see the different xor0 and o_xor0 implementations.
The nicest xor representation for 2-bit uints seems to be: xor(A,B) = (A + B*((-1)^A)) mod 4.
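The formula is easy to verify exhaustively; a small C check of my own (taking care with C's signed % semantics) might look like:

#include <assert.h>

int main(void)
{
    for (int a = 0; a < 4; a++) {
        for (int b = 0; b < 4; b++) {
            int sign = (a % 2 == 0) ? 1 : -1;  /* (-1)^A */
            int v = (a + b * sign) % 4;
            if (v < 0) v += 4;                 /* C's % can return negatives */
            assert(v == (a ^ b));
        }
    }
    return 0;
}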
Solution with +, -, *, / expressed as an Excel formula (where the cells A2 to A5 and B1 to E1 contain the numbers 0-3) to be inserted in the cells B2 to E5:
(1-$A2)*(2-$A2)*(3-$A2)*($A2+B$1)/6 - $A2*(1-$A2)*(3-$A2)*($A2+B$1)/2 + $A2*(1-$A2)*(2-$A2)*($A2-B$1)/6 + $A2*(2-$A2)*(3-$A2)*($A2-B$1)/2 - B$1*(1-B$1)*(3-B$1)*$A2*(3-$A2)*(6-4*$A2)/2 + B$1*(1-B$1)*(2-B$1)*$A2*($A2-3)*(6-4*$A2)/6
It may be possible to adapt and optimize this solution for 32-bit unsigned integers. It's complicated, and it uses logarithms, but it seems to be the most universal one, as it can be used on any integer numbers. Additionally, you'll have to check if it really works for all number combinations.
I do realize that this is sort of an old topic, but the question is worth answering and yes, this is possible using an algorithm. And rather than go into great detail about how it works, I'll just demonstrate with a simple example (written in C):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

typedef unsigned long number;

number XOR(number a, number b)
{
    number result = 0,
    /*
        The following calculation just gives us the highest power of
        two (and thus the most significant bit) for this data type.
    */
    power = pow(2, (sizeof(number) * 8) - 1);
    /*
        Loop until no more bits are left to test...
    */
    while(power != 0)
    {
        result *= 2;
        /*
            The != comparison works just like the XOR operation.
        */
        if((power > a) != (power > b))
            result += 1;
        a %= power;
        b %= power;
        power /= 2;
    }
    return result;
}

int main()
{
    srand(time(0));
    for(;;)
    {
        number a = rand(), b = rand();
        printf("a = %lu\n", a);
        printf("b = %lu\n", b);
        printf("a ^ b = %lu\n", a ^ b);
        printf("XOR(a, b) = %lu\n", XOR(a, b));
        getchar();
    }
}
I think this relation might help in answering your question:
A + B = (A XOR B) + 2*(A AND B)
For example, with A = 5 and B = 3: A XOR B = 6 and A AND B = 1, and indeed 5 + 3 = 6 + 2*1 = 8.
(a-b)*(a-b) is the right answer. The only one? I guess so!
