Is 1 / 0 = 0 according to Isabelle?

The following lemma:
lemma "(1::real) / 0 = 0" by simp
goes through because of theorem division_ring_divide_zero
I find this very disturbing: if I want to show that some fraction is non-zero, I have to show that the numerator is non-zero AND the denominator is non-zero. That might make sense, but it conflates two different problems.
Is there a way of separating the well-definition of a fraction and its non-zeroness?

Isabelle/HOL is a logic of total functions, so there is no built-in notion of a fraction or any other function application being undefined. That is, a / b is defined for all a and b, and it returns their quotient except when b is zero. But then it still has a value.
In the library, the decision was made to complete the function in such a way that x / 0 = 0. This decision simplifies many proofs, since you have to deal with fewer side conditions. Unfortunately it also sometimes confuses people who expect something else.
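The trade-off can be illustrated outside Isabelle. What follows is a minimal Python sketch, not Isabelle code. The names total_div and partial_div are hypothetical: total_div mimics the library convention x / 0 = 0, while partial_div keeps definedness as a separate question by returning None.

```python
from fractions import Fraction
from typing import Optional


def total_div(a: Fraction, b: Fraction) -> Fraction:
    """Total division in the style of Isabelle/HOL: the result for b == 0 is 0."""
    return Fraction(0) if b == 0 else a / b


def partial_div(a: Fraction, b: Fraction) -> Optional[Fraction]:
    """Partial division: None signals 'undefined' instead of picking a value."""
    return None if b == 0 else a / b


one, zero = Fraction(1), Fraction(0)

# Total version: a nonzero quotient requires numerator AND denominator nonzero.
print(total_div(one, zero) == 0)

# Partial version: definedness is checked first, nonzeroness second.
q = partial_div(one, zero)
print(q is None)
```

With the total version no side condition is needed to write a / b, which is the simplification the library aims for. With the partial version every use has to handle the None case first.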


Power with integer exponents in Isabelle

Here is my definition of power for integer exponents following this mailing-list post:
definition
"ipow x n = (if n < 0 then (1 / x) ^ n else x ^ n)"
notation ipow (infixr "^⇩i" 80)
Is there a better way to define it?
Is there an existing theory in Isabelle that already includes it so that I can reuse its results?
Context
I am dealing with complex exponentials, for instance consider this theorem:
after I proved it I realized I need to work with integers n not just naturals and this involves using powers to take out the n from the exponential.
I don't think something like this exists in the library. However, you have a typo in your definition. I believe you want something like
definition
"ipow x n = (if n < 0 then (1 / x) ^ nat (-n) else x ^ nat n)"
Apart from that, it is fine. You could write inverse x ^ nat (-n), but it should make little difference in practice. I would suggest the name int_power since the corresponding operation with natural exponents is called power.
Personally, I would avoid introducing a new constant like this, because in order to actually use it productively you also need an extensive collection of theorems around it. This means quite a bit of (tedious) work. Do you really need to talk about integers here? I find that one can often get around it in practice (in particular, note that the exponentials in question are periodic anyway).
It may be useful to introduce such a power operator nevertheless; all I'm saying is you should be aware of the trade-off.
Side note: An often overlooked function in Isabelle that is useful when talking about exponentials like this is cis (as in 'cosine + i · sine'). cis x is equivalent to exp(ix), where x is real.
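As a quick numerical sanity check of the corrected definition, here is a minimal Python sketch (not Isabelle). The names int_power and cis are chosen for illustration only. Note one difference from Isabelle: Python raises ZeroDivisionError for 1 / 0, whereas in Isabelle/HOL 1 / 0 = 0.

```python
import cmath


def cis(x: float) -> complex:
    """cos x + i sin x, i.e. exp(ix) for real x."""
    return cmath.exp(1j * x)


def int_power(x: complex, n: int) -> complex:
    """Power with an integer exponent, mirroring the corrected ipow definition."""
    if n < 0:
        return (1 / x) ** (-n)   # corresponds to (1 / x) ^ nat (-n)
    return x ** n                # corresponds to x ^ nat n


# De Moivre-style identity for a negative exponent: cis(x)^n = cis(n * x)
x, n = 0.7, -3
print(cmath.isclose(int_power(cis(x), n), cis(n * x)))
```

The check only covers a single value, so it is evidence that the definition behaves as intended, not a proof of the identity.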

Why a/0 returns Inf instead of NaN in R? [duplicate]

I'm just curious, why in IEEE-754 any non zero float number divided by zero results in infinite value? It's a nonsense from the mathematical perspective. So I think that correct result for this operation is NaN.
Function f(x) = 1/x is not defined when x=0, if x is a real number. For example, function sqrt is not defined for any negative number and sqrt(-1.0f) in IEEE-754 produces a NaN value. But 1.0f/0 is Inf.
But for some reason this is not the case in IEEE-754. There must be a reason for this, maybe some optimization or compatibility reasons.
So what's the point?
It's a nonsense from the mathematical perspective.
Yes. No. Sort of.
The thing is: Floating-point numbers are approximations. You want to use a wide range of exponents and a limited number of digits and get results which are not completely wrong. :)
The idea behind IEEE-754 is that every operation could trigger "traps" which indicate possible problems. They are
Illegal (senseless operation like sqrt of negative number)
Overflow (too big)
Underflow (too small)
Division by zero (The thing you do not like)
Inexact (This operation may give you wrong results because you are losing precision)
Now many people like scientists and engineers do not want to be bothered with writing trap routines. So Kahan, the inventor of IEEE-754, decided that every operation should also return a sensible default value if no trap routines exist.
They are
NaN for illegal values
signed infinities for Overflow
signed zeroes for Underflow
NaN for indeterminate results (0/0) and signed infinities for x/0 with x != 0
normal operation result for Inexact
The thing is that in 99% of all cases zeroes are caused by underflow, and therefore in 99% of all cases Infinity is "correct" even if wrong from a mathematical perspective.
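The trap-or-default model described above can be tried out with Python's decimal module. This is only a sketch under an assumption: decimal implements decimal arithmetic rather than binary IEEE-754 floats, but it exposes the same kind of signals (division by zero, invalid operation, and so on) as flags and optional traps.

```python
from decimal import Context, Decimal, DivisionByZero, InvalidOperation

# No traps enabled: every exceptional operation returns its default value
# and only records a status flag.
quiet = Context(traps=[])

pos_inf = quiet.divide(Decimal(1), Decimal(0))    # x/0 with x != 0
neg_inf = quiet.divide(Decimal(-1), Decimal(0))   # sign follows the operands
undefined = quiet.divide(Decimal(0), Decimal(0))  # indeterminate 0/0
bad_sqrt = quiet.sqrt(Decimal(-1))                # senseless operation

print(pos_inf, neg_inf, undefined, bad_sqrt)
print(quiet.flags[DivisionByZero], quiet.flags[InvalidOperation])

# With the trap enabled, the same division raises instead of returning a default.
trapping = Context(traps=[DivisionByZero])
try:
    trapping.divide(Decimal(1), Decimal(0))
except DivisionByZero:
    print("trapped")
```

Code that cares about the problem enables the trap. Code that does not care gets an infinity or a NaN and can inspect the flags afterwards.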
I'm not sure why you would believe this to be nonsense.
The simplistic definition of a / b, at least for non-zero b, is the unique number of bs that has to be subtracted from a before you get to zero.
Expanding that to the case where b can be zero, the number that has to be subtracted from any non-zero number to get to zero is indeed infinite, because you'll never get to zero.
Another way to look at it is to talk in terms of limits. As a positive number n approaches zero, the expression 1 / n approaches "infinity". You'll notice I've quoted that word because I'm a firm believer in not propagating the delusion that infinity is actually a concrete number :-)
NaN is reserved for situations where the number cannot be represented (even approximately) by any other value (including the infinities), it is considered distinct from all those other values.
For example, 0 / 0 (using our simplistic definition above) can have any amount of bs subtracted from a to reach 0. Hence the result is indeterminate - it could be 1, 7, 42, 3.14159 or any other value.
Similarly things like the square root of a negative number, which has no value in the real plane used by IEEE754 (you have to go to the complex plane for that), cannot be represented.
In mathematics, division by zero is undefined because zero has no sign, therefore two results are equally possible, and exclusive: negative infinity or positive infinity (but not both).
In (most) computing, 0.0 has a sign. Therefore we know what direction we are approaching from, and what sign infinity would have. This is especially true when 0.0 represents a non-zero value too small to be expressed by the system, as is frequently the case.
The only time NaN would be appropriate is if the system knows with certainty that the denominator is truly, exactly zero. And it can't unless there is a special way to designate that, which would add overhead.
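The role of the sign of zero can be shown with ordinary Python floats. Python itself raises ZeroDivisionError on float division by zero, so the helper ieee_default_div below is a hypothetical emulation of the IEEE-754 default result, written only for this illustration.

```python
import math


def ieee_default_div(a: float, b: float) -> float:
    """Emulate the IEEE-754 default (untrapped) result of a / b."""
    if b != 0.0:
        return a / b                      # ordinary case, also covers NaN divisors
    if a == 0.0 or math.isnan(a):
        return math.nan                   # 0/0 is indeterminate
    # x/0 with x != 0: infinity whose sign is the product of both signs
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return math.copysign(math.inf, sign)


# Products too small to represent underflow to a zero that keeps its sign.
tiny_pos = 1e-200 * 1e-200
tiny_neg = -1e-200 * 1e-200

print(tiny_pos, tiny_neg)
print(ieee_default_div(1.0, tiny_pos), ieee_default_div(1.0, tiny_neg))
```

Both denominators compare equal to zero, yet the sign that survived the underflow still determines which infinity comes out.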
NOTE:
I re-wrote this following a valuable comment from @Cubic.
I think the correct answer to this has to come from calculus and the notion of limits. Consider the limit of f(x)/g(x) as x->0 under the assumption that g(0) == 0. There are two broad cases that are interesting here:
If f(0) != 0, then the limit as x->0 is either plus or minus infinity, or it's undefined. If g(x) takes both signs in the neighborhood of x==0, then the limit is undefined (left and right limits don't agree). If g(x) has only one sign near 0, however, the limit will be defined and be either positive or negative infinity. More on this later.
If f(0) == 0 as well, then the limit can be anything, including positive infinity, negative infinity, a finite number, or undefined.
In the second case, generally speaking, you cannot say anything at all. Arguably, in the second case NaN is the only viable answer.
Now in the first case, why choose one particular sign when either is possible or it might be undefined? As a practical matter, it gives you more flexibility in cases where you do know something about the sign of the denominator, at relatively little cost in the cases where you don't. You may have a formula, for example, where you know analytically that g(x) >= 0 for all x, say, for example, g(x) = x*x. In that case the limit is defined and it's infinity with sign equal to the sign of f(0). You might want to take advantage of that as a convenience in your code. In other cases, where you don't know anything about the sign of g, you cannot generally take advantage of it, but the cost here is just that you need to trap for a few extra cases - positive and negative infinity - in addition to NaN if you want to fully error check your code. There is some price there, but it's not large compared to the flexibility gained in other cases.
Why worry about general functions when the question was about "simple division"? One common reason is that if you're computing your numerator and denominator through other arithmetic operations, you accumulate round-off errors. The presence of those errors can be abstracted into the general formula format shown above. For example f(x) = x + e, where x is the analytically correct, exact answer, e represents the error from round-off, and f(x) is the floating point number that you actually have on the machine at execution.

Why is NA^0 = 1 in R? [duplicate]

Prompted by a spot of earlier code golfing why would:
>NaN^0
[1] 1
It makes perfect sense for NA^0 to be 1 because NA is missing data, and any number raised to 0 will give 1, including -Inf and Inf. However NaN is supposed to represent not-a-number, so why would this be so? This is even more confusing/worrying when the help page for ?NaN states:
In R, basically all mathematical functions (including basic
Arithmetic), are supposed to work properly with +/- Inf and NaN as
input or output.
The basic rule should be that calls and relations with Infs really are
statements with a proper mathematical limit.
Computations involving NaN will return NaN or perhaps NA: which of
those two is not guaranteed and may depend on the R platform (since
compilers may re-order computations).
Is there a philosophical reason behind this, or is it just to do with how R represents these constants?
This is covered by the reference cited on the ?'NaN' help page:
"The IEC 60559 standard, also known as the ANSI/IEEE 754 Floating-Point Standard.
http://en.wikipedia.org/wiki/NaN."
And there you find this statement regarding what should create a NaN:
"There are three kinds of operations that can return NaN:[5]
Operations with a NaN as at least one operand.
It probably comes from the particular C compiler, as signified by the Note you referenced. This is what the GNU C documentation says:
http://www.gnu.org/software/libc/manual/html_node/Infinity-and-NaN.html
" NaN, on the other hand, infects any calculation that involves it. Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN."
So it seems that the GNU-C people have a different standard in mind when writing their code. And the 2008 version of ANSI/IEEE 754 Floating-Point Standard is reported to make that suggestion:
http://en.wikipedia.org/wiki/NaN#Function_definition
The published standard is not free. So if you have access rights or money you can look here:
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4610933
The answer can be summed up by "for historical reasons".
It seems that IEEE 754 introduced two different power functions - pow and powr, with the latter preserving NaN's in the OP case and also returning NaN for Inf^0, 0^0, 1^Inf, but eventually the latter was dropped as explained briefly here.
Conceptually, I'm in the NaN preserving camp, because I'm coming at the issue from viewpoint of limits, but from convenience point of view I expect current conventions are slightly easier to deal with, even if they don't make a lot of sense in some cases (e.g. sqrt(-1)^0 being equal to 1 while all operations are on real numbers makes little sense if any).
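The two conventions can be put side by side in Python, whose float power follows the C pow convention (its documentation states that pow(1.0, x) and pow(x, 0.0) always return 1.0, even for NaN). The function powr_sketch is a hypothetical helper that only covers the special cases named above; it is not an implementation of the standard's powr.

```python
import math

nan, inf = math.nan, math.inf


def powr_sketch(x: float, y: float) -> float:
    """NaN-preserving power for the special cases named in the text (sketch only)."""
    if math.isnan(x) or math.isnan(y):
        return nan                 # NaN in, NaN out
    if y == 0.0 and (x == 0.0 or math.isinf(x)):
        return nan                 # 0^0 and Inf^0
    if x == 1.0 and math.isinf(y):
        return nan                 # 1^Inf
    return x ** y                  # ordinary cases; Python may raise here, e.g. 0.0 ** -1.0


# pow-style convention: the result is 1 whenever it does not depend on the NaN
print(nan ** 0.0, 1.0 ** nan, inf ** 0.0, 0.0 ** 0.0)

# powr-style convention: the same inputs give NaN
print(powr_sketch(nan, 0.0), powr_sketch(inf, 0.0), powr_sketch(1.0, inf))
```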
Yes, I'm late here, but as R Core member who was involved in this design, let me recall what I commented above. NaN preserving and NA preserving work "equivalently" in R, so if you agree that NA^0 should give 1, NaN^0 |-> 1 is a consequence.
Indeed (as others said) you should really read R's help pages and not C or
IEEE standards, to answer such questions,
and SimonO101 correctly cited
1 ^ y and y ^ 0 are 1, always
and I'm pretty sure that I was heavily involved in (if not the author of) that.
Note that it is good, not bad, to be able to provide non-NaN answers, also in cases other programming languages do differently.
The consequence of such a rule is that more things work automatically correctly;
in the other case, the R programmer would have been urged to do more special casing herself.
Or put differently, a simple rule as the above (returning non-NaN in all cases) is a good rule, because it propagates continuity in a mathematical sense: lim_x f(x) = f(lim x).
We have had a few cases where it was clearly advantageous (i.e. did not need special casing, I'm repeating..) to adhere to the above "= 1" rule, rather than to propagate NaN. As I said further up, the sqrt(-1)^0 is also such an example, as 1 is the correct result as soon as you extend to the complex plane.
Here's one reasoning. From Goldberg:
In IEEE 754, NaNs are often represented as floating-point numbers with
the exponent e_max + 1 and nonzero significands.
So NaN is a floating-point number, though with a special meaning. Raising a number to the power zero sets its exponent to zero, therefore it will no longer be NaN.
Also note:
> 1^NaN
[1] 1
One is a number whose exponent is zero already.
Conceptually, the only problem with NaN^0 == 1 is that zero values can come about in at least four different ways, but the IEEE format uses the same representation for three of them. The above equality makes sense for the most common case (which is one of the three), but not for the others.
BTW, the four cases I would recognize would be:
A literal zero
Unsigned zero: the difference between two numbers that are indistinguishable
Positive infinitesimal: The product or quotient of two numbers of matching sign, which is too small to be distinguished from zero.
Negative infinitesimal: The product or quotient of two numbers of opposite sign, which is too small to be distinguished from zero.
Some of these may be produced via other means (e.g. literal zero could be produced as the sum of two literal zeros; positive infinitesimal by the division of a very small number by a very large one, etc.).
If a floating-point format recognized the above, it could usefully regard raising NaN to a literal zero as yielding one, and raising it to any other kind of zero as yielding NaN; such a rule would allow a constant result to be assumed in many cases where something that might be NaN would be raised to something the compiler could identify as a constant zero, without such an assumption altering program semantics. Otherwise, I think the issue is that most code isn't going to care whether x^0 would yield NaN if x is NaN, and there's not much point in having a compiler add code for conditions code isn't going to care about. Note that the issue isn't just the code to compute x^0, but any computations based on it which would be constant if x^0 was.
If you look at the type of NaN, it is still a number, it's just not a specific number that can be represented by the numeric type.
EDIT:
For example, take 0/0. What is the result? If you tried to solve this on paper, you would get stuck at the very first digit: how many zeros fit into another 0? You can put 0, you can put 1, you can put 8; they all satisfy 0*x=0, but it's impossible to know which one is the correct answer. However, that does not mean the answer is no longer a number, it's just not a number that can be represented.
Regardless, any number, even a number that you can't represent, to the power of zero is still 1. If you break down some math x^8 * x^0 can be further simplified by x^(8+0) which equates to x^8, where did the x^0 go? It makes sense if x^0 = 1 because then the equation x^8 * 1 explains why x^0 just sort of disappears from existence.

Why is this linear program infeasible in GLPK?

I have the following problem set up in glpk. Two variables, p and v, and three constraints. The objective is to maximize v.
p >= 0
p == 1
-v + 3p >= 0
The answer should be v==3, but for some reason, the solver tells me it is infeasible when using the simplex method, and complains about numerical instability when using an interior point method.
This problem is generated as a subproblem of a bigger problem, and obviously not all subproblems are as trivial or I would just hardcode the solution.
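Because, for some reason, column variables in GLPK are by default fixed at 0 (GLP_FX) and not free. With p fixed at 0, the constraint p == 1 cannot be satisfied, so the column bounds have to be set explicitly (for example with glp_set_col_bnds). I don't see how that default makes sense.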

Quantifying the non-randomness of a specialized random generator?

I just read this interesting question about a random number generator that never generates the same value three consecutive times. This clearly makes the random number generator different from a standard uniform random number generator, but I'm not sure how to quantitatively describe how this generator differs from a generator that didn't have this property.
Suppose that you handed me two random number generators, R and S, where R is a true random number generator and S is a true random number generator that has been modified to never produce the same value three consecutive times. If you didn't tell me which one was R or S, the only way I can think of to detect this would be to run the generators until one of them produced the same value three consecutive times.
My question is - is there a better algorithm for telling the two generators apart? Does the restriction of not producing the same number three times somehow affect the observable behavior of the generator in a way other than preventing three of the same value from coming up in a row?
As a consequence of Rice's Theorem, there is no way to tell which is which.
Proof: Let L be the output of the normal RNG. Let L' be L, but with all sequences of length >= 3 removed. Some TMs recognize L', but some do not. Therefore, by Rice's theorem, determining if a TM accepts L' is not decidable.
As others have noted, you may be able to make an assertion like "It has run for N steps without repeating three times", but you can never make the leap to "it will never repeat a digit three times." More appropriately, there exists at least one machine for which you can't determine whether or not it meets this criterion.
Caveat: if you had a truly random generator (e.g. nuclear decay), it is possible that Rice's theorem would not apply. My intuition is that the theorem still holds for these machines, but I've never heard it discussed.
EDIT: a secondary proof. Suppose P(X) determines with high probability whether or not X accepts L'. We can construct an (infinite number of) programs F like:
F(x): if x(F), then don't accept L'
else, accept L'
P cannot determine the behavior of F(P). Moreover, say P correctly predicts the behavior of G. We can construct:
F'(x): if x(F'), then don't accept L'
else, run G(x)
So for every good case, there must exist at least one bad case.
If S is defined by rejecting from R, then a sequence produced by S will be a subsequence of the sequence produced by R. For example, taking a simple random variable X with equal probability of being 1 or 0, you would have:
R = 0 1 1 0 0 0 1 0 1
S = 0 1 1 0 0 1 0 1
The only real way to differentiate these two is to look for streaks. If you are generating binary numbers, then streaks are incredibly common (so much so that one can almost always differentiate between a random 100 digit sequence and one that a student writes down trying to be random). If the numbers are taken from [0,1] uniformly, then streaks are far less common.
It's an easy exercise in probability to calculate the chance of three consecutive numbers being equal once you know the distribution, or even better, the expected number of numbers needed until the probability of three consecutive equal numbers is greater than p for your favourite choice of p.
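For a uniform generator over k values, that exercise can be checked by simulation. The sketch below rests on assumptions: the function names are hypothetical, S is modelled as "redraw any value that would complete a run of three", and the closed form k^2 + k + 1 for the expected number of draws until the first triple is derived from a small Markov chain on the current run length rather than taken from the answers above.

```python
import itertools
import random


def rejecting_generator(rng: random.Random, k: int):
    """Yield uniform values in range(k), redrawing any value that would complete a run of three."""
    prev2 = prev1 = None
    while True:
        x = rng.randrange(k)
        if x == prev1 == prev2:
            continue               # third equal value in a row: reject and redraw
        prev2, prev1 = prev1, x
        yield x


def draws_until_triple(rng: random.Random, k: int) -> int:
    """Count draws from a plain uniform generator until three equal values occur in a row."""
    prev, run, n = None, 0, 0
    while run < 3:
        x = rng.randrange(k)
        n += 1
        run = run + 1 if x == prev else 1
        prev = x
    return n


def expected_draws_until_triple(k: int) -> int:
    """Closed form for the mean waiting time: k^2 + k + 1."""
    return k * k + k + 1


rng = random.Random(0)
print(list(itertools.islice(rejecting_generator(rng, 2), 20)))
trials = 20000
mean = sum(draws_until_triple(rng, 2) for _ in range(trials)) / trials
print(mean, expected_draws_until_triple(2))
```

For binary output the first triple is expected after about 7 draws, so the streak test separates R from S almost immediately. For k = 1000 the expected wait is already about a million draws.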
Since you defined that they only differ with respect to that specific property there is no better algorithm to distinguish those two.
If you look at triples of random values, the generator S will of course produce all other triples slightly more often than R, to compensate for the missing triples (X,X,X). But to get a significant result you would need much more data than it takes to see any value three consecutive times for the first time.
Probably use ENT ( http://fourmilab.ch/random/ )
