I'm trying to build a calculator in Flex / ActionScript 3, but I get some weird results with simple arithmetic:
trace(1.4 - .4); //should be 1 but it is 0.9999999999999999
trace(1.5 - .5); //should be 1 and it is 1
trace(1.444 - .444); //should be 1 and it is 1
trace(1.555 - .555); //should be 1 but it is 0.9999999999999999
I know there are some issues with floating point numbers, but as you can see, shouldn't it then fail for all of my examples? Am I right?
How is this problem solved in other calculators, and how should I proceed to build a usable calculator in ActionScript 3?
Thank you in advance,
Adnan
Welcome to IEEE 754 floating point. Enjoy the inaccuracies. Use a fixed-point mechanism if you want to avoid them.
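A minimal sketch of that fixed-point idea (in C rather than ActionScript, assuming three decimal places is all the calculator needs, and with sign handling omitted): keep values as scaled integers and only format them as decimals for display.
#include <stdio.h>
int main(void) {
    long long a = 1400, b = 400;                       /* 1.400 and 0.400 stored in thousandths */
    long long diff = a - b;                            /* exactly 1000, i.e. 1.000 */
    printf("%lld.%03lld\n", diff / 1000, diff % 1000); /* prints 1.000 */
    return 0;
}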
Your results are to be expected, and will be observed in any programming language with a floating point datatype. Computers cannot accurately store all numbers, which causes edge cases like the ones you posted.
Read up on floating point accuracy problems at Wikipedia.
I would assume that most calculators display fewer decimal places than the precision of their floating point. Rounding to fewer decimal places than your level of precision should alleviate this sort of problem, but it won't solve all of the issues.
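As a rough sketch of that display-rounding approach (again in C, with an arbitrary choice of 10 significant digits for display):
#include <stdio.h>
int main(void) {
    double r = 1.4 - 0.4;       /* internally something like 0.9999999999999999 */
    printf("%.17g\n", r);       /* full precision exposes the error */
    printf("%.10g\n", r);       /* rounded for display: prints 1 */
    return 0;
}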
Good afternoon all. I wasn't exactly sure where to post this question, so I apologize if this is the wrong thread. I am currently taking a Discrete Mathematics course, and I initially thought I understood binary-float-to-decimal conversion rather well from a previous course. However, today while doing some practice work with arbitrary sizes, I came across a problem that I need to understand.
For the sake of easy math, I am going to use 1 sign bit, a 3-bit exponent (with a bias of 4 instead of 127) and a 4-bit mantissa.
I have this number. 0 010 0100 Seems easy enough and it probably is to all you experts.
I know the first bit 0 is the sign bit, this number is positive.
I also know that the next 3 bits are the exponent bits. 010 represents 2. For this problem I am using a bias of 4 instead of 127, so I do 2 - 4 = -2. I will shift the invisible decimal point two spots to the left on the mantissa.
Here is my question. This mantissa starts with 0 instead of a 1. So is the "invisible" decimal point before or after that 1?
Basically what I am asking is: before shifting the decimal, is the mantissa 0.100 or 1.00? Oddly enough, with all the floating point questions asked on exams in my previous classes, I don't believe I came across this problem. Perhaps the professors were being kind to us by giving us easy scenarios.
I always thought that the mantissa is "normalized", so I should see this mantissa as 1.000 before shifting the point to the left twice to get .01000, which becomes .25 in decimal. But now I am not so sure.
Thanks for your time all!
For normalized numbers in typical float formats, there is an implied leading 1 which is not encoded in the mantissa bits. So your mantissa would actually be 1.0100 in binary.
For more info see IEEE 754-1985.
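A small sketch, assuming the asker's toy layout (1 sign bit, 3-bit exponent with bias 4, 4 stored mantissa bits plus the implied leading 1, ignoring denormals and special values), that decodes 0 010 0100 to 1.0100 (binary) * 2^-2 = 0.3125:
#include <stdio.h>
#include <math.h>
/* Decode the 8-bit toy format: 1 sign bit, 3 exponent bits (bias 4), 4 mantissa bits. */
double decode_toy(unsigned char bits) {
    int sign     = (bits >> 7) & 0x1;
    int exponent = ((bits >> 4) & 0x7) - 4;       /* subtract the bias of 4 */
    double mant  = 1.0 + (bits & 0xF) / 16.0;     /* implied leading 1 */
    return (sign ? -1.0 : 1.0) * mant * pow(2.0, exponent);
}
int main(void) {
    printf("%g\n", decode_toy(0x24));             /* 0 010 0100 -> 0.3125 */
    return 0;
}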
I don't understand why comparing real (floating point) numbers is considered bad practice in programming. Of course I understand that real numbers can only be represented with a certain accuracy. Can you give me a compelling reason not to compare this kind of numbers? Examples would be good; articles are also welcome.
Thanks beforehand.
Of all the questions under the floating-accuracy tag on this site, any discussion should probably start with a reference to this question: How dangerous is it to compare floating point values?
And, from there, to "What Every Computer Scientist Should Know About Floating-Point Arithmetic" by David Goldberg. Here is a short summary.
1. Exact floating point results are not portable
Floating point arithmetic is not associative, so results depend on the order of operations and on intermediate precision. The IEEE 754 standard that most compilers and platforms follow does not by itself guarantee bit-identical results everywhere, and results will also vary on different processors.
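A quick illustration of the order dependence (a sketch in C; the behaviour shown is typical of IEEE 754 doubles):
#include <stdio.h>
int main(void) {
    double a = 1e20, b = -1e20, c = 1.0;
    printf("%g\n", (a + b) + c);   /* prints 1: a and b cancel exactly first */
    printf("%g\n", a + (b + c));   /* prints 0: c is absorbed when added to b */
    return 0;
}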
2. Floating point comparison does not agree with mathematics
Consider the following loop:
int i = 0;
double x = 1.0;
while (x != 0.0) { x = x / 2; i++; }   /* repeated halving eventually underflows to 0 */
Over the real numbers this computation would never terminate, but in floating point it does, because x eventually underflows to zero. The final value of i depends on the underlying floating point format and hardware. Using floating point comparison like this makes the code harder to analyze.
3. Why then is floating point comparison implemented in hardware?
There are places where exact floating point equality is necessary. One is normalization of floating point numbers.
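For the cases where you do want an approximate comparison rather than exact equality, a common pattern is a tolerance test; a minimal sketch (the function name and the tolerances are only illustrative, and a pure relative tolerance misbehaves near zero):
#include <math.h>
#include <stdbool.h>
/* Approximately-equal: absolute tolerance near zero, relative tolerance elsewhere. */
bool nearly_equal(double a, double b, double rel_tol, double abs_tol) {
    double diff  = fabs(a - b);
    double scale = fmax(fabs(a), fabs(b));
    return diff <= fmax(rel_tol * scale, abs_tol);
}
/* Example: nearly_equal(0.1 + 0.2, 0.3, 1e-12, 1e-15) is true, while 0.1 + 0.2 == 0.3 is false. */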
I'm trying to find a bit more information about efficient square root algorithms, most likely implemented on an FPGA. I have already found a lot of algorithms, but which ones are used, for example, by Intel or AMD?
By efficient I mean they are either really fast or they don't need much memory.
EDIT: I should probably mention that the question is about floating point numbers in general, and that most hardware implements the IEEE 754 standard, where a single-precision number is represented as 1 sign bit, an 8-bit biased exponent and a 23-bit mantissa.
Thanks!
Not a full solution, but a couple of pointers.
I assume you're working in floating point, so point 1 is to remember that a floating point number is stored as a mantissa and an exponent. The exponent of the square root will be approximately half the exponent of the original number, since sqrt(m * 2^e) = sqrt(m) * 2^(e/2).
Then the mantissa can be approximated with a look-up table, and you can use a couple of Newton-Raphson rounds to add some accuracy to the result from the LUT.
I haven't implemented anything like this for about 8 years, but I think this is how I did it and was able to get a result in 3 or 4 cycles.
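A minimal sketch of the refinement that answer describes, assuming some coarse initial guess already exists (hypothetically from a LUT indexed by the top mantissa bits); each Newton-Raphson step roughly doubles the number of correct bits:
#include <stdio.h>
/* Refine an initial square-root guess g for x: g_next = (g + x / g) / 2. */
double refine_sqrt(double x, double g) {
    g = 0.5 * (g + x / g);   /* first Newton-Raphson round */
    g = 0.5 * (g + x / g);   /* second round */
    return g;
}
int main(void) {
    printf("%.10f\n", refine_sqrt(2.0, 1.5));   /* approaches sqrt(2) = 1.41421356... */
    return 0;
}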
This is a great one for the fast inverse square root.
Have a look at it here. Notice it's pretty much all about the initial guess; a rather amazing document :)
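For reference, the widely circulated fast inverse square root routine (most likely the document that answer points at analyzes its magic constant) looks roughly like this; memcpy replaces the original pointer cast to keep the bit reinterpretation well-defined:
#include <string.h>
#include <stdint.h>
/* Approximate 1/sqrt(x): bit-level initial guess plus one Newton-Raphson step. */
float fast_rsqrt(float x) {
    float half = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);        /* reinterpret the float's bits as an integer */
    i = 0x5f3759df - (i >> 1);       /* magic constant gives the initial guess */
    float y;
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - half * y * y);   /* one refinement step */
    return y;
}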
Can anyone please explain why this gives different outputs?
round(1.49999999999999)
1
round(1.4999999999999999)
2
I have read the round documentation, but it does not mention anything about this.
I know that R represents numbers in binary form, but why does adding two extra 9's change the result?
Thanks.
1.4999999999999999 can't be represented internally, so it gets rounded to 1.5.
Now, when you apply round(), the result is 2.
Put those two numbers into variables and then print them - you'll see they are different.
Computers don't store these kinds of numbers with this exact value (they don't use decimal representation internally).
I have never used R, so I don't know if this is the issue, but in other languages such as C/C++ a number like 1.4999999999999999 is represented by a float or a double.
Since these have finite precision, you cannot represent something like 1.4999999999999999 exactly. It might be the case that 1.4999999999999999 actually gets stored as 1.5 instead, due to the limits of floating point precision.
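One way to check that claim (in C here, but R stores its numerics as the same IEEE 754 doubles) is to print both literals at full precision:
#include <stdio.h>
int main(void) {
    printf("%.17g\n", 1.49999999999999);    /* a value still distinctly below 1.5 */
    printf("%.17g\n", 1.4999999999999999);  /* prints 1.5: the literal rounds to the nearest double */
    return 0;
}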
I have been thinking recently about how floating point math works on computers, and it is hard for me to understand all the technical details behind the formulas. I need to understand the basics of addition, subtraction, multiplication, division and remainder. With these I will be able to build trig functions and formulas.
I can guess something about it, but it's a bit unclear. I know that a fixed point value can be made by splitting a 4-byte integer into a sign flag, a radix and a mantissa. With this we have a 1-bit flag, a 5-bit radix and a 10-bit mantissa. A word of 32 bits is perfect for a floating point value :)
To add two floats, can I simply add the two mantissas and add the carry to the 5-bit radix? Is this a way to do floating point math (or fixed point math, to be honest), or am I completely wrong?
All the explanations I have seen use formulas, multiplications, etc., and they look very complex for something I would guess should be a bit simpler. I need an explanation aimed more at beginning programmers and less at mathematicians.
See Anatomy of a floating point number
The radix depends on the representation; if you use radix r = 2 you can never change it, and the number doesn't even contain any data telling you which radix it has. I think you're mistaken and you mean the exponent.
To add two floating point numbers you must make one exponent equal to the other by shifting the mantissa. Shifting one bit to the right means exponent + 1, and one bit to the left means exponent - 1; once the two numbers have the same exponent you can add them.
Value(x) = mantissa * radix ^ exponent
Adding these two numbers:
101011 * 2^13
001011 * 2^12
is done by rewriting the second one (shifting its mantissa right and dropping the lowest bit, which real hardware would keep as a guard bit for rounding), so you end up adding:
101011 * 2^13
000101 * 2^13
After making the exponents equal you can operate on the mantissas.
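A toy integer-level sketch of that align-and-add step, reusing the two values from the example above (no sign handling, no rounding, no implicit bit):
#include <stdio.h>
int main(void) {
    unsigned m1 = 43; int e1 = 13;        /* 101011 (binary) * 2^13 */
    unsigned m2 = 11; int e2 = 12;        /* 001011 (binary) * 2^12 */
    while (e2 < e1) { m2 >>= 1; e2++; }   /* align: shift mantissa right, bump exponent */
    unsigned sum = m1 + m2;               /* add the aligned mantissas */
    printf("%u * 2^%d\n", sum, e1);       /* prints 48 * 2^13, i.e. 110000 (binary) * 2^13 */
    return 0;
}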
You also have to know whether the representation has an implicit bit: the most significant bit of a normalized mantissa must be 1, so usually, as in the IEEE standard, it is known to be there but is not stored, although it is used when operating.
I know this can be a bit confusing and I'm not the best teacher, so if you have any doubts, just ask.
Run, don't walk, to get Knuth's Seminumerical Algorithms which contains wonderful intuition and algorithms behind doing multiprecision and floating point arithmetic.