Why do programming languages round down until .6? - math

If you put a decimal in a format where it has to be rounded to the nearest 10th, and it is 1.55, it'll round to 1.5; 1.56 will then round to 1.6. In school I recall learning that you round up when you reach five, and down if it's 4 or below. Why is it different in Python et al.?
Here's a code example for Python 2.6.x (whatever the latest 2.6 version is):
'{0:01.2f}'.format(5.555) # This will return '5.55'
After trying some of the examples provided, I realized something even more confusing:
'{0:01.1f}'.format(5.55) # This will return '5.5'
# But then
'{0:01.1f}'.format(1.55) # This will return '1.6'
Why the difference between 1.55 and 5.55? Both are typed as literals (so floats).

First off, in most languages an undecorated constant like "1.55" is treated as a double-precision value. However, 1.55 is not exactly representable as a double-precision value, because it doesn't have a terminating representation in binary. This causes many curious behaviors, but one effect is that when you type 1.55, you don't actually get the value that's exactly halfway between 1.5 and 1.6.
In binary, the decimal number 1.55 is:
1.10001100110011001100110011001100110011001100110011001100110011001100...
When you type "1.55", this value actually gets rounded to the nearest representable double-precision value (on many systems... but there are exceptions, which I'll get to). This value is:
1.1000110011001100110011001100110011001100110011001101
which is slightly larger than 1.55; in decimal, it's exactly:
1.5500000000000000444089209850062616169452667236328125
So, when asked to round this value to a single digit after the decimal place, it will round up to 1.6. This is why most of the commenters have said that they can't duplicate the behavior that you're seeing.
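(You can check this from Python itself; converting the float to decimal.Decimal shows every digit of the value your interpreter actually stored, assuming IEEE 754 binary64:)
>>> from decimal import Decimal
>>> Decimal(1.55)
Decimal('1.5500000000000000444089209850062616169452667236328125')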
But wait, on your system, "1.55" rounded down, not up. What's going on?
It could be a few different things, but the most likely is that you're on a platform (probably Windows) that defaults to doing floating-point arithmetic with x87 instructions, which use a different (80-bit) internal format. In the 80-bit format, 1.55 has the value:
1.100011001100110011001100110011001100110011001100110011001100110
which is slightly smaller than 1.55; in decimal, this number is:
1.54999999999999999995663191310057982263970188796520233154296875
Because it is just smaller than 1.55, it rounds down when it is rounded to one digit after the decimal point, giving the result "1.5" that you're observing.
FWIW: in most programming languages, the default rounding mode is actually "round to nearest, ties to even". It's just that when you specify fractional values in decimal, you'll almost never hit an exact halfway case, so it can be hard for a layperson to observe this. You can see it, though, if you look at how "1.5" is rounded to zero digits:
>>> "%.0f" % 0.5
'0'
>>> "%.0f" % 1.5
'2'
Note that both values round to even numbers; neither rounds to "1".
Edit: in your revised question, you seem to have switched to a different Python interpreter, on which floating-point arithmetic is done in the IEEE 754 double type, not the x87 80-bit type. Thus, "1.55" rounds up, as in my first example, but "5.55" converts to the following binary floating-point value:
101.10001100110011001100110011001100110011001100110011
which is exactly:
5.54999999999999982236431605997495353221893310546875
in decimal; since this is smaller than 5.55, it rounds down.
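(The same check for 5.55, again assuming binary64, confirms it sits just below the halfway point:)
>>> from decimal import Decimal
>>> Decimal(5.55)
Decimal('5.54999999999999982236431605997495353221893310546875')
>>> '{0:01.1f}'.format(5.55)
'5.5'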

There are many ways to round numbers; you can read more about rounding on Wikipedia. The rounding method used by Python 2's round() function is round half away from zero, and the method you are describing is more or less the same (at least for positive numbers). String formatting, however, works on the exact binary value the float actually stores, which is what the other answers demonstrate.
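If you want the schoolbook half-up rule regardless of how the float is stored, Python's decimal module can apply it explicitly; a small sketch:
from decimal import Decimal, ROUND_HALF_UP
Decimal('1.55').quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)  # Decimal('1.6')
Decimal('5.55').quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)  # Decimal('5.6')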

Can you give some example code, because that's not the behaviour I see in Python:
>>> "%.1f" % 1.54
'1.5'
>>> "%.1f" % 1.55
'1.6'
>>> "%.1f" % 1.56
'1.6'

This doesn't appear to be the case. You're using the "float" string formatter, right?
>>> "%0.2f" % 1.55
'1.55'
>>> "%0.1f" % 1.55
'1.6'
>>> "%0.0f" % 1.55
'2'

Rounding and truncation differ from one programming language to another, so your question is probably directly related to Python.
However, rounding as a practice depends on your methodology.
You should also know that converting a decimal to a whole number in many programming languages yields different results from actually rounding the number.
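In Python, for instance, the difference looks like this (int() truncates toward zero, round() picks the nearest whole number):
int(1.7)     # 1  -- conversion simply drops the fractional part
round(1.7)   # 2  -- rounding goes to the nearest whole number
int(-1.7)    # -1 -- truncation is toward zero
round(-1.7)  # -2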
Edit: Per some of the other posters, it seems that Python does not exhibit the rounding behavior you've described:
>>> "%0.2f" % 1.55
'1.55'
>>> "%0.1f" % 1.55
'1.6'
>>> "%0.0f" % 1.55
'2'

I can't see a reason for the exact behaviour that you are describing. If your numbers are just examples, a similar scenario can be explained by banker's rounding being used:
1.5 rounds to 2
2.5 rounds to 2
3.5 rounds to 4
4.5 rounds to 4
I.e. a .5 value will be rounded to the nearest even whole number. The reason for this is that the rounding of a lot of numbers would even out in the long run. If a bank, for example, is to pay interest to a million customers, and 10% of them end up with a .5 cent value to be rounded, the bank would pay out $500 more if those values were always rounded up instead.
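Python 3's round() happens to use this rule, so it makes a convenient demonstration (these inputs are exact halves, so binary representation error doesn't interfere):
round(1.5)  # 2
round(2.5)  # 2
round(3.5)  # 4
round(4.5)  # 4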
Another reason for unexpected rounding is the precision of floating point numbers. Most numbers can't be represented exactly, so they are represented by the closest possible approximation. When you think that you have a number that is 1.55, you may actually end up with a number like 1.54999. Rounding that number to one decimal would of course result in 1.5 rather than 1.6.

One method to do away with at least one aspect of rounding problems (at least some of the time) is to do some preprocessing. Single and double precision formats can represent all integers exactly from -2^24 to 2^24 and from -2^53 to 2^53 respectively. What can be done with a real number (with a non-zero fraction part) is to (a Python sketch follows the list):
1. strip off the sign and keep it for later
2. multiply the remaining positive number by 10^(number of decimal places required)
3. add 0.5 if your environment's rounding mode is set to chop (round towards zero)
4. round the number to nearest
5. sprintf the number to a string with 0 decimals in the format
6. "manually" format the string according to its length after the sprintf, the number of decimal places required, the decimal point and the sign
7. the string should now contain the exact number
Keep in mind that if the result after step 3 exceeds the range of the specific format (above), your answer will be incorrect.
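A rough Python sketch of this recipe (assuming the default round-to-nearest mode, so the chop-mode adjustment in step 3 is skipped, and at least one decimal place is requested):
def round_decimal_string(x, places):
    sign = '-' if x < 0 else ''           # step 1: strip off the sign
    scaled = abs(x) * 10 ** places        # step 2: scale so the wanted digits become integral
    scaled = round(scaled)                # step 4: round to nearest (step 3 not needed here)
    s = '%.0f' % scaled                   # step 5: print with 0 decimals
    s = s.rjust(places + 1, '0')          # step 6: pad, then re-insert the point and sign
    return sign + s[:-places] + '.' + s[-places:]

print(round_decimal_string(1.55, 1))      # '1.6' on a typical binary64 build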

Related

Does the computer round the numbers in an operation first, or does it round the result?

For instance, in the operation 9.4 - 9.0 - 0.4: does the computer first round each number and store it, or does it carry out the computation with the help of some extra bits (this example is in double-precision format) and then round the result? These are the stored values, but I wasn't sure how to do this operation by hand to check whether it rounds each number first or not.
binary( 9.4) = 0 10000000010 0010110011001100110011001100110011001100110011001101
binary(-9.0) = 1 10000000010 0010000000000000000000000000000000000000000000000000
binary(-0.4) = 1 01111111101 1001100110011001100110011001100110011001100110011010
binary(9.4 - 9.0 - 0.4) = 0 01111001011 1000000000000000000000000000000000000000000000000000
Generally, the computer will convert the numerals in 9.4 - 9.0 - 0.4 to numbers in an internal form, and then it will perform the arithmetic operations. These conversions generally round their results.
Consider the text in source code 9.4 - 9.0 - 0.4. Nothing in there is a number. That text is a string composed of characters. It contains the characters “9”, ”.”, “4”, “ ”, “-”, and so on. Generally, a computer converts this text to other forms for processing. You could write software that works with numbers in a text format, but this is rare. Generally, when we are using a programming language, either compiled or interpreted, the numerals in this text will be converted to some internal form. (A “numeral” is a sequence of symbols representing a number. So “9.4” is a numeral representing 9.4.)
IEEE-754 binary64 is a very common floating-point format. In this format, each representable number is expressed in units of some power of two. For example, the numbers .125, .250, .375, and .500 are representable because they are multiples of 1/8, which is 2^-3. However, 9.4 is not a multiple of any power of two, so it cannot be represented exactly in IEEE-754 binary64.
When 9.4 is converted to binary64, the nearest representable value is 9.4000000000000003552713678800500929355621337890625. (This is a multiple of 2^-49, which is the power of two used when representing numbers near 9.4, specifically numbers from 8 [inclusive] to 16 [exclusive].)
9 is representable in binary64, so 9 is converted to 9.
0.4 is not representable in binary64. When 0.4 is converted to binary64, the nearest representable value is 0.40000000000000002220446049250313080847263336181640625. This is a multiple of 2^-54, which is the power of two used for numbers from ¼ to ½.
In 9.4 - 9.0 - 0.4, the result of the first subtraction is 0.4000000000000003552713678800500929355621337890625. This is exactly representable, so there is no rounding at this point. Then, when 0.4 is subtracted, after it has been converted to the value above, the result is 0.00000000000000033306690738754696212708950042724609375. This is also exactly representable, so there is again no rounding at this point.
The above describes what happens if binary64 is used throughout. Many programming languages, or specific implementations of them, use binary64. Some may use other formats. Some languages permit implementations to use a mix of formats—they may use a wider format than binary64 for doing calculations and convert to binary64 for the final result. This can cause you to see different results than the above.
So the answer to your question is that, with floating-point arithmetic, each operation produces a result that is equal to the number you would get by computing the exact real-number result and then rounding that real-number result to the nearest value representable in the floating-point format. (Rounding is most often done by rounding to the nearest representable value, with ties resolved by one of several methods, but other rounding choices are possible, such as rounding down.)
The operations generally do not round their operands. (There are exceptions, such as that some processors may convert subnormal inputs to zero.) However, those operands must be produced first, such as by converting source text to a representable number. Those conversions are separate operations from the subtraction or other operations that follow.
Some programs or some machines may use extra precision for intermediate results. This depends on lots of factors: the hardware available, what programming language you're using, what compiler you're using, what options you passed to the compiler, etc. For example, programs compiled for Intel CPUs may sometimes use 80-bit precision for intermediate results if they are compiled to use x87 instructions.
For the rest of the answer I'll assume all operations are done in 64 bit "double precision" floating point numbers.
Each number is rounded first, and the results are rounded too. For example 9.4 cannot be represented exactly as a binary floating point number, so 9.4 in a program is rounded to the closest floating point number. With 64-bit precision floats, the exact mathematical value of that number is:
9.4000000000000003552713678800500929355621337890625
So 9.4 is "rounded" to 9.4000000000000003552713678800500929355621337890625.
Similarly, 0.4 cannot be represented exactly. It is "rounded" to:
0.40000000000000002220446049250313080847263336181640625
The results of a computation may need to be rounded as well. Multiplication of two N-digit numbers produces a number with 2N digits. If you can only store N digits, what's going to happen with the rest? They are rounded off.
Here you ask about subtraction. When the operands differ in magnitude, the result of a subtraction usually has to be rounded as well. In the particular case of (9.4 - 9) - 0.4, though, the exact difference at each step happens to be representable, so no rounding of results occurs and the operations are mathematically exact:
Assuming all numbers are kept as 64 bit floats, the first subtraction computes:
9.4000000000000003552713678800500929355621337890625 - 9.0 =
0.4000000000000003552713678800500929355621337890625
The second subtraction computes:
0.4000000000000003552713678800500929355621337890625
- 0.40000000000000002220446049250313080847263336181640625
----------------------------------------------------------
0.00000000000000033306690738754696212708950042724609375
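You can reproduce the whole chain from Python, and the decimal module will show the exact stored value of the final result (assuming binary64 floats throughout):
>>> 9.4 - 9.0
0.40000000000000036
>>> 9.4 - 9.0 - 0.4
3.3306690738754696e-16
>>> from decimal import Decimal
>>> Decimal(9.4 - 9.0 - 0.4)   # exactly 3 * 2**-53, not 0
Decimal('3.3306690738754696212708950042724609375E-16')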

Qt convert to double without added precision

There are a lot of questions on rounding that I have looked at, but they all involve rounding a number to its nearest whole, or to a certain number of decimal places. What I want to do is simply convert a string to a double without any added digits to the right of the decimal point. Here is my code and result as of now:
Convert the string 0.78240 to a double, which should be 0.78240 but instead is 0.78239999999999998 when I look at it in the debugger.
The string value is a QString and is converted to a double simply using the toDouble() function.
I don't understand how or where these extra digits are coming from, but any help on converting from QString to double directly would be greatly appreciated!
The extra digits are there because you are converting a decimal real number to binary floating point.
Unlike real numbers, floating-point representations have finite resolution and finite range, and binary floating-point values do not exactly coincide with all (or even most) decimal real values.
The simple fact is that binary floating-point cannot exactly represent the decimal value 0.78240; your debugger is showing you all the available digits after round-tripping the binary value back to decimal.
It is not necessarily a problem, because the error is infinitesimally small compared to the magnitude of the value, and in any event the original 0.78240 value is no doubt some approximation of a real-world value - they are both approximations, just binary or decimal approximations.
The issue is normally dealt with at presentation rather than representation. For example, in this case, unlike your debugger, which necessarily shows the full precision of the internal representation (you would not want it any other way in a debugger), the standard means of presenting such a value will limit itself to a small, or caller-defined, number of decimal places, and this value presented to even 15 decimal places will be correctly presented as 0.782400000000000 (by default standard output methods will show just 0.7824).
Any double value presented at 15 significant decimal figures or fewer will display as expected; for a float this reduces to just 6 significant figures. I imagine your debugger is displaying more digits than can accurately be represented in an IEEE 754 64-bit FP (double) value because internally the x86 FPU uses an 80-bit representation.
You are quite literally sweating the small stuff.
One place where this difference in representation does matter is in financial applications. For those, it is common to use decimal floating point and normally to many more significant figures than double can provide. However decimal floating-point is not normally implemented in hardware, so is much slower. Moreover decimal floating point is not directly supported in most programming languages, and requires library support. C# is an example of a language with built-in support for decimal floating-point; its decimal type is good for 28 significant figures.
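Python's decimal module is a similar library-level decimal type; its default context also carries 28 significant digits, and values written as decimal strings stay exact (a quick sketch):
>>> from decimal import Decimal, getcontext
>>> getcontext().prec
28
>>> Decimal('0.78240')
Decimal('0.78240')
>>> Decimal('0.1') + Decimal('0.2')
Decimal('0.3')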

how to make sure the rounding of a floating point number is done only on the first extra decimal position?

I have scanned a large number of questions and answers around Math.Round / Math.Floor / Math.Truncate / Math.Ceiling, but couldn't find a clear answer to my problem of rounding (with currency values).
I understand Math.Round with MidpointRounding.ToEven / AwayFromZero, but that doesn't seem to do it.
(In the following examples 1.88 is just an arbitrary number and can be replaced with any other n.mm value.)
If the result of a calculation of "decimal / decimal" is 1.88499 I need 1.88 as the rounded result. If the calculation gives 1.885499 I want 1.89 as the result of rounding (always rounding to two decimal digits).
Now, Math.Round(1.88499999, 2) returns 1.89, though 0.00499999 is certainly BELOW the middle between 8 and 9.
I understand that if the number is rounded from the last decimal digit up to the first, the result is understandable:
1.88499999 -> 1.8849999 -> 1.884999 -> 1.88499 -> 1.8849 -> 1.885 -> 1.89
However, from a fiscal point of view, like for VAT calculations, that doesn't help at all.
The only way I can think of is to first cut the number off after the third decimal digit and then round it to the 2 required digits. But isn't there a more elegant way?
You can use Decimal.ToString(). Try:
decimal dec = 1.88499999m;
dec = Convert.ToDecimal(dec.ToString("#.00"));
I have solved the issue by avoiding the float data types in this case.
I used integer variables and multiplied all currency values by 100. This way the calculations do not depend on rounding, or rather are rounded correctly when the result of a calculation is stored into an integer.
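A comparable sketch in Python (hypothetical numbers: prices held as integer cents, the VAT rate as an integer per-mille value, and a half-up rule applied once at the end, valid for non-negative values):
net_cents = 1555                     # 15.55 held as integer cents
vat_rate_permille = 190              # 19.0% expressed in tenths of a percent
# integer arithmetic throughout; "+ 500" then "// 1000" implements round half up
vat_cents = (net_cents * vat_rate_permille + 500) // 1000
print(vat_cents)                     # 295, i.e. 2.95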

What determines which system is used to translate a base 10 number to decimal and vice-versa?

There are a lot of ways to store a given number in a computer. This site lists 5
unsigned
sign magnitude
one's complement
two's complement
biased (not commonly known)
I can think of another: encode everything in ASCII and write the number with the negative sign (45) and period (46) if needed.
I'm not sure if I'm mixing apples and oranges but today I heard how computers store numbers using single and double precision floating point format. In this everything is written as a power of 2 multiplied by a fraction. This means numbers that aren't powers of 2 like 9 are written as a power of 2 multiplied by a fraction e.g. 9 ➞ 16*9/16. Is that correct?
Who decides which system is used? Is it up to the hardware of the computer or the program? How do computer algebra systems handle transcendental numbers like π on a finite machine? It seems like things would be a lot easier if everything were coded in ASCII and the negative sign and the decimal point were placed accordingly, e.g. -15.2 would be 45 49 53 46 50 (in base 10)
➞
101101 110001 110101 101110 110010
Well there are many questions here.
The main reason why the system you imagined is bad is the lack of entropy. An ASCII character is 8 bits, so instead of 2^32 possible integers you could represent only 4 characters in 32 bits, i.e. 10000 integer values (+ 1000 negative ones if you want). Even if you reduce the alphabet to 12 codes (0-9, -, .) you still need 4 bits to store each one; that gives 10^8 + 10^7 integer values, still much less than 2^32 (remember, 2^10 ~ 10^3). Using binary is optimal, because our bits only have 2 values. Any base that is a power of 2 also makes sense, hence octal and hex -- but ultimately they're just binary with bits packed per 3 or 4 for readability. If you forget about the sign (just use one bit) and the decimal separator, you get BCD: Binary Coded Decimal, which is usually coded on 4 bits per digit, though a version on 8 bits called uncompressed BCD also seems to exist. I'm sure with a bit of research you can find fixed or floating point numbers using BCD.
Putting the sign in front is exactly sign magnitude (without the entropy problem, since it has a constant size of 1 bit).
You're roughly right about the fraction in floating point numbers. These numbers are written with a mantissa m and an exponent e, and their value is m * 2^e. If you represent an integer that way, say 8, it would be 1 * 2^3, so the fraction is 1 = 8/2^3. 9 also works out exactly (9 = 1.125 * 2^3, and 1.125 has a terminating binary representation), but for a value like 9.4 or 0.1 the fraction is not exactly representable, so instead we write the closest number we can with the available bits. That is what we do as well with irrational (including transcendental) numbers like Pi: we approximate.
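From Python you can look at the exponent/fraction split directly (a small sketch; float.hex() and math.frexp are standard library features):
>>> import math
>>> math.frexp(8.0)      # 8 = 0.5 * 2**4 (Python normalises the fraction into [0.5, 1))
(0.5, 4)
>>> (9.0).hex()          # 9 = (1 + 2/16) * 2**3, exactly representable
'0x1.2000000000000p+3'
>>> (9.4).hex()          # 9.4 gets the closest representable fraction instead
'0x1.2cccccccccccdp+3'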
You're not solving anything with this system, even for floating point values. The denominator is going to be a power of 10 instead of a power of 2, which seems more natural to you, because it is the usual way we write rounded numbers, but it is not in any way more valid or more accurate.** Take 1/6 for example: you cannot represent it with a finite number of digits in the form a/10^b.*
The most popular representation for negative numbers is 2's complement, because of its nice properties when adding negative and positive numbers.
Standards committees (which argue a lot internally and eventually reach a decision) define what complicated number formats like floating point look like, and how to treat corner cases consistently. E.g. should dividing by 0 yield NaN? Infinity? An exception? You should check out the IEEE: www.ieee.org. Some committees have not even reached agreement yet, for example on how to represent intervals for interval arithmetic. Eventually it's the people who make the processors who get the final word on how bits are interpreted into a number. But sticking to standards allows for portability and compatibility between different processors (or coprocessors -- what if your GPU used a different number format? You'd have more to do than just copy data around).
Many alternatives to floating point values exist, like fixed point or arbitrary precision numbers, logarithmic number systems, rational arithmetic...
* Since 2 divides 10, you might argue that all the numbers representable as a/2^b can be written as a*5^b/10^b, so fewer numbers would need to be approximated. That only covers a minuscule family (an ideal, really) of the rational numbers, which are an infinite set. So it still doesn't remove the need to approximate many rationals, as well as all irrational numbers (such as Pi).
** In fact, because we use powers of 2, we pack more significant digits after the decimal separator than we would with powers of 10 (for the same number of bits). That is, 2^-(53+e), the smallest bit of the mantissa of a double with exponent e, is much smaller than what you can reach with 53 bits of ASCII or 4-bit base-10 digits: at best 10^-4 * 2^-e.

How to get around some rounding errors?

I have a method that deals with some geographic coordinates in .NET, and I have a struct that stores a coordinate pair such that if 256 is passed in for one of the coordinates, it becomes 0. However, in one particular instance a value of approximately 255.99999998 is calculated, and thus stored in the struct. When it's printed in ToString(), it becomes 256, which should not happen - 256 should be 0. I wouldn't mind if it printed 255.9999998 but the fact that it prints 256 when the debugger shows 255.99999998 is a problem. Having it both store and display 0 would be even better.
Specifically there's an issue with comparison. 255.99999998 is sufficiently close to 256 such that it should equal it. What should I do when comparing doubles? use some sort of epsilon value?
EDIT: Specifically, my problem is that I take a value, perform some calculations, then perform the opposite calculations on that number, and I need to get back the original value exactly.
This sounds like a problem with how the number is printed, not how it is stored. A double has about 15 significant figures, so it can tell 255.99999998 from 256 with precision to spare.
You could use the epsilon approach, but the epsilon is typically a fudge to get around the fact that floating-point arithmetic is lossy.
You might consider avoiding binary floating-point altogether and using a nice Rational class.
The calculation above was probably destined to be 256 if you were doing lossless arithmetic as you would get with a Rational type.
Rational types can go by the name of Ratio or Fraction class, and are fairly simple to write
Here's one example.
Here's another
Edit....
To understand your problem, consider that when the decimal value 0.01 is converted to a binary representation it cannot be stored exactly in finite memory. The hexadecimal representation of this value is 0.028F5C28F5C..., where the "28F5C" repeats infinitely. So even before doing any calculations, you lose exactness just by storing 0.01 in binary format.
Rational and Decimal classes are used to overcome this problem, albeit with a performance cost. Rational types avoid it by storing a numerator and a denominator to represent your value. Decimal types use a binary-encoded decimal format, which can be lossy in division but can store common decimal values exactly.
For your purpose I still suggest a Rational type.
You can choose format strings which should let you display as much of the number as you like.
The usual way to compare doubles for equality is to subtract them and see if the absolute value is less than some predefined epsilon, maybe 0.000001.
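In Python terms the same idea looks like this (math.isclose is the standard helper; the tolerances here are arbitrary choices):
>>> import math
>>> a, b = 255.99999998, 256.0
>>> abs(a - b) < 1e-6                  # hand-rolled epsilon test
True
>>> math.isclose(a, b, abs_tol=1e-6)   # same idea with the stdlib helper
True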
You have to decide yourself on a threshold under which two values are equal. This amounts to using so-called fixed-point numbers (as opposed to floating point). Then, you have to perform the rounding manually.
I would go with some unsigned type of known size (e.g. uint32 or uint64 if they're available, I don't know .NET) and treat it as a fixed-point number type mod 256.
E.g. (in C):
#include <math.h>     /* fmod */
#include <stdint.h>   /* uint32_t */

typedef uint32_t fixed;   /* 8.24 fixed point: 8 integer bits, 24 fractional bits */

static inline fixed to_fixed(double d)
{
    return (fixed)(fmod(d, 256.) * (double)(1 << 24));   /* note: negative d gives a negative fmod() result */
}

static inline double to_double(fixed f)
{
    return (double)f / (double)(1 << 24);
}
or something more elaborate to suit a rounding convention (to nearest, to lower, to higher, to odd, to even). The highest 8 bits of fixed hold the integer part, the 24 lower bits hold the fractional part. Absolute precision is 2^-24.
Note that adding and subtracting such numbers naturally wraps around at 256. For multiplication, you should beware.
