I wonder if anyone has a good solution for decimal numbers.
I use a Arduino Mega, and try to convert a float with 6 numbers after decimal point. When I try, I get 5 numbers correct, but not number 6. The 6 number is either not counted, or shown as 0. I have tried a lot of different things, but it always end up showing 5 numbers correct, but not 6.
Do anyone has a solution for this?
Appriciate all help
In general, you can use scaled integer forms of floating-point numbers to preserve accuracy.
Specifically, if these are lat/lon values from a GPS device, you might be interested in my NeoGPS. Internally, it uses uses 32-bit integers to maintain 10 significant digits. As you have discovered, most libraries only provide the 6 or 7 digits because they use float.
The example NMEAloc.ino shows how to print the 32-bit integers as if they were floating-point values. It just prints the decimal point at the right place.
The NeoGPS distance and bearing calculations are also careful to perform math operations in a way that maintains that accuracy. The results are very good at small distances/bearings, unlike all other libraries that use the float type in naive calculations.
4 byte floats can hold 6 significant digits.
8 byte doubles can hold 15.
You need to use doubles to get the precision you want.
info on 4 byte floats
Related
There is a lot of questions on rounding that i have looked at but tey all involve rounding a number to its nearest whole, or to a certain number of points. What i want to do is simply convert a string to a double without any added digits on the right of the decimal point. Here is my code and result as of now:
Convert the string 0.78240 to a double, which should be 0.78240 but instead is 0.78239999999999998 when i look at it in the debugger.
The string value is a QString and is converted to a double simply using the toDouble() function.
I don't understand how or where these extra numbers are coming from, but any help on converting from QString to double directly would be greatly appreciated!
The extra digits are there because you are converting a decimal real number to binary floating point.
Unlike real numbers, floating-point representations have infinite resolution and finite range, and also binary floating-point values do not exactly coincide with all (or even most) decimal real values.
The simple fact is that binary floating-point cannot exactly represent 0.7824010, your debugger is showing you all the available digits after round-tripping the binary value back to decimal.
It is not necessarily a problem, because the error is infinitesimally small compared to the magnitude of the value, and in any event the original 0.78240 value is no doubt some approximation of a real-world value - they are both approximations, just binary or decimal approximations.
The issue is normally dealt with at presentation rather then representation. For example, in this case, unlike your debugger which necessarily shows the full precision of the internal representation (you would not want it any other way in a debugger), the standard means of presenting such a value will limit itself to a small, or caller defined number of decimal places and this value presented to even 15 decimal places will be correctly presented as 0.782400000000000 (by default standard output methods will show just 0.7824).
Any double value presented at 15 significant decimal figures or fewer will display as expected, for a float this reduces to just 6 significant figures. I imagine your debugger is displaying more digits that can accurately be presented in an IEEE 754 64-bit FP (double) value because internally the x86 FPU uses an 80bit representation.
You are quite literally sweating the small stuff.
One place where this difference in representation does matter is in financial applications. For those, it is common to use decimal floating point and normally to many more significant figures than double can provide. However decimal floating-point is not normally implemented in hardware, so is much slower. Moreover decimal floating point is not directly supported in most programming languages, and requires library support. C# is an example of a language with built-in support for decimal floating-point; its decimal type is good for 28 significant figures.
There are a lot of ways to store a given number in a computer. This site lists 5
unsigned
sign magnitude
one's complement
two's complement
biased (not commonly known)
I can think of another. Encode everything in Ascii and write the number with the negative sign (45) and period (46) if needed.
I'm not sure if I'm mixing apples and oranges but today I heard how computers store numbers using single and double precision floating point format. In this everything is written as a power of 2 multiplied by a fraction. This means numbers that aren't powers of 2 like 9 are written as a power of 2 multiplied by a fraction e.g. 9 ➞ 16*9/16. Is that correct?
Who decides which system is used? Is it up to the hardware of the computer or the program? How do computer algebra systems handle transindental numbers like π on a finite machine? It seems like things would be a lot easier if everything's coded in Ascii and the negative sign and the decimal is placed accordingly e.g. -15.2 would be 45 49 53 46 (to base 10)
➞
111000 110001 110101 101110
Well there are many questions here.
The main reason why the system you imagined is bad, is because the lack of entropy. An ASCII character is 8 bits, so instead of 2^32 possible integers, you could represent only 4 characters on 32 bits, so 10000 integer values (+ 1000 negative ones if you want). Even if you reduce to 12 codes (0-9, -, .) you still need 4 bits to store them. So, 10^8+10^7 integer values, still much less than 2^32 (remember, 2^10 ~ 10^3). Using binary is optimal, because our bits only have 2 values. Any base that is a power of 2 also makes sense, hence octal and hex -- but ultimately they're just binary with bits packed per 3 or 4 for readability. If you forget about the sign (just use one bit) and the decimal separator, you get BCD : Binary Coded Decimals, which are usually coded on 4 bits per digit though a version on 8 bits called uncompressed BCD also seems to exist. I'm sure with a bit of research you can find fixed or floating point numbers using BCD.
Putting the sign in front is exactly sign magnitude (without the entropy problem, since it has a constant size of 1 bit).
You're roughly right on the fraction in floating point numbers. These numbers are written with a mantissa m and an exponent e, and their value is m 2^e. If you represent an integer that way, say 8, it would be 1x2^3, then the fraction is 1 = 8/2^3. With 9 that fraction is not exactly representable, so instead of 1 we write the closest number we can with the available bits. That is what we do as well with irrational (and thus transcendental) numbers like Pi : we approximate.
You're not solving anything with this system, even for floating point values. The denominator is going to be a power of 10 instead of a power of 2, which seems more natural to you, because it is the usual way we write rounded numbers, but is not in any way more valid or more accurate. ** Take 1/6 for example, you cannot represent it with a finite number of digits in the form a/10^b. *
The most popular representations for negative numbers is 2's complement, because of its nice properties when adding negative and positive numbers.
Standards committees (argue a lot internally and eventually) decide what complex number formats like floating points look like, and how to consistently treat corner cases. E.g. should dividing by 0 yield NaN ? Infinity ? An exception ? You should check out the IEEE : www.ieee.org . Some committees are not even agreeing yet, for example on how to represent intervals for interval arithmetic. Eventually it's the people who make the processors who get the final word on how bits are interpreted into a number. But sticking to standards allows for portability and compatibility between different processors (or coprocessors, what if your GPU used a different number format ? You'd have more to do than just copy data around).
Many alternatives to floating point values exist, like fixed point or arbitrary precision numbers, logarithmic number systems, rational arithmetic...
* Since 2 divides 10, you might argue that all the numbers representable by a/2^b can be a5^b/10^b, so less numbers need to be approximated. That only covers a minuscule family (an ideal, really) of the rational numbers, which are an infinite set of numbers. So it still doesn't solve the need for approximations for many rational, as well as all irrational numbers (as Pi).
** In fact, because of the fact that we use the powers of 2 we pack more significant digits after the decimal separator than we would with powers of 10 (for a same number of bits). That is, 2^-(53+e), the smallest bit of the mantissa of a double with exponent e, is much smaller than what you can reach with 53 bits of ASCII or 4-bit base 10 digits : at best 10^-4 * 2^-e
Suppose I wanted to add, subtract, and/or multiply the following two floating point numbers that follow the format:
1 bit sign
3 bit exponent (bias 3)
6 bit mantissa
Can someone briefly explain how I would do that? I've tried searching online for helpful resources, but I haven't been able to find anything too intuitive. However, I know the procedure is generally supposed to be very simple. As an example, here are two numbers that I'd like to perform the three operations on:
0 110 010001
1 010 010000
To start, take the significand encoding and prefix it with a “1.”, and write the result with the sign determined by the sign bit. So, for your example numbers, we have:
+1.010001
-1.010000
However, these have different scales, because they have different exponents. The exponent of the second one is four less than the first one (0102 compared to 1102). So shift it right by four bits:
+1.010001
- .0001010000
Now both significands have the same scale (exponent 1102), so we can perform normal arithmetic, in binary:
+1.010001
- .0001010000
_____________
+1.0011000000
Next, round the significand to the available bits (seven). In this case, the trailing bits are zero, so the rounding does not change anything:
+1.001100
At this point, we could have a significand that needed more shifting, if it were greater than 2 (102) or less than 1. However, this significand is just where we want it, between 1 and 2. So we can keep the exponent as is (1102).
Convert the sign back to a bit, take the leading “1.” off the significand, and put the bits together:
0 110 001100
Exceptions would arise if the number overflowed or underflowed the normal exponent range, but those did not happen here.
I came across an interesting math problem that would require me to do some artithmetic with numbers that have more than 281 digits. I know that its impossible to represent a number this large with a system where there is one memory unit for each digit but wondered if there were any ways around this.
My initial thought was to use a extremely large base instead of base 10 (decimal). After some thought I believe (but can't verify) that the optimal base would be the square root of the number of digits (so for a number with 281 digits you'd use base 240ish) which is a improvement but that doesn't scale well and still isn't really practical.
So what options do I have? I know of many arbitrary precision libraries, but are there any that scale to support this sort of arithmetic?
Thanks o7
EDIT: after thinking some more i realize i may be completely wrong about the "optimal base would be the square root of the number of digits" but a) that's why im asking and b) im too tired to remember my initial reasoning for assumption.
EDIT 2: 1000,000 in base ten = F4240 in base 16 = 364110 in base 8. In base 16 you need 20 bits to store the number in base 8 you need 21 so it would seem that by increasing the base you decrees the total number of bits needed. (again this could be wrong)
This is really a compression problem pretending to be an arithmetic problem. What you can do with such a large number depends entirely on its Kolmogorov complexity. If you're required to do computations on such a large number, it's obviously not going be arrive as 2^81 decimal digits; the Kolmogorov complexity would too high in that case and you can't even finish reading the input before the sun goes out. The best way to deal with such a number is via delayed evaluation and symbolic rational types that a language like Scheme provides. This way a program may be able to answer some questions about the result of computations on the number without actually having to write out all those digits to memory.
I think you should just use scientific notation. You will lose precision, but you can not store numbers that large without losing precision, because storing 2^81 digits will require more than 10^24 bits(about thousand billion terabytes), which is much more that you can have nowadays.
that have more than 2^81 digits
Non-fractional number with 2^81 bits, will take 3*10^11 terabytes of data. Per number.
That's assuming you want every single digit and data isn't compressible.
You could attempt to compress the data storing it in some kind of sparse array that allocates memory only for non-zero elements, but that doesn't guarantee that data will be fit anywhere.
Such precision is useless and impossible to handle on modern hardware. 2^81 bits will take insane amount of time to simply walk through number (9584 trillion years, assuming 1 byte takes 1 millisecond), never mind multiplication/division. I also can't think of any problem that would require precision like that.
Your only option is to reduce precision to first N significant digits and use floating point numbers. Since data won't fit into double, you'll have to use bignum library with floating point support, that provides extremely large floating point numbers. Since you can represent 2^81 (exponent) in bits, you can store beginning of a number using very big floating point.
1000,000 in base ten
Regardless of your base, positive number will take at least floor(log2(number))+1 bits to store it. If base is not 2, then it will take more than floor(log2(number))+1 bits to store it. Numeric base won't reduce number of required bits.
I am trying to perform arithmetic using only 16-bit signed words. I need to be able to perform addition, multiplication, etc.
As an example I need to subtract two data values, below is an example:
7269.554688-46.8
or 4385.6616210938 + 32.2
However, these values need to be converted into 16-bit words and then the subtraction, multiplication, or addition can be performed.
I could also use multiple 16-bit words to store one value.
How would I go about performing operations like addition, subtraction and multiplication and how would I convert all of my input values appropriately so that the decimal points always line up properly?
What platform are you coding for? To perform the operations you've given as an example, you would need a floating point unit. Floating point numbers are usually represented through 32 bits or 64 bits, rarely 16 bit.
If you don't have one and all you have are simple operations on 16 bit integers, you could emulate a floating point unit, but that is not a trivial task.