I would like to declare a speed range for a record type in Ada. The following won't work, but is there a way to make it work?
--Speed in knots, range 0 to unlimited
Speed : float Range 0.0 .. unlimited ;
I just want a zero positive value for this number...
You can't -- but since Speed is of type Float, its value can't exceed Float'Last anyway.
Speed : Float range 0.0 .. Float'Last;
(You'll likely want to declare an explicit type or subtype.)
Just for completeness, you can also define your own basic float types rather than use one called Float which may or may not have the range you require.
For example, Float is defined somewhere in the compiler or RTS (Runtime System) sources, probably as type Float is digits 7; alongside type Long_Float is digits 15;, giving you 7 and 15 digits precision respectively.
You can define yours likewise to satisfy the precision and range your application requires. The philosophy is, state what you need (in range and precision), and let the compiler satisfy it most efficiently. This is programming in the problem domain, stating what you want - rather than in the solution domain, binding your program to what a specific machine or compiler supports.
The compiler will either use the next highest precision native float (usually IEEE 32-bit or 64-bit floats) or complain that it can't do that
(e.g. if you declare
type Extra_Long_Float is digits 33 range 0.0 .. Long_Float'Last * Long_Float'Last;
your compiler may complain if it doesn't support 128 bit floats.
Unlimited isn't possible. It would require unlimited memory. I'm not aware of any platform that has that. It's possible to write a package that provides rational numbers as big as the available memory can handle (see PragmARC.Rational_Numbers in the PragmAda Reusable Components for an example), but that's probably not what you're interested in. You can declare your own type with the maximal precision supported by your compiler:
type Speed_Value_Base is digits System.Max_Digits;
subtype Speed_Value is Speed_Value_Base range 0.0 .. Speed_Value_Base'Last;
Speed : Speed_Value;
which is probably what you're after.
Related
I saw that when using parse for example, you can do this: parse(Int, "123") (Int exists), but I can't do parse(Float, "12.3") (Float doesn't exist).
Why doesn't Float exist as well then? What is the difference between Int and for example Int64 or some other number after Int anyways (I know it has to do with the size, but how can you know when to use which)?
The reason for this is that machines are either 32 bit or 64 bit. This is the size of pointers on these machines, and since pointers are just integers in hardware, it is also the "natural" integer size. Floating point arithmetic is different. (Almost) all computers have both FLoat32 and Float64, and the choice needs to be made based on application (how much range and accuracy you need). That said, you could always define const Float = Float64 and then just use Float.
C++11 introduced very useful math functions in the standard like erf and erfc. There are mentions about "guaranteed underflow" for inputs greater or smaller than certain values, but I don't know enough about floating point representation to understand clearly what this means in terms of precision.
If this question makes sense; what precision (order of magnitude at least) can I expect from the approximation implemented by the standard library (if it is specified)?
This depends on the quality of the implementation, which is up to the vendor of the compiler (or runtime libraries, if acquired separately). In the best case, the precision will match the precision of the specific type you use (double, long double, and so on).
Notice that the precision of the returned value is not related to the guaranteed underflow. This is just an enforced postcondition that assures the return value is the special underflow FP value if the input is outside the expected domain.
I have a method that deals with some geographic coordinates in .NET, and I have a struct that stores a coordinate pair such that if 256 is passed in for one of the coordinates, it becomes 0. However, in one particular instance a value of approximately 255.99999998 is calculated, and thus stored in the struct. When it's printed in ToString(), it becomes 256, which should not happen - 256 should be 0. I wouldn't mind if it printed 255.9999998 but the fact that it prints 256 when the debugger shows 255.99999998 is a problem. Having it both store and display 0 would be even better.
Specifically there's an issue with comparison. 255.99999998 is sufficiently close to 256 such that it should equal it. What should I do when comparing doubles? use some sort of epsilon value?
EDIT: Specifically, my problem is that I take a value, perform some calculations, then perform the opposite calculations on that number, and I need to get back the original value exactly.
This sounds like a problem with how the number is printed, not how it is stored. A double has about 15 significant figures, so it can tell 255.99999998 from 256 with precision to spare.
You could use the epsilon approach, but the epsilon is typically a fudge to get around the fact that floating-point arithmetic is lossy.
You might consider avoiding binary floating-points altogether and use a nice Rational class.
The calculation above was probably destined to be 256 if you were doing lossless arithmetic as you would get with a Rational type.
Rational types can go by the name of Ratio or Fraction class, and are fairly simple to write
Here's one example.
Here's another
Edit....
To understand your problem consider that when the decimal value 0.01 is converted to a binary representation it cannot be stored exactly in finite memory. The Hexidecimal representation for this value is 0.028F5C28F5C where the "28F5C" repeats infinitely. So even before doing any calculations, you loose exactness just by storing 0.01 in binary format.
Rational and Decimal classes are used to overcome this problem, albeit with a performance cost. Rational types avoid this problem by storing a numerator and a denominator to represent your value. Decimal type use a binary encoded decimal format, which can be lossy in division, but can store common decimal values exactly.
For your purpose I still suggest a Rational type.
You can choose format strings which should let you display as much of the number as you like.
The usual way to compare doubles for equality is to subtract them and see if the absolute value is less than some predefined epsilon, maybe 0.000001.
You have to decide yourself on a threshold under which two values are equal. This amounts to using so-called fixed point numbers (as opposed to floating point). Then, you have to perform the round up manually.
I would go with some unsigned type with known size (eg. uint32 or uint64 if they're available, I don't know .NET) and treat it as a fixed point number type mod 256.
Eg.
typedef uint32 fixed;
inline fixed to_fixed(double d)
{
return (fixed)(fmod(d, 256.) * (double)(1 << 24))
}
inline double to_double(fixed f)
{
return (double)f / (double)(1 << 24);
}
or something more elaborated to suit a rounding convention (to nearest, to lower, to higher, to odd, to even). The highest 8 bits of fixed hold the integer part, the 24 lower bits hold the fractional part. Absolute precision is 2^{-24}.
Note that adding and substracting such numbers naturally wraps around at 256. For multiplication, you should beware.
Can anyone please tell me the usage of the following declarations shown below.I am a beginner in ada language.I had tried the internet but that was not clear enough.
type Unsigned_4 is mod 2 ** 4;
for Unsigned_4'Size use 4;
Unsigned_4 is a "modular type" taking the values 0, 1, .. 14, 15, and wrapping round.
U : Unsigned_4;
begin
U := Unsigned_4'Last; -- 15
U := U + 1; -- 0
You only need 4 bits to implement the type, so it's OK to specify that as its Size (I think this may be simply a confirming spec, since the compiler clearly knows that already; if you were hoping to fit it into 3 bits and said for Unsigned_4'Size use 3; the compiler would tell you that you were wrong).
Most compilers will want to store values of the type in at least a byte, for efficient access. The minimum size comes into its own when you use the type in a packed record (pragma Pack).
The "is mod" is Ada's way of saying that this is a modular type. Modular types work a bit like unsigned types in C: They don't have negative values, and once you reach the largest representable value, if you add one you will get 0.
If you were to try the same with a normal (non-modular) integer in Ada, you'd get constraint_error
For a computer working with a 64 bit processor, the largest number that it can handle would be 264 = 18,446,744,073,709,551,616. How does programming languages, say Java or be it C, C++ handle arithmetic of numbers higher than this value. Any register cannot hold it as a single piece. How was this issue tackled?
There are lots of specialized techniques for doing calculations on numbers larger than the register size. Some of them are outlined in this wikipedia article on arbitrary precision arithmetic
Low level languages, like C and C++, leave large number calculations to the library of your choice. One notable one is the GNU Multi-Precision library. High level languages like Python, and others, integrate this into the core of the language, so normal numbers and very large numbers are identical to the programmer.
You assume the wrong thing. The biggest number it can handle in a single register is a 64-bits number. However, with some smart programming techniques, you could just combined a few dozens of those 64-bits numbers in a row to generate a huge 6400 bit number and use that to do more calculations. It's just not as fast as having the number fit in one register.
Even the old 8 and 16 bits processors used this trick, where they would just let the number overflow to other registers. It makes the math more complex but it doesn't put an end to the possibilities.
However, such high-precision math is extremely unusual. Even if you want to calculate the whole national debt of the USA and store the outcome in Zimbabwean Dollars, a 64-bits integer would still be big enough, I think. It's definitely big enough to contain the amount of my savings account, though.
Programming languages that handle truly massive numbers use custom number primitives that go beyond normal operations optimized for 32, 64, or 128 bit CPUs. These numbers are especially useful in computer security and mathematical research.
The GNU Multiple Precision Library is probably the most complete example of these approaches.
You can handle larger numbers by using arrays. Try this out in your web browser. Type the following code in the JavaScript console of your web browser:
The point at which JavaScript fails
console.log(9999999999999998 + 1)
// expected 9999999999999999
// actual 10000000000000000 oops!
JavaScript does not handle plain integers above 9999999999999998. But writing your own number primitive is to make this calculation work is simple enough. Here is an example using a custom number adder class in JavaScript.
Passing the test using a custom number class
// Require a custom number primative class
const {Num} = require('./bases')
// Create a massive number that JavaScript will not add to (correctly)
const num = new Num(9999999999999998, 10)
// Add to the massive number
num.add(1)
// The result is correct (where plain JavaScript Math would fail)
console.log(num.val) // 9999999999999999
How it Works
You can look in the code at class Num { ... } to see details of what is happening; but here is a basic outline of the logic in use:
Classes:
The Num class contains an array of single Digit classes.
The Digit class contains the value of a single digit, and the logic to handle the Carry flag
Steps:
The chosen number is turned into a string
Each digit is turned into a Digit class and stored in the Num class as an array of digits
When the Num is incremented, it gets carried to the first Digit in the array (the right-most number)
If the Digit value plus the Carry flag are equal to the Base, then the next Digit to the left is called to be incremented, and the current number is reset to 0
... Repeat all the way to the left-most digit of the array
Logistically it is very similar to what is happening at the machine level, but here it is unbounded. You can read more about about how digits are
carried here; this can be applied to numbers of any base.
Ada actually supports this natively, but only for its typeless constants ("named numbers"). For actual variables, you need to go find an arbitrary-length package. See Arbitrary length integer in Ada
More-or-less the same way that you do. In school, you memorized single-digit addition, multiplication, subtraction, and division. Then, you learned how to do multiple-digit problems as a sequence of single-digit problems.
If you wanted to, you could multiply two twenty-digit numbers together using nothing more than knowledge of a simple algorithm, and the single-digit times tables.
In general, the language itself doesn't handle high-precision, high-accuracy large number arithmetic. It's far more likely that a library is written that uses alternate numerical methods to perform the desired operations.
For example (I'm just making this up right now), such a library might emulate the actual techniques that you might use to perform that large number arithmetic by hand. Such libraries are generally much slower than using the built-in arithmetic, but occasionally the additional precision and accuracy is called for.
As a thought experiment, imagine the numbers stored as a string. With functions to add, multiply, etc these arbitrarily long numbers.
In reality these numbers are probably stored in a more space efficient manner.
Think of one machine-size number as a digit and apply the algorithm for multi-digit multiplication from primary school. Then you don't need to keep the whole numbers in registers, just the digits as they are worked on.
Most languages store them as array of integers. If you add/subtract two to of these big numbers the library adds/subtracts all integer elements in the array separately and handles the carries/borrows.
It's like manual addition/subtraction in school because this is how it works internally.
Some languages use real text strings instead of integer arrays which is less efficient but simpler to transform into text representation.