I saw that when using parse for example, you can do this: parse(Int, "123") (Int exists), but I can't do parse(Float, "12.3") (Float doesn't exist).
Why doesn't Float exist as well then? What is the difference between Int and for example Int64 or some other number after Int anyways (I know it has to do with the size, but how can you know when to use which)?
The reason for this is that machines are either 32 bit or 64 bit. This is the size of pointers on these machines, and since pointers are just integers in hardware, it is also the "natural" integer size. Floating point arithmetic is different. (Almost) all computers have both FLoat32 and Float64, and the choice needs to be made based on application (how much range and accuracy you need). That said, you could always define const Float = Float64 and then just use Float.
Related
I would like to declare a speed range for a record type in Ada. The following won't work, but is there a way to make it work?
--Speed in knots, range 0 to unlimited
Speed : float Range 0.0 .. unlimited ;
I just want a zero positive value for this number...
You can't -- but since Speed is of type Float, its value can't exceed Float'Last anyway.
Speed : Float range 0.0 .. Float'Last;
(You'll likely want to declare an explicit type or subtype.)
Just for completeness, you can also define your own basic float types rather than use one called Float which may or may not have the range you require.
For example, Float is defined somewhere in the compiler or RTS (Runtime System) sources, probably as type Float is digits 7; alongside type Long_Float is digits 15;, giving you 7 and 15 digits precision respectively.
You can define yours likewise to satisfy the precision and range your application requires. The philosophy is, state what you need (in range and precision), and let the compiler satisfy it most efficiently. This is programming in the problem domain, stating what you want - rather than in the solution domain, binding your program to what a specific machine or compiler supports.
The compiler will either use the next highest precision native float (usually IEEE 32-bit or 64-bit floats) or complain that it can't do that
(e.g. if you declare
type Extra_Long_Float is digits 33 range 0.0 .. Long_Float'Last * Long_Float'Last;
your compiler may complain if it doesn't support 128 bit floats.
Unlimited isn't possible. It would require unlimited memory. I'm not aware of any platform that has that. It's possible to write a package that provides rational numbers as big as the available memory can handle (see PragmARC.Rational_Numbers in the PragmAda Reusable Components for an example), but that's probably not what you're interested in. You can declare your own type with the maximal precision supported by your compiler:
type Speed_Value_Base is digits System.Max_Digits;
subtype Speed_Value is Speed_Value_Base range 0.0 .. Speed_Value_Base'Last;
Speed : Speed_Value;
which is probably what you're after.
According to this presentation (http://oud.ocaml.org/2012/slides/oud2012-paper13-slides.pdf, PDF page 4), the following two data structures use different amount of memory
type t1 = { f1: float; f2:float};;
type t2 = (float * float);;
And t1 use less memory than t2, can anybody explain to me why is this the case?
19.3.3 of http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#sec425 says:
Arrays of floating-point numbers (type float array) have a special, unboxed, more efficient representation. These arrays are represented by pointers to blocks with tag Double_array_tag.
This is introduced to handle big float arrays efficiently, but this applies also to record types only with floats.
https://realworldocaml.org/v1/en/html/memory-representation-of-values.html is also a very good documentation which explains the internal value representation of OCaml.
In addition to camlspotter answer a few clarification notes:
In general records, arrays and tuples use the same amount of memory.
Some compound data structures using floats are exceptions. By default, each float is boxed, i.e., represented not as immediate value, but as a pointer to the allocated double precision floating point value. OCaml optimizes float records and arrays to store data directly, without double boxing.
This doesn't work for polymorphic records, e.g., float ref, that is underneath the hood is a record of type a ref = {mutable contents : 'a} still occupies extra amount of space, i.e., it is a pointer to a record, that contains a pointer to the word. But, if you define, type float_ref = {mutable float_contents : float} this would be a pointer to a record that contains directly a float value.
Has anyone experiences replacing floating point operations on ATMega (2560) based systems? There are a couple of very common situations which happen every day.
For example:
Are comparisons faster than divisions/multiplications?
Are float to int type cast with followed multiplication/division faster than pure floating point operations without type cast?
I hope I don't have to make a benchmark just for me.
Example one:
int iPartialRes = (int)fArg1 * (int)fArg2;
iPartialRes *= iFoo;
faster as?:
float fPartialRes = fArg1 * fArg2;
fPartialRes *= iFoo;
And example two:
iSign = fVal < 0 ? -1 : 1;
faster as?:
iSign = fVal / fabs(fVal);
the questions could be solved just by thinking a moment about it.
AVRs does not have a FPU so all floating point related stuff is done in software --> fp multiplication involves much more than a simple int multiplication
since AVRs also does not have a integer division unit a simple branch is also much faster than a software division. if dividing floating points this is the worst worst case :)
but please note, that your first 2 examples produce very different results.
This is an old answer but I will submit this elaborated answer for the curious.
Just typecasting a float will truncate it ie; 3.7 will become 3, there is no rounding.
Fastest math on a 2560 will be (+,-,*) with divide being the slowest due to no hardware divide. Typecasting to an unsigned long int after multiplying all operands by a pseudo decimal point that suits your fractal number(1) range that your floats are expected to see and tracking the sign as a bool will give the best range/accuracy compromise.
If your loop needs to be as fast as possible, avoid even integer division, instead multiplying by a pseudo fraction instead and then doing your typecast back into a float with myFloat(defined elsewhere) = float(myPseudoFloat) / myPseudoDecimalConstant;
Not sure if you came across the Show info page in the playground. It's basically a sketch that runs a benchmark on your (insert Arduino model here) Shows the actual compute times for various things and systems. The Mega 2560 will be very close to an At Mega 328 as far as FLOPs goes, up to 12.5K/s (80uS per divide float). Typecasting would likely handicap the CPU more as it introduces more overhead and might even give erroneous results due to rounding errors and lack of precision.
(1)ie: 543.509,291 * 100000 = 543,509,291 will move the decimal 6 places to the maximum precision of a float on an 8-bit AVR. If you first multiply all values by the same constant like 1000, or 100000, etc, then the decimal point is preserved and then you cast it back to a float number by dividing by your decimal constant when you are ready to print or store it.
float f = 3.1428;
int x;
x = f * 10000;
x now contains 31428
I have the following code:
NSUInteger one = 1;
CGPoint p = CGPointMake(-one, -one);
NSLog(#"%#", NSStringFromCGPoint(p));
Its output:
{4.29497e+09, 4.29497e+09}
On the other hand:
NSUInteger one = 1;
NSLog(#"%i", -one); // prints -1
I know there’s probably some kind of overflow going on, but why do the two cases differ and why doesn’t it work the way I want? Should I always remind myself of the particular numeric type of my variables and expressions even when doing trivial arithmetics?
P.S. Of course I could use unsigned int instead of NSUInteger, makes no difference.
When you apply the unary - to an unsigned value, the unsigned value is negated and then forced back into unsigned garb by having Utype_MAX + 1 repeatedly added to that value. When you pass that to CGPointMake(), that (very large) unsigned value is then assigned to a CGFloat.
You don't see this in your NSLog() statement because you are logging it as a signed integer. Convert that back to a signed integer and you indeed get -1. Try using NSLog("%u", -one) and you'll find you're right back at 4294967295.
unsigned int versus NSUInteger DOES make a difference: unsigned int is half the size of NSUInteger under an LP64 architecture (x86_64, ppc64) or when you compile with NS_BUILD_32_LIKE_64 defined. NSUInteger happens to always be pointer-sized (but use uintptr_t if you really need an integer that's the size of a pointer!); unsigned is not when you're using the LP64 model.
OK without actually knowing, but reading around on the net about all of these datatypes, I'd say the issue was with the conversion from a NUSInteger (which resolves to either an int (x32) or a long (x64)) to a CGFloat (which resolves to either a float(x32) or double(x64)).
In your second example that same conversion is not happening. The other thing that may be effecting it is that from my reading, NSUinteger is not designed contain negative numbers, only positive ones. So that is likely to be where things start to go wrong.
I have a method that deals with some geographic coordinates in .NET, and I have a struct that stores a coordinate pair such that if 256 is passed in for one of the coordinates, it becomes 0. However, in one particular instance a value of approximately 255.99999998 is calculated, and thus stored in the struct. When it's printed in ToString(), it becomes 256, which should not happen - 256 should be 0. I wouldn't mind if it printed 255.9999998 but the fact that it prints 256 when the debugger shows 255.99999998 is a problem. Having it both store and display 0 would be even better.
Specifically there's an issue with comparison. 255.99999998 is sufficiently close to 256 such that it should equal it. What should I do when comparing doubles? use some sort of epsilon value?
EDIT: Specifically, my problem is that I take a value, perform some calculations, then perform the opposite calculations on that number, and I need to get back the original value exactly.
This sounds like a problem with how the number is printed, not how it is stored. A double has about 15 significant figures, so it can tell 255.99999998 from 256 with precision to spare.
You could use the epsilon approach, but the epsilon is typically a fudge to get around the fact that floating-point arithmetic is lossy.
You might consider avoiding binary floating-points altogether and use a nice Rational class.
The calculation above was probably destined to be 256 if you were doing lossless arithmetic as you would get with a Rational type.
Rational types can go by the name of Ratio or Fraction class, and are fairly simple to write
Here's one example.
Here's another
Edit....
To understand your problem consider that when the decimal value 0.01 is converted to a binary representation it cannot be stored exactly in finite memory. The Hexidecimal representation for this value is 0.028F5C28F5C where the "28F5C" repeats infinitely. So even before doing any calculations, you loose exactness just by storing 0.01 in binary format.
Rational and Decimal classes are used to overcome this problem, albeit with a performance cost. Rational types avoid this problem by storing a numerator and a denominator to represent your value. Decimal type use a binary encoded decimal format, which can be lossy in division, but can store common decimal values exactly.
For your purpose I still suggest a Rational type.
You can choose format strings which should let you display as much of the number as you like.
The usual way to compare doubles for equality is to subtract them and see if the absolute value is less than some predefined epsilon, maybe 0.000001.
You have to decide yourself on a threshold under which two values are equal. This amounts to using so-called fixed point numbers (as opposed to floating point). Then, you have to perform the round up manually.
I would go with some unsigned type with known size (eg. uint32 or uint64 if they're available, I don't know .NET) and treat it as a fixed point number type mod 256.
Eg.
typedef uint32 fixed;
inline fixed to_fixed(double d)
{
return (fixed)(fmod(d, 256.) * (double)(1 << 24))
}
inline double to_double(fixed f)
{
return (double)f / (double)(1 << 24);
}
or something more elaborated to suit a rounding convention (to nearest, to lower, to higher, to odd, to even). The highest 8 bits of fixed hold the integer part, the 24 lower bits hold the fractional part. Absolute precision is 2^{-24}.
Note that adding and substracting such numbers naturally wraps around at 256. For multiplication, you should beware.