Converting integer to floating point number in Delphi - pointers

Could someone please explain the below code to me? It takes the Integer and converts it to a Single floating point number but if someone could break this down and elaborate that would be helpful.
singleVar := PSingle(@intVar)^

This doesn't convert the integer to a float. It reinterprets the bytes of the 32-bit integer as a single (a floating point data type that also has 32 bits).
@intVar is the address of the integer data in memory. The type is pointer to integer (PInteger). By writing PSingle(@intVar), you tell the compiler to pretend that it is a pointer to a single; in effect, you tell the compiler that it should interpret the data at this place in memory as a single. Finally, PSingle(@intVar)^ is simply dereferencing the pointer. Hence, it is the "single" value at this location in memory, that is, the original bytes now interpreted as a single.
Interpreting the bytes of an integer as a single doesn't give you the same numerical value in general. For instance, if the integer value is 123, its bytes in memory are 7B 00 00 00 (on a little-endian machine). If you interpret this byte sequence as a single, you obtain 1.72359711111953E-43, which is not numerically equivalent.
To actually convert an integer to a single, you would simply write singleVar := intVar; the compiler performs the numeric conversion for you.
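For a rough cross-check outside Delphi, here is a minimal C sketch of the same distinction, using memcpy for the bit reinterpretation and a plain cast for the numeric conversion (the variable names simply mirror the question):
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    int32_t intVar = 123;
    float reinterpreted, converted;

    /* Reinterpret the integer's 4 bytes as a float: the C analogue of PSingle(@intVar)^ */
    memcpy(&reinterpreted, &intVar, sizeof reinterpreted);

    /* Numeric conversion: the value 123 becomes 123.0 */
    converted = (float)intVar;

    printf("reinterpreted: %g\n", reinterpreted);  /* ~1.72e-43 */
    printf("converted:     %g\n", converted);      /* 123 */
    return 0;
}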

Related

SQLite Affinity Bug? [duplicate]

This question already has answers here:
Why does floating-point arithmetic not give exact results when adding decimal fractions?
(31 answers)
Closed 4 years ago.
EDIT:
The answer here: Is floating point math broken? assists in understanding this question. However, this question is not language agnostic. It is specific to the documented behavior and affinity of floating point numbers as handled by SQLite. Having a very similar answer to a different question != duplicate question.
QUESTION:
I have a rather complex SQLite WHERE clause comparing numerical values. I have read and "think" I understand the datatype documentation here: https://www.sqlite.org/datatype3.html
I'm still confused as to the logic SQLite uses to determine datatypes in comparison clauses such as =, >, <, <>, etc. I can narrow my example down to this bit of test SQL, whose results make little sense to me.
SELECT
CAST(10 AS NUMERIC) + CAST(254.53 AS NUMERIC) = CAST(264.53 AS NUMERIC) AS TestComparison1,
CAST(10 AS NUMERIC) + CAST(254.54 AS NUMERIC) = CAST(264.54 AS NUMERIC) AS TestComparison2
Result: "1" "0"
The second expression in the select statement (TestComparison2) is converting the left-side of the equation to a TEXT value. I can prove this by casting the right-side of the equation to TEXT and the result = 1.
Obviously I'm missing something in the way SQLite computes Affinity. These are values coming from columns in a large/complex query. Should I be casting both sides of the equations in WHERE/Join Clauses to TEXT to avoid these issues?
The reason you are not getting the expected result is that the underlying values are floating point.
Although DataTypes in SQLite3 covers much, you should also consider the following section from Expressions :-
Affinity of type-name: Conversion Processing

NONE
Casting a value to a type-name with no affinity causes the value to be converted into a BLOB. Casting to a BLOB consists of first casting the value to TEXT in the encoding of the database connection, then interpreting the resulting byte sequence as a BLOB instead of as TEXT.

TEXT
To cast a BLOB value to TEXT, the sequence of bytes that make up the BLOB is interpreted as text encoded using the database encoding.
Casting an INTEGER or REAL value into TEXT renders the value as if via sqlite3_snprintf() except that the resulting TEXT uses the encoding of the database connection.

REAL
When casting a BLOB value to a REAL, the value is first converted to TEXT.
When casting a TEXT value to REAL, the longest possible prefix of the value that can be interpreted as a real number is extracted from the TEXT value and the remainder ignored. Any leading spaces in the TEXT value are ignored when converting from TEXT to REAL.
If there is no prefix that can be interpreted as a real number, the result of the conversion is 0.0.

INTEGER
When casting a BLOB value to INTEGER, the value is first converted to TEXT.
When casting a TEXT value to INTEGER, the longest possible prefix of the value that can be interpreted as an integer number is extracted from the TEXT value and the remainder ignored. Any leading spaces in the TEXT value when converting from TEXT to INTEGER are ignored.
If there is no prefix that can be interpreted as an integer number, the result of the conversion is 0.
If the prefix integer is greater than +9223372036854775807 then the result of the cast is exactly +9223372036854775807. Similarly, if the prefix integer is less than -9223372036854775808 then the result of the cast is exactly -9223372036854775808.
When casting to INTEGER, if the text looks like a floating point value with an exponent, the exponent will be ignored because it is no part of the integer prefix. For example, "(CAST '123e+5' AS INTEGER)" results in 123, not in 12300000.
The CAST operator understands decimal integers only; conversion of hexadecimal integers stops at the "x" in the "0x" prefix of the hexadecimal integer string and thus the result of the CAST is always zero.
A cast of a REAL value into an INTEGER results in the integer between the REAL value and zero that is closest to the REAL value. If a REAL is greater than the greatest possible signed integer (+9223372036854775807) then the result is the greatest possible signed integer and if the REAL is less than the least possible signed integer (-9223372036854775808) then the result is the least possible signed integer.
Prior to SQLite version 3.8.2 (2013-12-06), casting a REAL value greater than +9223372036854775807.0 into an integer resulted in the most negative integer, -9223372036854775808. This behavior was meant to emulate the behavior of x86/x64 hardware when doing the equivalent cast.

NUMERIC
Casting a TEXT or BLOB value into NUMERIC first does a forced conversion into REAL but then further converts the result into INTEGER if and only if the conversion from REAL to INTEGER is lossless and reversible. This is the only context in SQLite where the NUMERIC and INTEGER affinities behave differently.
Casting a REAL or INTEGER value to NUMERIC is a no-op, even if a real value could be losslessly converted to an integer.
NOTE
Before this section there is a section on Literal Values (i.e. casting probably only needs to be applied to values extracted from columns).
Try :-
SELECT
round(CAST(10 AS NUMERIC) + CAST(254.53 AS NUMERIC),2) = round(CAST(264.53 AS NUMERIC),2) AS TestComparison1,
round(CAST(10 AS NUMERIC) + CAST(254.54 AS NUMERIC),2) = round(CAST(264.54 AS NUMERIC),2) AS TestComparison2
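To see the same floating-point behaviour outside SQLite, here is a small C sketch (an illustration, not SQLite code) of why the raw comparison fails and why rounding both sides restores the expected result; the output assumes ordinary IEEE-754 doubles:
#include <stdio.h>
#include <math.h>

int main(void)
{
    double sum53 = 10.0 + 254.53;   /* compares equal to 264.53         */
    double sum54 = 10.0 + 254.54;   /* does NOT compare equal to 264.54 */

    printf("%d %d\n", sum53 == 264.53, sum54 == 264.54);   /* typically 1 0 */
    printf("%.17g\n%.17g\n", sum54, 264.54);               /* shows the tiny difference */

    /* Rounding both sides to two decimals, as in the suggested SQL, makes them compare equal. */
    printf("%d\n", round(sum54 * 100.0) / 100.0 == round(264.54 * 100.0) / 100.0);
    return 0;
}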

Why is typeof hex or binary number Uint64 while type of decimal number is Int64?

julia> typeof(-0b111)
Uint64
julia> typeof(-0x7)
Uint64
julia> typeof(-7)
Int64
I find this result a bit surprising. Why does the numeric base of the number determine signedness or unsignedness?
Looks like this is expected behavior:
This behavior is based on the observation that when one uses unsigned
hex literals for integer values, one typically is using them to
represent a fixed numeric byte sequence, rather than just an integer
value.
http://docs.julialang.org/en/latest/manual/integers-and-floating-point-numbers/#integers
...seems like a bit of an odd choice.
This is a subjective call, but I think it's worked out pretty well. In my experience, when you use hex or binary, you're interested in a specific pattern of bits, and you generally want it to be unsigned. When you're just interested in a numeric value, you use decimal, because that's what we're most familiar with. In addition, when you're using hex or binary, the number of digits you use for input is typically significant, whereas in decimal it isn't. So that's how literals work in Julia: decimal gives you a signed integer of a type that the value fits in, while hex and binary give you an unsigned value whose storage size is determined by the number of digits.

Nested as_type Casting

When I nest OpenCL's as_type operators, I get some strange errors. For example, this line works:
a = as_uint(NAN)&4290772991;
But these lines do not work:
a = as_float(as_uint(NAN)&4290772991);
a = as_uint(as_float(as_uint(NAN)&4290772991));
The error reads:
invalid reinterpretation: sizes of 'float' and 'long' must match
This error message is confusing, because it seems like no long is created by this code. All values here appear to be 32 bits wide, so it should be possible to reinterpret-cast between any of them.
So why is this error happening?
In C99 (and OpenCL C, which is derived from it), an undecorated decimal constant is treated as a signed integer, and the compiler automagically gives it the smallest signed integer type that can hold the value, using the progression int, then long int, then long long int; an unsuffixed decimal constant never becomes unsigned in C99.
The smallest signed integer type which can hold 4290772991 is a 64-bit signed type (because of the sign-bit requirement). Thus, the as_type calls where the reinterpret type is a 32-bit type fail because of the size mismatch between the 64-bit long the compiler selects for your constant and the target float type.
You should be able to get around the problem by changing 4290772991 to 4290772991u. The suffix explicitly marks the value as unsigned, and the compiler will select a 32-bit unsigned integer. Alternatively, you could use 0xFFBFFFFF: there are different rules for hexadecimal constants, and it will be assigned a type from the progression int, then unsigned int, then long int, then finally unsigned long int.
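As a quick host-side check (plain C rather than OpenCL kernel code; the exact sizes assume a typical platform with 32-bit int and 64-bit long), the constant-type rules can be seen with sizeof:
#include <stdio.h>

int main(void)
{
    /* Unsuffixed decimal constant: first signed type that fits -> a 64-bit type here */
    printf("%zu\n", sizeof(4290772991));    /* typically 8 */
    /* 'u' suffix: first unsigned type that fits -> 32-bit unsigned int */
    printf("%zu\n", sizeof(4290772991u));   /* typically 4 */
    /* Hexadecimal constant: may become unsigned int, so it stays 32 bits */
    printf("%zu\n", sizeof(0xFFBFFFFF));    /* typically 4 */
    return 0;
}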

integer stored as float

I have some questions regarding integers and floats:
Can I store every 32-bit unsigned integer value into a 64-bit IEEE floating point value (such that when I assign the double value back to an int the int will contain the original value)?
What are the smallest (magnitude wise) positive and negative integer values that cannot be stored in a 32-bit IEEE floating point value (by the same definition as in 1)?
Do the answers to these questions depend on language used?
//edit: I know these questions sound a bit like they're from some test, but I'm asking because I need to make some decisions on a data format definition.
Yes, you can store a 32-bit integer in a 64-bit double without information loss. The mantissa has 53 bits of precision, which is enough.
A 32-bit float has a 24-bit mantissa, so every integer with magnitude up to 2^24 (16777216) survives the round trip. The smallest-magnitude positive and negative integers that cannot be stored exactly are 2^24+1 and -(2^24+1), i.e. 16777217 and -16777217; for example, 16777216 == (float)16777217.
If you assume the language follows IEEE-754, it doesn't depend on the language.
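A minimal C sketch of both points, assuming IEEE-754 float and double (which is what essentially all current platforms provide):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Every 32-bit unsigned integer survives a round trip through a double:
       a double's 53-bit significand holds any 32-bit value exactly. */
    uint32_t u = 4294967295u;                 /* UINT32_MAX */
    double d = (double)u;
    printf("%d\n", (uint32_t)d == u);         /* 1 */

    /* A 32-bit float has only a 24-bit significand, so 2^24 + 1 is the
       smallest positive integer that does not survive the round trip. */
    uint32_t ok  = 16777216u;                 /* 2^24     */
    uint32_t bad = 16777217u;                 /* 2^24 + 1 */
    printf("%u %u\n", (uint32_t)(float)ok, (uint32_t)(float)bad);   /* 16777216 16777216 */
    return 0;
}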

How to get around some rounding errors?

I have a method that deals with some geographic coordinates in .NET, and I have a struct that stores a coordinate pair such that if 256 is passed in for one of the coordinates, it becomes 0. However, in one particular instance a value of approximately 255.99999998 is calculated, and thus stored in the struct. When it's printed in ToString(), it becomes 256, which should not happen - 256 should be 0. I wouldn't mind if it printed 255.9999998 but the fact that it prints 256 when the debugger shows 255.99999998 is a problem. Having it both store and display 0 would be even better.
Specifically, there's an issue with comparison: 255.99999998 is sufficiently close to 256 that it should be treated as equal to it. What should I do when comparing doubles? Should I use some sort of epsilon value?
EDIT: Specifically, my problem is that I take a value, perform some calculations, then perform the opposite calculations on that number, and I need to get back the original value exactly.
This sounds like a problem with how the number is printed, not how it is stored. A double has about 15 significant figures, so it can tell 255.99999998 from 256 with precision to spare.
You could use the epsilon approach, but the epsilon is typically a fudge to get around the fact that floating-point arithmetic is lossy.
You might consider avoiding binary floating-points altogether and use a nice Rational class.
The calculation above was probably destined to be 256 if you were doing lossless arithmetic as you would get with a Rational type.
Rational types can go by the name of Ratio or Fraction class, and are fairly simple to write.
Here's one example.
Here's another
Edit....
To understand your problem, consider that when the decimal value 0.01 is converted to a binary representation it cannot be stored exactly in finite memory. The hexadecimal representation of this value is 0.028F5C28F5C..., where the "28F5C" repeats infinitely. So even before doing any calculations, you lose exactness just by storing 0.01 in binary format.
Rational and Decimal classes are used to overcome this problem, albeit with a performance cost. Rational types avoid it by storing a numerator and a denominator to represent your value. Decimal types use a decimal (base-10) encoding, which can be lossy in division but can store common decimal values exactly.
For your purpose I still suggest a Rational type.
You can choose format strings which should let you display as much of the number as you like.
The usual way to compare doubles for equality is to subtract them and see if the absolute value is less than some predefined epsilon, maybe 0.000001.
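A minimal sketch of that approach in C (nearly_equal is just an illustrative helper name; pick a tolerance that matches the precision your application actually needs):
#include <stdio.h>
#include <math.h>
#include <stdbool.h>

/* Treat two doubles as equal when they differ by less than a caller-chosen tolerance. */
static bool nearly_equal(double a, double b, double epsilon)
{
    return fabs(a - b) < epsilon;
}

int main(void)
{
    double coord = 255.99999998;
    printf("%d\n", coord == 256.0);                    /* 0: not exactly equal      */
    printf("%d\n", nearly_equal(coord, 256.0, 1e-6));  /* 1: equal within tolerance */
    return 0;
}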
You have to decide for yourself on a threshold under which two values are equal. This amounts to using so-called fixed-point numbers (as opposed to floating point). Then you have to perform the rounding manually.
I would go with some unsigned type with a known size (e.g. uint32 or uint64 if they're available; I don't know .NET) and treat it as a fixed-point number type mod 256.
E.g.
#include <stdint.h>
#include <math.h>

/* 8.24 fixed point: high 8 bits = integer part (mod 256), low 24 bits = fraction */
typedef uint32_t fixed;

static inline fixed to_fixed(double d)
{
    return (fixed)(fmod(d, 256.0) * (double)(1 << 24));
}

static inline double to_double(fixed f)
{
    return (double)f / (double)(1 << 24);
}
or something more elaborate to suit a rounding convention (to nearest, to lower, to higher, to odd, to even). The highest 8 bits of fixed hold the integer part, the 24 lower bits hold the fractional part. Absolute precision is 2^-24.
Note that adding and subtracting such numbers naturally wraps around at 256. For multiplication, you should beware.
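For illustration, a short usage sketch reusing the to_fixed/to_double helpers above; the wrap-around at 256 falls out of ordinary unsigned overflow of the fixed type:
#include <stdio.h>

int main(void)
{
    fixed a = to_fixed(255.99999998);                 /* just below the wrap point */
    fixed b = to_fixed(256.0);                        /* wraps to 0                */
    printf("%.8f %.8f\n", to_double(a), to_double(b));

    fixed sum = to_fixed(200.0) + to_fixed(100.0);    /* 300 wraps to 44 (mod 256) */
    printf("%.8f\n", to_double(sum));
    return 0;
}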
