Hexadecimal to decimal voltage conversion

I want to convert hexadecimal values to voltages as described below:
2-byte signed two's-complement binary fraction with the binary point to the right of the most significant bit. 1:512 V scaling.
Example :
0x2A80 → 170.00 V
0xD580 → -170.00 V
But converting 0x2A80 gives me the decimal value 10880. How can I get 170.00 V from 0x2A80?

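If 0x2A80 is 170.00, then you have 10 bits before the binary point and 6 bits after it. In other words, 10880/64 == 170. This matches the stated format: with the binary point right after the sign bit, the raw value represents raw/2^15, and the 1:512 V scaling multiplies that by 512, giving raw × 512/32768 = raw/64.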
Your question seems to contain a few misconceptions:
The fact that 170.0 is a voltage is irrelevant. Numbers work the same no matter whether they are voltages, distances, or just numbers without a unit.
In most programming languages, you don't have "decimal" or "hexadecimal" values, you just have values. Decimal and hexadecimal only come in when you're dealing with text input/output and strings. 0x2A80 is 10880, and 0xD580, read as a 16-bit two's-complement number, is -10880.
If you happen to be programming in C:
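#include <stdio.h>

int main(void) {
    unsigned short raw;                  /* 16-bit pattern typed in hex, e.g. 2A80 or D580 */
    if (scanf("%hx", &raw) != 1) return 1;
    short fixedPointNumber = (short)raw; /* reinterpret as two's complement (wraps on mainstream compilers) */
    float floatingPointNumber = fixedPointNumber / 64.0f; /* 6 fractional bits */
    printf("Converted number: %f\n", floatingPointNumber);
    return 0;
}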
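The same arithmetic as a small Python sketch, useful for checking values by hand. The helper name hex_to_volts is made up for illustration, and it assumes the 1:512 V scaling from the question: reinterpret the 16-bit pattern as two's complement, then divide by 64 (equivalently, take the fraction raw/32768 and multiply by 512 V).

```python
def hex_to_volts(raw: int) -> float:
    """Convert a 16-bit two's-complement reading to volts (1 LSB = 1/64 V).

    Hypothetical helper: assumes volts = (raw / 2**15) * 512 == raw / 64.
    """
    raw &= 0xFFFF              # keep only the 16-bit pattern
    if raw & 0x8000:           # bit 15 set -> negative in two's complement
        raw -= 0x10000
    return raw / 64.0

for word in (0x2A80, 0xD580):
    print(f"0x{word:04X} -> {hex_to_volts(word):.2f} V")
```

With this scaling the representable range runs from -512 V (0x8000) up to just under +512 V (0x7FFF).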

Related

Mathematical precision at 19th decimal place and beyond

I have the same set of data and am running the same code, but sometimes I get different results at the 19th decimal place and beyond. Although this is not a great concern to me for numbers less than 0.0001, it makes me wonder whether the 19th decimal place is the limit of Raku's precision.
Word 104 differs:
0.04948872986571077 19 chars
0.04948872986571079 19 chars
Word 105 differs:
0.004052062278212545 20 chars
0.0040520622782125445 21 chars
TL;DR See the doc's outstanding Numerics page.
(I had forgotten about that page before I wrote the following answer. Consider this answer at best a brief summary of a few aspects of that page.)
There are two aspects to this: internal precision and printing precision.
100% internal precision until RAM is exhausted
Raku supports arbitrary precision number types. Quoting Wikipedia's relevant page:
digits of precision are limited only by the available memory of the host system
You can direct Raku to use one of its arbitrary precision types.[1] If you do so it will retain 100% precision until it runs out of RAM.
Arbitrary precision type | Corresponding type checking[2] | Example of value of that type
Int | my Int $foo ... | 66174449004242214902112876935633591964790957800362273
FatRat | my FatRat $foo ... | 66174449004242214902112876935633591964790957800362273 / 13234889800848443102075932929798260216894990083844716
Thus you can get arbitrary internal precision for integers and fractions (including arbitrary precision decimals).
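To make the difference between exact and limited precision concrete, here is a short Python sketch. This is an analogy, not Raku: Python's int and fractions.Fraction are arbitrary-precision types loosely comparable to Raku's Int and FatRat, while Python's float is a 64-bit IEEE 754 double like Raku's Num on most platforms.

```python
from fractions import Fraction

exact = sum([Fraction(1, 10)] * 10)   # ten exact tenths
approx = sum([0.1] * 10)              # ten binary approximations of 0.1

print(exact)    # 1
print(approx)   # 0.9999999999999999

# Numerator and denominator far beyond 64 bits, and still exact:
big = Fraction(2**200 + 1, 3**100)
print(big * 3**100 - 2**200)          # 1
```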
Limited internal precision
If you do not direct Raku to use an arbitrary precision number type it will do its best but may ultimately switch to limited precision. For example, Raku will give up on 100% precision if a formula you use calculates a Rat and the number's denominator exceeds 64 bits.[1]
Raku's fall back limited precision number type is Num:
On most platforms, [a Num is] an IEEE 754 64-bit floating point numbers, aka "double precision".
Quoting the Wikipedia page for that standard:
Floating point is used ... when a wider range is needed ... even if at the cost of precision.
The 53-bit significand precision gives from 15 to 17 significant decimal digits precision (2^-53 ≈ 1.11 × 10^-16).
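Python's float uses the same binary64 format, so it can illustrate the 15-to-17-digit figure. The sketch below also shows one plausible source of run-to-run differences in the last digits: a change in the order of floating-point operations. That the question's code reorders operations is an assumption, not something the question states.

```python
import sys

x = 0.1 + 0.2
# 53 significand bits: 15 decimal digits always survive decimal -> double -> decimal,
# and 17 significant digits are always enough to identify a double uniquely.
print(sys.float_info.mant_dig, sys.float_info.dig)  # 53 15
print(format(x, ".17g"))                            # 0.30000000000000004

# The order of operations can change the last bits of a result:
print((0.1 + 0.2) + 0.3)   # 0.6000000000000001
print(0.1 + (0.2 + 0.3))   # 0.6

# The two 17-digit values from "Word 104" are genuinely different doubles:
print(float("0.04948872986571077") == float("0.04948872986571079"))  # False
```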
Printing precision
Separate from internal precision is stringification of numbers.
(It was at this stage that I remembered the doc page on Numerics linked at the start of this answer.)
Quoting Printing rationals:
Keep in mind that output routines like say or put ... may choose to display a Num as an Int or a Rat number. For a more definitive string to output, use the raku method or [for a rational number] .nude
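The gap between what gets printed and what is stored can be seen in Python too (again an analogy, not Raku). float.as_integer_ratio() plays roughly the role of Raku's .nude here, returning the exact numerator and denominator of the stored binary value.

```python
from decimal import Decimal

x = 0.1
print(x)                      # 0.1 (shortest string that round-trips)
print(x.as_integer_ratio())   # (3602879701896397, 36028797018963968): exact value, denominator 2**55
print(Decimal(x))             # 0.1000000000000000055511151231257827021181583404541015625
```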
Footnotes
[1] You control the type of a numeric expression via the types of individual numbers in the expression, and the types of the results of numeric operations, which in turn depend on the types of the numbers. Examples:
1 + 2 is 3, an Int, because both 1 and 2 are Ints, and a + b is an Int if both a and b are Ints;
1 / 2 is not an Int even though both 1 and 2 are individually Ints, but is instead 1/2 aka 0.5, a Rat.
1 + 4 / 2 will print out as 3, but the 3 is internally a Rat, not an Int, due to Numeric infectiousness.
[2] All that enforcement does is generate a run-time error if you try to assign or bind a value that is not of the numeric type you've specified as the variable's type constraint. Enforcement doesn't mean that Raku will convert numbers for you. You have to write your formulae to ensure the result you get is what you want.[1] You can use coercion -- but coercion cannot regain precision that's already been lost.

How to do Division of two fixed point 64 bits variables in Synthesizable Verilog?

I'm implementing a math equation in Verilog in a combinational style (assign ... = ...). So far the synthesis tool (Quartus II) has easily handled addition, subtraction and multiplication of 32-bit unsigned numbers with the operators +, - and *.
However, one of the final steps of the equation is to divide two 64-bit unsigned fixed-point variables. They are this wide because I'm allocating 16 bits to the integer part and 48 bits to the fraction (the hardware does everything in binary and doesn't care about fractions, but I can split the integer part from the fraction at the end).
The problem is that the / operator is useless here: it automatically invokes the so-called "LPM_divide" library, whose output gives me only the integer part, discarding the fraction, and in the wrong position (the least significant bits).
For example:
b1000111010000001_000000000000000000000000000000000000000000000000 / b1000111010000001_000000000000000000000000000000000000000000000000
should be 1, it gives me
b0000000000000000_000000000000000000000000000000000000000000000001
So how can I implement this division in synthesizable Verilog? What methods or algorithms should I follow? I'd like it to be fast, maybe fully combinational.
From the user's point of view, I'd like to keep the 16 integer bits / 48 fractional bits format. Thanks in advance.
First assume you multiply two fixed-point numbers.
Let's call them X and Y, with Xf and Yf fractional bits respectively.
If you multiply those numbers as integers, the Xf+Yf least significant bits of the integer product can be treated as the fractional bits of the resulting fixed-point number (you still multiply them as plain integers).
Similarly, if you divide a number with Sf fractional bits by a number with Df fractional bits, the integer quotient can be treated as a fixed-point number with Sf-Df fractional bits. That is why your example gives the integer 1: both operands have 48 fractional bits, so the quotient has 48-48 = 0.
Thus, if you need 48 fractional bits from dividing a 16.48 number by another 16.48 number, extend the dividend with another 48 zero fractional bits (shift it left by 48), then divide the resulting 64+48=112-bit number by the other 64-bit number, treating both as integers (and using LPM_divide). The 48 least significant bits of the quotient are then what you need: the 48 fractional bits of the resulting fixed-point number.
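Before writing the Verilog, it can help to have a bit-accurate software model to generate expected results for a testbench. The following is a minimal Python sketch of the rules above for unsigned Q16.48 values; it is not synthesizable, the helper names are hypothetical, and masking to 64 bits silently drops overflow (a real design would probably want to flag it).

```python
FRAC_BITS = 48                 # Q16.48: 16 integer bits, 48 fractional bits
WIDTH = 64
MASK = (1 << WIDTH) - 1        # keep only the 64 output bits (overflow is dropped)

def to_fixed(x: float) -> int:
    """Encode a non-negative real as an unsigned Q16.48 integer (for testing only)."""
    return round(x * (1 << FRAC_BITS)) & MASK

def from_fixed(q: int) -> float:
    return q / (1 << FRAC_BITS)

def fixed_mul(x: int, y: int) -> int:
    # The 128-bit integer product has 48 + 48 = 96 fractional bits;
    # dropping the low 48 (truncating) brings it back to Q16.48.
    return ((x * y) >> FRAC_BITS) & MASK

def fixed_div(s: int, d: int) -> int:
    # Append 48 zero fractional bits to the dividend (a 112-bit value), so the
    # integer quotient keeps (48 + 48) - 48 = 48 fractional bits.
    if d == 0:
        raise ZeroDivisionError("fixed-point division by zero")
    return ((s << FRAC_BITS) // d) & MASK   # truncating unsigned division

a = 0b1000111010000001 << FRAC_BITS        # the operand from the question
print(hex(fixed_div(a, a)))                # 0x1000000000000, i.e. 1.0 in Q16.48
print(from_fixed(fixed_div(to_fixed(3.0), to_fixed(2.0))))   # 1.5
```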

Decimal to floating point system

I've been asked to work on the following question with the following specification/rules:
Numbers are held in 16 bits split from left to right as follows:
1 bit sign flag that should be set for negative numbers and otherwise clear.
7 bit exponent held in Excess 63
8 bit significand, normalised to 1.x with only the fractional part stored – as in IEEE 754
Giving your answers in hexadecimal, how would the number -18 be represented in this system?
The answer I got is: 11000011 00100000 (or C320 in hexadecimal)
using the following method:
-18 decimal is a negative number so we have the sign bit set to 1.
18 in binary is 0010010, which we can write as 10010. Next we work on what's to the right of the binary point, but in this case there is no fractional part, so we write 0000 0000. We now write the binary of 18 followed by the zeroes (which are not strictly required), separated by a binary point:
10010.00000000
We now normalise this into the form 1.x by moving the binary point so that it sits between the first and second digits, counting how many places we move it. The result is 1.001000000000 x 2^4: the point moved 4 places, so for now 4 is our exponent. The floating point system we are using has a 7-bit exponent in excess 63, so the stored exponent is 63 + 4 = 67, which in 7-bit binary is 1000011.
The sign bit is: 1 (-ve)
Exponent is: 1000011
Significand is 00100…
The binary representation is: 11000011 00100000 (or C320 in hexadecimal)
Please let me know whether it's correct, or, if I've done it wrong, what changes I should make. Thank you, guys :)
Since you seem to have been assigned a lot of questions of this type, it may be useful to write an automated answer checker to validate your work. I've put together a quick converter in Python:
def convert_from_system(x):
    # The low eight bits are the stored fraction; OR in a ninth bit for the implicit 1 in "1.x".
    significand = (x & 0b11111111) | 0b100000000
    # The next seven bits are the exponent.
    exponent = (x >> 8) & 0b1111111
    # The top bit (bit 15) is the sign.
    sign = -1 if x >> 15 else 1
    # Remove the excess-63 bias.
    exponent = exponent - 63
    # The 9-bit integer significand equals 1.xxxxxxxx * 2^8, so the value is
    # significand * 2^(exponent - 8), i.e. significand / 2^(8 - exponent).
    # (Zero and special values are not handled.)
    result = sign * (significand / float(2**(8 - exponent)))
    return result

for value in [0x4268, 0xC320]:
    print("The decimal value of {} is {}".format(hex(value), convert_from_system(value)))
Result:
The decimal value of 0x4268 is 11.25
The decimal value of 0xc320 is -18.0
This confirms that -18 does convert into 0xC320.
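The checker above only decodes. For the forward direction, here is a companion encoder sketch. The name convert_to_system is hypothetical, and the sketch assumes round-to-nearest (Python's round(), ties to even) and that every 7-bit exponent value is usable, since the exercise doesn't say which values, if any, are reserved.

```python
import math

def convert_to_system(value: float) -> int:
    """Encode a nonzero value as 1 sign bit, 7-bit excess-63 exponent, 8-bit fraction."""
    if value == 0 or math.isinf(value) or math.isnan(value):
        raise ValueError("zero and special values are not handled in this sketch")
    sign = 1 if value < 0 else 0
    mant, exp = math.frexp(abs(value))   # abs(value) == mant * 2**exp, 0.5 <= mant < 1
    exponent = exp - 1                   # rewrite as 1.x * 2**exponent
    fraction = round((mant * 2 - 1) * 256)   # keep 8 fraction bits, rounded
    if fraction == 256:                  # rounding carried into the next power of two
        fraction = 0
        exponent += 1
    biased = exponent + 63
    if not 0 <= biased <= 127:
        raise OverflowError("exponent does not fit in 7 bits")
    return (sign << 15) | (biased << 8) | fraction

print(hex(convert_to_system(-18)))     # 0xc320
print(hex(convert_to_system(11.25)))   # 0x4268
```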

Can an IEEE 754 real number "cover" all integers within its range?

The original question was edited (shortened) to focus on a problem of precision, not range.
Whether single or double precision, every real-number representation is limited to (-range, +range). Within this range lie integers (1, 2, 3, 4, ..., and likewise the negative ones).
Is there a guarantee that an IEEE 754 real number (float, double, etc.) can "cover" all integers within its range? By "cover" I mean the real number represents the integer exactly, not as (for example) "5.000001".
Just as a reminder: http://www3.ntu.edu.sg/home/ehchua/programming/java/DataRepresentation.html is a nice explanation of various number representation formats.
Update:
Because the question asks "can", I am also interested in showing that it cannot be done; for that, quoting a single number is enough. For example: "no, it cannot be done; for example, the number 1748574 is not represented exactly by a float" (this number is taken out of thin air, of course).
For the curious reader
If you would like to play with IEEE 754 representation, there is an online calculator: http://www.ajdesigner.com/fl_ieee_754_word/ieee_32_bit_word.php
No, not all, but there exists a range within which you can represent all integers accurately.
Structure of 32bit floating point numbers
The 32bit floating point type uses
1 bit for the sign
8 bits for the exponent
23 bits for the fraction (leading 1 implied)
Representing numbers
Basically, you have a number in the form
(-)1.xxxx_xxxx_xxxx_xxxx_xxxx_xxx (binary)
which you then shift left/right with the (unbiased) exponent.
To have it represent an integer requiring n bits, you need to shift it by n-1 bits to the left. (All xs beyond the binary point are simply zero.)
Representing integers with 24 bits
It is easy to see that we can represent all integers requiring 24 bits (and less)
1xxx_xxxx_xxxx_xxxx_xxxx_xxxx.0 (unbiased exponent = 23)
since we can set the xes at will to either 1 or 0.
The highest number we can represent in this fashion is:
1111_1111_1111_1111_1111_1111.0
or 2^24 - 1 = 16777215
The next higher integer is 1_0000_0000_0000_0000_0000_0000. Thus, we need 25 bits.
Representing integers with 25 bits
If you try to represent a 25 bit integer (unbiased exponent = 24), the numbers have the following form:
1_xxxx_xxxx_xxxx_xxxx_xxxx_xxx0.0
The twenty-three digits that are available to you have all been shifted to the left of the binary point. The leading digit is always a 1. In total, we have 24 digits. But since we need 25, a zero is appended.
A maximum is found
We can represent 1_0000_0000_0000_0000_0000_0000 with the form 1_xxxx_xxxx_xxxx_xxxx_xxxx_xxx0.0 by simply assigning 0 to all xs. The next higher integer is 1_0000_0000_0000_0000_0000_0001. This number cannot be represented exactly, because the form does not allow us to set the last digit to 1: it is always 0.
It follows that 1 followed by 24 zeroes (2^24) is the upper bound of the range in which every integer can be represented exactly.
The lower bound simply has its sign bit flipped.
Range within which all integers can be represented (including boundaries)
2^24 as an upper bound
-2^24 as a lower bound
Structure of 64bit floating point numbers
1 bit for the sign
11 exponent bits
52 fraction bits
Range within which all integers can be represented (including boundaries)
2^53 as an upper bound
-2^53 as a lower bound
This easily follows by applying the same argumentation to the structure of 64bit floating point numbers.
Note: That is not to say these are all the integers we can represent, but it gives you a range within which you can represent all integers. Beyond that range, we can only represent a power of two multiplied by an integer from said range.
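These bounds are easy to spot-check. The following Python sketch uses the standard struct module to round a double to the nearest 32-bit float (the "f" format) and back; the helper names are made up for illustration, and only a sample near the boundary is checked rather than every integer.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def exact_in_float32(n: int) -> bool:
    # float(n) is exact for the magnitudes used here; the only rounding is to float32
    return to_float32(float(n)) == n

def exact_in_float64(n: int) -> bool:
    # Python rounds int -> float correctly (to nearest, ties to even)
    return float(n) == n

# Spot-check the top of the range: integers up to 2**24 survive, 2**24 + 1 does not.
print(all(exact_in_float32(n) for n in range(2**24 - 1000, 2**24 + 1)))  # True
print(exact_in_float32(2**24 + 1), exact_in_float32(2**24 + 2))          # False True
print(exact_in_float64(2**53), exact_in_float64(2**53 + 1))              # True False
```

Note that 2**24 + 2 is exact again: beyond 2**24 the representable integers are the even ones, as the note says.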
Combinatorial argument
To convince ourselves that it is impossible for 32bit floating point numbers to represent all integers a 32bit integer can represent, we need not even look at the structure of floating point numbers.
With 32 bits, there are 2^32 different things we can represent. No more, no less.
A 32bit integer uses all of these "things" to represent numbers (pairwise different).
A 32bit floating point number can represent at least one number with a fractional part.
Thus, it is impossible for the 32bit floating point number to be able to represent this fractional number in addition to all 2^32 integers.
macias, to add to the already excellent answer by phant0m (upvoted; I suggest you accept it), I'll use your own words.
"No it cannot be done, for example number 16777217 is not represented exactly by float number."
Also, "for example number 9223372036854775809 is not represented exactly by double number".
This is assuming your computer is using the IEEE floating point format, which is a pretty strong bet.
No.
For example, on my system, the type float can represent values up to approximately 3.40282e+38. As an integer, that would be approximately 340282000000000000000000000000000000000, or about 2^128.
The size of float is 32 bits, so it can exactly represent at most 2^32 distinct numbers.
An integer object generally uses all of its bits to represent values (with 1 bit dedicated as a sign bit for signed types). A floating-point object uses some of its bits to represent an exponent (8 bits for IEEE 32-bit float); this increases its range at the cost of losing precision.
A concrete example (1267650600228229401496703205376.0 is 2^100, and is exactly representable as a float):
#include <stdio.h>
#include <float.h>
#include <math.h>
int main(void) {
    float x = 1267650600228229401496703205376.0;
    float y = nextafterf(x, FLT_MAX);
    printf("x = %.1f\n", x);
    printf("y = %.1f\n", y);
    return 0;
}
The output on my system is:
x = 1267650600228229401496703205376.0
y = 1267650751343956853325350043648.0
Another way to look at it:
A 32-bit object can represent at most 2^32 distinct values.
A 32-bit signed integer can represent all integer values in the range -2147483648 .. 2147483647 (-2^31 .. +2^31-1).
A 32-bit float can represent many values that a 32-bit signed integer can't, either because they're fractional (0.5) or because they're too big (2.0^100). Since there are values that can be represented by a 32-bit float but not by a 32-bit int, there must be other values that can be represented by a 32-bit int but not by a 32-bit float. Those values are integers that have more significant digits than a float can handle, because the int has 31 value bits but the float has only 24 bits of precision.
Apparently you are asking whether a Real data type can represent all of the integer values in its range (absolute values up to FLT_MAX or DBL_MAX, in C, or similar constants in other languages).
The largest numbers representable by floating point numbers stored in K bits typically are much larger than the 2^K number of integers that K bits can represent, so typically the answer is no. 32-bit C floats exceed 10^37, 32-bit C integers are less than 10^10. To find out the next representable number after some number, use nextafter() or nextafterf(). For example, the code
printf ("%20.4f %20.4f\n", nextafterf(1e5,1e9), nextafterf(1e6,1e9));
printf ("%20.4f %20.4f\n", nextafterf(1e7,1e9), nextafterf(1e8,1e9));
prints out
100000.0078 1000000.0625
10000001.0000 100000008.0000
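Python has no nextafterf, but for positive finite values the same result can be sketched by incrementing the 32-bit pattern, since positive IEEE 754 values sort in the same order as their bit patterns. The helper name next_float32_up is hypothetical; for doubles, math.nextafter exists in Python 3.9 and later.

```python
import math
import struct

def next_float32_up(x: float) -> float:
    """Next IEEE 754 single above x, for positive finite x.

    x is first rounded to float32; adding 1 to its bit pattern then steps to
    the next representable single (positive floats sort like their patterns).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

for v in (1e5, 1e6, 1e7, 1e8):
    print(f"{next_float32_up(v):20.4f}")

# For doubles, Python 3.9+ has this built in:
print(math.nextafter(1e16, math.inf))   # 1.0000000000000002e+16
```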
You might be interested in whether an integer J that is between two nearby fractional floating values R and S can be represented exactly, supposing S-R < 1 and R < J < S. Yes, such a J can be represented exactly. Every float value is the ratio of some integer and some power of 2. (Or is the product of some integer and some power of 2.) Let the power of 2 be P, and suppose R = U/P, S = V/P. Now U/P < J < V/P so U < J*P < V. More of J*P's low-order bits are zero than are those of U, V (because V-U < P, due to S-R < 1), so J can be represented exactly.
I haven't filled in all the details to show that J*P-U < P and V-J*P < P, but under the assumption S-R < 1 that's straightforward. Here is an example of R, J, S, P, U, V value computations: let R = 99999.9921875 = 12799999/128 (i.e. P = 128); let S = 100000.0078125 = 12800001/128. Then U = 0xc34fff and V = 0xc35001, and there is a number between them that has more low-order zeroes than either; to wit, J = 0xc35000/128 = 12800000/128 = 100000.0. For the numbers in this example, note that U and V require 24 bits for their exact representations (six 4-bit hex digits each). 24 bits is exactly the precision of IEEE 754 single-precision floating point numbers. (See the table in the Wikipedia article.)
That each floating point number is a product or ratio of some integer and some power of 2 (as mentioned two paragraphs above) also is discussed in that floating point article, in a paragraph that begins:
By their nature, all numbers expressed in floating-point format are rational numbers with a terminating expansion in the relevant base (for example, ... a terminating binary expansion in base-2). Irrational numbers, such as π or √2, or non-terminating rational numbers, must be approximated. The number of digits (or bits) of precision also limits the set of rational numbers that can be represented exactly.
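The R, J, S example in this answer can be checked mechanically. This Python sketch uses fractions.Fraction for exact arithmetic and struct to round through a 32-bit float; to_float32 is a made-up helper name.

```python
import struct
from fractions import Fraction

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

P = 128
R = Fraction(0xC34FFF, P)   # 12799999/128 = 99999.9921875
J = Fraction(0xC35000, P)   # 12800000/128 = 100000
S = Fraction(0xC35001, P)   # 12800001/128 = 100000.0078125

print(R < J < S, S - R < 1)   # True True
for q in (R, J, S):
    # each value survives a trip through float32 unchanged, i.e. is exactly representable
    print(float(q), Fraction(to_float32(float(q))) == q)
```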

What types of numbers are representable in binary floating-point?

I've read a lot about floats, but it's all unnecessarily involved. I think I've got it pretty much understood, but there's just one thing I'd like to know for sure:
I know that fractions of the form 1/pow(2,n), with n an integer, can be represented exactly in floating point numbers. This means that if I add 1/32 to itself 32 million times, I would get exactly 1,000,000.
What about something like 1/(32+16)? It's one over the sum of two powers of two, does this work? Or is it 1/32+1/16 that works? This is where I'm confused, so if anyone could clarify that for me I would appreciate it.
The rule can be summed up as this:
A number can be represented exactly in binary if the prime factorization of its denominator (in lowest terms) contains only 2s (i.e. the denominator is a power of two).
So 1/(32 + 16) is not representable in binary because it has a factor of 3 in the denominator. But 1/32 + 1/16 = 3/32 is.
That said, there are more restrictions to be representable in a floating-point type. For example, you only have 53 bits of mantissa in an IEEE double so 1/2 + 1/2^500 is not representable.
So you can represent a sum of powers of two exactly as long as the exponents don't span more than 53 consecutive powers (and stay within the format's exponent range).
To generalize this to other bases:
A number can be exactly represented in base 10 if the prime factorization of the denominator consists of only 2's and 5's.
A rational number X can be exactly represented in base N if the prime factorization of the denominator of X contains only primes found in the factorization of N.
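The rule for base N can be written as a short Python sketch using exact fractions: strip from the reduced denominator every factor it shares with the base, and check whether anything is left. The function name is hypothetical.

```python
from fractions import Fraction
from math import gcd

def terminates_in_base(x: Fraction, base: int) -> bool:
    """True if x has a finite expansion in `base`, i.e. every prime factor
    of x's reduced denominator also divides `base`."""
    d = x.denominator          # Fraction keeps itself in lowest terms
    g = gcd(d, base)
    while g > 1:               # remove factors the denominator shares with the base
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    return d == 1

print(terminates_in_base(Fraction(1, 48), 2))                    # False: factor 3
print(terminates_in_base(Fraction(1, 32) + Fraction(1, 16), 2))  # True: 3/32
print(terminates_in_base(Fraction(1, 10), 10), terminates_in_base(Fraction(1, 10), 2))  # True False
```

This only answers whether the expansion terminates; fitting a particular floating-point format also depends on the significand width and exponent range.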
A finite number can be represented in the common IEEE 754 double-precision format if and only if it equals M·2^e for some integers M and e such that -2^53 < M < 2^53 and -1074 ≤ e ≤ 971.
For single precision, -2^24 < M < 2^24 and -149 ≤ e ≤ 104.
For double-precision, these are consequences of the facts that the double-precision format uses 52 bits to store a significand (which normally has 53 bits due to an implicit 1) and uses 11 bits to store an exponent. 11 bits encode numbers from 0 to 2047, but 0 and 2047 are excluded for special purposes, and the encoded number is biased by 1023, so it represents unbiased exponents from -1022 to 1023. However, these unbiased exponents are for significands in the interval [1, 2), and those significands have fractions. To express the significand as an integer, I adjusted the exponent range by 52. Single-precision is similar, with 23 bits to store a 24-bit significand, 8 bits for the exponent, and a bias of 127.
Expressing the representable numbers using an integer times a power of two rather than the more common fractional significand simplifies some number theory and other reasoning about floating-point properties. I used it in this answer because it allows the set of representable values to be expressed concisely.
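The M·2^e characterization turns directly into a test on exact rationals. The following Python sketch (hypothetical function name) reduces the value to its smallest odd M and then asks whether some left shift of M fits both the significand limit and the exponent limits stated above.

```python
from fractions import Fraction

def is_representable_double(q: Fraction) -> bool:
    """True if q == M * 2**e with |M| < 2**53 and -1074 <= e <= 971."""
    if q == 0:
        return True
    num, den = abs(q.numerator), q.denominator   # Fraction is in lowest terms
    if den & (den - 1):                           # denominator is not a power of two
        return False
    e = -(den.bit_length() - 1)                   # now |q| == num * 2**e
    while num % 2 == 0:                           # make num odd: the smallest possible |M|
        num //= 2
        e += 1
    # Every valid M is num * 2**k (k >= 0) paired with exponent e - k. We need
    # num * 2**k < 2**53, e - k >= -1074 and e - k <= 971 for some such k.
    k_min = max(0, e - 971)
    k_max = min(53 - num.bit_length(), e + 1074)
    return k_min <= k_max

print(is_representable_double(Fraction(3, 32)))                          # True
print(is_representable_double(Fraction(1, 2) + Fraction(1, 2**500)))     # False
print(is_representable_double(Fraction(1, 2**1074)), is_representable_double(Fraction(1, 2**1075)))  # True False
```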
Floating-point numbers are literally represented using the form:
1.m * 2^e
Where 1.m is a binary fraction and e is a positive or negative integer.
As such, you can represent 1/32 + 1/16 exactly, as:
1.1000000 * 2^-4
(1.10 being the binary fraction equivalent to 1.5.) 1/48, however, is not representable in this format.
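Python's float.hex() prints a double directly in this 1.m * 2^e form (with m in hexadecimal), which makes the example easy to check. A short sketch:

```python
from fractions import Fraction

x = 1/32 + 1/16
print(x.hex())                          # 0x1.8000000000000p-4, i.e. 1.1 (binary) * 2^-4
print(Fraction(x) == Fraction(3, 32))   # True: stored exactly
print(Fraction(1/48) == Fraction(1, 48))  # False: 1/48 had to be rounded
```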
One point not yet mentioned is that semantically, a floating-point number may be best regarded as representing a range of values. The range of values has a very precisely-defined center point, and the IEEE spec generally requires that the result of a floating-point computation be the number whose range contains the point one would get operating upon the center-points of the original numbers, but in the sequence:
double N1 = 0.1;
float N2 = (float)N1;
double N3 = N2;
N2 is the unambiguously correct single-precision representation of the value that had been represented in N1, despite the language's silly requirement to use an explicit cast. N3 will represent one of the values that N2 could represent (the language spec happens to choose the double value whose range is centered upon the middle of the range of the float). Note that while N2 represents the value of its type whose range contains the correct value, N3 does not.
Incidentally, converting a string to a float in .NET languages seems to go through an intermediate conversion to double, which may sometimes alter the value. For example, even though the value 13571357 is representable as a single-precision float, the value 13571357.499999999069f gets rounded to 13571358 (even though it's obviously closer to 13571357).
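Both effects can be reproduced in Python, which is an analogy here and not .NET itself. In this sketch, struct rounds a double to the nearest single (ties to even), and Fraction gives the exact decimal value for comparison; to_float32 is a made-up helper name.

```python
import struct
from fractions import Fraction

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single (ties to even) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# double -> float -> double does not give back the original double (N1 vs N3):
n1 = 0.1
n3 = to_float32(n1)
print(n1 == n3, n3)   # False 0.10000000149011612

# Double rounding: the literal first rounds to the double 13571357.5, which then
# lies exactly halfway between two singles and rounds to the even one.
text = "13571357.499999999069"
via_double = to_float32(float(text))
direct = round(Fraction(text))   # singles are spaced 1 apart here, so nearest integer == nearest single
print(float(text), via_double, direct)   # 13571357.5 13571358.0 13571357
```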
