Struggle with binary substraction

Struggle with binary substraction - math

Last week I learned about arithmetic with binary numbers especially substraction with two's complement, it was pretty easy so far but something bothers me a little bit. Why is 0 - 1 = 1 with a borrow of 1?
Sure its -1 but should we get some result like 1001 (4 bits for size)? Can someone please explain?

0-1 is not 1 in binary
0-1 does not equal 1 in two's complement with signed numbers. But its representation does have alot of ones in it: you should get 1111.
Negative numbers in two's complement:
In two complement's format (which computers use) with signed magnitude, one bit is reserved to represent the sign of the number. By convention this is the leftmost bit, where 0 indicates a positive value and 1 indicates a negative value.
To get the representation of a negative number we follow two steps:
Flip all the bits
Then increment by one
Additive Inverse of 7:
7=0111
1000 // Flip the bits
-7=1001 // Add +1
So why do we get 1111 when we subtract 0-1?
How do we substract in two's complement? We subtract A-B by taking the additive inverse (the opposite) of B and adding the numbers.
So with A=0 and B=1:
Take the additive inverse by inverting and then increment by 1.
Additive Inverse of B:
B= 0001
-B= 1110 // Invert
-B= 1111 // Increment +1
Now sum A + (-B) :
A=0000
(+) -B=1111
-----------
A+(-B)=1111
Note that if we are only allowed one bit. We can't represent negative numbers since there's no room for the value.

Related

Binary 2's Complement

I'm facing a problem. when we want to subtract a number from another using 2's complement we can do that. I don't know how to subtract fractional number using 2's complement.
5 is in binary form 101 and 2 is 10. if we want to subtract 2 from 5 we need to find out 2's complement of 2
2's complement of 2-> 11111110
so if we now add with binary of 5 we can get the subtraction result. If I want to get the result of 5.5-2.125. what would be the procedure.

Fixed point numbers can be used and it is still common to find them in embedded code or hardware.
Their use is identical to integers, but you need to specify where your "point" is. For instance, assume that you want 3 bits after after the point and that your data is 8 bits, bits 7..3 are the integer part (left of "point") and bits 2..0 the fractional part. The interpretation of integer part is as usual the binary decomposition of this integer: bits 3 correspond to 20, bits 4 to 21, etc.
For the fractional part, the decomposition is in negative powers or two. bits 2 correspond to 2-1, bits 1 to 2-2 and bit 0 to 2-3.
So for you problem, 5.5=4+1+1/2=22+20+2-1 and its code is 00101(.)100. Similarly 2.125=2+1/8 and its code is 00010(.)001 (note (.) is just an help to understand the coding).
Indeed they are just integers, but you must take into account that all your numbers are multiplied by 2-3. This will have no impact for addition, but results of multiplication and division must be adjusted. Taking into account the position of point and managing over and underflows is the difficulty of arithmetic with fixed point, but it allows to do fractional computations even if your hardware does not provide floating point support (for instance with low end microcontrollers or FPGA systems).
Two complement is similar to integers and its computation is identical. If code of 2.125 is 00010(.)001, than -2.125==11101(.)111. Operations are as usual.
+5 00101(.)100
-2.125 11101(.)111
00011(.)011
and 00011(.)011=2+1+1/4+1/8=3,375
For the record, two complement first use was for fixed point fractional numbers and two complement name comes from that. If a fractional number if represented by, say 0(.)1100000 (0.75), its negative counter part will be 1(.)0100000 (-0.75 or 1.25 if interpreted as unsigned) and we always have x+(unsigned)-x=2. For this coding, the negative value of a fractional number x is the number y that must be added to x to get a 2, hence the name that y is 2's complement of x.

Whats the highest and the lowest integer for representing signed numbers in two's complement in 5 bits?

I understand how binary works and I can calculate binary to decimal, but I'm lost around signed numbers.
I have found a calculator that does the conversion. But I'm not sure how to find the maximum and the minumum number or convert if a binary number is not given, and question in StackO seems to be about converting specific numbers or doesn't include signed numbers to a specific bit.
The specific question is:
We have only 5 bits for representing signed numbers in two's complement:
What is the highest signed integer?
Write its decimal value (including the sign only if negative).
What is the lowest signed integer?
Write its decimal value (including the sign only if negative).
Seems like I'll have to go heavier on binary concepts, I just have 2 months in programming and I thought i knew about binary conversion.

From a logical point of view:
Bounds in signed representation
You have 5 bits, so there are 32 different combinations. It means that you can make 32 different numbers with 5 bits. On unsigned integers, it makes sense to store integers from 0 to 31 (inclusive) on 5 bits.
However, this is about unsigned integers. Meaning: we have to find a way to represent negative numbers too. Meaning: we have to store the number's value, but also its sign (+ or -). The representation used is 2's complement, and it is the one that's learned everywhere (maybe other exist but I don't know them). In this representation, the sign is given by the first bit. That is, in 2's complement representation a positive number starts with a 0 and a negative number starts with an 1.
And here the problem rises: Is 0 a positive number or a negative number ? It can't be both, because it would mean that 0 can be represented in two manners for a given number a bits (for 5: 00000 and 10000), that is we lose the space to put one more number. I have no idea how they decided, but fact is 0 is a positive number. For any number of bits, signed or unsigned, a 0 is represented with only 0.
Great. This gives us the answer to the first question: what is the upper bound for a decimal number represented in 2's complement ? We know that the first bit is for the sign, so all of the numbers we can represent must be composed of 4 bits. We can have 16 different values of 4-bits strings, and 0 is one of them, so the upper bound is 15.
Now, for the negative numbers, this becomes easy. We have already filled 16 values out of the 32 we can make on 5 bits. 16 left. We also know that 0 has already been represented, so we don't need to include it. Then we start at the number right before 0: -1. As we have 16 numbers to represent, starting from -1, the lowest signed integer we can represent on 5 bits is -16.
More generally, with n bits we can represent 2^n numbers. With signed integers, half of them are positive, and half of them are negative. That is, we have 2^(n-1) positive numbers and 2^(n-1) negative numbers. As we know 0 is considered as positive, the greatest signed integer we can represent on n bits is 2^(n-1) - 1 and the lowest is -2^(n-1)
2's complement representation
Now that we know which numbers can be represented on 5 bits, the question is to know how we represent them.
We already saw the sign is represented on the first bit, and that 0 is considered as positive. For positive numbers, it works the same way as it does for unsigned integers: 00000 is 0, 00001 is 1, 00010 is 2, etc until 01111 which is 15. This is where we stop for positive signed integers because we have occupied all the 16 values we had.
For negative signed integers, this is different. If we keep the same representation (10001 is -1, 10010 is -2, ...) then we end up with 11111 being -15 and 10000 not being attributed. We could decide to say it's -16 but we would have to check for this particular case each time we work with negative integers. Plus, this messes up all of the binary operations. We could also decide that 10000 is -1, 10001 is -2, 10010 is -3 etc. But it also messes up all of the binary operations.
2's complement works the following way. Let's say you have the signed integer 10011, you want to know what decimal is is.
Flip all the bits: 10011 --> 01100
Add 1: 01100 --> 01101
Read it as an unsigned integer: 01101 = 0*2^4 + 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0 = 13.
10011 represents -13. This representation is very handy because it works both ways. How to represent -7 as a binary signed integer ? Start with the binary representation of 7 which is 00111.
Flip all the bits: 00111 --> 11000
Add 1: 11000 --> 11001
And that's it ! On 5 bits, -7 is represented by 11001.
I won't cover it, but another great advantage with 2's complement is that the addition works the same way. That is, When adding two binary numbers you do not have to care if they are signed or unsigned, this is the same algorithm behind.
With this, you should be able to answer the questions, but more importantly to understand the answers.
This topic is great for understanding 2's complement: Why is two's complement used to represent negative numbers?

Why we need to add 1 while doing 2's complement

The 2's complement of a number which is represented by N bits is 2^N-number.
For example: if number is 7 (0111) and i'm representing it using 4 bits then, 2's complement of it would be (2^N-number) i.e. (2^4 -7)=9(1001)
7==> 0111
1's compliment of 7==> 1000
1000
+ 1
-------------
1001 =====> (9)
While calculating 2's complement of a number, we do following steps:
1. do one's complement of the number
2. Add one to the result of step 1.
I understand that we need to do one's complement of the number because we are doing a negation operation. But why do we add the 1?
This might be a silly question but I'm having a hard time understanding the logic. To explain with above example (for number 7), we do one's complement and get -7 and then add +1, so -7+1=-6, but still we are getting the correct answer i.e. +9

Your error is in "we do one's compliment and get -7". To see why this is wrong, take the one's complement of 7 and add 7 to it. If it's -7, you should get zero because -7 + 7 = 0. You won't.
The one's complement of 7 was 1000. Add 7 to that, and you get 1111. Definitely not zero. You need to add one more to it to get zero!
The negative of a number is the number you need to add to it to get zero.
If you add 1 to ...11111, you get zero. Thus -1 is represented as all 1 bits.
If you add a number, say x, to its 1's complement ~x, you get all 1 bits.
Thus:
~x + x = -1
Add 1 to both sides:
~x + x + 1 = 0
Subtract x from both sides:
~x + 1 = -x

The +1 is added so that the carry over in the technique is taken care of.
Take the 7 and -7 example.
If you represent 7 as 00000111
In order to find -7:
Invert all bits and add one
11111000 -> 11111001
Now you can add following standard math rules:
00000111
+ 11111001
-----------
00000000
For the computer this operation is relatively easy, as it involves basically comparing bit by bit and carrying one.
If instead you represented -7 as 10000111, this won't make sense:
00000111
+ 10000111
-----------
10001110 (-14)
To add them, you will involve more complex rules like analyzing the first bit, and transforming the values.
A more detailed explanation can be found here.

Short answer: If you don't add 1 then you have two different representations of the number 0.
Longer answer: In one's complement
the values from 0000 to 0111 represent the numbers from 0 to 7
the values from 1111 to 1000 represent the numbers from 0 to -7
since their inverses are 0000 and 0111.
There is the problem, now you have 2 different ways of writing the same number, both 0000 and 1111 represent 0.
If you add 1 to these inverses they become 0001 and 1000 and represent the numbers from -1 to -8 therefore you avoid duplicates.

I'm going to directly answer what the title is asking (sorry the details aren't as general to everyone as understanding where flipping bits + adding one comes from).
First let motivate two's complement by recalling the fact that we can carry out standard (elementary school) arithmetic with them (i.e. adding the digits and doing the carrying over etc). Easy of computation is what motivates this representation (I assume it means we only 1 piece of hardware to do addition rather than 2 if we implemented subtraction differently than addition, and we do and subtract differently in elementary school addition btw).
Now recall the meaning of each of the digit's in two's complements and some binary numbers in this form as an example (slides borrowed from MIT's 6.004 course):
Now notice that arithmetic works as normal here and the sign is included inside the binary number in two's complement itself. In particular notice that:
1111....1111 + 0000....1 = 000....000
i.e.
-1 + 1 = 0
Using this fact let's try to derive what the two complement representation for -A should be. So the problem to solve is:
Q: Given the two's complement representation for A what is the two's complement's representation for -A?
To do this let's do some algebra using values we know:
A + (-A) = 0 = 1 + (-1) = 11...1 + 00000...1 = 000...0
now let's make -A the subject expressed in terms of numbers expressed in two's complement:
-A = 1 + (-1 - A) = 000.....1 + (111....1 - A)
where A is in two's complements. So what we need to compute is the subtraction of -1 and A in two's complement format. For that we notice how numbers are represented as a linear combination of it's bases (i.e. 2^i):
1*-2^N-1 + 1 * 2^N-1 + ... 1 = -1
a_N * -2^N-1 + a_N-1 * 2^N-1 + ... + a_0 = A
--------------------------------------------- (subtract them)
a_N-1 * -2^N-1 + a_N-1 -1 * 2^N-1 + ... + a_0 -1 = A
which essentially means we subtract each digit for it's corresponding value. This ends up simply flipping bits which results in the following:
-A = 1 + (-1 - A) = 1 + ~ A
where ~ is bit flip. This is why you need to bit flip and add 1.
Remark:
I think a comment that was helpful to me is that complement is similar to inverse but instead of giving 0 it gives 2^N (by definition) e.g. with 3 bits for the number A we want A+~A=2^N so 010 + 110 = 1000 = 8 which is 2^3. At least that clarifies what the word "complement" is suppose to mean here as it't not just the inverting of the meaning of 0 and 1.
If you've forgotten what two's complement is perhaps this will be helpful: What is “2's Complement”?
Cornell's answer that I hope to read at some point: https://www.cs.cornell.edu/~tomf/notes/cps104/twoscomp.html#whyworks

IEEE 754 Floating Point Add/Rounding

I don't understand how I can add in IEEE 754 Floating Point (mainly the "re-alignment" of the exponent)
Also for rounding, how does the Guard, Round & Sticky come into play? How to do rounding in general (base 2 floats that is)
eg. Suppose the qn: Add IEEE 754 Float represented in hex 0x383FFBAD
and 0x3FD0ECCD, then give answers in Round to 0, \$\pm \infty\$,
nearest
So I have
0x383FFBAD 0 | 0111 0000 | 0111 1111 1111 0111 0101 1010
0x3FD0ECCD 0 | 0111 1111 | 1010 0001 1101 1001 1001 1010
Then how should I continue? Feel free to use other examples if you wish

If I understood your "re-alignment" of the exponent correctly...
Here's an explanation of how the format relates to the actual value.
1.b(22)b(21)...b(0) * 2e-127 can be interpreted as a binary integer shifted left by e-127 bit positions. Of course, the shift amount can be negative, which is how we get fractions (values between 0 and 1).
In order to add 2 floating-point numbers of the same sign you need to first have their exponent part equal, or, in other words, denormalize one of the addends (the one with the smaller exponent).
The reason is very simple. When you add, for example, 1 thousand and 1 you want to add tens with tens, hundreds with hundreds, etc. So, you have 1.000*103 + 1.000*100 = 1.000*103 + 0.001*103(<--denormalized) = 1.001*103. This can, of course, result in truncation/rounding, if the format cannot represent the result exactly (e.g. if it could only have 2 significant digits, you'd end up with the same 1.0*103 for the sum).
So, just like in the above example with 1000 and 1, you may need to shift to the right one of the addends before adding their mantissas. You need to remember that there's an implict 1. bit in the format, which isn't stored in the float, which you have to account for when shifting and adding. After adding the mantissas, you most likely will run into a mantissa overflow and will have to denormalize again to get rid of the overflow.
That's the basics. There're special cases to consider as well.

What is a canonical signed digit?

What is a canonical signed digit (CSD) and how does one convert a binary number to a CSD and a CSD back to a binary number? How do you know if a digit of a CSD should be canonically chosen to be +, -, or 0?

Signed-digit binary uses three symbols in each power-of-two position: -1, 0, 1. The value represented is the sum of the positional coefficients times the corresponding power of 2, just like binary, the difference being that some of the coefficients may be -1. A number can have multiple distinct representations in this system.
Canonical signed digit representation is the same, but subject to the constraint that no two consecutive digits are non-0. It works out that each number has a unique representation in CSD.
See slides 31 onwards in Parhi's Bit Level Arithmetic for more, including a binary to CSD conversion algorithm.

What is canonical signed digit format?
Canonical Signed Digit (CSD) is a type of number representation. The important characteristics of the CSD presentation are:
CSD presentation of a number consists of numbers 0, 1 and -1. [1, 2].
The CSD presentation of a number is unique [2].
The number of nonzero digits is minimal [2].
There cannot be two consecutive non-zero digits [2].
How to convert a number into its CSD presentation?
First, find the binary presentation of the number.
Example 1
Lets take for example a number 287, which is 1 0001 1111 in binary representation. (256 + 16 + 8 + 4 + 2 + 1 = 287)
1 0001 1111
Starting from the right (LSB), if you find more than non-zero elements (1 or -1) in a row, take all of them, plus the next zero. (if there is not zero at the left side of the MSB, create one there). We see that the first part of this number is
01 1111
Add 1 to the number (i.e. change the 0 to 1, and all the 1's to 0's), and force the rightmost digit to be -1.
01 1111 -> 10 000-1
You can check that the number is still the same: 16 + 8 + 4 + 2 + 1 = 31 = 32 + (-1).
Now the number looks like this
1 0010 000-1
Since there are no more consecutive non-zero digits, the conversion is complete. Thus, the CSD presentation for the number 287 is 1 0010 000-1, which is 256 + 31 - 1.
Example 2
How about a little more challenging example. Number 345. In binary, it is
1 0101 1001
Find the first place (starting from righ), where there are more than one non-zero numbers in a row. Take also the next zero. Add one to it, and force the rightmost digit to be -1.
1 0110 -1001
Now we just created another pair of ones, which has to be transformed. Take the 011, and add one to it (get 100), and force the last digit to be -1. (get 10-1). Now the number looks like this
1 10-10 -1001
Do the same thing again. This time, you will have to imagine a zero in the left side of the MSB.
10 -10-10 -1001
You can make sure that this is the right CSD presentation by observing that: 1) There are no consecutive non-zero digits. 2) The sum adds to 325 (512 - 128 - 32 - 8 + 1 = 345).
More formal definitions of this algorithm can be found in [2].
Motivation behind the CSD presentation
CSD might be used in some other applications, too, but this is the digital microelectronics perspective. It is often used in digital multiplication. [1, 2]. Digital multiplication consists of two phases: Computing partial products and summing up the partial product. Let's consider the multiplication of 1010 and 1011:
1010
x 1011
1010
1010
0000
+ 1010
= 1101110
As we can see, the number of non-zero partial products (the 1010's), which are has to be summed up depends on the number of non-zero digits in the multiplier. Thus, the computation time of the sum of the partial products depends on the number of non-zero digits in the multiplier. Therefore the digital multiplication using CSD converted numbers is faster than using conventional digital numbers. The CSD form contains 33% less non-zero digits than the binary presentation (on average). For example, a conventional double precision floating point multiplication might take 100.2 ns, but only 93.2 ns, when using the CSD presentation. [1]
And how about the negative ones, then. Are there actually three states (voltage levels) in the microcircuit? No, the partial products calculated with the negative sign are not summed right away. Instead, you add the 2's complement (i.e. the negative presentation) of these numbers to the final sum.
Sources:
[1] D. Harini Sharma, Addanki
Purna Ramesh: Floating point multiplier using Canonical Signed
Digit
[2] Gustavo A. Ruiz, Mercedes Grand: Efficient canonic signed digit recoding

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex