Adding negative and positive binary? - math

X = 01001001 and Y = 10101010
If I want to add them together how do I do that? They are "Two's Complement"...
I have tried a lot of things, but I am not quite sure I am getting the right answer, since there seem to be different types of rules.
Just want to make sure it is correct:
1. Add them as they are do not convert the negative
2. Convert the negative number you get and that's the sum.
e.g.
01001001+10101010 = 11110011 => 00001100 => 1101 => -13
Or?
1. Convert the negative
2. Add them together and convert the negative
e.g.
01001001+10101010 => 01001001 + 01010110 => 10011111 => 01100001 => -97
So basically what I want to do is to take: X-Y, and X+Y
Can someone tell me how to do that?
Some resource sites:
student-binary
celtickane
swarthmore

The beauty of two's complement is that at the binary level it's a matter of interpretation rather than algorithm - the hardware for adding two signed numbers is the same as for unsigned numbers (ignoring flag bits).
Your first example - "just add them" - is exactly the right answer. Your example numbers are:
01001001 = 73
10101010 = -86
So, the correct answer is indeed -13.
Subtracting is just the same, in that no special processing is required for two's complement numbers: you "just subtract them".
Note that where things get interesting is the handling of overflow/underflow bits. You can't represent the result of 73 - (-86) as an 8-bit two's complement number...
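Here is a minimal C sketch of the points above. It assumes 8-bit values held in uint8_t, and the helper names add8, as_signed8 and add8_overflows are hypothetical. The same adder produces both sums, and a standard sign-bit test flags the 73 - (-86) case as overflow:

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative sketch; helper names are hypothetical. */

/* Add two 8-bit patterns with an ordinary binary adder (mod 2^8).
   The resulting bits can be read as unsigned or as two's complement. */
uint8_t add8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);  /* the carry out of bit 7 is dropped */
}

/* Read an 8-bit pattern as two's complement: bit 7 weighs -128, not +128. */
int as_signed8(uint8_t bits) {
    return (bits & 0x80) ? (int)bits - 256 : (int)bits;
}

/* Signed overflow: both operands have the same sign bit, and the
   sum's sign bit differs from it. */
bool add8_overflows(uint8_t a, uint8_t b) {
    uint8_t sum = add8(a, b);
    return ((~(a ^ b) & (a ^ sum)) & 0x80) != 0;
}
```

With these helpers, 01001001 + 10101010 gives 11110011, which reads as -13. By contrast, 73 + 86 wraps to 10011111, which reads as -97 (the value from the second attempt in the question) and is flagged as an overflow.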

Adding in two's complement doesn't require any special processing, and when the signs of the two arguments are opposite the result cannot overflow. You just add them as you normally would in binary, and the leftmost bit of the result gives its sign.

And just to make sure you understand two's complement, to convert from a positive to a negative number (or vice versa): invert each bit, then add 1 to the result.
For example, your positive number X = 01001001 becomes 10110110+1=10110111 as a negative number; your negative number Y = 10101010 becomes 01010101+1=01010110 as a positive number.
To subtract Y from X, negate Y and add, i.e. 01001001 + 01010110.
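The negate-and-add recipe can be sketched in C as follows. It assumes 8-bit patterns in uint8_t, and neg8 and sub8 are hypothetical names used for illustration:

```c
#include <stdint.h>

/* Illustrative sketch; neg8 and sub8 are hypothetical names. */

/* Two's complement negation: invert every bit, then add 1 (mod 2^8). */
uint8_t neg8(uint8_t x) {
    return (uint8_t)(~x + 1);
}

/* Subtraction X - Y computed as X + (-Y), using only an adder. */
uint8_t sub8(uint8_t x, uint8_t y) {
    return (uint8_t)(x + neg8(y));
}
```

Here neg8(0x49) gives 0xB7 (10110111, i.e. -73), and sub8(0x49, 0xAA) gives 0x9F. That is the overflowing 8-bit result of 73 - (-86).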

Your confusion might be because of the widths of the numbers involved. To get a better feel for this you could try creating a signed integer out of your unsigned integer.
If the MSB of your unsigned integer is already 0, then you can read it as signed and get the same result.
If the MSB is 1, then you can prepend a 0 on the left to get a signed number. You should sign-extend (that is, add 0s if the MSB is 0, add 1s if the MSB is 1) all the signed numbers to get numbers of the same width, so you can do the arithmetic "normally".
For instance, using your numbers:
X = 01001001: Unsigned, MSB is 0, do nothing.
Y = 10101010: Signed, did nothing with X, still do nothing.
But if we change the MSB of X to 1:
X = 11001001: Unsigned, MSB is 1, Add a 0 --> 011001001
Y = 10101010: Signed, extended X, so sign-extend Y --> 110101010
Now you have two signed numbers that you can add or subtract the way you already know.
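Here is a small C sketch of the widening step, using the modified X = 11001001 and Y = 10101010. It assumes 9-bit patterns are kept in the low bits of a uint16_t, and all function names are hypothetical:

```c
#include <stdint.h>

/* Illustrative sketch; all function names are hypothetical.
   9-bit patterns live in the low 9 bits of a uint16_t. */

/* Unsigned input: put a 0 in front (zero-extension). */
uint16_t zero_extend9(uint8_t x) {
    return x;  /* bit 8 stays 0 */
}

/* Signed input: copy the MSB into the new bit (sign-extension). */
uint16_t sign_extend9(uint8_t x) {
    return (x & 0x80) ? (uint16_t)(x | 0x100) : x;
}

/* Add two 9-bit patterns, discarding the carry out of bit 8. */
uint16_t add9(uint16_t a, uint16_t b) {
    return (uint16_t)((a + b) & 0x1FF);
}

/* Read a 9-bit pattern as two's complement (bit 8 weighs -256). */
int as_signed9(uint16_t bits) {
    return (bits & 0x100) ? (int)bits - 512 : (int)bits;
}
```

Adding the widened values 011001001 and 110101010 gives 001110011 once the carry is dropped, i.e. 201 + (-86) = 115.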

01001001 + 10101010 = 11110011 => 00001100 => 1101 => -13
The first addend is 73. The second addend is -86: in 8 bits, 86 = 01010110, and negating it (invert, then add 1) gives -86 = 10101010.
Both addends are in two's complement representation.
Adding them gives 11110011, which is an encoded negative number (its magnitude has undergone One's Complement by inversion, then Two's Complement by adding 1).
So do the reverse to get the decimal number. First subtract 1, undoing the Two's Complement step: 11110011 - 1
= 11110010. Then invert, undoing the One's Complement step: 00001101, which is equal to 13. Because the sum's leftmost bit is 1 (which is why the reversal was needed), the answer is negative, so affix the negative sign: -13
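The reversal described above (subtract 1, then invert) always gives the same bits as negating directly (invert, then add 1), because ~(x - 1) = -x in modular arithmetic. Here is a C sketch that compares the two on 8-bit patterns. The function names are hypothetical:

```c
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical. */

/* Undo "add 1" by subtracting 1, then undo the inversion by inverting again. */
uint8_t magnitude_by_reversal(uint8_t bits) {
    return (uint8_t)~(uint8_t)(bits - 1);
}

/* Negate directly: invert, then add 1. */
uint8_t magnitude_by_negation(uint8_t bits) {
    return (uint8_t)(~bits + 1);
}
```

For 11110011 (0xF3), both functions return 13.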

Related

How does the formula x & (x - 1) work?

From Hacker's Delight: 2nd Edition:
The formula here seems a little bit awkward. How is the 1 vector (presumably 0x1111 1111) subtracted from some x vector when x is smaller than it? (Like, as given in the example, 0x0101 1000 - 0x0000 0000 doesn't make any sense to me.) The first number is smaller than the one being subtracted, and the words aren't storing signed vectors either. Is this something RISC-specific, or what?
As specified in the notation section of the book, a bold letter corresponds to a vector for a word, like x = 00000000, and a bold one differs from a light-face 1: bold 1 = 11111111, which is an 8-bit word.
Edit 2: Special thanks to Paul Hankin for figuring out the unconventional notation used here. A bold-faced one refers to a 32-bit word, which is [00000001], and a light-faced 1 refers to the number one as in C.
Subtracting 1
Since we're more familiar with decimal than with binary, it sometimes helps to look at what happens in decimal.
What happens when subtracting 1 in decimal? Take for example 1786000 - 1 = 1785999.
If you subtract 1 from a positive number x in decimal:
all the zeroes at the right of x become 9;
the rightmost nonzero digit of x becomes 1 less;
other digits are unaffected.
Now, in binary, it works exactly the same, except we only have the digits 0 and 1 instead of 0 through 9.
If you subtract 1 from a number x in binary:
all the zeroes at the right of x become 1;
the rightmost nonzero bit of x becomes 0;
other bits are unaffected.
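The binary rule above can be checked mechanically. This small C sketch applies the three rules by hand, and its result can be compared with ordinary x - 1 on 8-bit values. The function name minus_one_by_rule is hypothetical, and the sketch assumes x is nonzero, since 0 has no rightmost 1:

```c
#include <stdint.h>

/* Illustrative sketch; the function name is hypothetical.
   Requires x != 0 (otherwise there is no rightmost 1 to find). */
uint8_t minus_one_by_rule(uint8_t x) {
    int pos = 0;
    while (((x >> pos) & 1) == 0) {  /* locate the rightmost 1 */
        pos++;
    }
    uint8_t low_ones = (uint8_t)((1u << pos) - 1);  /* zeros to its right become 1 */
    uint8_t cleared = (uint8_t)(x & ~(1u << pos));   /* the rightmost 1 becomes 0 */
    return (uint8_t)(cleared | low_ones);            /* other bits unaffected */
}
```

For example, minus_one_by_rule(01011000) gives 01010111, the same as 01011000 - 1.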
What about negative numbers? Happily, the 2's complement representation is such that negative numbers behave exactly like positive numbers. In fact, when looking at the bits of x, you can subtract 1 from x without needing to know whether x is a signed int or an unsigned int.
x & (x-1)
Let's start with an example: x = 01011000. We can subtract 1 the way I just explained:
x = 01011000
x-1 = 01010111
Now what's the result of the bitwise-and operation x & (x-1)? We take the two bits in each column; if they are both 1, we write 1; if at least one of them is 0, we write 0.
x = 01011000
x-1 = 01010111
x&(x-1) = 01010000
What happened?
all the zeroes at the right of x remain zero;
the rightmost 1 of x becomes a 0 because of x-1;
all other bits are unaffected, because they are the same in x and x-1.
Conclusion: we have zeroed the rightmost 1 of x, and left all other bits unaffected.
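In C the trick is a one-liner on unsigned integers. The sketch below uses hypothetical function names. It also shows one common use, counting 1 bits by clearing them one at a time. That application is an illustration, not something from the excerpt above:

```c
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical.
   Unsigned arithmetic is used so that 0 - 1 wraps instead of overflowing. */
uint32_t clear_lowest_set_bit(uint32_t x) {
    return x & (x - 1);  /* zero the rightmost 1, keep every other bit */
}

/* Example use: each iteration removes exactly one 1 bit. */
int count_ones(uint32_t x) {
    int n = 0;
    while (x != 0) {
        x = clear_lowest_set_bit(x);
        n++;
    }
    return n;
}
```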
Let's take a look at what x-1 does.
Assume x is a value ????1000 (each ? is 0 or 1)
=> x-1 = ???? 0111
=> x & (x-1) = ???? 0000
It's very similar no matter where the rightmost 1 is placed within x.
Requested example:
x=00001111
=> x-1=00001110
=> x & (x-1) = 00001110
P.S. x-1 = 00001111 - 00000001 (<=> 00001111 + 11111111)

Struggle with binary subtraction

Last week I learned about arithmetic with binary numbers, especially subtraction with two's complement. It was pretty easy so far, but something bothers me a little bit: why is 0 - 1 = 1 with a borrow of 1?
Sure, it's -1, but shouldn't we get a result like 1001 (using 4 bits)? Can someone please explain?
0-1 is not 1 in binary
0-1 does not equal 1 in two's complement with signed numbers. But its representation does have a lot of ones in it: you should get 1111.
Negative numbers in two's complement:
In two's complement format (which computers use), the sign of a number can be read from one bit. By convention this is the leftmost bit, where 0 indicates a positive value and 1 indicates a negative value.
To get the representation of a negative number we follow two steps:
Flip all the bits
Then increment by one
Additive Inverse of 7:
7=0111
1000 // Flip the bits
-7=1001 // Add +1
So why do we get 1111 when we subtract 0-1?
How do we subtract in two's complement? We subtract A-B by taking the additive inverse (the opposite) of B and adding the numbers.
So with A=0 and B=1:
Take the additive inverse by inverting and then increment by 1.
Additive Inverse of B:
B= 0001
-B= 1110 // Invert
-B= 1111 // Increment +1
Now sum A + (-B) :
A=0000
(+) -B=1111
-----------
A+(-B)=1111
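The same 4-bit steps can be sketched in C by masking to the low 4 bits after each operation. The masking is an assumption made for illustration, since C has no 4-bit type, and neg4 and sub4 are hypothetical names:

```c
/* Illustrative sketch; neg4 and sub4 are hypothetical names.
   4-bit patterns live in the low 4 bits of an unsigned int. */
#define MASK4 0xFu

/* Additive inverse of a 4-bit pattern: invert, then add 1. */
unsigned neg4(unsigned b) {
    return (~b + 1) & MASK4;
}

/* A - B computed as A + (-B), truncated to 4 bits. */
unsigned sub4(unsigned a, unsigned b) {
    return (a + neg4(b)) & MASK4;
}
```

With these, neg4(7) gives 1001, and sub4(0, 1) gives 1111.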
Note that if we were only allowed one bit, we couldn't represent negative numbers, since there would be no room for the value.

What's the highest and the lowest integer for representing signed numbers in two's complement in 5 bits?

I understand how binary works and I can calculate binary to decimal, but I'm lost around signed numbers.
I have found a calculator that does the conversion, but I'm not sure how to find the maximum and the minimum number, or how to convert when no binary number is given. The questions on Stack Overflow seem to be about converting specific numbers, or don't cover signed numbers with a specific bit width.
The specific question is:
We have only 5 bits for representing signed numbers in two's complement:
What is the highest signed integer?
Write its decimal value (including the sign only if negative).
What is the lowest signed integer?
Write its decimal value (including the sign only if negative).
Seems like I'll have to go deeper into binary concepts; I've only been programming for 2 months, and I thought I knew about binary conversion.
From a logical point of view:
Bounds in signed representation
You have 5 bits, so there are 32 different combinations. It means that you can make 32 different numbers with 5 bits. On unsigned integers, it makes sense to store integers from 0 to 31 (inclusive) on 5 bits.
However, this question is about signed integers, which means we have to find a way to represent negative numbers too: we have to store the number's value, but also its sign (+ or -). The representation used is 2's complement, and it is the one that's taught everywhere (other representations, such as sign-magnitude and one's complement, exist too). In this representation, the sign is given by the first bit: in 2's complement representation a positive number starts with a 0 and a negative number starts with a 1.
And here a problem arises: is 0 a positive number or a negative number? It can't be both, because that would mean 0 could be represented in two ways for a given number of bits (for 5 bits: 00000 and 10000), and we would lose the space for one more number. I have no idea how they decided, but the fact is that 0 is treated as a positive number. For any number of bits, signed or unsigned, 0 is represented with all bits 0.
Great. This gives us the answer to the first question: what is the upper bound for a decimal number represented in 2's complement? We know that the first bit is for the sign, so all of the positive numbers we can represent must fit in the remaining 4 bits. There are 16 different 4-bit strings, and 0 is one of them, so the upper bound is 15.
Now, for the negative numbers, this becomes easy. We have already filled 16 values out of the 32 we can make on 5 bits. 16 left. We also know that 0 has already been represented, so we don't need to include it. Then we start at the number right before 0: -1. As we have 16 numbers to represent, starting from -1, the lowest signed integer we can represent on 5 bits is -16.
More generally, with n bits we can represent 2^n numbers. With signed integers, half of them are positive, and half of them are negative. That is, we have 2^(n-1) positive numbers and 2^(n-1) negative numbers. As we know 0 is considered as positive, the greatest signed integer we can represent on n bits is 2^(n-1) - 1 and the lowest is -2^(n-1)
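The general formula translates directly into C. This sketch uses hypothetical function names and assumes 1 <= n <= 63 so that the shifts stay within a long long:

```c
/* Illustrative sketch; function names are hypothetical.
   Assumes 1 <= n <= 63 so that 1LL << (n - 1) is well defined. */
long long max_signed(int n) {
    return (1LL << (n - 1)) - 1;  /*  2^(n-1) - 1 */
}

long long min_signed(int n) {
    return -(1LL << (n - 1));     /* -2^(n-1) */
}
```

For the 5-bit question, max_signed(5) is 15 and min_signed(5) is -16.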
2's complement representation
Now that we know which numbers can be represented on 5 bits, the question is to know how we represent them.
We already saw the sign is represented on the first bit, and that 0 is considered as positive. For positive numbers, it works the same way as it does for unsigned integers: 00000 is 0, 00001 is 1, 00010 is 2, etc until 01111 which is 15. This is where we stop for positive signed integers because we have occupied all the 16 values we had.
For negative signed integers, this is different. If we keep the same representation (10001 is -1, 10010 is -2, ...) then we end up with 11111 being -15 and 10000 not being attributed. We could decide to say it's -16 but we would have to check for this particular case each time we work with negative integers. Plus, this messes up all of the binary operations. We could also decide that 10000 is -1, 10001 is -2, 10010 is -3 etc. But it also messes up all of the binary operations.
2's complement works the following way. Let's say you have the signed integer 10011, and you want to know what decimal number it is.
Flip all the bits: 10011 --> 01100
Add 1: 01100 --> 01101
Read it as an unsigned integer: 01101 = 0*2^4 + 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0 = 13.
10011 represents -13. This representation is very handy because it works both ways. How to represent -7 as a binary signed integer ? Start with the binary representation of 7 which is 00111.
Flip all the bits: 00111 --> 11000
Add 1: 11000 --> 11001
And that's it ! On 5 bits, -7 is represented by 11001.
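Here is the flip-and-add-1 procedure in both directions, as a C sketch for 5-bit patterns stored in the low bits of an unsigned int. encode5 and decode5 are hypothetical names:

```c
/* Illustrative sketch; encode5 and decode5 are hypothetical names.
   5-bit patterns live in the low 5 bits of an unsigned int. */
#define MASK5 0x1Fu

/* Encode: for a negative value, flip the bits of its magnitude and add 1.
   Assumes -16 <= value <= 15. */
unsigned encode5(int value) {
    if (value >= 0) {
        return (unsigned)value & MASK5;
    }
    unsigned magnitude = (unsigned)(-value);
    return (~magnitude + 1) & MASK5;
}

/* Decode: a leading 1 means negative; flip, add 1, read the magnitude. */
int decode5(unsigned bits) {
    bits &= MASK5;
    if ((bits & 0x10u) == 0) {
        return (int)bits;
    }
    unsigned magnitude = (~bits + 1) & MASK5;
    return -(int)magnitude;
}
```

With these, decode5(10011) gives -13, and encode5(-7) gives 11001.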
I won't cover it, but another great advantage of 2's complement is that addition works the same way. That is, when adding two binary numbers you do not have to care whether they are signed or unsigned: the same algorithm is used.
With this, you should be able to answer the questions, but more importantly to understand the answers.
This topic is great for understanding 2's complement: Why is two's complement used to represent negative numbers?

Why we need to add 1 while doing 2's complement

The 2's complement of a number which is represented by N bits is 2^N-number.
For example: if the number is 7 (0111) and I'm representing it using 4 bits, then its 2's complement would be (2^N-number), i.e. (2^4 -7)=9(1001)
7==> 0111
1's complement of 7==> 1000
1000
+ 1
-------------
1001 =====> (9)
While calculating 2's complement of a number, we do following steps:
1. do one's complement of the number
2. Add one to the result of step 1.
I understand that we need to do one's complement of the number because we are doing a negation operation. But why do we add the 1?
This might be a silly question but I'm having a hard time understanding the logic. To explain with above example (for number 7), we do one's complement and get -7 and then add +1, so -7+1=-6, but still we are getting the correct answer i.e. +9
Your error is in "we do one's complement and get -7". To see why this is wrong, take the one's complement of 7 and add 7 to it. If it's -7, you should get zero because -7 + 7 = 0. You won't.
The one's complement of 7 was 1000. Add 7 to that, and you get 1111. Definitely not zero. You need to add one more to it to get zero!
The negative of a number is the number you need to add to it to get zero.
If you add 1 to ...11111, you get zero. Thus -1 is represented as all 1 bits.
If you add a number, say x, to its 1's complement ~x, you get all 1 bits.
Thus:
~x + x = -1
Add 1 to both sides:
~x + x + 1 = 0
Subtract x from both sides:
~x + 1 = -x
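The identity ~x + 1 = -x can be checked exhaustively for 8-bit values. Here is a C sketch that also checks the first step of the derivation, ~x + x = all ones. The function name is hypothetical:

```c
#include <stdint.h>

/* Illustrative sketch; the function name is hypothetical.
   Works modulo 2^8, the width an 8-bit register would have.
   Returns how many of the 256 patterns satisfy both identities. */
int check_negation_identity(void) {
    int ok = 0;
    for (unsigned x = 0; x < 256; x++) {
        uint8_t all_ones = (uint8_t)(~x + x);  /* ~x + x = -1 = 11111111 */
        uint8_t lhs = (uint8_t)(~x + 1);       /* flip the bits, add 1 */
        uint8_t rhs = (uint8_t)(256 - x);      /* -x mod 2^8 */
        if (all_ones == 0xFF && lhs == rhs) {
            ok++;
        }
    }
    return ok;
}
```

The function returns 256, meaning the identities hold for every 8-bit pattern, including 0.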
The +1 is added so that, when a number and its negative are added with the ordinary carry rules, the result comes out as zero.
Take the 7 and -7 example.
If you represent 7 as 00000111
In order to find -7:
Invert all bits and add one
11111000 -> 11111001
Now you can add following standard math rules:
00000111
+ 11111001
-----------
00000000
(The carry out of the leftmost bit is discarded.) For the computer this operation is relatively easy, as it basically involves adding bit by bit and carrying one.
If instead you represented -7 as 10000111, this won't make sense:
00000111
+ 10000111
-----------
10001110 (-14)
To add them, you would need more complex rules, such as analyzing the first bit and transforming the values.
A more detailed explanation can be found here.
Short answer: If you don't add 1 then you have two different representations of the number 0.
Longer answer: in one's complement,
the values from 0000 to 0111 represent the numbers from 0 to 7
the values from 1111 to 1000 represent the numbers from 0 to -7
since their inverses are 0000 and 0111.
There is the problem, now you have 2 different ways of writing the same number, both 0000 and 1111 represent 0.
If you add 1 to those magnitudes, they run from 0001 to 1000, so the patterns from 1111 to 1000 represent the numbers from -1 to -8, and you avoid duplicates.
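To see the duplicate zero concretely, here is a C sketch that decodes a 4-bit pattern under both schemes. The function names are hypothetical:

```c
/* Illustrative sketch; function names are hypothetical.
   4-bit patterns live in the low 4 bits of an unsigned int. */

/* One's complement: a leading 1 means negative, magnitude = inverted pattern. */
int ones_complement4(unsigned bits) {
    bits &= 0xFu;
    return (bits & 0x8u) ? -(int)(~bits & 0xFu) : (int)bits;
}

/* Two's complement: a leading 1 means negative, magnitude = inverted pattern + 1. */
int twos_complement4(unsigned bits) {
    bits &= 0xFu;
    return (bits & 0x8u) ? -(int)((~bits + 1) & 0xFu) : (int)bits;
}
```

Under one's complement, both 0000 and 1111 decode to 0, and 1000 decodes to -7. Under two's complement, 1111 decodes to -1 and 1000 decodes to -8.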
I'm going to directly answer what the title is asking (sorry if the details aren't accessible to everyone; the goal is understanding where flipping bits + adding one comes from).
First, let's motivate two's complement by recalling that we can carry out standard (elementary school) arithmetic with it (i.e. adding the digits, carrying over, etc.). Ease of computation is what motivates this representation (I assume it means we only need 1 piece of hardware for addition, rather than 2 if we implemented subtraction differently from addition; in elementary school we do add and subtract differently, by the way).
Now recall the meaning of each of the digits in two's complement, and some binary numbers in this form as an example (slides borrowed from MIT's 6.004 course):
Now notice that arithmetic works as normal here and the sign is included inside the binary number in two's complement itself. In particular notice that:
1111....1111 + 0000....1 = 000....000
i.e.
-1 + 1 = 0
Using this fact, let's try to derive what the two's complement representation for -A should be. So the problem to solve is:
Q: Given the two's complement representation for A what is the two's complement's representation for -A?
To do this let's do some algebra using values we know:
A + (-A) = 0 = 1 + (-1) = 11...1 + 00000...1 = 000...0
now let's make -A the subject expressed in terms of numbers expressed in two's complement:
-A = 1 + (-1 - A) = 000.....1 + (111....1 - A)
where A is in two's complement. So what we need to compute is A subtracted from -1, in two's complement format. For that, notice how numbers are represented as a linear combination of their bases (i.e. 2^i), with the top bit weighted negatively. Subtracting the second line from the first:
$$
\begin{aligned}
-1 &= 1\cdot(-2^{N-1}) + 1\cdot 2^{N-2} + \dots + 1\cdot 2^{0} \\
A &= a_{N-1}\cdot(-2^{N-1}) + a_{N-2}\cdot 2^{N-2} + \dots + a_{0}\cdot 2^{0} \\
-1 - A &= (1-a_{N-1})\cdot(-2^{N-1}) + (1-a_{N-2})\cdot 2^{N-2} + \dots + (1-a_{0})\cdot 2^{0}
\end{aligned}
$$
This means each bit of A is subtracted from 1 in its own position. Since 1 - a_i is just a_i flipped, this simply flips every bit, which results in the following:
-A = 1 + (-1 - A) = 1 + ~ A
where ~ is bit flip. This is why you need to bit flip and add 1.
Remark:
I think a comment that was helpful to me is that the complement is similar to an inverse, but instead of giving 0 it gives 2^N (by definition). E.g. with 3 bits for the number A, we want A + (~A + 1) = 2^N, so 010 + 110 = 1000 = 8, which is 2^3. At least that clarifies what the word "complement" is supposed to mean here, as it's not just inverting the meaning of 0 and 1.
If you've forgotten what two's complement is perhaps this will be helpful: What is “2's Complement”?
Cornell's answer that I hope to read at some point: https://www.cs.cornell.edu/~tomf/notes/cps104/twoscomp.html#whyworks

Rounding to the nearest integer in floating point

How can I round a floating point number to the nearest integer? I am looking for the algorithm in terms of binary since I have to implement the code in assembly.
UPDATED with method for proper rounding to even.
Basic Algorithm:
Store the (exponent+1)'th mantissa bit after the binary point (the most significant of the bits about to be discarded). Next, zero out the (23-exponent) least significant bits. Then use the stored bit to round: if the stored bit is 1, add one to the LSB of the non-truncated part and normalize if necessary. If the stored bit is 0, do nothing.
For results matching the IEEE-754 standard:
Before zeroing out the (23-exponent) least significant bits, OR together the (22-exponent) least significant bits. Call the result of that OR the sticky bit.
The stored bit from the basic algorithm (the exponent+1'th bit after the binary point) will be called the guard bit.
Then zero out the (23-exponent) least significant bits.
If the guard bit is zero, do nothing.
If the guard bit is 1, and the sticky bit is 0, add one to the LSB if the LSB is 1.
If the guard bit is 1 and the sticky bit is 1, add one to the LSB.
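Here is a minimal C sketch of the IEEE-754 variant. It assumes float is IEEE-754 single precision and that uint32_t is available, and round_half_even is a hypothetical name. The description above only covers exponents from 0 to 22, so the extra branches for |x| < 1, already-integral values, infinities and NaNs are assumptions added to make the function total. Adding 1 at the LSB position directly to the bit pattern lets a carry out of the mantissa increment the exponent, which takes care of the normalization step:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch; the function name is hypothetical.
   Assumes float is IEEE-754 single precision (1 sign, 8 exponent, 23 mantissa bits).
   Rounds to the nearest integer, ties to even. */
float round_half_even(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the float's bits */

    uint32_t sign = bits & 0x80000000u;
    int exponent = (int)((bits >> 23) & 0xFFu) - 127;  /* unbiased */
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent >= 23) {
        return x;  /* already an integer, or inf/NaN */
    }
    if (exponent < -1) {  /* |x| < 0.5 rounds to (signed) zero */
        bits = sign;
    } else if (exponent == -1) {  /* 0.5 <= |x| < 1 */
        /* exactly 0.5 ties to the even value 0; anything larger rounds to 1 */
        bits = sign | (mantissa != 0 ? 0x3F800000u : 0u);
    } else {
        int frac_bits = 23 - exponent;  /* mantissa bits below the binary point */
        uint32_t guard = (bits >> (frac_bits - 1)) & 1u;  /* first discarded bit */
        uint32_t sticky = (bits & ((1u << (frac_bits - 1)) - 1u)) != 0;  /* OR of the rest */
        /* LSB of the integer part; for exponent 0 it is the implicit leading 1 */
        uint32_t lsb = (exponent == 0) ? 1u : (bits >> frac_bits) & 1u;

        bits &= ~((1u << frac_bits) - 1u);  /* zero out the fraction bits */
        if (guard && (sticky || lsb)) {
            /* a carry out of the mantissa bumps the exponent (renormalization) */
            bits += 1u << frac_bits;
        }
    }
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

With this sketch, 62.3 rounds to 62 and 21.9 rounds to 22, as in the examples below. Ties go to even: 2.5 gives 2 and 3.5 gives 4.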
Here are some examples using the basic algorithm:
x = 62.3
sign exponent mantissa
x = 0 5 (1).11110010011001100110011
Step 1: Store the exponent+1'th bit (after the decimal point)
exponent+1 = 6th bit
savedbit = 0
Step 2: Zero out 23-exponent least significant bits
23-exponent = 18, so we zero out the 18 LSBs
sign exponent mantissa
x = 0 5 (1).11110000000000000000000
Step 3: Use the next bit to round
Since the stored bit is 0, we do nothing, and the floating point number has been rounded to 62.
Another example:
x = 21.9
sign exponent mantissa
x = 0 4 (1).01011110011001100110011
Step 1: Store the exponent+1'th bit (after the decimal point)
exponent+1 = 5th bit
savedbit = 1
Step 2: Zero out 23-exponent least significant bits
23-exponent = 19, so we zero out the 19 LSBs
sign exponent mantissa
x = 0 4 (1).01010000000000000000000
Step 3: Use the next bit to round
Since the stored bit is 1, we add one to the LSB of the non-truncated part and get 22, which is the correct number:
We start with:
sign exponent mantissa
x = 0 4 (1).01010000000000000000000
Add one at this location:
+ 1
And we get 22:
sign exponent mantissa
x = 0 4 (1).01100000000000000000000
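There is a hardware instruction for round to nearest: http://www.musicdsp.org/showone.php?id=246
The snippet below (MSVC inline assembly, 32-bit x86 only) uses the x87 instructions fld/fistp rather than SSE; fistp converts using the current FPU rounding mode, which defaults to round-to-nearest-even.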
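inline int float2int(float x) {
    int i;
    __asm {
        fld x    // push x onto the x87 register stack
        fistp i  // store it as an integer using the current rounding mode, then pop
    }
    return i;
}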
Increase the exponent by 1, add 1, decrease the exponent by 1, truncate (this computes (2x + 1)/2 = x + 0.5 before truncating). Or just add 0.5 and truncate. Whichever floats your boat.
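The "add 0.5 and truncate" suggestion can be sketched in C, with some assumptions. C's (int) cast truncates toward zero, so for negative inputs this sketch subtracts 0.5 instead. Ties go away from zero (2.5 becomes 3), which differs from the IEEE-754 ties-to-even rule. The result is assumed to fit in an int, and round_add_half is a hypothetical name:

```c
/* Illustrative sketch; the function name is hypothetical.
   (int) truncates toward zero, so negative inputs use -0.5.
   Ties round away from zero; assumes the result fits in an int. */
int round_add_half(float x) {
    return (x >= 0.0f) ? (int)(x + 0.5f) : (int)(x - 0.5f);
}
```

For example, round_add_half(21.9f) gives 22 and round_add_half(-21.9f) gives -22.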

Resources