Arduino Arithmetic error negative result - arduino

I am trying to times 52 by 1000 and i am getting a negative result
int getNewSum = 52 * 1000;
but the following code is ouputting a negative result: -13536

An explanation of how two's complement representation works is probably better given on Wikipedia and other places than here. What I'll do here is to take you through the workings of your exact example.
The int type on your Arduino is represented using sixteen bits, in a two's complement representation (bear in mind that other Arduinos use 32 bits for it, but yours is using 16.) This means that both positive and negative numbers can be stored in those 16 bits, and if the leftmost bit is set, the number is considered negative.
What's happening is that you're overflowing the number of bits you can use to store a positive number, and (accidentally, as far as you're concerned) setting the sign bit, thus indicating that the number is negative.
In 16 bits on your Arduino, decimal 52 would be represented in binary as:
0000 0000 0011 0100
(2^5 + 2^4 + 2^2 = 52)
However, the result of multiplying 52 by 1,000 -- 52,000 -- will end up overflowing the magnitude bits of an int, putting a '1' in the sign bit on the end:
*----This is the sign bit. It's now 1, so the number is considered negative.
1100 1011 0010 0000
(typically, computer integer arithmetic and associated programming languages don't protect you against doing things like this, for a variety of reasons, mostly related to efficiency, and mostly now historical.)
Because that sign bit on the left-hand end is set, to convert that number back into decimal from its assumed two's complement representation, we assume it's a negative number, and then first take the one's complement (flipping all the bits):
0011 0100 1101 1111
-- which represents 13,535 -- and add one to it, yielding 13,536, and call it negative: -13,536, the value you're seeing.
If you read up on two's complement/integer representations in general, you'll get the hang of it.
In the meantime, this probably means you should be looking for a bigger type to store your number. Arduino has unsigned integers, and a long type, which will use four bytes to store your numbers, giving you a range from -2,147,483,648 to 2,147,483,647. If that's enough for you, you should probably just switch to use long instead of int.

Matt's answer is already a very good in depth explanation of what's happening, but for those looking for a more TL;dr practical answer:
Problem:
This happens quite often for Arduino programmers when they try to assign (= equal sign) the result of an arithmetic (usually multiplication) to a normal integer (int). As mentioned, when the result is bigger than the memory size compiler has assigned to the variables, overflowing happens.
Solution 1:
The easiest solution is to replace the int type with a bigger datatype considering your needs. As this tutorialspoint.com tutorial has explained there are different integer types we can use:
int:
16 bit: from -32,768 to 32,767
32 bit: from -2,147,483,648 to 2,147,483,647
unsigned int: from 0 to 65,535
long: from 2,147,483,648 to 2,147,483,647
unsigned long: from 0 to 4,294,967,295
Solution 2:
This works only if you have some divisions with big enough denominators, in your arithmetic. In Arduino compiler multiplication is calculated prior to the division. so if you have some divisions in your equation try to encapsulate them with parentheses. for example if you have a * b / c replace it with a * (b / c).

Related

How to Add/Subtract binary

I understand that when subtracting binaries you should convert the second binary to it's 2's Complement. But in the following case:
01101+11110
The 11110 was converted to its 2's Complement. In other words the working equation is now:
01101+00010
Now I am pretty confused on when I should be converting. Any help would be greatly appreciated!
What I commented maybe not precise enough, though. I mistake CARRY for OVERFLOW.
-
So explain it little further
Your operation is:
01101 (binary) + 11110 (binary) = 01011 (binary)
-
if signed (CORRECT RESULT):
13 (decimal) + -2 (decimal) = 11 (decimal)
if unsigned (UNDESIRED RESULT):
13 (decimal) + 30 (decimal) = 11 (decimal)
-
There are two things in a binary arithmetic operation called separately OVERFLOW and CARRY.
For addition operation, The CARRY flag is set if the
addition of two numbers causes a carry out of the most significant
(leftmost) bits added.
example:
1111 + 0001 = 0000 (carry flag is turned on)
0111 + 0001 = 1000 (carry flag is turned off)
For subtraction operation, The CARRY flag is set if the
the subtraction of two numbers requires a borrow into the most significant (leftmost)
bits subtracted.
example:
0000 - 0001 = 1111 (carry flag is turned on)
1000 - 0001 = 0111 (carry flag is turned off)
For addition operation, the OVERFLOW flag is set if the
addition of two numbers with the same most significant bit (both 0 or
both 1) and the result produced a binary with a different most
significant bit.
example:
0100 + 0100 = 1000 (overflow flag is turned on)
0100 + 0001 = 0101 (overflow flag is turned off)
-
Note that:
In unsigned arithmetic, watch the carry flag to detect errors.
In signed arithmetic, the carry flag tells you nothing interesting.
-
In signed arithmetic, watch the overflow flag to detect errors.
In unsigned arithmetic, the overflow flag tells you nothing interesting.
-
check error in adding unsigned binaries in assembly language
ADD BX, AX
//check carry here
check error in subtracting unsigned binaries in assembly language
SUB BX, AX
//check carry here
check error in adding signed binaries in assembly language
ADD BX, AX
//check overflow here
check error in subtracting signed binaries in assembly language
SUB BX, AX
//check overflow here
See this for detail of the flag sets in ADD/SUB
carry/overflow & subtraction in x86
As you have understood correctly you want to do a twos complement. Logic will use addition to do subtraction by taking advantage of the beauty of twos complement.
One of the ways we are taught to "take the twos complement" is to invert (ones complement) and add one.
So if you want to do 4 - 7 (0b00100 - 0b00111)
That is the same as 0b00100 + 0b11000 + 1, can take advantage of the carry in of the lsbit:
1
00100
+ 11000
========
Do the grade school math in base 2.
000001
00100
+ 11000
========
11101
which is -3 if this is interpreted as a signed operation (same add/subtract logic for signed and unsigned, processor doesn't know, only the programmer does).
I assume instead of
01101+11110
you wanted to subtract those two numbers
01101-11110 13 - (-2) if interpreted as signed numbers
000011
01101
+ 00001
========
01111
13 + 2 = 15
the carry in and the carry out of the msbit are the same so there is no signed overflow the result is good. Note there isn't an unsigned overflow either.
So for subtraction you invert the carry in and the second operand from what you would for addition. Some processor architectures invert the carry out to indicate a borrow, some don't. You have check the architecture of the processor you are using to know if yours does. Unfortunately few do a good job at that level of documentation so sometimes you have to figure it out experimentally.
For addition you do not invert any of the operands into the adder. The operation (add vs sub) determines whether or not you invert on the way into the adder.

How to tell if hex value is negative?

I have just learned how to read hexadecimal values. Until now, I was only reading them as positive numbers. I heard you could also write negative hex values.
My issue is that I can't tell if a value is negative or positive.
I found a few explanations here and there but if I try to verify them by using online hex to decimal converters, they always give me different results.
Sources I found:
https://stackoverflow.com/a/5827491/5016201
https://coderanch.com/t/246080/java-programmer-OCPJP/certification/Negative-Hexadecimal-number
If I understand correctly it means that: If a hex value written with all its bits having something > 7 as its first hex digit, it is negative.
All 'F' at the beginning or the first digit means is that the value is negative, it is not calculated.
For exemple if the hex value is written in 32 bits:
FFFFF63C => negative ( -2500 ?)
844fc0bb => negative ( -196099909 ?)
F44fc0bb => negative ( -196099909 ?)
FFFFFFFF => negative ( -1 ?)
7FFFFFFF => positive
Am I correct? If not, could you tell me what I am not getting right?
Read up on Two's complement representation: https://en.wikipedia.org/wiki/Two%27s_complement
I think that the easiest way to understand how negative numbers (usually) are treated is to write down a small binary number and then figure out how to do subtraction by one. When you reach 0 and apply that method once again - you'll see that you suddenly get all 1's. And that is how "-1" is (usually) represented: all ones in binary or all f's in hexadecimal. Commonly, if you work with signed numbers, they are represented by the first (most significant) bit being one. That is to say that if you work with a number of bits that is a multiple of four, then a number is negative if the first hexadecimal digit is 8,9,A,B,C,D,E or F.
The method to do negation is:
invert all the bits
add 1
Another benefit from this representation (two's complement) is that you only get one representation for zero, which would not be the case if you marked signed numbers by setting the MSB or just inverting them.
From what I understand, you always need to look at the left-most digit to tell the sign. If in hex, then anything from 0-7 is positive and 8-f is negative. Alternatively, you can convert from hex to binary, and if there's a 1 in the left-most digit, then the number is negative.
HEX <-> BINARY <-> SIGN
0-7 <-> 0000-0111 <-> pos
8-F <-> 1000-1111 <-> neg
The answer here in the forum looks good:
Each hexadecimal "digit" is 4 bits. The d in the high order position
is 1101. So you see it's got a high bit of one, therefore the whole
number is negative.
and
A hex number is always positive (unless you specifically put a minus
sign in front of it). It might be interpreted as a negative number
once you store it in a particular data type. Only then does the most
significant bit (MSB) matter, but it's the MSB of the number "as
stored in that data type". In that respect the answers above are only
partially correct: only in the context of an actual data type (like an
int or a long) does the MSB matter.
If you store "0xdcafe" in an int, the representation of it would be
"0000 0000 0000 1101 1100 1010 1111 1110" - the MSB is 0. Whereas the
representation of "0xdeadcafe" is "1101 1110 1010 1101 1100 1010 1111
1110" - the MSB is 1.
You can tell whether a hexadecimal integer is positive or negative by inspecting its most significant (highest) digit. If the digit is ≥ 8, the number is negative; if the digit is ≤ 7, the number is positive. For example, hexadecimal 8A20 is negative and 7FD9 is positive

Adding Two's Complement binary

Ladies and gentlemen, I have successfully been able to understand adding, etc. for unsigned binary numbers. However, this Two's complement has me beat.. Let me explain with some examples.
Example practice problem, perform each arithmetic using 8-bit signed integer storage system, under two's complement:
1111111
01100001
+ 00111111
----------
10100000 <== My Answer (idk if right / wrong), but i think right.
So it doesn't exceed 8 bits, but we have changed (positive + positive) = negative. That has to be an overflow, because the sign is changing, right? (I never understood what the carry in MSB and carry out MSB's were).
The reallllly tricky part for me are the following equations: (negative + negative) which in reality is equal to (negative - positive).
111111
10111111
+ 10010101
----------
1 | 01010100
So I think this should be wrong because when we discard the overflow bit (the 1 that is way out in left field), it turns the 8-bit representation into a positive number, when it should be a negative. SO this would entail an overflow, no?
The following equation is similar:
1111
10001110
+ 10110101
----------
1 | 01000011
Understandably, if we were working with 16-bit, etc. Then these wouldn't be overflows, because the signs aren't changing, the math is correct. But since when we store the 8-bit representation of these numbers, we lose the MSB, that would flip the signs.
But one thing that I noticed about my theory is, whenever adding two negative numbers, the MSB's will obviously always be 1, therefore you will always have a carry, which means you would always have an overflow.
** I think the more logical conclusion is that I am forgetting to convert the second negative to a positive or something prior to adding them, or something along those lines. But I've tried youtube and various research online. And TBH, my professor is terrible with the whole "communication" thing.. I would appreciate any help the community can give, so I can push past these problems and onto harder material XD.
Yes, if you discard a 1 carry bit then that signals overflow. Don't worry about overflow. It is correct to discard the carry bit.
But since when we store the 8-bit representation of these numbers, we lose the MSB, that would flip the signs.
It's important not to think of discarding a 1 bit as flipping the sign. First, discarding a carry bit is not the same as flipping the sign bit. You can discard the carry bit from a negative result and still end up with a negative answer. For instance:
1111111
11111111 (-1)
+ 11111111 (-1)
--------
1 | 11111110 (-2)
The final 1 carry bit is discarded, but the answer's sign doesn't flip.
Second, even if you're thinking of flipping (as opposed to discarding) the sign bit, it's not good to think of that as flipping the sign. In sign magnitude representation flipping the sign bit will flip the sign of the number. But in one's and two's complement, negation is more than just flipping the leftmost bit. If you just flip the bit you get a very different number. Yes, it has the opposite sign, but it's not the same number.
Sign Mag. | One's Compl. | Two's Compl.
01111111 = 127 | 127 | 127
11111111 = -127 | -0 | -1
Yes, your math is correct.
The elegance of two's compliment is that addition "just works" without any special considerations for the sign bit. The reason both of those subtractions underflow is because the magnitudes of the numbers are already quite large.
Let's do the last two questions in decimal:
(-65)
+ (-107)
--------
(-172) which underflows to 84.
(-114)
+ (-75)
--------
(-189) which underflows to 67.
The lowest signed 8-bit value is -128, so both of them underflow.

What determines which system is used to translate a base 10 number to decimal and vice-versa?

There are a lot of ways to store a given number in a computer. This site lists 5
unsigned
sign magnitude
one's complement
two's complement
biased (not commonly known)
I can think of another. Encode everything in Ascii and write the number with the negative sign (45) and period (46) if needed.
I'm not sure if I'm mixing apples and oranges but today I heard how computers store numbers using single and double precision floating point format. In this everything is written as a power of 2 multiplied by a fraction. This means numbers that aren't powers of 2 like 9 are written as a power of 2 multiplied by a fraction e.g. 9 ➞ 16*9/16. Is that correct?
Who decides which system is used? Is it up to the hardware of the computer or the program? How do computer algebra systems handle transindental numbers like π on a finite machine? It seems like things would be a lot easier if everything's coded in Ascii and the negative sign and the decimal is placed accordingly e.g. -15.2 would be 45 49 53 46 (to base 10)
➞
111000 110001 110101 101110
Well there are many questions here.
The main reason why the system you imagined is bad, is because the lack of entropy. An ASCII character is 8 bits, so instead of 2^32 possible integers, you could represent only 4 characters on 32 bits, so 10000 integer values (+ 1000 negative ones if you want). Even if you reduce to 12 codes (0-9, -, .) you still need 4 bits to store them. So, 10^8+10^7 integer values, still much less than 2^32 (remember, 2^10 ~ 10^3). Using binary is optimal, because our bits only have 2 values. Any base that is a power of 2 also makes sense, hence octal and hex -- but ultimately they're just binary with bits packed per 3 or 4 for readability. If you forget about the sign (just use one bit) and the decimal separator, you get BCD : Binary Coded Decimals, which are usually coded on 4 bits per digit though a version on 8 bits called uncompressed BCD also seems to exist. I'm sure with a bit of research you can find fixed or floating point numbers using BCD.
Putting the sign in front is exactly sign magnitude (without the entropy problem, since it has a constant size of 1 bit).
You're roughly right on the fraction in floating point numbers. These numbers are written with a mantissa m and an exponent e, and their value is m 2^e. If you represent an integer that way, say 8, it would be 1x2^3, then the fraction is 1 = 8/2^3. With 9 that fraction is not exactly representable, so instead of 1 we write the closest number we can with the available bits. That is what we do as well with irrational (and thus transcendental) numbers like Pi : we approximate.
You're not solving anything with this system, even for floating point values. The denominator is going to be a power of 10 instead of a power of 2, which seems more natural to you, because it is the usual way we write rounded numbers, but is not in any way more valid or more accurate. ** Take 1/6 for example, you cannot represent it with a finite number of digits in the form a/10^b. *
The most popular representations for negative numbers is 2's complement, because of its nice properties when adding negative and positive numbers.
Standards committees (argue a lot internally and eventually) decide what complex number formats like floating points look like, and how to consistently treat corner cases. E.g. should dividing by 0 yield NaN ? Infinity ? An exception ? You should check out the IEEE : www.ieee.org . Some committees are not even agreeing yet, for example on how to represent intervals for interval arithmetic. Eventually it's the people who make the processors who get the final word on how bits are interpreted into a number. But sticking to standards allows for portability and compatibility between different processors (or coprocessors, what if your GPU used a different number format ? You'd have more to do than just copy data around).
Many alternatives to floating point values exist, like fixed point or arbitrary precision numbers, logarithmic number systems, rational arithmetic...
* Since 2 divides 10, you might argue that all the numbers representable by a/2^b can be a5^b/10^b, so less numbers need to be approximated. That only covers a minuscule family (an ideal, really) of the rational numbers, which are an infinite set of numbers. So it still doesn't solve the need for approximations for many rational, as well as all irrational numbers (as Pi).
** In fact, because of the fact that we use the powers of 2 we pack more significant digits after the decimal separator than we would with powers of 10 (for a same number of bits). That is, 2^-(53+e), the smallest bit of the mantissa of a double with exponent e, is much smaller than what you can reach with 53 bits of ASCII or 4-bit base 10 digits : at best 10^-4 * 2^-e

How is overflow detected at the binary level?

I'm reading the textbook Computer Organization And Design by Hennessey and Patterson (4th edition). On page 225 they describe how overflow is detected in signed, 2's complement arithmetic. I just can't even understand what they're talking about.
"How do we detect [overflow] when it does occur? Clearly, adding or
substracting two 32-bit numbers can yield a result that needs 33 bits
to be fully expressed."
Sure. And it won't need 34 bits because even the smallest 34 bit number is twice the smallest 33 bit number, and we're adding 32 bit numbers.
"The lack of a 33rd bit means that when overflow occurs, the sign bit
is set with the value of the result instead of the proper sign of
the result."
What does this mean? The sign bit is set with the "value" of the result... meaning it's set as if the result were unsigned? And if so, how does that follow from the lack of a 33rd bit?
"Since we need just one extra bit, only the sign bit can be wrong."
And that's where they lost me completely.
What I'm getting from this is that, when adding signed numbers, there's an overflow if and only if the sign bit is wrong. So if you add two positives and get a negative, or if you add two negatives and get a positive. But I don't understand their explanation.
Also, this only applies to unsigned numbers, right? If you're adding signed numbers, surely detecting overflow is much simpler. If the last half-adder of the ALU sets its carry bit, there's an overflow.
note: I really don't know what tags are appropriate here, feel free to edit them.
Any time you want to deal with these kind of ALU items be it add, subtract, multiply, etc, start with 2 or 3 bit numbers, much easier to get a handle on than 32 or 64 bit numbers. After 2 or 3 bits it doesn't matter if it is 22 or 2200 bits it all works exactly the same from there on out. Basically you can by hand if you want make a table of all 3 bit operands and their results such that you can examine the whole table visually, but a table of all 32 bit operands against all 32 bit operands and their results, can't do that by hand in a reasonable time and cannot examine the whole table visually.
Now twos complement, that is just a scheme for representing positive and negative numbers, and it is not some arbitrary thing it has a reason, the reason for the madness is that your adder logic (which is also what the subtractor uses which is the same kind of thing the multiplier uses) DOES NOT CARE ABOUT UNSIGNED OR SIGNED. It does not know the difference. YOU the programmer cares in my three bit world the bit pattern 0b111 could be a positive seven (+7) or it could be a negative one. Same bit pattern, feed it to the add logic and the same thing comes out, and the answer that comes out I can choose to interpret as unsigned or twos complement (so long as I interpret the operands and the result all as either unsigned or all as twos complement). Twos complement also has the feature that for negative numbers the most significant bit (msbit) is set, for positive numbers it is zero. So it is not sign plus magnitude but we still talk about the msbit being the sign bit, because except for two special numbers that is what it is telling us, the sign of the number, the other bits are actually telling us the magnitude they are just not an unsigned magnitude as you might have in sign+magnitude notation.
So, the key to this whole question is understanding your limits. For a 3 bit unsigned number our range is 0 to 7, 0b000 to 0b111. for a 3 bit signed (twos complement) interpretation our range is -4 to +3 (0b100 to 0b011). For now limiting ourselves to 3 bits if you add 7+1, 0b111 + 0b001 = 0b1000 but we only have a 3 bit system so that is 0b000, 7+1 = 8, we cannot represent 8 in our system so that is an overflow, because we happen to be interpreting the bits as unsigned we look at the "unsigned overflow" which is also known as the carry bit or flag. Now if we take those same bits but interpret them as signed, then 0b111 (-1) + 0b001 (+1) = 0b000 (0). Minus one plus one is zero. No overflow, the "signed overflow" is not set...What is the signed overflow?
First what is the "unsigned overflow".
The reason why "it all works the same" no matter how many bits we have in our registers is no different than elementary school math with base 10 (decimal) numbers. If you add 9 + 1 which are both in the ones column you say 9 + 1 = zero carry the 1. you carry a one over to the tens column then 1 plus 0 plus 0 (you filled in two zeros in the tens column) is 1 carry the zero. You have a 1 in the tens column and a zero in the ones column:
1
09
+01
====
10
What if we declared that we were limited to only numbers in the ones column, there isn't any room for a tens column. Well that carry bit being a non-zero means we have an overflow, to properly compute the result we need another column, same with binary:
111
111
+ 001
=======
1000
7 + 1 = 8, but we cant do 8 if we declare a 3 bit system, we can do 7 + 1 = 0 with the carry bit set. Here is where the beauty of twos complement comes in:
111
111
+ 001
=======
000
if you look at the above three bit addition, you cannot tell by looking if that is 7 + 1 = 0 with the carry bit set or if that is -1 + 1 = 0.
So for unsigned addition, as we have known since grade school that a carry over into the next column of something other than zero means we have overflowed that many placeholders and need one more placeholder, one more column, to hold the actual answer.
Signed overflow. The sort of academic answer is if the carry in of the msbit column does not match the carry out. Let's take some examples in our 3 bit world. So with twos complement we are limited to -4 to +3. So if we add -2 + -3 = -5 that wont work correct?
To figure out what minus two is we do an invert and add one 0b010, inverted 0b101, add one 0b110. Minus three is 0b011 -> 0b100 -> 0b101
So now we can do this:
abc
100
110
+ 101
======
011
If you look at the number under the b that is the "carry in" to the msbit column, the number under the a the 1, is the carry out, these two do not match so we know there is a "signed overflow".
Let's try 2 + 2 = 4:
abc
010
010
+ 010
======
100
You may say but that looks right, sure unsigned it does, but we are doing signed math here, so the result is actually a -4 not a positive 4. 2 + 2 != -4. The carry in which is under the b is a 1, the carry out of the msbit is a zero, the carry in and the carry out don't match. Signed overflow.
There is a shortcut to figuring out the signed overflow without having to look at the carry in (or carry out). if ( msbit(opa) == msbit(opb) ) && ( msbit(res) != msbit(opb) ) signed overflow, else no signed overflow. opa being one operand, opb being the other and res the result.
010
+ 010
======
100
Take this +2 + +2 = -4. msbit(opa) and msbit(opb) are equal, and the result msbit is not equal to opb msbit so this is a signed overflow. You could think about it using this table:
x ab cr
0 00 00
0 01 01
0 10 01
0 11 10 signed overflow
1 00 01 signed overflow
1 01 10
1 10 10
1 11 11
This table is all the possible combinations if carry in bit, operand a, operand b, carry out and result bit for a single column turn your head sideways to the left to sort of see this x is the carry in, a and b columns are the two operands. cr as a pair is the result xab of 011 means 0+1+1 = 2 decimal which is 0b10 binary. So taking the rule that has been dictated to us, that if the carry in and carry out do not match that is a signed overflow. Well the two cases where the item in the x column does not match the item in the c column are indicated those are the cases where a and b inputs match each other, but the result bit is the opposite of a and b. So assuming the rule is correct this quick shortcut that does not require knowing what the carry bits are, will tell you if there was a signed overflow.
Now you are reading an H&P book. Which probably means mips or dlx, neither mips or dlx deal with carry and signed flags in the way that most other processors do. mips is not the best first instruction set IMO primarily for that reason, their approach is not wrong in any way, but being the oddball, you will spend forever thinking differently and having to translate when going to most other processors. Where if you learned the typical znvc flags (zero flag, negative flag, v=signed overflow, c=carry or unsigned overflow) way then you only have to translate when going to mips. Normally these are computed on every alu operation (for the non-mips type processors) you will see signed and unsigned overflow being computed for add and subtract. (I am used to an older mips, maybe this gen of books and the current instruction set has something different). Calling it addu add unsigned right at the start of mips after learning all of the above about how an adder circuit does not care about signed vs unsigned, is a huge problem with mips it really puts you in the wrong mindset for understanding something this simple. Leads to the belief that there is a difference between signed addition and unsigned addition when there isn't. It is only the overflow flags that are computed differently. Now multiply, and divide there is definitely a twos complement vs unsigned difference and you ideally need a signed multiply and an unsigned multiply or you need to deal with the limitation.
I recommend a simple (depending on how strong your bit manipulation is and twos complement) exercise that you can write in some high level language. Basically take all the combinations of unsigned numbers 0 to 7 added to 0 to 7 and save the result. Print out both as decimal and as binary (three bits for operands, four bits for result) and if the result is greater than 7 print overflow as well. Repeat this using signed variables using the numbers -4 to +3 added to -4 to +3. print both decimal with a +/- sign and the binary. If the result is less than -4 or greater than +3 print overflow. From those two tables you should be able to see that the rules above are true. Looking strictly at the operand and result bit patterns for the size allowed (three bits in this case) you will see that the addition operation gives the same result, same bit pattern for a given pair of inputs independent of whether those bit patterns are considered unsigned or twos complement. Also you can verify that unsigned overflow is when the result needs to use that fourth column, there is a carry out off of the msbit. For signed when the carry in doesn't match the carry out, which you see using the shortcut looking at the msbits of the operands and result. Even better is to have your program do those comparisons and print out something. So if you see a note in your table that the result is greater than 7 and a note in your table that bit 3 is set in the result, then you will see for the unsigned table that is always the case (limited to inputs of 0 to 7). And the more complicated one, signed overflow, is always when the result is less than -4 and greater than 3 and when the operand upper bits match and the result upper bit does not match the operands.
I know this is super long and very elementary. If I totally missed the mark here, please comment and I will remove or re-write this answer.
The other half of the twos complement magic. Hardware does not have subtract logic. One way to "convert" to twos complement is to "invert and add one". If I wanted to subtract 3 - 2 using twos complement what actually happens is that is the same as +3 + (-2) right, and to get from +2 to to -2 we invert and add one. Looking at our elementary school addition, did you notice the hole in the carry in on the first column?
111H
111
+ 001
=======
1000
I put an H above where the hole is. Well that carry in bit is added to the operands right? Our addition logic is not a two input adder it is a three input adder yes? Most of the columns have to add three one bit numbers in order to compute two operands. If we use a three input adder on the first column now we have a place to ... add one. If I wanted to subtract 3 - 2 = 3 + (-2) = 3 + (~2) + 1 which is:
1
011
+ 101
=====
Before we start and filled in it is:
1111
011
+ 101
=====
001
3 - 2 = 1.
What the logic does is:
if add then carry in = 0; the b operand is not inverted, the carry out is not inverted.
if subtract then carry in = 1; the b operand is inverted, the carry out MIGHT BE inverted.
The addition above shows a carry out, I didn't mention that this was an unsigned operation 3 - 2 = 1. I used some twos complement tricks to perform an unsigned operation, because here again no matter whether I interpret the operands as signed or unsigned the same rules apply for if add or if subtract. Why I said that the carry out MIGHT BE inverted is that some processors invert the carry out and some don't. It has to do with cascading operations, taking say a 32 bit addition logic and using the carry flag and an add with carry or subtract with borrow instruction creating a 64 bit add or subtract, or any multiple of the base register size. Say you have two 64 bit numbers in a 32 bit system a:b + c:d where a:b is the 64 bit number but it is held in the two registers a and b where a is the upper half and b is the lower half. so a:b + c:d = e:f on a 32 bit system unsigned that has a carry bit and add with carry:
add f,b,d
addc e,a,c
The add leaves its carry out bit from the upper most bit lane in the carry flag in the status register, the addc instruction is add with carry takes the operands a+c and if the carry bit is set then adds one more. a+c+1 putting the result in e and the carry out in the carry flag, so:
add f,b,d
addc e,a,c
addc x,y,z
Is a 96 bit addition, and so on. Here again something very foreign to mips since it doesn't use flags like other processors. Where the invert or don't invert comes in for signed carry out is on the subtract with borrow for a particular processor. For subtract:
if subtract then carry in = 1; the b operand is inverted, the carry out MIGHT BE inverted.
For subtract with borrow you have to say if the carry flag from the status register indicates a borrow then the carry in is a 0 else the carry in is a 1, and you have to get the carry out into the status register to indicate the borrow.
Basically for the normal subtract some processors invert b operand and carry on in the way in and carry out on the way out, some processors invert the b operand and carry in in the way in but don't invert carry out on the way out. Then when you want to do a conditional branch you need to know if the carry flag means greater than or less than (often the syntax will have a branch if greater or branch if less than and sometimes tell you which one is the simplified branch if carry set or branch if carry clear). (If you don't "get" what I just said there that is another equally long answer which won't mean anything so long as you are studying mips).
As a 32-bit signed integers are represented by 1 sign-bit and 31 bits for the actual number we are effectively adding two 31 bit-numbers. Hence the 32nd bit (sign bit) will be where the overflow will be visible.
"The lack of a 33rd bit means that when overflow occurs, the sign bit is set with the value of the result instead of the proper sign of the result."
Imagine the following addition of two positive numbers (16 bit to simpify):
0100 1100 0011 1010 (19514)
+ 0110 0010 0001 0010 (25106)
= 1010 1110 0110 1100 (-20884 [or 44652])
For the summation of two large negative numbers however the extra bit would be required
1100 1100 0011 1010
+ 1110 0010 0001 0010
=11010 1110 0110 1100
Usually the CPU have this 33rd bit (or whatever bitsize it operates on +1) exposed as a overflow-bit in the micro-architecture.
Their description relates to operations on values with a particular bit sequence: the first bit corresponds to the sign of the value, and the other bits relate to the magnitude of that value.
What does this mean? The sign bit is set with the "value" of the result...
They mean that the overflow bit - the one that is a consequence of adding two numbers that need to spill into the next digit over - is dumped into the same place that the sign bit should be.
"Since we need just one extra bit, only the sign bit can be wrong."
All this means is that, when you perform arithmetic that overflows, the only bit whose value may be incorrect is the sign bit. All of the other bits are still the value they should be.
This is a consequence of what was described above: confusion between the sign bit's value due to overflow.

Resources