Does the 6502 use signed or unsigned 8-bit registers (JAVA)?

I'm writing an emulator for the 6502. There are some instructions where an offset is saved in one of the registers (mostly X and Y), and since branch instructions use signed 8-bit integers, I'm wondering: do the registers keep their values as signed 8-bit? Meaning this:
switch (opcode) {
// Bunch of opcodes
case 0xD5:
    // Read the memory area, with the final address being address + X offset
    int tempResult = a - readMemory(address + x);
    // Comparing some things, setting/disabling flags
    // Incrementing program counter and cycles/ticks
    break;
// More opcodes
}
Let's say in this situation that x = 0xEE. In regular binary, this would mean that x = 238. The 6502's branch instructions, however, use a signed offset for jumping to memory addresses, so I'm wondering: is the 238 interpreted as -18 in this case, or is it just a regular unsigned 8-bit value?

It varies.
They're not explicitly signed or unsigned for arithmetic, logical, shift, or load and store operations.
The conditional branches (and the unconditional one on the later 6502 descendants) all take the argument as signed; otherwise loops would be extremely awkward.
zero, x addressing is achieved by performing an 8-bit addition of x to the zero page address, ignoring carry, and reading from the zero page. So e.g.
LDX #-126 ; which is +130 if unsigned
LDA 23, x
Would read from address 23+130 = 153. But had it been 223+130 then the end read would have been from (223 + 130) MOD 256 = 97.
absolute, x/y is unsigned and carry works correctly (but costs an extra cycle)
(zero, x) is much like the direct version in that the offset is signed but the result is always within the zero page. Then the real address is read from there.
(zero), y is unsigned with carry working and costing.
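In emulator terms, then, the register just holds eight bits and each addressing mode decides what they mean. A minimal Java sketch, assuming the emulator keeps registers and addresses in ints; the helper names here are made up for illustration:
int zeroPageX(int zpAddress, int x) {
    return (zpAddress + x) & 0xFF;        // 8-bit add, carry ignored: stays in the zero page
}
int absoluteX(int address, int x) {
    return (address + x) & 0xFFFF;        // unsigned add, carry propagates into the high byte
}
int branchTarget(int pc, int offset) {
    return (pc + (byte) offset) & 0xFFFF; // the (byte) cast sign-extends, so 0xEE becomes -18
}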

The "sign" is simply the value of the most significant (aka bit 7) in an 8-bit byte.
6502 has support for signed values in these ways:
The N bit in .P - but it really just tells you if the last instruction turned on or off bit 7 of a memory location or register. It was common to use BPL/BMI to do stuff based on bit 7 in a memory location for flag or "boolean" like use.
The V bit of .P which is flipped "when the result of adding two positive numbers overflows and ends up negative, and when the result of adding two negative numbers overflows and ends up positive"
And of course obeying the sign bit for relative branch instructions only, e.g. BEQ with a value with bit 7 set will move to a lower memory location, not a higher one.
Beyond that, whether that bit means anything is completely up to you and your program. What really makes numbers signed or unsigned is how you display the numbers.
The linked article above goes into what one's complement and two's complement is and how it makes the mathematics work without the 6502 having to care too much about the sign.
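Translated to the emulator in the question, both flags fall out of the raw bits after an ADC-style addition. A minimal Java sketch using the usual bit-7 formulas (the variable names are hypothetical, and storing the flags into the emulated P register is left out):
void setFlagsAfterAdc(int a, int operand, int carry) {
    int result = (a + operand + carry) & 0xFF;
    boolean n = (result & 0x80) != 0;                          // N: just bit 7 of the result
    boolean v = ((~(a ^ operand) & (a ^ result)) & 0x80) != 0; // V: operand signs agree, result sign differs
    // write n and v into the emulated P register here
}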

Related

Advantages and disadvantages of single numeric (float) data type

Why do we use various data types in programming languages? Why not use float everywhere? I have heard some arguments, like:
Arithmetic on int is faster (but why?)
It takes more memory to store float. (I get it.)
What are the additional benefits of using various numeric data types?
Arithmetic on integers has traditionally been faster because it's a simpler operation. It can be implemented in logic gates and, if properly designed, the whole thing can happen in a single clock cycle.
On most modern PCs floating-point support is actually quite fast, because loads of time has been invested into making it fast. It's only on lower-end processors (like Arduino, or some versions of the ARM platform) where floating point seriously suffers, or is absent from the CPU altogether.
A floating point number contains a few different pieces of data: there's a sign bit, and the mantissa, and the exponent. To put those three parts together to determine the value they represent, you do something like this:
value = sign * mantissa * 2^exponent
It's a little more complicated than that, because floating point numbers optimize how they store the mantissa a bit. For instance, the first bit of the mantissa is assumed to be 1, so it doesn't actually need to be stored. But this also means zero has to be stored a particular way, and there are various "special values" that can be stored in floats, like "not a number" and infinity, that have to be handled correctly when working with floats.
So to store the number "3" you'd have a mantissa of 0.75 and an exponent of 2. (0.75 * 2^2 = 3).
But then to add two floats together, you first have to align them. For instance, 3 + 10:
m3 = 0.75 (stored as binary (1)1000000... the first (1) implicit and not actually stored)
e3 = 2
m10 = .625 (stored as binary (1)010000...)
e10 = 4 (.625 * 2^4 = 10)
You can't just add m3 and m10 together, 'cause you'd get the wrong answer. You first have to shift m3 over by a couple bits to get e3 and e10 to match, then you can add the mantissas together and reassemble the result into a new floating point number. A CPU with good floating-point implementation will do all that for you, of course, and do it fast.
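If you want to see those pieces in a real float, Java exposes them via Float.floatToIntBits. A small sketch (note that IEEE 754 normalizes the mantissa to the form 1.xxx rather than the 0.xxx convention used above, so the exponent comes out one smaller):
int bits = Float.floatToIntBits(3.0f);
int sign     = bits >>> 31;                  // 1 bit
int exponent = ((bits >>> 23) & 0xFF) - 127; // 8 bits, stored with a bias of 127
int mantissa = bits & 0x7FFFFF;              // 23 stored bits; the leading 1 is implicit
// for 3.0f: sign = 0, exponent = 1, mantissa = 0x400000, i.e. 1.5 * 2^1 = 3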
So why else would you not want to use floating point values for everything? Well, for starters there's the problem of exactness. If you add or multiply two integers to get another integer, as long as you don't exceed the limits of your integer size, the answer you get will be exactly correct. This isn't the case with floating-point. For instance:
x = 1000000000.0
y = .0000000001
for (cc = 0; cc < 1000000000; cc++) { x += y; }
Logically you'd expect the final value of (x) to be 1000000000.1, but that's almost certainly not what you're going to get. When you add (y) to (x), the change to (x)'s mantissa may be so small that it doesn't even fit into the float, and so (x) may not change at all. And even if that's not the case, (y)'s value is not exact. There are no two integers (a, b) such that (a * 2^b = 10^-10). That's true for many common decimal values, actually. Even something simple like 0.3 can't be stored as an exact value in a binary floating-point number.
So (y) isn't exactly 10^-10; it's actually off by some small amount. For a 64-bit (double-precision) floating point number it'll be off by about 10^-26:
y = 10^-10 + error, error is about 10^-26
Then if you add (y) together ten billion times, the error is magnified by about ten billion times as well, so your final error is around 10^-16.
A good floating-point implementation will try to minimize these errors, but it can't always get it right. The problem is fundamental to how the numbers are stored, and to some extent unavoidable. As a result, for instance, even though it seems natural to store a money value in a float, it might be preferable to store it as an integer instead (a count of cents, say), to get that assurance that the value is always exact.
The "exactness" issue also means that when you test the value of a floating point number, generally speaking, you can't use exact comparisons. For instance:
x = 11.0 / 500
if (x * 50 == 1.1) { ... }  // It doesn't!
for (float x = 0.0; x < 1.0; x += 0.01) { print x; }
// prints 101 values instead of 100, the last one being 0.9999999...
The test fails because (x) isn't exactly the value we specified, and 1.1, when encoded as a float, isn't exactly the value we specified either. They're both close but not exact. So you have to do inexact comparisons:
if (abs(x - expected_value) < small_value) {...
Choosing the correct "small_value" is a problem unto itself. It can depend on what you're doing with the values, what kind of behavior you're trying to achieve.
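Here is the same failure as runnable Java; the 1e-9 tolerance is an arbitrary choice for this example:
double x = 11.0 / 500;
System.out.println(x * 50 == 1.1);                 // false: both sides are inexact
System.out.println(Math.abs(x * 50 - 1.1) < 1e-9); // true: compare within a tolerance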
Finally, if you look at the "it takes more memory" issue, you can also turn that around and think of it in terms of what you get for the memory you use.
If you can work with integer math for your problem, a 32-bit unsigned integer lets you work with (exact) values between 0 and around 4 billion.
If you're using 32-bit floats instead of 32-bit integers, you can store larger values than 4 billion, but you're still limited by the representation: of those 32 bits, one is used for the sign bit, and eight for the exponent, so you get 23 bits (24, effectively) of mantissa. Once (x >= 2^24), you're beyond the range where integers are stored "exactly" in that float, so (x + 1 == x). So a loop like this:
float i;
for (i = 16000000; i < 17000000; i += 1);
would never terminate: (i) would reach (2^24 = 16777216), and the least-significant bit of its mantissa would be of a magnitude greater than 1, so adding 1 to (i) would cease to have any effect.
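You can watch that cutoff happen directly in Java:
float i = 16777216f;            // 2^24
System.out.println(i + 1 == i); // true: floats are spaced 2 apart here, so the +1 is lost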

Binary arithmetic - addition with overflow

When I need to subtract 2 numbers (X-Y), I can take 2's complement of Y and add it to X. Let's say our system represents integers using a byte (8 bits).
X = 7 = 00000111
Y = 5 = 00000101
2's complement of 5
11111010 + 1 = 11111011
Adding those 2 =
00000111
11111011
__________
100000010
There is a carryover. How does one deal with this carryover?
If I am using 8 bits, that means I have a range of -128 to 127. So 7 and -5 and their sum do not fall outside that range. So this is not overflow.
That depends on what you are trying to do.
If you are just computing simple/single +/- operations, then the overflow is usually ignored.
When you need to handle overflow/underflow:
For example, if you need to clamp the result for some reason (usually safety of the result range ...), then the carry flag of the ALU marks whether the overflow/underflow occurred. After that you set the result to the maximum positive or negative value, depending on the inputs' sign, magnitude and operation (+,-). Some platforms have instructions that do this automatically (saturated add, sub).
Another reason is implementing bigint operations; in that case the carry is added as +/-1 to the next higher word's operation (the sign depends on the operation), but the result itself stays as is (add, adc, adc, adc, ...).
On modern languages/platforms you do not have direct access to the ALU flag register anymore.
Sometimes you can tap into assembler, but that can be slower in some cases than the computation itself. In that case see this approach: 32bit ALU in C++, where cy is the carry flag.
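In a high-level language you can get at the "carry" yourself by doing the arithmetic in a wider type. A minimal Java sketch of both strategies for an unsigned 8-bit add, with bit 8 standing in for the ALU carry flag:
int x = 0xF0, y = 0x20;                // example operands
int sum = (x & 0xFF) + (y & 0xFF);
boolean carry = (sum & 0x100) != 0;    // the "carry flag": true here, 0xF0 + 0x20 = 0x110
int wrapped   = sum & 0xFF;            // overflow ignored: result wraps to 0x10
int saturated = Math.min(sum, 0xFF);   // overflow clamped: saturating add gives 0xFF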
Should have read my textbook again :)
In 2's complement arithmetic the carryover is thrown away, whereas in 1's complement arithmetic the carryover is carried back around and added to the result.
This video helped me understand - https://www.youtube.com/watch?v=lKTsv6iVxV4

Two's Complement -- How are negative numbers handled?

It is my understanding that numbers are negated using the two's complement, which to my understanding is: ~num + 1.
So my question is: does this mean that, for a variable 'foo' = 1, a negated 'foo' will be exactly the same as a variable 'bar' = 255?
If we were to check whether -'foo' == 'bar', or whether -'foo' == 255, would we get that they are equal?
I know that some languages, such as Java, keep a sign bit -- so the comparisons would yield false. What of languages that do not? And I'm assuming that assembler/native machine does not have a sign bit.
In addition to all of this, I read about a zero flag or a carry flag that is set when a 'negative' number is added to another number (of any sign). The flag gets set because of the way two's complement works: 0x01 + 0xff = 0x00 (with the leading 1 truncated). What exactly is this flag used for?
And my last question: for other math operations (such as multiplication), would I have to re-negate the number (so it is now positive), perform the operation, and negate the result? E.g., ~((~neg + 1) * pos) + 1.
Edit
Finished the question, so feel free to fire away.
Yes, in two’s complement, the number x is represented as ~x+1, where ~x is the bitwise complement of the binary numeral for x in some fixed number of bits. E.g., for eight bits, the binary numeral for 1 is 00000001, so the bitwise complement is 11111110, and adding one produces 11111111.
There is no way to distinguish -1 in eight-bit two’s complement from 255 in eight-bit binary (with no sign). They both have the same representation in bits: 11111111. If you are using both of these numbers, you must either separately remember which one is eight-bit two’s complement and which one is plain eight-bit binary or you must use more than eight bits. In other words, at the raw bit level, 11111111 is just eight bits; it has no value until we decide how to interpret it.
Java and typical other languages do not maintain a sign bit separate from the value of a number; the sign is part of the encoding of the number. Also, typical languages do not allow you to compare different types. If you have a two’s complement x and an unsigned y, then either one must be converted to the type of the other before comparison or they must both be converted to a third type. Thus, if you compare x and y, and one is converted to the other, then the conversion will overflow or wrap, and you cannot expect to get the correct mathematical result. To compare these two numbers, we might convert each of them to a wider integer, such as 32-bits, then compare. Converting the eight-bit two’s complement 11111111 to a 32-bit integer produces -1, and converting the eight-bit plain binary 11111111 to a 32-bit integer produces 255, and then the comparison reports they are unequal.
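A short Java illustration of that paragraph; Java widens a byte by sign extension, so masking with 0xFF recovers the plain-binary reading:
byte bits = (byte) 0xFF;        // the raw pattern 11111111
int asSigned   = bits;          // widens to -1 (two's complement reading)
int asUnsigned = bits & 0xFF;   // widens to 255 (plain binary reading)
System.out.println(asSigned == asUnsigned); // false, exactly as described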
The zero flag and the carry flag you read about are flags that are set when a comparison instruction is executed in a computer processor. Most high-level languages do not give you direct access to these flags. Many processors have an instruction with a form like this:
cmp a, b
That instruction subtracts b from a and discards the difference but remembers several flags that describe the subtraction: Was the result zero (zero flag)? Did a borrow occur (borrow flag)? Was the result negative (sign flag)? Did an overflow occur (overflow flag)?
The compare instruction requires that the two things being compared be the same type (two’s complement or unsigned), but it does not care which type. The results can be tested later by checking particular combinations of the flags depending on the type. That is, the information recorded in the flags can distinguish whether one two’s complement number was greater than another or whether one unsigned number was greater than another, depending on what tests are made. There are conditional branch instructions that test the desired flag properties.
There is generally no need to “un-negate” a number to perform arithmetic operations. Processors include arithmetic instructions that work on two’s complement numbers. Usually the add and subtract instructions are type-agnostic, the same way the compare instruction is, but the multiply and divide instructions are not (except for some forms of multiply that return partial results). The add and subtract instructions can be type-agnostic because the wrapping that occurs in the arithmetic works for both two’s complement and unsigned. However, that wrapping does not work for multiplication and division.
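A small Java sketch of that distinction, treating a byte as an 8-bit register: the wrapped low bits of a product come out the same either way (the partial-result exception mentioned above), but a full divide does not:
byte a = (byte) 0xFF;                      // -1 signed, 255 unsigned: same bits
int mulSigned   = (a * 3) & 0xFF;          // (-1 * 3)  wraps to 0xFD
int mulUnsigned = ((a & 0xFF) * 3) & 0xFF; // (255 * 3) wraps to 0xFD: same low bits
int divSigned   = a / 2;                   // -1 / 2  = 0
int divUnsigned = (a & 0xFF) / 2;          // 255 / 2 = 127: division needs to know the type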

How is overflow detected at the binary level?

I'm reading the textbook Computer Organization and Design by Hennessy and Patterson (4th edition). On page 225 they describe how overflow is detected in signed, 2's complement arithmetic. I just can't even understand what they're talking about.
"How do we detect [overflow] when it does occur? Clearly, adding or
substracting two 32-bit numbers can yield a result that needs 33 bits
to be fully expressed."
Sure. And it won't need 34 bits because even the smallest 34 bit number is twice the smallest 33 bit number, and we're adding 32 bit numbers.
"The lack of a 33rd bit means that when overflow occurs, the sign bit
is set with the value of the result instead of the proper sign of
the result."
What does this mean? The sign bit is set with the "value" of the result... meaning it's set as if the result were unsigned? And if so, how does that follow from the lack of a 33rd bit?
"Since we need just one extra bit, only the sign bit can be wrong."
And that's where they lost me completely.
What I'm getting from this is that, when adding signed numbers, there's an overflow if and only if the sign bit is wrong. So if you add two positives and get a negative, or if you add two negatives and get a positive. But I don't understand their explanation.
Also, this only applies to signed numbers, right? If you're adding unsigned numbers, surely detecting overflow is much simpler: if the last adder stage of the ALU sets its carry bit, there's an overflow.
Any time you want to deal with these kinds of ALU items, be it add, subtract, multiply, etc., start with 2 or 3 bit numbers; they are much easier to get a handle on than 32 or 64 bit numbers. After 2 or 3 bits it doesn't matter if it is 22 or 2200 bits, it all works exactly the same from there on out. Basically you can, by hand if you want, make a table of all 3 bit operands and their results such that you can examine the whole table visually; but a table of all 32 bit operands against all 32 bit operands and their results can't be done by hand in a reasonable time and cannot be examined visually.
Now twos complement is just a scheme for representing positive and negative numbers, and it is not some arbitrary thing; the reason for the madness is that your adder logic (which is also what the subtractor uses, which is the same kind of thing the multiplier uses) DOES NOT CARE ABOUT UNSIGNED OR SIGNED. It does not know the difference. YOU the programmer care: in my three bit world the bit pattern 0b111 could be a positive seven (+7) or it could be a negative one. Same bit pattern: feed it to the add logic and the same thing comes out, and the answer that comes out I can choose to interpret as unsigned or twos complement (so long as I interpret the operands and the result all as unsigned or all as twos complement). Twos complement also has the feature that for negative numbers the most significant bit (msbit) is set, and for positive numbers it is zero. So it is not sign-plus-magnitude, but we still talk about the msbit being the sign bit, because except for two special numbers that is what it is telling us: the sign of the number. The other bits are actually telling us the magnitude; they are just not an unsigned magnitude as you might have in sign+magnitude notation.
So, the key to this whole question is understanding your limits. For a 3 bit unsigned number our range is 0 to 7, 0b000 to 0b111. For a 3 bit signed (twos complement) interpretation our range is -4 to +3 (0b100 to 0b011). For now limiting ourselves to 3 bits: if you add 7+1, 0b111 + 0b001 = 0b1000, but we only have a 3 bit system so that is 0b000. 7+1 = 8, and we cannot represent 8 in our system, so that is an overflow; because we happen to be interpreting the bits as unsigned we look at the "unsigned overflow", which is also known as the carry bit or flag. Now if we take those same bits but interpret them as signed, then 0b111 (-1) + 0b001 (+1) = 0b000 (0). Minus one plus one is zero. No overflow; the "signed overflow" is not set... What is the signed overflow?
First what is the "unsigned overflow".
The reason why "it all works the same" no matter how many bits we have in our registers is no different than elementary school math with base 10 (decimal) numbers. If you add 9 + 1, which are both in the ones column, you say 9 + 1 = zero, carry the 1. You carry a one over to the tens column, then 1 plus 0 plus 0 (you filled in two zeros in the tens column) is 1, carry the zero. You have a 1 in the tens column and a zero in the ones column:
1
09
+01
====
10
What if we declared that we were limited to only numbers in the ones column, so there isn't any room for a tens column? Well, that carry bit being non-zero means we have an overflow; to properly compute the result we need another column. Same with binary:
111
111
+ 001
=======
1000
7 + 1 = 8, but we can't do 8 if we declare a 3 bit system; we can do 7 + 1 = 0 with the carry bit set. Here is where the beauty of twos complement comes in:
111
111
+ 001
=======
000
If you look at the above three bit addition, you cannot tell by looking whether that is 7 + 1 = 0 with the carry bit set or -1 + 1 = 0.
So for unsigned addition, as we have known since grade school, a carry over into the next column of something other than zero means we have overflowed that many placeholders and need one more placeholder, one more column, to hold the actual answer.
Signed overflow. The sort-of-academic answer is: the carry in to the msbit column does not match the carry out of it. Let's take some examples in our 3 bit world. With twos complement we are limited to -4 to +3. So if we add -2 + -3 = -5, that won't work, correct?
To figure out what minus two is we do an invert and add one 0b010, inverted 0b101, add one 0b110. Minus three is 0b011 -> 0b100 -> 0b101
So now we can do this:
abc
100
110
+ 101
======
011
If you look at the number under the b, that is the "carry in" to the msbit column; the number under the a, the 1, is the carry out. These two do not match, so we know there is a "signed overflow".
Let's try 2 + 2 = 4:
abc
010
010
+ 010
======
100
You may say "but that looks right". Sure, unsigned it does, but we are doing signed math here, so the result is actually a -4, not a positive 4, and 2 + 2 != -4. The carry in, which is under the b, is a 1; the carry out of the msbit is a zero. The carry in and the carry out don't match: signed overflow.
There is a shortcut to figuring out the signed overflow without having to look at the carry in (or carry out): if ((msbit(opa) == msbit(opb)) && (msbit(res) != msbit(opb))) then signed overflow, else no signed overflow; opa being one operand, opb being the other, and res the result.
010
+ 010
======
100
Take this +2 + +2 = -4. msbit(opa) and msbit(opb) are equal, and the result msbit is not equal to opb msbit so this is a signed overflow. You could think about it using this table:
x ab cr
0 00 00
0 01 01
0 10 01
0 11 10 signed overflow
1 00 01 signed overflow
1 01 10
1 10 10
1 11 11
This table is all the possible combinations of carry in bit, operand a, operand b, carry out, and result bit for a single column (turn your head sideways to the left to sort of see this): x is the carry in, and the a and b columns are the two operands. cr as a pair is the result: xab of 011 means 0+1+1 = 2 decimal, which is 0b10 binary. So taking the rule that has been dictated to us, that if the carry in and carry out do not match that is a signed overflow: the two cases where the item in the x column does not match the item in the c column are indicated, and those are the cases where the a and b inputs match each other but the result bit is the opposite of a and b. So, assuming the rule is correct, this quick shortcut, which does not require knowing what the carry bits are, will tell you if there was a signed overflow.
Now you are reading an H&P book, which probably means mips or dlx; neither mips nor dlx deals with carry and signed flags in the way that most other processors do. mips is not the best first instruction set IMO, primarily for that reason; their approach is not wrong in any way, but being the oddball, you will spend forever thinking differently and having to translate when going to most other processors. Whereas if you learned the typical znvc flags (zero flag, negative flag, v = signed overflow, c = carry or unsigned overflow) way, then you only have to translate when going to mips. Normally these are computed on every alu operation (for the non-mips type processors); you will see signed and unsigned overflow being computed for add and subtract. (I am used to an older mips; maybe this generation of books and the current instruction set have something different.) Calling it addu, add unsigned, right at the start of mips, after learning all of the above about how an adder circuit does not care about signed vs unsigned, is a huge problem with mips: it really puts you in the wrong mindset for understanding something this simple, and it leads to the belief that there is a difference between signed addition and unsigned addition when there isn't. It is only the overflow flags that are computed differently. Now for multiply and divide there is definitely a twos complement vs unsigned difference, and you ideally need a signed multiply and an unsigned multiply, or you need to deal with the limitation.
I recommend a simple exercise (depending on how strong your bit manipulation and twos complement are) that you can write in some high level language. Basically take all the combinations of unsigned numbers 0 to 7 added to 0 to 7 and save the result. Print the result out both as decimal and as binary (three bits for operands, four bits for result), and if the result is greater than 7 print overflow as well. Repeat this using signed variables with the numbers -4 to +3 added to -4 to +3; print both decimal with a +/- sign and the binary. If the result is less than -4 or greater than +3 print overflow. From those two tables you should be able to see that the rules above are true. Looking strictly at the operand and result bit patterns for the size allowed (three bits in this case), you will see that the addition operation gives the same result, the same bit pattern, for a given pair of inputs independent of whether those bit patterns are considered unsigned or twos complement. Also you can verify that unsigned overflow is when the result needs to use that fourth column: there is a carry out off of the msbit. For signed it is when the carry in doesn't match the carry out, which you see using the shortcut, looking at the msbits of the operands and result. Even better is to have your program do those comparisons and print out something. So if you see a note in your table that the result is greater than 7 and a note in your table that bit 3 is set in the result, then you will see for the unsigned table that is always the case (limited to inputs of 0 to 7). And the more complicated one, signed overflow, is always when the result is less than -4 or greater than +3, and also when the operands' upper bits match and the result's upper bit does not match the operands.
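For what it's worth, here is one possible Java version of that exercise, shrunk down to just the checks (it verifies the carry rule and the msbit shortcut against the actual value ranges):
for (int a = 0; a < 8; a++) {
    for (int b = 0; b < 8; b++) {
        int sum = a + b;
        int r = sum & 7;                                            // what a 3-bit register keeps
        boolean carry = (sum & 8) != 0;                             // unsigned overflow: bit 3 carried out
        boolean ovf = ((a & 4) == (b & 4)) && ((r & 4) != (a & 4)); // the msbit shortcut
        int sa = a < 4 ? a : a - 8;                                 // same bits read as twos complement
        int sb = b < 4 ? b : b - 8;
        if (carry != (sum > 7)) System.out.println("unsigned rule broken");
        if (ovf != (sa + sb < -4 || sa + sb > 3)) System.out.println("signed rule broken");
    }
}
// prints nothing: both rules hold for every 3-bit combination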
I know this is super long and very elementary. If I totally missed the mark here, please comment and I will remove or re-write this answer.
The other half of the twos complement magic: hardware does not have separate subtract logic. One way to "convert" to twos complement is to "invert and add one". If I wanted to subtract 3 - 2 using twos complement, what actually happens is that it is the same as +3 + (-2), right? And to get from +2 to -2 we invert and add one. Looking at our elementary school addition, did you notice the hole in the carry in on the first column?
111H
111
+ 001
=======
1000
I put an H above where the hole is. Well, that carry in bit is added to the operands, right? Our addition logic is not a two input adder, it is a three input adder, yes? Most of the columns have to add three one-bit numbers in order to add the two operands. If we use a three input adder on the first column, now we have a place to ... add one. If I wanted to subtract 3 - 2, that is 3 + (-2) = 3 + (~2) + 1, which is:
1
011
+ 101
=====
That is before we start; with the carries filled in it is:
1111
011
+ 101
=====
001
3 - 2 = 1.
What the logic does is:
if add then carry in = 0; the b operand is not inverted, the carry out is not inverted.
if subtract then carry in = 1; the b operand is inverted, the carry out MIGHT BE inverted.
The addition above shows a carry out; I didn't mention that this was an unsigned operation, 3 - 2 = 1. I used some twos complement tricks to perform an unsigned operation, because here again, no matter whether I interpret the operands as signed or unsigned, the same rules apply for add and for subtract. Why I said that the carry out MIGHT BE inverted is that some processors invert the carry out and some don't. It has to do with cascading operations: taking, say, a 32 bit addition logic and using the carry flag and an add with carry or subtract with borrow instruction to create a 64 bit add or subtract, or any multiple of the base register size. Say you have two 64 bit numbers on a 32 bit system, a:b + c:d, where a:b is the 64 bit number held in the two registers a and b, a being the upper half and b the lower half. So a:b + c:d = e:f on a 32 bit system, unsigned, that has a carry bit and add with carry:
add f,b,d
addc e,a,c
The add leaves the carry out from its uppermost bit lane in the carry flag in the status register. The addc instruction, add with carry, takes the operands a+c and, if the carry bit is set, adds one more, a+c+1, putting the result in e and the carry out in the carry flag, so:
add f,b,d
addc e,a,c
addc x,y,z
Is a 96 bit addition, and so on. Here again, this is something very foreign to mips, since it doesn't use flags like other processors do. Where the invert or don't invert of the carry out comes in is on the subtract with borrow for a particular processor. For subtract:
if subtract then carry in = 1; the b operand is inverted, the carry out MIGHT BE inverted.
For subtract with borrow you have to say if the carry flag from the status register indicates a borrow then the carry in is a 0 else the carry in is a 1, and you have to get the carry out into the status register to indicate the borrow.
Basically, for the normal subtract, some processors invert the b operand and the carry in on the way in and invert the carry out on the way out; some processors invert the b operand and the carry in on the way in but don't invert the carry out on the way out. Then, when you want to do a conditional branch, you need to know whether the carry flag means greater than or less than (often the syntax will have a branch if greater or branch if less than, and sometimes tell you which one is the simplified branch if carry set or branch if carry clear). (If you don't "get" what I just said there, that is another equally long answer which won't mean anything so long as you are studying mips.)
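A minimal Java sketch of the add/addc cascade above: each 32-bit "register" is added in a long so the carry out of the low word is visible and can be fed into the high word:
static long add64(int a, int b, int c, int d) {             // computes a:b + c:d, high:low halves
    long low  = (b & 0xFFFFFFFFL) + (d & 0xFFFFFFFFL);      // add  f,b,d
    long cy   = low >>> 32;                                 // carry out of the low word
    long high = (a & 0xFFFFFFFFL) + (c & 0xFFFFFFFFL) + cy; // addc e,a,c
    return (high << 32) | (low & 0xFFFFFFFFL);              // e:f packed into one 64-bit result
}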
As 32-bit signed integers are represented by 1 sign bit and 31 bits for the actual number, we are effectively adding two 31-bit numbers. Hence the 32nd bit (the sign bit) is where the overflow becomes visible.
"The lack of a 33rd bit means that when overflow occurs, the sign bit is set with the value of the result instead of the proper sign of the result."
Imagine the following addition of two positive numbers (16 bit to simplify):
0100 1100 0011 1010 (19514)
+ 0110 0010 0001 0010 (25106)
= 1010 1110 0100 1100 (-20916 [or 44620])
For the summation of two large negative numbers however the extra bit would be required
1100 1100 0011 1010
+ 1110 0010 0001 0010
= 1 1010 1110 0100 1100
Usually the CPU has this 33rd bit (or whatever bit size it operates on, plus one) exposed as an overflow bit in the micro-architecture.
Their description relates to operations on values with a particular bit sequence: the first bit corresponds to the sign of the value, and the other bits relate to the magnitude of that value.
What does this mean? The sign bit is set with the "value" of the result...
They mean that the overflow bit - the one that is a consequence of adding two numbers that need to spill into the next digit over - is dumped into the same place that the sign bit should be.
"Since we need just one extra bit, only the sign bit can be wrong."
All this means is that, when you perform arithmetic that overflows, the only bit whose value may be incorrect is the sign bit. All of the other bits are still the value they should be.
This is a consequence of what was described above: the carry from the magnitude bits spills into the sign bit, corrupting its value.
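That rule reduces to a one-line check; the XOR form below is essentially what the JDK uses inside Math.addExact to detect signed int overflow:
static boolean addOverflows(int a, int b) {
    int r = a + b;                   // Java int addition simply wraps
    return ((a ^ r) & (b ^ r)) < 0;  // sign bit set: both inputs disagree with the result's sign
}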

Is there any sense in performing binary AND with a number where all bits are set to 1

Greetings everybody. I have seen examples of such operations so many times that I begin to think that I am getting something wrong with binary arithmetic. Is there any sense in performing the following:
byte value = someAnotherByteValue & 0xFF;
I don't really understand this, because it does not change anything anyway. Thanks for the help.
P.S.
I was trying to search for information both elsewhere and here, but unsuccessfully.
EDIT:
Well, of course I assume that someAnotherByteValue is 8 bits long; the problem is that I don't get why so many people (I mean professionals) use such things in their code. For example in SharpZlib there is:
buffer_ |= (uint)((window_[windowStart_++] & 0xff |
(window_[windowStart_++] & 0xff) << 8) << bitsInBuffer_);
where window_ is a byte buffer.
The most likely reason is to make the code more self-documenting. In your particular example, it is not the size of someAnotherByteValue that matters, but rather the fact that value is a byte. This makes the & redundant in every language I am aware of. But, to give an example of where it would be needed, if this were Java and someAnotherByteValue was a byte, then the line int value = someAnotherByteValue; could give a completely different result than int value = someAnotherByteValue & 0xff. This is because Java's long, int, short, and byte types are signed, and the rules for conversion and sign extension have to be accounted for.
If you always use the idiom value = someAnotherByteValue & 0xFF then, no matter what the types of the variable are, you know that value is receiving the low 8 bits of someAnotherByteValue.
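A quick Java demonstration of the idiom (reusing the 0xEE pattern from the first question above):
byte b = (byte) 0xEE;
int withoutMask = b;        // -18: sign extension filled the upper bits with ones
int withMask    = b & 0xFF; // 238: only the low 8 bits survive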
uint s1 = (uint)(initial & 0xffff);
There is a point to this because uint is 32 bits, while 0xffff is 16 bits. The line selects the 16 least significant bits from initial.
Nope, there is no use in doing this. If the value were wider than 8 bits, then the above statement would have some meaning; otherwise, it's the same as the input.
If sizeof(someAnotherByteValue) is more than 8 bits and you want to extract the least significant 8 bits from someAnotherByteValue then it makes sense. Otherwise, there is no use.
No, there is no point so long as you are dealing with a byte. If value was a long then the lower 8 bits would be the lower 8 bits of someAnotherByteValue and the rest would be zero.
In a language like C++ where operators can be overloaded, it's possible but unlikely that the & operator has been overloaded. That would be pretty unusual and bad practice though.
EDIT: Well, of course I assume that someAnotherByteValue is 8 bits long; the problem is that I don't get why so many people (I mean professionals) use such things in their code. For example in Jon Skeet's MiscUtil there is:
uint s1 = (uint)(initial & 0xffff);
where initial is int.
In this particular case, the author might be trying to convert an int to a uint. The & with 0xffff would ensure that the conversion still keeps the lowest 2 bytes, even if the system is not one which has a 2-byte int type.
To be picky, there is no guarantee regarding a machine's byte size. There is no reason to assume, in an extremely portable program, that the architecture's byte is 8 bits wide. To the best of my memory, according to the C standard (for example), a char is one byte, short is wider than or the same as char, int is wider than or the same as short, long is wider than or the same as int, and so on. Hence, theoretically there can be a compiler where a long is actually one byte wide, and that byte will be, say, 10 bits wide. Now, to ensure your program behaves the same on that machine, you need to use that (seemingly redundant) coding style.
"Byte" # Wikipedia gives examples for such peculiar architectures.
