How do these bitwise operations work, and why not use little-endian byte access instead? - Arduino

I found these in the Arduino.h header and was confused about the lowByte macro:
#define lowByte(w) ((uint8_t) ((w) & 0xff))
#define highByte(w) ((uint8_t) ((w) >> 8))
For lowByte: wouldn't the conversion from WORD to uint8_t just take the low byte anyway? I know they use w & 0x00ff to get the low byte, but wouldn't the cast alone take the low byte?
For both lowByte and highByte: why not rely on little-endian layout and read with a size and offset?
E.g. if w is 0x12345678, the high part is 0x1234 and the low part is 0x5678, and it is written to memory as 78 56 34 12 at, say, offset x.
To read w, you read a word-sized value at location x.
To read the high part, you read a byte/uint8_t at location x.
To read the low part, you read a byte/uint8_t at location x + 2.

For lowByte: wouldn't the conversion from WORD to uint8_t just take the low byte anyway? I know they use w & 0x00ff to get the low byte, but wouldn't the cast alone take the low byte?
Yes. Some people like to be extra explicit in their code anyway, but you are right.
For both lowByte and highByte: why not rely on little-endian layout and read with a size and offset?
I don't know what "use little endians" means.
But simply aliasing a WORD as a different type and using pointer arithmetic to "move around" inside the original object generally has undefined behaviour. You can't alias objects like that. (Access through a character type such as unsigned char is the exception, but then what you read depends on the machine's byte order.) I know your teacher probably said you can because it's all just bits in memory, but your teacher was wrong; C and C++ are abstractions over computer code, and have rules of their own.
Bit-shifting is the conventional way to achieve this.

In the case of lowByte, yes: the cast to uint8_t is equivalent to ((w) & 0xff).
Regarding "using little endians", you don't want to access individual bytes of the value because you don't necessarily know whether your system is using big endian or little endian.
For example:
uint16_t n = 0x1234;
char *p = (char *)&n;
printf("0x%02x 0x%02x", p[0], p[1]);
If you ran this code on a little endian machine it would output:
0x34 0x12
But if you ran it on a big endian machine you would instead get:
0x12 0x34
By using shifts and bitwise operators you operate on the value, which must be the same on all implementations, instead of the representation of the value, which may differ.
So don't operate on individual bytes unless you have a very specific reason to.

Related

Why does my memory have 18-digit addresses? D:

I was following a programming tutorial, and in the pointer section I noticed that the output of this code is much longer than normal (it is ptr = 0x000000cd9d1cf504) :/ Why?
#include <iostream>

int main()
{
    int pointerTest = 6;
    void* ptr = 0;
    ptr = &pointerTest;
    std::cout << ptr << std::endl;
    std::cin.get();
}
It's not an 18-digit address - it only consists of 16 digits. The prefix 0x merely indicates that what comes after it is going to be in hexadecimal form. The other commonly used notation for hexadecimal integers is h (or sometimes x, such as in VHDL) either prefixed or postfixed (for example hCD9D1CF504, h'CD9D1CF504 or CD9D1CF504h - note that this is quite unclear unless the hexadecimal digits A-F are capitalized).
One hexadecimal digit represents 4 bits, so the pointer is 4 * 16 = 64 bits in size. In other words, the binary executable produced by your compiler is 64-bit, while the tutorial binary likely was 32-bit, as pointed out by @Hawky in the comments.
To fully understand the difference between 32-bit and 64-bit code, you'll have to study computer architecture, the x86-64 in particular. Be warned, though - if you choose to go down that route, prepare for a lifetime of pain and suffering (the worst bit being that you might just enjoy it).

Difference between uint8_t* vs uint8_t

What is the difference/use for these 2 types? I have a basic understanding regarding pointers but I just can't wrap my head around this.
uint8_t* address_at_eeprom_location = (uint8_t*)10;
This line found in an Arduino example makes me feel so dumb. :)
So basically this is a double pointer?
uint8_t is an unsigned integer type; a value of this type is the data stored directly in memory. uint8_t* is a pointer to the memory in which such a number is stored. (uint8_t*) is a cast of the literal 10 to the pointer type. The cast does not create storage for the value 10; it treats 10 as an address, which is then stored in the address_at_eeprom_location variable.
uint8_t is an unsigned 8-bit integer.
uint8_t* is a pointer to an 8-bit integer in memory.
(uint8_t*)10 is a pointer to a uint8_t at address 10.
So basically this line stores the address 10 in address_at_eeprom_location as the location of a uint8_t. Most likely this address is used later in the code to read or write an actual uint8_t value there.
Instead of a single value, this can also be used as the starting point for an array later in the code:
uint8_t x = address_at_eeprom_location[3];
This would read the uint8_t at index 3 (the fourth one counting from address 10, so at address 13) into the variable x.

Explain the concept of the size of integer, character and float pointers in GCC

The following program is compiled with GCC (Ubuntu 12.04). I need to understand it in order to grasp the concept of the size of integer, character and float pointers.
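#include <stdio.h>

int main(void)
{
    int i = 20, *p;
    char ch = 'a', *cp;
    float f = 22.3f, *fp;
    /* sizeof yields a size_t, which is printed with %zu rather than %d */
    printf("%zu %zu %zu\n", sizeof(p), sizeof(cp), sizeof(fp));
    printf("%zu %zu %zu\n", sizeof(*p), sizeof(*cp), sizeof(*fp));
    return 0;
}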
I get the following output when I run the above code on Ubuntu 12.04:
Output:
8 8 8
4 1 4
According to this line: "Irrespective of data types,size of pointer for address it will allow 4 bytes BY DEFAULT"
So why do I get sizeof(p) = 8 when it should be sizeof(p) = 4?
Please explain.
sizeof(x) will return the size of x. A pointer is like any other variable, except that it holds an address. On your 64 bit machine, the pointer takes 64 bits or 8 bytes, and that is what sizeof will return. All pointers on your machine will be 8 bytes long, regardless of what data they point to.
The data they point to may be of a different length.
int x = 5; // x is a 32 bit int, takes up 4 bytes
int *y = &x; // y holds the address of x, and is 8 bytes
float *z; // z holds the address of a float, and an address is still 8 bytes long
You're probably getting confused because you have previously done this on a 32-bit computer. You see, the 32 / 64 bit indicates the size of a machine address. So, on a 32-bit computer, a pointer holds an address that is at most 32 bits long, or four bytes. Your current machine must be a 64-bit machine, which is why the pointer needs to be 8 bytes long.
Heck, it's not just the address length. The size of other data types is also platform AND implementation dependent. For example, an int may be 16 bits on one platform & 32 bits on another. A third implementation might go crazy and have 128 bit ints. The only guarantee in the spec is that an int will be at least 16 bits long. When in doubt, always check. The Wikipedia page on C data types would be helpful.
sizeof(p) returns the size of the pointer p, which holds an address. You are most likely running on a 64-bit machine, so your addresses are 64 bits (8 * 8) in length.
The value obtained by dereferencing p is a 32-bit integer (4 * 8).
You can verify this by noting that:
All pointers have a sizeof of 8
Your char value has size 1 (sizeof(char) is 1 by definition in C)
Print p and *p (for all variables). You will see the actual address length this way.
I'm not sure which documentation you're using, but my guess is that it is talking about pointers on 32-bit systems.
On 64-bit systems the size of a pointer becomes 8 bytes.

Does the 6502 use signed or unsigned 8 bit registers (JAVA)?

I'm writing an emulator for the 6502, and basically, there are some instructions where there's an offset saved in one of the registers (mostly X and Y) and I'm wondering, since branch instructions use signed 8 bit integers, do the registers keep their values as 8 bit signed? Meaning this:
switch(opcode) {
    //Bunch of opcodes
    case 0xD5:
        //Read the memory area with final address being address + x offset
        int rempResult = a - readMemory(address + x);
        //Comparing some things, setting/disabling flags
        //Incrementing program counter and cycles/ticks
        break;
    //More opcodes
}
Let's say in this situation that x = 0xEE. In regular binary, this would mean that x = 238. In the 6502 however, the branch instruction uses signed offset for jumping to memory addresses, so I'm wondering, is the 238 interpreted as -18 in this case, or is it just regular unsigned 8 bit value?
It varies.
They're not explicitly signed or unsigned for arithmetic, logical, shift, or load and store operations.
The conditional branches (and the unconditional one on the later 6502 descendants) all take the argument as signed; otherwise loops would be extremely awkward.
zero, x addressing is achieved by performing an 8-bit addition of x to the zero page address, ignoring carry, and reading from the zero page. So e.g.
LDX #-126 ; which is +130 if unsigned
LDA 23, x
Would read from address 23+130 = 153. But had it been 223+130 then the end read would have been from (223 + 130) MOD 256 = 97.
absolute, x/y is unsigned and carry works correctly (but costs an extra cycle)
(zero, x) is much like the direct version in that the 8-bit addition of x wraps around, so the result is always within the zero page. Then the real address is read from there.
(zero), y is unsigned with carry working and costing.
The "sign" is simply the value of the most significant bit (aka bit 7) in an 8-bit byte.
6502 has support for signed values in these ways:
The N bit in .P - but it really just tells you if the last instruction turned on or off bit 7 of a memory location or register. It was common to use BPL/BMI to do stuff based on bit 7 in a memory location for flag or "boolean" like use.
The V bit of .P, which is set "when the result of adding two positive numbers overflows and ends up negative, and when the result of adding two negative numbers overflows and ends up positive"
And of course obeying the sign bit for relative branch instructions only, e.g. BEQ with a value with bit 7 set will move to a lower memory location, not a higher one.
Beyond that, whether that bit means anything is completely up to you and your program. What really makes numbers signed or unsigned is how you display the numbers.
The linked article above goes into what one's complement and two's complement is and how it makes the mathematics work without the 6502 having to care too much about the sign.

Is there any sense in performing binary AND with a number where all bits are set to 1

Greetings everybody. I have seen examples of such operations so many times that I am beginning to think I am misunderstanding something about binary arithmetic. Is there any sense in performing the following:
byte value = someAnotherByteValue & 0xFF;
I don't really understand this, because it does not change anything anyway. Thanks for help.
P.S.
I was trying to search for information both elsewhere and here, but unsuccessfully.
EDIT:
Well, of course I assume that someAnotherByteValue is 8 bits long; the problem is that I don't get why so many people (I mean professionals) use such things in their code. For example in SharpZlib there is:
buffer_ |= (uint)((window_[windowStart_++] & 0xff |
(window_[windowStart_++] & 0xff) << 8) << bitsInBuffer_);
where window_ is a byte buffer.
The most likely reason is to make the code more self-documenting. In your particular example, it is not the size of someAnotherByteValue that matters, but rather the fact that value is a byte. This makes the & redundant in every language I am aware of. But, to give an example of where it would be needed, if this were Java and someAnotherByteValue was a byte, then the line int value = someAnotherByteValue; could give a completely different result than int value = someAnotherByteValue & 0xff. This is because Java's long, int, short, and byte types are signed, and the rules for conversion and sign extension have to be accounted for.
If you always use the idiom value = someAnotherByteValue & 0xFF then, no matter what the types of the variable are, you know that value is receiving the low 8 bits of someAnotherByteValue.
uint s1 = (uint)(initial & 0xffff);
There is a point to this because uint is 32 bits, while 0xffff is 16 bits. The line selects the 16 least significant bits from initial.
Nope. There is no use in doing this. If the value you are using is wider than 8 bits, then the above statement has some meaning. Otherwise, the result is the same as the input.
If someAnotherByteValue is wider than 8 bits and you want to extract its least significant 8 bits, then it makes sense. Otherwise, there is no use.
No, there is no point so long as you are dealing with a byte. If value was a long then the lower 8 bits would be the lower 8 bits of someAnotherByteValue and the rest would be zero.
In a language like C++ where operators can be overloaded, it's possible but unlikely that the & operator has been overloaded. That would be pretty unusual and bad practice though.
EDIT: Well, of course I assume that someAnotherByteValue is 8 bits long; the problem is that I don't get why so many people (I mean professionals) use such things in their code. For example in Jon Skeet's MiscUtil there is:
uint s1 = (uint)(initial & 0xffff);
where initial is int.
In this particular case, the author might be trying to convert an int to a uint. The & with 0xffff ensures that only the lowest 2 bytes are converted, even if the system is not one which has a 2-byte int type.
To be picky, there is no guarantee regarding a machine's byte size. There is no reason to assume in an extremely portable program that the architecture's byte is 8 bits wide. To the best of my memory, according to the C standard (for example), a char is one byte, short is wider than or the same as char, int is wider than or the same as short, long is wider than or the same as int, and so on. Hence, theoretically there can be a compiler where a long is actually one byte wide, and that byte is, say, 10 bits wide. Now, to ensure your program behaves the same on that machine, you need to use that (seemingly redundant) coding style.
The Wikipedia article "Byte" gives examples of such peculiar architectures.

Resources