PTR Directive in ASM, how does it work?

I have this block of ASM code with a few variables and 1 instruction:
.data
g BYTE 32h
a DWORD 11111111h
h BYTE 64h
.code
mov ebx, DWORD PTR g
Could anyone explain why the value of ebx is not 11 11 11 32 instead of 00 00 00 32, or at least how PTR works?
I thought that the PTR directive would represent the operand as a 32-bit operand?
Thanks in advance.

See @Jester's comment if your code really looks like what you've posted.
But judging by your question I'm guessing that your code actually contains this line instead:
mov ebx, DWORD PTR g
I thought that the PTR directive would represent the operand as a 32-bit operand ?
That depends on what you mean by that. DWORD PTR would be used as a size specifier when the size is ambiguous.
For example, the instruction mov [eax], 0 would be ambiguous because the assembler has no way of knowing whether you meant to write a byte, a word, a dword, etc. So in that case you could use DWORD PTR to state that you want to write a DWORD to memory: mov DWORD PTR [eax], 0.
If you want to read a byte from memory and convert it to a DWORD you need to use movzx or movsx:
movzx ebx, BYTE PTR g ; if g should be treated as unsigned
movsx ebx, BYTE PTR g ; if g should be treated as signed
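To see what the size specifier actually changes, here is a minimal C sketch that models the data section from the question as a byte array and mimics the three loads. The helper names (load_dword_le, load_byte_zx, load_byte_sx) and the data_seg array are made up for illustration, and the sketch assumes g, a and h are laid out back to back with no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the .data section: g, a, h stored back to back. */
const uint8_t data_seg[] = {
    0x32,                   /* g BYTE 32h */
    0x11, 0x11, 0x11, 0x11, /* a DWORD 11111111h (little-endian bytes) */
    0x64                    /* h BYTE 64h */
};

/* Models mov ebx, DWORD PTR [mem+off]: reads 4 bytes, little-endian. */
uint32_t load_dword_le(const uint8_t *mem, size_t len, size_t off)
{
    assert(off + 4 <= len);
    return (uint32_t)mem[off]
         | ((uint32_t)mem[off + 1] << 8)
         | ((uint32_t)mem[off + 2] << 16)
         | ((uint32_t)mem[off + 3] << 24);
}

/* Models movzx ebx, BYTE PTR [mem+off]: the upper 24 bits become zero. */
uint32_t load_byte_zx(const uint8_t *mem, size_t len, size_t off)
{
    assert(off < len);
    return (uint32_t)mem[off];
}

/* Models movsx ebx, BYTE PTR [mem+off]: bit 7 is copied into the upper 24 bits. */
uint32_t load_byte_sx(const uint8_t *mem, size_t len, size_t off)
{
    assert(off < len);
    return (uint32_t)(int32_t)(int8_t)mem[off];
}
```

Under that layout assumption, a dword read at g picks up g plus the first three bytes of a, while the byte-sized loads see only g itself.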

Unfortunately, the assembly language for x86 was too generic. Take:
mov ebx,a
If you look at the instruction encodings (if you are writing or learning assembly you should have a reference handy and open anyway), you find that this might mean read a byte at address a, or a 16-bit word at address a, or perhaps a 32-bit word at address a. It may or may not go further and sign-extend the value. So in order to get the instruction you want, you need to add more information.
Assembly language is not some standard; it is defined by the assembler, the program reading the ASCII file, so one assembly language for an instruction set does not dictate what another looks like. x86 in particular has many: Intel vs AT&T syntax to start with, and then gcc vs MASM vs NASM and so on. And naturally, with gcc and AT&T and everyone else who intentionally didn't want to go along with what was already out there, how you specify whether this is a byte, word, or dword read varies. So does the default instruction, if any, that is generated when you don't specify what you want.

Related

Understanding pointers in assembly language

Are we enclosing the variable or register in brackets to specify a pointer in assembly?
Example1;
MOV eax, array+4
LEA eax, [array+4]
Example2;
section .data
array DB 116,97
section .bss
variable RESB 0
section .text
global _start:
_start:
mov eax,[array]
;exit
mov eax,1
int 0x80
I am not getting any errors while compiling or running the above code. Is the address of the zero index of the array placed in the EAX register?
Example3;
INC [variable]
When compiling the above code, I am getting the "operation size not specified" error. And why can't the command be used as INC variable?
Example4;
section .data
array DB 116,97
section .bss
variable RESB 97
section .text
global _start:
_start:
mov eax,4
mov ebx,1
mov ecx,variable
mov edx,1
int 0x80
;exit
mov eax,1
int 0x80
And this code is not working.

Are we enclosing the variable or register in brackets to specify a
pointer in assembly?
Example1;
MOV eax, array+4
LEA eax, [array+4]
The brackets are like the dereference operator in C (*ptr). They get the value at the resulting address inside the square brackets. As for the example, both of these essentially do the same thing. The first moves the address of the array label + 4 into eax. The second uses lea, which loads the effective address of its source operand: it evaluates the address expression inside the brackets (array + 4) and puts that address in eax without reading memory at all.
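In C terms, the two instructions in Example1 correspond roughly to the two ways of writing the same address below. This is only an analogy; the function names and the 8-byte array are invented for the sketch.

```c
#include <stdint.h>

uint8_t array[8]; /* stands in for the array label */

/* Like mov eax, array+4: the address is plain arithmetic on the label. */
uint8_t *addr_by_arithmetic(void)
{
    return array + 4;
}

/* Like lea eax, [array+4]: address-of an element, no memory is read. */
uint8_t *addr_by_lea(void)
{
    return &array[4];
}
```

Both functions return the same pointer, and neither touches the bytes stored in the array.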
Example2;
section .data
array DB 116,97
section .bss
variable RESB 0
section .text
global _start:
_start:
mov eax,[array]
;exit
mov eax,1
int 0x80
I am not getting any errors while compiling or running the above code.
Is the address of the zero index of the array placed in the eax
register?
Not the address. With the brackets, you load the value stored at array. Since you're moving it into eax, a 32-bit register, the assembler assumes you want the first 4 bytes at the address array. But only 2 bytes are defined at array: 116 and 97. So this is probably not what you intended. To load the first byte at array into eax, do movzx eax, BYTE [array], which will move array[0] into the LSByte of eax and zero out the higher bytes. mov al, [array] will also work, though it leaves the upper bytes of eax unchanged.
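The difference between the two byte loads can be modeled in C by treating eax as a 32-bit integer. The function names below are hypothetical; they describe only the effect on the register.

```c
#include <stdint.h>

/* Models mov al, [array]: only the low 8 bits of eax change. */
uint32_t mov_al(uint32_t eax, uint8_t byte)
{
    return (eax & 0xFFFFFF00u) | byte;
}

/* Models movzx eax, BYTE [array]: the whole register is replaced. */
uint32_t movzx_eax(uint32_t eax, uint8_t byte)
{
    (void)eax; /* the old contents are discarded */
    return byte;
}
```

If eax held leftover data before the load, mov al keeps the upper three bytes of it, which is why movzx is the safer choice when the full register is used afterwards.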
Example3;
INC [variable]
When compiling the above code, I am getting the "operation size not
specified" error. And why can't the command be used as INC variable.
The error says it all. variable is just an address. When you use [], how many bytes should it take? You need to specify a size. For example to get the first byte, you would do inc BYTE [variable]. However, from the previous example, it seems like you've reserved nothing at variable, so trying to access any bytes at it may cause some issue. As for "And why can't the command be used as INC variable", as I just said, variable is just a label which translates to some address. You can't change the address which variable translates to.
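Here is a small C sketch of why the assembler insists on a size: the same address gives different results depending on how many bytes the increment covers. The helpers inc_byte and inc_word are made up for this illustration and assume little-endian byte order, as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models inc BYTE [mem]: wraps within one byte, the neighbor is untouched. */
void inc_byte(uint8_t *mem, size_t len)
{
    assert(len >= 1);
    mem[0] = (uint8_t)(mem[0] + 1);
}

/* Models inc WORD [mem]: 16-bit little-endian value, carry reaches byte 1. */
void inc_word(uint8_t *mem, size_t len)
{
    assert(len >= 2);
    uint16_t v = (uint16_t)(mem[0] | (mem[1] << 8));
    v = (uint16_t)(v + 1);
    mem[0] = (uint8_t)(v & 0xFF);
    mem[1] = (uint8_t)(v >> 8);
}
```

Starting from the bytes FF 00, the byte-sized increment leaves 00 00, while the word-sized one leaves 00 01.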
Example4;
section .data
array DB 116,97
section .bss
variable RESB 97
section .text
global _start:
_start:
mov eax,4
mov ebx,1
mov ecx,variable
mov edx,1
int 0x80
;exit
mov eax,1
int 0x80
And this code is not working.
It may seem to not be printing anything, but it actually is. .bss zero-initializes any memory that you reserve. That means when you print the first byte at variable, it just prints the NUL character. However, this doesn't seem to be visible for you when you print it, so it seems like nothing has been printed.
(By the way, are you certain that you know what resb does? In one example, you reserve 0 bytes, and in another, you reserve 97 bytes for no apparent reason. You might want to take another look at what resb actually does.)

array ; variable address
byte[array] ; value of first byte of array
word[array] ; value of first word of array
byte[array + 1] ; value of second byte of array
Think of the variable names as pointers; using size[name] gets the value being pointed to (similar to *name in C, where name is a pointer).

Byte and Bit addressable 8051

8051 SFRs
The SFRs P0, SP, DPL and DPH have byte addresses 80h, 81h, 82h and 83h. Since P0 is bit-addressable, P0.0 - P0.7 have bit addresses 80h - 87h. But how does the processor distinguish between P0.1 (81h) and SP (81h), P0.2 (82h) and DPL (82h), P0.3 (83h) and DPH (83h), and so on?

Byte addresses and bit addresses are never used in the same instruction.
So while
mov SP, #5 ; mov 81h, #5
mov P0.1, C ; mov 81h, C (81h is a bit address here)
both have the address 81h and both are written as mov, the first one is assembled as 0x75 0x81 0x05 and the second as 0x92 0x81. To the processor, 0x75 and 0x92 mean entirely different things, namely, move the value in the third byte of this instruction to the byte address in the second byte of this instruction, and move the carry flag to the bit address in the second byte of this instruction. The assembler knows that mov addr, #imm and mov bit, C need to be encoded differently, and the processor really doesn't care how they're written because it doesn't see the source code at all.
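As a rough illustration of how the opcode alone decides whether the operand byte is a byte address or a bit address, here is a toy decoder in C. It handles only two opcodes: 75h (MOV direct, #imm) and 92h, which is the opcode the standard 8051 instruction set assigns to MOV bit, C. The type and function names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_MOV_DIRECT_IMM, OP_MOV_BIT_C, OP_UNKNOWN } op_kind;

typedef struct {
    op_kind kind;
    uint8_t byte_addr; /* byte address the instruction ends up touching */
    int     bit;       /* bit number 0..7, or -1 for a byte-wide access */
} decoded;

/* Decodes just the two opcodes discussed here; anything else is OP_UNKNOWN. */
decoded decode(const uint8_t *code)
{
    assert(code != NULL);
    decoded d = { OP_UNKNOWN, 0, -1 };
    switch (code[0]) {
    case 0x75: /* MOV direct, #imm: operand is a byte address */
        d.kind = OP_MOV_DIRECT_IMM;
        d.byte_addr = code[1];
        break;
    case 0x92: /* MOV bit, C: operand is a bit address */
        d.kind = OP_MOV_BIT_C;
        if (code[1] >= 0x80) {
            /* SFR space: upper five bits pick the SFR, lower three the bit */
            d.byte_addr = (uint8_t)(code[1] & 0xF8);
        } else {
            /* bit addresses 00h..7Fh live in internal RAM bytes 20h..2Fh */
            d.byte_addr = (uint8_t)(0x20 + (code[1] >> 3));
        }
        d.bit = code[1] & 0x07;
        break;
    default:
        break;
    }
    return d;
}
```

Fed the bytes 75 81 05, the decoder reports a byte access to 81h (SP); fed 92 81, it reports bit 1 of byte 80h (P0), even though the operand byte is 81h in both cases.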

x86 assembly, moving data from an array to a register

I've been going over the book again and again and cannot understand why this is giving me "improper operand type". It should work!
This is inline assembly in Visual Studio.
function(unsigned int* a){
unsigned int num;
_asm {
mov eax, a //This stores address (start of the array) in eax
mov num, dword ptr [eax*4] //This is the line I am having issues with.
That last line, I am trying to store the 4 byte value that is in the array. But I get error C2415: improper operand type
What am I doing wrong? How do I copy 4 byte value from an array into a 32 bit register?

In Visual C++'s inline assembly, all variables are accessed as memory operands [1]; in other words, wherever you write num you can imagine the compiler substituting dword ptr [ebp - something].
Now, this means that in the last mov you are effectively trying to perform a memory-to-memory mov, which isn't provided on x86. Use a temporary register instead:
mov eax, dword ptr [a] ; load value of 'a' (which is an address) in eax
mov eax, dword ptr [eax] ; dereference address, and load contents in eax
mov dword ptr [num], eax ; store value in 'num'
Notice that I removed the *4, as it doesn't really make sense to multiply a pointer by four - maybe you meant to use a as base plus some other index?
[1] Other compilers, such as gcc, provide means to control the interaction between inline assembly and compiler-generated code far more finely, which provides great flexibility and power but has quite a steep learning curve and requires great care to get everything right.
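If the intent was "base plus scaled index", the address calculation looks like the following in C. This is a sketch under that assumption; load_elem is a hypothetical helper, and in assembly the same idea would be written with a base register and a scaled index register, e.g. [eax + ecx*4].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads element i of a: address = base + i*4, then a 4-byte load. */
uint32_t load_elem(const uint32_t *a, size_t i)
{
    assert(a != NULL);
    const unsigned char *base = (const unsigned char *)a; /* the pointer value */
    uint32_t value;
    /* the scale factor applies to the index, never to the pointer itself */
    memcpy(&value, base + i * sizeof(uint32_t), sizeof value);
    return value;
}
```

The multiplication by four scales the element index into a byte offset; multiplying the pointer itself, as in [eax*4], produces an unrelated address.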

x86 Assembly pointers

I am trying to wrap my mind around pointers in Assembly.
What exactly is the difference between:
mov eax, ebx
and
mov [eax], ebx
and when should dword ptr [eax] be used?
Also when I try to do mov eax, [ebx] I get a compile error, why is this?

As has already been stated, wrapping brackets around an operand means that that operand is to be dereferenced, as if it were a pointer in C. In other words, the brackets mean that you are reading a value from (or storing a value into) that memory location, rather than reading that value directly.
So, this:
mov eax, ebx
simply copies the value in ebx into eax. In a pseudo-C notation, this would be: eax = ebx.
Whereas this:
mov eax, [ebx]
dereferences the contents of ebx and stores the pointed-to value in eax. In a pseudo-C notation, this would be: eax = *ebx.
Finally, this:
mov [eax], ebx
stores the value in ebx into the memory location pointed to by eax. Again, in pseudo-C notation: *eax = ebx.
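The three pseudo-C lines above can be turned into a small runnable model. The regs struct and function names are invented here, and the registers are modeled as uintptr_t rather than 32-bit integers so that they can hold a real pointer on whatever machine compiles this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t eax, ebx; } regs;

/* mov eax, ebx    ->  eax = ebx */
void mov_reg_reg(regs *r)
{
    assert(r != NULL);
    r->eax = r->ebx;
}

/* mov eax, [ebx]  ->  eax = *ebx (a 32-bit load from the address in ebx) */
void mov_reg_mem(regs *r)
{
    assert(r != NULL);
    r->eax = *(const uint32_t *)r->ebx;
}

/* mov [eax], ebx  ->  *eax = ebx (a 32-bit store to the address in eax) */
void mov_mem_reg(regs *r)
{
    assert(r != NULL);
    *(uint32_t *)r->eax = (uint32_t)r->ebx;
}
```

With ebx holding the address of a variable that contains 42, the first function copies the address into eax, while the second puts 42 there.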
The registers here could also be replaced with memory operands, such as symbolic variable names. So this:
mov eax, [myVar]
dereferences the address of the variable myVar and stores the contents of that variable in eax, like eax = myVar.
By contrast, this:
mov eax, myVar
stores the address of the variable myVar into eax, like eax = &myVar.
At least, that's how most assemblers work. Microsoft's assembler (called MASM) and the Microsoft C/C++ compiler's inline assembly are a bit different. They treat the above two instructions as equivalent, essentially ignoring the brackets around memory operands.
To get the address of a variable in MASM, you would use the OFFSET keyword:
mov eax, OFFSET myVar
However, even though MASM has this forgiving syntax and allows you to be sloppy, you shouldn't. Always include the brackets when you want to dereference a variable and get its actual value. You will never get the wrong result if you write the code explicitly using the proper syntax, and it'll make it easier for others to understand. Plus, it'll force you to get into the habit of writing the code the way that other assemblers will expect it to be written, rather than relying on MASM's "do what I mean, not what I write" crutch.
Speaking of that "do what I mean, not what I write" crutch, MASM also generally allows you to get away with omitting the operand-size specifier, since it knows the size of the variable. But again, I recommend writing it for clarity and consistency. Therefore, if myVar is an int, you would do:
mov eax, DWORD PTR [myVar] ; eax = myVar
or
mov DWORD PTR [myVar], eax ; myVar = eax
This notation is necessary in other assemblers like NASM that are not strongly-typed and don't remember that myVar is a DWORD-sized memory location.
You don't need this at all when dereferencing register operands, since the name of the register indicates its size. al and ah are always BYTE-sized, ax is always WORD-sized, eax is always DWORD-sized, and rax is always QWORD-sized. But it doesn't hurt to include it anyway, if you like, for consistency with the way you notate memory operands.
Also when I try to do mov eax, [ebx] I get a compile error, why is this?
Um…you shouldn't. This assembles fine for me in MSVC's inline assembly. As we have already seen, it is equivalent to:
mov eax, DWORD PTR [ebx]
and means that the memory location pointed to by ebx will be dereferenced and that DWORD-sized value will be loaded into eax.
Why can't I do mov a, [eax]? Should that not make "a" a pointer to wherever eax is pointing?
No. This combination of operands is not allowed. As you can see from the documentation for the MOV instruction, there are essentially five possibilities (ignoring alternate encodings and segments):
mov register, register ; copy one register to another
mov register, memory ; load value from memory into register
mov memory, register ; store value from register into memory
mov register, immediate ; move immediate value (constant) into register
mov memory, immediate ; store immediate value (constant) in memory
Notice that there is no mov memory, memory, which is what you were trying.
However, you can make the variable a point to what eax is pointing to by simply coding:
mov DWORD PTR [a], eax
Now a and eax have the same value. If eax was a pointer, then a is now a pointer to that same memory location.
If you want to set a to the value that eax is pointing to, then you will need to do:
mov eax, DWORD PTR [eax] ; eax = *eax
mov DWORD PTR [a], eax ; a = eax
Of course, this clobbers the pointer and replaces it with the dereferenced value. If you don't want to lose the pointer, then you will have to use a second "scratch" register; something like:
mov edx, DWORD PTR [eax] ; edx = *eax
mov DWORD PTR [a], edx ; a = edx
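A quick C model of the two variants shows what "clobbers the pointer" means in practice. The cpu_regs struct and both function names are hypothetical, and the registers are again uintptr_t so that they can hold a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t eax, edx; } cpu_regs;

/* mov eax, DWORD PTR [eax] / mov DWORD PTR [a], eax */
void copy_clobbering(cpu_regs *r, uint32_t *a)
{
    assert(r != NULL && a != NULL);
    r->eax = *(const uint32_t *)r->eax; /* eax now holds the value, not the pointer */
    *a = (uint32_t)r->eax;
}

/* mov edx, DWORD PTR [eax] / mov DWORD PTR [a], edx */
void copy_with_scratch(cpu_regs *r, uint32_t *a)
{
    assert(r != NULL && a != NULL);
    r->edx = *(const uint32_t *)r->eax; /* eax is only read, so the pointer survives */
    *a = (uint32_t)r->edx;
}
```

Both variants store the same value in a; they differ only in whether eax can still be used as a pointer afterwards.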
I realize this is all somewhat confusing. The mov instruction is overloaded with a large number of potential meanings in the x86 ISA. This is due to x86's roots as a CISC architecture. By contrast, modern RISC architectures do a better job of separating register-register moves, memory loads, and memory stores. x86 crams them all into a single mov instruction. It's too late to go back and fix it now; you just have to get comfortable with the syntax, and sometimes it takes a second glance.

Indirect Register Addressing

I am trying to figure out how register indirect addressing works. I have a variable that stores the value of 5 as follows:
section .data
number db 53 ; ascii code for 5
section .bss
number2 resb 1
section .text
global _start
_start:
mov eax,number
mov number2,[eax]
In the last two lines of the code, what I am essentially trying to do is make eax act like a pointer to the data stored at number and then move this data into the number2 variable. I had thought indirect register addressing was done via [register], but my code does not seem to work. Any help with regards to syntax would be much appreciated.

Labels work as addresses in NASM, so your mov number2, [eax] would translate to something like mov 0x12345678, [eax], which is of course invalid because you cannot move data to an immediate operand. So you would need mov [number2], [eax], but that's also invalid because mov has no memory-to-memory form.
You can achieve this using some register to temporarily hold the value [eax]:
mov eax, number
mov dl, [eax]
mov [number2], dl

The problem here is that number and number2 are not numbers, i.e. immediate literals. Instead they are interpreted as absolute memory addresses, and the corresponding instructions, if they existed, would be e.g.
mov eax, [0x80000100] ;; vs
mov [0x80000104], [eax] ;; Invalid instruction
One also has to pay attention to the operand order, as answered by Mika Lammi. Is the instruction
mov src, dst ;; vs
mov dst, src
(NASM uses Intel syntax, which is the second form.)
In addition, one should match the register size to the variable size, i.e.
section .data
number db 1 ; this is a byte
section .text
mov al, [number] ; byte-sized register for a byte-sized variable
