What's the address of the program running in memory?

I wrote an asm program that begins like this:
org 0100h
mov ax,cs
mov ds,ax
mov es,ax
But when I look at the program with WinHex, the address is not 0100h. Could anyone tell me why?

I am going to quote Paul R and Michael Chourdakis from this question
"ORG is used to set the assembler location counter. This may or may not translate to a load address at link time."
"ORG is merely an indication on where to put the next piece of code/data, related to the current segment.
It is of no use to use it for fixed addresses, for the eventual address depends on the segment which is not known at assembly time."

If you look at the program in a hex editor, it's not necessarily going to be at address 0x100. But let's look at a sample program here:
.code
org 100h
nop
mov ax,@code
mov ds,ax
mov si,100h
lodsb ;AL should be 0x90, the opcode for NOP.
Now that's assuming there's no linker or relocation magic going on behind the scenes (and on modern computers there usually is.) If you were programming an 8-bit CPU the org directive is usually literally the address of the first line of actual code.
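To make the difference between a file offset and a run-time address concrete, here is a minimal Python sketch. It assumes the program is a DOS .COM file (suggested by org 0100h and by copying cs into ds and es) running in real mode; the function names and the segment value in the usage note are made up for illustration.

```python
def com_runtime_offset(file_offset: int, org: int = 0x100) -> int:
    """Offset inside the load segment of the byte stored at file_offset.

    Assumption: the file is a flat .COM image, so its first byte ends up
    at offset `org` (0x100) in whatever segment the loader picked.
    """
    return org + file_offset


def linear_address(segment: int, offset: int) -> int:
    """Real-mode address calculation: segment * 16 + offset."""
    return (segment << 4) + offset
```

A hex editor shows file offsets, so the first instruction sits at offset 0 in the file. com_runtime_offset(0) gives 0x100, and with a hypothetical load segment of 0x1234, linear_address(0x1234, 0x100) gives 0x12440. The segment is chosen at load time, which is why the final address is not known when assembling.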

Related

Are those addresses the real ones?

GDB shows the memory addresses of the instructions as below, but are those the real addresses? I am following a blog, and the addresses there (e.g. 0x080484bf) look very different from mine. Does it have something to do with the environment?
0x0000000000001286 <+253>: mov ecx,DWORD PTR [rbp-0x4]
0x0000000000001289 <+256>: mov edx,DWORD PTR [rbp-0x4]
If that's normal, let's say I have already overwritten the IP and I want to jump to 0x00..1286: how should I write the address in hex format? Will \x86\x12\x00\x00 be fine? Because it doesn't seem to work very well.
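On the byte-order part of the question only: a minimal Python sketch of how an address is laid out as little-endian bytes. It assumes a 64-bit target, where a saved return address is 8 bytes wide, so a 4-byte string covers only half of it. The helper name is hypothetical, and the sketch says nothing about whether 0x1286 is the address the code has at run time.

```python
import struct


def le_address(addr: int, width_bits: int = 64) -> bytes:
    """Pack an address as little-endian bytes, 4 bytes for 32-bit, 8 for 64-bit."""
    fmt = {32: "<I", 64: "<Q"}[width_bits]
    return struct.pack(fmt, addr)
```

le_address(0x1286) is b"\x86\x12" followed by six zero bytes, while le_address(0x080484bf, 32) is b"\xbf\x84\x04\x08", the 4-byte form that matches the 32-bit addresses in the blog.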

Why RBP instead of another register as a frame pointer?

I understand the usage of push rbp...pop rbp at the start and end of a function to preserve the rbp value of the calling function, since the rbp register is callee-preserved. And then I understand the 'convention' of using rbp as the current top of the stack frame for the current procedure being executed. But related to this I have two questions:
Is rbp just a convention? Could I just as easily use r11 (or any other register or even 8 bytes on the stack) as the base of the stack frame? Is there anything special about the rbp register, or it's just used as the stack frame based upon history and convention?
Why is mov %rbp, %rsp used as a 'cleanup' method before leaving a function? For example, often the push/pop instructions will be symmetrical, so is the mov %rbp, %rsp just a shorthand way where someone can 'skip' doing the symmetrical pops/adds and such? What would be an actual usage of where mov %rbp, %rsp would be useful? Almost all the times I see it in compiler output (with zero optimizations turned on), it seems either unnecessary or redundant, and I'm having trouble thinking of a scenario where it might actually be useful.
Optimized code doesn't use frame pointers at all, except for stuff like VLAs / alloca (variable-sized movement of RSP), or if you specifically use -fno-omit-frame-pointer (e.g. to make perf record stack sampling more efficient/reliable). Un-optimized code is usually not as interesting to look at; see How to remove "noise" from GCC/clang assembly output?
x86_64 : is stack frame pointer almost useless?
Why is it better to use the ebp than the esp register to locate parameters on the stack? (only for code-size)
What are the advantages of a frame pointer?
So there are plenty of duplicates for the part about when / why to use a frame pointer at all. The interesting part is whether a register other than RBP could have been chosen.
The only things special about RBP are that leave can compactly do RSP=RBP + pop RBP; and that a (%rbp) addressing mode requires an explicit disp8 or disp32 (with value 0).
So if you are going to use a frame pointer at all, you should pick RBP because it's at least as good as any other reg at being a frame pointer, but worse than other regs for some other uses. You never need 0(frame_pointer), only other offsets. (R13 has the same always-needs-a-disp8=0 effect, but then every stack access would always need a REX prefix, like for add -12(%r13), %eax which doesn't with RBP.)
Also, all other "legacy" registers (that you can use without a REX, i.e. not R8-R15) have at least one implicit use in at least one instruction that compilers may actually generate, like cmpxchg16b, cpuid, shl %cl, %reg, rep movsb or whatever, so any other reg would be worse as a frame pointer. You can't do simple naive un-optimized (or toy-compiler) code-gen if you need to shuffle things around to free up RBX for some instruction that needs it for a different purpose. (Stack unwinding on exceptions may also rely on the frame pointer always being in a specific register, if your .cfi_* directives specified that.)
Consistency with previous x86 modes would have been sufficient reason to use RBP, to make it easier for puny human minds to remember, but there are still code-size and other reasons to pick RBP if you're going to use one. (In fact, since (%rsp) addressing modes always need a SIB byte, the instructions to set up a frame pointer can actually pay for themselves over a large function in terms of code size, although not in instructions / uops.)
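The code-size points above (RBP and R13 always need a displacement, R8-R15 need a REX prefix, RSP needs a SIB byte) can be checked with a small encoder. This is a minimal Python sketch, not a full assembler: it handles only `op disp(base), reg32` in 64-bit mode with no index register, and the function name and defaults are my own.

```python
def encode_mem_to_reg(base: int, disp: int, reg: int = 0, opcode: int = 0x8B) -> bytes:
    """Encode `op disp(base), reg32` for x86-64 (default opcode 0x8B = mov).

    base/reg are register numbers 0..15 (3=rbx, 4=rsp, 5=rbp, 13=r13).
    """
    out = bytearray()
    rex = 0x40 | ((reg >> 3) << 2) | (base >> 3)   # REX.R and REX.B bits
    if rex != 0x40:
        out.append(rex)                            # only emitted for r8-r15
    out.append(opcode)
    rm = base & 7
    if disp == 0 and rm != 5:                      # rbp/r13 cannot use mod=00
        mod, disp_bytes = 0, b""
    elif -128 <= disp <= 127:
        mod, disp_bytes = 1, disp.to_bytes(1, "little", signed=True)
    else:
        mod, disp_bytes = 2, disp.to_bytes(4, "little", signed=True)
    out.append((mod << 6) | ((reg & 7) << 3) | rm)
    if rm == 4:                                    # rsp/r12 as base need a SIB byte
        out.append(0x24)
    out += disp_bytes
    return bytes(out)
```

With opcode=0x03 (add), base 5 and disp -12 encode as 03 45 f4, while base 13 gives 41 03 45 f4, one byte longer because of the REX prefix. A plain (%rbx) load is 8b 03, but (%rbp) becomes 8b 45 00 and (%rsp) becomes 8b 04 24.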
Reasons that aren't still relevant:
An RBP base address implies the SS segment, like RSP, which was relevant in 16-bit mode, and theoretically in 32 (where non-flat memory models were possible), but not in 64-bit mode where it only affects the exception you get from a non-canonical address. So that part of the reason is basically gone, pretty much nobody cares about #GP vs. #SS there.
enter is too slow to be usable, but leave is still worth using if RSP isn't already pointing at the saved RBP, only costing 1 extra uop vs. manual mov %rbp, %rsp / pop %rbp on Intel CPUs, so that's what GCC does. You claim to have seen useless mov %rbp, %rsp instructions, but that's not what compilers actually do.
Note that mov %rbp, %rsp (3 bytes) is smaller than add $imm8, %rsp (4 bytes), so if you're using a frame pointer, you might as well restore RSP that way if it's not pointing at the saved RBP. (Unless you need to restore other registers if you saved them right below RBP instead of after a sub $imm, %rsp, although you can do the restoring with mov loads instead of pop.)
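The byte counts quoted above can be tallied in a few lines of Python. This is only a lookup table of standard encodings for these specific instructions; the $16 immediate is an arbitrary example of an imm8, and the helper name is hypothetical.

```python
# Machine-code bytes for the epilogue instructions discussed above (AT&T syntax).
ENCODINGS = {
    "mov %rbp, %rsp": bytes.fromhex("4889ec"),    # REX.W + 89 /r
    "add $16, %rsp":  bytes.fromhex("4883c410"),  # REX.W + 83 /0 ib
    "pop %rbp":       bytes.fromhex("5d"),
    "leave":          bytes.fromhex("c9"),
}


def code_size(*instructions: str) -> int:
    """Total encoded size in bytes of the given instruction sequence."""
    return sum(len(ENCODINGS[i]) for i in instructions)
```

code_size("mov %rbp, %rsp", "pop %rbp") is 4 bytes against 1 byte for code_size("leave"), which is the compactness advantage of leave; the uop cost mentioned above is a separate trade-off that this table does not model.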

CTypes NASM - how to dereference a pointer to an array of pointers

UPDATE: the code below DOES work to dereference the pointer. I had incorrectly inserted some lines at the entry point that overwrote the memory location f1_ptr. The important part is that, to dereference the pointer when it's stored in a memory location, you use: mov r15,qword[f1_ptr] / mov rdx,qword[r15+0]. That is, load the pointer from memory into r15, then load from the address in r15 into rdx. That does it. But as Peter Cordes explains below, memory locations are not thread safe, so it's best to use registers for pointers at least.
End of Update
I am using ctypes to pass a pointer to an array of pointers; each pointer points to the start of a string in a list of names. In the Windows ABI, the pointer is passed as the first parameter in rcx.
On entry to the program, I ordinarily put pointers into memory variables because I can't keep them in the low registers like rcx and rdx; in this case, it's stored as mov [f1_ptr],rcx. But later in the program, when I move from memory to register it doesn't work. In other work with simple pointers (not pointers to an array of pointers), I have no problem.
Based on the answer to an earlier question (Python ctypes how to read a byte from a character array passed to NASM), I found that IF I store rcx in another register on entry (e.g., r15), I can freely use that with no problem downstream in the program. For example, to access the second byte of the second name string:
xor rax,rax
mov rdx,qword[r15+8]
movsx eax,BYTE[rdx+1]
jmp label_900
If instead I mov r15,[f1_ptr] downstream in the program, that doesn't work. To emulate the code above:
xor rax,rax
mov r15,qword[f1_ptr]
mov rdx,qword[r15+8]
movsx eax,BYTE[rdx+1]
jmp label_900
but it not only doesn't work, it crashes.
So the question is: rcx is stored in memory on entry to the program. Later I read it back from memory into r15 and dereference it the same way. Why doesn't it work the same way?
The full code, minus the code segments shown above, is at the link I posted above.
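For reference, the same two-step dereference can be reproduced from the Python side with ctypes. This is a minimal sketch with made-up names; the list of strings stands in for the real data, and f1_ptr here is just the address that would be passed in rcx.

```python
import ctypes


def byte_at(names, name_index, byte_index):
    """Read one byte of one string through a pointer to an array of pointers."""
    arr = (ctypes.c_char_p * len(names))(*names)   # array of char* (hypothetical data)
    f1_ptr = ctypes.addressof(arr)                 # the value the callee would get in rcx
    ptr_size = ctypes.sizeof(ctypes.c_void_p)      # 8 on a 64-bit build
    # mov rdx, qword [r15 + 8]   -> load the pointer to the chosen string
    name_addr = ctypes.c_void_p.from_address(f1_ptr + name_index * ptr_size).value
    # movsx eax, byte [rdx + 1]  -> load one signed byte from that string
    return ctypes.c_byte.from_address(name_addr + byte_index).value
```

byte_at([b"alice", b"bob", b"carol"], 1, 1) reads the second byte of the second name, the same access as [r15+8] followed by [rdx+1] in the assembly above.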

What happens to instruction pointers when address overrides are used to target a smaller address space?

What happens to instruction pointers when address overrides are used to target a smaller address space e.g. the default is 32-bit address but the override converts to 16?
So, let's say we're in x86-32 mode and the default is a 32-bit memory space for the current code segment we're in.
Further, the IP register contains the value 87654321h.
If I use 67h to override the default and make the memory space 16-bit for just that one instruction, how does the processor compute the offset into the current code segment?
Some bits in the IP have to be ignored, otherwise you'd be outside the 16-bit memory space specified by the override.
So, does the processor just ignore the 8765 part in the IP register?
That is, does the processor just use the 4 least significant hex digits and ignore the 4 most significant ones?
What about address overrides associated with access to data segments?
For example, we're in x86-32 mode, the default is 32 bit memory addressing and we use 67h prefix for this instruction: mov eax, [ebx].
Now, ebx contains a 32 bit number.
Does the 67h override change the above instruction to: mov eax, [bx]?
What about "constant pointers"? Example: mov eax, [87654321].
Would the 67h override change it to mov eax, [4321]?
Does the memory override affect the offset into the data segment also or just the code segment?
How do address overrides affect the stack pointer?
If the stack pointer contains a 32 bit number (again we'll use 87654321h) and I push or pop, what memory is referenced?
Pushing and popping indirectly accesses memory.
So, would you only use the 4321 part of the stack pointer, ignoring the most significant bits?
Also, what about the segment bases themselves?
Example: we're in x86-32 mode, default 32 bit memory space, but we use 67h override.
The CS register points to a descriptor in the GDT whose segment base is, again lol, 87654321h.
We're immediately outside of the 16-bit memory range without even adding an offset.
What does the processor do? Ignore the 4 most significant hex digits? The same question can be applied to the segment descriptors for the data and stack segments.
0x67 is the address-size prefix. It changes the interpretation of an addressing mode in the instruction.
It does not put the machine temporarily into 16-bit mode or truncate EIP to 16-bit, or affect any other addresses that don't explicitly come from an [addressing mode] in the instruction.
For push/pop, the instruction reference manual entry for push says:
The address size is used only when referencing a source operand in memory.
So in 32-bit mode, a16 push eax would still set esp-=4 and then store [esp] = eax. It would not truncate ESP to 16 bits. The prefix would have no effect, because the only memory operand is implicit not explicit.
push [ebx] is affected by the 67 prefix, though.
db 0x67
push dword [ebx]
would decode as push dword [bp+di], and load 32 bits from that 16-bit address (ignoring the high 16 bits of those registers). (16-bit addressing modes use a different encoding than 32/64-bit ones, with no optional SIB byte.)
However, it would still update the full esp, and store to [esp].
(For the effective-address encoding details, see Intel's volume 2 PDF, Chapter 2: INSTRUCTION FORMAT, table 2-1 (16-bit) vs. table 2-2 (32-bit).)
In 64-bit mode, the address-size prefix would turn push [rbx] into push [ebx].
Since some forms of push can be affected by the address-size prefix, this might not fall into the category of meaningless prefixes, use of which is reserved and may produce unpredictable behaviour in future CPUs. (What happens when you use a memory override prefix but all the operands are registers?). OTOH, that may only apply to the push r/m32 opcode for push, not for the push r32 short forms that can't take a memory operand.
I think the way it's worded, Intel's manual really doesn't guarantee that even the push r/m32 longer encoding of push ebx wouldn't decode as something different in future CPUs with a 67 prefix.
For example, we're in x86-32 mode, the default is 32 bit memory addressing and we use 67h prefix for this instruction: mov eax, [ebx].
Now, ebx contains a 32 bit number.
Does the 67h override change the above instruction to: mov eax, [bx]?
What about "constant pointers"? Example: mov eax, [87654321].
Would the 67h override change it to mov eax, [4321]?
The address size override doesn't just change the size of the address, it actually changes the addressing scheme.
A 67 override on mov eax, [ebx] changes it to mov eax, [bp+di].
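A 67 override on mov eax, [87654321] (taking the ModRM encoding 8B 05 21 43 65 87) changes it to mov eax, [di], followed by and [ebx+65], eax and some xchg instruction, because the four displacement bytes are no longer consumed as an address and are decoded as instructions instead.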
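The reason [ebx] turns into [bp+di] is that the same ModRM r/m field is looked up in a different table depending on the address size. A minimal Python sketch of that lookup, limited to mod=00 (no displacement byte selected by mod); the table and function names are my own, and the entries follow the 16-bit and 32-bit ModRM tables in Intel's volume 2.

```python
# Meaning of the ModRM r/m field when mod = 00.
RM16 = ["[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]", "[si]", "[di]", "[disp16]", "[bx]"]
RM32 = ["[eax]", "[ecx]", "[edx]", "[ebx]", "[SIB]", "[disp32]", "[esi]", "[edi]"]


def decode_rm(modrm: int, addr16: bool) -> str:
    """Return the memory operand selected by a ModRM byte with mod=00."""
    if modrm >> 6 != 0:
        raise ValueError("this sketch only handles mod=00")
    table = RM16 if addr16 else RM32
    return table[modrm & 7]
```

mov eax, [ebx] is 8B 03: decode_rm(0x03, addr16=False) is "[ebx]", and with the 67 prefix in 32-bit mode decode_rm(0x03, addr16=True) is "[bp+di]". Likewise ModRM 05 means "[disp32]" normally but "[di]" under the prefix, so the four address bytes that follow are left over.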

How does the EIP register get its value?

I've just started to learn assembly in school, and we're starting to dive into registers and how to use them.
A point that I can't seem to understand is how does the instruction pointer get the address of the next instruction?
For instance take the following code:
nop
pushl %ebp
movl %esp, %ebp
subl $4, %esp
In the previous code the instruction pointer gets incremented after each line, and I'd like to know how does it know which instruction to do next (i.e mov,sub,push,...etc.)? Are all the previous instruction first loaded into RAM when we first run the program and the address of the first instruction (nop in this case) gets automatically loaded into eip, then it just goes over them one by one? Or am I missing something?
Any help is appreciated.
EIP is updated by the CPU itself each time an instruction is fetched and decoded for execution. I don't believe you can even access it in the usual sense, since it cannot be named as an ordinary operand. However, it can be modified using a jmp instruction, which is functionally (ignoring pipeline issues and so forth) the same as a hypothetical mov address, %eip. It is also updated by conditional jumps, call, and ret instructions.
Once your program is loaded into memory (during this process you can think of your program as simply data like any other file), the OS (or some other loader program) performs a jmp to the start of your program. Of course, the code you are showing as an example is not the real start of the program but simply a function that main has called.
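As a rough model of that process, here is a toy Python fetch loop over the machine code of the four instructions from the question. It is a sketch under strong assumptions: the length table is hard-coded for just these opcodes (a real CPU works out each instruction's length while decoding it), nothing is executed, and the base address in the usage note is arbitrary.

```python
# nop / pushl %ebp / movl %esp, %ebp / subl $4, %esp
CODE = bytes.fromhex("90 55 89e5 83ec04")

# first opcode byte -> (mnemonic, total length in bytes); valid only for CODE above
KNOWN = {
    0x90: ("nop", 1),
    0x55: ("pushl %ebp", 1),
    0x89: ("movl %esp, %ebp", 2),
    0x83: ("subl $4, %esp", 3),
}


def trace(code: bytes, base: int = 0):
    """Return (eip, mnemonic) pairs, advancing eip by each instruction's length."""
    steps, eip = [], base
    while eip - base < len(code):
        mnemonic, length = KNOWN[code[eip - base]]
        steps.append((eip, mnemonic))
        eip += length            # the "increment" is by instruction size, not by 1
    return steps
```

trace(CODE, base=0x1000) visits 0x1000, 0x1001, 0x1002 and 0x1004: the bytes sit in memory one after another, and EIP simply moves past however many bytes the current instruction occupied unless a jump, call, or ret loads a new value.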

Resources