GDB shows the memory addresses of the instructions as below, but are those the real addresses? I follow some kind of a blog and the addresses are much different than mine, 0x080484bf. Does it has to do smth with the environment?
**0x0000000000001286** <+253>: mov ecx,DWORD PTR [rbp-0x4]
**0x0000000000001289** <+256>: mov edx,DWORD PTR [rbp-0x4]
If that's normal, let's say I already overwrited the IP and I want to jump to 0x00..1286, how should I write the address in hex format? Will \x86\x12\x00\x00 be fine? Because it doesn't seem to work very well.
Related
UPDATE: the code below DOES work to dereference the pointer. I had incorrectly inserted some lines at the entry point that had the effect of overriding the memory location f1_ptr. The important part is that to defererence the pointer when it's stored in a memory location is: mov r15,qword[f1_ptr] / mov rdx,qword[r15+0]. Move memory to r15, then move r15 to rdx. That does it. But as Peter Cordes explains below, memory locations are not thread safe, so best to use registers for pointers at least.
****End of Update****
I am using ctypes to pass a pointer to an array of pointers; each pointer points to the start of a string in a list of names. In the Windows ABI, the pointer is passed as the first parameter in rcx.
On entry to the program, I ordinarily put pointers into memory variables because I can't keep them in the low registers like rcx and rdx; in this case, it's stored as mov [f1_ptr],rcx. But later in the program, when I move from memory to register it doesn't work. In other work with simple pointers (not pointers to an array of pointers), I have no problem.
Based on the answer to an earlier question (Python ctypes how to read a byte from a character array passed to NASM), I found that IF I store rcx in another register on entry (e.g., r15), I can freely use that with no problem downstream in the program. For example, to access the second byte of the second name string:
xor rax,rax
mov rdx,qword[r15+8]
movsx eax,BYTE[rdx+1]
jmp label_900
If instead I mov r15,[f1_ptr] downstream in the program, that doesn't work. To emulate the code above:
xor rax,rax
mov r15,qword[f1_ptr]
mov rdx,qword[r15+8]
movsx eax,BYTE[rdx+1]
jmp label_900
but it not only doesn't work, it crashes.
So the question is: rcx is stored in memory on entry to the program. Later I read it back from memory into r15 and dereference it the same way. Why doesn't it work the same way?
The full code, minus the code segments shown above, is at the link I posted above.
in c language we use & to get the address of a variable and * to dereference the variable.
int variable=10;
int *pointer;
pointer = &variable;
How to do it in nasm x86 assembly language.
i read nasm manual and found that [ variable_address ] works like dereferencing.( i maybe wrong ).
section .data
variable db 'A'
section .text
global _start
_start:
mov eax , 4
mov ebx , 1
mov ecx , [variable]
mov edx , 8
int 0x80
mov eax ,1
int 0x80
i executed this code it prints nothing. i can't understand what is wrong with my code.
need your help to understand pointer and dereferencing in nasm x86.
There are no variables in assembly. (*)
variable db 'A'
Does several things. It defines assembly-time symbol variable, which is like bookmark into memory, containing address of *here* in the time of compilation. It's same thing as doing label on empty line like:
variable:
The db 'A' directive is "define byte", and you give it single byte value to be defined, so it will produce single byte into resulting machine code with value 0x41 or 65 in decimal. That's the value of big letter A in ASCII encoding.
Then:
mov ecx , [variable]
Does load 4 bytes from memory cells at address variable, which means the low 8 bits ecx will contain the value 65, and the upper 24 bits will contain some junk which happened to reside in the following 3 bytes after the 'A' .. (would you use db 'ABCD', then the ecx would be equal to value 0x44434241 ('D' 'C' 'B' 'A' letters, "reversed" in bits due to little-endian encoding of dword values on x86).
But the sys_write expect the ecx to hold address of memory, where the content bytes are stored, so you need instead:
mov ecx, variable
That will in NASM load address of the data into ecx.
(in MASM/TASM this would instead assemble as mov ecx,[variable] and to get address you have to use mov ecx, OFFSET variable, in case you happen to find some MASM/TASM example, be aware of the syntax difference).
*) some more info about "no variables". Keep in mind in assembly you are on the machine level. On the machine level there is computer memory, which is addressable by bytes (on x86 platform! There are some platforms, where memory may be addressable by different size, they are not common, but in micro-controllers world you may find some). So by using some memory address, you can access some particular byte(s) in the physical memory chip (which particular physical place in memory chip is addressed depends on your platform, the modern OS will usually give user application virtual addressing space, translated to physical addresses by CPU on the fly, transparently, without bothering user code about that translation).
All the advanced logical concepts like "variables", "arrays", "strings", etc... are just bunch of byte values in memory, and all that logical meaning is given to the memory data by the instructions being executed. When you look at those data without the context of the instructions, they are just some byte values in memory, nothing more.
So if you are not precise with your code, and you access single-byte "variable" by instruction fetching dword, like you did in your mov ecx,[variable] example, there's nothing wrong about that from the machine point of view, and it will happily fetch 4 bytes of memory into ecx register, nor the NASM is bothered to report you, that you are probably out-of-bounds accessing memory beyond your original variable definition. This is sort of stupid behaviour, if you think in terms like "variables", and other high-level programming languages concepts. But assembly is not intended for such work, actually having the full control over machine is the main purpose of assembly, and if you want to fetch 4 bytes, you can, it's all up to programmer. It just requires tremendous amount of precision, and attention to detail, staying aware of your memory structures layout, and using correct instructions with desired memory operand sizes, like movzx ecx,byte [variable] to load only single byte from memory, and zero-extend that value into full 32b value in the target ecx register.
I've just started to learn assembly in school, and we're starting to dive into registers and how to use them.
A point that I can't seem to understand is how does the instruction pointer get the address of the next instruction?
For instance take the following code:
nop
pushl %ebp
movl %esp, %ebp
subl $4, %esp
In the previous code the instruction pointer gets incremented after each line, and I'd like to know how does it know which instruction to do next (i.e mov,sub,push,...etc.)? Are all the previous instruction first loaded into RAM when we first run the program and the address of the first instruction (nop in this case) gets automatically loaded into eip, then it just goes over them one by one? Or am I missing something?
Any help is appreciated.
EIP is updated by the microcode (firmware) in the CPU itself each time an instruction is retrieved and decoded for execution. I don't believe you can even access it is in the usual sense. However it can be modified using a jmp instruction, which is functionally (not include pipeline issues and so forth) the same as mov %eip,address. It is also updated on conditional jumps, call, and ret instructions.
Once your program is loaded into memory (during this process you can think of you program as simply data like any other file), the OS (or some other loader program) performs a jmp to the start of your program. Of course the code you are showing as example code is the real start of the program but simply a function that main has called.
I wrote a asm program it begins like this:
org 0100h
mov ax,cs
mov ds,ax
mov es,ax
But when I look at the program with winhex,the address is not 0100h.Could anyone tell me why?
I am going to quote Paul R and Michael Chourdakis from this question
"ORG is used to set the assembler location counter. This may or may not translate to a load address at link time."
"ORG is merely an indication on where to put the next piece of code/data, related to the current segment.
It is of no use to use it for fixed addresses, for the eventual address depends on the segment which is not known at assembly time."
If you look at the program in a hex editor it's not going to necessarily be at address 0x100. But let's look at a sample program here:
.code
org 100h
nop
mov ax,#code
mov ds,ax
mov si,100h
lodsb ;AL should be 0x90, the opcode for NOP.
Now that's assuming there's no linker or relocation magic going on behind the scenes (and on modern computers there usually is.) If you were programming an 8-bit CPU the org directive is usually literally the address of the first line of actual code.
I've got to learn assembly and I'm very confused as to what the different registers do/point to.
On some architectures, like MIPS, all registers are created equal, and there is really no difference beyond the name of the register (and software conventions). On x86 you can mostly use any registers for general-purpose computing, but some registers are implicitly bound to the instruction set.
Lots of information about special purposes for registers can be found here.
Examples:
eax, accumulator: many arithmetic instructions implicitly operate on eax. There are also special shorter EAX-specific encodings for many instructions: add eax, 123456 is 1 byte shorter than add ecx, 123456, for example. (add eax, imm32 vs. add r/m32, imm32)
ebx, base: few implicit uses, but xlat is one that matches the "Base" naming. Still relevant: cmpxchg8b. Because it's rarely required for anything specific, some 32-bit calling-conventions / ABIs use it as a pointer to the "global offset table" in Position Independent Code (PIC).
edx, data: some arithmetic operations implicitly operate on the 64-bit value in edx:eax
ecx, counter used for shift counts, and for rep movs. Also, the mostly-obsolete loop instruction implicitly decrements ecx
esi, source index: some string operations read a string from the memory pointed to by esi
edi, destination index: some string operations write a string to the memory pointed to by edi. e.g. rep movsb copies ECX bytes from [esi] to [edi].
ebp, base pointer: normally used to point to local variables. Used implicitly by leave.
esp, stack pointer: points to the top of the stack, used implicitly by push, pop, call and ret
The x86 instruction set is a complex beast, really. Many instructions have shorter forms that implicitly use one register or another. Some registers can be used to do certain addressing while others cannot.
The Intel 80386 Programmer's Reference Manual is a irreplaceable resource, it basically tells you everything there is to know about x86 assembly, except for newer extensions and performance on modern hardware.
The PC Assembly (e)book is a great resource for learning assembly.
The sp register is the stack pointer, used for stack operation like push and pop.
The stack is known as a LIFO structure (last-in, first-out), meaning that the last thing pushed on is the fist thing popped off. It's used, among other things, to implement the ability to call functions.
The bp register is the base pointer, and is commonly used for stack frame operations.
This means that it's a fixed reference to locate local variables, passed parameters and so forth on the stack, for a given level (while sp may change during the execution of a function, bp usually does not).
If you're looking at assembly language like:
mov eax, [bp+8]
you're seeing the code access a stack-level-specific variable.
The si register is the source index, typically used for mass copy operations (di is its equivalent destination index). Intel had these registers along with specific instructions for quick movement of bytes in memory.
The e- variants are just the 32-bit versions of these (originally) 16-bit registers. And, as if that weren't enough, we have 64-bit r- variants as well :-)
Perhaps the simplest place to start is here. It's specific to the 8086 but the concepts haven't changed that much. The simplicity of the 8086 compared to the current crop will be a good starting point for your education. Once you've learned the basics, it will be much easier to move up to the later members of the x86 family.
Transcribed here and edited quite a bit, to make the answer self-contained.
GENERAL PURPOSE REGISTERS
8086 CPU has 8 general purpose registers, each register has its own name:
AX - the accumulator register (divided into AH/AL). Probably the most commonly used register for general purpose stuff.
BX - the base address register (divided into BH/BL).
CX - the count register (divided into CH/CL). Special purpose instructions for loping and shifting.
DX - the data register (divided into DH/DL). Used with AX for some MUL and DIV operations, and for specifying ports in some IN and OUT operations.
SI - source index register. Special purpose instruction to use this as a source of mass memory transfers (DS:SI).
DI - destination index register. Special purpose instruction to use this as a destination of mass memory transfers (ES:DI).
BP - base pointer, primarily used for accessing parameters and variables on the stack.
SP - stack pointer, used for the basic stack operations.
SEGMENT REGISTERS
CS - points at the segment containing the current instruction.
DS - generally points at segment where variables are defined.
ES - extra segment register, it's up to a coder to define its usage.
SS - points at the segment containing the stack.
Although it is possible to store any data in the segment registers, this is never a good idea. The segment registers have a very special purpose - pointing at accessible blocks of memory.
Segment registers work together with general purpose register to access any memory value. For example, if we would like to access memory at the physical address 12345h, we could set the DS = 1230h and SI = 0045h. This way we can access much more memory than with a single register, which is limited to 16 bit values.
The CPU makes a calculation of the physical address by multiplying the segment register by 10h and adding the general purpose register to it (1230h * 10h + 45h = 12345h):
1230
0045
=====
12345
The address formed with 2 registers is called an effective address.
This usage is for real mode only (which is the only mode the 8086 had). Later processors changed these registers from segments to selectors and they are used to lookup addresses in a table, rather than having a fixed calculation performed on them.
By default BX, SI and DI registers work with DS segment register; and BP and SP work with SS segment register.
SPECIAL PURPOSE REGISTERS
IP - the instruction pointer:
Always points to next instruction to be executed.
Offset address relative to CS.
IP register always works together with CS segment register and it points to currently executing instruction.
FLAGS REGISTER
Determines the current state of the processor. These flags are modified automatically by CPU after mathematical operations, this allows to determine the type of the result, and to determine conditions to transfer control to other parts of the program.
Generally you cannot access these registers directly.
Carry Flag CF - this flag is set to 1 when there is an unsigned overflow. For example when you add bytes 255 + 1 (result is not in range 0...255). When there is no overflow this flag is set to 0.
Parity Flag PF - this flag is set to 1 when there is even number of one bits in result, and to 0 when there is odd number of one bits.
Auxiliary Flag AF - set to 1 when there is an unsigned overflow for low nibble (4 bits).
Zero Flag ZF - set to 1 when result is zero. For non-zero result this flag is set to 0.
Sign Flag SF - set to 1 when result is negative. When result is positive it is set to 0. (This flag takes the value of the most significant bit.)
Trap Flag TF - Used for on-chip debugging.
Interrupt enable Flag IF - when this flag is set to 1 CPU reacts to interrupts from external devices.
Direction Flag DF - this flag is used by some instructions to process data chains, when this flag is set to 0 - the processing is done forward, when this flag is set to 1 the processing is done backward.
Overflow Flag OF - set to 1 when there is a signed overflow. For example, when you add bytes 100 + 50 (result is not in range -128...127).
Here's a simplified summary:
ESP is the current stack pointer, so you generally only update it to manipulate stack, and EBP is intended for stack manipulation too, for example saving the value of ESP before allocating stack space for local variables. But you can use EBP as a general purpose register too.
ESI is the Extended Source Index register, "string" (different from C-string, and I don't mean the type of C-string women wear either) instructions like MOVS use ESI and EDI.
Memory Addressing:
x86 CPUs have these special registers called "segment registers", each of them can point to different address, for example DS (commonly called data segment) may point to 0x1000000, and SS (commonly called stack segment) may point to 0x2000000.
When you use EBP and ESP, the default segment register used is SS, for ESI (and other general purpose registers), it's DS. For example, let's say DS=0x1000000, SS=0x2000000, EBP=0x10, ESI=0x10, so:
mov eax,[esp] //loading from address 0x2000000 + 0x10
mov eax,[esi] //loading from address 0x1000000 + 0x10
You can also specify a segment register to use, overriding the default:
mov eax,ds:[ebp]
In terms of addition, subtraction, logical operations, etc, there's no real difference between them.